AI Models Directory
Explore our curated collection of machine learning and AI models from leading research organizations and companies.
Microsoft
BitNet b1.58 2B4T
SLM
Open Source
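A native 1.58-bit model (weights restricted to the ternary values -1, 0, and +1) with about 2 billion parameters, trained on 4 trillion tokens and released in April 2025. It is designed for highly efficient inference on CPUs, including Apple’s M2, through Microsoft’s custom bitnet.cpp framework. It is reported to run up to twice as fast as similarly sized full-precision models while using less memory, which makes it well suited to resource-constrained devices.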
Efficient
CPU-optimized
Low Memory
Fast Inference
8.8/10
Performance
8.9/10
Accuracy
MIT
License
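The "b1.58" in the name reflects ternary weights: each weight takes one of three values, which carries log2(3) ≈ 1.58 bits of information. The sketch below illustrates the absmean quantization scheme described in the BitNet b1.58 research, written in plain Python for readability. The function names are hypothetical, and this is not the bitnet.cpp implementation, which relies on optimized low-level kernels.

```python
import math
from typing import List, Tuple

# Information content of one ternary weight: log2(3) ~= 1.585 bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def absmean_ternary(weights: List[float], eps: float = 1e-5) -> Tuple[List[int], float]:
    """Quantize weights to {-1, 0, +1} with absmean scaling (illustrative sketch).

    gamma is the mean absolute value of the weights. Each weight is divided by
    gamma, rounded to the nearest integer, and clipped to the range [-1, 1].
    gamma is returned so outputs can be rescaled later.
    """
    gamma = sum(abs(w) for w in weights) / len(weights)
    ternary = [max(-1, min(1, round(w / (gamma + eps)))) for w in weights]
    return ternary, gamma


def ternary_dot(ternary: List[int], gamma: float, x: List[float]) -> float:
    """Dot product with ternary weights: only additions and subtractions are needed."""
    acc = 0.0
    for t, xi in zip(ternary, x):
        if t == 1:
            acc += xi
        elif t == -1:
            acc -= xi
        # t == 0 contributes nothing, so that input is skipped entirely.
    return gamma * acc


if __name__ == "__main__":
    q, g = absmean_ternary([0.4, -0.05, -0.9, 0.2])
    print(q, round(g, 4))  # [1, 0, -1, 1] 0.3875
```

Because every weight is -1, 0, or +1, each dot product reduces to additions and subtractions followed by a single scaling by gamma, which is a large part of why such models can run efficiently on CPUs.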
Stability AI
Stable Virtual Camera
3D
Video
Research
A research-preview multi-view diffusion model launched in March 2025, transforming 2D images into immersive 3D videos with realistic depth and perspective. It eliminates the need for complex reconstruction, making it ideal for virtual reality and cinematic applications.
2D-to-3D Video
Immersive Depth
No Reconstruction
Cinematic Use
9.0/10
Performance
9.1/10
Accuracy
Stability AI Non-Commercial Research Community License
License
DeepSeek
DeepSeek-V3-0324
LLM
Open Source
MoE
DeepSeek-V3-0324 is a March 2025 update to DeepSeek-V3 with stronger reasoning, coding, and tool-use capabilities; it is reported to outperform GPT-4.5 on several math and coding benchmarks.
Mathematical Reasoning
Code Generation
Tool Use
Large Context Window
9.3/10
Performance
9.5/10
Accuracy
MIT
License
NVIDIA
Cosmos Nemotron
LLM
Open Source
Physical AI
Cosmos Nemotron is an open reasoning model for physical AI development, offering customizable world generation for robotics and simulation.
World Generation
Physical AI Reasoning
Customizable Simulation
9.1/10
Performance
9.3/10
Accuracy
Apache 2.0
License
NVIDIA
GR00T N1
Robotics
Open Source
Humanoid AI
GR00T N1 is an open, customizable foundation model for humanoid robot reasoning, enabling advanced perception and action in robotics.
Humanoid Robot Reasoning
Perception
Action Planning
9.0/10
Performance
9.2/10
Accuracy
Apache 2.0
License
Alibaba
Qwen2.5-Coder Series
Code Generation
Open Source
Multilingual
A series of code-specific models optimized for code generation, reasoning, and fixing, available in six sizes from 0.5 billion to 32 billion parameters.
Code Generation
Code Reasoning
Code Fixing
Supports 92 Programming Languages
9.3/10
Performance
9.1/10
Accuracy
Apache 2.0
License
Alibaba
Qwen2.5-Math Series
Math
Open Source
Multilingual
An advanced math-specific model series extending Qwen2.5 capabilities with high performance in mathematical reasoning tasks.
Math Word Problems
Multi-Hop Reasoning
Symbolic Math
MathQA Tasks
9.2/10
Performance
9.3/10
Accuracy
Apache 2.0
License
OpenAI
GPT-4.5
LLM
Conversational
Codenamed Orion, this large model reduces hallucinations compared to GPT-4o and o1. It’s designed for conversational tasks and broad knowledge applications.
Text Generation
Low Hallucination
Conversational AI
9.3/10
Performance
9.5/10
Accuracy
Proprietary
License
Microsoft
Magma
Multimodal
Agentic AI
A multimodal AI model introduced in February 2025, combining visual and language processing to control software interfaces and robotic systems, enabling agentic AI for autonomous task execution. It features Set-of-Mark and Trace-of-Mark for spatial intelligence, with public code released on GitHub.
Visual Processing
Language Processing
Robotic Control
Spatial Intelligence
9.0/10
Performance
9.1/10
Accuracy
MIT
License
Microsoft
Muse (WHAM)
Generative AI
Open Source
A generative AI model for video game visuals and controller actions, released in February 2025, developed with Ninja Theory and published in Nature. It supports gameplay ideation through the WHAM Demonstrator, with open-source weights and sample data available on Azure AI Foundry.
Game Visuals
Controller Actions
Gameplay Ideation
Interactive Interface
8.9/10
Performance
9.0/10
Accuracy
MIT
License
xAI
Grok-3
LLM
Web Search
Advanced Reasoning
The latest Grok model featuring reflection capabilities and advanced web search integration.
Reflection Capabilities
DeepSearch Integration
Advanced Reasoning
9.2/10
Performance
9.0/10
Accuracy
Proprietary
License
Stability AI
Stable Point Aware 3D (SPAR3D)
3D
Real-time
A cutting-edge 3D generation model introduced in January 2025, enabling real-time editing and complete structure generation from a single image in under a second. It supports rapid prototyping for gaming, architecture, and entertainment with high precision and efficiency.
Real-time Editing
Image-to-3D
High Precision
Rapid Prototyping
9.2/10
Performance
9.3/10
Accuracy
Stability AI Community License
License
DeepSeek
DeepSeek-R1
LLM
Open Source
Reasoning
DeepSeek-R1 is a reasoning-focused model fine-tuned from V3, competing with top models like OpenAI’s o1 in math, coding, and reasoning tasks.
Chain-of-Thought Reasoning
Mathematical Reasoning
Code Generation
Self-correction
9.2/10
Performance
9.4/10
Accuracy
MIT
License
DeepSeek
Janus-Pro-7B
Multimodal
Open Source
Vision
Janus-Pro-7B is a unified multimodal model for both image understanding and image generation, reported to outperform models such as DALL-E 3 on text-to-image benchmarks including GenEval and DPG-Bench.
Image Understanding
Image Generation
Multimodal Processing
8.9/10
Performance
9.1/10
Accuracy
MIT
License
Alibaba
Qwen2.5 Series
LLM
Instruction-Tuned
Open Source
The latest series of decoder-only language models, available in various sizes and optimized for instruction following and structured output generation.
Instruction Following
Structured Output Generation
Multilingual Support
9.2/10
Performance
9.0/10
Accuracy
Apache 2.0
License
OpenAI
O3 Mini
Reasoning
API Only
Efficiency
A smaller, cost-efficient reasoning model in the o3 family, released in January 2025 and available via the API.
Efficient Reasoning
Problem Solving
Task Versatility
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Amazon
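Nova
Multimodal
API Only
Enterprise
A family of foundation models available through Amazon Bedrock, covering text, image, and video understanding (Nova Micro, Lite, and Pro) as well as image and video generation (Nova Canvas and Nova Reel), for enterprise applications.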
Text and Image Processing
Enterprise Integration
Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
OpenAI
Sora
Video Generation
API Only
Creative
A video generation model that creates high-quality videos from text prompts, publicly released in December 2024 for ChatGPT Plus and Pro subscribers through sora.com.
Text-to-Video
High-Quality Output
Creative Storytelling
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Cohere
Command R7B
Language Model
Open Weights
Performance
The smallest model in Cohere's R series (7 billion parameters), released with open weights and optimized for retrieval-augmented generation (RAG), tool use, and agentic tasks.
Advanced NLP
Task Versatility
Customizable
8.8/10
Performance
8.7/10
Accuracy
CC-BY-NC 4.0
License
OpenAI
O1
Reasoning
API Only
Problem-Solving
A reasoning model that spends more time thinking before it answers, aimed at complex problems in math, science, and coding; available in ChatGPT and via the API.
Advanced Reasoning
Problem Solving
Task Versatility
9.5/10
Performance
9.4/10
Accuracy
Proprietary
License
OpenAI
O1 Pro
Reasoning
API Only
Professional
A version of o1 that uses more compute to think longer and give more reliable answers to hard problems; offered in ChatGPT Pro and later through the API.
Enhanced Reasoning
Professional Tasks
Task Versatility
9.6/10
Performance
9.5/10
Accuracy
Proprietary
License
OpenAI
Live Video Mode
Multimodal
API Only
Video
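A capability of ChatGPT's Advanced Voice Mode, powered by GPT-4o, that lets users share live video or their screen for real-time interaction and analysis; rolled out in December 2024 in the ChatGPT mobile apps.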
Real-Time Video
Video Analysis
Task Assistance
9.2/10
Performance
9.1/10
Accuracy
Proprietary
License
Google
Gemini-Exp-1206
Multimodal
API Only
Experimental
An experimental multimodal model with advanced text and image processing, accessible via API.
Text and Image Processing
Advanced Reasoning
Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Google
Gemini 2.0 Flash
Multimodal
API Only
Efficiency
A fast, efficient multimodal model, first released as an experimental preview in December 2024, optimized for text and image processing and accessible via API.
Text and Image Processing
Efficient Performance
Task Versatility
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Google
Gemini-2.0-Flash-Thinking
Multimodal
API Only
Reasoning
A variant of Gemini 2.0 Flash with enhanced reasoning capabilities, accessible via API.
Enhanced Reasoning
Text and Image Processing
Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Google
Veo 2
Video Generation
API Only
Creative
An advanced video generation model for creating high-quality videos from text prompts, accessible via API.
Text-to-Video
High-Quality Output
Creative Storytelling
9.1/10
Performance
9.0/10
Accuracy
Proprietary
License
IBM
Granite 3.1
Language Model
Open Weights
Performance
An open-weight language model for advanced NLP tasks, offering high performance and flexibility.
Advanced NLP
Task Versatility
Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
Google
Imagen 3 Update
Image Generation
API Only
Creative
An updated image generation model for creating high-quality visuals, accessible via API for creative applications.
High-Quality Images
Creative Workflows
Professional Design
9.1/10
Performance
9.0/10
Accuracy
Proprietary
License
xAI
Aurora
Image Generation
API Only
Creative
An image generation model integrated with xAI's ecosystem, accessible via API for creative applications.
High-Quality Images
Creative Workflows
xAI Integration
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Microsoft
Phi-4
Language Model
Open Weights
Efficiency
A 14-billion-parameter open-weight small language model focused on complex reasoning, particularly math, while remaining efficient enough for resource-constrained deployments.
Resource Efficiency
High Performance
Customizable
8.7/10
Performance
8.6/10
Accuracy
MIT
License
Meta
Llama 3.3 70B
Language Model
Open Weights
Research
An open-weight 70-billion-parameter text model that Meta reports performs similarly to Llama 3.1 405B at a much lower serving cost.
NLP Research
High Performance
Customizable
9.0/10
Performance
8.9/10
Accuracy
Llama 3.3 Community License
License
Google
PaliGemma 2
Multimodal
Open Weights
Research
An open-weight vision-language model for advanced multimodal tasks, suitable for research and development.
Vision-Language Processing
Research Flexibility
Customizable
8.8/10
Performance
8.7/10
Accuracy
Gemma Terms of Use
License
Pika
Pika Labs 2.0
Video Generation
API Only
Creative
An upgraded video generation model for creating high-quality videos with advanced effects, accessible via API.
Text-to-Video
Advanced Effects
Creative Storytelling
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Meta
Apollo
Multimodal
Open Weights
Research
A family of open-weight video large multimodal models from Meta and Stanford, built for understanding long videos and released alongside a study of video-LMM design choices.
Video Understanding
Research Flexibility
Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
DeepSeek
DeepSeek V3
Language Model
Open Weights
Performance
A mixture-of-experts (MoE) language model with 671 billion total parameters, of which 37 billion are activated per token, released with open weights in December 2024.
Advanced NLP
Task Versatility
Customizable
8.9/10
Performance
8.8/10
Accuracy
MIT (code); DeepSeek Model License (weights)
License
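DeepSeek V3 uses a mixture-of-experts (MoE) design, in which a router sends each token to only a few expert sub-networks, so most parameters stay idle for any given token. The sketch below shows generic top-k softmax gating on scalar inputs. It is an illustration under simplifying assumptions, not DeepSeek's code: DeepSeek-V3's actual router differs in its details (for example, it also uses always-active shared experts), and the function names here are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, router_logits: Sequence[float],
                experts: Sequence[Expert], k: int = 2) -> float:
    """Route one input to the top-k experts and mix their outputs (generic sketch)."""
    # Rank experts by router score and keep the k best.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    # Only the selected experts are evaluated; the others cost nothing.
    return sum(g * experts[i](x) for g, i in zip(gates, top))


if __name__ == "__main__":
    experts = [lambda v: v + 1, lambda v: 2 * v, lambda v: -v, lambda v: 0.0]
    print(moe_forward(3.0, [1.0, 1.0, -5.0, 0.0], experts, k=2))  # 5.0
```

The key property is that compute per token scales with k rather than with the total number of experts, which is how an MoE model can hold far more parameters than it activates for any single token.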
AnswerAI and LightOn
ModernBERT
Language Model
Open Weights
Efficiency
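An open-weight encoder-only model that modernizes the BERT architecture, supports sequences of up to 8,192 tokens, and targets retrieval, classification, and other non-generative NLP tasks with high efficiency.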
Advanced NLP
Efficient Processing
Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Alibaba
QVQ-72B-Preview
Multimodal
Open Weights
Preview
An experimental open-weight model focused on visual reasoning, combining image understanding with step-by-step reasoning on tasks such as multimodal math problems.
Advanced NLP
Task Versatility
Customizable
9.0/10
Performance
8.9/10
Accuracy
Qwen License
License
OpenAI
O3
Reasoning
API Only
Problem-Solving
An advanced AI model with superior reasoning and problem-solving capabilities, accessible via API.
Advanced Reasoning
Problem Solving
Task Versatility
9.6/10
Performance
9.5/10
Accuracy
Proprietary
License
KLING
Kling 1.6
Video Generation
API Only
Creative
An upgraded video generation model for creating high-quality videos from text prompts, accessible via API.
Text-to-Video
High-Quality Output
Creative Storytelling
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
TII
Falcon 3
Language Model
Open Weights
Performance
A family of open-weight language models with fewer than 10 billion parameters (1B, 3B, 7B, and 10B), designed to run efficiently on light infrastructure such as laptops.
Language Processing
Efficient Deployment
Customizable
8.9/10
Performance
8.8/10
Accuracy
TII Falcon License 2.0 (Apache 2.0-based)
License
Alibaba
QwQ 32B Preview
Language Model
Open Weights
Preview
An experimental open-weight reasoning model with 32 billion parameters, focused on math and coding and released as a preview in November 2024.
Advanced NLP
Task Versatility
Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
Alibaba
Qwen2.5 Coder 32B
Code Generation
Open Weights
Developer
An open-weight model for advanced code generation, optimized for programming tasks and developer workflows.
Code Completion
Syntax Understanding
Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
DeepSeek
DeepSeek-R1-Lite-Preview
Reasoning
API Only
Preview
A preview of a reasoning model with visible chain-of-thought, released in November 2024 through DeepSeek's web chat.
Efficient Reasoning
Task Assistance
Developer API
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Allen AI
Tulu 3
Language Model
Open Weights
Research
A family of instruction-tuned models post-trained from Llama 3.1, released together with its full training recipe, data, and code for research on open post-training.
NLP Research
High Performance
Customizable
8.8/10
Performance
8.7/10
Accuracy
Llama 3.1 Community License
License
Suno AI
Suno v4
Music Generation
API Only
Creative
An upgraded music creation model that generates full songs with vocals from text prompts, available through Suno's web and mobile apps for creative projects.
Text-to-Music
High-Quality Audio
Creative Flexibility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
HuggingFace
SmolLM 2
Language Model
Open Weights
Efficiency
An open-weight lightweight language model for research and efficient NLP tasks, offering high performance.
Lightweight NLP
Research Flexibility
Customizable
8.6/10
Performance
8.5/10
Accuracy
Apache 2.0
License
Mistral
Pixtral Large
Multimodal
Open Weights
Research
An open-weight multimodal model for advanced text and image processing, optimized for research and development.
Text and Image Processing
Research Flexibility
Customizable
9.0/10
Performance
8.9/10
Accuracy
Mistral Research License
License
Mistral
Mistral Large 2411
Language Model
Open Weights
Performance
An upgraded open-weight language model for advanced NLP tasks, offering high performance and flexibility.
Advanced NLP
Task Versatility
Customizable
9.0/10
Performance
8.9/10
Accuracy
Mistral Research License
License
Google
gemini-exp-1114
Multimodal
API Only
Experimental
An experimental multimodal model with advanced text and image processing, accessible via API.
Text and Image Processing
Advanced Reasoning
Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Google
gemini-exp-1121
Multimodal
API Only
Experimental
An experimental multimodal model with enhanced text and image processing, accessible via API.
Text and Image Processing
Advanced Reasoning
Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Allen AI
OLMo 2
Language Model
Open Weights
Research
An open-weight language model for research, offering high performance in NLP tasks with a focus on efficiency.
NLP Research
High Performance
Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Anthropic
Visual PDF Analysis
Document Analysis
API Only
Multimodal
A feature in Claude for analyzing PDF documents with visual content, accessible via API.
PDF Analysis
Visual Content Processing
Task Assistance
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
HuggingFace
SmolVLM
Multimodal
Open Weights
Efficiency
An open-weight vision-language model optimized for efficient multimodal tasks, suitable for research and development.
Vision-Language Processing
Resource Efficiency
Customizable
8.6/10
Performance
8.5/10
Accuracy
Apache 2.0
License
Black Forest Labs
Flux 1.1 Pro
Image Generation
API Only
Creative
An upgraded image generation model for professional-grade visuals, accessible via API.
High-Quality Images
Professional Design
Creative Workflows
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Meta
Movie Gen
Video Generation
API Only
Creative
A research model announced in October 2024 that generates videos with synchronized audio from text prompts and supports personalized and edited videos; it has not been released publicly.
Text-to-Video
High-Quality Output
Creative Storytelling
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Cohere
Aya Expanse
Language Model
Open Weights
Performance
A family of open-weight multilingual models (8 billion and 32 billion parameters) covering 23 languages.
Advanced NLP
Task Versatility
Customizable
8.8/10
Performance
8.7/10
Accuracy
CC-BY-NC 4.0
License
Pika
Pika Effects
Video Generation
API Only
Creative
A video model with advanced effects for creative video editing, accessible via API.
Video Effects
Creative Editing
High-Quality Output
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Adobe
Firefly Video
Video Generation
API Only
Creative
A video generation model for professional-grade video creation, accessible via API.
High-Quality Video
Professional Editing
Creative Workflows
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Rhymes
Aria
Multimodal
Open Weights
Creative
An open-weight multimodal mixture-of-experts model that activates 3.9 billion parameters per token and handles text, images, video, and code.
Task Assistance
Creative Interactions
Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Meta
Meta Spirit LM
Language Model
Open Weights
Research
An open-weight multimodal language model that freely mixes text and speech, intended for research on speech and text generation.
NLP Research
High Performance
Customizable
8.8/10
Performance
8.7/10
Accuracy
FAIR Noncommercial Research License
License
Mistral
Ministral
Language Model
API Only
Efficiency
A lightweight language model for efficient NLP tasks, accessible via API for developers.
Efficient NLP
Task Versatility
Developer API
8.5/10
Performance
8.4/10
Accuracy
Proprietary
License
DeepSeek
Janus
Multimodal
Open Weights
Research
An open-weight multimodal model for text and image processing, optimized for research and development.
Text and Image Processing
Research Flexibility
Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Google
Fluid
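Image Generation
Research
Autoregressive
A research text-to-image model from Google DeepMind and MIT that scales autoregressive image generation using continuous tokens, described in an October 2024 paper rather than offered as a product.
Text-to-Image
Continuous Tokens
Research Applications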
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Stability AI
Stable Diffusion 3.5
Image Generation
Open Weights
Creative
An upgraded open-weight model for text-to-image generation, offering improved quality and flexibility.
Text-to-Image
High-Quality Output
Creative Flexibility
8.9/10
Performance
8.8/10
Accuracy
Stability AI Community License
License
Anthropic
Claude 3.5 Sonnet New
Conversational
API Only
Safety
An upgraded conversational AI model with enhanced reasoning and safety, accessible via API.
Enhanced Reasoning
Safe Interactions
Helpful Responses
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Anthropic
Claude 3.5 Haiku
Conversational
API Only
Efficiency
A lightweight conversational AI model with efficient performance and safety, accessible via API.
Efficient Reasoning
Safe Interactions
Helpful Responses
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Recraft
Recraft v3
Image Generation
API Only
Creative
An image generation model for creating high-quality visuals, accessible via API for creative workflows.
High-Quality Images
Creative Workflows
Professional Design
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
OpenAI
Search GPT
Search
API Only
Summarization
Search built into ChatGPT, launched in October 2024 as ChatGPT search after the SearchGPT prototype, providing concise answers with links to relevant web sources.
Search Summaries
Information Retrieval
Concise Outputs
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Allen AI
OLMoE
Language Model
Open Weights
Research
An open-weight language model for research, offering high performance in NLP tasks with a focus on efficiency.
NLP Research
High Performance
Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Mistral
Pixtral12B
Multimodal
Open Weights
Research
An open-weight multimodal model for text and image processing, optimized for research and development.
Text and Image Processing
Research Flexibility
Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
OpenAI
o1 preview
Reasoning
API Only
Problem-Solving
A preview of an advanced reasoning model with enhanced problem-solving capabilities, accessible via API.
Advanced Reasoning
Problem Solving
Task Versatility
9.4/10
Performance
9.3/10
Accuracy
Proprietary
License
OpenAI
o1 mini
Reasoning
API Only
Efficiency
A lightweight version of the o1 model with efficient reasoning capabilities, accessible via API.
Efficient Reasoning
Problem Solving
Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
IBM
Granite Code
Code Generation
Open Weights
Developer
An open-weight model for code generation, optimized for programming tasks and developer workflows.
Code Completion
Syntax Understanding
Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Alibaba
Qwen 2.5
Language Model
Open Weights
Performance
An open-weight language model for advanced NLP tasks, offering high performance and flexibility.
Advanced NLP
Task Versatility
Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
KLING
KLING 1.5
Video Generation
API Only
Creative
A video generation model for creating high-quality videos from text prompts, accessible via API.
Text-to-Video
High-Quality Output
Creative Storytelling
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
01 AI
Yi Coder
Code Generation
Open Weights
Developer
An open-weight model for code generation, optimized for programming tasks and developer workflows.
Code Completion
Syntax Understanding
Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
OpenAI
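GPT-4o Advanced Voice Mode
Multimodal
API Only
Voice
A ChatGPT feature built on GPT-4o's native audio capabilities that enables natural, real-time voice conversations; developers can access similar speech-to-speech capabilities through the Realtime API.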
Voice Interaction
Text and Image Processing
Task Versatility
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Meta
Llama 3.2
Language Model
Open Weights
Research
An open-weight model family that adds lightweight 1B and 3B text models for edge and mobile devices, plus 11B and 90B vision-language models.
NLP Research
High Performance
Customizable
8.9/10
Performance
8.8/10
Accuracy
Llama 3.2 Community License
License
Google
Gemini Pro 1.5 002
Multimodal
API Only
Performance
An updated multimodal model with enhanced text and image processing capabilities, accessible via API.
Text and Image Processing
Advanced Reasoning
Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Kyutai
Moshi
Conversational
Open Weights
Voice
An open-weight conversational AI model optimized for real-time voice and text interactions.
Voice Interaction
Real-Time Processing
Customizable
8.7/10
Performance
8.6/10
Accuracy
CC-BY 4.0 (weights)
License
Google
NotebookLM
Research
API Only
Summarization
An AI-powered research and note-taking tool that summarizes and answers questions about uploaded sources and can generate podcast-style Audio Overviews; it is used through its web app.
Research Summaries
Note-Taking
Insight Generation
8.6/10
Performance
8.5/10
Accuracy
Proprietary
License
Mistral
Mistral Small
Language Model
API Only
Efficiency
A lightweight language model for efficient NLP tasks, accessible via API for developers.
Efficient NLP
Task Versatility
Developer API
8.5/10
Performance
8.4/10
Accuracy
Proprietary
License
Black Forest Labs
Flux
Image Generation
Open Weights
Creative
An open-weight model for high-quality image generation, optimized for creative and professional applications.
Text-to-Image
High-Quality Output
Creative Flexibility
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0 (FLUX.1 [schnell]); Non-Commercial License (FLUX.1 [dev])
License
OpenAI
GPT-4o 0806
Multimodal
API Only
Reasoning
An August 2024 snapshot of GPT-4o (gpt-4o-2024-08-06) that added support for Structured Outputs, with text and image processing accessible via API.
Text and Image Processing
Advanced Reasoning
Task Versatility
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Google
Imagen 3
Image Generation
API Only
Creative
An advanced image generation model for creating high-quality visuals, accessible via API for creative applications.
High-Quality Images
Creative Workflows
Professional Design
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
xAI
Grok 2
Conversational
API Only
Reasoning
An advanced conversational AI model with enhanced reasoning capabilities, accessible via API.
Enhanced Reasoning
Task Assistance
Conversational
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
xAI
Grok 2 mini
Conversational
API Only
Efficiency
A lightweight version of Grok 2 with efficient conversational capabilities, accessible via API.
Efficient Reasoning
Task Assistance
Conversational
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Nous
Hermes 3
Language Model
Open Weights
Research
An open-weight language model for research and advanced NLP tasks, offering high performance and flexibility.
NLP Research
High Performance
Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Microsoft
Phi 3.5
Language Model
Open Weights
Efficiency
An upgraded open-weight language model optimized for efficiency and performance on resource-constrained devices.
Resource Efficiency
High Performance
Customizable
8.6/10
Performance
8.5/10
Accuracy
MIT
License
Google
Gemini 1.5 Flash-8B
Multimodal
API Only
Efficiency
A lightweight multimodal model with efficient text and image processing, accessible via API.
Text and Image Processing
Efficient Performance
Task Versatility
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Ideogram
Ideogram 2.0
Image Generation
API Only
Creative
An image generation model for creating high-quality visuals, accessible via API for creative applications.
High-Quality Images
Creative Workflows
Professional Design
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Luma
Dream Machine 1.5
Video Generation
API Only
Creative
A video generation model for creating high-quality videos from text prompts, accessible via API.
Text-to-Video
High-Quality Output
Creative Storytelling
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Cohere
Command R+
Language Model
Open Weights
Performance
An open-weight language model optimized for advanced NLP tasks, offering high performance and flexibility.
Advanced NLP
Task Versatility
Customizable
8.8/10
Performance
8.7/10
Accuracy
CC-BY-NC 4.0
License
TII
Falcon Mamba
Language Model
Open Weights
Efficiency
An open-weight state-space model for efficient language processing, suitable for research and development.
State-Space Architecture
Efficient Processing
Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
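Falcon Mamba replaces attention with a state-space model (SSM), which processes a sequence through a recurrence over a fixed-size hidden state, so memory does not grow with context length. The sketch below shows a plain diagonal linear SSM recurrence in Python. It is a simplified illustration, not Falcon Mamba's implementation: Mamba's selective SSM additionally makes its parameters depend on the input and uses a hardware-efficient parallel scan. The function name is hypothetical.

```python
from typing import List, Sequence


def ssm_scan(xs: Sequence[float], a: Sequence[float],
             b: Sequence[float], c: Sequence[float]) -> List[float]:
    """Run a diagonal linear state-space recurrence over a 1-D input sequence.

    For each step t (elementwise over the state dimension):
        h_t = a * h_{t-1} + b * x_t
        y_t = sum(c * h_t)
    """
    # The state has a fixed size, independent of how long the sequence is.
    h = [0.0] * len(a)
    ys: List[float] = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


if __name__ == "__main__":
    # A single decaying state: an impulse fades by half at every step.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))  # [1.0, 0.5, 0.25]
```

Because each step only updates h, producing the next output costs the same regardless of how many tokens came before, unlike attention, whose per-token cost grows with the number of previous tokens.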
OpenAI
GPT-4o mini
Multimodal
API Only
Efficiency
A lightweight version of GPT-4o with multimodal capabilities, optimized for efficiency via API access.
Text and Image Processing
Efficient Performance
Task Versatility
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Meta
Llama 3.1
Language Model
Open Weights
Research
An upgraded open-weight model family in 8B, 70B, and 405B sizes with a 128K-token context window.
NLP Research
High Performance
Customizable
8.9/10
Performance
8.8/10
Accuracy
Llama 3.1 Community License
License
Mistral
Codestral Mamba
Code Generation
Open Weights
Efficiency
An open-weight model for code generation, leveraging state-space architecture for efficient programming tasks.
Code Completion
State-Space Architecture
Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Google
AlphaProof & AlphaGeometry 2
Math
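Research
Reasoning
Specialized research systems for formal mathematical reasoning and geometry problem-solving that together achieved silver-medal standard at the 2024 International Mathematical Olympiad; they are not publicly available.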
Mathematical Reasoning
Geometry Solving
Research Applications
9.0/10
Performance
9.1/10
Accuracy
Proprietary
License
OpenAI
SearchGPT
Search
API Only
Summarization
A prototype AI search engine announced in July 2024 and offered to a limited waitlist, providing concise answers with links to relevant sources.
Search Summaries
Information Retrieval
Concise Outputs
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Udio
Udio v1.5
Music Generation
API Only
Creative
A music creation model that generates high-quality audio tracks, available through Udio's web app for creative applications.
Text-to-Music
High-Quality Audio
Creative Flexibility
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Mistral
Mistral Large 2
Language Model
API Only
Performance
A 123-billion-parameter language model for advanced NLP tasks, available via API, with weights also released for research and non-commercial use.
Advanced NLP
Task Versatility
Developer API
9.0/10
Performance
8.9/10
Accuracy
Mistral Research License
License
Midjourney
Midjourney v6.1
Image Generation
API Only
Creative
An upgraded image generation model for creating high-quality visuals, available through Discord and Midjourney's web app rather than a public API.
High-Quality Images
Creative Workflows
Professional Design
9.1/10
Performance
9.0/10
Accuracy
Proprietary
License
Google
Gemma 2 2B
Language Model
Open Weights
Efficiency
A lightweight open-weight language model for research and efficient NLP tasks, offering high performance.
Lightweight NLP
Research Flexibility
Customizable
8.5/10
Performance
8.4/10
Accuracy
Gemma Terms of Use
License
Stability AI
Stable Diffusion 3 (Medium)
Image Generation
Open Weights
Creative
A medium-sized version of Stable Diffusion 3, offering open weights for text-to-image generation with balanced performance.
Text-to-Image
Balanced Performance
Creative Flexibility
8.7/10
Performance
8.6/10
Accuracy
Stability AI Community License
License
Apple
Apple Intelligence
Productivity
API Only
On-Device
A personal intelligence system built into iOS, iPadOS, and macOS that automates tasks and surfaces insights, combining on-device models with Private Cloud Compute for larger requests.
Task Automation
On-Device Processing
User Experience
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
DeepSeek
DeepSeek-Coder-V2
Code Generation
Open Weights
Developer
An open-weight mixture-of-experts code model supporting 338 programming languages and a 128K-token context window, optimized for programming tasks and developer workflows.
Code Completion
Syntax Understanding
Customizable
8.8/10
Performance
8.7/10
Accuracy
MIT (code); DeepSeek Model License (weights)
License
Runway
Gen-3 Alpha
Video Generation
API Only
Creative
A video generation model for creating high-quality videos from text prompts, accessible via API for creative applications.
Text-to-Video
High-Quality Output
Creative Storytelling
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
01 AI
Yi 1.5
Language Model
Open Weights
Research
An open-weight language model optimized for research and NLP tasks, offering high performance and flexibility.
NLP Research
High Performance
Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Anthropic
Claude 3.5 Sonnet
Conversational
API Only
Safety
An upgraded conversational AI model with enhanced reasoning and safety features, accessible via API.
Enhanced Reasoning
Safe Interactions
Helpful Responses
9.2/10
Performance
9.1/10
Accuracy
Proprietary
License
Microsoft
Florence 2
Vision
Open Weights
Research
An open-weight vision model for advanced image processing tasks, suitable for research and development.
Image Processing
Research Flexibility
Customizable
8.6/10
Performance
8.5/10
Accuracy
MIT
License
Google
Gemma 2
Language Model
Open Weights
Efficiency
An open-weight language model optimized for research and lightweight NLP tasks, offering high efficiency.
Lightweight NLP
Research Flexibility
Customizable
8.5/10
Performance
8.4/10
Accuracy
Gemma Terms of Use
License
OpenAI
GPT-4o
Multimodal
API Only
Reasoning
A multimodal AI model with advanced capabilities in text, image processing, and reasoning, accessible via API.
Text and Image Processing
Advanced Reasoning
Task Versatility
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Google
Gemini 1.5
Multimodal
API Only
High Capacity
An upgraded multimodal model with a context window of up to 2 million tokens, offering enhanced performance on text and image tasks.
Large Token Limit
Text and Image Processing
Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Microsoft
Copilot+ PCs
Productivity
API Only
Automation
A class of Windows 11 PCs with dedicated neural processing units (40+ TOPS) that run on-device AI features such as Recall, Cocreator, and Live Captions with translation.
Task Automation
Productivity Insights
Integration
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Meta
Chameleon
Multimodal
Open Weights
Research
A multimodal model with open weights, designed for text and image processing in research and development.
Text and Image Processing
Research Flexibility
Customizable
8.8/10
Performance
8.7/10
Accuracy
Research-only license (non-commercial)
License
Mistral
Mistral-7B-Instruct-v0.3
Language Model
Open Weights
Instruction
An open-weight language model optimized for instruction-following tasks, offering high efficiency and performance.
Instruction Following
Efficient Processing
Customizable
8.6/10
Performance
8.5/10
Accuracy
Apache 2.0
License
Google
AI Overviews
Search
API Only
Summarization
An AI-powered summary feature in Google Search that provides concise, relevant overviews of search results.
Search Summaries
Information Retrieval
Concise Outputs
8.5/10
Performance
8.4/10
Accuracy
Proprietary
License
Suno AI
Suno v3.5
Music Generation
API Only
Creative
An upgraded music creation model generating high-quality audio tracks, accessible via API for creative projects.
Text-to-Music
High-Quality Audio
Creative Flexibility
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Mistral
Codestral
Code Generation
Open Weights
Developer
An open-weight model for code generation, optimized for programming tasks and developer workflows.
Code Completion
Syntax Understanding
Customizable
8.7/10
Performance
8.6/10
Accuracy
Mistral AI Non-Production License
License
TII
Falcon 2
Multimodal
Open Weights
Language
An open-weight model family including Falcon2-11B and Falcon2-VLM, designed for language and vision tasks.
Language Processing
Vision Processing
Customizable
8.6/10
Performance
8.5/10
Accuracy
TII Falcon License 2.0
License
Stability AI
Stable Audio 2.0
Audio Generation
Closed Weights
Creative
A model for generating high-fidelity audio, suitable for music and sound design applications, accessible through Stability AI's platform.
High-Fidelity Audio
Sound Design
Customizable
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
xAI
Grok-1.5V
Multimodal
API Only
Reasoning
An enhanced version of Grok with image recognition capabilities, designed for multimodal task assistance via API.
Image Recognition
Task Assistance
Conversational
8.7/10
Performance
8.8/10
Accuracy
Proprietary
License
Mistral
Mixtral 8x22B
Language Model
Open Weights
Efficiency
A high-performance language model with open weights, optimized for efficiency and scalability in NLP tasks.
Efficient Processing
Scalable NLP
Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
Meta
LLaMA 3
Language Model
Open Weights
Research
An open-weight language model designed for research, offering high performance in natural language tasks.
NLP Research
High Performance
Customizable
8.8/10
Performance
8.7/10
Accuracy
Meta Llama 3 Community License
License
Microsoft
Phi-3-mini
Language Model
Open Weights
Efficiency
A lightweight, open-weight language model optimized for efficiency and performance on resource-constrained devices.
Resource Efficiency
High Performance
Customizable
8.5/10
Performance
8.4/10
Accuracy
MIT
License
Adobe
Firefly 3
Image Generation
API Only
Creative
An image creation model for professional design, offering high-quality outputs via API for creative workflows.
High-Quality Images
Professional Design
Creative Workflows
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Reka
Reka AI Models
Multimodal
API Only
Language
Multimodal language models designed for advanced text and image processing, accessible via API.
Text Processing
Image Processing
Task Versatility
8.6/10
Performance
8.5/10
Accuracy
Proprietary
License
Apple
OpenELM
Language Model
Open Weights
Efficiency
An open-weight language model optimized for efficient on-device NLP tasks, suitable for research and development.
On-Device NLP
Resource Efficiency
Customizable
8.4/10
Performance
8.3/10
Accuracy
Apple Sample Code License
License
xAI
Grok 1.5
Conversational
Closed Weights
Reasoning
An advanced conversational AI model with improved reasoning and a 128,000-token context window, designed to assist users in various tasks.
Enhanced Reasoning
Task Assistance
Long Context
8.6/10
Performance
8.7/10
Accuracy
Proprietary
License
Anthropic
Claude 3
Conversational
API Only
Safety
A conversational AI model family that Anthropic reported as outperforming GPT-4 on several benchmarks, focused on safety and helpfulness, accessible via API.
Safe Interactions
Advanced Reasoning
Helpful Responses
9.1/10
Performance
9.0/10
Accuracy
Proprietary
License
Suno AI
Suno v3
Music Generation
API Only
Creative
A music creation model generating high-quality audio tracks from prompts, accessible via API for creative applications.
Text-to-Music
High-Quality Audio
Creative Flexibility
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Stability AI
Stable Diffusion 3
Image Generation
Open Weights
Creative
A text-to-image model with enhanced capabilities for generating high-quality, detailed images from textual prompts, suitable for creative and professional applications.
Text-to-Image
High-Resolution Output
Creative Flexibility
8.8/10
Performance
8.9/10
Accuracy
Stability AI Community License
License
Google
Gemini Pro
Conversational
Reasoning
Multimodal
An upgraded conversational AI model powering Bard, offering improved reasoning and language understanding for diverse tasks.
Enhanced Reasoning
Language Understanding
Task Versatility
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Google
Gemini 1.5 Pro
Multimodal
API Only
Reasoning
A multimodal AI model with advanced text and image processing and strong reasoning capabilities, accessible via API for developers.
Text and Image Processing
Advanced Reasoning
Developer API
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Google
CodeGemma
Code Generation
Open Weights
Developer
A code generation model designed for programming tasks, offering open weights for community use and customization.
Code Completion
Syntax Understanding
Customizable
8.5/10
Performance
8.4/10
Accuracy
Gemma Terms of Use
License
OpenAI
Sora
Video Generation
API Only
Creative
A video generation model capable of creating realistic and imaginative videos from text prompts, previewed in February 2024 and made publicly available in December 2024.
Text-to-Video
Realistic Rendering
Creative Storytelling
9.2/10
Performance
9.0/10
Accuracy
Proprietary
License
Google
Gemini Ultra
LLM
Multimodal
Google's largest multimodal model for text, vision, and reasoning tasks. It excels in complex problem-solving across diverse data types like code, images, and text.
Multimodal
Advanced Reasoning
Code Generation
Vision
9.1/10
Performance
9.3/10
Accuracy
Commercial
License
Google
Gemini Pro
LLM
Multimodal
Mid-tier multimodal Gemini model for text, vision, and reasoning. It offers a balance of performance and efficiency for various tasks.
Multimodal
Reasoning
Code Generation
Vision
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
Google
Gemini Nano
LLM
On-device
Lightweight Gemini model optimized for on-device tasks. It supports text and vision processing with low resource requirements.
On-device
Multimodal
Efficient
Vision
7.8/10
Performance
8.2/10
Accuracy
Commercial
License
Microsoft
Phi-2
SLM
Open Source
A 2.7 billion parameter SLM optimized for reasoning, coding, and math, delivering near state-of-the-art performance for its size. It’s designed for low-resource environments, making it ideal for on-device applications and research experimentation.
Reasoning
Coding
Math
On-device
8.5/10
Performance
8.7/10
Accuracy
MIT
License
DeepSeek
DeepSeek Coder
LLM
Open Source
Coding
DeepSeek Coder is an open-source model optimized for programming tasks, enabling code generation and completion with high accuracy. It laid the foundation for DeepSeek's later coding-focused models.
Code Generation
Code Completion
Programming Support
8.0/10
Performance
8.2/10
Accuracy
DeepSeek License
License
DeepSeek
DeepSeek LLM
LLM
Open Source
General Purpose
DeepSeek LLM is a general-purpose language model available in Base and Chat variants, trained on 2 trillion tokens of English and Chinese text. It excels in text generation and conversational tasks.
Text Generation
Conversational AI
Multilingual Support
Context-aware
8.3/10
Performance
8.5/10
Accuracy
DeepSeek License
License
Google
Tram
LLM
Structured Data
Model for structured data processing and analysis. It excels in tasks involving tabular data and complex data structures.
Structured Data
Data Analysis
Table Processing
Scalable
8.5/10
Performance
8.8/10
Accuracy
Commercial
License
OpenAI
DALL-E 3
Multimodal
Image Generation
The latest DALL-E model with advanced image generation and integration into ChatGPT. It offers improved detail and prompt adherence.
Image Generation
Prompt Adherence
ChatGPT Integration
9.0/10
Performance
9.2/10
Accuracy
Proprietary
License
Microsoft
Phi-1.5
SLM
Open Source
An enhanced version of Phi-1 with improved reasoning and text generation capabilities, maintaining a compact 1.3 billion parameter size. It excels in tasks like coding and math, offering a balance of efficiency and performance for local deployment.
Text Generation
Reasoning
Coding
Local Deployment
8.2/10
Performance
8.4/10
Accuracy
MIT
License
Alibaba
Qwen-VL-7B
Multimodal
Vision-Language
Open Source
A vision-language model capable of understanding images and generating text about them, supporting multi-round question answering.
Image Understanding
Multi-Round QA
Creative Capabilities
Multilingual Support
8.8/10
Performance
8.7/10
Accuracy
Tongyi Qianwen License
License
Google
Flan-PaLM
LLM
Instruction-tuned
Instruction-tuned PaLM for improved task generalization. It leverages diverse instruction datasets to excel in zero-shot and few-shot scenarios across tasks.
Zero-shot Learning
Instruction Tuning
Advanced Reasoning
Multilingual
9.2/10
Performance
9.3/10
Accuracy
Commercial
License
Google
Med-PaLM 2
LLM
Medical
Advanced medical-domain model for clinical tasks and question answering. It improves on Med-PaLM with enhanced accuracy for healthcare applications.
Medical QA
Clinical Tasks
Advanced Reasoning
Fine-tuning
9.2/10
Performance
9.4/10
Accuracy
Commercial
License
Meta AI
Code Llama
LLM
Code
A specialized LLaMA variant for code generation and programming tasks. It supports multiple programming languages and excels in generating accurate, context-aware code.
Code Generation
Programming
Context-aware
Fine-tuning
8.7/10
Performance
8.9/10
Accuracy
Llama 2 Community License
License
Google
Med-PaLM
LLM
Medical
Medical-domain PaLM for clinical tasks like medical question answering. It is fine-tuned with medical data for high accuracy in healthcare applications.
Medical QA
Clinical Tasks
Advanced Reasoning
Fine-tuning
9.1/10
Performance
9.3/10
Accuracy
Commercial
License
Meta AI
LLaMA 2
LLM
Research
An improved version of LLaMA, offering enhanced performance and safety for both research and commercial applications. It supports a wide range of NLP tasks with better generalization and efficiency.
Text Generation
Safety-focused
Efficient
Fine-tuning
8.8/10
Performance
9.0/10
Accuracy
Llama 2 Community License
License
Stability AI
Stable Diffusion XL (SDXL)
Diffusion
Image Generation
Open Source
A powerful evolution of Stable Diffusion, SDXL delivers superior image quality and prompt adherence at higher resolutions, ideal for professional use cases. It incorporates advanced training methods and supports diverse styles like 3D, photography, and painting, with optimized performance on consumer hardware.
High-Resolution Images
Prompt Adherence
Diverse Styles
Consumer Hardware
9.0/10
Performance
9.1/10
Accuracy
CreativeML Open RAIL++-M
License
Google
U-PaLM
LLM
Advanced
PaLM continually trained with the UL2 (Unifying Language Learning Paradigms) objective. It enhances generalization across tasks, improving performance in reasoning and multilingual settings.
Continual Learning
Advanced Reasoning
Multilingual
Scalable
9.1/10
Performance
9.2/10
Accuracy
Commercial
License
Google
Flamingo-C
LLM
Multimodal
Compact version of Flamingo for multimodal vision-language tasks. It maintains strong performance with reduced resource requirements.
Multimodal
Vision and Language
Efficient
Image Captioning
8.6/10
Performance
8.9/10
Accuracy
Commercial
License
Microsoft
Phi-1
SLM
Open Source
A small language model (SLM) with 1.3 billion parameters, specialized for Python code generation and intended for research settings. It achieves strong performance on coding benchmarks like HumanEval, focusing on lightweight, cost-effective AI solutions for developers.
Text Generation
Efficient
Research-focused
Lightweight
8.0/10
Performance
8.2/10
Accuracy
MIT
License
Google
PaLM 2
LLM
Advanced
Enhanced version of PaLM with improved efficiency and performance. It offers better reasoning, multilingual capabilities, and optimized training for diverse tasks.
Advanced Reasoning
Multilingual
Efficient
Code Generation
9.2/10
Performance
9.3/10
Accuracy
Commercial
License
Stability AI
Stable LM
LLM
Language
Open Source
An open-source language model suite launched in April 2023, with 3B to 7B parameter models designed for efficient text and code generation on personal devices. Trained on a massive dataset three times larger than The Pile, it offers high performance for conversational and coding tasks.
Text Generation
Code Generation
Efficient
Consumer Devices
8.6/10
Performance
8.7/10
Accuracy
CC BY-SA 4.0
License
Google
PaLM-E
LLM
Multimodal
Embodied PaLM for robotics and multimodal tasks, integrating language with sensory inputs. It enables language-guided control in physical environments like robotic navigation.
Multimodal
Robotics
Advanced Reasoning
Embodied AI
8.9/10
Performance
9.1/10
Accuracy
Commercial
License
OpenAI
GPT-4
LLM
Multimodal
A multimodal model with enhanced text and image processing capabilities. It achieves human-level performance on academic benchmarks and supports complex tasks.
Text Generation
Image Processing
Advanced Reasoning
Multimodal
9.0/10
Performance
9.2/10
Accuracy
Proprietary
License
Meta AI
LLaMA
LLM
Research
A family of language models designed for research purposes, known for efficiency in natural language tasks. LLaMA models excel in text generation and understanding with optimized architectures.
Text Generation
Efficient
Research-focused
Fine-tuning
8.5/10
Performance
8.8/10
Accuracy
Non-commercial
License
Google
Dramatron
LLM
Creative
Model for scriptwriting and creative writing assistance. It generates coherent narratives and dialogue for storytelling applications.
Creative Writing
Scriptwriting
Text Generation
Narrative
8.3/10
Performance
8.5/10
Accuracy
Commercial
License
OpenAI
GPT-3.5
LLM
Conversational
A refined successor to GPT-3, fine-tuned with reinforcement learning from human feedback (RLHF) for conversational tasks. It powered the initial ChatGPT release.
Conversational AI
Text Generation
Fine-tuned
RLHF
8.7/10
Performance
8.9/10
Accuracy
Proprietary
License
Stability AI
Stable Diffusion 2.0
Diffusion
Image Generation
Open Source
An enhanced version of Stable Diffusion, introducing inpainting, outpainting, and depth-guided image generation for improved creative control. It maintains high-quality outputs while addressing ethical concerns through filtered training data and permissive licensing for diverse applications.
Inpainting
Outpainting
Depth-guided Generation
Text-to-Image
8.9/10
Performance
9.0/10
Accuracy
CreativeML Open RAIL++-M
License
Google
Flan-T5
LLM
Instruction-tuned
Instruction-tuned T5 for zero-shot task generalization. It improves T5’s performance on unseen tasks by fine-tuning with diverse instruction datasets.
Zero-shot Learning
Text-to-Text
Fine-tuning
Instruction Tuning
8.7/10
Performance
9.0/10
Accuracy
Apache 2.0
License
Google
Sparrow
LLM
Conversational
Dialogue model with a focus on safety and ethical responses. It aims to reduce harmful outputs while maintaining conversational quality.
Conversational
Safety-focused
Dialogue
Ethical
8.6/10
Performance
8.8/10
Accuracy
Commercial
License
OpenAI
Whisper
Speech
Open Source
An automatic speech recognition model for transcribing and translating audio. It supports multilingual speech processing with high accuracy.
Speech Recognition
Transcription
Translation
8.7/10
Performance
8.9/10
Accuracy
MIT
License
Google
SayCan
LLM
Robotics
Language-guided robotic control model for task execution. It combines language understanding with physical actions for robotic applications.
Robotics
Language-guided
Task Execution
Multimodal
8.5/10
Performance
8.7/10
Accuracy
Commercial
License
Stability AI
Stable Diffusion
Diffusion
Image Generation
Open Source
A pioneering open-source text-to-image model that generates high-quality, photorealistic images from textual prompts, leveraging latent diffusion techniques. Widely adopted for its flexibility and ability to run on consumer hardware, it supports creative applications in art, design, and media production.
Text-to-Image
High Resolution
Artistic Quality
Open Source
8.8/10
Performance
8.9/10
Accuracy
CreativeML Open RAIL-M
License
Google
Minerva
LLM
Reasoning
PaLM-based model optimized for quantitative reasoning tasks. It excels in solving mathematical and scientific problems with high accuracy.
Quantitative Reasoning
Advanced Reasoning
Math-focused
Scalable
9.0/10
Performance
9.2/10
Accuracy
Commercial
License
Google
UL2
LLM
Versatile
Unifying Language Learning Paradigms (UL2) model with a mixture-of-denoisers approach. It supports diverse tasks by combining multiple pre-training objectives for flexibility.
Mixture-of-Denoisers
Pre-training
Multitask
Scalable
8.7/10
Performance
8.9/10
Accuracy
Apache 2.0
License
Google
Gato
LLM
Generalist
Generalist agent for text, vision, and robotics tasks. It performs well across diverse domains, from language to physical control.
Multimodal
Robotics
Generalist
Scalable
8.6/10
Performance
8.8/10
Accuracy
Commercial
License
Meta AI
OPT
LLM
Open Source
Open Pre-trained Transformer models for research, offering efficient large-scale language modeling. It provides performance comparable to GPT-3 with open access for academic use.
Text Generation
Efficient
Research-focused
Scalable
8.6/10
Performance
8.9/10
Accuracy
Non-commercial
License
Google
PaLM
LLM
Large-scale
Pathways Language Model, a 540B-parameter model for advanced reasoning and multilingual tasks. It excels in complex tasks like mathematical reasoning and code generation.
Advanced Reasoning
Multilingual
Code Generation
Scalable
9.0/10
Performance
9.2/10
Accuracy
Commercial
License
Google
Flamingo
LLM
Multimodal
Multimodal vision-language model for tasks like image captioning. It combines visual and textual understanding for versatile applications.
Multimodal
Vision and Language
Image Captioning
Fine-tuning
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
OpenAI
DALL-E 2
Multimodal
Image Generation
An enhanced version of DALL-E with improved image quality and editing capabilities. It supports higher resolution and more precise outputs.
Image Generation
Image Editing
High Resolution
8.8/10
Performance
9.0/10
Accuracy
Proprietary
License
Aleph Alpha
Luminous
LLM
Multilingual
Multilingual text generation model with limited public details. It focuses on high-quality text generation for diverse languages and applications.
Multilingual
Text Generation
Pre-training
Fine-tuning
8.4/10
Performance
8.7/10
Accuracy
Commercial
License
Google
Chinchilla
LLM
Efficient
70B-parameter compute-optimal model for efficient performance. It outperforms larger models in NLP tasks with less computational cost.
Compute-optimal
Efficient
NLP Tasks
Scalable
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
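As a rough illustration of what "compute-optimal" means for Chinchilla: its 70B parameters were trained on about 1.4 trillion tokens, roughly 20 tokens per parameter. The sketch below applies the widely used rule of thumb C ≈ 6·N·D for training FLOPs (N parameters, D tokens); that approximation is an assumption for illustration, not the paper's exact compute accounting.

```python
# Back-of-the-envelope training compute with the common approximation
# C ~= 6 * N * D FLOPs (N = parameters, D = training tokens).
# This rule of thumb ignores attention and other overheads.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * n_params * n_tokens

if __name__ == "__main__":
    N, D = 70e9, 1.4e12  # Chinchilla: 70B parameters, ~1.4T tokens
    print(f"tokens per parameter: {D / N:.0f}")                  # 20
    print(f"approx. training FLOPs: {training_flops(N, D):.2e}")  # 5.88e+23
```

Under the same approximation, a 280B-parameter model trained on 300B tokens (Gopher's setup) lands at a similar budget of about 5.0e23 FLOPs, which is why Chinchilla's result is framed as a better use of the same compute rather than more of it.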
Google
Ithaca
LLM
Specialized
Model for historical text restoration, specializing in ancient Greek texts. It reconstructs missing text with high contextual accuracy.
Text Restoration
Historical Texts
Contextual Understanding
Specialized
8.4/10
Performance
8.6/10
Accuracy
Commercial
License
Salesforce
CodeGen
LLM
Code
Open-source code-generation model family for program synthesis. It generates code in various programming languages from natural-language prompts.
Code Generation
Programming
Reasoning
Scalable
8.7/10
Performance
8.9/10
Accuracy
BSD-3-Clause
License
Google
AlphaCode
LLM
Code
Model for competitive programming and code generation. It solves complex algorithmic problems with high accuracy and efficiency.
Code Generation
Competitive Programming
Reasoning
Scalable
8.9/10
Performance
9.1/10
Accuracy
Commercial
License
Google
LaMDA
LLM
Conversational
Language Model for Dialogue Applications, optimized for conversational tasks. It generates coherent and contextually relevant responses for natural dialogue.
Conversational
Contextual Understanding
Dialogue
Fine-tuning
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
Google
RETRO
LLM
Retrieval-augmented
Retrieval-augmented Transformer for enhanced language modeling. It uses external memory to improve performance on knowledge-intensive tasks.
Retrieval-augmented
Knowledge-intensive
NLP Tasks
Scalable
8.7/10
Performance
8.9/10
Accuracy
Commercial
License
OpenAI
GPT-3.5 Turbo
LLM
Conversational
A highly capable variant of GPT-3.5, optimized for speed and cost efficiency. It powered ChatGPT and was widely used by developers through the OpenAI API.
Conversational AI
High Efficiency
Text Generation
8.8/10
Performance
9.0/10
Accuracy
Proprietary
License
NVIDIA
PeopleNet
Computer Vision
Pretrained
Real-time
PeopleNet is a computer vision model developed using NVIDIA TAO for real-time pedestrian detection and tracking in urban environments, optimized for smart cities and autonomous vehicles.
Pedestrian Detection
Object Tracking
Real-time Processing
8.5/10
Performance
8.7/10
Accuracy
NVIDIA License
License
NVIDIA
Bi3D
Computer Vision
Pretrained
Depth Estimation
Bi3D is a binary depth classification network for classifying object depth, ideal for collision avoidance in autonomous mobile robots.
Depth Classification
Collision Avoidance
Efficient Processing
8.3/10
Performance
8.5/10
Accuracy
NVIDIA License
License
NVIDIA
BioBERT
NLP
Pretrained
Biomedical
BioBERT is a BERT-based model further pretrained on biomedical corpora (PubMed abstracts and PMC full-text articles) for text mining and NLP tasks, such as identifying chemical and protein entities.
Biomedical Text Mining
Entity Recognition
Context-aware Processing
8.6/10
Performance
8.8/10
Accuracy
NVIDIA License
License
NVIDIA
Spleen Segmentation
Computer Vision
Pretrained
Medical
Spleen Segmentation is a pretrained model for volumetric 3D segmentation of the spleen from CT images, using advanced medical segmentation techniques.
3D Segmentation
Medical Imaging
High Accuracy
8.7/10
Performance
8.9/10
Accuracy
NVIDIA License
License
NVIDIA
Conformer
Speech Recognition
Pretrained
Multilingual
Conformer is a convolution-augmented transformer model for automatic speech recognition, supporting over 10 languages for applications like live captioning and voice assistants.
Speech Recognition
Multilingual Support
High Accuracy
8.8/10
Performance
9.0/10
Accuracy
NVIDIA License
License
NVIDIA
ECAPA-TDNN
Speech AI
Pretrained
Speaker Identification
ECAPA-TDNN is a time delay neural network-based model for speaker identification and verification, providing robust speaker embeddings for applications like medical conversation analysis.
Speaker Identification
Speaker Verification
Robust Embeddings
8.6/10
Performance
8.8/10
Accuracy
NVIDIA License
License
NVIDIA
Megatron 530B
LLM
Pretrained
Conversational
Megatron 530B (Megatron-Turing NLG 530B) is a 530-billion-parameter transformer language model developed by NVIDIA and Microsoft, used for NLP tasks like chatbots and virtual assistants.
Text Generation
Conversational AI
Efficient Training
8.9/10
Performance
9.1/10
Accuracy
NVIDIA License
License
Google
GLaM
LLM
Efficient
Generalist Language Model, a 1.2T-parameter Mixture-of-Experts model. It achieves high performance with lower energy consumption for NLP tasks.
Mixture-of-Experts
Efficient
Scalable
NLP Tasks
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
Google
Flan
LLM
Instruction-tuned
Instruction-tuned model family for zero-shot performance across tasks. It leverages fine-tuning on diverse datasets to improve generalization.
Zero-shot Learning
Instruction Tuning
Multitask
Fine-tuning
8.6/10
Performance
8.9/10
Accuracy
Apache 2.0
License
Google
Gopher
LLM
Large-scale
280B-parameter model focused on reasoning and language tasks. It competes with large-scale models in NLP benchmarks and research applications.
Advanced Reasoning
Scalable
NLP Tasks
Research
8.9/10
Performance
9.1/10
Accuracy
Commercial
License
BigScience
T0
LLM
Zero-shot
T5-based model for zero-shot task generalization. It leverages multitask prompting to perform well on unseen tasks without additional fine-tuning.
Zero-shot Learning
Text-to-Text
Multitask
Fine-tuning
8.5/10
Performance
8.8/10
Accuracy
Apache 2.0
License
OpenAI
Codex
LLM
Code
A specialized model for code generation and editing, powering tools like GitHub Copilot. It excels in understanding and generating programming languages.
Code Generation
Code Editing
Programming Support
8.6/10
Performance
8.8/10
Accuracy
Proprietary
License
Google
Perceiver
LLM
Multimodal
General-purpose architecture for text and multimodal tasks. It uses cross-attention to handle diverse data types efficiently.
Multimodal
Cross-attention
Efficient
Scalable
8.3/10
Performance
8.6/10
Accuracy
Commercial
License
Google
CANINE
LLM
Multilingual
Character-based model for multilingual text processing without word tokenization. It excels in low-resource languages and noisy text environments.
Character-based
Multilingual
Robust
Pre-training
7.8/10
Performance
8.1/10
Accuracy
Apache 2.0
License
Meta AI
UniT
LLM
Multimodal
Unified Transformer for vision and language tasks. It handles multimodal inputs for applications like image captioning and visual question answering.
Multimodal
Vision and Language
Pre-training
Fine-tuning
8.5/10
Performance
8.8/10
Accuracy
Commercial
License
Google
ByT5
LLM
Character-based
Byte-level T5 model for character-based text processing. It operates directly on UTF-8 bytes, improving performance on noisy text and low-resource languages.
Byte-level Processing
Multilingual
Text-to-Text
Robust
8.3/10
Performance
8.6/10
Accuracy
Apache 2.0
License
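To make "operates directly on UTF-8 bytes" concrete, here is a minimal Python sketch of byte-level tokenization with no vocabulary file. It assumes the ByT5 convention of reserving IDs 0, 1, and 2 for padding, end-of-sequence, and unknown tokens and offsetting every byte value by 3; treat that offset as an assumption and check the tokenizer you actually use.

```python
# Minimal sketch of ByT5-style byte-level tokenization.
# Assumption: IDs 0, 1, 2 are reserved for <pad>, </s> (EOS) and <unk>,
# so each raw byte value b maps to token ID b + 3.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
OFFSET = 3

def encode(text: str, add_eos: bool = True) -> list[int]:
    """Map a string to token IDs via its UTF-8 bytes."""
    ids = [b + OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids

def decode(ids: list[int]) -> str:
    """Invert encode(), dropping the reserved special-token IDs."""
    raw = bytes(i - OFFSET for i in ids if i >= OFFSET)
    return raw.decode("utf-8", errors="replace")

if __name__ == "__main__":
    ids = encode("héllo")
    print(ids)          # [107, 198, 172, 111, 111, 114, 1]; 'é' is two bytes
    print(decode(ids))  # héllo
```

Because every string reduces to at most 256 byte symbols, there are no out-of-vocabulary words, which is what makes this approach robust to typos and rare scripts; the trade-off is longer sequences, since a non-ASCII character can take two to four tokens.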
Google
MUM
LLM
Multimodal
Multitask Unified Model, a multimodal model for search combining text and images. It enhances search relevance by understanding complex, multimodal queries.
Multimodal
Search Optimization
Text and Image
Scalable
8.7/10
Performance
8.9/10
Accuracy
Commercial
License
Google
ViT-BERT
LLM
Multimodal
Hybrid vision-language model combining Vision Transformer and BERT. It excels in tasks requiring joint understanding of images and text.
Multimodal
Vision and Language
Pre-training
Fine-tuning
8.6/10
Performance
8.9/10
Accuracy
Commercial
License
Google
MuRIL
LLM
Multilingual
Multilingual Representation for Indian Languages, a BERT-based model tailored for Indian languages. It supports 17 Indian languages, enhancing NLP tasks like sentiment analysis and text classification.
Multilingual
Indian Languages
Pre-training
Fine-tuning
7.9/10
Performance
8.2/10
Accuracy
Apache 2.0
License
Google
Switch Transformer
LLM
Scalable
Mixture-of-experts model for scaling to trillions of parameters efficiently. It dynamically selects expert subnetworks, reducing compute costs for large-scale tasks.
Mixture-of-Experts
Scalable
Efficient
Pre-training
8.9/10
Performance
9.1/10
Accuracy
Commercial
License
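The "dynamically selects expert subnetworks" step is top-1 routing: a router scores each token against every expert, and only the highest-scoring expert processes that token, with its output scaled by the router probability. The pure-Python sketch below is illustrative only; the router weights and the toy experts are hypothetical, and it omits the expert capacity limit and auxiliary load-balancing loss that a real Switch layer uses.

```python
# Sketch of Switch-style top-1 routing over a tiny set of experts.
# Hypothetical setup: router weights are a plain matrix (one row per
# expert) and experts are simple Python callables. No capacity limit
# or load-balancing loss is modeled here.
import math
from typing import Callable, List, Sequence

Vector = List[float]

def softmax(logits: Sequence[float]) -> Vector:
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def switch_layer(tokens: Sequence[Vector],
                 router: Sequence[Vector],
                 experts: Sequence[Callable[[Vector], Vector]]) -> List[Vector]:
    outputs = []
    for x in tokens:
        # Router logit for expert i is the dot product of x with row i.
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
        probs = softmax(logits)
        i = max(range(len(probs)), key=probs.__getitem__)  # top-1 expert
        # Only the chosen expert runs; its output is scaled by the gate value.
        y = experts[i](list(x))
        outputs.append([probs[i] * v for v in y])
    return outputs

if __name__ == "__main__":
    experts = [lambda v: [2 * a for a in v], lambda v: [a + 1 for a in v]]
    router = [[1.0, 0.0], [0.0, 1.0]]
    print(switch_layer([[3.0, 0.0], [0.0, 3.0]], router, experts))
```

Because each token activates a single expert, per-token compute stays close to that of one feed-forward block no matter how many experts the layer holds, which is how parameter count can grow into the trillions without a matching growth in FLOPs.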
OpenAI
CLIP
Vision
Open Source
A vision-language model that connects text and images, enabling zero-shot image classification and image-text retrieval. It’s open-source and widely used in research.
Image Classification
Text-Image Mapping
Image-Text Retrieval
8.3/10
Performance
8.5/10
Accuracy
MIT
License
OpenAI
DALL-E
Multimodal
Image Generation
A text-to-image model generating creative images from textual prompts. It uses a GPT-style autoregressive transformer that models text tokens together with image tokens from a discrete VAE.
Image Generation
Text-to-Image
Creative Output
8.5/10
Performance
8.7/10
Accuracy
Proprietary
License
Google
mT5
LLM
Multilingual
Multilingual T5, supporting 101 languages for global NLP applications. It extends T5’s text-to-text framework to low-resource languages, improving cross-lingual performance.
Multilingual
Text-to-Text
Pre-training
Fine-tuning
8.4/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Google
ETC
LLM
Long-context
Extended Transformer Construction with hierarchical attention for long-context processing. It handles extended sequences efficiently for tasks like document understanding.
Long-context
Hierarchical Attention
Pre-training
NLP Tasks
7.9/10
Performance
8.2/10
Accuracy
Apache 2.0
License
Google
DocT5query
LLM
Search
T5-based model for document ranking and query generation. It improves search relevance by generating queries for document indexing.
Document Ranking
Query Generation
Text-to-Text
Fine-tuning
8.3/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Google
BigBird
LLM
Long-context
Transformer with sparse attention for processing long sequences. It reduces memory usage while maintaining performance on tasks like document classification.
Sparse Attention
Long-context
Pre-training
NLP Tasks
8.0/10
Performance
8.4/10
Accuracy
Apache 2.0
License
Google
GShard
LLM
Multilingual
Sharding-based Mixture-of-Experts model optimized for translation tasks. It enables efficient scaling for multilingual applications with reduced computational costs.
Mixture-of-Experts
Multilingual
Translation
Scalable
8.6/10
Performance
8.8/10
Accuracy
Commercial
License
OpenAI
GPT-3
LLM
Commercial
A massive model excelling in diverse NLP tasks, from text generation to question answering. It introduced few-shot learning capabilities and powered early API applications.
Text Generation
Few-shot Learning
Question Answering
Translation
8.5/10
Performance
8.7/10
Accuracy
Proprietary
License
Microsoft
DeBERTa
LLM
Open Source
Multilingual
DeBERTa is a family of transformer-based language models (including base, large, V2, V3, and multilingual variants) that enhances BERT with disentangled attention and ELECTRA-style pre-training, achieving top performance on benchmarks like SuperGLUE and SQuAD. With sizes ranging from 22M to 1.5B parameters, it supports tasks like text classification, question answering, and cross-lingual transfer, powering Microsoft’s Turing NLRv4 for Bing and Azure.
Disentangled Attention
ELECTRA Pre-training
Cross-lingual Transfer
Question Answering
9.0/10
Performance
9.2/10
Accuracy
MIT
License
Google
TAPAS
LLM
Structured Data
Table-based question answering and parsing model. It processes structured data in tables, enabling natural language queries over tabular content.
Table Parsing
Question Answering
Structured Data
Fine-tuning
8.2/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Microsoft
CodeBERT
LLM
Open Source
Code
CodeBERT is a bimodal pre-trained model for programming and natural language, leveraging a large corpus of code and comments to excel in tasks like code search and documentation generation. It supports multiple programming languages and is widely used in tools for software development and AI-driven code analysis.
Code Search
Documentation Generation
Programming Languages
Bimodal Pre-training
8.4/10
Performance
8.6/10
Accuracy
MIT
License
Google
MobileBERT
LLM
Mobile
A compact BERT variant optimized for mobile and edge devices. It balances performance and resource usage, enabling efficient NLP on low-power hardware.
Mobile Optimization
Low Latency
Fine-tuning
NLP Tasks
7.5/10
Performance
8.0/10
Accuracy
Apache 2.0
License
Allen Institute for AI
Longformer
LLM
Long-context
Transformer with efficient attention for long-document processing. It reduces computational complexity while handling extended sequences for tasks like summarization.
Long-context
Efficient Attention
Pre-training
NLP Tasks
8.1/10
Performance
8.5/10
Accuracy
Apache 2.0
License
OpenAI
Jukebox
Audio
Open Source
A model for generating music as raw audio, conditioned on genre, artist, and optionally lyrics. It’s an experimental open-source project.
Music Generation
Lyrics Conditioning
Genre Support
8.0/10
Performance
8.2/10
Accuracy
Non-commercial
License
Google
ELECTRA
LLM
Efficient
Efficient pre-training model using a generator-discriminator framework. It achieves high performance with less compute by replacing masked language modeling with token detection.
Efficient Pre-training
Token Detection
Fine-tuning
NLP Tasks
8.2/10
Performance
8.7/10
Accuracy
Apache 2.0
License
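ELECTRA's "token detection" objective trains the discriminator to label every input position as original or replaced, instead of predicting only the masked positions. The sketch below shows how those per-token labels could be built; the random "generator" that samples from a small word list is a hypothetical stand-in for ELECTRA's small masked-language-model generator.

```python
# Sketch of how ELECTRA-style replaced-token-detection targets are built.
# Hypothetical stand-in: a "generator" that samples uniformly from a small
# word list replaces the small masked-language-model generator.
import random
from typing import List, Tuple

def corrupt(tokens: List[str], mask_prob: float, vocab: List[str],
            rng: random.Random) -> Tuple[List[str], List[int]]:
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            new_tok = rng.choice(vocab)  # generator's sampled replacement
        else:
            new_tok = tok
        corrupted.append(new_tok)
        # Label 1 ("replaced") only if the token actually changed; a sample
        # that happens to equal the original counts as original (label 0).
        labels.append(int(new_tok != tok))
    return corrupted, labels

if __name__ == "__main__":
    rng = random.Random(0)
    sentence = "the chef cooked the meal".split()
    print(corrupt(sentence, 0.15, ["ate", "the", "a", "meal"], rng))
```

Since the discriminator receives a training signal at every position rather than only at the roughly 15% that were masked, it learns more per example, which is where ELECTRA's compute savings come from.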
Google
PEGASUS
LLM
Summarization
Pre-training with Extracted Gap-sentences for Abstractive Summarization. It is optimized for generating concise and accurate text summaries.
Summarization
Pre-training
Fine-tuning
Text Generation
8.3/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Google
Meena
LLM
Conversational
Early conversational model, a precursor to LaMDA. It focuses on generating human-like dialogue with improved coherence and context awareness.
Conversational
Contextual Understanding
Dialogue
Pre-training
8.0/10
Performance
8.3/10
Accuracy
Proprietary (not released)
License
Google
Reformer
LLM
Efficient
A memory-efficient Transformer that replaces full dot-product attention with locality-sensitive hashing (LSH) attention and uses reversible residual layers. This reduces memory usage on long sequences, making it suitable for resource-constrained environments.
Memory-efficient
Long-context
Pre-training
NLP Tasks
7.9/10
Performance
8.2/10
Accuracy
Apache 2.0
License
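The hashing step behind Reformer's LSH attention can be sketched with random rotations: project a vector with a random matrix R and take the argmax over [xR, -xR] as its bucket, so vectors pointing in similar directions tend to share a bucket and attention is computed only within buckets. This is a minimal pure-Python sketch, assuming a single hash round and hypothetical function names; the actual model adds multiple rounds, chunking, and shared query-key vectors.

```python
import random
from collections import defaultdict


def random_rotations(dim, n_buckets, seed=0):
    """Random Gaussian projection matrix of shape (dim, n_buckets // 2)."""
    if n_buckets % 2:
        raise ValueError("n_buckets must be even")
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_buckets // 2)] for _ in range(dim)]


def lsh_bucket(x, rotations):
    """Angular LSH: bucket = argmax over the concatenation [xR, -xR]."""
    n_cols = len(rotations[0])
    projected = [sum(x[i] * rotations[i][k] for i in range(len(x))) for k in range(n_cols)]
    scores = projected + [-p for p in projected]
    return max(range(len(scores)), key=scores.__getitem__)


def group_by_bucket(vectors, rotations):
    """Map bucket id -> positions; attention would then run only inside each group."""
    buckets = defaultdict(list)
    for pos, v in enumerate(vectors):
        buckets[lsh_bucket(v, rotations)].append(pos)
    return dict(buckets)


R = random_rotations(dim=3, n_buckets=8, seed=42)
vectors = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [-1.0, -2.0, -3.0]]
print(group_by_bucket(vectors, R))  # positions 0 and 1 (same direction) share a bucket
```

Scaling a vector by a positive constant never changes its bucket, while negating it moves it to the "opposite" bucket, which is the angular behaviour the scheme relies on.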
Meta AI
XLM-R
LLM
Multilingual
XLM-RoBERTa, a cross-lingual language model based on RoBERTa and pre-trained on filtered CommonCrawl text in 100 languages. It delivers strong cross-lingual transfer on tasks such as text classification, named-entity recognition, and question answering.
Multilingual
Pre-training
Fine-tuning
Cross-lingual
8.3/10
Performance
8.7/10
Accuracy
MIT
License
Microsoft
DialoGPT
LLM
Open Source
Conversational
DialoGPT is a conversational model trained on Reddit dialogues, designed to generate human-like responses for interactive chat applications. Its GPT-2-based architecture and large-scale dialogue data enable coherent and contextually relevant conversations, influencing later models like BlenderBot.
Conversational AI
Dialogue Generation
Context-aware
Human-like Responses
8.2/10
Performance
8.4/10
Accuracy
MIT
License
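As a rough illustration of how a multi-turn conversation is fed to a GPT-2-style dialogue model like DialoGPT, the sketch below concatenates turns into one sequence, ending each turn with GPT-2's end-of-text token. This mirrors the common Hugging Face usage pattern but is an assumption here; the helper name is hypothetical and tokenization is omitted.

```python
EOS = "<|endoftext|>"  # GPT-2's end-of-text token, used here as the turn separator


def build_dialogue_context(turns, eos=EOS):
    """Concatenate dialogue turns into one string, terminating each turn with eos."""
    return "".join(turn + eos for turn in turns)


history = ["Does money buy happiness?", "Depends how much money you spend on it."]
print(build_dialogue_context(history))
# Does money buy happiness?<|endoftext|>Depends how much money you spend on it.<|endoftext|>
```

The model then generates the next turn as a continuation of this sequence, which is how earlier turns provide context for the response.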
Google
T5
LLM
Versatile
Text-to-Text Transfer Transformer, a unified framework for NLP tasks. It converts all tasks into a text-to-text format, enabling versatile applications like translation and summarization.
Text-to-Text
Pre-training
Fine-tuning
Multitask
8.5/10
Performance
8.8/10
Accuracy
Apache 2.0
License
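T5's text-to-text format can be shown by casting different tasks into (input string, target string) pairs with a task prefix. The prefixes and examples below follow the T5 paper's illustrations; the helper function and task keys are hypothetical.

```python
# Task prefixes as written in the T5 paper's examples; the helper itself is hypothetical.
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "cola": "cola sentence: ",
}


def to_text_to_text(task, source, target):
    """Cast one training example as an (input string, target string) pair."""
    return TASK_PREFIXES[task] + source, target


print(to_text_to_text("translate_en_de", "That is good.", "Das ist gut."))
# ('translate English to German: That is good.', 'Das ist gut.')
print(to_text_to_text("cola", "The course is jumping well.", "not acceptable"))
# ('cola sentence: The course is jumping well.', 'not acceptable')
```

Even a classification task like CoLA becomes text generation: the model is trained to emit the label as a string, so one model, loss, and decoding procedure cover every task.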
Meta AI
BART
LLM
Open Source
Bidirectional and Auto-Regressive Transformer for text generation and comprehension. It excels in tasks like summarization and translation with a denoising pre-training objective.
Text Generation
Summarization
Translation
Fine-tuning
8.4/10
Performance
8.8/10
Accuracy
MIT
License
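One of BART's denoising corruptions, text infilling, replaces each sampled span of tokens with a single mask token, and the model learns to reconstruct the original text. The sketch below applies given spans deterministically. It assumes sorted, non-overlapping (start, length) spans and a "<mask>" placeholder string; the BART paper samples span lengths from a Poisson distribution (λ = 3), which is omitted here.

```python
def text_infill(tokens, spans, mask_token="<mask>"):
    """Replace each (start, length) span with a single mask token.

    A zero-length span inserts a mask without removing anything.
    Spans are assumed sorted and non-overlapping.
    """
    out, cursor = [], 0
    for start, length in spans:
        out.extend(tokens[cursor:start])
        out.append(mask_token)
        cursor = start + length
    out.extend(tokens[cursor:])
    return out


tokens = ["A", "B", "C", "D", "E"]
print(text_infill(tokens, spans=[(1, 2), (4, 0)]))
# ['A', '<mask>', 'D', '<mask>', 'E']
```

Because a whole span collapses into one mask, the model must also predict how many tokens are missing, not just which ones.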
Google
ALBERT
LLM
Efficient
A Lite BERT with reduced parameters for efficiency while maintaining performance. It uses factorized embedding parameterization and cross-layer parameter sharing, ideal for resource-constrained environments.
Parameter Efficiency
Scalable
Fine-tuning
NLP Tasks
7.8/10
Performance
8.3/10
Accuracy
Apache 2.0
License
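The savings from ALBERT's two parameter-reduction techniques come down to simple arithmetic. Factorizing the embedding turns a V × H matrix into V × E plus E × H, and cross-layer sharing stores one layer's weights instead of one set per layer. The sizes below are illustrative assumptions (a 30k vocabulary, H = 4096, E = 128), and the function names are hypothetical.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Embedding-matrix parameters: V*H without factorization, V*E + E*H with it."""
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer_params, num_layers, share_across_layers):
    """With cross-layer sharing, every layer reuses one set of weights."""
    return per_layer_params if share_across_layers else per_layer_params * num_layers


V, H, E = 30_000, 4096, 128  # illustrative sizes
print(embedding_params(V, H))     # 122880000
print(embedding_params(V, H, E))  # 4364288
```

Here factorization shrinks the embedding parameters by roughly 28 times, and sharing makes encoder size independent of depth.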
Microsoft
MT-DNN
LLM
Open Source
Multi-Task Deep Neural Network (MT-DNN) combines multi-task learning with pre-trained language models like BERT to achieve robust performance across diverse NLP tasks like sentiment analysis and text classification. Its knowledge distillation techniques enable efficient fine-tuning, making it a versatile choice for enterprise NLP applications.
Multi-task Learning
Knowledge Distillation
Text Classification
Sentiment Analysis
8.3/10
Performance
8.5/10
Accuracy
MIT
License
Meta AI
RoBERTa
LLM
Open Source
A Robustly Optimized BERT Pretraining Approach. It improves on BERT by training longer on more data with larger batches, using dynamic masking, and dropping the next-sentence prediction objective, which boosts results on tasks like text classification.
Bidirectional Context
Pre-training
Fine-tuning
NLP Tasks
8.2/10
Performance
8.6/10
Accuracy
MIT
License
Microsoft
UniLM
LLM
Open Source
Unified Language Model (UniLM) is a pre-trained model supporting both natural language understanding and generation tasks, such as summarization and dialogue, through a shared transformer architecture. Its bidirectional, unidirectional, and sequence-to-sequence pre-training objectives make it highly flexible for applications in Azure Cognitive Services.
Text Generation
Summarization
Dialogue
Flexible Pre-training
8.5/10
Performance
8.7/10
Accuracy
MIT
License
Google
XLNet
LLM
Open Source
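A generalized autoregressive pre-training method developed by Google Brain and Carnegie Mellon University that outperformed BERT on a number of benchmarks. It trains over permutations of the factorization order, so each token learns from bidirectional context without relying on [MASK] tokens.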
Autoregressive
Permutation Training
Fine-tuning
NLP Tasks
8.1/10
Performance
8.6/10
Accuracy
Apache 2.0
License
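XLNet's permutation language modeling objective can be sketched by listing, for one factorization order, which tokens are visible when each target is predicted. This is a simplified illustration under stated assumptions: it ignores XLNet's two-stream attention and its partial prediction of only the last tokens in each order, and the function name is hypothetical.

```python
def permutation_lm_targets(tokens, order):
    """For one factorization order, list (target, visible context) pairs.

    The token at position order[t] is predicted from the tokens at order[:t];
    original positions are kept, so the model still knows where each token sits.
    """
    if sorted(order) != list(range(len(tokens))):
        raise ValueError("order must be a permutation of token positions")
    pairs = []
    for t, pos in enumerate(order):
        context = {p: tokens[p] for p in sorted(order[:t])}
        pairs.append((tokens[pos], context))
    return pairs


sentence = ["New", "York", "is", "a", "city"]
for target, context in permutation_lm_targets(sentence, order=[3, 1, 0, 4, 2]):
    print(target, context)
```

Across many sampled orders, every token is eventually predicted with context from both its left and its right, which is how the objective captures bidirectional context while remaining autoregressive.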
OpenAI
GPT-2
LLM
Open-weight
OpenAI's 1.5-billion-parameter successor to GPT-1, producing noticeably more coherent text. Its weights were released in stages during 2019 because of misuse concerns, and it showed strong zero-shot performance on a range of tasks.
Text Generation
Zero-shot Learning
Coherent Output
7.8/10
Performance
8.0/10
Accuracy
Modified MIT
License
Google
BERT
LLM
Open Source
Bidirectional Encoder Representations from Transformers, designed for deep contextual understanding of text. It revolutionized NLP by enabling bidirectional context in pre-training, excelling in tasks like question answering and text classification.
Bidirectional Context
Pre-training
Fine-tuning
NLP Tasks
8.0/10
Performance
8.5/10
Accuracy
Apache 2.0
License
OpenAI
GPT-1
LLM
Research
The first in the GPT series, a transformer-based model that introduced generative pre-training: unsupervised language-model pre-training on unlabeled text, followed by supervised fine-tuning on each target task. It laid the foundation for later GPT models.
Text Generation
Unsupervised Learning
Pre-training
6.5/10
Performance
6.8/10
Accuracy
MIT
License
Microsoft
Turing
LLM
Proprietary
A family of language models developed by Microsoft and used across products such as Bing and Azure for tasks like search and text generation. It forms a core part of Microsoft's AI infrastructure.
Text Generation
Search
Efficient
Scalable
8.7/10
Performance
8.9/10
Accuracy
Proprietary
License