The Grand AI Handbook

AI Models Directory

Explore our curated collection of machine learning and AI models from leading research organizations and companies.

Microsoft

BitNet b1.58 2B4T

SLM Open Source
A native 1-bit (strictly, ternary 1.58-bit) AI model with 2 billion parameters trained on 4 trillion tokens, released in April 2025 and designed for hyper-efficient performance on CPUs, including Apple’s M2, using the custom bitnet.cpp framework. Microsoft reports up to twice the speed of similarly sized full-precision models with far lower memory usage, making it well suited to resource-constrained devices.
Efficient CPU-optimized Low Memory Fast Inference
8.8/10
Performance
8.9/10
Accuracy
MIT
License
Stability AI

Stable Virtual Camera

3D Video Research
A research-preview multi-view diffusion model launched in March 2025, transforming 2D images into immersive 3D videos with realistic depth and perspective. It eliminates the need for complex reconstruction, making it ideal for virtual reality and cinematic applications.
2D-to-3D Video Immersive Depth No Reconstruction Cinematic Use
9.0/10
Performance
9.1/10
Accuracy
Non-commercial research license
License
DeepSeek

DeepSeek-V3-0324

LLM Open Source MoE
DeepSeek-V3-0324 is an upgraded V3 model with enhanced reasoning, coding, and tool-use capabilities; DeepSeek’s reported benchmarks show it outperforming GPT-4.5 on several math and coding evaluations.
Mathematical Reasoning Code Generation Tool Use Large Context Window
9.3/10
Performance
9.5/10
Accuracy
MIT
License
NVIDIA

Cosmos Nemotron

LLM Open Source Physical AI
Cosmos Nemotron is an open reasoning model for physical AI development, offering customizable world generation for robotics and simulation.
World Generation Physical AI Reasoning Customizable Simulation
9.1/10
Performance
9.3/10
Accuracy
Apache 2.0
License
NVIDIA

GR00T N1

Robotics Open Source Humanoid AI
GR00T N1 is an open, customizable foundation model for humanoid robot reasoning, enabling advanced perception and action in robotics.
Humanoid Robot Reasoning Perception Action Planning
9.0/10
Performance
9.2/10
Accuracy
Apache 2.0
License
Alibaba

Qwen2.5-Coder Series

Code Generation Open Source Multilingual
A series of code-specific models optimized for code generation, reasoning, and fixing, available in multiple sizes.
Code Generation Code Reasoning Code Fixing Supports 92 Programming Languages
9.3/10
Performance
9.1/10
Accuracy
Apache 2.0
License
Alibaba

Qwen2.5-Math Series

Math Open Source Multilingual
An advanced math-specific model series extending Qwen2.5 capabilities with high performance in mathematical reasoning tasks.
Math Word Problems Multi-Hop Reasoning Symbolic Math MathQA Tasks
9.2/10
Performance
9.3/10
Accuracy
Apache 2.0
License
OpenAI

GPT-4.5

LLM Conversational
Codenamed Orion, this large model reduces hallucinations compared to GPT-4o and o1. It’s designed for conversational tasks and broad knowledge applications.
Text Generation Low Hallucination Conversational AI
9.3/10
Performance
9.5/10
Accuracy
Proprietary
License
Microsoft

Magma

Multimodal Agentic AI
A multimodal AI model introduced in February 2025, combining visual and language processing to control software interfaces and robotic systems, enabling agentic AI for autonomous task execution. It features Set-of-Mark and Trace-of-Mark for spatial intelligence, with public code released on GitHub.
Visual Processing Language Processing Robotic Control Spatial Intelligence
9.0/10
Performance
9.1/10
Accuracy
MIT
License
Microsoft

Muse (WHAM)

Generative AI Open Source
A generative AI model for video game visuals and controller actions, released in February 2025, developed with Ninja Theory and published in Nature. It supports gameplay ideation through the WHAM Demonstrator, with open-source weights and sample data available on Azure AI Foundry.
Game Visuals Controller Actions Gameplay Ideation Interactive Interface
8.9/10
Performance
9.0/10
Accuracy
MIT
License
xAI

Grok-3

LLM Web Search Advanced Reasoning
The latest Grok model featuring reflection capabilities and advanced web search integration.
Reflection Capabilities DeepSearch Integration Advanced Reasoning
9.2/10
Performance
9.0/10
Accuracy
Proprietary
License
Stability AI

Stable Point Aware 3D (SPAR3D)

3D Real-time
A cutting-edge 3D generation model introduced in January 2025, enabling real-time editing and complete structure generation from a single image in under a second. It supports rapid prototyping for gaming, architecture, and entertainment with high precision and efficiency.
Real-time Editing Image-to-3D High Precision Rapid Prototyping
9.2/10
Performance
9.3/10
Accuracy
Stability AI Community License
License
DeepSeek

DeepSeek-R1

LLM Open Source Reasoning
DeepSeek-R1 is a reasoning-focused model built on DeepSeek-V3-Base and trained with large-scale reinforcement learning, competing with top models such as OpenAI’s o1 on math, coding, and reasoning tasks.
Chain-of-Thought Reasoning Mathematical Reasoning Code Generation Self-correction
9.2/10
Performance
9.4/10
Accuracy
MIT
License
DeepSeek

Janus-Pro-7B

Multimodal Open Source Vision
Janus-Pro-7B is a multimodal vision model for image understanding and generation, outperforming models such as DALL-E 3 on text-to-image benchmarks including GenEval and DPG-Bench.
Image Understanding Image Generation Multimodal Processing
8.9/10
Performance
9.1/10
Accuracy
MIT
License
Alibaba

Qwen2.5 Series

LLM Instruction-Tuned Open Source
The latest series of decoder-only language models, available in various sizes and optimized for instruction following and structured output generation.
Instruction Following Structured Output Generation Multilingual Support
9.2/10
Performance
9.0/10
Accuracy
Apache 2.0
License
OpenAI

o3-mini

Reasoning API Only Efficiency
A lightweight version of o3 with efficient reasoning capabilities, available via the API since January 2025.
Efficient Reasoning Problem Solving Task Versatility
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Amazon

Nova

Multimodal API Only Enterprise
A suite of foundation models (including Nova Micro, Lite, Pro, Canvas, and Reel) for text, image, and video tasks, available through Amazon Bedrock for enterprise applications.
Text and Image Processing Enterprise Integration Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
OpenAI

Sora

Video Generation API Only Creative
A video generation model for creating high-quality videos from text prompts, publicly released in December 2024 through sora.com for ChatGPT Plus and Pro subscribers.
Text-to-Video High-Quality Output Creative Storytelling
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Cohere

Command R7B

Language Model Open Weights Performance
A compact 7-billion-parameter open-weight model with a 128K-token context window, optimized for retrieval-augmented generation (RAG), tool use, and agentic tasks.
Advanced NLP Task Versatility Customizable
8.8/10
Performance
8.7/10
Accuracy
CC-BY-NC 4.0
License
OpenAI

o1

Reasoning API Only Problem-Solving
An advanced reasoning model with superior problem-solving capabilities, accessible via API.
Advanced Reasoning Problem Solving Task Versatility
9.5/10
Performance
9.4/10
Accuracy
Proprietary
License
OpenAI

o1 pro

Reasoning API Only Professional
A professional-grade version of o1 that uses more compute for enhanced reasoning and task capabilities, accessible via API.
Enhanced Reasoning Professional Tasks Task Versatility
9.6/10
Performance
9.5/10
Accuracy
Proprietary
License
OpenAI

Live Video Mode

Multimodal API Only Video
A ChatGPT Advanced Voice Mode capability powered by GPT-4o that adds real-time video and screen sharing, rolled out in December 2024 in the ChatGPT mobile apps.
Real-Time Video Video Analysis Task Assistance
9.2/10
Performance
9.1/10
Accuracy
Proprietary
License
Google

Gemini-Exp-1206

Multimodal API Only Experimental
An experimental multimodal model with advanced text and image processing, accessible via API.
Text and Image Processing Advanced Reasoning Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Google

Gemini 2.0 Flash

Multimodal API Only Efficiency
A lightweight multimodal model in beta, optimized for efficient text and image processing, accessible via API.
Text and Image Processing Efficient Performance Task Versatility
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Google

Gemini-2.0-Flash-Thinking

Multimodal API Only Reasoning
A variant of Gemini 2.0 Flash with enhanced reasoning capabilities, accessible via API.
Enhanced Reasoning Text and Image Processing Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Google

Veo 2

Video Generation API Only Creative
An advanced video generation model for creating high-quality videos from text prompts, accessible via API.
Text-to-Video High-Quality Output Creative Storytelling
9.1/10
Performance
9.0/10
Accuracy
Proprietary
License
IBM

Granite 3.1

Language Model Open Weights Performance
An open-weight language model for advanced NLP tasks, offering high performance and flexibility.
Advanced NLP Task Versatility Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
Google

Imagen 3 Update

Image Generation API Only Creative
An updated image generation model for creating high-quality visuals, accessible via API for creative applications.
High-Quality Images Creative Workflows Professional Design
9.1/10
Performance
9.0/10
Accuracy
Proprietary
License
xAI

Aurora

Image Generation API Only Creative
An image generation model integrated with xAI's ecosystem, accessible via API for creative applications.
High-Quality Images Creative Workflows xAI Integration
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Microsoft

Phi-4

Language Model Open Weights Efficiency
A 14-billion-parameter open-weight small language model that excels at complex reasoning, especially math, while remaining efficient to deploy.
Resource Efficiency High Performance Customizable
8.7/10
Performance
8.6/10
Accuracy
MIT
License
Meta

Llama 3.3 70B

Language Model Open Weights Research
An upgraded open-weight language model for research, offering high performance in NLP tasks.
NLP Research High Performance Customizable
9.0/10
Performance
8.9/10
Accuracy
Llama 3.3 Community License
License
Google

PaliGemma 2

Multimodal Open Weights Research
An open-weight vision-language model for advanced multimodal tasks, suitable for research and development.
Vision-Language Processing Research Flexibility Customizable
8.8/10
Performance
8.7/10
Accuracy
Gemma Terms of Use
License
Pika

Pika Labs 2.0

Video Generation API Only Creative
An upgraded video generation model for creating high-quality videos with advanced effects, accessible via API.
Text-to-Video Advanced Effects Creative Storytelling
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Meta

Apollo

Multimodal Open Weights Research
A family of large multimodal models for video understanding, developed by Meta with Stanford for research and development.
Video Understanding Research Flexibility Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
DeepSeek

DeepSeek V3

Language Model Open Weights Performance
A 671-billion-parameter mixture-of-experts language model (37 billion parameters activated per token) with open weights, offering strong performance in coding, math, and general NLP tasks.
Advanced NLP Task Versatility Customizable
8.9/10
Performance
8.8/10
Accuracy
MIT (code), DeepSeek Model License (weights)
License
AnswerAI and LightOn

ModernBERT

Language Model Open Weights Efficiency
An open-weight encoder-only model that modernizes the BERT architecture with an 8,192-token context window, suited to efficient retrieval and classification tasks.
Advanced NLP Efficient Processing Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Alibaba

QVQ-72B-Preview

Multimodal Open Weights Preview
An experimental open-weight visual reasoning model built on Qwen2-VL-72B, targeting multimodal math and science problems.
Visual Reasoning Multimodal Math Customizable
9.0/10
Performance
8.9/10
Accuracy
Qwen License
License
OpenAI

o3

Reasoning API Only Problem-Solving
An advanced AI model with superior reasoning and problem-solving capabilities, accessible via API.
Advanced Reasoning Problem Solving Task Versatility
9.6/10
Performance
9.5/10
Accuracy
Proprietary
License
KLING

Kling 1.6

Video Generation API Only Creative
An upgraded video generation model for creating high-quality videos from text prompts, accessible via API.
Text-to-Video High-Quality Output Creative Storytelling
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
TII

Falcon 3

Multimodal Open Weights Performance
An open-weight model family for advanced language and multimodal tasks, offering high performance and flexibility.
Language Processing Multimodal Capabilities Customizable
8.9/10
Performance
8.8/10
Accuracy
TII Falcon License 2.0
License
Alibaba

QwQ 32B Preview

Language Model Open Weights Preview
An experimental open-weight reasoning model with 32 billion parameters, focused on math and coding problems.
Advanced NLP Task Versatility Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
Alibaba

Qwen2.5 Coder 32B

Code Generation Open Weights Developer
An open-weight model for advanced code generation, optimized for programming tasks and developer workflows.
Code Completion Syntax Understanding Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
DeepSeek

DeepSeek-R1-Lite-Preview

Reasoning API Only Preview
A preview of a lightweight reasoning model for task assistance, initially available through DeepSeek’s web chat interface, with API access announced as coming later.
Efficient Reasoning Task Assistance Developer API
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Allen AI

Tulu 3

Language Model Open Weights Research
A family of open post-trained models built on Llama 3.1, released with open data, code, and training recipes for research.
NLP Research High Performance Customizable
8.8/10
Performance
8.7/10
Accuracy
Llama 3.1 Community License
License
Suno AI

Suno v4

Music Generation API Only Creative
An upgraded music creation model generating high-quality audio tracks, available through the Suno app for creative projects.
Text-to-Music High-Quality Audio Creative Flexibility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Hugging Face

SmolLM 2

Language Model Open Weights Efficiency
An open-weight lightweight language model for research and efficient NLP tasks, offering high performance.
Lightweight NLP Research Flexibility Customizable
8.6/10
Performance
8.5/10
Accuracy
Apache 2.0
License
Mistral

Pixtral Large

Multimodal Open Weights Research
An open-weight multimodal model for advanced text and image processing, optimized for research and development.
Text and Image Processing Research Flexibility Customizable
9.0/10
Performance
8.9/10
Accuracy
Mistral Research License
License
Mistral

Mistral Large 2411

Language Model Open Weights Performance
An upgraded open-weight language model for advanced NLP tasks, offering high performance and flexibility.
Advanced NLP Task Versatility Customizable
9.0/10
Performance
8.9/10
Accuracy
Mistral Research License
License
Google

gemini-exp-1114

Multimodal API Only Experimental
An experimental multimodal model with advanced text and image processing, accessible via API.
Text and Image Processing Advanced Reasoning Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Google

gemini-exp-1121

Multimodal API Only Experimental
An experimental multimodal model with enhanced text and image processing, accessible via API.
Text and Image Processing Advanced Reasoning Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Allen AI

OLMo 2

Language Model Open Weights Research
An open-weight language model for research, offering high performance in NLP tasks with a focus on efficiency.
NLP Research High Performance Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Anthropic

Visual PDF Analysis

Document Analysis API Only Multimodal
A feature in Claude for analyzing PDF documents with visual content, accessible via API.
PDF Analysis Visual Content Processing Task Assistance
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Hugging Face

SmolVLM

Multimodal Open Weights Efficiency
An open-weight vision-language model optimized for efficient multimodal tasks, suitable for research and development.
Vision-Language Processing Resource Efficiency Customizable
8.6/10
Performance
8.5/10
Accuracy
Apache 2.0
License
Black Forest Labs

Flux 1.1 Pro

Image Generation API Only Creative
An upgraded image generation model for professional-grade visuals, accessible via API.
High-Quality Images Professional Design Creative Workflows
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Meta

Movie Gen

Video Generation API Only Creative
A research model announced in October 2024 for generating high-quality videos with synchronized audio from text prompts; it has not been publicly released.
Text-to-Video High-Quality Output Creative Storytelling
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Cohere

Aya Expanse

Language Model Open Weights Performance
A family of open-weight multilingual models (8B and 32B) covering 23 languages.
Advanced NLP Task Versatility Customizable
8.8/10
Performance
8.7/10
Accuracy
CC-BY-NC 4.0
License
Pika

Pika Effects

Video Generation API Only Creative
A video model with advanced effects for creative video editing, accessible via API.
Video Effects Creative Editing High-Quality Output
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Adobe

Firefly Video

Video Generation API Only Creative
A video generation model for professional-grade video creation, accessible via API.
High-Quality Video Professional Editing Creative Workflows
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Rhymes

Aria

Multimodal Open Weights MoE
An open-weight multimodal native mixture-of-experts model that understands text, images, video, and code.
Task Assistance Creative Interactions Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Meta

Meta Spirit LM

Language Model Open Weights Research
An open-weight multimodal language model that freely mixes text and speech, released for research.
NLP Research High Performance Customizable
8.8/10
Performance
8.7/10
Accuracy
FAIR Noncommercial Research License
License
Mistral

Ministral

Language Model API Only Efficiency
A lightweight language model for efficient NLP tasks, accessible via API for developers.
Efficient NLP Task Versatility Developer API
8.5/10
Performance
8.4/10
Accuracy
Proprietary
License
DeepSeek

Janus

Multimodal Open Weights Research
An open-weight unified multimodal model that decouples visual encoding into separate pathways for image understanding and generation, released for research.
Text and Image Processing Research Flexibility Customizable
8.8/10
Performance
8.7/10
Accuracy
MIT (code), DeepSeek Model License (weights)
License
Google

Fluid

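Image Generation Research Autoregressive
A research text-to-image model from Google DeepMind and MIT that scales autoregressive generation using continuous tokens; it is described in a research paper rather than offered via API.
Text-to-Image Autoregressive Generation Research Applications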
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
Stability AI

Stable Diffusion 3.5

Image Generation Open Weights Creative
An upgraded open-weight model for text-to-image generation, offering improved quality and flexibility.
Text-to-Image High-Quality Output Creative Flexibility
8.9/10
Performance
8.8/10
Accuracy
Stability AI Community License
License
Anthropic

Claude 3.5 Sonnet (New)

Conversational API Only Safety
An upgraded conversational AI model with enhanced reasoning and safety, accessible via API.
Enhanced Reasoning Safe Interactions Helpful Responses
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Anthropic

Claude 3.5 Haiku

Conversational API Only Efficiency
A lightweight conversational AI model with efficient performance and safety, accessible via API.
Efficient Reasoning Safe Interactions Helpful Responses
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Recraft

Recraft v3

Image Generation API Only Creative
An image generation model for creating high-quality visuals, accessible via API for creative workflows.
High-Quality Images Creative Workflows Professional Design
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
OpenAI

SearchGPT

Search API Only Summarization
An AI-powered search engine providing concise and relevant answers, accessible via API.
Search Summaries Information Retrieval Concise Outputs
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Allen AI

OLMoE

Language Model Open Weights Research
An open-weight mixture-of-experts language model (1 billion active and 7 billion total parameters), released with fully open data and code for research.
NLP Research High Performance Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Mistral

Pixtral 12B

Multimodal Open Weights Research
An open-weight multimodal model for text and image processing, optimized for research and development.
Text and Image Processing Research Flexibility Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
OpenAI

o1-preview

Reasoning API Only Problem-Solving
A preview of an advanced reasoning model with enhanced problem-solving capabilities, accessible via API.
Advanced Reasoning Problem Solving Task Versatility
9.4/10
Performance
9.3/10
Accuracy
Proprietary
License
OpenAI

o1-mini

Reasoning API Only Efficiency
A lightweight version of the o1 model with efficient reasoning capabilities, accessible via API.
Efficient Reasoning Problem Solving Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
IBM

Granite Code

Code Generation Open Weights Developer
An open-weight model for code generation, optimized for programming tasks and developer workflows.
Code Completion Syntax Understanding Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Alibaba

Qwen 2.5

Language Model Open Weights Performance
An open-weight language model for advanced NLP tasks, offering high performance and flexibility.
Advanced NLP Task Versatility Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
KLING

Kling 1.5

Video Generation API Only Creative
A video generation model for creating high-quality videos from text prompts, accessible via API.
Text-to-Video High-Quality Output Creative Storytelling
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
01.AI

Yi Coder

Code Generation Open Weights Developer
An open-weight model for code generation, optimized for programming tasks and developer workflows.
Code Completion Syntax Understanding Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
OpenAI

GPT-4o Advanced Voice Mode

Multimodal API Only Voice
A ChatGPT feature built on GPT-4o’s native audio capabilities for natural, real-time voice conversations; developers can access similar speech-to-speech capabilities through the Realtime API.
Voice Interaction Text and Image Processing Task Versatility
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Meta

Llama 3.2

Language Model Open Weights Research
An open-weight model family adding lightweight 1B and 3B text models and 11B and 90B vision-language models, aimed at on-device and multimodal use.
NLP Research High Performance Customizable
8.9/10
Performance
8.8/10
Accuracy
Llama 3.2 Community License
License
Google

Gemini 1.5 Pro 002

Multimodal API Only Performance
An updated multimodal model with enhanced text and image processing capabilities, accessible via API.
Text and Image Processing Advanced Reasoning Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Kyutai

Moshi

Conversational Open Weights Voice
An open-weight speech-text foundation model for real-time, full-duplex spoken dialogue.
Voice Interaction Real-Time Processing Customizable
8.7/10
Performance
8.6/10
Accuracy
CC-BY 4.0
License
Google

NotebookLM

Research API Only Summarization
An updated AI-powered research and note-taking tool that summarizes uploaded sources and can generate podcast-style Audio Overviews, available through its web app.
Research Summaries Note-Taking Insight Generation
8.6/10
Performance
8.5/10
Accuracy
Proprietary
License
Mistral

Mistral Small

Language Model API Only Efficiency
A lightweight language model for efficient NLP tasks, accessible via API for developers.
Efficient NLP Task Versatility Developer API
8.5/10
Performance
8.4/10
Accuracy
Proprietary
License
Black Forest Labs

Flux

Image Generation Open Weights Creative
An open-weight model for high-quality image generation, optimized for creative and professional applications.
Text-to-Image High-Quality Output Creative Flexibility
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
OpenAI

GPT-4o 0806

Multimodal API Only Reasoning
The August 2024 snapshot of GPT-4o (gpt-4o-2024-08-06), which added support for Structured Outputs and lowered prices, accessible via API.
Text and Image Processing Advanced Reasoning Task Versatility
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Google

Imagen 3

Image Generation API Only Creative
An advanced image generation model for creating high-quality visuals, accessible via API for creative applications.
High-Quality Images Creative Workflows Professional Design
9.0/10
Performance
8.9/10
Accuracy
Proprietary
License
xAI

Grok 2

Conversational API Only Reasoning
An advanced conversational AI model with enhanced reasoning capabilities, accessible via API.
Enhanced Reasoning Task Assistance Conversational
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
xAI

Grok 2 mini

Conversational API Only Efficiency
A lightweight version of Grok 2 with efficient conversational capabilities, accessible via API.
Efficient Reasoning Task Assistance Conversational
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Nous

Hermes 3

Language Model Open Weights Research
An open-weight family of models fine-tuned from Llama 3.1, emphasizing steerability, roleplay, reasoning, and agentic capabilities.
NLP Research High Performance Customizable
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Microsoft

Phi-3.5

Language Model Open Weights Efficiency
An upgraded open-weight language model optimized for efficiency and performance on resource-constrained devices.
Resource Efficiency High Performance Customizable
8.6/10
Performance
8.5/10
Accuracy
MIT
License
Google

Gemini 1.5 Flash-8B

Multimodal API Only Efficiency
A lightweight multimodal model with efficient text and image processing, accessible via API.
Text and Image Processing Efficient Performance Task Versatility
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Ideogram

Ideogram 2.0

Image Generation API Only Creative
An image generation model for creating high-quality visuals, accessible via API for creative applications.
High-Quality Images Creative Workflows Professional Design
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Luma

Dream Machine 1.5

Video Generation API Only Creative
A video generation model for creating high-quality videos from text prompts, accessible via API.
Text-to-Video High-Quality Output Creative Storytelling
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Cohere

Command R+

Language Model Open Weights Performance
An open-weight language model optimized for advanced NLP tasks, offering high performance and flexibility.
Advanced NLP Task Versatility Customizable
8.8/10
Performance
8.7/10
Accuracy
CC-BY-NC 4.0
License
TII

Falcon Mamba

Language Model Open Weights Efficiency
An open-weight state-space model for efficient language processing, suitable for research and development.
State-Space Architecture Efficient Processing Customizable
8.7/10
Performance
8.6/10
Accuracy
TII Falcon License 2.0
License
OpenAI

GPT-4o mini

Multimodal API Only Efficiency
A lightweight, cost-efficient version of GPT-4o with text and image input, accessible via API.
Text and Image Processing Efficient Performance Task Versatility
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Meta

Llama 3.1

Language Model Open Weights Research
An upgraded open-weight model family (8B, 70B, and 405B) with a 128K-token context window and improved multilingual performance.
NLP Research High Performance Customizable
8.9/10
Performance
8.8/10
Accuracy
Llama 3.1 Community License
License
Mistral

Codestral Mamba

Code Generation Open Weights Efficiency
An open-weight model for code generation, leveraging state-space architecture for efficient programming tasks.
Code Completion State-Space Architecture Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Google

AlphaProof & AlphaGeometry 2

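Math Research Reasoning
Specialized AI systems for formal mathematical reasoning and geometry problem-solving that together achieved silver-medal standard at the 2024 International Mathematical Olympiad; they are research systems without a public API.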
Mathematical Reasoning Geometry Solving Research Applications
9.0/10
Performance
9.1/10
Accuracy
Proprietary
License
OpenAI

SearchGPT

Search API Only Summarization
A prototype AI-powered search engine announced in July 2024, providing concise answers with links to sources; it was offered to a limited waitlist rather than via API.
Search Summaries Information Retrieval Concise Outputs
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Udio

Udio v1.5

Music Generation API Only Creative
A music creation model generating high-quality audio tracks, available through the Udio web app for creative applications.
Text-to-Music High-Quality Audio Creative Flexibility
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Mistral

Mistral Large 2

Language Model Open Weights Performance
A 123-billion-parameter language model for advanced NLP tasks, available via API and as open weights under the Mistral Research License.
Advanced NLP Task Versatility Developer API
9.0/10
Performance
8.9/10
Accuracy
Mistral Research License
License
Midjourney

Midjourney v6.1

Image Generation API Only Creative
An upgraded image generation model for creating high-quality visuals, available through Midjourney’s web app and Discord; there is no official public API.
High-Quality Images Creative Workflows Professional Design
9.1/10
Performance
9.0/10
Accuracy
Proprietary
License
Google

Gemma 2 2B

Language Model Open Weights Efficiency
A lightweight open-weight language model for research and efficient NLP tasks, offering high performance.
Lightweight NLP Research Flexibility Customizable
8.5/10
Performance
8.4/10
Accuracy
Gemma Terms of Use
License
Stability AI

Stable Diffusion 3 (Medium)

Image Generation Open Weights Creative
A medium-sized version of Stable Diffusion 3, offering open weights for text-to-image generation with balanced performance.
Text-to-Image Balanced Performance Creative Flexibility
8.7/10
Performance
8.6/10
Accuracy
Stability AI Community License
License
Apple

Apple Intelligence

Productivity API Only On-Device
An AI suite for Apple devices, enhancing user experience through on-device task automation and insights, accessible via API.
Task Automation On-Device Processing User Experience
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
DeepSeek

DeepSeek-Coder-V2

Code Generation Open Weights Developer
An open-weight mixture-of-experts model for code generation, supporting 338 programming languages and a 128K-token context window.
Code Completion Syntax Understanding Customizable
8.8/10
Performance
8.7/10
Accuracy
MIT (code), DeepSeek Model License (weights)
License
Runway

Gen-3 Alpha

Video Generation API Only Creative
A video generation model for creating high-quality videos from text prompts, accessible via API for creative applications.
Text-to-Video High-Quality Output Creative Storytelling
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
01.AI

Yi 1.5

Language Model Open Weights Research
An open-weight language model optimized for research and NLP tasks, offering high performance and flexibility.
NLP Research High Performance Customizable
8.7/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Anthropic

Claude 3.5 Sonnet

Conversational API Only Safety
An upgraded conversational AI model with enhanced reasoning and safety features, accessible via API.
Enhanced Reasoning Safe Interactions Helpful Responses
9.2/10
Performance
9.1/10
Accuracy
Proprietary
License
Microsoft

Florence 2

Vision Open Weights Research
An open-weight vision foundation model that uses a prompt-based representation to handle tasks such as captioning, object detection, grounding, and segmentation.
Image Processing Research Flexibility Customizable
8.6/10
Performance
8.5/10
Accuracy
MIT
License
Google

Gemma 2

Language Model Open Weights Efficiency
An open-weight language model optimized for research and lightweight NLP tasks, offering high efficiency.
Lightweight NLP Research Flexibility Customizable
8.5/10
Performance
8.4/10
Accuracy
Gemma Terms of Use
License
OpenAI

GPT-4o

Multimodal API Only Reasoning
A multimodal AI model with advanced capabilities in text, image processing, and reasoning, accessible via API.
Text and Image Processing Advanced Reasoning Task Versatility
9.3/10
Performance
9.2/10
Accuracy
Proprietary
License
Google

Gemini 1.5

Multimodal API Only High Capacity
An upgraded multimodal model whose Pro version offers a context window of up to 2 million tokens across text, images, audio, and video.
Large Token Limit Text and Image Processing Task Versatility
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Microsoft

Copilot+ PCs

Productivity On-Device Automation
A category of Windows PCs with neural processing units (NPUs) capable of 40+ TOPS, enabling on-device AI features such as Live Captions with translation and Cocreator.
Task Automation Productivity Insights Integration
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Meta

Chameleon

Multimodal Open Weights Research
A family of early-fusion, mixed-modal models that understand and generate interleaved text and images, released for research.
Text and Image Processing Research Flexibility Customizable
8.8/10
Performance
8.7/10
Accuracy
Non-commercial
License
Mistral

Mistral-7B-Instruct-v0.3

Language Model Open Weights Instruction
An open-weight language model optimized for instruction-following tasks, offering high efficiency and performance.
Instruction Following Efficient Processing Customizable
8.6/10
Performance
8.5/10
Accuracy
Apache 2.0
License
Google

AI Overviews

Search API Only Summarization
An AI-powered feature in Google Search that shows concise, generated summaries of relevant information at the top of search results.
Search Summaries Information Retrieval Concise Outputs
8.5/10
Performance
8.4/10
Accuracy
Proprietary
License
Suno AI

Suno v3.5

Music Generation API Only Creative
An upgraded music creation model generating high-quality audio tracks, accessible via API for creative projects.
Text-to-Music High-Quality Audio Creative Flexibility
8.8/10
Performance
8.7/10
Accuracy
Proprietary
License
Mistral

Codestral

Code Generation Open Weights Developer
An open-weight model for code generation, optimized for programming tasks and developer workflows.
Code Completion Syntax Understanding Customizable
8.7/10
Performance
8.6/10
Accuracy
Mistral AI Non-Production License
License
TII

Falcon 2

Multimodal Open Weights Language
An open-weight model family including Falcon2-11B and Falcon2-VLM, designed for language and vision tasks.
Language Processing Vision Processing Customizable
8.6/10
Performance
8.5/10
Accuracy
Apache 2.0
License
Stability AI

Stable Audio 2.0

Audio Generation Open Weights Creative
An open-weight model for generating high-fidelity audio, suitable for music and sound design applications.
High-Fidelity Audio Sound Design Customizable
8.8/10
Performance
8.7/10
Accuracy
CreativeML Open RAIL-M
License
xAI

Grok-1.5V

Multimodal API Only Reasoning
An enhanced version of Grok with image recognition capabilities, designed for multimodal task assistance via API.
Image Recognition Task Assistance Conversational
8.7/10
Performance
8.8/10
Accuracy
Proprietary
License
Mistral

Mixtral 8x22B

Language Model Open Weights Efficiency
A sparse mixture-of-experts language model with open weights (141B total parameters, about 39B active per token), optimized for efficiency and scalability in NLP tasks.
Efficient Processing Scalable NLP Customizable
8.9/10
Performance
8.8/10
Accuracy
Apache 2.0
License
Meta

LLaMA 3

Language Model Open Weights Research
An open-weight language model designed for research, offering high performance in natural language tasks.
NLP Research High Performance Customizable
8.8/10
Performance
8.7/10
Accuracy
Meta Llama 3 Community License
License
Microsoft

Phi-3-mini

Language Model Open Weights Efficiency
A lightweight, open-weight language model optimized for efficiency and performance on resource-constrained devices.
Resource Efficiency High Performance Customizable
8.5/10
Performance
8.4/10
Accuracy
MIT
License
Adobe

Firefly 3

Image Generation API Only Creative
An image creation model for professional design, offering high-quality outputs via API for creative workflows.
High-Quality Images Professional Design Creative Workflows
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Reka

Reka AI Models

Multimodal API Only Language
Multimodal language models designed for advanced text and image processing, accessible via API.
Text Processing Image Processing Task Versatility
8.6/10
Performance
8.5/10
Accuracy
Proprietary
License
Apple

OpenELM

Language Model Open Weights Efficiency
An open-weight language model optimized for efficient on-device NLP tasks, suitable for research and development.
On-Device NLP Resource Efficiency Customizable
8.4/10
Performance
8.3/10
Accuracy
Apple Sample Code License
License
xAI

Grok 1.5

Conversational Open Weights Reasoning
An advanced conversational AI model with improved reasoning and open weights, designed to assist users in various tasks.
Enhanced Reasoning Task Assistance Open Customization
8.6/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Anthropic

Claude 3

Conversational API Only Safety
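A family of conversational AI models (Haiku, Sonnet, and Opus) focused on safety and helpfulness, accessible via API. Anthropic reported that the largest model, Opus, outperformed GPT-4 on several common benchmarks.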
Safe Interactions Advanced Reasoning Helpful Responses
9.1/10
Performance
9.0/10
Accuracy
Proprietary
License
Suno AI

Suno v3

Music Generation API Only Creative
A music creation model generating high-quality audio tracks from prompts, accessible via API for creative applications.
Text-to-Music High-Quality Audio Creative Flexibility
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Stability AI

Stable Diffusion 3

Image Generation Open Weights Creative
A text-to-image model with enhanced capabilities for generating high-quality, detailed images from textual prompts, suitable for creative and professional applications.
Text-to-Image High-Resolution Output Creative Flexibility
8.8/10
Performance
8.9/10
Accuracy
CreativeML Open RAIL-M
License
Google

Gemini Pro

Conversational Reasoning Multimodal
An upgraded conversational AI model powering Bard, offering improved reasoning and language understanding for diverse tasks.
Enhanced Reasoning Language Understanding Task Versatility
8.7/10
Performance
8.6/10
Accuracy
Proprietary
License
Google

Gemini Pro 1.5

Multimodal API Only Reasoning
A multimodal AI model with advanced capabilities in text, image processing, and reasoning, accessible via API for developers.
Text and Image Processing Advanced Reasoning Developer API
8.9/10
Performance
8.8/10
Accuracy
Proprietary
License
Google

CodeGemma

Code Generation Open Weights Developer
A code generation model designed for programming tasks, offering open weights for community use and customization.
Code Completion Syntax Understanding Customizable
8.5/10
Performance
8.4/10
Accuracy
Gemma Terms of Use
License
OpenAI

Sora

Video Generation API Only Creative
A video generation model capable of creating realistic and imaginative videos from text prompts. It was previewed in February 2024 and publicly released in December 2024.
Text-to-Video Realistic Rendering Creative Storytelling
9.2/10
Performance
9.0/10
Accuracy
Proprietary
License
Google

Gemini Ultra

LLM Multimodal
Google's largest multimodal model for text, vision, and reasoning tasks. It excels in complex problem-solving across diverse data types like code, images, and text.
Multimodal Advanced Reasoning Code Generation Vision
9.1/10
Performance
9.3/10
Accuracy
Commercial
License
Google

Gemini Pro

LLM Multimodal
Mid-tier multimodal Gemini model for text, vision, and reasoning. It offers a balance of performance and efficiency for various tasks.
Multimodal Reasoning Code Generation Vision
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
Google

Gemini Nano

LLM On-device
Lightweight Gemini model optimized for on-device tasks. It supports text and vision processing with low resource requirements.
On-device Multimodal Efficient Vision
7.8/10
Performance
8.2/10
Accuracy
Commercial
License
Microsoft

Phi-2

SLM Open Source
A 2.7 billion parameter SLM optimized for reasoning, coding, and math, delivering near state-of-the-art performance for its size. It’s designed for low-resource environments, making it ideal for on-device applications and research experimentation.
Reasoning Coding Math On-device
8.5/10
Performance
8.7/10
Accuracy
MIT
License
DeepSeek

DeepSeek Coder

LLM Open Source Coding
DeepSeek Coder is an open-source model optimized for programming tasks, enabling code generation and completion with high accuracy. It laid the foundation for DeepSeek's later coding-focused models.
Code Generation Code Completion Programming Support
8.0/10
Performance
8.2/10
Accuracy
DeepSeek License
License
DeepSeek

DeepSeek LLM

LLM Open Source General Purpose
DeepSeek LLM is a general-purpose language model available in Base and Chat variants, trained on 2 trillion tokens of English and Chinese text. It excels in text generation and conversational tasks.
Text Generation Conversational AI Multilingual Support Context-aware
8.3/10
Performance
8.5/10
Accuracy
DeepSeek License
License
Google

Tram

LLM Structured Data
Model for structured data processing and analysis. It excels in tasks involving tabular data and complex data structures.
Structured Data Data Analysis Table Processing Scalable
8.5/10
Performance
8.8/10
Accuracy
Commercial
License
OpenAI

DALL-E 3

Multimodal Image Generation
The latest DALL-E model with advanced image generation and integration into ChatGPT. It offers improved detail and prompt adherence.
Image Generation Prompt Adherence ChatGPT Integration
9.0/10
Performance
9.2/10
Accuracy
Proprietary
License
Microsoft

Phi-1.5

SLM Open Source
An enhanced version of Phi-1 with improved reasoning and text generation capabilities, maintaining a compact 1.3 billion parameter size. It excels in tasks like coding and math, offering a balance of efficiency and performance for local deployment.
Text Generation Reasoning Coding Local Deployment
8.2/10
Performance
8.4/10
Accuracy
MIT
License
Alibaba

Qwen-VL-7B

Multimodal Vision-Language Open Source
A vision-language model capable of understanding and generating content from images, supporting multi-round question answering.
Image Understanding Multi-Round QA Creative Capabilities Multilingual Support
8.8/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Google

Flan-PaLM

LLM Instruction-tuned
Instruction-tuned PaLM for improved task generalization. It leverages diverse instruction datasets to excel in zero-shot and few-shot scenarios across tasks.
Zero-shot Learning Instruction Tuning Advanced Reasoning Multilingual
9.2/10
Performance
9.3/10
Accuracy
Commercial
License
Google

Med-PaLM 2

LLM Medical
Advanced medical-domain model for clinical tasks and question answering. It improves on Med-PaLM with enhanced accuracy for healthcare applications.
Medical QA Clinical Tasks Advanced Reasoning Fine-tuning
9.2/10
Performance
9.4/10
Accuracy
Commercial
License
Meta AI

Code Llama

LLM Code
A specialized LLaMA variant for code generation and programming tasks. It supports multiple programming languages and excels in generating accurate, context-aware code.
Code Generation Programming Context-aware Fine-tuning
8.7/10
Performance
8.9/10
Accuracy
Llama 2 Community License
License
Google

Med-PaLM

LLM Medical
Medical-domain PaLM for clinical tasks like medical question answering. It is fine-tuned with medical data for high accuracy in healthcare applications.
Medical QA Clinical Tasks Advanced Reasoning Fine-tuning
9.1/10
Performance
9.3/10
Accuracy
Commercial
License
Meta AI

LLaMA 2

LLM Research
An improved version of LLaMA, released for both research and commercial use, offering enhanced performance and safety. It supports a wide range of NLP tasks with better generalization and efficiency.
Text Generation Safety-focused Efficient Fine-tuning
8.8/10
Performance
9.0/10
Accuracy
Llama 2 Community License
License
Stability AI

Stable Diffusion XL (SDXL)

Image Generation Open Source
A powerful evolution of Stable Diffusion, SDXL delivers superior image quality and prompt adherence at higher resolutions, ideal for professional use cases. It incorporates advanced training methods and supports diverse styles like 3D, photography, and painting, with optimized performance on consumer hardware.
High-Resolution Images Prompt Adherence Diverse Styles Consumer Hardware
9.0/10
Performance
9.1/10
Accuracy
CreativeML Open RAIL++-M
License
Google

U-PaLM

LLM Advanced
Continually trained PaLM with Unified Language Learning (UL2) objectives. It enhances generalization across tasks, improving performance in reasoning and multilingual settings.
Continual Learning Advanced Reasoning Multilingual Scalable
9.1/10
Performance
9.2/10
Accuracy
Commercial
License
Google

Flamingo-C

LLM Multimodal
Compact version of Flamingo for multimodal vision-language tasks. It maintains strong performance with reduced resource requirements.
Multimodal Vision and Language Efficient Image Captioning
8.6/10
Performance
8.9/10
Accuracy
Commercial
License
Microsoft

Phi-1

SLM Open Source
A small language model (SLM) with 1.3 billion parameters, specialized for Python code generation and trained largely on curated "textbook-quality" data. It achieves strong results for its size on coding benchmarks like HumanEval, focusing on lightweight, cost-effective AI solutions for developers.
Code Generation Efficient Research-focused Lightweight
8.0/10
Performance
8.2/10
Accuracy
MIT
License
Google

PaLM 2

LLM Advanced
Enhanced version of PaLM with improved efficiency and performance. It offers better reasoning, multilingual capabilities, and optimized training for diverse tasks.
Advanced Reasoning Multilingual Efficient Code Generation
9.2/10
Performance
9.3/10
Accuracy
Commercial
License
Stability AI

Stable LM

LLM Language Open Source
An open-source language model suite launched in April 2023, with 3B and 7B parameter models designed for efficient text and code generation on personal devices. Trained on an experimental dataset built on The Pile but about three times larger (roughly 1.5 trillion tokens), it offers high performance for conversational and coding tasks.
Text Generation Code Generation Efficient Consumer Devices
8.6/10
Performance
8.7/10
Accuracy
CC BY-SA 4.0
License
Google

PaLM-E

LLM Multimodal
Embodied PaLM for robotics and multimodal tasks, integrating language with sensory inputs. It enables language-guided control in physical environments like robotic navigation.
Multimodal Robotics Advanced Reasoning Embodied AI
8.9/10
Performance
9.1/10
Accuracy
Commercial
License
OpenAI

GPT-4

LLM Multimodal
A large multimodal model that accepts text and image inputs and produces text outputs. It exhibits human-level performance on various professional and academic benchmarks and supports complex tasks.
Text Generation Image Processing Advanced Reasoning Multimodal
9.0/10
Performance
9.2/10
Accuracy
Proprietary
License
Meta AI

LLaMA

LLM Research
A family of language models designed for research purposes, known for efficiency in natural language tasks. LLaMA models excel in text generation and understanding with optimized architectures.
Text Generation Efficient Research-focused Fine-tuning
8.5/10
Performance
8.8/10
Accuracy
Non-commercial
License
Google

Dramatron

LLM Creative
Model for scriptwriting and creative writing assistance. It generates coherent narratives and dialogue for storytelling applications.
Creative Writing Scriptwriting Text Generation Narrative
8.3/10
Performance
8.5/10
Accuracy
Commercial
License
OpenAI

GPT-3.5

LLM Conversational
A successor to GPT-3 fine-tuned with reinforcement learning from human feedback (RLHF) for conversational tasks. It powered the initial ChatGPT release in November 2022.
Conversational AI Text Generation Fine-tuned RLHF
8.7/10
Performance
8.9/10
Accuracy
Proprietary
License
Stability AI

Stable Diffusion 2.0

Image Generation Open Source
An enhanced version of Stable Diffusion, introducing inpainting, outpainting, and depth-guided image generation for improved creative control. It maintains high-quality outputs while addressing ethical concerns through filtered training data and permissive licensing for diverse applications.
Inpainting Outpainting Depth-guided Generation Text-to-Image
8.9/10
Performance
9.0/10
Accuracy
CreativeML Open RAIL++-M
License
Google

Flan-T5

LLM Instruction-tuned
Instruction-tuned T5 for zero-shot task generalization. It improves T5’s performance on unseen tasks by fine-tuning with diverse instruction datasets.
Zero-shot Learning Text-to-Text Fine-tuning Instruction Tuning
8.7/10
Performance
9.0/10
Accuracy
Apache 2.0
License
Google

Sparrow

LLM Conversational
Dialogue model with a focus on safety and ethical responses. It aims to reduce harmful outputs while maintaining conversational quality.
Conversational Safety-focused Dialogue Ethical
8.6/10
Performance
8.8/10
Accuracy
Commercial
License
OpenAI

Whisper

Speech Open Source
An automatic speech recognition model for transcribing and translating audio. It supports multilingual speech processing with high accuracy.
Speech Recognition Transcription Translation
8.7/10
Performance
8.9/10
Accuracy
MIT
License
Google

SayCan

LLM Robotics
Language-guided robotic control model for task execution. It combines language understanding with physical actions for robotic applications.
Robotics Language-guided Task Execution Multimodal
8.5/10
Performance
8.7/10
Accuracy
Commercial
License
Stability AI

Stable Diffusion

Image Generation Open Source
A pioneering open-source text-to-image model that generates high-quality, photorealistic images from textual prompts, leveraging latent diffusion techniques. Widely adopted for its flexibility and ability to run on consumer hardware, it supports creative applications in art, design, and media production.
Text-to-Image High Resolution Artistic Quality Open Source
8.8/10
Performance
8.9/10
Accuracy
CreativeML Open RAIL-M
License
Google

Minerva

LLM Reasoning
PaLM-based model optimized for quantitative reasoning tasks. It excels in solving mathematical and scientific problems with high accuracy.
Quantitative Reasoning Advanced Reasoning Math-focused Scalable
9.0/10
Performance
9.2/10
Accuracy
Commercial
License
Google

UL2

LLM Versatile
Unified Language Learning model with a mixture-of-denoisers approach. It supports diverse tasks by combining multiple pre-training objectives for flexibility.
Mixture-of-Denoisers Pre-training Multitask Scalable
8.7/10
Performance
8.9/10
Accuracy
Apache 2.0
License
Google

Gato

LLM Generalist
Generalist agent for text, vision, and robotics tasks. It performs well across diverse domains, from language to physical control.
Multimodal Robotics Generalist Scalable
8.6/10
Performance
8.8/10
Accuracy
Commercial
License
Meta AI

OPT

LLM Open Source
Open Pre-trained Transformer models for research, offering efficient large-scale language modeling. It provides performance comparable to GPT-3 with open access for academic use.
Text Generation Efficient Research-focused Scalable
8.6/10
Performance
8.9/10
Accuracy
Non-commercial
License
Google

PaLM

LLM Large-scale
Pathways Language Model, a 540B-parameter model for advanced reasoning and multilingual tasks. It excels in complex tasks like mathematical reasoning and code generation.
Advanced Reasoning Multilingual Code Generation Scalable
9.0/10
Performance
9.2/10
Accuracy
Commercial
License
Google

Flamingo

LLM Multimodal
Multimodal vision-language model for tasks like image captioning. It combines visual and textual understanding for versatile applications.
Multimodal Vision and Language Image Captioning Fine-tuning
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
OpenAI

DALL-E 2

Multimodal Image Generation
An enhanced version of DALL-E with improved image quality and editing capabilities. It supports higher resolution and more precise outputs.
Image Generation Image Editing High Resolution
8.8/10
Performance
9.0/10
Accuracy
Proprietary
License
Aleph Alpha

Luminous

LLM Multilingual
Multilingual text generation model with limited public details. It focuses on high-quality text generation for diverse languages and applications.
Multilingual Text Generation Pre-training Fine-tuning
8.4/10
Performance
8.7/10
Accuracy
Commercial
License
Google

Chinchilla

LLM Efficient
70B-parameter compute-optimal model trained on 1.4 trillion tokens. Using the same compute budget as the 280B-parameter Gopher, it outperforms larger models such as Gopher and GPT-3 on many NLP tasks.
Compute-optimal Efficient NLP Tasks Scalable
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
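Chinchilla's compute-optimal idea can be illustrated with two widely quoted approximations: training compute C ≈ 6·N·D FLOPs for N parameters and D tokens, and roughly 20 training tokens per parameter at the compute-optimal point. The sketch below is a back-of-the-envelope calculator built on those assumed constants, not DeepMind's fitted scaling law.

```python
import math

# Rules of thumb (approximations, not exact fits):
FLOPS_PER_PARAM_TOKEN = 6.0  # forward + backward pass cost per parameter per token
TOKENS_PER_PARAM = 20.0      # rough compute-optimal tokens-per-parameter ratio


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: C ~ 6 * N * D."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


def compute_optimal_split(budget: float) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens) with D = 20 * N.

    Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2,
    so N = sqrt(C / 120).
    """
    n_params = math.sqrt(budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    return n_params, TOKENS_PER_PARAM * n_params


if __name__ == "__main__":
    # Chinchilla's reported configuration: 70B parameters, 1.4T tokens.
    budget = training_flops(70e9, 1.4e12)
    n, d = compute_optimal_split(budget)
    print(f"budget ~ {budget:.2e} FLOPs -> N ~ {n / 1e9:.0f}B params, D ~ {d / 1e12:.1f}T tokens")
```

Under these approximations, Chinchilla's budget of about 5.9e23 FLOPs maps back to roughly 70B parameters and 1.4T tokens; spending the same budget on a larger model would leave fewer training tokens per parameter.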
Google

Ithaca

LLM Specialized
Model for historical text restoration, specializing in ancient Greek texts. It reconstructs missing text with high contextual accuracy.
Text Restoration Historical Texts Contextual Understanding Specialized
8.4/10
Performance
8.6/10
Accuracy
Commercial
License
Salesforce

CodeGen

LLM Code
Family of autoregressive code-generation models trained on natural language and programming language data. It synthesizes code from natural-language descriptions for various programming languages and applications.
Code Generation Programming Reasoning Scalable
8.7/10
Performance
8.9/10
Accuracy
Commercial
License
Google

AlphaCode

LLM Code
Model for competitive programming and code generation. In simulated Codeforces contests it achieved an average ranking within the top 54% of participants, solving complex algorithmic problems.
Code Generation Competitive Programming Reasoning Scalable
8.9/10
Performance
9.1/10
Accuracy
Commercial
License
Google

LaMDA

LLM Conversational
Language Model for Dialogue Applications, optimized for conversational tasks. It generates coherent and contextually relevant responses for natural dialogue.
Conversational Contextual Understanding Dialogue Fine-tuning
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
Google

RETRO

LLM Retrieval-augmented
Retrieval-augmented Transformer for enhanced language modeling. It uses external memory to improve performance on knowledge-intensive tasks.
Retrieval-augmented Knowledge-intensive NLP Tasks Scalable
8.7/10
Performance
8.9/10
Accuracy
Commercial
License
OpenAI

GPT-3.5 Turbo

LLM Conversational
A fast, cost-efficient variant of GPT-3.5 optimized for chat. It powered ChatGPT and was offered to developers through OpenAI's Chat Completions API.
Conversational AI High Efficiency Text Generation
8.8/10
Performance
9.0/10
Accuracy
Proprietary
License
NVIDIA

PeopleNet

Computer Vision Pretrained Real-time
PeopleNet is a computer vision model developed using NVIDIA TAO for real-time pedestrian detection and tracking in urban environments, optimized for smart cities and autonomous vehicles.
Pedestrian Detection Object Tracking Real-time Processing
8.5/10
Performance
8.7/10
Accuracy
NVIDIA License
License
NVIDIA

Bi3D

Computer Vision Pretrained Depth Estimation
Bi3D is a binary depth classification network for classifying object depth, ideal for collision avoidance in autonomous mobile robots.
Depth Classification Collision Avoidance Efficient Processing
8.3/10
Performance
8.5/10
Accuracy
NVIDIA License
License
NVIDIA

BioBERT

NLP Pretrained Biomedical
BioBERT is a BERT-based model further pre-trained on biomedical corpora (PubMed abstracts and PMC full-text articles) for biomedical text mining, with strong results on tasks like recognizing chemical and protein entities.
Biomedical Text Mining Entity Recognition Context-aware Processing
8.6/10
Performance
8.8/10
Accuracy
NVIDIA License
License
NVIDIA

Spleen Segmentation

Computer Vision Pretrained Medical
Spleen Segmentation is a pretrained model for volumetric 3D segmentation of the spleen from CT images, using advanced medical segmentation techniques.
3D Segmentation Medical Imaging High Accuracy
8.7/10
Performance
8.9/10
Accuracy
NVIDIA License
License
NVIDIA

Conformer

Speech Recognition Pretrained Multilingual
Conformer is a convolution-augmented transformer model for automatic speech recognition, supporting over 10 languages for applications like live captioning and voice assistants.
Speech Recognition Multilingual Support High Accuracy
8.8/10
Performance
9.0/10
Accuracy
NVIDIA License
License
NVIDIA

ECAPA-TDNN

Speech AI Pretrained Speaker Identification
ECAPA-TDNN is a time delay neural network-based model for speaker identification and verification, providing robust speaker embeddings for applications like medical conversation analysis.
Speaker Identification Speaker Verification Robust Embeddings
8.6/10
Performance
8.8/10
Accuracy
NVIDIA License
License
NVIDIA

Megatron 530B

LLM Pretrained Conversational
Megatron 530B (Megatron-Turing NLG 530B) is a 530-billion-parameter autoregressive transformer language model developed by NVIDIA and Microsoft, suited to NLP tasks like chatbots and virtual assistants.
Text Generation Conversational AI Efficient Training
8.9/10
Performance
9.1/10
Accuracy
NVIDIA License
License
Google

GLaM

LLM Efficient
Generalist Language Model, a 1.2T-parameter Mixture-of-Experts model. It achieves high performance with lower energy consumption for NLP tasks.
Mixture-of-Experts Efficient Scalable NLP Tasks
8.8/10
Performance
9.0/10
Accuracy
Commercial
License
Google

Flan

LLM Instruction-tuned
Instruction-tuned model family for zero-shot performance across tasks. It leverages fine-tuning on diverse datasets to improve generalization.
Zero-shot Learning Instruction Tuning Multitask Fine-tuning
8.6/10
Performance
8.9/10
Accuracy
Apache 2.0
License
Google

Gopher

LLM Large-scale
280B-parameter model focused on reasoning and language tasks. It competes with large-scale models in NLP benchmarks and research applications.
Advanced Reasoning Scalable NLP Tasks Research
8.9/10
Performance
9.1/10
Accuracy
Commercial
License
BigScience

T0

LLM Zero-shot
T5-based model for zero-shot task generalization. It leverages multitask prompting to perform well on unseen tasks without additional fine-tuning.
Zero-shot Learning Text-to-Text Multitask Fine-tuning
8.5/10
Performance
8.8/10
Accuracy
Apache 2.0
License
OpenAI

Codex

LLM Code
A specialized model for code generation and editing, powering tools like GitHub Copilot. It excels in understanding and generating programming languages.
Code Generation Code Editing Programming Support
8.6/10
Performance
8.8/10
Accuracy
Proprietary
License
Google

Perceiver

LLM Multimodal
General-purpose architecture for text and multimodal tasks. It uses cross-attention to handle diverse data types efficiently.
Multimodal Cross-attention Efficient Scalable
8.3/10
Performance
8.6/10
Accuracy
Commercial
License
Google

CANINE

LLM Multilingual
Character-based model for multilingual text processing without word tokenization. It excels in low-resource languages and noisy text environments.
Character-based Multilingual Robust Pre-training
7.8/10
Performance
8.1/10
Accuracy
Apache 2.0
License
Meta AI

UniT

LLM Multimodal
Unified Transformer for vision and language tasks. It handles multimodal inputs for applications like image captioning and visual question answering.
Multimodal Vision and Language Pre-training Fine-tuning
8.5/10
Performance
8.8/10
Accuracy
Commercial
License
Google

ByT5

LLM Character-based
Byte-level T5 model for character-based text processing. It operates directly on UTF-8 bytes, improving performance on noisy text and low-resource languages.
Byte-level Processing Multilingual Text-to-Text Robust
8.3/10
Performance
8.6/10
Accuracy
Apache 2.0
License
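Operating on UTF-8 bytes means the vocabulary is fixed at the 256 possible byte values plus a small number of special IDs, with no learned tokenizer. The sketch below maps text to byte IDs and back; the offset of 3 reserved IDs (pad, end-of-sequence, unknown) is an assumption about ByT5's ID layout, not a quote of its tokenizer code.

```python
from typing import List

# Assumption: IDs 0-2 are reserved for pad, end-of-sequence and unknown,
# so byte value b maps to ID b + 3.
SPECIAL_OFFSET = 3


def byte_ids(text: str) -> List[int]:
    """Encode text as UTF-8 bytes shifted past the reserved special IDs."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")]


def decode_ids(ids: List[int]) -> str:
    """Invert byte_ids, dropping any special IDs."""
    raw = bytes(i - SPECIAL_OFFSET for i in ids if i >= SPECIAL_OFFSET)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for word in ["cat", "café", "नमस्ते"]:
        ids = byte_ids(word)
        print(f"{word!r}: {len(word)} characters -> {len(ids)} byte IDs {ids}")
```

Non-ASCII characters expand to several byte IDs, so sequences get longer; that is the price of handling any script and noisy spellings without an out-of-vocabulary problem.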
Google

MUM

LLM Multimodal
Multitask Unified Model, a multimodal model for search combining text and images. It enhances search relevance by understanding complex, multimodal queries.
Multimodal Search Optimization Text and Image Scalable
8.7/10
Performance
8.9/10
Accuracy
Commercial
License
Google

ViT-BERT

LLM Multimodal
Hybrid vision-language model combining Vision Transformer and BERT. It excels in tasks requiring joint understanding of images and text.
Multimodal Vision and Language Pre-training Fine-tuning
8.6/10
Performance
8.9/10
Accuracy
Commercial
License
Google

MuRIL

LLM Multilingual
Multilingual Representations for Indian Languages, a BERT-based model tailored for Indian languages. It supports 17 languages (16 Indian languages plus English), enhancing NLP tasks like sentiment analysis and text classification.
Multilingual Indian Languages Pre-training Fine-tuning
7.9/10
Performance
8.2/10
Accuracy
Apache 2.0
License
Google

Switch Transformer

LLM Scalable
Mixture-of-experts model for scaling to trillions of parameters efficiently. It dynamically selects expert subnetworks, reducing compute costs for large-scale tasks.
Mixture-of-Experts Scalable Efficient Pre-training
8.9/10
Performance
9.1/10
Accuracy
Commercial
License
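Switch Transformer's central simplification is top-1 ("switch") routing: a learned router scores each token against every expert, only the highest-scoring expert runs, and its output is scaled by the router probability. The plain-Python sketch below shows that routing step for one token; the toy experts and router weights are hypothetical, and the capacity factor and load-balancing loss used in practice are omitted.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def switch_route(token: Vector,
                 router_weights: List[Vector],
                 experts: List[Callable[[Vector], Vector]]) -> Tuple[int, Vector]:
    """Top-1 ("switch") routing for a single token.

    Returns the index of the chosen expert and that expert's output
    scaled by the router probability (the gate value).
    """
    # One router logit per expert: dot product of the token with that expert's row.
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Only the selected expert runs; the others are skipped entirely.
    output = experts[best](token)
    return best, [probs[best] * y for y in output]


if __name__ == "__main__":
    # Two hypothetical toy experts and a hand-written router.
    experts = [lambda v: [2 * x for x in v], lambda v: [-x for x in v]]
    router = [[1.0, 0.0], [0.0, 1.0]]
    print(switch_route([3.0, 0.5], router, experts))
    print(switch_route([0.2, 4.0], router, experts))
```

Because each token activates a single expert, adding experts grows the parameter count without growing per-token compute, which is how the model scales toward trillions of parameters.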
OpenAI

CLIP

Vision Open Source
A vision-language model that maps images and text into a shared embedding space, enabling zero-shot image classification and image-text retrieval. It’s open-source and widely used in research.
Zero-shot Classification Text-Image Mapping Image-Text Retrieval
8.3/10
Performance
8.5/10
Accuracy
MIT
License
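CLIP-style zero-shot classification compares one image embedding with one text-prompt embedding per class (for example "a photo of a dog"), using cosine similarity scaled by a temperature and followed by a softmax. In the sketch below the 3-dimensional embeddings are made-up stand-ins for the outputs of CLIP's image and text encoders, and the scale of 100 is an assumed value.

```python
import math
from typing import Dict, List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb: Sequence[float],
                    prompt_embs: Dict[str, Sequence[float]],
                    scale: float = 100.0) -> Dict[str, float]:
    """Softmax over scaled cosine similarities between one image and each prompt."""
    img = normalize(image_emb)
    prompts = list(prompt_embs)
    logits = [scale * sum(a * b for a, b in zip(img, normalize(prompt_embs[p])))
              for p in prompts]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return {p: e / total for p, e in zip(prompts, exps)}


if __name__ == "__main__":
    # Made-up embeddings standing in for CLIP's encoder outputs.
    image = [0.9, 0.1, 0.05]
    prompts = {
        "a photo of a dog": [1.0, 0.0, 0.0],
        "a photo of a cat": [0.0, 1.0, 0.0],
        "a photo of a car": [0.0, 0.0, 1.0],
    }
    probs = zero_shot_probs(image, prompts)
    print(max(probs, key=probs.get), probs)
```

New classes only require new prompts, not retraining, which is what makes the classification "zero-shot".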
OpenAI

DALL-E

Multimodal Image Generation
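A text-to-image model generating creative images from textual prompts. It pairs a discrete variational autoencoder, which compresses images into tokens, with a 12-billion-parameter GPT-3-style transformer that models text and image tokens autoregressively.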
Image Generation Text-to-Image Creative Output
8.5/10
Performance
8.7/10
Accuracy
Proprietary
License
Google

mT5

LLM Multilingual
Multilingual T5, supporting 101 languages for global NLP applications. It extends T5’s text-to-text framework to low-resource languages, improving cross-lingual performance.
Multilingual Text-to-Text Pre-training Fine-tuning
8.4/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Google

ETC

LLM Long-context
Extended Transformer Construction with hierarchical attention for long-context processing. It handles extended sequences efficiently for tasks like document understanding.
Long-context Hierarchical Attention Pre-training NLP Tasks
7.9/10
Performance
8.2/10
Accuracy
Apache 2.0
License
Google

DocT5query

LLM Search
T5-based model for document ranking and query generation. It improves search relevance by generating queries for document indexing.
Document Ranking Query Generation Text-to-Text Fine-tuning
8.3/10
Performance
8.7/10
Accuracy
Apache 2.0
License
Google

BigBird

LLM Long-context
Transformer with sparse attention for processing long sequences. It reduces memory usage while maintaining performance on tasks like document classification.
Sparse Attention Long-context Pre-training NLP Tasks
8.0/10
Performance
8.4/10
Accuracy
Apache 2.0
License
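BigBird's sparse attention combines three patterns: a sliding window over neighbouring tokens, a few global tokens that see and are seen by every position, and a few random connections per query. The sketch below builds the resulting boolean attention mask at the token level; the window size, global count and random count are illustrative parameters, and the real model works on blocks of tokens rather than single positions.

```python
import random
from typing import List


def bigbird_mask(seq_len: int, window: int = 1, n_global: int = 1,
                 n_random: int = 1, seed: int = 0) -> List[List[bool]]:
    """Boolean mask where mask[i][j] is True if query i may attend to key j."""
    rng = random.Random(seed)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        # 1) Sliding window: neighbours within `window` positions on each side.
        for j in range(max(0, i - window), min(seq_len, i + window + 1)):
            mask[i][j] = True
        # 2) A few random keys per query.
        for j in rng.sample(range(seq_len), k=min(n_random, seq_len)):
            mask[i][j] = True
    # 3) Global tokens attend to every position and every position attends to them.
    for g in range(min(n_global, seq_len)):
        for k in range(seq_len):
            mask[g][k] = True
            mask[k][g] = True
    return mask


def density(mask: List[List[bool]]) -> float:
    """Fraction of query-key pairs that are actually computed."""
    n = len(mask)
    return sum(sum(row) for row in mask) / (n * n)


if __name__ == "__main__":
    for n in (64, 256, 512):
        print(n, round(density(bigbird_mask(n, window=3, n_global=2, n_random=2)), 4))
```

Each ordinary query attends to a bounded number of keys regardless of sequence length, so the total work grows roughly linearly rather than quadratically, and the printed density shrinks as the sequence gets longer.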
Google

GShard

LLM Multilingual
Sharding-based Mixture-of-Experts model optimized for translation tasks. It enables efficient scaling for multilingual applications with reduced computational costs.
Mixture-of-Experts Multilingual Translation Scalable
8.6/10
Performance
8.8/10
Accuracy
Commercial
License
OpenAI

GPT-3

LLM Commercial
A 175-billion-parameter model excelling in diverse NLP tasks, from text generation to question answering. It demonstrated strong few-shot (in-context) learning and powered early OpenAI API applications.
Text Generation Few-shot Learning Question Answering Translation
8.5/10
Performance
8.7/10
Accuracy
Proprietary
License
Microsoft

DeBERTa

LLM Open Source Multilingual
DeBERTa is a family of transformer-based language models (including base, large, V2, V3, and multilingual variants) that enhances BERT with disentangled attention and ELECTRA-style pre-training, achieving top performance on benchmarks like SuperGLUE and SQuAD. With sizes ranging from 22M to 1.5B parameters, it supports tasks like text classification, question answering, and cross-lingual transfer, powering Microsoft’s Turing NLRv4 for Bing and Azure.
Disentangled Attention ELECTRA Pre-training Cross-lingual Transfer Question Answering
9.0/10
Performance
9.2/10
Accuracy
MIT
License
Google

TAPAS

LLM Structured Data
Table-based question answering and parsing model. It processes structured data in tables, enabling natural language queries over tabular content.
Table Parsing Question Answering Structured Data Fine-tuning
8.2/10
Performance
8.6/10
Accuracy
Apache 2.0
License
Microsoft

CodeBERT

LLM Open Source Code
CodeBERT is a bimodal pre-trained model for programming and natural language, leveraging a large corpus of code and comments to excel in tasks like code search and documentation generation. It supports multiple programming languages and is widely used in tools for software development and AI-driven code analysis.
Code Search Documentation Generation Programming Languages Bimodal Pre-training
8.4/10
Performance
8.6/10
Accuracy
MIT
License
Google

MobileBERT

LLM Mobile
A compact BERT variant optimized for mobile and edge devices. It balances performance and resource usage, enabling efficient NLP on low-power hardware.
Mobile Optimization Low Latency Fine-tuning NLP Tasks
7.5/10
Performance
8.0/10
Accuracy
Apache 2.0
License
Allen AI

Longformer

LLM Long-context
Transformer with efficient attention for long-document processing. It reduces computational complexity while handling extended sequences for tasks like summarization.
Long-context Efficient Attention Pre-training NLP Tasks
8.1/10
Performance
8.5/10
Accuracy
Apache 2.0
License
OpenAI

Jukebox

Audio Open Source
A model that generates music, including rudimentary singing, as raw audio, conditioned on genre, artist, and lyrics. It’s an experimental open-source project.
Music Generation Raw Audio Genre Support
8.0/10
Performance
8.2/10
Accuracy
Non-commercial
License
Google

ELECTRA

LLM Efficient
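Efficient pre-training model using a generator-discriminator framework. It achieves high performance with less compute by replacing masked language modeling with replaced token detection, in which a discriminator learns to identify which input tokens a small generator has swapped out.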
Efficient Pre-training Token Detection Fine-tuning NLP Tasks
8.2/10
Performance
8.7/10
Accuracy
Apache 2.0
License
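The data side of replaced token detection can be sketched in a few lines, assuming a caller-supplied `sample_fn` as a stand-in for ELECTRA's small masked-LM generator. All function names here are hypothetical. Note the labeling rule: if the generator happens to reproduce the original token, that position counts as original, not replaced.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob, seed=0):
    """Replace a random subset of positions with [MASK]; this is the generator's input."""
    rng = random.Random(seed)
    return [MASK if rng.random() < mask_prob else t for t in tokens]

def fill_masks(masked, sample_fn):
    """Fill each [MASK] with a generator sample (sample_fn stands in for the generator)."""
    return [sample_fn(i) if t == MASK else t for i, t in enumerate(masked)]

def rtd_labels(original, corrupted):
    """Discriminator targets: 1 = replaced, 0 = original."""
    return [int(c != o) for o, c in zip(original, corrupted)]

# Usage with a hand-written "generator" that fills positions 2 and 4.
original = "the chef cooked the meal".split()
masked = ["the", "chef", MASK, "the", MASK]
corrupted = fill_masks(masked, lambda i: {2: "ate", 4: "meal"}[i])
labels = rtd_labels(original, corrupted)  # [0, 0, 1, 0, 0]
```

Position 4 was masked, but the sample "meal" matches the original, so its label is 0; only "ate" is a true replacement.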
Google

PEGASUS

LLM Summarization
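Pre-training with Extracted Gap-sentences for Abstractive Summarization. Whole sentences judged important are removed from a document, and the model learns to generate them from the remaining text, a pre-training objective closely aligned with producing concise, accurate summaries.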
Summarization Pre-training Fine-tuning Text Generation
8.3/10
Performance
8.7/10
Accuracy
Apache 2.0
License
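Gap-sentence selection can be sketched as follows, assuming a simplified unigram-overlap F1 as a stand-in for ROUGE-1 and whitespace tokenization. Each sentence is scored independently against the rest of the document, the top fraction is replaced by a `[MASK1]` placeholder, and the removed sentences (in document order) become the target. The 0.3 ratio is an illustrative default, and the function names are hypothetical.

```python
from collections import Counter

MASK1 = "[MASK1]"

def rouge1_f1(candidate, reference):
    """Unigram-overlap F1, a simplified stand-in for ROUGE-1."""
    c, r = Counter(candidate), Counter(reference)
    overlap = sum((c & r).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def gap_sentence_example(sentences, ratio=0.3):
    """Mask the highest-scoring sentences; return (model input, target summary)."""
    tokenized = [s.lower().split() for s in sentences]
    scores = []
    for i, sent in enumerate(tokenized):
        rest = [tok for j, other in enumerate(tokenized) if j != i for tok in other]
        scores.append(rouge1_f1(sent, rest))
    m = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = set(ranked[:m])
    source = " ".join(MASK1 if i in chosen else s for i, s in enumerate(sentences))
    target = " ".join(sentences[i] for i in sorted(chosen))
    return source, target
```

Choosing sentences that overlap most with the rest of the document favors "central" content, which is why the task resembles summarization more than random span masking does.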
Google

Meena

LLM Conversational
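Early 2.6B-parameter open-domain chatbot (2020) trained end-to-end on public social media conversations, and a precursor to LaMDA. It focuses on generating human-like dialogue with improved coherence and context awareness, measured by the Sensibleness and Specificity Average (SSA) metric introduced alongside it.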
Conversational Contextual Understanding Dialogue Pre-training
8.0/10
Performance
8.3/10
Accuracy
Proprietary
License
Google

Reformer

LLM Efficient
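Memory-efficient Transformer that replaces dot-product attention with locality-sensitive hashing (LSH) attention, reducing its cost from O(L²) to O(L log L) in sequence length L, and uses reversible residual layers so activations need not be stored for every layer. Together these make long sequences practical in resource-constrained environments.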
Memory-efficient Long-context Pre-training NLP Tasks
7.9/10
Performance
8.2/10
Accuracy
Apache 2.0
License
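The hashing step behind LSH attention can be sketched with random projections, assuming a Gaussian matrix R of shape d x (b/2) for b buckets: a vector x is assigned bucket argmax([xR, -xR]), so vectors pointing in similar directions tend to share a bucket. Reformer then sorts tokens by bucket and attends within chunks; that part is omitted here. The helper names are hypothetical.

```python
import random

def random_rotation(d, n_buckets, seed=0):
    """Random Gaussian projection matrix of shape d x (n_buckets // 2)."""
    if n_buckets % 2:
        raise ValueError("n_buckets must be even")
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(n_buckets // 2)] for _ in range(d)]

def lsh_bucket(x, rotation):
    """Angular LSH as described for Reformer: argmax over the concatenation [xR, -xR]."""
    cols = len(rotation[0])
    projected = [sum(x[r] * rotation[r][c] for r in range(len(x))) for c in range(cols)]
    scores = projected + [-p for p in projected]
    return max(range(len(scores)), key=scores.__getitem__)
```

The bucket depends only on the direction of x: scaling x by a positive factor leaves it unchanged, and negating x moves it to the "opposite" bucket, b/2 positions away.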
Meta AI

XLM-R

LLM Multilingual
XLM-RoBERTa, a RoBERTa-style masked language model pre-trained on 2.5TB of filtered CommonCrawl text in 100 languages. It enables strong cross-lingual transfer for understanding tasks such as text classification, named entity recognition, and question answering.
Multilingual Pre-training Fine-tuning Cross-lingual
8.3/10
Performance
8.7/10
Accuracy
MIT
License
Microsoft

DialoGPT

LLM Open Source Conversational
DialoGPT is a conversational model trained on Reddit dialogues, designed to generate human-like responses for interactive chat applications. Its GPT-2-based architecture and large-scale dialogue data enable coherent and contextually relevant conversations, influencing later models like BlenderBot.
Conversational AI Dialogue Generation Context-aware Human-like Responses
8.2/10
Performance
8.4/10
Accuracy
MIT
License
Google

T5

LLM Versatile
Text-to-Text Transfer Transformer, a unified framework for NLP tasks. It converts all tasks into a text-to-text format, enabling versatile applications like translation and summarization.
Text-to-Text Pre-training Fine-tuning Multitask
8.5/10
Performance
8.8/10
Accuracy
Apache 2.0
License
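A small sketch of the text-to-text framing, assuming three tasks and the task prefixes shown in the T5 paper ("translate English to German:", "summarize:", "cola sentence:"). Even a classification label becomes a word the decoder generates. `to_text_to_text` is a hypothetical helper, not part of any T5 library.

```python
def to_text_to_text(task, **fields):
    """Cast different NLP tasks into (input text, target text) pairs, T5-style."""
    if task == "translate_en_de":
        return f"translate English to German: {fields['source']}", fields["target"]
    if task == "summarize":
        return f"summarize: {fields['document']}", fields["summary"]
    if task == "cola":
        # Grammatical-acceptability labels become target words, not class ids.
        label = "acceptable" if fields["label"] == 1 else "unacceptable"
        return f"cola sentence: {fields['sentence']}", label
    raise ValueError(f"unknown task: {task}")
```

Because every task shares this string-in, string-out interface, one model, loss, and decoding procedure can serve all of them.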
Meta AI

BART

LLM Open Source
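Bidirectional and Auto-Regressive Transformer that pairs a BERT-like bidirectional encoder with a GPT-like autoregressive decoder. It is pre-trained as a denoising autoencoder, reconstructing text corrupted by span masking (text infilling) and sentence shuffling, and excels at generation tasks such as summarization while also adapting to translation.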
Text Generation Summarization Translation Fine-tuning
8.4/10
Performance
8.8/10
Accuracy
MIT
License
Google

ALBERT

LLM Efficient
A Lite BERT with reduced parameters for efficiency while maintaining performance. It uses factorized embedding parameterization and cross-layer parameter sharing, ideal for resource-constrained environments.
Parameter Efficiency Scalable Fine-tuning NLP Tasks
7.8/10
Performance
8.3/10
Accuracy
Apache 2.0
License
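The savings from ALBERT's two tricks can be checked with simple arithmetic. This sketch assumes a 30,000-token vocabulary, hidden size H = 768, embedding size E = 128, a 3,072-wide feed-forward layer, and 12 layers (BERT-Base-like shapes); the helper names are hypothetical. Factorization replaces one V x H embedding matrix with V x E plus E x H, and cross-layer sharing stores one layer's weights instead of 12.

```python
def embedding_params(vocab_size, hidden, embed=None):
    """Token-embedding parameters: V*H for BERT, V*E + E*H for ALBERT (biases ignored)."""
    if embed is None:
        return vocab_size * hidden
    return vocab_size * embed + embed * hidden

def layer_params(hidden, ffn):
    """Weights and biases of one Transformer encoder layer."""
    attention = 4 * (hidden * hidden + hidden)  # Q, K, V, and output projections
    feed_forward = hidden * ffn + ffn + ffn * hidden + hidden
    layer_norms = 2 * 2 * hidden                # two LayerNorms, gain + bias each
    return attention + feed_forward + layer_norms

def encoder_params(hidden, ffn, layers, shared):
    """With cross-layer sharing, all layers reuse a single set of weights."""
    return layer_params(hidden, ffn) * (1 if shared else layers)

print(embedding_params(30_000, 768))              # 23040000
print(embedding_params(30_000, 768, 128))         # 3938304
print(encoder_params(768, 3072, 12, shared=False))  # 85054464
print(encoder_params(768, 3072, 12, shared=True))   # 7087872
```

Together the two changes shrink these components from roughly 108M to roughly 11M parameters in this configuration, although sharing does not reduce compute, since every layer is still executed.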
Microsoft

MT-DNN

LLM Open Source
Multi-Task Deep Neural Network (MT-DNN) combines multi-task learning with pre-trained language models like BERT to achieve robust performance across diverse NLP tasks like sentiment analysis and text classification. Its knowledge distillation techniques enable efficient fine-tuning, making it a versatile choice for enterprise NLP applications.
Multi-task Learning Knowledge Distillation Text Classification Sentiment Analysis
8.3/10
Performance
8.5/10
Accuracy
MIT
License
Meta AI

RoBERTa

LLM Open Source
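A Robustly Optimized BERT Pretraining Approach. It keeps BERT's architecture but improves pre-training with dynamic masking, removal of the next-sentence-prediction objective, larger batches, longer training, and about ten times more data (160GB of text), boosting results on tasks like text classification.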
Bidirectional Context Pre-training Fine-tuning NLP Tasks
8.2/10
Performance
8.6/10
Accuracy
MIT
License
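Dynamic masking can be sketched as follows, assuming BERT's usual recipe (select about 15% of positions; of those, 80% become `[MASK]`, 10% a random token, 10% unchanged). The difference from static masking is only *when* masks are drawn: a fresh mask each time the sequence is used, rather than one fixed mask produced during preprocessing. The function names are hypothetical.

```python
import random

MASK = "[MASK]"

def mlm_mask(tokens, vocab, rng, prob=0.15):
    """BERT-style masking; labels hold the original token at selected positions, else None."""
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < prob:
            labels[i] = tok  # the model must recover the original token here
            roll = rng.random()
            if roll < 0.8:
                inputs[i] = MASK
            elif roll < 0.9:
                inputs[i] = rng.choice(vocab)
            # else: leave the token unchanged
    return inputs, labels

def dynamic_epochs(tokens, vocab, epochs, seed=0):
    """RoBERTa-style dynamic masking: draw a new mask for every pass over the sequence."""
    rng = random.Random(seed)
    return [mlm_mask(tokens, vocab, rng) for _ in range(epochs)]
```

Across epochs the model sees the same sentence with different positions hidden, which acts as a cheap form of data augmentation.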
Microsoft

UniLM

LLM Open Source
Unified Language Model (UniLM) is a pre-trained model supporting both natural language understanding and generation tasks, such as summarization and dialogue, through a shared transformer architecture. Its bidirectional, unidirectional, and sequence-to-sequence pre-training objectives, selected by controlling the self-attention mask, make it highly flexible for applications in Azure Cognitive Services.
Text Generation Summarization Dialogue Flexible Pre-training
8.5/10
Performance
8.7/10
Accuracy
MIT
License
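The mask-based switching between objectives can be sketched as follows, assuming `mask[i][j] = True` means position i may attend to position j, and a sequence-to-sequence input whose first `source_len` tokens are the source segment. `unilm_mask` is a hypothetical helper.

```python
def unilm_mask(n, mode, source_len=0):
    """Self-attention mask selecting one of UniLM's pre-training objectives."""
    def allowed(i, j):
        if mode == "bidirectional":
            return True
        if mode == "left_to_right":
            return j <= i
        if mode == "right_to_left":
            return j >= i
        if mode == "seq2seq":
            if j < source_len:
                return True  # every token sees the whole source segment
            # Target tokens see earlier target tokens and themselves; source tokens see no target.
            return i >= source_len and j <= i
        raise ValueError(f"unknown mode: {mode}")

    return [[allowed(i, j) for j in range(n)] for i in range(n)]
```

The transformer weights stay identical across objectives; only this mask changes, which is how one network learns both understanding and generation.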
Google

XLNet

LLM Open Source
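Generalized autoregressive pre-training model developed with Carnegie Mellon University. It maximizes the expected likelihood over permutations of the factorization order, so each token learns from context on both sides without [MASK] tokens, and it outperformed BERT on several benchmarks at release.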
Autoregressive Permutation Training Fine-tuning NLP Tasks
8.1/10
Performance
8.6/10
Accuracy
Apache 2.0
License
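Permutation language modeling can be sketched by sampling a factorization order and deriving which positions each token may condition on: exactly those that come earlier in the sampled order, wherever they sit in the original sentence. This corresponds to XLNet's query-stream mask; two-stream attention and partial prediction are omitted, and `permutation_attention` is a hypothetical name.

```python
import random

def permutation_attention(n, seed=0):
    """Sample a factorization order; can_see[i][j] is True if j precedes i in that order."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    rank = {pos: r for r, pos in enumerate(order)}
    can_see = [[rank[j] < rank[i] for j in range(n)] for i in range(n)]
    return order, can_see
```

The first token in the sampled order is predicted with no context, the last with all other tokens. Averaged over many orders, every token is eventually predicted from both left and right context.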
OpenAI

GPT-2

LLM Open-weight
A 1.5B-parameter transformer language model with markedly more coherent text generation than GPT-1. Citing misuse concerns, OpenAI released it in stages during 2019, publishing the full model in November; it showed strong zero-shot task performance.
Text Generation Zero-shot Learning Coherent Output
7.8/10
Performance
8.0/10
Accuracy
Modified MIT
License
Google

BERT

LLM Open Source
Bidirectional Encoder Representations from Transformers, designed for deep contextual understanding of text. It revolutionized NLP by enabling bidirectional context in pre-training, excelling in tasks like question answering and text classification.
Bidirectional Context Pre-training Fine-tuning NLP Tasks
8.0/10
Performance
8.5/10
Accuracy
Apache 2.0
License
OpenAI

GPT-1

LLM Research
The first in the GPT series (2018), a 12-layer transformer decoder pre-trained without labels on the BooksCorpus and then fine-tuned on each downstream task. This generative pre-training approach laid the foundation for later GPT models.
Text Generation Unsupervised Learning Pre-training
6.5/10
Performance
6.8/10
Accuracy
MIT
License
Microsoft

Turing

LLM Proprietary
A family of language models developed by Microsoft Research, used across products like Bing and Azure for tasks like search and text generation. It includes Turing-NLG, a 17B-parameter generative model announced in 2020, and Turing NLR representation models such as Turing NLRv4.
Text Generation Search Efficient Scalable
8.7/10
Performance
8.9/10
Accuracy
Proprietary
License