The Grand AI Handbook

Google

Gemma 3 QAT

LLM Quantized Open

Released: April 18, 2025

Version: 3.0 QAT

Size: Not disclosed

Gemma 3 QAT models are quantization-aware trained versions of Gemma 3, enabling state-of-the-art AI performance on consumer-grade GPUs by reducing memory requirements without compromising quality.

Quantization-Aware Training Reduced Memory Footprint High Performance on Consumer GPUs Maintained Model Quality Optimized for BF16 Precision

8.8/10

Performance

9.0/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Anthropic

Claude 3.7 Sonnet

LLM Hybrid Reasoning Proprietary

Released: February 24, 2025

Version: 3.7

Size: Not disclosed

Claude 3.7 Sonnet is Anthropic’s most advanced hybrid reasoning model, capable of delivering both rapid responses and extended, step-by-step reasoning. It excels in real-world coding, planning, and instruction-following tasks, offering fine-grained control over reasoning depth and output length.

Hybrid Reasoning (Quick vs. Extended Thinking) Extended Thinking Mode (Up to 128K Tokens) Agentic Coding via Claude Code State-of-the-Art on SWE-bench and TAU-bench Multimodal Capabilities Available via API, Amazon Bedrock, and Google Vertex AI

9.5/10

Performance

9.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google DeepMind

Gemini 2.5 Pro

LLM Multimodal Thinking

Released: March 2025

Version: 2.5

Size: Not disclosed

Gemini 2.5 Pro is Google DeepMind’s most advanced multimodal AI model with built-in thinking capabilities, excelling in complex reasoning, coding, and math/science tasks. It leads benchmarks like GPQA, AIME 2025, and SWE-Bench Verified, offering state-of-the-art performance for developers and enterprises.

Native Multimodal Support (Text, Images, Code) 1M Token Context Window Dynamic Thinking for Enhanced Reasoning High Performance in Math and Science Benchmarks Advanced Coding (Web Apps, Agentic Applications) Available via Google AI Studio and Vertex AI

9.3/10

Performance

9.4/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Google DeepMind

Gemini 2.5 Flash

LLM Multimodal Thinking

Released: April 2025

Version: 2.5

Size: Not disclosed

Gemini 2.5 Flash is a highly efficient multimodal AI model with hybrid reasoning, designed for developers needing fast responses for high-volume tasks. It offers controllable thinking, strong performance in coding and reasoning, and a cost-effective solution for applications like chatbots and data extraction.

Native Multimodal Support (Text, Images, Code) 1M Token Context Window Controllable Dynamic Thinking (On/Off) Optimized for High-Volume Tasks (Chat Apps, Data Extraction) High-Speed Inference with Low Latency Available via Google AI Studio and Vertex AI

9.0/10

Performance

9.1/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Google DeepMind

Gemini 2.5 Pro

LLM Multimodal Thinking

Released: March 2025

Version: 2.5

Size: Not disclosed

Gemini 2.5 Pro is Google DeepMind’s most advanced multimodal AI model with built-in thinking capabilities, excelling in complex reasoning, coding, and math/science tasks. It leads benchmarks like GPQA, AIME 2025, and SWE-Bench Verified, offering state-of-the-art performance for developers and enterprises.

Native Multimodal Support (Text, Images, Code) 1M Token Context Window Dynamic Thinking for Enhanced Reasoning High Performance in Math and Science Benchmarks Advanced Coding (Web Apps, Agentic Applications) Available via Google AI Studio and Vertex AI

9.3/10

Performance

9.4/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Google DeepMind

Gemini 2.5 Flash

LLM Multimodal Thinking

Released: April 2025

Version: 2.5

Size: Not disclosed

Gemini 2.5 Flash is a highly efficient multimodal AI model with hybrid reasoning, designed for developers needing fast responses for high-volume tasks. It offers controllable thinking, strong performance in coding and reasoning, and a cost-effective solution for applications like chatbots and data extraction.

Native Multimodal Support (Text, Images, Code) 1M Token Context Window Controllable Dynamic Thinking (On/Off) Optimized for High-Volume Tasks (Chat Apps, Data Extraction) High-Speed Inference with Low Latency Available via Google AI Studio and Vertex AI

9.0/10

Performance

9.1/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Mistral AI

Mistral Saba

LLM Regional Open

Released: February 17, 2025

Version: 1.0

Size: 24B

Mistral Saba is a 24B parameter regional language model tailored for Arabic and South Asian languages. Trained on curated datasets from the Middle East and South Asia, it delivers culturally nuanced and linguistically accurate responses, optimized for single-GPU deployment and high-speed inference.

Arabic and Indian-Origin Language Support Cultural Context Awareness Single-GPU Deployment (150+ tokens/sec) Fine-Tuning for Domain-Specific Expertise Ideal for Conversational AI and Content Creation Available via API and On-Premise Deployment

9.0/10

Performance

9.2/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Meta

Llama 4 Scout

LLM Multimodal Open-Weight

Released: April 2025

Version: 4.0

Size: 17B active, 109B total

Llama 4 Scout is a multimodal, open-weight AI model with a Mixture of Experts (MoE) architecture, designed for advanced long-context reasoning, multi-document summarization, and visual understanding. It supports an industry-leading context window of 10 million tokens and fits on a single NVIDIA H100 GPU.

Native Multimodal Support (Text and Image Input) 10M Token Context Window Mixture of Experts (MoE) Architecture with 16 Experts Optimized for Multi-Document Summarization Advanced Reasoning Over Large Codebases Supports 12 Languages

8.8/10

Performance

8.9/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Meta

Llama 4 Maverick

LLM Multimodal Open-Weight

Released: April 2025

Version: 4.0

Size: 17B active, 400B total

Llama 4 Maverick is a multimodal, open-weight AI model with a Mixture of Experts (MoE) architecture, optimized for high-quality general assistant and chat use cases. It excels in multilingual image and text understanding, offering high performance at a lower cost compared to Llama 3.3 70B.

Native Multimodal Support (Text and Image Input) 1M Token Context Window Mixture of Experts (MoE) Architecture with 128 Experts Multilingual Support (12 Languages) Optimized for Creative Writing and Chat Applications High-Speed Inference with FP8 Quantization

9.0/10

Performance

9.1/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Meta

Llama 4 Behemoth

LLM Multimodal Open-Weight

Released: Not yet released (in training as of April 2025)

Version: 4.0

Size: 288B active, ~2T total

Llama 4 Behemoth is Meta AI's most powerful multimodal, open-weight AI model, featuring a Mixture of Experts (MoE) architecture with 288 billion active parameters. Designed as a teacher model for distilling knowledge to smaller Llama 4 models, it excels in STEM benchmarks and advanced reasoning tasks. It is still in training and not yet publicly available.

Native Multimodal Support (Text and Image Input) Mixture of Experts (MoE) Architecture with 16 Experts High Performance on STEM Benchmarks Teacher Model for Knowledge Distillation Optimized for Large-Scale Data Analysis and Complex Reasoning Supports Multilingual Tasks (12 Languages)

9.2/10

Performance

9.3/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Cohere

Command A

LLM Multilingual Enterprise

Released: March 2025

Version: A-03-2025

Size: Not disclosed

Command A is Cohere’s generative AI model designed for enterprise applications, offering high performance in agentic tasks, retrieval-augmented generation (RAG), and multilingual handling. It achieves efficiency with minimal compute, requiring only two GPUs, and supports 23 languages for business, STEM, and coding tasks.

Advanced Retrieval-Augmented Generation (RAG) Agentic Tool Use for Enterprise Workflows Multilingual Support (23 Languages) High-Speed Token Streaming (73 tokens/sec) Optimized for Conversational and Instruct Modes Integration with North AI Platform

9.1/10

Performance

9.2/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Cohere For AI

Aya Vision

LLM Multimodal Multilingual Open-Weight

Released: March 2025

Version: Not specified

Size: 8B and 32B

Aya Vision is a state-of-the-art, open-weight multimodal AI model from Cohere For AI, excelling in multilingual and vision-language tasks across 23 languages. It supports image captioning, visual question answering, and text generation, optimized for research and non-commercial use.

Multimodal Support (Text and Image Input) Multilingual Capabilities (23 Languages) Image Captioning and Visual Question Answering Text Extraction from Images (OCR) Optimized for Multilingual Benchmarks (AyaVisionBench) Open-Weight Availability for Research

8.9/10

Performance

9.0/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

LG AI Research

EXAONE 3.0

LLM Bilingual Open

Released: August 2024

Version: 3.0

Size: 7.8B

EXAONE 3.0 is LG AI Research's first open-source, instruction-tuned bilingual language model, optimized for English and Korean. It achieves top-tier performance in real-world applications and complex reasoning tasks.

Instruction Tuning Bilingual Support (English & Korean) High Performance in Real-World Applications Complex Reasoning Capabilities Open-Source Availability

Not disclosed

Performance

Not disclosed

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Mistral AI

Mistral Small 3.1

LLM Multimodal Open

Released: March 17, 2025

Version: 3.1

Size: Not disclosed

Mistral Small 3.1 is an open-source, state-of-the-art language model optimized for multimodal understanding, multilingual capabilities, and low-latency performance. It features an expanded context window and is suitable for both enterprise and consumer applications.

Multimodal Input Processing Multilingual Support 128k Token Context Window High Inference Speed (150 tokens/sec) On-Device Deployment Capability Function Calling Fine-Tuning for Specialized Domains

9.0/10

Performance

9.2/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

TxGemma

LLM Domain-Specific Open

Released: March 25, 2025

Version: 1.0

Size: Not disclosed

TxGemma is a collection of open models tailored for therapeutic development, leveraging large language models to understand and predict properties of therapeutic entities, aiming to improve efficiency in drug discovery.

Therapeutic Entity Understanding Property Prediction for Drug Candidates Optimized for Biomedical Applications Supports Drug Discovery Processes Open Models for Research and Development

8.6/10

Performance

8.9/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Google

ShieldGemma 2

Safety Image Classifier Open

Released: March 12, 2025

Version: 2.0

Size: 4B

ShieldGemma 2 is a 4 billion parameter image safety classifier built on Gemma 3. It is designed to detect harmful content in both synthetic and natural images, aiding in the development of safer AI applications.

Image Safety Classification Detection of Harmful Content Support for Synthetic and Natural Images Integration with Vision-Language Models Open Weights for Fine-Tuning

Not disclosed

Performance

Not disclosed

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Google

PaliGemma 2 Mix

LLM Multimodal Open

Released: March 2025

Version: 2 Mix

Size: Not disclosed

PaliGemma 2 Mix is a multimodal model combining language and vision capabilities, designed for tasks requiring understanding of both text and images. It builds upon the PaliGemma series with enhanced performance.

Text and Image Understanding Enhanced Multimodal Reasoning Improved Performance over Previous Versions Suitable for Diverse Applications Optimized for Efficiency

8.5/10

Performance

8.7/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

Google

Gemma 3

LLM Multimodal Open

Released: March 12, 2025

Version: 3.0

Size: Not disclosed

Gemma 3 is Google's most advanced open model, supporting multimodal inputs including text, images, and short videos. It is optimized for performance on a single GPU and includes safety features like the ShieldGemma 2 image safety classifier.

Multimodal Input Processing Vision Encoder for High-Resolution Images Image Safety Classification Support for 35+ Languages Optimized for Single GPU Performance

9.0/10

Performance

9.2/10

Accuracy

Open (with restrictions)

License

Read Paper View Model Page Learning Resources

OpenAI

o3

LLM Multimodal Proprietary

Released: April 16, 2025

Version: o3

Size: Not disclosed

OpenAI’s most advanced reasoning model to date, o3 excels in complex tasks across coding, mathematics, science, and visual analysis. It integrates multimodal inputs and agentic tool use, setting new standards in AI capabilities.

Advanced Reasoning Multimodal Input Processing Agentic Tool Use Image Analysis Code Generation Mathematical Problem Solving

9.5/10

Performance

9.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

OpenAI

o4-mini

LLM Multimodal Proprietary

Released: April 16, 2025

Version: o4-mini

Size: Not disclosed

A compact and efficient model, o4-mini delivers high performance in reasoning tasks, particularly in mathematics and coding, while being optimized for speed and cost-effectiveness.

Efficient Reasoning Mathematical Problem Solving Code Generation Visual Input Processing Tool Integration

8.8/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Microsoft

BitNet b1.58 2B4T

SLM Open Source

Released: April 2025

Version: 1.0

Size: 2B parameters

A 1-bit AI model with 2 billion parameters, released in April 2025, designed for hyper-efficient performance on CPUs, including Apple’s M2, using a custom bitnet.cpp framework. It achieves double the speed and lower memory usage compared to traditional models, ideal for resource-constrained devices.

Efficient CPU-optimized Low Memory Fast Inference

8.8/10

Performance

8.9/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Virtual Camera

3D Video Research

Released: March 2025

Version: 1.0

Size: Not disclosed

A research-preview multi-view diffusion model launched in March 2025, transforming 2D images into immersive 3D videos with realistic depth and perspective. It eliminates the need for complex reconstruction, making it ideal for virtual reality and cinematic applications.

2D-to-3D Video Immersive Depth No Reconstruction Cinematic Use

9.0/10

Performance

9.1/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

DeepSeek

DeepSeek-V3-0324

LLM Open Source MoE

Released: March 2025

Version: 3.0324

Size: 671B parameters (37B activated)

DeepSeek-V3-0324 is an upgraded V3 model with enhanced reasoning, coding, and tool-use capabilities, outperforming GPT-4.5 in math and coding.

Mathematical Reasoning Code Generation Tool Use Large Context Window

9.3/10

Performance

9.5/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

NVIDIA

Cosmos Nemotron

LLM Open Source Physical AI

Released: March 2025

Version: 1.0

Size: Not specified

Cosmos Nemotron is an open reasoning model for physical AI development, offering customizable world generation for robotics and simulation.

World Generation Physical AI Reasoning Customizable Simulation

9.1/10

Performance

9.3/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

NVIDIA

GR00T N1

Robotics Open Source Humanoid AI

Released: March 2025

Version: 1.0

Size: Not specified

GR00T N1 is an open, customizable foundation model for humanoid robot reasoning, enabling advanced perception and action in robotics.

Humanoid Robot Reasoning Perception Action Planning

9.0/10

Performance

9.2/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Alibaba

Qwen2.5-Coder Series

Code Generation Open Source Multilingual

Released: March 2025

Version: 2.5

Size: 3

A series of code-specific models optimized for code generation, reasoning, and fixing, available in multiple sizes.

Code Generation Code Reasoning Code Fixing Supports 92 Programming Languages

9.3/10

Performance

9.1/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Alibaba

Qwen2.5-Math Series

Math Open Source Multilingual

Released: March 2025

Version: 2.5

Size: 3

An advanced math-specific model series extending Qwen2.5 capabilities with high performance in mathematical reasoning tasks.

Math Word Problems Multi-Hop Reasoning Symbolic Math MathQA Tasks

9.2/10

Performance

9.3/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-4.5

LLM Conversational

Released: February 2025

Version: 1.0

Size: Not disclosed

Codenamed Orion, this large model reduces hallucinations compared to GPT-4o and o1. It’s designed for conversational tasks and broad knowledge applications.

Text Generation Low Hallucination Conversational AI

9.3/10

Performance

9.5/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Microsoft

Magma

Multimodal Agentic AI

Released: February 2025

Version: 1.0

Size: Not disclosed

A multimodal AI model introduced in February 2025, combining visual and language processing to control software interfaces and robotic systems, enabling agentic AI for autonomous task execution. It features Set-of-Mark and Trace-of-Mark for spatial intelligence, with public code released on GitHub.

Visual Processing Language Processing Robotic Control Spatial Intelligence

9.0/10

Performance

9.1/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Microsoft

Muse (WHAM)

Generative AI Open Source

Released: February 2025

Version: 1.0

Size: Not disclosed

A generative AI model for video game visuals and controller actions, released in February 2025, developed with Ninja Theory and published in Nature. It supports gameplay ideation through the WHAM Demonstrator, with open-source weights and sample data available on Azure AI Foundry.

Game Visuals Controller Actions Gameplay Ideation Interactive Interface

8.9/10

Performance

9.0/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

xAI

Grok-3

LLM Web Search Advanced Reasoning

Released: February 2025

Version: 3.0

Size: Unknown

The latest Grok model featuring reflection capabilities and advanced web search integration.

Reflection Capabilities DeepSearch Integration Advanced Reasoning

9.2/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Point Aware 3D (SPAR3D)

3D Real-time

Released: January 2025

Version: 1.0

Size: Not disclosed

A cutting-edge 3D generation model introduced in January 2025, enabling real-time editing and complete structure generation from a single image in under a second. It supports rapid prototyping for gaming, architecture, and entertainment with high precision and efficiency.

Real-time Editing Image-to-3D High Precision Rapid Prototyping

9.2/10

Performance

9.3/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

DeepSeek

DeepSeek-R1

LLM Open Source Reasoning

Released: January 2025

Version: 1.0

Size: 671B parameters (37B activated)

DeepSeek-R1 is a reasoning-focused model fine-tuned from V3, competing with top models like OpenAI’s o1 in math, coding, and reasoning tasks.

Chain-of-Thought Reasoning Mathematical Reasoning Code Generation Self-correction

9.2/10

Performance

9.4/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

DeepSeek

Janus-Pro-7B

Multimodal Open Source Vision

Released: January 2025

Version: 1.0

Size: 7B parameters

Janus-Pro-7B is a multimodal vision model for image understanding and generation, outperforming models like DALL-E 3 on key benchmarks.

Image Understanding Image Generation Multimodal Processing

8.9/10

Performance

9.1/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Alibaba

Qwen2.5 Series

LLM Instruction-Tuned Open Source

Released: January 2025

Version: 2.5

Size: 3

The latest series of decoder-only language models, available in various sizes and optimized for instruction following and structured output generation.

Instruction Following Structured Output Generation Multilingual Support

9.2/10

Performance

9.0/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

O3 Mini

Reasoning API Only Efficiency

Released: January 2025

Version: 3.0

Size: Not disclosed

A lightweight version of O3 with efficient reasoning capabilities, expected to be accessible via API.

Efficient Reasoning Problem Solving Task Versatility

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Amazon

NOVA

Multimodal API Only Enterprise

Released: December 2024

Version: 1.0

Size: Not disclosed

A suite of AI models for various tasks, including text and image processing, accessible via API for enterprise applications.

Text and Image Processing Enterprise Integration Task Versatility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

OpenAI

SORA

Video Generation API Only Creative

Released: December 2024

Version: 1.0

Size: Not disclosed

A video generation model for creating high-quality videos from text prompts, now publicly released via API.

Text-to-Video High-Quality Output Creative Storytelling

9.3/10

Performance

9.2/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Cohere

Command R7B

Language Model Open Weights Performance

Released: December 2024

Version: 7B

Size: 7B

An open-weight language model optimized for advanced NLP tasks, offering high performance and flexibility.

Advanced NLP Task Versatility Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

O1

Reasoning API Only Problem-Solving

Released: December 2024

Version: 1.0

Size: Not disclosed

An advanced reasoning model with superior problem-solving capabilities, accessible via API.

Advanced Reasoning Problem Solving Task Versatility

9.5/10

Performance

9.4/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

OpenAI

O1 Pro

Reasoning API Only Professional

Released: December 2024

Version: 1.0

Size: Not disclosed

A professional-grade version of O1 with enhanced reasoning and task capabilities, accessible via API.

Enhanced Reasoning Professional Tasks Task Versatility

9.6/10

Performance

9.5/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

OpenAI

Live Video Mode

Multimodal API Only Video

Released: December 2024

Version: 1.0

Size: Not disclosed

A feature for GPT-4o enabling real-time video interaction and analysis, accessible via API.

Real-Time Video Video Analysis Task Assistance

9.2/10

Performance

9.1/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Gemini-Exp-1206

Multimodal API Only Experimental

Released: December 2024

Version: 1206

Size: Not disclosed

An experimental multimodal model with advanced text and image processing, accessible via API.

Text and Image Processing Advanced Reasoning Task Versatility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Gemini 2.0 Flash

Multimodal API Only Efficiency

Released: December 2024

Version: 2.0

Size: Not disclosed

A lightweight multimodal model in beta, optimized for efficient text and image processing, accessible via API.

Text and Image Processing Efficient Performance Task Versatility

8.8/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Gemini-2.0-Flash-Thinking

Multimodal API Only Reasoning

Released: December 2024

Version: 2.0

Size: Not disclosed

A variant of Gemini 2.0 Flash with enhanced reasoning capabilities, accessible via API.

Enhanced Reasoning Text and Image Processing Task Versatility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Veo 2

Video Generation API Only Creative

Released: December 2024

Version: 2.0

Size: Not disclosed

An advanced video generation model for creating high-quality videos from text prompts, accessible via API.

Text-to-Video High-Quality Output Creative Storytelling

9.1/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

IBM

Granite 3.1

Language Model Open Weights Performance

Released: December 2024

Version: 3.1

Size: Not disclosed

An open-weight language model for advanced NLP tasks, offering high performance and flexibility.

Advanced NLP Task Versatility Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Imagen 3 Update

Image Generation API Only Creative

Released: December 2024

Version: 3.0

Size: Not disclosed

An updated image generation model for creating high-quality visuals, accessible via API for creative applications.

High-Quality Images Creative Workflows Professional Design

9.1/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

xAI

Aurora

Image Generation API Only Creative

Released: December 2024

Version: 1.0

Size: Not disclosed

An image generation model integrated with xAI's ecosystem, accessible via API for creative applications.

High-Quality Images Creative Workflows xAI Integration

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Microsoft

Phi4

Language Model Open Weights Efficiency

Released: December 2024

Version: 4.0

Size: Not disclosed

An open-weight language model optimized for efficiency and performance on resource-constrained devices.

Resource Efficiency High Performance Customizable

8.7/10

Performance

8.6/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Meta

Llama 3.3 70B

Language Model Open Weights Research

Released: December 2024

Version: 3.3

Size: 70B

An upgraded open-weight language model for research, offering high performance in NLP tasks.

NLP Research High Performance Customizable

9.0/10

Performance

8.9/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

PaliGemma 2

Multimodal Open Weights Research

Released: December 2024

Version: 2.0

Size: Not disclosed

An open-weight vision-language model for advanced multimodal tasks, suitable for research and development.

Vision-Language Processing Research Flexibility Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Pika

Pika Labs 2.0

Video Generation API Only Creative

Released: December 2024

Version: 2.0

Size: Not disclosed

An upgraded video generation model for creating high-quality videos with advanced effects, accessible via API.

Text-to-Video Advanced Effects Creative Storytelling

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Meta

Apollo

Multimodal Open Weights Research

Released: December 2024

Version: 1.0

Size: Not disclosed

An open-weight multimodal model for text and image processing, optimized for research and development.

Text and Image Processing Research Flexibility Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Deepseek

DeepSeek V3

Language Model Open Weights Performance

Released: December 2024

Version: 3.0

Size: Not disclosed

An open-weight language model for advanced NLP tasks, offering high performance and flexibility.

Advanced NLP Task Versatility Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

AnswerAI and LightOn

ModernBERT

Language Model Open Weights Efficiency

Released: December 2024

Version: 1.0

Size: Not disclosed

An open-weight language model optimized for advanced NLP tasks, offering high performance and efficiency.

Advanced NLP Efficient Processing Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Alibaba

QVQ-72B-Preview

Language Model Open Weights Preview

Released: December 2024

Version: 72B

Size: 72B

A preview of a high-performance language model for advanced NLP tasks, offering open weights for customization.

Advanced NLP Task Versatility Customizable

9.0/10

Performance

8.9/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

O3

Reasoning API Only Problem-Solving

Released: December 2024

Version: 3.0

Size: Not disclosed

An advanced AI model with superior reasoning and problem-solving capabilities, accessible via API.

Advanced Reasoning Problem Solving Task Versatility

9.6/10

Performance

9.5/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

KLING

Kling 1.6

Video Generation API Only Creative

Released: December 2024

Version: 1.6

Size: Not disclosed

An upgraded video generation model for creating high-quality videos from text prompts, accessible via API.

Text-to-Video High-Quality Output Creative Storytelling

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

TII

Falcon 3

Multimodal Open Weights Performance

Released: December 2024

Version: 3.0

Size: Not disclosed

An open-weight model family for advanced language and multimodal tasks, offering high performance and flexibility.

Language Processing Multimodal Capabilities Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Alibaba

QwQ 32B Preview

Language Model Open Weights Preview

Released: November 2024

Version: 32B

Size: 32B

An open-weight language model preview for advanced NLP tasks, offering high performance and flexibility.

Advanced NLP Task Versatility Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Alibaba

Qwen2.5 Coder 32B

Code Generation Open Weights Developer

Released: November 2024

Version: 2.5

Size: 32B

An open-weight model for advanced code generation, optimized for programming tasks and developer workflows.

Code Completion Syntax Understanding Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Deepseek

DeepSeek-R1-Lite-Preview

Reasoning API Only Preview

Released: November 2024

Version: 1.0

Size: Not disclosed

A preview of a lightweight AI model for reasoning and task assistance, accessible via API.

Efficient Reasoning Task Assistance Developer API

8.7/10

Performance

8.6/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

allenai

Tulu 3

Language Model Open Weights Research

Released: November 2024

Version: 3.0

Size: Not disclosed

An open-weight language model for research, offering high performance in NLP tasks with a focus on flexibility.

NLP Research High Performance Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Suno AI

Suno v4

Music Generation API Only Creative

Released: November 2024

Version: 4.0

Size: Not disclosed

An upgraded music creation model generating high-quality audio tracks, accessible via API for creative projects.

Text-to-Music High-Quality Audio Creative Flexibility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

HuggingFace

SmolLM 2

Language Model Open Weights Efficiency

Released: November 2024

Version: 2.0

Size: Not disclosed

An open-weight lightweight language model for research and efficient NLP tasks, offering high performance.

Lightweight NLP Research Flexibility Customizable

8.6/10

Performance

8.5/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Mistral

Pixtral Large

Multimodal Open Weights Research

Released: November 2024

Version: 1.0

Size: Not disclosed

An open-weight multimodal model for advanced text and image processing, optimized for research and development.

Text and Image Processing Research Flexibility Customizable

9.0/10

Performance

8.9/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Mistral

Mistral Large 2411

Language Model Open Weights Performance

Released: November 2024

Version: 2411

Size: Not disclosed

An upgraded open-weight language model for advanced NLP tasks, offering high performance and flexibility.

Advanced NLP Task Versatility Customizable

9.0/10

Performance

8.9/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

gemini-exp-1114

Multimodal API Only Experimental

Released: November 2024

Version: 1114

Size: Not disclosed

An experimental multimodal model with advanced text and image processing, accessible via API.

Text and Image Processing Advanced Reasoning Task Versatility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

gemini-exp-1121

Multimodal API Only Experimental

Released: November 2024

Version: 1121

Size: Not disclosed

An experimental multimodal model with enhanced text and image processing, accessible via API.

Text and Image Processing Advanced Reasoning Task Versatility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Allen AI

OLMo 2

Language Model Open Weights Research

Released: November 2024

Version: 2.0

Size: Not disclosed

An open-weight language model for research, offering high performance in NLP tasks with a focus on efficiency.

NLP Research High Performance Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Anthropic

Visual PDF Analysis

Document Analysis API Only Multimodal

Released: November 2024

Version: 1.0

Size: Not disclosed

A feature in Claude for analyzing PDF documents with visual content, accessible via API.

PDF Analysis Visual Content Processing Task Assistance

8.8/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

HuggingFace

SmolVLM

Multimodal Open Weights Efficiency

Released: November 2024

Version: 1.0

Size: Not disclosed

An open-weight vision-language model optimized for efficient multimodal tasks, suitable for research and development.

Vision-Language Processing Resource Efficiency Customizable

8.6/10

Performance

8.5/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Black Forest Labs

Flux 1.1 Pro

Image Generation API Only Creative

Released: October 2024

Version: 1.1

Size: Not disclosed

An upgraded image generation model for professional-grade visuals, accessible via API.

High-Quality Images Professional Design Creative Workflows

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Meta

Movie Gen

Video Generation API Only Creative

Released: October 2024

Version: 1.0

Size: Not disclosed

A video generation model for creating high-quality videos from text prompts, accessible via API.

Text-to-Video High-Quality Output Creative Storytelling

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Cohere

Aya Expanse

Language Model Open Weights Performance

Released: October 2024

Version: 1.0

Size: Not disclosed

An open-weight language model for advanced NLP tasks, offering high performance and flexibility.

Advanced NLP Task Versatility Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Pika

Pika Effects

Video Generation API Only Creative

Released: October 2024

Version: 1.5

Size: Not disclosed

A video model with advanced effects for creative video editing, accessible via API.

Video Effects Creative Editing High-Quality Output

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Adobe

Firefly Video

Video Generation API Only Creative

Released: October 2024

Version: 1.0

Size: Not disclosed

A video generation model for professional-grade video creation, accessible via API.

High-Quality Video Professional Editing Creative Workflows

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Rhymes

Aria

Conversational Open Weights Creative

Released: October 2024

Version: 1.0

Size: Not disclosed

An open-weight conversational AI model optimized for task assistance and creative interactions.

Task Assistance Creative Interactions Customizable

8.7/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Meta

Meta Spirit LM

Language Model Open Weights Research

Released: October 2024

Version: 1.0

Size: Not disclosed

An open-weight language model for research, offering high performance in NLP tasks.

NLP Research High Performance Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Mistral

Ministral

Language Model API Only Efficiency

Released: October 2024

Version: 1.0

Size: Not disclosed

A lightweight language model for efficient NLP tasks, accessible via API for developers.

Efficient NLP Task Versatility Developer API

8.5/10

Performance

8.4/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Deepseek

Janus

Multimodal Open Weights Research

Released: October 2024

Version: 1.0

Size: Not disclosed

An open-weight multimodal model for text and image processing, optimized for research and development.

Text and Image Processing Research Flexibility Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Fluid

Reasoning API Only Research

Released: October 2024

Version: 1.0

Size: Not disclosed

An AI model for advanced reasoning and problem-solving, accessible via API for research applications.

Advanced Reasoning Problem Solving Research Applications

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Diffusion 3.5

Image Generation Open Weights Creative

Released: October 2024

Version: 3.5

Size: Not disclosed

An upgraded open-weight model for text-to-image generation, offering improved quality and flexibility.

Text-to-Image High-Quality Output Creative Flexibility

8.9/10

Performance

8.8/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

Anthropic

Claude 3.5 Sonnet New

Conversational API Only Safety

Released: October 2024

Version: 3.5

Size: Not disclosed

An upgraded conversational AI model with enhanced reasoning and safety, accessible via API.

Enhanced Reasoning Safe Interactions Helpful Responses

9.3/10

Performance

9.2/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Anthropic

Claude 3.5 Haiku

Conversational API Only Efficiency

Released: October 2024

Version: 3.5

Size: Not disclosed

A lightweight conversational AI model with efficient performance and safety, accessible via API.

Efficient Reasoning Safe Interactions Helpful Responses

8.8/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Recraft

Recraft v3

Image Generation API Only Creative

Released: October 2024

Version: 3.0

Size: Not disclosed

An image generation model for creating high-quality visuals, accessible via API for creative workflows.

High-Quality Images Creative Workflows Professional Design

8.8/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

OpenAI

Search GPT

Search API Only Summarization

Released: October 2024

Version: 1.0

Size: Not disclosed

An AI-powered search engine providing concise and relevant answers, accessible via API.

Search Summaries Information Retrieval Concise Outputs

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Allen AI

OLMoE

Language Model Open Weights Research

Released: September 2024

Version: 1.0

Size: Not disclosed

An open-weight language model for research, offering high performance in NLP tasks with a focus on efficiency.

NLP Research High Performance Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Mistral

Pixtral12B

Multimodal Open Weights Research

Released: September 2024

Version: 12B

Size: 12B

An open-weight multimodal model for text and image processing, optimized for research and development.

Text and Image Processing Research Flexibility Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

o1 preview

Reasoning API Only Problem-Solving

Released: September 2024

Version: 1.0

Size: Not disclosed

A preview of an advanced reasoning model with enhanced problem-solving capabilities, accessible via API.

Advanced Reasoning Problem Solving Task Versatility

9.4/10

Performance

9.3/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

OpenAI

o1 mini

Reasoning API Only Efficiency

Released: September 2024

Version: 1.0

Size: Not disclosed

A lightweight version of the o1 model with efficient reasoning capabilities, accessible via API.

Efficient Reasoning Problem Solving Task Versatility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

IBM

Granite Code

Code Generation Open Weights Developer

Released: September 2024

Version: 1.0

Size: Not disclosed

An open-weight model for code generation, optimized for programming tasks and developer workflows.

Code Completion Syntax Understanding Customizable

8.7/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Alibaba

Qwen 2.5

Language Model Open Weights Performance

Released: September 2024

Version: 2.5

Size: Not disclosed

An open-weight language model for advanced NLP tasks, offering high performance and flexibility.

Advanced NLP Task Versatility Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

KLING

KLING 1.5

Video Generation API Only Creative

Released: September 2024

Version: 1.5

Size: Not disclosed

A video generation model for creating high-quality videos from text prompts, accessible via API.

Text-to-Video High-Quality Output Creative Storytelling

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

01 AI

Yi Coder

Code Generation Open Weights Developer

Released: September 2024

Version: 1.0

Size: Not disclosed

An open-weight model for code generation, optimized for programming tasks and developer workflows.

Code Completion Syntax Understanding Customizable

8.7/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

GPT4o Advanced Voice Mode

Multimodal API Only Voice

Released: September 2024

Version: 4o

Size: Not disclosed

An enhanced version of GPT-4o with advanced voice interaction capabilities, accessible via API.

Voice Interaction Text and Image Processing Task Versatility

9.3/10

Performance

9.2/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Meta

Llama 3.2

Language Model Open Weights Research

Released: September 2024

Version: 3.2

Size: Not disclosed

An upgraded open-weight language model for research, offering improved performance in NLP tasks.

NLP Research High Performance Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Gemini Pro 1.5 002

Multimodal API Only Performance

Released: September 2024

Version: 1.5

Size: Not disclosed

An updated multimodal model with enhanced text and image processing capabilities, accessible via API.

Text and Image Processing Advanced Reasoning Task Versatility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Kyutai

Moshi

Conversational Open Weights Voice

Released: September 2024

Version: 1.0

Size: Not disclosed

An open-weight conversational AI model optimized for real-time voice and text interactions.

Voice Interaction Real-Time Processing Customizable

8.7/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

NotebookLM

Research API Only Summarization

Released: September 2024

Version: 1.0

Size: Not disclosed

An updated AI-powered tool for research and note-taking, providing summaries and insights via API.

Research Summaries Note-Taking Insight Generation

8.6/10

Performance

8.5/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Mistral

Mistral Small

Language Model API Only Efficiency

Released: September 2024

Version: 1.0

Size: Not disclosed

A lightweight language model for efficient NLP tasks, accessible via API for developers.

Efficient NLP Task Versatility Developer API

8.5/10

Performance

8.4/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Black Forest Labs

Flux

Image Generation Open Weights Creative

Released: August 2024

Version: 1.0

Size: Not disclosed

An open-weight model for high-quality image generation, optimized for creative and professional applications.

Text-to-Image High-Quality Output Creative Flexibility

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-4o 0806

Multimodal API Only Reasoning

Released: August 2024

Version: 0806

Size: Not disclosed

An updated multimodal AI model with enhanced text and image processing capabilities, accessible via API.

Text and Image Processing Advanced Reasoning Task Versatility

9.3/10

Performance

9.2/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Imagen 3

Image Generation API Only Creative

Released: August 2024

Version: 3.0

Size: Not disclosed

An advanced image generation model for creating high-quality visuals, accessible via API for creative applications.

High-Quality Images Creative Workflows Professional Design

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

xAI

Grok 2

Conversational API Only Reasoning

Released: August 2024

Version: 2.0

Size: Not disclosed

An advanced conversational AI model with enhanced reasoning capabilities, accessible via API.

Enhanced Reasoning Task Assistance Conversational

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

xAI

Grok 2 mini

Conversational API Only Efficiency

Released: August 2024

Version: 2.0

Size: Not disclosed

A lightweight version of Grok 2 with efficient conversational capabilities, accessible via API.

Efficient Reasoning Task Assistance Conversational

8.7/10

Performance

8.6/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Nous

Hermes 3

Language Model Open Weights Research

Released: August 2024

Version: 3.0

Size: Not disclosed

An open-weight language model for research and advanced NLP tasks, offering high performance and flexibility.

NLP Research High Performance Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Microsoft

Phi 3.5

Language Model Open Weights Efficiency

Released: August 2024

Version: 3.5

Size: Not disclosed

An upgraded open-weight language model optimized for efficiency and performance on resource-constrained devices.

Resource Efficiency High Performance Customizable

8.6/10

Performance

8.5/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

Gemini 1.5 Flash8B

Multimodal API Only Efficiency

Released: August 2024

Version: 1.5

Size: 8B

A lightweight multimodal model with efficient text and image processing, accessible via API.

Text and Image Processing Efficient Performance Task Versatility

8.7/10

Performance

8.6/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Ideogram

Ideogram 2.0

Image Generation API Only Creative

Released: August 2024

Version: 2.0

Size: Not disclosed

An image generation model for creating high-quality visuals, accessible via API for creative applications.

High-Quality Images Creative Workflows Professional Design

8.8/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Luma

Dream Machine 1.5

Video Generation API Only Creative

Released: August 2024

Version: 1.5

Size: Not disclosed

A video generation model for creating high-quality videos from text prompts, accessible via API.

Text-to-Video High-Quality Output Creative Storytelling

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Cohere

Command R+

Language Model Open Weights Performance

Released: August 2024

Version: 1.0

Size: Not disclosed

An open-weight language model optimized for advanced NLP tasks, offering high performance and flexibility.

Advanced NLP Task Versatility Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

TII

Falcon Mamba

Language Model Open Weights Efficiency

Released: August 2024

Version: 1.0

Size: Not disclosed

An open-weight state-space model for efficient language processing, suitable for research and development.

State-Space Architecture Efficient Processing Customizable

8.7/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-4o mini

Multimodal API Only Efficiency

Released: July 2024

Version: 4o mini

Size: Not disclosed

A lightweight version of GPT-4o with multimodal capabilities, optimized for efficiency via API access.

Text and Image Processing Efficient Performance Task Versatility

8.8/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Meta

Llama 3.1

Language Model Open Weights Research

Released: July 2024

Version: 3.1

Size: Not disclosed

An upgraded open-weight language model for research, offering improved performance in NLP tasks.

NLP Research High Performance Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Mistral

Codestral Mamba

Code Generation Open Weights Efficiency

Released: July 2024

Version: 1.0

Size: Not disclosed

An open-weight model for code generation, leveraging state-space architecture for efficient programming tasks.

Code Completion State-Space Architecture Customizable

8.7/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

AlphaProof & AlphaGeometry 2

Math API Only Reasoning

Released: July 2024

Version: 2.0

Size: Not disclosed

Specialized AI models for mathematical reasoning and geometry problem-solving, accessible via API.

Mathematical Reasoning Geometry Solving Research Applications

9.0/10

Performance

9.1/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

OpenAI

SearchGPT

Search API Only Summarization

Released: July 2024

Version: 1.0

Size: Not disclosed

An AI-powered search engine providing concise and relevant answers, accessible via API.

Search Summaries Information Retrieval Concise Outputs

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Udio

Udio v1.5

Music Generation API Only Creative

Released: July 2024

Version: 1.5

Size: Not disclosed

A music creation model generating high-quality audio tracks, accessible via API for creative applications.

Text-to-Music High-Quality Audio Creative Flexibility

8.7/10

Performance

8.6/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Mistral

Mistral Large 2

Language Model API Only Performance

Released: July 2024

Version: 2.0

Size: Not disclosed

A high-performance language model for advanced NLP tasks, accessible via API for developers.

Advanced NLP Task Versatility Developer API

9.0/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Midjourney

Midjourney v6.1

Image Generation API Only Creative

Released: July 2024

Version: 6.1

Size: Not disclosed

An upgraded image generation model for creating high-quality visuals, accessible via API for creative workflows.

High-Quality Images Creative Workflows Professional Design

9.1/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Gemma 2 2B

Language Model Open Weights Efficiency

Released: July 2024

Version: 2.0

Size: 2B

A lightweight open-weight language model for research and efficient NLP tasks, offering high performance.

Lightweight NLP Research Flexibility Customizable

8.5/10

Performance

8.4/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Diffusion 3 (Medium)

Image Generation Open Weights Creative

Released: June 2024

Version: 3.0

Size: Not disclosed

A medium-sized version of Stable Diffusion 3, offering open weights for text-to-image generation with balanced performance.

Text-to-Image Balanced Performance Creative Flexibility

8.7/10

Performance

8.6/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

Apple

Apple Intelligence

Productivity API Only On-Device

Released: June 2024

Version: 1.0

Size: Not disclosed

An AI suite for Apple devices, enhancing user experience through on-device task automation and insights, accessible via API.

Task Automation On-Device Processing User Experience

8.8/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Deepseek

DeepSeekCoderV2

Code Generation Open Weights Developer

Released: June 2024

Version: 2.0

Size: Not disclosed

An open-weight model for advanced code generation, optimized for programming tasks and developer workflows.

Code Completion Syntax Understanding Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Runway

Gen3 Alpha

Video Generation API Only Creative

Released: June 2024

Version: 3.0

Size: Not disclosed

A video generation model for creating high-quality videos from text prompts, accessible via API for creative applications.

Text-to-Video High-Quality Output Creative Storytelling

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

01 AI

Yi 1.5

Language Model Open Weights Research

Released: June 2024

Version: 1.5

Size: Not disclosed

An open-weight language model optimized for research and NLP tasks, offering high performance and flexibility.

NLP Research High Performance Customizable

8.7/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Anthropic

Claude Sonnet 3.5

Conversational API Only Safety

Released: June 2024

Version: 3.5

Size: Not disclosed

An upgraded conversational AI model with enhanced reasoning and safety features, accessible via API.

Enhanced Reasoning Safe Interactions Helpful Responses

9.2/10

Performance

9.1/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Microsoft

Florence 2

Vision Open Weights Research

Released: June 2024

Version: 2.0

Size: Not disclosed

An open-weight vision model for advanced image processing tasks, suitable for research and development.

Image Processing Research Flexibility Customizable

8.6/10

Performance

8.5/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

Gemma 2

Language Model Open Weights Efficiency

Released: June 2024

Version: 2.0

Size: Not disclosed

An open-weight language model optimized for research and lightweight NLP tasks, offering high efficiency.

Lightweight NLP Research Flexibility Customizable

8.5/10

Performance

8.4/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-4o

Multimodal API Only Reasoning

Released: May 2024

Version: 4o

Size: Not disclosed

A multimodal AI model with advanced capabilities in text, image processing, and reasoning, accessible via API.

Text and Image Processing Advanced Reasoning Task Versatility

9.3/10

Performance

9.2/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Gemini 1.5

Multimodal API Only High Capacity

Released: May 2024

Version: 1.5

Size: Not disclosed

An upgraded multimodal model with a 2 million token limit, offering enhanced performance in text and image tasks.

Large Token Limit Text and Image Processing Task Versatility

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Microsoft

Copilot+

Productivity API Only Automation

Released: May 2024

Version: 1.0

Size: Not disclosed

An AI assistant for dedicated computers, enhancing productivity through task automation and insights, accessible via API.

Task Automation Productivity Insights Integration

8.7/10

Performance

8.6/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Meta

Chameleon

Multimodal Open Weights Research

Released: May 2024

Version: 1.0

Size: Not disclosed

A multimodal model with open weights, designed for text and image processing in research and development.

Text and Image Processing Research Flexibility Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Mistral

Mistral-7B-Instruct-v0.3

Language Model Open Weights Instruction

Released: May 2024

Version: 0.3

Size: 7B

An open-weight language model optimized for instruction-following tasks, offering high efficiency and performance.

Instruction Following Efficient Processing Customizable

8.6/10

Performance

8.5/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

AI Overviews

Search API Only Summarization

Released: May 2024

Version: 1.0

Size: Not disclosed

An AI-powered search summary tool providing concise and relevant information, accessible via API.

Search Summaries Information Retrieval Concise Outputs

8.5/10

Performance

8.4/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Suno AI

Suno v3.5

Music Generation API Only Creative

Released: May 2024

Version: 3.5

Size: Not disclosed

An upgraded music creation model generating high-quality audio tracks, accessible via API for creative projects.

Text-to-Music High-Quality Audio Creative Flexibility

8.8/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Mistral

Codestral

Code Generation Open Weights Developer

Released: May 2024

Version: 1.0

Size: Not disclosed

An open-weight model for code generation, optimized for programming tasks and developer workflows.

Code Completion Syntax Understanding Customizable

8.7/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

TII

Falcon 2

Multimodal Open Weights Language

Released: May 2024

Version: 2.0

Size: 11B

An open-weight model family including Falcon2-11B and Falcon2-VLM, designed for language and vision tasks.

Language Processing Vision Processing Customizable

8.6/10

Performance

8.5/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Audio 2.0

Audio Generation Open Weights Creative

Released: April 2024

Version: 2.0

Size: Not disclosed

An open-weight model for generating high-fidelity audio, suitable for music and sound design applications.

High-Fidelity Audio Sound Design Customizable

8.8/10

Performance

8.7/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

xAI

Grok-1.5V

Multimodal API Only Reasoning

Released: April 2024

Version: 1.5V

Size: Not disclosed

An enhanced version of Grok with image recognition capabilities, designed for multimodal task assistance via API.

Image Recognition Task Assistance Conversational

8.7/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Mistral

Mixtral 8x22B

Language Model Open Weights Efficiency

Released: April 2024

Version: 8x22B

Size: 22B

A high-performance language model with open weights, optimized for efficiency and scalability in NLP tasks.

Efficient Processing Scalable NLP Customizable

8.9/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Meta

LLaMA 3

Language Model Open Weights Research

Released: April 2024

Version: 3.0

Size: Not disclosed

An open-weight language model designed for research, offering high performance in natural language tasks.

NLP Research High Performance Customizable

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Microsoft

Phi-3-mini

Language Model Open Weights Efficiency

Released: April 2024

Version: 3.0

Size: 3.8B

A lightweight, open-weight language model optimized for efficiency and performance on resource-constrained devices.

Resource Efficiency High Performance Customizable

8.5/10

Performance

8.4/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Adobe

Firefly 3

Image Generation API Only Creative

Released: April 2024

Version: 3.0

Size: Not disclosed

An image creation model for professional design, offering high-quality outputs via API for creative workflows.

High-Quality Images Professional Design Creative Workflows

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Reka

Reka AI Models

Multimodal API Only Language

Released: April 2024

Version: 1.0

Size: Not disclosed

Multimodal language models designed for advanced text and image processing, accessible via API.

Text Processing Image Processing Task Versatility

8.6/10

Performance

8.5/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Apple

OpenELM

Language Model Open Weights Efficiency

Released: April 2024

Version: 1.0

Size: Not disclosed

An open-weight language model optimized for efficient on-device NLP tasks, suitable for research and development.

On-Device NLP Resource Efficiency Customizable

8.4/10

Performance

8.3/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

xAI

Grok 1.5

Conversational Open Weights Reasoning

Released: March 2024

Version: 1.5

Size: Not disclosed

An advanced conversational AI model with improved reasoning and open weights, designed to assist users in various tasks.

Enhanced Reasoning Task Assistance Open Customization

8.6/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Anthropic

Claude 3

Conversational API Only Safety

Released: March 2024

Version: 3.0

Size: Not disclosed

A conversational AI model outperforming GPT-4, focused on safety and helpfulness, accessible via API.

Safe Interactions Advanced Reasoning Helpful Responses

9.1/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Suno AI

Suno v3

Music Generation API Only Creative

Released: March 2024

Version: 3.0

Size: Not disclosed

A music creation model generating high-quality audio tracks from prompts, accessible via API for creative applications.

Text-to-Music High-Quality Audio Creative Flexibility

8.7/10

Performance

8.6/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Diffusion 3

Image Generation Open Weights Creative

Released: February 2024

Version: 3.0

Size: Not disclosed

A text-to-image model with enhanced capabilities for generating high-quality, detailed images from textual prompts, suitable for creative and professional applications.

Text-to-Image High-Resolution Output Creative Flexibility

8.8/10

Performance

8.9/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

Google

Gemini Pro

Conversational Reasoning Multimodal

Released: February 2024

Version: 1.0

Size: Not disclosed

An upgraded conversational AI model powering Bard, offering improved reasoning and language understanding for diverse tasks.

Enhanced Reasoning Language Understanding Task Versatility

8.7/10

Performance

8.6/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Gemini Pro 1.5

Multimodal API Only Reasoning

Released: February 2024

Version: 1.5

Size: Not disclosed

A multimodal AI model with advanced capabilities in text, image processing, and reasoning, accessible via API for developers.

Text and Image Processing Advanced Reasoning Developer API

8.9/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

CodeGemma

Code Generation Open Weights Developer

Released: February 2024

Version: 1.0

Size: Not disclosed

A code generation model designed for programming tasks, offering open weights for community use and customization.

Code Completion Syntax Understanding Customizable

8.5/10

Performance

8.4/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

Sora

Video Generation API Only Creative

Released: February 2024

Version: 1.0

Size: Not disclosed

A video generation model capable of creating realistic and imaginative videos from text prompts, not yet publicly released.

Text-to-Video Realistic Rendering Creative Storytelling

9.2/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Gemini Ultra

LLM Multimodal

Released: December 2023

Version: 1.0

Size: Unknown parameters

Google's largest multimodal model for text, vision, and reasoning tasks. It excels in complex problem-solving across diverse data types like code, images, and text.

Multimodal Advanced Reasoning Code Generation Vision

9.1/10

Performance

9.3/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Gemini Pro

LLM Multimodal

Released: December 2023

Version: 1.0

Size: Unknown parameters

Mid-tier multimodal Gemini model for text, vision, and reasoning. It offers a balance of performance and efficiency for various tasks.

Multimodal Reasoning Code Generation Vision

8.8/10

Performance

9.0/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Gemini Nano

LLM On-device

Released: December 2023

Version: 1.0

Size: Unknown parameters

Lightweight Gemini model optimized for on-device tasks. It supports text and vision processing with low resource requirements.

On-device Multimodal Efficient Vision

7.8/10

Performance

8.2/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Microsoft

Phi-2

SLM Open Source

Released: December 2023

Version: 2.0

Size: 2.7B parameters

A 2.7 billion parameter SLM optimized for reasoning, coding, and math, delivering near state-of-the-art performance for its size. It’s designed for low-resource environments, making it ideal for on-device applications and research experimentation.

Reasoning Coding Math On-device

8.5/10

Performance

8.7/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

DeepSeek

DeepSeek Coder

LLM Open Source Coding

Released: November 2023

Version: 1.0

Size: Not specified

DeepSeek Coder is an open-source model optimized for programming tasks, enabling code generation and completion with high accuracy. It laid the foundation for DeepSeek's later coding-focused models.

Code Generation Code Completion Programming Support

8.0/10

Performance

8.2/10

Accuracy

DeepSeek License

License

Read Paper View Model Page Learning Resources

DeepSeek

DeepSeek LLM

LLM Open Source General Purpose

Released: November 2023

Version: 1.0

Size: 7B and 67B parameters

DeepSeek LLM is a general-purpose language model available in Base and Chat variants, trained on 2 trillion tokens of English and Chinese text. It excels in text generation and conversational tasks.

Text Generation Conversational AI Multilingual Support Context-aware

8.3/10

Performance

8.5/10

Accuracy

DeepSeek License

License

Read Paper View Model Page Learning Resources

Google

Tram

LLM Structured Data

Released: October 2023

Version: 1.0

Size: Unknown parameters

Model for structured data processing and analysis. It excels in tasks involving tabular data and complex data structures.

Structured Data Data Analysis Table Processing Scalable

8.5/10

Performance

8.8/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

OpenAI

DALL-E 3

Multimodal Image Generation

Released: October 2023

Version: 3.0

Size: Not disclosed

The latest DALL-E model with advanced image generation and integration into ChatGPT. It offers improved detail and prompt adherence.

Image Generation Prompt Adherence ChatGPT Integration

9.0/10

Performance

9.2/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Microsoft

Phi-1.5

SLM Open Source

Released: September 2023

Version: 1.5

Size: 1.3B parameters

An enhanced version of Phi-1 with improved reasoning and text generation capabilities, maintaining a compact 1.3 billion parameter size. It excels in tasks like coding and math, offering a balance of efficiency and performance for local deployment.

Text Generation Reasoning Coding Local Deployment

8.2/10

Performance

8.4/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Alibaba

Qwen-VL-7B

Multimodal Vision-Language Open Source

Released: September 2023

Version: 1.0

Size: 7B

A vision-language model capable of understanding and generating content from images, supporting multi-round question answering.

Image Understanding Multi-Round QA Creative Capabilities Multilingual Support

8.8/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Flan-PaLM

LLM Instruction-tuned

Released: August 2023

Version: 1.0

Size: 540B parameters

Instruction-tuned PaLM for improved task generalization. It leverages diverse instruction datasets to excel in zero-shot and few-shot scenarios across tasks.

Zero-shot Learning Instruction Tuning Advanced Reasoning Multilingual

9.2/10

Performance

9.3/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Med-PaLM 2

LLM Medical

Released: August 2023

Version: 2.0

Size: Unknown parameters

Advanced medical-domain model for clinical tasks and question answering. It improves on MedPaLM with enhanced accuracy for healthcare applications.

Medical QA Clinical Tasks Advanced Reasoning Fine-tuning

9.2/10

Performance

9.4/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Meta

Code Llama

LLM Code

Released: August 2023

Version: 1.0

Size: 34B parameters

A specialized LLaMA variant for code generation and programming tasks. It supports multiple programming languages and excels in generating accurate, context-aware code.

Code Generation Programming Context-aware Fine-tuning

8.7/10

Performance

8.9/10

Accuracy

Non-commercial

License

Read Paper View Model Page Learning Resources

Google

MedPaLM

LLM Medical

Released: July 2023

Version: 1.0

Size: 540B parameters

Medical-domain PaLM for clinical tasks like medical question answering. It is fine-tuned with medical data for high accuracy in healthcare applications.

Medical QA Clinical Tasks Advanced Reasoning Fine-tuning

9.1/10

Performance

9.3/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Meta

LLaMA 2

LLM Research

Released: July 2023

Version: 2.0

Size: 70B parameters

An improved version of LLaMA, offering enhanced performance and safety for research applications. It supports a wide range of NLP tasks with better generalization and efficiency.

Text Generation Safety-focused Efficient Fine-tuning

8.8/10

Performance

9.0/10

Accuracy

Non-commercial

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Diffusion XL (SDXL)

LLM Image Generation Open Source

Released: July 2023

Version: 1.0

Size: 3.5B parameters

A powerful evolution of Stable Diffusion, SDXL delivers superior image quality and prompt adherence at higher resolutions, ideal for professional use cases. It incorporates advanced training methods and supports diverse styles like 3D, photography, and painting, with optimized performance on consumer hardware.

High-Resolution Images Prompt Adherence Diverse Styles Consumer Hardware

9.0/10

Performance

9.1/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

Google

U-PaLM

LLM Advanced

Released: June 2023

Version: 1.0

Size: 540B parameters

Continually trained PaLM with Unified Language Learning (UL2) objectives. It enhances generalization across tasks, improving performance in reasoning and multilingual settings.

Continual Learning Advanced Reasoning Multilingual Scalable

9.1/10

Performance

9.2/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Flamingo-C

LLM Multimodal

Released: June 2023

Version: 1.0

Size: Unknown parameters

Compact version of Flamingo for multimodal vision-language tasks. It maintains strong performance with reduced resource requirements.

Multimodal Vision and Language Efficient Image Captioning

8.6/10

Performance

8.9/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Microsoft

Phi-1

SLM Open Source

Released: June 2023

Version: 1.0

Size: 1.3B parameters

A small language model (SLM) with 1.3 billion parameters, designed for efficient text generation and basic reasoning tasks, particularly in research settings. It achieves strong performance on benchmarks like HumanEval, focusing on lightweight, cost-effective AI solutions for developers.

Text Generation Efficient Research-focused Lightweight

8.0/10

Performance

8.2/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

PaLM 2

LLM Advanced

Released: May 2023

Version: 2.0

Size: 340B parameters

Enhanced version of PaLM with improved efficiency and performance. It offers better reasoning, multilingual capabilities, and optimized training for diverse tasks.

Advanced Reasoning Multilingual Efficient Code Generation

9.2/10

Performance

9.3/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Stability AI

Stable LM

LLM Language Open Source

Released: April 2023

Version: Alpha

Size: 3B-7B parameters

An open-source language model suite launched in April 2023, with 3B to 7B parameter models designed for efficient text and code generation on personal devices. Trained on a massive dataset three times larger than The Pile, it offers high performance for conversational and coding tasks.

Text Generation Code Generation Efficient Consumer Devices

8.6/10

Performance

8.7/10

Accuracy

CC BY-SA-4.0

License

Read Paper View Model Page Learning Resources

Google

PaLM-E

LLM Multimodal

Released: March 2023

Version: 1.0

Size: 562B parameters

Embodied PaLM for robotics and multimodal tasks, integrating language with sensory inputs. It enables language-guided control in physical environments like robotic navigation.

Multimodal Robotics Advanced Reasoning Embodied AI

8.9/10

Performance

9.1/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-4

LLM Multimodal

Released: March 2023

Version: 1.0

Size: Not disclosed

A multimodal model with enhanced text and image processing capabilities. It achieves human-level performance on academic benchmarks and supports complex tasks.

Text Generation Image Processing Advanced Reasoning Multimodal

9.0/10

Performance

9.2/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Meta

LLaMA

LLM Research

Released: February 2023

Version: 1.0

Size: 13B parameters

A family of language models designed for research purposes, known for efficiency in natural language tasks. LLaMA models excel in text generation and understanding with optimized architectures.

Text Generation Efficient Research-focused Fine-tuning

8.5/10

Performance

8.8/10

Accuracy

Non-commercial

License

Read Paper View Model Page Learning Resources

Google

Dramatron

LLM Creative

Released: December 2022

Version: 1.0

Size: Unknown parameters

Model for scriptwriting and creative writing assistance. It generates coherent narratives and dialogue for storytelling applications.

Creative Writing Scriptwriting Text Generation Narrative

8.3/10

Performance

8.5/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-3.5

LLM Conversational

Released: November 2022

Version: 1.0

Size: Not disclosed

An optimized version of GPT-3 with fewer parameters, fine-tuned using reinforcement learning for conversational tasks. It powers the initial ChatGPT release.

Conversational AI Text Generation Fine-tuned RLHF

8.7/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Diffusion 2.0

LLM Image Generation Open Source

Released: November 2022

Version: 2.0

Size: 4B parameters

An enhanced version of Stable Diffusion, introducing inpainting, outpainting, and depth-guided image generation for improved creative control. It maintains high-quality outputs while addressing ethical concerns through filtered training data and permissive licensing for diverse applications.

Inpainting Outpainting Depth-guided Generation Text-to-Image

8.9/10

Performance

9.0/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

Google

Flan-T5

LLM Instruction-tuned

Released: October 2022

Version: 1.0

Size: 11B parameters

Instruction-tuned T5 for zero-shot task generalization. It improves T5’s performance on unseen tasks by fine-tuning with diverse instruction datasets.

Zero-shot Learning Text-to-Text Fine-tuning Instruction Tuning

8.7/10

Performance

9.0/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Sparrow

LLM Conversational

Released: September 2022

Version: 1.0

Size: Unknown parameters

Dialogue model with a focus on safety and ethical responses. It aims to reduce harmful outputs while maintaining conversational quality.

Conversational Safety-focused Dialogue Ethical

8.6/10

Performance

8.8/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

OpenAI

Whisper

Speech Open Source

Released: September 2022

Version: 1.0

Size: Not disclosed

An automatic speech recognition model for transcribing and translating audio. It supports multilingual speech processing with high accuracy.

Speech Recognition Transcription Translation

8.7/10

Performance

8.9/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

SayCan

LLM Robotics

Released: August 2022

Version: 1.0

Size: Unknown parameters

Language-guided robotic control model for task execution. It combines language understanding with physical actions for robotic applications.

Robotics Language-guided Task Execution Multimodal

8.5/10

Performance

8.7/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Stability AI

Stable Diffusion

LLM Image Generation Open Source

Released: August 2022

Version: 1.0

Size: 4B parameters

A pioneering open-source text-to-image model that generates high-quality, photorealistic images from textual prompts, leveraging latent diffusion techniques. Widely adopted for its flexibility and ability to run on consumer hardware, it supports creative applications in art, design, and media production.

Text-to-Image High Resolution Artistic Quality Open Source

8.8/10

Performance

8.9/10

Accuracy

CreativeML Open RAIL-M

License

Read Paper View Model Page Learning Resources

Google

Minerva

LLM Reasoning

Released: June 2022

Version: 1.0

Size: 540B parameters

PaLM-based model optimized for quantitative reasoning tasks. It excels in solving mathematical and scientific problems with high accuracy.

Quantitative Reasoning Advanced Reasoning Math-focused Scalable

9.0/10

Performance

9.2/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

UL2

LLM Versatile

Released: May 2022

Version: 1.0

Size: 20B parameters

Unified Language Learning model with a mixture-of-denoisers approach. It supports diverse tasks by combining multiple pre-training objectives for flexibility.

Mixture-of-Denoisers Pre-training Multitask Scalable

8.7/10

Performance

8.9/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Gato

LLM Generalist

Released: May 2022

Version: 1.0

Size: 1.2B parameters

Generalist agent for text, vision, and robotics tasks. It performs well across diverse domains, from language to physical control.

Multimodal Robotics Generalist Scalable

8.6/10

Performance

8.8/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Meta

OPT

LLM Open Source

Released: May 2022

Version: 1.0

Size: 175B parameters

Open Pre-trained Transformer models for research, offering efficient large-scale language modeling. It provides performance comparable to GPT-3 with open access for academic use.

Text Generation Efficient Research-focused Scalable

8.6/10

Performance

8.9/10

Accuracy

Non-commercial

License

Read Paper View Model Page Learning Resources

Google

PaLM

LLM Large-scale

Released: April 2022

Version: 1.0

Size: 540B parameters

Pathways Language Model, a 540B-parameter model for advanced reasoning and multilingual tasks. It excels in complex tasks like mathematical reasoning and code generation.

Advanced Reasoning Multilingual Code Generation Scalable

9.0/10

Performance

9.2/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Flamingo

LLM Multimodal

Released: April 2022

Version: 1.0

Size: 80B parameters

Multimodal vision-language model for tasks like image captioning. It combines visual and textual understanding for versatile applications.

Multimodal Vision and Language Image Captioning Fine-tuning

8.8/10

Performance

9.0/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

OpenAI

DALL-E 2

Multimodal Image Generation

Released: April 2022

Version: 2.0

Size: Not disclosed

An enhanced version of DALL-E with improved image quality and editing capabilities. It supports higher resolution and more precise outputs.

Image Generation Image Editing High Resolution

8.8/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Luminous

LLM Multilingual

Released: March 2022

Version: 1.0

Size: Unknown parameters

Multilingual text generation model with limited public details. It focuses on high-quality text generation for diverse languages and applications.

Multilingual Text Generation Pre-training Fine-tuning

8.4/10

Performance

8.7/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Chinchilla

LLM Efficient

Released: March 2022

Version: 1.0

Size: 70B parameters

70B-parameter compute-optimal model for efficient performance. It outperforms larger models in NLP tasks with less computational cost.

Compute-optimal Efficient NLP Tasks Scalable

8.8/10

Performance

9.0/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Ithaca

LLM Specialized

Released: March 2022

Version: 1.0

Size: Unknown parameters

Model for historical text restoration, specializing in ancient Greek texts. It reconstructs missing text with high contextual accuracy.

Text Restoration Historical Texts Contextual Understanding Specialized

8.4/10

Performance

8.6/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

CodeGen

LLM Code

Released: March 2022

Version: 1.0

Size: 16B parameters

Code-generation model complementing AlphaCode for programming tasks. It generates high-quality code for various languages and applications.

Code Generation Programming Reasoning Scalable

8.7/10

Performance

8.9/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

AlphaCode

LLM Code

Released: February 2022

Version: 1.0

Size: 41B parameters

Model for competitive programming and code generation. It solves complex algorithmic problems with high accuracy and efficiency.

Code Generation Competitive Programming Reasoning Scalable

8.9/10

Performance

9.1/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

LaMDA

LLM Conversational

Released: January 2022

Version: 1.0

Size: 137B parameters

Language Model for Dialogue Applications, optimized for conversational tasks. It generates coherent and contextually relevant responses for natural dialogue.

Conversational Contextual Understanding Dialogue Fine-tuning

8.8/10

Performance

9.0/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

RETRO

LLM Retrieval-augmented

Released: January 2022

Version: 1.0

Size: 7B parameters

Retrieval-augmented Transformer for enhanced language modeling. It uses external memory to improve performance on knowledge-intensive tasks.

Retrieval-augmented Knowledge-intensive NLP Tasks Scalable

8.7/10

Performance

8.9/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-3.5 Turbo

LLM Conversational

Released: 2022

Version: 1.0

Size: Not disclosed

A highly capable variant of GPT-3.5, optimized for speed and efficiency. It supports ChatGPT and was integrated into platforms like Bing before GPT-4.

Conversational AI High Efficiency Text Generation

8.8/10

Performance

9.0/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

NVIDIA

PeopleNet

Computer Vision Pretrained Real-time

Released: 2022

Version: 1.0

Size: Not specified

PeopleNet is a computer vision model developed using NVIDIA TAO for real-time pedestrian detection and tracking in urban environments, optimized for smart cities and autonomous vehicles.

Pedestrian Detection Object Tracking Real-time Processing

8.5/10

Performance

8.7/10

Accuracy

NVIDIA License

License

Read Paper View Model Page Learning Resources

NVIDIA

Bi3D

Computer Vision Pretrained Depth Estimation

Released: 2022

Version: 1.0

Size: Not specified

Bi3D is a binary depth classification network for classifying object depth, ideal for collision avoidance in autonomous mobile robots.

Depth Classification Collision Avoidance Efficient Processing

8.3/10

Performance

8.5/10

Accuracy

NVIDIA License

License

Read Paper View Model Page Learning Resources

NVIDIA

BioBERT

NLP Pretrained Biomedical

Released: 2022

Version: 1.0

Size: Not specified

BioBERT is a BERT-based model fine-tuned on biomedical datasets for text mining and NLP tasks, optimized for identifying chemical and protein entities.

Biomedical Text Mining Entity Recognition Context-aware Processing

8.6/10

Performance

8.8/10

Accuracy

NVIDIA License

License

Read Paper View Model Page Learning Resources

NVIDIA

Spleen Segmentation

Computer Vision Pretrained Medical

Released: 2022

Version: 1.0

Size: Not specified

Spleen Segmentation is a pretrained model for volumetric 3D segmentation of the spleen from CT images, using advanced medical segmentation techniques.

3D Segmentation Medical Imaging High Accuracy

8.7/10

Performance

8.9/10

Accuracy

NVIDIA License

License

Read Paper View Model Page Learning Resources

NVIDIA

Conformer

Speech Recognition Pretrained Multilingual

Released: 2022

Version: 1.0

Size: Not specified

Conformer is a convolution-augmented transformer model for automatic speech recognition, supporting over 10 languages for applications like live captioning and voice assistants.

Speech Recognition Multilingual Support High Accuracy

8.8/10

Performance

9.0/10

Accuracy

NVIDIA License

License

Read Paper View Model Page Learning Resources

NVIDIA

ECAPA-TDNN

Speech AI Pretrained Speaker Identification

Released: 2022

Version: 1.0

Size: Not specified

ECAPA-TDNN is a time delay neural network-based model for speaker identification and verification, providing robust speaker embeddings for applications like medical conversation analysis.

Speaker Identification Speaker Verification Robust Embeddings

8.6/10

Performance

8.8/10

Accuracy

NVIDIA License

License

Read Paper View Model Page Learning Resources

NVIDIA

Megatron 530B

LLM Pretrained Conversational

Released: 2022

Version: 1.0

Size: 530B parameters

Megatron 530B is a transformer-based language model using ELECTRA pretraining, optimized for NLP tasks like chatbots and virtual assistants with smaller size and faster training.

Text Generation Conversational AI Efficient Training

8.9/10

Performance

9.1/10

Accuracy

NVIDIA License

License

Read Paper View Model Page Learning Resources

Google

GLaM

LLM Efficient

Released: December 2021

Version: 1.0

Size: 1.2T parameters

Generalist Language Model, a 1.2T-parameter Mixture-of-Experts model. It achieves high performance with lower energy consumption for NLP tasks.

Mixture-of-Experts Efficient Scalable NLP Tasks

8.8/10

Performance

9.0/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Flan

LLM Instruction-tuned

Released: December 2021

Version: 1.0

Size: Unknown parameters

Instruction-tuned model family for zero-shot performance across tasks. It leverages fine-tuning on diverse datasets to improve generalization.

Zero-shot Learning Instruction Tuning Multitask Fine-tuning

8.6/10

Performance

8.9/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Gopher

LLM Large-scale

Released: December 2021

Version: 1.0

Size: 280B parameters

280B-parameter model focused on reasoning and language tasks. It competes with large-scale models in NLP benchmarks and research applications.

Advanced Reasoning Scalable NLP Tasks Research

8.9/10

Performance

9.1/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

T0

LLM Zero-shot

Released: October 2021

Version: 1.0

Size: 11B parameters

T5-based model for zero-shot task generalization. It leverages multitask prompting to perform well on unseen tasks without additional fine-tuning.

Zero-shot Learning Text-to-Text Multitask Fine-tuning

8.5/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

Codex

LLM Code

Released: August 2021

Version: 1.0

Size: Not disclosed

A specialized model for code generation and editing, powering tools like GitHub Copilot. It excels in understanding and generating programming languages.

Code Generation Code Editing Programming Support

8.6/10

Performance

8.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

Perceiver

LLM Multimodal

Released: July 2021

Version: 1.0

Size: Unknown parameters

General-purpose architecture for text and multimodal tasks. It uses cross-attention to handle diverse data types efficiently.

Multimodal Cross-attention Efficient Scalable

8.3/10

Performance

8.6/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

CANINE

LLM Multilingual

Released: June 2021

Version: 1.0

Size: 370M parameters

Character-based model for multilingual text processing without word tokenization. It excels in low-resource languages and noisy text environments.

Character-based Multilingual Robust Pre-training

7.8/10

Performance

8.1/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

UniT

LLM Multimodal

Released: June 2021

Version: 1.0

Size: Unknown parameters

Unified Transformer for vision and language tasks. It handles multimodal inputs for applications like image captioning and visual question answering.

Multimodal Vision and Language Pre-training Fine-tuning

8.5/10

Performance

8.8/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

ByT5

LLM Character-based

Released: May 2021

Version: 1.0

Size: 12B parameters

Byte-level T5 model for character-based text processing. It operates directly on UTF-8 bytes, improving performance on noisy text and low-resource languages.

Byte-level Processing Multilingual Text-to-Text Robust

8.3/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

MUM

LLM Multimodal

Released: May 2021

Version: 1.0

Size: Unknown parameters

Multitask Unified Model, a multimodal model for search combining text and images. It enhances search relevance by understanding complex, multimodal queries.

Multimodal Search Optimization Text and Image Scalable

8.7/10

Performance

8.9/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

ViT-BERT

LLM Multimodal

Released: April 2021

Version: 1.0

Size: Unknown parameters

Hybrid vision-language model combining Vision Transformer and BERT. It excels in tasks requiring joint understanding of images and text.

Multimodal Vision and Language Pre-training Fine-tuning

8.6/10

Performance

8.9/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

MuRIL

LLM Multilingual

Released: March 2021

Version: 1.0

Size: 237M parameters

Multilingual Representation for Indian Languages, a BERT-based model tailored for Indian languages. It supports 17 Indian languages, enhancing NLP tasks like sentiment analysis and text classification.

Multilingual Indian Languages Pre-training Fine-tuning

7.9/10

Performance

8.2/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Switch Transformer

LLM Scalable

Released: January 2021

Version: 1.0

Size: 1.6T parameters

Mixture-of-experts model for scaling to trillions of parameters efficiently. It dynamically selects expert subnetworks, reducing compute costs for large-scale tasks.

Mixture-of-Experts Scalable Efficient Pre-training

8.9/10

Performance

9.1/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

OpenAI

CLIP

Vision Open Source

Released: January 2021

Version: 1.0

Size: Not disclosed

A vision-language model that connects text and images for tasks like image classification and captioning. It’s open-source and widely used in research.

Image Classification Text-Image Mapping Captioning

8.3/10

Performance

8.5/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

OpenAI

DALL-E

Multimodal Image Generation

Released: January 2021

Version: 1.0

Size: Not disclosed

A text-to-image model generating creative images from textual prompts. It combines GPT-like architectures with diffusion models.

Image Generation Text-to-Image Creative Output

8.5/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Google

mT5

LLM Multilingual

Released: October 2020

Version: 1.0

Size: 13B parameters

Multilingual T5, supporting 101 languages for global NLP applications. It extends T5’s text-to-text framework to low-resource languages, improving cross-lingual performance.

Multilingual Text-to-Text Pre-training Fine-tuning

8.4/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

ETC

LLM Long-context

Released: August 2020

Version: 1.0

Size: 400M parameters

Extended Transformer Construction with hierarchical attention for long-context processing. It handles extended sequences efficiently for tasks like document understanding.

Long-context Hierarchical Attention Pre-training NLP Tasks

7.9/10

Performance

8.2/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

DocT5query

LLM Search

Released: August 2020

Version: 1.0

Size: 11B parameters

T5-based model for document ranking and query generation. It improves search relevance by generating queries for document indexing.

Document Ranking Query Generation Text-to-Text Fine-tuning

8.3/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

BigBird

LLM Long-context

Released: July 2020

Version: 1.0

Size: 400M parameters

Transformer with sparse attention for processing long sequences. It reduces memory usage while maintaining performance on tasks like document classification.

Sparse Attention Long-context Pre-training NLP Tasks

8.0/10

Performance

8.4/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

GShard

LLM Multilingual

Released: June 2020

Version: 1.0

Size: 600B parameters

Sharding-based Mixture-of-Experts model optimized for translation tasks. It enables efficient scaling for multilingual applications with reduced computational costs.

Mixture-of-Experts Multilingual Translation Scalable

8.6/10

Performance

8.8/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-3

LLM Commercial

Released: June 2020

Version: 1.0

Size: 175B parameters

A massive model excelling in diverse NLP tasks, from text generation to question answering. It introduced few-shot learning capabilities and powered early API applications.

Text Generation Few-shot Learning Question Answering Translation

8.5/10

Performance

8.7/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Microsoft

DeBERTa

LLM Open Source Multilingual

Released: June 2020

Version: 3.0 (latest)

Size: 22M-1.5B parameters

DeBERTa is a family of transformer-based language models (including base, large, V2, V3, and multilingual variants) that enhances BERT with disentangled attention and ELECTRA-style pre-training, achieving top performance on benchmarks like SuperGLUE and SQuAD. With sizes ranging from 22M to 1.5B parameters, it supports tasks like text classification, question answering, and cross-lingual transfer, powering Microsoft’s Turing NLRv4 for Bing and Azure.

Disentangled Attention ELECTRA Pre-training Cross-lingual Transfer Question Answering

9.0/10

Performance

9.2/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

TAPAS

LLM Structured Data

Released: May 2020

Version: 1.0

Size: 340M parameters

Table-based question answering and parsing model. It processes structured data in tables, enabling natural language queries over tabular content.

Table Parsing Question Answering Structured Data Fine-tuning

8.2/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Microsoft

CodeBERT

LLM Open Source Code

Released: May 2020

Version: 1.0

Size: 125M parameters

CodeBERT is a bimodal pre-trained model for programming and natural language, leveraging a large corpus of code and comments to excel in tasks like code search and documentation generation. It supports multiple programming languages and is widely used in tools for software development and AI-driven code analysis.

Code Search Documentation Generation Programming Languages Bimodal Pre-training

8.4/10

Performance

8.6/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

MobileBERT

LLM Mobile

Released: April 2020

Version: 1.0

Size: 25M parameters

A compact BERT variant optimized for mobile and edge devices. It balances performance and resource usage, enabling efficient NLP on low-power hardware.

Mobile Optimization Low Latency Fine-tuning NLP Tasks

7.5/10

Performance

8.0/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Longformer

LLM Long-context

Released: April 2020

Version: 1.0

Size: 435M parameters

Transformer with efficient attention for long-document processing. It reduces computational complexity while handling extended sequences for tasks like summarization.

Long-context Efficient Attention Pre-training NLP Tasks

8.1/10

Performance

8.5/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

Jukebox

Audio Open Source

Released: April 2020

Version: 1.0

Size: Not disclosed

A model for generating music from text prompts, supporting various genres and styles. It’s an experimental open-source project.

Music Generation Text-to-Audio Genre Support

8.0/10

Performance

8.2/10

Accuracy

Non-commercial

License

Read Paper View Model Page Learning Resources

Google

ELECTRA

LLM Efficient

Released: March 2020

Version: 1.0

Size: 335M parameters

Efficient pre-training model using a generator-discriminator framework. It achieves high performance with less compute by replacing masked language modeling with token detection.

Efficient Pre-training Token Detection Fine-tuning NLP Tasks

8.2/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

PEGASUS

LLM Summarization

Released: February 2020

Version: 1.0

Size: 570M parameters

Pre-training with Extracted Gap-sentences for Abstractive Summarization. It is optimized for generating concise and accurate text summaries.

Summarization Pre-training Fine-tuning Text Generation

8.3/10

Performance

8.7/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Google

Meena

LLM Conversational

Released: January 2020

Version: 1.0

Size: 2.6B parameters

Early conversational model, a precursor to LaMDA. It focuses on generating human-like dialogue with improved coherence and context awareness.

Conversational Contextual Understanding Dialogue Pre-training

8.0/10

Performance

8.3/10

Accuracy

Commercial

License

Read Paper View Model Page Learning Resources

Google

Reformer

LLM Efficient

Released: January 2020

Version: 1.0

Size: 360M parameters

Memory-efficient Transformer using locality-sensitive hashing. It reduces memory usage for processing long sequences, suitable for resource-constrained environments.

Memory-efficient Long-context Pre-training NLP Tasks

7.9/10

Performance

8.2/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Meta

XLM-R

LLM Multilingual

Released: November 2019

Version: 1.0

Size: 550M parameters

Cross-lingual Language Model-RoBERTa for multilingual NLP tasks. It supports 100 languages, enabling robust performance in translation and text classification across languages.

Multilingual Pre-training Fine-tuning Cross-lingual

8.3/10

Performance

8.7/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Microsoft

DialoGPT

LLM Open Source Conversational

Released: November 2019

Version: 1.0

Size: 345M parameters

DialoGPT is a conversational model trained on Reddit dialogues, designed to generate human-like responses for interactive chat applications. Its GPT-2-based architecture and large-scale dialogue data enable coherent and contextually relevant conversations, influencing later models like BlenderBot.

Conversational AI Dialogue Generation Context-aware Human-like Responses

8.2/10

Performance

8.4/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

T5

LLM Versatile

Released: October 2019

Version: 1.0

Size: 11B parameters

Text-to-Text Transfer Transformer, a unified framework for NLP tasks. It converts all tasks into a text-to-text format, enabling versatile applications like translation and summarization.

Text-to-Text Pre-training Fine-tuning Multitask

8.5/10

Performance

8.8/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Meta

BART

LLM Open Source

Released: October 2019

Version: 1.0

Size: 400M parameters

Bidirectional and Auto-Regressive Transformer for text generation and comprehension. It excels in tasks like summarization and translation with a denoising pre-training objective.

Text Generation Summarization Translation Fine-tuning

8.4/10

Performance

8.8/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

ALBERT

LLM Efficient

Released: September 2019

Version: 1.0

Size: 12M parameters

A Lite BERT with reduced parameters for efficiency while maintaining performance. It uses factorized embedding parameterization and cross-layer parameter sharing, ideal for resource-constrained environments.

Parameter Efficiency Scalable Fine-tuning NLP Tasks

7.8/10

Performance

8.3/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

Microsoft

MT-DNN

LLM Open Source

Released: August 2019

Version: 1.0

Size: Not disclosed

Multi-Task Deep Neural Network (MT-DNN) combines multi-task learning with pre-trained language models like BERT to achieve robust performance across diverse NLP tasks like sentiment analysis and text classification. Its knowledge distillation techniques enable efficient fine-tuning, making it a versatile choice for enterprise NLP applications.

Multi-task Learning Knowledge Distillation Text Classification Sentiment Analysis

8.3/10

Performance

8.5/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Meta

RoBERTa

LLM Open Source

Released: July 2019

Version: 1.0

Size: 355M parameters

Robustly optimized BERT approach for enhanced NLP performance. It improves on BERT with dynamic masking and larger pre-training data for tasks like text classification.

Bidirectional Context Pre-training Fine-tuning NLP Tasks

8.2/10

Performance

8.6/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Microsoft

UniLM

LLM Open Source

Released: July 2019

Version: 1.0

Size: Not disclosed

Unified Language Model (UniLM) is a pre-trained model supporting both natural language understanding and generation tasks, such as summarization and dialogue, through a shared transformer architecture. Its bidirectional, unidirectional, and sequence-to-sequence pre-training objectives make it highly flexible for applications in Azure Cognitive Services.

Text Generation Summarization Dialogue Flexible Pre-training

8.5/10

Performance

8.7/10

Accuracy

MIT

License

Read Paper View Model Page Learning Resources

Google

XLNet

LLM Open Source

Released: June 2019

Version: 1.0

Size: 340M parameters

Generalized autoregressive model developed with CMU, outperforming BERT in certain tasks. It uses permutation-based training for better context modeling.

Autoregressive Permutation Training Fine-tuning NLP Tasks

8.1/10

Performance

8.6/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-2

LLM Open-weight

Released: February 2019

Version: 1.0

Size: 1.5B parameters

An improved transformer model with better coherence in text generation. Initially partially released due to misuse concerns, it showed strong zero-shot task performance.

Text Generation Zero-shot Learning Coherent Output

7.8/10

Performance

8.0/10

Accuracy

Partially Open

License

Read Paper View Model Page Learning Resources

Google

BERT

LLM Open Source

Released: October 2018

Version: 1.0

Size: 340M parameters

Bidirectional Encoder Representations from Transformers, designed for deep contextual understanding of text. It revolutionized NLP by enabling bidirectional context in pre-training, excelling in tasks like question answering and text classification.

Bidirectional Context Pre-training Fine-tuning NLP Tasks

8.0/10

Performance

8.5/10

Accuracy

Apache 2.0

License

Read Paper View Model Page Learning Resources

OpenAI

GPT-1

LLM Research

Released: June 2018

Version: 1.0

Size: 117M parameters

The first in the GPT series, a transformer-based model focused on unsupervised learning for natural language tasks. It laid the foundation for future models with generative pre-training.

Text Generation Unsupervised Learning Pre-training

6.5/10

Performance

6.8/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources

Microsoft

Turing

LLM Proprietary

Released: Not disclosed

Version: Not disclosed

Size: Not disclosed

A family of language models developed by Microsoft Research, used across products like Bing and Azure for tasks like search and text generation. It leverages advanced techniques for efficiency and performance, contributing to Microsoft’s AI infrastructure.

Text Generation Search Efficient Scalable

8.7/10

Performance

8.9/10

Accuracy

Proprietary

License

Read Paper View Model Page Learning Resources