The Grand AI Handbook

Google

TensorFlow

Framework Open Source Enterprise

Released: November 2015

Language: Python, C++

Version: 2.14.0

TensorFlow is an end-to-end open-source platform for machine learning, with a comprehensive, flexible ecosystem of tools, libraries, and community resources. Developers, researchers, and enterprises use it to build, train, and deploy ML models at scale, with support for distributed training, production serving, and deployment to mobile and edge devices.

Key features:
- Deep learning framework with high-level APIs
- Distributed training across multiple devices
- TensorFlow Lite for mobile and embedded devices
- TensorFlow.js for browser-based ML

GitHub stars: 179k

Forks: 89k

Popularity: ★★★★★

Activity: ★★★★★

License: Apache 2.0

Meta

PyTorch

Framework Open Source Research

Released: January 2017

Language: Python, C++, CUDA

Version: 2.1.0

PyTorch is an open-source machine learning library that provides a smooth path from research prototyping to production deployment. Known for its dynamic computational graphs and Pythonic syntax, it offers an intuitive design and fast GPU-accelerated execution, and it is a leading framework for AI research as well as for production applications.

Key features:
- Dynamic computational graphs for flexible modeling
- Native support for tensors and GPU acceleration
- Distributed training with torch.distributed
- Strong ecosystem with torchvision and torchaudio

GitHub stars: 73k

Forks: 20k

Popularity: ★★★★★

Activity: ★★★★★

License: BSD-3-Clause

scikit-learn developers

scikit-learn

Library Open Source Machine Learning

Released: 2007

Language: Python, C++, Cython

Version: 1.3.1

Scikit-learn is a comprehensive machine learning library that provides simple and efficient tools for data mining and data analysis. Built on NumPy, SciPy, and matplotlib, it features various classification, regression, and clustering algorithms, making it the go-to library for traditional machine learning tasks with an intuitive and consistent API.

Key features:
- Comprehensive collection of ML algorithms
- Simple and consistent API across all models
- Excellent documentation and tutorials
- Integration with NumPy and SciPy

GitHub stars: 56k

Forks: 25k

Popularity: ★★★★★

Activity: ★★★★★

License: BSD-3-Clause
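The "simple and consistent API" listed for scikit-learn rests on a shared estimator convention: hyperparameters are set in the constructor, fit(X, y) learns from data and returns the estimator itself, and predict(X) produces outputs. The minimal sketch below imitates that convention in pure Python so it runs without scikit-learn installed; MeanRegressor is a hypothetical toy model, not a scikit-learn class.

```python
class MeanRegressor:
    """Hypothetical toy regressor that predicts the mean of the training targets.

    Mirrors scikit-learn conventions: hyperparameters are stored in __init__,
    fit() returns self, and learned state uses a trailing underscore.
    """

    def __init__(self, offset=0.0):
        self.offset = offset  # hyperparameter, stored unchanged

    def fit(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        self.mean_ = sum(y) / len(y) + self.offset  # learned attribute
        return self  # enables chaining: model.fit(X, y).predict(X)

    def predict(self, X):
        return [self.mean_ for _ in X]


X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]
preds = MeanRegressor().fit(X, y).predict([[10.0], [20.0]])
print(preds)  # [4.0, 4.0]
```

Because fit returns self, calls can be chained, and the trailing underscore on mean_ marks state that exists only after fitting, the same convention scikit-learn's own estimators follow.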

Hugging Face

Hugging Face Transformers

Library Open Source NLP

Released: November 2018

Language: Python

Version: 4.33.0

Transformers provides state-of-the-art pre-trained models and architectures for natural language processing, computer vision, and audio tasks. The library offers thousands of pretrained models that can be used for tasks like text classification, information extraction, question answering, summarization, and more, with seamless integration for training and deployment.

Key features:
- Access to thousands of pretrained models
- Support for NLP, vision, and audio tasks
- Easy fine-tuning and transfer learning
- Integration with PyTorch and TensorFlow

GitHub stars: 115k

Forks: 23k

Popularity: ★★★★★

Activity: ★★★★★

License: Apache 2.0

Intel

OpenCV

Library Open Source Computer Vision

Released: June 2000

Language: C++, Python, Java

Version: 4.8.0

OpenCV is the leading open-source computer vision and machine learning software library with over 2500 optimized algorithms. It provides a comprehensive infrastructure for real-time optimized image and video processing applications in a wide variety of fields including facial recognition, object detection, augmented reality, and autonomous vehicles.

Key features:
- Comprehensive computer vision algorithms
- Real-time image and video processing
- Multi-language support and bindings
- CUDA and OpenCL acceleration support

GitHub stars: 71k

Forks: 55k

Popularity: ★★★★★

Activity: ★★★★★

License: Apache 2.0

François Chollet

Keras

Framework Open Source Deep Learning

Released: March 2015

Language: Python

Version: 2.14.0

Keras is a high-level neural networks API written in Python. Keras 2.x, the version listed here, runs on top of TensorFlow; earlier multi-backend releases could also use CNTK or Theano. It provides a simple, flexible, and user-friendly interface for creating and training deep learning models, enabling fast experimentation through consistent and intuitive APIs.

Key features:
- User-friendly high-level API
- Modular and composable architecture
- Support for convolutional and recurrent networks
- Seamless computation on CPU and GPU

GitHub stars: 60k

Forks: 19k

Popularity: ★★★★★

Activity: ★★★★☆

License: Apache 2.0

DMLC

XGBoost

Library Open Source Machine Learning

Released: March 2014

Language: C++, Python, R

Version: 1.7.6

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework, providing parallel tree boosting (also known as GBDT or GBM) that solves many data science problems quickly and accurately, and it has been a popular choice in winning machine learning competition entries.

Key features:
- Highly efficient gradient boosting implementation
- Parallel and distributed computing capabilities
- Automatic handling of missing values
- Regularization to prevent overfitting

GitHub stars: 25k

Forks: 8.7k

Popularity: ★★★★★

Activity: ★★★★☆

License: Apache 2.0
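XGBoost's regularization shows up directly in how it scores candidate splits. For a node with gradient sum G and Hessian sum H, the optimal leaf weight is -G / (H + lambda), and splitting it into left and right children is worth Gain = 1/2 * [G_L^2 / (H_L + lambda) + G_R^2 / (H_R + lambda) - (G_L + G_R)^2 / (H_L + H_R + lambda)] - gamma, where lambda (reg_lambda) penalizes large leaf weights and gamma is the minimum loss reduction required to split. The pure-Python sketch below evaluates that published formula for squared-error loss (gradient = prediction - label, Hessian = 1); it is an illustration, not XGBoost's optimized implementation, and the function names are hypothetical.

```python
def leaf_weight(G, H, lam):
    """Optimal leaf weight -G / (H + lambda)."""
    return -G / (H + lam)


def split_gain(g, h, left_idx, lam=1.0, gamma=0.0):
    """Gain of sending rows in left_idx left and all other rows right."""
    left = set(left_idx)
    G_L = sum(g[i] for i in left)
    H_L = sum(h[i] for i in left)
    G, H = sum(g), sum(h)
    G_R, H_R = G - G_L, H - H_L

    def score(Gs, Hs):
        return Gs * Gs / (Hs + lam)

    return 0.5 * (score(G_L, H_L) + score(G_R, H_R) - score(G, H)) - gamma


# Squared-error loss: gradient = prediction - label, Hessian = 1.
y = [1.0, 1.0, 5.0, 5.0]
pred = [3.0] * 4                         # current prediction (the mean)
g = [p - t for p, t in zip(pred, y)]     # [2.0, 2.0, -2.0, -2.0]
h = [1.0] * len(y)

print(split_gain(g, h, [0, 1]))  # separates low from high labels: about 5.33
print(split_gain(g, h, [0, 2]))  # mixes them: 0.0
```

With lambda = 1 the left leaf weight is -4/3 instead of the unregularized -G/H = -2, which is the shrinkage that helps prevent overfitting; a large enough gamma makes the gain negative so the split is not made.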

LangChain AI

LangChain

Framework Open Source LLM

Released: October 2022

Language: Python, TypeScript

Version: 0.0.340

LangChain is a framework for developing applications powered by language models. It enables developers to build context-aware reasoning applications by connecting language models to sources of context and providing a standard interface for chains, agents, retrieval strategies, and other components, making it easier to build complex LLM applications.

Key features:
- Modular components for LLM applications
- Memory management for conversational systems
- Integration with various LLM providers
- Support for retrieval-augmented generation

GitHub stars: 72k

Forks: 11k

Popularity: ★★★★★

Activity: ★★★★★

License: MIT

Stability AI

Stable Diffusion

Model Open Source Computer Vision

Released: August 2022

Language: Python

Version: 2.1

Stable Diffusion is a state-of-the-art latent text-to-image diffusion model that generates photorealistic images from text descriptions. It can create and manipulate images based on text prompts, perform inpainting, outpainting, and image-to-image translations while offering exceptional quality and artistic flexibility, democratizing access to high-quality AI image generation.

Key features:
- High-quality text-to-image generation
- Image inpainting and outpainting
- Style transfer and image editing
- Openly released weights, with commercial use allowed subject to the license's use-based restrictions

GitHub stars: 63k

Forks: 11k

Popularity: ★★★★★

Activity: ★★★★★

License: CreativeML OpenRAIL-M
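As a latent diffusion model, Stable Diffusion runs the diffusion process on a compressed latent z produced by an autoencoder rather than on raw pixels. The standard formulation below is given as background; the exact noise schedule and prediction target vary between model versions. A clean latent z_0 is noised at step t using a schedule beta_s, and a denoising U-Net epsilon_theta, conditioned on the prompt c through a CLIP-family text encoder tau_theta, is trained to predict the added noise:

```latex
\begin{aligned}
z_t &= \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I),
  \qquad \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s) \\
\mathcal{L} &= \mathbb{E}_{z_0,\, c,\, \epsilon,\, t}
  \left[ \big\lVert \epsilon - \epsilon_\theta\big(z_t, t, \tau_\theta(c)\big) \big\rVert_2^2 \right]
\end{aligned}
```

At generation time the process runs in reverse: starting from pure noise, the model repeatedly removes predicted noise from z_t, and the decoder maps the final latent back to an image.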

pandas community

Pandas

Library Open Source Data Analysis

Released: January 2009

Language: Python, Cython

Version: 2.1.1

Pandas is a powerful, fast, and flexible open-source data analysis and manipulation tool built on top of Python. It provides data structures like DataFrame and Series for handling structured data, along with a comprehensive set of tools for data cleaning, transformation, and analysis, making it essential for data science workflows.

Key features:
- Efficient DataFrame objects for data manipulation
- Tools for reading and writing various file formats
- Intelligent data alignment and missing data handling
- Advanced time series functionality

GitHub stars: 40k

Forks: 17k

Popularity: ★★★★★

Activity: ★★★★★

License: BSD-3-Clause

Ultralytics

YOLO

Model Open Source Computer Vision

Released: June 2016

Language: Python (built on PyTorch)

Version: 8.0.0

YOLO (You Only Look Once) is a family of state-of-the-art, real-time object detection models that predict bounding boxes and class probabilities in a single pass of the network. Ultralytics offers a range of models (YOLOv8, YOLOv5, etc.) with strong speed-accuracy trade-offs for detecting objects in images and video, making them well suited to autonomous vehicles, security systems, and industrial automation.

Key features:
- Real-time object detection capabilities
- Instance segmentation and pose estimation
- Pre-trained models for various use cases
- Easy deployment to edge devices

GitHub stars: 42k

Forks: 11k

Popularity: ★★★★★

Activity: ★★★★★

License: AGPL-3.0
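Because a single-pass detector emits many overlapping candidate boxes, YOLO-style pipelines finish with non-maximum suppression (NMS): keep the highest-scoring box and discard any other box that overlaps it too much, measured by intersection over union (IoU). The pure-Python sketch below shows greedy NMS on (x1, y1, x2, y2) boxes; the 0.5 threshold is an illustrative assumption, and Ultralytics' actual post-processing is vectorized and configurable.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is suppressed, while the distant third box survives.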

Sebastián Ramírez

FastAPI

Framework Open Source API Development

Released: December 2018

Language: Python

Version: 0.103.1

FastAPI is a modern, fast (high-performance) web framework for building APIs with Python 3.7+ based on standard Python type hints. It provides automatic API documentation, validation, serialization, and asynchronous support, making it ideal for building microservices and ML model serving endpoints with minimal code while maintaining high performance.

Key features:
- Automatic interactive API documentation
- Data validation and serialization using Pydantic
- Native async support for high performance
- Type hints for better IDE support

GitHub stars: 64k

Forks: 5.4k

Popularity: ★★★★★

Activity: ★★★★★

License: MIT

Explosion AI

spaCy

Library Open Source NLP

Released: February 2015

Language: Python, Cython

Version: 3.6.1

spaCy is an industrial-strength natural language processing library designed for production use. It offers fast and accurate syntactic analysis, named entity recognition, dependency parsing, and built-in deep learning integration, providing developers with efficient tools for building sophisticated NLP pipelines that can handle large volumes of text.

Key features:
- Fast and accurate NLP pipelines
- Pre-trained models for multiple languages
- Named entity recognition and dependency parsing
- Deep learning integration with transformers

GitHub stars: 28k

Forks: 4.3k

Popularity: ★★★★★

Activity: ★★★★★

License: MIT

Linux Foundation

MLflow

Platform Open Source MLOps

Released: June 2018

Language: Python, JavaScript

Version: 2.7.1

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools for experiment tracking, model packaging, model registry, and deployment, enabling data scientists and ML engineers to develop, collaborate, and productionize machine learning models efficiently while maintaining reproducibility and version control.

Key features:
- Experiment tracking and versioning
- Model packaging in standard formats
- Model registry for versioning and staging
- Integration with major ML frameworks

GitHub stars: 16k

Forks: 3.7k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Anyscale

Ray

Framework Open Source Distributed Computing

Released: May 2017

Language: Python, C++

Version: 2.7.0

Ray is a unified framework for scaling AI and Python applications from a laptop to a cluster. It provides a simple, universal API for building distributed applications, including capabilities for distributed training, hyperparameter tuning, reinforcement learning, and serving, making it essential for scaling machine learning workflows to production.

Key features:
- Distributed computing for ML workflows
- Scalable hyperparameter tuning with Ray Tune
- Reinforcement learning with Ray RLlib
- Model serving with Ray Serve

GitHub stars: 29k

Forks: 4.9k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Google

MediaPipe

Framework Open Source Computer Vision

Released: June 2019

Language: C++, Python

Version: 0.10.7

MediaPipe is a cross-platform framework for building multimodal applied machine learning pipelines. It provides out-of-the-box solutions for common perception tasks like hand tracking, face detection, and pose estimation, enabling developers to create sophisticated AR and perception applications with minimal effort across mobile, web, and IoT devices.

Key features:
- Cross-platform ML pipeline framework
- Pre-built solutions for perception tasks
- Real-time performance on mobile devices
- Integration with TensorFlow Lite

GitHub stars: 24k

Forks: 5k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Hugging Face

Gradio

Library Open Source UI Framework

Released: February 2019

Language: Python, JavaScript

Version: 3.50.0

Gradio is an open-source Python library that helps you create machine learning demos and web applications with just a few lines of code. It enables rapid prototyping and sharing of machine learning models through user-friendly web interfaces, supporting various input and output types and making ML models accessible to non-technical users.

Key features:
- Quick ML demo creation with minimal code
- Support for multiple input/output types
- Built-in sharing capabilities
- Integration with Hugging Face Hub

GitHub stars: 25k

Forks: 1.9k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Google

JAX

Library Open Source Deep Learning

Released: December 2018

Language: Python, C++

Version: 0.4.16

JAX is a high-performance numerical computing library that combines NumPy's familiar API with automatic differentiation and hardware acceleration. It provides composable function transformations, including automatic differentiation (grad), vectorization (vmap), and just-in-time compilation through XLA (jit) for GPUs and TPUs, making it well suited to cutting-edge ML research and production deployment.

Key features:
- Automatic differentiation for gradient computation
- Hardware acceleration on GPUs and TPUs
- Composable function transformations
- NumPy-compatible API for easy adoption

GitHub stars: 26k

Forks: 2.4k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Iterative

DVC

Tool Open Source MLOps

Released: May 2017

Language: Python

Version: 3.27.0

DVC (Data Version Control) is an open-source version control system for machine learning projects. It works alongside Git to manage and version large data files, ML models, and experiments, providing reproducibility and collaboration features specifically designed for data science teams, making ML projects as maintainable as software projects.

Key features:
- Version control for data and models
- ML pipeline management and automation
- Experiment tracking and comparison
- Storage-agnostic remote data management

GitHub stars: 13k

Forks: 1.1k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0
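The core idea that lets DVC version large files alongside Git is content addressing: a file's bytes are hashed, the file is stored in a cache under that hash, and only a small metadata file recording the hash is committed to Git. The sketch below imitates that idea with MD5 in a temporary directory; the two-character subfolder layout resembles DVC's cache but is an assumption here, and add_to_cache is a hypothetical helper, not DVC's API.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def add_to_cache(path: Path, cache_dir: Path) -> str:
    """Copy `path` into a content-addressed cache and return its MD5 hash."""
    digest = hashlib.md5(path.read_bytes()).hexdigest()
    target = cache_dir / digest[:2] / digest[2:]
    if not target.exists():  # identical content is stored only once
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(path, target)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache = root / "cache"
    a = root / "train_v1.csv"
    b = root / "train_copy.csv"
    a.write_bytes(b"x,y\n1,2\n")
    b.write_bytes(b"x,y\n1,2\n")  # same bytes under a different name

    h1 = add_to_cache(a, cache)
    h2 = add_to_cache(b, cache)
    stored = [p for p in cache.rglob("*") if p.is_file()]
    print(h1 == h2, len(stored))  # True 1
```

Git would then track only a small pointer file containing the hash, so switching branches or commits restores the matching data by looking it up in the cache or a remote.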

Meta

Detectron2

Framework Open Source Computer Vision

Released: October 2019

Language: Python, C++

Version: 0.6

Detectron2 is Meta AI Research's next-generation library that provides state-of-the-art detection and segmentation algorithms. It features flexible and modular design, high performance, and extensive support for various computer vision tasks including object detection, instance segmentation, keypoint detection, and panoptic segmentation, making it a go-to choice for computer vision research.

Key features:
- State-of-the-art object detection algorithms
- Instance and semantic segmentation
- Panoptic segmentation capabilities
- Flexible model architecture design

GitHub stars: 27k

Forks: 7.3k

Popularity: ★★★★☆

Activity: ★★★★☆

License: Apache 2.0

Microsoft

LightGBM

Library Open Source Machine Learning

Released: October 2016

Language: C++, Python

Version: 4.1.0

LightGBM is a gradient boosting framework that uses tree-based learning algorithms. It's designed for distributed and efficient training, making it ideal for large-scale machine learning tasks with remarkable speed and accuracy. LightGBM excels at handling large datasets with lower memory usage and offers parallel and GPU learning capabilities.

Key features:
- Faster training speed and higher efficiency
- Lower memory usage with large datasets
- Parallel and GPU learning supported
- Fast split finding with histogram-based algorithms

GitHub stars: 16k

Forks: 3.8k

Popularity: ★★★★☆

Activity: ★★★★★

License: MIT
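LightGBM's histogram-based algorithm buckets each continuous feature into a small number of bins, accumulates gradient and Hessian sums per bin, and then evaluates splits only at bin boundaries rather than at every sorted value, which is a large part of its speed and memory savings. The sketch below shows the idea for one feature; the equal-width binning, the tiny bin count, and the unregularized gain G^2/H are simplifying assumptions, not LightGBM's exact implementation.

```python
def best_histogram_split(x, g, h, n_bins=4):
    """Return (best_bin, gain): rows in bins <= best_bin go left."""
    x_min, x_max = min(x), max(x)
    width = (x_max - x_min) / n_bins or 1.0  # avoid zero width for constant x

    # 1. Bucket each value and accumulate gradient/Hessian sums per bin.
    G_bin = [0.0] * n_bins
    H_bin = [0.0] * n_bins
    for xi, gi, hi in zip(x, g, h):
        b = min(int((xi - x_min) / width), n_bins - 1)
        G_bin[b] += gi
        H_bin[b] += hi
    G_tot, H_tot = sum(G_bin), sum(H_bin)

    # 2. Scan only the n_bins - 1 bin boundaries, not every distinct value.
    best_b, best_gain = None, 0.0
    G_L = H_L = 0.0
    for b in range(n_bins - 1):
        G_L += G_bin[b]
        H_L += H_bin[b]
        G_R, H_R = G_tot - G_L, H_tot - H_L
        if H_L == 0 or H_R == 0:
            continue  # one side is empty, not a real split
        gain = G_L**2 / H_L + G_R**2 / H_R - G_tot**2 / H_tot
        if gain > best_gain:
            best_b, best_gain = b, gain
    return best_b, best_gain


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 10, 10, 10, 10]
g = [5.0 - t for t in y]  # squared-error gradients at prediction 5
h = [1.0] * len(x)
print(best_histogram_split(x, g, h))  # (1, 200.0)
```

Here the bins are 1.75 wide, so the chosen boundary after bin 1 sends x < 4.5 left, exactly separating the two label groups while evaluating only three candidate thresholds.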

Meta

Fairseq

Framework Open Source NLP

Released: August 2017

Language: Python

Version: 0.12.2

Fairseq is a sequence modeling toolkit for training custom models for translation, summarization, language modeling and other text generation tasks. It provides state-of-the-art implementations of sequence models including transformers, convolutional nets, and LSTMs, with a focus on research flexibility and production efficiency.

Key features:
- State-of-the-art sequence modeling architectures
- Distributed training on multiple GPUs/machines
- Flexible and extensible research framework
- Pre-trained models for various NLP tasks

GitHub stars: 29k

Forks: 6.3k

Popularity: ★★★★☆

Activity: ★★★★★

License: MIT

Linux Foundation

ONNX

Standard Open Source Interoperability

Released: September 2017

Language: C++, Python

Version: 1.14.1

ONNX (Open Neural Network Exchange) is an open format built to represent machine learning models. It defines a common set of operators and a common file format to enable AI developers to use models with a variety of frameworks, tools, runtimes, and compilers, facilitating seamless model interoperability across different platforms.

Key features:
- Framework-agnostic model representation
- Extensive operator support across frameworks
- Model optimization and conversion tools
- Hardware acceleration support

GitHub stars: 16k

Forks: 3.8k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Allen Institute for AI

AllenNLP

Library Open Source NLP

Released: September 2017

Language: Python

Version: 2.10.1

AllenNLP is an open-source NLP research library built on PyTorch. It provides modular components, abstractions, and implementations for common NLP tasks, making it easy to develop state-of-the-art deep learning models for natural language understanding, featuring high-quality reference implementations and research-focused design.

Key features:
- High-level abstractions for NLP research
- Reference implementations of state-of-the-art models
- Configuration-driven experiment management
- Comprehensive evaluation metrics and visualization

GitHub stars: 11.7k

Forks: 2.3k

Popularity: ★★★★☆

Activity: ★★★★☆

License: Apache 2.0

Preferred Networks

Optuna

Framework Open Source AutoML

Released: December 2018

Language: Python

Version: 3.3.0

Optuna is an automatic hyperparameter optimization framework that allows for efficient optimization of machine learning model parameters. It provides a define-by-run API, distributed optimization capabilities, and supports pruning of unpromising trials, making hyperparameter tuning more efficient and accessible for ML practitioners.

Key features:
- Define-by-run API for flexible search spaces
- State-of-the-art optimization algorithms
- Distributed hyperparameter optimization
- Visualization dashboard for optimization process

GitHub stars: 9.1k

Forks: 981

Popularity: ★★★★☆

Activity: ★★★★★

License: MIT
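Pruning stops a trial early when its intermediate results look unpromising. One rule Optuna offers is median pruning (MedianPruner): a trial is stopped at a given step if its best intermediate value so far is worse than the median of the values that earlier trials reported at that same step. The pure-Python sketch below implements that rule for a minimization objective; it omits Optuna's warm-up and startup-trial settings, and should_prune is a hypothetical helper, not Optuna's API.

```python
import statistics


def should_prune(trial_values, completed_trials, step):
    """Median stopping rule for a minimization objective.

    trial_values: intermediate values reported so far by the current trial,
        indexed by step.
    completed_trials: list of dicts mapping step -> intermediate value.
    """
    history = [t[step] for t in completed_trials if step in t]
    if not history:
        return False  # nothing to compare against yet
    best_so_far = min(trial_values[: step + 1])
    return best_so_far > statistics.median(history)


completed = [{0: 0.9, 1: 0.5}, {0: 0.8, 1: 0.4}, {0: 0.7, 1: 0.3}]
print(should_prune([0.85, 0.60], completed, step=1))  # True: 0.60 > median 0.4
print(should_prune([0.75, 0.35], completed, step=1))  # False: 0.35 <= 0.4
```

Because the check needs only the values reported so far, an objective can call it after every epoch and abandon a poor configuration long before training finishes.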

Uber

Horovod

Framework Open Source Distributed Computing

Released: October 2017

Language: C++, Python

Version: 0.28.1

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. It uses the ring-allreduce algorithm to make distributed training fast and easy: an existing single-GPU training script needs only minimal changes to scale out, with near-linear scaling reported on many workloads.

Key features:
- Efficient distributed training for multiple frameworks
- Ring-allreduce algorithm for optimal performance
- Minimal code changes for distributed training
- Support for CPU, GPU, and heterogeneous clusters

GitHub stars: 13.8k

Forks: 2.2k

Popularity: ★★★★☆

Activity: ★★★★☆

License: Apache 2.0
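In ring-allreduce, each of N workers splits its gradient vector into N chunks and exchanges chunks only with its neighbors in a ring. N - 1 reduce-scatter steps leave each worker holding the full sum of one chunk, and N - 1 allgather steps circulate those sums until every worker has the complete result; each worker sends roughly 2(N - 1)/N times its gradient size regardless of N. The single-process simulation below illustrates the algorithm with plain lists; it is not Horovod's implementation, which runs over communication backends such as MPI, Gloo, or NCCL.

```python
def ring_allreduce(data):
    """Simulate a sum ring-allreduce across len(data) workers in one process.

    data[i] is worker i's gradient vector; its length must be divisible by
    the number of workers so it splits into equal chunks.
    """
    n = len(data)
    size = len(data[0]) // n
    # chunks[w][c] is worker w's current copy of chunk c.
    chunks = [[list(v[c * size:(c + 1) * size]) for c in range(n)] for v in data]

    # Phase 1, reduce-scatter: at each step worker w sends chunk (w - step)
    # to its right neighbor, which adds it to its own copy.
    for step in range(n - 1):
        outgoing = []
        for w in range(n):
            c = (w - step) % n
            outgoing.append(((w + 1) % n, c, list(chunks[w][c])))
        for dst, c, payload in outgoing:  # deliver all sends "simultaneously"
            for k in range(size):
                chunks[dst][c][k] += payload[k]
    # Worker w now holds the fully reduced chunk (w + 1) % n.

    # Phase 2, allgather: worker w forwards chunk (w + 1 - step), and the
    # receiver overwrites its copy with the fully reduced values.
    for step in range(n - 1):
        outgoing = []
        for w in range(n):
            c = (w + 1 - step) % n
            outgoing.append(((w + 1) % n, c, list(chunks[w][c])))
        for dst, c, payload in outgoing:
            chunks[dst][c] = payload

    # Reassemble each worker's chunks into one flat vector.
    return [[x for chunk in worker for x in chunk] for worker in chunks]


data = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]  # 3 workers, chunks of size 1
print(ring_allreduce(data))
# [[111, 222, 333], [111, 222, 333], [111, 222, 333]]
```

Because every worker sends and receives one chunk per step, no single node becomes a bandwidth bottleneck, which is why this pattern scales better than sending all gradients to a central parameter server.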

Microsoft

DeepSpeed

Library Open Source Distributed Computing

Released: May 2020

Language: Python, C++

Version: 0.10.3

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. It enables training of large models with trillions of parameters, offering advanced optimizations like ZeRO, pipeline parallelism, and 3D parallelism, while significantly reducing memory requirements and training time.

Key features:
- ZeRO optimizer for memory efficiency
- Pipeline parallelism for large model training
- Expert parallelism for MoE models
- Automatic mixed precision training

GitHub stars: 32k

Forks: 3.8k

Popularity: ★★★★★

Activity: ★★★★★

License: MIT
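The memory model from the ZeRO paper makes the savings concrete. With mixed-precision Adam, each parameter costs 2 bytes of fp16 weights, 2 bytes of fp16 gradients, and K = 12 bytes of optimizer state (fp32 master weights, momentum, and variance), or 16 bytes per parameter in total. ZeRO stages 1, 2, and 3 progressively partition the optimizer states, then the gradients, then the parameters across the data-parallel GPUs. The arithmetic below estimates per-GPU model-state memory for a 7.5B-parameter model on 64 GPUs; it ignores activations, temporary buffers, and fragmentation, and zero_memory_gb is a hypothetical helper.

```python
def zero_memory_gb(params, n_gpus, stage, k=12):
    """Per-GPU model-state memory (GB) under ZeRO, following the paper's model.

    Mixed-precision Adam: 2 bytes fp16 params + 2 bytes fp16 grads
    + k bytes optimizer state per parameter (k = 12 for Adam).
    """
    p, g, o = 2 * params, 2 * params, k * params
    if stage >= 1:
        o /= n_gpus  # stage 1: partition optimizer states
    if stage >= 2:
        g /= n_gpus  # stage 2: also partition gradients
    if stage >= 3:
        p /= n_gpus  # stage 3: also partition parameters
    return (p + g + o) / 1e9


for stage in range(4):
    print(stage, round(zero_memory_gb(7.5e9, 64, stage), 1))
# 0 120.0
# 1 31.4
# 2 16.6
# 3 1.9
```

Plain data parallelism would need 120 GB per GPU just for model states, while full partitioning brings that below 2 GB, which is what makes training models far larger than a single GPU's memory feasible.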

Kornia AI

Kornia

Library Open Source Computer Vision

Released: March 2019

Language: Python

Version: 0.7.0

Kornia is a differentiable computer vision library for PyTorch that provides a set of routines and differentiable modules to solve generic computer vision problems. It enables end-to-end training of deep learning models with geometric computer vision operations, making complex visual tasks differentiable and GPU-accelerated.

Key features:
- Differentiable computer vision operations
- GPU-accelerated image processing
- Augmentation pipelines for training
- Geometric computer vision algorithms

GitHub stars: 9.2k

Forks: 942

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Weights & Biases

Platform Open Source MLOps

Released: May 2018

Language: Python

Version: 0.15.12

Weights & Biases (W&B) is an ML experiment tracking platform that provides tools for experiment tracking, model optimization, and dataset versioning. It integrates with popular ML frameworks, enabling teams to track metrics, visualize model performance, and collaborate on machine learning projects.

Key features:
- Experiment tracking and visualization
- Hyperparameter sweep orchestration
- Model and dataset versioning
- Team collaboration and reporting

GitHub stars: 8.2k

Forks: 605

Popularity: ★★★★☆

Activity: ★★★★★

License: MIT (wandb client library)

UKPLab

Sentence Transformers

Library Open Source NLP

Released: August 2019

Language: Python

Version: 2.2.2

Sentence Transformers is a Python framework for state-of-the-art sentence, text, and image embeddings. It provides an easy method to compute dense vector representations for sentences, paragraphs, and images, enabling semantic search, clustering, and information retrieval tasks with remarkable efficiency and accuracy.

Key features:
- Pre-trained models for sentence embeddings
- Multilingual and cross-lingual models
- Easy fine-tuning for domain adaptation
- Efficient similarity search implementations

GitHub stars: 13.2k

Forks: 2.3k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0
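Semantic search with sentence embeddings reduces to comparing vectors: encode the query and the documents, then rank documents by cosine similarity. The sketch below covers only the ranking step and uses tiny hand-written 3-dimensional vectors as hypothetical stand-ins for real embeddings (a model would produce them, typically with hundreds of dimensions), so it runs without the library installed.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_search(query_vec, corpus_vecs, top_k=2):
    """Return (index, score) pairs for the top_k most similar corpus entries."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]


# Hypothetical toy embeddings, not produced by a real model.
corpus = {
    "A cat sits on the mat": [0.9, 0.1, 0.0],
    "Dogs love to play fetch": [0.7, 0.6, 0.1],
    "Stock markets fell today": [0.0, 0.1, 0.95],
}
texts, vecs = list(corpus), list(corpus.values())
query = [0.85, 0.2, 0.05]  # stand-in embedding for "a kitten on a rug"

for idx, score in semantic_search(query, vecs):
    print(f"{score:.3f}  {texts[idx]}")
```

For large corpora, the same ranking is usually done with batched matrix multiplication or an approximate nearest-neighbor index instead of a Python loop.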

Meta

Prophet

Library Open Source Time Series

Released: February 2017

Language: Python, R

Version: 1.1.5

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data, making it ideal for business forecasting.

Key features:
- Automatic seasonality detection
- Robust to missing data and outliers
- Built-in holiday effects
- Intuitive parameter tuning

GitHub stars: 17.5k

Forks: 4.5k

Popularity: ★★★★☆

Activity: ★★★★☆

License: MIT
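The additive model described above can be written as a sum of components: g(t) is the trend (piecewise linear or logistic growth), s(t) the periodic seasonality, h(t) the holiday effects, and epsilon_t an error term. Seasonality is modeled with a truncated Fourier series of period P (for example P = 365.25 for yearly seasonality with time in days), where the number of terms N is a tunable parameter and the coefficients a_n, b_n are fitted from data:

```latex
\begin{aligned}
y(t) &= g(t) + s(t) + h(t) + \varepsilon_t \\
s(t) &= \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
\end{aligned}
```

Larger N lets the seasonal curve follow sharper patterns at the cost of a higher risk of overfitting.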

Rasa Technologies

Rasa

Framework Open Source Conversational AI

Released: December 2016

Language: Python

Version: 3.6.12

Rasa is an open-source machine learning framework for building conversational AI assistants and chatbots. It provides tools for intent classification, entity extraction, and dialogue management, enabling developers to create contextual AI assistants that can have natural conversations while integrating with existing systems.

Key features:
- Natural language understanding pipeline
- Dialogue management with machine learning
- Custom action server for integration
- Multilingual chatbot support

GitHub stars: 17.5k

Forks: 4.4k

Popularity: ★★★★☆

Activity: ★★★★☆

License: Apache 2.0

Streamlit

Library Open Source UI Framework

Released: October 2019

Language: Python

Version: 1.27.2

Streamlit is an open-source Python library that makes it easy to create custom web apps for machine learning and data science. It turns data scripts into shareable web apps in minutes, requiring no front-end experience, making it perfect for data scientists and ML engineers to create interactive demos and dashboards.

Key features:
- Simple Python API for web app creation
- Built-in widgets for data visualization
- Real-time app updates during development
- Easy deployment and sharing

GitHub stars: 30k

Forks: 2.7k

Popularity: ★★★★★

Activity: ★★★★★

License: Apache 2.0

Google

Flax

Library Open Source Deep Learning

Released: March 2020

Language: Python

Version: 0.7.4

Flax is a neural network library for JAX designed for flexibility and high performance. It offers a simple, scalable, and flexible approach to neural network construction, particularly suited for research environments where customization and performance are paramount, leveraging JAX's powerful transformation capabilities.

Key features:
- Neural network library built on JAX
- Flexible module system for research
- Automatic state management
- Seamless integration with JAX transforms

GitHub stars: 5.3k

Forks: 574

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Project MONAI

MONAI

Framework Open Source Medical Imaging

Released: April 2020

Language: Python

Version: 1.3.0

MONAI is a PyTorch-based framework for deep learning in healthcare imaging. It provides domain-optimized foundational capabilities for developing healthcare imaging training workflows, offering a comprehensive set of medical image-specific operations, models, and utilities for research and clinical applications.

Key features:
- Medical image-specific data operations
- Standardized training workflows
- Pre-trained models for medical tasks
- Integration with popular medical formats

GitHub stars: 5.1k

Forks: 948

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Baidu

PaddlePaddle

Framework Open Source Deep Learning

Released: September 2016

Language: C++, Python

Version: 2.5.1

PaddlePaddle (PArallel Distributed Deep LEarning) is an industrial platform with advanced technologies and rich features for deep learning. It provides an easy-to-use, efficient, flexible, and scalable deep learning platform, with special focus on deployability and enterprise applications in Chinese language processing and industry use cases.

Key features:
- High-performance distributed training
- Extensive model repository
- Strong support for Chinese-language NLP
- Industrial deployment optimization

GitHub stars: 21.5k

Forks: 5.4k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Uber

Ludwig

Framework Open Source AutoML

Released: February 2019

Language: Python

Version: 0.8.1

Ludwig is a declarative machine learning framework that makes it easy to define deep learning pipelines with a simple configuration file. It enables users to train state-of-the-art models without writing code, supporting a variety of data types and tasks, making ML accessible to non-experts while being flexible for researchers.

Key features:
- Declarative model definition via YAML
- Support for multiple data types
- Automatic feature preprocessing
- Integration with popular ML libraries

GitHub stars: 10.5k

Forks: 1.2k

Popularity: ★★★★☆

Activity: ★★★★☆

License: Apache 2.0

NVIDIA

TensorRT

SDK Performance Inference

Released: September 2016

Language: C++, Python

Version: 8.6.1

TensorRT is NVIDIA's SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications, optimizing neural network models to leverage NVIDIA GPUs with precision calibration and layer fusion capabilities.

Key features:
- Deep learning inference optimization
- Multi-precision inference (FP32, FP16, INT8)
- Dynamic tensor memory management
- Layer and tensor fusion

GitHub stars: 8.5k

Forks: 2.2k

Popularity: ★★★★☆

Activity: ★★★★★

License: Proprietary (the open-source components in the GitHub repository, such as parsers, plugins, and samples, are Apache 2.0)
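INT8 inference, one of the precisions listed above, maps floating-point values to 8-bit integers using a per-tensor scale derived from the dynamic range observed during calibration. The sketch below shows symmetric linear quantization with the scale set to the calibrated absolute maximum divided by 127; TensorRT's calibrators (for example entropy calibration) choose the range more carefully, so max calibration and the [-127, 127] range here are simplifying assumptions, and the helper names are hypothetical.

```python
def int8_scale(calibration_values):
    """Symmetric per-tensor scale from the max absolute calibration value."""
    amax = max(abs(v) for v in calibration_values)
    return amax / 127.0


def quantize(x, scale):
    """Map floats to int8 codes in [-127, 127] (symmetric, zero-point 0)."""
    return [max(-127, min(127, round(v / scale))) for v in x]


def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in q]


calib = [-2.54, 0.3, 1.27, 2.0]
scale = int8_scale(calib)  # 2.54 / 127, about 0.02
q = quantize([1.0, -2.54, 3.0], scale)
print(q)  # [50, -127, 127]: 3.0 lies outside the calibrated range and clamps
print(dequantize(q, scale))
```

The trade-off is visible in the clamped value: a wider calibration range avoids clipping outliers but spends fewer integer levels on typical values, which is why calibration strategy matters for INT8 accuracy.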

Significant Gravitas

Auto-GPT

Application Open Source Autonomous AI

Released: March 2023

Language: Python

Version: 0.4.7

Auto-GPT is an experimental open-source application that showcases the capabilities of the GPT-4 language model. Driven by GPT-4, it chains together LLM "thoughts" to autonomously pursue goals set by the user; the project's original tagline described it as developing and managing businesses to increase net worth. It demonstrates the potential of autonomous AI agents through goal-oriented task execution.

Key features:
- Autonomous AI agent framework
- GPT-4-powered decision making
- Internet access for research and data gathering
- Long-term and short-term memory management

GitHub stars: 157k

Forks: 40k

Popularity: ★★★★★

Activity: ★★★★★

License: MIT

Meta

fastText

Library Open Source NLP

Released: August 2016

Language: C++

Version: 0.9.2

fastText is a library for efficient learning of word representations and sentence classification. It allows for training of supervised and unsupervised models on massive datasets quickly, providing high-quality word vectors for 157 languages and supporting text classification with blazing speed and efficiency.

Key features:
- Fast and accurate text classification
- Efficient word representation learning
- Pre-trained models for 157 languages
- Subword information for better representations

GitHub stars: 25.5k

Forks: 4.7k

Popularity: ★★★★☆

Activity: ★★★☆☆

License: MIT
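fastText's subword information comes from representing each word as a bag of character n-grams: the word is wrapped in boundary markers "<" and ">", and its vector is the sum of the vectors of its n-grams plus the whole word itself, which lets the model build vectors for rare or unseen words. The sketch below only extracts the n-grams; the default range of 3 to 6 characters follows the fastText paper, and char_ngrams is a hypothetical helper rather than the library's API.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of `word` with fastText-style boundary markers."""
    token = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    if token not in grams:
        grams.append(token)  # the full word is kept as its own unit
    return grams


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

The boundary markers distinguish prefixes and suffixes from word-internal sequences, so the trigram "her" inside "where" is a different unit from the standalone word "<her>".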

Yandex

CatBoost

Library Open Source Machine Learning

Released: July 2017

Language: C++, Python

Version: 1.2.2

CatBoost is a high-performance gradient boosting library that handles categorical features natively, making it well suited to real-world datasets with mixed data types. It offers strong out-of-the-box performance and built-in GPU acceleration, and it typically needs little hyperparameter tuning, which makes it accessible to beginners and experts alike.

Key features:
- Native categorical feature support
- GPU acceleration for training
- Reduced overfitting with ordered boosting
- Fast inference for production use

GitHub stars: 7.6k

Forks: 1.1k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0
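CatBoost's native categorical handling replaces each category with a target statistic, and its ordered scheme computes that statistic for each row using only the rows that precede it in a random permutation, so a row's own label never leaks into its encoding. The sketch below computes such ordered statistics as (sum of earlier targets in the same category + a * p) / (count of earlier rows in the category + a); the prior p, the weight a, and the single fixed ordering are illustrative assumptions, not CatBoost's defaults.

```python
def ordered_target_stats(categories, targets, a=1.0, p=0.5):
    """Encode each row's category using only earlier rows (no target leakage).

    Rows are assumed to already be in a random permutation order.
    """
    sums, counts = {}, {}
    encoded = []
    for cat, y in zip(categories, targets):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + a * p) / (c + a))  # uses history only, not y
        sums[cat] = s + y                      # then add this row to history
        counts[cat] = c + 1
    return encoded


cats = ["red", "blue", "red", "red", "blue"]
ys = [1, 0, 1, 0, 1]
print(ordered_target_stats(cats, ys))
# [0.5, 0.5, 0.75, 0.8333333333333334, 0.25]
```

The first occurrence of each category falls back to the prior p, and later rows see progressively more history, which mirrors what the model will face at inference time.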

PyG Team

PyG (PyTorch Geometric)

Library Open Source Graph ML

Released: November 2018

Language: Python, C++

Version: 2.4.0

PyTorch Geometric is a library for deep learning on irregularly structured input data such as graphs, point clouds, and manifolds. It provides efficient data loaders, various graph neural network layers, and high-performance processing for graph-structured data, enabling state-of-the-art graph learning research and applications.

Key features:
- Comprehensive graph neural network layers
- Efficient data loaders for graphs
- GPU-accelerated graph operations
- Integration with the PyTorch ecosystem

GitHub stars: 19.5k

Forks: 3.4k

Popularity: ★★★★★

Activity: ★★★★★

License: MIT
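Most graph neural network layers in PyG follow a message-passing pattern: each node gathers messages from its neighbors, aggregates them (for example by sum, mean, or max), and combines the result with its own features. The pure-Python sketch below performs one round of mean aggregation with a self-loop over an edge list in (source, target) order; it omits learnable weights and nonlinearities, so it illustrates the pattern rather than any specific PyG layer, and mean_message_passing is a hypothetical helper.

```python
def mean_message_passing(x, edges):
    """One propagation step: new_x[v] = mean of x[v] and its in-neighbors.

    x: list of per-node feature vectors.
    edges: list of (source, target) pairs; messages flow source -> target.
    """
    n, dim = len(x), len(x[0])
    agg = [list(x[v]) for v in range(n)]  # start from a self-loop
    count = [1] * n
    for src, dst in edges:
        for k in range(dim):
            agg[dst][k] += x[src][k]  # message = neighbor's features
        count[dst] += 1
    return [[val / count[v] for val in agg[v]] for v in range(n)]


# Path graph 0 - 1 - 2, stored with both edge directions.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
print(mean_message_passing(x, edges))
```

Stacking k such rounds lets information travel k hops, which is how deeper GNNs capture wider graph neighborhoods.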

MindsDB

Platform Open Source AutoML

Released: August 2018

Language: Python

Version: 23.11.4

MindsDB is an AI automation platform that brings machine learning into databases, enabling developers to build AI applications directly with SQL. It simplifies the integration of AI models with existing data infrastructure, allowing real-time predictions and automated machine learning workflows without requiring data science expertise.

Key features:
- AI layer for databases
- AutoML with SQL interface
- Integration with multiple data sources
- Real-time model training and predictions

GitHub stars: 20k

Forks: 2.8k

Popularity: ★★★★☆

Activity: ★★★★★

License: GPL-3.0

deepset

Haystack

Framework Open Source NLP

Released: November 2019

Language: Python

Version: 2.0.0

Haystack is an end-to-end framework for building production-ready NLP applications, focused on search, question answering, and document retrieval. It combines transformer models with traditional search algorithms, providing a flexible architecture for building LLM-powered applications at scale with multiple document stores and retrievers.

Key features:
- Production-ready NLP pipelines
- Integration with LLMs and vector stores
- Flexible document retrieval system
- Question answering and semantic search

GitHub stars: 13k

Forks: 1.7k

Popularity: ★★★★☆

Activity: ★★★★★

License: Apache 2.0

Scott Lundberg

SHAP

Library Open Source Explainable AI

Released: November 2017

Language: Python, C++

Version: 0.43.0

SHAP (SHapley Additive exPlanations) is a game-theoretic approach to explaining the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory, providing a unified measure of feature importance for model interpretability and debugging.

Model-agnostic explanations Tree-based model optimizations Visualization tools for interpretability Local and global feature importance
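The Shapley value behind SHAP can be computed exactly for a tiny model by enumerating every coalition of features. The sketch below is a brute-force illustration, not the SHAP library's API. The model is hypothetical. It also assumes that a feature "absent" from a coalition takes a fixed baseline value; SHAP's explainers instead average over background data and use much faster approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for a single input by enumerating every coalition.

    Assumption: a feature "absent" from a coalition takes its baseline value.
    Cost grows as O(n * 2^n), so this only suits a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Keep coalition features from x; fill the rest from the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model: feature 0 is linear, features 1 and 2 interact.
model = lambda z: 2 * z[0] + z[1] * z[2] + 3
x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
# The attributions sum to model(x) - model(baseline) = 11 - 3 = 8.
```

Feature 0 receives its linear contribution of 2. The interaction term z1 * z2 = 6 is split evenly between features 1 and 2, giving 3 each. This additivity (attributions summing to the prediction minus the baseline) is the property SHAP's explanations preserve.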

21k

3.1k

Updated

★★★★★

Popularity

★★★★★

Activity

MIT

License

GitHub Website Documentation

Linux Foundation

Feast

Platform Open Source MLOps

Released: January 2019

Language: Python, Go

Version: 0.35.0

Feast is an open-source feature store that serves machine learning features to real-time applications with production-grade reliability. It provides a centralized platform for managing feature definitions, ensures consistency between training and serving, and supports both batch and streaming feature computation for ML pipelines.

Centralized feature management Real-time feature serving Point-in-time correctness Integration with data warehouses
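"Point-in-time correctness" means that each training example sees only feature values that existed at the example's own timestamp. The sketch below is a minimal pure-Python version of such a join, not Feast's implementation. The entity names and integer timestamps are hypothetical, and it ignores details such as feature TTLs that a real feature store applies.

```python
from bisect import bisect_right


def point_in_time_join(events, feature_rows):
    """Attach to each (entity_id, event_ts) the most recent feature value
    observed at or before event_ts, so training rows never see future data.

    feature_rows: iterable of (entity_id, feature_ts, value) tuples.
    Returns (entity_id, event_ts, value_or_None) tuples.
    """
    # Per entity, keep timestamps and values sorted by feature timestamp.
    history = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda row: row[1]):
        timestamps, values = history.setdefault(entity_id, ([], []))
        timestamps.append(ts)
        values.append(value)

    joined = []
    for entity_id, event_ts in events:
        timestamps, values = history.get(entity_id, ([], []))
        # Number of feature rows with ts <= event_ts; the last of them wins.
        idx = bisect_right(timestamps, event_ts)
        joined.append((entity_id, event_ts, values[idx - 1] if idx else None))
    return joined


# Hypothetical driver statistics with integer timestamps.
feature_rows = [("driver_1", 10, 0.5), ("driver_1", 20, 0.7), ("driver_2", 15, 0.9)]
events = [("driver_1", 15), ("driver_1", 25), ("driver_2", 5)]
result = point_in_time_join(events, feature_rows)
```

The event for driver_1 at time 15 gets 0.5, not the later 0.7. The event for driver_2 at time 5 gets no value because its only feature row lies in the future. This is the leakage a naive "latest value" join would introduce.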

5.2k

915

Updated

★★★★☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

NVIDIA

NeMo

Framework Open Source Conversational AI

Released: August 2019

Language: Python

Version: 1.21.0

NVIDIA NeMo is a toolkit for building, training, and fine-tuning GPU-accelerated speech AI and natural language processing models. It provides pre-trained models, training recipes, and optimized building blocks for creating state-of-the-art conversational AI applications, with special support for large language models and multimodal systems.

Pre-trained models for ASR, NLP, and TTS Large language model training support Multi-GPU and multi-node scaling Mixed precision training optimization

10k

2.1k

Updated

★★★★☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Apache

MXNet

Framework Open Source Deep Learning

Released: October 2015

Language: C++, Python

Version: 1.9.1

Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows mixing symbolic and imperative programming to maximize efficiency and productivity, offering scalability across multiple GPUs and multiple machines, making it suitable for both research and industrial applications.

Hybrid programming model Distributed training support Memory efficiency optimizations Gluon API for flexibility

20.6k

6.8k

Updated

★★★★☆

Popularity

★★★★☆

Activity

Apache 2.0

License

GitHub Website Documentation

OpenMMLab

MMDetection

Framework Open Source Computer Vision

Released: June 2018

Language: Python

Version: 3.2.0

MMDetection is an open-source object detection toolbox based on PyTorch, part of the OpenMMLab project. It provides a modular design with support for various object detection frameworks, extensive model zoo, and flexible configuration system, making it ideal for both research and production deployment of detection models.

Modular design for object detection Rich model zoo with pre-trained weights Support for mainstream detection methods Easy configuration and customization

27k

9.1k

Updated

★★★★★

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Steven Loria

TextBlob

Library Open Source NLP

Released: August 2013

Language: Python

Version: 0.17.1

TextBlob is a Python library for processing textual data that provides a simple API for common natural language processing tasks. It offers sentiment analysis, part-of-speech tagging, noun phrase extraction, and more, making NLP accessible to developers through an intuitive interface built on top of NLTK and Pattern.

Simple API for common NLP tasks Built-in sentiment analysis Part-of-speech tagging Language translation and detection (deprecated)

8.9k

1.1k

Updated

★★★☆☆

Popularity

★★★☆☆

Activity

MIT

License

GitHub Website Documentation

IBM

AI Fairness 360

Toolkit Open Source Ethics

Released: September 2018

Language: Python

Version: 0.5.0

AI Fairness 360 (AIF360) is an extensible open-source toolkit that helps detect and mitigate bias in machine learning models throughout the AI application lifecycle. It provides metrics to test for biases and algorithms to mitigate bias in datasets and models, supporting the development of trustworthy AI systems.

Comprehensive bias detection metrics Bias mitigation algorithms Pre-, in-, and post-processing techniques Integration with scikit-learn pipelines
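Two of the group fairness metrics that AIF360 reports are statistical parity difference and disparate impact. Both compare favorable-outcome rates between an unprivileged and a privileged group. The sketch below computes them from plain lists and does not use AIF360's dataset or metric classes. The predictions and group labels are hypothetical.

```python
def selection_rate(predictions, groups, group):
    """Fraction of favorable (1) predictions among members of `group`."""
    members = [p for p, g in zip(predictions, groups) if g == group]
    return sum(members) / len(members)


def group_fairness(predictions, groups, privileged, unprivileged):
    p_priv = selection_rate(predictions, groups, privileged)
    p_unpriv = selection_rate(predictions, groups, unprivileged)
    return {
        # 0 means parity; negative values disadvantage the unprivileged group.
        "statistical_parity_difference": p_unpriv - p_priv,
        # 1 means parity; the common "80% rule" flags values below 0.8.
        "disparate_impact": p_unpriv / p_priv,
    }


# Hypothetical binary predictions for two groups, "a" (privileged) and "b".
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
metrics = group_fairness(predictions, groups, privileged="a", unprivileged="b")
```

Group "a" receives favorable predictions 75% of the time and group "b" 25% of the time. The result is a statistical parity difference of -0.5 and a disparate impact of about 0.33, which a mitigation algorithm would then try to move toward parity.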

2.3k

769

Updated

★★★☆☆

Popularity

★★★★☆

Activity

Apache 2.0

License

GitHub Website Documentation

Uber AI Labs

Pyro

Framework Open Source Probabilistic Programming

Released: November 2017

Language: Python

Version: 1.8.6

Pyro is a universal probabilistic programming language built on PyTorch. It enables flexible and expressive deep probabilistic modeling, unifying modern deep learning with Bayesian modeling through a simple yet powerful API, making it well suited to applications requiring uncertainty quantification and probabilistic inference.

Deep probabilistic programming Stochastic variational inference Integration with PyTorch ecosystem Flexible inference algorithms

8.3k

980

Updated

★★★★☆

Popularity

★★★★☆

Activity

Apache 2.0

License

GitHub Website Documentation

Google

Kubeflow

Platform Open Source MLOps

Released: December 2017

Language: Go, Python

Version: 1.8.0

Kubeflow is an open-source project dedicated to making deployments of machine learning workflows on Kubernetes simple, portable, and scalable. It provides a complete platform for deploying, monitoring, and managing complex ML systems in production, with components for experimentation, training, serving, and pipeline orchestration.

ML workflow orchestration on Kubernetes Distributed training job management Model serving and monitoring Integrated MLOps toolchain
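Orchestrating an ML workflow amounts to running a directed acyclic graph of steps in dependency order and letting independent steps proceed in parallel. The sketch below groups a hypothetical pipeline into parallel stages using Python's standard `graphlib` module. It illustrates the scheduling idea only and is not the Kubeflow Pipelines SDK, which compiles pipelines for execution on Kubernetes.

```python
from graphlib import TopologicalSorter


def execution_stages(dag):
    """Group pipeline steps into stages; steps within a stage have all their
    dependencies satisfied and could run in parallel (e.g. as separate pods)."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # also raises CycleError if the pipeline has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the whole stage as finished
    return stages


# Hypothetical pipeline: each step maps to the steps it depends on.
pipeline = {
    "preprocess": {"ingest"},
    "train_small": {"preprocess"},
    "train_large": {"preprocess"},
    "evaluate": {"train_small", "train_large"},
}
stages = execution_stages(pipeline)
```

The two training steps land in the same stage because both depend only on preprocessing. An orchestrator can therefore launch them concurrently and start evaluation only after both finish.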

13.6k

2.3k

Updated

★★★★☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Google

Trax

Library Open Source Deep Learning

Released: August 2019

Language: Python

Version: 1.4.1

Trax is an end-to-end library for deep learning that focuses on clear code and speed. It's actively used and maintained by the Google Brain team for advanced research in deep learning, offering a simple API for defining models while providing powerful features for large-scale distributed training and transformer architectures.

Fast training with JAX acceleration Built-in transformer models Simple and clear API design Scalable to large datasets

7.9k

811

Updated

★★★☆☆

Popularity

★★★★☆

Activity

Apache 2.0

License

GitHub Website Documentation

PyCaret

Library Open Source AutoML

Released: April 2020

Language: Python

Version: 3.1.0

PyCaret is an open-source, low-code machine learning library in Python that automates machine learning workflows. It's an end-to-end ML solution for data scientists, offering a simple interface to perform common machine learning tasks with just a few lines of code, including data preprocessing, model training, and deployment.

Low-code machine learning automation Integrated preprocessing pipeline Model comparison and ensemble methods MLOps integration capabilities

8.3k

1.7k

Updated

★★★★☆

Popularity

★★★★★

Activity

MIT

License

GitHub Website Documentation

BentoML

Platform Open Source MLOps

Released: April 2019

Language: Python

Version: 1.1.7

BentoML is an open platform for machine learning model serving and deployment. It simplifies the process of packaging ML models as production-ready API services, supporting various ML frameworks and providing containerization, scaling, and monitoring features for deploying models in production environments.

Framework-agnostic model serving Built-in API server with OpenAPI support Docker containerization automation Cloud deployment integrations

6.3k

701

Updated

★★★★☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

HumanSignal

Label Studio

Tool Open Source Data Labeling

Released: October 2019

Language: Python, JavaScript (React)

Version: 1.9.1

Label Studio is a multi-type data labeling and annotation tool with standardized output format. It provides flexible interfaces for labeling various data types including images, audio, text, time series, and video, supporting both human labeling and automated pre-annotation with machine learning models for efficient dataset creation.

Multi-format data annotation interface ML-assisted labeling and automation Project management and collaboration tools Integration with ML pipelines

16k

1.9k

Updated

★★★★☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Evidently AI

Evidently

Tool Open Source Monitoring

Released: December 2020

Language: Python

Version: 0.4.7

Evidently is an open-source tool for ML model monitoring and testing that helps evaluate, test, and monitor data and ML model quality throughout the model lifecycle. It provides interactive reports, drift detection, and monitoring dashboards for maintaining ML system health in production environments.

Interactive model quality reports Data and prediction drift detection Integration with ML pipelines Customizable test suites
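Data drift detection compares the distribution of a feature in current production data against a reference sample. One widely used statistic for this comparison is the Population Stability Index (PSI). The sketch below is a minimal pure-Python PSI with equal-width bins. The binning scheme, the `eps` floor for empty bins, and the 0.2 alert level are conventions assumed here; they are not Evidently's defaults or API.

```python
import math


def psi(reference, current, bins=10, eps=1e-4):
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the reference range; values outside that range
    fall into the edge bins, and empty bins are floored at eps to keep log finite.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        return [max(c / len(values), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


reference = [i / 100 for i in range(100)]
shifted = [v + 0.5 for v in reference]
no_drift = psi(reference, reference)   # 0.0
drift = psi(reference, shifted)        # well above the common 0.2 alert threshold
```

Identical samples score exactly zero. Shifting every value by 0.5 empties the lower bins and piles mass into the top bin, which produces a large index. A monitoring job would compute this kind of statistic per feature and raise an alert when it crosses a chosen threshold.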

4.6k

520

Updated

★★★☆☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Graphcore

Graphcore Poplar

SDK Proprietary AI Hardware

Released: March 2018

Language: C++, Python

Version: 3.3.0

Poplar is Graphcore's graph programming framework designed specifically for AI compute on Intelligence Processing Units (IPUs). It provides a complete SDK for developing and deploying machine learning models with exceptional performance, offering unique graph-based computing paradigms optimized for AI workloads.

Graph compiler for IPU hardware Optimized for AI compute patterns Integration with PyTorch and TensorFlow Fine-grained parallelism control

605

82

Updated

★★★☆☆

Popularity

★★★★☆

Activity

Proprietary

License

GitHub Website Documentation

PyTorch

Torchaudio

Library Open Source Audio Processing

Released: May 2019

Language: Python, C++

Version: 2.1.0

Torchaudio is an audio library for PyTorch that provides I/O utilities, popular datasets, and common audio transformations. It simplifies audio processing tasks for machine learning applications, offering GPU-accelerated operations for efficient audio feature extraction and transformation in deep learning pipelines.

Audio I/O and dataset loading Common audio transformations GPU-accelerated operations Integration with PyTorch ecosystem
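A common audio transformation is the mel spectrogram. It pools a linear-frequency spectrogram through triangular filters spaced evenly on the perceptual mel scale. The sketch below computes only the filter center frequencies using the HTK mel formula, one of the mel scales torchaudio's mel transforms support. It uses plain Python rather than torchaudio calls, and the parameter values are arbitrary examples.

```python
import math


def hz_to_mel(freq_hz):
    """HTK-style mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(n_mels, f_min, f_max):
    """Center frequencies (Hz) of n_mels triangular filters spaced evenly in mel.

    n_mels + 2 equally spaced mel points are laid out between f_min and f_max;
    the interior points become the filter centers.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


centers = mel_filter_centers(n_mels=8, f_min=0.0, f_max=8000.0)
# Centers crowd together at low frequencies and spread out at high ones.
```

Even spacing in mel becomes progressively wider spacing in Hz. As a result, mel features keep fine resolution where human hearing is most sensitive and compress the high-frequency range.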

2.3k

604

Updated

★★★☆☆

Popularity

★★★★★

Activity

BSD-3-Clause

License

GitHub Website Documentation

Stanford NLP Group

Stanza

Library Open Source NLP

Released: April 2020

Language: Python

Version: 1.5.1

Stanza is Stanford NLP Group's official Python library for advanced NLP with support for 60+ languages. It provides neural network models for various NLP tasks including tokenization, part-of-speech tagging, lemmatization, dependency parsing, and named entity recognition with state-of-the-art accuracy and efficiency.

Multi-lingual support for 60+ languages Full neural NLP pipeline State-of-the-art model performance Integration with CoNLL data formats
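The CoNLL-U format, used by Universal Dependencies treebanks and by pipelines like Stanza, stores one token per line in ten tab-separated columns. Sentences are separated by blank lines. The minimal reader below is written from the format description and is not Stanza's own CoNLL utility. It skips comment lines, multiword-token ranges, and empty nodes, and the sample sentence is hypothetical.

```python
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts keyed by FIELDS."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            if current:  # a blank line closes the current sentence
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        current.append(dict(zip(FIELDS, cols)))
    if current:
        sentences.append(current)
    return sentences


sample = "\n".join([
    "# text = Dogs bark.",
    "\t".join(["1", "Dogs", "dog", "NOUN", "NNS", "Number=Plur", "2", "nsubj", "_", "_"]),
    "\t".join(["2", "bark", "bark", "VERB", "VBP", "Mood=Ind", "0", "root", "_", "SpaceAfter=No"]),
    "\t".join(["3", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"]),
    "",
])
sentences = parse_conllu(sample)
```

The `head` column points to the governing token's `id`, with 0 marking the root. This column is how the dependency parse produced by a neural pipeline is serialized.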

6.9k

878

Updated

★★★★☆

Popularity

★★★★☆

Activity

Apache 2.0

License

GitHub Website Documentation

Adobe Research

NLP-Cube

Framework Open Source NLP

Released: July 2018

Language: Python

Version: 3.0

NLP-Cube is a natural language processing framework that provides end-to-end processing pipelines for multiple languages. It offers state-of-the-art neural network models for sentence splitting, tokenization, POS tagging, lemmatization, and dependency parsing with a unified API across all supported languages.

Neural end-to-end NLP pipeline Multi-task learning architecture Language-agnostic design Docker deployment support

374

57

Updated

★★☆☆☆

Popularity

★★☆☆☆

Activity

Apache 2.0

License

GitHub Website Documentation

OpenNMT

Framework Open Source NLP

Released: December 2016

Language: Python, Lua

Version: 3.3.1

OpenNMT is an open-source ecosystem for neural machine translation and neural sequence learning. It provides industrial-strength, production-ready implementations of neural machine translation architectures, supporting both research experimentation and large-scale production deployment with optimized performance.

Multiple neural machine translation architectures Production-ready deployment tools Support for multi-modal translation Extensive customization options

6.5k

2.2k

Updated

★★★★☆

Popularity

★★★★☆

Activity

MIT

License

GitHub Website Documentation

HPC-AI Tech

Colossal-AI

Framework Open Source Distributed Computing

Released: October 2021

Language: Python, C++

Version: 0.3.4

Colossal-AI is a unified deep learning system for large-scale parallel training. It provides easy-to-use APIs for distributed training of large models, offering various parallelism strategies including data, tensor, pipeline, and sequence parallelism, making large model training accessible to all.

Multiple parallelism strategies Heterogeneous memory management Automatic parallelization Zero overhead integration
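Tensor parallelism splits a single layer's weights across devices. The sketch below simulates the simplest column-split (1D) layout for a linear layer inside one process. The simulated "ranks" and the list-based matrices are assumptions for illustration; Colossal-AI's actual layers run on separate devices and exchange data with collective communication.

```python
def matmul(x, w):
    """Plain matrix product of x (m x k) and w (k x n), both lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def column_parallel_linear(x, w, world_size):
    """Simulate a 1D (column-split) tensor-parallel linear layer in one process.

    Each simulated rank owns a contiguous slice of w's output columns, computes
    its partial output locally, and an all-gather concatenates the slices.
    """
    n_cols = len(w[0])
    assert n_cols % world_size == 0, "output dim must divide evenly across ranks"
    shard = n_cols // world_size
    partial_outputs = []
    for rank in range(world_size):
        w_shard = [row[rank * shard:(rank + 1) * shard] for row in w]
        partial_outputs.append(matmul(x, w_shard))
    # All-gather step: stitch each row back together from every rank's columns.
    return [sum((out[i] for out in partial_outputs), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
y = column_parallel_linear(x, w, world_size=2)
```

Each rank stores only 1 / world_size of the weight matrix, yet the gathered result matches the unsharded product. This memory saving is what lets layers too large for one device be trained at all.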

37k

4.2k

Updated

★★★★★

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Lightning AI

PyTorch Lightning

Framework Open Source Deep Learning

Released: March 2019

Language: Python

Version: 2.1.0

PyTorch Lightning is the deep learning framework for professional AI researchers and machine learning engineers who need maximal flexibility without sacrificing performance at scale. It organizes PyTorch code to remove boilerplate while adding essential features for production deployment and scaling.

Hardware-agnostic training Distributed training orchestration Built-in debugging and profiling Easy production deployment

26.5k

3.2k

Updated

★★★★★

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

MosaicML

Composer

Library Open Source Training Optimization

Released: March 2022

Language: Python

Version: 0.16.4

Composer is a PyTorch library for efficient neural network training through algorithmic improvements. It provides a set of optimizations that can be composed to accelerate model training, by up to 7x according to MosaicML's benchmarks, while maintaining or improving model quality, and it offers drop-in replacements for standard training procedures.

Algorithmic training optimizations Model-agnostic speedup methods Integration with popular frameworks Memory and compute efficiency

4.8k

392

Updated

★★★★☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Hugging Face

Accelerate

Library Open Source Distributed Computing

Released: January 2021

Language: Python

Version: 0.24.1

Accelerate is a library that enables the same PyTorch code to be run across any distributed configuration by adding just four lines of code. It provides simple APIs to make PyTorch training scripts runnable on any distributed setup, supporting multiple GPUs, TPUs, and DeepSpeed integration.

Distributed training with minimal code changes Automatic mixed precision support Device placement management Integration with HF Trainer

6.5k

751

Updated

★★★★☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Alpa

Framework Open Source Distributed Computing

Released: April 2022

Language: Python, C++

Version: 0.2.3

Alpa is a system for training and serving large-scale neural networks. It automates parallelization of large tensor computations and generates execution plans that unify data, operator, and pipeline parallelism, enabling training of models with hundreds of billions of parameters on distributed clusters.

Automatic parallelization of large models Inter- and intra-operator parallelism Memory optimization techniques JAX ecosystem integration

3k

337

Updated

★★★☆☆

Popularity

★★★★☆

Activity

Apache 2.0

License

GitHub Website Documentation

Hugging Face

PEFT

Library Open Source Fine-tuning

Released: February 2023

Language: Python

Version: 0.6.2

PEFT (Parameter-Efficient Fine-Tuning) is a library for efficiently adapting pre-trained language models to various downstream applications without fine-tuning all the model's parameters. It implements state-of-the-art methods like LoRA, Prefix Tuning, and P-Tuning to achieve competitive performance with minimal compute requirements.

Multiple parameter-efficient methods Integration with transformers library Memory-efficient fine-tuning Support for various model architectures
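LoRA, the best-known method in PEFT, freezes a pretrained weight matrix W and learns a low-rank update scaled by alpha / r. These two hyperparameters correspond to the `r` and `lora_alpha` settings of PEFT's `LoraConfig`. The sketch below shows that arithmetic in plain Python rather than through the library. The tiny matrices are hypothetical, and the zero initialization of B follows the original LoRA recipe.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(x, w, a, b, alpha):
    """h = W x + (alpha / r) * B (A x).

    w (d_out x d_in) stays frozen; only a (r x d_in) and b (d_out x r) train.
    """
    r = len(a)
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [h + (alpha / r) * u for h, u in zip(base, update)]


def trainable_params(d_in, d_out, r):
    """Parameters updated by full fine-tuning vs. by a rank-r LoRA adapter."""
    return {"full": d_in * d_out, "lora": r * (d_in + d_out)}


w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # frozen 3 x 2 weight
a = [[0.5, -0.5]]                           # r = 1; random in practice
b_zero = [[0.0], [0.0], [0.0]]              # B starts at zero
x = [2.0, 4.0]
assert lora_forward(x, w, a, b_zero, alpha=16) == matvec(w, x)  # adapter starts as a no-op
counts = trainable_params(4096, 4096, r=8)  # {'full': 16777216, 'lora': 65536}
```

Because B starts at zero, the adapted layer initially reproduces the base model exactly. For a 4096 x 4096 projection, a rank-8 adapter trains 65,536 parameters instead of about 16.8 million, which is where the memory savings come from.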

13.5k

1.2k

Updated

★★★★★

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Artidoro Pagnoni

QLoRA

Method Open Source Fine-tuning

Released: May 2023

Language: Python

Version: 1.0.0

QLoRA is an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while preserving full 16-bit finetuning task performance. It backpropagates gradients through a frozen, 4-bit quantized pretrained model into Low-Rank Adapters (LoRA), and introduces the 4-bit NormalFloat (NF4) data type, double quantization, and paged optimizers to reduce memory use further.

4-bit quantization for LLM finetuning Memory-efficient adapter training Maintains 16-bit task performance Single GPU finetuning capability
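The core idea of 4-bit weight storage is blockwise quantization. Each block of weights keeps one floating-point scale, and each weight is stored as a small integer code. The sketch below uses simple symmetric absmax quantization to evenly spaced int4 levels. This is a deliberate simplification: QLoRA's NF4 type instead uses quantile-based levels suited to normally distributed weights, plus double quantization of the scales. The block size of 64 and the random weights are assumptions.

```python
import random


def quantize_blockwise(weights, block_size=64, max_code=7):
    """Symmetric absmax quantization to 4-bit signed codes in [-7, 7],
    storing one floating-point scale per block of weights."""
    codes, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        absmax = max(abs(w) for w in block) or 1.0  # avoid dividing by zero
        scale = absmax / max_code
        scales.append(scale)
        codes.extend(round(w / scale) for w in block)
    return codes, scales


def dequantize_blockwise(codes, scales, block_size=64):
    """Recover approximate weights by multiplying each code by its block's scale."""
    return [c * scales[i // block_size] for i, c in enumerate(codes)]


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
codes, scales = quantize_blockwise(weights)
restored = dequantize_blockwise(codes, scales)
```

Per-block scales limit the damage from outliers: one large weight only coarsens the quantization grid of its own block. Each restored weight is within half a quantization step of the original.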

9.2k

1.1k

Updated

★★★★☆

Popularity

★★★★☆

Activity

MIT

License

GitHub Website Documentation

Hugging Face

Transformers Agents

Library Open Source LLM

Released: May 2023

Language: Python

Version: 4.35.0

Transformers Agents is a natural language API built on top of transformers that provides an agent interface to use tools, search the web, and leverage language models for complex tasks. It enables natural language programming by converting user instructions into executable code using LLMs.

Natural language API for coding Tool use and web search capabilities Integration with transformers models Multi-modal agent support

115k

23k

Updated

★★★★★

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Hegel AI

PromptTools

Tool Open Source Prompt Engineering

Released: June 2023

Language: Python

Version: 0.0.41

PromptTools provides a set of open-source, self-hostable tools for experimenting with, testing, and evaluating LLMs, vector databases, and prompts. It enables systematic prompt engineering through experimentation frameworks, evaluation metrics, and visualization tools for optimizing LLM applications.

Prompt testing and experimentation Multiple LLM provider support Evaluation framework for prompts Visualization and comparison tools

2.4k

202

Updated

★★★☆☆

Popularity

★★★★☆

Activity

Apache 2.0

License

GitHub Website Documentation

Unsloth AI

Unsloth

Library Open Source Fine-tuning

Released: October 2023

Language: Python, Triton

Version: 2023.11

Unsloth is a lightweight library for efficient finetuning of LLMs; its developers report about 70% less memory use and 2.2x faster training with no loss of accuracy. It implements custom GPU kernels written in OpenAI's Triton language and memory optimizations designed specifically for LLM finetuning, making large model training more accessible on consumer hardware.

70% less memory usage for finetuning 2.2x faster training speed Custom Triton kernel optimizations Compatible with popular LLM architectures

7.2k

442

Updated

★★★★☆

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

LlamaIndex

Framework Open Source RAG

Released: November 2022

Language: Python

Version: 0.9.15

LlamaIndex is a data framework for LLM applications to ingest, structure, and access private or domain-specific data. It provides tools for building production RAG systems, including document processing, embedding management, vector store integration, and advanced query capabilities for contextual LLM applications.

Document ingestion and indexing Advanced retrieval strategies Multiple vector store integrations Query engines and chat interfaces
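At the heart of a RAG query engine is a retrieval step. The query is embedded, compared against stored chunk embeddings, and the top-k matches are passed to the LLM as context. The sketch below shows top-k cosine-similarity retrieval over a toy in-memory index. The three-dimensional embeddings and document IDs are hypothetical, and this is not LlamaIndex's retriever API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve(query_vec, index, top_k=2):
    """Return the IDs of the top_k stored chunks most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:top_k]]


# Hypothetical chunk embeddings; a real index would hold model-produced vectors.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.1],
    "returns-faq": [0.8, 0.2, 0.1],
}
context_ids = retrieve([1.0, 0.0, 0.0], index)
```

A vector store performs the same ranking with approximate nearest-neighbor search so that it scales to millions of chunks. The retrieved text is then inserted into the prompt sent to the LLM.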

30k

3.9k

Updated

★★★★★

Popularity

★★★★★

Activity

MIT

License

GitHub Website Documentation

CrewAI

Framework Open Source Multi-Agent

Released: December 2023

Language: Python

Version: 0.1.35

CrewAI is a framework for orchestrating role-playing autonomous AI agents. It enables the creation of AI teams that work together to accomplish complex tasks, providing a structured approach to multi-agent collaboration with specialized roles, goals, and tools for each agent in the crew.

Role-based agent framework Multi-agent task orchestration Built-in tools and memory systems Integration with various LLM providers

12k

1.5k

Updated

★★★★☆

Popularity

★★★★★

Activity

MIT

License

GitHub Website Documentation

NVIDIA

Triton Inference Server

Server Open Source Inference

Released: October 2018

Language: C++, Python

Version: 2.39.0

Triton Inference Server delivers fast and scalable AI inferencing for any framework on GPU and CPU. It supports concurrent model execution, dynamic batching, and model ensembles, providing a standardized inference platform that maximizes throughput and hardware utilization in production environments.

Multi-framework model serving Dynamic batching for high throughput Model ensembles and pipelines GPU and CPU optimization
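Dynamic batching trades a small, bounded wait for larger batches. The server holds incoming requests until either a batch is full or the oldest queued request has waited for a configured delay. In Triton this behavior is set per model, for example through `max_queue_delay_microseconds` in the model's `dynamic_batching` configuration. The simulation below is a simplified stand-alone sketch of that policy over hypothetical arrival times in milliseconds. It is not Triton's scheduler.

```python
def form_batches(arrival_times_ms, max_batch_size, max_queue_delay_ms):
    """Group request arrival times into batches. A batch is dispatched when it
    reaches max_batch_size or when its oldest request has waited longer than
    max_queue_delay_ms, whichever comes first."""
    batches, current = [], []
    for t in sorted(arrival_times_ms):
        if current and t - current[0] > max_queue_delay_ms:
            batches.append(current)  # delay window of the oldest request expired
            current = []
        current.append(t)
        if len(current) == max_batch_size:
            batches.append(current)  # batch is full, dispatch immediately
            current = []
    if current:
        batches.append(current)
    return batches


batches = form_batches([0, 1, 2, 3, 10, 11, 30], max_batch_size=4, max_queue_delay_ms=5)
```

A burst of four requests fills a batch at once. Stragglers are dispatched as a smaller batch once their delay window expires. A larger delay improves GPU utilization at the cost of tail latency.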

7.4k

1.6k

Updated

★★★★☆

Popularity

★★★★★

Activity

BSD-3-Clause

License

GitHub Website Documentation

vLLM Team

vLLM

Engine Open Source Inference

Released: June 2023

Language: Python, C++

Version: 0.2.4

vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. It achieves up to 24x higher throughput than Hugging Face Transformers by using PagedAttention, continuous batching, and optimized CUDA kernels, making it well suited to production deployment of large language models.

PagedAttention for efficient memory use Continuous batching for high throughput Integration with popular LLM architectures OpenAI-compatible API server
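PagedAttention manages the KV cache the way an operating system manages virtual memory. Each sequence's tokens are stored in fixed-size blocks that need not be contiguous, and a per-sequence block table maps logical positions to physical blocks. The toy allocator below illustrates that bookkeeping only. The class and method names are hypothetical, and it omits vLLM features such as block sharing, copy-on-write, and preemption.

```python
class BlockAllocator:
    """Toy paged KV-cache manager with a per-sequence block table."""

    def __init__(self, num_blocks, block_size):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # physical block ids not in use
        self.tables = {}   # seq_id -> list of physical block ids, in logical order
        self.lengths = {}  # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a KV slot for the next token; return (physical_block, offset)."""
        table = self.tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # last block is full (or none exists yet)
            if not self.free:
                raise MemoryError("KV cache exhausted")
            table.append(self.free.pop())
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def release(self, seq_id):
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


allocator = BlockAllocator(num_blocks=4, block_size=16)
slots = [allocator.append_token("seq-a") for _ in range(20)]
```

Twenty tokens occupy only two 16-token blocks, so at most one partially filled block is wasted per sequence. This is in contrast to reserving space for the maximum sequence length up front. That tighter packing lets the engine keep more sequences in a batch.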

17k

2.2k

Updated

★★★★★

Popularity

★★★★★

Activity

Apache 2.0

License

GitHub Website Documentation

Microsoft

ONNX Runtime

Runtime Open Source Inference

Released: December 2018

Language: C++, Python

Version: 1.16.1

ONNX Runtime is a cross-platform inference and training accelerator compatible with deep learning frameworks, including PyTorch and TensorFlow/Keras. It optimizes and accelerates machine learning inferencing and training, providing consistent performance improvements across different hardware platforms.

Cross-platform inference optimization Hardware acceleration support Multiple framework compatibility Production-grade performance

12.5k

2.6k

Updated

★★★★★

Popularity

★★★★★

Activity

MIT

License

GitHub Website Documentation

Comet

Comet ML

Platform Commercial MLOps

Released: August 2017

Language: Python

Version: 3.35.3

Comet ML is a machine learning platform that helps track, compare, explain, and optimize experiments and models. It provides experiment tracking, model production monitoring, and a model registry, enabling teams to build better models faster through comprehensive visualization and collaboration tools.

Automatic experiment tracking Model performance visualization Hyperparameter optimization Team collaboration features

145

23

Updated

★★★★☆

Popularity

★★★★★

Activity

Commercial

License

GitHub Website Documentation