Landmark Papers in Generative AI
Explore the foundational research that has shaped the field of generative artificial intelligence. I've curated this collection to highlight the key breakthroughs and conceptual advances that have defined the evolution of generative models, adding historical context and notes on significance for researchers and enthusiasts alike.
1994-2010
Density Estimation by Mixture Models
This pioneering work by Hinton and colleagues at the University of Toronto advanced density estimation using mixture models, establishing foundational techniques for probabilistic generative approaches that would influence decades of subsequent research.
Neural Network Models for Unconditional Generation of Sequences
Bengio and colleagues at the University of Montreal pioneered recurrent neural networks for sequence generation, establishing core approaches that would later evolve into modern language models and other sequential generative systems.
The Helmholtz Machine
Dayan, Hinton, Neal, and Zemel at the University of Toronto introduced a groundbreaking generative model for unsupervised learning that established fundamental concepts for modern deep generative frameworks, including the interaction between bottom-up recognition and top-down generation processes.
Generating Faces with Neural Networks
This pioneering work by Blanz and Vetter at the Max Planck Institute for Biological Cybernetics in Tübingen demonstrated the early potential of neural networks for realistic face synthesis, establishing a foundation for generative image models and inspiring later approaches to controllable image generation.
Latent Dirichlet Allocation
Blei, Ng, and Jordan from UC Berkeley/Stanford introduced LDA, a groundbreaking generative probabilistic model for topic modeling that revolutionized text analysis and laid important groundwork for more sophisticated text-based generative AI approaches.
A Fast Learning Algorithm for Deep Belief Nets
This groundbreaking paper by Hinton, Osindero, and Teh at the University of Toronto proposed deep belief networks and efficient training methods, enabling unsupervised learning for generative tasks and helping spark the deep learning revolution.
Reducing the Dimensionality of Data with Neural Networks
Hinton and Salakhutdinov at the University of Toronto used layer-wise restricted Boltzmann machine pretraining to train deep autoencoders, advancing generative dimensionality reduction techniques that would influence future deep generative models.
Learning Deep Boltzmann Machines
Salakhutdinov and Hinton at the University of Toronto extended Boltzmann machines to deep architectures, significantly improving generative modeling capabilities and establishing techniques that would influence future generative architectures.
2012-2014
ImageNet Classification with Deep Convolutional Neural Networks
This revolutionary paper by Krizhevsky, Sutskever, and Hinton established deep CNNs as the dominant approach for image classification, providing the critical infrastructure that would enable image-based generative models like GANs to flourish in subsequent years.
Auto-Encoding Variational Bayes
Kingma and Welling at the University of Amsterdam introduced variational autoencoders (VAEs), a cornerstone for probabilistic generative modeling that combined deep learning with variational inference to create a powerful framework for learning complex data distributions.
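At its core, a VAE trains an encoder q_\phi(z|x) and a decoder p_\theta(x|z) by maximizing the evidence lower bound (ELBO) on the data likelihood; a standard statement of the bound, written in LaTeX notation for readability, is:

    \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)

The reparameterization trick (sampling z = \mu + \sigma \odot \epsilon with \epsilon \sim \mathcal{N}(0, I)) is what makes this objective trainable end to end with stochastic gradient descent.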
Generative Adversarial Networks
Goodfellow and colleagues at the University of Montreal proposed GANs, revolutionizing image generation through adversarial training. This groundbreaking approach, where generator and discriminator networks compete in a minimax game, created a new paradigm for generative modeling with unprecedented realism.
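The adversarial game can be written as a single value function, as in the paper:

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

The discriminator D learns to tell real samples from generated ones, while the generator G learns to fool it; at the game's optimum the generator's distribution matches the data distribution.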
Inceptionism: Going Deeper into Neural Networks
Mordvintsev, Olah, and Tyka at Google introduced DeepDream, showcasing novel neural network visualization techniques that revealed the generative capabilities of CNNs and launched early applications of AI-generated art, sparking public interest in creative AI.
2015-2016
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
Radford, Metz, and Chintala (indico Research and Facebook AI Research) developed DCGAN, addressing GAN training instability and enabling high-quality image synthesis through architectural constraints that made GANs practical for real-world applications.
A Neural Algorithm of Artistic Style
Gatys, Ecker, and Bethge at the University of Tübingen pioneered neural style transfer, enabling artistic image generation by separating and recombining content and style representations in neural networks, establishing a new approach to creative AI.
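Concretely, the method matches CNN feature responses for content and Gram matrices of those features for style; with F^l the feature maps at layer l, the style representation is

    G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}

and an image is synthesized by minimizing a weighted sum of a content loss (feature differences to the content image) and a style loss (Gram-matrix differences to the style image).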
Pixel Recurrent Neural Networks
van den Oord, Kalchbrenner, and Kavukcuoglu at Google introduced PixelRNN for pixel-level image generation, advancing autoregressive models that treated image generation as a sequence modeling problem and achieving impressive density estimation results.
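The autoregressive view treats an image as a sequence of pixels generated in raster-scan order, factorizing the joint distribution as

    p(x) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})

so the model only ever needs to learn one-step conditional distributions, and exact log-likelihoods can be computed.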
WaveNet: A Generative Model for Raw Audio
van den Oord and colleagues at DeepMind developed WaveNet, transforming audio generation with dilated causal convolutions that enabled unprecedented quality in speech and music synthesis, establishing key techniques for modeling high-dimensional sequential data.
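A minimal PyTorch sketch of the central idea, dilated causal convolutions, is below; it is an illustrative reconstruction rather than DeepMind's implementation, and it omits WaveNet's gated activations, residual and skip connections, and the softmax over quantized audio samples.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class CausalDilatedConv1d(nn.Module):
        def __init__(self, channels, kernel_size=2, dilation=1):
            super().__init__()
            # left-pad so each output depends only on current and past samples
            self.pad = (kernel_size - 1) * dilation
            self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

        def forward(self, x):                 # x: (batch, channels, time)
            x = F.pad(x, (self.pad, 0))       # pad the time axis on the left only
            return self.conv(x)

    # Doubling dilations (1, 2, 4, ...) grow the receptive field exponentially with depth.
    stack = nn.Sequential(*[CausalDilatedConv1d(32, dilation=2 ** i) for i in range(8)])
    out = stack(torch.randn(1, 32, 16000))    # one second of 16 kHz audio, toy channel count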
StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks
Zhang and colleagues at Rutgers advanced text-to-image synthesis using stacked GANs, enabling the creation of higher-resolution and more realistic images from textual descriptions through a multi-stage refinement process.
2017-2018
Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
Zhu and colleagues at UC Berkeley introduced CycleGAN, enabling unpaired image-to-image translation through cycle consistency loss, dramatically expanding the domains where generative translation could work by eliminating the need for paired training data.
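The cycle consistency idea is compact: with generators G: X -> Y and F: Y -> X, the loss

    \mathcal{L}_{\mathrm{cyc}}(G, F) = \mathbb{E}_{x}\big[\lVert F(G(x)) - x \rVert_1\big] + \mathbb{E}_{y}\big[\lVert G(F(y)) - y \rVert_1\big]

is added to the adversarial losses for both mapping directions, forcing translations to be invertible even though no paired examples are available.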
Attention is All You Need
Vaswani and colleagues at Google proposed the Transformer architecture, establishing the foundation for text and multimodal generative models through self-attention mechanisms that enabled efficient modeling of long-range dependencies in sequential data.
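The core operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / \sqrt{d_k}) V. A minimal NumPy sketch (shapes and names are illustrative, not taken from the paper's code) is:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Q, K, V: (seq_len, d_k) for a single attention head
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)                            # pairwise similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over keys
        return weights @ V                                         # weighted sum of values

    Q = K = V = np.random.randn(5, 64)    # toy sequence of 5 tokens, 64-dim head
    out = scaled_dot_product_attention(Q, K, V)                    # shape (5, 64)

Multi-head attention simply runs several such operations in parallel on learned projections and concatenates the results.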
Image-to-Image Translation with Conditional Adversarial Networks
Isola and colleagues at UC Berkeley developed Pix2Pix, enabling paired image-to-image translation with conditional GANs, establishing a general-purpose framework for supervised image translation that could be applied to diverse domains.
Progressive Growing of GANs for Improved Quality, Stability, and Variation
Karras and colleagues at NVIDIA introduced Progressive GANs, tackling the challenge of high-resolution image generation through gradual network growth, significantly improving stability and enabling the creation of higher-quality images.
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Brock, Donahue, and Simonyan at DeepMind developed BigGAN, achieving unprecedented image synthesis quality through large-scale GAN training, demonstrating the benefits of scaling model capacity and batch size for generative models.
A Style-Based Generator Architecture for Generative Adversarial Networks
Karras, Laine, and Aila at NVIDIA introduced StyleGAN, revolutionizing controllable image generation by separating high-level attributes in a latent style space, enabling fine-grained control over generated image features and establishing a foundation for numerous subsequent advances in image synthesis.
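Mechanically, a mapping network transforms the latent code z into an intermediate code w, and learned affine transforms of w produce per-layer styles y = (y_s, y_b) that are applied through adaptive instance normalization:

    \mathrm{AdaIN}(x_i, y) = y_{s,i} \,\frac{x_i - \mu(x_i)}{\sigma(x_i)} + y_{b,i}

so each layer's feature statistics, and hence the attributes it controls, can be steered independently.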
2019-2020
Language Models are Unsupervised Multitask Learners
Radford and colleagues at OpenAI presented GPT-2, advancing large-scale language generation with unprecedented fluency and adaptability, demonstrating how scaling transformer models could produce remarkably capable text generation systems.
Generating Diverse High-Fidelity Images with VQ-VAE-2
Razavi, van den Oord, and Vinyals at DeepMind improved high-resolution image generation with vector-quantized VAEs, presenting a hierarchical approach that combined the benefits of discrete latent spaces with autoregressive modeling to produce diverse, high-quality images.
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation
Lewis and colleagues at Facebook AI introduced BART, enhancing text generation through denoising pre-training, establishing a flexible approach for language generation tasks by combining bidirectional encoding with autoregressive decoding.
Denoising Diffusion Probabilistic Models
Ho, Jain, and Abbeel at UC Berkeley proposed DDPM, establishing diffusion models as a powerful generative framework that would eventually surpass GANs for image synthesis through a gradual denoising process inspired by non-equilibrium thermodynamics.
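In the paper's formulation, noise is added so that x_t can be sampled in closed form from x_0, and the network is trained to predict that noise with a simple regression loss:

    x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \qquad L_{\mathrm{simple}} = \mathbb{E}_{t, x_0, \epsilon}\big[\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\big]

Sampling then runs the learned denoiser step by step from pure noise back to an image.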
Analyzing and Improving the Image Quality of StyleGAN
Karras and colleagues at NVIDIA presented StyleGAN2, refining the StyleGAN architecture to remove characteristic artifacts and improve image quality through redesigned normalization (weight demodulation), path length regularization, and a revised network design that replaces progressive growing.
Jukebox: A Generative Model for Music
Dhariwal and colleagues at OpenAI introduced Jukebox, enabling high-quality music generation with lyrics, vocals, and complex instrumentation through a multi-scale VQ-VAE approach combined with transformer-based autoregressive modeling.
Language Models are Few-Shot Learners
Brown and colleagues at OpenAI presented GPT-3, scaling language models to unprecedented size and demonstrating emergent few-shot learning capabilities that transformed expectations for generative AI across a diverse range of tasks.
2021
Learning Transferable Visual Models From Natural Language Supervision
Radford and colleagues at OpenAI introduced CLIP, enabling text-guided image generation and establishing a foundation for multimodal models by learning powerful visual representations from natural language supervision at scale.
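The training signal is a symmetric contrastive loss over a batch of (image, text) pairs; the paper itself gives NumPy-like pseudocode, and the PyTorch-style sketch below is an illustrative paraphrase of it (the encoder outputs and the temperature value are placeholders):

    import torch
    import torch.nn.functional as F

    def clip_loss(image_features, text_features, temperature=0.07):
        # features: (batch, dim); normalize so the dot product is a cosine similarity
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)
        logits = image_features @ text_features.t() / temperature   # (batch, batch)
        labels = torch.arange(logits.size(0))                        # matching pairs sit on the diagonal
        loss_images = F.cross_entropy(logits, labels)                # image -> text direction
        loss_texts = F.cross_entropy(logits.t(), labels)             # text -> image direction
        return (loss_images + loss_texts) / 2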
Taming Transformers for High-Resolution Image Synthesis
Esser, Rombach, and Ommer at Heidelberg University developed VQ-GAN with transformers, improving high-resolution image generation by combining the efficiency of discrete representations with the modeling power of transformer architectures.
Zero-Shot Text-to-Image Generation
Ramesh and colleagues at OpenAI introduced DALL-E, pioneering text-to-image generation with transformers and demonstrating how autoregressive models could create remarkably diverse and creative images from natural language descriptions.
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Nichol and colleagues at OpenAI advanced text-guided image synthesis with diffusion models, providing stronger results than GANs while maintaining more diversity and establishing a foundation for text-conditional image generation and editing.
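The classifier-free guidance rule the paper found most effective can be summarized as extrapolating from the unconditional prediction toward the text-conditional one:

    \hat\epsilon_\theta(x_t \mid c) = \epsilon_\theta(x_t \mid \varnothing) + s \cdot \big(\epsilon_\theta(x_t \mid c) - \epsilon_\theta(x_t \mid \varnothing)\big)

with guidance scale s > 1 trading diversity for fidelity to the prompt.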
Evaluating Large Language Models Trained on Code
Chen and colleagues at OpenAI presented Codex, enabling sophisticated code generation by fine-tuning language models on programming languages, influencing a new generation of AI programming tools and establishing the foundation for systems like GitHub Copilot.
Diffusion Models Beat GANs on Image Synthesis
Dhariwal and Nichol at OpenAI demonstrated diffusion models' superiority over GANs for image generation, providing evidence that diffusion-based approaches could deliver higher quality results with fewer artifacts and greater diversity while remaining more stable during training.
Read PaperTraining Language Models to Follow Instructions with Human Feedback
Ouyang and colleagues at OpenAI introduced RLHF for generative language models, establishing methods to align model outputs with human preferences and intentions, dramatically improving helpfulness and reducing harmful generations.
2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Ramesh and colleagues at OpenAI presented DALL-E 2, enhancing text-to-image generation quality through a diffusion model conditioned on CLIP image embeddings, establishing a new paradigm for high-quality, controllable image synthesis from text.
High-Resolution Image Synthesis with Latent Diffusion Models
Rombach and colleagues at LMU Munich, Heidelberg University, and Runway introduced latent diffusion models, the foundation of Stable Diffusion, democratizing high-quality image generation by moving diffusion into a compressed latent space, dramatically reducing computational requirements while maintaining quality and enabling widespread adoption.
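As a sense of how accessible this made image generation, here is a minimal usage sketch with the Hugging Face diffusers library; the library, model ID, and defaults are assumptions of this note rather than part of the paper, and a CUDA-capable GPU is assumed.

    import torch
    from diffusers import StableDiffusionPipeline

    # Example checkpoint only; substitute any available Stable Diffusion weights.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    image = pipe("a watercolor painting of a lighthouse at dusk").images[0]
    image.save("lighthouse.png")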
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Saharia and colleagues at Google presented Imagen, advancing diffusion-based text-to-image synthesis through a combination of powerful text encoders and cascaded diffusion models, achieving unprecedented photorealism and text alignment.
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Yu and colleagues at Google introduced Parti (the Pathways Autoregressive Text-to-Image model), scaling autoregressive text-to-image generation to new heights and demonstrating that sequentially predicting image tokens could rival diffusion approaches for high-quality, compositionally complex image creation.
DreamFusion: Text-to-3D using 2D Diffusion
Poole and colleagues at Google enabled text-to-3D generation using diffusion models, introducing Score Distillation Sampling to optimize 3D representations through the lens of pretrained 2D diffusion models, unlocking a new dimension for generative AI.
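Score Distillation Sampling backpropagates a denoising residual from a frozen 2D diffusion model into the parameters \theta of a differentiable 3D scene whose rendering is x = g(\theta); up to notation, the gradient used is

    \nabla_\theta \mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t, \epsilon}\Big[ w(t)\,\big(\hat\epsilon_\phi(x_t; y, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta} \Big]

so no 3D training data is needed: the 2D model simply scores how plausible each rendered view looks for the text prompt y.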
Training Language Models to Follow Instructions with Human Feedback
Ouyang and colleagues at OpenAI introduced reinforcement learning from human feedback (RLHF) for instruction following, establishing methods for aligning large language model outputs with human preferences and intentions that dramatically improve helpfulness, reduce harmful generations, and underpin ChatGPT's conversational abilities.
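A common way to summarize the RLHF objective (a simplification of the paper's full recipe, which also mixes in a pretraining loss) is KL-regularized reward maximization against the supervised fine-tuned reference policy:

    \max_{\pi_\theta} \; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}\big[ r_\phi(x, y) \big] \;-\; \beta\, \mathrm{KL}\big(\pi_\theta(\cdot \mid x)\,\|\,\pi_{\mathrm{ref}}(\cdot \mid x)\big)

where the reward model r_\phi is itself trained on human preference comparisons between model outputs.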
2023
Adding Conditional Control to Text-to-Image Diffusion Models
Zhang and colleagues at Stanford introduced ControlNet, enhancing diffusion model controllability by enabling additional conditioning inputs like edges, poses, or depth maps while preserving the original model's capabilities, dramatically expanding creative control options.
Robust Speech Recognition via Large-Scale Weak Supervision
Radford and colleagues at OpenAI presented Whisper, a sequence-to-sequence speech recognition model trained with massive weakly supervised data, producing a highly robust multilingual transcription system with near human-level performance in diverse conditions.
GPT-4 Technical Report
OpenAI introduced GPT-4, a multimodal large language model with unprecedented capabilities in reasoning, specialized domains, and visual understanding, setting new benchmarks for generative AI and demonstrating emergent capabilities at scale.
Visual Instruction Tuning
Liu and colleagues at the University of Wisconsin-Madison presented LLaVA, advancing multimodal generation through visual instruction tuning, connecting a vision encoder to a large language model and fine-tuning on machine-generated instruction-following data to enable complex visual reasoning and open-ended conversation about images.
MusicLM: Generating Music From Text
Agostinelli and colleagues at Google introduced MusicLM, enabling high-quality text-guided music generation that could produce coherent compositions with unprecedented control over instrumentation, genre, and mood from natural language descriptions.
AudioLM: a Language Modeling Approach to Audio Generation
Borsos and colleagues at Google advanced audio generation with language modeling techniques, demonstrating how hierarchical modeling of audio tokens could generate coherent long-form audio with unprecedented naturalness and contextual consistency.
Improving Image Generation with Better Captions
Betker and colleagues at OpenAI presented DALL-E 3, dramatically improving prompt faithfulness by training on highly descriptive synthetic captions and integrating large language models to expand and enhance user prompts, substantially improving text rendering and complex scene composition.
Stable Audio: Fast Timing-Conditioned Latent Audio Diffusion
Stability AI introduced Stable Audio, enabling fast audio generation through latent diffusion techniques, bringing the efficiency and quality advances of latent space diffusion to audio synthesis for music and sound effects creation.
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Podell and colleagues at Stability AI enhanced Stable Diffusion with SDXL, dramatically improving image quality and resolution through architectural refinements, multi-aspect training, and specialized conditioning methods for more photorealistic generation.
Make-A-Video: Text-to-Video Generation without Text-Video Data
Singer and colleagues at Meta introduced Make-A-Video, advancing text-to-video generation by leveraging pretrained text-to-image models without requiring paired text-video training data, enabling high-quality video synthesis from text descriptions.
Generative Multiworld Models for Visual Interaction
Yan and colleagues at Meta presented Emu, enabling multimodal visual generation with unprecedented flexibility, including image-to-image transformations, multi-turn visual conversations, and complex editing capabilities in a unified framework.
Constitutional AI: Harmlessness from AI Feedback
Bai and colleagues at Anthropic introduced Constitutional AI, a training method that uses AI-generated feedback guided by an explicit set of principles to align language models with human values and reduce harmful outputs with far less human labeling, underpinning the training of the Claude models.
2024
Video Generation Models as World Simulators
The Sora Team at OpenAI presented Sora, enabling high-quality text-to-video generation with unprecedented temporal consistency, physical realism, and compositional understanding, establishing video models as general-purpose world simulators.
Gemini: A Family of Highly Capable Multimodal Models
The Gemini Team at Google introduced a family of multimodal models with enhanced generative capabilities across text, images, audio, and video, establishing new benchmarks for multimodal understanding and generation in diverse contexts.
Claude 3 Technical Report
Anthropic presented Claude 3, advancing multimodal generative AI with a strong safety focus, showcasing improvements in reasoning, accuracy, and multimodal processing while maintaining alignment with human values through constitutional methods.
Generative Interactive Environments
The Genie team at Google DeepMind introduced a foundation world model trained on unlabeled internet videos that generates playable 2D environments from image and text prompts, learning latent actions that allow frame-by-frame control of the generated worlds.
Stable Video 3D: Consistent Diffusion for End-to-End View-Consistent Video Generation
Stability AI advanced 3D generation with video diffusion models, introducing methods for creating temporally coherent, view-consistent orbital videos around an object from a single input image, enabling novel-view synthesis and downstream 3D asset creation.
Lumiere: A Space-Time Diffusion Model for Video Generation
Bar-Tal and colleagues at Google introduced Lumiere, improving space-time diffusion for video synthesis with novel architectures that jointly model spatial and temporal dimensions, enabling high-quality video generation with complex camera movements and reliable temporal consistency.
Emu2: Advanced Multimodal Generation through Unified Representations
Yan and colleagues at Meta AI advanced multimodal generation with unified vision-language representations, enabling seamless generation and understanding across modalities with improved coherence, consistency, and instruction-following capabilities.
VideoPoet: A Large-Scale Multimodal Model for Video Generation
Kondratyuk and colleagues at Google introduced VideoPoet, a transformer-based model for high-quality text-to-video generation, establishing new benchmarks for long-form video synthesis with temporal coherence, complex narratives, and controllable stylistic elements.
xAI Multimodal Grok: Generative Understanding Across Modalities
The xAI team presented multimodal Grok, advancing generative AI for text and image tasks through cross-modal training techniques and architectural innovations that improved contextual understanding and generation capabilities.