Landmark Papers in Computer Vision
Landmark Papers in Computer Vision is a curated collection of the foundational research that has shaped the field. I've selected these papers to highlight the key breakthroughs and conceptual advances that have defined the evolution of visual perception systems, providing historical context and significance for researchers and enthusiasts alike.
1960s-1980s
Machine Perception of Three-Dimensional Solids
This pioneering work by Roberts at MIT introduced the Roberts operator, one of the first edge-detection algorithms, and laid the groundwork for computational approaches to 3D object recognition from 2D images, establishing fundamental techniques for extracting structure from visual data.
Computer Detection of Human Faces
This early work from USC on automated face detection established initial approaches for computational face recognition, exploring edge-based techniques to isolate and identify facial features in images decades before modern deep learning approaches.
Theory of Edge Detection
Marr and Hildreth's influential work at MIT proposed detecting edges as zero-crossings of the Laplacian of Gaussian applied at multiple scales, connecting biological vision systems to computational models and introducing multi-scale representations that continue to influence modern computer vision.
Neocognitron: A Self-organizing Neural Network Model for Pattern Recognition
Fukushima's groundbreaking work at NHK Labs introduced the Neocognitron, a hierarchical neural network inspired by the visual cortex that established the concept of increasingly complex feature extraction through layers, directly influencing modern convolutional neural networks.
A Computational Approach to Edge Detection
John Canny's work at MIT introduced the Canny edge detector, a multi-stage algorithm that optimizes detection, localization, and minimal response criteria, becoming the most widely used edge detection method and establishing mathematical rigor in feature extraction.
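For a sense of how accessible this algorithm remains, here is a minimal sketch using OpenCV's built-in implementation (assumes `opencv-python` is installed; the file path and thresholds are placeholders):

```python
import cv2

# Canny's pipeline: Gaussian smoothing, gradient computation,
# non-maximum suppression, then hysteresis thresholding.
img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(img, 100, 200)  # low/high hysteresis thresholds
cv2.imwrite("edges.png", edges)
```

The two thresholds implement Canny's hysteresis step: strong edges above the high threshold seed contours, which are then extended through pixels above the low threshold.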
A Computational Framework for Visual Motion
This seminal work from MIT established fundamental methods for optical flow calculation, providing mathematical techniques to estimate motion between frames that remain foundational for video processing, action recognition, and object tracking applications.
Backpropagation Applied to Handwritten Zip Code Recognition
This influential work from Bell Labs demonstrated the practical application of neural networks with backpropagation for visual pattern recognition, establishing a framework for training deep networks on image data that would eventually lead to modern deep learning approaches.
1990s
Eigenfaces for Recognition
This groundbreaking paper from MIT introduced eigenfaces, a principal component analysis approach to efficiently represent faces in a lower-dimensional space, revolutionizing facial recognition and establishing core techniques for statistical pattern recognition in computer vision.
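The core of the method fits in a few lines of NumPy; this is a sketch assuming `faces` is an array of flattened, equally-sized grayscale face images:

```python
import numpy as np

def eigenfaces(faces: np.ndarray, k: int):
    """faces: (n_samples, n_pixels) array; returns top-k components."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of vt are the principal components ("eigenfaces"),
    # ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                # (k, n_pixels)
    weights = centered @ components.T  # low-dimensional face codes
    return mean, components, weights
```

Recognition then reduces to nearest-neighbor search among the k-dimensional weight vectors rather than among raw pixels.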
Snakes: Active Contour Models
Kass, Witkin, and Terzopoulos introduced active contour models, or "snakes": energy-minimizing splines guided by external forces and image constraints, establishing a powerful framework for object boundary detection that continues to influence medical image analysis and object segmentation.
Graph Cuts for Image Segmentation
This influential work from Cornell introduced the application of graph cut optimization to image segmentation, formulating the problem as finding the minimum cut in a graph, establishing energy minimization approaches that would transform object segmentation and stereo correspondence.
Normalized Cuts and Image Segmentation
Shi and Malik at Berkeley introduced normalized cuts, a theoretically sound spectral clustering approach to image segmentation that measures both the dissimilarity between different groups and the similarity within groups, establishing a foundation for perceptual grouping in computer vision.
Gradient-Based Learning Applied to Document Recognition
Yann LeCun and colleagues at AT&T/Bell Labs introduced LeNet-5, a pioneering convolutional neural network architecture for handwritten digit recognition that demonstrated end-to-end training from pixels to classification, establishing the foundation for modern deep learning approaches in computer vision.
A Global Geometric Framework for Nonlinear Dimensionality Reduction
This influential work from Stanford introduced ISOMAP, a technique for discovering nonlinear manifolds in high-dimensional data that preserves geodesic distances, establishing a powerful approach for understanding the intrinsic structure of visual data that influenced subsequent manifold learning methods.
2000-2009
Rapid Object Detection using a Boosted Cascade of Simple Features
Viola and Jones introduced a revolutionary real-time face detection framework using Haar-like features and AdaBoost, the first algorithm capable of reliable face detection at 15+ frames per second, transforming practical computer vision applications and enabling embedded vision systems.
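The detector still ships with OpenCV; a minimal usage sketch (the image path and parameters are illustrative):

```python
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
gray = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)
# scaleFactor sets the image-pyramid step; minNeighbors trades
# precision against recall when merging overlapping detections.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    print(f"face at x={x}, y={y}, size {w}x{h}")
```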
Pictorial Structures for Object Recognition
Felzenszwalb and Huttenlocher formalized pictorial structures, representing objects as collections of parts arranged in deformable configurations, establishing a mathematically principled approach to object recognition that would later influence part-based models and pose estimation.
Distinctive Image Features from Scale-Invariant Keypoints
David Lowe at the University of British Columbia introduced SIFT (Scale-Invariant Feature Transform), a groundbreaking algorithm for detecting and describing local features invariant to scale, rotation, and illumination changes, revolutionizing object recognition, image matching, and 3D reconstruction.
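SIFT is available directly in modern OpenCV (the patent expired in 2020); a minimal sketch with a placeholder image path:

```python
import cv2

sift = cv2.SIFT_create()
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
keypoints, descriptors = sift.detectAndCompute(img, None)
# Each descriptor is a 128-D vector of local gradient-orientation
# histograms, normalized for scale and rotation invariance.
print(len(keypoints), descriptors.shape)  # e.g. N, (N, 128)
```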
Histograms of Oriented Gradients for Human Detection
Dalal and Triggs at INRIA introduced HOG (Histograms of Oriented Gradients), a feature descriptor that captures local gradient orientation statistics, dramatically improving human detection performance and establishing a descriptor that would influence object recognition approaches for over a decade.
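OpenCV bundles the descriptor together with a default pedestrian SVM; a minimal sketch (the image path is a placeholder):

```python
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())
img = cv2.imread("street.jpg")
# Gradient-orientation histograms are pooled over cells and blocks,
# then scored by a linear SVM in a sliding window over a pyramid.
boxes, scores = hog.detectMultiScale(img, winStride=(8, 8))
print(boxes, scores)
```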
SURF: Speeded Up Robust Features
Bay and colleagues at ETH Zurich introduced SURF, a computationally efficient alternative to SIFT that used integral images and box filters to approximate derivatives, significantly accelerating feature detection and description while maintaining robustness for real-time applications.
BRIEF: Binary Robust Independent Elementary Features
Calonder and colleagues at EPFL introduced BRIEF, a binary feature descriptor that used simple intensity difference tests to create highly discriminative bit strings, dramatically reducing memory requirements and computation time compared to floating-point descriptors like SIFT and SURF.
ImageNet: A Large-Scale Hierarchical Image Database
Deng and colleagues at Princeton introduced ImageNet, a massive dataset of over 14 million labeled images organized according to WordNet hierarchy, providing unprecedented scale for training visual recognition systems and ultimately catalyzing the deep learning revolution in computer vision.
Read Paper2010-2015
The PASCAL Visual Object Classes Challenge
Everingham and colleagues at Oxford/Edinburgh established the PASCAL VOC challenge, creating standardized datasets and evaluation protocols for object detection and segmentation that became the primary benchmark for comparing computer vision algorithms for nearly a decade.
Object Detection with Discriminatively Trained Part-Based Models
Felzenszwalb and colleagues at the University of Chicago introduced Deformable Part Models (DPM), a discriminative approach combining HOG features with latent SVM training to model objects as collections of parts, setting state-of-the-art performance in object detection before the deep learning revolution.
ImageNet Classification with Deep Convolutional Neural Networks
Krizhevsky, Sutskever, and Hinton at the University of Toronto introduced AlexNet, a deep convolutional neural network that dramatically outperformed previous approaches on the ImageNet challenge, catalyzing the deep learning revolution in computer vision and establishing the CNN architecture as the dominant paradigm for visual recognition tasks.
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
Girshick and colleagues at UC Berkeley introduced R-CNN (Regions with CNN features), the first highly effective approach to combine region proposals with deep convolutional features, establishing a new paradigm for object detection that would dominate the field for years to come.
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
He and colleagues at Microsoft introduced SPPNet, which added a spatial pyramid pooling layer allowing CNNs to handle images of arbitrary size/scale and generate fixed-length representations, significantly improving efficiency by sharing computation across region proposals.
Going Deeper with Convolutions
Szegedy and colleagues at Google introduced GoogLeNet/Inception, a novel architecture using inception modules with parallel convolutions at different scales, dramatically reducing parameters while increasing depth, winning the 2014 ImageNet competition and establishing new principles for efficient network design.
Very Deep Convolutional Networks for Large-Scale Image Recognition
Simonyan and Zisserman at Oxford introduced VGGNet, which demonstrated the importance of network depth by using small 3×3 convolution filters stacked to create effective receptive fields, establishing a simple yet powerful architecture that became a standard feature extractor for many computer vision tasks.
Deep Learning Face Representation by Joint Identification-Verification
Sun and colleagues at CUHK introduced DeepID, a deep learning approach that jointly optimized face identification and verification tasks, significantly advancing face recognition performance and establishing multi-task learning principles that would influence subsequent facial recognition systems.
Fast R-CNN
Girshick at Microsoft improved upon R-CNN with Fast R-CNN, which enabled end-to-end detector training by pooling CNN features from regions of interest, dramatically increasing both speed and accuracy for object detection while simplifying the multi-stage training pipeline.
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Ren and colleagues at Microsoft introduced Faster R-CNN, which integrated region proposal generation into the detection network with a Region Proposal Network, creating the first near real-time high-accuracy object detection system and establishing a unified framework that influenced numerous subsequent approaches.
U-Net: Convolutional Networks for Biomedical Image Segmentation
Ronneberger and colleagues at the University of Freiburg introduced U-Net, an elegant encoder-decoder architecture with skip connections that enabled precise segmentation with limited training data, revolutionizing medical image analysis and establishing a fundamental architecture for dense prediction tasks.
Fully Convolutional Networks for Semantic Segmentation
Long, Shelhamer, and Darrell at Berkeley introduced FCN, transforming classification networks into fully convolutional ones that could produce dense, pixel-wise predictions, establishing the fundamental approach to semantic segmentation that continues to influence modern architectures.
Deep Residual Learning for Image Recognition
He and colleagues at Microsoft introduced ResNet, which enabled training of extremely deep networks through residual connections that created shortcuts across layers, solving the vanishing gradient problem and establishing a fundamental architecture that continues to serve as the backbone for numerous computer vision systems.
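The key idea is small enough to sketch in PyTorch; channel counts here are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        # The identity shortcut lets the block learn a residual F(x)
        # on top of x, so gradients flow even through many layers.
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)

y = BasicBlock(64)(torch.randn(1, 64, 32, 32))  # shape preserved
```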
SSD: Single Shot MultiBox Detector
Liu and colleagues at UNC Chapel Hill and Google introduced SSD, a detection framework that eliminated proposal generation and feature resampling by predicting at multiple scales directly from feature maps, establishing a high-speed detection approach that balanced accuracy and efficiency for real-time applications.
Read Paper2016-2019
DeepFace: Closing the Gap to Human-Level Performance in Face Verification
Taigman and colleagues at Facebook introduced DeepFace, a deep learning system for face verification that approached human-level performance through 3D alignment, a large-scale private training dataset, and a deep CNN architecture, helping establish face recognition as one of the first computer vision tasks to achieve near-human accuracy.
Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
Chen and colleagues at Google introduced DeepLabv1, which combined atrous (dilated) convolutions to efficiently capture multi-scale context with fully connected CRFs for boundary refinement, establishing key techniques for accurate semantic segmentation that would influence numerous subsequent approaches.
You Only Look Once: Unified, Real-Time Object Detection
Redmon and colleagues at the University of Washington introduced YOLO, a revolutionary object detection approach that framed detection as a single regression problem from images to bounding boxes and class probabilities, enabling unprecedented speed while maintaining competitive accuracy, establishing a new paradigm for real-time vision.
SqueezeNet: AlexNet-level Accuracy with 50x Fewer Parameters and <0.5MB Model Size
Iandola and colleagues at UC Berkeley introduced SqueezeNet, a compact CNN architecture that achieved AlexNet-level accuracy with 50x fewer parameters through fire modules combining squeeze and expand operations, establishing important principles for efficient network design for mobile and embedded systems.
Microsoft COCO: Common Objects in Context
Lin and colleagues at Microsoft introduced COCO, a large-scale object detection, segmentation, and captioning dataset with complex everyday scenes containing multiple objects in their natural context, establishing a more challenging benchmark that drove advances in instance segmentation and dense prediction tasks.
Pyramid Scene Parsing Network
Zhao and colleagues at SenseTime/CUHK introduced PSPNet, which utilized a pyramid pooling module to aggregate context at multiple scales, effectively capturing global and local information for scene parsing, establishing a new approach to multi-scale feature representation that influenced numerous segmentation methods.
Mask R-CNN
He and colleagues at Facebook AI Research introduced Mask R-CNN, extending Faster R-CNN with a parallel mask prediction branch for instance segmentation, establishing a flexible framework for multiple vision tasks and achieving state-of-the-art results that would influence object detection and segmentation for years to come.
Focal Loss for Dense Object Detection
Lin and colleagues at Facebook AI introduced focal loss and RetinaNet, addressing the extreme foreground-background class imbalance in dense detection by down-weighting easy examples, enabling single-stage detectors to outperform two-stage approaches and establishing a key technique for addressing imbalanced datasets.
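The loss itself is a one-line modification of cross-entropy; a minimal binary-classification sketch in PyTorch with the commonly cited defaults:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    ce = F.binary_cross_entropy_with_logits(logits, targets,
                                            reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # prob of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma shrinks the loss on easy, confident examples,
    # so the flood of easy background anchors stops dominating training.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

loss = focal_loss(torch.randn(8), torch.randint(0, 2, (8,)).float())
```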
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Howard and colleagues at Google introduced MobileNets, which utilized depthwise separable convolutions to dramatically reduce computation and parameters while maintaining reasonable accuracy, establishing fundamental techniques for efficient model design that would enable computer vision on resource-constrained devices.
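A depthwise separable layer factors a standard convolution into a per-channel spatial filter and a 1x1 channel mixer; a PyTorch sketch:

```python
import torch.nn as nn

def separable_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        # groups=in_ch: the 3x3 filter sees one channel at a time.
        nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        # The 1x1 pointwise conv then mixes information across channels.
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )
```

Per output position this costs roughly 9·C_in + C_in·C_out multiplies instead of 9·C_in·C_out for a standard 3x3 convolution, which is where the mobile-scale savings come from.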
ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Zhang and colleagues at Face++ introduced ShuffleNet, which utilized pointwise group convolutions and channel shuffling to reduce computation while maintaining accuracy, establishing novel techniques for designing highly efficient networks that influenced numerous subsequent mobile-friendly architectures.
Attention is All You Need
Vaswani and colleagues at Google introduced the Transformer architecture based entirely on attention mechanisms, initially for NLP but eventually revolutionizing computer vision by providing a new paradigm beyond convolutions that would lead to Vision Transformers and numerous attention-based visual models.
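The central operation is compact enough to state directly; a single-head sketch in PyTorch:

```python
import math
import torch

def attention(q, k, v):
    # Compare every query with every key, scaled by sqrt(d_k)
    # to keep the softmax inputs in a well-behaved range.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v  # each output is a weighted mix of the values

q = k = v = torch.randn(1, 10, 64)  # (batch, tokens, dim)
out = attention(q, k, v)            # (1, 10, 64)
```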
Dynamic Routing Between Capsules
Sabour, Frosst, and Hinton at Google Brain introduced CapsNet, which modeled hierarchical relationships between object parts using capsules whose vector outputs preserve more information than scalar features, proposing a fundamentally different approach to representation learning that addresses key limitations of CNNs.
Densely Connected Convolutional Networks
Huang and colleagues at Cornell/Tsinghua introduced DenseNet, which connected each layer to every other layer in a feed-forward fashion to encourage feature reuse, improve gradient flow, and reduce parameters, establishing a powerful architecture for efficient learning that influenced numerous subsequent network designs.
Learning Transferable Architectures for Scalable Image Recognition
Zoph and colleagues at Google introduced NASNet, which used reinforcement learning to search for optimal neural architecture building blocks that could be transferred across datasets, establishing automated architecture design approaches that would launch an entire field of neural architecture search.
YOLOv3: An Incremental Improvement
Redmon and Farhadi at the University of Washington refined the YOLO architecture with multi-scale predictions, better feature extractors, and various design improvements, establishing YOLOv3 as the standard real-time detector balancing speed and accuracy that would be widely adopted in practical applications.
A Neural Algorithm of Artistic Style
Gatys and colleagues at the University of Tübingen separated and recombined content and style representations from different images, enabling artistic style transfer by optimizing for content similarity and style statistics, establishing a novel application of neural networks that sparked significant interest in creative AI.
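The style representation is the Gram matrix of a layer's activations; a minimal PyTorch sketch:

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (channels, height, width) activations of one layer
    c, h, w = features.shape
    flat = features.view(c, h * w)
    # Channel-to-channel correlations; spatial layout is discarded,
    # which is why Gram statistics capture texture rather than content.
    return (flat @ flat.t()) / (c * h * w)

a, b = torch.randn(64, 32, 32), torch.randn(64, 32, 32)
style_loss = ((gram_matrix(a) - gram_matrix(b)) ** 2).sum()
```

Style transfer minimizes this style loss across several layers, combined with a content loss on deeper-layer activations.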
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
Chen and colleagues at Google introduced DeepLabv3+, which combined an encoder-decoder structure with atrous separable convolutions, establishing a powerful and efficient architecture for semantic segmentation that achieved state-of-the-art results while maintaining computational efficiency.
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Brock and colleagues at DeepMind introduced BigGAN, which demonstrated the benefits of scaling up GAN training with larger batch sizes and more parameters, establishing new benchmarks for image synthesis quality and revealing the importance of training dynamics for generative models.
Objects as Points
Zhou and colleagues at the University of Texas introduced CenterNet, which modeled objects as points (their center) and regressed to other properties, establishing a simple yet effective approach to detection that unified object detection, human pose estimation, and 3D detection in a single framework.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Tan and Le at Google introduced EfficientNet, which proposed compound scaling that uniformly scales network width, depth, and resolution with fixed coefficients, establishing a family of models that achieved state-of-the-art accuracy with significantly fewer parameters and operations than previous approaches.
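The scaling rule is plain arithmetic: pick α, β, γ once by grid search so that α·β²·γ² ≈ 2, then raise all three to a shared coefficient φ. A sketch with the paper's coefficients and illustrative base values:

```python
alpha, beta, gamma = 1.2, 1.1, 1.15  # depth, width, resolution factors

def scale(phi: int, depth=18, width=64, resolution=224):
    # Each +1 in phi roughly doubles FLOPs, since alpha*beta^2*gamma^2 ~ 2.
    return (round(depth * alpha ** phi),
            round(width * beta ** phi),
            round(resolution * gamma ** phi))

for phi in range(4):
    print(phi, scale(phi))  # (layers, channels, input size)
```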
MnasNet: Platform-Aware Neural Architecture Search for Mobile
Tan and colleagues at Google introduced MnasNet, which incorporated latency constraints directly into the architecture search objective, establishing an approach to automatically design efficient mobile models that explicitly balanced accuracy and real-world inference speed on target devices.
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Yun and colleagues at NAVER introduced CutMix, a simple yet effective data augmentation strategy that replaced regions of an image with patches from another while mixing the labels proportionally, establishing a powerful regularization technique that improved both classification accuracy and localization ability.
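The augmentation is a few lines of tensor indexing; a minimal PyTorch sketch:

```python
import numpy as np
import torch

def cutmix(images, labels, alpha=1.0):
    lam = np.random.beta(alpha, alpha)     # initial mixing ratio
    perm = torch.randperm(images.size(0))  # partner for each image
    h, w = images.shape[-2:]
    rh, rw = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - rh // 2, 0), min(cy + rh // 2, h)
    x1, x2 = max(cx - rw // 2, 0), min(cx + rw // 2, w)
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Recompute lambda from the area actually pasted.
    lam = 1 - (y2 - y1) * (x2 - x1) / (h * w)
    return images, labels, labels[perm], lam

# Train with: lam * loss(pred, labels) + (1 - lam) * loss(pred, labels[perm])
```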
StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks
Karras and colleagues at NVIDIA introduced StyleGAN, a groundbreaking GAN architecture that separated high-level attributes and stochastic variation via a novel style-based design, enabling unprecedented control over generated images and setting new standards for image synthesis quality.
Read Paper2020-2021
End-to-End Object Detection with Transformers (DETR)
Introduced by Facebook AI, DETR revolutionized object detection by applying transformers to predict objects in an end-to-end manner, eliminating the need for hand-crafted components like anchor boxes and non-maximum suppression.
Analyzing and Improving the Image Quality of StyleGAN (StyleGAN2)
NVIDIA's StyleGAN2 improved upon its predecessor by addressing artifacts and enhancing image quality, setting a new standard for high-resolution image synthesis with generative adversarial networks.
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis
UC Berkeley's NeRF introduced a groundbreaking approach to 3D scene representation, using neural networks to model continuous volumetric scenes, enabling photorealistic view synthesis from sparse images.
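At its core, NeRF turns densities along a camera ray into compositing weights; a NumPy sketch of that volume-rendering step:

```python
import numpy as np

def render_weights(sigma, deltas):
    # alpha_i = 1 - exp(-sigma_i * delta_i): chance the ray terminates
    # within sample i's interval.
    alpha = 1.0 - np.exp(-sigma * deltas)
    # T_i: probability the ray survives all earlier samples.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return trans * alpha  # weights for alpha-compositing colors

w = render_weights(np.random.rand(64), np.full(64, 0.03))
# pixel color = (w[:, None] * per_sample_rgb).sum(axis=0)
```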
RepVGG: Making VGG-style ConvNets Great Again
Tsinghua's RepVGG reintroduced simple VGG-style convolutional networks with a novel re-parameterization technique, achieving high performance and efficiency for image classification tasks.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
Google's Vision Transformer (ViT) adapted transformers for image classification, treating image patches as tokens, achieving state-of-the-art performance and sparking widespread adoption of transformers in vision tasks.
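The "words" are just non-overlapping patches; the tokenization step is a pair of reshapes, sketched here in PyTorch for 16x16 patches:

```python
import torch

img = torch.randn(1, 3, 224, 224)              # (batch, C, H, W)
p = 16
patches = img.unfold(2, p, p).unfold(3, p, p)  # (1, 3, 14, 14, 16, 16)
tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 14 * 14, 3 * p * p)
# tokens: (1, 196, 768). A learned linear projection plus position
# embeddings turn these into the Transformer's input sequence.
print(tokens.shape)
```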
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning (BYOL)
DeepMind's BYOL proposed a novel self-supervised learning method that avoids negative samples, achieving robust visual representations that rival supervised methods, influencing subsequent self-supervised learning frameworks.
Learning Transferable Visual Models From Natural Language Supervision (CLIP)
OpenAI's CLIP trained visual models with natural language supervision, enabling zero-shot image classification and robust cross-modal understanding, significantly impacting multimodal AI applications.
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Microsoft's Swin Transformer introduced a hierarchical architecture with shifted windows, improving efficiency and performance for vision tasks like classification, detection, and segmentation.
An Empirical Study of Training Self-Supervised Vision Transformers (MoCo-v3)
Facebook AI's MoCo-v3 refined self-supervised learning for vision transformers, providing insights into stable training and achieving strong performance on large-scale image datasets.
CvT: Introducing Convolutions to Vision Transformers
Microsoft's CvT combined convolutional layers with transformers, enhancing locality and efficiency in vision transformers for tasks like image classification and object detection.
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Google's CoAtNet fused convolutional and attention mechanisms, creating a versatile architecture that excels across various data scales for vision tasks like classification and detection.
Alias-Free Generative Adversarial Networks (StyleGAN3)
NVIDIA's StyleGAN3 addressed aliasing issues in generative models, producing high-quality, alias-free images with improved consistency for applications like video and animation.
YOLOX: Exceeding YOLO Series in 2021
Megvii's YOLOX enhanced the YOLO series with innovations like decoupled heads and anchor-free detection, achieving superior performance in real-time object detection tasks.
MaskFormer: Per-Pixel Classification is Not All You Need for Semantic Segmentation
Meta's MaskFormer reframed semantic segmentation as a mask classification problem, leveraging transformers to achieve state-of-the-art results in both semantic and panoptic segmentation.
Masked Autoencoders Are Scalable Vision Learners (MAE)
Meta's MAE introduced a simple yet effective self-supervised learning approach, using masked image patches to train vision transformers, achieving strong performance with high scalability.
Read Paper2022
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Meta's MViTv2 enhanced multiscale vision transformers, improving efficiency and performance for image classification and object detection, building on hierarchical transformer architectures.
YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors
Wang, Bochkovskiy, and Liao at Academia Sinica introduced a suite of trainable "bag-of-freebies" enhancements in YOLOv7, achieving top performance in real-time object detection with improved accuracy and speed over previous YOLO models.
A ConvNet for the 2020s (ConvNeXt)
Meta's ConvNeXt modernized convolutional neural networks by incorporating transformer-inspired design principles, achieving competitive performance with transformers in image classification tasks.
Exploring Plain Vision Transformer Backbones for Object Detection (ViTDet)
Meta's ViTDet demonstrated that plain vision transformers could serve as effective backbones for object detection, simplifying architectures while maintaining high performance.
DINOv2: Learning Robust Visual Features without Supervision
Meta's DINOv2 advanced self-supervised learning, producing robust and versatile visual features that excel in downstream tasks like classification and segmentation without requiring labeled data.
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
BAAI's EVA scaled up masked visual representation learning, achieving state-of-the-art performance in self-supervised vision tasks by leveraging large datasets and transformer architectures.
High-Resolution Image Synthesis with Latent Diffusion Models
Rombach and colleagues at LMU Munich introduced latent diffusion models, enabling efficient high-resolution image synthesis by running the diffusion process in a compressed latent space, and powering applications like Stable Diffusion.
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tencent AI Lab's YOLO-World extended real-time object detection to open-vocabulary settings, enabling detection of arbitrary object categories specified by language prompts.
Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet)
Stanford's ControlNet introduced a framework for adding fine-grained control to diffusion models, enabling precise manipulation of generated images using inputs like edge maps or depth maps.
RT-DETR: DETRs Beat YOLOs on Real-Time Object Detection
Baidu's RT-DETR combined the strengths of transformer-based DETR models with real-time performance, surpassing YOLO models in speed and accuracy for object detection tasks.
Read Paper2023
Segment Anything (SAM)
Meta's Segment Anything Model (SAM) introduced a versatile framework for image segmentation, capable of generating high-quality masks for objects in any image, enabling zero-shot segmentation across diverse tasks.
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Google's PaLI combined vision and language modeling at scale, supporting multilingual tasks like image captioning and visual question answering, advancing cross-modal understanding.
Muse: Text-To-Image Generation via Masked Generative Transformers
Google's Muse leveraged masked generative transformers for efficient text-to-image generation, achieving high-quality image synthesis with improved training stability and speed.
Fast Segment Anything (FastSAM)
FastSAM, from the Chinese Academy of Sciences, optimized the Segment Anything task for real-time performance, maintaining high segmentation quality while significantly reducing computational requirements.
Emerging Properties in Self-Supervised Vision Transformers (DINO)
Meta's DINO explored emergent properties in self-supervised vision transformers, revealing their ability to learn robust features for tasks like segmentation and classification without supervision.
YOLOv8: A New Era of Visual AI
Ultralytics' YOLOv8 advanced real-time object detection with improved accuracy, speed, and versatility, supporting tasks like detection, segmentation, and classification.
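As YOLOv8 is distributed as a library rather than a paper artifact, typical usage is a few lines (assumes `pip install ultralytics`; weights download on first use and the image path is a placeholder):

```python
from ultralytics import YOLO

model = YOLO("yolov8n.pt")     # nano variant, tuned for speed
results = model("street.jpg")  # one forward pass per image
for box in results[0].boxes:
    print(box.xyxy, box.conf, box.cls)  # coordinates, score, class id
```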
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Shanghai AI Lab's InternImage introduced deformable convolutions to large-scale vision foundation models, enhancing flexibility and performance in tasks like classification and detection.
UNINEXT: Universal Instance Perception as Object Discovery and Retrieval
UNINEXT, from Dalian University of Technology and ByteDance, proposed a unified framework for instance perception, treating tasks like detection and segmentation as object discovery and retrieval, achieving robust performance across domains.
DALL-E 3: Improving Image Generation with Better Captions
OpenAI's DALL-E 3 enhanced text-to-image generation by leveraging improved captioning techniques, producing more accurate and detailed images aligned with textual prompts.
DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
MIT and Tsinghua's DETR3D extended transformer-based detection to 3D, using multi-view images and 3D-to-2D queries to achieve robust 3D object detection for autonomous driving and robotics.
Sora: Video Generation Models as World Simulators
OpenAI's Sora introduced advanced video generation models that simulate physical world dynamics, producing high-quality, coherent videos from text prompts, advancing generative AI for video.
Visual Instruction Tuning (LLaVA)
LLaVA, from the University of Wisconsin-Madison and Microsoft, introduced visual instruction tuning, enhancing multimodal models by fine-tuning on visual-text instruction data and improving performance in vision-language tasks like question answering.
Read Paper2024
MambaTalk: Efficient Holistic Gesture Synthesis with Selective State Space Models
Samsung's MambaTalk introduced selective state space models for efficient gesture synthesis, enabling realistic and computationally lightweight generation of human gestures for applications in virtual reality and animation.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Vision Mamba (Vim), from Huazhong University of Science and Technology, applied bidirectional state space models to visual representation learning, offering a computationally efficient alternative to transformers for tasks like image classification and object detection.
GLEE: General Object Foundation Model for Images and Videos at Scale
GLEE introduced a scalable foundation model for general object understanding in images and videos, enabling robust performance across tasks like detection, segmentation, and tracking.
Segment Everything Everywhere All at Once (SEEM)
UW-Madison and Microsoft's SEEM unified multiple segmentation tasks (semantic, instance, and panoptic) into a single framework, achieving state-of-the-art performance with a versatile, prompt-driven approach.
OMG-Segment: One Model Goes to Segment Everything
OMG-Segment proposed a single model capable of performing all segmentation tasks, from semantic to instance and panoptic, with high efficiency and generalizability across datasets.
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Microsoft's Florence-2 developed a unified representation model for diverse vision tasks, including classification, detection, and captioning, achieving strong performance with a single architecture.
Stable Diffusion 3.5: Advanced Text-to-Image Generation with Precise Control
Stability AI's Stable Diffusion 3.5 improved text-to-image generation with enhanced control mechanisms, delivering higher quality and more accurate images aligned with complex prompts.
OmniVec: Unifying Feature Representations for Vision-Language-Audio Tasks
OmniVec unified feature representations across vision, language, and audio, enabling robust performance in multimodal tasks like image captioning and visual question answering.
AdaForm: Adaptive Image Transformation Networks for Cross-Domain Visual Recognition
DeepMind's AdaForm developed adaptive transformation networks for cross-domain visual recognition, improving robustness in scenarios with domain shifts, such as synthetic-to-real image adaptation.
Gemini Vision: Advancing Multi-Modal Understanding Through Massive Scale Visual Pre-Training
Google's Gemini Vision leveraged massive-scale visual pre-training to advance multimodal understanding, achieving state-of-the-art performance in tasks like image classification and visual question answering.
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Vocabulary Object Detection
IDEA Research's Grounding DINO combined self-supervised vision transformers with grounded pre-training, enabling open-vocabulary object detection with unprecedented flexibility and accuracy.
LGM: Large Gaussian Splatting for Scalable 3D Reconstruction
LGM advanced 3D reconstruction with large-scale Gaussian splatting, offering scalable, high-fidelity neural rendering for real-time 3D scene synthesis.
Read Paper2025
MambaVision: A Hybrid Mamba-Transformer Backbone for Computer Vision
NVIDIA's MambaVision introduced the first hybrid Mamba-Transformer architecture for computer vision, achieving state-of-the-art performance in image classification and object detection with improved computational efficiency over traditional transformers.
Dataset Distillation with Neural Characteristic Function: A Minmax Perspective
This work from Shaobo Wang et al. proposed a novel dataset distillation method using neural characteristic functions with a minmax optimization approach, enabling efficient training of computer vision models with significantly reduced dataset sizes.
MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views
This CVPR 2025 highlight paper presented a novel appearance model using Gaussian splatting and an atlas of charts, achieving high-quality 3D geometry and photorealistic rendering from sparse image views, advancing 3D reconstruction techniques.
RADIOv2.5: A Flexible Vision Encoder for Robust Multi-Task Learning
NVIDIA's RADIOv2.5 enhanced vision encoders with a combination of DFN_CLIP, DINOv2, SAM, SigLIP, and advanced training techniques, offering a flexible foundation model for tasks like object detection and segmentation across varying resolutions.
ESDiff: Encoding Strategy-Inspired Diffusion Model with Few-Shot Learning for Color Image Inpainting
This CVPR 2025 paper by Junyan Zhang et al. introduced ESDiff, a diffusion model inspired by encoding strategies, enabling high-quality color image inpainting with few-shot learning, improving efficiency and performance in image restoration tasks.
Mamba-Sea: A Mamba-Based Framework with Global-to-Local Sequence Augmentation for Generalizable Medical Image Segmentation
Accepted to IEEE TMI 2025, Mamba-Sea by Zihan Cheng et al. utilized a Mamba-based framework with global-to-local sequence augmentation, enhancing generalizability in medical image segmentation for diverse clinical applications.