---
layout: default
title: Mathematical and Statistical Foundations
---
Chapter 1: Mathematical Preliminaries
(Linear algebra, calculus, optimization, differential geometry)
Chapter 2: Probability and Statistics
(Distributions, Bayesian inference, hypothesis testing, KL divergence)
Chapter 3: Signal and Image Processing Basics
(Convolution, Fourier transforms, wavelets, filtering, noise models)
---
layout: default
title: Core Concepts and Traditional Methods
---
Chapter 4: Image Formation and Optics
(Pinhole cameras, lens models, radiometry, projective geometry)
Chapter 5: Feature Extraction and Matching
(Harris corners, SIFT, SURF, ORB, BRIEF, RANSAC)
Chapter 6: Geometric Vision
(Homography, epipolar geometry, stereo vision, camera calibration)
Chapter 7: Motion and Optical Flow
(Lucas-Kanade, Horn-Schunck, dense flow, motion estimation)
Chapter 8: Color and Texture Analysis
(RGB, HSV, LAB, texture descriptors, Gabor filters)
Chapter 9: Traditional Recognition Techniques
(HOG, Haar cascades, Viola-Jones, SVMs, template matching)
---
layout: default
title: Deep Learning Foundations for Vision
---
Chapter 10: Convolutional Neural Networks (CNNs): Fundamentals
(Convolution, pooling, activation functions, backpropagation)
Chapter 11: Types of Convolutions
(Standard, dilated, transposed, depthwise separable, group, deformable)
Chapter 12: Data Augmentation Techniques
(Flipping, rotation, color jitter, CutMix, MixUp, synthetic augmentation)
Chapter 13: Pretraining and Transfer Learning
(ImageNet, fine-tuning, domain adaptation, frozen vs. unfrozen layers)
Chapter 14: Training Techniques and Optimization
(SGD, Adam, learning rate schedules, label smoothing, mixed precision)
---
layout: default
title: CNN Architectures and Enhancements
---
Chapter 15: Classic CNN Architectures
(LeNet, AlexNet, VGG, GoogLeNet/Inception)
Chapter 16: Residual and Dense Networks
(ResNet, ResNeXt, DenseNet, Wide ResNet)
Chapter 17: Attention-Augmented CNNs
(SENet, CBAM, non-local blocks, attention gates, ECA-Net)
Chapter 18: Region-Based CNNs
(R-CNN, Fast R-CNN, Faster R-CNN, Mask R-CNN)
Chapter 19: Lightweight and Efficient CNNs
(MobileNet, ShuffleNet, EfficientNet, GhostNet)
---
layout: default
title: Core and Extended Vision Tasks
---
Chapter 20: Image Classification
(Benchmarks: ImageNet, CIFAR; multi-label classification)
Chapter 21: Object Detection
(YOLO, SSD, RetinaNet, DETR, CenterNet, FCOS)
Chapter 22: Semantic Segmentation
(FCN, U-Net, DeepLab, HRNet, SegFormer)
Chapter 23: Instance and Panoptic Segmentation
(Mask R-CNN, Panoptic FPN, SOLO, PointRend)
Chapter 24: Pose Estimation
(2D/3D human pose, OpenPose, DensePose, animal pose)
Chapter 25: Optical Character Recognition (OCR)
(Tesseract, CRNN, EAST, Transformer-based OCR)
Chapter 26: Image Retrieval
(Content-based retrieval, hashing, Siamese networks)
Chapter 27: Face Recognition and Metric Learning
(FaceNet, ArcFace, CosFace, SphereFace, triplet loss)
Chapter 28: Scene Understanding
(Scene classification, object relationships, layout estimation)
Chapter 29: Anomaly Detection
(One-class SVM, autoencoders, reconstruction-based methods)
---
layout: default
title: Advanced Learning Paradigms
---
Chapter 30: Self-Supervised Learning
(SimCLR, MoCo, BYOL, DINO, MAE, SimSiam)
Chapter 31: Semi-Supervised Learning
(Pseudo-labeling, consistency regularization, FixMatch)
Chapter 32: Few-Shot and Zero-Shot Learning
(Prototypical networks, meta-learning, CLIP-based zero-shot)
Chapter 33: Knowledge Distillation and Self-Distillation
(Teacher-student models, DML, self-knowledge distillation)
Chapter 34: Continual and Lifelong Learning
(Catastrophic forgetting, EWC, replay methods)
---
layout: default
title: Vision Transformers and Large-Scale Models
---
Chapter 35: Foundations of Vision Transformers
(ViT, DeiT, patch embeddings, self-attention for images, training challenges)
Chapter 36: Hierarchical Vision Transformers
(Swin Transformer, Twins, PVT, Nested ViT, hierarchical design principles)
Chapter 37: Vision Transformers for Object Detection
(DETR, Deformable DETR, DINO, YOLOS, ViTDet)
Chapter 38: Vision Transformers for Segmentation
(SegFormer, Mask2Former, SETR, Swin-Unet, Segmenter)
Chapter 39: Vision Transformers for Video and Temporal Tasks
(Video Swin Transformer, TimeSformer, ViViT, MViT)
Chapter 40: Hybrid CNN-Transformer Architectures
(ConvNeXt, CoAtNet, LeViT, CvT, BoTNet)
Chapter 41: Vision Large Language Models (vLLMs)
(Flamingo, BLIP, LLaVA, CLIP-ViT, GIT, visual reasoning, image-text alignment)
Chapter 42: Scaling and Optimizing Vision Transformers
(Efficient ViTs, Sparse Transformers, Long-Range ViTs, FlashAttention for ViTs)
Chapter 43: Task-Specific ViT Innovations
(ViTPose, TransReID, ViTGAN, ViT-based OCR)
---
layout: default
title: 3D and Geometric Vision
---
Chapter 44: Depth Estimation
(Monocular depth, stereo matching, depth from motion, MVS)
Chapter 45: 3D Point Cloud Processing
(PointNet, PointNet++, PointConv, KPConv)
Chapter 46: Structure from Motion (SfM)
(Feature tracking, bundle adjustment, multi-view reconstruction)
Chapter 47: 3D Reconstruction and Rendering
(Voxel grids, meshes, NeRF, Instant NeRF, Plenoxels)
Chapter 48: Visual SLAM and Odometry
(ORB-SLAM, DSO, monocular/stereo SLAM, VIO)
---
layout: default
title: Generative Vision Models
---
Chapter 49: Variational Autoencoders (VAEs)
(Image generation, latent space interpolation)
Chapter 50: Generative Adversarial Networks (GANs)
(DCGAN, StyleGAN, BigGAN, ProGAN, GAN inversion)
Chapter 51: Diffusion Models
(DDPM, Stable Diffusion, DALL·E 2, latent diffusion)
Chapter 52: Conditional and Controllable Generation
(Pix2Pix, CycleGAN, GauGAN, text-guided synthesis)
Chapter 53: Neural Rendering
(NeRF, GRAF, differentiable rendering, scene synthesis)
---
layout: default
title: Multimodal and Dynamic Vision
---
Chapter 54: Multimodal Learning: Vision and Language
(CLIP, ViLBERT, BLIP, image captioning, VQA)
Chapter 55: Multimodal Learning: Vision and Beyond
(Vision-audio, vision-touch, cross-modal retrieval)
Chapter 56: Video Understanding: Classification and Action
(C3D, I3D, SlowFast, TimeSformer, VideoMAE)
Chapter 57: Video Segmentation and Tracking
(VOS, STCN, DeepSORT, ByteTrack, multi-object tracking)
Chapter 58: Event-Based and Neuromorphic Vision
(Event cameras, DVS, spiking neural networks)
---
layout: default
title: Efficiency and Optimization
---
Chapter 59: Model Compression Techniques
(Pruning, INT8 and 4-bit quantization, weight sharing)
Chapter 60: Efficient Inference Architectures
(MobileNetV3, EfficientNetV2, Dynamic Neural Networks)
Chapter 61: Hardware Acceleration for Vision
(GPUs, TPUs, FPGAs, edge devices, NVIDIA Jetson)
Chapter 62: Real-Time Vision Optimization
(KV caching for ViTs, FlashAttention, latency reduction)
---
layout: default
title: Evaluation and Applications
---
Chapter 63: Benchmarking and Metrics
(ImageNet, COCO, KITTI, ADE20K, mAP, IoU, FID)
Chapter 64: Autonomous Systems
(Autonomous driving, SLAM, lane detection, path planning)
Chapter 65: Medical Imaging
(Radiology, pathology, segmentation, disease classification)
Chapter 66: Surveillance and Biometrics
(Face recognition, gait analysis, crowd monitoring)
Chapter 67: Augmented and Virtual Reality
(Pose tracking, occlusion, scene reconstruction)
Chapter 68: Industrial Vision
(Defect detection, quality control, robotics vision)
Chapter 69: Retail and E-Commerce
(Product recognition, visual search, inventory tracking)
Chapter 70: Creative and Media Applications
(Image editing, style transfer, video enhancement)
---
layout: default
title: Deployment, Ethics, and Future Directions
---
Chapter 71: Deployment Pipelines for Vision
(ONNX, TensorRT, model serving, MLOps)
Chapter 72: Ethical Considerations in Vision
(Bias, privacy, fairness, misuse prevention)
Subsection: Watermarking with SynthID
(Image and video watermarking techniques, transparency and misinformation mitigation, limitations and ethical impact)
Chapter 73: Security in Vision Systems
(Adversarial robustness, backdoor attacks, defenses)
Chapter 74: Future Directions in Computer Vision
(Neurosymbolic vision, vLLM evolution, general perception)