The Grand AI Handbook

Advanced RL Paradigms

An exploration of MBPO, CQL, and GAIL, expanding deep RL into model-based planning, offline learning, and imitation learning.

Chapter 17: Model-Based Deep RL

- Problem Definition and Research Motivation
- Research Directions: world models, planning algorithms
- Model-Based Planning Algorithms: MCTS, trajectory optimization
- Model-Based Value Expansion: value-equivalent models
- Policy Optimization via Model Gradient Backpropagation: MBPO, VPN (a Dyna-style MBPO sketch follows this outline)
- Future Study: scaling model-based RL, real-world applications
- References

Chapter 18: Offline RL

- Problem Definition and Research Motivation
- Research Directions: batch RL, policy constraints
- Algorithms: BCQ, CQL, TD3+BC, EDAC, Decision Transformer (DT), QGPO, Diffuser (a CQL sketch appears below)
- Future Outlook: generalization, large-scale offline RL
- References

Chapter 19: Imitation Learning and Inverse RL

- Problem Definition and Research Motivation
- Research Directions: learning from demonstrations
- Behavioral Cloning (BC), SQIL (a BC sketch appears below)
- Inverse Reinforcement Learning (IRL): maximum-entropy IRL
- Adversarial and Structured IL: GAIL, DQfD, T-REX, R2D3 (a GAIL discriminator sketch appears below)
- Future Study: scalable IL, robust reward inference
- References

Chapter 20: Transfer and Multitask RL

- Domain adaptation, task embeddings, meta-RL
- Generalization: PLR (Prioritized Level Replay; a sampling sketch appears below)
- References

Chapter 21: Hierarchical RL

- Options framework, feudal networks, MAXQ, temporal abstraction (an options sketch appears below)
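To make Chapter 17's model-based recipe concrete, here is a minimal tabular Dyna-style sketch of the core MBPO idea: fit a dynamics model from real transitions, then train the value function on short rollouts branched from previously visited states. The five-state chain environment, the deterministic one-step model, and all hyperparameters are illustrative assumptions, not the implementation discussed in the chapter.

```python
# Tabular Dyna-style sketch of the MBPO recipe: learn a model from real
# transitions, then do extra value updates on short model-generated rollouts
# branched from observed states. Environment and constants are assumptions.
import random
from collections import defaultdict

N_STATES, GOAL, ACTIONS = 5, 4, [-1, +1]   # walk left/right along a chain
GAMMA, ALPHA, ROLLOUT_LEN = 0.95, 0.1, 3   # short branched rollouts, as in MBPO

Q = defaultdict(float)                      # Q[(state, action)]
model = {}                                  # model[(state, action)] -> (reward, next_state)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return (1.0 if s2 == GOAL else 0.0), s2

def q_update(s, a, r, s2):
    target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

for episode in range(200):
    s = 0
    while s != GOAL:
        a = random.choice(ACTIONS) if random.random() < 0.2 else \
            max(ACTIONS, key=lambda b: Q[(s, b)])
        r, s2 = step(s, a)
        model[(s, a)] = (r, s2)             # fit the (deterministic) model
        q_update(s, a, r, s2)               # update on real data
        # Branch a short imagined rollout from a previously seen state.
        ms, _ = random.choice(list(model))
        for _ in range(ROLLOUT_LEN):
            ma = max(ACTIONS, key=lambda b: Q[(ms, b)])
            if (ms, ma) not in model:
                break
            mr, ms2 = model[(ms, ma)]
            q_update(ms, ma, mr, ms2)       # update on model data
            ms = ms2
        s = s2

print("greedy action from each state:",
      [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES)])
```

Keeping the imagined rollouts short is the point: long model rollouts compound model error, which is why MBPO branches briefly from real states rather than simulating whole episodes.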
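For Chapter 18, the heart of CQL is a regularizer added to the usual TD loss: push down a log-sum-exp over all actions' Q-values while pushing up the Q-values of actions actually observed in the offline dataset. The toy sketch below applies that penalty to a tabular Q-function on a randomly generated dataset; the dataset, sizes, and learning rate are assumptions made only to keep the example self-contained.

```python
# Toy, tabular illustration of the CQL(H) regularizer on a fixed offline
# dataset (no further environment interaction). All data here is random.
import numpy as np

rng = np.random.default_rng(0)
N_S, N_A, GAMMA, ALPHA_CQL, LR = 6, 3, 0.9, 1.0, 0.1

# Fixed offline dataset of (s, a, r, s') transitions.
S = rng.integers(0, N_S, 500)
A = rng.integers(0, N_A, 500)
R = rng.normal(size=500)
S2 = rng.integers(0, N_S, 500)

Q = np.zeros((N_S, N_A))
for _ in range(1000):
    td_target = R + GAMMA * Q[S2].max(axis=1)       # bootstrapped target
    grad = np.zeros_like(Q)
    # Gradient of the TD error on dataset actions.
    np.add.at(grad, (S, A), Q[S, A] - td_target)
    # CQL(H) gradient: softmax over all actions up, dataset actions down.
    soft = np.exp(Q[S] - Q[S].max(axis=1, keepdims=True))
    soft /= soft.sum(axis=1, keepdims=True)
    np.add.at(grad, (S[:, None], np.arange(N_A)[None, :]), ALPHA_CQL * soft)
    np.add.at(grad, (S, A), -ALPHA_CQL)
    Q -= LR * grad / len(S)

print("conservative Q-values:\n", np.round(Q, 2))
```

The two `np.add.at` lines for the penalty are exactly the gradient of alpha * (logsumexp_a Q(s, a) - Q(s, a_data)); the result is a Q-function that underestimates out-of-distribution actions instead of exploiting them.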
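Chapter 19's simplest baseline, behavioral cloning, reduces imitation to supervised learning of expert actions. A minimal sketch, assuming a synthetic "expert" whose action depends on the sign of the first state feature:

```python
# Behavioral cloning as softmax regression: fit a classifier that maps
# expert-visited states to expert actions. The synthetic expert is assumed.
import numpy as np

rng = np.random.default_rng(0)
N, D, N_A, LR = 1000, 4, 2, 0.5

X = rng.normal(size=(N, D))                 # states visited by the expert
y = (X[:, 0] > 0).astype(int)               # the expert's actions

W = np.zeros((D, N_A))
for _ in range(300):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(N), y] -= 1.0               # cross-entropy gradient
    W -= LR * X.T @ p / N

acc = ((X @ W).argmax(axis=1) == y).mean()
print(f"behavioral-cloning accuracy on expert data: {acc:.2f}")
```

BC's known weakness motivates the rest of the chapter: small action errors drift the agent off the expert's state distribution, where the cloned policy has never seen training data.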
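GAIL replaces a hand-designed reward with a discriminator trained to tell expert state-action pairs from the policy's; the discriminator's output then serves as a surrogate reward for the policy update. The sketch below shows only the discriminator step and one common reward convention (log D); the Gaussian stand-in data and the omitted policy-gradient step are assumptions for illustration.

```python
# Toy GAIL component: a logistic discriminator separates expert (s, a)
# features (label 1) from the current policy's (label 0), and its output
# becomes a surrogate reward. Data, sizes, and learning rate are assumed.
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, LR = 4, 0.3

expert = rng.normal(loc=+1.0, size=(500, D_FEAT))   # expert (s, a) features
policy = rng.normal(loc=-1.0, size=(500, D_FEAT))   # current policy's features

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w = np.zeros(D_FEAT)
for _ in range(200):
    # Gradient of the logistic loss: expert toward D=1, policy toward D=0.
    g = expert.T @ (sigmoid(expert @ w) - 1.0) + policy.T @ sigmoid(policy @ w)
    w -= LR * g / 1000.0

# Surrogate reward: high where the discriminator mistakes policy for expert.
reward = np.log(sigmoid(policy @ w) + 1e-8)
print("mean surrogate reward for policy samples:", reward.mean().round(3))
```

In the full algorithm this discriminator update alternates with a policy-gradient step (e.g., TRPO or PPO) that maximizes the surrogate reward, driving the policy's state-action distribution toward the expert's.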
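For Chapter 20, PLR's sampling rule can be sketched in a few lines: replay already-seen training levels with probability proportional to a rank-based transform of each level's last measured learning potential, and otherwise try an unseen level. The "hidden potential" stand-in for a TD-error score, plus the temperature and replay probability, are assumptions.

```python
# Sketch of Prioritized Level Replay (PLR) sampling: seen levels are
# replayed with rank-based priority over their last measured scores.
import numpy as np

rng = np.random.default_rng(0)
N_LEVELS, BETA, P_REPLAY = 10, 0.5, 0.7

true_potential = rng.random(N_LEVELS)      # hidden per-level learning potential
scores = np.zeros(N_LEVELS)                # last measured score per level
seen = np.zeros(N_LEVELS, dtype=bool)
visits = np.zeros(N_LEVELS, dtype=int)

def sample_level():
    # Occasionally (or if nothing has a score yet) try an unseen level.
    if not seen.all() and (not seen.any() or rng.random() > P_REPLAY):
        return rng.choice(np.flatnonzero(~seen))
    ranks = np.empty(N_LEVELS)
    ranks[np.argsort(-scores)] = np.arange(1, N_LEVELS + 1)
    w = (1.0 / ranks) ** (1.0 / BETA)      # rank-based prioritization
    w[~seen] = 0.0                         # replay only levels we have scores for
    return rng.choice(N_LEVELS, p=w / w.sum())

for _ in range(500):
    lvl = sample_level()
    seen[lvl] = True
    visits[lvl] += 1
    scores[lvl] = true_potential[lvl]      # stand-in for a measured TD-error score

print("levels by hidden potential:", np.argsort(-true_potential))
print("replay visit counts:      ", visits)
```

Running this shows visits concentrating on high-potential levels, which is PLR's mechanism for improving generalization: training time flows to the levels the agent can still learn from.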
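Finally, Chapter 21's options framework bundles an initiation set, an intra-option policy, and a termination condition into a single temporally extended action executed call-and-return style. A minimal sketch on an assumed ten-state corridor with two hand-coded options:

```python
# Call-and-return execution of options: an option runs its own policy over
# primitive actions until its termination condition fires. The corridor
# environment and both options are assumptions for illustration.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    can_start: Callable[[int], bool]    # initiation set I
    policy: Callable[[int], int]        # intra-option policy pi(s) -> primitive action
    terminates: Callable[[int], bool]   # termination condition beta

GOAL = 9
go_right = Option(lambda s: s < GOAL, lambda s: +1, lambda s: s == GOAL)
go_left = Option(lambda s: s > 0, lambda s: -1, lambda s: s == 0)

def run_option(s, opt):
    assert opt.can_start(s)
    trajectory = [s]
    while not opt.terminates(s):
        s += opt.policy(s)              # primitive steps until beta fires
        trajectory.append(s)
    return s, trajectory

s, traj = run_option(3, go_right)       # one temporally extended "macro-action"
print("option trajectory:", traj)
```

A higher-level policy then chooses among options rather than primitive actions, which is the temporal abstraction that the feudal-network and MAXQ approaches in this chapter pursue with learned, rather than hand-coded, components.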