The Grand AI Handbook

RL with Human Interaction

An outline of RLHF, safe RL, interactive RL, and explainable RL methods for building agents that align with human needs.

Chapter 27: RL with Human Feedback (RLHF)
- Reward modeling, preference-based RL, RLHF in LLMs (a reward-model sketch follows this outline)
- References

Chapter 28: Safe RL
- Problem Definition and Research Motivation
- Research Directions: safety constraints, risk mitigation
- Primal-Dual Methods: Lagrangian optimization, constrained MDPs (CMDPs) (see the primal-dual sketch below)
- Primal Methods: reward shaping, safety critics
- Model-Free Safe RL: conservative Q-learning, safe PPO
- Model-Based Safe RL: safe planning, uncertainty-aware models
- Future Study: scalable safety, human-robot interaction
- References

Chapter 29: Interactive RL
- Human-in-the-loop RL, TAMER, reward shaping (see the TAMER-style sketch below)

Chapter 30: Explainable RL
- Policy interpretability, value decomposition, causal RL (see the reward-decomposition sketch below)
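The reward-modeling step in Chapter 27 is easy to ground in code: fit a scalar reward model so that human-preferred responses score higher than rejected ones, via the Bradley-Terry loss standard in RLHF. Below is a minimal PyTorch sketch; the `RewardModel` architecture, embedding dimension, and random tensors standing in for encoded responses are illustrative assumptions, not any particular library's API.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Illustrative reward model: maps an encoded response to a scalar reward."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(embed_dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

def preference_loss(rm: RewardModel, chosen: torch.Tensor, rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: maximize log sigmoid of r(chosen) - r(rejected)."""
    return -torch.nn.functional.logsigmoid(rm(chosen) - rm(rejected)).mean()

# Toy usage: random embeddings stand in for encoded (prompt, response) pairs.
rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-4)
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)
opt.zero_grad()
preference_loss(rm, chosen, rejected).backward()
opt.step()
```

The trained reward model then supplies the reward signal for a downstream policy-optimization stage (e.g., PPO) in the usual RLHF pipeline.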
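For Chapter 28's primal-dual methods, the CMDP objective (maximize return J_r(pi) subject to cost J_c(pi) <= d) is relaxed to a Lagrangian: minimize over lambda >= 0 the maximum over pi of J_r(pi) - lambda * (J_c(pi) - d). The sketch below shows the two alternating updates, assuming reward/cost advantages and a mean episode cost are estimated elsewhere (e.g., by a PPO-style pipeline); the variable names and the cost budget are illustrative.

```python
import torch

# Dual variable: parameterize lambda >= 0 as exp(log_lambda).
log_lambda = torch.zeros(1, requires_grad=True)
lambda_opt = torch.optim.Adam([log_lambda], lr=5e-3)
cost_limit = 25.0  # illustrative CMDP budget d

def lagrangian_policy_loss(reward_adv, cost_adv, logp):
    """Primal step: policy gradient on the Lagrangian advantage A_r - lambda * A_c.
    Dividing by (1 + lambda) is a common normalization; lambda is detached
    so the primal step does not move the dual variable."""
    lam = log_lambda.exp().detach()
    return -(logp * (reward_adv - lam * cost_adv)).mean() / (1.0 + lam)

def update_lambda(mean_episode_cost: float):
    """Dual step: gradient ascent on lambda * (J_c - d), so lambda grows while
    the measured cost exceeds the budget and shrinks once the policy is safe."""
    lambda_opt.zero_grad()
    (-(log_lambda.exp() * (mean_episode_cost - cost_limit))).backward()
    lambda_opt.step()
```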
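Chapter 29's TAMER framework learns a model of the human trainer's reinforcement signal rather than an environmental reward, then acts greedily with respect to that model. Here is a tabular toy version, assuming scalar feedback such as +1/-1 arrives after each action; the incremental-average update and the epsilon exploration parameter are simplifications for illustration.

```python
import random
from collections import defaultdict

# TAMER-style sketch: H(s, a) estimates the human feedback for each state-action pair.
H = defaultdict(float)
counts = defaultdict(int)

def select_action(state, actions, eps: float = 0.1):
    """Mostly exploit the current human-feedback model; explore occasionally."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: H[(state, a)])

def update(state, action, human_signal: float):
    """Incremental average of the scalar human signal (e.g., +1 good, -1 bad)."""
    key = (state, action)
    counts[key] += 1
    H[key] += (human_signal - H[key]) / counts[key]
```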
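One concrete route to Chapter 30's value decomposition is reward-decomposition Q-learning: keep a separate Q estimate per reward component so the summed Q drives behavior while the per-component values explain each action choice. A tabular sketch follows, with component names, state/action counts, and learning rates as assumed placeholders.

```python
import numpy as np

# One Q-table per reward component, so Q(s, a) = sum over components of Q_c(s, a).
components = ["progress", "safety", "energy"]  # illustrative decomposition
n_states, n_actions, alpha, gamma = 10, 4, 0.1, 0.99
Q = {c: np.zeros((n_states, n_actions)) for c in components}

def q_total(s: int) -> np.ndarray:
    return sum(Q[c][s] for c in components)

def update(s: int, a: int, rewards: dict, s_next: int):
    """One decomposed Q-learning step; `rewards` maps component -> scalar.
    Each component bootstraps on the action that is greedy for the summed Q."""
    a_next = int(np.argmax(q_total(s_next)))
    for c in components:
        td = rewards[c] + gamma * Q[c][s_next, a_next] - Q[c][s, a]
        Q[c][s, a] += alpha * td

def explain(s: int, a: int) -> dict:
    """Per-component contribution of action a in state s, usable as an explanation."""
    return {c: float(Q[c][s, a]) for c in components}
```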