The Grand AI Handbook

Core Concepts of Reinforcement Learning

An exploration of MDPs, dynamic programming, and Q-learning, igniting the spark for agents that learn by interacting with their world.

Chapter 4: Markov Decision Processes (MDPs) (States, actions, rewards, transition probabilities, Bellman equations)
Chapter 5: Dynamic Programming for RL (Policy iteration, value iteration, asynchronous DP)
Chapter 6: Monte Carlo Methods (First-visit MC, every-visit MC, importance sampling)
Chapter 7: Temporal Difference Learning (TD(0), SARSA, Q-learning, eligibility traces)