Algorithms for Reinforcement Learning

Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, and note a large number of state-of-the-art algorithms, followed by a discussion of their theoretical properties and limitations.

Conditions of Use

This book is licensed under a Creative Commons License (CC BY-NC-SA). You can download the ebook Algorithms for Reinforcement Learning for free.

Title: Algorithms for Reinforcement Learning
Publisher: Morgan and Claypool Publishers
Author(s): Csaba Szepesvari
Published: 2010-06-25
Edition: 1
Format: eBook (pdf, epub, mobi)
Pages: 104
Language: English
ISBN-10: 1608454924
ISBN-13: 9781608454921
License: CC BY-NC-SA

Preface
Acknowledgments
Markov Decision Processes
    Preliminaries
    Markov Decision Processes
    Value functions
    Dynamic programming algorithms for solving MDPs
Value Prediction Problems
    Temporal difference learning in finite state spaces
        Tabular TD(0)
        Every-visit Monte-Carlo
        TD(λ): Unifying Monte-Carlo and TD(0)
    Algorithms for large state spaces
        TD(λ) with function approximation
        Gradient temporal difference learning
        Least-squares methods
        The choice of the function space
Control
    A catalog of learning problems
    Closed-loop interactive learning
        Online learning in bandits
        Active learning in bandits
        Active learning in Markov Decision Processes
        Online learning in Markov Decision Processes
    Direct methods
        Q-learning in finite MDPs
        Q-learning with function approximation
    Actor-critic methods
        Implementing a critic
        Implementing an actor
For Further Exploration
    Further reading
    Applications
    Software
Appendix: The Theory of Discounted Markovian Decision Processes
    A.1 Contractions and Banach’s fixed-point theorem
    A.2 Application to MDPs
Bibliography
Author's Biography