Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research or control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, and note a large number of state-of-the-art algorithms, followed by a discussion of their theoretical properties and limitations.
Conditions of Use
This book is licensed under a Creative Commons License (CC BY-NC-SA). You can download the ebook Algorithms for Reinforcement Learning for free.
- Title: Algorithms for Reinforcement Learning
- Publisher: Morgan and Claypool Publishers
- Author(s): Csaba Szepesvari
- Published: 2010-06-25
- Edition: 1
- Format: eBook (pdf, epub, mobi)
- Pages: 104
- Language: English
- ISBN-10: 1608454924
- ISBN-13: 9781608454921
- License: CC BY-NC-SA
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Table of Contents
- Preface
- Acknowledgments
- Markov Decision Processes
  - Preliminaries
  - Markov Decision Processes
  - Value functions
  - Dynamic programming algorithms for solving MDPs
- Value Prediction Problems
  - Temporal difference learning in finite state spaces
    - Tabular TD(0)
    - Every-visit Monte-Carlo
    - TD(lambda): Unifying Monte-Carlo and TD(0)
  - Algorithms for large state spaces
    - TD(lambda) with function approximation
    - Gradient temporal difference learning
    - Least-squares methods
    - The choice of the function space
- Control
  - A catalog of learning problems
  - Closed-loop interactive learning
    - Online learning in bandits
    - Active learning in bandits
    - Active learning in Markov Decision Processes
    - Online learning in Markov Decision Processes
  - Direct methods
    - Q-learning in finite MDPs
    - Q-learning with function approximation
  - Actor-critic methods
    - Implementing a critic
    - Implementing an actor
- For Further Exploration
  - Further reading
  - Applications
  - Software
- Appendix: The Theory of Discounted Markovian Decision Processes
  - A.1 Contractions and Banach’s fixed-point theorem
  - A.2 Application to MDPs
- Bibliography
- Author's Biography