The multi-armed bandit problem can be viewed as a simplified version of the reinforcement learning problem where the number of ________ is just one.

  • Episodes
  • States
  • Actions
  • Rewards
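The correct answer is States. The multi-armed bandit problem simplifies reinforcement learning to a single state. The agent still chooses among several actions (which arm of the bandit to pull), but pulling an arm never changes the situation it faces. What remains is the trade-off between exploring arms and exploiting the best arm found so far.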
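To make the single-state point concrete, here is a minimal sketch of an epsilon-greedy agent on a hypothetical three-armed Bernoulli bandit. The arm probabilities, `epsilon`, and the function name `epsilon_greedy_bandit` are illustrative assumptions. Epsilon-greedy is just one common bandit strategy, not one the question specifies. The value table is indexed only by action, with no state dimension. Full reinforcement learning instead keeps a table Q(s, a).

```python
import random


def epsilon_greedy_bandit(arm_probs, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli bandit (hypothetical example).

    arm_probs: assumed success probability of each arm.
    Returns (value estimate per arm, number of pulls per arm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    # One estimate per action only: the environment has a single state,
    # so there is nothing to index by state.
    q = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: q[a])  # exploit: best estimate
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (R - Q) / N
        q[arm] += (reward - q[arm]) / counts[arm]
    return q, counts


q, counts = epsilon_greedy_bandit([0.2, 0.5, 0.8], steps=5000)
print("estimates:", [round(v, 2) for v in q])
print("pulls:", counts)
```

With enough steps the agent should pull the arm with the highest assumed success rate (here the third) far more often than the others, and its estimate for that arm should approach 0.8.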