In the context of the multi-armed bandit problem, what is regret?

  • The feeling of loss and remorse
  • An optimization metric
  • A random variable
  • An arm selection policy
In the multi-armed bandit problem, regret is an optimization metric: the gap between the total reward an agent actually collects and the total reward it would have collected by choosing the best arm on every round. The lower the regret, the better the agent's arm selection policy performs.
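To make the definition concrete, here is a minimal sketch that computes the expected regret (often called pseudo-regret) of a sequence of arm choices. It assumes the true mean reward of each arm is known, which only holds in a simulation; the function name `cumulative_regret` and the example numbers are illustrative and not part of the original question.

```python
def cumulative_regret(arm_means, chosen_arms):
    """Expected regret of a sequence of arm choices.

    arm_means:   true mean reward of each arm (assumed known, as in a simulation)
    chosen_arms: index of the arm the agent pulled on each round
    """
    best_mean = max(arm_means)  # mean reward of the best arm
    # Each round adds the gap between the best arm's mean and the chosen arm's mean.
    return sum(best_mean - arm_means[arm] for arm in chosen_arms)


# Hypothetical three-armed bandit; arm 2 is the best arm.
arm_means = [0.2, 0.5, 0.8]

# The agent tries arm 0, then arm 1, then settles on arm 2.
choices = [0, 1, 2, 2, 2]

print(cumulative_regret(arm_means, choices))
```

In this example the two exploratory pulls contribute gaps of roughly 0.6 and 0.3, so the total regret is about 0.9, while the three pulls of the best arm add nothing. A policy that always pulled arm 2 would have zero regret.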