In the context of the multi-armed bandit problem, what is regret?

The feeling of loss and remorse
An optimization metric
A random variable
An arm selection policy

In the context of the multi-armed bandit problem, regret is an optimization metric that quantifies how far an agent's cumulative reward falls short of the reward it could have expected by always choosing the best arm (the arm with the highest expected reward). It measures how well the agent's arm selection policy performs: the lower the regret, the better the policy.
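As a rough sketch, assume each arm's expected reward is known (the names `arm_means`, `pulls`, and `expected_regret` are hypothetical, chosen for illustration). The expected regret after T pulls is then T times the best arm's mean minus the sum of the means of the arms actually pulled, which equals the sum of the per-pull gaps:

```python
def expected_regret(arm_means, pulls):
    """Expected (pseudo-)regret of a sequence of arm pulls.

    Assumes arm_means[i] is the true expected reward of arm i and
    pulls is the list of arm indices the policy chose, in order.
    """
    best_mean = max(arm_means)
    # Each pull contributes the gap between the best arm's mean and the
    # mean of the arm that was actually pulled.
    return sum(best_mean - arm_means[arm] for arm in pulls)


# Hypothetical three-armed bandit; arm 2 is the best arm.
means = [0.2, 0.5, 0.7]
pulls = [0, 1, 2, 2, 1, 2]
print(round(expected_regret(means, pulls), 6))  # gaps 0.5 + 0.2 + 0.2 -> 0.9
```

A policy that always pulls the best arm scores zero regret; every pull of a worse arm adds that arm's gap to the total, so a good policy keeps the number of such pulls small as T grows.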
