Q-learning is an off-policy algorithm because it learns the value of the optimal policy's actions, which may be different from the current ________'s actions.

  • Agent's
  • Environment's
  • Agent's or Environment's
  • Policy's
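The correct answer is Policy's. Q-learning is an off-policy algorithm because it learns the value of the optimal (greedy) policy's actions, the ones that maximize expected return, regardless of which actions the current behavior policy (for example, an ε-greedy exploration policy) actually takes.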
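To make the distinction concrete, here is a minimal Python sketch of tabular Q-learning on a toy five-state chain. The environment, the `step` and `q_learning` functions, and all hyperparameter values are hypothetical choices for illustration, not part of the question. The agent's behavior policy is uniformly random (ε = 1.0), yet the `max` in the update target means it still learns the values of the greedy (target) policy.

```python
import random

N_STATES = 5        # hypothetical chain: states 0..4, state 4 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Hypothetical toy environment: reward 1.0 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=1.0, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Behavior policy: epsilon-greedy (epsilon=1.0 means uniformly random).
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Target policy: greedy. Bootstrapping from max(q[next_state]) ignores
            # whichever action the behavior policy will actually take next,
            # which is what makes Q-learning off-policy.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
# Greedy action in each non-terminal state (1 = move right toward the goal).
greedy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(greedy)  # [1, 1, 1, 1]
```

Even though the agent never acts greedily during training, the learned greedy policy moves right in every non-terminal state, with Q-values approaching 0.729, 0.81, 0.9 and 1.0 for the "right" action in states 0 to 3. An on-policy method such as SARSA would instead bootstrap from the action the behavior policy actually takes next, so its estimates would reflect the exploring policy rather than the optimal one.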