In the context of reinforcement learning, what does the term "gradient" in "Policy Gradient Methods" refer to?

  • The direction of steepest ascent in the policy space
  • A mathematical term used to describe the rate of change
  • The probability distribution of actions
  • The value function
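In "Policy Gradient Methods," the "gradient" is the gradient of the expected reward with respect to the policy's parameters. It points in the direction in policy (parameter) space along which the expected reward increases fastest, that is, the direction of steepest ascent. Policy updates take small steps along this direction to maximize reward.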
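To make the idea concrete, here is a minimal sketch, assuming a toy multi-armed bandit with a softmax policy over one parameter per action. The function names, the reward values, the learning rate, and the step count are all illustrative assumptions, not part of the original question. The sketch uses the exact gradient of the expected reward rather than a sampled estimate such as REINFORCE, so it only shows what "following the gradient" means.

```python
import math


def softmax(theta):
    """Turn action preferences theta into action probabilities."""
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(theta, rewards):
    """J(theta) = sum over actions of pi(a) * r(a)."""
    probs = softmax(theta)
    return sum(p * r for p, r in zip(probs, rewards))


def policy_gradient(theta, rewards):
    """Exact gradient of J for a softmax policy.

    dJ/dtheta_k = pi(k) * (r(k) - J(theta))
    """
    probs = softmax(theta)
    j = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - j) for p, r in zip(probs, rewards)]


def gradient_ascent(theta, rewards, lr=0.5, steps=200):
    """Repeatedly step in the direction of the gradient (steepest ascent)."""
    for _ in range(steps):
        grad = policy_gradient(theta, rewards)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical two-action problem: action 0 pays 1.0, action 1 pays 0.0.
rewards = [1.0, 0.0]
theta_start = [0.0, 0.0]  # uniform policy to begin with
theta_end = gradient_ascent(theta_start, rewards)
print(expected_reward(theta_start, rewards), expected_reward(theta_end, rewards))
```

Each update moves the parameters a little way along the gradient, so the policy shifts probability toward the better-paying action and the expected reward rises. In practice the gradient is usually estimated from sampled trajectories, because the true rewards are not known in advance.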