TRPO: A DRL algorithm that uses a neural network to approximate the policy and adapts the step size of the update based on the change in the policy.
TRPO: A DRL algorithm that uses a neural network to approximate the policy and adapts the step size of the update based on the change in the policy.