Trust Region Policy Optimization (TRPO): A policy-gradient reinforcement learning algorithm that limits how far each update can move the policy. The new policy must stay within a small trust region around the current one, typically enforced as a bound on the KL divergence between the two, so that each policy improvement step is reliable rather than destabilizing.
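
The trust-region idea can be illustrated with a minimal sketch in Python. It is not a full TRPO implementation: it assumes a single-state problem with a softmax policy over discrete actions, uses the plain gradient as the step direction (TRPO itself uses a natural-gradient direction computed with conjugate gradient), and keeps only the backtracking line search that shrinks the step until the KL divergence between the old and new policy falls below a threshold. The names `trust_region_step`, `max_kl`, `shrink`, and the example advantages are all hypothetical, chosen for illustration.

```python
import math


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def surrogate(old_probs, new_probs, advantages):
    """Importance-weighted expected advantage of the new policy."""
    return sum(po * (pn / po) * a
               for po, pn, a in zip(old_probs, new_probs, advantages))


def trust_region_step(logits, advantages, max_kl=0.01,
                      step=1.0, shrink=0.5, max_backtracks=20):
    """Take one update whose KL divergence from the old policy is <= max_kl.

    Simplification (an assumption of this sketch): the search direction is
    the plain gradient of the surrogate, not the natural gradient.
    """
    old_probs = softmax(logits)
    old_obj = surrogate(old_probs, old_probs, advantages)

    # Gradient of sum_a pi(a) * A(a) with respect to the softmax logits.
    baseline = sum(p * a for p, a in zip(old_probs, advantages))
    grad = [p * (a - baseline) for p, a in zip(old_probs, advantages)]

    # Backtracking line search: shrink the step until the candidate policy
    # stays inside the trust region and improves the surrogate objective.
    for _ in range(max_backtracks):
        candidate = [l + step * g for l, g in zip(logits, grad)]
        new_probs = softmax(candidate)
        kl = kl_divergence(old_probs, new_probs)
        improved = surrogate(old_probs, new_probs, advantages) > old_obj
        if kl <= max_kl and improved:
            return candidate, kl
        step *= shrink

    # No acceptable step found: keep the old policy unchanged.
    return list(logits), 0.0


if __name__ == "__main__":
    # Three actions with made-up advantage estimates.
    new_logits, kl = trust_region_step([0.0, 0.0, 0.0], [1.0, 0.0, -1.0])
    print("new probabilities:", softmax(new_logits))
    print("KL(old || new):", kl)
```

In this sketch, a smaller `max_kl` gives a more conservative update, while a larger one allows bigger policy changes at the cost of the reliability the trust region is meant to provide.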