Multi-Armed Bandit: A reinforcement learning problem in which an agent repeatedly chooses among multiple options, or "arms", each with its own probability of giving a reward, which the agent does not know in advance. The agent must balance exploration (trying arms to learn how well they pay) against exploitation (choosing the arm that currently looks best).
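
The trade-off between exploration and exploitation can be illustrated with an epsilon-greedy strategy, one simple approach among several. The sketch below is a minimal illustration, not a definitive implementation: it assumes rewards are 0 or 1 (Bernoulli arms), and the function name `epsilon_greedy_bandit`, the parameter `epsilon`, and the example probabilities are hypothetical choices made for this example.

```python
import random


def epsilon_greedy_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on Bernoulli arms (illustrative assumption).

    reward_probs: true reward probability of each arm, hidden from the agent.
    Returns (pull counts per arm, estimated value per arm, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # how many times each arm has been pulled
    values = [0.0] * n_arms  # running average reward observed for each arm
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm with the highest estimate so far.
            arm = max(range(n_arms), key=lambda a: values[a])

        # The environment pays 1 with the arm's hidden probability, else 0.
        reward = 1 if rng.random() < reward_probs[arm] else 0

        # Incremental update of the running average for the chosen arm.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward

    return counts, values, total_reward


if __name__ == "__main__":
    counts, values, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print("pulls per arm:", counts)
    print("estimated values:", [round(v, 3) for v in values])
    print("total reward:", total)
```

With a small `epsilon` the agent spends most steps on the arm it currently believes is best while still sampling the others occasionally, so its estimates can recover from an unlucky start. A larger `epsilon` learns the arm values faster but gives up more reward to random pulls.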