Open Access

Thompson Sampling for Multi-armed Bandit Problems: From Theory to Applications

DOI: 10.23977/csic2022.017

Author(s)

Yichen Wang

Corresponding Author

Yichen Wang

ABSTRACT

The multi-armed bandit is a simple but powerful framework for reinforcement learning algorithms that make decisions over time under uncertainty. Its central challenge is to maximize cumulative payoff by balancing exploration and exploitation, a problem on which a great deal of work has accumulated over the years. This paper introduces and models the multi-armed bandit problem and presents three of the most common bandit algorithms: ε-greedy, upper confidence bound (UCB), and Thompson sampling (TS). Among these, TS stands out for solving a wide range of problems in a computationally efficient way and is now widely used; this paper introduces three of its practical applications in different fields. Finally, the paper summarizes the types of practical problems to which each algorithm is best suited, discusses the shortcomings of each, and suggests promising directions for future research in related fields. It concludes that TS converges quickly and efficiently to the optimum in decision-making problems with no prior information, while its disadvantages remain open to improvement in several respects.
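For orientation, the Thompson sampling idea summarized in the abstract can be sketched for the simplest Bernoulli bandit setting: each arm keeps a Beta posterior over its success rate, and at each round the arm with the highest posterior sample is pulled. This is a minimal illustration, not code from the paper; the arm probabilities, horizon, and function name are made-up assumptions.

```python
import random

# Minimal Thompson sampling sketch for a Bernoulli multi-armed bandit.
# Each arm a has an unknown success probability; we maintain a Beta
# posterior per arm, starting from a uniform Beta(1, 1) prior.
def thompson_sampling(true_probs, horizon, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [1] * n_arms  # alpha parameters (prior successes + 1)
    failures = [1] * n_arms   # beta parameters (prior failures + 1)
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible success rate per arm from its posterior and
        # pull the arm with the highest draw; exploration arises naturally
        # from posterior uncertainty, exploitation from posterior means.
        samples = [rng.betavariate(successes[a], failures[a])
                   for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures

# Hypothetical example: three arms with made-up success rates.
reward, s, f = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Over time the posterior for the best arm concentrates and it receives most of the pulls, which is the "quick and efficient convergence with no prior information" property the abstract attributes to TS.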

KEYWORDS

reinforcement learning, multi-armed bandit, Thompson sampling, decision making

All published work is licensed under a Creative Commons Attribution 4.0 International License.

Copyright © 2016 - 2031 Clausius Scientific Press Inc. All Rights Reserved.