Open Access

Analysis of Regret Function Trends of Algorithms for Multi-armed Bandit Problem


DOI: 10.23977/csic2022.013

Author(s)

Bingjiang Xia

Corresponding Author

Bingjiang Xia

ABSTRACT

The multi-armed bandit (MAB) problem has attracted interest in reinforcement learning and other areas because of its wide range of applications, and many algorithms for it have been proposed. This paper discusses the three most common algorithms for the MAB problem: the ε-greedy algorithm, the upper confidence bound (UCB) algorithm, and the Thompson sampling algorithm. Since the performance of these algorithms, in terms of expected returns, is critical when deciding which one to use, this paper analyzes the trends and properties of their regret functions and compares their performance by simulation. The results suggest that the Thompson sampling algorithm usually performs best among the three algorithms in the simulations performed.
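The kind of comparison the abstract describes can be sketched in a few lines: play a Bernoulli bandit with each policy and accumulate the regret, i.e. the gap between the best arm's mean reward and the mean reward of the arm actually pulled. This is an illustrative sketch, not the paper's simulation code; the arm probabilities, the horizon, and the choice ε = 0.1 are assumptions made here for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def run(policy, probs, horizon=2000):
    """Play a Bernoulli bandit with one policy; return cumulative regret."""
    k = len(probs)
    counts = np.zeros(k)    # pulls per arm
    rewards = np.zeros(k)   # total reward per arm
    alpha = np.ones(k)      # Beta posterior parameters (Thompson sampling)
    beta = np.ones(k)
    best = max(probs)
    regret = 0.0
    for t in range(1, horizon + 1):
        if policy == "epsilon":
            # ε-greedy with ε = 0.1 (an illustrative choice)
            if rng.random() < 0.1 or counts.min() == 0:
                arm = int(rng.integers(k))
            else:
                arm = int(np.argmax(rewards / counts))
        elif policy == "ucb":
            # UCB1: empirical mean plus a confidence-width bonus
            if counts.min() == 0:
                arm = int(np.argmin(counts))  # try each arm once first
            else:
                bonus = np.sqrt(2.0 * np.log(t) / counts)
                arm = int(np.argmax(rewards / counts + bonus))
        else:
            # Thompson sampling: pick the arm with the largest posterior draw
            arm = int(np.argmax(rng.beta(alpha, beta)))
        r = float(rng.random() < probs[arm])  # Bernoulli reward
        counts[arm] += 1
        rewards[arm] += r
        alpha[arm] += r
        beta[arm] += 1.0 - r
        regret += best - probs[arm]
    return regret

probs = [0.1, 0.5, 0.7]
for name in ("epsilon", "ucb", "thompson"):
    print(name, round(run(name, probs), 1))
```

Averaging such runs over many random seeds and plotting regret against the time step gives the regret-trend curves that the paper compares; ε-greedy's regret grows linearly because of its constant exploration rate, while UCB and Thompson sampling typically show logarithmic growth.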

KEYWORDS

multi-armed bandit problem, Thompson sampling, regret function

All published work is licensed under a Creative Commons Attribution 4.0 International License.
