Abstract:
Poker is an imperfect information game, i.e. the player possesses some information about
the state of the game, but not all of it (each player can see his/her own cards and the public cards,
but not the cards of the opponent). In contrast, chess is a perfect information game, as the
full state of the game is visible to all players (everyone sees all of the pieces on the
board). This makes poker a substantially different game and makes it impossible to play
poker automatically using conventional machine learning algorithms. The University of Alberta is
leading the research on computer programs that play poker better than any human being [1].
As occasional poker players interested in game theory, we decided to analyze
existing algorithms, define the problems, and try to build an AI-based strategy that plays
the most popular variation of poker, two-player limit Texas Hold’em, without
human interaction.

One way of building a poker-playing strategy is to build a program that plays at a
Nash equilibrium. One way of reaching the equilibrium is to use the algorithm called
“Counterfactual Regret Minimization” (CFR). CFR is a self-play algorithm: it learns to play a
game by repeatedly playing against itself. There are many variants of this algorithm, such
as vanilla CFR, Chance Sampling (CS) CFR, Outcome Sampling (OS) CFR, and Public Chance
Sampling (PCS) CFR. Our goal was to implement Pure CFR.

We implement the CFR algorithm to solve abstractions of limit Texas Hold’em with
10^12 states. The number of game states is the number of possible sequences of actions by the
players. In the poker setting, this includes all of the ways that the players’ private and public
cards can be dealt and all of the possible betting sequences [9]. This number allows us to
compare a game against other games such as chess or backgammon, which have 10^47 and 10^20
distinct game states respectively.

We train the CFR algorithm by playing millions of rounds against itself and obtain an approximate
Nash equilibrium of the game (the probability with which to take each action in the game tree).
The software is implemented in C++.

Most universities that do research in this area have their own implementations of
different bots, but the source code is not available. This creates a barrier for new students
trying to join the research. For that reason, we decided to implement the "basic" poker-playing
algorithm as well as some AI algorithms, which can serve as a basis for future research. After a
few more optimizations, when we believe our software is mature enough, we will make it
open source, allowing other people to begin their research without starting from scratch.