dc.description.abstract |
Poker is an imperfect information game, i.e. each player possesses some information about the
state of the game, but not all of it (each player can see his/her own cards and the public cards, but
not the cards of the opponents). Conversely, chess is a perfect information game, as the full
state of the game is available to all players (everyone sees all of the pieces on the
board). This makes poker a substantially different game, making it impossible to play
poker automatically using conventional Machine Learning algorithms. The University of Alberta is leading the
research in the field of creating computer programs that play poker better than any human being.
As occasional poker players interested in game theory, we decided to analyze the already
existing algorithms, define the problems, and try to build an AI-based strategy that plays the most
popular variation of the poker game, 2-player (heads-up) limit Texas Hold’em Poker.
One way of building a poker playing strategy is to build a program that plays a Nash
Equilibrium strategy. To approach the equilibrium in games like poker, we can use the well-known
algorithm called “Counterfactual Regret Minimization” (CFR). CFR is a self-play algorithm: it
learns to play a game by repeatedly playing against itself. There are many modifications of this
algorithm, and the one we chose to implement is called Pure CFR; other variations include CFR with
Monte Carlo sampling and with public card sampling.
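For illustration, the following minimal sketch (not the actual thesis code; all names are illustrative) shows the regret-matching step that CFR performs at every information set: accumulated positive regrets are normalized into the current strategy, and uniform play is used when no action has positive regret.

#include <vector>

// Regret matching: turn accumulated counterfactual regrets into a strategy.
// Hypothetical helper; the actual thesis implementation may differ.
std::vector<double> regret_matching(const std::vector<double>& regret_sum) {
    std::vector<double> strategy(regret_sum.size());
    double positive_total = 0.0;
    for (double r : regret_sum)
        positive_total += (r > 0.0 ? r : 0.0);
    for (size_t a = 0; a < regret_sum.size(); ++a) {
        if (positive_total > 0.0)
            strategy[a] = (regret_sum[a] > 0.0 ? regret_sum[a] : 0.0) / positive_total;
        else
            strategy[a] = 1.0 / regret_sum.size();  // no positive regret: play uniformly
    }
    return strategy;
}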
CFR also needs “supporting infrastructure”, which consists of two modules:
1. Card isomorphism module: Given the hole cards (private cards dealt to the player)
and the public cards on the board, there are many possible card combinations. Yet some card
combinations are strategically identical, e.g. if I hold a king and a 9 of different suits, it does not
matter which two suits they are. Counting only the strategically distinct card combinations
(a small suit-canonicalization sketch for the preflop case follows this list), we get the following numbers:
● Preflop: 169
● Flop: 1755
● Turn: 16432
● River: 42783
2. Card bucketing module: As we can see, the number of card combinations for the river is still
quite large, so bucketing is required to reduce the space. We can safely
assume that we will take the same action with card combinations that are “similar”
(e.g. if we fold with a 2 and a 3, we will most probably fold with a 2 and a 4 too). This
way we can split the space of 42783 possible card combinations, along with the hole card
combinations, into N buckets.
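To illustrate the isomorphism idea for the preflop case, the sketch below (illustrative only, with a hypothetical card encoding; not the thesis implementation) maps a two-card starting hand to one of the 169 strategically distinct classes by keeping only the two ranks and a suited/offsuit flag.

#include <algorithm>

// A card is encoded as rank (0 = deuce .. 12 = ace) and suit (0..3).
struct Card { int rank; int suit; };

// Map a preflop hand to one of 169 canonical indices:
// 13 pocket pairs + 78 suited + 78 offsuit rank pairs = 169 classes.
int preflop_class(Card a, Card b) {
    int hi = std::max(a.rank, b.rank);
    int lo = std::min(a.rank, b.rank);
    if (hi == lo)                                   // pocket pair
        return hi;                                  // indices 0..12
    bool suited = (a.suit == b.suit);
    int pair_index = hi * (hi - 1) / 2 + lo;        // unordered rank pair, 0..77
    return 13 + pair_index + (suited ? 0 : 78);     // indices 13..168
}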
Here we face a few challenges:
● What does “similar” mean? In machine learning terms, what are the features for the
unsupervised learning algorithm? Several values are used for this clustering,
including “Effective Hand Strength” (EHS), “EHS squared” and some
types of “history bucketing”. There may be newer ideas, so card bucketing must be
researched and benchmarked.
● What is the optimal number of buckets for each game round? Are 1000
buckets for the river all right, or would it be better to use 5000, or 400? The more buckets we
have, the better the algorithm will perform, but the longer it will take to train the
model and the more memory will be required. This is a tradeoff, and we must find the
best point.
● Which bucketing algorithm performs best? Algorithms such as k-means, hierarchical
clustering and others must be implemented, benchmarked and compared (a minimal sketch of
EHS-based k-means bucketing follows these bullet points).
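As one concrete, purely illustrative possibility for the bucketing step, the sketch below assigns hands to k buckets by running a simple one-dimensional k-means (Lloyd’s algorithm) over precomputed EHS values; the actual thesis pipeline may use richer features such as EHS squared or history-based distributions.

#include <cmath>
#include <vector>

// One-dimensional k-means over per-hand EHS values in [0, 1].
// Illustrative sketch only; names and features are assumptions.
std::vector<int> kmeans_buckets(const std::vector<double>& ehs, int k, int iters = 50) {
    std::vector<double> centers(k);
    for (int c = 0; c < k; ++c)
        centers[c] = (c + 0.5) / k;                 // spread initial centers over [0, 1]
    std::vector<int> assignment(ehs.size(), 0);
    for (int it = 0; it < iters; ++it) {
        // Assignment step: each hand goes to the nearest bucket center.
        for (size_t i = 0; i < ehs.size(); ++i) {
            double best = std::abs(ehs[i] - centers[0]);
            int best_c = 0;
            for (int c = 1; c < k; ++c) {
                double d = std::abs(ehs[i] - centers[c]);
                if (d < best) { best = d; best_c = c; }
            }
            assignment[i] = best_c;
        }
        // Update step: recompute each center as the mean EHS of its hands.
        std::vector<double> sum(k, 0.0);
        std::vector<int> count(k, 0);
        for (size_t i = 0; i < ehs.size(); ++i) {
            sum[assignment[i]] += ehs[i];
            ++count[assignment[i]];
        }
        for (int c = 0; c < k; ++c)
            if (count[c] > 0) centers[c] = sum[c] / count[c];
    }
    return assignment;
}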
Many universities do research in this area and have implemented many bots, but
the source code is not available. This creates a barrier for new people trying to join the research.
For that reason we decided to implement the supporting infrastructure needed to run the basic
algorithms, which can serve as a basis for future research, and to make it open source.
The software is implemented in C++. This thesis discusses the card isomorphism algorithm, its implementation, and
card bucketing performed with the k-means clustering algorithm, which significantly reduced the time
needed to reach a Nash Equilibrium using the CFR algorithm. |
en_US |