Abstract:
Poker is an imperfect information game: each player possesses some information about the
state of the game, but not all of it (each player can see his or her own cards and the public cards,
but not the cards of the opponents). Conversely, chess is a perfect information game, as the
knowledge about the other player is available to everyone (all of the pieces on the board are
visible to both players). This makes poker a substantially different game, and makes it impossible
to play poker automatically using conventional machine learning algorithms. The University of
Alberta is leading the research in the field of creating computer programs that play poker better
than any human being. As occasional poker players interested in game theory, we decided to
analyze existing algorithms, define the open problems, and try to build an AI-based strategy that
plays the most popular variation of poker: two-player (heads-up) limit Texas Hold'em.
One way of building a poker-playing strategy is to build a program that plays at a Nash
equilibrium. To approach the equilibrium in games like poker, we can use the well-known
algorithm called "Counterfactual Regret Minimization" (CFR). CFR is a self-play algorithm: it
learns to play a game by repeatedly playing against itself. There are many modifications of this
algorithm; the one we tried to implement is called Pure CFR. Other variants include CFR with
Monte Carlo sampling and public-card sampling.
CFR also needs supporting infrastructure, which consists of two modules:
1. Card isomorphism module: based on the hole cards (the private cards dealt to the player)
and the public cards revealed so far, there are many possible card combinations. Yet some
combinations are strategically identical: if I hold a king and a 9 of different suits, it does
not really matter which two suits they are. Counting only the strategically distinct card
combinations gives the following numbers:
● Preflop: 169
● Flop: 1,755
● Turn: 16,432
● River: 42,783
2. Card bucketing module: as we can see, the number of card combinations for the river is
still quite big, so bucketing is required to reduce the space. We can safely assume that for
card combinations which are "similar", we will take the same action (e.g., if we fold
holding a 2 and a 3, we will most probably fold holding a 2 and a 4 as well). This way we
can split the space of 42,783 possible public-card combinations, together with the
hole-card combinations, into N buckets.
This approach raises a few challenges.
● What does "similar" mean? In machine learning terms, what are the features for the
unsupervised learning algorithm? Several values are used for this clustering, including
"Effective Hand Strength" (EHS), "EHS squared", and some forms of "history
bucketing". There may be newer ideas, so card bucketing must be researched and
benchmarked.
● What is the optimal number of buckets for each game round? Are 1,000 buckets for the
river all right, or would 5,000, or 400, be better? The more buckets we have, the better
the algorithm will perform, but the more time training will take and the more memory
will be required. This is a tradeoff, and we must find the best point.
● Which bucketing algorithm performs best? Algorithms such as k-means, hierarchical
clustering, and others must be implemented and benchmarked against each other.
Many universities do research in this area. They have implemented many bots, but the source
code is not available, which creates a barrier for new people trying to join the research. For that
reason, we decided to implement the supporting infrastructure needed to run the basic
algorithms, which can serve as a basis for any future research, and to make it open source.
The software is implemented in C++. This thesis discusses the card isomorphism algorithm and
its implementation, together with card bucketing via the k-means clustering algorithm, which
significantly reduced the time needed to reach a Nash equilibrium with the CFR algorithm.