Research and development of automated poker playing software (counterfactual regret minimization)

Shmavonyan, Tamara

URI: https://dspace.aua.am/xmlui/handle/123456789/2134

Date: 2018

Abstract:

Poker is an imperfect-information game: each player has some information about the state of the game, but not all of it (players can see their own cards and the public cards, but not the opponent's cards). In contrast, chess is a perfect-information game, because the full state of the game is available to all players (everyone sees all of the pieces on the board). This makes poker a substantially different game, making it impossible to play poker automatically with conventional machine learning algorithms. The University of Alberta leads the research on computer programs that play poker better than any human being [1]. As occasional poker players interested in game theory, we decided to analyze existing algorithms, define the open problems, and try to build an AI-based strategy that plays the most popular variant of the game, two-player limit Texas Hold'em, without human interaction. One way of building a poker-playing strategy is to build a program that plays at Nash equilibrium. One way of reaching the equilibrium is an algorithm called "Counterfactual Regret Minimization" (CFR). CFR is a self-play algorithm: it learns to play a game by repeatedly playing against itself. There are many variants of this algorithm, such as vanilla CFR, Chance Sampling (CS) CFR, Outcome Sampling (OS) CFR, and Public Chance Sampling (PCS) CFR. Our goal was to implement Pure CFR. We implement the CFR algorithm to solve abstractions of limit Texas Hold'em with 10^12 states. The number of game states is the number of possible sequences of actions by the players. In the poker setting, this includes all of the ways that the players' private cards and the public cards can be dealt and all of the possible betting sequences [9]. This number allows us to compare a game against other games such as chess and backgammon, which have 10^47 and 10^20 distinct game states respectively.
We train the CFR algorithm by having it play a million rounds against itself, which yields an approximate Nash equilibrium of the game (the probability with which to take each action in the game tree so that play is profitable). The software is implemented in C++. Most universities that do research in this area have their own implementations of different bots, but the source code is not available. This creates a barrier for new students trying to join the research. For that reason, we decided to implement the "basic" poker-playing algorithm as well as some AI algorithms, which can serve as a basis for future research. After a few more optimizations, when we believe our software is mature enough, we will make it open source, so that other people no longer have to start their research from scratch.
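
The self-play idea behind CFR can be illustrated on a much smaller game than limit Texas Hold'em. The following is a minimal sketch, not the thesis implementation: it assumes Kuhn poker (a three-card toy game with a single bet) and uses vanilla CFR, which enumerates every deal, rather than Pure CFR. All names (Node, KuhnCfr, cfr, train) are hypothetical and chosen only for this example.

```cpp
#include <array>
#include <cassert>
#include <map>
#include <string>

// Illustrative vanilla CFR for Kuhn poker (cards 0 < 1 < 2, ante 1, bet 1).
// Assumption: this toy game stands in for the far larger hold'em abstraction.
constexpr int kActions = 2;  // 0 = pass (check or fold), 1 = bet (bet or call)

struct Node {
    std::array<double, kActions> regretSum{};         // cumulative counterfactual regret
    std::array<double, kActions> strategySum{};       // reach-weighted sum of strategies
    std::array<double, kActions> strategy{0.5, 0.5};  // strategy used in this iteration

    // Regret matching: play each action in proportion to its positive regret.
    void updateStrategy() {
        double total = 0.0;
        for (int a = 0; a < kActions; ++a) {
            strategy[a] = regretSum[a] > 0.0 ? regretSum[a] : 0.0;
            total += strategy[a];
        }
        for (int a = 0; a < kActions; ++a)
            strategy[a] = total > 0.0 ? strategy[a] / total : 1.0 / kActions;
    }

    // The average strategy over all iterations approximates the equilibrium.
    std::array<double, kActions> averageStrategy() const {
        const double total = strategySum[0] + strategySum[1];
        if (total <= 0.0) return {0.5, 0.5};
        return {strategySum[0] / total, strategySum[1] / total};
    }
};

struct KuhnCfr {
    std::map<std::string, Node> nodes;  // information sets: own card + betting history

    // Returns the expected payoff for the player to act at `history`.
    // reach[i] is the probability that player i's own choices lead here.
    double cfr(const std::array<int, 2>& cards, const std::string& history,
               std::array<double, 2> reach) {
        const int player = static_cast<int>(history.size()) % 2;
        const int opponent = 1 - player;
        if (history.size() >= 2) {
            const bool higher = cards[player] > cards[opponent];
            if (history == "pp") return higher ? 1.0 : -1.0;  // check-check showdown
            if (history.back() == 'p') return 1.0;            // opponent folded to a bet
            if (history[history.size() - 2] == 'b')           // bet and call: showdown
                return higher ? 2.0 : -2.0;
        }
        Node& node = nodes[std::to_string(cards[player]) + history];
        std::array<double, kActions> util{};
        double nodeUtil = 0.0;
        for (int a = 0; a < kActions; ++a) {
            std::array<double, 2> next = reach;
            next[player] *= node.strategy[a];
            // The child value is from the opponent's point of view, so negate it.
            util[a] = -cfr(cards, history + (a == 0 ? 'p' : 'b'), next);
            nodeUtil += node.strategy[a] * util[a];
        }
        for (int a = 0; a < kActions; ++a) {
            // Regret is weighted by the opponent's reach probability; the
            // uniform chance probability (1/6) is a constant and is omitted.
            node.regretSum[a] += reach[opponent] * (util[a] - nodeUtil);
            node.strategySum[a] += reach[player] * node.strategy[a];
        }
        return nodeUtil;
    }

    // Runs self-play over all six deals per iteration and returns the first
    // player's payoff averaged over iterations.
    double train(int iterations) {
        assert(iterations > 0);
        double total = 0.0;
        for (int t = 0; t < iterations; ++t) {
            for (int c0 = 0; c0 < 3; ++c0)
                for (int c1 = 0; c1 < 3; ++c1)
                    if (c0 != c1) total += cfr({c0, c1}, "", {1.0, 1.0}) / 6.0;
            // Strategies change only between iterations (simultaneous update).
            for (auto& entry : nodes) entry.second.updateStrategy();
        }
        return total / iterations;
    }
};
```

To try it, construct a KuhnCfr object, call train with a few thousand iterations, and read averageStrategy() from the entries of nodes. The returned average payoff should approach -1/18, the known value of Kuhn poker for the first player. A hold'em solver would additionally need card and betting abstractions and a sampling variant of CFR, since enumerating every deal as done here is only feasible for very small games.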