Poker AI Libratus Wins Computing Award

18th November 2017 // Industry, Misc, News

Unbeatable poker bot Libratus, which dominated poker pros twice this year, was honored with an HPCwire Readers’ Choice Award at the Supercomputing Conference (SC17) this week in Denver, Colorado. If you have never heard of HPCwire – and you probably haven’t – feel free to check out its website if you want to feel really stupid. HPCwire bills itself as “the leading publication for news and information for the high performance computing industry,” and it certainly has loads of content that will be over most people’s heads.

Libratus, developed by Carnegie Mellon professor of computer science Tuomas Sandholm and Ph.D. student Noam Brown, runs on the $9.65 million “Bridges” supercomputer at the Pittsburgh Supercomputing Center. It won HPCwire’s Readers’ Choice Award for “Best Use of AI” for winning the Brains vs. AI poker competition.

In a press release, Tom Tabor, CEO of Tabor Communications, the publisher of HPCwire, said of the awards:

HPCwire’s readership is broadly diversified; it includes industry leaders from the private sector, innovators in academia, and end users that are bringing HPC to the enterprise. Being selected to win either a Readers’ or Editors’ Choice Award is no small feat. This success signifies support and recognition from the HPC community along with the industries it serves. It is both an honor and privilege to once again engage with our readership and allow their voices to be heard. We extend a sincere thank you to our readers for their nominations and votes and a heartfelt congratulations to this year’s winners.

In January, Libratus took on the pro poker quartet of Dong Kim, Daniel McAulay, Jimmy Chou, and Jason Les at Pittsburgh’s Rivers Casino. Over the course of 20 days, the humans played 120,000 heads-up No-Limit Hold’em hands against the computer, losing miserably.

To weed out the luck factor as much as possible, the game was altered a bit from what we are used to seeing. Chip stacks were reset after every hand so that nobody could gain a big-stack advantage. And if there was an all-in and a call before the river, no more cards were dealt; the chips were distributed according to each player’s equity in the hand so as to avoid suckouts.
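As a rough illustration of the equity rule described above, here is a minimal Python sketch. The function name, the pot size, and the 80 percent equity figure are all hypothetical; the article does not say how the match software computed equity or handled rounding.

```python
def settle_all_in(pot: int, equity_a: float) -> tuple[int, int]:
    """Split the pot by equity instead of dealing the remaining cards.

    pot:      total chips in the middle once the all-in is called
    equity_a: player A's chance of winning the hand, between 0 and 1
              (assumed to be computed elsewhere)
    """
    if not 0.0 <= equity_a <= 1.0:
        raise ValueError("equity must be between 0 and 1")
    share_a = round(pot * equity_a)
    # Give player B the remainder so the two shares always add up to the pot.
    return share_a, pot - share_a


# Hypothetical spot: a 40,000-chip pot where player A is an 80 percent favorite.
print(settle_all_in(40_000, 0.80))  # (32000, 8000)
```

Under this rule the favorite banks 32,000 chips every time rather than winning 40,000 most of the time and nothing on a suckout, so the long-run average is unchanged while the swings shrink.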

In addition, hands were mirrored. The human team was split into pairs; in each pair, the hands were reversed. For instance, in Hand #1, one human might be dealt 8-9 and Libratus might be dealt K-K. In the other match of the pair, Libratus would get the 8-9 and the other human would get the Kings. This way, neither side could be the beneficiary of every “lucky” deal.
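To see why mirroring helps, consider a deliberately simplified toy model in Python. It assumes each deal is worth a fixed number of chips to whoever holds the better cards (the "luck") and that the AI earns a constant skill edge per hand on top of that. Both assumptions, along with every name and number below, are hypothetical; in a real match the cancellation is only partial because different players handle the same cards differently.

```python
import random
from statistics import mean, pstdev


def mirrored_pair_result(luck: int, edge: int) -> int:
    """AI's net chips over one mirrored pair of hands (toy model)."""
    table_1 = luck + edge    # the AI holds the lucky side of the deal
    table_2 = -luck + edge   # the same deal with the hole cards swapped
    return table_1 + table_2


rng = random.Random(0)
deals = [rng.randint(-5_000, 5_000) for _ in range(10_000)]  # luck per deal
edge = 15                                                     # assumed skill edge

unmirrored = [luck + edge for luck in deals]
mirrored = [mirrored_pair_result(luck, edge) / 2 for luck in deals]

print("unmirrored: mean", round(mean(unmirrored), 1), "stdev", round(pstdev(unmirrored), 1))
print("mirrored:   mean", mean(mirrored), "stdev", pstdev(mirrored))
```

In this idealized model the luck term cancels completely, leaving only the skill edge. That is the intuition behind the format: what remains after mirroring says more about play than about the cards.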

Libratus ended up winning $1,766,250 of play money, or $14.72 per hand (starting stacks were 20,000 chips with 50/100 blinds). None of the human players beat the AI. Dong Kim came closest, losing $85,649.
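The per-hand figure is easy to check from the numbers above, and the same arithmetic produces the 147 milli-big-blind win rate quoted at the end of this article. A quick Python check (the variable names are just for illustration):

```python
total_won = 1_766_250   # play-money chips won by Libratus
hands = 120_000         # hands played over the 20 days
big_blind = 100         # blinds were 50/100

per_hand = total_won / hands
# One milli-big blind is a thousandth of a big blind.
mbb_per_hand = total_won * 1000 / (hands * big_blind)

print(f"{per_hand:.2f} chips per hand")         # 14.72 chips per hand
print(f"{mbb_per_hand:.0f} milli-big blinds")   # 147 milli-big blinds
```

In other words, Libratus won a little under 15 percent of a big blind on every hand dealt.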

Tuomas Sandholm calculated that, based on the massive margin of victory, there was a 97.7 percent certainty that Libratus played more skillfully than the poker pros.

Interestingly, Libratus was not programmed to make one specific play based on the situation. Instead, it had a few options for every scenario. For example, when holding certain hole cards and facing a specific bet, it might be programmed to double the bet 40 percent of the time, raise three times the bet 20 percent of the time, call 25 percent of the time, and fold 15 percent of the time. After the competition, the computer would then go through all the hands and make adjustments to its strategy as necessary.
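This kind of randomized, or "mixed," strategy can be sketched in a few lines of Python. The probabilities come straight from the example above; the action labels and function name are made up for illustration, and nothing here reflects how Libratus itself stores or samples its strategy.

```python
import random
from collections import Counter

# Probabilities from the example in the text; the labels are hypothetical.
STRATEGY = {"raise_2x": 0.40, "raise_3x": 0.20, "call": 0.25, "fold": 0.15}


def choose_action(strategy: dict[str, float], rng: random.Random) -> str:
    """Pick one action at random, weighted by the strategy's probabilities."""
    actions = list(strategy)
    weights = list(strategy.values())
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(2017)
counts = Counter(choose_action(STRATEGY, rng) for _ in range(100_000))
for action, probability in STRATEGY.items():
    print(f"{action:9s} target {probability:.2f}  observed {counts[action] / 100_000:.3f}")
```

Because the choice is random on every hand, an opponent who sees the same situation twice cannot count on seeing the same play twice, which is what makes the strategy hard to read.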

Sandholm and Brown trained Libratus by having it play billions of hands against itself and then analyze and adjust its decision-making algorithm.
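For a feel of what learning by self-play looks like, here is a toy Python sketch using regret matching on rock-paper-scissors. This is an assumption-laden stand-in: the article does not describe Libratus's actual training algorithm, and a three-action game is nowhere near the scale of No-Limit Hold’em. All names below are hypothetical. The one thing it does share with the description above is its shape: two copies of the same learner play each other over and over and adjust after every round.

```python
# Payoff to the row player; the column player receives the negative (zero-sum).
RPS = [
    [0, -1, 1],    # rock     vs rock, paper, scissors
    [1, 0, -1],    # paper
    [-1, 1, 0],    # scissors
]


def current_strategy(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def self_play(payoff, iterations=50_000):
    """Return the average strategy of both players after repeated self-play."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Start the row player with a bias toward its first action so that the
    # learning is visible; an unbiased start would sit at the answer already.
    regrets = [[1.0] + [0.0] * (n_rows - 1), [0.0] * n_cols]
    sums = [[0.0] * n_rows, [0.0] * n_cols]
    for _ in range(iterations):
        row = current_strategy(regrets[0])
        col = current_strategy(regrets[1])
        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))
        # Regret: how much better each action would have done than the mix played.
        for i in range(n_rows):
            regrets[0][i] += row_values[i] - row_ev
            sums[0][i] += row[i]
        for j in range(n_cols):
            regrets[1][j] += col_values[j] - col_ev
            sums[1][j] += col[j]
    return [[s / iterations for s in player] for player in sums]


row_avg, col_avg = self_play(RPS)
print([round(p, 2) for p in row_avg])
print([round(p, 2) for p in col_avg])
```

Both average strategies should end up close to one-third rock, one-third paper, and one-third scissors, the mix that cannot be exploited in this game. Note that the only game-specific input is the payoff table.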

Sandholm said that while Libratus trained itself to win at poker, it didn’t really learn poker, per se. “The algorithms we used are not poker specific,” he said. “They take as input the rules of the game and output strategy.”

He also said that he and Brown had Libratus focus on its own decision making and strategy rather than on trying to find weaknesses in the opponent, because that approach is safer.

“When you exploit opponents, you open yourself up to exploitation more and more.”

In April, Libratus took on human poker players again, and again it won convincingly. This time, it played against six members of “Team Dragon” in Hainan, China, one of whom was 2016 World Series of Poker bracelet winner Yue Du. This match spanned only 36,000 hands, but the stakes were real: Strategic Machine Inc., a company founded by Sandholm and Brown, took home $290,000 for the win.

Libratus won 220 “milli-big blinds” (thousandths of a big blind) per hand against the Chinese team versus 147 milli-big blinds per hand against the Pittsburgh crew.