Computers beat Multiplayer NLHE

Imperator Red Chipper Posts: 898 ✭✭✭
"A new artificial intelligence poker player called Pluribus, developed by a Facebook researchers and a Carnegie Mellon professor, has conquered six-player no-limit Texas hold ’em, outperforming a group of the world’s top human pros. The achievement is a milestone for AI gaming — the programmed conquered a complicated game with hidden information and one played with more than two players. "

Robots Are Beating Humans At Poker


Superhuman AI for multiplayer poker







Comments

  • Imperator Red Chipper Posts: 898 ✭✭✭
    Pluribus, like other superhuman game-playing AIs, learned to play poker solely by playing against itself over eight days and 12,400 CPU core hours. It begins by playing randomly. It observes what works and what doesn’t. And it alters its approach along the way, using an algorithm that aims it toward Nash’s eponymous equilibria. This process created its plan of attack for the entire game, called its “blueprint strategy,” which was computed offline before the competition for what the authors estimate would be just $144 in current cloud-computing costs. During its competitive games, Pluribus searches in real time for improvements to its coarse blueprint.
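    The self-play loop described above (start random, notice what would have worked better, shift toward it) can be sketched with regret matching, the building block of the counterfactual-regret methods behind Pluribus. Rock-paper-scissors stands in for poker here; everything in the snippet is illustrative, not taken from the paper.

```python
import random

# Sketch of self-play via regret matching. Rock-paper-scissors stands
# in for poker; all values here are illustrative, not from Pluribus.

ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # PAYOFF[my][opp] from my point of view

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS  # no regrets yet: play randomly
    return [p / total for p in positives]

def train(iterations=50000, seed=1):
    rng = random.Random(seed)
    regrets = [0.0] * ACTIONS
    strategy_sum = [0.0] * ACTIONS
    for _ in range(iterations):
        strat = strategy_from_regrets(regrets)
        for a in range(ACTIONS):
            strategy_sum[a] += strat[a]
        # self-play: the opponent is a copy using the same strategy
        my = rng.choices(range(ACTIONS), weights=strat)[0]
        opp = rng.choices(range(ACTIONS), weights=strat)[0]
        # regret: how much better each alternative would have done
        for a in range(ACTIONS):
            regrets[a] += PAYOFF[a][opp] - PAYOFF[my][opp]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

avg = train()
# the average strategy drifts toward the Nash equilibrium (1/3, 1/3, 1/3)
```

    The per-iteration strategy bounces around; it is the average strategy over all iterations that converges toward equilibrium, which is the same distinction the paper draws for the blueprint.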
  • SplitSuit RCP Coach Posts: 4,011 -
    Interesting times we live in. Thanks for the share @Imperator
  • KemahPhil Red Chipper Posts: 103 ✭✭
    "On the other hand, poker is like a pyramid scheme: It needs a wide range of skill levels to support the pros playing for big bucks at the top. As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    Is this really the future?
  • TheGameKat Posts: 2,142 -
    KemahPhil wrote: »
    "On the other hand, poker is like a pyramid scheme: It needs a wide range of skill levels to support the pros playing for big bucks at the top. As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    Is this really the future?

    Many people predicted that the switch from LHE to NLHE would eventually lead to this, but for slightly different reasons.
    Moderation In Moderation
  • Imperator Red Chipper Posts: 898 ✭✭✭
    I've been following the development of computers and games since the late 1970s. Back then I was playing chess and backgammon and the whole debate about how to develop gameplaying computers fascinated me.

    The division at the time was between those who wanted to program computers to think like humans and those who wanted to program computers for brute force calculation. Brute force calculation won out, but for reasons that skipped over the science of the project, and simply went for the win.

    What is not well understood is that Soviet scientists and chess players such as Mikhail Botvinnik did not primarily want to make computers that played great chess; they wanted to model human thinking in order to understand the human mind better.

    They made the following argument, which I will paraphrase here: "It is only of minor significance if humans develop computers that are powerful enough to beat other humans at any particular game. It is as significant as making a backhoe that can dig faster than a human with a shovel. In the world of machines, these kinds of things are bound to happen. It is of greater significance if we can use calculating machines to model human thought and thus learn more about how humans think and thus teach humans to think better."

    In the chess world, especially with recent developments such as AlphaZero and its use of neural networks, computers are teaching humans to play better. But do humans think better as a result? Perhaps.

    As a sidelight, there are plenty of people around the world who use computers to model reality in order to learn about reality. This goes on every day. In this sense, Botvinnik's project remains alive.

    What does this have to do with poker? Humans cannot act like computers. But we may learn from them and apply ourselves in that way.


    KemahPhil wrote: »
    "On the other hand, poker is like a pyramid scheme: It needs a wide range of skill levels to support the pros playing for big bucks at the top. As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    Is this really the future?

    My guess is that at the higher levels, the skill levels will flatten out. But then we will introduce more complications to the game.

    The problem is the whole economics of poker. The money filters from bottom to top. But none of the people in my home games pay any attention to any of this, including the theory. If, over the long run, they lose enough money, they quit playing and that is that.

    I would not say that the economics of poker models a pyramid scheme. It is closer to the way parimutuel betting works. In effect, and over the long run, players are betting on their own skills into a pool of money, hoping that their odds end up better than everybody else's. I have proposed this model before, though I may be peculiar in sticking to it.

    The simple fact is the following: good poker players are all looking to expand the volume of the betting pool so that the bad bets will increase the value of the mediocre and good bets.
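    The parimutuel view above can be made concrete with a toy redistribution model. All the numbers below are my own invention, not Imperator's: everyone stakes into one pool, the house rakes it, and the remainder flows back in proportion to skill.

```python
# Toy parimutuel model of a poker economy (all numbers invented):
# every player stakes into one pool, the house takes a rake, and the
# raked pool is redistributed in proportion to a hypothetical long-run
# "skill share" for each player.

def redistribute(buy_ins, skill_shares, rake=0.05):
    """Split the raked pool in proportion to each player's skill share."""
    pool = sum(buy_ins) * (1 - rake)
    return [pool * share for share in skill_shares]

buy_ins = [100, 100, 100, 100]  # four players stake 100 each
payouts = redistribute(buy_ins, skill_shares=[0.40, 0.30, 0.20, 0.10])
# pool = 400 * 0.95 = 380; payouts are roughly 152, 114, 76, 38.
# Only players whose pool share beats their stake come out ahead, and
# the rake means the pool always pays out less than it took in, which
# is why winners want more volume (more bad bets) in the pool.
```

    This also captures the "expand the pool" point: a skilled player's payout scales with total volume, while their own stake does not.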

  • Red Red Chipper Posts: 2,094 ✭✭✭✭
    Imperator wrote: »
    I've been following the development of computers and games since the late 1970s. Back then I was playing chess and backgammon and the whole debate about how to develop gameplaying computers fascinated me.

    The division at the time was between those who wanted to program computers to think like humans and those who wanted to program computers for brute force calculation. Brute force calculation won out, but for reasons that skipped over the science of the project, and simply went for the win.

    Which is also how Pluribus was made BTW
    "The core of Pluribus’s strategy was computed via self play, in which the AI plays against copies of itself, without any data of human or prior AI play used as input." ("Superhuman AI for multiplayer poker", Science, 11.07.2019)
  • boyd148 Red Chipper Posts: 92 ✭✭
    I heard that Pluribus made good use of donk betting to accomplish this victory
  • eugeniusjr Red Chipper Posts: 423 ✭✭✭
    I heard Pluribus actually lost, but that it called it a win because it got unlucky. Very very humanlike.
  • persuadeo Red Chipper, Table Captain Posts: 4,008 ✭✭✭✭✭
    edited July 17
    eugeniusjr wrote: »
    I heard Pluribus actually lost, but that it called it a win because it got unlucky. Very very humanlike.

    Actually this is something Imperator could possibly explain.

    DeepStack and Pluribus use range-vs-hand analysis to assign an EV to the computer's actions, and use this EV to correct the results in a relatively straightforward computation called AIVAT or some such.

    However, with no better explanations available and no response to queries by experts, I find this strange. If we only apply AIVAT variance reductions to the computer's actions and not the human's, a massive assumption is being made: that it is variance, not the human's strategy, that accounts for certain EV losses and gains.

    An example. In one of the DeepStack hands, we see a human hit a gutter on the end and move in for nearly 4x pot. The computer blithely pays it off with TPMK, then retracts all of the human's score in the hand as variance under AIVAT. In layman's terms, Eugene nails it on the head.

    But if the human is only moving in nutted because he knows DeepStack's fixed strategy is to overcall, this is not variance but strategy. The EV loss of the human's floats is punished in the scores per street, which is fair, but his 4x-pot reward when he is paid off by a bad call is not treated the same way.

    This argument applies to every hand I saw from DeepStack: range vs. hand is a one-sided look at variance, and the correction will be useful but, under exploitative circumstances, will trend toward aiding the computer's score.

    Now look at Pluribus's results in the graph provided by the Science team. Note that Pluribus is losing as time goes on, especially at showdown, where its blue line is absolutely plummeting. Is this variance, or are the humans adjusting? Why did that player in the DeepStack challenge move in as he did?

    Nevertheless, I'm not a computer scientist and await correction.
  • Doug Hull RCP Coach Posts: 1,771 -
    KemahPhil wrote: »
    "As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    As humans learn from bots? Look how many people play craps and roulette. The GTO play for these games has been known forever, yet they still play.
    Co-founder Red Chip Poker,
    Author Poker Plays You Can Use
    Author Poker Workbook for Math Geeks
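    Doug's point is easy to quantify: the "GTO" of roulette is simply that every bet carries the same negative expectation, and people play anyway. A quick check for a straight-up bet in American roulette:

```python
# EV of a straight-up (single-number) bet in American roulette:
# 38 slots, and a winning number pays 35-to-1 on a 1-unit stake.
p_win = 1 / 38
ev_per_unit = p_win * 35 - (1 - p_win) * 1
print(ev_per_unit)  # about -0.0526: a 5.26% house edge on every bet
```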
  • TheGameKat Posts: 2,142 -
    Something that interests me, following on from @Imperator's observations, is that the self-learning chess machines (AlphaZero, Leela) play games that simply look different from those of brute-force engines like Stockfish. I'm curious to what degree this is repeated in poker.
    Moderation In Moderation
  • TheGameKat Posts: 2,142 -
    Doug Hull wrote: »
    KemahPhil wrote: »
    "As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    As humans learn from bots? Look how many people play craps and roulette. The GTO play for these games has been known forever, yet they still play.

    Ha, yes, quite. But ssshhhh, this is why we don't have state income tax.
    Moderation In Moderation
  • kenaces Red Chipper Posts: 1,394 ✭✭✭✭
    @persuadeo makes a good point. In the 10k hands played, Pluribus lost at -7bb/100 and then used some fancy math (not published) to claim victory.

    They also claim to have beaten elite poker players. While Linus was, of course, elite, I am not sure about all the other players. There were quite a few bad plays by some of the humans (misclicks? no financial incentive? MTT has-beens?).

    But it is pretty impressive that they built such a strong bot so fast with not much computing power. AI is coming for our bankrolls!
  • The Mule Red Chipper Posts: 783 ✭✭✭
    Firstly thanks for sharing this @Imperator. I miss your contributions.

    Regarding the AIVAT variance reduction: without seeing details I am making some assumptions, but if it works as I expect, it reduces the variance of the EV estimate without biasing it. It effectively uses the available information about the AI's range to increase the sample size, providing a larger data set from which to estimate EV. The actual hand is just one data point, no more meaningful than any other hand in the range.

    In @persuadeo's example, if the human is only moving in nutted because "he knows that the AI is overcalling," then the AI's range should reflect the loss of EV. AIVAT should not be able to turn an unprofitable call into a profitable one, because the call is unprofitable on average across the AI's range. AIVAT should be a better reflection of the EV of the human's strategy, because it considers the entirety of the AI's range rather than just the actual hand.

    David.
  • persuadeo Red Chipper, Table Captain Posts: 4,008 ✭✭✭✭✭
    OK, that is reasonable, but again, the human's range is unknown and the computer's calling strategy known. In the example I use, the human again loses chips on the river according to AIVAT, in fact even more chips than he lost on the previous streets, despite it being clear that A8 is a terrible bluffcatcher and that DeepStack is presumably calling with many other hands as well. The implication is that the human, despite realizing maximum implied odds, simply can't ever win against this set of actions with this hand, even if he stacks the computer and is never stacked in the reverse scenario. That's reasonable, but a little strange to accept.

    Fair enough, but it still doesn't address the essential problem of this technique: everything passes through player 1's strategy and not player 2's. For instance, if player 2 takes a different action with A2 on the river each time, have we really learned anything? As an example, Pluribus makes one absurd call that is extremely rare according to its own internal logic and clearly negative EV. I'd like to see the AIVAT score for situations such as that.
