Computers beat Multiplayer NLHE

Imperator Red Chipper Posts: 898 ✭✭✭
"A new artificial intelligence poker player called Pluribus, developed by Facebook researchers and a Carnegie Mellon professor, has conquered six-player no-limit Texas hold ’em, outperforming a group of the world’s top human pros. The achievement is a milestone for AI gaming — the program conquered a complicated game with hidden information, and one played by more than two players."

Robots Are Beating Humans At Poker


Superhuman AI for multiplayer poker

Comments

  • Imperator Red Chipper Posts: 898 ✭✭✭
    Pluribus, like other superhuman AI game players, learned to play poker solely by playing against itself over eight days and 12,400 CPU core hours. It begins by playing randomly. It observes what works and what doesn’t. And it alters its approach along the way, using an algorithm that aims it toward Nash’s eponymous equilibria. This process created its plan of attack for the entire game, called its “blueprint strategy,” which was computed offline before the competition for what the authors estimate would be just $144 in current cloud computing costs. During its competitive games, Pluribus searches in real time for improvements to its coarse blueprint.
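    The self-play loop described in that passage can be sketched in miniature with regret matching, the building block of the counterfactual regret minimization (CFR) family that Pluribus builds on. The toy below is emphatically not Pluribus's code: it plays rock-paper-scissors against itself, and its average strategy drifts toward that game's Nash equilibrium (uniform 1/3, 1/3, 1/3).

```python
import random

random.seed(0)  # reproducible toy run

# Rock-paper-scissors payoffs from the acting player's point of view:
# rows = my action, cols = opponent action (0 = rock, 1 = paper, 2 = scissors)
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from_regrets(regrets):
    """Regret matching: mix actions in proportion to positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total == 0:
        return [1.0 / 3] * 3  # no positive regret yet: play uniformly
    return [p / total for p in pos]

def self_play(iterations=20000):
    regrets = [[0.0] * 3, [0.0] * 3]    # cumulative regrets per player
    strat_sum = [[0.0] * 3, [0.0] * 3]  # running sum -> average strategy
    for _ in range(iterations):
        strats = [strategy_from_regrets(r) for r in regrets]
        acts = [random.choices(range(3), weights=w)[0] for w in strats]
        for p in range(2):
            opp = acts[1 - p]
            got = PAYOFF[acts[p]][opp]
            for a in range(3):
                # regret: what action a would have earned minus what I earned
                regrets[p][a] += PAYOFF[a][opp] - got
                strat_sum[p][a] += strats[p][a]
    # The AVERAGE strategy, not the latest one, is what approaches equilibrium.
    return [[s / iterations for s in row] for row in strat_sum]

avg = self_play()
```

    In real poker the same idea runs over a vast game tree, with counterfactual values replacing the simple payoff lookup, which is why the blueprint took days of CPU time rather than seconds.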
  • SplitSuit RCP Coach Posts: 4,032 -
    Interesting times we live in. Thanks for the share @Imperator
  • KemahPhil Red Chipper Posts: 108 ✭✭
    "On the other hand, poker is like a pyramid scheme: It needs a wide range of skill levels to support the pros playing for big bucks at the top. As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    Is this really the future?
  • TheGameKat Posts: 2,400 -
    KemahPhil wrote: »
    "On the other hand, poker is like a pyramid scheme: It needs a wide range of skill levels to support the pros playing for big bucks at the top. As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    Is this really the future?

    Many people predicted that the switch from LHE to NLHE would eventually lead to this, but for slightly different reasons.
    Moderation In Moderation
  • Imperator Red Chipper Posts: 898 ✭✭✭
    I've been following the development of computers and games since the late 1970s. Back then I was playing chess and backgammon and the whole debate about how to develop gameplaying computers fascinated me.

    The division at the time was between those who wanted to program computers to think like humans and those who wanted to program computers for brute force calculation. Brute force calculation won out, but for reasons that skipped over the science of the project, and simply went for the win.

    What is not well understood is that Soviet scientists and chess players such as Mikhail Botvinnik did not want to make computers that played great chess (or rather, this was their secondary goal); they wanted to model human thinking in order to understand the human mind better.

    They made the following argument, which I will paraphrase here: "It is only of minor significance if humans develop computers that are powerful enough to beat other humans at any particular game. It is as significant as making a backhoe that can dig faster than a human with a shovel. In the world of machines, these kinds of things are bound to happen. It is of greater significance if we can use calculating machines to model human thought and thus learn more about how humans think and thus teach humans to think better."

    In the chess world, especially with recent developments such as AlphaZero and its use of neural networks, computers are teaching humans to play better. But do humans think better as a result? Perhaps.

    As a sidelight, there are plenty of people around the world that use computers to model reality in order to learn about reality. This goes on every day. In this sense, Botvinnik's project remains true.

    What does this have to do with poker? Humans cannot act like computers. But we may learn from them and apply ourselves in that way.


    KemahPhil wrote: »
    "On the other hand, poker is like a pyramid scheme: It needs a wide range of skill levels to support the pros playing for big bucks at the top. As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    Is this really the future?

    My guess is that at the higher levels, the skill levels will flatten out. But then we will introduce more complications to the game.

    The problem is the whole economics of poker. The money filters from bottom to top. But none of the people in my home games pay any attention to any of this, including the theory. If over the long run they lose enough money, they quit playing and that is that.

    I would not say that the economics of poker models a pyramid scheme. It is closer to the way parimutuel betting works. In effect, and over the long run, players are betting on their own skills into a pool of money, hoping that their odds end up better than the odds of everybody else. I have proposed this model before, but I think I am fairly peculiar in sticking to it.

    The simple fact is the following: good poker players are all looking to expand the volume of the betting pool so that the bad bets will increase the value of the mediocre and good bets.

  • Red Red Chipper Posts: 2,144 ✭✭✭✭
    Imperator wrote: »
    I've been following the development of computers and games since the late 1970s. Back then I was playing chess and backgammon and the whole debate about how to develop gameplaying computers fascinated me.

    The division at the time was between those who wanted to program computers to think like humans and those who wanted to program computers for brute force calculation. Brute force calculation won out, but for reasons that skipped over the science of the project, and simply went for the win.

    Which is also how Pluribus was made BTW
    "The core of Pluribus’s strategy was computed via self play, in which the AI plays against copies of itself, without any data of human or prior AI play used as input." ("Superhuman AI for multiplayer poker", Science, 11.07.2019)
  • boyd148 Red Chipper Posts: 94 ✭✭
    I heard that Pluribus made good use of donk betting to accomplish this victory
  • eugeniusjr Red Chipper Posts: 427 ✭✭✭
    I heard Pluribus actually lost, but that it called it a win because it got unlucky. Very very humanlike.
  • persuadeo Red Chipper, Table Captain Posts: 4,094 ✭✭✭✭✭
    edited July 17
    eugeniusjr wrote: »
    I heard Pluribus actually lost, but that it called it a win because it got unlucky. Very very humanlike.

    Actually this is something Imperator could possibly explain.

    Deepstack and Pluribus use Range vs. Hand analysis to assign EV of actions for the computer, and use this EV to correct the results in a relatively straightforward computation called AIVAT or such.

    However, I find this, with no better explanations available and no response to queries by experts, strange. If we only apply AIVAT variance reductions to the computer's actions vs. the human's actions, a massive assumption is being made: that it is variance, not the human's strategy, that accounts for certain EV losses and gains.

    An example. In one of the Deepstack hands, we see a human hit a gutter on the end and move in for nearly 4x pot. The computer blithely pays it off with TPMK, then retracts all of the human's score in the hand as variance under AIVAT. In layman's terms, Eugene nails it on the head.

    But if the human is only moving in nutted because it knows the fixed strategy of Deepstack is overcalling, this is not variance, but strategy. The EV loss of the human's floats are punished in the scores per street, very fair, but his 4x pot reward when he is paid off with a bad call is not treated the same way.

    This argument applies to every hand I saw of DS - range vs. hand gives a one-sided look at variance, and a correction that will be useful but, under exploitative circumstances, will tend to aid the computer's score.

    Now, look at Pluribus' results over the graph provided by the science team. Note that Pluribus is losing as time goes on - especially at showdown where its blue line is absolutely plummeting. Is this variance or are the humans adjusting? Why did that player in the DS challenge move in as he did?

    Nevertheless, I'm not a computer scientist and await correction.
  • Doug Hull RCP Coach Posts: 1,784 -
    KemahPhil wrote: »
    "As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    As humans learn from bots? Look how many people play craps and roulette. The GTO play for these games has been known forever, yet they still play.
    Co-founder Red Chip Poker,
    Author Poker Plays You Can Use
    Author Poker Workbook for Math Geeks
  • TheGameKat Posts: 2,400 -
    Something that interests me, following on from @Imperator's observations, is that the self-learning chess machines (AlphaZero, Leela) play games that simply look different to the brute force engines like Stockfish. I'm curious to what degree this is repeated in poker.
    Moderation In Moderation
  • TheGameKat Posts: 2,400 -
    Doug Hull wrote: »
    KemahPhil wrote: »
    "As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    As humans learn from bots? Look how many people play craps and roulette. The GTO play for these games has been known forever, yet they still play.

    Ha, Yes, quite. But ssshhhh this is why we don't have state income tax.
    Moderation In Moderation
  • kenaces Red Chipper Posts: 1,400 ✭✭✭✭
    @persuadeo makes a good point. In the 10k hands played, Pluribus lost at -7bb/100 and then used some fancy math (not published) to claim victory.

    They also claim to have beaten elite poker players. While Linus was, of course, elite, I am not sure about all the other players. There were quite a few bad plays by some of the humans (misclicks, no financial incentive, MTT has-beens?).

    But it is pretty impressive that they built such a strong bot so fast with not much computer power. AI is coming for our bankrolls!
  • The Mule Red Chipper Posts: 787 ✭✭✭
    Firstly thanks for sharing this @Imperator. I miss your contributions.

    Regarding the AIVAT variance reduction, without seeing details I am making some assumptions, but if it works as I expect then it should be reducing the variance of the EV estimate without altering the actual estimate. It is effectively using the available information of the AI's range to increase sample size, thus providing a larger data set to estimate EV. The actual hand is just one data point in the range, and is no more meaningful than any other hand in its range.

    In @Persuadeo's example, if the human is only moving in nutted because "it knows that the AI is over calling" then the range of the AI should reflect the loss of EV. AIVAT should not be able to turn an unprofitable call into a profitable one, because the call is unprofitable on average across the AI's range. AIVAT should be a better reflection of the EV of the human's strategy, because it considers the entirety of the AI's range rather than just the actual hand.

    David.
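    The unbiased-but-lower-variance idea described in the post above can be shown with a deliberately crude toy; it is not the published AIVAT estimator, and the range, equities, and pot below are invented for illustration. The human shoves and the AI calls with a known range; instead of scoring the sampled runout, we score its expectation against the AI's sampled hand. Both estimators have the same mean, the range-aware one just has less runout luck in it.

```python
import random
import statistics

random.seed(1)
POT = 100  # chips at stake for the human when the AI calls (invented)

# Hypothetical calling range: each entry is the human's equity vs that combo.
AI_CALLING_RANGE = [0.95, 0.80, 0.55, 0.30, 0.10]

def naive_sample():
    """Naive scoring: sample the AI's hand AND the runout."""
    eq = random.choice(AI_CALLING_RANGE)
    return POT if random.random() < eq else -POT

def range_aware_sample():
    """AIVAT-flavored scoring: still sample the AI's hand, but replace the
    runout luck with its expectation against that hand, (2*eq - 1) * POT."""
    eq = random.choice(AI_CALLING_RANGE)
    return (2 * eq - 1) * POT

N = 50000
naive = [naive_sample() for _ in range(N)]
aware = [range_aware_sample() for _ in range(N)]

# True EV of the spot: expectation over the whole calling range.
true_ev = (2 * statistics.mean(AI_CALLING_RANGE) - 1) * POT

print(round(statistics.mean(naive), 1), round(statistics.mean(aware), 1))
print(statistics.stdev(aware) < statistics.stdev(naive))  # less variance
```

    Both sample means sit near the same true EV; only the spread differs. That is the sense in which the correction changes the error bars but not the estimate itself.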
  • persuadeo Red Chipper, Table Captain Posts: 4,094 ✭✭✭✭✭
    Ok, that is reasonable, but again, the human range is unknown and the computer's calling strategy known. In the example I use, the human again loses chips on the river according to AIVAT, in fact even more chips than he's lost on the previous streets, despite it being clear that A8 is a terrible bluffcatcher and presumably Deepstack is calling with many other hands as well. The implication is that the human despite realizing max implied odds simply can't win against this set of actions with this hand ever, even if it stacks the computer and is never stacked in the reverse scenario. That's reasonable but a little strange to accept.

    Fair enough but it still doesn't address the essential problem of this technique: everything passes through player 1's strategy and not player 2's. For instance, if player 2 takes a different action with A2 on the river each time, have we really learned anything? As an example of this, Pluribus makes one absurd call that is extremely rare according to its own internal logic and clearly negative EV. I'd like to see the AIVAT score for situations such as that.
  • jeffnc Red Chipper Posts: 4,749 ✭✭✭✭✭
    KemahPhil wrote: »
    "On the other hand, poker is like a pyramid scheme: It needs a wide range of skill levels to support the pros playing for big bucks at the top. As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies."

    Is this really the future?

    Well, let me ask you this. With all the advances in computer simulation, getting game theory and math probability information out to the public on the internet and in books, and the increased experience levels of players, casinos are still jammed full of roulette and slots players. Why would that be?
  • jeffnc Red Chipper Posts: 4,749 ✭✭✭✭✭
    persuadeo wrote: »
    The computer blithely pays it off with TPMK, then retracts all of the human's score in the hand as variance under AIVAT.

    All of it? That seems hard to believe.

  • persuadeo Red Chipper, Table Captain Posts: 4,094 ✭✭✭✭✭
    jeffnc wrote: »
    persuadeo wrote: »
    The computer blithely pays it off with TPMK, then retracts all of the human's score in the hand as variance under AIVAT.

    All of it? That seems hard to believe.

    His score for that hand was -300, as I recall. This is fair in one sense, because there is no reason for Deepstack to be wrong about how often it calls, making even getting paid widely by hands like A8 not enough. Deepstack could also be randomizing its calls so that A8 might be a tiny fraction of them. We can't know.
  • persuadeo Red Chipper, Table Captain Posts: 4,094 ✭✭✭✭✭
    You can read about it here.

    I'm sure it is both straightforward and unbiased, what I am questioning is just how much we learn in a poker game, given that we only see it through the lens of the computer, as opposed to the long term strategy of the player.

    Take a careful look at the trends in the Pluribus challenge, especially a crusher like the player Eddie, who destroys Pluribus and maintains a beautifully even redline, which is going to be hard to do over the course of swings, given that this can often reflect range play more than the swings of a blue line.

    Since you are familiar with the sciences, you also know of the problems with researchers trumpeting their results. I'm not sold, as good as Pluribus is, that there are not faults here that the winning players in the match were overcoming. Not yet, anyway.
  • persuadeo Red Chipper, Table Captain Posts: 4,094 ✭✭✭✭✭
    Again, i'm not doubting their computations. I'm doubting its efficacy here when we only see through the lens of measuring computer range vs human hand. The computer gets to measure its entire strategy while never seeing the entire human strategy.

    You can find more information on Kevin Wang's blog.
  • persuadeo Red Chipper, Table Captain Posts: 4,094 ✭✭✭✭✭
    edited July 25
    Yeah, I think we've exhausted the issue for now. It would be useful, I think, to see the math of some of the decisions, such as the ones I mentioned, or how Pluribus computes its own AIVAT when it makes calls that are low frequency to the point of non-existence and negative EV (demonstrably in a solver) but win massive pots, if only as a demonstration of why a losing call is a winning call overall. The Deepstack AIVAT math on that hand 71 I discussed at the start is explainable, of course, but both counter-intuitive and opaque, as we don't know the frequency or combinations Deepstack is using to bluffcatch.

    Yes, you need to download the hands and look at them yourself to see.

    Thanks for the guidance.
  • jeffnc Red Chipper Posts: 4,749 ✭✭✭✭✭
    presumably the EV of that decision is zero (definition of a bluff catcher).

    Where did that definition come from?


  • The Mule Red Chipper Posts: 787 ✭✭✭
    I have had a chance to read through some of the material referred to in this thread.

    My original assumption was that AIVAT adjusted EV for the computer in all-in situations (which I incorrectly assumed were the AI in AIVAT), which intuitively feels like a completely mathematically correct thing to do.

    However I see that it is also making adjustments for EV where the human still has decisions. It must be valuing the human’s EV using a strategy of how the human would/should act at decision points, and the only reasonable way to do this would be to assume the human acts using the computer’s optimal strategy. Please correct me if I am incorrect here.

    If this is the case, then I think AIVAT will assume that humans are making the same mistakes the computer is making. For example, if the computer has an abstraction that all flush draws should be treated the same, but the human differentiates between nut flush draws/flush draws with a pair/combo draws/other flush draws, then the AIVAT adjustment would be producing a worse estimate of EV than the human was using in their decision making - this could lead to AIVAT incorrectly adjusting a loss to a win.

    In practice the abstractions are unlikely to be as extreme as the flush draw example; however, they are still there, there must be a lot of them, and they reduce the accuracy of the EV estimate. I think we need to be very careful trusting an adjustment that claims the computer won based on AIVAT (with the huge assumption that my understanding is correct).
  • jeffnc Red Chipper Posts: 4,749 ✭✭✭✭✭
    edited July 30
    That doesn't seem very useful - everything is zero EV in a Nash Equilibrium. You can't just assume a bluff catcher is 0 EV in the real world.
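    The indifference point being debated here is quick arithmetic. With pot P and bet B, calling wins P + B against a bluff and loses B against value, so the call is exactly 0 EV only when the villain bluffs B / (P + 2B) of the time; away from that frequency, as in the real world, the call is plus or minus EV. The numbers below are illustrative, not taken from any actual hand.

```python
def call_ev(pot, bet, bluff_freq):
    """EV of calling a bet of `bet` into `pot` vs a villain bluffing bluff_freq."""
    return bluff_freq * (pot + bet) - (1 - bluff_freq) * bet

def indifference_freq(pot, bet):
    """Bluffing frequency that makes the bluff catcher exactly 0 EV: B/(P+2B)."""
    return bet / (pot + 2 * bet)

# A 4x-pot shove like the Deepstack hand discussed earlier (pot = 100 chips,
# bet = 400 chips; sizes invented for illustration):
f0 = indifference_freq(100, 400)        # 4/9, about 0.444
print(round(call_ev(100, 400, f0), 6))  # zero EV exactly at equilibrium
print(call_ev(100, 400, 0.2) < 0)       # under-bluffed: calling loses
print(call_ev(100, 400, 0.6) > 0)       # over-bluffed: calling wins
```

    So "bluff catcher = 0 EV" is a statement about the villain's equilibrium frequency, not a property of the hand itself, which is the distinction the two posts above are circling.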
  • jeffnc Red Chipper Posts: 4,749 ✭✭✭✭✭
    My point was that Pluribus is trying to get down to a Nash equilibrium strategy, and its bluff catchers should solve down to zero EV.

    OK
