
图奥马斯·桑德霍尔姆:扑克与博弈论

Tuomas Sandholm: Poker and Game Theory

Episode description

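Tuomas Sandholm is a professor at Carnegie Mellon University and co-creator of Libratus, the first AI system to beat top human players at heads-up no-limit Texas Hold'em. He has published more than 450 papers on game theory and machine learning, including a best paper at NIPS/NeurIPS in 2017. His research and the companies he founded have had wide-reaching real-world impact, especially because he and his group not only propose new ideas but also build systems to prove that these ideas work in practice. For more information about the podcast, visit https://lexfridman.com/ai or follow @lexfridman on Twitter, LinkedIn, Facebook, and YouTube, where video versions of the conversations are also available.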

Bilingual transcript


Speaker 0

以下是与托马斯·桑德霍姆的对话。

The following is a conversation with Tuomas Sandholm.

Speaker 0

他是卡内基梅隆大学(CMU)的教授,也是Libratus的联合创造者。Libratus是首个在单挑无限注德州扑克中击败人类顶尖选手的人工智能系统。

He's a professor at CMU and co-creator of Libratus, which is the first AI system to beat top human players in the game of heads-up no-limit Texas Hold'em.

Speaker 0

他已发表超过450篇关于博弈论与机器学习的论文,包括2017年在NIPS(现更名为NeurIPS)上的最佳论文奖。我就是在那次会议上与他进行了这次对话。

He has published over 450 papers on game theory and machine learning, including a best paper in 2017 at NIPS, now renamed to NeurIPS, which is where I caught up with him for this conversation.

Speaker 0

他的研究和公司对现实世界产生了深远影响,这尤其得益于他和团队不仅提出新理论,还会构建系统来验证这些理论在现实中的可行性。

His research and companies have had wide reaching impact in the real world, especially because he and his group not only propose new ideas, but also build systems to prove that these ideas work in the real world.

Speaker 0

本对话是MIT通用人工智能课程及人工智能播客系列的一部分。

This conversation is part of the MIT course on artificial general intelligence and the artificial intelligence podcast.

Speaker 0

若您喜欢本期内容,欢迎在YouTube、iTunes订阅,或通过Twitter @lexfridman(拼写f-r-i-d)与我联系。

If you enjoy it, subscribe on YouTube, iTunes, or simply connect with me on Twitter at @lexfridman, spelled F-R-I-D.

Speaker 0

现在请收听我与托马斯·桑德霍姆的对话。

And now here's my conversation with Tuomas Sandholm.

Speaker 0

您能否简要描述德州扑克这个游戏?

Can you describe, at a high level, the game of poker, Texas Hold'em?

Speaker 0

为不熟悉纸牌游戏的听众解释下单挑德州扑克。

Heads up, Texas Hold'em for people who might not be familiar with this card game.

Speaker 1

好的,很乐意。

Yeah, happy to.

Speaker 1

单挑无限注德州扑克已成为AI领域测试不完美信息博弈解决方案的通用算法的主要基准。

So, heads-up no-limit Texas Hold'em has really emerged in the AI community as a main benchmark for testing these application-independent algorithms for imperfect-information game solving.

Speaker 1

这是一款实际上由人类玩家参与的游戏。

And this is a game that's actually played by humans.

Speaker 1

出于各种原因,你在电视或赌场里不太常见到它,但在一些高级赌场和史上最佳扑克电影中能见到它的身影。

You don't see it that much on TV or casinos for various reasons, but you do see it in some expert level casinos and you see it in the best poker movies of all time.

Speaker 1

它实际上是世界扑克系列赛的一个比赛项目,但主要是在线进行,而且通常涉及相当可观的赌注。

It's actually an event in the World Series of Poker, but mostly it's played online and typically for pretty big sums of money.

Speaker 1

这通常只有专家级玩家才会玩的游戏。

This is a game that usually only experts play.

Speaker 1

所以如果你周五晚上去参加家庭牌局,很可能不会是单挑无限注德州扑克,有些情况下可能是无限注德州扑克,但通常是一大群人玩而且竞争性没那么强。

So if you go to your home game on a Friday night, it probably is not going to be heads-up no-limit Texas Hold'em. It might be no-limit Texas Hold'em in some cases, but typically for a big group, and it's not as competitive.

Speaker 1

单挑意味着只有两名玩家,所以就像是我和你之间的对决。

While heads up means it's two players, so it's really like me against you.

Speaker 1

是我更厉害还是你更厉害?

Am I better or are you better?

Speaker 1

这很像国际象棋或围棋,但属于不完全信息博弈,因此难度更大——因为我既要应对你知我不知的情况,也要处理我知你不知的局面,而不是像棋盘上的棋子那样双方都一目了然。

Much like chess or Go in that sense, but an imperfect information game, which makes it much harder because I have to deal with issues of you knowing things that I don't know and I know things that you don't know instead of pieces being nicely laid on the board for both of us to see.

Speaker 0

在德州扑克里,有两张牌只有你自己能看到。

So in Texas Hold'em, there's two cards that you only see that belong to you.

Speaker 0

然后会逐步亮出一些公共牌,最终共有五张大家都看得见的牌。

Then they gradually lay out some cards that add up overall to five cards that everybody can see.

Speaker 0

所以信息的不完整性就体现在你手中持有的那两张牌上。

So the imperfect nature of the information is the two cards that you're holding.

Speaker 0

从一开始就是。

Upfront.

Speaker 1

所以正如你

So as you

Speaker 1

所说,首先每人私下拿到两张牌,然后进入一轮下注。

said, you first get two cards in private each and then there's a betting round.

Speaker 1

接着在公共桌面上发三张牌,再进行一轮下注。

Then you get three cards in public on the table, then there's a betting round.

Speaker 1

然后在公共桌面上发第四张牌,进行一轮下注。

Then you get the fourth card in public on the table, there's a betting round.

Speaker 1

最后在桌面上发第五张牌,进行最后一轮下注。

Then you get the fifth card on the table, there's a betting round.

Speaker 1

因此,总共有四轮下注和四轮信息揭示阶段。

So, there's a total of four betting rounds and four tranches of information revelation, if you will.

Speaker 1

只有第一阶段是私下的,之后都是公开的。

Only the first tranche is private and then it's public from there.
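
Editor's sketch: the dealing and betting structure just described can be written down as a small table. This is only an illustration of the description above; the stage names (pre-flop, flop, turn, river) are the standard poker terms and are not used in the conversation itself.

```python
# Each stage: (name, new private cards per player, new public cards).
STAGES = [
    ("pre-flop", 2, 0),
    ("flop", 0, 3),
    ("turn", 0, 1),
    ("river", 0, 1),
]


def summarize(stages):
    """Count betting rounds and the cards revealed across a full hand."""
    return {
        "betting_rounds": len(stages),  # one betting round follows each stage
        "private_cards": sum(private for _, private, _ in stages),
        "public_cards": sum(public for _, _, public in stages),
    }


print(summarize(STAGES))
```

Only the first stage reveals private information; every later stage adds public cards, which matches the four tranches described above.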

Speaker 0

这很可能是目前人工智能领域乃至大众中最受欢迎的非完全信息博弈游戏。

And this is probably by far the most popular game in AI and just the general public in terms of imperfect information.

Speaker 0

因此它大概是最受观众欢迎的观赏性游戏。

So it's probably the most popular spectator game to watch.

Speaker 0

对吧?

Right?

Speaker 0

正因如此,攻克这个游戏才格外令人兴奋。

So which is why it's a super exciting game to tackle.

Speaker 0

可以说,在受欢迎程度和作为人工智能智力标杆方面,它与国际象棋不相上下。

So it's on the order of chess, I would say, in terms of popularity, in terms of AI setting it as the bar of what is intelligence.

Speaker 0

2017年,Libratus,这个名字怎么发音?

So in 2017, Libratus... how do you pronounce it?

Speaker 0

Libratus。

Libratus.

Speaker 0

Libratus击败了……带点拉丁语。

Libratus beats... a little Latin there.

Speaker 0

带点拉丁风味。

A little bit of Latin.

Speaker 0

Libratus击败了几位,四位,专业人类玩家。

Libratus beat a few, four, expert human players.

Speaker 0

能描述下那个事件吗?

Can you describe that event?

Speaker 0

你从中学到了什么?

What you learned from it?

Speaker 0

当时情况是怎样的?

What was it like?

Speaker 0

对于没读过论文也没研究过的人,整个过程大概是什么样?

What was the process in general for people who have not read the papers and studied?

Speaker 1

是的,当时我们邀请了排名前十的四位选手。

Yeah, so the event was that we invited four of the top 10 players.

Speaker 1

这些都是单挑无限注德州扑克的职业玩家,这点很重要,因为这种玩法与多人版本差异很大。

These are specialist players in Heads Up No Limit Texas Hold'em, which is very important because this game is actually quite different than the multiplayer version.

Speaker 1

我们邀请他们来匹兹堡,在Rivers赌场进行了二十天的对战。

We brought them in to Pittsburgh to play at the Rivers Casino for twenty days.

Speaker 1

我们目标是收集12万手牌局数据,以确保统计显著性。

We wanted to get 120,000 hands in because we wanted to get statistical significance.

Speaker 1

所以即便是对这些通常打牌很快的职业高手来说,这也是相当大的牌局量。

So it's a lot of hands for humans to play, even for these top pros who play fairly quickly normally.

Speaker 1

因此我们无法仅让其中一人完成如此多的牌局。

So we couldn't just have one of them play so many hands.

Speaker 1

他们连续二十天基本从早打到晚,我提供了20万美元作为激励奖金。

For twenty days they were playing basically morning to evening, and I raised $200,000 as a little incentive for them to play.

Speaker 1

奖金设置并非平均分配每人5万。

The setting was such that they didn't all get $50,000.

Speaker 1

实际支付是根据每位选手对抗AI的表现而定。

We actually paid them out based on how they did against the AI each.

Speaker 1

这样无论领先、落后还是接近击败AI,他们都有全力拼搏的动力。

So they had an incentive to play as hard as they could, whether they're way ahead or way behind or right at the mark of beating the AI.

Speaker 0

可惜你们没赚到钱。

And you didn't make any money, unfortunately.

Speaker 1

对。

Right.

Speaker 1

不,我们一分钱都赚不到。

No, we can't make any money.

Speaker 1

其实几年前我曾探讨过能否进行真钱对局,因为与顶尖选手赌钱对决会很有趣。

So originally, a couple of years earlier, I actually explored whether we could actually play for money because that would be, of course, interesting as well, to play against the top people for money.

Speaker 1

但宾夕法尼亚博彩委员会否决了这个提议。

But the Pennsylvania Gaming Board said no.

Speaker 1

所以,我们没能成功。

So, we couldn't.

Speaker 1

所以,这很像为音乐家或拳击手之类的举办的展览。

So, this is much like an exhibit for a musician or a boxer or something like that.

Speaker 0

尽管如此,你们一直在记录输赢金额,我记得Libratus赢了接近200万美元。

Nevertheless, you were keeping track of the money, and Libratus won close to $2,000,000, I think.

Speaker 0

所以如果那是真钱,如果你能赚到钱,那将是非常令人印象深刻且鼓舞人心的成就。

So if it was for real money, if you were able to earn money, that was a quite impressive and inspiring achievement.

Speaker 0

只是些细节问题。

Just a few details.

Speaker 0

玩家们在看什么?

What were the players looking at?

Speaker 0

我是说,他们是在电脑前操作吗?

I mean, were they behind a computer?

Speaker 0

界面是什么样子的?

What was the interface like?

Speaker 1

是的。

Yes.

Speaker 1

他们就像平常那样进行游戏。

They were playing much like they normally do.

Speaker 1

这些顶尖选手玩这款游戏时,主要是在线对战。

These top players, when they play this game, they play mostly online.

Speaker 1

所以他们习惯了通过用户界面操作,在这里也是同样的方式。

So they're used to playing through UI and they did the same thing here.

Speaker 1

当时有这么个布局。

So there was this layout.

Speaker 1

你可以想象屏幕上有一张桌子。

You could imagine there's a table on a screen.

Speaker 1

人类坐在那里,AI也坐在那里,屏幕显示着正在发生的一切。

There's the human sitting there and then there's the AI sitting there and the screen shows everything that's happening.

Speaker 1

牌发出来,显示下注情况。

The cards coming out and shows the bets being made.

Speaker 1

我们还为人类玩家提供了下注历史记录。

And we also had the betting history for the human.

Speaker 1

如果人类玩家忘记当前牌局的情况,他们可以随时查阅参考。

If the human forgot what had happened in the hand so far, they could actually reference back and so forth.

Speaker 0

让他们查看下注历史记录有什么特别原因吗?

Is there a reason they were given access to the betting history?

Speaker 1

嗯,这

Well, it

Speaker 1

其实并不重要。

didn't really matter.

Speaker 1

反正他们也不会忘记。

They wouldn't have forgotten anyway.

Speaker 1

这些都是顶尖玩家,我们只是为了避免人类因遗忘而让AI获得记忆优势的情况。

These are top quality people, but we just wanted to put out there so it's not a question of a human forgetting and the AI somehow trying to get that advantage of better memory.

Speaker 0

那当时是什么感觉?

So what was that like?

Speaker 0

我是说,那真是个了不起的成就。

I mean, that was an incredible accomplishment.

Speaker 0

那么在比赛前是什么感觉?

So what did it feel like before the event?

Speaker 0

你有过怀疑或希望吗?

Did you have doubt, hope?

Speaker 0

你的信心处于什么水平?

Where was your confidence at?

Speaker 1

是的,这很棒。

Yeah, that's great.

Speaker 1

问得好。

Great question.

Speaker 1

所以十八个月前,我

So eighteen months earlier, I

Speaker 1

曾用上一代名为Claudico的AI组织过类似的'人类大脑对决AI'比赛,但我们没能战胜人类。

had organized a similar Brains vs. AI competition with a previous AI called Claudico, and we couldn't beat the humans.

Speaker 1

所以这次,仅仅过了十八个月,我知道这个新AI——Libratus要强大得多,但在真正尝试前很难预测它对抗顶尖人类选手的表现。

So this time around, it was only eighteen months later and I knew that this new AI, Libratus, was way stronger, but it's hard to say how you'll do against the top humans before you try.

Speaker 1

所以我当时认为胜负概率五五开。

So I thought we had about a fifty-fifty shot.

Speaker 1

而国际博彩网站给我们的赔率是1:4或1:5的劣势方。

And the international betting sites put us as a four to one or five to one underdog.

Speaker 1

所以人们如此坚信人类能胜过AI还挺有趣的。

So it's kind of interesting that people really believe in people over AI.

Speaker 1

人们不仅过度相信自己,与AI的表现相比,他们对他人也过于自信。

People don't just over believe in themselves, but they have overconfidence in other people as well compared to the performance of AI.

Speaker 1

所以我们当时是四比一或五比一的劣势方,即便连续三天击败人类选手,国际博彩网站上我们的赔率仍是五五开。

So we were a four-to-one or five-to-one underdog, and even after three days of beating the humans in a row, we were still fifty-fifty on the international betting sites.

Speaker 0

你是否认为扑克像人们想象的那样具有某种特殊魔力?

Do you think there's something special and magical about poker the way people think about it?

Speaker 0

我的意思是,即使在象棋领域,也没有好莱坞电影。

I mean, even in chess, there are no Hollywood movies.

Speaker 0

扑克是许多电影的主角。

Poker is the star of many movies.

Speaker 0

有种观点认为,人类特定的面部表情、肢体语言和眼球运动等‘马脚’对扑克至关重要。

And there's this feeling that certain human facial expressions and body language, eye movement, all these tells are critical to poker.

Speaker 0

比如你能看穿某人灵魂般理解其下注策略之类的。

Like, you can look into somebody's soul and understand their betting strategy and so on.

Speaker 0

这或许就是为什么——你认为这是人们坚信人类会胜出的原因吗?

So do you think that is possibly why people have confidence that humans will outperform?

Speaker 0

因为AI系统在当前架构下无法感知这类‘马脚’。

Because AI systems cannot, in this construct perceive these kinds of tells.

Speaker 0

它们只能分析下注模式,仅此而已——下注模式和统计数据。

They're only looking at betting patterns and nothing else: betting patterns and statistics.

Speaker 0

如果退一步看人类玩家之间的对决,对你来说什么更重要?

So what's more important to you if you step back on human players, human versus human?

Speaker 0

这些被我们浪漫化的‘马脚’概念究竟扮演着什么角色?

What's the role of these tells, of these ideas that we romanticize?

Speaker 1

是的。

Yeah.

Speaker 1

所以我会分成两部分来说。

So I'll split it into two parts.

Speaker 1

第一部分是:为什么人类更信任人类而非AI,并对人类过度自信?

So one is why do humans trust humans more than AI and have overconfidence in humans?

Speaker 1

我认为这与'微表情'问题没有直接关联。

I think that's not really related to the tell question.

Speaker 1

只是人们见过这些顶尖选手的表现,他们实在太出色了,简直不可思议。

It's just that they've seen these top players, how good they are and they're really fantastic.

Speaker 1

所以很难相信AI能战胜他们。

So, it's just hard to believe that an AI could beat them.

Speaker 1

我觉得这种认知偏差就来源于此。

So, I think that's where that comes from.

Speaker 1

这其实反映了关于AI的一个普遍现象:除非亲眼见证AI超越人类的表现,否则人们很难相信它能够做到。

And that's actually maybe a more general lesson about AI that until you've seen it overperform a human, it's hard to believe that it could.

Speaker 1

说到微表情,这些顶级选手都极其擅长隐藏自己的微表情,以至于在他们这个层级,花大量精力去捕捉彼此的微表情其实很不划算。

But then the tells, a lot of these top players, they're so good at hiding tells that among the top players it's actually not really worth it for them to invest a lot of effort trying to find tells in each other because they're so good at hiding them.

Speaker 1

没错,在周五晚上的普通牌局里,微表情分析会非常重要。

So yes, at the kind of Friday evening game, tells are going to be a huge thing.

Speaker 1

你能读懂别人,如果你擅长观察,就能像读一本打开的书那样看透他们。

You can read other people and if you're a good reader, you'll read them like an open book.

Speaker 1

但在当今扑克的顶级赛事中,随着水平提升,微表情分析在比赛中的占比会变得越来越小。

But at the top levels of poker now, the tells become a much smaller and smaller aspect of the game as you go to the top levels.

Speaker 0

策略的数量,可能的行动数量非常庞大,达到了10的100次方以上。

The amount of strategies, the amount of possible actions, is very large, 10 to the power of 100 plus.

Speaker 0

因此必须有所取舍,我读过一些相关论文。

So there has to be some... I've read a few of the related papers.

Speaker 0

它必须形成对各种手牌和行动的抽象概念。

It has to form some abstractions of various hands and actions.

Speaker 0

那么什么样的抽象概念对扑克游戏有效呢?

So what kind of abstractions are effective for the game of poker?

Speaker 1

是的。

Yeah.

Speaker 1

你说得完全正确。

So you're exactly right.

Speaker 1

当你面对一个规模达到10的161次方的博弈树时,特别是在不完全信息博弈中。

So when you go from a game tree that's 10 to the 161, especially in an imperfect information game.

Speaker 1

这个规模太大,无法直接求解,即使使用我们最快的均衡寻找算法。

It's way too large to solve directly, even with our fastest equilibrium finding algorithms.

Speaker 1

所以你需要先进行抽象处理。

So you want to abstract it first.

Speaker 1

博弈中的抽象比MDP或其他单智能体环境中的抽象要复杂得多,因为存在抽象病态现象:如果我采用更精细的抽象,从中得到的实际游戏策略可能反而比粗粒度抽象得到的策略更差。

And abstraction in games is much trickier than abstraction in MDPs or other single agent settings because you have these abstraction pathologies that if I have a finer grained abstraction, the strategy that I can get from that for the real game might actually be worse than the strategy I can get from the coarse grained abstraction.

Speaker 1

所以必须非常谨慎。

So you have to be very careful.

Speaker 0

现在,关于抽象的类型,概括来说,我们讨论的主要是手牌抽象和...

Now, kinds of abstractions, just to zoom out, we're talking about, there's the hands abstractions and then there's

Speaker 1

投注策略。

betting strategies.

Speaker 1

投注行为。

Betting actions.

Speaker 1

是的。

Yeah.

Speaker 0

投注行为。

Betting actions.

Speaker 1

所以我们需要讨论通用游戏的信息抽象。

So there's information abstraction to talk about general games.

Speaker 1

信息抽象,即对机会行为的抽象。

Information abstraction, which is the abstraction of what chance does.

Speaker 1

在扑克中这就是手牌。

And this would be the cards in the case of poker.

Speaker 1

然后是行动抽象,即对实际玩家行为的抽象,在扑克中就是下注行为

And then there's action abstraction, which is abstracting the actions of the actual players, which would be bets in the case

Speaker 1

的扑克。

of poker.

Speaker 0

你自己和其他玩家?

Yourself and the other players?

Speaker 1

是的,你自己和其他玩家。

Yes, yourself and the other players.

Speaker 1

在信息抽象方面,我们已实现完全自动化。

And for information abstraction, we were completely automated.

Speaker 1

这些算法实现了我们所谓的'潜力感知抽象',不仅评估当前手牌价值,还考虑它随时间可能演变成好牌或坏牌的情况。

So these are algorithms that do what we call potential-aware abstraction, where we don't just look at the value of the hand, but also how it might materialize into good or bad hands over time.

Speaker 1

这是一种自下而上的过程,结合了整数规划、聚类分析等多方面技术。

And it's a certain kind of bottom up process with integer programming there and clustering and various aspects.

Speaker 1

如何构建这种抽象模型呢?

How do you build this abstraction?
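
Editor's sketch: one way to picture potential-aware abstraction is to describe each hand by a histogram over how strong it may become later, and then group hands whose histograms look alike. The code below is a minimal illustration under loud assumptions: it uses plain k-means with squared Euclidean distance, not the integer-programming and clustering pipeline mentioned above, and the hand names and histograms are made up.

```python
def distance(a, b):
    # Squared Euclidean distance between two histograms (an assumption;
    # other metrics could be used instead).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Tiny k-means; returns the cluster index assigned to each point."""
    centers = [list(p) for p in points[:k]]  # deterministic start for the sketch
    assignment = [0] * len(points)
    for _ in range(iterations):
        assignment = [
            min(range(k), key=lambda c: distance(p, centers[c])) for p in points
        ]
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical histograms over (weak, medium, strong) future hand strength.
HANDS = {
    "made_hand_a": [0.1, 0.8, 0.1],
    "made_hand_b": [0.1, 0.7, 0.2],
    "draw_a": [0.5, 0.0, 0.5],
    "draw_b": [0.6, 0.0, 0.4],
}

buckets = dict(zip(HANDS, kmeans(list(HANDS.values()), k=2)))
print(buckets)
```

The made hands and the drawing hands have roughly similar average strength, but their histograms differ, so they land in different buckets. That difference is what looking only at the current value of the hand would miss.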

Speaker 1

在动作抽象层面,主要基于人类和其他AI过去玩这款游戏的历史数据。

And then in the action abstraction, there it's largely based on how humans and other AIs have played this game in the past.

Speaker 1

但最初我们其实采用了自动化动作抽象技术,这种技术具有可证明的收敛性,能找到最佳下注组合,只是扩展性较差。

But in the beginning, we actually used an automated action abstraction technology, which is provably convergent, in that it finds the optimal combination of bet sizes, but it's not very scalable.

Speaker 1

所以我们无法将其应用于

So we couldn't use it for

Speaker 1

全局游戏,但可以用在前几轮下注动作中。

the whole game, but we used it for the first couple of betting actions.

Speaker 0

那么什么更重要呢?

So what's more important?

Speaker 0

手牌的强度吗?

The strength of the hand?

Speaker 0

是信息提取的方式,还是具体的打法策略?

So the information abstraction, or how you play them?

Speaker 0

动作本身。

The actions.

Speaker 0

你知道,浪漫化的观点再次强调——手牌好坏根本无关紧要。

You know, the romanticized notion again is that it doesn't matter what hands you have.

Speaker 0

行动和下注或许才是制胜之道,无论你手中持有什么牌。

That the actions, the betting may be the way you win no matter what hands you have.

Speaker 1

是的。

Yeah.

Speaker 1

正因如此,你必须玩大量手牌来降低运气成分的影响。

So that's why you have to play a lot of hands so that the role of luck gets smaller.

Speaker 1

否则你可能会侥幸拿到好牌,从而赢得比赛。

So you could otherwise get lucky and get some good hands and then you're going to win the match.

Speaker 1

即使玩上千手牌,仍可能受运气左右——无限注德州扑克的波动性太大,一旦双方全押,筹码的变数就极为惊人。

Even with thousands of hands, you can get lucky, because there's so much variance in no-limit Texas Hold'em: if we both go all in, it's a huge stack of variance.

Speaker 1

无限注德州扑克中存在这种巨大的波动。

There are these massive swings in no-limit Texas Hold'em.

Speaker 1

所以你需要玩的不只是几千手,而是超过十万手牌才能获得统计显著性。

So that's why you have to play not just thousands, but over a 100,000 hands to get statistical significance.
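
Editor's sketch: the link between variance and the number of hands can be made concrete with the standard error of the mean, which shrinks as one over the square root of the number of hands. The numbers below are hypothetical and are not taken from the match, and the calculation assumes hands are independent, which is a simplification.

```python
import math


def standard_error(std_per_hand: float, n_hands: int) -> float:
    """Standard error of the average winnings per hand after n_hands."""
    return std_per_hand / math.sqrt(n_hands)


def hands_needed(std_per_hand: float, edge_per_hand: float, z: float = 1.96) -> int:
    """Smallest n at which the edge is at least z standard errors wide."""
    return math.ceil((z * std_per_hand / edge_per_hand) ** 2)


# Hypothetical values in big blinds per hand.
STD_PER_HAND = 15.0
EDGE_PER_HAND = 0.1

for n in (1_000, 10_000, 100_000):
    print(n, round(standard_error(STD_PER_HAND, n), 3))
print(hands_needed(STD_PER_HAND, EDGE_PER_HAND))
```

With a small edge and large per-hand swings, a few thousand hands leave the standard error larger than the edge itself, which is why the sample has to run to the order of a hundred thousand hands.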

Speaker 0

让我换个方式问这个问题。

So let me ask another way this question.

Speaker 0

如果你根本不看自己的牌,但对手们不知道这点,你能表现得

If you didn't even look at your hands, but they didn't know that, the opponents didn't know that, how well would you be

Speaker 0

多好?

able to do?

Speaker 1

这是个好问题。

That's a good question.

Speaker 1

我确实听过一个故事:挪威女牌手安妮特·奥伯斯塔德曾用这种方式赢得过锦标赛。

Actually, I heard a story that there's this Norwegian female poker player called Annette Obrestad, who has actually won a tournament by doing exactly that.

Speaker 1

但这种情况极为罕见。

But that would be extremely rare.

Speaker 1

所以你那样确实玩不好。

So you cannot really play well that way.

Speaker 0

好的。

Okay.

Speaker 0

所以手部确实有其作用。

So the hands do have some role to play.

Speaker 0

是的。

Yes.

Speaker 0

据我所知,Libratus并不使用学习方法,比如深度学习。

So Libratus does not, as far as I understand, use learning methods, deep learning.

Speaker 0

是否有学习空间呢?没有理由说Libratus不能与类似AlphaGo的方法结合,用于评估函数估计器的质量。

Is there room for learning in... you know, there's no reason why Libratus couldn't, you know, combine with an AlphaGo-type approach for estimating the quality, for a function estimator.

Speaker 0

你对此有何看法?

What are your thoughts on this?

Speaker 0

也许与另一个我不太熟悉的算法DeepStack相比,这个引擎确实使用了深度学习,虽然效果尚不明确。

Maybe as compared to another algorithm, which I'm not that familiar with, DeepStack, the engine that does use deep learning. It is unclear how well it does, but nevertheless it uses deep learning.

Speaker 0

那么你对用学习方法来辅助Libratus打扑克有什么看法?

So what are your thoughts about learning methods to aid in the way that Libratus plays the game of poker?

Speaker 1

是的。

Yeah.

Speaker 1

正如你所说,Libratus没有使用学习方法,但依然表现得非常出色。

So as you said, Libratus did not use learning methods and played very well without them.

Speaker 1

自那时起,我们确实在这里有几篇关于使用学习技术的论文。

Since then, we have actually, here, a couple of papers on things that do use learning techniques.

Speaker 0

很好。

Excellent.

Speaker 1

尤其是深度学习。

And deep learning in particular.

Speaker 1

就是你所说的那种学习评估函数的方式。

And sort of the way you're talking about where it's learning an evaluation function.

Speaker 1

但在不完美信息博弈中,与围棋或现在也包括国际象棋和将棋不同,仅学习对状态的评估是不够的,因为信息集的价值不仅取决于确切状态,还取决于双方玩家的信念。

But in imperfect information games, unlike let's say in Go or now also in Chess and Shogi, it's not sufficient to learn an evaluation for a state because the value of an information set depends not only on the exact state, but it also depends on both players' beliefs.

Speaker 1

比如,如果我手牌很差,但对手以为我手牌很好,我的处境就会好得多。

Like, if I have a bad hand, I'm much better off if the opponent thinks I have a good hand.

Speaker 1

反之亦然,如果我手牌很好,但对手以为我手牌很差,我的处境也会好得多。

And vice versa, if I have a good hand, I'm much better off if the opponent believes I have a bad hand.

Speaker 1

所以状态的价值不仅仅是牌面决定的函数。

So the value of a state is not just a function of the cards.

Speaker 1

它取决于——如果你愿意这么说——游戏路径,但仅限于这种路径被信念分布所捕捉的程度。

It depends on, if you will, the path of play, but only to the extent that it's captured in the belief distributions.

Speaker 1

这就是为什么它不像完美信息博弈中那么简单。

So that's why it's not as simple as it is in perfect information games.

Speaker 1

我也不想说在完美信息博弈中就很简单。

And I don't want to say it's simple there either.

Speaker 1

当然,在计算上那里也非常复杂。

It's of course very complicated computationally there too.

Speaker 1

但至少在概念上,这非常简单明了。

But at least conceptually it's very straightforward.

Speaker 1

存在一个状态和一个评估函数,你可以尝试学习它。

There's a state, there's an evaluation function, you can try to learn it.

Speaker 1

在这里,你需要做更多的事情。

Here, you have to do something more.

Speaker 1

我们在其中一篇论文中探讨的是,允许对手在搜索树的叶子节点采取不同策略——如果你愿意这么理解的话。

What we do in one of these papers is allow the opponent to actually take different strategies at the leaf of the search tree, if you will.

Speaker 1

这是一种不同的处理方式,因此它不会预设对手的特定玩法。

And that is a different way of doing it and it doesn't assume therefore a particular way that the opponent plays.

Speaker 1

但它允许对手从一组不同的延续策略中进行选择。

But it allows the opponent to choose from a set of different continuation strategies.

Speaker 1

这迫使我们在前瞻搜索中不能过于乐观。

And that forces us to not be too optimistic in a look ahead search.
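
Editor's sketch: the idea of letting the opponent pick among continuation strategies at a leaf can be illustrated with a toy evaluation. This is not the algorithm from the paper; it only shows why fixing a single continuation can be too optimistic. All names and payoff numbers are hypothetical, and a two-player zero-sum setting is assumed.

```python
def expected_value(values_for_continuation, our_hand_probs):
    """Our expected payoff at the leaf, averaged over the hands we may hold."""
    return sum(our_hand_probs[hand] * v for hand, v in values_for_continuation.items())


def leaf_value(values, our_hand_probs):
    """The opponent picks the continuation that is worst for us in expectation."""
    worst = min(values, key=lambda c: expected_value(values[c], our_hand_probs))
    return worst, expected_value(values[worst], our_hand_probs)


# Hypothetical leaf: our payoff for each opponent continuation and each of our hands.
VALUES = {
    "balanced": {"strong": 1.0, "weak": -0.2},
    "aggressive": {"strong": 1.5, "weak": -1.0},
    "passive": {"strong": 0.4, "weak": 0.1},
}
OUR_HAND_PROBS = {"strong": 0.3, "weak": 0.7}

print(leaf_value(VALUES, OUR_HAND_PROBS))
```

If the search had assumed the opponent always continues with the "balanced" strategy, it would have scored this leaf higher than the value the opponent can actually hold us to.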

Speaker 1

这是在信息不完整的游戏中实现可靠前瞻搜索的一种方法,这非常困难。

And that's one way you can do sound look ahead search in imperfect information games, which is very difficult.

Speaker 1

你刚才问到了DeepStack。

And you were asking about DeepStack.

Speaker 1

他们的做法与我们截然不同,无论是在Libratus还是这项新工作中。

What they did was very different from what we do, either in Libratus or in this new work.

Speaker 1

他们随机生成游戏中的各种情境。

They were randomly generating various situations in the game.

Speaker 1

然后他们从那里开始前瞻到游戏结束,就像那是另一个游戏的开始。

Then they were doing the look ahead from there to the end of the game as if that was the start of a different game.

Speaker 1

当时他们用深度学习来学习那些状态值,但

And then they were using deep learning to learn those values of those states, but

Speaker 1

这些状态不仅仅是物理状态,还包括信念分布。

the states were not just the physical states, they include belief distributions.

Speaker 0

当你谈论DeepStack或Libratus的前瞻策略时,是否意味着要考虑游戏可能演化的所有可能性?

When you talk about look ahead for DeepStack or with Libratus, does it mean considering every possibility that the game can evolve?

Speaker 0

我们是否在讨论这种类似指数级增长的树状结构?

Are we talking about extremely sort of this exponential growth of a tree?

Speaker 1

是的。

Yes.

Speaker 1

所以我们讨论的正是这个。

So we're talking about exactly that.

Speaker 1

就像你在α-β剪枝搜索或蒙特卡洛树搜索中做的那样,只是采用了不同技术。

Much like you do in alpha beta search or Monte Carlo tree search, but with different techniques.

Speaker 1

这里使用了一种不同的搜索算法,然后我们需要以不同方式处理叶子节点。

So there's a different search algorithm, and then we have to deal with the leaves differently.

Speaker 1

如果你想想Libratus的做法,我们不需要担心这个,因为我们只在游戏结束时进行。

So if you think about what Libratus did, we didn't have to worry about this because we only did it at the end of the game.

Speaker 1

我们总是会终止于真实情境,并且知道最终收益是多少。

So we would always terminate into a real situation and we would know what the payout is.

Speaker 1

它没有做这些深度受限的前瞻。

It didn't do these depth limited lookaheads.

Speaker 1

但现在这篇名为《不完全信息游戏的深度受限搜索》的新论文中,我们实际上可以实现合理的深度受限前瞻策略。

But now in this new paper, which is called Depth-Limited Search for Imperfect-Information Games, we can actually do sound depth-limited lookaheads.

Speaker 1

所以我们可以从游戏一开始就进行前瞻,因为对整个长局游戏来说这样做太复杂了。

So we can actually start to do the lookahead from the beginning of the game on because that's too complicated to do for this whole long game.

Speaker 1

所以在Libratus中我们只针对终局进行前瞻。

So in Libratus we were just doing it for the end.

Speaker 0

然后是另一方面,这种信念分布。

And then the other side, this belief distribution.

Speaker 0

那么对手可能持有的信念类型是被明确建模的吗?

So is it explicitly modeled what kind of beliefs that the opponent might have?

Speaker 1

是的,这是被明确建模的,但不是假设的。

Yeah, it is explicitly modelled but it's not assumed.

Speaker 1

这些信念实际上是输出,而非输入。

The beliefs are actually output, not input.

Speaker 1

当然初始信念是输入,但它们只是根据游戏规则产生的,因为我们知道发牌者是从牌堆中均匀发牌的。

Of course the starting beliefs are input, but they just follow from the rules of the game, because we know that the dealer deals uniformly from the deck.

Speaker 1

所以我知道你可能持有的每一对牌的概率都是相等的。

So I know that every pair of cards that you might have is equally likely.

Speaker 1

这一点我很清楚。

I know that for a fact.

Speaker 1

这只是游戏规则的必然结果。

That just follows from the rules of the game.

Speaker 1

当然,除了我手里的两张牌。

Of course, except the two cards that I have.

Speaker 1

我知道你没有那两张牌。

I know you don't have those.

Speaker 1

你必须把这一点考虑进去。

You have to take that into account.

Speaker 1

这叫做牌面排除,非常重要。

That's called card removal and that's very important.
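
Editor's sketch: the starting belief described above, uniform over the opponent's possible hole cards with card removal applied, can be enumerated directly. The card encoding (rank plus suit letter) and the example hand are assumptions made for illustration.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "shdc"
DECK = [rank + suit for rank in RANKS for suit in SUITS]


def opponent_prior(my_cards):
    """Uniform belief over the opponent's two hole cards, excluding my cards."""
    remaining = [card for card in DECK if card not in my_cards]
    hands = list(combinations(remaining, 2))
    probability = 1.0 / len(hands)
    return {hand: probability for hand in hands}


prior = opponent_prior(("As", "Kd"))
print(len(prior))
```

Holding two cards leaves 50 unseen cards, so the opponent has 1,225 equally likely two-card hands at the start instead of the 1,326 possible from a full deck.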

Speaker 0

发牌总是来自同一副牌吗?

Is the dealing always coming from a single deck?

Speaker 0

在单挑中?

In a heads up?

Speaker 1

你可以假设是单副牌。

You can assume a single deck.

Speaker 1

所以如果我手里有黑桃A,我就知道你没有黑桃A。

So if I have the ace of spades, I know you don't have the ace of spades.

Speaker 1

好吧,

Okay,

Speaker 0

太好了。

great.

Speaker 0

所以最初你的信念基础是发牌是公平的,但你如何开始调整这个信念呢?

So in the beginning your belief is basically the fact that it's a fair dealing of hands, but how do you start to adjust that belief?

Speaker 1

这就是博弈论的妙处所在。

Well, that's where this beauty of game theory comes.

Speaker 1

纳什均衡由约翰·纳什在1950年提出,它定义了在多玩家情况下的理性博弈行为。

So, Nash equilibrium, which John Nash introduced in 1950, defines what rational play is when you have more than one player.

Speaker 1

这些策略组合中,每个策略都是针对每位玩家的应变计划。

And these are pairs of strategies where strategies are contingency plans, one for each player.

Speaker 1

因此,在对方不偏离的前提下,任何玩家都不愿单方面改变策略。

So neither player wants to deviate to a different strategy given that the other doesn't deviate.
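
Editor's sketch: the "neither player wants to deviate" condition can be checked mechanically in a small matrix game. Rock-paper-scissors is used here as a familiar two-player zero-sum example; the function names are hypothetical.

```python
# Row player's payoff; the column player receives the negative (zero-sum).
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def row_values(col_strategy):
    """Expected payoff of each pure row action against a mixed column strategy."""
    return [sum(p * q for p, q in zip(row, col_strategy)) for row in PAYOFF]


def col_values(row_strategy):
    """Expected payoff of each pure column action against a mixed row strategy."""
    return [-sum(row_strategy[i] * PAYOFF[i][j] for i in range(3)) for j in range(3)]


def is_nash(row_strategy, col_strategy, tol=1e-9):
    """True if no unilateral deviation improves either player's expected payoff."""
    row_now = sum(p * v for p, v in zip(row_strategy, row_values(col_strategy)))
    col_now = sum(q * v for q, v in zip(col_strategy, col_values(row_strategy)))
    return (max(row_values(col_strategy)) <= row_now + tol
            and max(col_values(row_strategy)) <= col_now + tol)


UNIFORM = [1 / 3, 1 / 3, 1 / 3]
print(is_nash(UNIFORM, UNIFORM))
print(is_nash([1.0, 0.0, 0.0], UNIFORM))
```

It is enough to compare against pure deviations, because the best mixed deviation can never do better than the best pure one.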

Speaker 1

但作为副作用,你会通过贝叶斯法则获得信念体系。

But as a side effect, you get the beliefs from Bayes rule.

Speaker 1

所以纳什均衡并非仅适用于这些不完全信息博弈。

So, in these imperfect-information games,

Speaker 1

纳什均衡不仅定义策略,还为我们双方定义了信念体系。

Nash equilibrium doesn't just define strategies, it also defines beliefs for both of us.

Speaker 1

它还为每个状态定义了信念。

And it defines beliefs for each state.

Speaker 1

每个状态(他们称为信息集),在博弈的每个信息集中,都存在一系列我们可能处于的不同状态,但我无法确定具体是哪一个。

So at each state, or what they call information sets, at each information set in the game, there's a set of different states that we might be in, but I don't know which one we're in.

Speaker 1

纳什均衡精确地告诉我,在我的认知中,这些真实世界状态的概率分布。

Nash equilibrium tells me exactly what is the probability distribution over those real world states in my mind.

Speaker 0

纳什均衡是如何给出这种概率分布的?

How does Nash equilibrium give you that distribution?

Speaker 1

我来举个简单例子。

I'll do a simple example.

Speaker 1

你知道石头剪刀布这个游戏吗?

You know the game Rock, Paper, Scissors?

Speaker 1

我们可以这样描述:玩家一先行动,然后玩家二行动。

We can draw it as player one moves first and then player two moves.

Speaker 1

但当然,关键在于玩家二不知道玩家一采取了什么行动。

But of course, it's important that player two doesn't know what player one moved.

Speaker 1

否则玩家二每次都能赢。

Otherwise player two would win every time.

Speaker 1

因此我们可以将其描述为一个信息集:玩家一首先在三个行动中选择一个,然后玩家二面对一个信息集,这样玩家二就不知道世界处于哪个节点上。

So we can draw that as an information set where player one makes one of three moves first and then there's an information set for player two, so player two doesn't know which of those nodes the world is in.

Speaker 1

但一旦我们知道了玩家一的策略,纳什均衡会告诉你以三分之一概率出石头、三分之一出布、三分之一出剪刀。

But once we know the strategy for player one, Nash equilibrium will say that you play one-third rock, one-third paper, one-third scissors.

Speaker 1

由此我可以在信息集上推导出我的信念:各三分之一概率。

From that I can derive my beliefs on the information set that they're one third, one third, one third.
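
Editor's sketch: the step from a known strategy to beliefs at an information set is Bayes' rule applied to the probability of reaching each node. The rock-paper-scissors numbers follow the example above; the betting example and its numbers are hypothetical.

```python
def beliefs_at_information_set(reach_probabilities):
    """Bayes' rule: normalize the probability of reaching each node in the set."""
    total = sum(reach_probabilities.values())
    if total == 0:
        raise ValueError("information set is never reached under this strategy")
    return {node: p / total for node, p in reach_probabilities.items()}


# Player one's equilibrium strategy gives the reach probability of each node.
rps_beliefs = beliefs_at_information_set(
    {"rock": 1 / 3, "paper": 1 / 3, "scissors": 1 / 3}
)
print(rps_beliefs)

# Hypothetical poker-style case: prior over the opponent's hand type times the
# probability that their strategy bets with that hand type.
prior = {"strong": 0.5, "weak": 0.5}
bet_probability = {"strong": 0.9, "weak": 0.3}
after_bet = beliefs_at_information_set(
    {hand: prior[hand] * bet_probability[hand] for hand in prior}
)
print(after_bet)
```

In the second case the belief shifts toward the strong hand after a bet, without any data about a specific opponent; it follows only from the assumed strategy.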

Speaker 0

所以贝叶斯定理给出了这个结论。

So Bayes gives you that.

Speaker 0

但这是特定于某个玩家的吗?还是说你会根据这些特定情况快速更新?

But is that specific to a particular player or is it something you quickly update with those specific

Speaker 1

博弈论其实并不针对特定玩家。

Game theory isn't really player specific.

Speaker 1

这也是为什么我们不需要任何数据。

So that's also why we don't need any data.

Speaker 1

我们不需要这些特定人类过去如何玩的历史记录,也不需要任何AI或人类之前的玩法记录。

We don't need any history of how these particular humans played in the past or how any AI or human had played before.

Speaker 1

这完全关乎理性。

It's all about rationality.

Speaker 1

所以AI只需要思考一个理性的对手会怎么做,以及如果我是理性的我会怎么做。

So the AI just thinks about what would a rational opponent do and what would I do if I am rational.

Speaker 1

这就是博弈论的核心思想。

That's the idea of game theory.

Speaker 1

所以这实际上是一种无需数据、无需对手的

So it's really a data free, opponent free

Speaker 1

方法。

approach.

Speaker 0

因此它源自游戏设计本身,而非玩家设计。

So it comes from the design of the game as opposed to the design of the player.

Speaker 1

完全正确。

Exactly.

Speaker 1

本身并没有对手建模。

There's no opponent modelling per se.

Speaker 1

我是说,我们做了一些将对手建模与博弈论结合的工作,这样你可以更充分地利用弱势玩家。

I mean, we've done some work on combining opponent modelling with game theory so you can exploit weak players even more.

Speaker 1

但那是另一个方向了。

But that's another strand.

Speaker 1

在Libratus中我们没有启用这个功能,因为我认定这些玩家太强了,当你开始利用对手时,通常也会让自己暴露在被利用的风险中。

And in Libratus we didn't turn that on, because I decided that these players are too good, and when you start to exploit an opponent, you typically open yourself up to exploitation.

Speaker 1

而且这些家伙几乎没有漏洞可钻,他们是反利用领域的全球顶尖专家。

And these guys have so few holes to exploit and they're world's leading experts in counter exploitation.

Speaker 1

所以我决定不启用那些功能。

So I decided that we're not going to turn that stuff on.

Speaker 0

实际上,看过几篇你关于利用对手弱点的论文。

Actually, I saw a few of your papers on exploiting opponents.

Speaker 0

这个话题探索起来非常有趣。

It sounded very interesting to explore.

Speaker 0

你认为在Libratus之外是否普遍存在可利用的空间?

Do you think there's room for exploitation generally, outside of Libratus?

Speaker 0

是否存在某些主题或个体差异可以被利用?或许不仅限于扑克,还包括日常互动、谈判等你正在考虑的其他领域?

Are there subject or people differences that could be exploited, maybe not just in poker, but in general interactions and negotiations, all these other domains that you're considering?

Speaker 1

是的,当然存在。

Yeah, definitely.

Speaker 1

我们在这方面做过一些研究,我特别喜欢将两者结合的工作。

We've done some work on that and I really like the work that hybridizes the two.

Speaker 1

你要先推演出理性对手会怎么做。

So you figure out what would a rational opponent do.

Speaker 1

顺便说,这在零和博弈(双人零和游戏)中是安全的——因为即使对手做出非理性行为,虽然可能干扰我的判断,但对方通过干扰获得的收益永远小于其因拙劣策略造成的损失。

And by the way, that's safe in these zero sum games, two player zero sum games, because if the opponent does something irrational, yes, it might throw off my beliefs, but the amount that the player can gain by throwing off my belief is always less than they lose by playing poorly.

Speaker 1

所以这是安全的。

So it's safe.
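
The safety claim above can be checked on a toy example. The following is a minimal sketch, assuming a made-up 2x3 zero-sum payoff matrix (not taken from the conversation or from Libratus): the row player's equilibrium mix guarantees the game value against every opponent strategy, and an opponent who deviates to a weak column only increases the row player's payoff.

```python
# Toy zero-sum matrix game; the payoffs are hypothetical, chosen for illustration.
# PAYOFF[i][j] is the payoff to the row player (us); the opponent gets the negative.
PAYOFF = [
    [2.0, -1.0, 3.0],
    [-1.0, 1.0, 2.0],
]

# Maximin (equilibrium) mix for the row player, solved by hand:
# 3p - 1 = 1 - 2p gives p = 0.4, and the value of the game is 0.2.
EQUILIBRIUM_ROW = [0.4, 0.6]
GAME_VALUE = 0.2


def expected_payoff(row_mix, col_mix):
    """Expected payoff to the row player for two mixed strategies."""
    return sum(
        row_mix[i] * col_mix[j] * PAYOFF[i][j]
        for i in range(len(row_mix))
        for j in range(len(col_mix))
    )


def worst_case(row_mix):
    """Lowest payoff any opponent can hold us to; checking pure columns suffices."""
    n_cols = len(PAYOFF[0])
    pure_columns = [
        [1.0 if j == k else 0.0 for j in range(n_cols)] for k in range(n_cols)
    ]
    return min(expected_payoff(row_mix, col) for col in pure_columns)


if __name__ == "__main__":
    print("guaranteed payoff:", worst_case(EQUILIBRIUM_ROW))
    # An opponent who always plays the weak third column only helps us.
    print("against a weak opponent:", expected_payoff(EQUILIBRIUM_ROW, [0.0, 0.0, 1.0]))
```

The guarantee is what makes the equilibrium strategy "safe": it does not depend on any model of the opponent.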

Speaker 1

但如果对手实力较弱,你可能会调整策略来更大程度地利用其弱点。

But still, if somebody is weak as a player, you might want to play differently to exploit them more.

Speaker 1

可以这样理解:博弈论策略虽无法被击败,但也不能最大化地战胜其他对手。

So you can think about it this way, a game theoretic strategy is unbeatable, but it doesn't maximally beat the other opponents.

Speaker 1

因此采用不同策略时,单局收益可能会更高。

So the winnings per hand might be better with a different strategy.

Speaker 1

这种混合策略是指,你首先采用博弈论的方法,然后随着你在游戏树的某些部分获得对手的数据,你开始在这些部分逐渐调整策略,使其更偏向于利用对手,同时仍保持与博弈论策略相当接近,以避免自己过度暴露于被利用的风险。

And the hybrid is that you start from a game theoretic approach and then as you gain data about the opponent in certain parts of the game tree, then in those parts of the game tree, you start to tweak your strategy more and more towards exploitation while still staying fairly close to the game theoretic strategy so as to not open yourself up to exploitation too much.
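
One way to picture the hybrid idea is the sketch below. It is an illustrative toy for rock-paper-scissors, not the method from Sandholm's papers: the blending rule and the `max_shift` and `confidence_scale` parameters are assumptions made for this example. The strategy starts at the equilibrium mix and moves toward a best response as observations accumulate, but the move is capped so the strategy stays close to equilibrium.

```python
ACTIONS = ("rock", "paper", "scissors")
EQUILIBRIUM = {a: 1.0 / 3.0 for a in ACTIONS}
# WINS_AGAINST[x] is the action that x beats.
WINS_AGAINST = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0.0
    return 1.0 if WINS_AGAINST[mine] == theirs else -1.0


def best_response(opponent_mix):
    """Pure action with the highest expected payoff against the modeled mix."""
    return max(
        ACTIONS,
        key=lambda a: sum(opponent_mix[b] * payoff(a, b) for b in ACTIONS),
    )


def hybrid_strategy(opponent_counts, max_shift=0.2, confidence_scale=50.0):
    """Blend the equilibrium with a best response to the observed opponent.

    max_shift caps how far we ever move away from equilibrium (hypothetical knob);
    confidence_scale controls how quickly data earns trust (hypothetical knob).
    """
    total = sum(opponent_counts.get(a, 0) for a in ACTIONS)
    if total == 0:
        return dict(EQUILIBRIUM)  # no data yet: play pure game theory
    opponent_mix = {a: opponent_counts.get(a, 0) / total for a in ACTIONS}
    target = best_response(opponent_mix)
    shift = max_shift * total / (total + confidence_scale)
    return {
        a: (1.0 - shift) * EQUILIBRIUM[a] + (shift if a == target else 0.0)
        for a in ACTIONS
    }


if __name__ == "__main__":
    print(hybrid_strategy({}))
    print(hybrid_strategy({"rock": 900, "paper": 50, "scissors": 50}))
```

Capping the shift is the design choice that matters: a larger cap wins more from a weak opponent but gives a strong opponent more to counter-exploit.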

Speaker 0

你是怎么做到的?

How do you do that?

Speaker 0

你会尝试变换策略,使其难以预测吗?

Do you try to vary up strategies, make it unpredictable?

Speaker 0

就像是囚徒困境中的以牙还牙策略,还是——

It's like, what is it, tit for tat strategies in Prisoner's Dilemma or

Speaker 1

嗯,那属于重复博弈中比较简单的类型。

Well, that's a repeated game kind of simple.

Speaker 0

重复

Repeated

Speaker 1

博弈,囚徒困境就是重复博弈的一种。

games. Prisoner's Dilemma, that's repeated games.

Speaker 1

但即便如此,也没有证据表明那是最优策略。

But even there, there's no proof that says that that's the best thing.

Speaker 1

不过实验证明,它的表现确实不错。

But experimentally, it actually does well.
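
For readers unfamiliar with tit-for-tat, here is a minimal sketch of the repeated Prisoner's Dilemma. The payoff numbers (5, 3, 1, 0) are the conventional textbook values, assumed here rather than taken from the conversation. Tit-for-tat cooperates first and then copies the opponent's previous move.

```python
# (payoff to A, payoff to B) for each pair of moves; "C" = cooperate, "D" = defect.
# These are conventional textbook values, assumed for illustration.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(my_history, their_history):
    """Cooperate on the first round, then mirror the opponent's last move."""
    return their_history[-1] if their_history else "C"


def always_defect(my_history, their_history):
    """Defect unconditionally."""
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play the repeated game and return the two total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print("TFT vs TFT:", play(tit_for_tat, tit_for_tat))
    print("TFT vs always defect:", play(tit_for_tat, always_defect))
```

Tit-for-tat loses narrowly to a pure defector in a single match but earns far more when paired with another cooperative strategy, which is consistent with the remark that it does well experimentally without being provably best.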

Speaker 1

那么具体有哪些

So what kind

Speaker 0

类型的博弈呢?首先——

of games are there, first of all?

Speaker 0

我不确定这是否能用三言两语概括。

I don't know if this is something that you could just summarize.

Speaker 0

所以存在完全信息博弈,所有信息都摆在台面上。

So there's perfect information games where all the information's on the table.

Speaker 0

也存在不完全信息博弈。

There is imperfect information games.

Speaker 0

还有需要反复进行的重复博弈。

There's repeated games that you play over and over.

Speaker 0

以及零和博弈。

There's zero sum games.

Speaker 0

嗯。

Mhmm.

Speaker 0

还有非零和博弈。

There's non zero sum games.

Speaker 0

对。

Yeah.

Speaker 0

你提出了一个非常重要的区分:双人博弈与多人博弈。

And there's a really important distinction you're making two player versus more players.

Speaker 0

嗯。

Mhmm.

Speaker 0

那么还有哪些其他类型的博弈?比如双人博弈和多人博弈有什么区别?

So what other games are there, and what's the difference, for example, with this two player game versus more players?

Speaker 0

关键区别在于...

What are the key differences in So the

Speaker 1

让我从基础开始讲。

let me start from the basics.

Speaker 1

重复博弈是指完全相同的游戏被反复进行的情形。

A repeated game is a game where the same exact game is played over and over.

Speaker 1

在这些扩展形式的博弈中,想象树状结构,可能带有这些信息集来表示不完全信息。

In these extensive form games, think about tree form, maybe with these information sets to represent incomplete information.

Speaker 1

你可以有某种重复性的互动。

You can have kind of repetitive interactions.

Speaker 1

顺便说一句,重复博弈其实也是其中的一个特例。

Even repeated games are a special case of that, by the way.

Speaker 1

但游戏不必完全相同。

But the game doesn't have to be exactly the same.

Speaker 1

就像在采购拍卖中那样。

It's like in sourcing auctions.

Speaker 1

是的,我们每年会面对相同的供应商基础,但每次采购的内容略有不同,供应商基础也每次都有细微变化,诸如此类。

Yes, we're going to see the same supply base year to year, but what I'm buying is a little different every time and the supply base is a little different every time and so on.

Speaker 1

所以这并不算真正的重复博弈。

So it's not really repeated.

Speaker 1

因此在现实世界中,找到纯粹的重复博弈实际上非常罕见。

So to find a purely repeated game is actually very rare in the world.

Speaker 1

所以它们其实是对实际情况非常粗略的模型。

So they're really a very coarse model of what's going on.

Speaker 1

然后,如果你从简单的重复矩阵博弈往上延伸——不是直接到扩展形式博弈,而是介于两者之间——还有随机博弈,你可以将其视为这些小矩阵博弈,当你和对手采取行动时,它们不仅决定下一步会进入哪个状态或哪个博弈,而是决定可能进入的下一个博弈的概率分布。

Then, if you move up from just repeated, simple repeated matrix games, not all the way to extensive form games, but in between, there's stochastic games, where you think about it like these little matrix games and when you take an action and your opponent takes an action, they determine not which next state I'm going to, next game I'm going to, but the distribution over next games where I might be going to.

Speaker 1

这就是随机博弈。

So that's the stochastic game.

Speaker 1

所以依次是:矩阵博弈、重复博弈、随机博弈、扩展形式博弈。

But it's like matrix games, repeated games, stochastic games, extensive form games.

Speaker 1

这是从具体到一般的演变。

That is from less to more general.

Speaker 1

扑克就是后者的一个例子。

Poker is an example of the last one.

Speaker 1

所以它确实处于最一般的设定中。

So it's really in the most general setting.

Speaker 1

扩展形式博弈。

Extensive form games.

Speaker 1

这某种程度上正是AI社区一直在研究、并以单挑无限注德州扑克作为基准来检验的内容。

And that's kind of what the AI community has been working on and being benchmarked on with this heads-up no-limit Texas Hold'em.

Speaker 0

你能描述一下扩展形式博弈吗?

Can you describe extensive form games?

Speaker 0

这里的模型是什么?

What's the model here?

Speaker 1

是的,如果你熟悉树形结构的话。

Yeah, so if you're familiar with the tree form.

Speaker 1

所以它本质上是树形结构。

So it's really the tree form.

Speaker 1

就像国际象棋中存在搜索树一样。

Like in chess, there's a search tree.

Speaker 1

与矩阵形式相对。

Versus a matrix.

Speaker 1

对阵矩阵,是的。

Versus a matrix, yeah.

Speaker 1

这种矩阵被称为矩阵形式、双矩阵形式或标准型博弈。

The matrix is called the matrix form or bimatrix form or normal form game.

Speaker 1

而这里你看到的是树形结构,在这种形式下可以进行某些类型的推理,但转换为标准型时会丢失信息。

And here you have the tree form, so you can actually do certain types of reasoning there that you lose the information when you go to normal form.

Speaker 1

存在某种形式的等价性,比如从树形结构出发,如果把每个可能的应急方案都视为策略,那么实际上可以回归到标准型,但会因缺乏时序性而丢失部分信息。

There's a certain form of equivalence, like if you go from tree form and you say every possible contingency plan is a strategy, then I can actually go back to the normal form, but I lose some information from the lack of sequentiality.
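
The "every contingency plan is a strategy" conversion can be sketched as follows. The information sets and action names below are hypothetical, chosen only to show the counting: a pure strategy picks one action at every information set, so the normal form has one row per combination, and the number of rows grows exponentially with the number of information sets.

```python
import math
from itertools import product

# Hypothetical player with three information sets and the actions available
# at each one. The names are illustrative, not a real poker abstraction.
INFO_SETS = {
    "first_decision": ["fold", "call", "raise"],
    "second_decision": ["check", "bet"],
    "third_decision": ["check", "bet"],
}


def pure_strategies(info_sets):
    """Enumerate every contingency plan: one action chosen per information set."""
    names = list(info_sets)
    return [
        dict(zip(names, choice))
        for choice in product(*(info_sets[name] for name in names))
    ]


def count_pure_strategies(info_sets):
    """Number of rows in the normal form, computed without enumerating them."""
    return math.prod(len(actions) for actions in info_sets.values())


if __name__ == "__main__":
    plans = pure_strategies(INFO_SETS)
    print(len(plans), "contingency plans, for example:", plans[0])
    # Twenty binary information sets already give over a million plans.
    big = {f"set_{i}": ["a", "b"] for i in range(20)}
    print(count_pure_strategies(big))
```

The flattened table also discards the order in which decisions are made, which is the loss of sequential information mentioned above.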

Speaker 1

多人博弈与双人博弈的区分是个重要概念。

Then the multiplayer versus two player distinction is an important one.

Speaker 1

因此零和博弈中的双人游戏在概念上和计算上都更简单。

So two player games in zero sum are conceptually easier and computationally easier.

Speaker 1

虽然像这个例子规模仍然很大,但在概念和计算层面确实更简单。

They're still huge like this one, but they're conceptually easier and computationally easier.

Speaker 1

从概念上说,当存在多个均衡时,你无需担心对方会选择哪个均衡策略。

That conceptually you don't have to worry about which equilibrium is the other guy going to play when there are multiple.

Speaker 1

因为任何均衡策略都是对其他均衡策略的最佳回应。

Because any equilibrium strategy is the best response to any other equilibrium strategy.

Speaker 1

所以我可以选择与你不同的均衡策略,我们仍能得到正确的博弈值。

So I can play a different equilibrium from you and we'll still get the right values of the game.

Speaker 1

这个特性在双人非零和博弈中就会失效。

That falls apart even with two players when you have general sum games.

Speaker 0

即使不考虑合作因素,也要说明这一点。

Even without cooperation, just to say.

Speaker 1

即便没有合作。

Even without cooperation.

Speaker 1

从两人零和博弈到两人非零和博弈,甚至到三人零和博弈,存在巨大差异。

So there's a big gap from two player zero sum to two player general sum or even to three player zero sum.

Speaker 1

至少在理论上,这是个巨大的鸿沟。

That's big gap, at least in theory.

Speaker 0

你能用非数学的方式直观解释为什么三人及以上玩家时体系就会崩溃吗?

Can you maybe non mathematically provide the intuition why it all falls apart with three or more players?

Speaker 0

看起来应该仍能存在一个纳什均衡点,既有指导意义又能保持稳定。

It seems like you should still be able to have a Nash equilibrium that's instructive, that holds.

Speaker 1

好的。

Okay.

Speaker 1

确实,所有有限博弈都存在纳什均衡点。

So it is true that all finite games have a Nash equilibrium.

Speaker 1

这就是约翰·纳什实际证明的内容。

So this is what John Nash actually proved.

Speaker 1

所以它们确实存在纳什均衡。

So, they do have a Nash equilibrium.

Speaker 1

问题不在这里。

That's not the problem.

Speaker 1

问题在于可能存在多个均衡点。

The problem is that there can be many.

Speaker 1

于是就面临选择哪个均衡点的问题。

And then there's a question of which equilibrium to select.

Speaker 1

如果你从不同的均衡中选择策略,而我选择我的策略,那意味着什么?

And if you select your strategy from a different equilibrium and I select mine, then what does that mean?

Speaker 1

在这些非零和博弈中,我们可能会因为愚蠢而失去一些共同利益。

And in these non zero sum games, we may lose some joint benefit by being just simply stupid.

Speaker 1

如果我们采取其他行动,实际上双方都能获得更好的结果。

We could actually both be better off if we did something else.
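
The equilibrium selection problem shows up even in a tiny two-player general-sum game. The sketch below uses a hypothetical coordination game (the payoffs are made up for illustration): it has two pure Nash equilibria, and if each player picks their action from a different equilibrium, both end up with nothing.

```python
# Payoffs (row player, column player) in a hypothetical coordination game.
GAME = {
    ("A", "A"): (2, 2),
    ("A", "B"): (0, 0),
    ("B", "A"): (0, 0),
    ("B", "B"): (1, 1),
}
ACTIONS = ("A", "B")


def is_pure_nash(row, col):
    """True if neither player can gain by unilaterally switching actions."""
    row_pay, col_pay = GAME[(row, col)]
    row_ok = all(GAME[(r, col)][0] <= row_pay for r in ACTIONS)
    col_ok = all(GAME[(row, c)][1] <= col_pay for c in ACTIONS)
    return row_ok and col_ok


def pure_equilibria():
    """All pure-strategy Nash equilibria of the game."""
    return [(r, c) for r in ACTIONS for c in ACTIONS if is_pure_nash(r, c)]


if __name__ == "__main__":
    equilibria = pure_equilibria()
    print("equilibria:", equilibria)
    # Row player follows the first equilibrium, column player the second.
    row_action = equilibria[0][0]
    col_action = equilibria[1][1]
    print("mismatched play:", GAME[(row_action, col_action)])
```

In a two-player zero-sum game this mismatch cannot hurt, because equilibrium strategies are interchangeable there; that property is what is lost here.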

Speaker 1

是的。

Yes.

Speaker 1

在三人游戏中,还会出现诸如共谋等其他问题。

And in three player you get other problems also like collusion.

Speaker 1

比如也许你我可以联合起来对付第三方玩家,通过共谋我们能取得显著优势。

Like maybe you and I can gang up on a third player and we can do radically better by colluding.

Speaker 1

因此这里会出现很多复杂问题。

So there are lots of issues that come up there.

Speaker 0

与你合作的学生Noam Brown在Reddit的AMA中提到,我查阅了相关内容,他表示扑克玩家间的协作能力将改变游戏格局。

So Noam Brown, the student you worked with on this, has mentioned, I looked through the AMA on Reddit, he mentioned that the ability of poker players to collaborate will make the game.

Speaker 0

有人提问你们将如何设计扑克游戏,使其超越当前AI方法的可解范围?

He was asked the question of how would you make the game of poker, or both of you were asked the question, how would you make the game of poker beyond being solvable by current AI methods?

Speaker 0

他回答说让扑克变得更难的方法不多,但玩家间的合作会使其变得极其困难。

And he said that there's not many ways of making poker more difficult, but collaboration or cooperation between players would make it extremely difficult.

Speaker 0

你能解释一下这背后的原理吗?

So can you provide the intuition behind why that is?

Speaker 0

如果你同意这个观点的话。

If you agree with that Yeah.

Speaker 1

我在博弈中的共谋问题上做了大量研究,我们在本届NIPS上就有一篇相关论文,是与我的另一位学生Gabriele Farina及其他合作者共同完成的。

So I've done a lot of work on collusion in games, and we actually have a paper here at NIPS on that with my other student, Gabriele Farina, and some other collaborators.

Speaker 1

实际上我刚从展示这项研究的海报会议回来。

Actually just came back from the poster session where we presented this.

Speaker 1

所以当出现共谋时,问题性质就不同了。

So, when you have a collusion, it's a different problem.

Speaker 1

而且通常情况会变得更加棘手。

And it typically gets even harder then.

Speaker 1

即便是博弈表示法,有些表示法确实不利于进行有效计算。

Even the game representations, some of the game representations don't really allow good computation.

Speaker 1

为此我们专门引入了一种新的博弈表示方法。

So we actually introduced a new game representation for that.

Speaker 0

这种合作行为是模型的一部分吗?

Is that kind of cooperation part of the model?

Speaker 0

你们是否掌握其他玩家正在合作的信息?

Do you have information about the fact that other players are cooperating?

Speaker 0

还是说这完全是个一无所知的混沌状态?

Or is it just this chaos where nothing is known?

Speaker 1

有些信息是未知的。

Some things unknown.

Speaker 1

你能

Can you

Speaker 0

举个共谋类博弈的例子吗?或者这类情况通常

give an example of a collusion type game or is it usually

Speaker 1

就像桥牌一样。

So like Bridge.

Speaker 1

所以想想桥牌。

So think about Bridge.

Speaker 1

就像我们同在一个团队时,我们的收益是相同的。

It's like when you and I are on a team, our payoffs are the same.

Speaker 1

问题在于我们不能交流。

The problem is that we can't talk.

Speaker 1

所以当我拿到牌时,我不能悄悄告诉你我有什么牌,那是不被允许的。

So when I get my cards, I can't whisper to you what my cards are, that would not be allowed.

Speaker 1

因此我们必须提前协调策略,而且只能提前协调。

So we have to somehow coordinate our strategies ahead of time and only ahead of time.

Speaker 1

然后我们可以讨论某些信号,但这些信号必须让对方团队也能理解。

And then there are certain signals we can talk about, but they have to be such that the other team also understands them.

Speaker 1

这就是一个协调机制已经内置于游戏规则中的例子。

So, that's an example where the coordination is already built into the rules of the game.

Speaker 1

但在许多其他情况下,比如拍卖、谈判、外交关系、扑克等,这种机制并没有内置,但对共谋者仍然非常有用。

But in many other situations like auctions or negotiations or diplomatic relationships, poker, it's not really built in, but it still can be very helpful for the colluders.

Speaker 1

我在某处读到过你,

I've read you somewhere,

Speaker 0

在谈判时,你会带着事先准备好的策略来到谈判桌前,比如你愿意做什么、不愿意做什么之类的。

when negotiations you come to the table with prior, like a strategy that you're willing to do and not willing to do, those kinds of things.

Speaker 0

那么现在如何从扑克转向其他应用领域,比如谈判?如何开始将其应用到其他领域,甚至是你曾经工作过的现实世界领域?

So how do you start to now moving away from poker, moving beyond poker into other applications like negotiations, how do you start applying this to other domains, maybe even real world domains that you've worked on?

Speaker 1

是的,我其实有两家初创公司专门做这个。

Yeah, I actually have two startup companies doing exactly that.

Speaker 1

一家叫Strategic Machine,主要做商业应用、游戏、体育等各种相关领域。

One is called Strategic Machine and that's for kind of business applications, gaming, sports, all sorts of things like that.

Speaker 1

这些技术在商业、体育、游戏以及金融、电力市场等各类领域都有应用。

Any applications of this to business and to sports and to gaming, to various types of things in finance, electricity markets and so on.

Speaker 1

另一家叫Strategy Robot,我们将这些技术应用于军事安全、网络安全和情报领域。

And the other is called Strategy Robot where we are taking these to military security, cybersecurity and intelligence applications.

Speaker 1

我想你之前做过一些...怎么说来着

I think you worked a little bit in, how do

Speaker 0

就是广告推荐这类工作,对

you put it, advertisement sort of suggesting ads kind of thing, Yes,

Speaker 1

那是另一家公司,叫Optimized Markets。

that's another company, Optimized Markets.

Speaker 0

Optimized Markets。

Optimized Markets.

Speaker 1

但那更多是关于组合市场和基于优化的技术。

But that's much more about a combinatorial market and optimisation based technology.

Speaker 1

并没有使用这些博弈论推理技术。

That's not using these game theoretic reasoning technologies.

Speaker 0

明白了。

I see.

Speaker 0

好的,那么从高层次来看,你认为我们运用博弈论概念来建模人类行为的能力如何?

Okay, so what sort of high level do you think about our ability to use game theoretic concepts to model human behavior?

Speaker 0

人类行为在扑克游戏之外也能适用这种建模方式吗?

Do you think human behavior is amenable to this kind of modeling outside of the poker games?

Speaker 0

你在工作中见过哪些成功应用案例?

And where have you seen it done successfully in your work?

Speaker 1

我不确定

I'm not sure

Speaker 1

目标是否真的在于对人类建模。

that the goal really is modeling humans.

Speaker 1

比如在零和游戏中,我其实不在乎对手是否遵循我的理性行为模型,因为如果他们不遵循,对我反而更有利。

Like, for example, if I'm playing a zero sum game, I don't really care that the opponent is actually following my model of rational behavior because if they're not, that's even better for me.

Speaker 1

对吧。

Right.

Speaker 1

所以在游戏中对阵时,前提是你需要将互动形式化

So see with the opponents in games, the prerequisite is that you formalize the

Speaker 0

为某种可分析的方式。

interaction in some way that can be amenable to analysis.

Speaker 0

你在机制设计方面做了出色工作,设计了能产生特定结果的游戏。

And you've done this amazing work with mechanism design, designing games that have certain outcomes.

Speaker 0

那我举个自动驾驶领域的例子。

But so I'll tell you an example from my from my world of autonomous vehicles.

Speaker 0

对吧?

Right?

Speaker 0

我们研究行人,行人与车辆通过这种非语言交流进行协商。

We're studying pedestrians, and pedestrians and cars negotiate in this nonverbal communication.

Speaker 0

这是一种奇怪的紧张游戏舞蹈,行人基本上在说,我相信你不会撞我。

There's this weird game dance of tension where pedestrians are basically saying, I trust that you won't kill me.

Speaker 0

因此作为一个乱穿马路者,我会走上马路尽管这是违法的,而且存在这种紧张感。

And so as a jaywalker, I will step onto the road even though I'm breaking the law and there's this tension.

Speaker 0

问题在于,在尝试模拟意图时,我们真的不知道如何很好地模拟这种情况。

And the question is, we really don't know how to model that well in in trying to model intent.

Speaker 0

所以人们有时会提出博弈论等想法。

And so people sometimes bring up ideas of game theory and so on.

Speaker 0

你认为人类行为的这种方面可以使用这类不完美信息方法进行建模吗?

Do you think that aspect of human behavior can use these kinds of imperfect information approaches, modeling?

Speaker 0

当你甚至不知道如何设计游戏来描述这种情况以便解决它时,我们该如何开始解决这样的问题?

How do we how do you start to attack a problem like that when you don't even know how the game design the game to describe the situation in order to solve it?

Speaker 1

好的。

Okay.

Speaker 1

我其实没怎么思考过乱穿马路的问题,但我认为在自动驾驶车辆中一个很好的应用场景如下。

So I haven't really thought about jaywalking, but one thing that I think could be a good application in autonomous vehicles is the following.

Speaker 1

假设你有不同公司运营的自动驾驶车队。

So let's say that you have fleets of autonomous cars operated by different companies.

Speaker 1

比如这边是Waymo车队,那边是Uber车队。

So maybe here's the Waymo fleet and here's the Uber fleet.

Speaker 1

如果你思考交通规则,它们定义了某些法律规则,但仍留下了巨大的策略空间。

If you think about the rules of the road, they define certain legal rules, but that still leaves a huge strategy space open.

Speaker 1

举个简单例子,当车辆并道时,你知道人类如何并道——他们会减速、互相观察然后尝试并道。

As a simple example, when cars merge, you know, how humans merge, you know, they slow down and look at each other and try to merge.

Speaker 1

如果这些情况都已预先协商好,让我们能够全速并道,彼此清楚当前的情形和处理方式,一切都会更快,这样不是更好吗?

Wouldn't it be better if these situations were already pre-negotiated, so we can actually merge at full speed, and we know that this is the situation, this is how we do it, and it's all going to be faster?

Speaker 1

但需要手动协商的情况实在太多了。

But there are way too many situations to negotiate manually.

Speaker 1

所以你可以采用自动化协商。

So you could use automated negotiation.

Speaker 1

至少这是个思路。

This is the idea at least.

Speaker 1

你可以用自动化协商提前处理所有这些或大部分情况。

You could use automated negotiation to negotiate all of these situations or many of them in advance.

Speaker 1

当然,可能有时候你不会总是让我先走。

And of course, it might be that hey, maybe you're not going to always let me go first.

Speaker 1

也许你会说,好吧,在这些情况下我让你先走,但作为交换你要给我更多,或者在其他情况下让我先走。

Maybe you said okay, well, in these situations I'll let you go first, but in exchange you're going to give me too much, you're going to let me go first in this situation.

Speaker 1

所以这是个庞大的组合式协商。

So, it's this huge combinatorial negotiation.

Speaker 0

你认为在那个合并案例中,是否有空间将整个情况建模为不完全信息博弈,还是你更倾向于视为完全信息?

And do you think there's room in that example of merging to model this whole situation as an imperfect information game or do you really want to consider it to be a perfect?

Speaker 1

不,这是个好问题。

No, that's a good question.

Speaker 1

对,这是个好问题。

Yeah, that's a good question.

Speaker 0

你是否要为假设自己并非全知而付出代价?

Do you pay the price of assuming that you don't know everything?

Speaker 1

是啊,我也不清楚。

Yeah, I don't know.

Speaker 1

这确实简单多了。

It's certainly much easier.

Speaker 1

完全信息博弈要容易得多。

Games with perfect information are much easier.

Speaker 1

所以如果可行的话,你应该这么做。

So if you can get away with it, you should.

Speaker 1

但如果实际情况信息不完整,那你就得处理信息不完整的问题。

But if the real situation is of imperfect information, then you're going to have to deal with imperfect information.

Speaker 0

太好了。

Great.

Speaker 0

那么你从中吸取了哪些经验教训?

So what lessons have you learned?

Speaker 0

年度计算机扑克大赛。

The annual computer poker competition.

Speaker 0

人工智能的一项惊人成就。

An incredible accomplishment of AI.

Speaker 0

你知道,回顾深蓝、AlphaGo的历史,这些时刻都是人工智能在工程与科学双重努力下超越人类顶尖选手的里程碑。

You know, you look at the history of Deep Blue, AlphaGo, these kind of moments when AI stepped up in an engineering effort and a scientific effort combined to beat the best of human players.

Speaker 0

那么从这次经历中你获得了什么?

So what do you take away from this whole experience?

Speaker 0

关于设计这类游戏的人工智能系统,你学到了什么?

What have you learned about designing AI systems that play these kinds of games?

Speaker 0

这对人工智能整体而言,对未来AI发展意味着什么?

And what does that mean for sort of AI in general for the future of AI development?

Speaker 1

是的,这是个很好的问题。

Yeah, so that's a good question.

Speaker 1

关于这方面有很多可说的。

There's so much to say about it.

Speaker 1

我确实喜欢这种以性能为导向的研究。

I do like this type of performance oriented research.

Speaker 1

虽然在我的团队里,我们从构思到理论、实验、大型系统部署再到商业化全程参与。

Although in my group we go all the way from idea to theory, experiments, to big system fielding, to commercialisation.

Speaker 1

所以我们覆盖了整个研究光谱。

So we span that spectrum.

Speaker 1

我认为在AI的很多情况下,你必须先构建大型系统并进行规模化评估,才能真正知道哪些方法有效。

I think that in a lot of situations in AI, you really have to build the big systems and evaluate them at scale before you know what works and doesn't.

Speaker 1

我们在计算博弈论领域就见过这种情况,很多技术在小规模时表现良好,但大规模时就失效了。

And we've seen that in the computational game theory community, that there are a lot of techniques that look good in the small, but then they cease to look good in the large.

Speaker 1

我们还发现很多技术在理论上看起来更优越——我指的是像一阶方法这样的收敛速度。

And we've also seen that there are a lot of techniques that look superior in theory, and I really mean in terms of convergence rates, like first order methods.

Speaker 1

它们的收敛速度优于基于CFR的算法,但实际上基于CFR的算法才是最快的。

They have better convergence rates than the CFR-based algorithms, yet the CFR-based algorithms are the fastest in practice.
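
CFR (counterfactual regret minimization) is built on regret matching applied at each decision point. The sketch below is a toy version, assuming plain regret matching in self-play on rock-paper-scissors rather than full CFR on a game tree; the biased starting regrets and the iteration count are arbitrary choices for this example. Each player plays actions in proportion to positive accumulated regret, and it is the average strategy over all iterations, not the final one, that carries the convergence guarantee.

```python
NUM_ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

# PAYOFF[i][j]: payoff to player 1 when player 1 plays i and player 2 plays j.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Regret matching: play in proportion to positive regret, else uniformly."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / NUM_ACTIONS] * NUM_ACTIONS
    return [p / total for p in positive]


def self_play(iterations=20000):
    """Run regret matching for both players; return their average strategies."""
    # Start player 1 with a bias toward rock so the dynamics are not trivial.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * NUM_ACTIONS, [0.0] * NUM_ACTIONS]
    for _ in range(iterations):
        s1 = strategy_from_regrets(regrets[0])
        s2 = strategy_from_regrets(regrets[1])
        # Expected payoff of each pure action against the opponent's current mix.
        u1 = [sum(PAYOFF[i][j] * s2[j] for j in range(NUM_ACTIONS))
              for i in range(NUM_ACTIONS)]
        u2 = [sum(-PAYOFF[i][j] * s1[i] for i in range(NUM_ACTIONS))
              for j in range(NUM_ACTIONS)]
        v1 = sum(s1[a] * u1[a] for a in range(NUM_ACTIONS))
        v2 = sum(s2[a] * u2[a] for a in range(NUM_ACTIONS))
        for a in range(NUM_ACTIONS):
            regrets[0][a] += u1[a] - v1  # how much better action a would have done
            regrets[1][a] += u2[a] - v2
            strategy_sums[0][a] += s1[a]
            strategy_sums[1][a] += s2[a]
    return [[x / iterations for x in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, average in enumerate(self_play(), start=1):
        print("player", player, "average strategy:", average)
```

The worst-case bound for this update shrinks on the order of one over the square root of the number of iterations, which is the slower theoretical rate referred to above; the remark in the conversation is that such methods are nevertheless the fastest in practice.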

Speaker 1

这确实告诉我必须在现实中测试这些方法。

So it really tells me that you have to test this in reality.

Speaker 1

可以说现有理论还不够完善,无法准确判断哪种算法更优越。

The theory isn't tight enough, if you will, to tell you which algorithms are better than the others.

Speaker 1

你必须从宏观角度看待这些问题,因为在这个领域,任何从小范围做出的预测都极可能产生误导。

And you have to look at these things in the large because any sort of projections you do from the small can at least in this domain be very misleading.

Speaker 1

这是从科学与工程的角度来看。

So that's from a science and engineering perspective.

Speaker 1

从个人角度而言,我们组织的首次扑克比赛——人脑对决AI的人机扑克大赛,真是一次疯狂的经历。

From a personal perspective, it's been just a wild experience with the first poker competition, the first brains versus AI, man machine poker competition that we organised.

Speaker 1

顺便说一句,其他类型的扑克比赛之前有过,但这是首次举办单挑无限注的比赛。

By the way, for other poker games there had been previous competitions, but this was for heads-up no-limit; this was the first.

Speaker 1

而我可能因此成了扑克界最招人恨的家伙。

And I probably became the most hated person in the world of poker.

Speaker 1

我本无意如此。

I didn't mean to.

Speaker 0

为什么会这样?

Why is that?

Speaker 0

是因为破解了这个游戏之类的吗?

For cracking the game or something?

Speaker 1

是啊。

Yeah.

Speaker 1

很多人都觉得这对整个游戏、游戏的存在构成了真正的威胁。

A lot of people felt that it was a real threat to the whole game, the whole existence of the game.

Speaker 1

如果AI变得比人类更强,人们会害怕玩扑克,因为那些超人般的AI四处游走,夺走他们的钱等等。

If AI becomes better than humans, people would be scared to play poker because there are these superhuman AIs running around taking their money and all of that.

Speaker 1

所以,这确实非常激进。

So, it's just really aggressive.

Speaker 1

那些评论超级激进。

The comments were super aggressive.

Speaker 1

我收到了除死亡威胁之外的所有恶毒言论。

I got everything just short of death threats.

Speaker 1

你觉得

Do you

Speaker 0

国际象棋也是这样吗?

think the same was true for chess?

Speaker 0

因为就在最近他们刚结束国际象棋世界锦标赛,人类开始刻意忽略现在有AI系统表现优于人类的事实,他们依然享受这个游戏,

Because right now they just completed the world championships in chess and humans just started ignoring the fact that there's AI systems now that outperform humans and they still enjoy the game,

Speaker 1

这依然是个美妙的游戏。

it's still a beautiful game.

Speaker 1

这就是我的看法。

That's what I think.

Speaker 1

我认为扑克界也在发生同样的事。

And I think the same thing happens in poker.

Speaker 1

所以我不认为自己会成为扼杀这个游戏的人,事实上我也没做到。

And so I didn't think of myself as somebody who was going to kill the game and I don't think I did.

Speaker 1

我真的学会了热爱这个游戏。

I've really learned to love this game.

Speaker 1

我以前不是扑克玩家,但从这些AI身上学到了许多精妙之处,顺便说一句,它们彻底改变了游戏的玩法。

I wasn't a poker player before, but learned so many nuances about it from these AIs and they've really changed how the game is played, by the way.

Speaker 1

所以,它们用这些非常外星人式的打法玩扑克,而顶尖人类选手现在正把这些策略融入自己的玩法中。

So, they have these very Martian ways of playing poker and the top humans are now incorporating those types of strategies into their own play.

Speaker 1

因此,在我看来,我们的工作反而让扑克对人类而言成为更丰富、更有趣的游戏,而不是让人们完全远离它。

So, if anything, to me, our work has made poker a richer, more interesting game for humans to play, Not something that is going to steer humans away from it entirely.

Speaker 0

对你刚才提到的一点,我想简短评论一下——在学术界,这种情况有时确实相当罕见。

Just a quick comment on something you said, is, if I may say so in academia, is a little bit rare sometimes.

Speaker 0

像你描述的那样将自己的想法付诸实践检验,这需要相当大的勇气。

It's pretty brave to put your ideas to the test in the way you described.

Speaker 0

承认有些好点子在实际大规模应用时可能行不通。

Saying that sometimes good ideas don't work when you actually try to apply them at scale.

Speaker 0

那么这种理念源自何处呢?

So where does that come from?

Speaker 0

我的意思是,如果要给人们建议,是什么在驱动着你这样做?

I mean, if you could do advice for people, what drives you in that sense?

Speaker 0

你一直都是这样的吗?

Were you always this way?

Speaker 0

我想说的是,这确实需要勇气,去验证自己的想法,看它是否真能对抗人类顶尖选手等等。

I mean, it takes a brave person, I guess is what I'm saying, to test their ideas and to see if this thing actually works against human, top human players and so on.

Speaker 1

我不确定是否勇敢,但这确实需要大量工作。

I don't know about brave, but it takes a lot of work.

Speaker 1

组织大型活动之类的事情需要耗费大量时间和精力。

It takes a lot of work and a lot of time to organize, make something big and to organize an event and stuff like that.

Speaker 0

是什么驱使你付出这些努力?

And what drives you in that effort?

Speaker 0

因为我认为,即使不做这些,你依然可以像2017年那样获得NIPS最佳论文奖。

Because you could still, I would argue, get a Best Paper Award at NIPS as you did in 2017 without doing this.

Speaker 1

没错,是的。

That's right, yes.

Speaker 1

所以总的来说,我认为在现实世界中大规模地做事非常重要。

So, in general, I believe it's very important to do things in the real world and at scale.

Speaker 1

这才是真正见真章的地方,如果你愿意这么理解的话。

And that's really where pudding, if you will.

Speaker 1

事实胜于雄辩。

Proof is in the pudding.

Speaker 1

关键就在这里。

That's where it is.

Speaker 1

在这个特定案例中,多年来不同团队之间一直存在某种竞赛,看谁能率先在无限注德州扑克单挑中击败顶尖人类选手。

In this particular case, it was kind of a competition between different groups for many years as to who can be the first one to beat the top humans at Heads Up No Limit Texas Hold'em.

Speaker 1

所以这演变成了一场看谁能率先达标的竞赛。

So it became kind of like a competition who can get there.

Speaker 0

是啊。

Yeah.

Speaker 0

看来良性的竞争确实能极大推动进步。

So a little friendly competition could do wonders for progress.

Speaker 1

是的,完全正确。

Yes, absolutely.

Speaker 0

机制设计这个话题非常有趣,对我来说也比较新颖——虽然我作为观察者接触过政治等领域,但你们论文中提出的自动化机制设计是我刚读到的。

So the topic of mechanism design, which is really interesting, is also kind of new to me, except as an observer of, I don't know, politics; I'm an observer of mechanisms. But you write in your paper on automated mechanism design, which I quickly read.

Speaker 0

所谓机制设计,就是通过制定游戏规则来达成特定理想结果。

So mechanism design is designing the rules of the game so you get a certain desirable outcome.

Speaker 0

你正在以自动化的方式完成这项工作,而非通过精细调整。

And you have this work on doing so in an automatic fashion as opposed to fine tuning it.

Speaker 0

那么你从这些努力中学到了什么?

So what have you learned from those efforts?

Speaker 0

如果你观察,比如说,我不确定,像我们的政治体系这样的复杂系统。

If you look, say, I don't know, at complex systems like our political system.

Speaker 0

我们能否设计政治体系,使其以自动化方式产生我们期望的结果?

Can we design our political system to have in an automated fashion to have outcomes that we want?

Speaker 0

我们能否设计像智能交通灯这样的东西,使其实现我们想要的结果?

Can we design something like traffic lights to be smart where it gets outcomes that we want?

Speaker 0

你从这项工作中得出了哪些经验教训?

So what are the lessons that you draw from that work?

Speaker 0

是的,所以我仍然非常坚信

Yeah, so I still very much believe in

Speaker 1

自动化机制设计这个方向。

the automated mechanism design direction.

Speaker 1

但它并非万能药。

But it's not a panacea.

Speaker 1

机制设计领域存在不可能性结果,表明在C类中不存在能实现目标X的机制。

There are impossibility results in mechanism design, saying that there is no mechanism that accomplishes objective X in class C.

Speaker 1

因此,在机制设计中,使用任何工具——无论是手动还是自动——都无法完成某些特定任务。

So there's no way using any mechanism design tools, manual or automated, to do certain things in mechanism design.

Speaker 0

你能再描述一遍吗?

Can you describe that again?

Speaker 0

所以意思是那是不可能实现的吗?

So meaning it's impossible to achieve that?

Speaker 1

是的,不只是不太可能。

Yeah, not just unlikely.

Speaker 1

不可能。

Impossible.

Speaker 1

所以这些并不是关于人类智慧可能想出聪明办法的陈述。

So these are not statements about human ingenuity who might come up with something smart.

Speaker 1

这些证明表明,如果你想在类别C中实现属性X,任何机制都无法做到。

These are proofs that if you want to accomplish properties X in class C, that is not doable with any mechanism.

Speaker 1

自动机制设计的好处在于,我们实际上并不是为整个类别设计的。

The good thing about automated mechanism design is that we're not really designing for a class.

Speaker 1

我们每次都是针对特定情境进行设计。

We're designing for specific settings at a time.

Speaker 1

因此,即使对整个类别存在不可能性结果,也不意味着该类别中的所有情况都不可能。

So even if there's an impossibility result for the whole class, it just doesn't mean that all of the cases in the class are impossible.

Speaker 1

这只意味着其中某些情况是不可能的。

It just means that some of the cases are impossible.

Speaker 1

所以我们实际上可以在这些已知不可能的类别中开辟出可能的岛屿。

So we can actually carve these islands of possibility within these known impossible classes.

Speaker 1

而且我们确实做到了这一点。

And we've actually done that.

Speaker 1

机制设计领域最著名的成果之一是罗杰·迈尔森和马克·萨特思韦特于1983年提出的迈尔森-萨特思韦特定理。

One of the famous results in mechanism design is the Myerson-Satterthwaite theorem by Roger Myerson and Mark Satterthwaite from 1983.

Speaker 1

在信息不完善的情况下,高效交易是不可能的。

It's an impossibility of efficient trade under imperfect information.

Speaker 1

我们证明在许多情况下你仍可以避免这种情况,实现高效交易。

We show that you can in many settings avoid that and get efficient trade anyway.

Speaker 0

这取决于你如何设计游戏规则。

Depending on how you design the game.

Speaker 0

取决于

Depending

Speaker 1

你如何设计游戏规则。

on how you design the game.

Speaker 1

当然,这并不以任何方式否定不可能性结果。

And of course, it doesn't in any way contradict the impossibility result.

Speaker 1

不可能性结果依然存在,但它只是在这个不可能类别中找到了某些特定点,在这些点上不存在不可能性。

The impossibility result is still there, but it just finds spots within this impossible class where in those spots you don't have the impossibility.

Speaker 0

抱歉如果我有点哲学化,但正如我提到的,你认为这些对政治或人际互动有什么启示?比如设计机制不仅适用于交易、拍卖或纯粹形式化的游戏,还包括政治体系这样的人际互动。

Sorry if I'm going a bit philosophical, but what lessons do you draw towards, like I mentioned, politics or human interaction and designing mechanisms for outside of just these kinds of trading or auctioning or purely formal games or human interaction, like a political system.

Speaker 0

你认为它能如何应用于政治、商业、谈判这类事务?设计能产生特定结果的规则。

Do you think it's applicable to, yeah, politics or to business, to negotiations, these kinds of things, designing rules that have certain outcomes?

Speaker 1

是的。

Yeah.

Speaker 1

是的,我确实这么认为。

Yeah, I do think so.

Speaker 0

你见过成功实施的案例吗?

Have you seen that successfully done?

Speaker 1

其实并没有。

It hasn't really.

Speaker 1

哦,你是指机制设计还是自动化机制?

Oh, you mean mechanism design or automated mechanism?

Speaker 1

自动化机制设计。

Automated mechanism design.

Speaker 1

到目前为止,机制设计本身的成功相当有限。

So mechanism design itself has had fairly limited success so far.

Speaker 1

虽然存在某些案例,但现实世界中的大多数情况从机制设计的角度来看实际上并不完善。

There are certain cases, but most of the real world situations are actually not sound from a mechanism design perspective.

Speaker 1

即使在那些由精通机制设计的专家设计的案例中,人们通常也只是从理论中汲取一些见解并应用于现实,而非直接应用机制本身。

Even in those cases where they've been designed by very knowledgeable mechanism design people, the people are typically just taking some insights from the theory and applying those insights into the real world rather than applying the mechanisms directly.

Speaker 1

一个著名的例子就是FCC频谱拍卖。

So, one famous example is the FCC spectrum auctions.

Speaker 1

我也曾参与其中,扮演了一个小角色。

I've also had a small role in that.

Speaker 1

许多精通博弈论的优秀经济学家一直在研究这个问题。

Very good economists have been working, excellent economists have been working on that who know game theory.

Speaker 1

然而实际设计的规则却使得诚实竞价并非最佳策略。

Yet the rules that are designed in practice there, they're such that bidding truthfully is not the best strategy.

Speaker 1

通常在机制设计中,我们试图让参与者更容易操作,因此诚实是最佳策略。

Usually in mechanism design we try to make things easy for the participants, so telling the truth is the best strategy.
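
A standard textbook mechanism in which telling the truth is a best strategy is the sealed-bid second-price (Vickrey) auction. It is not named in the conversation and is used here only as an assumed illustration. The sketch brute-forces a grid of bids to check that bidding one's true value is never worse than any other bid, and contrasts this with a first-price auction, where shading the bid can pay. Real spectrum auctions are far more complex than either.

```python
def second_price_utility(my_value, my_bid, highest_other_bid):
    """Winner pays the highest competing bid. Ties lose, for simplicity."""
    if my_bid > highest_other_bid:
        return my_value - highest_other_bid
    return 0.0


def first_price_utility(my_value, my_bid, highest_other_bid):
    """Winner pays their own bid. Ties lose, for simplicity."""
    if my_bid > highest_other_bid:
        return my_value - my_bid
    return 0.0


def truthful_is_never_worse(my_value, grid):
    """Check on a grid that bidding my_value weakly beats every other bid."""
    return all(
        second_price_utility(my_value, my_value, other)
        >= second_price_utility(my_value, bid, other)
        for bid in grid
        for other in grid
    )


if __name__ == "__main__":
    grid = [i / 10 for i in range(21)]  # bids from 0.0 to 2.0
    print(all(truthful_is_never_worse(value, grid) for value in grid))
    # In a first-price auction, shading the bid can beat bidding truthfully.
    print(first_price_utility(1.0, 1.0, 0.5), first_price_utility(1.0, 0.6, 0.5))
```

In the second-price rule the bid only decides whether the bidder wins, never what they pay, which is why misreporting cannot help.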

Speaker 1

但即便是在涉及数百亿美元频谱拍卖的高风险场景中,诚实仍然不是最佳策略。

But even in those very high stakes auctions where you have tens of billions of dollars worth of spectrum being auctioned, truth telling is not the best strategy.

Speaker 1

顺便说一句,目前还没有人知道这些拍卖中的任何一种最优竞价策略。

And by the way, nobody knows even a single optimal bidding strategy for those auctions.

Speaker 0

制定最优出价的挑战是什么?

What's the challenge of coming up with an optimal bid?

Speaker 0

因为参与者众多且信息不完全

Because there's a lot of players and there's imperfectly

Speaker 1

不是说很多

It's not so much a lot

Speaker 0

参与者,而是拍卖品数量庞大。

of players, but many items for sale.

Speaker 0

这些机制设计得即使只有两件或一件拍品,如实出价也并非最佳策略。

And these mechanisms are such that even with just two items or one item, bidding truthfully wouldn't be the best strategy.

Speaker 0

纵观AI发展史,它是由一系列里程碑事件标记的。

If you look at the history of AI, it's marked by seminal events.

Speaker 0

AlphaGo成为世界围棋冠军后,我认为Libratus在无限注德州扑克单挑赛中获胜也是这类标志性事件之一。

With AlphaGo beating a world champion human Go player, I would put Libratus winning at heads-up no-limit Hold'em as one such event.

Speaker 1

谢谢。

Thank you.

Speaker 0

那你认为

And what do you think

Speaker 1

下一个类似事件会是什么?

is the next such event?

Speaker 1

无论是在你个人研究领域还是整个AI界,你觉得未来可能出现哪些让世界震惊的突破?

Whether it's in your life or in the broadly AI community that you think might be out there that would surprise the world?

Speaker 1

这是个好问题,但我确实不知道答案。

So, that's a great question and I don't really know the answer.

Speaker 1

在博弈求解领域,无限注德州扑克单挑模式确实是公认的最后一道基准测试。

In terms of game solving, Heads Up No Limit Texas Hold'em really was the one remaining widely agreed upon benchmark.

Speaker 1

所以那是个重大里程碑。

So that was the big milestone.

Speaker 1

那么,还有其他挑战吗?

Now, are there other things?

Speaker 1

当然,肯定存在其他挑战。

Yes, certainly there are.

Speaker 1

但学界尚未就某个特定问题达成共识。

But there is not one that the community has kind of focused on.

Speaker 1

那么其他可能的方向是什么?

So what could be other things?

Speaker 1

有团队在研究《星际争霸》。

There are groups working on StarCraft.

Speaker 1

有团队在研究《DOTA2》。

There are groups working on Dota 2.

Speaker 1

这些都是电子游戏。

These are video games.

Speaker 1

或者像《外交博弈》《花火》这类桌游。

Or you could have Diplomacy or Hanabi, things like that.

Speaker 1

这些都是娱乐性游戏,但尚未有哪个被公认为下一个核心挑战问题。

These are recreational games, but none of them are really acknowledged as kind of the main next challenge problem.

Speaker 1

就像国际象棋、围棋或无上限德州扑克那样。

Like chess or Go or heads-up no-limit Texas Hold'em was.

Speaker 1

所以在游戏求解领域,我并不确定下一个标杆会是什么或何时出现。

So I don't really know in the game solving space what is or what will be the next benchmark.

Speaker 1

我有点希望会有下一个标杆,因为过去十年里,不同团队研究同一问题确实极大地推动了这些与应用无关的技术快速发展。

I kind of hope that there will be a next benchmark, because the different groups working on the same problem really drove these application-independent techniques forward very quickly over ten years.

Speaker 0

你认为是否存在一个让你兴奋的开放性问题,能让你从游戏转向现实世界的博弈,比如股票市场交易?

Do you think there's an open problem that excites you that you start moving away from games into real world games like say the stock market trading.

Speaker 1

是的,这差不多就是我的风格。

Yeah, so that's kind of how I am.

Speaker 1

所以我可能不会在这些娱乐性标杆上投入太多精力了。

So I am probably not going to work as hard on these recreational benchmarks.

Speaker 1

我正在创办两家关于游戏求解技术的初创公司——战略机器和策略机器人。

I'm doing two startups on game solving technology, Strategic Machine and Strategy Robot.

Speaker 1

我们真正感兴趣的是将这些技术投入实际应用。

We're really interested in pushing this stuff into practice.

Speaker 1

你觉得

What do you

Speaker 0

什么会是你认为真正强大且令人惊讶的成果?

think would be really you know, a powerful result that would be surprising?

Speaker 0

比方说五年、十年后,从统计学角度看不太可能实现,但如果出现突破,可能会达成什么?

Say five or ten years from now, something that statistically you would say is not very likely, but if there's a breakthrough, what would it achieve?

Speaker 1

是的,我认为总体而言,我们在博弈论领域所处的局面与机器学习领域截然不同。

Yeah, so I think that overall, we're in a very different situation in game theory than we are in, let's say, machine learning.

Speaker 1

在机器学习领域,这是一项相当成熟的技术,应用广泛,并在现实世界中取得了显著成功。

So in machine learning, it's a fairly mature technology and it's very broadly applied and proven success in the real world.

Speaker 1

在游戏求解方面,目前几乎还没有实际应用。

In game solving, there are almost no applications yet.

Speaker 1

我们刚刚实现了超越人类的能力,而机器学习可以说在90年代甚至更早就已经做到了这一点。

We have just become superhuman, which for machine learning, you could argue, happened in the '90s, if not earlier.

Speaker 1

至少在特定复杂的监督学习应用上是如此。

At least on certain complex supervised learning applications.

Speaker 1

现在,我认为下一个挑战性问题——虽然你并非从这个角度提问,你问的是技术突破——

Now, I think the next challenge problem, I know you're not asking about it this way, you're asking about technology breakthrough.

Speaker 1

但我认为重大突破在于能够证明,或许大部分军事规划或商业战略都将通过计算博弈论来策略性地完成。

But I think the big breakthrough is to be able to show that maybe most of, let's say, military planning or most of business strategy will actually be done strategically using computational game theory.

Speaker 1

这是我期望在未来五到十年内实现的目标。

That's what I would like to see as a next five or ten year goal.

Speaker 0

也许你可以再解释一下,请原谅如果这是个显而易见的问题:机器学习方法如神经网络存在不透明、不可解释的缺陷。

Maybe you can explain to me again, forgive me if this is an obvious question, but machine learning methods, neural networks suffer from not being transparent, not being explainable.

Speaker 0

博弈论方法,比如纳什均衡,当你看到不同的解决方案时——比如在讨论军事行动时——那些战略方案是否合理?

With game-theoretic methods, say Nash equilibria, when you see the different solutions, for example when you talk about military operations, once you see the strategies, do they make sense?

Speaker 0

它们是可解释的,还是存在与神经网络相同的问题?

Are they explainable or do they suffer from the same problems as neural networks do?

Speaker 1

这是个很好的问题。

So that's a good question.

Speaker 1

我的回答可以说是既肯定又否定。

I would say a little bit yes and no.

Speaker 1

我的意思是,这些博弈论策略,比如纳什均衡,具有可证明的特性。

And what I mean by that is that these game theoretic strategies, let's say Nash equilibrium, it has provable properties.

Speaker 1

所以它不像深度学习那样,你只能祈祷它能奏效。

So it's unlike, let's say, deep learning where you kind of cross your fingers, hopefully it'll work.

Speaker 1

即使在事后获得权重参数后,你仍在祈祷它能正常工作。

And then after the fact, when you have the weights, you're still crossing your fingers and hoping it will work.

Speaker 1

而在这里,你可以确信解决方案的质量是有保障的。

Here, you know that the solution quality is there.

Speaker 1

存在可证明的解决方案质量保证。

There's provable solution quality guarantees.

Speaker 1

不过,这并不意味着这些策略是人类可以理解的。

Now, that doesn't necessarily mean that the strategies are human understandable.

Speaker 1

那完全是另一个问题。

That's a whole other problem.

Speaker 1

我认为深度学习和计算博弈论在这方面处境相同。

I think that deep learning and computational game theory are in the same boat in that sense.

Speaker 1

两者都难以理解。

Both are difficult to understand.

Speaker 1

但至少博弈论技术有这些质量保证。

But at least the game theoretic techniques, they have these guarantees of quality.

Speaker 0

你认为未来的商业运营、战略运营甚至军事行动,是否会成为自动化系统提出的有力候选方案?

Do you see business operations, strategic operations, even military operations in the future at least having their strong candidates proposed by automated systems?

Speaker 0

你看到这种趋势了吗?

Do you see that?

Speaker 1

是的,我有。

Yeah, I do.

Speaker 1

我有。

I do.

Speaker 1

但那更多

But that's more

Speaker 0

是一种信念而非确凿的事实。

of a belief than a substantiated fact.

Speaker 0

根据你持乐观还是悲观态度,对我来说那是个令人兴奋的未来,特别是在可证明的最优性方面。

Depending on where you land on optimism or pessimism, to me that's an exciting future, especially if there are provable things in terms of optimality.

Speaker 0

展望未来,有些人担心——特别是你看扑克游戏,这可能是最后一个被解决的游戏基准。

So looking into the future, there are a few folks who are worried, especially when you look at the game of poker, which is probably one of the last benchmarks in terms of games being solved.

Speaker 0

他们担心未来和人工智能带来的生存威胁。

They worry about the future and the existential threats of artificial intelligence.

Speaker 1

无论以何种形式对社会产生的负面影响。

So the negative impact in whatever form on society.

Speaker 1

这是否同样让你担忧,还是你对AI的积极影响更为乐观?

Is that something that concerns you as much or are you more optimistic about the positive impacts of AI?

Speaker 1

我对积极影响要乐观得多。

I am much more optimistic about the positive impacts.

Speaker 1

就我自己的工作而言,我们目前运营着全国肾脏交换计划。

So just in my own work what we've done so far, we run the nationwide kidney exchange.

Speaker 1

如今有数百人因此得以存活。

Hundreds of people are walking around alive today.

Speaker 1

若非如此,他们本不会活到今天。

People who otherwise wouldn't be.

Speaker 1

而且它还增加了就业。

And it's increased employment.

Speaker 1

现在有很多人在运营肾脏交换项目,并在移植中心与肾脏交换系统互动。

You have a lot of people now running kidney exchanges and at transplant centers, interacting with the kidney exchange.

Speaker 1

还有额外的外科医生、护士、麻醉师、医院等等。

You have extra surgeons, nurses, anesthesiologists, hospitals, all of that.

Speaker 1

就业因此增加,世界也变得更美好。

Employment is increasing from that and the world is becoming a better place.

Speaker 1

另一个例子是组合采购拍卖。

Another example is combinatorial sourcing auctions.

Speaker 1

2001年至2010年间,在我之前创办的名为CombineNet的初创公司中,我们进行了800次大规模组合采购拍卖。

We did 800 large scale combinatorial sourcing auctions from 2001 to 2010 in a previous startup of mine called CombineNet.

Speaker 1

我们在这600亿美元的采购中,将供应链效率提高了12.6%。

And we increased the supply chain efficiency on that $60 billion of spend by 12.6%.

Speaker 1

因此,全球效率提升了超过60亿美元。

So that's over $6 billion of efficiency improvement in the world.
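
As a quick arithmetic check of the figures quoted here, the sketch below multiplies the stated spend by the stated improvement rate. It assumes, as a simplification, that the 12.6% applies uniformly to the whole spend; the variable names are illustrative.

```python
spend_usd = 60_000_000_000        # total sourced spend quoted in the conversation
improvement_per_mille = 126       # 12.6% written in tenths of a percent to stay in integers

savings_usd = spend_usd * improvement_per_mille // 1000
print(savings_usd)                    # 7560000000
print(savings_usd > 6_000_000_000)    # True
```

Under that assumption the improvement comes to about $7.56 billion, which is consistent with the "over" in the quoted figure.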

Speaker 1

这不是将价值从一方转移到另一方,而是纯粹的效率提升。

This is not like shifting value from somebody to somebody else, just efficiency improvement.

Speaker 1

比如在卡车运输中,减少了空驶。

Like in trucking, less empty driving.

Speaker 1

这样浪费更少,碳足迹也更小等等。

So there's less waste, less carbon footprint and so on.

Speaker 1

这在短期内会产生巨大的积极影响。

This is a huge positive impact in the near term.

Speaker 1

但需要再坚持一段时间,因为我认为博弈论在这里能发挥作用。

But to stay on it a little longer, because I think game theory has a role to play here.

Speaker 1

让我再回头谈谈这个问题。

Let me actually come back on that.

Speaker 1

这是一方面。

That is one thing.

Speaker 1

我认为人工智能还将使世界变得更加安全。

I think AI is also going to make the world much safer.

Speaker 1

这是另一个经常被忽视的方面。

So that's another aspect that often gets overlooked.

Speaker 1

嗯,让

Well, let

Speaker 0

我问这个问题。

me ask this question.

Speaker 0

也许你可以谈谈安全性。

Maybe you can speak to the "safer" part.

Speaker 0

我曾与马克斯·泰格马克和斯图尔特·罗素交流过,他们非常担心人工智能的生存威胁,通常关注的是价值错位问题。

So I talked to Max Tegmark and Stuart Russell, who are very concerned about existential threats of AI and often the concern is about value misalignment.

Speaker 0

即人工智能系统运作时追求的目标与人类文明、人类利益不一致。

So AI systems basically working, operating towards goals that are not the same as human civilization, human beings.

Speaker 0

因此博弈论似乎可以在确保价值观与人类一致方面发挥作用。

So it seems like game theory has a role to play there, to make sure the values are aligned with human beings.

Speaker 0

我不确定你是否这样看待这个问题。

I don't know if that's how you think about it.

Speaker 0

如果不是的话,你认为AI可能如何帮助解决这个问题?

If not, how do you think AI might help with this problem?

Speaker 0

你认为AI可能如何让世界变得更安全?

How do you think AI might make the world safer?

Speaker 1

是的。

Yeah.

Speaker 1

我认为这种价值观错位更多是理论层面的担忧,在实际应用中我并未真正遇到过,因为我主要从事实际应用开发。

I think this value misalignment is a fairly theoretical worry, and I haven't really seen it in practice, and I do a lot of real applications.

Speaker 1

我在任何地方都没见过这种情况。

I don't see it anywhere.

Speaker 1

我见过最接近的情况其实是这样的思维实验:80年代末我们开发运输优化系统时,有人听说提高资产利用率是个好主意。

The closest I've seen it was the following type of mental exercise really, where I had this argument in the late 80s when we were building these transportation optimisation systems and somebody had heard that it's a good idea to have high utilisation of assets.

Speaker 1

于是他们对我说,嘿,你为什么不把这个设为目标呢?

So they told me, hey, why don't you put that as an objective?

Speaker 1

我们最终没把它设为目标,因为我向他证明:如果把这个设为目标,解决方案就会是把卡车装满然后绕圈行驶。

And we didn't even put it as an objective because I just showed him that if you had that as your objective, the solution would be to load your trucks full and drive in circles.

Speaker 1

这样什么货物都送达不了。

Nothing would ever get delivered.

Speaker 1

却能实现100%的利用率。

You would have 100% utilization.
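
The trucks-in-circles argument can be sketched with a toy comparison. Everything below is hypothetical: the two plans, their numbers, and the definition of utilization as loaded kilometres divided by total kilometres are assumptions made for illustration, not the system described in the conversation.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    loaded_km: float   # distance driven while carrying freight
    total_km: float    # total distance driven
    deliveries: int    # shipments that actually reach a customer


def utilization(plan: Plan) -> float:
    """Share of driven distance during which the truck is loaded."""
    return plan.loaded_km / plan.total_km


plans = [
    Plan("full trucks driving in circles", loaded_km=1000, total_km=1000, deliveries=0),
    Plan("deliver, then return partly empty", loaded_km=700, total_km=1000, deliveries=40),
]

best_by_utilization = max(plans, key=utilization)
best_by_deliveries = max(plans, key=lambda plan: plan.deliveries)

print(best_by_utilization.name, utilization(best_by_utilization))
print(best_by_deliveries.name, best_by_deliveries.deliveries)
```

Optimizing the proxy picks the plan that delivers nothing, while optimizing the quantity the operator actually cares about picks the other one.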

Speaker 1

所以是的,我了解这种现象。

So yeah, I know this phenomenon.

Speaker 1

我了解这一点已有三十多年,但从未在现实中见过它真正成为问题。

I've known this for over thirty years but I've never seen it actually be a problem in reality.

Speaker 1

确实,如果目标设定错误,人工智能会将其优化到极致,造成的伤害将超过人类半吊子的尝试。

And yes, if you have the wrong objective, the AI will optimize that to the hilt and it's going to hurt more than some human who's kind of trying to

Speaker 0

用人类的一些见解半途而废地解决它。

solve it in a half baked way with some human insight too.

Speaker 0

但我就是没见到这种情况在实践中发生。

But I just haven't seen that materialise in practice.

Speaker 0

你刚才非常清晰地指出了理论与现实之间的这道鸿沟。

There's this gap that you've actually put your finger on very clearly just now between theory and reality.

Speaker 0

我觉得这很难用语言表达清楚。

That's very difficult to put into words, I think.

Speaker 0

这是你理论上能想象到的最坏情况,甚至更糟。

It's what you can theoretically imagine, the the worst possible case or even yeah.

Speaker 0

我是说,糟糕的情况,以及现实中通常发生的状况。

I mean, bad cases, and what usually happens in reality.

Speaker 0

举个例子,对我来说——也许你能就此发表看法——我是在苏联长大的。

So for example, to me, and maybe it's something you can comment on, having grown up in the Soviet Union.

Speaker 0

你知道,目前世界上有一万枚核武器。

You know, there are currently 10,000 nuclear weapons in the world.

Speaker 0

几十年来,从理论上看核战争没有爆发让我感到惊讶。

And for many decades, it's theoretically surprising to me that nuclear war has not broken out.

Speaker 0

你会从博弈论的角度思考这个问题吗?

Do you think about this aspect from a game theoretic perspective in general?

Speaker 0

为什么这是真的?

Why is that true?

Speaker 0

为什么理论上你能预见事情会变得很糟,但实际上却并未发生?

Why in theory you could see how things would go terribly wrong and somehow yet they have not?

Speaker 0

你怎么看

How do you think

Speaker 1

这个问题?

about that?

Speaker 1

我确实经常思考这个问题。

So I do think about that a lot.

Speaker 1

我认为人类面临的最大两个威胁,一是气候变化,二是核战争。

I think the biggest two threats that we're facing as mankind, one is climate change and the other is nuclear war.

Speaker 1

这就是我最担心的两件事。

So those are my main two worries that I worry about.

Speaker 1

我曾尝试为气候问题做些事情,两次考虑过为气候变化采取行动。

And I've thought about trying to do something about climate change twice.

Speaker 1

实际上,我为两家初创企业委托过相关研究,但都没找到理想切入点,不过我仍在持续关注。

Actually, for two of my startups I've actually commissioned studies of what we could do on those things and we didn't really find a sweet spot, but I'm still keeping an eye out on that.

Speaker 1

如果我们能通过市场方案、优化方案或其他技术手段解决问题的话。

If there's something where we could actually provide a market solution or optimization solution or some other technology solution to problems.

Speaker 1

比如当时我们正在研究污染信用市场。

Right now, for example, pollution credit markets was what we were looking at then.

Speaker 1

更多是这些市场缺乏政治意愿导致效果不佳,而非市场设计本身的问题。

And it was much more the lack of political will that made those markets not so successful, rather than bad market design.

Speaker 1

我可以进去设计一个更好的市场机制,但如果没有政治意愿,这对世界的影响微乎其微。

I could go in and make a better market design, but that wouldn't really move the needle on the world very much if there's no political will.

Speaker 1

而在美国,至少芝加哥市场刚刚被关闭等等。

And in the US, at least the Chicago market was just shut down and so on.

Speaker 1

所以,无论你的市场设计有多出色都无济于事。

So then it doesn't really help how great your market design was.

Speaker 0

至于核问题方面,情况更严重。

And then the nuclear side, it's more.

Speaker 0

全球变暖是个日益紧迫的问题。

So, global warming is a more encroaching problem.

Speaker 0

核武器一直存在于此。

Nuclear weapons have been here.

Speaker 0

这是个显而易见却长期悬而未决的问题。

It's an obvious problem that's just been sitting there.

Speaker 0

那么你认为是什么机制设计让一切看起来如此稳定?

So how do you think about what is the mechanism design there that just made everything seem stable?

Speaker 0

你现在仍然极度担忧吗?

And are you still extremely worried?

Speaker 1

我仍然极度担忧。

I am still extremely worried.

Speaker 1

你可能知道'相互保证毁灭'的简单博弈论。

So, you probably know the simple game theory of MAD.

Speaker 1

这就是相互保证毁灭机制,它不需要任何复杂计算。

So, this is mutually assured destruction, and it doesn't require any computation.

Speaker 1

对于小型矩阵,你实际上可以确信这个游戏的性质是没有人愿意主动发起。

With small matrices you can actually convince yourself that the game is such that nobody wants to initiate.
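
The kind of small matrix mentioned here can be written out as a two-player, two-action game. The payoff numbers below are made up for illustration (0 for peace, roughly -100 for destruction, with the initiator marginally less damaged); only their ordering matters. The sketch lists the strict pure-strategy equilibria, meaning outcomes where any unilateral deviation leaves the deviator strictly worse off.

```python
ACTIONS = ("refrain", "strike")

# (row action, column action) -> (row payoff, column payoff); hypothetical values
PAYOFFS = {
    ("refrain", "refrain"): (0, 0),
    ("strike", "refrain"): (-90, -100),   # assured retaliation still devastates the initiator
    ("refrain", "strike"): (-100, -90),
    ("strike", "strike"): (-100, -100),
}


def is_strict_equilibrium(row, col):
    """True if each player's unilateral deviation gives a strictly lower payoff."""
    row_payoff, col_payoff = PAYOFFS[(row, col)]
    row_ok = all(PAYOFFS[(alt, col)][0] < row_payoff for alt in ACTIONS if alt != row)
    col_ok = all(PAYOFFS[(row, alt)][1] < col_payoff for alt in ACTIONS if alt != col)
    return row_ok and col_ok


strict_equilibria = [
    (row, col) for row in ACTIONS for col in ACTIONS if is_strict_equilibrium(row, col)
]
print(strict_equilibria)  # [('refrain', 'refrain')]
```

With these assumed payoffs, neither side gains by initiating while the other refrains, which is the two-superpower reasoning; the sketch says nothing about settings with more actors or smaller weapons.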

Speaker 1

是的,这是一种非常粗略的分析,在存在两个超级大国或少数几个超级大国的情况下确实适用。

Yeah, that's a very coarse grained analysis and it really works in a situation where you have two superpowers or a small number of superpowers.

Speaker 1

现在情况已经大不相同了。

Now things are very different.

Speaker 1

你们拥有更小型的核武器,因此启动的门槛更低,而且会有更多小国和非国家行为体可能获得核武器等等。

You have smaller nukes, so the threshold of initiating is smaller, and you have smaller countries and non-nation actors who may get nukes and so on.

Speaker 1

所以我认为现在的风险比

So I think it's riskier now than

Speaker 0

以往任何时候都要大。

it was maybe ever before.

Speaker 0

关于AI的想法和应用,你刚才稍微提到了一些,但眼下最让你兴奋的是什么?

And what idea, application of AI, you've talked about it a little bit, but what is the most exciting to you right now?

Speaker 0

我的意思是,你现在在NeurIPS大会上,手头有几项出色的工作成果。

I mean, you're here at NIPS, NeurIPS now, and you have a few excellent pieces of work.

Speaker 0

但你正在合作的几家公司,未来有什么规划?

But what are you thinking into the future with several companies you're doing?

Speaker 0

最令人兴奋或其中一项令人兴奋的事情是什么?

What's the most exciting thing or one of the exciting things?

Speaker 1

目前对我来说最重要的是开发这些可扩展的游戏解决方案技术,并将它们应用到现实世界中。

The number one thing for me right now is coming up with these scalable techniques for game solving and applying them into the real world.

Speaker 1

我仍然对市场设计非常感兴趣,我们正在Optimized Markets公司做这件事。

I'm still very interested in market design as well, and we're doing that at Optimized Markets.

Speaker 1

但我最关心的是,当前的首要任务是否是战略机器、战略机器人,将这项技术推广出去,并且鉴于你们正在一线进行应用开发,实际需要填补哪些空白,还有哪些技术缺口需要填补。

But what I'm most interested in, number one right now, is Strategic Machine and Strategy Robot: getting that technology out there and seeing, as you're in the trenches doing applications, what actually needs to be filled, what technology gaps still need to be filled.

Speaker 1

所以光是翘着脚空想需要做什么是非常困难的。

So it's so hard to just put your feet on the table and imagine what needs to be done.

Speaker 1

但当你真正进行实际应用开发时,应用本身会告诉你需要做什么。

But when you're actually doing real applications, the applications tell you what needs to be done.

Speaker 1

我真的很享受这种互动过程。

And I really enjoy that interaction.

Speaker 1

将你们正在研发的尖端技术应用到实际中是否充满挑战

Is it a challenging process to apply

Speaker 0

让工业界、军方或其他真正能从中受益的各方实际采用这些技术。

some of the state of the art techniques you're working on and having the various players in industry or the military or people who could really benefit from it actually use it.

Speaker 0

以自动驾驶汽车为例,我们与汽车公司合作,他们在很多方面都有些保守。

What's that process like? You know, in autonomous vehicles, we work with automotive companies, and in many ways they are a little bit old-fashioned.

Speaker 0

这很困难。

It's difficult.

Speaker 0

他们确实想使用这项技术。

They really want to use this technology.

Speaker 0

这显然会带来重大效益,但在数据、计算能力等各方面,现有系统还不足以轻松实现技术整合。

This clearly will have a significant benefit, but the systems aren't quite in place to easily have them integrated in terms of data, in terms of compute, in terms of all these kinds of things.

Speaker 0

所以这是你们面临的主要挑战之一吗?你们是如何应对这个挑战的?

So is that one of the bigger challenges that you're facing and how do you tackle that challenge?

Speaker 1

是的,我认为这始终是个挑战。

Yeah, I think that's always a challenge.

Speaker 1

这实际上是一种缓慢和惰性,就是我们总是按老方法做事。

That's kind of slowness and inertia really of let's do things the way we've always done it.

Speaker 1

你只需要找到客户内部的支持者,他们明白未来不能一成不变。

You just have to find the internal champions at the customer who understand that things can't be the same way in the future.

Speaker 1

否则糟糕的事情就会发生。

Otherwise bad things are going to happen.

Speaker 1

在自动驾驶领域非常有趣的是,传统汽车制造商正在这样做。

And in autonomous vehicles, it's actually very interesting that the car makers are doing that and they're very traditional.

Speaker 1

但与此同时,像谷歌和百度这样与汽车或交通毫无关系的科技公司却在大力推动自动驾驶汽车。

But at the same time, you have tech companies who have nothing to do with cars or transportation, like Google and Baidu, really pushing on autonomous cars.

Speaker 1

我觉得这很迷人。

I find that fascinating.

Speaker 1

显然你对这些想法能在世界上产生影响感到非常兴奋

Clearly you're super excited about actually these ideas having an impact in

Speaker 0

the world.

Speaker 0

在技术和研究方向上,还有哪些领域让你感到兴奋?

In terms of the technology, in terms of ideas and research, are there directions that you're also excited about?

Speaker 0

无论是你谈到的处理不完全信息博弈的方法,还是将深度学习应用于这些问题。

Whether that's some of the approaches you talked about for imperfect information games, or whether it's applying deep learning to some of these problems.

Speaker 0

在研究方面有什么让你兴奋的内容吗?

Is there something that you're excited about on the research side of things?

Speaker 1

是的,在游戏解决方案中有很多不同的方向。

Yeah, lots of different things in the game solving.

Speaker 1

所以,解决更大的博弈问题,那些玩家行动中有更多隐藏动作的游戏。

So, solving even bigger games, games where more of the player actions are hidden as well.

Speaker 1

扑克是一种机会行为真正被隐藏的游戏。

Poker is a game where really chance actions are hidden.

Speaker 1

或者其中一些被隐藏,但玩家的行动是公开的。

Or some of them are hidden, but the player actions are public.

Speaker 1

各种类型的多人游戏,共谋、对手剥削,甚至更长的博弈。

Multiplayer games of various sorts, collusion, opponent exploitation, and even longer games.

Speaker 1

基本上是无限进行的游戏,但它们不是重复的。

Games that basically go forever, but they're not repeated.

Speaker 1

因此,寻找那些无限进行的扩展形式博弈。

So, sequential, extensive-form games that go forever.

Speaker 1

那会是什么样子呢?

What would that even look like?

Speaker 1

你如何表示这种情况?

How do you represent that?

Speaker 1

你如何解决这个问题?

How do you solve that?

Speaker 0

这类游戏有什么例子吗?

What's an example of a game like that?

Speaker 0

这是你提到的一些随机博弈吗?

This is some of the stochastic games that you mentioned?

Speaker 1

比如说商业策略。

Let's say business strategy.

Speaker 1

不仅仅是模拟某个特定的互动,而是要从现在到永远地思考业务。

And not just modeling a particular interaction, but thinking about the business from here to eternity.

Speaker 1

或者说军事策略。

Or let's say military strategy.

Speaker 1

所以战争并不会消失。

So it's not like war is going to go away.

Speaker 1

你认为军事策略会永远持续下去吗?

If you think about military strategy, that's going to go on forever.

Speaker 1

你该如何建模呢?

How do you even model that?

Speaker 1

你如何判断某人采取的行动是否正确?

How do you know whether a move was good that somebody made?

Speaker 1

诸如此类。

And so on.

Speaker 1

这大致是一个方向。

So that's kind of one direction.

Speaker 1

我也对学习更可扩展的整数规划技术非常感兴趣。

I'm also very interested in learning much more scalable techniques for integer programming.

Speaker 1

今年夏天我们在ICML上发表了第一篇具有理论泛化保证的自动算法配置论文。

So we had an ICML paper this summer on that, the first automated algorithm configuration paper that has theoretical generalization guarantees.

Speaker 1

如果我看到这么多训练样本,并以这种方式调整算法,它将在未见过的真实分布上表现良好。

So if I see this many training examples and I tune my algorithm in this way, it's going to have good performance on the real distribution, which I have not seen.

Speaker 1

有趣的是,算法配置研究至少已经认真进行了十七年。

Which is kind of interesting that algorithm configuration has been going on now for at least seventeen years seriously.

Speaker 1

在此之前从未有过任何泛化理论。

And there has not been any generalization theory before.

Speaker 0

嗯,真的非常令人兴奋,能和你交谈是我的莫大荣幸。

Well, this is really exciting, and it's a huge honor to talk to you.

Speaker 0

非常感谢你,托马斯。

Thank you so much, Tuomas.

Speaker 0

感谢你将Libratus带到这个世界,以及你所做的所有杰出工作。

Thank you for bringing Libratus to the world and all the great work you're doing.

Speaker 1

嗯,非常感谢你。

Well, thank you very much.

Speaker 1

这很有趣。

It's been fun.

Speaker 1

问得好。

Good questions.
