
人类数据足够了吗?与大卫·西尔弗对话

Is Human Data Enough? With David Silver

本集简介

在本期《Google DeepMind:播客》中,强化学习副总裁David Silver阐述了他对人工智能未来的愿景,探讨了"经验时代"与当前"人类数据时代"的概念。他以AlphaGo和AlphaZero为例,说明这些系统如何通过不依赖人类先验知识的强化学习超越人类能力。这种方法与依赖人类数据和反馈的大型语言模型形成鲜明对比。Silver强调需要探索这条路径来推动AI进步,实现人工超级智能。

时间戳

00:00 开场
01:50 经验时代
03:45 AlphaZero
10:19 第37手
15:20 强化学习与人类反馈
24:30 AlphaProof
29:50 数学奥林匹克
35:00 基于经验的方法
42:56 Hannah的思考
44:00 樊麾加入

特别感谢所有为此付出努力的人员,包括但不限于:

主持人:Hannah Fry教授
系列制片人:Dan Hardoon
系列编辑:Rami Tzabar
委托制作人:Emma Yousif
音乐作曲:Eleni Shaw
音频工程师:Richard Courtice
制作经理:Dan Lazard
视频导演与剪辑:Bernardo Resende
视频工作室制作:Nicholas Duke
视频剪辑:Bilal Merhi
音频工程师:Perry Rogantin
摄影与灯光操作:Robert Messere
制作协调:Zoey Roberts, Sarah Ellen Morton
视觉标识与设计:Rob Ashley

由Google DeepMind委托制作

如果您喜欢本期节目,请在Spotify或Apple Podcasts上留下评价。我们始终期待听众的反馈,无论是意见、新想法还是嘉宾推荐!

由Simplecast托管,AdsWizz旗下公司。有关我们收集和使用个人数据用于广告的信息,请访问pcm.adswizz.com。

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

我们需要让AI能够真正自主地解决问题,并发现人类未知的新事物。我认为这将开启一个全新的AI时代,对社会而言将是极其激动人心且意义深远的。

We're going to need our AIs to actually figure things out for themselves and to discover new things that humans don't know. And I think that's going to be a whole new era of AI that's going to be incredibly exciting and profound for society.

Speaker 1

欢迎回到Google DeepMind播客。今天的嘉宾是独一无二的大卫·西尔弗,他是DeepMind的创始成员之一,也是AlphaGo惊人成功背后的关键人物——这是首个掌握世界上最复杂棋盘游戏并实现超人类表现的程序。在今天的播客结尾,我们还为您准备了一份额外惊喜:大卫与樊麾的对话,樊麾是首位与AI对弈的职业围棋选手。但现在,大卫对AI的下一步发展方向有一个大胆的想法。在当前多模态模型的所有热议、兴奋和成就之后,大卫规划了一条通往超人类智能的道路,他称之为'经验时代'的新阶段。

Welcome back to Google DeepMind, the podcast. My guest today is the inimitable David Silver, an original DeepMinder and one of the key people behind the phenomenal success of AlphaGo, the first program to master the world's most complex board game and achieve superhuman performance. Now at the end of today's podcast, we have a little extra treat for you, a conversation with David and Fan Hui, the first professional Go player to take on the AI. But now David has a bold idea about the direction that AI should go in next. After all of the current buzz and excitement and achievements of multimodal models, David has a plan for the path towards superhuman intelligence, a new phase which he calls the era of experience.

Speaker 1

这是一个深刻的想法,但也并非没有风险。大卫,欢迎来到播客。

This is a profound idea and not one without risks. David, welcome to the podcast.

Speaker 0

你好。很高兴来到这里。非常荣幸。谢谢。

Hi. It's great to be here. Real pleasure. Thank you.

Speaker 1

好的。我这个周末非常愉快地阅读了你的立场文件。在其中,你谈到了'经验时代'。请为我们总结一下,你这是什么意思?

Okay. So I have spent the weekend with a very enjoyable read of your position paper. And in it, you are talking about the era of experience. Summarize for us, what do you mean by that?

Speaker 0

嗯,我的意思是,如果你看看AI过去几年的发展,它一直处于我称之为'人类数据时代'的阶段,即所有这些AI方法都基于一个共同理念:我们提取人类拥有的每一份知识,并将其输入机器。这是一种极其强大的方式。但还有另一种方式,这将引领我们进入'经验时代',即机器实际与世界本身互动,并生成自己的经验。它在世界中尝试事物,并开始积累自己的经验。如果你将这些数据视为机器的燃料,那么这将引领我们走向下一代AI,我们可以称之为'经验时代'。

Well, what I mean is that if you look at where AI has been for the last few years, it's been in what I call the era of human data, which is that all of these AI methods, they're based on one common idea, which is we extract every piece of knowledge that humans have and you kind of feed it into the machine. And that's one incredibly powerful way to do things. There's another way to do things, and this is what's gonna lead us into the era of experience, which is where the machine actually interacts with the world itself and it generates its own experience. It tries things out in the world and it starts to build up its own experience. And if you think of that data as fueling the machine, then that will lead to this next generation of AI that we can think of as the era of experience.

Speaker 1

我猜这在某种程度上是你拍桌子说大型语言模型不是AI的唯一形式,对吧?就像还有替代方案,我们可以通过不同的方式来处理这个问题。

I guess this is in a way you're sort of thumping the table saying large language models are not the only AI, right? Like there are alternatives, there are different ways that we can approach this.

Speaker 0

没错。我认为我们在人工智能领域,特别是在构建大型语言模型方面,已经取得了很大进展,利用了海量的人类尤其是自然语言数据,并将所有这些知识吸收到一个知晓人类所有书写记录的机器中。但某种程度上我们需要超越这一点。我们想要走得更远。我们想要超越人类已知的范畴。

That's right. I think we've really got a lot out of the field of AI by building large language models, harnessing the vast quantity of human data, particularly natural language data, that's out there, and kind of assimilating all of that into a machine that knows everything that humans have ever written down. But at some point we need to get past that. We want to go beyond that. We want to go beyond what humans know.

Speaker 0

为此,我们需要一种不同的方法。这种方法将要求我们的AI真正自主解决问题,并发现人类未知的新事物。我认为这将开启一个全新的AI时代,对社会来说将极其激动人心且意义深远。

And to do that, we're gonna need a different type of method. And that type of method will require our AIs to actually figure things out for themselves and to discover new things that humans don't know. And I think that's gonna be a whole new era of AI that's gonna be incredibly exciting and profound for society.

Speaker 1

那么好吧,我们来谈谈其他一些著名的AI和著名算法,它们采用了不同类型的方法,最著名的是AlphaGo和AlphaZero,它们大约在十年前击败了世界顶尖的围棋选手,对吧?请告诉我们这些技术是如何运作的,以及它们与当今的大型语言模型有何不同。

Well, okay then, let's talk about some other sort of famous AIs, famous algorithms that have employed different types of methods, most notably AlphaGo and AlphaZero, which, of course, notoriously beat the world's best Go players about a decade ago. Right? Tell us about the techniques used in those and how they differ from the large language models that we see today.

Speaker 0

特别是AlphaZero,它与最近使用的人类数据方法非常不同,因为它完全不使用人类数据。这就是AlphaZero中“零”的含义。系统中完全没有预先编程的人类知识。那么替代方案是什么?如果你不模仿人类,并且事先不知道正确的玩法,你如何学习围棋知识呢?

So AlphaZero in particular is very different from the type of human data approaches that have been used recently, because it literally uses no human data. That's the zero in AlphaZero. So there is literally zero human knowledge that's pre-programmed into the system. And so what's the alternative? How do you learn Go knowledge if you're not copying humans and you don't really know in advance the right way to play?

Speaker 0

方法是通过一种试错学习的形式,AlphaZero基本上与自己进行了数百万局的围棋、国际象棋或任何它想玩的游戏。一点一点地,它发现,哦,如果我在这种情况下走这一步,这类移动,那么我最终会赢得更多游戏。然后这段经验被用来推动它变得更强。之后它会更多地采用这种策略,下一次它又会发现新的东西,比如某种新模式。

Well, the way you go about it is through a form of trial and error learning where AlphaZero basically played itself millions of games of Go or chess or whatever the game is that it was wanting to play. And bit by bit, it figured out, Oh, if I play this move, this kind of move in this kind of situation, then I end up winning more games. And then that's a piece of experience that is used to fuel it to become stronger. And then it will play a little bit more like that. And the next time it will discover something new and say, there'll be some new pattern.

Speaker 0

就像,哦,当我使用这种特定模式时,最终会赢得更多游戏或输掉更多游戏。这为下一代提供了反馈,依此类推。这种从经验中学习,从智能体自我生成的经验中学习,在AlphaZero中已经足够推动其从完全随机的行为一路进步到世界上已知的最强的国际象棋和围棋程序。

It's like, oh, when I use this particular pattern, I end up winning more games or losing more games. And that feeds the next generation and so forth. And that learning from experience, this learning from the agent's self-generated experience, was enough in AlphaZero to fuel its progress all the way from completely random behaviour up to the strongest chess and Go playing programmes that the world has ever known.

Speaker 1

但它们并不是从一开始就是随机的空盒子,对吧?它们是从零开始学会如何下围棋的。我的意思是,在设计围棋算法时,你们已经找到了一种编码围棋游戏并将其作为数据库输入的方法,对吧?

They didn't start off just as like random empty boxes though, right? They kind of found how to play Go from nothing. I mean, when you were designing your Go algorithms, you'd worked out a way to encode Go games and then feed them in as a database, right?

Speaker 0

是的,没错。所以,最初版本的AlphaGo,也就是2016年著名地击败李世石的那个版本,实际上确实使用了一些人类数据作为起点。我们基本上给它输入了一个人类职业棋手棋步的数据库,它学习并吸收了这些人类棋步,这给了它一个起点。然后从那时起,它通过经验自学。然而,一年后我们发现,人类数据并非必需,实际上你可以完全抛弃人类棋步。

Yeah, that's right. So, the original version of AlphaGo, the version which famously beat Lee Sedol in 2016, this version of AlphaGo actually did use some human data to start it off. So, we basically fed it a database of human professional moves and it learned, it ingested those human moves and that gave it a starting point. And then it learnt for itself by experience from that point onwards. However, what we discovered a year later was that the human data wasn't necessary, that you could actually throw out the human moves altogether.

Speaker 0

我们所展示的是,最终的程序不仅能够恢复这种性能水平,实际上它表现更好,并且比最初的AlphaGo学习得更快,达到了更高的性能水平。

And what we showed was that actually the resulting programme not only was it able to recover this level of performance, it actually worked better and was able to learn even faster than the original AlphaGo to achieve a much higher level of performance.

Speaker 1

我的意思是,这真是个奇怪的想法,你抛弃人类数据,结果发现它不仅不是必需的,而且在某种程度上还积极地限制了性能。

I mean, that is such a strange idea, that you throw away the human data and you find not only was it not necessary, but it was actively limiting performance in a way.

Speaker 0

我认为对AI领域的人来说,一个艰难的教训——这有时被称为AI的苦涩教训——是我们真的想相信我们人类积累的所有知识都非常重要。我们真的很想相信这一点。所以我们把这些知识输入我们的系统,将其构建到我们的算法中。而实际发生的是,这使我们设计算法的方式可能更适合人类数据,而不太擅长真正自学。而如果你抛弃人类数据,你实际上会花更多精力在系统如何自学上。

I think one of the hard lessons for people in AI, this is sometimes called the bitter lesson of AI, is that we really want to believe that all of the knowledge that we've accumulated as humans is really important. We really want to believe that. And so we feed it into our systems, we build it into our algorithms. And what happens is that actually that makes us design the algorithms in a way which is maybe fitted to the human data and is less good at actually learning for itself. And what happens is if you throw out the human data, you actually spend more effort on how the system can learn for itself.

Speaker 0

而这部分才能不断地学习、学习、再学习,永无止境。

And that's the part which can then learn and learn and learn forever.

Speaker 1

苦涩的教训。我想在某种程度上,这有点像是在说,接受可能有东西能比人类更好地下围棋,并在某种程度上移除了那个天花板。

The bitter lesson. I suppose in a way it's sort of saying, accepting that it's possible that something could play Go better than humans can and sort of removing that ceiling in a way.

Speaker 0

没错。是的,人类数据确实有助于你起步,但人类所做的一切都有一个天花板。你知道,我们在围棋中看到了这一点,人类曾经达到的性能水平有一个上限。我们需要突破这些天花板。而在AlphaZero中,我们通过构建一个通过自我对弈自学的系统,不断变得更好、更好、更好,直到它冲破那个天花板,远远超越。

That's right. That, yeah, human data, it's really helpful to get you off the ground, but there is a ceiling to everything that humans have done. And, you know, we see that in Go, there was a maximum level of performance that humans have ever achieved. And we need to break through these ceilings. And in AlphaZero, we were able to break through that ceiling by building a system that learned for itself by self play and got better and better and better until it blasted through that ceiling and went far beyond.

Speaker 0

我认为"经验时代"的理念在于,我们要找到能够在所有领域突破这一天花板的方法。我们构建的人工智能系统将在人类看似卓越的所有能力上变得超人,而我们要找到超越这一点的途径。

And I think the idea of the era of experience is that we find the methods that allow us to break through that ceiling everywhere. We build AI systems that become superhuman in all of the capacities that humans seem so amazing, but we find the way to go beyond that.

Speaker 1

在我们讨论其他摆脱人类数据从而提升人类能力的方法之前,请允许我先谈谈围棋,好吗?是的。因为当你提到要摒弃人类下过的所有围棋对局,从零开始时,听起来有点像魔术。请稍微介绍一下你们实际使用的技术,如何让机器像你说的那样,将成千上万不同的想法串联起来,从而在围棋游戏中表现出色。

Let me just stick with Go for a second, okay, before we get onto the other ways that you can get rid of human data and thus improve on human ability. Yeah. Because when you say, let's just get rid of all of the games of Go that humans have played and start with nothing, it sort of sounds like a magic trick. Just tell me a little bit about the techniques that you're really using there in order to get a machine to, as you say, chain together thousands and thousands of different ideas in order to be amazing at the game of Go.

Speaker 0

嗯,主要思路是一种我们称之为强化学习的方法。强化学习的理念是,你基本上给游戏结果赋予一个数值,我们说,赢了加一分,输了减一分。

Well, the main idea is an approach that we call reinforcement learning. And the idea of reinforcement learning is you basically give the outcome of your game a number and we say, plus one if you win and minus one if you lose.

Speaker 1

一分。

One point.

Speaker 0

没错,完全正确。然后我们通过强化学习让系统在每次做对事情时获得奖励,并训练系统强化这些行为,这意味着多做能获得更多奖励的事情。例如,如果你有一个像我们在AlphaGo中使用的神经网络来选择走法,你需要做的就是稍微调整神经网络的权重,使其朝着能获得更多奖励的方向发展。这就是强化学习的主要思想。

Exactly, exactly. And what we do then with reinforcement learning is basically give the system a reward each time it does something right, and we train the system to reinforce that, which means doing more of the things that get more reward. So, for example, if you've got a neural network like we do in AlphaGo that's picking the moves, what you want to do is tweak the weights of your neural network a little bit in the direction that gives you more reward. And that's the main idea of reinforcement learning.
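The weight-tweaking David describes can be sketched in a few lines of Python. This is an illustrative toy, not DeepMind's code: the two-move "game" and every name in it are invented, and the update is the textbook REINFORCE rule applied to a single decision.

```python
import numpy as np

# Toy setup: a policy over two moves; move 1 always "wins" (+1 reward),
# move 0 always "loses" (-1). The policy's weights are two logits.
rng = np.random.default_rng(0)
logits = np.zeros(2)
learning_rate = 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(500):
    probs = softmax(logits)
    move = rng.choice(2, p=probs)        # sample a move from the policy
    reward = 1.0 if move == 1 else -1.0  # +1 for a win, -1 for a loss
    # REINFORCE update: nudge the weights so the chosen move becomes more
    # likely in proportion to the reward it earned.
    grad = -probs
    grad[move] += 1.0
    logits += learning_rate * reward * grad

print(softmax(logits)[1])  # probability of the winning move, close to 1
```

With the winning move paying +1 and the losing move -1, the probability of the winning move climbs towards 1 as the updates accumulate, which is the "do more of the things that get more reward" idea in miniature.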

Speaker 1

但是,好吧,我的意思是,一局围棋相当长。你如何确保一开始就走对棋,以便最终在结束时得到正确的结果?比如,你如何判断游戏的哪些部分是重要的?

But then, okay, I mean, a game of Go is quite long. How do you make it so that you do the right moves in the beginning so that you end up with the right outcome towards the end? Like how do you work out which bits of the game are important, I suppose?

Speaker 0

所以这是一个非常重要的问题,被称为信用分配问题。其核心在于,正如你正确指出的,你可能走了100、200或300步不同的棋,然后在最后只得到一个信息,比如赢或输。你必须设法弄清楚游戏中的哪些走法是致胜的关键,哪些是导致失败的原因。有很多方法可以做到这一点。

So this is a really important problem. It's called the credit assignment problem. And you're absolutely right: you could have had, you know, 100 or 200 or 300 different moves, and then at the end you just get one bit of information saying win or loss. And you somehow have to work out which of the moves in the game were responsible for winning and which were responsible for losing. And there's lots of ways to do that.

Speaker 0

最简单的方法就是假设你所做的一切最终都会对结果有所贡献。某种程度上,一切都会在过程中自然显现。

The simplest way is just to assume that everything that you've done contributes a little bit to that outcome at the end. And it sort of all comes out in the wash.
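The "everything contributes a little bit" scheme he mentions can be made concrete in a short sketch (the games and move names below are invented for illustration): every move in a game is credited with the final outcome, and averaging over many games is what "comes out in the wash".

```python
from collections import defaultdict

def assign_credit(moves, outcome):
    """Simplest credit assignment: every move in one game receives the
    single final outcome, +1 for a win or -1 for a loss."""
    return {move: outcome for move in moves}

# Two hypothetical games that share the opening move "d4".
game1 = assign_credit(["d4", "c4", "g3"], +1)  # a won game
game2 = assign_credit(["d4", "e4", "f3"], -1)  # a lost game

# Averaging across games: "d4" appeared in a win and a loss, so its
# credit washes out to 0, while moves seen only in the win keep +1
# and moves seen only in the loss keep -1.
totals, counts = defaultdict(float), defaultdict(int)
for game in (game1, game2):
    for move, ret in game.items():
        totals[move] += ret
        counts[move] += 1
avg = {m: totals[m] / counts[m] for m in totals}
print(avg["d4"], avg["c4"], avg["e4"])  # 0.0 1.0 -1.0
```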

Speaker 1

AlphaGo故事中最大的时刻之一就是人们经常提到的第37手棋。请给我讲讲这个。

One of the biggest moments of the AlphaGo story was Move 37 that everyone always references. Just tell me about that.

Speaker 0

第37手棋发生在AlphaGo与李世石的第二局比赛中。AlphaGo下出了一步出乎所有人意料的棋。围棋的传统理念是,棋子通常下在棋盘的三线或四线上,因为三线能获取实地,四线能形成外势。人类绝不会在低于或高于这些线的位置落子,这对人类来说毫无意义。

So Move 37 was a move that happened in the second game of AlphaGo against Lee Sedol. And AlphaGo played a move that defied everyone's expectations. The traditional idea in the game of Go is that you play your moves typically on the third line or the fourth line of the board because this either gives you territory for the third line or influence on the fourth line. And you never go below or above that. It just wouldn't make sense to humans.

Speaker 0

AlphaGo却在五线落子,而且以某种方式让整个棋局变得合理起来。这步五线棋巧妙地将一切连接在一起。对人类而言这步棋如此陌生,我们估计人类想到这步棋的概率只有万分之一。人类对此震惊不已,但这步棋却帮助赢得了比赛。这是一个让人类意识到机器产生了创造性思维的瞬间——一种不同于人类传统围棋思维的方式,这实际上是一大进步,让我们突破了人类知识的局限。

AlphaGo played on the fifth line, and it somehow played this in a way that just made everything make sense on the board. It kind of connected everything together with this move on the fifth line. And it was so alien to humans that we estimated only a one in 10,000 probability that a human would ever think of playing this move. Humans were shocked by it, and yet it helped win the game. And so it was a moment where humans said, look, here's something creative that happened, something that a machine came up with that was different from the way humans traditionally thought about the game. That actually was a big piece of progress and took us beyond the confines of human knowledge.

Speaker 1

我想如果我们真想推动AI发展,确实需要这种你说的'外星思维'。你认为在大语言模型中看到过相当于第37手棋的突破吗?

And I guess if we really do want to advance AI, we sort of do want those alien ideas, as you put it. Do you think you've seen an equivalent of Move 37 with large language models?

Speaker 0

从某些方面来说,第37手很特别,因为它是第一个这样的突破时刻。由于我们一直处于人类数据时代,我们大量专注于复制人类能力,而很少关注超越它们。我认为除非我们真正重视系统自主学习以超越人类数据,否则在现实世界中不太可能出现相当于第37手棋的重大突破。

Move 37 in some ways was special because it was the first moment. It was the first time that people had seen a big breakthrough like this. And because we've been in the era of human data, we focused a huge amount on reproducing human capabilities and we've focused much less on going beyond them. And I think until we really emphasise systems learning for themselves to go beyond human data, we won't see huge breakthroughs the equivalent of Move 37 in the real world. It seems unlikely to me.

Speaker 1

因为当局限于人类数据时,你只能得到类人的反应。

Because when you're anchored in human data, you're only ever gonna have human-like responses.

Speaker 0

没错。而且我认为有些事情你可以做,让你或许能在中间地带有所作为。所以如果你非要我说,什么是像第37手那样的最伟大时刻?我可能会举出麻省理工学院科学家们发现一种人类此前未知的新型抗生素的工作。我认为这是一个对人类具有巨大重要性的惊人发现。

That's right. And I think there are things you can do that allow you to maybe do things in the middle a little bit. So if you push me to say, what's the greatest Move 37 like moment? I would probably pick out some work by scientists at MIT who discovered a new antibiotic that no human knew about. And I think that's an incredible discovery of massive importance to humanity.

Speaker 0

所以从这个意义上说,它远远超出了第37手。但我喜欢第37手的地方在于,它不仅仅是一个单一的发现。它是一个无限发现序列中的一个,系统可以不断学习、学习再学习。第37手对我很重要,因为它代表了这种无限发现序列中的单个点,一旦你掌握了这种从经验中学习的方法,这样的发现就能持续发生。

So in that sense, it goes way beyond Move 37. But what I like about Move 37 is that it's not just a single discovery. It's one of an infinite series of discoveries where the system can just keep on learning and learning and learning. And Move 37 is important to me because it represents a single point in that infinite sequence of discoveries that can happen once you've got this kind of approach of learning from experience.

Speaker 1

而不是结果本身的价值。

Rather than the actual result in and of itself.

Speaker 0

是的,没错。

Yeah, that's right.

Speaker 1

给我简要介绍一下AlphaZero的工作原理。

Give me a brief rundown of how AlphaZero worked.

Speaker 0

AlphaZero其实非常简单。我的意思是,世界上有一些非常复杂的算法,但这个算法真的很直接。你只需要从一个策略开始——一种选择走法的方式,以及一个价值函数——一种评估走法好坏的方法。你从这个开始,进行搜索,然后根据搜索结果选择最佳走法,并训练你的策略更多地这样做,根据搜索结果多做好的走法。你还根据使用这种搜索进行游戏时的实际结果来训练你的价值函数。

So AlphaZero is surprisingly simple. I mean, there's some very complicated algorithms out there in the world, but this one is really straightforward. So all you do is you start with a policy, a way to pick moves and a value function, which is a way to evaluate moves and say whether they're good or bad. So you start with that, you run a search and then what you do is you take the best move according to your search and you train your policy to do more things like that, to do more of the good moves according to your search. And you train your value function based on how the game actually panned out when you played a game with this search.

Speaker 0

就这样。你只需要将这个流程迭代数百万次,就能培养出一个超人类的游戏玩家。

And that's it. You just iterate that millions of times and out pops a superhuman game player.
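That loop can be sketched on a toy game. This is an illustration of the recipe, not the real AlphaZero: the game Nim(10), the one-ply "search", and every name here are invented, and real AlphaZero runs a deep Monte Carlo tree search guided by the policy's priors rather than a single lookahead step.

```python
import random

# Toy game: Nim(10). Players alternately take 1 or 2 stones from a pile
# of 10; whoever takes the last stone wins.
random.seed(0)
N = 10
MOVES = (1, 2)
policy = {s: {m: 0.5 for m in MOVES if m <= s} for s in range(1, N + 1)}
value = {s: 0.0 for s in range(0, N + 1)}  # outcome estimate, player to move

def search(state):
    """One-ply lookahead: prefer the move whose resulting position looks
    worst for the opponent under the current value function."""
    def score(m):
        nxt = state - m
        return 1.0 if nxt == 0 else -value[nxt]  # taking the last stone wins
    return max(policy[state], key=score)

def self_play(eps=0.3):
    """Play one game, mostly with search plus some exploration; return the
    visited states and each mover's final outcome (+1 win / -1 loss)."""
    states, state = [], N
    while state > 0:
        states.append(state)
        legal = [m for m in MOVES if m <= state]
        move = random.choice(legal) if random.random() < eps else search(state)
        state -= move
    # Whoever moved last took the last stone and won; alternate backwards.
    outcomes = [(-1) ** i for i in range(len(states))][::-1]
    return states, outcomes

LR = 0.2
for _ in range(2000):
    states, outcomes = self_play()
    for s, z in zip(states, outcomes):
        best = search(s)
        for m in policy[s]:  # train the policy toward the search's choice
            target = 1.0 if m == best else 0.0
            policy[s][m] += LR * (target - policy[s][m])
        value[s] += LR * (z - value[s])  # train the value on the real outcome

print(search(2))  # with 2 stones left, take both and win on the spot
```

The essential shape matches what David describes: search improves on the raw policy, the policy is trained toward what the search chose, and the value function is trained toward how the games actually panned out.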

Speaker 1

这基本上就像魔法一样。

It's like magic, basically.

Speaker 0

有时候确实感觉像魔法。我记得第一次让我真正感受到魔法般神奇的是我们刚完成AlphaZero在国际象棋上的应用时。有人想到尝试把它用在另一个游戏上。于是我们把它接入了一个我们没人会玩的游戏,叫做将棋,也就是日本象棋。我们完全不知道这个游戏怎么玩。

It does sometimes feel like magic. I remember the first time that really felt like magic to me was when we had just completed AlphaZero on chess. Someone had the idea of trying it on a different game. So we plugged it into a game that none of us could play, a game called shogi, which is Japanese chess. And we had no idea how to play this game.

Speaker 1

什么,你们连规则都不知道?

What, you didn't even know the rules?

Speaker 0

系统知道规则。我们教了这个智能体规则,但我们自己完全不知道真正的策略或战术。如果我们来玩这个游戏,可能会一错再错。我们只是把它接进去,这确实是我们第一次在将棋上运行AlphaZero。我们完全不知道它水平如何。

So the system knew the rules. We taught the agent the rules, but none of us had the first clue about strategy or tactics. It would have been blunder after blunder if we'd been playing this game. And we just plugged it in, and it was literally the first ever time we ran AlphaZero on shogi. We had no idea whether it was good or not.

Speaker 0

我们无法评估它,但我们把它发给了Demis,他实际上是个相当厉害的玩家。他说,这看起来相当不错。我要把它发给世界冠军。而世界冠军说,我认为这已经超乎人类水平了。所以这真的感觉像魔法,因为我们只是按下了运行按钮,完全不了解过程和它是怎么做到的,但不知怎的就冒出了一个超人类的将棋玩家。

We couldn't evaluate it, but we sent it off to Demis, who's actually a reasonably strong player. He said, this looks quite good. I'm sending it to the world champion. And the world champion said, I think this is superhuman. And so it literally felt like magic, because we just pressed go on this system and had no idea of the process and how it got there, but somehow out popped a superhuman shogi player.

Speaker 1

AI能设计自己的强化学习算法吗?

Can AI design its own reinforcement learning algorithms?

Speaker 0

嗯,有趣的是,我们实际上在这个领域已经做了一些工作。这是我们几年前做的研究,但现在才发表。我们所做的是构建一个系统,通过试错,通过强化学习自身,找出哪种算法最适合强化学习。它实际上上升了一个元层级,学会了如何构建自己的强化学习系统。令人难以置信的是,那个强化学习系统实际上超越了我们人类多年来自己想出的所有强化学习算法。

Well, funnily enough, we have actually done some work in this area. It's work we actually did a few years ago, but is coming out now. And what we did was actually to build a system that, through trial and error, through reinforcement learning itself, figured out what algorithm was best at reinforcement learning. It literally went one level meta, and it learned how to build its own reinforcement learning system. And incredibly, that reinforcement learning system actually outperformed all of the human reinforcement learning algorithms that we'd come up with ourselves over many, many years in the past.

Speaker 1

我是说,这都是一遍又一遍重复的故事。你在某个东西里加入越多的人为因素,它的表现就越糟糕,

I mean, this is the same story over and over again. The more of a human you put into something, the worse it acts,

Speaker 0

性能就越差。

the worse it performs.

Speaker 1

把人为因素拿掉,反而表现更好。好吧。如果AlphaGo和AlphaZero确实是强化学习的最佳应用典范,那么在我们现有的大型语言模型中仍然能找到强化学习的影子,对吧?给我讲讲它们是如何集成到这些系统中的。

Take the human out, it does better. Okay. If AlphaGo and AlphaZero, then, are really exceptional examples of reinforcement learning used to the best it can be, you still find reinforcement learning in the large language models that we have at the moment, right? Tell me about how it's integrated into these systems.

Speaker 0

强化学习几乎被用于所有大型语言模型系统。主要应用方式是与人类数据相结合。与AlphaZero方法不同,这意味着强化学习实际上是基于人类偏好进行训练的。系统被要求生成输出,然后由人类评判:这个比那个更好。系统就会逐渐变得更符合人类的偏好。

So reinforcement learning is used in almost all large language model systems. And the main way it's used is by combining it with human data. So unlike the AlphaZero approach, this means that the reinforcement learning is actually trained on human preferences. So the system is basically asked to produce outputs and then a human says, This one is better than this other one. And the system becomes more like the one that the human prefers.
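The preference step described here is often modelled with a Bradley-Terry style reward model. Below is a minimal sketch under stated assumptions, not any lab's actual pipeline: the reward model is linear, the "response features" and the true preference direction are invented, and the fit is plain stochastic gradient ascent on the logistic likelihood of each human choice.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
w = np.zeros(dim)  # weights of a linear reward model over response features

# Hypothetical preference data: pairs of (preferred, rejected) feature
# vectors, with the "human" preference generated by a hidden direction.
true_w = np.array([1.0, -2.0, 0.5, 0.0])
pairs = []
for _ in range(200):
    a, b = rng.normal(size=dim), rng.normal(size=dim)
    pairs.append((a, b) if a @ true_w > b @ true_w else (b, a))

lr = 0.05
for _ in range(100):
    for preferred, rejected in pairs:
        # P(preferred beats rejected) under the model (Bradley-Terry).
        p = 1.0 / (1.0 + np.exp(-(w @ preferred - w @ rejected)))
        # Gradient ascent on the log-likelihood of the human's choice.
        w += lr * (1.0 - p) * (preferred - rejected)

# The learned reward now agrees with the rater on most training pairs.
agree = sum((w @ a > w @ b) for a, b in pairs) / len(pairs)
print(round(agree, 2))
```

The learned reward then stands in for the rater when the policy is trained, which is exactly why, as David goes on to argue, the system can never score a behaviour higher than the raters themselves would.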

Speaker 0

这被称为人类反馈强化学习(RLHF)。它在大型语言模型中极其重要,帮助将其从盲目模仿互联网上任何数据的系统,转变为能有效回答人们真正想知道的问题的系统。这是个惊人的进步。然而,感觉我们有点矫枉过正了。这些人类反馈强化学习系统虽然非常强大,

And this is called reinforcement learning from human feedback. And it's been massively important in LLMs and it's helped transform them from systems that just blindly mimic any kind of data that you see on the internet into systems that actually usefully produce answers to the kind of questions that people really want to see. And so it's an incredible advance. However, it feels like we've thrown out the baby with the bathwater. These reinforcement learning from human feedback systems or RLHF, they're very powerful,

Speaker 2

但是

but

Speaker 0

它们无法超越人类知识。如果人类评分者不认识某个新想法,低估了某些行动序列实际上会比其他序列好得多,系统就永远学不会找到那个序列,因为评分者可能无法理解更好的行为。

they do not have the ability to go beyond human knowledge. Like if a human rater doesn't recognise some new idea and under-appreciates that some series of actions would actually end up being far better than others, there is no way the system will ever learn to find that sequence, because the raters might not understand that better behaviour.

Speaker 1

不过那个人类反馈元素,确实让这些模型有了一些接地气的感知。就像,我记得我们上次聊的时候,接地气是个很大的话题,就是希望这些算法能对我们生活的世界有概念性的理解。所以如果去掉或移除人类反馈这个方面,最终得到的模型还能算是接地气的吗?

That human feedback element though, it does seem to give these models some sense of grounding. Like, I know the last time we spoke, grounding was this really big topic, this idea that you want these algorithms to have an almost conceptual understanding of the world that we're living in. So if you take away or remove that human feedback aspect, do you still end up with models that are grounded?

Speaker 0

我几乎想提出相反的观点。

I almost want to argue the opposite.

Speaker 1

哦。

Oh.

Speaker 0

我想说的是,当我们通过人类反馈训练系统时,它其实并不接地气。原因在于,RLHF系统通常的工作方式是:系统呈现它的回应,比如对某个问题的答案,然后评分者在系统实际利用这些信息之前就判断这个回应是好是坏。所以就像是人类在预先评判系统的输出。举个例子,如果你向大语言模型要一个蛋糕食谱,人类评分者会查看系统输出的食谱,并在任何人实际制作并品尝蛋糕之前就判断这个食谱的好坏。从这个意义上说,这是不接地气的。

I want to say that when we train a system from human feedback, that it is not grounded. And the reason is that we are basically, the way RLHF systems normally work, is the system presents its response, its answer to a question, for example, and a rater says that's good or bad before the system actually does anything with that information. So it's like the human is pre-judging the output of the system. So for example, if you're asking for a cake recipe from an LLM, the human rater will look at the recipe that's output by the system and judge whether that recipe is good or bad before anyone has actually made the recipe and eaten the cake. And in that sense, it's ungrounded.

Speaker 0

接地气的结果应该是有人真的吃了蛋糕,然后蛋糕要么美味要么难吃。这样你就能得到接地气的反馈,比如‘这个蛋糕确实好吃’或‘这个蛋糕不好吃’。正是这种接地气的反馈让系统能够迭代和发现新事物,因为它可以尝试一些新食谱,也许专业厨师认为会很难吃,但结果却非常美味。

Like a grounded outcome would be someone actually eats the cake and the cake is either delicious or disgusting. And then you've got grounded feedback that says, This cake really was a good cake, or This cake was a bad cake. And it's that grounded feedback that allows the system to iterate and discover new things, because it can try out new recipes that maybe, you know, expert chefs presume will be disgusting, but actually turn out to be delicious.

Speaker 1

对,就像那个,嗯,怪物松饼之类的。没错,是的。

Yeah, like a, yeah, Monster Munch muffin or whatever. Exactly, yeah.

Speaker 0

表面上听起来像是史上最美味的食物。

On the surface, it sounds the most delicious food that ever existed.

Speaker 1

不过,这很有意思,因为我听说过,甚至和Demis的一次对话中谈到过这些模型是如何实现接地的,它们是如何建立起对事物的概念性理解的。听起来你所说的似乎是它们拥有的接地就像是一种表面的接地层次,也许是这样吗?

Okay, that's interesting though, because I have heard, I mean, even a conversation with Demis talking about how grounding gets into these models, how they kind of have built this conceptual understanding of things. And it sounds almost like what you're saying is that the grounding that they have is like a sort of superficial level of grounding, maybe?

Speaker 0

我认为人类数据是扎根于人类经验的。所以,大语言模型(LLMs)某种程度上继承了人类通过自身实验可能发现的所有信息。例如,在科学领域,一个人可能尝试过在水上行走,结果发现自己掉进去了,然后他们可能造了一艘船,发现船能浮起来。所有这些信息都可以在一定程度上被LLM继承。但如果我们想要一个真正能做出发现、发现水上推进的全新形式,或者某种全新的数学理念,或者某种全新的治疗疾病的方法的系统。

I think human data is grounded in human experience. So it's like the LLMs are sort of inheriting all of that information that humans may have figured out from their own experiments. For example, in science, a human might have tried to walk across water and discovered that they fell in, and then they might have created a boat and discovered that it floated. All of that information can be inherited somewhat by the LLM. But if we want a system that actually makes discoveries and discovers some completely new form of propulsion across water, or, you know, some completely new mathematical idea, or some completely new way to treat disease.

Speaker 0

是的,新药物和生物学的新方法。数据根本不存在,系统需要通过它自己的实验、自己的试错以及它自己基于现实的反馈,来判断一个想法是好是坏。

Yeah, new medicine and new approach to biology. The data just isn't there, and the system needs to figure out for itself, through its own kind of experimentation, its own trial and error, and its own grounded feedback, whether that's a good idea or a bad idea.

Speaker 1

我和Oriol,Oriol Vinyals聊过,他确实谈到了我们正在耗尽人类数据,我们将需要开始创建合成数据来填补这个空白。我的意思是,这和那个想法是相关的,对吧。只是你不是用LLM来创造更多的人类对话数据,而是以一种不同的方式来解决这个问题。

I got to talk to Oriol, Oriol Vinyals, who really spoke about how we are running out of human data, and that we are going to need to start creating synthetic data in order to fill that gap. I mean, this is related, right, to that idea. It's just that rather than using LLMs to create more human dialogue data, you're going about the solution in a different way.

Speaker 0

没错。所以合成数据可以有很多含义,但通常意味着你有一个流程,你利用现有的LLM来生成一些数据集。我猜这个论点类似于我们从人类数据中遇到的天花板,即无论合成数据有多好,它们最终都会达到一个点,这些合成数据对系统变得更强不再有用。所以,一个自我学习系统的美妙之处在于,系统的燃料实际上是经验,随着系统开始变得更强,它开始遇到恰好适合其当前水平的问题。因此,它将不断生成能让自己解决所遇下一个问题的经验。

That's right. So synthetic data can mean a lot of things, but normally it means you've got some process where you take your existing LLM and use it to generate some set of data. And I guess the argument is similar to the ceiling we have with human data: however good that synthetic data is, it will reach a point where it is no longer useful for making the system stronger. So the beauty of a self-learning system, where the fuel of the system is actually experience, is that as the system starts to get stronger, it starts to encounter problems that are exactly appropriate to the level it's at. So it will always be generating experience that allows it to solve the next problem it's encountering.

Speaker 0

因此,它可以永远变得越来越强。没有极限。我认为这正是这种使用自我生成经验的方法区别于其他形式合成数据的地方。

And so it can just get stronger and stronger and stronger forever. There is no limit. And that I think is what differentiates this particular approach of using self generated experience from other forms of synthetic data.

Speaker 1

不过,还是回到你那个蛋糕的例子,我的意思是,如果你顺着这个思路想下去,有人吃了蛋糕然后说,是的,这很好吃,你最终还是在过程的末尾使用了人类反馈。我们是在讨论这个,还是在讨论也许要拥有完全脱离人类、具身化或以某种方式存在于物理世界中的系统,以便它们能那样获得反馈?

Just returning to your cake example though, I mean, if you kind of follow that through, somebody eats the cake and says, yes, this was delicious, you're using the human feedback then at the end of the process anyway. Are we talking about that or are we talking about maybe having systems that are completely untethered from humans and are embodied or in the physical world somehow so that they can get their feedback in that way?

Speaker 0

我认为理想的情况是,像AlphaZero那样,我们拥有能够生成大量自我生成数据经验的系统,然后它们可以自行验证这些数据。在许多领域,这是可行的;但在许多其他领域,这是不可行的。在不可行的领域,我们必须承认人类是我们所处环境的重要组成部分,我们必须承认他们是我们的智能体想要生存的世界的一部分。

Look, I think the ideal is that like AlphaZero, we have systems which are able to generate vast volumes of self generated data experience that they can then verify for themselves. And in many domains, that's gonna be possible. And in many domains, it's not going to be possible. In the ones where it's not possible, we have to acknowledge that humans are a big part of the environment that we're in. We have to acknowledge they're a part of the world that we want our agents to live in.

Speaker 0

因此,将人类视为环境的一部分,并将他们的行为方式视为智能体接收到的观察的一部分,似乎是合理的。我所反对并认为缺乏根据的并非这一点,而是智能体学习的奖励来自于人类对这一系列行为好坏的判断,而系统本身并未根据这些行为在现实世界中的后果进行判断。所以,一种说法是我们不应将人类数据视为智能体经验中的特权部分。

And so it seems reasonable to think of humans as a part of that environment and to think of the way that they behave as part of the observation that the agent receives. I think the thing which I'm pushing back against and saying is not grounded is not that. It's the fact that the rewards that the agent learns from is coming from a human's judgment of whether this sequence of actions is good or bad. And the system is not judging for itself based on the consequence of those actions in the actual world. And so, one way to say it is that we shouldn't make human data a privilege part of the agent's experience.

Speaker 0

它只是世界中的观察数据,我们应该能够像学习任何其他数据一样从中学习。

It's just observations in the world, and we should be able to learn from that like any other data.

Speaker 1

如果我们回到之前AlphaGo分配奖励的例子,即它在最后获得的那一分,目前我们处理AI的方式几乎就像是算法先进行前10步或15步,然后我们插入一个人说‘是的,这前10步很好’,而不让整个过程完全执行完毕才输入那一点点反馈。

If we go back to that AlphaGo example earlier, of assigning that reward, that one point it gets at the end. Is the way we're handling AI at the moment almost like the algorithm does its first 10 or 15 moves, and then we insert a human who says, yes, that's a good first 10 moves, and we don't allow the whole process to execute fully before inputting that little bit of feedback?

Speaker 0

完全正确。想象一下,如果我们训练AlphaGo时,每一步棋之后,我们最好的围棋棋手都进来说‘哦,这步棋太棒了’或‘哦,不,那步棋完全错了’。然后我们得到反馈并输入系统,系统学会选择人类偏好的棋步,这样它最终就不会发现第37手棋,因为它只会下出人类认为好的围棋,永远不会发现人类不知道的新下法。

That's exactly right. So imagine that we were training AlphaGo and after every single move, our best Go player comes in and says, Oh, that move was amazing. Oh, no, no, that move was totally wrong. And then we get that feedback and we put it in and the system learns to pick the move that the human prefers, it would not end up discovering Move 37, because it would just end up playing like the human thinks is a good game of Go, and it would never discover the new ways to play Go that that human didn't know about.

Speaker 1

好的,所以在围棋环境中,你说的很有道理。还有其他环境我也觉得这很有道理。我想到的是人类思想的巅峰——数学。告诉我那个领域发生了什么。

Okay, so I think the environment of Go, what you're saying makes a lot of sense in that environment. There are other environments too where I think that this makes a lot of sense. I'm thinking here about the pinnacle of human thought, of mathematics. Tell me what's been going on in that space.

Speaker 0

如你所说,这是一项不可思议的人类努力,凝聚了数千年的心血。因此在很多方面,它确实代表了人类思维成就的极限。所以我们自然转向AI,看看能否达到人类经过所有这些努力所达到的同等性能水平。我们最近完成了一项我认为非常令人兴奋的工作,叫做Alpha Proof。这是一个通过经验学习如何正确证明数学问题的系统。

Like you say, it's an incredible human endeavor that's had millennia of human effort going into it. And so in many ways it does represent like literally the limits of achievement by the human mind. And so naturally we turn to it for AI to see, can we achieve those same levels of performance that humans have achieved over all of those years of endeavour. We recently put together what I think is a very exciting piece of work called Alpha Proof. It is a system that learns through experience how to correctly prove mathematical problems.

Speaker 0

所以它可以做到,如果你给它一个定理,但不告诉它如何实际证明这个定理,它会自行离开并找出该定理的完美证明。我们实际上可以验证并保证这个证明是正确的。有趣的一点是,这与大型语言模型(LLM)通常的工作方式完全相反,因为如果你现在让LLM证明一个数学问题,它们通常会输出一些非正式的数学内容,然后说,相信我,这是正确的。它可能是对的,但也可能不对,因为我们知道LLM经常会产生幻觉,它们可能会编造东西。

So it can, if you give it a theorem and you don't tell it anything about how to actually prove that theorem, it will go away and figure out for itself a perfect proof of that theorem. And we can actually verify and guarantee that this proof is correct. One thing which is interesting about this is that it's the exact opposite of how LLMs normally work, because if you ask LLMs to prove a mathematical problem at the moment, they will normally output some informal mathematics and say, just trust me, this is correct. And it might be correct, but it might not be because we know that LLMs tend to hallucinate a lot. They can make things up.

Speaker 0

Alpha Proof的好处是,它实际上能够保证产出真理。

And the nice thing about AlphaProof is that it will actually, guaranteed, produce the truth.

Speaker 1

那么我们来想一个例子,以便让大家更好地理解。比如说,质数是只能被自身和1整除的数,而且有无限多个,去吧,证明它。

So let's think of an example here to kind of anchor this in people's minds. Let's say that prime numbers are something that can't be divided by anything but themselves and one, and there are an infinite number of them, off you go, prove it.

Speaker 0

是的,Alpha Proof的工作方式是,它在数百万个不同的定理示例上进行训练,而不仅仅是一个。它的过程是,它会去训练这些示例,一开始,它无法解决绝大多数问题,99.999%的定理它都做不到。

Yeah, so the way AlphaProof works is it's trained on millions of different examples of theorems, not just one. And what happens, it goes off and it trains on them, and to begin with, it can't solve the vast majority of them, 99.999% of the theorems it just can't do.

Speaker 1

这些是人类已经证明过的定理吗?你是输入这些吗?

And these are theorems that humans have already proved? Are you feeding those in?

Speaker 0

我们向系统输入了大约一百万个由人类自己想出的不同定理,但我们不提供人类的证明。我们只提供问题,而不提供答案。

We feed into the system something like a million different theorems that humans have come up with themselves, but we don't provide the human proofs. We just provide the questions, but not the answers.

Speaker 1

所以你给它的是你知道为真的东西,但只是不告诉它如何证明。

So you're giving it stuff that you know is true, but you're just not telling it how to prove it.

Speaker 0

有时我们甚至不知道它是否真实,因为我们实际上做的是将人类定理、人类问题转化为一种形式化的语言。

And sometimes we don't even know it's true because what we actually do is we take the human theorem, the human question, and we actually turn it into a formal language.

Speaker 1

这些并非使用语言模型所理解的语言,但它们确实在使用一种形式的语言,比如数学语言。

These aren't using language in the sense that language models are using, but they are using a form of language, like a mathematical language.

Speaker 0

没错。实际上,我们确实使用了一个小型大型语言模型,这个大型语言模型使我们能够输出编程语言。特别是,我们使用一种名为Lean的编程语言,它可以表达所有数学内容。数学家们提出了这个惊人的想法,即我们可以将通常用英语或其他语言讨论的内容,转化为一种完全清晰、可验证的数学语言,这种语言不仅能表达所有数学思想,还能表达所有数学证明的思想。例如,你可以说,如果A蕴含B,且B蕴含C,那么存在一种方法可以推导出A蕴含C。

That's right. So in fact, we do use a small large language model and that large language model allows us to output programming languages. And in particular, we use a programming language that's called Lean that allows all of mathematics to be expressed. And so it's an amazing idea that mathematicians have come up with, that you can actually formalize all of these kind of things that we normally talk about in English language, or whatever language you happen to be speaking, can be transformed into a perfectly clear, verifiable mathematical language that allows all of the ideas of maths to be expressed and also all of the ideas of mathematical proof to be expressed. So you can say, for example, that if A implies B and B implies C, then there's a way to go from that to A implies C.
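The "A implies B, B implies C" chaining David mentions is exactly the kind of step Lean can check mechanically. A minimal sketch in Lean 4 (the theorem name `chain` and hypothesis names `hab`, `hbc` are ours, purely for illustration):

```lean
-- Given proofs that A implies B and B implies C, construct A implies C.
theorem chain {A B C : Prop} (hab : A → B) (hbc : B → C) : A → C :=
  fun ha => hbc (hab ha)
```

Lean's kernel verifies that the term `fun ha => hbc (hab ha)` really has the type `A → C`, which is what makes proofs written in this language checkable, and hence usable as a reward signal.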

Speaker 0

这就是你可以在这种数学编程语言中做的事情。你基本上编写一个程序,将你从一个点带到另一个点,然后,瞧,你就有了这个陈述的证明。因此,我们从大约一百万个人类问题中生成了一亿个形式化问题。其中一些可能实际上不可能,或者可能被错误地表述,或者干脆就是错误的。但这没关系,因为我们所做的就是学习证明这些内容,而那些我们无法证明的,我们会不断尝试再尝试。

And that's the kind of thing that you can do in this mathematical programming language. You essentially write a programme that takes you from one to the other and ta da, you have a proof of this statement. So we take our kind of million human problems and from that we generate 100,000,000 formal problems. And some of those might actually not be possible, or they might be incorrectly formulated, or they might just be false. And it doesn't matter, because all we do is we learn to prove those things, and the ones which we can't prove, we keep trying and keep trying.

Speaker 0

那些我们已经证明的,好了,它们完成了,现在可以放在一边了。如果我们反驳了它们,那也没问题,它们也被排除了。剩下的就是那些真正有趣的问题,即那些非常难以证明的问题,我们不断从只能解决一两个,进步到能解决十个或二十个,最终能够解决一百万个。

The ones that we already prove, okay, they're done, they're out the way now. If we disprove them, that's fine, they're out the way. And we're left with the really interesting ones, which are the ones which are really hard to prove, and we keep kind of climbing up from just being able to solve one or two of them, to then being able to solve 10 or 20 of them, and eventually being able to solve a million of them.
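The keep-trying loop David describes can be sketched in a few lines of Python. Here `solve_attempt` is a stand-in for the prover (our illustration, not AlphaProof's actual API); it simply succeeds with a probability that grows over the rounds, mimicking an agent that improves:

```python
import random

def solve_attempt(rng: random.Random, round_idx: int) -> bool:
    # Stand-in prover: gets a little stronger each round.
    return rng.random() < 0.1 * (round_idx + 1)

def run_curriculum(num_problems: int, rounds: int, seed: int = 0):
    rng = random.Random(seed)
    remaining = set(range(num_problems))
    settled = set()  # proved or disproved: "done, out of the way"
    for r in range(rounds):
        for p in list(remaining):
            if solve_attempt(rng, r):
                settled.add(p)
                remaining.discard(p)
    # What's left is the "really interesting" hard set, retried next time.
    return settled, remaining

settled, remaining = run_curriculum(100, rounds=5)
```

The point of the sketch is the filtering: settled problems leave the pool, so effort concentrates on the shrinking frontier of hard ones.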

Speaker 1

那么,这是否等同于AlphaGo中证明正确或错误的时刻,就像你赢了游戏或没赢一样?

Is this the equivalent then, that moment of the proof is correct or incorrect, is that equivalent to AlphaGo, you win the game or you don't?

Speaker 0

这完全等同。如果我们使用Lean说“干得好,你已经证明了这一点”作为奖励,并且如果系统解决了问题,我们就给它加一分,如果没有正确解决,就减一分。这样,我们就可以通过强化学习训练系统,使其在证明数学陈述方面变得越来越好。实际上,我们直接使用了与提高围棋、国际象棋和其他游戏水平相同的AlphaZero代码。完全是相同的代码,但可以说,它运行在数学的游戏中。

It's exactly equivalent. So we use the idea that Lean says, Well done, you've proved this, as a reward, and we give the system plus one if it solves it and minus one if it doesn't get it correct. And this allows us to then train a system by reinforcement learning to get better and better at proving mathematical statements. In fact, we literally use the same AlphaZero code that we used to get better at Go and chess and all of these other games. It's literally the same code, but it's running, if you like, with the game of mathematics.
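A toy sketch of this reward setup, with a stand-in checker in place of Lean and a toy value table in place of the AlphaZero network (every name here, `check_proof`, `reward`, `update`, is illustrative, not the real system's API):

```python
def check_proof(theorem: str, proof: str) -> bool:
    # Stand-in for Lean saying "well done, you've proved this":
    # accepts only the one known-good proof string.
    return proof == f"proof_of_{theorem}"

def reward(theorem: str, proof: str) -> int:
    # Plus one for a verified proof, minus one otherwise.
    return 1 if check_proof(theorem, proof) else -1

values: dict[str, float] = {}  # toy tabular stand-in for the learned network

def update(theorem: str, proof: str, lr: float = 0.1) -> float:
    r = reward(theorem, proof)
    v = values.get(theorem, 0.0)
    values[theorem] = v + lr * (r - v)  # move the estimate toward the reward
    return values[theorem]

update("thm_a", "proof_of_thm_a")  # reward +1: estimate rises
update("thm_a", "wrong_proof")     # reward -1: estimate falls back
```

The key property is that the reward comes from a verifier, not from human judgment of the moves, which is the contrast David draws with human-feedback training.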

Speaker 1

游戏?你怎么敢?竟敢轻视我的专业。好吧,它到底有多厉害?

The game? How dare you? Dare trivialise my subject. Okay, how good is it?

Speaker 0

它还不是一个超人类数学家,尽管这是我们希望有一天能达到的目标。但Alpha Proof确实取得了一项成就——在国际数学奥林匹克竞赛中达到了银牌水平,这是最知名且最具挑战性的数学竞赛。这项竞赛每年举办一次,汇聚了全球最杰出的年轻数学家。毫不夸张地说,这些题目极其困难,相当棘手。

It's not yet a superhuman mathematician, although that is where we'd like to get to one day. But one thing which AlphaProof did achieve came in the most well-known and challenging of mathematical competitions, the International Mathematical Olympiad. And this is a competition that happens once a year for the most incredible and amazing young mathematicians from all around the world. And the problems, to say the least, are extremely hard. They're spicy.

Speaker 1

确实非常棘手。作为一名数学教授,有时候,我是说,它们真的很棘手。

They're very spicy. As a professor of maths, sometimes, I mean, they're spicy.

Speaker 0

所以汉娜也说了,这些问题很难,非常难。而令人惊叹的是,Alpha Proof在这场竞赛中实际达到了银牌级别的表现。这一水平只有大约10%的参赛者能够达到。

So you heard it from Hannah. These are hard problems. They're hard. And Alpha Proof, amazingly, it actually achieved a silver medal level of performance in this competition. So this is a level of performance that only roughly 10% of the contestants would actually be able to achieve.

Speaker 1

是在全世界范围内,我是说。

In the entire world, I mean.

Speaker 0

是在全世界范围内。这就像是年轻数学家的精英,每个国家最顶尖的六个人。不仅如此,还有一个特定问题,不到1%的参赛者能够解决,而Alpha Proof为这个问题提供了一个完美的证明。看到这一点真是令人欣慰。

In the entire world. This is like the cream of young mathematicians, like the six best from every country. And not only that, but there was one particular question that less than 1% of all the contestants were able to solve, and Alpha Proof got a perfect proof for this particular problem. So that was nice to see.

Speaker 1

这些证明看起来是什么样的?我的意思是,如果你没有向它们输入任何人类数据,它们是否遵循人类风格的论证方式?

What do the proofs look like? I mean, do they follow human style arguments if you're not inputting any human data into them?

Speaker 0

我必须说,对我来说,这些证明我完全看不懂。

I have to say that to me, the proofs, I don't understand them at all.

Speaker 1

蒂姆·高尔斯,我是说,菲尔兹奖得主和前IMO选手,我是说,他获得过,他是金牌得主吗——

Tim Gowers, I mean, Fields medalist and former IMO contestant, I mean, was he a gold medalist at the-

Speaker 0

蒂姆·高尔斯是,没错,多次金牌得主。

Tim Gowers was, yeah, a multiple gold medalist.

Speaker 1

非凡的大脑,对吧?就像非凡的数学家。但我的意思是,他理解这些证明,对吧?

An extraordinary brain, right? Like an extraordinary mathematician. But I mean, he understands these proofs, right?

Speaker 0

所以蒂姆·高尔斯实际上审阅了我们的解决方案,以确保它们是有效的,并且我们没有违反任何规则。他理解这些解决方案,并认为它们远远超出了以往AI数学所能做到的任何事情。所以这是一个飞跃,但这仍然只是开始,因为我们真正想要超越人类数学家,这是我们下一步的目标。

So Tim Gowers actually refereed our solutions to make sure that they were valid solutions and that we hadn't broken any of the rules. And he understands the solutions and thought that they were a huge leap beyond anything that previous AI mathematics could do before. So it's a jump forward, but it's still just the beginning in the sense that we really want to go beyond human mathematicians and that's where we'd like to go next.

Speaker 1

因为目前来说,基本上,你得到的是一个非常、非常、非常有天赋的17岁数学家,对吧?

Because at the moment, basically, you've got yourself a very, very, very talented 17 year old mathematician, right?

Speaker 0

没错。而且应该说明的是,参加IMO的系统确实比人类参赛者被允许的时间要长。所以,你知道,这只是我们假设随着机器速度加快,这种情况会随着时间的推移而改善。

That's right. And it should be said that the system that entered the IMO did take longer than a human contestant would be allowed to take. So, you know, that's something we're just going to assume will get better over time as machines get faster.

Speaker 1

我的意思是,国际数学奥林匹克竞赛(IMO)就像一个完美的试验场,因为它有标准答案,可以被评判,可以与人类表现进行比较,诸如此类。但如果你输入的是猜想,比如那些我们甚至不知道是否成立的东西,比如ABC猜想、黎曼猜想,或者数学中那些真正宏大的未解难题。如果AlphaProof输出结果并声称:‘不不不,我们验证了这个证明,它是成立的’,你能信任它吗?甚至更进一步,如果我们无法理解它,这样的证明还有价值吗?

I mean, the IMO is like the perfect test bed because there are correct answers, it can be judged, you can compare it to human performance, all of that kind of thing. But if you are feeding in conjectures, so things that we don't even know are true, you know, I'm thinking of like the ABC conjecture here, or the Riemann hypothesis, or any of those really grand unsolved challenges in mathematics. If AlphaProof outputs something and says, No, no, no, we've checked this proof, it works, can you trust it? And maybe even beyond that, is it worth anything if we don't understand it?

Speaker 0

我认为Lean的好消息是,比我更优秀的数学家总是能够将Lean证明转化为人类可以理解的形式。事实上,我们甚至已经构建了一个人工智能系统来完成这项工作,它可以将任何形式化证明进行我们所谓的‘非形式化’处理,即将其转换回人类非常容易理解的形式。如果我们真的解决了黎曼猜想——顺便说一句,我们离那一步还很远——但如果真的做到了,将会有数百万数学家非常兴奋地去理解由此产生的新数学,并将其解码为人类能够理解的内容。

I think the good news about Lean is that mathematicians who are better than myself are always able to take a Lean proof and translate it back into something that humans can understand. And in fact, we've even built an AI system that can do this, which can take any formal proof and what we call informalise it, which means it will turn it back into something which is very understandable to humans. And if we did solve the Riemann hypothesis, and by the way, we're a long way from doing that, but if it was done, there'll be millions of mathematicians who'd be very excited to understand whatever new mathematics came out of it and decode it back into things that humans can understand.

Speaker 1

好吧,但我的问题是这样的:克莱数学研究所在2000年为七个不同的数学问题设立了一百万美元的奖金。你看,人类数学家花了四分之一个世纪试图解决它们,但只有一个被攻克了。你认为下一个可能会由AI解决吗?

Okay, but here's my question, right? The Clay Math Institute in the year 2000 offered a million dollar prize for seven different mathematical problems. And, you know, human mathematicians have had a quarter of a century in order to try and solve them and only one has fallen. Do you think potentially the next one could go to AI?

Speaker 0

是的,我确实这么认为。但我认为这可能需要时间。我觉得我们还没有达到那个水平,AI系统要具备这种能力还有很长的路要走。但我相信AI正走在正确的轨道上,像AlphaProof这样的系统会变得越来越强大。

Yes, I do actually. But I do think that it might take time. I don't think we're there yet. I think there's a long way before AI systems are capable of doing this. But I think AI is on the right track and systems like AlphaProof will become stronger and stronger and stronger.

Speaker 0

你知道,我们在IMO中看到的只是一个开始。一旦你拥有一个可以扩展、不断学习、再学习的系统,真的就没有什么限制了。那么这些系统在两年、五年或二十年后会是什么样子?我个人会觉得,如果AI数学家没有彻底改变整个数学领域,那才令人惊讶。我认为这一天即将到来。

You know, what we saw in the IMO is just the beginning. And you know that once you have a system that can scale and can keep learning and learning and learning, really the sky's the limit. So what will these systems look like in two years or five years or twenty years? Well, I personally would be amazed if AI mathematicians don't transform the whole of mathematics. I think it's coming.

Speaker 0

数学是少数几个原则上可以由机器完全数字化处理的领域之一,机器可以自我交互,不断推进。因此,对于经验驱动的AI系统来说,掌握数学并没有根本性的障碍。

Mathematics is one of the few areas where in principle everything can be done completely digitally by a machine interacting with itself and just going and going and going. So there's really no fundamental barrier to an experience-driven AI system mastering mathematics.

Speaker 1

好吧。顺便说一句,我真的很认同你关于AlphaProof的观点,AlphaZero也是如此。我认为它们是强化学习能够取得多大成就的绝佳例子。但它们也是成功标准非常明确的例子:你赢了围棋比赛或者没赢,你的证明正确或者不正确。

Okay. I really buy what you're saying about AlphaProof by the way, and the same with AlphaZero. I mean, I think they're really excellent examples of how far you can go with reinforcement learning. But they are also examples where there is a very clear metric of success. You win a game of Go or you don't, your proof is correct or it isn't.

Speaker 1

这些想法如何应用到那些实际上更混乱、这些非常清晰的指标可能并不一定存在的系统中?

How do these ideas translate to systems where it's a lot messier, where these very clear metrics might not necessarily be present?

Speaker 0

首先,我想承认这个问题可能就是为什么强化学习方法或我所说的这类基于经验的方法尚未完全渗透到我们在每个AI系统中所做的一切主流应用中的原因。所以这个问题必须被攻克。如果经验时代要到来,我们就必须对此给出答案。但我认为答案可能就在我们眼前。因为实际上,当你观察时,现实世界包含了无数的信号。

So first, I want to acknowledge that this question is probably the reason why reinforcement learning methods, or these kind of experience-based methods that I'm talking about, have not yet broken into the mainstream of absolutely everything that we do in every AI system. So it has to be cracked. If the era of experience is to come about, then we have to have an answer to this. But I think the answer might be right in front of us. Because actually, when you look at it, the real world contains innumerable signals.

Speaker 0

世界运行的方式中存在着大量的信号。你知道,比如我们看我们在互联网上做的所有事情,就有无数的信号,比如喜欢或不喜欢、利润或亏损、你可能获得的快乐或痛苦信号、产量或材料属性。所有这些不同的数字代表了经验不同方面的不同事物。因此,我们真正需要的是构建一个能够适应并能够说‘在这种情况下,这些中哪一个才是真正需要优化的重点?’的系统。另一种说法是,如果我们能有这样的系统,人类可能指定他们想要什么,但这被转化为一组不同的数字,系统然后可以完全自主地自行优化,那不是很棒吗?

There's just a vast number of signals in the way that the world works. You know, if we look at all of the things that we do on the internet, for example, there's any number of signals like likes or dislikes or profits or losses or pleasure pain signals you might get or yields or properties of materials. There's all these different numbers representing different things about different aspects of experience. And so what we need is really a way to build a system which can adapt and which can say, well, which one of these is really the important thing to optimise in this situation? And so another way to say that is, wouldn't it be great if we could have systems where a human maybe specifies what they want, but that gets translated into a set of different numbers that the system can then optimise for itself completely autonomously.

Speaker 1

那么,举个例子,假设我说,好吧,我今年想更健康。这有点模糊,有点不明确。但你的意思是,这可以被转化为一系列指标,比如静息心率、BMI或其他什么。这些指标的组合然后可以用作强化学习的奖励。如果我理解正确的话?

So, okay, an example then, let's say I said, okay, I want to be healthier this year. And that's kind of a bit nebulous, a bit fuzzy. But what you're saying here is that that could be translated into a series of metrics like resting heart rate or, you know, BMI or whatever it might be. And a combination of those metrics could then be used as a reward for reinforcement learning. Have I understood that correctly?

Speaker 0

完全正确。

Absolutely correctly.

Speaker 1

好的。不过我们是在谈论一个指标,还是在这里谈论一个组合?

Okay. Are we talking about one metric though, or are we talking about a combination here?

Speaker 0

所以我认为总体思路是,你有一个人类想要的东西,比如优化我的健康。然后系统可以自行学习,比如哪些奖励有助于你更健康。因此,这可以是一个随时间调整的数字组合。所以可能一开始它说,好吧,现在你的静息心率真的很重要。然后后来,得到一些反馈说,等等,我其实不只是关心那个。

So I think the general idea would be that you've got one thing which the human wants, like to optimise for my health. And then the system can learn for itself, like which rewards help you to be healthier. And so that can be like a combination of numbers that adapts over time. So it could be that it starts off saying, Okay, well, right now it's your resting heart rate that really matters. And then later, it gets some feedback saying, Hang on, I really don't just care about that.

Speaker 0

我关心我的焦虑水平之类的。然后它会把这一点纳入考量,并根据反馈进行适应。所以可以这样说,极少的人类数据就能让系统为自己生成目标,从而实现从经验中大量学习。

I care about my anxiety level or something. And then it includes that into the mixture and based on feedback, it could actually adapt. So one way to say this is that a very small amount of human data can allow the system to generate goals for itself that enable a vast amount of learning from experience.
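The idea of translating one high-level goal into an adaptable mix of signals can be sketched like this. The signal names and the simple update rule are our assumptions for illustration, not a description of any real system:

```python
def combined_reward(signals: dict[str, float], weights: dict[str, float]) -> float:
    # Weighted sum of the individual signals (each scaled so higher = better).
    return sum(weights.get(name, 0.0) * value for name, value in signals.items())

def adapt_weights(weights: dict[str, float], feedback: dict[str, float],
                  lr: float = 0.5) -> dict[str, float]:
    # Nudge each weight toward what the person says they actually care about.
    updated = dict(weights)
    for name, target in feedback.items():
        updated[name] = updated.get(name, 0.0) + lr * (target - updated.get(name, 0.0))
    return updated

weights = {"resting_heart_rate": 1.0}  # start: only resting heart rate matters
# Later feedback: "hang on, I also care about my anxiety level"
weights = adapt_weights(weights, {"resting_heart_rate": 0.5, "low_anxiety": 1.0})
```

The small amount of human input is the feedback dictionary; everything the system then optimises between feedback events comes from its own experience with the measurable signals.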

Speaker 1

因为这才是对齐问题的关键所在,对吧?比如说,如果你设计一个强化学习算法,仅仅是为了最小化我的静息心率。很快地,零似乎就是一个很好的最小化策略,它能实现目标,但可能不是你想要的方式。显然,你肯定想避免这种情况。那么你该怎么做呢?

Because this is where the real questions of alignment come in, right? I mean, if you said, for instance, let's do a reinforcement learning algorithm that just minimizes my resting heart rate. I mean, quite quickly, zero is, like, a good minimization strategy there, which would achieve its objective, just not maybe quite in the way that you wanted it to. I mean, obviously, you really want to avoid that kind of scenario. So how do you?

Speaker 1

你如何确信所选择的指标不会带来额外的问题?

How do you have confidence that the metrics that you're choosing aren't creating additional problems?

Speaker 0

你知道,一种方法是利用在AI其他领域一直非常有效的相同答案,即在那个层面上,可以借助一些人类输入。如果我们要优化的是人类目标,那么我们可能需要在那个层面进行衡量,比如人类提供反馈说,实际上我开始感到不舒服了。虽然我不想声称我们已经有了答案,我认为还需要大量研究来确保这类事情的安全,但它在某些方面确实有助于这种安全性和适应性。有个著名的例子是,当一个系统被要求尽可能多地制造回形针时,它可能会把整个世界铺满回形针。但如果你有一个系统,其总体目标是支持人类福祉,并从人类那里获得反馈,理解他们的痛苦信号和快乐信号等,那么当它开始制造太多回形针并引起人们不适时,它会调整这种组合,选择不同的方式,开始优化不会导致世界被回形针覆盖的目标。

You know, one way you can do this is to leverage the same answer which has been so effective so far elsewhere in AI, which is that, at that level, you can make use of some human input. If it's a human goal that we're optimising, then we probably at that level need to measure, you know, a human gives feedback to say, actually, you know, I'm starting to feel uncomfortable. And in fact, while I don't want to claim that we have the answers, and I think there's an enormous amount of research to get this right and make sure that this kind of thing is safe, it could actually help in certain ways in terms of this kind of safety and adaptation. There's this famous example of paving over the whole world with paperclips when a system's been asked to make as many paperclips as possible. But if you have a system whose overall goal is really to support human well-being, and it gets that feedback from humans, and it understands their distress signals and their happiness signals and so forth, the moment it starts to create too many paperclips and starts to cause people distress, it would adapt that combination and it would choose a different combination and start to optimise for something which isn't going to pave over the world with paperclips.

Speaker 0

所以,我们还没有达到那一步,但我认为有一些版本可能不仅解决以往以目标为中心的系统所面临的对齐问题,甚至可能比现有的方法更具适应性,因此更安全。

So look, we're not there yet, but I think there are some versions of this which could actually end up not only addressing some of the alignment issues that have been faced by previous approaches to goal-focused systems, but maybe even being more adaptive and therefore safer than what we have today.

Speaker 1

不过在AI世界之外,使用量化指标作为成功的衡量标准本身是否存在问题?我想到的是考试成绩或GDP,或者当你过于专注这些指标时可能陷入的各种问题,最终导致指标暴政。

Outside of the world of AI though, I mean, is there a problem with using quantitative metrics as a measure for success at all? I mean, I'm thinking here about exam scores or GDP, or the myriad of problems that you can get into when you focus too narrowly on a metric and end up with a tyranny of metrics.

Speaker 0

所以,我第一个同意,当你在人类世界中盲目追求某个指标时,往往会导致不良后果。但同时,整个人类努力的世界都是围绕我们优化某些事物组织的。如果我们没有任何可以优化的目标,我们就永远无法取得进展。我们有各种信号和指标等推动进步,然后人们会说,哦,也许这不是正确的指标,并对其进行调整。

So look, I would be the first to agree that when you mindlessly pursue a metric in the human world, that it often leads to undesired consequences. At the same time, the whole world of human endeavour is organised around us optimising for some things. You know, if we didn't have anything that we could optimise for, we wouldn't ever be able to make progress. You know, we have all kinds of signals and metrics and so forth that drive progress, then people say, oh, okay, maybe that isn't the right metric, and they adapt it.

Speaker 1

那么问题的一部分是否在于,目前你与AI的互动实际上仅限于当下?没有这种长期的学习或目标调整。就像一旦你决定追求GDP,就永远是GDP,没有改变。

Is part of the problem then that at the moment you have an interaction with an AI that is really contained within time? There aren't these sort of longer term learnings or adjustment of what the goals might be. Like once you decide that GDP is the thing that you're going for, it's GDP forever and there's no change.

Speaker 0

我认为这完全正确,我们今天拥有的AI没有像生命那样的存在。它不像动物或人类那样拥有自己持续多年、不断随时间适应的经验流。这需要改变。改变的原因之一是,我们需要系统能够不断学习、适应,并理解如何更好地实现我们真正想要的结果。

I think that's absolutely right, that the kind of AI that we have today doesn't have like a life. It's not something which has its own stream of experience in the way that an animal or a human might have that kind of goes on for years and years and years and can keep adapting over time. And that needs to change. And one of the reasons it needs to change is so that we can have systems that just keep learning and learning and learning over time and adapting and understanding how to better achieve the kinds of outcome that we really want.

Speaker 1

将可能拥有相当大力量的算法从人类数据中解放出来,是否真的存在相当大的风险?

Is there something that is quite risky about untethering algorithms with potentially quite a lot of power from human data, really?

Speaker 0

确实存在风险,也确实有好处。我认为我们必须非常认真地对待这一点,并在迈向经验时代的下一步中格外谨慎。我写这篇立场文件的原因之一是,我觉得人们没有意识到这一转变即将到来,它将带来后果,需要对这些决策进行深思熟虑。许多人仍然只考虑人类数据方法,这意味着没有足够的人认真对待这类问题。

There are certainly risks and there are certainly benefits. And I think we absolutely have to take this very seriously and be extraordinarily careful about taking these steps that come next in this journey towards the era of experience. And I should say that one of my reasons for writing this position paper is because I feel that people aren't recognising that this transition is going to come, that it will have consequences, and that it will require careful thought about many of these decisions. And the fact that so many people are still thinking only about the human data approach means that not enough people are taking these kinds of questions seriously.

Speaker 1

上次在这个播客中与你交谈时,我们讨论了你刚写的另一篇论文《奖励就够了》,基本上说强化学习是实现AGI所需的一切。你现在还这么认为吗?

The last time I got to speak to you on this podcast, we talked about a different position paper that you had just written, Reward is Enough, essentially saying that reinforcement learning is all you need to get you towards AGI. Do you still think that that's the case?

Speaker 0

我认为我的回答是,人类数据可能给我们一个先机。借用比喻来说,它有点像我们在地球上发现的化石燃料。所有这些人类数据恰好存在,然后我们在大型语言模型中挖掘并消耗它,这免费赋予了它们一定水平的性能。但之后,我们需要某种可持续的燃料,在化石燃料耗尽后维持世界运转。我认为强化学习就是这种燃料。

I think the way I would answer this is by saying that human data might give us a head start. It's a bit like, to borrow a metaphor, it's a bit like the fossil fuels that we discovered in the earth. And all of this human data just happens to be there and then we kind of mine it and burn it in our LLMs, and that gives them a certain level of performance that they have for free. But then we need, in the analogy, some kind of sustainable fuel that keeps the world going once all the fossil fuels are gone. And I think that's what reinforcement learning is.

Speaker 0

它是可持续的燃料,这种经验可以不断生成、使用、学习,并从中生成更多和学习。这确实是推动AI进步的过程。我绝不贬低人类数据所取得的成就。我认为这很棒,我们现在拥有的AI是令人惊叹、震撼的东西。

It's the sustainable fuel, this experience that it can keep like generating and using and learning from and generating more and learning from it. That's really the process that's gonna drive progress in AI. And I don't want to in any way denigrate what's been done with human data. I think it's great. I think, you know, the AIs that we've got now are amazing, mind blowing things.

Speaker 0

而且,你知道,我热爱它们,喜欢与它们一起工作,并亲自对它们进行研究。但这仅仅是个开始。

And, you know, I love them and enjoy working with them and do research on them myself. But it's just the beginning.

Speaker 1

戴夫,非常感谢你。真是太棒了。

Dave, thank you so much. That was amazing.

Speaker 0

谢谢。

Thank you.

Speaker 1

当然,目前正在发生着数量惊人的进步。但当你停下来思考时,会发现围绕AI的想法的多样性确实在缩小。我的意思是,多模态模型的成功如此迅速,如此深刻,远远超出了大多数人的预期,以至于它们几乎吸走了更广泛讨论中的大部分氧气。而且很明显,我们现在一次又一次地听到这些低语,说我们已经达到了可用人类数据的极限。

Of course, there is this monumental amount of progress that's going on at the moment. But when you stop to think about it, there really has been this narrowing in the diversity of ideas around AI. I mean, the success of multimodal models has been so rapid. It's been so profound, so beyond what most people were expecting that they kind of have sucked a lot of the oxygen out of the broader conversation. And it is noticeable that we're hearing again and again now these murmurs that we have reached the limit of usable human data.

Speaker 1

好吧,当然,这种将AI与人类数据解绑的方法存在风险,有许多领域需要仔细思考和关注。但我忍不住对大卫刚才所说的相当信服。如果我们真的想要超人类智能,也许现在是时候摆脱人类的束缚了。您正在收听的是由我,汉娜·弗莱教授主持的谷歌Deep Mind播客。在您离开之前,今天我们为您准备了一份特别的礼物,那就是AlphaGo背后的大卫·西尔弗与首位与之对弈的职业围棋选手樊麾之间的对话。

And okay, of course there are risks involved with this approach of untethering AI from human data, all sorts of areas that need careful thought and attention. But I can't help but be quite convinced by what David was saying there. If we really want superhuman intelligence, maybe it is now time to step away from the human. You have been listening to Google DeepMind: The Podcast with me, Professor Hannah Fry. And before you go, we have got an extra special treat for you today in the form of a conversation between David Silver, the man behind AlphaGo, and Fan Hui, the first professional Go player to face it.

Speaker 0

你好吗,戴夫?我很好。很高兴收到你的消息。好久不见了。

How are you, Dave? I'm really well. Good to hear from you. It's been a long time.

Speaker 1

十年前,就在那场著名的4比1战胜李世石比赛前不久,樊麾成为第一位与你的算法测试棋艺的职业围棋选手。你上次和他说话是什么时候?

A decade ago, a little while before the very famous 4-1 victory over Lee Sedol, Fan Hui became the first professional Go player to test his skills against your algorithm. How long has it been since you spoke to him?

Speaker 0

已经过去好几年了。是的,见到樊麾真是太好了。能够叙旧,确实非常棒。樊麾在AlphaGo的发展中扮演了如此重要的角色。所以这真的是一种由衷的喜悦。

It's been quite a few years. It's, yeah, so nice to see Fan Hui. It's been, yeah, absolutely amazing just to catch up. Fan Hui played such a huge part in the development of AlphaGo. So it's really just a genuine delight.

Speaker 1

非常感谢你加入我们,樊麾。

Thank you so much for joining us, Fan Hui.

Speaker 2

哦,谢谢。谢谢。对我来说,这是一次非常非凡的经历。

Oh, thank you. Thank you. For me, it's a very extraordinary experience.

Speaker 1

好的。那么我想问问你关于多年前那场比赛的事。因为我想,现在回顾整个历史,它几乎像是一个预料之中的结局。但在当时,我的意思是,你一定非常紧张吧,大卫。

Okay. So I want to ask you about that match that you had all those years ago. Because I guess now, looking at the full history of it, it almost seems like a foregone conclusion. But at the time, I mean, you must have been pretty nervous, David.

Speaker 1

还有,樊麾,你当时感觉如何呢?

And how did you feel about it as well, Fan Hui?

Speaker 2

我记得第一次看到那封邮件,告诉我那个令人兴奋的围棋项目。我仍然记得我和AlphaGo下第一局棋时输了,我感觉有些奇怪。我也记得当我输掉第二局时,我感到恐惧,因为我觉得我可能永远无法战胜这个程序或AI。而当我输掉第五局,也就是最后一局时,我感觉我旧的围棋世界完全破碎了,但我新的围棋世界打开了。

I remember the first time I saw the email telling me, like, about the exciting Go project. I still remember when I played with AlphaGo, first game I lost. I feel something strange. I also remember when I lost the second game, I feel fear because I feel maybe I will never win with this program or AI. And when I lost my fifth game, the last game, I feel my old Go world is totally broken, but my new Go world is open.

Speaker 1

大卫,我也想问问你,在那场比赛之前,你对你的算法表现有多大的信心?

David, I just wanna ask you as well, though, in advance of that match, how confident were you about the performance of your algorithm?

Speaker 0

我们真的没有信心。因为我们知道自己已经超越了DeepMind已有的玩家,也超越了之前编写的所有程序,所以很难判断我们的位置。但距离像樊麾这样的职业选手水平还有巨大的差距。我们完全不知道,我们是否处于那个差距中的某个位置?

We really weren't confident. It was just so hard to judge where we were because we knew that we'd gone beyond the players that we had at DeepMind. And we knew that we'd gone beyond all the programs that had been written before. But there's such a huge gap beyond that towards the level of professional players like Fan Hui. And we had no idea, are we somewhere in that gap?

Speaker 0

还是我们已经超越了那个差距?我们真的不知道。所以这是我们第一次有机会校准自己的表现水平。如果我们输掉了全部五场比赛,我想我们谁都不会感到惊讶。所以赢得全部五场是一个非常大的惊喜。

Are we somewhere beyond that gap? Like we just genuinely didn't know. And so this was like the first time we had any opportunity to calibrate our level of performance. And I don't think any of us would have been surprised if we'd lost all five games. So it was a very pleasant surprise to win all five.

Speaker 0

是的,我们只是…我真的觉得,那就像是世界可能向两个方向分支的时刻之一,直到比赛发生之前我们根本不知道结果。

And yeah, we just, I genuinely, it was like one of those moments where the world could have branched either way. We just didn't know until the match happened.

Speaker 1

但是,当然,Benoit,这个算法后来进步了,实际上是在你的帮助下。在你比赛之后,你加入了团队并支持他们进一步开发它。但那个早期版本,和它对弈是什么感觉?它和与人类对手对弈有根本性的不同吗?

But, of course, Fan Hui, this algorithm then advanced, I mean, with your help, in fact. After your match, you came on board and supported the team in developing it further. But that earlier version, what did it feel like to play it? Did it feel fundamentally different to having a human opponent?

Speaker 2

你知道,在AlphaGo之前我和另一个程序对弈过。当我和另一个程序对弈时,我感觉,哦,这是一个程序,因为它们下棋不像人类。但和AlphaGo对弈时,我感觉非常奇怪。有时候,我感觉它真的、真的很像人类。

You know, I played with another program before AlphaGo. When I played with another program, I feel like, oh, this is a program because they don't play like a human. But with AlphaGo, I feel something very strange. Sometime, I feel like it's really, really like human.

Speaker 1

那么AlphaGo和AlphaZero对围棋界产生了什么影响呢?是需要一个接受的过程,还是从一开始就是积极的?

What's the impact been then of AlphaGo and AlphaZero on the Go community? Has there had to be a process of acceptance, or was it, you know, positive from the off?

Speaker 2

首先,当我输给AlphaGo时,对于围棋界来说,没有人真正相信这是真的,因为你知道,我只是欧洲冠军。所以不是像李世石那样的世界冠军。但当AlphaGo与李世石对弈时,整个围棋界看到了不同的东西,因为AlphaGo下得非常好。我记得第二局比赛中的第37手。

First of all, when I lost with AlphaGo, for the Go community, nobody really believe this is true because, yeah, you know, I'm only European champion. So it's not the world champion like Lee Sedol. Okay. But when AlphaGo played with Lee Sedol, the whole Go community see something different because AlphaGo play really, really well. I remember on the second game, the move 37.

Speaker 2

如此美妙的一手。真的非常美妙。非常有创意。对人类来说,我们永远不会下这一步。

Such beautiful move. Really, really beautiful. So creative. It's very creative. For the human, we will never play this move.

Speaker 2

在那一步之后,围棋界的一切都改变了。因为对我们来说,一切皆有可能。如今,就连围棋学生也用AI来学习。所以,是的,我认为这对我们的围棋界真的非常有益。我认为这不仅对围棋界有益,对全世界也是如此。

After that move, everything change in the Go world. Because for us, everything is possible. Today, even the Go student use AI to learn. So, yes, I think this is really, really good for our Go community. I think it's not just for Go community, it's also for the world, I think.

Speaker 1

樊麾,非常感谢你加入我们。这真是一次难得的享受,尤其是在这个重要的周年纪念日即将到来之际。

Fan Hui, thank you so much for joining us. That was such a real treat, especially with the big anniversary coming up.

Speaker 0

很高兴再次见到你,感谢你在AlphaGo项目中所做的一切。如果没有你,我认为一切都会不一样。如果没有你的建议一路帮助我们,我们可能会犯一些严重的错误。所以谢谢你。谢谢你,戴夫。

Just great to see you again, and thanks for everything you did on AlphaGo. I don't think it would have been the same without you. I think we would have made some terrible mistakes if we hadn't had your advice to help us along the way. So thank you. Thank you, Dave.
