
Gemini 2.0与Oriol Vinyals探讨代理型AI的演进

Gemini 2.0 and the Evolution of Agentic AI with Oriol Vinyals

本集简介

在本期节目中,汉娜与Drastic研究副总裁兼Gemini联合负责人奥里奥尔·维尼尔斯展开对话。他们探讨了智能体从单一任务模型向通用型模型的演进历程,例如Gemini这类具备广泛应用能力的模型。维尼尔斯为汉娜解析了多模态模型背后的两阶段流程:预训练(模仿学习)与后训练(强化学习)。双方深入讨论了规模化部署的复杂性,以及架构创新与训练流程优化的重要性。节目最后快速展示了Google DeepMind近期发布的一系列新型智能体功能。

注:观看完整版演示(含未剪辑版本)及Gemini 2.0相关视频,请前往YouTube。

延伸阅读/观看:
Gemini 2.0
与杰夫·迪恩解码Google Gemini
弗雷德里克·贝塞谈游戏、山羊与通用智能

特别鸣谢(包括但不限于):
主持人:汉娜·弗莱教授
系列制片人:丹·哈杜恩
剪辑:拉米·扎巴尔(TellTale工作室)
监制与制片:艾玛·尤瑟夫
音乐作曲:埃莱尼·肖
摄像指导与视频剪辑:贝尔纳多·雷森德
音频工程师:佩里·罗甘廷
视频工作室制作:尼古拉斯·杜克
视频剪辑:比拉尔·梅尔希
视频美术设计:詹姆斯·巴顿
视觉标识与设计:埃莉诺·汤姆林森

由Google DeepMind委托制作

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

欢迎收听《谷歌DeepMind播客》,我是汉娜·弗莱教授。没错,智能体。它们已经到来或即将到来,很可能成为2025年所有人讨论的焦点,但它们绝非新鲜事物。

Welcome to Google DeepMind, the podcast. I'm professor Hannah Fry. Right. Agents. They are here or almost here, and they are probably all anyone is gonna be talking about in 2025, but they are definitely not new.

Speaker 0

本期节目的嘉宾曾在2019年做客播客,与我探讨他当时研发的多智能体系统——该系统能在《星际争霸》游戏中击败职业选手,并最终达到宗师级别。但此后智能体经历了怎样的演进?它们现在能做什么?语言模型和多模态AI的进步如何改变了局面?

My guest on today's episode is someone who last came on the podcast in 2019 to talk to me about the multi agent system he was working on. It could beat the professional Starcraft players at their own game and eventually went on to achieve grandmaster status. But how have agents evolved since then? What can they do now? How have the advances in language models and multimodal AI changed things?

Speaker 0

又该如何构建能够代表用户自主决策的系统?需要说明的是,若想了解智能体基础知识,可以观看我们夏季与弗雷德里克·贝塞录制的节目。而今天,奥里奥尔·维尼尔斯是Drastic研究副总裁兼Gemini联合技术负责人,可以说我们有太多需要跟进的内容。奥里奥尔,欢迎回到播客。

And how do you possibly go about building something that can make autonomous decisions on behalf of its user? Now I should tell you, if you want a primer on agents, you can watch our episode with Frederick Besse that we recorded over the summer. But for now, Oriol Vinyals is Vice President of Drastic Research and co-tech lead of Gemini. And it's fair to say we've got quite a lot to catch up on. Oriol, welcome back to the podcast.

Speaker 1

你好,感谢邀请。

Hi. Thank you for having me.

Speaker 0

什么是Drastic研究?

What is drastic research?

Speaker 1

我始终告诫团队要有颠覆性思维——不要只做大家都在考虑的渐进式改进,要大胆设想几年后的技术前景,然后回溯这些构想,以这种心态指导当下的实践。这就是drastic的含义,没错,这是我常挂嘴边的词。

Well, I keep telling my team they have to think drastic, meaning don't just do the incremental stuff that everyone is thinking about. Try to drastically think what will happen in a few years' time and then try to backport those ideas and then execute today with that mindset in mind. So that's what drastic means. But, yeah, it's a word that I use a lot.

Speaker 0

记得上次见面时,你正在研发能通过键盘鼠标在画图软件作画或玩《星际争霸》的智能体。而如今,技术发展已不可同日而语。

I think when I last got to see you, you had been working on an agent that could use a keyboard and a mouse to do things like draw pictures in Paint or play Starcraft. And, well, things have moved on quite a bit since then.

Speaker 1

所以当时那些智能体,你采用了一套非常通用的原则,机器学习领域中非常简单的原则。基本上你会针对一个任务专门训练一个模型。我们当时做的是设计一套任务课程,难度越来越大。对吧?比如我们上次聊的时候,在电子游戏方面,我们研究的是《星际争霸》,这是目前最复杂的现代策略游戏之一。

So those agents, at the time, you took sort of a very generic set of principles, very simple principles in the field of machine learning. And you would basically specialize a model on one task. And what we were doing at the time is have a curriculum of tasks that were more and more and more difficult. Right? So when we last spoke, for instance, in video games, we were looking at StarCraft, which is one of the most complex modern, you know, strategy games out there.

Speaker 1

当然,DeepMind以开创Atari游戏的趋势而闻名,这是一个相当简单的游戏,左右移动、击球,然后就开始了。所以,这就是算法本身,我们努力推动它们变得非常通用,能够不断攀登游戏难度课程的阶梯,做越来越复杂的事情。而现在的情况是,即使是我们训练的模型,也比当时开发的模型能更广泛地应用于更多事物。对吧?所以,创建这个数字大脑的过程并没有太大变化,但那个大脑能做的事情相当狭窄,尽管非常复杂,比如玩《星际争霸》或围棋。

And, of course, DeepMind is notorious for having started the trend with Atari, which is a fairly simple game of, you know, left, right, hit the ball and, you know, off you go. So that's what the algorithms themselves were: we tried to push for them to be very general, so we could keep climbing this ladder, this difficulty curriculum of games, and do more and more complex things. And right now, what has happened is even the models we train are broadly applicable to many more things than the models we developed back then were. Right? So, the process of creating this digital brain hasn't changed that much, but what that brain was able to do was reasonably narrow, although very complex, like playing Starcraft or playing Go.

Speaker 1

现在,这些模型能够应用于更广泛的领域,当然,还能与我们交谈,聊天机器人等等,等等。

Right now, these models can do quite a lot more broad applications and, of course, talking to us, chatbots, etcetera, etcetera.

Speaker 0

所以那时候,强化学习是你们的主要手段,我猜。现在的情况有多大不同?

So back then, like, reinforcement learning was your main kind of lever, I guess. How different are things now?

Speaker 1

是的。算法上,实际上,AlphaGo和AlphaStar的创建过程应用了相同的算法序列。这与当前大型语言模型或多模态模型的创建方式并没有太大不同。多年来,在我们参与的许多项目中,有两个基本步骤一直相当恒定,我们可以称第一个为预训练或模仿学习。也就是说,你从随机权重开始。

Yeah. So algorithmically, actually, the process of, you know, AlphaGo and actually AlphaStar, those two had the same sequence of algorithms applied to creating this digital brain. And it's not actually that different from how current large language models or multimodal models are created today. There's two basic steps that have been pretty constant throughout many years in many of the projects that we've worked on, of which we can call the first one pre-training or imitation learning. That is, you start with random weights.

Speaker 1

你有一个算法,它会尝试模仿人类创建的大量数据,无论是玩游戏,还是在这种情况下,整个互联网,所有可用的知识。在那个第一阶段,你只是调整权重,尽可能好地模仿那些数据。

You have an algorithm that will try to imitate lots of data that humans have created to either play a game, or in this case, all of the internet, all of the knowledge available to us. And in that first stage, you just adapt the weights to try to imitate that data as well as possible.
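The "imitate that data as well as possible" step described above is, in practice, usually a cross-entropy objective over next tokens. A toy sketch in Python (not Gemini's actual training code; the token probabilities are invented for illustration):

```python
import math

# "Imitating the data as well as possible" is typically cross-entropy:
# the loss is small when the model puts high probability on the token
# that actually came next in the training data.
def cross_entropy_loss(predicted_probs, target_token):
    # predicted_probs: token -> probability assigned by the model
    # target_token: the token observed in the data
    return -math.log(predicted_probs[target_token])

# A model that assigns 90% to the true continuation is penalized far
# less than one that assigns 10%; gradient descent uses this signal
# to nudge the weights toward imitation.
confident = cross_entropy_loss({"cat": 0.9, "dog": 0.1}, "cat")
confused = cross_entropy_loss({"cat": 0.1, "dog": 0.9}, "cat")
assert confident < confused
```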

Speaker 0

而这些权重本质上是在每个神经元内部,就像一系列数字,描述了它如何连接到其他一切。

And these weights are essentially, inside each of the neurons, like a series of numbers that kind of describes how it's connected to everything else.

Speaker 1

是的。所以,基本上,神经元是计算单元,而神经元之间的连接实际上就是权重。你可以想象有一个神经元,有几个神经元连接到它,你基本上是把来自输入神经元的激活值乘以权重后相加,而这些权重是唯一会变化的东西,输入会激发神经元。这基本上就像大脑的工作原理,带有一定的自由度,嗯,创造性。

Yeah. So, basically, there are units of computation that are neurons, and the connections between neurons are what you actually have as weights. So, you can imagine that there's a neuron, there are a few neurons connected to it, and you're basically adding all the activations from the incoming neurons, multiplied by the weights, and those weights are the only things that move, and the inputs excite the neurons. It's pretty much how, you know, a brain works, with some, you know, freedom of, yeah, creativity.
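The mechanism described here can be sketched in a few lines: a neuron's output is the weighted sum of its incoming activations (real networks also pass this sum through a nonlinearity, omitted here; the numbers are made up for illustration).

```python
# A neuron sums the activations of its incoming neurons, each
# multiplied by the learned weight on that connection; the weights
# are the only numbers that change during training.
def neuron_output(incoming_activations, weights):
    return sum(a * w for a, w in zip(incoming_activations, weights))

activations = [1.0, 0.5, -2.0]  # outputs of three upstream neurons
weights = [0.3, 0.8, 0.1]       # the "pipe widths" connecting them
print(neuron_output(activations, weights))  # 0.3 + 0.4 - 0.2, about 0.5
```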

Speaker 0

好的。如果做个类比,这几乎就像你有神经元,然后水流经它们,而权重就像是连接它们之间的管道宽度。

Okay. If we were to do an analogy, it's almost like you've got the neurons, and water is flowing through them, and the weight is like the width of the pipes between them.

Speaker 1

没错。然后你可以想象拥有数百万个神经元和数十亿甚至数万亿条管道,这就是我们花费大部分计算资源来训练这些模型的地方,尤其是语言模型,就是在预训练或模仿我们所有可用数据的阶段。

that's right. And then you can imagine having millions of neurons and billions or even trillions of pipes, and that is where we spend most of the compute actually training these models, especially language models: in this pre-training, or imitating all the data that we have available to us.

Speaker 0

好的。所以,你现在有了这个巨大的网络,有大量管道连接所有神经元,那就是你的模仿阶段。完成了。接下来,如果你在做比如AlphaGo或AlphaZero,你会让它自己和自己对弈。

Okay. So, you've now got this gigantic network with loads of pipes going between all the neurons, and that's your imitation phase. Done. Next bit, if you were doing, say, AlphaGo or AlphaZero, you would then get it to play itself.

Speaker 1

是的。所以这个模型现在相当擅长做出看起来像人类的下法。这意味着生成的句子是非常合理的英语句子,或者如果是在玩游戏,它会合理地点击移动棋盘上的棋子等等。但这个模型还没有学会这些动作会带来奖励,对吧?这就是强化学习或后训练的核心,也就是训练的第二阶段。

Yeah. So this model now is reasonably good at playing moves that look human-like. So, that means the sentences are very plausible sentences in English, or if it was playing a game, it would sort of click things reasonably to move, you know, pieces on the board and whatnot. But what this model hasn't done is learn that these actions yield reward, right? That's the bit of reinforcement learning or post-training, which is the second phase of training.

Speaker 1

所以,你可以写一首诗,只是基于互联网上诗歌的平均样子。但问题是,嗯,我只想要好诗。对吧?那么,我如何根据一个信号进一步调整这些管道呢?比如说,现在写完一整首诗后,会得到一个零分或一分的评分。如果是一首平庸的诗,你得零分。

So, you can write a poem by just, hey, just how does a poem on the internet look like on average? But then the question is, well, I want only the good poems. Right? So, how can I further adjust these pipes based on sort of a signal that now having written now a whole poem would give a score of zero or one, let's say, right? And if it's a mediocre poem, you get a zero.

Speaker 1

如果是一首好诗,你得一分。再以游戏类比,这是我们传统上使用强化学习的地方,如果你赢了游戏,你得一分;如果输了,得零分。然后你进一步调整权重。但现在你不是在模仿人类,而是在说,忘掉人类,我想超越人类所能做的,真正让我的所有诗都成为完美的诗,对吧,或者让我所有的棋局都成为完美的对局。

If it's a good poem, you get a one. Again, for a game analogy, which is what we use reinforcement learning traditionally, if you win at the game, you get a one. If you lose, you get a zero. And then you further adjust the weights. But now instead of imitating humans, you're just saying, forget, I want to go beyond what humans could do, and try to really get all my poems to be the perfect poem, right, or all my chess games to be the perfect game.
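The second phase described above, adjusting the weights based on a 0-or-1 score for a finished poem or game, can be sketched as a REINFORCE-style policy-gradient update. This is a one-parameter toy for intuition, not the actual post-training pipeline:

```python
# Policy-gradient intuition on a single scalar "weight": rewarded
# outputs pull the weight in the direction that makes them more
# likely; a reward of 0 contributes nothing to the update.
def reinforce_update(weight, grad_log_prob, reward, lr=0.1):
    return weight + lr * reward * grad_log_prob

w = 0.0
w = reinforce_update(w, grad_log_prob=2.0, reward=1)  # good poem: weight moves
w = reinforce_update(w, grad_log_prob=2.0, reward=0)  # mediocre poem: no change
print(w)  # 0.2
```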

Speaker 1

而在语言模型中,这第二个阶段——即强化学习后训练阶段——往往持续时间较短,因为我们无法获得像传统棋盘游戏中自我对弈时‘你赢了游戏’或‘你输了游戏’那样超级清晰的奖励信号。

And in language models, this second phase, which is reinforcement learning post-training, tends to be fairly short lived, because we do not have access to a super clean reward, as in "you've won the game" or "you've lost the game" when you do self-play in traditional board games, for example.

Speaker 0

所以一旦完成,这就是幕后发生的所有事情。然后你会说,就在那儿别动。对,大家保持原位。我们基本上就是要给整个网络拍个快照,而这才是用户实际能访问到的版本。

So once that's done, that's all the stuff that goes on behind the scenes. And then you're like, hold it right there. Yeah. Stay exactly where you are, everybody. We're gonna take just basically a snapshot of this entire network, and that is what you actually get to access as a user.

Speaker 1

没错。现在这个神奇的过程结束了。这些权重超级珍贵对吧?你花了几个月时间精心调整才找到这个配置,现在基本上不会再改动它了,对吗?

Yeah. So, now this amazing process finished. These weights are super precious. Right? So, this configuration you found, you've really spent months to finesse it, to tweak everything, and now you sort of will never move it anymore, right?

Speaker 1

所以训练结束了。你不再改变配置了。你可能想让它超级高效对吧?比如你发现某个神经元没用,它根本不起任何作用。

So, training is over. You're not changing the configuration anymore. You might want to make it super efficient, right? So, say like you find that, oh, look, this neuron is not useful. It's not used for anything.

Speaker 1

你把它移除,这样大规模运行时一切都会更快更经济。然后用户拿到的就是同样的权重。所有人都获得我们训练好的相同权重——这就是我们所说的Gemini 1.5 Flash。它意味着一组冻结的权重,不会改变,也不会进一步训练。

You remove it, so everything becomes faster and cheaper to run it at scale. And then as a user, you just get the same weights. Everyone gets the same weights we've trained. That's what we call like, you know, Gemini 1.5 Flash. That just means a set of weights that are frozen, will not change, will not further train or anything.

Speaker 1

所以从AlphaGo到AlphaStar再到当前的大语言模型,这两个步骤实际上几乎完全相同。当然细节很重要,领域也确实在发展,但原理基本没变。

So, those two steps actually pretty much are identical from AlphaGo to AlphaStar to like current large language models. And, of course, there's details that matter, and the field has evolved certainly, but the principle is pretty much unchanged, actually.

Speaker 0

因为本质上,比如DQN(Atari游戏示例)、AlphaGo使用的算法类型与大语言模型之间存在差异。它们的架构是不同的,对吧?

Because under the hood, as it were, there are differences between I don't know. I'm thinking like DQN here, which was the Atari example, or the types of algorithms that are used in in AlphaGo or then again in the the large language models. Like, the architecture is different. Right?

Speaker 1

是的。所以,构成数字大脑的几个组成部分之一是架构。对吧?这些神经网络就是其中之一。

Yeah. So, there's a few components that go into then what the digital brain is. One is the architecture. Right? So, there are these neural networks.

Speaker 1

现在我们有了Transformer架构,这在我们DQN时代肯定是没有的。所以,架构方面总有一些突破能更好地从数据中学习。但从Transformer到现在,几乎都是些微调。我的意思是,即使你看AlphaFold,它也是由Transformer驱动的,团队有时花几年时间所做的就是找到一些小调整,比如,移除这组神经元,增加另一层。

Now we have the transformers, which we certainly didn't have back in the DQN days. So, there's always some sort of breakthroughs in architectures that are better at learning from the data. But then from transformers to today, it's almost all about little tweaks. I mean, even if you look at AlphaFold, which also is fueled by a transformer, what the teams do for years sometimes is just to find little tweaks on, hey, let's remove this set of neurons. Let's add another layer.

Speaker 1

让这些稍微宽一些。大脑的形状稍微改变一下,这在性能表现上有时会决定成败。

Let's make these a bit wider. The brain shape changes a little bit, and that makes it or breaks it sometimes in terms of the performance achieved.

Speaker 0

所以,如果这些都是迄今为止已经实现的东西,我的理解是,目标是创造更多代理行为,让这些东西能够自主决策。这些是如何帮助实现这一目标的呢?

So, if these are all the things that have been achieved so far, I mean, the goal, as I understand it, is to to create more agentic behavior, to kind of get these things to to to make autonomous decisions. How did these help to achieve that end?

Speaker 1

是的。让我们稍微聚焦一下当前趋势。我们称之为大型语言模型,但它们是多模态的。我想我们之前有一期节目深入讨论了多模态方面,能够添加一张图片然后提问、追问等等是多么有用。所以,这个核心,我们还会继续改进它。

Yeah. So let's zoom in a little bit on the current trend. We call it large language models, but they're multimodal. I think we had an episode earlier covering heavily the multimodality aspect, how good it is to be able to add an image, then ask something, a follow-up question, and so on. So, this core, we will still improve it.

Speaker 1

嗯,对吧?这组权重能够对输入做出这些惊人的推理。对吧?这张图片是关于什么的?

Mhmm. Right? This set of weights that do these amazing sort of inferences about the input. Right? What's this image about?

Speaker 1

用户在问什么?我能写一首更好的诗吗?我能把它写得更长吗?诸如此类。就像我们现在都能玩到的所有这些交互。

What's the user asking? Can I write a better poem? Can I just make it longer? Whatever. Like all these interactions we all like get to play with these days.

Speaker 1

但这只是一个组件。现在它成了我们的CPU。我们可以在它周围添加更多东西,如果模型能为你去做研究呢?比如说,对吧?我们早在过去就已经在思考这个例子了。

But this is just a component. This is now our CPU. And we can add more to it, around it. What if the model could go off and do research for you, for example? Right? One example we were already thinking about back in the day:

Speaker 1

我可以让一个语言模型或视觉语言模型学习玩《星际争霸》游戏。这与创建一个专门玩游戏的智能体是完全不同的思路。在这个例子中,它可以上网观看游戏视频,当然也可以下载游戏开始互动学习,逐渐理解规则。它还能在线研究、逛论坛、阅读讨论,通过实战发现自己的弱点并改进等等。

I could ask a model, a language model or a visual language model, to learn to play the game of StarCraft. That's a very different approach to say, create one agent that does play the game. In this other example, right, it could go online, watch videos about the game. It could, of course, download the game to start interacting with it, to learn, oh, yeah, like, I know how to, you know, I get it. Do research online, go to forums, read the forums, go play and figure out that it's weak at this thing and improve and so on.

Speaker 1

经过可能数周的时间后,它会给你发邮件说:我现在会玩这个游戏了。来对战吧?这并非遥不可及的现实,这些模型突然就能真正行动起来,学习任何对它们开放的新事物。想到这一点确实令人震撼。

And after literally, it could be weeks, it sends you an email that says, I now know how to play the game. Let's play. Right? That's not a reality that's that far away, where these models all of a sudden actually do something, take some actions, and learn anything new that is available to them. And that's pretty powerful to think about.

Speaker 1

这最能推动通用性的发展,也正是让许多人称之为AGI(人工通用智能)的感觉更接近现实的原因。

It's what pushes the generality the most, and that's what makes kind of the AGI, as many people call it, feel closer.

Speaker 0

那么按照我的理解,我们现在拥有的大型语言模型、多模态模型等,不管怎么称呼它们,就像是核心中枢。而下一步是在这个核心之上构建功能,让它能够脱离稳定装置,自主地去执行任务。

So if I understand it correctly then, it's it's almost like the stuff that we have at the moment, the large language models, the multimodal models, whatever you wanna call them, that's like kind of the central core. But the next step is that you build stuff on top of that central core that it can go off and and, you know, take off the stabilizers and kind of go off and do its own thing.

Speaker 1

没错。如果它能获取所有知识,并利用时间进行适当研究——比如提出假设、编写代码等,花时间真正回答极其复杂的问题,那么可能性就会急剧扩大。当然,并不是所有事情都需要这种能力。

Yeah. Exactly. If it has access to all the knowledge and it can sort of use its time to do some proper research, I mean, hypotheses, write some code, and so on, and take its time to really answer very, very, very complex questions, then the possibilities now have broadened quite drastically. Although, of course, we're not going to need that for everything.

Speaker 0

意思是,如果

Mean, if

Speaker 1

我们提出一个问题,比如,嘿,你知道,我喜欢米饭,我的意思是,今晚我该准备什么?可能不需要深入思考或花三周时间研究,否则你可能会对等待时间不太满意,对吧?但我认为要推动前沿,你是在给计算机一个数字身体。这样,它不仅能思考并给出指令或文字输出,还能在网上或你上传的文件等地方进行操作,提出非常复杂的问题,并为你个性化等等。

we ask a question like, hey, you know, I like rice; what should I prepare tonight? Probably no need to do a very deep dive into thinking or to go off for three weeks; then you'd probably not be very happy about the waiting time, right? But I think to push the frontier, you're giving a digital body to the computer. So, it can not only just think and give, you know, an instruction or a word output, but it can also go off and do things online or on documents that you might upload or whatever, and ask very, very complex questions, and personalize to you, etcetera, etcetera.

Speaker 0

我喜欢这个想法,那个核心部分,然后你给它一个数字身体。你有了电子大脑,现在又给它一个数字身体。这有点

I like that idea, the the central core, then you're giving it a digital body. You've got the electric brain, and now you're giving it a digital body. It's kind of

Speaker 1

没错。

Exactly.

Speaker 0

好的。那么关于电子大脑,关于这个核心,这个处理器,我想我们应该考虑Gemini,对吧?这基本上就是我们讨论的,你们拥有的多模态模型。我知道大型模型的一个大想法就是不断扩展,对吧,让它们越来越大、越来越大。你认为我们从扩展中看到的结果现在是否已经趋于平稳了?

Okay. So in terms of the electric brain then, in terms of this core, this processor, I guess we should be thinking about Gemini here, right, which is which is essentially what we're talking about, the the multimodal model that that you guys have. I know that one of the big ideas for large models was just to scale it up, right, to get them bigger and bigger and bigger and bigger and bigger. Do you think that the results that we've seen from scaling have sort of plateaued by now?

Speaker 1

是的。这是一个非常重要的问题,对吧?我们研究了随着模型变大,也就是这些模型有多少神经元,它们在我们有明确指标的某些任务上如何变得更好,整个机器学习社区都有这样的指标?例如,一个很容易理解的例子是机器翻译,对吧?所以,模型在两种语言之间翻译的能力如何?随着扩展,从数百万到数十亿再到可能数万亿神经元,你可以看到性能持续提升。

Yeah. It's a very important question, right? Like, we have studied how, as you make the models larger, that is, how many neurons these models have, they become better at certain tasks for which we have clear metrics from the whole machine learning community. For example, one that is very simple to understand is machine translation, right? So, how good the models are at translating between two languages. As you scale, as you go from millions to billions to potentially trillions of neurons, you can see the performance keep improving.

Speaker 1

现在,即使你做这些研究,一个技巧是它看起来是线性的,但你必须绘制对数坐标轴,对吧?用通俗的话说,这意味着,假设过去三年我们有了一些改进,你不应该期待未来三年有同样的改进。实际上要达到那里是指数级困难的,对吧?所以,这意味着计算投资,当然也以超线性速率进步,但可能不如这些趋势所显示的那么好。

Now, even when you do those studies, one trick is that it looks linear, but you have to plot logarithmic axes, right? What that means in lay terms is that, let's say over the last three years, we had some improvement. You shouldn't expect the same improvement in the next three years. It's actually exponentially hard to get there, right? So, that means the compute investment, which of course also advances at a superlinear rate, perhaps doesn't keep up as well as these sorts of trends suggest.

Speaker 1

你会看到一些收益递减,因为简单地扩展x轴,也就是参数数量,你需要增加10倍才能看到同样的改进。这就会产生一些压力,嘿,也许我们不能扩展那么多,我们需要考虑其他方式来扩展以使模型变得更好。

You would just see some diminishing returns because simply scaling the x axis, right, the number of parameters, you need to go 10x to see the same improvement. And that just creates some pressure to, hey, maybe we can't scale as much, and we need to think about other ways to scale to make the models better.
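The log-axis point above can be made concrete with a toy power-law scaling curve (the exponent here is invented for illustration; real scaling-law studies fit such exponents to actual training runs):

```python
# Under a power law loss(N) = N**-alpha, every 10x in parameter
# count multiplies the loss by the same constant factor, so equal
# visible gains cost exponentially more parameters and compute.
def loss(n_params, alpha=0.1):
    return n_params ** -alpha

for n in [1e6, 1e7, 1e8, 1e9]:
    print(f"{n:.0e} params -> loss {loss(n):.3f}")
# The ratio between successive lines is constant (10**-alpha):
# plotted against log(N), the curve looks like a straight line.
```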

Speaker 0

我给学生们举的例子是,比如,如果你有一个非常乱的房间,你花前10分钟整理会带来巨大变化。捡起所有脏盘子,收拾好所有脏衣服,很好。但一旦你整理到七个小时,那额外的十分钟就根本不会有什么差别了。这基本上就是我们目前所处的情况,对吧?

The example I give to my students is, like, if you've got a room that's really messy, the first 10 minutes that you spend tidying is gonna make a massive difference. Pick up all the dirty plates, put away all the dirty washing, fine. But once you're, like, seven hours in, that extra ten minutes is not gonna make any difference at all. And that's essentially where we are, right?

Speaker 1

是的,这确实是一个非常贴切的类比。实际上,这个类比甚至可以应用到模型的性能上。即使你有极好的性能,如果你希望这些模型100%符合事实,对吧?它们永远不会编造东西。我们知道如果你试探它们,你可以让它们说出一些不真实的内容。

Yeah, that's exactly a very good analogy. And in fact, that analogy can even apply to then the performance of the models. Even if you have extremely good performance, if you want these models to be 100% factual, right? They will never make something up. We know that if you probe them, you can make them say some things that are not real.

Speaker 1

即使是那最后一英里也极其困难,这在大规模部署时带来了一些有趣的挑战。

Even that last mile also is super hard, which creates some interesting challenges to deploy then at scale.

Speaker 0

所以,好吧,我明白你说的关于这一切都存在收益递减的说法。对吧?但就如何让这些模型变得更好而言,仅仅是数据、计算能力和规模吗?这些是唯一的东西,是你们必须拉动的杠杆吗?

So, okay, I hear what you're saying about how there's diminishing returns in all of this. Right? But in terms of how you make these models better, is it just data, computational power, and size? Are those the only things, the levers that you have to pull?

Speaker 1

是的。所以,当然,如果你冻结架构,比如说接下来一年没有创新,我们只是扩大规模,因为有更好的硬件出现。只是

Yeah. So, certainly, if you froze the architecture, let's say for the next year, no innovation, we just scale because there's better hardware coming.

Speaker 0

把它做得更大。

Just make it bigger.

Speaker 1

是的。把它做得更大。这肯定会有一个看起来不错的趋势。但实际情况是,特别是在Gemini中,我们还有其他创新,比如其他技巧、技术、关于如何排序呈现给模型的数据的细节,到架构的细节,到如何运行训练过程,运行多长时间,我们实际向模型呈现什么样的数据,如何过滤,我们是呈现更多高质量数据还是更少低质量数据,各种不同的我们称之为超参数的东西,当然还有其他算法进步。我们也相当仔细地研究,因为训练模型的过程是昂贵的。

Yeah. Make it bigger. That certainly would have a trend that would look okay. But what's happened, and certainly in Gemini, we have other innovations, like other tricks, techniques, details about how to order the data that you present the model with, to the details of the architecture, to how to run the training process, how long to run it for, what kind of data do we actually present the model, how do we filter, do we present more data that's high quality, less data that's low quality, all sorts of different, what we call hyperparameters, and of course, other algorithmic advances. We also investigate fairly carefully because the process of training a model is expensive.

Speaker 1

因此,我们需要极其谨慎地积累创新,以便最终当我们准备就绪时,我们拥有足够的创新,并且可能还具备更好的规模来运行下一轮模型迭代,我们运行它,然后实现算法上的突破,而不仅仅是通过数据和计算。

So, we need to be extremely careful with piling up innovation so that eventually when we are ready, we have enough innovation, and also probably we'll have a better scale to run for the next iteration of models, we run it and then we get algorithmic, not only breakthroughs through data and compute.

Speaker 0

我想关于这种规模化的事情,另一个点是你可以投入的节点数量实际上没有限制。也许理论上计算能力也没有限制,但你能投入的数据是有限的,对吧?人类可用的词汇量是有限的。

I guess the other thing about this scaling stuff is that there's no limit really to the number of nodes that you can put in. Maybe there's sort of no limit in theory to the computational power that you put in, but there is a limit to the data that you can put in. Right? There's a limit to the number of human words that are out there.

Speaker 1

说得好。我的意思是,节点数量确实有限制,因为这些模型的扩展方式是,它们无法适配在单一芯片、硬件芯片上。所以,现在你有一组芯片网格,它们之间需要通信,你知道,存在一些限制,比如光速等等。因此,训练如此庞大模型的效率会达到一个临界点,即使从硬件利用的角度来看也不值得了。但你说得非常对,对吧?

Good point. So, I mean, there is a limit on the nodes, because of how you scale these models: well, they don't fit on one single chip, hardware chip. So, now you have a mesh of chips, they're communicating, you know, there are certain limits like the speed of light, etcetera, etcetera. So, there starts to be a point where the efficiency of training such a big model is just not worth it, even from a utilization of the hardware at your disposal. But very good point, right?

Speaker 1

预训练模仿所有数据中另一个关键点是,我们并不拥有所谓的无限数据体制。数据是有限的。所以,你可以想,好吧,让我们在所有数据上训练。如果你想

The other bit that is critical on this pre-training, imitate-all-the-data step is that we do not have what we call an infinite data regime. There's finite data. And so, you can think, well, let's train on all the data. If you want to

Speaker 0

训练人类曾经阅读过的一切。

train Everything humans have ever read.

Speaker 1

就像整个互联网。所以,我们开始意识到,好吧,数据快用完了。有一些技术比如合成数据。我们能否以多种不同方式编写或重写现有数据?我的意思是,语言是显而易见的思路,嘿,你可以重写互联网。

Like all of the internet. So, we're just starting to think, okay, we're running out of data. There are techniques like synthetic data. Can we write or rewrite existing data in many different ways? I mean, languages would be an obvious way to think, hey, you could rewrite the Internet.

Speaker 1

我的意思是,它主要是英文的。大约60%,我不知道确切比例。但有方法可以用不同方式重写相同的知识。我们正在探索这些方法。这是一个很多人开始投入的研究领域,因为如果你数据耗尽,缩放定律会对你惩罚得更严厉。

I mean, it's mostly in English. I mean, 60%, I don't know what's the exact percentage. But there are ways to rewrite the same knowledge in different ways. We're exploring those. That's kind of a research area that many people are starting to invest because if you run out of data, the scaling laws punish you even more.

Speaker 0

我的意思是,比如说,你可以让Gemini

I mean, for example then, you could get Gemini

Speaker 1

嗯。

Mhmm.

Speaker 0

编写它自己版本的互联网,然后用它来训练一个新版本的Gemini。

To write its own version of the Internet and then use that to train a new version of Gemini.

Speaker 1

是的。

Yes.

Speaker 0

但是否存在一种危险,如果你开始输入同一个模型的输出,最终可能会产生这些小的、嗯、无益的反馈循环?

Is there a danger, though, that if you start feeding in the output of the same model that you can end up creating these little, well, unhelpful feedback loops?

Speaker 1

它们当然可以,你知道,做一些有趣的实验来测试像你刚才提到的这样的想法。而且确实,表面上这不是一个好主意。比如,如果你只是要求它重现整个互联网,模型会受到影响。而且确实,从信息内容的角度来看,这个数据集已经包含了信息,你怎么可能创造新的信息呢,对吧?我不知道。

They certainly can. You know, you can do some interesting experiments to test ideas like this one you just mentioned. And indeed, that is, on the surface, not a good idea. Like, the model suffers if you just ask it to recreate all of the Internet. And indeed, a priori, from an information content point of view: look, this dataset has the information that it has. How could you create new information, right? I don't know.

Speaker 1

像这些想法可能会有点帮助,因为机器学习存在缺陷,我们还没有那种从互联网中真正提取所有信息的基本能力。我的意思是,我们有好的算法,但它们并不完美。所以,我们拭目以待。是的。

Like, these ideas might help a little bit, because there are machine learning deficiencies: we're not at that fundamental ability to extract all the information truly from the Internet. I mean, we have good algorithms, but they're not perfect. So, we'll see. Yeah.

Speaker 0

我的意思是,我想我需要再仔细思考一下,因为这个想法确实很有趣。因为,如果未经思考就盲目操作,新版本可能会带有原有的偏见,对吧?然后在此基础上,新版会变得更加偏颇,最终可能会逐渐偏离原始的人类版本。但你的意思是,在原始的人类互联网中似乎嵌入了这些概念联系。如果能提取出来,我几乎在想,就像E=mc²一样,对吧?

I mean, I guess I just wanna think about that a little bit more, because it's a really interesting idea. Because, of course, naively, if you did it without thinking, then the new version would sort of have the biases of the original in it, you know? And then the new version on top of that would be more biased, and more, and you'd end up sort of spiraling away from the original human one. But then what you're saying is that in the original human Internet are sort of embedded these conceptual connections. And if you can extract those, I'm sort of thinking almost like E equals m c squared. Right?

Speaker 0

如果你能找到人类概念的‘E=mc²’,然后仅凭这个生成新数据,那似乎更现实一些。

If you can sort of find the e equals m c squared for human concepts and then just generate new data using that alone, then that seems more realistic.

Speaker 1

是的,没错。我认为这正是关键所在——这些语言模型只是在重复网络内容,无法创造新东西?还是它们真正学习了一个世界模型,能够从提取的原则中泛化出数据之外的内容?在更乐观的版本中(我倾向于相信这一点),我们可以稍微突破当前数据的限制。

Yeah, exactly. Right? And I think that's where you start hitting, I mean, are these language models just repeating what's online and not being able to create anything new? Or are they learning a world model truly that you can then, from the principles it extracts, possibly generalize beyond what the data has? And under the more optimistic version, which I tend to believe more, we can push the limits of data a little bit more than the current limits that we have.

Speaker 1

话虽如此,有些数据源我们尚未看到突破,比如视频数据。视频数据很多,但我们还没有一个时刻能够利用所有视频数据——你或许可以从中推导出大量知识、物理定律、世界运作方式,即使视频不一定有文字说明,也能提取这些知识。即便如此,我认为我们还没有充分利用这个来源。

That being said, I mean, there are some data sources where we haven't quite seen a breakthrough, like video data. There's a lot of it. And we haven't quite seen a moment of "take all the video data," where you probably can derive a lot of knowledge, a lot of laws of physics, a lot of how the world works, even if there are no words associated with the videos necessarily, and extract that knowledge. Even that, I don't think we've tapped into that source.

Speaker 0

但它不是这样运作的吗?

And it doesn't work that way?

Speaker 1

我是说,对吧?

It I mean, right?

Speaker 0

或者你不知道。

Or you don't know.

Speaker 1

是的。我是说,感觉上应该是这样。我的意思是,甚至我们学习的方式也是如此。早期确实有一些语言学习,但我们也是通过观察三维空间等来学习的,对吧?所以,很可能还有更多我们尚未提取的知识。

Yeah. I mean, it feels like it should. I mean, even how we learn. I mean, we learn there's some language learning in the early days, but we learn by also observing three dimensions and so on and so forth, right? So, there probably is more knowledge that we haven't extracted.

Speaker 1

显然我们做得相当好的一点,通过测试模型就能看到,是连接视频中存在的概念,然后你可以做很神奇的事情,比如,嘿,拿这一整小时的视频,只给我提取三个有趣的瞬间。对吧?但模型本身,它真的直接使用了那些信息吗?可能没有。

What obviously we've gotten pretty good at, and you can see this by testing the models, is connecting the concepts present in the video, and then you can do amazing things like, hey, take this full hour of video and just extract me three interesting moments. Right? But the model itself, has it actually used that information directly? Probably not.

Speaker 0

哦,我太喜欢这个了。我们之前和杰夫聊过多模态模型。你知道,如果你让这些模型观看所有曾经创建的视频,它能否真正提取出重力作为一个概念的含义?但根据你的描述,如果我没理解错,目前它能告诉你它在视频中看到了什么,但它无法接着说,比如,E等于MC平方。或者如果你给它看夜空的图片,它不会突然就能像人类天文学家那样预测行星运动。

Oh, I like this so much. We were talking to Jeff about multimodal models. You know, if you get these models to just watch all of the videos that have ever been created, can it quite literally extract what gravity means as a concept? But what you're describing here, if I understand it, is that at the moment, it can tell you what's in the video that it's seen, but it can't then say, you know, E equals m c squared. Or if you showed it pictures of the night sky, it wouldn't suddenly be able to predict planetary motion in the same way that human astronomers did.

Speaker 1

是的,没错。我的意思是,我们在这里采取的捷径是,当我们训练图像和/或视频时,数据几乎总是有与之相关的文本表示,对吧?所以,可能是一个说明这张图片或这段视频内容的字幕等等。当然,这样一来就很不可思议了,对吧?

Yeah. Exactly. I mean, the shortcut we're taking here is that when we train on images and/or videos, we almost always have a text representation associated with that modality, right? So, it could be a caption explaining what this image or this video has, and so on and so forth. And, you know, of course, then it's incredible, right?

Speaker 1

你可以放一张家庭作业的图片,上面有一些概念性的小涂鸦,它就能连接起来,并基于此进行相当多的逻辑推理。但我想说的是,我能否只拿视频,没有语言,然后训练一个模型来理解发生了什么,甚至可能以一种方式衍生出一种语言(显然不会是我们的语言),并提取那些概念。这还没有实现。而且,我认为它很可能会实现。

You can put in a picture of homework with a little conceptual drawing, and it connects things and does quite a lot of good logic just based on that. But what I'm saying here is, could I just take videos, no language, and then train a model to understand what's happening, maybe even in a way derive a language, which is obviously not going to be our language, and extract those concepts? And that has not happened. And, I mean, it probably will.

Speaker 0

回到你一开始说的,基本上DeepMind构建的所有模型都有两个阶段。是的。模仿阶段,就是我们刚才一直在讨论的。嗯。然后是叠加在其上的强化学习阶段。

Just going back to what you said at the beginning about there being two phases to basically all the models that DeepMind has built. Yeah. The imitation phase, which is what we've been talking about right here. Mhmm. But then the reinforcement learning phase on top.

Speaker 0

嗯。我知道AlphaGo、AlphaZero以及更多模型是通过自我对弈变得更好的。这在这里也适用吗?

Mhmm. And I know that AlphaGo and AlphaZero and many more got better by playing themselves. Does that apply here as well?

Speaker 1

是的。这是主要的开放性挑战之一,比如不仅要扩展预训练,还要扩展训练后或强化学习。对吧?所以,游戏中的强化学习之美在于有一套编码的规则,如果你赢了,你就知道赢了。有一种程序,如果你下棋赢了,它会检查所有情况。

Yeah. That's one of the main open challenges: scaling not only pre-training, but post-training, or reinforcement learning. Right? So, the beauty of reinforcement learning in games is that there is a set of rules that are coded, and if you've won, you know you've won. There is a kind of program that, if you play chess and you've won, will check everything.

Speaker 1

好的,那是将死了。恭喜你,你赢了。

Okay, that's a checkmate. Congratulations, you've won there.

Speaker 0

一个明确的成功指标。

A clear metric of success.

Speaker 1

明确的指标。但在语言中,就棘手多了,对吧?比如,这首诗比那首更好吗?我的意思是,即使在我们之间讨论这个也很难达成一致,对吧?所以,又是这种普遍性,使得精确计算变得非常困难。

Clear metric. Now, in language, it's much trickier, right? Like, is this a better poem than that one? I mean, good luck discussing that even amongst ourselves, right? So, it's the generality, again, that makes computing an exact score very hard.

Speaker 1

我的意思是,这是对电影更好的总结吗?这是视频中最有趣的部分吗?很难量化。但我们可以尝试,而且确实在尝试。你训练一个模型,基于一些人类偏好,大致上你只是说,好吧,现在试着泛化。

I mean, is this a better summary of the movie? Is this the most interesting bit of this video? It's very hard to quantify. But we can try, and we do try. You train a model, and based on some human preferences, roughly you just say, okay, try to now generalize.
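
The "train a model on some human preferences, then generalize" step he mentions is usually done with a pairwise objective: the reward model should score the human-preferred sample higher than the rejected one. Here is a minimal sketch under stated assumptions: a hand-made lexical-variety `feature` stands in for a neural network's representation, and the two toy "poem" pairs are invented for illustration; this is not Gemini's actual setup.

```python
import math

# Each candidate text is reduced to one hand-made feature here,
# a stand-in for a neural network's learned representation.
def feature(text: str) -> float:
    words = text.split()
    return len(set(words)) / max(len(words), 1)  # lexical variety in [0, 1]

def reward(w: float, text: str) -> float:
    return w * feature(text)

# Human preference pairs: (preferred, rejected).
pairs = [
    ("the quiet river bends toward light", "the the the the the"),
    ("cold stars over an empty field", "word word word again again"),
]

# Bradley-Terry-style objective: p(prefer a over b) = sigmoid(r(a) - r(b)).
w = 0.0
lr = 1.0
for _ in range(100):
    for good, bad in pairs:
        p = 1 / (1 + math.exp(-(reward(w, good) - reward(w, bad))))
        grad = (1 - p) * (feature(good) - feature(bad))  # ascend log-likelihood
        w += lr * grad

# After training, the learned reward prefers the varied poem in each pair.
for good, bad in pairs:
    print(reward(w, good) > reward(w, bad))  # True for both pairs
```

The key property is that the model is only ever trained to rank pairs the way humans did, so its judgment outside that finite preference set is pure generalization, which is exactly the weakness discussed next.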

Speaker 1

所以,我让模型批评自己的输出。它不会做得太差。也许80%的情况下还不错,这不算太糟。它能给你一些信号。但当你开始说,好吧,现在你爬升这个指标。

So, I ask a model to criticize its own output. It's not going to do that badly. It's going to be good maybe, I don't know, 80% of the time, which is not terrible. It can give you some signal. But then at some point you start saying, well, now climb this metric.

Speaker 1

你有了这种不完美的性能评估方式,但现在我们要开始针对这个不完美的奖励进行训练。模型会做的是利用奖励的弱点,对吧?也许用国际象棋的例子。想象程序里有一个漏洞,只要某个兵在特定位置,你就总是赢。而且这是一个没人会走的位置。

You have this imperfect way to assess performance, but now we're going to start training against this reward that is not perfect. What the model is going to do is exploit the weaknesses of the reward, right? Maybe using the chess example: imagine that there's a bug, and if a pawn is in a certain position, you always win. And it's a position that no one would ever play.

Speaker 1

所以,也许没人知道这个漏洞存在。但现在你让算法去探索一切,试图找出如何赢得这个游戏。突然间你会发现,嘿,如果我把第一个兵移到这个位置,没人这么开局,你就赢了。算法确实攻克了游戏,然后研究人员去看你是怎么下棋的,结果发现下得糟透了。

So, maybe no one knows this exists. But now you ask an algorithm to explore everything and try to discover how to win at this game. All of a sudden, it's going to find: hey, if I move the first pawn to just this position, an opening no one ever plays, you've won the game. Sure, the algorithm has nailed the game, but then a researcher goes and sees how it plays chess, and it's just terrible.
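
The failure mode described here, an optimizer exploiting a flaw in an imperfect reward, can be reproduced in miniature. Everything below is a toy illustration, not anything from a real training run: `true_quality` stands in for the ground truth we cannot query, and `learned_reward` is a proxy with a deliberate "bug" that over-rewards one word.

```python
import random

# Toy "ground truth" we can't actually query during training:
# good text uses many distinct words.
def true_quality(words):
    return len(set(words))

# Learned reward with a flaw: it mostly tracks quality,
# but a "bug" hugely over-rewards the word "sunset".
def learned_reward(words):
    return len(set(words)) + 10 * words.count("sunset")

VOCAB = ["moon", "river", "stone", "sunset", "bird", "rain"]

def optimize(reward, steps=2000, length=6, seed=0):
    """Hill-climb a word sequence against the given reward function."""
    rng = random.Random(seed)
    words = [rng.choice(VOCAB) for _ in range(length)]
    for _ in range(steps):
        cand = list(words)
        cand[rng.randrange(length)] = rng.choice(VOCAB)
        if reward(cand) >= reward(words):
            words = cand
    return words

best = optimize(learned_reward)
print(best)                # degenerate: mostly "sunset"
print(true_quality(best))  # low true quality, despite a high proxy score
```

The optimizer never sees `true_quality`; it faithfully maximizes the proxy, and the proxy's weakness becomes the policy, which is the "terrible chess" outcome in the anecdote.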

Speaker 0

调皮的AI。

Naughty AI.

Speaker 1

对。这就是我们讨论的问题。所以这就是挑战所在,对吧?基本上,你是在找漏洞,而不是真正学习一首好诗的真实含义。

Right. So that's what we're talking about; that's the challenge. Basically, you're finding exploits rather than really learning what a good poem truthfully means.

Speaker 1

对吧?

Right?

Speaker 0

你不能直接加入另一个玩家吗?对吧?比如加入另一个模型,充当终极裁判。

Can you not just add in another player? Right? So add in another model, which is like the kind of ultimate arbiter.

Speaker 1

这是个建议,但问题是如何训练那个模型?对吧?比如,我们只有有限的对好诗的定义,来自一些专家,我们可以问他们,比较这两首诗等等。所以,我们用来训练这些裁判的数据量是有限的。当然,真实标准可能是请教专家,如果我们能做到,我们会的,但这不可扩展,对吧?

I mean, that's a suggestion, but then the problem is how do you train that model? Right? Like, we have only a finite notion of what a good poem is, from some experts we might ask: hey, compare these two poems, and so on. So, there's just a limited amount of data we have to train these arbiters. The ground truth might be to ask someone who is the expert, of course, and if we could, we would, but that's not scalable, right?

Speaker 1

然后想象一下,说‘好吧,我三秒内找到了一个参数更新,现在请专家评审这1万条内容,因为那是真相来源’,这会有多慢。我们没有足够的数据来训练一个足够好的奖励模型。所以,虽然有些想法,但问题是我们无法接触到真实标准。

Then imagine how slow it would be to say: okay, I found a parameter update in three seconds, now please have an expert review these 10,000 things, because that's the source of truth. And we don't have enough data to train a good enough reward model. So, again, there are some ideas, but the problem is we don't have access to the ground truth.

Speaker 0

我的意思是,就是这样。就好像你戴着烤箱手套在黑暗中摸索,对吧?就像是

I mean, that's it. It's like you're feeling around in the dark with oven gloves on. Right? It's like

Speaker 1

是的。

Yeah.

Speaker 0

然后你甚至不完全确定有坚实的物体可以抓住。好吧。所以,如果那是核心,对吧,就像是电子大脑,现在我们在构建数字身体,你希望这个数字身体具备什么样的能力?比如推理能力。因为这方面也有相当多的工作,不是吗?

Then you're not even completely sure that there are solid objects to grab onto. Okay. So if that's the core then, right, that's the, like, electronic brain, and now we're building the digital body, what kind of capabilities do you want that digital body to have? Like reasoning, for example. Because there's been quite a lot of work on that too, hasn't there?

Speaker 1

是的。所以你开始思考,嗯,我们能够给这些模型提供哪些主要类型的接口,让它们能够超越其固定的权重,看到更多东西,从而收集知识或做一些比仅仅根据上下文和权重预测下一个词更复杂的事情。那么,显而易见的想法是让它们能够访问搜索引擎。这是我们在谷歌非常擅长的。另一个是让它们能够运行自己编写的代码。

Yeah. So you start thinking, well, what are the main sort of surfaces that we would be able to give these models limited access to so they can see beyond what's in their weights, which are frozen, to be able to gather knowledge or do maybe something a bit more complex than just predicting the next word from just what they have in context plus what they have in their weights. And so, obvious things that come to mind is giving them access to a search engine. That's what we do very well at Google. Another one is to give them the ability to run the code they write.

Speaker 1

然后,当然,也许更通用的是让它们能够与可以访问互联网的浏览器交互,对吧?对于所有这些,你总是必须小心地进行沙盒处理。这意味着保护这些环境,这样即使模型不那么先进,也不会做出意料之外的事情,对吧?所以,这涉及到整个安全方面,当你超越模型本身时,它就变得相当有趣。但如果我们只是梦想一下可能实现什么,对吧?

And then, of course, maybe even more general is giving them the ability to maybe interact with a browser that has access to the internet, right? With all of these, you always have to be careful to sandbox. That just means protect these environments so that the models, even if they're not that advanced, wouldn't do something that is unintended, right? So, there's the whole like safety aspect of this that as you move beyond the model, it starts to be quite interesting. But if we just kind of dream what would be possible, right?

Speaker 1

通过让模型可以使用这些工具,突然间,它们可以开始做比我们当时使用的训练语料更高级的事情,对吧?它们可以依靠最新新闻向我们解释或总结昨天发生的主要事件。所有这些事情,你都需要给它们这些工具。

By having these tools available to the models, all of a sudden, they can start doing much more advanced things, beyond what was in the training corpus we used at the time, right? They can rely on the latest news to explain to us, or to summarize, the main thing that happened yesterday. For all these kinds of things, you need to give them these tools.
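
The loop implied here, the model picks a tool, the tool runs in a sandbox, and the result flows back into context, can be sketched as a minimal dispatcher. Everything below is invented for illustration: the tool names, the character-filter "sandbox", and the trivial keyword policy standing in for the model's own decision.

```python
from typing import Callable, Dict

def search(query: str) -> str:
    # Stand-in for a real search backend.
    return f"[search results for: {query}]"

ARITHMETIC = set("0123456789+-*/(). ")

def run_code(src: str) -> str:
    # A real system would sandbox execution properly; this toy version
    # only allows plain arithmetic characters before calling eval.
    if not set(src) <= ARITHMETIC:
        return "refused: not plain arithmetic"
    return str(eval(src))  # acceptable only because of the filter above

TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "run_code": run_code}

def agent_step(request: str) -> str:
    """Toy policy: arithmetic goes to run_code, everything else to search.
    In a real agent the model itself would emit the tool call."""
    tool = "run_code" if set(request) <= ARITHMETIC else "search"
    return f"{tool} -> {TOOLS[tool](request)}"

print(agent_step("2*(3+4)"))              # run_code -> 14
print(agent_step("news from yesterday"))  # routed to search
```

The design point the conversation makes is visible even in the toy: the tools extend the model past its frozen weights, and the sandbox boundary is what keeps an imperfect policy from doing something unintended.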

Speaker 0

好的。那么,推理能力如何融入这一切呢?

Okay. So, how does reasoning fit into all of this then?

Speaker 1

是的。推理很有趣,对吧?所以我描述的内容可以总结为:嘿,就像,我想知道昨天发生了什么。然后我可以说,看,也许稍微个性化一点,对吧?所以我可以用语言描述,我可以说,嘿,模型,我是奥里奥尔,我对这个和那个感兴趣,我的政治观点是这样或那样的。

Yeah. Reasoning is interesting, right? So, what I described could be summarized as: hey, I want to know what happened yesterday. Then I could just say, look, maybe personalize it a little bit, right? So, I could describe it in words, I could say: hey, model, I'm Oriol, I'm interested in this and that, my political views are this or that.

Speaker 1

给我一个关于昨天新闻的积极看法,对吧?模型可能会搜索,对吧?检索所有新闻,然后根据我的要求,以我喜欢且觉得愉快的方式执行。如果我不喜欢,我甚至可以说,我不喜欢这个,或者这个笑话不好笑。然后我们可以在对话中进行一些迭代。

Give me like a positive take on the news yesterday, right? And the model would probably search, right? Retrieve all the news and then given what I ask it to do, just do it in a way that I like and I find it enjoyable. Maybe if I don't like it, I can even go and then say, I didn't like this or this is not a good joke. And then we could iterate a little bit in a conversation.

Speaker 1

现在,推理是扩展的一个不同维度。所以,你可以想象模型决定采取什么样的中间步骤来给我更好的答案,对吧?想象一下,谷歌搜索检索到100个新闻来源。模型可能决定,嘿,我不会直接阅读并一次性总结所有内容,而是先总结这100篇文章中的每一篇。

Now, reasoning is a bit of a different axis of scaling. And so, you could imagine the model deciding what kind of intermediate steps to do to give me a better answer, right? So, imagine there's like 100 news outlets that Google search retrieves. Maybe the model decides, hey, I'm just not going to read this and just try to summarize it all at once. I'm going to summarize each of the 100 articles first.

Speaker 1

这意味着模型决定为100页中的每一页写一个摘要。不是给用户,而是给我自己。现在它有100个摘要,也许下一步它决定做的是,我将按主题对这些进行分组。然后它发现其中一篇文章看起来可疑。所以,它可能会上网查看是否有论坛讨论,比如,哦,这可能因为作者等原因不太真实,等等。

That means the model decided I'm going to write a summary for each of the 100 pages. Not to the user, to myself. Now it has a 100 summaries, and maybe the next step it decides to do is, I'm going to group these by topics. Then it decides one of the articles looks suspicious. So, maybe it goes online and checks if in any forums someone discusses like, oh, this might be like sort of not truthful because of the author and so on and so forth.

Speaker 1

所以,它可以进行很多步骤来做研究。而且,你知道,它可以这样做相当长一段时间。只有当模型说,嗯,我认为现在我有一个质量高得多的答案时,它才会给你一个简短的总结。但现在它有了所有这些时间来对它可用的信息进行更多的处理。我们希望,给模型的时间越多,它总结新闻、写诗、当然还有做数学的能力就会越好。

So, it can do a lot of steps to do research. And, you know, it could do this for quite a while. And only when the model says, well, I think now I have a much better quality answer, will it give you the few-word summary. But now it has had all this time to do much more processing on the information that was available to it. And that's inference-time compute: we hope that the more time we give to the model, the better it's going to summarize the news, the better it's going to write a poem, and the better it's going to, of course, do math.
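
The pattern described, summarize each article privately, then group, then answer, is essentially hierarchical summarization. A minimal sketch, with the model call stubbed out as a trivial truncation function since no real LLM is involved, and the "topic" key crudely taken as the first word:

```python
# Hierarchical inference-time pipeline: per-article scratch-pad
# summaries first, then one combined user-facing answer.
def summarize(text: str, limit: int = 5) -> str:
    """Stub standing in for a model call: keep the first few words."""
    words = text.split()
    return " ".join(words[:limit]) + ("..." if len(words) > limit else "")

def answer(articles: list[str]) -> str:
    # Step 1: private scratch pad, one summary per article (never shown).
    scratch = [summarize(a) for a in articles]
    # Step 2: group summaries by a crude "topic" key (first word).
    topics: dict[str, list[str]] = {}
    for s in scratch:
        topics.setdefault(s.split()[0].lower(), []).append(s)
    # Step 3: only now emit the short answer the user actually sees.
    return "\n".join(
        f"{topic}: {len(items)} article(s)" for topic, items in sorted(topics.items())
    )

news = [
    "Markets rallied strongly after the announcement of new figures",
    "Markets dipped in Asia overnight on weak manufacturing data",
    "Weather warnings issued across the north ahead of storms",
]
print(answer(news))
```

The point of the structure is that the intermediate steps scale with how much compute you are willing to spend, while the final answer stays short, which is the "axis of scaling" being described.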

Speaker 1

但这肯定是扩展的另一个维度,我们希望能解锁它,并且再次打破我们在纯预训练中看到的扩展规律和限制。

But that's certainly another axis of scaling which we hope to unlock and, again, will break a bit of the scaling laws and the limits that we see in pure pretraining.

Speaker 0

这还包括计划吗?比如,它能查看你的日历,计算出你的发薪日是什么时候,也许知道一月的促销即将到来,然后告诉你推迟几天预订假期。

Does this also include planning? Like, could it look at your calendar, work out when your payday was, maybe know that the January sales are coming up soon, and tell you to postpone booking a holiday for a few days?

Speaker 1

我的意思是,这可能会变得非常复杂。但是,当然,当你考虑到个性化和时机等因素时,因为有其他正在进行的事情,你会有更多的信息来源。你需要收集它们然后给出最佳答案,这就不仅仅是‘天空是什么颜色’这样简单的问题了——其实这个问题回答起来也不简单。我刚才在想那个例子。我们早期有一篇论文就用那个例子来说明,语言模型能做到多么惊人的事情。

I mean, that can get very complex. But, of course, when you factor in things like personalization, and when to do things because of all the other things that are ongoing, you have more sources of information. You need to collect them and then give the best answer, and it stops being "what color is the sky", which is itself not that simple to answer. I was thinking about that example. I mean, we had a very early paper with that example as, oh, something that language models can do amazingly well.

Speaker 1

对吧?你不需要编程让它们回答,但它们就能回答。但实际上,如果你开始深入思考,答案其实相当微妙——比如在哪个星球?一天中的什么时间?我的意思是,有没有云层?所以,思考和规划,是的,这绝对是这些模型能够做到的事情。

Right? You don't program them to answer, but they answer. But then, actually, even the answer is quite nuanced if you start thinking: oh, yeah, you know, which planet, and what time of day? I mean, is it cloudy or not? So, the thinking and the planning, yeah, that's definitely something these models can do.

Speaker 0

这让我想起2019年与Demis的一次对话,他当时谈到卡尼曼和特沃斯基的观点,人脑几乎有两种思维系统:一种是快速、本能、直觉驱动的,另一种是像做数学和下棋那样更缓慢、经过计算的。Demis说传统上计算机更容易实现第二种,但现在我们看到更快的那种本能思维也出现了。不过你的意思是要把两者结合起来,对吧?

I'm reminded of a conversation I had with Demis probably back in 2019, and he was talking about the Kahneman and Tversky ideas of how the human brain has almost two systems of thinking: the quick, instinctive, intuition-based one, and then the much slower, calculated one, the way that you do maths and chess. And Demis was saying that that second one has been traditionally easier for us to do with computers, but that now we're seeing the much quicker instinctive stuff. But, I mean, you're sort of talking about putting the two together. Right?

Speaker 1

是的,没错。Demis说的可能就是系统2,也就是确实需要更多反思的那一种。在游戏中就很明显,对吧?你可能会说,'这步棋感觉对了'。

Yeah. Right. Probably what Demis was talking about is System 2, which is indeed the one where you reflect a bit more. In games it's very clear, right? You could just say, oh, this move feels right.

Speaker 1

你就直接走棋。但如果你仔细思考,可能会找到更好的走法。挑战在于,现在由于我们处于如此通用的方向,这些模型什么都能做。我的意思是,真的什么都能做。你想做什么都行。

You just move. But if you think and ponder, you might get to a better move. The challenge is that now, because we're going in such a general direction, right, these models can do anything. I mean, anything, literally. You just do whatever you want.

Speaker 1

比如上传一张图片,讨论新闻。所以,要实现这种更深层次的思考,其方式如此领域特定,你该如何做到?有几种答案,但我喜欢的一种是:这些模型非常通用。要在通用能力之上增加思考能力,你可能需要一种通用的思考方式。因此,你可以用模型本身来生成它应该如何思考任何事情。

I mean, upload an image, talk about the news. So, what it means to have this deeper thinking is so domain specific that how are you going to do that? And I mean there's a few answers, but the one I like is like, well, these models are very general. To add the ability to think on top of a very general set of capabilities, you probably need a general way to think. And so, you use the model itself to generate how it should think about anything.

Speaker 1

模型会提出方案,比如‘我要总结每篇文章,我要做这个和那个’。这不是我们在编程它。这是一个非常深刻的见解。那么,这是唯一的方法吗?这是最好的方法吗?

And the model will come up with, oh, like I'm going to summarize each article. I'm going to do this and that and that. And it is not us programming it. That's a very deep insight. Now, is it the only way to do it, and is it the best way to do it?

Speaker 1

早期阶段。五年时间。我们拭目以待。

Early days. Five years. We'll see.

Speaker 0

没错。那就2029年再和你聊。好吧。不过我现在也在思考很多五年前觉得非常重要的事情。其中很多是关于神经科学的启发。所以我想某种程度上,你在这里谈论的是规划和推理,但记忆是另一个非常重要的方面。

Exactly. I'll talk to you in 2029. Okay. I'm thinking now, though, also about lots of the things that felt very important back, you know, five years ago. And a lot of it was about inspiration from neuroscience. So I suppose in a way, here, you're talking about planning and reasoning, but memory was the other really big one.

Speaker 0

那那这个方面有没有体现出来?人们经常谈论长上下文和短上下文。我想这在某种程度上就是工作记忆,对吧?

And has that kind of come through? People talk about long context and short context a lot. I suppose that sort of is working memory, in a way, isn't it?

Speaker 1

是的。我的意思是,有一些技术可以应用于语言模型。至少有三种,而且解释起来相当简单,对吧?我们拥有一个记忆整个互联网的系统的第一种方式,就是通过预训练步骤,对吧?这实际上是一种特定格式的记忆步骤,我们有这些权重,它们是随机的,然后我们将它们组装到这些惊人的架构中。

Yeah. I mean, there's techniques that are out there that you can apply to a language model. There are, at the very least three, and they're reasonably simple to explain, right? The first way in which we have a system that memorizes all of the internet is by literally doing the pre training step, right? That's literally a memorization step in a particular format, which is we have these weights, they're random, and then we assemble them in these amazing architectures.

Speaker 1

现在,第二个层次也许我稍微解释了一下如何给模型提供像谷歌这样的搜索引擎工具。你可以声称这有点像神经科学家所说的情景记忆,你知道,作为人类,也许就像我们拥有很久以前的记忆。它们不是很精确,所以往往有点模糊,对吧?比如如果我必须回想,哦,我在谷歌的第一天是什么样子?我记得一些零碎片段,或者在某个房间里,或者遇到了某个人等等。

Now, the second level is maybe I explained a little bit how you would give the tool of a search engine such as Google to the model. That you could claim is sort of what neuroscientists would call episodic memory, which, you know, as a human, maybe it's like, you know, we have these memories from a long time ago. They're not very precise, so they tend to be a bit more fuzzy, right? Like if I have to think, oh, what was my first day at Google? I remember bits and pieces or being in a room or someone I met or whatnot.

Speaker 1

大意。是的,大意,对吧?现在有趣的是,这些模型可能没有这个限制,对吧?你 literally 可以得到很多年前写的在线文章,它会包含所有图片,一切都会完美无缺,完美重建。所以,这第二种模式,称为情景记忆,显然当我们尤其是将强大的搜索引擎集成到我们的模型中时,我们看到了这一点。

The gist. Yeah, the gist, right? Now, interestingly, these models maybe don't have that limitation, right? You can literally get an article written many years ago online, and it's gonna have all the images, everything will be perfect, reconstructed perfectly. So, that second mode, called episodic memory, clearly we're seeing that when you integrate especially powerful search engines into our models.

Speaker 1

然后第三种是你可能称之为工作记忆的东西,实际上我描述的整个思考过程就是其中之一,对吧?比如,如果我们获取每篇新闻文章,但然后我们想创建摘要,找出它们如何相互关联,批评其中一些,这就开始结合工作记忆了,意思是说我将会有一个便笺簿,你知道,记录摘要,我认为我发现的问题。而当我们谈论短上下文或长上下文时,通常指的是最后这部分,比如工作记忆,无论你有一千个token(这意味着我做不了太多事情,对吧?我可以检索文章。但这已经超过一千字了)。

And then the third one is what you could call working memory, and actually the whole thinking process I described is an example of it, right? Like, if we take every news article, but then we want to create summaries, find how they relate to each other, criticize some of them, this starts to involve working memory, meaning I'm going to have a scratch pad of, you know, the summaries, the issues that I think I'm finding. And when we say short or long context, generally we mean this last bit, the working memory: whether you have a thousand tokens, which means I couldn't possibly do much, right? I can retrieve articles, but that's already over a thousand words.

Speaker 1

我能做的总结不多。或者它可能非常庞大,在这种情况下,你有更多可能性在此基础上进行推理,依此类推。因此,今年的突破之一——实际上我们还在2024年——就是实现了数百万个token的上下文,这带来了许多可能性。当然,其中之一是从过去检索信息,然后将其提取出来进行非常详细的分析。这就像我们可以上传一部电影或很长的视频,然后开始做摘要的例子。

Not much I can do to summarize them. Or it can be massive, in which case you have many more possibilities to do reasoning on top of that, and so on and so forth. And so, one of the breakthroughs of the year, actually, we're still in 2024, was to enable millions of tokens in context, which enables many things. One of which, of course, is to retrieve something from the past, bring it forward, and then do a very detailed analysis. That's a bit like the examples where we can upload a movie, or some very long video, and start doing summarization.

Speaker 1

我们上传它的事实更像是情景记忆,但现在我们将其存储在内存中。一切都适合内存。我们可以在电影的每一帧、每个对象之间进行大量关联,等等。

The fact we kind of upload it is more episodic memory, but then now we have it in memory. It all fits in memory. We can do quite a lot of associations within each frame, each object in the movie, and so on and so forth.
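
The last kind of memory, the bounded scratch pad, can be sketched as a tiny class. This is a toy under stated assumptions: the class name, the word-count "token" budget, and oldest-first eviction are all invented for illustration, not how any production context window is actually managed.

```python
from collections import deque

class WorkingMemory:
    """Toy bounded context: keeps recent notes within a word budget,
    evicting the oldest note first, like a finite context window."""

    def __init__(self, budget_words: int):
        self.budget = budget_words
        self.notes: deque[str] = deque()

    def _used(self) -> int:
        # Crude "token" count: words across all retained notes.
        return sum(len(n.split()) for n in self.notes)

    def add(self, note: str) -> None:
        self.notes.append(note)
        while self._used() > self.budget and len(self.notes) > 1:
            self.notes.popleft()  # forget the oldest note

    def context(self) -> str:
        return " | ".join(self.notes)

wm = WorkingMemory(budget_words=8)
wm.add("summary of article one")     # 4 words
wm.add("summary of article two")     # 4 words, exactly at budget
wm.add("suspicious source flagged")  # 3 words, so the oldest is evicted
print(wm.context())
```

The contrast with the episode's point is deliberate: the pre-training weights and the search-backed episodic memory are effectively unbounded, while this working-memory layer is the one where budget, eviction, and compression decisions actually bite.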

Speaker 0

更长的上下文窗口总是更好吗?我的意思是,因为我在想,我不知道你们是否还在用神经科学作为工作的灵感来源。但是,人类的记忆,比如工作记忆是有限制的。有时候你肯定会觉得,我的大脑满了,我受不了了。

Is a longer context window better always? I mean, because I'm just thinking about I don't know how much you guys are still using neuroscience as an inspiration for what you're doing. But, I mean, the human memory, like, there's a limit to the working memory. There's certainly sometimes when you're like, my brain is full and I'm done.

Speaker 1

是的。有时候大脑是一种灵感,但计算机肯定有其优势,我们应该发挥其长处。对吧?所以,也许它们可以在内存中存储很多东西,比如每一篇维基百科文章,无论是什么,我们人类做不到。但如果模型可以,那就没问题了。

Yeah. Sometimes the brain is an inspiration, but computers certainly have advantages, and we should build on their strengths. Right? So, perhaps they can hold a lot in memory, like every Wikipedia article, whatever it is; we can't. But if the model can, well, there you go.

Speaker 1

你有了新的能力。但同样,信息太多可能会造成混淆,即使对于这些神经网络也是如此。所以,压缩可能是个好主意。因此,你可能需要从中获取一些灵感,看看我们是如何做到记忆检索等相当惊人的事情的。

You have new capabilities. But also, it might be too confusing to have too much information, even for these neural networks. So, it might be a good idea to compress. So, that's where you probably want to push for getting some inspiration for how we might do what we do, which is quite amazing, right, in terms of memory retrieval and so on.

Speaker 0

是的。这就是为什么你们在进行前沿研究。没错。对。

Yeah. This is why you're heading up Drastic Research. Yes. Right.

Speaker 1

我的意思是,我们想用模型做的事情绝对应该是鼓舞人心和前瞻性的。然后你要问技术的主要限制是什么?当然,还要尝试下注,并激励团队围绕关键组件寻找解决方案。

I mean, what we want to do with the models should definitely be inspiring and forward-looking. And then you ask: what are the main limits of the technology? And then, of course, try to place the bets and inspire the teams to find solutions around the critical components.

Speaker 0

但你已经做出的一些赌注已经见效了。我的意思是,我知道刚刚宣布了大量令人眼花缭乱的新功能。我们能详细聊聊其中一些吗?然后也许还可以谈谈我们之前讨论过的不同技能,以及它们是如何体现在这些功能中的。

But some of the bets that you've already made have come off. I mean, I know there's been a big announcement of a dizzying number of new features that have just come out. Can we talk through some of them? And then maybe also talk to me about the different skills that we've already spoken about and how they appear in each of these.

Speaker 1

是的。所以,我们围绕我们最好的Gemini模型构建了相当多的系统。我们做的一件事是升级到了2.0版本。即使你说,看,我们不再扩大规模了,我们还能获得更好的质量吗?我们看到了代际的飞跃。

Yeah. So, we have quite a few systems around our best Gemini models. One of the things that we've done is update to 2.0. We're seeing a generational leap even if you say, look, let's not scale anymore; can we get better quality?

Speaker 1

所以,我们算是又做到了。这些模型更快、更便宜,而且实际上更好。基本上,

So, we've done it, sort of, again. These models are faster, they're cheaper, and they're actually better. Basically,

Speaker 0

Gemini变得更好了。

Gemini's got better.

Speaker 1

是的。Gemini变得更好了,但这不仅仅是因为我们扩大了规模。我想这是主要信息之一。

Yeah. Gemini's got better, but not only because we scaled. I guess that's kind of one of the main messages.

Speaker 0

告诉我更多关于你们为Gemini带来的智能体能力。

Tell me more about the agentic capabilities that you've brought to Gemini.

Speaker 1

是的。所以,我们正在Chrome中发布Companion功能,你可以直接输入来完成任务。其中一些任务很棘手,因为我部分喜欢它们,但部分又不喜欢。我现在非常清楚地想到了旅行,对吧?所以,你旅行时会寻找酒店或航班等等,很多时候会觉得,哦,希望这能自动化,但同时又不想完全不参与。所以,我们发布的这类东西,希望能自动化那些更琐碎的步骤或重复性的事情,或者需要自动化的事情,因为我懒得点击所有东西,对吧?所以,我们添加了一个智能体,你可以让它为你做某事,然后它会通过思考和行动,比如基本的点击链接等,尝试为你解决问题,对吧?

Yeah. So, we're releasing Companion in Chrome, where you can just type a task to be done. Some of these tasks are tricky because I partly enjoy them, but also partly don't. I'm thinking now very clearly about trips, okay? So, you travel and you look for hotels or flights or whatnot, and a lot of it feels like, oh, I wish this could be automated, but at the same time I wouldn't want to not be part of the journey at all. So, the kind of thing we're releasing will hopefully automate the more trivial steps, the repetitive things, or the things that need automation because I can't be bothered to click through everything, right? So, we're adding an agent that you can ask to do something for you, and then, again through thinking and acting, with basic actions like clicking on links and so on, it's going to try to solve the task for you, right?

Speaker 1

这确实是一个既令人兴奋又充满挑战的研究机遇,因为它为通用智能体和模型提供了一个极其通用的环境。我们早期原型的一些例子是,比如我们可以要求它在浏览器上玩游戏——这当然回到了DeepMind的根源——它表现得还不错,对吧?它能找到网站,开始玩游戏。这种能力与通用性之间的联系很酷:你越通用,就越能把过去需要专门技能的环境视为‘我只需输入指令,它就能去学习玩这个游戏’。虽然我们还没完全达到那个水平,但这让我们得以一窥这类技术可能带来的未来。

And that's quite an exciting research challenge and opportunity, because it's a very general environment for a very general agent and, ultimately, model. And some examples we had with the early prototypes: I mean, we can ask it to play a game on the browser, which of course goes back to the roots of DeepMind, and it did okay, right? It finds a website, it starts playing the game. There's a cool connection there: the more general you are, the more you can treat environments where you needed specialization in the past as, oh, I can just type it, and it goes and learns to play this game. I mean, we're not quite there, but this is a glimpse of where we could go with this kind of technology.
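
The shared action space behind "play a game" and "browse the web" that is being described, observe a screen, then click or type, can be sketched as a tiny action vocabulary. The class names and the example plan are invented for illustration; a real browser agent's interface would be richer.

```python
from dataclasses import dataclass
from typing import Union

# A tiny action vocabulary shared by game-playing and web-browsing
# agents: the policy only ever observes pixels and emits one of these.
@dataclass
class Click:
    x: int
    y: int

@dataclass
class Type:
    text: str

Action = Union[Click, Type]

def describe(action: Action) -> str:
    """Render an action for logging or for a human-in-the-loop review."""
    if isinstance(action, Click):
        return f"click at ({action.x}, {action.y})"
    return f"type {action.text!r}"

# A hypothetical plan an agent might emit for a flight search.
plan: list[Action] = [Click(120, 300), Type("flights to Lisbon"), Click(400, 300)]
for a in plan:
    print(describe(a))
```

What changes between the StarCraft-era agents and a web agent is not this action interface but the goal: one narrow game versus the whole web.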

Speaker 0

你说得对,这确实让我们回想起你们多年前就在做的事情,就是那种能够使用键盘和鼠标的智能体。对吧?这真的很相似。

I mean, you're right that it does bring us back to that thing that you were doing so many years ago, which was something that could use a keyboard and a mouse. Right? It's a really similar thing.

Speaker 1

是的。甚至连操作方式都非常相似。它需要理解屏幕内容,根据你的指令判断该点击哪里等等。这些操作与你在通用游戏中需要进行的互动本质上是一样的。

Yeah. Even the actions are very similar. Right? Understand the screen and, given what you ask, decide where to click, and so on. Those are essentially the same sort of actions you'd use to interact with very general games.

Speaker 1

区别在于前者的目标是狭窄的,只针对单一游戏和同类界面,而这里面对的是整个互联网,是的,范围要广阔得多。

The difference is the goal there is narrow, which is one game and the same kind of screens, whereas here is the whole web, which, yeah, is pretty vast.

Speaker 0

好吧。那么,我现在就在想象你们能做什么。比如,它能查看你的日历吗?你能说‘我明年想去度假’,然后它就能查看你的日历,找出最佳周次,了解你的预算等等等等?

Well, okay. But then, so I'm sort of imagining what you could do now. I mean, could it look in your calendar? Could you say, I want to go on holiday next year? And it could look in your calendar and work out when the best week was, you know, know your budget, etcetera, etcetera, etcetera.

Speaker 1

是的。这些模型离自动化处理这类任务已经不远了,对吧?所以现在的问题是如何让它做得更好,确保安全。这中间还有很多步骤。

Yeah. So, these models are not far from being able to automate this, right? So, now it's a matter of making it better, right? Making it safe. There's a lot of steps.

Speaker 1

但如果你展望未来,原则上人类在浏览器上能做的任何事情,这类模型都能做到。然后如果你让它们真正理解你的需求,通过思考和其他技术变得非常出色,它们会越来越强,可能比你做得更快,在某些情况下甚至好得多。这就是梦想所在。虽然现在还处于非常早期的阶段,但确实超级令人兴奋。我相信明年我们一定会看到大量关于语言模型与浏览器或更广泛计算机系统深度融合的实验探索。

But if you just fast forward, anything a human can do on a browser, these things can in principle do. And then if you make them really understand what you want, and really good through thinking and other techniques, they'll get better and better, and they'll probably be faster, and maybe in some cases much better, than you are at doing it. So, that's kind of the dream. And this is super early stages, but it's also super exciting. And I think certainly next year we're going to see a lot of experimentation around this idea of intersecting language models with browsers, or computers more generally.

Speaker 1

编程怎么样?是的,编程也是一个很棒的方向。我们也在发布软件工程工具,当然,这些通常不仅需要‘嘿,这是一个关于编程的完美谜题描述,请给我写代码’,而且还要知道如何测试它。这更像是迭代的过程,对吧?

How about coding? Yeah, coding is a great one as well. We are also releasing tools for software engineering, which, of course, generally requires not only: hey, here is a perfect description of a coding puzzle, please write me the code, and by the way, I know how to test it. It's more iterative, right?

Speaker 1

你需要编写代码、运行代码,如此循环。因此,我们也在从智能体的角度推进这种能力。我的意思是,游戏也非常重要,当然,这曾是开发强大算法的手段。但思考这些强大的多模态模型如何开始理解游戏,并能帮助用户在游戏过程中娱乐、提供建议,或讲个游戏相关的笑话等等,也非常有趣。对吧?

You have to write code, run the code, and so on and so forth. So, we are putting that capability forward as well, from an agentic point of view. Also, I mean, games are very important, and of course they were a means to an end to develop powerful algorithms. But it's also very interesting to think about how these very powerful multimodal models start to understand games and can aid users: you know, entertain during a game session, give them advice, or tell a joke about the game, or whatnot. Right?

Speaker 1

所以我们也在尝试这种游戏伴侣的概念。

So we're we're also experimenting with this sort of game companion.

Speaker 0

好的。你谈到的所有这些,听起来非常接近相当通用的智能。我的意思是,我们是否正在接近通用人工智能(AGI)?

Okay. All of these things that you're talking about, I mean, this is sounding very close to intelligence that is quite general. I mean, are we getting close to AGI?

Speaker 1

是的,这是个好问题。听着,我这周早些时候还在思考这个。如果在十年前,甚至五年前,有人给我今天的模型,我会说,‘看,这是一个秘密实验室的模型。玩玩看,然后告诉我你是否认为这实际上接近通用智能。’

Yeah, this is a good question. Look, I was thinking about this earlier this week. If, ten years ago, or even five years ago, someone had given me the models of today and said: look, this is a model from a secret lab; play with it and tell me if you think it is actually close to a general intelligence.

Speaker 1

我可能会声称,‘哦,是的,这来自一个AGI基本上已经实现或非常接近的未来,对吧?’所以,越接近,你就越会发现,‘哦,但它会幻觉。’当然,这非常重要,对吧?但退一步看,感觉就像是,‘好吧,它已经非常非常接近了。’

I would have claimed: oh yeah, that comes from a future where AGI has basically either happened, or I can see that it is very close. Right? So, the closer you are, the more you find: oh, but it hallucinates. Of course, that's very important, right? But just zooming out, it feels like: okay, it's getting pretty close.

Speaker 0

但DeepMind的使命声明是解决智能,那种智能,像是超级智能,超越人类智能的东西。你认为扩展规模就足以实现这一点,还是我们需要其他东西?

But then DeepMind's mission statement, solve intelligence: that sort of intelligence, like superintelligence, is something that surpasses human intelligence. Do you think that scaling is enough to get us there, or do you think that we need something else?

Speaker 1

是的。我是说,谷歌DeepMind的使命显然是将智能与科学相交融,推动边界。我们最近就看到了一个很好的例子,当然是AlphaFold。所以从这个意义上说,从领域角度来看,我们确实已经看到了一些窄领域但超级智能系统的实例。我是说,AlphaFold就只做那一件事。

Yeah. I mean, Google DeepMind obviously has this mission to intersect intelligence with science, to push the boundaries. And we've seen a good example very recently, of course, with AlphaFold. So in that sense, from a domain perspective, we honestly have already seen some examples of narrow but superintelligent systems. I mean, AlphaFold was only doing that one thing.

Speaker 1

我认为这可能是我们要开始看到超级智能的领域,即使是从这些模型的通用能力出发。你可能需要做一些专门化的工作。而且再说一次,这可能是值得的。我是说,解决蛋白质折叠问题值得吗?但绝对值得,我认为这是一个很好的测试用例。

And I think that's probably how to think about the domains where we're going to start seeing superintelligence, even starting from the general capabilities these models have. You might need to do some specialization. And again, it might be worth it. I mean, was it worth it to solve protein folding? Absolutely. I think that's a good test case.

Speaker 1

我们处于非常有利的位置,因为我们当然有整个科学团队等等,在研究非常有趣的问题。现在,如果你拿语言模型来说,并开始思考智能体,将它们置于可能更偏向科学、模拟、定理证明等的环境中,是否需要某种非常离散的东西来促成其他突破?我可能会说可能不需要。如果没有另一个类似Transformer那样的突破,也许我们会开始看到更多例子,比如在数学领域,现在它发现了数学家觉得有趣的新定理。这当然是通过非常好的执行加上一些想法的扩展等等实现的。

And we are very well positioned because we, of course, have the whole science team and so on working on very interesting problems. Now, if you take the language models and start thinking about agents, putting them in environments that could be more about science, simulation, theorem provers, and so on, will something very distinct be needed to enable other breakthroughs? I would say probably not. Even without another transformer-like breakthrough, it feels like we're going to start seeing more examples of: oh my god, in math it now discovers new theorems that mathematicians find interesting. And that would happen through, of course, very good execution plus scaling up some of the ideas, and so on and so forth.

Speaker 0

不过有趣的是,最先倒下的多米诺骨牌是那些有明确事实基础的领域,对吧?是的。

It is interesting, though, that the first dominoes to fall are the ones which have a ground truth, right? Yeah.

Speaker 1

比如科学,

Like science,

Speaker 0

就像你描述的那样。

as you described.

Speaker 1

是的。虽然,我是说,是的,科学取决于哪些科学可能有明确的事实基础,我想。蛋白质折叠,绝对是。是的。是的。

Yeah. Although, I mean, with science it depends which sciences have a ground truth, I suppose. Protein folding, definitely. Yeah. Yeah.

Speaker 1

确实如此。我希望我们还能看到其他一些以超人类方式进步的方法。比如,你可以想象有一个由这些强大模型驱动的头脑风暴科学顾问,它不仅仅是发现新事物或证明新理论,而是挑战你的假设,让你跳出思维定式,从而激发我的创造力,达到我原本无法达到的境界,那么这在某种程度上也可以称之为超人类,对吧?所以,我认为这些绝对没有超出范围,当然也更难思考如何奖励这种行为。

It's true. I'm hoping we also see some other ways to advance in a superhuman way. Like, you could imagine having a brainstorming scientific advisor powered by one of these powerful models, and rather than discovering something or proving something new, it just challenges your assumptions and makes you think out of the box, in a way that my creativity then gets me to a place I couldn't have gone on my own; you would call that superhuman in some ways as well, right? So, I think those are definitely not out of scope, though it's of course much harder to think about how you reward that behavior.

Speaker 0

绝对迷人。我的意思是,这里面确实有很多激进的内容。是的。没错。非常感谢你来参加我的节目。

Absolutely fascinating. I mean, there was definitely a lot of drastic stuff in there. Yeah. Yes. Thank you so much for joining me.

Speaker 1

是的。彼此彼此。谢谢。很愉快。

Yeah. Likewise. Thanks. Pleasure.

Speaker 1

五年后再见。

See you in five years.

Speaker 0

我认为那次对话中确实浮现出一个真正的主题,至少对我来说是这样,那就是普遍性的概念。如果你仔细想想,智能推进知识的方式确实存在这种普遍性。就像那些古代天文学家,比如哥白尼,他们通过观察天空收集大量数据,并以此提取出太阳系的模型。而在AlphaGo的例子中,它通过观察围棋对局来提取最佳下棋方式的模型。现在,在人类创造的一切事物中,都嵌入了这个模型,这个关于我们如何体验现实的基本真理。

I think there was this real theme that emerged from that conversation, at least for me anyway, which was this idea of generality. And if you think about it, there is this generality in the way that intelligence advances knowledge. Like those old astronomers, like Copernicus, they were assessing lots of data from observing the sky and using that to extract a model of the solar system. But in the case of AlphaGo, it was observing games of Go to extract a model for the best possible way to play. And now somewhere embedded in everything that has ever been created by humans is this model, this underlying truth of how we experience reality.

Speaker 0

当然,我们正在寻找的模型永远不会像日心说那样简洁,但这个模型似乎确实存在,隐藏在Gemini的冻结权重之中。如果这就是我们目前所做到的,那么下一阶段就是尝试利用这些普遍概念来提取人类偏好的模型。当然,这要困难得多。但如果我们成功了,或许也能让我们达到一种更普遍的智能形式——AGI。如果你觉得这次对话有趣,我认为也值得看看我与Jeff Dean讨论扩展等话题的节目,以及Iason Gabriel关于AI代理伦理的节目。

And the model that we're looking for, of course, it's never gonna be as neat as heliocentrism, but that model does seem to be in there, hidden among the frozen weights of Gemini. Now if that's what we've done so far, the next phase is to try and use those general ideas to extract a model of human preferences too. And that is, of course, a lot, lot harder. But if we succeed, it might just get us to a more general form of intelligence too: AGI. Now, if you found this conversation interesting, I think it's worth also checking out the episodes I did with Jeff Dean on, among other things, scaling, and Iason Gabriel on the ethics of AI agents.

Speaker 0

或者,如果你想更深入地了解Gemini 2.0的开发,可以收听由主持人Logan Kilpatrick主持的最新一期Google AI发布说明播客。这期及其他节目可以在你获取播客的任何地方找到。下次再见。

Or if you want to dig a bit deeper into the development of Gemini 2.0, then you could check out the latest episode of the new Google AI release notes podcast with host Logan Kilpatrick. This and other episodes can be found wherever you get your podcasts. Until next time.
