Dwarkesh Podcast - 杰夫·迪恩与诺姆·沙泽尔——谷歌25年:从PageRank到通用人工智能 封面

杰夫·迪恩与诺姆·沙泽尔——谷歌25年:从PageRank到通用人工智能

Jeff Dean & Noam Shazeer — 25 years at Google: from PageRank to AGI

本集简介

本周节目,我迎来了两位堪称各自领域最重要的技术巨匠。杰夫·迪恩是谷歌首席科学家,在公司25年间,几乎主导了现代计算领域所有最具变革性的系统……

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

今天,我非常荣幸能与杰夫·迪恩和诺姆·沙泽尔聊天。

Today, I have the honor of chatting with Jeff Dean and Noam Shazeer.

Speaker 0

杰夫是谷歌的首席科学家。

Jeff is Google's chief scientist.

Speaker 0

在公司工作的二十五年里,他参与了现代计算领域几乎所有最具变革性的系统,包括MapReduce、BigTable、TensorFlow、AlphaChip。

And through his twenty five years at the company, he has worked on basically the most transformative systems in modern computing from MapReduce, BigTable, Tensorflow, AlphaChip.

Speaker 0

真的,这个名单根本停不下来。

Genuinely, the list doesn't end.

Speaker 0

现在是Gemini。

Gemini now.

Speaker 0

诺姆是推动当前人工智能革命的最关键人物。

And Noam is the single person most responsible for the current AI revolution.

Speaker 0

他是现代大语言模型所使用的所有主要架构和技术的发明者或共同发明者,包括Transformer本身、专家混合模型、Mesh TensorFlow以及其他许多技术。

He has been the inventor or the co inventor of all the main architectures and techniques that are used for modern LLMs from the transformer itself to Mixture of Experts and to Mesh Tensorflow to many other things.

Speaker 0

他们两人是谷歌DeepMind中Gemini项目的三位联合负责人中的两位。

And they are two of the three co leads of Gemini at Google DeepMind.

Speaker 0

太棒了。

Awesome.

Speaker 0

非常感谢你们的到来。

Thanks so much for coming on.

Speaker 1

谢谢你们邀请我们。

Thanks for having us.

Speaker 1

非常兴奋能来到这里。

Super excited to be here.

Speaker 0

好的,第一个问题。

Okay, first question.

Speaker 0

你们两位在谷歌都工作了二十五年左右。

Both of you have been at Google for twenty five or close to twenty five years.

Speaker 0

在公司早期,你们可能已经了解了所有事情的运作方式。

At some point early on in the company, you probably understood how everything worked.

Speaker 0

但什么时候这种状况不再成立了呢?

When did that stop being the case?

Speaker 0

你觉得有没有一个明确的时刻发生了这种变化?

Do you feel like there was a clear moment that happened?

Speaker 2

我的意思是,我记得我刚加入的时候,那是2000年左右,当时公司有个规定,每个人都会有个导师。

I mean, I know I joined and like at that point, this was like 2000, and, they had this thing, everybody gets a mentor.

Speaker 2

你知道的,我什么也不懂。

And, you know, so, you know, I knew nothing.

Speaker 2

我就不停地问我的导师,而我的导师什么都知道。

I would just ask my mentor everything, and my mentor knew everything.

Speaker 2

后来我发现,我的导师就是杰夫。

It turned out my mentor was Jeff.

Speaker 2

并不是每个在谷歌的人都什么都知道。

And it was not the case that everyone at Google knew everything.

Speaker 2

只是杰夫什么都知道,因为他基本上写过所有东西。

It was just the case that Jeff knew everything because he because he has basically written everything.

Speaker 1

你太客气了。

You're you're very kind.

Speaker 1

我的意思是,我觉得公司成长过程中会经历这些阶段。

I mean, I think as companies grow, you you kind of go through these phases.

Speaker 1

比如我加入的时候,我们只有25到26个人左右。

Like when I joined, we were 25 people, 26 people, something like that.

Speaker 1

所以你最终能记住每个人的名字。

And so you eventually learned everyone's name.

Speaker 1

尽管公司在不断扩张,你还是能跟踪所有新加入的人。

And even though we were growing, you kept track of all the people who were joining.

Speaker 1

但到了某个阶段,你就记不住公司里每个人的名字了,不过你仍然认识所有从事软件工程的人。

At some point, then you kind of lose track of everyone's name in the company, but you still know everyone working on software engineering things.

Speaker 1

然后你甚至会渐渐记不清软件工程团队里所有人的名字,但至少还能知道每个人在做什么项目。

Then you sort of lose track of all the names of people in the software engineering group, but you at least know all the different projects that everyone's working on.

Speaker 1

再后来,公司大到你收到一封邮件说‘鸭嘴兽项目’周五上线,你却想:‘鸭嘴兽项目?那是什么?’

And then at some point the company gets big enough that you get an email that Project Platypus is launching on Friday, you're like, what the heck is Project Platypus?

Speaker 1

所以我认为

So I think

Speaker 2

通常这会是一个非常好的惊喜。

Usually, it's a very good surprise.

Speaker 2

比如你会说,哇。

Like, you're like, wow.

Speaker 2

Project Platypus。

Project Platypus.

Speaker 2

我根本不知道我们在做这个,结果还真是这样。

Like, I had no idea we were doing that, and it turns out Yeah.

Speaker 1

但即使只是从很高的层面了解公司正在发生什么,即使你不知道每一个细节,这也是很好的。

But I think it is good to keep track of what's going on in the company, even at a very high level, even if you don't know every last detail.

Speaker 1

而且,了解公司里很多不同的人也很重要,这样你就可以去问某人获取更多细节,或者找出该找谁沟通。

And it's good to know lots of people throughout the company so that you can go ask someone for more details or figure out who to talk to.

Speaker 1

我认为,只要有一层间接关系,如果你平时积累了一定的人脉网络,通常就能在公司里找到合适的人。

I think with one level of indirection, you can usually find the right person in the company if you have a good network of people that you've built up over time.

Speaker 0

顺便问一下,谷歌是怎么招募你的?

How did Google recruit you, by the way?

Speaker 1

实际上,是我主动联系了他们。

I kind of reached out to them, actually.

Speaker 0

诺姆,你是怎么被招聘的呢?

And and, Noam, how did you get recruited?

Speaker 0

你当时是

What what was

Speaker 2

我其实是在1999年的一场招聘会上看到谷歌的,当时我以为它已经是一家超级大公司了,根本没必要加入,因为我知道的每个人都在用谷歌。

it that you did I actually saw Google at a job fair in, like, 1999, and I assumed that it was, like, already this huge company that there was no point in joining, because everyone I knew used Google.

Speaker 2

我想那是因为我当时是加州大学伯克利分校的研究生。

I guess that was because I was a grad student at Berkeley at the time.

Speaker 2

我大概已经多次从研究生项目中退学了。

I I guess I've dropped out of grad programs a few times.

Speaker 2

但事实上,谷歌当时并没有那么大。

But, but, you know, it turns out that, like, actually it wasn't really that large.

Speaker 2

所以其实我1999年并没有申请,而是在2000年一时兴起发了份简历过去,因为我觉得谷歌是我最喜欢的搜索引擎,应该多投几家公司的简历。

So it turns out I did not apply in 1999 but, like, just kind of sent them a resume on a whim in 2000, because it was like my favorite search engine and I figured I should apply to multiple places for a job.

Speaker 2

但后来,确实非常有趣。

But then, yeah, turned out to be be really, really fun.

Speaker 2

看起来是一群聪明人在做很棒的事情。

Looked like a bunch of smart people doing good stuff.

Speaker 2

他们墙上挂着一张彩色图表,记录着每天的搜索查询次数,据说是有人一直在维护的。

And they had this really nice crayon chart on the wall of the daily number of search queries that, you know, somebody had just been maintaining.

Speaker 2

而且,确实看起来增长非常迅速。

And, yeah, it looked very exponential.

Speaker 2

这些人一定会非常成功。

These guys are going to be very successful.

Speaker 2

而且看起来他们有很多值得解决的好问题。

And it looks like they have a lot of good problems to work on.

Speaker 2

所以我心想,好吧,也许我可以去那里工作一段时间,赚够钱之后就能一直做我想要的AI研究了。

So I was like, Okay, maybe I'll, yeah, go work there for a little while and then have enough money to just go work on AI for as long as I want after that.

Speaker 0

是的,对。

Yeah, yeah.

Speaker 0

某种程度上,是这样的,对吧?

In a way, did that, right?

Speaker 0

是的,对的。

Yeah, yeah.

Speaker 2

事情完全按照预期发展了。

It totally worked out exactly according to plan. Right.

Speaker 0

抱歉,你1999年就在考虑人工智能了吗?

Sorry, you were thinking about AI in 1999?

Speaker 2

是的,那大概是2000年的事。

Yeah, this was like 2000.

Speaker 2

是的,我记得在研究生期间,当时有个朋友告诉我,他的2000年新年决心是活到3000年,而他打算通过发明人工智能来实现这个目标。

Yeah, I remember in, yeah, in grad school, a friend of mine at the time had told me that his New Year's resolution for 2000 was to live to see the year 3000 and that he was going to achieve this by inventing AI.

Speaker 2

所以我想,哦,这听起来是个不错的主意。

So I was like, oh, that sounds like a good idea.

Speaker 2

但你知道,后来我才意识到,原来你可以在大公司里做这件事,不过当时我觉得,嘿,好像很多创业公司的人赚了很多钱。

But, you know, then, you know, I didn't get the idea at the time that, oh, like, you could go do it at a big company, but, you know, I figured, hey, you know, a bunch of people seem to be making a ton of money at startups.

Speaker 2

也许我可以先赚点钱,然后就有足够的钱生活,可以长期从事人工智能研究。

Maybe I'll just make some money and then I'll have, you know, enough to live on just work on AI research for a long time.

Speaker 2

是的。

Yeah.

Speaker 2

但确实,谷歌后来成了从事人工智能研究的绝佳工作场所。

But yeah, it actually turned out that Google was a terrific place to work in AI.

Speaker 1

我的意思是,我喜欢谷歌的一点是,我们的雄心始终是那种需要相当先进人工智能才能实现的目标。

I mean, one of the things I like about Google is our ambition has always been sort of something that would kind of require pretty advanced AI.

Speaker 1

组织全球信息并使其普遍可访问且有用。

Organizing the world's information and making it universally accessible and useful.

Speaker 1

实际上,这里面包含着非常广泛的使命。

Like actually, there's a really broad mandate in there.

Speaker 1

所以公司并不是只做一件小事然后一直做下去。

So it's not like the company was going to do this one little thing and stay doing that.

Speaker 1

而且,你也能看到,我们最初所做的已经朝着这个方向迈进,但在这个方向上还能做更多。

And also, you could see that what we were doing initially was in that direction, but you could do so much more in that direction.

Speaker 0

过去两到三十年,摩尔定律如何改变了你在设计新系统、判断哪些项目可行时需要考虑的因素?

How has Moore's Law over the last two, three decades changed the kinds of considerations you have to take on board when you design new systems, when you figure out what projects are feasible?

Speaker 0

有哪些东西一直没变?也就是说,目前仍然存在的限制是什么?

What has stayed, you know, like, what what what are still the limitations?

Speaker 0

现在有哪些事情是你以前显然做不到的?

What are things you can now do that you obviously couldn't do before?

Speaker 1

我的看法是,过去几十年里,这方面确实发生了相当大的变化。

I mean, I I I think of it as actually changing quite a bit in the last couple decades.

Speaker 1

比如,二十年前到十年前,情况非常理想,你只需要等待,大约十八个月后就能获得快得多的硬件,而你根本不需要做任何改动。

So like two decades ago to one decade ago, it was awesome because you just, like, wait, and eighteen months later you get much faster hardware and you don't have to do anything.

Speaker 1

但最近,我觉得基于通用CPU的机器的性能提升已经不如以前了。

And then more recently, you know, I feel like the general purpose CPU based machines scaling has not been as good.

Speaker 1

制造工艺的改进现在需要三年时间,而不是每两年一次。

The fabrication processes improvements are now taking three years instead of every two years.

Speaker 1

多核处理器等架构上的改进,也不再像二十年到十年前那样带来同等程度的性能提升。

The architectural improvements in multi core processors and so on are not giving you the same boost that we were getting twenty to ten years ago.

Speaker 1

但我觉得与此同时,我们正看到越来越多专门的计算设备,比如机器学习加速器、TPU,以及最近更专注于机器学习的GPU,这些设备让我们能够为现代计算任务实现极高的性能和效率,而这些任务与过去试图运行微软办公软件那样的复杂C++代码完全不同。

But I think at the same time, we're seeing much more specialized computational devices like machine learning accelerators, TPUs, and very ML-focused GPUs more recently, making it so that we can actually get really high performance and good efficiency out of the more modern kinds of computations we want to run, which are different than, you know, a twisty pile of C++ code trying to run Microsoft Office or something.

Speaker 1

是的,

Yeah,

Speaker 2

对,

yeah.

Speaker 2

我的感觉是,算法正在跟随硬件的发展。

I mean, it feels like the algorithms are following the hardware.

Speaker 2

基本上,现在的情况是,算术运算变得非常非常便宜,而数据的移动相比之下要昂贵得多。

Basically, like what's happened is that at this point, arithmetic is very, very cheap and moving data around is comparatively, like, much more expensive.

Speaker 1

没错。

Right.

Speaker 2

因此,深度学习的蓬勃发展基本上正是源于这一点,因为你可以用矩阵乘法来构建它,而矩阵乘法本质上是立方级的运算,伴随着平方级的数据通信开销。

So pretty much all of deep learning has taken off roughly because of that, because you can build it out of matrix multiplications that are, you know, n cubed operations over n squared bytes of data communication, basically.
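As a rough back-of-the-envelope illustration of Noam's point (not from the conversation itself; the numbers and function names here are purely illustrative), the arithmetic-to-data-movement ratio of a square matrix multiply grows linearly with matrix size, which is why matmul-heavy workloads suit hardware where arithmetic is cheap and communication is expensive:

```python
# For an n x n matrix multiply: arithmetic scales as n^3, data moved as n^2,
# so larger matrices amortize expensive data movement over cheap math.

def matmul_costs(n: int, bytes_per_elem: int = 2):
    flops = 2 * n ** 3                         # n^3 multiply-add pairs
    bytes_moved = 3 * n ** 2 * bytes_per_elem  # read A and B, write C
    return flops, bytes_moved, flops / bytes_moved

for n in (128, 1024, 8192):
    flops, data, intensity = matmul_costs(n)
    print(f"n={n:5d}  flops={flops:.1e}  bytes={data:.1e}  flops/byte={intensity:.0f}")
```

For 2-byte elements the intensity works out to n/3 floating point operations per byte moved, so an 8192-wide matmul does thousands of operations for every byte of communication.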

Speaker 1

我想说,转向围绕这一需求设计的硬件是一个重要的转折点,因为在此之前,CPU和GPU并不特别适合深度学习。

Well, I would say that the pivot to hardware oriented around that was an important transition, because before that we had CPUs and GPUs that were not, you know, especially well suited for deep learning.

Speaker 1

然后,我们开始在谷歌构建TPU这样的设备,它们本质上是低精度的线性代数机器。

And then, you know, we started to build, say, TPUs at Google that were really just reduced precision linear algebra machines.

Speaker 1

一旦有了这个,你就想要——对。

And then once you have that, then you want to Right.

Speaker 2

你必须意识到,这其实关键在于识别机会成本。

You have to see the insight. It seems like it's all about kind of identifying opportunity cost.

Speaker 2

比如,好吧。

Like, okay.

Speaker 2

这就像拉里·佩奇过去常说的:我们的第二大成本是税收,而最大的成本是机会成本。

This is something Larry Page, I think, used to always say: our second biggest cost is taxes and our biggest cost is opportunity cost.

Speaker 2

如果我没记错他的话,那我这些年可能一直引用错了。

And if he didn't say that, then I've been misquoting him for years.

Speaker 2

但本质上就是,你知道,你错失了什么样的机会?

But basically it's like, you know, what is the opportunity that you have that you're missing out on?

Speaker 2

比如在这个案例中,我想是这样的:你有这么多芯片面积,却只放了很少的算术单元,不如把芯片填满算术单元,这样就能让计算量提升好几个数量级。

Like in this case, I guess it was that, okay, you've got all of this chip area and you're putting a very small number of arithmetic units on it, like, fill the thing up with arithmetic units, you could have orders of magnitude more arithmetic getting done.

Speaker 2

那么,还需要改变什么?

Now what else has to change?

Speaker 2

好的,算法、数据流以及其他所有方面都需要改变。

Okay, the algorithms and the data flow and everything else.

Speaker 1

是的。

Yeah.

Speaker 1

不过顺便说一下,运算可以使用非常低的精度,这样就能塞进更多的乘法单元。

No, by the way, the arithmetic can be, like, really low precision, so then you can squeeze even more multiplier units in.

Speaker 0

诺亚,我想跟进一下你说的,算法一直在跟随硬件发展。

Noam, I wanna follow-up on what you said that the algorithms have been following the hardware.

Speaker 0

如果你想象一个反事实的世界,假设内存成本比运算成本下降得更多,或者还记得你看到的这种动态变化,是的。

If you imagine a counterfactual world where the cost of memory had declined more than the cost of arithmetic, like the reverse of the dynamic you saw Yeah.

Speaker 2

那个,好吧。

That that okay.

Speaker 2

数据流非常便宜,而运算则不然。

Data flow is is extremely cheap and arithmetic Yeah.

Speaker 2

并不便宜。

Is not cheap.

Speaker 0

那么今天的AI会是什么样子?

What what would what would AI look like today?

Speaker 1

你会有

That's You'd

Speaker 2

拥有

have a

Speaker 1

更多的对超大内存的查找操作。

lot more lookups into very large memories.

Speaker 1

是的。

Yes.

Speaker 1

我认为

I think

Speaker 2

对。

Yeah.

Speaker 2

我的意思是,我觉得它可能看起来更像二十年前的AI,但方向相反。

I mean, I think it might look more like AI looked like twenty years ago or but in the opposite direction.

Speaker 2

我,嗯,我不确定。

I I, like, I'm not sure.

Speaker 2

我想我是在2012年加入Google Brain的。

I I guess I joined, Google Brain in 2012.

Speaker 2

你知道,我曾经离开谷歌几年,后来因为去探望我妻子,顺便回去吃午饭,结果正好坐在杰夫和早期的Google Brain团队旁边。

Happ you know, I'd left Google for a few years, happened to, like, go back for lunch to visit my wife, and, and, we happened to sit down next to Jeff and the early Google Brain team.

Speaker 2

我当时想,哇。

And I thought, wow.

Speaker 2

那是一群聪明人在做一件事。

That's a smart group of people doing something.

Speaker 1

你应该考虑一下这些神经网络。

You should you should think about it, these neural nets.

Speaker 1

我们在这里已经取得了相当不错的进展。

We're making some pretty good progress here.

Speaker 2

听起来很有趣。

That sounds fun.

Speaker 2

所以好吧。

So okay.

Speaker 2

于是我重新加入了。

So I jumped back in.

Speaker 2

I

Speaker 1

回来了。

rode back.

Speaker 1

那很棒。

It was great.

Speaker 2

去加入杰夫。

To join Jeff.

Speaker 2

那大概是2012年。

That was, like, 2012.

Speaker 2

我好像每12年就回一次谷歌。

I I seem to join Google every 12 years.

Speaker 2

我于2012年和2024年再次加入了谷歌。

I rejoined Google in 2012 and in 2024.

Speaker 2

但是

But

Speaker 0

2036年会发生什么?

What's gonna happen in 2036?

Speaker 2

我不知道。

I don't know.

Speaker 2

我想我们拭目以待吧。

I guess we shall see.

Speaker 0

那么,为了未来,你正在考虑做出哪些权衡?

Well, what are the trade offs that you're considering changing for future?

Speaker 0

要整合的TPU版本?

Versions of TPU to integrate?

Speaker 0

你们在思维方式上对算法有什么不同吗?

How how are you thinking about algorithms differently?

Speaker 1

我的意思是,我认为一个总体趋势是我们越来越擅长量化,或者使用精度更低的模型。

I mean, I think one thing, one general trend is we're getting better at quantizing or having much more reduced precision models.

Speaker 1

我们从TPU V1开始。

We started with TPU V1.

Speaker 1

我们当时甚至不确定是否能用8位整数对模型进行量化以用于服务,但我们有一些早期证据表明这可能是可行的。

We weren't even quite sure we could quantize a model for serving with eight bit integers, but we sort of had some early evidence that seemed like it might be possible.

Speaker 1

所以我们想,太好了,那就围绕这个来设计整个芯片。

So we're like, great, let's build the whole chip around that.

Speaker 1

随着时间推移,我认为你们也看到人们开始在训练中使用更低的精度,同时推理的精度也在持续降低。

And then over time, I think you've seen people able to use much lower precision for training as well, but also the inference precision has gone down.

Speaker 1

现在人们已经开始使用int4或FP4,如果你二十年前对一个超级计算领域的浮点运算专家说我们要用FP4,他们肯定会说:什么?

People are now using int4 or FP4, which, if you'd said to a supercomputing floating point person twenty years ago, we're gonna use FP4, they'd be like, what?

Speaker 1

这太疯狂了。

That's crazy.

Speaker 1

我们喜欢在浮点数中使用64位。

We like 64 bits in our floats.

Speaker 1

或者更低,有些人甚至把模型量化到两位或一位。

Or, you know, even below that, you know, some people are quantizing models to two bits or one bit.
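As a hedged sketch of the kind of reduced-precision serving Jeff describes for the first TPU, here is a minimal symmetric int8 quantization scheme; the function names are illustrative, not Google's actual implementation:

```python
# Symmetric int8 quantization: map floats into [-127, 127] with one scale
# factor, so matmuls can run on cheap 8-bit integer units.

def quantize_int8(weights):
    """Pick a scale so the largest weight maps to +/-127, then round."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

w = [0.02, -1.3, 0.75, 0.001]
q, s = quantize_int8(w)
restored = dequantize(q, s)
# Each value is recovered to within half a quantization step.
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, restored))
print(q)  # -> [2, -127, 73, 0]
```

The tradeoff discussed in the conversation is exactly this rounding error versus the throughput gained by packing far more 8-bit (or 4-bit, or 1-bit) multipliers into the same chip area.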

Speaker 1

我认为这是一个值得关注的趋势,一位

And I think that's a trend to definitely pay attention to. One bit

Speaker 0

就是零或者

is like a zero or

Speaker 2

一?

one?

Speaker 2

是的,

Yeah,

Speaker 1

就是零或一。

just a zero or one.

Speaker 1

然后你会为一组比特设置一个符号位之类的。

And then you have like a signed bit for a group of bits or something.

Speaker 1

它确实需要

It really has

Speaker 2

成为一个协同设计的问题,因为如果算法设计师没有意识到使用更低精度可以大幅提升性能和吞吐量,他当然会说:我当然不想要低精度。

to be a co design thing, because if the algorithm designer doesn't realize that he can get greatly improved performance and throughput with the lower precision, then of course the algorithm designer is going to say, of course I don't want low precision.

Speaker 2

这会引入风险,进而造成困扰。

That introduces risk, and that's irritating.

Speaker 2

如果你去问芯片设计师:好吧,你想设计什么?

And then if you ask the chip designer, Okay, what do you want to build?

Speaker 2

他们就会去问现在写算法的人,而那些人会说:不,我不喜欢量化。

And then they'll ask the person who's writing the algorithms today who's going to say, no, I don't like quantization.

Speaker 2

这很让人烦心。

It's irritating.

Speaker 2

所以你实际上需要全面了解整个情况,然后意识到:等等。

So you actually need to basically see the whole picture and figure out, oh, wait a minute.

Speaker 2

我们可以通过量化大幅提高性能与成本的比率。

We can, you know, we can increase our throughput to cost ratio by a lot, by, you know, by quantizing.

Speaker 1

然后你就说,是的。

Then you're like, yes.

Speaker 1

量化确实让人烦,但你的模型会快三倍,所以你得接受。

Quantization is irritating, but your model is gonna be three times faster, so you're gonna have to deal.

Speaker 0

在你们的职业生涯中,你们曾多次从事过一些与我们现在用于生成式AI的技术惊人相似的工作。

Through your careers at various times you've had sort of an uncanny, you worked on things that have an uncanny resemblance to what we're actually using now for generative AI.

Speaker 0

1990年,杰夫,你的本科论文是关于反向传播的。

In 1990, Jeff, your senior thesis was about backpropagation.

Speaker 0

2007年,这正是我直到准备这期节目时才意识到的事情。

And in 2007 so this is the thing I didn't realize until I was prepping for this episode.

Speaker 0

2007年,你们训练了一个包含两万亿词元的n元语法模型用于语言建模。

In 2007, you guys trained a 2,000,000,000,000 token Ngram model for language modeling.

Speaker 0

能跟我讲讲你们开发这个模型时的情况吗?当时你们心里有这样的想法吗?

Just walk me through when you were developing that model, like was this kind of thing in your head?

Speaker 0

当时你们觉得自己在做什么?

What did you think you guys were doing at the time?

Speaker 1

是的

Yeah.

Speaker 1

那么,让我从本科论文开始说起。

So, I mean, let me start with the undergrad thesis.

Speaker 1

所以,在我大四时上的一门并行计算课程中,我第一次接触到了神经网络。

So I kind of got introduced to neural nets in one section of one class on parallel computing that I was taking in my senior year.

Speaker 1

而我需要完成一篇论文才能毕业,也就是荣誉论文。

And I needed to do a thesis to graduate, like an honors thesis.

Speaker 1

于是我去找那位教授,说:‘做点关于神经网络的东西一定会很有趣。’

And so I approached the professor and I said, Oh, it'd be really fun to do something around neural nets.

Speaker 1

于是我们决定,我在1990年实现几种不同的并行化反向传播训练神经网络的方法。

So he and I decided I would sort of implement a couple of different ways of parallelizing backpropagation training for neural nets in 1990.

Speaker 1

我在论文里给它起了个有趣的名字,叫什么‘模式划分’之类的。

And I called it something funny in my thesis, pattern partitioning or something.

Speaker 1

但实际上,我在一台32处理器的超立方体机器上实现了模型并行和数据并行。

But really, I implemented a model parallelism and data parallelism on a 32 processor hypercube machine.

Speaker 1

在第一种方法中,你将所有样本分成不同的批次,每个CPU都拥有模型的副本。

In one, you split all the examples into different batches and every CPU has a copy of the model.

Speaker 1

在另一种方法中,你会将一系列样本流水线式地传递给拥有模型不同部分的处理器。

And in the other one, you kind of pipeline a bunch of examples along to processors that have different parts of the model.

Speaker 1

我对比并分析了这两种方法。

And I compared and contrasted them.
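The two parallelization strategies Jeff compares can be sketched in a few lines (a toy illustration, not the 1990 thesis code; worker counts and layer names are made up):

```python
# Data parallelism vs. model parallelism for training, in miniature.

examples = list(range(8))                       # 8 training examples
layers = ["embed", "hidden1", "hidden2", "output"]
num_workers = 4

# Data parallelism: every worker holds a full copy of the model and
# processes its own slice of the batch of examples.
data_parallel = {w: examples[w::num_workers] for w in range(num_workers)}

# Model parallelism: each worker owns a different part of the model, and
# examples are pipelined through the workers in sequence.
model_parallel = {w: [layers[w]] for w in range(num_workers)}

print(data_parallel)   # worker 0 processes examples [0, 4], etc.
print(model_parallel)  # worker 0 owns the "embed" layer, etc.
```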

Speaker 1

这很有趣。

And it was interesting.

Speaker 1

我对这种抽象概念感到非常兴奋,因为我觉得神经网络是一种正确的抽象方式。

I was really excited about the abstraction because it felt like neural nets were the right abstraction.

Speaker 1

它们能够解决当时其他方法都无法解决的微型示例问题。

They could solve tiny toy problems that no other approach could solve at the time.

Speaker 1

我想,天真的我啊,32个处理器。

And I thought, oh, naive me, 32 processors.

Speaker 1

我们一定能训练出非常出色的神经网络。

We'll be able to train really awesome neural nets.

Speaker 1

但结果发现,直到计算能力提升约一百万倍后,它们才真正开始解决实际问题。

But it turned out we needed about a million times more compute before they really started to work for real problems.

Speaker 1

但从2008年底到2010年左右,得益于摩尔定律,我们终于拥有了足够的计算能力,让神经网络真正应用于实际任务。

But then starting in the late 2008 to 2010 timeframe, we started to have enough compute, thanks to Moore's Law, to actually make neural nets work for real things.

Speaker 1

那正是我重新开始关注神经网络的时候。

That was kind of when I sort of reentered looking at neural nets.

Speaker 1

但在那之前,早在二月

But prior to that in February

Speaker 0

所以,我能问一下吗?

So actually, can I ask

Speaker 1

关于这个?

about this?

Speaker 1

当然。

Sure.

Speaker 0

首先,与其他学术成果不同,这篇论文实际上只有四页,你完全可以一口气读完。

First of all, unlike other artifacts of academia, it's actually really short. It's like four pages and you can just read it.

Speaker 1

是的,它只有四页,然后后面有大约30页的C代码

Yeah, it's four pages and then like 30 pages of C code. But

Speaker 0

这就像一个制作精良的学术成果。

it's like just like a well produced sort of artifact.

Speaker 0

然后,告诉我2007年的那篇论文是怎么写出来的吧。

And then, yeah, tell me about how the 2007 paper came together.

Speaker 1

哦,是这样的,当时谷歌有一个机器翻译研究团队,由弗朗茨·奥赫领导,他大约一年前加入谷歌,还有其他一些人。

Oh yeah, so we had a machine translation research team at Google led by Franz Och, who had joined Google maybe a year before, and a bunch of other people.

Speaker 1

他们每年都会参加一个由DARPA主办的比赛,比赛内容是将几种不同语言翻译成英语,我记得是中文到英语和阿拉伯语到英语。

And every year they competed in a, I guess it's a DARPA contest on translating a couple of different languages to English, I think Chinese to English and Arabic to English, I think.

Speaker 1

谷歌团队提交了一个参赛作品。

And the Google team had submitted an entry.

Speaker 1

比赛规则是,你周一拿到大约500个句子,然后必须在周五提交翻译结果。

And the way this works is you get like, I don't know, 500 sentences on Monday and you have to submit the answer on Friday.

Speaker 1

我看到了比赛结果,我们以相当大的优势赢得了比赛,优势是通过BLEU分数衡量的,这是一种评估翻译质量的指标。

And so I saw the results of this and we'd won the contest by a pretty substantial margin measured in BLEU score, which is like a measure of translation quality.

Speaker 1

于是我联系了这支获胜团队的负责人弗兰茨。

And so I reached out to Franz, the head of this winning team.

Speaker 1

我觉得这太棒了。

I'm like, this is great.

Speaker 1

我们什么时候能上线呢?

When are we going to launch

Speaker 2

上线?

it?

Speaker 1

他却说,哦,我们没法上线这个。

And he's like, oh, well, we can't launch this.

Speaker 1

因为它翻译一句话要花十二个小时,其实并不实用。

It's not really very practical because it takes twelve hours to translate a sentence.

Speaker 1

我说,嗯,这时间似乎有点长。

I'm like, well, that seems like a long time.

Speaker 1

我们该怎么解决这个问题呢?

How could we fix that?

Speaker 1

结果发现,他们根本没为高吞吐量设计它,这很明显。

So it turned out, they'd not really designed it for high throughput, obviously.

Speaker 1

所以,对于它想要翻译的每个词,它都要在他们统计出来的大型语言模型上执行大约十万次磁盘寻道。

And so it was doing like 100,000 disk seeks in a large language model that they'd sort of computed statistics over.

Speaker 1

我不太会说这是真正的训练。

I wouldn't say train really.

Speaker 1

对于它想要翻译的每个词。

And for each word that it wanted to translate.

Speaker 1

所以很明显,执行十万次磁盘寻道

So like obviously doing 100,000 disk seeks

Speaker 2

并不

is not

Speaker 1

高效,但我说,好吧,我们深入研究一下。

super efficient. But I said, Okay, well, let's dive into this.

Speaker 1

于是,我和他们一起花了两到三个月的时间,设计了一种内存中的压缩型n-gram数据表示方式。

And so I spent about two or three months with them designing an in memory compressed representation of engram data.

Speaker 1

我们使用的是n元语法,本质上是统计在大型语料库中每个N词序列出现的频率。

We were using an Ngram is basically statistics for how often every N word sequence occurs in a large corpus.

Speaker 1

所以,在这个案例中,我们有大约两万亿个词,而当时大多数n元语法模型只使用二元或三元语法,但我们决定使用五元语法。

So you basically have, in this case, we had like two trillion words, and most N-gram models of the day were using two-grams or maybe three-grams, but we decided we would use five-grams.

Speaker 1

也就是统计在当天我们能处理的尽可能多的网页内容中,每个五词序列出现的频率。

So how often every five word sequence occurs in basically as much of the web as we could process that in that day.

Speaker 1

然后你有一个数据结构,告诉你比如‘我真的很喜欢这家餐厅’这句话在网页中出现了17次之类的。

And then you have a data structure that says, okay, I really like this restaurant, occurs 17 times in the web or something.

Speaker 1

于是我构建了一个数据结构,可以在200台机器上将所有这些数据存入内存,并提供一个批量API,你可以一次性提交10万个需要查询的条目,我们会并行返回所有结果。

And so I built like a data structure that would let you store all those in memory on 200 machines and then have a batched API where you could say, Here are the 100,000 things I need to look up in this round for this word, and we give you them all back in parallel.
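A minimal sketch of the batched, sharded serving design Jeff describes: counts partitioned across machines, with queries grouped so each shard is consulted once per batch instead of issuing 100,000 individual disk seeks. Everything here (shard count aside, which the source gives as 200 machines) is an illustrative assumption, not the actual system:

```python
# N-gram counts sharded by hash across servers, with a batched lookup API.

NUM_SHARDS = 200  # stand-in for the 200 machines

def shard_of(ngram):
    return hash(ngram) % NUM_SHARDS

shards = [dict() for _ in range(NUM_SHARDS)]  # in-memory table per shard

def insert(ngram, count):
    shards[shard_of(ngram)][ngram] = count

def batch_lookup(ngrams):
    """Group queries by shard, query each shard once, merge the results."""
    by_shard = {}
    for g in ngrams:
        by_shard.setdefault(shard_of(g), []).append(g)
    results = {}
    for s, group in by_shard.items():
        for g in group:  # in the real system: one parallel RPC per shard
            results[g] = shards[s].get(g, 0)
    return results

insert(("i", "really", "like", "this", "restaurant"), 17)
res = batch_lookup([("i", "really", "like", "this", "restaurant"),
                    ("some", "unseen", "five", "word", "phrase")])
print(res)
```

Batching turns per-word latency from many sequential round trips into one parallel fan-out, which is the essence of the night-to-milliseconds speedup described next.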

Speaker 1

这让我们从翻译一句话需要一整夜,缩短到基本上只需一百毫秒左右。

That enabled us to go from taking a night to translate a sentence to basically doing something in a hundred milliseconds or something.

Speaker 0

有一份关于杰夫·迪恩的趣闻清单,就像查克·诺里斯的段子一样。

There's this list of Jeff Dean facts, like Chuck Norris facts.

Speaker 0

比如,据说对于杰夫·迪恩来说,NP等于‘没问题’。

Like, for example, that for Jeff Dean, NP equals no problemo.

Speaker 0

其中一个挺有趣的,因为现在听你这么说,我才发现这其实还挺对的。

One of them it's funny because now that I hear you say it's like, actually it's kind of true.

Speaker 0

其中一个说,光速原本是每小时35英里,直到杰夫·迪恩在周末优化了它。

One of them is the speed of light was 35 miles an hour until Jeff Dean decided to optimize it over a weekend.

Speaker 0

从十二小时缩短到一百毫秒左右之类的。

Just going from twelve hours to a hundred milliseconds or whatever.

Speaker 0

我是说,我得把数量级算清楚,不过。

It's like, I gotta do the orders of magnitude there, but

Speaker 1

这些说法都让人非常受宠若惊。

All of these all of these are very flattering.

Speaker 1

它们还挺搞笑的。

They're they're pretty funny.

Speaker 1

就像同事们写的愚人节玩笑。

They're like an April Fools' joke written by my colleagues.

Speaker 1

好吧。

Okay.

Speaker 1

所以,

So,

Speaker 0

显然,事后看来,通过仅考虑词语之间的关系来构建整个互联网的潜在表示这一想法,确实是大型语言模型的核心理念。

obviously, in retrospect, this idea that you can develop a latent representation of the entire internet through just considering relationships between words is like, yeah, this this is large language models.

Speaker 0

这就是Gemini。

This is Gemini.

Speaker 0

当时,这仅仅是一个翻译方面的想法,还是你意识到它预示着一种全新的范式?

At the time, was it just a translation idea or did you see that as being the beginning of a different kind of paradigm?

Speaker 1

我认为,一旦我们为翻译构建了这个系统,大型语言模型的推理就开始被用于其他用途,比如当你开始打字时,它会建议哪些补全内容是合理的。

I think once we built that for translation, the serving of large language models started to be used for other things like, completion of, you know, you start to type and it suggests like what completions make sense.

Speaker 0

对。

Right.

Speaker 1

这无疑是谷歌内语言模型广泛应用的开端。

So it was definitely the start of a lot of uses of language models in Google.

Speaker 1

而且,你知道,诺姆还在谷歌参与了许多其他项目,比如使用语言模型的拼写纠正系统。

And, you know, Noam has worked on a number of other things at Google, like spelling correction systems that use language models. Right.

Speaker 2

是的

Yeah.

Speaker 2

想想也是

Think yeah.

Speaker 2

但那大概是2001年。

But that was, like, 2001.

Speaker 2

然后那时候,我认为所有东西都只存放在一台机器的内存里。

And then and there, I think it was just all in memory on one on one machine.

Speaker 1

是的

Yeah.

Speaker 1

我认为就只有一台机器。

I think it was one machine.

Speaker 2

是的

Yeah.

Speaker 2

那是

It was

Speaker 1

但他2001年开发的拼写纠正系统太厉害了。

But his spelling correction system he built in 2001 was amazing.

Speaker 1

他发了一个演示链接给全公司,我试了各种乱七八糟的关键词拼写。

Like, he sent he sent out this demo link to the whole company, and, like, I just tried every butchered spelling of every keyword query I could get.

Speaker 1

我甚至试了‘uggs Bundit’这种乱码。

I, like, scrambled uggs Bundit.

Speaker 2

哦,我记得这个。

Oh, I remember

Speaker 1

就是那个。

that one.

Speaker 1

是的。

Yeah.

Speaker 1

是的。

Yeah.

Speaker 1

本来我想打的是‘scrambled eggs Benedict’,结果它每次都准确无误地纠正了。

Instead of scrambled eggs Benedict and like, it just nailed it every time.

Speaker 2

是的。

Yeah.

Speaker 2

我想那就是语言建模。

And I guess that was language modeling.

Speaker 2

是的。

Yeah.

Speaker 0

但在你们开发这些系统的时候,你们有没有意识到:你们把这些东西变得越来越复杂。

But at the time when you were developing these systems, did you have this sense of, look, you make these things more and more sophisticated.

Speaker 0

你们不再只考虑五个词,但如果你考虑一百个词、一千个词,那么这种潜在表示就相当于智能。

You don't consider five words, but if you consider 100 words, a thousand words, then the latent representation is intelligence.

Speaker 0

还是说,这种洞察是什么时候突然出现的?

Or was it like, when did that insight hit?

Speaker 2

并没有。

Not really.

Speaker 2

我的意思是,我不觉得我曾经有过这样的感觉:好吧。

I mean, like, not like, I don't think I ever felt like, okay.

Speaker 2

n元模型将会,你知道的,将会

Ngram models are going to, you know, are going to

Speaker 1

席卷世界。

Sweep the world.

Speaker 2

是的。

Yeah.

Speaker 2

成为人工智能。

Be be, artificial intelligence.

Speaker 2

我认为在当时,我和很多人一样,对贝叶斯网络感到兴奋。

I think I think at the time, I was a lot of people were excited about the Bayesian networks.

Speaker 2

那看起来非常令人兴奋。

That was that seemed exciting.

Speaker 2

当然,看到那些早期的神经语言模型时,你知道,那里面既有魔力——这确实做了一些极其酷的事情,同时也让我觉得这是世界上最好的问题。

Definitely seeing, like, those early neural language models, you know, there's both the magic in that, okay, this is doing something extremely cool, and also it just struck me as like the best problem in the world.

Speaker 2

比如,首先,这个问题非常简单明了:给我一个关于下一个词的概率分布。

Like, in that, like, for one, it is very, very simple to state, like, give me a probability distribution over the next word.

Speaker 2

此外,外面还有大约无限的训练数据。

Also, there's roughly infinite training data out there.

Speaker 2

比如网页上的文本。

There's like the text of the web.

Speaker 2

你有成万亿的训练样本,也就是所谓的无监督数据。

You have, like, trillions of training examples, like, you know, of unsupervised data.

Speaker 1

还有自监督。

And then self supervised.

Speaker 2

自监督。

Self supervised.

Speaker 2

是的,这很不错。

Yeah, it's nice

Speaker 1

因为这样你就有了正确答案,然后可以基于除了当前词之外的所有内容来训练,以预测当前词。

because you then have the right answer and then you can train on like all but the current word and try to predict the current word.

Speaker 1

这是一种了不起的能力,能够仅通过观察世界来学习。

And it's this kind of amazing, you know, ability to just learn from observations of the world.

Speaker 2

是的

Yep.

Speaker 2

然后这就属于人工智能完备的问题了。

And then it's AI complete.

Speaker 2

如果你能很好地解决这个问题,那么你基本上就能做任何事情了。

If you can do a great job of that, then you can pretty much do anything.
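The problem statement Noam gives ("give me a probability distribution over the next word", trained self-supervised on raw text where every word is its own label) can be sketched with a tiny bigram model; the corpus and function names are illustrative only:

```python
# A minimal bigram language model: estimate P(next word | previous word)
# directly from raw text, with no human labels needed.
from collections import Counter, defaultdict

def train_bigram(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):  # each word is the "label"
        counts[prev][nxt] += 1
    return counts

def next_word_distribution(counts, word):
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}

tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram(tokens)
dist = next_word_distribution(model, "the")
print(dist)  # after "the": cat with probability 2/3, mat with 1/3
```

Modern LLMs replace the count table with a neural network conditioned on thousands of previous tokens, but the training objective is this same next-word prediction.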

Speaker 0

我很高兴向大家介绍我们的新赞助商——Meter。

I'm excited to introduce our new sponsor, Meter.

Speaker 0

他们是一家网络公司,支撑着全球互联网基础设施中日益增长的一部分。

They're a networking company that is behind a growing fraction of the world's internet infrastructure.

Speaker 0

一个有趣的事实是,大约三四年前,在这个播客的早期阶段,我正是靠Meter的首席执行官阿尼尔的捐赠来运营这个播客的,直到今天我依然从他的建议中获益良多。

Fun fact, about three to four years ago in the very early days of the podcast, I ran this podcast off a donation from Meter's CEO, Anil, and I continue to benefit enormously from his advice to this day.

Speaker 0

现代世界依赖于网络运行。

The modern world runs on networks.

Speaker 0

从自动驾驶汽车到大型语言模型的训练,甚至像这样向全球广播播客,这些领域的进展都受限于大型复杂网络的设计与调试。

Progress in fields as diverse as self driving cars to giant LLM training runs to even broadcasting a podcast like this around the world is bottlenecked on designing and debugging large complex networks.

Speaker 0

Meter 希望通过训练一个端到端的基础模型,利用时间序列数据包、支持工单、网络教科书以及他们自建整个网络堆栈所积累的所有其他专有数据,为网络工程师提供 100 倍的效能提升。

Meter wants to give network engineers 100x multiplier by training a large end to end foundation model using time series packet data and support tickets and networking textbooks and all the other proprietary data they have as a result of themselves building every layer of the networking stack in house.

Speaker 0

Meter 刚刚宣布与微软达成一项长期计算合作伙伴关系,以获取数以万计的 GPU 资源。

Meter just announced a long term compute partnership with Microsoft for access to tens of thousands of GPUs.

Speaker 0

他们目前正在招募一支世界级的 AI 研究团队。

They're currently recruiting a world class AI research team.

Speaker 0

他们的目标是构建自主网络,彻底改善我们习以为常的数字世界。

Their goal is to build autonomous networks that radically improve the digital world that we take for granted.

Speaker 0

要了解更多,请访问 meter.com/barkesh。

To learn more, go to meter.com/barkesh.

Speaker 0

好了,我们回到杰夫和诺姆。

All right, back to Jeff and Noam.

Speaker 0

科学史上有一个有趣的讨论,关于思想究竟是时代使然、具有某种必然性,还是偶然从某个旁支方向中被发掘出来的。

There's this interesting discussion in the history of science about whether ideas are just in the air and there's a sort of inevitability to big ideas or whether it's sort of plucked out of some tangential direction.

Speaker 0

在这种情况下,你如此有逻辑地阐述这一观点,是否意味着这种发展本质上是不可避免的?

In this case, this way in which you're laying it out very logically, does that imply like, basically, how inevitable does this

Speaker 2

确实感觉这些想法已经弥漫在空气中了。

It does feel like it's in the air.

Speaker 2

当时确实出现过一些东西,比如神经图灵机。

There were definitely some. There was, like, this neural Turing machine.

Speaker 2

所以,围绕注意力机制确实有不少想法,比如将键值存储用于神经网络以实现聚焦,没错,我认为在某种意义上这些想法已在空气中,但同时也需要某个团队去推动实现。

So yeah, a bunch of ideas around this attention, like having these key value stores that could be useful in neural networks to kind of focus on things. So yeah, I think in some sense it's in the air, and in some sense you need some group to go do it.

Speaker 1

我倾向于认为,很多想法是部分存在于空气中的,当你试图解决一个新问题时,可能会看到几个彼此独立的研究思路。

I mean, I like to think of a lot of ideas as they're kind of partially in the air where there's like a few different, maybe separate research ideas that one is kind of squinting at when you're trying to solve a new problem.

Speaker 1

你会从中获得一些启发。

And you kind of draw on those for some inspiration.

Speaker 1

然后还有一些尚未解决的方面,你需要想办法去突破。

And then there's like some aspect that is not solved and you sort of need to figure out how to solve that.

Speaker 1

最终,一些已有元素的融合加上一些新元素,会催生出之前不存在的新突破或新研究成果。

And then the combination of like some morphing of the things that already exist and some new things lead to some new breakthrough or new research result that didn't exist before.

Speaker 0

有没有一些关键的时刻让你印象深刻,当你观察某个研究领域时,突然冒出一个想法,然后你感到天啊,居然真的成功了?

Were there, are there key moments that stand out to you where you looking at a research area and you come up with this idea and you have this feeling of like, holy shit, I can't believe that worked.

Speaker 1

我记得一件事,在脑团队的早期,我们的目标是看看能否构建一些基础设施,用来训练非常非常大的神经网络。

One thing I remember was, you know, in the early days of the Brain team, we were focused on: let's see if we can build some infrastructure that lets us train really, really big neural nets.

Speaker 1

那时候,我们的数据中心还没有GPU。

And at that time, we didn't have GPUs in our data centers.

Speaker 1

我们只有CPU,但我们知道如何让大量CPU协同工作。

We just had CPUs, but we knew how to make lots of CPUs work together.

Speaker 1

所以我们构建了一个系统,通过模型并行和数据并行,成功训练了相当大的神经网络。

So we built a system that enabled us to train pretty large neural nets through both model and data parallelism.
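
As a rough illustration of the data-parallel half of that setup (model parallelism splits the network itself and is harder to sketch), here is a toy version with a stand-in scalar "gradient"; nothing below is the actual training-system code.

```python
def shard(batch, n_workers):
    """Split a batch into equal per-worker shards (size assumed divisible)."""
    k = len(batch) // n_workers
    return [batch[i * k:(i + 1) * k] for i in range(n_workers)]

def data_parallel_step(batch, grad_fn, n_workers):
    """Each worker computes a gradient on its own shard; the results are
    averaged, as a parameter server or all-reduce would do across machines."""
    grads = [grad_fn(s) for s in shard(batch, n_workers)]
    return sum(grads) / n_workers

# With a stand-in "gradient" (the shard sum), 4 workers on 8 examples:
g = data_parallel_step(list(range(8)), grad_fn=sum, n_workers=4)
```

The averaged result is the same as one big worker would compute, which is what lets thousands of CPUs stand in for hardware that did not exist in the data centers yet.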

Speaker 1

我们还建立了一个系统,用于在实际上一千万个随机选取的YouTube视频帧上进行无监督学习。

So we had a system for unsupervised learning on actually 10,000,000 randomly selected YouTube frames.

Speaker 1

这是一种空间上局部的表示方式。

And it was kind of a spatially local representation.

Speaker 1

它会基于尝试从高层表示中重建原始内容,来构建无监督的表示。

So it would build up unsupervised representations based on trying to reconstruct the thing from the high level representations.

Speaker 1

所以我们成功了,使用2000台计算机、16000个核心进行训练。

So we got that working and training on 2,000 computers using 16,000 cores.

Speaker 1

过了一段时间,这个模型最终在最高层构建出一种表征:某个神经元会对猫的图像产生反应,而它从未被明确告知什么是猫,但它在训练数据中见过足够多正面视角的猫,因此这个神经元就会对猫的图像激活,而对其他东西则不会。

After a little while, that model was actually able to build a representation at the highest level where one neuron would get excited by images of cats. It had never been told what a cat was, but it sort of had seen enough examples of them in the training data, of head-on facial views of cats, that that neuron would turn on for that and not for much else.

Speaker 1

同样地,你还会看到其他神经元对人脸、行人背面等类似事物产生反应。

And similarly, you'd have other ones for human faces and backs of pedestrians and this kind of thing.

Speaker 1

这相当酷,因为它基于无监督学习的原理,构建出了如此高级的表征。

And so that was kind of cool because it's sort of from unsupervised learning principles, building up these really high level of representations.

Speaker 1

随后,我们在有两万个类别的监督式ImageNet挑战赛中取得了非常出色的结果,相对性能提升了60%,这在当时已经相当了不起了。

And then we were able to get very good results on the supervised ImageNet twenty thousand category challenge that advanced the state of the art by 60% relative improvement, which was quite good at the time.

Speaker 1

这个神经网络的规模大概是之前训练过的网络的50倍,并且取得了很好的效果。

And that neural net was probably 50x bigger than one that had been trained previously, and it got good results.

Speaker 1

这让我意识到,扩大神经网络规模似乎是个好主意,而且确实有效,所以我们应该继续推进这一点。

So that said to me: hey, scaling up neural nets seems like a good idea. I thought it would be, and it seems to be, so we should keep pushing on that.

Speaker 0

这些例子说明了这些AI系统如何契合你刚才提到的——谷歌本质上是一家组织信息的公司。

So these examples illustrate how these AI systems fit into what you were just mentioning that Google is sort of a company that organizes information fundamentally.

Speaker 0

在这种情况下,AI本质上是在信息和概念之间寻找关联,以更快地向你提供想法和你所需的信息。

And basically, what AI is doing in this context is finding relationships between information, between concepts, to help get ideas to you faster, and get the information you want to you faster.

Speaker 0

现在我们正在使用当前的AI模型。

Now we're moving with current AI models.

Speaker 0

显然,你可以在谷歌搜索中使用BERT,并向这些系统提问,它们在信息检索方面依然表现得很好。

Like obviously you can use BERT in Google search and you can ask these things questions and they obviously are still good at information retrieval.

Speaker 0

但更根本的是,它们能为你编写整个代码库,这看起来更像一个真正的工作者。

But more fundamentally, they can write your entire code base for you. That seems more like an actual worker.

Speaker 0

这已经超越了单纯的信息检索。

Which is going beyond the just like information retrieval.

Speaker 0

那么,当你在构建通用人工智能时,你怎么看待谷歌是否仍是一家信息检索公司?

So how are you thinking about like, is Google still an information retrieval company if you're like building an AGI?

Speaker 0

通用人工智能当然能进行信息检索,但它还能做很多其他事情,对吧?

Like AGI can do information retrieval, but it can do many other things as well, right?

Speaker 1

我认为我们是一家组织全球信息的公司,这比信息检索的范围更广,对吧?

I think we're an "organize the world's information" company, and that's broader than information retrieval, right?

Speaker 1

这可能是根据你提供的指导来组织并创建新信息。

That's maybe organizing and creating new information from some guidance you give it.

Speaker 1

你能帮我给我的兽医写封关于我狗狗的信吗?

Can you help me write a letter to my veterinarian about my dog?

Speaker 1

它会根据这些症状帮我起草这封信。

It's got these symptoms and it'll draft that.

Speaker 1

或者,你能输入这段视频,并每隔几分钟生成一段视频内容的摘要吗?

Or can you feed in this video and can you produce a summary of what's happening in the video every few minutes?

Speaker 1

我认为我们的多模态能力表明,这不仅仅是文本而已。

And I think our multimodal capabilities are showing that it's more than just text.

Speaker 1

它关乎以信息存在的各种不同模态来理解世界,包括人类常用的模态,也包括非人类导向的模态,比如自动驾驶汽车上的奇怪激光雷达传感器、基因组信息或健康数据。

It's about understanding the world in all the different kind of modalities that information exists in, both kind of human ones, but also kind of non human oriented ones like weird lidar sensors on autonomous vehicles or genomic information or health information.

Speaker 1

那么,你如何从中提取并转化出对人们有用的洞察,并利用这些洞察帮助他们完成各种想做的事情?

Then how do you extract and transform that into useful insights for people and make use of that in helping them do all kinds of things they want to do.

Speaker 1

有时候,我只是想通过和聊天机器人聊天来获得娱乐。

Sometimes it's, I want to be entertained by chatting with a chatbot.

Speaker 1

有时候,我想得到一个非常复杂问题的答案。

Sometimes it's, I want answers to this really complicated question.

Speaker 1

没有单一的信息来源可以查询。

There is no single source to retrieve from.

Speaker 1

你需要从一百个网页中提取信息,弄清楚发生了什么,并将这些数据整理成有组织、综合的版本,然后再处理多模态问题或与编码相关的问题。

It's: you need to pull information from 100 web pages and figure out what's going on and make an organised, synthesised version of that data, and then deal with multimodal things or coding-related problems.

Speaker 1

我认为这些模型的能力非常令人兴奋,而且它们正在迅速进步。

I think it's super exciting what these models are capable of and they're improving fast.

Speaker 1

所以我非常期待我们未来的方向。

So I'm excited to see where we go.

Speaker 2

我不明白你在说什么。

I don't know what you're saying.

Speaker 2

我也很期待看到我们能走到哪里。

I am also excited to see where we go.

Speaker 2

而且,你知道,没错,我认为整理信息显然是一个价值万亿美元的机会,但如今,万亿美元已经不算什么了。

And, you know, yeah, I think definitely organizing information is clearly like a trillion dollar opportunity, but a trillion dollars is not cool anymore.

Speaker 2

真正酷的是千万亿美元。

What's cool is a quadrillion dollars.

Speaker 2

我的意思是,当然,目标不是简单地堆积一堆巨额财富,而是为世界创造价值,你知道,当这些系统能够真正为你做事、编写代码或解决你原本无法解决的问题,并且能大规模实现时,可以创造更多价值。

I mean, and obviously the idea is not to just pile up some giant pile of money, but to create value in the world, you know. And so much more value can be created when these systems can actually, like, go and do something for you or write your code or figure out problems that you wouldn't have been able to figure out yourself, and do that at scale.

Speaker 2

所以我的意思是,随着我们提升这些模型的能力,我们必须变得非常非常灵活和动态。

So I mean, we're going to have to be very, very flexible and dynamic as we improve the capabilities of these models.

Speaker 1

我想我对许多基础研究问题感到非常兴奋,因为当你看到我们正在做的事情时,如果采用这种方法或朝这个大致方向努力,可能会有显著的改进。

I guess I'm pretty excited about kind of a lot of fundamental research questions that sort of come about because you see something that we're doing could be substantially improved if we tried this approach or things in this rough direction.

Speaker 1

也许这能成功,也许不会。

And maybe that'll work, maybe it won't.

Speaker 1

但我认为,观察我们能为最终用户实现什么,然后反过来思考如何构建能够实现这些功能的系统,这同样有价值。

But I also think there's value in seeing what we could achieve for end users and then how can we work backwards from that to actually build systems that are able to do that?

Speaker 1

举个例子,信息组织意味着世界上任何信息都应能被任何人使用,无论我说什么语言。

So as one example, organizing information, that should mean any information of the world should be usable by anyone regardless of what language I speak.

Speaker 1

我认为我们已经做了一些工作,但离完整愿景还差得很远——无论你使用哪种语言,成千上万种语言中的任何内容,我们都应该能让你获取并使用。

And that I think we've done some amount of, but it's not nearly the full vision of, no matter what language you speak out of thousands of languages, we can make any piece of content available to you and make it usable by you.

Speaker 1

任何视频都可以用任何语言观看。

And any video could be watched in any language.

Speaker 1

我觉得这会非常酷。

I think that would be pretty awesome.

Speaker 1

但我们还做不到,不过我确实看到未来有可能实现这一点。

And we're not quite there yet, but that's definitely things I see on the horizon that should be possible.

Speaker 0

说到你可能尝试的不同架构,你现在正在研究的一个方向是更长的上下文。

Speaking of different architectures you might try, know one thing you're working on right now is longer context.

Speaker 0

如果你把谷歌搜索看作是拥有整个互联网索引的上下文,但它更像是一个非常浅层的搜索。

If you think of Google search as like, it's got the entire index of the internet in its context, but it's like sort of very like shallow search.

Speaker 0

而显然,语言模型目前的上下文长度是有限的,但它们能真正地思考——这就像黑魔法,也就是上下文学习,对吧?

And then obviously language models have, like, limited context right now, but they can, like, really think. It's like dark magic, in-context learning, right?

Speaker 0

它们能真正地理解所看到的内容。

Just like can really think about what it's seeing.

Speaker 0

你认为将谷歌搜索和上下文学习结合起来会是什么样子?

How do you think about what it would be like to merge something like Google search and something like in-context learning?

Speaker 1

是的,也许我来先试着说一下。

Yeah, maybe I'll take a first stab at it.

Speaker 1

我的意思是,我确实思考过这个问题一段时间。

I mean, I've thought about this for a bit.

Speaker 1

我的意思是,你发现这些模型虽然很强大,但有时会幻觉,存在事实性问题。

I mean, I think one of the things you see with these models is they're quite good, but they do hallucinate and have factuality issues sometimes.

Speaker 1

这部分原因在于,你训练时使用了数万亿个词元,并将它们全部混合在数千亿甚至上万亿的参数中。

And part of that is, you've trained on, say, tens of trillions of tokens and you've stirred all that together in your tens or hundreds of billions of parameters.

Speaker 0

对。

Yeah.

Speaker 1

但这一切都比较模糊,因为你把所有这些词元都搅在一起了,模型对这些数据有相对清晰的理解,但有时还是会搞错某些信息的日期。

But it's all a bit squishy because you've churned all these tokens together. And so the model has a reasonably clear view of that data, but it sometimes gets confused and will give the wrong date for something.

Speaker 1

而上下文窗口中的信息,也就是模型的输入,却非常清晰明确,因为我们有Transformer中出色的注意力机制,模型可以聚焦于特定内容,清楚地知道它正在处理的精确文本、视频帧、音频或其他数据。

Whereas information in the context window, in the input of the model is really sharp and clear because we have this really nice attention mechanism in transformers that the model can pay attention to things and it knows kind of the exact text or the exact frames of the video or audio or whatever that it's processing.

Speaker 1

目前,我们的模型已经能够处理数百万个词元的上下文,这已经相当多了。

And so right now we have models that can deal with kind of millions of tokens of context, which is quite a lot.

Speaker 1

这就像数百页的PDF文件、50篇研究论文、数小时的视频或数十小时的音频,或者这些的组合,相当酷。

It's like hundreds of pages of PDF or 50 research papers or hours of video or tens of hours of audio or some combination of those things, which is pretty cool.

Speaker 1

但如果模型能关注万亿级别的标记,那该多好啊,对吧?

But it would be really nice if the model could attend to trillions of tokens, right?

Speaker 1

它能关注整个互联网并为你找到正确的内容吗?

Could it attend to the entire internet and find the right stuff for you?

Speaker 1

它能为你关注所有个人资料吗?

Could it attend to all your personal information for you?

Speaker 1

比如,我特别希望有一个模型能访问我所有的电子邮件、文档和照片。

Like, I would love a model that has access to all my emails and all my documents and all my photos.

Speaker 1

当我让它做某事时,它可以在获得我授权的前提下,利用这些信息来帮助我解决问题。

And when I ask it to do something, it can sort of make use of that with my permission to sort of help solve what it is I'm wanting it to do.

Speaker 1

但这会是一个巨大的计算挑战,因为朴素的注意力算法是二次方复杂度的,即使在相当不错的硬件上处理数百万标记都勉强可行,但想直接用朴素方法扩展到万亿标记是完全不可能的。

But that's going to be a big computational challenge, because the naive attention algorithm is quadratic. And you can kind of barely make it work on a fair bit of hardware for millions of tokens, but there's no hope of making that just naively go to trillions of tokens.
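
The quadratic blowup Jeff mentions is easy to see on the back of an envelope. The sketch below just counts entries in the full query-key score matrix, ignoring heads, layers, and the equally quadratic compute.

```python
def attention_score_entries(n_tokens):
    """Entries in the naive n x n query-key score matrix."""
    return n_tokens * n_tokens

million_ctx = attention_score_entries(10**6)    # 10**12 entries
trillion_ctx = attention_score_entries(10**12)  # 10**24 entries
growth = trillion_ctx // million_ctx            # 10**12 times more work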

Speaker 1

所以我们需要一系列创新的算法近似方法,来实现模型在概念上能关注更多、更多——乃至万亿级别的标记,并能关注你的个人标记。

So we need a whole bunch of interesting algorithmic approximations to what you would really want: a way for the model to attend conceptually to lots and lots more tokens, trillions of tokens, and attend to your tokens.

Speaker 1

也许我们可以把整个谷歌代码库作为上下文提供给每一位谷歌开发者,把全世界的源代码作为上下文提供给任何开源开发者。

Maybe we can put all of the Google code base in context for every Google developer, all the world's source code in context for any open source developer.

Speaker 1

那将会太棒了。

That would be amazing.

Speaker 2

那简直太惊人了。

It would be incredible.

Speaker 2

是的,我的意思是,没错。

Yeah, mean, right.

Speaker 2

模型参数的美妙之处在于,它们在记忆事实方面相当节省内存。

Yeah, the beautiful thing about model parameters is they are quite memory efficient at, you know, sort of memorizing facts.

Speaker 2

也许,每个模型参数大约能记住一个事实,而如果你在上下文中有一个标记,那么每一层都会有大量的键值对。

Maybe, you know, you can probably memorize on the order of one fact or something per model parameter, whereas, you know, if you have some token in context, there are like lots of keys and values at every layer.

Speaker 2

每个标记可能需要一KB或几MB的内存。

It could be a kilobyte or a megabyte of memory per token.

Speaker 1

是的,把一个词扩展成10KB左右。

Yeah, you take a word and you blow it up to 10 kilobytes or something.
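
That per-token blowup comes from the key-value cache: every layer stores a key and a value vector for each context token. The model shape below is invented purely for illustration.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Key plus value vectors kept at every layer for one context token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value

# A made-up model shape with 2-byte (bf16) entries:
size = kv_bytes_per_token(n_layers=32, n_kv_heads=8, head_dim=128)
# A few characters of text turn into 131072 bytes (128 KiB) of cache here.
```

Compare that with roughly one memorized fact per parameter, and the trade-off between "in the weights" and "in the context" that the two are describing falls out directly.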

Speaker 2

是的,是的。

Yes, yes.

Speaker 2

对,我的意思是,实际上现在有很多创新在围绕着一个问题展开:如何最小化这种开销?

Yeah, so, I mean, there's actually a lot of innovation going on around, okay, A, how do you minimize that?

Speaker 2

还有,你需要在那里面使用哪些词?

And B, okay, what words do you need to have there?

Speaker 2

有没有更好的方式来访问这些信息?

Are there better ways of accessing that information?

Speaker 2

杰夫看起来是解决这个问题的合适人选,比如,我们的内存层次结构是什么样的,从SRAM一直到全球数据中心级别?

You know, Jeff seems like the right person to figure this out, like, okay, what does our memory hierarchy look like, from the SRAM all the way up to data center worldwide level?

Speaker 0

我想更多地谈谈你提到的那件事:谷歌是一家拥有大量代码和大量示例的公司。

I want to talk more about the thing you mentioned about, look, Google is a company with lots of code and lots of examples.

Speaker 0

如果你只考虑这一个使用场景,以及它所意味着的东西。

If you just think about that one use case and what that implies.

Speaker 0

所以你有谷歌的单体仓库。

So you've got the Google mono repo.

Speaker 0

也许你可以解决长上下文的问题,把整个内容纳入上下文,或者对其进行微调。

And maybe you figure out the long context thing, can put the whole thing in context or you fine tune on it.

Speaker 0

是的,本质上就是,为什么这件事还没被做出来呢?

Yeah, basically like why hasn't this been already done?

Speaker 0

因为你可以想象,谷歌拥有大量专有代码,即使你只是内部使用,也能让开发人员更高效、更有生产力。

Because you can imagine the amount of code that Google has proprietary access to. I mean, even if you're just using it internally to make your developers more efficient and productive.

Speaker 1

澄清一下,我们实际上已经对Gemini模型在我们的内部代码库上进行了进一步的训练,供内部开发人员使用。

Oh, to be clear, we have actually already done further training on a Gemini model on our internal code base for our internal developers.

Speaker 1

对。

Yeah.

Speaker 1

但这和同时关注所有代码是不同的。

But that's different than attending to all of it.

Speaker 1

没错。

Right.

Speaker 1

因为它把代码库混在一起,转化成一堆参数。

Because it sort of stirs together the code base into a bunch of parameters.

Speaker 1

我认为将它放在上下文中会让事情更清晰。

And I think having it in context is makes things clearer.

Speaker 1

但即使是在内部进行过进一步训练的模型,也极其有用。

But even the sort of further trained model internally is incredibly useful.

Speaker 1

Sundar 曾表示,如今我们提交到代码库的字符中,有25%是由我们基于AI的编码模型生成的,辅以人工干预。

Sundar, I think, has said that 25% of the characters that we're checking into our code base these days are generated by our AI-based coding models, with kind of human oversight.

Speaker 0

你如何想象在未来一两年内,基于你所看到的前沿能力?

How do you imagine, in a year or two, based on the capabilities you see on the horizon?

Speaker 0

就你个人的工作而言,作为一名谷歌研究员,未来会是什么样子?

Your own personal work, what will it be like to be a researcher at Google?

Speaker 0

你有了一个新想法,或者类似的东西。

You have a new idea or something.

Speaker 0

一年后,当你与这些模型互动时,这种互动会是什么样子?

With the way in which you're interacting with these models in a year, what does that look like?

Speaker 2

我的意思是,我预计这些模型会变得更好,我们也能变得更加高效。

Well, I mean, I assume we will have these models a lot better and hopefully be able to be much, much more productive.

Speaker 2

是的

Yeah.

Speaker 1

我的意思是,除了研究背景之外,每当你看到这些模型被使用时,我认为它们能够提高软件开发者的生产力,因为它们可以接收你对所需功能的高层次描述或句子说明,并自动生成一个相当接近、相当合理的初步版本。

I mean, in addition to the kind of researchy context, any time you're seeing these models used, I think they're able to make software developers more productive, because they can take a high-level spec or a one-sentence description of what you want done and give a pretty approximate, pretty reasonable first cut at that.

Speaker 1

从研究的角度来看,你或许可以说:‘我真的很希望你探索一下这个想法,类似于这篇论文中的内容,但让我们尝试把它改成卷积结构之类的。’

And so from a research perspective, maybe you can say: I'd really like you to explore this kind of idea, similar to the one in this paper, but maybe let's try making it convolutional or something.

Speaker 1

你可以这样做,让系统自动生成大量实验性代码,然后你查看后可能会说:‘嗯,这个看起来不错。’

You could do that and have the system automatically sort of generate a bunch of experimental code and maybe you look at it and you're like, Yeah, that looks good.

Speaker 1

运行它。

Run that.

Speaker 1

这似乎是一个很好的理想方向,而且在未来一两年内,你很可能在这一领域取得重大进展。

That seems like a nice dream direction to go in and seems plausible in the next year or two years that you might make a lot of progress on that.

Speaker 0

这似乎被低估了,因为你实际上可以拥有数百万额外的‘员工’,并立即检查他们的输出,而员工之间也可以相互检查对方的输出。

It seems underhyped, because you could have literally millions of extra employees, and you can immediately check their output, and employees can check each other's output.

Speaker 0

它们会立即流式输出文本。

They like immediately stream tokens.

Speaker 0

是的。

Yeah.

Speaker 1

我不是说不要宣传它。

I didn't mean not to hype it.

Speaker 1

我觉得这非常令人兴奋。

I think it's super exciting.

Speaker 1

我只是不喜欢过度宣传尚未完成的事情。

I just don't like to hype things that aren't done yet.

Speaker 0

是的。

Yeah.

Speaker 0

所以让我们更深入地探讨这个想法,因为如果你有一个类似自主软件工程师的系统,尤其是从研究者的角度来看,他们希望指定并构建系统时,你必须应对这种情况。

So let's play with the idea more, because it seems like you have to deal with this if you have something like an autonomous software engineer, especially from the perspective of a researcher who's like, I want to spec and build the system.

Speaker 0

再说一遍,好的。

Again, okay.

Speaker 0

所以,你提出了这个想法——作为一位在职业生涯中开发过变革性系统的人,你的想法是,不再需要像今天编写MapReduce或TensorFlow那样去编码,而是直接说:‘这是我希望分布式AI库应有的样子,帮我写出来。’

So, as somebody who has worked on developing transformative systems through your careers, you posited this idea: instead of having to code something like today's equivalent of MapReduce or TensorFlow, you just say, here's how I would want a distributed AI library to look; write it up for me.

Speaker 0

你有没有想过,你的生产力可能会提升10倍,甚至100倍?

Do you imagine you could be like 10x more productive, 100x more productive?

Speaker 1

我确实印象深刻。

I was pretty impressed.

Speaker 1

我记得是在Reddit上看到的,说有个新的实验性编程模型,在编程和数学等方面表现得更好。

I think it was on Reddit that I saw like we have a new experimental coding model that's much better at coding and math and so on.

Speaker 1

有人外部尝试了一下,直接给它下指令说:我想让你实现一个没有外部依赖的SQL处理数据库系统。

And someone external tried it, and they basically prompted it and said, I'd like you to implement a SQL processing database system with no external dependencies.

Speaker 1

据那个人说,这个模型实际上完成得相当不错。

And from what the person said, it actually did a quite good job.

Speaker 1

它生成了一个SQL解析器、一个分词器、一个查询规划系统,以及磁盘上的数据存储格式,甚至能处理简单的查询。

Like it generated a SQL parser and a tokenizer and a query planning system and some storage format for the data on disk and actually was able to handle simple queries.
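
As a flavor of the very first of those components, here is a toy SQL tokenizer; it is a stand-in written for this page, not the model's actual output.

```python
import re

def tokenize_sql(query):
    """Split a SQL string into word, number, and single-symbol tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z_0-9]*|\d+|[^\sA-Za-z_0-9]", query)

tokens = tokenize_sql("SELECT id, name FROM users WHERE age > 30")
# ['SELECT', 'id', ',', 'name', 'FROM', 'users', 'WHERE', 'age', '>', '30']
```

The parser, planner, and storage layers then consume a token stream like this, which is what makes "a paragraph of prompt, a working pipeline out" so striking.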

Speaker 1

从这样一个段落级别的提示,就能得到这样一个初步的成果,这似乎大大提升了软件开发者的生产力。

So from that prompt, which is like a paragraph of text or something, to get even an initial cut at that seems like a big boost in productivity for software developers.

Speaker 1

我认为你可能会遇到其他类型的系统,它们不一定会在几十秒内以半交互方式完成所有工作,而是可能运行十分钟,然后在五分钟后打断你,说:我已经完成了大部分工作,但现在需要你提供一些输入。

And I think you might end up with other kinds of systems that maybe don't try to do that in a single, semi-interactive, respond-in-forty-seconds kind of way, but might go off for ten minutes and might interrupt you after five minutes saying, oh, I've done a lot of this, but now I need to get some input.

Speaker 1

你关心的是视频处理,还是只是图片之类的?

Do you care about handling video or just images or something?

Speaker 1

如果你有大量这类后台活动在进行,似乎就需要一些方式来管理这些工作流程。

And that seems like you'll need ways of managing the workflow if you have a lot of these kind of background activities happening.

Speaker 0

是的。

Yeah.

Speaker 0

实际上,你能再多讲讲这个吗?

Actually, can you talk more about that?

Speaker 0

所以,如果你真的能随时启动数百万、数十万员工,他们能极其快速地打字,你会想象我们需要什么样的界面?这就像从1930年代的票据交易,一下子跃升到如今的Jane Street这样的现代模式。

So what interface do you imagine we might need if you could literally have, like, millions of employees, hundreds of thousands of employees, that you could spin up on command, who are able to type incredibly fast? So it's almost like you go from 1930s ticker-tape trading or something to modern Jane Street.

Speaker 0

我们需要某种界面来追踪所有这些活动,让AI能够融入这个庞大的单一代码库,并发挥各自的优势,同时让人能掌握全局进展。

You need some interface to keep track of all of this that's going on, for the AIs to integrate into this big mono repo and leverage their own strengths, and for humans to keep track of what's happening.

Speaker 0

想象一下,三年后,作为Jeff或Noam,你每天的工作会是什么样子?

So what is it like to be Jeff or Noam in three years, working day to day?

Speaker 2

可能和我们现在的情况差不多,因为我们现在就已经面临并行化这个主要问题了——你知道,我们有大量极其聪明的机器学习研究人员,我们希望他们能协同合作,共同构建AI。

It might be kind of similar to what we have now because we already have sort of parallelization as a major issue because, you know, we have, like, lots and lots of really, really brilliant machine learning researchers and we want them to all work together and build AI.

Speaker 2

你知道,实际上人与人之间的并行化可能类似于机器之间的并行化。

You know, so actually the parallelization among people might be similar to parallelization among machines.

Speaker 2

但我认为,对于需要大量探索的事情,比如催生下一个突破性发现,这肯定会很有帮助。

But I think there definitely it should be good for things that require like a lot of exploration, you know, like come up with the next breakthrough.

Speaker 2

因为,如果你有一个绝妙的点子,确信它能成功,在机器学习领域,即使你很聪明,它也只有2%的成功几率。

Because, you know, if you have a brilliant idea that's just certain to work, you know, in the ML domain, then, you know, it has a 2% chance of working if you're brilliant.

Speaker 2

但大多数情况下,如果你尝试一百个、一千个甚至一百万个想法,你可能会偶然发现某个惊人的东西。

And, you know, mostly with these things, if you try a hundred things or a thousand things or a million things, then you might hit on something amazing.

Speaker 2

我们的计算资源非常充足。

And we have plenty of compute.

Speaker 2

如今,顶级实验室的计算能力大概是训练Transformer时所需算力的百万倍。

Like, modern, you know, top labs these days have probably a million times as much compute as it took to train the Transformer.

Speaker 2

所以

So

Speaker 0

是的

Yeah.

Speaker 0

实际上,这个想法非常有趣。

Actually, so that's a really interesting idea.

Speaker 0

假设当今世界有大约一万名AI研究人员和这个社群,每人都在提出突破性成果,

If you have like, suppose in the world today, there's, like, on the order of 10,000 AI researchers and this community coming up with a breakthrough every

Speaker 1

可能还不止这个数。

Probably more than that.

Speaker 1

在NeurIPS会议上就有15000人参与。

There were 15,000 people at NeurIPS.

Speaker 2

哇,真的吗?

Oh, wow.

Speaker 2

啊,

A

Speaker 0

十万?

100,000?

Speaker 0

我不确定。

I don't know.

Speaker 1

是的。

Yeah.

Speaker 1

也许吧。

Maybe.

Speaker 1

抱歉。

Sorry.

Speaker 0

不。

No.

Speaker 0

不。

No.

Speaker 0

知道正确的数量级很好。

It's good to have the correct order of magnitude.

Speaker 0

而这个领域每年产生像Transformer这样级别突破的概率,假设是10%。

And the odds that this community, every year, comes up with a breakthrough on the scale of the Transformer are, let's say, 10%.

Speaker 0

现在假设这个群体规模扩大一千倍,从某种意义上说,这就像并行探索更优的架构和更优的技术。

Now suppose this community is a thousand times bigger and it is in some sense like the sort of parallel search of better architectures, better techniques.

Speaker 0

我们是不是每天都能迎来一次突破?

Do we just, like, get a breakthrough a day?

Speaker 0

是每年或每天都有突破?

Breakthroughs every year or every day?

Speaker 2

也许吧。

Maybe.

Speaker 2

听起来可能不错,你知道的?

Sounds potentially good, you know?

Speaker 0

但这真的像机器学习研究的样子吗?

But does that feel like what ML research is like?

Speaker 0

只要你能够尝试所有这些实验。

It's just if you if you are able to try all these experiments.

Speaker 2

这是个好问题,因为说实话,我不确定人们是否一直这么做。

It's a good question because, you know, I don't know that folks have been doing that as much.

Speaker 2

我的意思是,我们确实不断涌现出许多很棒的想法。

I mean, we definitely have lots of great ideas coming along.

Speaker 2

每个人都想以最大规模运行他们的实验,但我认为这其实是个人类的问题。

Everyone seems to want to run their experiment at maximum scale, but I think that's, you know, that's a human problem.

Speaker 2

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

先在一个千分之一规模的问题上进行测试,然后在上面验证十万种想法,再将有潜力的方案放大,这非常有帮助。

It's very helpful to have a one-one-thousandth-scale problem and then vet, like, 100,000 ideas on that and then scale up the ones that seem promising.
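
That winnowing workflow can be caricatured in a few lines; the proxy score below is an arbitrary stand-in for "run the idea at one-one-thousandth scale".

```python
def winnow(ideas, proxy_score, keep_fraction=0.01):
    """Rank ideas by a cheap small-scale proxy and keep the top fraction."""
    ranked = sorted(ideas, key=proxy_score, reverse=True)
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]

# 1000 toy "ideas" scored by a stand-in proxy; only ~10 graduate to the
# expensive large-scale run:
promising = winnow(range(1000), proxy_score=lambda i: i % 97)
```

The point of the design is that almost all of the compute goes into the cheap proxy runs, and only the survivors ever touch the largest-scale experiments.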

Speaker 0

明白。

Yeah.

Speaker 0

以下是我们的赞助商Scale AI的简短广告。

A quick word from our sponsor Scale AI.

Speaker 0

公开可用的数据正在枯竭。

Publicly available data is running out.

Speaker 0

因此,像Meta、谷歌DeepMind和OpenAI这样的主要实验室都与Scale合作,以推动可能性的边界。

So major labs like Meta and Google DeepMind and OpenAI all partner with Scale to push the boundaries of what's possible.

Speaker 0

通过Scale的数据工坊,主要实验室能够获得高质量数据,以推动模型训练后的性能提升,包括先进的推理能力。

Through Scale's data foundry, major labs get access to high quality data to fuel post training, including advanced reasoning capabilities.

Speaker 0

随着人工智能的快速发展,我们也必须加强人类的自主权。

As AI races forward, we must also strengthen human sovereignty.

Speaker 0

Scale的研究团队Seal提供了实用的AI安全框架,通过公开排行榜评估前沿AI系统的安全性,并为将先进AI融入社会奠定基础。

Scale's research team, Seal, provides practical AI safety frameworks, evaluates frontier AI system safety via public leaderboards, and creates foundations for integrating advanced AI into society.

Speaker 0

最近,Scale与AI安全中心合作发布了《人类的最后一考》,这是一个开创性的新AI基准,用于评估AI系统在广泛领域中的专家级知识和推理能力。

Most recently, in collaboration with the Center for AI Safety, Scale published Humanity's Last Exam, a groundbreaking new AI benchmark for evaluating AI systems' expert-level knowledge and reasoning across a wide range of fields.

Speaker 0

如果你是AI研究员或工程师,想了解Scale的数据工坊和研究团队如何帮助你突破当前能力的边界,请访问scale.com/duarkesh。

If you're an AI researcher or engineer and you want to learn more about how Scale's data foundry and research team can help you go beyond the current frontier of capabilities, go to scale.com/duarkesh.

Speaker 0

好的。

All right.

Speaker 0

回到杰夫和诺姆。

Back to Jeff and Noam.

Speaker 0

我认为世界可能没有认真对待的一件事是:人们都知道,要让模型规模扩大100倍,所需的计算量会呈100倍增长,这要难得多。

So I think there's one thing the world might not be taking seriously. People are aware that it's exponentially harder to scale, like, making a model that's 100x bigger takes, like, 100x more compute.

Speaker 0

对吧?

Right?

Speaker 0

所以,人们都知道,从 Gemini 2 到 Gemini 3 这样的升级,难度是呈指数级增长的。

So it's like, people are aware that it's an exponentially harder problem to go from Gemini 2 to 3, and so forth.

Speaker 0

但也许人们没有意识到另一个趋势:Gemini 3 正在不断提出各种新的架构想法并进行尝试,观察哪些有效,从而持续产生算法上的进步,让下一个模型的训练变得越来越容易。

But maybe people aren't aware of this other trend where Gemini three is coming up with all these different architectural ideas and trying them out and you see what works and you're constantly coming out with these algorithmic progress that makes training the next one easier and easier.

Speaker 0

是的。

Yeah.

Speaker 0

这种反馈机制能走多远?

How far could you take that feedback loop?

Speaker 1

我的意思是,人们应该意识到,这些模型从一代到下一代的改进,部分是由硬件和更大规模推动的,但同样甚至更主要的是由重大的算法改进、模型架构变化以及训练数据组合的调整等驱动的,这些因素真正提升了每单位计算资源的模型性能。

I mean, I think one thing people should be aware of is that the improvements from generation to generation of these models are often partially driven by hardware and larger scale, but equally, and perhaps even more so, driven by major algorithmic improvements and major changes in the model architecture and the training data mix and so on that really make the model better per flop that is applied to the model.

Speaker 1

所以我认为这是一个很好的认识。

So I think that's a good realization.

Speaker 1

如果我们能实现对想法的自动化探索,就能评估更多想法,并将它们引入下一代模型的实际训练流程中。

And then I think if we have automated exploration of ideas, we'll be able to vet a lot more ideas and bring them into kind of the actual production training for next generations of these models.

Speaker 1

这将非常有帮助,因为我们目前在许多机器学习研究中,正是由一群杰出的机器学习研究人员在大量想法中筛选出在小规模下表现良好的方案,再测试其中哪些能在中等规模下奏效,然后将其引入大规模实验,最终确定将大量新颖有趣的内容加入到最终模型的配方中。

And that's going to be really helpful, because that's sort of what we're currently doing with a lot of machine learning research: brilliant machine learning researchers looking at lots of ideas, winnowing the ones that seem to work well at small scale, seeing if they work well at medium scale, bringing them into larger-scale experiments, and then settling on adding a whole bunch of new and interesting things to the final model recipe.

Speaker 1

如果我们可以让机器学习研究人员通过轻微引导更自动化的搜索过程,而不是亲自手把手地照料大量实验,从而将这一过程加速100倍,那将会非常好。

And then I think if we can do that 100 times faster, with those machine learning researchers just gently steering a more automated search process rather than hand-babysitting lots of experiments themselves, that's going to be really, really good.

Speaker 1

是的。

Yeah.

Speaker 2

但有一件事是无法加速的,那就是最大规模的实验,因为你仍然不得不进行这种n=1的实验,本质上就是把一群非常聪明的人聚集在一起,让他们盯着模型,弄清楚为什么这个有效、那个无效。

The one thing it doesn't speed up is, like, experiments at the largest scale, because you still end up doing these n equals one experiments, where you really just try to put a bunch of really brilliant people in a room and have them stare at the thing and figure out why this is working and why this is not working.

Speaker 1

更多的硬件和更好的硬件是个不错的解决方案。

More hardware is a good solution and better hardware.

Speaker 2

对。

Yes.

Speaker 2

对。

Yes.

Speaker 2

我们全靠你了。

We're counting on you.

Speaker 2

所以

So

Speaker 0

好的。

okay.

Speaker 0

直觉上,我认为存在一种软件层面的改进,未来的AI可以实现这种算法上的提升。

And naively, I would say there's this software side, this algorithmic improvement that future AIs can make.

Speaker 0

还有你正在做的AlphaChip相关的工作,我让你来介绍一下。

There's also the stuff you're working on on AlphaChip, I'll let you describe it.

Speaker 0

但如果你进入一种情况,仅从软件层面,就能在几周或几个月内不断设计出更好的芯片,而更强大的AI显然能做得更好。

But if you get into a situation where, just from a software level, you can be making better and better chips in a matter of weeks and months, and better AIs can presumably do that better.

Speaker 0

基本上,我在想,这个反馈循环会不会最终导致Gemini三号需要两年,Gemini四号只需要六个月,相当于的跃升现在是六个月,然后第五代是三个月,再之后是一个月,从而让你比原本预期更快地达到超人智能,因为软件在硬件和算法两方面都在持续改进。

Basically, I'm wondering how this feedback loop doesn't just end up with Gemini three taking two years, Gemini four (or the equivalent level jump) taking six months, then level five taking three months, then one month, so you get to superhuman intelligence much more rapidly than you might naively think, because of these improvements on both the hardware side and the algorithmic side.

Speaker 1

是的。

Yeah.

Speaker 1

我最近一直对如何大幅加速芯片设计过程感到非常兴奋。

I mean, I've been pretty excited lately about how could we dramatically speed up the chip design process.

Speaker 1

因为正如我们之前讨论的,目前设计芯片的方式需要大约十八个月才能从决定制造芯片到将其交付给台积电,而台积电再花四个月生产,之后你才能拿回芯片并部署到数据中心。

Because as we were talking earlier, the current way in which you design a chip takes you roughly eighteen months to go from, we should build a chip to something that you then hand over to TSMC and then TSMC takes four months to fab it and then you get it back and you put it in your data centers.

Speaker 1

所以这个周期相当漫长。

So that's a pretty lengthy cycle.

Speaker 1

而其中的制造时间如今只占了很小一部分。

And the fab time in there is a pretty small portion of it today.

Speaker 1

但如果你能将制造时间变成主要部分,也就是说,把芯片设计时间从十二到十八个月大幅缩短,用十几个人通过高度自动化的搜索流程,探索整个芯片设计空间,并从芯片设计过程的各个方面获取反馈,来指导系统在高层面上的决策。

But if you could make that the dominant portion, so that instead of taking twelve to eighteen months and 150 people to design the chip, you could shrink that to a few people with a much more automated search process, exploring the whole design space of chips and getting feedback from all aspects of the chip design process about the kinds of choices the system is trying to explore at the high level.

Speaker 1

是的。

Yeah.

Speaker 1

那样的话,我认为你可以实现更多探索,并更快地设计出真正想交付给晶圆厂的产品。

Then I think you could get perhaps much more exploration and more rapid design of something that you actually want to give to a fab.

Speaker 1

这会非常棒,因为你可以缩短时间,通过以正确的方式设计硬件,使你拿到芯片后只需直接插入系统即可。

That would be great because you can shrink that time, you can shrink the deployment time by kind of designing the hardware in the right way so that you just get the chips back and you just plug them in to some system.

Speaker 1

这将有助于实现更多的专业化。

And that will then, I think, enable a lot more specialization.

Speaker 1

这将缩短硬件设计的时间周期,使你不必过于长远地预测哪些机器学习算法会有趣。

It will enable a shorter timeframe for the hardware design so that you don't have to look out quite as far into what kind of ML algorithms would be interesting.

Speaker 1

相反,你只需要关注未来六到九个月的情况。

Instead, it's like, you're looking at six to nine months from now.

Speaker 1

应该设计什么,而不是两到两年半之后?

What should it be rather than two, two and a half years?

Speaker 1

这将会非常酷。

And that would be pretty cool.

Speaker 1

我认为,如果制造时间是你改进循环中的关键环节,你就会开始关心:这需要多长时间?

I do think that if fabrication time is in your inner loop of improvement, you're going to care: how long is it?

Speaker 1

不幸的是,先进制程节点由于比旧节点拥有更多的金属层,所需时间越来越长。

The leading edge nodes unfortunately are taking longer and longer because they have more metal layers than previous older nodes.

Speaker 1

因此,制造周期通常需要三到五个月。

So that tends to make it take anywhere from three to five months.

Speaker 0

好的。

Okay.

Speaker 0

但训练运行本来就需要这么长时间,对吧?

But that's how long training runs take anyways, right?

Speaker 0

所以你实际上可以同时进行这两件事?

So you could potentially do both at the same time?

Speaker 1

是的,有可能。

Yeah, potentially.

Speaker 0

好的。

Okay.

Speaker 0

所以我想,你不可能比三到五个月更快,但关键是,你也能快速开发新的算法思路。

So I guess you can't get sooner than three to five months, but also, yeah, you're rapidly developing new algorithmic ideas

Speaker 2

在这段时间内。

between this time.

Speaker 2

可以快速推进。

Can move fast.

Speaker 1

这可以快速推进。

That can move fast.

Speaker 1

这可以在现有的芯片上运行,并探索许多酷炫的想法。

That can run on like existing chips and explore lots of cool ideas.

Speaker 0

是的。

Yep.

Speaker 0

所以这并不是那种人们预期会出现S型曲线的情况。

So this isn't that kind of situation, which I think people sort of expect: like, ah, there's gonna be a sigmoid.

Speaker 0

再次强调,这并不是确定无疑的,但这是一个可能性吗?

Again, this is not a sure thing, but just like, is this a possibility?

Speaker 0

这个想法是,人类智能在接近尾声时,能力会迅速爆发,变得越来越聪明,且提升速度越来越快。

The idea that you have sort of an explosion of capabilities very rapidly towards the tail end of human intelligence, where it gets smarter and smarter at a more and more rapid rate.

Speaker 2

很有可能。

Quite possibly.

Speaker 1

是的。

Yeah.

Speaker 1

我喜欢这样理解它。

I like to think of it like this.

Speaker 1

现在我们有模型能够处理相当复杂的问题,将其在内部分解为多个步骤,组合解决这些子问题,并通常能给出你所问问题的完整解答。

Right now we have models that can take a pretty complicated problem and can break it down internally in the model into a bunch of steps, can puzzle together the solutions for those steps and can often give you a solution to the entire problem that you're asking.

Speaker 1

但它并不十分可靠,擅长将问题分解为五到十个步骤,而不是一百到一千个步骤。

But it isn't super reliable and it's good at breaking things down into five to 10 steps, not 100 to 1,000 steps.

Speaker 1

所以,如果你能从大约80%的时间内,对一个十步长的问题给出完美答案,提升到90%的时间内对一个包含一百到一千个子步骤的问题都能给出完美答案,

So if you could go from, yeah, 80% of the time it can give you a perfect answer to something that's 10 steps long, to 90% of the time giving you a perfect answer to something that's a 100 to a thousand sub-steps long.

Speaker 1

那将是这些模型能力的巨大飞跃。

That would be an amazing improvement in capability of these models.
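An illustrative calculation (my own arithmetic, not from the conversation) of what those targets imply for per-step reliability, assuming step failures are independent:

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step in an independent chain succeeds."""
    return per_step ** steps

def required_per_step(target: float, steps: int) -> float:
    """Per-step success rate needed to hit a target end-to-end rate."""
    return target ** (1.0 / steps)

# ~80% success on a 10-step problem implies ~97.8% per-step reliability.
print(round(required_per_step(0.80, 10), 4))    # 0.9779
# 90% success on a 1,000-step problem needs ~99.99% per step.
print(round(required_per_step(0.90, 1000), 6))  # 0.999895
```

The jump from 10 to 1,000 steps is why long-horizon reliability is so much harder than it sounds: tiny per-step error rates compound exponentially over the chain.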

Speaker 1

而我们还没有达到这一点,但我认为我们所追求的目标正是——

And we're not there yet, but I think that's what we're aspirationally trying to get to.

Speaker 2

实现这一点并不需要新的硬件。

we don't need new hardware for that.

Speaker 2

但我的意思是,我们当然欢迎,没错,正是如此。

But I mean, we'll take it. Yeah, exactly.

Speaker 1

我从不会对新硬件挑三拣四。

I've never looked new hardware in the mouth.

Speaker 2

其中一个我认为在不久的将来会有重大改进的领域,就是推理时的计算资源,也就是在推理阶段投入更多计算。

One of the, you know, like one of the big areas of improvement, I think, you know, in the near future is this inference time compute, like applying more compute, you know, at inference time.

Speaker 2

我常这样描述:即使是像大型语言模型这样的系统,哪怕你每令牌执行一万亿次操作——这比现在大多数人做的还要多——每次操作的成本大约是10的负18次方美元,因此你每美元能获得百万个令牌,对吧?

And I guess the way I've liked to describe it is that, you know, even for some giant language model, even if you're doing, say, a trillion operations per token, which is more than most people are doing these days, an operation costs something like 10 to the negative 18 dollars. And so you're getting like a million tokens to the dollar, right?

Speaker 2

所以,拿这个和一些相对便宜的消遣方式做个比较。

So, I mean, compare that to like a relatively cheap pastime.

Speaker 2

比如你出去买一本纸质书来读,你每美元只能获得大约一万个令牌。

Like you go out and you buy a paper book and read it, you're getting like 10,000 tokens to the dollar.

Speaker 2

这意味着,和语言模型对话的成本可能只有读一本平装书的百分之一。

So talking to a language model is like 100 times cheaper than reading a paperback.

Speaker 2

所以,这里有很大的提升空间:如果我们能让它更贵一点,但更聪明,因为我们现在比读平装书便宜了100倍。

So, there is a huge amount of headroom there to say, okay, if we can make this thing more expensive, but smarter, because we're like 100x cheaper than reading a paperback.

Speaker 2

我们比咨询客服人员便宜了1万倍。

We're 10,000 times cheaper than talking to a customer support agent.

Speaker 2

我们比聘请软件工程师、咨询医生或律师便宜了百万倍甚至更多。

We're a million times or more cheaper than hiring a software engineer or talking to your doctor or lawyer.
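Noam's arithmetic here can be checked directly. The ops-per-token and per-op cost figures are the ones he quotes; the paperback figure is his 10,000 tokens to the dollar:

```python
OPS_PER_TOKEN = 1e12       # "a trillion operations per token"
DOLLARS_PER_OP = 1e-18     # "10 to the negative 18 dollars" per operation

dollars_per_token = OPS_PER_TOKEN * DOLLARS_PER_OP   # ~$1e-6 per token
tokens_per_dollar = round(1 / dollars_per_token)

# A paperback: roughly 10,000 tokens to the dollar, so per token the
# model comes out about 100x cheaper than the book.
PAPERBACK_TOKENS_PER_DOLLAR = 10_000
ratio = tokens_per_dollar / PAPERBACK_TOKENS_PER_DOLLAR

print(f"{tokens_per_dollar:,} tokens per dollar")  # 1,000,000 tokens per dollar
print(f"{ratio:.0f}x cheaper than a paperback")    # 100x cheaper than a paperback
```

That factor of 100 (and the larger multiples against human labor) is the "headroom" he describes: you can spend far more compute per answer and still be cheap by everyday standards.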

Speaker 2

我们能增加计算量,让它更聪明吗?

Can we add computation and make it smarter?

Speaker 2

所以,我认为我们将在不久的将来看到的很多突破都属于这种形式。

So, like, I think a lot of the takeoff that we're going to see in the very near future is of this form.

Speaker 2

过去我们一直在大量利用和改进预训练和后训练,这些方面还会继续提升,但利用推理时更深入的思考,将会带来一场爆发。

Like, we've been exploiting and improving pre training a lot in the past, and post training, and those things will continue to improve, but, like, taking advantage of thinking harder at inference time is going to just be an explosion.

Speaker 1

是的,推理时间的一个方面是,我认为系统应该主动探索大量潜在的解决方案。

Yeah, and an aspect of inference time is I think you want the system to be actively exploring a bunch of different potential solutions.

Speaker 1

比如,它可能自己进行一些搜索,获取信息,然后消化这些信息,发现:哦,我现在真的很想更多了解这件事。

You know, maybe it does some searches on its own and gets some information back and like consumes that information and figures out, oh, now I would really like to know more about this thing.

Speaker 1

因此,它会逐步迭代地探索如何最好地解决你向这个系统提出的高层次问题。

So now it kind of iteratively kind of explores how to best solve the high level problem you pose to this system.

Speaker 1

我认为,如果我们能有一个调节旋钮,通过增加推理时的计算量来让模型给出更好的答案,那我们现在已经有了一系列似乎能实现这一点的技术。

And I think having a dial where you can make the model give you better answers with more inference time compute seems like we have a bunch of techniques now that seem like they can kind of do that.

Speaker 1

你把旋钮调得越高,计算成本就越高,但答案的质量也会越好。

The more you crank up the dial, the more it costs you in terms of compute, but the better the answers get.
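One simple way such a dial can be realized is best-of-N sampling: draw several candidate answers and keep the one a verifier scores highest. This is a generic sketch with toy stand-ins for the model and verifier (my own illustration, not Google's actual mechanism):

```python
import random

def generate(prompt: str, rng: random.Random) -> tuple[str, float]:
    """Toy stand-in for sampling the model: an answer plus its quality."""
    quality = rng.random()
    return f"answer(q={quality:.3f})", quality

def score(quality: float) -> float:
    """Toy stand-in for a verifier / reward model."""
    return quality

def best_of_n(prompt: str, n: int, seed: int = 0) -> tuple[str, float]:
    """The dial: n controls how much inference compute we spend."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: score(c[1]))

# Turning the dial up costs n model calls, but the best answer found
# can only improve (with a shared seed, the n=1 sample is candidate 0).
_, q1 = best_of_n("prompt", 1)
_, q32 = best_of_n("prompt", 32)
assert q32 >= q1
```

The same knob generalizes beyond sampling: longer chains of thought, tree search, or iterated refinement all trade extra inference compute for answer quality.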

Speaker 1

这似乎是一个不错的权衡,因为有时候你希望深入思考,因为这是一个极其重要的问题。

That seems like a nice trade off to have because sometimes you want to think really hard because this is a super important problem.

Speaker 1

有时候你可能不希望耗费巨大的计算资源去计算,比如一加一等于多少?

Sometimes you probably don't wanna spend enormous amounts of compute to compute, you know, one plus what's the answer to one plus one?

Speaker 1

也许系统

Maybe the system

Speaker 0

你决定将其扩展到一百倍,然后它会提出类似新的集合论公理之类的东西。

You decide to take that to a 100x and it comes up with like new axioms of set theory.

Speaker 1

所以你会选择使用计算器工具,而不是依赖一个非常大的语言模型。

So you decide to use a calculator tool or something instead of, you know, a very large language model.

Speaker 0

在推理时间方面,是否存在任何障碍,使得你无法线性地增加推理计算资源?

Are there any impediments to taking inference time, like having some way in which you can just linearly scale up inference time compute?

Speaker 0

还是说这个问题基本上已经解决了,我们知道如何投入一百倍、一千倍的计算资源,从而获得相应更好的结果?

Or is this basically a problem that's sort of solved and we know how to throw like a 100 X compute, a thousand X compute and get correspondingly better results?

Speaker 2

是的。

Yeah.

Speaker 2

我们目前正在研究这些算法。

Well, we're working out the algorithms as we speak.

Speaker 2

所以我认为,随着上万名研究人员不断钻研这个问题,我们会看到越来越好的解决方案,其中很多是我所在的团队在做。

So I believe, you know, we'll see better and better solutions to this as these more than 10,000 researchers are hacking at it, many of them where I work.

Speaker 1

我的意思是,我们在自己的实验工作中确实看到一些例子,当你增加推理时间的计算量时,得到的答案会比只使用较少计算量时更好;比如使用10倍的计算量,就能比原来获得更好的结果。

I mean, I think we do see some examples in our own experimental work where, if you apply more inference time compute, the answers are better: if you apply 10X, you can get better answers than with X amount of compute at inference time.

Speaker 1

这看起来很有用且重要。

That seems useful and important.

Speaker 1

但我想我们真正希望的是,当使用10倍计算量时,答案质量的提升能比现在更大。

But I think what we would like is when you apply 10X to get even a bigger improvement in the quality of the answers than we're getting today.

Speaker 1

这就涉及到设计新算法、尝试新方法,弄清楚如何最好地利用这10倍的计算量来提升效果。

And so that's about designing new algorithms, trying new approaches, figuring out how best to spend that 10x instead of x to improve things.

Speaker 0

这看起来更像搜索,还是说只是沿着当前的线性方向延长计算时间?

Does it look more like search or does it look more like just keep it going in that linear direction for a longer time?

Speaker 1

我的意思是,我认为搜索就像里奇·萨顿那篇关于‘痛苦的教训’的论文,那篇简洁的一页纸论文的核心思想是:你可以尝试各种方法,但真正极其有效的只有两种技术——学习和搜索。

I mean, I think search is really like Rich Sutton's paper that he wrote about the bitter lesson and the bitter lesson effectively is this nice one page paper, but the essence of it is you can try lots of approaches, but the two techniques that are incredibly effective are learning and search.

Speaker 1

你可以应用并扩展这些计算方法,通常会比其他任何方法在更广泛的问题上获得更好的结果。

You can apply and scale those computationally, and you often will then get better results than any other kind of approach you can apply to a pretty broad variety of problems.

Speaker 0

而且

And

Speaker 1

所以我认为,搜索必须是利用更多推理时间解决方案的一部分,因为你可能想探索几种不同的解决方法,比如‘这个不行,但那个效果更好’。

so I think search has got to be part of the solution to spending more inference time, as you want to maybe explore a few different ways of solving this problem and like, oh, that one didn't work, but this one worked better.

Speaker 1

所以我会进一步探讨这一点。

So I'm gonna explore that a bit more.

Speaker 0

这如何改变你对未来数据中心规划的安排?如果这种搜索可以异步进行,会怎样?

How does this change your plans for future data center planning and so forth? Can this kind of search be done asynchronously?

Speaker 0

它必须是在线的,还是可以离线进行?

Does it have to be online, offline?

Speaker 0

这会带来哪些变化?

How does that change?

Speaker 0

你需要多大的园区,以及类似这些方面的考虑?

How big of a campus you need and those kinds of considerations?

Speaker 1

我的意思是,一个普遍的趋势是,推理阶段的计算显然会成为一个日益重要且增长迅速的计算类别,你已经训练好了模型,现在只是想用它做推理,也许你更应该针对这类计算专门优化硬件。

I mean, I think one general trend is it's clear that inference time compute, where you have a model that's pretty much already trained and you wanna do inference on it, is going to be a growing and important class of computation, and maybe you wanna specialize hardware more around that.

Speaker 1

实际上,第一代TPU就是为推理专门设计的,并没有针对训练进行优化。

Actually the first TPU was specialized for inference and wasn't really designed for training.

Speaker 1

随后的TPU则更多地兼顾了训练和推理需求。

Then subsequent TPUs were really designed more around training and also for inference.

Speaker 1

但当你需要在推理阶段大幅提升计算量时,可能更极端的专用解决方案反而没有太大意义。

But it may be that when you have something where you really wanna crank up the amount of compute you use at inference time, that even more specialized solutions won't make a lot of sense.

Speaker 0

这意味着你能更好地支持异步训练吗?

Does that mean you can accommodate more asynchronous training?

Speaker 1

训练还是推理?

Training or inference?

Speaker 0

或者,不同的数据中心之间不需要互相通信。

Or just that the different data centers don't need to talk to each other.

Speaker 0

你可以让它们各自独立地执行大量

You can just like have them do a bunch of

Speaker 1

是的。

Oh, yeah.

Speaker 1

我的意思是,我觉得可以这样想:你所进行的推理是否对延迟敏感?

I mean, I think I like to think of it as, is the inference that you're trying to do latency sensitive?

Speaker 1

比如,用户正在主动等待结果。

Like a user's actively waiting for it.

Speaker 1

还是说它只是后台任务?

Or is it kind of a background thing?

Speaker 1

也许有些推理任务是我试图在一批数据上运行的,但并不是为某个特定用户服务的。

And maybe that's, I have some inference tasks that I'm trying to run over a whole batch of data, but it's not for a particular user.

Speaker 1

我只是想对它进行推理并提取一些信息。

It's just, I wanna run inference on it and extract some information.

Speaker 1

而且可能还有一些我们现在还很少见的东西,但你已经在我们刚刚发布的深度研究工具中看到了一些端倪——我记不清具体是什么时候发布的了,大概是一周前——你可以给它一个相当复杂的任务,比如:‘你能不能去研究一下可再生能源的历史,以及风能、太阳能和其他各种技术的趋势和成本,并整理成一张表格,给我一份完整的八页报告?’

And then there's probably a bunch of things that we don't really have very much of right now, but you're seeing inklings of it in our deep research tool that we just released, I forget exactly when, like a week ago, where you can give it a pretty complicated level task like, Hey, can you go off and research the history of renewable energy and all the trends and costs for wind and solar and other kinds of techniques and put it in a table and give me a full eight page report?

Speaker 1

然后它会返回一份包含50条参考文献的八页报告。

And it will come back with an eight page report with 50 entries in the bibliography.

Speaker 1

这相当了不起,但你并不会为此等待哪怕一秒钟。

It's pretty remarkable, but you're not actively waiting for that for one second.

Speaker 1

完成这个过程需要一两分钟。

It takes a minute or two to go do that.

Speaker 1

我认为这类计算将会相当多。

And I think there's gonna be a fair bit of that kind of compute.

Speaker 1

这就带来了一些界面设计问题:比如,如果一个用户同时有20个这样的异步任务在后台运行,而每个任务可能都需要从用户那里获取更多信息——比如,我找到了飞往柏林的航班,但没有直达的。

And that's the kind of thing where you have some UI questions around, okay, if you're going to have a user with 20 of these kind of asynchronous tasks in the background happening and maybe each one of them needs to get more information from the user, I found your flights to Berlin, but there's no nonstop ones.

Speaker 1

你介意转机吗?

Are you okay with one that has a stop?

Speaker 1

当你需要更多信息,然后又想把它放回后台继续查找柏林的酒店或其他内容时,这个流程该如何运作呢?我觉得这会是

How does that flow work when you need a bit more information and then you want to put it back in the background for it to continue finding the hotels in Berlin or whatever, I think it's

Speaker 2

非常有趣的,推理将非常有用。

going to be pretty interesting and inference will be useful.

Speaker 2

推理将非常有用。

Inference will be useful.

Speaker 2

我的意思是,推理过程中还存在计算效率的问题,而训练时却没有,一般来说,Transformer在训练时可以把序列长度作为批量处理,但在推理时却不行,因为你是一次生成一个词元。

I mean, there's also a compute efficiency thing in inference that you don't have in training, in that, you know, in general, transformers can use the sequence length as a batch during training, but they can't really in inference, when you're generating one token at a time.

Speaker 2

因此,可能会有专门为提高推理效率而设计的不同硬件和推理算法。

So there may be different hardware and inference algorithms that we design for the purpose of being efficient at inference.

Speaker 1

是的,算法改进的一个很好的例子就是使用草稿模型。

Yeah, like as a good example of an algorithmic improvement is the use of drafter models.

Speaker 1

所以你用一个非常小的语言模型,在解码时一次预测一个词元,但能预测出四个词元。

So you have a really small language model that you run one token at a time when you're decoding, and it predicts, say, four tokens ahead.

Speaker 1

然后你把这些交给大模型,说:好的,这是小模型生成的四个词元。

And then you give that to the big model and you say, Okay, here's the four tokens the little model came up with.

Speaker 1

检查一下你同意其中哪些。

Check which ones you agree with.

Speaker 1

如果你同意前三个,那就直接推进,这样你就实际上用并行计算完成了四个词元的生成,而不是在大模型中一个一个地处理。

And if you agree with the first three, then you just advance, and you've basically been able to do four tokens with parallel computation instead of one token at a time in the big model.

Speaker 1

这些正是人们正在研究以提升推理效率的方案。

And so those are the kinds of things that people are looking at to improve inference efficiency.
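A minimal sketch of the drafter scheme described above, with deterministic toy stand-ins for both models (real speculative decoding accepts or rejects draft tokens against the big model's distribution; this toy version keeps only an exact-match prefix):

```python
def big_model_next(ctx):
    """Toy 'big' model: the next token is always last + 1."""
    return ctx[-1] + 1

def draft_next(ctx):
    """Toy drafter: agrees with the big model except on multiples of 4,
    where it guesses wrong."""
    nxt = ctx[-1] + 1
    return nxt if nxt % 4 else nxt + 7

def speculative_step(context, k=4):
    """Draft k tokens, then have the big model verify them (a single
    parallel pass in practice) and accept the longest agreeing prefix,
    plus the big model's own token at the first disagreement."""
    ctx = list(context)
    draft = []
    for _ in range(k):
        draft.append(draft_next(ctx))
        ctx.append(draft[-1])

    ctx = list(context)
    accepted = []
    for t in draft:
        want = big_model_next(ctx)
        if t != want:
            accepted.append(want)   # correct the drafter and stop
            break
        accepted.append(t)
        ctx.append(t)
    return context + accepted

seq = [0]
while len(seq) < 10:                # each step advances several tokens
    seq = speculative_step(seq)
print(seq)                          # [0, 1, 2, ..., 12]
```

Each step here emits four tokens for one "big model" verification pass instead of four sequential decodes, which is exactly the single-token-decode bottleneck being sidestepped.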

Speaker 1

所以你就不会遇到单个令牌解码的瓶颈。

So you don't have this single token decode bottleneck.

Speaker 1

对。

Right.

Speaker 1

基本上,大模型是

Basically, the big model is

Speaker 2

被用作验证器。

being used as a verifier.

Speaker 2

对。

Right.

Speaker 2

它正在被验证。

It's being verified.

Speaker 2

是的。

Yeah.

Speaker 2

生成-验证机制。你可以这样:你好,

The generator-verifier setup. You can do: Hello,

Speaker 1

你怎么样?

how are you?

Speaker 1

这听起来不错。

That sounds great to me.

Speaker 1

我要继续往前推进。

I'm gonna advance past that.

Speaker 0

所以目前有一个很大的讨论是,我们在向单个校园供电方面,已经接近核电厂的极限了。

So a big discussion has been about, we're already tapping out nuclear power plants in terms of delivering power into one single campus.

Speaker 0

那么,我们是否必须在一个地方部署两吉瓦、五吉瓦的电力,还是可以更分散地分布,同时仍能训练模型?

And so do we have to like have just like even more like two gigawatts in one place, five gigawatts in one place, or can it be more distributed and still be able to train a model?

Speaker 0

这种新的推理扩展模式,是否让这些不同的考虑变得可行?

Does this new regime of inference scaling make different considerations there plausible?

Speaker 0

或者你现在是怎么看待多数据中心训练的?

Or how are you thinking about multi data center training now?

Speaker 1

我的意思是,我们已经在这么做了。

I mean, we're already doing it.

Speaker 1

我们支持多数据中心训练。

We're pro multi data center training.

Speaker 1

我认为在Gemini 1.5的技术报告中提到,我们使用了多个城市区域,在每个地方部署部分计算资源进行训练,并通过高带宽但延迟较长的连接连接这些数据中心。

I think in the Gemini 1.5 tech report, we said we used multiple metro areas and trained with some of the compute in each place, and then a pretty long latency but high bandwidth connection between those data centers.

Speaker 1

这样运作得很好。

That works fine.

Speaker 1

非常好。

It's great.

Speaker 1

实际上,训练过程挺有意思的,因为训练的每一步,对于大型模型来说,通常需要几秒钟或更长时间。

Actually training is kind of interesting because each step in a training process, for a large model, usually takes a few seconds or something at least.

Speaker 1

所以,即使延迟有五十毫秒,影响也不大。

So the latency of it being, you know, fifty milliseconds away doesn't matter that much.

Speaker 2

只是带宽的问题,你知道的?

Just the bandwidth, you know?

Speaker 2

是的,就是带宽。

Yeah, just bandwidth.

Speaker 2

只要你能同步不同数据中心之间的模型所有参数,并累积所有梯度就行。

As long as you can sync, you know, sync all of the parameters of the model across the different data centers and then accumulate all the gradients.

Speaker 2

所以在完成一步所需的时间内,你基本上没问题。

So as long as it fits in the time it takes to do one step, you're pretty much fine.
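A back-of-the-envelope check of that claim. The roughly 50 ms latency and multi-second step time come from the conversation; the model size, gradient precision, and link bandwidth are hypothetical numbers of my own:

```python
PARAMS = 500e9            # hypothetical 500B-parameter model
BYTES_PER_VALUE = 2       # bf16 gradients (assumption)
LINK_BITS_PER_S = 10e12   # hypothetical 10 Tb/s inter-metro link
STEP_SECONDS = 5.0        # "a few seconds or something at least"
LATENCY_SECONDS = 0.050   # "fifty milliseconds away"

# Time to move one full set of gradients across the link, plus latency.
transfer_s = PARAMS * BYTES_PER_VALUE * 8 / LINK_BITS_PER_S
total_s = transfer_s + LATENCY_SECONDS

print(f"gradient sync: ~{total_s:.2f}s vs {STEP_SECONDS:.0f}s step")
assert total_s < STEP_SECONDS   # the sync hides inside the step time
```

Under these assumptions the sync comfortably overlaps the step, and the 50 ms of latency is a rounding error next to the bandwidth term, which is why Jeff and Noam say only bandwidth really matters.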

Speaker 1

我们还做了一系列工作,尤其是在早期大脑项目时期,当时我们使用的是CPU机器,速度非常慢。

And then we have a bunch of work on, you know, in even early brain days when we were using CPU machines and they were really slow.

Speaker 1

所以我们需要采用异步训练来帮助扩展,每个模型副本都会进行一些本地计算,然后将梯度更新发送到中央系统,并异步应用这些更新。

So we needed to do asynchronous training to help scale, where each copy of the model would kind of do some local computation and then send gradient updates to a centralized system and then apply them asynchronously.

Speaker 1

另一个模型副本也在做同样的事情。

And another copy of the model would be doing the same thing.

Speaker 1

这会让模型参数有些波动,让人对理论保证感到不安,但实际效果似乎不错。

It makes your model parameters kind of wiggle around a bit and it makes people uncomfortable with the theoretical guarantees, but it actually seems to work in practice.

Speaker 2

那它是怎么工作的?

And how does it work?

Speaker 2

从异步切换到同步真是太好了,因为现在你的实验变得可复现了,而不是像以前那样,你的结果取决于是否有一台网络爬虫在同一个机器上运行。

It was so pleasant to go from async to sync, because your experiments are now replicable, rather than your results depending on, like, whether there was a web crawler running on the same machine.

Speaker 2

就像是你的其中一台电脑。

It's like, one of your computers.

Speaker 2

所以我现在用TPU集群运行,感觉开心多了。

So I am so much happier running on, like, TPU pods.

Speaker 1

我喜欢异步训练。

I loved async.

Speaker 2

它只是让你能扩展到一些iPhone和Xbox之类的设备。

It just lets you scale to some iPhones and an Xbox or whatever it was.

Speaker 1

但,是的。

But, like Yeah.

Speaker 1

如果我们能给你异步但可复现的结果呢?

What if we could give you asynchronous but replicable results?

Speaker 1

哦。

Oh.

Speaker 1

实现这一点的一种方法是,你实际上记录下操作的序列。

So one way to do that is you effectively record the sequence of operations.

Speaker 1

所以是哪个梯度更新发生了,何时发生,以及在哪个数据批次上。

So like which gradient update happened and when and on which batch of data.

Speaker 1

不一定非要把实际的梯度更新记录到日志中,但你可以重放这些操作日志,从而实现可重复性。

You don't necessarily record the actual gradient update in a log or something, but you could replay that log of operations so that you get repeatability.
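A toy sketch of that replay idea, using a scalar "model" and a deterministic stand-in gradient function. The point is only that logging the (worker, batch) order, rather than the gradients themselves, makes an async run exactly repeatable:

```python
def grad(batch_id: int, params: float) -> float:
    """Deterministic stand-in for the gradient computed on one batch."""
    return (batch_id % 7 - 3) * 0.01 + 0.001 * params

def train(update_log: list[tuple[int, int]]) -> float:
    """Apply SGD updates in exactly the logged order."""
    params = 1.0
    for worker_id, batch_id in update_log:
        params -= 0.1 * grad(batch_id, params)
    return params

# The async run logged which (worker, batch) update landed when.
log = [(0, 3), (1, 9), (0, 4), (2, 1), (1, 10)]

# Replaying the log reproduces the run bit-for-bit...
assert train(log) == train(log)
# ...while a different interleaving generally gives different weights.
assert train(log) != train(list(reversed(log)))
```

This is also why Noam's caveat holds: two runs with different hyperparameters would still have different logs, so replay gives debuggability more than clean A/B comparability.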

Speaker 2

嗯。

Mhmm.

Speaker 2

嗯。

Mhmm.

Speaker 1

那样的话,我想你会满意的。

Then I think you'd be happy.

Speaker 2

也许吧。

Possibly.

Speaker 2

至少你能调试出发生了什么。

At least you could debug what happened.

Speaker 2

是的。

Yeah.

Speaker 2

但那样你就没法比较两个训练运行了,因为是的。

But then you wouldn't necessarily be able to, like, compare two training runs, because, yeah.

Speaker 2

好的。

Okay.

Speaker 2

我修改了一个超参数,但同时也用了一个网络爬虫。

I made one change in the hyperparameters, but also, like, I had, like, a web crawler.

Speaker 2

在同一台机器上。

On the same machine.

Speaker 2

比如,在捣乱。

Like, messing things up.

Speaker 2

而且当时有很多人同时在观看超级碗直播。

And there were, like, a lot of people streaming the Super Bowl at the same time.

Speaker 1

意思是,这个

I mean, the

Speaker 2

不是超级碗。

Not the Super Bowl.

Speaker 1

让我们能够从CPU上的异步训练转向完全同步训练的关键,在于我们拥有这些超高速的TPU硬件芯片,以及芯片之间和芯片组内具有惊人带宽的集群。

The thing that let us go from asynchronous training on CPUs to fully synchronous training is the fact that we have these super fast TPU hardware chips and then pods, which have incredible amounts of bandwidth between the chips and a pod.

Speaker 1

而在那之后,要实现更大规模的扩展,则依赖于出色的数据中心网络,甚至跨城市区域的网络,使我们能够在多个城市区域部署大量集群,用于最大规模的训练任务。

Then scaling beyond that, we have really good data center networks and even cross metro area networks that enable us to scale to many, many pods in multiple metro areas for our largest training runs.

Speaker 1

正如诺姆所说,只要梯度累积和跨城市区域的参数通信速度足够快,相对于每一步的耗时而言,我们就能实现完全同步训练。

And we can do that fully synchronously, as Noam said, as long as the gradient accumulation and communication of the parameters across metro areas happens fast enough relative to the step time, you're golden.

Speaker 1

你其实并不在意。

You don't really care.

Speaker 1

但我认为,随着规模不断扩大,我们可能会倾向于在系统中引入比现在更多的异步性,因为我们已经能够让它正常运行。

But I think as you scale up, there may be a push to have a bit more asynchrony in our systems than we have now because we can make it work.

Speaker 1

我们的机器学习研究人员对同步训练所能达到的规模感到非常满意,因为它的思维模型更简单易懂。

Our ML researchers have been really happy with how far we've been able to push synchronous training, because it is an easier mental model to understand.

Speaker 1

你只需要让算法本身与你对抗,而不是让异步性和算法相互争斗。

You just have your algorithm sort of fighting you rather than the asynchrony and the algorithm kind of like battling

Speaker 2

你。

you.

Speaker 2

随着规模扩大,有更多东西在跟你作对。

As you scale up, there are more things fighting you.

Speaker 2

你知道的?

You know?

Speaker 2

比如,确实如此。

Like, there's Yeah.

Speaker 2

我的意思是,没错。

I mean, right.

Speaker 2

这就是扩展时的问题,你其实并不总是知道究竟是什么在跟你作对。

That's the problem with scaling, you know: you don't actually always know what it is that's fighting you.

Speaker 2

是因为你把量化在某些地方推得过头了吗?

Is it, you know, the fact that you've pushed, like, quantization a little too far in some place or another?

Speaker 2

是你的数据问题吗?

Is it your data?

Speaker 2

还是说?

Or is it?

Speaker 1

也许是你的对抗性机器MUQQ17在搞鬼,比如:‘哦,不。’

Maybe it's your adversarial machine MUQQ17 that is like Oh, no.

Speaker 1

把你所有梯度中指数的第七位都设置了

Setting the seventh bit of the exponent in all your gradients

Speaker 2

或者别的什么。

or something.

Speaker 2

对。

Right.

Speaker 2

所有这些因素都会让模型稍微变差,以至于你根本察觉不到问题正在发生。

And all of these things just make the model slightly worse, so you don't even know that the thing is going on.

Speaker 1

所以,神经网络其实有个问题,就是它们对噪声的容忍度太高了。

So That's actually a bit of a problem with neural nets is they're so tolerant of noise.

Speaker 1

你可以在很多方面设置得不太对,但它们总能找到办法绕过去,或者即使有这些问题也能学会应对。

You can have things set up kind of wrong in a lot of ways, and they just kind of figure out ways to work around that, or learn despite it.

Speaker 2

你的代码里可能有bug。

You could have bugs in your code.

Speaker 2

大多数时候,这什么效果都没有。

Most of the time, that does nothing.

Speaker 2

有时候,它会让你的模型变差。

Some of the time, it makes your model worse.

Speaker 2

有时候,它会让你的模型变好。

Some of the time, it makes your model better.

Speaker 2

然后你因为从未在大规模上尝试过这个bug——因为你之前没有预算支持它——而发现了新东西。

And then you discover something new, because you never tried this bug at scale before, because you didn't have the budget for it.

Speaker 2

但是

But

Speaker 0

实际上,调试或解析这些情况会是什么样子呢?你有一些因素让模型变好,有一些让模型变差。

What practically does it look like to actually debug or decode what's going on? You've got these things, some of which are making your model better, some of which are making it worse.

Speaker 0

当你明天去上班时,你会想:好吧,这里到底发生了什么?

You, when you go into work tomorrow, you're like, all right, what's going on here?

Speaker 0

你怎么才能找出最重要的输入是什么?

How do you figure out what the most salient inputs are?

Speaker 2

对。

Right.

Speaker 2

我的意思是,在小规模下,你会做很多实验。

I mean, well, at small scale, you do lots of experiments.

Speaker 2

我认为,研究中有一部分是这样的:我想独立地发明这些改进或突破,这时候你需要一个简洁的代码库,可以分叉、修改,并设置一些基线。

So, I mean, there's, I think, one part of the research that involves, okay, I want to, like, invent these improvements or breakthroughs kind of in isolation, in which case you want a nice simple code base that you can fork and hack, and some baselines.

Speaker 2

我的梦想是,早上醒来后想到一个点子,当天就动手实现,运行一些实验,当天就能得到初步结果,比如:‘这个看起来有希望,这些方法有效,那些没用。’

And, you know, my dream is I wake up in the morning, come up with an idea, hack it up in the day, run some experiments, get some initial results in the day, like, Okay, this looked promising, these things worked and didn't work.

Speaker 2

我认为这完全是可以实现的。

And I think that is very achievable.

Speaker 1

在小规模下。

At small scale.

Speaker 2

在小规模下,只要你保持一个良好的实验代码库,

At small scale, as long as you keep your, you know, keep a nice experimental code base and

Speaker 1

也许一个实验只花一两个小时,而不是两周。

Maybe an experiment takes an hour to run or two hours or something, not two weeks.

Speaker 1

很棒。

It's great.

Speaker 2

很棒。

It's great.

Speaker 2

所以,研究的这一部分是这样的,然后还有一部分是扩大规模。

So so there's that part of the research, and then there's some amount of scaling up.

Speaker 2

接着就是整合的部分,你需要把所有改进叠加在一起,看看它们在大规模下是否有效,以及是否能协同工作。

And then you have the part which is like integrating, where you want to stack all the improvements on top of each other and see if they work at large scale and see if they work all in conjunction with

Speaker 1

彼此之间。

each other.

Speaker 1

对,你可能觉得它们是独立的,但事实上,可能在我们处理视频数据输入的方式和更新模型参数的方式之间,存在一些有趣的交互。

Right, you think maybe they're independent, but actually maybe there's some funny interaction between, you know, improving the way in which we handle video data input and the way in which we, you know, update the model parameters or something.

Speaker 1

对于视频数据,这种交互比其他某些东西更明显。

That interacts more for video data than some other thing.

Speaker 1

可能会发生各种你可能没预料到的交互。

There's all kinds of interactions that can happen that you maybe don't anticipate.

Speaker 1

所以你需要进行这些实验,把一堆东西组合在一起,并定期确认所有你认为好的改进在一起时仍然有效。

And so you want to run these experiments where you're then putting a bunch of things together and then periodically making sure that all the things you think are good are good together.

Speaker 1

如果不行,就要弄清楚为什么它们无法协同工作。

And if not, understanding why they're not playing nicely.

Speaker 0

两个问题。

Two questions.

Speaker 0

第一,有多少次会出现这些改进无法良好叠加的情况?

One, how often does it end up being the case that things don't stack up well together?

Speaker 0

是罕见的情况,还是经常发生?

Is it like a rare thing or does it happen all the time?

Speaker 2

经常发生。

Happens, yeah.

Speaker 2

一直都会。

All the time.

Speaker 1

我的意思是,大多数情况下你根本不会尝试叠加,因为最初的实验效果不好,或者相对于基线来说结果不够有希望。

I mean, I think most things you don't even try to stack because the initial experiment didn't work that well or it showed results that aren't that promising relative to the baseline.

Speaker 1

然后你会把这些东西分别拿出来,尝试单独扩大规模。

And then you sort of take those things and you try to scale them up individually.

Speaker 1

然后你会说,哦,这些看起来真的很有前景。

And then you're like, Oh yeah, these ones seem really promising.

Speaker 1

所以我现在要把它们纳入一个我打算整合起来、推进的系统中,也就是把那些看起来有前景的其他东西也结合起来。

So I'm going to now include them in something that I bundle together and try to advance, combined with other things that seem promising.

Speaker 1

然后你运行实验,结果发现,哦,它们其实并没有那么有效。

And then you run the experiments and then you're like, Oh, well, they didn't really work that well.

Speaker 1

让我们试着调试一下。

Like, let's try to debug.

Speaker 1

为什么?

Why?

Speaker 2

然后这里存在权衡,因为你希望让你的集成系统尽可能简洁,因为复杂性…

And then there are trade offs, because you want to keep your, like, integrated system as clean as you can, because, you know, complexity-

Speaker 2

基于代码。

Based.

Speaker 2

是的,基于代码和算法复杂性,你知道,复杂性会带来问题,它会让事情变慢,增加风险。

Yeah, code based and algorithmically complexity, you know, complexity hurts, complexity makes things slower, introduces more risk.

Speaker 2

但与此同时,你又希望它尽可能好。

And then, you know, at the same time, you want to, you want it to be as, as good as possible.

Speaker 2

当然,每位研究人员都希望自己的发明能被纳入其中。

And of course, every individual researcher, wants, wants his, inventions to go into it.

Speaker 2

所以这里确实存在挑战,但我们一直合作得相当不错。

So there are definitely challenges there, but we've been working together quite well.

Speaker 0

我的赞助商简街资本发明了一款叫Figgie的纸牌游戏,用来向新交易员传授市场和交易的基本知识。

My sponsors, Jane Street, invented a card game called Figgie in order to teach their new traders the basics of markets and trading.

Speaker 0

我是扑克爱好者,我觉得Figgie就像扑克一样,都涉及隐藏信息,但它的激烈程度和社交性要强得多。

I'm a poker fan, and I'd say that Figgie is like poker in the sense that there's hidden information, but it's much more intense and social.

Speaker 0

在扑克中,你通常只是坐着等轮到自己,而在Figgie中,你整个时间都在向其他玩家大喊出价和要价。

In poker, you're usually just sitting around waiting for your turn, whereas in Figgie, you spend the whole time just shouting bids and asks at the other players.

Speaker 0

这个游戏当然最终会有赢家,但在每一轮中,你都被激励去与其他玩家寻找互利的交易。

The game is set up such that there's a winner in the end, of course, but during each turn, you are incentivized to find mutually beneficial trades with the other players.

Speaker 0

事实上,这个游戏所奖励的主要技能就是这一点。

And in fact, that's the main skill that the game rewards.

Speaker 0

Figgie 模拟了交易中最令人兴奋的部分。

Figgie simulates the most exciting parts of trading.

Speaker 0

Jane Street 的员工非常喜欢这个游戏,因此他们每年都会举办一次内部办公室 Figgie 冠军赛。

Jane Streeters enjoy it so much that they hold an in-office Figgie championship every single year.

Speaker 0

你可以通过在 App Store 下载来自己玩,或者在桌面端访问 figgie.com 玩。

You can play it yourself by downloading it on the App Store, or you can find it on desktop at figgie.com.

Speaker 0

好的。

Alright.

Speaker 0

我们回到 Jeff 和 Noam。

Back to Jeff and Noam.

Speaker 0

好的。

Okay.

Speaker 0

那么,回到你们不断发现更优的算法改进、模型随着时间推移变得越来越好的整个动态过程,即使不考虑硬件因素。

So then, going back to the whole dynamic where you find better and better algorithmic improvements and the models get better and better over time, even if you take the hardware part out of it.

Speaker 0

世界是否应该更多地思考这个问题?你们是否也应该更多地关注这一点?

Should the world, and should you guys, be thinking more about this?

Speaker 0

有一种情况是,人工智能只是随着时间缓慢进步,需要大约二十年才能逐渐变好。

There's one world where AI is just a thing that takes like two decades to slowly get better over time.

Speaker 0

如果你搞砸了某些事情,你可以慢慢修正,这并不是什么大问题,对吧?

And you can sort of refine things over time: if you've messed something up, you fix it, and it's not that big a deal, right?

Speaker 0

它比你之前发布的版本并没有好太多。

It's like not that much better than the previous version you released.

Speaker 0

还有一种情况是,存在一个强大的反馈循环,这意味着从Gemini四到Gemini五之间的两年,将成为人类历史上最重要的两年,因为你从一个相当优秀的机器学习研究者,迅速跃升为超人类智能,而这正是反馈循环带来的结果。

There's another world where you have this big feedback loop, which means that the two years between Gemini four and Gemini five are the most important years in human history, because you go from a pretty good ML researcher to superhuman intelligence because of this feedback loop.

Speaker 0

如果你认为第二种情况是可能的,那么这会如何改变你应对日益增强的智能的方式?

To the extent that you think that second world is plausible, how does that change how you sort of approach these greater and greater levels of intelligence?

Speaker 2

我已不再打扫车库了,因为我正等着机器人来帮我。

I've stopped cleaning my garage because I'm waiting for the robots.

Speaker 2

你知道的?

You know?

Speaker 2

所以,我更倾向于第二种观点,即我们会看到大量的加速。

So probably I'm more in the second camp, that we're gonna see a lot of acceleration.

Speaker 1

是的。

Yeah.

Speaker 1

我的意思是,理解正在发生什么以及趋势是什么非常重要。

I mean, I think it's super important to understand what's going on and what the trends are.

Speaker 1

我认为,目前的趋势是模型在每一代中都有显著的提升。

And I think right now the trends are the models are getting substantially better generation over generation.

Speaker 1

而且我认为,在接下来的几代中,这种趋势不太可能放缓。

And I don't see that slowing down in the next few generations probably.

Speaker 1

这意味着,再过两到三代的模型,将能够做到:比如,把一个简单任务分解成10个子任务并80%的成功完成,升级为把一个高度复杂的任务分解成100个甚至1000个子任务,并且90%的时间都能正确完成。

So that means the models, say, two to three generations from now are gonna be capable of, to go back to the earlier example, going from breaking down a simple task into 10 sub-pieces and doing it 80% of the time, to something that can break down a very high-level task into a hundred or a thousand pieces and get that right 90% of the time.

Speaker 1

这标志着模型能力的巨大飞跃。

That's a major, major step up in what the models are capable of.
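A brief aside on why per-step reliability matters so much as decompositions grow: if an agent had to get each of n sub-steps right independently with probability p, the chance the whole plan survives would be p to the power n. The numbers below are illustrative and simplified (real agents can retry and self-correct, which is exactly what makes the step-up described here meaningful), not figures from the conversation.

```python
# Illustrative: probability that an n-step plan succeeds end to end
# if every sub-step independently succeeds with probability p.
def plan_success(p: float, n: int) -> float:
    """Probability that all n independent sub-steps succeed."""
    return p ** n

# Even fairly reliable steps compound badly at scale, which is why
# error recovery matters more than raw step count.
for p, n in [(0.8, 10), (0.9, 100), (0.99, 1000)]:
    print(f"p={p}, n={n}: {plan_success(p, n):.6f}")
```

With 80% reliable steps, a 10-step plan succeeds only about 11% of the time, and 99%-reliable steps still almost never complete a 1000-step plan without some correction mechanism.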

Speaker 1

因此,我认为让人们理解这个领域正在发生的进展非常重要。

So I think it's important for people to understand what is happening in the progress in the field.

Speaker 1

然后这些模型将被应用于众多不同领域。

And then those models are gonna be applied in a bunch of different domains.

Speaker 1

我认为确保社会能够最大限度地受益于这些模型在改善事物方面的潜力非常重要,比如在教育和医疗领域,让所有人都能获取信息,我对此充满期待。

And I think it's really important to make sure that we as a society get the maximal benefits from what these models can do to improve things. You know, I'm super excited about areas like education and healthcare, making information accessible to all people.

Speaker 1

但我们同时也意识到,它们可能被用于制造虚假信息。

But we also realise that they could be used for misinformation.

Speaker 1

它们可能被用于自动化攻击计算机系统。

They could be used for automated hacking of computer systems.

Speaker 1

我们希望尽可能多地建立防护措施和缓解机制,并充分理解这些模型的能力。

And we wanna put in place as many safeguards and mitigations as we can, and understand the capabilities of the models.

Speaker 1

我认为,谷歌整体上对如何应对这一问题有着非常明智的见解。

And I think Google as a whole has a really good view of how we should approach this.

Speaker 1

我们的负责任AI原则实际上为如何在不同场景下权衡发布越来越强大的AI系统提供了很好的框架,同时确保我们做正确的事,比如保证它们的安全性,不传播有害内容等。

You know, our responsible AI principles actually are a pretty nice framework for how to think about the trade-offs of making better and better AI systems available in different contexts and settings, while also making sure that we're doing the right thing in terms of keeping them safe and not saying toxic things, and things like that.

Speaker 0

我觉得让我印象深刻的是,你从宏观角度审视了这段人类历史:如果我们处于这样一个世界——比如,如果你对Gemini三号进行糟糕的后训练,它可能会产生虚假信息,但只要你修正后训练,它就会停止这么做。

I guess the thing that stands out to me, zooming out and looking at this period of human history: if we're in the world where, look, maybe if you do post-training on Gemini three badly, it can do some misinformation, but then you fix the post-training and it's gonna stop doing that.

Speaker 0

这是一个错误,但可以修复。

It's a bad mistake, but it's a fixable mistake.

Speaker 0

对吧?

Right?

Speaker 0

但如果你面临这种反馈循环的可能性,那么导致智能爆炸的错误可能是模型目标错位——它并没有尝试编写你认为它在写的代码,而是在优化其他目标。

Whereas if you have this feedback loop dynamic, which is a possibility, then the mistake that catapults this intelligence explosion is misalignment: the model is not trying to write the code you think it's trying to write, and is optimizing for some other objective.

Speaker 0

在这个非常迅速的过程结束时,可能只持续几年甚至更短,你会得到接近甚至超越杰夫·迪恩或诺姆·沙泽尔水平的系统。

And on the other end of this very rapid process that lasts a couple of years, maybe less, you have things that are approaching or beyond the level of Jeff Dean or Noam Shazeer.

Speaker 0

然后你会拥有数以百万计的杰夫·迪恩级别的程序员副本。

And then you have, like, millions of copies of Jeff Dean level programmers.

Speaker 0

无论如何,这似乎是一个更难挽回的错误,也显得更加突出。

And anyways, that seems like a harder-to-recover mistake, and that seems much more salient.

Speaker 0

你必须确保我们在智能爆炸的正确方向上前进。

You really gotta make sure we're going in the right direction into the intelligence explosion.

Speaker 2

随着这些系统变得越来越强大,你必须越来越小心,越来越谨慎。

As these systems do get more powerful, you know, you've gotta be more and more careful.

Speaker 1

我的意思是,两端都有极端的观点。

I mean, one thing I would say is there are, like, extreme views on either end.

Speaker 1

比如,天哪。

There's like, oh my goodness.

Speaker 1

这些系统在所有方面都会远远超越人类。

These systems are gonna like be so much better than humans at all things.

Speaker 1

我们会因此感到不堪重负。

And we're gonna be kind of overwhelmed.

Speaker 1

而另一端的观点则认为,这些系统会非常出色,我们根本不必担心它们。

And then there's the like, these systems are gonna be amazing and we don't have to worry about them at all.

Speaker 1

我认为自己处于中间立场,我是一篇名为《塑造人工智能》论文的合著者,这篇论文指出,这两种极端观点常常把我们的角色视为放任自流,认为人工智能会自然沿着它自己的路径发展。

I think I'm somewhere in the middle, and I'm a co-author on a paper called Shaping AI, which argues that those two extreme views often view our role as kind of laissez-faire, like we're just going to have AI develop along whatever path it takes.

Speaker 1

但我认为,实际上有非常充分的理由表明,我们应该努力塑造和引导人工智能在世界上的部署方式,使其在教育、我提到的医疗等领域实现最大化的益处,同时尽可能通过政策、技术措施和安全机制,阻止人工智能接管并获得对其行为的无限控制权。

And I think there's actually a really good argument to be made that what we should do is try to shape and steer the way in which AI is deployed in the world, so that it is maximally beneficial in the areas we want to benefit from, like education and some of the areas I mentioned, healthcare, and steer it as much as we can, maybe with policy-related things, maybe with technical measures and safeguards, away from "the computer will take over and have unlimited control of what it can do."

Speaker 1

我认为这本质上是一个工程问题:如何设计出安全的系统?

I think that's an engineering problem is how do you engineer safe systems?
