本集简介
双语字幕
所以我们三年前聊过。
So we talked three years ago.
我想知道,在你看来,过去三年最大的变化是什么?
I'm curious in your view, what has been the biggest update of the last three years?
和三年前的感受相比,现在最大的不同是什么?
What has been the biggest difference between what it felt like three years ago versus now?
是的。
Yeah.
我会说,实际上是底层技术,也就是技术的指数级发展,总体而言,和我预期的差不多。
I would say actually the underlying technology, like the exponential of the technology, has gone broadly speaking, I would say about as I expected it to go.
我的意思是,这里可能前后差一两年。
I mean, there's plus or minus a year or two here.
那里也可能差一两年。
There's plus or minus a year or two there.
我未必能预测到代码这个具体的发展方向。
I don't know that I would have predicted the specific direction of code.
但当我观察这种指数增长时,它大致符合我的预期:模型从聪明的高中生,发展到聪明的大学生,再到开始做博士和专业工作,而在代码领域,甚至超越了这一点。
But actually when I look at the exponential, it is roughly what I expected in terms of the march of the models from smart high school student to smart college student to beginning to do PhD and professional stuff, and in the case of code, reaching beyond that.
前沿的发展有些不均衡。
The frontier is a little bit uneven.
这大致符合我的预期。
It's roughly what I expected.
不过,我要告诉你最令人惊讶的是什么。
I will tell you though what the most surprising thing has been.
最令人惊讶的是,公众几乎没有意识到我们已经接近指数增长的终点。
The most surprising thing has been the lack of public recognition of how close we are to the end of the exponential.
对我来说,真是不可思议:无论是身处泡沫之内还是之外的人,都还在谈论我们周围那些老生常谈的政治热点问题。
To me, it is absolutely wild that you have people within the bubble and outside the bubble talking about the same tired old hot-button political issues around us.
我们正接近指数增长的终点。
We're, like, near the end of the exponential.
我想弄清楚现在的这种指数增长究竟是什么样子,因为三年前我们录制时,我问你的第一个问题是:扩展进展如何?
I wanna understand what that exponential looks like right now, because the first question I asked you when we recorded three years ago was, what's up with scaling?
为什么它有效?
Why does it work?
我现在也有一个类似的问题,但我觉得这个问题更复杂,因为从公众的角度来看,三年前,这些是广为人知的公共趋势:在多个数量级的计算资源下,我们可以看到损失是如何改善的。
And I have a similar question now, but I feel like it's a more complicated question because, at least from the public's point of view, three years ago there were these well-known public trends where, across many orders of magnitude of compute, you could see how the loss improves.
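The well-known pre-training trend referenced here is a power-law fit of loss against compute. As a purely illustrative sketch, the functional form looks roughly like the following; the constants `L_inf`, `A`, and `alpha` are invented for illustration, not any lab's published fit:

```python
import numpy as np

# Hypothetical power-law scaling curve: loss = L_inf + A * C**(-alpha).
# The constants are made up to illustrate the shape of the trend.
def loss(compute, L_inf=1.7, A=8.0, alpha=0.05):
    return L_inf + A * compute ** (-alpha)

compute = np.logspace(18, 24, 4)  # FLOPs spanning several orders of magnitude
losses = loss(compute)

# Loss improves smoothly and predictably as compute grows.
assert np.all(np.diff(losses) < 0)
print(losses.round(3))
```

Plotted on log-log axes, a fit like this appears as a near-straight line, which is why the trend was legible across many orders of magnitude.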
而现在我们有了强化学习的扩展,但还没有公开已知的扩展定律。
And now we have RL scaling and there's no publicly known scaling law for it.
甚至连这背后的故事都不清楚:这究竟是为了教模型技能吗?
It's not even clear what exactly the story is of, is this supposed to be teaching the model skills?
还是为了教元学习?
Is this supposed to be teaching meta learning?
目前的扩展假设到底是什么?
What is the scaling hypothesis at this point?
是的。
Yeah.
实际上,我持有的假设和早在2017年时一样。
So I have actually the same hypothesis that I had even all the way back in 2017.
2017年的时候,我想我上次提到过,我写了一份叫做‘计算量大块假设’的文档。
So in 2017, I think I talked about it last time, but I wrote a doc called the big blob of compute hypothesis.
它并不是专门关于语言模型的扩展。
It wasn't about the scaling of language models in particular.
我写这份文档的时候,GPT-1刚刚发布,对吧?
When I wrote it, GPT-1 had just come out, right?
所以那只是众多方向中的一个,对吧?
So that was one among many things, right?
那时候,机器人技术是热门方向。
Back in those days, there was robotics.
人们试图把推理作为一个与语言模型分离的独立领域来研究。
People tried to work on reasoning as a separate thing from language models.
当时还存在像AlphaGo、OpenAI的DOTA和DeepMind的StarCraft(AlphaStar)那样的强化学习扩展研究。
There was scaling of the kind of RL that happened in AlphaGo, that happened with Dota at OpenAI and, as people remember, StarCraft at DeepMind with AlphaStar.
所以这份文档是作为一个更通用的框架撰写的。
So it was written as a more general document.
我具体说的是这样一件事,几年后里奇·萨顿提出了‘痛苦的教训’,但这个假设本质上是一样的。
The specific thing I said was the following. Rich Sutton put out the Bitter Lesson a couple of years later, but the hypothesis is basically the same.
它说的是,所有那些聪明的技巧、所有我们以为需要新方法才能实现某些目标的东西,其实都没那么重要。
So what it says is: all the cleverness, all the techniques, all the "we need a new method to do something" stuff, that doesn't matter very much.
真正重要的只有少数几件事。
There are only a few things that matter.
我想我列出了其中七项。
I think I listed seven of them.
第一是你可以获得多少原始算力。
One is like how much raw compute you have.
第二是你拥有的数据量。
The other is the quantity of data that you have.
第三是数据的质量和分布,对吧?
Then the third is kind of the quality and distribution of data, right?
数据的分布必须足够广泛。
It needs to be a broad distribution of data.
第四点是我认为你训练的时间有多长。
The fourth is I think how long you train for.
第五点是你需要一个能够扩展到极致的目标函数。
The fifth is you need an objective function that can scale to the moon.
所以预训练目标函数就是这样一个目标函数。
So the pre training objective function is one such objective function.
对吧?
Right?
另一个目标函数是强化学习中的目标函数,它意味着你有一个目标,你要去实现它。
Another objective function is the kind of RL objective function that says like you have a goal, you're going to go out and reach the goal.
在这个过程中,当然会有像数学和编程中那样的客观奖励。
Within that, of course, there's objective rewards you know, like you see in math and coding.
还有更主观的奖励,比如来自人类反馈的强化学习,或者更高阶的版本。
And there's more subjective rewards like you see in RL from human feedback or kind of higher order versions of that.
第六和第七点则涉及归一化或条件化之类的东西,主要是为了保证数值稳定性,让大量的计算能够平稳流动,而不是遇到问题。
And then the sixth and seventh were things around kind of like normalization or conditioning, just getting the numerical stability so that the big blob of compute flows in this laminar way instead of running into problems.
所以这就是当时的假设。
So that was the hypothesis.
而这个假设我至今仍然坚持。
And it's hypothesis I still hold.
我还没看到多少与这个假设相悖的证据。
I don't think I've seen very much that is not in line with that hypothesis.
因此,预训练的扩展定律就是我们在这里看到的现象的一个例子。
And so the pre-training scaling laws were one example of the kind of thing we see there.
事实上,这些趋势仍在持续。
And indeed, those have continued going.
比如,现在大家都广泛报道了,我们对预训练感到乐观。
Like, you know, I think now it's been widely reported that we feel good about pre-training.
比如,预训练仍在持续为我们带来收益。
Like pre training is continuing to give us gains.
发生变化的是,现在我们也在强化学习中看到了同样的现象,对吧?
What has changed is that now we're also seeing the same thing for RL, right?
所以我们看到了一个预训练阶段,然后在这个基础上还有一个强化学习阶段。
So we're seeing a pre training phase and then we're seeing like an RL phase on top of that.
对于强化学习来说,情况其实是一样的。
And with RL, it's actually just the same.
甚至其他公司也在一些发布中提到,比如我们用数学竞赛、AIME或其他类似内容来训练模型。
Even other companies have published things in some of their releases that say, look, we train the model on math contests, AIME, or other things like that.
模型的表现与训练时长呈对数线性关系。
And how well the model does is log-linear in how long we've trained it.
我们也能看到这一点。
And we see that as well.
这不仅仅是数学竞赛。
It's not just math contest.
这涉及多种多样的强化学习任务。
It's a wide variety of RL tasks.
因此,我们在强化学习中看到了与预训练相同的扩展趋势。
And so we're seeing the same scaling in RL that we saw for pre training.
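The log-linear claim above can be sketched numerically. This is purely illustrative; the slope, intercept, and step counts below are hypothetical, not Anthropic data:

```python
import numpy as np

# Hypothetical log-linear RL scaling: score = a * log10(train_steps) + b.
# All numbers are invented to illustrate the shape of the claim.
train_steps = np.array([1e3, 1e4, 1e5, 1e6])
a, b = 15.0, -20.0  # hypothetical slope and intercept
scores = a * np.log10(train_steps) + b

# Each 10x increase in RL training adds a constant amount to the score,
# which is what "log-linear in how long we've trained it" means.
print(scores)           # [25. 40. 55. 70.]
print(np.diff(scores))  # [15. 15. 15.]
```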
你提到了理查德·萨顿在《痛苦的教训》中的观点。
You mentioned Richard Sutton in The Bitter Lesson.
是的。
Yeah.
我去年采访过他,他实际上并不认同大语言模型的主流观点。
I interviewed him last year, and he is actually very much not LLM-pilled.
我不知道这是否是他的观点,但一种可以概括这种异议的说法是:真正具备人类学习核心能力的系统,不应该需要数十亿级别的数据和算力,以及专门构建的环境,才能学会使用Excel或PowerPoint、浏览网页。
And if I'm I don't know if this is his perspective, but one way to paraphrase this objection is something like, look, something which possesses the true core of human learning would not require all these billions of dollars of data and compute and these bespoke environments to learn how to use Excel or how to use PowerPoint, how to navigate a web browser.
而我们必须通过这些强化学习环境来构建这些技能,这暗示我们实际上还缺乏人类学习的核心算法。
And the fact that we have to build in these skills using these RL environments hints that we're actually lacking this core human learning algorithm.
因此,我们正在扩展错误的方向。
And so we're scaling the wrong thing.
所以,是的,这确实提出了一个问题。
And so, yeah, that that does raise the question.
如果我们确实认为存在某种能够像人类一样即时学习的东西,那我们为什么还要进行所有这些强化学习的扩展呢?
Why are we doing all this RL scaling if we do think there's something that's gonna be human like in its ability to learn on the fly?
是的。
Yeah.
对。
Yeah.
所以我认为,这实际上把几个本应被不同看待的问题混在一起了。
So I think this puts together several things that should be thought of differently.
明白。
Yeah.
我认为这里确实有一个真正的难题,但它可能并不重要。
I think there is a genuine puzzle here, but it may not matter.
事实上,我猜它很可能并不重要。
In fact, I would guess it probably doesn't matter.
所以,我们先暂时把强化学习放一边,因为实际上我认为,在这个问题上,强化学习与预训练并没有什么不同,这种说法是在转移注意力。
So let's take the RL out of it for a second, because I actually think it's a red herring to say that RL is any different from pre-training in this matter.
那么,如果我们看看预训练的扩展过程,会发现它非常有趣。
So if we look at pre training scaling, it was very interesting.
早在2017年,当Alec Radford在做GPT-1的时候。
Back in 2017, when Alec Radford was doing GPT-1.
如果你看看GPT-1之前的模型,它们都是在这些不能代表广泛文本分布的数据集上训练的。
If you look at the models before GPT-1, they were trained on these datasets that didn't represent a wide distribution of text.
对吧?
Right?
我们有这些非常标准的语言建模基准测试。
You had these very standard language modeling benchmarks.
而GPT-1本身是在一堆我认为其实是同人小说上训练的。
And GPT-1 itself was trained on a bunch of, I think it was fan fiction, actually.
但那是文学文本,只占你能获取的文本中非常小的一部分。
But it was literary text, which is a very small fraction of the text that you get.
在那个年代,我们发现数据量大约是十亿个单词左右。
And what we found with that, in those days, was like a billion words or something.
所以数据集很小,代表的分布也非常狭窄,对吧?
So it was small datasets, and they represented a pretty narrow distribution, right?
也就是你在世界上能看到的那类文本的狭窄分布。
Like a narrow distribution of kind of what you can see in the world.
而且它的泛化能力很差。
And it didn't generalize well.
即使你在那个语料库上表现更好,呃,我忘了它叫什么,反正是某种同人小说语料库,
If you did better on, you know, I forget what it was called, but it was some kind of fan fiction corpus,
它也很难很好地泛化到其他类型的文本上。我们当时有很多指标来衡量模型预测各种其他文本的表现,但你根本看不到泛化效果。
it wouldn't generalize that well to other kinds of text. We had all these measures of how well the model does at predicting all these other kinds of text, and you really didn't see the generalization.
只有当你在互联网上所有任务上进行训练时,比如对互联网进行一次全面的抓取——像Common Crawl这样的数据源,或者抓取Reddit上的链接,这正是我们在GPT-2中所做的,
It was only when you trained over all the tasks on the Internet, when you did a general Internet scrape, right, from something like Common Crawl or scraping links on Reddit, which is what we did for GPT-2.
只有当你这么做时,才真正开始获得泛化能力。
It's only when you do that, that you kind of started to get generalization.
我认为我们在强化学习中也看到了同样的情况,最初是从非常简单的强化学习任务开始,比如在数学竞赛上进行训练。
And I think we're seeing the same thing on RL, that we're starting with first very simple RL tasks like training on math competitions.
然后我们逐渐转向更广泛的训练,涉及像代码这样的任务。
Then we're kind of moving to kind of broader training that involves things like code as a task.
现在我们正在转向执行许多其他任务。
And now we're moving to do kind of many other tasks.
然后我认为我们将越来越获得泛化能力。
And then I think we're going to increasingly get generalization.
所以这排除了强化学习与预训练方面的区别。
So that takes out the RL versus the pre training side of it.
但无论如何,这里似乎有一个谜题:在预训练中,当我们用预训练数据训练模型时,我们会使用数万亿个词元。
But I think there is a puzzle here either way, which is that when we train the model on pre-training, we use trillions of tokens.
对吧?
Right?
而人类并不会接触到数万亿个单词。
And humans don't see trillions of words.
所以这里确实存在样本效率的差异。
So there is an actual sample efficiency difference here.
这里实际上发生了一些不同的事情,那就是模型从零开始,必须接受大量的训练。
There is actually something different happening here, which is that the models start from scratch and they have to get much more training.
但我们还发现,一旦训练完成,如果我们给予它们很长的上下文长度,唯一阻碍长上下文长度的是推理过程。
But we also see that once they're trained, if we give them a long context length, the only thing blocking a long context length is inference.
但如果我们给它们一百万的上下文长度,它们在该上下文内学习和适应的能力非常强。
But if we give them like a context length of a million, they're very good at learning and adapting within that context length.
所以我不知道这个问题的完整答案,但我认为在预训练中,这个过程并不像人类学习的过程。
And so I don't know the full answer to this, but I think there's something going on where pre-training is not like the process of humans learning.
它介于人类学习和人类进化之间。
It's somewhere between the process of humans learning and the process of human evolution.
这就像我们从进化中获得了许多先验知识。
It's like, we get many of our priors from evolution.
我们的大脑并不是一块白板。
Our brain isn't just a blank slate.
对吧?
Right?
已经写过很多关于这个的书了。
Whole books have been written about this.
我认为语言模型更像是白板。
I think the language models, they're much more blank slates.
它们本质上是从随机权重开始的,而人脑则一开始就具备各种区域,并连接着众多输入和输出。
They literally start as like random weights, whereas the human brain starts with all these regions, it's connected to all these inputs and outputs.
也许我们应该把预训练,以及强化学习,看作是介于人类进化和人类即时学习之间的某种中间过程。
Maybe we should think of pre training and for that matter RL as well as being something that exists in the middle space between human evolution and human on the spot learning.
而模型的上下文学习,则介于人类的长期学习和短期学习之间。
And as the in context learning that the models do as as something between long term human learning and short term human learning.
所以,你知道,这里存在一个层次结构:进化、长期学习、短期学习,以及人类的即时反应。
So, you know, there there's this hierarchy of, like, there's evolution, there's long term learning, there's short term learning, and there's just human reaction.
大语言模型的各个阶段就分布在这条谱系上,但并不一定恰好落在相同的点上。
And the LLM phases exist along this spectrum, but not necessarily exactly at the same points.
人类的一些学习方式在LLM中没有对应的机制。
There's no analog to some of the human modes of learning.
大型语言模型恰恰落在这些点之间的位置。
The LLMs are kind of falling between the points.
这说得通吗?
Does that make sense?
是的。
Yes.
不过有些地方还是有点令人困惑。
Although some things are still a bit confusing.
比如,如果这个类比是说这就像进化,那么它不够样本高效也没关系,但既然我们希望通过上下文学习获得那种极其样本高效的智能体,那为什么还要费心去构建那些RL环境公司呢?它们似乎在教模型如何使用API、如何使用Slack、如何使用各种工具。
For example, if the analogy is that this is like evolution, so it's fine that it's not that sample efficient, well, if we're going to get this super-sample-efficient agent from in-context learning, why are we bothering to build in these skills? There are RL environment companies, and it seems like what they're doing is teaching it how to use this API, how to use Slack, how to use whatever.
这让我很困惑,为什么如此强调这一点。
It's confusing to me why there's so much emphasis on that.
如果那种能够即兴学习的智能体已经出现、即将出现,或者已经存在了。
If the kind of agent that can just learn on the fly is emerging or is gonna soon emerge or has already emerged.
是的。
Yeah.
对。
Yeah.
我的意思是,我无法代表其他人解释他们为何如此重视这一点。
So I mean, I can't speak for the emphasis of anyone else.
我只能谈谈我们是如何看待这个问题的。
I can only talk about how we think about it.
我认为我们的想法是,目标并不是在强化学习中教会模型所有可能的技能,就像我们在预训练中也不会那样做。
I think the way we think about it is the goal is not to teach the model every possible skill within RL, just as we don't do that within pre training.
对吧?
Right?
在预训练中,我们并不是试图让模型接触所有可能的词语组合方式。
Within pre training, we're not trying to expose the model to every possible way that words could be put together.
对吧?
Right?
而是模型在大量事物上进行训练,从而在预训练中实现了泛化。
It's rather that the model trains on a lot of things and then it reaches generalization across pre training.
对吧?
Right?
我近距离见证了从GPT-1到GPT-2的转变,就是模型达到某个临界点的时候。
That was the transition from GPT-1 to GPT-2, which I saw up close: the model reaches a point.
你知道的?
You know?
我有过几次这样的时刻,心想:哦,确实如此。
I had these moments where I was like, oh, yeah.
你只要给模型一组数字,比如,你知道的,这是房子的价格。
You just give the model a list of numbers that's like, you know, this is the cost of the house.
这是房子的平方英尺。
This is the square feet of the house.
模型就能自动补全模式,完成线性回归。
And the model completes the pattern and does linear regression.
不算很好,但它确实做到了,而且它以前从未见过完全相同的东西。
Not great, but it does it, but it's never seen that exact thing before.
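To make the anecdote concrete, here is what "completing the pattern" amounts to if done explicitly. The house figures below are invented; the point is only that a list of (square feet, price) pairs implicitly specifies a regression that can be continued:

```python
import numpy as np

# Toy data standing in for the kind of list in the anecdote.
# The figures are invented and exactly linear to keep the example clean.
sqft = np.array([1000.0, 1500.0, 2000.0, 2500.0])
price = np.array([200_000.0, 300_000.0, 400_000.0, 500_000.0])

# Explicit least-squares fit of price = w * sqft + c.
w, c = np.polyfit(sqft, price, deg=1)

# "Completing the pattern" for an unseen house:
predicted = float(w * 3000 + c)
print(round(predicted))  # 600000 on this exactly linear toy data
```

A language model given such a list in its prompt is doing a noisy, implicit version of this fit from context alone, with no gradient updates.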
因此,当我们构建这些强化学习环境时,目标与五年前或十年前预训练的做法非常相似——我们试图获取大量数据,并不是因为想覆盖某个特定文档或特定技能,而是为了实现泛化。
So to the extent that we are building these RL environments, the goal is very similar to what was done five or ten years ago with pre-training: we're trying to get a whole bunch of data, not because we want to cover a specific document or a specific skill, but because we want to generalize.
我的意思是,你提出的这个框架显然很有道理。
I mean, I think the framework you're laying down obviously makes sense.
我们正在朝着通用人工智能取得进展。
Like, we're making progress towards AGI.
我认为关键在于,目前没有人会不同意我们将在本世纪实现通用人工智能。
I think the crux is something like, nobody at this point disagrees that we're gonna achieve AGI in this century.
而关键在于,你说我们正接近指数增长的尽头,但别人看到这个却说,是的,我们正在取得进展。
And the crux is you say we're hitting the end of the exponential and somebody else looks at this and says, oh yeah, we're making progress.
自2012年以来,我们一直在取得进展。
We've been making progress since 2012.
到2035年,我们将拥有类人代理。
And then 2035, we'll have a human like agent.
所以我想了解你究竟看到了什么,让你觉得我们确实在这些模型中看到了类似进化或人类一生中学习所发生的现象。
And so I wanna understand what it is that you're seeing, which makes you think, yeah, obviously we're seeing the kinds of things that evolution did or that human within human lifetime learning is like in these models.
那为什么你觉得距离实现只有一年,而不是十年呢?
And why think that it's one year away and not ten years away?
我实际上认为这里有两种情况,或者说两种主张,一种更强,另一种较弱。
I actually think of it as two cases to be made here, or two claims you could make, one of which is stronger and the other of which is weaker.
所以我想先从较弱的主张说起,当我2019年第一次看到规模扩展时,我并不确定。
So I think starting with the weaker claim, when I first saw the scaling back in 2019, I wasn't sure.
这当时是一个五五开的事情。
This was kind of a fifty-fifty thing.
我认为我看到了一些东西,我的观点是:这种情况发生的可能性远比任何人想象的要大。
I thought I saw something, and my claim was: this is much more likely than anyone thinks it is.
这太疯狂了。
This is wild.
根本没人会考虑这种可能性。
No one else would even consider this.
也许这件事有50%的可能性会发生。
Maybe there's a 50% chance this happens.
基于你所说的假设——十年内,我们将在数据中心实现我所说的‘天才国家’,
On the basic hypothesis of, as you put it, within ten years we'll get to what I call a country of geniuses in a data center,
我对这一点的把握有90%。
I'm at 90% on that.
很难再高于90%,因为世界太不可预测了。
And it's hard to go much higher than 90% because the world is so unpredictable.
不可削减的不确定性可能出现在95%的时候,比如,我不知道,可能多家公司内部出现动荡,导致什么都没发生。
Maybe the irreducible uncertainty would be if we were at 95%, where you get to things like, I don't know, maybe multiple companies have internal turmoil and nothing happens.
然后台湾被入侵,所有的晶圆厂都被导弹炸毁,于是——
And then Taiwan gets invaded and all the fabs get blown up by missiles, and then...
你会为这个情景干杯。
Now you would drink to that scenario.
是的。
Yeah.
是的。
Yeah.
对。
Yeah.
你可以构想一种情景,有5%的可能性,或者你可以想象一个5%的世界,在那里事情被推迟了十年。
You know, you could construct a scenario where there's a 5% chance of that, or a 5% world where things get delayed for ten years.
这可能是5%。
That's maybe 5%.
还有另外5%,就是我对那些可以验证的任务非常有信心。
There's another 5%, which is that I'm very confident on tasks that can be verified.
所以我认为,在编程方面,除了这种不可消除的不确定性之外,我认为我们一两年内就能达到那个水平。
So I think with coding, except for that irreducible uncertainty, I think we'll be there in one or two years.
十年内我们肯定能在端到端编程方面做到这一点。
There's no way we will not be there in ten years in terms of being able to do it end to end coding.
我唯一一点根本性的不确定性,即使在长期尺度上,也是关于那些无法验证的任务,比如规划火星任务,或者做一些基础科学发现,比如CRISPR,或者写一部小说——这些任务很难验证。
My one little bit of fundamental uncertainty, even on long time scales, is this thing about tasks that aren't verifiable: planning a mission to Mars, doing some fundamental scientific discovery like CRISPR, writing a novel. Those tasks are hard to verify.
我几乎可以肯定我们有一条可靠的路径能够实现这个目标,但如果存在一丝不确定性,那就是在这里。
I am almost certain that we have a reliable path to get there, but if there is a little bit of uncertainty, it's there.
所以,关于十年内实现这一点,我的把握大约是90%,这差不多是你能达到的最高确信度了。
So on the ten years, I'm at, you know, 90%, which is about as certain as you can be.
我觉得,认为到2035年这还不会发生,这种想法在我看来是疯狂的。
Like, I think it's crazy to say that this won't happen by 2035.
在任何一个理智的世界里,这种观点都会被视为非主流。
Like, in some sane world, it would be outside the mainstream.
但是,对验证的强调让我觉得,这表明你并不完全相信这些模型具有泛化能力。
But the emphasis on verification hints to me at a lack of belief that these models generalize.
如果你想想人类,我们既擅长那些能获得可验证回报的事情,也擅长那些无法验证的事情。
If you think about humans, we are good both at things for which we get verifiable reward and at things for which we don't.
你的意思是……
You're like, you have a...
好
Good
不。
No.
这就是我几乎确定的原因。
This is why I'm almost sure.
我们已经看到,从可验证的事物到不可验证的事物之间出现了显著的泛化,这种现象我们已经在发生了。
We already see substantial generalization from things that verify to things that don't. We're already seeing that.
但你似乎在强调这是一个会分化的连续谱,这意味着你看到了更多进展。
But it seems like you were emphasizing this as a spectrum which will split apart, which means you see more progress there.
而我觉得,这并不像人类获得能力的方式。
And I'm like, but that doesn't seem like how humans gain abilities.
在我们未能实现目标的世界里,或者在我们未能到达那里的世界里,我们只是做了所有那些可验证的事情。
The world in which we don't make it, the world in which we don't get there, is the world in which we do all the things that are verifiable.
然后其中许多事物确实泛化了,但我们却没能完全抵达目标。
And then many of them generalize, but we somehow don't get fully there.
我们没有,没有,没有完全做到,你知道的,我们没有完全填满这个盒子的这一侧。
We don't fully, you know, color in this side of the box.
这不是一个非黑即白的问题。
It's not a binary thing.
但即使在泛化能力较弱的世界里,当你只谈论可验证的领域时,我也不清楚在这种情况下能否自动化软件工程,因为从某种意义上说,你本身就是一名软件工程师。
But it also seems to me that even in the world where generalization is weak, where you only get verifiable domains, it's not clear you could automate software engineering, because in some sense you are, quote unquote, a software engineer.
是的。
Yeah.
但作为一名软件工程师,你的工作的一部分就包括撰写这些关于各种事情的宏大愿景的长篇备忘录。
But part of being a software engineer for you involves writing these, like, long memos about your grand vision about different things.
没错。
That's right.
所以我不认为这是软件工程师的工作内容。
And so I don't think that's part of the job of SWE.
那是公司职责的一部分。
That's part of the job of the company.
我认为软件工程师确实涉及设计文档和其他类似的东西,顺便说一句,这些模型在这方面表现得并不差。
I do think SWE involves design documents and other things like that, which, by the way, the models are not bad at.
它们已经相当擅长写注释了。
They're already pretty good at writing comments.
所以,我在这里提出的主张其实比我认为的要弱得多,我只是想澄清一下,区分两件事。
And so, again, I'm making much weaker claims here than I believe, to kind of distinguish between two things.
我们在软件工程方面已经几乎做到了。
Like, we're already almost there for software engineering.
我们已经几乎做到了。
We are already almost there.
用什么标准来衡量?
By what metric?
有一个标准,就是AI写了多少行代码?
There's one metric, which is like how many lines of code are written by AI?
如果你考虑软件工程历史上其他生产力的提升,编译器实际上写了所有的代码。
And if you consider other productivity improvements in the course of the history of software engineering, compilers write all the lines of software.
代码行数和生产力提升的幅度之间是有区别的。
There's a difference between how many lines are written and how big the productivity improvement is.
哦,是的。
Oh, yeah.
然后我们几乎达到了,意思是生产力提升有多大,而不仅仅是写了多少行代码。
And "we're almost there" means how big the productivity improvement is, not just how many lines are written.
对。
Yeah.
对。
Yeah.
所以实际上我同意你的观点。
So I actually agree with you on this.
所以我对代码和软件工程做了一系列预测。
So, I've made this series of predictions on code and software engineering.
我认为人们一再误解了它们。
And I think people have repeatedly misunderstood them.
所以,让我来梳理一下这个谱系,对吧?
So, let me lay out the spectrum, right?
我想大概是八九个月前,有人说AI模型将在三到六个月内编写90%的代码行,而这种情况确实在某些地方已经发生了。
I think it was eight or nine months ago or something, I said the AI model will be writing 90% of the lines of code in three to six months, which happened, at least at some places.
在Anthropic发生了,使用我们模型的许多下游用户也发生了这种情况。
It happened at Anthropic, and it happened with many people downstream using our models.
但这其实是一个非常弱的标准。
But that's actually a very weak criterion.
对吧?
Right?
人们以为我说的是,我们不再需要90%的软件工程师。
People thought I was saying like, we won't need 90% of the software engineers.
这两者完全是两回事。
Those things are worlds apart.
对吧?
Right?
我会把这个范围定义为:90%的代码是由模型编写的。
Like, I would put the spectrum as 90% of code is written by the model.
100%的代码都是由模型编写的,这在生产力上是一个巨大的差异。
100% of code is written by the model, and that's a big difference in productivity.
90%的端到端任务,包括编译、设置集群和环境、测试功能、撰写备忘录等,都是由模型完成的。
90% of the end-to-end SWE tasks, right, including things like compiling, setting up clusters and environments, testing features, and writing memos.
90%的软件工程任务都是由模型编写的。
90% of the SWE tasks are written by the models.
今天所有的软件工程任务都是由模型编写的。
100% of today's SWE tasks are written by the models.
即使当这种情况发生时,也不意味着软件工程师会失业。
And even when that happens, it doesn't mean software engineers are out of a job.
比如,他们可以去做一些更高层次的新工作,比如进行管理。
Like, there are new, higher-level things they can do, where they can manage.
再往后的趋势是,对软件工程师的需求减少了90%,我认为这确实会发生,但这是一个连续的谱系。
And then further down the spectrum, there's 90% less demand for SWEs, which I think will happen, but this is a spectrum.
而且,我曾在关于技术青春期的文章中提到过,我曾亲历过农业领域类似的演变过程。
And, you know, I wrote about it in "The Adolescence of Technology," where I went through this kind of spectrum with farming.
所以我完全同意你的观点。
And so I actually totally agree with you on that.
这些基准彼此差异很大,但我们在快速推进它们。
It's just these are very different benchmarks from each other, but we're proceeding through them super fast.
在你的愿景中,似乎有一部分是从90%提升到100%。
It seems like part of your vision is going from 90 to 100.
首先,这件事会迅速发生。
First it's gonna happen fast.
其次,这不知怎么就会带来巨大的生产力提升。
And two, that somehow leads to huge productivity improvements.
而当我注意到,即使在全新的项目中,人们也是从 Claude Code 之类的工具开始,人们报告说启动了大量项目。
Whereas when I notice that even in greenfield projects, where people start with Claude Code or something, people report starting a lot of projects.
我不禁想,我们在这个世界上是否正见证一场软件的复兴,出现了大量原本不会存在的新功能?
And I'm like, do we see in the world out there a renaissance of software, all these new features that wouldn't exist otherwise?
到目前为止,我们似乎并没有看到这种情况。
And at least so far, it doesn't seem like we see that.
所以这让我思考:即使我从不需要干预 Claude Code,世界依然是复杂的,工作也是复杂的,而要闭环那些自成体系的系统(无论是写软件还是别的),仅凭这一点我们能获得多大的整体收益呢?
So that does make me wonder: even if I never had to intervene on Claude Code, the world is complicated, jobs are complicated, and closing the loop on self-contained systems, whether it's writing software or something else, how much broader gains would we see just from that?
因此,这或许意味着我们应该降低对"数据中心里的天才之国"的预期。
And so maybe that should dilute our estimation of the country of geniuses.
嗯,实际上,我同时同意你的观点,这正是这些事情不会立即发生的原因。
Well, I actually simultaneously agree with you, agree that it's a reason why these things don't happen instantly.
但与此同时,我认为这种影响将会非常迅速。
But at the same time, I think the the effect is going to be very fast.
所以,比如说,你可能会有这两种极端,对吧?
So like, I don't know, you could have these two poles, right?
一种是,AI就像,它不会带来进步。
One is like, AI is not going to make progress.
它很慢。
It's slow.
它在经济中普及起来可能需要很长时间。
It's going to take kind of forever to diffuse within the economy.
对吧?
Right?
经济扩散已经成为一个流行词,常被用来解释为什么我们不会取得AI进展,或者为什么AI进展无关紧要。
Economic diffusion has become one of these buzzwords that's like a a reason why we're not gonna make AI progress or why AI progress doesn't matter.
而另一个极端是,我们会实现递归式自我改进,你知道的,整个过程,你难道不能在曲线上画一条指数增长线吗?
And, you know, the other axis is: we'll get recursive self-improvement, the whole thing. Can't you just draw an exponential line on the curve?
你知道的,在我们实现递归式自我改进之后没多少纳秒,我们就会拥有环绕太阳的戴森球。
You know, we're going to have Dyson spheres around the sun so many nanoseconds after we get recursive self-improvement.
我的意思是,我这里是在夸张地描述这种观点,但确实存在这两种极端立场。
I mean, I'm completely caricaturing the view here, but there are these two extremes.
但至少从一开始,如果我们看Anthropic内部的情况,就看到了收入每年增长十倍的惊人现象。
But what we've seen from the beginning, at least if you look within Anthropic, is this bizarre 10x-per-year growth in revenue.
对吧?
Right?
所以,你知道,在2023年,收入从0增长到了1亿美元。
So, you know, in 2023, it was like $0 to $100 million.
2024年,从1亿美元增长到了10亿美元。
In 2024, it was $100 million to $1 billion.
2025年,从10亿美元增长到了大约90或100亿美元。
In 2025, it was $1 billion to like $9 or 10 billion.
你们其实应该直接用自己的产品买下10亿美元的额度,这样就能凑个干净的10倍了。
You guys should have just bought like a billion dollars of your own products so you could have a clean 10x.
而今年的第一个月,按理说这种指数增长应该放缓,但我们在一月份又为收入增加了数十亿美元。
And in the first month of this year, you would think that exponential would slow down, but we added another few billion to revenue in January.
所以很明显,这种增长曲线不可能永远持续下去。
And so obviously that curve can't go on forever.
对吧?
Right?
GDP 只有这么大。
The GDP is only so large.
我甚至猜今年这曲线会稍微弯曲一下,但那仍然是一个非常陡峭的曲线。
I would even guess that it bends somewhat this year, but that is like a fast curve.
对吧?
Right?
那真的是一个非常陡峭的曲线。
That's like a really fast curve.
即使规模扩大到整个经济,我打赌它仍然会保持非常快的速度。
And I would bet it stays pretty fast even as the scale goes to the entire economy.
所以,我觉得我们应该思考这样一个中间状态:事物发展极快,但并非瞬间完成,而是需要时间,因为经济扩散、需要闭环,因为这事儿很繁琐——天啊,我得在企业内部做变革管理。
So I think we should be thinking about this middle world, where things are extremely fast but not instant, where they take time because of economic diffusion, because of the need to close the loop, because it's this fiddly "oh man, I have to do change management within my enterprise" thing.
我得设置好这个,但还得调整这个的安全权限,才能让它真正运行起来。
I set this up, but I have to change the security permissions on it in order to make it actually work.
或者我之前有一段旧软件,它会在模型编译和发布前进行检查,我现在得重写它。
Or I had this old piece of software that checks the model before it's compiled and released, and I have to rewrite it.
是的,模型本身可以做到这一点,但我得告诉模型去做,而且它需要时间来完成。
And yes, the model can do that, but I have to tell the model to do that, and it takes time to do that.
所以我认为,到目前为止我们所看到的一切,都与这样一个观点一致:存在一个快速的指数增长,即模型的能力;还有一个位于其下游的快速指数增长,那就是模型在经济中的扩散。
And so I think everything we've seen so far is compatible with the idea that there's one fast exponential, which is the capability of the model, and then another fast exponential downstream of that, which is the diffusion of the model into the economy.
不是瞬间完成,也不是很慢,比任何以前的技术都快得多,但它有其局限性。
Not instant, not slow, much faster than any previous technology, but it has its limits.
这正是我们——当我审视Anthropic内部,当我观察我们的客户时——所看到的:快速采用,但并非无限快速。
And this is what we see, when I look inside Anthropic, when I look at our customers: fast adoption, but not infinitely fast.
我能对你来个大胆的观点吗?
Can I try a hot take on you?
是的。
Yeah.
我觉得,人们用‘扩散’来当借口,当模型做不到某件事时,他们就说:哦,这是扩散问题。
I feel like diffusion is cope that people use when the model wasn't able to do something: they're like, oh, it's a diffusion issue.
但你应当用人来做比较。
But then you should use the comparison to humans.
你会觉得,AI固有的优势会让新AI的普及比新人入职要容易得多。
You would think that the inherent advantages that AIs have would make diffusion a much easier problem for new AIs getting onboarded than for new humans getting onboarded.
AI可以在几分钟内读完你整个Slack和硬盘里的内容。
So an AI can read your entire Slack and your drive in minutes.
它们可以共享同一实例其他副本的所有知识。
They can share all the knowledge that the other copies of the same instance have.
你在招聘AI时不会遇到逆向选择问题,因为你直接雇佣经过验证的AI模型副本即可。
You don't have this adverse selection problem when you're hiring AIs because you can just hire copies of a vetted AI model.
招聘人类要麻烦得多,但人们却一直在招聘人类,对吧?
Hiring a human is like so much more hassle and people hire humans all the time, right?
我们每年支付人类高达50万亿美元的工资,因为它们有用,尽管从原则上讲,将AI整合进经济要比招聘人类容易得多。
We pay humans upwards of $50 trillion in wages because they're useful, even though, in principle, it would be much easier to integrate AIs into the economy than it is to hire humans.
我真的认为扩散是非常真实的,而且它并不完全源于AI模型自身的局限。
I really think diffusion is very real, and it doesn't exclusively have to do with limitations on the AI models.
比如,有些人把扩散当作一个流行词,用来表示这没什么大不了的。
Like, again, there are people who use diffusion to to you know, as kind of a buzzword to say this isn't a big deal.
我说的不是这个。
I'm not talking about that.
我说的不是AI会像以前的技术那样以某种速度扩散。
I'm not talking about AI diffusing at the speed that previous technologies did.
我认为AI的扩散速度会比以往任何技术都快得多,但并不是无限快。
I think AI will diffuse much faster than previous technologies have, not infinitely fast.
我来举个例子说明一下。
So I'll just give an example of this.
比如Claude Code。
There's Claude Code, for example.
Claude Code的设置非常简单。
Like, Claude Code is extremely easy to set up.
如果你是开发者,你可以直接开始使用Claude Code。
If you're a developer, you can just start using Claude Code.
大型企业的开发者没有任何理由不比个人开发者或初创公司的开发者更快地采用Claude Code。
There is no reason why a developer at a large enterprise should not be adopting Claude Code as quickly as an individual developer or a developer at a startup.
我们尽一切努力推动它的使用。
And we do everything we can to promote it.
我们向企业,包括大型金融机构、大型制药公司等,销售Claude Code,它们采用Claude Code的速度远超企业通常采用新技术的速度。
We sell Claude code to enterprises and big enterprises like big financial companies, big pharmaceutical companies, all of them, they're adopting Claude code much faster than enterprises typically adopt new technology.
但同样,这确实需要时间。
But again, it takes time.
任何特定功能或产品,比如Claude Code或Cowork,被整天泡在Twitter上的个人开发者和A轮初创公司采用的速度,会比大型食品销售企业这类公司快上好几个月。
Any given feature or product, like Claude Code or Cowork, will get adopted by the individual developers who are on Twitter all the time, and by the Series A startups, many months faster than it will get adopted by, say, a large enterprise that does food sales.
这涉及多个因素。
There are a number of factors.
你必须经过法务流程。
Like, you have to go through legal.
你需要为所有人进行配置部署。
You have to provision it for everyone.
它必须通过安全和合规审查。
It has to pass security and compliance.
公司领导层距离AI革命更远;即便他们有前瞻性,也必须先说清楚:哦,我们投入5000万美元是合理的。
The leaders of the company, who are further away from the AI revolution, may be forward-looking, but they have to say, oh, it makes sense for us to spend $50 million.
这就是Claude代码这个东西的意义所在。
This is what this Claude Code thing is.
这就是它对我们公司有帮助的原因。
This is why it helps our company.
这就是它能提高我们生产力的原因。
This is why it makes us more productive.
然后他们必须向再往下两级的人员解释,并且他们得说,好吧。
Then they have to explain to the people two levels below, and they have to say, okay.
我们有3000名开发人员。
We have 3,000 developers.
以下是我们将如何向开发人员推广的计划。
Here's how we're gonna roll it out to our developers.
我们每天都有这样的对话。
And we have conversations like this every day.
我们正在尽一切努力,让Anthropic的年收入增长达到20倍或30倍,而不是10倍。
We are doing everything we can to make Anthropic's revenue grow 20 or 30x a year instead of 10x a year.
再次强调,许多企业都在说,这生产力太高了。
Again, many enterprises are just saying, this is so productive.
我们将在常规采购流程中走捷径。
We're gonna take shortcuts in our usual procurement process.
他们的行动速度比我们以前只向他们销售普通API时快得多;很多人也在用普通API,但Claude Code是一个更有吸引力的产品。
They're moving much faster than when we tried to sell them just the ordinary API, which many of them use, but Claude Code is a more compelling product.
但它并非一个无限吸引人的产品。
But it's not an infinitely compelling product.
而且我认为,即使是AGI、强大的人工智能,或是数据中心里的天才国度,也不会是无限吸引人的产品。
And I don't think even AGI or Powerful AI or Country of Geniuses in the data center will be an infinitely compelling product.
它可能足够有吸引力,能够实现每年三倍、五倍甚至十倍的增长,即使你的规模已经达到数千亿美元——这在历史上从未有人做到过,但不可能无限快地增长。
It will be a compelling product enough maybe to get three or five or 10 X a year growth, even when you're in the hundreds of billions of dollars, which is extremely hard to do and has never been done in history before, but not infinitely fast.
我同意这可能会稍微放缓。
I buy that it would be a slight slowdown.
也许这不是你的观点,但有时人们会这样说:哦,能力已经具备了,只是扩散的问题,言下之意是我们实际上已经接近AGI了。但我并不认为我们已经接近AGI。
And maybe this is not your claim, but sometimes people talk about it like, oh, the capabilities are there, it's just diffusion, as if otherwise we're basically at AGI. And I don't believe we're basically at AGI.
我认为,如果你真的有一个数据中心里的天才国度,即使你的公司没有采用它,
I think if you had the country of geniuses in a data center, even if your company didn't adopt it,
我们也会立刻知道。
we would know it.
没错。
Right.
是的。
Yeah.
如果你真的能在数据中心聚集一整个国家的天才,我们一定会立刻知道。
We would know it if you had the country of geniuses in a data center.
这个房间里每个人都会知道。
Like, everyone in this room would know it.
华盛顿的每个人都会知道。
Everyone in Washington would know it.
你知道的,可能农村地区的人不会知道。
You know, people in rural parts might not know it.
但我们一定会知道。
But we would know it.
我们现在并没有这种情况。
We don't have that now.
这一点非常清楚。
That that's very clear.
正如Dario刚才所谈到的,要实现泛化,你需要在各种真实的任务和环境中进行训练。
As Dario was getting at, to get generalization, you need to train across a wide variety of realistic tasks and environments.
例如,对于销售代理来说,最难的并不是教它在Salesforce的特定数据库中点击按钮。
For example, with a sales agent, the hardest part isn't teaching it to mash buttons in a specific database in Salesforce.
这是在各种模糊情境中训练代理的判断能力。
It's training the agent's judgment across ambiguous situations.
你如何从成千上万的潜在客户中筛选出哪些是优质线索?
How do you sort through a database with thousands of leads to figure out which ones are hot?
你该如何实际联系他们?
How do you actually reach out?
当你被对方无视时,你该怎么办?
What do you do when you get ghosted?
当一家AI实验室想训练一个销售代理时,Labelbox邀请了数十位《财富》500强企业的销售人员,构建了一批不同的强化学习环境。
When an AI lab wanted to train a sales agent, Labelbox brought in dozens of Fortune 500 salespeople to build a bunch of different RL environments.
他们创建了数千种情境,让销售代理必须与潜在客户互动,而这些客户由另一个AI扮演。
They created thousands of scenarios where the sales agent had to engage with the potential customer, which was role played by a second AI.
Labelbox 确保这个客户AI具备几种不同的性格,因为在电话推销时,你完全不知道对方会是谁。
Labelbox made sure that this customer AI had a few different personas, because when you cold call, you have no idea who's gonna be on the other end.
你需要能够应对各种可能的情况。
You need to be able to deal with a whole range of possibilities.
Labelbox 的销售专家逐轮监控这些对话,调整角色扮演代理,以确保它表现出真实客户会有的行为。
Labelbox's sales experts monitored these conversations turn by turn, tweaking the role-playing agent to ensure it did the kinds of things an actual customer would do.
Labelbox 的迭代速度比行业中的任何人都快。
Labelbox could iterate faster than anybody else in the industry.
这非常重要,因为强化学习是一门实证科学。
This is super important because RL is an empirical science.
这并不是一个已解决的问题。
It's not a solved problem.
Labelbox 拥有一系列工具,用于实时监控代理的表现。
Labelbox has a bunch of tools for monitoring agent performance in real time.
这使得他们的专家能够不断设计新任务,确保模型始终处于适当难度的分布范围内,并在训练过程中获得最优的奖励信号。
This lets their experts keep coming up with tasks so that the model stays in the right distribution of difficulty and gets the optimal reward signal during training.
Labelbox 几乎可以在每个领域都做到这一点。
Labelbox can do this sort of thing in almost every domain.
他们拥有对冲基金经理、放射科医生,甚至航空公司飞行员。
They've got hedge fund managers, radiologists, even airline pilots.
所以,无论你在做什么,Labelbox 都能提供帮助。
So whatever you're working on, Labelbox can help.
了解更多,请访问 labelbox.com/dwarkesh。
Learn more at labelbox.com/dwarkesh.
回到具体的预测上来,因为我觉得在讨论能力时,有太多不同的事情需要区分,很容易出现沟通偏差。
Coming back to concrete predictions because I think because there's so many different things to disambiguate, it can be easy to talk past each other when we're talking about capabilities.
比如,三年前我采访你时,我请你预测三年后我们会看到什么。
So for example, when I interviewed you three years ago, I asked you for a prediction about what we should expect three years from now.
我觉得你说对了。
I think you were right.
你当时说,我们应该期待这样的系统:当你和它交谈一小时后,很难将它与一个受过良好教育的人区分开来。
So you said we should expect systems, which if you talk to them for the course of an hour, it's hard to tell them apart from a generally well educated human.
是的。
Yes.
我觉得你在这方面是对的。
I think you were right about that.
而且从精神层面来说,我感到不满足,因为我内心的期望是这样一个系统能够自动化大部分白领工作。
And I think spiritually, I feel unsatisfied because my internal expectation was that such a system could automate large parts of white collar work.
所以,也许讨论你希望这样一个系统最终具备的实际能力会更有成效。
And so it might be more productive to talk about the actual end capabilities you want such a system to have.
那么我基本上可以告诉你,我认为我们正在实现。
So I will basically tell you I think we are.
所以——但让我用一个非常具体的问题来提问,这样我们就能准确理解不久后应该期待哪些能力。
So- But let me ask it in a very specific question so that we can figure out exactly what kinds of capabilities we should expect soon.
所以也许我会在我比较了解的工作背景下提问,不是因为这是最相关的工作,而是因为我能评估关于它的说法。
So maybe I'll ask about it in the context of a job I understand well, not because it's the most relevant job, but just because I can evaluate the claims about it.
以视频编辑为例,对吧?
Take video editors, right?
我有视频编辑师,他们工作的一部分包括了解我们观众的偏好,了解我的偏好和品味以及我们面临的各种权衡,经过数月的时间,逐渐建立起这种对背景的理解。
I have video editors and part of their job involves learning about our audience's preferences, learning about my preferences and tastes and the different trade offs we have and just over the course of many months, building up this understanding of context.
那么,他们在工作六个月后所具备的技能和能力,一个能在工作中、实时掌握这种技能的模型,我们何时能期待这样的AI系统出现?
And so the skill and ability they have six months into the job, a model that can pick up that skill on the job, on the fly, when should we expect such an AI system?
是的。
Yeah.
所以我想你所说的是,我们进行了三个小时的访谈,然后有人会进来,有人会进行剪辑。
So I guess what you're talking about is like, we're doing this interview for three hours and then you know, someone's gonna come in, someone's gonna edit it.
他们会说:哦,你知道,达里奥挠了挠头,我们可以把那段剪掉,
They're gonna be like, oh, you know, Dario scratched his head, and we could edit that out and,
放大这一点。
you know, magnify that.
有一段很长的讨论,对观众来说没那么有趣,而其他一些内容则更吸引人。
There was this long discussion that is less interesting to people, and then there's this other thing that's more interesting to people.
所以我们来做这个剪辑。
So let's make this edit.
所以我认为,数据中心里的天才国度将能够做到这一点。它做到这一点的方式是:它将拥有对电脑屏幕的全面控制。
So I think the country of geniuses in a data center will be able to do that. The way it will do that is it will have general control of a computer screen.
对吧?
Right?
你可以将这些内容输入进去,它就能利用电脑屏幕访问网络,查看你所有的过往访谈,浏览人们在推特上对你访谈的回应,与你交流,向你提问,与你的团队沟通,查看你以往的编辑记录,并据此完成工作。
You'll be able to feed this in, and it'll be able to also use the computer screen to go on the web, look at all your previous interviews, look at what people are saying on Twitter in response to your interviews, talk to you, ask you questions, talk to your staff, look at the history of edits that you did, and from that, do the job.
是的。
Yeah.
所以我认为这取决于几个因素。
So I think that's dependent on several things.
首先,这取决于一点,我认为这也是目前实际阻碍部署的因素之一:计算机使用能力要达到模型真正精通使用电脑的程度。
One, that's dependent on, and I think this is one of the things that's actually blocking deployment, getting to the point on computer use where the models are really masters at using the computer.
对吧?
Right?
我们已经看到基准测试成绩在攀升,虽然基准测试总是不完美的衡量标准。比如OSWorld,从我们大约一年零三个月前首次发布计算机使用功能时的5%,也可能是15%,我记不太清了,一路提升到了现在的65%到70%。
And we've seen this climb in benchmarks, and benchmarks are always imperfect measures, but OSWorld went from 5%, or maybe 15%, I don't remember exactly, when we first released computer use a year and a quarter ago, and we've climbed from that to 65 or 70%.
而且可能还有更难的衡量指标,但我认为计算机使用能力必须达到一个可靠的水平。
And, you know, there may be harder measures as well, but I think computer use has to pass a point of reliability.
我能就这一点问个后续问题吗?
Can I just ask a follow-up on that?
是的。
Yeah.
在我们继续下一个话题之前。
Before we move on to the next point.
多年来,我一直在尝试为自己构建不同的内部大语言模型工具。
I often, for years, I've been trying to build different internal LLM tools for myself.
我经常遇到一些文本输入文本输出的任务,这些本应是这些模型的核心能力。
Often I have these text in text out tasks, which should be dead center in the repertoire of these models.
然而,我仍然不得不雇佣人类来做这些工作。比如"找出这段文字稿里最精彩的片段"这样的任务,模型也许能做到七分的水平,但我没有办法像对待人类员工那样,通过持续互动帮助它把工作做得更好。
And yet I still hire humans to do them, because for something like "identify the best clips in this transcript," the model will maybe do a seven-out-of-10 job, but there's no ongoing way I can engage with it to help it get better at the job the way I could with a human employee.
因此,即使你看到了计算机使用能力的提升,这种缺失的能力仍然会阻碍我将实际工作完全交托给他们。
And so that missing ability, even if you solved computer use, would still block my ability to offload an actual job to them.
这又回到了我们之前讨论过的在工作中学习的问题,这非常有趣。
Again, this gets back to what we were talking about before with learning on the job, which is very interesting.
我认为对于编码代理来说,人们不会说‘在工作中学习’是阻碍它们端到端完成所有任务的原因。
I think with the coding agents, I don't think people would say that learning on the job is what is preventing the coding agents from doing everything end to end.
它们变得越来越好。
They keep getting better.
我们在Anthropic有一些工程师根本不写代码。
We have engineers at Anthropic who don't write any code.
当我看到生产力时,针对你之前的问题,我们有些人说:这个GPU内核、这个芯片,我以前都是自己写的。
And when I look at the productivity, to your previous question, we have folks who say: this GPU kernel, this chip, I used to write it myself.
我现在让Claude来处理。
I just have Claude do it.
因此,生产力有了巨大的提升。
And so there's this enormous improvement in productivity.
我不确定。
And I don't know.
当我看人们对Claude Code的反馈时,"对代码库不够熟悉",或者"感觉这个模型没在公司干满一年",都不在我看到的主要抱怨之列。
Like, when I look at Claude Code, familiarity with the code base, or a feeling that the model hasn't worked at the company for a year, that's not high up on the list of complaints I see.
所以我想说的是,我们实际上正在采取一种不同的方式
And so I think what I'm saying is we're we're, like, we're kind of taking a different
但你不觉得编程之所以如此,是因为存在一个外部的记忆支架,它就体现在代码库里吗?我不知道还有多少其他工作具备这一点。编程之所以进展如此之快,正是因为它拥有这种其他经济活动所不具备的独特优势。
But don't you think with coding that's because there is an external scaffold of memory, instantiated in the code base? I don't know how many other jobs have that. Coding made fast progress precisely because it has this unique advantage that other economic activity doesn't.
但当你这么说时,你的潜台词是,通过将代码库读入上下文,你就获得了人类在工作中需要学习的所有内容。
But when you say that, what you're implying is that by reading the code base into the context, I have everything that the human needed to learn on the job.
因此,无论代码是否被写下来,无论是否可获取,这都是一个例子:你所需知道的一切,都可以从上下文窗口中获得。
So that would be an example, whether it's written down or not, whether it's available or not, of a case where everything you needed to know, you got from the context window.
对吧?
Right?
而我们所认为的学习,天啊,我刚入职这个岗位。
And what we think of as learning: oh, man, I started this job.
我要花六个月才能理解这个代码库。
It's gonna take me six months to understand the code base.
而模型只是在上下文中瞬间完成了这一切。
The model just did it in the context.
是的。
Yeah.
老实说,我不知道该如何看待这个问题,因为确实有人主观上报告了你所说的那种情况。
I honestly don't know how to think about this because there are people who qualitatively report what you're saying.
去年METR有一项研究,我相信你看过——是的。
There was a METR study last year, I'm sure you saw it- Yes.
研究中让有经验的开发者尝试在他们熟悉的代码库中关闭拉取请求。
Where they had experienced developers try to close pull requests in repositories that they were familiar with.
这些开发者报告了效率的提升。
And those developers reported an uplift.
他们表示,使用这些模型后,自己的工作效率更高了。
They reported that they felt more productive with their use of these models.
但事实上,如果你查看他们的实际产出以及有多少代码最终被合并回去,却发现效率下降了20%。
But in fact, if you look at their output and how much was actually merged back in, there was a 20% slowdown.
由于这些模型的使用,他们的实际生产力反而降低了。
They were less productive as a result of these models.
因此,我正试图调和这种主观感受——人们觉得这些模型很有帮助——与宏观层面的一个问题:为什么软件领域还没有出现你所说的那种繁荣?
And so I'm trying to square the qualitative feeling that people feel with these models versus one, in a macro level, where is this like renaissance of software?
其次,当人们进行这些独立评估时,为什么我们看不到本应期待的生产力提升?
And then two, when people do these independent evaluations, why are we not seeing the productivity benefits that we would expect?
在Anthropic内部,这一点非常明确。
Within Anthropic, this is just really unambiguous.
对吧?
Right?
我们面临着巨大的商业压力,而且我们还给自己加码,因为我们做了大量安全相关的工作,我认为比其他公司做得都多。
We're under an incredible amount of commercial pressure, and we make it even harder for ourselves because of all the safety work we do, which I think is more than other companies.
因此,在保持我们价值观的同时还要实现经济生存,这种压力简直巨大。
So the pressure to survive economically while also keeping our values is just incredible.
我们正努力维持十倍的收入增长曲线。
We're trying to keep this 10x revenue curve going.
根本没有时间浪费在无意义的事情上。
There is zero time for bullshit.
我们根本没有时间去自我感觉良好,以为自己很有生产力,实际上却没有。
There is zero time for feeling like we're productive when we're not.
这些工具让我们高效多了。
These tools make us a lot more productive.
你觉得我们为什么担心竞争对手使用这些工具呢?
Like, why do you think we're concerned about competitors using the tools?
因为我们觉得自己领先于竞争对手,而且我们不想失去这种领先。
Because we think we're ahead of the competitors, and we don't want to lose that edge.
如果这真的在暗中降低我们的生产力,我们根本不会经历这么多麻烦。
We wouldn't be going through all this trouble if this was secretly reducing our productivity.
每隔几个月,我们就能通过模型发布看到最终的生产力提升。
We see the end productivity every few months in the form of model launches.
在这方面,我们绝不会自欺欺人。
There's no kidding yourself about this.
这些模型让你更有生产力。
The models make you more productive.
第一,人们感觉自己更有生产力,这一点被这类研究定性地预测到了。
One, people feeling like they're more productive is qualitatively predicted by studies like this.
但第二,如果我只看最终产出,显然你们正在快速取得进展。
But two, if I just look at the end output, obviously you guys are making fast progress.
递归自我改进的初衷是,你们制造出更好的AI,AI帮助你们构建下一个更好的AI,如此循环。
The idea with recursive self-improvement was supposed to be that you make a better AI, the AI helps you build the next, better AI, et cetera.
但当我观察你们——OpenAI和DeepMind时,我看到的却是人们每隔几个月就在排行榜上轮流更替。
And what I see instead, if I look at you, OpenAI, DeepMind, is that people are just shifting around the podium every few months.
也许你们认为这会停止,是因为你们已经赢了之类的,但如果我们确实从上一轮获得了巨大的生产力提升,为什么看不到拥有最佳编码模型的人拥有持久优势呢?
And maybe you think that stops because you've won or whatever, but why are we not seeing the person with the best coding model have a lasting advantage, if in fact there are these enormous productivity gains from the last generation?
所以编码?不,不,不。
So coding no, no, no.
我对现状的判断是:有一种优势正在逐渐扩大。
My model of the situation is that there's an advantage that's gradually growing.
比如说,现在编码模型可能带来了大约15%到20%的总体效率提升。
Like, I would say right now, the coding models give maybe a 15, maybe 20% total-factor speedup.
这是我的观点。
That's my view.
六个月前,可能只有5%。
And six months ago, it was maybe 5%.
所以那并不重要。
And so it didn't matter.
5%根本察觉不到。
5% doesn't register.
现在它正逐渐成为几个关键因素之一。
It's now just getting to the point where it's one of several factors that kind of matters.
这种速度还会继续加快。
That's going to keep speeding up.
所以我认为,六个月前有几家公司的水平大致相当,因为这还不是一个显著因素。
And so I think six months ago, there were several companies that were at roughly the same point, because this wasn't yet a notable factor.
但我认为它正在越来越快地加速。
But I think it's starting to speed up more and more.
我还要说,有多家公司开发用于编程的模型,而我们并不能完全阻止这些公司内部使用我们的模型。
I would also say there are multiple companies that make models that are used for code, and we're not perfectly good at preventing some of these other companies from using our models internally.
我认为我们所看到的一切都符合这种雪球模型:没有硬起飞。再次强调,我整个观点的核心是,这一切都是软起飞,是柔和、平滑的指数增长,尽管这些指数相对陡峭。
I think everything we're seeing is consistent with this kind of snowball model where there's no hard takeoff. Again, my theme in all of this is that it's all soft takeoff, soft, smooth exponentials, although the exponentials are relatively steep.
因此,我们看到这个雪球正在积累动力:10%、20%、25%,再到40%。
And so we're seeing this snowball gather momentum, where it's like 10%, 20%, 25%, 40%.
随着进展,是的,按照阿姆达尔定律,你必须清除所有阻碍你闭环的环节。
And as you go, yeah, per Amdahl's Law, you have to get all the things that are preventing you from closing the loop out of the way.
但这在Anthropic内部是最重要的优先事项之一。
But this is one of the biggest priorities within Anthropic.
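上面提到的阿姆达尔定律可以用一个小算例来说明(下面的比例数字纯属示意性假设,并非访谈中给出的数据):

The Amdahl's Law point above can be sketched with a small worked example (the fractions below are illustrative assumptions, not figures from the interview):

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only a fraction of the work is sped up
    by `factor` and the rest runs at the original speed (Amdahl's Law)."""
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# If coding were, say, 40% of AI R&D work and models made coding 2x faster,
# the overall speedup would be only 1 / (0.6 + 0.4/2) = 1.25x ...
print(amdahl_speedup(0.4, 2.0))             # 1.25
# ... and even an infinitely fast coder caps out at 1 / 0.6, about 1.67x,
# which is why the non-coding bottlenecks have to be cleared too.
print(round(amdahl_speedup(0.4, 1e9), 2))   # 1.67
```

这也正是"清除闭环障碍"的含义:未被加速的那部分工作决定了总加速的上限。
That is what "clearing the loop" means here: the unaccelerated fraction sets a hard ceiling on the total speedup.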
退一步说,我们之前谈到过:什么时候才能实现工作中的学习?
Stepping back, before we were talking about, well, when do we get this on-the-job learning?
而你在谈编程时的观点似乎是,我们实际上并不需要工作中的学习。
And it seems like the point you were making with the coding thing is that we actually don't need on-the-job learning.
你可以获得巨大的生产力提升。
That you can have tremendous productivity improvements.
即使没有这种基本的人类能力,AI公司也可能获得数万亿美元的收入。
You can have potentially trillions of dollars of revenue for AI companies without this basic human ability.
也许这不是你的观点,你应该澄清一下。
Maybe that's not your claim, you should clarify.
但如果没有这种在工作中学习的基本人类能力。
But without this basic human ability to learn on the job.
但我只是看到,在大多数经济领域,人们都会说:我雇了一个人,头几个月他们并没有太大用处。
But I just look at like in most domains of economic activity, people say, I hired somebody, they weren't that useful for the first few months.
但随着时间推移,他们逐渐积累了背景知识。
And then over time they built up the context understanding.
实际上,我们在这里讨论的内容更难定义,但他们确实获得了某种东西。
It's actually harder to define what we're talking about here, but they got something.
然后现在他们成了中坚力量,对我们来说非常有价值。
And now they're a powerhouse and they're so valuable to us.
而如果AI无法发展出这种即时学习的能力,我有点怀疑没有它我们能否看到世界发生巨大变化。
And if AI doesn't develop this ability to learn on the fly, I'm a bit skeptical that we're going to see huge changes to the world without it.
我认为这里有两点。
I think there are two things here.
首先是当前技术的状态,同样,我们有两个阶段。
There's the state of the technology right now, which is, again, we have these two stages.
我们有预训练和强化学习阶段,在这个阶段中,你将大量数据和任务输入模型,然后它们进行泛化。
We have the pre training and RL stage where you throw a bunch of data and tasks into the models and then they generalize.
所以这就像学习,但它是通过更多数据来学习,而不是在一个人或一个模型的生命周期内逐步学习。
So it's like learning, but it's like learning from more data and not learning over kind of one human or one model's lifetime.
因此,这介于进化和人类学习之间。
So again, this is situated between evolution and human learning.
但一旦你掌握了所有这些技能,你就拥有了它们。
But once you learn all those skills, you have them.
就像预训练一样,模型知道得更多。如果我观察一个预训练模型,它比我更了解日本武士的历史。
Just like with pre-training, the models just know more. If I look at a pre-trained model, it knows more about the history of samurai in Japan than I do.
它比我更了解棒球。
It knows more about baseball than I do.
它比我更了解低通滤波器和电子学。
It knows more about low pass filters and electronics.
所有这些方面,它的知识面都比我宽广得多。
All of these things, its knowledge is way broader than mine.
所以我认为,仅凭这一点或许就能让模型在所有方面都变得更强。
So I think even just that may get us to the point where the models are better at everything.
然后我们还有,同样地,仅仅通过扩展现有架构,我们就有了上下文学习能力,我会将其描述为人类在岗学习,但稍弱一些且更短期。
Then we also have, again just with scaling the existing setup, in-context learning, which I would describe as human on-the-job learning, but a little weaker and a little more short-term.
你看上下文学习,你给模型一堆示例,它确实能掌握。
You look at in context learning, you give the model a bunch of examples, it does get it.
在上下文中确实发生了真实的学习,一百万tokens是相当多的。
There's real learning that happens in context, and a million tokens is a lot.
这可能相当于人类几天的学习时间。
That can be days of human learning.
对吧?
Right?
如果你想想,这个模型读了一百万个词,那我自己读完一百万个词要花多长时间?
If you think about the model reading a million words: how long would it take me to read a million words?
我的意思是,至少得几天甚至几周吧。
I mean, you know, days or weeks at least.
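这个量级可以粗略验算一下(每分钟250词的阅读速度是一个假设值):

That order of magnitude can be checked with rough arithmetic (the 250 words-per-minute reading speed is an assumed value):

```python
words = 1_000_000
words_per_minute = 250            # typical adult reading speed (assumption)
hours = words / words_per_minute / 60
workdays = hours / 8              # eight-hour workdays

print(round(hours, 1), round(workdays, 1))   # 66.7 8.3
```

大约67个小时,即八个多整工作日,与"几天甚至几周"的说法相符。
About 67 hours, or a bit over eight full workdays, consistent with "days or weeks."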
所以你有这两点,我认为在现有范式下,这两点可能就足以让你在数据中心里获得一群天才。
So you have these two things, and I think these two things within the existing paradigm may just be enough to get you the country of geniuses in the data center.
我不确定,但我认为它们能让你获得其中很大一部分。
I don't know for sure, but I think they're going to get you a large fraction of it.
可能会有一些缺口,但我确信,就目前的情况而言,这已经足以产生数万亿美元的收入。
There may be gaps, but I certainly think just as things are, this I believe is enough to generate trillions of dollars of revenue.
这是第一点。
That's one.
这是一方面。
That's all one.
另一方面是持续学习的理念,即一个模型在工作中不断学习。
Two is this idea of continual learning, this idea of a single model learning on the job.
我认为我们也在研究这一点。
I think we're working on that too.
我认为在未来一两年内,我们也很可能解决这个问题。
I think there's a good chance that in the next year or two, we also solve that.
同样,我认为即使没有它,你也能取得大部分进展。
Again, I think you get most of the way there without it.
我认为,每年数万亿美元的市场,甚至我在《技术的青春期》中写到的所有国家安全和安全性方面的影响,都可能在没有它的情况下发生。
I think the trillions-of-dollars-a-year market, and maybe all of the national security implications and the safety implications that I wrote about in The Adolescence of Technology, can happen without it.
但我确实认为,我们,我猜其他人也在研究它。
But I also think we, and I imagine others, are working on it.
而且我认为,我们很可能在今后一两年内实现这一点。
And I think there's a good chance that we get there within the next year or two.
有很多想法。
There are a bunch of ideas.
我不会详细讨论所有这些,但其中一个就是延长上下文长度。
I won't go into all of them in detail, but, one is just make the context longer.
没有理由阻止更长的上下文起作用。
There's nothing preventing longer context from working.
你只需要在更长的上下文上进行训练,然后在推理时学会处理它们。
You just have to train at longer context and then learn to serve them at inference.
这两者都是我们正在解决的工程问题,我也相信其他人也在研究。
Both of those are engineering problems that we are working on and that I would assume others are working on as well.
是的。
Yeah.
关于上下文长度的增长:从2020年到2023年,从GPT-3到GPT-4 Turbo,上下文长度从大约2000个token增加到了128K。
So on this context length increase: there was a period from 2020 to 2023 where, from GPT-3 to GPT-4 Turbo, context went from about 2,000 tokens to 128k.
我觉得自从那以后的这两年左右,我们一直停留在差不多的水平。
I feel like for the two-ish years since then, we've been in the same-ish ballpark.
是的。
Yeah.
当模型的上下文长度远超这个范围时,人们报告称模型处理完整上下文的能力会出现质的下降。
And when model context lengths get much longer than that, people report qualitative degradation in the ability of the model to consider that full context.
所以我想知道,你们内部观察到了什么,让你们认为一千万、一亿乃至十亿token的上下文,能实现人类六个月那样的在职学习?
So I'm curious what you're internally seeing that makes you think, oh, 10 million context, 100 million context, or a billion, gets you that human-like six months of learning.
这不是一个研究问题。
This isn't a research problem.
这是一个工程和推理问题。
This is an engineering and inference problem.
对吧?
Right?
如果你想支持长上下文,就必须存储整个KV缓存。
If you wanna serve long context, you have to store your entire KV cache.
在GPU中存储所有内存并灵活调度内存非常困难。
It's difficult to store all the memory in the GPUs, to juggle the memory around.
我甚至不知道细节。
I don't even know the detail.
你知道,到了这个层面,我已经无法跟上这些细节了。
You know, at this point, this is at a level of detail that I'm no longer able to follow.
不过,你知道,在GPT-3时代我是清楚的:这些是权重,这些是需要存储的激活值。
Although, you know, I knew it in the GPT-3 era: these are the weights, these are the activations you have to store.
但如今,整个情况都变了,因为我们有了MoE模型之类的东西。
But these days, the whole thing has changed, because we have MoE models and all of that.
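为了说明为什么长上下文的服务是一个推理工程问题,下面用一个假想的模型配置粗略估算KV缓存的内存占用(层数、头数等参数均为示意性假设,不对应任何真实模型):

To illustrate why serving long context is an inference-engineering problem, here is a rough KV-cache memory estimate for a hypothetical model configuration (the layer and head counts are illustrative assumptions, not any real model):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Memory for the attention KV cache: one key and one value vector
    per layer, per KV head, per token (fp16 means 2 bytes per element)."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical 80-layer model, 8 KV heads of dimension 128, 1M-token context:
gb = kv_cache_bytes(80, 8, 128, 1_000_000) / 1e9
print(round(gb, 1))   # 327.7 GB of cache for a single sequence
```

单条百万token序列的缓存就可能达到数百GB量级,这就是"在GPU中存储和调度这些内存很困难"的含义。
A single million-token sequence can need hundreds of gigabytes of cache, which is what "juggling the memory around in the GPUs" refers to.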
你说的这种退化,还是不说得太具体,我会问的一个问题是:这里有两件事。
This degradation you're talking about, again without getting too specific, a question I would ask is: there are two things.
一个是训练时使用的上下文长度,另一个是推理时使用的上下文长度。
There's the context length you train at, and there's a context length that you serve at.
如果你在短上下文长度上训练,然后试图在长上下文长度上推理,可能会出现这种退化。
If you train at a small context length and then try to serve at a long context length, maybe you get these degradations.
总比没有好。
It's better than nothing.
你可能仍然会提供它,但会遇到这些性能下降。
You might still offer it, but you get these degradations.
而且,也许在长上下文长度上进行训练会更困难。
And maybe it's harder to train at a long context length.
所以这里面有很多问题。
So there's a lot.
同时,我想问一下,会不会出现这样的情况:如果你必须在更长的上下文长度上训练,就意味着在相同的计算量下,你能获取的样本更少。
I want to, at the same time, ask about some rabbit holes, like: wouldn't you expect that if you had to train on longer context lengths, you'd be able to fit fewer samples in for the same amount of compute?
但在此之前,也许还不值得深入探讨这一点。
But before, maybe it's not worth diving deep on that.
我想弄清楚一个更宏观的问题:好吧,我感觉不到一个为我工作了六个月的人类编辑,和一个为我工作了六个月的AI之间有什么偏好差异。
I wanna get an answer to the bigger picture question, which is like, okay, so I don't feel a preference for a human editor that's been working for me for six months versus an AI that's been working with me for six months.
你预测什么时候会达到这种状态?
What year do you predict that that will be the case?
我的猜测是,有很多问题本质上是这样的:只有当我们拥有一个由天才组成的团队在数据中心里工作时,才能做到。
I mean, my guess for that is, there's a lot of problems that are basically like, we can do this when we have the country of geniuses in a data center.
所以我的看法是,如果你让我猜,大概是一到两年,可能是一到三年。
And so my picture for that is, you know, again, if you made me guess, it's like one to two years, maybe one to three years.
真的很难说。
It's really hard to tell.
我有很强的信念,95%到99%的把握,所有这些都会在十年内发生。
I have a strong view, 95 to 99%, that all of this will happen within ten years.
我觉得这简直是个稳赚不赔的赌注。
Like, that's I think that's just a super safe bet.
是的。
Yeah.
然后我有个直觉。
And then I have a hunch.
这更像是五五开的情况,可能会更接近一到两年,或者更偏向一到三年。
This is more like a fifty fifty thing that it's gonna be more like one to two, maybe more like one to three.
所以是一到三年。
So one to three years.
天才的国度,以及稍微没那么经济价值的视频编辑工作。
Country of geniuses and the slightly less economically valuable task of editing videos.
这看起来相当有经济价值,让我告诉你。
It seems pretty economically valuable, let me tell you.
它只是有很多类似的应用场景。——没错。
It's just that there are a lot of similar use cases. Exactly.
所以你预测在一年到三年内。
So you're predicting that within one to three years.
然后,一般来说,Anthropic 预测,到 2026 年底或 2027 年初,我们将拥有能够操作人类今日从事数字工作所用界面的 AI 系统,其智力能力达到或超越诺贝尔奖得主的水平,并能与物理世界交互。
And then generally, Anthropic has predicted that by late 2026 or early 2027, we will have AI systems that, quote, have the ability to navigate interfaces available to humans doing digital work today, intellectual capabilities matching or exceeding that of Nobel Prize winners, and the ability to interface with the physical world.
两个月前,你接受 DealBook 采访时,强调了贵公司相较于竞争对手更负责任的算力扩展策略。
And then you gave an interview two months ago with DealBook, where you're emphasizing your company's more responsible compute scaling as compared to your competitors.
而我正试图调和这两种观点:如果你真的相信我们即将迎来一个天才的国度,那你应该希望拥有尽可能大的数据中心。
And I'm trying to square these two views, where if you really believe that we're going to have a country of geniuses, you want as big a data center as you can get.
没有理由放慢脚步。
There's no reason to slow down.
一个真正能完成诺贝尔奖得主所有工作的系统,其TAM高达数万亿美元。
The TAM of something that can actually do everything a Nobel Prize winner can do is trillions of dollars.
因此,我正在试图调和这种保守态度——如果你对时间线持较温和的看法,这种态度似乎是合理的——与你对AI进展的公开观点。
And so I'm trying to square this conservatism, which seems rational if you have more moderate timelines, with your stated views about AI progress.
是的。
Yeah.
所以这一切其实都是吻合的。
So it actually all fits together.
我们回到这个快速但并非无限快的扩散过程。
And we go back to this fast, but not infinitely fast diffusion.
假设我们以这个速度取得进展。
So let's say that we're making progress at this rate.
技术正以这样的速度进步。
Technology is making progress this fast.
再次强调,我非常有信心我们会在几年内实现目标。
Again, I have very high conviction that we're going to get there within a few years.
我有种预感,我们在一两年内就能达成。
I have a hunch that we're gonna get there within a year or two.
所以在技术方面有些不确定性,但我相当有信心偏差不会太大。
So a little uncertainty on the technical side, but pretty strong confidence that it won't be off by much.
我比较不确定的是,再次强调,经济扩散方面。
What I'm less certain about is, again, the economic diffusion side.
我确实相信,在一到两年内,我们就能拥有"数据中心里的天才国度"级别的模型。
I really do believe that we could have models that are a country of geniuses in a data center in one to two years.
一个问题是,在那之后还需要多少年,数万亿美元的收入才会开始滚滚而来?
One question is, how many years after that do the trillions in revenue start rolling in?
我不认为这能保证会立即实现。
I don't think it's guaranteed that it's going to be immediate.
我认为可能是一年,也可能是两年,我甚至可以把时间拉长到五年,尽管我对此持怀疑态度。
I think it could be one year, it could be two years, I could even stretch it to five years, although I'm skeptical of that.
因此,我们面临这种不确定性:即使技术发展的速度如我所预期的那样快,我也无法确切知道它将如何推动收入增长。
And so we have this uncertainty, which is: even if the technology goes as fast as I suspect it will, I don't know exactly how fast it's going to drive revenue.
我们知道它终将到来,但考虑到采购这些数据中心的方式,如果你的预测偏差了几年,后果可能是毁灭性的。
We know it's coming, but with the way you buy these data centers, if you're off by a couple years, that can be ruinous.
这就像我在《爱的机器》中写的那样:我认为我们可能会获得这种强大的人工智能,一个存在于数据中心的天才国度。
It is just like how I wrote in Machines of Loving Grace. I said, look, I think we might get this powerful AI, this country of geniuses in the data center.
你刚才描述的这个概念,正是来自《爱的机器》。
That description you gave comes from the Machines of Loving Grace.
我说过,我们可能在2026年,或者2027年实现这一点。
I said we might get that in twenty twenty-six, maybe twenty twenty-seven.
这是我的直觉。
That is my hunch.
如果我预测的时间偏差一两年,我也不觉得意外,但这就是我的直觉。
I wouldn't be surprised if I'm off by a year or two, but that is my hunch.
假设这种情况真的发生了。
Let's say that happens.
那就是起跑的信号。
That's the starting gun.
治愈所有疾病需要多长时间?
How long does it take to cure all the diseases?
对吧?
Right?
这是推动巨大经济价值的一种方式。
That's one of the ways that drives a huge amount of economic value.
对吧?
Right?
你治愈了所有疾病。
You cure every disease.
关于这部分收益有多少归制药公司、有多少归AI公司,确实有个问题,但会产生巨大的消费者剩余,因为假设我们能确保每个人都能获得治疗——这是我非常关心的——治愈所有这些疾病。
There's a question of how much of that goes to the pharmaceutical company versus the AI company, but there's an enormous consumer surplus because, assuming we can get access for everyone, which I care about greatly, you cure all of these diseases.
需要多长时间?
How long does it take?
你必须进行生物学上的发现。
You have to do the biological discovery.
你需要生产出这种新药。
You have to manufacture the new drug.
你必须通过监管流程。
You have to go through the regulatory process.
我们在疫苗和新冠疫情期间看到了这一点。
We saw this with vaccines and COVID.
我们确实把疫苗分发给了每个人,但花了一年半的时间。
We did get the vaccine out to everyone, but it took a year and a half.
所以我的问题是,要多久才能让AI为所有人发明出治愈所有疾病的方法?
So my question is, how long does it take to get the cure for everything, which the AI genius can, in theory, invent, out to everyone?
从AI首次在实验室中诞生,到所有疾病真正被治愈,需要多长时间?
How long from when that AI first exists in the lab to when diseases have actually been cured for everyone?
对吧?
Right?
我们已经有脊髓灰质炎疫苗五十年了。
We've had a polio vaccine for fifty years.
我们仍在努力消除非洲最偏远角落的脊髓灰质炎。
We're still trying to eradicate it in the most remote corners of Africa.
你知道,比尔及梅琳达·盖茨基金会正在尽最大努力。
And, you know, the Gates Foundation is trying as hard as they can.
其他人也在尽最大努力,但你知道,这很困难。
Others are trying as hard as they can, but, you know, that's difficult.
再说一遍,我不认为大部分经济扩散会像那样困难。
Again, I, you know, I don't expect most of the economic diffusion to be as difficult as that.
对吧?
Right?
那是最困难的情况。
That's like the most difficult case.
但这里确实存在一个两难困境。
But there's a real dilemma here.
我最终的结论是,这将比我们世界上见过的任何事物都更快,但它仍然有其局限性。
And where I've settled on it is it will be faster than anything we've seen in the world, but it still has its limits.
因此,当我们考虑购买数据中心时,我关注的曲线是:好吧,我们每年都有十倍的增长。
And so then when we go to buying data centers, again, the curve I'm looking at is, okay, we've had a 10x a year increase every year.
所以今年年初,我们预计的年化收入速率为100亿美元。
So at the beginning of this year, we're looking at $10,000,000,000 in annualized revenue.
我们必须决定要购买多少计算能力。
We have to decide how much compute to buy.
而实际建设并预订数据中心需要一到两年的时间。
And it takes a year or two to actually build out the data centers, to reserve the data centers.
所以,我实际上是在问:到2027年,我能获得多少计算能力?
So basically, I'm saying in 2027, how much compute do I get?
我可以假设收入将继续以每年十倍的速度增长,因此到2026年底将达到1000亿美元,到2027年底将达到1万亿美元。
Well, I could assume that the revenue will continue growing 10x a year, so it'll be 100,000,000,000 at the end of 2026 and 1,000,000,000,000 at the end of 2027.
因此,我可以购买价值一万亿美元的计算能力。
And so I could buy a trillion dollars of compute.
实际上,这将相当于五万亿美元的计算能力,因为每年要花一万亿,持续五年。
Actually, it would be like $5,000,000,000,000 of compute because it would be a trillion dollars a year for five years.
对吧?
Right?
我可以购买一万亿美金的计算能力,从2027年开始投入使用。
I could buy a trillion dollars of compute that starts in 2027.
如果我的收入不是一万亿,哪怕只有八千亿,地球上也没有任何力量。
If my revenue is not a trillion dollars, if it's even 800,000,000,000, there's no force on earth.
如果我购买这么多计算能力,没有任何对冲手段能阻止我破产。
There's no hedge on earth that could stop me from going bankrupt if I buy that much compute.
所以,尽管我内心一部分在想,收入会不会继续以十倍增长,但我不能在2027年每年花一万亿来购买计算能力。
So even though a part of my brain wonders if it's going to keep growing 10x, I can't buy a trillion dollars a year of compute in 2027.
如果我对增长率的预估晚了一年,或者增长率是每年五倍而不是十倍,那你就会破产。
If I'm just off by a year in that rate of growth or if the growth rate is 5x a year instead of 10x a year, then you go bankrupt.
因此,你最终会处在一个支持数千亿而非万亿的世界里,你必须接受一些风险:需求可能大到你无法满足,同时也要接受另一种风险——你判断失误,实际增长仍然缓慢。
And so you end up in a world where you're supporting hundreds of billions, not trillions, and you accept some risk that there's so much demand that you can't support the revenue, and you accept still some risk that you got it wrong and it's still slow.
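The extrapolation math in this exchange can be written out as a quick sketch. The figures below (a roughly $10B annualized run rate growing 10x a year, and a $1T-per-year deal running five years) are the stylized numbers from the conversation, not actual financials:

```python
# Revenue extrapolation: ~$10B annualized, assumed to keep growing 10x/year.
revenue = 10e9
for _ in range(2):          # two more years of 10x growth
    revenue *= 10           # -> $100B, then $1T

# A compute deal sized to that extrapolation: $1T/year for five years.
commitment_total = 1e12 * 5         # $5T of total commitments
annual_gap_at_800b = 1e12 - 800e9   # even $800B of revenue leaves a $200B/year hole

print(f"extrapolated revenue: ${revenue / 1e12:.0f}T")
print(f"total commitment:     ${commitment_total / 1e12:.0f}T")
print(f"gap at $800B revenue: ${annual_gap_at_800b / 1e9:.0f}B per year")
```

The point of the sketch is the asymmetry Dario describes: the upside case merely uses the compute, while a one-year miss on the growth rate turns the fixed trillion-dollar annual bill into a bankruptcy-sized gap.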
所以当我谈到负责任地行事时,我真正想表达的并不是绝对金额。
And so when I talked about behaving responsibly, what I meant actually was not the absolute amount.
实际上,我认为我们确实比一些其他竞争对手花得少一些。
In fact, I think it is true we're spending somewhat less than some of the other players.
真正重要的是其他方面,比如我们是否对此深思熟虑?
It's actually the other things like, have we been thoughtful about it?
还是说我们在盲目冒险,觉得这里花一千亿美元、那里再花一千亿美元就行。
Or are we YOLO-ing and saying, oh, we're gonna do $100,000,000,000 here, $100,000,000,000 there.
我有种感觉,一些其他公司根本没认真算过账,他们并不真正理解自己承担的风险。
I kinda get the impression that, you know, some of the other companies have not written down the spreadsheet, that they don't really understand the risks they're taking.
他们只是因为听起来很酷就盲目行动。
They're just kind of doing stuff because it sounds cool.
我们已经仔细考虑过了。
We've thought carefully about it.
对吧?
Right?
我们是一家企业级公司。
We're an enterprise business.
因此,我们可以更依赖收入。
Therefore, we can rely more on revenue.
企业收入比消费市场更稳定。
It's less fickle than consumer.
我们的利润率更高,这在采购过多和采购过少之间提供了缓冲。
We have better margins, which is the buffer between buying too much and buying too little.
所以我认为,我们采购的量足以让我们在前景良好的情况下获得强劲的回报。
And so I think we bought an amount that allows us to capture pretty strong upside worlds.
这不会完全捕捉到每年十倍的增长,除非情况非常糟糕,否则我们不会陷入财务困境。
It won't capture the full 10x a year, and things would have to go pretty badly for us to be in financial trouble.
因此,我认为我们已经仔细思考过,并做出了这种平衡。
So I think we've thought carefully and we've made that balance.
这正是我说我们负责任时的意思。
And that's what I mean when I say that we're being responsible.
好的。
Okay.
所以看起来,我们可能对‘数据中心里的国家级天才’有着不同的定义。
So it seems like it's possible that we actually just have different definitions of a country of geniuses in a data center.
因为当我想到真正的天才人类、数据中心里的真正人类天才群体时,我会觉得,我愿意花5万亿美元来运行这样一个数据中心里的天才群体。
Because when I think of actual human geniuses, an actual country of human geniuses in a data center, I'm like, I would happily buy $5,000,000,000,000 worth of compute to run an actual country of human geniuses in a data center.
所以,假设摩根大通或Moderna之类的公司不想使用它们。
So let's say JP Morgan or Moderna or whatever doesn't wanna use them.
另外,我也有一个天才国家。
Also, I've got a country of geniuses.
他们会自己创业。
They'll start their own company.
如果他们无法创业,被临床试验卡住了,那值得在临床试验上投入资源。
And if they can't start their own company and they're bottlenecked by clinical trials, it is worth investing in the clinical trials.
大多数临床试验失败,是因为药物无效。
Most clinical trials fail because the drug doesn't work.
没有效果。
There's not efficacy.
我在《爱的机器》中正是提出了这一点。
And I make exactly that point in Machines of Loving Grace.
假设临床试验会比我们习惯的快得多,但并非瞬间完成,也不是无限快速。
Say the clinical trials are gonna go much faster than we're used to, but not instant, not infinitely fast.
然后假设临床试验需要一年时间才能取得成果,让你开始获得收入,并能研发更多药物。
And then suppose it takes a year for the clinical trials to work out so that you're getting revenue from that and you can make more drugs.
好吧,你拥有一群天才,而你是一家AI实验室,可以使用更多的AI研究人员。
Okay, well, you've got a country of geniuses and you're an AI lab and you could use many more AI researchers.
而且你也认为,聪明人从事AI技术工作会产生自我强化的收益。
And you also think that there's these self reinforcing gains from, you know, smart people working on AI tech.
所以,好吧,可以有
So, like, okay, can have the
没错。
That's right.
但你可以让数据中心专注于,比如,人工智能的进展。
But you can have the data center working on, like, AI progress.
购买每年一万亿美元的计算资源,相比每年三千亿美元,是否会产生显著更多的收益?
Are there substantially more gains from buying a trillion dollars a year of compute versus $300,000,000,000 a year of compute?
如果你的竞争对手正在购买一万亿美元的计算资源,那么是的,确实存在这种优势。
If your competitor is buying a trillion, yes, there is.
嗯,不对。
Well, no.
确实有一些收益,但话又说回来,也存在他们先破产的可能性。再说一次,哪怕只差一年,你也会自我毁灭。
There's some gain, but again, there's this chance that they go bankrupt first, because if you're off by only a year, you destroy yourselves.
这就是平衡所在。
That's the balance.
我们正在大量买入。
We're buying a lot.
我们买入的量非常巨大。
We're buying a hell of a lot.
我们购买的数量与行业内最大玩家的采购量相当。
We're buying an amount that's comparable to what the biggest players in the game are buying.
但如果你问我,为什么我们没有签下从2027年中期开始的10万亿美元算力合同?
But if you're asking me, why haven't we signed, you know, $10,000,000,000,000 of compute starting in mid twenty twenty-seven?
首先,这种规模的算力根本生产不出来。
First of all, it can't be produced.
世界上根本没有这么多算力。
There isn't that much in the world.
但其次,如果那个天才国家的崛起推迟到了2028年中期,而不是2027年中期呢?
But second, what if the country of geniuses comes, but it comes in mid twenty twenty-eight instead of mid twenty twenty-seven?
那你就会破产。
You go bankrupt.
所以如果你的预测是一到三年,那你似乎应该希望到2029年拥有10万亿美元的算力?
So if your projection is one to three years, it seems like you should want $10,000,000,000,000 of compute by 2029?
2029年,也许2030年。
2029, maybe 2030.
最晚什么时候?
At latest?
我的意思是,你不确定吗?即使按照你给出的最长时间线,你正在规划建设的算力似乎也与之不符。
I mean, are you not sure? It seems like even in the longest version of the timelines you state, the compute you're ramping up to build doesn't seem to be in accordance.
你为什么这么认为?
What makes you think that?
正如你所说,你会希望获得十万亿美元级别的算力。假设人类的工资总额每年大约是五十万亿美元。
Well, as you said, you would want the $10,000,000,000,000. Human wages, let's say, are on the order of $50,000,000,000,000 a year.
如果你看一下,我不会具体谈Anthropic,但就整个行业而言,今年行业建设的算力可能在十到十五吉瓦左右,明年会增长大约三倍。
If you look at the industry, and I won't talk about Anthropic in particular, the amount of compute the industry is building this year is probably in the very low tens, call it ten, fifteen gigawatts, and next year it goes up by roughly 3x.
所以下一年可能是三四十吉瓦,2028年可能是100吉瓦,2029年可能达到300吉瓦。
So like next year's 30 or 40 gigawatts and 2028 might be 100, 2029 might be like 300 gigawatts.
每吉瓦的成本大约是100亿美元,我是在心里算的,每吉瓦每年成本大约在100亿到150亿美元之间。
And each gigawatt costs, I'm doing the math in my head, but each gigawatt costs maybe $10,000,000,000, on the order of $10 to $15,000,000,000 a year.
把这些加在一起,你就得到了你所描述的规模。
Put that all together and you're getting about what you described.
到2028年或2029年,你每年将获得数万亿美元的规模。
You're getting multiple trillions a year by 2028 or 2029.
所以你正好得到了那个数字。
So you're getting exactly that.
你正好得到了你所预测的结果。
You're getting exactly what you predict.
这是针对整个行业的。
That's for the industry.
这是针对整个行业的。
That's for the industry.
没错。
That's right.
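Dario's industry-wide arithmetic can be sanity-checked in a few lines. The 3x growth rate and the $10-15B-per-gigawatt annual cost are his rough figures; the midpoints and the calendar-year mapping are assumptions layered on top of the dialogue:

```python
# ~10-15 GW of compute built industry-wide "this year", growing ~3x per year,
# at ~$10-15B per gigawatt per year (midpoints assumed).
gw = 12.5               # midpoint of the 10-15 GW estimate
cost_per_gw = 12.5e9    # midpoint of $10-15B per GW per year

spend_by_year = {}
for year in (2026, 2027, 2028, 2029):
    spend_by_year[year] = gw * cost_per_gw  # annual dollars for that year
    print(f"{year}: ~{gw:.0f} GW -> ~${gw * cost_per_gw / 1e12:.2f}T per year")
    gw *= 3             # roughly 3x growth per year
```

This reproduces the figures in the exchange: roughly 30-40 GW next year, around 100 GW in 2028 and 300 GW in 2029, with annual spend crossing a trillion dollars in 2028 and reaching multiple trillions by 2029.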
假设Anthropic的算力每年保持三倍增长。
Suppose Anthropic's compute keeps 3x-ing a year.
到2027年或2028年,你就有了10吉瓦。
And then by like '27 or '28, you have 10 gigawatts.
再乘以你说的每吉瓦100亿美元。
And multiply that by, as you say, $10,000,000,000.
那么就是每年1000亿。
So then it's like a 100,000,000,000 a year.
但你说2028年的总可服务市场是,
But then you're saying the TAM by 2028,
我不打算给出Anthropic的具体数字,但这些数字太小了。
I don't wanna give exact numbers for Anthropic, but these numbers are too small.
这些数字太小了。
These numbers are too small.
好的。
Okay.
有意思。
Interesting.
我非常自豪,我与Jane Street合作设计的谜题已经帮助他们从我的观众中招聘了很多人。
I'm really proud that the puzzles I've worked on with Jane Street have resulted in them hiring a bunch of people from my audience.
他们仍在招聘,而且刚刚又发给我一个新的谜题。
Well, they're still hiring, and they just sent me another puzzle.
这次他们花了大约2万GPU小时,在三个不同的语言模型中植入了后门。
For this one, they spent about 20,000 GPU hours training backdoors into three different language models.
每个模型都有一个隐藏的提示,会引发完全不同的行为。
Each one has a hidden prompt that elicits completely different behavior.
你只需要找出触发条件。
You just had to find the trigger.
这尤其令人兴奋,因为发现后门实际上是前沿AI研究中的一个开放性问题。
This is particularly cool because finding backdoors is actually an open question in frontier AI research.
Anthropic 实际上发布了几篇关于休眠代理的论文,他们展示了可以通过在残差流上构建一个简单的分类器来检测后门即将触发的时刻。
Anthropic actually released a couple of papers about sleeper agents, and they showed that you can build a simple classifier on the residual stream to detect when a backdoor is about to fire.
但他们已经知道触发条件是什么,因为他们自己设计了这些触发条件。
But they already knew what the triggers were because they built them.
而在这里,你并不知道触发条件,而且检查所有可能触发短语的激活状态是不现实的。
Here, you don't, and it's not feasible to check the activations for all possible trigger phrases.
与他们为这个播客制作的其他谜题不同,Jane Street 甚至不确定这个谜题是否可解,但他们已经为最佳的尝试和说明预留了5万美元的奖金。
Unlike the other puzzles they made for this podcast, Jane Street isn't even sure this one is solvable, but they've set aside $50,000 for the best attempts and write ups.
这个谜题已在 janestreet.com/tawarkesh 上线,他们接受提交直到4月1日。
The puzzle's live at janestreet.com/tawarkesh, and they're accepting submissions until April 1.
好的。
Alright.
我们回到Dario。
Back to Dario.
你曾告诉投资者,你计划从2028年开始实现盈利,而那一年正是我们可能在数据中心里迎来天才国度的年份。
You've told investors that you plan to be profitable starting in '28, and this is the year where we're, like, potentially getting the country of geniuses in a data center.
而且你知道,这将解锁所有这些在医学、健康和等等方面的进展,以及新技术。
And we you know, this is, like, gonna now unlock all this progress and medicine and health and etcetera, etcetera, and new technologies.
这不正是你最应该重新投资业务、建设更大的天才国度的时候吗?这样他们就能创造更多。
Wouldn't this be exactly the time where you'd want to reinvest in the business and build bigger countries of geniuses so they can make more?
我的意思是,在这个领域,盈利能力是个挺奇怪的东西。
I mean, profitability is this kind of weird thing in this field.
我认为在这个领域,盈利能力实际上反映的是支出减少还是对业务进行投资。
I think in this field profitability is actually a measure of spending down versus investing in the business.
我们就拿这个模型来分析一下。
Let's just take a model of this.
我其实认为,当你低估了需求量时就会实现盈利,而当你高估了需求量时就会亏损,因为你提前购买了数据中心。
I actually think profitability happens when you underestimated the amount of demand you were going to get and loss happens when you overestimated the amount of demand you were going to get because you're buying the data centers ahead of time.
所以你可以这样想。
So think about it this way.
理想情况下,我希望如此。当然,这些都是简化后的事实。
Ideally, I would like that. And again, these are stylized facts.
这些数字并不精确,我只是想构建一个简化模型。
These numbers are not exact for I'm just trying to make a toy model here.
假设你一半的算力用于训练,另一半用于推理。
Let's say half of your compute is for training and half of your compute is for inference.
而推理的毛利润率高于50%。
And the inference has some gross margin that's like more than 50%.
这意味着,如果你处于稳定状态,且能准确预知需求,建造一个数据中心后,你会获得一定的收入。比如说,你每年在算力上花费1000亿美元,其中500亿美元用于支持1500亿美元的收入,另外500亿美元则用于训练。
And so what that means is that if you were in steady state, you build a data center, if you knew exactly the demand you were getting, you would get a certain amount of revenue, say, I don't know, let's say you pay $100,000,000,000 a year for compute, and on $50,000,000,000 a year, you support $150,000,000,000 of revenue, and the other 50,000,000,000 are used for training.
因此,你实际上盈利了500亿美元。
So basically, you're profitable, you make $50,000,000,000 of profit.
这就是当今行业的商业模式。
Those are the economics of the industry today.
或者更准确地说,不是现在,而是我们预计一两年后会达到的状况。
Or sorry, not today, but that's where we're projecting forward in a year or two.
唯一会让这种情况不成立的原因是,如果你的需求低于500亿美元,那么你用于研究的数据中心占比就会超过50%,从而无法盈利。
The only thing that makes that not the case is if you get less demand than 50,000,000,000, then you have more than 50 percent of your data center for research and you're not profitable.
所以你训练更强大的模型,但却不盈利。
So you train stronger models, but you're not profitable.
如果你的需求超过预期,那么你的研究投入就会被压缩,但你能支持更多的推理任务,从而更盈利。
If you get more demand than you thought, then your research gets squeezed, but you're able to support more inference and you're more profitable.
也许我没解释清楚,但我想说的是,你先决定计算资源的总量,然后对推理和训练有一定的目标需求,但这些需求实际上是由市场需求决定的。
Maybe I'm not explaining it well, but the thing I'm trying to say is you decide the amount of compute first, and then you have some target desire of inference versus training, but that gets determined by demand.
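The toy model Dario describes can be sketched directly. The numbers are his stylized ones: a $100B annual compute bill bought ahead of time, a planned 50/50 training/inference split, and inference revenue at 3x its compute cost (the "$50B supports $150B" figure). The function itself is one reading of his description, not Anthropic's actual planning model:

```python
def toy_year(compute_budget, demand_revenue, revenue_per_inference_dollar=3.0):
    """One year of the stylized model: compute is sunk up front, inference
    serves whatever demand shows up, and training/research gets the rest."""
    inference_spend = min(demand_revenue / revenue_per_inference_dollar,
                          compute_budget)
    served_revenue = inference_spend * revenue_per_inference_dollar
    training_spend = compute_budget - inference_spend
    profit = served_revenue - compute_budget  # the whole budget is a fixed cost
    return {"training": training_spend, "profit": profit}

on_target = toy_year(100e9, 150e9)  # demand as planned: $50B profit, $50B training
under     = toy_year(100e9, 90e9)   # demand undershoots: $70B training, $10B loss
over      = toy_year(100e9, 240e9)  # demand overshoots: training squeezed to $20B
```

Over- and under-estimating demand swings the same business between loss-making and very profitable, which is the point being made: profitability here mostly measures prediction error, not the underlying economics.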
它并不是由什么决定的。
It doesn't get determined by...
我听到的是,你预测盈利的原因是你在系统性地对算力投资不足。
What I'm hearing is the reason you're predicting profit is that you are systematically underinvesting in compute.
对吧?
Right?
因为如果你真的
Because if you actually
像计算资源这样的东西很难预测。
Like, compute is hard to predict.
所以关于2028年这些事以及何时会发生,这是我们尽力向投资者解释的最佳方式。
So these things about 2028 and when it will happen, that's our attempt to do the best we can with investors.
所有这些事情都充满不确定性,因为存在不确定性区间。
All of this stuff is really uncertain because of the cone of uncertainty.
如果收入增长足够快,我们甚至可能在2026年就实现盈利。
Like, we could be profitable in 2026 if the revenue grows fast enough.
然后,如果你对下一年的预测高估或低估了,结果可能会大幅波动。
And then and then, you know, if we overestimate or underestimate the next year, that could swing wildly.
我想表达的是,你脑子里有一个模型,认为企业要不断投资、不断投资,直到达到规模,然后才开始盈利。
What I'm trying to get at is you have a model in your head of, like, the business investing and investing until it gets scale, and then it becomes profitable.
有一个明确的转折点,事情会突然逆转。
There's a single point at which things turn around.
我不认为这个行业的经济规律是这样的。
I don't think the economics of this industry work that way.
我明白了。
I see.
所以如果我理解得没错,你的意思是,由于我们实际获得的算力与本应获得的算力之间存在差距,我们被迫转向盈利,但这并不意味着我们会持续盈利。
So if I'm understanding correctly, you're saying because of the discrepancy between the amount of compute we should have gotten and the amount of compute we got, we were sort of forced to make profit, but that doesn't mean we're gonna continue making profit.
我们会把钱重新投入,因为现在AI已经取得了巨大进展,我们希望吸引更多顶尖人才。
We're gonna like reinvest the money because well, now AI has made so much progress and we want the bigger country of geniuses.
于是我们又回到了收入很高、但亏损也同样很高的局面。
And so then we're back to revenue being high, but losses also being high.
如果我们每年都能准确预测需求,那么我们每年都会盈利。
If we predict, if every year we predict exactly what the demand is going to be, we'll be profitable every year.
因为将大约50%的算力用于研发,再加上高于50%的毛利率,再加上准确的需求预测,就能实现盈利。
Because spending 50% of your compute on research, roughly, plus a gross margin that's higher than 50% and correct demand prediction leads to profit.
这是一种盈利模式,我认为它其实已经存在了,只是被这种提前建设与预测误差掩盖了。
That's profitable business model that I think is kind of like there, but like obscured by these like building ahead and prediction errors.
我猜你是把这50%当作一个固定的常数来看待。
I guess you're treating the 50% as, you know, just like a given constant.
但事实上,如果AI发展迅速,而你能通过进一步扩大规模来加速进展,你就应该把超过50%的算力用于训练。
Whereas in fact, if AI progresses fast and you can increase the progress by scaling up more, you should just have more than 50%.
让我这么说吧。
Here's what I'll say.
你可能想进一步扩大规模。
You might wanna scale it up more.
但是,你要记住规模的对数回报。
But, you know, remember the log returns to scale.
对吧?
Right?
如果70%只能让你的模型大一点点,也就是1.4倍,对吧?那额外的200亿美元中,每一块钱对你的价值都低得多,因为这是对数线性关系。
If 70% would only get you a little bit of a bigger model, a factor of 1.4x, right, that extra $20,000,000,000, each dollar there is worth much less to you because of the log-linear setup.
因此,你可能会发现,将这200亿美元投资于服务推理,或者聘请更擅长本职工作的工程师,会是更好的选择。
And so you might find that it's better to invest that $20,000,000,000 in serving inference or in hiring engineers who are kinda better at what they're doing.
所以我提到50%,这并不是我们的确切目标。
So the reason I said 50%, that's not exactly our target.
它不会正好是50%。
It's not exactly gonna be 50%.
它可能会随时间而变化。
It'll probably vary over time.
我想说的是,对数线性回报意味着你会把业务中一个适中比例的资金投入进去。
What I'm saying is the log-linear return leads you to spend an order-one fraction of the business.
对吧?
Right?
比如,不是5%,也不是95%。
Like, not 5%, not 95%.
然后,你会因为对数关系而遭遇收益递减。
And then you get diminishing returns because of the log.
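The log-linear point can be made concrete with a toy calculation. Treating capability as proportional to the log of training spend is the stylized scaling assumption from the exchange, and the dollar figures are the ones used above (shifting a $100B budget from a 50/50 split toward 70/30):

```python
import math

def capability(train_spend_billions):
    # Capability ~ log of training compute, in arbitrary units.
    return math.log10(train_spend_billions)

# Going from $50B to $70B of training is a 1.4x bump in compute...
gain_at_scale = capability(70) - capability(50)    # == log10(1.4), ~0.15 units
# ...while the same extra $20B at a smaller scale buys far more log-units.
gain_when_small = capability(30) - capability(10)  # == log10(3), ~0.48 units
print(gain_at_scale, gain_when_small)
```

Each marginal dollar buys less capability as the training budget grows, which is why spending past the equilibrium fraction can be worth less than putting the same dollars into inference or people.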
我正在努力说服达里奥相信人工智能的进步之类的。
I'm like convincing Dario to believe in AI progress or something.
但好吧,你不会因为研究有递减回报就不投资,而是会投资于你提到的其他方面。
But like, okay, you don't invest in research because it has diminishing returns, but you invest in the other things you mentioned.
再说一遍,我们讨论的是在你每年花费500亿美元之后的递减回报,对吧?
Again, we're talking about diminishing returns after you're spending 50,000,000,000 a year, right?
这一点你肯定会提出,但对天才而言,收益开始递减的点可能相当高。
This is a point I'm sure you would make, but, like, diminishing returns on a genius could be quite high.
更广泛地说,市场中的利润到底是什么?
And more generally, like, what is profit in the market economy?
利润基本上意味着,市场上的其他公司能用这笔钱做到我做不到的更多事情。
Profit is basically saying the other companies in the market can do more things with this money than I can.
我只是想说明一下,因为我不想透露关于Anthropic的具体信息,所以我才用了这些概化的数字,但让我们来推导一下整个行业的均衡状态。
I'm just trying to, because I don't wanna give information about Anthropic, which is why I'm giving these stylized numbers. But let's just derive the equilibrium of the industry.
对吧?
Right?
那为什么没有人把100%的算力都用于训练,而不服务任何客户呢?
So why doesn't everyone spend 100% of their compute on training and not serve any customers?
对吧?
Right?
这是因为如果没有任何收入,他们就无法融资,无法达成算力协议,也无法在下一年购买更多算力。
It's because if they didn't get any revenue, they couldn't raise money, they couldn't do compute deals, they couldn't buy more compute the next year.
因此,会存在一个均衡点:每家公司用于训练的算力都少于100%,更不用说用于推理的算力了。
So there's gonna be an equilibrium where every company spends less than 100% on training and certainly less than 100% on inference.
你应该很清楚,为什么不能只服务现有模型而从不训练新模型——因为那样你就没有需求了,最终会被甩在后面。
It should be clear why you don't just serve the current models and never train another model because then you don't have any demand because you'll fall behind.
所以会存在某种均衡状态。
So there's some equilibrium.
不会是10%。
It's not gonna be 10%.
也不会是90%。
It's not gonna be 90%.
我们就姑且以一个风格化的数据来说,是50%。
Let's just say as a stylized fact, it's 50%.
这正是我想表达的。
That's what I'm getting at.
我认为,我们将会处于这样一种状态:你在训练上投入的占比,会低于你从计算资源中获得的毛利。
I think we're gonna be in a position where that equilibrium of how much you spend on training is less than the gross margins that you're able to get on compute.
因此,底层的经济模型是盈利的。
And so the underlying economics are profitable.
问题是,当你采购下一年的计算资源时,会面临一个极其棘手的需求预测问题——你如果预测不足,虽然利润丰厚,却可能没有算力用于研究;如果你预测过度,虽然研究算力充足,却可能亏损。
The problem is you have this hellish demand prediction problem when you're buying the next year of compute, and you might guess under and be very profitable, but have no compute for research, or you might guess over and you are not profitable and you have all the compute for research in the world.
这说得通吗?
Does that make sense?
作为一个动态模型,也许说得通。但退一步说,我并不是说我认为天才国度会在两年内出现,因此你就应该购买这些算力。
Just as a dynamic model, maybe. But stepping back, I'm not saying I think the country of geniuses is gonna come in two years and therefore you should buy this compute.
在我看来,你所说的、你得出的最终结论很有道理,但那是因为它看起来像是,天才国家很难实现,还有很长的路要走。
To me, what you're saying, the end conclusion you're arriving at makes a lot of sense, but that's because it's like, oh, it seems like the country of geniuses is hard and there's a long way to go.
所以退一步讲,我想表达的是,你的世界观似乎与这样一种观点兼容:我们距离一个能产生万亿美元价值的世界还有大约十年。
And so stepping back, the thing I'm trying to get at is more like, it seems like your worldview is compatible with somebody who says we're, like, ten years away from a world in which we're generating trillions of dollars of value.
那不是我的观点。
That's just not my view.
那不是我的观点。
That is not my view.
那我再做一个预测。
So I'll make another prediction.
我很难想象在2030年之前不会产生数万亿美元的收入。
It is hard for me to see that there won't be trillions of dollars in revenue before 2030.
我可以构想出一个合理的场景。
I can construct a plausible world.
这可能需要三年时间,这是我心目中合理情况的终点。
It takes maybe three years, so that would be the end of what I think is plausible.
比如在2028年,我们会在数据中心迎来真正的天才国家。
Like in 2028, we get the real country of geniuses in the data center.
到2028年,收入可能达到一两千亿美元的水平。
The revenue is maybe in the low hundreds of billions by 2028.
然后,天才国度将推动收入增长至数万亿美元,我们基本上就处于扩散过程的缓慢一端。
And then the country of geniuses accelerates it to trillions, and we're basically on the slow end of diffusion.
再花两年时间就能达到万亿美元的规模。
It takes two years to get to the trillions.
那就是一个需要等到2030年才能实现的世界。
That would be the world where it takes until 2030.
我怀疑,即使只考虑技术指数增长和扩散指数增长,我们也会在2030年之前达成目标。
I suspect even composing the technical exponential and the diffusion exponential, we'll get there before 2030.
所以你提出了一种模型,认为Anthropic能盈利,因为从根本上说,我们正处于一个计算资源受限的世界。
So you laid out a model where Anthropic makes profit because it seems like fundamentally, we're in a compute constrained world.
因此,最终我们会持续增加计算能力。
And so it's like, eventually we keep growing compute.
不。
No.
我认为利润的来源是这样的,让我们先抽象地看待整个行业。
I think the way the profit comes is, again, let's just abstract the whole industry here.
让我们想象自己正处在一本经济学教科书的情境中。
Let's just imagine we're in an economics textbook.
我们只有少数几家公司,每家都能投入有限的资金,或者投入一定比例的资金用于研发。
We have a small number of firms, each of which can invest some fraction in R&D.
它们的服务边际成本是存在的。
They have some marginal cost to serve.
由于推理效率高,尽管存在一些竞争,但模型之间也有差异化,因此边际成本的毛利率非常高。
The gross profit margins on that marginal cost are very high because inference is efficient, there's some competition, but the models are also differentiated.