本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
好吧。
Alright.
这是我的室友教我半导体的那期节目。
This is the episode where my roommate teaches me semiconductors.
这也是对这个的告别,
It's also the send off for this,
这一组内容。
this current set.
是的。
It is, yeah.
你知道的,用过之后,我就觉得,我再也不会用这个了。
You know, after you use it, I'm like, I can't use this again.
我得赶紧离开这儿。
I gotta get out of here.
那些留给 Dwarkesh 的二手货。
Those sloppy seconds for Dwarkesh.
好的。
Okay.
戴伦是SemiAnalysis的首席执行官。
Dylan is the CEO of SemiAnalysis.
戴伦,我有个迫切的问题想问你。
Dylan, the burning question I have for you.
如果你把四大科技公司——亚马逊、Meta、谷歌和微软——的年度资本支出加总起来,你最近发布的预测数据显示,总额高达6000亿美元;考虑到这些算力的租赁价格,这相当于接近50吉瓦的电力消耗。
If you add up the big four, Amazon, Meta, Google, Microsoft, their combined forecasted CapEx that you published recently this year is $600 billion. And given, you know, prices of renting that compute, that would be like close to 50 gigawatts.
显然,我们今年并不会新增50吉瓦的电力容量。
Now, obviously we're not putting on 50 gigawatts this year.
因此,这笔支出想必是为了支付未来几年将陆续上线的算力资源。
So presumably that's paying for compute that is gonna be coming online over the coming years.
所以我想问一下,这些资本支出何时会真正转化为实际上线的算力?
So I have a question about how to think about the timeline around when that CapEx comes online.
对于实验室来说也是类似的问题:OpenAI刚刚宣布融资1100亿美元,Anthropic也宣布融资300亿美元;如果你看看他们今年将上线的算力规模,虽然你得告诉我具体数字,但难道不是总共又增加了4吉瓦吗?
Similar question for the labs where, you know, OpenAI just announced that they raised $110 billion and Anthropic just announced they raised $30 billion. And if you look at the compute that they have coming online this year, you should tell me how much it is, but isn't it another four gigawatts total that they'll have this year?
感觉OpenAI和Anthropic今年为维持算力支出所需租赁的算力,按每吉瓦100亿至130亿美元计算。
It feels like the cost to rent the compute that OpenAI and Anthropic will have this year to sustain their compute spend, at, you know, $10 to $13 billion a gigawatt.
仅这两家公司的单笔融资就足以覆盖他们今年的算力支出。
Those individual raises alone are like enough to cover their compute spend for the year.
而这还不包括他们今年将获得的收入。
And then this is not even including the revenue that they're gonna earn this year.
所以请先帮我理解一下,大型科技公司的资本支出实际何时会投入使用?
So help me understand first, when is the timescale at which the big tech CapEx is actually coming online?
其次,如果这些实验室筹集这么多资金,究竟是为了什么?
And two, what are the labs raising all this money for if, like, the
一个吉瓦级数据中心的年运营成本大约是130亿美元,当你谈到这些超大规模厂商高达6000亿美元的资本支出时,再算上整个供应链的其他环节,总规模将达到约一万亿美元。
the yearly price of a one gigawatt data center is like $13 billion. So when you talk about the CapEx of these hyperscalers, on the order of $600 billion, and you look across the rest of the supply chain, that gets you to on the order of a trillion dollars.
其中一部分是今年直接用于上线算力的支出,对吧?
A portion of this is, you know, immediately for compute going online this year, right?
今年实际支付的芯片及其他资本支出部分。
The chips and the other parts of CapEx that do get paid this year.
但还有很多前期资本支出。
But there's a lot of setup CapEx as well.
对吧?
Right?
所以当我们谈到今年美国大约20吉瓦的增量时,
So when we're talking about roughly 20 gigawatts this year in America. Incremental.
这部分新增容量中,并非全部都在今年支出。
Incremental added capacity. A portion of this is not spent this year.
其中一部分资本支出实际上是在前一年就花掉了。
A portion of that CapEx is actually spent the prior year.
所以当你看到,谷歌有1800亿美元的资本支出时,
And so when you look at, hey, Google's got $180 billion of CapEx.
实际上其中很大一部分是用于为2028年和2029年支付涡轮机定金。
Actually, a big chunk of that is spent on turbine deposits for '28 and '29.
还有一部分用于2027年的数据中心建设。
A chunk of that is spent on data center construction for '27.
其中很大一部分用于电力购买协议、定金以及其他各种为未来布局的举措,以便实现这种超高速的扩张。
A chunk of that is spent on, you know, power purchasing agreements and down payments and all these other things that they're doing for further out into the future, so that they can set up this super fast scaling.
对吧?
Right?
这适用于所有超大规模云服务商以及供应链中的其他公司。
And this applies to all the hyperscalers and other people in the supply chain.
因此,今年大约部署了20吉瓦的容量。
And so, you know, 20 gigawatts roughly deployed this year.
其中很大一部分来自超大规模云服务商,而这些公司的最大客户正是Anthropic和OpenAI。
A big chunk of that being hyperscalers, and all of these companies' biggest customers are Anthropic and OpenAI.
Anthropic和OpenAI目前的规模大约在1.5到2.5吉瓦之间。
Anthropic and OpenAI are in the, you know, one and a half to two and a half gigawatt range roughly right now.
它们正试图扩张到更大的规模。
They're trying to scale to something much larger.
对吧?
Right?
如果你看看Anthropic在过去几个月的成就,你知道,增加了40亿、60亿美元的收入。
If you look at what Anthropic has done over the last few months, you know, $4 billion, $6 billion of revenue added.
如果我们直接画一条直线,嘿,是的,他们每个月还会再增加60亿美元的收入。
And if we just draw a straight line, hey, yeah, they'll add another $6 billion of revenue a month.
有人会认为这并不乐观,认为他们应该加速前进。
People would argue that's bearish and that they should go faster.
这意味着他们在接下来的十个月内将增加600亿美元的收入。
What that implies is that they're gonna add $60 billion of revenue across the next ten months.
对吧?
Right?
而以Anthropic目前的毛利率——至少根据媒体最近报道的数据——600亿美元的收入意味着他们在推理上的计算支出大约为400亿美元。
And $60 billion of revenue at the current gross margins that Anthropic had, at least as last reported by media, would imply that they have, you know, roughly $40 billion of compute spend for the inference behind that $60 billion of revenue.
这400亿美元的计算支出,按每吉瓦约100亿美元的租赁成本计算,意味着他们仅为了增长收入,就需要新增四吉瓦的推理能力。
That $40 billion of compute at roughly $10 billion a gigawatt rental cost means that they need to add four gigawatts of inference capacity just to grow revenue.
这还假设他们的研究与开发训练集群规模保持不变。
And that's saying that their research and development training fleet stays flat.
对吧?
Right?
所以,某种程度上,Anthropic 需要在今年年底前达到五吉瓦以上,虽然这对他们来说非常困难,但仍然是可能的。
So, you know, in a sense, Anthropic needs to get to well above five gigawatts by the end of this year, and it's gonna be really tough for them to get there, but it's possible.
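上面这笔账可以用一个简单的估算来核对;其中的数字都是对话中引用的粗略数值(新增600亿美元收入、约三分之一的毛利率、每吉瓦每年约100亿美元的租赁成本),并非官方数据。
The back-of-envelope math above can be sketched as a quick check; all inputs are the rough figures quoted in the conversation ($60B of added revenue, roughly one-third gross margin, about $10B per gigawatt-year of rental cost), not official numbers.

```python
# Rough check of the revenue-to-gigawatts math from the conversation.
# Every figure here is an approximation quoted in the discussion.

added_revenue = 60e9             # ~$60B of new revenue over ten months
gross_margin = 1 / 3             # so compute cost is ~2/3 of revenue
rental_cost_per_gw_year = 10e9   # ~$10B to rent one gigawatt for a year

inference_spend = added_revenue * (1 - gross_margin)
extra_gigawatts = inference_spend / rental_cost_per_gw_year

print(f"Inference spend: ${inference_spend / 1e9:.0f}B")
print(f"Incremental inference capacity: {extra_gigawatts:.0f} GW")
```

加上现有的大约1.5到2.5吉瓦的存量,这就是"远超五吉瓦"这一数字的来历。
Stacked on the existing roughly 1.5 to 2.5 gigawatt footprint, that is where the "well above five gigawatts" figure comes from.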
我可以问个问题吗?
Can I ask a question about that?
是的。
Yeah.
如果 Anthropic 无法在年底前达到五吉瓦,但又需要这么多算力来支撑远超预期的收入增长。
If Anthropic was not on track to have five gigawatts by the end of this year, but it needs that to serve both the revenue that's grown crazier than expected.
而且可能还需要更多。
And maybe it's gonna be even more than that.
再加上为确保明年模型足够优秀所需的研究和训练算力。
Plus the research and training to make sure its models are good enough for next year.
这些算力究竟从哪里来?
Where is that gonna come from?
你知道吗,达里奥在你播客上的时候非常非常保守。
You know, Dario, when he was on your podcast was very, very like conservative.
他说,我不想疯狂增加算力,因为如果我的收入在不同时间点出现变化,我不想破产。
He's like, you know, I'm not gonna go crazy on compute because if my revenue inflects at a different rate, at a different point, I don't wanna go bankrupt.
我想确保我们在扩展时保持负责任的态度。
You know, I wanna make sure that we're being responsible with this scaling.
但事实上,他显然落后了,不像OpenAI那样,直接签下那些疯狂的交易。
But in reality, you know, he's definitely missed out compared to, like, OpenAI, which was, let's just sign these crazy fucking deals.
对吧?
Right?
到今年年底,OpenAI在算力获取方面已经远远超过了Anthropic。
And OpenAI has kind of got way more access to compute than Anthropic by the end of the year.
那么Anthropic要怎么做才能获得足够的算力呢?
And so what does Anthropic have to do to get the compute?
他们只能转向之前不会考虑的低质量供应商。
Well, they have to go to lower quality providers that they would not have gone to before.
对吧?
Right?
你知道,理想情况下,Anthropic 历史上一直使用的是像谷歌和亚马逊这样的顶级供应商。
You know, optimally, Anthropic, at least historically, has had the best quality providers, like Google and Amazon.
而那些公司,至少从历史上看,本就是全球最大的公司,现在还加上了微软;如今它们正在向整个供应链扩展,转向一些更新的供应商。
Whereas, you know, those, at least historically, were the biggest companies in the world, now Microsoft too, and now they're expanding across the supply chain and going to other players that are newer.
OpenAI 一直更积极地与多家供应商合作。
OpenAI has been, you know, a bit more aggressive on going to many players.
是的,它们从微软那里获得了大量算力。
Yes, they have tons of capacity from Microsoft.
它们也有谷歌和亚马逊的资源,但同时还与CoreWeave和Oracle等公司有大量合作,甚至找了一些看似随机的公司,比如从未建过数据中心的软银能源。
They have Google and Amazon as well, but they also have, like, tons with CoreWeave and Oracle, and they've gone to, like, random companies, or, you know, one would think random companies, like SoftBank Energy, who has never built a data center in their life.
但你知道,它们现在正在为 OpenAI 建设数据中心。
But, you know, they're building data centers now for OpenAI.
所以它们还找了Nscale等许多其他公司,从这些公司获取算力。
So they've gone to many others, like Nscale and others, that they're getting capacity from.
因此,Anthropic 面临着一个困境,因为他们对计算资源一直非常保守,不想过度扩张。
And so there's this like conundrum for Anthropic because they were so conservative on compute, because they don't wanna go crazy.
对吧?
Right?
从某种意义上说,去年下半年许多金融市场的恐慌,是因为 OpenAI 签下了这么多合同,却没钱支付。
And in in some sense, a lot of the financial freak outs in the second half of last year were like, OpenAI signed all these deals, they don't have the money to pay for them.
明白了。
Okay.
甲骨文的股价要暴跌了。
Oracle stock's gonna tank.
哦,明白了。
Oh, okay.
CoreWeave 的股价也要暴跌了。
CoreWeave stock's gonna tank.
哦,明白了。
Oh, okay.
你看,这些公司的股票都暴跌了,信贷市场也乱了套,因为最终买家付不起钱。
Like, you know, all these companies' stocks tanked and credit markets went crazy because people were like, the end buyer can't pay for this.
现在呢,哦,等等。
Now it's like, oh, wait.
他们筹到了一大笔钱。
They raised a ton of money.
好的。
Okay.
行吧。
Fine.
他们能付得起钱了。
They can pay for it.
但从某种意义上说,Anthropic 更加保守。
But in the sense, Anthropic was a lot more conservative.
他们说,我们会签合同,但我们会坚持原则,故意低估自己可能做到的极限,保持保守,因为我们不想可能破产。
They were like, we'll sign contracts, but we'll be principled, and we'll purposely undershoot what we think we can possibly do and be conservative, because we don't wanna potentially go bankrupt.
但我想要理解的是,在紧急情况下获取计算资源到底意味着什么?
But the thing I wanna understand is, what does it mean to have to acquire compute in a pinch?
是说你得去找像Neo Cloud这样的服务商吗?还是说它们的计算能力更差?
Is it that you have to go with, like, neoclouds? Is it that they have worse compute or something?
具体差在哪儿呢?
Like in what way is it worse?
还是说,因为你是最后一刻才进场,不得不把毛利让给CoreWeave这样的公司,而平时根本不用付这笔钱?
And is it that you had to pay gross margins to a CoreWeave that you wouldn't have otherwise had to pay, because you're coming in at the last minute?
是谁建了这些备用容量,使得Anthropic和OpenAI能在最后一刻获得资源?
Who built the spare capacity such that it's available for Anthropic and OpenAI to get last minute?
那么,OpenAI到底获得了什么具体的竞争优势,以至于到2027年它们的计算资源总量会和对方差不多?
And like basically what is the concrete advantage that OpenAI has gotten they end up at similar compute numbers by 2027?
是说它们今年年底的电力消耗量会不一样吗?
Is it just like they're gonna end this year with different gigawatts?
如果是的话,到今年年底,Anthropic和OpenAI各自会拥有多少吉瓦的电力?
If so, how many gigawatts are Anthropic and OpenAI gonna have by the end of this year?
是的。
Yeah.
要获取额外的算力,确实有些超大规模云服务商手里有容量,而且并非所有算力合同都是长期的,对吧?
So to acquire excess compute, I mean, yes, there is capacity at hyperscalers, and not all contracts for compute are long term, right?
五年。
Five years.
对吧?
Right?
在2023年、2024年签署的H100算力合同,有些并不是五年期的。
There's H100 compute that was signed in 2023 or 2024 at deals that were not five year deals.
对吧?
Right?
OpenAI的绝大部分算力都是通过五年期合同锁定的,但市场上还有很多其他客户签的是一年、两年、三年、六个月的合同,甚至是按需合同。
OpenAI, the vast majority of their compute is signed at five year deals. But, you know, there were many other customers that had one year, two year, three year deals, six month deals, on-demand.
当这些合同陆续到期时,市场上谁最愿意支付高价?
And as these contracts roll off, who is the participant in the market most willing to pay the price?
在这种情况下,我们看到H100的价格大幅上涨,人们愿意以每卡每小时2美元以上的价格签订长期合约。
And in this sense, right, we've seen H100 prices inflect a lot and go up, and people willing to sign long term deals at, you know, above $2 even.
对吧?
Right?
我见过一些交易,某些AI实验室(出于某些原因我这里说得模糊一点)以高达每小时2.40美元的价格签了两到三年的H100合约。想想其中的利润率:Hopper刚发布时,按五年摊销,每小时成本约1.40美元;而现在已经过去两年,你签的却是为期两到三年、每小时2.40美元的合约,这些利润率要高得多。
Like, I've seen deals where certain AI labs, and I'll be a little bit vague here for a reason, have signed at as high as $2.40 for two to three years for H100s. Which, if you think about the margin, a dollar 40 for Hopper when it released, amortized across five years, and now two years in you're signing deals for two to three years at $2.40, those margins are way higher.
对吧?
Right?
这样一来,你就能挤掉所有其他供应商,无论是亚马逊、CoreWeave、Together AI、Nebius,还是其他任何人。
And so now you can crowd out all of these other suppliers, whether it's Amazon or CoreWeave or Together AI or Nebius or whoever it is.
对吧?
Right?
你知道,这些新兴云服务商通常拥有更高比例的Hopper芯片,一方面是因为它们更激进地采购,另一方面,它们倾向于签订较短期的合约——不是CoreWeave,而是其他公司通常签的都是短期合约。
You know, these neoclouds are the firms that had a higher percentage of Hopper in general because, a, they were more aggressive on it, and b, they tended to sign shorter term deals. You know, not CoreWeave, but the others tended to sign shorter term deals.
所以,嘿,如果我想要Hopper,市场上还是有一些闲置产能的。
And so, hey, if I want Hopper, there is some capacity out there.
而且,大多数像甲骨文或CoreWeave这样的公司,其Blackwell的大部分产能都已签订长期协议,而本季度上线的任何产能都已经被售罄。
And then also, while most of the capacity at, like, an Oracle or CoreWeave is signed for a long term deal in terms of Blackwell, anything that's going online this quarter is already sold.
在某些情况下,他们甚至没有达到原本承诺的销售数量,因为一些数据中心出现了延迟,不只是这两家,还有Nebius以及其他所有公司,比如微软、亚马逊、谷歌。
And and in some cases, they're not even hitting all the numbers that they promised they would sell because there are some data center delays, not just those two, but like Nebius and all the other folks, Microsoft, Amazon, Google.
但还有很多新兴云服务商以及一些超大规模云服务商,他们正在建设一些尚未出售的产能,或者原本计划用于内部用途、并非专注于AGI的产能,现在可能会转而出售。
But there are a lot of neoclouds, as well as some of the hyperscalers, who have capacity they're building that they did not sell yet, or capacity that they were gonna allocate to some internal use that is not necessarily super AGI focused, that they may now turn around and sell.
或者,以Anthropic为例,他们并不需要直接拥有全部的计算资源。
Or they may, you know, in the case of Anthropic, they don't have to have all the compute directly.
对吧?
Right?
亚马逊可以拥有计算资源并提供Bedrock服务,谷歌可以拥有计算资源并提供Vertex服务,微软可以拥有计算资源并提供Foundry服务,然后与Anthropic进行收入分成,反之亦然。
Amazon can have the compute, they can serve Bedrock or Google can have the compute and serve Vertex or Microsoft can have the compute and serve Foundry and then do a revenue share with Anthropic or vice versa.
好的。
Okay.
基本上你的意思是,Anthropic现在不得不支付大约50%的溢价,无论是通过收入分成,还是通过临时购买现货算力,而如果他们当初提前购买算力,本不需要支付这些费用。
Basically you're saying Anthropic is having to pay either this like 50% markup in the sense of the revenue share or in the sense of last minute spot compute that they wouldn't have otherwise had to pay had they bought the compute early.
对。
Right.
而且,这中间是有权衡的。
And and, you know, there's a trade off there.
但与此同时,过去整整四个月,大家都觉得OpenAI疯了,根本不会跟你们签这种合同,因为你们当时没钱;现在所有人都说:‘好的,OpenAI,我们一直相信你们,你们融到了这么多钱,现在我们可以签任何合同了。’但从某种意义上说,Anthropic在这方面反而受限了。
But also at the same time, you know, for a solid, like, four months, everyone was like, OpenAI, we're not gonna sign deals with you, that sounds crazy, right, because you guys don't have the money. Now everyone's like, yeah, OpenAI, we believed you the whole time, we can sign any deal because you've raised all this money. But in a sense, oh, Anthropic is constrained in that sense.
目前还没有太多新增的算力买家,因为Anthropic率先达到了这个能力水平,他们的收入开始增长。
There are not that many incremental buyers of compute yet because Anthropic hit the capabilities here first where their revenues
正在增长。
are moving.
这很有趣。
That's interesting.
也就是说,否则你会觉得,拥有最好的模型就像一个快速贬值的资产,三个月后你就不再是最佳模型了;但关键在于,你能借此签下这些合同,提前锁定算力,获得更优惠的价格。
Like, that's the thing, you know, because otherwise you're like, well, having the best model is an extremely depreciating asset, where, you know, three months later you don't have the best model. But the reason it's important is that you can sign these deals and then lock in the compute in advance, get better prices.
顺便说一句,这是否也意味着——也许这是个显而易见的观点——但直到最近,人们还一直强调:GPU的折旧周期到底是多久?
Doesn't this also imply, by the way, and maybe this is an obvious point, but at least until recently, people had made this huge point about, oh, what is the depreciation cycle of a GPU?
空头们,比如迈克尔·伯里之类的,曾说过,人们认为这些GPU的使用寿命是四到五年。
The bears, the Michael Burrys or whatever, have said, look, people are saying four or five years for these GPUs.
但实际上,也许是因为技术进步太快了,将这些GPU的折旧周期设为两年可能更合理,这会增加每年报告的摊销资本支出。
And in fact, maybe because the technology is improving so fast or whatever, it might make sense to have two year depreciation cycles for these GPUs, which increases the sort of, like, reported amortized CapEx in a given year.
因此,这可能使得建设这些云平台在财务上没那么有利可图。
And so it makes it maybe financially less lucrative to build all these clouds.
但事实上,我们可能面临的是,折旧周期甚至比五年还长,因为我们现在用的是Hopper,而如果AI真的爆发,在2030年我们可能会发现:天啊,得赶紧把7纳米晶圆厂建起来,还得回头去用A100,回报率……
But in fact, you were pointing at, like, maybe the depreciation cycle is even longer than five years, because if we're using Hoppers, and then especially if AI really takes off, in 2030 we're like, fuck, we gotta get the seven nanometer fabs up and we gotta go back to the A100s, like, return on
再次使用A100。
the A100s again.
那它的折旧周期就变得……
Then it's
实际上,折旧周期非常长。
like actually the depreciation cycle is incredibly long.
所以我认为,这是你所说内容带来的一个有趣的财务影响。
And so I think that's an interesting financial implication of what you're saying.
是的。
Yeah.
那里有几个方面可以入手。
There's a few strings to pull on there.
一个是GPU折旧会发生什么。
One is what happens to depreciation of GPUs.
对吧?
Right?
而且我想我之前没回答你的问题,就是Anthropic,我认为到年底,他们自己以及通过Bedrock、Vertex或Foundry提供其产品,应该能实现大约五吉瓦的规模,甚至可能更多。
And I guess I didn't answer your prior question, which is, like, Anthropic, I think, will be able to get to, like, five gigawatts ish, maybe a little bit more, by the end of the year, through themselves as well as their product being served through Bedrock or through Vertex or through Foundry.
我认为他们能实现五到六吉瓦,远超他们最初的计划。
I think they'll be able to get to five or six gigawatts, which is way above their like initial plans.
对吧?
Right?
没错。
Right.
你知道,不管怎样,大致就是这样;OpenAI也差不多,可能还略高一点。
You know, and anyways, that's sort of it, and OpenAI will be roughly the same, maybe a little higher.
根据我们的数据,实际上会稍微高一点,但不管怎样,GPU的折旧周期。
Actually a little bit higher based on our numbers, but anyways, the depreciation cycle of a GPU.
对吧?
Right?
迈克尔·伯里说,大概是三年或更短。
Michael Burry was saying it's, you know, three years or less.
对吧?
Right?
这基本上是他观点的核心。
That's sort of his argument.
而且看待这个问题有两种方式和视角。
And there's sort of two ways and lenses to look at this.
比如,从机械角度来看,这里有一个总拥有成本模型。
Like, mechanically, you know, there's a TCO model.
对吧?
Right?
GPU的总拥有成本,我们据此预测GPU的价格,并计算出整个集群的总成本。
Total cost of ownership of a GPU, where we sort of project pricing out for GPUs and build up the total cost of a cluster.
但还有很多成本,对吧?
But there's a number of costs, right?
你的数据中心成本,对吧?
There's your data center cost, right?
还有你的网络成本,以及数据中心里负责更换设备的现场人员成本。
There's your networking cost, there's your smart hands and people in the data center swapping stuff out.
还有你的备件成本,对吧?
There's your spare parts, right?
还有你实际的芯片成本。
There's your actual chip cost.
还有你的服务器成本。
There's your server cost.
所有这些不同的成本都被合并在一起,还涉及一些折旧周期。
All these various costs get lumped together, and there's some depreciation cycles on it.
你知道,这其中还有一些信贷成本。
You know, there's certain credit costs on it.
然后你就得到了,好吧,这就是你如何计算出来的。
And you get to, okay, that's how you build up.
如果你的折旧周期是五年,那么在五年内大规模部署一台H100,每小时的成本是1.40美元。
Hey, an H100 costs a dollar 40 an hour to deploy at volume across five years, if your depreciation is five years.
如果你以每小时2美元的价格签订五年合约,你的毛利润率大约是35%。
And then if you sign a deal at $2 an hour for those five years, your gross margin is roughly 35%.
略高于这个数字。
It's a little bit above that.
但如果你以1.90美元签约,毛利率大约也是35%。
But, you know, if you sign it for a dollar 90, it's 35% roughly.
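上面的毛利率计算可以写成一个小函数来核对。这里用的是标准毛利率定义,即 (价格 - 成本) / 价格;用对话中引用的约1.40美元/小时的五年摊销成本代入,结果与谈到的"大约35%"处于同一量级。
The margin math above can be sketched as a tiny function. This uses the standard gross margin definition, (price - cost) / price; with the quoted ~$1.40/hr five-year amortized cost it lands in the same ballpark as the "roughly 35%" mentioned.

```python
# Gross margin on renting out a GPU, per the TCO framing in the conversation.
# cost_per_hour is the all-in hourly TCO (chip, data center, networking,
# spares, staffing) amortized over the depreciation window.

def gpu_gross_margin(price_per_hour: float, cost_per_hour: float) -> float:
    """Fraction of the rental price kept after the all-in hourly cost."""
    return (price_per_hour - cost_per_hour) / price_per_hour

# Figures quoted for an H100 amortized over five years:
print(f"$2.00/hr deal: {gpu_gross_margin(2.00, 1.40):.0%}")
print(f"$2.40/hr deal: {gpu_gross_margin(2.40, 1.40):.0%}")
```

按这个定义,后来每小时2.40美元的Hopper合约,比最初2美元左右的合约利润率明显更高。
By this definition, the later $2.40/hr Hopper deals mentioned above come out meaningfully richer than the original ~$2.00/hr ones.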
然后你假设在第五年,GPU就像从公交车上掉下来一样报废了。
And then you assume that at that fifth year, the GPU falls off a bus.
对吧?
Right?
它已经报废了。
It's dead.
在某些情况下,人们提出的论点是:如果你没有签长期合同,那么由于英伟达每两年就把性能提高三到四倍,而价格只翻一倍或上涨50%,H100的价格就会下跌。当然,在2024年,它的市场价值可能是2美元、毛利率35%。
And in some cases, you know, sort of the argument people are making is, well, if you didn't sign a long term deal, because every two years Nvidia is tripling or quadrupling the performance while only 2x-ing the price, or increasing the price 50%, then the price of an H100, sure, maybe the value in the market was $2 at 35% gross margins in 2024.
但在2026年,当Blackwell大规模量产并每年部署数百万颗时,你实际上每小时只值1美元。
But in 2026, when Blackwell is in super high volume and deploying millions a year, you're actually now worth a dollar an hour.
而当Rubin在2027年大规模量产时——尽管它今年才开始出货,但明年就会大规模部署,每年向云服务商交付数百万颗芯片——性能又提升了三倍,价格仅上涨50%或翻倍。
And when Rubin in '27 is in super high volume, right, even though it starts shipping this year, it's in super high volume next year, doing millions of chips a year deployed into clouds, you've got another 3x in performance, another 50% or 2x in price.
实际上,Hopper每小时只值70美分。
Actually, the Hopper is only worth 70¢ an hour.
因此,GPU的价格会继续下跌。
And so the price of a GPU would continue to fall.
这就像一个视角。
That's like one lens.
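第一个视角(若新芯片供给无限,旧芯片会按最新一代的性价比折价)可以用下面的小例子演示;每代约3倍性能、1.5到2倍价格都是对话中引用的粗略数字。
The first lens, where an older chip's hourly value tracks the newest generation's price-per-performance, can be sketched with the rough figures quoted above (about 3x performance per generation at 1.5 to 2x the price).

```python
# First lens: if new chips were unlimited, an old chip's competitive hourly
# value falls with each generation's price-per-performance improvement.
# Figures below are the rough ones quoted in the conversation.

h100_value = 2.00  # $/hr market value quoted for 2024

generations = [
    ("Blackwell era", 3.0, 1.5),  # ~3x performance at ~1.5x price
    ("Rubin era",     3.0, 2.0),  # ~3x performance at ~2x price
]

for name, perf_gain, price_gain in generations:
    # The H100's absolute performance is fixed, so its value scales down
    # by the new generation's price/performance ratio.
    h100_value *= price_gain / perf_gain
    print(f"{name}: H100 worth ~${h100_value:.2f}/hr")
```

这正好落在上文提到的"一美元一小时"和"70美分"附近。
Which lands near the dollar-an-hour and 70¢ figures quoted above.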
另一个视角是,你能从芯片中获得什么效用?
The other lens is what is the utility you get out of the chip?
对吧?
Right?
因为如果你能无限制造Rubin或最新芯片,那么确实会发生这种情况。
Because if you could build infinite Rubin or infinite of the newest chip, then yes, that's exactly what would happen.
随着新芯片的推出,性能价格比上升,Hopper的价格会按现货或短期合同价格下跌。
The price of a hopper would fall at a spot or a short term contract rate as the new chips come out and the price per performance goes up.
但由于半导体供应和部署时间表等限制,实际决定芯片价格的并不是:‘我今天能买到什么相对产品?’
But because you are so limited on semiconductors and deployment timelines and all these things, you end up with actually what prices these chips is not, hey, what's the comparative thing I can buy today?
而是:‘我今天能从这个芯片中获得什么价值?’
It's actually what is the value I can derive out of this chip today?
是的
Yeah.
对吧?
Right?
从这个角度来说,我们以GPT-5.4为例。
And in that sense, let's take GPT 5.4.
GPT-5.4运行成本比GPT-4低得多,活跃参数更少。
GPT-5.4 is both way cheaper to run than GPT-4 and has fewer active parameters.
从活跃参数的角度来看,它要小得多,而且因为使用的是更稀疏的MOE,而GPT-4是更粗粒度的MOE。
It's much smaller, right, in that sense of active parameters, plus, you know, it's a sparser MoE versus GPT-4 being a coarser MoE.
此外,训练、强化学习、模型架构、数据质量等方面的诸多进步,都使得GPT-5.4远优于GPT-4,且服务成本更低。
There's also been so many other advancements in training, RL, model architecture, data quality, etcetera, all these things that have made GPT-5.4 way better than GPT-4, and it's cheaper to serve.
因此,当你看H100时,它用GPT-5.4能处理的token数量,远高于运行GPT-4时的数量。
And so when you look at an H100, it can serve more tokens per GPU of GPT-5.4 than if you ran GPT-4 on it.
对吧?
Right?
所以某种程度上,它正在生成更多高质量模型的令牌。
So in some sense, it's producing more tokens of a model that is of higher quality.
有意思。
Interesting.
所以某种程度上,你知道,GPT-4的令牌最大潜在市场是多少?
And so in some sense, you know, obviously, GPT-4, what is the maximum TAM for its tokens?
你知道,也许只有几十亿美元,也许有几百亿美元。
You know, maybe it was a few billion dollars, maybe it was tens of billions of dollars.
采用需要时间。
Adoption takes time.
对于GPT-5.4来说,这个数字可能超过一千亿美元,但存在采用滞后,还有竞争,其他人也在获得它,而且所有人都在持续改进。
For GPT-5.4, that number is probably north of $100 billion, but there's an adoption lag, and there's competition, other people are getting it, and there's the constant improvements that everyone else is having.
所以如果改进停止了,这里的H100的价值现在取决于GPT-5.4能从中获得的价值,而不是GPT-4能获得的价值,以及这些实验室所追求的利润率等,而它们处于竞争环境中,因此利润率不可能无限增长。
So if improvements stopped, you know, here, the value of an H100 is now predicated on the value that GPT-5.4 can get out of it, instead of the value that GPT-4 can get out of it, and the margins and all that stuff that these labs are doing, and they're in a competitive environment, so their margins can't go to infinity.
是的。
Yeah.
所以你实际上面临着一种非常有趣的动态关系。
So you sort of have this, like, dynamic that is quite interesting, in that an
H100今天的价值比三年前更高。
H100 is worth more today than it was three years ago.
这太疯狂了。
That's crazy.
而且从另一个角度来看,这也挺有意思,不妨往前推一推。
And it I mean, it's also interesting from the perspective of like, just take that forward.
如果我们真的开发出了AGI模型,相当于在服务器上有一个真正的人类,那么按浮点运算能力算,虽然关于人脑每秒能做多少次浮点运算的数字都非常粗略,但有人估计人脑大约是1e15 flops,和一块H100差不多。
If we had actual AGI models developed, like genuinely a human on a server, then on a flops basis, these are such hand-wavy numbers about how many flops the brain can do, but 1e15 is, like, how much some people estimate the human brain does in flops, which is roughly an H100.
当然,从内存角度看,人脑要大得多。
Obviously in terms of memory, the human brain has way more.
H100大约是80GB,而人脑可能有拍字节级别。
H100 is like 80 gigabytes and brain might have petabytes.
哦,是的。
Oh yeah.
你有拍字节?
You've got petabytes?
说个拍字节的零和一给我听听,兄弟。
Name a petabyte of ones and zeros, bro.
给我一个字符串。
Name me a string.
这实际上正是关键点,因为实际上在
Well, this is actually the point where like actually in
不,我们只是拥有有史以来最优秀的稀疏注意力技术。
No, we've just got the best sparse attention techniques ever.
真的吗?
Genuinely, right?
比如在信息压缩量上,可能有拍字节,但其实这种东西,你知道的,是极度稀疏的MOE。
Like, in the sort of amount of information that is compressed, it might be petabytes, but the actual, you know, it's like an extremely sparse MoE.
但不管怎样,想象一下,如果我们有一个人类知识工作者,每年能创造六位数的价值。
But anyways, imagine if we had a human knowledge worker who can produce six figures a year of value.
所以,如果一台H100能产生接近这个水平的产出,如果我们真的在服务器上雇用人类,那么H100的价值就在于它能在几个月内收回成本。
And so if an H100 can produce something close to that, if we had actual humans on a server, the value of an H100 is like, it can repay itself in the course of like a couple of months.
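上面"几个月就能回本"的说法可以粗略算一下;其中每年10万美元的产值来自上文,而每块H100约2.5万美元的整机成本是为演示而假设的数字,并非对话中给出的。
The "repays itself in a couple of months" claim can be roughed out as below; the $100k per year of value is from the conversation, while the ~$25k all-in cost per H100 is an assumed figure for illustration, not from the transcript.

```python
# Hand-wavy payback sketch: if one H100 could do the work of a human
# knowledge worker, how fast would it pay for itself?

value_per_year = 100_000  # six figures of value per year (from the conversation)
h100_cost = 25_000        # assumed all-in capital cost per H100 (illustrative)

payback_months = h100_cost / (value_per_year / 12)
print(f"Payback: ~{payback_months:.1f} months")
```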
在我为报税做准备的过程中,我意识到去年我合作过的承包商超过50人,从摄影师到音频技术人员再到剪辑师都有。
So as I've been going through everything to prep for taxes, I realized that I worked with over 50 different contractors last year, from cinematographers to audio technicians to editors.
我得给他们所有人寄1099表格。
And I owed all of them 1099s.
过去,我一直是用电子表格和一个装满发票的文件夹来确认需要收集税务表格的人。
In the past, I've just used a spreadsheet and a big folder of invoices to figure out who I need to collect tax forms from.
但因为承包商太多,这个过程非常耗时,我差点漏掉了一些人。
But with so many contractors, this takes a bunch of time, and I've almost missed some people.
不过今年,Mercury让我的流程变得简单多了。
This year, though, Mercury made my process way more straightforward.
每当我支付某人2025年的款项时,我只需开启一个开关,让Mercury自动向他们索取W-9表格。
Whenever I paid somebody in 2025, I just hit a toggle to have Mercury request a W-nine from them.
正因为如此,所有我需要开具1099表格的信息都直接发送给了Mercury。
Because of that, everything that I needed to issue 1099s got sent directly to Mercury.
我真的只是点了一下按钮,Mercury 就自动生成并发送了所有文件。
I literally just clicked a button, and Mercury generated and sent them all out.
这只是众多我从未想过银行平台能为我处理的事情之一。
This is just one of the many things that I never would have assumed that a banking platform could just handle for me.
Mercury 有大量类似的功能,这些功能将共同为我节省这个报税季的数天时间。
Mercury has a bunch of features like this, which are going to collectively save me multiple days this tax season.
你可以访问 mercury.com 了解更多信息。
You can learn more at mercury.com.
Mercury 是一家金融科技公司,而非 FDIC 保险银行。
Mercury is a fintech company, not an FDIC insured bank.
银行服务由 Choice Financial Group 和 Column N.A. 提供,二者均为 FDIC 成员。
Banking services provided through Choice Financial Group and Column N.A., Members FDIC.
所以当我采访 Dario 时,我想表达的重点并不是我认为奇点还有两年就到来,因此 Dario 急需购买更多算力。
So when I interviewed Dario, the point I was trying to make is not that I think the Singularity is two years away and therefore Dario desperately needs to buy more compute.
尽管收入确实摆在那里,他也确实需要购买更多算力。
Although the revenue is certainly there, and he does need to buy more compute.
但我真正想表达的是,根据达里奥所言,他说我们距离拥有天才级数据中心的时间不超过两年,最多五年,而这样的数据中心理应创造数万亿的收入。
But the point I was trying to make is that given what Dario seems to be saying, given his statements that we're two years away from a data center of geniuses, certainly not more than five years away, and data center of geniuses should be earning trillions upon trillions of dollars of revenue.
因此,他反复强调在算力上要更保守,或者像你提到的,在算力投入上比OpenAI更谨慎,这完全说不通。
It just does not make sense why he keeps making these statements about being more conservative on compute or to your point, buying being less aggressive than OpenAI on compute.
我想这个观点被忽略了,因为人们开始调侃我,说:"你这期播客是想说服一家市值数千亿美元公司的CEO?你干嘛不直接朝他吼呢,老兄?"
And I guess that point got lost, because then people were, like, roasting me about, like, oh, this podcast was, like, trying to convince this multi-hundred-billion-dollar company's CEO, like, why don't you just yell at him, bro?
但其实我只是想说,他自己的言论在内部是自相矛盾的。
But no, I was just trying to say that his internally, his statements are inconsistent.
不管怎样,把这个问题理清楚是好事。
Anyway, it's good to iron it out.
是的。
Yeah.
我认为,回到之前的观点,如果模型如此强大,那么GPU的价值会随着时间推移而上升。
I think, you know, going back to, like, sort of the earlier view that if the models are so powerful, the value of a GPU goes up over time.
对。
Yeah.
随着我们越来越接近这样一个节点——目前只有OpenAI和Anthropic持这种观点。
As we approach closer and closer to, you know, let's say, a point where right now only OpenAI and Anthropic have that viewpoint.
但随着我们进一步向前推进,实际上每个人都会做到,即使是使用开源模型,也能开始看到每块GPU的价值飙升。
But as we approach further and further out, actually everyone, even with open source models, is going to be able to start to see that value per GPU skyrocket.
因此,从这个角度看,你现在就应该对算力进行投入。
And so in that sense, you should you should commit now to compute.
但有趣的是,按照Anthropic的风格,确实有个梗说他们存在承诺困难的问题,有点像‘多角恋’。
But interestingly, in, like, Anthropic fashion, right, you know, there is a bit of a meme that they have problems with commitment issues, and they're, like, sort of polyamorous.
不是不是不是,这只是个梗而已。
Not not, but, like, this is a bit of a meme.
这解释了一切。
Explains everything.
顺便说一下,有个有趣的经济学效应叫"阿尔奇安-艾伦效应"(Alchian-Allen),意思是:如果给两种商品(一种质量高、一种质量低)都加上同样的固定成本,人们在边际上就会更倾向于选择质量高的那种。
By the way, there's this interesting economics effect called Alchian-Allen, which is the idea that if you add a fixed cost to different goods, one of which is higher quality and one of which is lower quality, that will make people choose the higher quality good on the margin.
举个具体例子,假设更好吃的苹果卖2美元,而差一点的苹果卖1美元,好吧。
So to give you a specific example, suppose the, you know, better tasting apple costs $2, and the shittier apple costs $1. Okay.
现在假设你对它们征收进口关税。
Now suppose you put an import tariff on them.
于是现在,好的苹果是3美元,中等的苹果是2美元。
And so now it's $3 versus $2 for, like, great apple, medium apple.
对吧?
Right?
这是因为两者都涨了一美元,还是应该上涨50%?
Is that because they both increased by a dollar or should it be like 50% increase?
不,不是的。
No, no.
因为两者都上涨了一美元。
Because they both increased by a dollar.
整个效应是,如果对两者都施加一个固定成本,它们之间的相对价格、价格差异就会发生变化。
The whole effect is that if there's a fixed cost that's applied to both, the relative price, the price difference between them, the ratio changes.
以前,更贵的那个价格是便宜的两倍。
Previously, the more expensive one was 2x more expensive.
现在它只贵1.5倍。
Now it's just 1.5x more expensive.
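这个"苹果关税"的例子可以直接算出来:给两种商品各加1美元固定成本后,相对价格比从2倍降到1.5倍。
The apple-tariff example above can be computed directly: adding a flat $1 to each good shrinks the relative price ratio from 2x to 1.5x.

```python
# Alchian-Allen sketch: a flat per-unit cost added to both goods shrinks
# the *relative* price gap, nudging buyers toward the higher-quality good.

premium, regular = 2.00, 1.00   # better apple vs. shittier apple
tariff = 1.00                   # flat import tariff on each apple

ratio_before = premium / regular
ratio_after = (premium + tariff) / (regular + tariff)

print(f"Before: {ratio_before:.1f}x  After: {ratio_after:.1f}x")
```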
所以我在想,如果应用到人工智能领域,这意味着如果GPU变得更贵,计算成本将出现固定成本的上涨。
And so I wonder if, applied to AI, that would mean that, look, if GPUs are gonna get more expensive, there will be a fixed cost increase in the price of compute.
是的。
Yes.
因此,这会促使人们愿意为稍好一点的模型支付更高的溢价。
As a result, that will push people to be willing to pay higher margins for slightly better models.
因为从计算角度看,反正我都要为计算支付这么多钱。
Because the calculus is I'm gonna be paying all this money for the compute anyways.
那我还不如多花一点钱,确保使用的是最顶尖的模型,而不是稍差一点的模型。
I might as well just pay slightly more to making sure it's like the very best model rather than a model that's slightly worse.
对。
Right.
所以Hopper从每小时2美元涨到了3美元;如果一小时的Hopper能生成一百万个Opus令牌,或者两百万个Sonnet令牌,那么由于GPU价格从2美元涨到3美元,Opus和Sonnet之间的价格差异就缩小了。
So the Hopper went from $2 to $3, and if the Hopper can make a million tokens of Opus and it can make two million tokens of Sonnet, the price differential between Opus and Sonnet has decreased, because the price of the GPU has increased by a dollar, from $2 to $3.
有意思。
Interesting.
我觉得这非常有道理。
I think that makes a ton of sense.
而且,我认为我们目前看到的所有流量和收入都集中在最好的模型上,在计算资源受限的世界里,会发生两件事。
Also, I think we just see all of the volume is on the best models today, all the revenue is on the best models today, and in a compute limited world, there are sort of two things that happen.
对吧?
Right?
A,那些早早锁定资源、没有承诺问题的公司,签订了五年的计算资源合同。
A, companies that have locked up compute, you know, and don't have commitment issues, have these five-year contracts for compute.
它们已经锁定了巨大的利润率优势,因为它们是以五年前、三年前或两年前的定价锁定的计算资源。
They've kind of locked in a humongous margin advantage because they've locked in compute for five years at a price of what it transacted at five years ago or three years ago or two years ago, whatever it is.
而如果你已经进入五年合同的第三年,而别人的两年或三年合同到期了,现在你试图以现代价格购买,而你的定价是基于模型的价值,那么价格会大幅上涨。
Whereas if you're now three years into that five year contract and someone else's two year contract or three year contract rolled off, and now you're trying to buy that at, you know, modern pricing, when you're priced to the value of models, the price is gonna be up a lot more.
因此,从某种意义上说,早期做出承诺的人通常拥有更好的利润率。
And so in a sense, like the person who committed early has better margins in general.
而签订长期合同的市场份额远大于那些可以临时添加的短期合同市场份额。
And the percentage of the market that is in long term contracts is much larger than the percentage of the market in short term contracts that can be this sort of flex capacity that you add at the last second.
同时,利润去哪儿了呢?
And and then at the same time, right, so where does the margin go?
对吧?
Right?
因为模型变得更有价值了。
Because models get more valuable.
云服务商能多大程度地灵活调整定价?
How much can the cloud players flex their pricing?
事实上,如果你看看CoreWeave,他们目前超过90%的计算资源平均合同期限已经超过三年。
Well, in fact, if you look at CoreWeave, their average term duration is like over three years right now for 90-plus percent of their compute.
期限超过三年。
It's over three years.
因此,他们陷入了这样一个困境:实际上无法灵活调整价格。
And so they end up with this like conundrum of like, well, they can't actually flex price.
但每年,它们新增的产能都远超以往。
But every year, they're adding incrementally way more capacity than they had previously.
对吧?
Right?
仅今年一年,Meta新增的产能就相当于2022年整个数据中心集群为服务WhatsApp、Instagram和Facebook以及运行AI所拥有的全部算力。
This year alone, right, Meta's adding as much capacity as they had in their entire fleet of compute and data centers in 2022 for all purposes, serving WhatsApp and Instagram and Facebook and doing AI.
对吧?
Right?
仅这一项,他们今年就新增了这么多。
They're adding that alone this year.
所以同样地,你想想,Meta在这么做,CoreWeave、谷歌和亚马逊,这些公司每年都在疯狂增加算力,而这些新算力都是按新价格交易的。
So in the same sense, you know, Meta's doing that, and CoreWeave and Google and Amazon, all these companies are adding insane amounts of compute year on year on year, and that new compute gets transacted at the new price.
所以从某种意义上说,是的,只要我们处于这种起飞阶段,价格就被锁定了。
So in a sense, yes, you've locked in as long as we're in a sort of a takeoff.
对吧?
Right?
哦,OpenAI去年从600兆瓦增加到2吉瓦,今年从2吉瓦增加到6吉瓦以上,明年预计会达到12吉瓦。
Oh, OpenAI went from 600 megawatts to two gigawatts last year, from two gigawatts to, you know, six plus this year, and, you know, six to 12 next year.
对吧?
Right?
新增的计算能力才是成本的主要来源,而不是之前的长期合同。
The incremental added compute is where all the cost is, not the prior long term contracts.
所以,掌握定价权的是基础设施提供商。
So then who holds the cards is the infra providers, for charging margin.
对吧?
Right?
现在,云服务商、新兴云厂商或超大规模云提供商可以收取溢价。
So now the cloud players, the neo clouds or the hyperscalers can charge the margin.
哦,他们不能,或者只能在一定程度上做到。
Oh, they can't, or they can to some extent.
但当你向上游看,谁掌握了所有的内存和逻辑容量呢?
But then as you go upstream to, oh, well, who has access to all the memory and logic capacity?
主要是英伟达。
Well, it's Nvidia for the most part.
他们签了很多长期合同。
They've signed a lot of long term contracts.
你知道,他们现在已经有大约900亿美元的长期合同,并且正在与内存供应商谈判三年期的协议。
You know, they've got like $90,000,000,000 of long term contracts today, and they're negotiating three-year deals with the memory vendors today.
你知道,像亚马逊和谷歌通过博通,还有亚马逊直接合作,以及这些公司,还有AMD。
You know, you've got, obviously, Amazon and Google through Broadcom, and, you know, Amazon directly, and all these companies, AMD.
这些公司握有所有主动权,因为它们已经锁定了产能。
These companies hold all the cards because they've secured the capacity.
台积电没有提价,但内存供应商却在某种程度上大幅提价。
And TSMC is not raising prices, but memory vendors are, to some extent, raising prices a lot.
对吧?
Right?
所以它们又要将价格翻倍甚至提高三倍。
So they're gonna double or triple price again.
但他们也在签订这些长期协议。
But then they're also signing these long term deals.
所以实际上能够积累所有利润的是云服务商、芯片厂商和内存厂商,直到台积电或ASML站出来表示:不行。
So who is able to accrue all the margin dollars is actually, you know, potentially the clouds, potentially the chip vendors, and the memory vendors, until TSMC or ASML break out and they're like, no.
我们打算大幅提高价格。
Actually, we're gonna charge a lot more.
但与此同时,模型厂商能收取巨额利润吗?
But at the same time, do the model vendors get to charge crazy margins?
我认为至少今年,模型厂商的利润率会大幅上升。
I think at least this year, we're gonna see margins for the model vendors go up a lot.
对吧?
Right?
因为他们的产能严重受限,不得不抑制需求。
Because they're so capacity constrained, they have to destroy demand.
对吧?
Right?
他们不可能继续这样下去。
There's no way they can continue.
Anthropic 无法在不破坏需求的情况下继续保持目前的节奏。
There's no way Anthropic can continue at the current pace without destroying demand.
是的。
Yeah.
我们来谈谈逻辑和内存。
Let's get into logic and memory.
Nvidia 具体是如何锁定如此多的逻辑和内存资源的?
How specifically has Nvidia been able to lock up so much of both?
所以,根据你的数据,到 2027 年,Nvidia 将占据超过 70% 的 N3 晶圆产能,或者差不多这个比例。
So I think, according to your numbers, by '27 Nvidia is gonna have like 70-plus percent of N3 wafer capacity, or around that.
至于 SK 海力士和三星等公司的内存份额,我记不清具体数字了。
And then I forget what the numbers were for memory at SK Hynix and Samsung and so forth.
但如果你看看 Neo 云业务是如何运作的,Nvidia 是如何与之合作的,或者 RL 环境业务是如何运作的,Anthropic 是如何与之合作的。
But think about how the neo cloud business works and how Nvidia works with that, or how the RL environment business works and how Anthropic works with that.
在这两种情况下,英伟达故意试图分裂互补产业,以确保自己拥有尽可能多的议价能力。
In both those cases, Nvidia is purposely trying to fracture the complementary industry to make sure that they have as much leverage as possible.
所以他们把产能分配给一些随机的云服务商,以确保不会让某一家公司独占所有算力。
So they're giving, you know, allocation to random neo clouds to make sure that there's not one person that has all the compute.
同样地,Anthropic 或 OpenAI 在与数据提供商合作时会说:不,我们要培育一个庞大的此类产业,这样我们就不会被任何一家数据环境供应商锁定。
Similarly Anthropic or OpenAI, when they're working with the data providers, they say, no, we're gonna just seed a huge industry of these things so that, we're not locked into any one supplier for, for data environments.
我想知道,为什么在即将推出的三纳米工艺——也就是Trainium 3、TPU v7,以及其他潜在加速器上——
And I wonder why, on the three nanometer process, that's gonna be Trainium 3, that's gonna be TPU v7, other accelerators potentially.
为什么台积电要把所有产能都交给英伟达,而不是试图分散市场?
And why is TSMC just giving it all up to Nvidia rather than, you know, just trying to fracture the market?
是的。
Yeah.
所以我认为这里有几点值得注意。
So I think there's a couple like points here.
对吧?
Right?
关于三纳米工艺,如果我们回看去年,大部分三纳米产能都是苹果的。
On three nanometer, you know, if we go back to last year, the vast majority of three nanometer was Apple.
对吧?
Right?
苹果正在转向二纳米工艺。
Apple's being moved to two nanometer.
内存价格正在上涨。
Memory prices are going up.
所以苹果的出货量可能会下降。
So Apple's volumes may go down.
对吧?
Right?
因为内存价格上涨,他们要么削减利润,要么转向其他方案。
Because as memory prices go up, they either have to cut margin or move on.
你知道,由于存在长期合同,这中间会有一些时间延迟。
You know, there there's some time lag because they have long term contracts.
但基本上,苹果可能会减少需求,或更快转向二纳米制程,而目前二纳米仅能生产移动芯片。
But basically, Apple likely reduces demand slash moves to two nanometer faster, where two nanometer is only capable of a sort of mobile chips today.
未来,AI芯片也会转向这一制程。
And in the future, AI chips will move there.
所以苹果在这方面是有优势的。
So sort of Apple has that.
然后苹果也在与第三方供应商接洽,因为他们被TSMC挤占了一点空间。
Then Apple is also talking to third party vendors because they're getting squeezed out of TSMC a little bit.
因为TSMC在高性能计算、AI芯片等领域的利润率,高于移动芯片领域,毕竟他们在HPC领域的优势比在移动领域更大。
Because TSMC's margins on high performance computing, HPC, AI chips, etcetera, are higher than they are for mobile, because they have a bigger advantage in HPC than they do in mobile.
但不管怎样,当你看TSMC目前的算盘时,实际上他们给从事CPU业务的公司提供了非常优厚的产能分配。
But anyways, when you look at the calculus TSMC is running here, actually, they're providing really good allocations to companies that are doing CPUs.
对吧?
Right?
所以当你想到,亚马逊有Trainium,也有Graviton,这两者都采用三纳米制程,其中Graviton是他们的CPU,Trainium是他们的AI芯片,TSMC实际上更愿意把产能分配给Graviton,而不是Trainium,因为他们认为CPU业务具有更稳定、更长期的增长前景。
So when you think about, hey, Amazon has Trainium and Amazon has Graviton, and both of those are on three nanometer, Graviton being their CPU, Trainium being their AI chip, TSMC is actually much more excited to give allocation to Graviton than they are to Trainium, because they view the CPU business as more stable long-term growth.
对吧?
Right?
作为一家保守且不想过度依赖增长周期的公司,你实际上会优先将产能分配给更稳定、增长率较低的市场,然后再将所有增量产能分配给高增长率的市场。
And as a company that is conservative and doesn't want to ride cycles of growth too hard, you actually want to allocate to the the market that is more stable and lower growth rate first before you allocate all the incremental capacity to the fast growth rate market.
一般来说,情况就是这样。
Now, that is that is the case generally.
所以你看,AMD的情况也是如此。
And so when you look at like, hey, same for AMD.
对吗?
Right?
他们在CPU上获得的产能分配,台积电对此的热情远高于对GPU的。
The allocations they get on, you know, their CPUs, TSMC is much more excited about those than they are for GPUs.
亚马逊的情况也是如此。
Likewise for Amazon.
而英伟达则有些特殊,因为确实,他们也有CPU。
And Nvidia is is a bit unique because all yes, they have CPUs.
是的,他们制造交换机。
Yes, they make switches.
是的,他们生产网络设备。
Yes, they make networking.
他们制造NVLink。
They make NVLink.
他们生产所有这些不同的InfiniBand、以太网以及各种网卡产品。
They make all these different InfiniBand Ethernet, all these different products, NICs.
总的来说,到今年年底,随着Rubin发布及其家族中的所有芯片,这些产品大多将采用3纳米工艺。
By and large, most of these things will be on three nanometer by the end of this year with the Rubin launch and all the chips in that family.
其中GPU最为重要,但Nvidia却获得了大部分产能。
The GPU being the most important one, and yet Nvidia is getting the majority of supply.
对吧?
Right?
部分原因在于,当你观察市场时,会发现像台积电这样的公司有多种方式预测市场需求,但这也是一种市场信号。
Part of this is because, you know, TSMC and others have many ways that they forecast market demand. But it's also market signal.
对吧?
Right?
市场发出了信号:明年我们需要这么多产能。
The market signaled, hey, we need this much capacity next year.
我们需要这么多。
We need this much.
我们需要这么多。
We need this much.
我们会签署不可取消、不可退货的订单。
We'll sign non cancelable, non returnable.
我们甚至可能支付定金。
We may even pay deposits.
对吧?
Right?
类似这样的事情。
Things like this.
Nvidia 比 Google 或 Amazon 早得多就做了这件事。
Nvidia just did it way earlier than Google or Amazon.
在某些情况下,Google 和 Amazon 遇到了障碍,比如有一款芯片 Trainium 延迟了几个季度,诸如此类的事情都发生了。
And in some cases, Google and Amazon had stumbling blocks, you know, one of the chips, Trainium, got delayed slightly by a couple of quarters, and all these sorts of things happened.
在这种情况下,大家突然意识到:好吧。
And then so in that case, there was a huge sort of like, okay.
这些公司都在拖延,但 Nvidia 却在不断追加需求,我们正在与整个供应链核实。
Well, these guys are delaying, but Nvidia is wanting more, more, more, and we are checking with the rest of the supply chain.
产能够不够?
Is there enough capacity?
对吧?
Right?
所以他们正在找所有 PCB 厂商,说:嘿。
So they're going to all the PCB vendors and they're saying, hey.
Victory Giant 的产能够不够?
Is there enough Victory Giant?
PCB产能够吗?
Is there enough PCB?
这是给英伟达供应PCB的最大供应商之一,是一家中国公司。
This is like one of the largest suppliers of PCBs to Nvidia, they're a Chinese company.
所有的PCB几乎都来自中国,要么是他们生产的,要么是很多来自他们。
All the PCBs come from China, sort of from them, or many of them do.
不管怎样,他们问:你们有足够的PCB产能吗?
And and anyways, they're like, do you have enough PCB capacity?
太好了。
Great.
哦,对了,内存供应商谁有充足的内存产能?
Oh, hey, memory vendors who has all the memory capacity?
哦,明白了。
Oh, okay.
英伟达有。
Nvidia does.
很好。
Great.
所以当你以同样的方式来看,谁有足够的AGI信仰,愿意在长期时间内以连非AGI信仰者都觉得荒谬的水平购买算力,但他们仍然愿意支付不错的利润率并立即签约,因为他们认为未来这种比例会失衡。
So when you look at it in the same way, you know, who is AGI pilled enough to buy compute on long timelines at levels that seem ridiculous to people who aren't AGI pilled, but nonetheless is willing to pay a pretty good margin and sign now, because they view that in the future that ratio is screwed up?
半导体供应链也是一样的情况。
The same thing happens with the supply chain for semiconductors.
对吧?
Right?
英伟达在……我不认为英伟达完全属于AGI信仰者。
Nvidia was, well, I don't think Nvidia is quite AGI pilled.
对吧?
Right?
你知道,詹森并不相信软件会被完全自动化,诸如此类的事情。
You know, Jensen doesn't believe software is gonna be automated fully and all these things.
对吧?
Right?
加速计算,不是AI芯片。
Accelerated computing, not AI chips.
对吧?
Right?
就是AI芯片。
It's AI chips.
对吧?
Right?
但他就是这么叫的。
But that's what he calls it.
对吧?
Right?
是的。
Yeah.
因为,我觉得这是一个更宽泛的术语。
Because, I mean, I think there's a broader term.
对吧?
Right?
AI属于其中,但比如物理建模和模拟之类的
AI is within that, but, like, physics modeling and simulations and, like
或者也许他只是没有接受主要的应用场景。
Or maybe just, like, he's not embracing the sort of, like, main use case.
而且
And
我觉得他是在接受的。
I think he's embracing it.
但我觉得他并没有像达里奥或萨姆那样深陷AGI思维。
But like, I I just don't think he's like AGI pilled like Dario, right, or Sam.
但他仍然比去年第三季度的谷歌或亚马逊更专注于AGI,而且他看到了更大的需求。
But he's still way way more AGI pilled than Google was at q three of last year or Amazon was at q three of last year, and he saw way more demand.
对吧?
Right?
原因其实很简单。
And the reason is pretty simple.
你知道,你可以看到所有数据中心的建设。
You know, you can see all the data center construction.
他想,好吧,我要拿下这个市场份额。
He's like, okay, I wanna have this market share.
我们基本上追踪了所有数据中心,你可以看到,很多数据中心可能属于其中任意一种情况。
You know, we have all the data centers tracked, and you can see, you know, there are a lot of data centers where you could say, well, they could be one or the other.
对吧?
Right?
所以在某种程度上,谷歌和亚马逊——尤其是谷歌,尽管他们的TPU更适合他们部署,但还是不得不大量部署GPU,因为他们没有足够的TPU来填满数据中心。
And so to some extent, Google and Amazon, you know, Google especially, even though their TPU is just better for them to deploy, they have to deploy a crapload of GPUs because they don't have enough TPUs to fill up their data centers.
他们没法把TPU生产出来。
They can't get them fabbed.
等等。
Wait.
我可以问个关于这个的问题吗?
Can I so I have a question about that?
谷歌卖了一百万个V7,也就是Ironwoods给Anthropic,对吧?
Google sold, I think a million, was it the V7s, the Ironwoods to Anthropic.
你说现在这一年或者明年,总的来说,最大的瓶颈就是逻辑和内存,也就是制造这些芯片所需的东西。
And you're saying in general there's this big bottleneck right now, this year or next year, I mean, I guess going forward forever now, which is gonna be, you know, logic and memory, the stuff it takes to build these chips.
谷歌拥有DeepMind。
And Google has DeepMind.
这是第三个重要的AI实验室。
This is the other third prominent AI lab.
如果这是最大的瓶颈,那他们为什么要把这些芯片卖出去,而不是直接给DeepMind呢?
And if this is the big bottleneck, why would they sell it rather than just giving it to DeepMind?
是的。
Right.
所以,这又是一个问题,DeepMind的人觉得这太疯狂了。
So so this is again like a a problem of like, you know, DeepMind people are like, this is insane.
我们为什么要这么做?
Why did we do this?
是的。
Yeah.
对吧?
Right?
但谷歌云团队和谷歌高管看到了另一种不同的思维方式。
But then Google Cloud people and Google executives had a different thought process.
对吧?
Right?
而且,你和我都了解计算团队。
And basically, you know, you and I know the compute team.
有一个人,实际上他们两人都来自谷歌,是Anthropic计算团队的核心人物,他们注意到了这种脱节。
There's one guy, you know, both of them actually came from Google, the main people on the compute team at Anthropic, and they saw this dislocation.
他们谈成了这笔交易,在谷歌意识到之前就获得了这些计算资源。
They negotiated a deal, and they were able to get access to these to this compute before Google realized.
所以,从我们发现的数据来看,事件的链条至少发生在第三季度初,在大约六周的时间里,我们看到Anthropic——抱歉,是TPU的算力容量显著提升,而且在那六周内多次大幅增长。
And so actually, the chain of events, at least from our data, was that in early Q3, over the course of, like, six weeks, we saw capacity on Anthropic, or sorry, on TPUs, go up by a significant amount, and it went up multiple times in those six weeks.
对吧?
Right?
有多次请求。
There were multiple requests.
谷歌甚至不得不去台积电,向他们解释为什么需要如此突然的产能增加。
Google even had to go to TSMC and explain to them why they needed this increase in capacity because it was so sudden.
但这些产能增加的很大一部分是用于供应Anthropic的。
But that a lot of that capacity increase was for selling to Anthropic.
是的。
Yeah.
因为Anthropic比谷歌更早看到了这一点。
Because Anthropic saw it before Google.
然后谷歌推出了Nano Banana和Gemini 3,导致其用户指标飙升,谷歌管理层于是意识到:哦。
And then Google had Nano Banana and Gemini 3, which caused their user metrics to skyrocket, and leadership at Google was like, oh.
然后他们开始提出声明,说我们必须每六个月将计算能力翻倍?
And then they started making the statement of we have to double compute every is it six months?
或者我不记得他们具体说的是多少了。
Or I don't remember the exact number that they said.
但他们确实醒悟了很多,于是他们对台积电说:我们想要更多。
But they they really woke up a lot more, and then they're like, oh, hey, TSMC, we want more.
我们想要更多。
We want more.
而台积电却说:抱歉,各位。
And it's like, well, sorry, guys.
我们明年的产能已经售罄了。
Like, we're sold out for next year.
我们可以为明年安排生产。
We can work on next year.
我们或许能在2026年多提供10%,但真正重点会放在2027年。
We can maybe get, like, 10% more for '26, but really, we're gonna work on '27.
对吧?
Right?
这就像,你知道的,在我看来,实验室之间存在某种信息不对称。
It's sort of like, you know, there's this like information asymmetry of the labs in my mind.
对吧?
Right?
我不确定这是否完全准确,但我根据供应链中看到的所有数据——比如晶圆订单、Anthropic和Fluid Stack签署的数据中心动态等等——自己构建了这样一个叙事,这让我非常清楚地意识到,谷歌搞砸了。
I don't know if this is exactly it; it's a narrative I've spun myself from seeing all the data in the supply chain on, like, wafer orders and what's going on with the data centers that Anthropic signed and Fluid Stack signed and all this, but it's pretty clear to me that Google screwed up.
你可以从谷歌Gemini的年度经常性收入(ARR)中看到这一点。
And you can see this from Google's Gemini ARRs.
对吧?
Right?
他们在第一季度到第三季度几乎没有任何收入。
They had next to nothing in q one to q three.
第三季度稍微好了一点,对吧?因为他们开始出现转折了。
Q three a little bit, right, once they started inflecting.
但第四季度,他们的ARR达到了50亿美元。
But q four, they were at, like, 5,000,000,000 ARR.
对吧?
Right?
大概是第四季度末的数字之类的。
Exiting the quarter, or something like this.
所以从ARR角度来看,第四季度的收入是50亿美元。
So it's like $5,000,000,000 of revenue for Q4 on an ARR basis.
因此很明显,谷歌并没有看到收入飙升。
And so it's clearly like Google didn't see revenue skyrocket.
从某种意义上说,Anthropic之前一直有点犹豫不决,直到他们的ARR突然增长,尽管他们掌握的信息更不对称,更能预见未来趋势。
And in a sense, Anthropic was not willing, you know, they kinda had a little bit of commitment issues before their ARR exploded, even though they have far more information asymmetry and can see what's coming down the pipe.
谷歌比Anthropic更保守,a。
Google is going to be more conservative than Anthropic is, a.
b,而且谷歌的ARR本来就更少,所以他们似乎就是不太愿意行动,直到后来才意识到应该这么做。
And b, Google had even less ARR. So they were, I think, just not willing to do it, and then they realized they should do it.
从那以后,谷歌变得极度痴迷于AGI。
And so now since then, Google has gotten absurdly AGI pilled.
对吧?
Right?
就他们所做的事情而言。
In terms of like what they're doing.
他们收购了一家能源公司。
They bought an energy company.
他们为涡轮机支付了定金。
They're putting deposits down for turbines.
他们购买了海量的电力用地。
They're buying a ridiculous percentage of the powered land.
他们正在与公用事业公司谈判长期协议。
They're going to utilities and negotiating long term agreements.
他们在数据中心和电力方面非常激进地推进这些举措。
They're doing this on the data center and power side very, very aggressively.
对吧?
Right?
所以,你知道,我认为谷歌在去年年底才醒悟过来,但花了他们一些时间。
So, you know, I think Google woke up towards the end of last year, but it took them some time.
你觉得到明年年底,谷歌会有多少吉瓦的电力?
And how many gigawatts do you think Google will have by the end of next year?
买我的数据。
Buy my data.
你对这种信息收费。
You charge for that kind of information.
是的。
Yes.
是的。
Yes.
我觉得每年阻碍我们扩展AI算力的瓶颈都在变化。
I feel like every year the bottleneck for what is preventing us from scaling AI compute keeps changing.
几年前是CoWoS,去年是电力,今年你会告诉我瓶颈在哪里,但我想了解五年后,是什么在制约我们部署奇点?
A couple of years ago it was CoWoS, last year it was power, this year you'll tell me where the bottleneck is, but I want to understand five years out, what will be the thing that is constraining us from deploying the Singularity?
是的。
Yeah.
我认为最大的瓶颈是计算能力。
I think the biggest bottleneck is compute.
而在这方面,最长交期的供应链既不是电力,也不是数据中心。
And for that, the longest lead time supply chains are not power or data centers.
而是半导体供应链本身。
They're actually the semiconductor supply chain themselves.
对吧?
Right?
瓶颈又从电力和数据中心转向了芯片。
It switches back from being power and data center as a major bottleneck to chips.
在芯片供应链中,存在多个不同的瓶颈,对吧?
And in the chip supply chain, there's a number of different bottlenecks, right?
有内存,有台积电的逻辑晶圆,还有晶圆厂本身。
There's memory, there's logic wafers from TSMC, there's fabs themselves.
晶圆厂的建设需要两到三年,而数据中心的建设则不到一年。
Construction of the fabs takes a couple years, three two to three years versus a data center takes less than a year.
对吧?
Right?
我们已经看到亚马逊在八个月内就建成了数据中心。
We've seen Amazon build data centers in as fast as eight months.
对吧?
Right?
由于制造芯片的工厂本身的复杂性以及设备的原因,交货时间存在巨大差异。
So there's a big difference in lead times because of the complexity of the building, the fab that actually makes the chip, and then the tools.
对吧?
Right?
这些设备的交货时间也非常长。
Those also have really long lead times.
因此,随着我们规模的扩大,瓶颈已经从‘当前供应链目前无法做到什么’转向了其他问题。
And so the bottlenecks as we've scaled have shifted from, hey, what is the supply chain currently not able to do?
以前是CoWoS、电力和数据中心,但这些都属于交期较短的项目。
Which was CoWoS and power and data centers, but those were all shorter lead time items.
对吧?
Right?
CoWoS只是将芯片封装在一起的相对简单的流程。
CoWoS is a much simpler process of packaging chips together.
电力和数据中心最终比芯片的实际制造要简单得多。
Power and data centers are ultimately way more simple than the actual manufacturing of the chips.
因此,一些产能已经从移动或PC芯片转向了数据中心芯片,但这种转移在一定程度上是可替代的。
And so there's been some sliding of capacity across, you know, mobile or PC to data center chips, but that's been somewhat fungible.
而CoWoS、电力和数据中心作为供应链,需要从头开始建设;但现在,曾占半导体行业大部分份额的移动和PC行业,已经没有多余产能可以转向AI了。
Whereas CoWoS and power and data centers have sort of had to start anew as supply chains. But now there's sort of no more capacity for the mobile and PC industries, which used to be the majority of the semiconductor industry, to shift over to AI.
对吧?
Right?
英伟达现在是台积电最大的客户,也是全球最大的内存制造商SK海力士的最大客户。
Nvidia is now the largest customer at TSMC, and Nvidia is the largest customer at SK Hynix, the largest memory manufacturer.
对吧?
Right?
因此,将资源从普通消费者——也就是PC和智能手机——身上转移出来,进一步转向AI芯片,已经几乎不可能了。
So it's sort of impossible for the scaling or the sliding of resources away from the common person, right, PCs and and smartphones to shift any more towards the AI chips.
那么,我们该如何扩大AI芯片的生产呢?
And so now how do we scale the AI chip production?
而这就是我们在迈向2030年时面临的最大瓶颈。
And that's the biggest bottleneck as we go to 2030.
这将会是
It'd be
如果能根据一个绝对的吉瓦上限来预测到2030年的情况,仅基于我们无法生产超过这么多台EUV光刻机这一事实,那将非常有趣。
very interesting if there is an absolute gigawatt ceiling that you can project out to 2030 based just on, hey, we can't produce more than this many EUV machines.
对。
Right.
因此,要进一步提升计算能力,今年和明年会有一些不同的瓶颈。
So to scale compute further, right, there's some different bottlenecks this year, next year.
但到2028、2029年,瓶颈最终会落到供应链最底层的ASML身上。
But ultimately, by '28 or '29, the bottleneck falls to the lowest rung on the supply chain, which is ASML.
对吧?
Right?
ASML制造着世界上最复杂的机器,也就是EUV设备,每台的售价在3亿到4亿美元之间。
ASML makes the world's most complicated machine, i.e., an EUV tool, and the selling price for those is $300 to $400 million.
目前,他们每年只能生产大约70台。
And currently, they can make about 70 a year.
明年,他们将提升到80台。
Next year, they'll get to 80.
即使在非常激进的供应链扩张下,到本世纪末他们的产量也只会略超100台。
Even under very aggressive supply chain expansion, they only get to a little bit over a 100 by the end of the decade.
那么这意味着什么?
And so what does that mean?
好的。
Okay.
到本世纪末,他们能生产大约100台这种设备,而目前是70台。
They can make a 100 of these tools by the end of the decade and, you know, 70 right now.
这实际上如何转化为AI算力呢?
How does that actually translate to AI compute?
对吧?
Right?
我们看到萨姆·阿尔特曼和供应链中许多其他人给出的数字,吉瓦,吉瓦,对吧?
We we see all these numbers from Sam Altman and and many others across the supply chain, gigawatts, gigawatts, Right?
我们每年新增多少吉瓦?
How many gigawatts are we adding?
我们看到埃隆说,太空中有上百吉瓦。
And we see, you know, Elon saying, the hundred hundred gigawatts in space.
每年。
A year.
一年。
A year.
对。
Right.
这些数字或对这些数字的质疑,真正的问题其实不在于电力,也不在于数据中心。
The the problem with any of these numbers or the challenge to these numbers is, you know, actually not the power, not the data center.
我们可以深入探讨这一点。
We can dive into that.
但问题是芯片的制造。
But it's it's it's manufacturing the chips.
对。
Right?
所以是吉瓦级别的,比如英伟达的鲁宾芯片。
So a gigawatt of, you know, Nvidia's Rubin chips.
对。
Right?
所以Rubin是在GTC上发布的,我认为这个播客上线的那一周。
So Rubin is announced at GTC, I believe the week this podcast goes live.
要制造出Nvidia今年年底即将发布的新芯片所对应的吉瓦级数据中心容量,你需要几种不同的晶圆技术。
And to make a gigawatt's worth of data center capacity of Nvidia's latest chip, which they're releasing towards the end of this year, you need, you know, a few different wafer technologies.
对吧?
Right?
你需要大约55,000片3纳米的晶圆。
You need about 55,000 wafers of three nanometer.
你需要大约6,000片5纳米的晶圆,还需要大约170,000片DRAM内存晶圆,对吧?
You need about 6,000 wafers of five nanometer, and then you need about 170,000 wafers of DRAM, right, memory.
因此,在这三类不同的材料中,每一种都需要不同数量的EUV设备。
And so across these three different buckets, each of these requires different amounts of EUV.
对吧?
Right?
当你制造一片晶圆时,会经历成千上万道工艺步骤,包括沉积材料和去除材料。
So when you manufacture a wafer, there are thousands and thousands of process steps where you're depositing material and removing material.
但最关键的一个步骤——至少在先进逻辑芯片中——占了芯片成本的30%左右,而这个步骤实际上并不会在晶圆上添加任何材料。
But the sort of key critical step, which at least in advanced logic is like 30% of the cost of the chip, is something that doesn't actually put anything on the wafer.
对吧?
Right?
你先取一块晶圆,涂上光刻胶,这是一种在暴露于光线下会发生化学变化的化学物质,然后将它放入EUV设备中,用特定方式照射它。
You take the wafer, you deposit photoresist, which is a chemical that basically chemically changes when you expose it to light, then you stick it into the EUV tool, which shines light at it in a certain way.
它会形成图案。
It patterns it.
对吧?
Right?
因为这里有一个叫做掩模的东西,本质上是一个设计的模板。
Because there's what's called a mask, which is a stencil effectively for the design.
所以当你看一块先进制程的3纳米晶圆时,它大约有70个左右的掩模。
And so when you look at a wafer, you know, a leading-edge three nanometer wafer has 70 or so masks.
对吧?
Right?
大约有70层光刻工艺,但其中20层是最先进的EUV。
70 or so layers of lithography, but 20 of them are the most advanced EUV.
对吧?
Right?
而且具体来说,如果你想想,好吧,如果我生产一吉瓦需要55,000片晶圆,每片晶圆进行20次EUV曝光,那就可以算出来了。
And specifically, you know, if you think about, okay, if I need 55,000 wafers for a gigawatt, and I do 20 EUV passes per wafer, then you can do the math.
也就是说,单个吉瓦需要大约110万次EUV曝光。
That's like, okay, that's 1,100,000 passes of EUV for a single gigawatt.
所以实际上,这很简单。
So actually, like, it's pretty simple.
当你再加上其他所有步骤后,最终在5纳米节点和所有存储器上,总曝光次数达到200万次。
And then once you add the rest of the stuff, it ends up being 2,000,000 right across five nanometer and all the memory.
对于单个吉瓦,你大约需要200万次EUV曝光。
You're at roughly 2,000,000 EUV passes for a single gigawatt.
你知道,这些设备非常复杂,当你思考它们在晶圆上执行的操作时,它们是在扫描并逐步移动晶圆。
You know, these tools are very complicated, so when you think about what one is doing across a wafer, it's taking the wafer and it's scanning and stepping across.
对吧?
Right?
它在扫描,一步步移动,整个晶圆上要重复数百次或数十次。
It's scanning, stepping across; it does this dozens of times across the whole wafer.
所以当你谈论有多少次EUV曝光时,实际上是指整个晶圆以某种速率被曝光。
And and so when you're talking about, hey, how many EUV passes, that's the entire wafer is being exposed at a certain rate.
一台EUV设备每小时大约能处理75片晶圆,而且设备的运行时间大约占90%。
An EUV tool can do roughly 75 wafers per hour, and the tool is up roughly 90% of the time.
对吧?
Right?
所以最终,我需要大约三台半EUV设备才能完成每吉瓦所需的200万次EUV晶圆曝光。
So in the end, you end up with actually, I need about three and a half EUV tools to do the 2,000,000 EUV wafer passes for the gigawatt.
三台半EUV设备就能满足每吉瓦的需求。
So three and a half EUV tools satisfies the gigawatt.
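The per-gigawatt arithmetic just walked through can be checked in a short sketch; all inputs are the speaker's rough estimates from the episode, not official ASML or TSMC figures:

```python
# Back-of-envelope: how many EUV tools does one gigawatt of AI
# data center capacity consume? All inputs are the episode's estimates.
wafers_3nm = 55_000       # 3nm logic wafers per gigawatt
euv_layers_3nm = 20       # most-advanced EUV layers per 3nm wafer

passes_3nm = wafers_3nm * euv_layers_3nm   # ~1.1M EUV wafer passes
total_passes = 2_000_000  # speaker's total once 5nm and DRAM are added

wafers_per_hour = 75      # EUV tool throughput
uptime = 0.90             # fraction of the time the tool is up
hours_per_year = 365 * 24

passes_per_tool_per_year = wafers_per_hour * uptime * hours_per_year
tools_per_gigawatt = total_passes / passes_per_tool_per_year

print(passes_3nm)                    # 1100000
print(round(tools_per_gigawatt, 1))  # 3.4, i.e. "three and a half"
```

The result lands at roughly three and a half tools per gigawatt, matching the figure in the conversation.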
想想这些数字,还挺有趣的。
So it's funny to think about the numbers.
对吧?
Right?
因为我们现在谈的是,一吉瓦的成本是多少?
Because we're talking, oh, what does a gigawatt cost?
大概要500亿美元左右。
It costs like $50,000,000,000 roughly.
对吧?
Right?
那么,三台半EUV设备要多少钱?
Whereas, what does three and a half EUV tools cost?
大概12亿美元。
That's like 1.2 billion.
对吧?
Right?
没错。
Right.
这实际上是一个低得多的数字,这很有趣,想想看,数据中心里有50吉瓦的经济资本支出。
It's actually quite a lower number, which is interesting to think about, like, oh, 50 gigawatts of economic, you know, sort of CapEx in the data center.
而在其基础上构建的代币价值甚至更大。
And what gets built on top of that in terms of tokens is even larger.
对吧?
Right?
价值高达1000亿美元的AI产业链,竟然被仅值12亿美元的设备所限制,而这些设备的产能根本无法快速扩张。
It might be that $100,000,000,000 worth of AI value on top of the supply chain is held up by this $1,200,000,000 worth of tooling, which simply cannot expand its supply chain quickly.
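The cost comparison here can be made explicit; the ~$350M per-tool price is an assumed midpoint of the $300-400M range quoted earlier, and the $50B-per-gigawatt figure is the episode's:

```python
# EUV tooling is a tiny slice of the CapEx behind a gigawatt.
tools_per_gigawatt = 3.5
price_per_tool = 350e6    # assumed midpoint of the quoted $300-400M range

euv_capex_per_gw = tools_per_gigawatt * price_per_tool  # ~$1.2B
datacenter_capex_per_gw = 50e9                          # ~$50B per GW

share = euv_capex_per_gw / datacenter_capex_per_gw
print(f"EUV tooling is {share:.1%} of the per-gigawatt CapEx")
```

The machines gating the whole buildout come out to a low-single-digit percentage of the data center CapEx they enable.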
我认为,你最近读过一篇文章,说过去三年里,台积电的资本支出达到了1000亿美元。
And I think you read this article recently where you're saying that over the last three years, TSMC has done $100,000,000,000 of CapEx.
所以大概是每年300亿、300亿、400亿美元。
So it's like thirty, thirty, forty billion.
如果你想想,其中只有一小部分被英伟达用于其即将采用的3纳米,或者之前用于4纳米芯片的制造。
And if you think about it, I mean, a small fraction of that is being used by Nvidia for the three nanometer it's gonna use, or, you know, previously four nanometer, for its chips.
但英伟达却将这些投入转化成了上个季度高达400亿美元的利润。
But Nvidia has turned that into, what was it, earnings last quarter of like $40,000,000,000.
所以是400亿美元乘以4。
And so 40,000,000,000 times four.
也就是1600亿美元,仅英伟达一家就将数百亿美元的资本支出转化为了价值。
So $160,000,000,000. So Nvidia alone is turning some small fraction of that $100,000,000,000 in CapEx.
这些支出将在多年内折旧,而不仅仅是今年,但今年就达到了1600亿美元。
That's gonna be depreciated over many years, not just this one year, into $160,000,000,000 in a single year.
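The host's arithmetic here, made explicit (these are his quoted figures, not audited financials):

```python
# Nvidia's run rate vs the TSMC CapEx feeding it.
tsmc_capex_3yr = 100e9   # ~$100B of TSMC CapEx over the last three years
nvidia_quarterly = 40e9  # quoted last-quarter earnings

nvidia_annualized = nvidia_quarterly * 4    # $160B a year
ratio = nvidia_annualized / tsmc_capex_3yr  # vs the whole 3-year CapEx

print(nvidia_annualized / 1e9, ratio)  # 160.0 1.6
```

One year of Nvidia's run rate exceeds the entire three-year CapEx base, which is the asymmetry the conversation is pointing at.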
当进一步深入到ASML这样的供应链环节时,情况会更加严峻,仅用十亿美元的设备来生产一吉瓦的产能。
And then that gets even more intense when you go down the supply chain to ASML, with just a billion or so dollars of machines producing a gigawatt.
当然,这些设备的使用寿命超过一年。
And then of course those machines last for more than a year.
对吧?
Right?
所以实际产生的价值还远不止这些。
So it's, it's doing more than that.
好的。
Okay.
所以现在我想弄清楚,如果算上那些不仅在当年销售、而且在过去几年中累计积累的机器,到2030年这类机器会有多少台?
So now I wanna understand, okay, how many such machines will there be by 2030, if you include not just the ones sold that year but the ones that have been accumulating over the previous years?
这又意味着什么?萨姆·阿尔特曼说他希望在2030年实现每周一吉瓦的产能,那么把这些数字加总起来,是否与这个目标一致?
And what does that imply about, you know, Sam Altman says he wants to do a gigawatt a week in 2030. When you add up those numbers, is that compatible with that?
是的。
Right.
这完全是一致的。
That's that's completely compatible.
对吧?
Right?
因为如果
Because if
你想想台积电和整个生态系统,目前已经拥有大约250到300台EUV设备,再叠加今年70台、明年80台,到2030年增长到100台,到本十年末将拥有约700台EUV设备。
you think about TSMC and the entire ecosystem having something like 250 to 300 EUV tools already, and then you stack on 70 this year, 80 next year, growing to 100 by 2030, you're at like 700 EUV tools by the end of the decade.
700台EUV设备,每吉瓦需要3.5台设备,假设全部用于AI(实际上并非如此),那么就能支持200吉瓦的AI芯片产能,供数据中心部署。
700 EUV tools, three and a half tools per gigawatt, assuming it's all allocated to AI, which it's not, but three and a half tools per gigawatt gets you to 200 gigawatts worth of AI chips for the data centers to deploy.
对吧?
Right?
所以200吉瓦,萨姆想要50吉瓦,52吉瓦每年。
So 200 gigawatts, Sam wants 50 gigawatts, 52 gigawatts a year.
那他只占了25%的份额。
He's only taking 25% share then.
对吧?
Right?
显然,有一部分份额会分给移动设备和PC,假设出于某种原因,我们还能拥有消费电子产品,不会被价格排除在外。
Obviously, there's some share given to, you know, mobile and PC, assuming that, you know, for some reason, we're allowed to even have consumer goods still, you know, and we don't get priced out of them.
但大致来说,他说的是占芯片总制造产能25%的市场份额,也就是每年50吉瓦。
But, you know, roughly, he's saying 25% market share of the total chips fabbed, you know, 50 gigawatts a year.
考虑到仅今年一年,我认为他就能获得25%的已部署Blackwell GPU,这个比例相当合理。
That's kind of very reasonable given that, you know, this year alone I think he's gonna have access to 25% of the Blackwell GPUs that are deployed.
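The fleet-level projection can be sketched the same way; the installed base and 2030 figure are the episode's, while the year-by-year ramp between 80 and 100 tools is an interpolated assumption:

```python
# Does ~700 EUV tools by 2030 support "a gigawatt a week"?
installed_base = 300                      # tools already in the field
yearly_shipments = [70, 80, 87, 93, 100]  # assumed ramp through 2030

fleet_2030 = installed_base + sum(yearly_shipments)  # ~730 tools

tools_per_gigawatt = 3.5
gw_of_chips = fleet_2030 / tools_per_gigawatt  # ~208 GW/yr if all on AI

sam_gw_per_year = 52  # "a gigawatt a week"
share = sam_gw_per_year / gw_of_chips

print(fleet_2030, round(gw_of_chips), f"{share:.0%}")
```

Under these assumptions, 52 gigawatts a year is roughly a quarter of the implied ceiling, consistent with the 25% share discussed above.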
对吧?
Right?
所以这其实并不算太疯狂。
So it's it's not that crazy.
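上面的粗略计算可以用几行代码示意(以下数字只是示例假设,并非精确预测):
The rough arithmetic above can be sketched in a few lines (the numbers below are illustrative assumptions, not precise forecasts):

```python
# Back-of-envelope sketch of the EUV-tool math above (illustrative
# numbers, not SemiAnalysis's actual model).
existing = 300                      # EUV tools already in the field (~250-300)
shipments = [70, 80, 85, 90, 100]   # assumed annual shipments through 2030
total_tools = existing + sum(shipments)

tools_per_gw = 3.5                  # EUV tools needed per gigawatt of AI chips
total_gw = total_tools / tools_per_gw

sam_gw_per_year = 52                # "a gigawatt a week"
share = sam_gw_per_year / total_gw

print(f"tools by 2030: {total_tools}")         # 725
print(f"AI-chip capacity: {total_gw:.0f} GW")  # ~207 GW
print(f"OpenAI share: {share:.0%}")            # ~25%
```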
我觉得有点意外,你知道ASML是什么时候开始出货EUV设备的?
I find it surprising that, you know, when was the first, when did ASML start shipping EUV tools?
是七纳米工艺开始的时候吗?
When the seven nanometer started?
所以我不太确定具体是什么时候。
So I don't know when that was exactly.
但你是说,到了2030年,他们还会在使用2020年最初发货的那些机器。
But you're saying in 2030, they're gonna be using machines that initially were shipped in 2020.
所以十年了,你还在用着世界上技术最先进的行业里最重要的那台机器。
So ten years, you're using the same most important machine in this most technologically advanced industry in the world.
我我觉得这挺让人惊讶的。
I I find that surprising.
ASML开始出货EUV工具到现在大概有十年了,但它真正进入大规模量产阶段是在2020年左右。
So ASML has been shipping EUV tools now for roughly a decade, but it only entered mass volume production around 2020.
你知道,这台设备已经不是原来的了。
You know, the tool's not the same.
你知道,那时候的设备吞吐量更低。
You know, back then the tools were even lower throughput.
围绕它们有一些称为对准精度的规格,对吧?
There are various specifications around them; one is called overlay, right?
我之前提到过,你是在层层堆叠,对吧?
You know, I was mentioning you're stacking layers on top of each other, right?
你会做一些EUV工艺,然后进行大量不同的工艺步骤——沉积材料、蚀刻材料、清洗晶圆,你知道,在进行下一个EUV层之前,要完成几十个这样的步骤。
You'll do some EUV, you'll do a bunch of different process steps, depositing stuff, etching stuff, cleaning the wafer, you know, dozens of those steps before you do another EUV layer.
有一个叫做对准精度的规格。
There's a spec called overlay.
对吧?
Right?
意思是,你完成了所有这些工作。
Which is, okay, you did all this work.
你知道,你在晶圆上画了这些线。
You know, you drew these lines on the wafer.
现在我想画这些点。
Now I wanna draw these dots.
对吧?
Right?
假设我想画这些点来连接这些金属线,然后是孔,再上一层是另一组垂直的线。
Let's just say I wanna draw these dots, these holes, to connect these lines of metal, and then the next layer up is another set of lines that goes perpendicular.
所以现在你是在连接彼此垂直的导线。
So now you're connecting wires going perpendicular to each other.
这时候你必须能精准地将它们对准,这就叫对准。
There you have to you have to be able to land them on top of each other, so it's called overlay.
而对准这一指标,已经被ASML迅速提升了。
And overlay is a spec that's been improved rapidly by ASML.
晶圆吞吐量也被ASML迅速提升了。
Wafer throughput has been improved rapidly by ASML.
而且设备的价格也上涨了,但涨幅远不及设备性能的提升。
And also the price of the tool has gone up, but not as much as the capabilities of the tool.
对吧?
Right?
最初,EUV设备的价格大约是1.5亿美元,而随着时间推移,展望2028年,价格已经涨到了约4亿美元。
Initially, the EUV tools were like $150,000,000, and over time, they're now like $400,000,000 as I look out to 2028.
但设备的性能也提升了不止一倍。
But the capabilities of the tools have more than doubled as well.
对吧?
Right?
尤其是在吞吐量和对准精度方面,也就是即使在中间经历了大量步骤,仍能精确地将后续层对准叠加在一起的能力。
Especially on throughput and overlay accuracy, which is the ability to stack, you know, accurately align the the subsequent passes on top of each other, even though you do tons of steps between.
所以,ASML的提升速度非常快。
And so this is this is, you know, ASML is improving super rapidly.
我认为还值得一提的是,ASML可能是世界上最慷慨的公司之一。
I think it's also something noteworthy to say ASML is, you know, maybe one of the most generous companies in the world.
对吧?
Right?
他们有一个关键的支柱技术。
They have this linchpin thing.
没有人有能与之竞争的产品。
No one has anything competitive.
也许中国在本十年末能推出自己的EUV设备,但其他任何人都没有接近EUV的技术。
Maybe China will have some EUV by the end of the decade, but no one else, you know, has anything even close to EUV.
然而,他们并没有像疯狂一样提高价格和利润率。
And yet they haven't taken price and margins up like crazy.
对吧?
Right?
你去问问我们经常接触的其他人,比如利奥波德,他们会说,让我们把价格提上去吧。
You know, you go ask some other folks that we talk to all the time, like, for example, Leopold, and they're like, let's have the price go up.
对吧?
Right?
因为他们有能力。
Because they can.
利润空间是存在的。
The margin is there.
你可以提高利润率。
You can you can take the margin.
就像英伟达那样获取利润。
Like Nvidia takes the margin.
内存厂商都在提高利润,但ASML从未将价格提高到超过其工具性能提升的程度。
Memory players are taking the margin, but ASML has never risen the price more than they've increased the capability of the tool.
因此,从某种意义上说,他们始终为客户提供了净收益。
And so in a sense, they've always provided net benefit to their customer.
并不是说这些设备停滞不前。
It's not that the tool is stagnant.
只是这些设备本身已经很老旧了。
It's just that, like, you know, these tools are old.
是的。
Yes.
你可以对它们进行一些升级,新设备也在陆续推出。
You can upgrade them some and the new tools are coming.
为简化起见,我们在这期播客中暂时忽略了一些进展,比如每台设备的对准精度或吞吐量提升。
And for simplicity's sake, we're kind of ignoring for this podcast the advances in overlay or throughput per tool.
所以你说今年会生产60台这种设备,之后几年会增加到七十、八十台。
So you say we're producing 60 of these machines, this year and then seventy, eighty over subsequent years.
如果ASML决定将其资本支出翻倍或三倍,会发生什么?
What would happen if ASML just decided to double its CapEx or triple its CapEx?
是什么阻止他们在2030年之前生产超过100台设备?
What is preventing them from producing more than a 100 in 2030?
为什么,为什么,你对五年后的产能有如此高的信心?
Why, why, why so confident that even five years out, you can be relatively sure what their production will be?
我认为这里有几个因素。
So I think I think a couple factors here.
对吧?
Right?
ASML 并没有决定盲目冒险。
ASML has not decided to just go YOLO.
让我们尽可能快地扩大产能。
Let's expand capacity as fast as possible.
对吧?
Right?
一般来说,半导体供应链也没有。
In general, the semiconductor supply chain has not.
对吧?
Right?
它经历过多次繁荣与萧条,我们可以再多聊一点。
It's lived through the booms and busts, and we can talk a bit more about it.
但基本上,没有人——虽然最近有一些玩家才醒悟过来,但总体而言,没人真觉得每年会有 200 吉瓦的 AI 芯片需求,或者半导体供应链每年会有数万亿美元的支出。
But basically, no one, you know, some players as of very recently have like woken up, but in general, no one really sees demand for 200 gigawatts a year of AI chips or, you know, trillions of dollars of spend a year in the semiconductor supply chain.
他们根本就没被AI洗脑。
They're just like they're not they're not AI pilled.
对吧?
Right?
他们也没有被AGI理念洗脑。
They're not AGI-pilled.
今年我们就会达到一万亿美元。
We're gonna get to a trillion dollars this year.
是的。
Yeah.
我懂你的意思,但我只是说,供应链里根本没人真正理解这一点。
I I I feel you, but I'm just saying, like, no one really understands this in the supply chain.
我们不断被告知,我们的数字太高了。
Constantly, we're told our numbers are way too high.
而当他们正确时,就会说:‘哦,对啊。’
And then when they're right, they're like, oh, yeah.
是的。
Yeah.
但你明年的数字还是太高了。
But your your next year's numbers are still too high.
但不管怎样,ASML 的设备有四个主要组件。
But anyways, ASML's tool has four major components.
对吧?
Right?
它有光源,对吧,由ASML旗下位于圣地亚哥的Cymer公司制造。
It has the source, right, which is made by Cymer in San Diego.
它有掩模台,在康涅狄格州威尔顿制造。
It has the reticle stage, which is made in Wilton, Connecticut.
对吧?
Right?
它还有晶圆台、光学系统和镜头等。
It has the wafer stage, and the optics, the lenses and such.
而这两个是在欧洲制造的。
And those two are made in Europe.
对吧?
Right?
所以当你逐一查看这四个部分时,它们的供应链都极其复杂,一方面,它们并没有试图大规模扩张;另一方面,即使想扩张,所需的时间周期也非常长。
And so when you when you look at each each of these four, they're tremendously complex supply chains that a, they have not tried to expand massively, and b, when they try to expand them, the time lag is quite long.
对吧?
Right?
因此,这台机器是人类制造的最复杂的机器,无论从任何产量规模来看都是如此,但咱们具体聊聊光源部分。
And so, again, this is the most complicated machine that humans make period, right, at at a volume any sort of volume, but like, let's talk about the source specifically.
对吧?
Right?
光源到底是做什么的?
What does the source source do?
它会滴下锡滴。
It drops these tin droplets.
激光精确地连续击中它三次。
It hits it three subsequent times with the laser perfectly.
第一次击中锡滴,使其向外膨胀。
So the first one hits this tin droplet expands out.
再次击中,使其膨胀成完美的形状,然后以超高功率轰击,使锡滴被激发到足以释放13.5纳米极紫外光的程度,接着由一个类似收集器的装置将所有光线收集并导向透镜组。
It hits it again so it expands out to this perfect shape, and then it blasts it at super high power, and the tin droplet gets excited enough that it releases EUV light at 13.5 nanometers. Then there's this thing, a collector, that is basically collecting all the light and directing it into the lens stack.
对吧?
Right?
然后你有透镜组,那是卡尔·蔡司的,正如你提到的,还有一些其他公司,但蔡司是其中最重要的部分。
Then you have the lens stack, which is Carl Zeiss, right, as you mentioned, and some other folks, but Zeiss is the most important part of it.
他们也没有试图扩大生产能力,因为他们觉得没必要,他们认为:‘哦,是的。’
They also have not tried to expand production capacity, because they don't see the need; they're like, oh, yeah.
是的。
Yeah.
我们之所以增长迅速,是因为人工智能。
Like, we're growing a lot because of AI.
我们的规模正从60增长到100。
We're growing from 60 to a 100.
对吧?
Right?
感觉就像是,不。
It's like, no.
不是。
No.
不。
No.
不。
No.
我们需要达到,比如,几百台,但这没问题。
We need to go to, like, a couple hundred, but it's fine.
无所谓。
Whatever.
这些工具中的每一个,我认为都有大约18个这样的透镜,实际上是镜子。
Each of these tools has, you know, I think 18 of these lenses effectively, mirrors.
它们是多层镜,如果我没记错的话,由钼和硅的完美交替层多层堆叠而成,光线能完美地从上面反射。
They are multilayer mirrors, which are perfect alternating layers of molybdenum and silicon, if I recall correctly, stacked on top of each other in many layers, and then the light bounces off of them perfectly.
但这并不是像我们通常理解的透镜那样,它有个形状并聚焦光线。
But it's not just like, you know, like when we think about a lens, you know, it's it's like in a shape and it focuses the light.
这更像是一个既是镜子又是透镜的装置,所以非常复杂。
This is a this is like a mirror that's also a lens, and so it's pretty complicated.
在这类超薄沉积堆叠的完美层中,任何缺陷都会破坏整体效果。
Any defect in these perfect, super-thinly deposited layers will mess it up.
任何曲率问题,都会给规模化生产带来大量挑战。
Any curvature issues too; there are a lot of challenges with scaling the production.
从某种意义上说,这相当手工化,对吧?
It's quite artisanal, in a sense.
对吧?
Right?
因为你每年不是生产几万台,而是生产几百台,几千台。
Because you're not making tens of thousands of these a year, you're making hundreds, you're making thousands.
对吧?
Right?
想想看,每年60台设备,每台有18个这样的镜片,算上存量,这些透镜和投影光学元件的总数大致也就在一千个左右。
You know, talk about 60 tools a year, 18 of these per tool, and you're still only at roughly a thousand of these lenses and projection optics.
然后你再往前看掩模阶段,那也是极其疯狂的。
So then you and then you step forward to the reticle stage, which is also something really crazy.
这个部件的运动加速度能达到九个G,因为它在晶圆上移动时,设备会快速移动,而晶圆台是与之配合的。
This thing moves at, I wanna say, nine g's, like it it will shift nine g's because as you step across a wafer, the tool will go and and the wafer stage is complementary.
这是晶圆部分。
It's the wafer part.
所以你要把这两者精确对准。
So you you line these two things up.
你让所有光线通过这些聚焦的透镜。
You're taking all the light through the lenses that's focused.
这是掩模版。
And and here's the reticle.
这是晶圆。
Here's the wafer.
掩模版正在向一个方向移动。
And as you pass the light through, the reticle is moving in one direction.
晶圆则向相反方向移动,扫描晶圆上一个26乘33毫米的区域,然后停止。
The wafer is moving in the other direction as it scans a 26 by 33 millimeter section of the wafer, and then it stops.
它再移动到晶圆的另一个部分,重复这一过程。
It shifts over to another part of the wafer and does it again.
整个过程只需几秒钟。
And it does that in just seconds.
对吧?
Right?
它们各自以九倍重力加速度向相反方向移动。
And and each of them are moving at nine g's in opposite directions.
所以,这些东西每一项都堪称奇迹,是化学、制造、机械工程和光学工程的杰作,因为你需要精确对准所有部件,确保它们完美无缺。
So each of these things is like a wonder and marvel of chemistry, fabrication, mechanical engineering, and optical engineering, because you have to align all these things and make sure they're perfect.
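用理想化的几何计算,可以粗略估算一片晶圆上能容纳多少个这样的曝光场(示意性计算,忽略边缘损失):
With idealized geometry you can roughly estimate how many such exposure fields fit on one wafer (an illustrative calculation that ignores edge losses):

```python
# Rough sketch of the step-and-scan geometry above (idealized; real
# field counts are lower because partial edge fields are excluded).
import math

wafer_diameter_mm = 300
field_w, field_h = 26, 33        # exposure field size in mm, as mentioned

wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
field_area = field_w * field_h
max_fields = wafer_area / field_area   # upper bound, ignoring edge losses

print(f"~{max_fields:.0f} fields per wafer")  # ~82
```

每个这样的曝光场都要在几秒内完成一次九个G的加减速扫描,这正是吞吐量如此难以提升的原因之一。
Each of those fields gets one of these nine-g scan passes in just seconds, which is part of why throughput is so hard to increase.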
所有这些都涉及大量的计量学,因为必须对每个环节进行完美测试,因为任何一处出错,良品率就会直接归零。
All these things have crazy amounts of metrology because you have to perfectly test everything because if anything is messed up, the yield goes to zero.
对吧?
Right?
因为这是一个如此精密的系统。
Because this is such a finely tuned system.
顺便说一下,它的体积如此庞大,以至于要在荷兰费尔德霍芬的工厂里建造,然后拆解,用多架飞机运送到客户现场,再在那里重新组装并进行测试。
And by the way, it's so large that you're building it in the factory in Veldhoven, Netherlands, and then deconstructing it and shipping it on many planes to the customer site, and then you're reassembling it there and testing it again.
这个过程需要花费好几个月的时间。
And that process takes many, many months.
所以,你看,供应链中有非常多的环节。
So, like, it's it's just there's so many steps in the supply chain.
对吧?
Right?
无论是蔡司制造其镜头和投影光学系统,还是ASML旗下的Cymer公司制造EUV光源。
Whether it's Zeiss making their lenses and projection optics, or Cymer, which is an ASML-owned company making the EUV source.
而每一个环节都有其复杂的供应链。
And each of these has its own complex supply chain.
对吧?
Right?
ASML曾表示,其供应链中有多达一万人参与。
ASML has commented their supply chain has over 10,000 people in it.
对吧?
Right?
比如各个独立的供应商。
Like individual suppliers.
是的。
Yes.
而且这可能并不是直接的。
And it might not be directly.
这可能是通过这样的方式:你知道,蔡司有这么多供应商,XYZ公司也有这么多供应商,但如果你只是想想,有两个物理移动的物体,大小像晶圆这么大,
It might be through, like, hey, Zeiss has so many suppliers, and XYZ company has so many suppliers. But if you just think about it: you're talking about two physically moving objects, one about this large and one the size of a wafer.
对吧?
Right?
而且它的精度必须达到个位数纳米甚至更小的级别,因为整个系统的层间对准误差必须控制在三纳米左右。
And it has to be accurate to the level of, you know, single digit nanometers or even smaller because the entire system, the overlay, right, layer to layer variation has to be on the order of three nanometers.
对吧?
Right?
如果对准误差是三纳米,那就意味着每个单独部件的物理运动精度必须比这还要更小。
And so if overlay is three nanometers, that means each individual part, the accuracy of its physical movement has to be even less than that.
对吧?
Right?
在大多数情况下,精度必须低于一纳米,因为这些误差会累积叠加。
It has to be sub one nanometer in most cases, because the error of these things stacks up.
对吧?
Right?
所以根本不可能打个响指就提高产量。
And so there's no way to just, you know, snap your fingers and increase production.
对吧?
Right?
像电力这种简单的事情。
You know, things simple as power.
对吧?
Right?
美国从零电力增长到2%的增长,即使中国已经达到了30%,对美国来说也难如登天。
The US going from 0% power growth to 2% power growth, even though China's already at 30%, was so hard for America to do.
对吧?
Right?
这是一个非常简单的供应链,供应链中参与制造复杂产品的人很少,对吧?
And that's a really simple supply chain, right, with very few people in it who make difficult things.
在美国,从事电力供应链的电工或相关工作人员可能有十万甚至更多。
And there's, you know, probably what, 100,000 or more electricians and people who work in the supply chain of electricity in The US.
而当你再看ASML,就会发现它雇佣的人少得多。
And then, you know, you look at ASML, which employs so few people by comparison.
卡尔·蔡司可能只有不到一千人从事这项工作,而且这些人都极其专业。
Carl Zeiss probably employs like less than a thousand people working on this, and all of those people are like super super specialized.
所以。
So
确实是这样。
Yeah.
你不可能像打个响指那样,随便培训一些人就能胜任这项工作。
You know, you can't just train random people up for this, like, the snap of a finger.
你不可能让整个供应链一下子都动员起来。
You can't just get your entire supply chain to get get galvanized.
对吧?
Right?
英伟达已经做了大量工作,才让整个供应链达到今年的产能;尽管你去问Anthropic,他们会说:我们缺TPU,缺Trainium,缺GPU。
Nvidia's had to do a lot to get the entire supply chain to even deliver the capacity they're gonna make this year, even though when you go talk to Anthropic, they're like, well, we're short of TPUs, we're short of Trainiums, we're short of GPUs.
当你去和OpenAI交谈时,他们会说,我们缺这些东西。
When you go talk to OpenAI, they're like, we're short of these things.
对吧?
Right?
所以OpenAI和Anthropic都知道他们需要x。
So OpenAI and Anthropic, they know they need x.
Nvidia并没有那么痴迷于AGI,他们正在建造的是x减一。
Nvidia is not quite as AGI-pilled, and they're building, you know, x minus one.
当你往下看整个供应链时,每个人都在做减一。
And you go down the supply chain, everyone's doing minus one.
在某些情况下,他们甚至只做到了二分之一。
And in some cases, they're doing like divided by two.
对吧?
Right?
因为他们根本就没有被AGI理念所驱动。
Because they just don't they're not AGI pilled.
对吧?
Right?
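下面这个玩具模型示意了上文所说的'每一层都做x减一'的效应(所有数字均为虚构示例,并非任何公司的真实规划):
The toy model below sketches the "everyone builds x minus one" effect described above (all numbers are made up for illustration, not any company's actual plans):

```python
# Toy model of demand attenuation down the supply chain: each tier
# plans for only a fraction of the demand signal it receives
# (fractions are hypothetical, chosen just to show the compounding).
demand_gw = 50.0                   # what the labs say they need
tiers = {
    "Nvidia": 0.9,                 # builds roughly "x minus one"
    "TSMC": 0.85,
    "ASML": 0.8,
    "Zeiss / sub-suppliers": 0.7,  # some tiers effectively "divide by two"
}

signal = demand_gw
for tier, fraction in tiers.items():
    signal *= fraction
    print(f"{tier}: plans for {signal:.1f} GW")
```

按这些假设的折扣率,到供应链底层,50吉瓦的需求信号只剩下约21吉瓦;即使每一层只打九折八折,层层相乘也会严重压缩最终产能。
Under these assumed haircuts, the 50 GW demand signal shrinks to roughly 21 GW by the bottom of the chain; even modest per-tier discounts compound into a large capacity shortfall.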
我认为,因此你最终会遇到这种'长鞭效应'传导的时间延迟。
I think, and so you end up with the time lag for this whip, this bullwhip effect, to react.
对吧?
Right?
你知道,这种'被AI说服'并想要提升产能的过程,传导起来需要很长时间。
You know, the lag before this sort of AI-pilledness and the desire to increase production kicks in is so long.
然后当他们终于明白,嘿,我们需要迅速扩大产能。
And then once they finally understand, hey, we need to increase production rapidly.
对吧?
Right?
他们以为自己明白了,哦,AI意味着我们必须从60提升到100。
And they think they understand: oh, AI means we have to go from 60 to 100.
除了所有工具都在不断改进和提速之外,你知道,供电能力也从500瓦提升到1000瓦,还有供应链中其他各个方面在技术上的进步以及产能的提升。
That's in addition to the tools all just getting better and faster, you know, the source getting higher power from 500 watts to a thousand, and all these other aspects of the supply chain advancing technically on top of increasing their production.
他们以为自己真的在大幅增加产量。
They think they're they're, like, actually increasing production a lot.
但如果你仔细算算,埃隆到底想要什么?
But if you float through the numbers of, hey, what does Elon want?
他想要到2028年每年在太空领域达到100吉瓦,是吧?
He wants a 100 gigawatts a year in space by 2028, is it?
还是2029年?
Or 2029?
萨姆·奥尔特曼则希望到本十年末每年达到50到52吉瓦,而你看看,Anthropic很可能也需要同样的量,谷歌也同样需要。
And, you know, Sam Altman wants 50 gigawatt, 52 gigawatts a year by the end of the decade, and you look at, you know, probably Anthropic needs the same, and then, you know, Google needs that.
你再看看整个供应链。
You know, you you go across the supply chain.
等等。
It's like, wait.
不对。
No.