本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
好的,阿奈。孩子们都上学去了吗?
Alright, Anay. Well, kids are in school?
孩子们都上学了。我们熬过了这个夏天。我喜欢开学的感觉,就像是新年的开始。
Kids are in school. We survived the summer. I love how back to school kind of feels like the beginning of the new year.
确实如此。我一直觉得每年的这个时候才是真正的新年。我喜欢这种感觉。欢迎来到《变革步伐》节目的第三集。我们将在这里讲述人类进步的故事。
It does. I always thought this time of year is the actual new year. I like that. Welcome to the third episode of the Step Change Show. We're here to cover the stories of human progress.
我们想要了解塑造我们世界的技术、系统和基础设施。我是本·艾德尔森,Step Change Ventures的联合创始人,这是一家投资于加速当今最大变革步伐的公司的基金。我常驻华盛顿州西雅图。
We wanna understand the technologies, systems, and infrastructure that shape our world. And I'm Ben Eidelson. I'm a cofounder of Step Change Ventures, a fund that invests in the companies that are accelerating today's biggest step changes. And I'm based up in Seattle, Washington.
我是阿奈·沙阿,StepChange Ventures的另一位联合创始人,常驻加利福尼亚州洛杉矶。今天,我们要讲述的是近期基础设施超大规模化的故事——数据中心的故事。每次你流媒体看电影、发送短信、叫车接送,或是与看似完全成形的计算意识对话时,你都在触碰一个无形的物理帝国。我们称之为云端,但它并不在天上。它存在于某个非常、非常真实的地方。
And I'm Anay Shah, fellow cofounder of StepChange Ventures and based in Los Angeles, California. Today, we're gonna be telling the story of the hyperscaling of our most recent infrastructure, the story of data centers. Every time you stream a movie, send a text message, call a car to pick you up, or talk to what feels like a fully formed computational consciousness, you are touching an invisible, physical empire. We call it the cloud, but it isn't in the sky. It lives somewhere very, very real.
它存在于全球近1.2万栋建筑中,消耗着美国近5%的电力。它存在于那些铺设在海底最黑暗处的花园软管粗细的电缆里。今天,我们将讲述这个无形基础设施的故事,这个故事始于20世纪初嗡嗡作响的打孔卡片室,蜿蜒穿过意外催生互联网的冷战项目,最终引向如今正在建造的、规模堪比下曼哈顿的千兆瓦级人工智能工厂。
It lives in nearly 12,000 buildings worldwide, consuming almost 5% of electricity in the US. And it lives inside garden-hose-sized cables that are laid across the darkest parts of the ocean floor. Today, we are going to tell the story of this invisible infrastructure, a story that begins in the humming and clattering punch card rooms of the early 1900s, winds through the Cold War projects that accidentally birthed the Internet, and leads to the gigawatt-scale AI factories the size of Lower Manhattan that are being built today.
美国股市市值约60万亿美元,由4000多家公司组成。但处于顶端的只有六家公司——英伟达、微软、苹果、字母表、亚马逊和Meta——它们占据了市场30%的份额。它们是当今的铁路、钢铁和石油公司,正在构建我们这个时代的现代工业引擎。
The US stock market is worth around $60 trillion, made up of over 4,000 companies. But there are just six companies at the top, NVIDIA, Microsoft, Apple, Alphabet, Amazon, and Meta, that make up 30% of that market. They are today's railroad, steel, and oil companies and are building the modern industrial engine of our time.
这是关于数据中心的故事。但在开始之前,先简单说明一下。本集的所有研究链接和笔记都已发布在stepchange.show上。如果你在收听时想到可能有朋友或同事对此感兴趣,请转发给他们。这对我们来说是一项全新尝试,我们很感激能让可能喜欢的人听到这些内容。
This is the story of data centers. But before we do, just one quick note. We have all of the research links and notes for this episode up at stepchange.show. And if you're listening to this and you think of a friend or colleague who might enjoy it, please send it over their way. All of this is a new endeavor for us, and we appreciate it getting into the hands, or rather ears, of folks who may dig it.
好了,言归正传,说说数据中心。
Alright. So data centers.
本,我最近读到一篇文章说这些新技术正在为当今每个人节省工作量。很快我们就完全无事可做了。
So Ben, I was recently reading an article saying these new technologies are saving work for everyone nowadays. Pretty soon, we'll have nothing to do at all.
没错。我记得他们提到有种新型电子大脑会让大多数工人失业。
That's right. I think they were saying that we have some new electronic brain that's gonna make most workers obsolete.
对。但其中一项技术背后的CEO立即反驳说,这是节省时间的方式而非取代工作。他说这只是帮助杰出人才造福人类的小工具。
Right. But the CEO behind one of these technologies quickly countered, saying that it was a way to save time, not replace jobs. He said it was a small tool to help great minds benefit mankind.
这是Anthropic、OpenAI还是谷歌的CEO说的?
Was this the CEO of Anthropic or OpenAI, or maybe Google?
你可能会这么想,但都不是。这是IBM最初的领导者托马斯·J·沃森老先生的言论。
You might think so, but no. This was Thomas J. Watson senior, the original leader of IBM.
IBM,上世纪的计算巨头。早在谷歌、亚马逊、Meta和微软出现之前,就有蓝色巨人IBM的存在。而在IBM制造电子计算机之前很久,他们就在进行另一种形式的计算。
So IBM, the computational powerhouse of the last century. Long before we had the Googles and the Amazons and the Metas and Microsofts, there was Big Blue, IBM. And long before IBM ever built an electronic computer, they were doing computation of a different sort.
如果你在1930年代走进一家大公司,可能会发现自己置身于一个无窗的房间,嗡嗡作响、咔哒咔哒像个小工厂。那里有成排的金属柜、旋转的齿轮,职员们正将一叠叠硬纸卡喂入这些庞然大物。每张卡片都承载着零星信息——员工工时、发票或客户地址——但集合起来,它们构成了企业数据的首个中央神经中枢。
And if you walked into a large company back in the nineteen thirties, you might find yourself in one of their windowless rooms humming and clattering like a small factory. There'd be long rows of metal cabinets, whirring gears, and clerks feeding stacks of stiff paper cards into these behemoth machines. Each card was a sliver of information: an employee's hours, an invoice, maybe a customer's address. But together, they formed the first centralized nerve center for corporate data.
因此我认为,正是这些房间里的卡片信息被这些计算设备处理的过程,让我们有理由称其为最早的真实数据中心。
And I think that's why we argue that those rooms with those cards of information being processed on these devices that were doing computing are the first real data centers.
那台机器的历史相当有趣。它诞生是为了解决美国政府的问题——即人口普查的难题。
That machine has quite an interesting history, born to solve a government problem: the problem of the US census.
这很疯狂。1880年的人口普查,他们在1880年收集了所有信息,却花了七年时间才完成统计。于是人们意识到,人工计算的规模已经无法满足需求了。
It's wild. I mean, the 1880 census, they would collect all the information in 1880, and it took seven years before they got it tabulated. And so they realized it just was not working at human computation scale anymore.
这时,一位名叫赫尔曼·霍尔瑞斯的男子发明了解决方案。
And so one man, Herman Hollerith, had an invention to solve this.
他借鉴了铁路乘务员检票的方式,注意到这些打孔纸片最终存储并呈现了信息。如果能造出可以统计孔洞数量的机器,就能实现规模化计算,不再需要人工在表格上做标记统计。
He took the way that railroad conductors checked tickets and noticed that these were papers that had punched out holes that ultimately stored and represented information. And if you could build a machine that could count the number of punch holes, you could then do computation at scale without needing to have humans count markdowns on a form.
没错。这些卡片存储信息,然后你可以用机器进行计算。1890年的人口普查是这台机器的首次重大突破。即使人口比十年前增加了25%,普查工作也在两年内完成,而非超过七年。而且预算还节省了500万美元,这对于政府采用的新技术来说实属罕见。
That's right. These cards stored information, and then you could have machines compute off of them. And the 1890 census was the machine's first big break. Even with a population 25% larger than the decade before, it was completed in two years instead of more than seven years. And it came $5,000,000 under budget, which is not something that typically a new technology could achieve for a government.
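The tabulating idea (each card stores one record as punched holes, and a machine counts holes across the whole deck) can be sketched in a few lines. This is a toy Python illustration with made-up field names, not the electromechanical mechanism itself; the real 1890 machines sensed holes with electrified pins.

```python
# Toy model of Hollerith-style tabulation: a "card" is one record,
# and a punched position is modeled as a field present in a set.

def tabulate(cards, field):
    """Count how many cards have a hole punched for `field`."""
    return sum(1 for card in cards if field in card)

# Each census card records facts about one person as punched positions.
cards = [
    {"male", "age_20_29", "employed"},
    {"female", "age_30_39", "employed"},
    {"male", "age_20_29"},
]

print(tabulate(cards, "employed"))   # counts cards with that hole punched
print(tabulate(cards, "age_20_29"))
```

The point of the design is that the cards are both the storage medium and the input to computation: run the same deck through again with a different question and you get a different tally.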
快进到沃森的时代。他看到了穿孔卡片机,并理解其功能。到了二十世纪二十年代中期,他已深信不疑。
And so fast forward to Watson. He sees the punch card machine, and he understands what it can do. And by the mid nineteen twenties, he's convinced.
于是在1924年,他将公司更名为国际商业机器公司(IBM)。
So by 1924, he rebranded the company International Business Machines.
这比CTR这个名称贴切多了。
A much more fitting name than CTR.
计算、制表、记录公司。
Computing, tabulating, recording company.
1927年,他告诉高管们:'制表业务没有上限'。于是他全力投入,出售所有前景不佳的业务线,将公司资源集中到制表部门。他们设计了一种专有的80列IBM卡片,只能在IBM机器上使用,成为IBM卡——这是一种早期的锁定策略,想使用机器的客户必须购买这些卡片。
There is no limit for this tabulating business, he told his executives in 1927. And so he doubled down. He sells off all the other less promising lines of business and pours the resources of the company he's running into the tabulating division. They design a proprietary 80-column IBM card that only works on IBM machines, a sort of early-days lock-in where customers who wanted to use the machines had to buy the cards.
人人都痴迷于吉列剃须刀的商业模式,但IBM卡片不仅是计算的工具,还存储信息,成为企业内部运行这些IBM机器时的记录保存方式。
Everyone obsesses over the Gillette razorblade business model, but not only is the IBM card how you do the computation, but it's also storing information, and it becomes your record keeping method inside of businesses that are running on these IBM machines.
一旦你将发票、工时卡等所有信息录入IBM卡片,就无法再迁移出来。
Once you put invoices and time cards and all of that information onto the IBM card, you're not migrating it off.
它就在一张纸上。
It's on a piece of paper.
没错。所以现在你不仅销售制表机,还创造了一条非常稳定的收入流。时机也恰到好处,这是在二十年代末、三十年代初。
Yeah. That's right. So now you've produced a very steady stream of revenue alongside selling the tabulator machine. And the timing for this was perfect. This is the late twenties, early thirties.
新政带来了政府对记录保存的高度重视。1935年,社会保障总署与IBM签订合同,要求提供数百万张卡片和机器来处理福利并打印支票。他们纽约的一家工厂很快达到每天生产1000万张卡片的规模。
The New Deal brings in massive government focus around recordkeeping. And in 1935, the Social Security Administration signed a contract with IBM requiring millions of their cards and machines to process benefits and print checks. One of their New York plants is soon printing 10,000,000 cards per day.
这太不可思议了。
It's amazing.
短短几十年内,多数大公司都设立了穿孔卡片室。他们用各种机器来分类、制表和存储财务信息、工资数据、员工档案、客户资料等你能想到的一切。为此他们还会雇佣文员和技术专家来操作维护这些设备。
And within a couple decades, most large companies had punch card rooms. They had various machines they used to sort, tabulate, and store financial information, payroll information, employee information, customer information, everything you could think of. And then in order to manage that, they'd hire clerks and technical specialists to operate and maintain it.
虽然现在觉得理所当然,但这是企业首次实现会计和发票处理的大规模运算。想想这个时代背景——正是企业开始以新方式扩张的时期,也是全球贸易兴起的阶段。老托马斯·沃森后来对外交产生浓厚兴趣,周游世界宣扬跨境商业将带来和平。不幸的是这适得其反,他在大屠杀前夕与德国进行了大量商业往来。
It's easy to take for granted, but this was the first time a company and its accounting and its invoicing was being able to be calculated at this scale. And you think about what's going on in this era, this is when companies are starting to scale in new ways. It's also when more global trade is starting to happen. Thomas Watson senior ends up very interested in diplomacy and does a bunch of traveling around the world trying to preach that commerce across borders is gonna create peace. Unfortunately, this backfires, and he does a lot of commerce with Germany in the lead up to the Holocaust.
关于IBM的计算能力被用于协助德国人进行记录保存,流传着许多不同的故事。归根结底,这项产品奠定了这个时代诸多规模的基础,包括战争规模。每当发生战争或重大政府项目时,无论交战的是哪个国家,他们都需要IBM的机器来进行计算。
There's a lot of different stories around IBM's computational power being used to help Germans with recordkeeping. Ultimately, this product is foundational to so much of the scale of this era, including the scale of war. Every time there was a war or a big government project, regardless of which country it was fighting, they needed IBM machines to do calculations.
这就是处理和存储信息的方式。
This was the way to process and store information.
但当时仍全是卡片和机械开关。是的,最终还有满屋子的纸张。满屋的纸。所以大家都在用打孔卡运行。
But it was all still cards and mechanical switches. Yeah. Ultimately. And rooms full of paper. Rooms of paper. So everyone's running on punch cards.
这是什么?这个房间是做什么的?我们该叫它什么?
What is this thing? What is this room? What should we call it?
这里就是我们要立下里程碑的地方——最早的数据中心源于IBM的打孔卡。数据中心,最简单的理解就是一个物理空间或设施集合,旨在容纳和运行组织的数据与计算基础设施。你可以将其核心功能理解为存储信息、处理信息,并逐步实现信息的连接与通信。从本质上说,我们将讨论的数据中心及其从20世纪中叶至今的演变,始终围绕着存储、计算和连接这三大物理空间要素。
This is where we're gonna put our stake in the ground and say the earliest data centers come out of the IBM punch card. A data center, the easiest way to understand it is it's a physical space or a collection of facilities designed to house and operate an organization's data and computing infrastructure. And so you can think of it as primarily storing information, computing information, and over time, connecting and allowing for communication of information. At its basic core, a data center and the evolution that we're gonna talk about from the mid nineteen hundreds to present day is the physical space around storage, computing, and connectivity.
此时IBM已是举足轻重的成长型企业,对吧?开展着全球性业务。到1945年,他们拥有约25,000名员工和约1.4亿美元的年收入。但变革即将来临,这也源于战争。二战是政府大规模投资研发的关键时刻。他们经常需要计算火炮射击表,这需要考虑风速、天气等各种因素来确定火炮射击方位。
At this moment, IBM was already this growing company of import, right, doing global and international business. By 1945, they had around 25,000 employees and annual revenue of approximately $140,000,000. But something was about to change, and that's also coming out of the war. World War II was this big moment of government investment in R&D. One of the things that they found themselves often doing was calculating artillery firing tables, which take into account the wind speed and the weather and all of these different factors to figure out where they should shoot artillery.
因此会有由人类计算员组成的团队,这些职员通过机械计算来解决问题,但这形成了瓶颈。于是陆军资助宾夕法尼亚大学团队建造了第一台真正的电子计算机——这台机器能以数量级更快的速度完成这些计算,后来被称为电子数值积分计算机,简称ENIAC。
And so there'd be teams of human computers, clerks that were doing mechanical calculation to figure things out, but that was a bottleneck. And so the army funded a team at the University of Pennsylvania to build the first real electronic computer, a machine that could crank through those calculations an order of magnitude faster, and that became the electronic numerical integrator and computer, also known as the ENIAC.
这一切都与数学有关。它专注于计算数字并生成数学结果,以帮助他们在战争规划中更高效。
This was all about math. It was focused on calculating numbers and producing mathematical results that would help them, in this case, be more efficient in war planning.
是啊。我是说,他们那时可没有TI 83计算器。
Yeah. I mean, they didn't have the TI-83.
也没有里面能玩的游戏。
Or the games that go on it.
确实没有那些游戏。他们没有俄罗斯方块。所以他们需要比人力计算瓶颈更快的计算器。ENIAC没有任何会拖慢运算的机械部件。它占据了近2000平方英尺的房间,但运算速度比之前的任何计算设备快1000多倍。
Or the games that go on it. They didn't have Tetris. So they needed a calculator that could go faster than the people, who were the bottleneck at this point. And so the ENIAC had no mechanical parts that slowed its operation. It took up a nearly 2,000-square-foot room, but it operated over 1,000 times faster than any previous computational device.
它能每秒执行5000次加法运算。当时IBM最快的商用穿孔卡片机每秒只能完成4次加法。这不是数量级的提升,而是三个数量级的飞跃。
And so it could execute 5,000 additions per second. At this time, IBM's quickest commercial punch card machine could complete just four additions a second. So it's not an order of magnitude faster. It's three orders of magnitude faster.
从每秒4次加法到每秒5000次加法。数学家和高管们若知晓此事,可能会觉得'我看到了未来'。但事实上,一位哈佛著名数学家曾断言计算机不会有太大市场,认为全国只需要五六台,主要用于军事和科研。我们的老朋友托马斯·J。
Four additions per second to 5,000 additions per second. This seems like something that mathematicians and CEOs who were privy to this would think, oh, I can see the future. But no, a distinguished Harvard mathematician dismissed as foolishness the idea that there would be a big market for computers. He believed that the country would need maybe half a dozen, mainly for military and scientific research. Our friend Thomas J.
沃森老先生曾说通用计算机根本毫无用处
Watson senior said general purpose computers had nothing whatsoever to do
与IBM或IBM的主要设备生产线及其盈利能力有关。但沃森固执地坚守过去,只有他儿子推动公司前进的固执程度能与之匹敌,甚至可能更胜一筹。
with IBM or IBM's main line of equipment and profitability. But Watson's stubbornness to stay wedded to the past was only matched and maybe outdone by his son's stubbornness to push the company forward.
家族遗传,但表现形式不同。
Runs in the family, but takes a different shape.
这是一个跨越时代的父子故事。小托马斯·沃森最终参军参战,归来后准备接掌IBM的领导职位。他将电子技术视为公司的未来,这演变成了一场关于IBM控制权的代际之战。小托马斯·沃森看到了公司在电子领域的未来,并希望朝这个方向推进。
This is a father and son story for the ages. Thomas Watson Jr. eventually goes and fights in the war, comes back, and is ready to take his seat as a leader inside of IBM. He views electronics as the future of the company, and this becomes the intergenerational battle for the control of IBM. Thomas Watson Jr. saw that there was a future for the company in electronics and wanted to push it that way.
这里存在耐人寻味的亲子心理现象——你能看出他因父亲不愿接纳他认为显而易见的未来而感到极度沮丧和愤怒。
There's some interesting parent child psychology where you could see him completely frustrated and angry that his father's not gonna jump on what he thinks is very clearly the future.
IBM确实与哈佛数学家霍华德·艾肯合作过一项战时项目,那是台混合了机电与真空管的机器。战后他们紧接着推出了首台真正想向公众展示的机器,名为选择性序列电子计算器(SSEC)。那是1948年,他们在曼哈顿中心的麦迪逊大道展厅展示它,让人们可以路过观看这台机器进行运算。
IBM did do a wartime project with Harvard mathematician Howard Aiken that was a hybrid electromechanical vacuum tube machine. They followed that with their first machine that they really wanted to demonstrate to the public after the war, and this was called the Selective Sequence Electronic Calculator, the SSEC. This was in 1948, and they showed it in this Madison Avenue showroom right in the middle of Manhattan so that people could walk by and see this thing computing.
看到这些机器时,它们仍然有房间那么大。
And by seeing this thing, these machines are still the size of rooms.
而且它很快——不是按今天的标准,但在当时非常快。它没有我们今天习以为常的内存概念,但通过打孔纸带作为存储形式。尽管老托马斯·沃森不喜欢这项新业务,但他不介意公司因此获得些公关加分。
And it was fast. Not by today's standards, but by those days, it was very fast. It didn't have any memory in the sense that we're used to computers today having memory, but it had this punched paper tape as a form of storage. And despite Thomas Watson Sr. not liking this as the new business, he didn't mind the company getting some PR points for it.
他们会派记者去观察机器进行运算。就在这一刻,媒体开始预言:很快,由于这台电子大脑的出现,人类将无工可做。
They'd send reporters down to watch the machine do computation. This is the moment when the press said pretty soon no one's gonna have a job to do as a result of this electronic brain.
这台将取代工人的电子大脑。于是在1951年,IBM的长期客户美国人口普查局,在下次人口普查时选择了UNIVAC而非IBM的制表机。小托马斯·沃森回忆这一刻时说:天啊,UNIVAC已经智能到能抢走所有民用业务了。这真正撼动了IBM,促使他们再次自我革新。
The electronic brain that would displace workers. And so in 1951, IBM's longtime customer, the US Census Bureau, went with UNIVAC instead of IBM tabulators for its next census. And Thomas Watson Jr. recalls this moment saying, my god, UNIVAC is smart enough to start taking all the civilian business away. This is really what shakes IBM to once again reinvent itself.
回溯到1880、1890年代,整个发明都是为了解决人口普查问题,而这个问题正变得愈发复杂。人口增多,你想了解的信息也更多。需要追踪更多因素,因此这实际上成为了当时计算机技术的绝佳试验场。这时ENIAC团队带着已商业化的Univac登场,彻底惊醒了IBM。
You go back to 1880, 1890: this whole invention was for the census problem, and that problem gets harder. More people and more things you wanna know about people. You wanna track more factors, and so that is actually a perfect test bed for computation at this time. Here comes the ENIAC team, now commercialized with UNIVAC, and it wakes IBM right up.
这家公司一直在成长对吧?1940年IBM年收入约4500万美元,员工1.2万人;十年后,年收入达2.5亿美元,员工3万人。当你把公司命运押在新方向上时,很多人可能会因此遭殃。所以对小沃森来说责任重大。
And this company has been growing, right? IBM in 1940 was doing about $45,000,000 in revenue with 12,000 employees, and a decade later, they're doing $250,000,000 in revenue with 30,000 employees. And if you're betting the company on a new direction, things could go south for a lot of people. So it's a big responsibility for Watson Junior.
他们继续推进。在这些早期计算机实验的基础上,最终打造出首个商业产品——IBM701电子数据处理机。这是第一台真正意义上的量产机器:我们要多造几台,还要卖出去。我们要把这变成生意,而不仅是研究项目。
And they pushed forward. And so out of these early computer experiments, they finally built their first commercial product, which was the IBM 701 Electronic Data Processing Machine. This is the first real machine where, hey, we're gonna build more than one of these, and we're gonna sell some of them. We're actually gonna try and make this into a business, not just a research project.
正如我们在近代一些企业案例中将看到的,他们拥有稳定的客户群和成熟的销售体系。这条新业务线启动后,五年内就占据了计算机市场85%的份额。
And as we'll see with some of the companies closer to the present day, they had an installed base of customers. They had a sales machine. And so they started this business line, and within five years, they had 85% of the computer market.
没错。ENIAC团队发明了首台真空管计算机,随后又推出UNIVAC,但最终被IBM更强大的商业化销售体系超越——别忘了,销售本就是IBM的立身之本。那么谁在购买这些机器?销量如何?701仍是高度定制化产品,他们卖出19台,客户包括国家实验室、气象局和许多需要进行大量计算的航空航天公司。
That's right. So that team that invented the first vacuum tube computer in the ENIAC, and then the UNIVAC, got outrun by the better commercial go-to-market sales machine that, if you remember, IBM was founded with. So who was buying these things and how many did they sell? Well, the 701 was still a pretty bespoke product. They sold 19, and these were going to national labs, the weather bureau, and a lot of aerospace firms that were doing a lot of calculation.
整个商业模式实际上并不围绕购买这些机器展开,而是IBM传统制表业务模式的延伸。他们希望你租赁机器、购买打孔卡——我们得到了硬件即服务。没错。
The whole business model was actually not around buying these machines. It was an extension of the tabulating business model that IBM had always had. They want you to lease the machine, rent it, and buy punch cards. We got hardware as a service. Exactly.
这是个绝妙的商业模式。五十年代对IBM而言,是从研究导向产品转向他们的'Model T'时代。这个时期他们制造了首个磁盘驱动器,开始同时使用磁带和打孔卡,还发明了Fortran——首个能用文字编写并翻译成计算机指令的现代编程语言。
It was a beautiful business model. The era of the fifties for IBM was going from this research-centered product to what I consider their Model T. So in this era, they built the first disk drive. They started using tape in addition to punch cards. They wrote Fortran, the first real modern programming language, where you could write words and they would get translated into computer instructions.
到五十年代末的1959年,IBM推出了1401型计算机,面向数千家公司销售,最终售出约12,000台。他们以真正的计算业务结束了这个十年,并主导着市场。首款机器出货19套,后续机型出货量达到123套。
By the end of the decade, in 1959, they launched the IBM 1401, which was available to thousands of companies, and they ended up selling around 12,000 of those machines. They exited the decade with a real computing business. And they're dominating the market. So that first machine shipped 19 systems. They had another machine that shipped 123 after that.
随后首款晶体管化的'Model T'——1401型产量突破10,000台。1950至1962年间,IBM收入从2.6亿美元激增十倍至26亿美元,员工人数从3万人膨胀至近13万人。哇哦。
Then the first transistorized Model T, the 1401, crossed over 10,000 units. From 1950 to 1962, IBM's revenue rose tenfold from $260,000,000 to $2,600,000,000, and their headcount went from 30,000 employees to almost 130,000 employees. Whoo.
那么当时还发生了什么?我们处在五十年代,二战已结束,冷战愈演愈烈。苏联的远程核武装轰炸机能在数小时内穿越北极抵达美国城市,这使得探测拦截的决策窗口缩短至两分钟。
And so what else is happening right now? We're in the nineteen fifties. World War II has ended. The Cold War is getting colder, and we find that the Soviet Union's long-range nuclear-armed bombers are able to cross the Arctic and reach American cities in a matter of hours. So this makes the decision window to detect and intercept down to two minutes.
人们容易遗忘的是,冷战本质上是场技术军备竞赛。如果苏联出动这些轰炸机,我们能否更快地察觉并应对?而我们的空军和防御系统并非为这种速度设计。
It's easy to forget, but the Cold War was primarily a technological arms race. Who's gonna get there faster if something's going down? If the Soviet Union launches these bombers, how quickly can we know about it and then respond? And our air force and our defense systems were not designed for that speed.
确实。每个威胁都迫使技术升级,迫使政府做出反应和投资。1951年MIT有个实验室项目能实时处理雷达数据,证明自动化可以弥补国家安全领域的这个缺口。关键是如何将这个研究原型转化为能24/7运行的政府级系统?这催生了当时最大规模的计算项目。
Yeah. Every threat forced a technology upgrade, forced our government to respond, to invest. We had a laboratory project at MIT that in 1951 could process live radar data in real time, proving that automation could close this gap that we were seeing in national security. The challenge was: how do you take this research prototype and turn it into a twenty-four-seven machine that can operate at government scale? And this became the largest computing project to date.
托马斯·沃森二世看到了这一点,他知道他们必须赢得这份合同。IBM实际上有一项政策,即在所有国防项目中仅保持1%的利润,但他们表示这将推动公司走向未来。因此,1954年签订的这份合同成为当时最大的合同之一,价值超过5亿美元(按1950年代币值计算),相当于现今约55亿美元。那么它是如何运作的?具体实现了什么功能?
Thomas Watson Jr. saw this, and he knew that they needed to win this contract. IBM actually had a policy of essentially making only 1% profit on any defense work, which they kept throughout this, but they said this is going to push us to the future. And so the contract awarded in 1954 became one of the largest contracts ever, worth more than $500,000,000 in 1950s dollars, around $5,500,000,000 in today's dollars. And so how did it work? What did it do?
到1960年代全面部署时,这个网络覆盖了27个不同的中心,每个中心都配备了一对专门为此设计的IBM计算机。当其中一台需要维护或故障时,另一台会立即接管。他们在网络中构建了冗余机制,其规模前所未有。总计部署了56台计算机,这些设备占据整层楼面积,系统功耗达数兆瓦,并且所有设备都实现了互联。
At full deployment, when they got there in the 1960s, it was a network that spanned 27 different centers, where each center had a pair of special IBM computers that were designed for this. In case one of them was being serviced or down, the other one would become primary. So they built redundancy into the network, and the scale was unprecedented. In total, there were 56 computers. These were acre-sized floors with multi-megawatt power draw to these systems, and they were connected.
它们通过早期调制解调器连接。这真正推动了调制解调器的量产,使得数据能通过电话线传输。各中心之间租用了专用线路进行连接。
They were connected over early modems. This was the moment that really drove the production of the modem, where you could actually have data sent over telephone lines, and so they would lease special lines between these centers.
这些中心具有我们将详细讨论的特征:内置冗余机制、建筑级规模、多设施联网。这些特征后来成为推动数据中心发展和演进的关键要素。
And these centers have characteristics that we're gonna talk about more. You have redundancy built into this. You have building level scale. You've got multiple facilities networked together. These end up being a lot of the same characteristics that drive data center growth and evolution decades later.
没错。除了规模之外,我认为这个系统的连接性才是关键区别——你不仅拥有存储和计算能力(这些已经存在),现在还实现了连接与通信,因为你需要这个中心能响应另一个中心计算的信息。历史上这一刻,Sage系统,我认为是机器首次实现实时相互通信。意义重大。
That's right. And there's the scale of it, but I think the connectivity of this system is the real difference: you have storage, you have compute, which was happening, but now you had connection and communication, because you want this center over here to respond to information that was computed over there. And so this moment in history, the SAGE system, is I think the first real time that machines were communicating with each other. Pretty big moment.
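The paired-machine redundancy described here, where each center ran two computers so one could take over while its twin was down for service, can be sketched as a toy active-standby failover. This is an illustrative Python sketch with hypothetical names, not how the real duplexed machines actually switched over.

```python
# Toy model of an active-standby pair: two machines per center,
# one primary; if the primary fails, its twin is promoted.

class Center:
    def __init__(self, name):
        self.name = name
        self.machines = ["A", "B"]   # the paired computers
        self.primary = "A"

    def fail(self, machine):
        """If the primary goes down for service, promote its twin."""
        if machine == self.primary:
            self.primary = "B" if machine == "A" else "A"

center = Center("Center-1")
center.fail("A")
print(center.primary)  # the standby has taken over
```

The design goal is that the center as a whole never stops answering, even though either individual machine regularly does.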
当政府实现这项技术后,自然会吸引其他行业的关注。
So once we're able to do this in government, this is gonna catch the eyes of other industries.
航空公司目睹Sage项目的成功,而他们正面临新难题。这是六十年代,航空业蓬勃发展,但订票处理方式简直荒谬:你需要联系旅行社,旅行社再致电航空公司,然后航空公司派职员跑来跑去抽卡片来预订座位。
The airlines have seen the SAGE project take off, and they're sitting there with a new problem on their hands. This is the sixties. Air travel is booming, and the way that reservations got processed was absolutely insane. You would call your travel agent. Your travel agent would call the airline, and the airline would have to run clerks around pulling cards out to book a seat.
据说,这花了九十分钟。
Supposedly, it took up to ninety minutes.
嗯,你得找出谁坐在17排的中间座位。
Well, you gotta figure out who had the middle seat in Row 17.
谁坐在中间座位?没错。
Who had the middle seat? Exactly.
你不查卡片怎么知道是谁?
How are you gonna figure that out without looking up the card?
随着美国航空公司扩展到实际运营规模,系统开始崩溃。于是他们联系IBM,启动了这个后来被称为SABR的项目。它本质上是Sage系统的商业版本。两台IBM大型机被专门建造,通过电话线连接。这些系统将存储预订信息的真实数据源,并配备终端设备。
As American Airlines is scaling to a real operational scale, this is starting to break down. So they reach out to IBM, and they kick off the project that'd be known as SABRE. It was essentially a commercial version of the SAGE system. Two IBM mainframes were purpose-built and would be connected over phone lines. Those systems would house the source of truth around the reservations, and there would be terminals.
这些不是带屏幕的终端,而是带纸张的终端——你仍然需要旅行代理商,但他们会通过终端查询可用座位,预订将自动完成。这样处理时间从九十分钟缩短到终端操作的几秒钟,并消除了中间的办事员。这为电子商务奠定了基础。我认为这是历史上首次通过计算机购买商品。
These weren't terminals with screens; these were terminals with paper, where you'd still have a travel agent, but they would be querying over the terminal what seats were available, and things would be booked automatically. So it'd go from ninety minutes to seconds at a terminal, and it removed the clerk in the middle. This set the stage for ecommerce. This is the first time, I think, you're buying something over a computer.
预订时间从九十分钟缩短到几秒钟,到六十年代中期,美国航空公司的运营规模已能实现每日4万次预订。
You had ninety minutes to do a reservation to now down to seconds, and this scaled American Airlines operations by the mid sixties to be able to do 40,000 reservations per day.
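The shift the reservation system made, from clerks pulling paper cards to terminals querying one shared source of truth, can be sketched like this. A toy Python illustration with hypothetical seat and passenger names, nothing like the real mainframe implementation.

```python
# Toy model of a terminal-era reservation system: the mainframe holds
# one authoritative seat map, and every terminal queries and books
# against that single source of truth.

seats = {"17A": None, "17B": None, "17C": None}  # seat -> passenger

def available():
    """What any terminal sees when it queries open seats."""
    return [s for s, p in seats.items() if p is None]

def book(seat, passenger):
    """Claim a seat only if it's still free in the shared record."""
    if seats.get(seat) is None:
        seats[seat] = passenger
        return True
    return False

book("17B", "Jones")
print(available())           # 17B is no longer offered to anyone
print(book("17B", "Smith"))  # a second booking of the same seat fails
```

Because every terminal reads and writes the same record, the "who had the middle seat in Row 17" question is answered in one lookup instead of a clerk's card pull.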
难以置信。这个系统至今仍是航空预订系统的支柱之一。这是一个令人惊叹的基础性时刻,商业交易在数据中心之间进行。它们支撑着航空公司运营,承担着这些重大的政府项目。
Insane. This system continues to this day to be part of the backbone of airline booking. It's quite an amazing foundational moment where commerce is happening between data centers. And so they're running airlines. They're doing these major government projects.
企业现在可以直接购买现成的1401型机器。但进入六十年代中期后,又出现了另一个问题——所有这些大型机都有各自的配件,彼此完全不兼容。于是在1964年,他们推出了System/360,将这种混乱、无序、各自为政的大型机世界统一成了一个平台。本质上这是一种统一的架构。
Businesses are now buying the 1401 machine off the shelves. But there's a different problem you enter the mid sixties with, which is that all of these mainframes had their own accessories. There was no compatibility. So in 1964, they launched the System/360, and that turned this chaotic, messy, ad hoc world of different mainframe models into a platform. It was essentially one architecture.
你可以根据需要选择它的性能等级。
You could choose how powerful you wanted it to be.
先选择基础配置包,然后根据需求添加功能模块。
You buy your base package, and then you feature it up based on what you need.
没错。对于设施规划师来说,他们现在可以规划机房设施,然后根据需要升级机器性能。我记得这个时代大约投入了50亿美元的研发资金。又一次押上公司命运的豪赌,而且成功了。
That's right. And so for facility planners, they can now plan a room in a facility and then be able to scale up the machine as needed. I believe roughly $5,000,000,000 of investment in R&D was spent in this era. Another bet-the-company moment. And it worked.
在六十年代,他们每月能出货数千台这样的设备。纵观五十到六十年代——我们之前提到过,从五十年代到1962年,他们的规模增长到了26亿美元。到1970年,年收入已达75亿美元,员工总数27万人。
They were shipping thousands of these devices per month in the sixties. As you zoom out, you look across the fifties and sixties. We talked about this earlier: from 1950 to 1962, they scaled to $2,600,000,000. By 1970, they were doing $7,500,000,000 in annual revenue and had a team of 270,000 employees.
按今天的美元计算,相当于620亿的年收入。
In today's dollars, that's $62,000,000,000 in revenue.
市场注意到IBM是当时的王者。1970年那个时刻,IBM占美国股市总市值的6.8%。如今看来这个数字几乎前所未闻——当前英伟达占美国股市市值的7%。
And the market noticed: IBM was king. At that moment, in 1970, IBM accounted for 6.8% of the total US stock market. It almost feels unprecedented to say that. Today, NVIDIA is currently 7% of the US stock market.
这些大型机成为了时代精神的象征。它们被昵称为'玻璃屋',形成了独特的文化景观。操作员和主机被安置在封闭房间里,四周装设玻璃幕墙,就是为了炫耀系统的自动化程度和公司的技术先进性。
These mainframes became part of the zeitgeist. They became part of a cultural milieu because they were often nicknamed the glass houses. Right? You'd have the mainframes and their operators in an enclosed room. And as we mentioned before, you'd have windows and glass walls around it because you wanted to show off how automated your systems were, how advanced your company was.
人们意识到这些玻璃屋般的大型机支撑着企业客户数十亿美元的营收。因此安保措施、操作控制、恒温清洁环境,都成为那个时代的文化符号——大型机和早期数据中心已成为社会经济运行的重要组成。
There was a recognition that these glass houses, these mainframes, were powering the billions of dollars of revenue from their enterprise customers. And so the security, the operational control, the climate control and cleanliness, it was all part of this cultural moment where mainframes and these early data centers became a meaningful part of how our society and economy ran.
任何高效使用计算机的公司都面临算力供不应求的状况。用户常常要排队等待运算结果,有时甚至长达一整天。虽然利用率很高,但存在严重的交互延迟问题。直到一项影响深远的发明出现,彻底改变了单线程计算的理念——从'一次只处理一个任务'转变为'确保硬件最大化利用',机器不再关心任务是否来自同一用户。
If any company was using it effectively, they had more demand for use than they had supply in the computer. And so oftentimes, people would be sitting there waiting for their information to come back, their computation to run. Sometimes it would be a day or longer. So the utilization was pretty high, but there was this long interactive latency problem. And an invention that echoes through in multiple ways to this day changed the fundamental way that computing was thought of: from single-threaded, I'm only working on one problem at once, to now the job is to make sure my hardware is as utilized as possible. I don't care, as a machine, whether or not it's the same problem from the same person.
我只需要持续运算。
I just need to be calculating.
我必须全天候运转以充分利用算力。可以处理这个问题,也可以处理那个问题,任务可以随时切换,这无关紧要。
I need to be operating all the time to utilize my capacity. It can be for this problem, it can be that problem, you can slot them in and out. It doesn't matter.
没错。60年代初MIT团队开发的CTSS(兼容分时系统)于1963年投入校园使用。学生们能通过电动打字机同时与主机交互,首次体验到掌控计算机的感觉。
That's right. A team at MIT in the early '60s developed what was called CTSS, the Compatible Time-Sharing System, and turned it on to the campus in 1963. Students could sit there with electric typewriters and interact with this machine, many at the same time, and feel like they had control over the computer.
感觉就像是你自己的电脑一样。
Felt like it was your own.
这是一个重大时刻。这是首次出现登录、用户、文件和即时反馈。因此在六十年代中期经历这一切是革命性的,此前电脑只是由他人操作的设备。
And this was a huge moment. This is the first time there were logins and users and files and instant feedback. And so it was revelatory to be there in the mid sixties experiencing that, after the idea of a computer was just this kind of thing operated by someone else.
没错。你得排队等候,把数据交给操作员——那些专家们去处理,然后坐着干等。现在你可以直接交互了,是你和电脑之间的对话。
That's right. You had to queue up, get in line, hand over your data to the operators, the specialists who would then go and do their thing, and you would sit and wait. Now you're interacting. It is you and the computer.
这种工作方式的精妙之处在于,你显然没有对电脑的独占控制权。计算机会根据空闲周期切换处理的任务。人类会觉得:电脑怎么能做到这样?其实计算机的运算频率远超我们的感知能力。即便是响应按键这样简单的操作,在我按键和你按键之间的毫秒间隙里,电脑完全能完成响应并切换回其他任务。
And the way this worked was that you didn't obviously have monopoly control over the computer. The computer was switching what problem it was working on depending on what free cycles it had. The human perception is like, oh, how can a computer do that? Well, the computer is operating at a much faster frequency than we're able to realize. Even if it's something as simple as responding to a keystroke, there's plenty of milliseconds in between my keystroke and your keystroke for the computer to respond and then switch back to the other problem.
正如我们后续会看到的,这种智能分割硬件资源并高效利用的理念,成为了当今云计算和数据中心架构的基石,而这一切都始于分时系统的创新。当时这些占领商业领域的大型主机通过分时技术,让校园里的多用户连接到同一台主机时,都感觉像拥有专属计算机。但他们仍与外界完全隔绝,仍是信息孤岛。
And as we'll see later in this story, this notion of intelligently slicing up the hardware and utilizing it becomes the backbone of everything in how the cloud and data centers are architected today, and it started with this time sharing innovation. Well, we had these mainframes that have taken over the business world and time sharing. So multiple people can be on a campus connected to that mainframe feeling like they have their own computer. They were still fully disconnected from anyone else in the world. There were still islands onto themselves.
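To make the time-slicing idea concrete, here's a minimal sketch — a hypothetical round-robin scheduler, not CTSS's actual algorithm, with invented names — showing how one machine interleaves several users' jobs so everyone sees steady progress:

```python
from collections import deque

def time_share(jobs, quantum=1):
    """Round-robin scheduler: run each job for a short quantum, then
    switch, so no single user monopolizes the machine.
    `jobs` maps a user name to a number of work units."""
    queue = deque(jobs.items())
    timeline = []  # which user held the CPU at each tick
    while queue:
        user, remaining = queue.popleft()
        done = min(quantum, remaining)
        timeline.extend([user] * done)
        if remaining - done > 0:
            queue.append((user, remaining - done))  # back of the line
    return timeline

# Three "students" share one machine; each sees progress every cycle.
print(time_share({"ann": 3, "bob": 2, "carol": 1}))
# → ['ann', 'bob', 'carol', 'ann', 'bob', 'ann']
```

Note that the hardware is busy on every tick — the utilization insight that carries all the way through to modern virtualization.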
是的,现在这座岛上可以容纳多人,但你无法造访隔壁岛屿。我们还缺什么?
Yes, now on this island, multiple people can be there, but it's not like you can go visit the next door island. What are we missing?
这是我们如今习以为常的事:不同地点的人们能同时协作于同一系统。别忘了,当时是五十年代末的冷战时期,艾森豪威尔总统要确保美国不会遭遇技术突袭。
It's something we take quite for granted: many people in different places, but working on the same system at the same time. Now remember, we are in the late nineteen fifties in the Cold War, and Eisenhower wants to ensure that the US would not be blindsided by a technological surprise.
就在那个特定时刻,所有人抬头望向天空,第一次看到人造物体在太空中漂浮,那就是斯普特尼克。苏联人用斯普特尼克率先进入太空的事实,我认为从地缘政治角度引发了一种今天难以想象的恐慌程度。
There's this particular moment, right, when everyone looks up in the sky, and for the first time, a man made thing is floating in space, and that's Sputnik. The fact that the Russians beat us to the skies with Sputnik kicks off, I think, a level of panic from a geopolitical standpoint that's hard to connect to today.
因此在斯普特尼克引发的恐慌之后,艾森豪威尔创建了高级研究计划局(ARPA)。那些年里,ARPA向太空、导弹和计算领域投入大量资金。在这个核武装的世界里,我们需要确保武器无法摧毁集中式指挥系统,因此网络本身必须去中心化。这种确保可靠性的需求,再加上鲍勃·泰勒在五角大楼办公室遇到的一个更平常的问题——他当时同时使用三台终端。
And so after the panic of Sputnik, Eisenhower creates the Advanced Research Projects Agency, or ARPA. And in those years, ARPA was pouring money into space, into missiles, into computing. And we wanted to ensure in this nuclear armed world that weapons could not destroy a centralized command system. And so the network itself needed to be decentralized. This need to ensure reliability combined with a slightly more mundane problem that one Bob Taylor faced when he was at his office in the Pentagon, which is that he was working at three terminals.
他有一台连接麻省理工学院的机器,另一台连接加州大学伯克利分校,第三台则连接完全不同的研究系统。他可以与任何一台交流,但无法同时操作。于是他推着转椅来回移动,据他回忆说:'我们应该找到方法把这些不同机器连接起来',他这样告诉上司。经过二十分钟的会议,泰勒带着一百万美元预算和简单使命离开了——让这个设想成为现实。
He had one machine connected to MIT, another connected to UC Berkeley, and a third for a different research system altogether. And he could talk to any of them, but never at the same time. So he's rolling his chair back and forth, and he was quoted as saying, "We ought to find a way to connect all of these different machines," as he told his boss. And after a twenty minute meeting, Taylor walked out with a million dollar budget and a simple mandate to make it happen.
我们不会详述每个细节,因为ARPANET本身就是一个精彩的故事。但我认为有几项创新为互联网奠定了重要基础,其中之一就是分组交换的概念。电话线路方面,AT&T的整个业务建立在连接两人通话的电路上,这被称为电路交换。
And we won't go into every moment and step, because the ARPANET itself is quite a story. But I think a couple innovations set the stage for the Internet in a really important way. One is the concept of packet switching. If you think about a phone line, AT&T built its whole business on this idea that you're gonna connect a phone call between two people, and there's gonna be a circuit that connects those two phones together. This is called circuit switching.
但有个疯狂的想法:要建立弹性系统,当一个节点失效时不影响其他节点,就需要通过不同路径路由信息,不依赖单一线路。需要像蜘蛛网般的节点结构,两点间存在五条、理想情况下五十条甚至五千条路径。为此需要灵活的通信系统,于是产生了将信息拆分为数据包(如同小邮件)的概念,这些数据包可以通过不同路径传输,最终在接收端重新组装成完整信息。
But there's this crazy idea, which is if you want that resilient system, so that if one node goes out, the next thing doesn't go out, you need to route things around in different ways and not be dependent on the one route you have. So you need to have more of a spider web of nodes, to say, well, from here to here, there's actually five or ideally 50 or 5,000 ways to get between those two points. And to do that, you need a flexible communication system. And so the idea is to break up information into packets, little pieces of mail that would get sent from one node to another. Different packets could even take different routes, but on the other side, someone would reassemble them into the message.
这是将同一信息分解成无数碎片,通过无数不同路径传输的概念。
It's the same singular message broken up into many, many different pieces and routed through many, many different pathways.
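As a rough sketch of that idea — illustrative only; real ARPANET packets carried far more header information — here is message splitting and reassembly using sequence numbers:

```python
import random

def packetize(message, size=4):
    """Split a message into numbered packets: (sequence number, chunk)."""
    return [(i, message[i * size:(i + 1) * size])
            for i in range((len(message) + size - 1) // size)]

def reassemble(packets):
    """Packets may arrive in any order; sequence numbers restore it."""
    return "".join(chunk for _, chunk in sorted(packets))

packets = packetize("LO AND BEHOLD")
random.shuffle(packets)  # simulate packets taking different routes
print(reassemble(packets))  # → LO AND BEHOLD
```

The shuffle stands in for the network: no matter which paths the packets take or what order they arrive in, the receiver can rebuild the original message.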
有趣的是,直到联系到分时系统的故事我才想到,这其实是为了提高网络利用率。在电路交换模式下,你我之间有一条专属线路,你与他人也有专属线路——而这些线路通常是闲置的。
It's funny. I hadn't thought about this until we connected to the time sharing story, but it's actually about better utilization of the network. In a circuit switch mode, you have this direct line. You and I have a direct line together, and you have a direct line to someone else. That line is usually empty.
要充分利用和发展电路交换技术相当困难,因为电话通话结束后,线路就闲置在那里。而在分组交换中,数据包在这个复杂的世界里自行寻找路径。AT&T对此持强烈怀疑态度,他们说如果想让两台计算机通信,他们会架设电话线供其使用。但一支从MIT分拆出来的精英团队——Bolt, Beranek and Newman公司(BBN)——抓住了这个分组交换模式并持续推进。他们需要构建的核心设备被称为接口消息处理器(IMP)。
It's pretty hard to fully utilize and build out the circuit switched thing, because the phone call ends and you're like, okay, now it's just sitting there unused. In packet switching, packets are finding their way through this wild world. So a lot of skepticism, obviously, from AT&T, which said if you want to have two computers talk to each other, we'll build the phone lines and you can run them. But a crack team that had spun out of MIT, this company Bolt, Beranek and Newman, latched onto this packet switching model and ran forward. And the core thing that they needed to build was what was called the interface message processor, or the IMP.
这本质上就是放置在主机(即大学)前方的路由器。MIT和伯克利拥有这些大型主机,但要连接这些机器,就需要在它们前面设置这些节点来构建网络。于是BBN采用了一台冰箱大小的Honeywell小型计算机,价值约8万美元,围绕分组交换构建逻辑并推出了这一系统。为了让第一台机器准备就绪,工程师们付出了许多英勇努力,然后时机终于成熟。
This was ultimately the router that would sit in front of the host machine, which was the university. MIT, Berkeley would have these mainframes, but to connect those machines together, there would need to be these nodes that would sit in front of them and be able to build this network. And so BBN used a Honeywell minicomputer that was fridge sized, cost about $80,000, built the logic around packet switching, and rolled this out. There were a lot of engineering heroics that went into getting the first machine ready, but then it was time.
大约55年前,一小群研究生聚集在UCLA,手持香槟等待机器从卡车上卸下,庆祝第一台IMP的到来。几周后,第二台交付给了斯坦福研究院(SRI)。正如传闻所述,具有预言意义的第一条信息本应是'login'(登录),对吧?现在用户可以进行登录了。
Almost fifty five years ago, a small group of grad students gathered at UCLA to wait for the machine to be rolled off the truck, champagne in hand, and were able to celebrate the arrival of the first IMP. The second one was delivered to the Stanford Research Institute weeks later. And the prophetic first message, as lore has it, was meant to be "login." Right? You now have users that can log in.
但在输入前两个字母后,系统崩溃了。因此第一条信息仅仅是'lo'。
But after inputting the first two letters, it crashed. And so the first message was simply "lo."
'Lo'。差一点就是'login'。当然,只有两个连接的网络能有多大用处?非常有限。但随后他们向UC圣塔芭芭拉分校、犹他大学推广,之后每月都在向大型大学部署新的IMP。
"Lo." So close to "login." And of course, what is a network with two connections? Pretty limited. But then they rolled out UC Santa Barbara, University of Utah, and then they were just rolling out new IMPs to the large universities month over month.
这些成为了网络节点,几乎每月新增一个。每个新加入网络的节点都放大了网络价值,因为现在不仅是双向通信,而是多边通信。
And these became nodes, almost a new node every month. And then every new node that comes on the network amplifies the value. Because now it's not just bidirectional communication, it's multilateral communication.
他们构建了所有这些逻辑来实现弹性路由、设备发现等必要功能。当然,最初的宣传点是:'作为政府,我们已经资助了所有这些研究型大学的昂贵计算机,让我们提高利用率。也许某个团队有执行特殊任务的专用程序。'
So they're building all this logic to do the resilient routing, to do the discovery of devices, and everything that you needed to do that. And, of course, the original pitch was, hey, we've funded, as a government, all these expensive computers at all these research universities. Let's drive utilization. Maybe one team has a special program for doing something.
另一个团队有个专门做某事的特殊项目,我们让各部门对接吧。
Another team has a special program for doing something. Let's let the departments connect.
一个意想不到的应用脱颖而出。研究人员发现,快速异步消息通常比登录他人系统运行代码更有价值。1972年,ARPA的鲍勃·卡恩曾说,看吧,大家其实都用这玩意儿来收发电子邮件。网络的目的已悄然从共享机器转变为以邮件形式连接人群。
One unexpected application really bubbled to the top. The researchers discovered quick asynchronous messages were often more valuable than logging into someone else's system to run the code. And in 1972, ARPA's Bob Kahn said, you know, everyone really uses this thing for electronic mail. And the network's purpose had quietly shifted from sharing machines to connecting people in the form of email.
神奇的是,一旦连接足够广泛,杀手级应用总是应运而生。远程人际沟通永远是那个杀手级应用。
It's wild how as soon as you have enough connectivity, it's always the killer app. The killer app is always people communicating at a distance.
确实如此。人类天生就渴望做我们基因里设定的事——与他人建立联系。尽管这些机器如此精密,最终成为杀手级应用的却是简单的异步通讯行为。
That's right. People wanting to do what we are biologically programmed to do, which is connect with other humans. And as sophisticated as these machines were, it was a simple act of asynchronous communication that became the killer app.
越来越多人需要它。终端设备催生了本地网络内的新型连接方式,进而接入更广阔的网络。不再是‘嘿,你需要终端来访问校园里的机器’,而是‘你需要终端给其他校区的朋友发邮件’。
And more people wanted it. It became new ways to connect within your kinda local network over these terminals so then you could connect into the broader network. It was no longer that, hey. You wanted access to a terminal to access the machine on your campus. It was you wanted a terminal so you could email, you know, your friends at the other campus.
听起来像是我们稍后会谈到的早期社交网络。扩张持续进行:1972年有29个节点,到1975年已超50个。
Sounds like an early social network that we'll get to later. And so the expansion continued. By 1972, there were 29 nodes. By 1975, over 50.
有些连接已不仅限于校际之间。我们开始跨越海洋——虽然笨拙,但确实实现了从弗吉尼亚节点到英国康沃尔节点的跨大西洋连接。
And some of these connections weren't just from university to university. We were now leaping oceans. It was clumsy, but we were able to go over the Atlantic Ocean from a node in Virginia to the one in Cornwall, England.
有个故事说他们从英国的一个会议回来时,有人把电动剃须刀落在那儿了。
There was some story where they came back from some conference in England, and someone had left their electric shaver behind.
当时是半夜。在英国是凌晨三点,但他知道他的同事是个工作狂。于是他在英国时间凌晨三点发了一条消息,看看这个人是否醒着,结果他确实醒着。他看到对方在线,就问:嘿,我是不是把电动剃须刀落那儿了?
It was the middle of the night. So in England, it was three in the morning, but he knew that his colleague was a workaholic. So he sent off a message at 3 AM England time, seeing if this person was awake, and he was. He saw that he was logged on and asked, hey, did I leave my electric razor there?
能帮我拿一下剃须刀吗?结果一切顺利,简直不可思议。到了80年代初,美国和其他地方开始涌现出其他研究网络。人们觉得,这个ARPANET挺不错的。
Can you get my razor, please? And it all worked. It was phenomenal. So by the early 1980s, other research networks started to pop up in The US and otherwise. People were like, Oh, this ARPANET thing's pretty cool.
我们要自己建一个。有DECnet、IBM的网络,还有各种公司和研究团体建立的网络。这带来了新挑战:我们想和所有这些网络上的任何人交流,该怎么实现?
We're gonna build our own. There was DECnet. There was IBM's net. There were various nets that companies started to stand up, other research groups. This created a new challenge: we wanna talk to everyone on any of these networks, and how are we gonna do that?
而这些IMP(接口信息处理机)并不是为此设计的。它们原以为只需要构建一个网络。于是参与ARPANET的文特·瑟夫和鲍勃·卡恩意识到,他们需要建立一种通用语言来解决寻址问题——我们要把这封邮件发往哪里,同时还要制定实际传输信息的控制协议,确保不会重复发送数据包。这就形成了TCP/IP协议:TCP指传输控制协议,IP指互联网协议。
And these IMPs were not designed for that. They had presumed that there was only one network that they were trying to build out. And so Vint Cerf and Bob Kahn, who were involved in ARPANET, figured out what they needed to build was this common language, to figure out addressing — where are we gonna send this packet of mail — and then also the actual control protocol for the actual sending of information, to make sure you don't send duplicate packets. And so this became TCP/IP: TCP for the Transmission Control Protocol and IP for the Internet Protocol.
你可以把IP看作解决寻址问题的部分。你可能熟悉自己的IP地址,它通过路由器沟通你要找什么、想把数据包发给谁。而TCP则是确保数据包被可靠确认的方式——嘿,这小段信息已经送达。
You can think of IP as solving the addressing problem. You might be familiar with your IP address that then communicates through routers what you're looking for, who you're trying to get the packet to. And then TCP is the kind of reliable way that a packet gets acknowledged. Hey. This little chunk of information has arrived.
就像邮件送达回执一样。
It's like the certification of the mail.
签收我的包裹。我有地址所以包裹知道该送到哪里,然后我签收它。
Signing for my package. I have an address so the package knows where to come, and then I sign for it.
我不需要你再给我寄包裹了,因为我已经收到了。这种组合成为了连接所有这些网络的骨干,并最终成为ARPANET之后下一阶段——网络之间的互联网络(inter net)的骨干。于是在1983年,他们要求所有ARPANET主机都必须迁移到TCP/IP协议。这个在1983年1月1日的切换,成为了现代互联网的框架。众多网络使用同一种语言进行连接。
And I don't need you to send me the package again, because I got it. And so this combination became the backbone for connecting all of these networks together, and ultimately the backbone for the next stage — not the ARPANET, but the internet, the network between networks. And so in 1983, they took every ARPANET host and said they need to move to TCP/IP. And that switch, on January 1, 1983, became the framework for the modern day Internet: many networks, all speaking the same language, able to connect.
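That acknowledge-and-deduplicate loop can be sketched in a few lines — a toy version of the concept only, not the real TCP state machine, and every name here is invented:

```python
class Receiver:
    """Toy TCP-style receiver: acknowledge every packet,
    but deliver each sequence number only once."""
    def __init__(self):
        self.seen = set()
        self.delivered = []

    def receive(self, seq, data):
        if seq not in self.seen:        # duplicate means the sender resent
            self.seen.add(seq)          # (e.g. after a lost acknowledgment)
            self.delivered.append((seq, data))
        return ("ACK", seq)             # always acknowledge, like signing for mail

rx = Receiver()
rx.receive(0, "lo")
rx.receive(1, "gin")
rx.receive(0, "lo")                     # retransmission: acked, not re-delivered
message = "".join(d for _, d in sorted(rx.delivered))
print(message)  # → login
```

The sender's side of the bargain (not shown) is the mirror image: keep resending a packet until its acknowledgment arrives.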
这就是.com、.edu、.gov域名出现的契机,因为我们现在有了让多个网络相互通信的标准协议。
And that's where we have the advent of the .com, the .edu, the .gov, because we now had standard protocols for multiple nets to be communicating together.
20.3.1.72可没有pets.com这么朗朗上口。
20.3.1.72 doesn't have the same ring as pets.com.
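The joke works because of exactly this mapping: humans remember names, routers work with numeric addresses. A toy resolver — the addresses here are made up for illustration, and real DNS is a distributed hierarchy, not one table:

```python
# A hypothetical hosts table, the spiritual ancestor of DNS.
HOSTS = {
    "pets.com": "20.3.1.72",
    "mit.edu": "18.0.0.1",
}

def resolve(name):
    """Map a memorable domain name to the numeric address routers need."""
    try:
        return HOSTS[name]
    except KeyError:
        raise LookupError(f"unknown host: {name}")

print(resolve("pets.com"))  # → 20.3.1.72
```

Early ARPANET hosts literally shared a flat file like this (HOSTS.TXT) before the name system was decentralized.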
Pets.com,这可是个重要案例。
Pets.com, an important one.
但我们还没完全到那一步。
But we're not quite there yet.
是的。但正如本提到的,到八十年代中期,多个网络已经相互连接,而ARPANET本身也显露出老化迹象。到1990年,ARPANET显然需要退役了。但它所做的工作——去中心化网络、分组交换和开放协议的理念已深深扎根,研究机构和大型组织已经牢固建立了信息交换的方式。
No. But by the mid nineteen eighties, as Ben mentioned, there's multiple nets now connected, and ARPANET itself was showing its age. By 1990, it became apparent that the ARPANET needed to be decommissioned. But the work it had done, the ideas of decentralized networking, packet switching, and open protocols had firmly taken root, and research institutions and these large organizations had firmly established how they were exchanging messages.
因此他们能够互相发送文件,或许还能交换彼此的研究论文和数据集。但这还不是网页浏览器的时代。我们需要一种围绕文档的结构。于是,瑞士CERN的研究员蒂姆·伯纳斯-李提出了将成为万维网应用层基础设施的文档结构。
So they were able to send files back and forth, maybe send each other's research papers and datasets. But this was not a thing with a web browser yet. We needed a structure around documents. And so Tim Berners-Lee, a researcher at CERN in Switzerland, proposed the document structure that would become the application layer infrastructure of the World Wide Web.
这个最初作为冷战研究项目启动的网络,源于抬头望见苏联人造卫星的刺激,如今已演变为公共及商业互联网——一条数据开放高速公路,也是我们将构建现代数据中心经济的基础。现在我们拥有了这个不断扩张的网络,
And so the network that started off as a Cold War research project in response to looking up at the sky and seeing the Russians on Sputnik had now become the public and commercial Internet, an open highway for data and the foundation for which we're gonna build the modern data center economy. So we now have this expanding network,
但网络上的计算仍停留在终端与大型机之间的交互模式,这些设备再连接到网络;或是终端与连接网络的小型机交互。我认为有必要多谈谈小型机,它们既是后来IMP设备的雏形,也是主要连接方式。这不是你会买回家的东西。它们刚问世时价格仍相当于一辆汽车,但比起IBM那种房间大小的巨型主机已亲民许多。
but the computing on this network was still this terminal based interaction between a terminal and a mainframe that then connected to the network, or a terminal and a minicomputer connected to the network. I think it's worth saying a bit more about minicomputers, which both became that IMP device and were also a primary source of connection. These were not something that you'd buy for your house. They still cost maybe as much as a car when they first came out. But they were much more accessible than a big, room sized IBM mainframe.
于是数字设备公司(DEC)在60年代中期推出了首批小型机。小型机的许多创新在于软件领域,如Unix系统、C语言编程、套接字等这些影响至今的基础性创新。但当时市场还缺一款普通人能买得起的计算机。
And so the company Digital Equipment Corporation, or DEC, launched the first minicomputers in the mid-60s. A lot of the innovation on the minicomputer was in software, like Unix and C programming and sockets, all these foundational innovations that would later echo through to today. But there was room for a computer that a normal person could buy.
1975年1月那个决定性的日子,《大众电子》杂志封面展示了Altair 8800——一台家庭可购的机器,基础版439美元,增强版1500美元,能运行自编代码。几位爱好者去报摊买了这期杂志,其中两人看到封面后决心采取行动。
On one fateful day, January 1975, the cover of Popular Electronics magazine showcased the Altair 8800, a machine you could buy at home for $439, or $1,500 souped up, and run your own code. And there were a few hobbyists who went to the newsstand and picked up this magazine. Two in particular saw the cover and decided to do something about it.
比尔·盖茨和保罗·艾伦在西雅图湖滨中学时就已闻名——他们能使用终端连接计算机,会花大量时间编程。但这与拥有可随意把玩的家庭电脑截然不同。所以当他们看到这个产品问世,立即意识到:计算机已便宜到普通人能负担个人电脑(PC)的时代来临了。
And so Bill Gates and Paul Allen, famously when they were at the Lakeside School in Seattle, they had access to a computer in a terminal, and they would spend hours and hours programming. But that's different than having one in your own house that you can play with. So when they saw this come out, it was immediate that this was a moment that computing has gotten cheap enough that a normal person could afford to have their own personal computer or PC.
你可以预见,如果首款产品售价450美元,那么它普及到更多人群只是时间问题。
And you could see that if the first one was released at $450, it was only a matter of time before that would become accessible to more and more people.
微软迅速成立,为这款设备销售BASIC解释器。1977年苹果II发布,为许多人开启了个人电脑浪潮,1979年又与VisiCalc电子表格软件搭配。突然间,电子表格这个杀手级应用不仅限于家用,还具有了企业功能,促使人们购买这些个人电脑。这样人们就能在不需要连接走廊尽头的大型机的情况下进行预算建模。
Microsoft formed immediately to sell a BASIC interpreter for this device. The Apple II launches in 1977, which kicks off the PC wave for many folks, and it pairs with VisiCalc in 1979. So all of a sudden you had a killer app of a spreadsheet that wasn't just for home use, but now had a corporate function to go and buy these PCs. And so someone can model a budget without needing to connect to the mainframe down the hall.
我可以坐在办公桌前完成财务操作。
I can stay at my desk and run my finance operations.
没错。IBM虽然不是最早意识到这点的,但他们很快发现这对他们是个问题。如果所有人都在自己桌上完成计算,就不再需要走廊尽头的大型机了。玻璃机房被打破了。于是他们启动了一个臭鼬工厂项目。
That's right. And IBM, while they weren't the first here, they're actually pretty quick to realize that there was a problem for them. If everyone is doing all the computation on their desk, they're not gonna need the mainframe down the hall anymore. The glass house has been shattered. And so they kicked off a skunkworks project.
这其实相当惊人。他们将团队隔离在远离纽约的佛罗里达州博卡拉顿,要求这个团队使用现成部件设计IBM PC。1981年,他们制造并发布了IBM PC,并授权使用六年前成立的微软公司的MS-DOS作为核心操作系统。他们以为硬件才是主业,时间证明他们错了。包括康柏在内的众多IBM兼容设备在80年代涌入市场,价格下跌,硬件商品化,微软凭借DOS和后来的Windows形成了Wintel双头垄断,最终从IBM手中夺取了用户心智和市场。
It was actually pretty amazing. They isolated it outside of the New York region, down in Boca Raton, Florida, and they said that this team is gonna use off the shelf parts to design an IBM PC. And so in 1981, they built and launched the IBM PC, and they actually licensed MS-DOS from Microsoft, founded six years earlier, as the core operating system. They thought that hardware was the business — time would show them to be incorrect — and a bunch of IBM compatible devices, including Compaq and others, flooded the market through the '80s. And so prices fall, hardware becomes commodity, and Microsoft with DOS and then Windows becomes the Wintel duopoly that ultimately takes the mindshare and market from IBM.
太神奇了。Lotus 1-2-3统治电子表格市场,WordPerfect主导文字处理——我记得就用过它。然后微软出现,宣布要把这些都整合进Office套件,真正做到了后来居上。
It's amazing. You got Lotus 1-2-3 dominating spreadsheets. You've got WordPerfect — I remember using that for word processing. And then Microsoft comes along and says, I'm gonna bundle all this into Office, and really takes the cake.
于是生产力突飞猛进,我们个人能完成的事远超从前。我们的协作方式?就是那张神奇的软盘。通过来回传递软盘,工作得以持续推进。
And so productivity's through the roof, we're able to do more personally than we could ever do before. And the way we collaborate, it's that magic floppy disk. We're able to hand that back and forth and keep on rolling.
这很棒对吧?直到它不再棒为止。你想和大楼另一头的人协作?得跑过去递软盘,要是对方用的软件版本不同怎么办?
And that's great, right? Until it's not. You wanna work with someone across a big building? You're gonna run them a floppy disk, and what if they have a different version?
这简直一团糟。我认为个人与个人计算开始成为阻碍,因此我们需要将这些设备也连接起来。于是出现了以太网和IBM令牌环技术,它们铺设楼层线路,将这些PC连接成企业网络。随之涌现了一批新公司,比如Novell开发了NetWare,它实际上将服务器转变为共享磁盘和打印机的枢纽。
It's a mess. I think the "personal" in personal computing started to become a hindrance here, and so we needed to connect these devices as well. Enter the invention of Ethernet and IBM's Token Ring, which wire the floors and connect these PCs into a corporate network. And you have a new set of companies. You have Novell building NetWare, which actually turns a server into a hub for shared disks and printers.
你桌上的PC可能需要连接打印室进行打印,这时就需要一台专用计算机——我们称之为服务器,它连接着打印机或共享存储设备。现在我们能通过网络访问文件、远程打印了。Novell这家公司曾一度成为仅次于微软的第二大软件制造商,在这个时代蓬勃发展。微软则凭借其优势迅速响应,开发了网络操作系统,最终推出Windows NT等服务,让你能在微软生态系统中完成所有操作。
You have a PC on your desk, and there's maybe a printer room you want to print to. You'd have a different computer, a specialized PC that we're going to call a server, that's connected to that printer or connected to the shared storage. And now we're able to access files over a network. We're able to print over a network. And a whole company, Novell, that at some point in time was actually the second largest software manufacturer after Microsoft, was booming in this era. Now, Microsoft does what they do very well and responds, building the network connected operating system, eventually Windows NT and other services, so that you can do it all within Microsoft's ecosystem.
在我们多次对话中,许多行业元老都将这个时期视为客户端-服务器时代的开端。这具体意味着什么?
So throughout our conversations, a lot of the OGs in this industry pointed to this moment as the introduction of the client server era. What what does that mean?
这意味着应用程序被真正一分为二。你桌上的电脑作为客户端运行,访问后台的中央数据库或服务器。支撑这一模式的技术层出不穷,从Unix服务器到ERP系统再到Windows NT。但现在你会从设计阶段就将应用视为网络化产物——本地客户端软件能充分利用PC性能,连接的不是全球服务器,而是机房里的本地服务器。
It means an application is really split in two. You had your computer sitting there on your desk running as the client accessing a central database or a server in the backroom. And a bunch of technologies came up to support this, from Unix servers to ERP systems to Windows NT, but now all of a sudden you would think about an application as networked from design, where you'd have this local client software that could be using the best capabilities of the local PC connected to the server, not across the world, but the server in the server room on-site.
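A minimal in-process sketch of that split — the server owns the shared state, the client does the local work. In reality the link between them would be a LAN connection, and all of these class and method names are invented for illustration:

```python
class FileServer:
    """The box in the back room: owns the shared files."""
    def __init__(self):
        self.files = {}

    def put(self, name, data):
        self.files[name] = data

    def get(self, name):
        return self.files[name]

class Client:
    """The PC on a desk: talks to the server, formats locally."""
    def __init__(self, server):
        self.server = server            # stands in for the LAN connection

    def share(self, name, text):
        self.server.put(name, text)

    def open(self, name):
        # local processing happens on the client's own CPU
        return self.server.get(name).upper()

server = FileServer()
alice, bob = Client(server), Client(server)
alice.share("budget", "q3 numbers")
print(bob.open("budget"))  # → Q3 NUMBERS
```

The design point: shared data lives in one place, while each client spends its own cycles on presentation — no more floppy disks walking down the hall.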
走廊那头、隔壁房间或是地球另一端的服务器又是什么?
What is that server down the hall or in the other room or across the world?
我认为最显著的特点是,这个时代的服务器开始采用个人电脑架构——即基于x86英特尔架构或Sun工作站架构。这些架构逐渐成为运行应用程序的服务器标准,这标志着从IBM大型机时代的重大转变。发展到后来,软件商会直接销售软硬件一体的'设备',导致IT部门需要管理各种功能单一、搭载定制软件和操作系统的专用设备。虽然功能强大,但效率极其低下。
Most notably, this was the era when you take the architecture of the personal computer — the x86 Intel based architecture, or the Sun workstation architecture — and increasingly those became the servers that could run these applications. This was a notable shift, again, from the IBM mainframe era. And it got to the point where someone selling your company an application would sell you an appliance, which was really a package of the software and the hardware together. And so you ended up with this sprawl of different appliances that IT is managing, all serving different functions, all written with their own bespoke software, their own operating systems. And so while the functionality was amazing, people got very inefficient.
在此之前,数据处理专业人员奉行着以节约为核心价值的职业伦理——浪费一个CPU周期或一字节内存都是可耻的行为。
Up until this point, there was a culture and an ethos amongst data processing professionals that put conservation as the highest ethic. To waste a CPU cycle or a byte of memory was embarrassing.
我认为部分原因是成本已降至足够低廉。这些服务器的租用价格已大幅下降。同样的服务器在十年前或十五年前每月租金可能高达一万美元,而现在让它们闲置在那里已非不合理之事。与此同时,从宏观角度看,这种低效后来催生了分时系统向虚拟化技术的演进空间与价值——我们会想,为何不能将这些应用集中运行在一台主机上?为何每套系统都需要专属主机待命?
I think part of it was that it had gotten cheap enough. These same servers would have been $10,000 a month to rent just ten, fifteen years prior, and now it wasn't unreasonable to have them sitting there idle. At the same time, in a macro sense, this inefficiency later creates the space and value for the evolution of time sharing into virtualization, where we say, why can't we run these applications on one box? Why do they all need their own box sitting there waiting for a command?
没错。如果单台主机未被单一客户充分利用,那就共享它。
That's right. If the box is not being maximally utilized by a single client, share the box.
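The utilization argument can be made concrete with a toy consolidation calculation — a greedy first-fit packing with hypothetical load numbers, just to show why racks of idle boxes invite virtualization:

```python
def consolidate(loads, capacity=1.0):
    """Greedy first-fit packing: place each app's average load on the
    first server with room, instead of giving every app its own box.
    `loads` is a list of (app_name, fraction_of_one_server) pairs."""
    servers = []
    for app, load in loads:
        for s in servers:
            if s["used"] + load <= capacity:
                s["used"] += load
                s["apps"].append(app)
                break
        else:  # no existing server has room: provision a new one
            servers.append({"used": load, "apps": [app]})
    return servers

# Ten apps, each averaging 15% of a server: ten dedicated boxes, or two shared.
apps = [(f"app{i}", 0.15) for i in range(10)]
print(len(consolidate(apps)))  # → 2
```

Real hypervisors schedule dynamically rather than packing static averages, but the economics — the same work on a fifth of the hardware — is the same story as time sharing, one level up.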
我们将深入探讨这一演变过程,但此刻仍处于深度创新的时代,尤其在冗余技术方面。当时有RAID阵列作为冗余备份存储,还有异地磁带库,因为仍存在重大单点故障风险。企业的服务器——可能是客户记录系统——就放在机房柜子里。我听REI某位IT主管说过,他们总部附近曾遭遇洪水,当时反应是'网站要瘫痪了,我们将损失大量业务'。
We will do a whole focus on how that comes to be, but this is still an era of deep innovation, particularly on redundancy. You have RAID arrays, which are redundant disks for backup, and tape libraries that are off-site, because you still have these massive points of failure. Your company's server, which might be your system of record for your customers, is sitting there in the closet. I heard from someone who ran IT at REI that at one point they had a flood near their headquarters, and they're like, well, the website's gonna go down, we're gonna lose all this business.
因此这种架构本质上缺乏规模化和冗余设计。为了让各位更直观感受当时计算能力和连接方式如何开始转变,我们来聊聊一家极具创新精神的零售企业——不,不是亚马逊。我们要说的是零售规模化运营的鼻祖:沃尔玛。没错。
And so this model was not designed for scale and redundancy in a major way. So to make this moment concrete of just how computing power and connectivity was starting to shift, let's talk about a deeply innovative retail company that, no, is not Amazon. We're talking about the OGs of major retail and scale. It's Walmart. That's right.
想象你站在社区沃尔玛的收银台前。扫描牙膏条码时'滴'声响起,数秒内店铺后方的卫星天线就会将交易数据传向天空,直达阿肯色州本顿维尔的总部主机记录销售。几分钟后,庞大的数据仓库会更新你刚购买的牙膏数量。或许当天结束前,宝洁工厂就会收到需要
And so you're at the register at your neighborhood Walmart location. You scan the toothpaste, the barcode beeps. Within seconds, a satellite dish behind the store sends that transaction to the sky and over to Bentonville, Arkansas, where their mainframe computer records the sale. A few minutes later, a massive data warehouse would update how many tubes of toothpaste you had just bought. And then perhaps before the end of the day, Procter and Gamble's factory would receive an update that they needed to
增产
make more
牙膏的通知。这就是1980年代末零售业的最前沿技术。沃尔玛做出重大决策推动创新,斥资2400万美元建设私有卫星网络,将所有门店与总部相连。这在当时确实堪称创举,对吧?
toothpaste. This is the cutting edge of retail in the late 1980s, and Walmart makes this massive decision to take this a step further and really drives innovation across the retail industry. They invested $24 million to build their own private satellite network, linking all Walmart stores to headquarters. This was really fairly unprecedented at the time, right?
这是当时已建成的最大规模私人卫星网络。
It is the largest private satellite network that had been built.
它创造了一个统一的实时机器。任何发生在沃尔玛生态系统内的事件都通过这个私人网络传输,使他们能够以前所未有的方式挖掘数据。通过实时收集所有沃尔玛门店的销售数据,他们发现飓风来临时,果酱馅饼的销量会激增至平时的七倍。在阿肯色州本顿维尔工作过一个夏天的我可以告诉你,这已深植于他们的基因中。他们持续关注着这类信号。
And it created a unified real time machine. And so any single event that happened within the Walmart ecosystem rides on this private network, and it enables them to mine their data in a way that was unheard of before. By mining their sales data, which now could be collected in real time across all Walmart stores over their private satellite network, they discovered that when hurricanes approached, the sale of Pop-Tarts increased seven x over the normal rate. And having spent a summer working in Bentonville, Arkansas, I can tell you this is deep in their DNA. They are constantly looking at signals like this.
具体而言,他们发现风暴来临前最受欢迎的是草莓味果酱馅饼,这一传奇洞察促使气象学家预测极端天气时,沃尔玛会在受影响门店大量备货草莓味果酱馅饼。这不禁让人猜想——
Specifically, they discovered that it was the strawberry Pop Tart that was most in demand before a storm, which led to the legendary insight of meteorologists predicting a severe weather event and Walmart stocking their affected stores with pallets of strawberry Pop Tarts. It makes you wonder if
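The kind of query behind that insight is simple once all the sales land in one warehouse. A sketch with invented numbers — nothing here is Walmart's actual data or method, just the shape of the uplift calculation:

```python
def uplift(sales, flag):
    """Average units sold on flagged days (e.g. pre-hurricane)
    versus normal days. `sales` and `flag` are parallel lists."""
    hit = [s for s, f in zip(sales, flag) if f]
    base = [s for s, f in zip(sales, flag) if not f]
    return (sum(hit) / len(hit)) / (sum(base) / len(base))

# Hypothetical daily Pop-Tart units at one store; one pre-hurricane day.
daily_units = [100, 110, 90, 700, 100]
pre_hurricane = [False, False, False, True, False]
print(uplift(daily_units, pre_hurricane))  # → 7.0
```

The hard part in 1992 wasn't this arithmetic — it was getting every store's register data into one queryable place fast enough to act on it.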
反过来则不可能实现。就像与其查看天气预报,不如直接去沃尔玛看草莓味果酱馅饼是否上架,因为这个系统如此可靠。我知道他们在监测天气。但这里存在疯狂的创新——既有卫星链路,据说沃尔玛还通过与Teradata公司合作,拥有首个商用1TB企业级数据仓库。
the reverse would be impossible. It's like instead of checking the weather, you go to Walmart to see if the strawberry Pop-Tarts are there, because it's such a reliable system. I know that they're watching the weather. But there's some crazy innovation here: both the satellite link, and then Walmart was supposedly the owner of the first commercial one terabyte enterprise data warehouse, which they built with a company called Teradata.
他们近乎偏执地确保所有销售和客户数据汇集一处。到2001年,仅九年后,这个1TB的仓库已扩容至70TB。哇,这完全是企业内部连接性的大爆发。虽然个人电脑已出现,有些人会在家里玩电脑——我记得小时候用IBM XT玩早期编程和游戏——
They were just maniacal about making sure all the sales and customer data flowed into one place. By 2001, just nine years later, that one terabyte warehouse had grown to 70 terabytes. Wow. This is just an explosion of connectivity inside the enterprise. While there was the personal computer, and some people had fun with their computers at home — I remember playing on an IBM XT as a kid, some early programming, early games —
但大多数人并没有真正的个人使用场景。我们仍处在一个以企业为中心的世界。我们讨论了个人电脑的兴起、企业内部不断增长的客户端-服务器局域网,这些数据中心,以及沃尔玛的卫星网络。
It was not like most people had the real personal use case. We're still talking about a corporate centered world. We talked about the rise of PCs and the growing client server LAN inside of a company. We talked about these data centers. We talked about Walmart's satellite network.
但进入90年代初,在整个ARPANET和NSFNET互联网发展的背景下,办公室之外正在发生什么?
But what's happening now going into the early nineties in this whole ARPANET, NSFNET Internet thing? What's happening outside of the office?
是的。或许还有地底下正在发生的事。
Yeah. And perhaps what's happening under the ground.
因此,第一个向所有研究人员开放的网络演变成了NSFNET,并成为美国互联网事实上的骨干网。
So the first network that was available to all these researchers evolved into the NSFNET, and it became the de facto US Internet backbone.
1986年时,这个网络连接了2000台计算机,到1993年已扩展至超过200万台。
This net was connecting 2,000 computers in 1986 and expanded to over 2,000,000 by 1993.
此时不再只有研究人员想使用它,其设计与拓扑结构已发展为高速国家骨干网,采用T3线路(传输速率达45兆比特/秒)。它连接少量区域网络,形成类似中心辐射的模型。这些区域网络再连接大学、实验室和非营利机构,但该网络并非为商业规模设计。
And it was no longer just researchers who wanted to do stuff with it, and so the design and topology was this high speed national backbone that had at this point gone to a T3 line, which is at 45 megabits per second. They connected a small number of regional networks, so kind of like a central hub and spoke model. Then these regional networks would connect to universities, labs, nonprofits, but this was not designed for commercial scale.
实际上,根据使用条款,该骨干网明确禁止商业流量,它专为研究、教育和政府数据传输而设计。
In fact, it was prohibited, according to their terms of use, to have commercial traffic running on that backbone. It was specifically designed for research, education, and government data.
因此我认为,私营商业网络可能仅存在于某个区域枢纽内部。但无法跨越整个骨干网。那时尚未出现商业化互联网的雏形。如果纽约和费城两个区域想互联,必须通过NSFNET骨干网中转,不存在中立连接点供其直接对接交换。
And so I think some private commercial network could happen inside of just like a regional hub Mhmm. But not across that whole big backbone. So you did not have the beginnings of what could be a commercial Internet. If two regionals, New York and Philadelphia, wanted to connect, they had to flow back up to that NSF Net backbone. There was no neutral point where they could connect and exchange.
就在这个时期,商业互联网服务提供商和电信公司开始意识到:这个基于分组交换的互联网很有意思,人们想用它做新尝试,或许它正试图走出实验室进入商用领域。我们该如何接入?又该做些什么?
And so this is the moment when commercial ISPs and telcos started to see, okay, this internet thing is interesting, this packet switched model, people want to do new things with this, maybe it's trying to get out of the lab into commercial use. How are we going to connect? What are we going to do?
1992年,一群网络服务提供商坐在弗吉尼亚喝着啤酒,决定在泰森角之外将他们的网络连接起来。这群工程师来自大都会光纤系统公司,即当地电话公司。他们选择华盛顿特区外的泰森角,因为那里有密集的国防承包商网络和早期互联网重度用户的服务商。他们在一个改造的停车场里建立了这个枢纽,后来成为新互联网服务提供商事实上的接入点。说到重要枢纽,人们通常不会想到停车场,但在正确的时间出现在正确的地点,这里最终演变成了所谓的东海岸大都会区域交换中心。
In 1992, a group of network providers were sitting in Virginia drinking a beer, and they decided to connect their networks outside of Tysons Corner. Now this specific group of engineers were from Metropolitan Fiber Systems, the local telco. They chose Tysons Corner outside of Washington, DC, because you had a dense network of defense contractors and early providers which were heavy users of the current Internet. And they famously set up in a repurposed parking garage to become the de facto on ramp for new ISPs. Now, when you think about an important hub, you wouldn't typically think of a parking garage, but it was the right place at the right time, and it turned into what's called the Metropolitan Area Exchange East, or MAE East.
于是MAE East应运而生。如果你接入了MAE East,就意味着互联网近在咫尺。
So MAE East is what formed. And if you connected into MAE East, that meant that you had the Internet at your doorstep.
这里成为了互联网的枢纽。如果有人从伦敦向巴黎发送电子邮件,邮件很可能会经过MAE East。
This became the hub for the Internet. If someone sent an email from London to Paris, it most likely went through MAE East.
短短几年内,全球约一半的互联网数据包都流经MAE East的停车场。
Within a couple years, you had roughly half the world's Internet packets flowing across the MAE East parking garage.
我认为这种现象会反复出现——互联网总是围绕枢纽形成。除了'它最先出现'这个原因外,特定地点成为枢纽的其他理由往往并不明显,连接性就像有引力一般。这种现象对互联网并不新鲜,我们在电信行业也见过,有个概念叫'运营商酒店'。
And I think this is something we'll see again and again, which is that the Internet forms just around hubs. And it's not always obvious why that became the hub specifically other than, like, it did first, and there's just this gravitational pull of connectivity. And this wasn't new to the Internet. This was something that we saw with telcos. There's this concept called carrier hotels.
比如在纽约这样的城市,两家不同运营商想要互联。想象一下Sprint和AT&T。他们不必在全市多个地点建立连接,而是会聚集在一个称为运营商酒店的中立区域,在那里建设互联基础设施。
You're in a city like New York and you had two different carriers that are trying to connect with each other. Right? Think Sprint and AT and T. Instead of having to connect all throughout the city in multiple spots, they would all show up in a neutral zone called a carrier hotel and build their connectivity infrastructure there.
这样他们就能接入彼此的长途线路和本地光纤线路,而无需各自单独建设城际网络。
You'd be able to tap into each other's long haul routes, their local fiber routes, without having to build their own intercity footprints, each of them individually.
于是,梅伊东部(May East)成为这种模式的首次尝试,所有参与者都接入并连接到一个设备——想象成一个交换机,它负责连接。好,你接入你的ISP流量,我接入我的,现在我们的ISP用户就能互相通信了,这就像个巨大的交换机房。问题是互联网在扩展,你未必希望所有人都挤在一个交换机上。这不够安全。而且想想看,如果你要——比方说——最终建立一个视频流媒体服务,或者两家ISP达成协议,你肯定不希望其他所有人都参与这个协议。
And so MAE East was the first sort of flavor of this, where they would all come in and connect to a device, think a switch, that is connecting: okay, you're coming in, plugging your ISP traffic in here, I'm plugging in mine, now users across our ISPs can connect, and it's just one big switch room. The problem is the Internet is scaling, and you don't necessarily want to all be bottlenecked on one switch. It's not the most secure thing. And you know what? If you're, I don't know, eventually building a video streaming service or you make a deal between two ISPs, you don't necessarily want everyone else to be in on that deal.
于是你看到这个模式演变成了另一种被称为'MeetMe房间'的模式。这些是中立的物理空间,ISP或想接入ISP的人可以放置自己的设备和连接设施,双方就能直接互联。
And so you saw the evolution of this model to a different model that became known as MeetMe rooms. And these are neutral physical rooms where an ISP or someone trying to hook into an ISP can provide their boxes and their connectivity, and then those two can connect together directly.
所以这是'自带设备'的方法。不同于所有人都通过现有设备连接,你在私人MeetMe房间里自带设备来完成特定交易。
So it's a bring-your-own-box method. Instead of everyone connecting through the existing box, you BYOB your box for the deal that you wanna do in the private meet-me room.
没错。'带着设备去私人MeetMe房间'听起来有点滑稽,但我们讨论的是ISP之间的数据连接。
That's right. Something sounds funny about BYOB to the private meet me room, but we're talking about ISPs connecting for data.
这大概是最柏拉图式的对话了。
About as far from a non-platonic conversation as you could get.
果然,这种方式成功了。美国国家科学基金会(NSF)正式认可这种方法,并指定了这些网络接入点(NAP)。第一个是MAE East,他们指定Sprint在新泽西州靠近跨大西洋电缆登陆点的地方运营NAP,另外在芝加哥和旧金山各设一个,后来又在圣何塞增设了MAE West。
And so sure enough, this all worked, and the NSF kind of officially sanctioned this method and designated these NAPs, network access points. The first one being MAE East, they designated Sprint to run a NAP in New Jersey near the transatlantic cable landing points, one in Chicago and one in San Francisco. Then they eventually added MAE West in San Jose.
这些MeetMe房间模式开始流行,即运营商酒店。比如洛杉矶西区的一幢大楼——威尔希尔大厦,原本是律师事务所,后来腾出整层空间容纳数百台运营商路由器和数千条交叉连接,最终使其成为整个西海岸每平方英尺价值最高的空间之一。
And these meet-me room models started to take off, the carrier hotels. You had One Wilshire, a large building on the west side of Los Angeles, which was law offices and gave way to a single floor that could host hundreds of carrier routers and thousands of cross connects, eventually making it one of the most valuable spaces per square foot on the entire West Coast.
真有趣。这栋丑陋或相对不起眼的建筑,让人不禁想问:里面到底在干什么?律师事务所还能理解,但现在这里成了西海岸互联网的枢纽。
It's so funny. It's like this ugly building or just relatively nondescript architecture. And you're like, what's going on in there? Law offices make sense, and now this is where the West Coast Internet is coming through.
没错。我们稍后会讨论海底电缆,它们接入后都希望找到通往威尔希尔1号大厦的最短路径。
That's right. And we'll talk a little bit later about undersea cables, but they come in and wanna find their shortest path to One Wilshire.
这种商业模式相当天才。这些电信酒店、MeetMe机房最终提供电力、冷却和交叉连接服务,并收取租金。即将到来的互联网泡沫将把这种模式推向新高度。
And the business model was pretty genius for this. These telco hotels, these MeetMe rooms ultimately provided power, cooling, and cross connect, and they would charge rent. And the .com boom to come would boost this model to new heights.
它提供了弹性基础设施,互联网公司(包括我们即将谈到的超大规模运营商)在互联网泡沫期间及之后都能利用。你可以灵活增减容量,因为它们专门提供托管连接所需的所有基础设施。
And it provided an elastic infrastructure that Internet companies in the .com boom and after could leverage, including, as we'll see soon, the hyperscalers, where you could flexibly increase and decrease your capacity because they were specialized in providing all the necessary infrastructure to host the connectivity.
所有这些基础设施的建立都是为了商业化互联网,提供可扩展的骨干网,让ISP、电信公司等私营企业投资提升网速。现在我们有了万维网。美国国家科学基金会资助了一个叫Mosaic的小项目,这是第一款用户友好的网页浏览器。个人电脑普及。进入九十年代中期,互联网泡沫开始了。
So all of this infrastructure being set up to commercialize the Internet, to provide a scalable backbone, to enable the private market of ISPs and telcos and others to invest in making the Internet faster. And we now have the worldwide web. The NSF has actually funded a little project called Mosaic, which is the first kind of user friendly web browser. We have personal computing. Enter the mid nineties and the .com boom.
让我们繁荣起来吧。
Let's boom.
互联网服务提供商已准备好建设网络。我们只需要用户了。
The ISPs are ready to build the network. All we need is the users.
他们确实来了。1995年5月,比尔·盖茨写下了著名的互联网浪潮备忘录。互联网是自IBM PC以来最重要的技术发展,同年网景公司上市。本,为什么网景如此重要?
And boy, are they coming. May 1995, Bill Gates writes the famous Internet tidal wave memo. The Internet is the single most important development to come along since the IBM PC, and it's that same year that Netscape went public. Why was Netscape so significant, Ben?
重要原因有二。一是它开启了互联网的普及之路,将互联网从研究者共享文件的网络转变为可下载安装的浏览器,让人们能访问网络、打开所有内容。从产品角度看如此,从经济视角也引发了人们的兴趣:这里是否存在新机遇?能否在如此短的周期内从创立到IPO赚取数百万乃至数十亿,从而掀起一场狂热?
I think it's significant for two reasons. One is it kicked off the accessibility of the Internet. It turned the Internet from this network for researchers to share files into this browser that you could download, install, and use to access the web, and opened all of that up. That was the product perspective, but it also captured the economic perspective and the interest: is there a thing here? Is there a new industry, a boom where you can make your millions or billions from founding to IPO in such a short cycle? It kicked off a mania.
它在商界掀起了另一场狂热——几乎没有收入或利润也能达到数十亿美元的估值,网景做到了。网络随之爆发:1995年仅有23,000个网站,几年后就超过1000万个,全球用户突破3.5亿。纳斯达克指数两年内翻了三倍。
It kicked off another mania in the business world, which was that with barely any revenue or barely any profits, you could hit a multibillion dollar valuation, which is what they did. And the web exploded. You had 23,000 websites in 1995 grow to over 10,000,000 within a few years, and global users climbed to over 350 million. The NASDAQ tripled in two years.
那时只要在公司名后加上'.com'——就像现在加'AI'一样——仅凭幻灯片上的创意就能融资数百万。仅1999年就有超400家互联网公司上市,募资400亿美元。那样的IPO流动性如今只能梦想。
If you put a .com at the end of your name, just kind of like you put an AI at the end of your name today, you could raise millions on an idea in a slide deck. And in 1999 alone, you had more than 400 Internet companies going public, pulling in $40,000,000,000. What a liquid IPO market that we could only dream of today.
微软股价创下历史新高,整个市场充满狂热。二十多岁的创始人一夜成为账面百万富翁,工程师为股票期权频繁跳槽。
Microsoft hits all time highs in the stock market. The energy was manic. Right? Founders in their twenties would become paper millionaires overnight. Engineers are hopping jobs for stock options.
这是Aeron座椅、桌上足球和新经济崛起的时代,传统商业规则(比如营收)不再适用的时代。
This is the rise of the Aeron chair and the foosball tables and the new economy, where the rules of, you know, things like revenue no longer apply to business.
但这不仅是应用层,基础设施同样疯狂。运营商在光纤和无线网络投入五千亿美元,托管公司飞速扩张。当时全球最大主机服务商Exodus Communications就提供服务器托管服务。
But it wasn't just applications. It was infrastructure as well. Carriers spent half a trillion dollars on fiber and wireless. You had these colocation companies expanding at breakneck speed. This company called Exodus Communications was the world's largest web hosting provider at the time, providing server colocation.
其收入从1997年的1200万美元增长到两年后的2.5亿美元,并在三年后达到320亿美元市值的顶峰。它正在铺设基础设施并在此基础上构建应用程序,经历了一段前所未有的增长期。
And its revenue went from 12,000,000 in 1997 to 250,000,000 two years later, and it peaked at a $32,000,000,000 market cap three years after that. It was laying the infrastructure in the ground and building the applications above it in a period of unprecedented growth.
因此要创办一家初创公司,你必须建立网站、搭建服务、构建数据库。你需要提前筹集大量资金购买服务器,安置在托管机房里。
And so to launch a startup, you had to build a site, you had to build a service, you had to build a database. You needed money way ahead of time to buy the servers, stick in the colo box.
等等。你是说如果我要开展网络业务,就必须先购买硬件设备?
Wait. So you're saying in order for me to launch a web business, I had to buy hardware?
没错。你不得不将大部分风投资金花在服务器上,甚至在你还不确定是否有人会访问你的网站之前。
Exactly. You had to take most of your venture capital dollars and spend it on servers even before you knew if anyone wanted to go to your website to begin with.
所以你无法验证你的想法。无法进行A/B测试。无法制作一个吸引用户加入等待列表的着陆页。
So you couldn't test your idea out. You couldn't a b test. You couldn't do a landing page that drew in a wait list.
这些全都做不到。
Couldn't do any of that.
哇。真是个截然不同的世界。
Wow. What a different world.
所有这些新服务器都试图运行在支撑这次繁荣的新铺设的骨干网络上,而此时的流量仍仅通过少数几个公共交换点交汇。网络通过我们刚提到的梅东海等地的共享设备箱连接,但这种模式的扩展能力终究有限。
So all these new servers are trying to run on the new backbones that are being laid to power this boom, and the traffic still, at this point, met at only a handful of public exchange points. Networks are plugged into these shared boxes at places like MAE East that we just talked about, and that can only scale for so long.
你会开始遇到瓶颈。而梅东海作为最早的网络接入点之一,成为了主要瓶颈之一。
You'd start to hit choke points. And MAE East, one of the earliest network access points, became one of those major choke points.
我们需要不同的模式让企业、ISP和非ISP能够互联。DEC在帕洛阿尔托启动了帕洛阿尔托互联网交换中心——一个不属于电信公司的中立节点,随后Equinix成立,他们在弗吉尼亚州阿什本建立了首个站点,将这个模式规模化。
We needed different models for companies, ISPs, non-ISPs to be able to connect. You had DEC kicking it off with the Palo Alto Internet Exchange, a neutral spot not owned by a telco. Then you had the founding of Equinix, where they took that model and scaled it with their first site in Ashburn, Virginia.
阿什本是数据中心的华尔街。1999年,Equinix在弗吉尼亚州阿什本、紧邻瓶颈点MAE East(最初的网络接入点)处,按新模式启用了首个数据中心。
Ashburn is the Wall Street for data centers. In 1999, Equinix launches their first data center under this new model in Ashburn, Virginia, right next to MAE East, the choke point, but the original network access point.
这地方是不是靠近华盛顿特区?我从没去过。你去过劳登县吗?
Isn't this near, like, DC? I've never been there. Have you been to Loudoun?
我在河对岸的华盛顿特区郊区长大。劳登县位于弗吉尼亚州一侧,是杜勒斯机场外的乡村农田。那里原本荒芜,但毗邻东海岸人口密集区、海底电缆和MAE East。
I grew up right outside of DC on the other side of the river. Loudoun County is on the Virginia side. It's rural farmland. It's past Dulles Airport. There's really nothing there, but it's proximal to a large East Coast population, the undersea cables, and MAE East.
实际上美国在线(AOL)在九十年代中期就选择劳登县建立了大型拨号上网基地。他们铺设了新光纤,吸引了更多运营商入驻。MAE East规模膨胀到超出原有停车场车库的容量,最终交换中心迁至阿什本。于是杜勒斯机场外这片不起眼的农田,除了AOL的开拓,还因其他几项因素实现了快速发展。
And you actually had AOL choose Loudoun County in the mid nineties to set up a huge dial-up campus. And so they laid fresh fiber and drew even more carriers into the region. And MAE East had grown so large, it outgrew the parking garage that it was in. And so the exchange relocated to Ashburn, and so you've got this unassuming farmland outside of Dulles Airport. And it was really catalyzed by a couple things outside of AOL pioneering the new site.
这是政策主导的结果。劳登县规定数据中心可被视为普通办公园区,直接取消了所有特殊用途听证会,并长期提供惊人的税收优惠以吸引数据中心入驻。与此同时,多米尼克能源公司预见到这一趋势,以最低的工业电价将高压线路架设到这片空地,将电力与光纤结合,开创了Equinix在阿什本率先实践的新型数据中心模式,使之成为地球上最大的互联网枢纽。
It was policy led. So Loudoun County ruled that data centers could be treated like ordinary office parks. They just eliminated all these special use hearings and provided incredible tax breaks over time to attract data centers into this network. Coupled with that, you have Dominion Energy, which kind of saw what was coming ahead and offered some of the lowest industrial rates to string high voltage lines to this empty land, bringing power and fiber together to create the new data center model, which Equinix pioneered in Ashburn and became the largest Internet hub on planet Earth.
因此这是土地(虽然现在已不那么重要)、光纤与连接性、廉价易得的电力以及有利政策的组合。我听过一个有趣的政策故事:苹果在2000年代末(2009年左右)选址新站点时,由于需要扩展服务、增加存储和建设更多数据中心,经过评估发现北卡罗来纳州提供了更优惠条件。弗吉尼亚州随即反击,通过重大税收减免政策——规定只要投资超过1.5亿美元、雇佣50人以上的数据中心建设项目即可免税。
So it's the combination of, perhaps for a moment, land, not so much anymore, fiber and connectivity, cheap and accessible power, and favorable policy. One interesting policy story that I heard on this: Apple was looking for where to put a new site in the late 2000s, around 2009. Apple's obviously building more services, needs more storage, building more data centers. So they run a process, and it turns out that North Carolina gives them a better deal. So Virginia fights back and in reaction passes major tax breaks to say that if you're building a data center, basically, if you're making more than $150,000,000 of investment and you're gonna employ more than 50 people, no tax.
建设数据中心当然会满足这些条件。
Which you, of course, are if you're building a data center.
当然。是啊,1.5亿美元根本建不起数据中心。
Of course. Yeah. You can't build a data center for $150,000,000.
正如你所说,地方政策至关重要。实际上联邦政策也起了重要作用——1996年通过的《电信法案》强制要求从AT&T分拆出来的区域性垄断电信公司,必须将其物理网络(铜线对和光纤)租赁给竞争对手。在此之前,数据中心都是绑定单一运营商的私营企业,因为运营商只使用自己的光纤。
Local policy, as you mentioned, is so important. Federal policy actually played a big role here. In 1996, we passed the Telecommunications Act. And one of the main things it did was force the incumbent telcos, which were regional monopolies from the AT&T breakup, to lease their physical network, their copper pairs, their fiber, to their competitors. And so prior to this, a data center was a private enterprise tied to a single carrier because the carriers only used their own fiber.
这项法案促成了运营商中立站点的出现,因为数据中心租户可以选择接入楼内的多种光纤。这为租户和运营商中立模式打开了爆炸式的选择空间。
Now it enabled carrier neutral sites, since you could become a tenant of a data center and choose from multiple fibers provided into that building. And it opened up an explosion of choice for tenants and for this carrier neutral model.
因此这是一个放松管制的良性循环:税收激励政策一旦实施,就会成为该地区经济繁荣的主要来源。现在我们快进到今天的故事,可能会遇到一些真正的阻力。
And so this is a flywheel of a deregulation environment to build this because once they have the tax incentives in, it becomes a major source of economic prosperity for the region. Now we'll fast forward later on in the story to today and maybe hitting some of the first real pushback.
与法兰克福、阿姆斯特丹、伦敦、东京类似,最大的枢纽就是你能找到最多光缆和运营商、以最小摩擦连接最多网络的地方。正是这种良性循环使阿什本持续成为美国最大的数据中心聚集地。
And similar to Frankfurt, Amsterdam, London, Tokyo, the biggest hubs are where you can find the cables and carriers and connect the most networks with the least amount of friction possible. And it's this flywheel that continues to make Ashburn the largest home of data centers in the US.
那是怎样的时代啊。1995年前六个月,互联网流量每百天就翻一番。电信公司坚信网络建设永远不会过剩,只要铺设尽可能多的光纤就够了。
And what a time. In the first six months of 1995, Internet traffic was doubling every hundred days. The telcos were convinced that you couldn't overbuild. You just needed all the fiber you could put down.
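Whatever the exact figure, "doubling every hundred days" compounds startlingly fast; a one-liner makes the point:

```python
# If traffic doubles every 100 days, how much does it grow in one year?
annual_multiple = 2 ** (365 / 100)
print(f"~{annual_multiple:.1f}x per year")  # ~12.6x per year
```

At that pace, traffic grows more than a hundredfold in two years, which is the mindset the telcos were operating under.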
世通和其他电信公司,他们投入了数十亿美元。过度建设的概念?在互联网繁荣期根本不存在,对吧?但到了2000年代初,实际使用的
WorldCom and these other telcos, they just poured billions into this. And the idea of overbuilding? Not possible, particularly in this Internet boom. Right? But by the early two thousands, how much
已铺设光纤有多少?只有约3%被启用。3%。他们只是铺设了光纤,那些玻璃细丝就那样空置着。这么说吧,
of that installed fiber was actually being used? Only about 3% of it was lit. 3%. So they just laid down fiber, strands of glass that are just sitting there dark. So to give you a
这个时期铺设的光纤长度,足以绕地球赤道5000圈。
sense of the amount of fiber miles laid during this period, you could go around the circumference of the Earth 5,000 times.
哇。所以这些都是陆地上的。对吧?但怎么连接欧洲呢?
Wow. And so this was all over land. Right? Like but how would you connect to Europe?
你说得对,这种过度建设不仅限于陆地。几乎所有洲际互联网流量都通过海底光缆传输——我略有耳闻,但从未真正意识到:这些玻璃纤维束被缠绕成花园软管结构,铺设在全世界海床上。想象一下,整个星球被发丝般纤细的玻璃丝串联,包裹在铠装软管中,横跨最黑暗的深海区域。这就是海底光缆系统,是连接各大洲的真实物理互联网。至今99%的国际数据仍通过这些光缆而非卫星传输——那是比人类头发还细的纤维中疾驰的光脉冲。
And that overbuild, you're right, was not just limited to land. Nearly all intercontinental Internet traffic rides on undersea cables, which I kind of heard of, but I didn't really have a full appreciation for the fact that bundles of glass threads are wound together into a garden hose structure and laid down on the ocean floor all over the world. And so imagine a planet stitched together with hair-thin strands of glass tucked into an armored hose and laid across the darkest parts of the ocean. That's the undersea cable system, and it's the real physical Internet that connects the continents. So 99% of international data still rides on these cables, not satellites, racing pulses of light through fibers thinner than a human hair.
这个故事可以说始于遥远的十九世纪五十年代,当时第一条横跨大西洋的电报电缆问世。它们通过蒸汽船运载,于1866年完成了首次跨洋连接,首次实现了新闻跨越大洋的即时传递。
The story arguably starts way back in the eighteen fifties, when we had the first telegraph cables crossing the Atlantic. They brought those across on steamships and landed the first link in 1866, able to send news for the first time across the ocean.
所以,新闻从需要数周变成了只需几分钟就能获取?
So instead of weeks, you could get the news in minutes?
没错。你不再通过船只传递新闻,而是通过电子传输。太神奇了。快进到1988年,我们铺设了第一条横跨大西洋的光纤电缆。
Yeah. You're not sending the news via ship. You're sending it via electrons. Amazing. And fast forward to 1988, and we land the first transatlantic fiber optic cable.
这条电缆连接美国、英国和法国,开启了跨大陆通信能力的新纪元。到了九十年代末,随着这股建设热潮,你可以想象有多少资金涌入这些基础设施的铺设。
It runs between the US, the UK, and France, and it kicks off this new era of cross-continental capacity. By the late nineties, as this boom was happening, you can imagine the funding rounds going into wiring all of this up.
你们用专业的电缆船铺设这些电缆,先勘测海床,然后释放电缆,近岸处将其埋入海底以防锚损和风暴破坏,最后通过不起眼的混凝土盒上岸。这种‘花园水管’般的电缆每隔50至100公里就配有光中继器来增强信号,使得数据能无损传输数千英里。上岸后,这些混凝土掩体里的电缆会接入陆地光纤,直通像威尔希尔大厦这样的枢纽,再连接到更远的网络。
They laid all of this cable down with specialized cable ships that surveyed the seabed and unspooled the cable, and then near shore, they buried it underneath to protect it from anchors and storms, and it rises up through some nondescript concrete box. And this garden hose has optical repeaters that boost the light every 50 to 100 kilometers, so the signal can sprint thousands of miles without fading. And then you get to land, and these concrete bunkers have the cable hop up into terrestrial fiber and then run straight to your One Wilshire or any of your nearby hubs that then connect you onward.
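To put that repeater spacing in perspective, here is a rough count for a transatlantic run; the ~6,500 km route length is our assumption, not a figure from the episode:

```python
# Repeaters every 50-100 km on a roughly 6,500 km transatlantic route:
route_km = 6500
low, high = route_km // 100, route_km // 50
print(f"roughly {low} to {high} repeaters")  # roughly 65 to 130 repeaters
```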
多么不可思议啊!一根形似花园水管的东西,承载着电影和所有数据。我们究竟在谈论什么?以2018年末的一条现代电缆为例,每秒能传输250太比特数据。形象地说,每秒可传输6000部高清电影,或承载全球约20%的互联网流量——全在这一根‘花园水管’里。
What a wild thing. You have a garden hose shaped thing moving movies and all this data. Like, what do we actually mean? Well, a modern cable, as an example, one from late 2018, can move 250 terabits per second. To conceptualize that, you can send 6,000 HD movies in one second, or about 20% of global Internet traffic can go in one garden hose.
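That movies-per-second claim is easy to sanity check; assuming an HD movie is about 5 GB (our assumption, not a figure from the episode):

```python
# 250 Tbps cable vs. ~5 GB HD movies: how many movies fit in one second?
cable_bps = 250e12       # 250 terabits per second
movie_bits = 5 * 8e9     # a 5 GB movie is about 40 gigabits
print(f"{cable_bps / movie_bits:,.0f} movies per second")  # 6,250 movies per second
```

Right in the ballpark of the 6,000 figure quoted above.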
这是如何实现的?最初使用光纤时,我们会用激光闪烁表示0和1。但后来我们从单一波长发展到所谓的波分复用技术。说白了就是利用彩虹原理——我们同时使用多个波长传输。
How is this working? Well, at the beginning of using fiber optics, we would shine a laser down the fiber and blink your zeros and ones. But we've moved from doing that with one wavelength to doing what's called wavelength division multiplexing. That's a fancy way of saying we use the rainbow. We're using multiple wavelengths at once.
通常同一根电缆可以同时传输约80至120种不同颜色的光信号。另一个改进是我们增加了更多光纤。过去只有一对光纤,现在增加到12至16对。最后一项技术叫做相干光学,这是一种调制光波振幅的方法。因此信号不再只是简单的开关状态,而是可以实现多级调制。
Usually it's around 80 to 120 different colors that can go down the same cable at the same time. The other thing is we've added more fibers. Instead of there being a pair of fibers, we now have up to 12 to 16 pairs of fibers. And then the last thing is called coherent optics, and this is a way of modulating the amplitude of the light. So instead of it just being on and off, you actually can have multiple steps.
所有这些技术进步使得如今单根光缆的最大传输容量达到约250至300太比特每秒。全球部署了数百条这样的光缆,最酷的是我们铺设的是玻璃光纤。我们不断提升玻璃的传输容量,因为玻璃本质不变。这种基础设施就像铁路一样具有韧性。
All of this adds up to today, probably 250 to 300 terabits per second of maximum capacity in one hose. There's hundreds of hoses around the world, and I think the other thing that's really cool about this is that the thing that was laid was the glass. We keep increasing the capacity of the glass, because the glass is the glass. Resilient infrastructure. It's kinda like railroads.
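Those three levers multiply together. A sketch with illustrative values (the per-wavelength channel rate here is our assumption) lands in the same 250-300 Tbps range discussed above:

```python
# Capacity = wavelengths per fiber x fiber pairs x rate per wavelength.
wavelengths = 100          # ~80-120 colors via wavelength division multiplexing
fiber_pairs = 16           # up to 12-16 pairs in a modern cable
gbps_per_wavelength = 200  # illustrative coherent-optics channel rate, assumed
total_tbps = wavelengths * fiber_pairs * gbps_per_wavelength / 1000
print(f"~{total_tbps:.0f} Tbps")  # ~320 Tbps
```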
就像我们仍在沿用古老的铁轨
It's like we're still using the tracks
即使机车不断升级。正如我们讨论的网络效应,当更多网络接入时,其价值和速度都会提升。这就是为什么阿什本地区会如此稳固——正如我们所说,当数百家运营商和数千条交叉连接汇聚一地时,迁移就变得不可能。
from long ago. Even if the engine gets upgraded. And just like we've been talking about with this network effect, when you bring more networks in, it increases the value and the speed. And so that's a big reason why Ashburn, as we talked about, hardened. Once hundreds of carriers and thousands of cross connects land into a single place, moving it is impossible.
因此若要从美国东海岸快速连接欧洲,你会选择海底电缆登陆点附近的数据中心。纽约、新泽西、弗吉尼亚连接着英国康沃尔和默西塞德;红海和地中海走廊以吉布提为关键节点;迈阿密是通往拉丁美洲的门户;日本、新加坡、香港和台湾则是亚太地区的重要通道。
And so if you need to reach Europe fast from the East Coast, you're gonna colocate where the undersea cable is already coming in. You've got New York, New Jersey, Virginia tying to Cornwall, England, and to Merseyside. You've got the Red Sea and the Mediterranean corridor connecting into Djibouti as a critical touchpoint. You've got Miami as a key touchpoint into Latin America. Japan, Singapore, Hong Kong, Taiwan are key corridors into Asia Pacific.
蒙巴萨和拉各斯现在点亮了非洲的海岸线。数百条海底电缆就这样汇聚交织,连接起整个世界,进而形成了我们已建立并将通过这个故事继续建设的数据中心网络。
Mombasa and Lagos now light up Africa's coastlines. These are just hundreds of cables that have converged to connect the entire world, and this then forms the network of data centers that we have built and will continue to build through this story.
所以这场建设热潮还在继续,对吧?
And so the boom continues, right?
不,那次繁荣之后确实出现了类似萧条的情况。
No. There was something like the bust to that boom.
没错。到了2001年,一切都崩溃了。广告业倒闭,初创企业倒闭,我们之前讨论过的Exodus申请破产,负债近60亿美元。当时最大的互联网服务提供商之一PSINet已经垮台。那么这是否意味着互联网的死亡,以及所有这些铺设的光纤价值的终结?
That's right. And by 2001, everything collapsed. Advertising folded, startups folded, Exodus that we talked about earlier filed for bankruptcy with nearly $6,000,000,000 in debt. PSINet, one of the largest ISPs, had already collapsed. And so was this the death of the Internet and the death of the value of all of this fiber that was laid?
显然不是。这只是特定泡沫时期的终结和过度建设的死亡。但事实上,实际建设的基础设施随着重要服务的成熟和商业模式的完善,将被证明具有巨大价值。换句话说,我认为这个时代的应用层消亡了,但基础设施得以延续并最终蓬勃发展。
Clearly not. It was the death of a particular moment and an overbuild in a bubble. But in fact, the actual infrastructure that was built out would prove immensely valuable as services that mattered matured and business models matured. In other words, I'd say the application layer of this era died, but the infrastructure lived on and would eventually thrive.
所以互联网泡沫退潮了,但这些过度建设的资产恰恰是我们下一阶段所需。包括那些我们如今离不开的基础设施——运营商酒店、光纤、数据中心、海底光缆都没有消失,只是换了主人。当时是在清仓大甩卖,对吧?
So the .com tide went out, but the overbuilt assets were exactly what we needed for the next chapter. And it included infrastructure that we can't live without. The carrier hotels, the fiber, the data centers, the glass in the ocean didn't disappear. It just changed owners. And there was a fire sale, right?
资产正以成本价的一小部分被抛售。少数关键参与者幸存下来,也有新的入局者。Equinix挺过了这场崩溃,他们加倍押注于互联互通这一核心资产。那些中立的'会面机房'变成了市场,竞争对手反而付费成为邻居,因为价值在于这些运营商酒店提供的速度、可靠性和灵活性。
Assets were being sold at a fraction of the cost. A few key actors survived and a few new ones stepped in. Equinix survived the crash. They doubled down on the real asset that they had, which was interconnection. That neutral meet-me room turned into a marketplace where competitors paid you to be neighbors, because the value was in the speed and the reliability and the flexibility that these carrier hotels provided.
与此同时,私募股权如每个萧条周期那样大举进场,重塑行业格局。其中一家特别收购了全球数十处濒临破产的设施,将其整合为Digital Realty Trust平台,作为首家纯数据中心REIT上市,把计算空间当作房地产来经营。这不是什么高科技豪赌
Meanwhile, private equity swoops in, as they will in every bust cycle, and reframe the category. So one in particular buys a couple dozen distressed facilities around the world, turns it into a vehicle called Digital Realty Trust, takes it public as the first pure play data center REIT, and it treats compute space like real estate. So it wasn't high-tech moonshot
他们引入耐心资本来标准化外壳结构,低成本融资并获得长期租约,这成为未来二十年数据中心建设的蓝图。除了2001年泡沫破灭后严峻的经济环境,还有9·11事件。当时金融机构的关键服务器仍设在办公室或附近,与银行保持交易信息联通。当9·11发生时,威瑞森140西街中央办公室——纽约市最大的电信枢纽之一——被碎片尘埃淹没,机房进水,立即导致数万条语音和数据电路中断。
assets they were buying. They're bringing patient capital in to standardize the shell, finance it cheaply, and get long leases, which becomes a blueprint for the next two decades of data center build outs. In addition to the sobering economic environment of 2001 after the bust, there was also September 11. These financial institutions were still operating in this moment with their key servers, trading information connectivity with banks, in their offices or right near their offices. So when 9/11 happened, the Verizon 140 West Street central office, one of the largest telecom hubs in the city, was blasted with debris and dust, the equipment rooms flooded, and tens of thousands of voice and data circuits were knocked offline immediately.
这些电路大部分都用于支持以下功能:经纪业务、市场数据提供商、交易大厅、与证券交易所和清算所的低延迟连接。你可以想象市场在一瞬间就断开了连接。这种情况产生了连锁反应。工程师们日夜工作以恢复系统。我当时正在研究摩根士丹利的经历。
Most of those circuits were powering exactly that: brokerages, market data providers, the trading floors, low latency connectivity to the stock exchange and clearinghouses. So you imagine the market is just disconnected now in a flash. And this cascaded. Engineers worked night and day to bring things back online. I was looking into Morgan Stanley's experience.
他们目睹了整个交易系统崩溃。他们在新泽西州设有灾难恢复站点,但那里的连接性和数据馈送水平不同。因此他们花了几天时间。他们通过建筑地下室铺设新光纤,将枢纽接入另一个仍在运行的电信枢纽以恢复容量,以便在9月17日证券交易所开盘时能够交易。我认为从这次事件中,整个数据中心行业开始以全新方式思考弹性问题。
They saw their whole trading system go down. They had a disaster recovery site in New Jersey, but they didn't have the same level of connectivity and data feeds. So it took them a few days. They ran new fiber through building basements, patched hubs into another telco hub that was still operating to restore capacity so they could trade when the stock exchange opened September 17. I think coming out of this, there was a whole reshaping of the data center world to think about resiliency in a new way.
我认为这展示了这座城市中这种连接性的物理本质。对吧?在那个时刻,你可能想当然地认为可以远程操作数据,无论是交易还是市场数据。
I think it showed the physicality in the city of this connectivity. Right? In this moment, you could take for granted that you could take action on data over there, whether a trade or market data.
仅仅在同一街区的另一层楼设置冗余是不够的。对吧?我们必须开始考虑在不同地点建设基础设施,采用不同网络、设施和路线,以真正建立弹性。现在如果数据中心宕机,系统会自动重新路由,我们不会看到同样的故障,尽管这种情况偶尔还是会发生。
And it wasn't enough to have redundancy on another floor in the same neighborhood. Right? We had to start thinking about an infrastructure build out in different locations with different networks, facilities, and routes to really build true resilience and switch over to the point where now if a data center goes down, there's automatic rerouting, and we don't see those same blips, although it happens from time to time.
2000年代初是这个行业真正成熟和发展的时期,人们意识到这些服务器和数据中心承载着重要金融数据,需要得到相应对待。但与此同时,在消费领域出现了一线曙光,而且是巨大的变化。这个时代我们正从吱吱作响的电话调制解调器转向宽带。
This early 2000 phase is a real maturing and growing up of the entire industry to realize that these servers and data centers are holding important financial data and need to be treated as such. But meanwhile, in consumer land, there's a glimmer of light, and it's a big one. This is the era that we move from that squeaky squealy phone modem to broadband.
我记得参观大学时,有些校园已经普及以太网,有些则没有。
I remember touring for colleges, and some had Ethernet across the campus and others didn't.
我觉得家里装上宽带后体验完全不同了。就像是换了一个互联网。
I feel like getting broadband to our house was a radically different experience. It was like a different Internet.
图片没有从上到下加载完成。
The image didn't load from top to bottom.
它就这样飞速传播。这时BitTorrent开始腾飞,Skype于2003年上线,你终于能打网络电话了。哦,Napster成为可能。《魔兽世界》于2004年发布。
It would just fly through. And so this is when BitTorrent starts soaring. This is when Skype launches in 2003, and you can actually make VoIP calls. Oh, Napster was possible. World of Warcraft launches in 2004.
我记得它当时风靡大学校园。早期的网络视频产品开始
I remember that taking the college campus by storm. Early web video products started to
涌现。到2005年,全球已有10亿网民,约占地球人口的16%。
come out. By 2005, you had a billion people online, about 16% of the planet.
所以九十年代末那些兴奋预言的人们并没有错。他们只是早了五六年。是的。同时CDN技术也兴起了。Akamai的业务版图急剧扩张,在边缘网络提供越来越多的存储、复制和缓存服务。
So all those folks prognosticating with excitement in the late nineties were not wrong. They were just off by five or six years. Yeah. And so, you know, there's also the advent of CDNs. So Akamai's footprint exploded to provide more and more storage and replication and caching at the edge.
这具体意味着什么?意味着在你的ISP连接点附近,如果有人下载图片——比如你和街对面的人同时用电脑打开《纽约时报》——你们不需要都绕道去访问《纽约时报》的主服务器获取那张照片。它已被缓存在直接连接你们ISP的附近CDN节点上。
And so what exactly does this mean? It means at the places where your ISP is connecting, if someone's downloading an image (say you load the New York Times on your computer and someone else does on their computer down the street), you don't both need to go all the way back to the New York Times home server to access that photo. It's now been cached on a nearby CDN that's directly hooked up to your ISP.
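To make the caching idea concrete, here's a toy sketch in Python. The names (`EdgeCache`, `fetch_from_origin`) are invented for illustration, not any real CDN's API: the first request for a photo goes back to the origin server, and every later request from that neighborhood is served by the cache node sitting next to the ISP.

```python
# Stand-in for the New York Times home server, far away.
ORIGIN = {"/photo.jpg": b"pixels..."}
origin_hits = 0  # how many times the origin actually gets contacted

def fetch_from_origin(path):
    global origin_hits
    origin_hits += 1
    return ORIGIN[path]

class EdgeCache:
    """A CDN node hooked up directly to your local ISP."""
    def __init__(self):
        self.store = {}

    def get(self, path):
        if path not in self.store:            # cache miss: go to origin once
            self.store[path] = fetch_from_origin(path)
        return self.store[path]               # cache hit: served locally

edge = EdgeCache()
edge.get("/photo.jpg")   # you load the page: the origin is contacted
edge.get("/photo.jpg")   # your neighbor loads it: served from the edge
print(origin_hits)       # the origin was only hit once
```

However many neighbors load the same image, the long round trip to the origin happens once; everyone after that is served from nearby.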
因此在2000年代早中期,消费者回归,应用百花齐放,互联网逐渐成为社会基础架构的一部分。少数熬过泡沫破裂的公司抓住了这个千载难逢的机遇。它们不仅建立了庞大的消费者和企业业务,更成为了关键的基础设施公司,帮助构建了现代数据中心体系——这正是我们今天看到的繁荣景象。我们故事的最佳起点,就是买书的最佳去处:亚马逊网站。
So in the early to mid two thousands, you have the consumer coming back, applications are flourishing, the Internet is becoming a part of the fabric of society, and there are a handful of companies that survived the bust and captured this moment unlike any other. They not only built incredibly large consumer and enterprise businesses, but they actually became critical infrastructure companies that helped build the modern data center world we see booming today. And the place we're gonna start is the best place to buy books: amazon.com.
所以无论你想找什么书
So whatever book you wanted to find
从A到Z应有尽有。
From a to z.
1995年7月,亚马逊成立。到97年完成IPO。98年时,它已不仅是书店,正朝着'万物商店'迈进。最初贝索斯和亚马逊并未打算成为全球基础设施供应商,但九十年代末的高速发展为后续建设奠定了基础——不仅为自己,更为所有人。
July 1995, Amazon launches. By '97, they IPO. And by '98, they are no longer just a bookstore. They're on their path to becoming the everything store. Now it was not initially Bezos' and Amazon's intention to become the infrastructure provider of the world, but this high growth moment of the late nineties set the stage for what they would need to build, not just for themselves, but for everybody.
早期他们使用诸如DEC等厂商昂贵但号称可靠的服务器,这些高利润率的设备。问题是亚马逊本身并非高利润企业,他们追求规模效应,商品售价仅略高于成本。作为零售商,他们力图成为利润率最低、图书配送最便宜的商家。
In those early days, they were running expensive, quote unquote, reliable servers from the likes of DEC, extremely expensive products, high margin servers. Now the problem is Amazon was not a high margin business. They're trying to go for scale. They're selling things at whatever the cost was to pass through. They are a retailer trying to be the lowest margin retailer out there, the cheapest way to get your book delivered to your doorstep.
经营零售业务总是资金紧张。
Running a retail business, they're always tight on cash.
因此继续在服务器上投入变得不合理。到2000年时,基础设施支出已高到令人担忧会拖垮公司。于是他们启动重大项目,将整个amazon.com迁移至Linux系统,并改用更便宜的惠普服务器。
So spending it on servers stopped making sense. By 2000, they're spending so much on infrastructure, they're worried this was gonna bankrupt them. And so they kicked off a big project to rewrite all of amazon.com onto Linux and to run it on much cheaper HP servers.
当时亚马逊以庞大的单体代码库著称。每上线一个新品类,都需要改动整个代码库,最终变得像一团乱麻。
And this is when Amazon was famously a huge monolithic code base. Every new category they launched, they had to work across their entire code base, and it became this hairball.
大约在2002年左右,贝索斯受够了这种情况,他发布了著名的API指令:内部每个团队都必须通过经过加固、文档化的服务接口暴露功能,这些接口不仅要供内部团队使用,最终还可能对外开放。
By, I think, around 2002, Bezos had had enough of that, and he issued the famous API mandate that internally every team had to expose functionality through hardened, documented service interfaces designed not just to be used by internal teams, but eventually potentially externalizable.
典型的贝索斯风格:毫无例外。每个团队都必须通过这些接口进行通信。没有后门,没有直接读取,没有直接链接。无论技术实现如何,无一例外都必须从一开始就设计为能与其他团队进行外部通信。
Classic Bezos: there are no exceptions. Every team must communicate through these interfaces. There were no backdoors, no direct reads, no direct linking. It doesn't matter what that technology did; it would, without exception, be designed from the ground up to communicate externally to other teams.
如果不这么做,会有什么后果?
And if you didn't do this, what would happen?
如果不遵守,你就会被解雇。在这次重启时刻,公司领导层似乎在自问:我们如何才能像软件企业而非家具企业那样扩展业务?如果计算能力能随需求弹性伸缩呢?既然我们能为自己做到这一点,为何不将其租给全世界?他们不仅站在技术突破的边缘,更是商业模式的突破点。
If you didn't do this, you were canned. So in this moment of reboot, it seems like the question that the company's leaders are asking themselves is how are we able to scale our business like a software business and not like a furniture business? What if compute could scale with demand? And if we can do this for ourselves, why not rent it to the rest of the world? They were on the precipice of not just a technical breakthrough, but a business model breakthrough.
因为几十年来,运营在线业务意味着要签订多年租约,购置昂贵的服务器并过度配置
Because for decades, running an online business meant these multiyear leases, these expensive servers overprovisioning
以应对峰值需求。而亚马逊将彻底颠覆这一点,通过宣称'你可以按小时租用服务器,只为实际用量付费',从根本上改变了互联网企业的运营方式。这在当时极具创新性,同时也颇具讽刺意味——毕竟我们处理家庭电费水费这种公用事业账单已有数十年甚至上世纪的经验。公用事业本就是按实际用量计费。
to handle peak demand. And Amazon would go on to flip this on its head and fundamentally change Internet businesses by saying, you can rent a server by the hour and pay for only what you use. It is both deeply innovative at the time and also funny because we've had decades, century of doing this with our electricity bills in our own houses or water bills. This is utilities. Utilities, you've always just paid for what you use.
但它持续带来的好处是降低成本,提高集中式基础设施的利用率。亚马逊刚刚经历了重写软件和改变服务器架构的痛苦时期。我认为这促成了两点:A.我们绝不想再经历这种痛苦;B.其他人也不该再经历这种痛苦。
But what it continually enables is the driving down of cost, better utilization of centralized infrastructure. And Amazon had just lived through this painful period of having to rewrite their software and change their server architecture. And I think it was two things. One is, we never want to go through this again, A. B, no one should have to go through this again.
C,如果我们开始为全球建设基础设施,这将增加我们的成本和收益,并形成一个良性循环。就像任何规模经济提供者所经历的那样,规模越大,效益越好。
And C, if we start to build infrastructure for the world, that's going to accrue to our costs and our benefit, and there's going to be a flywheel here. Just like any scale economies provider ever experiences, the bigger we get, the better.
没错。利用率是如此关键的一点。那么2006年3月发生了什么?
That's right. And that utilization is such a key point. So what happens in March 2006?
亚马逊推出了首个真正的AWS服务——S3或简单存储服务。这个简单存储服务对我这样的小型互联网企业有什么用?它允许任何人将数据块存放在这个非物理磁盘上,并在全球任何地方访问。虽然名称听起来简单,但当时要实现无论数据量是兆字节、千兆字节还是太字节,都能让全球用户快速访问,技术上却异常困难。亚马逊抽象化了所有复杂性,让你只需支付月费就能实现。
Amazon launches the first real AWS service, S3, or Simple Storage Service. What does this Simple Storage Service do for me as a small Internet business? It lets anyone put a blob of data on this nonphysical disk and access it anywhere in the world. And that sounds simple in the name, but it was shockingly hard to put a blob out there, whether that was a megabyte or a gigabyte or a terabyte of data, and have everyone around the world be able to access it quickly. Amazon abstracted everything away so that you could do that and just pay a monthly fee.
你不需要搭建服务器、插入硬盘、再建一个服务器来复制数据等等这些繁琐操作。它直接提供了开发者所需的功能。
You didn't have to build a server and plug in a hard disk and build another one that copied the data over and all of these other things. It just gave you what you need as a developer.
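A rough sketch of the shape of that abstraction, with invented names rather than the real AWS API: you put a named blob into a bucket, you get it back by name, and everything about disks, servers, and replication is hidden behind two calls.

```python
# Toy model of the S3 idea: buckets of named blobs behind a put/get
# interface. This mimics the shape of the abstraction, not Amazon's
# actual implementation or API.

class BlobStore:
    def __init__(self):
        self.buckets = {}  # bucket name -> {key: bytes}

    def put_object(self, bucket, key, body: bytes):
        # The caller never chooses a disk or a machine; just a name.
        self.buckets.setdefault(bucket, {})[key] = body

    def get_object(self, bucket, key) -> bytes:
        return self.buckets[bucket][key]

s3 = BlobStore()
s3.put_object("my-startup-assets", "logo.png", b"\x89PNG...")
data = s3.get_object("my-startup-assets", "logo.png")
print(len(data))  # the bytes come back; where they lived is invisible
```

The whole developer-facing contract is those two calls; durability, replication, and hardware are someone else's problem, billed monthly.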
所以这让我能存储信息了。没错。几个月后他们又做了什么?
So this enabled me to store information. Correct. A few months later, what'd they do?
他们推出了弹性计算云服务,也就是EC2。这样我就能
They launched the Elastic Compute Cloud, also known as EC2. So I can
存储数据,现在你告诉我还能计算?这是怎么运作的?
store, and now you're telling me I can compute? How does this work?
你可以进行计算。EC2本质上就是能够按需启动计算机、服务器的能力。实际上他们让你启动的是所谓的虚拟机,你可以选择运行Linux、Windows或某些SQL Server操作系统。我会拥有这个虚拟机来部署、运行它,并运行我需要的任何代码。如果我需要第二台机器,只需按一个按钮就能获得。
You can compute. And so what EC2 essentially was was the ability to spin up computers, servers, as you saw fit. Now what they were actually letting you spin up is what's called a virtual machine, where you can say, I want to run Linux, or I wanna run Windows, or I wanna run some SQL Server OS. And I would have this virtual machine where I could deploy that, run it, and run whatever code I need to. And if I need a second machine, I push a button, get a second machine.
如果我需要第三台机器,就能得到第三台。我按小时、按机器付费,只为实际使用的资源买单。
If I need a third machine, a third machine. And I pay by the hour, by the machine, only for what I'm using.
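The pay-by-the-hour, pay-per-machine model can be sketched like this. The rate and class names are made up for illustration, not real AWS prices:

```python
# Toy sketch of elastic compute billing: launch and terminate machines
# freely, and pay only for the machine-hours you actually used.

HOURLY_RATE = 0.10  # dollars per machine-hour (illustrative, not a real price)

class Fleet:
    def __init__(self):
        self.machine_hours = 0.0
        self.running = 0

    def launch(self, n=1):
        self.running += n

    def terminate(self, n=1):
        self.running -= n

    def tick(self, hours):
        # Accrue usage for however many machines are up during this period.
        self.machine_hours += self.running * hours

    def bill(self):
        return self.machine_hours * HOURLY_RATE

fleet = Fleet()
fleet.launch(1);    fleet.tick(10)  # one machine for 10 hours
fleet.launch(2);    fleet.tick(5)   # traffic spike: three machines for 5 hours
fleet.terminate(2); fleet.tick(10)  # spike over: back to one machine
print(fleet.bill())                 # 10 + 15 + 10 = 35 machine-hours billed
```

Contrast that with the old model: you'd have bought three servers up front and paid for all of them around the clock, spike or no spike.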
这太棒了。我正在创业,预计会快速增长,但无法预测下个月或半年后的流量情况。我刚融到一笔风投资金,现在你告诉我不用购买昂贵的惠普和Sun服务器。随着业务增长,我只需租用更多计算资源和存储空间。
This is outstanding. So I'm building a business, and I think I'm gonna grow fast, but I don't know what traffic I'm gonna get next month or in six months. And so I just raised a bunch of VC money. Now you're telling me I don't have to buy these expensive HP and Sun servers. And as I grow, I can just rent more compute and rent more space.
没错。不仅如此,我认为这里的启动门槛被大幅降低了。过去要面临获取服务器和空间的种种挑战,但现在这种模式下,你只需刷公司信用卡,就能立即获得所需的基础构建模块。这是初创企业的寒武纪大爆发。
Yeah. Not just that, but I think the activation energy here was brought way down. Before, there's the challenge of getting a server and getting space and all that. But, like, in this model now, you just put down your corporate card, and you're off and running with the foundational building blocks that you need. This is the Cambrian explosion for startups.
关键在于利用率问题。太多初创公司购买了服务器却从未达到满负荷运转,就像我们讨论过的那些闲置在后屋的硬件设备。亚马逊如何实现高利用率?关键在于他们提供的所谓'服务器'——实际上并非真实服务器。
Key to this is the utilization point. So many startups had bought servers that then never hit full utilization, or we talked about all these appliances sitting in the backroom closet not hitting full utilization. What enabled Amazon to drive utilization? It was the fact that, yes, they gave you a quote unquote server to run your operating system on. They did not give you a server.
事实上,他们提供的是虚拟机。不是实体机,而是虚拟机。但什么是虚拟机?虚拟机是一种概念化的机器形态,如同建筑方案之于实体建筑。这要追溯到1998年由黛安·格林和门德尔·罗森布拉姆创立的VMware公司。
In reality, they gave you a virtual machine. Not a machine, a virtual machine. But what is a virtual machine? A virtual machine is not the machine itself; it's the concept of a machine. This goes back to a company, VMware, that was founded in 1998 by Diane Greene and Mendel Rosenblum.
我在大学时曾选修门德尔教授的操作系统课程。而黛安,正如后续故事所述,在我离开谷歌时正负责谷歌云业务。他们都是这个领域的传奇人物。VMware的突破在于,能让普通计算机(无论是服务器还是个人电脑)运行多个虚拟机。其技术难点在于:计算机通常设计为同时只运行一个操作系统,由该系统管理应用并确保稳定。若突然在同一台计算机上运行多个虚拟机,其中某个执行危险操作导致系统停滞或异常,就可能破坏整个运行模式。
I got to take operating systems in college from Mendel. Then Diane, as we get to later in the story, was running Google Cloud around the time I was leaving Google. Legends in this field and in this industry. What VMware did is they made it possible to take a normal computer, whether it was a server or a PC for that matter, and run virtual machines on that computer. Why that's typically hard is that a computer is usually made to run one operating system at a time, and that operating system is managing applications and making sure the computer doesn't crash. If all of a sudden you have multiple machines running on a computer at a time, and one does something unsafe that would stall out the machine, or does something it wasn't supposed to do, that could break the whole model.
但VMware的首款产品让未经修改的Windows和Linux能在同一台x86主机上并行运行。这催生了更多有趣的应用场景,比如实现虚拟机在机器间的实时热迁移。举个具体例子:假设你正在PC上玩游戏,突然在0.5秒的瞬间,整个游戏画面就无缝切换到了另一台机器上。
But VMware's first product enabled an unchanged Windows and Linux to run side by side on the same x86 box. This accelerated to even more interesting use cases, so you can actually hot swap VMs on machines at the same time. Let me give you a concrete example here. Let's say you're playing a game on a PC. It was as though all of a sudden, in the snap of a half second, that game moved to another machine mid frame.
这个设计初衷就是为了实现虚拟机在服务器间的热迁移能力。
So this was actually designed to be able to hot swap a virtual machine from one server to the next.
因此即便没有专属服务器也不成问题,因为你的虚拟机可以按需灵活迁移。比如遭遇硬盘故障时,你可以实时切换到后台运行的快照版本。这种虚拟机概念既成为资源利用率的核心——单台物理服务器可托管多个虚拟机,也构成了冗余设计和故障恢复的基础架构。
So it didn't matter that you didn't actually have your own server because your virtual machine could float around as needed. So let's say that you have a hard drive crash. Well, you could have a snapshot running in the background, and you could flip over to that one in real time. And so this notion of a virtual machine becomes the backbone for both the utilization point, because one physical server can be used to actually host multiple virtual machines, and a lot of the redundancy and fallback designs.
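A heavily simplified illustration of those two wins: several VMs share one physical host (utilization), and a VM's state can be moved to another host (resilience). Real hypervisors like VMware's and Xen migrate live memory pages mid-execution; this toy just moves a Python object, and all the names are invented.

```python
# Toy sketch of virtualization's two payoffs: packing multiple VMs onto
# one host, and migrating a VM off a failing host with its state intact.

class VM:
    def __init__(self, vm_id, os):
        self.vm_id, self.os = vm_id, os
        self.memory = {}  # stand-in for the guest's running state

class Host:
    def __init__(self, name):
        self.name = name
        self.vms = {}

def migrate(vm_id, src: Host, dst: Host):
    """Move a VM between physical hosts; its state travels with it."""
    dst.vms[vm_id] = src.vms.pop(vm_id)

host_a, host_b = Host("rack1-a"), Host("rack1-b")

# Utilization: one physical box hosts three tenants' "machines".
for i in range(3):
    host_a.vms[f"vm{i}"] = VM(f"vm{i}", "linux")

# Resilience: host_a's disk starts failing mid-game; move the VM over.
host_a.vms["vm0"].memory["game_frame"] = 1042
migrate("vm0", host_a, host_b)
print(host_b.vms["vm0"].memory["game_frame"])  # state survived the move
```

The tenant never knows which rack they're on, which is exactly why one underutilized server can quietly absorb many customers' workloads.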
VMware并非唯一实现这一技术的公司。后来出现了开源的Xen项目,亚马逊AWS最初采用的就是这个方案。随着超大规模服务商不断探索全面虚拟化的方法,这类技术路线正变得越来越成熟。
And VMware wasn't the only one to do this. Eventually, there was the open source Xen project, which is what Amazon and AWS first used. This kind of thread becomes better and better over time as all of the hyperscalers figure out how to maximally virtualize everything that they do.
现在我不必再为运营pets.com网站操心——既要确保商品送达客户,又要维护网站运行,一旦崩溃还得停摆业务、提交工单来修复故障。所有这些都被抽象化地交给了专业团队,交给专门解决此类问题的服务商,而我获得的只有持续的生产力和不间断的服务。
And so now rather than me trying to run my pets.com and ensure that my product is getting to my customer and my website is doing everything, then when it crashes, me having to stop everything and file the ticket and pause business to to fix the crash, that's just abstracted out to the specialists, to a business that is designed to solve this problem for me, and all I get is then continuous production, continuous service.
而且你每年花费的成本都在降低。
And it gets cheaper for you every year.
而且越来越便宜。
And it gets cheaper.
比如,S3和EC2的成本越来越低,低得惊人。这是个了不起的业务。对其他许多参与其中的公司来说,建设和运营这些数据中心及公用事业业务并非他们的高利润业务。但对亚马逊而言,这项业务的利润率使其相较于零售业务成为高利润业务。
Like, the cost of S3 and EC2 has just gotten cheaper and cheaper and cheaper. It's an amazing business. And for many others that we'll get into in the story, building these data centers and the utility business is not their high margin business. But for Amazon, this business has margins, which makes it a high margin business compared to their retail business.
这就像如果我是一家航空公司,花大量时间研究如何预订机票,然后你给了我一个能自动处理的程序,我现在就能专注于服务客户,更快地服务更多客户。
It's almost like if I were an airline spending all this time trying to figure out how to book a reservation, and then you gave me a program that could do it for me, I can now focus on serving the customer and serving more customers faster.
然后它就爆发式增长了。对吧?如果你看2007年的S3,存储了100亿个项目。到2009年,大约是640亿。而去年,S3存储了400万亿个项目。
And it explodes. Right? If you look at S3 in 2007, there were 10 billion items stored in S3. By 2009, that's about 64 billion. And by last year, 400 trillion items stored in S3.
这真是个巨大的数字。
That's a really big number.
确实非常大,数量惊人。这为初创企业生态系统和行业提供了动力。对吧?让我们回到那个时间点。
It's a very big number. It's a lot of items. This powers the startup ecosystem and industry. Right? Let's go back in time to that.
那么,推动这些数字增长的因素是什么?
So what was driving the growth of all these numbers?
我们身处黑客马拉松之城。工程师和有想法的人聚在一起欢乐时光。现在你有了想法,只需放下一张信用卡,无需谈判购买硬件,没有发票,没有合同,没有销售电话。
We're in Hackathon City. We're having happy hours with engineers and folks with ideas coming together. And now you have an idea. You can drop a credit card down, and you don't need to negotiate and buy hardware. No invoices, no contract, no sales calls.
你只需几小时就能启动并运行。AWS非常明智地将此视为短期和长期增长的途径。短期来看,他们能让一批初创公司使用其计算资源——虽然初期收益有限,但其中部分企业会成长壮大,并始终扎根于AWS平台。
You're up and running within hours. And AWS very brilliantly saw this as a pathway for short term and long term growth. Right? In the short term, they can get a bunch of early stage startups using their compute. And that's not going to amount to a lot of money, but some of them are going to grow, and they're going to be built on AWS.
因此他们实际上采用了一种商业模式:在各类交流活动和黑客马拉松中发放免费额度,使AWS成为创建新公司时的默认基础设施平台。
And so they actually had a business model of giving out free credits at these happy hours, at these hackathons to make AWS the default infrastructure platform to build a new company.
我认为这种市场策略直接影响了他们的产品设计。他们的产品设计对用户用途保持高度中立——核心就是让你轻松获取服务器去做想做的事,提供极简化的存储方案。
And I think it feeds through from that go to market to their product design. Right? Their product design was deeply unopinionated about what you were gonna do. It was make it easy for you to get a server to go do what you wanna do with it. Here's storage, as simple as it can be.
这些就像最基础的乐高积木,简单到让人们在其上搭建出看似相同的业务。听说过Dropbox吗?早期它本质上就是个S3应用,纯粹做存储,仅此而已。
These are the simplest Lego blocks you can build on, and so simple in fact that then people built what feels like the same business on top of them. Ever heard of Dropbox? Dropbox is just an S3 application for this whole early period. It's storage. It's storage.
它让文件从电脑同步到云端变得简单,实现备份、同步和共享。Dropbox与S3深度绑定,其设计完全围绕这个功能。直到2015年他们才考虑成本问题,最终从AWS迁移到自建服务器——毕竟他们本质就是个存储层,这种情况确实罕见。
Let's make it easy for syncing files from your computer to this new cloud thing and backing them up and syncing them and sharing them to other people. Dropbox and S3 are intimately linked, and so Dropbox is built to do exactly that, and it wasn't until 2015 that they were like, Okay, we should probably look at the cost of this, and they eventually moved off of AWS to their own servers because all they are is the storage layer. So that's a rare case where it made sense.
这就是他们的全部业务。现在每月还收着我11.99美元呢。
That's all they are. They're still getting $11.99 a month from me.
没错,用户黏性很强。Dropbox与AWS的故事堪称经典,但要论这段时期的标杆合作,恐怕非Netflix与AWS的搭档莫属。
That's right. Good lock-in. So the Dropbox AWS story is a classic one, but there's probably no better partnership to exemplify this time than the one that Netflix had with AWS.
他们与这些公司共同扩展的能力确实令人瞩目。对于记得的人来说,2008年的Netflix主要还是家邮寄DVD的公司。那时它刚推出流媒体服务作为附加功能。管理层已经意识到其中潜力。但2008年8月,Netflix遭遇了主数据中心严重的数据库损坏事件。
Their ability to scale with these companies is really something to behold. And so in 2008, Netflix is still primarily a DVD-by-mail company, for those of us that remember. It had launched a streaming service as a side feature. The leadership knew that there was something here. But in August 2008, Netflix suffered a major database corruption in its primary data center.
这次故障导致其DVD邮寄和流媒体服务中断三天,业务完全停滞。这是个巨大的警钟。恢复过程对Netflix而言极为痛苦。他们意识到本地垂直扩展的系统过于脆弱,无法从重大故障中快速恢复。
For three days, it disrupted their DVD shipping, their streaming ability. It stopped their business. This was a huge wake up call. The recovery was very painful for Netflix. They realized that their on prem vertically scaled systems were just too fragile, and they couldn't recover quick enough from major failures.
管理层决定需要构建更具容错性、弹性扩展能力且全球可用的架构,因为他们怀有将未来流媒体业务推向全球的抱负。他们得出结论必须聚焦核心业务,而自建这套系统既缓慢又昂贵。于是Netflix成为AWS首个公开宣传的规模化客户案例。事实证明这至关重要。到2015年,Netflix每年通过AWS和自建CDN交付数十亿小时的流媒体内容。
The leadership decided they needed architecture that was designed to be more fault tolerant, designed to be elastic, designed to be globally available, because they had aspirations of being able to stream their future business all over the world. And they concluded that they needed to focus on their core business, and building this out in house was slow and costly. And so Netflix actually became AWS's first marquee, all-in public reference customer at scale. This proved to be vital. By 2015, Netflix was delivering billions of hours of content annually, almost entirely over AWS and their own CDN.
他们运行着数千个EC2实例,所有视频内容都存储在S3中——就是我们正在观看的那些实际视频文件。后来他们意识到必须掌控最后一英里传输的延迟和成本,这正是CDN的典型应用场景。2012年他们推出了名为'开放连接'的项目及设备,直接在我们之前讨论过的网络汇聚点部署Netflix对等设备,与当地ISP建立直连。
They were running thousands of EC2 instances and had all of their videos, the canonical system of record, in S3: the actual videos that we are watching. Now they did at some point realize that it was so important for them to own the latency and cost of that last mile delivery, that edge. And this is a perfect example of a use case for CDNs. They launched in 2012 what they call Open Connect and the Open Connect appliances. This means that they would go into those meet-me rooms that we had talked about before and drop a Netflix peering box that would directly connect to your local ISP.
而且他们免费提供这项服务。对吧?ISP只需说:'我们有很多用户要访问Netflix,让我们优化体验'。
And they would do this for free. Right? The ISP just says, hey, we have a lot of people trying to access Netflix. Let's make it better.
让我们提升速度,降低成本。
Let's make it faster. Let's make it cheaper.
这是双赢的局面。
It's a win win.
这是双赢的局面。他们当时停用了正在使用的CDN。Netflix节省了成本,客户更满意,每个人都能更快地观看视频。
It's a win win. And they cut out the CDN that they were using at the time. Netflix saves money. Customers are happier. Everyone gets their videos faster.
这样一来,当某个ISP的用户访问最新热门电影时,内容已经就近准备好了。亚马逊实际上几乎不会在那时受到流量冲击。显然它运行着为Netflix服务的所谓控制平面。
This way, when someone on an ISP accesses the newest popular movie, it's already close to them. Amazon barely gets hit in that moment; it's obviously still running what's called the control plane for Netflix.
是啊。因为如果我得等三四秒才能加载预览,我...
Yeah. Because if I have to wait three or four seconds for that preview to load, I
可能就不会看那部剧了。用户可能会流失。而正是这种模式支撑着至今的流媒体服务,同样的架构也让其他企业能够实现大规模流媒体传输。整个Netflix案例完美展示了AWS如何与你共同扩展,在极具挑战性的环境中保持弹性、实现全球覆盖,对吧?
might not watch that show. You might churn. And this is what powers streaming to this day, and the same model is what enables others to do streaming at scale. All of this, this whole Netflix case study, is the perfect flywheel and customer to show that AWS can scale with you, scale in a really challenging environment, be resilient, and power global reach. Right?
如今的Netflix早已不仅是美国公司了。
Netflix is not just a US company at this point.
没错。对于那些曾认为AWS和云计算概念存在风险、无法扩展、不适合企业的人来说,Netflix帮助全球首席信息官们打破了这些偏见。
Yeah. For anyone that saw AWS and the cloud as a concept as risky, as unable to scale, as not enterprise ready, Netflix helped debunk that for chief information officers around the world.
他们会不惜一切代价。我记得有个故事——可能来自《Acquired》播客——亚马逊甚至允许你要求他们开辆大卡车来,把所有数据吸到硬盘里,然后运到数据中心导入你的AWS实例。他们想尽办法满足企业客户需求。令人惊叹的是,这个以卖书闻名的公司竟能如此快速地建立起'如何规模化构建安全私有云服务'的品牌形象。
And they would just do whatever it took. I can remember some story, I think from the Acquired episode, where Amazon would allow you to ask them to roll in a big truck, slurp up all your data into hard drives, and then they'll bring it to their data center to plug in and dump all your data into your AWS instance. So they figured out what was needed to close the enterprise customers. It's just amazing that someone who is, you know, known for selling books was able to so quickly build the brand around how to do this new private, secure utility thing at scale.
因此AWS持续重新思考数据中心的用途及建设方式以服务客户。他们摒弃了每个市场只建一个巨型设施的做法,引入了可用区的概念——即在每个区域内建立由独立数据中心组成的集群,各自拥有独立供电线路和通过毫秒级专线连接的光纤路径。这种可用区与集群理念成为了AWS为数据中心演进贡献的又一重要组成部分。我感觉
So AWS continued to rethink what data centers are used for and how to build them out to serve the customer. Rather than have one mega facility per market, they introduced the concept of availability zones, which became clusters of independent data centers within a region, each on separate power lines, each on separate fiber paths linked with millisecond private connections. And this availability zone concept and clustering was yet another piece of the data center evolution that AWS contributed to the ecosystem. I feel like
这种‘美东’‘美西’的概念,如今已成为人们思考服务器部署的方式——即这种抽象的可用区理念。
this notion of, like, US East and US West, this is how people think about their servers now, which is this abstracted notion of a of an availability zone.
AWS在全球范围内持续推行这一模式,实际上这成为了超大规模增长的蓝图。
AWS continued to do this globally, and this actually became a blueprint for hyperscale growth.
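The availability-zone idea can be sketched as replicate-to-every-zone, read-from-any-healthy-zone. The zone names below borrow AWS's naming convention, but the logic is a toy illustration, not how any real region is implemented:

```python
# Toy sketch of availability zones: a region is a cluster of independent
# data centers on separate power and fiber; writes are replicated to every
# zone, so reads survive a zone outage by automatic rerouting.

class Region:
    def __init__(self, zones):
        self.zones = {z: {} for z in zones}  # each zone keeps its own copy
        self.down = set()                    # zones currently offline

    def write(self, key, value):
        for store in self.zones.values():    # replicate across all zones
            store[key] = value

    def read(self, key):
        for name, store in self.zones.items():
            if name not in self.down:        # automatic rerouting
                return store[key]
        raise RuntimeError("entire region unavailable")

us_east = Region(["us-east-1a", "us-east-1b", "us-east-1c"])
us_east.write("user:42", "profile-data")

us_east.down.add("us-east-1a")               # one data center goes dark
print(us_east.read("user:42"))               # served from a healthy zone
```

The customer-facing blip that took down trading floors in 2001 becomes, in this model, an invisible failover to the next zone over.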
暂且在此结束亚马逊的故事,我认为将公用事业级云计算的开启归功于他们并不为过。他们做对了太多事情:为开发者提供简易构建模块、持续优化成本、确保可靠性冗余、为虚拟化搭建正确护栏以实现高效运营和长期建设,通过内部统一API形成加速发展的飞轮,并将成本优势反哺核心业务。这使他们遥遥领先,至今仍是核心云基础设施领域的领导者。
To end the Amazon story here for now, I think it would not be wrong to credit them with really kicking off the utility scale cloud story and getting so many things right: giving developers the simple building blocks that they could use, going after cost, going after reliability and redundancy, making sure to build the right guardrails for virtualization so that they could actually do this efficiently and build for the long term, and using the same APIs internally so that it created a flywheel for them to move faster and bring those cost benefits back to the core business. It propelled them so far ahead in this business. They are still the leader to this day in the core cloud infrastructure product as a result.
正如我们讨论的数据中心,其核心很简单:一个具备存储、计算和连接能力的设施。而AWS所做的,是让数据中心能够在此基础上构建经济生态。
As we've been talking about the data center, it is simple at its core. It's a facility with storage, compute, and connectivity. And what AWS does is take the data center and enable an economy to be built on top of it.
换个角度说,过去如果你家里需要电灯照明,就得在房子底下装个燃煤发电机。同理,想在互联网上建网站就得买台服务器接在家用宽带上。如今看来这简直疯狂——什么叫‘要用埋在我家地下的电来供电’?
To frame it another way, it used to be that if you wanted lights in your house, if you wanted electricity in your house to power those lights, you would put a dynamo burning coal underneath your house. Right? And if you wanted a website on the Internet, you would buy a server and plug it into your home ISP. And that is insane to us today. What do you mean you're gonna power my house with electricity generated underground in my house?
取而代之的是,我将与所有邻居共享这套基础设施,集中需求后建设最廉价、最庞大的发电系统。云计算正是首次实现了同样的理念:人们虽拥有可访问公开网站的互联网,企业虽自有服务器,但像Dropbox文件夹那样拥有专属存储空间,或作为开发者能任意构建并访问这片看不见具体位置却真实存在的‘数据浮云’——我认为这就是云计算的概念本质。
Instead, I'm going to hook up this shared infrastructure with all of my neighbors, and we're gonna centralize our demand and build the cheapest, biggest infrastructure we can to generate the power. And that's the same thing happening here for the first time. This is the introduction of the concept of the cloud. People have had the Internet, where they're interacting with publicly visible websites. Private enterprises have servers. But the idea that you have your own storage over there in a Dropbox folder, or that you as a developer could build whatever you wanted and access this floating blob of data whose specific location is invisible to you, but it's out there: that was, I think, the conceptual notion of the cloud.
我开始有点讨厌这个说法,因为它助长了基础设施的隐形性。
I've come to somewhat dislike the phrase because it feeds into the invisibility of the infrastructure.
嗯。
Mhmm.
对吧?它好像在说,它不存在任何地方。你的照片不存在任何地方。你的存储也不存在任何地方。
Right? It says like, it is nowhere. Your photos are nowhere. Your storage is nowhere.
而这与事实相去甚远。它确实存在于某处。实际上,它存在于多个地点。
And that's so far from the truth. It is somewhere. It is in multiple locations, in fact.
云几乎是与玻璃房设计理念完全相反的存在。玻璃房是你想要炫耀的地方——看,这就是它的位置,这就是计算发生的地方。
The cloud is almost the design antithesis of the glass room. The glass room where you wanna show off, this is where it is. This is where the compute is happening.
它只在这里发生。
It is only happening here.
它正在此刻发生。但这个概念在解释迁移到未知位置的理念上起到了很好的作用,而且我认为它一直具有很强的粘性。所以当亚马逊还在卖书的时候,斯坦福的一个小初创公司正在帮你组织和访问日益增长的万维网上的信息。
It is happening right now. But it served a good purpose in explaining this notion of migrating to this unknown location and has remained pretty sticky, I think. So while Amazon was out there selling books, a little startup out of Stanford was helping you organize and access the information on the growing worldwide web.
谷歌成立于1998年,不久后每天处理约10,000次搜索查询。到2006年底,它每秒处理的搜索量已达到这个数字。谷歌在2006年收购了成立仅一年的YouTube,当时YouTube已是全球增长最快的网站之一,每日视频浏览量达1亿次。
Google was founded in 1998, and shortly after, it was handling about 10,000 search queries per day. By the end of 2006, it was processing the same number of searches every second. Google acquired YouTube in 2006, a year after its founding. And at the time, it was already one of the fastest growing websites in the world with 100,000,000 video views per day.
我记得人们刚开始使用基于网页的邮箱账户,比如Hotmail,存储空间只有2到4兆字节。而谷歌在2004年4月1日以愚人节玩笑的方式宣布推出Gmail,作为邀请制服务提供整整1GB的存储空间。
And I remember, you know, folks having their first web mail based accounts, like a Hotmail account. They'd have two to four megabytes. Well, Google, notoriously on April 1, 2004, announced and launched Gmail as an invite-only service that had a full gigabyte of storage.
我记得通过推荐他人注册就能获得访问权限。
I remember you could get access if you if you referred in.
要么通过推荐注册,要么在eBay上花150美元购买邀请码。这成了抢手货。到2010年,Gmail用户已突破1.5亿
If you referred in, or invites started going on eBay for, like, $150. It was a hot ticket. By 2010, Gmail had over 150,000,000
用户,当然还提供更多GB级的存储空间。这太不可思议了。从一开始它就是互联网的入口,但也注定要成为基础设施公司。
users, and of course, with more gigabytes of storage available. It was amazing. It was from the very beginning the front door to the Internet, but it was also going to need to be an infrastructure company.
时间回到1999年,谷歌成立一年时,一位名叫乌尔斯的工程师在CEO兼创始人拉里·佩奇的带领下参观公司。乌尔斯说第一台谷歌服务器机柜小得几乎无法立足——这个4英尺×7英尺的机柜里摆放着30台PC,为世界提供超出负荷的谷歌服务。我们的隔壁邻居是eBay,稍远处是AltaVista存放DEC服务器的大型机柜。
And so you go back to 1999, a year into Google's life, and an engineer named Urs is being shown around as part of his recruitment by Larry Page, the CEO and founder. Urs says you couldn't really set foot in the first Google cage because it was so tiny. The cage was seven feet by four feet with 30 PCs arranged on the shelves, providing the world with more Google than it could handle. Our direct neighbor was eBay. A bit further away was a giant cage housing DEC machines at AltaVista.
所有这些设备都托管在圣克拉拉的Exodus数据中心,就是我们之前讨论过的那种联合托管设施。
All of this was hosted at Exodus in Santa Clara, one of those colocations that we talked about earlier.
当时谷歌与AltaVista等搜索引擎竞争,它为斯坦福大学的部分搜索提供技术支持
So Google competing with AltaVista and all those others at the time, it was powering part of Stanford search
没错。
That's right.
而且仅靠30台个人电脑运行。
And running off of 30 PCs.
在这个托管中心里用30台电脑运行。谷歌当时每兆比特每秒的数据传输成本约为1400美元每月。因此他们不得不购买两兆比特每秒的带宽。一兆比特大约能支持每天百万次查询。从一开始,谷歌就以独特视角看待其服务器和基础设施。
Running off of 30 PCs in this colo center. It cost Google about $1,400 per month per megabit per second of data. And so they had to purchase two megabits per second at the time. One megabit was about a million queries per day. From the beginning, Google looked at its servers and infrastructure differently.
谷歌从未想过走传统老路——购买Sun或惠普的服务器来使用。
It was never interested in taking the tried and true path of buying the Sun or even HP boxes and and using those.
最能体现这一时期的标志性故事就是那个著名的软木板。早期谷歌工程师真的把主板钉在软木板上,用15美元的箱式风扇散热,用尼龙扎带固定。按照传统IT理念,这简直是离经叛道。
The most memorable story of this time that explains this is the infamous corkboard. So in the early days, Google engineers literally mounted motherboards on corkboard. They had $15 box fans pushing air across. They had zip ties holding it together. And in a traditional IT philosophy, this is heresy.
但他们的理念是:如果某个部件无法在集群规模上提升可靠性,就干脆去掉它。
But the idea was that if a part doesn't add reliability at fleet scale, strip it away.
这一切催生了一种质疑假设的核心理念,并最终实现了基础设施的深度垂直整合。有人可能会认为,这正是谷歌的超能力所在,至今依然如此。苹果将所有这种思考倾注于打造完美的iPhone,而谷歌则将其应用于数据中心。
This all set in motion this ethos of questioning the assumptions and ultimately deep, deep vertical integration of their infrastructure. Some might really argue that that was, like, Google's superpower and still remains so to this day. Apple's piling all of that thinking into making the perfect iPhone; Google's doing it to their data centers.
因此谷歌在这一早期阶段产生了多项创新。其中之一是关于如何实现电力备份。当时的行业标准是为整个设施配备大型电池组。然而谷歌很早就开始质疑这种模式,发现大型电池存在浪费。于是他们改为在每个服务器上安装小型电池。
And so Google had a number of innovations that came from this early time. One was around how to do power backup. So the standard was to spend for a facility-wide big battery. Well, early on, Google's questioning this model and seeing the waste in a large battery. And so they actually put little batteries on each server.
他们再次接受了单个服务器可能宕机的事实,但只要整个机群保持可靠就无妨。因此关键在于
Again, they were accepting that one might go down, but that was okay as long as the whole fleet was reliable. And so the key
实现这一切的核心是将可靠性提升至软件层。谷歌没有使用现成的文件系统和软件,而是自主构建了所有组件。他们预见到硬盘会损坏,因此开发了谷歌文件系统。这大约是在2001到2002年期间。
to all of this working was moving reliability up the stack into the software layer. Google was not using off the shelf file systems and off the shelf software. They built everything here themselves. They assumed that hard drives would break, and so they built the Google File System. This was, like, 2001, 2002.
这是一个分布式文件系统,能将文件分块后必定分散存储在三台不同服务器上。这样既能确保弹性,又能大幅加快搜索索引访问速度。2003年,他们开发了Borg集群管理系统,主要负责管理任务在哪些服务器和机器上运行,实质上是虚拟化的延伸,也是后来Kubernetes的前身,稍后我们会再谈到。除了这些软件组件,他们还质疑了许多关于服务器的物理限制。最著名的是,他们率先进行了热通道/冷通道气流隔离实验。
So this is a distributed file system that would take files, chunk them up, and spread them across three different servers no matter what. So you could always have resiliency, and, also, this would make accessing a search index much, much faster. They built, in 2003, Borg Cluster Manager, which ultimately was all about managing where jobs were running on which servers and which machines, and really was like an extension of virtualization and predates Kubernetes, which we'll talk about a little bit later. So these were all the software components, but they were also questioning many of the physical constraints that people had looked at with servers. And so most notably, they did some of the first experiments in really this hot aisle, cold aisle airflow containment.
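The file-chunking and three-way replication scheme just described can be sketched in a few lines of Python. This is a toy illustration of the idea only, not Google's actual GFS code; the chunk size, server names, and the independent-failure assumption in the availability math are all ours.

```python
import random

def place_chunks(file_size, chunk_size, servers, replicas=3):
    """Split a file into fixed-size chunks and assign each chunk
    to `replicas` distinct servers, GFS-style."""
    n_chunks = -(-file_size // chunk_size)  # ceiling division
    return {chunk_id: random.sample(servers, replicas)
            for chunk_id in range(n_chunks)}

def chunk_availability(server_availability, replicas=3):
    """A chunk is unreadable only if every replica is down at once
    (assuming servers fail independently)."""
    return 1 - (1 - server_availability) ** replicas

servers = [f"server-{i}" for i in range(10)]
layout = place_chunks(file_size=200, chunk_size=64, servers=servers)
print(len(layout))               # 4 chunks for a 200-unit file
print(chunk_availability(0.99))  # three cheap 99% disks -> ~99.9999% per chunk
```

The point of the sketch is the second function: individually unreliable machines become a reliable fleet once the software layer assumes they will fail.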
传统数据中心只是将发热的计算机堆放在房间里,然后向室内猛吹冷气。因此必须将室温保持在极低水平,常常冷得不舒适,却从未认真考虑如何排出热空气。后来业界逐渐认识到应该统一服务器朝向,主动排出热空气。谷歌发现这种做法越极致效果越好,于是用隔板将热通道与冷通道完全隔离,这样能更高效排出热空气,避免冷热气流混合。
Traditionally, data centers were just a bunch of hot computers in a room, and they would just blast cold air into the room. So you'd have to keep the room as cold as possible, often uncomfortably cold, and there was no real thought about getting the hot air out of the room. Over time, it became accepted wisdom that you should point servers in one direction, get the hot air out of the room, encourage that to happen. Google realized that the more you took that to an extreme, the better. So they would put sheeting up and isolate the hot aisle from the cold aisle because then you could actually extract the hot air more efficiently and not mix it back into the cold.
所谓冷热通道是指:冷空气需要吹过暴露的芯片和主板,将服务器热量带入热通道,然后热通道的空气需要被排出室外。谷歌因此成为优化气流设计的行业领导者。
And so what we mean by cold aisle and hot aisle is that cold air needs to blow across those exposed chips and motherboards that would extract the heat off of the server into the hot aisle, and then the hot air from the hot aisle would need to be extracted out of the room. And so Google became really a leader in designing the best airflow there.
这个概念持续了多年。你也想想传统数据中心,那种假设某些东西会出故障,并围绕这一点进行设计的理念。而在托管设施或租用这些机架时,每台服务器都必须正常工作,因为是不同公司在使用它们。不仅如此,二十年前还有专业清洁团队在数据中心巡视,他们会打扫房间然后分析清扫出的污染物,以了解如何持续优化这个空间的清洁度。用软木板和扎带围住设备并假设某些部件会故障的想法,确实彻底颠覆了当今存在的数据中心概念,正如我们将看到的,这反而带来了更好的性能。
Which is a concept that continued for years. You also think about the traditional data center and this idea of assuming something's going to fail, and you build around that. Whereas in a colo facility, or when you're renting out these cages, every server had to work because it was a different company utilizing it. And not only that, twenty years ago, you had specialty cleaning crews that were moving around the data center, and they would sweep up the room and then analyze the contaminants, what they had just swept up, to understand how to continue to optimize the cleanliness of the space. The idea of laying in a corkboard and a zip tie around it and assuming something would fail really takes the data center concept as it existed and completely flips it on its head, and then, as we'll see, drives better performance.
当他们建造第一个真正规模化自建数据中心时,这一切都汇聚到了一起。
All of this comes together when they build their own real first scaled homegrown data center.
我喜欢这个理念——不要用我们过去定义数据中心的方式来思考它,即一个塞满服务器、电源和连接设备的房间。要把它想象成一个仓库规模的计算机。乌尔斯希望我们不要将其视为许多机器,而是一台机器。
I love this idea of don't think of a data center the way we've been defining it, as a room packed with servers and power and connectivity. Think of it as a warehouse scale computer. Urs wanted us to think of it not as many machines, but as one machine.
所以你想想计算机的组件,对吧,存储、内存、计算单元。乌尔斯看到他们随时间在数据中心构建的内容,意识到他们其实是在建造一台大型计算机,只不过恰好是仓库的形状。
So you think about the components of a computer, right, the storage, the memory, the compute, and Urs saw what they were building out over time in their data centers and realized that they were just building a large computer that just happened to be the shape of the warehouse.
没错。如果你接受这个前提,就会停止追求每台服务器的完美,转而开始考虑如何让整个机群可靠运行,设计思路就完全不同了。这大约是在2004年。有个叫克里斯·萨卡的狂人,那时他还没成为著名投资人。
Yeah. And if you accept this premise, then you stop trying to make each server perfect, but you start thinking about making the fleet reliable, and you design entirely differently. And so this is about 2004. You have a wild man named Chris Sacca, before he was a famed investor.
据说他是个衣着邋遢的年轻人,在俄勒冈乡下四处寻找准备就绪的企业园区,想获得一些税收减免。他到处询问需要天文数字般的电力供应,以至于附近城镇怀疑他是恐怖分子并报告了国土安全部。但这正是他要找的。俄勒冈州的达尔斯有个场地适合他——30英亩紧邻一座退役的铝冶炼厂,那里曾经消耗的电力足以满足一个小城市的需求。萨卡欣喜若狂。
He was apparently a young, sloppily dressed individual walking around rural Oregon looking for shovel ready enterprise zones where he could find some tax breaks. And he's walking around asking for such astronomical quantities of power that allegedly a nearby town suspected him as a terrorist and called the Department of Homeland Security. But this is just what he was looking for. And The Dalles, Oregon had a site for him, 30 acres next to a decommissioned aluminum smelter that once drew enough power to power the needs of a small city. And Sacca was ecstatic.
他说这具有远见。这个没有税收收入的小镇意识到,若想将经济从制造业转型为信息产业,就必须铺设光纤。于是谷歌继续建设俄勒冈州达尔斯数据中心,运用所有创新技术来提升其性能。这意味着:来自哥伦比亚河的低成本稳定水电资源、邦纳维尔走廊提供的高压输电、凉爽干燥的气候让全年大部分时间都能经济运行冷却系统、沿河铺设的长途光纤,以及渴望新租户的小镇。这一切共同促成了谷歌首个仓库级计算机的诞生,降低了单位经济成本,提升了数据中心性能,从而优化了整个机群的经济效益。
He said it was visionary. This little town with no tax revenues had figured out that if you want to transform an economy from manufacturing to information, you've got to pull fiber. And so Google went on to build out The Dalles, Oregon and bring all of their innovation to bear to increase the performance of this data center. And so what that meant was low cost, steady hydropower from the Columbia River, the Bonneville Corridor providing high voltage transmission, a cool, dry climate that lets you run a cooling process most of the year that's very economical, long haul fiber that traces the river's edge, and a town that was hungry for a new tenant. This all came together to launch Google's first warehouse scale computer to drive down unit economics, improve the performance of a data center, and therefore, the unit economics of the entire fleet.
我认为这可能是首个从零开始构建的超大规模数据中心。他们建设它是为了解决自身问题,但这些问题目前正以如此快的速度扩展,以至于他们开始思考如何通过软件调优、虚拟化等手段优化所有这些环节——这成为了未来的新蓝图,至今仍是我们设计数据中心的普遍方式。
And I would argue, I think this is potentially the first built from the ground up hyperscale data center. They're building to solve their problems, but their problems are, at this point, scaling so rapidly that they're thinking about how to optimize all these things in a way tuned with software, tuned with virtualization, and all these things that this is the new blueprint for the future that is still how we're generally designing data centers today.
他们从一开始就领先于基础设施的思考,将其融入企业基因。其他公司也并非停滞不前,都在同步建设。这是一场竞赛,他们必须低调行事。谷歌率先在2006年投入运营。
They had a head start, thinking about infrastructure from the get go, making it part of their DNA. And it's not as if the other companies were standing still. They were building concurrently. This was a race, and they had to do it quietly. And Google was the first to get operational in 2006.
他们迅速在北卡罗来纳州和芬兰复制这种模式,通过测试不同气候条件来优化方案:利用海水冷却、采用热通道封闭系统等方法提升冷却效率和性能。
They quickly replicated this elsewhere in North Carolina and then in Finland, testing different climates to optimize using seawater, using hot aisle containment, other ways to optimize cooling and performance.
没错。他们还在持续突破其他领域,比如供电系统。传统上每台服务器都像插墙电一样,需要将交流电转为直流电,再为各组件降压。而谷歌发现,如果让机架直接接入高压直流电,就能实现单次高效转换并尽可能延后降压环节——高压意味着更少损耗,从而提升整体能效。
Yeah. There's others they kept pushing on, right, like the power supply system. So normally, each server, think about plugging into the wall, it's converting AC to DC and then stepping down DC for all the various components. Google instead realized that if they could just have a higher voltage DC current coming into the rack, they could then convert once cleanly and do it late as possible. And so that higher voltage, you'd have less loss because it's a it's a higher voltage, and you get efficiency gains across the whole path.
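The intuition behind the high-voltage distribution win is just Ohm's law: for the same delivered power, raising the voltage cuts the current proportionally, and resistive loss scales with the square of the current. Here is a back-of-the-envelope sketch; the 500 W load and 0.01-ohm path resistance are made-up illustrative numbers, not Google's figures.

```python
def resistive_loss_watts(power_w, voltage_v, resistance_ohm):
    """I^2 * R loss when delivering power_w at voltage_v through
    a distribution path with the given resistance."""
    current_a = power_w / voltage_v          # I = P / V
    return current_a ** 2 * resistance_ohm   # loss = I^2 * R

# Same 500 W server load, same 0.01-ohm path, two distribution voltages:
loss_12v = resistive_loss_watts(500, 12, 0.01)
loss_48v = resistive_loss_watts(500, 48, 0.01)
print(loss_12v / loss_48v)  # 4x the voltage -> 16x less resistive loss
```

Multiply that factor across every rack in a fleet and the case for converting once, as late and at as high a voltage as possible, becomes obvious.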
因此,在谷歌构建服务器的规模下,他们可以优化每一个环节
And so at the scale that Google's building servers, they can optimize every single part
这种48伏直流电系统,他们在2016年公开宣布实现了两位数的能效提升,随后被全球数据中心设计所采纳。我们如此关注性能和效率是有充分理由的:当时许多数据中心企业正因电力消耗承受巨额亏损,这直接影响盈亏线,必须重点解决。
of the stack. This 48 volt DC, they later announced in 2016 publicly with double digit efficiency gains that then the world could incorporate into data center design. Now we've talked quite a bit about performance and efficiency, and there's good reason for that. Because a lot of data center companies were hemorrhaging money on power. It was enough impact on the bottom line that there needed to be focus on this.
我们非常荣幸地采访到了其中一位
We had the distinct pleasure of talking to one of
数据中心领域的元老级人物克里斯蒂安·贝拉迪。他向我们讲述了将能效指标引入行业的精彩故事。这个指标叫PUE(电能使用效率),即数据中心总能耗与实际供电IT设备能耗的简单比值。
the OGs in data centers, Christian Belady. And he told us this fantastic story of bringing the metric for power efficiency to the industry. It's called PUE, or power usage effectiveness. This is a simple ratio of the total energy consumed by a data center divided by the energy consumed by the actual IT equipment that you're trying to power.
对吧?它能直观反映数据中心的运营开销。如果PUE值为2,意味着辅助设备能耗是服务器实际能耗的两倍。安娜,这个简洁而精妙的指标是怎么诞生的?
Right? So it very easily spits out what is the overhead in running your data center. If it's two, that means that you have twice as much energy use relative to the energy that's going to the actual servers. Anay, how did this metric come to be? Such a simple, beautiful metric.
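Since the metric really is just one division, it can be written out directly; the kWh figures below are hypothetical examples, not measurements from any real facility.

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: everything the facility draws,
    divided by what actually reaches the IT equipment."""
    return total_facility_kwh / it_equipment_kwh

def overhead_fraction(pue_value: float) -> float:
    """Share of total facility power that is NOT doing compute."""
    return 1 - 1 / pue_value

# A facility drawing 2,000 kWh to deliver 1,000 kWh to servers:
print(pue(2000, 1000))           # 2.0
print(overhead_fraction(2.0))    # 0.5 -> half the power is overhead
```

The same arithmetic is what made the metric land so hard: anyone could compute it from two meter readings and immediately see how much of their bill was cooling and lights.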
90年代末,克里斯蒂安·贝拉迪在惠普工作时,惠普有个日本客户NTT Docomo。作为行业资深人士,他通过惠普和行业经验总结出十大最佳实践。Docomo对此非常认可,表示要全面实施。
So Christian Belady was working at HP in the late nineties, and HP had a customer in Japan called NTT Docomo. And Christian had been in the industry for so long that he had formulated enough standards through HP and through the industry to come up with 10 best practices. And he took these 10 best practices to Docomo, and they were like, yes, this is great. We're going to implement all of this. He's like, fantastic.
三个月后验收时,他重返日本,看到满桌打印文件,所有人西装革履在闷热房间里汇报:'贝拉迪先生,我们完全按您说的做了,但服务器状态毫无变化。'
Come back in three months, and we'll do a review. And so he goes back to Japan, and they had printed out everything and had big stacks of paper on the desk, and everyone's in a suit and tie in a hot room. And they come back, Mr. Belady, we've done everything you said. Here's all of our reports, and nothing's changed. The servers are all still the same.
热通道更热了,没人愿意靠近,看起来毫无改善。
It's even hotter in the hot aisle, and no one wants to go back there. And so it doesn't seem like it's any better.
没错,我们进去还是热浪扑面,觉得这套方案行不通,打算恢复原来的做法。
That's right. We're we're still hot when we go in there. We don't think this is working. We're gonna go back to the way we were doing it.
这让克里斯蒂安抓狂——热通道更热、冷通道更冷正是关键所在!这种温差分离才能提升效率,要的就是空气不混合。他对此深信不疑。
And it drives Christian nuts. This is the whole point, that you actually want the hot aisle to be hotter and the cool aisle to be cooler. That separation is what drives the efficiency. You don't want the air to mix. So he knows in his bones.
他说,不。你正在以更高效的方式运行。你必须看到这一点。
He's like, no. You're running in a more efficient way. You have to see that.
在此之前,NTT无法看到变化,也无法衡量热通道封闭措施以及克里斯蒂安提出的其他建议带来的效率提升。因此,他发明了这个极其优雅的指标,并仅在惠普内部使用。他们在惠普继续实施了六年,直到他的好友克里斯·马龙建议他在2006年于肯·布里尔的Uptime Institute发表论文并进行展示。
Prior to this, NTT couldn't see the change or measure the efficiency gains of the hot aisle containment or the other recommendations that Christian was making. And so he invented this incredibly elegant metric, and he kept it internal to HP. They continued to implement it in HP for six years until his good friend Chris Malone suggested he publish a paper and present it at Ken Brill's Uptime Institute in 2006.
在Uptime Institute,他们发表了论文,并成立了一个名为Green Grid的新组织来发布PUE等指标。创始成员包括克里斯蒂安·贝拉迪、保罗·佩雷斯、布鲁斯·肖、拉里·维德尔。这些人后来成为戴尔CTO,在AMD领导大型团队。他们原以为这只是内部尝试,结果却在行业引发风暴——因为PUE竞赛就此展开,而谷歌志在夺冠。
At the Uptime Institute, they present the paper, and they kick off a new organization called the Green Grid to publish PUE and other metrics. It was Christian Belady, Paul Perez, Bruce Shaw, Larry Verdel. These folks later became the CTO of Dell, led big teams at AMD. And they thought it was just gonna be an internal kind of, here's the thing we tried. It hit the industry like a storm because now the PUE race was on, and Google wanted to win the race.
当时许多企业的数值在2左右,意味着照明、制冷等辅助设施消耗的电力是IT硬件实际用电量的两倍。谷歌将其降至1.1,即仅10%电力未用于计算设备供电。
Many, many enterprises had numbers around two, as in twice as much power went to the lights and the cooling and everything else as the amount of power that actually ran the IT hardware. Google pushed it down to 1.1, meaning only about 10% of the power is not used to power the compute.
他们的内部团队抓住这个理念持续推进,将数据中心彻底重构为一个计算机系统,用软件而非硬件实现可靠性层,并通过创新的地理位置布局来提升性能和PUE,最终达到几乎无法突破1的优化极限。
Their internal teams took hold of this and ran with it, and they used this rethinking of the data center from the ground up as one computer system, with software, not hardware, as the reliability layer, and innovative geographic placement to drive performance and drive down PUE, and it came to a point where it essentially can't be optimized much past one.
在深入讨论之前,我们先说说参观这些数据中心园区是什么体验。
Before we get too far ahead of ourselves, let's talk about what it is to go to one of these data center campuses.
当你驶入辽阔的开阔地带,会突然看到巨型建筑拔地而起。从空中看可能像物流中心,能看到建筑冒出些许蒸汽,但你就这样驶入这个园区。
You're driving up into vast open space where you suddenly see massive buildings growing out of the ground. From the sky, it maybe looks like a distribution center. You can see some steam flowing out from the building, but you drive into this campus as it is.
你开始注意到,如果这只是一个仓库,为什么会有如此多的电力基础设施?因为你看到了大型电池组、额外的涡轮机或发电机。
You start noticing why would there be so much power infrastructure if this is just for a warehouse? Because you're seeing big battery packs or extra turbines or generators.
你还会注意到安保无处不在。到达前台前,你必须通过一系列检查点。也许他们在为Netflix供电,但他们也在为五角大楼的运作提供电力。所以他们有企业义务,因此安全至关重要。
The other thing you'll notice is security is everywhere. You have to pass through a series of checkpoints as you get to the front desk. Maybe they're powering Netflix, but they're also powering Pentagon operations. So they have an enterprise obligation. And so security is paramount.
所以你获得了批准,成功通过了多层门禁。
So you're approved. You managed to get through multiple layers of gates.
于是就有了那个时刻,那个揭晓真相的瞬间——当你走进去,终于恍然大悟:一排排望不到尽头的机器阵列。
And so there's a moment, the moment of big reveal, when you walk in and it finally hits you. The rows and rows and rows and rows of machines that just go on further than the eye can see.
没错。这些设施比足球场还大,成排的机架闪烁着灯光,机器嗡嗡作响。你会注意到温度变化——刚进去时可能还算舒适。
Yeah. These facilities, they're bigger than a football field, and they're lined with these racks of blinking lights and machines. You're hearing this constant whirr, and you're noticing the temperature. Where you walk in, it might be fairly comfortable.
如今谷歌数据中心的平均温度是80华氏度(约26.7摄氏度)。过去是什么样呢?
I think 80 degrees is average in a Google data center today. What did it used to be like?
以前必须保持这些机器冷却,最好的方法就是让整个房间制冷。所以里面冷得像冰窖,整个房间都开着强力空调。
You had to keep these machines cool, and the best way to do that was to chill the room. So it was frigid. The entire room was blasting with AC.
回到八十年代,你只能一直穿着那件厚毛衣。现在他们能舒适地将温度维持在80度左右,这得益于他们对冷热通道的出色隔离。现在你已适应了机器的噪音,他们给你耳塞让你更舒适,当你抬头,就能看到支撑这些服务器运行的基础设施和连接它们的线路。谷歌以色彩鲜艳的线缆和清晰的管道标识著称,但过去并非如此。若回溯到八十年代,我想情况大不相同。
Back in the eighties, you'd just wear that thick sweater all the time. Now they get to comfortably hover around 80, and that's because of that really good containment that they did of hot and cold aisles. Now you've taken in the machines, you've adjusted to the noise, they've given you earplugs to put in to get comfortable, and you look up and you start to see the infrastructure that's coming in and powering these servers and connecting them. Google had famously bright colored wires and marked where the plumbing is, but it wasn't always like this. If you go back to the eighties, I think it was quite different.
如今你们直接在地板上建造,但几十年前冷却数据中心的方法之一是抬高平台。因此你低头会看到自己走在面板上,下方隐藏的空间里正输送着冷空气。
You now are building right on the floor, but one of the ways to cool the data center decades ago was to raise the platform. And so you actually would look down, and you'd see that you're walking on panels above a hidden space where cold air is being pushed through.
根据我们讨论的谷歌推动行业发展的趋势——如何以最经济高效的方式完成目标?随着时间的推移,高架地板逐渐被淘汰。希望这能让你对走进这样的数据中心有些许体会。谷歌有个不错的播客《互联网栖息之地》,对此有非常详尽的导览。
Per the trend of everything we've talked about with Google driving the evolution, what is the cheapest way to most efficiently do the thing we're doing? And it stops being those raised floors over time. Hopefully, that gives you a little taste of what it's like to walk into one of these. There is a nice Google podcast called Where the Internet Lives that gives a really nice tour of exactly this.
他们为此制作了优质的谷歌与Latitude媒体内容,其中你
They bring a good Google and Latitude media production to it where you
能真切听到他们在数据中心内部录制的声音,那些嗡鸣声非常清晰可辨。谷歌的投资持续增长,垂直整合程度不断加深。他们扩建自有站点,开始收购光纤网络。
can actually hear the sounds as they record inside the data center. So the whirring is very palpable. Google's investments continue to grow. They continue to vertically integrate deeper and deeper. They grow their own sites, they start buying up their own fiber.
他们利用了我们之前提到的部分暗光纤,大量收购了这些资源。这一切都是容量战略——最终他们希望通过容量冗余来降低成本,同时别忘了,他们正构建着全球利润率最高的搜索与广告业务。这个良性循环让他们能持续投资基础设施。作为互联网公司,谷歌面临着爆炸性需求。
They took advantage of some of that dark fiber that we mentioned earlier, snapping a lot of it up. And all of this is a capacity strategy. Ultimately, they want to have the capacity redundancy to drive their own costs down, while mind you, they're building the highest margin best business in the world in the form of search and ads. So that whole flywheel is allowing them to continue to invest in this infrastructure. Google had this explosive demand as they are the web company here.
其内部使用的爆炸性增长推动了垂直整合的价值。但整个行业都看到了AWS推出后的发展态势,谷歌和许多公司一样意识到自己也有可投入的资源。于是他们组建团队在2008年推出了App Engine,允许用户在谷歌基础设施上运行应用,但对应用类型有严格限制——仅支持特定框架的Python,这与亚马逊EC2的基础模块化方案截然不同。直到2013年,谷歌才推出通用虚拟机服务。
They have explosive internal use driving the value of the vertical integration. But the whole industry sees what's going on with AWS after its launch, and Google is one of many companies that realizes they have some assets to put towards that too. And so they put together a team that launches App Engine in 2008. It lets you run apps on Google's infrastructure, but was very opinionated about the app. It only supported Python with a particular framework, so it was quite different than Amazon's basic building block EC2 approach. And it wasn't until 2013 that Google launched a general purpose virtual machine capability.
由于这种策略以及谷歌缺乏企业销售和营销机制,他们的云项目起步相当缓慢。直到2014、2015年情况才开始转变。谷歌公开了其容器化战略。还记得我们讨论虚拟化时说过,其实你不需要打包整个操作系统。你并不关心它运行的是Windows还是Linux。
As a result of this approach, as well as Google's lack of enterprise sales and marketing motion, it took quite a while for their cloud program to get running. Things started to shift in twenty fourteen, twenty fifteen. Google gets public about its containerization strategy. If you recall our discussion of virtualization, well, you don't actually need to package up the whole operating system. You don't really care that it's running Windows or Linux.
通常你真正需要的只是运行应用程序。容器本质上是在应用层面实现这一点的方式。最近这种趋势更进一步,出现了无服务器函数的概念。
You actually generally just want to run the application. Containers are really a way to do that at the application level. You see this continue even further more recently with the idea of serverless functions.
因此谷歌在2014、2015年通过开源Kubernetes和谷歌Kubernetes引擎震撼了整个行业。它迅速成为程序员使用容器并摆脱虚拟机的默认方式。不过他们当时仍缺乏企业销售合作能力,但Ben,你当时就在场对吧?
And so Google in twenty fourteen, twenty fifteen took the world by storm by open sourcing Kubernetes and the Google Kubernetes engine. It very quickly became the default way for programmers to use containers and migrate away from virtual machines. Now they still didn't have an enterprise sales partnerships muscle, but Ben, you were there at that time, right?
没错。于是他们请来了我们之前谈过的VMware创始人Diane Greene来帮助建立这种能力。到2018年,他们的云市场份额达到7%,AWS是34%,Azure是15%,位列第三。进入2019年后,新任领导者上任至今,他大力推动业务并实现快速增长。现在他们的云业务规模已达130亿美元,增速32%。
That's right. And so that's when they brought in Diane Greene, who we previously talked about was the founder of VMware to help build that muscle. So by 2018, their cloud share was 7%, AWS was 34%, and Azure was 15%, so they were a distant third. Following through to 2019, a new leader came in who's been the leader since, and he has made big pushes and is growing massively. Their cloud business is now a $13,000,000,000 business, growing 32%.
总结来看,谷歌的发展历程很有趣。他们技术上始终领先,尤其内部如此。而且一直拥有优秀的'吃自家狗粮'文化(指内部员工使用自家基础设施)。问题在于他们不知道如何向企业客户销售。他们并非真正的开发者平台,而亚马逊在这个市场已经取得了突破。
Net net, you look at Google's story, and it's interesting. They've always been technically ahead, especially internally. And they've always had an amazing dog fooders, meaning people inside using their infrastructure. The problem was they did not know how to sell to this customer. They were not really a developer platform, especially to the enterprise, and this is a market where Amazon had cracked that.
尽管拥有整个生态系统中最顶尖的技术,他们仍不得不学习新技能。另一个教训是谷歌的垂直整合有时走得太远。他们为自己找到了最佳方案,但这可能导致对非谷歌生态用户需求的盲区。比如认为每个开发者都会采用谷歌发布的新Python框架,这种想法有些自我——实际上工程师可能只想要最大灵活性,而这正是亚马逊深刻理解的。
They really had to learn new skills despite having the best tech in the entire ecosystem. I think the other story is that Google's vertical integration sometimes went too far. They figured out the best way to do this for themselves, but sometimes that led to blind spots in what someone not in the Google ecosystem might need. So the idea that every developer is gonna adopt a new Python framework that Google releases is a little self serving when it turns out engineers might just want maximum flexibility, which Amazon deeply understood.
这一切始于乌尔斯为行业带来全新的思维方式:将数据中心视为持续演进的单一机器,定制能自我容错的硬件,将建筑选址于热力学和电网最优位置,并公开计算公式。展示性能指标和PUE(电能使用效率),让所有人产生错失恐惧症(FOMO)并争相追赶。
And it all starts with Urs giving the industry an entirely new way of thinking about it. Treat the data center as a single evolving machine. Build the hardware to suit software that will survive its own failures. Put the buildings in thermodynamically optimal and grid optimal locations, and publish your math. Showcase your performance and your PUE so that everyone can have FOMO and try and play catch up.
当时谷歌正斥资数十亿美元购置土地、建筑、硬件,设计自己的基础设施,并通过开发整合这一切的软件颠覆数据中心行业,迫使整个产业按照谷歌等超大规模运营商的标准来竞争。市场竞争正日趋白热化,对吧?不仅是亚马逊和谷歌,更多家喻户晓的企业也在悄然转型。
So Google was spending billions on land, buildings, hardware, designing its own infrastructure, and overturning the data center world by building software to bind it all, which forced the entire industry to meet hyperscalers like Google on its own terms. Competition was heating up. Right? It wasn't just Amazon and Google. There were even more household names quietly transforming themselves.
到2000年代中期,微软无疑是软件界的王者。Windows系统占据90%的桌面市场,Office办公软件简直是印钞机。但时代正在改变——宽带普及率持续攀升。
So by the mid two thousands, Microsoft was unquestionably the software king. Windows was powering 90% of desktops. Office was licensed to print money. But things were changing. Broadband penetration was climbing.
那时我们开始看到联网手机的出现,比如BBM和黑莓设备。互联网原生企业目睹谷歌和亚马逊的崛起,证明无需购买光盘软件也能运行有趣的计算应用。记住,正如我们讨论的,云计算不仅关乎后端开发——应用程序本身正从后端到前端都向浏览器迁移。
We started to have the emergence of connected phones, BBM and BlackBerrys and things like that at the time. And Internet native companies, they saw Google and Amazon building and climbing and proving that you didn't need to buy software in a CD to run interesting computing applications. Remember, the cloud, as we've been talking about, isn't just about back end developers. Applications themselves were starting to move not just the back end, but the front end to the browser as well.
完全在浏览器里运行。记得Salesforce是这方面的先驱,他们以'无需安装软件,全在浏览器'的理念起家,到2005年营收已接近2亿美元。
Right on the browser. You remember Salesforce was leading the charge here. They launched with this idea of no software, all in the browser, and by 2005, they're doing nearly $200,000,000 in revenue.
并成功占领用户心智,证明这不仅是消费级娱乐应用的未来,更是像CRM这样的商业产品的未来。
And capturing the mind share, proving that this could be the future not just of consumer fun applications, but also a business product like a CRM.
企业级B2B软件。
Enterprise b to b software.
所以当你坐在微软的办公室里,这简直是最可怕的噩梦——你可能与未来的应用和操作系统无缘,而这正是微软的全部业务。此时的微软并非没有网络服务,十年前盖茨就发出过互联网浪潮的备忘录,他们投资收购并扩展了Hotmail,拥有MSN和Xbox Live,但这些都只是各自独立的服务。
And so you're there sitting at Microsoft like, that is your greatest fear, that you're not gonna be part of the application and operating system of the future. That is your whole business. Microsoft at this point, it's not like they didn't run any web services. Ten years prior, Gates had sent his memo of the Internet tidal wave. They'd invested, they'd bought and scaled Hotmail, they had MSN, they had Xbox Live, but these were all these self contained services.
他们当时还未考虑如何向他人开放基础设施以供建设。他们热爱构建最终能运行他人基础设施的操作系统。通过这种方式,他们建立了史上最成功的企业之一。他们只需编写一次代码,就能无限量压制光盘。这造就了令人惊叹的超高利润率业务。
They hadn't yet thought about how to expose an infrastructure to others to build on. They loved building operating systems that would ultimately run others' infrastructure. They had built one of the best businesses in history doing this. They write this code once, and they print as many CDs as they can. And it's this amazing, amazing, high margin business.
他们无需购买他人的硬件设备。
They didn't have to go buy other people's hardware.
他们压制光盘,如同印钞。而基础设施即服务这一理念在当时堪称激进。在华盛顿州雷德蒙德,微软的这次转型被视作生死攸关的转折点。
They print the CDs. They print the money. And this idea of infrastructure as a service was radical. Inside of Redmond, Washington, the pivot for Microsoft was framed as nothing less than existential.
正如这类企业转型中常见的情况,核心在于人才争夺。为此他们收购了Groove Networks公司,引入了曾主导Lotus Notes开发的传奇人物雷·奥兹,任命他为微软联合首席技术官。其重要性堪比当下Meta招揽人才的战略——这就是那个时代的顶级人才争夺战。
So as is often the case in a company motion like this, it's about people and talent. And so they bought a company, Groove Networks, that brought in the famous Ray Ozzie, who had previously led the development of Lotus Notes, and they brought him into Microsoft and made him a co-CTO of the company. This was so existentially important. Think about what Meta's doing today to staff up talent. It was the equivalent at the time.
云计算是生死存亡的关键。我们需要云计算领域的领军人物雷·奥兹来引领这次变革。
The cloud is the existential thing. We need the leader of the cloud, Ray Ozzie, to come in and lead the way through this charge.
如果我们准备押上全部身家进行这次转型,就必须引进最顶尖的人才。
If we're gonna bet everything we've built on this pivot, we're gonna bring in the best talent.
于是他们收购了公司,委以重任,并发布了这份五千字的宣言,阐述微软未来必须采取的方向。全文值得细读,我会在节目备注中附上链接。不过,阿娜亚,你能读读那个最引人注目的关键段落吗?
And so they buy the company, they put him in charge, and he writes this 5,000 word manifesto about what the future of Microsoft needs to be. It's worth reading the whole thing, and I'll link to it in the show notes. But, Anay, would you read that key section that really stood out?
雷在备忘录中提到,计算和通信技术已显著且持续进步,使得基于服务的模式变得可行。宽带和无线网络的普及改变了人们的互动方式,他们越来越倾向于使用简单易用、即开即用的服务及服务化软件。
And so in Ray's memo, he says, computing and communications technologies have dramatically and progressively improved to enable the viability of a services based model. The ubiquity of broadband and wireless networking has changed the nature of how people interact, and they are increasingly drawn towards the simplicity of services and service enabled software that just works.
他最后总结道,企业正越来越多地考虑基于服务的规模经济如何帮助降低基础设施成本,或按需以订阅方式部署解决方案。
And then he ends that saying, businesses are increasingly considering what services based economics of scale might do to help them reduce infrastructure costs or deploy solutions as needed and on a subscription basis.
只为用户提供他们想要的最终结果,并揭开所有技术过程的神秘面纱。
To just give people the end result that they want and demystify everything that's happening.
没错。他最后强调'我们必须快速果断地响应'。对于微软这样规模和历史的公司,这确实是一记警钟。
That's right. And he ends with, we must respond quickly and decisively. For a company of Microsoft's scale and history, that's really trying to be a wake up call.
盖茨和鲍尔默难道不是给了雷·奥兹一张白纸吗?就像在说:'你是这里的领导者,帮我们看清未来方向'。
And didn't Gates and Ballmer kind of give Ray Ozzie a blank slate? Like, you're the leader here. Help us understand the way the future is.
不仅如此,他们还允许他将这块业务独立于微软服务器与工具事业部(该部门曾开发Windows Server和SQL Server,本是尝试这些新业务的理想部门)来运营。因为他们明白,这不仅是原有软件业务的渐进式发展,而是一场涉及产品本质、商业模式和市场策略的根本性变革。当一切都需要如此改变时,让现有业务领导者继续主导就变得异常困难。具体来说,这对服务器或数据中心意味着什么?微软当时已拥有Windows Server和SQL Server。
Not just that, they let him carve out and run this separate from Microsoft Server and Tools business, which had built Windows Server, had built SQL Server, would've been the obvious place to try these things, but they knew that this was bigger than an evolution from what they were doing in their previous software business. This was a transformation, a fundamentally different product and different business model and different go to market motion. So when everything has to change like that, it is very hard for the leader of the incumbent thing to keep going. Let's make this concrete, what does this mean for a server or a data center? Microsoft, again, had a Windows Server, had SQL Server.
我们之前谈到的企业服务器机房,通常都在运行微软软件。这正是他们当时已臻完善的IT销售模式:销售Windows Server许可证,配置Active Directory,运行Exchange和Outlook来处理邮件。
When we talked about those server rooms that a company would run, they would very often be running Microsoft software. This is this IT sale that they had perfected by this point. You would sell the Windows Server license. It would be Active Directory. It would be running Exchange and Outlook for your email.
它将在你公司的网络内运行你的日历和所有这些功能。它将在他们的个人电脑上运行Windows系统。
It would be running your calendar and all of these things inside your company's network. It'd be running Windows on their PCs.
因此这是一个完美互联的整体。但这意味着不,不,不。我们必须将那台服务器从你公司资本支出的底线转移到运营支出,转移到我们的数据中心。
And so it's this beautifully connected thing. And this is saying, no. No. No. We've gotta move that server from your bottom line CapEx as a company into OpEx, into our data centers.
那么这是一次从你最终提供的、仅仅在他人计算机上运行的软件,向实际服务的根本性转变吗?
And this is a transformational move from what you're providing at the end of the day from just software that's gonna run on someone else's computer to actual services?
雷·奥兹决定推进这个项目。他组建了一个团队,将项目命名为'红狗',并请来了曾领导Windows NT开发的传奇程序员戴夫·卡特勒,协助领导这个新项目的架构设计。
Ray Ozzie goes for it. He staffs up a team. They call the project Red Dog, and they bring in Dave Cutler, who had led Windows NT prior and is a legendary programmer, to help lead the architecture of this new project.
他们正在整合这套云战略,这既重塑了微软内部结构,同时也在互联网门户领域与谷歌展开了正面竞争。他们将老旧的MSN搜索产品逐步演变为后来的Live搜索,最终在2009年推出了我们现在所知的必应。这一刻在整个行业引发了涟漪效应,特别是对谷歌的统治地位产生了冲击。
So they're putting together this cloud strategy that is revamping Microsoft internally, and at the same time ramping up a direct competition to the front door of the internet, Google. They evolved an old MSN search product into what then became Live Search and eventually Bing, as we know it, in 2009. And this was a moment that kind of rippled across the industry, especially for Google's dominance.
我当时就在微软工作。是的,我2008年大学毕业后加入微软,正好见证了这段时期——他们从Windows Live搜索起步,短暂更名为Live搜索,然后进行了必应的大规模品牌重塑。微软愿意投入巨资进行这次重新定位和营销,甚至给雅虎提供了非常优厚的协议来支持雅虎的搜索和广告业务,以此获取部分基础设施市场份额。
I was at Microsoft at this time. Oh, yeah. I joined Microsoft out of college in 2008 and was there in this moment. They had Windows Live Search, it was briefly called just Live Search, then came this big rebrand to Bing. And Microsoft was willing to spend a lot on this repositioning, on this marketing, so much so that they gave Yahoo a very sweet deal to power Yahoo's search and ads and gain some infrastructure market share.
这使他们一举获得了高达20%的市场份额——在一个由谷歌主导的爆发式增长市场中,这绝非易事。
This catapulted them into getting up to 20% share, which is no small feat in an exploding market that Google has been dominating.
必须记住的是,做好搜索和搜索广告是一个极其困难的规模问题,本质上只有另一家公司——谷歌——成功破解了,能够快速响应互联网的实时更新索引,并为合适的广告单元进行竞价。所有这些必须构建的内容推动了谷歌建立其垂直整合的服务器,而微软现在正将自己置于必须解决同样极其困难的技术问题的位置。
It's important to remember that, like, doing search well and search ads well is a really hard scale problem that essentially only one other company had cracked, that was Google, to respond quickly with this active updated index of the Internet and do this auction for the right ad unit. And all of these things that you have to build is what propelled Google to build their vertically integrated servers, and Microsoft was now putting themselves in the position to have to solve those same really hard technical problems.
除了这些极其困难的技术问题外,我们从谷歌看到,要有效运营搜索业务,背后必须有强大的计算能力。现在必须进入数据中心业务,建设自己的基础设施以降低成本、提升性能。之前提到过克里斯蒂安·贝拉迪,他2007年还在惠普,接到微软的招聘电话。他的第一反应是:一个机械工程师在微软能干什么?
In addition to those really hard technical problems, we saw with Google that in order to effectively run this search business, you have to have a lot of compute power behind you. You now have to get into the data center business and build out infrastructure of your own to drive down cost and drive up performance. We mentioned Christian Belady previously. He was at HP in 2007 and gets a recruiting call from Microsoft. And his initial reaction is, what the hell is a mechanical engineer gonna do at Microsoft?
因为微软以软件公司著称。所以他拒绝了几次。第三次电话后,他决定去面试,喜欢这个过程,拿到了offer,但没太当回事。我记得他当时在父母家,突然收到一封来自Billg@microsoft.com的邮件,解释为什么他应该接受这份工作。
Because Microsoft is known as the software company. And so he turns them down a couple times. After the third call, he decides to go up for an interview, enjoys the process, gets an offer, but doesn't think much of it. And I believe he's at his parents' house when he suddenly gets an email from billg@microsoft.com explaining why he should accept the job offer.
比尔基本上说,一切都在向云端转移。我们正在为云业务投资,需要将这种新基础设施作为微软未来的核心来建设。
And Bill essentially says, everything's going to the cloud. We're investing for the cloud business, and we need to build out this new infrastructure as core to the future of Microsoft.
这看起来是个疯狂的想法,但比尔·盖茨给我发邮件说,既然你在父母家,不妨试试。收拾行李搬到西雅图地区,加入微软吧。于是这时他入职了,微软刚完成第一个真正集成化的数据中心建设,位于华盛顿州东部的昆西,13兆瓦的规模。
This seems like a wacky idea, but if Bill G is emailing me, you've gotta give this a go. Let's pack our things, move to the Seattle area, and join Microsoft. And so at this point he starts there, and Microsoft had just finished their first real, more integrated data center build out. This is in Quincy, in Eastern Washington. A 13 megawatt build.
他走进去时,所有人都在东张西望。他们的第一个念头是:我们会被炒鱿鱼的,因为根本不可能用服务器填满这个地方,我们的需求绝对达不到。
And he walks in there, and everyone's looking around. And the first thought they have is we're gonna get fired because there's no way we're gonna fill this thing with servers. There's just no way that we have the demand.
这是他们第一个大规模专门建造的云数据中心园区,标志着微软正式进入基础设施业务。
And this was their first massive scale purpose built cloud data center campus. This was Microsoft entering the infrastructure business.
与我们在谷歌达尔斯数据中心看到的情况类似,吸引他们来到昆西的许多相同因素都源自哥伦比亚河的同源电力。当时他们以每千瓦时1.9美分的低价电力驱动设备,当地干燥凉爽的气候也使得全年大部分时间都能经济高效地使用外部空气冷却。那里有多条光纤线路——毕竟位于华盛顿州,可以直接连接他们在雷德蒙的总部。
And similar to what we saw with Google at The Dalles, a lot of the same things attracted them to Quincy. There was power from the same Columbia River, at a cheap 1.9¢ per kilowatt hour at the time, and the dry, cool climate allowed them to be really economical by using outside air cooling for much of the year. They had multiple fiber routes. It's in Washington, right, so they can connect their Redmond headquarters directly to Quincy.
环境、光纤、电力,这些要素完美结合。和谷歌一样,他们深知必须与当地社区和政府合作。因此微软实际与昆西市合作建设了水循环系统,通过处理并重复利用冷却水来降低对当地水源的依赖。他们建成了昆西数据中心并投入运营。
The environment, the fiber, the power, it all comes together. And they, like Google, knew they had to work with the local community and the local government. And so they actually worked with the city of Quincy to build a water reuse facility that treats and recirculates the cooling water to reduce the demand on the local water supply. They built Quincy. They have it up and running.
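To put that 1.9¢ rate in perspective, here's a rough back-of-envelope sketch. The 13 MW figure comes from the episode; running at full continuous draw is our own simplifying assumption.

```python
def annual_power_cost(capacity_mw: float, cents_per_kwh: float,
                      utilization: float = 1.0) -> float:
    """Rough annual electricity bill, in dollars, for a facility drawing
    the given average fraction of its capacity all year."""
    hours_per_year = 24 * 365  # 8,760 hours
    kwh = capacity_mw * 1000 * hours_per_year * utilization
    return kwh * cents_per_kwh / 100

# Quincy's 13 MW at 1.9 cents/kWh, assuming full continuous draw:
print(f"${annual_power_cost(13, 1.9):,.0f}")  # roughly $2.2M a year
```

Even a fraction of a cent per kilowatt hour moves the annual bill by hundreds of thousands of dollars at this scale, which is why siting next to cheap hydro mattered so much.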
他原本担心数据中心填不满,但建设仍在继续。Azure最终在2008年上线(我们稍后会详述)。但我认为,对于投入大量资源进行内部建设的大公司而言,很容易自认为能预判需求规模。他们在芝加哥过度建设后曾将其封存并试图出售,结果九个月后却庆幸没卖掉——因为那时他们急需空间。
He's worried they're not gonna fill it, but they kept going. Azure ends up launching in 2008, which we'll talk about in a moment. But I think it's very easy for a big company that's bet a lot of resources building something internally to think that they have an idea of what demand's gonna look like. They overbuilt in Chicago, they mothballed it, they tried to sell it. And then nine months later, they were so happy they didn't sell it, because they needed the space.
这简直太滑稽了,他们居然试图卖掉它
It was hilarious. They tried to sell it
给政府。那么实际情况如何呢?当时他们正在启动'红狗'项目,也就是后来的微软Azure。基础设施投入使用时的规模效应确实难以预测。让我们回到2008年10月那个重大发布的时刻。
to the government. So what's going on there? What's happening is they're launching this project Red Dog, which would become known as Microsoft Azure. And it's hard to predict the scale of the utilization of infrastructure as you launch it. So let's go back to October 2008, the big reveal.
这是微软年度专业开发者大会,当时微软揭晓的并非Microsoft Azure,而是Windows Azure。
This is Microsoft's big annual professional developers conference, and Microsoft unveils not Microsoft Azure, at the time, but Windows Azure.
这不仅仅是把Windows移植到其他地方运行。它是一个供开发者在微软基础设施上构建和托管应用的平台,采用按使用量付费的模式——租用存储和计算资源,用多少付多少。
And this wasn't just Windows running somewhere else. It's a platform for developers to build and host applications on Microsoft infrastructure, paying only for what they used. Rent some storage and compute, pay for what you use.
你提到的关键点是它专为Windows开发者设计。因此它使用了所有熟悉的工具,比如.NET框架和Visual Studio等,让你能轻松地将原本在办公室Windows服务器上运行的工作代码,直接迁移到微软服务器上运行。这是多么强大的整合与转变。它在AWS提供的这些非常基础、无预设的构建块(如S3和EC2,让你自带操作系统和全套配置)之间开辟了一条新路。
And the key thing that you mentioned is it's designed for those Windows developers. So it used all the familiar tools, you know, the .NET framework and Visual Studio and all these things, and it made it easy for you to take the work that you'd done over there, code that would run on Windows Server inside of your office, and shift that same code to run on Microsoft servers. What a powerful integration and move. It kind of threaded the needle between AWS, which gave these really basic, unopinionated building blocks, S3 and EC2, run your own operating system, bring your own full thing, and we'll just make it work.
没错。AWS对你在其上运行的内容持中立态度,
Right. They were agnostic to what you were doing on top of it and
以及你的运行方式。正是如此。相较于Google App Engine那种过度具体甚至有些花哨的做法,微软的方式反而更接近谷歌的思路,但开发者使用的是相同的代码和工具。因此他们拥有庞大的开发者生态系统、集成商以及围绕这一切的完整运作机制。
how you're doing it. Exactly. And compared to Google App Engine, which was overly specific and cute about it, Microsoft's approach was in a way almost more like Google's, but with the same code and the same tools a developer had already used. And so they had this massive developer ecosystem and integrators and this whole kind of motion around it.
这不仅对开发者社区很友好。别忘了微软拥有企业级销售体系,他们有成熟的市场推广策略,财务部门清楚销售团队如何运作及如何预算。因此他们为Azure带来了世界级的销售渠道,更重要的是,获得了这个庞大客户群的信任。
And it wasn't just easy for the developer community. This is Microsoft that has an enterprise sales engine. They've got a go to market motion. They have the finance department that knows how the sales team is gonna operate and how to budget for it. And so they're bringing this world class distribution and most importantly, the trust from this massive customer base to bear with Azure.
采用速度极快,因为人们已经深植于这个生态系统。而你带来的是一款改进后的产品,企业用户就能非常顺畅地接受它。
And the adoption is extremely rapid because people are already embedded in the ecosystem. And here you are coming with an improved product, and the enterprise base adopts it very seamlessly.
是的。这不是亚马逊需要首次讲述的故事——'来云端与我们共建'这样的概念引入。这是四五年后,IT部门还在思考:我们该如何看待云计算?我们与亚马逊没有合作协议。
Yeah. And it's not like the story Amazon had to tell for the first time: hey, come build your thing with us in the cloud. This is four or five years later, and IT is sitting there like, how do we think about the cloud? We don't have an agreement with Amazon.
那不是我们的供应商。而面对微软时,你会觉得:这是我们现有的协议,这是我们与IT部门的关系,这是完全的信任。如果你告诉我们这是能以无缝、安全、可靠的方式迁移到云端,并能与Active Directory和Outlook集成的方法——只要不用再在机房自己运维服务器,我立刻签约。
That's not a vendor. Versus Microsoft, you're like, this is the agreement we have. This is the IT relationship we have. This is the full trust. If you're telling us this is how we can move to the cloud in a seamless, secure, safe way that's gonna integrate with Active Directory and Outlook, it's just I don't have to run the machines in my closet anymore, sign me up.
讽刺的是,我认为最难注册的反而是华盛顿州雷德蒙德的工程师和产品经理们。我们当时正与《Acquired》节目的本·吉尔伯特讨论此事,他在2012、2013年间负责开发网页版Microsoft Word。微软终于要在产品层面与Google Docs正面对决,推出可通过浏览器使用的Word版本。当然,他们希望所有人都使用Azure——但成功了吗?
Ironically, I think some of the people that had the hardest time signing up were the engineers and product managers in Redmond, Washington. We were actually talking with Ben Gilbert of Acquired about this, and he, at the time, this was 2012, 2013, was working on a web version of Microsoft Word. So Microsoft was finally gonna compete head to head with Google Docs on product and ship a version of Word that you can use via your browser. And of course, the desire was for everyone to use Azure. Did they?
完全没有。他们直接拒绝说'不,我们需要这种定制服务器。需要这么多台,能扩展到这种规模'。我们还不得不做些自定义JavaScript渲染之类的工作。
Absolutely not. They said, No, this is the custom server we need. We need this many of them. We can scale this big. And we had to do some custom JavaScript rendering, all this stuff.
我们要用自己的服务器,可能还在昆西市,但绝不会用Azure的API和中间层。
We want our own servers, probably still in Quincy, but we're not using the Azure APIs and layers.
所以自家员工要求这种异构性,结果导致服务器SKU种类极其繁杂。
So their own employees are demanding all this heterogeneity, resulting in a wide variety of server SKUs.
作为对比,亚马逊在构建AWS时用了约20种机型,而谷歌在整个建设过程中标准化了垂直集成的通用服务器设计,仅维持少量机型。据说微软当时维护着数十种不同的SKU,因此有人戏称这不是服务器农场,而是服务器动物园。
Amazon, for context, building out AWS, I think had something like 20 different types of machines running AWS. Google, in their whole build out, standardized on this vertically integrated, commodity built server design, so they kept it down to a handful of machine types. Supposedly at Microsoft, there were dozens and dozens of different SKUs being maintained. So instead of calling it a server farm, someone called it a server zoo.
只要想想这些不同网络服务的演化历程,以及各团队各自为政的开发方式,就能明白这完全是另一个世界。大约此时,能源问题也开始成为值得关注的新层面。
You just think about where they're coming from in the evolution of all these different web services and different teams building their own thing. It's a whole different world. And around this time, I think energy starts to become an interesting layer as well.
就像克里斯蒂安2007年被招募时疑惑'我去微软能干什么'一样,他们在2011年又招募了能源专家布莱恩·贾纳斯。微软意识到需要内部配备能源专家。
So similar to Christian getting recruited in 2007 thinking, what the hell am I gonna do at Microsoft, he then goes out to recruit an energy expert, Brian Janous, in 2011. Microsoft saw the need for an energy person in house.
我认为微软看到了需求,但我觉得布莱恩当时很困惑。
I'd say Microsoft saw the need, but I think Brian was confused.
布莱恩当时的想法是:我是搞能源的。你们不需要能源方面的人。他认为这是个没有前途的工作。在软件公司当能源专家毫无意义,因为虽然他们要做的事都跟土地、光纤有关,但能源在扩展这类基础设施的优先级列表上只能排第三甚至更靠后。
Brian was like, I do energy. You don't need an energy person. And his thought was, this is a dead end job. Being an energy person at a software company doesn't make sense, because what they're trying to do is all about land, it's all about fiber; energy is a distant third or lower on the priority list of how you think about scaling this type of infrastructure.
是的。要知道那时候,公共事业公司巴不得你带着10兆瓦(比如昆西的13兆瓦数据中心)找上门,他们会说'太好了,签合同吧'。
Yeah. I think at this time, for context, utilities were happy if you'd show up with your 10 megawatt or, you know, Quincy's 13 megawatt data center, and they were like, sure. Great. Sign up.
没错。当时电力需求总体平稳,电力供应过剩,他们很乐意签合同增加利润,因为还不需要为此新建任何基础设施。
Yeah. Generally, power demand is relatively flat. There's excess power available, and they're happy to sign you up and increase their profits, because it's not gonna drive any new infrastructure needs on their end yet.
世事变迁啊。这是2012年的事。转眼十多年过去,如今的情况就大不相同了,我们稍后会讲到。
And how things change. This is 2011, 2012. Fast forward a little more than a decade to where we are now, and we'll get there.
但布莱恩接受了这份工作,最终打造了一支卓越的团队。几年内,他的团队实际上成了微软基础设施选址的决策者。这引出了我们真正开始关注能源来源的问题,以及布莱恩团队主导的清洁能源建设所涉及的复杂选址问题。
But Brian took the job and ended up building a phenomenal team. Within a few years, his team was actually the decision maker on where to build out Microsoft infrastructure. And so it's the lead in to, we actually start to care about where our energy is coming from, and the whole complexity around sourcing and doing these clean energy build outs that Brian's team was leading.
团队认为这是个真正的规划挑战。所以他们向领导层提出:'我们现在需要大量资金投入来建设这个。'
The team thought it was a real kind of planning challenge. Right? And so they're bringing this up to leadership around, hey. We need a lot of money to invest now to build this out.
他们正在做年度预算,预测未来几年的情况,并着手进行年中审查。
They're doing their annual budgets. They're projecting years in advance. They're getting into their midyear review.
微软正处于一个奇怪的转型阶段。必应仍在成长,还有其他服务。Xbox业务蓬勃发展,Azure尚处早期阶段,虽未成为内部服务器构建的主导用途,但要同时协调和预测这么多不同团队确实不易。
Microsoft's in this weird transitional phase. They have Bing that they're still growing. They have other services. Xbox is booming. Azure is still early innings, so it's not the dominant use of their internal server build out, but it's a lot of different teams to juggle and predict.
因此这个预测挑战确实棘手。你把这个呈报给领导层时,究竟要表达什么?
And so this forecasting challenge is a real one. You bring this to leadership, and what are you there to say?
你已经和所有业务部门沟通过,了解他们的想法,带着准备好的笔记去找鲍默,结果没说几句就被打断。史蒂夫·鲍默会说:'啊,业务部门根本不知道'
And you've talked to all the business groups to figure out what they're thinking, and you go to Ballmer with your prepared notes, and within a few sentences, you get interrupted. And Steve Ballmer's like, ah, the business groups don't know
他们需要什么。让我
what they need. Let me
来告诉你怎么做。
tell you how to do this.
然后他说:'把Excel表格给我。你只需根据过去几年的增长趋势画一条直线,按这个趋势预测,然后根据这条线显示的规模去建数据中心。'
And he says, give me the Excel sheet. You just draw a straight line from how growth has been going last few years, and you just project it out, and you just build the data centers that that line shows you to do.
从这里到2020年就是一条直线。那些企业集团根本不知道计划是什么。
Just a straight line from here to 2020. The business groups, they don't know what the plan is.
巴尔默的话其实很有智慧,因为你意识到每个团队都对他们正在构建的东西感到兴奋,Azure会有所有这些需求,但没人真正知道,尤其是超过十二个月,甚至十八个月之后。那么如果你在为长期、多年建设规划,怎么能预测出人们会需要什么呢?
There's actually a lot of wisdom to what Ballmer's saying, because you realize that every team is excited about what they're building, and Azure's gonna have all this demand, but no one really knows, especially more than twelve, maybe eighteen, months out. So how can you plan, if you're building long term, multiyear build outs, for what anyone's gonna need?
在这个增长时期,情况相当困难。贾努斯的团队发现他的预测总是偏低。但多年后他回顾过去,很好奇自己的预测与直线相比到底有多准确。
In this moment of growth, it was quite difficult. And Janous' team found that he was consistently under forecasting. But he did go back many, many years later, curious how accurate his forecasts were compared to the straight line.
2020年他回头看时,巴尔默的偏差有多大?5%。只有5%。所以他们本应该遵循那条直线。果然,微软坚持了下来。
In 2020, he looked back, and Ballmer was off by how much? 5%. 5%. So they should have just followed the line. So sure enough, Microsoft keeps at it.
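Ballmer's method is just linear extrapolation. A minimal sketch, with made-up capacity numbers purely for illustration:

```python
def straight_line_forecast(history: list[float], years_ahead: int) -> float:
    """Ballmer-style forecast: take the average annual increase over the
    historical series and extend that straight line into the future."""
    slope = (history[-1] - history[0]) / (len(history) - 1)  # avg yearly growth
    return history[-1] + slope * years_ahead

# Hypothetical data center capacity in megawatts over five years:
capacity_mw = [13, 40, 75, 110, 145]
print(straight_line_forecast(capacity_mw, 8))  # → 409.0
```

The point of the anecdote is that this naive line, ignoring every business group's bottom-up forecast, landed within 5% of reality eight years out.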
微软的电力团队,数据中心团队,成为了这个行业最大的建设者之一。他们从租赁平方英尺转向基于兆瓦的交易,成为了一个超大规模运营商。
And the power team at Microsoft, the data center team at Microsoft, becomes one of the biggest builders in this industry. They shift from leasing square footage to megawatt based deals as they become a major, major hyperscaler.
这对他们来说是一段重要的学习历程,对吧?因为他们原本是一家软件公司,必须迅速转型为基础设施公司。这个转变并不容易,而且要在全球范围内实现。微软团队得益于那些根深蒂固的企业客户群,这些客户非常忠诚、利润丰厚且遍布全球。因此他们的市场策略实际上是跟随客户。这意味着在推出Azure后的几年内,他们就接到了来自法兰克福、阿姆斯特丹、新加坡等地客户的电话。
It was a big learning journey for them, right, because they were a software company and they had to grow very quickly into an infrastructure company. It was not an easy pivot to make, and they had to do that globally. The Microsoft team had the blessing of this entrenched enterprise customer base that was very loyal, very profitable, and very global. And so their go to market strategy was actually to follow their customers. And that meant that within a few years of launching Azure, they were getting calls from customers in Frankfurt, in Amsterdam, in Singapore.
如果他们的德国跨国公司有数据主权问题,微软就会去当地建立容量。因此到2015年,他们已宣布了超过20个区域,实际上比当时的亚马逊还多,每个区域都以互联互通丰富的大都市为锚点,通常通过先进入托管设施(可能是Digital Realty这类公司)来支持,等积累足够关键需求后再转向专门的微软园区。但这种市场策略迫使他们全球创新和扩张,很快就取得了巨大成果。
If their German multinational had data sovereignty questions, Microsoft would go and stand up capacity in the region. So by 2015, they had announced over 20 regions, which was actually more than Amazon at the time, each anchored on an interconnection rich metro and typically supported by entering with colocation facilities, perhaps the Digital Realtys of the world, before moving on to dedicated Microsoft campuses once they built up enough critical demand. But this go to market strategy forced them to innovate and expand all over the globe, and it drove massive results quite quickly.
说到这一点,快进到今天,微软的业务在很大程度上已经是Azure业务。Azure是核心增长驱动力,现在年收入达750亿美元,并以每年40%的速度增长。因此这家曾经以软件为主业的公司——我们编写软件、销售软件,这些是我们的核心平台和产品——已不再只是一家软件公司。在很大程度上,它已成为一家数据中心公司。
And to that point, fast forward to today, and Microsoft's business is, in significant part, the Azure business. Azure is a core growth driver. It's now $75 billion of revenue a year and growing 40% year over year. And so this company that was the software company, we write software, we sell software, those are our core platforms, those are our products, that's what we do, is no longer just a software company. They are in large part a data center company.
我们已经讨论了AWS、谷歌和GCP,微软和Azure,但还有一家公司多年来被许多人认为只是个玩具。让我们谈谈Facebook。那个Facebook。2004年2月,Facebook还运行在扎克伯格哈佛大学宿舍里的一台服务器上。
So we've gone through AWS, Google and GCP, Microsoft and Azure, but there was another company that many thought, for many, many years, was just a toy. Let's talk about Facebook. The Facebook. The Facebook. In February 2004, Facebook's running off a single server in Zuckerberg's dorm room at Harvard University.
早期经常导致服务器崩溃。事实上,他早期的许多产品,包括哈佛校园的‘Facemash’(颜值评分网站),都曾让Kirkland宿舍的服务器瘫痪,哈佛IT部门对此很不满。但后来它成为了一家真正的公司。对吧?
The early days caused crashes. In fact, many of his earlier products, including Facemash, the hot or not for the Harvard campus, melted down the servers in Kirkland House, and the IT department at Harvard was not pleased. But then it becomes a real company. Right?
他们开始向其他校园扩张,不同学校使用不同数据中心的不同服务器。
And they start expanding to other campuses, and you've got different schools running on different servers in different data centers.
我认为这种方式自动形成了韧性,因为如果宾夕法尼亚大学的服务器因任何原因崩溃,不会影响到哈佛。
In a way that, I think, creates automatic resiliency, because if the server crashed for whatever reason at Penn, it's not gonna bring down Harvard.
这并不是他们关心的问题。对吧?他们正在打造这个病毒式传播、改变游戏规则的平台,其发展可能超出了他们的预期,新校园不断加入。所以他们并不真正关心基础设施,但它正在有机地增长。到他们推出新闻推送时,已有数亿人刷新网站,上传数十亿张照片,拇指条件反射般地打开Messenger。
And this isn't their concern. Right? They're building this viral, game changing platform that is exploding probably beyond their expectations, and new campuses are signing up. So they're not really concerned about the infrastructure, but it's just growing organically. By the time they launch News Feed, you've got hundreds of millions of people refreshing the site, uploading photos by the billions, opening up Messenger just as a reflex of the thumb.
公司正在花掉所有这些钱。他们筹集了风投资金,但对把钱花在豪华服务器和托管机柜上并不满意。
And the company's spending all of this money. They've raised VC dollars, and they're not happy spending all this money on fancy servers and colo cages.
于是他们意识到,为什么每个人都在我们身上赚这么多钱?一定有更好的方法。他们主要采取两种方式。一种类似于亚马逊、谷歌和微软的做法,即认为如果建立自己的定制数据中心,可以做得更好、更便宜。他们首个专用设施于2010年1月宣布,2011年在俄勒冈州普赖恩维尔投入运营。
And so they're realizing, why is everyone making so much money on us? There's gotta be a better way to do this. And there's two broad ways that they go about it. One is similar to what we've seen with Amazon, Google, and Microsoft, which is the thought that, hey, we can do this better and cheaper if we build our own custom data center. And so their first purpose built facility was announced in January 2010 and began operations in 2011 in Prineville, Oregon.
随后他们迅速在北卡罗来纳州、瑞典、爱荷华州跟进,并快速扩张。但普赖恩维尔和其他选址一样,都是经过深思熟虑的战略选择。
And they soon followed with North Carolina, Sweden, Iowa, and expanded rapidly. But Prineville was just like all the others, quite intentional and quite strategic.
是的。同样在西北地区,这里有廉价电力、干燥凉爽的气候,加上俄勒冈州开始实施的激励政策——提供长期房产税减免及改造补贴。这些因素共同造就了首个标杆数据中心的理想选址。从服务器数量增长来看,2008年他们只有1万台服务器,2009年就达到3万台。
Yeah. And so similarly, in this Northwest region, you had cheap power, you had a dry, cool climate, and you had Oregon's incentives starting to come into play, where they offered long property tax abatements on any improvements they made. All of this combined to make a great site for their first lighthouse data center. And if you look at the numbers on server growth, go back to 2008: they had 10,000 servers. 2009, 30,000 servers.
据称到2010年,当他们正式启动普赖恩维尔项目时,已拥有约6万台服务器,正全力解决扩容问题。
Supposedly by 2010, when they are really kicking off and building the Prineville project, they have around 60,000 servers that they're trying to figure out how to scale up.
因此他们决定将建筑与服务器协同设计,打造这个高度定制化的设施。和其他案例一样,他们发现可以超越行业标准。于是他们高调公布了首个专用设施,设定了1.15 PUE的激进目标——当时行业平均是1.5,历史水平更是超过2。
And so they decide to engineer the building and the servers together, this very purpose built facility. And just like the others, they realized that they could beat industry standards. So they very famously and publicly came out with this first purpose built facility with a very aggressive target of a 1.15 PUE. The industry average was 1.5. Historically, it was north of two.
最终报告显示,比旧设施节能38%,成本降低25%。你以为他们只是勉强达标?完全碾压——实际达到1.07。
And they were able to report 38% less energy, 25% lower cost against prior facilities. And do you think they ended up beating their PUE? No way. Smoked it. 1.07.
这意味着数据中心总能耗中,只有7%用于非IT设备供电。这太惊人了!他们有没有对此保密?我认为这正是Facebook与众不同的地方。
So that means that only 7% of the energy going to run the data center went to anything other than powering the IT equipment. That is phenomenal. Amazing. Did they keep this to themselves, how they did this? I think this is what sets Facebook's approach apart.
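PUE is just a ratio: total facility energy over the energy delivered to IT equipment. A quick sketch of the arithmetic, with hypothetical meter readings rather than Facebook's actual numbers:

```python
def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    """Power Usage Effectiveness: total facility energy divided by the
    energy that reaches the IT equipment. 1.0 is the theoretical floor."""
    return total_facility_kwh / it_equipment_kwh

# Hypothetical month: IT gear consumed 10,000 kWh while the whole
# facility (cooling, power conversion, lighting) drew 10,700 kWh.
print(pue(10_700, 10_000))  # → 1.07
```

So a PUE of 1.07 means only 7 cents of overhead for every dollar of electricity that actually powers servers, versus 50 cents at the then-industry average of 1.5.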
这第一部分是他们必须完成的,而且执行得非常出色。但他们所做的第二件事更具革命性。
This first part, they needed to do, and they executed extremely well. But the second thing they did was far more revolutionary.
在数据中心开发这个高度保密的领域里,各大云服务商都对自己的建设方案守口如瓶。而2011年4月,Facebook却开源了他们的设计方案,宣布了开放计算项目。
In a very secretive world of data center development, where each hyperscaler kept their builds to themselves, in April 2011, Facebook open sourced their blueprints. They announced the Open Compute Project.
这种做法的颠覆性怎么强调都不为过。这个行业向来将服务器和数据中心的设计视为核心知识产权与安全机密。虽然谷歌会发表论文——特别是关于PUE等指标的论文——来展示他们想突出的数据,但众所周知他们从不允许外人进入数据中心,直到这次彻底改变了游戏规则。
It's hard to overstate how radically different an approach this was. This is an industry that kept the design of the servers and the data centers extremely secret. They viewed that as core IP and differentiation, with notions of security. Google would publish papers, especially on things like PUE, and be visible about metrics that they wanted to highlight, but they famously didn't let anyone into their data centers, until this really changed the game.
他们的动机很有意思。观察其他大型数据中心公司时,他们注意到大家都在赚取高额利润。于是他们想:如果我们公开设计方案,并让其他公司也加入开源行列,就能通过标准化建设来降低成本。
Pretty interesting, their motivation. They look around at all the other big data center companies, and they're seeing all of the margins that are being made. And they're like, well, wait a minute. If we publish our blueprints and we get everyone else to buy in and publish theirs, that means we can drive down the costs by standardizing what we're building.
归根结底,他们希望加剧供应商之间的竞争。而实现这一目标的最佳方式不是与单一供应商签约,而是告诉所有供应商:'这就是我们需要的产品规格,按这个标准生产'。
Ultimately, what they want is their suppliers to be in increased competition. And the best way to do that is not to make one deal with one supplier, but to say, hey, suppliers. This is what we need. This is what we like. You make this.
只要是最便宜的,我们就采购。
We'll buy it. As long as it's the cheapest one out there.
没错。这彻底扭转了权力关系。现在不再是供应商制定规格标准、为不同客户提供不同方案,而是能够实现标准化,对供应商说:'你们...'
Yeah. It flips the power dynamic. Now, instead of the vendors and suppliers dictating what the specs are and having multiple different specs for multiple different customers, they're able to standardize it and say, you're
将响应开放计算项目标准。这与我们在谷歌看到的所有创新方向一致。实际上这里有一个巨大的平行呼应。谷歌内部开发了所有这些我们讨论过的软件编排工具,使数据中心可靠运行,而Facebook成长于开源世界,从中受益并能够在此基础上构建。因此,他们审视硬件设计并提出'这有什么不同?'也就不足为奇了。
gonna respond to the Open Compute Project standards. And it was directionally aligned with all the innovation that we saw at Google. There's actually a big parallel echo here. Google built all of these in house software orchestration tools that we talked about that made the data center reliable, and Facebook grew up in that open source world, benefited from it, and was able to build on top of it. And so it's not surprising that they're the ones to look at the hardware design and say, why is this any different?
我们不想成为基础设施这部分唯一的定制开发者。实际上,拥有其他开发者和生态系统对我们有益,部分原因在于——这确实触及了商业模式动机——我们并非试图出售这些创新。除了信息流广告,我们并不打算向其他企业销售任何东西。这套基础设施越好越便宜,我们的业务利润率就会越高。
We don't want to be the only custom developers of this part of our infrastructure. It actually benefits us to have other developers and the ecosystem, in part because, and this really gets to a business model motivation, we're not trying to sell that innovation. We're not trying to sell anything to another business other than ads on our feed. The better and cheaper this infrastructure is, the better our business margins are going to be.
没错。我们希望削减这项成本支出,从而能将更多预算和注意力集中在核心业务上。他们审视现状后认为:行业使用19英寸机架只是因为电信公司这么做。但这并不符合我们的需求。19英寸机架根本没有存在的必要。
Yeah. We want to take this cost item and reduce it, so we can put more of our budget, more of our focus, onto the core business. They're looking at it and they're like, well, the industry is using a 19 inch rack because that's what the telcos did. That's not suiting what we need. There's no reason for the 19 inch rack.
这只是一个历史遗留元素。于是Facebook决定另辟蹊径。他们将宽度扩大到21英寸并公开发布,以提高效率并更好地满足自身需求。
It was just a legacy element. So Facebook decides to do things differently. They widen it to 21 inches and publish that, to make it more efficient and fit their needs better.
有趣的是企业在战略开放上的选择。因此必须清醒认识他们开放什么、不开放什么。Facebook会公开新闻推送排名算法的具体运作方式吗?不会。但这与基础设施层截然不同——当生态系统采用该层时,他们能从中获益。
It's this funny thing how companies are open strategically. And so it's important to be clear eyed about what they're open about and what they're not. Would Facebook publish publicly the exact way that their newsfeed ranking works? No. But that's very different than this infrastructure layer that benefits them when the ecosystem adopts it.
我认为最令人惊叹的是整个行业如何围绕它团结起来并开始加入。OCP在业界掀起涟漪。微软开始贡献设计方案,谷歌带来重大改进——如我们讨论过的48伏设计,电信公司等各方纷纷加入,产生了预期效果和飞轮效应。
I think one thing that's amazing about it is how much the industry rallied around it and started to join. OCP ripples across the industry. Microsoft starts bringing their designs into it. Google contributes big improvements, like we talked about with their 48 volt design, and telcos and everyone else join, and it has the intended flywheel effect.
这确实使透明理念成为常态,当时正值谷歌等公司首次公开其PUE(能源使用效率)之际。这种'我们可以通过公开成果来推动行业发展、降低成本,让业务这部分对所有人都更轻松'的观念由此确立。
It did normalize the idea of transparency, and this was right around the time when companies like Google were first publishing their PUE. And so this notion of, we can be transparent with our results as a way to drive the industry forward, bring down costs, and make this whole part of the business easier for everyone
并激发更多竞争。这四大超大规模云服务商之间就此展开了可持续性竞赛。我认为这一切始于2007年,当时谷歌宣布将实现运营碳中和。
and spur more competition. This kicked off the sustainability race amongst these four hyperscalers. I would say this started all the way back in 2007 when Google announced that they're gonna have operational carbon neutrality.
每家公司随后都认真跟进。微软在2012年承诺实现碳中和运营,这影响了他们的选址决策、电力采购协议,以及合作方选择和与公用事业公司的关系谈判方式。
Each of the companies followed, and they were serious. Right? Microsoft commits to carbon neutral operations in 2012, and it drove site selection. It drove their power purchase agreements. It drove how they negotiated and determined who they were going to work with and their relationship with their utilities.
这种趋势不断加速。亚马逊发起气候承诺,目标是2040年前实现净零排放。微软在2020年宣布了2030年前实现水资源正效益的目标。谷歌则在一年后跟进发布了类似目标的声明。
And it just accelerates and accelerates. Amazon kicks off the Climate Pledge to get to net zero by 2040. You have zero waste goals. You have water positive by 2030, announced by Microsoft in 2020. Google catches up with a similar announcement of the same target date a year later.
于是我们进入了一个承诺与目标不断加速的时代。值得思考的是:为什么会出现这种情况?仅仅是为了形象工程吗?根据我在其中一些公司的工作经历,我真诚认为领导层确实希望公司产生积极的环境影响,同时这也是经济上正确的选择——这些效率目标使数据中心(运营规模中不断增长的部分)成本持续降低。
So you just have this era of accelerating commitments and goals. And I think it's good to ask, why is this? Is it just feel good? And having worked at some of these companies, I do genuinely think the leadership wants the company to have positive environmental impact, and it is the economically right thing to do as well. These efficiency goals mean your data centers, a growing part of your operations, cost you less and less.
另一个影响是催化了可再生能源采购模式的变革。像微软Brian Janous领导的团队已成为能源采购引擎,致力于通过虚拟电力采购协议推动新的太阳能和风能项目。谷歌更进一步提出要采购与数据中心用电时段匹配的清洁电力,实现24/7无碳供电。这实质性地推动了清洁能源生态系统的发展。
The other thing this does is catalyze different renewable buying and procurement habits, where these teams, like the one led by Brian Janous at Microsoft, become energy procurement machines looking to help accelerate that next solar project, that next wind project, and then buy the clean electrons through what's called a virtual power purchase agreement. Google pushes this one step further and says, hey, we wanna buy electrons that are generated at the same time that our data center is using electrons. So we want this notion of 24/7 carbon free electricity. And this really pushes the clean energy ecosystem forward in a material way. And if we look at these three commercial cloud businesses of AWS, GCP, and Azure, they are behemoths.
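The gap between annual VPPA accounting and Google's 24/7 matching comes down to when the clean electrons show up. A toy sketch, with invented hourly profiles purely for illustration:

```python
def annual_match(load_mwh: list[float], clean_mwh: list[float]) -> float:
    """Annual accounting: total clean generation credited against total
    load, the way a virtual power purchase agreement is usually counted."""
    return min(sum(clean_mwh) / sum(load_mwh), 1.0)

def hourly_cfe(load_mwh: list[float], clean_mwh: list[float]) -> float:
    """24/7 carbon-free energy score: in each hour, only clean generation
    up to that hour's load counts as matched."""
    matched = sum(min(l, c) for l, c in zip(load_mwh, clean_mwh))
    return matched / sum(load_mwh)

# Toy day: flat data center load, solar that peaks midday and dies at night.
load = [10, 10, 10, 10]
solar = [0, 25, 15, 0]
print(annual_match(load, solar))  # → 1.0  (looks 100% renewable on paper)
print(hourly_cfe(load, solar))    # → 0.5  (half the load ran on unmatched grid power)
```

The same generation portfolio can score 100% on an annual basis and 50% on an hourly basis, which is exactly why the 24/7 framing pushes buyers toward firm clean power and storage, not just more midday solar.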
正如前文所述,Azure去年营收达到750亿美元,GCP突破500亿美元,AWS该业务年化营收达1110亿美元。三者总和显示,仅云基础设施业务的年支出就达2360亿美元。
As we said, Azure hit $75 billion last year, GCP surpassed $50 billion, and AWS was on a run rate of $111 billion for that business. Total that up, and that's $236 billion a year spent only on the direct cloud infrastructure businesses.
让我们用数据透视:这三家公司的基础设施年支出为2360亿美元,而全美居民年用电支出为5000亿美元——相当于全美居民用电支出的一半
Let's put this in perspective. That's $236 billion across these three companies spent on the infrastructure. US consumers spend $500 billion annually on electricity. So we're talking about half the spend of all residential electricity
放眼整个国家。因此我认为可以说我们已经进入了计算的公用事业时代。只不过现在的公用事业公司是亚马逊、微软和谷歌。而Meta首先不想从这三家公用事业公司购买服务,他们自身用量太大以至于想自建基础设施,并且无意对外出售访问权限。
across this country. And so I think it is safe to say that we have entered the utility era of computing. Now it just so happens that the utilities are Amazon, Microsoft, and Google. Meanwhile, Meta, first of all, doesn't wanna buy power from those three utilities. They also use so much themselves that they wanna build their own and have no interest in selling access to it.
他们只需要自己的数据中心来支撑内部庞大的业务运营。于是我们在2010年代末迎来了非常成熟的云计算——它几乎已成为背景板,人们甚至不再需要讨论和解释云计算。就像理所当然地,这就是你创办和运营公司的方式。
They just need their own data centers to run everything that they're doing in house at such an immense scale. And so we end the 2010s with a very mature cloud. I think it's almost in the background. People don't even need to talk about and explain the cloud anymore. It's like, of course, this is how you're gonna start and run your company.
所有功能都已发展成熟。各公司现在都在运行容器化的Kubernetes系统,服务具有可移植性。虽然存在功能差异和销售策略不同,但它们都趋于成熟稳定,具备多区域部署等完整特性。
The functionality's all grown up. All of the companies are now running containerized Kubernetes things. Your services are portable. Yes, they have their functional differences and their sales differences, but they're all kind of mature and stable, multi region, all of this stuff going on.
已经没人会问'你有云计算战略吗'这种问题了,它早已成为商业运作的基本方式。
Nobody asks, do you have a cloud strategy anymore? It just is the way business is done.
还有那些填补空白的新创公司。比如Snowflake正在帮助构建更专业的数据库,Datadog提供编排服务——这些真正具有公共规模的企业完善了云计算时代的缺失组件。
You also have the new startups that built out the missing pieces. You have the Snowflakes of the world helping build more specialized databases. You have Datadog helping you with monitoring and observability. Real, significant, public scale businesses that have built the missing components of the cloud moment.
当前的态势体现出:云基础设施已趋成熟,超大规模数据中心建设步入成熟期,但同时面临着持续扩张的巨大压力。这是一场全球竞赛,激烈而残酷,参与者远不止这些超大规模服务商。举例来说,微软在短短几年内就将数据中心区域从35个扩展到75个,相当于每月新增一个完整区域。
And the feeling right now is one of maturity of the cloud infrastructure, a maturity of a new large scale data center build out, and at the same time, a huge pressure to continue to build. There is a global race. It is fierce. It is competitive, and it has many, many, many more players than just these hyperscalers. To give you a sense, Microsoft scaled from 35 regions to 75 regions over a couple of years; at times they were adding an entire region a month.
这是数据中心区域的扩张。交易规模同时激增:十年前大型租赁可能是5兆瓦,而到2010年代末,每家超大规模服务商都在预订100兆瓦的园区——十年间数据中心规模增长了20倍。
This is data center region build outs. Deal sizes were ballooning at the same time. A decade ago, a large lease might have been five megawatts. Fast forward to the end of the twenty tens, and every one of the hyperscalers is reserving 100 megawatt campuses. So that's 20x growth in the size of the data center over this decade.
这两大要点契合。如今云计算能被视作理所当然且隐形,全赖这些全球规模的基建扩张。若系统无法在用量激增时稳定运行,这一切就无从谈起。
These two big points fit together. The whole idea that the cloud can now be taken for granted, be invisible, is because of these scaled global build outs. You don't get to do that if the thing isn't just working while usage is exploding.
整个行业已围绕此成熟。私募资金涌入,开发商将其视为房地产资产类别,他们囤积资源、超前建设变电站,因为他们清楚超大规模企业或下个大公司的需求,会迅速抢占这些容量。因此陆地上的建设竞赛异常激烈。
The entire industry has matured around it. Private equity money has come in. You've got developers treating this as a real estate asset class, and they're stockpiling and they're prebuilding substations ahead of demand because they know what the hyperscaler or the next large company is going to need, and they're gonna snap up that capacity. And so this build out is fierce on land.
不仅陆地上激烈,海上同样如此。仅拥有自己的服务器、建筑和电力基础设施还不够,你需要确保连接不中断。最可靠的方式就是自建海底电缆。所以微软、Meta、谷歌等公司除了继续租赁容量外,开始投资建设自己的海底网络。
Not only is it fierce on land, it's fierce in the seas. It's not enough just to have your own servers and your own buildings and your own power infrastructure. You need confidence that you can stay connected. The best way to have confidence is to put your own cables underwater. And so instead of just renting capacity, which, of course, they continue to do, but Microsoft, Meta, Google, and others start financing and building their own undersea network.
这样他们就能实现端到端控制。想想谷歌的垂直整合:从服务器机箱到机架,到建筑,到园区,再到连接园区的线路。如此才能对建设成果有信心,获得更优的经济效益和性能表现。而性能提升永无止境。
And this way, they can control end to end. You think about Google's vertical integration. It starts with the box, it goes to the rack, then it goes to the building, then it goes to the campus, then it goes to the wires that connect the campus. That's how you get confidence in what you're building and improved economics and improved performance. And performance continues to drive forward.
但到了
But by the
2010年代末,PUE(电能使用效率)已稳定在惊人的1.1左右。这意味着这些设施中仅10%的电力不是直接用于服务器和IT设备。正如我们在可持续发展竞赛中讨论的,炫耀的资本已经转移。营销团队不再强调PUE,转而讨论碳使用效率(CUE)或水使用效率(WUE)。
end of the twenty tens, PUE had plateaued around an incredible 1.1. Again, only about 10% of the power being used in these facilities is not going directly to the server and IT equipment. And so as we talked about in the sustainability race, the bragging rights had shifted. The marketing teams had moved off of PUE, and now they're talking about carbon usage effectiveness, or CUE. Now we're talking about water usage effectiveness, WUE.
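These metrics are all simple ratios against IT energy; a minimal sketch with illustrative numbers (none of these values are from the episode):

```python
def pue(total_facility_kwh, it_kwh):
    """Power Usage Effectiveness: total facility energy over IT energy."""
    return total_facility_kwh / it_kwh

def cue(total_co2_kg, it_kwh):
    """Carbon Usage Effectiveness: kg of CO2 per kWh of IT energy."""
    return total_co2_kg / it_kwh

def wue(water_liters, it_kwh):
    """Water Usage Effectiveness: liters of water per kWh of IT energy."""
    return water_liters / it_kwh

# A facility drawing 110 MWh in total to deliver 100 MWh of IT load:
p = pue(110_000, 100_000)
print(p)                            # 1.1, roughly the plateau discussed
print(f"{1 - 1 / p:.0%} overhead")  # ~9% of facility power is non-IT
```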
现在的终极指标是:谁能打造全年无休、实时使用可再生能源的数据中心。所有企业都在发布气候承诺,将其纳入规划,通过对碳强度的切实关注,推动整个可再生能源和电力购买协议(PPA)领域的发展。
And the holy grail metric here is who is launching twenty four seven real time renewable powered data centers. And everyone's putting out their climate pledges, building into their plans, and driving the entire renewable energy and PPA world forward through this genuine focus on the carbon intensity.
到2020年,云计算已臻成熟,它完成了应尽的使命。我们拥有了这项可扩展、成熟的全球技术,用于存储数据、进行计算、连接所有人、流媒体传输内容以及支持视频通话。进入2020年时,这为何会显得尤为重要?
The cloud by 2020, it was doing what it needed to do. And so we now had this scaled, matured, global technology for storing things, computing things, connecting all of us, streaming content, powering video calls. Why might that be helpful as we go into 2020?
本,我们现在正面临近代史上最令人难忘的时刻——新冠疫情肆虐的黑暗时期。那时我们都被迫居家隔离,整个社会体系、公共卫生系统等方方面面都承受着巨大压力。数据中心基础设施同样如此。尽管2010年代云计算经历了快速稳定的增长,但疫情将五年的应用进程压缩至约十八个月。随着全民居家,视频通话、游戏、在线协作、远程医疗、电子商务、SaaS等所有领域同时爆发性增长。我们都记得那些没完没了的视频会议,但让我们更全面地看待这一现象。
Well, Ben, we're now upon everyone's favorite moment in recent history, The dark days of the COVID pandemic, a time where we all had to stay at home and a time where society, systems, public health, everything was stretched to the max. The same actually goes for our data center infrastructure. While the twenty tens saw fast, consistent growth of cloud computing, COVID compressed five years of adoption into about eighteen months. With everyone at home, video calls, gaming, online collaboration, telehealth, ecommerce, SaaS, everything surged all at once. And we all remember the endless video calls, but let's put that into perspective a little bit.
进入这个时期前,Zoom的每日会议参与者约为1000万人。仅仅到2020年3月,这个数字就飙升至2亿。四个月内啊,短短四个月。一个月后,又增长到3亿。
Going into this era, Zoom had something like 10,000,000 folks as daily meeting participants. Fast forward just to March 2020, they hit 200,000,000. In four months. In four months. One month later, 300,000,000.
这堪称爆炸式增长。Google Meet同样如此,四月份每天新增300万用户。峰值使用量较一月份增长30倍,游戏行业也呈现井喷之势。人们居家隔离,寻求娱乐消遣,因此Steam平台不断刷新纪录。
So it's explosive growth. Google Meet, similarly, 3,000,000 new users per day in April. Peak usage was up 30 x since January, and gaming was exploding. They're at home. They're looking for entertainment, so Steam is shattering records.
众所周知,Netflix和YouTube不得不限制网速、降低画质,以避免互联网系统崩溃。
Netflix and YouTube famously had to throttle down the network speed to lower resolutions to avoid essentially breaking the Internet.
当时我在Stripe工作。就电子商务而言,那确实是个爆发性增长阶段。虽然某些行业(尤其是旅游业)举步维艰甚至倒闭,但Stripe大多数用户业务量都在激增。明白吗?所有人都在线购物——Instacart、DoorDash等平台。用'十八个月完成五年的增长'来形容毫不夸张。
I was at Stripe at the time. And to the ecommerce point, it was just this explosive moment where, sure, you saw some businesses, perhaps in travel in particular, struggle and go under, but the majority of Stripe's users were exploding. Right? Everyone was ordering stuff online, Instacart and DoorDash. It's easy to say five years of growth in eighteen months.
当你身处基础设施建设一线时,实际感受如何?感觉所有系统都在崩溃边缘,而你拼命确保外界毫无察觉。我们最终成功实现了这一点,为此深感自豪。API始终保持运行,支付处理即使面对急速扩张也从未中断。
What does it actually feel like when you're working on the infrastructure? It feels like everything is breaking, and you're trying to make sure no one in the outside world feels it. And we managed to accomplish that, and we're very proud of it. The API stayed up, and payments were processed even as everything scaled rapidly.
另一家备受关注并抓住机遇的公司显然是Zoom。他们不得不在数月内将基础设施从1000万用户扩展到3亿用户。他们在同城托管中心增加了服务器,使用了AWS和Azure。
Another company in the crosshairs, one that seized the day, was obviously Zoom. They had to scale infrastructure from 10,000,000 to 300,000,000 users in months. They added servers in colocation metros. They used AWS. They used Azure.
他们还在Oracle云基础设施上进行了大规模扩展,竭尽所能确保视频通话没有延迟。
They expanded aggressively on Oracle Cloud infrastructure. They were just doing everything they could to ensure those video calls didn't have delays.
从技术层面看,Zoom的运作方式可分为两部分:Zoom控制平面负责处理登录并识别会议参与者身份;而视频流则通过会议区域系统,确保将用户连接到运营商密集建筑中最邻近的会议区域。这与我们之前讨论的Netflix的CDN网络非常相似,但具有双向特性。因此他们竞相在之前提到的电信酒店中增设更多这类服务器。
Under the hood, the way Zoom works, you can think of it split in two. There's the Zoom control plane, which handles your logins and understands who you are as a participant in a meeting. But then you have the actual video feed, and that ran through meeting zones: Zoom would make sure to connect you to the closest meeting zone in a carrier dense building. So it's very similar to the CDN networks that we talked about earlier with Netflix, but with this bidirectional nature to it. And so they were in a race to add more of these machines in those telco hotels that we talked about earlier.
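The split described here, a control plane plus latency-based routing to the nearest media zone, can be sketched roughly like this (zone names and latency numbers are hypothetical):

```python
def nearest_zone(latency_ms_by_zone):
    """Pick the meeting zone with the lowest measured latency,
    the way a media plane routes a caller to a carrier-dense metro."""
    return min(latency_ms_by_zone, key=latency_ms_by_zone.get)

# Latencies a client might measure against candidate zones:
probes = {"us-west": 18, "us-east": 74, "eu-west": 140}
print(nearest_zone(probes))  # us-west
```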
这很快成为云服务弹性承诺真实性的证明。在这个时代最严峻的压力测试中,容量能随着人类需求即时实现。这正是我们当时所处的时刻:需求爆发。有几个关键主题正在影响数据中心领域,其中最著名的就是ZIRP(零利率政策)。
And it soon became proof that the cloud's elastic promise was real. In the greatest stress test of its time, capacity could materialize as fast as humanity demanded it. This was the moment we were in: exploding demand. And there are a few key themes that affect the data center world in this moment. One of them is famously known as ZIRP, the zero interest rate policy.
2020年3月15日,美联储将目标利率下调至0-25个基点区间。这对基础设施领域意味着什么?这实质上降低了风险门槛和投机性投资的门槛。现在园区和空壳设施可以在签约租赁前就提前建造,因为基础设施的持有成本极低——资本成本实在太便宜了。
So on March 15, 2020, the US Federal Reserve cut its target rate to the range of zero to 25 basis points. What does that mean? In the infrastructure world, it basically lowers the hurdle rate for risk, for speculative builds. So now campuses and shells can be built out way ahead of leases being signed, because the carrying cost of that infrastructure is so low, because your cost of capital is so cheap.
就像买房一样,当时房贷利率处于历史最低点,所以你能买更大的房子。同理,对于建设数据中心的公司也是如此——同样的本金可以建造更多数据中心,资金使用效率大幅提升。
You're buying a house. This was the time when the interest rate on your mortgage was as low as possible, so you could buy more house. Well, if you are a company trying to build data centers, it's the same story. You could buy more data centers. The same amount of principal is gonna go much further in building out more.
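The mortgage analogy can be made concrete with the standard amortization formula; the payment figure below is hypothetical:

```python
def affordable_principal(monthly_payment, annual_rate, years=30):
    """Principal a fixed monthly payment can service under standard amortization."""
    r = annual_rate / 12
    n = years * 12
    return monthly_payment * (1 - (1 + r) ** -n) / r

payment = 5_000_000  # $5M a month of debt service
at_7 = affordable_principal(payment, 0.07)
at_3 = affordable_principal(payment, 0.03)
print(f"${at_7 / 1e9:.2f}B of build at 7%, ${at_3 / 1e9:.2f}B at 3%")
```

The same payment services roughly 60% more principal when rates fall from 7% to 3%, which is the "buy more data center" effect in the analogy.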
作为这个基础设施环境中的企业,你现在可以承担更多风险。
And as a company in this infrastructure environment, you can take more risk.
超大规模企业趁机大举投资新建数据中心,黑石集团也紧随其后。于是在2021年,黑石以100亿美元收购了行业顶级数据中心开发商QTS房地产信托公司。
The hyperscalers took advantage of this and started pouring money into building new data centers, but so did Blackstone. So in 2021, Blackstone purchases QTS Realty Trust, one of the largest data center builders in the industry for $10,000,000,000.
市场上资金泛滥,而数据中心资产类别的魅力在于其收益开始显得高度可预测,几乎像公用事业回报率。所有人都在投资房地产信托基金和基础设施债券。这对劳登县等建设数据中心的地区是重大利好——我们之前讨论过这个,对吧?
There's so much capital floating around, and the beauty of the data center asset class was that it has started to look very predictable. It was looking like utility style returns. Everyone's investing in the REITs and the infrastructure debt. This is fantastic for the counties that were building this out, namely Loudoun County, which we've talked about before. Right?
疫情期间他们的财政预算吃紧。但得益于数据中心走廊,服务器设备税年收入突破4亿美元,使他们无需通过提高房产税来填补财政缺口。
Their budgets are suffering in the COVID era. But because of their Data Center Alley, they saw their server equipment tax revenue go past $400,000,000 annually. And that prevented them from needing to raise property taxes to fill their budget gaps.
我记得有段时间他们的空置率甚至不到1%。空间、托管服务和互联需求极其旺盛,所有资源都供不应求。
I think at some point, their vacancy was below 1% in all of this. So there's just so much demand for space, colocation, interconnection. Everything is just flying off the shelves.
另一个关键主题是什么在推动建设热潮?现在你会发现,电力供应紧迫性引发的恐慌正在蔓延。既要考虑电力供应总量,也要计算通电时效——因为大量资本正等待投入运作。核心问题其实是'最快何时能供电?'这种局面前所未有。
Another major theme here is what is defining the build out. And what you find now is a growing panic around the urgency of electrical supply. And it was both the amount of power you could get and the time to power, because there was all this capital looking to go to work. The question was actually, how soon can you get me the power? This is new.
对吧?这种规模——
Right? The scale of
早期的数据中心,那些5兆瓦、10兆瓦、15兆瓦的设施,根本不需要与电力公司深入协商。而现在我们讨论的是50到100兆瓦的用电需求,这彻底改变了与供应商的博弈关系。爱尔兰甚至首次出台法规,允许电力公司以负荷过载为由拒绝数据中心入驻。
the data centers of earlier eras, these five, ten, 15 megawatt facilities, didn't necessitate a big conversation with the utility. We're now talking 50 to 100 megawatt builds, and this changes the dynamic with suppliers. We had folks in Ireland saying, for the first time, we're gonna pass regulation that says the utility can block a data center from coming in because it's too much load.
你不能直接走到电力公司就接入100兆瓦的电力。电网容量和规划首次成为限制数据基础设施增长速度的瓶颈因素。
You can't just walk up to a utility and plug in for a 100 megawatts. For the first time, grid capacity and planning was a throttling factor in how fast data infrastructure could grow.
2010年至2018、2019年期间,正如我们讨论过的,是向云计算大规模转型的时代,这些系统效率提升最终抵消了需求和容量的增长。因此尽管行业规模大幅扩张,电力使用增长实际上相对平缓。虽然需求增加,但增幅稳定,效率跃升与规模增长保持同步。
The era from 2010 to twenty eighteen, twenty nineteen, which was this big shift to the cloud that we talked about, was one of increased efficiency in these systems that ended up balancing out the increase in demand and capacity. So there's actually relatively flat power use growth despite there being so much growth. While there was more demand, it was a steady increase, and the efficiency jumps could go in concert with the growth jumps.
所以当你购买新手机或新电脑时,大部分组件都相同,只是得益于摩尔定律获得了更强的处理能力。这种效率提升让我们能建设大量新数据中心和计算设施。但事实上这一时期数据中心的总体能耗保持相对稳定,这主要归因于几个因素。
So you buy a new phone, you buy a new computer, most of it's all the same, except you just get more processing power because of Moore's Law. And that brought about these efficiency gains. And so we were building a lot of new data centers and compute. But total energy consumption of data centers actually remained relatively flat throughout this period. It was largely due to a few factors.
其一是我们转向了云计算。
One is that we moved into the cloud.
这样我们之前讨论的利用率问题就开始发挥作用了,对吧?
And everything we talked about earlier with utilization then can come into play, right?
没错。一项研究发现云计算比本地部署计算资源效率高出93%,因为减少了资源浪费,再加上我们讨论过的PUE(电源使用效率)优化措施。
Exactly. One study found that the cloud is 93% more efficient than running your compute on premises, because you have fewer wasted assets, combined with everything we talked about with PUE and driving down that overhead.
但现在突然之间,新冠疫情加上零利率环境,人们不再等待效率提升,而是拼命扩大建设规模。
Well, now all of a sudden, you had COVID plus ZIRP, where you're not waiting to get more efficient. You're just trying to build more and more.
我们差不多已经用尽了那些容易实现的目标,对吧?PUE(电源使用效率)现在是1.1。
We'd also kind of run out of that low hanging fruit. Right? PUE is at 1.1.
如果你看看我们之前提到的那四大巨头——亚马逊、微软、谷歌和Meta,在2017至2021这四年间,它们的能源使用量翻了一番,2021年达到了72太瓦时。
So if you look at the big four that we talked about before, Amazon, Microsoft, Google, Meta, between 2017 and 2021, in those four years, they doubled their energy use to 72 terawatt hours in 2021.
需求激增,加上PUE优势和云效率的丧失,意味着我们正面临一个全新的电力使用范式,整个行业都不得不正视这个问题。
So much greater demand, combined with a loss of the PUE benefits and the cloud efficiency, means we're in a new paradigm of power usage, and the industry is having to reckon with that.
令人惊讶的是,在我们刚刚讨论的快速增长背景下,居然有过一段如此平稳的时期。我是说,云计算的全面扩展带来了惊人的使用量增长,效率提升也非常显著。但我们现在已经用尽了那些招数——冷热通道隔离、所有设备虚拟化。
It's pretty amazing that we had such a flat period for how much growth that we just talked through. I mean, the whole scale up of the cloud led to, wow, growing usage, the efficiency gains were so significant. We've run out of those tricks. The hot and cold aisles are contained. We've virtualized everything.
我们已经将所有东西容器化。现在需要想出些新招数。我保持乐观,毕竟需求是发明之母,我们一定会找到新方法,因为我们必须这么做。
We've containerized everything. We need to come up with some new tricks. I hold out some optimism that necessity is the mother of invention, and we'll figure out some new tricks because we'll have to.
我们必须这样做。
We have to.
话虽如此,这个时代正以一种极端方式出现新的电力消耗大户。让我们稍微聊聊加密货币矿工吧。
That being said, there is a new power user showing up in an extreme way in this era. Let's talk about the crypto miners for a bit.
我们来聊聊加密货币,因为它不仅仅是一种爱好,实际上在数据中心基础设施建设中已成为一个强有力的买家。
Let's talk a little bit about crypto because it's not just a hobby, but actually ends up being a competitive buyer in the data center infrastructure build out.
到2022年,加密货币挖矿的用电量达到100至150太瓦时,相当于其他数据中心总用电量的近一半。回溯到2009年,比特币刚诞生时还只是互联网阴暗角落里的产物。有人在卧室用个人电脑挖矿。后来人们发现,使用现成的GPU能挖得更快,甚至开始专门设计只能挖比特币和其他币种的定制ASIC芯片。
By 2022, 100 to 150 terawatt hours, almost 50% of what the rest of the data center world was using, is now going to crypto mining. Go back to 2009. Bitcoin launches, and it's pretty much just a dark corners of the Internet thing. Right? Someone mining on a PC in their bedroom. Then folks figure out that if you buy GPUs off the shelf, you can go faster, and people start even designing custom ASICs that are designed to do nothing but mine Bitcoin and other coins.
到2010年代中期,矿场已成为重要电力消耗源,人们开始在能找到廉价电力的任何地方建立矿场。
By the mid twenty tens, mines had become significant power loads, and folks had started to build them out wherever they could find cheap electricity.
虽然加密货币背后的技术很有趣,但从商业角度看,它更像是电力套利生意而非IT业务,因为
And while the technology behind crypto was interesting, from a business perspective, it was less of an IT business and more of a power arbitrage business because it
一切都取决于你的挖矿效率和选址。本质上就是尽可能多次地进行高强度数学运算,纯粹是用电力解决问题。这与我们之前讨论的数据中心概念截然不同——这里只追求最低成本的计算。
was all based on how efficiently you could mine and where. You're doing a hard math computation as many times as you can, which is purely solving problems using power. And this is very, very different than everything we've said about data centers before. This is just compute as cheaply as possible.
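That "hard math computation as many times as you can" loop is proof-of-work. A toy sketch of the idea (the block data and difficulty here are made up; real mining runs the same loop on specialized ASICs at vastly larger scale):

```python
import hashlib

def mine(block_data: bytes, difficulty: int) -> int:
    """Find a nonce whose SHA-256 hash starts with `difficulty` zero hex digits.
    Every attempt is pure compute, which is why mining economics reduce
    to the price of electricity."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

nonce = mine(b"example-block", 4)  # low difficulty so it finishes quickly
print(nonce)
```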
这导致现实世界中出现冲突点:在冰岛和华盛顿东部等市场,比特币矿工出价压倒云服务商,争夺廉价水电资源。
And this really comes to a collision point in the real world where you had markets like Iceland and Eastern Washington, where Bitcoin miners were outbidding these cloud providers for the cheap hydro and the power that was available.
后来加密货币行业意识到有些电力对他人无用。于是开始寻找连数据中心或常规工业负荷都无法部署的场所——比如在油田燃烧伴生气时,何不安装涡轮机?用这些废气发电来维持加密货币矿场。
Then the crypto industry starts to realize that there's power out there that's not useful to others. Let's look for ways and places to put crypto mining where you couldn't even put a data center or a normal industrial load. You're in an oil field and you're flaring excess gas. Well, what if you put a turbine there? You burn that gas and you power a crypto mine.
这就是克鲁索(Crusoe)——后来成为顶级人工智能数据中心建设项目之一——如何起步的,通过寻找廉价能源。有趣的是,许多与电力相关的创新都源于加密货币热潮。
This is how Crusoe, which later becomes one of the premier AI data center build outs, gets started, finding that cheap power source. And so it's interesting how many of these power related innovations came from the crypto moment.
当时的加密货币负载有所不同。它们规模庞大但流动性强,这种特性在快速部署方面具有优势。但这也意味着政策能迅速介入并重塑行业格局。
The crypto loads were different, though. They were large, but they were fairly mobile. That was an advantage in terms of where, and how quickly, they could be placed. But it also meant that policy could step in and shape the industry fairly quickly.
到2021年,中国承载了全球大部分加密货币挖矿活动。这里有最廉价的电力,且靠近硬件制造地。但2021年中国全面禁止了加密货币挖矿。
By 2021, most crypto mining was happening in China. It had the cheapest power, and it was close to where the hardware was being manufactured. But in 2021, China cracked down: no more crypto mining.
于是矿工们纷纷逃往德克萨斯、哈萨克斯坦和加拿大。到2022年初,美国在全球加密货币挖矿中的份额已回升至近40%,而2020年仅为3-4%。
And so all the miners fled to Texas, to Kazakhstan, to Canada. And by early twenty twenty two, the US share of crypto mining was back up near 40%, up from 3 to 4% in 2020.
这个最初作为爱好兴起、后来席卷金融科技界的产业,迫使房地产电力与计算行业不得不应对如何将超大规模弹性负载接入紧张电网的难题,并由此催生创新。加密货币行业本身并不在意碳足迹,但数据中心行业则持续高度关注运营的碳强度——如今超大规模企业都在以极致精度追踪碳排放。
What started out as a hobby and took over the fintech world and financial infrastructure forced the real estate, power, and compute industries to wrestle with how to connect very large, very flexible loads to stressed grids, and with what innovations could be born out of that wrestling. And on that point, crypto as an industry didn't care much about its footprint. But the rest of the industry, and the data center industry in particular, continued to be very focused on the carbon intensity of their operations. And now these hyperscalers were tracking their carbon footprints with extreme precision.
2020年亚马逊成为全球最大可再生能源采购商。这是规模效应使然——签署气候承诺后,为实现净零目标,亚马逊不仅要实现数据中心和办公室的脱碳,其遍布全球的物流中心和空前规模的包裹运输网络都需转型。
By 2020, Amazon became the world's largest buyer of renewable energy. It's a scale thing. Amazon signs the climate pledge, and for Amazon to hit net zero, they don't just have to decarbonize their data centers and their offices. They have fulfillment centers. They have transportation at an unprecedented scale delivering all of the packages.
这远不止是所谓'正确的事'。正如我们讨论过的,在零利率环境下,可再生能源往往是以最低成本实现电网扩容的最佳方案,这使得电力购买协议更具吸引力并降低门槛利率——原本可能因成本过高被否决的风电项目,现在也能盈利并与微软的新建项目签约。
And this is beyond just the quote unquote right thing to do. As we've talked about, oftentimes, the cheapest, most economical way to get new load onto the grid and scale is through renewable power. And in a zero interest rate environment, it actually makes these PPAs more and more attractive and it lowers the hurdle rate. So a wind project that might otherwise have been too expensive in this environment is profitable and is able to be signed on to a Microsoft new build out.
很难充分理解数据中心服务需求增长所驱动的飞轮效应:零利率环境为数据中心建设提供了资金,进而推动可再生能源的大规模建设,而这些同样受益于零利率环境。你们刚刚经历的基础设施建设飞轮,为我们迎接未来可能的一切做好了准备。
It's hard to appreciate the flywheel of more demand for the data center services driving data center build out, which can be funded by a zero interest rate environment, which then all drives all of the renewable energy build out, which is similarly funded by the zero interest rate environment. You just had an infrastructure build flywheel running that set us all up for what might come next.
这是数据中心发展史上一个极其关键的节点,实际上为我们将要见证的AI爆发式增长提供了前所未有的规模预演。新冠疫情时期,廉价资本涌入系统,推动大规模新建项目,电力开始成为基础设施发展中更严峻的制约因素,同时支撑这些建设的全球供应链出现断裂——这一切都发生在巨大繁荣来临的前夜。现在我们讨论过建设大型基础设施的公司,那些支撑我们生活的巨型服务器,而房间里还有一头价值4万亿美元的大象,我认为是时候介绍它了。
It was an extremely timely moment in the evolution of data centers and actually served as a dress rehearsal for the unprecedented scale that we are going to see in the AI boom. This moment during COVID with cheap capital flooding the system, driving a massive new build out, power starting to become a much more limiting reagent in the development of our infrastructure, and a breaking down of our global supply chain, which was feeding this build out. All happens at this moment right before a massive boom. So now we've spoken about the companies building the big infrastructure, the big servers powering our life, and we have a $4,000,000,000,000 elephant in the room that I think it's time to introduce.
现在该谈谈英伟达了。英伟达最初是一家致力于让电脑游戏运行更快的公司。他们多年来一直专注于制造游戏芯片。我记得他们从斯坦福大学电子工程系招募了许多学生来研发游戏芯片。
It's now time to talk about NVIDIA. NVIDIA started as a company that was trying to figure out how to make gaming run faster on computers. So they made gaming chips, and they did this for years and years and years. And I remember them recruiting out of the Stanford double e department, many a student, to go work on gaming chips.
他们与游戏产业紧密相连,公司成长与之息息相关。
And they were very much tied to the gaming industry. Their growth was correlated to it.
但事实证明,游戏中进行的那些用于渲染屏幕像素的数学运算,同样适用于解决复杂科学问题——需要同时处理大量艰深数学运算的领域。如果要构建加密货币系统需要什么?正如我们之前所说,就是同时解决大量复杂数学问题。这需要海量计算能力,大规模的矩阵乘法运算,以至于他们的股价开始与加密货币挖矿的兴衰挂钩。
But it turns out the same thing that you're doing in a game, which is doing a bunch of math to figure out how to render a pixel on a screen, is sometimes what you need to do complex science: a bunch of tough math problems at once. And if you're gonna build a cryptocurrency, what are you doing? As we said earlier, a bunch of tough math problems at once. A lot of compute, matrix multiplication at scale, to such an extent that their stock price starts to get tied to what's going on in crypto mining.
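The shared kernel across rendering, number crunching, and machine learning is bulk arithmetic like matrix multiplication. A naive sketch; a GPU runs the same inner loop across thousands of cores at once, and the sizes in the FLOP count are purely illustrative:

```python
def matmul(a, b):
    """Naive matrix multiply: the workload GPUs parallelize massively."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]

# Rough multiply-add count for one layer-sized product:
flops = 2 * 512 * 1024 * 2048
print(f"{flops / 1e9:.1f} GFLOPs for a single 512x1024 @ 1024x2048 matmul")
```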
起初他们绑定游戏产业,后来与加密货币的关联更加紧密。到了2018年,加密货币寒冬来临。加密货币行业的暴跌导致英伟达股价单日下跌17%,期间市值更是腰斩。
First, they were tied to the gaming industry. Now they get even more correlated to crypto. And come 2018, you have the crypto winter. The fall in the crypto industry dragged NVIDIA's stock down 17% in one day. Over that period, it got cut in half.
随后新冠疫情爆发,加密货币经历繁荣-萧条周期。但当我们浏览YouTube、Netflix和Instagram时,机器学习算法正输出越来越多的推荐内容。知道什么在驱动这些推荐模型吗?正是英伟达的GPU。因此这些训练模型的增长开始迅猛发展。
And so enter COVID, and you hit a crypto boom bust cycle. But machine learning algorithms are spitting out more and more recommendations to us as we're sitting on YouTube and Netflix and Instagram. And you know what powers a recommendation model? An NVIDIA GPU. And so growth of these training models takes off.
你可以把它看作他们收益中的一个项目,对吧?他们为所谓的‘数据中心业务’划出了这一块,实际上正是这部分业务在增长。你看到的增长率是60%。
And you can see it as a line item, right, in their earnings. They have this carve out for what they call their data center business, and that's really where the growth is. You see 60% growth.
仅一个季度就达到近40亿美元,这主要归功于他们的A100芯片。他们卖出了成千上万片这样的芯片。
This hits nearly $4,000,000,000 in just a quarter, and it's mostly their A100 chips. They're selling thousands of these.
没错。这些A100芯片成为早期模型训练时代的支柱。2020年,微软宣布与OpenAI合作进入新阶段,承诺购买1万片或更多A100芯片,专门为OpenAI建造一个数据中心,一台超级计算机,以继续开发他们的模型。记住,那时OpenAI还不太为人所知。
Right. And so these A100 chips become the backbone of the early era of training models. In 2020, Microsoft announces the next stage of their partnership with OpenAI. They commit to buying 10,000 or more of these A100 chips to build a special data center, a special supercomputer, for OpenAI to continue developing their models. Remember, at this time, OpenAI was not really a household name.
它当时只是个研究项目,对吧?
It was a research project, right?
是的。没错。它是一个资金充足、领先的AI实验室,与微软深度合作,通过API提供模型。同年,他们发表了一篇论文,真正展示了他们的发现,即所谓的‘扩展定律’,表明投入问题的计算能力越大,数据集越大,模型就越好。谷歌也发现了非常相似的情况。
Yeah. That's right. It was a leading, well funded AI lab with this deep collaboration with Microsoft, building models that were available via API. In the same year, they published a paper that really demonstrated something they had discovered, what they call their scaling laws, showing that the bigger the compute you threw at the problem, and the bigger the dataset you threw at the problem, the better the model. And Google was discovering very much the same thing.
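The shape of a scaling law can be illustrated with a made-up power law; the constants below are not from the paper, only the monotonic trend is the point:

```python
def loss(compute, a=10.0, exponent=0.05):
    # Illustrative: model loss falls as a power law in training compute.
    return a * compute ** -exponent

for c in (1e18, 1e20, 1e22, 1e24):
    print(f"compute {c:.0e} FLOPs -> loss {loss(c):.2f}")
```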
芯片越多,模型越好。多么方便啊。芯片越多,模型越好。我想这对Jensen来说简直是天籁之音。
The more chips, the better the model. How convenient. The more chips, the better the model. Music to Jensen's ears, I think.
确实。所以在2022年,就在加密货币崩盘之际,OpenAI发布了新模型GPT-3.5。这是对GPT-3的巨大升级,他们在新的微软数据中心里构建了它,并思考着向世界展示他们成果价值的新方式。
Indeed. And so in 2022, right as the crypto crash is coming, you have OpenAI release a new model, GPT 3.5. It's a huge upgrade from GPT three, and they had built it in their new Microsoft data center. And they're thinking about new ways to show to the world the value of what they've been able to build.
你是说并非所有人都想通过API沙箱来了解模型的好坏吗?
Are you saying not everyone wants to use an API sandbox to understand how good a model is?
目前访问还有点困难。但如果在GPT上设计一个简单的UI对话框,加入一点聊天功能,至少
Right now, it's a little bit hard to access. But maybe if you put a simple box as a UI and create a little bit of a chat feature on the GPT, it'd be
能做个不错的演示,对吧?
a nice demo at least, right?
没错。可能会有一些人感兴趣。于是他们决定在2022年11月30日推出这个聊天功能——ChatGPT的发布。五天内,用户数就突破百万。
Right. A few people might be interested. So they decided to put out this chat function on November 30, 2022. The launch of ChatGPT. In five days, they had a million users.
这完全超出了团队的预期和容量规划。据Sam Altman所说,他们预计的关注度要低一个数量级。到2023年1月,ChatGPT已成为历史上增长最快的消费级软件应用。从五天百万用户,到仅用两个月就突破一亿用户。
This was far and away over the team's expectations, as well as their capacity planning. According to Sam Altman, they expected an order of magnitude less interest. By January 2023, ChatGPT had become the fastest growing consumer software application in history. So after getting a million users in five days, they had now crossed 100,000,000 users in just two months.
而到去年夏天,ChatGPT网站已成为全球访问量前五的网站,仅次于Instagram。这说明确实存在消费者和企业层面的需求。世界等待AI成为现实已经很久了,而现在所有人都说:就是此刻,就是现在。
And by this last summer, ChatGPT's website is among the top five most visited sites in the world right after Instagram. So there is real consumer and enterprise pull. The world's been waiting for AI to be a thing for a very long time, but this is the moment where everyone says, okay. This is it. This is the time.
是啊。最终每天能收到超过十亿条提示词。人们简直爱死这东西了。
Yeah. You end up with over a billion prompts a day. People are loving this thing.
众所周知,要想变得更好,我们只需要更多芯片。为什么Jensen如此高兴?那么,三年后的今天我们身处何方?
And everyone knows, well, to make it better, we just need more chips. Why is Jensen so delighted? Well, where are we now three years later?
更多芯片,更好模型。对吧?三年前,2022年时,英伟达的数据中心收入正快速增长。当季度收入40亿美元。而上一季度,其数据中心收入达到了惊人的390亿美元——三年间增长了近10倍。
More chips, better models. Right? Three years ago, in 2022, NVIDIA's data center revenue was growing rapidly: $4,000,000,000 in revenue that quarter. Last quarter, their data center revenue was a staggering $39,000,000,000. In three years, it grew to nearly 10x what it was.
三年内实现季度收入增加360亿美元,这是史无前例的。
To add $36,000,000,000 of quarterly revenue in three years is unprecedented.
这得益于扩展定律的发现。
This is powered by the discovery of the scaling laws.
他们产品发布的时机堪称完美。A100芯片本质上是为训练而生,用于扩展训练规模,但他们在2022年发布了H100——这款专为此刻设计的芯片就像是火箭助推器,新增了专为Transformer设计的特殊数学模式。
And the timing of their product launch could not have been better. They had the A100, which was essentially built for training, for scaling your training runs. But they launched the H100 in 2022, and that chip was designed for this moment. It was the rocket boost of a chip. It added new special math modes tailored for transformers.
它还增加了更多高速内存,从而大幅缩短训练时间并加快推理速度。这些H100芯片正被疯狂抢购
It added more high speed memory. As a result, it slashed training time and made inference faster. And these H100s are flying off
——从2023年售出150万片到2024年超过200万片。吸取了云计算热潮和疫情需求激增的教训后,各大厂商现在都是数十万片地批量订购。
the shelf, going from selling a million and a half in 2023 to over 2,000,000 in 2024. And the big players, having learned from the past years of the cloud boom and the COVID demand boom, are ordering these in the hundreds of thousands.
这些东西可不便宜。每个芯片都像一辆车,对吧?单价4万美元。据说你没法一次只买一个芯片。
And these things are not cheap. Each chip is like a car. Right? $40,000 apiece. And supposedly, you couldn't buy one chip at a time.
不,既然能装箱销售,为什么要一辆一辆卖车呢?
No. Why would you sell one car at a time when you can put them in boxes?
他们实际上只卖两种型号,对吧?一种是训练箱,一种是推理箱。训练箱是八块GPU加中央CPU的机箱,通过NVLink硬连接。最有趣的是这本质上是在制造一块巨型GPU,关键在于建立GPU间超高带宽的连接,因为训练时最需要这个。
They really sold these in two configurations. Right? There's training boxes and inference boxes. The training box is an eight GPU box with a CPU in the middle, hardwired with this NVLink connection. What's really interesting about it is that it's all about making the box effectively one big GPU, creating a really high bandwidth connection between the GPUs, because when you're training, that's what you want.
说到NVLink——如果有人装过电脑就知道主板上有PCI接口。NVIDIA显卡就是通过PCI接入电脑的,那感觉已经很快了。但对这个场景还不够快,所以NVIDIA发明了NVLink。
On NVLink: if anyone's ever built a computer in their day, you'd know the motherboard has what's called a PCIe connection. That's how your NVIDIA graphics card would plug into your computer, and it felt like a really fast connection. Well, it's not fast enough for this, so NVIDIA invented NVLink.
它的速度可达PCI的15倍,让八块GPU能极速通信。一个训练箱售价约50万美元。他们还推出了推理箱,配备两块GPU和大容量内存。明确一下,推理就是当你向ChatGPT提问它回应的过程。
It's up to 15 times faster than that connection, so these eight GPUs can communicate extremely quickly. One training box is about $500K to buy. Then they also launched these inference boxes: two GPUs coupled with a bunch of memory. To be clear, inference is when you're asking ChatGPT a question and it's responding. That process of responding is inference.
你需要大量内存来理解回应中的所有内容。这些机箱设计时就连接了大量高速内存,以便
You want a lot of memory to be able to understand all of the content in that response. These boxes were designed with a lot of fast memory connected to be able
在模型间共享上下文。把这些GPU装箱销售其实涉及物理空间问题。你们听过我们描述数据中心里一望无际的机架,这些机架有服务器插槽,传统服务器有标准功耗。但专用服务器会改变整个机架乃至数据中心的功耗密度。
to share context across the model. Now selling these GPUs in a box actually has a physical space implication. So you've heard us talk about the racks and racks in the data center as far as the eye can see. Well, these racks have slots for servers, and those servers have a traditional power consumption. But when you put in a specialized server, it starts to change the power consumption of that rack and therefore of the square footage and of the data center.
让我们稍作拆解,看看AI到来时,机架层面发生了什么变化。
So let's unpack just for a moment what's happening at the rack level when AI arrives.
历史上,一台服务器的功率大约是400瓦。你会建造一个装有20到30台这样服务器的机架,再加上一些网络设备。所以每个机架的功率大约在5到10千瓦之间。打个比方,这相当于同时运行10个吹风机的功率。
Historically, one server would be maybe 400 watts. And so you'd build a rack with 20 to 30 of these servers. You'd have some networking equipment. So it's somewhere around five to 10 kilowatts of power going to a rack. That's like 10 hairdryers running at the same time, to give you a sense.
现在GPU机箱出现了。那些用于训练的GPU机箱,每个功率高达10千瓦,相当于整个旧机架的总功耗和发热量。但你可以在一个机架里塞进4到8个这样的机箱。突然间,机架的功耗就跃升了一个数量级,达到90千瓦。这不仅对供电有重大影响,冷却系统也需要相应升级。
Now the GPU boxes arrive. Those GPU training boxes, each of those is up to 10 kilowatts by itself, so it's the same power consumption and heat as the whole rack was. But you could now pack a rack physically with four to eight of those boxes. So all of a sudden, you're an order of magnitude up in power consumption up to 90 kilowatts in a rack. This has a bunch of implications, not just for power, but also for cooling.
而推理机箱的功耗介于两者之间,整个机架可能达到40千瓦左右。
And an inference box is somewhere in between, so a whole rack maybe is up to 40 kilowatts.
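To make that jump concrete, here's a minimal sketch of the arithmetic above. The 10 kW of networking and overhead is an assumption added here to land near the quoted ~90 kW figure; all other numbers are the rough values from the episode, not vendor specs.

```python
# Back-of-the-envelope rack power math from the discussion above.
SERVER_WATTS = 400            # one traditional server, ~400 W
SERVERS_PER_RACK = 25         # midpoint of the 20-30 servers per rack
TRAINING_BOX_KW = 10          # one 8-GPU training box, up to 10 kW
TRAINING_BOXES_PER_RACK = 8   # upper end of the 4-8 boxes per rack

# Classic CPU rack: tens of ~400 W servers.
traditional_rack_kw = SERVER_WATTS * SERVERS_PER_RACK / 1000

# GPU training rack: several 10 kW boxes plus assumed networking overhead.
training_rack_kw = TRAINING_BOX_KW * TRAINING_BOXES_PER_RACK + 10

print(f"traditional rack: ~{traditional_rack_kw:.0f} kW")  # ~10 kW
print(f"training rack:    ~{training_rack_kw:.0f} kW")     # ~90 kW
```

Same physical footprint, roughly an order of magnitude more power per rack, which is the whole cooling story that follows.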
我们一直在讨论的功耗问题,其驱动因素不仅在于数据中心的规模扩大,更在于这些数据中心内部所需的功率密度。这正是AI数据中心与传统数据中心的本质区别。
One of the things driving this power conversation we've been having is, again, not only the size of these data centers, but now the density of power required within these data centers. This is what makes an AI driven data center so much different from a traditional data center.
仅20个这样的训练机架组成的集群就需要一兆瓦电力,相当于一个800人小镇的用电量。如果有200个这样的机架,就相当于8000人社区的用电规模。很快就能达到西雅图城市级别的吉瓦级用电量。
Just 20 of these training racks in a pod is one megawatt of power. That's on the order of the electricity use of a small 800 person town. So you get 200 of these racks, you get to an 8,000 person neighborhood. Very quickly, you get to a gigawatt of Seattle scale power.
随着高功率密度GPU的密集部署,它们运行温度会更高,但你又不能让它们过热。因此,GPU和机架的冷却方式必须同步革新。
As you pack these GPUs that have higher power density, they're running hotter, and you can't let them run hotter. So now how you cool your GPUs and your racks needs to evolve.
问题在于空气传导热量的速度有限。即便使用推理机架,也需要吹叶机级别的风扇来排出空气,因此需要将更多冷却系统下移到机架层面。首先可以考虑为机架加装所谓的后门热交换器,这会在机架旁直接引入水冷系统,当空气流经发热芯片时,部分热气可通过机架内的水管排出,而非直接进入热通道。这是个便捷的折中方案,因为无需改变整体系统的冷却方式。
The problem is air can only move heat so quickly. And if you were trying to do that even with the inference racks, you would need leaf blower level fans to move the air out, and so you need to start bringing more cooling down to the rack level. So the first thing that you can look at is retrofitting the racks with what's called rear door heat exchangers. This puts some water cooling directly alongside the rack so that when air is moving over those hot chips, some of that air, instead of just going straight into the hot aisle, can get extracted via some tubes of water running inside the rack. That is a nice convenient half step because you don't have to change how you're actually cooling the overall system.
下一步是直接采用芯片液冷技术,即在芯片上安装冷板,让水流经芯片附近直接带走热量,而非依赖空气和散热片。随着GPU性能代际提升(需知每代GPU性能越强,散热需求越大),我们正全面转向水冷时代。
The next jump is the jump to direct chip liquid cooling. This is where you mount cold plates directly on your chips, and you run water right near the chip to actually extract the heat away instead of expecting air and heat sinks to do it. And so increasingly, especially with the most powerful GPUs, and to be clear, each generation of GPUs gets more powerful and therefore has more heat to remove, we are moving to a water cooled world.
我们已突破物理极限——空气无法冷却机架在这种密度下释放的热量,因此现在进入液冷时代。最终仍需将热量从室内排到室外,但这高度依赖环境。在适宜气候区(如芬兰、俄勒冈等地),可直接外排热量,这也是部分数据中心选址这些地区的原因。
We've crossed the threshold of physics where air can cool this amount of heat being emitted at this level of density from the racks, and so now it is a liquid cooling world. Now at the end of the day, you've got to move the heat from inside to outside. But it's very environment dependent. In the right climates, you can kind of expel it outside. And that's why we saw some of these data centers looking at Finland and Oregon and certain geographies that enable that to happen.
这种模式耗水量极少。但在其他气候区,需要大型蒸发水冷塔,这时冷却用水对当地社区的影响就开始显现。
And this consumes very little water. But in other climates, you have large evaporated water cooling towers. This is where the problem of the water impact on the local community starts to come into play.
本质上,我们是通过相变来散热。水蒸发时会带走大量热量,这些冷却塔通过增加水表面积来加速蒸发。
Fundamentally, what we're trying to do is we're removing heat through that phase change. You're evaporating water. That's a massive transfer of heat. These towers increase the surface area of the water. Water evaporates out.
看起来就像冷却塔升腾的蒸汽,这是非常高效冷却方式,但会消耗大量水资源。
It looks like steam going up from these cooling towers, and it's a very effective and efficient way to cool things down, but it can use up your water.
因此在特定环境中面临能源与用水的权衡,这不仅是与当地社区协作的数据中心的规划重点,也是那些受电力限制、注重碳强度的企业建设数据中心时的核心考量。谷歌2022年研究发现,水冷数据中心能耗降低约10%,意味着比风冷数据中心减少约10%碳排放。但问题在于水资源紧张地区难以实施这种方案。
And so you end up with this energy water trade off in certain environments, and that becomes a big planning focus not only for data centers that are working with their local community, but also those that are power constrained and built by companies that care about carbon intensity. One study Google did in 2022 found that water cooled data centers use about 10% less energy, which means they emit about 10% less carbon emissions than many of the air cooled data centers. Now the problem is you can't do this as easily in water stressed areas.
简单来说,这就像有个可以调节的旋钮。需要明确的是,确实可以建造完全不耗水的数据中心,但那样会消耗更多电力。就像你家开空调不用水,但会耗费大量能源。所以我认为,如果我们关心水资源,最好的办法就是发展清洁电力,让这种取舍变得更简单。
So there's essentially this, like, knob you can dial up and down. To be clear, it's possible to build data centers that don't use any water, but they're gonna use more electricity. To run an air conditioner in your house, you're not using a bunch of water, but you're taking a ton of energy. And so I would argue if we care about water, the best thing we can do is have clean electricity to make this an easier trade off to make.
从全球视角看水资源问题,数据中心每年消耗超过5500亿升水,这个数字非常庞大。具体来说,一个100兆瓦的数据中心每天可能消耗200万升水,相当于6500户美国家庭的日均用水量(包括直接和间接消耗)。因此,选址审批和日常运营中的水资源问题,已成为数据中心当前面临的重大现实挑战。
So if you zoom out and look at water, globally, data centers consume over 550,000,000,000 liters of water annually, which is a big number. To put that into context, a single 100 megawatt data center can consume 2,000,000 liters a day, or the amount equivalent to 6,500 US households per day, both direct and indirect water usage. So this is a very big, real, and current issue for data centers in terms of where they site their data center, how they get permitted, and the ongoing operations.
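As a sanity check on the household comparison above, here's a rough sketch. The ~300 liters per household per day of direct plus indirect water use is an assumption picked to reproduce the episode's comparison, not an official statistic.

```python
# Sanity check on the 100 MW data center water comparison.
DC_LITERS_PER_DAY = 2_000_000        # a 100 MW data center, per the episode
HOUSEHOLD_LITERS_PER_DAY = 300       # assumed per-household figure

households = DC_LITERS_PER_DAY / HOUSEHOLD_LITERS_PER_DAY
print(f"~{households:,.0f} households")  # ~6,667, close to the quoted 6,500
```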
地域性问题至关重要。
The local question is hyper important.
嗯。
Mhmm.
《纽约时报》曾报道佐治亚州的案例:当Meta启动价值十亿美元的数据中心建设时,附近居民家中水龙头突然断流。虽然是否存在直接因果关系尚存争议——目前还在核实是否确实由该数据中心导致——但已知该中心消耗了全县10%的供水量,并推高了当地水价。
Some reporting that the New York Times did about the situation in Georgia, where supposedly when Meta broke ground on their billion dollar data center build out, water taps in some residents' homes nearby went dry. And, you know, there's some back and forth around whether you can prove direct causality. It's still being worked out whether that was actually caused by the data center, but it is known that the data center is using about 10% of the county's total water, and it is driving water prices in the region up.
现在让我们探讨当前数据中心扩建的最大制约因素——电力。我们梳理过电力问题决策层级的演变:从2010年代初期到疫情时期,再到现在的AI时代。电力已成为真正的限制性要素。
So now let's dive into what is the largest constraint to data center build outs today, and that is power. We've talked about kind of the evolution of power moving up the decision stack from the early twenty tens through the COVID era and now in the AI era. Power is really the limiting reagent.
我脑海中浮现的景象是:人类建造过最庞大复杂的机器——电网(它彻底改变了我们的生活方式),正与数据中心这个新兴的复杂精妙机器产生碰撞。而我们正在经历的这场剧烈碰撞,恰恰定义了这个时代。
The image in my mind is we have the largest, most complex machine that humankind has built in the power grid, which has brought us such evolution in how we live our lives, and it is colliding with the newest complex, most interesting machine, the data center. And this collision, and the force of it that we're living through right now, is defining this period.
这绝非小规模的碰撞。它取决于两个因素:速度与规模。正如我们所见,计算资源正面临一场绝对的军备竞赛,因为上线计算资源的速度越快,模型表现就越好,创造的收入也就越多。时机在这里至关重要,因为2025年训练的模型到2027年就可能过时。
And it is no small collision. It's a function of two things. It's a function of speed and a function of scale. So as we've seen, there is an absolute arms race on compute because the faster you can get compute online, the better your models are, the more revenue you're gonna make. And timing matters a lot here because a model trained in 2025 can become obsolete by 2027.
因此,如果因为电网问题导致数据中心延迟两年上线,这将成为致命问题。必须现在就行动。再加上这些数据中心的规模——正如我们讨论过的,典型数据中心最初是5到10兆瓦,几年前发展到50到100兆瓦。
So if you have a two year delay in getting your data center online because of the power grid, it becomes a deal breaker. It has to happen now. Coupled with the scale of these data centers. And so as we talked about, a typical data center started as five to 10 megawatts. Then just a few years ago, we were in the 50 to 100 megawatts.
而今年,我们开始看到吉瓦级别的项目公告。需要说明的是,这些是组成吉瓦规模的园区,由多个设施构成,但仍需协调建设,并将以吉瓦级规模对电网和当地社区造成压力。为了理解吉瓦的概念——
And now this year, we're starting to see gigawatt scale announcements. Now to be clear, these are campuses that make up a gigawatt. They're multiple facilities, but they are still being built out cohesively and will put a strain on the grid and the local community at a gigawatt scale. Now to put a gigawatt in perspective This is
这相当于一个城市的电力消耗规模,对吧?比如匹兹堡或克利夫兰。谷歌数据中心用电量四年间翻倍,从2020年的1450万兆瓦时增至2024年的3000万兆瓦时。更直观地说,这相当于300万美国家庭的用电量,约占全美电力消耗的0.75%。
the scale of a city's power consumption. Right? Maybe Pittsburgh or Cleveland. And Google's data center electricity use doubled over four years, up to 30,000,000 megawatt hours in 2024 from 14 and a half million megawatt hours in 2020. To put that into perspective, that's about 3,000,000 US homes or around three quarters of a percent of all US electricity consumption.
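Those Google figures can be cross-checked against some US-wide numbers. The per-home and national totals below are approximate public figures, assumed here for illustration.

```python
# Cross-checking the Google data center electricity figures.
GOOGLE_MWH_2024 = 30_000_000      # 30 million MWh, per the episode
US_HOME_KWH_PER_YEAR = 10_500     # assumed average US home usage
US_TOTAL_TWH = 4_000              # assumed total US consumption per year

homes = GOOGLE_MWH_2024 * 1_000 / US_HOME_KWH_PER_YEAR   # MWh -> kWh, then homes
share = GOOGLE_MWH_2024 / (US_TOTAL_TWH * 1_000_000)     # TWh -> MWh

print(f"~{homes / 1e6:.1f} million US homes")   # ~2.9 million
print(f"~{share:.2%} of US electricity")        # ~0.75%
```

Both quoted figures check out against these rough inputs.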
这个数字是合理的,对吧?如果数据中心目前占全美用电量的4%,谷歌约占其中的五分之一也说得通。
Which checks out. Right? If data centers are currently about 4% of US electricity use, I guess it makes sense that Google's about a fifth of that.
值得在此稍作停顿,谈谈这4%的用电量。当我听到这个数字时,考虑到所有关于数据中心和电力的讨论,它实际上小得惊人。假设增长到8%,与工业用电、建筑制冷供暖相比,感觉都很小。那为什么这会成为冲突焦点呢?
And it's worth pausing there for a second to talk about that 4% electricity use. When I hear that number, it actually seems shockingly small for all of the discussion of data centers and electricity. And I say, okay, it's 4%. Let's say it goes to 8%. Compared to industry, compared to cooling and heating buildings, it all feels small. So why might this be such an issue of conflict?
这是个好问题。可能因为从全国总用电量角度考虑会产生误导——数据中心的影响是局部且集中的。这种负荷不会分散到多个公用设施,而且集中区域具有相似性。正如我们所说,数据中心靠近海底电缆和其他网络节点能带来性能优势,因此最终会集中在弗吉尼亚、加利福尼亚和德克萨斯等地。
It's a great question. It might have to do with the fact that it's misleading to think about it in terms of total national electricity use because a data center has a localized and concentrated impact. You're not spreading this load across multiple utilities. And not only is it localized and concentrated, it's localized and concentrated in similar areas. Because as we've been talking about, the network effect of a data center being positioned near the undersea cables and near other network points is where you get a lot of performance gains. And so you end up concentrating in Virginia, in California, in Texas.
基本上只有10个州能看到新数据中心上线。因此,这对当地社区、县和州的电价产生了巨大影响,但不一定对全国层面造成冲击。
There's just basically 10 states where you're seeing new data centers come online. And so there's a tremendous impact at a local level at the electricity prices for that community, that county, and in that state, but not necessarily at a national level.
要知道,我们讨论的是千兆瓦级别。这完全关乎那些大型训练集群。当我们谈到推理时——比如ChatGPT处理你的响应——实际上你可能希望分散处理。你希望更靠近边缘,需要靠近互联节点,因为要确保与用户及其数据交互时没有延迟。
You know, we talk gigawatt scale. That is all about these large training clusters. When we talk about inference, which is, hey, ChatGPT processing your response, you actually probably want that spread out. You want that closer to the edge, and that's where you need it close to the interconnect because you want it to not have latency when interacting with a user and their data.
那么我们来聊聊为什么需要
So let's talk a little bit about why you need
一个千兆瓦数据中心或五吉瓦数据中心来实现最佳训练效果。你不该把它看作单个GPU、一排GPU或整栋楼的GPU。理想情况下,你希望整个数据中心协同运作。因为Transformer模型的训练原理是:先预测下一个可能的结果,再与实际情况对比,然后调整所有节点的权重——这是在整个模型上集体完成的。
a gigawatt data center or a five gigawatt data center to do the best training possible. You don't really wanna think about it as one GPU or a rack of GPUs or a building of GPUs. You want, as much as possible, the whole data center to act together. Because the way that a transformer actually trains is that it makes guesses about what should come next, and then it compares that to reality, and then it tunes the weights across it. You're doing that collectively across the whole model.
要实现这点,所有参与计算的计算机必须能快速通信。否则整个训练过程就会变慢。这些数据交换越快越好。因此你需要芯片间的连接速度尽可能快,芯片组之间的连接也要达到极限速度。
To do that, you need all of the computers working on the problem to be able to communicate quickly. Otherwise, the whole thing is training slowly. The faster those exchanges can happen, the better. And so you want the connectivity between chips to be as fast as possible. You want the connectivity between boxes of chips to be as fast as possible.
所以如果突然把半数计算机放在东海岸,半数放在西海岸就行不通了——光速横跨全美的延迟会比在同一个园区内慢好几个数量级。这就是为什么Meta要建造五吉瓦园区,这样才能实现最大规模的模型训练。
And so it does not work if all of a sudden half of your computers are on the East Coast and half are on the West Coast, because the speed of light across the country is going to slow you down by multiple orders of magnitude compared to within one campus, within one center. And that's why Meta wants to build a five gigawatt campus, because that's gonna get them the biggest training model possible.
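A rough sketch of why the distance matters. Assumed here: signals in optical fiber travel at about two thirds the speed of light, roughly 4,000 km coast to coast, and roughly 1 km across a single campus; switching and queuing delays are ignored.

```python
# Why training wants one campus: one-way propagation delay at two scales.
C_FIBER_KM_PER_S = 200_000  # ~2e5 km/s in optical fiber (about 2/3 of c)

def one_way_latency_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds, ignoring switching delays."""
    return distance_km / C_FIBER_KM_PER_S * 1000

cross_country_ms = one_way_latency_ms(4_000)  # ~20 ms
in_campus_ms = one_way_latency_ms(1)          # ~0.005 ms (5 microseconds)

print(f"coast to coast: ~{cross_country_ms:.0f} ms one way")
print(f"within campus:  ~{in_campus_ms * 1000:.0f} microseconds one way")
print(f"ratio: ~{cross_country_ms / in_campus_ms:,.0f}x slower")
```

When every gradient exchange pays that delay, thousands of times per training step, a split-coast cluster trains drastically slower than a single campus.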
这里还有个有趣的细节:在典型数据中心里,训练模型的负载系数通常比推理模型更高。
There's also an interesting nuance here where training models hit higher load factors than an inference model in your typical data center.
你试图一次性运行所有任务,理想情况下会充分利用所有昂贵的芯片。这一切都导致了更高的利用率、更大的功耗和更强的计算能力。
You're trying to run this all at once, and you're utilizing ideally all your expensive chips. And so all of that leads to a higher utilization and more power and more compute.
我们讨论了速度问题。从通电到产生收益的时间至关重要。现在我们谈到了规模,从100兆瓦扩展到千兆瓦级别。这为数据中心增长中的电力约束构建了一个目标函数。最终问题归结为:如何快速、按需、尽可能清洁地获取持续24/7的大规模电力供应?
So we talked about the speed. Time to power is time to revenue. We've talked about the scale now going from 100 megawatts to gigawatts. So this creates an objective function when we're thinking about the power constraint on data center growth. And that ends up being: how do we secure large 24/7 power where you need it, when you need it, quickly, and ideally as clean as possible?
应该很简单对吧?哦等等。更不用说变压器供应链存在瓶颈,燃气轮机也存在瓶颈。
Should be easy, right? Oh, wait. Not to mention there's supply chain bottlenecks in transformers, bottlenecks in gas turbines.
劳动力和专业技能方面也存在瓶颈。
Bottlenecks in labor and in specialty skills.
我们没有配备最新输电线路的电网系统,因此形势变得非常紧张,这也催生了可以说是创新的领域。
We don't have the most up to date, freshest grid with transmission lines, so this gets into a very tense moment and an area that arguably breeds innovation.
现实情况是,要快速、清洁地获得24/7不间断的廉价电力将非常罕见。这种'金发姑娘'式的理想状态目前并不存在。因此我们至少要在其中一个变量上做出妥协。现在要让这些数据中心上线,你有几种不同选择对吧?
So the reality is getting affordable power that is 24 by seven quickly and cleanly is gonna be very rare. That Goldilocks situation doesn't really exist today. And so we're gonna have to compromise on at least one of those variables. Now you've got a few different options to get these data centers online. Right?
你可以选择接入电网使用现有电力,也可以建设离网的新型发电设施专门供电,或者采用某种混合方案。长期来看,我们可以开发核能、地热能等大型新型发电方式。
You can either hook it up to the grid and use power that already exists. You can build new generation of power that's off the grid and power that data center. Do some sort of hybrid in between. And over the long term, we can build out large new generation like nuclear, geothermal, and others.
仔细观察这一现象,既有许多令人担忧的时刻,因为我们有时会做出诸如延长燃煤电厂运营时间等本不该有的举措。同时也有许多创新的亮点,储能技术的快速发展为现有系统提供了补充,平滑了发电曲线,并以新型混合方式实现了协同布局。
Watching this up close, there's both a lot of moments of concern, right, because we're at times doing things like keeping a coal plant running longer than we might have otherwise. There's also a lot of interesting moments of innovation: a lot of growth in storage to help supplement what's going on and smooth out power generation, and co-locating these in new hybrid ways.
这种矛盾很大程度上由时间因素决定。当数据中心上线并可能导致电价上涨时,当地社区正面临着短期阵痛。但我们也应保持乐观,因为有资金雄厚的企业致力于长期可持续电力发展。事实上,最具成本效益的长期可持续能源终将是清洁可再生能源。
There's a tension that's largely dictated by timing. Right? There's a near term pain that local communities are facing when you bring a data center online and potentially increase electricity prices. But there's also a cause for optimism because you've got well funded companies who have an interest in long term sustainable power. And the fact of the matter is the most affordable long term sustainable power we have is going to be clean and renewable.
因此当前这个时刻实际上为我们提供了加速电网清洁化建设、重塑电力基础设施的机遇。
And so this moment actually offers an opportunity for us to accelerate a lot of new clean build on the grid and reshape our electricity infrastructure.
我认为归根结底还存在激励机制的重大问题。这些超大规模企业现在就想构建这些模型,其中多数仍习惯于基于软件开发周期来开展业务。尽管大家都在讨论公用事业和电力问题,但对他们而言这只是一个瓶颈——构建最大模型的实际能源成本仅占2%到6%。
I think at the end of the day, there's also a big issue of incentives. These hyperscalers, they want to build these models now. Many of them are used to mostly still building their businesses on software based timescales. And the reality is, despite all the discussion of the utilities and electricity, for them it's a bottleneck. The actual cost of energy for them to build the biggest models is like two to 6%.
真正的成本全在人才和硬件上,全在芯片和人力资源上。正如你刚才所说,对他们而言关键在于上市速度。但这与我们的系统设计方式、与我们的激励机制完全不相符。
It is all in the people and the hardware. It is all in the chips and the humans. So for them, as you said earlier, it's all about the speed of getting out there. And that's not how our system is set up. That's not how our incentive structures are set up.
确实如此。因为供电企业遵循完全不同的运营范式——对典型电力公司而言,2027年输送的电力与2030年输送的电力价值等同。公用事业公司不区分这些产品,但对客户(如谷歌或Meta)而言,这对其业务影响天差地别。
That's right. Because the utility supplying that power operates on an entirely different paradigm. In a typical electric utility, power delivered in 2027 is valued the same as power delivered in 2030. Utilities don't differentiate those products. But for the customer, a Google or a Meta, it's a massive difference for their business.
他们需要的是即时电力供应。
They need power today.
如果他们现在就能获得电力,而不是等到三年后,他们愿意多支付多少钱?
How much more would they pay if they could get power now versus three years from now?
如果我们能让公用事业公司利用企业愿意现在支付比两年后高得多的费用这一事实,我们能释放多少对公用事业的投资潜力?
And how much could we unlock in terms of investment into our utilities if they could take advantage of the fact that companies are willing to pay a lot more today than in two years.
布莱恩·贾纳斯称这种现象为'瓦特比特价差'。这本质上是一种经济套利,存在于可用电力与将其转化为数据比特的能力之间。现实情况是,公用事业费率定价的监管变革速度——换句话说,公用事业公司调整其经济模式的灵活性——是缓慢的。因此,尽管存在明显的市场需求结构,未来十二个月内公用事业产品的定价方式不会改变。
Brian Janous calls this the watt bit spread. It's kind of the economic arbitrage between an available watt and the ability to turn that into a bit. The reality is the speed of regulatory change in utility tariff pricing, which is a fancy way of saying how quickly a utility can change anything about its economics, is slow. And so despite there being an obvious market demand structure, the way that utilities price their product is not going to change in the next twelve months.
公用事业是公共产品。它们为我们的生活、学校和医院提供动力,因此受到监管是合理的。但也正因为受到监管,它们的变革需要时间。此外,我们有一个联邦体系。美国有大约3000家不同的公用事业公司。
Utilities are public goods. They power our lives and our schools and our hospitals, and so for good reason, they are regulated. But because they are regulated, they take time to change. And moreover, we have a federated system. We have some 3,000 different utilities across The United States.
因此,要在全国范围内、甚至在正在建设数据中心的十个州实施这种变革,都将耗费大量时间。
And so making that change across this country or even in the 10 states where data centers are getting built out will take a lot of time.
电力已成为决定数据中心选址的首要问题,因为没有电力就无法建设,但这并不意味着其他问题就不存在了。例如,你需要人员来建造数据中心,需要大量电工,需要所有组件和供应链,还需要建造降压变压器。
Power has become the dominant question of where you can build your data center because you can't go do it without that, but it didn't remove the fact that the other questions still remain. For example, you need people to go build the data center. You need a lot of electricians. You need all the components and the supply chain to build the transformers that step down the power.
根据我们目前的讨论,你可能会认为数据中心几乎完全是美国的故事。虽然美国是数据中心建设的主导者,但这实际上是一个全球性的现象——全球基础设施正在建设中,其他国家也对数据主权有着强烈需求。为了帮助理解当前局势,让我们稍微回溯历史,了解正在全球上演的这场大戏。
Based on how we've been talking about this, you might think that data centers are almost entirely a US story. And while The US is the dominant builder of data centers, there is actually a global story here and global infrastructure being built out and a strong need for sovereignty for other nations. So in order to help understand what's going on here, let's go back in time a little bit to understand the drama that's setting in globally.
我认为值得回顾的是,就在2018年,欧洲通过了《通用数据保护条例》(GDPR),这为全球隐私保护设定了基准。感谢那些不断弹出的cookie提示。但更重要的是,它对欧盟数据的存储位置以及如何处理欧盟居民在境外数据制定了严格管控。因此,我认为这实质上是对'关于我国公民的数据存储在何处'的回应,并由此产生了对欧盟境外数据持有和传输行为的监管措施。
I think it's worth looking at, you know, just fairly recently: in 2018, Europe passed GDPR, and that set this world baseline for privacy. Thank you for all the cookie pop ups. But also really tight controls on where EU data sits and what happens if you have EU resident data outside of the EU. And so this is really, I think, a reaction to where people are storing data about our citizens. And so there's controls about what happens if you're holding or transferring data outside of the EU.
2018年美国通过《云法案》,该法案本质上授权美国当局可合法要求美国服务提供商提供数据——即使这些数据存储在美国境外的服务器上。因此
In 2018, The US passed the Cloud Act, which essentially lets US authorities lawfully demand data from US providers, even if that data sits on servers outside of The US. And so
最终效果是:如果你大规模处理欧盟数据,就必须开始在欧盟境内部署服务器和硬盘。我认为这几乎形成了大型云服务商的另一种垄断优势,因为他们会为此专门建设基础设施。
the net effect is if you're working with EU data at scale, you need to start having servers and hard drives in the EU. And this becomes, I think, almost another lock in that the big cloud providers start to have because they go and build the infrastructure to do that.
我们在微软案例中见证了这点。对吧?他们早期扩张的重要部分就是追随全球企业客户的脚步,其中在德国建设数据中心就是典型案例之一。
And we saw this in the Microsoft story. Right? A big part of their early scaling was following their enterprise customers globally and needing to build out data centers in Germany as one of the primary cases.
所以如果说美欧之间只是一场小争执,那么现在该把目光转向地缘政治层面的数据中心大战了。
And so if the EU US situation looks like a little tiff, I think it's time to move to the major geopolitical data center drama.
有人称之为新冷战,有本名为《芯片战争》的著作就聚焦于中美之间这场全方位的争夺战,
Which some have deemed the new Cold War, and there's a book called Chip War, which centers around this notion of The US and now China in a fight over everything,
但核心都围绕这项技术。让我们回到2012年——我认为这真正开启了近十到十五年的对抗序幕。当时美国众议院情报委员会报告警告运营商不要使用华为和中兴设备,声称敏感政府系统绝不能采用。当然,这种限制后来逐渐蔓延到非政府领域。
but centered on this technology. So let's go back to 2012, which I think really kicked off this last decade to fifteen years of animosity here. The House Intelligence Committee report warned US carriers away from using Huawei and ZTE equipment, saying that this shouldn't be used in any sensitive government system. Now that, of course, starts to spiral outside of just government use.
最初是针对运营商。北京在另一个前沿——数据前沿——做出了回应。2017年中国网络安全法及后续法规推动数据本地化和更严格的国家管控。美国企业不得不通过业务本地化应对,这意味着AWS、苹果、Azure等要么出售给中国合作伙伴,要么必须与有政府背景的中国企业合作。
That was first with the carriers. Beijing responded on a different front, the data front. So in 2017, China's cybersecurity law, followed by other laws, pushed data localization and tighter state control. And so The US companies had to respond by localizing operations, which essentially means AWS, Apple, Azure, they either had to sell off to Chinese partners or find a Chinese government linked company to be a partner.
没错。这些公司不仅要把硬盘和服务器放在中国,还必须由中国企业持有和运营。于是美国进一步反应,禁止所有联邦机构从一长列中国供应商处采购或使用设备,并禁止FCC批准华为、中兴等的新授权。事态就这样不断升级。
That's right. So it's not just that those companies' hard disks and servers had to be in China, but they actually had to be owned and operated by a Chinese company. And so then The US, you know, reacted even further. They barred all federal agencies from buying or using equipment from a long list of covered Chinese vendors, and they barred the FCC from approving any new authorizations from Huawei, ZTE, and others. So this just ramped up further and further.
这场酝酿中的战争不仅限于陆地,还波及我们的海底电缆朋友。2016年谷歌和Facebook与香港公司合作建设太平洋光缆网络,这条1.2万公里的海底电缆连接洛杉矶与香港、台湾和菲律宾,属于我们之前讨论的全球海底电缆网络。看似绝妙的主意,能出什么问题呢?
This brewing war was not just limited to land, but also involves our friends, the undersea cables. And so in 2016, Google and Facebook partnered with a Hong Kong based company to build the Pacific Light Cable Network, a massive 12,000 kilometer undersea cable linking Los Angeles to Hong Kong, Taiwan, and The Philippines, part of that global network of undersea cables we were talking about earlier. Seems like a great idea. What could go wrong?
四年后的2020年,美国国家安全官员表示不再信任香港登陆点,认为这将成为北京监控互联网重要基础设施的渠道。最终FCC封锁了该线路。如今电缆只启用台湾和菲律宾段,已铺设的香港分支线路则处于闲置状态。
And fast forward four years later, to 2020: US national security officials say, we're not sure about this Hong Kong landing point anymore. We think that's going to be a vehicle for Beijing to have surveillance across this important backbone of the internet. So the FCC ultimately blocked that route. And so now the cable hits Taiwan and hits the Philippines, and those are lit up. But the Hong Kong branch that's been laid is, I believe, just lying dark.
唉,我现在坐在洛杉矶,却用不上通往香港的电缆。回到陆地和芯片战场,斗争仍在继续。2022年商务部出台对先进AI芯片和晶圆厂设备的全面管制。别忘了此时我们刚经历疫情繁荣期。
Ugh. So I'm sitting here in Los Angeles without that cable to Hong Kong. Now moving back to land and to chips, the battle continues. And so in 2022, the Commerce Department rolls out these sweeping controls on advanced AI chips and fab tools. And so remember, this is the moment where we've gone through a COVID boom.
我们看到加密货币、流媒体和AI应用激增,而商务部在2023年和2024年持续收紧这些管制措施。
We're seeing an uptake in usage from crypto and streaming and now AI, and the commerce department tightens these controls again in 2023 and again in 2024.
局势非常活跃。几乎每个英伟达财报季都会讨论此事。几周前美国开放通道,允许英伟达出口性能降级的H20等特定芯片,但需将15%营收上缴美国政府。这种'付费出口'安排前所未有——就像当下许多前所未有之事。但放眼全局,这引发了我们该如何参与的 geopolitics 战略思考。
And this feels very active. Literally every quarter of NVIDIA's earnings, there's discussion of this. A few weeks ago, The US opened a channel that said, NVIDIA, you can ship certain chips that are dialed back, like the H20, but you have to give the US government a 15% cut of the revenue. It's a pretty unprecedented pay to export arrangement, like a lot of things going on right now that are unprecedented. But if you zoom out, it raises some good geopolitical strategic questions on how we wanna engage.
对吧?
Right?
是的。这种先进算力是构建新经济的基础。至于最终效果如何还有待观察,可能会适得其反。对吧?
Yeah. It's foundational to the new economy that's being built on this advanced compute power. And it remains to be seen how this plays out. It may backfire. Right?
中国并不热衷于依赖美国的生产。因此他们正往芯片领域注入大量资金。虽然在原始性能和软件生态主导地位上仍落后于英伟达,但考虑到中国的资源实力,这个差距很可能会迅速缩小。
China is not terribly keen on being dependent on US production. So they are infusing mass amounts of capital in their chip sector. It's still a step behind NVIDIA in terms of raw performance and software ecosystem dominance. But given China's resources, you have to assume that that gap is going to narrow and narrow quickly.
中国追求能源自主是有原因的。他们希望实现硅的本土化生产,因此投入巨资并正在迎头赶上。有估计显示其性能已达英伟达的60%到70%(具体取决于应用场景),他们还设立了500亿美元的基金来推动芯片发展。更重要的是,他们通过政策要求数据中心至少50%的芯片必须国产,这既保障了市场供应端,也同步培育了需求端。
There's a reason China wanted domestic energy. They want to have their own domestic production of silicon here, and so they've invested heavily and they're catching up. Some estimates are that they're at 60 to 70% of NVIDIA performance, depending on the workload, and they're investing out of a $50,000,000,000 fund to improve chip development. And even more so, they're passing policy that says data centers need to source at least 50% of their chips domestically. So they're making sure to really stand up not just the supply side of the market, but also simultaneously the demand side of the market.
问题的关键不在于中国芯片能否达到英伟达高端设计的水平,而在于在现有投入下这个目标何时实现。简言之,目前美国仍保持领先,但中国芯片生态正在飞速发展。随着时间推移,被压制的英伟达芯片积压需求将变得越来越无关紧要。
The story is less about whether Chinese chips can equal NVIDIA's top end designs and more about how quickly that's gonna happen given the demand and the capital that they're putting in. So in short, at the moment, The US is still ahead, but the China chip ecosystem is accelerating rapidly. And their backlog of demand for NVIDIA chips, which is being withheld, is gonna matter less and less over time.
从地缘政治角度看,各国政府正将AI数据中心热潮视为未来经济的重要部分。阿联酋与OpenAI合作的"星际之门"项目就是典型——这个位于阿布扎比的千兆瓦级AI集群,吸引了从甲骨文到英伟达、思科到软银等生态玩家共同投资建设。这引发了许多值得思考的问题:数据中心的选址对政府而言究竟意味着什么?
I mean, the other dynamic that you have geopolitically is governments are seeing the AI data center boom as an important future part of their economy. A deal that exemplifies this is the Emirates deal with OpenAI around this UAE Stargate build out, a one gigawatt AI cluster in Abu Dhabi, with all the ecosystem players from Oracle to NVIDIA to Cisco to SoftBank coming in to finance this and collaborating on the build out. And I think it just raises a lot of interesting questions. What about the location of a data center matters to a government?
现状触目惊心:目前拥有AI数据中心的国家仅32个,且大部分位于北半球。拉美和非洲广袤区域完全空白。各国政府深感忧虑——如果持续租用遥远数据中心的算力,不仅受制于人,也难以用同等掌控力支持本土企业、科研和学术发展。"算力主权"概念因此成为许多人的首要关切,新兴市场普遍担忧AI时代可能带来...
And the map is stark. If you look at where AI data centers are located today, only 32 nations have them, and most of them are in the Northern Hemisphere. You have large swaths of Latin America and Africa as fully dark. Governments are deeply concerned because they are wondering: if you continue to rent compute power from faraway data centers, you remain at the whims of, and vulnerable to, foreign entities and foreign companies, and you aren't able to support domestic enterprise, domestic scientific research, or academia with the same level of control. And so this idea of compute sovereignty is top of mind for a lot of people, and a lot of emerging markets are worried that the AI era runs the risk
在经济上让他们更加落后。这几乎与能源独立有着某种粗略的类比,对吧?就像无论你与另一个国家的关系如何,你是否拥有它?这似乎是这里的重要线索。
of leaving them even further behind economically. There's almost a rough analogy to energy independence. Right? And, like, do you have it regardless of your relationship with another country? And that feels like the important thread here.
另一个棘手的方面,我们之前稍微讨论过,就是对气候的实际影响,这最近受到了很多关注,理所当然。所以有几个问题不断出现,其中一个,本,也许你妈妈问过你,我知道我妈妈问过我,就是ChatGPT查询是否对气候有害。如果我关心气候,我真的应该使用人工智能吗?我们有一些关于这方面的数据。
The other tricky dimension here, which we've talked a little bit about, is the actual impact on the climate, which gets a lot of attention these days, rightly so. So there's a few questions that continue to come up, and one that, Ben, maybe your mom has asked you, I know mine has, is whether a ChatGPT query is bad for the climate. If I care about the climate, should I really be using AI? And we have some data on that.
有一些关于这方面的报道,我认为这已成为一个相当热门的话题,甚至有些人因为气候影响而羞辱他人使用这些工具。所以值得看看最新的数字。谷歌实际上刚刚发布了一篇公开可读的关于Gemini的论文,其中详细介绍了能源消耗和能源的碳强度。因此,Gemini文本提示的中位数,我认为可以视为与Claude、ChatGPT或Copilot提示相当,使用0.24瓦时的能量,排放0.03克二氧化碳当量,消耗0.26毫升或大约五滴水。这些数字远低于公众的估计。
There's been some reporting on this, and I think it's an area that's become a pretty hot button issue for folks, and there's been, I think, some people even shaming others for using these tools because of the climate impacts. So it's worth looking at the latest numbers. Google actually just put out a publicly readable paper on Gemini where they go down to the details on the energy consumption and the carbon intensity of that energy. And so the median Gemini text prompt, which for all intents and purposes, I think could be viewed as fairly equivalent to a Claude or a ChatGPT or a Copilot prompt, uses 0.24 watt hours of energy and emits 0.03 grams of carbon dioxide equivalent and consumes 0.26 milliliters or about five drops of water. Those are figures that are substantially lower than what the public estimates have been.
为了有个大致的概念,每个提示的能源影响大约相当于看电视不到九秒钟。为什么我们认为公众报道可能相差这么远?
For some kind of sense of scale, that per prompt energy impact is about the same as watching your TV for less than nine seconds. Why do we think the public reporting might be so far off?
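The "nine seconds of TV" comparison is easy to sanity-check. A rough sketch — the ~100 W television draw is our own assumption for illustration, not a figure from the episode:

```python
# Sanity check of the per-prompt energy comparison.
# 0.24 Wh is the median Gemini text prompt figure cited in the episode;
# the ~100 W television draw is an assumption for illustration.
prompt_energy_wh = 0.24   # watt-hours per median text prompt
tv_power_w = 100          # assumed TV power draw, watts

# Convert watt-hours to watt-seconds, then divide by the TV's draw.
tv_seconds = prompt_energy_wh * 3600 / tv_power_w
print(f"One prompt is roughly {tv_seconds:.1f} seconds of TV time")  # ~8.6 s
```

Which lines up with the "less than nine seconds" framing in the conversation.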
随着时间的推移已经有了很多改进,所以重要的是要立足当下。通过同一份报告我们知道,我们正在使用的人工智能系统正变得越来越高效。软件和硬件不断创新以提高效率。因此,在十二个月内,标准Gemini文本提示的能源消耗下降了33倍。哇。
There's been a lot of improvement over time, and so it's important to ground ourselves in this moment. We know from that same report that the AI systems we're using are becoming more efficient. There's constant innovation in the software and the hardware to drive more efficiency. So over twelve months, the energy of a standard Gemini text prompt dropped 33-fold. Wow.
能源消耗提高了33倍。从总碳足迹百分比来看,甚至提高了24倍,同时还能提供更高质量的响应。因此,我们正处在一个效率提升持续的世界里。所以,你的查询对气候有害吗?这个问题相当微不足道。
A 33x improvement in energy consumption. Even more on a total carbon footprint basis: 24x, all while delivering higher quality responses. And so we're in a world where the efficiency gains are continuing. So this question of "is your query bad for the climate?" is fairly insignificant.
来自Our World in Data的Hannah Ritchie独立证实了这一事实。这是一个有趣的鸡尾酒会话题,但基本上是只见树木不见森林。
Hannah Ritchie from Our World in Data has independently corroborated this fact. It's an interesting cocktail conversation, but it largely misses the forest for the trees.
我认为这种压力是一种良性的正向压力,因为单个查询不应让人感到困扰,但促使所有公司监控这些数据、降低能耗并提升效率的压力是件极好的事。没错。推动清洁能源部署,从而降低能源的碳强度,这是一个保持压力的良性循环。
I think the pressure is a good, positive pressure, because you should not feel bad about one individual query, but the pressure for all of these companies to monitor this, drive these numbers down, and get these efficiency improvements is a fantastic thing. That's right. And driving the deployment of clean energy, so that the carbon intensity of the energy comes down, is a good flywheel to keep pressure on.
确实如此。所以,妈妈,请继续提问,因为谷歌会因此持续改进。
Indeed. So, mom, keep asking questions, because Google will then continue to improve.
我认为还有一个领域被讨论得不够充分。
I think there's another area that's been under discussed.
关于建造这些大型基础设施(即所谓'隐含碳'的房地产项目)的实际碳强度,目前讨论很少。那么钢材、水泥以及建造外壳和设施本身所消耗能源的隐含碳影响究竟有多大?
You don't hear a lot of discussion of the actual carbon intensity of building these very large pieces of infrastructure, these pieces of real estate — what's called embodied carbon. So what is the embodied carbon impact of the steel, the cement, and the energy used to build the shell and stand up the facility itself?
值得庆幸的是,谷歌就此发布了另一份报告。他们的分析显示,在运营AI数据中心时,持续运行数据中心所需的能源产生的运营排放约占总排放量的70%至90%。服务器组件(内存、闪存、GPU等)的制造排放约占25%。而数据中心建设(钢材、水泥及相关物流)仅占5%。这说明我们或许应该更关注组件制造环节,确保考虑固态硬盘和GPU的全生命周期评估,但关注重点确实应该放在电力消耗上。
Thankfully, Google did another report on this and shared that in their analysis of running an AI data center, the operational emissions — which really means the energy going into running the data center on an ongoing basis — are going to be about 70% to 90% of the total emissions. Manufacturing emissions — really the manufacturing of the server components, the memory, the flash storage, the GPUs — are going to be around 25%. And then data center construction — the steel and cement and the logistics around them — is around 5%. So this is all to say, I think when you look at this, we should probably be paying a bit more attention to the manufacturing of the components and make sure we're looking at the LCAs for solid state drives and GPUs, but the attention is probably correctly centered on the electricity.
关于聚焦制造业这一点,这些建设公司的采购能力实际上是个重要杠杆。当微软要求其供应商在2030年前使用100%无碳能源时,将带动整个供应链——从晶圆厂、主板制造商到组装商——共同转型。这种对范围三排放的管控,正在推动整个供应链走向更清洁的模式,影响力巨大。
And on this point of focusing in on the manufacturing: the procurement muscle of one of these companies building this out is actually a very big lever. So when Microsoft requires its suppliers to use 100% carbon-free energy by 2030, it pulls its suppliers along — the fab builders, the motherboard makers, the assemblers. It moves everyone down the supply chain — these are the scope three emissions — into a cleaner world, which has tremendous power.
另一个有趣的现象是循环经济话题开始兴起。我参加过一个讲座,微软首席可持续发展官提到他们四月份刚启动了循环计划:通过新技术回收数据中心硬盘中的稀土材料(同时销毁数据),回收率高达90%。这既有助于建立美国本土的稀土供应链(目前我们严重缺乏),又是项非凡的成就。
I think the other interesting thing is that some of the circularity conversations are starting to happen. I was at a talk where the Microsoft CSO was talking about how they recently launched — I think just in April — a circularity program where they take the rare earth material in the hard disk drives they're using in the data centers, and they have a new way to recycle those while disposing of the data. It yields a 90% recovery of those rare earth materials, which also helps stand up a US supply chain around rare earths, something we don't really have today — which is pretty phenomenal.
确实,影响巨大。随着我们对这些关键矿物的开采需求不断增长,以及围绕这些矿物资源的地缘政治控制权之争,这将成为未来几年日益激烈的讨论话题。
Yeah, tremendous. And it is going to be a growing discussion over the coming years as we mine more and more of these critical minerals and contend with the geopolitical control over who has them.
好的。既然运营负荷是温室气体的主要来源,约占排放量的70%或更多,那么从宏观气候角度来看,我们应该如何评估其规模?
Alright. So given that operational load is the main contributor to greenhouse gases — something like 70% or more of the emissions — how should we think about its scale from a big-picture climate perspective?
谈到运营环节时,请记住我们讨论的是能源消耗而非数据中心建设材料。最新研究表明,数据中心约占美国总用电量的4%。由于其中过半电力来自化石燃料,这意味着数据中心每年产生超过1.05亿吨二氧化碳。
So if we talk about operations, remember we're talking about the energy consumption and not the materials that go into building the data center. Recent studies show that data centers account for about 4% of total US electricity consumption. And with more than half of that electricity derived from fossil fuels, that means that data centers generate more than 105,000,000 tons of CO2 every year.
在阅读这项研究之前,我并未意识到数据中心的碳强度实际上比平均水平高出48%,远超美国均值。
Until reading the study, I didn't realize that data centers' carbon intensity is actually higher than average — it exceeds the US average by 48%.
关键在于这种影响对电网的依赖程度,对吧?如果这些数据中心位于污染更严重的电网区域——比如弗吉尼亚州中大西洋地区以煤电为主的电网——那么其气候影响特征就与完全由水电供电的华盛顿州东部数据中心截然不同。那么面对这1.05亿吨二氧化碳,我们该如何理解?
The big point here is how grid dependent that impact is, right? If these data centers are located on a dirtier grid, let's say a coal heavy grid in the Mid Atlantic region of Virginia, then it's going to have a very different climate profile than a data center that's in Eastern Washington that's entirely powered by hydro. So when we talk about 105,000,000 tons of CO2, how should we think about that?
是的。我有几个对比数据可以帮助理解1.05亿吨二氧化碳排放的规模:其约相当于美国航空排放量的一半;另一个参照是肠道甲烷排放(即牛打嗝产生的气体)的178,000,000吨。
Yes. I have a few comparisons to help think about the scale of 105,000,000 tons of CO2 emissions. One is US aviation emissions — this is about half of that. Another is enteric methane, which is a fancy way of saying cow burps.
这意味着数据中心排放量约相当于该数值的60%。若以美国所有乘用车排放量(10亿吨)为基准,数据中心目前约占10%。全球范围来看,这个数字会扩大约三倍。总体而言,数据中心排放虽不容忽视,但相比乘用车、牲畜等主要排放源,仍只占较小比例——约相当于全球航空排放量的三分之一到一半。真正使其成为热点话题的不仅是当前规模,更是未来预期增长,对吧?
It's around 178,000,000 tons, so data centers are about 60% of that. Or take all US passenger vehicles, which emit about 1,000 million tons — a gigaton — so data centers are about 10% of that today. Now, globally, this goes up by approximately a factor of three, as do the rest of these comparisons. So overall, data center emissions are not insignificant — they're worth tracking on the map here — but quite fractional compared to passenger vehicles and big emitters like livestock, and a third to a half of something like global aviation. The interesting thing, I think what makes this a hot topic, is not just the current scale but the projections, right?
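The chain of comparisons above is easy to verify with back-of-envelope arithmetic. These are the round numbers used in the conversation; the US aviation total is only implied by "about half," so treat it as approximate:

```python
# Back-of-envelope check of the emissions comparisons in the conversation.
dc_mt = 105         # US data centers, million tons CO2 per year
aviation_mt = 210   # US aviation, implied by "about half" (approximate)
enteric_mt = 178    # US enteric methane ("cow burps"), million tons CO2e
vehicles_mt = 1000  # US passenger vehicles, roughly a gigaton

print(f"vs aviation:  {dc_mt / aviation_mt:.1%}")  # 50.0%
print(f"vs cow burps: {dc_mt / enteric_mt:.1%}")   # 59.0%
print(f"vs vehicles:  {dc_mt / vehicles_mt:.1%}")  # 10.5%
```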
因此,如果我们突然将美国现有电网容量翻倍,那么我们的排放量就将与美国航空业持平。如果继续按照这个趋势发展,数据中心很可能成为排放问题的主导因素。
And so if all of a sudden we doubled that on our current US grid, we'd be caught up to the emissions of aviation in the US. And if you keep going from there and keep the trend line up, you could imagine data centers becoming a fairly dominant story in emissions.
这个讨论的另一面是数据中心对当地社区的影响。我们发现这些影响具有集中性,且对环境、生态及周边居民造成不成比例的冲击。我们以孟菲斯为例,XAI正在密西西比河畔建造一座距市中心仅15分钟的大型数据中心。埃隆·马斯克在那里筹建配备20万块GPU的数据中心,而孟菲斯社区的强烈反对完全在情理之中。
The other leg of the stool in this conversation is how it's affecting the local communities these data centers are being placed in. One of the themes that came out through this is that the impacts of these data centers are concentrated, and they are disproportionately felt by the environment, the ecology, and the people immediately surrounding them. And so we metaphorically went to Memphis, where xAI is building a very large data center on the banks of the Mississippi, fifteen minutes from downtown. There, Elon is commissioning a 200,000-GPU data center, and there's been significant backlash from the Memphis community, for good reason.
我认为核心问题在于如何供电——电力公司表示只能提供约50兆瓦负载,但XAI需求是这个数字的三倍。50兆瓦根本不够驱动这么多GPU,所以他们引入了35台燃气轮机,理论上可提供420兆瓦电力。虽然解决了供电问题,但这些高污染涡轮机每年可能排放数千吨形成雾霾的氮氧化物——比孟菲斯机场造成的雾霾还要严重。
I think a lot of it centers on how to provide power for that, because the utility said they could provide xAI with about 50 megawatts of load, but xAI wants triple that amount. That's a lot of GPUs, and 50 megawatts isn't gonna cut it. So they brought in gas turbines — 35 of them — which could theoretically supply 420 megawatts. It's good they're solving their power problem, but the other problem is that these are highly polluting gas turbines, with the potential to emit a couple thousand tons of smog-forming nitrogen oxides each year. For context, that's more than the smog caused by the Memphis Airport.
这相当于给该地区又增加了一个机场规模的雾霾污染源。
So it's like you're adding another airport's worth of smog to the region.
这种影响并非持续存在。数据中心的能耗存在波动,因此必须关注峰值时段。我们从NASA和欧空局获得的公开卫星数据显示,该区域二氧化氮浓度同比平均上升了3%。
And this impact is not continuous. Right? These data centers ebb and flow in how much power they're consuming, so it's important to look at the peak moments. We found public satellite data from NASA and the European Space Agency showing that, on average, nitrogen dioxide concentration in this area increased by 3% compared with the year before.
但在能耗峰值时段,二氧化氮浓度相比XAI入驻前激增79%。想象一下就像堵车时的情况,这些瞬间对社区造成的冲击会被急剧放大。
But in times of peak consumption, we're talking about a 79% increase in nitrogen dioxide concentration over pre-xAI levels in that area. So you can imagine it's like sitting in a traffic jam, and those moments have outsized impact on the community at that moment in time.
没错,从污染和健康风险角度已引发强烈抵制。南孟菲斯社区本就因历史工业污染导致哮喘和癌症高发,这种临时发电设施显然无法长期持续。整件事堪称研究地方影响与供电时效问题的典型案例。
Yeah, and so there's been a lot of pushback from this pollution and health-risk standpoint. It's hitting South Memphis neighborhoods that already have elevated asthma and cancer rates from past industrial waste, and this temporary electricity-generation infrastructure isn't going to be sustainable in the long term. So all of this is kind of a case study in both the local impacts and the time-to-power issue.
而‘时间即力量’这个概念,本质上意味着数据中心快速上线的需求如此旺盛,以至于企业愿意不断加码投入。在这种经济压力下,社区抵制往往只能起到有限作用。正如我们当前所见,XAI公司顶着压力,正在几英里外建设第二个规模翻倍的基地。
And this notion of time to power — essentially what it means is that there's enough demand to get these data centers online quickly that companies are willing to pay more and more and more. And when you have that kind of economic pressure, community resistance sometimes can only go so far. So, as we're seeing in this case, xAI, despite the pressure, is currently building out a second location a few miles away, which will be double the size of the first one.
仅仅五十万块这样的GPU?
Just a half million of these GPUs?
简直疯狂。这就是我们当下的现实。
I mean, just insane. That's the current moment we're living in.
此前我们主要关注已建成项目而非规划方案,但有必要极端化展望未来:Meta已宣布其位于路易斯安那州里奇兰教区的长期重大项目——Hyperion训练中心,计划2030年前投入100亿美元建成。这个占地2000英亩的园区最终将实现五千兆瓦供电能力。
So far, we've been trying to keep it focused on what has been built more than what is proposed, but I think it's worth taking an extreme view of some of what's coming. Meta has announced their big long-term project, their big training center: the Hyperion project, located in Richland Parish, Louisiana. It's a $10,000,000,000 build-out, online by 2030, with a few steps in between. The goal is a five-gigawatt campus covering over 2,000 acres.
2000英亩是什么概念?换个说法——这个数据中心相当于下曼哈顿区的面积。我们竟要在短短数年内建造如此规模的建筑来支撑新兴技术,这完全超乎想象。路易斯安那项目正是我们正经历的阶梯式变革的绝佳例证。
What is 2,000 acres? Yeah. A different way to visualize this: the data center is about the size of Lower Manhattan. It is at a scale that is unfathomable — that we're going to build buildings of that scale in a few years to power such a new technology. That Louisiana project is a great example of the step change we are living through right now.
这是自1960年代计算机时代黎明期,甚至1880年代铁路黄金时期以来最大的科技基建项目。英伟达正逼近IBM在1969年创下的资本支出市场份额峰值。当前很多类比都指向镀金时代或电信基建热潮等历史繁荣期。
This is the biggest tech infrastructure project since either the 1960s, the dawn of the computer age, or even the 1880s, the heyday of the railroads. I think NVIDIA is on pace to capture the highest share of market-wide capital spending since IBM's peak in 1969. Right? And so a lot of comparisons are being made to past booms, like the Gilded Age or the telco build-out.
提到繁荣期就难免想到萧条。我们讨论过1990年代电信繁荣导致互联网泡沫破裂和光纤过剩,还有1870年代铁路大繁荣引发的崩盘。这些都引发同一个问题:我们是否建设过度了?
And when you mentioned booms, obviously, you end up thinking about busts. Right? And you know, we talked about the telecom boom of the nineteen nineties, which contributed to the dot com crash and a bust on fiber. You also look back at the eighteen seventies and the huge railroad boom that led to a crash. And both of these pose the question of like, did we overbuild?
我们是否超出了需求?在这两种情况下,并非资本支出方判断错误,而是他们出手过早。我们在光纤过度建设中就清晰地见证了这一点,对吧?
Are we outrunning our demand? In both of those cases, it's not that the CapEx spenders were wrong. They were just early. We saw this vividly in the fiber overbuild. Right?
到2002年时,我们仅使用了已铺设光纤的3%,但这些设施对支撑未来几十年发展绝对必要。AI领域现状也类似——当前可能存在过度建设,但这项技术的根本属性决定了它终将被充分利用。
By 2002, we were only using 3% of the fiber we had laid, but it was absolutely necessary to power the next couple of decades. And that's what it may look like for AI: there may be an overbuild in this moment, but the foundational nature of this technology suggests it's going to get utilized.
确实。我认为当前的狂热源于AI市场潜力创造了经济紧迫感,促使人们尽可能快地部署更多GPU,以此构建最大最智能的AI模型来形成竞争壁垒。但这里有个值得思考的问题:什么才算基础设施?铁路和光纤至今仍在服役,但当初建设它们的公司因时机过早未能收获价值。这次热潮的不同之处部分在于其融资方式。
Yeah. I think there's such a rush here because the potential AI market creates this economic imperative to go plug in as many GPUs as quickly as possible, because that will build the largest and smartest AI model, which will create a moat. I think there's an interesting question here, though, of what the infrastructure actually is. Because, yes, we still use the railroad tracks and we still use the fiber, but remember that for all the companies that were building the fiber and the ones on top of it, it was too early for them to capture that value. What's in part different about this boom is how it's being funded.
现在虽有风投资金等投入,但历史地看——正如本期节目讨论的——主要资本支出来自亚马逊、微软、谷歌这些季度盈利数百亿的巨头企业向英伟达采购芯片。这种由巨头主导的模式导致基础设施建设呈现高度集中化特征。
Now, there are VC dollars and other dollars going into it, but if you look at the major capital expense, it is these big companies we've spent the episode talking about — Amazon, Microsoft, Google buying from NVIDIA — and they're using the fact that their business models throw off billions and billions of dollars every quarter to go buy the chips and build this out. So there's a real concentration in some of the build-out of the infrastructure.
以往由银行融资的繁荣-萧条周期最终会波及整体经济。而本次由于主要依赖企业自有现金流,虽然存在私募信贷和风投参与,但萧条主要冲击这些企业自身(拖累股市),理论上能保护未参与的其他资本提供方。
Those previous boom-busts, financed by the banks, ended up drawing in the rest of the economy. And while there are some private credit actors and venture capital actors here, because this is largely financed by the free cash flow of these private companies, a bust hurts them — which would drag down the stock market — but hopefully insulates the rest of the capital providers that are not involved.
据说现在有更多私募信贷开始介入融资,这可能建立与其他系统的关联。我们将持续观察其发展。至此我们讲述了数据中心的发展历程,现在让我们从宏观尺度看看当前格局。
There is supposedly now more private credit starting to come in to fund this, and that could create linkages to other parts of the system. We will see how that plays out. We've talked about the story of data centers up until present day. Let's zoom out for a second and talk about just where we are today from a sense of scale.
从最基础的问题开始:全球现有多少数据中心?目前约有11,800个。当然统计存在灰色地带——比如那些仍存放着服务器的旧式机房。但总体指独立建设的设施。美国以超5,000个数据中心遥遥领先,其次是德国、英国、中国和加拿大。
Let's start with the most basic question: how many data centers exist today? There are approximately 11,800 data centers worldwide. Now, of course, the counting gets nuanced when you think about the closets with servers that still exist, as they did back in the day. But by and large we're talking about independent construction, and the US is by far the largest, with over 5,000 data centers, followed by Germany, the UK, China, and Canada.
美国几乎拥有全球一半的数据中心,这太疯狂了。
So The US has almost half the data centers. It's wild.
是的,美国高度集中。显然,其他国家正在迎头赶上。中国建设速度非常快。这就引出了一个问题:谁拥有这11000个数据中心,又是谁在投入这数千亿美元。
Yeah. Heavy concentration in The US. Obviously, other countries are catching up. China's building very quickly. So this begs the question of who owns all of these 11,000 data centers, and who's spending these hundreds of billions of dollars.
大致可以分为四类,这在美国具有代表性,也适用于全球。第一类是我们的老朋友——托管数据中心,比如Equinix、Digital Realty等公司。全球约有十几家这样的企业,在美国及世界各地建设10至100兆瓦规模的数据中心区块。第二类是超大规模运营商,如Facebook、亚马逊、微软,还可以加上苹果和甲骨文。
And you can kind of think of it in four broad categories — representative of the US, but it fits globally too. The first category are our friends the colocation providers, the ones we've talked about: Equinix, Digital Realty, and others. There are about a dozen of these, building 10-to-100-megawatt blocks in the US and around the world. The second category are the hyperscalers — the Facebooks, Amazons, and Microsofts; you can add Apple and Oracle in there.
它们占全球数据中心容量的近一半,也是新增增长的主力军。但值得注意的是,银行、零售机构和公共部门仍拥有数千个私有服务器机房。虽然数量庞大,但容量相对较小,目前仍占约35%。最后是传统电信公司仍在运营的数据中心。有趣的是行业发展趋势——
They account for almost half of global data center capacity, and they're the lion's share of new growth. But importantly, there are still thousands of private server rooms out there in banks, retail facilities, and public institutions. While large in count, they're relatively small in capacity, though today they still make up about 35%. And finally, you've got legacy telcos that still own and operate data centers. Now, the interesting thing about this is the trend.
企业级和公共部门数据中心正在减少,新建项目越来越多流向超大规模运营商和这些专用托管设施。
You're seeing enterprise and public sector data centers trending down and new build increasingly going to hyperscalers and these purpose built colocation facilities.
那么这些数据中心实际占用了多少物理空间?这种细节数据很难精确统计,但合理估计全球数据中心建筑面积可能达15亿平方英尺。打个比方,按美国中位数住宅面积2200平方英尺计算,相当于73万套住宅的面积,或是28000个足球场
So how much actual physical real estate space are all of these data centers taking up? Like anything at this level of detail, it can be hard to pull exact numbers, but one reasonable estimate is around one and a half billion square feet of data center build-out globally. For context, if you took a median US home of around 2,200 square feet, that's the square footage of around 730,000 of those homes, or 28,000 football
(包括端区在内)的面积。
fields. Importantly, including end zones.
这个数字看起来很大,但为了有个对比,就拿美国州际公路系统来说——比如I-5、I-90这些主要干道——仅这些公路的沥青铺设面积就是数据中心的十倍。现在我们来谈谈电力问题。
That kinda feels like a big number, but for some comparison to other infrastructure in our life: if you just took the US interstate system — I-5, I-90, the big interstates — just the asphalt of those interstates is 10 times the square footage. So let's talk about power.
记住,数据中心对电力的影响具有高度地域性,集中在物理位置周边。但我们可以从规模角度来讨论。美国数据中心用电量约占总电力的4.5%,相当于1700万户家庭的年用电量。如何理解这个电力规模呢?
Remember, the impact of data centers on power is quite localized and concentrated around their physical locations. But let's talk about it in terms of scale. We've mentioned that in the US it's about four and a half percent of total electricity — roughly the annual use of 17,000,000 households. How can I conceive of that much power?
是的。如果以纽约市的用电量为参照,全美数据中心的耗电量相当于三个纽约市。全球数据中心的耗电量则与英国全国用电量相当。与其他行业相比,接近化工业或基础金属工业(比如铁矿开采和铝冶炼)的耗电量。
Yeah. I think if you look at the electricity consumption of the city of New York, it's about three New Yorks to power all the data centers in The US. Or if you look at all data centers globally, it's about equivalent to the power consumption of The United Kingdom. If you look at other comparable industries, it's pretty close to the power consumption of the chemical industry or the primary metals industry. You know, this is extracting iron ore and smelting aluminum.
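The "17 million households" figure from a moment ago can be reproduced with rough numbers. The US generation total and per-household usage below are our own assumptions, not figures from the episode:

```python
# Reproducing "4.5% of US electricity is roughly 17 million households".
# The generation total and per-household usage are outside assumptions.
us_total_twh = 4000     # approximate annual US electricity generation, TWh
dc_share = 0.045        # data centers' share, as cited in the episode
household_kwh = 10_500  # rough annual use of a typical US household, kWh

dc_twh = us_total_twh * dc_share           # ~180 TWh for data centers
households = dc_twh * 1e9 / household_kwh  # TWh -> kWh, then per household
print(f"~{dc_twh:.0f} TWh, or about {households / 1e6:.0f} million households")
```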
同样在水资源问题上,虽然汇总总用水量存在诸多复杂因素,但我们取个平均值。考虑冷却直接用水和发电间接用水,来感受下整体用水规模。
Similarly, on the water question, there's a tremendous amount of nuance in how to aggregate total water consumption, but let's just do the average of the average: take the direct water for cooling and the indirect water from the power generated for the plant, and get a sense of how much water this is using.
美国数据中心每天消耗约2.5亿加仑水。
US data centers together consume about 250,000,000 gallons of water per day.
这相当于纽约市用水量的四分之一。我们讨论了数据中心的空间规模、数量、所有者、供电和冷却系统。那么内部究竟有什么?来看看计算能力的规模。虽然全球服务器总数难以精确统计,但大约在5000万至1亿台之间。
That's about a quarter of what the city of New York uses. So we've talked about the space — the data centers, how many there are, who owns them, how they're powered and cooled. Well, what's actually inside them? Let's think about the scale of the compute power. It's actually a little bit difficult to figure out what humanity's overall compute power is, but let's just say there are about 50 to 100 million servers globally.
没错。我认为最好的测算方法是回溯用电数据,再看单位电力产生的计算能力。令人惊讶的是,过去十年人类计算能力增长了约40倍,而电力消耗仅增长2.5倍——这得益于能效提升。当前新增计算能力中,只有约10%用于AI领域,当然这个比例正在增长。
Yeah. And so the best way I think to get at this was to go back to the electricity usage numbers and then look at how much compute power we get per unit of electricity. What's surprising here is that if you run the numbers, we've grown humanity's compute power around 40x over the last decade, but have only grown power consumption about two and a half x, because we've gotten more efficient. And if you look at the new compute power that's come online, only around 10% of data centers are AI-focused today. Of course, that's a growing percentage.
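The implied efficiency gain falls straight out of those two growth figures:

```python
# The compute-vs-power growth figures cited imply an efficiency multiple.
compute_growth = 40.0  # ~40x more compute over the decade
power_growth = 2.5     # only ~2.5x more electricity consumed

efficiency_gain = compute_growth / power_growth
print(f"Compute per unit of electricity improved ~{efficiency_gain:.0f}x")  # ~16x
```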
计算能力是数据中心运作的一部分。让我们来了解一下存储的规模。
So compute power is one part of what the data center is doing. Let's give a sense of the scale of storage.
是的。过去十年间,我们已将全球存储总容量提升了三到四倍,达到约15泽字节。
Yeah. So in the last decade, we've grown total installed storage capacity three to four x, to around 15 zettabytes.
我们正在与节目的朋友拜伦交谈,他让我理解了现代硬盘的奇迹。想象一下,硬盘磁头被放大到波音747的大小,以每小时560英里的速度飞行,距离足球场仅一张纸的厚度,这个磁头正以每小时560英里的速度读取和写入每一片草叶。
And so we were speaking to a friend of the show, Byron, who made me understand the modern marvel of hard drives. Imagine the head of a hard drive scaled up to the size of a Boeing 747, flying at 560 miles an hour, about one piece of paper's thickness above a football field — and that head is reading and writing to every single blade of grass as it moves at 560 miles an hour.
这简直太疯狂了。我们有这么多硬盘,这么多计算能力,现在它们通过海底电缆连接。我们现在有多少条这样的电缆?
That is absolutely insane. And so we have all these hard drives, we have all this compute, and now they're connected via these submarine cables. How many of those do we have now?
我们大约有600条活跃的海底电缆,支撑着全球99%的国际互联网使用。
We have approximately 600 active submarine cables that are carrying 99% of international Internet traffic.
总的来说,全球带宽大约是十年前的100倍,而且我们看到互联网带宽每年持续增长25%。
So in total, we're at about 100x the global bandwidth we had a decade ago, and we've seen Internet bandwidth continue to grow 25% year over year.
这些数字是当前的快照。现在有很多关于未来五年增长方向的预测,有些可能相当夸张,但可以肯定的是,增长将在现在及未来几年持续加速。
And so these numbers are today's snapshot. Now there's a lot of projections out there of where growth is gonna go over the next five years. It can get fairly outlandish, but suffice to say, growth is continuing to accelerate now and in the years to come.
好的。我们刚才讨论了拜伦的飞机、阅读和书写草叶。现在该让我们的飞机降落了,让我们来谈谈主题。在这次研究和对话过程中,你意识到哪些问题浮现出来了?
Alright. Well, we talked about Byron's plane reading and writing blades of grass. Now it's time to land our own plane, and let's do the themes. What bubbled up for you over the course of this research and conversation?
这是一段旅程。起初我知之甚少,但后来对数据中心的方方面面了解得远超想象,有几件大事会一直铭记在心。我们读的第一本书叫《管道》,这是一个非虚构的极客侦探故事,探寻互联网这个事物的栖身之处。结果发现云其实很物理化,但更有趣的是,它奇怪地集中分布。互联网几乎有无限的边缘,但中心却少得惊人,而且是以这种轮辐式结构构建的。
It's been a journey. I did not know a lot coming in and have learned way more about the ins and outs of data centers than I ever imagined, and a few big things will continue to stick with me. One of the first books we read is called Tubes, and it's this nonfiction nerdy detective story to discover where this thing called the Internet lives. It turns out the cloud is quite physical, but even more interesting, it's strangely concentrated. The Internet has almost infinite edges, but it's got a shockingly small number of centers and has been built in this hub and spoke structure.
对我来说,这要从MAE-East开始,对吧?弗吉尼亚州的阿什本,拥有全球13%的数据容量,曾经承载了全球30%以上的互联网流量。这是无可争议的互联网之都。
And so for me, that starts with MAE-East. Right? Ashburn, Virginia — home to 13% of global data capacity, and it once carried north of 30% of the world's Internet traffic. That is the undisputed capital of the Internet.
最疯狂的是它成为枢纽仅仅是因为它就是枢纽。就像有个引力中心自我旋转,形成了一个数据中心投资的黑洞。
And it's so wild that it became the hub just by kind of being the hub. There's this center of gravity that spiraled in on itself — a black hole of data center investment.
几乎是偶然的,对吧?互联网结构基于沿海全球城市的网状连接。弗吉尼亚、纽约、华盛顿、伦敦、巴黎、阿姆斯特丹、东京、首尔、拉各斯,全都由这600条海底电缆连接,这些电缆实实在在地支撑着世界经济。
Almost by accident. Right? Yeah. The Internet structure is based on this mesh connectivity of global cities on coastal shores. It's Virginia, New York, DC, London, Paris, Amsterdam, Tokyo, Seoul, Lagos, all connected by these 600 undersea cables that are literally powering the world's economy.
所以我想到数据中心最初只占据壁橱大小,发展到整层楼,再到整栋建筑,再到仓库,我开始把云看作一栋建筑,一个工厂——比特进来,经过处理,以正确方式组装,然后发送到目的地。这让我真正理解了互联网的本质。
And so I think about data centers that started off occupying closets, growing to whole floors, then buildings, then warehouses, and I start to think of the cloud as a building, as a factory where a bit comes in, gets massaged, gets put together in the right way, and then it gets sent out to its destination. And it really brings the Internet home for me.
这让我联想到一个贯穿始终的思考:这种基础设施的隐形性,以及是什么让数据中心和连接的互联网几乎成为完美的抽象基础设施。我的意思是,你可以访问它们、使用它们、获取它们的全部能力,却从未见过或碰过它们。如今我们生活在几乎完全无线覆盖的互联网中,你可以通过所有这些海底电缆访问所有这些仓库和建筑。如果你是个开发软件的工程师,可以部署到全球所有这些区域,却一辈子都没亲眼见过CPU或硬盘。我认为,这是人类建造的物理基础设施中最完美的抽象。
This connects to one of my reflections, which is threaded through the invisibility of this infrastructure and what makes data centers and the connecting Internet almost the perfect abstracted infrastructure. What I mean by that is you can access them, use them, get all of their power, without ever seeing or touching them. I mean, now that we live blanketed in mostly wireless Internet ourselves, you have access to all of these warehouses and all these buildings via all these undersea cables. And if you're an engineer developing software, you can deploy to all these regions around the world and never actually look at a CPU or a hard drive in your life. It is, I think, the best-abstracted physical infrastructure humanity has built.
它是实体的,是硅基的,是电力的,是光纤的。没有真正的魔法,一切都是物理规律。但人类社会中我能想到的唯一与之同样高度抽象的基础设施或许只有货币,然而货币如今已不再以相同方式实体存在。这套基础设施仍在执行实体工作,这让我深受震撼。
It is physical, it is silicon, it is electricity, it is fiber. There is no real magic, it is all physics. But the only infrastructure I can think of in humanity that is as well abstracted is maybe money, but money is actually not physical anymore in the same way. This actually is still doing physical work, and that struck me.
确实如此。能定义我们生活并左右世界运转的全球性基础设施竟如此隐形,这感觉非常罕见。
It does. It feels very rare to have global infrastructure that is defining our life and how the world operates to be so invisible.
尽管它无形无质,却如此频繁地介入我们的生活。不是吗?我们的一整天都被这套基础设施所中介。但如果你整天都坐在火车上,你会非常清楚铁轨的位置,你能看见它们,你就在列车上。那才是真正的引擎。
And it's in our life so frequently, even though it is invisible. Right? Our entire day is mediated by this infrastructure. But if your entire day were spent on the train, you'd be very aware of where the tracks are — you can see them, you're on the train. That's the engine.
没错。即使是火,我们一天中也只是选择性使用。铁路,我们上下车。汽车,我们进出其中。电力,我们开关电灯。
Right. Even fire, we probably used selectively throughout the day. The railroads, we got on and off. The car, we got in and out of. Electricity, we turned the lights on and off.
这是我们仅在睡觉时才不使用的现代奇迹。即便那时,它也在为我们工作。
This is a modern marvel that we're only not using when we sleep. And even then it's doing stuff for us.
它正在追踪我的睡眠。我戴着一枚戒指记录我的睡眠数据,正将信息片段发送到Oura的服务器。是啊,听你这么说,感觉就像在上下火车一样。
It's tracking my sleep. I'm wearing a ring that's tracking my sleep, sending bits to, you know, Oura's servers. Yeah. As you're saying that, it's like you're getting on and off a train.
这有点像在交易时刻开关电源。而现在更像是你生活在其中,对吧?哦,你生活在这个基础设施里,但我们从未看见它。
You're kind of, like, turning electricity on and off in a kind of transactional moment. This is almost more like you're living in it. Right? Oh. You're living in this infrastructure, but we never see it.
我突然想到了《头号玩家》的场景。是的,穿上全套装备,沉浸到另一个现实中。从这个角度想,我们离那已经不远了。
I just had a flash of Ready Player One. Yes. Putting on the full suit and immersing myself in another reality. We're close, when you think about it this way.
我认为你说得对。这让我想到与其他物理基础设施对比时的另一个发现:尽管它确实是实体建筑、硅材料和电力构成的,我们却在持续快速地升级它。部分原因是摩尔定律,部分源于硬件本身、我们的技术、散热等各方面的进步。但我觉得大多数其他基础设施在我脑海中是这样的:好吧,这是一种相当不错的铁轨建造方式。
I think that's right. And I think this gets to one of my other realizations when comparing this to other physical infrastructure, which is that despite it being truly physical — buildings and silicon and electricity — we are upgrading it continuously and rapidly. Part of that is Moore's Law; part of that is just the improvement of the hardware and our techniques and cooling and all of these things. But with most other infrastructure, in my head, you get to: okay, this is a reasonably good way to build a train track.
现在我们需要建造更多铁轨;或者说这是一种不错的电力传输方式,所以现在我们要建设更多输电线路。当然输电效率会略有提升。但在这里我们不是略有进步——过去十年我们的性能提升了40倍,而这又建立在之前十年的基础上。它变得好太多了,所以这很奇妙:没错它是基础设施,但它正在以数量级规模快速改变着我们所获得的东西,我想不出其他类似的例子。
Now we need to go build more of it. Or: this is a reasonably good way to transport electricity, so now we're going to build more transmission — and certainly you get a little bit better at transmission. We don't get a little bit better here. We got 40x better in the last decade, which was on top of the previous decade. So it's this weird thing where, yes, it's infrastructure, but what it delivers is changing at order-of-magnitude scale so quickly, and I can't think of another analogy to that.
也许电力发展最初的二三十年是这样的感觉,但这次似乎更加剧烈,因为所交付的东西与电力并不完全相同。
Maybe the first twenty or thirty years of electricity felt like this, but something feels even more drastic here, because what is being delivered is not commoditized in exactly the same way as electricity.
你描述的这种动态让我想到了另一个体会,也是我目前感到困惑的地方——关于电力的问题。对我来说,理解电力问题为何如此棘手很容易,答案其实就在眼前。这个问题很严峻,对吧?数据中心正从电网汲取越来越多的电力。
You know, that sense of movement you're describing reminds me of another of my takeaways, which is where I'm feeling stuck, and that's the question of power. To me, it's easy to see the complexities of why the power problem is so hard to solve, and yet the answers are right in front of us. The problem is huge. Right? Data centers are pulling more and more electricity from the grid.
问题之所以困难,是因为电网已不堪重负,涡轮机和变压器积压严重,而且我们现有的公用事业体系无法适应当前超大规模用户愿意支付的激励结构。改造电网以满足需求需要数年时间。与此同时,最快最便宜的能源今天已经存在——就是太阳能和储能技术。
And the problem is hard because the grid is strained, the turbines and transformers are backlogged, and we don't really have utilities whose incentive structures are built to capture what hyperscalers are willing to pay today. And it's going to take years to reform the grid to actually meet the needs. And at the same time, the fastest, cheapest energy is available today. It's solar. It's storage.
快速廉价地接入可再生能源,这已经占到电网新增容量的90%。尽管当前政府竭力发动对清洁能源的意识形态战争,声称我们处于国家能源紧急状态,却拒绝让电网获得可负担的电力。所以只要我们能破除这些阻碍,就能快速为电网增加大量电力。这个故事的另一个侧面是我们建设数据中心的方式。
It's putting renewable energy online quickly and cheaply. That is already accounting for 90% of new capacity going on the grid, despite every effort from our current administration to wage an ideological war on clean power, claiming that we're in some national energy emergency while denying the grid the affordable electrons that could be placed on it. So if we can just get out of our own way, we can put a lot of power on the grid fairly quickly. The other side of that story is the way we're building data centers.
我们建设数据中心时,是按照全天候(24/7)峰值负载来规划的。今年早些时候杜克大学发布的一项广为流传的研究指出,只要数据中心每年将面向电网的用电限制1%的时间——全年总计约九十个小时——就能释放100吉瓦的负载,相当于为美国电网新增两个核电机群的容量。
We're doing it accounting for twenty-four-seven peak load. And there's a famous study that circulated earlier this year from Duke University that suggested that if we can limit the grid-facing power of data centers for just 1% of the year, ninety hours out of the entire year, we can unlock a hundred gigawatts of load. We can unlock the equivalent of two nuclear fleets on the US energy grid.
没错。谷歌最近就宣布正与多家电力公司合作推进这项工作。我们不断看到新兴创业公司围绕这个概念开展业务。这与我关于电力发展的核心观点高度契合:限制催生创新,尤其是当它们遭遇传统垄断体系时。尽管我们对生活中的电力服务常有怨言,但必须承认现有电网总体运行良好且稳定,而它确实已很久没有迫切的升级压力了。
Right. You saw Google make some recent announcements about collaborating with some of the utility providers to do exactly that. There are new startups that we see all the time that are building businesses around that concept. This connects really closely to my major takeaway on the power story, which is that limitations drive innovation, especially when they come up against a legacy incumbent system, something that generally works, and works reasonably well. For all of our general disgruntledness about utilities in our life, it is phenomenal that the electricity works as well as it does most of the time, and it has not gotten a kick in the behind in a long time in terms of needing to try and evolve.
几十年来人们一直在讨论智能电网,但真正推动变革的是AI数据中心竞赛和中美地缘竞争。这种压力正在全产业链激发创新——不仅如你所说促进可再生能源并网发电,还倒逼输电审批改革,甚至在数据中心内部推动AI模型的优化。我们正在投资那些提升模型训练与运行效率的企业。
There's been decades of people talking about a smarter grid, but nothing's forced the need like the AI data center competition and the geopolitical competition with China. That drives innovation across the stack. It's exactly what you're saying in terms of driving the next cheap, clean electron on the grid in the form of renewable energy, but it's also pushing us on transmission and permitting, and it's also pushing us inside the data center to think even further about AI models. We look at and invest in companies that improve the efficiency of actually running the model and training the model.
确实如此。
That's right.
通过研究访谈我们反复验证:这类资源紧缺会自上而下推动能效创新,加速电网进化。
And so up and down this whole stack, a point we heard again and again in talking to people during our research, these types of crunches drive the efficiency innovation and drive the evolution of the grid.
据深耕该领域数十年的专家所言,我们在创新方面确实长期懈怠。过去只摘取了许多低垂果实,而现在这种强制机制能推动国家电网这种复杂系统实现阶梯式进步。
And arguably, according to folks that have been in this for decades, we've been fairly lazy on innovation. Right? It's been a lot of the low hanging fruit, and so this type of forcing function can drive step change improvements in something as complicated as our national grid.
因此在我看来,这一切首先预示着长期乐观前景:我们不仅能解决问题,还能以促进清洁能源部署的方式解决;但同时也需正视这些地区短期内面临的水电资源压力。
So to me, all of that points to long-term optimism: first of all, in solving the problems, but second of all, in solving them in a way that leads to more clean energy deployment, while also needing to recognize and not dismiss the near-term local impacts for both power and water in these environments.
我非常赞同这一点,因为当我们讨论这些案例时,那个主题不断重现——从地方层面思考问题与从国家层面思考问题截然不同。我认为人们必须明白这一点的重要性,尤其是当
I love that point because as we talked through these stories, that theme kept recurring of how different it is when you think about the problem locally than when you think about the problem nationally. And I think that's really important for people to understand as this is
这日益成为头条新闻时。我个人的感悟是:对下一次增量的Gemini查询、ChatGPT或Anthropic的使用场景、或存储一段视频的焦虑越来越少,反而更希望在那些正与空气质量或水资源短缺抗争的特定县区提升地方层面的关注,因为此刻行动主义应该聚焦于社区内部,而不是试图拖慢我们共同追求的有价值的应用与进步。
more and more headline news. My personal takeaway is less and less anxiety about that next incremental Gemini query, ChatGPT or Anthropic use case, or storing a video, and more of a desire to have more awareness locally within a given county that's struggling with air quality or water shortages, because that's where the activism needs to center, within communities in the moment, rather than trying to slow the valuable uses and progress that we're all building towards.