This is a space where there's just going to be a huge amount of leverage with AI. It still seems like there could be a lot more effort in this space around building tools, and it's kind of this crazy thing that we're here in 2025 and there's not the periodic-table-of-elements equivalent for biology. We think that this is probably one of the most important sets of tools that you need to build.
When we first set out the goal to cure and prevent disease by the end of the century, honestly, most scientists couldn't look at us with a straight face. They're like, you're crazy. Yes. And it was true, because if you just decide to spend the money funding the next best grant for every single lab in the country, there's no pathway to that being true.
The biology folks, I think, looked at it as if it were crazy ambitious. And then the AI folks are like, well, that's kind of boring. That's just automatically gonna happen. So it's like, okay, there's something in between there that needs to be bridged.
The scientific community needs fundamentally new tools to cure disease, not just more funding. For decades, biological research has been constrained by the same limitations: small grants that fund incremental progress, isolated labs working on narrow questions, and a lack of shared infrastructure to tackle the biggest challenges in medicine. But what if we could change that? Today, you'll hear from Priscilla Chan and Mark Zuckerberg on their ten-year journey building the infrastructure for modern biological research. We discuss how they accidentally created the standard for biology data with their Cell Atlas project, cataloging millions of cells in an open-source format.
We explore why they're betting on virtual cells that let scientists test high-risk hypotheses in silico before investing in extensive wet lab work. And we dive into Biohub, their play to accelerate discovery by pairing frontier biology with frontier AI. Hope you enjoy. Mark, Priscilla, welcome to the a16z podcast.
Thanks for having us.
Yeah. Great to be here. Excited. Alright. Excited to have you.
You're doing exciting stuff.
Yeah. To that end, almost a decade ago, you guys started the Chan Zuckerberg Initiative with the mission and intent to cure, prevent, or manage all disease by the end of this century. There are a lot of missions that you guys could have poured your time and resources into. Take us behind the conversations of why you picked this one. Maybe, Priscilla, why don't we start with you and hear your side of the story?
It always surprises people when I talk about how we work in basic science research. I trained as a pediatrician, and people always think, oh, it must be about medicine. And for me, I went into medicine because I wanted to improve people's lives. I wanted to make a difference. I wanted to be able to help others.
And I think, training as a pediatrician at UCSF, I met a lot of patients, frankly little kids and families, for whom we just had no idea what the problem was. They might have a specific gene that they could name if they were lucky, or they could be grouped into a bunch of other diseases, and there'd be a general sort of PDF they'd print out, like, this is what we know. And then it was my job as an intern or resident to try to translate a few lines of information into how we were supposed to take care of the patient. For me, that's when I really realized the power of basic science and how we need to work on basic science to advance the forefront of what's possible. I think of it as the pipeline of hope.
Yeah. And why did you think you could cure all disease? Because that's a very aggressive goal.
Do you wanna answer that one?
Yeah. Well, I mean, we're not gonna cure all diseases, to be clear. The strategy is to help scientists and the scientific community cure all diseases. So the strategy is really one of accelerating the pace of basic science.
And the theory that we had was: if you look at the history of science, most major breakthroughs are basically preceded by the invention of a new tool to observe phenomena in a new way. Right?
So we were thinking about things like the microscope, right, being able to observe bacteria, or, in other fields, the telescope. To use an engineering example, without those kinds of tools, it's kind of like you're coding without being able to step through the code and debug things.
Right? That's like the old days, when it's four years old.
So our whole approach on this is basically: let's help build tools that will accelerate the pace of the whole field. And I think there's a niche that fits that, because if you look at how funding works in science, the vast majority of funding comes from the government and NIH grants. It's parceled out into relatively small grants that allow individual investigators to investigate usually pretty near-term things. The development of these new types of tools, whether it's imaging or, now, a lot of AI things like virtual cell models, is longer term and oftentimes more expensive. So think on the order of maybe $100 million to $1 billion over a ten-to-fifteen-year period, and then you try to unlock those tools and give them to the scientific community to accelerate the pace.
So that's kind of the theory.
Right, and it seems like you also don't really get credit for the tools in a lot of ways. I mean, we have companies that use your tools, and they're very happy about it, but I didn't even know that that was the case. So-
That's why it's philanthropy.
Yeah. Well, it is. But most people do philanthropy to get credit too. I mean, that's kind of a part of it. So I guess, did you think about that, or were you just like, no, this is gonna work, and if it works, that's all we need?
We're super focused on actually making every scientist better, and, beyond science, startups and startup founders. Because the point is we can't do this alone. And when we first set out the goal to cure and prevent disease by the end of the century, honestly, most scientists couldn't look at us with a straight face.
They're like, you're crazy. Yes.
And it was true, because if you just decided to spend the money funding the next best grant for every single lab in the country, there's no pathway to that being true. But if you forced people to really think about this, like, okay, what is the most credible pathway to doing this, and what are the barriers to that credible pathway, then we sort of got somewhere. Right?
They were like, well, there are no shared tools, or we're not working on big projects and building the right datasets. And we're like, okay, then we can start doing something about that. So that's where the idea of building shared tools came from, because no one right now in the science-
Well, that's so interesting. So basically, you're like, we're going to cure all disease. And they're like, can't. Yeah. It can't be done.
Why can't it be done? Well, because we don't have the tools. Okay. That's a pretty cool sequence.
Yeah. I mean, there's also this funny thing where the biology folks, I think, looked at it as if it were crazy ambitious, and then the AI folks are like, well, that's kind of boring. That's just automatically gonna happen. Yeah. It's like, okay.
There's something in between there that needs to be bridged, if you can use the modern AI tools to build the types of tools that biologists need. So that's a big part of how we think about our work-
AI has got to be the most overestimated and underestimated technology ever, simultaneously.
Yeah, well, probably like the internet early on. But we think about ourselves and the work that we're doing at the Biohub as frontier biology paired with frontier AI. There are labs that do frontier AI, that are basically building the most advanced models, and then there are lots of biological research organizations that do very leading-edge research, either to discover new datasets or to look into certain challenges. But so far there hasn't been anyone who's tried to do both of those at once. I mean, even something like AlphaFold, which is amazing, was built off of a public dataset that had been produced decades ago.
What I think you have the opportunity to do, if you do both of those together, is produce specific datasets for the purpose of training AI models to build virtual cells that can do specific things. Yeah. Right. So I think that's a pretty interesting zone to be in. Yeah.
Of all the things that we've worked on, when we started CZI, we actually focused on a number of areas. What we found is just that the science research had by far the biggest returns, so we've just doubled down on it over and over until now, when we're ten years in and Biohub is really the main focus of our philanthropy. But, yeah, I mean, that's basically the-
Maybe you're not giving yourselves enough credit, because you're sort of saying, well, there's bite-sized science, and we don't wanna do that. There's century-scale science, and that seemed like a long time horizon, but achievable, ambitious. But you've actually identified, which I think is really fantastic, grand scientific challenges that are right in between. They're ten-to-fifteen-year horizons, at least per the way you communicate about them and the way you energize the scientific community about them. Ten to fifteen is kind of an interesting time horizon, similar to the time horizon of a venture-backed company, similar to the time horizon on which a team can work together. How did you get to that number? And then how are you thinking about the challenges that you take on in each ten-to-fifteen-year wave? Because that's concrete, achievable.
You build a lot of credibility around it with the way that you've announced those challenges.
Well, I'm curious how you guys think about it. But for us, when we looked at the grand challenges from the ten-to-fifteen-year time horizon, it needs to be like, when you look at it, you're like, I see a path. Right. Not everything needs to be solved for us to take it on. In fact, if everything's solved, then that feels like that should just-
And one ambitious enough.
Yeah. We have some risk appetite, right? So we want things where there's a credible pathway, there's someone at the helm who can do this, and there's enough ambiguity that we feel like we could take on that risk. And if we do it, the returns could be higher than even expected. The way we modeled that in the Biohubs is we have three Biohubs.
We have one in San Francisco, one in Chicago, one in New York. The one in New York works on cell engineering: can we engineer cells to go in and detect signals, read them out, or take certain actions? In Chicago, we're building tissues and looking at cell communication within tissues. And then in San Francisco, we're looking at deep imaging and transcriptomics.
And the locations are not by accident. We also look at the partner universities, because we have folks who come to the Biohubs to do this work, collaborative, interdisciplinary, and sort of unconstrained by the traditional lab. But we also build off of the labs at these academic institutions that support the work. So that's how we choose the grand challenges and the locations. And then layering in the large language models and AI coming into the picture has been so interesting, because we were already building tools to measure interesting data, building the datasets.
But we didn't really know what to do with them yet. And large language models coming onto the scene were like, wow, we can make sense of all of this now.
I'm curious what you view success as in the therapeutic realm. We think a lot about understanding biology, and sometimes we bet on startups that want to unlock completely new biological areas, diseases where we don't know what's going wrong. And then there's another group of folks who say, hey, okay, now that we understand what's going wrong, let's fix it. Let's come in with a drug. Let's come in with a new type of chemistry, a new type of antibody. What do you think success for the CZ Biohub looks like ten, twenty, fifty years from now in terms of the new medicines that you've enabled?
We want there to be an explosion of a community building the new wave of what it means to deploy precision medicine. I think for rare diseases and common diseases alike, you're really talking about individual biology that we sort of lump together. And we often don't know how it happens, right? We know that you have this mutation, or, the worst nightmare, you have a variant of unknown significance. What does that even mean?
The horrible VUS.
Yes, horrible. You tell someone you kind of know something, but we don't know what it means. But if you look at the way we've been able to look at variants and at single-cell transcriptomics, we're starting to be able to say, okay, this variant actually impacts this set of downstream cells. And then we start looking at the proteins that get expressed and how that looks similar to or different from what a healthy cell would look like. Then you can start targeting: okay, let's look at that as a target.
And you know both the specificity of the target you want to build, based on the ability to connect mutation to protein expression, and how to predict off-target effects. What are the side effects? Because you also know where else that drug will be able to interact with the body. So those are rare. But I really think most diseases should be thought of as rare diseases, because each one of our biologies is different.
And right now we just get lumped. Right? We get lumped based on age, demographics, ancestry, if we're lucky to have that level of understanding. But truly, each one of our biologies is different. Say you look at hypertension or depression: we kind of just go by trial and error, saying, let's just try that drug and see what happens. But what should really happen is being able to precisely, accurately, and quickly treat people by looking at the individual's biology.
We want to enable the basic science, and we would be thrilled if people picked up the models that we build to be able to build the diagnostics and the therapeutics that need to come.
You've built amazing datasets, I have to say. I mean, you may not hear the feedback from the startup community and the pharma community and the R&D community, but it's there because you've committed to open source. And so people may not all be writing papers, but they are using those tools. There's a startup in our portfolio working on idiopathic pulmonary fibrosis. The name tells you how vexing the disease is.
It's idiopathic. We don't know why it happens. IPF is named that way. And so he was telling me that he used your CELLxGENE atlases to look at millions of single cells in patients with disease and without disease, try to pinpoint the fibroblasts, and double-click on the fibroblasts and their gene expression. It's incredible.
And try to use that to inform, hey, where could I go after a new drug target in this disease that's fundamentally a strange clump of idiopathic origin? So I think there's a huge group of innovators who love the tools, the visualizations, the query systems, and really the software approach that you took to making that data incredibly accessible.
CELLxGENE is almost an accident, though.
Tell us more.
So do you want to share a little bit about CELLxGENE, or do you want me to start?
I don't know which part you want to get into, but the Cell Atlas work overall, it's kind of this crazy thing that we're here in 2025 and there's not the periodic-table-of-elements equivalent for biology. A lot of the inspiration was: all right, how do we, both through work that we're going to do in the Biohub and through other grants, pull together and standardize a format where you can have all this data? When we were starting off, we didn't even necessarily have in mind that we were going to use that to build virtual cell models. I think that's sort of just come into focus as the AI work has advanced, but that's a very exciting thing. We should definitely spend a bunch of time on the virtual cell models, but I'm not sure what you wanted to get into on the Cell Atlas.
Well, the single-cell work was one of our first RFAs. Ten years ago we started, and we're like, okay, we think this is possible. We actually funded the methodology for it, to standardize how it was gonna be done. So that was ten years ago. And then we seeded a few labs to start building out that dataset.
But we're like, there are millions or billions of different cell types and different permutations. How are we going to do this? And especially with a burgeoning technique. And so we ended up seeding a few groups, and they started doing work. And then they told us they had a problem.
There was a bottleneck in their workflow because they couldn't annotate the data fast enough. So we built CELLxGENE as an annotation tool. That's the original source of this. We built the annotation tool to make it easy for people who are doing single-cell science to annotate the data. And then we put the data that we collected out publicly so people could share.
But because everyone started using the same annotation tool, everyone was then standardized on the same data formats. And then there started being a community around the tool, and they wanted to share back and build the Atlas. So now, after ten years, there are millions of cells that have been built into this shared resource for the entire scientific community. We only funded about 75% of it. Sorry, that's wrong.
We've only funded 25% of it. 75% came from the broader community saying, this is useful and there's an easy way for us to standardize and build. Yeah. That's right.
It's like an interesting- It's what you call a network effect.
I was going to say, sounds like the Internet.
Come from the annotations, for the virtual cell model. Mhmm.
Well, it was very important when we were getting started with the work to have everyone who was doing it use a consistent format, so that the data could be used and would be portable. And then once that kind of took off as the way that it would get done, other people just found it valuable.
Yeah. And even relative to prior databases like GEO and whatnot, they're just simply not as standardized or QC'd.
Yeah. Yeah.
Let's get into virtual cells. What are the grand challenges that you would focus on? Maybe talk about what the promise or the hope is, and maybe some of the challenges or where we're at with it.
Yeah. I mean, we think that this is gonna be one of the most important tools at this point. It's basically building up the hierarchy from proteins, to different structures within the cell, to a whole virtual immune system, or different levels of hierarchy. We knew that this was gonna end up being a very important set of tools for people to effectively generate hypotheses for different science work. Even before you get to the point where you're really running full experiments in it, you can come up with some estimate of how that might run.
It will be useful for some of the precision medicine examples that Priscilla was talking about a few minutes ago, but we think that this is probably one of the most important sets of tools that you need to build. It's not a single thing. There are different angles to come at this from. The Cell Atlas data is helpful for understanding things on a cellular level. One of the most important things that we're doing right now: there's this great company, EvolutionaryScale, which had a bunch of researchers who'd formerly worked at Meta on protein folding models, that is joining Biohub, and Alex Rives, the leader of it, is actually going to be the head of the whole science program. That's interesting when you think about it, because it's where you have AI and biology coming together, and really it's an AI person who understands biology running it rather than a biologist who has some understanding of AI. I think that speaks a little bit to where we think the relative weight of these things is.
But, like Priscilla was saying with the different Biohubs, New York doing cellular engineering will basically make it so that you can have cells that record different things that are going on around the body and share that data, and then you can build that into models. The Chicago Biohub being able to record inflammation and study it, that's a different dataset. We have the Imaging Institute, where we just trained our first set of models, which are the first spatial models around understanding the way that cells look in different states. And eventually, just like the analogy on the industry side with language models, where you have different capabilities and over time you train them into models that get more and more general, that's the idea here. So we'll build the Biohubs around grand biological challenges.
The Biohubs will build tools that generate novel datasets. We will build models based on those, and then eventually combine the models into an increasingly general view of a virtual cell that will be useful both for scientists and, hopefully, for startups and companies that are working on finding drugs, which is not our part of the whole thing, but I think is obviously a really important part of what needs to happen.
Yeah. And you guys think about risk all the time when you make investments. I think the promise of being able to do virtual biology using a virtual cell model is that you can actually take on riskier ideas. Right now, grant funding can be hard to come by, and the wet lab work is expensive and slow. And it's not just money, it's also time.
And so you have to choose something that you think is gonna have some likelihood of success to keep your lab career going. That naturally leads people to take on some risk, but not a lot of risk, because they need to make sure that they are hitting a certain percentage of the time to make tenure or publish or whatever they need to do. But if you had a virtual cell model where you could simulate really high-quality biology, you could start testing and tinkering on the computational side and ask riskier questions, things that would have been costly in terms of time and resources to do in the lab. And you could actually see if there is promise by doing the experiments in silico before you make the time and money investment in the wet lab.
Do you think of it kind of like a model organism? Yeah. Like it's the new fruit fly?
Yeah. I was going to ask, given the complexity of a cell, how close, how accurate do you think you'll get the model? I mean, maybe you get it to a perfectly accurate representation of a cell, but how accurate would the virtual cell have to be to be useful?
I think it will obviously iterate and get better and better, because right now we're still just talking about transcriptomics. We're expanding into different ways of looking at the cell, and you get more and more accuracy. But I don't think it needs to be 100% accurate to be useful, because you just want to be able to de-risk the idea on the front end a little bit. And the more you de-risk it, the more efficient it gets, obviously. But it'll be useful even if you just get directional signal.
And yes, we do think about it as a model organism, but in a way that has fidelity to the human body. We don't want-
All models are wrong. Some are useful.
Yeah.
Yes. Hopefully, it has utility on certain axes. Exactly.
And just like the language models, you build in specific capabilities. So for example, one of the models that we're publishing is VariantFormer. It's trained on a bunch of, effectively, pairs: you have a cell, you apply CRISPR to it in a place, and you see what comes out the other side. So it's basically able to make that kind of prediction.
Like, okay, if you have an edit that you're making to a cell, what is likely going to happen? Another one of the models is a diffusion model. Basically, you can describe a type of cell that you would like it to simulate, and it will produce a synthetic model of that cell. It's kind of interesting because, to Priscilla's point before about how everyone is different and different cells are different, you want to be able to simulate these rare configurations.
至少拥有一个可能形态的合成版本很有趣,然后你可以针对它进行测试。我认为冷冻电镜模型很有意思,因为它是空间性的。它能让你感知到这些不同模型的存在,使你可以观察各类现象,然后随着时间推移将它们训练得越来越通用。
Having at least a synthetic version of what that could look like is interesting, and then you can test against that. The cryo-EM model, I think, is interesting because it's spatial. So it gives you a sense that there are all these different models that you can have that allow you to basically look at different kinds of things, and then you just train them to be increasingly general over time.
哇,确实非常有趣。这些建模技术本质上是LLMs吗?还是有推理模型?是不是就像...
Wow. Yeah, they're very interesting. Is the modeling technology basically LLMs, or is there a reasoning model? Is it like just a-
哦,这其实也是个非常有趣的模型。虽然还处于早期阶段,但基本上是最早的生物推理模型之一。核心理念是让这些模型能以不同方式模拟世界模型,不仅要能输出相关性发现,更要能推理事物如何演变及其原因。虽然还很初步,但从概念上看,这显然是模型进化的重要方向。
Oh, that's actually a fascinating one too. One of the new models, and I think this one is very early, is basically the first reasoning model over biology. The idea is that you effectively have these models that kind of simulate world models in different ways, and then you want it to be able to not just spit out correlations in terms of what it's found, but actually be able to reason through how things would evolve and why things would happen. I know it's quite early, but it is interesting conceptually as what I think is clearly going to be an important direction in terms of how these models evolve.
没错,这正是我的想法。如果行不通,接下来就要问为什么。是的。
Yeah. Because that's what I was thinking. If it doesn't work, the next question you have is why. Yeah.
用语言模型来类比的话,你需要更好的世界模型或预训练模型来提高推理能力。但关键是要逐步增强功能,而且我认为存在一定的构建顺序。Alex和进化尺度团队主要研究蛋白质,这很有趣,因为蛋白质分辨率明显比细胞数据更微观。但细胞层面的假设是,你可以观察各种细胞并模拟其行为,但若缺乏对细胞亚组分如何互动的层级理解,认知就会比较肤浅。所以我们主张先构建顶尖的蛋白质模型,再将其整合到顶尖的细胞模型中。
I mean, the language model analogy for that would be that you need better world models, or better pre-trained models, in order to get the reasoning to be good. But yeah, you build more capabilities into it, and I think that there's probably an order too. So the work that Alex and the EvolutionaryScale folks worked on is, a lot of it, protein, which is interesting because that's at a smaller resolution, obviously, than the cellular data. But part of the hypothesis is that you can look at all these different cells and you can kind of simulate how they might behave, but you're gonna have a somewhat shallow understanding unless you actually have this hierarchical understanding of how these subcomponents of the cells are going to interact. So our view is that you basically wanna build up a state-of-the-art protein model, and then have that be a part of the state-of-the-art cellular model.
有了这个基础后,就能构建虚拟免疫系统等更复杂的模拟系统。这本质上是一种分层构建虚拟模型的方法。
Then once you have that, you build things like the virtual immune system, which allows you to simulate much more complicated systems. But it's sort of this hierarchical approach to building up these virtual models.
这非常合理,因为涉及个性化时,共同蛋白质组合成独特细胞的过程从系统角度会变得更可控。这个思路很棒,很有意思。
That makes a lot of sense because also as you get into personalization, you've got common proteins combining into a unique cell. That makes it from a systems standpoint, that makes it much more manageable. That makes a lot of sense. Interesting.
是的,确实。这些研究非常引人入胜。
Yeah. Yeah. No. It's very fascinating stuff.
对了,听说你们这周要宣布重大消息?能提前透露些内幕吗?
Yeah. So you guys are announcing some big news this week. Do you wanna give us a sneak preview?
好消息是,我们正在思考如何作为一个团队凝聚起来。要知道,过去我们运营过生物中心,开发过软件,也进行过人工智能研究。但所有这些工作实际上都略显分散。现在在Alex的领导下,我们将整合为生物中心——一个运营型慈善机构,在这里我们为实现共同目标开展科研工作,致力于推动人工智能与生物学交叉领域的研究进展。
Well, the big news is about how we are going to be coming together as one team. You know, in the past, we've run Biohubs, we've built software, and we've done some AI research. But all of it has been a little bit decentralized. Now, under Alex's leadership, we are going to come together as the Biohub, an operating philanthropy where we are doing the science in service of a singular goal together: how do we actually advance the state of biology and research at the intersection of AI and biology?
太棒了。
Amazing.
没错,Alex确实了不起。
Yeah. Alex is amazing.
是的,他非常优秀。另外还有我早前提到的部分,CZI曾涉足多个不同领域,但随着时间的推移,我们发现在科学领域能产生最大影响力,因此我们持续加码投入。
Yeah. No. He's great. And then the other thing is the piece that I mentioned earlier. I mean, CZI has focused on a number of different things. We've really just found over time that we feel like we've been able to make the biggest difference in science, so we've just kept on doubling down on it.
我们将继续开展教育工作,继续支持当地社区的各个项目。但展望未来,生物中心将成为我们慈善事业的核心方向,对此我们感到非常振奋。因为当我们启动这项使命——帮助科学界在本世纪末前攻克疾病——时,我认为随着人工智能的进步,这个目标完全有可能大幅提前实现。这是一个极具价值、至关重要且令人激动的目标,我们在这个生态系统中拥有独特地位,能够助力他人快速取得进展。
And we're gonna continue doing work in education. We're gonna continue supporting local communities in those different pieces. But going forward, the Biohub is really going to be the main thrust of our philanthropy, and we're very excited about that. We started the mission to see if we could help the scientific community cure and prevent diseases by the end of the century, and I do think that, with the advances in AI, that should be possible significantly sooner. That is a very worthy, important, and exciting goal, and we think we have a unique place in the ecosystem where we can help empower others to make fast progress on it.
显然,分散化管理在沟通成本等方面有许多优势。那么你们试图通过新增这种统一管理层级来实现什么?具体产出是什么?随之而来的复杂性又有哪些?抱歉我——
So there are obviously plenty of advantages to decentralization in terms of management, communication overhead, and so forth. And so, like, what are you trying to add by adding this kind of new layer of unification on top? Like, what are the outputs? And then I guess what are the complexities to that? Because that's I'm sorry.
用CEO的方式提问。
To ask a CEO question.
不,不。我是说,我是在为
No, no. I mean, I'm, like, asking for
一个朋友问的。
a friend.
对,想想这个。你要试试吗?
Yeah. Think about this. You go for it?
而且我能跳...是的。显然现在有出色的团队在做前沿AI研究,也有很多团队在做顶尖生物学研究。而我们独特的价值在于将这两者结合起来。我们资助了数据集建设,也自建了数据集。我们正在搭建观测细胞的工具链——无论是组织层面的细胞通讯研究,还是能让我们在近原子级观察细胞的冷冻电镜技术。
And I can jump in. Yeah. So there are obviously amazing groups doing frontier AI and a lot of groups doing great frontier biology. And what we think we can do uniquely is actually tie these two together. We've funded data sets, and we've built data sets. We're building the instrumentation now to be able to look at the cell, whether it's at the tissue level, with cell-cell communication, or cryo-EM, where we can look at the cell at a nearly atomic level.
所以我们不仅能构建数据集,还能根据现有知识体系的缺口,按需塑造数据形态。我们有顶尖团队负责这项工作,同时也在构建这些AI模型。整合两者的意义在于形成闭环——比如当模型显现某些认知盲区时,我们能立即知道该找哪个领域的专家协作。
So we have the ability to not only build the data sets, but actually shape and form them the way we want based on what we see as necessary to complement the existing body of knowledge. And so we have amazing teams doing that work, and we're building these AI models. And so the reason to do it together is then we can actually complete the flywheel. Like, you know, the model is looking like it has some gaps and blind spots in this area. Okay, who do we talk to?
如何构建下一代数据集?我们在实验室已经看到,元数据将丰富到足以反哺建模方式。如果能实现这个闭环(这也是我们召集各方的目标),我认为将产生惊人的能量。
How do we build the next dataset? And, you know, we're seeing this in the lab. Like, the metadata is going to be so rich that we can feed it back into the way that we do this modeling. Yeah. And so if we can close that loop, which is our goal in bringing everyone together, I think it's going to be incredibly powerful.
这远不止是写份需求文档说'请交付这个'那么简单。要让研究人员肩并肩协作,在相互启发中不断完善,才能构建出越来越精准的人类细胞工作模型。
And it's more than just, you know, writing down a spec and saying, like, please deliver this. Like, these people need to be sort of working shoulder to shoulder and shaping each other's work for this to actually be a more and more accurate model of how the human cell works.
嗯,是的,这非常有趣,因为这在AI领域对我们来说是最大的惊喜。先不谈生物学,领域专用模型确实非常有意思。最初的理论认为AI会变得非常聪明,在所有方面都比人类聪明。但在视频模型领域,每个视频模型都只擅长某些特定任务,而非全能。
Well, yeah, that's so interesting because that has been the biggest surprise in the industry for us in the AI world. Forget biology for one second: the domain-specific models have been super interesting. The original thesis was, like, the AIs are just gonna get so smart. They're gonna be smarter than everybody at everything. But, like, on video models, every video model is best at something, but not everything.
因此,了解你要解决什么问题在AI领域实际上变得非常重要——这有点讽刺,因为如果你能把两者结合起来,就能得到更好的结果。我们一次又一次地看到这种情况,可以说这与整个行业最初的预期完全相反。
And so, knowing what problem you're solving actually turns out to be sort of ironically very important in AI because you can actually get to a way better result if you put the two together. We're seeing that over and over and over again in a way that is, I would say, very counterintuitive to the whole narrative going into it.
在生物学领域,过去至少有一个假设是:所有数据集都不在互联网上。所以需要领域专用模型的部分原因是数据集不公开。但你们通过创建大量开源数据访问正在打破这种趋势。即便如此,听起来你们还是在押注我们在其他行业看到的趋势。不过,如何标注和整理这些数据仍然会有细微差别。
In biology, it used to be, or at least one assumption was, that all the data sets aren't on the internet. So part of the reason you need a domain-specific model is that the data sets are not public. You guys are bucking that trend too by creating a lot of open source access to the data. And then even then, it sounds like you're betting on the trend that we're seeing in other industries. But still, there will be nuance in how you annotate that data, curate that data,
嗯,你怎么能不跟科学家交流呢?我们会找到合适的模型。因为你不仅需要了解数据和模型等,我们发现对话本身变得极其重要。
And, well, how do you not talk to a scientist, right? And we'll find the model. Because you have to not only know the data and the model and so forth, but the conversation is what we keep finding out ends up being very, very important.
如此丰富且重要,在实际操作中——
So rich and so important in how you actually-
科学家不会像我使用ChatGPT那样跟它对话。
A scientist isn't going to talk to it like I talked to ChatGPT or whatever.
嗯,这是第一个
Well, is the first
可以与之对话的飞行物。
fly you can talk to.
是啊是啊,这太令人兴奋了。
Yeah. Yeah. That's super exciting.
用户界面确实非常重要。你提到你们有位创始人正在使用细胞基因技术。那个用户界面是经过精心设计的,使用者无需具备深厚的计算生物学背景,因为我们希望来自不同领域的人都能参与解决问题——就像在说:看这里,帮我们解决这些问题。
And the user interface is actually really important. You talked about how you guys have a founder who's using CELLxGENE. That user interface was intentionally designed to not need a computational or really a very deep biological background to be able to use. Because you want people coming from different fields to look at the problem. It's like, look here, help us solve problems here.
因此我们构建这个用户界面时,特意降低了使用门槛,让人们能够轻松探索、学习知识并应用到工作中。我们真心希望通过建立这些虚拟模型,最终能让更多人轻松参与进来——比如有人可能会说:我对这方面有所了解,或许能贡献一份力量。一个典型例子是:免疫学其实与神经退行性疾病密切相关,对吧?
And so building that user interface in a way where it's not a very high barrier to entry to be able to poke around and learn something and bring knowledge back to your work, that's intentional. And we're really hoping when we build these virtual models that we get to a place where we can allow a lower and lower barrier to entry for people to say, like, you know, I have some knowledge about this. Maybe I can contribute. A very pertinent example is, it turns out, I think, immunology has a ton to do with neurodegeneration, right?
看起来免疫学是这一切的幕后推手。所有事情。所以这可能是你们世纪愿景的一部分。
Seems like immunology is behind all this. Everything. So might be part of your century vision.
所以必须让免疫学家能够参与进来,理解神经退行性疾病,并明白他们的专业如何与之关联。门槛越低,就越能促进真正的跨学科协作思维。那么Biohub会...
So you need to be able to allow the immunologists to come in and understand neurodegeneration and understand how their world fits in. And so the more you lower the barrier to entry, the more it allows people to actually think in a sort of truly collaborative and interdisciplinary way. So will the Biohub
发展壮大团队吗?比如会在Biohub本部雇佣更多人,还是转向建立更多站点、实验室和社区驱动数据集的网络模式?主要方向是什么?或者两者兼有?
grow as a team? Like, will you employ more people at the Biohub proper, or are you moving towards more of a network model with more sites, more labs, more community driven data sets? Which is the thrust, or maybe it's both?
可能是两者兼而有之。我们逐步新增了生物中心,同时也在壮大这个核心AI团队。很棒。但我不认为这些关于如何搭建架构的组织性问题——我们的很多方法都借鉴了领域内其他同行的做法,因为科学就像是一个投资组合,社会有它试图实现的目标组合,而在慈善领域,你希望通过找出哪些方面代表性不足来发挥最大的补充作用。所以科学本质上是高度去中心化的,对吧?
Probably a little of both. We've added new biohubs over time, and then we're also building up more of this central AI team. Cool. But I think that these organizational questions of how you set this up... A lot of our approach is informed by what the rest of the field is doing, because you think about science as this portfolio, where society has a portfolio of stuff that it's trying to do, and in terms of philanthropy, you want to be the most additive that you can be by trying to figure out what is underrepresented. So science by default is very decentralized, right?
某种程度上就像资助机制运作的方式,我认为这也是科学家们本能倾向的工作方式。
Kind of the way that granting has worked, the way that I think scientists by default want to work.
所以我
So I
我们发现,找到那些看似简单却未曾实现的合作方式能释放巨大价值。第一个生物中心做了两件有趣的事:一是促成了加州大学旧金山分校、斯坦福和伯克利的合作。这些顶尖人才原本理论上可以协作,但缺乏正式机制,而我们搭建了这个平台。二是跨学科融合,让生物学家和工程师并肩工作。
think a lot of what we've found is that figuring out ways to encourage collaboration, in ways that seem very simple but weren't happening before, can unlock a lot of value. So with the very first Biohub, there were two interesting things. One was that it was this collaboration between UCSF, Stanford, and Berkeley. There are all these really smart people at all these different places who previously, I guess in theory, could have figured out a way to work together, but there was not really a formal construct for them to do that, and this just allowed a lot more collaboration. The other one is cross-discipline, basically having biologists sit next to engineers.
有种观点认为这两个学科需要...我也说不好。你们肯定在很多公司见过这种现象,但有趣的是——
This view that these two disciplines are things that need to I don't know. I'm sure you've seen this in a lot of the companies, but there's so many interesting-
嘿,公司总是把这两个部门分开。
Hey, the companies, they always set them apart.
有意思的是,很多组织问题只要让两个团队坐在一起就能解决。知道吗?不管组织结构图怎么画,重点是他们必须并肩工作直到项目跑通。这是我深信不疑的理念。
Well, it's interesting how many organizational questions or problems you can fix just by having two teams sit together. Right? It's like, it doesn't matter what the org chart is or whatever. It's like, you guys need to sit next to each other until you get this thing to work. And that's something I really believe in.
而且你有时间。
And you have time.
你有十到
You have ten to
十五年。
fifteen years.
其实沟通在构建或解决任何问题时都是个被低估的难题,这挺有意思的。
Well, it's all like communication is such an underrated problem in general in building anything or solving anything. That's pretty neat.
是啊,虽然都是些很简单的东西,但作为模式却很新颖。我们已将其从第一个Biohub复制到Biohub网络,并扩展到其他模式。看到领域内其他人也采用类似模式也很有趣,因为这本身就是很直观的做法。最终你会意识到分散式工作也很有价值——我们并不是说所有科研都该这么运作。
Yeah, and it's just really simple stuff, but I think it's novel as a model. We've now copied this from the first Biohub to the Biohub network and expanded it to other models. It's also just been neat to see other folks who are working in the field adopt similar models because it's a pretty intuitive thing. At some point, you'll reach the point where actually it's really good to have decentralized work too. We're not saying that this is the way that all science should work.
我们只是说这种模式有其存在空间。它能释放很多价值,因为不知为何它一直不是默认选项。
We're just saying that there's a space for this. It can unlock a lot of value because it, for whatever reason, hasn't been the default.
对。而且我们仍然依赖——是的。
Yeah. And we still rely on- Yeah.
MIT实验室里流传着关于这个的著名故事。他们就是这样发明了激光等等。他们把来自不同部门的一群人聚集在同一个
There's famous stories in the MIT lab about that. That's how they invented lasers and so forth. They put a bunch of people from different departments in the same
会议实验室。是的。
The meeting lab. Yeah.
实际上,物理学是我们获得许多灵感的地方。历史上,物理学实验室总是围绕大型项目和共享资源展开。我们虽然相对集中,但仍需依赖许多从事前沿或补充性研究的实验室共同支持这些工作。关于扩展问题,还有一个想法,也许这就是现代AI实验室的模式。
Well, actually, physics is where we got a lot of the inspiration. In physics, labs have historically rallied around big projects and big shared resources. And, you know, we are relatively centralized, but we still depend on a lot of labs who are doing sort of exact frontier work or complementary work to come together to support those. There's that. But one more thought on your expansion question, and maybe this is the modern AI lab:
我们本质上没有扩大太多物理空间,但我们正在扩展计算资源。
We are not expanding a lot of square footage per se, but we're expanding our compute.
研究嘛,是的。他们不想要员工为他们工作,不想要办公空间,只想要GPU。
Research, Yeah. They don't want employees working for them. They don't want space. They just want GPUs.
是的,智能体。没错。
Yeah. Agents. Yeah.
某种意义上,这就是新型实验室空间。它比湿实验室空间昂贵得多。
It's just like, in a sense, that's new lab space. It's much more expensive than wet lab space.
你们在这方面一直很有创意,即使在最近几年也是如此。你们创造了共享计算资源的方式。你们让学术实验室能够...呃,我忘了你们项目的名字。对。
And you guys have always been creative on that even in the last few years. You've created ways to share access to compute. You've enabled academic labs to, you know, I forgot the name of your program. Yeah.
驻场科学家之类的项目
A scientist in residence or something like
对。我们提供某种租赁服务
that. Yeah. Think we rental kind of prohortelling
核心是计算集群。你看,单个实验室可能拥有几十个GPU,而我们是最早真正构建大规模计算集群的。目前有一千小时规模,我们计划扩展到万级规模。这显然需要不同类型的项目。
The core of it is clusters. You know, if you look at individual labs, a large lab would have tens of GPUs. And we were the first to really build a large scale compute cluster. A thousand hour, we have plans to move to the 10,000 range. And that one requires a different type of project, obviously.
能够提出不同类型的问题。这是我们使用的资源,同时我们也邀请科学家来申请,询问他们有什么问题需要这种规模的资源,以此种方式来培育合作机会。
Able to ask different types of questions. And it's a resource that we use, but also we've invited scientists to apply and say, like, what question do you have that could use this amount of resource and be able to sort of seed collaborations that way?
所以如果有科学家在听——那些未被Biohub雇佣或不在Biohub工作但想与Biohub合作的——是的,你们会建立...
And so if a scientist is out there listening, like, who's not employed by the Biohub or working at the Biohub but wants to collaborate with the Biohub Yeah. That you're gonna create
来找我们吧。
Come to us.
利用资源的巧妙途径,这太棒了。
Interesting doors to utilize the resources. That's awesome.
我是说,GPU资源某种程度上是零和游戏,对吧?数据也是有限的。就是这样。
I mean, the GPUs are somewhat zero sum. Right? So, data is some. So, yeah.
是啊,说得对。确实如此。
Yeah. Fair enough. Fair enough.
那么,你们即将迎来从事这项工作的十周年。展望未来几年,关于你们正在考虑的发展方向,或是指导团队成长进化的原则和北极星,还有什么可以分享的吗?
So, you're about to celebrate ten years doing this. As you look out at the years to come, what else can you tell us about either things that you're thinking about for the future or maybe even principles or a north star that's gonna guide how you guys grow and evolve going forward?
你知道,过去十年非常有趣,因为实际上前几年我完全羡慕那些在营利性公司工作的人。因为目标非常明确,无论是私营还是上市公司,市场都会告诉你工作表现如何。
You know, it's been really interesting in the past ten years because I actually spent the first few years completely envious of people working for for-profit companies. Because there's so much clarity. Like, the market, whether it's private or public, will tell you if you're doing a good job.
如果他们觉得你做得
If they think you're doing
不错 如果他们觉得你做得
a good If they think you're doing
他们并不总是对的。他们并不总是对的。但是
They're not always right. They're not always right. But
我仍然感到嫉妒,因为我渴望那种反馈。比如,我做得怎么样?十年过去了,我们之所以加倍投入生物学研究,不仅是因为我们实现了当初设定的目标,而且当我们启动这些项目时,实际成果甚至超出了我们的预期。我当时就想,好吧,这是个值得抓住的信号。我们可以继续加大投入,做更多这样的事情。
I was still envious because I craved that feedback. Like, am I doing a good job? And, you know, ten years in, the reason why we're doubling down on biology is that not only did we achieve what we said we were going to do, but when we set out on these projects, it actually delivered more than we thought it was going to. And I was like, okay, that's a signal I can latch onto. And that's a signal that means we can really continue doubling down and doing more of that.
因此我认为关键在于继续忍受早期的模糊阶段——当你决定要深入某个领域时,保持耐心并愿意长期投入,但同时又要保持紧迫感。正是这一路上的无数次迭代让我们走到了今天,让我们能够幸运地做好准备,建立起能利用人工智能和大语言模型的数据集,这一切都源于我们持续的努力。所以即便面对模糊性和大目标下偶尔的信号缺失,我们也要继续前进——我想我们某种程度上已经为这种模式奠定了基因基础。太棒了。
And so I think it's continuing to tolerate the early ambiguity when you're like, okay, I'm going to do more of this. And being patient, being willing to have a long time horizon, but being impatient at the same time. Because it's all those iterations along the way that have sort of allowed us to get to this place where, you know, we got lucky and were ready, having built data sets to take advantage of AI and large language models, and that's because of all the work that we have been doing. And so being able to continue moving forward in this ambiguity and sometimes lack of signal on a big goal, like, I think we sort of set the DNA for that. Amazing.
哦,不是双关语。
Oh, no pun intended.
是的。但我们能看到有多少人使用Blueprint工具。然后...对。对。对。
Yeah. But we get to see how many people use the tools Blueprint. And the Yeah. Yeah. Yeah.
对。
Yeah.
你们已经有了客户,这很酷。是的。是的。在慈善领域。所以这太棒了。
You have customers, which is pretty cool. Yeah. Yeah. For philanthropy. So that's awesome.
是的。不。这是构建工具的有趣之处之一。就像你能看到人们觉得这些工具有多重要。人们是否使用这些工具来发表重要工作?
Yeah. No. It's one of the fun things about building tools. It's like you kind of get to see how valuable do people find the tools. Do people use the tools in order to publish important work?
对。对,对,对。是的。而且可能,我们的反馈是它们太棒了。
Right. Right, right, right. Yeah. And probably, I mean, our feedback is they're awesome.
我们的反馈很好。而且
Our feedback is great. And
顺便说一句,完全独一无二。所以,另一个问题是,如果没有这个,你会用什么?就像,什么都没有。
completely unique, by the way. So, like, the other thing is, like, what would you use if you didn't have this? It's like, there's nothing.
不。是的。这确实是一个真正的空白。我是说,从加速基础科学到资助大量人使用它,再到生物技术公司可以开始研究新疗法,然后制药公司大规模生产,这一整个流程都需要存在。
No. Yeah. It's a real kind of void. I mean, there's this whole pipeline that needs to exist, from accelerating basic science, to funding a lot of people to use it, to then getting into the biotechs that can start to work on, yeah, basically coming up with novel therapies, and then you get the pharma companies that do them at scale.
然后在公共卫生的另一边还有慈善事业的空间,基本上就是把疗法带给全世界每个人。但这个领域,我是说,AI将带来巨大的杠杆效应,而且是的。似乎在这个领域围绕构建工具和加速整个流程还可以投入更多努力。
Then there's a space for philanthropy on the other side, in public health, of basically taking the therapies and bringing them out to everyone in the world. But this is a space where, I mean, there's just gonna be a huge amount of leverage with AI, and yeah, it still seems like there could be a lot more effort in this space around building tools and just accelerating the whole thing a lot better.
是的。而且我确实认为这是你完全独一无二的领域。对吧?其他事情,有其他人可以做,但没有人做
Yeah. And I do think it is the place where you are completely unique. Right? The other things, there are other people who can do that, but there's nobody doing what
你是优秀的创始人市场。是的。创始人
you're good founder market. Yes. Founder
市场是独特的。如果我们不存在,会是个问题吗?是的。这些问题确实切中要害。
market is unique. If we didn't exist, would it be a problem? Yes. Like, those questions really land.
你知道,作为风投,我们中一个是工程师。另一个,是的,是科学家、博士。是的。非常
You know, as VC One of us is an engineer. The other one Yeah. Is a scientist, doctor. Yeah. Very
对这个方向感到高兴。是的。非常感谢,不仅是为了我们的公司,也为了我们作为人类,能从事这项工作。这是了不起的工作。
happy in this direction. Yeah. Thank you very much, not only for our companies, but for us as humans, for working on this work. It's amazing work.
哦,谢谢你,山姆。
Oh, thank you, Sam.
谢谢。谢谢大家。
Thank you. Thank you, guys.
非常感谢。
Thank you very much.
感谢收听本期a16z播客。如果您喜欢这期节目,请务必点赞、评论、订阅、给我们评分或写评论,并与亲朋好友分享。更多节目请前往YouTube、Apple Podcasts和Spotify收听。在X平台关注我们@a16z,并订阅我们的Substack博客a16z.substack.com。再次感谢收听,我们下期节目再见。
Thanks for listening to this episode of the a16z podcast. If you like this episode, be sure to like, comment, subscribe, leave us a rating or a review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at @a16z, and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode.
温馨提示:本节目内容仅供信息参考,不应视为法律、商业、税务或投资建议,也不应用于评估任何投资或证券,且不针对任何a16z基金的现有或潜在投资者。请注意a16z及其关联机构可能持有本播客讨论公司的投资。更多详情(包括我们的投资链接)请参见a16z.com/disclosures。
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.