史蒂夫·奎克与夏洛特·邦尼：生物学的圣杯

本集简介

“终有一天，我的梦想是能模拟出一个虚拟细胞。”——德米斯·哈萨比斯构建虚拟细胞的愿景被视为数字生物学领域的登月计划。近期，42位顶尖生命科学家在《细胞》期刊发表论文，阐述了这一目标为何至关重要及最终实现的路径。本次对话的两位作者分别是现任于瑞士洛桑联邦理工学院（EPFL）的夏洛特·邦内，以及斯坦福大学教授、陈-扎克伯格倡议（CZI）科学负责人史蒂夫·奎克。音频（上方）可在iTunes与Spotify获取。完整视频链接置于顶部，亦可通过YouTube观看。 **附时间轴与音频链接的对话实录** 埃里克·托波尔（00:06）：大家好，我是《真实底线》的埃里克·托波尔，今天我们要探讨一个炙手可热的话题——虚拟细胞。这篇极具前瞻性的论文近期发表于《细胞》期刊，第一作者是来自EPFL（曾任职于斯坦福计算机科学系）的夏洛特·邦内，以及我的老友、CZI科学负责人兼斯坦福大学教授史蒂夫·奎克。欢迎二位。史蒂夫·奎克（00:42）：谢谢邀请，埃里克。夏洛特·邦内：感谢。埃里克·托波尔（00:45）：你们这篇由夏洛特领衔、史蒂夫作为资深作者之一的论文于12月发表在《细胞》上，标题《如何用人工智能构建虚拟细胞：优先事项与机遇》令我震撼。这是生物学的圣杯。我们正身处数字生物学时代，正如文中指出，这需要AI与生物学的空前融合——AI的发展速度令人瞠目。或许我们可以先聊聊：42位作者是如何达成共识的？史蒂夫·奎克（01:33）： CZI曾召集跨领域专家会议，包括计算机科学、生物信息学、AI专家，甚至持怀疑态度的生物学家。我们特意引入反对声音，共同探讨可能性与预期目标，这促成了论文的诞生。埃里克·托波尔（02:02）：夏洛特，你如何成为论文起草者？夏洛特·邦内（02:09）：我在Genentech与阿维夫·雷格夫及CZI的尤雷·莱斯科韦克完成博士后研究。尤雷参与CZI驻留项目时，我们与史蒂夫在通用细胞嵌入领域已有合作，由此开启这项计划。埃里克·托波尔（02:29）：作者名单堪称生命科学、AI、数字生物学与组学领域的全明星阵容。我想先引用论文中的一段话：“人工智能虚拟细胞（AIVC）可能彻底改变科研进程，推动生物医学研究、个性化医疗、药物研发、细胞工程与可编程生物学的突破。”这个宏大宣言能否展开说说？史蒂夫·奎克（03:19）：这确实雄心勃勃。我们认为十年内AI将成为变革性生物学工具。目前细胞生物学90%依赖实验，仅10%靠计算。未来比例将逆转——90%计算、10%实验。虚拟细胞正是实现这一目标的关键工具。埃里克·托波尔（04:09）：许多人可能不理解为何这是“圣杯”。细胞作为生命基本单元极其复杂，不仅包含原子、分子、细胞器的动态交互，还涉及细胞与外部组织及环境的互动。你们选择的基础模型是否存在争议？以色列学者埃兰·塞加尔认为“我们已处于临界点……数据、算力、建模条件均已成熟”。但论文也提到博·王对数据充分性的质疑。夏洛特，你如何看待数据缺口问题？夏洛特·邦内（05:41）： AIVC的核心在于整合所有实验数据。AI架构能跨尺度融合多源数据集，突破传统“一算法一数据”模式。当然现有数据仍不足：多数组织样本来自健康个体，疾病表型与患者数据稀缺，非干预性数据占主导导致扰动效应认知有限。我们的策略是优先利用现有数据，智慧规划未来采集方向。

“Eventually, my dream would be to simulate a virtual cell.” (Demis Hassabis)

The aspiration to build the virtual cell is considered to be equivalent to a moonshot for digital biology. Recently, 42 leading life scientists published a paper in Cell on why this is so vital, and how it may ultimately be accomplished. This conversation is with 2 of the authors: Charlotte Bunne, now at EPFL, and Steve Quake, a Professor at Stanford University, who heads up science at the Chan-Zuckerberg Initiative.

The audio (above) is available on iTunes and Spotify. The full video is linked here, at the top, and also can be found on YouTube.

TRANSCRIPT WITH LINKS TO AUDIO

Eric Topol (00:06): Hello, it's Eric Topol with Ground Truths and we've got a really hot topic today, the virtual cell. And what I think is an extraordinarily important futuristic paper that recently appeared in the journal Cell, and the first author, Charlotte Bunne from EPFL, previously at Stanford’s Computer Science. And Steve Quake, a young friend of mine for many years who heads up the Chan Zuckerberg Initiative (CZI) as well as being a professor at Stanford. So welcome, Charlotte and Steve.

Steve Quake (00:42): Thanks, Eric. It's great to be here.

Charlotte Bunne: Thanks for having me.

Eric Topol (00:45): Yeah. So you wrote this article, with Charlotte as the first author and Steve as one of the senior authors, that appeared in Cell in December and it just grabbed me, “How to build the virtual cell with artificial intelligence: Priorities and opportunities.” It's the holy grail of biology. We're in this era of digital biology and as you point out in the paper, it's a convergence of what's happening in AI, which is just moving at a velocity that's just so extraordinary, and what's happening in biology. So maybe we can start off by, you had some 42 authors that I assume congregated for a conference or something, or how did you get 42 people to agree to the words in this paper?

Steve Quake (01:33): We did. We had a meeting at CZI to bring community members together from many different parts of the community, from computer science to bioinformatics, AI experts, biologists who don't trust any of this. We wanted to have some real contrarians in the mix as well and have them have a conversation together about is there an opportunity here? What's the shape of it? What's realistic to expect? And that was sort of the genesis of the article.

Eric Topol (02:02): And Charlotte, how did you get to be drafting the paper?

Charlotte Bunne (02:09): So I did my postdoc with Aviv Regev at Genentech and Jure Leskovec at CZI, and Jure was part of the residency program of CZI. And so, this is how we got involved, and Jure also had prior work with Steve on the universal cell embedding. So this is how everything got started.

Eric Topol (02:29): And it's actually amazing because it's a who's who of people who work in life science, AI and digital biology and omics. I mean it's pretty darn impressive. So I thought I'd start off with a quote in the article because it kind of tells a story of where this could go. So the quote was in the paper, “AIVC (artificial intelligence virtual cell) has the potential to revolutionize the scientific process, leading to future breakthroughs in biomedical research, personalized medicine, drug discovery, cell engineering, and programmable biology.” That's a pretty big statement. So maybe we can just kind of toss that around a bit and maybe give it a little more thought and color as to what you were positing there.

Steve Quake (03:19): Yeah, Charlotte, you want me to take the first shot at that? Okay. So Eric, it is a bold claim and we have a really bold ambition here. We view that over the course of a decade, AI is going to provide the ability to make a transformative computational tool for biology. Right now, cell biology is 90% experimental and 10% computational, roughly speaking. And you've got to do just all kinds of tedious, expensive, challenging lab work to get to the answer. And I don't think AI is going to replace that, but it can invert the ratio. So within 10 years I think we can get to biology being 90% computational and 10% experimental. And the goal of the virtual cell is to build a tool that'll do that.

Eric Topol (04:09): And I think a lot of people may not understand why it is considered the holy grail because it is the fundamental unit of life and it's incredibly complex. It's not just all the things happening in the cell with atoms and molecules and organelles and everything inside, but then there's also the interactions of the cell to other cells in the outside tissue and world. So I mean it's really quite an extraordinary challenge that you've taken on here. And I guess there's some debate, do we have the right foundation? We're going to get into foundation models in a second. A good friend of mine, and part of this whole, I think, process that you got together, Eran Segal from Israel, he said, “We're at this tipping point…All the stars are aligned, and we have all the different components: the data, the compute, the modeling.” And in the paper you describe how, over the last couple of decades, we have had so many different data sets that are rich, that are global initiatives. But then there's also questions. Do we really have the data? I think Bo Wang especially asked about that. Maybe Charlotte, what are your thoughts about data deficiency? There's a lot of data, but do you really have what we need before we bring them all together for this kind of single model that will get us someday to the virtual cell?

Charlotte Bunne (05:41): So I think, I mean one core idea of building this AIVC is that we basically can leverage all experimental data that is overall collected. So this also goes back to the point Steve just made. So meaning that we basically can integrate data across many different studies, because the AI algorithms or the architectures that power such an AIVC are able to integrate data sets on many different scales. So we are going a bit away from this dogma of “I'm designing one algorithm for one dataset” to this idea of “I have an architecture that can take in multiple datasets on multiple scales.” So this will help us a bit in being somewhat efficient with the type of experiments that we need to make and the type of experiments we need to conduct. And again, what Steve just said, ultimately, we can very much steer which data sets we need to collect.

Charlotte Bunne (06:34): Currently, of course we don't have all the data that is sufficient. I mean in particular, I think most of the tissues we have, they are healthy tissues. We don't have all the disease phenotypes that we would like to measure, and having patient data is always a very tricky case. We have mostly non-interventional data, meaning we have very limited understanding of the effect of different perturbations. Perturbations that happen on many different scales in many different environments. So we need to collect a lot here. I think the overall journey that we are going with is that we take the data that we have, we make clever decisions on the data that we will co...

双语字幕


Speaker 0

大家好，我是《Ground Truths》的Eric Topol，今天我们讨论一个非常热门的话题：虚拟细胞，以及我认为近期发表在《Cell》期刊上的一篇极具前瞻性的重要论文。第一作者是来自EPFL的Charlotte Bunne，她此前在斯坦福大学计算机科学系。还有我多年的年轻朋友Steve Quake，他领导着Chan Zuckerberg Initiative（CZI），同时也是斯坦福大学的教授。欢迎Charlotte和Steve的到来。

Hello. It's Eric Topol with Ground Truths, and we've got a really hot topic today, the virtual cell, and what I think is an extraordinarily important futuristic paper that recently appeared in the journal Cell. And the first author, Charlotte Bunne from EPFL, previously at computer science in Stanford. And Steve Quake, a, I say, young friend of mine for many years, who heads up the Chan Zuckerberg Initiative as well as being a professor at Stanford. So welcome, Charlotte and Steve.

Speaker 1

谢谢Eric，很高兴来到这里。

Thanks, Eric. It's great to be here.

Speaker 2

感谢邀请。

Thanks for having me.

Speaker 0

是的。你们共同撰写的这篇论文由第一作者Charlotte和资深作者之一Steve完成，去年12月发表在《Cell》上。它立刻吸引了我——关于如何用人工智能构建虚拟细胞：优先事项与机遇。这是生物学的圣杯，我们正处在数字生物学时代。

Yeah. So you wrote this article, with Charlotte as the first author and Steve as one of the senior authors, that appeared in Cell in December. It just grabbed me: "How to build the virtual cell with artificial intelligence: Priorities and opportunities." It's the holy grail of biology. We're in this era of digital biology.

Speaker 0

正如你们在论文中指出的，这是AI领域（其发展速度令人惊叹）与生物学领域的融合。也许我们可以从这个问题开始：你们有42位作者，我猜他们是通过会议之类的方式聚到一起的？你们是怎么让42个人对论文内容达成共识的？

And as you point out in the paper, it's a convergence of what's happening in AI, which is just moving at a velocity that's just so extraordinary, and what's happening in biology. So maybe we can start off by, you had some 42 authors that I assume congregated for a conference or something, or how did you get 42 people to agree to the words in this paper?

Speaker 1

我们确实在CZI召开了会议，汇集了来自计算机科学、生物信息学、AI专家等不同领域的成员，包括那些对此持怀疑态度的生物学家。我特意邀请了些真正的反对者加入讨论，让大家共同探讨：这里是否存在机遇？它的形态是什么？

We did. We had a meeting at CZI to bring community members together from many different parts of the community, from computer science to bioinformatics, AI experts, biologists who don't trust any of this. I wanted to have some real contrarians in the mix as well and have them have a conversation together about, is there an opportunity here? What's the shape of it?

Speaker 1

哪些期望是现实的？这大致就是这篇文章的起源。

What's realistic to expect? And that was sort of the genesis of the article.

Speaker 0

Charlotte，你是怎么开始起草这篇论文的？

And, Charlotte, how did you get to be drafting the paper?

Speaker 2

我在Genentech跟随Aviv Regev、在CZI跟随Jure Leskovec完成了博士后研究。Jure曾是CZI驻留项目的成员，这就是我们参与其中的缘由。Jure之前还与Steve合作过通用细胞嵌入项目，一切就这样开始了。

So I did my postdoc with Aviv Regev at Genentech and Jure Leskovec at CZI. And Jure was part of the residency program of CZI. And so this is how we got involved. And Jure also had prior work with Steve on the universal cell embedding. So this is how everything got started.

Speaker 0

确实令人惊叹，因为这份名单堪称生命科学、人工智能、数字生物学及组学领域的全明星阵容。我是说，这实在太震撼了。我想先引用文章中的一句话，因为它某种程度上预示了这个方向的可能性。论文中引述道：'人工智能虚拟细胞（AIVC）有望彻底改变科研进程，为生物医学研究、个性化医疗、药物发现、细胞工程和可编程生物学带来突破性进展。'

Yeah. And it's actually amazing because it's a who's who of people who work in life science, AI, and digital biology, and omics. I mean, it's pretty darn impressive. So, I thought I'd start off with a quote in the article because it kinda tells a story of where this could go. So, the quote was in the paper, "AIVC (that's artificial intelligence virtual cell) has the potential to revolutionize the scientific process, leading to future breakthroughs in biomedical research, personalized medicine, drug discovery, cell engineering, and programmable biology."

Speaker 0

这是个相当宏大的论断。或许我们可以就此展开讨论，深入思考并丰富你提出的这个观点。

That's a pretty big statement. So maybe we can just kinda, toss that around a bit and maybe give it a little more thoughts and color, as to what you were positing there.

Speaker 1

好的，Charlotte，你想让我先来谈谈这个吗？

Yeah. Charlotte, you want me to take the first shot at that?

Speaker 2

请讲。

Go ahead.

Speaker 1

埃里克，这确实是个大胆的断言，而我们也怀抱着宏大的愿景。我们认为在未来十年内，AI将发展成具有变革性的生物学计算工具。目前细胞生物学大约90%依赖实验，仅10%依靠计算。要获得答案，你必须完成各种繁琐、昂贵且具有挑战性的实验室工作。虽然AI不会完全取代这些工作，但它能彻底扭转这个比例。

Eric, it is a bold claim and we have a really bold ambition here. We view that over the course of a decade, AI is going to provide the ability to make a transformative computational tool for biology. Right now, cell biology is 90% experimental and 10% computational, roughly speaking. You've got to do just all kinds of tedious, expensive, challenging lab work to get to the answer. And I don't think AI is going to replace that, but it can invert the ratio.

Speaker 1

因此，我认为在十年内，我们可以让生物学研究90%依靠计算，10%依靠实验。而虚拟细胞项目的目标，就是打造一个能实现这一点的工具。

So within ten years, I think we can get to biology being 90% computational and 10% experimental. And the goal of the virtual cell is to build a tool that'll do that.

Speaker 0

是的。我觉得很多人可能不理解为什么它被视为圣杯——因为细胞是生命的基本单位，且极其复杂。不仅是细胞内发生的所有事情，包括原子、分子、细胞器以及内部的一切，还有细胞与外部组织及其他细胞的互动。所以，你们承担的确实是个非凡的挑战。当然，关于我们是否具备正确基础，可能还存在一些争议。

Yeah. And I think a lot of people may not understand why it is considered the holy grail because it is the fundamental unit of life, and it's incredibly complex. It's not just all the things happening in the cell, with atoms and molecules and organelles and, you know, everything inside, but then there's also the interactions of the cell to other cells in the outside tissue and world. So, I mean, it's really quite an extraordinary challenge that you've taken on here. And I guess there's some debate, of course, as to, do we have the right foundation?

Speaker 0

我们稍后会讨论基础模型。我的好友，也是你们这个项目的参与者之一，以色列的Eran Segal说过，我们正处于临界点。所有条件都已成熟，我们拥有数据、算力、建模等各个要素。你们在论文中也提到，过去几十年我们积累了众多丰富的全球性数据集。但问题在于：我们真的拥有足够数据吗？

We're gonna get into foundation models in a second. A good friend of mine, and part of this whole, I think, process that you got together, Eran Segal from Israel, he said, "We're at this tipping point. All the stars are aligned, and we have all the different components: the data, the compute, the modeling." And in the paper, you describe how, over the last couple of decades, we have had so many different datasets that are rich, that are global initiatives. But then there's also questions, do we really have the data?

Speaker 0

特别是Bo Wang提出了这个问题。Charlotte，你对数据缺口怎么看？虽然数据很多，但在我们整合所有数据构建这个最终能实现虚拟细胞的单一模型之前，现有的数据真的够用吗？

I think Bo Wang especially asked about that. Maybe, Charlotte, what are your thoughts about data deficiency? There's a lot of data, but do you really have what we need before we bring them all together for this kind of single model that will get us someday to the virtual cell?

Speaker 2

我认为构建这个AIVC的核心思路是，我们可以利用所有已收集的实验数据。这也呼应了Steve刚才的观点：支撑AIVC的AI算法或架构，能够整合来自不同研究、不同尺度的数据集。这让我们逐渐摆脱'一个算法对应一个数据集'的传统模式，转向'一个架构兼容多尺度数据集'的新范式。

So I think, I mean, one core idea of building this AIVC is that we basically can leverage all experimental data that is overall collected. So this also goes back to the point Steve just made. So meaning that we basically can integrate data across many different studies, because the AI algorithms or the architectures that power such an AIVC are able to integrate basically data sets on many different scales. So we are going a bit away from this dogma of "I'm designing one algorithm for one data set" to this idea of "I have an architecture that can take in multiple data sets on multiple scales."

Speaker 2

这将帮助我们在实验设计和实施上提高效率。正如Steve所说，我们最终能精准指导需要开展哪些实验、收集哪些数据。当然目前数据仍不充分——特别是现有组织样本大多来自健康个体，

So this will help us a bit in being somewhat efficient with the type of experiments that we need to make and the type of experiments we need to conduct. And again, what Steve just said, right? Ultimately we can very much steer which experiments we need, or even which data sets we need to collect. Currently, of course, we don't have all the data that is sufficient. I mean, in particular, I think most of the tissues we have, they are healthy tissues, right?

Speaker 2

我们缺乏想要测量的疾病表型数据。患者数据获取总是很棘手，现有数据多为非干预性的，这意味着我们对不同尺度、不同环境下各种扰动效应的理解非常有限。因此我们需要大量补充收集。我们的整体策略是：立足现有数据，逐步完善。

We don't have all the disease phenotypes that we would like to measure. Having patient data is always a very tricky case. We have mostly non-interventional data, meaning we have very limited understanding of the effect of different perturbations, perturbations that happen on many different scales in many different environments. So we need to collect a lot here. I think the overall journey that we are going with is that we take the data that we have.

Speaker 2

我们对未来将收集的数据做出明智决策。而且我们还有这个自我改进的实体，它能意识到自己所不知道的领域。对吧？所以我们需要能够评估，在这个特定范围内我能多准确地预测某事？如果做不到，我们就应该把数据收集的重点放在这方面。

We make clever decisions on the data that we will collect in the future. And we also have this self-improving entity that is aware of what it doesn't know. Right. So we need to be able to understand, how well can I predict something in this regime? If I cannot, then we should focus our data collection effort on this.

Speaker 2

所以我认为

So I think that's

Speaker 0

那真是太棒了。

That's great.

Speaker 2

当前状态，但这基本上也会指导未来的数据收集工作。

the present state, but this will basically also guide the future collection.
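The idea Charlotte describes here, a model that knows where it predicts poorly and steers data collection toward those regimes, resembles uncertainty-guided (active) data collection. The sketch below is only an illustration under stated assumptions: the three toy "models", the dose values, and the helper names `ensemble_predict` and `pick_next_experiments` are hypothetical and are not taken from the paper or from any CZI tool. It uses disagreement across a small ensemble as a stand-in for "what the model doesn't know".

```python
import statistics


def ensemble_predict(models, x):
    """Return the mean prediction and the spread (population std dev) across an ensemble."""
    preds = [model(x) for model in models]
    return statistics.mean(preds), statistics.pstdev(preds)


def pick_next_experiments(models, candidates, budget):
    """Rank candidate conditions by ensemble disagreement and keep the top `budget`."""
    scored = [(ensemble_predict(models, x)[1], x) for x in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [x for _, x in scored[:budget]]


# Hypothetical ensemble: the models agree at low doses and diverge at high doses.
models = [
    lambda dose: 1.0 * dose,
    lambda dose: 1.0 * dose + 0.1 * dose ** 2,
    lambda dose: 1.0 * dose - 0.1 * dose ** 2,
]
candidate_doses = [0.1, 0.5, 1.0, 2.0, 4.0]

# The two conditions where the ensemble is least certain are proposed for measurement.
chosen = pick_next_experiments(models, candidate_doses, budget=2)
print(chosen)
```

In this toy setup the high-dose conditions are selected, because that is where the models disagree most; a real system would need a calibrated uncertainty estimate rather than this simple spread.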

Speaker 0

说到数据，我觉得特别引人入胜的是看到AlphaFold 2如何彻底改变了蛋白质预测。但别忘了，这建立在蛋白质数据库（Protein Data Bank）这个非凡资源的基础上。而对于虚拟细胞，没有类似Protein Data Bank的东西。正如Charlotte强调的，它更加动态化，扰动无处不在。现在的人类细胞图谱已有数千万细胞数据，正向十亿级迈进；我们过去以为只有200种细胞类型。

You know, speaking of data, one of the things I think that's fascinating is we saw how AlphaFold 2 really revolutionized predicting proteins. But remember, that was based on this extraordinary resource that had been built, the Protein Data Bank, that enabled that. And for the virtual cell, there's no such thing as a Protein Data Bank. It's so much more dynamic, as you emphasized, Charlotte, with these perturbations that are just, you know, all across the board. Now the Human Cell Atlas, which is currently some tens of millions, but going into a billion cells, we learned that it used to be 200 cell types.

Speaker 0

现在估计已超过5000种。而成年人体内大约有37万亿个细胞。正在绘制的这张图谱规模惊人。你提出的观点让我想到，正如史蒂夫之前说的，过去所有科学研究都是假设驱动的。

Now, I guess, it's well over 5,000. And that we have approximately 37 trillion cells in the average adult's body. It's a formidable map that's being made now. And I guess the idea that you're advancing is that we used to, and this goes back to a statement you made earlier, Steve. Everything we did in science was hypothesis driven.

Speaker 0

但如果我们能建立这个虚拟细胞的计算模型，就能让AI探索整个领域。这才是核心所在吧？

But if we could get this computational model of the virtual cell, then we can have AI exploration of the whole field. Is that really the nuts of this?

Speaker 1

是的。关于这一点，我有几点想法。我们CZI的首席AI专家西奥·卡拉莱佐斯说过，机器学习是我们理解高维数据的形式化方法。我认为这是一个非常深刻的观点。生物系统本质上就是高度多维的。

Yes. And, you know, a couple of thoughts on that maybe. Theo Karaletsos, our lead AI person at CZI, says machine learning is the formalism through which we understand high dimensional data. And I think that's a very deep statement. And biological systems are intrinsically very high dimensional.

Speaker 1

人类基因组中有2万个基因。在这些细胞图谱中，我们在每个单细胞里同时测量所有这些基因的表达。它们基因表达之间的关系存在大量结构，这些是肉眼无法直接观察到的。例如，我们的CELLxGENE数据库收集并汇总了所有单细胞转录组数据，目前已超过1亿个细胞。正如你提到的，我们正寻求在不久的将来将这个数量再提升一个数量级。

You've got 20,000 genes in the human genome. In these cell atlases, you're measuring all of them at the same time in each single cell. And there's a lot of structure in the relationships of their gene expression there that is just not evident to the human eye. And for example, CELLxGENE, our database that collects and aggregates all of the single cell transcriptomic data, is now over 100 million cells. And as you mentioned, we're seeking ways to increase that by an order of magnitude in the near future.

Speaker 1

尤里·莱斯科维奇和我共同参与的项目——夏洛特之前提到的——就像是首次尝试在这些数据上构建基础模型，以发现其中存在的相关性和结构。我们使用了一个子集，大约是2000万或3000万个细胞，构建了一个大型语言模型，并开始询问它：你从这些数据的结构中理解到了什么？它在没有我们教导的情况下，自行发现了细胞谱系关系。我们仅用数字矩阵训练它，没有提供任何生物学信息，但它学习到了许多关于细胞类型与谱系之间关系的知识。这些知识从高维结构中自然涌现，这让我们感到非常欣喜。

The project that Jure Leskovec and I worked on together that Charlotte referenced earlier was like a first attempt to build a foundational model on that data to discover some of the correlations and structure that was there. And so with a subset, I think it was 20 or 30 million cells, we built a large language model and began asking it, what do you understand about the structure of this data? And it kind of discovered lineage relationships without us teaching it. We trained it on a matrix of numbers, no biological information there, and it learned a lot about the relationships between cell type and lineage. That emerged from that high dimensional structure, which was super pleasing to us.

Speaker 1

说实话，对我个人而言，这给了我信心去断言：这条路行得通。虚拟细胞确实存在未来，它不是虚构的概念，而是有实质内容的。值得CZI投入大量资源持续推进，并尝试以此项目团结整个科研社区。

And really, I mean, for me personally, it gave me the confidence to say, this stuff is gonna work out. There is a future for the virtual cell. It's not some made up thing. There is real substance there, and this is worth investing an enormous amount of CZI resources in going forward and trying to rally the community around as a project.
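Steve's point that structure can emerge from "a matrix of numbers, no biological information there" can be illustrated on a toy scale. The sketch below is not the model he describes; it is a minimal, hypothetical example in which a four-cell, four-gene count matrix (made-up values) is compared using cosine similarity, and each cell's nearest neighbor is found without any cell-type labels.

```python
import math


def cosine(u, v):
    """Cosine similarity between two expression vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


# Hypothetical cell-by-gene count matrix: rows are cells, columns are genes.
# No labels are given; cells 0 and 1 happen to share one expression program,
# cells 2 and 3 another.
expression = [
    [9, 8, 0, 1],
    [8, 9, 1, 0],
    [0, 1, 9, 8],
    [1, 0, 8, 9],
]


def nearest_neighbor(i, matrix):
    """Index of the cell most similar to cell i, excluding itself."""
    others = [j for j in range(len(matrix)) if j != i]
    return max(others, key=lambda j: cosine(matrix[i], matrix[j]))


neighbors = [nearest_neighbor(i, expression) for i in range(len(expression))]
print(neighbors)
```

The pairing of cells falls out of the numbers alone. A foundation model trained on tens of millions of cells learns far richer relationships, but the principle that groupings are recoverable from expression values without annotation is the same.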

Speaker 0

确实，这里的核心前提是生命存在一种语言，而你刚刚很好地论证了这一点。如果你能预测、查询甚至生成，这让我想起多年前世界冠军李世石那场著名的围棋对弈，当时机器下出了人类无法预料的一步棋。我想这正是你所指的方向，现在推理能力又为此增添了新维度。Charlotte，这里有两个对多数听众或观众来说陌生的术语：'通用表征'和'虚拟仪器'，它们是你们构建虚拟细胞模型的重要组成部分。能否请你解释一下这些概念，以及作为通用表征(UR)组成部分的嵌入向量？

Well, yeah, I mean, the premise here is that there is a language of life, and you just made a good case that there is. If you can predict, if you can query, if you can generate, that is reminiscent of the famous Go game of Lee Sedol, the world champion, and how the machine came up with a move, you know, many, many years ago that no human would have anticipated. And I think that's what you're getting at, and the ability for inference and reason now to add to this. So, you know, Charlotte, one of the things, of course, is about, well, there's two terms in here that are unfamiliar to many of the listeners or viewers of this podcast, universal representations and virtual instruments, that you make a pretty significant part of how you are going about this virtual cell model. So could you describe that and also the embeddings as part of the UR, the universal representation.

Speaker 0

因为我认为嵌入向量或这些有意义的关联，正是史蒂夫刚才讨论内容的关键所在。

Because I think embeddings or these meaningful relationships are key to what Steve was just talking about.

Speaker 2

没错。为了某种程度上整合非常不同的模态——即整合那些在不同尺度上进行测量的模态——我们的构想是建立可能差异很大的大型Transformer模型。比如处理图像数据时使用视觉Transformer，处理文本数据时使用专为DNA设计的大语言模型（它们具有极宽的上下文窗口等）。核心思路是让这些模型通过生物学的多尺度相互连接，因为我们知道哪些成分参与了上游的测量过程。

Yes. So in order to somewhat leverage very different modalities, in order to leverage basically modalities that will take measurements across different scales, the idea is that we have large, maybe transformer models that might be very different. If I have imaging data, I have a vision transformer. If I have text data, I have large language models that are designed, of course, for DNA, so they have a very wide context and so on and so forth. But so the idea is somewhat that we have models that are connected through the scales of biology because those scales we know, we know which components are somewhat involved in measurements that are happening upstream.

Speaker 2

因此，我们拥有一个由超大规模模型构成的互联网络，这些模型将在多样化的数据集上进行训练。它们内部形成的表征模型能够在一定程度上捕捉所见的一切。这就是我们所说的通用表征，它将跨越生物学的各个尺度而存在。人工智能的伟大之处——我想这可以简要概括为AI的发展史——就在于其预测能力，对吧？近年来，更发展出了生成能力。

So we have this interconnection of very large models that will be trained on many different data. And we have these internal model representations that somewhat capture everything they've seen. And so this is what we call those universal representations that will exist across the scales of biology. And what is great about AI, and so I think this is a bit like a history of AI in short, is the ability to predict, right? In the last years, the ability to generate, right?

Speaker 2

我们可以生成新的假说，可以补全缺失的模态，甚至可能生成具有特定属性的细胞状态或分子状态。但我认为真正即将到来的是推理能力——这一点在那些超大规模语言模型中已初现端倪。

We can generate new hypotheses. We can generate modalities that we are missing. We can potentially generate certain cellular states or molecular states that have a certain property. But I think what's really coming is this ability to reason. So we see this in those very large language models.

Speaker 2

这种推理能力能处理假说验证的思考过程。这正是这些工具最终需要实现的功能：我们需要模拟扰动对细胞表型的影响，即在细胞状态的通用内部表征上；需要模拟突变的下游效应及其在我们表征系统中的上游传导路径。

The ability to reason about a hypothesis and how we can test it. And so this is what those instruments ultimately need to do. So we need to be able to simulate the change of a perturbation on a cellular phenotype. So on the internal representation, the universal representation of a cell state. We need to simulate the effect the mutation has downstream and how this would propagate in our representations upstream.

Speaker 2

我们需要构建多种虚拟工具，这些工具将赋予AI虚拟细胞所需的核心能力——包括推理、假说生成、实验预测、扰动结果预估，以及细胞状态/分子状态的计算机辅助设计等。正因如此，我们才要将内部表征系统与操作这些表征的工具区分开来。

And we need to build many different types of virtual instruments that allow us to basically design and build all those capabilities that ultimately the AI virtual cell needs to possess, that will then allow us to reason, to generate hypotheses, to basically predict the next experiment to conduct, to predict the outcome of a perturbation experiment, to design cellular states and molecular states in silico, things like that. And this is why we make the separation between the internal representations and the instruments that operate on those representations.

Speaker 0

没错。我最欣赏的是你清晰地勾勒出了实现路径：通过将通用表征(URs)嵌入虚拟仪器(VIs)，配备解码器和操控器。这个构想很完整——当然前提是能完成所有待整合的环节。不过现在显然有很多人唱反调，认为这根本不可能实现。

Yeah. And that's what I really liked, is that you basically described the architecture, how you're gonna do this, by putting these URs into the VIs, having a decoder and a manipulator. And you basically got the idea, if you can bring all these different integrations about, which, of course, is pending. Now, there are obviously many naysayers here who say that this is impossible.

Speaker 0

比如那位菲利普·鲍尔——不知道你是否读过他的《生命运作之道》。作为多产的科学记者，他在书中写道：'将生命比作机器、机器人或计算机是片面的。生命是层层递进的过程链，每个环节都具有独特完整的自主性，其内在逻辑在非生命世界中绝无类比。'

One of them is this guy Philip Ball. I don't know if you read the book, How Life Works. Now he's a science journalist, and he's a prolific writer. He says, "Comparing life to a machine, a robot, a computer sells it short. Life is a cascade of processes, each with a distinct integrity and autonomy, the logic of which has no parallel outside the living world."

Speaker 0

他说得对吗？这种系统根本无法建模？太荒谬了？复杂得超出想象？

Is he right? There's no way to model this. It's silly. It's too complex.

Speaker 1

你知道，我们其实并不清楚。对吧？而且，有反对意见其实是件好事。如果所有人都认为这事可行，那还值得去做吗？关键在于要敢于冒险，去尝试那些真正具有挑战性的前沿领域，在那里答案尚未可知。

You know, we don't know. Right? And, you know, it's great that there's naysayers. If everyone agreed this was doable, would it be worth doing? I mean, the whole point is to take risks and get out and do something really challenging in the frontier where you don't know the answer.

Speaker 1

如果我们知道这事能成，我反而不会感兴趣了。所以我个人很高兴目前还没有达成共识。

If we knew that it was doable, I wouldn't be interested in doing it. So I personally am happy that there's not a consensus that

Speaker 0

嗯，我的意思是，要激发人们的想象力——如果你们成功了并动员起全球力量，我不知道谁来买单，因为这需要大量持续投入。但如果能做到，关键在于：今天我们讨论的是制造类器官来研究如何治疗某人的癌症或理解罕见病等等。与其等待数周培养、耗费巨资，你可以在计算机里模拟完成，拥有一个人细胞和组织的虚拟孪生体。这个机会——我不确定人们是否意识到——将是极其非凡、快速且廉价的，前提是你们能实现。

Well, I mean, to capture people's imagination here, if you're successful and you marshal a global effort, I don't know who's gonna pay for it because it's a lot of work going forward. But if you can do it, the question here is, you know, today, we talk about, oh, let's make an organoid so we can figure out how to treat this person's cancer or understand this person's rare disease or whatever. And instead of having to wait weeks for this culture and all the expense and whatnot, you could just do it in a computer, in silico. And you have this virtual twin of a person's, you know, cells and their tissue and whatnot. So the opportunity here, I don't know if people, you know, get this, is just extraordinary and quick and cheap, you know, if you can get there.

Speaker 0

我想这是个非常大胆的创举。你觉得谁会为此买单呢？

And I guess it's such a bold initiative idea. Who will pay for this, do you think?

Speaker 1

CZI正在投入大量资源，这对我们是个重大项目。我们一直在打基础——去年底建成了我认为是非营利性基础科学研究领域最大（或至少是最大之一）的GPU超级计算机集群。事实上，我们12月就向科学界发布了使用它构建模型的提案邀请，正共享这个资源。

Well, CZI is putting an enormous amount of resources into it, and it's a major project for us. We have been laying the groundwork for it. We recently put together what I think is, if not the largest, one of the largest GPU supercomputer clusters for nonprofit basic science research that came online at the end of last year. In fact, in December, we put out an RFA for the scientific community to propose using it to build models. And so we're sharing that resource within the scientific community.

Speaker 1

如你所知，这个领域真正的挑战之一是计算资源获取。企业拥有资源，学术界则远远不足。我们处于中间位置——虽不及科技公司的水平，但远超多数高校能力范围。我们正利用这点推动领域发展：今年计划发布更多提案邀请来资助全球研究者，同时CZI内部也在组建强大团队推进项目。

As I think you appreciate, one of the real challenges in the field has been access to compute resources. Industry has it; academia has it at a much lower level. We're able to be somewhere in between, not quite at the level of a private company, like a tech company, but at a level beyond what most universities are able to do, and we're trying to use that to drive the field forward. We're also planning on launching RFAs this year to help drive this project forward and funding people globally on that. And we're building a substantial internal effort within CZI to help drive this project forward.

Speaker 0

我觉得这有点像人类基因组计划——最初启动时人们都觉得不可能。但看看结果：它不仅完成了，现在基因组测序简直成了廉价商品，与过去成本相比天壤之别。

You know, I think it has the looks of the Human Genome Project, which at times, you know, when it was originally launched, people thought, oh, this is impossible. And then look what happened. It got done, and now, you know, sequencing a genome is just a, you know, commodity, relatively very inexpensive compared to what it used to

Speaker 1

我经常思考这些相似之处，我要对菲利普·鲍尔说一点，我承认他的观点。细胞确实非常复杂。基因组计划，我是说，其天才之处在于将生物学问题转化为化学问题。试管里有一种化学物质，需要解析其结构。只要能做到这点，问题就解决了。

I think a lot about those parallels, and I will say one thing to Philip Ball, I will concede him the point. The cells are very complicated. The genome project, I mean, the sort of genius there was to turn it from a biology problem to a chemistry problem. There is a test tube with a chemical, and you work out the structure of that chemical. And if you can do that, the problem is solved.

Speaker 1

我认为虚拟细胞的意义更为复杂和模糊，无论是定义它将做什么还是何时算完成。我们在这方面还有大量工作要做。这也多少是我把CZI未来十年的北极星目标设定为'破解细胞的奥秘'的原因。'奥秘'这个词对我至关重要。正如你之前指出的，分子层面已被理解：基因组已测序，蛋白质结构已解析或预测。

I think what it means to have the virtual cell is much more complex and ambiguous in terms of defining what it's going to do and when you're done. We have our work cut out for us there to try to do that. That's a little bit why I established our North Star at CZI for the next decade as understanding the mysteries of the cell. And that word mystery is very important to me. You know, I think the molecules, as you pointed out earlier, are understood. Genomes sequenced, protein structures solved or predicted.

Speaker 1

我们对分子了解很多。即便不是完全解决，这些问题也接近解决。真正的谜团是它们如何协同作用在细胞中创造生命？这正是我们试图通过虚拟细胞项目解答的问题。

We know a lot about the molecules. Those are, if not solved problems, pretty close to being solved. And the real mystery is how do they work together to create life in the cell? And that's what we're trying to answer with this virtual cell project.

Speaker 0

Yeah. I think another thing that is happening concurrently, and that adds to the likelihood you'll be successful, is that we've never seen foundation models coming out in life science the way they have in recent weeks and months. Never. I have a paper coming out in Science tomorrow summarizing the progress, and it's not just RNA, DNA, and ligands. I mean the whole idea: AlphaFold 3, but now Boltz and so many others. It's just amazing how fast the torrent of new foundation models is.

Speaker 0

So, Charlotte, what do you think accounts for this? It is unprecedented in life science to see foundation models coming out at this clip: on evolution, on the design of every different molecule of life, you name it, and of course cells are included in that. What do you think is going on here?

Speaker 2

So, I mean, on the one hand, of course, we benefit and inherit from all the tremendous efforts made over the last decades to assemble data sets that are very standardized. CELLxGENE is very AI-friendly, you could say: it is a platform that is easy to feed into algorithms. But at the same time, we also see really new building mechanisms and design principles in the AI algorithms themselves. I think we have understood that, in order to really make progress and build systems that work well, we need to build AI tools that are designed for biological data.

Speaker 2

To give you an easy example: if I take a large language model built for text, it's not going to work out of the box for DNA, because we have different reading directions, different context lengths, and many, many more differences. And if I look at standard computer vision, where we can say AI really excels, and apply standard vision transformers to multiplexed images, they are not going to work, because normal computer vision architectures always expect the same three input channels, RGB. In multiplexed images, I'm measuring up to 150 proteins, potentially, in a single experiment, and every study will measure different proteins.
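The "different reading directions" point can be made concrete with a small sketch. It is an illustration only, not something taken from the paper or the conversation: the helper names `reverse_complement` and `canonical` are hypothetical, and the example assumes plain A/C/G/T strings. A DNA molecule can be read from either strand, so a sequence and its reverse complement describe the same molecule, while a left-to-right text model sees two unrelated strings unless that symmetry is handled explicitly.

```python
# Minimal sketch (hypothetical helper names): a DNA sequence and its reverse
# complement describe the same double-stranded molecule, but a left-to-right
# text model treats them as two unrelated strings.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the sequence as read from the opposite strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def canonical(seq: str) -> str:
    """Pick one strand-independent representative for a sequence.

    Choosing the lexicographically smaller of the two readings is one simple
    way to make the two reading directions look identical to a model.
    """
    return min(seq, reverse_complement(seq))


forward = "ATGC"
backward = reverse_complement(forward)

print(forward, backward)                         # two different strings
print(canonical(forward) == canonical(backward))  # same molecule after canonicalization
```

Canonicalization is only one option; training on both orientations, or building the symmetry into the architecture, are alternatives with the same goal.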

Speaker 2

So I deal with many different scales, larger scales than we are used to, and the attention mechanisms we have in the usual computer vision transformers are not going to work anymore; they are not going to scale. At the same time, I need to be completely flexible about whatever combination of input channels I'm going to face in a given experiment. This is what we did, for example, in our very first work: inheriting the design principles that we laid out in the AI virtual cell paper, and then coming up with new AI architectures that deal with the very special requirements that biological data have. We now have a lot of computer scientists who work very closely with biologists and have a very good understanding of biology, and biologists who are getting more and more into computer science.
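As a rough illustration of what "flexible about the combination of input channels" can mean, here is a toy sketch. It is not the architecture from the work mentioned above: `marker_embedding`, `encode_image`, and `EMBED_DIM` are hypothetical names, the hash-based embedding stands in for a learned lookup table, and the per-channel mean intensity stands in for a real patch encoder. The idea shown is to treat each measured protein as its own token tagged with the marker's identity and then pool across tokens, so the output has the same size no matter which or how many markers a study measured.

```python
import hashlib

EMBED_DIM = 8  # assumed toy embedding size


def marker_embedding(marker: str) -> list[float]:
    """Deterministic stand-in for a learned per-marker embedding table."""
    digest = hashlib.sha256(marker.encode("utf-8")).digest()
    return [byte / 255.0 - 0.5 for byte in digest[:EMBED_DIM]]


def encode_image(channels: dict[str, list[list[float]]]) -> list[float]:
    """Encode an image given as {marker name: 2D intensity grid}.

    Each channel becomes one token (its mean intensity scaled onto the
    marker's embedding); tokens are mean-pooled, so the result does not
    depend on the number or the order of channels.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    pooled = [0.0] * EMBED_DIM
    for marker, grid in channels.items():
        n_pixels = sum(len(row) for row in grid)
        mean_intensity = sum(sum(row) for row in grid) / n_pixels
        embedding = marker_embedding(marker)
        for i in range(EMBED_DIM):
            pooled[i] += mean_intensity * embedding[i]
    return [value / len(channels) for value in pooled]


# Two hypothetical studies measuring different marker panels.
study_a = {"CD3": [[1.0, 2.0], [3.0, 4.0]], "CD20": [[0.0, 1.0], [1.0, 0.0]]}
study_b = {"CD3": [[2.0, 2.0], [2.0, 2.0]], "CD20": [[1.0, 1.0], [0.0, 0.0]],
           "Ki67": [[5.0, 5.0], [5.0, 5.0]]}

print(len(encode_image(study_a)), len(encode_image(study_b)))
```

A fixed three-channel RGB input layer has no equivalent of this: its first layer is tied to exactly three channels in a fixed order.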

Speaker 2

So there are people who are fluent in both languages and who are now able to build models that are adapted to and designed for biological data. We don't just take computer vision architectures that work well on street scenes and try to apply them to biological data. It's a very different way of thinking about it: constructing specialized architectures, on top of, of course, the tremendous data efforts that have happened in the past.

Speaker 0

Yeah. And we're not even talking about just sequence, because we've also got imaging, which has gone through a revolution: being able to image subcellular structures without having to use any types of stains that would disrupt cells. That's another part of the deep learning era. One thing I thought was fascinating in the paper in Cell: you wrote, for instance, that the Short Read Archive of biological sequence data holds over 14 petabytes of information, which is a thousand times larger than the dataset used to train ChatGPT.

Speaker 0

I mean, that's a lot of tokens. That's a lot of compute resources. It's almost like you're going to need a DeepSeek type of way to get this done. Not that DeepSeek is as much more economical as it's claimed to be, but there's a data challenge here, in working with that massive amount, that is different from working with human language. Wouldn't you say?

Speaker 1

You know, Eric, it brings to mind one of my favorite quotes from Sydney Brenner, who was such a wit. In 2000, in the first flush of success in genomics, he said, "Biology is drowning in a sea of data and starving for knowledge." A very deep statement, right? That's a little bit what the motivation was for putting the Short Read Archive statistic into the paper. And again, for me, part of the value of this endeavor of creating a virtual cell is that it's a tool to help us translate data into knowledge.

Speaker 0

Yeah. Well, there are two, I think, phenomenal figures in your Cell paper: the first gets across the capabilities of the virtual cell, and the second compares the virtual cell to the real, physical cell. We'll link those in the transcript. The other thing we'll link is a nice Atlantic article, "A Virtual Cell Is a 'Holy Grail' of Science." It may not be as close as next week or next year, but it's getting closer. And that article is good for people who are not well grounded in this, because it's much more taken out of the technical realm. This is really exciting, what you're onto here.

Speaker 0

And what's interesting, Steve, since I've known you for so many years: earlier in your career you really worked on omics, that is, DNA and RNA, and in recent times you've made this switch to cells. Is that just because you're trying to anticipate the field? Tell us a little bit about your migration.

Speaker 1

Yeah. A big part of my career has been trying to develop new measurement technologies that will provide insight into biology. Decades ago, that meant understanding molecules. Now it means understanding more complex biological things like cells. It was a natural progression.

Speaker 1

I mean, we built the sequencers, sequenced the genomes; done, right? It was clear that people were just going to do that at scale and create lots of data, and hopefully knowledge would come out of that. But for me as an academic, I never thought I'd be in the position I'm in now. Let's put it that way.

Speaker 1

I just wanted to keep running a small research group. I realized I would have to get out of the genome thing and find the next frontier, and it became this intersection of microfluidics and genomics; as you know, I spent a lot of time developing microfluidic tools. So, again, taking all these cells and trying to do single-cell biology to understand their heterogeneity, and that, through a winding path, led me to all these cell atlases and to where we are now.

Speaker 0

Well, we're fortunate for that, and also for your work with CZI to help propel it forward. It sounds like we're going to need a lot of help to get this thing done. Now, Charlotte, as a computer scientist now at EPFL, what are you going to do to keep working on this? And what's your career advice for people in computer science who have an interest in digital biology?

Speaker 2

I work in particular on the prospect of using this to build diagnostic tools and to make diagnostics in the clinic easier, because ultimately we have somewhat limited capabilities in the hospital to run deep omics. But the idea of being able to map a cheaper, lighter modality or diagnostic test onto something much richer, because a model has seen all those different data and can contextualize it, is very interesting. We've seen all those pathology foundation models. If I can always run an H&E, but then decide when to run deeper diagnostics to get a better, more accurate prediction, that is very powerful, ultimately reducing costs while improving the precision that we have in hospitals.

Speaker 2

My faculty position right now is co-located between the School of Life Sciences and the School of Computer Science. So I have a dual affiliation, and I'm affiliated with the hospitals to actually make this possible. As career advice, I think: don't be shy, and don't just stick to your discipline. I have a bachelor's in biology, but I never only did biology. I have a PhD in computer science, which you would think a bachelor's in biology does not necessarily qualify you for.

Speaker 2

I think this interdisciplinarity also requires you to be very fluent and very comfortable reading many different styles of papers and publications, because a publication in a computer science venue will be very different from the way we write in biology. So don't stick to your study program; be free in selecting whatever course gets you closer to the knowledge you need to do the research, or whatever task you're building and working on.

Speaker 0

Well, you know, Charlotte, the way you're set up there, with this coalescence of life science and computer science, is so ideal and so unusual. That's fantastic. It's what we need, and it's really the underpinning of how you're going to get to the virtual cell: getting these two communities together. And Steve, likewise, you were an engineer, and somehow you became one of the pioneers of digital biology, way back before we had that term. This interdisciplinary, transdisciplinary work: we need so much of that in order for you all to be successful. Right?

Speaker 1

Absolutely. There's so much great discovery to be done on the boundary between fields. I trained as a physicist and made my career on this boundary between physics, biology, and technology development. It's been the gift that keeps on giving: there's a new way to measure something, you discover something new scientifically, and it all suggests new things to measure. It's very self-reinforcing.

Speaker 0

Now, a couple of people you know well have made some pretty big statements about this whole era of digital biology, and I think the virtual cell is perhaps the biggest initiative of all the ongoing digital biology efforts. Jensen Huang wrote, "For the first time in human history, biology has the opportunity to be engineering, not science." And Demis Hassabis wrote, or said, "We're seeing engineering science. You have to build the artifact of interest first, then use the scientific method to reduce it down and understand its components."

Speaker 0

Well, here there's a lot to do to understand its components. For example, right now, as both of you know, AI drug discovery is in high gear, and there are umpteen companies working on it. But it doesn't account for the cell. It's basically protein and protein-ligand interactions. What if we had drug discovery that was cell-based?

Speaker 0

Could you comment on that? Because that doesn't even exist right now.

Speaker 1

Yeah, I can say something first. Charlotte, if you've got thoughts, I'm curious to hear them. I do think AI approaches are going to be very useful for designing molecules. So from the perspective of designing new therapeutics, whether they're small molecules or antibodies, there's a ton of investment in that area. It's near-term fruit, a perfect thing for venture people to invest in, and there's opportunity there.

Speaker 1

There's been enough proof of principle. However, I do agree with you that if you want to really understand what happens when you drug a target, you're going to want to have some model of the cell. Maybe not just the cell, but all the different cell types of the body, to understand where toxicity will come from if you have on-target toxicity, and whether you get efficacy on the thing you're trying to do. So we really hope that people will use the virtual cell models we're going to build as part of the drug discovery and development process. I agree with you, it's been a bit of a blind spot, and we think that if we make something useful, people will use it. The other thing I'll say on that point is that I'm very enthusiastic about the future of cellular therapies.

Speaker 1

One of our big bets at CZI has been starting the New York Biohub, which is aimed at being very ambitious about establishing the engineering and scientific foundations of how to engineer radically more powerful cellular therapies. And the virtual cell is going to help them do that, right? It's going to be essential for them to achieve that mission.

Speaker 0

I think you're pointing out one of the most important things going on in medicine today: we didn't anticipate that live cell therapy with engineered cells, ideally off the shelf or in vivo, not just having to take cells out and work on them outside the body, would be an ongoing revolution. And it's not just in cancer. It's in autoimmune diseases and many others. So it's part of the need for the virtual cell. We need this.

Speaker 0

One thing that's a misnomer, which I want you both to comment on: we keep talking about single cell, single cell. There's a spatial multi-omics paper this week with five different single-cell scales all integrated. It's great. But we don't get to single cell. We're basically looking at 50 cells, 100 cells.

Speaker 0

We're not doing single cell because we're not going deep enough. Is it just a matter of time until we actually are doing it? Of course, the more we get down to a single cell or a few cells, the more insights we're going to get. Would you comment on that? Because literature on single cell comes out every day, but we're not really there yet.

Speaker 1

Charlotte, do you want to take a first pass at that, and then I can say something?

Speaker 2

Yes. So it depends, right? If we look at certain spatial proteomics, we do have subcellular resolution. Of course, we always measure many different cells, but we are able to get down to a resolution where we can look at the co-localization of certain proteins.

Speaker 2

This also goes back to the point made just before about having a very good environment in which to study drugs. If I want to build a new drug, if I want to build a new protein, the idea of building this multiscale model allows us to actually simulate different binding changes, because we simulate the effect of the drug. Ultimately, the readouts we have are subcellular. Of course, in spatial biology we often have methods that are rather coarse.

Speaker 2

They have a spot that averages over some number of cells, hundreds of cells or a few cells. But I think we also have more and more technologies that are zooming in and are subcellular, where we can actually tag things, or that use probe-based methods that allow us to zoom in. There's microscopy of individual cells that really captures them in 3D. Those methods are, of course, not very high-throughput yet, but they also give us an idea of the morphology, and of how morphology ultimately determines certain cellular properties or cellular phenotypes.

Speaker 2

So I think there's lots of progress on the experimental side as well, and that will ultimately feed back into the AI virtual cell. Those models will be fed by those data. Similarly, looking at dynamics, looking at live imaging of individual cells and their morphological changes: this, too, is ultimately data that we need to get a better understanding of disease mechanisms, cellular phenotypes, functions, and perturbation responses.

Speaker 0

Right. Steve, you can comment on that, and on the amazing progress we have made over these years in space and time, in spatiotemporal resolution and spatial omics, but also on the fact that we could still go deeper in terms of getting to individual cells. Right?

Speaker 1

So, what can we do with a single cell? I'd say we are very mature in our ability to amplify and sequence the genome of a single cell, and to amplify and sequence the transcriptome of a single cell. You can ask, is one cell enough to make a biological conclusion? I think what you're referring to is that people want to see replicates, so you can ask how many cells you need to see to have confidence in any given biological conclusion, which is a reasonable thing. It's a statistical question, and it's good science.
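One simple, concrete version of that statistical question, offered here only as an illustration and not as anything stated in the conversation: if a cell state occurs at frequency f in a tissue and cells are sampled independently (an idealizing assumption), how many cells must be profiled to see at least one such cell with a chosen probability? The probability of missing it in n cells is (1 - f)^n, so n must satisfy (1 - f)^n <= 1 - confidence. The function name `cells_needed` is hypothetical.

```python
import math


def cells_needed(frequency: float, confidence: float) -> int:
    """Smallest n with P(at least one cell of the rare type) >= confidence.

    Assumes cells are sampled independently and each has probability
    `frequency` of being the type of interest.
    """
    if not 0.0 < frequency < 1.0:
        raise ValueError("frequency must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - frequency) ** n <= 1 - confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - frequency))


# A cell type at 1% frequency, seen at least once with 95% probability.
print(cells_needed(0.01, 0.95))
```

Seeing a cell once is a much weaker requirement than drawing a conclusion about it; demanding several replicate cells, or accounting for measurement dropout, raises the number further.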

Speaker 1

I've been very impressed with how the mass spec people have been doing recently. I think they've finally cracked the ability to look at proteins from single cells, and they can look at a couple thousand proteins. It was, I think, one of these Nature Methods Method of the Year things at the end of last year, and...

Speaker 0

Deep visual proteomics, yes.

Speaker 1

Yeah, they are over the hump with single-cell measurements. Part of what's missing right now, I think, is the ability to reliably do all of that on the same cell.

Speaker 1

This is what Charlotte was referring to: being able to do multimodal measurements on single cells. That's in its infancy. There are a few examples, but there's a lot more work to be done. There's also the fact that these measurements are all destructive right now, so you lose the ability to look at how the cells evolve over time. You've got to say, at this time point, I'm going to dissect this thing and look at its state, and I don't get to see what happens further down the road.

Speaker 1

So that's another measurement challenge, I think, to be addressed in the future.

Speaker 0

Yeah. I'm just trying to identify some of the multitude of challenges in this extraordinarily bold initiative, because there's no shortage of them, and that's what's good about it: it gives people lots of work to do to overcome them. Now, before we wrap up: besides pointing out that all the work has to be done and validated in real experiments, not just live in a virtual AI world, you also comment on the safety and ethics of this work, assuming you're gradually going to get there and be successful. Could either or both of you comment on that?

Speaker 0

Because it's very thoughtful that you're already thinking about that.

Speaker 1

As scientists and members of the larger community, we want to be careful and ensure that we're interacting with the people who set policy, in a way that ensures these tools are used to advance the cause of science, are not used for things that are detrimental to human health, and are used in a way that respects patient privacy. So the ethics around how you use all this with respect to individuals is going to be important to be thoughtful about from the beginning. I also think there's an ethical question around what it means to be publishing papers. You don't want people forging papers using data from the virtual cell without being clear about where it came from, pretending that it was a real experiment. So there are issues around those sorts of ethics as well that need to be considered.

Speaker 0

And with those 40-some authors you have around the world, do you have the sense that you'll all work together to achieve this goal? Is there that kind of global bonding that's going to drive collaboration?

Speaker 1

I think this effort is going to go way beyond those 40 authors. It's going to include a much larger set of people, and I'm really excited to see that evolve with time.

Speaker 0

Yeah. It's really quite extraordinary how you've kicked this thing off, and the paper is the blueprint for something we're all going to anticipate, something that could change a lot of science and medicine. I mean, we saw, as you mentioned, Steve, how deep visual proteomics saved lives. It was what I wrote about as spatial medicine, no longer just spatial biology. As for the way this can change the future of medicine, I think a lot of people just need a little bit of imagination: once we get there with this AIVC, there's a lot in store that's really quite exciting.

Speaker 0

Well, I think this has been an invigorating review of that paper and some of the issues surrounding it. I couldn't be more enthusiastic about your success and, ultimately, where this could take us. Did I miss anything during the discussion that we should touch on before we wrap up?

Speaker 1

Not from my perspective. It was a pleasure as always, Eric, and a fun discussion.

Speaker 2

Yeah. Thanks so much.

Speaker 0

Well, thank you both, and all of the coauthors of this paper. We're going to be following this with great interest. I think most people listening may not know that this is in store for the future. Someday, we will get there. One of the things to point out right now is that the models we have today, largely the large language models based on the transformer architecture, are going to continue to evolve.

Speaker 0

We're already seeing so much in inference, with the ability for reasoning to be exploited: not asking for prompts with immediate answers, but waiting for days to get back a lot more work from a lot more computing resources. But we're going to get models in the future that fold this together. I think that's one of the things you've touched on in the paper. So whatever we have today, in concert with what you've laid out, AI is just going to keep getting better. The biology foundation models are going to get broader and more compelling as to their use cases.

Speaker 0

So that's why I believe in this. I don't see this as a static situation. I just think that you're anticipating the future, and we will have better models that are able to integrate this massive amount of what some people would consider disparate data sources. So thank you both, and all your colleagues, for writing this paper.

Speaker 0

I don't know how you got the 42 authors to agree to it all, which is great. And it's just the beginning of something that's a new frontier. Thanks very much.

Speaker 1

Thank you,