Arc Institute's Patrick Hsu on Building an App Store for Biology with AI

Episode Summary

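Arc Institute co-founder Patrick Hsu discusses the opportunities for AI in biology beyond drug development, and how Evo 2, their new biological foundation model, enables a broad ecosystem of applications. Trained on vast amounts of genomic data, Evo 2 learns evolutionary patterns that would otherwise take years to uncover. As a result, the model can be used for applications ranging from identifying disease-causing mutations to designing new molecules and even genome-scale biological systems.

Hosted by: Josephine Chen and Pat Grady, Sequoia Capital

Mentioned in this episode:
Sequence modeling and design from molecular to genome scale with Evo: public preprint of the original Evo paper
Genome modeling and design across all domains of life with Evo 2: public preprint of the Evo 2 paper
ClinVar: NIH database linking genes and their mutations to disease
Sequence Read Archive: NIH's vast database of gene sequencing data
Machines of Loving Grace: Dario Amodei's essay on how AI could change the world for the better (cited by Patrick)
Arc Virtual Cell Atlas: Arc's first effort, alongside other tools, to integrate and generate large-scale cell data for AI-driven biological discovery
Protein Data Bank (PDB): the global archive of 3D biomolecular structures that DeepMind used to train AlphaFold
OpenAI Deep Research: the AI app Patrick uses every day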

Transcript

Speaker 0

One of the things that the field of computational biology is often asking is: if you have a genetic mutation in your genome, if I sequenced you, whether via 23andMe or some other genetic test, right, you'll find mutations in your genome. How do we actually interpret those and understand what the functional consequences are? Sometimes you'll get a rare genetic disease. Those are causal genetic mutations that are known to cause a devastating disorder, which might be muscular dystrophy or cystic fibrosis or breast cancer.

Speaker 0

Right? But most of the mutations that you have, we call them variants of unknown significance, which is fancy scientist-speak for: we don't really know what they do, or what the hell is going on. Right? And, you know, it turns out the model has an opinion about those mutations and what the hell is going on with them.

Speaker 0

And it turns out it's sort of state of the art in doing that.

Speaker 1

Today, we're joined by Patrick Hsu, a pioneer in genome editing, CRISPR technologies, and the emerging field of generative biology. He's the co-founder of the Arc Institute, where cutting-edge AI and biology converge to reimagine scientific discovery. Patrick and his collaborators created Evo 2, a revolutionary biological foundation model that can interpret and generate genomic sequences across all domains of life. By training on the fundamental information layer of life, DNA itself, Evo can identify patterns in genetic code at scale and predict the effects of both coding and non-coding mutations that can mean the difference between health and disease.

Speaker 1

In this episode, we'll hear how Patrick's vision goes beyond creating better drugs to building a comprehensive understanding of biology at all scales. Patrick, welcome to the show. Thank you for coming.

Speaker 0

Thanks for having me on. Excited to spend some time with you today.

Speaker 1

I think maybe the most obvious thing to start with is, people have heard about CS and bio for the longest time. Now it's all about AI and bio. Where are the results? What should we actually be expecting to see?

Speaker 2

Where are the drugs?

Speaker 1

Why are we not seeing the drugs yet?

Speaker 0

It takes time. Well, here's the thing. Even if we had perfect drug molecules coming out of these pipelines and fancy models, you know, you can design a trillion molecules. Right? Ten trillion.

Speaker 0

Right? But you still have to actually test them. Right? Initially in animals and then in people. Right?

Speaker 0

And so that's the real bottleneck. Even if you pack the top of the funnel with all the things, it just takes years to actually go through the regulatory apparatus. Right? And so I think there are a few intermediate checkpoints along the way to realizing this potential. But it might be worth taking a step back and saying, and this is a bit of a soapbox of mine, that ML for bio is not just drug design.

Speaker 1

Right.

Speaker 0

This is actually, ultimately, I think, a very important but narrow part of the potential of biology, not just as a field of STEM but in the way that it affects human lives. Right?

Speaker 1

And do you mean basic science, like understanding the human body? Or where else do you think the applications are, beyond drugs that treat all of us?

Speaker 0

Yeah. One of the things that motivates me academically is the idea that we actually have a unifying theory for biology. Right? Unlike the physicists, who have been scrimping and poking around for one for a century, right, we have this in biology, and it's so obvious that it's just, you know, an obvious force.

Speaker 0

Right? This is, of course, evolution. Right? And it acts on biology across all of its different length scales, starting from entire planets. Right?

Speaker 0

Biology can, for example, terraform planets.

Speaker 1

Mhmm.

Speaker 0

Right? All the way down to ecosystems and populations, to individuals, to our tissues, to individual cells, to individual molecules. Right? And so that's this unifying force that is actually very deep and rich, and you can learn a lot from it. Right?

Speaker 2

How do we activate that unifying theory? How do we put it to work?

Speaker 0

So we've been thinking about this in the lab, and recently we've been training a series of models that we call Evo, inspired by these forces of evolution. They try to connect biological sequences directly to biological function using the modern sequence modeling paradigm, with the idea that evolution passes down the effects of natural selection through generations of life via DNA mutations. And so last year, of course, multiple Nobel Prizes were awarded for AI, and for AI in biology in particular: to David Baker for protein design, and to Demis Hassabis and John Jumper for predicting the structure of proteins. Right? But if you read those citations, they both explicitly say "for proteins."

Speaker 2

Mhmm.

Speaker 0

Right? And, you know, we love proteins. These are, of course, some of the most important and fundamental molecular machines. But the realization for me, as a genome biologist, if you will, is that proteins are encoded in DNA, along with RNA and regulatory DNA and all of the things that you need to make life.

Speaker 0

Right? So we asked: could we train a long-context model on genomes so that it could reason over all the different bases and molecules embedded inside genomes, and learn about the molecular interactions and how they lead to biological function? Now, that was very scientific or academic, right? But we can talk through specific examples of what we were actually able to do with this in a way that grandma can understand, like predicting the effects of breast-cancer-causing mutations. It's actually best in class at doing this.

Speaker 0

Or designing new CRISPR gene editing systems. In addition to its zero-shot capabilities, I think people are really building an app store for biology on top of all of these foundational layers. One of the funny things in modeling today is how everyone's model has to be more foundational than someone else's. There's a bit of a pissing contest happening. But maybe our model's more foundational than the other models.

Speaker 1

Because yours is DNA versus protein.

Speaker 0

And then below that, there are these all-atom diffusion models. Maybe those are even more fundamental. I don't really know. I think what matters are the capabilities and doing something that actually feels useful. Those are some examples of what we thought was cool and useful from the model.

Speaker 1

Can you actually walk through a couple more of those use cases? Like, which are some of the most exciting ones? And why is it possible today with Evo but wasn't possible before?

Speaker 2

And the model's open source. And so where has it been picked up? Like, where are people running with it with some of those use cases that Josephine mentioned?

Speaker 0

Yeah. So maybe I can just start by talking about what the model is. Right?

Speaker 0

So it's an autoregressive, sort of multi-convolutional hybrid model. Right? But you can think of it as just a really efficient long-context model that's trained, at least in this version, autoregressively. Basically, it does next-token or next-base prediction.
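A minimal sketch of the next-base prediction idea, assuming a toy order-k Markov model in place of Evo's actual neural network (`train_kmer_model`, `next_base_probs`, and `log_likelihood` are hypothetical names, not part of any Evo API). It shows only the autoregressive factorization: a sequence's log-probability is the sum of each base's log-probability given the bases before it.

```python
import math
from collections import defaultdict

BASES = "ACGT"


def train_kmer_model(seqs, k=3, pseudocount=1.0):
    """Hypothetical toy stand-in for a neural model: count which base follows each length-k context."""
    counts = defaultdict(lambda: {b: pseudocount for b in BASES})
    for seq in seqs:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return dict(counts), k, pseudocount


def next_base_probs(model, context):
    """P(next base | last k bases of context); unseen contexts fall back to uniform."""
    counts, k, pseudocount = model
    row = counts.get(context[-k:], {b: pseudocount for b in BASES})
    total = sum(row.values())
    return {b: row[b] / total for b in BASES}


def log_likelihood(model, seq):
    """Autoregressive factorization: log P(seq) = sum over t of log P(x_t | x_<t).

    The first k bases are treated as a given prompt and are not scored.
    """
    _, k, _ = model
    return sum(math.log(next_base_probs(model, seq[:i])[seq[i]]) for i in range(k, len(seq)))


# Usage: a sequence that follows the training pattern scores higher than one that does not.
model = train_kmer_model(["ACGTACGTACGTACGT"], k=3)
print(log_likelihood(model, "ACGTACGT") > log_likelihood(model, "ATTGCAAG"))  # True
```

A real long-context model conditions on far more than k bases; the Markov counts here only make the factorization easy to inspect.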

Speaker 0

And it turns out, just like in natural language or vision or robotics and embodied intelligence, this general machine learning paradigm is able to find higher-order patterns. Just as next-word prediction lets you learn about grammar or color and world navigation, you seem to learn a rich set of representations about biology by predicting the next base or the next amino acid residue or the next gene. Right.

Speaker 0

And the model learns something about the molecular logic that gives rise to a cell. Right. And so one of the things that the field of computational biology is often asking is: if you have a genetic mutation in your genome, if I sequenced you, whether via 23andMe or some other genetic test, right, you'll find mutations in your genome. How do we actually interpret those and understand what the functional consequences are?

Speaker 0

Right? Sometimes you'll get a rare genetic disease. Those are causal genetic mutations that are known to cause a devastating disorder, which might be muscular dystrophy or cystic fibrosis or breast cancer. Right? But most of the mutations that you have, we call them variants of unknown significance, which is fancy scientist-speak for: we don't really know what they do. We don't know what the hell is going on. Right? And, you know, it turns out the model has an opinion about those mutations and what the hell is going on with them. It turns out it's state of the art in doing that.

Speaker 1

Interesting. Wait, what's an example of one of these mutations, what did the model discover, and how did you verify that what it discovered was accurate?

Speaker 0

Yeah. So one example that we showcase in the paper is a gene called BRCA1, a famous gene that's known to cause breast and ovarian cancer. If you have one of the specific causal mutations in BRCA1, many women elect to get double mastectomies. This is obviously a serious life decision and medical decision for you and for your family. At the other end are the mutations known to be benign: you're fine, and you just go ahead and get an annual mammogram to check and monitor. The question is what to make of the entire middle distribution of these VUS, or variants of unknown significance.
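One common zero-shot recipe for scoring a variant with a genomic language model is to compare the log-likelihood of the mutant sequence with that of the reference: a clearly negative difference suggests the mutation is disfavored by the patterns the model absorbed from evolution. The sketch below assumes that recipe; `make_dinucleotide_scorer` and `delta_log_likelihood` are hypothetical helpers, and the toy first-order Markov scorer only stands in for a real model such as Evo 2.

```python
import math
from collections import Counter
from typing import Callable


def make_dinucleotide_scorer(training_seqs, pseudocount=1.0) -> Callable[[str], float]:
    """Hypothetical toy stand-in for a genomic language model: log P(seq) under a first-order Markov chain."""
    pair_counts = Counter()
    prev_counts = Counter()
    for s in training_seqs:
        for a, b in zip(s, s[1:]):
            pair_counts[a + b] += 1
            prev_counts[a] += 1

    def log_likelihood(seq: str) -> float:
        # Sum of log P(x_t | x_{t-1}), smoothed with a pseudocount over the 4 bases.
        return sum(
            math.log((pair_counts[a + b] + pseudocount) / (prev_counts[a] + 4 * pseudocount))
            for a, b in zip(seq, seq[1:])
        )

    return log_likelihood


def delta_log_likelihood(score: Callable[[str], float], ref: str, pos: int, alt: str) -> float:
    """Score a single-nucleotide variant as log P(mutant) - log P(reference)."""
    if ref[pos] == alt:
        raise ValueError("alt allele matches the reference base")
    mutant = ref[:pos] + alt + ref[pos + 1:]
    return score(mutant) - score(ref)


# Usage: a C -> A change that breaks the learned ACGT repeat gets a negative score.
score = make_dinucleotide_scorer(["ACGTACGTACGTACGTACGT"])
print(delta_log_likelihood(score, "ACGTACGTACGT", 5, "A") < 0)  # True
```

Because only the likelihood ratio matters, the same helper works with any scorer that returns a sequence log-likelihood.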

Speaker 0

There's a gold-standard database from the scientific literature known as ClinVar, and it basically has a list of the genes that are known to cause disease and which mutations in those genes cause disease or not. We can use it as a ground-truth database to assess the model's predictions for new mutations that you introduce into a gene, and whether or not those would be pathogenic. When you develop a new type of model, you have to create a lot of the evals as well. That was actually something that we put tremendous effort into.
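Benchmarking against a ground-truth set such as ClinVar usually comes down to a binary classification metric. A minimal sketch, assuming pathogenic variants are labeled 1 and benign ones 0, and that lower model scores should indicate pathogenicity; it computes AUROC by directly counting positive/negative pairs (the Mann-Whitney formulation). The helper names and example numbers are illustrative only, not results from the paper.

```python
def auroc(scores, labels):
    """Probability that a randomly chosen positive outranks a randomly chosen negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pathogenicity_auroc(model_scores, is_pathogenic):
    """Assumes pathogenic variants should receive LOWER model scores, so negate before ranking."""
    return auroc([-s for s in model_scores], [1 if p else 0 for p in is_pathogenic])


# Illustrative, made-up scores: every pathogenic variant scores below every benign one.
scores = [-4.1, -3.2, -0.2, 0.1, -2.5, 0.3]
labels = [True, True, False, False, True, False]
print(pathogenicity_auroc(scores, labels))  # 1.0
```

The pairwise loop is O(n·m), which is fine for a sketch; at ClinVar scale a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.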

Speaker 0

Obviously, you guys see this horizontally across AI in many different domains: folks will build things to the benchmarks. But no one, I think, really likes building benchmarks. It's really gory. It requires a lot of taste. It takes a lot of time.

Speaker 0

You have to continually update them as the models get better. And we dealt with a very similar challenge here, which was making evals similar to the AGI or intelligence evals, where it actually feels meaningful when you're able to solve them, like AIME or Putnam problems. What would be the equivalent demonstration of true biological understanding, such that a cell biologist would feel emotion if you were actually able to solve it? What would it look like to make all molecular biologists feel what the NLP people felt a few years ago? That's the core of what we're noodling through.

Speaker 1

I think you mentioned this briefly, but you guys are working on the DNA layer, and you didn't actually do anything in the lab. You didn't have a lab in the loop. There was no RL. Talk us through the decision to do that, and contrast it with the people doing protein models or affinity models, many of whom have a lot more lab in the loop.

Speaker 1

Talk us through the differences between some of those models too.

Speaker 0

Yeah. So we started with DNA because we think it's the fundamental information layer of life. Right? Second, it's also just pragmatically where we have the most data.

Speaker 1

All the data.

Speaker 2

Yeah. Right? Where does the data come from?

Speaker 0

It comes from the entire scientific community. Right? There are these open, government-funded and government-maintained databases, such as the Sequence Read Archive, where basically when you publish a paper, you have to submit your data. All of the sequencing data that's been created over the last twenty-five-plus years is in these databases.

Speaker 0

And that has all of the genomes the community has ever sequenced: bacteria, bacteriophages, viruses, humans, monkeys, fish, flies, the entire Noah's Ark or menagerie, whatever. We've got all those genomes. The experiment that's already been done, if you will, is the experiment of evolution, right? There are different mutations that make us different from each other. That's human genomic variation, but there are also the mutations that make us different from chimpanzees or from worms or from bacteria.

Speaker 0

The model can just look across this large dataset of trillions of tokens and learn those patterns. And that was the insight behind the Evo series of models that we've been training at Arc: to ask whether you could predict that next base, right? That might be the difference between being healthy and having a sickle cell anemia mutation. Or it could predict the next amino acid residue, and that could be the difference between having a catalytically active binding pocket in a key enzyme in your body's physiology, or having a null mutation where that thing doesn't work anymore.
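To make the sickle cell example concrete: a single base change in the HBB (beta-globin) gene, GAG to GTG, swaps glutamate for valine. The sketch below translates a short fragment modeled on the opening codons of HBB using the standard genetic code; treat the fragment as illustrative, and note that `translate` and `point_mutation` are hypothetical helpers. The change sits at coding position 20 (c.20A>T), giving p.Glu7Val when the initiator methionine is counted, traditionally called Glu6Val.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order,
# first base varying slowest. '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding DNA sequence codon by codon (hypothetical helper)."""
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def point_mutation(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based index pos replaced by alt (hypothetical helper)."""
    return seq[:pos] + alt + seq[pos + 1:]


# Illustrative fragment modeled on the first nine codons of HBB, starting at ATG.
hbb_start = "ATGGTGCATCTGACTCCTGAGGAGAAG"
sickle = point_mutation(hbb_start, 19, "T")  # index 19 is c.20: GAG -> GTG in codon 7

print(translate(hbb_start))  # MVHLTPEEK
print(translate(sickle))     # MVHLTPVEK
```

One base out of the whole sequence changes one residue, which is exactly the kind of difference a next-base or next-residue predictor has to be sensitive to.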

Speaker 0

Or it could also be the next gene. Right? These are different levels of abstraction: a gene that completes some biosynthetic pathway, or a different gene that's been removed by some transposon, a jumping gene, a mobile genetic element that's excised, or some sort of viral interference, if you will. Across large databases, you find new patterns, and it turns out those seem to be biologically meaningful.

Speaker 1

Yeah. So if you're going from DNA and you know the function (in many ways, you mentioned, even the DNA model can actually predict binding affinities), do you even need the protein structure models as an intermediate step? Or can you go straight from sequence to function?

Speaker 0

Structure is another way to have an abstraction of function. You have concepts like convergent evolution, for example, right, where things have similar function but different sequences and slightly different structures that carry out the activity or function of that protein. Right? And so this sequence-to-structure-to-function mapping in protein language modeling is, I think, very beautiful. It takes advantage of the central dogma of molecular biology.

Speaker 1

Off of old series.

Speaker 0

Right. Well, it takes advantage of our supervised, textbook understanding of how biology works. Right? And now the interesting oxymoron with these models, for me, is the way that we use them: just as ChatGPT is text in, text out, Evo is DNA in, DNA out.
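"DNA in, DNA out" means prompting the model with a sequence and sampling a continuation one base at a time. A minimal sketch, assuming a toy first-order Markov model rather than Evo itself; `fit_transitions` and `generate` (with its `temperature` and `seed` parameters) are hypothetical names for illustration.

```python
import math
import random
from collections import Counter

BASES = "ACGT"


def fit_transitions(seqs, pseudocount=1.0):
    """Estimate P(next base | current base) from training sequences (hypothetical toy model)."""
    counts = {a: Counter({b: pseudocount for b in BASES}) for a in BASES}
    for s in seqs:
        for a, b in zip(s, s[1:]):
            counts[a][b] += 1
    return {a: {b: counts[a][b] / sum(counts[a].values()) for b in BASES} for a in BASES}


def generate(transitions, prompt, n_new, temperature=1.0, seed=0):
    """Autoregressively extend a non-empty ACGT prompt by n_new sampled bases."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(n_new):
        probs = transitions[out[-1]]
        # Temperature rescales log-probabilities: below 1 sharpens, above 1 flattens.
        weights = [math.exp(math.log(probs[b]) / temperature) for b in BASES]
        out.append(rng.choices(BASES, weights=weights, k=1)[0])
    return "".join(out)


# Usage: the prompt plays the role of the "DNA in"; the sampled continuation is the "DNA out".
transitions = fit_transitions(["ACGTACGTACGT"])
print(generate(transitions, "ACG", 10, temperature=0.7, seed=42))
```

Lower temperatures keep the output close to the most likely continuation; higher temperatures make it more diverse.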

Speaker 0

Yeah. And it turns out we don't speak DNA very well. Just imagine if you were using a model like ChatGPT in Russian, right, but 1% of the words were in English, right?

Speaker 0

That's kind of what it feels like. That's the vibe of using Evo: you don't really know what's going on. And so you have to build lots of annotators and

Speaker 1

Interpretability. Yeah,

Speaker 0

like techniques to try to interpret and read what's happening. And so the way that we even use and prompt the models is really primitive, right? The fancy prompt engineering workflows we use to get more utility out of language models are something we're only in the very early innings of exploring with these biological language models, because we speak DNA with an extremely heavy accent.

Speaker 1

Who do you think will do some of that work? Because the model is currently open source. Do you envision a company formed around this? Will it be individuals who are at these pharma companies who learn how to prompt this? Like, how does that ecosystem evolve?

Speaker 0

Yeah. I hope everybody, right? First of all, I think tools become useful when they meaningfully lower the energy barrier of adoption, right? It's a catalysis type of activity: everyone uses BLAST for sequence alignment.

Speaker 0

Everyone uses AlphaFold to look at protein structure. Everyone uses CRISPR to do gene editing. Everyone uses NGS to read DNA or RNA. So I think there will be a zoo of different models, not just for modeling molecules or mapping sequence to function, but for every step of the scientific method. Right?

Speaker 0

And so, if 2025 is the year of AI agents, right, I think there's lots of interest in agents for science: not just agents for interpreting molecules, but also agents for the meta aspect of how scientists work and operate. And we're also very excited about that at Arc and recently released some of our first AI agent work.

Speaker 2

Yeah. What is the most important role for Arc to play in all of this?

Speaker 0

We started the institute to have a mothership that can attack long-term research capability breakthroughs, with the long-term thinking and the multidisciplinary expertise to actually execute on those goals. If you look at biology, one of the interesting things is that a lot of the biggest mechanistic or basic science breakthroughs do happen in a university context. That's interestingly in contrast to what happens in AI. Yeah.

Speaker 1

Or CS

Speaker 0

in general.

Speaker 1

Yeah. Yeah. Why do you think that's the case?

Speaker 0

That that might be a ten hour podcast.

Speaker 1

Right? It's a one-minute explanation.

Speaker 0

I have a lot to say about this, but in short, it does happen in universities. Thirty years ago, the questions that basic science was interested in and the questions industry was interested in were actually quite different. Today, I would say, they heavily overlap. Right? There are some things that folks don't typically do in a university lab: people don't tend to study PK or tox or CMC, or hardcore drug manufacturing type things.

Speaker 0

But folks are interested in molecular glues, induced proximity and degraders, and new drug concepts. Folks are also interested in new machine learning models, new delivery mechanisms, inflammation and stress, and all kinds of things like this. So there's much more overlap, but I think the type of output that selects what you do upstream is structurally very different. At the end of the day, in the academic setting you have to optimize for grant funding and first or co-first author papers and that type of stuff, whereas in industry you have to make a drug and have dozens to hundreds of people lining up behind a molecule or a program to reach the holy land.

Speaker 0

Right? And I think that does lead to differences in strategy. Personally, by the way, I think people really overemphasize what's academia and what's industry. The reality is that today these are heavily overlapping distributions in how people operate. So I think the differences are a little overblown, but they do have fundamentally different incentives that drive different behaviors.

Speaker 0

A lot of what we try to do in blending the academic side of our house with our technical staff side, which is built much more like what you'd see in industry, is part of what we hope will make Arc a model that others want to copy, replicate, or propagate.

Speaker 1

Yeah. Makes sense. And you've had some fun people come through on the technical side of the house, including people like Greg Brockman.

Speaker 0

Yes. It was a joy. Greg joined us during his sabbatical from OpenAI.

Speaker 0

It was really, you know, the first vacation that he had ever taken since starting OpenAI.

Speaker 1

Of course, it's to work more. Yeah. So so

Speaker 0

I actually have a funny story about this. Hopefully Greg will allow me to tell it. When he first came by Arc, we were talking to him about biological language modeling and how the machine learning breakthroughs of all these other domains might port over directly to understanding molecules. He was really jazzed by this, and also by the idea that his very specific capability and expertise would be really meaningful in actually making this happen. But initially he was saying, you know, this is really the first vacation I've ever taken.

Speaker 0

I can't promise too much. I might not be able to spend so much time on this. I need to take Anna on vacation. And then Anna gets up and goes to the bathroom in the middle of this very long meeting, and he's like, all right, here's my email.

Speaker 0

Get me on the repo.

Speaker 1

Of course. That's funny. Of course. Yeah.

Speaker 2

That's funny.

Speaker 0

So, yeah, just built different. It was such a joy to learn from him.

Speaker 2

What have you found works well in getting the different disciplines to work together as one unified team?

Speaker 0

Yeah. It's an interesting question, and we've thought about this deeply at Arc as a convening center: not just between the three flagship research universities here in the Bay Area, Stanford, Berkeley, and UCSF, but also between basic science and the biotech industry, and between biology and the technology sector. For example, our CTO, Dave Burke, started just a few months ago and has been leading the computational modeling of virtual cells, where we're trying to simulate human biology with these AI foundation models. Dave actually has a PhD in biomedical engineering from many moons ago, and most recently ran engineering at Android and Pixel.

Speaker 0

And so we have also built an entire operational side of the house, from finance to legal to lab ops to facilities to university relations and academic affairs. We run our own space, our own administration, our own ops, and we try to do much of that like a tech company. Right? On the ops side, we've also recruited many people who ordinarily might not work in a basic science or discovery setting, but who are motivated by the mission of taking fundamental breakthroughs and applying a product sensibility so we can get them out into the real world.

Speaker 0

We're not optimizing at Arc for Nature and Science papers. Right? If you give Berkeley or Stanford professors millions of dollars to do more science, that's almost the default expectation and output. Right? Yeah. What we really care about are things that could be tangible. Right?

Speaker 1

And what does success look like? Just more people using the products you create? What is success for Arc?

Speaker 0

I think there are lots of ways that you can parse scientific productivity, right? Journal publications are, of course, an important and fundamental part of sharing work with the community, getting it peer reviewed, and all of that good stuff. But technical blogs, code repos, and protocols are also platforms that people can use. Right? We want to make technologies and platform capabilities that are broadly useful for creating new mechanistic insights, but also to actually try to cure some diseases.

Speaker 0

Right? And I think over time, if we can be an Edison shop that's inventing or finding lots of cool things, there will hopefully be real-world value in those things, and there are lots of partners who specialize in this.

Speaker 2

Speaking of curing diseases, earlier you were talking about how, even if you had perfect drug design, or infinitely accessible, infinitely intelligent drug design, there's still a long process that has to happen after that before you can make an impact on humans. Can you talk a bit about where you see opportunities in that value chain? If you had a magic wand that could accelerate progress by ten years, which of those bottlenecks might be alleviated by things you see coming down the pipeline?

Speaker 0

没错。很高兴能在制药行业背景下讨论这个问题——这些大型、分散、臃肿的官僚机构，就像大学或国会、旧金山市政府那样。当然，其中某些环节的运行效率其实非常高。

Yeah. So happy to talk about this in the pharma context, which are, you know, large, decentralized, sprawling bureaucracies Yeah. Much like universities or, you know, the, you know, Congress or the SF City governments. Right. And, you know, I think, you know, some parts are, you know, you know, incredibly, you know, functional.

Speaker 0

我认为在这些不同的组织中，大家都会认同某些环节效率较低，对吧？因此我们首先希望能看到的是，模型可以提升各个独立步骤的效率。比如能否建立一个更高效的目标识别流程？

And then I think everyone would agree in these different organizations, some parts are, you know, less efficient. Right? And so I think the first thing that hopefully we can see is that we can have models that can improve efficiency in discrete individual steps. Right? So can we have a more efficient process for target ID, right, in particular?

Speaker 0

好的。我们能否建立更高效的数据分析流程？能否在信息文献综述与总结方面更高效？当然还有分子设计环节，比如优化结合剂，以及分析这些不同分子的药物特性。

All right. Can we have a more efficient process for data analysis? Right? Can we have a more efficient process for, you know, like information and literature review and summarization, right, for molecule design, absolutely making a better binder, right? And then figuring out the drug properties of those different molecules.

Speaker 0

可能涉及选择性、药代动力学、半衰期、表达水平、可生产性等方方面面。每个步骤都会有模型或模型指导的方法。实际上，现在制药公司使用AI的方式，很多只是处理海量监管文件，进行摘要总结，然后借助AI撰写更多类似文件。

That could be selectivity. That could be pharmacokinetics, half life, expression, manufacturability, what have you. There'll be models or model guided approaches for each of those steps. I mean, I think if you look at how AI is actually kind of being used today in pharma companies, a lot of them are actually just taking massive regulatory documents, summarizing them, and then using AI to help them write more of these. More regulatory.

Speaker 0

是的。这种结构化信息的压缩与解压方式，确实推动了企业在这个领域的应用。不过...我觉得...这个现象...

Yes. So there's this compression and decompression of information in a structured fashion that has actually been leading to enterprise adoption in this setting. Right? And I don't know. I think that I don't know.

Speaker 0

这反映了人们认为哪些环节真正有用。比如与制药高管（不是药物研发负责人，而是掌握预算的财务人员）交谈时...

That says something about the process of where people find things useful. So, you know, one example is if you talk to some pharma execs, right, not the drug discovery organizational leaders, but the, you know, the the the budget people. Right. You know, who kind of hold the enterprise purse strings. Right.

Speaker 0

他们会说：'其实我在药物发现上投入不多，大部分资金都用在药物开发上。既然我的钱主要花在这里，AI能在这方面帮我什么？'

They'll say, well, you know, I don't spend actually that much money on drug discovery. I actually spend most of my money on drug development. So tell me something about drug development, which is really where most of my dollars go. How can AI help me with that? Right?

Speaker 0

我认为这个观点非常深刻。因为它揭示了资金流向和价值所在。首先要明白，我们行业的成功率只有10%左右。因此人们在药物研发中讨论、抱怨或评论的很多问题，都源于这个基本统计事实。

And I think that is actually like a very deep comment. Right, because it says something about where money is spent and where value can be found. The first thing to realize is our industry probability of success is like 10%. Right? So, you know, I think a lot of the things that people talk about or complain about or comment on in drug discovery and development falls out of that fundamental statistic.

Speaker 0

是啊，对吧？比如FDA是不是严格监管？为什么如此注重安全性？因为如果90%的情况下都不奏效，他们当然会非常关注安全问题。

Yeah. Right? Like, does the FDA heavily regulate? Why is it so focused on safety? Well, if 90% of the time it doesn't work, they're gonna care a lot about safety.

Speaker 0

对吧？我认为AI的潜力在于，如果我们能把成功率（POS）从10%提升到20%、30%甚至50%，随着这些进步，我们能做到吗？

Right? And I think the promise of AI is if we can go from 10% POS to 20 or 30 or 50, right, as you move to those steps, can we?
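To make the POS arithmetic concrete: if every program were an independent trial with the same probability of success, the number of programs run per approved drug would follow a geometric distribution with mean 1/POS. The function below is a hypothetical illustration under that simplifying assumption, not an industry model; real pipelines have stage-dependent and correlated risks.

```python
def programs_per_approval(pos: float) -> float:
    """Expected number of programs entered per approved drug.

    Simplifying assumption: each program succeeds independently with
    probability `pos`, so attempts until the first success are geometric
    with mean 1 / pos.
    """
    if not 0.0 < pos <= 1.0:
        raise ValueError("pos must be in the interval (0, 1]")
    return 1.0 / pos


# POS values from the conversation: roughly 10% today, and the hoped-for 20/30/50%.
for pos in (0.10, 0.20, 0.30, 0.50):
    print(f"POS {pos:.0%}: about {programs_per_approval(pos):.1f} programs per approval")
```

Under this assumption, moving from 10% to 20% halves the expected number of programs per approval, and reaching 50% brings it down to two.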

Speaker 2

你觉得我们能行吗？

Do you think we can?

Speaker 0

需要时间。我觉得有意思的是，我们过去做的很多生物学研究，其实和试错法差不了多少。对吧？你看看湿实验室里的实际操作，那些正在进行的实验。

Over time. I think here's the thing that I find kind of interesting is we have done a lot of biology with what is not that far from guess and check. Right? If you look at what happens in the wet lab, like the actual experiments that are happening.

Speaker 2

是啊。

Yeah.

Speaker 0

对吧？你就像是在竞技场里不断尝试

Right? You're just kind of in the arena trying

Speaker 1

对吧？所以就随机验证各种假设，看看会发生什么。

to Right? And and so trying out random hypotheses and seeing what happens.

Speaker 0

是的。而且，这就像是科学文献中缺失的推理痕迹——你不知道哪些方法行不通。所有成果都被叙事化，写成一部关于不可阻挡的逻辑与远见引领科学突破的故事，对吧？但真正参与其中的人都知道事实并非如此。

Yeah. And and and this is, like, the missing reasoning trace in the scientific literature is you don't know what didn't work. Everything is narrativized and written in a story of, you know, inexorable logic and vision leading to scientific breakthrough. Right? But everyone who makes the sausage knows that's not

Speaker 1

大多数时候实际情况并非如此。

what actually happens most of the time.

Speaker 0

没错。如果你真正与实验室的研究人员共事过，就会发现这其实是所有技术行业的普遍现象——真实过程的高保真推理痕迹几乎从未被记录下来。而这些痕迹对于推理模型、闭环验证以及多智能体框架构建等都将非常有用。这就是我们所需要的。

Right. And, you know, if you actually work with the, you know, research, you know, kind of folks at the bench, right? You know, and this is actually the case across all technical industries is that high fidelity reasoning trace of the true process is kind of not written down anywhere. And that would actually be very useful for these reasoning models and for closing the loop and doing, you know you know, multi agent frameworks, blah blah blah blah. But that's kind of what we'll need.

Speaker 0

但如果你观察实际过程，就会发现基本都是试错法。所以哪怕只有一点点预测价值的模型都将带来变革——虽然目前我们连这种中等预测能力的模型都没有。

But if you actually look at what happens, it's guess and check. Right? And so a model with even a modicum of predictive value would be transformative. One with even moderate predictive value, which by the way, we don't have. Right?

Speaker 0

我认为生物学是一门非常务实的实验学科。这在同行评审文化中就能体现：用数据说话。在讨论部分你不能空谈理论，因为你根本没有展示过相关证据。

Like, I think biology is a very pragmatic salt of the earth experimental discipline. Right? You see this in the culture of peer review. Show me the data. You can't, you know, pontificate in your discussion section because, you know, you haven't shown any of this stuff.

Speaker 0

对吧？如果你阅读早期论文，会发现它们既清晰又充满远见，甚至有些高谈阔论。而如今我们的论文文化非常务实。我认为具有预测能力的模型不仅能加速科研效率，还将改变整个学术文化——这种改变会非常有趣。

Right? And if you read old papers, right, they were so clear and visionary and, you know, highfalutin in a way that I think, you know, papers today are, you know, we we have this culture that is very pragmatic. I think having models that have predictive power will, you know, I think, a, obviously be useful for accelerating the efficiency of science. It'll also change the culture, which I think hopefully well, hopefully, will change the culture, which I think will be really interesting.

Speaker 1

为什么你认为这会改变模型的文化？

Why do you think it'll change the culture of the models?

Speaker 0

因为你会相信人们的预测或断言，这取决于

Because you'll believe people's predictions or pontifications depending on

Speaker 1

这本质上只是另一种证据点。就像

how you It's just another kind of evidence point, basically. Just

Speaker 0

模型幻觉可能是预测，也可能是无意义的垃圾，对吧？这取决于你对模型的信任程度。明白了。

like how model hallucinations could be, you know, predictions or they could be nonsense and garbage. Right? And that depends on how much you trust the model. Got it.

Speaker 1

为什么你认为情况如此？按照你的观点，我们也收集了大量数据。显然，我们基本上看到什么有效，而往往看不到什么无效，尽管希望实验室笔记本以某种方式记录了这些。

Why do you think it's the case? We've collected a lot of data too, to your point. There's obviously, we basically see what works and we oftentimes don't see what doesn't, although hopefully lab notebooks are recording that in some way, shape or form.

Speaker 2

也许吧。

Maybe.

Speaker 1

也许吧，希望如此。但即便如此，即使某些方法在细胞和小鼠中有效，它们在人体中经常失败。为什么还是这样？是我们对生物学的理解还不够深入吗？为什么这种落差依然存在，而且多年来几乎没有改变？

Maybe, hopefully. But somehow still very, very regularly, even when things work in cells, things work in mice, they fail in humans oftentimes. Like why is that still the case? Is it we just don't understand biology deeply enough? Why is there still that drop off and that drop off hasn't really changed over time?

Speaker 1

嗯，模型并不完美。

Well, they are imperfect models.

Speaker 0

我们在药物发现过程中设立了这套筛选流程，首先验证它在细胞系中有效，然后在原代细胞或类器官中有效，接着在小鼠、猴子身上有效，最后才进行人体试验。对吧？要知道，走到这一步时，五年时间和一亿美元已经投入其中。我认为这极具挑战性。这正是预测模型能真正发挥作用的地方——我们之所以要线性推进所有这些步骤，正是因为缺乏预测能力。

We set up this set of filters in the drug discovery process where, you know, you first show it works in cell lines, then show it works in primary cells or in an organoid, then show it works in a mouse, then show it works in a monkey, then test it in people. Right? And, you know, by the time you've gotten there, five years and $100,000,000 have gone by. I think that's very challenging. That's where I think predictive models will really help, because the reason why we do all of these steps in linear series is because we don't have predictive power.

Speaker 0

所以我们不得不在实际场景中逐步验证，这需要真实操作。对吧？培养细胞和动物进行实验往往耗时数月甚至数年。因此预测模型的潜力不仅在于预测能力，更在于它能实现多线程并行模拟。

So we have to do things in the arena and it just, you know, you have to do that in real life. Right? And growing cells and growing animals takes months to years to actually do those experiments. And so the promise of having predictive models isn't just predictive power. It's that you could actually simulate things in a multi parallelized fashion.
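The cost of running these filters in linear series can be sketched as a toy model. This is an illustration under stated assumptions, not anything described in the conversation: each stage is assumed to take a fixed duration and to pass a candidate with an independent probability, so elapsed time adds across stages while the chance of clearing every gate multiplies. The stage names come from the conversation; the `Stage` class, the function names, and all numeric values are hypothetical placeholders.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Stage:
    name: str
    duration: float   # hypothetical duration of this stage, in any consistent unit
    pass_prob: float  # hypothetical chance a candidate clears this gate


def run_in_series(stages: list[Stage]) -> tuple[float, float]:
    """Gates run one after another: elapsed time adds up and pass rates multiply."""
    total_time = sum(s.duration for s in stages)
    overall_pass = prod(s.pass_prob for s in stages)
    return total_time, overall_pass


def run_in_parallel(stages: list[Stage]) -> float:
    """Idealized case in which no stage has to wait for another (for example, if a
    trusted predictive model could stand in for the earlier gates): elapsed time is
    set by the slowest single stage."""
    return max(s.duration for s in stages)


# Abstract placeholder values (one time unit, 50% pass rate per stage), not real data.
stage_names = ["cell lines", "primary cells / organoid", "mouse", "monkey", "human"]
stages = [Stage(name, duration=1.0, pass_prob=0.5) for name in stage_names]
print(run_in_series(stages))
print(run_in_parallel(stages))
```

With these placeholders, the serial path takes five time units and clears all gates with a 1-in-32 chance, while the idealized parallel path takes one unit; the point is the shape of the comparison, not the numbers.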

Speaker 0

这正是《慈爱机器》部分章节的核心思想，我认为达里奥确实把握住了精髓。关键在于如果能拥有一个可信的预言系统，就能同时运行上万智能体进行模拟。对吧？

That's the whole idea behind parts of Machines of Loving Grace that I thought Dario really did get right. And it's the idea that if you had something that could be a trusted Oracle, right Yeah. That you could just run, you know, 10,000 agents, right, at the same time. Right?

Speaker 1

我们是否有足够数据来构建这种全闭环的可信预言系统？我认为假以时日...

Do we have enough data for that full closed loop to create a trusted Oracle? I think we will see

Speaker 0

我们会看到更多此类案例涌现。当前人们开发AI智能体时，本质上是在试图缩小某个微小环节的差距——比如步骤X与步骤X+1之间。

more examples of this coming out over time. Right? Today, folks building AI agents for things are doing you're basically trying to close the gap between Just small portion. Yes. Step x and step x plus one.

Speaker 0

对吧？或者是X+4之类的步骤。企业正在寻找商业价值最高、步骤组合最精简的解决方案来创业。我认为未来科学方法的每个环节——从假设生成、实验设计到数据分析——都将配备智能体或协作者系统。

Right? Or x plus four or whatever. Right? And the businesses are trying to find the most commercially valuable set of, you know, steps that is a set size that's as small as possible and step number in order to, you know, make a company. And I think we will have agents or copilots at each step in the scientific method from hypothesis generation to experimentation to data analysis.

Speaker 0

而要实现全闭环——包括论文撰写、成果发现及后续决策——我认为还为时尚早。但能高效串联各步骤的系统必将兴起。举个具体例子：我们ARC最近发布的虚拟细胞图谱，作为全球最大单细胞数据集，正用于训练细胞基础模型。其诞生过程就是通过一个类似网络爬虫的智能体，系统性地抓取序列读取档案库，处理那些高度非结构化的混乱元数据，并重新分析所有单细胞数据。

And the ability to close the loop and write the paper or make the discovery and then decide what to do next, I think is quite far away. But I think something that's very efficient at traversing the steps, I think will really take off. So as a concrete example, right, one of the things that we recently released at ARC is our virtual cell Atlas, right, which is the world's largest dataset of single cells, right, that we're using for training these cellular foundation models. Right? And the way that it happened was we created an agent that was essentially it's like a crawler, kind of like, you know, kind of a search crawler, but it's able to crawl the kind of, you know, sequence read archive and then process all of the highly unstructured and messy metadata and, you know, kind of reanalyze and systematically reprocess all single cell data.

Speaker 0

而这正是运行在云端存储实例上的程序，不知疲倦地持续运转着。对吧？这种工作即便是才华横溢的计算生物学家也不愿接手，因为它太枯燥了。但事实上，我们能够实现的规模覆盖了整个研究社区。关键在于，我们团队——其实主要就两位首席研究员——借助一个智能体就能达成这样的杠杆效应和效率，这对我而言是巨大的思维突破。

And this is something that is just running on a cloud bucket instance just cranking away, right, you know, in a tireless fashion. Right? And it's the kind of stuff that a talented computational biologist wouldn't wanna do because it's so grind set. But actually the scale at which we're able to reach is community wide. And that was that the leverage and efficiency that our team of, you know, just, you know, you know, really two lead researchers could achieve with one agent, I think was it was a huge mental unlock for me.
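The metadata clean-up the crawler performs is the easiest piece of that workflow to illustrate. The sketch below is hypothetical and is not Arc's actual agent: it assumes each sample's metadata arrives as a flat dictionary of free-text key/value pairs and maps inconsistently named keys onto a small canonical schema. The synonym table, field names, and function names are illustrative assumptions; fetching records from the Sequence Read Archive and reprocessing the raw single-cell data are not shown.

```python
import re

# Hypothetical synonym table: normalized raw keys -> canonical schema fields.
KEY_SYNONYMS = {
    "organism": "organism",
    "species": "organism",
    "scientific_name": "organism",
    "tissue": "tissue",
    "tissue_type": "tissue",
    "cell_type": "cell_type",
    "celltype": "cell_type",
}


def normalize_key(raw: str) -> str:
    """Lowercase a raw key and collapse runs of spaces/hyphens into underscores."""
    return re.sub(r"[\s\-]+", "_", raw.strip().lower())


def normalize_record(record: dict[str, str]) -> dict[str, str]:
    """Map one sample's free-text metadata onto the canonical fields.

    Unknown keys are dropped, whitespace is trimmed, and the first
    non-empty value seen for each canonical field wins.
    """
    out: dict[str, str] = {}
    for raw_key, raw_value in record.items():
        field = KEY_SYNONYMS.get(normalize_key(raw_key))
        if field is None:
            continue  # key not in the schema
        value = raw_value.strip()
        if value and field not in out:
            out[field] = value
    return out


raw = {"Organism": "Homo sapiens", "Tissue Type": " lung ", "Cell-Type": "", "celltype": "AT2"}
print(normalize_record(raw))
# {'organism': 'Homo sapiens', 'tissue': 'lung', 'cell_type': 'AT2'}
```

Keeping the first non-empty value is only one simple conflict rule; a production pipeline would likely need richer handling, such as ontology lookups, which this sketch does not attempt.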

Speaker 0

因此我们希望站在实际部署这些技术并取得突破的最前沿。我认为当前人们追逐的那些元层面问题会随时间逐渐明朗。但我更关注如何运用这些技术真正实现突破，而非绘制端到端的闭环路径。

And so we want to be at the frontier of actually deploying these and making breakthroughs. And I think the meta aspect that folks are going after right now will shake out over time. But I care about using these to actually make breakthroughs as opposed to chart the end to end closed loop path.

Speaker 2

你之前提到现在很多研究论文体现出的实用主义倾向，与过去更具远见或宏大的风格形成对比。我在想这是否与当下研究工作的高度专业化有关。感觉我们正变得越来越专精于细分领域。

You mentioned earlier the sort of the pragmatism that a lot of research papers have now versus grandiosity or something a bit more visionary back in the day. I wonder if some of that is related to the specificity of the work that people are doing now. Meaning, it feels like we've gotten more and more specialized over time.

Speaker 0

是啊。

Yeah.

Speaker 2

我在思考这种趋势是否正在让我们与重大突破渐行渐远。

And I wonder if some of that is taking us away from breakthroughs.

Speaker 0

嗯。

Mhmm.

Speaker 2

因为在许多情况下，你需要融合不同领域或学科的知识才能实现突破。嗯。我想问题在于：大型语言模型的优势之一就是能整合海量信息，即使应用于特定领域时也保持着天然的通用性。我们从中获得的效能有多少是源于它们跨越不同专业领域的能力呢？

Because in a lot of cases, you need the knowledge from different domains or different disciplines to achieve those breakthroughs. Mhmm. And I guess maybe the question is, one of the nice things about LLMs is that they can incorporate an enormous amount of information, and they're sort of inherently generalized even when you apply them to a specific domain. How much of the efficacy that we might get out of some of these models is simply related to their ability to go across all these different specialties?

Speaker 0

是的。在我看来，我有幸合作或学习过的最优秀的科学家们，他们确实做到了两件事：既能提出极具创意的想法，又能将这些想法付诸实践。

Yeah. If you look at, at least to me, the the best scientists that I've had the pleasure to collaborate with or learn from. Yeah. They really do two things. They they they they're able to come up with really creative ideas, and they're able to execute on them.

Speaker 0

没错。他们之所以能提出极具创意的想法，是因为他们能在他人看不到的事物间建立联系。实际上，如果把十个聪明人聚在一起讨论科学——比如任何团队的每周组会——你会发现，通常只有一小部分人能在倾听所有讨论内容后指出这些事物间的关联或概念桥梁。

Yeah. The reason why they're able to come up with really creative ideas is because they're able to make connections between things that other people wouldn't make. And in fact, if you got a room of 10 really smart people together to chat science, like that's, you know, a weekly lab meeting in any group. Right? If you actually analyze the anthropology of what happens, there's usually a small subset of people who are hearing all the things that are being discussed and then actually saying, this is the connection or the conceptual bridge between these things.

Speaker 0

对吧？

Right?

Speaker 2

是的。

Yes.

Speaker 0

这就像是一种分布外泛化能力。听到新颖观点后，我们要识别它并尝试从中归纳出普遍规律。这种能力往往来自那些勤于阅读或深思的人。

And so there's you know, that's sort of like an out of distribution generalization. Right? It's like this thing that I heard was really novel and let's recognize that and then try to see what can generalize out of that observation. Right. And that comes from people who tend to either read a lot or reason a lot.

Speaker 0

明白吗？所以这涉及某种预训练——你需要大量阅读论文，比如化学生物学、分子生物学领域的文献，

Right? And so there's some aspect of pre training, right? You need to just read a lot of papers, right? Read a lot of chemical biology papers. Read a lot of molecular biology papers.

Speaker 0

还要读大量人工智能论文、物理学论文，并且跨领域阅读，这样才能穿梭于不同学科的边界。对吧？

Read a lot of AI papers. Read physics papers, right? Yeah. And do so across domains so that you can traverse those boundaries. Right?

Speaker 0

我认为，你知道，如何组建一个多学科团队是个设计难题。对吧？现实情况是，比如你想在生物和机器学习的交叉领域工作。纯粹的机器学习专家和纯粹的生物学专家远多于真正精通这两者的双语人才。对吧？

I think, you know, there's this design problem of how do you build a multidisciplinary team. Right? And the reality is there well, for example, you wanna work at the interface of bio and ML. There are way more ML people and way more bio people than truly bilingual ML and bio people. Right?

Speaker 0

我非常幸运能在ARC与一些这样的人才共事。对吧？他们确实很罕见。对吧？但翻译者实际上可以帮助你赋能其他人群。

I have the extreme fortune of working with some of those at ARC. Right? And they're just rare. Right? But translators can actually help you power the rest of the population.

Speaker 0

是的。

Yeah.

Speaker 2

你在ARC寻找什么样的人才？

What do you look for in people at ARC?

Speaker 0

这取决于具体职位，对吧？我认为要根据具体项目需求来匹配。但某种程度上，我可以告诉你我们如何试图将招聘流程理性化，但实际上归结起来非常简单。这和我寻找研究技术员或科学方向高管时的标准是一样的。对吧？

It depends on the role, right? I think depending on how you're trying to match specific project needs. But in a way, I could tell you all the ways that we try to intellectualize our recruiting process, but it actually comes down to very simple things. And it's the same thing that I look for in a research technician or an executive on at least on the science side. Right?

Speaker 0

其实就是：你是否会在实验室外思考科学问题？你是否有过从始至终完成项目的经验？第三点是，你是否有毅力和决心真正走完这条路并完成任务？

It's really like, are you thinking about science outside of the lab? And have you done something end to end before? And then the third is, do you have the grit to actually kind of walk the path and get it done?

Speaker 2

是的。

Yeah.

Speaker 1

对吧？你说的端到端部分具体指什么？

Right? What do you mean by the end to end part?

Speaker 0

我觉得从第一步到第二步，或者第三步到第五步，甚至第十二步到第十五步都很容易。但是，从第一步一直做到第十五步会大幅减少参与人数。所以我常说，一个项目最后20%的工作实际上占了80%的工作量。

I think it's very easy to go from step one to two or three to five or, you know, 12 to 15. But, you know, going from one through 15 winnows down the population significantly. And so, you know you know, I often say, you know, the the last 20% of a project is actually 80% of the work.

Speaker 1

是啊。

Yeah.

Speaker 0

对吧？因为完成一件事，然后通过多次完成来磨练你的杀手本能，这真的很重要。

Right? And it's because finishing something and then honing your killer instinct from finishing things multiple times really matters.

Speaker 1

在未来六个月以及接下来几年里，我们预计ARC研究所会有什么成果？

What should we expect to be coming out of ARC Institute in the next six months and then over the next few years?

Speaker 0

嗯，我对很多事情都感到非常兴奋。我想很多人可能不知道的是，我们在ARC真正致力于构建生物学体系的深度。也许人们听说过我们的基因编辑工作或机器学习研究，对吧？但我们真正想要构建的是在多系统交互背景下应用高通量可扩展技术的整体概念，特别是在神经与免疫的交叉领域。去年我们从宾夕法尼亚大学聘请了两位研究内感受过程的杰出科学家。

Well, I'm tremendously excited about lots of things. I think the thing that maybe many people don't know is the degree to which we have really been trying to build biology out at ARC. I think people have maybe heard about our gene editing work or our machine learning work, right? But a lot of what we're actually trying to build is this general concept of applying high throughput scalable technologies in the context of multi systems interactions, really working at the neuro and immune interface. And so we hired two incredible scientists out of Penn last year who study the process of interoception.

Speaker 0

本体感受是当你闭上眼睛时，知道肢体在哪里的能力。而内感受则是像'我膝盖能感知天气'或'肚子感觉怪怪的'这类概念，对吧？这些听起来像民间老话的说法，其实背后有很深的科学依据。当然，目前这方面几乎完全未知。

Proprioception is when you kind of close your eyes, where are your limbs. Right? And interoception is the idea of, you know, I feel the weather in my knee or my tummy feels funny, right? You know, kind of old wives' tales type stuff that actually has really deep science. And of course, it's totally unknown.

Speaker 0

身体如何与大脑对话，反之亦然？对吧？事实证明，这背后有着深刻的机制基础。你知道，当人们思考如何编程生物学时，他们通常从药物范式出发——如何获得能与特定蛋白质结合的配体？或者如何用CRISPR编辑某个基因？

It's how does your body talk to your brain and vice versa? Right? And it turns out there's a deep mechanistic basis for this. And as you know, I think when people think about programming biology, they think about it in the drug paradigm of how do I get a binder that binds to this protein? Or how do I get a CRISPR to edit this gene?

Speaker 0

但如果你想想激素或Ozempic的作用机制，实际上你是在以极其强大的方式编程你的思维、感受和行为模式——这不仅控制饱腹感，还包括精力、情绪、肌肉合成、专注力等方方面面。如何真正编程生理机能，这是我在实验室里长期思考的问题。

But if you think about what happens with hormones or with Ozempic, right, you're able to program the way that you think and feel and behave in really powerful ways that controls not just satiety, but energy, mood, you know, muscle synthesis, you know, focus, all kinds of things. And I think how do we actually program physiology is something that I've been spending a lot of time thinking about in our lab.

Speaker 1

你认为有哪些出人意料的关联是人们尚未意识到的？

What's one unexpected connection that you think people don't think about?

Speaker 0

举个例子——运动。我们的首席研究员Christophe发表过一篇精彩论文，他发现某种肠道菌群能产生特定分子，通过肠神经系统（即肠道内连接大脑的神经网络）刺激多巴胺释放。正是这个功能回路创造了跑步者的愉悦感或运动奖赏机制。当清除这种细菌时，你就切断了肠脑回路，大脑就无法释放多巴胺。

One example is the you know, so so exercise. Right? And and so Christophe, one of our PIs, you know, had a beautiful paper where he showed that there's a specific species of gut bacteria that produce a certain type of molecule that connects via your enteric nervous system, which is the nervous system that lines your gut that goes to your brain in order to release dopamine. And it is this functional circuit that creates runner's high or exercise reward. And when you delete this bacteria, you cut off this ENS to brain circuit or you you cut off the ability of the brain to release the dopamine.

Speaker 0

在每个独立环节进行阻断，都能消除跑步快感。这个完整机制在活体动物（本研究是小鼠）身上得到了验证。

At each of these steps individually, you can block the runner's high. Right? And so it really traces in an intact animal. Well, this is a mouse study. Right?

Speaker 0

这个全身循环系统其实是双向的。当承受深度心理压力时，大脑会向支配肠道的星形胶质细胞发送信号，释放促炎细胞因子导致肠道炎症，最终可能引发溃疡。没错，压力导致溃疡——这个结论我们其实早有认知。

This, you know, full body circuit, but that also goes in reverse. So when you have deep psychological stress, right, that can lead to signaling from the brain to astrocytes that innervate your gut that releases pro inflammatory cytokines that leads to gut inflammation and then can give you ulcers. Right? So stress causes ulcers. We've actually kind of known Yeah.

Speaker 1

是的。但这次揭示了具体作用机制

Yes. But this is the mechanism How for

Speaker 0

你能治疗它吗？因为有些人会反复出现溃疡。是的。对吧？而且大脑与身体之间存在这种信号传递的通路。

can you treat it? Because there are folks who get recurring ulcers. Yeah. Right? And there's a brain-to-body axis by which this signals.

Speaker 0

对吧？我认为这随时都在发生，你知道，其中一些是有意识的，大部分是无意识的。对吧？你实际上可以开始调整这些调节机制。这在某种程度上就像药物的一种新范式。

Right? And I think this happens all the time, you know, some of which is conscious, most of which is unconscious. Right? And you can actually start to figure out the dials and knobs. And that's actually, in a way, like a new paradigm for drugs

Speaker 1

是的。

Yeah.

Speaker 0

或者如何考虑使用药物。对吧？它不仅仅是这种高度简化的药物，你知道，那种与生物标志物绑定的东西。而是给你更多整体性的、几乎带有东方医学色彩的思考：我怎样才能感觉更健康？

Or how to think about using drugs. Right? It's not just this highly reductive pharmaceutical, you know, kind of binder to biomarker type of thing. But that's giving you more of the holistic kind of almost Eastern medicine flavor of just how do I feel healthier?

Speaker 1

是的。

Yeah.

Speaker 0

对吧？我认为这在当今的长寿社区很流行，对吧？是的。就是健康寿命？我如何延长它？

Right? That I think is in vogue in the longevity community today, right? Yeah. Is health span? How do I improve it?

Speaker 0

我如何改善饮食和营养才能感觉更好，对吧？这些事情同样需要深厚的科学基础，也必须具备这一点。

How do I improve my diet and my nutrition so that I just feel better, right? Those things can also have deep scientific grounding, and they need to have that.

Speaker 1

有意思。那你觉得未来人们会如何被治疗？你将拥有全套检测面板，能精确掌握身体状况，然后通过不同干预手段来调控整体健康。将功能医学与当今制药行业的长寿研究相结合？这种互动会如何体现？

Interesting. So how do you think people will be treated in the future? You will have a full panel, you'll know exactly what's going on in your body and then you'll decide different inputs to influence the whole thing. Tie in functional medicine, things that go on in longevity, with the drug industry as it is today? Like, where does that interplay

Speaker 0

是的。我认为我们会需要AI医生，能够多模态整合信息。就像用CGM持续监测血糖，用Oura Ring或WHOOP追踪各种生物标志物，或是去Quest或Function Health做肝功能、胆固醇、睾酮、雌激素等血液检测。但目前你只知道这些指标升降与否，以及是否在标准参考范围内。

Yeah. I mean, I think we'll want AI doctors, right, that are able to integrate information multimodally. Right? And so just like you have your CGM that monitors your glucose with high temporal resolution, you have your Oura Ring or your WHOOP that talks about your various biomarkers, or you can go to Quest or Function Health or whatever and get blood tests that measure what's going on with your liver function or your cholesterol or your testosterone or estrogen or other types of hormones. Now, all you really know is that these things are going up or down and whether or not they're in standard or reference range.

Speaker 0

对吧？这些数据并不能告诉你具体该怎么做。我认为个性化遗传学或消费级基因检测缺失的关键能力，是从基因组序列中提取有效信息。

Right? It doesn't tell you very much about what you're supposed to do. Right? I think one thing that has been really missing in personalized genetics or, you know, consumer genetics is the ability to take information content from your genome sequence. Yeah.

Speaker 0

并将其与健康生物标志物有机结合，从而获得能更好预测表型的基因型与环境因素。就是高中生物课学的GXE=P公式。但这些对普通人甚至我们都难以真正理解。

And meaningfully integrate it with your health biomarkers in a way that gives you the genotype and the environment that can be more predictive of phenotype. Right. That GXE equals P equation you learn in high school biology. Right. But none of that is actually kind of accessible to mom and dad or to even us.

Speaker 0

在具体生活场景中该如何应用？我认为需要从高信息量的检测手段，转向将其与基因特征关联以做出更精准预测。这算是我们今天对话的主线。

Right? In the in the in the setting of how do I actually live my life. And I think we need to go from measuring people with, you know, higher content approaches to connecting that to, you know, genetic signatures and make more accurate predictions. Right? So that's sort of, like, been the theme of our conversation today.
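For reference, the textbook version of the relationship mentioned here is usually written additively rather than as a product: phenotype P is modeled as a genetic contribution G plus an environmental contribution E plus a gene-by-environment interaction term, and the phenotypic variance decomposes the same way when G and E are uncorrelated. This is the standard quantitative-genetics notation, offered as background rather than as a model from the conversation.

```latex
\begin{aligned}
P   &= G + E + G \times E \\
V_P &= V_G + V_E + V_{G \times E}
\end{aligned}
```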

Speaker 2

你觉得这会是什么形态？会像现有的长寿研究那样，还是会出现全新的模式来满足这个需求？

And what do you think that'll look like? Do you think that'll look like any of the existing longevity efforts, or do you think there's some new beast entirely that's gonna be created to serve that purpose?

Speaker 0

确实。以23andMe为例——他们最近刚申请第11章破产保护。我认为这是将基因技术带给数百万人的开创性壮举。但我最希望看到的是能整合所有身体检测数据，将其与饮食睡眠关联，最终给出长期个性化健康建议的系统。

Yeah. I mean so I think if you look at you know, 23andMe, it, you know, recently, you know, filed for Chapter 11. Right? And it's I think one of, I think it's an amazing pioneer and visionary kind of pioneering effort in how to take genetics and put it in the hands of millions of people. I think the thing that I think I would really love to see in the world is something that can take all of that information with all of your different, you know, kind of body measurements and then actually, you know, connect that to diet and sleep and give you personalized recommendations about your health in a longitudinal way.

Speaker 0

目前我们拥有的数据集非常零散，难以实现这一目标。我认为能够大规模收集跨人群且具有时间分辨率的长期数据将会——我不想成为那些认为大数据能解决一切问题的狂热分子之一。是的。

We have very fragmented data sets for being able to do this today. I think being able to collect this data at scale across populations and over time with temporal resolution will I don't I don't I don't wanna be one of these unhinged big data will solve everything people. Yeah.

Speaker 1

但它确实会。但它确实会。是的。我确实想知道是否如你所说，在跨领域方面还有更多内容。比如，是的。

But But it will. But it will. Yeah. I do wonder if there's more stuff on the cross functional side to your point. Like Yeah.

Speaker 1

如果你知道自己有赌博成瘾，我不认为有人会想到也许Mounjaro或Ozempic能对此有所帮助。但它们确实在某些方面有效，对吧？比如，是的。我确实认为跨领域方面有些东西值得探讨。

If you know you have a gambling addiction, I don't think anybody's thinking about maybe Mounjaro or Ozempic could help with that. But it does help with some some of these things. Right? Like Yeah. I do think there's something about the cross functional

Speaker 0

这种特性。那就是内感受吗？

nature. Is that interoception?

Speaker 1

这很公平。确实需要某个组织来真正实现可及性。是的。

That's fair. There needs to be some some organization that actually makes it accessible. Yeah.

Speaker 0

是的。是的。我认为目前显然还不存在这样的机制。是的。我觉得人们正在从不同角度摸索这个问题，但你们应该创办这家公司。

Yeah. Yeah. I don't think that obviously exists today. Yeah. I think folks are building different hands on the elephant for this, but, you know, you guys should start this company.

Speaker 2

你们应该的。

You should.

Speaker 1

老实说，这是个相当不错的主意。

Honestly, it's a pretty good idea.

Speaker 2

确实是个好主意。那这样，我们针对几个不同的时间尺度来做些预测。先从2025年开始，嗯，然后可能是2030年，再到2050年。

It is a good idea. Alright. Let's ask let let's get a couple of predictions on a couple different time scales. So we'll start with 2025. Mhmm. And then maybe 2030, and then maybe 2050.

Speaker 2

在AI与生物技术交汇的领域，2025年、2030年和2050年我们将会看到哪些最有趣的发展？

What is the most interesting thing we're gonna see in the world of AI meets bio in 2025, by 2030, and by 2050?

Speaker 0

我希望到今年年底——其实这已经在发生了——我们能设计完整的IgG抗体，明白吗？不是纳米抗体那样的单链结合体，而是真正意义上的抗体药物，就像我们现在熟知的那种，可以直接设计它们的CDR区域，让它们具有极佳的结合能力。

My hope is by the end of the year I mean and this is already happening. Right? We can, you know, just we can design full IgG antibodies, right? Not single chain binders like nanobodies, but just the real antibody medicines that, you know, we kind of know and love today that we can just design their CDR regions. They're gonna bind really well.

Speaker 0

你可以一次性（one shot）设计成功，就像在酶的表面上点选一个位置那样简单，就能直接结合它。明白吗？一次到位。我认为未来几年会成熟的技术是，我们能够从头设计酶。

You can one shot it, and you can kind of do point and click on that, you know, that surface of your enzyme. I can just bind it. Right? You know, one shot. I think the thing that will mature over the next couple years is that we can actually design enzymes, de novo.

Speaker 2

我

I

Speaker 0

认为这会非常有趣，当然也需要大量努力。不过再次强调，这些都还停留在蛋白质层面。我觉得大多数思考这个问题的人都过于蛋白质中心化了。所以我们很多工作就是要跳出蛋白质框架，从细胞层面来思考问题，对吧？

think that will be that will be really interesting and, you know, also lots of efforts. But and, again, this is all in the world of proteins. And I think one of the things that most people who think about this stuff are very protein coded. And so a lot of our our our work is to sort of zoom out from proteins and think about cells. Right?

Speaker 0

因此我认为构建虚拟细胞的PDB（蛋白质数据库）是我们ARC目前重点投入的方向。这需要数年时间才能成熟。PDB即蛋白质数据库（Protein Data Bank），是原子分辨率下实验解析蛋白质结构的黄金标准数据库，DeepMind正是用它来训练AlphaFold的。这些预训练数据让模型获得了诸如埃级精度蛋白质结构预测等前沿能力。

And so I think building the the sort of the the PDB of virtual cells, right, is something that we've been focusing a lot on at ARC. That will take some years from today to mature. So PDB is the Protein Data Bank, right, and it's the sort of gold standard database of atomic resolution, experimentally solved protein structures that was used by, you know, DeepMind to train AlphaFold. Right? And so it's the pre training data that allows the model to reach some SOTA capability like protein structure prediction at angstrom resolution.

Speaker 0

对吧？那么对于虚拟细胞而言，我们相信这将帮助我们设计更好的药物靶点，提高治疗成功率。我认为到2030年左右，我们将拥有能让细胞生物学家为之动容的精确实用虚拟细胞模型。

Right? So, you know, what is that for, you know, virtual cells, which we think would help us design better drug targets, increase therapeutic probability of success. Right? I think, you know, that will that's sort of my 2030 sort of prediction is that we have, you know, accurate and useful virtual cell models that make a cell biologist feel emotion. Right?

Speaker 0

2050年的愿景——希望这能更早实现——我认为届时会出现大量关于科研超级智能或科学方法端到端递归的讨论。我期待看到'实验闭环'中实现全自动化湿实验室的垂直整合。

The the 2050 idea, and hopefully this happens much, much sooner than that. You know, I think, you know, there's lots of chat about scientific superintelligence or, you know, the the end to end kind of recursion of the scientific method. I'd like to see that, right, with, you know, lab in the loop, with a fully automated wet lab that's vertically integrated.

Speaker 2

你认为到2050年我们能否以99.9%的准确度模拟特定药物对靶点的影响？在几小时内（而非数月）通过全自动化湿实验室完成验证。对于未来25年以上的技术进步，你认为药物研发从零到产生影响的理想场景是怎样的？

Do you think do you think it's possible by 2050, we can simulate with 99.9% accuracy, you know, the impact that a particular drug is gonna have on a particular target, you know, validate that in a wet lab in a fully automated way in a matter of hours, not months. Like, what do you think the dream scenario is for going from zero to impact in the future of drug discovery if you imagine twenty five plus years of technological progress?

Speaker 0

是的。今天我们已从不同维度探讨了这个愿景。当前某些环节仍非常缓慢，比如毒性测试、长期随访等——这很大程度上取决于疾病类型。急性肿瘤治疗与慢性自身免疫疾病就截然不同。我认为加速进程的唯一途径是建立具有强大预测能力的模型。

Yeah. I mean, I think we've laid out different aspects of the vision over the course of our conversation today. There are places where things are really slow, right, like toxicity and long-term follow-ups. And, you know, it depends a lot on the disease, right? If you're doing some acute oncology thing, that's very different from some really chronic autoimmune thing. And so I think the only way that I can imagine you speeding this up is if you have a model with strong predictive power.

Speaker 0

对吧？因此关键在于我们能否构建真正实现这一目标的模型。模型的质量将决定能力跃升的幅度。

Right? And so a lot of this hinges on our ability to make models that can actually do that. And I think you'll unlock different step sizes of capability based on how good they are.

Speaker 2

到2050年我们有什么理由实现不了这个目标吗？

Is there any reason we wouldn't have that by 2050?

Speaker 0

是的。基本上，如果我们用错了数据，我认为最明显的一个例子就是，你可以把老鼠模型做得极其完美，但它终究不是人类。这虽然是个简单的例子，但我们仍在这么做，因为这是最实际的做法。

Yeah. Basically, if we make the wrong data, I would say that's one obvious one. You know, you can model the mouse in all of its glory to great perfection, and it still will not be the human. That's one, I think, trivial example, but it's something that we still do, because that's just what's practical.

Speaker 0

所以我还有个观点，就是我们实际上需要在人体上进行更多实验。

And so I have this other soapbox about how we actually just need to be doing way more experiments in humans.

Speaker 1

你认为我们需要什么条件才能实现这一点？是监管问题吗？还是...

And what do you think it will take for us to be able to do that? Is that just a regulatory thing? Is that

Speaker 0

我认为除了更好的监管创新外，还需要发挥一些创造力。比如现在可以从脑死亡患者身上取样，获得可以灌注并维持一周生命的肺脏，然后直接在肺脏上做实验。最近就有篇论文报道了这种方法。

I think there will be some aspect of creativity involved in addition to better regulatory innovation. One example: you can take samples from brain-dead patients. Now you can get lungs that you can perfuse and keep alive for a week, and then just do experiments in that lung. And, you know, there was a paper recently published about this.

Speaker 2

太好了。我们要不要来个快速问答环节？

Great. Should we do lightning round?

Speaker 1

好啊，开始吧。第一个问题：过去三个月里你试过最喜欢的新AI应用是什么？

Yeah. Let's do it. Maybe the first one: favorite new AI app that you've tried in the last three months.

Speaker 0

这可能有作弊嫌疑，但我是OpenAI深度研究的日活用户。它是我日常工作中唯一觉得足够实用的AI应用。当然我也会关注其他有趣的AI应用，毕竟我对AI很感兴趣。

So this is maybe cheating, but I'm a daily active user of OpenAI Deep Research. Right? I find it by far the main AI app that's useful enough to use in my day-to-day work. Right? There are lots of other fun AI apps or things that I pay attention to because I'm interested in AI.

Speaker 0

是的。但真正改变我工作方式的是这些深度研究模型。顺便说一句，它们还有巨大的改进和运行空间。没错。而且，是的，那是我第一次感受到真实的情绪，想着，哇哦。

Yeah. But the thing that has actually changed how I work is these deep research models. And they have, by the way, so much more room to improve and to run. Yeah. And, yeah, I think it was the first time I felt real emotion thinking, oh, wow.

Speaker 0

好吧。也许有一天我会被自动化取代。

Okay. Someday maybe I will be automated.

Speaker 2

哦，没错。我每天都有这种感觉。你心目中的科学界伟人会有哪些？

Oh, yeah. I feel that every day. Who would be on your Mount Rushmore of scientists?

Speaker 0

这话可能有点肉麻，但就是我在ARC共事的同事们。我知道这听起来特别谄媚，但真心实意。每天能去实验室，身边都是充满热情、聪明、善良又极具抱负的人，我觉得自己太幸运了。真的，这让我变得更强。

This is maybe a bit smarmy, but it's the folks I get to work with at ARC. I realize this is incredibly smarmy, but it's really genuine. I feel so lucky to go to lab every day and just be around, you know, passionate, bright, kind, and incredibly ambitious people. Yeah. And it levels up my game.

Speaker 1

你认为到今年年底，科学家们会普遍使用的杀手级应用是什么？

What do you think is gonna be the killer application that scientists will use by the end of this year?

Speaker 0

深度研究。

Deep research.

Speaker 1

就这样？好吧。你们没打算推向市场开发点什么吗？

That's it? Alright. Nothing that you guys are gonna create for the market?

Speaker 0

嗯，我认为这些虚拟细胞模型将会非常有用。我不认为我们或任何人现在就能拥有真正实用的模型，我觉得它们需要时间发展成熟，直到真正具有基础性的实用价值。对吧？目前有些研究问题需要时间才能成熟。是的。

Well, I think these virtual cell models will be incredibly useful. I don't think we or anyone will have working models yet; I think it'll take some time for them to mature to the point where they're actually fundamentally useful. Right? There are current research problems that will ripen over some time. Yeah.

Speaker 0

但我们希望能交付这些成果。

But we'd like to deliver those.

Speaker 2

你在ARC研究所学到的最重要的事情是什么？

What's the most important thing you've learned at ARC Institute?

Speaker 0

有句苏格兰谚语，我转述时可能会说得不太准确，但大意是：活着时要快乐，因为你死后会躺很久。嗯，对吧？这是真的。当我读到这句话时，它真的触动了我。

So there's a Scottish proverb that I'm gonna butcher when I paraphrase, but it's basically, you know, be happy while you're alive, for you're a long time dead. Mhmm. Right? That's real. You know, I think that really hit for me when I read it.

Speaker 0

你知道吗，那是在某个时刻——你躺在沙发上，手机离脸只有六英寸，在即将入睡前对着眼睛直射强光。不知怎么的，这提醒了我：尽管我们努力做有意义的事情很复杂，但你也应该享受乐趣。我认为生活中需要更多这样的态度，不仅在研究实验室里——在那里很容易变得极度挑剔，因为这是训练方式，进步的方法就是挑剔一切，找出所有问题。但保持乐观和快乐不会导致平庸和错误，反而是让你拥有持久的情感韧性去实现那些长期目标的关键。

And, you know, it was one of those moments: you're lying on the couch, the phone is six inches from your face, just beaming lux into your eyeballs right before you're supposed to fall asleep. I don't know. It reminded me that despite all the complexity of trying to work really hard to do useful things, you're supposed to have fun. And I think we need more of that in life, not just in research labs, where it can be so easy to be super critical, because that's the training: how you make progress is to hate on everything, assume everything has a problem, and figure out why it can go wrong. But being optimistic and happy is not a path to mediocrity and mistakes; it's actually how you have the emotional capacity for persistence over time to reach those long-term goals.

Speaker 2

说得太好了。感觉这是个完美的结束点。

Love it. It feels like a good place to end it.

Speaker 1

我懂。希望这个播客也让你感到快乐，也让我们感到快乐。

I know. Well, hopefully this podcast made you happy too, and made us happy.

Speaker 0

见到你们俩我总是很开心。

I'm always happy to see you both.

Speaker 1

再次感谢你。

Thank you again.