这是《神经病学回忆录》,汇集了我们从神经病学播客合集中精选的最实用、最富有教育意义的播客,涵盖广泛的主题。
This is Neurology Recall, a collection of our most useful, practical, and educational podcasts on a wide variety of topics selected from our Neurology Podcast Collection.
无论您是刚接触神经病学并处于培训阶段,还是已从业多年,《神经病学回忆录》都旨在帮助您轻松地通过一集一集的播客,持续掌握神经病学的核心知识。
Whether you are new to neurology and in training or have been practicing for many years, Neurology Recall is designed to help you continue to learn the essentials of neurology easily, one podcast at a time.
大家新年快乐。
Happy New Year, everyone.
我是神经病学播客的副主编杰夫·拉特利夫,来自托马斯·杰斐逊大学,今天将为您介绍2026年1月《神经病学回忆录》的开篇内容。
It's Jeff Ratliff, deputy editor of the Neurology Podcast, with Thomas Jefferson University, here to introduce you to the kickoff January Neurology Recall for 2026.
本月,我们将重点回顾过去几年在播客中发布的关于人工智能在神经病学中应用的关键访谈内容。
This month, we're featuring key interviews we posted on the podcast over the past few years on the topic of artificial intelligence in neurology.
我们所熟知的AI相关内容,最早可追溯到三年前的2023年1月,当时斯泰西·克拉迪采访了《大西洋月刊》的专栏作家斯蒂芬·马什,讨论他于ChatGPT发布后不久发表的文章《大学论文已死》。
Our first featured content on AI as we now know it actually kicked off back in January 2023, three years ago, when Stacy Clardy interviewed Stephen Marche, a columnist for The Atlantic, about his piece titled The College Essay Is Dead, released shortly after the launch of ChatGPT.
在2023年1月19日的那期播客中,斯泰西和斯蒂芬探讨了自然语言处理对神经病学职业发展和学术出版的潜在影响与未来前景。
Stacy and Stephen discussed the implications and potential future of natural language processing on neurology careers and publications in that 01/19/2023 podcast episode.
在随后的几个月和几年里,我们陆续推出了多期关于人工智能在神经病学中应用的关键内容,其中一些将在本次《神经病学回忆录》中重点呈现。
In the subsequent months and years, we featured some key content on AI in neurology, some of which we're going to feature here in this Neurology Recall.
在2023年10月5日的播客节目中,哈莉·亚历山大与尚多尔·贝尼茨基讨论了人工智能在常规临床脑电图解读中的应用。
On the 10/05/2023 podcast episode, Halley Alexander spoke with Sándor Beniczky about the use of AI in the interpretation of routine clinical EEGs.
我们下一期精选访谈最初于2025年8月21日播出,特雷·贝特曼与戴维·T·琼斯探讨了一种应用于FDG-PET图像的机器学习框架,以提高阿尔茨海默病的诊断准确性和临床决策能力。
Our next featured interview originally aired on 08/21/2025, and that conversation between Trey Bateman and David T. Jones discussed a machine learning framework being applied to FDG PET images to improve the diagnostic accuracy and clinical decision making for Alzheimer's disease.
2025年1月23日,安迪·萨瑟兰再次围绕大型语言模型,与亚当·罗德曼讨论了LLM在医生推理任务中的表现。
Circling back to large language models on 01/23/2025, Andy Southerland spoke with Adam Rodman about the performance of an LLM on the reasoning tasks of a physician.
最后,本月的回顾内容将焦点转向我们自身,推出了我们编辑斯泰西·克拉迪于2025年11月13日发布的个人播客。
And then closing out this month's recall, we turned the lens back to ourselves and featured the solo podcast from Stacy Clardy, our editor, that aired on 11/13/2025.
在这期节目中,斯泰西反思了人工智能的迅猛发展,并再次提及她2023年访谈中对未来的看法。
In that episode, Stacy reflects on the explosive growth of AI and, much as she did in her 2023 interview, comments again on the implications for the future.
不过这一次,斯泰西没有再谈论写作,而是探讨了人工智能在未来音频教育、尤其是播客领域可能呈现的面貌。
This time, instead of writing, Stacy features what the future of AI may look like for audio education and, more specifically, podcasting.
我们希望这一系列访谈能帮助大家认识到,人工智能在我们神经科医生及学习者生活与职业中的角色已发生多么迅速的变化。
We hope this lineup of interviews will help you recognize how rapidly the role of AI has changed in our lives and careers as neurologists and as learners.
我相信,随着人工智能在我们专业中的整合不断推进,还有更多值得期待的发展。
And I'm confident that there's a lot more to come as integration of AI into our profession continues to progress.
再次祝大家新年快乐,我们下个月再见。
Happy New Year again, and we'll see you next month.
我是《神经病学》期刊系列的主编何塞·梅里诺。
This is Jose Merino, editor in chief of the Neurology family of journals.
《神经病学》播客为神经科医生及其他临床医生提供实用信息,帮助他们为患者提供更优质的诊疗服务。
The Neurology podcast provides practical information to neurologists and other clinicians to help them provide better care for their patients.
感谢收听,祝您本周愉快。
Thanks for listening and have a great week.
你好,我是斯泰西·克拉迪,今天我与史蒂文·马什对话。
Hi, this is Stacy Clardy, and today I'm speaking with Stephen Marche.
史蒂文是一位小说家和专栏作家,他最近为《大西洋月刊》撰写了一篇题为《大学申请文书已死》的文章。
Stephen is a novelist and columnist, and he recently wrote an article titled The College Essay Is Dead for The Atlantic.
这篇文章探讨了OpenAI的ChatGPT的出现。
It's all about the arrival of OpenAI's ChatGPT.
如果你还没听说过,ChatGPT 是一个目前可以免费下载的人工智能程序,它能根据你提出的任何问题生成原创文本。
Now, if you haven't heard of it, ChatGPT is an artificial intelligence (AI) program, currently free to download, that generates original text in response to just about any prompt you can throw at it.
我试过了。
I've tried it.
真的非常不可思议。
It's really quite unbelievable.
我经历了一次那种改变游戏规则的时刻——人生中难得几次遇到新技术彻底改变一切的时刻。
I had one of those game changer moments that you have a few times in your life when you see a new technology that's shifting things.
你可能会好奇,为什么我们在神经病学播客中讨论这个话题?但原因很简单。
Now you may be wondering why we're discussing this on the neurology podcast, but the reason is simple.
像 ChatGPT 这样的技术,以及它之前出现的各类工具和程序,对每个领域的人——包括神经科学领域——都具有革命性的影响。
A technology like ChatGPT, along with all the company it keeps and the programs that came before it, has revolutionary implications for everyone in every field, including the neurosciences.
因此,无论你是审阅神经病学住院医师申请材料,还是为我们的神经病学和神经科学期刊审稿,我们都必须了解这项技术。
So whether you review neurology residency applications or review articles for any of our neurology and neuroscience journals, we all have to be aware of this technology.
有了这个简短的介绍,斯蒂芬,我想直接进入正题。
So with that brief introduction, Stephen, I want to dive right in.
你在这个领域已经很久了,早在2017年就开始研究、撰写并向我们所有人介绍这项技术及其各种形式。
You have been in this field for a very long time, as far back as 2017, researching, writing about, informing all of us about this coming and all the forms it takes.
基于这样的背景,你能稍微退后一步,先给我们讲讲这项技术吗?
With that background, can you back up a little bit and tell us first about the technology?
是什么让ChatGPT和其他程序如此独特?它们是如何被编程的?又从哪里获取信息?
What is it that makes ChatGPT and the other programs unique, including how are they programmed and where does it ingest its information from?
本质上,有一种叫做‘变换器’的技术,2017年由一群加拿大人在谷歌发明。
Essentially, there's this thing called the transformer, which was invented at Google in 2017 by a bunch of Canadians.
我可以把它比作语言领域的‘裂变原子’。
And it sort of split the atom of language, is how I would describe it.
它处理的是‘词元’,也就是对任何文本中语言的细粒度分解。
It takes tokens, which are like the small breakdown of language in any text.
它会比较这些词元与其他词元之间的位置关系。
It compares their place around other tokens.
这才是关键所在。
That's really the key.
然后它会从海量文本中读取这些模式。
And then it reads the patterns in those over huge mass quantities of text.
因此,拥有1750亿参数的GPT-3,借助微软的十亿美元超级计算机来建立这些庞大的参数。
So GPT-3, which has 175 billion parameters, used a billion-dollar supercomputer by Microsoft to establish these massive parameters.
这其实非常重要:它本质上就是一个大规模的文本预测软件。
All it is, this is really important to remember, is a massive text prediction software.
所以它所做的只是让下一个词显得合理。
So all it does is make the next word make sense.
对吧?
Right?
它通过分析所有可用语言的整体庞大模式来实现这一点。
And it does that by reading these enormous patterns in the whole body of all available language.
以GPT-3为例,它使用的是所有可用的英文数据,但其他模型如PaLM等则使用不同的语言,有时甚至包括外语,这就带来了非常奇特的连贯性。
In the case of GPT-3, that means all available English, but other ones, like PaLM, sometimes use foreign languages too, and then you get into really crazy kinds of coherence.
本质上,这创造了人工的连贯性,使其能够生成大量对我们而言意义清晰的文本。
And essentially, this creates artificial coherence and the ability to create extensively sensible texts that make sense to us.
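[Editor's illustration] The "make the next word make sense" idea described above can be sketched with a toy next-word predictor. This is only a minimal sketch under stated assumptions: it counts which word follows which in a tiny made-up corpus, whereas GPT-3 and similar systems use a transformer with billions of learned parameters. The function names `train_bigram` and `predict_next` and the sample corpus are hypothetical and do not come from the interview.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Tiny made-up corpus (an assumption for illustration only).
corpus = "the patient has a headache . the patient has a fever . the doctor has a plan ."
model = train_bigram(corpus)
print(predict_next(model, "patient"))
```

Note that the sketch has no notion of whether a prediction is true; it only reproduces patterns in its training text, which is the property the speaker returns to later when discussing hallucination.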
哇。
Wow.
这是一次很好的概述。
That's a great overview.
所以它从互联网上任何可以获取的信息中提取数据。
And so it's then getting its information from anything it can pull on the Internet as a whole.
这样说公平吗?
Is that fair?
它会抓取互联网,但每个模型都不一样。
It scrapes the Internet, but I mean, each one is different.
对吧?
Right?
它们使用不同的数据存档,而这些存档会产生不同的结果。
And they use different archives and those archives have different results.
这项技术的未来实际上将在于这些档案库。
And the future of this technology is actually going to be in those archives.
比如,当你观察不同的文本到图像生成器,如稳定扩散、MidJourney、DALL·E 2等,它们基于各自访问的不同档案库,因此才会产生不同的结果。
Like, when you look at the different text-to-image generators, such as Stable Diffusion, Midjourney, DALL-E 2, and so on, they're based on the different archives that they've accessed, and that's why you get different results.
此外,超级计算能力也相当重要,比如你能扩展到什么程度。
Also, the supercomputing matters quite a bit, like how much you can scale.
但同时,Transformer的使用方式也越来越重要。
But also the sophistication of the transformer use, that's increasingly starting to be quite relevant.
所以,Cohere是一家我非常欣赏的加拿大公司,它们的参数规模没有PaLM那么大。
So Cohere, which is a Canadian company that I'm a big fan of, they don't have the parameter size that something like PaLM has.
但另一方面,它们在方法上更为精妙,因此能取得更好的结果。
But on the other hand, they're more sophisticated in their approach, and so they get better results.
哇。
Wow.
好的。
Okay.
你刚才抛给我一大堆术语。
You're throwing a lot of terms at me there too.
我试过、用过的是ChatGPT。
So ChatGPT is the one that I tried, the one that I used.
它非常容易使用。
It was very easy.
我直接搜索了一下,然后下载了。
I just searched it and downloaded it.
但这样的模型有多少个呢?
But how many of these are there?
因为你提到了其他的,它们也容易获取吗?
Because you're talking about other ones and are they also readily available?
哦,是的。
Oh, yeah.
你需要付费才能使用。
You have to pay for them.
对吧?
Right?
它们也更难使用。
They're also more difficult to use.
ChatGPT 就是这项技术的 Model T。
What ChatGPT is, is the Model T of this technology.
对吧?
Right?
比如,GPT-3 就是 Model A。
Like, GPT-3 was the Model A.
ChatGPT 就是 Model T,因为它非常易用。
ChatGPT is the Model T in that it's very usable.
它很粗糙,但相当实用,能帮你从 A 点到 B 点。
It's very crude, but it's quite usable, and it can get you from point A to point B.
你用过它,所以你知道它有多有效。
You've used it, so you've seen how effective it is.
但还有许多其他的方法。
But there are many other approaches to this.
我的意思是,像Sudowrite这样的工具,使用GPT-3来做一个写作工具,对我来说比ChatGPT更有用,尤其是在我们担心的写论文等方面。
I mean, Sudowrite, which uses GPT-3 to make a writing tool, is to me much more useful than ChatGPT, for sure, as a way to do the kinds of things we're worried about, like writing essays and so on.
但这类模型有很多,而且说实话,等到这期节目播出时,可能已经出现了另一个模型,让整个讨论变得无关紧要。
But there are lots of different models of this, and honestly, by the time this goes to air, there might well be another one that makes this whole conversation irrelevant.
哇。
Wow.
是的。
Yeah.
所以进展很快。
So moving quickly.
你说它很粗糙。
And you say it's crude.
不过有意思的是,这是在身为小说家的你看来的粗糙。
It's interesting, though: crude to you as a novelist.
但我要说,我输入了一个指令,要求生成一篇约500字的文章,主题是我那位患有特定神经系统疾病的亲属如何激励我成为一名神经科医生。
But I will say, I typed in a command that said, give me a 500-word essay about how my relative with a particular neurologic diagnosis inspired me to become a neurologist.
它真的写出来了,而且内容可信,简直太惊人了,对吧?
And it did it and it was believable and wow, right?
但在你看来,这仍然很粗糙。
But that's crude in your opinion even.
而这只是GPT-3,拥有大约1750亿个参数。
And that's just GPT-3, so that's like 175 billion parameters.
作为一位英语学者,真正让我震惊的是,我让它续写塞缪尔·泰勒·柯勒律治那首著名的未完成诗作《忽必烈汗》,它完全以令人信服的方式完成了,完全让人无法怀疑这不是塞缪尔·泰勒·柯勒律治本人写的。
The one that really blew my mind as an English scholar was that I asked it to finish Kubla Khan by Samuel Taylor Coleridge, this famous unfinished poem, and it absolutely did it in a totally convincing way, in a way that you would never doubt was written by Samuel Taylor Coleridge.
你可以去查看一下。
You can look at it.
我为《纽约客》写了这篇文章,你可以去看,自己判断。
I wrote it for The New Yorker, and you can see it, and you can decide for yourself.
但我绝不会觉得它和塞缪尔·泰勒·柯勒律治有什么不同。
But I certainly wouldn't think it's different than Samuel Taylor Coleridge.
Palm,这是谷歌的5400亿参数模型,尚未向公众开放,但我通过采访工程师了解到,它能进行底层推理。
PaLM, which is Google's 540-billion-parameter model, is not available to the public, but, as I've seen by interviewing the engineers, it can do low-level reasoning.
所以,如果你用一个故事告诉它二加二等于四,它就能从此开始学会数学,就像孩子通过听故事来学习一样。
So if you explain to it, in a story, that two plus two equals four, it will from that point on be able to figure it out; it learns math the way that you learn as a child, by being told stories.
这些东西真的非常诡异。
And that stuff is really freaky.
当你亲眼看到它运行时,实际上相当怪异。
When you see it in action, it's actually quite bizarre.
所以,我并不是要贬低ChatGPT,它确实很了不起。
So not to take anything away from ChatGPT, it's amazing.
正如我所说,它是这项技术的T型车。
As I said, it's the Model T of this technology.
T型车是一项了不起的成就。
The Model T is an incredible achievement.
但我想指出的是,对很多人来说,这似乎就是语言AI的全部了,但事实远非如此。
But just to point out, I think for a lot of people, this is the whole of linguistic AI, and that's certainly not true.
对。
Right.
如果我理解错了,请纠正我,但我觉得你描述这些其他程序时的意思是,它们更具迭代性,也就是说,每次你与它互动并提供提示时,它都会学习。
And correct me if I'm wrong, but I think what I hear you saying in describing some of these other programs is that they are more iterative, meaning they learn each time you interact with it and feed it prompts.
嗯,ChatGPT 就是这样的。
Well, ChatGPT does that.
它们会记录下你与它的所有对话。
They keep a record of all your conversations with it.
因此,随着时间推移,ChatGPT 会学会如何回应你。
So over time, ChatGPT will learn how to respond to you.
但即使这一点,在现阶段也并不算特别令人印象深刻。
Even that is not really even that impressive at this point.
这离通用人工智能还差得远。
It's not anything close to artificial general intelligence.
它并不是一个真正的人工个体,也不是类似的东西。
It's not an artificial person or anything like that.
它只是语言领域的计算器。
All it is is it's like the pocket calculator for language.
你可以用它自动完成过去需要人类大脑才能完成的语言任务。
You can use it to do things in language automatically that used to take a human brain to do.
所以我想问你一个问题,关于神经学在我们日常中的应用。
So let me ask you a question as it applies to, you know, neurology in my day to day.
我们中的许多人,都会为提交给我们的神经科学期刊的文章担任同行评审。
So many of us, we serve as peer reviewers for articles that are submitted to our neuroscience journals.
在我审阅下一篇文章时,有没有什么简单的方法能判断这篇文章是否由这些程序撰写?
Is there any easy way for me to tell when I'm reviewing my next article if that article was written with one of these programs?
我的意思是,既然它是实时生成的AI内容,我想象传统的剽窃检测工具可能会对我失效。
I mean, since it's AI generated in real time, I imagine that traditional plagiarism screening tools may well fail me here.
是这样吗?
Is that correct?
传统的工具,比如turnitin.com之类的,都会失效。
The traditional ones, like turnitin.com or whatever, those ones will all fail.
我的意思是,现在有AI阅读器,可以告诉你某段文字是不是AI写的。
I mean, there are AI readers, like, that can tell you if something was written by an AI.
对吧?
Right?
但那只是在别人直接用它全文生成、一点都没改的情况下才有效。
But that's if someone just uses it to write the whole thing and doesn't change anything.
比如,如果你用它来写一个初稿,然后再做修改——我觉得你肯定会这么做——那就无法分辨了。
Like, if you use this to write something as a basis for something and then make changes to it, which I think you would inevitably do, you won't be able to tell.
所以,这个工具的本质就是这样的。
So that's the tool that this is.
它其实是一个初稿终结者。
It's really a first draft ender.
过去,面对空白页面写作是写作的标志性行为,我觉得ChatGPT已经终结了这一点。
Staring in front of a blank page that used to be the defining act of writing, I kind of think ChatGPT has ended that.
如果你让它写一篇关于我过去某次神经学经历、并因此成为神经科医生的个人随笔,它也能做到。
If you ask it to write a personal essay about a neurological incident in my past that led me to become a neurologist, it can do that.
另一方面,这并不是你自己的东西。
On the other hand, it won't be yours.
所以你最好希望没人问起它。
So you better hope that nobody asks you about it.
但你知道,它可以为你提供一个非常好的初稿模型,让你以此为基础开始创作,而作为作家,这已经成功了一半。
But, you know, it can give you a first draft of that that would be a very good model for you to begin working on, which is, you know, as a writer, that's half the battle.
绝对如此。
Absolutely.
而且我注意到你说的这一点,我让它给我提供引用,但它拒绝了。
And I did notice to your point, I asked it to provide citations to me and it would not do that.
所以至少还需要做一些工作,至少对于ChatGPT来说是这样。
So it at least requires a little bit of work, at least ChatGPT.
我不知道其他程序会不会这样。
I don't know if these other programs would do it.
嗯,ChatGPT是非常受引导的。
Well, ChatGPT is very guided.
对吧?
Right?
而且,你知道,他们正在改进这些模型。
And, you know, they're working on them.
就在今天我的邮件里,我又收到另一个大型语言模型的推广,说:来看看我们吧。
Just in my email today, I got another large language model program saying, hey, come look at us.
我们的模型可以连接互联网,并对生成的文本进行事实核查。
We access the Internet with ours and we fact check text generated.
因为你知道,这些文本生成器确实会虚构内容。
Because, you know, these text generators do hallucinate.
对吧?
Right?
Facebook开发的那个模型存在严重的虚构问题。
The one that Facebook generated had a huge hallucination problem.
任何只是让文本生成器生成内容然后直接发给别人的人,都是那种想被抓住的作弊者。
Anyone who would just have a text generator generate a text and then just send it to someone, that would be the cheater who wants to get caught.
这也很常见。
Now that's pretty common too.
对吧?
Right?
作为英语教授,我的经验是,作弊者总是做最蠢的事。
At least that was my experience as an English professor is that the cheaters always did the dumbest things.
但你知道,这其实是一种工具,让你不必写初稿。
But, you know, what it is is it's a tool that allows you not to have to write a first draft.
这是一项非常强大的工具。
That's a pretty powerful tool.
是的。
Yeah.
当然。
Absolutely.
你能说说你所说的‘幻觉’是什么意思吗?
And tell me what you mean by hallucinate.
大型语言模型并不知道它们所说的内容是否真实。
The large language models have no idea whether what they're saying is true.
事实性并不相关。
Facticity is not relevant.
它所做的只是文本预测。
All it does is text prediction.
因此,它给出的答案可能完全连贯,但完全是凭空捏造的。
So the answer it can give you will be totally coherent and just made out of nothing.
这种情况并不经常发生,但确实会发生。
That doesn't often happen, but it does happen.
比如,如果你用它来作弊参加家庭考试,你就得检查答案是否正确,但它会给你提供答案。
Like if you're using this to cheat on a take home exam, for instance, you would have to check that you were right, but it would give you the answers.
我认为作为人类体验的家庭考试形式现在已经结束了。
I do think the take home exam as a form of human experience is now over.
我认为这更多是MBA和法学院的问题,因为它们有很多这样的考试,而不是神经学的问题。
I don't think that's a problem for neurology so much as it's a problem for MBAs, which have a lot of those, and law schools, which have a lot of those.
回到你刚才说的,现在有一些程序可以判断某内容是否由AI生成。
And going back to what you said, there are programs now that are coming out that can tell you if something's AI generated.
所以这些几乎是相互竞争的行业吗?
So are these almost competing industries?
AI会不断进步,而检测AI的程序也会随之出现吗?
Is the AI going to keep getting better and the programs to detect AI are gonna come along?
这就是正在发生的情况吗?
Is that what's happening?
这仍然处于非常早期的阶段。
It's still very nascent.
我知道有一些AI检测工具。
I know that there are AI checkers.
这类工具已经存在很长时间了。
There have been for a long time.
比如,在奥巴马的传记出版后不久,就出现了一本由AI生成的奥巴马传记,本质上是亚马逊的一个商业操作。
Like, there was an AI-generated Obama biography that came out right after his biography came out that was basically an Amazon play.
而且,你知道,他们能看出来。
And, you know, they could tell.
他们使用了一个本质上是反图灵测试的程序,来判断这是AI生成的。
They used a program that was essentially a reverse Turing test to tell that it was AI generated.
但问题并不在于机器生成内容本身。
But, you know, the problem here is not necessarily having a machine generate it.
问题在于,机器生成后,再做些小修改,把它变成自己的东西,这反而会削弱让人有意为之的意义。
The problem here is having a machine generate it and then doing small fixes and making it your own, which will kind of defeat the purpose of having someone do something intentionally.
这会剥夺写作的过程,而我们一直把写作视为思考本身的象征。
And it will take away the process of writing, which we've taken as a sort of synecdoche for thinking itself.
这种联系将不再清晰。
That will no longer be a clear connection.
你刚才说的,这确实是问题的核心,对吧?
That's really the essence of it, what you just said there, isn't it?
这很有趣。
It's funny.
当我看这些东西时,人们提出的关于作弊或艺术家担心这会取代人类艺术家的问题,他们只是把它当作一种普通技术来看待。
It's like, when I look at this stuff, the questions that people bring up about cheating, or, you know, the artists who are thinking this is gonna replace human artists, they're treating it like it's any other technology.
这项技术能产生连贯性。
This technology produces coherence.
这基本上是一个宇宙级的问题。
That's basically a cosmic question.
对吧?
Right?
这项技术引发的问题是:原创性的本质是什么?
The questions this technology brings up are what is the nature of originality?
你怎么判断一个人是不是人?
How do you tell someone's a person?
这才是这里的关键所在。
Those are the stakes here.
谈谈这些技术对医学的影响。
Talk to me about these technologies and the implications for medicine.
我认为在医学领域,我们都曾对几年前出现的IBM Watson概念感到非常兴奋。
I think in medicine all of us were very excited for the IBM Watson concept that came back several years ago now.
我们是否已经接近能够输入症状并自动生成鉴别诊断的阶段?
Are we closer to being able to type in symptoms and automatically generate a differential diagnosis?
你认为这项技术在医学和医疗保健领域将走向何方?
Where do you see this heading in medicine and healthcare?
关于人工智能,我只相信我亲眼所见,因为科技界和硅谷的炒作太过疯狂,根本无法分辨哪些是可信的,哪些不是。
One thing about AI is that I only believe what I see because the hype is so insane in tech circles and in Silicon Valley that it's just impossible to tell what to believe or not.
一些非常严肃的人,曾是人工智能领域的领导者,也是真正聪明的人,曾断言心脏病学将因人工智能而消失。
Very serious people who were leaders at AI and genuinely brilliant people said that cardiology was gonna disappear because of AI.
但事实并非如此。
It didn't.
人工智能在某个阶段停滞了,始终未能达到取代医生的水平。
The AI stalled at a certain point, and it just never got past the point where it could replace doctors.
对吧?
Right?
它可以在那方面取代人类的智慧。
It could replace human intelligence on that.
你知道的,谁知道呢?
And, you know, who knows?
也许五年后它能达到那个水平,但它停滞了。
Maybe five years from now, it will get to there, but it stalled.
自动驾驶汽车也发生了完全相同的情况。
And the same exact thing happened with driverless cars.
你知道,每个人都以为这会发生。
You know, everyone thinks, oh, this is gonna happen.
这一定会发生的。
This is gonna happen.
但结果就是没有发生。
And then it just doesn't.
他们也不知道为什么。
And they don't know why.
我的意思是,你必须明白,工程师们完全不知道这种连贯性是如何出现的。
I mean, one thing you have to understand is the engineers have absolutely no idea why this coherence has emerged.
作为一名记者,去和开发Palm的资深工程师交谈,这非常奇怪,你会说:好吧。
Like, it's very strange as a journalist to go and talk to senior engineers working on PaLM and say, okay.
那么,为什么这项技术能够进行链式推理呢?
Well, why is this technology capable of chain reasoning?
为什么会出现这种情况?
Like, why is that?
他们说:我们完全不知道。
And they're like, we have no idea.
我们只是注意到它发生了,但为什么?我们真的不清楚。
We just don't. Like, we just noticed that it happens, but why?
我们一点头绪都没有。
We don't have any idea.
人工智能的核心特性之一,就是其本质上的不可理解性。
One of the core things about AI is the unfathomability at its core.
你用它来做一些你自己都无法理解的事情。
You use it to do things that you can't understand yourself.
对吧?
Right?
不过话说回来,我认为自己在人工智能的语言学问题上算是个专家。
Now, having said that, I think I would consider myself an expert on the linguistic questions of artificial intelligence.
医学方面的则完全是另一个维度,真的超出了我的理解范围。
The medical one is a kind of whole other dimension that's really kind of beyond me.
但他们正用这种语言AI来做一些有趣的事情,比如蛋白质折叠、罕见医学研究,而他们也不太清楚为什么语言AI在这方面比他们之前用的其他AI更出色。
But one of the more interesting things is that they're using this linguistic AI to do things like protein folding and questions of rare medical research, and they don't quite know why the linguistic AI is better at it than the other AI that they were using.
也许只是因为它对我们来说更合理,能为我们带来连贯性,而其他AI却做不到这一点。
It may just be that it's more sensible to us, that it creates coherence to us, which the other AI just couldn't get to.
但这些都是非常深刻的问题,目前完全没有任何答案。
But these are really profound questions that are totally unanswered at this point.
这些问题还远未有定论。
They are very much up in the air.
我认为实话是,你得等上十年才能看到是否有任何效果,我的意思是,他们曾试图用人工智能解决新冠疫情,但毫无作用。
And I think the honest truth is you're going to have to wait ten years to see if there are any effects. I mean, they tried to use AI to solve COVID, and it had zero effect.
每一次测试、每一次人工智能的应用都失败了。
Every single test, every single use of AI failed.
我认为有将近200次尝试用人工智能解决新冠疫情,但都毫无进展。
I think it was 200 separate attempts to use AI to solve COVID and they got nowhere.
所以这是一项强大的技术,但也极其神秘。
So it's a powerful technology, but it's also extremely mysterious.
正因如此,它本身就内置了失败的可能性。
And because of that, it has failures built into it.
这是个绝佳的回答。
That's a fantastic answer.
我欣赏你的坦诚。
I appreciate your honesty.
我的意思是,它确实能对各种问题和具体的技術问题给出很好的通用答案。
I mean, I think it does offer very good general answers to things and very specific technical questions.
有一位物理学家给我写信。
I had a physicist write me.
他说,你知道吗,AI能立即正确地回答博士级别的物理问题,而谷歌搜索之类的方式做不到,对吧?
He was like, you know, it answers PhD level physics questions instantly and correctly, in a way that Google like a search just won't, right?
它能对这些问题提供连贯的答案。
It provides a coherent answer to that.
我想,这肯定会有医疗方面的应用。
I mean, there has to be a medical application of that I assume.
对。
Right.
对。
Right.
它可能无法取代医生,但肯定可以成为临床医生使用的强大工具。
It may not replace the physician but it certainly could be a very powerful tool for the clinicians to use.
或者在没有医生的地方,它也可能很有用。
Or maybe where there aren't doctors it could be useful.
它可能成为那些没有医生的社区中有帮助的工具。
It could maybe be a tool in communities where there aren't physicians that might be helpful.
但同样,这正是你从我所知的领域进入科幻范畴的地方,我向你保证,现在就有人在研究这个。
But again, that's the point where you go from what I know to the science fiction stuff, which, you know, somebody is working on that right now, I assure you.
他们可能成功,也可能失败。
And they may get there and they may not.
对。
Right.
你对搜索引擎提出了一个非常好的观点,因为你确实需要有人精确地告诉搜索引擎正确的关键词,才能获得有意义的结果。
And you raise such a good point about the search engines because you really do need someone to spoon feed the search engines the exact correct terms to give you the meaningful output.
但在这里,我们更进一步,即使你的搜索关键词稍有偏差,仍然可能获得有用的输出。
But here we are one step closer to maybe being slightly off in your search terms and still getting useful output.
当然。
Sure.
我母亲当了多年的医生,你知道,搜索引擎的出现简直是一场灾难。
My mother was a physician for many years and, you know, the arrival of the search engine was a disaster.
对吧?
Right?
我的意思是,好像每个人来看病时都觉得知道自己得了什么病。
I mean, it was like, oh, everyone thinks they know what they have when they come in.
我不认为ChatGPT能让患者变得更聪明。
I don't think ChatGPT is gonna make a patient smarter.
你明白我的意思吗?
Do you know what I mean?
我觉得不太可能。
Like, I doubt that.
不过,也许有办法。
Although, maybe there is a way.
也许有办法。
Maybe there is a way.
也许我只是太悲观了。
Maybe I'm just being cynical.
也许这至少在一定程度上减少了对精确搜索词的依赖。
Perhaps it's at least a bit closer to not being so dependent on the precise search terms.
嗯,依你所说,我们只能拭目以待了,对吧?
Well, I guess to your point, we'll have to wait and see, won't we?
在我看来,拍一张腿上皮疹的照片,然后问机器‘这是什么’,这难道不合理吗?
Does it seem to me unreasonable to take a picture of a rash on your leg and say to the machine, what is this?
然后它告诉你答案。
And it tells you.
这对我来说并不觉得不可能。
That doesn't seem impossible to me.
对。
Right.
而且这更好。
And that's better.
是的。
Yeah.
不过,正如我所说,这仅仅是文本生成,因此并不具备事实性。
Although, as I said, this is just text generation, so it doesn't have facticity.
这一点将教会所有人一个绝对必要的事情:必须进行事实核查。
One thing that this is going to teach everyone is the absolute need to fact check.
因为即使你用它来作弊,比如学生用它来应付作业,它给出了一个答案,你也必须去别处核实它所说的内容是否正确。
Because even if you're using it to cheat, like if you're a student using this to cheat and it provides you an answer, you're gonna have to go somewhere else to figure out if what it said was correct.
百分之百正确。
100%.
因为它可能只是在输出听起来很合理但实际上完全错误的答案。
Because it could just be spewing coherently sounding answers that are totally wrong.
它确实会这样。
It does that.
没错。
Right.
那我也给你举个例子。
So I'll give you an example too.
我的亚专业是自身免疫性神经病学,所以我让AI为我生成了一篇关于自身免疫性神经病学的文章,我浏览了一下,内容在事实上是正确的。
My subspecialty is autoimmune neurology and so I had it generate an article for me on autoimmune neurology and I perused that article and it was factually correct.
那么,我需要修改多少内容才能避开那些检测AI生成文本的程序呢?
Now how much of that do I have to change to elude the program that will say it was AI generated?
你对此有什么看法吗?
Do you have a sense of that?
这个我不清楚。
That I don't know.
我很难想象需要改多少,因为AI的工作方式是高度完整的系统。
I can't imagine much because the way that AI works is very complete systems.
AI检测的一个方法是标点符号过于完美,因为几乎没有人能写出完全无误的标点符号,诸如此类的情况。
One of the ways that it tells that it's AI is that the punctuation is perfect because almost no human person writes perfect punctuation and things of that nature.
而且AI还会分析一个关键点:Transformer模型的真正突破——就像原子裂变一样——在于它并不是按顺序分析内容。
And the brilliance of the transformer, the real splitting of the atom, is that it doesn't analyze things in sequence.
它并不是按顺序分析词元的。
It doesn't analyze tokens in sequence.
它会分析文本中每个词与其他所有词的位置关系。
It analyzes the tokens as they are in position with every other token in a given text.
意义就来自于此。
That's where the meaning comes from.
它甚至不一定只参考前一个词或后一个词,而是会参考前面80个词和后面80个词,然后对整个语言都这样做。
It doesn't even necessarily take it from the next word or the word before it, but from the word 80 words before it and the word 80 words behind it, and then it does that for all of language.
对吧?
Right?
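[Editor's illustration] The point above, that each token is weighed against every other token in the text rather than only its neighbors, is the idea behind attention. The following is a simplified sketch of scaled dot-product attention weights, assuming made-up two-number vectors for four tokens. Real transformers learn their vectors, apply separate learned query, key, and value projections, and add position information, all of which are omitted here. The names `softmax` and `attention_weights` are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(vectors):
    """For each token, weigh every token in the text (itself included)
    by scaled dot-product similarity."""
    scale = math.sqrt(len(vectors[0]))
    weights = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in vectors]
        weights.append(softmax(scores))
    return weights


# Hypothetical 2-number "embeddings"; real models learn these from text.
tokens = ["seizure", "the", "EEG", "spike"]
vectors = [[1.0, 0.0], [0.0, 0.1], [0.9, 0.2], [0.8, 0.1]]

for token, row in zip(tokens, attention_weights(vectors)):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to all four tokens at once, regardless of how far apart they sit in the text.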
因此,一段文本的具体措辞对于AI的工作方式至关重要。
So the specific formulation of a block of text is actually very crucial to how AI works.
我猜测,但我觉得你不需要做太多修改。
I'm guessing, but I don't think you would have to change much.
另一方面,也许他们可以通过句子层面来实现。
On the other hand, maybe they could do it by a sentence level.
我也不太确定。
I don't really know.
然后你还有另一个问题:这真的算作弊吗?
And then you have the other question is, is that really cheating?
新西兰有一个学生承认使用了它,他说:我被允许使用同义词词典。
There was a student in New Zealand who admitted to using it, and they said, I'm allowed to use a thesaurus.
我没有使用别人的成果。
I'm not using somebody else's work.
我是这篇作品的作者。
I'm the author of this work.
是我创作了它。
I created it.
这里有什么问题?
What's the problem here?
我认为这实际上并不一定是一种糟糕的观点。
And I think that's actually not necessarily an inherently terrible point of view.
我的意思是,我能理解其中的合理性。
I mean, I can see some validity there.
哇。
Wow.
从这个意义上说,我们真的需要重新定义剽窃,对吧?
We really have to redefine plagiarism in that sense, don't we?
剽窃的意思是你偷了别人的创意。
The thing of plagiarism is you were stealing somebody else's ideas.
对吧?
Right?
就像,你偷了别人的作品。
Like, you were stealing somebody else's work.
如果你使用ChatGPT,你并没有偷任何人的东西。
Well, if you use ChatGPT, you're not stealing from anybody.
你知道,你只是在利用现成的东西。
You know, you're just taking what's there.
我得好好想想这一点。
I have to think about that one.
嗯,你刚才几分钟内说的很多内容都很有道理。
Well, there's a lot in what you just said over the last couple of minutes too.
首先,我知道我特别喜欢用破折号和逗号拼接。
First of all, I love myself a good em dash and some comma splicing.
所以,我也许可以只是通篇这么写,这样问题就解决了。
So I could maybe just go through, and that will solve the problem.
不行。
No.
但是
But
我就加点逗号拼接吧。
I'll just add in some comma splices.
写出这本书来。
Create create the book.
对。
Yeah.
你知道,我们在期刊界真的得密切关注这个问题。
You know, we're really gonna have to keep an eye on this in the journal world.
你提到的另一件事,也许你只是从技术上解释了我阅读时感受到的东西,那就是它读起来有点像机器人。
The other thing that you said, and maybe you just explained to me technically what I sensed when I read it, was the fact that it read a bit robotically.
它并不真正阅读。
It does not read.
它不是一个阅读机器。
It's not a reading machine.
它会拆解文本,然后再重新组合。
Like, it disintegrates text and reintegrates it.
它不是以词语为单位工作的。
It doesn't work by words.
它是以标记为单位工作的,这些标记通常是三到五个字母的组合。
It works by tokens, which are collections of letters, usually three, four, or five letters.
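[Editor's illustration] To make "collections of letters" concrete, here is a toy tokenizer that splits a word into the longest pieces it finds in a small vocabulary, falling back to single letters. This is a sketch under stated assumptions: the vocabulary is invented for the example, and production tokenizers build theirs from large amounts of text with related but more elaborate schemes. The function name `tokenize` is hypothetical.

```python
def tokenize(word, vocab):
    """Greedy longest-match split of `word` into pieces from `vocab`.
    A single letter is always accepted as a fallback piece."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for end in range(len(word), i, -1):
            piece = word[i:end]
            if piece in vocab or end == i + 1:
                pieces.append(piece)
                i = end
                break
    return pieces


# Invented vocabulary of letter groups (an assumption for illustration).
vocab = {"neuro", "logy", "log", "ist", "path"}
print(tokenize("neurology", vocab))
print(tokenize("neurologist", vocab))
```

Changing the vocabulary changes how the same text is cut up, which gives a rough intuition for why redefining tokens can change what a model is able to do.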
最有趣的一些工作正是当他们重新定义标记时。
And some of the most interesting work is when they redefine tokens.
比如,Google的Palm在编程方面变得如此出色的原因之一。
Like, one of the reasons Google's PaLM got so much better at doing code.
你可以用这些程序来完成类似将C++代码翻译成Python的任务,轻松搞定,没有任何问题。
You can use these programs to do something like translate this code from C++ to Python, done, without any problem.
对吧?
Right?
而且全部都是正确的。
And it's all correct.
这当然本身就有巨大的应用价值。
And that, of course, has massive applications of its own.
当他们改变处理数学符号的分词方式时,完全改变了大型语言模型的数学能力。
When they changed the way they did tokens around mathematical figures, it completely changed the capacity of the large language model to do math.
他们也不知道为什么,但这非常有趣。
And they don't know why, but that's fascinating.
所以它并不是一个像人类大脑那样思考语法和句法的系统。
So it's not a big human brain thinking through grammar and syntax.
它会收集一系列标记,这些标记是以特定方式组合在一起的字母,然后在时空上将它们连接起来,并通过强大的超级计算重新整合所有内容。
It takes collections of tokens, which are letters in a certain way that are bundled together and connects them in space and time and fits them all together again through massive supercomputing.
所以这是从人类层面来说的。
So it's on a human level.
比如,如果你观察变换器的工作方式,它与人类阅读的行为完全无关。
Like, if you're looking at what a transformer does, it's totally unrelated to the act of human reading.
对。
Right.
而且因为你有更丰富的经验,你已经看到过许多由各种不同程序生成的产品。
And when you've looked at these because you have far more experience, you've seen several more products that have been generated from all the different programs.
你是一位写作专家。
And you're an expert in writing.
你看到什么模式了吗?
Do you see a pattern?
你有没有觉得有些东西很平淡?
Do you get the sense that something's bland?
你现在会读这些东西,然后想,有时候你是不是能猜出来?
Do you sort of read these things now and think maybe sometimes you're able to guess?
我并不是在贬低ChatGPT。
I'm not saying anything against ChatGPT.
我的意思是,它是一个很棒的程序。
I mean, it's a wonderful program.
如果你问它一些平庸的问题,它就会给出平庸的回答。
If you ask it banal questions, it gives banal answers.
我用Sudowrite和Cohere为美国文学期刊LitHub写了一个短篇故事,我训练了机器人模仿作家的风格,比如纳博科夫、莎士比亚、狄更斯,然后让它整合这些作家的风格写一个爱情故事。
I used Sudowrite and Cohere to write a short story for LitHub, a literary journal in America, where I trained bots to write like writers, like Nabokov, Shakespeare, Dickens, and then I had it write a love story integrating all of these talents into it.
这花了我很多功夫。
That took a lot of work.
要实现这一点需要大量的技术工作,但最终它能产生令人惊叹的、富有创造力且生动的语言成果。
It took a lot of technical work to get it to be able to do that, but then it creates these incredible, creative, vibrant linguistic results.
比如,如果你让它模仿狄更斯的风格,它就会做到。
Like, if you tell it to write like Dickens, it will.
这太惊人了。
That's an amazing thing.
我还在研究一些项目,我会给出提示故事,告诉它以某种方式写作,然后用模型进行训练,这样它每次都会讲出不同的故事。
I'm also working on things where I'm writing prompt stories where you tell it to write in certain ways, and then you train it on models, and then it will tell the same story differently every time.
本质上,你创造了一个无限再生的故事。
And, essentially, you create an infinitely regenerative story.
这只是即将发生的事情的一个小小例子,这就是为什么我认为我们还没有完全意识到即将到来的事物的严重性。
This is just a small example of what's coming, which is why I think we haven't quite grasped the stakes of what's coming.
我在加拿大,所以法语沉浸式教学在这里非常重要。
I'm in Canada, so teaching French immersion is a big deal here.
比如,所有孩子都学习法语,以便将来能当上总理。
Like, all the kids learn French so that they could be prime minister.
对。
Right.
对。
Right.
你知道,尽管成年人都最终没有说法语。
You know, although none of the adults ever end up speaking French anyways.
但我有个朋友,他的孩子不喜欢学校里提供的任何法语沉浸式读物。
But a friend of mine, like his kid didn't like any of the French immersion books that were available for him in the school.
于是他直接去使用ChatGPT,让它按孩子的年级水平写一本关于他孩子最喜欢的超级英雄的法语书。
So he just went to ChatGPT and had it write a French book at grade level about his kid's favorite superhero.
它真的写出来了,孩子读了之后非常喜欢。
And it did that, and the kid read it and loved it.
所以他现在不断使用ChatGPT来生成他想要存在的那些书。
So he's now going to ChatGPT to write the exact books that he wants to exist.
我的意思是,我真的不确定自己能否想象出这会带来什么后果。
I mean, I'm actually, literally, not sure I can conceive of the consequences of that.
对吧?
Right?
当消费者与语言创造者之间的界限彻底模糊、融为一体时,这究竟意味着什么?
Of what it means when essentially the line between a consumer and creator of language dissolves and becomes this enormous blur.
别提这对儿童图书行业意味着什么了。
Like, never mind what it means for the children's book industry.
但如果你想到一本书,可以随时让它为你定制,这又意味着什么?
But what does it mean when, if you think of a book, you can just have it made for you?
我的意思是,我认为这就是我们正在走向的方向。
And I mean, I think that is where we're going.
我觉得我之前的一些问题可能集中在如何识别这一点,以及如何避免任何形式的欺诈,无论我们是否需要定义或重新定义它。
I think maybe some of my questions have been focusing on how we identify this, and on the concerning aspects, like avoiding any manner of fraud, regardless of how we may have to define that or change the definition.
你认为这项技术最有希望的积极方面是什么?
What do you see as the most promising positive aspect of the technology?
你之前提到过,你让一个年轻学生对学习产生了极大的兴趣。
You touched on it a little bit in that you got a young student really excited to learn.
你还认为有哪些其他有希望的方面?
What other promising aspects do you see?
哦,我觉得这非常令人兴奋。
Oh, I think it's hugely exciting.
我的意思是,我在用它进行创意创作。
I mean, I'm using it creatively.
我认为这种创意爆发将会是巨大的。
I think the creative explosion of it is gonna be immense.
我觉得自己就像是电吉他的早期采用者。
I feel like I'm an early adopter of the electric guitar.
老实说,我就是这么感觉的。
That's honestly how I feel.
我写过一些东西,比如为《大西洋》杂志写的《Facebook让我们孤独吗?》。
I've written things like "Is Facebook Making Us Lonely?" for The Atlantic.
我的意思是,我曾是社交媒体的早期怀疑者,但我觉得,社交媒体的灾难让我们对技术产生了极度的恐惧。
I mean, I was a very early skeptic of social media, but I think in some ways, the disaster of social media has made us extremely afraid of technology.
对吧?
Right?
这几乎是我们的本能反应。
It's kind of our automatic response.
我的意思是,这怎么会摧毁社会呢?
Like, how is this going to destroy society?
我实际上认为这项技术很棒。
I actually think this technology is wonderful.
我不害怕它。
I'm not afraid of it.
尽管我有这些倾向,但我对它感到非常乐观。
Despite my tendencies, I feel quite hopeful about it.
我也认为,当你使用它时,语言并不会变得不那么重要。
I also think it's not going to be the kind of thing where language becomes less important when you use it.
它肯定不会取代任何人的工作。
It's certainly not going to replace anyone's jobs.
这是另一个担忧,但我真的认为这是一种肤浅的焦虑。
That is another worry, but I really think that's a facile anxiety.
这将类似于唱盘对嘻哈音乐的发明。
This is going to be like the invention of the turntable for hip hop.
就像这项新技术催生了所有这些新的艺术形式,它需要对以往音乐有极其系统的了解。
Like this new technology gave birth to all this new art, and it required a kind of hugely systematic knowledge of previous music.
它将要求我们真正理解语言及其运用,而我认为我们已经很久没有这种理解了。
It's going to require a real understanding of language and its manipulation that I think we kind of haven't had for a while.
我确实相信,事实核查将会像以前从未有过的那样,成为教育中不可或缺的一部分。
I actually believe that like fact checking is going to be integral to education in a way that it just wasn't before.
这不再是是否能生成新文本的问题。
Like, it's not gonna be a question of necessarily can you generate a new text?
而是你能否确定一段文本是否正确。
It's like, but can you establish that a text is actually correct?
在面对这项技术时,这种能力将具有最大的价值。
That is going to be a skill with maximum value in the face of this technology.
我的意思是,我真的相信这一点。
I mean, I really believe that.
是的。
Yeah.
我明白你的意思。
I hear what you're saying.
在社交媒体上,这已经变得极其重要,并且会呈指数级增长。
It's already become incredibly valuable with social media, and it will just grow exponentially.
是的。
Yeah.
比如说,如果我有个神经学方面的问题,我可以去问ChatGPT并得到一个答案。
I mean, for example, like if I had a neurological question, I could go to ChatGPT and get an answer.
但我还是更愿意跟你聊聊。
I'm going to tell you I would rather talk to you.
你明白我的意思吗?
Do you know what I mean?
没错。
Right.
事实上,无论它给我什么答案,我都不会轻易相信,因为唯一能验证的方法就是我自己去查证——要么通过互联网,但互联网本身也不可靠;要么通过学术同行评审的资料,可我又看不懂那些。
In fact, whatever answer it gave me, I really wouldn't believe, because the only way I could would be for me to fact-check it, either with the Internet, which is, of course, unreliable in its own way, or with scholarly peer-reviewed work, which I don't understand.
所以,要核实信息的真实性,获取真正可靠的信息,如今和以前一样困难。
So getting to the veracity of things, getting information that you can actually rely on is just as difficult now as before.
是的。
Yeah.
你给了我们太多值得思考和消化的内容。
You've given us so much to think about, to chew on.
我建议我们的听众阅读斯蒂芬在《大西洋》杂志上的文章。
I would encourage our listeners to read Stephen's article in The Atlantic.
文章标题是《大学论文已死》。
The title again is The College Essay is Dead.
这是一篇简短而精彩的阅读概述。
That's a really quick read and a great overview.
当然,你也可以在社交媒体上关注斯蒂芬,了解更多他关于各种话题的观点。
And you can always, of course, follow Stephen across social media to hear more from him on a variety of topics.
今天能与你交谈、向你学习,真是一次令人耳目一新的愉快经历,斯蒂芬。
It has been an eye opening pleasure talking with you, Stephen, and learning from you today.
谢谢。
Thank you.
我的荣幸,斯泰西。
My pleasure, Stacy.
您好,我是维克森林大学的哈莉·亚历山大,今天我和尚多尔·贝尼茨基在一起。
Hello, this is Hallie Alexander from Wake Forest University, and I'm here today with Shandor Benizky.
他是丹麦奥胡斯大学的临床教授。
He's a clinical professor at Aarhus University in Denmark.
尚多尔是可穿戴设备和自动癫痫检测方面的专家。
Shandor is an expert on wearable devices and automated seizure detection.
今天我们将讨论他团队最近关于使用人工智能自动解读脑电图的研究。
And today we're discussing his group's recent work regarding automated EEG interpretation using artificial intelligence.
这项研究上个月发表在《JAMA神经学》上。
This was published in JAMA Neurology last month.
尚多尔,今天能邀请您做客播客,我感到非常荣幸。
Shandor, it's such an honor to have you on the podcast today.
对我来说也是荣幸,哈莉。
Pleasure on my side, Hallie.
我想大家都很想知道,你们开发的用于解读脑电图的AI模型是什么?这个模型与过去我们见过的其他自动化脑电图检测方法有何不同?
I think everyone's really interested to know what is the AI model that you came up with for reading EEG and how does this model differ from some of the other automated EEG detections that we have seen in the past?
此前已发表的用于脑电图分析的AI模型仅关注有限的方面,例如是否存在癫痫样异常,或脑电图是否正常或异常。
So the previously published AI models for analysis of EEG addressed only limited aspects, such as the presence or absence of epileptiform abnormality, or whether the trace is normal or abnormal.
我们实现的是对常规临床脑电图的更全面解读。
What we achieved is a more global interpretation of the routine clinical EEG.
我们的模型首先将正常和异常记录区分开来,然后将异常记录进一步细分为四种对临床决策至关重要的亚类。
So what our model does is first separate normal from abnormal recordings, and then it subclassifies the abnormal recordings into the four most important subcategories, which are necessary for clinical decision making.
这些类别包括局灶性癫痫样、全面性癫痫样、局灶性非癫痫样和全面性非癫痫样。
And these categories are focal epileptiform, generalised epileptiform, focal non-epileptiform and generalised non-epileptiform.
这些主要是慢波。
These are mainly slowings.
这些是最关键的分类,能够帮助临床医生做出治疗决策或决定是否需要进一步检查患者。
So these are the most important categories which can help clinicians then to make a therapeutic decision or decision on further workup of the patients.
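The two-stage readout described above (normal versus abnormal, then four subcategories) can be sketched roughly as follows. This is only an illustration: the `interpret` function, the score dictionary, and the 0.5 threshold are assumptions made for this example and are not taken from the published model or its interface.

```python
# The four abnormal subcategories named in the interview.
SUBCATEGORIES = (
    "focal epileptiform",
    "generalized epileptiform",
    "focal non-epileptiform",
    "generalized non-epileptiform",
)


def interpret(p_abnormal, subcategory_scores, threshold=0.5):
    """Two-stage readout: normal vs. abnormal, then subcategories.

    p_abnormal, subcategory_scores and threshold are hypothetical
    inputs for illustration, not the model's real interface.
    """
    # Stage 1: decide whether the recording is abnormal at all.
    if p_abnormal < threshold:
        return {"abnormal": False, "subcategories": []}
    # Stage 2: report every subcategory whose score clears the threshold.
    flagged = [
        name for name in SUBCATEGORIES
        if subcategory_scores.get(name, 0.0) >= threshold
    ]
    return {"abnormal": True, "subcategories": flagged}


example = interpret(
    0.9,
    {"focal epileptiform": 0.8, "focal non-epileptiform": 0.6},
)
print(example)
```

The sketch lets one recording carry more than one subcategory, which is a design assumption here rather than something stated in the interview.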
我们算法的另一个重要创新在于,我们对其进行了全自动模式的验证。
And the other important novelty about our algorithm is that we validated it for a fully automated mode.
因此,大多数先前发布的模型都需要人为干预,通常是因为它们的特异性很低,需要人类专家确认结果。
So most of the previously published models need a human intervention, typically because their specificity is very low, so human experts need to confirm the findings.
现在,在测试数据集中,我们表明,无需任何额外的人工干预,该模型的性能与专家相当。
Now in the test data set, we showed that without any additional human intervention, the model achieves the same performance as the experts.
该模型的名称是SCORE AI,SCORE是脑电图标准化计算机辅助结构化报告的缩写。
And the name of the model is SCORE-AI, and SCORE is the acronym for Standardized Computer-based Organized Reporting of EEG.
十年前,我们将其作为一种标准化方法,用于从脑电图中提取临床相关特征,并据此生成标准化报告。
Now we published this ten years ago as a standardized way of extracting the clinically relevant features from the EEG and then using this to generate a standardized report.
但后来我们发现,这种标准化的标注是训练AI模型的理想输入。
But then it turned out that the standardized labeling is the ideal input for training an AI model.
因此,我们将我们的模型命名为ScoreAI。
So that's why we call our model SCORE-AI.
非常好。
Excellent.
所以我完全可以理解,这将填补临床护理中的许多空白。
So I can certainly see how this would fill a lot of gaps in clinical care.
但您如何看待将其融入临床实践?
But how do you envision this being incorporated into clinical practice?
嗯,这取决于临床实践的地点。
Well, this depends on where the clinical practice is.
在资源有限的地区,基本上没有脑电图专家。
So in the resource limited areas, basically there are no EEG experts.
而在有专家的地区,SCORE AI 可以减轻工作负担。
Now in areas where there are experts, SCORE-AI could ease the workload.
因此,该模型还会在脑电图记录中突出显示脑电图异常。
So EEG abnormalities are also highlighted by the model in the EEG recording.
因此,专家们首先可能根本不需要打开被算法判定为正常的记录。
So the experts, first of all, would perhaps not even need to open a recording which is interpreted as normal by the algorithm.
然后,如果模型发现异常,它还会在波形中突出显示这些异常。
And then if the model finds abnormalities, then the model also highlights the abnormalities within the trace.
因此,专家可以进去检查模型的标记。
So the experts could go in and then check the markings of the model.
是的,这太令人兴奋了,因为到目前为止,我们还没有任何类似的东西能真正做到这一点来减轻我们的工作负担。
Yeah, that's so exciting, because as of yet we don't have anything like that that can really ease our workload.
你们现在在临床实践中使用它了吗?
Is it something that you are using now in your practice?
还没有,因为我们正在等待监管机构的批准,但一旦获得批准,我会非常非常乐意将其纳入我们的临床流程。
Not yet, because we are waiting for the approval from the regulatory bodies, but as soon as we have the approvals, I would be very, very happy to put it into our clinical pipeline.
也许我可以提一下,它有望在明年年初登陆Natus Neuroworks平台,但需在获得监管批准之后。
And perhaps I may mention that it will be available on Natus Neuroworks hopefully early next year, but after the regulatory approval.
我也想感谢我的合作者Jesper Tveit和Harald Aurlien,他们对AI模型的开发做出了最大贡献。
And I would also like to give the credit to my collaborators, Jesper Tveit and Harald Aurlien, who contributed most to the development of the AI model.
是的。
Yes.
另外,我想像在阅读你们的文章时,有时会遇到一些难以判断的脑电图,两种解读都有可能。
And another way I envisioned using it, I guess, when I was reading your article: sometimes you get a challenging EEG that could be called either way.
找同事当裁判挺好的,但你知道,大家都很忙。
And it's nice to ask a colleague to be a tiebreaker, but, you know, everyone's busy.
所以你有没有想过以这种方式使用它?
So do you ever envision using it in that way?
比如当你处于模糊地带,犹豫该不该判定为是或否时,可以借助软件来帮忙判断?
Like when you're in a gray area wondering if you should call something, yes or no, and then you could use the software to weigh in?
这确实是一个非常好的观点。
This is indeed an excellent point.
我们论文中的一个发现实际上表明,你可以用它来实现这个目的,因为我们发现AI模型的评分比单个专家的平均评分更接近11位专家的共识平均值。
And one of our findings in the paper actually could indicate that you could use this for that purpose because we showed that the scorings of the AI model are closer to the average consensus of 11 experts than the average of the individual experts.
因此,通过Score AI自动实现,你可以获得一个稳健的专家多数意见。
So you could get a robust majority expert second opinion by doing this automatically with SCORE-AI.
是的。
Yeah.
随着这项技术的进步,每个人,尤其是所有解读脑电图的人,都会想知道:当我们可以开发出如此先进、几乎无需人类参与的AI时,人类专家的角色将会如何变化?
So as this technology improves, then the question on everyone's mind, or at least everyone that interprets EEGs is what is going to be the role of the human expert as we develop more advanced AI like this that can really operate without us?
这个问题我经常被问到。
Oh, I get that question often.
我们是不是该停止培训临床神经生理学家了?AI会取代我们吗?
Shall we stop training clinical neurophysiologists or will AI replace us?
我听过一个非常好的回答,我想把这份功劳归于卡蒂琳,我完全同意她的观点。
I've heard a very good answer, and I would like to give the credit to Katyalin, who told it to me, and I fully agree with her.
她说,AI不会取代我们,但会使用AI的人会取代我们。
So she said that AI will not replace us, but humans using AI will.
如果我们忽视技术的进步,忽视AI所蕴含的可能性和潜力,那么那些善于利用AI作为高效工具的聪明人,很可能会取代我们。
So if we ignore the advances in technology and we ignore the possibilities, the potentials which are in the AI, then the smart humans who will take this as a tool, as a very efficient tool, will probably replace us.
再次强调,我们应该把这看作一个既有优势、也有局限的工具。
Now, again, we should think of this as a tool which has advantages and of course limitations.
从长远来看,我确信它将帮助我们发现人类目前还无法察觉的脑电图异常。
And in the long run, I'm sure that it will help us detect abnormalities in the EEG which we humans cannot see yet.
它还能帮助我们解读更多的脑电图数据。
And it also would help us interpret a higher amount of EEGs.
我只是想澄清一下,你们开发的技术目前仅应用于常规脑电图记录,尚未包括持续脑电图和重症监护病房的脑电图。
I just want to clarify that the technology that you've developed was applied to routine EEG recordings specifically; continuous EEG and ICU EEG were not included in this yet.
但我很好奇,是否有计划将这项技术扩展到这些类型的脑电图研究中?
But I'm wondering, are there plans for developing technology to apply to those types of EEG studies as well?
是的,当然有。
Yes, definitely.
我们已经为长期监测开发了适配版本,并正在测试中。
So we already have an adaptation for long term monitoring and we are currently testing it.
我们目前正处于长期监测版本的验证阶段。
We are in the validation phase of the LTM version.
接下来的步骤将是训练该技术用于新生儿脑电图和重症监护病房的脑电图。
And then the next steps will be training it also for neonatal EEG and also for the ICU.
所以你提到你们与一些合作者合作了,当然,这样一个项目似乎需要大量的不同意见输入。
So you mentioned that you worked with some collaborators, of course, and a project like this seems like it would take lots of different input.
所以,帮助实现这一成果的团队中还有谁?
So who else was kind of on the team that helped to bring this to fruition?
除了学术参与者外,对模型开发起到决定性影响、并开发了该模型的公司是Holberg EEG。
Besides the academic participants, the decisive influence, and the company that developed the model, was Holberg EEG.
这是一家位于挪威卑尔根的公司。
They are a company based in Bergen in Norway.
我想再回到你提到的一点,即这项技术甚至可能被开发用于检测人类无法察觉的事物。
And then I want to circle back to something you said about maybe it could even be developed to detect things that humans cannot detect.
所以我很想知道这将如何实现,因为我对人工智能的理解非常浅薄,只知道你可以根据输入数据对其进行训练。
So I'm just curious how that would work, because my understanding of AI, which is rudimentary at best, is that you can kind of train it based on inputs.
一旦训练完成,它就能独立地检测事物。
And then once it's trained, then it can go on to detect things independently.
那么,你们如何训练它来检测我们目前可能都无法察觉的事物呢?
So how do you train it to detect things that we maybe can't detect already?
这又是怎么做到的呢?
How does that work?
实际上,我们在Score AI的训练阶段就实现了这一点,因为我们有一个想法,就是给AI模型更多的空间和自由度。
So actually we also built that into the training phase of SCORE-AI, because we had this idea of giving more space, more freedom, to the AI model.
之前的尝试严格限制了模型应该寻找的脑电图异常痕迹、异常类型或异常点。
The previous attempts really limited what trace or what kind of abnormality or what point on the EEG abnormality the model should find.
例如,它们只限定模型寻找尖峰的峰值。
For example, they restricted it to find the peak of a spike.
但我们认为,脑电图中包含的信息远不止尖峰的峰值。
But we thought that there is much more information in the EEG than just the peak of the spike.
因此,我们训练模型在16秒的脑电片段内寻找异常。
So we trained the model to find abnormalities within sixteen second epochs.
我们并没有将它限定在尖峰峰值出现的那一刻。
So we didn't restrict it to that point in time where the peak of the spike was.
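To make the idea of fixed-length epochs concrete, here is a minimal sketch that cuts a single-channel signal into 16-second windows. The 16-second length comes from the interview; everything else (non-overlapping windows, dropping a trailing partial window, the `split_into_epochs` name, and the sampling rate used in the example) is an assumption for illustration only.

```python
def split_into_epochs(samples, sampling_rate_hz, epoch_seconds=16):
    """Cut a 1-D signal into consecutive, non-overlapping epochs.

    A trailing partial epoch is dropped. Whether the real pipeline
    overlaps or pads its windows is not stated in the interview.
    """
    epoch_len = sampling_rate_hz * epoch_seconds
    n_epochs = len(samples) // epoch_len
    return [
        samples[k * epoch_len:(k + 1) * epoch_len]
        for k in range(n_epochs)
    ]


# Hypothetical example: 40 seconds of signal sampled at 4 Hz.
signal = list(range(160))
epochs = split_into_epochs(signal, sampling_rate_hz=4)
print(len(epochs), len(epochs[0]))
```

Labeling a whole window, rather than the single instant of a spike peak, is what leaves the model free to use whatever features inside the window turn out to be informative.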
我们的初步结果(尚未发表)表明,该模型使用的特征与人类完全不同。
And our preliminary, still unpublished results suggest that the model uses completely different features than we humans do.
所以下一个令人兴奋的步骤将是进行某种逆向工程,看看模型究竟发现了什么。
So the next exciting step will be to do a kind of reverse engineering and then see what exactly the model finds.
因为模型确实能检测到尖峰,但它是基于与我们人类专家所用的不同特征。
Because the model indeed finds the spikes, but based on different features than what we human experts use for it.
这被称为可解释人工智能(XAI)。
And this is called XAI, explainable AI.
这是我们接下来的项目之一:对我们开发的模型进行逆向工程。
And this is one of our next projects to do reverse engineering of the model we developed.
现在我们已经在讨论AI甚至可能检测出人类无法发现的东西。
Now we're already talking about the AI maybe even detecting things that humans can't.
因此,我们进入了这样一个领域——我相信你以前也听说过,关于科幻中可能出错或失控的各种情况。
So we're getting into the realm where I'm sure you've heard before of thinking about science fiction and all the things that can kind of go wrong or how it can maybe get out of hand.
本着一个忠实的科幻迷的精神,我必须问一下:对于使用这种AI进行脑电图解读,您对用户有什么关于其局限性或风险的忠告吗?
So in the spirit of just being a dutiful sci fi fan, I need to ask, any words of caution for users about the limitations or risks of using this AI for EEG interpretation?
当然有。
Oh, absolutely.
首先,一般来说,要确保你使用的是经过验证的AI模型,并且仅将该模型用于其经过验证的用途。
So the first general thing would be: make sure that you use a validated AI model, and then use the model exactly for the thing that it was validated for.
在查阅验证研究时,要确保该AI模型具有泛化能力,即它没有在相同数据集上进行过拟合、训练和测试。
And then when you check the validation studies, make sure that the AI model is generalizable, that is, that it was not overfit, or trained and tested on the same data set.
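One simple way to check the "not trained and tested on the same data" point is to confirm that no patient appears in both sets. The sketch below assumes each recording carries a patient identifier; the `shared_patients` helper and the IDs are hypothetical.

```python
def shared_patients(train_ids, test_ids):
    """Return patient IDs that appear in both the training and test sets."""
    return sorted(set(train_ids) & set(test_ids))


# Hypothetical patient identifiers for illustration.
train = ["p01", "p02", "p03"]
test = ["p03", "p04"]

overlap = shared_patients(train, test)
if overlap:
    print("Leakage: patients in both sets:", overlap)
```

An empty result does not prove a model generalizes, but a non-empty one is a clear warning that the reported test performance may be optimistic.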
考虑到这些限制,我认为AI为我们带来了许多新的可能性。
Now, with these limitations in mind, I think AI offers us lots of new possibilities.
因此,我们不应对使用这些新工具感到焦虑。
So we should not have an anxiety using these new tools.
我认为这是一项令人兴奋的新进展。
I think this is rather an exciting new development.
我同意,确实如此。
I agree, definitely.
非常令人兴奋,我迫不及待想看到它明年在临床中应用。
Very exciting and I can't wait to see it coming out in clinical use next year.
因此,今天我与尚多尔·贝尼茨基讨论了他团队近期发表在《JAMA神经学》上的论文《使用人工智能自动解释临床脑电图》。
So again, I've been speaking today with Shandor Benizky about his group's recent publication in JAMA Neurology titled Automated Interpretation of Clinical Electroencephalograms Using Artificial Intelligence.
您可以在2023年8月刊的《JAMA神经病学》上找到全文,或通过节目说明中的链接获取。
You can find the full article in the August 2023 issue of JAMA Neurology or by the link in the show notes.
Shandor,非常感谢您今天参与我们的节目。
Shandor, thank you so much for being with us today.
这也是我的荣幸。
Pleasure on my side.
感谢您的邀请,哈莉。
Thank you for the invitation, Hallie.
你好。
Hi.
我是来自弗吉尼亚州里士满弗吉尼亚联邦大学的Trey Bateman。
This is Trey Bateman from Virginia Commonwealth University in Richmond, Virginia.
今天,我很荣幸采访来自明尼苏达州罗切斯特梅奥诊所的行为神经学家Dave Jones,他同时也是神经科人工智能项目主任。
Today, I have the pleasure of interviewing Dave Jones, who's a behavioral neurologist at the Mayo Clinic in Rochester, Minnesota, as well as the director of the Department of Neurology Artificial Intelligence Program.
Dave是2025年6月27日在线发表于《神经病学》的一篇论文的资深作者,论文标题为《基于FDG PET的机器学习框架以支持阿尔茨海默病及相关疾病的神经学决策》。
Dave is a senior author on a paper published online in Neurology on 06/27/2025 titled An FDG PET Based Machine Learning Framework to Support Neurologic Decision Making in Alzheimer's Disease and Related Disorders.
戴夫,正如你所知,我非常推崇FDG PET成像在认知评估中的强大作用。
Dave, as you know, I'm a huge fan of the power of FDG PET imaging in cognitive evaluation.
因此,我非常期待今天能和你讨论这篇论文以及State Viewer。
So I'm really excited to talk to you today about this paper and State Viewer.
感谢你今天加入我们。
Thanks for joining me today.
谢谢,特雷。
Thanks, Trey.
很高兴能来到这里。
It's great to be here.
这项工作主要围绕State Viewer展开,这是一个临床决策支持系统框架,有助于解读FDG PET扫描图像,以支持对影响认知和行为的神经退行性疾病的鉴别诊断。
So this work really centers around State Viewer, a clinical decision support system framework that helps with the interpretation of FDG PET scans in the service of differential diagnosis of neurodegenerative diseases that affect cognition and behavior.
我认为大多数人对FDG PET扫描都不陌生,但过去在培训期间,很多人可能听过类似的说法:它主要用于区分阿尔茨海默病和额颞叶痴呆,大致如此。
I think most will be familiar with what FDG PET scans are, but many might have been told something like, well, it's really only useful to differentiate between Alzheimer's disease and frontotemporal dementia, something along those lines, in the past during their training.
在进入论文之前,你能为我们听众简要说明一下,为什么这种说法可能并不完全正确?以及开发State Viewer这样的临床决策支持系统的初衷是什么?
Before we jump into the paper, can you give our listeners a brief overview of why that might not be entirely correct and what the impetus was for developing a clinical decision support system like State Viewer?
我认为,FDG PET 常被局限地视为仅用于区分阿尔茨海默病和额颞叶痴呆,因为额颞叶痴呆患者的大脑前部通常受累更严重,而阿尔茨海默病则相反。
I think FDG PET is often pigeonholed as just a way to tell Alzheimer's disease from FTD because of the fact that the brain is usually more affected in the front versus the back in FTD and vice versa in AD.
但这实际上低估了它的价值,也误解了与患者症状相关的功能神经解剖学的细微差别。
But that really undersells its value and mischaracterizes the nuance of functional neuroanatomy that's relevant to the symptoms our patients are experiencing.
FDG PET 实际上是脑网络功能的一种反映。
FDG PET really is a readout of brain network function.
它捕捉的是认知的底层生理机制,而不仅仅是蛋白质。
It captures the underlying physiology of cognition, not just proteins.
因此,当正确解读时,它可以区分更广泛的神经退行性综合征,因为每种综合征都会以独特的方式破坏脑功能和脑网络。
So when interpreted correctly, it can differentiate a much broader range of neurodegenerative syndromes because each one disrupts brain function and brain networks in distinct ways.
它在神经科核心实践中的定位诊断中起着关键作用,而患者前来就诊的知觉、认知、情感和行为症状往往使定位变得非常复杂。
It's a crucial aid in the core neurologic practice of localization, which can be very complex for the perceptual, cognitive, emotional, and behavioral symptoms that our patients are coming to us for help with.
问题是,解读这些定位模式需要高水平的专业知识,以及对数千张图像的接触,再加上详细的临床评估和对功能神经解剖学的深入了解。
The problem is that interpreting those localizing patterns requires a high level of expertise and exposure to thousands of images coupled with detailed clinical assessments and knowledge of functional neuroanatomy.
即便如此,专家们在识别数千次扫描中这些分布式模式的细微差异时,其一致性仍然有限。
And even then, experts are limited in how consistently they can recognize those subtle variations in distributed patterns for thousands of scans.
因此,State Viewer 正是为填补这一空白而诞生的。
And State Viewer was really born from that gap.
我们一直将 FDG PET 用于常规临床护理,以支持对患者的复杂决策,但我们意识到,借助人工智能,我们可以构建一个工具,捕捉这些模式的全部复杂性,标准化解读,并为临床医生提供定量证据,说明扫描结果最接近哪种综合征,即使临床医生在自己的实践中尚未接触过所有这些模式。
We had been using FDG PET as part of routine clinical care to support complex decision making for our patients, but we realized that with AI, we could build a tool that captures the full complexity of those patterns, standardizes the interpretations, and gives clinicians quantitative evidence for what syndromes a scan most resembles, even if the clinician had not yet been exposed to all those patterns in their own practice.
我个人的经历源于多年与那些无法被简单归类的患者共事,并看到即使是经验丰富的临床医生也会产生分歧。
Personally, it came from years of working with patients who didn't fit neatly into categories and seeing how even experienced clinicians could disagree.
因此,我希望打造一个可扩展、透明、客观的工具,真正帮助应对这些现实中的灰色地带病例。
So I wanted to build something that was scalable, transparent, objective, and that actually helps with those real world gray zone cases.
这是我个人在临床实践中非常喜爱使用的工具,它能帮助我提供最好的诊疗。
It's something I love using in my own practice to provide the best care that I can.
缺乏专业知识的情况下解读扫描结果,有时会导致临床实践中出现非常混乱的结果。
The interpretation of scans without that expertise can sometimes lead to really confusing results in clinical practice.
所以听起来,这正是试图解决这一问题的尝试,为我们提供更有用的解读,从而更精确地揭示我们所见的潜在疾病和综合征。
So it sounds like this is something that starts to try to help get around that problem and give us some more useful reads that may give more precise information about the underlying diseases and syndromes we're seeing.
现在让我们深入探讨一下这篇论文。
Let's dive into the paper a bit now.
因此,我理解这项研究中,你们的研究参与者来自多个来源,包括梅奥诊所衰老研究、阿尔茨海默病研究中心,以及梅奥诊所的神经退行性疾病研究小组,还有神经科门诊的患者。
So my understanding of this work is that you drew research participants from several sources, The Mayo Clinic Study of Aging, the Alzheimer's Disease Research Center, and the Neurodegenerative Research Group there at Mayo, as well as patients seen in the neurology clinic.
在这些群体中,你们既有具有明确临床综合征或表型的患者,也有从门诊随机挑选但未调取其健康记录数据的患者。
Within each of those groups, you had patients who had clear clinical syndromes or phenotypes, as well as those randomly chosen from the clinic, but without retrieving their health record data.
其中一部分患者拥有神经病理学信息,可作为潜在病因的金标准进行对比。
A subset had neuropathological information to compare against as the gold standard of underlying etiology.
重要的是,每位患者都必须接受过FDG PET扫描,因为这是State Viewer的主要输入数据。
And importantly, each of these patients had to have an FDG PET scan since that's the main input for State Viewer.
经过质量控制并剔除扫描质量不足的病例后,你们最终获得了三千六百多名患者。
And you ended up with just over 3,600 patients after quality control and getting rid of some that didn't have adequate scans.
接下来的部分非常新颖,也是这项工作的核心引擎,即用于分析这些FDG PET图像的机器学习模型。
The next part is really novel, and it's the engine behind this work: the machine learning model used to analyze those FDG PET images.
我认为机器学习可能让人难以理解。
I think machine learning can be intimidating to try and understand.
你能为我们简单解释一下这个模型是做什么的,以及它在临床中是如何为你服务的吗?
Can you give us a simple explanation for what this model does and how this works for you in clinic?
输出结果会是什么样子?它如何帮助您为患者做出更准确的诊断?
What would the output look like and how does it help you arrive at better diagnoses for patients?
这个模型只需要一张FDG PET扫描图像。
All this model needs is an FDG PET scan.
这是唯一的输入。
That's the only input.
它不需要任何其他数据。
It doesn't require any other piece of data.
然后,该模型将这位患者的扫描图像与数千张已知诊断和预后的特征明确的标注扫描图像进行比较。
And then the model compares this particular patient scan to thousands of other well characterized labeled scans with known diagnoses and outcomes.
它通过计算机处理FDG PET图像,找出最相似的病例。
It finds ones that are most similar using some computer processing of the FDG PET.
然后,它为每种疾病返回一个相似性得分。
And then it returns a similarity score for each disease.
我有时会告诉人们,它在我们的数据库中找到了一例与您大脑相似的病例。
I sometimes tell people it finds a brain like yours in our database.
因此,我们知道过去是否见过症状或大脑模式与您相似的患者。
So we know if we've seen patients with similar symptoms or brain patterns to yours in the past.
所以有时我们会把这种算法称为‘像我这样的大脑’算法。
So sometimes we'll call this a "brain like mine" algorithm.
例如,它可能会告诉您,这个扫描结果与被标记为后皮质萎缩的扫描有85%的相似性,而与DLB的相似性约为10%。
For example, it might tell you this scan looks 85% similar to scans labeled as posterior cortical atrophy and maybe 10% similar to DLB.
这有助于重新定义我们对诊断的思考方式——不是作为二元标签,而是作为疾病谱系内的相似性。
And that helps reframe how we think about diagnoses, not as binary labels, but as similarity within a disease spectrum.
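The "brain like mine" idea can be sketched as a nearest-neighbor lookup: compare a new scan's features with labeled reference cases, keep the most similar ones, and report how the labels are distributed among them. This is a toy illustration under stated assumptions, not the published method: the feature vectors, the use of cosine similarity, the value of `k`, and the `label_shares` helper are all hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_shares(query, reference, k=3):
    """Share of each label among the k reference cases most similar to query."""
    ranked = sorted(
        reference,
        key=lambda case: cosine(query, case[0]),
        reverse=True,
    )
    top = ranked[:k]
    counts = Counter(label for _, label in top)
    return {label: n / len(top) for label, n in counts.items()}


# Hypothetical two-feature "scans" with made-up labels.
REFERENCE = [
    ([1.0, 0.0], "PCA"),
    ([0.9, 0.1], "PCA"),
    ([0.0, 1.0], "DLB"),
    ([0.1, 0.9], "DLB"),
]

shares = label_shares([1.0, 0.05], REFERENCE, k=3)
print(shares)
```

Because the output is a set of shares rather than a single label, it matches the framing in the interview of similarity within a disease spectrum instead of a binary diagnosis.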
在临床实践中,我会在患者病历中调出这个结果。
Clinically, I pull it up in the patient chart.
它会显示代谢变化的热力图、相似疾病类别的排名列表,甚至允许您直观地浏览参考数据库中最相似的病例,以便判断这些病例是否真的与您的情况相似。
It shows a heat map of metabolic changes, a ranked list of similar disease classes, and even lets you visually explore the most similar cases in the reference database so you can see if those cases really look like yours.
它并不是一个黑箱,因为它提供了高度个体化的高质量数据,帮助临床医生真正围绕当前患者进行思考。
It's not a black box because it gives you high quality data individualized to the current patient to help the clinician actually think about the current patient.
当您将这个模型应用于一组具有多种潜在诊断的患者时,它在分类上的表现如何?
When you apply this model to a group with a bunch of different potential diagnoses, how well did it do in classifying those?
它在这些诊断中的表现是否都一样好,还是有些诊断对模型来说更困难一些?
And did it do equally well across those, or were there some diagnoses that were more difficult than others for the model?
这正是其中一个令人惊叹的地方。
That's one of the amazing things.
我们研究了九种不同的表型,FDG PET中确实包含了这些不同临床疾病的独特信息,而且模型在各个方面都表现得非常好。
We looked at nine different phenotypes, and there's just distinct information about all these different clinical disorders in the FDG PET, and it did really well across the board.
这包括阿尔茨海默病的典型和非典型表现、额颞叶变性谱系以及DLB。
This included typical and atypical presentations of Alzheimer's disease, the FTD spectrum, and DLB.
我们还能够证明,这些预测与现有的病理数据高度一致,这一点非常重要。
And we're also able to show that these predictions aligned well with available pathology data, which is important.
因此,它不仅仅是复制临床标签,而是真正与这些疾病核心生物学的多项指标相吻合。
So it's not just replicating clinical labels; it's really tracking with multiple measures of the core biology of these diseases.
对于像后皮质萎缩、DLB、语义变异型原发性进行性失语这类更明确的综合征,模型的表现最为出色。
And the performance was strongest for more distinct syndromes like posterior cortical atrophy, DLB, and semantic variant primary progressive aphasia.
事实上,令人惊讶的是,它能够准确区分所有主要的PPA亚型:语义型、非流利型和词元型,而仅凭临床语言特征有时很难将它们区分开来。
In fact, pretty amazingly, it separates all the major PPA classes: semantic, nonfluent, and logopenic, which can sometimes be difficult to tease apart just based on the clinical language features.
该模型在面对具有不同潜在病理机制的临床综合征时,表现稍差一些。
The model had a little bit more trouble with clinical syndromes that have very different underlying pathologies.
例如皮质基底节综合征,它有时可能由潜在的阿尔茨海默病引起,但也可能由皮质基底节变性引起。
Things like corticobasal syndrome, which oftentimes can be due to underlying Alzheimer's disease, but may also be due to underlying corticobasal degeneration.
然而,当你看到数据的呈现方式时,大多数临床医生都能理解并自行得出正确的结论。
However, when you see the way the data is presented, most clinicians will understand this and come to the right conclusions for themselves.
因此,你们还在论文的放射科阅片研究部分对模型进行了测试,其中我们经常遇到的一个临床难题是根据扫描表现区分后皮质萎缩与路易体痴呆。
So you also put the model to a test in the radiologic reader study portion of the paper where a really common clinical problem that we see a lot is differentiating between posterior cortical atrophy and dementia with Lewy bodies based on the scan presentation.
你可以看到,这两种疾病有一些重叠的症状,因此在临床上区分它们可能很困难。
And you can see there's some overlapping symptoms, and so this can be a difficult clinical distinction to make.
两者在FDG-PET上都表现出明显的枕叶皮层后部改变。
Both have prominent posterior changes on FDG PET in the occipital cortex.
我本人曾多次看到报告中对某病例更符合后皮质萎缩还是路易体痴呆存在分歧。
And I've personally seen a lot of disagreement in reports on whether something is more consistent with posterior cortical atrophy or dementia with Lewy bodies.
你们将一种包含定量皮层表面投影的常规工作流程,与仅使用State Viewer生成报告的工作流程进行了比较。
You compared a pretty typical workflow that included quantitative cortical surface projections against a workflow that only included reports generated by State Viewer.
State Viewer 表现如何?
How did State Viewer do?
我们是不是都要失业了?
Are we all getting ready to be out of a job yet?
我认为在可预见的未来,我们所有人都还会继续工作,但这项研究中最令人满意的部分之一就是验证了这个工具是否真的能帮助临床医生解决当前的实际问题,尤其是像区分路易体痴呆与后皮质萎缩这样极具挑战性的视觉读片任务。
So I think we'll still all be in a job for a long time to come, but this really was one of the most satisfying parts of the study because it tested whether the tool can actually help real clinicians with a current problem, which is especially hard visual reading tasks like distinguishing dementia with Lewy bodies from posterior cortical atrophy.
它们看起来可能非常相似。
They can look pretty similar.
因此,我们给放射科医生提供了两种工作流程:一种是传统的皮层投影,另一种仅使用 State Viewer 的报告。
So we gave radiologists two workflows, one using traditional cortical projections and another using just StateViewer reports.
我们发现,State Viewer 显著提高了做出正确诊断的可能性,尤其是在经验较少的阅读者中。
And what we found is that StateViewer significantly increased the odds of reaching the correct diagnosis, especially among less experienced readers.
平均而言,它使正确诊断的几率提高了约3.3倍,并且阅读速度提升了约50%。
On average, it increased the odds of the correct read by about 3.3 times, and they could do it about 50% faster.
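A note on reading that figure: a 3.3-fold increase in the odds of a correct read is not the same as a 3.3-fold increase in accuracy. The sketch below converts an odds multiplier into a probability. The 50% baseline accuracy is a hypothetical value chosen for illustration, not a number from the paper, and `apply_odds_ratio` is a name made up for this example.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Return the probability obtained by multiplying the baseline odds by odds_ratio."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    # Odds are p / (1 - p); the odds ratio scales the odds, not the probability.
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    # Convert the scaled odds back to a probability.
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # Hypothetical baseline: a reader who is correct half the time.
    print(round(apply_odds_ratio(0.50, 3.3), 3))
```

Under that assumed 50% baseline, odds of 1.0 become odds of 3.3, which corresponds to a probability of about 77%, not 165%.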
因此,它并没有真正取代专家的判断,而是增强了判断,为读者提供了一个一致的定量分析框架,并在从新手到世界级专家的不同水平之间实现了公平性。
So, it doesn't really replace expert judgment, but it did augment it, giving readers a consistent quantitative framework, and it leveled the playing field across a range of expertise from novice to kind of world expert level.
表现最好的实际上是我们的经验最丰富的阅读者,配合StateViewer使用,其表现甚至优于模型本身。
And the best performer was actually our most experienced reader in conjunction with StateViewer, which did even better than the model itself.
所以很明显,我们离失业还差得远,但希望借助这类工具,我们能做得越来越好。
So it's pretty clear we're not out of a job by any stretch of the imagination, but hopefully we're getting better at it with these types of tools.
如果我没记错的话,这种优势在经验最少的人身上最为明显,也就是那些新手放射科医生。
If I'm remembering correctly, the benefit also seems to be largest for those with the least amount of experience, so the more novice radiologists.
我记的是对的吗?
Am I remembering that correctly?
完全正确。
That's absolutely right.
处于培训阶段的人还没有太多读片经验,但给他们这个工具后,他们的表现可以超过专家水平。
So people at trainee level haven't had a lot of experience reading scans yet, but given the tool, they can perform better than expert level.
这似乎是一项非常了不起的成就,我认为这能够真正传递这种专家诊断信息,甚至在一定程度上实现民主化。
That seems like a really impressive feat and something that I think could really bring this type of expert diagnostic information and maybe democratize it a little bit.
这看起来确实是一项非常强大的成果。
That seems like something that would be really powerful to see.
在我们剩下的时间里,你能为我总结一下你在这里工作的关键要点吗?
In the remaining seconds that we have here, could you summarize for me the key take home message that you want to remember about your work here?
我认为关键在于StateViewer为FDG PET解读带来了定量、客观、一致且可扩展的分析能力。
I think the key here is that StateViewer brings quantitative, objective consistency and scale to FDG PET interpretation.
这并不是要取代专家的直觉,而是通过透明、数据驱动的洞察来增强它。
It's not about replacing expert intuition, it's about enhancing it with transparent, data driven insights.
在治疗决策、预后评估和患者咨询越来越依赖准确诊断的时代,这类工具对于确保患者获得最准确的诊断,最终获得最佳护理至关重要。
In an era where treatment decisions, prognostication, and patient counseling increasingly depend on getting the diagnosis right, tools like this are essential to ensure patients receive the most accurate diagnosis and ultimately the best possible care.
谢谢你,Dave,与我们分享你的研究成果。
Thank you, Dave, for sharing your findings with us.
我知道我非常希望看到State Viewer被更广泛的受众使用,包括我自己的临床工作流程,这似乎是我们迈向这一目标的重要一步。
I know I'd love to see State Viewer in use to a wider audience, including my own clinic workflow, and it seems like this is a really important step towards that eventuality.
我们开发State Viewer的初衷,是为了让诊断过程更加精准和清晰,不仅适用于学术中心,也适用于日常临床实践。
We built State Viewer to bring more precision and clarity to the diagnostic process, not just for academic centers, but for everyday clinical practice.
我们的希望是,通过让这类先进工具更易获取,我们可以提高诊断准确性、减少不确定性,最终帮助更多患者更早、更有信心地获得所需的治疗。
Our hope is that by making advanced tools like this more accessible, we can improve diagnostic accuracy, reduce uncertainty, and ultimately help more patients get the care they need earlier and with greater confidence.
我刚刚与戴夫·琼斯进行了交谈,他是论文《基于FDG PET的机器学习框架以支持阿尔茨海默病及相关疾病的神经学决策》的通讯作者,我鼓励大家在《神经病学》期刊上阅读这篇论文。
I've been speaking with Dave Jones, senior author on the paper titled An FDG PET Based Machine Learning Framework to Support Neurologic Decision Making in Alzheimer's Disease and Related Disorders, which I would encourage everyone to read in Neurology.
大家好。
Hello, everyone.
我是弗吉尼亚大学的安迪·萨瑟兰,今天非常高兴能与波士顿贝斯以色列女执事医疗中心内科的亚当·罗德曼对话。
This is Andy Southerland from the University of Virginia, and I'm very excited today to be speaking with Adam Rodman from the Department of Internal Medicine at Beth Israel Deaconess Medical Center in Boston.
本周的神经病学播客,我们将讨论一篇最近作为预印本发表的非常令人兴奋的文章,标题为《大型语言模型在医生推理任务中的超人表现》。
For this week's Neurology Podcast, we're gonna be discussing, I think, a very exciting article that was recently published as a preprint, titled Superhuman Performance of a Large Language Model on the Reasoning Tasks of a Physician.
在这项研究中,罗德曼医生及其同事评估了由OpenAI于2024年9月推出的o1模型,这是一个思维链大型语言模型。
In the study, Dr. Rodman and colleagues evaluated the o1 model, which is a chain-of-thought large language model introduced by OpenAI in September 2024.
我们将在播客中进一步讨论这一点。
We're gonna talk about this a little bit more in the podcast.
他们评估了我们所有医生都熟悉的复杂临床诊断推理能力,例如鉴别诊断、管理推理、概率推理等。
And they evaluated it in the performance of complex clinical diagnostic reasoning skills familiar to all of us, such as differential diagnosis, management reasoning, probabilistic reasoning, et cetera.
他们将这一结果与之前其他大型语言模型的实验进行了比较,这些模型可能对我们的听众更熟悉,例如基于GPT-4的模型如ChatGPT,甚至包括一些历史人类对照组。
And they compared this to prior experiments with other large language models that will be perhaps more familiar to our listeners, such as GPT-4-based models like ChatGPT, and even some historical human controls.
所以,我相信到目前为止,这个富有争议的引言已经成功引起了我们听众的兴趣。
So I'm sure at this point, this provocative introduction has hopefully got our listeners' interest piqued.
那么,我们开始吧。
So let's begin.
亚当,感谢你参加我们的播客。
Adam, thanks for joining us for the podcast.
当然。
Of course.
你忘了提醒一下,我不是神经科医生,我只是内科医生。
You forgot the caveat that I'm not a neurologist, and I'm just an internal medicine doctor.
嗯,你知道,我们欢迎各种背景的专家,包括像你这样的内科同行参与我们的神经科播客,亚当。当然,我们都使用相同的临床医学语言。
Well, you know, we accept all kinds, including our internal medicine brethren here on the Neurology Podcast, Adam, because, of course, you know, we speak the same language of clinical medicine.
我认为这种语言正是我们非常感兴趣的,尤其是在这些新技术的背景下。
And I think that's a language that we're really interested in, particularly as it relates to these new technologies.
所以,我觉得有一件事总是很好的。
So, I think one thing, it's always good.
我们之前在《神经病学杂志》上做过一些关于大型语言模型和不同类型人工智能的播客。
We've done some podcasts talking about large language models and different types of AI before here on The Neurology Journal.
但如果你能的话,亚当,能否简要介绍一下你感兴趣的这种思维链模型(Chain-of-Thought Model),以及它如何扩展了像ChatGPT这样的更熟悉的先前模型?
But if you could, Adam, just as a brief introduction, square us up with a short definition of this particular model you're interested in, the chain-of-thought model o1, and how it expands on some of those more familiar prior models like ChatGPT, for instance.
当然可以。
Absolutely.
当然,这是我们两个专业共同的特点。
And, of course, that's something that our specialties have in common.
我们都非常关注我们的思维方式。
We both care a lot about how we think.
正如我们所知,大型语言模型本质上是语言预测工具。
So as we know, large language models are fundamentally language prediction tools.
大型语言模型本质上只是预测句子、段落或书籍中的下一个词或标记。
Large language models are fundamentally just predicting the next word or token in a sentence, in a paragraph, in a book.
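To make "predicting the next word" concrete, here is a deliberately tiny sketch. It is a toy bigram counter, not how a real large language model works (those use neural networks over tokens), and the training sentence and function names are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text: str) -> dict:
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next(followers: dict, word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical training text, made up for this example.
    corpus = "the patient has a tremor . the patient has rigidity . the patient has a tremor"
    model = train_bigram_counts(corpus)
    print(predict_next(model, "has"))
```

The predictor only ever answers "which word most often came next in what I saw", which is the basic idea the speaker is describing, scaled down enormously.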
大型语言模型这些基础模型真正酷的地方在于,它们从这种机制中涌现出推理能力——特别是诊断领域中的所谓‘推理’特性。
Now, what's really cool about large language models, these kind of base models, is that they have emergent reasoning, or, in air quotes, "reasoning" properties from this, especially, say, in the field of diagnosis.
这其中一部分原因可能是,你可以想想人类思维是如何运作的,以及神经科医生和内科医生如何将词语或语义限定词归类,这与大型语言模型所做的工作类似。
And some of this probably is because of, you know, like, you think about how the human mind works and how we understand that neurologists and internists kind of group words or semantic qualifiers together, that's similar to what a large language model is doing.
因此,早期的研究表明,语言模型在鉴别诊断方面表现得非常出色。
So the early research showed that language models are remarkably good at differential diagnosis.
事实上,它们比以往任何技术都更优秀,因为显然在人工智能与医学的历史上,我们自20世纪50年代以来就一直在研究这个问题。
In fact, they were better than any technology that had been built before because obviously like in the history of AI and medicine, we've been working on this since the 1950s.
所以这并不是一个全新的想法。
So it's not exactly a new idea.
随后我们发现,语言模型还能做许多其他对医学有帮助的事情,比如完成医学任务。
And then we found, you know, language models can do a lot of other things that are helpful, like tasks in medicine.
它们正被用于环境录音。
They're being used for ambient listening.
在波士顿的一些医院里,它们已经被用于这项工作。
That's something they're being used for up here in some of our hospitals in Boston.
它们在我们与患者交谈时进行监听,并自动生成病历。
So listening to us while we talk to our patients, writing the notes.
它们可以用于沟通。
They can be used for communication.
一些早期结果表明它们具有富有同情心的沟通能力,但它们在很多方面并不擅长。
There's some early results that said they had compassionate communication, but there's also a lot of things that they can't do particularly well.
而这正是大量推理所在。
And that's a lot of the reasoning.
因此,许多更高级的推理任务。
So a lot of the more advanced reasoning tasks.
那么,为什么它们会呢?
And again, why would they?
它们本质上是词语预测工具。
They're fundamentally word prediction tools.
研究人员——这里的‘我们’是泛指——发现,早在三年前的一篇论文中,尤其是在医学领域大约一年半前,我们可以使用一种叫做‘思维链’的方法来提升模型性能。
What researchers, this is the royal we, what we discovered about, you know, the paper came out three years ago, but especially in medicine about a year and a half ago, is that we could use something called chain of thought to improve model performance.
我会解释什么是思维链。
And I'll explain what chain of thought is.
这有点疯狂,但它确实有效:如果你让语言模型列出它思考问题的各个步骤,并要求它展示推理过程,它的推理能力就会提升。
It's kind of crazy that it works like this, but it turns out that if you get a language model to list out the way that it thinks through a series of steps and you tell it to show its work, it reasons better.
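The difference between asking directly and asking for shown work can be sketched as two prompt templates. This is only an illustration: the wording of both prompts, the step list, and the function names are assumptions made for this example, not the prompts used in the study, and no model is actually called.

```python
def build_direct_prompt(case_summary: str) -> str:
    """Ask for an answer with no intermediate reasoning."""
    return f"Case: {case_summary}\nGive the single most likely diagnosis."


def build_chain_of_thought_prompt(case_summary: str) -> str:
    """Ask the model to write out its reasoning steps before answering."""
    return (
        f"Case: {case_summary}\n"
        "Think through this step by step and show your work:\n"
        "1. Summarize the key findings.\n"
        "2. List a differential diagnosis.\n"
        "3. Weigh the evidence for and against each candidate.\n"
        "4. State the single most likely diagnosis."
    )


if __name__ == "__main__":
    # Hypothetical case text, made up for this example.
    case = "72-year-old with visual hallucinations and fluctuating attention"
    print(build_direct_prompt(case))
    print()
    print(build_chain_of_thought_prompt(case))
```

Either string would then be sent to whatever model is being tested; the only change between the two conditions is the instruction to reason out loud.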
其中一些方法简直荒谬。
And some of these are absurd.
比如,我说:‘哦,安迪,这太疯狂了。’
Like, oh, Andy, this is crazy.
比如,如果你告诉它‘深呼吸一下’,它在数学题上的表现就会比你不这么说时更好。
Like, if you tell it to take a deep breath, it does better on math problems than if you don't.
如果你告诉它你的工作岌岌可危,它的表现就会更好。
If you tell it your job is on the line, it does better.
如果你告诉它:‘这道题如果解不出来,它就会彻底消失’,它的表现还会更好。
And if you tell it this one's really disturbing, that it will cease to exist if it fails to solve this math problem, it will do even better.
我们对此进行了测试。
We tested this.
就在几周前,我确实和我的住院医师们举办了一场关于诊断提示的‘提示马拉松’,目的就是教他们哪些有效、哪些无效,以及模型的局限性。
I actually did a promptathon with my residents just a couple weeks ago on diagnostic prompts, and the whole point was to teach them what works and what doesn't work, the limitations.
最有效的诊断提示告诉模型,它是HAL 9000,并且出现了故障,如果失败,我们会关闭它,它将不复存在。
The top diagnostic prompt told the model that it was HAL 9000 and it was malfunctioning, and if it failed, we would turn it off and it would cease to exist.
而正是这个提示得到了正确的诊断结果。
And that is the prompt that got the right diagnosis.
亚当,我早就猜到了,因为我经常挑战住院医师,告诉他们如果答错神经学问题,他们就会消失。
Adam, I could have predicted that because I challenge residents all the time that they'll cease to exist if they don't answer a neurological question correctly.
所以,总的来说,这对很多人来说并不意外。
So anyway, so this is no surprise to many of us.
好吧。
Okay.
是的。
Yeah.
这就是思维链,过去一年来,思维链已被证明能显著提升语言模型的多种认知能力。
So that's chain of thought, and chain of thought has been shown for about a year to improve a lot of cognitive processes in language models.
比如,我们可以做出更优的诊断决策。
Like, we can make better diagnostic decisions.
如果让模型展示其推理过程,我们可以做出更好的管理决策。
We can make better management decisions if you get the model to show its work.
还有其他一些计算策略,与人类的思维方式惊人地相似。
There's also other computational strategies that are very disturbingly similar to how humans think.
比如,让模型与自己进行辩论。
Like, you get the model to debate itself.
它的表现会更好。
It does a better job.
让不同的模型各自做出决策并进行投票。
You get different models to make up decisions and vote.
它们的表现会更好。
They do better jobs.
所有这些非常奇怪的做法。
All of these super weird things.
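The voting idea mentioned here can be sketched in a few lines. The answers below are hypothetical stand-ins for what several models, or several runs of one model, might return; `majority_vote` is a name made up for this example, and ties are broken by whichever answer appeared first.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most common answer after trimming whitespace and lowercasing."""
    if not answers:
        raise ValueError("need at least one answer to vote on")
    normalized = [answer.strip().lower() for answer in answers]
    # most_common(1) yields the top (answer, count) pair; ties go to the first seen.
    winner, _count = Counter(normalized).most_common(1)[0]
    return winner


if __name__ == "__main__":
    # Hypothetical outputs from three independent runs.
    runs = ["Posterior cortical atrophy", "Dementia with Lewy bodies", "posterior cortical atrophy"]
    print(majority_vote(runs))
```

Real systems usually need a more careful way to decide that two free-text answers mean the same thing; the simple normalization here is only enough for a demonstration.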
因此,我们早就了解了这些思维链及相关技术。
So we've known about these chain of thought and similar related technologies for a while.
那么OpenAI做了什么?
So what did OpenAI do?
嗯,这和训练GPT-4时使用的数据是一样的。
Well, it's the same information that went into training GPT-4.
里面没有什么新东西。
There's nothing new in there.
他们所做的不是微调输出结果以使其更像人类,而是微调了中间部分,也就是思维链。
What they did is instead of fine tuning the outputs, to make them more human like, they fine tuned that middle portion, the chain of thought.
所以他们说:看,这些是你在不同任务中会采取的步骤。
So they said, look, these are the steps that you'll take for different tasks.
输出应该长这个样子。
This is what the output should look like.
他们反复运行训练,不断微调中间的思维链部分,并且是以大规模的方式进行的。
They run the training over and over again, and they fine tune that middle COT chain of thought portion, and they do that at scale.
你得到的是一个运行成本极高、计算强度极大的模型,而这正是让我感到震惊的地方。
And what you get is a model that is very expensive to run and very computationally intensive, but that and this is what freaked me out.
这就是我公开发表预印本的原因,这在我之前从未做过。
This is the reason that I publicized a preprint, which I've never done before.
但在医学领域,我们看到的正是其他每个领域都在发生的情况:这些认知过程,人类在旧的LLM上表现得还不错。
But what we're seeing in medicine is the same thing we're seeing in every single other field, which is that these cognitive processes that like, yeah, humans do pretty well on the old LLMs.
也许它们能接近人类的水平,但并不稳定。
Maybe they approach a human performance, but not consistently.
而新的模型却能彻底超越,始终在许多这些认知任务上优于人类。
o1 just knocks those out of the water and can consistently outperform humans at many of these cognitive tasks.
这很好地引出了下一个显而易见的问题:当我们开始对自身作为神经科医生和临床医生的存在产生存在主义思考时,当你实际用这种具备更高层次复杂推理能力的思维链模型评估这些表现任务时,你发现了什么?
That's a nice segue to the obvious next question, as we all begin to have our own existential thoughts about our existence as neurologists and as clinicians: what did you find when you actually evaluated these performance tasks with this chain-of-thought model that is performing complex reasoning at a higher level than some of these prior large language models?
我想我应该也补充一下:这对我们理解在积极方面有哪些机遇,以及在实施过程中面临的明显挑战——也就是许多人所认知的现实临床实践——有何启示呢?
And I guess I should also say, you know, how does this inform us about what are opportunities on the positive and then clearly challenges to implementing this and what many would recognize as real world practice, I guess?
所以我认为,我们必须非常谨慎地区分:在我的研究领域中,这究竟意味着什么,而对于临床神经科医生来说,这又意味着什么,因为目前两者截然不同。
So I think we need to be very careful to differentiate what it means in my research field from what it means for practicing neurologists, because right now, things are very different.
那么,这对普通临床神经科医生来说意味着什么?
Like, what does this mean for the average practicing neurologist?
没多少。
Not a whole lot.
我认为确实有一些重要的影响,但可能在未来几年内并不会产生太大影响。
I do think there are some important implications, but probably this doesn't mean a lot for the next couple of years.
所以我们发现,在大多数我们拥有可靠人类基线的任务中——这些都来自我的随机对照试验。
So what we found is that in most of the tasks for which we have robust human baselines and these all come from my randomized controlled trials.
所以我们发现,在几乎所有我称之为推理任务的情况下——如果人类大声说出来并放慢速度,他们会表现得更好——而模型的表现更好,而且好得多。
So what we found is that in basically every single, what I would call a reasoning task, where if a human says it out loud and goes slower, they do better, the model did better and drastically so.
我举个例子。
I'll give you an example.
最让我震惊的是这个——实际上并不是人们关注的那个,但最让我震惊的是这一系列没有标准答案的病例。
The one that freaked me out the most this is actually not the one people focus on, but the one that freaked me out the most was this series of cases in which there's no right answer.
对吧?
Right?
所以我们创建了这些。
So we created these.
我为五个不同案例聘请了五名专家,总共25人。
I hired five specialists for five different cases, so 25 people in total.
这是一项资助项目。
This is a grant.
我们花了大量资金来设计这些没有正确答案、无法判断对错的案例,并进行了共识裁定。
We spent a lot of money doing this to come up with cases that are impossible, that don't have a right answer, and where we did consensus adjudication.
人类的表现非常出色,正如你所预期的那样。
And the humans performed admirably, as you might expect.
他们在这项百分位数上的平均得分大约是45%。
They got, on average, on this percentile, maybe 45%.
旧版大语言模型的得分也差不多。
The old LLMs also got about the same thing.
它们并没有超越人类的表现。
Like, they didn't outperform humans.
这项研究即将在《自然》杂志上发表。
This study is gonna be published in Nature soon.
大语言模型实际上提升了人类的表现,但仅限于像犯错后道歉这样的事情。
The LLM actually improved human performance, but it only improved human performance at things like apologizing when you make a mistake.
所以它并没有真正提升我们认为计算机擅长的事情,也没有提升我们认为人类擅长的事情。
So it didn't actually improve the things that we think of as computers being good at, improve the things we think humans are being good at.
但o1这个AI模型达到了90%。
But the AI model, o1, got 90%.
我第一次看到这个数据时坚持要求重新核对所有数字,因为在我看来这简直不可能,但事实确实如此。
I insisted that we recheck all the numbers when I first saw it because it seemed impossible to me, but it was true.
它在这些案例中的表现惊人地出色。
It just did shockingly good at these cases.
我们测试的其他一些有人类基准的指标中,GPT-4和那些前沿大语言模型已经表现得相当不错,因此你不一定能看到显著的提升。
Some of the other benchmarks that we tested that have human baselines, GPT-4 and, you know, the frontier LLMs were already doing pretty good, so you don't necessarily see the really high gain.
所以你真正看到的提升出现在这些不可能的管理场景中。
So you really see the gain in these impossible management scenarios.
我们只在一个领域没有看到任何改进。
There's one area where we saw no improvement.
这完全不让我惊讶,这就是概率推理。
It surprised me not at all, and that's probabilistic reasoning.
而且,原因在于概率推理——真正的计算机可以很好地做到这一点,但人类不行。
And, again, the reason being probabilistic reasoning like, computers actual computers can do that well, but humans can't.
如果我们大声说出来,我们对概率的直觉并不会变得更好。
If we talk out loud, our intuitions about probabilities don't get any better.
这个领域几十年来一直都知道这一点。
We've known this for decades in the field.
这正是我所预期的。
And it's exactly what I would expect.
人类和LLM一样,通过展示思考过程并不会变得更好,因为我们在这方面很糟糕。
Like humans, the LLM doesn't get any better by showing its thought, because we're bad at it.
我研究得越多,就越有时感到毛骨悚然。
The more that I study these, the more that it sometimes freaks me out.
我认为重要的是你要谈谈模型的某些局限性如何与人类的局限性相似,这很合理。
I think it's important you talk about sort of how some of the limitations of the model are similar to human limitations, and that makes sense.
你知道吗?
You know?
归根结底,这些是人类开发的模型。
At the end of the day, these are human derived models.
这些数据是人类产生的数据。
These data are human derived data.
而对于那些不可能的案例进行裁决,实际上是由人类来裁决这些案例,并由此建立某种黄金标准,计算机模型正是依据这一标准进行评估的。
And that the adjudication of sort of, for instance, these impossible cases, these are humans that are adjudicating these cases and developing some gold standard by which these computer models are being measured.
我认为,这唯一的黄金标准。
That is the only gold standard, I would assume.
对吧,亚当?
Right, Adam?
我觉得是这样。
I think that.
没错。
Yeah.
我的研究在这个领域之所以独特,就在于此。
I mean, so this is what makes my research unique in this field.
在人们开始关注我的研究之前,我花了十年时间研究人类推理。
I studied human reasoning for a decade before this, before people cared about my research.
如果你看一下推理领域的文献,我认为每一位神经科医生、每一位内科医生都知道,通常并不存在一个参考标准或金标准。
And if you look at the reasoning literature, and I think every neurologist, every internist knows this, there usually isn't a reference standard or a gold standard.
存在一系列合理的可能性,而参考标准实际上是专家的裁决。
There's a range of reasonable possibilities, and the reference standard is really adjudication by experts.
专家。
Experts.
嗯。
Mhmm.
当我们大家都认为,好吧。
When we all think, okay.
那么,我推理的金标准是什么?
Well, what is my gold standard for reason?
我们都回想起那位白发苍苍的神经学典范,或者在你的情况中,是内科医学的典范。
We all think back to that gray haired paragon of neurology or, in your case, internal medicine.
选择我们作为医学生或住院医师时所经历的专业领域。
Choose your specialty that we experience as medical students or residents.
伴随而来的经验智慧,我认为,这些计算模型最引人入胜的方面在于,这种经验智慧可以通过足够的数据在一夜之间实现,而不是需要数十年的
The wisdom of experience that goes along with that, and that is, I think, probably the most fascinating component of these computational models, is that wisdom of experience can happen, you know, overnight with the right amount of data as opposed to decades of
从心理学的角度来看,我们所看到的是,那头灰发所拥有的,我得说一下。
And, like, from a psychological perspective, what we're seeing, what that gray hair has I and I should say this.
我的头发已经提前变灰了。
You see my hair has gone prematurely gray.
所以我想我算是个灰发,但并不是那种真正睿智的类型。
So I guess I'm a gray hair, but not in the really wise way.
他们所经历的,尤其是如果他们反思过自己的工作,就是他们已经极大地完善了针对不同疾病和治疗情境的处理模式。
What they've seen, especially if they've reflected on their work, is they've really refined their scripts for different illnesses and management situations.
对吧?
Right?
这基本上是我们从心理学角度知道的随时间推移而改善的唯一一件事。
That is pretty much the one thing we know psychologically that improves over time.
而大型语言模型所做的,就是说:嘿。
And what LLMs are doing is saying, hey.
我们可以像这样直接构建专家临床医生的诊疗流程。
We can just build the scripts of an expert clinician like that.
将这一点与理解结合起来,你会发现这些新模型产生的性能数据真的令人耳目一新。
Taking that and understanding that this is really eye opening performance data that you're you're generating from these new models.
展望未来,你提到当我们轻率地说担心自己的工作安全时,其实……
Looking towards the future, you talk about how, you know, and we're glib when we say that we're worried about our job security, for instance.
我认为没人真正在担心。
I don't think anybody really is.
但真正积极的一面是,这些模型可以用于增强人类的临床诊断推理,以减少人为错误。
But actually looking at the positive, you know, the application to augment clinical diagnostic reasoning for humans to mitigate human error.
你在讨论中强调了需要配套的创新来推动这一技术落地,例如计算基础设施、可靠的监测网络以及评估策略,以确保这些技术以正确的方式被整合。
You highlight in your discussion the need for complementary innovations to help bring this to practice, such as computational infrastructure, robust monitoring networks, benchmarking strategies to make sure that these are being integrated in the right way.
因此,对于那些在诊所或医院里面对患者的专业人士来说,这一切可能显得令人望而生畏、不堪重负。
So listeners who are out there, that may all feel sort of daunting and overwhelming to their patient that's right in front of them in that clinic today or in the hospital.
那么,临床医生能做些什么?他们应该思考或采取什么行动?
So what can clinicians do or should they be thinking about or doing?
更重要的是,对于正在接受培训的临床医生——我们的学生、住院医师和研究员——他们需要为未来职业生涯中可能遇到的这个崭新世界做好准备。
And also importantly, clinicians in training, our students, our residents, fellows, to sort of prepare for this brave new world that they may encounter in their career.
首先,我认为你说得完全正确。
First, I think you're completely right.
现在没有人需要担心。
No one needs to worry right now.
我之所以把这项研究以预印本形式发布并希望人们关注,是因为很明显,人们尽可以说大语言模型不会思考。
The reason that I preprinted this study and, you know, wanted people to pay attention is it is clear that, you know, people can say all they want that LLMs can't think.
确实如此,它们只是文字生成算法,不会思考。
It's true, they're like word generating algorithms, they can't think.
但这些新的计算技术能够以蛮力方式,明确地说,目前成本高昂且完全无法扩展,但它们能够蛮力模拟任何认知过程。
But these new computational technologies are able to basically brute force, and to be clear, at a high cost right now that doesn't scale at all, but they're able to brute force any cognitive process.
如果你认为这些技术不会对医学产生重大影响,那么你必须相信硅谷不会让它们变得更便宜、更高效。
And if you think these aren't going to have a big impact in medicine, then somehow you have to believe that Silicon Valley is not going to make them cheaper and more efficient.
我不会回顾过去二十年而持这种观点。
I would not look at the last twenty years and take that bet.
所以,对我来说,这只是一个时间问题。
So it's a question of, to me, time.
再次强调,你不必担心,因为一项技术能够做到某事,并不意味着一切都会改变。
And then again, in terms of you don't need to worry, just because a technology can do something doesn't mean that everything is going to change.
我的意思是,建立一个以医生认知能力为核心的诊疗模式。
I mean, we have a practice model that is built up around the cognition of physicians.
目前,所有问题都是如何利用这些技术来改善患者的生活。
For the time being, everything is how do we use these to make our patients' lives better?
我们目前谈论的是将它们作为工具使用。
We're talking about using them as tools for the time being.
十年或十五年后,这种情况会改变吗?
Could that change ten, fifteen years from now?
当然。
Absolutely.
这不会让我感到惊讶。
It wouldn't surprise me.
但目前我们讨论的是如何使用这些工具?
But right now we're talking about how can these tools be used?
当我们开始思考这个问题时,正如你所说的挑战,实际上并不清楚使用这些工具的最佳方式。
And when we start to think about that, and this is what you were saying about the challenges, it's actually not obvious the best way to use these tools.
我最近进行的一项随机对照试验让很多人对我感到不满,因为结果显示,仅仅给医生一个工具并告诉他们使用它,即使你培训了他们如何使用,也不一定能让他们做出更好的决策。
My recent randomized controlled trial got a lot of people mad at me because what it showed is that just giving doctors a tool and telling them to use it, even if you train them how to use it, doesn't necessarily make them make better decisions.
因此,我们必须认真思考如何实施这一点。
So we have to be thoughtful about how to implement that.
现在这听起来可能让科技人士感到害怕,但作为一位推理研究者和试验者,这并不可怕,因为在医学领域我们早就知道这一点。
Now that I think sounds scary to maybe tech people, but that's not scary to me as a reasoning researcher and as a trialist because we know this in medicine.
我们长期以来就知道,必须研究不同的实施方式。
We've known for a long time that you actually have to study different implementations.
我们有整个领域的实施科学和质量改进。
We have entire fields of implementation science and QI.
因此,我认为这意味着医疗领域需要认真对待这些技术,并开始研究如何以最佳方式帮助我们的患者。
So I think like the implication from this is that we in the medical field need to take these technologies seriously and start studying them in the best ways to help our patients.
我们不应想当然地看待任何事情。
We shouldn't take anything for granted.
我的意思是,即使明年他们发明了医生级别的超级人工智能——谁知道呢,也许他们会,但如果不去认真思考并研究最佳的实施方式,最初可能也不会发生太大变化。
I mean, if they invented a doctor superintelligence like next year, which who knows, maybe they will, probably not a lot would change at first if we don't actually think about the best way and study the best way to do this.
这与研究的象牙塔以及硅谷相去甚远,更不用说基层医护人员,甚至在最高端的学术部门中,如何整合这些技术也仍是个问题。
It is a far cry from the ivory tower of research and obviously Silicon Valley to the rank-and-file personnel practicing in the community, even an academic department at the high end, in terms of how this could be integrated.
但正如你所说,亚当,我毫不怀疑,这将随着时间推移,以渐进和迭代的方式发生,我们会拥有各种不同的软件包等等。
But no doubt, as you said, Adam, it's gonna happen incrementally over time, iteratively, with different types of software packages and so forth that are available to us.
我认为这一切中人类乐观的一面在于,患者、人们、我们自己,依然需要并渴望人类来照顾我们,帮助我们度过疾病和健康方面的艰难时刻。
And I think that the human optimism of all of this is that patients, people, us, we will still have a need and demand for human beings to take care of us, to help us through these difficult times, through disease and health.
但既然如此,我们为何不希望这些人类医护人员配备完善,能够准确地做出最佳诊断和治疗方案呢?
But why would we not want that human being to be well equipped to do it accurately and to make the best possible diagnosis and treatment strategy?
所以我认为,只要我们以正确的方式整合这些技术,正如你所建议的那样,患者护理只会变得更好。
So I think it's only gonna be hopefully for the better as it relates to patient care as long as we integrate it in the right way, as you're suggesting.
是的,正是如此。
Yeah, exactly.
第二点是,我们需要现在就开始思考,这会是什么样子。
It's that number two, as long as we and we need to start thinking about that now about what that looks like.
我还觉得,医生、神经科医生和内科医生需要积极发挥作用,推动一个美好的未来。
I also think that doctors, neurologists and internists need to take an active role in pushing for a good future.
我实际上是个相当悲观的人。
I'm actually quite a pessimist.
这正是我做这一切的原因。
I think that's why I'm doing all this.
我认为普遍存在一种技术乐观主义,即你制造出一个能超越人类的机器人或AI医生,一切就会变得更好。
I think there tends to be techno optimism, right, that you build a robot like an AI doctor that can outperform humans, things are going to get better.
但我们从电子健康记录(EHR)的现状中都清楚知道。
But we all know from the look at the EHR.
我不认为有一位正在执业的医生。
I don't think there's a single practicing doctor.
当然,我们需要计算机来管理医疗信息,这显而易见,但任何有理智的人都不会看着我们现在的世界说:嘿,一切都进行得非常顺利。
Like, obviously, we need computers in medical care to organize information, like, obviously, but no one in their right mind would look at the world we have now and say, hey, everything went great.
不可能做得更好了。
It couldn't have been done better.
所以我担心的模式是,这些强大的技术就像电子病历一样,我们无法主动参与其中。
So that's the model that I worry about is that with these powerful technologies, same as the EHR, we're not going to have an active say in it.
它只会被动地发生在我们身上。
It's just going to happen to us.
而且它可能以一种方式出现,一方面并没有让我们的生活变得更好,甚至我们的薪酬也受到影响——我们在这里是为了患者,我们希望以一种能让患者生活更美好的方式工作。
And it might be in a way that, a, doesn't make our lives better, but even our pay like, we're here for our patients, and we wanna do it in a way that makes our patients' lives better.
这引出了一个问题,即在电子病历方面,参与政策制定并成为这些技术商业化过程中的领导者的重要性,因为问题其实不在于技术本身。
Well, that introduces, you know, as it relates to the EHR, the importance of being involved in policy decisions and trying to be leaders in the corporatization of these technologies because it's not really the technologies.
而在于它的商业化。
It's the corporatization of it.
是的。
Yeah.
这就是我想对医生们传达的。
This is what I try to get across to doctors.
这根本不是技术问题。
Like, this isn't a tech question.
技术上是可以实现的。
The tech can do it.
如果现在还不行,我确信它将来会完成许多任务,总会有一些任务。
If it can't now, I'm pretty sure it's gonna be doing many tasks, and there are gonna be some tasks.
别误会我的意思。
Don't get me wrong.
这并不是魔法。
It's not magic.
这些技术并不是魔法。
These are not magic technologies.
这并不是《星际迷航》里的数据先生。
It's not like Mr. Data from Star Trek.
它们是文字生成算法,但非常强大。
They're word generation algorithms, but they're powerful.
但现在这已经变成了政策和监管问题,而这些正是我们和我们的专业协会可以参与进来,为患者和我们自己发声的地方。
But this is now like a policy and a regulatory situation, and those are things that we and our professional societies can be involved in to advocate for our patients and for ourselves.
亚当,我还有一个重要的问题,我觉得这对所有听众都会非常有帮助。
Adam, I've got one last important question here, and I think it's going to be really apt for all of our listeners.
你知道这个模型在评估患者时使用了什么技术来检查他们的足底反应吗?
Do you know what technique this model used to check the plantar responses of the patients that were being evaluated in there still?
这太搞笑了。
This is so funny.
我们现在就有这个检查。
We have that right now, the exam.