COMPLEXITY - 智力的本质,第五集:我们如何评估智力?

智力的本质,第五集:我们如何评估智力?

Nature of Intelligence, Ep. 5: How do we assess intelligence?

本集简介

嘉宾:
埃里卡·卡特米尔,印第安纳大学伯明顿分校人类学与认知科学教授
埃莉·帕夫利克,布朗大学计算机科学与语言学助理教授

主持人:阿巴·埃利·菲博与梅兰妮·米切尔
制作人:凯瑟琳·蒙库尔
播客主题音乐:米奇·米尼亚诺

关注我们:Twitter • YouTube • Facebook • Instagram • LinkedIn • Bluesky

更多信息:
教程:机器学习基础
讲座:人工智能
圣塔菲研究所项目:教育
多元智能夏季学院

书籍:
《人工智能:给思考者的人类指南》,作者:梅兰妮·米切尔

演讲:
《我们如何知道动物理解什么》——埃里卡·卡特米尔
《人工智能的未来》——梅兰妮·米切尔

论文与文章:
“只是开玩笑:嬉戏式调侃的进化根源”,《生物学快报》(2020年9月23日),doi.org/10.1098/rsbl.2020.0370
“克服人类语言与动物交流比较中的偏见”,《美国国家科学院院刊》(2023年11月13日),doi.org/10.1073/pnas.22187991
《动物交流中的感官运用》,埃里卡·卡特米尔,《语言人类学新伴侣》第20章,威利在线图书馆(2023年3月21日)
《大型语言模型中的符号与接地》,《皇家学会哲学汇刊A辑》(2023年6月5日),doi.org/10.1098/rsta.2022.0041
《具身序列建模中抽象状态表征的涌现》,arXiv(2023年11月7日),doi.org/10.48550/arXiv.2311.02171
《我们如何知道人工智能系统的智能程度》,《科学》(2023年7月13日),doi.org/10.1126/science.adj59

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

你们将听到的声音是在不同国家、城市和工作场所远程录制的。

The voices you'll hear were recorded remotely across different countries, cities, and workspaces.

Speaker 1

我经常认为,作为物种,人类是非常以自我为中心的,对吧?

I often think that humans are very egotistical as a species, right?

Speaker 1

所以我们擅长某些特定的事情,并且倾向于更重视我们擅长的方面。

So we're very good at particular things, and we tend to place more value on the things that we're good at.

Speaker 0

来自圣塔菲研究所,这里是复杂性。

From the Santa Fe Institute, this is Complexity.

Speaker 2

我是梅兰妮·米切尔。

I'm Melanie Mitchell.

Speaker 0

我是阿巴·埃利·菲博。

And I'm Abha Eli Phoboo.

Speaker 2

随着我们进入本季关于智能的第五集,我们已经探讨了许多复杂且有争议的观点。

As we enter our fifth episode of this season on intelligence, we've explored quite a few complicated and controversial ideas.

Speaker 2

但有一件事变得非常清楚,智能是一个模糊的概念。

But one thing has become really clear, intelligence is a murky concept.

Speaker 2

这正是本系列想要传达的观点。

And that's the point of this series.

Speaker 2

当我们看到它时,我们认为自己知道它是什么。

It's something that we think we know when we see it.

Speaker 2

但当我们将其分解时,却很难给出严谨的定义。

But when we break it down, it's difficult to define rigorously.

Speaker 2

今天

Today's

Speaker 0

这一集探讨的是我们如何评估智力。

episode is about how we assess intelligence.

Speaker 0

在测试人类时,我们有各种各样的标准测量方法,比如智商测试、SAT考试等等。

When it comes to testing humans, we have all kinds of standard measures, IQ tests, the SAT, and so on.

Speaker 0

但这些测试远非完美,甚至被批评为局限性或带有歧视性。

But these tests are far from perfect, and they've even been criticized as limited or discriminatory.

Speaker 2

要理解我们为何如此渴望测试智力,以及为何将智力视为一种固有的人格特质,回顾西方社会中智力的历史会很有帮助。

To understand where our desire to test intelligence comes from and also the way we talk about it as an inherent personality trait, it's useful to look at the history of intelligence in western society.

Speaker 2

在古希腊,这一概念被描述为理性或合乎逻辑的能力,后来随着心理学学科的兴起,逐渐演变为更广泛的智力概念。

In ancient Greece, the concept was described as reason or rationality, which then evolved into intelligence more broadly when the discipline of psychology arose.

Speaker 2

像苏格拉底、柏拉图和亚里士多德这样的哲学家都非常重视人的思考能力。

Philosophers like Socrates, Plato and Aristotle highly valued one's ability to think.

Speaker 2

乍一看,这似乎是一种崇高的观点。

And at first glance, that seems like a noble perspective.

Speaker 0

但亚里士多德更进一步。

But Aristotle took this a step further.

Speaker 0

他将所谓的理性元素作为社会等级制度的正当理由。

He used the quote unquote rational element as justification for a social hierarchy.

Speaker 0

他把受过教育的欧洲男性置于顶端,而女性、其他种族和动物则被置于其下。

He placed European educated men at the top and women, other races and animals below them.

Speaker 2

其他西方哲学家,如笛卡尔和康德,也接受了这种等级观念。

Other western philosophers like Descartes and Kant embraced this hierarchy too.

Speaker 2

他们甚至给智力赋予了道德价值。

And they even placed a moral value on intelligence.

Speaker 2

通过声称某人或某动物不聪明,就使奴役他们变得在道德上可以接受。

By claiming that a person or an animal wasn't intelligent, it became morally acceptable to subjugate them.

Speaker 2

而我们知道欧洲扩张的后续故事是如何发展的。

And we know how the rest of that European expansion story goes.

Speaker 0

因此,今天关于智力的概念部分源于男性与非男性之间的区分方式。

So today's notions about intelligence can be traced in part to the ways men distinguish themselves from non-men.

Speaker 2

或者,更宽容地解读哲学家的观点,关于智力的思想史围绕着这样一个理念:智力是一种根本上属于人类的特质。

Or to give the philosophers a more generous interpretation, the history of thought around intelligence centers on the idea that it is a fundamentally human quality.

Speaker 0

如果智力和理论源于人性,我们该如何决定其他实体(如动物或大型语言模型)的智力程度呢?

So if intelligence and theory stems from humanity, how do we decide the degree to which other entities like animals or large language models are intelligent?

Speaker 0

我们能依赖对它们行为的观察吗?

Can we rely on observations of their behavior?

Speaker 0

还是我们需要理解它们大脑或软件电路内部正在发生什么?

Or do we need to understand what's going on under the hood inside their brains or software circuits?

Speaker 2

一位试图解决此类问题的科学家是埃里卡·卡特米尔。

One scientist trying to tackle such questions is Erica Cartmill.

Speaker 1

我的名字是埃里卡·卡特米尔。

So my name is Erica Cartmill.

Speaker 1

我是印第安纳大学的认知科学、动物行为、人类学和心理学教授。

I'm a professor of cognitive science, animal behavior, anthropology, and psychology at Indiana University.

Speaker 1

我主要研究认知,特别是社会认知,以及使不同物种间能够进行交流的认知能力。

You know, I really study cognition, particularly social cognition, and the kinds of cognition that allow communication to happen across a wide range of species.

Speaker 0

埃里卡在观察与人类截然不同的生物的智能行为方面有着丰富的经验。

Erica has extensive experience observing intelligent behavior in beings that are very different from humans.

Speaker 1

而且,我从小就很着迷于动物,我们家养了各种各样的动物。

Also, I've had the animal bug since I was a kid, and we had, you know, a whole range of different kinds of animals.

Speaker 1

简直像一个动物展览馆。

It's sort of a menagerie.

Speaker 1

我们养了马。

We had, horses.

Speaker 1

我们养了狗。

We had dogs.

Speaker 1

我们还养了一只乌龟。

We had a turtle.

Speaker 1

我们养了一只鹦鹉。

We had a parrot.

Speaker 1

我总是喜欢在外面观察,比如看蜥蜴、蝴蝶、鸟类,还有我们谷仓里的老鼠。有时候,我会抓一只蜥蜴,放在一个玻璃缸里观察两天,然后再放回自然环境中。这种想要观察自然世界,同时又能以某种受控的方式更近距离地观察它的愿望,即使在我还是孩子时就已存在,而这种做法也一直延续到了我成年后的科研生涯中。

And I was always sort of out, you know, watching, like watching lizards and butterflies and birds, mice in our barn, and you know sometimes it's like I would catch a lizard, put it in a terrarium for two days, observe it, let it go again, and that kind of wanting to observe the natural world and then have an opportunity to more closely observe it under, you might say, controlled circumstances even as a child, and then release it back into its natural environment is really something that I've continued to do as an adult in my scientific career.

Speaker 1

现在我的实验室主要在这两类对象之间开展研究:大猿和人类儿童。

And that's what I do mostly with my lab now, kind of split between studying great apes and human children.

Speaker 1

但我还研究过其他多种物种,比如加拉帕戈斯群岛的达尔文雀。

But I've done work on a range of other species as well, Darwin's finches in the Galapagos.

Speaker 1

我目前正在做一个项目,也涉及海豚、狗和凯亚——这是一种新西兰的鹦鹉。

I'm doing a project now that also includes dolphins and dogs and kea, which is a New Zealand parrot.

Speaker 1

我正在印第安纳大学筹建一个狗类研究实验室,因此我对这些其他物种的研究感到非常兴奋。

And I'm starting a dog lab at IU, so I'm excited about some of those other species.

Speaker 1

但我想说,我工作的核心其实是比较大猿和人类的认知与沟通能力。

But I'd say the core of my work really focuses on comparing the cognitive and communicative abilities of great apes and humans.

Speaker 2

埃里卡的大部分研究都集中在语言和交流的演化上。

Much of Erica's research has been on the evolution of language and communication.

Speaker 2

正如我们之前所说,复杂的语言是我们物种独有的。

As we've said before, complex language is unique to our species.

Speaker 2

但其他动物以多种方式交流。

But other animals communicate in many ways.

Speaker 2

因此,研究人员一直在努力缩小范围,以确定究竟是什么让我们的语言如此独特。

So researchers have been trying to narrow down what exactly makes our language so distinct.

Speaker 1

我认为人类一直非常关注这个问题:是什么将我们与其他物种区分开来。

So I think humans have always been really focused on this question of what separates us from other species.

Speaker 1

长期以来,对这一问题的回答都围绕着语言作为决定性界限展开。

And for a long time, answers to that question centered around language as the defining boundary.

Speaker 1

而这些关于语言的争论大多聚焦于语言的结构特征。

And a lot of those arguments about language really focused on the structural features of language.

Speaker 1

如果你回顾这些争论的历史,就会发现每当语言学家提出一个语言特征,比如‘人类语言之所以不同是因为X’,人们就会去研究动物,然后发现‘椋鸟具备这个特征’,或者‘某种猴子也有这个特征’。

And if you look at sort of the history of these arguments, you would see that every time a linguist proposed a feature of language that say, you know, human language is different because x, then people would go out and study animals and they would say, well, you know, starlings have that particular feature or a particular species of monkey has that feature.

Speaker 1

然后语言学家们会重新集结,说好吧。

And then linguists would sort of regroup and say, okay.

Speaker 1

实际上,这个其他特征才是真正的分界线。

Well, actually, this other feature is the real dividing line.

Speaker 1

我觉得,无论是从无聊还是有趣的角度来看,答案可能是并没有一个单一的特征。

And, you know, I think probably the boring answer, or the interesting answer, depending on how you look at it, is that there probably isn't one feature.

Speaker 1

正是语言特征的独特组合,加上一系列认知能力,才使得语言与众不同且如此强大。

It's the unique constellation of features combined with a constellation of cognitive abilities that make language different and make it so powerful.

Speaker 1

但我要说,近年来,关于语言为何独特这一争论的重点,已经从‘语言的独特性源于某种特定结构特征’,转向了‘语言的独特性在于它建立在对他人心理的丰富社会理解之上’。

But I will say, in recent years, the focus of these arguments about why language is unique has shifted: from "language is unique because of some particular structural feature" to "language is unique because it is built on a very rich social understanding of other minds."

Speaker 1

它建立在对他人目标的推断、对他人所知与未知的揣测之上。

It's built on inferences about others' goals, about what others know and don't know.

Speaker 1

它建立在语言学中所谓的语用学之上。

It's built on what we call pragmatics in linguistics.

Speaker 1

因此,它实际上非常不同于一个可以随意应用和运行的结构化程序。

So actually, it's very unlike a structured program that you can sort of apply and run anywhere.

Speaker 1

这实际上依赖于对他人意图的丰富推断。

It's actually something that relies on rich inferences about others' intentions.

Speaker 2

当我们人类交流时,我们常常试图传达自己的内心想法和感受,或者对他人内心状态做出推断。

When we humans communicate, we're often trying to convey our own internal thoughts and feelings, or we're making inferences about someone else's internal state.

Speaker 2

我们自然地将外部行为与内在过程联系起来。

We naturally connect external behavior with internal processes.

Speaker 2

但当我们面对其他生物时,我们对智能的判断就没那么直接了。

But when it comes to other beings, our ability to make judgments about intelligence isn't as straightforward.

Speaker 0

因此,今天我们将探讨,通过外部行为并将人类的智能观念应用于动物和机器时,我们能学到什么——这些机器和动物通过的测试,表面上看起来与人类表现惊人地相似。

So today, we're going to look at what we can learn from external behavior and applying human notions of intelligence to animals and machines, which can pass tests at levels that are deceptively similar to humans.

Speaker 0

第一部分:评估人类、动物和机器的智能。

Part one, assessing intelligence in humans, animals, and machines.

Speaker 0

如果你家里养了宠物,你很可能有过这样的时刻,想弄清楚它吠叫、喵叫或尖叫时到底想表达什么。

If you have a pet at home, you've probably had moments when you've wanted to know what it's trying to say when it barks, meows, or squawks.

Speaker 0

我们经常把人类的特质投射到宠物身上,其中一种方式就是想象它们在说‘我饿了’或‘我想出去’之类的话。

We anthropomorphize pets all the time, and one of the ways we do that is by envisioning them saying things like, I'm hungry, or I want to go outside.

Speaker 0

或者我们可能会好奇它们彼此之间在说什么。

Or we might wonder what they say to each other.

Speaker 2

动物确实会彼此交流,但关于它们交流的复杂程度,一直存在很多争议。

Animals most definitely communicate with one another, but there's been a lot of debate about how sophisticated their communications are.

Speaker 2

黑猩猩的吼叫或鸟类的鸣叫是否总是意味着相同的内容?

Does a chimp's hoot or a bird's squawk always mean the same thing?

Speaker 2

还是这些信号像人类语言一样具有灵活性,能够根据语境传达不同的含义,包括动物对其听众心理状态的理解?

Or are these signals flexible, like human words, communicating different meanings depending on context, including the animal's understanding of the state of its listeners' minds?

Speaker 2

在她的研究中,艾丽卡批评了人们在测试动物交流时常常做出的假设。

In her work, Erica has critiqued the assumptions people often make in experiments testing animal communication.

Speaker 2

她指出,所使用的方法未必能揭示声音信号及其他类型信号的可能含义,尤其是当这些含义依赖于特定语境时。

She's noted that the methods used won't necessarily reveal the possible meaning of both vocal and other kinds of signals, especially if those meanings depend on particular context.

Speaker 1

最近,从认知科学家到哲学家再到语言学家的作者们认为,人类交流的独特性在于其背后依赖着极为丰富的心理特性。

Authors recently, you know, ranging from cognitive scientists to philosophers to linguists, have argued that human communication is unique because it relies on these very rich psychological properties that underlie it.

Speaker 1

但这反过来又引发了关于人类与其他动物之间界限的新争论:动物使用的交流方式非常像代码——一个动物发出信号,另一个动物听到或看到该信号后解码其含义,而这种交流并不依赖于对他人意图或目标的推断,信号可以被直接读取和回应。

But this in turn has now led to new arguments about the dividing line between humans and other animals, which is that animals use communication that is very code-like: one animal will produce a signal, and another animal will hear or see that signal and decode its meaning, and that it doesn't rely on inferences about another's intentions or goals; the signals can simply be read into and out of the system.

Speaker 1

如果你录下一段听觉信号,比如鸟鸣,然后把扬声器藏在树上,播放这段叫声,观察其他鸟的反应,对吧?

If you record, say an auditory signal like a bird call, and then you hide a speaker in a tree and you play that call back and you see how other birds respond, right?

Speaker 1

所以这被称为回放法,这并不令人意外。

So this is called the playback method unsurprisingly.

Speaker 1

这是动物交流研究者用来证明这些叫声确实具有特定含义的最有力工具之一。

And that's been one of the strongest things in the toolkit that animal communication researchers have to demonstrate that those calls in fact have particular meanings.

Speaker 1

它们不仅仅是‘我唱歌是因为好听’,而是这个叫声意味着‘走开’,那个叫声意味着‘过来和我见面’,另一个叫声意味着‘附近有食物’,等等。

That they're not just I'm singing because it's beautiful, but that this call means go away and this other call means come and meet with me, and this other call means there's food around, etcetera, etcetera.

Speaker 1

对吧?

Right?

Speaker 1

因此,将这些信号脱离语境,再播放给同种动物听,观察它们的反应,是科学家证明某种叫声具有特定含义的主流方法。

And so decontextualizing those signals and then presenting them back to members of the species to see how they respond is the dominant method by which scientists demonstrate that a call has a particular meaning.

Speaker 1

这种方法在论证动物确实在传递信息方面至关重要。

That's been incredibly important in, you know, arguing that animals really are communicating things.

Speaker 1

但这种方法以及用于设计实验来探究动物交流问题的底层模型,也存在很大局限性。

But that method and the underlying model that is used to design experiments to ask questions about animal communication is also very limiting.

Speaker 0

一个脱离语境的听觉信号,无论是人类的词语还是动物的叫声,都只是动物和人类之间多种交流方式中非常狭窄的一个方面。

An auditory signal taken out of context, whether a word or an animal call, is a very narrow slice of all the different ways animals and humans communicate with each other.

Speaker 1

因此,这种方法非常擅长证明某一件事,但同时也关闭了关于动物可能做出的其他推断的大门。

So it's very good at demonstrating one thing, but it also closes off doors about the kinds of inferences that animals might be making.

Speaker 1

如果拉里发出这个叫声,而我和拉里是朋友;而鲍勃发出那个叫声,而我和鲍勃是敌人,我该如何回应?

If Larry makes this call and I'm friends with Larry versus Bob makes that call and I'm enemies with Bob, how do I respond?

Speaker 1

鲍勃知道我在这里吗?

Does Bob know that I'm there?

Speaker 1

他能看见我吗?

Can he see me?

Speaker 1

他是因为我在这里、看见了我,才朝我发出这个叫声,还是他其实是在对别人叫,而我只是在偷听?

Is he making that call because I'm there and he sees me and he's directing that call to me versus is he making that call to someone else and I'm eavesdropping on it?

Speaker 1

这些正是动物能够做出的推断类型。

Those are kinds of inferences that animals can make.

Speaker 1

我不是说所有动物在所有情况下都如此,但我们在提出关于动物交流的问题时,所采用的方式决定了我们能得到哪些类型的答案。

I'm not saying all animals in all cases, but the ways that we ask questions about animal communication afford certain kinds of answers.

Speaker 1

我认为,我们需要更加谦逊。

And we need, I think, to be more... humble is the right word.

Speaker 1

但我们必须认识到这些方法如何限制了我们所能得出的结论,因为这与我们研究人类语言的方式截然不同。

But we need to recognize the ways in which they limit the conclusions that we can draw, because this is very different from the way that we ask questions about human language.

Speaker 1

因此,当我们基于那些设计来提出根本不同问题的研究结果,来得出人类语言与动物交流之间差异的结论时,我认为这存在很大问题。

And so when we draw conclusions about the difference between human language and animal communication based on the results of studies that are set up to ask fundamentally different questions, I think that leaves a lot to be desired.

Speaker 0

专注于与人类智力相关的能力,可能会误导我们对动物智力的理解。

And focusing on abilities that are relevant to humans' intelligence might mislead us in how we think about animal intelligence.

Speaker 1

我常常觉得,人类作为一个物种非常以自我为中心。

I often think that humans are very egotistical as a species.

Speaker 1

对吧?

Right?

Speaker 1

我们擅长某些特定的事情,并且倾向于更重视自己擅长的方面。

So we're very good at particular things, and we tend to place more value on the things that we're good at.

Speaker 1

我认为在很多情况下,这样也没问题。

And I think that in many cases, you know, that's fine.

Speaker 1

这是我们物种独有的一个特点。

That's one of our unique quirks as a species.

Speaker 1

但它也常常限制了我们提问的方式,以及我们对其他物种智力的归类。

But it also often limits the way that we ask questions and, you know, attribute kinds of intelligence to other species.

Speaker 1

因此,我认为,人类要跳出自己擅长的领域,甚至跳出我们自身的感知,是相当困难的。

So it can be quite difficult, I think, for humans to think outside of the things that we're good at or indeed outside of our own senses.

Speaker 1

我的意思是,五种感官,生物感官。

I mean, sort of five senses, biological senses.

Speaker 1

长期以来,我们知道大象能够从不同的起点汇聚到特定地点,比如在某一天的某个时间,从很远的地方出现在同一棵树下。

So elephants, we've known for a long time that elephants are able to converge at a particular location, like show up far away at this tree on this day at this time from different starting points.

Speaker 1

人们真的不知道它们是如何做到的。

People really didn't know how they were doing it.

Speaker 1

对吧?

Right?

Speaker 1

它们的起点相距太远,根本听不到彼此。

They were starting too far apart to be able to hear one another.

Speaker 1

我的意思是,人们都在想,它们是在计划吗?

I mean, people were like, are they planning?

Speaker 1

它们能意识到两周后的星期二我们要在水塘见面吗?

Do they have the sense of two Tuesdays from now we're going to meet at the watering hole?

Speaker 1

直到有人提出,也许它们使用的是我们自身感知能力之外的感官。

And it wasn't until people said maybe they're using senses that fall outside of our own perceptual abilities.

Speaker 1

特别是,人们测量了极低的频率,并问道:也许它们是以我们无法察觉的方式发出声音,对吧?

In particular, they measured very, very low frequencies and basically asked, okay, maybe they're vocalizing in a way that we can't perceive, Right?

Speaker 1

于是,一旦他们这样做了,并大幅降低了录音设备的频率阈值,就发现大象实际上在极远的距离上发出声音。

And so once they did that and they greatly lowered the frequency of their recording equipment, they found that elephants were in fact vocalizing at very, very long distances.

Speaker 1

但它们是通过一种被称为‘低鸣’的发声方式实现的,这种声音实际上是通过地面传播,而不是通过空气。

But they were doing it through this rumble vocalization that actually propagates through the ground rather than through the air.

Speaker 1

它们发出这些声音——我没法模仿,因为即使我能发出,你也听不见——但这些极低的嗡鸣声,其他大象在几公里外并不是用耳朵听到的,而是通过脚掌上特化的细胞感知到地面的震动。

And so they produce these, I can't imitate it because you couldn't hear it even if I could, but they produce these very low rumbles that other elephants, you know, kilometers away perceive not through their ears, but they perceive actually through specialized cells in the pads of their feet where they can feel the vibrations.

Speaker 1

所以我认为,这是一个很好的例子,说明我们必须——实际上,甚至不必像大象那样思考,而是要想象像大象那样去听,拥有像大象那样的身体。我称之为‘跳出人类的思维’。

And so I think this is a nice example of the way that we have to, you know, in effect, not even necessarily think like an elephant, but imagine hearing like an elephant, having a body like an elephant, thinking, I like to call it thinking outside the human.
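为说明“降低录音设备的频率阈值”意味着什么,下面给出一个玩具 Python 示意(所有数值均为虚构假设,仅作演示):合成信号中含有一个 15 Hz 的次声“低鸣”,如果只统计人耳可闻频段(约 20 Hz 以上)的能量,就会漏掉它;把分析下限降到 20 Hz 以下,低鸣才会显现。

```python
import math

# Toy illustration: a 15 Hz "rumble" (below the ~20 Hz floor of human
# hearing) is invisible to an audible-band analysis but dominates once
# the frequency threshold is lowered. All numbers are illustrative.
SAMPLE_RATE = 200    # samples per second
N = SAMPLE_RATE * 5  # five seconds of signal

signal = [
    0.2 * math.sin(2 * math.pi * 15 * t / SAMPLE_RATE)    # infrasonic rumble
    + 0.1 * math.sin(2 * math.pi * 60 * t / SAMPLE_RATE)  # audible component
    for t in range(N)
]

def band_energy(samples, lo_hz, hi_hz, rate):
    # Naive discrete Fourier probe: total energy in the band [lo_hz, hi_hz).
    n = len(samples)
    energy = 0.0
    for k in range(int(lo_hz * n / rate), int(hi_hz * n / rate)):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        energy += re * re + im * im
    return energy

audible = band_energy(signal, 20, 100, SAMPLE_RATE)   # human-range analysis only
infrasound = band_energy(signal, 1, 20, SAMPLE_RATE)  # lowered threshold
print(infrasound > audible)
# → True  (most of the energy sits below human hearing)
```

这正对应节目里说的:研究者不是换了新问题,而是把测量范围扩展到了人类感官之外,答案才出现。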

Speaker 1

人类擅长某些特定的事情。

Humans are good at particular things.

Speaker 1

我们拥有特定类型的身体。

We have particular kinds of bodies.

Speaker 1

我们在特定的时间尺度上感知事物。

We perceive things on particular time scales.

Speaker 1

我们在特定的光波长和声音频率上感知事物。

We perceive things at particular light wavelengths and auditory frequencies.

Speaker 1

让我们暂时把这些放一边,想想:这个物种进化出来是做什么的?

Let's set those aside for a second and think about, okay, what did that species evolve to do?

Speaker 1

它的感知系统让它能感知到什么?

What do its perceptual systems allow it to perceive?

Speaker 1

试着提出更贴合我们所研究物种的问题。

And try to ask questions that are better tailored to the species that we're looking at.

Speaker 1

已经有很多

There's been a lot

Speaker 2

在过去的几十年里,人们做了大量工作,试图教其他物种如黑猩猩、倭黑猩猩或非洲灰鹦鹉人类语言。

of work throughout the many decades on trying to teach human language to other species, like chimps or bonobos or African gray parrots.

Speaker 2

关于它们究竟学到了什么,一直存在很大争议。

And there's been so much controversy over what they have learned.

Speaker 2

目前对这些其他物种的语言能力以及这些实验的整体看法是什么?

What's the current thinking on the language abilities of these other species and those experiments in general?

Speaker 1

要回答‘当前的看法是什么’这个问题几乎很困难,因为现在几乎没有相关研究。

It's almost hard to answer the question what's the current thinking because there's very little current research.

Speaker 1

很多这类研究都是在二十年前,甚至四十年前进行的。

A lot of that research was done, you know, twenty or even forty years ago.

Speaker 1

与三十年前的研究相比,如今对猿类、鹦鹉和海豚的研究非常少——而三十年前,人人都在试图教动物人类语言。

Compared to the work that was being done thirty years ago, there's very little current work with apes and parrots and dolphins, all of which thirty years ago, you know, everyone was trying to teach animals human language.

Speaker 1

我认为这是一个非常有趣的研究领域。

And I think that's a really interesting area of inquiry.

Speaker 1

人们的看法略有不同,但我认为,最主流的观点,或者说最能概括当前讨论的是:今天的人们普遍认为,这些动物确实能够学习、理解并主动使用词语,但它们能掌握的词汇范围有限,且无法将词语组合成有创造性的句子。

You know, I would say people differ a little bit, but I think that probably the most dominant opinion or maybe the discussion is best characterized by saying that people today, I think, largely believe that those animals were able to learn, understand, and productively use words, but that they were limited in the scope of the words they could learn and that they weren't combining them into productive sentences.

Speaker 1

而这正是争论的一部分,即句法——按照特定规则组合词语——是人类语言独有的特征,与动物所能产生的语言截然不同。

And this was part of the argument that syntax, the combining of words according to particular rules, was something that human language did that was very different from what animals could produce.

Speaker 1

因此,我认为动物语言研究普遍表明,动物能够学习词语、发出词语,有时也能将词语组合起来,但它们并未形成稳定、类似句子的结构。

And so I think with the animal language studies that were showing largely that animals could learn words, they could produce words, they could sometimes produce words together, but they weren't doing it in reliable sentence like structures.

Speaker 1

但你认为,我们试图

But do you think that the fact that we were trying

Speaker 2

用人类语言来教导它们,以此评估它们的认知能力,这种做法是否有助于理解动物认知?

to teach them human language in order to assess their cognitive abilities was a good approach to understanding animal cognition?

Speaker 2

还是我们应该像你之前说的那样,更多地从它们的角度出发,尝试理解它们的体验,而不是训练它们更像我们?

Or should we more do what you said before, sort of take their point of view, try to understand what it's like to be them rather than train them to be more like us?

Speaker 2

我认为这是一个

I think that's a

Speaker 1

很好的问题。

great question.

Speaker 1

我的回答可能取决于人类想象力的局限性,我认为,通过让动物以我们的表达方式交流,我们能够提出更好的问题,并更准确地解读它们的回答,而不是试图完全理解它们的沟通系统。

My answer probably hinges around the limitations of human imagination, where I think that teaching animals to communicate on our terms allows us to ask better questions and better interpret their answers than us trying to fully understand, you know, their communication systems.

Speaker 1

人们当然正在使用机器学习等技术,试图破译鲸鱼的歌声或鸟鸣。

People certainly are using, you know, things like machine learning to try to quote unquote decode whale song or birdsong.

Speaker 1

我认为这些方法更贴近动物的自身方式,利用它们自然的交流系统。

I think that those approaches are more, sort of, on the animals' terms, using their natural communication.

Speaker 1

我认为这些方法非常有趣。

And I think that those are very interesting approaches.

Speaker 1

我认为它们在发现动物所产生内容的模式方面会很有成效。

I think they'll be good at finding patterns in what animals are producing.

Speaker 1

但我认为,一个仍然悬而未决的问题是:动物自身是否感知到了这些模式,并以对它们有意义的方式加以利用。

The question I think still remains whether animals themselves are perceiving those patterns and are using them in ways that have meaning to them.

Speaker 0

我们今天评估人工智能系统智能的方式,也受限于人类的想象力,甚至可能比评估动物时更甚,因为大型语言模型默认说的就是我们的语言。

And the way we've tried to assess intelligence in today's AI systems also hinges around the limitations of human imagination, perhaps even more so than animals, given that by default, LLMs speak our language.

Speaker 0

我们仍在探索如何评估它们。

We're still figuring out how to evaluate them.

Speaker 3

是的。

Yeah.

Speaker 3

我的意思是,我会说它们的评估方式非常糟糕,可以说是很差。

I mean, I would say they're evaluated very, you know, I would say badly.

Speaker 0

这是埃莉·帕夫利克。

This is Ellie Pavlick.

Speaker 0

埃莉是布朗大学计算机科学和语言学的助理教授。

Ellie is an assistant professor of computer science and linguistics at Brown University.

Speaker 0

埃莉做了大量工作,试图理解大型语言模型的能力。

Ellie has done a lot of work on trying to understand the capabilities of large language models.

Speaker 3

它们目前是用我们能够方便评估的东西来评估的。

They're evaluated right now using the things that we can conveniently evaluate.

Speaker 3

对吧?

Right?

Speaker 3

这完全是基于我们能测量什么,就测量什么。

It is very much a, like, what can we measure and that's what we will measure.

Speaker 3

有很多是对原本用于人类的评估方式的重新利用。

There's a lot of repurposing of existing kind of evaluations that we use for humans.

Speaker 3

比如SAT考试、MCAT考试之类的。

So things like the SAT or the MCAT or something like that.

Speaker 3

所以这些考试并不是完全与我们关心的内容无关,但它们并不是深入、细致的诊断工具。

And so it's not that those are like completely uncorrelated with the things we care about, but they're not very deep thoughtful diagnostics.

Speaker 3

像智商测试或SAT考试,长期以来在衡量人类智力方面都存在诸多问题。

Things like an IQ test or the SAT have long histories of problems for evaluating intelligence in humans.

Speaker 3

但它们也根本不是为这类模型作为测试对象而设计的。

But they also just weren't designed with models of this type being the subjects.

Speaker 3

当一个人通过MCAT或SAT考得好时,这意味着什么,和神经网络做到这一点时的意义并不相同。

I think what it means when a person passes the MCAT or scores well on the SAT is not the same thing as what it might mean when a neural network does that.

Speaker 3

我们其实并不清楚当神经网络做到这一点时意味着什么,这正是问题的一部分。

We don't really know what it means when a neural network does it, that's part of the problem.

Speaker 2

那你为什么认为这不一样呢?

So why do you think it's not the same thing?

Speaker 2

我的意思是,人类通过律师资格考试和大语言模型之间有什么区别?

I mean, what's the difference between a human passing a bar exam and a large language model passing it?

Speaker 3

是的。

Yeah.

Speaker 3

我的意思是,这是一个相当深刻的问题。

I mean, that's a pretty deep question.

Speaker 3

对吧?

Right?

Speaker 3

所以,就像我之前说的,相比很多同行,我反而不太愿意轻易断言语言模型显然没有像人类那样思考。

So, like I said, compared to a lot of my peers, I tend to not be as quick to say the language models are obviously not doing what humans do.

Speaker 3

对吧?

Right?

Speaker 3

我倾向于留一些余地,承认它们可能实际上比我们愿意承认的更像人类。

Like I tend to reserve some space for the fact that they might actually be more human like than we want to admit.

Speaker 3

很多时候,人们通过这些考试所使用的过程,可能并没有我们想象的那么深入。

You know, a lot of times processes that people might be using to pass these exams might not be as deep as we like to think.

Speaker 3

所以当一个人,比如说,SAT考得很好的时候,我们可能倾向于认为他具备某种更通用的数学推理能力和通用的语言推理能力,而这能预测他在其他类型考试中的表现。

So when a person, say, scores well on the SAT, we might like to think that there's some more general mathematical reasoning abilities and some general verbal reasoning abilities, and then that's gonna be predictive of their ability to do well in other types of tests.

Speaker 3

这就是为什么它对大学录取有用。

That's why it's useful for college admissions.

Speaker 3

但我们知道,在实践中,人类往往只是在学习如何应对SAT考试。

But we know in practice that humans often are just learning how to take an SAT.

Speaker 3

对吧?

Right?

Speaker 3

我认为我们很可能会认为,这些大型语言模型主要是在学习如何应对SAT考试。

And I think we very much would think that these large language models are mostly learning how to take an SAT.

Speaker 2

所以澄清一下:我知道人类“学习如何应试”是什么意思。

So just to clarify: I know what it means when a human is learning how to pass a test.

Speaker 2

但语言模型是如何学会通过考试的呢?

But how does a language model learn how to pass a test?

Speaker 3

是的。

Yeah.

Speaker 3

我们可以想象一个简单的场景,我认为人们更容易理解,那就是,假设我们用大量的SAT考试例子来训练语言模型。

So we can imagine, like, the simple setting I think people are better at thinking about, which is, like, let's pretend we trained the language model on lots of examples of SATs.

Speaker 3

它们会学会某些并不完美但非常可靠的关联。

They're gonna learn certain types of associations that are not perfect, but very reliable.

Speaker 3

我以前总和我丈夫开玩笑,我们在大学时讨论如何在从未学过该科目的情况下通过多项选择题考试,我们偶尔会尝试,比如我试着去通过他医学院的资格考试。

My husband and I used to have this joke when we were in college about how you could pass a multiple-choice test without ever having taken the subject, and we would occasionally try it; like, I would try to pass his qualifying exams in med school.

Speaker 3

我想我和他一起参加过一次经济学考试。

I think he took an Econ exam with me.

Speaker 3

因为有些情况是这样的:比如,当选项中有‘以上全部’或‘以上都不是’时,这些选项更有可能是正确答案,因为它们并不总是出现,只有当这是正确答案时才会出现,或者这是教授高效检验你是否掌握这三个知识点的好方法。

Because there are certain things like, you know, whenever there's an option like "all of the above" or "none of the above," it's more likely than not to be the right answer, because it's not always there; it's only there when it is the right answer, or it's a good way for the professor to efficiently test that you know all three of these things.

Speaker 3

同样,当你看到选项中出现‘总是’或‘从不’这类绝对化词语时,它们几乎总是错的,因为出题者想测试你是否理解某些细微差别。

Similarly, when you see answers like always or never in them, those are almost always wrong because they're trying to test whether you know some nuanced thing.

Speaker 3

这些方法没有一个是完美的,但你可以逐渐叠加出更复杂的启发式方法,比如根据用词判断某个选项的相关程度。

And none of these is perfect, but you can build increasingly sophisticated heuristics, things like, oh, based on the words, this one seems more or less related.

Speaker 3

这个选项看起来在主题上有点跑偏,诸如此类。

This seems kind of topically off base, whatever.

Speaker 3

所以你可以想象,存在一些你可以捕捉到的模式。

So you can imagine there's kind of like patterns that you can pick up on.

Speaker 3

如果你把很多很多这样的方法组合在一起,你就能很快达到近乎完美的表现,前提是数量足够多。

And if you stitch many, many of them together, you can pretty quickly get to possibly perfect performance, you know, with enough of them.

Speaker 3

所以我认为,这是一种普遍的看法:语言模型之所以能看起来好像懂得比实际多得多,是因为它们将大量这类启发式方法拼接在一起。

So I think that's a common feeling about how language models could get away with looking like they know a lot more than they do by kind of stitching together a very large number of these kinds of heuristics.
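埃莉所说的“把大量启发式拼接起来”应付选择题的思路,可以用一个玩具示例来示意。下面的 Python 草图纯属演示性假设:其中的启发式规则、权重和题目都是虚构的,并非任何真实语言模型的实际机制。

```python
# Toy sketch of the "stitched heuristics" idea: score multiple-choice
# options using shallow surface cues instead of subject knowledge.
# All heuristics, weights, and the sample question are illustrative.

def score_option(question: str, option: str) -> float:
    text = option.lower()
    score = 0.0
    # Heuristic 1: "all/none of the above" tends to appear only when correct.
    if "all of the above" in text or "none of the above" in text:
        score += 2.0
    # Heuristic 2: absolute qualifiers ("always", "never") usually mark distractors.
    if "always" in text.split() or "never" in text.split():
        score -= 2.0
    # Heuristic 3: crude topical relatedness via word overlap with the question.
    overlap = len(set(question.lower().split()) & set(text.split()))
    score += 0.5 * overlap
    return score

def pick_answer(question: str, options: list[str]) -> str:
    # "Stitch" the heuristics together: the highest combined score wins.
    return max(options, key=lambda opt: score_option(question, opt))

question = "Which factors can influence enzyme activity?"
options = [
    "Temperature never affects enzyme activity",
    "Only pH matters",
    "All of the above factors influence enzyme activity",
]
print(pick_answer(question, options))
# → All of the above factors influence enzyme activity
```

这类拼接起来的浅层线索足以在不少题目上“蒙对”,这也正是用人类考试去衡量模型能力时需要警惕的原因。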

Speaker 0

如果我们能了解大语言模型内部到底发生了什么,这会有帮助吗?

Would it help if we knew what was going on under the hood, you know, with LLMs?

Speaker 0

我们对大脑的了解其实也不多,对大语言模型更是几乎一无所知。

We don't really actually know a whole lot about our brains either, and we don't know anything about LLMs.

Speaker 0

但如果我们能稍微窥探一下内部机制,这会有任何帮助吗?

But would it help in any way if we sort of could look under the hood?

Speaker 3

我的意思是,这正是我押注的方向。

I mean, that's where I'm, like, placing my bets.

Speaker 3

是的。

Yeah.

Speaker 2

在第二部分,我们将看看研究人员是如何真正深入内部进行研究的。

In part two, we'll look at how researchers are actually looking under the hood.

Speaker 2

许多人正试图以神经科学家理解大脑的方式去理解大语言模型。

And many of them are trying to understand LLMs in a way that's analogous to how neuroscientists understand the brain.

Speaker 2

第二部分:深入内部。

Part two, going under the hood.

Speaker 0

好的。

Okay.

Speaker 0

等一下。

So wait a minute.

Speaker 0

如果我们讨论的是对动物或人类的机制性理解,也就是理解产生行为的脑回路,那么这确实是一种需要我们去发现的东西。

If we're talking about mechanistic understanding in animals or humans, that is, understanding the brain circuits that give rise to behavior, it makes sense that it's something we need to discover.

Speaker 0

这对我们来说并不明显,就像你只看汽车的外表,也无法明白它是如何工作的。

It's not obvious to us, in the same way that it's not obvious how a car works if you just look at the outside of it.

Speaker 0

但我们确实知道汽车内部是如何工作的,因为它们是人类的发明。

But we do know how cars work under the hood because they're human inventions.

Speaker 0

在本季的大部分时间里,我们一直在讨论如何更好地了解人工智能系统并理解它们的行为。

And we've spent a lot of this season talking about how to learn more about artificial intelligence systems and understand what they're doing.

Speaker 0

众所周知,存在所谓的黑箱。

It's a given that there are so called black boxes.

Speaker 0

但我们创造了AI。

But we made AI.

Speaker 0

人类程序员创建了大语言模型。

Human programmers created large language models.

Speaker 0

为什么我们没有机制性的理解?

Why don't we have a mechanistic understanding?

Speaker 0

为什么这会是个谜?

Why is it a mystery?

Speaker 0

我们问了Ellie她的看法。

We asked Ellie what she thought.

Speaker 3

人们编写的程序是用来训练模型的,而不是直接编程模型本身。

The program that people wrote was programmed to train the model, not the model itself.

Speaker 3

对吧?

Right?

Speaker 3

所以模型本身是这一系列线性代数方程。

So the model itself is this series of linear algebraic equations.

Speaker 3

没有人坐下来编写,比如说,好吧。

Nobody sat down and wrote like, okay.

Speaker 3

在,你知道的,第五千个矩阵的第一百一十八个单元格里,会有一个零点零二。

In, you know, the hundred and eighteenth cell of the five thousandth matrix, there'll be a point zero two.

Speaker 3

对吧?

Right?

Speaker 3

相反,有大量的数学理论在解释,为什么这是需要优化的正确函数?

Instead, there's a lot of mathematical theory that says, why is this the right function to optimize?

Speaker 3

我们如何编写代码?

How do we write the code?

Speaker 3

以及我们如何在多台机器上并行处理它?

And how do we parallelize it across machines?

Speaker 3

比如,这其中涉及了大量的技术性数学知识。

Like, there's a ton of technical mathematical knowledge that goes into this.

Speaker 3

还有许多其他变量会对此产生影响。

There's all of these other variables that kind of factor in.

Speaker 3

它们确实是这个过程的重要组成部分,但我们并不清楚它们在这一特定系统中是如何映射的。

They're very much part of this process, but we don't know how they map out in this particular thing.

Speaker 3

这就像是你设定了一些规则和约束来引导一个系统,但系统本身却在很大程度上自主运行。

It's like you kind of set up some rules and constraints to guide a system, but the system itself is kind of on its own.

Speaker 3

比如,你正在引导一群人穿过城市参加游行。

So like if you're routing like a crowd through like a city or something for a parade.

Speaker 3

对吧?

Right?

Speaker 3

事后你试图弄清楚,为什么地上会有一个杯子,而且是以某种特定的朝向摆放。

And now you come afterward and you're trying to figure out, like, why there's a particular cup on the ground in a particular orientation or something.

Speaker 3

就像是,你设定了……

It's like, but you set up.

Speaker 3

你知道人们会去哪里,嗯,大概知道,但还有这么多其他因素。

Like, you knew where the people were gonna go and it's like, yeah, kind of, but there's all of this other stuff.

Speaker 3

它受到你所设定的限制约束,但那并不是全部。

It's constrained by what you set up, but that's not all that there is.

Speaker 3

满足这些限制的方式有很多,其中一些会产生某种行为影响,另一些则会产生其他影响。

There's many different ways to meet those constraints, and some of them will have some behavioral effects and others will have others.

Speaker 3

对吧?

Right?

Speaker 3

存在一种情况,所有人都遵循了你的规则,但地上却没有杯子。

There's a world where everyone followed your rules and then there wasn't a cup there.

Speaker 3

还有一种情况,比如那些车撞了或没撞,所有这些其他事情都受其他过程的影响。

And, you know, there's a world where, like, those cars crashed or didn't crash, and all of those other things are subject to other processes.

Speaker 3

所以这是一个被写下来但未充分明确的问题,对吧?

So it's kind of an underspecified problem, right, that was written down.

Speaker 3

填补这些细节的方式有很多,而我们不知道为什么最终得到了这一个。

And there are many ways to fill in the details and we don't know why we got this one that we got.

Speaker 2

因此,当我们评估大语言模型时,这与评估人类并不完全相同,因为我们并不清楚在你设定的约束与例如ChatGPT的SAT分数之间发生了什么。

So when we're assessing LLMs, it's not quite the same as humans, because we don't know what happens between the constraints we set up and, for example, ChatGPT's SAT score at the end.

Speaker 2

而且我们也不总是知道每个人是如何通过SAT考试的。

And we don't always know how individual people are passing the SAT either.

Speaker 2

一个人的分数在多大程度上反映了他们的底层推理能力,又在多大程度上反映了他们应试的技巧。

How much someone's score reflects their underlying reasoning abilities versus how much it reflects their ability to sort of game the test.

Speaker 2

但至少,当我们看到大学申请中的SAT分数时,我们知道这个分数背后是一个活生生的人。

But at the very least, when we see an SAT score on a college application, we do know that behind that SAT score, there's a human being.

Speaker 3

我们可以理所当然地认为,我们都拥有一个人类的大脑。

We can take for granted that we all have a human brain.

Speaker 3

没错。

It's true.

Speaker 3

我们根本不知道它如何运作,但因为它是我们进化过程中一直与之相处的实体,所以它算是一种已知的存在。

We have no idea how it works, but it is kind of a known entity because we've evolved dealing with humans.

Speaker 3

你一生都在与人类打交道。

You live your whole life dealing with humans.

Speaker 3

所以当你选择一个人来上你的大学,或者雇用一个人工作时,这不仅仅是一个通过了SAT的机器,而是一个通过了SAT的人。

So when you pick somebody to come to your university or you hire someone for a job, it's like it's not just a thing that passed SAT, it's a human that passed the SAT.

Speaker 3

对吧?

Right?

Speaker 3

也就是说,这是一个相关的特征。

Like, that is one relevant feature.

Speaker 3

据推测,更相关的特征是,这是一个有血有肉的人。

Presumably, the more relevant feature is that it's a human.

Speaker 3

因此,随之而来的是,你可以对那些通过SAT考试或取得特定分数的人做出许多推断,比如他们可能还具备哪些能力。

And so with that comes a lot of inferences you can make about what humans who passed the SAT or score a certain score probably also have the ability to do.

Speaker 3

对吧?

Right?

Speaker 3

当你谈论的不是人类时,情况就完全不同了,因为我们根本不习惯与非人类打交道。

It's a completely different ballgame when you're talking about somebody who's not a human because that's just not what we're used to working with.

Speaker 3

所以确实,我们还不了解大脑是如何工作的。

And so it's true we don't know how the brain works.

Speaker 3

但如今,你面对的是另一个表现优异却完全不了解其运作机制的实体,对我来说,要开始破解这个问题,唯一的方法就是去追问:它们在机制层面是否相似?

But now that you're in the reality of having another thing that's scoring well and you have no idea how it works, to me, the only way to start to chip away at that is we need to ask if they're similar at a mechanistic level.

Speaker 3

就像问SAT分数在大语言模型和人类身上取得时是否具有相同的意义,这完全取决于它是如何获得这个分数的。

Like asking whether a score on the SAT means the same thing when an LLM achieves it as a human, it is, like, 100% dependent on how it got there.

Speaker 0

在评估人工智能时,这里还有另一个问题。

Now when it comes to assessing artificial intelligence, there's another question here.

Speaker 0

在使用人工智能之前,我们需要在多大程度上理解它的运行方式或它的智能水平?

How much do we need to understand how it works or how intelligent it is before we use it?

Speaker 0

正如我们所确立的,我们并未完全理解人类智能或动物智能。

As we've established, we don't fully understand human intelligence or animal intelligence.

Speaker 0

人们一直在争论SAT对我们来说有多有效,但我们仍然经常使用它,参加SAT的学生也照样进入大学并拥有职业生涯。

People debate on how effective the SAT is for us, but we still use it all the time, and the students who take it go on to attend universities and have careers.

Speaker 3

我们经常使用那些作用机制尚不明确的药物,这确实如此。

We use medicines all the time that we don't understand the mechanisms that they work on, and that's true.

Speaker 3

我认为,我们并不需要在完全理解大语言模型的底层机制之后才能部署它们。

And it's like, I don't think it's like we cannot deploy LLMs until we understand how they work under the hood.

Speaker 3

但如果我们关心‘它是否具有智能’这类问题,那么即便我们在乎这个问题,它的答案对能否在某个特定应用场景中部署模型来说,可能也并不重要。

But if we're interested in these questions of is it intelligent, like, just the fact that we care about that question, answering that question probably isn't relevant for whether or not you can deploy it in some particular use case.

Speaker 3

比如,如果你有一个用大语言模型处理客户投诉的初创公司,大语言模型是否智能其实并不重要。

Like, if you have a startup for LLMs to handle customer service complaints, it's not really important whether the LLM is intelligent.

Speaker 3

你只关心它能不能完成这件事。

You just care whether it can do this thing.

Speaker 3

对吧?

Right?

Speaker 3

但如果你真想问这个问题,那就会牵扯出一大堆棘手的难题。

But if you wanna ask that question, we're opening up this very big can of worms.

Speaker 3

你不能一边问这些大问题,一边又不愿意做相应的大工作。

You can't ask the big questions and then not be willing to do the big work.

Speaker 2

对吧?

Right?

Speaker 2

而回答机制理解这个问题,确实是一项巨大的工作。

And answering the question of mechanistic understanding is really big work.

Speaker 2

就像其他科学领域一样,你必须决定自己真正追求的是哪个层次的理解。

As in other areas of science, you have to decide what level of understanding you're actually aiming for.

Speaker 3

是的。

Right.

Speaker 3

我的意思是,这种描述层次的概念在认知科学中早已存在。

I mean, this kind of idea of levels of description has existed in cognitive science.

Speaker 3

我认为认知科学家经常讨论的是,描述一个现象的恰当语言是什么?

I think cognitive scientists talk about a lot, which is what is the right language for describing a phenomenon?

Speaker 3

而且,有时候你可以有多个同时成立且一致的解释,它们确实应该彼此一致,但在某些层次上回答某些类型的问题是没有意义的。

And, like, sometimes you can have simultaneous consistent accounts, and they really should be consistent with one another, but it doesn't make sense to answer certain types of questions at certain levels.

Speaker 3

因此,我认为认知科学中一个经典的例子是量子物理与经典力学的对比。

And so I think a favorite example in cognitive science that people talk about is, like, quantum physics versus classical mechanics.

Speaker 3

对吧?

Right?

Speaker 3

比如,如果我把一个台球撞向另一个台球,试图用量子力学来描述这个过程,那会极其繁琐、荒谬且完全违背直觉,这根本行不通,而且你会错过物理学运作中一个至关重要的部分。

Like, it would be really cumbersome and bizarre and highly unintuitive, and we can't really do it to say like, if I roll this billiards ball into this billiards ball and try to describe it at a level of quantum mechanics, it would be an absurd thing to do, and you would be missing a really important part of how physics works.

Speaker 3

关于是否能用量子力学解释台球的运动,目前仍存在大量争议。

And there's a lot of debate about whether you could explain the billiards ball with quantum mechanics.

Speaker 3

但关键是,底层有一些规律告诉我们这个球会存在。

But the point is like there's laws at the lower level that tell you that the ball will exist.

Speaker 3

一旦你知道球在那里了,用球来解释事情就合理了,因为在这个情境中,是球本身具有因果力,而不是组成球的各个部分。

And now once you know that the ball is there, it makes sense to explain things in terms of the ball because the ball has the causal force in this thing, not the individual things that make up the ball.

Speaker 3

但你仍然需要有规则,将这些小部分组合起来,从而得到球;而一旦你知道球存在了,就可以直接用球来讨论,而无需再诉诸底层的元素。

But you would wanna have the rules that combine the small things together in order to get you to the ball, and then when you know that the ball is there, then you can just talk in terms of the ball, and you don't have to appeal to the lower level things.

Speaker 3

有时候,只谈论球而不谈底层的东西更有意义。

And sometimes it just makes more sense to talk about the ball and not talk about the lower level things.

Speaker 3

我认为我们的感受是,我们正在寻找大语言模型内部的这些‘球’,以便能说:哦,这个语言模型之所以对这个提示做出这样的回答,但当你把句号前面加个空格时,它突然就答错了。

And I think the feeling is we're looking for those balls within the LLM so that you can say, oh, the reason the language model answered this way on this prompt, but when you change the period to have a space before it, it suddenly got the answer wrong.

Speaker 3

这是因为它是在用这些‘球’来思考。

That's because it's thinking in terms of these balls.

Speaker 3

对吧?

Right?

Speaker 3

如果我们试图在底层元素的层面上理解它,那就显得完全随机了。

And if we're trying to understand it at the level of these low level things, it just seems random.

Speaker 3

如果你错过了关键的因果因素,那就显得完全随机了。

If you're missing the key causal thing, it just seems random.

Speaker 3

也许根本就不存在什么关键的因果因素。

It could be that there is no key causal thing.

Speaker 3

对吧?

Right?

Speaker 3

这本身就是问题的一部分。

That's kind of part of the problem.

Speaker 3

我觉得它是存在的,如果我们能找到它,那简直太棒了。

I'm, like, thinking there is, and if we find it, this will be so cool.

Speaker 3

而常见的、合理的怀疑是,可能根本就不存在这样的因素。

And the common legitimate point of skepticism is there might just not be one.

Speaker 3

对吧?

Right?

Speaker 0

所以我们正在试图找出大语言模型中这些台球的形状和大小。

So we are trying to find the shape and size of these billiard balls in LLMs.

Speaker 0

但正如艾莉所说,这些台球是否真的存在还不得而知。

But as Ellie said, whether or not the billiard balls even exist is not certain.

Speaker 0

我们假设并希望它们存在,然后去寻找它们。

We're assuming and hoping that they are, and then going in and looking for them.

Speaker 2

如果我们思考这些层级如何应用于人类,一种试图获得人类智能机制理解的方式是观察我们的大脑内部。

And if we were to think about how these levels apply to humans, one way we try to gain mechanistic understanding of human intelligence is by looking inside our brains.

Speaker 2

如果你回想一下我们关于语言那一集中的埃夫·费德连科的研究,埃夫使用fMRI脑扫描正是如此。

If you think back to Ev Fedorenko's work from our episode about language, Ev's use of fMRI brain scanning is exactly this.

Speaker 2

她研究了当我们使用语言时大脑中活跃的通路。

She's looked at the pathways in the brain that light up when we use language.

Speaker 2

但想象一下,如果我们试图更进一步,用脑细胞内的质子、电子和中子来描述人类语言。

But imagine if we were to try to go even further and describe human language in terms of the protons, electrons and neutrons within our brain cells.

Speaker 2

如果你深入到这一层次的细节,就会失去在更大脑结构中可见的规律性。

If you go down to that level of detail, you lose the order that you can see in the larger brain structures.

Speaker 2

这变得没有条理了。

It's not coherent.

Speaker 0

大语言模型通过执行大量的矩阵乘法来工作。

LLMs work by performing vast numbers of matrix multiplications.

Speaker 0

在微观细节层面,这一切都是数学。

At the granular detail level, it's all math.

Speaker 0

我们可以用观察台球量子力学的方式来看待这些矩阵运算。

And we could look at those matrix operations in the same way we can observe the quantum mechanics of billiard balls.

Speaker 0

这可能会让我们发现某些事情正在发生,但不一定是我们真正寻找的东西。

And that'll probably show us that something's happening, but not necessarily what we're looking for.

Speaker 3

当我们对大型语言模型感到非常沮丧,觉得它们像是所谓的‘黑箱’时,或许部分原因就在于此。

And maybe part of when we're very frustrated with large language models and we're like, they seem like, quote, black boxes is because that's kind of what we're trying to do.

Speaker 3

我们试图用实现这些行为的矩阵乘法来描述这些高层行为,虽然它们确实是由矩阵乘法实现的,但这些乘法并不对应任何我们可以把握的直观结构。

We're trying to describe these higher level behaviors in terms of the matrix multiplications that implement them, which obviously they are implemented by matrix multiplications, but like it doesn't correspond to anything that looks like anything that we can grab onto.

Speaker 3

所以我认为,我们都渴望一种更高层次的描述。

So I think there's this higher level description that we all want.

Speaker 3

这种描述对于理解模型本身是有用的。

It's useful kind of for understanding the model for its own sake.

Speaker 3

对于类似人类相似性这样的问题,这也很有用。

It's also really useful for these questions about like similarity to humans.

Speaker 3

对吧?

Right?

Speaker 3

因为人类并不会拥有完全相同的矩阵乘法操作。

Because the humans aren't gonna have those exact same matrix multiplications.

Speaker 3

那么,究竟有哪些更高层次的抽象结构被表示出来了呢?

And so it's kind of like what are the higher level abstractions that are being represented?

Speaker 3

它们又是如何被操作的?

How are they being operated on?

Speaker 3

而相似性很可能就存在于这些地方。

And that's where the similarity is likely to exist.

Speaker 3

这就像是我们需要发明功能性磁共振成像(fMRI)和脑电图(EEG),并找到实现的方法。

It's like we kind of need to invent fMRIs and EEGs and we've got to figure out how to do that.

Speaker 3

我认为已经有一些工具存在了,它们足够好,可以开始逐步深入研究,我们也已经开始获得一些有趣的趋同结果,但这些显然还远非最终答案。

And I think there are some things that exist, they're good enough to start chipping away and we're starting to get some interesting converging results, but they're definitely not the last word on it.

Speaker 3

所以我会说,我们经常使用的一个最流行的工具,大概是2019年或2020年左右发明的,叫做路径修补。

So I would say one of the most popular tools that we use a lot that I think was really invented maybe back around 2019, 2020 or something, is called path patching.

Speaker 3

但那篇论文把它称为因果中介分析。

But that paper I think called it causal mediation analysis.

Speaker 3

我认为有很多论文几乎同时提出并完善了这项技术。

I think there are a lot of papers that kind of have simultaneously introduced and perfected this technique.

Speaker 3

但它基本上是在说,尝试找出模型中哪些组件对预测a而非b的决策贡献最大。

But it basically is saying like, try to find which components in the model are like maximally contributing to the choice of predicting a over b.

Speaker 3

因此,这一直是一种非常流行的技术。

So that's been a really popular technique.

Speaker 3

已经有很多论文使用了它,并且得出了非常可复现的结果。

There have been a lot of papers that have used it, and it has made very reproducible types of results.

Speaker 3

你基本上得到的是某种类似fMRI的东西。

And what you basically get is something you can think of like an fMRI.

Speaker 3

对吧?

Right?

Speaker 3

它就像是点亮了网络的某些部分,表明这些部分在该决策中高度活跃。

It's kind of like lights up parts of the network as saying these ones are highly active in this decision.

Speaker 3

这些部分则活跃度较低。

These ones are less active.
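The path-patching idea Ellie describes can be sketched on a toy two-layer network: run the model on a "clean" input, record activations from one component, splice them into a run on a different input, and see how much the output shifts. This is only a schematic of the general technique, not its real form; actual path patching targets specific attention heads and MLP blocks inside a transformer.

```python
import numpy as np

# Toy two-layer network standing in for a model component; the technique,
# not the network, is the point. Weights are random but fixed by the seed.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))

def forward(x, patch_hidden=None):
    """Forward pass; optionally overwrite the hidden activations."""
    h = np.maximum(x @ W1, 0.0)      # hidden layer (the "component" we study)
    if patch_hidden is not None:
        h = patch_hidden             # splice in activations from another run
    return h @ W2                    # logits for two choices, a vs b

x_clean = rng.normal(size=4)         # input on which the model behaves one way
x_corrupt = rng.normal(size=4)       # a minimally changed input

h_clean = np.maximum(x_clean @ W1, 0.0)   # record the clean hidden state

logits_corrupt = forward(x_corrupt)
logits_patched = forward(x_corrupt, patch_hidden=h_clean)

# If patching restores the clean behavior, the hidden layer mediated the
# a-vs-b decision; the size of the shift is its "activation" in the scan.
effect = logits_patched - logits_corrupt
```

If patching a component's activations moves the corrupted run's prediction back toward the clean run's, that component lights up as a mediator of the decision, much like an active region in an fMRI scan.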

Speaker 0

那么,我们如何从路径修补——这种大语言模型的fMRI——过渡到更高层次的概念,比如理解、意图和智能呢?

So then how do we get from path patching, this fMRI for large language models, to higher level concepts like understanding, intentions, and intelligence?

Speaker 0

我们常常想知道大语言模型是否真正理解,但‘理解’的含义往往取决于你如何定义它。

We often wonder if LLMs understand, but what it means to understand something can depend on how you define it.

Speaker 2

让我从矩阵乘法的讨论跳到最顶层的哲学层面。

Let me, jump up from the matrix multiplication discussion to the highest philosophical level.

Speaker 2

2022年有一篇论文对自然语言处理领域进行了调查,询问人们是否同意或不同意以下陈述。

So there was a paper in 2022 that was a survey of the natural language processing community, and it asked people to agree or disagree with the following statement.

Speaker 2

仅通过文本训练的某些生成模型,只要有足够的数据和计算资源,就能以某种非平凡的方式理解自然语言。

Some generative models trained only on text, given enough data and computational resources, could understand natural language in some nontrivial sense.

Speaker 2

这基本上是说,仅通过语言进行训练,在原则上是可能的。

So this is sort of like in principle, trained only on language.

Speaker 2

那么,你同意还是不同意这个观点呢?

So would you agree or disagree with that?

Speaker 3

我会说,我可能同意。

I would say maybe I would agree.

Speaker 3

对我来说,这几乎显而易见,因为我觉得这个问题的妙处在于它没有把‘理解’当作非黑即白的东西,这正是当别人问这个问题时,我通常首先会强调的。

To me, it feels almost trivial because I think what's nice about this question is it doesn't treat understanding like a binary, and I think that's the first place where I usually start when people ask this question.

Speaker 3

对我来说,我们现在正在进行的很多争论其实并不是关于大语言模型的。

To me, a a lot of the debate we're having right now is not about large language models.

Speaker 3

而是关于分布语义,以及我们是否认为分布语义能走这么远。

It's about distributional semantics, and it's whether we thought distributional semantics could go this far.

Speaker 2

你能解释一下什么是分布语义吗?

Can you explain what distributional semantics is?

Speaker 3

是的。

Yeah.

Speaker 3

你知道,自然语言处理一直以来都只是在使用文本。

You know, natural language processing has just been using text.

Speaker 3

所以,利用这样一个观点:一个词前后出现的词是其含义的绝佳线索。

And so using this idea that, like, the words that occur before and after a word are a really good signal of its meaning.

Speaker 3

因此,如果你有大量的文本,并根据它们共同出现的词进行聚类,那么猫和狗,或者狗和小狗、达尔马提亚犬就会出现在一起。

And so if you get a lot of text and you cluster things based on the words they co-occur with, then, yeah, cat and dog, or maybe dog and puppy and Dalmatian, will occur together.

Speaker 3

猫、狗、鸟和其他宠物会出现在一起。

Cat and dog and bird and other pets will occur together.

Speaker 3

斑马和大象,它们也会共同出现。

Zebra and elephant, those will co occur together.

Speaker 3

随着模型变大、文本增多,这种结构会变得更加复杂。

And as you get bigger models and more text, the structure becomes more sophisticated.

Speaker 3

因此,你可以在许多不同的维度上衡量相似性。

So you can cut similarity along lots of different dimensions.

Speaker 3

这不仅仅是一个维度的问题:这些事物是相似还是不同?

It's not just one dimension, are these things similar or different?

Speaker 3

我已经把宠物和动物园动物区分开来,但在另一个维度上,我又把食肉动物和食草动物区分开来了。

I've differentiated pets from zoo animals, but in this other dimension, I've differentiated like carnivores from herbivores.

Speaker 3

对吧?

Right?

Speaker 3

所以显然它漏掉了一些东西,你知道的。

So it's obviously missing some stuff, you know.

Speaker 3

它可能对“猫”以及它与其他词的关系了解很多,但它并不知道猫到底是什么。

It might know a lot about cat and as it relates to other words, but it doesn't know what a cat actually is.

Speaker 3

对吧?

Right?

Speaker 3

比如,它无法指出你看不到的猫,所以它不知道猫长什么样,也不知道猫摸起来是什么感觉。

Like it wouldn't be able to point out a cat you can't see, so it doesn't know what cats look like and doesn't know what they feel like.
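The clustering Ellie describes, where words end up with similar representations because they occur in similar contexts, can be sketched with a toy co-occurrence count. The four-sentence corpus below is made up purely for illustration:

```python
from collections import Counter
from itertools import combinations
import math

# A made-up four-sentence corpus; words used in similar contexts
# accumulate similar co-occurrence vectors.
sentences = [
    "the cat chased the toy",
    "the dog chased the toy",
    "the zebra grazed on grass",
    "the elephant grazed on grass",
]

vectors = {w: Counter() for s in sentences for w in s.split()}
for s in sentences:
    for a, b in combinations(s.split(), 2):  # co-occurrence within a sentence
        vectors[a][b] += 1
        vectors[b][a] += 1

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = lambda c: math.sqrt(sum(x * x for x in c.values()))
    return dot / (norm(u) * norm(v))

sim_pets = cosine(vectors["cat"], vectors["dog"])      # share all contexts
sim_cross = cosine(vectors["cat"], vectors["zebra"])   # share only "the"
```

Here cat and dog share all their contexts, so their vectors come out maximally similar, while cat and zebra overlap only on a function word; bigger corpora and bigger models refine this same signal into many more dimensions.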

Speaker 2

所以我觉得,那个调查的结果挺有意思的。

So I think, you know, the the results of that survey were interesting.

Speaker 2

那是2022年的事了,现在可能不一样了,但一半人同意,一半人不同意。

That was in 2022, so it might be different now, but half the people agreed and half the people disagreed.

Speaker 2

而关于分歧,我觉得问题是:一个仅通过语言训练的系统,原则上能否以非平凡的方式理解语言?

And so the disagreement, I think, you know, the question was could could something trained only on language in principle understand language in a nontrivial sense?

Speaker 2

我想,这仅仅是人们对‘理解’这个词的不同解读方式。

And I guess it's just a kind of a difference between how people interpret the word understand.

Speaker 2

那些持不同意见的人,我觉得就像你所说的,这些系统知道如何使用‘猫’这个词,但它们并不知道猫是什么。

And the people who disagreed, I would say that, like what you said, these systems know how to use the word cat, but they don't know what a cat is.

Speaker 2

有些人会说,这不算理解。

Some people would say, that's not understanding.

Speaker 3

没错。

Right.

Speaker 3

我觉得这归根结底是人们对‘理解’和‘非平凡’的定义不同。

I think this gets down to like, yeah, people's definition of understand and people's definition of trivial.

Speaker 3

我觉得,这更像是在喝点酒的时候可以深入讨论的话题,但眼下这算得上是科学讨论吗?

And I think this is where I feel like it's an interesting discussion to have over drinks or something like that, but is it a scientific discussion right now?

Speaker 3

我经常发现,这根本不是一场科学讨论。

And I often find it's not a scientific discussion.

Speaker 3

有些人就是觉得这不算理解,而另一些人则觉得当然算。

Like, some people just feel like this is not understanding and other people feel like sure it is.

Speaker 3

而且他们的观点不会改变,因为我不知道该如何与他们沟通。

And there's no moving their opinions because I don't know how you speak to that.

Speaker 3

所以你要做的就是去弄清楚人类身上真正发生了什么。

So the way you have to speak to it is try to figure out what's really going on in humans.

Speaker 3

假设我们都同意人类确实理解,而这是我们唯一都认同的例子,我们需要弄清楚这究竟是什么。

Assuming we all agree that humans really understand and that's the only example we all agree on, we need to figure out what it is.

Speaker 3

然后我们必须弄清楚大语言模型中的不同之处,再判断这些差异是否重要。

And then we have to figure out what's different in the LMs, and then we have to figure out whether those differences are important or not.

Speaker 3

我觉得,说实话,这根本就是一场漫长的博弈。

And I think like, I don't know, that's just a really long game.

Speaker 3

所以,尽管我很喜欢这个问题,但我越来越厌倦被问到它,因为我觉得这根本不是一个科学问题。

So as much as I like kind of love this question, then I've increasingly gotten annoyed having to answer it because I'm like, I just don't feel like it's a scientific question.

Speaker 3

但它本可以是的。

But it could be.

Speaker 3

这不像在问死后世界之类的事情。

It's not like asking about the afterlife or something.

Speaker 3

这并不在可回答问题的范围之外。

It's not outside of the realm of answerable questions.

Speaker 0

在我们之前的节目中,我们讨论过人工智能领域的一个重大问题,即大型语言模型是否具备心理理论,研究人员最初是通过人类心理学测试(如萨利-安妮场景)来评估这一点的。

In our previous episodes, we've talked about how one of the big questions around artificial intelligence is whether or not large language models have theory of mind, which researchers first started assessing with human psychology tests like the Sally-Anne scenario.

Speaker 0

而这一过程又引出了第二个问题。

And a second question arose out of that process.

Speaker 0

如果大型语言模型能够通过人类的心理理论测试,比如在更改细节和名称后通过萨利-安妮测试,它们真的在进行复杂的推理吗?

If LLMs can pass a human theory of mind test, if they pass Sally-Anne when the details and the names are changed, are they actually doing complicated reasoning?

Speaker 0

还是只是在训练数据中学会了更高级的模式匹配?

Or are they just getting more sophisticated at matching patterns in their training data?

Speaker 0

正如艾莉所说,她关心的是我们在说‘大型语言模型理解’或‘不理解’时,是否保持了严谨和科学的态度。

As Ellie said, she cares that we're intentional and scientific when we say things like an LLM understands or doesn't understand.

Speaker 0

然而

And yet

Speaker 3

它们学到的结构比我原先预想的要有趣得多。

They're learning much more interesting structure than I would have guessed.

Speaker 3

所以,我刚接触这项工作时,我会称自己为神经网络怀疑者,到现在我仍然觉得自己是这样的。

So I would say my general coming into this work, I would have called myself a neural network skeptic, and I still kind of view myself as that.

Speaker 3

对吧?

Right?

Speaker 3

当我听到人们说‘它们理解’或‘它们思考’之类的话时,我常常感到恼火;但事实上,我花更多时间写论文,指出这里存在一些有趣结构。

Like I'm very often, like, annoyed when I hear people say stuff like they understand or they think. And yet, I actually feel like I spend more of my time writing papers saying, like, oh, there is interesting structure here.

Speaker 3

它们确实具备某种组合性概念,而且我实际上经常使用这些词语。

They do have some notion of compositionality, and I actually do use those words a lot.

Speaker 3

我真的很努力在论文中避免这样用词,但当我讲话时,我就觉得没有别的词能替代了,而且我自己编造新术语实在太低效了。

I really try not to in papers, but when I'm talking, I'm like, I just don't have another word for it and it is so inefficient for me to like come up with some new jargon.

Speaker 3

所以我在演讲中毫无节制地将它们拟人化,这很糟糕,我会在开场先做一个总括性的道歉,然后继续这么做。

So I anthropomorphize like crazy in my talks, and it's terrible, and I give a blanket apology at the beginning, and then I keep doing it.

Speaker 3

但关键的一点是,我并不愿意说它们‘思考’或‘理解’,或使用其他类似词汇;但我确实已经不再断言它们显然不能做或显然没在做的事情了。

But so one big takeaway is I'm not willing to say that they think or they understand or any of these other words, but I definitely have stopped making claims about what they obviously can't do or even obviously aren't doing.

Speaker 3

对吧?

Right?

Speaker 3

因为我不得不收回好几次自己说过的话,我觉得我们对它的理解实在太少了,不如都别急着下定义,先花点时间好好研究一下。

Because I had to eat my words a couple times, and I think it's just we understand so little that we should all just stop trying to call it, and just take a little bit of time to study it.

Speaker 3

我觉得这样挺好。

I think that's like okay.

Speaker 3

我们没必要现在就回答它们是否智能这类问题,这有什么意义呢?

Like we don't need an answer right now on whether they're intelligent or not, what is the point of that?

Speaker 3

你知道的,这种判断注定是错的。

You know, it's just guaranteed to be wrong.

Speaker 3

所以,不如我们花点时间弄清楚,问这个问题到底想达成什么目标,然后好好地去做。

And so, like, let's just take some time and figure out what we're trying to even do by asking that question and, you know, do it right.

Speaker 3

我觉得现在看到大语言模型的出现,它们在太多错误的方面太像人类了,用‘智能’这个词来思考这个问题并不合适。

I think right now, seeing LLMs on the scene, it's like too similar to humans in all the wrong kind of ways to make intelligence the right way to be thinking about this.

Speaker 3

所以,如果能彻底放弃这个词,我会很高兴。

And so I would be happy if we just could abandon the word.

Speaker 3

但问题正如我所说,一旦放弃这个词,就会陷入一大堆术语里,我觉得我们都应该达成共识:我们正处于重新定义这个词的过程中,这可能需要一些时间。

The problem, like I said, is then you get bogged down in a ton of jargon and like I think we should all just be in agreement that we are in the process, and it might take a while, of redefining that word.

Speaker 3

我希望这个词能被拆分成许多不同的术语,十年后,你在论文中根本不会再看到这个词,取而代之的是其他更具体的术语,人们用它们来讨论各种不同的能力。

I hope it'll get fractured up into many different words, and that a decade from now, you just won't even see that in the papers anywhere, but you will see other types of terms where people are talking about other kind of much more specific abilities.

Speaker 2

而且还要愿意接受不确定性,但这个领域里很少有人能做到这一点。

Well, also just sort of willing to put up with uncertainty, which very few people in this field seem

Speaker 3

能够做到这一点。

to be able to do.

Speaker 3

如果我们都能说,咱们就等上十年,那该多好。

It would be nice if we could all just be like, let's all just wait a decade.

Speaker 3

我知道现实世界不会允许我们这么做,但我真希望我们能这么做。

Like, I get the world wouldn't allow that, but, like, I wish we could just do that.

Speaker 3

对吧?

Right?

Speaker 0

艾丽卡也同意。

And Erica agrees.

Speaker 0

她对动物的研究让她在对其他实体的能力做出假设之前会三思。

Her work with animals has made her pause before making assumptions about what other entities can and can't do.

Speaker 1

我不断去听新的演讲,你知道,我本来有自己的观点,但听到一场新的演讲后,我会说,哦,这真有意思。

I keep going to new talks and, you know, I sort of have an opinion, and I hear a new talk and I go, oh, well, that's really interesting.

Speaker 1

我不得不重新调整我的观点。

And I have to kind of revise my opinion.

Speaker 1

我对人类科学家不断改变标准,比如‘什么让人类独特?’这个问题,提出了很多质疑。

And I push back a lot on human scientists moving the bar on, oh, what makes humans unique?

Speaker 1

什么让人类语言独特?

What makes human language unique?

Speaker 1

然后我发现,我自己在对待大语言模型时,也有一点在做同样的事。

And then I sort of find myself doing that a little bit with LLMs.

Speaker 1

所以,我需要在这方面保持一点谦逊。

And so I, you know, need to have kind of a little bit of humility in that.

Speaker 1

我不认为它们具有心理理论,但我认为,既要证明它们没有心理理论,又要解释为什么没有,都不是简单的事,你知道吗?

So I don't think they have a theory of mind, but I think demonstrating, one, that they don't, and two, why they don't, are not simple tasks, you know?

Speaker 1

对我来说,重要的是,我不能武断地断言:‘我相信它们没有。’

And it's important to me that I don't just sort of dogmatically say, well, I believe that they don't.

Speaker 1

对吧?

Right?

Speaker 1

因为我觉得人们对动物有很多固有的看法,然后就会说,嗯,我相信动物没有概念。

Because I think people believe a lot of stuff about animals and then go into it saying, well, I believe, you know, animals don't have concepts.

Speaker 1

然后你就会问,为什么没有?

And then you say, well, why not?

Speaker 1

因为它们没有语言。

Well, because they don't have language.

Speaker 1

然后你就说,好吧。

And it's like, okay.

Speaker 1

所以我认为,LLM本质上是在进行下一个词的预测。

So I think that, you know, LLMs are fundamentally doing next token prediction.

Speaker 1

我知道你可以将它们构建在能够执行更复杂任务的系统中。

And I know you can, like, build them within systems that do more sophisticated things.

Speaker 1

但从根本上说,以我这个外行的理解来说——我并不构建这些系统,而你对这个的了解远比我多。

But fundamentally, to the extent that my layperson understanding goes, I mean, I do not build these systems, and you know much more about this than I do.

Speaker 1

但我认为,它们非常擅长根据语料库预测人类会如何回答这些问题——这些语料库包含了人类对完全相同或结构、逻辑上相似问题的回答方式。

But I think that they're very good at predicting the ways that humans would answer those questions based on corpora of, you know, how humans answer either exactly those questions or questions that are similar in form, that are sort of analogous, structurally and logically similar.

Speaker 1

而且,我花了不少时间试图论证黑猩猩具备心理理论,历史上人们对此一直持强烈反对态度,不过现在我认为人们开始逐渐开放一些了。

And I mean, I've been spending quite a bit of time trying to argue that chimpanzees have a theory of mind and people are historically I mean, now I think they're becoming a little more open to it, but historically have been quite opposed to that idea.

Speaker 1

但我们却很容易因为LLM能用语言回答相关问题,就直接把这类想法归因于它们。

But we'll very readily attribute those ideas to an LLM simply because they can answer verbal questions about it.

Speaker 0

我们会轻易地把人类的特征赋予LLM,因为与埃里卡研究的黑猩猩不同,它们说话的方式和我们一样。

We'll readily attribute human characteristics to LLMs because unlike the chimpanzees Erica studies, they speak like us.

Speaker 0

它们是基于我们的语言构建的。

They're built on our language.

Speaker 0

这使得它们在表层上对我们来说更熟悉,但当我们试图理解它们实际如何运作时,却又显得更加陌生。

And that makes them both more familiar to us on a surface level and more alien when we try to figure out how they're actually doing things.

Speaker 2

之前,埃里卡描述了研究动物智力时的一种权衡。

Earlier, Erica described a trade off in studying intelligence in animals.

Speaker 2

使用人类语言这类我们熟悉的指标,与试图从动物自身的角度理解它们(比如大象通过地面震动来交流)相比,我们能获得多少收益?

How much do we gain by using the metrics we're familiar with, like human language, versus trying to understand animals on their own terms, like elephants that rumble through the ground to communicate?

Speaker 0

我们问了艾莉,这如何适用于大型语言模型。

And we asked Ellie how this applies to large language models.

Speaker 0

它们身上也存在这种权衡吗?

Does that trade off exist with them too?

Speaker 3

是的。

Yeah.

Speaker 3

完全如此。

Totally.

Speaker 3

从大型语言模型的角度来看,我认为在我们的实验室里,我们实际上同时做了这两方面的工作。

From the point of view of LLMs, I actually think within our lab, we do a little bit of both of these.

Speaker 3

我经常更多地谈论以人类的视角来理解大型语言模型,这显然比对待动物时要多得多。

I often talk more about trying to understand LLMs in human terms, definitely much more so than with animals.

Speaker 3

语言模型被发明出来就是为了与我们交流并为我们做事,因此试图强行建立这种类比既不荒谬也不反常。

Like, LLMs were invented to communicate with us and do things for us, so it is not unreasonable, or it's not unnatural, to try to force that analogy.

Speaker 3

对吧?

Right?

Speaker 3

与大象不同,大象在我们出现之前就已存在,它们做着自己的事,根本不在乎我们,甚至可能希望我们根本不存在。

Unlike elephants, which existed long before us and are doing their own thing, and they couldn't care less and would probably prefer that we weren't there at all.

Speaker 3

对吧?

Right?

Speaker 2

另一方面,埃里卡觉得更难解读它们,因为尽管它们能按照我们的标准表现,但其底层构成对她来说不如动物直观。

On the other hand, Erica finds them more difficult to interpret because even though they can perform on our terms, the underlying stuff that they're made of is less intuitive for her than animals.

Speaker 1

你知道,我不确定,因为大语言模型本质上不是一个单一的主体。

You know, again, I'm not sure because an LLM is not fundamentally a single agent.

Speaker 1

对吧?

Right?

Speaker 1

它是一个集体。

It's a collective.

Speaker 1

它反映的是集体的知识和信息。

It's reflecting collective knowledge, collective information.

Speaker 1

我觉得我更清楚该如何解读一只鹦鹉、一只海豚或一只猩猩在完成任务时的表现。

I feel like I know much more how to interpret a single parrot or a single dolphin or a single orangutan performing on a task.

Speaker 1

他们是如何理解的?

How do they interpret it?

Speaker 1

他们是如何回应的?

How do they respond?

Speaker 1

对我来说,这个问题非常直观。

To me, that question is very intuitive.

Speaker 1

那个意识可能与我的截然不同,但那里确实存在一个意识。

That mind might be very different from my own, but there is a mind there.

Speaker 1

有一个自我,无论这个自我是否具有意识,是否能够觉察自身,我认为这些都是重大问题,但确实存在一个自我。

There's a self and whether that self is conscious, whether that self is aware of itself, those I think are are big questions, but there is a self.

Speaker 1

有一个生命降生于世,具有叙事的连续性,终有一天会像我们所有人一样死去。

There was something that was born into the world that has narrative continuity and one day will die like we all will.

Speaker 1

对吧?

Right?

Speaker 1

大语言模型没有这种特性。

LLMs don't have that.

Speaker 1

它们不是出生在这个世界上的。

They aren't born into the world.

Speaker 1

它们没有叙事的连续性。

They don't have narrative continuity.

Speaker 1

它们不会像我们那样死亡。

They don't die in the same way that we do.

Speaker 1

所以我认为,它们是一种人类从未接触过的某种集合体。

And so I think it's a, you know, it's a collective of a kind that humans have never interacted with before.

Speaker 1

我认为我们的思维还没有跟上这项技术的发展。

And I don't think that our thinking has caught up with the technology.

Speaker 1

因此,我觉得我们对它们提出的问题并不正确,因为这些实体、集合体或程序与人类历史上任何曾经历过的都截然不同。

So I just don't think that we're asking the right questions about them, because these are entities or collectives or programs unlike anything else that we have ever experienced in human history.

Speaker 0

所以,梅兰妮,让我们回顾一下本集的内容。

So, Melanie, let's recap what we've done in this episode.

Speaker 0

我们探讨了评估人类、非人类动物和机器智能的概念。

We've looked at the notion of assessing intelligence in humans, nonhuman animals, and machines.

Speaker 0

关于智力的思想史在很大程度上是以人类为中心的。

The history of thought concerning intelligence is very much human centered.

Speaker 0

而且,你知道,我们关于如何评估智力的想法,总是重视那些最像人类的特质。

And, you know, our ideas about how to assess intelligence have always valued the things that are most human-like.

Speaker 2

是的。

Yeah.

Speaker 2

我非常认同艾丽卡关于我们在研究动物时缺乏想象力的评论。

I really resonated with Erica's comment about our lack of imagination doing research on animals.

Speaker 2

她向我们展示了以人类为中心的视角如何主导了动物认知研究,并且这种视角可能让我们忽视了动物思维的重要方面,没有给予它们足够的认可。

And she showed us how a human centered view has really dominated research in animal cognition and that it might be blinding us to important aspects of how animals think, not giving them enough credit.

Speaker 0

但有时我们会通过拟人化而高估了动物。

But sometimes we give animals too much credit by anthropomorphizing them.

Speaker 0

比如,当你对你的狗或猫的所谓想法或感受做出假设时,我们把自己的情感和对世界的观念投射到了它们身上。

Like, when you make assumptions about what your dog or cat is, quote, unquote, thinking or feeling, we project our emotions and our notions of the world onto them.

Speaker 0

对吧?

Right?

Speaker 2

是的。

Yeah.

Speaker 2

我们以人类为中心的假设确实可能在许多方面误导我们。

Our human centered assumptions can definitely lead us astray in many ways.

Speaker 2

但艾莉指出了在评估大语言模型时存在的类似问题。

But Ellie pointed out similar issues for assessing LLMs.

Speaker 2

我们给它们设计了一些为人类准备的测试,比如SAT或律师资格考试。

We give them tests that are designed for humans like the SAT or the bar exam.

Speaker 2

然后,如果它们通过了这些测试,我们就错误地假设它们和通过这些测试的人类具备相同的特质。

And then if they pass the test, we make the mistake of assuming the same things that we would for humans passing that test.

Speaker 2

但看起来,它们可以在没有真正掌握这些测试原本想评估的通用底层能力的情况下通过测试。

But it seems that they can pass these tests without actually having the general underlying skills that these tests were meant to assess.

Speaker 0

但艾莉也指出,人类经常也会钻这些测试的空子。

But Ellie also points out that humans often game these tests.

Speaker 0

也许问题并不在于测试本身。

Maybe it's not the tests themselves that are the problem.

Speaker 0

也许问题出在那些参加测试的人类、动物或机器身上。

Maybe it's the humans or the animals or the machines that take them.

Speaker 2

当然。

Sure.

Speaker 2

我们评估人类智力的方法一直都有些问题。

Our methods of assessing human intelligence have always been a bit problematic.

Speaker 2

但另一方面,人类花了数十年时间研究哪些通用能力与这些测试分数相关,而我们才刚刚开始探索如何评估像LLM这样的AI系统。

But on the other hand, there's been decades of work on humans trying to understand what general abilities correlate with these test scores while we're just beginning to figure out how to assess AI systems like LLMs.

Speaker 2

正如我们之前所述,艾莉在试图理解AI系统内部运作机制方面的研究被称为机制理解或机制可解释性。

Ellie's own work in trying to understand what's going on under the hood in AI systems, as we described before, is called mechanistic understanding or mechanistic interpretability.

Speaker 0

我的理解是,她在寻找一种比仅仅分析神经网络中的权重和激活值更高级的方式来理解LLM。

The way I understood this is that, you know, she's looking at ways to understand LLMs at a higher level than just weights and activations in a neural network.

Speaker 0

这类似于神经科学家的目标,即在不查看每个神经元的激活或每个突触强度的情况下理解大脑。

It's analogous to what neuroscientists are after, right, in understanding the brain without having to look at the activation of every neuron or the strength of every synapse.

Speaker 2

是的。

Yeah.

Speaker 2

正如艾莉所说,我们需要为大语言模型开发类似fMRI的技术。

As Ellie said, we need something like fMRIs for LLMs.

Speaker 2

或者,由于艾丽卡指出,大语言模型或许更应被视为一种集体智能,而非个体智能,因此我们可能需要完全不同的方法。

Or maybe we actually need something entirely different since as Erica pointed out, an LLM might be better thought of as a collective kind of intelligence rather than an individual.

Speaker 2

但无论如何,这项工作还处于初始阶段。

But in any case, this work is really at its inception.

Speaker 0

是的。

Yeah.

Speaker 0

而且正如艾莉和艾丽卡都指出的,我们需要更好地厘清‘智能’和‘理解’这类词的含义,而这些概念目前尚未被严谨定义。

And also as both Ellie and Erica pointed out, we need to understand better what we mean by words like intelligence and understanding, which are not yet rigorously defined.

Speaker 0

对吧?

Right?

Speaker 2

完全不是。

Absolutely not.

Speaker 2

也许我们不该做出诸如‘大语言模型理解世界’或‘大语言模型完全无法理解任何东西’这样的宏大断言,而应该听从艾莉的建议,学会接受不确定性。

And maybe instead of making grand proclamations like LLMs understand the world or LLMs can't understand anything, we should do what Ellie urges us to do, that is be willing to put up with uncertainty.

Speaker 0

在本季的最后一集中,我会向梅兰妮了解更多关于她对这些话题的看法。

In our final episode of the season, I'll ask Melanie more about what she thinks about all these topics.

Speaker 0

你会听到她关于智能领域的背景、她对通用人工智能的看法以及我们是否能实现它、该行业是否可持续,以及她是否担心未来的AI。

You'll hear about her background in the fields of intelligence, her views on AGI and if we can achieve it, how sustainable the industry is, and if she's worried about AI in the future.

Speaker 0

敬请期待下一集《复杂性》。

That's next time on Complexity.

Speaker 0

《复杂性》是圣塔菲研究所的官方播客。

Complexity is the official podcast of the Santa Fe Institute.

Speaker 0

本集由凯瑟琳·蒙库尔制作。

This episode was produced by Katherine Moncure.

Speaker 0

我们的主题曲由米奇·米尼亚诺创作,其他音乐来自Blue Dot Sessions。

Our theme song is by Mitch Mignano and additional music from Blue Dot Sessions.

Speaker 0

我是阿巴。

I'm Abha.

Speaker 0

感谢收听。

Thanks for listening.
