
智力的本质,第4集:婴儿与机器

Nature of Intelligence, Ep. 4: Babies vs Machines

本集简介

嘉宾:
琳达·史密斯,印第安纳大学伯明顿分校心理与脑科学系杰出教授、校长教授
迈克尔·弗兰克,斯坦福大学心理学系本杰明·斯科特·克罗克人类生物学教授

主持人:阿巴·艾莉·菲博与梅兰妮·米切尔
制作人:凯瑟琳·蒙库尔
播客主题音乐:米奇·米尼亚诺
关注我们:Twitter • YouTube • Facebook • Instagram • LinkedIn • Bluesky

更多信息:
教程:机器学习基础
讲座:人工智能
圣塔菲研究所项目:教育

书籍:
《人工智能:给思考者的人类指南》,作者:梅兰妮·米切尔

演讲:
《“自生成学习”为何比表面看起来更激进且影响深远》,琳达·史密斯
《儿童早期语言学习:社会人工智能的灵感》,迈克尔·弗兰克,斯坦福大学HAI
《人工智能的未来》,梅兰妮·米切尔

论文与文章:
《基于婴儿第一人称视角视频的课程学习》,NeurIPS 2023(2023年9月21日)
《婴儿的视觉世界:视觉学习的日常统计规律》,作者:斯瓦帕纳·贾亚拉曼与琳达·B·史密斯,载于《剑桥婴儿发展手册:大脑、行为与文化背景》第20章,剑桥大学出版社(2020年9月26日)
《婴儿的学习经验能否解决数据饥渴型人工智能的问题?》,《自然》(2024年3月18日),doi.org/10.1038/d41586-024-00713-5
《经验片段与生成智能》,《认知科学趋势》(2022年10月19日),doi.org/10.1016/j.tics.2022.09.012
《评估大型语言模型能力的婴儿式起步》,《自然综述:心理学》(2023年6月27日),doi.org/10.1038/s44159-023-00211-x
《辅助任务需求掩盖了小型语言模型的能力》,COLM(2024年7月10日)
《利用视觉问答模型从具身语言中学习功能词的含义》,《认知科学》(2024年5月14日首次发表),doi.org/10.1111/cogs.13448

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

你们将听到的声音是在不同国家、城市和工作场所远程录制的。

The voices you'll hear were recorded remotely across different countries, cities, and workspaces.

Speaker 1

训练儿童所用的数据是由进化筛选出来的。

The data for training children has been curated by evolution.

Speaker 1

这与所有大型数据模型形成鲜明对比,对吧?

This is in sort of marked contrast to all the large data models, right?

Speaker 1

它们只是随意抓取一切内容。

They just scrape everything.

Speaker 1

你会通过从网上抓取信息来教育你的孩子吗?

Would you educate your kid by scraping off the web?

Speaker 0

来自圣塔菲研究所,这里是复杂性。

From the Santa Fe Institute, this is Complexity.

Speaker 0

我是梅兰妮·米切尔。

I'm Melanie Mitchell.

Speaker 0

我是阿巴·艾莉·菲博。

And I'm Abha Eli Phoboo.

Speaker 0

到目前为止,本季我们从几个不同的角度探讨了智能,显然AI系统和人类的学习方式非常不同。

So far in this season, we've looked at intelligence from a few different angles, and it's clear that AI systems and humans learn in very different ways.

Speaker 0

有人认为,如果我们只是让AI以人类的方式学习,它们就会更接近人类般的智能。

And there's an argument to be made that if we just train AI to learn the way humans do, they'll get closer to human like intelligence.

Speaker 2

但有趣的是,我们自身的发育仍然是研究人员正在解开的一个谜团。

But the interesting thing is our own development is still a mystery that researchers are untangling.

Speaker 2

对于像大型语言模型这样的AI系统,工程师们设计了它们学习算法的结构以及输入的数据。

For an AI system like a large language model, the engineers create the structure of their learning algorithms and the data that's being fed to them.

Speaker 2

但对于婴儿来说,我们仍在探索这些原始成分最初是如何组合在一起的。

With babies, though, we're still learning about how the raw ingredients come together in the first place.

Speaker 0

今天,我们将透过婴儿的眼睛来看这个世界。

Today, we're going to look at the world through an infant's eyes.

Speaker 0

我们知道,婴儿吸收的信息与大型语言模型的早期发展截然不同。

We know that the information babies are absorbing is very different from an LLM's early development.

Speaker 0

但究竟有多不同呢?

But how different is it?

Speaker 0

婴儿在不同发展阶段体验到的是什么?

What are babies experiencing at different stages of their development?

Speaker 0

他们如何从经历中学习?

How do they learn from their experiences?

Speaker 0

婴儿与机器之间的差异有多重要?

And how much does the difference between babies and machines matter?

Speaker 0

第一部分:婴儿眼中的世界。发展心理学自十九世纪末以来就一直存在,研究认知从出生到成年的演变过程。

Part one: The World Through a Baby's Eyes Developmental psychology, the study of how cognition unfolds from birth to adulthood, has been around since the late nineteenth century.

Speaker 0

在它头一百年的历史中,这个领域主要由心理学家观察婴儿和儿童,并提出理论。

For the first hundred years of its history, this field consisted of psychologists observing babies and children and coming up with theories.

Speaker 0

毕竟,婴儿无法直接告诉我们他们的体验。

After all, babies can't tell us directly what they're experiencing.

Speaker 2

但如果科学家能通过婴儿自己的眼睛来看世界呢?

But what if scientists could view the world through a baby's own eyes?

Speaker 2

这种可能性在过去二十年左右才成为现实。

This has only become possible in the last twenty years or so.

Speaker 2

心理学家现在可以在婴儿头上安装摄像头,记录他们所看到和听到的一切。

Psychologists are now able to put cameras on babies' heads and record everything that they see and hear.

Speaker 2

这些摄像头收集的数据正开始改变科学家们对婴儿早期学习最重要体验的看法。

And the data collected from these cameras is beginning to change how scientists think about the experiences most important to babies' early learning.

Speaker 1

我是琳达·史密斯,印第安纳大学的教授。

I'm Linda Smith, and I'm a professor at Indiana University.

Speaker 1

我是一名发展心理学家,我长期以来一直感兴趣的是婴儿如何习得语言。

I'm a developmental psychologist, and what I am interested in and have been for a kind of long career is how infants break into language.

Speaker 1

有些人认为这意味着你只需研究语言,但实际上,婴儿能用身体做什么、他们控制身体的能力,决定了他们控制注意力的能力以及输入的内容——他们如何操作物体、是否发出声音,这些都直接影响语言学习。

And some people think that means that you just study language, but in fact, what babies can do with their bodies, how well they can control their bodies, determines how well they can control their attention and what the input is, what they do, how they handle objects, whether they emit vocalizations, all those things play a direct role in learning language.

Speaker 1

因此,我采取一种复杂或多模态系统的方法,试图理解这些过程如何相互关联,以及所有这些部分如何协同作用。

And so I take a kind of complex or multimodal systems approach to trying to understand the cascades and how all these pieces come together.

Speaker 2

琳达·史密斯是印第安纳大学的心理与脑科学校级教授。

Linda Smith is the chancellor's professor of psychological and brain sciences at Indiana University.

Speaker 2

她是婴儿头戴式摄像头研究的先驱之一。

She's one of the pioneers of head mounted camera research with infants.

Speaker 1

我开始给婴儿戴上头戴式摄像头,是因为在我整个职业生涯中,许多主要理论家曾多次提出,各种事物都是无法习得的。

I began putting head cameras on babies because throughout my career people, major theorists, have at various points made the point that all kinds of things were not learnable.

Speaker 1

语言是无法习得的。

Language wasn't learnable.

Speaker 1

基本上,乔姆斯基就是这么认为的。

Chomsky said that, basically.

Speaker 1

所有这些都无法习得。

All this is not learnable.

Speaker 1

你唯一可能知道这些的方式,就是它们原本就是预设好的知识。

The only way you could possibly know it was for it to be a form of pre wired knowledge.

Speaker 1

即使在七十年代,我也觉得我们的能力远不止如此,我当然希望,如果我被置于某个神秘世界或某种矩阵空间中,那里的物理法则完全不同,我也能弄明白。

It seemed to me, even back in the seventies, that we are way smarter than that, and I should sure hope that if I was put on some mysterious world in some matrix space or whatever, where the physics work differently, that I could figure it out.

Speaker 1

但我们根本不知道这些数据是什么样子。

But we had no idea what the data are.

Speaker 1

大多数人认为,在日常生活的规模下,丰富的经验对每个人来说统计特征都差不多。

Most people assume that at the scale of daily life, of massive experience, the statistics are kind of the same for everybody.

Speaker 1

但通过给婴儿佩戴头戴式摄像头,我们发现他们的视觉经验绝对不是相同的——我并不是唯一这么认为的人,有很多人都在做这项研究。我们发现,这些经验绝对不相同。

But by putting head cameras on babies, we have found out, and I'm not alone in this, there's a lot of people doing this, that it is absolutely not the same.

Speaker 2

琳达在谈论人类所经历的视觉世界中的统计规律。

Linda's talking about the statistics of the visual world that humans experience.

Speaker 2

我们感知到的是相关性。

We perceive correlations.

Speaker 2

某些物体往往会一起出现。

Certain objects tend to appear together.

Speaker 2

例如,椅子通常在桌子旁边,树木在灌木丛旁边,鞋子是穿在脚上的。

For example, chairs are next to tables, trees are next to shrubs, shoes are worn on feet.

Speaker 2

或者在更基础的无意识层面,我们感知物体边缘、颜色、光线某些特性之间的统计相关性。

Or at an even more basic unconscious level, we perceive statistical correlations among edges of objects, colors, certain properties of light and so on.

Speaker 2

我们在空间和时间上都能感知到相关性。

We perceive correlations in space as well as in time.
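The co-occurrence statistics described above can be made concrete with a toy count. The scenes and object labels below are invented for illustration; this is a minimal sketch of the idea, not any lab's actual analysis pipeline.

```python
from collections import Counter
from itertools import combinations

# Toy "scenes": which objects appear together in each view.
# The labels are hypothetical stand-ins for what a head camera records.
scenes = [
    {"chair", "table", "cup"},
    {"chair", "table"},
    {"tree", "shrub"},
    {"shoe", "foot"},
    {"tree", "shrub", "bird"},
]

# Count how often each pair of objects co-occurs across scenes;
# a learner can exploit these regularities without any labels.
pair_counts = Counter()
for scene in scenes:
    for pair in combinations(sorted(scene), 2):
        pair_counts[pair] += 1

print(pair_counts[("chair", "table")])  # chairs and tables co-occur twice
```

The same counting idea extends to correlations in time by pairing objects across consecutive frames rather than within a single scene.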

Speaker 0

琳达和其他人发现,最年幼的婴儿在最早几个月所接触的视觉统计规律,与我们成年人通常所见的非常不同。

Linda and others discovered that the visual statistics that the youngest babies are exposed to, what they're learning from in their earliest months, are very different from what we adults tend to see.

Speaker 1

他们就在这个世界里。

There they are in the world.

Speaker 1

他们坐在自己的小座椅上,你知道的,看着,或者趴在某人的肩膀上看着。

They're in their little seats, you know, looking, okay, or on somebody's shoulder looking.

Speaker 1

他们面前的图像,眼睛所能接收到的输入,变化得极其缓慢。

And the images in front of their face, the input available to the eye, changes extraordinarily slowly.

Speaker 1

这种缓慢有助于提取信息。

And slow is good for extracting information.

Speaker 1

在头三个月里,婴儿在视觉的基础方面取得了显著进展,比如边缘感知、对比敏感度和色彩敏感度。

In the first three months, babies make remarkable progress in the tuning of the foundational aspects of vision: edges, contrast sensitivity, chromatic sensitivity.

Speaker 1

但这并不意味着他们要等到所有基本视觉能力都完善了才能做其他事情。

But it's not like they wait till they get all the basic vision worked out before they can do anything else.

Speaker 1

在头三个月里,婴儿进入了一个对人脸敏感的时期,他们能认出父母的脸,并对人脸产生偏好。

The first three months define the period of faces: they recognize parents' faces, they become biased toward faces.

Speaker 1

如果他们生活在某个特定族群中,他们就能更好地识别和区分该族群的脸,而不是其他族群的脸。

If they live in one ethnic group, they can recognize those faces better and discriminate them better than if they live in another.

Speaker 1

所有这些在三个月内就发生了。

And all this happens by three months.

Speaker 1

一些研究表明,对于先天性白内障的婴儿,如果在四个月大之前不移除白内障,面部知觉将受到终身影响——这是达芙妮·莫尔令人惊叹的研究成果。

And some measures suggest that in the first three to four months, and this is Daphne Maurer's amazing work on babies with cataracts, if an infantile cataract is not removed before four months of age, human face perception is disrupted for life.

Speaker 1

这很可能发生在较低层级的神经回路中,尽管也可能涉及面部识别的区域。

And that's likely in the lower level neural circuits, although maybe it's in the face ones as well.

Speaker 1

三个月大的婴儿能够区分狗和猫。

And babies who are three months old can discriminate dogs from cats.

Speaker 1

我的意思是,他们并不是什么都没学。

I mean, it's not like they're not learning anything.

Speaker 1

他们正在构建一个非常出色的视觉系统。

They are like building a very impressive visual system.

Speaker 1

我们其他许多哺乳动物一出生就能立刻站起来四处走动。

Many of our other mammalian friends get born and immediately get up and run around.

Speaker 1

但我们不能。

We don't.

Speaker 1

我们就这样坐着三个月,一定是有道理的。

We sit there for three months, gotta believe it's important.

Speaker 1

对吧?

Right?

Speaker 2

琳达和她的合作者分析了婴儿头戴式摄像头的数据,发现这些婴儿在生命的头几个月里,其视觉体验是由他们不断发展的运动能力以及与父母和其他看护者的互动所驱动的。

Linda and her collaborators analyzed the data from head mounted cameras on infants, and they found that over their first several months of life, these infants are having visual experiences that are driven by their developing motor abilities and their interactions with parents and other caregivers.

Speaker 2

这个过程以一种使他们能够高效学习世界的方式展开。

And the process unfolds in a way that enables them to efficiently learn about the world.

Speaker 2

他们体验视觉环境不同方面的顺序,实际上促进了学习。

The order in which they experience different aspects of their visual environment actually facilitates learning.

Speaker 1

这是一种学习的原理,而不是人类大脑的原理。

It's a principle of learning, not a principle of the human brain.

Speaker 1

这是数据结构的原理。

It's a principle of the structure of data.

Speaker 1

我认为大自然所做的,是面对一个必须学习语言、视觉、抓握物体、声音、社交关系以及自我调节等一切事物的发育中的婴儿。

I think what mother nature is doing is taking the developing baby, who's got to learn everything in language and vision and holding objects and sound and everything, okay, and social relations and self regulation.

Speaker 1

它是在带领他们逐步探索解决方案的空间。

It is taking them on a little walk through the solution space.

Speaker 1

儿童训练所用的数据是由进化精心筛选的。

The data for training children has been curated by evolution.

Speaker 1

这与所有大型数据模型形成鲜明对比,对吧?

This is in sort of marked contrast to all the large data models, right?

Speaker 1

它们只是随意抓取一切数据。

They just scrape everything.

Speaker 1

你会通过抓取网页来教育你的孩子吗?

Would you educate your kid by scraping off the web?

Speaker 1

我的意思是,你会用这种方式训练你的孩子吗?

I mean, would you train your child on this?

Speaker 1

所以,无论如何,我认为数据很重要。

So, anyway, I think the data is important.

Speaker 0

另一位专注于婴儿及其所接触数据的发展心理学家是迈克·弗兰克。

Another developmental psychologist who's focused on babies and the data they experience is Mike Frank.

Speaker 3

我是迈克·弗兰克。

I'm Mike Frank.

Speaker 3

我是斯坦福大学的心理学教授,主要研究儿童是如何学习的。

I'm a professor of psychology at Stanford, and I'm generally interested in how children learn.

Speaker 3

他们如何从不会说话、没有词汇的婴儿,变成几年后能够自如应对世界的孩童。

So how they go from being speechless, wordless babies to, just a few years later, kids that can navigate the world.

Speaker 3

因此,支持这一转变的成长与变化模式让我着迷,我通常使用大规模数据集和新方法来研究这些问题。

And so the patterns of growth and change that support that are what fascinate me, and I tend to use larger data sets and new methodologies to investigate those questions.

Speaker 3

当我还在读研究生时,人们开始采用一种新方法。

When I was back in grad school, people started working with this new method.

Speaker 3

他们开始在孩子的头上安装摄像头。

They started putting cameras on kids' heads.

Speaker 3

帕万·辛哈用这种方法记录了自己新生儿的视角,为我们提供了关于新生儿如何感知视觉世界的精彩视角。

And so Pavan Sinha did it with his newborn and gave us this amazing rich look at what it looked like to be a newborn perceiving the visual world.

Speaker 3

随后,琳达·史密斯、俞晨、凯伦·阿道夫、理查德·阿斯兰等先驱者开始尝试这种方法,收集了这些令人振奋的数据集,或许正在颠覆我们对儿童输入环境的传统认知。

And then pioneers like Linda Smith and Chen Yu and Karen Adolph and Dick Aslin and others started experimenting with the method and gathering these really exciting data sets that were maybe upending our view of what children's input looked like.

Speaker 3

这非常关键,因为如果你是一名学习科学家,想要弄清楚学习是如何发生的,你就需要了解输入是什么,以及学习的过程是什么。

And that's really critical because if you're a learning scientist, if you're trying to figure out how learning works, you need to know what the inputs are as well as what the processes of learning are.

Speaker 3

所以我对这个领域变得非常兴奋。

So I got really excited about this.

Speaker 3

当我刚开始在斯坦福建立实验室时,我开始学习一些手工制作技巧,尝试制造一些小设备。

And when I started my lab at Stanford, I started learning a little bit of crafting and trying to, build little devices.

Speaker 3

我们会从网上订购摄像头,然后试着把它们钉在露营头灯上,或者用胶水粘上去,再加一个后装的鱼眼镜头。

You know, we'd order cameras off the Internet and then try to staple them onto camping headlamps or, glue them on, add a little aftermarket fisheye lens.

Speaker 3

我们尝试了各种各样的手工解决方案,希望能做出孩子们愿意佩戴的设备。

We tried all these different little crafty solutions to get something that kids would enjoy wearing.

Speaker 3

那时,我们的技术比计算机视觉技术领先了大约五到七年。

At that time, we were in advance of the computer vision technologies by probably about five or seven years.

Speaker 3

所以我们天真地以为,可以处理从孩子们那里获取的海量视频,通过计算机视觉技术,就能知道孩子们看到了什么。

So we thought naively that we could process this flood of video that we were getting from kids and, you know, put it through computer vision and have an answer as to what the kids were seeing.

Speaker 3

但结果发现,这些视觉算法在这些数据上完全失效了。

And it turned out the vision algorithms failed completely on these data.

Speaker 3

它们根本无法处理这些数据,部分原因是摄像头质量差,因此只能捕捉到孩子所见的一小部分。

They couldn't process it at all, in part because the cameras were bad, and so they would, you know, have just a piece of what the child was seeing.

Speaker 3

另一部分原因是视觉算法本身很差,它们是用Facebook照片训练的,而不是用儿童的真实视觉输入训练的,因此无法处理这些截然不同的角度、方向以及面部被遮挡等情况。

And in part because the vision algorithms were bad: they were trained on Facebook photos, not on children's real input, and so they couldn't process these very different angles and very different orientations and, you know, occlusions cutting off faces and so forth.

Speaker 3

所以,我正是因此开始涉足这一领域,想着可以利用计算机视觉来测量儿童的视觉输入。

So that that was how I got into it, was thinking I could, you know, use computer vision to measure children's input.

Speaker 3

但后来我发现,我不得不等待五到七年,直到算法足够先进,这个想法才成为可能。

And then it turned out I had to wait maybe five or seven years until the algorithms got good enough that that was true.

Speaker 2

那么,人们从这类数据中发现了哪些最有趣的东西呢?

So what are the most interesting things people have learned from this kind of data?

Speaker 3

作为一名对婴儿交流与社会认知感兴趣的研究者,我认为最让我震惊的发现——这一发现应归功于琳达·史密斯及其合作者——是我们长期以来一直认为人类的注视和面部表情是极其丰富的信息来源。

Well, as somebody interested in communication and social cognition in little babies, the discovery that really floored me, which I think belongs to Linda Smith and her collaborators, was this: we'd been talking for years about gaze following and looking at people's faces, about how human gaze and human faces were this incredibly rich source of information.

Speaker 3

但当我们观察头戴式摄像头拍摄的视频时,发现婴儿其实很少看到人脸,因为他们常常躺在地板上。

And then when we looked at the head mounted camera videos, babies actually didn't see faces that often because they're lying there on the floor.

Speaker 3

他们正在爬行。

They're crawling.

Speaker 3

他们实际上生活在一个满是膝盖的世界里。

They're really living in this world of knees.

Speaker 3

因此,当人们想要花时间陪伴婴儿或吸引他们的注意力时,他们会把手直接放在婴儿面前,把某个物品放在婴儿眼前,以此来吸引或引导孩子的注意力,或与他们互动。

And so, it turned out that when people were excited to spend time with the baby or to manipulate their attention, they would put their hands right in front of the baby's face and put some object right in the baby's face, and that's how they would be getting the child's attention or directing the child's attention or interacting with them.

Speaker 3

并不是说婴儿会抬头看向父母所在的方向,然后去理解父母在看什么。

It's not that the baby would be looking way up there in the air to where the parent was and figuring out what the parent was looking at.

Speaker 3

因此,通过手部动作和调整婴儿的位置及眼前物品来实现注意力共享,这一发现令人兴奋且出乎意料。

So this idea of sharing attention through hands and through manipulating the baby's position and what's in front of the baby's face, that was really exciting and and surprising as a discovery.

Speaker 3

我想我们在从孩子家中拍摄的视频中也看到了类似的现象。

And I I think we've seen that foreign out in the videos that we take in kids' homes.

Speaker 0

对婴儿进行心理学研究并非没有挑战。

And doing psychological research with babies doesn't come without its challenges.

Speaker 3

你知道,如果你想研究婴儿,就必须招募这些家庭,与他们取得联系,并获得他们对研究的同意。

You know, if you want to deal with the baby, you have to recruit that family, make contact with them, get their consent for research.

Speaker 3

然后,婴儿必须情绪良好才能参与研究,或者孩子必须愿意参与。

And then the baby has to be in a good mood to be involved in a study, or the child has to be, willing to participate.

Speaker 3

因此,我们通过线上和线下两种方式与家庭合作。

And so we work with families online and in person.

Speaker 3

我们还会前往当地的儿童博物馆和幼儿园。

We also go to local children's museums and local nursery schools.

Speaker 3

因此,对于你所看到的每一个数据点,尤其是在传统的实证研究中,这都需要一名熟练的研究助理或研究生花费数小时进行招募,实际上为孩子提供实验体验。

And so often, for each of the data points that you see, at least in a traditional empirical study, that's hours of work by a skilled research assistant or a graduate student doing the recruitment, you know, actually delivering the experience to the child.

Speaker 2

在过去几年里,迈克和他的合作者们创建了两个庞大的视频数据集,这些视频来自佩戴头戴式摄像头的六月龄至五岁儿童。

Over the last several years, Mike and his collaborators have created two enormous data sets of videos taken by head mounted cameras on children from six months to five years old.

Speaker 2

这些数据集不仅被心理学家用于更好地理解人类认知发展,也被人工智能研究者用来尝试训练机器以更接近婴儿的方式学习世界。

These data sets are not only being used by psychologists to better understand human cognitive development, but also by AI researchers to try to train machines to learn about the world more like the way babies do.

Speaker 2

我们将在第二部分进一步讨论这项研究。

We'll talk more about this research in part two.

Speaker 2

第二部分。

Part two.

Speaker 2

人工智能系统应该像婴儿那样学习吗?

Should AI systems learn the same way babies do?

Speaker 2

正如我们在上一期节目中讨论的,虽然大型语言模型能够完成许多令人印象深刻的任务,但与人类相比,它们的能力仍然相当有限。

As we discussed in our previous episode, while large language models are able to do a lot of really impressive things, their abilities are still pretty limited when compared to humans.

Speaker 2

人工智能领域的许多人认为,如果我们只是持续让大型语言模型在越来越多的数据上进行训练,它们就会变得越来越好,很快就能达到甚至超越人类智能。

Many people in the AI world believe that if we just keep training large language models on more and more data, they'll get better and better, and soon they'll match or surpass human intelligence.

Speaker 0

但其他人工智能研究人员认为,这些系统的工作方式以及当前的训练方法中,缺少了某些根本性的东西。

But other AI researchers think there's something fundamental missing in the way these systems work and in how they are currently trained.

Speaker 0

那么,缺失的部分是什么?

But what's the missing piece?

Speaker 0

关于人类认知发展的新见解,能否为人工智能系统提供一种更稳健地理解世界的方式?

Can new insights about human cognitive development create a path for AI systems to understand the world in a more robust way?

Speaker 1

我认为,在理解人类智能时,被严重忽视的因素是理解输入的结构和统计特性。

I think the big missed factor in understanding human intelligence is understanding the structure, the statistics of the input.

Speaker 1

我认为当前人工智能的失败点就在于数据。

And I think the fail point of current AI definitely lies in the data.

Speaker 1

我想谈谈用于训练的数据,并想论证这一点才是最大的失败之处。

And I'd like to talk about the data used for training, and make a case that that is the biggest fail point.

Speaker 0

当今的神经网络通常使用从网页上抓取的语言和图像进行训练。

Today's neural networks are typically trained on language and images scraped from the web.

Speaker 0

琳达和其他发展心理学家尝试了不同的方法。

Linda and other developmental psychologists have tried something different.

Speaker 0

他们使用头戴式摄像头收集的视频帧来训练AI神经网络。

They've trained AI neural networks on image frames from the videos collected from head mounted cameras.

Speaker 0

问题是,这种类型的数据是否会对神经网络的能力产生影响。

The question is whether this kind of data will make a difference in the neural network's abilities.

Speaker 1

如果你用婴儿的视觉输入(四亿张图像)进行预训练,按从出生到12个月的发育顺序排列,而不是按从月龄最大到最小的逆序排列,或者随机排列,那么按发育顺序排列的数据训练出的网络,在后续训练中更能学会动作的名称和物体的名称。

If you pretrain them with babies' visual inputs, 400,000,000 images, and you order them from birth to 12 months of age, what we call the developmental order, versus ordering them backwards from oldest to youngest, or randomizing them, the developmental order leads to a trained network that is better able to learn names for actions and names for objects in later training.
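Linda's ordering experiment can be sketched as a data-curriculum step. The code below is a minimal, hypothetical illustration of the three orderings she compares (developmental, reversed, random); the toy frames, ages, and function name are assumptions for illustration, not her actual data or pipeline.

```python
import random

def order_pretraining_data(samples, scheme, seed=0):
    """Order (age_in_months, frame_id) pairs for curriculum pretraining.

    scheme: 'developmental' (youngest first, as infants experience it),
            'reversed' (oldest first), or 'random' (shuffled baseline).
    """
    if scheme == "developmental":
        return sorted(samples, key=lambda s: s[0])
    if scheme == "reversed":
        return sorted(samples, key=lambda s: s[0], reverse=True)
    if scheme == "random":
        shuffled = list(samples)
        random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
        return shuffled
    raise ValueError(f"unknown scheme: {scheme}")

# Toy corpus: frames tagged with the infant's age when they were recorded.
corpus = [(9, "frame_c"), (2, "frame_a"), (12, "frame_d"), (5, "frame_b")]

dev = order_pretraining_data(corpus, "developmental")
rev = order_pretraining_data(corpus, "reversed")
shuffled = order_pretraining_data(corpus, "random")
```

The finding is that pretraining on `dev` rather than `rev` or `shuffled` yields a network that later learns object and action names better, even though all three contain exactly the same images.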

Speaker 1

并不是每个人都对这个感兴趣。

Not everybody is interested in this.

Speaker 1

他们只是接受了这样一种观点:只要你拥有足够多的数据——世界上所有已知或说过的内容,你就会变得聪明、有智慧。

They just bought into the view that if you get enough data, any data, everything ever known or said in the world, okay, that you will be smart, you'll be intelligent.

Speaker 1

在我看来,这并不一定成立。

It just does not seem to me that that's necessarily true.

Speaker 1

网上有很多内容并不准确,甚至完全错误,还很奇怪。

There's a lot of stuff out there that's not accurate, dead wrong, and odd.

Speaker 1

只是抓取大量现有的知识,比如所有写过的东西或拍过的每一张照片。

Just scraping massive amounts of existing knowledge, everything ever written or every picture ever taken.

Speaker 1

好吧,这并不是理想的做法。

Okay, it's just not ideal.

Speaker 2

是数据质量的问题,还是教学这些系统的方式排序有问题?

Is it a matter of getting better data or getting better sort of ordering of how you teach these systems?

Speaker 2

还是说有什么更根本的东西缺失了?

Or or is there something more fundamental missing?

Speaker 1

我认为并不是更根本的东西缺失了。

I don't think it's more fundamental, actually.

Speaker 1

明白吗?

Okay?

Speaker 1

我认为是数据更好一些。

I think it's better data.

Speaker 1

我认为这是多模态数据。

I think it's multimodal data.

Speaker 1

我认为这是深深植根于现实世界的数据,而不是人类对现实世界的解读,而是直接来自现实世界的数据。

I think it's data that is deeply in the real world, not in human interpretations of that real world, but deeply in the real world.

Speaker 1

通过感官系统传来的数据,是原始数据。

Data coming through the sensory systems, it's the raw data.

Speaker 1

它没有经过你那些关于谁该或不该获得房贷资助的偏见、教条式观点的过滤,没有被网络上最恶劣的群体对女性应如何外貌的偏见所影响,也没有以这些方式被扭曲。

It is not data that has gone through your biased, cultish views on who should or should not get funded in the mortgage market, not biased by the worst elements of the web's view of what a woman should look like, not biased in all these ways.

Speaker 1

它没有经过这些信息的过滤。

It's not been filtered through that information.

Speaker 1

它是原始的。

It is raw.

Speaker 1

明白吗?

Okay?

Speaker 1

它是原始的。

It is raw.

Speaker 0

琳达认为,数据的结构,包括其时间顺序,是婴儿和人工智能系统学习中最重要的因素。

Linda believes that the structure of the data, including its order over time, is the most important factor for learning in both babies and in AI systems.

Speaker 0

我向她提到了艾莉森·戈普尼克在我们第一集中提出的观点。

I asked her about the point Alison Gopnik made in our first episode.

Speaker 0

对于学习代理——无论是孩子还是机器——来说,主动与现实世界互动,而不是被动地从给定的数据中学习,这有多重要?

How important is it that the learning agent, whether it's a child or a machine, is actively interacting in the real world rather than passively learning from data it's given?

Speaker 0

琳达承认,这种‘行动’而非仅仅‘观察’的能力——通过自己的动作或注意力实际生成自己所学习的数据——同样至关重要。

Linda acknowledges that this kind of doing, rather than just observing, being able, through one's movements or attention, to actually generate the data that one is learning from, is also key.

Speaker 1

我认为通过观察你能获得很多,但行动显然也很重要。

I think you get a lot by observing, but the doing is clearly important.

Speaker 1

所以这是一种多模态的、生成性(enactive)的观点。我认为它不仅仅是从现实世界中获取原始层面的数据(虽然那确实会带来巨大好处:不是照片,而是真实世界中、随时间展开的数据)。

So this is the multimodal, enactive kind of view, which I think doesn't just get your data from the world at the raw level, although I think that would be a big boon, okay, from the real world, not photographs, okay, and in time.

Speaker 1

我下一刻做什么、对你说什么,取决于我的知识状态,这意味着下一刻输入的数据与我需要学习的内容或我学习所处的阶段相关,因为正是我当前的知识促使我采取行动。

What I do in the next moment, what I say to you, depends on my state of knowledge, which means that the data that comes in at the next moment is related to what I need to learn or where I am in my learning, because it is what I know right now is making me do stuff.

Speaker 1

这意味着学习系统与学习数据是交织在一起的,因为学习系统自身生成了这些数据。

That means a learning system and the data for learning, because a learning system generates it, are intertwined.

Speaker 1

这就像同一个大脑,既在学习,又在生成数据。

It's like the very same brain that's doing the learning is the brain that's generating the data.

Speaker 0

也许,如果人工智能研究者更多地关注训练数据的结构,而不是单纯追求数据量,并且让机器能够直接与世界互动,而不是被动地从人类过滤后的数据中学习,人工智能最终会对世界有更深入的理解。

Perhaps if AI researchers focus more on the structure of their training data rather than on sheer quantity and if they enable their machines to interact directly with the world rather than passively learning from data that's been filtered through human interpretation, AI would end up having a better understanding of the world.

Speaker 0

迈克指出,例如,当前大语言模型所训练的语言数据量,比儿童接触到的要大好几个数量级。

Mike notes that, for example, the amount of language current LLMs are trained on is orders of magnitude larger than what kids are exposed to.

Speaker 3

所以现代人工智能系统是在海量数据集上进行训练的,这也是它们成功的一部分。

So modern AI systems are trained on huge data sets, and that's part of their success.

Speaker 3

对吧?

Right?

Speaker 3

当你看到GPT-3在5000亿词的训练数据下展现出这种惊人的灵活智能时,你就开始初见端倪。

So you get the first glimmerings of this amazing flexible intelligence that we start to see when we see GPT-3 with 500,000,000,000 words of training data.

Speaker 3

公司使用多少训练数据属于商业机密,但最新的系统至少已达到十万亿词以上的数据量。

It's a trade secret of the companies how much training data they use, but the most recent systems are at least in the 10,000,000,000,000 plus word range.

Speaker 3

你知道,一个五岁的孩子可能只听过六千万个单词。

You know, a five year old has maybe heard 60,000,000 words.

Speaker 3

这个估计是合理的。

That'd be a reasonable estimate.

Speaker 3

这对于一个五岁孩子所听到的内容来说,算是比较高的估计了。

That's, like, kind of a high estimate for what a five year old has heard.

Speaker 3

所以,这大约相差了五六个数量级。

So that's, you know, five or six orders of magnitude different.
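Mike's gap can be checked with quick arithmetic. The two figures, roughly ten trillion-plus words for recent LLMs and about sixty million words heard by a five-year-old, are the estimates from the conversation.

```python
import math

llm_words = 10e12    # ~10 trillion+ words in recent LLM training sets
child_words = 60e6   # ~60 million words heard by a five-year-old

ratio = llm_words / child_words
orders_of_magnitude = math.log10(ratio)

print(round(orders_of_magnitude, 1))  # ~5.2: five to six orders of magnitude
```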

Speaker 3

因此,我经常思考的一个最大问题是,孩子所听到的内容和语言模型所需训练数据之间的巨大差异。

So the biggest thing that I think about a lot is how huge that difference is between what the child hears and what the language model needs to be trained on.

Speaker 3

孩子们是惊人的学习者,我认为,通过关注孩子和大语言模型所接触数据量的相对差异,能很好地凸显出他们学习能力的卓越。

Kids are amazing learners, and I think by drawing attention to the relative differences in the amount of data that kids and LLMs get, that really highlights just how sophisticated their learning is.

Speaker 2

但当然,他们还获得了其他感官模态,比如视觉、触摸物体和操作物品。

But, of course, they're getting other sensory modalities like vision and touching things and being able to manipulate objects.

Speaker 2

你知道,这会不会对他们所需的训练量产生巨大影响?

You know, is that gonna make a big difference with the amount of training they're gonna need?

Speaker 3

这正是对我来说最核心的科学问题所在。

This is right where the scientific question is for me.

Speaker 3

究竟是孩子作为学习系统的一部分,或者在他们更广泛的数据环境中,哪个部分造成了这种差异。

Just what part of the child as a system, as a learning system, or in their broader data ecosystem makes the difference.

Speaker 3

你可以想,也许是因为他们拥有丰富的视觉输入与语言并存。

And you could think, well, maybe it's the fact that they've got this rich visual input alongside the language.

Speaker 3

也许这才是真正重要的因素。

Maybe that's the really important thing.

Speaker 3

但接着你就要面对这样一个事实:仅仅给语言模型添加图片,并不会让它们变得特别聪明。

And then you'd have to grapple with the fact that just adding pictures to language models doesn't make them that much smarter.

Speaker 3

至少在最新的商业系统中,添加图片让它们显得很酷,现在它们能处理图片了。

At least in the most recent commercial systems, adding pictures makes them cool, and they can do things with pictures now.

Speaker 3

但它们在推理物理世界时仍然会犯和以前一样的错误。

But they still make the same mistakes about reasoning about the physical world that they did before.

Speaker 0

迈克还指出,即使你用婴儿头戴式摄像头生成的数据来训练大语言模型,也不一定能解决物理推理的问题。

Mike also points out that even if you train LLMs on the data generated by head mounted cameras on babies, that doesn't necessarily solve the physical reasoning problems.

Speaker 2

事实上,有时会出现相反的效果,模型不仅没有变得更聪明,反而表现得更差了。

In fact, sometimes you get the opposite effect, where instead of becoming smarter, this data makes the models perform less well.

Speaker 2

正如琳达早前指出的,自己用身体生成数据,并且围绕自己真正想学或需要学的内容,这一点非常特别。

As Linda pointed out earlier, there's something special about having generated the data oneself with one's own body and with respect to what one actually wants to or needs to learn.

Speaker 3

还有一些其他研究,我认为更像是一则警示故事:如果你用大量人类数据训练模型,它们依然不会变得特别出色。

There are also some other studies that I think are a bit more of a cautionary tale, which is that if you train models on a lot of human data, they still don't get that good.

Speaker 3

实际上,婴儿所接触的数据对语言模型和计算机视觉模型而言,反而更具挑战性,而不是更简单。

Actually, the data that babies have appears to be more, not less challenging, for language models and for computer vision models.

Speaker 3

这些是我们实验室最近的新发现,但我们发现,当使用婴儿数据进行训练时,模型的性能提升并不明显。

These are pretty new results from my lab, but we find that performance doesn't scale that well when you train on baby data.

Speaker 3

你去看孩子家里的视频,会发现模型训练的数据全是孩子玩同一辆卡车,或者你知道的,家里只有一只狗。

You go to videos from a child's home, you train models on that, and the video is all of the kid playing with the same truck, or, you know, there's only one dog in the house.

Speaker 3

然后你试图让这个模型识别世界上所有的狗,它却说:不,这不是狗。

And then you try to get that model to recognize all the dogs in the world, and it's like, no, it's not the dog.

Speaker 3

所以,这完全是另一回事。

So it's a very different thing.

Speaker 3

对吧?

Right?

Speaker 3

所以,孩子们接触到的数据在某些方面既更深入、更丰富,但在其他方面却远不如多样化,然而他们的视觉系统仍然能出色地识别出狗,即使他们只见过一两只。

So the data that kids get is both deeper and richer in some ways and also much less diverse in other ways, and yet their visual system is still remarkably good at recognizing a dog even when they've only seen one or two.

Speaker 3

这种快速学习和对适当类别迅速泛化的能力,正是我们在计算机视觉领域仍然面临的难题。

So that kind of really quick learning and rapid generalization to the appropriate class, that's something that that we're still struggling with in computer vision.

Speaker 3

我认为,语言学习也是如此。

And I I think the same thing is true in language learning.

Speaker 3

因此,使用来自孩子的实际数据进行这类模拟,我认为能够很好地揭示我们模型的优势与不足。

So doing these kinds of simulations with real data from kids, I think, could be very revealing of the strengths and weaknesses of our models.

Speaker 0

迈克认为我们当前的模型缺少了什么?

What does Mike think is missing from our current models?

Speaker 0

为什么它们需要比孩子多得多的狗的例子,才能完成孩子轻易就能做到的简单泛化?

Why do they need so many more examples of a dog before they can do the simple generalizations that kids are doing?

Speaker 3

也许是因为拥有身体。

Maybe though it's having a body.

Speaker 3

也许是因为能够穿梭于空间中,并主动干预世界,从而改变世界中的事物。

Maybe it's being able to move through space and intervene on the world to change things in the world.

Speaker 3

也许这就是关键所在。

Maybe that's what makes the difference.

Speaker 3

或者是因为孩子是社会性生物,与他人互动,而他人正在为你构建世界并教你认识世界。

Or maybe it's being a social creature interacting with other people who are structuring the world for you and teaching you about the world.

Speaker 3

这可能很重要。

That could be important.

Speaker 3

或者可能是系统本身。

Or maybe it's the system itself.

Speaker 3

也许是因为婴儿天生就具备一些关于物体、事件以及周围人——这些社会行为者——的概念。

Maybe it's the baby, and the baby has built in some concepts of objects and events and the agents, the people around them, social actors.

Speaker 3

正是这些因素造成了差异。

And it's really those factors that make the difference.

Speaker 0

在我们的第一集中,我们听到了艾莉森·戈普尼克一岁大的孙子在敲击木琴的片段。

In our first episode, we heard a clip of Alison Gopnik's one-year-old grandson experimenting with a xylophone.

Speaker 0

这是一种非常互动的学习方式,孩子自己控制并创造数据,然后能够将经验推广到其他乐器和情境中。

It's a really interactive kind of learning, where the child is controlling and creating the data, and then they're able to generalize to other instruments and experiences.

Speaker 0

至于婴儿最关心的事情,他们可能只需要经历一次,就能牢牢记住。

And when it comes to the stuff that babies care about most, they might only need to experience something once for it to stay with them.

Speaker 2

但也要记住,艾莉森的孙子是在和他祖父一起演奏音乐。

But also remember that Alison's grandson was playing music with his grandfather.

Speaker 2

尽管他还不会说话,但他非常渴望与祖父互动、交流。

Even though he couldn't talk, he had a strong desire to play with, to communicate with his grandfather.

Speaker 2

与人类不同,大型语言模型没有这种参与社交互动的内在驱动力。

Unlike humans, large language models don't have this intrinsic drive to participate in social interactions.

Speaker 3

六个月大的婴儿就能进行交流。

A six month old can communicate.

Speaker 3

他们能很好地表达自己的基本需求。

They can communicate very well about their basic needs.

Speaker 3

他们能向他人传递信息。

They can transfer information to other people.

Speaker 3

甚至有一些实验证据表明,他们能略微理解他人的意图,理解一些信号的初步含义,比如吸引他人注意或让他人做某事。

There's even some experimental evidence that they can understand a little bit about the intentions of the other people and understand some, you know, rudiments of what it means to signal to get somebody's attention or to get them to do something.

Speaker 3

所以它们实际上在沟通方面可以相当出色。

So they actually can be quite good at communication.

Speaker 3

因此,沟通和语言是两回事。

So communication and language are two different things.

Speaker 3

沟通促进语言的发展,并处于语言的核心,但你不需要掌握一门语言也能进行沟通。

Communication enables language and is at the heart of language, but you don't have to know a language in order to be able to communicate.

Speaker 2

与婴儿不同,大语言模型并没有沟通的内在驱动力,但它们可以展现出迈克所说的沟通行为,或者在上一集中默里·沙纳汉所称的角色扮演式沟通。

In contrast to babies, LLMs aren't driven to communicate, but they can exhibit what Mike calls communicative behavior, or what, in the previous episode, Murray Shanahan would have called role-playing communication.

Speaker 3

大语言模型并不天生具备沟通能力。

LLMs do not start with communicative ability.

Speaker 3

大语言模型在最基础、标准的架构中,本质上是预测引擎。

LLMs are, in the most basic, you know, standard architectures, prediction engines.

Speaker 3

它们的目标是优化对下一个词的预测。

They are trying to optimize their prediction of the next word.

Speaker 3

当然,我们还会叠加许多其他微调和基于人类反馈的强化学习技术,这些方法用于调整它们的行为以符合其他目标。

And then, of course, we layer on lots of other fine tuning and reinforcement learning with human feedback, these techniques for changing their behavior to match other goals.

Speaker 3

但它们本质上最初只是预测器,而LLM革命中最令人惊讶的一点在于,这些模型的大型版本竟能展现出某种沟通行为。

But they really start basically as predictors, and it is one of the most astonishing parts about the LLM revolution that you get some communicative behaviors out of very large versions of these models.
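
The next-word objective Mike describes can be sketched in a few lines of code. What follows is a toy illustration only: a bigram counting model, not a real transformer, with all names and the tiny corpus invented for the example. It shows what "optimize the prediction of the next word" means at its most basic.

```python
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count word -> next-word transitions. An LLM's training
    objective is a vastly richer version of this next-word prediction."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequently observed next word, or None if unseen."""
    options = counts.get(word.lower())
    return options.most_common(1)[0][0] if options else None

corpus = [
    "the baby sees the dog",
    "the baby hears the dog bark",
    "the dog runs",
]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "dog" (3 occurrences vs 2 for "baby")
```

Nothing in this objective mentions communication; any communicative behavior that emerges does so as a by-product of predicting well, which is the surprise Mike is pointing at.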

Speaker 3

这确实非常惊人,我认为这是真的。

So that's really remarkable, and I think it's true.

Speaker 3

我认为你可以看到相当明显的证据,表明它们正在从事我们称之为沟通的行为。

I think you can see pretty good evidence that they are engaging in things that we would call communicative.

Speaker 3

你知道,这是否意味着它们真正理解了人类?

You know, does that mean they fundamentally understand human beings?

Speaker 3

我不知道,而且我认为这很可能很难证明。

I don't know, and I think that's probably pretty tough to demonstrate.

Speaker 3

但它们会进行一种关于他人目标和意图的推理,而这正是我们从儿童身上所期待的。

But they engage in the kinds of reasoning about others' goals and intentions that we look for in children.

Speaker 3

但它们只有在输入了五千亿甚至一万亿个词之后才会这样做。

But they only do that when they've got 500,000,000,000 words or a trillion words of input.

Speaker 3

所以它们并不是像我们认为婴儿那样,先有沟通再发展出语言。

So they don't start with communication and then move to language the way we think babies do.

Speaker 3

它们从预测任何作为输入的内容开始,对于大语言模型来说,这就是语言。

They start with predicting whatever it is that they are given as input, which in the case of LLMs is language.

Speaker 3

然后,令人惊讶的是,它们似乎提取出了一些更高层次的概括,帮助它们展现出沟通行为。

And then astonishingly, they appear to extract some higher level generalizations that help them manifest communicative behaviors.

Speaker 0

尽管大语言模型和婴儿之间存在诸多差异,迈克仍然对大语言模型如何帮助我们理解人类认知感到非常兴奋。

In spite of the many differences between LLMs and babies, Mike's still very excited about what LLMs can contribute to our understanding of human cognition.

Speaker 3

我认为,对于对心智和语言感兴趣的人来说,现在是一个了不起的时代。

I think it's it's an amazing time to be a scientist interested in the mind and in language.

Speaker 3

五十年来,我们一直认为学习人类语言最困难的部分是生成语法正确的句子。

For fifty years, we've been thinking that the really hard part of learning human language is making grammatical sentences.

Speaker 3

从这个角度来看,我认为如果我们不承认最近我们已经取得了重大发现——即当你用大量语言数据训练相对无结构的模型时,它们能够恢复生成语法语言的能力——那就是在智力上不诚实。

And from that perspective, I think it is intellectually dishonest not to think that we've learned something big recently, which is that when you train models, relatively unstructured models, on lots of data about language, they can recover the ability to produce grammatical language.

Speaker 3

这真是太惊人了。

And that's just amazing.

Speaker 3

曾经有很多形式上的论证和理论论证认为这是不可能的,我认为这些论证从根本上是错误的。

There were many formal arguments and theoretical arguments that that was impossible, and those arguments were fundamentally wrong, I think.

Speaker 3

我们作为一个领域必须正视这一点,因为这确实是一个巨大的转变。

We have to come to grips with that as a field because it's really a big change.

Speaker 3

另一方面,大语言模型的弱点也极具启发性。

On the other hand, the weaknesses of the LLMs also are really revealing.

Speaker 3

对吧?

Right?

Speaker 3

意义的某些方面,尤其是那些植根于物理世界的内容,更难推理,需要更长时间和更多输入,而不仅仅是获得语法正确的句子。

There are aspects of meaning, often those aspects that are grounded in the physical world, that are trickier to reason about and take longer and need much more input than just getting a grammatical sentence.

Speaker 3

这同样令人着迷。

And that's fascinating too.

Speaker 3

发展认知科学中的经典争论一直是关于先天论与经验论,即儿童学习语言必须具备哪些先天能力。

The classic debate in developmental cognitive science has been about nativism versus empiricism, what must be innate to the child for the child to learn.

Speaker 3

我认为我对哪些能力必须是先天的这一观点正在迅速改变,下一步将是利用这些技术来弄清楚儿童和人类学习者真正具备哪些先天能力。

I think my views are changing rapidly on what needs to be built in, and the next step is gonna be trying to use those techniques to figure out what actually is built into the kids and the human learners.

Speaker 3

我非常兴奋的是,这些模型不仅从工程或商业角度来看变得有趣,而且正成为真正的科学工具和科学模型,可以作为这个开放、可及的生态系统的一部分,用于研究人类心智。

I'm really excited about the fact that these models have not just become interesting artifacts from an engineering or commercial perspective, but that they're also becoming real scientific tools, real scientific models that can be used and explored as part of this broad, open, accessible ecosystem for people to work on the human mind.

Speaker 3

看到这一代新模型与大脑联系起来,与人类行为联系起来,并融入科学讨论中,真是令人着迷。

So just fascinating to see this new generation of models get linked to the brain, get it linked to human behavior, and becoming part of the scientific discussion.

Speaker 0

迈克不仅对大型语言模型如何为人类心理学提供洞见感兴趣,他还撰写了一些具有影响力的文章,探讨发展心理学的实验方法如何帮助我们更好地理解大型语言模型。

Mike's not only interested in how LLMs can provide insight into human psychology, he's also written some influential articles on how experimental practice in developmental psychology can help improve our understanding of LLMs.

Speaker 2

你写过一些文章,讨论发展心理学研究的方法如何有助于评估大型语言模型的能力。

You've written some articles about how methods from developmental psychology research might be useful in evaluating the capabilities of LLMs.

Speaker 2

那么,你认为当前这些系统评估方式存在哪些问题?心理学研究又能如何为此做出贡献?

So what do you see as the problems with the way these systems are currently being evaluated, and how can research psychology contribute to this?

Speaker 3

早在2023年,也就是在人工智能领域算得上是大约十五年前吧,当GPT-4发布时,人们对此产生了大量热烈的反响,这很好。

Well, way back in 2023, which is about, you know, fifteen years ago in AI time, when GPT-4 came out, there was this whole set of really excited responses to it, which is great.

Speaker 3

这是一项非常令人兴奋的技术。

It was very exciting technology.

Speaker 3

它至今依然如此。

It still is.

Speaker 3

其中一些反应看起来非常像下面这种情况。

And some of them looked a lot like the following.

Speaker 3

我把迪士尼电影《海洋奇缘》的剧本输入给GPT-4,它在结尾时哭了,说它很难过。

I played GPT-4 the transcript of the Moana movie from Disney, and it cried at the end and said it was sad.

Speaker 3

天哪。

Oh my god.

Speaker 3

GPT-4有人类的情感。

GPT four has human emotions.

Speaker 3

对吧?

Right?

Speaker 3

作为心理学家,这种反应让我觉得这属于一种典型的研究方法错误——你根本没有做实验。

And this kind of response to me as a psychologist struck me as a kind of classic research methods error, which is you're not doing an experiment.

Speaker 3

你只是观察了一个关于系统的轶事,然后就武断地推断出系统内部的心理状态。

You're just observing this anecdote about a system and then jumping to the conclusion that you can infer what's inside the system's mind.

Speaker 3

如果说心理学发展出了什么,那就是一套关于如何推断他人内心状态的方法和规则。

And, you know, if psychology has developed anything, it's a body of knowledge about the methods and the rules of that game of inferring what's inside somebody else's mind.

Speaker 3

这当然不是一个完美的领域,但其中一些原则已经被描述得相当清晰,尤其是在发展心理学中。

It's by no means a perfect field, but some of these things are, you know, pretty well described, especially in developmental psych.

Speaker 3

经典的实验包括一个对照组和一个实验组,通过比较这两组来判断某种特定的活性因素是否产生了影响。

So classic experiments have a control group and an experimental group, and you compare between those two groups in order to tell if some particular active ingredient makes the difference.

Speaker 3

因此,至少你需要用两种不同类型的材料进行评估,并比较它们,才能做出这样的推论。

And so minimally, you would want to have evaluations with two different types of material and a comparison between them in order to make that kind of inference.

Speaker 3

这就是我一直在说、也写过的一些观点:你需要采用一些基础的实验方法工具,进行受控实验,使用高度控制的简单刺激,以便清楚地知道为什么大语言模型或儿童会给出特定的反应,从而避免得出后来被证明是人为假象的实验结果——这些假象往往源于你没有妥善控制刺激材料中的某个混淆变量。

And so that's the sort of thing that I have gone around saying and have written about a bit: you need to take some basic tools from experimental methods, doing controlled experiments and using tightly controlled, simple stimuli, so that you know why the LLM or the child gives you a particular response, and so that you don't get experimental findings that later turn out to be artifacts because you didn't take care of a particular confound in your stimulus materials.
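
The control-versus-experimental logic Mike describes can be sketched concretely. This is a hedged toy harness, not a real LLM evaluation: `toy_model` is a stand-in function (invented for the example) that has "memorized" only the classic wording, and the items are minimal single-sentence stimuli.

```python
def evaluate(model, items):
    """Fraction of (prompt, expected_answer) items the model gets right."""
    correct = sum(model(prompt) == answer for prompt, answer in items)
    return correct / len(items)

# Matched items: the same underlying question, with and without the
# familiar surface form (names, objects) the model may have memorized.
control_items = [
    ("Sally puts a ball in the basket and leaves. Where will she look?", "basket"),
]
experimental_items = [
    ("Dax puts a blicket in the box and leaves. Where will Dax look?", "box"),
]

def toy_model(prompt):
    # Stand-in "model" that only recognizes the classic wording.
    return "basket" if "Sally" in prompt else "unknown"

control_acc = evaluate(toy_model, control_items)
experimental_acc = evaluate(toy_model, experimental_items)
print(control_acc, experimental_acc)  # prints: 1.0 0.0
```

A large gap between the two conditions is evidence of memorized surface form rather than the underlying reasoning ability, which is exactly what a single anecdote can never show.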

Speaker 2

AI界对这种研究的反应如何?

What kind of response have you gotten from the AI community?

Speaker 3

我认为实际上已经有一些人对这类研究持开放态度。

I think there's actually been been some openness to this kind of work.

Speaker 3

你知道,对于早期对语言模型的评估,确实出现了很多反对声音。

You know, there there has been a lot of pushback on those initial evaluations of language models.

Speaker 3

举一个具体的例子,我之前调侃过那些声称AI具有人类情感的人,但实际上有很多人声称不同版本的ChatGPT具备所谓的‘心理理论’,也就是能够推理他人信念和欲望的能力。

So just to give one kind of concrete example here, I was making fun of people with this human-emotions bit, but there were actually a lot of folks that made claims about different ChatGPT versions having what's called theory of mind, that is, being able to reason about the beliefs and desires of other people.

Speaker 3

最初的评估所使用的材料,基本上是来自发展心理学文献中用于诊断心理理论的故事。

So the initial evaluations took essentially stories from the developmental psychology literature that are supposed to diagnose theory of mind.

Speaker 3

这些就是像萨莉-安妮任务这样的实验。

These are things like the Sally-Anne task.

Speaker 0

你可能还记得我们在上一期节目中提到的萨莉-安妮测试。

You might remember the Sally-Anne test from our last episode.

Speaker 0

萨莉把一个物体——比如一个球、一本书或其他东西——放在某个地方,然后离开了。

Sally puts an object, let's say a ball or a book or some other thing, in one place and then leaves.

Speaker 0

接着,在萨莉离开期间,安妮把那个物体移到了另一个藏匿处。

And then while Sally's away, Anne moves that object to another hiding spot.

Speaker 0

然后测试会问:当萨莉回来时,她会在哪里找她的物品?

And then the test asks, Where will Sally look for her object when she returns?

Speaker 2

尽管你我知道安妮把书或球放在了哪里,但我们也知道萨莉并不知道这一点。

And even though you and I know where Anne put the book or the ball, we also know that Sally doesn't know that.

Speaker 2

所以当她回来时,会去错误的地方找它。

So when she returns, she'll look in the wrong place for it.

Speaker 2

心理理论是指理解萨莉对这种情况持有错误信念,因为她有自己的独立经历。

Theory of mind is understanding that Sally has a false belief about the situation because she has her own separate experience.

Speaker 0

如果你给ChatGPT一个关于萨莉-安妮测试的描述,它就能解决这个问题。

And if you give ChatGPT a description of the Sally-Anne test, it can solve it.

Speaker 0

但我们不知道它是真的在进行推理,还是仅仅因为训练期间吸收了大量示例。

But we don't know if it can do it because it's actually reasoning or just because it's absorbed so many examples during its training period.

Speaker 0

因此,研究人员开始做出一些小的改动,最初这些改动让大语言模型陷入了困境,比如更改萨莉和安妮的名字。

And so researchers started making small changes that initially tripped up the LLMs, like changing the names of Sally and Anne.
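
The kind of surface perturbation the researchers used can be generated systematically from a story template. The sketch below is illustrative only, assuming an invented template and name/object lists; the point is that every variant shares the same logical structure and the same correct answer, so a model that fails only on unfamiliar surface forms is revealing memorization.

```python
import itertools

TEMPLATE = (
    "{a} puts a {obj} in the {loc1} and leaves. "
    "While {a} is away, {b} moves the {obj} to the {loc2}. "
    "Where will {a} look for the {obj}?"
)

def make_variants(names, objects, locations):
    """Generate surface-level perturbations of the Sally-Anne story.
    Every variant's correct answer is loc1, the original location."""
    variants = []
    for (a, b), obj, (loc1, loc2) in itertools.product(
        itertools.permutations(names, 2),
        objects,
        itertools.permutations(locations, 2),
    ):
        story = TEMPLATE.format(a=a, b=b, obj=obj, loc1=loc1, loc2=loc2)
        variants.append((story, loc1))
    return variants

variants = make_variants(["Sally", "Anne", "Dax"], ["ball", "book"], ["basket", "box"])
print(len(variants))  # 6 name orderings x 2 objects x 2 location orderings = 24
```

Evaluating a model across all such variants, rather than on the one canonical story, is the transcript's point about moving from anecdotes to controlled stimuli.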

Speaker 0

但大语言模型也已经适应了这些改动。

But LLMs have caught on to those too.

Speaker 3

大语言模型在处理这类表面变化方面相当擅长。

LLMs are pretty good at those kind of superficial alterations.

Speaker 3

所以也许你需要创造新的材料。

So maybe you need to make new materials.

Speaker 3

也许你需要设计一些全新的关于人们信念的谜题,而不是仅仅改变物品的位置。

Maybe you need to actually make new puzzles about people's belief that don't involve changing the location of an item.

Speaker 3

对吧?

Right?

Speaker 3

所以人们在这方面进步了很多。

So people got a lot better at this.

Speaker 3

我不认为目前的最先进技术已经完美了,但即使只过了一年,论文中出现的方法也变得更加复杂了。

And I wouldn't say that the state of the art is perfect now, but the approach that you see in papers that have come out even just a year later is much more sophisticated.

Speaker 3

他们设计了大量关于推理他人心理的不同谜题。

They have a lot of different puzzles about reasoning about other people.

Speaker 3

他们研究的是,大型语言模型是否能正确诊断出某个社交失礼为何令人尴尬,或者某种表达方式为何显得别扭。

They're looking at whether the LLM correctly diagnoses why a particular social faux pas was embarrassing or whether a particular way of saying something was awkward.

Speaker 3

这些新基准需要更多的推理能力。

There's a lot more reasoning that is necessary in these new benchmarks.

Speaker 3

所以我认为,这实际上是一个案例——我虽然只是其中的一小部分参与者,但这场讨论确实推动了研究方法的改进。

So I think this is actually a case where the discussion, which I was just a small part of, really led to an improvement in the research methods.

Speaker 3

我们还有很长的路要走,但才过去了一年。

We still have further to go, but it's only been a year.

Speaker 3

因此,我对这些关于方法的讨论真正提升了我们研究模型的方式,也真正加深了我们对模型本身的理解,感到非常乐观。

And so I'm quite optimistic that all of this discussion of methods has actually improved our understanding of how to study the models, and also actually improved our understanding of the models themselves.

Speaker 0

所以,梅兰妮,根据迈克刚才说的,看起来研究大语言模型的研究人员仍在努力找出理解它们工作方式的最佳方法。

So, Melanie, from everything Mike just said, it sounds like researchers who study LLMs are still trying to figure out the best way to understand how they work.

Speaker 0

这和长期试图理解婴儿的过程并没有什么不同。

And it's not unlike the long process of trying to understand babies too.

Speaker 0

对吧?

Right?

Speaker 2

是的。

Right.

Speaker 2

你知道吗,当我第一次听说心理学家在婴儿头上安装摄像头进行记录时,我觉得这太好笑了。

You know, when I first heard about psychologists putting cameras on babies' heads to record, I thought it was hilarious.

Speaker 2

但听起来,这些摄像头收集的数据实际上正在彻底改变发展心理学。

But it sounds like the data collected from these cameras is actually revolutionizing developmental psychology.

Speaker 2

我们从琳达那里了解到,数据显示婴儿的视觉体验结构与人们之前认为的截然不同。

We heard from Linda that the data shows that the structure of the baby's visual experiences is quite different from what people had previously thought.

Speaker 0

对。

Right.

Speaker 0

我的意思是,他们其实很少看到自己的脸,这真是太神奇了。

I mean, it's amazing that, you know, they don't actually see their faces so much.

Speaker 0

正如迈克提到的,他们所处的世界尽是膝盖,对吧?

As Mike mentioned, they're in a world of knees, right?

Speaker 0

琳达认为,是大自然对数据的组织方式——正如她所说——让婴儿在生命的头几年里能学到这么多东西。

And Linda seems to think that the structuring of the data by Mother Nature, as she put it, is what allows babies to learn so much in their first few years of life.

Speaker 2

是的。

Right.

Speaker 2

琳达谈到了所谓的发育顺序,也就是婴儿在成长过程中,视觉或其他体验出现的时间顺序。

Linda talked about the so called developmental order, which is the temporal order in which babies get different kinds of visual or other experiences as they mature.

Speaker 2

他们看到和听到的内容,是由他们自己的身体能力以及社会关系决定的。

And what they see and hear is driven by what they can do with their own bodies and their social relationships.

Speaker 2

更重要的是,这也受到他们想学什么、对什么好奇的驱动。

And importantly, it's also driven by what they want to learn, what they're curious about.

Speaker 2

这与大型语言模型的学习方式完全不同,后者是通过人类向它们喂入从网络上抓取的海量文本和图片来学习的。

It's completely different from the way large language models learn, which is by humans feeding them huge amounts of text and photos scraped from the web.

Speaker 0

这种发展顺序也有助于婴儿在正确的时间学习正确的东西。

And this developmental order, I mean, it's also conducive to babies learning the right things at the right time.

Speaker 0

还记得迈克提到过,婴儿和儿童的学习方式让他们能以更少的资源做更多的事吗?

And remember Mike pointed out that the way babies and children learn allows them to do more with less?

Speaker 0

他们比大语言模型更容易进行泛化。

They're able to generalize much more easily than LLMs can.

Speaker 0

但关于这一切,仍然有很多谜团。

But there's still a lot of mystery about all of this.

Speaker 0

人们仍在努力理解人类认知的发展。

People are still trying to make sense of the development of cognition in humans.

Speaker 0

对吧?

Right?

Speaker 2

有趣的是,迈克认为,尽管大语言模型与我们如此不同,它们实际上将帮助心理学家研究这个问题。

And interestingly, Mike thinks that large language models are actually gonna help psychologists in this even though they're so different from us.

Speaker 2

例如,大语言模型可以作为原理验证,用来展示哪些东西是可以学会的,哪些必须是先天具备的,以及哪些行为会自然涌现,比如他提到的沟通行为。

So, for example, LLMs can be used as a proof of principle of what can actually be learned versus what has to be built in and of what kinds of behaviors can emerge, like the communication behavior he talked about.

Speaker 2

我个人也非常期待另一个方向,即运用儿童发展原理来改进人工智能系统,同时利用实验方法来弄清楚大型语言模型的能力范围和局限。

I'm also personally very excited about the other direction, using principles from child development in improving AI systems and also using principles from experimental methodology in figuring out what LLMs are and aren't capable of.

Speaker 0

是的。

Yeah.

Speaker 0

通常看来,试图比较人类和计算机的智能,就像试图比较苹果和橙子。

Often it seems like trying to compare the intelligence of humans and computers is like trying to compare apples to oranges.

Speaker 0

它们看起来如此不同。

They seem so different.

Speaker 0

而试图使用通常用于人类的测试,比如迈克提到、托默在我们上一期节目中讨论过的心理理论测试,似乎并不能总是给我们带来我们想要的洞见。

And trying to use tests that are typically used in humans, like the theory of mind test that Mike referred to and Tomer talked about in our last episode, they don't seem to always give us the insights we're looking for.

Speaker 0

那么,我们应该采用什么样的方法来评估大型语言模型的认知能力呢?

So what kinds of approaches should we use to evaluate cognitive abilities in LLMs?

Speaker 0

我的意思是,从研究非人类动物智力的方法中,我们能学到些什么吗?

I mean, is there something to be learned from the methods used to study intelligence in nonhuman animals?

Speaker 2

在下一期节目中,我们将更深入地探讨如何评估智能,以及我们是否在提出正确的问题。

Well, in our next episode, we'll look more closely at how to assess intelligence and if we're even asking the right questions.

Speaker 4

我认为,当一个人通过MCAT或SAT考得好时,这代表的意义,和神经网络做到这一点时的意义并不相同。

I think, like, what it means when a person passes the MCAT or scores well on the SAT is not the same thing as what it might mean when a neural network does that.

Speaker 4

我们其实并不清楚当神经网络做到这一点时意味着什么,而这正是问题的一部分。

We don't really know what it means when a neural network does it, and that's part of the problem.

Speaker 2

下次在《复杂性》节目中再见。

That's next time on Complexity.

Speaker 2

《复杂性》是圣塔菲研究所的官方播客。

Complexity is the official podcast of the Santa Fe Institute.

Speaker 2

本集由凯瑟琳·蒙库尔制作,主题曲由米奇·米尼亚诺创作。

This episode was produced by Katherine Moncure, and our theme song is by Mitch Mignano.

Speaker 2

额外音乐来自Blue Dot Sessions。

Additional music from Blue Dot Sessions.

Speaker 2

我是梅兰妮。

I'm Melanie.

Speaker 2

谢谢收听。

Thanks for listening.
