本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
我一直以来,可能抱有一种偏见,如果回过头来看,这种偏见是错误的,那就是在尝试新方法之前,总想先用理论去理解。
I always, probably had a bias, maybe a wrong bias if you look back about trying to understand a new approach in terms of theories before doing, you know, applications or demonstrations.
我们和许多其他人正在发展的理论,比十年前我所能想象的要丰富得多。
The theory that we and many others are developing is much richer than I could have expected ten years ago.
还有很多工作要做。
There is much more to be done.
是的。
Yeah.
所以这远没有那么简单。
So it's much less trivial.
其中有很多有趣的地方,有些还相当深刻。
A lot of interesting aspects of it and some are quite deep.
所以,是的,我觉得这非常令人兴奋。
So, yeah, I find it very exciting.
你知道,关于那些比我们更聪明的智能体,实现它们可能需要比许多人想象的更长时间。
You know, in terms of intelligence that are better than us, it may take longer than many people think.
对。
Right.
所以我认为皮层可能是具有组合性的,也许我们更容易在计算机中模拟它。
So cortex is what I think probably compositional and maybe what we can simulate in computers more easily.
而大脑更古老的部位,可能就不行。
And older parts of the brain, maybe not.
这有点讽刺意味,也就是说,更简单的部分——或者说更古老的脑区——反而可能更难模拟。
This would be kind of ironic that, you know, the simpler parts, so to speak, the more ancient part of the brain are the one that would be potentially more difficult to simulate.
这是《类脑启示》,由Transmitter赞助。
This is Brain Inspired, powered by The Transmitter.
大家好。
Hey, everyone.
感谢大家的到来。
Thanks for being here.
我无法充分表达对今天嘉宾托马索·波吉奥职业生涯的介绍。
I am not going to be able to do justice to the career, to an introduction, to my guest today, Tommaso Poggio.
至于头衔,我直接念网站上的内容吧,因为太多了。
As far as titles, I'm just gonna read from the website because it's a lot.
所以,托马索是脑与认知科学系的尤金·麦克德莫特教授。
So Tommaso is the Eugene McDermott professor in the Department of Brain and Cognitive Sciences.
他还是麦戈文脑研究所的研究员。
He's an investigator at the McGovern Institute for Brain Research.
他是麻省理工学院计算机科学与人工智能实验室(简称CSAIL)的成员。
He's a member of the MIT Computer Science and Artificial Intelligence Laboratory, otherwise known as CSAIL.
他还是麻省理工学院生物与计算学习中心以及大脑、思维与机器中心的主任。
And he's the director of both the Center for Biological and Computational Learning at MIT and the Center for Brains, Minds, and Machines.
托马索的发表成果比我出生还早,而我可不是年轻人了。
And Tommaso has been publishing since before I was born, and I am no spring chicken.
在Google Scholar上,列在第一位的作品发表于1972年,刊登在《Kybernetik》上,Kybernetik?
On Google Scholar, the first listed work is from 1972, and that publication is in Kybernetik. Kybernetik?
我不太确定。
I'm not sure.
这可能是德语发音。
It's probably a German pronunciation.
它的标题是《时间记忆与光动反应的全息方面》。
The title of it is holographic aspects of temporal memory and optomotor responses.
所以可以说,他在这方面已经研究了很久。
So suffice it to say, he has been at this for a long time.
那么他长期从事的究竟是什么呢?
And what is the this that he has been at?
他研究的是智能原理的理论,而‘理论’才是这里的关键词。
It is studying the theory of principles of intelligence, and theory is the keyword here.
所以汤米对智能背后的理论原理非常感兴趣,为了研究这些,他同时研究人工智能和大脑的工作方式。
So Tommy is super interested in the theoretical principles underlying intelligence, and to study those things, he studies both artificial intelligence, and the way that brains work.
我们有非常酷的人工智能。
So we have really cool AI.
我们拥有非常酷的人工智能已经有一段时间了。
We've had really cool AI for some time.
当然,它已经从不太酷发展到现在非常酷、非常棒,并且还在变得越来越好。
Of course, it has progressed from not really cool to now really cool and and really great, and of course, it's getting better and better.
但我们仍然不知道,它本质上是如何工作的。
But we still don't know, how it works essentially.
虽然有一些推动人工智能发展的基本理论原则,但现代人工智能的崛起主要是由工程驱动的,即通过构建而非理解来实现的。
There have been some driving theoretical principles that began artificial intelligence, but this modern rise in artificial intelligence was driven really from engineering, from building it, not from understanding it.
托马索将我们当前在人工智能中工程与理论之间的关系,比作伏打发明第一块电池后,人们开始应用电力、利用电力的时期,与多年后麦克斯韦方程组真正揭示电磁理论之间的关系。
Tommaso likens our current situation in artificial intelligence between engineering and theory as like the time between when Volta engineered the first battery and lots of applications were produced using electricity, harnessing electricity, and being able to use it, versus years later when Maxwell's equations really brought out the theory of electromagnetism.
正是由于这一理论,我们才能进一步在电磁领域开发出许多更新、更好的东西,比如计算机和现代人工智能。
And because of that theory, we were able to go on and develop lots of new and better things in the electromagnetic space, like computers and modern artificial intelligence.
在那种情况下,这中间相隔了许多年。
So in that case, it was many years.
当然,正如你会听到托马索提到的,那时信息通过马匹等方式传播的速度要慢得多。
Of course, information, as you'll hear Tommy talk about, was traveling much more slowly by horse, etcetera, during that time.
但从电池被发明、这些系统的工程部分被开发出来,到我们真正理解其工作原理,中间也花了许多年。
But it took many years from when the battery was built, when the engineering component of these systems was developed, and when we actually understood why, and how it it worked.
所以他认为,我们现在所处的阶段就像当年那样,他一直致力于研究驱动我们对智能运作机制理解的理论原理。
So he thinks that we're in this time now like that time then, and he has been and continues to work on the theoretical principles that that drive what we understand about how intelligence works.
因此,在今天的这一集中,我们讨论了他所研究的一些原理,这些原理在他看来,如果想要构建一个高效可计算的函数系统,并且这些函数组合后能形成一个通用的高效计算系统,从而支撑智能行为,那么这些原理至关重要。
So in this episode today, we talk about some of the principles that he's been working on that he has found to be theoretically important if you want an efficiently computable system of functions that when put together results in a generalized efficient computational system that could underlie intelligent behavior.
为了给这些原理命名,其中一个就是稀疏组合性。
To give names to those principles, one is sparse compositionality.
这个观点认为,如果你想高效地计算出智能行为,它必须由许多相当简单的函数组成,这些函数在本质上都很简单,也就是说,每个函数本身只需要少量变量就能学习。
This is the idea that if you want to efficiently compute an intelligent behavior, it needs to be composed of many fairly simple functions, simple in the sense that each function composing the collection is itself fairly simple, meaning it takes only a few variables to learn that function.
当你拥有这样的系统时,将它们组合起来,理论上就能保证你的系统具有更强的泛化能力。
And when you have that kind of system, you put them together, it theoretically guarantees that your system is gonna be more generalizable.
事实证明,这正是深度网络必须具备深度才能正常运作的根本原因。
And it turns out this is a principled reason why you actually need the depth of deep networks for them to function the way that they do.
因此,需要大量重复的简单基础函数组合在一起。
So lots and lots of repeated, simple, basic functions put together.
这听起来有点像大脑的新皮层,我们也会讨论这些原理是否仅适用于人工智能和深度学习,还是同样适用于我们人类的生物大脑。
Sounds a little bit like the neocortex of the brain, which we also discuss whether these principles apply only to artificial intelligence and deep learning, for example, or whether they also apply to our wet brains.
所以我们谈到了他发展这些理论的过程,以及他为何如此行事的原因。
So we talk about his development of these kinds of theories, the reasons why he does what he does.
我主要享受托马索分享他几十年来与有趣的人们探讨有趣问题的诸多经历,而他至今仍在继续这样做。
Mostly, I enjoyed Tommaso sharing some of his many experiences over decades of working with interesting people on interesting problems, which he continues to do.
我在节目笔记中附上了他的自传链接,该自传可通过他的网站公开获取。
I linked to his autobiography in the show notes, which is available, publicly through his website.
但如果你读过他的自传——今天我们也会谈到其中几位人物——你会发现这几乎就是现代智力研究、理论及科学原则发展史上众多重要人物的名人录。
But if you read through his autobiography, and and we talk about a few of these people today, it's basically a a who's who of, well, many important names throughout the modern history of studying intelligence and theory and scientific principles in general.
无论如何,我也在节目笔记中附上了今天我们讨论的关于组合稀疏性的具体论文,以及其他内容,比如托米和其他人正在撰写的一系列面向公众传播这些理念的博客文章。
Anyway, I also link to the specific paper on compositional sparsity that we discussed today and other things in the show notes, like a series of blog posts that Tommy and others are working on to communicate these ideas to the public.
这些节目笔记位于 braininspired.co/podcast/229。
Those show notes are at braininspired.co/podcast/229.
正如我所说,在我们的对话中,我们只谈到了托米多年来参与的少数几个项目,但这仅仅是对他所做和仍在进行的工作的浅尝辄止。
Like I said, during our conversation, we we discuss a small handful of the projects that Tommy has worked on over the many years that he's been in this business, but this is really just scratching the surface of what he has done and continues to do.
我希望你们喜欢这次讨论。
So I hope you enjoyed this discussion.
这是Tommaso。
Here's Tommaso.
所以普通科学家会经历这些乐观与悲观的起伏波动,是的。
So normal scientists experience these ebbs and flows of optimism and pessimism Yep.
你知道,在他们的研究生涯中,尤其是早期可能。
You know, throughout their research careers, especially early on maybe.
也许这就是关键所在。
Maybe that's the key.
我不确定。
I'm not sure.
但是,你知道,他们对自己取得进展的能力有起有落,对整个领域的乐观情绪也有起伏。
But, you know, ebbs and flows about their own ability to make progress, and and ebbs and flows about the optimism in their field as a whole.
所以举个例子,当你将学习作为你和David Marr早年开发的分析框架的第四层级引入时。
So so for just as an example, when you introduced learning as sort of a a fourth level to the the levels of analysis framework that you and David Marr developed way back when.
我猜想你当时是乐观的,认为这会像开启一个新层级一样——无意双关——并且如果大家都意识到这就是我们需要专注的方向,就能加快进展速度。
I kind of... I would imagine that you were feeling optimism that this would sort of unlock a new level, no pun intended, and sort of speed things up if everyone realized, yes, this is what we need to focus on.
但看起来,从你的工作和你做事的方式来看,你似乎是一个无论何种情况都稳步前进的人。
But but it seems also, looking at your work and and the way that you go about what you do, it seems like you're sort of a a steady marching forward under all circumstances kind of person.
所以,你觉得你自己不正常吗?
So, you know, are you abnormal?
在这一点上,你觉得你是正常的吗?
Are you normal in that regard?
你会感受到起伏吗?
Do you feel the ebbs and flows?
我能感受到起伏。
I feel the ebbs and flow.
你知道,你说得完全对。
You know, I think those you're absolutely right.
确实有一些大的起伏,可能会持续几个月,有时甚至几年。
There are big ones, you know, that may take months or sometimes years.
还有一些小的起伏,比如每天都会有的乐观或悲观情绪。
And there are small ones, you know, kind of day to day optimistic or pessimistic.
我能证明这个定理。
I can prove this theorem.
我已经证明了。
I proved it.
不。
No.
我错了。
I was wrong.
你知道吗?
You know?
就是这种事。
It's these kind of things.
当我引入第四层时,我其实是在回顾。
Now when, when I, introduced the fourth level, I was kind of looking back.
所以,那是在我认定学习很重要好几年之后的事了。
So it was quite a few years after I decided that learning was, you know, important.
我实际上认为我的第一篇论文是关于机器学习的,大概是1981年左右研究非线性学习。
I actually think my first paper was on machine learning, kind of nonlinear learning, back in '81 or so.
但当时我决定,还有其他更值得关注的问题,比如人类视觉和立体视觉。
But at the time, I decided there were other problems that one would want to look at: human vision and stereopsis.
你知道吗?在涉足学习之前,我们如何实现三维视觉?
You know, how can we see in three d before getting into learning?
所以我在学习方面的职业生涯推迟了一点,先做了其他事情大约十年,之后才回到学习领域。
So my kind of career in learning was a bit delayed, kind of doing these other things for about ten years before coming back to learning.
但这是不是
But is that is
是因为学习看起来更困难,还是其他问题更有趣?
that because learning seemed more daunting or the other problems were more interesting?
明白了。
Okay.
不是。
No.
我觉得其他问题更容易入手,而学习则更令人望而生畏,毫无疑问。
I think the other problems were lower hanging fruits and learning was more daunting, definitely.
是的。
Yes.
你知道,我可能一直有种偏见——如果回头来看,这或许是个错误的偏见——就是总想先用理论去理解一种新方法,而不是先去做应用或演示。
You know, I I always probably had a bias, maybe a wrong bias if you look back, about trying to understand a new approach in terms of theories before doing, you know, applications or demonstrations.
这其实是个品味问题。
You know, this is a question of taste.
其他人更喜欢先尝试一下。
Other people prefer to try something out.
如果有效,再发展理论,或者干脆就不发展理论。
If it works, then perhaps develop a theory or perhaps not at all.
比如杰弗里·辛顿就更倾向于‘干脆不发展理论’,但我恰恰相反。
Like, Geoffrey Hinton is more of a not at all, but I was the opposite.
有时候这会拖慢我本可以完成的事情,但这就是我大脑运作的方式。
And sometimes that was holding back what I, you know, could have done, but that's the way my brain operates.
所以直到1990年,我才建立了机器学习的理论框架,之后我开始将学习方法应用到各种问题上,比如计算机视觉、计算机图形学、基于DNA数据的癌症检测、文本分类、自动驾驶,基本上就是如今人们所做的一切。
So it was only in 1990, when I had a theoretical framework for machine learning, that I started to apply learning to all kinds of problems: computer vision, computer graphics, detection of cancer from DNA data, you know, text classification, autonomous driving, basically everything that people do these days.
是的。
Yeah.
我当时用的是那个时代的网络,主要是浅层网络、径向基函数和九十年代的核方法。
I did it with the networks of the times, which were basically shallow networks, radial basis functions, and kernel methods in the nineties.
这些方法当时更难,因为计算能力较低,网络也更小,但某种程度上它们更有理论依据。
Well, those were more difficult because there there was lower compute, the smaller networks, but they're in some sense more principled.
对吧?
No?
对。
Yes.
对。
Yes.
没错。
So exactly.
我们和出色的合作者一起撰写了一篇论文,当时和我合作的是费德里科·吉罗西(Federico Girosi),时间是1990年,论文内容是关于这种浅层网络的理论。
We wrote a paper with a great collaborator, I had Federico Girosi, in 1990, which was about the theory of these shallow networks.
这基本上是一种在‘核机器’这个术语被发明之前,关于核机器的理论。
It was basically a theory about kernel machines before the term kernel machines was actually invented.
基于这一理论,我感到可以自由地将它应用于遗传学、视觉、图形等领域,正如我所说。
And then, based on that theory, I kind of felt free to apply it to problems like genetics and vision and graphics and so on, as I said.
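To picture the class of models being described here: a radial basis function network is a shallow, one-hidden-layer model whose output is a weighted sum of Gaussian bumps, one centered on each training example. Below is a minimal sketch, an illustration only, not the actual models from the 1990 paper.

```python
import numpy as np

def rbf_fit(X, y, sigma=1.0, reg=1e-8):
    # Gaussian kernel matrix between all pairs of training points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * sigma**2))
    # Solve (K + reg*I) w = y for the output-layer weights;
    # this single linear solve is the only "learning" in such a shallow net.
    return np.linalg.solve(K + reg * np.eye(len(X)), y)

def rbf_predict(X_train, w, X_new, sigma=1.0):
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2)) @ w

# Toy usage: learn y = sin(x) from 20 samples.
X = np.linspace(0, 2 * np.pi, 20)[:, None]
y = np.sin(X[:, 0])
w = rbf_fit(X, y)
print(rbf_predict(X, w, np.array([[1.0]])))  # close to sin(1) ~ 0.84
```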
但是
But
是的。
yeah.
只有当你有了理论之后,你才真正获得了自由。
It was only once you had the theory that you became free.
这就是你的意思。
That's your okay.
是的。
Yeah.
完全正确。
That's exactly right.
完全正确。
That's exactly right.
而且,说实话,我对此有些遗憾,因为我觉得后来学到的一个教训是关于伏打的故事。
And I you know, in a sense, regret this because maybe, you know, one of the lessons that I think I learned afterwards is this story about Volta.
所以这更像是一个隐喻,不必过于字面理解,但正如我们所说,历史不会重复,但有时会押韵。
So it's it's a kind of metaphor not to be taken too literally, but, you know, as we say, history is not repeating itself, but rhyming itself sometime.
没错。
And so right.
所以,如果你要讲伏打与电之间的这种类比,它本身就很有趣。
So if you want to say this analogy between Volta and electricity, it's interesting in itself.
我认为很少有人意识到,直到1800年——也就是仅仅二百二十年前——那是拿破仑的时代,信息的传播速度还只有马的速度。
I don't think many people realize that until 1800, which was only two hundred and twenty years ago, this was the time of Napoleon, information traveled at the speed of a horse.
是的。
Yeah.
人类历史上从未如此之快。
Never faster in human history.
当人们写信彼此谈论君士坦丁堡陷落时,有这些美妙的信件。
There are these wonderful letters when people wrote to each other about the fall of Constantinople.
他们见证了基督教世界中的重大事件。
They saw the big event in the Christian world.
我想大概是1454年左右吧。
I think it was 1454 or something like this.
在维也纳,你知道的,他们彼此写信。
In Vienna, you know, they wrote to each other.
在巴黎,他们彼此写信。
In Paris, they wrote to each other.
你听说君士坦丁堡被土耳其人攻陷了吗?
Did you hear Constantinople fell to the Turks?
在马德里,他们彼此写信。
In Madrid, they wrote to each other.
所以我们有了信息传达到的确切日期。
So we have a precise date when the information got.
到维也纳正好三周,到巴黎四周,到马德里五周。
And it was exactly three weeks to Vienna, four to Paris, five to Madrid.
正好是一匹马连续跑二十四小时。
Exactly a horse running twenty four hours.
天气好的时候。
In good weather.
是的。
Yeah.
天气好的时候。
In good weather.
没错。
Yes.
所以,伏打,1800年,拿破仑。
So, Volta, 1800, Napoleon.
在此之前,电力只是闪电的火花。
Until then, electricity was just sparks of lightning.
而伏打发明了第一个持续电流的来源。
And Volta invents the first source of continuous electricity.
实际上,我有
Actually, I have
你只是放了电池。
You just put the battery.
对吧?
Right?
还是叫电堆?
Or the pile?
这是对原始电堆的忠实复制品。
This is a faithful copy of the original pile.
哦,这太酷了。
Oh, that's cool.
是的
Yeah.
这是为了纪念亚历山德罗·伏打发明电池二百周年,在2000年赠予我的。
Given to me for the bicentennial of Alessandro Volta's invention, in the year 2000.
嗯哼
Mhmm.
那时,我和伏打伯爵夫人共进午餐。
And at that time, I had lunch with Countess Volta.
她是亚历山德罗·伏打的曾曾孙女。
She was the great-great-granddaughter of Alessandro Volta.
是她带给你的吗?
And she brought you that?
或者抱歉?
Or Sorry?
你是怎么得到它的?
How did you get that?
她是把那个带给你的吗?
Did she bring that to you?
嗯,帕维亚大学举办了一场盛大的庆祝活动,授予了四个……呃,好吧。
Well, the University of Pavia made a big celebration, gave four... okay.
不。
No.
不。
No.
是荣誉学位,laurea honoris causa。
The honorary degree, the laurea honoris causa.
我是那四个人中的一个。
I was one of the four.
我得到了这一个。
I got this one.
所以当时有个特别的活动,是的。
So there was a special event Yeah.
为了纪念两百周年。
For the two hundredth anniversary.
但当时,他们围绕伏打之后在帕维亚发生的事情开设了一座博物馆。
But they, at the time, opened a museum centered around what happened in Pavia after Volta.
于是,这里有最初的电池,以及后来伏打开发的许多更大尺寸的电池。
And so there was, this battery, the original one, and then many others bigger that Volta developed.
之后,它看起来就像一个电力领域的硅谷。
And afterwards, it looked like it was a museum of a Silicon Valley of electricity.
有大量初创公司专注于发电机、电动机、电力照明等领域。
There were tons of startups for generators, electrical motors, electrical lighting, and so on.
其中一家,顺便说一下,是爱因斯坦和爱因斯坦。
And one of them, by the way, was Einstein and Einstein.
这是阿尔伯特·爱因斯坦的父亲和叔叔,他们从德国南部的乌尔姆搬到帕维亚,创办了一家初创企业。
This was the father and uncle of Albert Einstein who moved from Ulm, Southern Germany, to Pavia to start to make a start up.
嗯。
Mhmm.
当时,他们的初创公司最终破产了。
And at the time, their start up eventually went broke.
在破产之前,他们参加了一场竞标,争夺为慕尼黑(意大利语称Monaco)街道照明的合同,竞争对手是西门子。
Before going broke, they competed in a competition to illuminate the streets of Munich, Monaco in Italian, and the competitor was Siemens.
好的。
Okay.
他们输给了西门子。
They lost the bid to Siemens.
所以你可以想象另一种历史:如果获胜的不是西门子,而是爱因斯坦父子,也许我们就不会有相对论了。
So you can imagine an alternative history in which, you know, instead of Siemens, there is Einstein and Einstein, and maybe we don't have relativity theory.
我知道。
I know.
我刚才也在想这个。
That's what I was thinking.
是的。
Yeah.
他本可以是个从未需要为理解事物而奋斗的富家子弟。
He could have been like some privileged kid that never needed to struggle to figure things out.
是的。
Yeah.
所以,总之,我们回到主要话题。
So anyway, this is, you know, back to the main topic.
据说,伏打刚刚发明出第一个持续的电流源后,人们——也就是科学家们——就能开始研究电了。
The story was that immediately after Volta invented this continuous source of electricity for the first time, people, meaning scientists, could study electricity.
没错。
Right.
于是,一系列发现接踵而至。
And so it was a really an avalanche of discoveries.
接下来的十五到二十年里,电化学迅速发展。
The electrochemistry was done in the next fifteen, twenty years.
随后,人们发现了电学的基本定律,比如欧姆定律、安培定律,最终法拉第发明了发电机和电动机。
Then there was the discovery of the laws of electricity, like Ohm's and Ampère's, and eventually Faraday invented the electrical generator and electrical motors.
我认为奥斯特发现了电与磁之间的联系,然后这一进展在1864年达到顶峰,当时麦克斯韦提出了关于电磁学的四个方程,即电磁理论。
Ørsted, I think, discovered the connection between electricity and magnetism, and then it culminated, of course, in 1864, when Maxwell came up with the four equations about electromagnetism, the theory of electromagnetism.
是的。
Yeah.
他发展了一套理论。
He developed a theory.
对。
Yeah.
所以整个过程花了大约六十年。
So it took about sixty years to do all of this.
以马车的时代来看。
By horse time.
没错。
Yes.
但确实花了很长时间。
But a long time.
但与此同时,即使没有理论,在麦克斯韦之前,人们其实并不知道电是什么。
But in the meantime, even without a theory, until Maxwell, people really did not know what electricity was Right.
但这并没有阻碍人们开发出伟大的应用,比如电动机、发电机以及所有那些东西。
But this did not, was not an obstacle to developing great applications like electrical motors and electrical generators and all those things.
所以这可以说是一个教训。
So so this is kind of lesson.
我觉得我们现在正处于人工智能领域。
I feel that we are in artificial intelligence.
你知道,我们仍然处在伏打和麦克斯韦之间。
You know, we are still between Volta and Maxwell.
我不确定具体在哪里。
I don't know exactly where.
这是个很难回答的问题。
That's difficult question.
是的。
Yeah.
我们现在比马跑得快了。
We're faster than horses now.
但我听你讲过这个类比,也听你在另一场合表达过担忧,或者认为也许我们并不需要AI领域的麦克斯韦。
But so I've heard you tell that analogy, and I've also heard you, in a different breath, voice your concern, or the possibility, that maybe we don't need a Maxwell of AI.
也许即使你正在研究理论,我们其实并不需要它。
Maybe we don't need the theory even though that's what you're working on.
那么你如何调和这两者呢?
So how do you reconcile those two?
我无法想象你真的相信这一点。
I I can't imagine you actually believe that.
这简直像是你在承认一些你并不相信的东西?
It's almost as if you're are you sort of admitting something you don't believe?
或者
Or
是的。
Yeah.
我认为我承认了一些我不相信的东西。
I think I admit something I don't believe.
我希望我们需要一个理论,并且会有一个理论。
I hope we need a theory, and there will be a theory.
会有多完整?
How complete?
我不知道。
I don't know.
我几乎可以肯定它不会是四个方程。
I'm almost sure it will not be four equations.
它更像是一些关于智能的原则。
It will be more something like, you know, principles of intelligence.
就像在分子生物学中,我们并没有真正的方程,但我们有一些基本原理。
Like, we have in molecular biology, you know, we don't really have equations, but we have some basic principles.
例如,生物信息如何通过双螺旋结构进行复制和自我繁殖。
For instance, how biological information is copying and reproducing itself by the the double helix.
这是一个美妙的原则。
That's a beautiful principle.
我想,你必须设想一些这样的基本原理,它们可能无法像麦克斯韦电磁理论那样提供一个完整的理论。
You have to imagine things like that, things that are fundamental but may not give a complete theory in the sense of the electromagnetism of Maxwell.
这正是我希望的。
Now this is what I hope.
你知道,始终存在一种可能性,即机器学习、大语言模型或其后续技术可能会代替我们发展出这一理论,而我们可能无法理解它。
You know, there is always the possibility that, in a sense, machine learning LLMs or their successor will develop the theory instead of us, and that we may not be able to understand it.
哦。
Oh.
好吧,明白了。
Well, okay.
好的。
Okay.
所以你提到原则很有趣,因为我刚刚和神经生理学家亚历克斯·迈耶交谈过,他最近对整合信息理论非常着迷,并认为它可能是对意识的一种潜在解释。
So it's interesting that you mentioned principles, because I was just in conversation with Alex Meyer, who's a neurophysiologist, but he has been enamored recently with integrated information theory as a potential explanation for consciousness.
他之所以着迷于它,是因为它建立在一种形式化的数学框架之上,这为发展关于意识的数学定律提供了可能,而这正是令人满意的地方。
And the reason that he's enamored with it is because it is embedded in a formalized mathematical framework, essentially, which gives the possibility of developing mathematical laws, of consciousness in this case, and that's what's satisfying.
是的,进化、分子生物学和DNA都是原则,但它们并不像自然定律那样。
And yes, evolution is and and molecular biology and DNA, those are principles, but they're not like natural laws.
不知为何,作为人类,科学家们似乎最满足于能够——我本来想说‘还原’?
And somehow, as a people, we scientists seem most satisfied when we can... I was gonna say reduce?
当我们能用这些自然定律来形式化关系时。
When we can formalize relations with these natural laws.
你追求学习理论和机器学习理论时,是不是也想要这样的东西?
Is that the kind of thing that you're after with the with theories of learning and machine learning theory in general?
我觉得是的。
I think so.
它们更像是一些原则,我认为是数学原则。
They are more like principle and I think mathematical principles.
我们待会儿要讨论稀疏性和组合性,但你必须证明定理,才能对它们做出具体的论断,而这正是你在做的事情。
Sparsity and compositionality are what we're gonna discuss, but you have to prove theorems to concretely say things about them, which you're in the business of doing.
是的
Yeah.
但那和形式化的数学定律不同吗?这些是原则还是定律?
But is that different than a formal mathematical law? Are those principles or laws?
确实存在一些有趣的原则,比如稀疏组合性,我们稍后可以讨论它。
Well, there are these interesting principles, like sparse compositionality, which we can speak about later.
但我们可以证明,这是一些东西的后果,比如一个函数或执行任务的能力可以被图灵机计算。
But we can prove that this, is a consequence of something like a function or the ability to do a task being computable by a Turing machine.
是的
Yeah.
我们可以证明这一点。
We can prove that.
现在问题是,首先,这意味着所有运行在计算机上的东西,比如ChatGPT,都是组合稀疏的,因为它运行在计算机上。
Now the question is, so first of all, this implies that everything that is running on a computer, like ChatGPT or so, is compositionally sparse, because it runs on a computer.
但这并不必然意味着我们大脑所做的一切都是稀疏组合的,因为我们不知道是否能在机器中复现大脑所做的一切。
But it does not necessarily imply that everything that our brain does is compositionally sparse because we don't know if we can reproduce in a machine everything that our brain does.
大多数人这么认为,但是
Most people believe it, but
但是你
but Do
吗?
you?
嗯,不完全如此。
Well, not completely.
这一点我们稍后可以讨论,但这种可计算性条件指的是高效可计算性,简单来说,就是计算机应该能在合理的时间内完成计算,而不是需要宇宙年龄那么长的时间。
That we can discuss later, but this condition of computability is efficient computability, which simply means that a computer should be able to compute it in a time that is not the age of the universe or something like that.
对吧?
Right?
对。
Right.
合理的时间。
Reasonable time.
所以情况可能是这样,让我们换种说法。
So it could be that and so let's put it another way.
有一些物理过程,比如混沌系统,你知道的,像天气如何发展、形成和演变,这些过程很可能不是高效图灵可计算的。
There are physical processes, like chaotic systems, you know, like how the weather forms and develops, that are very likely not efficiently Turing computable.
对。
Right.
因此,为了保持未来的可预测性窗口恒定,随着你向前推进,你应该指数级地提高测量的精度。
And that's simply because, in order to keep a window of predictability that is constant into the future as you go forward, you should increase the precision of your measurements exponentially.
对。
Right.
所以,你知道,它是可计算的,但不是高效图灵可计算的。
So, you know, it's computable, but not efficiently Turing computable.
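The argument, made explicit (a standard gloss, not a quote): in a chaotic system, an initial measurement error grows exponentially at a rate set by the largest Lyapunov exponent, so holding the prediction horizon fixed demands exponentially finer measurements.

```latex
% Error growth with largest Lyapunov exponent \lambda > 0:
\delta(t) \approx \delta_0 \, e^{\lambda t}
% To keep the error below \varepsilon out to a horizon T, the initial
% precision must satisfy
\delta_0 \le \varepsilon \, e^{-\lambda T}
% so the required bits of precision (~ \lambda T / \ln 2) grow linearly
% with T, and the precision itself grows exponentially.
```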
那么,是否存在一个窗口,顺便说一下,这可能与意识的这个问题有关。
And so there is this window, which, by the way, may connect to this question about consciousness.
也许意识在某种意义上不是图灵可计算的,就像我们无法以任意精度计算三天后的天气一样。
It may be that consciousness is not Turing computable in the same sense that we cannot compute with arbitrary precision the weather, you know, say, three days from now.
我无法想象它是图灵可计算的。
I can't imagine that it is Turing computable.
是的。
Yeah.
你知道,亚历克斯的观点之一是,他想要的是数学结构与现象意识特性(如感受质等)之间的同构关系。
Well, you know, and and Alex's point one of his points is, you know, what he wants is this isomorphism between a mathematical structure and properties of like phenomenal consciousness, like the qualia, etcetera.
他将意识与认知区分开来,因为所有认知都是一种函数。
And he distinguishes it from cognition, because everything cognition is a function.
我们在人工智能中所做的一切,我们在神经网络中所做的一切,都只是函数,而不是数学同构。
Everything that we do in AI, everything that we do with neural networks, it's all a function and not a mathematical isomorphism.
因此,这里存在巨大的差异。
And so there's a huge difference there.
对。
Yeah.
在我看来,这些函数本质上是可以被计算机组合、计算和可计算的,而其他一些函数则过于复杂,无法在合理时间内被计算。
In my point of view, you have these functions that are essentially composable and computable by a computer, and other functions that are too complicated to be computed in a reasonable time.
而学习本身也是一种函数。
And and learning itself is a function.
我的意思是,既然你把学习列为第四个层次,那这是否一直是你的热情所在?你从那时起就一直在研究它吗?
I mean, it's safe to say that since you popped learning in as a fourth level, that has been your passion; I mean, you've been working on it ever since.
但我真正想问你的是,你对学习的看法是如何随着时间演变的。
But what I really want to ask you is how your thoughts, you know, about learning have developed over time.
有没有什么你过去相信但现在不再相信的观点?还是说,它一直像我所看到的那样,是稳步前进的,而你确实一直在践行?
If there's anything that you used to think that you now don't think, or has it been like that steady march forward that I see that, you know, you actually actuate?
我一直认为,学习是通向智能的大门。
Well, I always thought, I think, that learning was really the door to intelligence.
我想变化首先在于,很长一段时间里,我试图向计算机科学系的朋友们灌输学习的重要性。
I think what changed, you know, first of all, was the fact that for a long time I tried to preach to my friends in the computer science department that learning was really important.
但他们在麻省理工学院直到2010年左右才开始认真听进去。
And they started to listen to this at MIT only, I would say, around 2010 or so.
他们为什么不愿意听呢?
Why would they not listen to that?
他们卡在哪里?
What was their hang up?
这是个有趣的观点。
It's an interesting point.
如果你仔细想想,这还挺合乎逻辑的。
If you think about it, it's kind of logical.
你知道,从计算机科学系创立之初,大约1950年左右,其基本范式、科学方法或研究方法就是编程。
You know, the paradigm, the basic paradigm, scientific approach or research approach in computer science departments from their inception, 1950 or so, was programming.
嗯。
Mhmm.
算法。
Algorithm.
算法编程。
Algorithmic programming.
是的。
Yeah.
是的。
Yeah.
你告诉计算机该做什么。
You tell a computer what to do.
你可能让它做非常复杂的事情,但还是要告诉计算机去做。
They may be very sophisticated things you tell it to do, but you tell the computer what to do.
而你作为研究者的职责是编写一个聪明的程序。
And your function as a researcher is to write a smart program.
对。
Right.
好的。
Okay.
这种情况一直持续到2010年左右。
Then this was until 2010 or so.
但如果你现在看看,计算机科学已经彻底变样了。
Then if you look now, computer science has been transformed.
现在一切都是机器学习。
Everything is machine learning.
对。
Right.
以前你知道的,编译器、计算机语言、机器人技术、计算机视觉、自然语言处理。
Used to be, you know, compilers, computer languages, robotics, computer vision, natural language.
它们都是各自独立的领域。
They were all separate silos.
现在全是机器学习了。
Now it's all machine learning.
有趣的是,你知道,我早在1990年左右就开始说机器学习将成为计算机科学的通用语言,但花了很长时间才实现。
And it's funny that, you know, I started to say this, that machine learning was going to be the lingua franca of computer science, back in, I don't know, 1990, but it took a long time.
这让我想起了当时的情况。
It's it's it reminds me what happened.
我记得在八十年代,我们在MIT使用电子邮件,当时我是一家相当有趣的小公司Thinking Machines的顾问。
I remember in the eighties, we were using email at MIT, and I was a consultant at this quite interesting little company called Thinking Machines.
是的
Uh-huh.
思考机器公司生产了一台连接机,这是一台拥有百万个极其简单处理器的超级计算机。
Thinking Machines produced the Connection Machine, a supercomputer with a million very simple processors.
总之,我在那里担任,他们管这叫什么来着?
Anyway, and I was there as a, what do they call it?
企业研究员。
Corporate fellow.
好的。
Okay.
另一位企业研究员,举个例子,就是理查德·费曼。
And the other corporate fellow, just to give you the idea, was a Richard Feynman, for instance.
你的朋友。
Your buddy.
对。
Right.
还有史蒂夫·沃尔弗拉姆,也是其中之一。
And Steve Wolfram, another one.
哦,哇。
Oh, wow.
还有其他一些非常有趣的人。
And a few other very interesting, people.
所以当时,我知道电子邮件是未来的趋势,但又过了十五年,人们才停止使用传真机。
So at the time, you know, it was obvious to me that, email was the way to go, but it took another fifteen years before people stopped using fax machines.
嗯,我上个月还得发过一次传真,完全不明白为什么。
Well, I've still had I still, like last month, had to fax something and I could not understand why.
但是,是的。
But yeah.
对。
Yeah.
对。
Yeah.
是的
Yeah.
你知道我说的是什么。
You know what I'm saying.
是的
Yeah.
是的
Yeah.
当然。
Of course.
我基本上已经放弃了电子邮件会到来的希望,但后来它果然来了。
I basically gave up on the hope that email was going to come, and then it came, of course.
是的
Yeah.
但到
But by
然后你就转到Slack了,是的。
then, you're onto Slack and Yeah.
对。
Yep.
但是
But
不过神经网络早就存在了,神经网络领域、PDP社群的人们,一直都在宣扬学习的重要性。
so but but neural nets were around, and people in the neural network, the PDP community, you know, they were preaching learning for a long time.
当时存在多层学习的问题,还有整个反向传播的问题。
There was the problem of multilayer learning and, you know, the the whole backpropagation problem.
它很慢。
It's slow.
它不够高效,你知道的。
It doesn't you know, it's not efficient.
我知道,2012年ImageNet出现时,情况就变了,是的。
And I know, you know, that changed in 2012 when ImageNet was Yep.
解决了,对吧。
Solved Right.
错误率降低了。
Lower error.
但并不是说它不存在。
But but it's not like it didn't exist.
没错。
No.
它确实存在。
It did exist.
我曾经是个怀疑者,某种程度上,我错了。
I I was a skeptic, and in a sense and I was wrong.
关于什么?
About what?
神经网络。
Neural networks.
你知道,我实际上一直在使用浅层神经网络而不是深层网络,因为基本上直到2010年左右,大约2008年之前,浅层网络的表现和深层网络一样好。
You know, I was essentially using shallow neural networks instead of deep ones because, basically, until 2010 or so, roughly 2008, shallow networks were working as well as deep networks.
你知道,另一个值得讨论的话题是技术对理念的重要性。
You know, that's another subject to discuss: the importance of technologies for ideas.
我们常常以为是自己发展出了理论和算法,但现有的技术条件——哪些是可能的,哪些是难以实现的——实际上在很大程度上塑造了我们的想法和算法。
You know, we often think we develop theories and algorithms on our own, but existing technology, what is possible versus what is difficult to do, really shapes a lot of the ideas and the algorithms.
是的。
Yeah.
杨立昆(Yann LeCun)在回顾历史时也常强调这一点。
Yann LeCun makes that point too, throughout history.
这样的例子太多了。
So many examples of that.
对。
Yeah.
没错。
Exactly.
是的。
Yeah.
你知道吗,我在1999年左右在斯图加特乘坐过一辆奔驰的自动驾驶汽车。
You know, I was in a self-driving car by Mercedes in Stuttgart in '99 or something like this.
真的吗?
Really?
是的。
Yeah.
它在斯图加特市中心狭窄的街道上自动驾驶。
And it was driving self driving in narrow streets in the center of downtown Stuttgart.
当然,车上有一位司机,双手紧贴着方向盘。
Of course, there was a driver with his hands very close to the steering wheel.
但简单来说,你知道,后备箱里塞满了电脑。
But simply, you know, the the trunk was full of computers.
我记得当时举办了一个为期三天的自动驾驶研讨会,仅限受邀者参加。
And I remember there was a workshop, three days, about autonomous driving, invitees only.
最后半天是律师们。
And the last half day was lawyers.
在那个
And at
研讨会结束时,戴姆勒-奔驰(梅赛德斯)的管理层决定:不做自动驾驶了。
the end of that workshop, the management of Daimler-Benz, Mercedes, decided no autonomous driving.
我们取消它吧。
Let's kill it.
真的吗?
Oh, really?
是的。
Yeah.
我本来想说两件事。
I'll oh, I was gonna say two things.
第一,我打赌福岛邦彦的Neocognitron并没有被纳入那辆自动驾驶汽车的计算中。
One, I bet Fukushima's Neocognitron was not part of the calculations in that driving car.
为什么?
Why?
不。
No.
当时的系统有点像福岛邦彦的模型,因为这基本上就是我们在做的,而且
It was kind of like Fukushima's, because this was basically what we were doing, and
是的。
Yeah.
是的。
Yeah.
是的。
Yeah.
是的。
Yeah.
不。
No.
比如说,我们曾用多达200个样本训练了一个检测行人的系统,这个数量在如今看来少得可怜。
We, you know, for instance, had trained a system to detect pedestrians using as many as 200 examples, which is nothing these days.
从科学角度来看,它的表现相当不错。
And it was working, you know, from the scientific point of view, pretty well.
从实际应用的角度来看,它大约每十秒会出现三次错误。
From the practical point of view, it was giving, I think, about three errors every ten seconds.
哦,原来如此。
Oh, okay.
但确实如此。
But yeah.
好的。
K.
从帧数来看,它的错误率非常低。
Well, it was very low in terms of number of frames.
远低于每帧一次错误。
Well, much less than an error per frame.
但是,你知道,显然它在任何实际意义上都是不可用的。
But but, you know, obviously, it was not usable in any real sense.
简而言之,汤米,那天你害死了多少人?
Bottom line, Tommy, how many people did you kill that day?
没有。
No.
不是真的有人丧生。
Not not no real people.
是的。
Yeah.
我以为你要说,他们承诺五年内会有自动驾驶汽车,因为每个承诺都是五年后实现。
I thought you were gonna say at the end that they promised we would have autonomous self-driving cars in five years, because every promise is in five years.
对吧?
Right?
但你说他们说没有。
But you said they said no.
也许律师们真的
Maybe the lawyers really
不管怎样,他们在戴姆勒内部取消了这个项目。
They killed the project inside Daimler anyway.
是的。
Yep.
这很遗憾,因为当时他们处于前沿,但时机有点太早了。
Which was a pity because they were at the forefront at the time, but it was kind of too early.
但你说那是1999年,你当时在
But so that was in 1999 that you said that that you were in
那个那个。
that that.
对。
Yeah.
也许好吧。
Maybe Okay.
1999年,或者说1997年左右。
'99, '97 or so.
我的意思是,我读过你的自传,当然,我之前已经对你的许多工作有所了解,我会在节目笔记中附上链接。
So I mean, I I read your autobiography, and of course, I already knew a lot about a lot of your work, but and and I'll link to it in the show notes.
但你在自传中提到,你早在20世纪80年代初就开始研究目标识别,也就是检测人体之类的任务。
But you write in your autobiography, you know, that you started working on object recognition, which is like detecting people in that case, in the early eighties, I think.
而你当时其实对这一点持怀疑态度,这与神经网络中的学习机制有关。
And you actually doubted that so this kind of relates to the learning in neural networks, kind of thing.
你怀疑休贝尔和威塞尔提出的简单细胞和复杂细胞能否以层级方式组合成目标。
You you doubted that the Hubel and Wiesel kind of, simple and complex cells could be kind of, composed, kind of hierarchically composed into objects.
你后来承认自己错了,于是转向了HMAX这类基于层级结构的目标识别研究,并致力于HMAX等项目。
And and you sort of admit that you were wrong, and and then you got on this HMAX or, like, object recognition in these, you know, hierarchical structures and worked worked on HMAX, etcetera.
但当时你是如何思考学习机制的呢?在那个年代,你是怎么想的?
But how did how were you thinking about learning, in those days, like, during that time?
是的。
Yeah.
那时候的学习真的只是在输出层进行单层学习。
Learning in that time was really, just one layer learning at the output.
所以,你知道的,没错。
So there was, you know Right.
通过这些层次系统进行处理。
Processing by these hierarchical systems.
特征是通过一种非常简单的方式学习的,就是随机抓取图像的某些部分,而真正的学习——即分类器权重的学习——只发生在网络的最后一层。
The features were learned in a very simple way, just grabbing pieces of images at random, and the real learning, in terms of learning the weights of a classifier, was just the last layer of the network.
我明白了。
I I see.
原因是,我当时真的不相信反向传播在生物学上是可行的。
And the reason is that I did not really believe that backpropagation could be biological.
你说得对。
You're right.
从某种意义上说,我是对的,但从没在机器学习中使用它这一点上,我又错了。
And I was, in a sense, right, but I was wrong in the sense of not using it in machine learning.
对吧?
Right?
是的。
Yeah.
所以,由于这种生物限制,进展就停滞了,对吧?
And so you were stopped by, right, this biological constraint.
现在我们认为,我们有了一个从神经科学角度看似乎合理的模型。
Now we think we have ideas, a model that seems plausible from the neuroscience point of view.
我不知道它是否正确。
I don't know if it's correct or not.
这需要实验验证,但至少我们有希望实现一种并非真正的反向传播、但类似于一般梯度下降的方法,它可以通过神经元以一种相当自然的方式实现——虽然有点神奇,因为连接性是自组装形成的。
This will require experiments, but there is at least a fighting chance of having, it's not really backpropagation, but something like a general form of gradient descent, which can be quite naturally implemented by neurons, in a bit of a magical way, because of self-assembly of the connectivity.
而且
And
哦,是的。
Oh, yeah.
是的。
Yeah.
所以,你知道,也许因为我认为这是神经科学中的一个关键问题,如果能够解决,理论上可以建立起神经科学与机器学习之间的深刻联系。
So, you know, maybe because I think that's an interesting key problem in neuroscience that could, in principle, really, if solved, establish a deep connection between neuroscience and machine learning.
我们或许能在大脑中找到反向传播的等价物,因为那样我们就可以观察神经回路和突触模式,说:哦,这里就是它发生的地方。
We could find the equivalent of backpropagation in the brain, because then we could look at the circuits, the synaptic motifs, and say, oh, this is where this happens.
这有点跑题了,我话题跳得有点乱,但我的意思是,你刚刚提到了你自己在自组织、生物合理可塑性网络方面的一些工作。
How much time... this is an aside, and I'm jumping around, but, I mean, you just alluded to some of your own work in these self-organizing, biologically plausible plasticity kinds of networks.
而且,已经有人提出了其他方式,来说明可能存在一些生物合理版本,能够实现类似反向传播的功能。
And there have been other ways of suggesting how biologically plausible versions that kind of replicate what backpropagation does might exist.
而且,已经有多项原理性证明,它们在模拟反向传播方面的成功程度各不相同。
And sort of there's been multiple proofs of principle, and and they've had varying degrees of success in terms of emulating backpropagation.
是的。
Yeah.
但我读了你最近的一篇论文,发现里面充满了深度学习理论的术语,我当时就想:天啊,我其实只懂一点点。
But, you know, I was reading one of your recent papers, and the language is so thick with deep learning theory jargon, and I thought, oh my God, I'm still, like, I know a little bit.
我的意思是,我知道什么是流形之类的,但一接触到专业术语,我就觉得自己有点迷失了。
Like, you know, I know what a manifold is and stuff, but then you get into technical terminology, and I and then I think, oh, I'm I feel kind of lost.
你真的深陷在这个领域里,那么你花在机器学习和生物学习上的时间各占多少呢?
And you're really embedded in that world, and it may you know, how much of it of your time thinks of it in terms of machine learning versus in terms of biological learning?
我的意思是,如果这两者确实有区别的话,你分别投入了多少精力在它们上面?
Like, what kind of how much headspace is devoted to each of those if they're separate at all?
对。
Right.
我认为很长一段时间以来,两者各占一半。
I think for a long time has been fifty fifty.
好的。
Okay.
我认为在过去五年左右,我可能更偏向于人工网络,因为
I think in the last five years or so, I've been probably tilting a bit more towards the artificial network because
但因为现在有数据可以用来验证和
But because the data is there to to test and
没那么多。
Not so much.
不。
No.
这是因为我一直对理论的必要性这个问题感到困惑。
It's because I've been really puzzled by the need for a theory.
是的。
Yeah.
所以,我认为,现在——指的是过去几年——我似乎发现了一些原则。
And, and so I think, again, now, now meaning in the last couple of years, I think, it seems to me that I found some principles.
这些原则当然不是最重要的,但可能是对人工机器学习有些重要的原则。
These these are by no means the most important ones or but maybe some of the principles that, seem to be important for artificial machine learning.
我们来谈谈它们。
Let's talk about them.
我们现在就来谈谈它们。
Let's talk about them now.
所以,我的意思是,组合稀疏性是你目前关注的核心原则吗?
So, I mean, compositional sparsity is I I know is that the central principle that you're focused on right now?
是其中一个。
Is one.
是的。
Yeah.
对我来说,它确实解决了那个长期困扰我的问题。
You know, to me, it did solve the question that had been a block.
再次回到我之前提到的,我至少需要对正在发生的事情有一个理论上的初步理解。
Again, coming back to what I mentioned before, you know, it's I kind of needed to have at least a glimpse of theoretical understanding of what's going on.
我想我们大概在2003年左右,和一位著名的数学家史蒂夫·斯梅尔一起,为美国数学学会写了一篇综述论文,讨论机器学习。
And back in the I think we we wrote a review paper, probably 2003 or so, for the American Mathematical Society with a very famous mathematician, Steve Smale, about machine learning.
在那里,我们相当完整地阐述了浅层网络、核方法等的理论。
And there, we described a quite nice and quite complete theory of basically shallow networks, kernel machines, and so on.
嗯。
Mhmm.
然后我在讨论中提到了一些段落,探讨了这个谜题:为什么我们似乎拥有一种不需要深层、多层结构的理论。
And then I had, in the discussion, various paragraphs about this puzzle of why we seem to have a theory that does not need deep, you know, multiple layers.
然而,我们对生理学的了解,比如视觉皮层,似乎表明多层结构是重要的。
And whereas, what we know about physiology, for instance, in visual cortex, seems to suggest there are multiple layers that are important.
因此,我一直在思考这个谜题:为什么?
And so I was kind of asking this puzzle, why?
在能够真正应用深度网络之前,我一度卡在这个问题上。
And so I was a bit stuck there before I could really apply deep networks.
我认为,这种稀疏组合性正是对这个谜题以及其他类似谜题的答案。
And I think this sparse compositionality is the answer to this and to other to other similar puzzles.
你是如何发现这一点的?我可以想象一种情况,你通过训练深度网络,观察它们的表征,从而发现了这些特性。
How did you I can imagine a scenario where you sort of discovered that through training deep networks and you look at their representations and see the properties of them.
但我也可以想象另一种情况,你从更理论化、原则性的角度出发,思考哪些特征才是关键。
But I can also imagine a scenario where you sort of approach it from a more theoretical principled lens and think what features would matter.
那么,这个发现究竟是怎么发生的?
So how did how did that come about?
是的
Yeah.
我更多是通过第二种方式得出的。
It came out more in the second way.
这是对一个相关问题的回答:为什么卷积网络似乎比全连接网络好得多。
It came out as an answer to a related question of why convolutional networks seem to be so much better than dense networks.
在卷积网络中,就像在视觉皮层中一样,每个单元只关注一小部分输入,而不是所有输入。
And in convolutional networks, as in visual cortex, you have units that essentially look only at the small set of inputs, not all the inputs.
是的
Yeah.
对
Right.
例如,你有很多感光细胞,但第一层中的每个单元只观察其中一小部分。
For instance, you have, say, a lot of photoreceptors, but each unit in the first layer only looks at a small set of them.
一小块局部区域。
A little local patch.
局部区域。
Local patch.
没错。
Exactly.
是的。
Yep.
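To put rough numbers on that contrast (hypothetical layer sizes, purely illustrative): a dense layer in which every unit sees every input needs enormously more weights than a layer of units that each see a small local patch and share one filter.

```python
# Hypothetical sizes, just to illustrate the parameter-count gap
# between dense connectivity and shared local patches.
pixels = 32 * 32                 # inputs (a small image)
units = 32 * 32                  # one unit per location in the next layer

dense_weights = pixels * units   # every unit sees every pixel
conv_weights = 5 * 5             # one shared 5x5 filter for all units

print(dense_weights)  # 1048576
print(conv_weights)   # 25
```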
于是就提出了一个问题:假设我有一个多变量函数,在这个特定情况下,比如八个变量,也就是 x1、x2、x3、x4 到 x8。
And so the question came up: suppose I have a function of many variables, for this particular case, say, eight variables, you know, x1, x2, x3, x4, up to x8.
但现在假设这个函数具有某种特定结构。
But now suppose that this function has this particular structure.
所以它是一个函数的函数的函数。
So it's a function of functions of functions.
因此,我有一个关于两个变量 x1 和 x2 的函数。
So I have a function of two variables, x one and x two.
另一个函数作用于另外两个变量,也就是 x3 和 x4。
Another function of the next two variables, x3 and x4.
然后你有一个函数,它接收这两个函数的输出,依此类推。
And then you have a function that take the output of those two functions and so on.
所以你本质上有一个二叉树,八个节点作为输入,每个其他节点都是两个变量的函数。
So you have essentially a binary tree where you have eight nodes as inputs, and then each other node is a function of two variables.
嗯哼。
Mhmm.
好的。
Okay.
问题是,这可以说是卷积网络的一个简化版本。
And the question was, this is kind of a toy version of a convolutional network.
嗯,卷积本身其实并不重要。
Well, convolution is not really important.
关键是权重在平移下保持不变。
The key is the fact that the weights are the same under translation.
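Here is a minimal sketch of the binary-tree structure just described: eight inputs feeding a tree of two-variable constituent functions. The particular constituent function below is an arbitrary placeholder; in a deep network each node would be a small learned module.

```python
def h(a, b):
    # Placeholder constituent function of just two variables.
    return max(a, b) + 0.1 * a * b

def f(x):
    # f(x1..x8) as a binary tree of two-variable functions:
    # four at the first level, two at the second, one at the top.
    assert len(x) == 8
    l1 = [h(x[0], x[1]), h(x[2], x[3]), h(x[4], x[5]), h(x[6], x[7])]
    l2 = [h(l1[0], l1[1]), h(l1[2], l1[3])]
    return h(l2[0], l2[1])

print(f([1, 2, 3, 4, 5, 6, 7, 8]))
```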
因此,当一个函数有八个变量时,通常会出现所谓的维度灾难。
And so it turns out that when you have a function of eight variables, in general, you have the so-called curse of dimensionality.
换句话说,要近似它,所需的参数数量通常可能随着变量数量呈指数级增长,这非常糟糕。
In other words, to approximate it, you need a number of parameters that can typically be as bad as exponential in the number of variables, which is very bad.
独立的。
Independent.
对吧?
Right?
如果它们不是高度相关的等等。
If they're not if they're not highly correlated, etcetera.
那是最坏的情况,但确实如此。
That's that's the worst case scenario, but yeah.
是的。
Yeah.
对。
Right.
是的。
Yeah.
对。
Right.
而且,你知道,函数的平滑性可以纠正这一点。
And, you know, smoothness of the function can correct that.
但基本上,这个问题会反复出现。
But, basically, it's it reoccurs.
你可以看到,例如,如果你有一个包含一千个变量的函数,这并不多,因为这只是张32乘32像素的小图像,大约有一千个像素。
You can see that, for instance, if you have a function of a thousand variables, which is not much, because a small image of 32 by 32 pixels has about a thousand pixels.
这个一千像素的函数,如果你假设近似误差为10%,可能就需要10的千次方个参数。
For this function of a thousand pixels, if you assume an error in the approximation of 10%, you may need 10 to the thousand parameters.
而10的千次方是一个巨大的数字,因为10的80次方才是宇宙中质子的数量。
Now 10 to the thousand is a huge number, because 10 to the 80 is the number of protons in the universe.
所以我知道你接下来要说电子或者质子之类的。
So I knew you're going with either electrons or protons or yeah.
就是其中之一。
So one of those.
这总是涉及某种度量,如果需要的时间超过合理范围,那就是个坏迹象。
It's always the metric, like, and that's a bad sign if it takes more than Right.
宇宙中的质子。
The protons in the universe.
是的。
Yeah.
对。
Right.
对。
Right.
对。
Right.
但事实证明,如果这个函数,正如我所说,是一个函数的函数。
So but it turns out that if the function is, as I said, it's a function of functions.
它最初被称为层次局部性,但更好的术语是稀疏。
I kind of originally called it hierarchically local, but the better term is sparse.
从组合的角度来看,它是稀疏函数的复合,意味着每个函数只依赖于少量变量。
It's a composition of sparse functions, meaning functions each one of which depends on a small number of variables.
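For the quantitatively minded, the kind of theorem at stake is, roughly, the bound from Poggio and colleagues' work on deep versus shallow networks (stated schematically from memory; treat the exact exponents as indicative). For a target of d variables with smoothness m, approximated to accuracy ε:

```latex
% Generic d-variable target, shallow (one-hidden-layer) approximant:
N_{\mathrm{shallow}} \sim \varepsilon^{-d/m}
% (d = 1000, m = 1, \varepsilon = 0.1 recovers the 10^{1000} above),
% versus a deep network matched to a binary-tree compositionally
% sparse target with two-variable constituents:
N_{\mathrm{deep}} \sim (d - 1)\,\varepsilon^{-2/m}
% The exponential dependence on d collapses to a linear one.
```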
那么这里的稀疏性是否对‘稀疏’这个术语有精确的定义呢?
So, is sparse here serving a... is there precision to the term sparse?
稀疏是指少于三个吗?
Is sparse less than three?
还是说它只是一种方向性的说法?
Is it or is it just a sort of directional attitude?
这是一种方向性的说法。
It's directional.
但由于你会遇到指数级的损失,我会说稀疏是指少于40个二进制变量。
But because it you get into exponential losses, I would say that sparse means less than 40 binary variables.
哦,明白了。
Oh, okay.
好的。
Okay.
或者14个非二进制变量。
Or 14 non binary ones.
所以这显然是稀疏的。
So this is obviously sparse.
这简直是非常稀疏了。
This is, like, very sparse then.
是的。
Yeah.
所以稀疏性没问题。
So the sparsity okay.
那请继续。
So go ahead.
所以你有函数的函数。
So so so you have functions of functions.
这就是组合性部分,也是更高层次的结构部分。
That's the compositionality part and the higher hierarchical part.
也许你可以在这里区分层次性和组合性。
And maybe you can differentiate hierarchical versus compositionality here.
不太行。
Not really.
我觉得这是同一个东西的两种说法。
I think there are two words for the same thing.
我觉得‘组合’这个术语更好,因为你是在组合函数。
I think composition is a better term because you are composing functions.
这是一个函数的函数的函数。
It's a function of function of function.
而且,这个词在语言的组合性中也经常出现。
And also, it's a term that, you know, it comes up a lot in, for instance, the compositionality of language.
意思是你可以用简单的部分构建出更大的东西、更丰富的意义。
The idea that you can create bigger things, bigger meanings, out of simple parts.
你知道,乔姆斯基说过,洪堡也说过,本质上,就是从简单部分中产生无限复杂事物的能力。
And, you know, Chomsky said that, and Humboldt also said it: essentially, the capability of getting infinitely complex things out of simple parts.
这就是语言的力量之一。
That's one of the powers of language.
嗯哼。
Mhmm.
但事实证明,这是每一个可计算的函数都具有的性质。
But it turns out that this is a property of every function that can be computed.
必然如此。
Necessarily.
是的。
Yeah.
必然如此。
Necessarily.
所以
So
这似乎没问题。
it seems okay.
所以我能理解,这里的瓶颈在哪里?
So I could see how this where's the bottleneck here?
关键在哪里?
Where's the trick?
这个关键是不是在于函数本身?我从进化的角度来思考这个问题。
Is the trick the functions themselves? I'm thinking of this in evolutionary terms.
对吧?
Right?
比如,进化是否发现了哪些函数能够高效地与其他函数在这个稀疏组合中相互作用?
Like, did evolution discover which functions can efficiently interact with other functions in this sparse composition.
这看起来像是一个脆弱的系统,但我们知道这个系统是稳健的。
It seems like a fragile system, and we know the system is robust.
所以看起来,关键在于选对函数,但你仍然需要学习这些函数。
So it seems like the trick is getting the functions right, and you still have to learn the functions.
好吧。
Well, okay.
但这很有趣。
But so it's interesting.
这有趣的地方在于,古典数学与计算机科学之间是否存在一种冲突或分界线。
This is an interesting I'm not sure whether conflict or dividing line between classical mathematics and computer science.
在
In
在古典数学中,你定义函数空间。
classical mathematics, you define function spaces.
通常,它们具有某种光滑性等性质,比如满足一定数量的导数等等。
Typically, they have properties in, like, different types of smoothness, you know, and and meet a certain number of derivatives and so on.
在计算机科学中,你用少量的基本操作来构建每一个函数。
In computer science, you build every function out of a small number of primitives.
嗯。
Mhmm.
你知道,你从与、或、非开始,通过组合这些简单操作来构建一切。
You know, you start with and, or, not, and you build everything out of those simple things by composing.
嗯。
Mhmm.
这是计算机科学中的一个基本操作。
It's a fundamental operation in computer science.
因此,对计算机科学家来说,认为可计算的每一个函数都必须具备组合性,这是非常自然的。
And so it's very natural for computer scientists to see that compositionality has to be a property of every function that can be computed.
我明白了。
I see.
好的。
Okay.
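As a tiny concrete case of building everything out of AND, OR, NOT by composition: XOR is not one of the primitives, but composing three of them yields it (a standard construction, not specific to any paper discussed here).

```python
# XOR built purely by composing the Boolean primitives:
# a XOR b = (a OR b) AND NOT (a AND b)
def AND(a, b): return a and b
def OR(a, b): return a or b
def NOT(a): return not a

def XOR(a, b):
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (False, True):
    for b in (False, True):
        print(a, b, XOR(a, b))  # True exactly when a != b
```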
对数学家来说,这有点困难,因为这并不是
For a mathematician, it's a bit more difficult because that's not
我忘了我们身处计算机科学的世界。
It's not the... I forget we're in computer science land.
一切又回到了布尔逻辑。
Everything's back to Boolean.
对吧?
Right?
所以,是的。
So Yeah.
但是,你知道,是的。
But, you know yes.
是的。
Yes.
事实上,我为一本包含28篇论文的文集写过一篇论文,这篇论文属于这个新理论框架的一部分,探讨的是实数是否存在?
There is, in fact, one of the essays I wrote for a collection of 28 essays, kind of part of this new theoretical framework, which is: do real numbers exist?
因为这里正好存在这个问题。
Because there is exactly this question.
为了实现图灵可计算性,你本质上需要将每个数字最终用布尔变量来描述。
In order to have Turing computability, you need essentially, you want to describe every number at the end in terms of Boolean variables.
你知道,也许是一串非常长的零和一。
You know, perhaps a very long string of zero and ones.
所以在计算机科学中,实数并不存在。
So real number do not really exist in computer science.
事实上,如果你看一下数学的基础——连续统假设,它是实数理论的基础,但对于基础数学来说并不是严格必需的。
And, in fact, if you look at the foundations of mathematics, the continuum hypothesis, which is at the basis of real numbers, is not strictly needed for the fundamental mathematics.
如果你放弃实数,也不会损失太多。
Don't lose too much if you give up real numbers.
许多实数中,有些是可计算的,比如圆周率π或自然常数e。
And many real numbers, some of them are computable, like, say, the number pi or e.
但许多实数是不可计算的。
But many real numbers are uncomputable.
所以本质上,它们就像诗歌。
So essentially, they're like poetry.
你知道,它并不是
You know, it's not
无用的,换句话说。
Useless, in other words.
完全无用。
Absolutely useless.
你无法进行实验。
You cannot do experiments.
你什么都做不了。
You cannot do anything.
好吧。
Okay.
呃,真抱歉。
Well, so sorry.
那么,我们刚才说到哪儿了?
So then where were we?
所以我们有一组稀疏的组合结构,你证明了什么?
So we have a sparse set of compositional structures, and you proved and what did you prove?
每个高效可计算的函数,也就是能由图灵机在关于变量数量非指数的时间内计算的函数,都是组合稀疏的。
So every function that is efficiently computable, so basically computable by a Turing machine in time that is non-exponential in the number of variables, is compositionally sparse.
换句话说,它可以被分解为一系列函数的组合,每个函数都是稀疏的,即只依赖于少量变量。
In other words, can be decomposed in the composition of function, each one of which is sparse, meaning depends on a small number of variables.
是的。
Mhmm.
这些分解并不是唯一的。
And, you know, these decompositions are non unique.
对于任何一个给定的函数,都有非常多这样的分解方式。
Many, many of them for any given function.
你可以想象最极端的情况,就是将其分解为极其深层的最基础初等函数的组合。
You can think of the most extreme one would correspond to a very deep decomposition in the most simple elementary function.
这实际上就是基于基本运算——与、或、非——的组合。
It's really the composition in terms of the basic operations AND, OR, NOT.
是的。
Mhmm.
所以,你知道,我总能把一个数学上等价的图灵机程序转换为一个布尔函数。
So, you know, I can always translate a mathematically equivalent Turing machine program into a Boolean function.
所以,核心想法是学习一组布尔运算,这些运算能构成这种稀疏连接函数组合层次中的任意给定函数。
So then the idea is learning the set of Boolean operations that would comprise any given function in this compositional hierarchy of sparsely connected functions.
从学习的角度来看,这在计算上并不特别昂贵吗?
That is not terribly expensive computationally, from a learning perspective?
是的。
Yes.
如果我有每个组成函数的输入输出数据,那么每个函数都很容易学习。
If I have if I have input output data for each one of the constituent functions, each one of them is easily learnable.
类比一下,想象一个多层次网络。
Analogy is imagine a a multilayer network.
通常,你有整个网络的输入数据和整个网络的输出数据。
Typically, you have input data for the network and the output of the whole network.
只用这些数据进行训练可能会很困难,甚至不可行。
Training with those may be difficult, or maybe impossible.
但如果我有中间数据,也就是每一层(即每个组成函数)的输入和输出,我就能轻松学习每个函数。
But if I had intermediate data, the input and the output for each one of the constituent functions, each one of the layers, I could easily learn each of the functions.
然后,当然,也就学会了整个函数。
And then, of course, the whole function.
顺便说一下,这正是Transformer有效的其中一个原因。
And this is, by the way, one of the reason why transformers work.
这就是所谓的Transformer的神奇之处。
It's one of the, you know, the magic of transformer.
这是因为它是在一个自回归框架下进行训练的。
It's because it's trained in a in an autoregressive framework.
并不是通过给我一串单词,然后让我预测书的最后一个字、最后一个词或最后一句话来训练的。
It's not trained by giving me a sequence of words and then, you know, the last letter of the book or the last word in the book or the last sentence in the book.
而是通过给我一个句子、一个词,然后预测下一个词来训练的。
It's trained by giving me a sentence, a word, and then the next word.
是的。
Mhmm.
然后是一个门控。
And then a gate.
对吧?
Right?
对。
Right.
所以在大多数情况下,它几乎就像是被训练去执行这些组成函数中的某一个。
And so it's almost like, in most cases, it's like being trained on one of these constituent functions.
哦,我明白了。
Oh, I see.
但好吧。
But okay.
所以
So
然后我当然可以预测下一个词,再用这个完整的序列去预测下一个词
And then I can, of course, predict the next word, and then use this whole sequence to predict the next word
是的。
Yeah.
以此类推。
And so on.
对。
Right.
你预测一个词,然后它就成为你预测下一个词的语料库的一部分。
You predict the word, then it becomes part of the corpus from which you predict the next word.
是的。
Yeah.
对。
Right.
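A minimal sketch of the autoregressive point (an illustration, not any particular model's training code): every prefix of the text yields an input-output pair, so the learner gets supervision at each intermediate step rather than only at the end of the book.

```python
# Autoregressive training data: one (prefix -> next token) pair per
# position, i.e. supervision for every intermediate step.
tokens = "the cat sat on the mat".split()

pairs = [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]
for prefix, target in pairs:
    print(prefix, "->", target)
# ['the'] -> cat
# ['the', 'cat'] -> sat
# ... one training signal per step.
```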
哦,好吧。
Oh, which okay.
所以,我这里有两种,不过我们还是坚持用机器学习的版本吧。
So I have kind of two... well, let's stick with the machine learning version here.
那么,我们来谈谈泛化能力。
So well, let's talk about generalizability.
我的意思是,是的。
I mean Yep.
我知道你对这与泛化能力的关系感兴趣。
I know that you're interested in in how this relates to generalizability.
我们能说些什么呢?因为如果你要在深度网络中实现它,或者说是使用深度网络的优势在于,你可以利用这些稀疏的组合结构来实现。
What can we say about given that you have to have if you're if you're gonna do it in a deep network, or that's the advantage of doing it in a deep network is that you can do it with these sparse compositional structures.
是的。
Yes.
大多数机器学习任务都非常狭窄。
Most machine learning tasks are very narrow.
对吧?
Right?
而且,你知道,还存在持续学习的问题。
And there's, you know, there's this continual learning problem.
这很难,是的。
It's hard Yep.
一旦你在一个特定任务上训练了模型,就必须重新学习才能让它适应另一个任务。
Once you've trained a model on a certain task, then you have to relearn things to train it on another task.
所以这种泛化,某种程度上就是那个终极目标,它究竟是什么呢?
And so this generalization, which is sort of the gold... what is that?
就像是彩虹尽头的那罐金子,你知道的,是其中的一罐金子
The pot at the end of the rainbow, sort of, you know, one of the pots of gold
在
at the
对于人工智能来说,彩虹的尽头。
end of the rainbow for AI.
那么,这种结构与泛化有什么关系呢?
So what does this have to do with generalization, this kind of structure?
这种结构对旧的框架很重要。
So this kind of structure, it's important for the old framework.
机器学习的方法,一直以来都是大家主要采用的,本质上是这样的。
The approach to machine learning that has been, you know, kind of, the, the main one for everybody is essentially the following.
我有一个未知的函数。
I have an unknown function.
比如,我们以ImageNet为例。
You know, like, let's take an example, ImageNet.
我想对ImageNet中的图像进行分类,共一千个类别。
I want to classify images in ImageNet, thousand classes.
所以我有一个函数,需要将这个200乘200的图像映射出去。
So I have a function that wants to map this, you know, it's 200 by 200.
所以,是的,大约四万个变量映射到一千个类别中的一个。
So, yeah, about 40,000 variables into one of the thousand classes.
但我只有一个训练集。
But what do I have? I have only a training set.
我不知道这个函数的具体形式。
I don't know the function.
我有输入数据——图像,以及对应的正确类别标签。
I have input data, the images, and output with the correct class.
我的训练集中有这样的样本。
I have such examples in my training set.
现在,我的思路是:我想用一个非常强大且通用的工具来近似这个函数。
Now the framework is I want to approximate this function using a very powerful general tool.
结果表明,稀疏组合性原理告诉我们,你应该使用的通用工具就是深度网络。
Turns out that this principle of sparse compositionality says the very general tool you should use is a deep network.
深度非常重要。
It's important deep.
因为每一个函数,只要它是可计算的,都可以表示为函数的复合。
Because every function, assuming it's computable, can be represented as composition of function.
这背后有数学依据,但这就是核心观点。
That's there's mathematics underlying it, but that's the basic message.
所以,这些结果给你提供了保证。
So those results give you a guarantee.
它指出:如果你有一个多层网络,并且假设你可以进行优化,那么你应该做的就是调整参数。
It says: if you have a network that has multiple layers, assuming that you can do the optimization, what you should do is tune the parameters.
它们就像很多旋钮,我得调整十万多个旋钮,让我的网络在训练集上模仿我所了解的函数。
They're like many knobs, a 100,000 knobs, that I have to adjust so that my network imitates what I know about the function on the training set.
我得转动这些旋钮,使网络的表现与训练集一致,因为如果我正确拟合了训练集。
I have to turn the knobs so that it will do the same as the training set, because if I fit the training set correctly.
好吧,弗兰克
Well, Frank
罗森布拉特确实物理上转动过旋钮,但我知道你指的是现代情况。
Rosenblatt physically did turn knobs, but I know what you mean for modern day.
没错。
That's right.
对。
Right.
嗯哼。
Mhmm.
是的。
Yes.
在这里,我们提到的定理表明,你不会拥有无限或指数级数量的旋钮,因为我们知道这个函数是组合稀疏的。
And here, the theorems that we have say that you will not need an infinite or an exponential number of knobs, because we know the function is compositionally sparse.
你的网络只需要非指数级数量的参数。
Your network will need a non exponential number of parameters.
而这种能够逼近目标函数本身的保证,是非常强大的。
And that guarantee, that you can approximate the very function, is very powerful.
一旦你有了这一点,就意味着你也具备了泛化能力。
Once you have that, this means also that you have generalization.
本质上,这里存在一种权衡。
Essentially, there is a trade off.
再次强调,这是一些数学内容,但它基本上是说:如果你能用相对较少的旋钮来表示一个函数……嗯。
Again, this is some mathematics, but it basically says: if you can represent a function with a relatively small number of knobs... Mhmm.
那么你也会实现泛化。
Then you would be generalizing also.
哦,明白了。
Oh, okay.
如果你使用一种参数数量无限、或者说极其庞大的技术,就会出现两个问题。
And if you were using a technique that has an infinite, or very, very large, number of parameters, there would be two problems.
首先,你根本无法处理像10的一千次方这样庞大的参数量。
First of all, you cannot deal with very large numbers, you know, 10 to the thousand parameters.
但其次,你将无法实现泛化。
But second, you will not have generalization.
你只会拟合数据。
You'll only fit the data.
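下面用一个粗略的数量级对比来说明这种权衡(数字纯属示意,并非定理原文;指数形式来自光滑函数逼近的标准结果):
A back-of-the-envelope illustration of the trade-off being described; the numbers are made up, and the exponents follow standard smooth-function approximation results, so treat this as a sketch rather than the theorem itself.

```python
import math

d, eps = 64, 0.1                  # input dimension, target accuracy (assumed)

generic = eps ** (-d)             # ~ eps^(-d): the curse of dimensionality
compositional = d * eps ** (-2)   # ~ d * eps^(-2) with two-input constituents

print(f"generic approximant:    ~1e{round(math.log10(generic))} parameters")
print(f"compositionally sparse: ~{compositional:.0f} parameters")
```

同样的精度下,前者大约需要10^64个"旋钮",后者只需几千个——这就是"非指数级参数数量"的含义。
For the same accuracy, the generic approximant needs on the order of 10^64 knobs and the compositional one only a few thousand, which is what a non-exponential number of parameters buys you.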
所以本质上,如果你使用过参数化的方法,就会用过多的参数过度拟合数据。
So essentially you overfit the data with too many parameters if you're using an overparameterized method.
是的。
Yeah.
对。
Right.
没错。
That's right.
这很微妙,需要更深入地讨论过参数化到底意味着什么,因为如今的神经网络在某种意义上确实是过参数化的:它们的参数数量常常比训练数据还多。
This is tricky, and it kind of requires a more in-depth conversation about what overparameterization means, because neural networks today are overparameterized in the sense that they often have more parameters than training data.
是的。
Yes.
对。
Yeah.
但关键是,如果没有稀疏组合性提供的这种保证,参数数量会大得多。
But the point is that without this guarantee from sparse compositionality, the number of parameters would be so much larger.
那将根本不可能实现。
It would be really impossible.
啊,我明白了。
Oh, I see.
好的。
Okay.
所以,我对这个问题的理解是对的吗?
So am I thinking about this correctly?
我突然想到,我是不是应该把"函数的函数"看作介于传统符号AI和神经网络之间的某种东西?在传统符号AI中,你有一些相互通信的模块;而在底层,每个神经元都像一个逻辑门,执行同一种逻辑运算,整体只是它们彼此通信的复合。
It dawned on me, should I be thinking of functions of functions as somewhere in between good old fashioned symbolic AI, where, you know, you have these kinds of modules that are talking to each other, and the low level, where every single neuron is like a single logic gate, performs the same single logic gate, and it's just a composition of them all talking together?
但这些几乎是一群布尔函数的集合。
But these are like almost clusters of boolean functions.
是的。
Yeah.
差不多是这样。
These, almost.
有点像这样。
Like... it's a bit like this.
对。
Yep.
我的意思是,最好的理解方式可能是回想一下二叉树。
I mean, the best way to think about it is probably to think back to the binary tree.
所以输入来自叶子节点,只有一个输出。
So you have inputs from the leaves and one output.
二叉树向上延伸,宽度逐渐减小。
So the binary tree goes up, reduces in width.
我觉得这就像第一层神经元,比如视觉皮层中的单元,观察图像中不同的区域。
I think it's like having units in the first layer of, say, visual cortex looking at different patches in the image.
然后你有一层单元,而上面一层的单元观察这第一层单元的输出。
And then you have a layer of units, and the layer above, which looks at the outputs of these first-layer units.
但上一层的单元数量只有一半。
But the layer above is half the number.
对吧?
Right?
维度比下面一层更低。
A lower dimension than the layer below.
以此类推。
And so on.
对。
Yes.
所以这就像是,神经元观察下方的神经元,并把输出发送给上方的一个神经元。
And so it's like having, you know, neurons that look at the neurons below and send outputs to one neuron above.
因此感受野越来越大——顺便说一下,这大致就是视觉皮层的结构:V1区域的感受野很小。
So you get bigger and bigger receptive fields, which, by the way, is more or less the architecture in visual cortex, where you have small receptive fields in V1.
然后V2和V4的感受野更大,IT区域的则更大。
And then bigger ones in V2 and V4, and still bigger in IT.
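下面是一个极简的二叉树网络示意(假设性的玩具代码,并非对话中提到的任何具体模型):每一层宽度减半,每个单元只看下面两个相邻单元,所以"感受野"逐层翻倍,粗略地对应V1到IT的层级。
A minimal sketch of the binary-tree architecture being described; this is a hypothetical toy, not any specific model from the conversation. Each layer halves in width, and each unit looks only at two adjacent units below, so effective "receptive fields" double per level, loosely like V1 up to IT.

```python
import numpy as np

rng = np.random.default_rng(0)

def binary_tree_net(x, weights):
    # Unit i in each layer combines units 2i and 2i+1 from the layer below.
    h = x
    for W in weights:                 # W has shape (width // 2, 2)
        pairs = h.reshape(-1, 2)      # pair up neighbouring units
        h = np.tanh((pairs * W).sum(axis=1))
    return h

width = 16                            # a toy 16-pixel "image"
weights = []
while width > 1:
    weights.append(rng.standard_normal((width // 2, 2)))
    width //= 2

print(binary_tree_net(rng.standard_normal(16), weights))  # a single output
```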
好吧。
Well, okay.
我本来想问——你说的是"大致如此",这一点我也大致同意。
I was gonna ask, I mean, you said more or less, which I agree with, more or less.
但我还想问:你认为这个理论结果对于思考真实的生物大脑究竟有什么意义或应用?
But I was also gonna ask how you think this theoretical result matters or applies to thinking about wet brains.
是的。
Yep.
首先,我不确定。
First of all, I'm not sure.
你知道,这是一个开放性问题。
You know, that's an open question.
我知道你很在意。
I know you care.
不。
No.
我确实很在意。
I do care.
我只是想说,从数学上讲,我可以告诉你,稀疏组合性对于像ChatGPT这样的所有类似系统都必须成立,因为它们都运行在计算机上。
I just mean that mathematically, I can tell you that this sparse compositionality has to be true for, you know, things like ChatGPT and all similar systems, because they run on computers.
所以,凡是我能在计算机上模拟的东西,都必须具备这个特性。
So everything I can simulate on a computer has to have this property.
正如我告诉你的,我不了解人脑。
As I told you, I don't know about the human brain.
我猜测,我们大脑的一些功能,比如语言、数学和其他方面,似乎具有组合性。
My guess is that there are some aspects of what our brain does, like language and mathematics and other things, that seem compositional
对。
Right.
独立地。
By themselves.
还有其他一些功能,也许属于更古老的脑区——比如我们鱼类祖先就有的中脑,以及像基底神经节这样的脑结构——那里的模块化程度可能更低。
There are other things, maybe the older brain, the midbrain, which, you know, our fish ancestors had too, brain structures like the basal ganglia, where maybe there is less modularity.
组合性更低。
There is less compositionality.
嗯,
Well...
有可能。
Could be.
我的意思是,是的。
I mean, yeah.
如果基底神经节只是一个增益调节器,那么你就不需要组合性,例如。
If the basal ganglia is just a gain modulator, then you don't need compositionality, for example.
也许不行——也许你真的无法高效地模拟它。
Maybe not, maybe you cannot really efficiently simulate it.
这有点像科幻式的推测。
This is a bit of a science-fiction speculation.
比较超前。
It's out there.
我没有,我没有声称这一点。
I'm not I'm not claiming this.
我个人认为,它无法被计算机程序描述的可能性不大,但这种可能性确实存在。
I personally believe it's not likely that this cannot be described by a computer program, but there is a possibility.
尚无定论。
The jury is out.
是的。
Yeah.
对。
Right.
所以,你可能更倾向于把它想成皮层,因为你知道,所有的智能AI都是关于皮层的。
So, but you probably think of it more as cortex because, you know, all intelligent AI is about cortex.
对吧?
Right?
对。
Right.
所以皮层才是我认为可能具有组合性的部分,也许也是我们更容易在计算机中模拟的部分。
So cortex is what I think is probably compositional, and maybe what we can simulate in computers more easily.
而大脑更古老的部位,也许不行。
And older parts of the brain, maybe not.
这有点讽刺意味,也就是说,更简单的部分,或者说更古老的脑区,反而可能更难模拟。
This would be kind of ironic, that, you know, the simpler parts, so to speak, the more ancient parts of the brain, are the ones that would be potentially more difficult to simulate.
所以,你不知道有任何跨物种的证据,或者任何能证实大脑中确实发生这种情况的证据。
So you don't know of, like, any evidence across species, or anything that would corroborate that this is occurring in brains.
对吧?
Right?
没有。
No.
那么,我本来就想问你关于理论与实验之间平衡的问题。嗯。
So I wanted to ask you anyway about, like, this balance between theory and experiment, which... Uh-huh.
物理学的成功,可以说一直依赖于实验家与理论家之间的对话和往复交流。
The success of physics, like, has depended on experimenters dialoguing with theorists, and sort of that back and forth.
所以,在这种情况下,你是理论派。
And so in this case, right, you're a theory person.
你会主动去寻找实验证据吗?
Do you go looking for experimental evidence?
你会试图说服别人:"嘿,我需要你提供这类数据"?或者说:"嘿,
Do you try to convince someone, hey, I need this kind of data from you? Or, hey,
看看我的理论。
Look at my theory.
这在大脑中存在吗?
Is this in the brain?
你会怎么着手去做这件事?
How would you go about doing that?
是的。
Yes.
在我的职业生涯中,我一直都这么做,不过最近几年可能少了一些。
I've always done that in my career, probably less in the last few years.
但确实,我无法忘记当我做出一个关于果蝇行为的理论预测时的那种兴奋感。
But, yeah, I cannot forget the excitement when I made a theoretical prediction, a pretty simple one about behavior in the fly.
然后实验做了出来,结果证明它是正确的。
And and then the experiment was done, and it turned out it was correct.
天哪。
Oh my god.
那一定感觉很……
That must have felt...
是的。
Yeah.
没错。
Exactly.
那就是其中一件事。
That's one of those things.
很多理论家都有这种感觉,对吧?他们总觉得:哦,理论就在这里。
A lot of theorists, right, they have this feeling like, oh, here's the theory.
它一定是正确的。
It has to be correct.
因为它在理论上是正确的。
Because it is theoretically correct.
所以人们早已对它的正确性满怀信心,但亲眼见证它真正得到验证,又是另一回事。
So there's such confidence in the correctness of it already, but then to see it actually come to fruition is something else.
对。
Right.
没错。
Exactly.
是的。
Yeah.
你知道,这挺有趣的。
It's, you know, it's funny.
有不同的体验。
There are different experiences.
一种当然是证明一个定理。
One, of course, is to prove a theorem.
你知道,你会很开心,对自己很满意,然后……
You know, you're quite happy with yourself, and...
我从未有过这种感觉。
I've never felt it.
从未有过这种感觉。
Never felt that one.
大概永远也不会有。
Probably never will.
我有过。
I did.
我不是数学家,也算不上优秀的数学家,但当我偶尔证明出一些东西时,那确实令人兴奋。
I'm not a mathematician, not a good mathematician, but sometimes when I have proved something, it has been exciting.
但实验验证了理论所带来的那种兴奋,那真的是完全不同的体验。
But the excitement of having an experiment, you know, confirming the theory, that's really something else.
是的。
Yeah.
你看,这是相关的。
So this is related.
我有很多问题想问你,所以我在对话过程中一有机会就顺势提出来。
I have a lot of questions to ask you, and so I'm kind of folding them in as I see an opportunity along the conversation here.
但谁更需要深度学习理论呢?
But who is more in need of deep learning theory?
你觉得是想构建优秀AI的机器学习工程师,还是想解释大脑工作原理的神经科学家?
Do you think machine learning engineers trying to build good AI, or neuroscientists trying to explain how brains work?
当然,我相当有信心:如果你去问比如OpenAI之类机构的顶尖研究人员,他们会说"我们不需要理论"。
Well, for sure, I'm pretty confident that if you ask leading researchers in, say, OpenAI or so, they will say we don't need theory.
这让你有什么感觉?
How does that make you feel?
你知道,我已经习惯了,我想。
You know, I've become accustomed to it, I guess.
但你有过往的成绩,可以说:"二十年后你们就会看到。"
But you have a track record where you can say, well, you'll see in twenty years.
哦,你知道的?
Oh, you know?
是的。
Yeah.
但你知道,你永远无法确定历史是否会重演。
But, you know, you never know whether history will repeat itself.
尤其是在这个情况下,这是一个非常特殊的情形。
Especially in this case, which is a very special case.
我们正在与智能本身打交道。
We are working with intelligence itself.
所以我总是担心……我也说不好。
So I'm always afraid of... I don't know.
也许理论已经永远消亡了。
Maybe theory is dead forever.
对吧?
Right?