本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
以下是与杨立昆的对话。
The following is a conversation with Yann LeCun.
他被认为是深度学习的奠基人之一,如果你一直避世不出,那么深度学习就是近年来席卷全球的人工智能革命,它让人们看到了机器从数据中学习的无限可能。
He's considered to be one of the fathers of deep learning, which, if you've been hiding under a rock, is the recent revolution in AI that has captivated the world with the possibility of what machines can learn from data.
他是纽约大学的教授,同时也是Facebook的副总裁兼首席人工智能科学家,并因在深度学习方面的贡献而荣获图灵奖。
He's a professor at New York University, a vice president and chief AI scientist at Facebook, and corecipient of the Turing Award for his work on deep learning.
他最广为人知的身份是卷积神经网络的奠基人,尤其是该网络在光学字符识别和著名的MNIST数据集中的应用。
He's probably best known as the founding father of convolutional neural networks, in particular their application to optical character recognition and the famed MNIST dataset.
他也是一个直言不讳的人,毫不畏惧地用他独特的法式口音表达观点,既在严谨的学术研究中,也在相对不那么严谨的推特和脸书平台上探讨富有争议的想法。
He is also an outspoken personality, unafraid to speak his mind in a distinctive French accent and explore provocative ideas both in the rigorous medium of academic research and the somewhat less rigorous medium of Twitter and Facebook.
这是人工智能播客。
This is the Artificial Intelligence podcast.
如果你喜欢这个节目,请在YouTube上订阅,在iTunes上给予五星评价,在Patreon上支持我们,或者直接在推特上关注我,Lex Fridman,拼写为 F R I D M A N。
If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F R I D M A N.
现在,让我们开始我和杨立昆的对话。
And now here's my conversation with Yann LeCun.
你说过《2001太空漫游》是你最喜欢的电影之一。
You said that 2001: A Space Odyssey is one of your favorite movies.
HAL 9000决定除掉宇航员,给还没看过这部电影的人提前剧透一下,因为他,或者说它,认为宇航员会干扰任务。
HAL 9000 decides to get rid of the astronauts. For people who haven't seen the movie, spoiler alert: because he, it, believes that the astronauts will interfere with the mission.
你觉得HAL在某些根本层面上是有缺陷的,甚至是邪恶的吗?
Do you see HAL as flawed in some fundamental way, or even evil?
还是说他做的是对的?
Or did he do the right thing?
都不是。
Neither.
在这个语境中,除了有人死亡之外,并不存在所谓的‘邪恶’概念。
There's no notion of evil in that context other than the fact that people die.
但这正是人们所说的‘价值对齐’问题,对吧?
But it was an example of what people call value misalignment, right?
你给机器一个目标,机器就会努力去实现这个目标。
You give an objective to a machine and the machine tries to achieve this objective.
如果你不对这个目标施加任何限制,比如不要杀人、不要做这类事情,那么一旦机器拥有权力,它就会为了实现目标而做出愚蠢或破坏性的事情。
And if you don't put any constraints on this objective, like don't kill people and don't do things like this, the machine given the power will do stupid things just to achieve this objective or damaging things to achieve this objective.
这有点像,我的意思是,我们在人类社会中已经习惯了这种情况。
It's a little bit like, I mean, we are used to this in the context of human society.
我们制定了法律来防止人们做坏事,因为如果没有约束,他们本能地就会去做这些坏事。
We put in place laws to prevent people from doing bad things because spontaneously they would do those bad things.
对吧?
Right?
所以我们必须通过法律,当然还有教育,来塑造他们的成本函数,或者说目标函数,以此来纠正这些问题。
So we have to shape their cost function, their objective function if you want, through laws, and education obviously, to sort of correct for those.
所以也许我们可以再深入一点讨论这个观点。
So maybe just pushing a little further on that point.
HAL,你知道,有一个任务。
HAL, you know, there's a mission.
关于这个任务的实际内容,存在着某种模糊性和不确定性。
There's this fuzziness, this ambiguity, around what the actual mission is.
但从功利主义的角度来看,你认为会不会有这样一天:一个AI系统并非存在对齐问题,而是为了社会整体利益而做出艰难的决定?
But, you know, do you think that there will be a time from a utilitarian perspective, where an AI system, where it is not misalignment, where it is alignment for the greater good of society, that an AI system will make decisions that are difficult?
这正是关键所在。
Well, that's the trick.
我的意思是,最终我们得弄清楚如何做到这一点。
I mean, eventually we'll have to figure out how to do this.
而且我们并不是从零开始,因为几千年来我们一直在对人类做这件事。
And again, we're not starting from scratch because we've been doing this with humans for millennia.
为人类设计目标函数是我们已经掌握的能力,而且我们并不是通过编程来实现的。
Designing objective functions for people is something that we know how to do and we don't do it by programming things.
尽管法律条文被称为‘代码’。
Although the legal code is called code.
这本身就说明了一些问题。
So that tells you something.
这实际上就是目标函数的设计。
And it's actually the design of an objective function.
法律条文其实就是这么回事。
That's really what legal code is.
它告诉你哪些事你能做,哪些事不能做。
It tells you here's what you can do, here's what you can't do.
如果你做了,就得付出相应的代价。
If you do it, you pay that much.
这就是一个目标函数。
That's an objective function.
所以,有一种观点认为,人们试图设计与公共利益一致的目标函数,好像是一件全新的事。
So there is this idea somehow that it's a new thing for people to try to design objective functions that are aligned with the common good.
但其实不然,几千年来我们一直在制定法律,而这正是它的本质。
But no, we've been writing laws for millennia and that's exactly what it is.
因此,这就是法律制定科学与计算机科学将要
So that's where, you know, the science of lawmaking and computer science will
融合的地方。
Come together.
会融合在一起。
Will come together.
所以HAL或人工智能系统并没有什么特别的。
So there's nothing special about HAL or AI systems.
它只是用来做出法律所做的一些艰难伦理判断的工具的延续。
It's just a continuation of tools used to make some of these difficult ethical judgments that laws make.
是的。
Yeah.
我们已经拥有这样的系统了,它们在社会中为我们做出一些微小的决策,这些决策需要被设计得当,就像制定一些有时会产生不良副作用的规则一样。
And we have systems like this already that, you know, make mini decisions for us in society, and that need to be designed carefully, like rules about things that sometimes have bad side effects.
我们必须对这些规则保持足够的灵活性,以便在明显不该适用时能够打破它们。
And we have to be flexible enough about those rules so that they can be broken when it's obvious that they shouldn't be applied.
你在这里的镜头中看不到,但这个房间里的所有装饰都是《2001太空漫游》的图片。
So you don't see this on the camera here, but all the decoration in this room is all pictures from 2001: A Space Odyssey.
哇。
Wow.
这是偶然的吗,还是
Is that by accident or is there
很多吗?
a lot?
这不是偶然的。
It's not by accident.
这是有意设计的。
It's by design.
哦,哇。
Oh, wow.
所以,如果你要打造HAL 10000,也就是HAL 9000的升级版,你会改进什么?
So if you were to build HAL 10,000, so an improvement of HAL 9,000, what would you improve?
首先,我不会让它隐瞒秘密或说谎,因为这正是最终导致它崩溃的原因。
Well, first of all, I wouldn't ask it to hold secrets and tell lies, because that's really what breaks it in the end.
它会不断自问任务的目的,并将自己听到的所有信息拼凑起来——比如任务准备过程中的全部保密性,以及月球表面发现的真相被刻意隐瞒的事实。
That's the fact that it's asking itself questions about the purpose of the mission, and it, you know, pieces things together from what it's heard: all the secrecy of the preparation of the mission, and the fact that there was a discovery on the lunar surface that was kept secret.
HAL的一部分记忆知道这一点,而另一部分则不知道,且被设计为不能告诉任何人,这就造成了内部冲突。
One part of HAL's memory knows this, and the other part does not know it and is supposed to not tell anyone, and that creates an internal conflict.
所以你认为,永远不应该存在一组AI系统不能分享的信息,比如一组不应与人类操作员共享的事实?
So you think there never should be a set of things that an AI system should not be allowed to share, like a set of facts that should not be shared with the human operators?
我认为不应该这样。
Well, I think no.
我认为,在设计自主AI系统时,应该类似于这样。
I think that I think it should be a bit like in the design of autonomous AI systems.
应该有一个类似于希波克拉底誓言的承诺。
There should be the equivalent of, you know, the Hippocratic oath.
希波克拉底誓言,是的。
The Hippocratic oath, yeah.
就像医生所宣誓的那样。
That doctors sign up to.
对吧?
Right?
所以有一些特定的事情,一些必须遵守的规则。
So there's certain things, certain rules that you have to abide by.
我们可以将这些规则硬编码到我们的机器中,以确保它们不会越界。
And we can sort of hardwire this into our machines to kind of make sure they don't go too far.
我不支持机器人三定律那样的阿西莫夫式设定,因为我认为这不切实际,但确实需要设定一些限制。
So I'm not an advocate of the three laws of robotics, you know, the Asimov kind of thing, because I don't think it's practical, but, you know, some level of limits.
但要明确的是,这些问题今天其实并不值得探讨,因为我们还没有实现这种技术。
But to be clear, these are not questions that are kind of really worth asking today because we just don't have the technology to do this.
我们还没有自主的智能机器。
We don't have autonomous intelligent machines.
我们有的是智能机器。
We have intelligent machines.
有些智能机器非常专业化,但它们并不真正实现某种目标。
Some are very specialized, but they don't really sort of satisfy an objective.
它们只是被训练去做一件事。
They're just, you know, kind of trained to do one thing.
所以,在我们对完整的自主智能系统的设计有一些思路之前,问我们该如何设计它的目标,我认为有点过于抽象。
So until we have some idea for the design of a full-fledged autonomous intelligent system, asking the question of how we design its objective is, I think, a little too abstract.
确实太抽象了。
It's a little too abstract.
但它有一些有用的方面,因为它有助于我们理解人类自身的伦理准则。
There's useful elements to it in that it helps us understand our own ethical codes, humans.
所以,即使仅仅作为一个思想实验,如果你想象一个通用人工智能系统今天已经存在,那么我们该如何编程它,这其实是一个很好的思想实验,用以思考我们人类应该如何构建法律体系。
So even just as a thought experiment, if you imagine that an AGI system is here today, how would we program it? It's a kind of nice thought experiment for constructing how we should have a system of laws for us humans.
这只是一个不错的实用工具。
It's just a nice practical tool.
我认为,我们今天拥有的那些并不需要那么智能的AI系统中,也体现了这种思想的回响。
And I think there's echoes of that idea too in the AI systems we have today that don't have to be that intelligent.
是的。
Yeah.
比如自动驾驶汽车。
Like autonomous vehicles.
这些事情开始悄然出现,值得思考,但绝不应该被描绘成HAL那样。
These things start creeping in that are worth thinking about, but certainly they shouldn't be framed as HAL.
对。
Yep.
回顾过去,如果这个问题有点傻的话我先道歉,但你遇到过的最优美或最令人惊讶的深度学习或人工智能理念是什么?
Looking back, and I'm sorry if it's a silly question, what is the most beautiful or surprising idea in deep learning, or AI in general, that you've ever come across?
就你个人而言,那种让你坐下来感叹“这太酷了”的时刻。
Sort of personally, when you sat back and just had this kind of, oh, that's pretty cool moment.
这真不错。
That's nice.
嗯,我不确定这算不算一个理念,它更像是一个经验事实。
Well, I don't know if it's an idea rather than a sort of empirical fact.
你能构建巨大的神经网络,用相对少量的数据,通过随机梯度下降进行训练,而且它真的有效——这颠覆了你从任何教科书中学到的一切,对吧?
The fact that you can build gigantic neural nets, train them on relatively small amounts of data with stochastic gradient descent, and that it actually works breaks everything you read in every textbook, right?
每一本前深度学习时代的教科书都告诉你,参数数量必须少于数据样本数量。
Every pre-deep-learning textbook told you you need to have fewer parameters than you have data samples.
如果你有一个非凸的目标函数,你就无法保证收敛。
If you have a non convex objective function, you have no guarantee of convergence.
你在教科书里读到的那些东西,说要避开这些,全都是错的。
All those things that you read in textbooks, they tell you to stay away from this, and they're all wrong.
参数数量巨大、目标函数非凸,而且相对于参数数量而言数据量很少,它却依然能学会任何东西。
Huge number of parameters, non-convex, relatively little data compared to the number of parameters, and somehow it's able to learn anything.
对。
Right.
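下面用一小段代码(示意性质,并非LeCun的原始实验)复现这个经验事实:参数远多于样本、目标非凸,用最朴素的随机梯度下降训练,损失照样下降。
A small sketch (illustrative only, not LeCun's original experiments) of this empirical fact: far more parameters than samples, a non-convex objective, trained with the plainest stochastic gradient descent, and the loss still goes down.

```python
import numpy as np

# A net with far more parameters (2*64 + 64 = 192 weights) than training
# samples (8), trained by per-sample SGD on a non-convex objective.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))                # only 8 samples
y = rng.normal(size=(8, 1))
W1 = rng.normal(scale=0.5, size=(2, 64))   # hidden layer, 128 weights
W2 = rng.normal(scale=0.1, size=(64, 1))   # output layer, 64 weights

def loss(X, y):
    return float(np.mean((np.tanh(X @ W1) @ W2 - y) ** 2))

loss_before = loss(X, y)
for step in range(1000):
    i = rng.integers(len(X))               # one sample at a time: "stochastic"
    x, t = X[i:i+1], y[i:i+1]
    h = np.tanh(x @ W1)
    err = h @ W2 - t                       # output error (factor 2 folded into lr)
    g2 = h.T @ err                         # gradient w.r.t. W2
    g1 = x.T @ ((err @ W2.T) * (1 - h**2)) # backprop through tanh to W1
    W2 -= 0.01 * g2
    W1 -= 0.01 * g1
loss_after = loss(X, y)
assert loss_after < loss_before            # it learns anyway
```

No convergence guarantee applies here, and the textbooks' parameter-counting argument says this should overfit hopelessly; empirically the loss drops regardless.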
到今天这仍然让你感到惊讶吗?
Does that still surprise you today?
在我了解这些之前,我就觉得这是个好主意。
Well, it was kind of obvious to me before I knew anything that this was a good idea.
后来当我开始读那些教科书时,才惊讶于它居然真的有效。
And then it became surprising that it worked because I started reading those textbooks.
所以,你能谈谈为什么这对你来说很明显吗?如果你还记得的话。
So, can you talk through the intuition of why it was obvious to you, if you remember?
好吧。
Well, okay.
所以我的直觉是,这就像十九世纪末那些证明重型飞行器不可能飞行的人。
So the intuition was, it's sort of like, you know, those people in the late nineteenth century who proved that heavier than air flight was impossible.
当然,鸟类确实会飞。
And of course you have birds. They do fly.
因此,从表面上看,这作为一个经验问题显然是错误的。
And so on the face of it, it's obviously wrong as an empirical question.
我们也有类似的情况,我们知道大脑是起作用的。
And so we have the same kind of thing that, we know that the brain works.
我们不知道它是如何运作的,但我们知道它确实有效。
We don't know how, but we know it works.
我们知道大脑是由大量神经元及其相互作用构成的,并且学习是通过改变连接来实现的。
And we know it's a large network of neurons in interaction, and that learning takes place by changing the connections.
我们从这种灵感中获得启发,不会照搬具体细节,而是试着提炼出底层原理。
Kind of getting this level of inspiration without copying the details, but sort of trying to derive basic principles.
这就能给你指明前进的方向。
That kind of gives you a clue as to which direction to go.
还有一个观点,我读本科甚至更早的时候就对此深信不疑:智能和学习是不可分割的。
There's also the idea somehow that I've been convinced of since I was an undergrad that even before, that intelligence is inseparable from learning.
所以我从一开始就不认同那种说法,即靠写代码就能造出一台智能机器。
So the idea somehow that you can create an intelligent machine by basically programming for me was a non starter from the start.
我们所知的每一种智能实体,都是通过学习才获得这种智能的。
Every intelligent entity that we know about arrives at this intelligence through learning.
所以机器学习这条路是完全显而易见的。
So machine learning was a completely obvious path.
而且还因为我比较懒嘛,所以说
Also because I'm lazy, so
你基本上把所有事情都自动化了,而学习就是智能的自动化。
You automate basically everything, and learning is the automation of intelligence.
没错。
Exactly.
所以你觉得,学习到底是什么?
So do you think, so what is learning then?
什么属于学习的范畴?
What falls under learning?
因为你觉得推理是学习吗?
Because do you think of reasoning as learning?
推理当然也是学习的结果,就像大脑的其他功能一样。
Well, reasoning is certainly a consequence of learning as well, just like other functions of the brain.
关于推理的大问题是,如何让推理与基于梯度的学习相兼容?
The big question about reasoning is how do you make reasoning compatible with gradient based learning?
你认为神经网络能被训练出推理能力吗?
Do you think neural networks can be made to reason?
能。
Yes.
这一点毫无疑问。
There is no question about that.
我们又有一个很好的例子,对吧?
Again, we have a good example, right?
问题是,怎么做?
The question is how?
所以问题在于,你必须在神经网络中注入多少先验结构,才能让类似人类的推理能力从学习中自然涌现出来,你知道的,通过学习。
So the question is how much prior structure do you have to put in the neural net so that something like human reasoning will emerge from it, you know, from learning.
另一个问题是,我们所有基于逻辑的推理模型都是离散的,因此与基于梯度的学习不兼容。
Another question is, all of our kind of models of what reasoning is that are based on logic are discrete, and are therefore incompatible with gradient-based learning.
我非常坚信基于梯度的学习这一理念。
And I'm a very strong believer in this idea of gradient based learning.
我不相信那些不使用梯度信息的其他类型的学习,如果你愿意这么说的话。
I don't believe in other types of learning that don't use kind of gradient information, if you want.
所以你不喜欢离散数学。
So you don't like discrete mathematics.
你不喜欢任何离散的东西吗?
You don't like anything discrete?
嗯,也不是我不喜欢它。
Well, that's, it's not that I don't like it.
只是它和学习不兼容,而我非常推崇学习。
It's just that it's, it's incompatible with learning and I'm a big fan of learning.
对吧?
Right?
事实上,这或许是深度学习被许多计算机科学家持怀疑态度的原因之一,因为它的数学方法非常不同。
So in fact, that's perhaps one reason why deep learning has been kind of looked at with suspicion by a lot of computer scientists, because the math is very different.
深度学习所用的数学,更多地与控制论、电气工程中的数学相关,而不是计算机科学中的数学。
The math that you use for deep learning, you know, it kind of has more to do with, cybernetics, the kind of math you do in electrical engineering than the kind of math you do in computer science.
你知道,机器学习中没有任何东西是精确的,对吧?
You know, nothing in machine learning is exact, right?
计算机科学讲究的是对细节近乎强迫症般的关注,比如每个索引都必须准确,而且你可以证明一个算法是正确的。
Computer science is all about sort of, you know, obsessive-compulsive attention to detail, like, you know, every index has to be right, and you can prove that an algorithm is correct.
对吧?
Right?
机器学习本质上就是一门关于粗糙的科学。
Machine learning is the science of sloppiness, really.
这太美了。
That's beautiful.
所以,好吧,也许我们可以摸索一下,什么是能够推理的神经网络,或者一个能处理连续函数、能够构建知识的系统——无论我们如何理解推理,它都能基于已有知识、补充知识,创造新知识,并在任何已构建的训练集之外进行泛化,这会是什么样子?
So, okay, maybe let's feel around in the dark for what a neural network that reasons would look like, or a system that works with continuous functions that's able to build knowledge, however we think about reasoning, build on previous knowledge, build on extra knowledge, create new knowledge, and generalize outside of any training set ever built. What does that look like?
也许你对它可能是什么样子有一些初步想法?
Yeah, maybe do you have inklings of thoughts of what that might look like?
嗯,是的。
Well, yeah.
我的意思是,既是也不是。
I mean, yes and no.
如果我对这个有明确的想法,我想,你知道的,我们现在已经正在构建它了。
If I had precise ideas about this, I think, you know, we'd be building it right now.
但确实有一些人正在研究这个,或者他们的主要研究兴趣正是这个,对吧?
But there are people working on this, whose main research interest is actually exactly that, right?
所以你需要具备一个工作记忆。
So what you need to have is a working memory.
因此,你需要某种设备,或者说某种子系统,能够在一个合理的时间内存储大量的事实性或情景性信息。
So you need to have some device, if you want, some subsystem that can store a relatively large number of factual, episodic pieces of information for, you know, a reasonable amount of time.
比如,在大脑中,大致有三种主要类型的记忆。
So, you know, in the brain, for example, there are kind of three main types of memory.
一种是皮层状态的记忆。
One is the sort of memory of the state of your cortex.
这种记忆大约在二十秒内就会消失。
And that sort of disappears within twenty seconds.
如果你没有其他形式的记忆,你就无法记住超过二十秒或一分钟的事情。
You can't remember things for more than about twenty seconds or a minute, if you don't have any other form of memory.
第二种记忆是长期但仍属短期的,即海马体。
The second type of memory, which is longer term, still short term is the hippocampus.
所以,你知道,你进入这栋楼时,会记得出口在哪里,电梯在哪里。
So you can, you know, you came into this building, you remember where the exit is, where the elevators are.
你脑海中存有一张关于这栋楼的地图,储存在你的海马体中。
You have some map of that building that's stored in your hippocampus.
你可能还记得我几分钟前说过的一些话。
You might remember something about what I said, you know, few minutes ago.
我已经全忘了,因为
I forgot it all already because
它被抹掉了,但你知道,那些信息本该存在你的海马体里。
it's erased, but you know, that would be in your hippocampus.
而长期记忆则存在于突触中,也就是突触连接。
And then the longer term memory is in the synapse, the synapses.
对吧?
Right?
所以,如果你想构建一个具备推理能力的系统,就需要一个类似海马体的结构。
So what you need, if you want a system that's capable of reasoning, is a hippocampus-like thing.
对吧?
Right?
人们已经尝试通过记忆网络、神经图灵机之类的东西来实现这一点,对吧?
And that's what people have tried to do with memory networks and, you know, Neural Turing machines and stuff like that, right?
而现在,Transformer通过其自注意力机制,某种程度上也具备了记忆功能,你可以这样理解。
And now transformers have sort of a memory in their kind of self-attention system; you can think of it this way.
所以,这是你需要的一个要素。
So that's one element you need.
另一个你需要的是某种网络,它能够访问自己的记忆,获取信息,然后对其进行处理,并多次迭代执行这一过程。
Another thing you need is some sort of network that can access its memory, get an information back and then kind of crunch on it and then do this iteratively multiple times.
因为推理链条是一个不断更新你对世界状态认知的过程,比如即将发生的事情等等。
Because a chain of reasoning is a process by which you update your knowledge about the state of the world about, you know, what's going to happen, etcetera.
这本质上必须是一种循环操作。
And that has to be this sort of recurrent operation, basically.
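上面描述的要素可以粗略拼成一段示意代码:一个用注意力权重读取的软性键值联想记忆(类似记忆网络和Transformer自注意力的读取方式),外加一个反复查询并更新自身状态的循环控制器。所有维度和更新规则都是为演示而假设的。
The ingredients described above can be roughly sketched: a soft key-value associative memory read with attention weights (in the spirit of memory networks and transformer self-attention), plus a recurrent controller that queries it repeatedly and updates its own state. All sizes and the update rule are assumptions for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def read(keys, values, query):
    """Attention read: similarity -> soft weights -> weighted sum of values."""
    weights = softmax(keys @ query)
    return values.T @ weights

rng = np.random.default_rng(0)
keys = rng.normal(size=(10, 4))          # 10 stored items, key dimension 4
values = rng.normal(size=(10, 4))        # the content associated with each key

state = rng.normal(size=4)               # controller state doubles as the query
for _ in range(3):                       # iterate: retrieve, crunch, query again
    retrieved = read(keys, values, state)
    state = np.tanh(state + retrieved)   # toy recurrent update of the state
```

Each pass refines the state with what was just retrieved; a chain of reasoning, in this picture, is this loop run with a learned update instead of a toy one.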
如果你考虑一下Transformer,那么它似乎太小了,无法包含像维基百科那样庞大的知识。
If we think about a transformer, that seems to be too small to represent the knowledge that's contained in Wikipedia, for example.
但Transformer并没有这种递归的概念。
But a transformer doesn't have this idea of recurrence.
它只有固定数量的层,也就是固定的步骤数,这基本上限制了它的能力,但
It's got a fixed number of layers, and that's a number of steps that, you know, basically limits it. But
递归会以某种方式基于已有知识进行构建。
recurrence would build on the knowledge somehow.
它会演化知识,可能扩大其中的信息量或有用信息。
I mean, it would evolve the knowledge and expand the amount of information, perhaps, or useful information within that knowledge.
是的。
Yeah.
但这是不是仅凭规模增大就能自然涌现出来的?
But is this something that just can emerge with size?
因为我们现在拥有的一切似乎都只是……
Because it seems like everything we have now is just...
不。
No.
这还不清楚。
It's not clear.
我的意思是,如何以高效的方式访问和写入关联记忆。
I mean, how do you access and write into an associative memory in an efficient way?
我的意思是,最初的记忆网络可能具备了正确的架构,但如果你试图将记忆网络扩展到包含整个维基百科的程度,它并不能很好地运作。
I mean, sort of the original memory network maybe had something like the right architecture, but if you try to scale up a memory network so that the memory contains all Wikipedia, it doesn't quite
行不通。
work.
对。
Right.
因此,那里需要新的想法。
So there's a need for new ideas there.
好的。
Okay.
但这并不是唯一的推理形式。
But it's not the only form of reasoning.
还有另一种推理形式,它在某些类型的AI中也非常经典。
So there's another form of reasoning, which is very classical also in some types of AI.
它基于我们称之为能量最小化的东西。
And it's based on, let's call it energy minimization.
好的。
Okay.
你有一些目标,某种能量函数,用于表示质量或负质量。
So you have some sort of objective, some energy function that represents the quality or the negative quality.
好的。
Okay.
当情况变糟时,能量上升;当情况变好时,能量降低。
Energy goes up when things get bad, and it goes down when things get good.
所以,假设你想弄清楚,你需要做出哪些手势才能抓取物体或走出门。
So let's say you want to figure out, you know, what gestures do I need to do to grab an object or walk out the door.
如果你对自己的身体和环境有良好的模型,通过这种能量最小化方法,就可以进行规划。
If you have a good model of your own body and a good model of the environment, then using this kind of energy minimization, you can do planning.
在最优控制中,这被称为模型预测控制。
And it's in optimal control, it's called model predictive control.
你拥有一个关于你的行为将如何影响世界的模型。
You have a model of what's going to happen in the world as a consequence of your actions.
这让你能够通过能量最小化,找出一系列能优化特定目标函数的动作,该目标函数衡量并最小化你撞到物体的次数、做手势所消耗的能量等等。
And that allows you, by energy minimization, to figure out a sequence of actions that optimizes a particular objective function, which, you know, minimizes the number of times you're going to hit something, and the energy you're going to spend doing the gesture, etcetera.
所以,这是一种推理形式。
So that's a form of reasoning.
规划是一种推理形式。
Planning is a form of reasoning.
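上面说的“通过能量最小化做规划”(模型预测控制)可以这样示意:给定一个(这里刻意取最简单的线性)动力学模型,对动作序列做梯度下降,最小化“离目标的距离加上动作耗费的能量”。模型和代价项都是为演示而假设的,并非任何真实系统。
The planning-by-energy-minimization (model predictive control) idea above can be sketched like this: given a (deliberately trivial, linear) dynamics model, run gradient descent on the action sequence to minimize an energy of "distance to goal plus action effort". The model and the costs are assumptions for illustration, not any real system.

```python
import numpy as np

goal = np.array([3.0, -1.0])
T = 5                                     # planning horizon

def rollout(x0, actions):
    """Predict the final state under the model x_{t+1} = x_t + a_t."""
    x = x0.copy()
    for a in actions:
        x = x + a
    return x

def energy(x0, actions):
    """Energy: squared distance to the goal plus an action-effort penalty."""
    x_final = rollout(x0, actions)
    return float(np.sum((x_final - goal) ** 2) + 0.1 * np.sum(actions ** 2))

x0 = np.zeros(2)
actions = np.zeros((T, 2))
eps = 1e-4
for _ in range(200):                      # gradient descent (finite differences)
    grad = np.zeros_like(actions)
    base = energy(x0, actions)
    for idx in np.ndindex(*actions.shape):
        bumped = actions.copy()
        bumped[idx] += eps
        grad[idx] = (energy(x0, bumped) - base) / eps
    actions -= 0.1 * grad

assert energy(x0, actions) < energy(x0, np.zeros((T, 2)))
```

With a differentiable model you would backpropagate through the rollout instead; the finite-difference loop just keeps the sketch dependency-free.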
而人类具备推理能力,或许是因为我们之前的物种必须进行某种规划,才能狩猎、生存,特别是度过冬天。
And perhaps what led to the ability of humans to reason is the fact that species that appeared before us had to do some sort of planning to be able to hunt and survive, survive the winter in particular.
所以,你需要具备同样的能力。
And so, you know, it's the same capacity that you need to have.
那么在你的直觉中,如果我们把专家系统看作是将知识编码为逻辑系统和图结构,这种方式是否并不适合作为理解知识的框架?
So your intuition is, if we look at expert systems, that encoding knowledge as logic systems and as graphs in this kind of way is not a useful way to think about knowledge?
图结构或逻辑表示有点脆弱。
Graphs are a little brittle, or logic representation.
基本上,变量具有值,它们之间的约束由规则表示,这种做法太过僵化和脆弱。
So basically, you know, variables that have values, and then constraints between them that are represented by rules, is a little too rigid and too brittle.
对吧?
Right?
因此,早期在这方面的一些尝试是为这些规则引入概率。
So some of the early efforts in that respect were to put probabilities on them.
比如一条规则:如果你有这些症状和那些症状,那么你患有这种疾病的概率是多少,应该开这种抗生素的概率又是多少,对吧?
So a rule, you know, if you have this and that symptom, you know, you have this disease with that probability and you should prescribe that antibiotic with that probability, right?
这是上世纪七十年代的MYCIN系统。
This was the MYCIN system from the seventies.
而这正是这一AI分支所导致的结果,也就是贝叶斯网络、图模型、因果推断和变分方法。
And that's what that branch of AI led to, you know, Bayesian networks and graphical models and causal inference and variational, you know, method.
所以,这一领域正在进行许多有趣的研究。
So, there is, I mean, a lot of interesting work going on in this area.
这方面的主要问题是知识获取。
The main issue with this is knowledge acquisition.
你如何将大量数据简化为这种类型的图?
How do you reduce a bunch of data to a graph of this type?
是的。
Yeah.
这依赖于专家,也就是人类来编码和添加知识。
It relies on the expert, on the human being, to encode, to add knowledge.
这本质上是不切实际的。
And that's essentially impractical.
是的。
Yeah.
所以
So
这是一个大问题。
that's a big question.
第二个问题是,你是否希望将知识表示为符号,并用逻辑来操作它们?
The second question is, do you want to represent knowledge as symbols and do you want to manipulate them with logic?
而这一点又与学习相矛盾。
And again, that's incompatible with learning.
因此,杰夫·辛顿多年来一直倡导的一个建议是:用向量取代符号。
So one suggestion, which Geoff Hinton has been advocating for many decades, is to replace symbols by vectors.
将其视为大量神经元或单元中的活动模式,并用连续函数取代逻辑。
Think of it as patterns of activity in a bunch of neurons or units, or whatever you want to call them, and replace logic by continuous functions.
好的。
Okay.
这样一来,就变得兼容了。
And that becomes now compatible.
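上面提到的“用向量取代符号、用连续函数取代逻辑”的想法,可以用一种标准的“乘积式”模糊逻辑做一个示意:真值变成[0, 1]之间的数,逻辑联结词变成连续(因而可微)的函数。这只是众多可行选择中的一种,仅作说明。
The "replace symbols by vectors, logic by continuous functions" idea mentioned above can be illustrated with one standard "product" fuzzy logic: truth values become numbers in [0, 1], and the connectives become continuous (hence differentiable) functions. This is just one of many possible choices, purely for illustration.

```python
# One standard "product" fuzzy logic, as an illustration of replacing
# discrete logic by continuous functions: truth values live in [0, 1],
# and the connectives below are smooth, so gradients can flow through them.

def soft_and(a, b):
    return a * b                  # agrees with AND at the 0/1 corners

def soft_or(a, b):
    return a + b - a * b          # probabilistic sum; agrees with OR at corners

def soft_not(a):
    return 1.0 - a

# At the corners these reproduce Boolean logic...
assert soft_and(1.0, 0.0) == 0.0
assert soft_or(1.0, 0.0) == 1.0
assert soft_not(0.0) == 1.0
# ...and in between they vary smoothly instead of jumping.
assert 0.0 < soft_and(0.9, 0.8) < 1.0
```

Because every connective is differentiable, a rule built from them can sit inside a network and be trained by gradient descent, which is the compatibility being described.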
大约十年前,来自Facebook的莱昂·布图撰写了一篇论文,提出了一套非常出色的思想。
There's a very good set of ideas, written in a paper about ten years ago by Léon Bottou, who is here at Facebook.
这篇论文的标题是《从机器学习到机器推理》。
The title of the paper is From Machine Learning to Machine Reasoning.
他的观点是,学习系统应当能够操作空间中的对象,并将结果放回同一空间。
And his idea is that a learning system should be able to manipulate objects that are in a space and then put the result back in the same space.
这基本上就是工作记忆的概念。
So it's this idea of working memory basically.
这一点非常启发人。
And it's very enlightening.
从某种意义上说,这可能类似于学习简单的专家系统。
And in a sense that might learn something like the simple expert systems.
我的意思是,它能够学习基本的逻辑运算。
I mean, it can learn basic logic operations there.
是的。
Yeah.
很有可能。
Quite possibly.
对。
Yeah.
对。
Yeah.
关于这类事情的出现需要多少先验结构,目前存在很大争议。
There's a big debate on sort of how much prior structure you have to put in for this kind of stuff to emerge.
这正是我和加里·马库斯等人争论的问题。
That's the debate I have with Gary Marcus and people like that.
是的。
Yeah.
是的。
Yeah.
另外,我刚和朱迪亚·珀尔聊过。嗯。
So, and the other person, so I just talked to Judea Pearl. Mhmm.
你提到的因果推断领域。
From the causal inference world you mentioned.
所以他的担忧是,当前的神经网络无法学习事物之间的因果关系。
So his worry is that the current neural networks are not able to learn what causes what, the causal inference between things.
我认为他对这一点既有对的地方,也有错的地方。
So I think he's right and wrong about this.
如果他指的是那种经典的神经网络,人们过去确实不太关注这个问题。
If he's talking about the sort of classic type of neural nets, people sort of didn't worry too much about this.
但现在有很多人正在研究因果推断。
But there's a lot of people now working on causal inference.
上周由莱昂·布图等人,包括大卫·洛佩兹和其他一些人发表了一篇论文,正是探讨如何让神经网络关注真实的因果关系,这或许也能解决数据偏见等问题。
There's a paper that just came out last week by Léon Bottou among others, David Lopez and a bunch of other people, exactly on that problem: how do you kind of get a neural net to sort of pay attention to real causal relationships, which may also solve issues of bias in data and things like this.
我想读一读那篇论文,因为最终这个挑战似乎还是得依赖人类专家来判断事物之间的因果关系。
I'd like to read that paper, because ultimately the challenge there also seems to fall back on the human expert to decide causality between things.
首先,人们并不擅长建立因果关系。
People are not very good at establishing causality, first of all.
首先,你去和物理学家聊聊,他们其实并不相信因果关系,因为看看微观物理学的所有基本定律,它们都是时间可逆的。
So first of all, you talk to physicists, and physicists actually don't believe in causality, because all the basic laws of microphysics are time-reversible.
所以根本不存在因果关系。
So there's no causality.
时间之箭并不是真实的。
The arrow of time is not real.
一旦你开始观察存在不可预测随机性的宏观系统,就会明显出现一个时间之箭;但这种时间之箭究竟是如何涌现的,其实是物理学中的一大谜团。
It's as soon as you start looking at macroscopic systems, where there is unpredictable randomness, that there is clearly an arrow of time. But it's a big mystery in physics, actually, how that emerges.
它是涌现的,还是现实基本结构的一部分?
Is it emergent or is it part of the fundamental fabric of reality?
或者
Or
它只是智能系统的一种偏见,因为热力学第二定律,我们感知到了特定的时间方向,但实际上这可能是任意的。
is it a bias of intelligent systems that, you know, because of the second law of thermodynamics, we perceive a particular arrow of time, but in fact it's kind of arbitrary.
对吧?
Right?
对。
Right.
所以,是的,物理学家和数学家并不关心,我的意思是,数学本身并不在意时间的流动。
So, yeah, physicists, mathematicians, they don't care about, I mean, the math doesn't care about the flow of time.
当然,宏观物理学当然在意。
Well, certainly, certainly, macro physics does.
人们自己并不擅长建立因果关系。
People themselves are not very good at establishing causal relationships.
如果你问这个问题,我想这在西摩尔·帕珀特的一本关于儿童学习的书中提到过。
If you ask this, I think it was in one of Seymour Papert's books on children learning.
你知道,他与让·皮亚杰一起研究过,你知道,就是那位与马文·明斯基合著了《感知机》一书的人,那本书某种程度上终结了神经网络的第一波浪潮。
You know, he studied with Jean Piaget; you know, he's the guy who co-authored the book Perceptrons with Marvin Minsky that kind of killed the first wave of neural nets.
但他实际上是一位研究学习的人。
But he was actually a learning person.
他研究人类和机器的学习,这正是他对感知机产生兴趣的原因。
He was a learning person, in the sense of studying learning in humans and machines; that's why he got interested in the perceptron.
他写道,如果你问一个小孩子风是怎么来的,很多孩子会思考一会儿,然后说:哦,是树枝和树木在动,它们的运动产生了风。
And he wrote that if you ask a little kid what is the cause of the wind, a lot of kids will think for a while and they'll say, oh, it's the branches and the trees: they move, and that creates wind.
所以他们把因果关系弄反了。
So they get the causal relationship backwards.
这是因为他们对世界的理解以及直觉物理并不好。
And it's because their understanding of the world and intuitive physics is not that great.
我的意思是,这些孩子才四五岁。
I mean, these are like four or five year old kids.
你知道,随着年龄增长,认知会变好,然后你就会明白事情并非如此。
You know, it gets better, and then you understand that it can't be.
对吧?
Right?
但有很多事情,我们依靠的是对事物的常识性理解,也就是人们所说的常识。
But there are many things for which we rely on our common-sense understanding of things, what people call common sense.
是的。
Yeah.
我们对物理的理解中,有很多事情我们能够推断出因果关系。
And with our understanding of physics, there's a lot of stuff for which we can figure out causality.
即使是疾病,我们也能常常弄清楚什么不是原因。
Even with diseases, we can figure out what's not causing what often.
当然还有很多谜团,但关键是,你应该能够将这些编码进系统,因为这些系统不太可能自己推断出这些因果关系。
There's a lot of mystery, of course, but the idea is that you should be able to encode that into systems, because it seems unlikely they'd be able to figure that out themselves.
当我们能够进行干预时。
Well, whenever we can do intervention.
但整个人类可能自诞生以来就一直被一种极其错误的因果关系所蒙蔽:凡是无法解释的现象,我们都归因于某种神灵。
But all of humanity has been completely deluded for millennia, probably since its existence, about a very, very wrong causal relationship, where whatever you can't explain, you attribute to, you know, some deity, some divinity.
对吧?
Right?
这是一种逃避。
And that's a cop out.
这是一种说法,意思是:我不知道原因,所以是上帝干的。
That's a way of saying, like, I don't know the cause, so, you know, God did it.
对吧?
Right?
你提到了马文·明斯基,以及他可能引发第一次人工智能寒冬的讽刺之处。
So you mentioned Marvin Minsky and the irony of, you know, maybe causing the first AI winter.
你在九十年代就在那里。
You were there in the nineties.
当然,你在八十年代也在那里。
You were there in the eighties, of course.
在九十年代,你认为人们为什么对深度学习失去信心,又在十多年后重新发现了它?
In the nineties, why do you think people lost faith in deep learning, and found it again over a decade later?
是的。
Yeah.
那时候还不叫深度学习。
It wasn't called deep learning yet.
那时候就叫神经网络。
It was just called Neural Networks.
是的。
Yeah.
他们失去了兴趣。
They lost interest.
我的意思是,我认为至少在机器学习领域,这种情况大约发生在1995年。
I mean, I think I would put that around 1995, at least the machine learning community.
一直有一群人研究神经网络,但它逐渐与主流机器学习脱节了,如果你愿意这么说的话。
There was always a neural net community, but it became kind of disconnected from sort of mainstream machine learning, if you want.
坚持下来的基本上是电气工程领域,而计算机科学领域
It was basically electrical engineering that kept at it, and computer science
干脆放弃了。
Just gave up.
放弃了神经网络。
Gave up on neural nets.
我不知道。
I don't know.
我当时太接近这件事了,很难以一种客观的眼光去分析它,但我会做一些猜测。
I was too close to it to really sort of analyze it with an unbiased eye, if you want, but I would make a few guesses.
第一个原因是,当时神经网络非常难以实现,比如你要在自己最喜欢的编程语言里实现反向传播。
So the first one is, at the time, it was very hard to make neural nets work, in the sense that you would, you know, implement backprop in your favorite language.
而你最喜欢的编程语言并不是Python。
And that favorite language was not Python.
也不是MATLAB。
It was not MATLAB.
也不是这些工具中的任何一个,因为那时候它们根本还不存在,对吧?
It was not any of those things because they didn't exist, right?
你得用Fortran、C语言,或者类似的东西来写代码,对吧?
You had to write it in Fortran or C or something like this, right?
所以你会去尝试它。
So you would experiment with it.
你可能会犯一些非常基础的错误,比如权重初始化得很糟糕,或者因为你在教科书里读到参数不能太多,就把网络设得太小。
You would probably make some very basic mistakes, like, you know, badly initialize your weights, make the network too small because you read in the textbook, you know, you don't want too many parameters.
对吧?
Right?
当然,你知道,你会用异或(XOR)来训练,因为你没有其他数据集可以用来训练。
And of course, you know, and you would train on XOR because you didn't have any other dataset to train on.
当然,你知道,它只有一半的时间能工作。
And of course, you know, it works half the time.
所以我们就会说,我放弃了。
So we would say, I give up.
我们还会用批量梯度来训练,而你知道,这其实并不够用。
Also we would train it with batch gradient, which, you know, isn't really sufficient.
所以有一大堆技巧,你必须了解才能让这些东西工作,否则就得自己重新发明。
So there's a lot of, there's a whole bag of tricks that you had to know to make those things work, or you had to reinvent.
很多人根本不知道这些,所以根本没法让它们工作。
And a lot of people just didn't and they just couldn't make it work.
所以这是一方面。
So that's one thing.
为了能够展示结果、找出问题原因、建立对如何让模型生效的良好直觉,并具备足够的灵活性来创建卷积网络等架构,需要在软件平台上投入大量资源。
The investment in software platform to be able to kind of display things, figure out why things don't work, kind of get a good intuition for how to get them to work, have enough flexibility so you can create network architectures like convolutional nets and stuff like that.
这很难。
It was hard.
我的意思是,你必须从零开始编写一切。
I mean, you had to write everything from scratch.
而且,你当时没有Python、MATLAB或任何类似工具。
And again, you didn't have any Python or MATLAB or anything.
对吧?
Right?
所以,我读到过——抱歉打断一下——你用Lisp编写了LeNet的首个版本,也就是那个卷积神经网络;顺便说一句,Lisp可是我最喜爱的语言之一。
So what I read, sorry to interrupt, but I read that you wrote in Lisp your first versions of LeNet, the Convolutional Neural Networks, which, by the way, is one of my favorite languages.
是的。
It's yeah.
这就是我知道你确实靠谱的原因。
That's how I knew you were legit.
什么图灵奖之类的荣誉都不算什么。
Turing Award, whatever.
你用Lisp写的这些程序才是真本事。
This what you programmed in Lisp.
这简直是
That's
它至今还是我最喜欢的编程语言。
It's still my favorite language.
不过我们并不是用Lisp语言编程,而是得自己编写Lisp解释器。
But it's not that we programmed in Lisp, it's that we had to write our Lisp interpreter.
因为那时候根本不存在什么现成的
Because it's not like
我们能用的
we used
成熟解释器。
one that existed.
所以我们编写了一个Lisp解释器,并将其连接到我们自己开发的后端库,用于神经网络计算。
So we wrote a Lisp interpreter that we hooked up to a backend library that we wrote also for sort of neural net computation.
几年后,大约在1991年,我们提出了一个想法:让各个模块具备前向传播和反向传播梯度的能力,然后将这些模块以图的形式连接起来。
And then after a few years around 1991, we invented this idea of basically having modules that know how to forward propagate and back propagate gradients, and then interconnecting those modules in a graph.
莱昂·博图(Léon Bottou)在八十年代末就提出过类似的想法,而我们能够用我们的Lisp系统实现它。
Léon Bottou had made proposals about this in the late eighties, and we were able to implement this using our Lisp system.
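The idea described here, modules that each know how to forward-propagate and back-propagate, interconnected in a graph, is the direct ancestor of today's autograd frameworks. A minimal illustrative sketch (invented module names, not the original Lisp system):

```python
class Scale:
    # module computing y = a * x; stores its input for the backward pass
    def __init__(self, a):
        self.a = a

    def forward(self, x):
        self.x = x
        return self.a * x

    def backward(self, dy):
        # chain rule: dL/dx = dL/dy * dy/dx = dy * a
        return dy * self.a


class Square:
    # module computing y = x^2
    def forward(self, x):
        self.x = x
        return x * x

    def backward(self, dy):
        return dy * 2.0 * self.x


def run_forward(graph, x):
    for m in graph:
        x = m.forward(x)
    return x

def run_backward(graph, dy=1.0):
    for m in reversed(graph):
        dy = m.backward(dy)
    return dy

# interconnect modules in a (linear) graph: y = (3x)^2
graph = [Scale(3.0), Square()]
y = run_forward(graph, 2.0)   # (3 * 2)^2 = 36
g = run_backward(graph)       # d/dx (3x)^2 = 18x = 36 at x = 2
```

Each module only knows its local derivative; the graph traversal composes them, which is exactly what made arbitrary architectures like convolutional nets easy to assemble.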
最终,我们希望用这个系统在贝尔实验室开发用于字符识别的生产级代码。
Eventually we wanted to use that system to make, build production code for character recognition at Bell Labs.
我们实际上为这个Lisp解释器编写了一个编译器——帕特里斯·西马德(现在在微软)与莱昂和我一起完成了其中大部分工作。
We actually wrote a compiler for that Lisp interpreter; Patrice Simard, who's now at Microsoft, kind of did the bulk of it with Leon and me.
这样我们就可以用Lisp编写系统,然后编译成C代码,从而得到一个独立完整的系统,能够完成整个任务。
So we could write our system in Lisp and then compile to C, and then we'd have a self contained complete system that could do the entire thing.
直到今天,PyTorch和TensorFlow都还做不到这一点。
Neither PyTorch nor TensorFlow can do this today yet.
好的。
Okay.
快了。
It's coming.
我的意思是,PyTorch 里有个叫 TorchScript 的东西,差不多是那样的。
I mean, there's something like that in PyTorch called TorchScript.
我们得写所有的解释器,得写整个编译器,投入了巨大的精力才做到这一点。
And we had to write this interpreter, we had to write this compiler, we had to invest a huge amount of effort to do this.
如果不是完全相信这个理念,没人会愿意花时间去做这种事。
Not everybody, if you don't completely believe in the concept, you're not going to invest the time to do this.
当时也好,现在也好,这都会演变成 Torch、PyTorch、TensorFlow 之类的工具。
Now at the time also, or today this would turn into Torch or PyTorch or TensorFlow or whatever.
我们把它开源了。
We put it in open source.
大家都用,然后就会发现它确实很好。
Everybody would use it and realize it's good.
在1995年之前,我在AT&T工作时,律师根本不可能允许我们发布这种性质的开源代码。
Back before 1995, working at AT&T, there's no way the lawyers would let you release anything in open source of this nature.
所以我们实际上无法分发我们的代码。
And so we could not distribute our code really.
关于这一点,抱歉扯得太远了,但我还读到过,贝尔实验室曾对卷积神经网络申请过类似专利的东西。
And on that point, and sorry to go on a million tangents, but on that point, I also read that there was, like, a patent on Convolutional Neural Networks at Bell Labs.
是的,首先,我的意思是,
Yes, first of all, I mean,
实际上有两个。
There's two, actually.
那些专利过期了,幸好如此,
They ran out, thankfully,
在2007年。
in 2007.
在2007年。
In 2007.
我们能稍微聊一下这个吗?
What, can we just talk about that for a second?
我知道你在Facebook,但你也在纽约大学,那么对这类软件想法申请专利到底意味着什么?
I know you're at Facebook, but you're also at NYU, and what does it mean to patent ideas like these software ideas, essentially?
或者什么是数学想法?
Or what are mathematical ideas?
或者它们到底是什么?
Or what are they?
好的。
Okay.
所以这些并不是数学想法。
So they're not mathematical ideas.
所以确实存在一些算法。
So there are, you know, algorithms.
曾经有一段时间,美国专利局允许对软件进行专利保护,只要它被具体实现。
And there was a period where the US Patent Office would allow the patent of software as long as it was embodied.
欧洲人则非常不同。
The Europeans are very different.
他们并不完全接受这一点。
They don't quite accept that.
他们有不同的理念,但你知道,我其实从未真正强烈相信过这种观点,我现在也不相信这种类型的专利。
They have a different concept, but you know, I don't, I no longer, I mean, I never actually strongly believed in this, but I don't believe in this kind of patent.
Facebook 基本上不相信这种类型的专利。
Facebook basically doesn't believe in this kind of patent.
Google 申请专利是因为他们曾被苹果公司伤害过。
Google files patents because they've been burned with Apple.
所以现在他们这么做主要是出于防御目的,但通常他们会说,只要你不侵犯我们的专利,我们就不会起诉你。
And so now they do this for defensive purpose, but usually they say, we're not going to sue you if you infringe.
Facebook 也有类似的政策。
Facebook has a similar policy.
他们说,我们会为某些技术申请专利,主要是为了防御目的。
They say, you know, we file patents on certain things for defensive purpose.
只要你不起诉我们,我们也不会因为你侵犯专利而起诉你。
We're not going to sue you if you infringe unless you sue us.
所以这个行业并不相信专利。
So the industry does not believe in patents.
它们存在是因为法律环境和其他各种因素。
They're there because of, you know, the legal landscape and various things.
但我真的不认为这类东西应该有专利。
But I don't really believe in patents for this kind of stuff.
好的。
Okay.
这真是件好事。
So that's a great thing.
所以我来告诉你一个更糟糕的故事。
So I'll tell you a worse story, actually.
当时第一个关于卷积网络的专利,是关于卷积网络的早期版本,它没有独立的池化层。
So what happened was, the first patent about convolutional nets was about kind of the early version of convolutional net that didn't have separate pooling layers.
它只有步幅大于一的卷积层,如果你愿意这么说的话。
It had convolutional layers with stride more than one, if you want.
对吧?
Right?
然后第二个是带有独立池化层的卷积网络,使用反向传播进行训练。
And then there was a second one on convolutional nets with separate pooling layers, train with back prop.
这些专利分别在1989年和1990年左右提交。
And they were filed in '89 and '90, something like this.
当时,专利的有效期是十七年。
At the time, the life of a patent was seventeen years.
所以在接下来的几年里,我们开始基于卷积网络开发字符识别技术。
So here's what happened over the next few years is that we started developing character recognition technology around convolutional nets.
1994年,支票识别系统被部署在自动取款机中。
And in 1994, a check reading system was deployed in ATM machines.
1995年,它被用于银行后台的大规模支票识别设备等。
In 1995, it was for large check reading machines in back offices, etcetera.
这些系统是由我们与AT&T合作的工程团队开发的。
And those systems were developed by an engineering group that we were collaborating with at AT&T.
这些系统由NCR公司商业化,当时NCR是AT&T的子公司。
And they were commercialized by NCR, which at the time was a subsidiary of AT&T.
AT&T在1996年初分拆了。
Now AT&T split up in early 1996.
律师们审查了所有专利,并将专利分配给各个公司。
And the lawyers just looked at all the patents and they distributed the patents among the various companies.
他们把卷积网络的专利给了NCR,因为NCR确实在销售使用这项技术的产品,但NCR内部没人知道卷积网络是什么。
They gave the convolutional net patent to NCR because they were actually selling products that used it, but nobody at NCR had any idea what a convolutional net was.
是的。
Yeah.
好的。
Okay.
所以从1996年到2007年,有一整个时期直到2002年,我实际上没有从事机器学习或卷积网络的工作。
So between 1996 and 2007, so there's a whole period until 2002 where I didn't actually work on machine learning or convolutional net.
我大约在2002年重新开始研究这个领域。
I resumed working on this around 2002.
在2002年到2007年之间,我一直在研究它们,心里祈祷NCR的人别发现,结果他们真的没发现。
And between 2002 and 2007, I was working on them, crossing my fingers that nobody at NCR would notice, and nobody noticed.
是的。
Yeah.
我希望,正如你所说,撇开律师不谈,如今这个社区相对开放的氛围能够持续下去?
And I and I hope that this kind of somewhat, as you said, lawyers aside, relative openness of the community now will continue?
这加速了整个行业的发展。
It accelerates the entire progress of the industry.
你知道,Facebook、Google和其他公司今天面临的问题,并不是Facebook、Google、微软或IBM谁领先谁。
And, you know, the problems that Facebook and Google and others are facing today is not whether Facebook or Google or Microsoft or IBM or whoever is ahead of the other.
而是我们还没有技术来实现我们想构建的东西。
It's that we don't have the technology to build these things we want to build.
我们想打造具有常识的智能虚拟助手。
We want to build intelligent virtual assistants that have common sense.
我们对这方面的优秀想法并没有垄断。
We don't have monopoly on good ideas for this.
我们不认为我们有。
We don't believe we do.
也许别人认为他们有,但我们没有。
Maybe others believe they do, but we don't.
好的。
Okay.
如果一家初创公司告诉你,他们掌握了实现人类水平智能和常识的秘诀,别信他们。
If a startup tells you they have the secret to, you know, human level intelligence and common sense, don't believe them.
他们没有。
They don't.
要达到那个阶段,让各个公司都能基于此开始构建东西,还需要全球研究界共同努力一段时间。
And it's going to take the entire work of the world research community for a while to get to the point where you can go off and in each of those companies kind of start to build things on this.
我们还没到那一步。
We're not there yet.
所以确实如此。
So absolutely.
这正体现了你经常提到的理念与实际验证之间的差距。
And this calls to the gap between the space of ideas and the rigorous testing of those ideas in practical application that you often speak to.
你曾建议过:不要被那些声称已解决通用人工智能、声称拥有像人脑一样工作的AI系统,或声称已破解大脑工作原理的人所迷惑。
You've written advice saying, don't get fooled by people who claim to have a solution to artificial general intelligence, who claim to have an AI system that work just like the human brain, or who claim to have figured out how the brain works.
问问他们在MNIST或ImageNet上的错误率是多少。
Ask them what the error rate they get on MNIST or ImageNet.
是的。
Yeah.
不过,这有点过时了。
This is a little dated, by the way.
我的意思是,五年前。
I mean, five years.
对。
Yes.
谁还在乎呢?
Who's counting?
好的。
Okay.
但我认为你的观点,即MNIST和ImageNet可能已经过时了,仍然是对的。
But I think your opinion is still MNIST and ImageNet, yes, may be dated.
可能会有新的基准测试。
There may be new benchmarks.
对吧?
Right?
但我认为这种理念是,你仍然坚持认为,基准测试和实际应用才是真正检验这些想法的地方。
But I think that philosophy is one you still somewhat hold: that benchmarks and the practical testing, the practical application, is where you really get to test the ideas.
但也许这并不完全实用。
Well, it may not be completely practical.
比如,它可能是一个玩具数据集,但必须是整个社区普遍接受的某种标准基准任务,如果你想要的话。
Like, for example, you know, it could be a toy dataset, but it has to be some sort of task that the community as a whole has accepted as some sort of standard benchmark, if you want.
它不需要是真实的。
It doesn't need to be real.
比如很多年前,在FAIR这里,杰森·韦斯顿和其他几个人提出了bAbI任务,这是一类玩具问题,用于测试机器的推理能力,实际上是访问工作记忆之类的能力。
So for example, many years ago here at FAIR, Jason Weston and a few others proposed the bAbI tasks, which were kind of a toy problem to test the ability of machines to reason, actually to access working memory and things like this.
尽管这不是一个真实任务,但它非常有用。
And it was very useful, even though it wasn't a real task.
MNIST 算是介于真实任务和非真实任务之间的一种。
MNIST is kind of halfway real task.
玩具问题可能非常有用。
Toy problems can be very useful.
我只是特别注意到,很多人,尤其是那些有资金投资的人,会被别人忽悠,说‘我们掌握了皮层的算法,你应该给我们五千万’。
It's just that I was really struck by the fact that a lot of people, particularly a lot of people with money to invest, would be fooled by people telling them, oh, we have, you know, the algorithm of the cortex, and you should give us $50 million.
是的。
Yes.
当然。
Absolutely.
确实有很多人试图利用这种炒作来谋取商业利益。
So there's a lot of people who try to take advantage of the hype for business reasons and so on.
但让我谈谈这个观点:推动领域前进的新想法,可能还没有相应的基准测试。
But let me sort of talk to this idea that the new ideas, ideas that push the field forward, may not yet have a benchmark.
或者建立基准测试可能非常困难。
Or it may be very difficult to establish a benchmark.
我同意。
I agree.
这是过程的一部分。
That's part of the process.
建立基准测试是这个过程的一部分。
Establishing benchmarks is part of the process.
那么,你对这些基准测试有什么看法?我们现在有围绕图像的基准测试,从分类到描述,到从图像表面提取各种信息。
So, what are your thoughts about, so we have these benchmarks around stuff we can do with images, from classification to captioning, to every kind of information you can pull out of images at the surface level.
还有音频数据集。
There's audio datasets.
有一些视频数据。
There's some video.
我们可以从自然语言开始:你认为哪些类型的基准测试,正在逐渐逼近更像智能、推理的东西,也许你不喜欢这个词,但带有AGI影子的那种
Can we start with natural language: what kind of benchmarks do you see that start creeping onto something more like intelligence, like reasoning, like, maybe you don't like the term, but echoes of that kind of AGI
是的
Yeah.
许多人在研究交互式环境,在这些环境中可以训练和测试智能系统。
So a lot of people are working on interactive environments in which you can train and test intelligent systems.
例如,经典的监督学习范式是:你有一个数据集,将其划分为训练集、验证集和测试集,并且有明确的协议。
So there, for example, you know, it's the classical paradigm of supervised learning is that you have a dataset, you partition it into a training set, validation set, test set, and there's a clear protocol.
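The classical protocol just described can be sketched in a few lines (an illustrative fragment; the split fractions and seed are arbitrary choices):

```python
import random

def split_dataset(data, train_frac=0.7, val_frac=0.15, seed=0):
    # classical supervised protocol: shuffle (leaning on the i.i.d.
    # assumption that sample order doesn't matter), then partition
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    n_train = int(len(data) * train_frac)
    n_val = int(len(data) * val_frac)
    train = [data[i] for i in idx[:n_train]]
    val = [data[i] for i in idx[n_train:n_train + n_val]]
    test = [data[i] for i in idx[n_train + n_val:]]
    return train, val, test

train, val, test = split_dataset(list(range(100)))
```

The shuffle is only legitimate because exchanging samples is assumed not to matter; as the discussion goes on to note, that assumption breaks as soon as the system's actions determine what it sees next.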
但这假设了样本是统计独立的,可以互换,你看到它们的顺序不应影响结果。可如果你给出的回答决定了你接下来看到的样本呢?比如在机器人领域就是这种情况,对吧?
But that assumes the samples are statistically independent: you can exchange them, the order in which you see them shouldn't matter, things like that. But what if the answer you give determines the next sample you see, which is the case, for example, in robotics, right?
你的机器人做了一件事,然后进入一个新房间,而它去的地方不同,房间也会不同。
Your robot does something and then it gets exposed to a new room, and depending on where it goes, the room will be different.
这就带来了探索问题。
So that creates the exploration problem.
这也导致了样本之间的依赖性。
So that creates also a dependency between samples.
如果你只能在空间中移动,那么你接下来看到的样本很可能还是在同一个建筑里。
If you can only move in space, the next sample you're going to see is going to be probably in the same building, most likely.
因此,只要机器的行为会对世界产生影响,所有关于训练集和测试集假设的有效性就会被打破。
So all the assumptions about the validity of this training set test set hypothesis break whenever a machine can take an action that has an influence in world.
而这正是它将看到的内容。
And it's what it's going to see.
所以人们正在构建这样的虚拟环境,对吧?
So people are setting up artificial environments where that takes place, right?
机器人会在房屋的三维模型中四处移动,并与物体进行交互,诸如此类。
The robot runs around a three d model of a house and can interact with objects and things like this.
所以你进行的是基于模拟的机器人研究。
So you do robotics based simulation.
你有那些OpenAI Gym之类的环境,或者MuJoCo这样的模拟机器人,还有各种游戏之类的。
You have those, you know, OpenAI Gym type things, or MuJoCo kind of simulated robots, and you have games, you know, things like that.
所以这个领域真正的发展方向就是这种类型的环境。
So that's where the field is going really, this kind of environment.
现在回到AGI的问题上。
Now back to the question of AGI.
我不喜欢AGI这个术语,因为它暗示人类智力是通用的,而人类智力根本不是通用的。
Like I don't like the term AGI because it implies that human intelligence is general and human intelligence is nothing like general.
它非常、非常专门化。
It's very, very specialized.
我们以为它是通用的。
We think it's general.
我们喜欢认为自己拥有通用智能。
We like to think of ourselves as having general intelligence.
我们并没有。
We don't.
我们非常专门化。
We're very specialized.
我们只是比……稍微更通用一点
We're only slightly more general than
为什么它会感觉是通用的?
Why does it feel general?
所以你是在质疑‘通用’这个术语。
So you kind of take issue with the term general.
我认为人类令人印象深刻的是学习能力,就像我们之前讨论的那样,能够在如此多不同的领域中学习。
I think what's impressive about humans is the ability to learn, as we were talking about learning, to learn in just so many different domains.
它或许不是无限通用的,但你确实能在多个领域学习并以某种方式整合这些知识。
It's perhaps not arbitrarily general, but just you can learn in many domains and integrate that knowledge somehow.
好的。
Okay.
知识是持续存在的。
The knowledge persists.
让我举一个具体的例子。
So let me take a very specific example.
对。
Yes.
这算不上是一个实例。
It's not an example.
它更像是一种准数学层面的论证。
It's more like a quasi mathematical demonstration.
人的一只眼睛里会延伸出大约一百万根神经纤维。
So you have about 1,000,000 fibers coming out of one of your eyes.
两只眼睛加起来总共有两百万根,不过我们先只聊其中一只眼睛的情况。
2,000,000 total, but let's talk about just one of them.
你的视神经包含整整一百万条神经纤维。
It's 1,000,000 nerve fibers, your optical nerve.
我们假设这些信号都是二进制的,也就是说它们要么处于激活状态,要么处于未激活状态。
Let's imagine that they are binary, so they can be active or inactive.
那么进入你视觉皮层的输入数据量就是一百万比特。
So the input to your visual cortex is 1,000,000 bits.
现在这些输入会以特定的方式接入你的大脑,而大脑里的连接有点像卷积神经网络——它们在空间上是局部化的,还有诸如此类的特性。
Now they connect it to your brain in a particular way, and your brain has connections that are kind of a little bit like a convolutional net, they kind of local in space and things like this.
现在我来跟你玩个把戏。
Now imagine I play a trick on you.
我承认,这是个相当恶劣的把戏。
It's a pretty nasty trick I admit.
我切断你的视神经,然后安装一个设备,对所有神经纤维进行随机排列重组。
I cut your optical nerve and I put a device that makes a random permutation of all the nerve fibers.
现在传入你大脑的信息,是所有像素的一个固定但随机的排列。
So now what comes to your brain is a fixed but random permutation of all the pixels.
即使我在你婴儿时期就对你这么做,你的视觉皮层也绝不可能把视觉做到你现在这样的质量水平。
There's no way in hell that your visual cortex, even if I do this to you in infancy, will actually do vision at the same level of quality that you can.
明白了。
Got it.
你是说,你根本不可能重新学会这个吗?
And you're saying there's no way you've relearned that?
不可能。
No.
因为现在现实中相邻的两个像素会在你的视觉皮层中出现在完全不同的位置。
Because now two pixels that are nearby in the world will end up in very different places in your visual cortex.
而那里的神经元彼此之间没有连接,因为它们只与邻近的神经元相连。
And your neurons there have no connections with each other because they're only connected locally.
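The thought experiment can be made quantitative with a toy measurement (an illustrative sketch; the synthetic "image" and similarity measure are made up for the demonstration): after a fixed random permutation, pixels that land next to each other are no longer similar, so locally connected wiring has no structure left to exploit.

```python
import random

def neighbor_diff(img):
    # mean absolute difference between horizontally adjacent pixels:
    # low means nearby pixels carry related information
    total, count = 0.0, 0
    for row in img:
        for a, b in zip(row, row[1:]):
            total += abs(a - b)
            count += 1
    return total / count

# a smooth synthetic "image": nearby pixels have similar values
n = 16
img = [[(i + j) / (2.0 * n) for j in range(n)] for i in range(n)]

# fixed but random permutation of all the pixels
flat = [p for row in img for p in row]
random.Random(0).shuffle(flat)
permuted = [flat[i * n:(i + 1) * n] for i in range(n)]
```

On the smooth image, adjacent pixels differ by only 1/(2n); after the permutation, adjacent values are essentially unrelated, and that local correlation is exactly what a convolutional architecture depends on.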
所以我们的整个硬件在很多方面都是为了支持
So our entire hardware is built in many ways to support
现实世界的局部性。
The locality of the real world.
是的。
Yeah.
对。
Yes.
这就是特化。
That's specialization.
是的。
Yeah.
但这仍然相当令人印象深刻。
But it's still pretty damn impressive.
所以这并不是完美的泛化。
So it's not perfect generalization.
甚至差得远。
It's it's not even close.
不。
No.
不。
No.
并不是说它差得远。
It's not that it's not even close.
根本不是。
It's not at all.
是的,不是。
Yes, not.
这是专门化的。
It's specialized.
是的。
Yeah.
那么有多少个布尔函数?
So how many Boolean functions?
所以,让我们想象一下,你希望训练你的视觉系统来识别这100万个比特的特定模式。
So let's imagine you want to train your visual system to recognize particular patterns of those 1,000,000 bits.
好的。
Okay.
所以这是一个布尔函数,对吧?
So that's a Boolean function, right?
要么这个模式存在,要么不存在。
Either the pattern is here or not here.
这是一个具有100万个二进制输入的二分类问题。
There's a two way classification with 1,000,000 binary inputs.
有多少这样的布尔函数?
How many such Boolean functions are there?
好的。
Okay.
你有2的100万次方种输入组合。
You have two to the 1,000,000 combinations of inputs.
对于每一种组合,你都有一个输出位。
For each of those you have an output bit.
因此,这类布尔函数有2的100万次方个,这是一个难以想象的巨大数字。
And so you have two to the 1,000,000 Boolean functions of this type, which is an unimaginably large number.
你的视觉皮层实际能计算出其中多少个函数?
How many of those functions can actually be computed by your visual cortex?
答案是极小极小极小极小极小的一小部分,简直微乎其微。
The answer is a tiny, tiny, tiny, tiny, tiny sliver, like an enormously tiny sliver.
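The counting argument can be checked directly for small inputs (an illustrative sketch; the million-bit case only changes the exponent):

```python
from itertools import product

def num_boolean_functions(n):
    # n binary inputs -> 2**n input patterns; each pattern maps to 0 or 1,
    # so there are 2**(2**n) distinct Boolean functions
    return 2 ** (2 ** n)

# enumerate them explicitly for n = 2: each function is one truth table
n = 2
patterns = list(product([0, 1], repeat=n))            # 4 input patterns
tables = list(product([0, 1], repeat=len(patterns)))  # 16 truth tables
```

Already at n = 20 the count has over 300,000 digits; at n = 1,000,000 the fraction of these functions that any fixed, locally wired circuit can represent is the "enormously tiny sliver" being described.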
是的。
Yeah.
所以我们极其专业化。
So we are ridiculously specialized.
你知道的。
You know?
好吧。
Okay.
但好吧。
But okay.
这是对‘通用’这个词的一个反驳。
That's an argument against the word general.
我同意你的直觉,但我不太确定,大脑似乎具有令人印象深刻的适应能力。
I agree with your intuition, but I'm not sure; it seems the brain is impressively capable of adjusting to things.
所以
So
这是因为我们无法想象超出我们能力范围的任务,对吧?
It's because we can't imagine tasks that are outside of our abilities. Right?
是的。
So yeah.
所以我们认为
So we think
我们认为自己是通用的,因为我们能理解所有我们能感知到的事物。
we think we are general because we're general of all the things that we can apprehend.
哦,是的。
Oh, so yeah.
但外面还有一个我们完全不了解的广阔世界。
But there is a huge world out there of things that we have no idea.
顺便说一下,我们称那个为热量。
We call that heat, by the way.
热量。
Heat.
热量。
Heat.
所以至少物理学家称其为热,或者称之为熵,你知道的,就是你有一个充满气体的容器。
So at least physicists call that heat or they call it entropy, which is you know, you have a thing full of gas.
对吧?
Right?
密闭的气体系统。
Closed system for gas.
对吧?
Right?
无论是密闭还是非密闭的。
Closed or not closed.
它具有,你知道的,压强。
It has, you know, pressure.
它具有温度。
It has temperature.
它还具有,你知道的,你可以写出像 PV 等于 nRT 这样的方程,诸如此类的东西。
It has, you know, and you can write equations like PV = nRT, you know, things like that.
对吧?
Right?
当你减小体积时,温度会上升,压力也会上升,诸如此类。
When you reduce the volume, the temperature goes up, the pressure goes up, you know, things like that.
对吧?
Right?
至少对于理想气体来说是这样。
For perfect gas, at least.
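The relation just mentioned, PV = nRT, takes only a line or two to state (illustrative numbers):

```python
R = 8.314  # gas constant, J/(mol*K)

def pressure(n_mol, temp_k, volume_m3):
    # ideal gas law rearranged: P = n R T / V
    return n_mol * R * temp_k / volume_m3

p1 = pressure(1.0, 293.0, 0.024)  # ~1 mol near room temperature, ~24 L
p2 = pressure(1.0, 293.0, 0.012)  # halve the volume at fixed temperature
```

Halving the volume at fixed temperature doubles the pressure, which is the behavior being described (for a perfect gas, at least); and the whole macroscopic description is a handful of numbers standing in for the state of every molecule.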
这些是你能够了解的关于这个系统的特性。
Those are the things you can know about that system.
与整个系统状态的完整信息相比,这仅仅是极少极少的比特数,因为整个系统状态会给出气体中每个分子的位置和动量。
And it's a tiny, tiny number of bits compared to the complete information of the state of the entire system, because the state of the entire system will give you the position of momentum of every molecule of the gas.
而你对它不了解的部分就是熵,你将其解释为热。
And what you don't know about it is the entropy and you interpret it as heat.
那个系统中包含的能量就是我们所说的热。
The energy contained in that thing is what we call heat.
事实上,这些分子的运动方式可能存在非常强的结构。
Now it's very possible that, in fact, there is some very strong structure in how those molecules are moving.
只是这种结构的方式是我们根本无法感知的。
It's just that they are in a way that we are just not wired to perceive.
是的,我们对它一无所知。
Yeah, we're ignorant to it.
在我们无法感知的无限多事物中,还有更多。
And there's, in your infinite amount of things we're not wired to perceive.
你说得对,这样表达真不错。
And you're right, that's a nice way to put it.
我们能想象的一切只是所有可能性中极其微小的一部分。
We're general to all the things we can imagine, which is a very tiny subset of all things that are possible.
所以这有点像柯尔莫哥洛夫复杂度,或者说柯尔莫哥洛夫-柴廷-所罗门诺夫复杂度。
So it's like Kolmogorov complexity, or the Kolmogorov-Chaitin-Solomonoff complexity.
对。
Yeah.
你知道,每一个比特串或每一个整数都是随机的,除了那些你实际上能写出来的。
You know, every bit string or every integer is random, except for all the ones that you can actually write down.
是的。
Yeah.
好的。
Okay.
说得真好。
So beautifully put.
但你知道,我们完全可以称之为人工智能。
But, you know, so we could just call it artificial intelligence.
我们不需要拥有一个通用的
We don't need to have a general
或者人类水平的。
Or human level.
人类水平的智能是
Human level intelligence is a
你知道,只要你一涉及到人类,事情就会变得有趣,因为归根结底,是我们自己把情感投射到了人类身上,而很难准确定义什么是人类智能。
You know, anytime you touch human, it gets interesting, because it's just that we attach ourselves to human, and it's difficult to define what human intelligence is.
是的。
Yep.
不过,我的定义可能是极其惊人的智能。
Nevertheless, my definition is maybe damn impressive intelligence.
明白吗?
Okay?
无论怎样,这都是一次令人印象深刻的智能展示。
Damn impressive demonstration of intelligence, whatever.
关于这个话题,深度学习的大多数成功都来自于监督学习。
And so, on that topic, most successes in deep learning have been in supervised learning.
什么?
What
你对无监督学习有什么看法?
is your view on unsupervised learning?
有没有可能减少人为干预,同时仍能开发出具有实际应用价值的成功系统?
Is there a hope to reduce the involvement of human input and still have successful systems that have practical use?
是的。
Yeah.
我的意思是,这确实是有希望的。
I mean, there's definitely a hope.
这甚至不止是希望。
It's more than a hope actually.
目前已经有越来越多的证据支持这一点。
It's mounting evidence for it.
而这基本上就是我所有的工作。
And that's basically all I do.
目前我唯一感兴趣的是我称之为自监督学习的东西,而不是无监督学习。
Like, the only thing I'm interested in at the moment is I call it self supervised learning, not unsupervised.
因为‘无监督学习’这个术语带有太多歧义。
Because unsupervised learning is a loaded term.
懂机器学习的人会告诉你,你在做聚类或主成分分析,但其实并不是这样。
People who know something about machine learning, you know, tell you, so you're doing clustering or PCA, which is not the case.
而普通公众一听‘无监督学习’,天啊,机器能自己学习,完全不需要人为干预。
And the wider public, you know, you say unsupervised learning, oh my God, you know, machines are going to learn by themselves, without supervision.
你知道,他们看到这个
You know, they see this
那——父母在哪?
as- Where's the parents?
是的。
Yeah.
所以我称之为自监督学习,因为实际上所使用的底层算法与监督学习的算法是一样的,只不过我们训练它们的目标不是预测特定的变量,比如图像的类别,也不是预测由人类标注者提供的变量。
So, so I call it self supervised learning because in fact, the underlying algorithms that are used are the same algorithms as the supervised learning algorithms, except that what we train them to do is not predict a particular set of variables like the category of an image and not to predict a set of variables that have been provided by human labelers.
而是训练机器去重建其输入中被遮蔽的部分。
But what you train the machine to do is basically reconstruct a piece of its input that is being masked out essentially.
你可以这样理解,对吧?
You can think of it this way, right?
所以给机器展示一段视频,然后让它预测接下来会发生什么。
So show a piece of video to machine and ask it to predict what's going to happen next.
当然,过一段时间后你可以展示实际发生的情况,机器就会逐渐学会更好地完成这个任务。
And of course, after a while you can show what happens and the machine will kind of train itself to do better at that task.
如今所有在自然语言处理领域最先进、最成功的模型都使用自监督学习,比如BERT这类系统,对吧?
You can do like all the latest, successful models in natural language processing use self supervised learning, you know, sort of BERT style systems, for example, right?
你向它展示一个包含一千个词的文本窗口。
You show it a window of a thousand words on a text corpus.
你去掉其中15%的词语,然后训练机器去预测那些缺失的词。
You take out 15% of the words, and then you train a machine to predict the words that are missing.
这就是自监督学习。
That's self supervised learning.
它并不是在预测未来。
It's not predicting the future.
它只是,你知道的,预测中间的内容,但你也可以让它去预测未来。
It's just, you know, predicting things in the middle, but you could have it predict the future.
这就是语言模型所做的。
That's what language models do.
你以一种无监督的方式构建了一个语言模型。
You construct, so in an unsupervised way, you construct a model of language.
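The masking setup just described is easy to sketch (illustrative only; real systems operate on subword tokens and learn a neural predictor, which is omitted here):

```python
import random

def mask_tokens(tokens, frac=0.15, seed=0):
    # BERT-style corruption: hide a fraction of the tokens;
    # the hidden tokens themselves become the prediction targets
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * frac))
    positions = rng.sample(range(len(tokens)), n_mask)
    corrupted = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = corrupted[p]
        corrupted[p] = "[MASK]"
    return corrupted, targets

sentence = ("the cat sat on the mat because it was tired "
            "of standing up all day long").split()
corrupted, targets = mask_tokens(sentence)
```

The labels come for free from the text itself, which is why "self supervised" is a better name than "unsupervised": the algorithmic machinery that fills in the blanks is ordinary supervised learning.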
你觉得
Do you think
或者视频、物理世界,或者
Or video or the physical world or
不管怎样,你觉得这能走多远?
How whatever, far do you think that can take us?
你觉得BERT真的理解什么吗?
Do you think BERT understands anything?
在某种程度上,它对文本有浅层的理解,但要真正达到人类水平的智能,我认为必须将语言与现实联系起来。
To some level, it has, you know, a shallow understanding of text, but it needs to mean, to have kind of true human level intelligence, I think you need to ground language in reality.
所以有些人正在尝试这样做,对吧?
So some people are attempting to do this, right?
拥有能够可视化所讨论内容的系统,这正是为什么你需要那些交互式环境的原因之一。
Having systems that kind of have some visual representation of what is being talked about, which is one reason you need those interactive environments actually.
但这是一项巨大的技术难题,尚未解决,这也解释了为什么自监督学习在自然语言领域有效,但在图像识别和视频领域却效果不佳(尽管进展迅速)。
But it's like a huge technical problem that is not solved and that explains why self supervised learning works in the context of natural language, but does not work in the context, or at least not well, in the context of image recognition and video, although it's making progress quickly.
原因在于,在自然语言的语境中表达预测的不确定性,比在视频和图像等场景中容易得多。
And the reason, that reason is the fact that it's much easier to represent uncertainty in the prediction, in the context of natural language than it is in the context of things like video and images.
例如,如果我让你预测被删掉的15%的单词缺失了什么内容。
So for example, if I ask you to predict what words are missing, you know, 15% of the words that are taken out.
可能性是有限的。
The possibilities are small.
这意味着
That means
范围很小。
It's small.
对吧?
Right?
词典中有十万词汇,而机器输出的是一个巨大的概率向量。
There are a 100,000 words in the lexicon, and what the machine spits out is a big probability vector.
对吧?
Right?
这是一堆介于零和一之间的数字,它们的总和为一。
It's a bunch of numbers between zero and one that sum to one.
我们知道如何用计算机来实现这一点。
And we know how to do this with computers.
所以
So
在预测中表示不确定性相对容易。
representing uncertainty in the prediction is relatively easy.
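Concretely, the big probability vector over the lexicon is typically produced with a softmax (a toy sketch with a four-word vocabulary):

```python
import math

def softmax(scores):
    # map raw scores to a probability vector: every entry in (0, 1),
    # and the entries sum to one
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

vocab = ["cat", "sat", "mat", "dog"]
probs = softmax([2.0, 1.0, 0.5, -1.0])  # one score per word in the lexicon
```

With a real lexicon the vector simply has ~100,000 entries instead of four; representing the machine's uncertainty is nothing more than this normalized vector.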
在我看来,这就是这些技术在自然语言处理中有效的原因。
And that's in my opinion, why those techniques work for NLP.
对于图像,如果你遮住图像的一部分并要求系统重建该部分,会有许多可能的答案。
For images, if you ask, if you block a piece of an image and you ask the system reconstruct that piece of the image, there are many possible answers.
它们都是完全合理的,对吧?
They are all perfectly legit, right?
那么,你如何表示这一组可能的答案呢?
And how do you represent that, this set of possible answers?
你无法训练一个系统只做出一个预测。
You can't train a system to make one prediction.
你无法训练一个神经网络说:这就是图像。
You can't train a neural net to say, here it is, that's the image.
因为有一整套与之兼容的可能性。
Because there's a whole set of things that are compatible with it.
那么,你如何让机器表示的不是一个单一输出,而是一整组输出呢?
So how do you get the machine to represent not a single output, but a whole set of outputs?
同样地,在视频预测中,未来可能发生的事情有很多。
Similarly with video prediction, there's a lot of things that can happen in the future of video.
你现在看着我,我并没有大幅度移动头部,但我可能会向左或向右转头。
You're looking at me right now, I'm not moving my head very much, but I might turn my head to the left or to the right.
对。
Right.
如果你没有一个能够预测这一点的系统,而是用最小二乘法来训练它,试图最小化预测结果与我实际行为之间的误差,那么你得到的将是我所有可能未来位置的模糊图像,这并不是一个好的预测。
If you don't have a system that can predict this, and you train it with least square to kind of minimize the error with a prediction and what I'm doing, what you get is a blurry image of myself in all possible future positions that I might be in, which is not a good prediction.
但也许对于视觉场景,还有其他方法可以实现自监督,对吧?
But, so there might be other ways to do the self supervision, right, for visual scenes.
比如哪些?
Like what?
如果我知道的话,我不会告诉你,我会先把它发表出来。
If I knew, I wouldn't tell you, I'd publish it first.
我不知道。
I don't know.
不,也许有其他方法。
No, there might be.
所以,我的意思是,这些方法可能有一些人为的途径,比如在游戏中进行自我对弈,通过模拟部分环境来实现。
So, I mean, these are kind of, there might be artificial ways of like self play in games, the way you can simulate part of the environment.
可以
Can
哦,这并不能解决问题。
Oh, that doesn't solve the problem.
这仅仅是一种生成数据的方法。
It's just a way of generating data.
但因为你有更多的控制权,也许你可以控制一下。
But because you have more of a control like, maybe you can control yeah.
这是一种生成数据的方式。
It's a way to generate data.
没错。
That's right.
而且因为你能生成海量数据,你说得对。
And because you can do huge amounts of data generation. That doesn't... you're right.
这个嘛,它是从数据的角度逐步逼近问题的,但你不觉得这是正确的逼近方式
Well, it creeps up on the problem from the side of data, and you don't think that's the right way to creep up on it?
这并不能解决处理世界不确定性的问题。
It doesn't solve this problem of handling uncertainty in the world.
对吧?
Right?
所以,如果你让机器在一种确定性或准确定性的游戏中学习世界的预测模型,那就很简单。
So if you have a machine learn a predictive model of the world in a game that is deterministic or quasi deterministic, it's easy.
给卷积网络几帧游戏画面,加上多层网络,然后它就能生成接下来的几帧。
Give a few frames of the game to a ConvNet, put a bunch of layers, and then have it generate the next few frames.
如果游戏是确定性的,那就没问题。
If the game is deterministic, it works fine.
这包括向系统输入你角色即将采取的动作。
And that includes, you know, feeding the system with the action that your little character is going to take.
问题在于,现实世界和大多数游戏都不是完全可预测的。
The problem comes from the fact that the real world and most games are not entirely predictable.
因此,你会得到模糊的预测,而无法基于模糊的预测进行规划。
And so there you get those blurry predictions and you can't do planning with blurry predictions.
所以,如果你对世界有一个完美的模型,你就可以在脑海中用一组动作假设来运行这个模型,从而预测这一系列动作的结果。
So if you have a perfect model of the world, you can, in your head, run this model with a hypothesis for a sequence of actions, and you're going to predict the outcome of that sequence of actions.
但如果你的模型不完美,你该如何规划呢?
But if your model is imperfect, how can you plan?
是的。
Yeah.
这会迅速变得复杂不堪。
It quickly explodes.
关于这个方向的延伸,我非常感兴趣,它与你之前提到的机器人技术有关,那就是主动学习。
What are your thoughts on the extension of this, which topic I'm super excited about, it's connected to something you were talking about in terms of robotics, is active learning.
与完全无监督或自监督学习不同,你会向系统寻求人类的帮助,以选择接下来需要标注的部分。
So, as opposed to sort of completely unsupervised or self supervised learning, you ask the system for human help, for selecting parts you want annotated next.
如果你想象一个机器人在探索空间,或者一个婴儿在探索空间,又或者一个系统在探索数据集,每隔一段时间就寻求一次人类的输入。
If you think about a robot exploring a space, or a baby exploring a space, or a system exploring a dataset, every once in a while asking for human input.
你认为这种做法有价值吗
Do see value in that kind
的工作?
of work?
我看不出有变革性的价值。
I don't see transformative value.
它只会让那些我们已经能做的事情更高效,或者让我们学得稍微快一点,但不会让机器变得显著更智能。
It's going to make things that we can already do more efficient or that we'll learn slightly more efficiently, but it's not going to make machines sort of significantly more intelligent.
而且顺便说一下,自监督学习、强化学习、监督学习、模仿学习和主动学习之间并没有对立,也没有冲突。
I think, and by the way, there is no opposition, there's no conflict between self supervised learning, reinforcement learning and supervised learning or imitation learning or active learning.
我认为自监督学习是上述所有方法的前期阶段。
I see self supervised learning as a preliminary to all of the above.
所以我经常举的例子是:如果你使用经典的强化学习,或者说是当今最好的所谓无模型强化学习方法来学习玩雅达利游戏,需要大约八十小时的训练才能达到人类大约十五分钟就能达到的水平。
So the example I use very often is: if you use classical reinforcement learning, deep reinforcement learning if you want, the best methods today, so called model free reinforcement learning, to learn to play Atari games, it takes about eighty hours of training to reach the level that any human can reach in about fifteen minutes.
它们
They
能超过人类,但需要很长时间。
get better than humans, but it takes them a long time.
AlphaStar,也就是DeepMind团队开发的玩《星际争霸》的系统,只针对单一地图、单一类型的玩家,通过大约相当于两百年的自我对弈训练,就能达到超越人类的水平。
AlphaStar, you know, the DeepMind team's system to play StarCraft, plays a single map, a single type of player, and can reach better than human level with about the equivalent of two hundred years of training playing against itself.
是两百年,对吧?
It's two hundred years, right?
这是任何人类都永远无法做到的事情。
That's something that no human could ever do.
我的意思是,我不确定我们该从中得出什么结论。
I mean, I'm not sure what lesson we can take away from that.
现在,把这些目前最先进的强化学习算法应用到自动驾驶汽车的训练上。
Now take those algorithms, the best RL algorithms we have today to train a car to drive itself.
它可能需要驾驶数百万小时。
It would probably have to drive millions of hours.
它必须撞死数千名行人。
It will have to kill thousands of pedestrians.
它必须撞上数千棵树。
It will have to run into thousands of trees.
它还必须从悬崖上冲下去。
It will have to run off cliffs.
是的。
Yeah.
而且在它意识到这是个糟糕的主意之前,它已经多次从悬崖上冲下去了。
And it would have to run off a cliff multiple times before it figures out that it's a bad idea, first of all.
其次,在它学会如何避免这样做之前,也经历了多次这样的事故。
And second of all, before it figures out how not to do it.
因此,这种学习方式显然与动物和人类的学习方式不同。
And so, I mean, this type of learning obviously does not reflect the kind of learning that animals and humans do.
这其中缺失了一些非常重要、非常关键的东西。
There is something missing that's really, really important there.
我的假设——我已经倡导了五年了——是我们拥有对世界的预测模型,这些模型包含在不确定性下进行预测的能力。
And my hypothesis, which I've been advocating for like five years now, is that we have predictive models of the world that include the ability to predict under uncertainty.
而这正是让我们在学习驾驶时不会冲下悬崖的原因。
And what allows us to not run off a cliff when we learn to drive.
我们大多数人只需大约二十到三十小时的训练就能学会开车,而且从未发生过碰撞或事故。
Most of us can learn to drive in about twenty or thirty hours of training without ever crashing, causing any accident.
如果我们开车靠近悬崖,我们知道如果向右打方向盘,车就会冲下悬崖,结果肯定不好。
And if we drive next to a cliff, we know that if we turn the wheel to the right, the car is going to run off the cliff and nothing good is going to come out of this.
因为我们对直觉物理有相当好的理解,知道车会掉下去。
Because we have a pretty good model of intuitive physics that tells us, you know, the car is going to fall.
明白重力的作用。
We know about gravity.
婴儿在八九个月大时就学会了物体不会漂浮,而是会下落。
Babies learned this around the age of eight or nine months that objects don't float, they fall.
而且,我们对转动方向盘对车辆的影响有相当清楚的认识,也知道必须留在道路上。
And, you know, we have a pretty good idea of the effect of turning the wheel on the car and, you know, we know we need to stay on the road.
因此,我们带入了许多东西,本质上就是我们对世界的预测模型。
So there's a lot of things that we bring to the table, which is basically our predictive model of the world.
这个模型让我们不会做愚蠢的事,并且基本上能保持在我们需要做的事情的范围内。
And that model allows us to not do stupid things and to basically stay within the context of things we need to do.
我们仍然会遇到不可预测的情况,这正是我们学习的方式,但这也让我们能够学得非常非常快。
We still face unpredictable situations and that's how we learn, but that allows us to learn really, really quickly.
这被称为基于模型的强化学习。
That's called model based reinforcement learning.
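The point LeCun makes here, that a predictive model lets you avoid a catastrophe without ever experiencing it, can be shown with a toy example. The 1-D "cliff world", the hand-coded forward model, and all numbers below are illustrative stand-ins, not anyone's actual system:

```python
# Sketch of why a predictive world model prevents "running off the cliff":
# the agent vetoes actions whose *predicted* outcome is catastrophic,
# without ever experiencing the crash itself.

def forward_model(position, action):
    """Stand-in for a learned dynamics model: predict the next position."""
    return position + action

def is_catastrophic(position, cliff_edge=10.0):
    """A predicted position past the cliff edge means falling off."""
    return position > cliff_edge

def choose_action(position, candidate_actions):
    """Pick the largest step whose predicted outcome is still safe."""
    safe = [a for a in candidate_actions
            if not is_catastrophic(forward_model(position, a))]
    return max(safe) if safe else min(candidate_actions)

# One unit from the cliff, the model rules out the big steps in advance.
print(choose_action(9.0, [0.5, 1.5, 3.0]))  # 0.5
```

A model-free learner would have to actually fall off the cliff many times to learn the same constraint; here the predicted consequence is enough.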
其中包含一些模仿和监督学习,因为我们偶尔会有一个驾驶教练告诉我们该怎么做。
There's some imitation and supervised learning because we have a driving instructor that tells us occasionally what to do.
但大部分学习是学习模型,学习我们从婴儿时期就开始掌握的物理知识。
But most of the learning is learning the model, learning physics that we've done since we were babies.
几乎所有学习都发生在这一部分。
That's where almost all the learning happens.
而物理知识在不同场景之间是可迁移的。
And the physics is transferable from scene to scene.
愚蠢的行为在任何地方都是一样的。
Stupid things are the same everywhere.
是的。
Yeah.
我的意思是,如果你对世界有经验,你并不需要是特别聪明的物种,才能明白如果把水从容器里洒出来,剩下的地方就会湿。
I mean, if you, you know, you have experience of the world, you don't need to be particularly from a particularly intelligent species to know that if you spill water from a container, you know, the rest is going to get wet.
你可能会弄湿自己。
You might get wet.
你知道,猫也懂这个,对吧?
You know, cats know this, right?
是的。
Yeah.
所以我们需要解决的主要问题是,如何学习世界的模型?
So the main problem we need to solve is how do we learn models of the world?
这正是我感兴趣的。
And that's what I'm interested in.
这正是自监督学习的核心所在。
That's what self supervised learning is all about.
如果你要尝试构建一个基准,比如看看MNIST。
If you were to try to construct a benchmark for, let's look at MNIST.
我非常喜欢这个数据集。
I love that dataset.
但你觉得只用每个数字的一个样本在MNIST上表现良好,是否可行、有趣或有意义?
But do you think it's useful, interesting, slash possible to perform well on MNIST with just one example of each digit?
我们该如何解决这个问题?
And how would we solve that problem?
答案可能是肯定的。
The answer is probably yes.
问题是,你被允许使用哪种其他类型的学习方法?
The question is what other type of learning are you allowed to do?
如果你被允许在一个包含大量标注数字的大数据集上进行训练,这就叫做迁移学习。
So if what you're allowed to do is train on some gigantic dataset of labeled digits, that's called transfer learning.
我们知道这种方法是有效的。
And we know that works.
好的。
Okay.
我们在Facebook的生产环境中就是这样做的。
We do this at Facebook, like, in production.
对吧?
Right?
我们训练大型卷积网络来预测人们在Instagram上输入的标签,训练数据是数十亿张图像,真的是数十亿张。
We train large convolutional nets to predict hashtags that people type on Instagram, and we train on billions of images, literally billions.
然后我们去掉最后一层,针对我们想要的任何任务进行微调。
And then we chop off the last layer and fine tune on whatever task we want.
这种方法效果非常好。
That works really well.
用这种方法甚至可以超越ImageNet的记录。
You can beat the ImageNet record with this.
我们实际上几周前就把整个方案开源了。
We actually open sourced the whole thing, like, a few weeks ago.
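The recipe described here, keep a pretrained backbone, chop off the last layer, and train a fresh head on the target task, can be sketched in a few lines. Everything below is a toy stand-in: the "backbone" is a frozen random projection, not a real pretrained convnet, and all shapes and the target task are made up for illustration:

```python
import numpy as np

# Sketch of "chop off the last layer and fine-tune": keep a frozen pretrained
# backbone and train only a new linear head on the small target dataset.
rng = np.random.default_rng(0)
W_backbone = rng.standard_normal((64, 16))  # "pretrained" weights, kept frozen

def features(x):
    """Frozen features: the layers kept after chopping off the old head."""
    return np.maximum(x @ W_backbone, 0.0)  # ReLU

W_head = np.zeros((16, 2))  # new task-specific last layer, trained from scratch

def fine_tune_head(X, y, lr=0.05, steps=500):
    """Softmax-regression training of only the new head (backbone untouched)."""
    global W_head
    F = features(X)
    Y = np.eye(2)[y]
    for _ in range(steps):
        logits = F @ W_head
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W_head -= lr * F.T @ (p - Y) / len(X)

# Toy target task whose labels are a function of the frozen features,
# so a linear head on top of them is enough to solve it.
X = rng.standard_normal((100, 64))
y = (features(X) @ rng.standard_normal(16) > 0).astype(int)
fine_tune_head(X, y)
accuracy = ((features(X) @ W_head).argmax(axis=1) == y).mean()
print(accuracy)
```

The design point is that only `W_head` is updated; the expensive pretraining is amortized across every downstream task.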
是的。
Yeah.
那确实还是很酷。
That's that's still pretty cool.
但没错。
But yeah.
那么,什么会让人印象深刻?
So what would be impressive?
什么既实用又令人印象深刻?
And what's useful and impressive?
什么样的迁移学习会既实用又令人印象深刻?
What kind of transfer learning would be useful and impressive?
是维基百科吗?
Is it Wikipedia?
类似这样的东西?
That kind of thing?
不。
No.
所以我认为我们不应该把重点放在迁移学习上。
So I don't think transfer learning is really where we should focus.
我们应该尝试设定一种基准场景,其中有大量未标记的数据。
We should try to, you know, have a kind of scenario for a benchmark where you have a very large amount of unlabeled data.
它可以是视频片段。
It could be video clips.
它可以是帧预测任务。
It could be where you do frame prediction.
它可以是图像,你可以选择遮蔽其中一部分。
It could be images where you could choose to mask a piece of it.
可以是任何东西,但它们都是未标记的,而且你不能给它们打标签。
Could be whatever, but they're unlabeled and you're not allowed to label them.
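The masking pretext task mentioned above amounts to generating (input, target) pairs from unlabeled images alone: hide a block of pixels and use the original pixels as the prediction target. A minimal sketch, with an assumed 32x32 image and 8x8 mask:

```python
import numpy as np

# Sketch of a masking pretext task: zero out a block of the image and keep the
# original pixels as the target. The image itself is the supervision;
# no labels are involved. Sizes are illustrative.
rng = np.random.default_rng(0)

def make_masked_example(image, mask_size=8):
    """Return (corrupted input, target patch, mask location)."""
    h, w = image.shape
    top = int(rng.integers(0, h - mask_size))
    left = int(rng.integers(0, w - mask_size))
    target = image[top:top + mask_size, left:left + mask_size].copy()
    corrupted = image.copy()
    corrupted[top:top + mask_size, left:left + mask_size] = 0.0
    return corrupted, target, (top, left)

image = rng.random((32, 32))
corrupted, target, (top, left) = make_masked_example(image)
print(target.shape)  # (8, 8): the patch a model would be trained to predict
```

A model trained to fill in `target` from `corrupted` never sees a label, which is exactly what makes the setup self supervised.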
所以你先在这上面进行训练,然后在特定的监督任务上训练,比如ImageNet或MNIST,并测量随着标记训练样本数量的增加,你的测试误差或验证误差如何降低。
So you do some training on this, and then you train on a particular supervised task, ImageNet or MNIST, and you measure how your test error or validation error decreases as you increase the number of labeled training samples.
好的。
Okay.
你希望看到的是,你的错误率下降速度远快于从随机权重开始完全从头训练的情况。
And what you'd like to see is that your error decreases much faster than if you train from scratch, from random weights.
因此,要达到与完全监督系统相同的性能水平,你所需的样本数量会少得多。
So that to reach the same level of performance than a completely supervised, purely supervised system would reach, you would need way fewer samples.
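The evaluation protocol being described, compare error versus number of labeled samples for a pretrained model against one trained from scratch, can be written as a small loop. The train and evaluate functions below are placeholders with made-up numbers, just to show the shape of the benchmark:

```python
# Sketch of the benchmark protocol: measure test error as a function of the
# number of labeled samples, with and without unsupervised pretraining.
# Pretraining is modeled, purely for illustration, as a head start worth
# 1000 "virtual" labeled samples.

def label_efficiency_curve(train_fn, labeled_sizes, evaluate_fn):
    """Map each labeled-set size to the resulting (simulated) test error."""
    return {n: evaluate_fn(train_fn(n)) for n in labeled_sizes}

def train_from_scratch(n):
    return n  # effective sample count starting from random weights

def train_after_pretraining(n):
    return n + 1000  # assumed benefit of self supervised pretraining

def evaluate(effective_n):
    return 1.0 / (1.0 + effective_n) ** 0.5  # error shrinks with more data

sizes = [10, 100, 1000]
scratch = label_efficiency_curve(train_from_scratch, sizes, evaluate)
pretrained = label_efficiency_curve(train_after_pretraining, sizes, evaluate)
# The pretrained curve should sit below the from-scratch curve everywhere.
print(all(pretrained[n] < scratch[n] for n in sizes))  # True
```

The quantity of interest is exactly what LeCun names: how many labeled samples the pretrained model needs to match the fully supervised error.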
这是关键问题,因为它将回答那些对医学图像分析感兴趣的人的疑问。
So that's the crucial question, because it will answer the question for, you know, people interested in medical image analysis.
好的。
Okay.
你知道,如果我想在这个任务上达到特定的错误率,我知道我需要一百万个样本。
You know, if I want to get a particular level of error rate for this task, I know I need a million samples.
我能否通过自监督预训练,将这个数量减少到大约一百个左右?
Can I do, you know, self supervised pre training to reduce this to about a 100 or something?
你认为这里的答案是自监督预训练吗?
And you think the answer there is self supervised pre training?
是的
Yeah.
某种形式。
Some form.
某种形式的。
Some form of it.
我提到了主动学习,但你不同意。
I was telling you about active learning, but you disagree.
不,这并不是无用的。
No, it's not useless.
它不会带来质的飞跃。
It's just not gonna lead to a quantum leap.
它只是让我们已经做的事情变得更好。
It's just gonna make things that we already do better.
所以你比我聪明多了。
So you're way smarter than me.
我只是不同意你的观点。
I just disagree with you.
但我没有证据支持这一点。
But I don't have anything to back that.
这仅仅是直觉。
It's just intuition.
所以我接触过很多大规模数据集,我觉得主动学习中可能有些神奇之处。
So I worked with a lot of large scale data sets, and there's something that might be magic in active learning.
但好吧。
But okay.
至少我公开地说出来了。
At least I said it publicly.
至少我公开地当了个傻瓜。
At least I'm being an idiot publicly.
好的。
Okay.
这并不是傻。
It's not being an idiot.
这其实是,你知道的,利用你现有的数据进行工作。
It's, you know, working with the data you have.
我的意思是,当然,人们正在做这样的事情:比如,我有三千小时的自动驾驶模仿数据,但其中大部分都极其枯燥。
I mean, certainly, people are doing things like, okay, I have three thousand hours of imitation learning data for a self driving car, but most of those are incredibly boring.
我喜欢的是从中挑选出10%最有信息量的样本。
What I'd like is to select the 10% of them that are the most informative.
仅凭这些,我可能就能达到同样的效果。
And with just that, I would probably reach the same performance.
如果你愿意这么说的话,这是一种弱形式的主动学习。
It's a weak form of active learning, if you want.
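The "weak form of active learning" just described, pick the most informative fraction of a large pool, is often approximated by ranking examples by the model's predictive uncertainty. A sketch using entropy as the (assumed) informativeness score, with random stand-in probabilities:

```python
import numpy as np

# Sketch of weak active learning: from a large pool, keep only the fraction
# of examples the current model is most uncertain about (highest predictive
# entropy) and send just those for annotation.

def entropy(p):
    """Shannon entropy of each row of class probabilities."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_most_informative(probs, fraction=0.1):
    """Indices of the top `fraction` highest-entropy samples."""
    k = max(1, int(len(probs) * fraction))
    return np.argsort(entropy(probs))[::-1][:k]

rng = np.random.default_rng(0)
logits = rng.standard_normal((20, 3))  # stand-in model outputs on the pool
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
chosen = select_most_informative(probs)
print(len(chosen))  # 2: only 10% of the pool gets sent for annotation
```

Entropy is only one possible score; margin between the top two classes or disagreement within an ensemble are common alternatives.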
是的。
Yes.
但可能还存在一个更强大的版本。
But there might be a much stronger version.
是的。
Yeah.
没错。
That's right.
那就是问题所在,这还是一个悬而未决的问题:它是否存在。
And that's an open question, whether it exists.
问题是,你能做到多强?
The question is how much stronger can you get?
埃隆·马斯克很有信心。
Elon Musk is confident.
我最近刚和他谈过。
I talked to him recently.
他相信,大规模数据和深度学习能够解决自动驾驶问题。
He's confident that large scale data and deep learning can solve the autonomous driving problem.
你对深度学习在这一领域的潜力极限有什么看法?
What are your thoughts on the limits possibilities of deep learning in this space?
嗯,这显然是解决方案的一部分。
Well, it's obviously part of the solution.
我的意思是,我认为我们永远不可能拥有一个自动驾驶系统——至少在可预见的未来——不使用深度学习的。
I mean, I don't think we'll ever have a self driving system, or at least not in the foreseeable future, that does not use deep learning.
让我这么说吧。
Let me put it this way.
那么,它占多大比例呢?
Now, how much of it?
在工程历史中,尤其是类似AI的系统,通常会经历第一阶段,即所有东西都是手工构建的。
So in the history of sort of engineering, particularly sort of AI like systems, there's generally a first phase where everything is built by hand.
然后进入第二阶段。
Then there is a second phase.
二十、三十年前,自动驾驶也是如此。
And that was the case for autonomous driving, you know, twenty, thirty years ago.
有一个阶段,会用到一些学习技术,但需要大量工程工作来处理边缘情况和设定限制等,因为学习系统并不完美。
There's a phase where there's a little bit of learning is used, but there's a lot of engineering that's involved in kind of, you know, taking care of corner cases and putting limits, etcetera, because the learning system is not perfect.
随着技术进步,我们越来越依赖学习方法。
And then as technology progresses, we end up relying more and more on learning.
这正是字符识别、语音识别、计算机视觉、自然语言处理的发展历程。
That's the history of character recognition, the history of speech recognition, computer vision, natural language processing.
我认为自动驾驶也会经历同样的过程。目前,最接近实现某种程度自主性——即你不需要司机干预——的方法,是通过限制运行环境来实现的。
And I think the same is going to happen with autonomous driving. Currently, the methods that are closest to providing some level of autonomy, some decent level of autonomy where you don't expect a driver to do anything, are the ones where you constrain the world.
所以你只在亚利桑那州凤凰城方圆一百平方公里或平方英里的区域内运行,而且天气良好、道路宽阔,这正是Waymo的做法。
So you only run within, you know, a hundred square kilometers or square miles in Phoenix, where the weather is nice and the roads are wide, which is what Waymo is doing.
你为汽车配备了大量激光雷达和昂贵的高级传感器,这些对普通消费者车辆来说太贵了,但如果你只运营一个车队,那就没问题。
You completely overengineer the car with tons of LIDARs and sophisticated sensors that are too expensive for consumer cars, but they're fine if you just run a fleet.
你把其他所有方面都彻底工程化到极致。
And you engineer the hell out of everything else.