Lex Fridman Podcast - #94 – 伊利亚·苏茨克维尔:深度学习

#94 – 伊利亚·苏茨克维尔:深度学习

#94 – Ilya Sutskever: Deep Learning

本集简介

伊利亚·苏茨克维尔是OpenAI的联合创始人,历史上被引用次数最多的计算机科学家之一,论文引用量超过16.5万次。在我看来,他是深度学习领域最具才华与洞察力的思想者之一。这世上能让我在麦克风内外都更愿意与之探讨深度学习、智能与生命话题的人,伊利亚当属凤毛麟角。

通过以下赞助商注册支持本播客:
– Cash App – 使用代码"LexPodcast"并下载:
– Cash App(App Store):https://apple.co/2sPrUHe
– Cash App(Google Play):https://bit.ly/2MlvP5w

节目链接:
伊利亚的推特:https://twitter.com/ilyasut
伊利亚的个人网站:https://www.cs.toronto.edu/~ilya/

本对话属于《人工智能播客》系列。获取更多信息请访问 https://lexfridman.com/ai ,或在Twitter、LinkedIn、Facebook、Medium或YouTube上关注@lexfridman观看视频版对话。若喜欢本节目,请在Apple Podcasts上打五星,Spotify上订阅,或通过Patreon支持。

以下是本期时间轴。部分播客平台可通过点击时间戳跳转。

时间轴:
00:00 – 开场
02:23 – AlexNet论文与ImageNet转折点
08:33 – 损失函数
13:39 – 循环神经网络
16:19 – 深度学习成功的关键理念
19:57 – 语言与视觉哪个更难解决?
29:35 – 我们严重低估了深度学习
36:04 – 深度双下降现象
41:20 – 反向传播算法
42:42 – 神经网络能否实现推理?
50:35 – 长期记忆
56:37 – 语言模型
1:00:35 – GPT-2
1:07:14 – 主动学习
1:08:52 – AI系统的分阶段发布
1:13:41 – 如何构建通用人工智能?
1:25:00 – 向AGI提出的问题
1:32:07 – 生命的意义

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

以下是与OpenAI联合创始人兼首席科学家Ilya Sutskever的对话,他是历史上被引用次数最多的计算机科学家之一,引用量超过16.5万次,在我看来,他更是深度学习领域中最杰出、最具洞察力的思想家之一。

The following is a conversation with Ilya Sutskever, cofounder and chief scientist of OpenAI, one of the most cited computer scientists in history with over 165,000 citations, and to me, one of the most brilliant and insightful minds ever in the field of deep learning.

Speaker 0

这个世界上,能让我在麦克风前后都更愿意与之探讨深度学习、智能与人生的人屈指可数,Ilya就是其中之一。

There are very few people in this world who I would rather talk to and brainstorm with about deep learning, intelligence, and life in general than Ilya on and off the mic.

Speaker 0

这是我的荣幸与快乐。

This was an honor and a pleasure.

Speaker 0

这次对话是在疫情爆发前录制的。

This conversation was recorded before the outbreak of the pandemic.

Speaker 0

对于所有正承受着这场危机带来的医疗、心理与经济压力的人们,我向你们致以爱意。

For everyone feeling the medical, psychological, and financial burden of this crisis, I'm sending love your way.

Speaker 0

保持坚强。

Stay strong.

Speaker 0

我们同舟共济。

We're in this together.

Speaker 0

我们终将战胜这一切。

We'll beat this thing.

Speaker 0

这里是人工智能播客。

This is the artificial intelligence podcast.

Speaker 0

如果你喜欢,请在YouTube上订阅,在苹果播客上打五星好评,在Patreon上支持,或者直接在Twitter上联系我,账号是Lex Fridman,拼写为f r i d m a n。

If you enjoy it, subscribe on YouTube, review it with five stars on Apple Podcasts, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled f r i d m a n.

Speaker 0

和往常一样,我现在会插播几分钟广告,但绝不会在对话中间插入任何可能打断交流的广告。

As usual, I'll do a few minutes of ads now and never any ads in the middle that can break the flow of the conversation.

Speaker 0

希望这样的安排对你合适,不会影响收听体验。

I hope that works for you and doesn't hurt the listening experience.

Speaker 0

本节目由App Store排名第一的金融应用Cash App赞助播出。

This show is presented by Cash App, the number one finance app in the App Store.

Speaker 0

下载后请使用代码Lex podcast。

When you get it, use code Lex podcast.

Speaker 0

Cash App支持向朋友转账、购买比特币,还能以低至1美元投资股市。

Cash App lets you send money to friends, buy Bitcoin, invest in the stock market with as little as $1.

Speaker 0

既然Cash App支持购买比特币,请允许我提一下加密货币在货币史上的地位确实令人着迷。

Since Cash App allows you to buy Bitcoin, let me mention that cryptocurrency in the context of the history of money is fascinating.

Speaker 0

我推荐《货币崛起》(The Ascent of Money)作为了解这段历史的绝佳读物。

I recommend The Ascent of Money as a great book on this history.

Speaker 0

无论是书籍还是有声书都很棒。

Both the book and audiobook are great.

Speaker 0

借贷记账法大约始于三万年前。

Debits and credits on ledgers started around thirty thousand years ago.

Speaker 0

美元诞生于两百多年前,而比特币作为首个去中心化加密货币,问世仅十余年。

The US dollar was created over two hundred years ago, and Bitcoin, the first decentralized cryptocurrency, was released just over ten years ago.

Speaker 0

鉴于此历史背景,加密货币仍处于发展初期,但它仍可能重新定义货币的本质。

So given that history, cryptocurrency is still very much in its early days of development, but it's still aiming to, just might, redefine the nature of money.

Speaker 0

重申一下,若您通过应用商店或Google Play下载Cash App并使用代码Lex podcast,即可获得10美元,同时Cash App也会向First组织捐赠10美元,该机构致力于推动全球青少年机器人技术和STEM教育发展。

So, again, if you get Cash App from the App Store or Google Play and use the code Lex podcast, you get $10, and Cash App will also donate $10 to First, an organization that is helping advance robotics and STEM education for young people around the world.

Speaker 0

现在,请听我与Ilya Sutskever的对话。

And now, here's my conversation with Ilya Sutskever.

Speaker 1

您是AlexNet论文的三位作者之一,另两位是Alex Krizhevsky和Jeff Hinton;这篇论文堪称深度学习革命的催化剂,标志着重大转折点的到来。

You were one of the three authors, with Alex Krizhevsky and Jeff Hinton, of the famed AlexNet paper that is arguably the paper that marked the big catalytic moment that launched the deep learning revolution.

Speaker 1

当时,带我们回到那个时期,你对神经网络、对神经网络的表征能力有什么直觉?

At that time, take us back to that time, what was your intuition about neural networks, about the representational power of neural networks?

Speaker 1

也许你可以谈谈在接下来的几年里,直到今天这十年间,这种直觉是如何演变的。

And maybe you could mention how did that evolve over the next few years up to today, over the ten years.

Speaker 2

是的。

Yeah.

Speaker 2

我可以回答这个问题。

I can answer that question.

Speaker 2

大约在2010或2011年的某个时候,我脑海中将两个事实联系了起来。

At some point in about 2010 or 2011, I connected two facts in my mind.

Speaker 2

基本上,当时的认识是这样的。

Basically, the realization was this.

Speaker 2

在某个时刻,我们意识到可以用反向传播端到端地训练非常庞大的——按今天的标准来说不算大,但在当时算得上又大又深的神经网络。

At some point, we realized that we can train very large (I shouldn't say very large; they were tiny by today's standards, but large and deep) neural networks, end to end, with backpropagation.

Speaker 2

不同的人在不同时间都得出了这个结论。

At some point, different people obtained this result.

Speaker 2

我得到了这个结果。

I obtained this result.

Speaker 2

我第一次意识到深度神经网络具有强大能力,是在2010年James Martens发明Hessian-free优化器时,他成功从头端到端训练了一个10层神经网络,无需预训练。

The first moment in which I realized that deep neural networks are powerful was when James Martens invented the Hessian-free optimizer in 2010, and he trained a 10-layer neural network end to end, from scratch, without pretraining.

Speaker 2

当那一刻发生时,我想就是它了。

And when that happened, I thought this is it.

Speaker 2

因为如果你能训练一个大型神经网络,那么它就能表示非常复杂的函数。

Because if you can train a big neural network, a big neural network can represent very complicated function.

Speaker 2

因为一个10层的神经网络,就相当于让人类大脑运行若干毫秒。

Because if you have a neural network with 10 layers, it's as though you allow the human brain to run for some number of milliseconds.

Speaker 2

神经元放电速度很慢,在100毫秒内神经元可能只放电10次。

Neuron firings are slow, and so in maybe a 100 milliseconds, your neurons only fire 10 times.

Speaker 2

所以这也类似于10个层级。

So it's also kind of like 10 layers.

Speaker 2

而在100毫秒内,你就能完美识别任何物体。

And in a hundred milliseconds, you can perfectly recognize any object.

Speaker 2

所以我当时就认为,我们需要用大量监督数据训练一个非常大的神经网络,这样一定能成功,因为我们可以找到最优的神经网络。

So I thought so I already had the idea then that we need to train a very big neural network on lots of supervised data, and then it must succeed, because we can find the best neural network.

Speaker 2

而且还有理论认为,如果数据量超过参数数量,就不会出现过拟合。

And then there's also theory that if you have more data than parameters, you won't overfit.

Speaker 2

如今我们知道,这个理论其实很不完善,实际上即使数据量少于参数数量也不会过拟合。

Today, you know that actually this theory is very incomplete, and you won't overfit even if you have less data than parameters.

Speaker 2

但可以肯定的是,如果数据量超过参数数量,就一定不会过拟合。

But, definitely, if you have more data than parameters, you won't overfit.

Speaker 1

所以神经网络严重过参数化的事实并没有让你感到沮丧?

So the fact that neural networks were heavily overparameterized wasn't discouraging to you?

Speaker 1

所以你当时在思考参数数量的理论?你认为存在大量参数是没问题的?

So you were thinking about the theory that a huge number of parameters is okay?

Speaker 1

这样真的没问题吗?

Is it gonna be okay?

Speaker 2

我的意思是,之前有些证据表明大体上没问题,但理论主要是说如果拥有大数据集和大神经网络,就能成功。

I mean, there was some evidence before that it was okay-ish, but mostly the theory was that if you had a big dataset and a big neural net, it was going to work.

Speaker 2

过度参数化实际上并没有被视为一个大问题。

The overparameterization just didn't really figure much as a problem.

Speaker 2

我当时想,对于图像数据,只要增加一些数据增强手段,问题就能解决。

I thought, well, with images, you're just gonna add some data augmentation, and it's gonna be okay.

Speaker 1

那么这些疑虑是从何而来的呢?

So where was any doubt coming from?

Speaker 2

主要的疑虑在于:我们是否有足够的计算资源来训练足够大的神经网络?

The main doubt was: will we have enough compute to train a big enough neural net?

Speaker 2

通过反向传播。

With backpropagation.

Speaker 2

我当时认为反向传播会奏效。

Backpropagation, I thought, would work.

Speaker 2

当时尚不明确的是是否有足够的计算资源来获得非常令人信服的结果。

The thing which wasn't clear was whether there would be enough compute to get a very convincing result.

Speaker 2

后来,Alex Krizhevsky编写了这些用于训练卷积神经网络的极速CUDA内核,然后——砰的一声,突破了瓶颈。

And then at some point, Alex Krizhevsky wrote these insanely fast CUDA kernels for training convolutional neural nets, and that was bam.

Speaker 2

我们开始吧。

Let's do this.

Speaker 2

让我们拿下ImageNet,这将是最了不起的事情。

Let's get ImageNet, and it's gonna be the greatest thing.

Speaker 1

你的直觉主要来源于你自己和他人的实证结果吗?

Was your intuition most of your intuition from empirical results by you and by others?

Speaker 1

那么,是实际证明了某段程序能训练一个10层神经网络,还是有一些纸笔或白板上的思考和直觉?

So, like, just actually demonstrating that a piece of program can train a 10 layer neural network, or was there some pen and paper or marker and whiteboard thinking, intuition?

Speaker 1

就像你直接把一个10层的大型神经网络连接到了大脑上。

Like, you just connected a 10-layer large neural network to the brain.

Speaker 1

所以你刚才提到了大脑。

So you just mentioned the brain.

Speaker 1

那么在你对神经网络的直觉中,人类大脑是否作为直觉构建者发挥作用?

So in your intuition about neural networks, does the human brain come into play as an intuition builder?

Speaker 2

毫无疑问。

Definitely.

Speaker 2

我的意思是,在人工神经网络与人脑之间的类比上必须非常精确,但毫无疑问,大脑一直是深度学习研究者直觉和灵感的巨大源泉,从六十年代的罗森布拉特开始就是如此。

I mean, you know, you gotta be precise with these analogies between artificial neural networks and the brain, but there's no question that the brain is a huge source of intuition and inspiration for deep learning researchers, all the way from Rosenblatt in the sixties.

Speaker 2

比如,如果你观察神经网络的整体概念,它正是直接受到大脑的启发。

Like, if you look at it, the whole idea of a neural network is directly inspired by the brain.

Speaker 2

像麦卡洛克和皮茨这样的人曾说过,嘿,我们在大脑中发现了这些新神经元,而且我们最近了解了计算机和自动机,能否用计算机和自动机的一些理念来设计一种既简单、可计算又类似大脑的计算对象?于是他们发明了神经元模型。

You had people like McCulloch and Pitts who were saying, hey, you've got these neurons in the brain, and hey, we recently learned about the computer and automata; can we use some ideas from the computer and automata to design some kind of computational object that's going to be simple, computational, and kind of like the brain? And they invented the neuron.

Speaker 2

所以当时他们正是受到了大脑的启发。

So they were inspired by it back then.

Speaker 2

随后出现了福岛邦彦提出的卷积神经网络,以及后来Yann LeCun的观点,他指出如果限制神经网络的感受野,将特别适合处理图像——事实证明的确如此。

Then you had the convolutional neural network from Fukushima, and then later Yann LeCun, who said, hey, if you limit the receptive fields of a neural network, it's going to be especially suitable for images, as it turned out to be true.

Speaker 2

因此,确实存在少数案例证明借鉴大脑的类比是成功的。

So there was a very small number of examples where analogies to the brain were successful.

Speaker 2

我当时想,只要你眯起眼睛看,人工神经元或许与大脑神经元并无太大差异,所以干脆假设如此,顺势推进。

And I thought, well, probably an artificial neuron is not that different from the brain's if you squint hard enough, so let's just assume it is and roll with it.

Speaker 1

所以现在我们正处于深度学习非常成功的时代。

So now we're now at a time where deep learning is very successful.

Speaker 1

那么让我们少些模糊猜测,直接睁大眼睛看看,对你来说人脑与人工神经网络之间有什么有趣的差异?

So let us squint less and say, let's open our eyes and say: what to you is an interesting difference between the human brain and artificial neural networks?

Speaker 1

我知道你可能不是这方面的专家,既不是神经科学家,也不是生物学家,但粗略来说,未来一二十年间,你觉得人脑与人工神经网络之间最有趣的差异是什么?

Now, I know you're probably not an expert, neither a neuroscientist nor a biologist, but loosely speaking, what's the difference between the human brain and artificial neural networks that's interesting to you for the next decade or two?

Speaker 2

这是个很好的问题。

That's a good question to ask.

Speaker 2

人脑与我们的人工神经网络之间,究竟存在哪些有趣的差异?

What is an interesting difference between the brain and our artificial neural networks?

Speaker 2

我觉得如今的人工神经网络——我们都同意在某些维度上人脑远超我们的模型。

So I feel like today, with artificial neural networks, we all agree that there are certain dimensions in which the human brain vastly outperforms our models.

Speaker 2

但我也认为人工神经网络在某些方面比人脑具有许多非常重要的优势。

But I also think that there are some ways in which artificial neural networks have a number of very important advantages over the brain.

Speaker 2

通过对比优势与劣势,是找出重要差异的好方法。

Looking at advantages versus disadvantages is a good way to figure out what the important difference is.

Speaker 2

大脑使用脉冲信号,这可能重要也可能不重要。

So the brain uses spikes, which may or may not be important.

Speaker 1

是的。

Yes.

Speaker 1

这是个非常有趣的问题。

That's a really interesting question.

Speaker 1

你认为这重要吗?

Do you think it's important or not?

Speaker 1

这是人工神经网络与大脑之间的一大架构差异。

That's one big architectural difference between artificial neural networks and the brain.

Speaker 2

这很难说,但我先验概率不高,我可以解释原因。

It's hard to tell, but my prior is not very high, and I can say why.

Speaker 2

你知道,有些人对脉冲神经网络感兴趣,他们发现基本上需要将非脉冲神经网络模拟成脉冲形式。

You know, there are people who are interested in spiking neural networks, and basically, what they figured out is that they need to simulate the non spiking neural networks in spikes.

Speaker 2

这就是他们使其运作的方式。

And that's how they're gonna make them work.

Speaker 2

如果不将非脉冲神经网络模拟成脉冲形式,它就无法工作,因为问题在于它凭什么能工作?

If you don't simulate the non spiking neural networks in spikes, it's not going to work because the question is why should it work?

Speaker 2

这涉及到反向传播相关问题和深度学习的问题。

And that connects to questions around back propagation and questions around deep learning.

Speaker 2

你有一个庞大的神经网络。

You've got this giant neural network.

Speaker 2

它究竟为什么能运行?

Why should it work at all?

Speaker 2

那个学习规则为什么能生效?

Why should that learning rule work at all?

Speaker 2

这不是不言自明的问题,特别是如果你刚进入这个领域并阅读早期论文时,你会看到人们说'让我们构建神经网络吧'。

It's not a self-evident question, especially if, let's say, you were just starting in the field and you read the very early papers; you can see people saying, hey, let's build neural networks.

Speaker 2

这是个好主意,因为大脑就是神经网络,所以构建神经网络会很有用。

That's a great idea because the brain is a neural network, so it would be useful to build neural networks.

Speaker 2

没错。

Yep.

Speaker 2

现在,让我们研究如何训练它们。

Now, let's figure out how to train them.

Speaker 2

训练它们应该是可行的,但具体怎么做呢?

It should be possible to train them, probably, but how?

Speaker 2

所以核心思想就是代价函数。

And so the big idea is the cost function.

Speaker 2

这才是关键所在。

That's the big idea.

Speaker 2

代价函数是根据某种度量标准来衡量系统性能的方法。

The cost function is a way of measuring the performance of the system according to some measure.
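The idea just described, a cost function that measures the system's performance and can be driven down by gradient descent, can be made concrete with a toy sketch. This is illustrative only, not from the episode; the one-parameter model `y = w * x` and the data are invented:

```python
# A toy cost function and gradient descent, illustrating the "big idea"
# discussed above. The model y = w * x and the data are invented.

def cost(w, data):
    # Mean squared error: measures performance according to one metric.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    # Analytic derivative of the cost with respect to w.
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

def gradient_descent(data, w=0.0, lr=0.1, steps=100):
    # Repeatedly step against the gradient to reduce the cost.
    for _ in range(steps):
        w -= lr * grad(w, data)
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w = gradient_descent(data)  # converges toward w = 2
```

The point is the one Sutskever makes: once you can state performance as a single number, an optimization algorithm gives you a handle on the whole system's behavior.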

Speaker 2

顺便说一下,那是个

By the way, that is a

Speaker 1

实际上很大,让我想想。

big actually, let me think.

Speaker 1

这个想法是否很难得出?这个想法到底有多重要?

Is that, for one, a difficult idea to arrive at? And how big of an idea is that?

Speaker 1

就是存在一个单一的代价函数,抱歉。

That there's a single cost function sorry.

Speaker 1

让我暂停一下。

Let me take a pause.

Speaker 1

监督学习是一个难以理解的概念吗?

Is supervised learning a difficult concept to come to?

Speaker 2

我不知道。

I don't know.

Speaker 2

回顾起来,所有的概念都非常简单。

All concepts are very easy in retrospect.

Speaker 1

是啊。

Yeah.

Speaker 1

这就是为什么现在看起来微不足道。我之所以这么问(我们稍后会讨论),是因为:还有其他可能吗?

That's why it seems trivial now. The reason I asked that, and we'll talk about it, is: are there other things?

Speaker 1

是否存在不一定有成本函数的事物?也许有多个成本函数,或者动态成本函数,或者完全不同的架构?

Is there things that don't necessarily have a cost function, maybe have many cost functions, or maybe have dynamic cost functions, or maybe a totally different kind of architectures?

Speaker 1

因为我们必须这样思考才能得出新的东西。

Because we have to think like that in order to arrive at something new.

Speaker 1

对吧?

Right?

Speaker 1

所以唯一的

So the only so the

Speaker 2

GAN中就是没有明确成本函数的典型案例。

good examples of things which don't have clear cost functions are GANs.

Speaker 2

对吧?

Right?

Speaker 2

在GAN中,你有一个博弈游戏。

In a GAN, you have a game.

Speaker 2

不同于传统成本函数优化思路——你知道可以用梯度下降算法来优化成本函数,并据此推演系统行为。

So instead of thinking of a cost function that you want to optimize, where you know you have an algorithm, gradient descent, that will optimize the cost function, and then you can reason about the behavior of your system in terms of what it optimizes.

Speaker 2

GAN采用博弈论思维,通过均衡点来分析系统行为。

With a GAN, you say, I have a game, and I'll reason about the behavior of the system in terms of the equilibrium of the game.

Speaker 2

但关键在于构建这些能帮助我们推演系统行为的数学对象。

But it's all about coming up with these mathematical objects that help us reason about the behavior of our system.
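The contrast drawn above, a single cost to minimize versus a game analyzed at its equilibrium, can be sketched with the original GAN value function V(D, G) = E[log D(x)] + E[log(1 − D(G(z)))]. The probabilities below are illustrative values, not from the episode:

```python
import math

def gan_value(d_real, d_fake):
    # Two-player value V(D, G) evaluated on one real/fake pair:
    # the discriminator tries to maximize it, the generator to minimize it.
    # There is no single cost function for the whole system; behavior is
    # reasoned about via the equilibrium of the game.
    return math.log(d_real) + math.log(1 - d_fake)

# At the classic equilibrium the discriminator outputs 0.5 everywhere,
# giving V = -2 * log(2).
equilibrium = gan_value(0.5, 0.5)

# A sharper discriminator (0.9 on real data, 0.1 on fakes) raises the
# value, which is exactly what the generator then pushes back against.
sharper = gan_value(0.9, 0.1)
```

Neither player minimizes `gan_value` on its own; the interesting object is the saddle point, which is the mathematical handle Sutskever refers to.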

Speaker 1

对。

Right.

Speaker 1

这真的很有趣。

That's really interesting.

Speaker 1

是啊。

Yeah.

Speaker 1

所以GAN是唯一一个。

So GAN is the only one.

Speaker 1

某种程度上说,成本函数是从比较中自然产生的。

It's kind of a the cost function is emergent from the comparison.

Speaker 1

我不...

It's I don't

Speaker 2

我不知道它是否有成本函数。

I don't know if it has a cost function.

Speaker 2

我不确定讨论GAN的成本函数是否有意义。

I don't know if it's meaningful to talk about the cost function of a GAN.

Speaker 2

这有点像生物进化的成本函数,或者经济系统的成本函数。

It's kind of like the cost function of biological evolution or the cost function of the economy.

Speaker 2

你可以讨论它可能趋向的区域,但我不认为成本函数的类比是最有用的。

You can talk about the regions toward which it will go, but I don't think the cost function analogy is the most useful one.

Speaker 1

所以如果进化实际上没有成本函数,这真的很有趣。

So if evolution doesn't that's really interesting.

Speaker 1

如果进化没有一个基于类似我们数学概念的成本函数,那么你认为深度学习中的成本函数是否在限制我们?

So if evolution doesn't really have a cost function, a cost function based on something akin to our mathematical conception of one, then do you think cost functions in deep learning are holding us back?

Speaker 1

是的。

Yeah.

Speaker 1

所以你刚才提到成本函数是一个很好的初步深刻概念。

So you just kinda mentioned that the cost function is a nice first profound idea.

Speaker 1

你认为这是个好主意吗?

Do you think that's a good idea?

Speaker 1

你认为我们会超越这个想法吗?

Do you think it's an idea we'll go past?

Speaker 1

所以自我对弈开始触及

So self play starts to touch on

Speaker 2

强化学习系统中的这一方面。

that a little bit in reinforcement learning systems.

Speaker 2

没错。

That's right.

Speaker 2

自我对弈以及围绕探索的理念,即采取让预测器意外的行动。

Self-play, and also ideas around exploration, where you're trying to take actions that surprise a predictor.

Speaker 2

我是成本函数的忠实拥趸。

I'm a big fan of cost functions.

Speaker 2

我认为成本函数非常棒,它们为我们提供了巨大帮助,只要能用成本函数解决的问题,我们都应该使用。

I think cost functions are great and they serve us really well, and I think that whenever we can do things with cost functions, we should.

Speaker 2

或许未来我们可能会提出另一种深刻的视角,让成本函数不再占据核心地位。

And, you know, maybe there is a chance that we will come up with some yet another profound way of looking at things that will involve cost functions in a less central way.

Speaker 2

不过我也说不准。

But I don't know.

Speaker 2

我的意思是,我不会看衰成本函数。

I think cost functions are... I mean, I would not bet against cost functions.

Speaker 1

关于大脑,你还能想到哪些与我们设计人工神经网络时不同且有趣的方面?

Is there other things about the brain that pop into your mind that might be different and interesting for us to consider in designing artificial neural networks?

Speaker 1

我们刚才稍微讨论了一下脉冲。

So we talked about spiking a little bit.

Speaker 2

我认为有一点可能很有用,神经科学家已经发现了关于大脑学习规则的一些东西,我指的是脉冲时间依赖可塑性,如果有人能在模拟中研究这一点会很好。

I mean, one thing which may potentially be useful: I think neuroscientists have figured out something about the learning rule of the brain. I'm talking about spike-timing-dependent plasticity, and it would be nice if some people were to study that in simulation.

Speaker 1

等等。

Wait.

Speaker 1

抱歉。

Sorry.

Speaker 1

脉冲时间依赖可塑性?

Spike-timing-dependent plasticity?

Speaker 1

是的。

Yeah.

Speaker 1

没错。

That's right.

Speaker 1

那是什么?

What's that?

Speaker 2

STDP。

STDP.

Speaker 2

这是一种特定的学习规则,利用脉冲时序来决定如何更新突触。

It's a particular learning rule that uses spike timing to determine how to update the synapses.

Speaker 2

这有点像如果突触在神经元放电前向神经元传递信号,就会增强这个突触的连接强度。

So it's kind of like if a synapse fires into the neuron before the neuron fires, then it strengthens the synapse.

Speaker 2

而如果突触在神经元放电后不久才传递信号,则会削弱这个突触的连接。

And if the synapse fires into the neuron shortly after the neuron fired, then it weakens the synapse.

Speaker 2

大致是这样的机制,我有90%的把握它是正确的。

Something along this line, I'm 90% sure it's right.

Speaker 2

所以如果我这里说错了什么,还请不要太生气。

So if I said something wrong here, don't get too angry.
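The rule described above, strengthen the synapse when the presynaptic spike precedes the postsynaptic firing and weaken it when it follows, is commonly modeled with exponential time windows. A minimal sketch; the constants `a_plus`, `a_minus`, and `tau` are illustrative, not from the episode:

```python
import math

def stdp_delta(pre_t, post_t, a_plus=0.1, a_minus=0.12, tau=20.0):
    # Spike-timing-dependent plasticity: the sign and size of the weight
    # change depend on the relative timing of the presynaptic spike and
    # the postsynaptic firing.
    dt = post_t - pre_t  # milliseconds
    if dt > 0:
        # Synapse fired into the neuron before the neuron fired: strengthen.
        return a_plus * math.exp(-dt / tau)
    # Synapse fired shortly after the neuron fired: weaken.
    return -a_minus * math.exp(dt / tau)

strengthen = stdp_delta(pre_t=0.0, post_t=5.0)  # pre 5 ms before post
weaken = stdp_delta(pre_t=5.0, post_t=0.0)      # pre 5 ms after post
```

The exponential windows capture the "shortly after" qualifier in the conversation: spike pairs far apart in time barely change the synapse.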

Speaker 1

但你说这话时听起来很聪明。

But you sounded brilliant while saying it.

Speaker 1

但时序问题,你觉得这是缺失的一环吗?

But the timing, do you think that's one thing that's missing?

Speaker 1

时间动态性没有被捕捉到。

The temporal dynamics is not captured.

Speaker 1

我认为这像是某种基本特性

I think that's like a fundamental property

Speaker 1

大脑的特性在于这些信号的时间性。

of the brain is the timing of this of the signals.

Speaker 1

嗯,你有循环神经网络。

Well, you have recurrent neural networks.

Speaker 1

但你将其视为...我是说,那是个非常粗略简化的叫什么来着?

But you think of that as, I mean, that's a very crude, simplified... what's that called?

Speaker 1

循环神经网络里应该有个时钟机制吧。

The there's a clock, I guess, to recurrent neural networks.

Speaker 1

嗯。

Mhmm.

Speaker 1

这看起来大脑就像是那个的通用连续版本,一种泛化形式,所有可能的时间安排都存在可能,而在这些时间安排中蕴含着某些信息。

It seems like the brain is the general, continuous version of that, the generalization where all possible timings are possible, and then within those timings is contained some information.

Speaker 1

你认为循环神经网络中的循环结构,能否捕捉到与大脑神经元放电中看似重要的时序现象相同的现象?

You think recurrent neural networks, the recurrence in recurrent neural networks, can capture the same kind of phenomena as the timing that seems to be important for the firing of neurons in the brain?

Speaker 2

我是说,我认为循环神经网络非常了不起,它们能做到...我认为它们能做到我们期望一个系统能做到的任何事情。

I mean, I think recurrent neural networks are amazing, and I think they can do anything we'd want a system to do.

Speaker 2

目前循环神经网络已被Transformer取代,但也许有一天它们会卷土重来。

Right now, recurrent neural networks have been superseded by transformers, but maybe one day they'll make a comeback.

Speaker 2

也许它们会回归。

Maybe they'll be back.

Speaker 2

我们拭目以待。

We'll see.

Speaker 1

让我稍微跑个题,你觉得它们会回归吗?

Let me on a small tangent, say, do you think they'll be back?

Speaker 1

我们即将讨论的自然语言处理和语言建模领域最近的许多突破,都是由不强调循环机制的Transformer模型实现的。

So so much of the breakthroughs recently that we'll talk about on natural language processing and language modeling has been with transformers that don't emphasize recurrence.

Speaker 1

你认为循环机制会卷土重来吗?

You think recurrence will make a comeback?

Speaker 2

嗯,我认为某种形式的循环机制很可能会回归。

Well, some kind of recurrence, I think very likely.

Speaker 2

至于循环神经网络在处理序列方面的传统优势,我认为这种可能性也是存在的。

Recurrent neural networks as they're typically thought of, for processing sequences, I think it's also possible.

Speaker 1

对你来说,什么是循环神经网络?

What is to you a recurrent neural network?

Speaker 1

一般来说,我想问,什么是循环神经网络?

And generally speaking, I guess, what is a recurrent neural network?

Speaker 2

它是一种保持高维隐藏状态的神经网络。

You have a neural network which maintains a high dimensional hidden state.

Speaker 2

当接收到观测数据时,它会通过某种连接方式更新其高维隐藏状态。

And then when an observation arrives, it updates its high dimensional hidden state through its connections in some way.
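The definition just given, a high-dimensional hidden state updated through the network's connections as each observation arrives, is in its vanilla form a one-line update. A minimal sketch; the sizes and random weights are arbitrary, chosen only for illustration:

```python
import numpy as np

def rnn_step(h, x, W_hh, W_xh, b):
    # One recurrent update: the hidden state is revised by mixing its
    # previous value with the new observation through the connections.
    return np.tanh(W_hh @ h + W_xh @ x + b)

rng = np.random.default_rng(0)
hidden_size, input_size = 8, 3
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for x in rng.normal(size=(5, input_size)):  # a sequence of 5 observations
    h = rnn_step(h, x, W_hh, W_xh, b)
```

Everything the network "remembers" about the sequence lives in `h`, which is the sense in which, as discussed below, it can be read as a learned short-term store over knowledge held in the connections.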

Speaker 2

那么你认为,

So do you think,

Speaker 1

你知道,那正是专家系统所做的。

you know, that's what, like, expert systems did.

Speaker 1

对吧?

Right?

Speaker 1

符号AI,基于知识的系统,构建知识库就是在维护一个隐藏状态,即其知识库,并通过顺序处理不断扩展它。

Symbolic AI, knowledge-based systems: growing a knowledge base is maintaining a hidden state, which is its knowledge base, and growing it by sequential processing.

Speaker 1

你是更倾向于广义上这样理解,还是说它仅仅是今天我们所说的带有特定门控单元的隐藏状态这种更受限形式,比如LSTM中的那种?

Do you think of it more generally in that way, or is it simply the more constrained form of a hidden state with certain kinds of gating units that we think of today with LSTMs?

Speaker 2

我的意思是,从技术上讲,隐藏状态就是你刚才描述的那样。

I mean, the hidden state is technically what you described there.

Speaker 2

就是进入LSTM或RNN这类模型内部的隐藏状态。

The hidden state that goes inside the LSTM or the RNN or something like this.

Speaker 2

但具体应该包含什么内容呢?如果要类比专家系统的话,你可以认为知识存储在连接权重中,而短期处理则是在隐藏状态中完成的。

But then, as for what should be contained there, if you want to make the expert system analogy, you could say that the knowledge is stored in the connections, and the short-term processing is done in the hidden state.

Speaker 1

是的。

Yes.

Speaker 1

你能这么说吗?

Could you say that?

Speaker 1

是的。

Yes.

Speaker 1

那么你认为在神经网络中构建大规模知识库有未来吗?

So sort of do you think there's a future of building large scale knowledge bases within the neural networks?

Speaker 2

确实如此。

Definitely.

Speaker 1

所以我们暂且保留这个观点,因为我想深入探讨一下。

So we're gonna pause on that confidence, because I wanna explore that.

Speaker 1

但让我把话题拉回来,继续谈谈ImageNet的历史。

But let me zoom back out and ask back to the history of ImageNet.

Speaker 1

正如你提到的,神经网络已经存在了几十年。

Neural networks have been around for many decades, as you mentioned.

Speaker 1

你认为导致它们成功的关键理念是什么,那个ImageNet时刻以及之后在

What do you think were the key ideas that led to their success, that ImageNet moment, and beyond, the success in the

Speaker 2

过去十年里。

past ten years.

Speaker 2

好的。

Okay.

Speaker 2

那么问题是,为确保我没有遗漏,过去十年间导致深度学习成功的关键理念。

So the question is, to make sure I didn't miss anything, the key ideas that led to the success of deep learning over the past ten years.

Speaker 1

正是。

Exactly.

Speaker 1

尽管深度学习背后的基本原理存在的时间要长得多。

Even though the fundamental thing behind deep learning has been around for much longer.

Speaker 1

所以关键

So the key

Speaker 2

关于深度学习的理念,或者说在深度学习开始成功之前的关键事实是它被低估了。

idea about deep learning, or rather, the key fact about deep learning before it started to be successful, is that it was underestimated.

Speaker 2

从事机器学习的人当时根本不认为神经网络能有多大作为。

People who worked in machine learning simply didn't think that neural networks could do much.

Speaker 2

人们不相信大型神经网络能够被训练出来。

People didn't believe that large neural networks could be trained.

Speaker 2

人们认为,当时机器学习领域有很多关于正确方法的争论等等,大家争论不休是因为那时缺乏确凿的事实依据。

People thought... well, there was a lot of debate going on in machine learning about what the right methods are and so on, and people were arguing because there was no way to get hard facts.

Speaker 2

我的意思是,当时没有真正具有挑战性的基准测试,如果你能在这些测试中表现出色,就能自信地说:看,这就是我的系统。

And by that, I mean, there were no benchmarks which were truly hard, that if you do really well on them, then you can say, look, here is my system.

Speaker 2

正是在那时,这个领域开始更多地转向工程化方向。

That's when this field becomes a little bit more of an engineering field.

Speaker 2

就深度学习而言,直接回答这个问题的话,所有理念其实早已存在。

So in terms of deep learning, to answer the question directly, the ideas were all there.

Speaker 2

当时缺失的是大量标注数据和强大的计算能力。

The thing that was missing was a lot of supervised data and a lot of compute.

Speaker 2

当你拥有了大量标注数据和强大算力后,还需要第三样东西,那就是坚定的信念。

Once you have a lot of supervised data and a lot of compute, then there is a third thing which is needed as well, and that is conviction.

Speaker 2

坚信只要将已有的正确材料与大量数据和算力相结合,就一定能成功。

Conviction that if you take the right stuff, which already exists, and apply and mix it with a lot of data and a lot of compute, that it will in fact work.

Speaker 2

而这正是缺失的那一环。

And so that was the missing piece.

Speaker 2

你需要数据,需要以GPU形式出现的算力,还需要意识到必须将它们融合的信念。

You needed the data, you needed the compute, which showed up in the form of GPUs, and you needed the conviction to realize that you need to mix them together.

Speaker 1

这确实很有意思。

So that's really interesting.

Speaker 1

那么我猜,算力和监督数据的存在让实证证据说服了计算机科学界的大多数人。

So I guess the presence of compute and the presence of supervised data allowed the empirical evidence to do the convincing of the majority of the computer science community.

Speaker 1

我记得有个关键时刻,Jitendra Malik和Alyosha Efros当时持强烈怀疑态度。

So I guess there's a key moment with Jitendra Malik and Alyosha Efros, who were very skeptical.

Speaker 1

对吧?

Right?

Speaker 1

而Jeffrey Hinton则恰恰相反,他毫不怀疑。

And then there's Jeffrey Hinton, who was the opposite of skeptical.

Speaker 1

而有一个令人信服的时刻,我认为ImageNet就充当了那个转折点。

And there was a convincing moment, and I think ImageNet served as that moment.

Speaker 1

没错。

That's right.

Speaker 1

他们代表了计算机视觉领域的几大支柱——就像巫师们聚在一起,突然间风向就变了。光有想法和计算机还不够。

And they represented the big pillars of the computer vision community. The wizards got together, and all of a sudden there was a shift. And it's not enough for the ideas to all be there and the compute to be there.

Speaker 1

关键在于要消除当时存在的怀疑态度。有趣的是,人们有几十年时间就是不相信。

It took convincing the cynicism that existed. It's interesting that people just didn't believe for a couple of decades.

Speaker 2

是啊。

Yeah.

Speaker 2

但其实不止如此。

Well, but it's more than that.

Speaker 2

这么说起来,听起来像是:'那些不相信的蠢人到底错过了什么'。

When put this way, it sounds like, well, you know, those silly people who didn't believe; what were they missing?

Speaker 2

但实际上当时情况确实很混乱,因为神经网络在各方面确实都不奏效。对吧?

But in reality, things were confusing, because neural networks really did not work on anything. Right?

Speaker 2

而且它们在几乎所有方面也都不是最佳方法。

And they were not the best method on pretty much anything as well.

Speaker 2

所以当时人们理性地说‘这套东西没有发展前景’是合理的。

And it was pretty rational to say, yeah, this stuff doesn't have any traction.

Speaker 2

这就是为什么我们需要这些能产生确凿证据的艰巨任务,我们正是这样取得进展的。

And that's why you need to have these very hard tasks which are which produce undeniable evidence, and that's how we make progress.

Speaker 2

这也是该领域如今能取得进步的原因——因为我们拥有这些代表真实进展的硬性基准。

And that's why the field is making progress today because we have these hard benchmarks which represent true progress.

Speaker 2

正因如此,我们才能避免无休止的争论。

And so and this is why we were able to avoid endless debate.

Speaker 1

令人惊叹的是,你为人工智能领域贡献了许多重大创新——从计算机视觉、自然语言处理到强化学习,几乎涵盖所有方向,可能除了生成对抗网络。

So incredibly, you've contributed some of the biggest recent ideas in AI: in computer vision, natural language processing, reinforcement learning, sort of everything in between, maybe not GANs.

Speaker 1

几乎没有哪个课题是你未曾涉猎的,当然还包括深度学习的理论基础。

There may not be a topic you haven't touched, and, of course, the fundamental science of deep learning.

Speaker 1

对你而言,视觉、语言与强化学习中的动作学习问题之间有何本质区别?

What is the difference to you between vision, language, and, as in reinforcement learning, action, as learning problems?

Speaker 1

那么它们的共同点是什么?

And what are the commonalities?

Speaker 1

你认为它们都是相互关联的吗?

Do you see them as all interconnected?

Speaker 1

它们是否从根本上属于需要不同

Are they fundamentally different domains that require different

Speaker 2

方法的不同领域?

approaches?

Speaker 2

好的。

Okay.

Speaker 2

这是个好问题。

That's a good question.

Speaker 2

机器学习是一个高度统一的领域,具有极强的统一性。

Machine learning is a field with a lot of unity, a huge amount of unity.

Speaker 2

实际上

In fact

Speaker 1

你说的统一性是指什么?

What do you mean by unity?

Speaker 1

比如,是理念的重叠吗?

Like, overlap of ideas?

Speaker 1

重叠的

Overlap of

Speaker 2

理念的重叠、原则的重叠。

ideas, overlap of principles.

Speaker 2

实际上,只有一两条或三条非常非常简单的原则。

In fact, there's only one or two or three principles which are very, very simple.

Speaker 2

然后它们几乎以相同的方式,几乎相同地应用于不同的模态、不同的问题。

And then they apply in almost the same way to the different modalities, the different problems.

Speaker 2

这就是为什么今天,当有人发表一篇关于改进深度学习在视觉中优化的论文时,它会提升不同的NLP应用,也会提升不同的强化学习应用。

And that's why today, when someone writes a paper on improving optimization of deep learning in vision, it improves the different NLP applications, and it improves the different reinforcement learning applications.

Speaker 2

强化学习...所以我认为计算机视觉和NLP彼此非常相似。

Reinforcement learn so I would say that computer vision and NLP are very similar to each other.

Speaker 2

如今,它们的区别在于架构略有不同。

Today, they differ in that they have slightly different architectures.

Speaker 2

在自然语言处理中我们使用Transformer,而在视觉领域我们使用卷积神经网络。

We use transformers in NLP, and we use convolutional neural networks in vision.

Speaker 2

但也有可能某天这种情况会改变,所有领域都将统一采用单一架构。

But it's also possible that one day this will change and everything will be unified with a single architecture.

Speaker 2

因为如果回顾几年前的自然语言处理领域,每个微小问题都有大量不同的架构。

Because if you go back a few years ago in natural language processing, there were a huge number of architectures; every different tiny problem had its own architecture.

Speaker 2

如今,所有这些不同任务都只需要一个Transformer模型。

Today, there's just one transformer for all those different tasks.

Speaker 2

如果再往前追溯,你会看到更加碎片化的状况,AI领域的每个小问题都有自己专门的小分支领域和技能集合,需要特定人才来设计特征。

And if you go back in time even more, you had even more fragmentation, and every little problem in AI had its own little sub-specialization and, you know, its own little collection of skills, people who would know how to engineer the features.

Speaker 2

现在这一切都已被深度学习所整合。

Now it's all been subsumed by deep learning.

Speaker 2

我们实现了这种统一。

We have this unification.

Speaker 2

因此,我预计视觉领域也将与自然语言处理实现统一。

And so I expect vision to become unified with natural language as well.

Speaker 2

或者说,我不该用‘预计’这个词。

Or rather, I shouldn't say expect.

Speaker 2

我认为这是有可能的。

I think it's possible.

Speaker 2

我不想过于肯定,因为我觉得卷积神经网络在计算效率上非常高。

I don't wanna be too sure because I think the convolutional neural net is very computationally efficient.

Speaker 2

强化学习有所不同。

RL is different.

Speaker 2

强化学习确实需要略微不同的技术,因为你确实需要采取行动。

RL does require slightly different techniques because you really do need to take action.

Speaker 2

你确实需要处理探索问题。

You really do need to do something about exploration.

Speaker 2

你的方差要高得多。

Your variance is much higher.

Speaker 2

但我认为即便在那里也存在许多统一性。

But I think there is a lot of unity even there.

Speaker 2

例如,我预期在某个时间点,强化学习与监督学习之间会出现更广泛的统一,那时强化学习可能会通过决策来优化监督学习的效果。我想象那会是一个巨大的黑箱系统,你只需不断往里填充各种信息,它就能自行处理你输入的任何内容。

And I would expect, for example, that at some point there will be some broader unification between RL and supervised learning, where somehow the RL will be making decisions to make the supervised learning go better. I imagine one big black box, and you just shovel things into it, and it figures out what to do with whatever you shovel at it.

Speaker 1

我的意思是,强化学习几乎融合了语言和视觉的某些特性。

I mean, reinforcement learning has some aspects of language and vision combined almost.

Speaker 1

它包含需要利用的长期记忆元素,以及一个极为丰富的感知空间要素。

There's elements of a long term memory that you should be utilizing, and there's elements of a really rich sensory space.

Speaker 1

所以看起来这就像是两者的结合

So it seems like the it's it's like the union of

Speaker 2

或者说类似这样的关系

the two or something like that.

Speaker 2

我会稍微换个说法

I'd I'd say something slightly differently.

Speaker 2

我认为强化学习既不是前者也不是后者,但它天然地与两者相互连接并融合

I'd say that reinforcement learning is neither, but it naturally interfaces and integrates with the two of them.

Speaker 1

你认为行动在本质上是不同的吗?

You think action is fundamentally different?

Speaker 1

那么,关于学习行动策略的独特性,有什么有趣的地方呢?

So, yeah, what is interesting, what is unique, about the policy of learning to act?

Speaker 2

举个例子,当你学习行动时,你本质上处于一个非静态的世界中。

Well, so one example, for instance, is that when you learn to act, you are fundamentally in a non stationary world.

Speaker 2

因为随着你的行动改变,你所看到的事物也开始变化。

Because as your actions change, the things you see start changing.

Speaker 2

你会以不同的方式体验世界,这与更传统的静态问题不同,在静态问题中你有一个分布,只需将模型应用于该分布。

You experience the world in a different way, and this is not the case for the more traditional static problem, where you have some distribution and you just apply a model to that distribution.

Speaker 1

你认为这是一个本质上不同的问题,还是说它只是理解问题的一个更困难的泛化?

You think it's a fundamentally different problem, or is it just a more difficult general it's a generalization of the problem of understanding?

Speaker 2

这几乎是一个定义的问题。

I mean, it's it's it's a question of definitions almost.

Speaker 2

当然,两者之间存在大量的共性。

There is a huge, you know, there's a huge amount of commonality for sure.

Speaker 2

计算梯度。

Take gradients.

Speaker 2

你尝试计算梯度,在这两种情况下都试图近似梯度。

You take gradients; you try to approximate gradients in both cases.

Speaker 2

在强化学习的情况下,你有工具来降低梯度的方差。

In the case of reinforcement learning, you have some tools to reduce the variance of the gradients.

Speaker 2

你这样做。

You do that.
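
A tiny sketch of the variance-reduction idea mentioned here (not from the episode; the distribution, numbers, and baseline choice are all illustrative assumptions): a score-function (REINFORCE-style) gradient estimate of a simple expectation, computed with and without a baseline subtracted from the reward.

```python
import numpy as np

rng = np.random.default_rng(0)

# Score-function (REINFORCE) estimate of d/dmu E_{x ~ N(mu, 1)}[x^2].
# The true gradient is 2 * mu. Subtracting a baseline from the reward
# leaves the estimate (essentially) unbiased but shrinks its variance.
mu = 2.0
x = rng.normal(loc=mu, scale=1.0, size=100_000)
reward = x ** 2
score = x - mu  # d/dmu log N(x; mu, 1)

g_plain = reward * score                   # no baseline
g_base = (reward - reward.mean()) * score  # baseline = mean reward

# Both sample means approximate the true gradient 2 * mu = 4,
# but the baselined estimator has noticeably lower variance.
```

This is the simplest instance of the variance-reduction tools he alludes to; practical RL algorithms use learned value functions as baselines in the same spirit.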

Speaker 2

有很多共同点。

There's lots of commonality.

Speaker 2

在这两种情况下使用相同的神经网络。

Use the same neural net in both cases.

Speaker 2

你计算梯度,在这两种情况下都应用Adam优化器。

You compute the gradient, you apply Adam in both cases.

Speaker 2

所以,我的意思是,肯定有很多共同点,但也有一些小的差异并非完全不重要。

So, I mean, there's lots in common for sure, but there are some small differences which are not completely insignificant.

Speaker 2

这实际上只是取决于你的观点,你选择什么样的参考框架,以及你在审视这些问题时想要放大或缩小到什么程度?

It's really just a matter of your point of view, what frame of reference you take, how much you wanna zoom in or out as you look at these problems.

Speaker 2

你认为哪个问题更难?

Which problem do you think is harder?

Speaker 2

像诺姆·乔姆斯基这样的人认为语言是一切的基础,它支撑着所有事物。

So people like Noam Chomsky believe that language is fundamental to everything, so it underlies everything.

Speaker 2

你呢

Do you

Speaker 1

你认为语言理解比视觉场景理解更难,还是反过来?

think language understanding is harder than visual scene understanding or vice versa?

Speaker 2

我觉得问一个问题是否困难有些不太对。

I think that asking if a problem is hard is slightly wrong.

Speaker 2

我认为这个问题本身有点问题,我想解释一下原因。

I think the question is a little bit wrong, and I wanna explain why.

Speaker 2

那么,一个问题难究竟意味着什么?

So what does it mean for a problem to be hard?

Speaker 1

好的。

Okay.

Speaker 1

对此最无趣的愚蠢回答是:存在一个基准测试,以及人类在该基准上的表现水平。

The non-interesting, dumb answer to that is that there's a benchmark, and there's a human-level performance on that benchmark.

Speaker 1

而达到人类水平所需的努力程度如何?

And how much effort is required to reach the human level? Okay.

Speaker 1

基准。

Benchmark.

Speaker 2

那么从我们何时能在优秀基准测试上达到人类水平的角度来看?

So from the perspective of how much until we get to human level on a very good benchmark?

Speaker 2

是的。

Yeah.

Speaker 2

比如有些...我明白你的意思。

Like some I see what you mean by that.

Speaker 2

我本来想说的是,很大程度上取决于,你知道,一旦问题解决,它就不再困难了,确实如此。

What I was going to say is that a lot of it depends on, you know, once you solve a problem, it stops being hard, and that's Right.

Speaker 2

这总是对的。

That's always true.

Speaker 2

所以,某件事是否困难取决于我们当前的工具能做什么。

So whether something is hard or not depends on what our tools can do today.

Speaker 2

因此,你看,今天要说达到真正人类水平的语言理解和视觉感知是困难的,因为在未来三个月内完全解决这个问题是不可能的。

So, you know, you'd say today, true human-level language understanding and visual perception are hard in the sense that there is no way of solving the problem completely in the next three months.

Speaker 2

没错。

Right.

Speaker 2

我同意这个观点。

So I agree with that statement.

Speaker 2

除此之外,我也只是猜测,我的猜测可能和你的一样不靠谱。

Beyond that, my guess would be as good as yours.

Speaker 2

我不知道。

I don't know.

Speaker 1

哦,好吧。

Oh, okay.

Speaker 1

所以你对于语言理解有多困难缺乏基本的直觉。

So you don't have the fundamental intuition about how hard language understanding is.

Speaker 2

嗯,我想我...我知道我改变主意了。

Well, I think, you know, I changed my mind.

Speaker 2

我会说语言可能比...这取决于你怎么定义它。

I'd say language is probably going to be harder. I mean, it depends on how you define it.

Speaker 2

比如,你是指绝对顶尖、100%的语言理解能力,那我选语言。

Like, if you mean absolute top-notch, 100% language understanding, I'll go with language.

Speaker 2

但如果我给你看一张有字母的纸,你明白我的意思吗?

And so but then if I show you a piece of paper with letters on it, is that you see what I mean?

Speaker 2

你有一个视觉系统,你说是最接近人类水平的视觉系统。

You have a vision system, and you say it's the best human-level vision system.

Speaker 2

我打开一本书,给你看这些字母。

I open a book, and I show you letters.

Speaker 2

是否需要理解这些字母如何组成单词、句子和含义?

Do you need to understand how these letters form into words and sentences and meaning?

Speaker 2

这是视觉问题的一部分吗?

Is this part of the vision problem?

Speaker 2

视觉在哪里结束,语言从哪里开始?

Where does vision end and language begin?

Speaker 2

是的。

Yeah.

Speaker 2

所以乔姆斯基会说它始于语言。

So Chomsky would say it starts at language.

Speaker 2

因此视觉只是这种结构和基本概念层次的一个小例子,这些已经以某种方式在我们的大脑中通过语言表现出来。

So vision is just a little example of the kind of structure and, you know, fundamental hierarchy of ideas that's already represented in our brain somehow that's represented through language.

Speaker 2

但视觉在哪里

But where does vision

Speaker 1

停止而语言开始?

stop and language begin?

Speaker 1

这是个非常有趣的问题。

That's a really interesting question.

Speaker 1

一种可能性是,若没有本质上使用同一种系统,无论是图像还是语言,都难以实现真正深刻的理解。

So one possibility is that it's impossible to achieve really deep understanding in either images or language without basically using the same kind of system.

Speaker 1

因此你将能免费获得另一个。

So you're going to get the other for free.

Speaker 2

我认为这很有可能,是的,如果我们能掌握其中一种,我们的机器学习可能已经足够优秀,能够掌握另一种。

I think it's pretty likely that, yes, if we can get one, our machine learning is probably good enough that we can get the other.

Speaker 2

但并非百分之百确定,我没有十足的把握。

But I'm not 100% sure.

Speaker 2

而且,我认为很大程度上确实取决于你的定义。

And also, I think a lot of it really does depend on your definitions.

Speaker 2

关于,比如说,完美视力的定义。

Definitions of, like, perfect vision.

Speaker 2

因为,要知道,阅读也是视觉活动,但它应该被算在内吗?

Because, you know, reading is vision, but should it count?

Speaker 1

是的。

Yeah.

Speaker 1

对我来说,我的定义是如果一个系统观察一张图片,然后观察一段文字,接着能告诉我关于这些内容的信息,并且让我感到非常惊艳。

To me, so my definition is if a system looked at an image, and then a system looked at a piece of text, and then told me something about that, and I was really impressed.

Speaker 2

这是相对的。

That's relative.

Speaker 2

你会惊艳半小时,然后就会说,好吧,所有系统都能做到,但还有它们做不到的事情。

You'll be impressed for half an hour, and then you're gonna say, well, I mean, all the systems do that, but here's the thing they don't do.

Speaker 1

是的。

Yeah.

Speaker 1

但我对人类没有这种感觉。

But I I don't have that with humans.

Speaker 1

人类总能持续给我惊喜。

Humans continue to impress me.

Speaker 1

是这样吗?

Is that true?

Speaker 1

好吧,某些人确实如此。

Well, the ones okay.

Speaker 1

所以我支持一夫一妻制,我喜欢与某人结婚并共度数十年的想法。

So I'm a fan of monogamy, so I I like the idea of marrying somebody, being with them for several decades.

Speaker 1

因此我相信,确实有可能找到一个人持续为你带来愉悦、有趣、机智的新想法和朋友。

So I I believe in the fact that, yes, it's possible to have somebody continuously giving you pleasurable, interesting, witty, new ideas, friends.

Speaker 1

是的。

Yeah.

Speaker 1

我想是的。

I think I think so.

Speaker 1

他们总能给你惊喜。

They continue to surprise you.

Speaker 1

就是这种惊喜。

The surprise.

Speaker 1

你知道,这种随机性的注入似乎是一种很好的持续灵感来源,比如机智和幽默。

It's, you know, that injection of randomness seems to be a nice source of, yeah, continued inspiration, like the wit, the humor.

Speaker 1

我认为,这是一个非常主观的测试,但如果你房间里有足够多的人。

I think, yeah, that would be it's a very subjective test, but I think if you have enough humans in the room.

Speaker 1

是的。

Yeah.

Speaker 1

我我我明白

I I I understand what

Speaker 2

你的意思。

you mean.

Speaker 2

对。

Yeah.

Speaker 2

我觉得我误解了你所说的'让你印象深刻'是什么意思。

I feel like I I misunderstood what you meant by impressing you.

Speaker 2

我以为你是想用它的智能、用它对图像的理解有多准确来打动你。

I thought you meant to impress you with its intelligence, with how well it understands an image.

Speaker 2

我原以为你的意思是,比如给它看一张非常复杂的图片,它能正确理解,然后你会说'哇'。

I thought you meant something like, I'm gonna show it a really complicated image, and it's gonna get it right, and you're gonna say, wow.

Speaker 2

那真的很酷。

That's really cool.

Speaker 2

我们2020年1月的系统还做不到这一点。

Our systems of, you know, January 2020 have not been doing that.

Speaker 1

是的。

Yeah.

Speaker 1

不。

No.

Speaker 1

我 我 我认为归根结底,就像人们在网上点赞的原因一样,是因为内容让他们发笑。

I I I think it all boils down to, like, the reason people click like on stuff on the Internet, which is, like, it makes them laugh.

Speaker 1

所以这更像是幽默或机智,是的。

So it's like humor or wit Yeah.

Speaker 2

或是洞察力。

Or insight.

Speaker 2

我们...我相信我们最终也会得到那种效果。

I'm sure we'll get that as well.

Speaker 1

请原谅这个理想化的问题,但回顾过去,你在深度学习或AI领域遇到过的最美妙或最令人惊讶的想法是什么?

So forgive the romanticized question, but looking back to you, what is the most beautiful or surprising idea in deep learning or AI in general you've come across?

Speaker 2

我认为深度学习最美妙的地方在于它确实有效。

So I think the most beautiful thing about deep learning is that it actually works.

Speaker 2

我这么说是因为我们有这些想法,有小型的神经网络,有反向传播算法,然后还有一些理论认为这有点像大脑,所以如果你把神经网络做大并用大量数据训练,它就能实现大脑的功能。

And I mean it because you got these ideas, you got the little neural network, you got the back propagation algorithm, and then you got some theories as to, you know, this is kinda like the brain, so maybe if you make it large if you make the neural network large and you train it on a lot of data, then it will do the same function that the brain does.

Speaker 2

而事实证明这是真的。

And it turns out to be true.

Speaker 2

这太疯狂了。

That's crazy.

Speaker 2

现在我们只是训练这些神经网络,把它们做得更大,它们就不断变得更好,我觉得这难以置信。

And now we just train these neural networks and you make them larger and they keep getting better, and I find it unbelievable.

Speaker 2

我觉得整个基于神经网络的AI技术居然有效这件事令人难以置信。

I find it unbelievable that this whole AI stuff with neural networks works.

Speaker 2

你对其中原因形成直觉理解了吗?

Have you built up an intuition of why?

Speaker 2

有没有一些零碎的直觉或洞见能解释为什么整个系统会有效?

Are there little bits and pieces of intuitions, of insights of why this whole thing works?

Speaker 2

我是说,有一些,确实有。

I mean, some, definitely.

Speaker 2

嗯,我们知道优化...我们现在有很好的...我们积累了大量的实证依据,相信优化在我们关注的大多数问题上都有效。

Well, we know that optimization we now have, you know, huge amounts of empirical reasons to believe that optimization should work on most problems we care about.

Speaker 2

Do you

Speaker 1

有什么见解吗?你刚才提到了实证证据。

have insights of why? So you just said empirical evidence.

Speaker 1

你大部分的实证证据是否在某种程度上说服了你。

Does most of your sort of empirical evidence kind of convince you?

Speaker 1

这就像进化论是经验性的。

It's like evolution is empirical.

Speaker 1

它向你展示的是,看,这种进化过程似乎是设计能在环境中生存的生物体的好方法,但它并没有真正让你理解整个系统是如何运作的。

It shows you that, look, this evolutionary process seems to be a good way to design organisms that survive in their environment, but it doesn't really get you to the insights of how the whole thing works.

Speaker 2

嗯,我认为物理学是个很好的类比。

Well, I think a good analogy is physics.

Speaker 2

你知道人们常说,嘿,我们来做些物理计算,提出新的物理理论并做出预测。

You know how you say, hey, let's do some physics calculation and come up with some new physics theory and make some prediction.

Speaker 2

但之后你得进行实验验证。

But then you've got to run the experiment.

Speaker 2

你知道的,必须进行实验验证。

You know, you've got to run the experiment.

Speaker 2

这很重要。

It's important.

Speaker 2

所以这里的情况有点类似,只不过有时候实验可能先于理论出现。

So it's a bit the same here, except that maybe sometimes the experiment came before the theory.

Speaker 2

但道理依然成立。

But it still is the case.

Speaker 2

你看,你有些数据并做出预测,说:好,让我们构建一个大型神经网络,训练它,它的表现将远超以往任何模型,而且随着规模扩大性能会持续提升。

You know, you have some data and you come up with some prediction, you say, yeah, let's make a big neural network, let's train it, and it's going to work much better than anything before it, and it will in fact continue to get better as we make it larger.

Speaker 2

而事实证明确实如此。

And it turns out to be true.

Speaker 2

当理论以这种方式得到验证时,这真是太神奇了,你知道的。

That's that's amazing when a theory is validated like this, you know.

Speaker 2

这不是数学理论,几乎更像是一种生物学理论。

It's not a mathematical theory, it's more of a biological theory almost.

Speaker 2

所以我认为深度学习和生物学之间并非没有可比性。

So I think there are not terrible analogies between deep learning and biology.

Speaker 2

我会说这像是生物学和物理学的几何平均数。

I would say it's like the geometric mean of biology and physics.

Speaker 2

这就是深度学习。

That's deep learning.

Speaker 1

生物学与物理学的几何平均数。

The geometric mean of biology and physics.

Speaker 1

我想我需要几个小时来消化这个概念。

I think I'm gonna need a few hours to wrap my head around that.

Speaker 1

因为光是确定生物学所代表的那组参数就够复杂了。

Because just to find the set of what biology represents.

Speaker 2

生物学领域的事物极其复杂,理论体系也极为深奥,很难建立起具有良好预测性的理论。

Well, in biology, things are really complicated, and it's really hard to have good predictive theories.

Speaker 2

而在物理学中,理论又过于完美。

And in physics, the theories are too good.

Speaker 2

物理学理论上,人们能构建出极其精确的理论体系,做出惊人的预测。

In physics, people make these super precise theories, which make these amazing predictions.

Speaker 2

而在机器学习领域,我们某种程度上

And in machine learning, we're kinda

Speaker 1

介于两者之间。

in between.

Speaker 1

差不多介于中间。

Kinda in between.

Speaker 1

但如果机器学习能帮我们发现两者的统一性,而不是仅仅处于中间状态,那就太好了。

But it'd be nice if machine learning somehow helped us discover the unification of the two as opposed to sort of the in between.

Speaker 1

但你是对的。

But you're right.

Speaker 1

你这是在试图兼顾两者。

That's you're you're kinda trying to juggle both.

Speaker 1

所以你觉得

So do you

Speaker 2

神经网络中是否仍存在尚未被发现的优美而神秘的特性?

think there are still beautiful and mysterious properties in neural networks that are yet to be discovered?

Speaker 2

毫无疑问。

Definitely.

Speaker 2

我认为我们仍然严重低估了深度学习的潜力。

I think that we are still massively underestimating deep learning.

Speaker 2

你觉得它会

What do you think it

Speaker 1

呈现出什么形态?

will look like?

Speaker 1

比如说,如果

Like, what if

Speaker 2

如果我知道,我早就做出来了。

If I knew, I would have done it.

Speaker 1

是啊。

Yeah.

Speaker 1

所以

So

Speaker 2

但如果你回顾过去十年的进展,我会说大多数情况下,只有少数案例中真正新颖的想法出现过。

But if you look at all the progress from the past ten years, I would say most of it I would say there have been a few cases where things that felt like really new ideas showed up.

Speaker 2

但总体而言,每年我们都以为深度学习只能达到这种程度。

But by and large, it was every year, we thought, deep learning goes this far.

Speaker 2

结果并非如此。

Nope.

Speaker 2

它实际上走得更远。

It actually goes further.

Speaker 2

然后到了下一年,好吧,现在你该认为这就是深度学习的巅峰了。

And then the next year, okay, now this is peak deep learning.

Speaker 2

我们真的结束了。

We are really done.

Speaker 2

不。

Nope.

Speaker 2

它还能走得更远。

It goes further.

Speaker 2

每年它都在不断突破。

It just keeps going further each year.

Speaker 2

这意味着我们一直在低估它。

So that means that we keep underestimating.

Speaker 2

我们始终未能真正理解它。

We keep not understanding it.

Speaker 2

它总是展现出令人惊讶的特性。

It has surprising properties all the time.

Speaker 2

你是否觉得它变得越来越难了?

Do you think it's getting harder and harder?

Speaker 2

要取得进展吗?

To make progress?

Speaker 2

需要取得进展吗?

Need to make progress?

Speaker 2

这取决于我们的定义。

It depends on what we mean.

Speaker 2

我认为这个领域在未来相当长一段时间内将继续取得非常稳健的进展。

I think the field will continue to make very robust progress for quite a while.

Speaker 2

我认为对于个体研究者,尤其是正在做研究的人来说,可能会更困难,因为现在有非常大量的研究者。

I think for individual researchers, especially people who are doing research, it can be harder because there is a very large number of researchers right now.

Speaker 2

我认为如果你拥有大量算力,就能做出许多非常有趣的发现,但随之而来的是要应对管理庞大算力集群来运行实验的挑战。

I think that if you have a lot of compute, then you can make a lot of very interesting discoveries, but then you have to deal with the challenge of managing a huge compute cluster to run your experiments.

Speaker 2

确实会稍微困难一些。

It's a little bit harder.

Speaker 1

虽然我问的都是些无人知晓答案的问题,但你是我认识的最聪明的人之一,所以我还是要继续追问。

So I'm asking all these questions that nobody knows the answer to, but you're one of the smartest people I know, so I'm gonna keep asking.

Speaker 1

那么让我们想象一下未来三十年在深度学习领域的所有突破。

The so let's imagine all the breakthroughs that happen in the next thirty years in deep learning.

Speaker 1

你认为这些突破中的大多数可以由一个人用一台计算机完成吗?

Do you think most of those breakthroughs can be done by one person with one computer?

Speaker 1

在突破的范畴内,你认为计算资源和大型团队协作是否会是必要条件?

Sort of in the space of breakthroughs, do you think compute and large efforts will be necessary?

Speaker 1

我是说,我

I mean, I

Speaker 2

无法确定。

can't be sure.

Speaker 2

你说的一台计算机,具体是指多大规格的?

When you say one computer, you mean how large?

Speaker 1

你很聪明。

You're you're clever.

Speaker 1

我是指一块GPU。

I mean, one one GPU.

Speaker 2

我明白了。

I see.

Speaker 2

我认为这不太可能。

I think it's pretty unlikely.

Speaker 2

我认为深度学习的技术栈已经相当深了。

I think that the stack of deep learning is starting to be quite deep.

Speaker 2

如果你仔细看,从概念到系统构建、数据集创建、分布式编程、集群搭建、GPU编程,再到整体整合,涉及层面非常广。

If you look at it, you've got all the way from the ideas, the systems to build the datasets, the distributed programming, the building the actual cluster, the GPU programming, putting it all together.

Speaker 2

现在技术栈变得非常深,我认为一个人很难在每一层都达到世界顶尖水平。

So now the stack is getting really deep, and I think it can be quite hard for a single person to be world class in every single layer of the stack.

Speaker 1

那弗拉基米尔·瓦普尼克坚持的观点呢?比如用MNIST数据集尝试从少量样本中学习。

What about what, like, Vladimir Vapnik really insists on, taking MNIST and trying to learn from very few examples?

Speaker 1

也就是提高学习效率的能力。

So being able to learn more efficiently.

Speaker 1

你认为在那个领域会有不需要大量计算的突破吗?

Do you think there'll be breakthroughs in that space that may not need huge compute?

Speaker 2

我认为总体上会有大量突破不需要大量计算。

I think there will be a large number of breakthroughs in general that will not need a huge amount of compute.

Speaker 2

也许我应该澄清一下。

So maybe I should clarify that.

Speaker 2

我认为有些突破需要大量计算。

I think that some breakthroughs will require a lot of compute.

Speaker 2

我认为构建真正能实现功能的系统将需要巨大的计算量。

And I think building systems which actually do things will require a huge amount of compute.

Speaker 2

这一点相当明显。

That one is pretty obvious.

Speaker 2

如果你想做x对吧。

If you want to do x Right.

Speaker 2

而x需要一个庞大的神经网络,你就必须搞到一个庞大的神经网络。

And x requires a huge neural net, you gotta get a huge neural net.

Speaker 2

但我认为,小型团队和个人仍有大量空间可以做出非常重要的工作。

But I think there is lots of room for very important work being done by small groups and individuals.

Speaker 1

能否请你围绕深度学习科学这个话题,谈谈你最近发表的一篇论文?

Can you maybe, sort of on the topic of the science of deep learning, talk about one of the recent papers that you've released?

Speaker 1

当然。

Sure.

Speaker 1

关于深度双下降现象——嗯。

The deep double descent Mhmm.

Speaker 1

即模型越大、数据越多反而效果变差的情况。

Where bigger models and more data hurt.

Speaker 1

我认为这是篇非常有趣的论文。

I think it's a really interesting paper.

Speaker 2

你能描述下主要观点吗?好的。

Can you can you describe the main idea and Yeah.

Speaker 2

没问题。

Definitely.

Speaker 2

事情是这样的,多年来,少数研究人员注意到一个奇怪现象:当你增大神经网络规模时,它的表现反而更好,这似乎与统计学理念相矛盾。

So what happened is that over the years, some small number of researchers noticed that it is kind of weird that when you make the neural network larger, it works better; it seems to go in contradiction with statistical ideas.

Speaker 2

随后有人通过分析表明,实际上会出现这种双下降的凸起现象。

And then some people made an analysis showing that actually you got this double descent bump.

Speaker 2

而我们的研究则证明了双下降现象几乎在所有实际深度学习系统中都会发生。

And what we've done was to show that double descent occurs for pretty much all practical deep learning systems.

Speaker 2

而且这种现象还会持续...等等,你能先退一步解释吗?

And that it will be also so can you step back?

Speaker 1

双下降图的横轴和纵轴分别代表什么?

What's the x axis and the y axis of a double descent plot?

Speaker 1

好的。

Okay.

Speaker 1

很好。

Great.

Speaker 2

你可以这样观察:比如保持数据集不变,逐步缓慢增加神经网络的规模。

So you can do things like, you can take a neural network and start increasing its size slowly while keeping your dataset fixed.

Speaker 2

所以如果你缓慢增加神经网络的大小并且不进行早停,这是一个相当重要的细节。

So if you increase the size of the neural network slowly and if you don't do early stopping, that's a pretty important detail.

Speaker 2

然后当神经网络非常小时,你把它扩大。

Then when the neural network is really small, you make it larger.

Speaker 2

你会看到性能的快速提升。

You get a very rapid increase in performance.

Speaker 2

然后你继续扩大它,在某个时刻,性能会变差,并且恰恰在达到零训练误差、精确零训练损失时变得最差。

Then you continue to make it larger, and at some point, performance will get worse, and it gets the worst exactly at the point at which it achieves zero training error, precisely zero training loss.

Speaker 2

然后随着你把它做得更大,它的性能又开始变好。

And then as you make it larger, it starts to get better again.

Speaker 2

这有点反直觉,因为你预期深度学习现象应该是单调的。

And it's kinda counterintuitive because you'd expect deep learning phenomena to be monotonic.

Speaker 2

很难确定这意味着什么,但这种现象也出现在线性分类器中,直觉基本上可以归结为以下几点。

And it's hard to be sure what it means, but it also occurs in in the case of linear classifiers, and the intuition basically boils down to the following.

Speaker 2

当你拥有大型数据集和小模型时,那么微小的随机... 基本上,什么是过拟合?

When you have a large dataset and a small model, then small, tiny random so basically, what is overfitting?

Speaker 2

过拟合是指你的模型对数据集中那些微小、随机且不重要的细节异常敏感。

Overfitting is when your model is somehow very sensitive to the small random unimportant stuff in your dataset.

Speaker 2

在训练数据中。

In the training data.

Speaker 2

准确地说,是在训练数据集中。

In the training dataset, precisely.

Speaker 2

所以,如果你有一个小模型和一个大数据集,可能会存在一些随机情况,比如某些训练样本随机出现在数据集中而其他样本则没有。但小模型对这种随机性不太敏感,因为当数据集很大时,模型几乎不存在不确定性。

So if you have a small model and a big dataset, there may be some random things, you know, some training cases are randomly in the dataset and others may not be there. But the small model is kind of insensitive to this randomness, because there is pretty much no uncertainty about the model when the dataset is large.

Speaker 2

好的。

So okay.

Speaker 2

所以对我来说,最基础也最令人惊讶的是神经网络不会每次都快速过拟合

So at the very basic level to me, it is the most surprising thing that neural networks don't overfit every time very quickly

Speaker 1

在还没能学到任何东西之前。

before ever being able to learn anything.

Speaker 1

参数数量如此庞大。

The huge number of parameters.

Speaker 2

所以这里有一种方法。

So here is so there is one way.

Speaker 2

好的。

Okay.

Speaker 2

那么也许让我试着给出解释。

So maybe let me try to give the explanation.

Speaker 2

或许这个解释能行得通。

Maybe that will be that will work.

Speaker 2

假设你有一个庞大的神经网络。

So you got a huge neural network.

Speaker 2

让我们假设你拥有一个巨大的神经网络。

Let's suppose you have a huge neural network.

Speaker 2

你有大量的参数。

You have a huge number of parameters.

Speaker 2

现在让我们假装所有东西都是线性的——虽然实际上并非如此,就暂且这么假设。

And now let's pretend everything is linear which is not true, but let's just pretend.

Speaker 2

那么这里就存在一个大的子空间,你的神经网络在其中可以达到零误差。

Then there is this big subspace where your neural network achieves zero error.

Speaker 2

嗯。

Mhmm.

Speaker 2

而SGD将会近似地找到那个点。

And SGD is going to find approximately the point

Speaker 2

没错。

That's right.

Speaker 2

对。

Yeah.

Speaker 2

近似找到该子空间中范数最小的点。

Approximately the point with the smallest norm in that subspace.

Speaker 2

好的。

Okay.

Speaker 2

而且可以证明,在高维情况下,这个方法对数据中的微小随机性是不敏感的。

And that can also be proven to be insensitive to the small randomness in the data when the dimensionality is high.

Speaker 2

但当数据维度与模型维度相等时,所有数据集与模型之间就存在一一对应的关系。

But when the dimensionality of the data is equal to the dimensionality of the model, then there is a one to one correspondence between all the datasets and the models.

Speaker 2

因此数据集的微小变化实际上会导致模型的巨大变化,这就是性能变差的原因。

So small changes in the dataset actually lead to large changes in the model, and that's why performance gets worse.

Speaker 2

所以这大致上是最佳解释了。

So this is the best explanation more or less.
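
This intuition is easy to see numerically in the linear case (a small illustrative sketch, not from the paper): when the number of parameters equals the number of data points, there is a one-to-one map from targets to the fitted model, so a tiny change in the targets moves the minimum-norm solution a lot; with many more parameters than data points, it barely moves.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40  # number of training points

def sensitivity(n_params):
    # How far the minimum-norm solution moves per unit change in targets.
    A = rng.normal(size=(n, n_params))
    b = rng.normal(size=n)
    db = 1e-3 * rng.normal(size=n)  # tiny perturbation of the data
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]       # min-norm solution
    x1 = np.linalg.lstsq(A, b + db, rcond=None)[0]
    return float(np.linalg.norm(x1 - x0) / np.linalg.norm(db))

s_square = sensitivity(n)     # params == data points: fragile
s_wide = sensitivity(10 * n)  # params >> data points: stable
# s_square is typically much larger than s_wide.
```

`np.linalg.lstsq` returns the minimum-norm solution for underdetermined systems, which is the role SGD plays (approximately) in the explanation above.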

Speaker 1

那么模型最好拥有更多参数,使其规模大于数据量。

So then it would be good for the model to have more parameters so to be bigger than the data.

Speaker 2

没错。

That's right.

Speaker 2

但前提是你没有提前停止训练。

But only if you don't early stop.

Speaker 2

如果在正则化中引入早停机制,你几乎可以完全消除双下降的凸起现象。

If you introduce early stop in your regularization, you can make the double descent bump almost completely disappear.

Speaker 2

什么是早停?

What is early stop?

Speaker 2

早停法是指在训练模型的同时,监控测试集(验证集)上的性能。

Early stopping is when you train your model and you monitor your test, your validation, performance.

Speaker 2

然后如果在某个时间点验证性能开始变差,你就说,好吧,我们停止训练。

And then if at some point validation performance starts to get worse, you say, okay, let's stop training.

Speaker 2

我们做得很好,我们做得很好。

We are good we are good.

Speaker 2

我们已经足够好了。

We are good enough.
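
The rule just described fits in a few lines (a generic sketch; the `patience` window is an assumption, not something specified in the episode):

```python
def early_stop(val_losses, patience=3):
    """Return the index of the best epoch, stopping once validation
    loss has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # validation performance stopped improving: stop training
    return best_epoch

# A U-shaped validation curve: overfitting starts after epoch 3.
curve = [1.0, 0.6, 0.4, 0.35, 0.4, 0.5, 0.7, 0.9]
best = early_stop(curve)  # -> 3
```

In practice you would keep the model checkpoint saved at `best`, which is exactly why, as discussed next, early stopping hides the double-descent bump: training never reaches the interpolation regime.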

Speaker 1

所以魔法是在那个时刻之后发生的,所以你不想进行早停?

So the magic happens after that moment, so you don't wanna do the early stopping?

Speaker 2

嗯,如果你不进行早停,你会得到非常明显的双下降现象。

Well, if you don't do the early stopping, you get the very pronounced double descent.

Speaker 1

你对为什么会发生这种情况有什么直觉吗?

Do you have any intuition why this happens?

Speaker 2

双下降?

Double descent?

Speaker 2

哦,抱歉。

Oh, sorry.

Speaker 2

你要停止吗?

Are you stopping?

Speaker 1

不。

No.

Speaker 1

双下降现象。

The double descent.

Speaker 1

所以

So the

Speaker 2

哦,对。

Oh, yeah.

Speaker 2

那我试着解释一下。

So I try let's see.

Speaker 2

直观理解基本是这样的:当数据集的自由度与模型相当时,两者间就存在一一对应关系。

The intuition is basically this: when the dataset has as many degrees of freedom as the model, then there is a one-to-one correspondence between them.

Speaker 2

因此数据集中的微小变化会导致模型产生明显变化。

And so small changes to the dataset lead to noticeable changes in the model.

Speaker 2

所以你的模型对所有随机性都非常敏感。

So your model is very sensitive to all the randomness.

Speaker 2

它无法将其丢弃。

It is unable to discard it.

Speaker 2

而事实证明,当你的数据量远多于参数,或参数远多于数据时,最终解决方案对数据集中的微小变化将不敏感。

Whereas, it turns out that when you have a lot more data than parameters, or a lot more parameters than data, the resulting solution will be insensitive to small changes in the dataset.

Speaker 1

哦,所以它能——这个说法很贴切——丢弃那些微小变化和随机性。

Oh, so it's able to, that's nicely put, discard the small changes, the the randomness.

Speaker 1

正是如此。

Exactly.

Speaker 2

那些你不想要的虚假相关性。

The spurious correlations which you don't want.

Speaker 1

杰夫·辛顿曾建议我们需要抛弃反向传播。

Geoff Hinton suggested we need to throw away backpropagation.

Speaker 1

我们之前已经稍微讨论过这个话题,但他建议我们应该抛弃反向传播,从头开始。

We already kinda talked about this a little bit, but he suggested that we need to throw away back propagation and start over.

Speaker 1

当然,这其中有些话带着几分机智和幽默,但你怎么看?

I mean, of course, some of that is a little bit wit and humor, but what do you think?

Speaker 1

训练神经网络还能有什么替代方法呢?

What could be an alternative method of training neural networks?

Speaker 2

嗯,他确切表达的意思是,既然在大脑中找不到反向传播机制,就值得探讨是否能从大脑的学习方式中汲取经验,但反向传播非常实用,应当继续使用它。

Well, the thing that he said precisely is that to the extent that you can't find backpropagation in the brain, it's worth seeing if we can learn something from how the brain learns, but backpropagation is very useful and you should keep using it.

Speaker 1

哦,你是说一旦我们发现了大脑的学习机制或该机制的任何方面,我们也应该尝试在你的网络中实现它。

Oh, you're saying that once we discover the mechanism of learning in the brain or any aspects of that mechanism, we should also try to implement that in your network.

Speaker 2

如果事实证明我们无法在大脑中找到反向传播。

If it turns out that we can't find backpropagation in the brain.

Speaker 1

如果我们无法在大脑中找到反向传播。

If we can't find backpropagation in the brain.

Speaker 1

那么,我想你对这个问题的回答是反向传播非常有用。

Well, so I guess your answer to that is backpropagation is pretty damn useful.

Speaker 1

那我们为什么还要抱怨呢?

So why are we complaining?

Speaker 2

我个人非常推崇反向传播算法。

I mean, I I personally am a big fan of backpropagation.

Speaker 2

我认为这是一个伟大的算法,因为它解决了一个极其基础的问题——在特定约束条件下寻找神经回路。

I think it's a great algorithm because it solves an extremely fundamental problem, which is finding a neural circuit subject to some constraints.

Speaker 2

我看不出这个问题会消失,因此我认为我们不太可能找到完全不同的替代方案。

And I don't see that problem going away, so I really think it's pretty unlikely that we'll have anything which is going to be dramatically different.

Speaker 2

这有可能发生,但我现在不会押注于此。

It could happen, but I wouldn't bet on it right now.

Speaker 1

所以让我问一个宏观的问题。

So let me ask a sort of big picture question.

Speaker 1

你认为神经网络能够被设计成具备推理能力吗?

Do you think neural networks can be made to reason?

Speaker 2

为什么不能呢?

Why not?

Speaker 2

比如看看AlphaGo或AlphaZero,AlphaZero的神经网络下围棋——我们都认同这是一个需要推理的游戏——其表现超过了99.9%的人类,仅凭神经网络本身,不依赖搜索算法,这难道不是证明了神经网络具备推理能力的存在性证据吗?

Well, if you look, for example, at AlphaGo or AlphaZero, the neural network of AlphaZero plays Go, which we all agree is a game that requires reasoning, better than 99.9% of all humans, just the neural network, without the search, just the neural network itself. Doesn't that give us an existence proof that neural networks can reason?

Speaker 1

我要稍微反驳一下,虽然我们都认同围棋需要推理。

To push back and disagree a little bit: we all agree that Go requires reasoning.

Speaker 1

我想...我同意。

I think... I agree.

Speaker 1

但这并非微不足道——显然,推理和智能一样,某种程度上是个模糊的灰色地带概念。

I don't think it's trivial. So, obviously, reasoning, like intelligence, is a loose, gray area term a little bit.

Speaker 1

或许你不同意这点。

Maybe you disagree with that.

Speaker 1

不过确实,我认为它具备了推理的某些要素。

But, yes, I think it has some of the same elements of reasoning.

Speaker 1

推理几乎类似于搜索过程。

Reasoning is almost like akin to search.

Speaker 1

对吧?

Right?

Speaker 1

存在一种逐步考虑可能性的序列化元素,并以顺序方式在这些可能性基础上构建,直到得出某种洞见。

There's a sequential element of stepwise consideration of possibilities and sort of building on top of those possibilities in a sequential manner until you arrive at some insight.

Speaker 1

所以,是的,我认为围棋对弈某种程度上就是这样,当单个神经网络无需搜索就能做到时,它确实类似这种过程。

So, yeah, I guess playing Go is kind of like that, and when you have a single neural network doing that without search, it's kind of like that.

Speaker 1

因此在特定受限环境中,存在一个类似于许多人称之为推理过程的实证,但更广义的推理形式。

So there's an existence proof in a particular constrained environment that a process akin to what many people call reasoning exists, but a more general kind of reasoning...

Speaker 1

所以脱离棋盘而言。

So off the board.

Speaker 1

还有另一个实证存在。

There is one other existence proof.

Speaker 1

哦,天哪。

Oh, boy.

Speaker 1

是哪一个?

Which one?

Speaker 1

我们人类?

Us humans?

Speaker 1

是的。

Yes.

Speaker 1

好的。

Okay.

Speaker 1

明白了。

Alright.

Speaker 1

那么你认为能让神经网络具备推理能力的架构,会与我们现有的神经网络架构相似吗?

So do you think the architecture that will allow neural networks to reason will look similar to the neural network architectures we have today?

Speaker 2

我认为会的——不过,我不想把话说得太绝对。

I think it will... well, I don't wanna make overly definitive statements.

Speaker 2

我确信未来实现推理突破的神经网络,很有可能与现有架构非常相似。

I think it's definitely possible that the neural networks that will produce the reasoning breakthroughs of the future will be very similar to the architectures that exist today.

Speaker 2

或许会更循环一些,或许层级更深一些,但这些神经网络的潜力已经强大到不可思议。

Maybe a little bit more recurrent, maybe a little bit deeper, but these neural nets are so insanely powerful.

Speaker 2

为什么它们不能学会推理呢?

Why wouldn't they be able to learn to reason?

Speaker 2

人类能够推理,为什么神经网络不行?

Humans can reason, so why can't neural networks?

Speaker 1

你认为我们目前看到的神经网络表现是否只是一种弱推理?

So do you think the kind of stuff we've seen neural networks do is a kind of just weak reasoning?

Speaker 1

所以这不是一个根本不同的过程?

So it's not a fundamentally different process?

Speaker 1

再次强调,这些问题我们没人知道答案。

Again, this is stuff nobody knows the answer to.

Speaker 2

说到神经网络,我认为神经网络确实具备推理能力。

So when it comes to neural networks, the thing which I would say is that neural networks are capable of reasoning.

Speaker 2

但如果你训练神经网络的任务不需要推理,它就不会进行推理。

But if you train a neural network on a task which doesn't require reasoning, it's not going to reason.

Speaker 2

这是一个众所周知的效应:神经网络会以最简单的方式精确解决你摆在它面前的问题。

This is a well known effect, where the neural network will solve exactly the problem that you pose in front of it, in the easiest way possible.
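
The "easiest way possible" effect can be made concrete with a toy sketch (my own illustration, with a made-up dataset): a learner that simply keeps whichever single feature best predicts the training labels will latch onto a spurious shortcut feature when one is available, instead of the intended, noisier signal.

```python
# Toy sketch (my own illustration): a minimal learner that scores each feature
# by how well it alone predicts the label, then keeps the best one. If a
# spurious feature happens to correlate perfectly with the label in the
# training set, this "easiest way possible" wins.

# Each example: (features, label). Feature 0 is the intended signal but is
# noisy; feature 1 is a spurious artifact that matches the label exactly here.
train = [
    ([1, 1], 1),
    ([0, 0], 0),
    ([1, 1], 1),
    ([0, 1], 1),   # feature 0 is wrong here, feature 1 still matches
    ([1, 0], 0),   # feature 0 is wrong here too
]

def accuracy(feature_index, data):
    """Fraction of examples where the single feature equals the label."""
    return sum(feats[feature_index] == label for feats, label in data) / len(data)

# The learner picks the spurious feature 1 (accuracy 1.0) over feature 0 (0.6).
best_feature = max(range(2), key=lambda i: accuracy(i, train))
```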

Speaker 1

对。

Right.

Speaker 1

这让我们想到你描述神经网络的一个精妙方式——你将神经网络称为对小电路的寻找,而通用智能或许是对小程序的一种追寻。

That takes us to one of the brilliant ways you describe neural networks, which is you've referred to neural networks as the search for small circuits, and maybe general intelligence as the search for small programs,

Speaker 1

我觉得这个比喻非常引人入胜。

which I found as a metaphor very compelling.

Speaker 2

你能详细说明一下这个区别吗?

Can you elaborate on that difference?

Speaker 2

是的。

Yeah.

Speaker 2

所以我刚才准确表达的意思是,如果你能找到输出你手头数据的最短程序,那么你就能用它做出最佳预测。

So the thing which I said precisely was that if you can find the shortest program that outputs the data at your disposal, then you will be able to use it to make the best prediction possible.

Speaker 1

嗯。

Mhmm.

Speaker 2

这是一个可以通过数学证明的理论陈述。

And that's a theoretical statement which can be proven mathematically.

Speaker 2

现在,你也可以用数学证明,找到生成某些数据的最短程序是一个不可计算的操作。

Now, you can also prove mathematically that finding the shortest program which generates some data is not a computable operation.

Speaker 2

任何有限的计算资源都无法完成这个任务。

No finite amount of compute can do this.
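
To make the idea concrete, here is a toy sketch (my own, with a made-up miniature expression language, not anything from the conversation): a bounded brute-force search for the shortest expression reproducing a data sequence. The true shortest-program search is uncomputable, which is exactly why this toy must cap program length and restrict itself to expressions that always halt.

```python
# Toy sketch: brute-force search for the shortest expression (in a tiny,
# made-up DSL) that reproduces a data sequence. The real problem is
# uncomputable; this only works because we cap expression length and use
# total (always-halting) arithmetic expressions.
ATOMS = ["i", "1", "2"]
OPS = ["+", "*"]

def programs(max_rounds):
    """Enumerate expressions like 'i*i' roughly in order of increasing size."""
    exprs = list(ATOMS)
    for _ in range(max_rounds):
        yield from exprs
        exprs = [a + op + b for a in exprs for op in OPS for b in ATOMS]

def find_shortest(data, max_rounds=3):
    """Return the first (hence shortest-found) expression matching the data."""
    for expr in programs(max_rounds):
        try:
            # Python operator precedence applies; fine for this tiny DSL.
            if all(eval(expr, {}, {"i": i}) == y for i, y in enumerate(data)):
                return expr
        except Exception:
            pass
    return None

prog = find_shortest([0, 1, 4, 9, 16])  # the squares: found as "i*i"
```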

Speaker 2

因此,神经网络在实践中成为了次优但切实可行的解决方案。

So then, neural networks are the next best thing that actually works in practice.

Speaker 2

我们虽然无法找到生成数据的最短程序,但能够找到一个相对紧凑的解决方案——不过现在这个表述需要修正。

We are not able to find the shortest program which generates our data, but we are able to find, you know, a small... but now that statement should be amended.

Speaker 2

甚至是一个以某种方式拟合我们数据的大型电路。

Even a large circuit which fits our data in some way.

Speaker 1

我认为你所说的'小型电路'指的是所需的最小规模电路。

Well, I think what you meant by the small circuit is the smallest needed circuit.

Speaker 2

嗯,我现在想修正当时的观点——那时我还没有完全理解过参数化研究结果的深刻含义。

Well, the thing which I would change now: back then, I hadn't fully internalized the overparameterization results.

Speaker 2

关于过参数化神经网络,我现在会这样表述:它是一个大型电路,其权重包含少量信息,我认为这正是其运作机制。

Given the things we know about overparameterized neural nets, now I would phrase it as a large circuit whose weights contain a small amount of information, which I think is what's going on.

Speaker 2

如果将神经网络的训练过程想象为缓慢地将熵从数据集传递到参数,那么权重中的信息量最终不会很大,这解释了它们为何泛化能力如此出色。

If you imagine the training process of a neural network as you slowly transmit entropy from the dataset to the parameters, then somehow the amount of information in the weights ends up being not very large, which would explain why they generalize so well.

Speaker 1

所以这个大型电路可能正是有助于正则化、促进泛化的关键。

So the large circuit might be one that's helpful for the regularization, for the generalization.

Speaker 1

是的。

Yeah.

Speaker 1

类似这样的。

Something like this.

Speaker 1

但你认为尝试学习类似程序的东西很重要吗?

But do you see it as important to be able to try to learn something like programs?

Speaker 2

我是说,如果可以的话,那当然重要。

I mean, if we can, definitely.

Speaker 2

我认为答案某种程度上是肯定的,如果我们能做到的话。

I think the answer is kind of yes, if we can do it.

Speaker 2

我们应该做力所能及之事。

We should do the things that we can do.

Speaker 2

这正是我们推动深度学习的根本原因——我们能够训练它们。

It's the reason we are pushing on deep learning; the fundamental reason, the root cause, is that we are able to train them.

Speaker 2

换句话说,训练是第一位的。

So in other words, training comes first.

Speaker 2

我们已经确立了训练这一支柱。

We've got our pillar, which is the training pillar.

Speaker 2

现在我们正试图让神经网络围绕训练支柱进行调整。

And now we are trying to contort our neural networks around the training pillar.

Speaker 2

我们必须保持可训练性。

We gotta stay trainable.

Speaker 2

这是一条我们无法违背的不变法则。

This is an invariant we cannot violate.

Speaker 2

因此,可训练性意味着从零开始,一无所知,

And so being trainable means starting from scratch, knowing nothing,

Speaker 1

你实际上可以相当快速地积累大量知识。

you can actually pretty quickly converge towards knowing a lot.

Speaker 2

或者甚至慢慢来。

Or even slowly.

Speaker 2

但这意味着,利用你手头的资源,你可以训练神经网络并使其达到有用的性能水平。

But it means that given the resources at your disposal, you can train the neural net and get it to achieve useful performance.

Speaker 2

没错。

Yeah.

Speaker 1

这是我们无法背离的支柱。

That's a pillar we can't move away from.

Speaker 2

说得对。

That's right.

Speaker 2

因为如果你说‘嘿,让我们找最短程序’,我们做不到。

Whereas if you say, hey, let's find the shortest program, well, we can't do that.

Speaker 2

所以不管那会有多有用,我们做不到,所以就不做。

So it doesn't matter how useful that would be, we can't do it, so we won't.

Speaker 1

那么你认为神经网络擅长寻找小型或大型电路,是这样吗?

So you kinda mentioned that neural networks are good at finding small circuits, or large circuits.

Speaker 1

那你觉得寻找小程序的问题是否仅仅在于数据?

Do you think then the matter of finding small programs is just the data?

Speaker 1

不是。

No.

Speaker 1

抱歉打断一下。

Sorry, to cut in.

Speaker 1

不是规模或特性的问题。

Not the size or character.

Speaker 1

是数据的类型问题。

The type of data.

Speaker 1

某种程度上说是给它提供程序的问题。

Sort of, giving it programs.

Speaker 2

嗯,我认为目前的问题是,还没有人能够很好地成功找到程序的先例。

Well, I think the thing is that right now, there are no good precedents of people successfully finding programs really well.

Speaker 2

所以基本上,你要找到程序的方法就是训练一个深度神经网络来完成这个任务。

And so the way you'd find programs is you'd train a deep neural network to do it, basically.

Speaker 2

对。

Right.

Speaker 2

这确实是正确的方向。

Which is the right way to go about it.

Speaker 1

但目前还没有很好的实例来证明这一点。

But there aren't good illustrations of that.

Speaker 2

虽然尚未实现,但原则上应该是可行的。

It hasn't been done yet, but in in principle, it should be possible.

Speaker 1

你能详细说明一下吗?

Can you elaborate a little bit?

Speaker 1

你的原则性见解是什么?

What's your insight, in principle?

Speaker 1

换句话说,你认为没有理由不可能实现。

And and put another way, you don't see why it's not possible.

Speaker 2

嗯,这更像是一种表态——我认为押注反对深度学习是不明智的。

Well, it's more a statement of: I think that it's unwise to bet against deep learning.

Speaker 2

如果这是人类似乎能够完成的认知功能,那么用不了多久就会出现能实现同样功能的深度神经网络。

If it's a cognitive function that humans seem to be able to do, then it doesn't take too long for some deep neural net to pop up that can do it too.

Speaker 1

是啊。

Yeah.

Speaker 1

我完全赞同你的观点。

I'm there with you.

Speaker 1

我已经不再质疑神经网络了,因为它们不断给我们带来惊喜。

I've stopped betting against neural networks at this point because they continue to surprise us.

Speaker 1

那长期记忆呢?

What about long term memory?

Speaker 1

神经网络能拥有长期记忆或类似知识库的功能吗?

Can neural networks have long term memory, or something like knowledge bases?

Speaker 1

能够长期聚合重要信息,这些信息随后可以作为有用的状态表征来辅助决策,从而拥有一个基于长期背景的决策依据。

So being able to aggregate important information over long periods of time that would then serve as useful sort of representations of state that you can make decisions by, so have a long term context based on which you're making the decision.

Speaker 1

所以在某种意义上,

So in some sense,

Speaker 2

参数已经实现了这一点。

the parameters already do that.

Speaker 2

这些参数是整个神经网络经验的聚合,因此它们可以被视为长期知识的体现。

The parameters are an aggregation of the entirety of the neural net's experience, and so they count as long term knowledge.

Speaker 2

人们已经训练了各种神经网络作为知识库,并且研究了这个领域——人们研究了语言模型作为知识库的应用。

And people have trained various neural nets to act as knowledge bases, and, you know, people have investigated language models as knowledge bases.

Speaker 2

所以这方面确实有研究工作在进行。

So there is work there.

Speaker 2

是的。

Yeah.

Speaker 1

但从某种意义上说,你认为在所有方面都是如此吗?

But in some sense, do you think in every sense?

Speaker 1

你认为这是否只是一个关于建立更好的遗忘无用信息、记住有用信息的机制的问题?

Do you think it's all just a matter of coming up with a better mechanism of forgetting the useless stuff and remembering the useful stuff?

Speaker 1

因为目前还没有真正能长期记忆信息的机制。

Because right now, I mean, there haven't been mechanisms that remember really long term information.

Speaker 2

你具体是指什么?

What do you mean by that precisely?

Speaker 1

精确。

Precisely.

Speaker 1

我喜欢'精确'这个词。

I like the word precisely.

Speaker 1

我在思考知识库所代表的信息压缩方式,某种程度上创造了一个——抱歉我用人类中心主义的方式思考知识,因为神经网络所发现的知识未必是可解释的。

So I'm thinking of the kind of compression of information that knowledge bases represent. Now, I apologize for my sort of human centric thinking about what knowledge is, because neural networks aren't necessarily interpretable in the kind of knowledge they have discovered.

Speaker 1

但是,对我来说,一个很好的例子是知识库,能够逐步构建类似维基百科所代表的那种知识体系。

But a good example for me is knowledge bases, being able to build up over time something like the knowledge that Wikipedia represents.

Speaker 1

它是一个高度压缩、结构化的知识库。

It's a really compressed, structured knowledge base.

Speaker 1

当然,不是指实际的维基百科或其语言,而是像语义网络那样,实现语义网络所代表的理想。

Obviously, not the actual Wikipedia or the language, but like a semantic web, the dream that semantic web represented.

Speaker 1

因此这是一个非常精炼的知识库,或者说类似于神经网络那种不可解释形式的存在。

So it's a really nice compressed knowledge base or something akin to that in a non interpretable sense as neural networks would have.

Speaker 2

嗯,如果看神经网络的权重参数确实不可解释,但它们的输出应该是非常可解释的。

Well, neural networks would be non interpretable if you look at their weights, but their outputs should be very interpretable.

Speaker 1

好的。

Okay.

Speaker 1

那么,如何让像语言模型这样非常智能的神经网络变得可解释呢?

So, yeah, how do you make very smart neural networks, like language models, interpretable?

Speaker 2

你可以要求它们生成一些文本,而这些文本通常都是可解释的。

Well, you ask them to generate some text, and the text will generally be interpretable.

Speaker 1

你认为这就是可解释性的典范吗?

Do you find that the epitome of interpretability?

Speaker 1

比如,你能做得更好吗?

Like, can you do better?

Speaker 1

比如,你能...你不能,好吧。

Like, can you you can't okay.

Speaker 1

我想知道它了解什么,不了解什么。

I would like to know what does it know and what doesn't it know.

Speaker 1

我希望神经网络能举出它完全愚笨的例子和完全出色的例子。

I would like the neural network to come up with examples where it's completely dumb and examples where it's completely brilliant.

Speaker 1

目前我所知的唯一方法就是生成大量示例,然后用人类判断力来评估。

And the only way I know how to do that now is to generate a lot of examples and use my human judgment.

Speaker 1

但如果神经网络能具备某种自我认知能力会更好

But it would be nice if a neural network had some self awareness about

Speaker 2

是啊。

Yeah.

Speaker 2

100%。

100%.

Speaker 2

我非常相信自我意识的重要性,我认为新型神经网络的自我意识将实现你描述的那些能力——让它们知道自己知道什么、不知道什么,并明白应该在哪里投入资源才能最优化地提升技能。

I'm a big believer in self awareness, and I think that neural net self awareness will allow for capabilities like the ones you described: for them to know what they know and what they don't know, and to know where to invest to increase their skills most optimally.

Speaker 2

关于你提到的可解释性问题,实际上有两个答案。

And to your question of interpretability, there are actually two answers to that question.

Speaker 2

一个答案是,我们拥有神经网络,因此可以分析神经元,尝试理解不同神经元和不同层次的含义。

One answer is, you know, we have the neural net, so we can analyze the neurons, and we can try to understand what the different neurons and different layers mean.

Speaker 2

事实上你确实可以做到这一点,OpenAI在这方面已经做了一些工作。

And you can actually do that, and OpenAI has done some work on that.

Speaker 2

但还有另一种答案,我认为那是以人类为中心的答案,就像你观察一个人时,你无法直接读取他们的想法,那你怎么知道人类在想什么呢?

But there is a different answer, which I would say is the human centric answer, where you say: you look at a human being, you can't read their mind, so how do you know what a human being is thinking?

Speaker 2

你会直接询问他们:'嘿,你对这个怎么看?'

You ask them, you say, hey, what do you think about this?

Speaker 2

'你对那个有什么看法?'

What do you think about that?

Speaker 2

然后你会得到一些答案。

And you get some answers.

Speaker 1

你得到的答案具有粘性,因为你已经有一个心理模型。

The answers you get are sticky in the sense you already have a mental model.

Speaker 1

你已经有一个心理模型了,是的。

You already have an yeah.

Speaker 1

一个关于那个人的心理模型。

A mental model of that human being.

Speaker 1

你已经对那个人有了大致的概念,了解他们的思维方式、认知模式以及世界观,而你提出的每个问题都在此基础上不断补充。

You already have, like, a big conception of that human being, how they think, what they know, how they see the world, and then everything you ask, you're adding onto that.

Speaker 1

这种粘性似乎正是人类非常有趣的特质之一——信息具有粘性。

And that stickiness seems to be one of the really interesting qualities of the human being: information is sticky.

Speaker 1

你似乎会记住有用的信息,很好地整合它们,而忘记大部分无用的内容。

You seem to remember the useful stuff, aggregate it well, and forget most of the information that's not useful.

Speaker 1

这个过程与神经网络的工作方式也颇为相似。

But that process is also pretty similar to what neural networks do.

Speaker 1

只是目前神经网络在这方面要差得多。

It's just that neural networks are much crappier at this, at this time.

Speaker 1

这并不是说它们在根本上有多么不同。

It doesn't seem to be fundamentally that different.

Speaker 1

不过还是让我们继续讨论推理这个话题。

But let's stick on reasoning for a little longer.

Speaker 1

你刚才问,为什么不呢?

You said, why not?

Speaker 1

为什么我不能进行推理?

Why can't I reason?

Speaker 1

对你来说,什么样的推理能力表现能让你印象深刻?如果神经网络能做到什么会让你感到惊艳?

What's a good, impressive benchmark of reasoning to you, one that you'd be impressed by if neural networks were able to do it?

Speaker 1

你心里已经有这样的标准了吗?

Is that something you already have in mind?

Speaker 2

嗯,我认为是写出真正优秀的代码,证明非常困难的定理,用创新方案解决开放式问题。

Well, I think writing really good code, proving really hard theorems, solving open ended problems with out of the box solutions.

Speaker 1

以及各类定理型的数学问题。

And sort of theorem type mathematical problems.

Speaker 1

是啊。

Yeah.

Speaker 2

我认为这些也是非常自然的例子。

I think those ones are a very natural example as well.

Speaker 2

你知道,如果能证明一个未被证实的定理,那就很难说它不会推理了。

You know, if you can prove an unproven theorem, then it's hard to argue it doesn't reason.

Speaker 2

所以,顺便说一句,这又回到了关于硬性结果的观点上。

And so, by the way, this comes back to the point about hard results, you know.

Speaker 2

如果你拥有机器学习,深度学习这个领域是非常幸运的,因为我们有时能够产出这些明确无误的结果。

Deep learning as a field is very fortunate, because we have the ability to sometimes produce these unambiguous results.

Speaker 2

当这些结果出现时,争论就会改变,对话也会改变。

And when they happen, the debate changes, the conversation changes.

Speaker 2

你拥有产出能改变对话走向的结果的能力。

You have the ability to produce conversation changing results.

Speaker 1

然后,当然就像你说的,人们会逐渐把这视为理所当然,并说那其实不算什么难题。

And then, of course, just like you said, people kinda take that for granted and say that wasn't actually a hard problem.

Speaker 2

嗯,我是说,总有一天我们可能会耗尽所有难题。

Well, I mean, at some point, we'll probably run out of hard problems.

Speaker 1

是啊。

Yeah.

Speaker 1

那个关于死亡率的难题,至今仍是个我们尚未解决的棘手问题。

That whole mortality thing is kind of a sticky problem that we haven't quite figured out.

Speaker 1

或许我们能解决它。

Maybe we'll solve that one.

Speaker 1

我认为在你整个研究体系中,也包括OpenAI近期的成果,最引人入胜的变化之一发生在语言模型领域。

I think one of the fascinating things in in your entire body of work, but also the work at OpenAI recently, one of the conversation changes has been in the world of language models.

Speaker 1

你能简要描述一下神经网络在语言和文本领域应用的近期发展历程吗?

Can you briefly kinda try to describe the recent history of using neural networks in the domain of language and text?

Speaker 2

嗯,这段历史相当丰富。

Well, there's been lots of history.

Speaker 2

我认为Elman网络是八十年代应用于语言处理的一个小型循环神经网络。

I think the Elman network was a small, tiny recurrent neural network applied to language back in the eighties.

Speaker 2

所以这段历史其实相当悠久。

So the history is really, you know, fairly long at least.

Speaker 2

真正改变神经网络和语言发展轨迹的,也是改变整个深度学习轨迹的,就是数据和算力。

And the thing that changed the trajectory of neural networks and language is the thing that changed the trajectory of all deep learning: data and compute.

Speaker 2

于是突然间,我们就从小型语言模型(只能学到一点点知识)

So suddenly, you move from small language models, which learn a little bit.

Speaker 2

特别是语言模型,有一个非常明确的解释说明为什么它们必须足够大才能表现良好。

And with language models in particular, there's a very clear explanation for why they need to be large to be good.

Speaker 2

因为它们要预测下一个单词。

Because they're trying to predict the next word.

Speaker 1

嗯。

Mhmm.

Speaker 2

所以当你一无所知时,你会注意到非常宽泛的表面模式,比如有时字符之间会有空格

So when you don't know anything, you'll notice very broad strokes, surface level patterns, like sometimes there are characters and there is a space between those characters.

Speaker 2

你会注意到这种模式

You'll notice this pattern.

Speaker 2

你还会注意到有时会出现逗号,然后下一个字符是大写字母

And you'll notice that sometimes there is a comma and then the next character is a capital letter.

Speaker 2

你会注意到那种模式

You'll notice that pattern.

Speaker 2

最终,你可能会开始注意到某些单词经常出现

Eventually, you may start to notice that there are certain words occur often.

Speaker 2

你可能会注意到拼写规则的存在

You may notice that spellings are a thing.

Speaker 2

你可能会注意到句法结构

You may notice syntax.

Speaker 2

当你真正精通这些时,就会开始注意到语义层面。

And when you get really good at all these, you start to notice the semantics.

Speaker 2

你会开始注意到事实。

You start to notice the facts.

Speaker 2

但要做到这一点,语言模型需要更大规模。

But for that to happen, the language model needs to be larger.
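
The bottom of the progression Ilya describes can be glimpsed with a toy character-level bigram model (my own sketch; the corpus string is made up). It captures purely statistical surface regularities, such as which character tends to follow which, and nothing remotely semantic.

```python
# Toy sketch: a character-level bigram "language model" built from raw counts.
# It captures exactly the kind of surface patterns described above (spaces,
# frequent letter pairs) and nothing like semantics.
from collections import Counter, defaultdict

text = "the quick quiet queen quoted the quaint question. the queen spoke."

# Count, for each character, which character follows it and how often.
counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def most_likely_next(ch):
    """Greedy prediction: the most frequent successor of ch in the corpus."""
    return counts[ch].most_common(1)[0][0]

# Purely statistical, surface-level regularities learned from this corpus:
pred_after_q = most_likely_next("q")    # 'u' always follows 'q' here
pred_after_period = most_likely_next(".")  # a space follows a period here
```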

Speaker 1

那让我们多讨论这一点,因为这是你与诺姆·乔姆斯基意见分歧的地方。

So that's let's linger on that, because that's where you and Noam Chomsky disagree.

Speaker 1

所以你认为我们实际上正在采取渐进步骤,通过更大的网络和更强的计算能力,最终能够触及语义层面,能够理解语言,而无需像诺姆所设想的那样——不需要对语言结构有根本性的理解,比如将你的语言理论强加于学习机制上。

So you think we're actually taking incremental steps, a sort of larger network, larger compute will be able to get to the semantics, be able to understand language without what Noam likes to sort of think of as a fundamental understandings of the structure of language, like imposing your theory of language onto the learning mechanism.

Speaker 1

那么你的意思是,学习机制可以从原始数据中自行掌握语言背后的规律。

So you're saying you can learn from raw data the mechanism that underlies language.

Speaker 2

嗯,我认为这种可能性相当大。

Well, I think it's pretty likely.

Speaker 2

但我也想说明,我并不完全清楚乔姆斯基谈论这个问题时具体指的是什么。

But I also wanna say that I don't really know precisely what Chomsky means when he talks about this.

Speaker 2

你刚才提到在语言中强加结构的问题。

You said something about imposing your structure on language.

Speaker 2

我不完全确定他的意思,但从经验上看,当你检查那些更大的语言模型时,它们确实表现出理解语义的迹象,而较小的语言模型则没有。

I'm not 100% sure what he means, but empirically, it seems that when you inspect those larger language models, they exhibit signs of understanding the semantics, whereas the smaller language models do not.

Speaker 2

几年前我们在研究情感神经元时就观察到了这种现象。

We've seen that a few years ago when we did work on the sentiment neuron.

Speaker 2

我们当时训练了一个小型LSTM模型来预测亚马逊评论中的下一个字符。

We trained a small, you know, smallish LSTM to predict the next character in Amazon reviews.

Speaker 2

我们注意到,当你将LSTM的大小从500个LSTM单元增加到4000个LSTM单元时,其中一个神经元开始代表文章的情感倾向,抱歉,是评论的情感倾向。

And we noticed that when you increase the size of the LSTM from 500 LSTM cells to 4,000 LSTM cells, then one of the neurons starts to represent the sentiment of the article, of, sorry, of the review.

Speaker 2

为什么会这样呢?

Now, why is that?

Speaker 2

情感是一个相当语义化的属性,而非句法属性。

Sentiment is a pretty semantic attribute, it's not a syntactic attribute.

Speaker 1

对于可能不了解的听众来说,我不确定这是否是一个标准术语,但情感指的是评论是正面还是负面。

And for people who might not know, I don't know if that's a standard term, but sentiment is whether it's a positive or a negative review.

Speaker 2

没错。

That's right.

Speaker 2

比如,这个人对某件事是满意还是不满意?

Like, is is the person happy with something, or is the person unhappy with something?

Speaker 2

因此我们当时就获得了非常明确的证据:小型神经网络无法捕捉情感,而大型神经网络可以。

And so here we had very clear evidence that a small neural net does not capture sentiment while a large neural net does.

Speaker 2

为什么会这样呢?

And why is that?

Speaker 2

我们的理论是,在某个阶段,模型会耗尽语法资源,不得不开始关注其他方面。

Well, our theory is that at some point, you run out of syntax to model, and you've gotta start to focus on something else.

Speaker 2

而且

And

Speaker 1

随着规模扩大,模型会迅速耗尽语法资源,然后真正开始聚焦于语义层面,就是这个观点。

with size, you quickly run out of syntax to model, and then you really start to focus on the semantics, would be the idea.

Speaker 2

没错。

That's right.

Speaker 2

所以我不想暗示我们的模型具备完全的语义理解能力,因为事实并非如此。

And so I don't I don't wanna imply that our models have complete semantic understanding because that's not true.

Speaker 2

但它们确实展现出语义理解的迹象,部分语义理解的能力。

But they definitely are showing signs of semantic understanding, partial semantic understanding.

Speaker 2

但较小的模型并未显示出这些迹象。

But the smaller models do not show those signs.

Speaker 1

能否退一步说说,什么是GPT-2?这个在过去几年改变游戏规则的大型语言模型是什么?

Can you take a step back and say, what is GPT two, which is one of the big language models that was the conversation changer in the past couple of years?

Speaker 2

是的。

Yes.

Speaker 2

GPT-2是一个拥有15亿参数的Transformer模型,它在大约400亿文本标记上进行了训练,这些文本数据来源于Reddit文章中获得超过三个赞的网页链接。

So GPT-2 is a transformer with one and a half billion parameters that was trained on about 40 billion tokens of text, which were obtained from web pages linked to from Reddit articles with more than three upvotes.

Speaker 1

那么什么是Transformer呢?

And what's the transformer?

Speaker 2

Transformer是神经网络架构近年来最重要的突破。

The transformer is the most important advance in neural network architectures in recent history.

Speaker 1

注意力机制又是什么呢?因为我认为这是个有趣的概念,不一定是技术层面的,而是注意力机制与循环神经网络所代表理念的对比。

And what is attention, maybe, too? Because I think that's an interesting idea, not necessarily technically speaking, but the idea of attention versus maybe what recurrent neural networks represent.

Speaker 2

是的。

Yeah.

Speaker 2

Transformer实际上是多个概念的综合体,其中注意力机制只是其中之一。

So the thing is, the transformer is a combination of multiple ideas simultaneously, of which attention is one.

Speaker 1

你认为注意力机制是关键吗?

Do you think attention is the key?

Speaker 2

不是。

No.

Speaker 2

它是一个关键,但不是唯一的关键。

It's a key, but it's not the key.

Speaker 2

Transformer之所以成功,是因为它同时融合了多种理念,如果去掉其中任何一个,它的效果都会大打折扣。

The transformer is successful because it is the simultaneous combination of multiple ideas, and if you were to remove either idea, it would be much less successful.

Speaker 2

Transformer虽然大量使用了注意力机制,但注意力机制已存在多年,所以这不能算主要创新点。

So the transformer uses a lot of attention, but attention existed for a few years, so that can't be the main innovation.

Speaker 2

Transformer的设计使其在GPU上运行极快,这带来了巨大的性能差异。

The transformer is designed in such a way that it runs really fast on the GPU. And that makes a huge amount of difference.

Speaker 2

这是一点。

This is one thing.

Speaker 2

第二点是Transformer不是循环的。

The second thing is that the transformer is not recurrent.

Speaker 2

这一点也非常重要,因为它更浅层,因此更容易优化。

And that is really important too because it is more shallow and therefore much easier to optimize.

Speaker 2

换句话说,它使用了注意力机制。

So in other words, it uses attention.
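
For reference, the attention mechanism being discussed can be sketched in a few lines of plain Python (my own minimal illustration; real transformers use batched matrix multiplies, multiple heads, residual connections, and layer normalization on top of this).

```python
# Minimal sketch of scaled dot-product attention, in pure Python for clarity.
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted average of the values, with weights
    given by softmax of the scaled query-key dot products."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Two queries attend over three positions at once. Every position is compared
# against every other in parallel, with no recurrence, which is part of why
# this architecture maps so well onto GPUs.
ctx = attention(
    queries=[[1.0, 0.0], [0.0, 1.0]],
    keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    values=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]],
)
```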
