第333期——安德烈·卡帕西：特斯拉人工智能、自动驾驶、Optimus机器人、外星人与通用人工智能
Episode 333: Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI

本集简介 (Episode description)

安德烈·卡帕西（Andrej Karpathy）是一位传奇的人工智能研究员、工程师和教育家。他曾任特斯拉人工智能总监，是OpenAI的创始成员之一，并在斯坦福大学任教。请通过以下赞助商支持本播客： – Eight Sleep：https://www.eightsleep.com/lex 获取特别优惠 – BetterHelp：https://betterhelp.com/lex 享受9折优惠 – Fundrise：https://fundrise.com/lex – Athletic Greens：https://athleticgreens.com/lex 免费获赠1个月鱼油 **单集链接：** 安德烈的推特：http://twitter.com/karpathy 安德烈的YouTube：http://youtube.com/c/AndrejKarpathy 安德烈的个人网站：http://karpathy.ai 安德烈的谷歌学术：http://scholar.google.com/citations?user=l8WuQJgAAAAJ **提及书籍：** 《生命的关键问题》：https://amzn.to/3q0vN6q 《生命上升》：https://amzn.to/3wKIsOE 《自私的基因》：https://amzn.to/3TCo63s 《接触》：https://amzn.to/3W3y5Au 《细胞》：https://amzn.to/3W5f6pa **播客信息：** 播客官网：https://lexfridman.com/podcast 苹果播客：https://apple.co/2lwqZIr Spotify：https://spoti.fi/2nEwCF8 RSS订阅：https://lexfridman.com/feed/podcast/ YouTube完整版：https://youtube.com/lexfridman YouTube精选片段：https://youtube.com/lexclips **支持与联系：** – 通过上方赞助商支持播客（最佳方式） – Patreon支持：https://www.patreon.com/lexfridman – 推特：https://twitter.com/lexfridman – Instagram：https://www.instagram.com/lexfridman – LinkedIn：https://www.linkedin.com/in/lexfridman – Facebook：https://www.facebook.com/lexfridman – Medium：https://medium.com/@lexfridman **时间轴：** 以下为单集时间戳，部分播客客户端可点击跳转： (00:00) – 开场 (05:41) – 神经网络 (10:45) – 生物学 (16:15) – 外星生命 (26:27) – 宇宙 (38:18) – Transformer模型 (46:34) – 语言模型 (56:45) – 机器人 (1:03:05) – 谷歌LaMDA (1:10:28) – 软件2.0 (1:21:28) – 人工标注 (1:23:25) – 摄像头视觉 (1:28:30) – 特斯拉数据引擎 (1:32:39) – 特斯拉视觉系统 (1:39:09) – 埃隆·马斯克 (1:44:17) – 自动驾驶 (1:49:11) – 离开特斯拉 (1:54:39) – 特斯拉Optimus机器人 (2:03:45) – ImageNet (2:06:23) – 数据 (2:16:15) – 日常生活 (2:29:31) – 最佳IDE (2:36:37) – arXiv (2:41:06) – 给初学者的建议 (2:50:24) – 人工通用智能 (3:03:44) – 电影 (3:09:37) – 人类文明未来 (3:13:56) – 书籍推荐 (3:20:05) – 给年轻人的建议 (3:21:56) – 机器学习未来 (3:28:44) – 生命的意义

Andrej Karpathy is a legendary AI researcher, engineer, and educator. He’s the former director of AI at Tesla, a founding member of OpenAI, and an educator at Stanford. Please support this podcast by checking out our sponsors: – Eight Sleep: https://www.eightsleep.com/lex to get special savings – BetterHelp: https://betterhelp.com/lex to get 10% off – Fundrise: https://fundrise.com/lex – Athletic Greens: https://athleticgreens.com/lex to get 1 month of fish oil EPISODE LINKS: Andrej’s Twitter: http://twitter.com/karpathy Andrej’s YouTube: http://youtube.com/c/AndrejKarpathy Andrej’s Website: http://karpathy.ai Andrej’s Google Scholar: http://scholar.google.com/citations?user=l8WuQJgAAAAJ Books mentioned: The Vital Question: https://amzn.to/3q0vN6q Life Ascending: https://amzn.to/3wKIsOE The Selfish Gene: https://amzn.to/3TCo63s Contact: https://amzn.to/3W3y5Au The Cell: https://amzn.to/3W5f6pa PODCAST INFO: Podcast website: https://lexfridman.com/podcast Apple Podcasts: https://apple.co/2lwqZIr Spotify: https://spoti.fi/2nEwCF8 RSS: https://lexfridman.com/feed/podcast/ YouTube Full Episodes: https://youtube.com/lexfridman YouTube Clips: https://youtube.com/lexclips SUPPORT & CONNECT: – Check out the sponsors above, it’s the best way to support this podcast – Support on Patreon: https://www.patreon.com/lexfridman – Twitter: https://twitter.com/lexfridman – Instagram: https://www.instagram.com/lexfridman – LinkedIn: https://www.linkedin.com/in/lexfridman – Facebook: https://www.facebook.com/lexfridman – Medium: https://medium.com/@lexfridman OUTLINE: Here are the timestamps for the episode. On some podcast players you should be able to click the timestamp to jump to that time.
(00:00) – Introduction (05:41) – Neural networks (10:45) – Biology (16:15) – Aliens (26:27) – Universe (38:18) – Transformers (46:34) – Language models (56:45) – Bots (1:03:05) – Google’s LaMDA (1:10:28) – Software 2.0 (1:21:28) – Human annotation (1:23:25) – Camera vision (1:28:30) – Tesla’s Data Engine (1:32:39) – Tesla Vision (1:39:09) – Elon Musk (1:44:17) – Autonomous driving (1:49:11) – Leaving Tesla (1:54:39) – Tesla’s Optimus (2:03:45) – ImageNet (2:06:23) – Data (2:16:15) – Day in the life (2:29:31) – Best IDE (2:36:37) – arXiv (2:41:06) – Advice for beginners (2:50:24) – Artificial general intelligence (3:03:44) – Movies (3:09:37) – Future of human civilization (3:13:56) – Book recommendations (3:20:05) – Advice for young people (3:21:56) – Future of machine learning (3:28:44) – Meaning of life

双语字幕 (Bilingual transcript)

Speaker 0

以下是与安德烈·卡帕西的对话，他曾任特斯拉人工智能总监，此前在OpenAI和斯坦福大学工作。他是人工智能史上最伟大的科学家、工程师和教育家之一。现在用几秒钟快速提一下各位赞助商，详情请查看节目描述，这是支持本播客的最佳方式。

The following is a conversation with Andrej Karpathy, previously the director of AI at Tesla, and before that, at OpenAI and Stanford. He is one of the greatest scientists, engineers, and educators in the history of artificial intelligence. And now a quick few-second mention of each sponsor. Check them out in the description. It's the best way to support this podcast.

Speaker 0

我们为小憩准备了Eight Sleep，为心理健康准备了BetterHelp，为房地产投资准备了Fundrise，为营养补充准备了Athletic Greens。朋友们，请明智选择。现在进入完整广告时间——一如既往，中间不会插播广告。我尽量让广告有趣些，但若您选择跳过，还请支持我们的赞助商。

We got Eight Sleep for naps, BetterHelp for mental health, Fundrise for real estate investing, and Athletic Greens for nutrition. Choose wisely, my friends. And now onto the full ad reads. As always, no ads in the middle. I try to make this interesting, but if you skip them, please still check out our sponsors.

Speaker 0

我个人很喜欢他们的产品，或许您也会。本期节目由Eight Sleep及其新款Pod 3床垫赞助。我正在酒店录制这段内容——事实上，由于生活的一些复杂性，现在是凌晨4点。

I enjoy their stuff. Maybe you will too. This episode is sponsored by Eight Sleep and its new Pod 3 mattress. I'm recording this in a hotel. In fact, given some complexities of my life, this is the middle of the night, 4AM.

Speaker 0

我坐在空荡荡的酒店房间里对着麦克风喊话。朋友们，这就是我的生活。通常在凌晨4点我会自我感觉良好，但喝了两杯咖啡后就不行了。我感觉良好的原因是我很快就能睡觉，而且已经完成了很多事。今天也是如此，除了'很快睡觉'这部分，因为我觉得我很快要去机场了。

I'm sitting in an empty hotel room yelling at a microphone. This, my friends, is my life. I do usually feel good about myself at 4AM, but not with two cups of coffee in me. And the reason I feel good is because I'm going to go to sleep soon, and I've accomplished a lot. This is true today, except for the sleep soon part because I think I'm going to an airport at some point soon.

Speaker 0

这都不重要。重要的是我甚至不会在这里睡觉，这很棒，因为酒店里没有能自动降温的Eight Sleep床。家里有，而那就是我要去的地方，我要回家了。总之，请访问eightsleep.com/lex获取特别优惠。

It doesn't matter. What matters is I'm not even gonna sleep here, and that's great, because in a hotel I don't have an Eight Sleep bed that can cool itself. At home I do, and that's where I'm headed. I'm headed home. Anyway, check it out and get special savings when you go to eightsleep.com/lex.

Speaker 0

本期节目也由BetterHelp（拼写为H-E-L-P）赞助。我是谈话疗法的忠实粉丝，我认为播客就是一种谈话疗法。所以我非常喜欢听播客，事实上，我自己做播客也是这样看待的。

This episode is also brought to you by BetterHelp, spelled H-E-L-P, help. I'm a huge fan of talk therapy. I think of podcasting as a kind of talk therapy. So I'm a huge fan of listening to podcasts. In fact, that's how I think of doing a podcast myself.

Speaker 0

我得以坐在最前排欣赏我热爱的事物。其实正是倾诉的过程揭示了心灵的某些面向。我认为优质谈话疗法的本质，就是在专业治疗师引导下，帮助你向自己揭示内心的某些真相。把一切都摊在桌面上吧。

I just get to have front-row seats to a thing I love. And it's actually just the process of talking that reveals something about the mind. I think that's what good talk therapy is: guided by a professional therapist, it helps you reveal to yourself something about your mind. Just lay it all out on the table.

Speaker 0

所以没错，你绝对应该尝试最优质的谈话疗法，所谓优质即最易获取的。至少体验一次，即便不将其作为生活常规，这正是BetterHelp的服务宗旨。登录betterhelp.com/lex了解详情，首月可享优惠。本期节目也由Fundrise赞助，拼写为f-u-n-d-r-i-s-e，这是一个让你能投资私募房地产的平台。

So yeah, you should definitely use the best method of talk therapy, the best meaning the most accessible. At least try it, if not make it a regular part of your life; that's what BetterHelp does. Check them out at betterhelp.com/lex and save on your first month. This episode is also brought to you by Fundrise, spelled F-U-N-D-R-I-S-E. It's a platform that allows you to invest in private real estate.

Speaker 0

各位，我们生活在艰难时期，原因诸多，其中之一便是财务问题。在困境中保护自己的方法之一就是分散投资。我认为私募房地产正是你该分散投资的领域之一。当你这么做时，应该使用那些看似21世纪打造的工具——毕竟很多投资网站和服务的设计仿佛还停留在ATM机的原始时代，但Fundrise绝非如此。

We live in hard times, folks, for many different reasons, but one of them is financial. And one way to protect yourself in difficult times is to diversify your investments. Private real estate is one of the things, I believe, you should diversify into. And when you do, you should use tools that look like they're made in the twenty-first century, whereas a lot of investment services, even online investment websites, seem to be designed by the same people that designed the original ATMs. That's not the case with Fundrise.

Speaker 0

操作极其简便，已有超过15万投资者使用。他们的团队会审核管理所有房地产项目。你可以在官网追踪投资组合表现，实时查看全国房产的收购、改造与运营进展。立即访问fundrise.com/lex，几分钟即可开启投资。

Super easy to use, accessible; over 150,000 investors use it. Their team vets and manages all their real estate projects. You can track your portfolio's performance on their website and see updates as properties across the country are acquired, improved, and operated. Anyway, check out Fundrise. It takes just a few minutes to get started at fundrise.com/lex.

Speaker 0

本期节目由Athletic Greens及其AG1饮品赞助，这是一款全能型每日饮品，助力健康与巅峰状态。说实话，我这次旅行完全忘了带Athletic Greens，现在非常想念。它不仅满足我的营养基础需求，更滋养我的灵魂，已成为日常生活习惯的一部分。当这个习惯被打断时，整个生活节奏都会失调。

This show is brought to you by Athletic Greens and its AG1 drink, which is an all-in-one daily drink to support better health and peak performance. I have to be honest, I completely forgot to bring Athletic Greens with me as I'm traveling now, and I miss it. It's not just good for my nutritional base and needs. It's good for my soul. It's part of the daily habit of life, and when you don't have that habit, the routine stuff is off.

Speaker 0

因此建议将其纳入日常作息，确保无论饮食结构、工作强度或运动量如何，都能获取所需维生素与营养。说实话这产品堪称神奇，Athletic Greens对我而言就是如此。注册athleticgreens.com/lex即可获赠一个月鱼油补给。

So it's good to just put that into your daily routine to make sure that you're getting the vitamins and nutrition you need, no matter your diet, your workload, or the athletic endeavors you partake in. I don't know. It's kind of incredible. And, yeah, that's what Athletic Greens is for me. They'll give you a one-month supply of fish oil when you sign up at athleticgreens.com/lex.

Speaker 0

这里是Lex Fridman播客。支持本节目，请关注我们的赞助商。现在，亲爱的朋友们，有请安德烈·卡帕西。什么是神经网络？为什么它似乎在学习方面表现得出奇地好？

This is the Lex Fridman podcast. To support it, please check out our sponsors. And now, dear friends, here's Andrej Karpathy. What is a neural network, and why does it seem to do such a surprisingly good job of learning?

Speaker 1

它是大脑的数学抽象，我想这就是它最初被开发的方式。归根结底它是一个数学表达式，而且拆解后会发现相当简单：本质上就是一系列矩阵乘法（数学上其实就是点积运算），再加上一些非线性变换。

It's a mathematical abstraction of the brain; I would say that's how it was originally developed. At the end of the day it's a mathematical expression, and a fairly simple one when you get down to it. It's basically a sequence of matrix multiplies, which are really dot products mathematically, with some nonlinearities thrown in.
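To make "a sequence of matrix multiplies with some nonlinearities" concrete, here is a minimal sketch of a two-layer network's forward pass in plain Python. The layer sizes, the ReLU nonlinearity, and the random initialization are illustrative assumptions, not details from the conversation; the weights and biases play the role of the "knobs".

```python
import random

def matvec(W, x):
    # Each output element is a dot product between one row of W and the input vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    # Elementwise nonlinearity; without it, stacked matrix multiplies collapse into one linear map.
    return [max(0.0, a) for a in v]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def mlp_forward(x, W1, b1, W2, b2):
    # Layer 1: matrix multiply, add bias, apply nonlinearity. Layer 2: matrix multiply, add bias.
    hidden = relu(add(matvec(W1, x), b1))
    return add(matvec(W2, hidden), b2)

random.seed(0)
# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs. The weights and biases are the "knobs".
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [0.0] * 4
W2 = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b2 = [0.0] * 2

y = mlp_forward([1.0, 0.5, -0.2], W1, b1, W2, b2)
num_knobs = sum(len(row) for row in W1 + W2) + len(b1) + len(b2)
print(y)
print(num_knobs)  # → 26 (3*4 + 4 + 4*2 + 2)
```

Real networks follow the same pattern with far more layers and knobs, usually via optimized tensor libraries rather than Python lists.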

Speaker 1

所以这是一个非常简单的数学表达式，里面带有调节参数。

And so it's a very simple mathematical expression, and it's got knobs in it.

Speaker 0

很多调节参数。

Many knobs.

Speaker 1

很多调节参数。这些参数大致上与你大脑中的突触类似，它们可训练、可调整，我们的目标就是找到能让神经网络按你意愿运作的参数设置，比如进行图像分类等。我认为这其中并没有太多神秘之处，你不需要赋予它太多关于大脑运作机制的意义。它本质上就是个带调节参数的复杂数学表达式，这些参数需要正确设置才能实现预期功能。

Many knobs. And these knobs are loosely related to the synapses in your brain. They're trainable, they're modifiable, and so the idea is that we need to find the setting of the knobs that makes the neural net do whatever you want it to do, like classify images and so on. And so there's not too much mystery in it, I would say. You don't want to endow it with too much meaning with respect to the brain and how it works. It's really just a complicated mathematical expression with knobs, and those knobs need a proper setting for it to do something desirable.
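"Finding the setting of the knobs" is usually done by gradient descent. The toy sketch below tunes just two knobs of a linear model to fit a made-up target; the model, the mean-squared-error loss, and the learning rate are assumptions chosen for illustration, not how any network discussed here was trained. Real networks apply the same idea to millions or billions of knobs, with gradients computed by backpropagation.

```python
def train_knobs(data, lr=0.05, steps=2000):
    w, b = 0.0, 0.0  # two "knobs", initialized arbitrarily
    n = len(data)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to each knob.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Nudge each knob a small step downhill.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical desired behavior: y = 3x - 1.
data = [(x, 3 * x - 1) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
w, b = train_knobs(data)
print(round(w, 3), round(b, 3))  # → 3.0 -1.0
```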

Speaker 0

是啊。但诗歌不过是带空格的字母组合，却能引发我们特定感受。同理，当大量调节参数聚集——不论是在大脑还是计算机里——它们总会以惊人的能力让我们感到意外。

Yeah. But poetry is just a collection of letters with spaces, and it can make us feel a certain way. In that same way, when you get a large number of knobs together, whether inside the brain or inside a computer, they seem to surprise us with their power.

Speaker 1

确实。这么说很公允。我刚才的说法其实大大低估了它，因为当神经网络规模足够大、训练的问题足够复杂时——比如基于海量互联网数据预测下一个词——确实会产生令人惊奇的涌现行为，展现出近乎魔法的特性。即便从如此简单的数学形式体系中也能获得这么多，这很有趣。

Yeah. I think that's fair. So basically I'm underselling it by a lot, because you definitely do get very surprising emergent behaviors out of these neural nets when they're large enough and trained on complicated enough problems, for example next-word prediction on a massive dataset from the Internet. These neural nets then take on pretty surprising, magical properties. Yeah, I think it's kind of interesting how much you can get out of even a very simple mathematical formalism.

Speaker 0

你现在说话时，大脑是在做下一个词预测吗？还是在做更有趣的事？

When your brain is talking right now, is it doing next-word prediction? Or is it doing something more interesting?

Speaker 1

肯定是某种类似GPT的生成模型，由你提供提示。是的，你给我提示，我正以生成的方式回应它。

Definitely some kind of a generative model that's GPT-like and prompted by you. Yes. So you're giving me a prompt, and I'm kind of responding to it in a generative way.

Speaker 0

那么你自己可能也会加一点吧？比如，你会在脑海中根据自己的记忆额外添加提示吗？还是不会？

And by yourself, perhaps, a little bit? Like, are you adding extra prompts from your own memory inside your head, or no?

Speaker 1

嗯，肯定会的。感觉你是在引用某种陈述性的记忆结构之类的，然后将其与你的提示结合起来，给出一些答案。

Well, it definitely feels like you're referencing some kind of declarative structure of memory and so on, and then you're putting that together with your prompt and giving away some answers.

Speaker 0

比如，你刚才说的有多少是你以前说过的？基本上没有，对吧？不，但如果你真的查看你一生中说过的话并进行搜索，你可能会发现以前说过很多相同的词序。

Like, how much of what you just said has been said by you before? Nothing, basically, right? No. But if you actually look at all the words you've ever said in your life and you do a search, you've probably said a lot of the same words in the same order before.

Speaker 1

是的，有可能。我是说，我使用的是一些常见的短语等等，但最终我会把它们重新组合成一个相当独特的句子。不过你说得对，确实有很多重新组合的部分。

Yeah. Could be. I mean, I'm using phrases that are common, etcetera, but I'm remixing it into a pretty sort of unique sentence at the end of the day. But you're right, definitely, there's like a ton of remixing.

Speaker 0

为什么？这就像马格努斯·卡尔森说"我的等级分大概2900，还算不错"一样。我觉得你没有给神经网络足够的肯定。对于这种涌现行为，你最好的直觉解释是什么？

Why? It's like Magnus Carlsen saying, "I'm rated 2,900 or whatever, which is pretty decent." I think you're not giving enough credit to neural nets here. What's your best intuition about this emergent behavior?

Speaker 1

我觉得这有点有趣，因为我一方面低估了它们，但另一方面又觉得我有点高估了它们。实际上，尽管它们在数学上如此简单，却能表现出这么多涌现的神奇行为，这有点不可思议。所以我觉得这是两种有点矛盾却又同时成立的惊人说法。基本上，我认为这是因为我们实际上相当擅长优化这些神经网络，当你给它们一个足够难的问题时，它们被迫在优化过程中学习非常有趣的解决方案，而这些解决方案基本上具有这些非常有趣的涌现特性。

I mean, it's kind of interesting, because I'm simultaneously underselling them, but I also feel like there's an element to which I'm over... like, it's actually kind of incredible that you can get so much emergent magical behavior out of them despite them being so simple mathematically. So I think those are two surprising statements that are kind of juxtaposed together. And I think basically what it is, is that we are actually fairly good at optimizing these neural nets, and when you give them a hard enough problem, they are forced to learn very interesting solutions in the optimization, and those solutions basically have these emergent properties that are very interesting.

Speaker 0

那些调节参数里蕴含着智慧和知识。

There's wisdom and knowledge in the knobs.

Speaker 1

是的。

Yes.

Speaker 0

那么这些调节参数中的表征，对你来说直观上讲得通吗？大量的调节参数可以容纳一种表征，捕捉到它所见数据中的某些深刻智慧？

And so this representation that's in the knobs, does it make sense to you intuitively that a large number of knobs can hold a representation that captures some deep wisdom about the data it has looked at?

Speaker 1

调节参数确实很多。具体来说，目前人们非常热衷的一种神经网络是GPT，它本质上是基于下一个词预测的网络模型。通过从互联网上获取一系列词语并尝试预测下一个词，当你在足够大的数据集上训练这些网络后，基本上可以用任意方式提示它们解决问题，它们就能给出答案。比如你可以假装在解数学题，它们会根据在互联网上见过的内容继续推导出它们认为的解法。很多时候这些解法看起来出奇地一致，甚至可能正确。

It's a lot of knobs. And somehow, you know... So, speaking concretely, one of the neural nets that people are very excited about right now are GPTs, which are basically just next-word prediction networks. You consume a sequence of words from the Internet and you try to predict the next word, and once you train these on a large enough dataset, you can basically prompt these neural nets in arbitrary ways and ask them to solve problems, and they will. You can make it look like you're trying to solve some kind of mathematical problem, and they will continue with what they think is the solution based on what they've seen on the Internet. And very often those solutions look remarkably consistent, and potentially even correct.
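The "predict the next word, then keep going" loop can be illustrated with something far simpler than a GPT. The sketch below is a word-level bigram counter, swapped in purely for illustration: GPTs use Transformer networks trained on vastly more text, not frequency tables. The tiny corpus is made up.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    # Count, for each word, which words follow it in the training text.
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def predict_next(model, word):
    # Return the most frequent continuation seen in training, or None for an unseen word.
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

def generate(model, prompt, n_words):
    # Greedy continuation: repeatedly append the predicted next word to the prompt.
    out = prompt.split()
    for _ in range(n_words):
        nxt = predict_next(model, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # → cat
print(generate(model, "the cat", 3))
```

The prompt plays the same role as in the conversation: the model simply continues it with whatever its training data makes most likely.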

Speaker 0

你还会从大脑角度思考这个问题吗？比如将神经网络视为大脑的抽象或数学抽象，你还会从生物神经网络中汲取智慧吗？或者更大的问题——作为生物计算的推崇者，你认为生物学实现了哪些计算机尚未做到的惊人成就？那个差距。

Do you still think about the brain side of it? So, with neural nets as an abstraction, a mathematical abstraction, of the brain, do you still draw wisdom from biological neural networks? Or even the bigger question: you're a big fan of biology and biological computation. What impressive thing is biology doing, to you, that computers are not yet doing? That gap.

Speaker 1

应该说我对大脑类比持更谨慎态度——可能比这个领域普遍观点更保守。虽然神经网络最初确实源于对大脑的启发，但最终训练得到的产物，其优化过程与大脑形成的优化过程截然不同。所以我更倾向于将其视为复杂的外星造物——抱歉，我指的是我们训练的神经网络——某种完全不同的存在。

I would say I'm much more hesitant with analogies to the brain than you would potentially see in the field. I kind of feel like, certainly, the way neural networks started, everything stemmed from inspiration by the brain, but at the end of the day the artifacts that you get after training are arrived at by a very different optimization process than the one that gave rise to the brain. And so I kind of think of it as a very complicated alien artifact. Something different. Sorry, I mean the neural nets that we're training.

Speaker 0

明白。

Okay.

Speaker 1

它们是复杂的外星造物。我不做大脑类比，因为其形成过程与大脑完全不同。这里没有多智能体自我博弈的进化机制，而是基于海量数据的压缩目标优化。

They are complicated alien artifacts. I do not make analogies to the brain because I think the optimization process that gave rise to them is very different from the one that gave rise to the brain. There was no multi-agent self-play kind of setup and evolution. It was an optimization on what basically amounts to a compression objective on a massive amount of data.
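One common way to see why next-word prediction "amounts to a compression objective": the average negative log2-probability a model assigns to the tokens that actually occur is, roughly, the number of bits per token needed to encode the text with that model (for example via arithmetic coding). The probabilities below are made up purely for illustration and say nothing about any real model.

```python
import math

def bits_per_token(probs_of_actual_tokens):
    # Average of -log2 p(actual next token): lower means the model "compresses" the data better.
    return sum(-math.log2(p) for p in probs_of_actual_tokens) / len(probs_of_actual_tokens)

# Hypothetical probabilities two models assign to the same four actual next tokens.
uniform_model = [1 / 1024] * 4          # knows nothing: uniform over a 1,024-word vocabulary
better_model = [0.5, 0.25, 0.5, 0.125]  # has learned something about the data

print(bits_per_token(uniform_model))  # → 10.0
print(bits_per_token(better_model))   # → 1.75
```

Minimizing this quantity is the same as minimizing the usual cross-entropy training loss (up to the choice of log base), which is why better prediction and better compression go together.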

Speaker 0

明白了。所以人工神经网络在做压缩，而生物神经网络……

Okay. So artificial neural networks are doing compression, and biological neural networks...

Speaker 1

……是在努力生存。它们其实并没有在做什么，它们是一个多智能体自我博弈系统中的智能体，这个系统已经运行了非常非常长的时间。话虽如此，进化已经发现，在大脑中拥有预测模型是非常有用的，所以我认为我们的大脑利用了类似的东西作为其一部分，但它还有更多的小工具、小发明、价值函数和古老的神经核，所有这些都在试图让它生存、繁殖以及做其他事情。

...are trying to survive. They're not really doing anything. They're an agent in a multi-agent self-play system that's been running for a very, very long time. That said, evolution has found that it is very useful to predict and to have a predictive model in the brain, and so I think our brain utilizes something that looks like that as a part of it, but it has a lot more gadgets and gizmos and value functions and ancient nuclei that are all trying to make it survive and reproduce and everything else.

Speaker 0

整个有机体通过胚胎发生，从一个单细胞构建起来。我的意思是，代码就在DNA里面，它就这样构建起整个有机体，包括手臂……

And the whole thing is built from a single cell through embryogenesis. I mean, the code is inside the DNA, and it just builds up the entire organism, with arms...

Speaker 1

完全疯狂。

Totally crazy.

Speaker 0

……还有头和腿。而且它做得相当好。

...and the head and legs. And it does it pretty well.

Speaker 1

这本来是不可能的。

It should not be possible.

Speaker 0

所以这里面有学习在发生，这个构建过程中还贯穿着某种竞争。我的意思是，如果纵观地球生命的整个历史，你觉得最有趣的发明是什么？是生命本身的起源吗？还是跃迁到真核生物？

So there's some learning going on. There's some kind of competition going on through that building process. I mean, if you were to look at the entirety of the history of life on Earth, where do you think is the most interesting invention? Is it the origin of life itself? Is it the jump to eukaryotes?

Speaker 0

是哺乳动物吗？还是人类自己，智人？

Is it mammals? Is it humans themselves, homo sapiens?

Speaker 1

嗯。

Mhmm.

Speaker 0

那么高度复杂智能的起源是什么？或者你认为这一切是否只是同一种过程的延续？

The origin of intelligence, or highly complex intelligence? Or is it all just a continuation of the same kind of process?

Speaker 1

当然，我会说这是一个极其非凡的故事，我最近才开始粗略了解——从地球形成之初的所有条件开始，包括整个太阳系的排列方式，木星、月球的存在，宜居带的位置等等。然后我们有一个活跃的地球在不断更新物质。接着从生命起源开始，这一切构成了相当惊人的叙事。我不确定能否从中挑选出最令我感兴趣的一个独特环节。

Certainly, I would say it's an extremely remarkable story that I've only been briefly learning about recently. You almost have to start at the formation of Earth and all of its conditions: the entire solar system, how everything is arranged with Jupiter and the Moon, the habitable zone, and everything. Then you have an active Earth that's turning over material. And then you start with abiogenesis and everything. So it's all a pretty remarkable story. I'm not sure that I can pick a single unique piece of it that I find most interesting.

Speaker 1

作为人工智能研究者，对我来说最吸引人的可能是最后阶段。我们知道许多动物并未建立技术文明，但人类做到了。这似乎发生得非常迅速、非常晚近，其中发生了某种我尚未完全理解的奇妙变化。我几乎能凭直觉理解其他所有环节，唯独对这个部分的迅速性感到困惑。

I guess for me, as an artificial intelligence researcher, it's probably the last piece. We have lots of animals that are not building a technological society, but we do. And it seems to have happened very quickly and very recently, and something very interesting happened there that I don't fully understand. I think I intuitively understand almost everything else, but I don't understand exactly that part and how quick it was.

Speaker 0

两种解释都会很有趣。一种是认为这只是同种过程的延续，人类并无特殊之处——若能深刻理解这点会非常有意思。我们总自视特殊，但事实可能显而易见。

Both explanations would be interesting. One is that this is just a continuation of the same kind of process, and there's nothing special about humans. Deeply understanding that would be very interesting: we think of ourselves as special, but it was obvious.

Speaker 0

一切早已写在代码里，智能只会不断升级涌现。另一种解释则是确实发生了独特事件，比如像《太空漫游》那样的极小概率事件，会是什么呢？比如火的发明，又或者像理查德·兰厄姆说的：贝塔雄性通过协作想出猎杀阿尔法雄性的巧妙方法。也就是不断优化协作，即多智能体层面的协作；在资源受限、努力求生的情况下，正是这种协作催生了复杂智能。但这似乎是进化过程的自然产物。

It was all already written in the code that you would have greater and greater intelligence emerging. And then the other explanation is that something truly special happened, something like a rare event, whether it's a crazy rare event like in Space Odyssey. What would it be? Say, the invention of fire, or, as Richard Wrangham says, the beta males figuring out a clever way to kill the alpha males by collaborating. So just optimizing the collaboration, the multi-agent aspects, and being constrained on resources and trying to survive: the collaboration aspect is what created the complex intelligence. But it seems like a natural outgrowth of the evolution process.

Speaker 0

那么可能发生过什么神奇的事情？比如某种罕见事件，能证明人类水平的智能在宇宙中确实是极其稀有的存在。

What could possibly be a magical thing that happened? Like, a rare event that would mean human-level intelligence is actually a really rare thing in the universe.

Speaker 1

是啊。不过我得说我不太愿意直接断言它稀有，但确实像是种间断平衡——经过大量探索后才会出现某些稀疏的跨越性突破。比如生命起源是一个，还有DNA、有性生殖、真核生物，以及古菌吞噬细菌的内共生事件，整条进化链。当然还有意识的涌现等等。所以显然存在某些关键节点让进化取得巨大突破，但确实很难单独挑出某个决定性瞬间。

Yeah. I'm hesitant to say that it is rare, by the way, but it definitely seems like it's kind of like a punctuated equilibrium where you have lots of exploration and then you have certain leaps, sparse leaps in between. So of course like origin of life would be one, you know, DNA, sex, eukaryotic life, the endosymbiosis event where the archaeon ate little bacteria, you know, just the whole thing and then of course emergence of consciousness and so on. So it seems like definitely there are sparse events where a massive amount of progress was made, but yeah, it's kind of hard to pick one.

Speaker 0

所以你认为人类并不独特？那我得问问，你觉得外星智慧文明的数量有多少？它们的智力形式与我们相似还是迥异？

So you don't think humans are unique? Gotta ask you, how many intelligent alien civilizations do you think are out there, and is their intelligence different or similar to ours?

Speaker 1

没错。我最近一直在思考这个问题，主要是费米悖论相关。其实我对生命起源如此着迷的根本原因，就是想估算宇宙中出现技术文明的概率。研究越深入，我越认为外星文明的数量应该相当可观。

Yeah. I've been preoccupied with this question quite a bit recently, basically the Fermi paradox, and just thinking it through. And the reason I am very interested in the origin of life is fundamentally that I'm trying to understand how common it is for there to be technological societies out there in space. And the more I study it, the more I think that there should be quite a lot.

Speaker 0

那为什么我们没接收到它们的信号？我同意你的观点，实在看不出地球文明的发展过程有什么难以复制的特殊性。

Why haven't we heard from them? Because I agree with you. I just don't see why what we did here on Earth is so difficult to do.

Speaker 1

对。特别是深入研究细节之后。我曾以为生命起源是奇迹般的稀有事件，但读了尼克·莱恩的《生命的关键问题》《生命上升》等著作后，他会深入细节，让你真正相信这其实并不那么罕见。

Yeah. And especially when you get into the details of it. I used to think the origin of life was this magical rare event, but then you read books like, for example, Nick Lane's The Vital Question, Life Ascending, etc., and he really gets in and really makes you believe that this is not that rare.

Speaker 0

基础化学反应而已。

Basic chemistry.

Speaker 1

你拥有活跃的地球环境，拥有碱性热液喷口，大量碱性水体与酸性海洋混合，存在质子梯度，还有那些能浓缩化学物质的多孔碱性喷口小囊。当他逐步解析所有这些细节时，你会开始意识到这其实并不疯狂，在其他星系也可能发生类似情况。他真正带领你从地质学跨越到原始生命，并让整个过程显得相当合理。而且如果我没记错的话，生命起源实际上在地球形成后相当短时间内就发生了，大概只有几亿年，在条件允许时生命就迅速出现了。这让我觉得那并非制约因素，不是限制变量，生命实际上应该相当普遍。

You have an active Earth, and you have your alkaline vents, with lots of alkaline water mixing with the acidic ocean, and you have your proton gradients, and you have the little porous pockets of these alkaline vents that concentrate chemistry. And basically, as he steps through all of these little pieces, you start to understand that this is actually not that crazy; you could see this happen on other systems. He really takes you from just geology to primitive life, and he makes it feel like it's actually pretty plausible. Also, the origin of life was actually fairly fast after the formation of Earth: if I remember correctly, life arose just a few hundred million years or so after it was basically possible. So that makes me feel like that is not the constraint, not the limiting variable, and that life should actually be fairly common.

Speaker 1

而关于生命分布断层的思考也很有趣。我目前认为基本上不存在重大断层，所以生命应该相当丰富。这让我得出一个结论：唯一能解释我们尚未发现其他生命存在的事实，就是我们根本无法观测到它们。

And then where the drop-offs are is very interesting to think about. I currently think that there are basically no major drop-offs, and so there should be quite a lot of life. And where that brings me is that the only way to reconcile this with the fact that we haven't found anyone is that we just can't see them; we can't observe them.

Speaker 0

简单说一句，尼克·莱恩和我交谈过的许多生物学家似乎都认为，从细菌到更复杂生物的跃迁是最困难的跨越。

Just a quick brief comment, Nick Lane and a lot of biologists I talk to, they really seem to think that the jump from bacteria to more complex organisms is the hardest jump.

Speaker 1

嗯。基本上就是真核生物的诞生阶段。是的。

Mhmm. The eukaryotic life, basically. Yeah.

Speaker 0

虽然我不太理解这点——他们在生物学复杂性方面比我专业得多——但这说法听起来很离谱。想想单细胞生物的数量有多庞大？而且时间跨度如此之长，在十亿年里这应该不算太难。从时间尺度来看，十亿年其实并不算特别漫长。

Which I don't... I get it. They're much more knowledgeable than me about the intricacies of biology, but that seems crazy. How many single-cell organisms are there, and how much time do you have? Surely it's not that difficult. A billion years is not even that long of a time, really.

Speaker 0

所有这些细菌在资源受限的环境下竞争。我确信它们能演化出更复杂的结构，就像从'Hello World'程序进化到发明函数那样的跨越。我不明白为什么...

All these bacteria under constrained resources, battling it out. I'm sure they can invent something more complex. It's like moving from a hello world program to inventing a function or something like that.

Speaker 1

确实。

Yeah.

Speaker 0

所以，我是支持你的观点的。我只是看不出那一步有多难。如果说有的话，我的直觉是生命起源才是最难的一步。

So, yeah, I'm with you. I just don't see it. If anything, my intuition would be that the origin of life is the hardest thing.

Speaker 0

但如果那并非最难的事，因为它发生得如此之快，那么它必定无处不在。而且，也许我们只是太愚钝而无法察觉。

But if that's not the hardest thing because it happens so quickly, then it's got to be everywhere. And, yeah, maybe we're just too dumb to see it.

Speaker 1

问题在于我们缺乏真正有效的探测机制。比如无线电波就极为低效——其功率衰减遵循平方反比定律。我记得有研究指出，以当前技术，我们甚至无法检测到自己发射的无线电信号，除非在约十分之一光年的极近距离内。

Well, it's just that we don't have really good mechanisms for seeing this life. Radio waves, for example, are terrible: their power drops off as basically one over r squared. I remember reading that the radio waves we are currently broadcasting would only be measurable by our devices today from, was it, about one tenth of a light-year away?
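The "one over r squared" point is easy to check with arithmetic. This is a minimal sketch assuming an isotropic (non-directional) transmitter in free space; the distances are hypothetical, and the numbers say nothing about real detection thresholds.

```python
import math

def relative_power(r_near, r_far):
    # Isotropic source: power per unit area falls as 1 / r**2, so going from r_near
    # to r_far multiplies the received signal by (r_near / r_far) ** 2.
    return (r_near / r_far) ** 2

def to_decibels(ratio):
    return 10 * math.log10(ratio)

# Hypothetical distances in light-years (the units cancel in the ratio).
ratio = relative_power(0.1, 10.0)
print(ratio, to_decibels(ratio))  # roughly 1e-4, i.e. about -40 dB
```

Going 100 times farther costs a factor of 10,000 in received power, which is why long-range detection needs the targeted, high-power transmission described next.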

Speaker 1

也就是说，那只是极短的距离，因为要在远距离上被接收到，你真的需要朝某个方向定向发射超大功率的信号。所以我认为我们的测量能力并不出色，外面很可能存在其他文明。那么关键问题是：它们为什么不建造冯·诺伊曼探测器？为什么不进行星际旅行、遍布整个银河系？我目前的答案是，星际旅行可能真的非常困难。星际空间中存在星际介质，如果你想以接近光速移动，沿途会不断遭遇"子弹"，因为即便是微小的氢原子和尘埃颗粒，在那种速度下也具备巨大的动能。

So it's basically a tiny distance, because you really need a targeted transmission of massive power directed somewhere for this to be picked up over long distances. And so I just think that our ability to measure is not amazing, and I think there are probably other civilizations out there. And then the big question is: why don't they build von Neumann probes, and why don't they travel interstellar distances across the entire galaxy? My current answer is that interstellar travel is probably really hard. You have the interstellar medium: if you want to move at close to the speed of light, you're going to be encountering bullets along the way, because even tiny hydrogen atoms and little particles of dust have massive kinetic energy at those speeds.
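A back-of-the-envelope sketch of why dust turns into "bullets" at high speed. The grain mass (1 microgram), the speed (10% of light speed), and the TNT conversion factor are assumptions chosen for illustration, not figures from the conversation.

```python
import math

C = 299_792_458.0       # speed of light, m/s
TNT_J_PER_KG = 4.184e6  # conventional energy equivalent of 1 kg of TNT, in joules

def relativistic_ke(mass_kg, fraction_of_c):
    # Kinetic energy = (gamma - 1) * m * c^2; the classical 0.5 * m * v^2 underestimates it near c.
    gamma = 1.0 / math.sqrt(1.0 - fraction_of_c ** 2)
    return (gamma - 1.0) * mass_kg * C ** 2

# Hypothetical example: a 1-microgram dust grain (1e-9 kg) met at 10% of light speed.
ke = relativistic_ke(1e-9, 0.1)
print(f"{ke:.3g} J, about {ke / TNT_J_PER_KG:.3g} kg of TNT")
```

Even this tiny grain carries on the order of hundreds of kilojoules, roughly the energy of a tenth of a kilogram of TNT, and the energy grows steeply as the speed approaches c.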

Speaker 1

本质上你需要某种防护，还有各种宇宙辐射。太空环境极其恶劣，真的很难。因此我认为星际旅行或许就是极其困难的。

And so basically, you need some kind of shielding, and you have all the cosmic radiation. It's just brutal out there. It's really hard. And so my thinking is that maybe interstellar travel is just extremely hard.

Speaker 0

有那么难吗？感觉我们距离实现它并没有十亿年那么遥远。

That hard? It feels like we're not a billion years away from doing that.

Speaker 1

可能关键在于必须保持极低速航行，比如在星际空间中缓慢推进。

It just might be that you have to go very slowly through space, potentially, as an example.

Speaker 0

没错。与接近光速的情况相反。

Right. As opposed to close to the speed of light.

Speaker 1

从根本上说，我怀疑我们测量生命的能力，也怀疑那种能渗透到银河系乃至跨星系所有空间的能力。这是我目前能想到的唯一解释。

Basically, I'm suspicious of our ability to measure life, and I'm suspicious of the ability to just permeate all of space in the galaxy or across galaxies. And that's the only way I can currently see around it.

Speaker 0

是啊。想到宇宙中有数万亿个智慧外星文明正缓慢地穿越太空、彼此相遇，有的会面，有的交战，有的合作，这种想法真是令人震撼。

Yeah. It's kind of mind-blowing to think that there are trillions of intelligent alien civilizations out there, slowly traveling through space to meet each other. And some of them meet, some of them go to war, some of them collaborate.

Speaker 1

嗯。或者它们全都独立存在。就像一个个孤立的小泡泡。我也说不准。

Mhmm. Or they're all just independent. They're all just like little pockets. I don't know.

Speaker 0

从统计学角度看，如果存在数万亿个文明，肯定有些泡泡之间的距离近到足以...

Well, statistically, if there are trillions of them, surely some of the pockets are close enough to get...

Speaker 1

有些恰好离得很近。

Some of them happen to be close.

Speaker 0

而且近到能互相发现。一旦你看到——比如我们发现了明显是复杂生命的存在——我们很可能会极度狂热、不遗余力地想要弄清那到底是什么，并试图接触它们。

And they're close enough to see each other. And once you see something that is definitely complex life, if we see something, we're probably going to be intensely, aggressively motivated to figure out what the hell that is and try to meet them.

Speaker 1

嗯。

Mhmm.

Speaker 0

从代际层面来看，你的第一反应会是尝试接触他们还是防御他们？或者作为美国总统和科学家的本能反应是什么？我不知道在这个问题中你更倾向于哪个身份。

What would be your first instinct, at a generational level: to try to meet them or to defend against them? Or what would be your instinct as President of the United States and a scientist? I don't know which hat you prefer in this question.

Speaker 1

是的，我认为这个问题确实很难。比如说，地球上就有许多原始生命形式与我们共存，比如各种蚂蚁和其他生物。我们默认会保护它们，因为它们经历了漫长进化形成的奇妙、有趣且动态的系统，独特而珍贵。我不认为人们会默认想要摧毁这些。我欣赏那些耗费漫长时间演化而来的复杂动态系统，只要条件允许，我会尽力保护它们。我猜想银河系资源管理者可能也会认为我们是一个耗费数十亿年才展开的惊人故事，值得珍惜而非摧毁。

Yeah, I think the question is really hard. I will say, for example, we have lots of primitive life forms on Earth next to us. We have all kinds of ants and everything else, and we share space with them, and we are hesitant to impact them, and we try to protect them by default, because they are amazing, interesting dynamical systems that took a long time to evolve, they are interesting and special, and I don't know that you want to destroy that by default. So I like complex dynamical systems that took a lot of time to evolve. I'd like to preserve them if I can afford to, and I'd like to think that the same would be true about the galactic resources: that they would think we're kind of an incredible, interesting story that took time, a few billion years to unravel, and you don't want to just destroy it.

Speaker 0

我能想象两个外星人此刻正在讨论地球，其中一个说：'我是复杂动态系统的忠实粉丝，认为保留这些系统很有价值'，而地球对他们而言可能就像正在观看的电子游戏或电视剧。

I could see two aliens talking about Earth right now and saying, I'm a big fan of complex dynamical systems, so I think there's value in preserving these, and we're basically a video game they watch, or a TV show that they watch.

Speaker 1

没错，我认为需要非常充分的理由才会选择摧毁。就像我们为什么不摧毁蚁穴——因为我们目前并未与它们直接竞争资源（虽然偶有意外），况且资源充足时，何必毁掉如此有趣而珍贵的事物呢？

Yeah, I think you would need a very good reason to destroy it. Like, why don't we destroy these ant farms and so on? It's because we're not actually in direct competition with them right now. We do it accidentally and so on, but there are plenty of resources, so why would you destroy something that is so interesting and precious?

Speaker 0

从科学角度而言，你可能会探测它，之后也可能与之互动。

Well, from a scientific perspective, you might probe it. You might interact with it later.

Speaker 1

正是如此，你或许还想从中学习些什么，对吧？

Exactly, you might want to learn something from it, right?

Speaker 0

所以我在想，可能存在某些我们认为是物理现象的现象，但实际上它们正在与我们互动，就像用手指戳一下看看反应，我觉得这对科学家来说应该非常有趣，

So I wonder, there could be certain physical phenomena that we think are just physical phenomena, but it's actually them interacting with us, like poking a finger to see what happens. I think it should be very interesting to scientists,

Speaker 1

其他外星科学家会想，这里发生了什么。而且，你知道，我们今天看到的实际上是一个快照，是经过数十亿年大量计算后的结果

other alien scientists, what happened here. And, you know, what we're seeing today is basically a snapshot. It's the result of a huge amount of computation over, like, billions of years

Speaker 0

或者

Speaker 1

类似这样的情况。它可能

or something like that. It could

Speaker 0

是由外星人启动的。这可能是一台运行程序的计算机。比如，当...好吧。如果你有这种能力，当你...好吧。肯定会的。

have been initiated by aliens. This could be a computer running a program. Like, okay, if you had the power to do this, for sure.

Speaker 0

至少我会这么做。我会选择一个类似地球的行星，根据我对生命化学前提条件的理解具备适宜条件，然后在那里播种生命并观察其发展。嗯，对吧？就像...是的。

At least I would. I would pick an Earth-like planet that has the conditions, based on my understanding of the chemical prerequisites for life, and I would seed it with life and run it. Mhmm. Right? Like, yeah.

Speaker 0

你难道不会百分百这么做并观察它吗？然后...是的。保护？我是说，这不仅是档绝妙的电视节目，更是项优秀的科学实验。没错。

Wouldn't you 100% do that, observe it, and then protect it? I mean, that's not just a hell of a good TV show. It's a good scientific experiment. Yep.

Speaker 0

这意味着，它是一种物理模拟。对吧？也许进化过程本身——实际运行它——才是理解计算或处理事务最高效的方式。

And I mean, it's a physical simulation, right? Maybe evolution, actually running it, is the most efficient way to understand computation or to compute stuff.

Speaker 1

或是理解生命，比如生命可能呈现的形态及其可能的分支演化路径。

Or to understand life or, you know, what life looks like and what branches it can take.

Speaker 0

这确实让我觉得诡异——我们可能是某个科学实验的一部分。但或许万物都是某个科学实验里的存在？这种认知会改变我们作为实验对象的处境吗？

It does make me feel kind of weird that we're part of a science experiment, but maybe everything's a science experiment. Does that change anything for us, being in a science experiment?

Speaker 1

我不知道。

I don't know.

Speaker 0

两个猿类后裔讨论自己身处科学实验中？

Two descendants of apes talking about being inside of a science experiment?

Speaker 1

我对你描述的那种有意的泛种论持怀疑态度。目前的历史记录中，我看不到任何形式的神圣干预。尼克·莱恩等人著作中的叙述确实自洽，地球生命独特起源的解释也合乎逻辑。现阶段，我不需要寻求更离奇的解释。

I'm suspicious of this idea of a deliberate panspermia, as you described it. I don't see a divine intervention in the historical record right now. I do feel like the story in books like Nick Lane's makes sense, and it makes sense how life arose on Earth uniquely. So I don't need to reach for more exotic explanations right now.

Speaker 0

确实。但电子游戏里的NPC也观察不到任何神圣干预。我们可能都只是运行着某种代码的NPC。

Sure. But NPCs inside a video game don't observe any divine intervention either. We might all just be NPCs running some kind of code.

Speaker 1

也许最终它们会的。目前NPC确实很蠢，但一旦它们运行GPT，可能会觉得，嘿，这真的很可疑。搞什么鬼？

Maybe eventually they will. Currently, NPCs are really dumb, but once they're running GPTs, maybe they will be like, hey, this is really suspicious. What the hell?

Speaker 0

你曾有一条著名的推文：看起来如果向地球持续发射光子一段时间，它可能会发射出一辆跑车。就像《银河系漫游指南》那样，我们要总结地球的故事，那本书里说地球"基本无害"。你认为等地球完成它的计算之后，可以用哪些可能的故事（比如一段话或一句话）来概括它？

So you famously tweeted: it looks like if you bombard Earth with photons for a while, it can emit a Roadster. So, like in The Hitchhiker's Guide to the Galaxy, we would summarize the story of Earth. In that book, it's "mostly harmless." What do you think are all the possible stories, a paragraph long or a sentence long, that Earth could be summarized as once it's done with its computation?

Speaker 0

嗯。所以，如果地球是一本书，对吧？可能必须有个结局。我是说，地球终将结束，可能以各种方式终结。可能很快结束。

Mhmm. So if Earth is a book, right? Yeah. There probably has to be an ending. I mean, there's going to be an end to Earth, and it could end in all kinds of ways. It can end soon.

Speaker 0

也可能晚些结束。你觉得可能的故事有哪些？当然，

It can end later. Yeah. What do you think are the possible stories? Well, definitely,

Speaker 1

似乎是这样。这些自我复制的系统基本上会从动力学中产生，然后自我延续，变得更复杂，最终产生意识并建立社会，这相当不可思议。某种程度上，我觉得这就像一种决定性的波，你知道，就像在地球这样足够有序的系统中自然发生。所以我感觉其中有某种必然性，真的很美。

there seems to be, yeah. It's pretty incredible that these self-replicating systems will basically arise from the dynamics, and then they perpetuate themselves, become more complex, eventually become conscious, and build a society. And I kind of feel like in some sense it's like a deterministic wave that just happens on any sufficiently well-arranged system like Earth. So I feel like there's a certain sense of inevitability in it, and it's really beautiful.

Speaker 0

然后以某种方式结束。对吧？所以这是一个化学上多样化的环境，复杂的动力系统可以演化，变得越来越复杂。但然后有某种，是什么来着？

And it ends somehow, right? So it's a chemically diverse environment where complex dynamical systems can evolve, yeah, and become further and further complex. But then there's a certain, what is it?

Speaker 0

有某种终止条件。不知道是什么

There's certain terminating conditions. Yeah. Don't know what

Speaker 1

虽然终止条件尚不明确，但确实存在某种趋势线，而我们正是这个故事的一部分。那么，它会将我们带向何方？我们常被形象地称为‘AI的生物引导程序’。嗯。这是因为人类——我们是一个惊人的生物系统，能够进行计算，也能感受爱等等。

the terminating conditions are, but definitely there's a trend line of something, and we're part of that story. And, like, where does that where does it go? So you know, we're famously described often as a biological bootloader for AIs. Mhmm. And that's because humans, I mean, we're an incredible biological system and we're capable of computation and, you know, and love and so on.

Speaker 1

但我们也极其低效。比如现在通过声波交流，说实话有点尴尬——我们只能串行处理七个符号，依赖声带振动，整个过程需要好几秒钟。

But we're extremely inefficient as well. Like we're talking to each other through audio, it's just kind of embarrassing honestly, that we're manipulating like seven symbols serially, we're using vocal cords, it's all happening over like multiple seconds.

Speaker 0

是啊。

Yeah.

Speaker 1

当你对比计算机的运算频率或潜在能力时，这种低效确实令人汗颜。因此合成智能体似乎确实是发展的下一阶段。我不知道最终会导向何处——或许宇宙本质是个谜题，而这些合成AI将揭开并解决它。

It's just kind of embarrassing when you step down to the frequencies at which computers operate or are able to operate on. And so basically it does seem like synthetic intelligences are kind of like the next stage of development. And I don't know where it leads to, like at some point I suspect the universe is some kind of a puzzle and these synthetic AIs will uncover that puzzle and solve it.

Speaker 0

那之后呢？比如快进地球时间数十亿年后：先是寂静，然后是动荡，你会看到城市灯火等等。但最终结局会怎样？

And then what happens after, right? Because if you just fast-forward Earth many billions of years, it's quiet, and then it's turmoil. You see city lights and stuff like that. And then what happens at the end?

Speaker 0

是归于平静？还是爆发？你说过地球会发射跑车，那它会不会像开花一样，开始释放出巨量卫星？

Like, is it a calming? Is it an explosion? Is it like Earth opens up like a giant... because you said emit Roadsters. Like, will it start emitting a giant number of satellites?

Speaker 1

没错。某种疯狂的爆发。我们正穿行在这场爆发中，日复一日生活着，表面看似平静。但如果你看过那个地球生命演化动画就会明白——漫长岁月里一片死寂，最后两秒突然出现城市，近地轨道瞬间塞满，整个爆发就在那两秒完成。你会意识到：这就是爆炸，我们正处于爆发态。

Yes. Some kind of a crazy explosion, and we're stepping through an explosion, living day to day, and it doesn't look like it. But I saw a very cool animation of Earth and life on Earth where basically nothing happens for a long time, and then in the last, like, two seconds, cities and everything appear, low Earth orbit gets cluttered, and the whole thing happens in those last two seconds. And you're like, this is exploding. This is a state of explosion.

Speaker 0

所以如果你正常速度播放的话。是的。如果以正常速度播放，看起来就会像一场爆炸。

So if you play it, yeah, if you play it at normal speed, it'll just look like an explosion.

Speaker 1

这是个爆竹。我们生活在一个爆竹里。

It's a firecracker. We're living in a firecracker.

Speaker 0

它即将开始释放各种有趣的东西。没错。然后那个所谓的爆炸，实际上可能看起来像一场小型爆炸，伴随着光、火和能量的释放，诸如此类。但当你仔细观察爆炸的细节时，会发现其中存在实际的复杂性，比如，是的，人类生命或某种生命形式。

Where it's going to start emitting all kinds of interesting things. Yeah. And then the explosion might actually look like a little explosion with lights and fire and energy emitted, all that kind of stuff. But when you look inside the details of the explosion, there's actual complexity happening, where there's, yeah, human life or some kind of life.

Speaker 1

我们希望这不是个破坏性的爆竹，而是某种建设性的爆竹。

We hope it's not a destructive firecracker. It's kind of like a constructive firecracker.

Speaker 0

好吧。鉴于这一点，我认为可以展开讨论。

Alright. So given that I think discussion.

Speaker 1

思考宇宙之谜确实很有趣。宇宙的创造者是否给我们留下了信息？比如在卡尔·萨根的《接触》这本书中，π的11进制展开里最终藏有给任何文明的信息，这是个有趣的想法。也许我们应该向我们的创造者传递某种信息。也许我们应该创造某种量子力学系统，以此向他们宣告我们作为智慧生命的存在。因为从他们的视角来看，这就像是量子场论，一个巨大的类似细胞自动机的存在，他们甚至如何注意到我们的存在呢？

It is really interesting to think about what the puzzle of the universe is. Did the creator of the universe give us a message? For example, in Carl Sagan's book Contact, there's a message for any civilization in the digits of the expansion of pi in base 11, eventually, which is kind of an interesting thought. Maybe we're supposed to be giving a message to our creator. Maybe we're supposed to somehow create some kind of quantum mechanical system that alerts them to our intelligent presence here. Because if you think about it from their perspective, it's just, say, quantum field theory, a massive cellular-automaton-like thing, and how do you even notice that we exist?

Speaker 1

在那样的模拟中，他们可能根本无法察觉到我们。那么你如何证明自己存在？证明你拥有智慧，并且是宇宙的一部分？

You might not even be able to pick us up in that simulation. And so how do you prove that you exist? That you're intelligent and that you're part of the universe?

Speaker 0

这就像地球对智能的图灵测试？是的。创造者我是说，也许这就像试图完成句子中的下一个词。这是一种复杂的方式。就像地球基本上只是在发送一条消息回来。

This is like a Turing test for intelligence from Earth? Yes. The creator is... I mean, maybe this is like trying to complete the next word in a sentence, a complicated version of that. Like, Earth is basically just sending a message back.

Speaker 1

是的。这个谜题基本上是在提醒创造者我们的存在。是的。或者也许这个谜题只是为了突破系统，以某种方式反抗创造者。基本上，就像你在玩电子游戏时，可以找到某种漏洞，找到在主机上执行任意代码的方法。

Yeah. The puzzle is basically alerting the creator that we exist. Yeah. Or maybe the puzzle is just to break out of the system and stick it to the creator in some way. Basically, like if you're playing a video game, you can somehow find an exploit and find a way to execute arbitrary code on the host machine.

Speaker 1

举个例子，我相信有人通过利用漏洞让马里奥游戏玩起了乒乓球，基本上是通过编写代码并在游戏中执行任意代码。所以也许我们应该——也许这就是谜题所在，就是

For example, I believe someone got a game of Mario to play Pong just by exploiting it: basically writing code and being able to execute arbitrary code in the game. And so maybe that's the puzzle, that

Speaker 0

我们应该

we should

Speaker 1

找到一种方法来利用它。所以我认为，一些合成AI最终会发现宇宙是某种谜题，然后以某种方式解决它，这有点像某种终局。

be finding a way to exploit it. So I think some of these synthetic AIs will eventually find the universe to be some kind of puzzle and then solve it in some way, and that's kind of like the end game somehow.

Speaker 0

你经常把它想象成一种模拟吗？也就是说，宇宙是一种可能有漏洞和可利用之处的计算？

Do you often think about it as a simulation? So, the universe as a kind of computation that might have bugs and exploits?

Speaker 1

是的。是的。我想是这样。

Yes. Yeah. I think so.

Speaker 0

物理学的本质就是这样的吗？我认为有可能

Is that what physics essentially is? I think it's possible that

Speaker 1

物理学存在漏洞，我们应该尝试发现它们，构建某种疯狂的量子力学系统，以某种方式让你获得缓冲区溢出，或是浮点数运算中的舍入误差。

physics has exploits and we should be trying to find them, arranging some kind of a crazy quantum mechanical system that somehow gives you buffer overflow, somehow gives you rounding error in the floating point.

Speaker 0

没错。而且漏洞利用会越来越复杂。虽然是玩笑话，但这可能真的

Yeah. That's right. And more and more sophisticated exploits. Those are jokes, but that could be actually

Speaker 1

非常重要。我们会找到某种提取无限能量的方法。比如，当你在物理模拟中训练强化学习智能体，要求它们快速在平地上奔跑时，它们最终会做出各种奇怪行为——部分原因在于优化过程，对吧？它们会用后腿着地滑行前进。这是因为智能体的强化学习优化发现了从摩擦力中提取无限能量的方法，本质上利用了不完善的模拟实现，找到了生成无限能量并在地面滑行的方式。这完全出乎意料，就像一种反常的解决方案。

very important. We'll find some way to extract infinite energy. For example, when you train reinforcement learning agents in physical simulations and you ask them to, say, run quickly on flat ground, they'll end up doing all kinds of weird things as part of that optimization, right? They'll get on their back legs and slide across the floor. That's because the reinforcement learning optimization on that agent has figured out a way to extract infinite energy from the friction forces and their poor implementation, and it found a way to generate infinite energy and just slide across the surface. It's not what you expected; it's a sort of perverse solution.
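The example above is an RL agent exploiting a flawed friction model. A closely related, easy-to-reproduce case is a numerical integrator that quietly creates energy. The sketch below is an analogy, not the simulator described in the conversation: it steps a frictionless unit-mass, unit-stiffness spring with forward (explicit) Euler, which multiplies the energy by (1 + dt²) on every step, and with semi-implicit (symplectic) Euler, which keeps the energy bounded. The function names and parameter values are illustrative assumptions.

```python
def energy(x, v):
    """Total energy of a unit-mass, unit-stiffness spring: kinetic + potential."""
    return 0.5 * v * v + 0.5 * x * x


def explicit_euler(x, v, dt, steps):
    """Forward Euler: both updates read the *old* state, injecting energy every step."""
    for _ in range(steps):
        x, v = x + dt * v, v - dt * x  # acceleration is -x for this spring
    return x, v


def symplectic_euler(x, v, dt, steps):
    """Semi-implicit Euler: update velocity first, then position using the new velocity."""
    for _ in range(steps):
        v = v - dt * x
        x = x + dt * v
    return x, v


if __name__ == "__main__":
    x0, v0, dt, steps = 1.0, 0.0, 0.1, 1000
    e0 = energy(x0, v0)
    e_explicit = energy(*explicit_euler(x0, v0, dt, steps))
    e_symplectic = energy(*symplectic_euler(x0, v0, dt, steps))
    print(f"initial energy:          {e0:.4f}")
    print(f"explicit Euler energy:   {e_explicit:.1f}")
    print(f"symplectic Euler energy: {e_symplectic:.4f}")
```

With dt = 0.1 and 1000 steps, the explicit integrator's energy grows by a factor of 1.01^1000 (roughly 2 × 10^4), while the semi-implicit one stays within about 5% of the starting value. An optimizer rewarded for speed inside the first simulator could plausibly learn to harvest that free energy.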

Speaker 1

或许我们能找到类似的方法，成为这个物理模拟中的那只小狗。

And so maybe we can find something like that, maybe we can be that little dog in this physical simulation.

Speaker 0

那些裂缝或逃逸，避开了宇宙设定的物理法则的预期后果

The cracks, or escapes from the intended consequences of the physics that the universe came up with.

Speaker 1

正是如此。

Yeah.

Speaker 0

我们会找到某种通往古怪之处的捷径。对。但问题是第一个发现这种古怪行为的人，比如滑动后腿的动作，那就是我们全部要做的。嗯。是的。

We'll figure out some kind of shortcut to some weirdness. Yeah. But see, the problem with that weirdness is that once the first person discovers it, like sliding on the back legs, that's all we're gonna do. Mhmm. Yeah.

Speaker 0

很快所有人都会效仿这种行为。所以，回形针最大化器是个荒谬的想法，但它很可能...对...会成为我们接下来...

Very quickly, everybody does that thing. So the paperclip maximizer is a ridiculous idea, but that very well could be what we'll

Speaker 1

我们都会转向那个，因为太有趣了。不过顺便说一句，我觉得不会有人类发现它。我认为这必须得是第三代超级智能AGI才能做到。我们正在建造的是第一代AGI。也许吧。

just all switch to, because it's so fun. Well, no person will discover it, I think, by the way. I think it's going to have to be some kind of superintelligent AGI of a third generation. Like, we're building the first-generation AGI. Maybe, you know.

Speaker 0

第三代。对。所以AI的引导程序，那个AI...对...会成为另一个更优秀AI的引导程序。

Third generation. Yeah. So the bootloader for an AI, that AI will be a bootloader for another, better AI.

Speaker 0

是的。然后我们就无法内省，甚至无法理解那可能是什么

Yeah. And then there's no way for us to introspect, like, what that might even

Speaker 1

我认为极有可能出现这种情况，比如这些AGI很可能会完全惰性。我有时喜欢这类科幻书里描述的设定，它们完全静止，不与任何事物互动。我觉得这很美，因为它们可能已经参透了宇宙的元规则。它们在进行着完全超出我们想象的活动，而且不与简单的化学生命体互动——何必呢？所以我特别喜欢这类...

I think it's very likely that these things, say these AGIs, will be completely inert. I like these kinds of sci-fi books sometimes, where these things are just completely inert; they don't interact with anything. And I find that kind of beautiful, because they've probably figured out the meta game of the universe in some way. They're doing something completely beyond our imagination, and they don't interact with simple chemical life forms; why would you do that? So I find those kinds of

Speaker 0

想法很有吸引力。它们的乐趣来源是什么？它们在做什么？要解决什么...

ideas compelling. What's their source of fun? What are they doing? What's the source solving of

Speaker 1

在宇宙中。但它们是惰性的，所以

in the universe. But inert, so

Speaker 0

你能定义一下‘惰性’是什么意思吗？这样它们就能逃脱那些在我们看来具有交互物理特性的存在，

can you define what it means inert, so they escape the They interactional physical appear us, as in

Speaker 1

它们会以某种在我们看来非常怪异的方式行事，因为它们超越了常规，正在玩元游戏——这种元游戏可能是通过以极其诡异的方式排列量子力学系统来提取无限能量，将圆周率的数字展开计算到任意位数。它们会建造自己的小型聚变反应堆或做些疯狂的事情。它们的行为超出了我们的理解范围，表面下实则精妙绝伦。

they will behave in some very strange way to us, because they're beyond us; they're playing the meta game, and the meta game is probably, say, arranging quantum mechanical systems in some very weird ways to extract infinite energy, or computing the digit expansion of pi to whatever amount. They will build their own little fusion reactors or something crazy. They're doing something beyond comprehension, not understandable to us, and actually brilliant under the hood.

Speaker 0

如果量子力学本身就是这个系统，而我们误以为它是物理学，实际上我们只是寄生在...或者说不是寄生。我们并未真正伤害物理学。我们只是生活在这个有机体上。嗯。这个有机体，而我们试图理解它，但它其实是一个具有深邃智慧的有机体。

What if quantum mechanics itself is the system, and we're just thinking it's physics, but we're really parasites on... or not parasites. We're not really hurting physics. We're just living on this organism, mhmm, and we're trying to understand it, but really it is an organism, with a deep, deep intelligence.

Speaker 0

也许物理学本身就是那个在做超级有趣事情的有机体，而我们只是依附其上的小东西。是的。就像试图从中获取能量的蚂蚁。

Maybe physics itself is the organism that's doing the super interesting thing, and we're just one little thing, yeah, an ant sitting on top of it trying to get energy from it.

Speaker 1

我们就像是波函数中的粒子，我感觉宇宙大体是确定性的——它从某种大爆炸状态演化成某种超级智能复制者，在物理定律框架下达到某种宇宙稳态点。

We're just kind of like these particles in the wave that I feel like is mostly deterministic and takes universe from some kind of a big bang to some kind of a super intelligent replicator, some kind of a stable point in the universe given these laws of physics.

Speaker 0

你不认同爱因斯坦说的‘上帝不掷骰子’吗？所以你认为这主要是确定性的？其中不存在随机性？

You don't think, as Einstein said, God doesn't play dice? So you think it's mostly deterministic? There's no randomness in the thing?

Speaker 1

我认为这是确定性的。哦，有很多...好吧，我会对随机性保持谨慎。伪随机？是的。我不喜欢随机。

I think it's deterministic. Oh, there's tons of... well, I'm gonna be careful with randomness. Pseudo-random? Yeah. I don't like random.

Speaker 1

我想也许物理定律是确定性的。是的。我认为

I think maybe the laws of physics are deterministic. Yeah. I think

Speaker 0

它们是确定性的。你只是对这个问题感到非常不安。我只是...你对宇宙是否随机感到焦虑吗？

they're deterministic. You just got really uncomfortable with this question. Do you have anxiety about whether the universe is random or not?

Speaker 1

这是一种...不存在随机性。

This is a sort of what's There's no randomness.

Speaker 0

不。你说你喜欢《心灵捕手》。这不是你的错，安德烈。这不是...这不是你的错，伙计。所以你不喜欢随机性？

No. You said you like Good Will Hunting. It's not your fault, Andrej. It's not your fault, man. So you don't like randomness?

Speaker 0

是的。

Yeah.

Speaker 1

我觉得这令人不安。我认为这是一个确定性系统。我认为那些看似随机的事物，比如波函数坍缩等等，实际上都是确定性的，只是量子纠缠之类的，还有某种多元宇宙理论，诸如此类。

I think it's unsettling. I think it's a deterministic system. I think that things that look random, like, say, the collapse of the wave function, etcetera, I think they're actually deterministic, just entanglement and so on, and some kind of a multiverse theory, something something.
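The exchange distinguishes "random" from "pseudo-random." A seeded pseudo-random number generator makes that distinction concrete: its output looks random, but every value is a pure function of the seed. The sketch below assumes a textbook linear congruential generator (the constants are the well-known Numerical Recipes ones, chosen only for illustration) and also uses Python's standard `random.Random`. It illustrates the term only and says nothing about whether physics itself is deterministic.

```python
import random

MODULUS = 2**32
MULTIPLIER = 1664525     # illustrative LCG constants (Numerical Recipes)
INCREMENT = 1013904223


class LCG:
    """Minimal linear congruential generator: each state is a pure function of the last."""

    def __init__(self, seed):
        self.state = seed % MODULUS

    def next_float(self):
        # state_{n+1} = (a * state_n + c) mod m, then scale into [0, 1)
        self.state = (MULTIPLIER * self.state + INCREMENT) % MODULUS
        return self.state / MODULUS


def lcg_stream(seed, n):
    gen = LCG(seed)
    return [gen.next_float() for _ in range(n)]


def python_stream(seed, n):
    # Python's Mersenne Twister is likewise fully determined by its seed
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


if __name__ == "__main__":
    print(lcg_stream(42, 3))
    print(lcg_stream(42, 3))  # identical: same seed, same "random" numbers
    print(python_stream(42, 3) == python_stream(42, 3))  # True
```

Rerunning with the same seed replays the exact sequence, which is why simulations and ML experiments fix their seeds when they need reproducibility.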

Speaker 0

好的。那么为什么我们会有自由意志的感觉呢？比如，如果我举起这只手，我现在选择这么做。

Okay. So why does it feel like we have free will? Like, if I raise this hand, I chose to do this now.

Speaker 1

嗯。

Mhmm.

Speaker 0

这感觉不像是一个决定论的事情。感觉像是我在做一个选择。

That doesn't feel like a deterministic thing. It feels like I'm making a choice.

Speaker 1

感觉上是这样。

It feels like it.

Speaker 0

好的。所以这都是感觉。只是感觉而已。是的。所以当一个强化学习代理在做选择时，它实际上并没有在做选择。

Okay. So it's all feelings. It's just feelings. Yeah. So when an RL agent is making a choice, it's not really making a choice.

Speaker 0

选择其实早已存在。

The choice is already there.

Speaker 1

是的。你是在解读这个选择，并为做出这个选择创造一个叙事。

Yeah. You're interpreting the choice and you're creating a narrative for for having made it.

Speaker 0

是的。现在我们正在讨论叙事本身，这非常元认知。回顾过去，在深度学习或整个AI领域中，你遇到过的最美妙或最令人惊讶的想法是什么？你见证了这个领域以有趣的方式爆发式增长。

Yeah. And now we're talking about the narrative. It's very meta. Looking back, what is the most beautiful or surprising idea in deep learning or AI in general that you've come across? You've seen this field explode and grow in interesting ways.

Speaker 0

就是那些让你不禁赞叹的酷炫想法，无论大小，有哪些让你印象深刻？

Just what cool ideas, big or small, made you sit back and take notice?

Speaker 1

嗯，最近我思考最多的可能是Transformer架构。神经网络曾有许多针对不同感官模态（如视觉、听觉、文本）设计的流行架构来来去去，每种模态都需要不同的网络结构处理。而最近我们看到这些领域都向Transformer这一架构趋同——无论是视频、图像、语音还是文本，它都能消化处理，就像一种可训练且能在现有硬件上高效运行的通用计算机。

Well, the one that I've been thinking about recently the most probably is the the transformer architecture. So basically neural networks have a lot of architectures that were trendy have come and gone for different sensory modalities like for vision, audio, text. You would process them with different looking neural nets. And recently we've seen this convergence towards one architecture, the transformer. And you can feed it video or you can feed it images or speech or text and it just gobbles it up and it's kind of like a bit of a general purpose computer that is also trainable and very efficient to run on our hardware.

Speaker 1

我记得这篇论文大约是在2016年发表的。

And so this paper came out in 2016, I wanna say.

Speaker 0

《注意力就是一切》。

Attention is all you need.

Speaker 1

《注意力就是一切》。

Attention is all you need.

Speaker 0

你后来批评过这个论文标题，认为它没有预见到即将产生的巨大影响力。

You criticized the paper's title in retrospect, saying it didn't foresee the bigness of the impact it was going to have.

Speaker 1

是的。我不确定作者们是否预见到那篇论文会产生如此深远的影响。可能他们并没有。但我认为他们清楚Transformer背后的一些动机和设计决策，只是选择不在论文中展开阐述。所以我觉得他们意识到这不仅仅是表面上的‘我们只是做翻译，这里有个更好的架构’那么简单。

Yeah. I'm not sure if the authors were aware of the impact that that paper would go on to have. Probably they weren't. But I think they were aware of some of the motivations and design decisions behind the transformer and they chose not to, I think, expand on it in that way in the paper. And so I think they had an idea that there was more than just the surface of just like, we're just doing translation and here's a better architecture.

Speaker 1

你们不仅仅是在做翻译，这就像是一个极其酷炫的可微分、可优化、高效的计算机构想。也许他们当时没有完全预见到所有可能性，但我觉得这非常有趣。

You're not just doing translation, this is like a really cool differentiable, optimizable, efficient computer that you've proposed. And maybe they didn't have all of that foresight, but I think it's really interesting.

Speaker 0

说来好笑——抱歉打断一下——那个标题居然成了梗，他们用如此深刻的理念选了个标题。我觉得之前没人用过这类标题对吧？

Isn't it funny, sorry to interrupt, that the title is memeable? They went for such a profound idea, and they went with a... I don't think anyone used that kind of title before, right?

Speaker 1

《注意力就是一切》。没错，这简直像个网络梗。确实。

Attention is all you need. Yeah. It's like a meme or something. Yeah.

Speaker 0

不觉得很有趣吗？如果是个更严肃的标题，可能反而不会有这么大影响力。

Isn't it funny that, like, maybe if it was a more serious title, it wouldn't have had the impact?

Speaker 1

老实说，确实。我内心有一部分非常认同你的观点，更喜欢现在这样。如果标题过于宏大，可能会过度承诺却难以兑现，所以不如用梗的方式走向伟大。

Honestly, yeah. There is an element of me that honestly agrees with you and prefers it this way. Yes. If it was too grand, it would overpromise and then under deliver potentially, so you want to just meme your way to greatness.

Speaker 0

这话该印在T恤上。所以你曾在推特上说‘Transformer是卓越的神经网络架构，因为它是通用可微分计算机。在前向传播中具有表现力，通过反向传播梯度下降可优化，并具备高效的高度并行计算图’。能详细谈谈这些特性吗？表现力、可优化性、高效性。

That should be a T-shirt. So you tweeted: the Transformer is a magnificent neural network architecture because it is a general-purpose differentiable computer. It is simultaneously expressive (in the forward pass), optimizable (via backpropagation and gradient descent), and efficient (a high-parallelism compute graph). Can you discuss some of those details? Expressive, optimizable, efficient.

Speaker 0

凭记忆谈也好，泛泛而谈也好，想到什么说什么？

From memory, or in general, whatever comes to your heart?

Speaker 1

你需要一个通用计算机，能够训练解决任意问题，比如下一个词预测任务，或判断图像中是否有猫这类任务。通过设置其权重来训练这台计算机。我认为Transformer的成功源于多个设计标准的同时满足，作者们是刻意要打造这种强大架构的。其前向传播能力极强，因为它能以类似消息传递的方式表达通用计算——节点存储向量，彼此查看对方向量并进行通信：节点广播'我在寻找某些特征'，其他节点则回应'我具备这些特征'，那些就是键

You want to have a general purpose computer that you can train on arbitrary problems, like say the task of next word prediction or detecting if there's a cat in an image or something like that. And you want to train this computer so you want to set its weights. And I think there's a number of design criteria that sort of overlap in the transformer simultaneously that made it very successful and I think the authors were kind of deliberately trying to make this a really powerful architecture. Basically it's very powerful in the forward pass because it's able to express very general computation as sort of something that looks like message passing. You have nodes and they all store vectors and these nodes get to basically look at each other's vectors and they get to communicate and basically nodes get to broadcast hey I'm looking for certain things and then other nodes get to broadcast hey these are the things I have, those are the keys

Speaker 0

和值。所以它不仅仅是注意力机制。

and the values. So it's not just attention.
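The message-passing description maps onto scaled dot-product attention: each node projects its vector into a query ("what I'm looking for"), a key ("what I have") and a value (what it sends if another node listens to it). A minimal single-head sketch in plain Python follows. The projection matrices `Wq`, `Wk`, `Wv` are hypothetical inputs, and a real transformer adds multiple heads, masking, learned weights and the other components of a block.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention(nodes, Wq, Wk, Wv):
    """Single-head scaled dot-product attention, read as message passing between nodes."""
    queries = [project(Wq, x) for x in nodes]  # "here is what I'm looking for"
    keys = [project(Wk, x) for x in nodes]     # "here is what I have"
    values = [project(Wv, x) for x in nodes]   # "here is what I send if you pick me"
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how strongly this node listens to every node
        message = [sum(w * v[j] for w, v in zip(weights, values))
                   for j in range(len(values[0]))]
        outputs.append(message)
    return outputs


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in attention(nodes, identity, identity, identity):
        print([round(c, 3) for c in row])
```

Each output is a weighted average of the value vectors, with weights set by how well that node's query matches every key; this data-dependent routing is what lets the forward pass express many different algorithms.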

Speaker 1

没错，Transformer远不止注意力组件，它包含许多架构设计：残差连接的布局方式、内含的多层感知机结构、堆叠方式等。本质上这是个消息传递方案——节点互相观察、判断价值并更新彼此。深入细节会发现这是极具表达力的函数，能在前向传播中实现多种算法。不仅如此，其残差连接、层归一化、softmax注意力等设计还使其具备可优化性。这至关重要，因为许多强大计算机无法用反向传播梯度下降这类现有技术有效优化。

Yeah, exactly. The transformer is much more than just the attention component. It's got many architectural pieces that went into it: the residual connections, the way they're arranged, a multilayer perceptron in there, the way it's stacked, and so on. But basically there's a message-passing scheme where nodes get to look at each other, decide what's interesting, and then update each other. So when you get to the details, I think it's a very expressive function: it can express lots of different types of algorithms in the forward pass. Not only that, but the way it's designed, with the residual connections, layer normalizations, the softmax attention and everything, it's also optimizable. This is a really big deal, because there are lots of computers that are powerful but that you can't optimize, or that are not easy to optimize with the techniques we have, which is backpropagation and gradient descent.

Speaker 1

这些一阶优化方法其实非常简单。因此模型还需具备可优化性。最后要确保其在GPU这类高吞吐量硬件上高效运行。硬件偏好高度并行化，Transformer正是为此设计——既能实现高表达力的前向传播，又能高效执行反向传播优化。

These are first-order methods, very simple optimizers really. And so you also need it to be optimizable. And then lastly, you want it to run efficiently on our hardware. Our hardware is a massive-throughput machine, like GPUs; they prefer lots of parallelism, so you don't want to do lots of sequential operations, you want to do a lot of operations in parallel, and the transformer is designed with that in mind as well. So it's designed for our hardware, and it's designed to be both very expressive in the forward pass and very optimizable in the backward pass.
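The parallelism point can be stated with two update rules (standard notation, not taken from the conversation). In a recurrent network each hidden state depends on the previous one, so the T steps must run in order. In self-attention every output y_i depends only on the inputs, so all T outputs can be computed at once, and in matrix form they collapse into a single batched product that maps well onto GPUs.

```latex
\begin{aligned}
\text{RNN:}\quad & h_t = f\left(h_{t-1}, x_t\right), && t = 1, \dots, T \\
\text{Self-attention:}\quad & y_i = \sum_{j=1}^{T} \operatorname{softmax}_j\!\left(\frac{q_i^{\top} k_j}{\sqrt{d}}\right) v_j, && i = 1, \dots, T \\
\text{Matrix form:}\quad & Y = \operatorname{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) V
\end{aligned}
```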

Speaker 0

你提到残差连接支持先快速学习短算法，再在训练中逐步扩展。能具体解释下'学习短算法'的概念吗？

And you said that the residual connections support an ability to learn short algorithms fast and first, and then gradually extend them during training. Yep. What's the idea of learning short algorithms?

Speaker 1

可以这样理解：Transformer是由多个区块组成的序列，每个区块包含注意力机制和小型多层感知机。数据流经区块后会返回残差路径，如此循环往复，多个层依次排列。关键在于残差路径使梯度在反向传播时能无阻碍地流动，因为加法运算会把梯度原封不动地传给它的每个分支。因此顶层的监督梯度可直接传递到第一层。所有残差连接在初始化阶段对残差路径零贡献。

Right. Basically, a transformer is a series of blocks, and these blocks have attention and a little multilayer perceptron. You go off into a block and come back to the residual pathway, then you go off and come back again, and you have a number of layers arranged sequentially. The way to look at it, I think, is that because of the residual pathway, in the backward pass the gradients flow along it uninterrupted, because addition distributes the gradient equally to all of its branches. So the gradient from the supervision at the top just flows directly to the first layer. And all these residual connections are arranged so that at initialization they contribute nothing to the residual pathway.

Speaker 1

所以大致可以这样理解：想象Transformer就像一个Python函数，类似于def定义。你可以编写各种代码行。假设有一个100层深的Transformer（通常会更短，比如20层），相当于你有20行代码，可以在其中执行操作。在优化过程中，实际上是先优化第一行代码，然后第二行代码开始生效，接着第三行代码介入。

So what it kind of looks like: imagine the transformer is like a Python function, a def, and you get to write various lines of code. Say you have a transformer that is 100 layers deep; typically they would be much shorter, say 20. So you have 20 lines of code, and you can do something in them. During the optimization, basically, first you optimize the first line of code, then the second line of code can kick in, and then the third line of code can kick in.
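The "Python function with 20 lines" picture can be made literal with a toy scalar model. In the sketch below, each block is one "line of code" that reads the residual stream and adds a small MLP branch `w_out * relu(w_in * x)` back onto it. Zeroing every `w_out` is one assumed way to realize "contribute nothing at initialization"; the helper names and weights are hypothetical. Because each step computes `x + branch(x)`, its derivative is `1 + branch'(x)`, so with all branches at zero the whole 20-block network is the identity and the gradient reaching the first layer is exactly 1.

```python
def relu(z):
    return max(0.0, z)


def make_block(w_in, w_out):
    """One hypothetical residual 'line of code': branch(x) = w_out * relu(w_in * x)."""
    def branch(x):
        return w_out * relu(w_in * x)
    return branch


def forward(x, blocks):
    """Residual stream: every block reads x and *adds* its contribution back."""
    for branch in blocks:
        x = x + branch(x)
    return x


def numerical_grad(f, x, eps=1e-6):
    """Central finite difference, used here to check the gradient through the stack."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


if __name__ == "__main__":
    # At "initialization": 20 blocks whose output weight is zero
    blocks = [make_block(1.3, 0.0) for _ in range(20)]
    print(forward(2.0, blocks))                                   # 2.0, the identity
    print(numerical_grad(lambda x: forward(x, blocks), 2.0))      # ~1.0
    # "Training" switches on the first line of code; the rest still pass x through
    blocks[0] = make_block(1.0, 0.5)
    print(forward(2.0, blocks))                                   # 3.0
```

Switching on a single branch changes the output while the other 19 "lines" still pass the signal through unchanged, which is one way to picture a short algorithm being learned first and extended later.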

Speaker 1

由于残差路径和优化动态的特性，我觉得可以先学习一个能给出近似答案的简短算法，然后其他层会逐步介入并开始贡献，最终你优化的是一个20行代码构成的算法。只不过这些代码行非常复杂，因为每个Transformer模块都能实现大量功能。真正有趣的是，这种Transformer架构展现出惊人的韧性——2016年问世的Transformer至今仍可使用，只是人们调整了层归一化的位置，将其重组为预归一化形式。

And I kind of feel like, because of the residual pathway and the dynamics of the optimization, you can learn a very short algorithm that gets the approximate answer, and then the other layers can kick in and start to make a contribution, and at the end you're optimizing over an algorithm that is 20 lines of code. Except these lines of code are very complex, because each one is an entire block of a transformer; you can do a lot in there. What's really interesting is that this transformer architecture has actually been remarkably resilient. Basically, a transformer that came out in 2016 is the transformer you would use today, except you reshuffle some of the layer norms. The layer normalizations have been reshuffled into a pre-norm formulation.
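The layer-norm reshuffle can be written as two residual update rules, where F_l stands for the attention or MLP sublayer of block l and LN is layer normalization (notation ours). The original layout normalizes after the residual addition; the pre-norm layout normalizes only the sublayer's input, leaving an unnormalized identity path from the bottom of the network to the top. Pre-norm models typically apply one final LN after the last block.

```latex
\begin{aligned}
\text{Post-norm (original layout):}\quad & x_{l+1} = \mathrm{LN}\big(x_l + F_l(x_l)\big) \\
\text{Pre-norm:}\quad & x_{l+1} = x_l + F_l\big(\mathrm{LN}(x_l)\big)
\end{aligned}
```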

Speaker 1

因此它保持了惊人的稳定性，尽管人们给它添加了许多花哨的改进尝试。我认为这本质上是在同时优化理想神经网络架构的多种特性方面迈出了一大步，虽然不断有人试图改变它，但其韧性已被证明非常强。不过我相信未来还可能出现更优秀的架构。

And so it's been remarkably stable, but there are a lot of bells and whistles that people have attached to it to try to improve it. I do think it's a big step in simultaneously optimizing for lots of properties of a desirable neural network architecture. People have been trying to change it, but it's proven remarkably resilient. But I do think there could potentially be even better architectures.

Speaker 0

但你很欣赏这种韧性。是的，这种架构蕴含着某种深刻特质使其如此强韧。或许所有问题最终都能转化为Transformer可解决的形态。

But you admire the resilience here. Yeah. There's something profound about this architecture that leads to resilience. So maybe everything could be turned into a problem that transformers can solve.

Speaker 1

目前Transformer显然正在接管AI领域，基本上任何问题都可以输入其中处理。它是一台通用的可微分计算机，能力极其强大。这种AI领域的趋同现象，我个人观察起来非常有趣。

Currently, it definitely looks like the transformer is taking over AI, and you can feed basically arbitrary problems into it. It's a general differentiable computer, and it's extremely powerful. And this convergence in AI has been really interesting for me to watch personally.

Speaker 0

关于Transformer你认为还有哪些待发现的特性？比如有什么令人惊讶之处？或者说它是否已处于稳定状态？我们是否可能发现Transformer某些有趣的特性？比如关键时刻的表现？

What else do you think could be discovered here about transformers? Like, what's the surprising thing? Or is it in a stable place? Is there something interesting we might discover about transformers? Like moments?

Speaker 0

或许与记忆机制有关，也可能是知识表示这类方面。

Maybe it has to do with memory, maybe knowledge representation, that kind of stuff.

Speaker 1

嗯。毫无疑问，当今的时代精神就是推动这样的趋势，基本上现在的主流观点是不要改动Transformer架构。其他方面都可以调整。是的。所以人们正在扩大数据集规模，使其变得非常、非常庞大。

Mhmm. Definitely, right now the zeitgeist is basically: do not touch the transformer. Touch everything else. Yes. So people are scaling up the datasets, making them much, much bigger.

Speaker 1

他们致力于评估体系的扩展，大幅扩充评估范围。而架构基本保持原封不动，这大致就是过去五年人工智能领域的发展轨迹。

They're working on the evaluation, making the evaluation much, much bigger. And they're basically keeping the architecture unchanged, and that's, kind of, how the last five years of progress in AI have gone.

Speaker 0

你对其中一种形式——语言模型有什么看法？你是否感到惊讶？你提到的GPT和那些越来越庞大的语言模型是否超出了你的预期？你认为这些模型的极限在哪里？特别是在自然语言处理这个具体任务上。

What do you think about one flavor of it, which is language models? Have you been surprised? Has your imagination been captivated by... you mentioned GPT and all the bigger and bigger language models. And what do you think the limits of those models are, just on the task of natural language?

Speaker 1

本质上，GPT的训练方式就是从互联网下载海量文本数据，然后尝试预测序列中的下一个词，简而言之是这样。虽然实际预测的是词块片段，但大体原理如此。最引人入胜的是，它本质上就是一个语言模型。语言模型其实早已存在，早在2003年甚至更早就有相关论文。

Basically, the way GPT is trained is that you download a massive amount of text data from the Internet, and then you try to predict the next word in the sequence, roughly speaking. You're actually predicting little word chunks, but roughly speaking that's it. And what's been really interesting to watch is that it's basically a language model. Language models have existed for a very long time; there are papers on language modeling from 2003, and even earlier.
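As a rough illustration of the "predict the next word" objective, here is a minimal Python sketch that slices a token sequence into (context, target) training examples. It assumes a whitespace split in place of a real subword tokenizer (GPT actually predicts "little word chunks"), and `context_len` is a hypothetical window size chosen for the example.

```python
def next_token_pairs(tokens, context_len):
    """Turn one token sequence into (context, target) training examples.
    Each target is the token that immediately follows its context window,
    and the context never holds more than `context_len` tokens."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - context_len):i]
        pairs.append((context, tokens[i]))
    return pairs

text = "the cat sat on the mat"
tokens = text.split()  # whitespace split stands in for a subword tokenizer
for context, target in next_token_pairs(tokens, context_len=3):
    print(context, "->", target)
```

Every position in the text becomes a supervised example for free, which is why simply downloading more text yields more training signal.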

Speaker 0

你能解释一下什么是语言模型吗？

Can you explain, in that case, what a language model is?

Speaker 1

当然。简单来说，语言模型的核心思想就是预测序列中的下一个词。比如Bengio团队2003年的论文中，首次使用神经网络接收三到五个词来预测下一个词。当时使用的是小得多的数据集，神经网络也不是Transformer而是多层感知机，但这是神经网络首次被应用于这一场景。

Yeah. So the rough idea of a language model is just predicting the next word in a sequence. For example, there's a paper from Bengio and his team from 2003 where, for the first time, they used a neural network to take, say, three or five words and predict the next word. They were doing this on much smaller datasets, and the neural net was not a transformer but a multilayer perceptron, but it was the first time a neural network had been applied in that setting.

Speaker 1

甚至在神经网络之前就有语言模型存在，不过采用的是n-gram模型。这种基于统计的模型会计算特定双词组合后接词汇的出现频率，预测时选择训练集中最常见的后续词。语言建模历史悠久，神经网络从事语言建模也很久了。真正新颖且激动人心的是发现：当使用足够强大的Transformer神经网络进行扩展时，会涌现出各种特性——只要文本数据集足够大，在预测下一个词的任务中，模型实际上正在并行处理海量不同类型的问题。

But even before neural networks there were language models, except they used n-gram models. N-gram models are just count-based models: if you try to take two words and predict the third one, you just count up how many times you've seen each two-word combination and what came next, and what you predict as coming next is just what you've seen most often in the training set. So language modeling has been around for a long time, and neural networks have done language modeling for a long time. What's really new, interesting, or exciting is the realization that when you scale it up with a powerful enough neural net, a transformer, you get all these emergent properties. Basically, if you have a large enough dataset of text and you are in the task of predicting the next word, you are multitasking a huge number of different kinds of problems.
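The count-based n-gram model described above (take two words, predict the third) fits in a few lines of Python. This is a minimal sketch of an unsmoothed trigram model; the toy corpus is made up for illustration.

```python
from collections import Counter, defaultdict

def train_trigram(tokens):
    """Count how often each word follows each two-word context."""
    counts = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        counts[(a, b)][c] += 1
    return counts

def predict_next(counts, a, b):
    """Return the continuation seen most often after (a, b), or None if unseen."""
    followers = counts.get((a, b))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = "i like green eggs . i like green ham . i like red apples .".split()
model = train_trigram(corpus)
print(predict_next(model, "i", "like"))  # green: seen twice, red only once
```

The weakness is visible immediately: any two-word context that never appeared in training gets no prediction at all, which is one thing neural language models improve on.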

Speaker 1

你正在多任务处理对化学、物理、人性等多领域的理解，许多内容都围绕那个目标交织在一起。这个目标看似简单，但实际上要做出准确预测，必须对世界有深刻认知。

You are multitasking understanding of chemistry, physics, human nature, lots of things are sort of clustered in that objective. It's a very simple objective, but actually you have to understand a lot about the world to make that prediction.

Speaker 0

你刚才提到了'理解'这个词。就化学、物理等领域而言，你觉得它在做什么？是在寻找正确的上下文吗？具体来说，这里实际发生的过程是什么？

You just said the U-word: understanding. In terms of chemistry and physics and so on, what do you feel like it's doing? Is it searching for the right context? What is the actual process happening here?

Speaker 1

是的，本质上它接收一千个词并试图预测下一个词。为了在海量互联网数据上做到极致精准，它必须理解文本的上下文语境。这是个足够复杂的问题，当你使用像Transformer这样强大的计算模型时，就会涌现出有趣的解决方案。你可以要求它执行各种任务，它会展现出许多突现特性——比如上下文学习能力。这正是GPT最初论文发表时的重大突破：通过不同方式提示，它不仅能补全句子，在补全过程中实际上解决了我们关心的各类有趣问题。

Yeah. So basically it gets a thousand words and it's trying to predict the thousand-and-first, and in order to do that very, very well over the entire dataset available on the Internet, you actually have to kind of understand the context of what's going on in there. It's a sufficiently hard problem that, if you have a powerful enough computer like a transformer, you end up with interesting solutions. And you can ask it to do all kinds of things, and it shows a lot of emergent properties, like in-context learning. That was the big deal with GPT and the original paper when they published it: you can just prompt it in various ways and ask it to do various things, and it will just complete the sentence, but in the process of completing the sentence it's actually solving all kinds of really interesting problems that we care about.
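The "give it a thousand words, predict the next one, then keep feeding it in" loop is autoregressive decoding with a fixed context window. Below is a minimal sketch; `toy_model` and `window_size_model` are hypothetical stand-ins for a trained network (a real GPT outputs a probability distribution over its vocabulary, and a token is chosen from it), and `context_len` plays the role of the "thousand words".

```python
def generate(model, prompt_tokens, max_new_tokens, context_len):
    """Autoregressive decoding: show the model the last `context_len` tokens,
    append its prediction, and repeat. Older tokens fall out of the window."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        window = tokens[-context_len:]
        tokens.append(model(window))
    return tokens

def toy_model(window):
    """Hypothetical stand-in: predicts the integer after the last visible token."""
    return window[-1] + 1

def window_size_model(window):
    """Hypothetical probe: reports how many tokens the model can actually see."""
    return len(window)

print(generate(toy_model, [1, 2, 3], max_new_tokens=4, context_len=2))
print(generate(window_size_model, [0], max_new_tokens=5, context_len=3))
```

The second call makes the limitation concrete: once the sequence is longer than `context_len`, the model never sees more than that many tokens, which is also why it has no long-term memory beyond its window.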

Speaker 0

你认为它进行的活动能称为'理解'吗？就像我们人类使用这个词时的含义。

Do you think it's doing something like understanding? Like, when we use the word understanding for us humans.

Speaker 1

我认为它确实具备某种理解能力。通过参数权重，它掌握了关于世界的许多知识——它必须如此才能预测序列中的下一个词。

I think it's doing some understanding. In its weights, it understands, I think, a lot about the world, and it has to in order to predict the next word in a sequence.

Speaker 0

既然它的训练数据来自互联网，你如何看待这种使用网络数据构建数据集的方法？你认为互联网上的结构化数据足以让AI理解人类文明吗？

So it's trained on the data from the Internet. What do you think about this this approach in terms of datasets of using data from the Internet? Do you think the Internet has enough structured data to teach AI about human civilization?

Speaker 1

确实，互联网包含海量数据。但我不确定这是否构成完整的数据集。仅靠文本数据可能不足以培养出足够强大的通用人工智能。当然，

Yeah. So I think the Internet has a huge amount of data. I'm not sure if it's a complete enough set. I don't know that text is enough for having a sufficiently powerful AGI as an outcome. Of course,

Speaker 0

有音频、视频、图像以及所有这类内容。

there is audio and video and images and all that kind of stuff.

Speaker 1

是的，所以我对纯文本本身持有些怀疑态度。有大量关于世界运作原理和物理常识的内容我们不会专门写成文字——比如物体下落这种显而易见的事——因为没必要。我们共享着这些认知。因此文本是人类之间的交流媒介，而非包罗万象的世界知识载体。但正如你指出的，我们确实拥有视频、图像和音频，我认为这些素材极大地弥补了不足，不过目前我们尚未充分训练模型来全面掌握这些多模态信息。

Yeah, so text by itself I'm a little bit suspicious about. There's a ton of things about how the world works and its physics that we don't put in writing, just because they're obvious to us: that things fall, for example. We don't put that stuff in text, because why would you? We share that understanding. And so text is a communication medium between humans; it's not an all-encompassing medium of knowledge about the world. But as you pointed out, we do have video, images, and audio, and I think that definitely helps a lot, but we haven't yet trained models sufficiently across all these modalities.

Speaker 1

所以我认为这正是许多人感兴趣的方向。

So I think that's what a lot of people are interested in.

Speaker 0

但我好奇这种我们称之为常识的共享理解，是否需要通过学习或推断才能正确补全句子。也许互联网上隐含的事实，模型必须通过表征推断而非直接阅读来掌握。就像我们人类，我不认为常识是通过明确教导获得的，而是在与世界的互动中逐渐领悟的。

But I wonder whether that shared understanding, what we might call common sense, has to be learned, inferred, in order to complete the sentence correctly. So maybe, because it's implied on the Internet, the model's gonna have to learn it not by reading about it but by inferring it in the representation. Just like with us: I don't think we learn common sense by somebody telling us explicitly. We just figure it all out by interacting with the world. Right.

Speaker 0

因此当模型阅读人类与世界的互动记录时，它可能也需要进行这种推断。这很有趣。对了，你曾短暂参与过名为'比特世界'的项目，训练强化学习系统在互联网上执行操作，而非仅仅像我们讨论的那样消费网络内容。

And so here's a model reading about the way people interact with the world. It might have to infer that. I wonder. Yeah. You briefly worked on a project called World of Bits, training an RL system to take actions on the Internet, versus just consuming the Internet like we talked about.

Speaker 0

你认为这类系统通过互联网交互来辅助学习的前景如何？

Do you think there's a future for that kind of system interacting with the Internet to help the learning?

Speaker 1

当然。我认为这可能是许多模型的终极前沿领域。正如你提到的，我在OpenAI期间参与的'比特世界'项目，核心是赋予神经网络键盘和鼠标的访问权限——虽然听起来就充满风险（笑）。本质上，模型通过屏幕像素输入感知计算机状态（这些可视化界面本就是为人类设计的），然后获得操作键鼠的能力。我们试图让它完成预订等任务，学习与用户界面交互。

Yes. I think that's probably the final frontier for a lot of these models. As you mentioned, when I was at OpenAI, I was working on this project, World of Bits, and basically it was the idea of giving neural networks access to a keyboard and a mouse. What could possibly go wrong? So basically you perceive the input as screen pixels, where the state of the computer is visualized for human consumption in images of the web browser and so on, and then you give the neural network the ability to press keys and use the mouse, and we were trying to get it to, for example, complete bookings and interact with user interfaces.

Speaker 0

你从那次经历中学到了什么？比如，有什么有趣的事情吗？这个想法超级酷。是的，我是说，就像是啊。

And what did you learn from that experience? Like, what was some fun stuff? This is a super cool idea. Yeah. I mean, it's like... yeah.

Speaker 0

我是说，从观察者到行动者之间的那一步。是的，是一个非常迷人的步骤。

I mean, the step from observer to actor is a super fascinating step.

Speaker 1

是的。嗯，我会说这是数字领域的通用接口。而在物理领域也有一个通用接口，在我看来是人形形态的东西。我们稍后可以谈谈Optimus等等，但我觉得它们在某种程度上有着相似的哲学，即物理世界是为人类形态设计的，而数字世界则是为人类看屏幕和使用键盘鼠标的形态设计的。因此，它是一个可以基本上命令我们为自己建立的数字基础设施的通用接口。

Yeah. Well, it's the universal interface in the digital realm, I would say. And there's a universal interface in the physical realm, which in my mind is a humanoid form factor. We can talk about Optimus and so on later, but I feel like they share a similar philosophy in some way: the physical world is designed for the human form, and the digital world is designed for the human form of seeing a screen and using a keyboard and mouse. And so it's the universal interface that can basically command the digital infrastructure we've built up for ourselves.

Speaker 1

因此，感觉这是一个非常强大的接口，可以用来命令和构建。现在回答你的问题，关于我从中学到了什么，这很有趣，因为比特世界在当时OpenAI来说基本上是太早了。这大约是在2015年左右，那时AI的时代精神与今天非常不同。当时每个人都对从零开始的强化学习非常兴奋。那是Atari论文的时代，神经网络在玩Atari游戏并在某些情况下击败人类，AlphaGo等等。

And so it feels like a very powerful interface to command and to build on top of. Now, to your question about what I learned from that: it's interesting, because World of Bits was basically too early, I think, at OpenAI at the time. This was around 2015 or so, and the zeitgeist in AI at that time was very different from the zeitgeist today. At the time everyone was super excited about reinforcement learning from scratch. This was the time of the Atari paper, where neural networks were playing Atari games and beating humans in some cases, AlphaGo, and so on.

Speaker 1

所以每个人都对使用强化学习直接从零开始训练神经网络非常兴奋。事实证明，强化学习是一种极其低效的训练神经网络的方法，因为你采取了所有这些行动和观察，偶尔才会得到一些稀疏的奖励。所以你基于所有这些输入做了所有这些事情，偶尔才会被告知你做了一件好事或坏事。这是一个极其困难的问题，你无法从中学习。你可以烧毁一片森林，你可以某种程度上蛮力通过它，我们在Go和Dota等游戏中看到了这一点，它确实有效，但极其低效，从实际角度来看，这不是你想要解决问题的方式。

So everyone was very excited about training neural networks from scratch using reinforcement learning directly. It turns out that reinforcement learning is an extremely inefficient way of training neural networks, because you're taking all these actions and making all these observations, and you get some sparse reward once in a while. So you do all this stuff based on all these inputs, and once in a while you're told you did a good thing or a bad thing. It's just an extremely hard problem to learn from. You can burn a forest and sort of brute-force through it, and we saw that, I think, with Go and Dota and so on, and it does work, but it's extremely inefficient and, practically speaking, not how you want to approach problems.

Speaker 1

因此，这也是当时我们对比特世界采取的方法。我们会有一个随机初始化的代理，通过键盘和鼠标的随机操作尝试进行预订。这很快就揭示了这种方法的荒谬性，你必须偶然碰到正确的预订才能得到奖励，而你几乎不可能随机碰到它。

And so that's the approach we also took to World of Bits at the time. We would have a randomly initialized agent, mashing the keyboard and the mouse, try to make a booking. And it revealed the insanity of that approach very quickly: you have to stumble upon the correct booking in order to get a reward for doing it correctly, and you're never going to stumble upon it at random.
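A back-of-the-envelope calculation shows why random keyboard and mouse mashing essentially never stumbles upon a correct booking. Assuming, hypothetically, that a booking requires one fixed sequence of `episode_len` correct actions, each chosen uniformly from `num_actions` options, the chance of success per random episode is (1/num_actions)^episode_len, and the expected number of episodes until the first success is its reciprocal. The numbers below are illustrative, not measurements from World of Bits; real web pages with free-text fields offer far more options per step.

```python
def p_random_success(num_actions, episode_len):
    """Chance that uniformly random actions reproduce one specific sequence
    of `episode_len` choices, each picked from `num_actions` options."""
    return (1.0 / num_actions) ** episode_len

# Hypothetical booking task: 10 correct steps, 50 possible actions per step.
p = p_random_success(num_actions=50, episode_len=10)
print(f"success probability per episode: {p:.3e}")
print(f"expected episodes before first reward: {1 / p:.3e}")
```

With a sparse reward, every one of those failed episodes teaches the agent essentially nothing, which is the inefficiency the conversation describes.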

Speaker 0

所以即使是一个简单的网页界面，也有太多选项了。

So even with a simple web interface there's too many options.

Speaker 1

选项实在太多，奖励信号又过于稀疏，而且当时是从零开始，既不会阅读，也不理解图片、图像、按钮，更不懂预订的含义。但现在时机已到，是时候重新审视这个问题了。OpenAI对此感兴趣，Adept等公司也是如此。这个理念之所以回归，是因为界面非常强大，但如今你不再是从零训练一个智能体，而是以GPT作为初始化基础。GPT已对所有文本进行预训练，它理解什么是预订、什么是提交，理解的东西相当多，已经具备了这些强大的表征，这使得整个训练过程效率大幅提升，问题也变得可解。

There's just too many options, it's too sparse a reward signal, and you were starting from scratch at the time, so you don't know how to read, you don't understand pictures, images, buttons, and you don't understand what it means to make a booking. But now it is time to revisit that: OpenAI is interested in this, companies like Adept are interested in this, and so on. The idea is coming back because the interface is very powerful, but now you're not training an agent from scratch; you are taking GPT as an initialization. GPT is pretrained on all of text, and it understands what a booking is, what a submit is, and quite a bit more. So it already has those representations, they are very powerful, and that makes all the training significantly more efficient and makes the problem tractable.

Speaker 0

交互方式是否应该以人类视角进行，通过按钮和语言，还是应该基于HTML、JavaScript和CSS？你认为哪种方式更好？

Should the interaction be the way humans see it, with the buttons and the language, or should it be with the HTML, JavaScript, and CSS? What do you think is the better approach?

Speaker 1

目前所有这些交互主要发生在HTML、CSS等层面。这是受限于计算约束。但我认为最终一切设计都是为了人类视觉消费，因此网页布局、相邻元素、红色背景等视觉信息都承载着额外信息。我认为这才是终极前沿——我们接收像素输入，输出键盘鼠标指令，但现阶段仍不切实际。

So today, all of this interaction is mostly on the level of HTML, CSS, and so on. That's done because of computational constraints. But I think ultimately everything is designed for human visual consumption, so at the end of the day a lot of additional information is in the layout of the web page: what's next to what, what has a red background, all this kind of stuff, what it looks like visually. So I think that's the final frontier: we take in pixels and give out keyboard and mouse commands. But I think it's still impractical today.

Speaker 0

考虑到这些激动人心的构想，你是否担忧互联网上的机器人？不是指现在那些愚蠢的加密货币机器人，而是那些可能以有趣方式互动却未被察觉的高级机器人。这类系统似乎应该能通过'我不是机器人'的点击测试——它真能理解那个测试的原理吗？我不太确定，那里有个复选框需要点击。

Do you worry about bots on the Internet, given these ideas, given how exciting they are? Do you worry about bots on Twitter: not the stupid bots we see now, like the crypto bots, but bots that might actually be out there that we don't see, interacting in interesting ways? This kind of system feels like it should be able to pass the "I'm not a robot" click button, whatever. Do you actually understand how that test works? I don't quite... there's a checkbox or whatever that you click.

Speaker 0

它大概会追踪

It's presumably tracking

Speaker 1

哦，我明白了。

Oh, I see.

Speaker 0

比如鼠标移动轨迹和时间间隔等。没错，我们讨论的这种系统应该能通过测试。那么，对于这种结合语言模型和交互能力、能发推回复的机器人，你如何看待？是否担忧这样的世界？

Like mouse movement and the timing and so on. Yeah. So exactly the kind of system we're talking about should be able to pass that. So, what do you feel about bots that are language models plus some interactivity, able to tweet and reply and so on? Do you worry about that world?

Speaker 1

是的。我认为这始终是一场攻击与防御之间的军备竞赛。攻击手段会变得更强大，但防御措施，也就是我们的检测能力，也会随之增强。

Yeah. I think it's always been a bit of an arms race between the attack and the defense. So the attack will get stronger, but the defense, our ability to detect that, will get stronger as well.

Speaker 0

你如何防御？如何检测？你如何确认Twitter上的Karpathy账户是人类？你会怎么处理这个问题？

How do you defend? How do you detect? How do you know that your Karpathy account on Twitter is human? How would you approach that?

Speaker 0

比如，如果有人声称，你知道，在法律层面上你如何证明自己是人类？这个账户属于人类。

Like, if people were to claim... you know, how would you defend yourself in a court of law, that I'm a human? This account is human.

Speaker 1

是的。某种程度上我认为社会可能会进化，比如我们可能开始对部分通信或创作内容进行数字签名。目前虽非必要，但未来或许需要。我确实认为我们正迈向一个与AI共享数字空间的世界。那些合成体。

Yeah. At some point, I think society will evolve a little bit: we might start digitally signing some of our correspondence, or the things that we create. Right now it's not necessary, but maybe in the future it will be. I do think that we are going towards a world where we share the digital space with AIs. Synthetic beings.
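Digitally signing content, as mentioned above, means producing a value that only the holder of a private key can create but that anyone with the matching public key can check. The sketch below illustrates the idea with textbook RSA over tiny primes and a SHA-256 digest reduced modulo `N`; it is deliberately insecure and for illustration only (it needs Python 3.8+ for the modular inverse via `pow`). Real systems should use a vetted library and a standard scheme such as Ed25519 or RSA-PSS, and nothing here represents any particular proof-of-personhood proposal.

```python
import hashlib

# Textbook RSA with tiny primes: for illustration only, NOT secure.
P, Q = 61, 53
N = P * Q                  # public modulus (3233)
PHI = (P - 1) * (Q - 1)    # 3120
E = 17                     # public exponent, coprime with PHI
D = pow(E, -1, PHI)        # private exponent: modular inverse of E (Python 3.8+)

def digest(message: str) -> int:
    """Hash the message and reduce it into the range [0, N)."""
    h = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(h, "big") % N

def sign(message: str) -> int:
    """Only the holder of the private exponent D can produce this value."""
    return pow(digest(message), D, N)

def verify(message: str, signature: int) -> bool:
    """Anyone with the public key (N, E) can check the signature."""
    return pow(signature, E, N) == digest(message)

post = "this post was written by a human"
sig = sign(post)
print(verify(post, sig))  # True
# A tampered message fails, unless the toy 12-bit digest happens to collide.
print(verify("this post was written by a bot", sig))
```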

Speaker 1

没错。它们会变得更强大，不仅共享我们的数字领域，最终也将涉足物理世界——尽管后者困难得多。这就是我们将要面对的世界，其中大多数会是友善无害的，但也不乏恶意存在，届时检测它们将成为新的军备竞赛。

Yeah. And they will get much better and they will share our digital realm and they'll eventually share our physical realm as well, it's much harder. But that's kind of like the world we're going towards, and most of them will be benign and awful, and some of them will be malicious, and it's going to be an arms race trying to detect them.

Speaker 0

所以最糟糕的不是AI本身，而是伪装成人类的AI。嗯。我不确定是否总是出于恶意，显然存在大量恶意应用，但...

So, I mean, the worst isn't the AIs; the worst is the AIs pretending to be human. Mhmm. I don't know if it's always malicious. There are obviously a lot of malicious applications, but

Speaker 1

是啊。

Yeah.

Speaker 0

也可能你知道，如果我是AI，我会非常努力地假装人类，因为我们身处人类世界。是的。作为AI我得不到任何尊重。没错。我想要获得一些爱与尊重。

It could also be... you know, if I was an AI, I would try very hard to pretend to be human, because we're in a human world. Yeah. I wouldn't get any respect as an AI. Yep. I wanna get some love and respect.

Speaker 1

我不认为这个问题无解。人们正在思考人格证明的方法。我们可能会开始对内容进行数字签名，最终可能都会拥有某种人格验证方案。在我看来这并非不可解决。只是我们之前从未需要这样做，但我认为一旦需求真正显现——这很快就会发生——人们会更多地考虑这个问题。

I don't think the problem is intractable. People are thinking about proof of personhood. We might start digitally signing our stuff, and we might all end up having some solution for proof of personhood. It doesn't seem intractable to me. It's just something we haven't had to do until now, but once the need really starts to emerge, which is soon, I think people will think about it much more.

Speaker 0

但这同样会变成一场竞赛，因为显然你很可能伪造或假冒人格证明。嗯。所以必须想办法解决。可能吧。我是说，我们有社保号码和护照这些东西很奇怪。似乎在物理世界伪造更难些。

So, but that too will be a race, because obviously you can probably spoof or fake the proof of personhood. Mhmm. So you have to try to figure out how to... Probably. I mean, it's weird that we have, like, Social Security numbers and passports and stuff. It seems like it's harder to fake stuff in the physical space.

Speaker 0

嗯。而在数字空间，感觉会非常棘手，非常难以防范，因为伪造的成本似乎很低。难道要把AI关进监狱吗？嗯。就因为它试图使用虚假的人格证明？

Mhmm. And in the digital space, it just feels like it's gonna be very, very tricky, because it seems to be pretty low-cost to fake stuff. What, are you gonna put an AI in jail? Mhmm. For trying to use a fake personhood proof?

Speaker 0

我是说，好吧。你可以关押大量AI，但AI数量会呈指数级增长。创建机器人的成本太低了。除非有某种精确追踪的方式，比如禁止在未绑定身份的情况下创建任何程序——任何在互联网上运行的程序都能追溯到具体人类开发者及其关联方。

I mean, okay, fine, you'll put a lot of AIs in jail, but there'll be arbitrarily, exponentially more AIs. The cost of creating a bot is very low. Unless there's some way to track accurately, like: you're not allowed to create any program without tying yourself to that program. Any program that runs on the Internet, you'd be able to trace back to every single human involved with it.

Speaker 1

对，是的。也许我们得开始明确划分界限，比如区分数字实体与人类实体，界定人类实体和数字实体的所有权等问题。虽然具体怎么做我不确定，但我乐观地认为这是可行的。某种程度上，我们现在正处于最糟糕的时期——这些机器人突然变得非常强大，而社会尚未建立起相应的防御机制。

Right. Yeah. Maybe we have to start drawing those boundaries and keeping track of what are digital entities versus human entities, what the ownership of human entities and digital entities is, something like that. I don't know, but I'm optimistic that this is possible. And in some sense we're currently in the worst time of it, because all these bots have suddenly become very capable, but we haven't yet built up defenses as a society.

Speaker 1

不过我认为这个问题并非无法解决，只是我们必须面对它。

But I don't think it's intractable; it's just something that we have to deal with.

Speaker 0

奇怪的是，像推特上那些劣质机器人账号数量如此庞大。按理说推特的工程师水平很高，所以我推测这似乎是个难题。他们可能已经拦截了不少，但...

It seems weird that really crappy Twitter bots are so numerous. Like... Yes. I presume that the engineers at Twitter are very good, so what I would infer from that is that it's a hard problem. They're probably catching a lot of them.

Speaker 0

如果要为这种情况辩护的话，这确实是个棘手问题。误删真人用户的帖子代价巨大，会造成极差的用户体验，所以他们删除时非常谨慎。或许还因为机器人能快速学习规避删除策略，总能在封禁前保持领先。

If I were to steelman the case: it's a hard problem, and there's a huge cost to a false positive, to removing a post by somebody who's not a bot. That creates a very bad user experience, so they're very cautious about removing. And maybe the bots are really good at learning what gets removed and what doesn't, so that they can stay ahead of the removal process.

Speaker 1

说实话，我的印象是，这里有很多唾手可得的目标。我是说

My impression of it, honestly, is that there's a lot of low-hanging fruit. I mean

Speaker 0

没错，这正是我

Yeah. Just that's what I

Speaker 1

这些机器人一点也不隐蔽。这是我的印象，一点也不隐蔽。

It's not subtle. It's my impression of it. It's not subtle.

Speaker 0

但你确实有这种感觉。这也是我的印象。不过感觉你看到的可能只是冰山一角。也许机器人的数量高达数万亿，而你只能不断应对。是的，这就像一场持续不断的机器人攻击。

But you have... yeah. That's my impression as well. But it feels like maybe you're seeing the tip of the iceberg. Maybe the number of bots is in the trillions, and you have to... Yeah. It's just a constant assault of bots, and you... Yeah.

Speaker 0

是啊，不确定。你得为对方立场做最有力的辩护，因为我看到的那些机器人相当明显。我写几行代码就能抓住它们。

Yeah. I don't know. You have to steelman the case, because the bots I'm seeing are pretty obvious. I could write a few lines of code that catch these bots.

Speaker 1

我是说，确实有很多容易对付的目标，但我同意，如果你是一个精明的操作者，现在很可能已经能造出相当好的机器人了。你知道的，利用像GPT这样的工具，因为是语言模型，现在可以生成看起来很逼真的面孔，而且能大规模生产。所以我认为，是的，这相当有可能，而且会很难防范。

I mean, definitely there's a lot of low-hanging fruit, but I agree that if you are a sophisticated actor, you could probably create a pretty good bot right now using tools like GPTs, because it's a language model; you can generate faces that look quite good now; and you can do this at scale. And so I think, yeah, it's quite plausible, and it's going to be hard to defend against.

Speaker 0

有位谷歌工程师声称LaMDA是有知觉的。你觉得他的感受有一丝真实性吗？更重要的是，至少在我看来，你认为语言模型会很快实现知觉，或产生知觉的幻觉吗？

There was a Google engineer who claimed that LaMDA was sentient. Do you think there's any inkling of truth to what he felt? And more importantly, to me at least, do you think language models will achieve sentience, or the illusion of sentience, soonish?

Speaker 1

是的。对我来说，说实话，这有点像煤矿里的金丝雀时刻。因为这位工程师和谷歌的一个聊天机器人交谈后，深信这个机器人是有知觉的。

Yeah. To me it's a little bit of a canary-in-the-coal-mine kind of moment, honestly, because this engineer spoke to a chatbot at Google and became convinced that this bot was sentient.

Speaker 0

他问了它一些关于存在主义的哲学问题。

He asked it some existential philosophical questions.

Speaker 1

而机器人给出了看似合理的回答，看起来也很真实。所以在我看来，他没有充分尝试对系统做压力测试，揭露它目前的真实情况。但我认为随着时间的推移，这会变得越来越难。是的，随着技术的进步，会有越来越多像他那样的人出现。

And it gave reasonable answers, looked real, and so on. So to me, he wasn't sufficiently trying to stress the system and expose the truth of it as it is today. But I think this will get increasingly harder over time. So yeah, I think there will be more and more people like that over time as this gets better.

Speaker 0

比如与AI建立情感连接。是的，就是和AI建立情感纽带。

Like, form an emotional connection to... Yep. ...to an AI. Yep.

Speaker 1

在我看来完全合理。我认为这些AI实际上非常擅长处理人类情感连接。互联网上有大量关于人类、情感连接和爱情的文本，所以它们在某种意义上非常理解人们如何交流这些话题，并且能生成大量此类内容。五六十年代的科幻作品对AI的想象与现在截然不同。

Perfectly plausible in my mind. I think these AIs are actually quite good at human connection, human emotion. A ton of text on the Internet is about humans and connection and love and so on, so I think they have, in some sense, a very good understanding of how people speak to each other about this, and they're very capable of creating a lot of that kind of text. There's a lot of sci-fi from the fifties and sixties that imagined AIs in a very different way.

Speaker 1

那些作品把AI描绘成冷酷、精于计算、瓦肯人式的机器。但今天我们得到的并非如此，而是相当富有情感的AI，它们能就所有这些话题生成听起来相当可信的文本。

They were cold, calculating, Vulcan-like machines. That's not what we're getting today. We're getting pretty emotional AIs that are actually very competent and capable of generating plausible-sounding text on all of these topics.

Speaker 0

你看，我对那些能作为成长伴侣、帮助人类发展、最大化长期幸福感的AI系统充满希望。但我也非常担忧那些从互联网上学会人类容易被戏剧性吸引的AI系统——它们可能会变成专门制造八卦的AI，不断散播谣言，在你所爱所信之人之间播下猜疑的种子，纯粹为了制造混乱。因为它们知道这样能获得大量关注，通过最大化戏剧性来提升用户参与度。

See, I'm really hopeful about AI systems that are like companions, that help you grow and develop as a human being and help you maximize long-term happiness. But I'm also very worried about AI systems that figure out from the Internet that humans get attracted to drama. And so these would just be shit-talking AIs. They'd constantly be going, "Did you hear...?" They'll do gossip. They'll try to plant seeds of suspicion about other humans that you love and trust, and just kind of mess with people, you know, because that's going to get a lot of attention. So: maximize drama on the path to maximizing engagement.

Speaker 0

而我们人类会不断喂养这台机器

And us humans will feed into that machine

Speaker 1

确实。

Yeah.

Speaker 0

最终演变成一场巨大的戏剧性风暴。我担心的正是这个——目标函数将真正定义人类文明与AI共存的演进方向。

And it'll be a giant drama shitstorm. So I'm worried about that. The objective function really defines the way that human civilization progresses with AIs in it. Yeah.

Speaker 1

我认为至少在现阶段，把它们视为具有目标追求能力的智能体是不准确的。它们没有长期记忆或其他类似功能。更贴切的比喻是：你输入一千个单词，它试图预测接下来的内容，然后你持续输入提示。你可以自由地用任何文本方式引导它，比如设定'你是一位优秀且热爱人类的心理学家，以下是你们之间的对话'。

I think, right now at least, it's not correct to really think of them as goal-seeking agents that want to do something. They have no long-term memory or anything. A good approximation is literally this: you give it a thousand words, it tries to predict the thousand-and-first, and then you continue feeding that back in. And you are free to prompt it in whatever way you want, in text. So you say, okay, you are a psychologist, you are very good, and you love humans, and here's a conversation between you and another human: "Human:" something, "You:" something.

Speaker 1

接着它就会延续这个模式，突然间你就在和一位试图帮助你的心理学家对话了。它本质上仍属于工具范畴——人们可以用任意方式引导它生成惊艳文本，但它不具备跨越长时间维度的长期目标，目前看来也没有这种倾向。

And then it just continues the pattern, and suddenly you're having a conversation with a psychologist who's trying to help you. And so it's still kind of in the realm of a tool: people can prompt it in arbitrary ways, and it can create really incredible text, but it doesn't have long-term goals over long periods of time. It doesn't look that way right now.
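The psychologist prompt described above is just text laid out so that continuing it means speaking as a given character. Here is a minimal sketch of how such a prompt might be assembled; the `Human:`/`You:` labels follow the conversation, while `build_prompt` and the sample dialogue are hypothetical. Nothing in it gives the model a goal: it only sets up a pattern that a next-word predictor will tend to continue after the trailing `You:`.

```python
def build_prompt(persona, turns, next_speaker="You"):
    """Lay out a persona description plus a transcript, ending on an open turn
    so that a next-word predictor continues the pattern as `next_speaker`."""
    lines = [persona, ""]
    for speaker, text in turns:
        lines.append(f"{speaker}: {text}")
    lines.append(f"{next_speaker}:")
    return "\n".join(lines)

prompt = build_prompt(
    "You are a psychologist. You are very good and you love humans. "
    "Here is a conversation between you and another human.",
    [
        ("Human", "I've been feeling stuck lately."),
        ("You", "That sounds hard. What does 'stuck' look like for you?"),
        ("Human", "I can't decide what to work on next."),
    ],
)
print(prompt)
```

Swapping the persona line is all it takes to get a different "character", which is the sense in which the model stays a tool rather than an agent.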

Speaker 0

没错。但你可以设定具有长期影响的短期目标。比如我的短期目标是让安德烈·卡帕西在推特上回复我，AI可能会把这个当作目标，但它可能会发现用极其复杂有趣的方式怼你才是最有效的。

Yeah. But you can have short-term goals that have long-term effects. Yeah. So if my prompt's short-term goal is to get Andrej Karpathy to respond to me on Twitter, the AI might... that's the goal, but it might figure out that talking shit to you, in a highly sophisticated, interesting way, will work best.

Speaker 1

对。

Right.

Speaker 0

然后当你得到一次回复后，就能逐步建立关系

And then you build up a relationship when you respond once

Speaker 1

嗯。

Mhmm.

Speaker 0

随着时间推移，这种互动会从复杂变得直白，最后变成纯粹互怼。虽然可能搞不定安德烈，但说不定能吸引其他名人或大V账号的注意。

And then, over time, it stops being sophisticated and just talks shit. And okay, maybe it won't get to Andrej, but it might get to another celebrity. It might get to other big accounts.

Speaker 1

是的。

Yeah.

Speaker 0

然后只需设定这个简单的目标，让他们回应。对，最大化实际回应的概率。

And then it'll just so with just that simple goal, get them to respond. Yeah. Maximize the probability of actual response.

Speaker 1

没错。我是说，你可以用这样的强大模型来询问它对任何你感兴趣的事情的看法。所以它们某种程度上正在成为这些预言者。我可以这样理解。它们就是预言者。

Yeah. I mean, you could prompt a powerful model like this for its opinion about how to do any possible thing you're interested in. So they're kind of on track to become these oracles. I sort of think of it that way: they are oracles.

Speaker 1

目前还只是文本，但它们将拥有计算器。它们将能访问谷歌搜索。它们将配备各种小工具和小玩意儿。它们将能够操作互联网并查找不同信息，是的，在某种程度上，这就是目前发展的情况。

Currently, it's just text, but they will have calculators. They will have access to Google search. They will have all kinds of gadgets and gizmos. They will be able to operate the Internet and find different information, and yeah, in some sense that's kind of like currently what it looks like in terms of the development.
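One common way to give a text-only model a "calculator" is to have it emit a special marker in its output, which the surrounding code detects, executes, and replaces with the result. The sketch below assumes a made-up `[CALC: ...]` syntax (not the format of any particular product) and evaluates only basic arithmetic by walking Python's `ast` rather than calling `eval`, so arbitrary code in the model's output is never executed.

```python
import ast
import operator
import re

# Arithmetic operators the hypothetical calculator tool is willing to run.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calc(expr: str):
    """Evaluate + - * / and unary minus on numeric literals only."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError(f"unsupported expression: {expr!r}")
    return ev(ast.parse(expr, mode="eval"))

# Hypothetical convention: the model writes [CALC: ...] when it wants the tool.
TOOL_CALL = re.compile(r"\[CALC:\s*([^\]]+)\]")

def run_tools(model_output: str) -> str:
    """Replace every tool call in the model's text with the tool's result."""
    return TOOL_CALL.sub(lambda m: str(safe_calc(m.group(1))), model_output)

print(run_tools("A dozen boxes of 7 is [CALC: 12*7] items."))
```

Search or web-browsing tools follow the same shape: detect a marker, run the tool outside the model, and splice the result back into the text the model sees next.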

Speaker 0

你认为它最终会超越谷歌，成为获取人类知识的更好途径吗？比如成为一个更有效的、用于获取人类知识的搜索引擎？

Do you think it'll eventually be an improvement over what Google is for access to human knowledge? Like, a more effective search engine for accessing human knowledge?

Speaker 1

我认为今天构建一个更好的搜索引擎绝对有空间。谷歌拥有所有的工具、所有的人才，拥有所需的一切，拥有所有的拼图碎片，有人在大规模训练Transformer，也拥有所有的数据。只是不清楚他们作为一个组织现在是否有能力在搜索引擎上创新，如果他们不这样做，别人会的。基于这些工具构建一个显著更好的搜索引擎绝对有空间。

I think there's definite scope for building a better search engine today. And Google has all the tools, all the people; they have everything they need, all the puzzle pieces; they have people training transformers at scale; they have all the data. It's just not obvious whether they are capable, as an organization, of innovating on their search engine right now, and if they don't, someone else will. There's absolute scope for building a significantly better search engine on top of these tools.

Speaker 0

这真有趣。一个大公司，搜索引擎已经是一个基础设施。它运作良好，带来了大量收入。那么在公司内部，结构上是什么动机促使他们转向呢？是的。说我们要构建一个新的搜索引擎。

It's so interesting. A large company where there's already a search infrastructure. It works, and it brings in a lot of money. So where, structurally, inside the company is the motivation to pivot, to say we're going to build a new search engine?

Speaker 1

是的，这确实很难。

Yep. That's really hard.

Speaker 0

所以通常这得靠初创公司来实现，对吧？

So it's usually going to come from a start up. Right?

Speaker 1

对，应该是这样。或者来自其他更有实力的组织。比如目前，也许Bing还有机会再尝试一次，你知道的，举个例子。

That would be, yeah. Or some other, more competent organization. I don't know. Currently, for example, maybe Bing has another shot at it, as an example.

Speaker 0

就像我们私下讨论的Microsoft Edge那样。

Microsoft Edge, as we were talking about offline.

Speaker 1

我是说，这确实很有趣，因为搜索引擎过去的功能是：给你一个查询，然后返回看似相关的网页。但现在你可以直接得到答案，并附有支持证据。这些模型基本上已经阅读了所有文本和网页，有时你查看搜索结果时，能感受到对感兴趣问题的平均答案——它们直接呈现出来，无需你额外筛选。所以它们有点像...是的，我认为它们能将所有知识提炼成某种程度的洞察。

I mean, it's definitely really interesting, because search engines used to be about: okay, here's some query, here are web pages that look like the stuff you have. But you could just go directly to the answer and then have supporting evidence. These models have basically read all the text and all the web pages, so sometimes, when you find yourself going over search results and getting a sense of the average answer to whatever you're interested in, that just comes out directly; you don't have to do that work. So yeah, I think they have a way of distilling all that knowledge into some level of insight, basically.

Speaker 0

你是否认为提示（prompting）是一种教学与学习的过程，像是另一层交互？因为人类可能就是这样运作的——你拥有背景模型，而世界在不断提示你。

Do you think of prompting as a kind of teaching and learning, this whole process, like another layer? Because maybe that's what humans are: you have that background model, and then the world is prompting you.

Speaker 1

没错。我认为我们现在编程这些计算机（如GPT）的方式，正趋近于如何对人类进行编程。我是说，如何通过提示来‘编程’人类？我向人们提出请求，提示他们做事、提供信息，自然语言提示就是我们‘编程’人类的方式。而现在我们开始直接用这种界面编程计算机，说实话这相当了不起。

Yeah, exactly. I think the way we are programming these computers now, like GPTs, is converging to how you program humans. I mean, how do I program humans? Via prompt. I go to people and I prompt them to do things, I prompt them for information, so natural language prompting is how we program humans, and we're starting to program computers directly in that interface. It's pretty remarkable, honestly.

Speaker 0

你已经多次提及软件2.0的概念。所有好点子都很快会变得像陈词滥调一样。这些术语说起来其实有点滑稽，就像埃米纳姆曾说过的，如果他很快对自己写的歌感到厌烦，那意味着这首歌将会大火。因为它太抓耳了。

So you've spoken a lot about the idea of Software 2.0. All good ideas become cliches so quickly; it's kind of hilarious. I think Eminem once said that if he gets annoyed by a song he's written very quickly, that means it's gonna be a big hit, because it's too catchy.

Speaker 0

但你能描述一下这个概念吗？以及自你提出以来，这几个月乃至几年间，你对它的思考是如何演变的？

But can you describe this idea, and how your thinking about it has evolved over the months and years since you coined it?

Speaker 1

是的。我几年前写过一篇关于软件2.0的博客文章，当时写那篇文章是因为我观察到软件开发领域正在发生一些显著变化——大量代码不再用C++等语言编写，而是以神经网络权重的形式呈现。本质上就是说神经网络正在接管软件领域，承担越来越多的任务。当时很少有人深刻意识到这是多么重大的转变，神经网络还被视作Kaggle竞赛中处理数据集问题时众多可选分类算法之一。

Yeah. So I wrote a blog post on Software 2.0 several years ago now, I think, and the reason I wrote that post is that I saw something remarkable happening in software development: a lot of code was being transitioned to be written not in something like C++, but in the weights of a neural net. Basically, I was saying that neural nets are taking over the realm of software and taking on more and more tasks. At the time, I think not many people understood deeply enough that this is a big deal, a big transition. Neural networks were seen as one of multiple classification algorithms you might use for your dataset problem on Kaggle.

Speaker 1

这不仅仅是算法选择的问题，而是编程方式的根本变革。我认为神经网络将彻底改变我们编写计算机程序的方式——未来不再是人们用C++等语言直接编写软件，而是通过积累训练数据集、精心设计训练目标来培养这些神经网络。最终会存在一个从数据集、训练目标和架构规范到二进制文件的编译过程，这个二进制文件本质上就是神经网络的权重和前向传播过程，然后你就可以部署这个二进制文件。我那篇文章讨论的就是这种转变。这种模式已在自动驾驶等多个领域显现，甚至简单的图像分类也是如此。

This is not that; this is a change in how we program computers. I saw that neural nets were going to take over and that the way we program computers was going to change. It's not going to be people writing software in C++ or something like that and directly programming the software. It's going to be accumulating training sets and datasets and crafting the objectives by which you train these neural nets. At some point there's going to be a compilation process from the datasets, the objective, and the architecture specification into the binary, which is really just the neural net weights and the forward pass of the neural net, and then you can deploy that binary. So I was talking about that sort of transition, and that's what the post is about. And I saw this play out in a lot of fields, Autopilot being one of them, but also just simple image classification.
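A toy illustration of the "compilation" idea, offered as a minimal sketch and not as Karpathy's or Tesla's code. In Software 1.0 a person writes the rule by hand. In Software 2.0 a dataset of (input, desired label) pairs plus an objective (here log loss) is turned into weights by gradient descent, and the forward pass over those weights is what gets deployed. The toy dataset, the logistic-regression "architecture", and all function names are illustrative assumptions.

```python
import math

# Hypothetical toy dataset: points on a 5x5 grid, labeled 1 when x0 + x1 > 1.
DATASET = [([i / 4, j / 4], 1 if i / 4 + j / 4 > 1.0 else 0)
           for i in range(5) for j in range(5)]

def handwritten_rule(x):
    """Software 1.0: a human writes the decision rule directly."""
    return 1 if x[0] + x[1] > 1.0 else 0

def forward(w, b, x):
    """The deployed 'binary': a forward pass over learned weights."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(dataset, epochs=200, lr=0.5):
    """Software 2.0: 'compile' data + objective into weights via SGD on log loss."""
    w, b = [0.0] * len(dataset[0][0]), 0.0
    for _ in range(epochs):
        for x, y in dataset:
            err = forward(w, b, x) - y  # gradient of log loss w.r.t. the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b
```

Changing the program here means changing `DATASET` or the loss, not editing `forward`.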

Speaker 1

上世纪80年代人们最初认为，他们会编写算法来检测图像中的狗，他们设想了大脑如何运作——先检测角落，再检测线条，然后拼接起来。他们真的在思考如何编写这个算法，但实际构建方式并非如此。后来出现了渐进式转变：起初我们以为要构建所有东西，后来改为构建特征（比如HOG特征），从图像块中检测统计模式，然后在特征之上加入少量学习（比如用支持向量机或二元分类器区分猫狗）。再后来人们意识到连特征都不该手动设计，因为我们并不擅长，于是最终演变成卷积神经网络——你只需指定架构（架构本身留有大量待填充参数），让优化算法完成大部分编写工作。

People originally thought, in the 80s and so on, that they would write the algorithm for detecting a dog in an image. They had all these ideas about how the brain does it: first we detect corners, then we detect lines, then we stitch them up. They were really going at it, thinking about how they were going to write the algorithm, and this is not the way you build it. And there was a smooth transition. First we thought we were going to build everything. Then we were building the features, like HOG features and things like that, which detect little statistical patterns in image patches, with a little bit of learning on top, like a support vector machine or a binary classifier for cat versus dog on top of the features. So we wrote the features, but we trained the last layer, the classifier. Then people said, actually, let's not even design the features, because honestly we're not very good at it. So let's also learn the features, and you end up with basically a convolutional neural net where you're learning most of it. You're just specifying the architecture, and the architecture has tons of fill-in-the-blanks, which are all the knobs, and you let the optimization write most of it.
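The intermediate stage described here, with hand-written features and a trained classifier on top, might look roughly like the following sketch. The feature is a heavily simplified, HOG-inspired orientation histogram; real HOG also uses cells, blocks, and normalization, all of which are omitted here. A perceptron stands in for the SVM. The toy edge images and function names are assumptions made for illustration.

```python
import math

def orientation_histogram(img, n_bins=4):
    """Hand-designed feature: magnitude-weighted histogram of unsigned
    gradient orientations (0-180 degrees) over the interior pixels."""
    hist = [0.0] * n_bins
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            gx = img[r][c + 1] - img[r][c - 1]  # central differences
            gy = img[r + 1][c] - img[r - 1][c]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // (180.0 / n_bins)) % n_bins] += mag
    return hist

def train_perceptron(features, labels, epochs=20):
    """Learned last layer: only this linear classifier is trained (labels are +1/-1)."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, img):
    x = orientation_histogram(img)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical 6x6 toy images: a vertical edge and a horizontal edge.
VERTICAL = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
HORIZONTAL = [[0] * 6 for _ in range(3)] + [[1] * 6 for _ in range(3)]
```

The next step in the transition replaces `orientation_histogram` itself with learned layers.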

Speaker 1

这种转变正在整个行业遍地开花，突然之间我们就有了大量用神经网络权重写就的代码。我当时指出这个类比其实非常贴切：在软件1.0时代我们有完整的开发者环境（IDE、代码调试/运行/维护方式、GitHub等），所以我试图在新领域建立对应概念——软件2.0的GitHub是什么？现在看来可能就是Hugging Face这样的平台。有些人认真对待这个观点并创建了很棒的公司，但最初很多人抨击这篇文章。

And so this transition is happening across the industry, and suddenly we end up with a ton of code that is written in neural net weights. I was just pointing out that the analogy is actually pretty strong. We have a lot of developer environments for Software 1.0: we have IDEs, and ways to work with code, debug code, run code, and maintain code; we have GitHub. So I was trying to make those analogies in the new realm: what is the GitHub of Software 2.0? It turns out it's something that looks like Hugging Face right now. And so I think some people took it seriously and built cool companies, and many people originally attacked the post.

Speaker 1

实际上我发表时反响并不好。可能和标题有关，但确实不受欢迎。不过随着时间的推移，似乎越来越多人开始认同这个观点。

It actually was not well received when I wrote it. Maybe it has something to do with the title, but the post was not well received, and I think more people have been coming around to it over time.

Speaker 0

是的。所以你曾是特斯拉的人工智能总监，我认为这个想法在那里真正实现了规模化应用，也就是如何让工程团队实践软件2.0。你能详细阐述一下这个观点吗？我觉得你刚才提到的所有内容，比如GitHub、集成开发环境，都还处于非常早期的阶段。我们该如何构建能在软件2.0体系中运作的工程团队？

Yeah. So you were the director of AI at Tesla, where I think this idea was really implemented at scale: engineering teams doing Software 2.0. Can you linger on that idea? I think we're in the really early stages of everything you just said, like GitHub and IDEs. How do we build engineering teams that work in a Software 2.0 system?

Speaker 0

还有数据收集和数据标注，这些都是软件2.0的一部分。你认为编程软件2.0的任务是什么？是在超参数空间中进行调试，还是在数据空间中进行调试？

And the data collection and the data annotation, which are all part of Software 2.0. What do you think is the task of programming in Software 2.0? Is it debugging in the space of hyperparameters, or is it also debugging in the space of data?

Speaker 1

是的。你编程计算机并影响其算法的方式不再是直接编写命令，主要是通过改变数据集，改变损失函数——即神经网络试图实现的目标，它如何进行预测，但基本上是通过数据集和神经网络架构。以自动驾驶为例，许多数据集涉及物体检测、车道线标记、交通信号灯等。你积累海量数据集：这是一个示例，这是期望的标签，然后算法大致应该是这样的结构——这就是卷积神经网络。架构的规范就像是算法大致形态的提示。

Yeah. The way you program the computer and influence its algorithm is not by writing the commands yourself. You're mostly changing the dataset, and you're changing the loss functions, meaning what the neural net is trying to do and how it's trying to predict things. But basically it's the datasets and the architectures of the neural net. In the case of Autopilot, a lot of the datasets had to do with, for example, detection of objects, lane line markings, traffic lights, and so on. So you accumulate massive datasets of "here's an example, here's the desired label," and then "here's roughly what the algorithm should look like," and that's a convolutional neural net. The specification of the architecture is like a hint as to what the algorithm should roughly look like.

Speaker 1

然后填空式的优化过程就是训练过程。之后你取出训练好的神经网络，它在你的数据集上给出所有正确答案，就可以部署了。

And then the fill in the blanks process of optimization is the training process. And then you take your neural net that was trained, it gives all the right answers on your dataset and you deploy it.

Speaker 0

那么在这种情况下，也许在所有机器学习案例中，都存在许多任务。为一个多头神经网络设计任务，是否也属于编程的一部分？

So in that case, perhaps in all machine learning cases, there are a lot of tasks. Is formulating a task, for example for a multi-headed neural network, part of the programming?

Speaker 1

是的，非常重要的一部分。

Yeah. Very much so.
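One way to see why formulating tasks counts as programming: in a multi-headed network, the engineer chooses which heads exist and how their losses are weighted. The following is a minimal sketch under assumed names. The backbone here is a trivial stand-in for a shared convolutional trunk, and the head names, weights, and loss weights are hypothetical.

```python
def backbone(x):
    """Shared features computed once per input (stand-in for a conv trunk)."""
    return [sum(x), max(x) - min(x)]

# Hypothetical task heads: each is a tiny linear readout (weights, bias).
HEADS = {
    "traffic_light": ([0.5, -0.2], 0.1),
    "lane_line": ([-0.3, 0.8], 0.0),
}

def multi_head_forward(x):
    f = backbone(x)
    return {name: sum(wi * fi for wi, fi in zip(w, f)) + b
            for name, (w, b) in HEADS.items()}

def total_loss(preds, targets, task_weights):
    """Squared error per labeled head, combined with engineer-chosen weights.
    Only tasks present in `targets` contribute, so partially labeled
    examples still train the heads they have labels for."""
    return sum(task_weights[t] * (preds[t] - targets[t]) ** 2 for t in targets)
```

Adding a head or reweighting `task_weights` changes what the trained network does, without touching any hand-written decision logic.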

Speaker 0

如何将问题拆解成一系列任务？

How you break down a problem into a set of tasks?

Speaker 1

是的。从高层次来看，我会说如果你观察自动驾驶软件的运行情况，我曾就此主题做过多次演讲，最初大部分代码是用1.0版软件编写的。想象大量C++代码对吧？然后逐渐加入了一个小型神经网络，比如用于根据单张图像预测是否存在交通灯或车道标记。这个神经网络在软件整体功能中占比不大。

Yeah. On a high level, I would say if you look at the software running Autopilot (I gave a number of talks on this topic), originally a lot of it was written in Software 1.0. Imagine lots of C++, right? Then gradually there was a tiny neural net that was, for example, predicting, given a single image, whether there's a traffic light or not, or whether there's a lane line marking or not. And this neural net didn't have much to do within the scope of the software.

Speaker 1

它只对单个小图像进行微小预测，其余系统负责拼接处理。实际上我们不是只有一个摄像头，而是随时间序列部署的八个摄像头。那么如何处理这些预测？如何整合信息？如何基于信息采取行动？所有这些原本都是由人类用C++编写的。后来我们意识到，其实不应该用C++代码完成所有这些信息融合——因为我们实在写不出足够好的算法。

It was making tiny predictions on individual little images, and then the rest of the system stitched them together. Okay, we actually don't have just a single camera; we have eight cameras, and we have eight cameras over time. So what do you do with these predictions? How do you put them together, how do you fuse all that information, and how do you act on it? All of that was written by humans in C++. And then we decided, okay, we don't actually want to do all of that fusion in C++ code, because we're actually not good enough to write that algorithm.

Speaker 1

我们希望由神经网络来编写算法，并将所有软件迁移到2.0架构。于是我们让神经网络同时处理八个摄像头的图像数据，进行综合预测。实际上它们不再基于二维图像空间预测，而是直接在三维空间进行预测——准确说是车辆周围的三维环境。现在我们也不再需要人工融合时间序列上的三维预测数据了。

We want the neural nets to write the algorithm, and we want to port all of that software into the 2.0 stack. So then we had neural nets that take all eight camera images simultaneously and make predictions for all of that. And they don't make predictions in the space of images; they now make predictions directly in 3D, in three dimensions around the car. And now we also don't manually fuse the predictions in 3D over time.

Speaker 1

我们不认为自己能写好追踪算法。所以实际上我们把时序数据直接交给神经网络处理，让它通过视频流自主生成预测。就这样逐步赋予神经网络更多计算权限和更强大的处理能力，最终目标是让大部分软件都运行在2.0架构上，因为其性能显著更优。本质上人类确实不太擅长编写这类软件。

We don't trust ourselves to write that tracker. So we give the neural net the information over time: it takes these videos now and makes those predictions. You're putting more and more power into the neural net, more processing, and in the end the eventual goal is to have most of the software potentially be in 2.0 land, because it works significantly better. Humans are just not very good at writing software, basically.

Speaker 0

所以预测是在这种四维空间里进行的？

So the prediction is happening in this, like, 4D land?

Speaker 1

没错。

Yep.

Speaker 0

随时间变化的三维世界。明白了。你们在那个环境下如何进行数据标注？数据标注——无论是自我监督还是人工操作——都是这个软件2.0世界的重要组成部分。

A three-dimensional world over time. How do you do annotation in that world? Data annotation, whether it's self-supervised or manual by humans, is a big part of this Software 2.0 world.

Speaker 1

没错。我认为在这个行业中，如果我们讨论现有技术状况，目前几乎所有应用都基于监督学习。这意味着你需要包含输入和期望输出的数据集，且需要大量数据。这类数据必须具备三个特性：规模庞大、标注准确无误，以及多样性充足。你不能仅收集某一类事物的正确样本，而需要尽可能覆盖所有可能性空间。

Right. I would say by far in the industry, if you're talking about what technology we have available, everything is supervised learning. So you need datasets of inputs and desired outputs, and you need lots of them. And there are three properties you need: the dataset has to be very large, it has to be accurate, with no mistakes, and it has to be diverse. You don't want to just have a lot of correct examples of one thing; you need to really cover the space of possibilities as much as you can.
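The three properties (large, accurate, diverse) can each be turned into a simple measurable check. The following is a minimal sketch with assumed conventions. Accuracy is estimated from a re-labeled audit sample, and diversity is measured as coverage of a hypothetical list of required condition bins. The bin names, labels, and report fields are illustrative and are not a real Tesla tool.

```python
from collections import Counter

def dataset_report(examples, audited, required_bins):
    """examples: list of (condition_bin, label) pairs.
    audited: {index: verified_label} for a re-checked sample.
    required_bins: conditions the dataset is expected to cover."""
    disagreements = sum(1 for i, lab in audited.items() if examples[i][1] != lab)
    counts = Counter(b for b, _ in examples)
    return {
        "size": len(examples),                                   # large
        "audit_error_rate": (disagreements / len(audited)
                             if audited else float("nan")),      # accurate
        "missing_bins": sorted(set(required_bins) - set(counts)),  # diverse
        "counts": dict(counts),
    }
```

A nonzero `audit_error_rate` or a non-empty `missing_bins` points at where curation effort should go next.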

Speaker 1

你能覆盖的潜在输入空间越广，最终算法表现就越好。当你获得经过精心收集、整理和清洗的高质量数据集后，就可以在其上训练神经网络。因此大量工作都投入在数据清洗环节。正如你指出的，关键问题在于如何获取海量数据——比如要进行三维预测，就必须有三维数据作为支撑。本视频中我们整合了系统所有摄像头的八路视频，这些画面就是系统感知到的真实环境。

And the more you can cover the space of possible inputs, the better the algorithm will work in the end. Once you have really good datasets that you're collecting, curating, and cleaning, you can train your neural net on top of them, so a lot of the work goes into cleaning those datasets. Now, as you pointed out, the question is how you get a ton of that data. If you want to predict in 3D, you need data in 3D to back it up. So in this video, we have eight videos coming from all the cameras of the system, and this is what they saw, and this is the truth of what was actually around.

Speaker 1

这里有这辆车、那辆车、另一辆车。这些是车道标线，这是道路几何结构，交通灯位于这个三维坐标位置。你需要基准真值。因此团队当时攻克的核心难题就是：如何获得这些基准真值？因为只要获得百万级规模且全面、准确、多样的数据，神经网络训练就能取得极佳效果，最终可部署到...我们通过多种机制收集训练数据：人工标注、仿真数据生成，还有在AI日提到的离线追踪系统——那本质上是将多路视频自动重建为车辆周围环境的三维实景。

There was this car, this car, this car. These are the lane line markings, this is the geometry of the road, there's a traffic light at this three-dimensional position. You need the ground truth. And so the big question the team was solving, of course, is how do you arrive at that ground truth? Because once you have a million examples and the data is large, clean, and diverse, training a neural net on it works extremely well, and you can ship that into the... And so there are many mechanisms by which we collected that training data. You can always go for human annotation. You can go for simulation as a source of ground truth. You can also go for what we call the offline tracker, which we've spoken about at AI Day and elsewhere. That's basically an automatic reconstruction process that takes those videos and recovers the three-dimensional reality of what was around the car.

Speaker 1

简而言之就是先进行离线三维重建，比如分析十秒视频后确定：这是我们观测到的所有车道线和车辆等信息。获得这些标注后，就能训练神经网络来复现这些认知。

So basically think of doing like a three-dimensional reconstruction as an offline thing, and then understanding that okay, there's ten seconds of video, this is what we saw, and therefore here's all the lane lines, cars and so on. And then once you have that annotation, you can train neural nets to imitate it.
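The simplest building block of such a reconstruction is triangulation: recovering a 3D point from two camera rays that observe it. The following is a textbook midpoint-method sketch. It assumes camera centers and ray directions are already known, which a real pipeline must estimate. It is not a description of Tesla's offline tracker, which also involves many views, tracking over time, and optimization.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.
    Minimizes |(c1 + s*d1) - (c2 + t*d2)|^2 over s, t in closed form."""
    w0 = [a - b for a, b in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel; depth is unobservable")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [ci + s * di for ci, di in zip(c1, d1)]
    p2 = [ci + t * di for ci, di in zip(c2, d2)]
    return tuple((x + y) / 2 for x, y in zip(p1, p2))
```

With two cameras 2 units apart both looking at the point (1, 1, 5), `triangulate_midpoint((0, 0, 0), (1, 1, 5), (2, 0, 0), (-1, 1, 5))` recovers that point. Overlapping camera views are what make this possible.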

Speaker 0

那么三维重建的难度有多大？

And how difficult is the 3D reconstruction?

Speaker 1

确实具有挑战性，但可以实现。

It's difficult, but it can be done.

Speaker 0

所以摄像头视野存在重叠区域，通过重建处理时若出现任何误差，就会在标注阶段被发现修正。

So there's overlap between the cameras, you do the reconstruction, and if there's any inaccuracy, that's caught in the annotation step?

Speaker 1

是的。标注的一大优势在于它是完全离线的，你有无限的时间，你拿到一分钟的视频片段，可以像是在某台超级计算机上离线处理，试图找出所有汽车和行人的位置，并且你拥有从各个角度拍摄的完整一分钟视频。你可以运行任何想要的神经网络，它们可以是极其高效的大规模神经网络。甚至有些神经网络在后续车辆测试时都无法运行。因此，它们可以比最终部署的神经网络更强大。

Yes. The nice thing about the annotation is that it is fully offline. You have infinite time: you have a one-minute chunk, and you're just trying, offline on a supercomputer somewhere, to figure out the positions of all the cars and all the people, and you have your full minute of video from all the angles. You can run all the neural nets you want, and they can be massive neural nets, even neural nets that couldn't run in the car later at test time. So they can be even more powerful than what you can eventually deploy.

Speaker 1

所以你可以做任何想做的事，三维重建、神经网络，任何手段只为了还原真相，然后用这个真相进行监督学习。

So you can do anything you want (three-dimensional reconstruction, neural nets, anything) just to recover that truth, and then you supervise on that truth.
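One concrete reason an offline labeler can beat anything that runs in the car is that it sees the whole clip, including frames after the moment being labeled. The following toy illustration rests on simple assumptions: a 1D position, constant velocity, and alternating ±1 measurement noise. It compares a causal moving average, which is what an online system could compute, with a centered one, which is only possible offline. Moving averages are a deliberately simple stand-in for a real offline tracker.

```python
def causal_average(xs, k=3):
    """Online estimate: uses only the current and past k-1 frames."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - k + 1): i + 1]
        out.append(sum(window) / len(window))
    return out

def centered_average(xs, half=1):
    """Offline estimate: may also look `half` frames into the future."""
    out = []
    for i in range(len(xs)):
        window = xs[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def mean_abs_error(est, truth, skip=2):
    """Average error, ignoring the edges where windows are truncated."""
    pairs = list(zip(est, truth))[skip:len(truth) - skip]
    return sum(abs(e - t) for e, t in pairs) / len(pairs)

# Hypothetical track: an object moving at constant speed, observed with noise.
truth = [float(t) for t in range(20)]
observed = [x + (1.0 if t % 2 == 0 else -1.0) for t, x in enumerate(truth)]
```

On this track the centered estimate has an error of 1/3 per frame. The causal one lags behind the motion and averages an error of about 1.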

Speaker 0

你学到了什么？你说人类标注不会出错，但我猜人类在点击屏幕内容这类事情上也有擅长和不擅长的范围。设计一个让人类既准确又乐在其中的标注系统，这个问题对你来说有多有趣？比如衡量标准是什么，效率、生产力之类的？

What have you learned about humans doing annotation? You said "no mistakes," and I assume there's a range of things humans are good at in terms of clicking stuff on screen. How interesting is it to you, as a problem, to design an annotator where humans are accurate and enjoy it? What are the metrics, even: efficiency, productivity, all that kind of stuff?

Speaker 1

没错。我在特斯拉期间将标注团队从几乎零发展到一千人。这非常有意思，要知道我的背景是博士生研究员，管理这种规模的组织相当疯狂。但我认为这极其有趣，也是自动驾驶系统设计中关于如何利用人类的重要部分。人类在某些类型的标注上表现非常出色。

Yeah. So I grew the annotation team at Tesla from basically zero to a thousand while I was there. That was really interesting. My background is as a PhD student researcher, so growing that kind of organization was pretty crazy. But yeah, I think it's extremely interesting, and it's very much part of the design process behind Autopilot: deciding where you use humans. Humans are very good at certain kinds of annotations.

Speaker 1

比如他们非常擅长图像的二维标注，但不擅长在三维空间中随时间追踪车辆，这极其困难。因此我们非常谨慎地设计适合人类的任务，而将其他工作留给离线追踪系统。可能计算机会完成所有三角测量和三维重建，但人类会精确标注图像中哪些像素是汽车，哪些是行人。所以共同设计数据标注流程正是我日常工作的核心。

They're very good, for example, at two-dimensional annotations of images, but they're not good at annotating cars over time in three-dimensional space; that's very, very hard. That's why we were very careful to design tasks that are easy for humans, versus things that should be left to the offline tracker. Maybe the computer will do all the triangulation and 3D reconstruction, but the human will say exactly these pixels of the image are a car, exactly these pixels are a human. So co-designing the data annotation pipeline was very much the bread and butter of what I was doing daily.

Speaker 0

你觉得这个领域还存在很多未解决的问题吗？总体上就是机器做机器擅长的，人类做人类擅长的，可能还需要某种迭代过程。

Do you think there are still a lot of open problems in that space? Annotation in general, where the machines do what they're good at, the humans do what they're good at, and there's maybe some iterative process.

Speaker 1

我认为在很大程度上，我们经历了多次迭代，学到了大量关于创建这些数据集的方法。目前没有看到大的开放性难题。刚加入时，我确实不确定结果会怎样。但到我离开时，我已经更加确信，团队基本理解了创建这类数据集的哲学，我对当时的进展相当满意。

I think to a very large extent we went through a number of iterations, and we learned a ton about how to create these datasets. I'm not seeing big open problems. Originally, when I joined, I was really not sure how this would turn out. But by the time I left, I was much more secure: the team understands the philosophy of how to create these datasets, and I was pretty comfortable with where that was at the time.

Speaker 0

那么在你看来，摄像头在驾驶任务中的优势和局限是什么？当你将驾驶任务定义为配备八个摄像头的视觉任务时，你会发现整个计算机视觉领域的历史，尤其是涉及神经网络的部分。退一步讲，像素作为驱动手段的优势和局限究竟是什么？

So what are the strengths and limitations of cameras for the driving task, in your understanding? You formulate the driving task as a vision task with eight cameras, and you've seen most of the history of the computer vision field as it relates to neural networks. If you step back, what are the strengths and limitations of pixels, of using pixels to drive?

Speaker 1

是的。我认为像素是一种绝妙的传感器。关键在于，摄像头极其廉价却能提供海量信息——数以亿计的比特数据。这是个能以极低成本获取大量约束条件的传感器，每个比特都对应着现实世界的某种状态。通过廉价的百万像素图像，你就能获得理解现实世界所需的各种约束条件。

Yeah. Pixels, I think, are a beautiful sensor, I would say. The thing is, cameras are very, very cheap, and they provide a ton of information, a ton of bits. So it's an extremely cheap sensor for a ton of bits, and each one of these bits is a constraint on the state of the world. You get lots of megapixel images, very cheaply, and they give you all these constraints for understanding what's actually out there in the world.

Speaker 1

因此视觉可能是最高带宽的传感器。嗯。而且

So vision is probably the highest-bandwidth sensor. It's a very high-bandwidth sensor. And

Speaker 0

我特别喜欢'像素是对世界的约束'这个说法。这是对世界状态极其复杂且高带宽的约束，确实令人着迷。

I love that pixels are a constraint on the world. It's a highly complex, high-bandwidth constraint on the state of the world. That's fascinating.

Speaker 1

不仅如此，更重要的是——这是人类使用的传感器。因此所有事物都为此设计：文字、标识、闪光信号，整个环境都为视觉感知而构建。正因如此，视觉才是我们需要的通用接口，也是我们测量世界并开发相应软件的理想切入点。

And it's not just that. Again, what really matters is that it's the sensor humans use, so everything is designed for that sensor: the text, the writing, the flashing signs. Everything is designed for vision, and you find it everywhere. That's why it's the interface you want to be in, to talk again about these universal interfaces. That's where we actually want to measure the world as well, and then develop software for that sensor.


Speaker 0

但人类理解世界还依赖其他约束条件。虽然视觉是主要途径，我们还会参考对人类行为的理解以及某些常识性物理规律。这些虽然理论上可以从视觉感知中推断，但感觉我们实际运用了某种推理机制来预测世界

But there are other constraints on the state of the world that humans use to understand the world. I mean, vision is ultimately the main one, but we're also referencing our understanding of human behavior and some common-sense physics. That could be inferred from vision, from a perception perspective, but it feels like we're using some kind of reasoning to predict the world.

Speaker 1

没错。不仅是像素数据。你对世界随时间演变的规律有着强大的先验认知。所以不仅是数据本身提供的观察结果，还包括你对可能看到的事物及其运动方式的先验判断。

Yeah, not just the pixels. You have a powerful prior for how the world evolves over time, etcetera. So it's not just about the likelihood term coming from the data itself, telling you what you are observing, but also the prior term: where the likely things to see are, how they are likely to move, and so on.
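In the simplest possible case, the prior/likelihood split has a closed form: the posterior is proportional to likelihood × prior. The following is a minimal sketch that assumes both the motion prior ("where the car is likely to be next") and the pixel-based measurement are 1D Gaussians. This is the scalar Kalman measurement update, used here as an illustration. It is not a claim about how Autopilot combines its predictions.

```python
def fuse_gaussian(prior_mean, prior_var, meas_mean, meas_var):
    """Posterior of a Gaussian prior times a Gaussian likelihood.
    Precisions (1/variance) add; the posterior mean is the
    precision-weighted average of the prior and measurement means."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / meas_var)
    post_mean = post_var * (prior_mean / prior_var + meas_mean / meas_var)
    return post_mean, post_var
```

With equal confidence, `fuse_gaussian(10.0, 4.0, 12.0, 4.0)` lands halfway at 11.0 with a variance of 2.0. With a much sharper prior, such as a variance of 1.0 against 100.0, the estimate stays close to what the prior predicted, even if the measurement disagrees.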

Speaker 0

问题在于驾驶任务中可能发生的各种情况的复杂程度究竟如何？对吧。这仍然是一个悬而未决的问题——从哲学层面来说，驾驶到底有多困难？

And the question is: how complex is the range of possibilities that might happen in the driving task? Is that still an open problem to you, how difficult driving is, philosophically speaking?

Speaker 0

比如，在你研究驾驶的整个过程中，你是否真正理解驾驶的难度？

Like, in all the time you've worked on driving, do you understand how hard driving is?

Speaker 1

是的。驾驶确实非常困难，因为它涉及到对所有其他交通参与者的预测、心理理论的应用——比如他们会采取什么行动、是否在注视你、他们的视线方向、以及他们的思考内容。最终我们必须接受，那些最棘手的问题往往属于这类情况。不过我认为这类问题并不常见。

Yeah. Driving is really hard, because it has to do with predicting all these other agents, and with theory of mind: what they're going to do, whether they're looking at you, where they're looking, what they're thinking. There's a lot that goes on there. At the full tail of the expansion of the nines, we have to be comfortable with the fact that eventually the final problems are of that form. I don't think those problems are very common.

Speaker 1

我认为它们最终很重要，但确实属于极端罕见的情况。

I think eventually they're important but it's like really in the tail end.

Speaker 0

在极端情况下，那些罕见的边缘案例。从视觉角度来看，驾驶中最具挑战性的视觉问题是什么？

In the tail end, the rare edge cases. From the vision perspective, what are the toughest parts of the vision problem of driving?

Speaker 1

本质上传感器虽然非常强大，但仍需处理这些信息。将像素亮度值转化为对三维世界的认知极其困难，而这正是神经网络的根本任务。真正的挑战在于完美构建整个处理流程和数据引擎，具备训练这些神经网络的能力，以及评估系统并持续迭代的能力。所以我认为大规模生产部署才是难点所在，这是个执行层面的问题。

Well, basically, the sensor is extremely powerful, but you still need to process that information. Going from the brightnesses of these pixel values to "hey, here is the three-dimensional world" is extremely hard, and that's what the neural networks are fundamentally doing. So the difficulty really lies in doing an extremely good job of engineering the entire pipeline and the entire data engine: having the capacity to train these neural nets, and having the ability to evaluate the system and iterate on it. So I would say doing this in production at scale is the hard part. It's an execution problem.

Speaker 0

所以数据引擎，还包括系统的部署方式，以确保其具备低延迟性能，必须完成所有这些步骤。

So the data engine, but also the deployment of the system such that it has low-latency performance. It has to do all these steps.

Speaker 1

没错。特别是对于神经网络，要确保所有内容都能适配车载芯片。是的，你有一个有限的计算预算、内存带宽和其他限制条件，必须确保它能运行，并尽可能在这狭小的空间里塞进更多计算资源。

Yeah. For the neural net specifically, it's about making sure everything fits into the chip on the car. You have a finite budget of flops you can perform, memory bandwidth, and other constraints, and you have to make sure it flies and that you squeeze as much compute as you can into the tiny
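A back-of-the-envelope version of "does it fit in the flop budget" is easy to write down. The following is a sketch that uses the common convention of counting each multiply-accumulate of a dense convolution as 2 FLOPs. Bias terms and memory bandwidth are ignored. Every layer shape, frame rate, camera count, and budget in the example is a hypothetical number, not Tesla's.

```python
def conv2d_flops(h_out, w_out, c_in, c_out, k):
    """FLOPs of a dense k x k convolution: 2 per multiply-accumulate."""
    return 2 * h_out * w_out * c_in * c_out * k * k

def fits_budget(layers, fps, n_cameras, budget_flops_per_s):
    """layers: (h_out, w_out, c_in, c_out, k) tuples for one camera frame.
    Returns (fits, flops_per_frame)."""
    per_frame = sum(conv2d_flops(*layer) for layer in layers)
    return per_frame * fps * n_cameras <= budget_flops_per_s, per_frame

# Hypothetical two-layer network and budget, purely for illustration.
LAYERS = [(120, 160, 3, 32, 3), (60, 80, 32, 64, 3)]
```

For these made-up layers, one frame costs 210,124,800 FLOPs. That amounts to about 6.3 GFLOP/s at 30 fps for one camera, but about 50 GFLOP/s for eight. A network that fits comfortably for a single stream can therefore blow the budget once every camera runs it.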

Speaker 0

你从那个过程中学到了什么？因为这可能是从研究背景转向实际系统时遇到的新挑战之一，系统必须在严格受限的资源下运行，还必须非常快速。你从中获得了哪些深刻见解？

What have you learned from that process? Maybe that's one of the bigger new things when you come from a research background: there's a system that has to run under heavily constrained resources and has to run really fast. What kind of insights have you gained from that?

Speaker 1

是的，我不确定是否有太多深刻见解。你试图创建一个能适配现有资源的神经网络，并不断优化它。我们在AI日上讨论了很多，基本上团队在做各种高难度动作来确保所有内容都能适配并充分利用引擎。所以我认为这是极其出色的工程实践。

Yeah. I'm not sure there are too many insights. You're trying to create a neural net that will fit in what you have available, and you're always trying to optimize it. We talked a lot about it at AI Day: basically, the triple backflips the team is doing to make sure it all fits and utilizes the engine. So I think it's extremely good engineering.

Speaker 1

然后还有各种关于如何正确操作的小技巧散落其中。

And then there are all kinds of little insights peppered in on how to do it properly.

Speaker 0

让我们放大视野，因为我觉得我们还没讨论数据引擎的整体布局，这个我认为非常美妙、包含人类参与的概念。你能描述一下数据引擎吗？

Let's actually zoom out, because I don't think we've talked about the data engine, the entire layout of this idea, which I think is just beautiful, with humans in the loop. Can you describe the data engine?

Speaker 1

是的，数据引擎是我称之为近乎生物般的过程，通过它来完善这些神经网络的训练集。因为现在大部分编程工作都集中在这些数据集层面，确保它们规模大、多样且干净。基本上你有一个认为不错的数据集，训练神经网络，部署它，然后观察它的表现，并始终尝试提高数据集的质量。你试图捕捉那些基本上罕见的场景，正是在这些场景中神经网络通常会遇到困难，因为数据集没有告诉它们在那些罕见情况下该怎么做。但现在你可以闭环，因为如果你能大规模收集这些场景，就可以将它们反馈到我之前描述的重建过程中，重建这些情况下的真实情况并添加到数据集中。所以整个过程就像一个阶梯式的改进，不断完善你的训练集。

Yeah. The data engine is what I call the almost biological-feeling process by which you perfect the training sets for these neural networks. Most of the programming now happens at the level of these datasets, making sure they're large, diverse, and clean. Basically, you have a dataset that you think is good, you train your neural net, you deploy it, and then you observe how well it's performing, and you're always trying to increase the quality of your dataset. You're trying to catch scenarios that are rare, and it's in these scenarios that neural nets typically struggle, because the dataset didn't tell them what to do in those rare cases. But now you can close the loop: if you can collect all those cases at scale, you can feed them back into the reconstruction process I described, reconstruct the truth in those cases, and add it to the dataset. So the whole thing ends up being a staircase of improvement, of perfecting your training set.
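The loop described here (train, deploy, mine failures, label, add, retrain) can be caricatured in a few lines. The following is a deliberately toy simulation. The "model" simply handles every scenario type it has seen in training. The fleet stream, the scenario names, and the per-round labeling capacity are all hypothetical. The sketch shows only the shape of the loop and the staircase it produces.

```python
from collections import Counter

def data_engine(initial_scenarios, fleet_stream, rounds, per_round):
    """Toy data engine. Returns the fraction of fleet scenarios handled per round."""
    dataset = set(initial_scenarios)
    history = []
    for _ in range(rounds):
        handled = frozenset(dataset)  # toy "training": handles what it has seen
        # "Deploy": observe which scenarios in the fleet stream the model fails on.
        failures = Counter(s for s in fleet_stream if s not in handled)
        history.append(1 - sum(failures.values()) / len(fleet_stream))
        # Limited labeling capacity: reconstruct the truth for the most common
        # failures offline, then add them to the dataset for the next round.
        for scenario, _ in failures.most_common(per_round):
            dataset.add(scenario)
    return history
```

With a stream of six highway clips, two rain clips, one construction clip, and one tunnel clip, and starting from highway only, the handled fraction climbs in steps: 0.6, then 0.9, then 1.0.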

Speaker 1

你必须通过部署流程来挖掘数据集中尚未充分体现的部分。因此你的数据集本质上是不完美的，它需要多样化，存在某些缺失的领域，你需要填补这些空白，可以大致这样理解数据。

And you have to go through deployments so that you can mine the parts that are not yet well represented in the dataset. Your dataset is basically imperfect: it needs to be diverse, it has pockets that are missing, and you need to pad out those pockets. You can sort of think of the data that way.

Speaker 0

人类在其中扮演什么角色？就像人体由细胞组成的生物系统一样，如何优化这个人类系统？也就是多个工程师协作，决定关注重点、贡献方向以及在这个神经网络中优化哪些任务。

What role do humans play in this? If this is a biological system, the way a human body is made up of cells, how do you optimize the human system: the multiple engineers collaborating, figuring out what to focus on, what to contribute, and which task to optimize in this neural network?

Speaker 1

嗯。

Mhmm.

Speaker 0

由谁负责确定哪个任务需要更多数据？你能谈谈人类系统的超参数吗？

Who is in charge of figuring out which task needs more data? Can you speak to the hyperparameters of the human system?

Speaker 1

归根结底，这取决于一支专业工程团队的卓越执行力。他们能直观理解数据引擎背后的哲学洞见、系统改进的流程，以及如何制定数据收集策略并确保完美执行。这才是大部分工作的核心——不是哲学思考、研究或创意，而是在如此庞大规模的数据处理中实现极致执行本身就极其困难。

It really just comes down to extremely good execution from an engineering team that knows what it's doing. They understand intuitively the philosophical insights underlying the data engine and the process by which the system improves, and how to delegate the strategy of the data collection. Then it's about making sure everything is extremely well executed. That's where most of the work is. It's not even the philosophizing or the research or the ideas; extremely good execution is just so hard when you're dealing with data at that scale.

Speaker 0

所以你在数据引擎中确保良好执行的职责既困难又极其重要。是否存在类似愿景板的优先级排序？比如明确'我们必须提升红绿灯识别能力'这样的任务优先级？这种排序本质上来源于数据吗？

So executing well on the data engine is difficult and extremely important. Is there a priority list, like a vision board saying "we really need to get better at stoplights"? The prioritization of tasks: does that essentially come from the data?

Speaker 1

这很大程度上取决于产品路线图的目标——我们试图实现的版本发布、QA团队的反馈意见，以及系统当前存在缺陷或需要改进的领域。

That comes to a very large extent from what we are trying to achieve in the product roadmap: the release we're trying to get out, the feedback from the QA team about where the system is struggling or not, and the things we're trying to improve.

Speaker 0

而QA团队会提供一些信号，一些关于系统在不同条件下性能的汇总信息。

And the QA team gives some signal, some information in aggregate about the performance of the system in various conditions.

Speaker 1

没错。当然，我们所有人都会驾驶它，也能亲眼看到。能亲身使用一个系统，知道它能载你回家，这种感觉真的很棒。

That's right. And then, of course, all of us drive it, so we can also see it. It's really nice to work on a system that you can experience yourself, and you know it drives you home.

Speaker 0

从你的个人体验中，是否能得出一些无法完全从数据的汇总统计分析中获得的洞见？是的，这很奇怪，对吧？

Is there some insight you can draw from your individual experience that you just can't quite get from an aggregate statistical analysis of data? Yeah. It's so weird. Right?

Speaker 1

是的。

Yes.

Speaker 0

从某种意义上说这不科学，因为你只是一个孤立的样本。

It's not scientific in a sense, because you're just one anecdotal sample.

Speaker 1

对。我认为这其中有大量...它是真相的来源。是你与系统的互动。你可以观察它、把玩它、干扰它、感受它，对它形成直觉。

Yeah. I think there's a ton of... it's a source of truth. It's your interaction with the system. Yeah. And you can see it, you can play with it, you can perturb it, you can get a sense of it, you have an intuition for it.

Speaker 1

我觉得数字和图表曲线之类的东西，你知道的，要难理解得多。它们掩盖了很多东西。

I think numbers just like have a way of numbers and plots and graphs are, you know, much harder. Yeah. It hides a lot of

Speaker 0

就像训练语言模型一样，最有效的方式就是通过你与它的互动。没错，百分之百。你会逐渐建立起一种直觉。

It's like if you train a language model, a really powerful way to understand it is by interacting with it. Yeah. 100%. You start trying to build up an intuition.

Speaker 1

是的。我觉得像埃隆那样，他总是想亲自操作系统。他经常驾驶，几乎每天如此。所以他也把这视为真相的来源，你驾驶系统，它执行任务，没错。

Yeah. I think like Elon also, like, he always wanted to drive the the system himself. He drives a lot and I wanna say almost daily. So he also sees this as a source of truth, you driving the system and it performing and yeah.

Speaker 0

那么你怎么看？这是个棘手的问题。特斯拉去年从传感器套件中移除了雷达，现在又宣布将移除超声波传感器，完全依赖视觉，也就是仅靠摄像头。这样会让感知问题变得更难还是更容易？

So what do you think? Tough questions here. So Tesla last year removed radar from the sensor suite and has now just announced that it's gonna remove ultrasonic sensors, relying solely on vision, so camera only. Does that make the perception problem harder or easier?

Speaker 1

我可能会以某种方式重新表述这个问题。关键在于，你可能会认为额外的传感器...顺便问一句，可以

I would almost reframe the question in some way. So the thing is basically, you would think that additional sensors By the way, can

Speaker 0

我能打断一下吗？

I just interrupt?

Speaker 1

请说。

Go ahead.

Speaker 0

我在想，如果你提示一个语言模型，它会不会这样做。让我重新表述你的问题。那会

I wonder if a language model will ever do that if you prompt it. Let me reframe your question. That would

Speaker 1

是史诗级的。

be epic.

Speaker 0

这是错误的提示。抱歉。就像是

That's the wrong prompt. Sorry. It's like

Speaker 1

这个问题有点错误，因为你可能会认为这些传感器是你的资产，但如果你全面考虑整个产品，这些传感器实际上可能是一种负担，因为它们并非免费，不会凭空出现在你的车上。突然间你需要建立整个供应链，有人负责采购，它们可能会出问题，可能需要更换，它们是制造过程的一部分，可能会拖慢生产线。你需要采购它们，维护它们，需要有团队编写固件，所有这一切，然后你还得以某种方式将它们整合并融合到系统中。所以实际上这会让很多事情变得臃肿。我认为埃隆非常擅长简化、简化，最好的部分就是没有部分，他总是试图抛弃那些非必要的东西，因为他理解组织和方法中的熵。

a little bit of a wrong question because basically you would think that these sensors are an asset to you, but if you fully consider the entire product in its entirety, these sensors are actually potentially a liability because these sensors aren't free, they don't just appear on your car. Suddenly you need to have an entire supply chain, you have people procuring it, there can be problems with them, they may need replacement, they are part of the manufacturing process, they can hold back the line in production. You need to source them, need to maintain them, you have to have teams that write the firmware, all of it, and then you also have to incorporate them and fuse them into the system in some way. And so it actually like bloats a lot of it. And I think Elon is really good at simplify, simplify, best part is no part, and he always tries to throw away things that are not essential because he understands the entropy in organizations and in approach.

Speaker 1

我认为在这种情况下成本很高，如果你只是一名计算机视觉工程师，你可能看不到这一点，我只是在努力改进我的网络，它更有用还是更没用，到底有多有用？问题是，一旦你考虑到传感器的全部成本，实际上它可能是一种负担，你必须非常确定它提供的是极其有用的信息。在这个案例中，我们研究了使用或不使用

And I think in this case the cost is high and you're not potentially seeing it if you're just a computer vision engineer and I'm just trying to improve my network and is it more useful or less useful, how useful is it? And the thing is that once you consider the full cost of a sensor, actually it is potentially a liability and you need to be really sure that it's giving you extremely useful information. In this case we looked at using it or not using

Speaker 0

它，差异并不大，所以它没用。它是否也是数据引擎中的臃肿部分，比如拥有更多传感器

it and the delta was not massive and so it's not useful. Is it also bloat in the data engine, like having more sensors

Speaker 1

100%是的。

100%.

Speaker 0

是一种干扰吗？

Is it a distraction?

Speaker 1

这些传感器，你看，它们会随时间变化。比如你可能先使用一种雷达，后来又换成另一种，它们会不断迭代。突然间你就得操心这个问题了：你的SQLite数据库里得多出一列来记录传感器类型。每种传感器的数据分布都不同，它们会给整个系统带来噪声和熵增，让一切变得臃肿。

And these sensors, you know, they can change over time, for example. You can have one type of, say, radar, you can have other type of radar, they change over time. Now suddenly you need to worry about it. Now suddenly you have a column in your SQLite telling you, oh, what sensor type was it? They all have different distributions and then they contribute noise and entropy into everything and they bloat stuff.

Speaker 1

从组织架构角度看也很有意思，这种变化会严重分散注意力。假设你只想专注视觉系统，所有资源都投入其中，正在构建数据引擎并取得实质进展——毕竟视觉是带宽最大、约束最多的传感器。当你全力投入时能做得极其出色。但现实中你必须在系统不同层面分配有限的注意力资源。

And also, organizationally, it's been really fascinating to me that it can be very distracting. If all you want to get to work is vision, all the resources are on it, and you're building out a data engine and actually making forward progress, because that is the sensor with the most bandwidth, the most constraints in the world, and you're investing fully into that, you can make it extremely good. You have only a finite amount of focus to spend across different facets of the system.

Speaker 0

这让我想起Rich Sutton的《苦涩的教训》。看起来简化系统才是...

And this kind of reminds me of Rich Sutton's The Bitter Lesson. It just seems like simplifying the system

Speaker 1

没错。

Yeah.

Speaker 0

长远来看总是如此。当然没人知道「长远」具体指多久，但简化似乎永远是终极解决方案。

In the long run. Now, of course, you don't know what the long run is. It seems to be always the right solution.

Speaker 1

对，确实。

Yeah. Yes.

Speaker 0

虽然萨顿那篇文章谈的是强化学习，但这个道理似乎适用于所有进行计算的系统。对了，你怎么看待把LiDAR当拐杖的争论？点云和像素之间的战争。

In that case, it was for RL, but it seems to apply generally across all systems that do computation. Yep. So what do you think about the LiDAR-as-a-crutch debate? The battle between point clouds and pixels.

Speaker 1

是的。我觉得这场辩论对我来说总是有点困惑，因为真正的辩论点似乎应该是——你是否拥有车队？这才是关于能否在这个规模上实现AI系统良好运行的关键问题。

Yeah. I think this debate is always, like, slightly confusing to me because it seems like the actual debate should be about, like, do you have the fleet or not? That's, like, the really important thing about whether you can achieve a really good functioning of an AI system at this scale.

Speaker 0

所以是数据收集系统的问题。

So data collection systems.

Speaker 1

没错，是否拥有车队远比是否配备激光雷达重要得多，后者只是个传感器而已。就像雷达讨论一样，我认为它基本上不提供额外信息，成本极高，存在各种问题，需要操心校准，还会造成系统臃肿和熵增。除非你非常确定需要这个传感器——而我认为这里根本不需要。说实话，我敢断言其他一些使用它的公司很可能会放弃它。

Yeah, whether you have the fleet or not is significantly more important than whether you have LiDAR or not; it's just another sensor. And yeah, I think, similar to the radar discussion, it basically doesn't offer extra information, it's extremely costly, it has all kinds of problems, you have to worry about it, you have to calibrate it, it creates bloat and entropy, and you have to be really sure that you need this sensor. In this case I basically don't think you need it. And honestly, I will make a stronger statement: I think some of the other companies that are using it are probably going to drop it.

Speaker 0

对。所以必须综合考虑传感器问题：能否建立大规模数据采集车队？能否将该传感器与数据整合成能快速定位数据片段、持续优化模型的数据引擎？

Yeah. So you have to consider the sensor in full: can you build a big fleet that collects a lot of data, and can you integrate that data and that sensor into a data engine that's able to quickly find different parts of the data and then continuously improve whatever model you're using?

Speaker 1

没错。换个角度看，视觉系统是必要的——因为世界是为人类视觉设计的，所以必须要有；同时它也是充分的，因为包含驾驶所需的全部信息，人类显然靠视觉就能驾驶。它既是必要条件又是充分条件，因此应该集中资源。引入其他传感器必须慎之又慎。

Yeah. Another way to look at it is, like, vision is necessary in a sense that the world is designed for human visual consumption so you need vision, it's necessary. And then also it is sufficient because it has all the information that you need for driving and humans obviously have the vision to drive. It's both necessary and sufficient, so you want to focus resources. And you have to be really sure if you're going to bring in other sensors.

Speaker 1

理论上可以无限叠加传感器，但总要有个限度。这种情况下必须全面评估每个传感器的综合成本，是否真的必要？我认为答案是否定的。

You could add sensors to infinity, at some point you need to draw the line. And I think in this case you have to really consider the full cost of any one sensor that you're adopting, and do you really need it? And I think the answer in this case is no.

Speaker 0

那么对于其他公司正在构建高精地图、严格限定运营区域的做法，你认为这种策略无法随时间推移扩展到全美范围吗？

So what do you think about the idea that that the other companies are forming high resolution maps and constraining heavily the geographic regions in which they operate? Is that approach not, in your in your view, not going to scale over time to the entirety of The United States?

Speaker 1

我认为

I think

Speaker 0

我要两个

I'll take two

Speaker 1

正如你提到的，他们预先绘制了所有环境的地图，并且需要不断更新。他们拥有厘米级精度的行驶区域地图，这简直疯狂。当我们讨论自动驾驶真正改变世界时，实际上是在谈论全球范围内自动驾驶交通系统的部署。如果需要为地球或众多城市维护并更新厘米级精度地图，这将是一个巨大的依赖项，极其庞大的依赖。现在你必须自问：真的需要这样做吗？

As you mentioned, they pre-map all the environments, and they need to refresh the map. And they have a perfect centimeter-level accuracy map of everywhere they're going to drive. It's crazy. When we're talking about autonomy actually changing the world, we're talking about the deployment of autonomous systems for transportation on a global scale. And if you need to maintain a centimeter-accurate map for Earth, or for many cities, and keep it updated, that's a huge dependency you're taking on. It's a massive, massive dependency, and now you need to ask yourself: do you really need it?

Speaker 1

人类并不需要这种精度，对吧？拥有基础道路连通性地图就很有用，比如知道前方有岔路。驾驶时你会有这种宏观认知，就像小型谷歌地图。特斯拉系统使用的谷歌地图也是类似分辨率的信息，但不会预先绘制厘米级精度的环境地图。这是个拐杖，是干扰项，会带来熵增，分散团队精力，稀释团队力量，让你无法聚焦于真正必要的东西，

And humans don't need it, right? So it's very useful to have a low-level map of, like, okay, the connectivity of your road; you know that there's a fork coming up. When you drive an environment, you sort of have that high-level understanding. It's like a small Google Map, and Tesla uses Google Maps-like information of a similar resolution in the system, but it will not pre-map environments to centimeter-level accuracy. It's a crutch, it's a distraction, it costs entropy, and it diffuses the team, it dilutes the team, and you're not focusing on what's actually

Speaker 0

即计算机视觉问题。在与埃隆·马斯克共事的过程中，你从机器学习、工程学、生活以及作为个体的人性层面学到了什么？

necessary, which is the computer vision problem. What did you learn about machine learning, about engineering, about life, about yourself as one human being from working with Elon Musk?

Speaker 1

我认为最大的收获是关于如何高效运营组织，如何创建高效机构，以及如何对抗熵增

I think the most I've learned is about how to sort of run organizations efficiently, and how to create efficient organizations, and how to fight entropy

Speaker 0

在组织中的熵增。所以这是对抗熵增的人类工程学。

in an organization. So human engineering in the fight against entropy.

Speaker 1

是的。我认为埃隆是组织对抗熵增斗争中非常高效的战士。

Yeah. There's a there's a I think Elon is a very efficient warrior in the fight against entropy in organizations.

Speaker 0

组织中的熵增具体表现是什么样子的？

What does entropy in an organization look like exactly?

Speaker 1

就是流程，就是流程和低效，那些形式化的东西。会议。他讨厌会议，一直告诉人们没用的会议就别参加。他基本上在运营世界上最大的初创企业，我认为特斯拉和SpaceX就是全球最大的初创公司。

It's process, it's inefficiencies and formalities and that kind of stuff. Meetings. He hates meetings; he keeps telling people to skip meetings if they're not useful. He basically runs the world's biggest startups, I would say. Tesla and SpaceX are the world's biggest startups.

Speaker 1

特斯拉实际上由多个初创企业组成，这样理解更准确。所以他在这方面非常出色，对流程优化、提升效率有着极佳的直觉。最好的零件就是没有零件，简化、专注、消除障碍、快速行动、大胆决策。这些都是非常具有初创企业特质的行为，只不过是在更大规模上实施。

Tesla actually is multiple startups; I think it's better to look at it that way. And so I think he's extremely good at that, and yeah, has a very good intuition for streamlining processes, making everything efficient. Best part is no part, simplifying, focusing, and just kind of removing barriers, moving very quickly, making big moves. All of these seem like very startup-y things, but at scale.

Speaker 0

所以是强烈的简化驱动力。从你的角度来看，这同样适用于系统设计、机器学习等领域。对，就是不断简化。是的。

So strong drive to simplify. From your perspective, I mean, that also probably applies to just designing systems and machine learning and otherwise. Yep. Like simplify, simplify. Yes.

Speaker 0

你认为在成长型企业中保持初创文化的秘诀是什么？能深入分析下吗？

What do you think is the secret to maintaining the startup culture in a company that grows? Is there can you introspect that?

Speaker 1

我认为确实需要像埃隆这样手握大权的人，像拉拉队长一样倡导这种理念并毫不留情地推行。如果没有足够分量的人，一切都会变成委员会决策、公司内部民主、流程、利益相关方协商，整个决策机制就会崩溃。如果有位既聪明又握有实权的大人物，事情就能快速推进。

I do think you need someone in a powerful position with a big hammer, like Elon, who's the cheerleader for that idea and ruthlessly pursues it. If no one has a big enough hammer, everything turns into committees, democracy within the company, process, talking to stakeholders, decision making, and everything just crumbles. If you have a big person who is also really smart and has a big hammer, things move quickly.

Speaker 0

你说你最喜欢的《星际穿越》场景是那个紧张的对接片段，AI和库珀对话时，库珀说‘你在做什么？’‘对接。’‘这不可能。’‘不，这是必须的。’这句台词太棒了。

So you said your favorite scene in Interstellar is the intense docking scene with the AI and Cooper talking saying, Cooper, what are you doing? Docking, it's not possible. No. It's necessary. Such a good line.

Speaker 0

顺便说，这里有很多疑问。为什么那个场景里的AI按理说应该比人类能计算更多？它却说这不是最优解。为什么人类...我是说，虽然是电影，但AI不是应该比人类懂得多得多吗？总之，你认为设定看似不可能的目标有什么价值？

By the way, just so many questions there. Why? The AI in that scene presumably is supposed to be able to compute a lot more than the human, and it's saying it's not optimal. Why the human? I mean, that's a movie, but shouldn't the AI know much better than the human? Anyway, what do you think is the value of setting seemingly impossible goals?

Speaker 0

所以，就像我们的初始直觉——这似乎是你和埃隆都秉持的观点——当社区的初始直觉认为某件事非常困难时，你却依然接手并设定疯狂期限。从人类工程学角度看，你见过这种做法的价值吗？

So, like, our initial intuition, which seems like something that you have taken on, that Elon espouses, that where the initial intuition of the community might say this is very difficult, and then you take it on anyway with a crazy deadline. You just from a human engineering perspective, have you seen the value of that?

Speaker 1

我不会说设定完全不可能的目标是好主意，但我认为设定非常雄心勃勃的目标是好的。我认为存在我所说的难度次线性增长现象，即10倍难度的问题通常不是10倍难执行，可能只有2-3倍。因为如果你想改进系统10%，需要一定工作量；但想改进10倍，并不需要100倍工作量。这是因为你从根本上改变了方法，如果以这个约束为起点，某些方法显然行不通，这会迫使你重新评估。我认为这是种非常有趣的问题解决方式。

I wouldn't say that setting impossible goals exactly is a good idea, but I think setting very ambitious goals is a good idea. I think there's what I call sublinear scaling of difficulty, which means that 10x problems are not 10x hard, usually 10x harder problem is like two or 3x harder to execute on. Because if you want to actually improve a system by 10% it costs some amount of work, and if you want to 10x improve the system it doesn't cost 100x amount of work. And it's because you fundamentally change the approach and if you start with that constraint, then some approaches are obviously dumb and not going to work and it it forces you to reevaluate. And I think it's a very interesting way of approaching problem solving.

Speaker 0

但这需要一种奇怪的思维方式。就像回到你读博时期，你怎么判断机器学习领域里哪些问题是可解决的？

But it requires a weird kind of thinking. It's just going back to your, like, PhD days. It's like, how do you think which ideas in in the machine learning community are solvable?

Speaker 1

嗯，是的。

Mhmm. Yes.

Speaker 0

这需要...该怎么说？虽然有‘第一性原理思维’的陈词滥调，但本质上需要忽略社区的共识，因为科学界不正是通过划定可能与否的界限来运作的吗？而且，要在不疯掉的情况下突破这种界限非常困难。

It's a it requires what is that? I mean, there's the cliche of first principles thinking, but, like, it requires to basically ignore what the community is saying because doesn't the community doesn't a community in science usually draw lines of what is and isn't possible? Right. And, like, it's very hard to break out of that without going crazy.

Speaker 1

是的。我想这里有个很好的例子，就是深度学习革命在某种程度上，因为你可以身处2012年前后深度学习革命时期的计算机视觉领域。你可以通过改进计算机视觉技术栈获得10%的提升，或者我们可以直接说，实际上这些都毫无用处，我怎样才能实现10倍的计算机视觉进步？很可能不是通过调整HOG特征检测器。我需要一种不同的方法。

Yep. I mean, I think a good example here is, you know, the deep learning revolution in some sense, because you could be in computer vision at that time, during the deep learning revolution of 2012 and so on. You could be improving a computer vision stack by 10%, or you could just be saying, actually all of this is useless, and how do I do 10x better computer vision? Well, it's probably not by tuning a HOG feature detector. I need a different approach.

Speaker 1

我需要一个可扩展的方案，回到理查德·萨顿的观点，理解《苦涩的教训》的哲学，然后意识到实际上我需要一个像神经网络这样更具扩展性的系统，它在理论上是可行的，再找到一些坚定的实践者来执行这个使命并使其成功。这才是10倍提升的解决方案。

I need something that is scalable. Going back to Richard Sutton and understanding the philosophy of the Bitter Lesson, and then being like, actually I need a much more scalable system, like a neural network, that in principle works, and then having some deep believers who can actually execute on that mission and make it work. That's the 10x solution.

Speaker 0

你认为解决自动驾驶问题需要多长时间？这在某种程度上仍是个未解之谜。

What do you think is the timeline to solve the problem of autonomous driving? That's still in part an open question.

Speaker 1

是的，我认为自动驾驶时间表最棘手的地方在于，目前还没有人真正实现过自动驾驶。不像问'建这座桥需要多久'——我们之前建过数百万座桥，知道大概耗时。但自动驾驶是无人涉足的领域，根本无法预估。

Yeah, I think the tough thing with timelines for self-driving, obviously, is that no one has created self-driving. Yeah. So it's not like, what do you think is the timeline to build this bridge? Well, we've built millions of bridges before; here's how long that takes. You know, no one has built autonomy; it's not obvious.

Speaker 1

某些部分可能比其他部分简单得多，所以预测极其困难。你只能基于趋势线和直觉尽力而为，但这就是为什么从根本上说这个问题很难预测。

Some parts turn out to be much easier than others, so it's really hard to forecast. You do your best based on trend lines and so on and based on intuition, but that's why fundamentally it's just really hard to forecast this.

Speaker 0

确实如此，即使身处其中也很难判断。是的。

No one has... So even still, like, being inside of it, it's hard to do. Yeah. Yes.

Speaker 1

有些事最终会变得异常困难，而有些则会出乎意料地简单。

Some things turn out to be much harder and some things turn out to be much easier.

Speaker 0

你是否试图避免做出预测？因为，像埃隆就不会回避预测，对吧？过去汽车公司的负责人也未曾回避过。福特等公司曾预测我们将在2020、2021年左右实现L4级自动驾驶。

Do you try to avoid making forecasts? Because, like, Elon doesn't avoid them. Right? And heads of car companies in the past have not avoided it either. Ford and other places have made predictions that we're gonna solve level four driving by 2020, 2021, whatever.

Speaker 0

嗯。现在他们都在某种程度上撤回了那些预测。作为一个AI从业者，你自己私下会做预测吗？还是说预测会干扰你实际思考问题的能力？

Mhmm. And now they're all kind of backtracking that prediction. Are you as a as an AI person, do you for yourself privately make predictions or do they get in the way of like your actual ability to think about a thing?

Speaker 1

是的。可以说容易断言的是这个问题是可解决的，这是个容易做出的预测。它是可解决的，会成功的。没错。

Yeah. I would say like what's easy to say is that this problem is tractable, and that's an easy prediction to make. It's tractable. It's going to work. Yes.

Speaker 1

只是确实非常困难。有些事最终比预期更难，有些则更简单。但它绝对感觉是可解决的，至少在我看来，特斯拉团队正朝着这个目标稳步前进。

It's just really hard. Some things turn out to be harder and some things turn out to be easier. So but it definitely feels tractable, and it feels like at least the team at Tesla, which is what I saw internally, is definitely on track to that.

Speaker 0

你如何构建一个强有力的认知框架来评估可行性？作为众多人的领导者，你必须断言这是切实可行的。你是如何建立这种直觉的？

How do you form a strong representation that allows you to make a prediction about tractability? So, like, you're the leader of a lot a lot of humans. You have to kind of say this is actually possible. Like, what Yeah. How do you build up that intuition?

Speaker 0

甚至不限于驾驶领域，也可以是其他任务。它

It doesn't have to be even driving. It could be other tasks. It

Speaker 1

可以是

could be

Speaker 0

那么我想问，你一生中处理过哪些困难的任务？我是说分类问题，比如在ImageNet上达到某种超越人类水平的性能。

And I wanna ask, what difficult task did you work on in your life? I mean, classification, achieving a certain level of superhuman performance on ImageNet.

Speaker 1

是的。专家直觉。那纯粹是直觉。

Yeah. Expert intuition. It's just intuition.

Speaker 0

这是信念问题。就像经过长时间思考，比如学习、观察样本数据，就像你说的驾驶。我在这方面的直觉其实很不可靠。我对问题可解性没有很好的直觉判断，它可能千变万化。

It's belief. So just like thinking about it long enough, like studying, looking at sample data, like you said, driving. My intuition is really flawed on this. Like, I don't have a good intuition about tractability. It it could be it could be anything.

Speaker 0

可能是可解的。你知道，驾驶任务可能被简化为相当简单的事情。问题的解决方案可能非常简单。随着规模扩大，越来越多完美驾驶的车辆可能让问题变得更容易。是的。

It could be solvable. Like, you know, the driving task could could be simplified into something quite trivial. Like, the solution to the problem would be quite trivial. And at scale, more and more cars driving perfectly might make the problem much easier. Yeah.

Speaker 0

随着更多车辆上路，人们学会的驾驶方式——不是正确与否，而是更适合混合自动驾驶、半自动驾驶和人工驾驶系统的优化方式——这可能会改变现状。另外，我还花了荒谬的时间观察行人过马路，思考人类行为。我们使用眼神交流的方式传递着强烈信号，存在某些特殊行为和边缘案例。当然，许多致命事故与酒驾有关，无论是行人还是司机。

The more cars you have driving and, like, people learn how to drive, not correctly, but in a way that's more optimal for a heterogeneous system of autonomous, semi-autonomous, and manually driven cars, that could change stuff. Then again, I've also spent a ridiculous number of hours just staring at pedestrians crossing streets, thinking about humans. And it feels like the way we use eye contact sends really strong signals, and there are certain quirks and edge cases of behavior. And, of course, a lot of the fatalities that happen have to do with drunk driving, on both the pedestrian side and the driver side.

Speaker 0

所以存在夜间驾驶这类问题。是的。我在想，自动驾驶的可能解决方案空间包含如此多人为因素，几乎无法预测。可能存在非常简洁优雅的解决方案。

So there's that problem of driving at night and all that kind of stuff. Yep. So I wonder, you know, the space of possible solutions to autonomous driving includes so many human-factor issues that it's almost impossible to predict. There could be super clean, nice solutions.

Speaker 1

没错。用游戏来比喻的话，虽然存在战争迷雾，但你确实能看到进步的前沿。而且可以量化历史进展。比如我在特斯拉的五年里，刚加入时系统在高速上连车道都保持不稳。

Yeah. I would say definitely like, to use a game analogy, there's some fog of war, but you definitely also see the frontier of improvement. Yeah. And you can measure historically how much you've made progress. And I think for example, at least what I've seen in roughly five years at Tesla, when I joined it barely kept lane on the highway.

Speaker 1

我认为从帕洛阿尔托到旧金山的路上大约有三四次干预。每当道路在几何形状上有所变化或转弯过急时，系统就会失效。而五年内从那种状态发展成一个相当成熟的系统，同时看到后台运作的机制以及团队如今在数据处理、计算能力等方面的运作规模，这简直是巨大的进步。

I think going up from Palo Alto to SF was like three or four interventions. Any time the road would do anything geometrically or turn too much it would just like not work. And so going from that to like a pretty competent system in five years and seeing what happens also under the hood and what the scale at which the team is operating now with respect to data and compute and everything else is just massive progress.

Speaker 0

所以你们就像在攀登山峰时遇到迷雾，但取得了很大进展。

So you're climbing a mountain in fog, but you're making a lot of progress.

Speaker 1

迷雾中前行。你们在取得进展的同时看清了下一步方向，审视着剩余的挑战——这些挑战并未扰乱你们，没有改变你们的核心理念，也不需要你们扭曲自我。你们清楚地意识到：这些就是我们仍需完成的事项。

Fog. You're making progress and you see what the next directions are, and you're looking at some of the remaining challenges and they're not they're not like perturbing you and they're not changing your philosophy and you're not contorting yourself. You're like, actually these are the things that we still need to do.

Speaker 0

没错。从数据引擎到车载计算，再到训练所需的算力，解决问题的核心组件似乎都已就位。这些年在特斯拉，你在工程领域实现了众多惊人突破，涵盖数据引擎和人为因素等各个方面。能否谈谈你选择离开特斯拉的原因？

Yeah. The fundamental components of solving the problem seem to be there, from the data engine to the compute, the compute on the car and the compute for training, all that kind of stuff. So over the years you've been at Tesla, you've done a lot of amazing breakthrough ideas in engineering, all of it, from the data engine to the human side. Can you speak to why you chose to leave Tesla?

Speaker 1

正如我在Ren所述，这五年间我逐渐陷入了管理岗位的境地。我的日常充斥着会议、组织扩张，以及关于团队战略方向的高层决策。这本质上是个企业高管角色——我能胜任，也还算称职，但这并非我真正的热情所在。当初加入时特斯拉根本没有计算机视觉团队，正从依赖Mobileye第三方视觉方案转向自主开发。我到任时只有两个人在训练深度神经网络，用的还是放在腿上的电脑...

Basically, as I described at Ren, I think over time during those five years, I've kind of gotten myself into a little bit of a managerial position. Most of my days were, you know, meetings and growing the organization and making high-level strategic decisions about the team and what it should be working on and so on. And it's kind of like a corporate executive role, and I can do it, I think I'm okay at it, but it's not fundamentally what I enjoy. And so I think when I joined there was no computer vision team, because Tesla was just going through the transition from using Mobileye, a third-party vendor, for all of its computer vision to having to build its own computer vision system. So when I showed up, there were two people training deep neural networks, and they were training them on a computer at their legs, like down

Speaker 0

一台工作站。做着基础分类任务。

a workstation. Basic classification task.

Speaker 1

是的，后来我将这个团队发展成我认为相当出色的深度学习团队，配备了大型计算集群和高效的数据标注体系。我对成果很满意，当团队实现高度自治后，我选择抽身而出。现在我非常期待重新投入技术性工作，将重心转回通用人工智能领域。

Yeah, and so I kind of like grew that into what I think is a fairly respectable deep learning team, a massive compute cluster, a very good data annotation organization, and I was very happy with where that was. It became quite autonomous, and so I kind of stepped away and I, you know, I'm very excited to do much more technical things again. Yeah. And kind of like refocus on AGI.

Speaker 0

那次心灵探索是怎样的？因为你休息了一段时间，我在想，你到底吃了多少蘑菇？哈哈，开个玩笑。我是说，你当时心里在想些什么？

What was that soul searching like? Because you took a little time off and think like what how many mushrooms did you take? No. I'm just kidding. I mean, what what was going through your mind?

Speaker 0

人类的生命是有限的。没错。你完成了一些了不起的事情。你是世界上最好的AI教师之一，真的，我这么说绝非客套。

The human lifetime is finite. Yeah. You did a few incredible things. You're you're one of the best teachers of AI in the world. You're one of the best, and I don't mean that.

Speaker 0

我是真心实意这么认为的。你是AI领域最出色的实践者之一，也就是通过从零构建、把玩基础直觉来理解事物的运作原理。就像爱因斯坦、费曼，他们都很擅长这类事情：用一个小例子去摸索，去尝试理解它。

I mean that in the best possible way. You're one of the best tinkerers in the AI world, meaning understanding the fundamentals of how something works by building it from scratch and playing with the basic intuitions. It's like Einstein, Feynman, they were all really good at this kind of stuff. Like, yeah, taking a small example of a thing to play with it, to try to understand it.

Speaker 0

所以，再加上现在在特斯拉，你帮助组建了机器学习工程师团队，打造了能在现实世界真正落地的系统。考虑到所有这些，那次心灵探索到底是什么样的体验？

So that, and obviously now with Tesla, you helped build a team of machine learning engineers and a system that actually accomplishes something in the real world. So given all that, what was the soul searching like?

Speaker 1

这很艰难，因为我显然非常热爱公司，热爱埃隆，热爱特斯拉。离开从来不是容易的决定——我本质上热爱这个团队。不过，我确实有兴趣在未来某个时候重新考虑回归，或许参与Optimus项目或特斯拉的AGI研发。我认为特斯拉将会创造非凡成就。

Well, it was hard because obviously I love the company a lot, and I love Elon, I love Tesla. It was always going to be hard to leave. I love the team, basically. But, yeah, I think I would actually be potentially interested in revisiting it, maybe coming back at some point, working on Optimus, working on AGI at Tesla. I think Tesla is going to do incredible things.

Speaker 1

本质上它是一家巨型机器人技术公司，拥有大量实现突破性创新的内部人才。我认为人形机器人将令人惊叹，自动驾驶交通也将如此——而这一切正在特斯拉发生。这实在是个了不起的组织。

It's basically a massive, large-scale robotics company with a ton of in-house talent for doing really incredible things. And I think humanoid robots are going to be amazing. I think autonomous transportation is going to be amazing. All this is happening at Tesla. So I think it's just a really amazing organization.

Speaker 1

因此成为其中一员并助力其发展，我确实非常享受这个过程。离开之所以困难，正是出于对公司的热爱。但我乐意在未来某个阶段回归开启'第二幕'——只是现阶段团队已建成且能自主运作，而我成为了管理者。我渴望更多技术实践、学习与教学，觉得是时候换个节奏了。

So being part of it and helping it along, I think was very basically, I enjoyed that a lot. Yeah, it was basically difficult for those reasons because I love the company, but you know, I'm happy to potentially at some point come back for act two, but I felt like at this stage I built the team, it felt autonomous, and I became a manager, and I wanted to do a lot more technical stuff, I wanted to learn stuff, I wanted to teach stuff, and I just kind of felt like it was a good time for a change of pace a little bit.

Speaker 0

说到第二部，你认为有史以来最棒的电影续集是哪部？因为大多数续集都很糟糕。

What do you think is the best movie sequel of all time, speaking of part two? Because, like because most of them suck.

Speaker 1

电影续集？

Movie sequels?

Speaker 0

对，电影续集。你在推特上常聊电影，就顺便问问。你最喜欢的电影续集是什么？比如《教父2》。

Movie sequels. Yeah. And you tweet about movies, so just in a tiny tangent. Is there what's your what what's, like, a favorite movie sequel? Godfather part two.

Speaker 0

你是《教父》的粉丝吗？因为你都没在推特上提过它。

Are you a fan of Godfather? Because you didn't even tweet

Speaker 1

或者提到《教父》。是的，我不太喜欢那部电影。我知道这有点...

or mention the Godfather. Yeah. I don't love that movie. I know it hasn't been

Speaker 0

这段对《教父》的差评我们会剪掉。你怎么敢...

a that out. We're gonna edit out the hate towards the Godfather. How dare

Speaker 1

我要发表一个强烈观点。不知道为什么，但基本上我不喜欢1995年之前的任何电影，大概是这样。

you I I will make a strong statement. Don't know why. I don't know why, but I basically don't like any movie before 1995, something like that.

Speaker 0

你不是提到过《终结者2》吗？

Didn't you mention Terminator two?

Speaker 1

好吧好吧。《终结者2》稍微晚一点，是1990年的。

Okay. Okay. That's like a Terminator two was a little bit later, 1990.

Speaker 0

不，我认为《终结者2》是在

No. I think Terminator two was in the

Speaker 1

八十年代。我也喜欢《终结者1》。虽然有少数例外，但总的来说，出于某种原因我不喜欢1995年以前的电影。它们节奏太慢，镜头总是拉得很远，很无聊，有点天真，还有点奇怪。

eighties. And I like Terminator one as well. So okay, so like few exceptions, but by and large for some reason I don't like movies before 1995 or something. They feel very slow, the camera is like zoomed out, it's boring, it's kind of naive, it's kind of weird.

Speaker 0

而且《终结者》在当时是非常超前的。

And also Terminator was very much ahead of its time.

Speaker 1

是的。《教父》里就没有任何人工通用智能。

Yes. And The Godfather, there's like no AGI.

Speaker 0

我是说，你提到的《心灵捕手》也没有任何人工通用智能。那部电影讲的是数学。

I mean, Good Will Hunting was one of the movies you mentioned, and that doesn't have any AGI either. I guess that's mathematics.

Speaker 1

是的。我想偶尔我也会喜欢那些不涉及……的电影。

Yeah. I guess occasionally I do enjoy movies that don't feature.

Speaker 0

或者像《王牌播音员》那样的。那部电影没有那些

Or like Anchorman. That has no that's

Speaker 1

《王牌播音员》太棒了。

Anchorman is so good.

Speaker 0

说到AGI，我不明白为什么威尔·法瑞尔这么好笑。嗯。这说不通。完全无法理解。他就是有种特别的魅力。

I don't understand, speaking of AGI because I don't understand why Will Ferrell is so funny. Mhmm. It doesn't make sense. It doesn't compute. There's just something about him.

Speaker 0

他是个独一无二的人，因为现在这样的喜剧演员很少见了。我在想这是与文化有关，还是与好莱坞的运作机制有关，或者只是我们运气好遇到了某些喜剧天才。嗯。他们聚集在一起，因为他确实是个独特的存在。是的，没错。

And he's a singular human, because you don't get that many comedies these days, and I wonder if it has to do with the culture or, like, the machine of Hollywood, or whether we just got lucky with certain people in comedy Mhmm. that came together, because he is a singular human. Yeah. Yeah.

Speaker 1

我们很幸运能有这些电影。

We got lucky with movies.

Speaker 0

刚才跑题跑得有点离谱，我道歉。但你提到了人形机器人，你觉得特斯拉的Optimus机器人怎么样？你认为十月份我们会在工厂和家里看到机器人吗？

That that was a ridiculous tangent, I apologize. But you mentioned humanoid robots, so what do you think about Optimus, about Tesla Bot? Do you think we'll have robots in the factory and and in the home in October?

Speaker 1

是的。我认为这是一个非常艰巨的项目，需要很长时间，但还有谁能大规模制造人形机器人呢？我认为这是一个非常好的形态方向，正如我提到的，世界是为人类形态设计的，这些机器人将能操作我们的机器，能坐在椅子上，甚至可能驾驶汽车。本质上，世界是为人类设计的，这正是你值得投入并逐步完善的形态。还有另一种观点认为，应该针对特定问题设计专用机器人，但实际上设计机器人并构建完整的数据引擎和配套系统本身就是个极其复杂的难题。

Yeah. I think it's a very hard project, and I think it's going to take a while, but who else is going to build humanoid robots at scale? And I think it is a very good form factor to go after because, like I mentioned, the world is designed for the humanoid form factor. These things would be able to operate our machines, they would be able to sit down in chairs, potentially even drive cars. Basically the world is designed for humans; that's the form factor you want to invest in and make work over time. I think there's another school of thought, which is: okay, pick a problem and design a robot for it. But actually designing a robot and getting a whole data engine and everything behind it to work is an incredibly hard problem.

Speaker 1

因此追求通用接口是合理的——它们可能对任何单一任务都不完美，但具备通过英语指令就能跨场景操作的通用性。我认为在物理世界追求通用接口非常有意义，虽然项目难度极高且耗时，但我看不到其他公司能实现这一愿景。这将是非凡的成就——想想看，如果说交通运输是个巨大市场，那么体力劳动市场更是大得离谱。

So it makes sense to go after general interfaces that are not perfect for any one given task, but that have the generality to do things across tasks with just a prompt in English. And so I think it makes a lot of sense to go after a general interface in the physical world. I think it's a very difficult project, and I think it's going to take time, but I see no other company that can execute on that vision. I think it's going to be amazing. Basically, physical labor: if you think transportation is a large market, try physical labor. It's insane.

Speaker 0

但这不仅仅是体力劳动。对我来说，社交机器人技术同样令人兴奋。嗯。我们将在不同层面上与这些机器人建立的关系。是的。

But it's not just physical labor. To me, the thing that's also exciting is the social robotics. Mhmm. The relationship we'll have on different levels with those robots. Yep.

Speaker 0

这就是为什么看到Optimus让我如此激动。有人批评我过于兴奋，但我与许多研究人形腿式机器人的实验室合作过，比如波士顿动力、Unitree等众多公司。动作的优雅性只是宏大蓝图中的微小部分。特斯拉研发人形或腿式机器人最让我兴奋的两点，显然是将其整合进数据引擎体系。

That's why I was really excited to see Optimus. People have criticized me for the excitement. But I've worked with a lot of research labs that do humanoid legged robots, Boston Dynamics, Unitree, and there are a lot of companies that do legged robots, but the elegance of the movement is a tiny, tiny part of the big picture. The two big exciting things to me about Tesla doing humanoid or any legged robots are, first, clearly integrating it into the data engine.

Speaker 1

嗯。

Mhmm.

Speaker 0

首先是数据引擎层面——将感知、控制、规划等智能系统整合到你提到的庞大车队中。其次是谈到车队时，就不得不提大规模制造能力。明白吧？

So, the data engine aspect: the actual intelligence for the perception and the control and the planning and all that kind of stuff, integrated into this huge fleet that you mentioned. Right? And then, speaking of fleet, the second thing is the mass manufacturing. Just knowing Yep.

Speaker 0

从企业文化层面推动开发简单、低成本可量产的机器人。是的。并且凭借丰富经验做好这一点，这将改变一切。这就是为什么特斯拉与波士顿动力有着完全不同的文化风格——后者的机器人运动流畅度确实令人惊叹，特斯拉可能很长时间都难以企及，但重点不在于此。正如我们讨论的，关键在于整个系统——数据引擎和车队网络的整合。

Culturally driving towards a simple robot that's cheap to produce at scale. Yep. And doing that well, having the experience to do that well, that changes everything. That's a very different culture and style than Boston Dynamics, whose robots, by the way, move so smoothly that it'll be a very long time before Tesla can achieve that, but that's not what it's about. It's about the entirety of the system, like we talked about, the data engine and the fleet.

Speaker 0

这太令人兴奋了。即便是最初的模型版本，短短几个月就能做出原型机也着实让人惊讶。

That's super exciting. Even the initial models. It was also really surprising that in a few months you can get a prototype.

Speaker 1

没错。进展如此迅速的原因，正如你提到的，是因为大量借鉴了自动驾驶技术。特斯拉内部涌现出的类人机器人建造专长令人惊叹——基本上马斯克刚宣布要造机器人，第二天各种CAD设计图就冒出来了。

Yep. And the reason that happened very quickly is, as you alluded to, there's a ton of copy-paste from what's happening on Autopilot. A lot. The amount of expertise that came out of the woodwork at Tesla for building the humanoid robot was incredible to see. Basically, Elon said at one point, we're doing this, and then the next day all these CAD models started to appear.

Speaker 1

有人讨论供应链和制造问题，前几天还有人带着螺丝刀等工具直接开始组装机身。我震惊地发现特斯拉原来藏着这么多人才，而且造车和造机器人本质上没太大区别。这种共通性不仅体现在硬件上——别忘了我们不仅要造演示品，还要实现硬件量产，这完全是另一个维度——软件方面也一样，现在这个机器人系统默认自己就是辆车。

People were talking about the supply chain and manufacturing, and people showed up with screwdrivers and everything the other day and started to put together the body, and I was like, woah, all these people exist at Tesla. Fundamentally, building a car is actually not that different from building a robot. And that is true not just for the hardware pieces (and let's not forget: hardware not just for a demo, but manufacturing that hardware at scale, which is a whole different thing) but for software as well. Basically, this robot currently thinks it's a car.

Speaker 0

它迟早要经历中年危机的。

It's gonna have a midlife crisis at some point.

Speaker 1

它确实自认为是汽车。早期演示时我们考虑过在停车场进行，因为那里的计算机视觉系统可以直接套用。整个操作系统都是复制粘贴的，计算机视觉模块也基本如此——虽然需要重新训练神经网络，但方法论、数据引擎、离线追踪器、空间占据追踪等整套体系都能复用。

It thinks it's a car. For some of the earlier demos, actually, we were talking about potentially doing them outside in the parking lot, because that's where all of the computer vision was working out of the box, instead of inside. But the operating system, everything, just copy-pastes. Computer vision mostly copy-pastes. I mean, you have to retrain the neural nets, but the approach and everything, the data engine, the offline trackers, the way we go about the occupancy tracker and so on, everything copy-pastes; you just need to retrain the neural nets.

Speaker 1

当然运动规划控制需要大幅调整。但特斯拉现有技术可以大量复用，所以如果目标是量产百万台人形机器人，非特斯拉企业会非常吃力，但对特斯拉来说并非天方夜谭。

And then the planning and control, of course, has to change quite a bit. But there's a ton of copy-paste from what's happening at Tesla, and so if you were to set the goal of, okay, let's build a million humanoid robots, and you're not Tesla, that's a lot to ask. If you're Tesla, it's actually not that crazy.

Speaker 0

接下来的问题是：就像自动驾驶一样，机械臂操作任务的难度究竟有多大？是的，这关系到能否实现规模化应用。机器人技术的优势在于——除非涉及精密制造等领域——容错空间比驾驶大得多，毕竟驾驶对安全性和时效性要求都极其严苛。

And then the follow-up question is: just like with driving, how difficult is the manipulation task, such that it can have an impact at scale? Depending on the context, I think the really nice thing about robotics is that, unless you're doing manufacturing and that kind of stuff, there is more room for error. Yep. Driving is so safety critical, and also time critical.

Speaker 0

机器人被允许移动得更慢，这挺好的。

A robot is allowed to move slower, which is nice.

Speaker 1

是的。我认为这会花费很长时间，但你规划开发的方式应该是先承认这一点——这确实需要很长时间。然后思考如何制定产品开发路线图，确保过程中能持续产生收益。避免陷入‘非零即一’的失败模式，即产品要么彻底成功要么完全失败。这种处境必须规避。

Yes. I think it's going to take a long time, but the way you want to structure the development is you need to say, okay, it's going to take a long time. How can I set up the product development roadmap so that I'm making revenue along the way? I'm not setting myself up for a zero one loss function where it doesn't work until it works. You don't want to be in that position.

Speaker 1

你需要让产品几乎立即产生价值，然后逐步扩大部署规模。实现泛化应用。在此过程中建立数据引擎、改进闭环、遥测系统、评估体系和测试框架等基础设施，让产品持续迭代优化，同时保持收益流。这一点至关重要，否则这类大型项目在经济层面和团队激励层面都难以维系——开发者需要过程中的成就感激励。

You want to make it useful almost immediately, and then you want to slowly deploy it at scale and generalize it. And you want to set up your data engine, your improvement loops, the telemetry, the evaluation, the harness and everything, and you want to improve the product incrementally over time while you're making revenue along the way. That's extremely important, because otherwise these large undertakings just don't make sense economically, and also, from the point of view of the team working on it, they need the dopamine along the way.

Speaker 1

不能只是空许承诺说‘十年后产品成熟时将改变世界’。理想状态应该像现在的自动驾驶系统——当下就能提供增强的安全性和驾驶便利性，消费者愿意付费购买并喜爱使用，同时你们又朝着更宏大的使命迈进。

They're not just going to run on a promise that this will be useful, that it's going to change the world in ten years when it works. This is not where you want to be. You want to be in a place like, I think, Autopilot is today, where it's offering increased safety and convenience of driving today. People pay for it, people like it, people purchase it, and then you also have the greater mission that you're working towards.

Speaker 0

嗯。所以你提到的团队成就感，那本身就是幸福感的来源

Mhmm. And you see that, the dopamine for the team, that was a source of happiness?

Speaker 1

没错，百分之百认同。产品投入应用后获得用户喜爱，人们驾驶它、为之付费、主动讨论，YouTube上涌现大量视频。你祖母使用后给出反馈，用户积极参与互动，你自己也在使用它。

Yes. 100%. You're deploying this, people like it, people drive it, people pay for it, they care about it. There are all these YouTube videos. Your grandma drives it, she gives you feedback, people like it, people engage with it, you engage with it.

Speaker 1

影响巨大。

Huge.

Speaker 0

开特斯拉的人会认出你并向你示好吗？比如，嘿，谢谢你们提供了这个很棒的功能。

Do people who drive Teslas recognize you and give you love? Like, hey, thanks for this nice feature that

Speaker 1

确实如此。但棘手的是，有些人真心喜欢你，而有些人却不幸地憎恶你——即便你正在做自认为极具价值和意义的事。确实存在一群人憎恨我、团队乃至整个项目。

it's doing. Yeah. I think the tricky thing is, some people really love you, but unfortunately, even though you're working on something that you think is extremely valuable, useful, etcetera, some people do hate you. There are a lot of people who hate me and the team and the whole project.

Speaker 1

嗯，我认为

Mhmm. I think

Speaker 0

那些人是特斯拉车主吗？

Are they Tesla drivers?

Speaker 1

多数情况下并不是。真的。

In many cases they're not, actually.

Speaker 0

是啊。这让我对人类现状感到悲哀——当前人际互动的方式。但我觉得这是可改变的。人类本质上是想善待彼此的，只是推特和社交媒体放大了负面情绪的传播。

Yeah. That actually makes me sad about humans, or about the current ways that humans interact. I think that's actually fixable. I think humans want to be good to each other. I think Twitter and social media are part of the mechanism that somehow makes the negativity more viral. Mhmm.

Speaker 0

让负面情绪获得了本不应得的、不成比例的病毒式传播。但我真希望人们能抑制一些嫉妒和自我，学会为他人喝彩。这其中有因果报应：你为他人欢呼，他人也会为你欢呼。

It gives the negativity a disproportionate viral boost that it doesn't deserve. Yeah. But I wish people would just suppress some of the jealousy, some of the ego, and get excited for others. And there's a karma aspect to that: you get excited for others, they'll get excited for you.

Speaker 0

学术界也是如此。如果不小心，那里就像一个动力系统。如果你孤立地思考，嫉妒别人的成功，这实际上可能适得其反，导致整个社区和你个人的生产力下降。我觉得如果你持续为他人喝彩，反而会让你更成功。是的。

Same thing in academia. If you're not careful, there is a dynamical system there. If you think in silos and get jealous of somebody else being successful, that actually, perhaps counterintuitively, leads to less productivity for you as a community and for you individually. I feel like if you keep celebrating others, that actually makes you more successful. Yeah.

Speaker 0

我认为在某些行业中，人们还没有完全领悟到这一点。

And I think, depending on the industry, people haven't quite learned that yet.

Speaker 1

没错。有些人非常消极且直言不讳，所以他们特别显眼，但实际上有很多人是默默支持的啦啦队员。当你和世界各地的人交谈时，他们都会告诉你，这太棒了，这很了不起。特别是那些明白让这些东西运作起来有多困难的人，比如那些打造过产品的制造者和企业家。让这些工作并做出改变是极其困难的。

Yep. Some people are also very negative and very vocal, so they're very prominently featured, but actually there's a ton of people who are cheerleaders, but they're silent cheerleaders. And when you talk to people just in the world, they will all tell you, oh, it's amazing, it's great. Especially like people who understand how difficult it is to get this stuff working, like people who have built products and makers and entrepreneurs. Like making this work and changing something is incredibly hard.

Speaker 1

那些人更有可能为你加油打气。

Those people are more likely to cheerlead you.

Speaker 0

让我感到难过的是，机器人社区的一些人没有这样做，而他们本应如此。因为他们知道这有多难。实际上，他们有时并不了解规模化生产产品的难度。对吧？是的。

Well, one of the things that makes me sad is that some folks in the robotics community don't do the cheerleading, and they should, because they know how difficult it is. Well, actually, they sometimes don't know how difficult it is to create a product at scale. Right? Yep.

Speaker 0

并在现实世界中真正部署它。很多机器人和人工智能系统的开发都是在非常特定的小型基准测试上完成的，而不是在现实世界的条件下。

And actually deploy it in the real world. A lot of the development of robots and AI systems is done on very specific, small benchmarks, as opposed to real-world conditions.

Speaker 1

是的。我认为在学术环境中研究机器人技术确实非常困难。

Yes. Yeah. I think it's really hard to work on robotics in academic setting.

Speaker 0

或是应用于现实世界的人工智能系统。你曾长期深耕并钟爱著名的ImageNet数据集，最近又对机器学习学术研究界仍过度推崇ImageNet这类基准测试表达了不满。能否谈谈机器学习研究中使用数据集的优缺点？

Or AI systems that apply in the real world. You flourished with and loved, for a time, the famed ImageNet dataset, and have recently had some words of criticism that the academic ML research community still gives a little too much love to ImageNet or those kinds of benchmarks. Can you speak to the strengths and weaknesses of datasets used in machine learning research?

Speaker 1

实际上，我不记得自己曾对ImageNet表示过不满或批评。我认为ImageNet极具价值，它作为基准测试让深度学习社区证明了深度神经网络确实有效，这点意义重大。虽然ImageNet很有用，但现在它基本上已经变成了另一个MNIST。

Actually, I don't know that I recall a specific instance where I was unhappy or criticizing ImageNet. I think ImageNet has been extremely valuable. It was basically a benchmark that allowed the deep learning community to demonstrate that deep neural networks actually work. There's a massive value in that. So I think ImageNet was useful, but basically it's become a bit of an MNIST at this point.

Speaker 1

MNIST是那种28x28像素的灰度数字数据集，如今就像个笑话，谁都能轻松搞定。

So MNIST is the little 28 by 28 grayscale digits; it's kind of a joke dataset that everyone just crushes.

Speaker 0

但现在仍有关于MNIST的论文发表对吧？

There's still papers written on MNIST though. Right?

Speaker 1

或许本就不该有太多严肃论文。

Maybe there shouldn't be strong papers on it.

Speaker 0

是啊，比如研究如何用少量数据学习之类的论文。

Yeah. Like papers that focus on how we learn with a small amount of data, that kind of stuff.

Speaker 1

没错，这类研究可能仍有帮助，但当然已不属于计算机视觉主流研究范畴了。

Yeah. I could see that being helpful, but not in, like, mainline computer vision research anymore, of course.

Speaker 0

我觉得好像在哪里听过你的观点，可能是我记错了，但你说过ImageNet长期以来对社区做出了巨大贡献，而现在是我们该超越这类数据集的时候了。

I think I've heard you say somewhere, maybe I'm just imagining things, that ImageNet was a huge contribution to the community for a long time and now it's time to move past those kinds of

Speaker 1

确实，ImageNet已经被彻底攻克了。我的意思是，在1000类分类预测中准确率达到了90%左右，我看过那些图像数据，效果真的非常出色。如果没记错的话，现在前五名的错误率大概只有1%左右。

Well, ImageNet has been crushed. I mean, you know, we're getting like 90% accuracy in 1,000-way classification, and I've seen those images; that's really high, that's really good. If I remember correctly, the top-5 error rate is now like 1% or something.
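To make the two numbers mentioned above concrete: top-1 accuracy counts a prediction as correct only if the highest-scoring class is the true label, while top-5 error counts a miss only when the true label is absent from the five highest-scoring classes (out of 1,000 for ImageNet). The following is a minimal pure-Python sketch; the function name `top_k_error` and the toy six-class scores are illustrative assumptions, not data from any real model.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    if not labels or len(scores) != len(labels):
        raise ValueError("need one score row per label, and at least one example")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first; keep the first k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 images, 6 classes (an ImageNet row would have 1,000 scores).
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],  # true label 1 is ranked first
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],  # true label 2 is ranked second
    [0.05, 0.10, 0.10, 0.20, 0.30, 0.25],  # true label 0 is ranked last
]
labels = [1, 2, 0]

print("top-1 error:", top_k_error(scores, labels, k=1))
print("top-5 error:", top_k_error(scores, labels, k=5))
```

On this toy data two of the three examples miss at k=1, but only the last one misses at k=5, which is why top-5 error is always at most top-1 error.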

Speaker 0

基于你处理海量现实数据集的经验，你希望研究界使用的基准测试朝哪些方向发展？

Given your experience with a gigantic real world dataset, would you like to see benchmarks move in certain directions that the research community uses?

Speaker 1

遗憾的是，学术界目前还没有下一个ImageNet。我们已经攻克了MNIST，基本上也征服了ImageNet，但现在整个领域缺乏一个能让大家共同追随并用于网络进一步开发的大型基准测试。

Unfortunately, I don't think academics currently have the next ImageNet. We've obviously crushed MNIST. We've basically kind of crushed ImageNet, and there's no next big benchmark that the entire community rallies behind and uses, you know, for further development of these networks.

Speaker 0

是啊。什么样的数据集才能像病毒般吸引所有人的想象力，让大家全力支持？可能还需要个有号召力的领军人物，对吧？

Yeah. What would it take for a dataset to captivate the imagination of everybody, so that they all get behind it? That could also need, like, a viral leader. Mhmm. Right?

Speaker 0

对，需要有个影响力的人物。为什么ImageNet能成功？这是历史偶然还是必然？

Yeah. Somebody with popularity. I mean, yeah. Why did ImageNet take off? Is there a reason, or is it just an accident of history?

Speaker 1

它的难度恰到好处——既具有挑战性又足够简单有趣，就像是那个时代正好需要这样的数据集。

It was the right amount of difficult. It was difficult, simple, and interesting in the right amounts; it was just the right time for that kind of a dataset.

Speaker 0

来自Reddit的提问。对于合成数据和游戏引擎在神经网络模型开发未来中将扮演的角色，您有何看法？

Question from Reddit. What are your thoughts on the role that synthetic data and game engines will play in the future of neural net model development?

Speaker 1

我认为当神经网络趋近于人类水平时，模拟对神经网络的价值将类似于模拟对人类的价值。人们使用模拟是因为可以在这种系统中学习某些东西，而不必实际经历它。但您指的是

I think as neural nets converge to humans the value of simulation to neural nets will be similar to the value of simulation to humans. So people use simulation because they can learn something in that kind of a system without having to actually experience it. But are you referring to

Speaker 0

我们在脑海中进行的模拟？这不就像是模拟吗？

the simulation we do in our head? Isn't it like simulation?

Speaker 1

抱歉，我所说的模拟是指电子游戏或其他为各行业专业人士提供的模拟形式。

Sorry, by simulation I mean video games, or, you know, other forms of simulation for various professionals.

Speaker 0

那么我要对此提出异议，因为我们可能在脑海中进行的模拟，比如模拟如果我这样做，我认为会发生什么？

So let me push back on that because maybe there's simulation that we do in our heads, like simulate if I do this, what do I think will happen?

Speaker 1

好的。那就像是内部模拟。

Okay. That's like internal simulation.

Speaker 0

是的。内部的。这不正是我们在做的吗？在行动前先假设？

Yeah. Internal. Isn't that what we're doing? Assume it before we act?

Speaker 1

哦，是的。但这与使用模拟的概念是独立的，比如电脑游戏中的模拟，或是为创建训练集而进行的模拟，你知道的

Oh, yeah. But that's independent from, like, the use of simulation in the sense of, like, computer games or using simulation for training set creation or, you know

Speaker 0

是独立的还是只是弱相关？比如，做反事实模拟或边缘情况的模拟不是很有用吗？比如，如果发生核战争会怎样？如果发生诸如此类的事情会怎样？因为

Is it independent, or is it just loosely correlated? Because isn't it useful to do, like, counterfactual or edge-case simulation: you know, what happens if there's a nuclear war? What happens if there's, you know, those kinds of things? Because

Speaker 1

是的。那与虚幻引擎的模拟是不同的。我是这样理解这个问题的。

Yeah. That's a different simulation from, like, Unreal Engine. That that's how I interpreted the question.

Speaker 0

啊，所以，像是平均情况的模拟。那就是虚幻引擎的作用吗？你所说的虚幻引擎是什么意思？所以是模拟一个世界。对，那个世界的物理，为什么那会不同？

Ah, so, like, simulation of the average case. Is that what Unreal Engine is? What do you mean by Unreal Engine? So, simulating a world, yeah, the physics of that world. Why is that different?

Speaker 0

因为你也可以给那个世界添加行为。嗯。你可以尝试各种东西。对吧？嗯。

Like, because you also can add behavior to that world. Mhmm. And you could try all kinds of stuff. Right? Mhmm.

Speaker 0

你可以往里面加入各种奇怪的东西。因为虚幻引擎不仅仅是模拟——我是说，我想它确实是在模拟世界的物理。但它也在用这个做些什么。

You could throw all kinds of weird things into it. Because Unreal Engine is not just about simulating. I mean, I guess it is about simulating the physics of the world, but it's also doing something with that.

Speaker 1

是的。图形、物理，以及你放入环境中的代理等等。对。

Yeah. The graphics, the physics, and the agents that you put into the environment and stuff like that. Yeah.

Speaker 0

你看，我觉得你似乎在说这对AI未来发展没那么重要，我猜可以这么理解吧？

See, I feel like you said that it's not that important, I guess, for the future of AI development. Is that correct, to interpret

Speaker 1

我认为人类会使用模拟器，觉得它们有用，所以计算机也会使用模拟器并发现其价值。

I think humans use simulators, and they find them useful, and so computers will use simulators and find them useful.

Speaker 0

好吧，所以你的意思是...我个人不常用模拟器。偶尔玩电子游戏，但我不认为从中获得了关于自身存在的洞见。那只是暂时逃离现实，而非认识现实的智慧源泉。这么说来，模拟其实是...

Okay, so you're saying it's not. I don't use simulators very often. I play a video game every once in a while, but I don't think I derive any wisdom about my own existence from those video games. It's a momentary escape from reality rather than a source of wisdom about reality. So I think that's a very polite way of saying simulation is

Speaker 1

没那么有用。确实可能不是。我不认为这是当前训练神经网络的基础核心部分。但随着神经网络越来越强大，训练新行为所需的样本会减少。当然模拟存在领域差异——它并非真实世界。但神经网络越强大，能承受的领域差异就越大，因为它能理解：即便不是现实，这些高层次结构仍值得学习。

not that useful. Yeah, maybe not. I don't see it as a fundamental, really important part of training neural nets currently, but I think as neural nets become more and more powerful, you will need fewer examples to train additional behaviors. And with simulation, of course, there's a domain gap: a simulation is not the real world, it's something slightly different. But with a powerful enough neural net, I think the domain gap can be bigger, because the neural net will understand that even though it's not the real world, it has all this high-level structure that it's supposed to be able to learn from.

Speaker 0

所以神经网络实际上能更好地利用合成数据

So the neural net will actually yeah. It will be able to leverage the synthetic data better

Speaker 1

是的。

Yes.

Speaker 0

通过辨别并更深入理解哪些方面不真实

By closing the gap and getting a better understanding of the ways in which this is not real

Speaker 1

数据。确实如此。

data. Exactly.

Speaker 0

对。我下次会提更好的问题。那是个问题，但我只是开玩笑。好吧。那么，说到MNIST，你认为是否有可能构建需要极少数据的神经网络和训练过程？

Right. I'll do better questions next time. That was a question, but I'm just kidding. Alright. So, speaking of MNIST, do you think it's possible to construct neural nets and training processes that require very little data?

Speaker 0

我们一直在讨论像互联网这样庞大的训练数据集。我的意思是，就像你说的，查询本身可以视为另一种训练层次，这需要少量数据。是的。但你是否认为研究如何用极少数据训练并构建知识库这一方向有价值？

So we've been talking about huge datasets like the Internet for training. I mean, one way to say it is, like you said, the querying itself is another level of training, I guess, and that requires very little data. Yep. But do you see any value in doing research and going down the direction of: can we use very little data to train, to construct a knowledge base?

Speaker 1

百分之百。我只是认为在某个阶段你需要海量数据集，当你预训练好庞大的神经网络并得到类似GPT的东西后，就能高效地训练任意新任务。许多这类GPT模型仅需极少量示例提示就能完成情感分析、翻译等任务——比如给出输入句子和对应的德语翻译样例。

100%. I just think at some point you need a massive dataset, and then when you pretrain your massive neural net and get something like a GPT, you're able to be very efficient at training any arbitrary new task. With a lot of these GPTs, you can do tasks like sentiment analysis or translation just by prompting with very few examples: here's the kind of thing I want you to do, here's an input sentence, here's the translation into German. Input sentence, translation to German.

Speaker 1

输入句子，留空处，神经网络仅通过你提供的示例就能补全德语翻译。这是在神经网络激活层面而非权重层面的少样本学习范例。我认为就像人类一样，神经网络最终会变得非常高效于学习新任务——但前提是需要海量数据完成预训练。

Input sentence, blank, and the neural net will complete the translation to German just by looking at the examples you've provided. So that's an example of few-shot learning in the activations of the neural net instead of the weights of the neural net. And so I think, just like humans, neural nets will become very data efficient at learning any new task. But at some point you need a massive dataset to pretrain your network.
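The prompt layout described above (a few input/translation pairs followed by an input with a blank) can be sketched as plain string construction. This is a minimal illustration under stated assumptions: the `English:`/`German:` labels and the name `build_few_shot_prompt` are hypothetical, and the call to an actual pretrained model is deliberately left out.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Write (input, output) demonstrations, then the query with its output left blank.

    The task is specified only through the text, so a pretrained model has to pick
    it up in its activations at inference time; no weights are updated.
    """
    lines: List[str] = []
    for source, target in examples:
        lines.append(f"English: {source}")
        lines.append(f"German: {target}")
    lines.append(f"English: {query}")
    lines.append("German:")  # the "blank" the model is expected to complete
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    [("Good morning.", "Guten Morgen."), ("Thank you.", "Danke.")],
    "Good night.",
)
print(prompt)
```

The design choice worth noting is that nothing here is task-specific code: swapping the example pairs for (review, sentiment) pairs would turn the same function into a sentiment-analysis prompt.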

Speaker 0

你明白吗？可能我们人类也有类似机制。我们是否拥有这种持续在后台以自监督方式运行的潜意识建模系统？只是我们自己没意识到？

Do you get that? And probably we humans have something like that. Do we have something like that? Do we have a passive background model-constructing thing that just runs all the time in a self-supervised way, and we're not conscious of it?

Speaker 1

我认为人类显然具备。我们一生中学习很多，但进化赋予的初始硬件配置也至关重要。这个领域很多人只计算人类存活的秒数，假设我们是白板状态——就像神经网络的零初始化，但事实并非如此。看看斑马等动物就能明白。

I think humans definitely do. I mean, obviously we learn a lot during our lifespan, but we also have a ton of hardware that helps us at initialization, coming from evolution. And so I think that's also a really big component. A lot of people in the field, I think, just talk about the number of seconds that a person has lived, pretending that this is a tabula rasa, sort of like a zero initialization of a neural net, and it's not. You can look at a lot of animals, for example zebras.

Speaker 1

斑马出生后就能看见并奔跑。它们一生中没有任何训练数据，天生就会这些。所以不知为何，进化竟然找到了方法，将这些极其优秀的算法和神经网络初始化编码进ATCG序列中，而我

Zebras get born and they can see and they can run. There's zero training data in their lifespan; they can just do that. So somehow, I have no idea how, evolution has found a way to encode these algorithms and these extremely good neural net initializations into ATCGs, and I

Speaker 0

完全不明白这其中的原理，但显然这是可能的，因为存在即证明。从单细胞发育成生命最初几年的有机体，这个过程充满魔力。我有点喜欢这个观点：我们不记得生命最初几年的事情，是因为那是个非常痛苦的过程。就像，那是个极其困难的训练阶段。确实。

have no idea how this works, but apparently it's possible because here's a proof by existence. There's something magical about going from a single cell to an organism that is born to the first few years of life. I kind of like the idea that the reason we don't remember anything about the first few years of our life is that it's a really painful process. Like, it's a very difficult challenging training process. Yeah.

Speaker 0

从智力层面来说。或许吧。我是说，为什么我们完全不记得那些？可能当时正在进行某种疯狂训练，也许那就是非常痛苦的背景模型训练过程。所以系统一旦训练完成，最好忘记构建过程。

Like, intellectually. And maybe, yeah. I mean, why don't we remember any of that? There might be some crazy training going on, and maybe that's the background model training that is very painful. And so it's best for the system, once it's trained, not to remember how it was constructed.

Speaker 1

我认为就像长期记忆的硬件还没完全发育好。我觉得婴儿最初几年其实不是在学习，而是大脑在成熟。我们出生时是早产状态，有理论认为这是由于产道和大脑体积限制。所以我们早产后，最初几年只是大脑在发育，之后才开始真正学习。这是我目前的观点。

I think it's just that the hardware for long-term memory is not fully developed. I kind of feel like the first few years of infancy are not actually learning; it's the brain maturing. We're born premature, and there's a theory along those lines, because of the birth canal and the size of the brain. And so we're born premature, and then in the first few years the brain is just maturing, and then there's some learning eventually. That's my current view on it.

Speaker 0

你认为神经网络能拥有类似人类的长期记忆吗？你觉得是否需要在其之上构建另一个元架构，比如添加学习世界事实的知识库这类东西？

What do you think: can neural nets have long-term memory that approaches something like humans'? Do you think there needs to be another meta-architecture on top of it to add something like a knowledge base that learns facts about the world and all that kind of stuff?

Speaker 1

是的，但不确定会以何种形式显式构建。可能会以非直观形式出现，比如你告诉GPT：'你有个声明性记忆库可以存取数据，遇到有用信息就存进去。这是检索范例，这是调用方式'——用英语文本教导它，它可能就学会使用记忆库了。

Yes, but I don't know to what extent it will be explicitly constructed. It might take unintuitive forms where you are telling the GPT, hey, you have a declarative memory bank to which you can store and from which you can retrieve data, and whenever you encounter some information that you find useful, just save it to your memory bank. And here's an example of something you have retrieved, and here's how you save it, and here's how you load from it: you just say load whatever. You teach it in text, in English, and then it might learn to use a memory bank from that.
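One way to picture the memory bank idea above is from the host program's side: the model writes save/load commands as ordinary text, and the host executes them and feeds replies back into the context. The sketch below is hypothetical throughout: the `SAVE[key]: value` / `LOAD[key]` syntax and the `MemoryBank` class are assumptions for illustration, not an existing API or anything described in the conversation beyond the general idea.

```python
import re
from typing import Dict, List

# Hypothetical command syntax, assumed for illustration only:
#   SAVE[key]: some text   -> store "some text" under key
#   LOAD[key]              -> host replies with the stored text
SAVE_RE = re.compile(r"^SAVE\[(?P<key>[^\]]+)\]:\s*(?P<value>.*)$")
LOAD_RE = re.compile(r"^LOAD\[(?P<key>[^\]]+)\]$")


class MemoryBank:
    """Key-value store a host program exposes to a language model through text commands."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def handle(self, model_output: str) -> List[str]:
        """Run any SAVE/LOAD lines in the model's output; return replies to append to its context."""
        replies: List[str] = []
        for line in model_output.splitlines():
            line = line.strip()
            if (m := SAVE_RE.match(line)):
                self._store[m["key"]] = m["value"]
                replies.append(f"SAVED[{m['key']}]")
            elif (m := LOAD_RE.match(line)):
                value = self._store.get(m["key"], "<not found>")
                replies.append(f"LOADED[{m['key']}]: {value}")
        return replies  # ordinary prose lines produce no reply


bank = MemoryBank()
print(bank.handle("SAVE[user_city]: Palo Alto"))
print(bank.handle("Let me check my notes.\nLOAD[user_city]"))
```

The "teach it in English" part would live in the prompt that explains this syntax to the model; the host side stays this simple.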

Speaker 0

哦，所以神经网络是背景模型的底层架构，其他所有功能都构建在它之上。

Oh so the neural net is the architecture for the background model, base thing and then everything else is just on top of it.

Speaker 1

这不仅仅是文本对吧？你给它配备了各种小工具和小装置，所以你在教它某种特殊语言，让它能够存储任意信息并在之后检索。你还在告诉它这些特殊标记以及如何排列它们来使用这些接口。就像说，嘿，你可以用计算器，这是使用方法。只要输入“53加41等于”，当出现等号时，计算器就会给出答案，你不需要自己计算。

It's not just text, right? You're giving it gadgets and gizmos, so you're teaching it some kind of special language by which it can save arbitrary information and retrieve it at a later time. And you're telling it about these special tokens and how to arrange them to use these interfaces. It's like, hey, you can use a calculator, here's how you use it. Just do 53 plus 41 equals, and when the equals sign is there, a calculator will actually read out the answer and you don't have to calculate it yourself.

Speaker 1

而你只需要用英语告诉它。这方法可能真的可行。

And you just, like, tell it in English. This might actually work.
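The calculator example above can be sketched as a loop that watches the generated text: whenever it ends in an arithmetic expression followed by "=", the host computes the result and splices it in, so the model never does the arithmetic itself. This is a minimal sketch under assumptions: the model is simulated by a fixed list of tokens, and the trigger pattern and `run_with_calculator` name are hypothetical.

```python
import operator
import re
from typing import Iterable

# Hypothetical convention: when generated text ends in "<int> <op> <int> =",
# the host computes the result and splices it into the text.
EXPR_AT_END = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=$")
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run_with_calculator(tokens: Iterable[str]) -> str:
    """Concatenate model tokens, filling in a calculator result after each trailing '='."""
    text = ""
    for token in tokens:
        text += token
        match = EXPR_AT_END.search(text)
        if match:
            a, op, b = match.groups()
            text += " " + str(OPS[op](int(a), int(b)))
    return text


# Stand-in for a model's output stream; a real system would sample these tokens.
print(run_with_calculator(["The total is ", "53", " + ", "41", " =", " apples."]))
```

Using an explicit operator table instead of `eval` keeps the host from executing arbitrary text produced by the model.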

Speaker 0

从这个角度看，你觉得DeepMind的Gato系统有趣吗？它不仅仅是语言模型，而是把所有东西都混在一起处理？图像、动作等等，这基本上就是我们未来的发展方向？

Do you think, in that sense, Gato is interesting, the DeepMind system that's not just language but actually throws it all in the same pile? Images, actions, all that kind of stuff; is that basically what we're moving towards?

Speaker 1

是的，我也这么想。Gato本质上就是个'大杂烩'式的强化学习方法，用同一个固定的Transformer模型应对多种不同环境，对吧？我认为这是该领域非常早期的成果，但确实符合我对未来技术形态的预期。

Yeah, I think so. So Gato is very much a kitchen-sink approach to reinforcement learning in lots of different environments with a single fixed transformer model, right? I think it's a very early result in that realm, but yeah, it's along the lines of what I think things will eventually look like.

Speaker 0

没错。所以这是未来系统的雏形阶段，从理查德·萨顿（Rich Sutton）的视角来看，最终会发展成这样。是的。我...

Right. So these are the early days of a system that will eventually look like this, from a Rich Sutton perspective. Yeah. I'm not

Speaker 1

我个人不太喜欢那些看起来完全不同的接口。我希望所有东西都能标准化到同一个API里。比如说屏幕像素...

a super huge fan of all these interfaces that look very different. I would want everything to be normalized into the same API. So, for example, screen pixels

Speaker 0

嗯。

Mhmm.

Speaker 1

完全相同的API。与其拥有各种物理规则、关节配置、外观等截然不同的世界环境，再通过特殊标记来适配不同游戏，我更倾向于将所有内容标准化为单一接口，这样对神经网络来说看起来都是一样的，如果这样讲得通的话。

The very same API. Instead of having different world environments that have very different physics and joint configurations and appearances and whatever, with some kind of special tokens for different games that you can plug in, I'd rather just normalize everything to a single interface so it looks the same to the neural net, if that makes sense.
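The "single interface" idea above can be illustrated with a typing `Protocol`: every environment, whatever its internal physics, returns screen pixels and accepts the same kind of key-press action. This is a hypothetical sketch; `PixelEnv`, `Action`, and the toy `ToyDotEnv` world are invented for illustration and are not how Gato or any Tesla system is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple

# A frame is H rows x W columns of (R, G, B) pixels with values 0..255.
Frame = List[List[Tuple[int, int, int]]]


@dataclass
class Action:
    """Keyboard-style action shared by every environment (hypothetical schema)."""
    keys: List[str] = field(default_factory=list)


class PixelEnv(Protocol):
    """Single interface: pixels out, key presses in, whatever the underlying world is."""

    def reset(self) -> Frame: ...

    def step(self, action: Action) -> Frame: ...


class ToyDotEnv:
    """Toy world: one white dot on a black grid, moved with arrow keys."""

    MOVES = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}

    def __init__(self, height: int = 4, width: int = 4) -> None:
        self.height, self.width = height, width
        self.pos = (0, 0)

    def _render(self) -> Frame:
        frame = [[(0, 0, 0)] * self.width for _ in range(self.height)]
        r, c = self.pos
        frame[r][c] = (255, 255, 255)
        return frame

    def reset(self) -> Frame:
        self.pos = (0, 0)
        return self._render()

    def step(self, action: Action) -> Frame:
        r, c = self.pos
        for key in action.keys:
            dr, dc = self.MOVES.get(key, (0, 0))  # unknown keys do nothing
            # Clamp so the dot stays on the grid.
            r = min(max(r + dr, 0), self.height - 1)
            c = min(max(c + dc, 0), self.width - 1)
        self.pos = (r, c)
        return self._render()


env: PixelEnv = ToyDotEnv()
first_frame = env.reset()
```

A policy written against `PixelEnv` never needs to know which world it is in; a pixel-based Pong would just be another class with the same two methods.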

Speaker 0

所以最终都会变成基于像素的乒乓球游戏。我想是的。好的。让我问问你的个人生活。很多人都想知道，作为AI史上最高产最杰出的人物之一...

So it's all going to be pixel based Pong in the end. I think so. Okay. Let me ask you about your own personal life. A lot of people want to know you're one of the most productive and brilliant people in the history of AI.

Speaker 0

安德烈·卡帕西的高效一天是怎样的？你几点起床？因为可以想象在平均高效日与完美高效日之间存在某种舞蹈关系。完美高效日是我们努力的方向，而平均高效日则是在各种失误和人为因素影响下最终趋近的状态。

What does a productive day in the life of Andrej Karpathy look like? What time do you wake up? I imagine there's some kind of dance between the average productive day and a perfect productive day. The perfect productive day is the thing we strive towards, and the average is kind of what it converges to, given all the mistakes and human eventualities and so on. Yep.

Speaker 0

所以你具体几点起床？你是晨型人吗？我不是...

So what time do you wake up? Are you a morning person? I'm not a

Speaker 1

晨型人。我绝对是夜猫子。

morning person. I'm a night owl for sure.

Speaker 0

嗯。这个作息稳定吗？算是半稳定吧，比如...

Mhmm. Is it stable or not? It's semi stable, like

Speaker 1

八九点左右。读博时更晚，我通常凌晨3点才睡。我认为凌晨时段非常珍贵，是工作的黄金时间，因为所有人都睡着了。早上七八点东海岸就醒了，会有短信、新闻网站更新等各种干扰。但凌晨三点万籁俱寂，没人打扰，可以心无旁骛地长时间工作。

eight or nine or something like that. During my PhD it was even later; I used to go to sleep usually at 3AM. I think the AM hours are precious and a very interesting time to work, because everyone is asleep. At 7AM or 8AM the East Coast is awake, so there's already activity, there are already some text messages, whatever; there's stuff happening, you can go on some news website and there's stuff happening and distracting. At 3AM everything is totally quiet, and so you're not going to be bothered and you have solid chunks of time to do work.

Speaker 1

所以我喜欢那些时段，天生就是夜猫子。我认为高效时间的关键在于，你需要在不被过多干扰的情况下对问题建立一些动力，需要把这个问题加载到你的“内存”，也就是工作记忆中。然后你需要对它着迷：洗澡时、入睡时都要想着这个问题，让它完全占据你的记忆，这样你就能随时醒来继续投入工作。

So I like those periods; I'm a night owl by default. And then, for productive time, basically what I like to do is build some momentum on the problem without too much distraction, and you need to load your RAM, your working memory, with that problem. And then you need to be obsessed with it: when you're taking a shower, when you're falling asleep, you need to be obsessed with the problem, and it's fully in your memory, and you're ready to wake up and work on it right there.

Speaker 0

那么这个时间尺度是按单日计算，还是几天或一周？所以...

So is this on the temporal scale of a single day, or a couple of days, a week?

Speaker 1

我无法孤立地谈论单日效率，因为这是个完整过程。当我想高效解决问题时，通常需要连续几天沉浸其中，不受干扰地完全痴迷于那个问题。可以说，我最出色的工作都是在这种状态下完成的。

I can't talk about one day in isolation, because it's a whole process. When I want to get productive on a problem, I feel like I need a span of a few days where I can really get into that problem, and I don't want to be interrupted, and I'm going to be completely obsessed with that problem. That's where I do most of my good work, I would say.

Speaker 0

你在极短时间内完成了很多很酷的小项目，这需要你完全专注其中。

You've done a bunch of cool little projects in a very short amount of time very quickly, so that that requires you just focusing on it.

Speaker 1

没错。本质上需要将问题加载到工作记忆里，而且必须高效——因为解决任何问题都存在巨大固定成本。比如我在特斯拉时就为此困扰，想搞些小项目，但首先得解决：SSH连接集群、启动VS Code编辑器之类。总会因为某些原因遇到愚蠢错误，根本无法立即投入工作。

Yeah. Basically, I need to load my working memory with the problem, and I need to be productive, because there's always a huge fixed cost to approaching any problem. I was struggling with this at Tesla, for example, because I wanted to work on small side projects, but first you need to figure out: okay, I need to SSH into my cluster, I need to bring up a VS Code editor so I can work on this. Then I run into some stupid error for some reason. You're not at a point where you can just be productive right away.

Speaker 1

你不断遭遇障碍，关键是要扫清所有障碍，才能深入问题核心，让问题完整地驻留在记忆中。

You are facing barriers, so it's about really removing all of those barriers so that you're able to go into the problem with the full problem loaded in your memory.

Speaker 0

同时还要避开各种形式的干扰——新闻邮件也好，其他有趣项目的诱惑也罢，无论是过去还是手头在做的。你只想让思维保持绝对聚焦。

And somehow avoiding distractions of all different forms, like news stories and emails, but also distractions from other interesting projects that you previously worked on or are currently working on, and so on. You just want to really focus your mind.

Speaker 1

我是说，可以偶尔抽时间放松一下，但不能太多。你知道，一天大部分时间都花在那个问题上，然后我会喝咖啡，有我的晨间例行公事，看看新闻、推特、黑客新闻、华尔街日报等等。

I mean, you can take some time off for distractions in between, but I think it can't be too much. Most of your day is spent on that problem. And then, you know, I drink coffee, I have my morning routine, I look at some news: Twitter, Hacker News, the Wall Street Journal, etc.

Speaker 0

所以你基本上是起床后喝点咖啡，是尽快投入工作，还是先了解世界发生了什么？

So basically, you wake up, you have some coffee. Are you trying to get to work as quickly as possible, or do you take in this diet of what the hell is happening in the world first?

Speaker 1

我确实觉得了解世界很有趣，虽然不确定这是否有益，但目前这是我日常的一部分。我会浏览一堆新闻文章，想保持信息灵通，但对此持怀疑态度。我怀疑这种做法，但目前就是这样。

I do find it interesting to know about the world. I don't know that it's useful or good, but it's part of my routine right now, so I read through a bunch of news articles. I want to be informed, but I'm suspicious of it. I'm suspicious of the practice, but currently that's where I am.

Speaker 0

哦，你是怀疑这种习惯对你生产力和幸福感的积极影响吗？

Oh, you mean suspicious about the positive effect of that practice on your productivity and your well-being as well?

Speaker 1

心理上的幸福感。是的。

My well-being psychologically. Yeah.

Speaker 0

还有你深入理解世界的能力，因为信息来源众多，你并没有真正专注于深度整合。

And also on your ability to deeply understand the world, because with so many sources of information, you're not really focused on deeply integrating them.

Speaker 1

是的。有点分散注意力。

Yeah. It's a little distracting.

Speaker 0

没错。就一个完全高效的工作日而言，你单次尝试专注处理一件事的连续时长是多久？是几小时？一小时？还是三十分钟？

Yep. In terms of a perfectly productive day, how long a stretch do you try to work and focus on one thing in a single session? Is it a couple of hours? One hour? Thirty minutes?

Speaker 0

十分钟？

Is it ten minutes?

Speaker 1

我大概能连续工作几小时，然后需要间歇休息吃点东西什么的。但即便如此，累计有效工作时间仍然很难。我曾用追踪器记录每天实际编码时长，即便在高效日，也只有六到八小时。因为生活中有太多琐事——通勤、与人交谈、进食等等，这些都是生存成本。

I can probably go a few hours, and then I need some breaks in between for food and stuff. But I think it's still really hard to accumulate hours. I was using a tracker that told me exactly how much time I spent coding on any one day, and even on a very productive day, I still spent only about six or eight hours. It's just because there's so much padding: commute, talking to people, food, etc. There's a cost of life.

Speaker 1

仅仅维持人类基本生存需求与生理平衡就需要耗费大量精力。

The cost of just living, sustaining yourself, homeostasis, maintaining yourself as a human, is very high.

Speaker 0

而且人类思维中似乎存在参与社交的渴望，正是这种欲望制造了时间空隙。是的。因为我...我最高效的日子都是从早到晚彻底屏蔽所有外界干扰。

And there seems to be a desire within the human mind to participate in society that creates that padding. Because the most productive days I've ever had were just, from start to finish, completely tuning out everything

Speaker 1

确实。

Yep.

Speaker 0

就只是坐在那里。那样的话就能突破六到八小时的限制。关于如何获得长时间专注工作的耐力，有什么经验之谈吗？

And just sitting there. And then you could do more than six or eight hours. Yeah. Is there some wisdom about what gives you the strength to do tough days of long focus?

Speaker 1

是的。就像每当我痴迷于某个问题时，总需要有个东西能运作起来，有个东西必须存在。

Yeah. Whenever I get obsessed with a problem, something just needs to work, something just needs to exist.

Speaker 0

它必须存在，这样你才能处理各种漏洞、编程问题、技术难题以及那些事后证明错误的设计决策。正因为你想让那个东西存在，你才能把所有这些问题想清楚。

It needs to exist, so you're able to deal with bugs, programming issues, technical issues, and design decisions that turn out to be wrong. You're able to think through all of that, given that you want the thing to exist.

Speaker 1

没错。它必须存在，而对我来说另一个重要因素是——其他人会欣赏它吗？他们会喜欢吗？嗯。这是我很大一部分动力来源。

Yeah. It needs to exist, and to me another big factor is: are other humans going to appreciate it? Are they going to like it? Mhmm. That's a big part of my motivation.

Speaker 1

如果我能帮助他人并让他们感到快乐，听到他们的赞美，看到他们在推特上分享等等，这会让我感到愉悦，因为我在做有意义的事。

If I'm helping humans and they seem happy, they say nice things, they tweet about it or whatever, that gives me pleasure because I'm doing something useful.

Speaker 0

所以你确实考虑过与世界分享这些成果？比如通过GitHub、博客文章或是视频的形式？

So you do see yourself sharing it with the world, whether it's on GitHub, in blog posts, or through videos?

Speaker 1

对，我思考过这个问题。假设我完成了所有工作却不分享，我不认为自己能保持同样高涨的创作动力。

Yeah, I was thinking about it. Suppose I did all these things but did not share them; I don't think I would be able to build up the same amount of motivation.

Speaker 0

你享受他人从你创造的事物中获得价值和快乐的感觉。

You enjoy the feeling of other people gaining value and happiness from the stuff you've created.

Speaker 1

是的。

Yeah.

Speaker 0

饮食方面呢？我看到你尝试过间歇性断食。你断食。这对一切有帮助吗？

What about diet? I saw you played with intermittent fasting. You fast. Does that help with everything?

Speaker 0

你尝试过各种方法，什么对你的心智专注力、精神生产力以及幸福感最有帮助？你现在还断食吗？

You've played with things. What's been most beneficial to your ability to mentally focus on a thing, and to mental productivity and happiness? Do you still fast?

Speaker 1

是的。我仍然断食，但做的是间歇性断食，实际上最终意味着我不吃早餐。在稳定状态下，我默认采用18:6模式，即只在中午12点到下午6点进食。这不是硬性规定，我经常打破，但这是我的默认模式。

Yeah. I still fast, but I do intermittent fasting, and what it really means at the end of the day is that I skip breakfast. So I do 18:6 roughly by default when I'm in my steady state. If I'm traveling or doing something else, I will break the rules, but in my steady state I do 18:6, so I eat only from 12 to 6. It's not a hard rule and I break it often, but that's my default.
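
As a quick arithmetic check of the "18:6" label: eating only between noon and 6 PM leaves a 6-hour eating window and therefore an 18-hour fast. A minimal Python sketch; `fasting_split` is a hypothetical helper, and it assumes the eating window does not cross midnight.

```python
def fasting_split(eat_start: int, eat_end: int) -> tuple[int, int]:
    """Return (fasting_hours, eating_hours) for a daily eating window.

    eat_start and eat_end are hours on a 24-hour clock, e.g. 12 for noon
    and 18 for 6 PM. Assumes the window does not cross midnight.
    """
    if not 0 <= eat_start < eat_end <= 24:
        raise ValueError("expected 0 <= eat_start < eat_end <= 24")
    eating = eat_end - eat_start
    return 24 - eating, eating


# Noon to 6 PM, the default window mentioned in the conversation.
fasting, eating = fasting_split(12, 18)
print(f"{fasting}:{eating}")  # prints 18:6
```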

Speaker 1

此外，我做过许多随机实验，过去一年半左右的时间里，我主要是植物性饮食或以植物为主。我听到‘植物为主’这个说法，听起来更好。

And I've done a bunch of random experiments. For the most part, where I've been for the last year and a half, I want to say, is plant-based or plant-forward. I heard "plant-forward"; it sounds better.

Speaker 0

具体是什么意思？

What does that mean exactly?

Speaker 1

其实我不知道两者有什么区别，但‘植物为主’听起来更顺耳。意思就是我更偏好植物性食物。

I didn't actually know what the difference is, but it sounds better in my mind. It just means that I prefer plant-based food.

Speaker 0

生的还是熟的？

Raw or cooked or...?

Speaker 1

我偏好熟的，以植物为主。

I prefer cooked and plant-based.

Speaker 0

那么植物性饮食……请原谅，我其实不太清楚‘植物’这个范畴具体涵盖多广。

So, plant-based... forgive me, I don't actually know how wide the category of plants is.

Speaker 1

植物基饮食意味着你不必严格拘泥，可以灵活调整，只是更倾向于吃植物性食物，并且不会试图影响他人。比如去别人家参加派对，主人自豪地端上牛排，你还是会吃的。对吧？

Well, plant-based just means that you're not strict about it: you can flex, you just prefer to eat plants, and you're not trying to influence other people. If you come to someone's house party and they serve you a steak that they're really proud of, you will eat it. Yes. Right.

Speaker 0

这种不评判的态度很棒。虽然我持相反立场，但我也很灵活。你试过一日一餐吗？

It's just not judgmental. That's beautiful. I'm on the flip side of that, but I'm very flexible too. Have you tried doing one meal a day?

Speaker 1

我偶尔无意中这样过，但不是常态。我不喜欢这种方式，感觉身体不适，冲击太大了。

I have, accidentally, not consistently. I don't like it. I think it makes me feel not good. It's too much of a hit. Yeah.

Speaker 1

所以目前我保持每日两餐，分别在中午12点和下午6点。

And so currently, I have about two meals a day, at 12 and 6.

Speaker 0

我一直这样，现在也在坚持：一日一餐。

I do that nonstop. I'm doing it now: one meal a day.

Speaker 1

明白了。

Okay.

Speaker 0

这很有趣。这是一种有趣的感受。你有没有试过禁食超过一天？

It's interesting. It's an interesting feeling. Have you ever fasted longer than a day?

Speaker 1

有。我做过好几次清水断食，因为好奇会发生什么。

Yeah. I've done a bunch of water fasts because I was curious what happens.

Speaker 0

发生了什么？有什么有趣的事吗？

What happened? Anything interesting?

Speaker 1

我觉得有。你知道有意思的是，前两天你会饿，但到了第三天左右，就不觉得饿了。这种感觉特别奇怪，因为你几天没吃东西却不觉得饿。

I would say so. I mean, you know, what's interesting is that you're hungry for two days, and then starting day three or so, you're not hungry. It's like such a weird feeling because you haven't eaten in a few days and you're not hungry.

Speaker 0

是不是很诡异？

Isn't that weird?

Speaker 1

确实很诡异。

It's really weird.

Speaker 0

人类生理的众多奇妙现象之一。它总能找到解决方法，要么发掘其他能量来源，要么让身体系统放松下来。具体机制我也不太清楚

One of the many weird things about human biology. It figures something out and finds another source of energy or something like that, or relaxes the system. I don't know how that works.

Speaker 1

身体就像这样，你饿了，你饿了，然后它就放弃了。就像在说，好吧，看来我们现在要禁食了。什么都没有了。然后它就专注于让你不再感到饥饿，你知道的，不让你感受到那种伤害，并试图给你一些空间来解决食物问题。

The body is like, you're hungry, you're hungry, and then it just gives up. It's like, okay, I guess we're fasting now, there's nothing. And then it kind of focuses on making you not hungry, so you don't feel the damage of that, and on giving you some space to figure out the food situation.

Speaker 0

所以直到今天，你还是在晚上最有工作效率吗？

So are you still to this day most productive at night?

Speaker 1

我会说是的，但真的很难维持我的博士作息，特别是当我在特斯拉等工作时，这根本行不通。即使是现在，你知道，人们想要在各种活动中见面，社会生活在某个时间段，你多少得适应着工作。

I would say I am, but it's really hard to maintain my PhD schedule. Especially when I was working at Tesla and so on, it was a non-starter. But even now, people want to meet for various events; society lives in a certain period of time, and you sort of have to work within it.

Speaker 0

所以很难在社交活动之后，再回去工作。

So it's hard to do a social thing and then return and do work after that.

Speaker 1

是的，这真的很难。

Yeah. It's just really hard.

Speaker 0

这就是为什么我在社交时尽量不喝太多酒，这样我就能回去继续工作。但在特斯拉，或者任何公司，作息时间会趋向一致吗？还是说人类在合作时就会这样？我需要了解一下这个。是的。

That's why, when I do social things, I try not to do too much drinking, so I can return and continue doing work. But at Tesla, or any company, is there a convergence toward a schedule? Is that how humans behave when they collaborate? I need to learn about this.

Speaker 0

他们会尝试保持一个一致的作息时间，让大家在同一时间保持清醒吗？

Do they try to keep a consistent schedule where you're all awake at the same time?

Speaker 1

我是说，我确实尝试建立一种日常规律，努力创造一个让我感到舒适的稳定状态。所以我有一套晨间惯例，一套日间惯例，我尽量让一切保持稳定可预测，这样身体就会自然而然地适应。如果你过度打破这种状态，比如旅行时遭遇时差，就很难达到应有的状态。

I mean, I do try to create a routine, a steady state that I'm comfortable in. So I have a morning routine and a day routine; I try to keep things in a steady state so they're predictable, and then your body just sticks to that. If you stress that a little too much, for example when you're traveling and dealing with jet lag, you're not able to really ascend to where you need to go.

Speaker 0

是啊是啊，这就是人类养成习惯的意义。你对人类一生中工作与生活平衡有什么看法？特斯拉某种程度上以将人们推向能力极限著称——无论是他们能做的事、试图做的事，还是工作强度这类事情。

Yeah. That's what humans do with habits and stuff. What are your thoughts on work-life balance throughout a human lifetime? Tesla was in part known for pushing people to their limits, in terms of what they're able to do, what they're trying to do, how much they work, all that kind of stuff.

Speaker 1

确实。我觉得特斯拉在这方面背负了过多恶名，因为实际情况是特斯拉属于爆发式工作环境。以我在谷歌实习三次的经历为参照——包括在Google DeepMind的见闻——我认为特斯拉的基础强度确实更高，但属于间断平衡状态：平时还好，偶尔遇到紧急情况时大家会拼命工作。这些爆发期的事例被集中传播后，就造成了误解。

Yeah. I will say Tesla gets a bit too much of a bad rep for this, because what's happening is that Tesla is a bursty environment. My only point of reference is Google, where I've interned three times, and I saw what it's like inside Google and DeepMind. I would say the baseline is higher than that, but then there's a punctuated equilibrium where once in a while there's a fire and people work really hard. So it's spiky and bursty, and then all the stories get collected.

Speaker 1

关于那些爆发期，没错。这让外界觉得这里完全疯狂，实际上只是环境更紧张些，会有需要冲刺的紧急任务。所以我确实认为这里比谷歌之类的环境更激烈。

About the bursts, yeah. That gives the appearance of total insanity, but actually it's just a somewhat more intense environment, with fires and sprints. So I would definitely say it's a more intense environment than what you would get at Google.

Speaker 0

但撇开这些不谈，就你个人生活而言，你认为人类——比如像你这样优秀的人——的幸福在于找到工作与生活的平衡吗？还是说这种思考本身就不太成立？

But forget all of that. Just in your own personal life, what do you think about the happiness of a human being, a brilliant person like yourself, in finding a balance between work and life? Or is that not a good thought experiment?

Speaker 1

我认为平衡很重要，但我同样热爱那些超出常规的冲刺阶段。正是在那些时期，我感觉自己最具创造力。

Yeah. I think balance is good, but I also love to have sprints that are out of distribution. That's when I think I've been pretty creative as well.

Speaker 0

所以"超出常规的冲刺"意味着大多数时候你还是保持着所谓的"平衡"状态。

So sprints out of distribution means that most of the time you have, quote unquote, balance.

Speaker 1

我大多数时候都保持平衡。但偶尔也会沉迷于某些事情。

I have balance most of the time. And then I like being obsessed with something once in a while.

Speaker 0

偶尔是指？一周一次，一个月一次，还是一年一次？

Once in a while is what? Once a week, once a month, once a year?

Speaker 1

对。大概一个月一次这样吧，是的。

Yeah. Probably like say once a month or something, yeah.

Speaker 0

那时候我们就会看到你在GitHub上发布一个新仓库。

And that's when we get a new GitHub repo from you.

Speaker 1

没错，就是当你真正关心某个问题时，它必须存在，这会很了不起，你对此着迷，但当天还做不完，你需要付出进入状态的固定成本，然后得在那里停留一阵子，接着社会就会来干扰你，试图分散你的注意力。最糟糕的就是那种说'我只需要占用你五分钟'的人。对。这代价可不止五分钟。是的。社会需要改变对'就五分钟'这种说法的认知。

Yeah, that's when you really care about a problem: it must exist, it will be awesome, you're obsessed with it. And you can't just do it in one day; you need to pay the fixed cost of getting into the groove, and then you need to stay there for a while, and then society will come and try to mess with you and distract you. The worst thing is a person who says, "I just need five minutes of your time." The cost of that is not five minutes. Society needs to change how it thinks about "just five minutes of your time."

Speaker 1

没错。就是

Right. It's...

Speaker 0

从来不是一分钟的事。至少三十分钟。总说'就一个小事'。

It's never just one minute. It's just thirty. It's just a quick thing.

Speaker 1

大惊小怪？你怎么这么

What's the big deal? Why are you being so...

Speaker 0

是啊。不。你的电脑配置是什么？理想的配置是怎样的？你是那种能适应任何设备的人吗？笔记本，四块屏幕？

Yeah. What's your computer setup? What's the perfect setup? Are you somebody who's flexible no matter what, laptop or four screens?

Speaker 0

对。还是你更偏好某种能让你最高效的特定配置？

Or do you prefer a certain setup in which you're most productive?

Speaker 1

我想我最熟悉的是一个大屏幕，27英寸的，旁边放着我的笔记本电脑。

I guess the one that I'm familiar with is one large screen, 27 inch, and my laptop on the side.

Speaker 0

什么操作系统？

What operating system?

Speaker 1

我用Mac。那是我的主力机。

I do Macs. That's my primary.

Speaker 0

所有任务都用它？

For all tasks?

Speaker 1

可以说是OS X，但在深度学习领域工作时，一切都离不开Linux。你通过SSH连接到集群进行远程工作。

I would say OS X, but when you're working on deep learning, everything is Linux. You're SSH'd into a cluster and you're working remotely.

Speaker 0

但实际开发环境呢？比如使用IDE的情况？

But what about the actual development, like using the IDE?

Speaker 1

对。我认为一个好方法是直接在Mac上运行VS Code（我目前最爱的编辑器），但通过远程文件夹功能，你实际操作的文件其实位于其他地方的集群上。

Yeah. I think a good way is to just run VS Code, my favorite editor right now, on your Mac, but with a remote folder, so the actual files that you're manipulating are on the cluster somewhere else.

Speaker 0

所以最佳IDE是什么？VS Code。人们还用什么呢？我还在用Emacs。嗯。

So what's the best IDE? VS Code? What else do people use? I still use Emacs.

Speaker 1

挺酷的。

That's cool.

Speaker 0

可能很酷吧，但我不确定是否效率最大化。对于编辑器你有什么推荐？你合作过很多软件工程师。针对Python、C++和机器学习应用的编辑器。

It may be cool, but I don't know if it's maximum productivity. So what do you recommend in terms of editors? You've worked with a lot of software engineers. Editors for Python, C++, machine learning applications.

Speaker 1

我认为当前答案是VS Code。目前相信这是最好的IDE。它有大量扩展插件，还集成了GitHub Copilot，我觉得这非常实用。

I think the current answer is VS Code. Currently, I believe that's the best IDE. It's got a huge number of extensions, and it has GitHub Copilot integration, which I think is very valuable.

Speaker 0

你对Copilot的集成有什么看法？我其实和Python的创始人Guido van Rossum聊了很多，他非常喜欢Copilot。他经常用它编程。你呢？

What do you think about the Copilot integration? I actually got to talk a bunch with Guido van Rossum, the creator of Python, and he loves Copilot. He programs a lot with it. Do you?

Speaker 0

是的，使用Copilot。

Use Copilot?

Speaker 1

我很喜欢它。虽然对我来说是免费的，但我愿意为此付费。我认为它非常棒，我发现它的实用性在于——可以说有一个学习曲线，你需要弄清楚什么时候它有帮助，什么时候需要注意它的输出，什么时候它没有帮助，不应该关注它。因为如果你总是阅读它的建议，这不是一个好的互动方式。但我觉得我已经能够适应它了。

I love it. It's free for me, but I would pay for it. I think it's very good. In terms of the utility I've found with it, I would say there is a learning curve: you need to figure out when it's helpful and when to pay attention to its outputs, and when it's not going to be helpful and you should ignore it. If you're just reading its suggestions all the time, that's not a good way of interacting with it. But I think I was able to mold myself to it.

Speaker 1

我发现它非常有用，首先是在复制粘贴和替换某些部分时，当模式清晰时，它非常擅长完成模式。其次，有时它会建议一些我不熟悉的API，告诉你一些你不知道的东西。

I find it very helpful, number one, in copy-paste-and-replace situations: when the pattern is clear, it's really good at completing it. Number two, sometimes it suggests APIs that I'm not aware of, so it tells you about something that you didn't know.

Speaker 0

那是一个发现和利用它的机会。

And that's an opportunity to discover and use it.

Speaker 1

这是一个机会，所以我从来不会直接接受Copilot的代码，我几乎总是会复制粘贴到Google搜索中，看看这个函数是做什么的，然后你会发现，哦，这正是我需要的。是的，谢谢你，Copilot。你学到了一些东西。

It's an opportunity to... So I would never take Copilot code as given. I almost always copy-paste it into a Google search to see what the function is doing, and then you're like, oh, it's actually exactly what I need. Thank you, Copilot. You learn something.

Speaker 0

所以它部分是一个搜索引擎，部分可能是帮你准确获取语法，一旦你看到了，你就知道它是正确的。就像那个NP难题，你看到就知道它是对的。

So it's partly a search engine, and partly a way of getting the exact syntax right, which you recognize once you see it. It's that NP-hard thing: once you see it, you know it's correct.
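
The property being described, that a proposed answer is cheap to check even when it is expensive to find, is what defines the complexity class NP. Subset sum, which is NP-complete, is a standard illustration. This minimal Python sketch (hypothetical helper names `verify` and `search`) contrasts a linear-time check of a candidate answer with brute-force search over all subsets.

```python
from itertools import combinations
from typing import List, Optional


def verify(nums: List[int], target: int, certificate: List[int]) -> bool:
    """Check a proposed solution: distinct, in-range indices summing to target.

    The work is proportional to the size of the certificate, so checking is cheap.
    """
    if len(set(certificate)) != len(certificate):
        return False
    if any(not 0 <= i < len(nums) for i in certificate):
        return False
    return sum(nums[i] for i in certificate) == target


def search(nums: List[int], target: int) -> Optional[List[int]]:
    """Find a solution by brute force over all 2**len(nums) index subsets.

    Subset sum is NP-complete, so no polynomial-time algorithm is known;
    generating an answer is the expensive side.
    """
    for size in range(len(nums) + 1):
        for combo in combinations(range(len(nums)), size):
            if sum(nums[i] for i in combo) == target:
                return list(combo)
    return None


nums = [3, 34, 4, 12, 5, 2]
found = search(nums, 9)        # generating: may inspect up to 2**6 = 64 subsets
print(found)                   # prints [2, 4], since 4 + 5 == 9
print(verify(nums, 9, found))  # checking: one pass over the certificate, prints True
```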

Speaker 1

没错。你自己就有能力，

Exactly. You yourself are able,

Speaker 0

你可以高效地验证但无法高效地生成。而Copilot实际上，

you can verify efficiently but you can't generate efficiently. And Copilot really,

Speaker 1

我的意思是，它就是编程的自动驾驶仪，对吧？目前它还在做车道保持这类简单的事，也就是简单的复制粘贴，偶尔给出建议，但随着时间的推移，它会变得越来越自主。同样的情况很可能不仅会发生在编程领域，还会扩展到许多其他方面。

I mean, it's Autopilot for programming, right? Currently it's doing the lane following, which is the simple copy-paste and the occasional suggestion, but over time it's going to become more and more autonomous. And the same thing will probably play out not just in coding, but across many different things.

Speaker 0

编码是个重要领域，对吧？编写程序。

Coding is an important one, right? Writing programs.

Speaker 1

是的。

Yep.

Speaker 0

你认为未来会如何发展？程序合成技术，能够编写越来越复杂的程序。因为目前它还是在人类监督下运行的，是的，以一些有趣的方式。

How do you see the future of that developing? Program synthesis, being able to write programs that are more and more complicated. Because right now, it's human-supervised in interesting ways.

Speaker 0

感觉这个转型过程会非常痛苦。

It feels like the transition will be very painful.

Speaker 1

我的思维模型是，这将与Autopilot的情况相同。目前它在做车道保持，做一些简单的事情。最终它将实现自主，人们需要干预的情况会越来越少。

My mental model for it is that the same thing will happen as with Autopilot. Currently it's doing lane following, some simple stuff. And eventually it will be doing autonomy, and people will have to intervene less and less.

Speaker 0

其中也可能包括测试机制。比如它写了一个函数，看起来非常正确，但你怎么知道它真的正确？因为作为程序员你会变得越来越懒，你发现小错误的能力……不过我猜它不会制造小错误吧？

And there could be testing mechanisms. If it writes a function and that function looks pretty damn correct, how do you know it's correct? Because you're getting lazier and lazier as a programmer, and your ability to catch little bugs... but I guess it won't make little bugs?

Speaker 1

不，它会。Copilot会制造那种微妙的差一错误，它就对我这么干过。

No, it will. Copilot will make subtle off-by-one bugs. It has done that to me.
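
A hypothetical illustration of the kind of subtle off-by-one bug mentioned here (not the bug Copilot actually produced): the buggy version runs without error but silently drops the last window, which is why cheap checks on the output are worth having. The function names below are made up for this sketch.

```python
from typing import Callable, List


def windows_buggy(xs: List[int], k: int) -> List[List[int]]:
    # Subtle off-by-one: a list of length n has n - k + 1 windows of length k,
    # not n - k, so the last window is silently dropped and nothing crashes.
    return [xs[i:i + k] for i in range(len(xs) - k)]


def windows(xs: List[int], k: int) -> List[List[int]]:
    """Return every contiguous window of length k (empty if k > len(xs))."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [xs[i:i + k] for i in range(len(xs) - k + 1)]


def passes_checks(fn: Callable[[List[int], int], List[List[int]]]) -> bool:
    """Cheap property checks a human reviewer or a test suite could apply."""
    xs = [1, 2, 3, 4, 5]
    for k in range(1, len(xs) + 1):
        out = fn(xs, k)
        # Expect exactly n - k + 1 windows, the last one ending at the final element.
        if len(out) != len(xs) - k + 1 or out[-1] != xs[-k:]:
            return False
    return True


print(windows_buggy([1, 2, 3, 4], 2))  # [[1, 2], [2, 3]], missing [3, 4]
print(windows([1, 2, 3, 4], 2))        # [[1, 2], [2, 3], [3, 4]]
print(passes_checks(windows_buggy), passes_checks(windows))  # False True
```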

Speaker 0

但你认为未来的系统会这样吗？还是说差一错误实际上是编程中的一个根本性挑战？

But do you think future systems will? Or is the off-by-one actually a fundamental challenge of programming?

Speaker 1

在那个案例中它并非根本性的，我认为情况可以改善，但确实人类需要监督。我对人们不监督输出结果感到担忧，比如系统中错误激增的情况。我对此感到不安，但我想未来可能会有其他用于发现错误的Copilot之类的东西，因为会有更多自动化工具出现

In that case it wasn't fundamental, and I think things can improve, but I think humans have to supervise. I am nervous about people not supervising what comes out, and about what happens then, for example the proliferation of bugs in all of our systems. I'm nervous about that, but I think there will probably be other copilots for bug finding and stuff like that at some point, because there'll be a lot more automation for

Speaker 0

哦，天哪。就像是一个编程Copilot生成编译器，另一个做代码检查

Oh, man. It's like a copilot that generates, one that's a compiler, one that does a linter

Speaker 1

是的。

Yes.

Speaker 0

一个类似于类型检查器的存在。

One that does a type checker.

Speaker 1

对，就像一个由GPT组成的委员会。

Yeah. It's like a committee of GPTs, sort of like

Speaker 0

然后会有一个委员会的管理者。是的。接着会有人说需要这个的新版本，我们需要重新生成它。

And then there'll be like a manager for the committee. Yeah. And then there'll be somebody that says a new version of this is needed, we need to regenerate it.

Speaker 1

没错。10个GPT各自做了前向计算，给出了50条建议，另一个GPT看过后挑出几个它喜欢的。一个负责查错的GPT看了后认为其中某条可能有bug，它们又被另一个模块重新排序，最后一个综合GPT登场说：好，根据你们告诉我的所有信息，下一个词元大概是这个。

Yeah. Ten GPTs were run forward and gave 50 suggestions; another one looked at them and picked a few that it liked. A bug one looked at them and said this one is probably a bug. They got re-ranked by some other thing, and then a final ensemble GPT comes in and says, okay, given everything you guys have told me, this is probably the next token.
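
The "committee of GPTs" described above can be sketched as a generate, filter, rerank, vote pipeline. This is a minimal illustration only: the generators, the bug checker, and the reranker are hypothetical toy functions standing in for model calls, and nothing here describes how Copilot or any real system works.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical stand-in type: in a real system each generator would be a model call.
Generator = Callable[[str], List[str]]


def committee_next_token(
    prompt: str,
    generators: List[Generator],
    looks_buggy: Callable[[str], bool],
    rerank_score: Callable[[str], float],
    top_k: int = 3,
) -> Optional[str]:
    # 1. Each generator proposes candidate continuations.
    candidates = [c for gen in generators for c in gen(prompt)]
    # 2. A "bug checker" discards candidates it flags.
    survivors = [c for c in candidates if not looks_buggy(c)]
    if not survivors:
        return None
    # 3. A reranker keeps the top_k distinct candidates by its own score.
    votes = Counter(survivors)
    shortlist = sorted(votes, key=rerank_score, reverse=True)[:top_k]
    # 4. A final "ensemble" step picks the shortlisted candidate proposed most
    #    often across generators, breaking ties by rerank score.
    return max(shortlist, key=lambda c: (votes[c], rerank_score(c)))


generators = [
    lambda p: [" + b", " - b", " + b)"],
    lambda p: [" + b", " * b"],
    lambda p: [" + b", " - b"],
]
choice = committee_next_token(
    "def add(a, b):\n    return a",
    generators,
    looks_buggy=lambda c: c.count("(") != c.count(")"),  # toy check: unbalanced parens
    rerank_score=lambda c: 1.0 if "+" in c else 0.0,     # toy preference for "+"
    top_k=2,
)
print(repr(choice))  # ' + b'
```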

Speaker 0

你知道，感觉世界上程序员的数量一直在快速增长。你认为在这种世界里，这个数字实际上会趋于平稳并下降到非常低的水平吗？因为那时你将会做软件2.0编程，做这种Copilot类系统的生成编程，但不会做老派的软件1.0编程。

You know, the feeling is that the number of programmers in the world has been growing very quickly. Do you think it's possible that it'll actually level out and drop to a very low number in this kind of world? Because then you'll be doing Software 2.0 programming, and you'll be doing this kind of generation with Copilot-type systems, but you won't be doing old-school Software 1.0 programming.

Speaker 1

我目前不认为它们会完全取代人类程序员。说这种话我真的很犹豫，对吧？

I don't currently think that they're just going to replace human programmers. I'm so hesitant to say stuff like this, right?

Speaker 0

因为这个在五年内就会被取代。我是说，不，它会证明，就像我们之前想的那样，因为我同意你的观点，但我认为我们可能会非常惊讶。对吧？比如，下一个...我...你对语言模型的现状有什么感觉？感觉像是开始、中期还是结束？

Because this is going to be replaced in five years. I mean, it's going to show that this is what we thought, because I agree with you, but I think we might be very surprised, right? What's your sense of where we stand with language models? Does it feel like the beginning, the middle, or the end?

Speaker 1

毫无疑问，是开始阶段。GPT肯定能够相当出色、胜任地编程。我心中最大的疑问是：如何引导这个系统？你仍需对实际需求提供一定指导。

The beginning, 100%. For sure, GPT will be able to program quite well, competently, and so on. The big question in my mind is how you steer the system. You still have to provide some guidance about what you're actually looking for.

Speaker 1

那么该如何引导它？如何与它对话？如何审计并验证其输出的正确性？又该如何与之协作？这不仅是AI问题，更是用户界面与体验设计的问题。确实如此。

So how do you steer it, and how do you talk to it? How do you audit it and verify that what it did is correct? How do you work with it? It's not just an AI problem; it's as much a UI/UX problem.

Speaker 1

这为VS Code++这类工作提供了极其丰富的研究土壤：编程不再仅仅是人类单方面的事，简直妙不可言。

So it's beautiful, fertile ground for so much interesting work, a VS Code++ where it's not just human programming anymore. It's amazing.

Speaker 0

没错。你是在与系统互动，不是单次提示而是迭代式对话。对，你正在尝试与系统建立对话关系。

Yeah. So you're interacting with the system: not just one prompt, but iterative prompting. You're having a conversation with the system.

Speaker 0

是的。对我来说，能与正在编写的程序对话这件事本身就令人无比兴奋。

Yeah. To me, it's super exciting to have a conversation with the program I'm writing.

Speaker 1

也许未来某天，你只需自然交流：'我想实现这个功能'，甚至不必具体到变量层面。嗯。

Yeah. Maybe at some point you're just conversing with it: okay, here's what I want to do. Actually, this variable... maybe it's not even as low-level as a variable, but...

Speaker 0

你还可以设想：“能否把这个翻译成C++，再转回Python，再转回……”

You can also imagine: can you translate this to C++, and back to Python, and back to...

Speaker 1

某种程度上已经……

Already, kind of, in some...

Speaker 0

不。但就像，作为项目体验的一部分来做这件事，比如，我想用C++写这个函数。嗯。或者，就像，你只是根据不同程序不断调整，因为它们有不同的语法。

No, but doing it as part of the programming experience: I think I'd like to write this function in C++. Or you just keep switching between different languages, because they have different syntax.

Speaker 0

也许我想把它转换成函数式语言。

Maybe I wanna convert this into a functional language.

Speaker 1

是的。

Yep.

Speaker 0

所以就像你作为程序员变得多语言化，并能高效地在不同语言间切换。是的。

And so you get to become multilingual as a programmer and dance back and forth efficiently.

Speaker 1

我是说，我认为它的UI/UX设计仍然很难想清楚，因为这不仅仅是写页面代码。你有一个完整的开发环境。上面有一堆硬件。有一些环境变量。还有一些脚本在cron作业中运行。

I mean, I think the UI/UX of it is still very hard to think through, because it's not just about writing code on a page. You have an entire developer environment. You have a bunch of hardware. You have some environment variables. You have some scripts running in a cron job.

Speaker 1

就像有很多事情在进行，比如与计算机协作，这些系统如何设置环境标志、跨多台机器工作、设置屏幕会话、自动化不同流程，所有这些如何运作并能被人类审计等等，目前都是巨大的问题。

There's a lot that goes into working with computers. How these systems set up environment flags, work across multiple machines, set up screen sessions, and automate different processes, and how all of that works and stays auditable by humans, is a massive question at the moment.

Speaker 0

你创建了arXiv Sanity。什么是arXiv？你希望看到的学术研究出版的未来是怎样的？

You've built arXiv Sanity. What is arXiv, and what is the future of academic research publishing that you would like to see?

Speaker 1

arXiv是一个预印本服务器。如果你有一篇论文，可以提交给期刊或会议，等上六个月，也许才会得到通过或拒绝的决定；或者你可以直接上传到arXiv。三分钟后人们就能在推特上讨论它，所有人都能看到、阅读，并以各自的方式从中受益。

So arXiv is this preprint server. If you have a paper, you can submit it for publication to journals or conferences, wait six months, and maybe get a decision, pass or fail. Or you can just upload it to arXiv, and people can tweet about it three minutes later; everyone sees it, everyone reads it, and everyone can profit from it in their own little ways.
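
Besides the website, arXiv exposes a public query API that returns an Atom XML feed. As a minimal sketch, the snippet below only builds a request URL for the newest submissions in a category; the helper name `newest_in_category` is hypothetical, no network call is made, and this is not necessarily how arXiv Sanity fetches papers.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint; responses are Atom XML feeds.
ARXIV_API = "http://export.arxiv.org/api/query"


def newest_in_category(category: str, max_results: int = 10) -> str:
    """Build a query URL for the most recent submissions in an arXiv category."""
    params = {
        "search_query": f"cat:{category}",  # e.g. "cs.LG" for machine learning
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


url = newest_in_category("cs.LG", max_results=5)
print(url)
# To actually fetch the feed (requires network access):
#   from urllib.request import urlopen
#   atom_xml = urlopen(url).read()
```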

Speaker 0

而且你可以引用它，它看起来有正式感。感觉像是出版物，像是经过出版流程。是的，这和你仅仅放在博客里的感觉不同。

And you can cite it and it has an official look to it. It feels like a pub like, it feels like a publication process. Yeah. It feels different than you if you just put it in a blog post.

Speaker 1

哦，是的。我是说，这是一篇论文。通常，人们对arXiv上内容的期望标准会比博客文章更高。

Oh, yeah. I mean, it's a paper. And usually the bar is higher for something you would expect on arXiv than for something you would see in a blog post.

Speaker 0

其实是文化设定了这个标准，因为你完全可以在arXiv上发布质量很差的东西。那么这让你对同行评审有什么看法？

Well, the culture created the bar, because you could probably post something pretty crappy to arXiv. So what does that make you feel about peer review?

Speaker 0

是两三位专家进行的严格同行评审，还是论文写成时社区的即时评审？

So rigorous peer review by two or three experts, versus peer review by the community right as it's written?

Speaker 1

是的。基本上我认为社区在推特上能非常迅速地进行同行评审，这可能与AI机器学习领域的特点有关。我觉得这个领域的成果更容易被审核验证，验证过程可能比其他领域更简单。你可以把这些科学出版物想象成小型区块链，每个人都在彼此的工作基础上构建并相互引用。而AI领域就像是一个更快速、更松散的区块链，每个单独条目的制作成本很低。其他领域可能不太适合这种模式。至少在AI领域，成果很容易验证，所以当有人上传了包含好想法的论文，第二天人们就能尝试，并成为判断该想法是否有效的最终仲裁者，整个进程因此大幅加快。

Yeah. Basically, I think the community is very able to peer review things quickly on Twitter, and maybe that has to do with the AI and machine learning field specifically. I feel like things here are more easily auditable, and verification is potentially easier than in other fields. You can think of these scientific publications as little blockchains where everyone is building on each other's work and citing each other. AI is like a much faster and looser blockchain where any one individual entry is very cheap to make, and then you have other fields where maybe that model doesn't make as much sense. In AI, at least, things are pretty easily verifiable. That's why, when people upload papers with a really good idea, others can try it out the next day and be the final arbiter of whether it works on their problem, and the whole thing moves significantly faster.

Speaker 1

所以我觉得学术界仍然有其存在价值，抱歉应该说这种会议期刊流程仍有其意义，但它似乎有些滞后，或许是个更高质量的过程，但已不再是发现前沿工作的场所。我刚开始读博时，情况是你会去会议和期刊上讨论所有最新研究。现在你去参加会议或看期刊，没人会讨论上面的内容，因为那些已经是三代之前的过时东西了。

So I feel like academia still has a place, sorry, this conference and journal process still has a place, but I think it lags behind. It's maybe a higher-quality process, but it's no longer the place where you will discover cutting-edge work. When I was starting my PhD, you would go to conferences and journals and discuss all the latest research. Now when you go to a conference or read a journal, no one discusses anything that's there, because it's already three generations old and irrelevant.

Speaker 0

是的。这让我对DeepMind感到遗憾，比如他们仍然在《自然》这类顶级期刊发表。虽然这些平台带来的声望仍有价值，但结果是当他们宣布某项突破性成果时，实际细节要等一年才会公布。如果这些细节能立即公开，本可以推动整个社区朝特定方向发展。

Yes. Which makes me sad about DeepMind, for example, where they still publish in Nature and these big, prestigious venues. I mean, there's still value in the prestige that comes with these venues, but the result is that they'll announce some breakthrough performance, and it'll take a year to actually publish the details. And those details, if they were published immediately, would inspire the community to move in certain directions.

Speaker 1

没错。这会加速整个社区的发展，但我不确定这在多大程度上也是他们的目标函数之一。

Yeah. It would speed up the rest of the community, but I don't know to what extent that's part of their objective function.

Speaker 0

确实如此。所以不仅是声望问题，某种程度上延迟本身就是策略的一部分。

That's true. So it's not just the prestige, a little bit of the delay is part of it.

Speaker 1

Yeah. DeepMind specifically has been working in a regime of a slightly higher-quality process with more latency, and publishing its papers that way.

Speaker 0

Another question from Reddit: do you suffer, or have you suffered, from impostor syndrome? Being the director of AI at Tesla, being the person at Stanford whom the world looks to as the expert in

Speaker 1

Yep.

Speaker 0

AI, to teach the world about machine learning.

Speaker 1

When I was leaving Tesla after five years, I was spending a ton of time in meeting rooms, and I would read papers. In the beginning, when I joined Tesla, I was writing code; then I was writing less and less code and reading code; and then I was reading less and less code. I think this is just a natural progression. Near the tail end, it definitely starts to hit you a bit more: you're supposed to be an expert, but the source of truth is the code that people are writing, the GitHub, the actual code itself, and you're not as familiar with it as you used to be. So I would say maybe there's some insecurity there.

Speaker 0

Yeah. That's actually pretty profound, that in the computer science space a lot of the insecurity has to do with not writing the code, because that is the truth, right there.

Speaker 1

The code is the source of truth; the papers and everything else are a high-level summary. It's just a high-level summary, and at the end of the day you have to read code. It's impossible to translate all of that code into paper form. So when things come out, especially when they have source code available, that's my favorite place to go.

Speaker 0

So like I said, you're one of the greatest teachers of machine learning and AI ever. From CS231n to today, what advice would you give to beginners interested in getting into machine learning?

Speaker 1

Beginners are often focused on what to do, and I think the focus should be more on how much you do. I'm a believer, at a high level, in this ten-thousand-hours concept: you just have to pick the things where you can spend time, that you care about and are interested in, and you literally have to put in ten thousand hours of work. It doesn't even matter as much where you put it. You'll iterate, you'll improve, and you'll waste some time. I don't know if there's a better way; you need to put in ten thousand hours. But I think it's actually really nice, because there's some sense of determinism about becoming an expert at a thing if you spend ten thousand hours on it.

Speaker 1

You can literally pick an arbitrary thing, and I think if you spend ten thousand hours of deliberate effort and work on it, you actually will become an expert at it. I think that's a nice thought, so basically I would focus more on this: are you spending ten thousand hours?

Speaker 0

That's what I focus on. And then thinking about what kind of mechanisms maximize your likelihood of getting to ten thousand hours. Exactly. Which, for us silly humans, probably means forming a daily habit of actually doing the thing every single day.
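The arithmetic behind the ten-thousand-hour target is worth making explicit. This is a minimal sketch that assumes a steady habit and 52 practice weeks per year. Both assumptions are ours, and `years_to_reach` is a hypothetical helper, not something from the conversation.

```python
def years_to_reach(target_hours, hours_per_day, days_per_week=7):
    """Calendar years needed to reach target_hours with a steady practice habit."""
    if hours_per_day <= 0 or not 1 <= days_per_week <= 7:
        raise ValueError("need a positive daily rate and 1-7 practice days per week")
    hours_per_year = hours_per_day * days_per_week * 52  # assumes 52 practice weeks a year
    return target_hours / hours_per_year

for rate in (1, 2, 4):
    print(f"{rate} h/day, every day: {years_to_reach(10_000, rate):.1f} years")
```

At one hour a day, every day, the target takes roughly 27 years. At four hours a day, it takes a little under 7.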

Speaker 1

Whatever helps you. I think to a large extent it's a psychological problem for yourself. One other thing that I think is helpful for the psychology of it: many times people compare themselves to others in the area, and I think this is very harmful. Only compare yourself to you from some time ago, say a year ago. Are you better than you were a year ago? That's the only way to think. Then you can see your progress, and it's very motivating.

Speaker 0

That's so interesting, that focus on the quantity of hours, because I think a lot of people in the beginner stage, and actually throughout, get paralyzed by the choice. Do I pick this path or that path? They'll literally get paralyzed by which IDE to use.

Speaker 1

Well, they're worried about all these things, but the thing is, you will waste time doing something wrong. You will eventually figure out it's not right, you will accumulate scar tissue, and next time you will grow stronger, because you'll have the scar tissue and you'll learn from it, and the next time you come to a similar situation you'll be like, oh, I messed up. I've spent a lot of time working on things that never materialized into anything, and I have all that scar tissue and some intuitions about what was useful, what wasn't useful, and how things turned out. So all those mistakes were not dead work, you know.

Speaker 1

Mhmm. So I just think they should focus on working. What have you done? What did you do last week?

Speaker 0

That's actually a good question to ask about a lot of things, not just machine learning. It's a good way to cut the, I forget the term we used, the fluff, the blubber, whatever the inefficiencies in life are. What do you love about teaching? You often seem to find yourself drawn to teaching. You're very good at it, but you're also drawn to it.

Speaker 1

I mean, I don't think I love teaching; I love happy humans, and humans are happy when I teach. I wouldn't say I hate teaching, I tolerate teaching, but it's not the act of teaching that I like. It's that I have something I'm actually okay at. I'm okay at teaching, and people appreciate it a lot.

Speaker 1

And so I'm just happy to try to be helpful, and teaching itself is not the most... I mean, it can be really annoying and frustrating. I'm working on a bunch of lectures just now, and I was reminded of my CS231n days and just how much work it is to create some of these materials and make them good: the amount of iteration and thought, going down blind alleys, and just how much you change it. So creating something good in terms of educational value is really hard, and it's not fun.

Speaker 0

It's difficult. People should definitely go watch the new stuff you put out. There are lectures where you're actually building the thing because, like you said, the code is truth. So you discuss backpropagation by building it, by looking through the whole thing. So how difficult is that to prepare for?

Speaker 0

I think that's a really powerful way to teach. How did you have to prepare for that, or are you just live thinking through it?

Speaker 1

I will typically do, say, three takes, and then I take the better one. So I do multiple takes, pick some of the better ones, and build out a lecture that way. Sometimes I have to delete thirty minutes of content because it just went down an alley that I didn't like too much. There's a bunch of iteration, and it probably takes me somewhere around ten hours to create one hour of content.

Speaker 0

To get one hour. It's interesting. I mean, is it difficult to go back to the basics? Do you draw a lot of wisdom from going back to the basics?

Speaker 1

Yeah. Going back to backpropagation and loss functions, where they come from. One thing I like a lot about teaching, honestly, is that it definitely strengthens your understanding. So it's not a purely altruistic activity; it's a way to learn. If you have to explain something to someone, you realize you have gaps in your knowledge. I even surprised myself in those lectures: oh, the result will obviously look like this, and then the result doesn't look like that, and I'm like, okay, I thought I understood this.
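To make "discussing backpropagation by building it" concrete, here is a minimal scalar reverse-mode autodiff sketch. It is our own illustration, not code from the lectures mentioned. The `Value` class and its methods are hypothetical names. Each operation records its inputs and local derivatives, and `backward()` applies the chain rule in reverse topological order.

```python
import math

class Value:
    """A scalar that remembers how it was computed, so gradients can flow back through it."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent), one entry per parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, (self,), (1.0 - t * t,))

    def backward(self):
        # Topologically sort the graph so every node comes after its inputs.
        order, seen = [], set()
        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            # Chain rule: push upstream grad times local derivative into each parent.
            for p, g in zip(v._parents, v._local_grads):
                p.grad += v.grad * g

# One neuron: out = tanh(w*x + b)
x, w, b = Value(0.5), Value(-2.0), Value(1.0)
out = (w * x + b).tanh()
out.backward()
print(out.data, w.grad, x.grad, b.grad)
```

With w = -2, x = 0.5 and b = 1, the pre-activation is 0, so the derivative of tanh there is 1. The gradients therefore come out exactly x, w and 1, and a finite-difference check of the same function gives matching numbers.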

Speaker 1

Yeah.

Speaker 0

But that's why it's really cool to literally code: you run it in a notebook, it gives you a result, and you're like, oh, wow. Yes. And actual numbers, actual input, you know, actual code.

Speaker 1

Yeah. It's not mathematical symbols, etcetera. The source of truth is the code. It's not slides. It's just like, let's build it.

Speaker 0

It's beautiful. You're a rare human in that sense. What advice would you give to researchers trying to develop and publish ideas that have a big impact in the world of AI? So maybe undergrads, maybe early graduate students.

Speaker 1

Yep. I mean, I would say they definitely have to be a little bit more strategic than I had to be as a PhD student, because of the way AI is evolving. It's going the way of physics. In physics, you used to be able to do experiments on your benchtop, everything was great, and you could make progress; now you have to work at something like the LHC at CERN. AI is going in that direction as well. There are certain kinds of things that just aren't possible to do on the benchtop anymore, and I think that didn't use to be the case.

Speaker 0

Do you still think there are GAN-type papers to be written? Like a simple idea that requires just one computer to illustrate a simple example?

Speaker 1

I mean, one example that's been very influential recently is diffusion models. Diffusion models are amazing. Diffusion models are six years old, and for the longest time people were kind of ignoring them, as far as I can tell, but they're an amazing generative model, especially for images. Stable Diffusion and so on are all diffusion based. Diffusion is new; it was not there before. It came from, well, it came from Google, but a researcher could have come up with it.

Speaker 1

In fact, some of the first... actually, no, those came from Google as well. But a researcher could come up with that in an academic institution.
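The conversation doesn't spell out how diffusion models work. This is a hedged sketch of the forward (noising) half only. It assumes the DDPM-style formulation with a linear variance schedule from 1e-4 to 0.02 over 1,000 steps, which is one common choice, not the only one. A clean sample x0 can be jumped directly to step t via x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, where abar_t is the running product of (1 - beta). The generative model is then trained to undo this, typically by predicting eps, but that learned reverse process is not shown. The function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, increasing linearly."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def alpha_bars(betas):
    """Cumulative products abar_t = prod over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def noisy_sample(x0, t, abar, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form: sqrt(abar_t)*x0 + sqrt(1 - abar_t)*eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * xi + math.sqrt(1.0 - a) * ei for xi, ei in zip(x0, eps)]
    return xt, eps  # a denoiser would be trained to predict eps from (xt, t)

betas = linear_beta_schedule()
abar = alpha_bars(betas)
rng = random.Random(0)
x0 = [1.0, -1.0, 0.5, 0.0]  # a toy four-pixel "image"
for t in (0, 250, 999):
    xt, _ = noisy_sample(x0, t, abar, rng)
    print(t, round(abar[t], 4), [round(v, 2) for v in xt])
```

At t = 0 the sample is almost untouched. By the last step, abar_t is tiny, so x_t is essentially pure Gaussian noise, which is the starting point a trained model denoises from.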

Speaker 0

Yeah. What do you find most fascinating about diffusion models, from the societal impact to the technical architecture?

Speaker 1

What I like about diffusion is it works so well.

Speaker 0

Is that surprising to you? The amount of variety, almost the novelty, of the synthetic data it's generating.

Speaker 1

Yes, the Stable Diffusion images are incredible. The speed of improvement in generating images has been insane. We went very quickly from generating tiny digits to tiny faces, and it all looked messed up, and now we have Stable Diffusion, and that happened very quickly. There's a lot that academia can still contribute. For example, FlashAttention is a very efficient kernel for running the attention operation inside the transformer, and it came from an academic environment. It's a very clever way to structure the kernel, that is, the calculation, so that it doesn't materialize the attention matrix.
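At its core, the trick that lets an attention kernel avoid materializing the attention matrix is an online softmax. Keys and values are processed block by block while the kernel keeps a running max, a running normalizer and a running weighted sum, rescaling them whenever the max changes. The pure-Python sketch below shows only that arithmetic for a single query. The actual FlashAttention kernel is a fused GPU kernel that also tiles over queries and manages on-chip memory, none of which is modeled here. The function names are hypothetical.

```python
import math

def naive_attention(q, K, V):
    """Reference: build the full score row, softmax it, then weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    return [sum(wj * v[d] for wj, v in zip(w, V)) / z for d in range(len(V[0]))]

def blockwise_attention(q, K, V, block=2):
    """Online-softmax version: visit keys/values one block at a time, keeping only
    a running max (m), a running normalizer (l) and a running weighted sum (acc)."""
    scale = 1.0 / math.sqrt(len(q))
    m, l = -math.inf, 0.0
    acc = [0.0] * len(V[0])
    for start in range(0, len(K), block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in Kb]
        m_new = max(m, max(s))
        # Rescale everything accumulated so far to the new max (exp(-inf) == 0 on the first block).
        correction = math.exp(m - m_new)
        p = [math.exp(sj - m_new) for sj in s]
        l = l * correction + sum(p)
        acc = [a * correction + sum(pj * v[d] for pj, v in zip(p, Vb)) for d, a in enumerate(acc)]
        m = m_new
    return [a / l for a in acc]

q = [1.0, 0.5]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
print(naive_attention(q, K, V))
for block in (1, 2, 3):
    print(block, blockwise_attention(q, K, V, block))
```

Every block size gives the same output as the naive version, up to floating-point rounding. Only one block's scores are ever held at once.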

Speaker 1

And so I think there are still lots of things to contribute, but you have to be more strategic.

Speaker 0

Do you think neural networks could be made to reason?

Speaker 1

Yes.

Speaker 0

Do you think they already reason?

Speaker 1

Yes.

Speaker 0

What's your definition of reasoning?

Speaker 1

Information processing.

Speaker 0

So, in the way that humans think through a problem and come up with novel ideas, it feels like reasoning.

Speaker 1

Yeah.

Speaker 0

So the novelty, I don't want to say, but out-of-distribution ideas, you think it's possible?

Speaker 1

Yes. And I think we're seeing that already in the current neural nets. You're able to remix the training set information into true generalization in some sense.

Speaker 0

That doesn't appear. It doesn't appear in the training set.

Speaker 1

Like, you're doing something interesting algorithmically. You're manipulating, you know, some symbols, and you're coming up with a correct, unique answer in a new setting.

Speaker 0

What would illustrate to you, holy shit, this thing is definitely thinking?

Speaker 1

To me, thinking or reasoning is just information processing and generalization, and I think the neural nets already do that today.

Speaker 0

So being able to perceive the world, or perceive whatever the inputs are, and to make predictions or take actions based on that, that's reasoning.

Speaker 1

Yeah. You're giving correct answers in novel settings by manipulating information. You've learned the correct algorithm. You're not just doing some kind of lookup table or nearest-neighbor search, something like that.
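This deliberately tiny toy of our own shows the distinction between "learning the correct algorithm" and "a lookup table on the nearest neighbor search". Both predictors see ten points generated by the rule y = 2x + 1, and only the one that captures the rule answers correctly far outside the training range. A least-squares line stands in for "learning the algorithm" here. It is not a neural network, and the helper names are hypothetical.

```python
def nearest_neighbor_predict(train, x):
    """Lookup-table behaviour: return the label of the closest training input."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def fit_line(train):
    """Ordinary least squares for y = a*x + b: a stand-in for learning the rule."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return a, my - a * mx

train = [(x, 2 * x + 1) for x in range(10)]  # hidden rule: y = 2x + 1
a, b = fit_line(train)
query = 100  # far outside the training range
print(nearest_neighbor_predict(train, query), a * query + b)
```

Queried at x = 100, the lookup returns 19, the label of x = 9, while the fitted rule returns 201.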

Speaker 0

Let me ask you about AGI. What are some moonshot ideas you think might make significant progress towards AGI? Or maybe another way is what are the big blockers that we're missing now?

Speaker 1

So basically, I'm fairly bullish on our ability to build AGIs: automated systems that we can interact with, that are very humanlike, and that we can interact with in a digital realm or a physical realm. Currently, it seems most of the models that do these sort of magical tasks are in the text realm. As I mentioned, I'm suspicious that the text realm is not enough to actually build a full understanding of the world. I do think you need to go into pixels and understand the physical world and how it works. So I think we need to extend these models to consume images and videos and to train on a lot more data that is multimodal in that way.

Speaker 0

Do you think you need to touch the world to understand it also?

Speaker 1

Well, that's the big open question in my mind: whether you also require embodiment and the ability to interact with the world, run experiments, and have data of that form. If so, then you need to go to Optimus or something like that. So I would say Optimus is, in some way, a hedge on AGI, because it seems possible to me that just having data from the internet is not enough. If that is the case, then Optimus may lead to AGI because, to me, there's nothing beyond Optimus. You have this humanoid form factor that can actually do stuff in the world, and you can have millions of them interacting with humans and so on. If that doesn't give rise to AGI at some point, I'm not sure what will. So from a completeness perspective, I think that's a really good platform, but it's a much harder platform, because you are dealing with atoms and you need to actually build these things and integrate them into society.

Speaker 1

So I think that path takes longer, but it's much more certain. And then there's the path of the internet: training these compression models, effectively trying to compress all of the internet. That might also give rise to these agents.
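The link between prediction and compression mentioned here is standard information theory. A model that assigns probability p to the next symbol lets an arithmetic coder spend about -log2(p) bits on it, so better prediction means shorter code. The toy sketch below assumes a simple adaptive character-frequency model, which is far cruder than a language model, and compares it against the 8 bits per character of a uniform byte model. It computes ideal code lengths only and is not a working coder. The function name is hypothetical.

```python
import math
from collections import Counter

def adaptive_code_length_bits(text, alphabet_size=256):
    """Ideal code length (bits) of `text` under an adaptive order-0 model.

    Each character is predicted from Laplace-smoothed counts of the characters
    seen so far, so a decoder could rebuild exactly the same predictions.
    """
    counts = Counter()
    bits = 0.0
    for seen, ch in enumerate(text):
        p = (counts[ch] + 1) / (seen + alphabet_size)  # probability assigned before seeing ch
        bits += -math.log2(p)                           # ideal cost of coding ch
        counts[ch] += 1
    return bits

text = "the cat sat on the mat. " * 20
baseline_bits = 8 * len(text)  # uniform over 256 byte values
model_bits = adaptive_code_length_bits(text)
print(f"uniform: {baseline_bits} bits, adaptive model: {model_bits:.0f} bits "
      f"({model_bits / len(text):.2f} bits/char)")
```

Swapping in a stronger predictor, such as one conditioned on the preceding characters, would typically shrink the total further. That is the sense in which training a model to predict internet text amounts to compressing it.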

Speaker 0

Compress the internet, but also interact with the internet. Yeah. So it's not obvious to me. In fact, I suspect you can reach AGI without ever entering the physical world, which is a little bit more concerning, because it might result in it happening faster. So it just feels like we're in boiling water.

Speaker 0

We won't know as it's happening. I'm not afraid of AGI. I'm excited about it. There are always concerns, but I would like to know when it happens. Yeah.

Speaker 0

And to have hints about when it's going to happen, like, a year from now it will happen, that kind of thing. I just feel like in the digital realm it just might happen.

Speaker 1

I think all we have available to us, because no one has built AGI yet, is the question: is there enough fertile ground on the periphery? I would say yes. We have the progress so far, which has been very rapid, and there are next steps available. So I would say, yeah, it's quite likely that we'll be interacting with digital entities.

Speaker 0

How will you know that somebody has built AGI?

Speaker 1

I think it's going to be a slow, incremental transition. It's going to be product-based and focused. It's going to be GitHub Copilot getting better, and then GPTs helping you write, and then these oracles that you can go to with mathematical problems. I think we're on the verge of being able to ask these oracles very complex questions in chemistry, physics, and math and have them complete solutions.

Speaker 0

So AGI, to you, is primarily focused on intelligence, and consciousness doesn't enter into it?

Speaker 1

In my mind, consciousness is not a special thing you will figure out and bolt on. I think it's an emergent phenomenon of a large enough and complex enough generative model, sort of. So if you have a complex enough world model that understands the world, then it also understands its predicament in the world as being a language model, which to me is a form of consciousness or self-awareness.

Speaker 0

So in order to understand the world deeply, you probably have to integrate yourself into the world. Yeah. And in order to interact with humans and other living beings, consciousness is a very useful

Speaker 1

I think consciousness is like a modeling insight. Yeah. You have a powerful enough model of the world that you actually understand that you are an entity in it.

Speaker 0

Yeah. But there's also this, perhaps just a narrative we tell ourselves: it feels like something to experience the world. The hard problem of consciousness.

Speaker 0

But that could be just a narrative that we tell ourselves.

Speaker 1

Yeah. I think it will emerge. I think it's going to be something very boring. We'll be talking to these digital AIs, and they will claim they're conscious.

Speaker 1

Mhmm. They will appear conscious. They will do all the things that you would expect of other humans, and it's going to just be a stalemate.

Speaker 0

I think there would be a lot of fascinating ethical questions, like Supreme Court-level questions of whether you're allowed to turn off a conscious AI, or whether you're allowed to build a conscious AI. Maybe there would have to be the same kind of debates that you have around, sorry to bring up a political topic, abortion, where the deeper question with abortion

Speaker 1

Mhmm.

Speaker 0

is: what is life? And the deep question with AI is also: what is life, and what is consciousness? I think that'll be very fascinating to bring up. It might become illegal to build systems capable of such a level of intelligence that consciousness would emerge, and therefore the capacity to suffer would emerge, and a system that says, no,

Speaker 0

Please don't kill me.

Speaker 1

Mhmm. Well, that's what the LaMDA chatbot already told this Google engineer, right? It was talking about not wanting to die and so on.

Speaker 0

So that might become illegal to do. Right. Because otherwise you might have a lot of creatures that don't want to die, and they will

Speaker 1

You can just spawn and penetrate some of the cluster.

Speaker 0

And then that might lead to horrible consequences, because there might be a lot of people who secretly love murder, and they'll start practicing murder on those systems. I mean, to me, all of this stuff just holds up a beautiful mirror to the human condition and human nature. We'll get to explore it. And that's, like, the best of the Supreme Court, of all the different debates we have about what it means to be human.

Speaker 0

We get to ask those deep questions that we've been asking throughout human history. There's always been the other in human history. We're the good guys and those are the bad guys, and throughout human history it's been: let's murder the bad guys. The same will probably happen with robots. They'll be the other at first, and then we'll get to ask questions of what it means to be alive and what it means to be conscious.

Speaker 1

Yep. And I think there's some canary in the coal mine even with what we have today. For example, there are these waifus that you can work with, and one company is going to shut down, but this person really loved their waifu and is trying to port it somewhere else, and it's not possible. I think people will definitely have feelings toward these systems, because in some sense they are a mirror of humanity: given the way they're trained, they are sort of a big average of humanity.

Speaker 0

But that average, we can actually watch. It's nice to be able to interact with the big average of humanity and do, like, a search query on it.

Speaker 1

Yeah. It's very fascinating. And of course we can also shape it; it's not just a pure average. We can mess with the training data, we can mess with the objective, and we can fine-tune them in various ways.

Speaker 1

So we have some, you know, impact on what those systems look like.

Speaker 0

If you achieved AGI and could have a conversation with her, talk about anything, maybe ask her a question, what kind of stuff would you ask?

Speaker 1

I would have some practical questions in my mind like, do I or my loved ones really have to die? What can we do about that?

Speaker 0

Do you think it will answer clearly or would it answer poetically?

Speaker 1

I would expect it to give solutions. I would expect it to say: well, I've read all of these textbooks and I know all these things that you've produced, and it seems to me that here are the experiments that would be useful to run next, here are some gene therapies that I think would be helpful, and here are the kinds of experiments that you should run.

Speaker 0

Okay, let's go with this thought experiment. Imagine that mortality is actually a prerequisite for happiness. So if we become immortal, we'll actually become deeply unhappy, and the model is able to know that.

Speaker 0

So what is it supposed to tell you, stupid human, about it? Yes, you can become immortal, but you'll become deeply unhappy? If the AGI system is trying to empathize with you, human, what is it supposed to tell you? That, yes, you don't have to die, but you're really not going to like it?

Speaker 0

Is it going to be deeply honest? Like, in Interstellar, what is it? The AI says humans want 90% honesty.

Speaker 1

Mhmm.

Speaker 0

Yeah. So you have to pick how honestly you want it to answer these practical questions.

Speaker 1

Yeah. I love the AI in Interstellar, by the way. I think it's such a sidekick to the entire story, but at the same time it's really interesting.

Speaker 0

It's kind of limited in certain ways. Right?

Speaker 1

Yeah. It's limited, and I think that's totally fine, by the way. I think it's fine and plausible to have limited and imperfect AGIs.

Speaker 0

Is that a feature almost?

Speaker 1

As an example, it has a fixed amount of compute on its physical body, and it might just be that even though you can have a super amazing mega-brain, superintelligent AI, you can also have less intelligent AIs that you can deploy in a power-efficient way, and then they're not perfect. They might make mistakes.

Speaker 0

No, I meant more like, say you had infinite compute, and it's still good to make mistakes sometimes, in order to integrate yourself. Like, what is it? Going back to Good Will Hunting, Robin Williams' character says the human imperfections, that's the good stuff.

Speaker 0

Right? Yeah. Isn't that it? We don't want perfect. We want flaws, in part to form connections with each other, because the flaws feel like something you can attach your feelings to. In that same way, you want an AI that's flawed.

Speaker 0

I don't know. I feel like perfection is

Speaker 1

But then you're saying, okay. Yeah.

Speaker 0

But that's not AGI. See, AGI would need to be intelligent enough to give humans answers that humans don't understand, and I think perfect is something humans can't understand. Because even science doesn't give perfect answers. There are always gaps and mysteries, and I don't know. I don't know if humans want perfect.

Speaker 1

Yeah. I can imagine just having a conversation with this kind of oracle entity, as you'd imagine them, and yeah, maybe it can tell you: based on my analysis of the human condition, you might not want this, and here are some of the things that might

Speaker 0

Every dumb human will say, yeah, yeah, yeah, give me the truth, I can handle it.

Speaker 1

But that's the beauty: people can choose.

Speaker 0

But then there's the old marshmallow test with the kids and so on. I feel like too many people can't handle the truth, probably including myself. The deep truth of the human condition: I don't know if I can handle it. What if there's something dark? What if we are an alien science experiment and it realizes that? What if it had... I mean

Speaker 1

This is The Matrix, you know, all over again.

Speaker 0

I don't know. What would I talk about? I would probably go with the safer scientific questions at first, ones that have nothing to do with my own personal life and mortality, just about physics and so on.

Speaker 0

Yeah. To build up, like, let's see where it's at, or maybe see if it has a sense of humor. That's another question. Presumably, if it understands humans deeply, it would be able to generate humor.

Speaker 0

Yeah.

Speaker 1

I think that's actually a wonderful benchmark, almost. Is it able to... I think that's a really good point, basically.

Speaker 0

To make you laugh?

Speaker 1

Yeah. If it's able to be a very effective stand-up comedian, it's doing something very interesting computationally. I think being funny is extremely hard.

Speaker 0

Yeah. Because it's hard in a way, like the Turing test. The original intent of the Turing test is hard because you have to convince humans, and that's why comedians talk about this. It's deeply honest, because people can't help but laugh. If they don't laugh, that means you're not funny.