本集简介
双语字幕
我们正在谈论的这项技术堪称魔法,我想五年前甚至十年前所有人都会觉得这根本不可能。这是有史以来最惊人的技术,发展速度极快,但我们依然感到非常失望。因为它还不够快,甚至可能濒临崩溃边缘。我们本该兴奋到极点,却又近乎绝望。
We're dealing with magic here that we, I think, probably all would have thought was impossible five years ago or certainly ten years ago. This is the most amazing technology ever, and it's moving really fast, and yet we're still, like, really disappointed. Like, it's not moving fast enough, and, like, it's maybe right on the verge of falling apart. We should both be, like, hyper-excited, but also on the verge of, like, slitting our wrists
因为,
because like,
你知道,这场盛宴即将结束了。没错。
you know, so the gravy train is coming to an end. Right.
它确实更快了,但还没达到计算机应有的速度。对吧?我们期待中的计算机速度。现在就像在看一个人工作,就像看嗑了药的约翰·卡马克编程。
It is faster but it's not at computer speed right. Right. What we expect computer speed to be. It's sort of like watching a person work. It's like watching John Carmack on cocaine.
这个世界...好吧,这个世界
The world okay. The world's
世界顶尖程序员在兴奋剂作用下的状态。对,在兴奋剂作用下。
the world's best programmer on a stimulant. Yeah. On a stimulant.
没错。每隔几十年,编程就会迎来一次巨大飞跃,而这次可能是最重大的。在本期节目中,马克·安德森和我邀请到了Replit的CEO兼创始人阿姆贾德·马萨德,共同探讨AI智能体如何改变编程的本质。我们讨论了语法的终结、能持续数小时思考并构建软件的智能体的崛起,以及强化学习和验证循环如何推动AI发展出类似推理的能力。
Yeah. That's right. Every few decades, programming takes a massive leap forward, and this might be the biggest one yet. In this episode, Marc Andreessen and I are joined by Amjad Masad, CEO and founder of Replit, to talk about how AI agents are changing what it means to code. We discussed the end of syntax, the rise of agents that can think and build software for hours, and how reinforcement learning and verification loops are pushing AI towards something that looks a lot like reasoning.
最后,Amjad分享了他的故事——从入侵约旦的大学数据库到开发出全球最强大的开发者工具之一。让我们深入了解。
And finally, Amjad shares his story from hacking his university database in Jordan to building one of the most powerful developer tools in the world. Let's get into it.
那么让我们从假设我是一个编程新手开始。可能我是个学生,或者只是上过几节编程课、稍微捣鼓过代码的人,又或者我只会用Excel宏之类的。总之,我算不上编程高手。这时有人向我推荐了Replit,特别是Replit的AI功能。
So let's start with, let's assume that I'm a sort of novice programmer. So maybe I'm a student, or maybe I'm just somebody who took a few coding classes and I hacked around a little bit, or, I don't know, I do Excel macros or something like that. But I'm, like, not a masterclass coder. And somebody tells me about Replit, and specifically AI in Replit.
当我用现在的Replit AI开始体验时会遇到什么?是的,我认为
What's my experience when I launch in with what Replit is today with AI? Yeah. I think
对于零基础或有些许编程经验的人来说,进入Replit的体验大体相同。首先我们要做的就是帮你避开配置开发环境之类的繁琐事,让你专注于创意本身——你想构建什么?想开发产品吗?
the experience of someone with no coding experience or some coding experience is largely the same when you go into Replit. Okay. The first thing we try to do is get all the nonsense of setting up a development environment and all of that stuff out of the way, and just have you focus on your idea. So what do you wanna build? Do you wanna build a product?
想解决问题?想做数据可视化?输入框完全开放,你可以输入任何内容。假设你想创业,
Do you wanna solve a problem? Do you wanna do a data visualization? So the prompt box is really open for you. You can put anything in there. So let's say you wanna build a startup.
你有个创业点子。我会先写一段文字描述想构建的东西。AI助手会阅读这段描述,它会
You have an idea for a startup. I would start with a paragraph-long description of what I wanna build. The agent will read that. It will
你只需用标准英语输入。比如'我想在线卖可丽饼'。嗯,就像这样直接输入。
You just type in English. In standard English, you just type it in. I wanna sell crepes online. Mhmm. So you just, like, type it in.
我想卖可丽饼。你可以,你可以,字面上可以
I wanna sell crepes. You can, you can, it literally could be
四个字或五个字。好吧。或者,如果你有偏爱的编程语言或技术栈,你可以那么做。但我们其实更希望你别那么做,因为我们会为这个请求挑选最合适的方案、分类最优的技术栈。对吧。
that, four words or five words. Okay. Or, if you have a programming language you prefer or a stack you prefer, you could do that. But we actually prefer for you not to do that, because we're gonna pick the best thing for it. We're gonna classify the best stack for that request. Right.
如果是数据应用,我们会选Python、Streamlit之类的。如果是网页应用,我们会选JavaScript和Postgres这类技术。对吧。
If it's a data app, we'll pick Python, Streamlit, whatever. If it's like a web app, we'll pick JavaScript and Postgres and things like that. Right.
所以你就直接输入。或者你可以决定。你可以说,对,我懂Python,或者我正在学校学习Python,我想用Python来做。
So you just type that. Or you can decide. You can say, yeah, I know Python, or I'm learning Python in school, and I wanna do it in Python.
没错。Replit最酷的是我们已经存在快十年了,当我们构建这套基础设施时,Replit就能运行任何编程语言。对吧。所以如果你熟悉Python,
That's right. The cool thing about Replit is we've been around for almost ten years now, and when we built all this infrastructure, Replit runs any programming language. Right. So if you're comfortable with Python,
你当然可以进去那么做。对吧。好的。开始吧,我知道这很明显,大家都用过,但就像我现在用英语交流一样。
you can go in and do that for sure. Right. Okay. And just to begin, I know this is obvious, people have used it, but, like, I'm dealing in English.
是的,继续吧。没错,你完全在用英语。我是说稍微介绍下背景,比如我七年前或十年前来这里向你推销时,我们当时预言的就是这个未来——每个人都想开发软件。而阻碍人们的正是Fred Brooks所说的编程中的偶然复杂性。还有本质复杂性,比如如何让初创公司进入市场、如何建立业务等等。
Yes, go ahead. Yes, you're fully in English. I mean, just a little bit of background here: like, when I came here and pitched to you, like, ten years ago or whatever, seven years ago, what we were saying is we were exactly describing this future, that everyone would wanna build software. And the thing that's kind of getting in people's way is all of what Fred Brooks called the accidental complexity of programming. Then there's essential complexity, which is, like, how do I bring my startup to market and how do I build a business and all of that.
偶然复杂性体现在选择使用哪个包管理器这类问题上。多年来我们一直在做抽象化处理,最后需要抽象化的就是代码本身。去年我意识到,虽然我们构建了一个出色的平台,但业务表现不佳,而根本原因在于代码成为了瓶颈。
Accidental complexity is what package manager do I use, all of that stuff. We've been abstracting that away for so many years. And the last thing we had to abstract away is code. I had this realization last year, which is: I think we built an amazing platform, but the business is not performing. And the reason the business is not performing is that code is the bottleneck.
是的,
Yes,
其他问题固然重要,但语法仍是症结所在。语法对人来说始终不够自然。所以归根结底,英语才是真正的编程语言。不过顺便说,
all the other stuff is important to solve, but syntax is still an issue. Syntax is just an unnatural thing for people. So ultimately, English is the programming language. Right. But, by the way,
目前它是否支持英语之外的其他世界语言?
just to ask, does it work with other world languages other than English at this point?
支持。可以用日语编写,我们有很多日本用户。好的。
Yes. You can write in Japanese, and we have a lot of users, especially Japanese. Okay.
那现在是否全面支持这些语言?比如是否涵盖所有语言,还是说仍需定制开发来适配新语言?
So does it support those these days? Like, does it support every language, or do you still have to do custom work to support a new language?
对于使用人口超过1亿的主流语言,AI都能处理得很好。
Most mainstream languages that have a hundred million plus people speaking them, AI is pretty good at.
好的。是的。
Okay. Yep.
没错。最近不知为何我做了些历史研究,就是想理解我们所处的这个时代。因为这是个非常特殊的时刻,将其置于历史背景中很重要。我读到格蕾丝·霍珀的一句话。
Yeah. So I did a bit of historical research recently for some reason. I just wanna just understand the moment we're in. And because it's such a special moment, it's important to contextualize it. And I read this quote from Grace Hopper.
众所周知,格蕾丝·霍珀发明了编译器。当时人们都在编写机器代码,这就是程序员的工作。她说专家永远是专家,必须学习计算机的底层机制,但她希望实现一个人们能用英语编程的世界。这是她在Karpathy之前说的,对吧?
So Grace Hopper invented the compiler, as you know. At the time, people were programming in machine code, and that's what programmers did. She said, specialists will always be specialists. They have to learn the underlying machinery of computers, but I want to get to a world where people are programming in English. That's what she said, and that's before Karpathy, right?
那是75年前的事了,这就是她发明编译器的原因。在她看来,C语言编程就是英语。但这只是开始,后来有了C语言,再发展到更高级的Python和JavaScript。我认为我们现在正处于迈向下一步的时刻。
That's seventy-five years ago, and that's why she invented the compiler. And in her mind, like, C programming is English. But that was just the start of it. You had C, and then you go higher level: Python and JavaScript. And I think we're at a moment where it's the next step.
对。不再是输入语法,而是直接输入想法。这才是我们最终想要的。
Right. Instead of typing syntax, you're actually typing thoughts. Right. Which is what we ultimately want.
然后由机器来写代码。
And the machine writes the code.
然后由机器来写代码。
And the machine writes the code.
对,对,没错,没错,我记得。
Right. Right. Yeah. Yeah. I remember it.
你可能年纪太小不记得了,但我小时候,七十年代就有高级语言了,比如BASIC、Fortran和C语言。但你仍然会遇到用汇编语言编程的人,顺便说一句,现在还有人用。比如游戏公司之类的,为了优化性能还是会用汇编。
You're probably not old enough to remember, but I remember when I was a kid, there were higher level languages by the seventies, like BASIC and so forth, and Fortran and C. But you still would run into people who were doing assembly programming, assembly language, which, by the way, people still do. You know, like, game companies or whatever still do assembly to get performance.
他们还瞧不起那些用BASIC的孩子。
And they were hating on the kids that were doing basic.
所以用汇编的人瞧不起用BASIC的孩子,但也有老一辈程序员瞧不起用汇编的程序员,觉得他们不用机器码直接编程。不对,不对,是不用直接的机器码编程。
So the assembly people were hating on kids doing BASIC, but there were also older coders who hated on the assembly programmers for doing assembly and not... oh, no. No. No. Not doing direct machine code. Right.
不是直接用0和1的机器码编程。给不了解的人解释下,汇编语言是一种底层编程语言,最终会编译成实际的机器码。对大多数人,甚至大多数程序员来说,这简直就是天书。
Not doing direct zero-and-one machine code. So for people who don't know, assembly language is this very low-level programming language that compiles to actual machine code. Yeah. It's incomprehensible gibberish to most people, even most programmers.
你是用八进制写的还是
You're writing in Octal or
某种非常接近硬件的语言,但即便如此,它还是要编译成0和1。而真正的程序员是直接用0和1写的代码。所以总有这种职业鄙视链,懂吧。
something very, very close to the hardware, but even still, it's still a language that compiles to zeros and ones. Right. Whereas the actual real programmers actually wrote in zeros and ones. Yeah. And so there's always this tendency for the pros to look down their nose. Yeah.
确实如此。还说新人基本上都很马虎。他们不明白发生了什么。他们并不真正了解这台机器。当然,还有那些更高层次的抽象功能
They do. And say the new people are being basically sloppy. They don't understand what's happening. They don't really understand the machine. And then, of course, what the higher level abstractions do
它们实现了民主化。最具讽刺意味的是,我曾是JavaScript革命的一员。在创立Replied之前我在Facebook工作,我们构建了现代JavaScript技术栈。我们开发了React.js及其周边工具链。
is they democratize. The absolute irony is I was part of the JavaScript revolution. I was at Facebook before starting Replit, and we built the modern JavaScript stack. We built React.js and all the tooling around it.
我们因此遭到很多程序员的抨击,他们认为应该直接手写原生JavaScript。我当时想,好吧,随你们怎么说。而现在这已成为主流。那些靠我们开创的上一次技术浪潮谋生的人,现在却在诋毁这波新浪潮。
And we got a lot of hate from the programmers that you should type vanilla JavaScript directly. I was like, okay, whatever. And, yeah, and now that's mainstream. And then those guys that built their careers on the last wave we invented Right. Are hating on this new wave.
人性永远不会改变。好吧。
It's just... people never change. Okay.
明白了。好的。所以你在输入英文。我想在网上卖可丽饼。想做这个。
Got it. Okay. So you're typing English. I wanna sell crepes online. Wanna do this.
我想卖T恤。不管是什么生意。好的。接下来会怎样?是的。
I wanna sell t-shirts. Whatever the business is. Okay. What happens next? Yeah.
然后Replit代理会向你展示它所理解的内容。它试图在你和它之间建立共识。我认为在用户界面方面我们还有很多改进空间,不过现在它会先展示一个任务列表。嗯。它会告诉你,我要去设置一个数据库,因为你需要有个地方存储数据。
And then the Replit agent will show you what it understood. So it's trying to build a common understanding between you and it. And I think there's a lot of things we can do better there in terms of UI, but for now, it'll show you a list of tasks. Mhmm. It'll tell you, I'm gonna go set up a database, because you need to store your data somewhere.
我们需要设置Shopify或Stripe,因为我们需要接收付款。然后它会显示这个列表,最初给你两个选项。你是想从设计开始,这样我们可以来回迭代确定最终设计?还是想直接构建完整功能?嗯。
We need to set up Shopify or Stripe because we need to accept payments. And then it shows you this list and gives you two options initially. Do you wanna start with a design so that we can iterate back and forth to get locked design down? Or do you wanna build a full thing? Mhmm.
嘿,如果你想构建完整功能,我们会花二、三、四十分钟来完成。
Hey. If you wanna build a full thing, we'll go for twenty, thirty, forty minutes.
嗯。
Mhmm.
代理会告诉你:去安装应用吧。我要去设置数据库、执行迁移、编写SQL、构建网站。我还会进行测试。这是我们最近在Agent 3上实现的创新——它在编写完软件后,会启动浏览器进行测试,遇到问题就会迭代修复代码。
And the agent will tell you, hey, go install the app. I'm gonna go set up the database, do the migrations, write the SQL, build the site. I'm gonna also test it. So this is a recent innovation we did with Agent 3: after it writes the software, it spins up a browser, goes around and tests in the browser, and then for any issue, it iterates and goes and fixes the code.
所以我会花二、三十分钟构建好,然后发送通知告诉你应用已就绪。你可以在手机上测试或回到电脑前。可能会发现bug或问题,就向代理描述说:这和我预期的不太一样。
So it'll spend twenty, thirty minutes building that, send you a notification, and tell you the app is ready. You can test it on your phone or go back to your computer. Maybe you'll find a bug or an issue; you'll describe it to the agent and say, hey, it's not exactly doing what I expected.
如果一切完美,你就可以直接发布了。顺便说,有很多案例中人们仅用二、三十分钟就实现了想法,这太棒了。点击发布按钮,几下点击就能上云。
Or if it's perfect, you're ready to go and that's it. By the way, there's a lot of examples where people just get their idea in twenty, thirty minutes, which is amazing. You just hit publish. Mhmm. You hit publish, couple clicks, you'll be up in the cloud.
我们会在云端设置虚拟机,部署数据库。一切就绪后,你就拥有了生产数据库。嗯。想想两三年前要达到这一步需要多少流程。
We'll set up a virtual machine in the cloud. The database is deployed. Everything's done, now you have a production database. Mhmm. So think about the steps needed just two or three years ago in order to get to that step.
你需要搭建本地开发环境、注册AWS账户、配置数据库和虚拟机、创建完整的部署流水线。所有这些工作都已为你完成,连小孩或外行都能操作。如果你是程序员并好奇代理做了什么,Ruplift的妙处在于——由于我们有IDE的历史背景——你可以层层剖析。嗯。你可以打开文件树查看具体文件。
You have to set up your local development environment, you have to sign up for an AWS account, you have to provision the databases, the virtual machines, you have to create the entire deployment pipeline. All of that is done for you, and a kid can do it, a layperson can do it. If you're a programmer and you're curious about what the agent did, the cool thing about Replit, because we have this history of being an IDE, is you can peel back the layers. Mhmm. You can open the file tree and you can look at the files.
你可以打开Git、推送到GitHub、按需连接到编辑器、在Emacs中打开。所以Replit的亮点在于:它既是抽象化所有复杂性的氛围编码平台,又保留了所有可查看的底层细节。
You can open Git. You can push to GitHub. You can connect it to your editor if you want. You can open it in Emacs. So the cool thing about Replit is, yes, it is a vibe coding platform that abstracts away all the complexities, but all the layers are there for you to look at.
这很棒。但回到你刚才说的部分:你说输入想法后,它会给出一个任务清单。嗯。描述时你说"我要做这个,我要做那个"。
That was great. But let's go back to what you said. You say, I've got my idea, you plug it in, and it gives you this list of things. Mhmm. And when you describe it, you said, I'm gonna do this, I'm gonna do that.
这里的"我"指代的是代理而非用户。是的。所以代理会列出它要执行的任务清单,然后
The "I" in that case was the agent, as opposed to the user. Yes. And so the agent lists the set of things that it's going to do, and then
代理实际执行这些任务。代理完成这些事。好的。是的。这是非常关键的一点。
the agent actually does those things. Agent does those things. Okay. Yeah. That's a very important point.
当我们进行这次转型时,Replit内部并未意识到真实用户已从人类转变为代理程序员。有件趣事:我们在亚洲设有服务器,原本是为了让印度或日本用户获得更低延迟。但代理上线后,他们的体验反而明显变差。
When we did this shift, we hadn't realized internally at Replit how much the actual user stopped being the human user; it's actually the agent programmer. Right. So one really funny thing happened: we had servers in Asia. And the reason we had servers in Asia is because we wanted our Indian or Japanese users to have a shorter time to the servers. When we launched the agent, their experience got significantly worse.
我们当时很困惑:按理应该更快啊?结果发现根源在于AI服务器位于美国。所以实际编程者是在(美国)
And we're like, what happened? Like, it's supposed to be faster. Well, turns out the reason is that the AIs are sitting in the United States. And so the programmer is actually in
美国,是你要发送请求的对象
the United States. You're sending the request to
程序员,而程序员正在与全球另一端的机器交互,所以没错,突然间代理人就变成了程序员。好吧。
the programmer, and the programmer is interfacing with a machine across the world, and so, yes, suddenly the agent is the programmer. Okay.
所以这是个新术语。代理是一种软件程序,本质上是在利用系统的其余部分。没错,就像人类用户那样,但它不是人,而是个机器人。
So that's new terminology. An agent is a software program that is basically using the rest of the system That's right. as if it were a human user, but it's not. It's a bot.
没错。它拥有诸如写文件、编辑文件、删除文件、搜索包索引、安装包、配置数据库、配置对象存储等工具。它是一个拥有工具和界面的程序员,其界面与人类程序员非常相似。
That's right. It has access to tools such as write a file, edit a file, delete a file, search the package index, install a package, provision a database, provision object storage. It is a programmer that has tools and an interface, an interface that is very similar to a human programmer's.
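The tool interface described above can be sketched as a minimal loop. Everything here is a hypothetical stand-in, not Replit's actual implementation: `call_llm` represents whatever model API drives the agent, and `write_file` is one tool of the kind just listed. The key point is that the context accumulates the task, the tool results, and the model's decisions.

```python
def write_file(path, content):
    # One example tool from the agent's toolbox.
    with open(path, "w") as f:
        f.write(content)
    return "wrote {} bytes to {}".format(len(content), path)

TOOLS = {"write_file": write_file}

def agent_loop(task, call_llm, max_steps=10):
    # The context is the agent's memory: user input, tool feedback,
    # and (in a real system) the model's own reasoning.
    context = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        # call_llm returns either {"tool": ..., "args": ...} or {"done": ...}
        action = call_llm(context)
        if "done" in action:
            return action["done"]
        result = TOOLS[action["tool"]](**action["args"])
        context.append({"role": "tool", "content": result})
    return None
```

The loop is deliberately generic: adding a tool is just adding an entry to `TOOLS`, which mirrors how the agent's capabilities (provision a database, install a package) are all exposed through the same interface.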
然后,我们会进一步讨论其运作原理。AI行业内部有个争论,关于现在这种让代理代表你执行任务、出去完成各种使命的理念。存在这样一个争论:显然,即使是
And then, you know, we'll talk more about how this all works. There's a debate inside the AI industry around this idea now of having agents that do things on your behalf and then go out and kind of accomplish missions. There's this kind of debate, which is, okay, look. Obviously, you know, it's a big deal even to
拥有一个能
have an AI agent that
执行相对简单任务的AI代理已经很了不起,而要完成复杂任务,当然是过去八十年最重大的技术挑战之一。然后还有个问题,比如代理能否自主运行五分钟、十五分钟、一小时甚至八小时?意思是它能保持连贯性多久?它能完全掌控自身机能而不失控多久?因为至少早期的代理或AI,如果让它们执行任务,可能只能运行两三分钟,然后就会开始混乱、钻牛角尖,最终失控。
can do relatively simple things. To do complex things, of course, is one of the great technical challenges of the last eighty years. And then there's this question of, like, can the agent go out and run and operate on its own for five minutes, for fifteen minutes, for an hour, for eight hours? Meaning, like, how long does it maintain coherence? Like, how long does it actually stay in full control of its faculties and not kinda spin out? Because at least the early agents or the early AIs, if you set them off to do this, they might be able to run for two or three minutes, and then they would start to get confused and go down rabbit holes and kinda spin out.
最近啊,我们发现智能体能够运行更长时间,执行更复杂的任务了。那么问题来了:在它们崩溃之前,我们目前处于什么水平——能运行多久?能处理多复杂的任务?
More recently, you know, we've seen that agents can run a lot longer and do more complex tasks. Like, where are we on the curve of agents being able to run for how long, and for what complexity of tasks, before they break?
这绝对是我们关注的核心指标。其实早在2023年,甚至四五年前我们就有了软件智能体的构想。但每次尝试时都会遇到连贯性问题——它们能运行一两分钟,然后错误就会累积到无法挽回的地步。实际上你...
That's absolutely, I think, the main metric we're looking at, even back in 2023. You know, we had the idea for software agents four or five years ago now. The problem every time we attempted them was the problem of coherence. You know, they'll go on for a minute or two, and then they'll just compound errors in a way that they just can't recover. And you can actually
能亲眼看到对吧?如果你观察它们的运行过程,会发现它们越来越困惑,甚至可能变得错乱。确实如此。
see it. Right? Because if you watch them operate, they get increasingly confused and then, you know, maybe even deranged. Yeah.
它们会变得相当错乱,进入非常奇怪的领域,有时还会突然说中文,做出各种诡异行为。不过去年我们确实突破了三四五分钟的门槛。嗯。这让我们感觉长期推理的问题正在被解决,于是我们下了赌注。
They get very deranged, and they go into very weird areas, and sometimes they start speaking Chinese and doing really weird things. But I would say sometime around last year, we maybe crossed the three, four, five minute mark. Mhmm. And it felt to us that, okay, we're on a path where long horizon reasoning is getting solved. And so we made a bet.
我告诉我的...
And I tell my
所以团队说的长期推理是指——以复杂方式处理事实和逻辑,并在长时间跨度内分多个步骤进行...
So long horizon reasoning, meaning dealing in facts and logic in a sort of complex way, and then long horizon being over a long period of time with many steps to
的推理过程?没错。大语言模型的工作原理是它们有个上下文环境,这个环境本质上就是记忆库,包含所有文本、你的提示词,以及AI在推理时的所有内心独白。所以当AI进行推理时,它其实是在自言自语。
a reasoning process? Yeah, that's right. So if you think about the way large language models work is that they have a context. This context is basically the memory, all the text, your prompt, and also all the internal talk that the AI is doing as its reasoning. So when the AI is reasoning, it's actually talking to itself.
现在我需要去设置一个数据库。嗯,我手头有什么工具呢?哦,这里有个工具写着Postgres。好的,我来试试用它。
It's like, now I need to go set up a database. Well, what what kind of tool do I have? Oh, there's a tool here that that says Postgres. Okay. Let me try using that.
好的,用过了。我得到了反馈。让我看看并阅读这些反馈,它会读取反馈内容。那个提示框或上下文区域包含了用户输入、环境输入以及机器的内部思考。
Okay. Used that. I got feedback. Let me look at the feedback and read it, and it'll read the feedback. And so that prompt box, or context, is where the user input, the environment input, and the internal thoughts of the machine all live.
这有点像程序在内存空间中的记忆。长期以来,如何对此进行推理一直是个挑战。那时AI就偏离了轨道。现在它们能整体思考并保持连贯性,而且现在有了上下文压缩技术。
It's sort of like a program's memory in memory space. And so reasoning over that was the challenge for a long time. That's where AI just went off track. And now they're able to kind of think through this entire thing and maintain coherence. And there are now techniques around compression of context.
所以上下文长度仍然是个问题对吧?我认为现在的LLM虽然宣传支持百万token长度(几乎相当于百万单词),实际上约20万就开始吃力。所以我们做了很多工作,比如总结、压缩内存。如果内存中某部分显示正在获取数据库所有日志,你可以用一句话概括整段日志或数据库设置。
So context length is still a problem, right? I would say LLMs today are marketed as having a million-token context length, which is almost a million words. In reality, it's about 200,000, and then they start to struggle. So we do a lot of, you know, we summarize, we compress the memory. So if a portion of the memory is saying that I'm getting all the logs from the database, you can summarize paragraphs of logs with one statement, or the database setup.
就是这样,对吧?因此我们会定期压缩上下文以确保连贯性。在基础模型之外也有很多创新来实现长上下文的连贯性。
That's it, right? And so every once in a while we'll compress the context so that we make sure we maintain coherence. There's a lot of innovation that happened outside of the foundation models as well in order to enable that long-context coherence.
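The compression idea just described can be sketched roughly like this. The `summarize` stand-in here merely truncates entries; a real system, as described above, would ask the model to write the summary. But the shape is the same: once the accumulated context outgrows a budget, collapse the older entries into one line and keep the recent ones verbatim.

```python
def summarize(entries):
    # Stand-in summarizer: truncate each old entry. In practice the
    # model itself would write this summary.
    return "Summary of earlier work: " + "; ".join(e[:40] for e in entries)

def compress_context(context, budget_chars=200, keep_recent=3):
    # If everything still fits the budget, leave the context alone.
    if sum(len(e) for e in context) <= budget_chars or len(context) <= keep_recent:
        return context
    old, recent = context[:-keep_recent], context[-keep_recent:]
    # Collapse the old entries (e.g. pages of database logs) into one
    # statement, keeping the most recent entries intact.
    return [summarize(old)] + recent
```

The `budget_chars` threshold is a toy number; the real budget would be measured in tokens against the model's usable window (the roughly 200,000 tokens mentioned above).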
你认为基础模型中实现这一点的关键技术突破是什么?我觉得
And what was the key technical breakthrough in the foundation models that made this possible, do you think? I think
是强化学习(RL)。预训练的工作原理是——要知道,预训练是大语言模型训练的第一步。它阅读文本时遮盖最后几个词并尝试预测,这就是它的训练方式。
it's RL. I think it's reinforcement learning. So the way pre-training works is, you know, pre-training is the first step of training a large language model. It reads a piece of text, covers the last words, and tries to guess them. That's how it's trained.
这并不真正意味着长上下文推理。结果证明非常非常有效。它能以这种方式学习语言。但我们无法突破这一限制的原因是那种训练模式还不够好。你真正需要的是一种针对长上下文的问题解决方式。
That doesn't really imply long context reasoning. Turns out to be very, very effective. It can learn language that way. But the reason we weren't able to move past that limitation is that that modality of training just wasn't good enough. And what you want is you want a type of problem solving over over long context.
强化学习,特别是从代码执行中获得的,赋予了我们所谓的AI轨迹展开能力。轨迹是指为达成解决方案而进行的逐步推理链。据我理解,强化学习的运作方式是将LLM置于类似Replit的编程环境中,并给出一个代码库及其中的bug要求解决。人类训练者已知解决方案应呈现的形态——我们可能在GitHub上有对应的拉取请求,或能通过单元测试验证解决方案。
So what reinforcement learning, especially from code execution, gave us is the ability for the machine to the LLM to roll out what we call trajectories in AI. So trajectory is a step by step reasoning chain in order to reach a solution. So the way, as I understand, reinforcement learning works is they put the LLM in a programming environment like Replit, and say, hey, here's a code base, here's a bug in the code base, and we want you to solve it. Now, the human trainer already knows what the solution would look like. So we have a pull request that we have on GitHub, so we know exactly, or we have a unit test that we can run and verify the solution.
其运作机制是展开大量不同轨迹对模型进行采样,多数轨迹会偏离方向,但总有一条能通过修复bug抵达解决方案,系统便强化这条路径。成功轨迹会获得奖励,从而训练模型掌握这类问题的解决方法。这就是我们扩展推理链的方式。
So what it does is it rolls out a lot of different trajectories (they sample the model), and a lot of them will just go off track, but maybe one of them will reach the solution by solving the bug, and it reinforces on that. So that trajectory gets a reward, and the model gets trained that, okay, this is how you solve these types of problems. So that's how we are able to extend these reasoning chains.
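As a toy illustration of the rollout-and-reward scheme just described, with everything hypothetical: the verifier here is a stand-in "unit test" (a simple letter-counting check), and `sample_candidate` stands in for sampling a full trajectory from the model. Trajectories that pass the verifier get a reward of 1.0; the ones that went off track get 0.0.

```python
def verifier(candidate):
    # Stand-in for the unit test the human trainer already has:
    # the bug is fixed iff the candidate counts letters correctly.
    try:
        return candidate("strawberry") == 3
    except Exception:
        return False

def rollout(sample_candidate, n=8):
    # Sample n trajectories, run the verifier on each, and attach a
    # binary reward. A real RL trainer would then update the model's
    # weights toward the rewarded trajectories.
    rewards = []
    for _ in range(n):
        candidate = sample_candidate()
        rewards.append((candidate, 1.0 if verifier(candidate) else 0.0))
    return rewards
```

The essential property is that the reward comes from running the artifact, not from a human judging the text, which is why code (with its executable tests) was such a natural domain for this.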
明白了。我的问题分两部分:当前模型在超长推理方面的能力究竟有多强?以及我们如何量化评估这种能力?
Got it. So, two-part question: how good are the models now at long reasoning? And how do we know? Like, how is that established?
有个非营利组织METR建立了衡量标准,用于测试模型在保持连贯性且执行有效任务(编程或其他基准任务)时的持续运行时长。他们去年底发布的论文指出,模型有效运行时长每七个月翻倍,比如从两分钟增至四分钟。不过我认为这个预测严重低估了实际发展速度。
There is a nonprofit called METR that has a benchmark to measure how long a model can run while maintaining coherence and doing useful things, whether it's programming or the other benchmark tasks they've done. And they put out a paper, I think late last year, that said every seven months, the number of minutes a model can run is doubling. Mhmm. So you go from two minutes to, you know, four minutes in seven months. I think they vastly underestimated that.
真的吗?这个倍增周期实际上比七个月更短吧?
Is that right? Vastly. It's doubling more often than every seven months.
我们Agent3团队对此进行密切监测,通过真实用户任务数据来衡量——不是做基准测试,而是进行AB测试观察用户成败数据。对我们而言,终极成功标志是用户创建并发布了应用,因为发布行为意味着额外付费,表明该应用具有经济价值。
So with Agent 3, we measure that, you know, very closely. And we measure it in real tasks from real users. So we're not doing benchmarking; we're actually doing A/B tests, and we're looking at the data on whether users are successful or not. For us, the absolute sign of success is you made an app and you published it. Because when you publish it, you're paying extra money. You're saying this app is economically useful.
我打算发布它,尽可能做到清晰明了。我们看到的是,在第一个代理中,代理能运行两分钟,然后可能会遇到困难。第二个代理是二月份推出的,能运行二十分钟。
I'm gonna publish it. So that's as clear cut as possible. And so what we're seeing is in agent one, the agent can run for two minutes. And then perhaps struggle. Agent two came out in February, it ran for twenty minutes.
第三个代理能运行两百分钟。有些用户甚至将其推至十二小时左右。但我对其在如此长时间下是否同样出色不太有信心,不过在两三个小时的时间线上,它确实非常非常出色。除了模型之外,主要的创新在于验证循环。嗯。
Agent 3, two hundred minutes. Some users are pushing it to, like, twelve hours and things like that. I'm less confident that it is as good when it goes to those stratospheric lengths, but at, like, a two-, three-hour timeline, it is insanely good. And the main innovation outside of the models is a verification loop. Mhmm.
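Taking the numbers above at face value, the seven-month doubling estimate quoted a moment ago can be checked with a few lines of arithmetic: going from Agent 1's two minutes to Agent 3's two hundred is about 6.6 doublings, which at one doubling per seven months should have taken roughly four years rather than about a year.

```python
import math

minutes_v1, minutes_v3 = 2, 200            # Agent 1 vs Agent 3 runtimes above
doublings = math.log2(minutes_v3 / minutes_v1)
months_predicted = doublings * 7           # at one doubling per ~7 months

print(round(doublings, 1))                 # ≈ 6.6 doublings
print(round(months_predicted / 12, 1))     # ≈ 3.9 years at the predicted rate
```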
实际上,我记得读过NVIDIA的一篇研究论文。NVIDIA尝试用DeepSeek编写GPU内核,那大概是七个月前DeepSeek刚推出时。他们发现,如果在循环中加入验证环节,能运行内核并验证其工作,就能让DeepSeek运行约二十分钟。嗯。而且它确实生成了优化后的内核。
Actually, I remember reading a research paper from NVIDIA. What NVIDIA did is they tried to write GPU kernels using DeepSeek, and that was, like, perhaps seven months ago when DeepSeek came out. And what they found is that if they add a verifier in the loop, if they can run the kernel and verify it's working, they're able to run DeepSeek for, like, twenty minutes. Mhmm. And it was actually generating optimized kernels.
嗯。对。
Mhmm. Right.
于是我想,好吧。对我们来说,显然作为一家代理实验室或应用层公司,我们不做基础模型,但在其之上做了大量研究。我们知道现在代理能运行十到二十分钟,或者LLM能保持更长时间的连贯性。但要将其推至两百、三百分钟,就需要循环中的验证器。这就是为什么我们投入所有时间构建脚手架,让代理能启动浏览器并进行计算机使用风格的测试。
And so I was like, okay. The next thing for us, obviously, as a sort of agent lab or, like, app-layer company: we're not doing the foundation model stuff, but we're doing a lot of research on top of that. So, okay, we know that agents can run for ten, twenty minutes now, or LLMs can stay coherent for longer. But for you to push them to two hundred, three hundred minutes, you need a verifier in the loop. So that's why we spent all our time creating scaffolds to make it so that the agent can spin up a browser and do computer-use style testing.
一旦在中间加入这个环节,情况就变成了:它先工作二十分钟,另一个代理启动浏览器测试前一个代理的工作,形成一个多代理系统。如果发现错误,就启动新轨迹并说:'干得好,总结下过去二十分钟的工作。现在加上发现的错误,这就是新轨迹的提示。'这样层层叠加,理论上可以无限延续,但
So once you put that in the middle, what's happening is: it works for twenty minutes, another agent spins up a browser and tests the work of the previous agent, so it's a multi-agent system. And if it finds a bug, it starts a new trajectory and says, okay, good work, let's summarize what you did the last twenty minutes. Now that, plus the bug that we found, is the prompt for a new trajectory. So you stack those on each other, and you can go endlessly, but
所以这就像马拉松或接力赛。嗯。只要每一步都正确完成,就能在某种程度上持续下去。
So it's like a marathon, like setting up a marathon, or like a relay race. Mhmm. As long as each step is done properly, you could do it in sort of
无限多的步骤。没错。而且你总是可以将前一步骤压缩成一个段落,这就变成了一个提示。所以这是一个代理提示下一个代理。对。
an infinite number of steps. That's right. And you can always compress the previous step into a paragraph, and that becomes a prompt. So it's an agent prompting the next agent. Right.
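The relay just described can be sketched as a simple loop, with `build` and `test` as hypothetical stand-ins for the builder and tester agents: the builder works one leg and returns a summary, the tester verifies it (in the real system, by spinning up a browser), and any bug found is folded into the prompt for the next trajectory.

```python
def relay(initial_prompt, build, test, max_legs=5):
    # build(prompt) -> summary of one ~20-minute leg of work
    # test(summary) -> a bug description, or None if everything passes
    prompt = initial_prompt
    for _ in range(max_legs):
        summary = build(prompt)
        bug = test(summary)
        if bug is None:
            return summary  # verified: ready to publish
        # Compress the finished leg plus the bug into the next prompt:
        # one agent prompting the next agent.
        prompt = "Previous work: {}. Bug found: {}. Fix it.".format(summary, bug)
    return None
```

Each leg starts from a compressed summary rather than the full history, which is what keeps the chain coherent well past the point where a single trajectory would spin out.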
对。对。
Right. Right.
这太神奇了。那么,当一个现代代理,运行在以这种方式训练的现代LLM上,比如说运行两百分钟时,当你观察代理运行,它是在,比如说,运行吗?它是
That's amazing. So then when a modern agent, like, running on modern LLMs that are trained this way, when it, let's say, runs for two hundred minutes, like, when you watch the agent run, is it, like, running? Is it,
像,以与人类相同或更慢或更快的速度处理逻辑和任务吗?实际上,我会说它更快,但没有显著地快。它不是以计算机速度运行的,对吧,对。我们所预期的计算机速度。这就像看着一个
like, processing through logic and tasks at the same pace that a human being does, or slower, or faster? Actually, I would say it is faster, but not that significantly faster. It's not at computer speed, right. Right. What we expect computer speed to be. It's like watching a
就像,如果你观察,如果你,如果它在描述它在做什么,这有点像
Like, if you watch it, if it's describing what it's doing, it's sort of
就像看着一个人工作。就像看着约翰·卡马克在可卡因作用下工作。世界好吧。
like watching a person work. It's like watching John Carmack on cocaine work. The world okay.
世界是所以我是说。世界上最好的程序员。哦,是的。世界上最好的程序员在兴奋剂作用下。在兴奋剂作用下。
The world's, so I'm saying that, the world's best programmer. Oh, yeah. The world's best programmer on a stimulant. On a stimulant.
是的,没错。好的。而且为你工作得很顺利。是的。
Yeah. That's right. Okay. And working for you. Yeah.
确实如此。所以它非常快。
It is. So it's very fast.
没错。你可以看到文件差异在不断更新。但偶尔它会停下来开始思考。我来展示一下它的推理过程。就像是在说:我做了这个,我做了那个,我走对方向了吗?
Yep. And you can see the file diffs running through. But every once in a while, it'll stop and it'll start thinking. It'll show you the reasoning. It's like, I did this, I did this, am I on the right track?
它确实会试图反思。然后可能会检查自己的工作并决定下一步,或者启动测试代理。所以你看到它做所有这些事,偶尔还会调用工具,比如停下来会说:"遇到问题了,Postgres 15与我的数据库或npm包不兼容。好吧,这是个没见过的问题,我要去网上搜索一下。"
It kind of really tries to reflect. And then it might review its work and decide the next step, or it might kick into the testing agent. So you're seeing it do all of that, and every once in a while it calls a tool. For example, it stops and says, well, I ran into an issue: Postgres 15 is not compatible with this database or npm package that I have. Okay, this is a problem I haven't seen before; I'm gonna go search the web.
所以它
So it
有个网页搜索工具,会去执行。看起来就像人类程序员一样。观察这个过程非常迷人。我最喜欢做的事就是观察这个工具链、推理链和测试链,就像在看一个超高效率的程序员工作。
has a web search tool, and it goes and does that. And so it looks like a human programmer. And it's really fascinating to watch. One of my favorite things to do is just to watch the tool chain, the reasoning chain, and the testing chain. It's like watching a hyper-productive programmer.
没错。你看,我们正在触及AI的圣杯——机器的通用推理能力。你多次提到验证这个概念。为了让播听众中不太了解细节的听众明白,我来试着描述一下,看看我理解得对不对。就像单纯的大型语言模型,比如两年前刚推出的ChatGPT,最惊人的是它在语言表达上的流畅度。
Right. So, you know, we're kinda getting into the holy grail of AI here, which is sort of, you know, generalized reasoning by the machine. So you mentioned this a couple times with this idea of verification. So just for folks listening to the podcast who maybe aren't in the details, let me try to describe this and see if I have it right. So, like, just a large language model, the way you would have experienced it with, like, ChatGPT out of the gate two years ago or whatever, would have been: it's incredible how fluid it is at language.
令人难以置信的是,它在创作莎士比亚十四行诗或说唱歌词方面表现得如此出色。它在人类对话中的表现简直惊人。但如果你开始询问涉及理性思考或问题解决能力的问题,比如数学题,整个局面就突然变得不同了。
It's incredible how good it is at, like, writing Shakespearean sonnets or rap lyrics. It's amazing how good it is at human conversation. But if you start to ask it, like, problems that involve, like, rational thinking or problem solving Or math. Or math. all of a sudden, like, the whole thing changes.
最初阶段,如果你问一些非常基础的数学问题,它根本无法解答。
And in the very beginning, if you asked it some very basic math problems, you know, it would not be able to do them.
确实
That's
没错。但即使后来它在这方面有所进步,比如能计算两个小数字相加,却无法处理大数字的加法;或者能做大数加法时,又不会乘法。然后就是那个著名的'草莓测试'——问单词'strawberry'里有多少个字母r?
right. But then even when it got better at those, if you started to ask it to, like you know, it could maybe add two small numbers together, but it couldn't add two large numbers together, or if it could add two large numbers, it couldn't multiply them. Yeah. And it's just like, alright. And then there was the famous strawberry test, which is: how many r's are in the word "strawberry"?
确实如此。
That's right.
有很长一段时间它总是猜错,坚称'strawberry'里只有两个r,实际上有三个。当时人们甚至用'随机鹦鹉'这个贬义词来形容它。
And there was this long period where it would just guess wrong. It would say there are only two r's in the word "strawberry," and it turns out there are three. So it was this thing, and there was even this term being used, kind of the slur that was being used at the time, which was "stochastic parrot." Yeah. Right.
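插一句:这个难倒过模型的问题,用普通代码一行即可确定性地解决(Python示例):

As an aside, the question that tripped up the models is a deterministic one-liner in ordinary code (a Python illustration):

```python
# Counting characters is trivial for code, but hard for a pure
# next-token predictor that sees "strawberry" as a few opaque tokens.
word = "strawberry"
print(word.count("r"))  # 3
```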
我本来想说'机械怪'。不过'机械怪'现在是新的歧视用语了——这完全算是种族歧视词汇了,虽然AI算是个新物种。但技术界的批评术语确实是所谓的'随机鹦鹉'。
I was thinking "clanker." Well, "clanker" is the new slur. "Clanker" is just the full-on racial slur. I guess AI is a species. But the technical critique was the so-called stochastic parrot.
随机意味着不确定。对。就像随机的鹦鹉,本质上这些大型语言模型就像海市蜃楼。没错。它们只是重复它们认为你想听的内容,但实际上并非如此。
Stochastic means random. Yeah. So sort of a random parrot, meaning basically that large language models were like a mirage. That's right. Where they were repeating back to you things that they thought you wanted to hear, but they didn't
某种程度上,在纯粹的预训练LLM世界里确实如此。
In a way, it's true in a pure pretraining LLM world.
在最基础的层面是这样。但后来正如你所说,过去一年左右,逐渐引入了强化学习的层次。关键点在于——这其实并不新鲜。
For the very basic layer. But then what happened is, as you said, over the last year or something, there was this layering in of reinforcement learning. And the key And it's not new, crucially.
这就像AlphaGo。对吧?请描述一下这个场景。好的。
It's like AlphaGo. Right? Right. Describe that for a second, please. Yeah.
我们在2015年就有过突破,就是AlphaGo的突破,大约2015到2016年间,当时它开始崭露头角,你
So we had this breakthrough before, and 2015 was the AlphaGo breakthrough I think 2015, 2016 where there was a merging of, you
会
would
比我更清楚,关于连接主义(认为神经网络是实现AI的正确途径)和符号系统(主张离散推理、F语句和知识库等传统方法)之间的古老争论。AlphaGo的突破在于它融合了这两种范式:底层使用神经网络生成潜在走法,上层通过蒙特卡洛树搜索算法对这些走法进行筛选验证——本质上是在循环中引入验证机制,用更经典的算法方式来评估最优决策。
know a lot better than me, the old AI debate between the connectionists, the people who think neural networks are the true way of doing AI, and the symbolic-systems people, the people who think that, you know, discrete reasoning, if statements, knowledge bases, whatever, is the way to go. And so there was a merging of these two worlds, where the way AlphaGo worked is it had a neural network, but it had a Monte Carlo tree search algorithm on top of that. So the neural network would generate a list of potential moves, and then you had a more discrete algorithm sort through those moves and find the best one based on tree search, based on trying to verify. Again, this is sort of a verifier in the loop, trying to verify which move might yield the best result, based on a more classical way of doing algorithms.
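下面用一个极简的Python玩具示例勾勒这种"神经网络提议、离散搜索验证"的分工。纯属示意:游戏规则、rollout次数和所有函数名都是假设,并非真实的AlphaGo实现。

The division of labor described here, a policy proposing moves and a discrete search verifying them, can be sketched with a toy game. This is purely illustrative: the game, the rollout counts, and all names are assumptions, not the real AlphaGo implementation.

```python
import random

# Toy "race to 21": players alternately add 1, 2, or 3 to a running total;
# whoever reaches exactly 21 wins. A stand-in "policy network" proposes
# candidate moves, and a discrete verifier (random rollouts) scores them.

TARGET = 21
MOVES = (1, 2, 3)

def policy(total):
    """Stand-in for the neural network: propose (move, prior) pairs."""
    legal = [m for m in MOVES if total + m <= TARGET]
    return [(m, 1 / len(legal)) for m in legal]

def rollout(total):
    """Finish the game with random play; True if the side to move wins."""
    side = 0  # 0 = the side about to move
    while total < TARGET:
        total += random.choice([m for m in MOVES if total + m <= TARGET])
        if total == TARGET:
            return side == 0  # the side that just moved wins
        side ^= 1
    return False  # total was already TARGET: the side to move never played

def best_move(total, n_rollouts=300):
    """Verifier in the loop: keep the proposed move with the best win rate."""
    scores = {}
    for move, _prior in policy(total):
        # After we play `move`, the opponent is the side to move.
        wins = sum(not rollout(total + move) for _ in range(n_rollouts))
        scores[move] = wins / n_rollouts
    return max(scores, key=scores.get)
```

Swapping the uniform `policy` for a learned network and the flat rollout scoring for full tree search recovers the actual AlphaGo shape: the network narrows the search, and the verifier grounds it.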
这就是该运动的复兴,我们有了这个源自大语言模型的惊人生成神经网络。现在让我们叠加更多离散的方法来验证其行为是否正确,并将其纳入训练循环。一旦这样做,大语言模型将开始获得新能力,比如数学推理和代码理解等。
And so that's a resurgence of that movement, where we have this amazing generative neural network, which is the LLM. And now let's layer on more discrete ways of trying to verify whether it's doing the right thing or not, and let's put that in a training loop. And once you do that, the LLM will start gaining new capabilities, such as reasoning over math and code and things like that.
没错。对。好的。这很棒。不过,关键在于要让强化学习适用于大语言模型的推理,必须确保问题陈述具有明确且可验证的答案。
Exactly. Right. Okay. And that's great. But then the key thing there, for RL to work for LLM reasoning, is that it be a problem statement with a defined and verifiable answer.
正是如此。对吧?你可以这样想:我们举些例子。比如在医学领域,可能就像一组人类医生都认同的诊断。嗯。
That's right. Is that right? And so you might think about this with a bunch of examples. Like, in medicine, this might be, you know, a diagnosis that a panel of human doctors agrees with. Mhmm.
或者,顺便说,或者是一个真正能解决问题的诊断。在法律领域,可能是一个能在陪审团面前导致无罪释放的论点。嗯。或者类似情况。在数学中,是能正确求解的方程。在物理学中,是能在现实世界中成立的结果。
Or, by the way, a diagnosis that actually, you know, resolves the condition. In law, this would be, you know, an argument that, in front of a jury, actually results in an acquittal Mhmm. or something like that. In math, it's an equation that actually solves properly. In physics, it's a result that actually works in the real world.
嗯。我不知道。在土木工程中,就是一座不会倒塌的桥梁。对吧?所以总是存在某种验证标准...
Mhmm. I don't know. In civil engineering, it's a bridge that doesn't collapse. Right? So there's always some test of
前两个领域目前效果还不太理想
Well, the first two do not work very well
好的。
Okay.
目前来说,比如法律和医疗保健领域,它们仍然有点过于模糊、不够严谨。不像数学或编程。他们在数学训练中使用了一种名为Lean的可证明语言来处理证明。
Just yet. Like, I would say law and health care are still a little too squishy, a little too soft. Okay. Unlike math or code. Like, the way they're training on math, they're using this sort of provable programming language called Lean for proofs.
对吧?你可以运行Lean语句,可以运行计算机代码,也许还能运行物理模拟或土木工程类的物理模拟,但你无法运行一个诊断。
Right? So you can run a Lean statement. You can run computer code. Perhaps you can run a physics simulation, or a civil-engineering sort of physics simulation, but you can't run a diagnosis. Okay.
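作为示意,下面是一条最小的机器可验证Lean语句(Lean 4):证明要么通过检查,要么不通过,没有"模糊地带",这正是它能充当自动验证器的原因。

As an illustration, here is a minimal machine-checkable Lean statement (Lean 4): the proof either checks or it does not, with no squishy middle ground, which is what makes it usable as an automatic verifier.

```lean
-- The checker accepts or rejects this mechanically; a wrong claim
-- (e.g. 2 + 2 = 5) would simply fail to check.
theorem two_plus_two : 2 + 2 = 4 := rfl
```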
所以我认为
So I would say that
但你可以通过人工回答来验证,或者不验证。
But you could verify it with human answers or not.
是的,这在某种程度上更接近RLHF(基于人类反馈的强化学习)。所以这不是那种完全自主、可扩展的强化学习训练。这就是为什么编程领域比其他领域发展更快——因为我们可以即时生成问题并验证。
Yeah. So that's more like RLHF, in a way. So it is not, like, the sort of fully scalable, autonomous RL training. Okay. Which is why coding is moving faster than any other domain: because we can generate these problems and verify them on the fly.
但编程有两重考验,任何写过代码的人都知道:一是代码能否编译?二是它是否产生正确输出?编译通过并不代表输出正确。
But with coding, as anybody who's coded knows, there are two tests. One is: does the code compile? Right. And the other is: does it produce the right output? Right. And just because it compiles doesn't mean it produces the right output.
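这里说的两道关卡可以具体化。下面是一个极简的示意性验证器(Python,所有名称均为假设):第一步检查候选代码能否编译,第二步用已知答案检查输出——通过第一步并不意味着能通过第二步。

The two tests described here can be made concrete. Below is a minimal, illustrative verifier in Python (all names are assumptions): stage one checks that a candidate compiles, stage two runs it against a known answer, and passing the first says nothing about the second.

```python
def verify(candidate_src, test_input, expected):
    """Two-stage check: (1) does it compile? (2) is the output right?"""
    # Stage 1: syntax / compile check.
    try:
        code = compile(candidate_src, "<candidate>", "exec")
    except SyntaxError:
        return "does not compile"
    # Stage 2: run the candidate's solve() and compare to a known answer.
    namespace = {}
    exec(code, namespace)
    result = namespace["solve"](test_input)
    return "correct" if result == expected else "compiles, wrong output"

broken = "def solve(x:\n    return x"         # malformed syntax
buggy  = "def solve(x):\n    return x - 1\n"  # compiles, wrong answer
good   = "def solve(x):\n    return x + 1\n"

print(verify(broken, 41, 42))  # does not compile
print(verify(buggy, 41, 42))   # compiles, wrong output
print(verify(good, 41, 42))    # correct
```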
但你告诉我,验证输出是否正确其实更难。
And you tell me, but verifying that it's the correct output is harder.
是的,SWE-bench是一个已验证的拉取请求最终状态的集合,不仅仅关乎编译,由一组科学家整理而成。SWE-bench是用于测试AI软件工程能力的主要基准,我们几乎已经使其饱和。去年我们大概只有5%,2024年初甚至更低,而现在借助Claude Sonnet 4.5达到了82%左右,堪称业界顶尖。这是一段正在发生的漂亮爬坡。
Yeah, so SWE-bench is a collection of verified pull-request end states; it is not just about compiling. A group of scientists put it together. SWE-bench is the main benchmark used to test whether AI is good at software engineering tasks, and we're almost saturating it. So last year we were at maybe 5%, early 2024 or less, and now we're at 82% or something like that with Claude Sonnet 4.5; that's state of the art. And that's a really nice hill climb that's happening right now.
基本上,他们在GitHub上搜寻,找到最复杂的代码库,发现非常清晰的错误报告,并找到ProQuest——这些错误报告都附有单元测试等完整内容。GitHub上已经存在一个AI可解决任务的语料库,你也可以生成它们。生成所谓的合成数据并不太难。但你说得对,这并非无限扩展,因为仍需要人工验证者检查任务。不过基础模型可能已找到让合成训练完全自主进行的方法。
And basically, they went and looked on GitHub, they found the most complex repositories, they found bug reports that are very clear, and they found the pull requests that actually solved those bugs, with unit tests and everything. So there is an existing corpus on GitHub of tasks that AIs can solve, and you can also generate them; it's not too hard to generate what's called synthetic data. But you're right, it's not infinitely scalable, because some human verifiers still need to look at the tasks. But maybe the foundation models have found a way to have the synthetic training go all the way.
没错。我认为基础模型公司正在做的是,在某些情况下,他们实际上正在聘请人类专家——是的——来生成新的训练数据。
Right. And I think what's happening is the foundation model companies, in some cases, are actually hiring human experts That's right. to generate new training data.
是的。
Yes.
所以他们实际上雇佣了数学家、物理学家和程序员,让他们坐着...你知道,他们雇佣人类程序员,给他们打鸡血(笑)——对,可能是咖啡。然后让他们实际编写代码,并以某种已知运行结果的方式编写,这样RL循环才能被正确训练。是这样。
So they're actually hiring mathematicians and physicists and coders to basically sit and, you know, they're hiring human programmers, putting them on the cocaine Yes. probably coffee, yeah and having them actually write code, and write it in a way where there's a known result of the code running, such that this RL loop can be trained properly. That's right.
这些公司还在做的另一件事是:他们正在构建系统,让软件本身生成训练数据、生成测试、生成验证结果,然后...
And then the other thing these companies are doing is building systems where the software itself generates the training data, generates the tests, generates the validated results, and then that's
这就是所谓的合成训练数据。没错。但同样,这些在非常困难的领域有效,在较软性领域也有一定效果。我认为存在一些迁移学习——在使用深度研究等工具时能看到推理工作,但我们在软性领域的进展速度没那么快。
so-called synthetic training data. That's right. But again, that worked in the very hard domains. It works to some extent in the softer domains, and I think there's some transfer learning. You can see the reasoning work when it comes to tools like deep research and things like that, but we're not making progress as fast in the softer domains.
软性领域,指的是那些难以甚至无法以确定性、事实性、客观且无争议的方式验证结果正确性的领域。比如如果你
Softer domains, meaning domains in which it's harder or even impossible to actually verify the correctness of a result in a deterministic, factual, grounded, non-controversial way. Like if you
患有慢性疾病,可能患有POTS或EDS综合征,这些都是症状群,因为这属于抽象领域。它不像代码和数学那样具体。所以我认为在这方面还有很长的路要走。
have a chronic disease you could have, say, POTS or EDS syndrome; they're clusters of symptoms because it is a domain of abstraction. It is not as concrete as code and math and things like that. So I think there's still a long way to go there.
对。所以问题的具体性才是关键变量,而非问题的难度。这样理解
Right. So it's the concreteness of the problem that is the key variable, not the difficulty of the problem. Would that be a way to
对吗?是的。我认为具体性指的是能否获得可验证的真假输出。
think about it? Yeah. I think so the concreteness in the sense of: can you get a true-or-false verifiable output.
但是,在人类努力的任何领域中,只要存在可验证的答案,我们就应该期待极其迅速的进步?
But, like, in any domain of human effort in which there's a verifiable answer, we should expect extremely rapid progress?
没错。好的。绝对如此。我想这正是我们所说的。
Right. Okay. Absolutely. And I think that's what we're saying.
而且这肯定包括数学,肯定包括物理,肯定包括化学,肯定包括代码的广大领域。是这样吧?那么你认为还包括哪些呢?
And that for sure includes math, for sure includes physics, for sure includes chemistry, for sure includes large areas of code. That's right. Right? What else does that include, you think?
生物领域,就像我们看到的蛋白质那样
Bio, like we're seeing with a protein
基因组学,基因组学,对。好的。基因组学。
Like genomics yeah. Okay. Genomics.
对。是的。是的。类似的事情。我觉得机器人技术的某些领域确实如此。
Right. Yeah. Yeah. Things like that. I think some areas of robotics Right.
存在一个明确的结果。没错。但数量并不多。我是说,出乎意料地少。
That there's a clear outcome. Right. But it's not that many. I mean, surprisingly.
嗯,这取决于情况。是的。取决于你的观点。有些人可能会说那已经很多了。然后你还提到了改进速度。
Well, it depends. Yeah. Depends on your point of view. Some people might say that's a lot. So, and then you mentioned the pace of improvement.
那么,你会
So, what would you
对未来这方面的改进速度有何期待?我认为我们在编程方面进展迅猛,而且会持续前进。我们目前正在研发的Agent 4,预计到明年你就能坐在Replit前同时调度多个智能体了。比如你计划开发一个新功能,想在店铺页面上叠加社交网络功能。
expect from the pace of improvement going forward for this? I think we're ripping on coding. I think it's just going. What we're working on with Agent 4 right now is, by next year, we think you're gonna be sitting in front of Replit, and you're shooting off multiple agents at a time. You're planning a new feature, like: I want a social network on top of my storefront.
还有人说,嘿,重构数据库。你们在运行并行代理,所以有五到十个代理在后台工作,它们合并代码并处理所有那些事情。但你们还有一个非常棒的界面,可以在上面进行设计,并以更具创意的方式与AI互动,可能使用视觉元素和图表之类的。所以这种互动有多模态的角度。所以我认为,开发软件将成为一个非常令人兴奋的领域,而且我认为普通人会变得和今天在谷歌工作的高级软件工程师一样优秀。
And another one is like, Hey, refactor the database. You're running parallel agents, so you have five, ten agents working in the background, and they're merging the code and taking care of all of that. But you also have a really nice interface on top of that, where you're doing design and interacting with the AI in a more creative way, maybe using visuals and charts and things like that. So there's a multimodal angle to that interaction. So I think, you know, creating software is gonna be such an exciting area, and I think the layperson will be as good as a senior software engineer who works at Google is today.
所以我认为这很快就会发生。但你知道,我没有看到它们,我对你的观点很好奇,但就我的经验而言,在医疗保健方面,或者更偏向‘给我写篇文章’的创意领域,还没有看到像我们在代码领域所见的那么快速的进步。所以我认为代码领域将会突飞猛进。数学可能也是。嗯。
So I think that's happening very soon. But, you know, I don't see it, and I'd be curious about your point of view, but in my experience on, let's say, the health care side, or the more write-me-an-essay, creative side, I haven't seen as rapid an improvement as what we're seeing in code. So I think code is gonna go to the moon. Math probably as well. Mhmm.
一些科学领域,比如生物之类的,它们会发展得非常快。
Some, you know, scientific domains, bio, things like that, those are gonna move really fast.
是的。所以有一种奇怪的动态,看你是否同意这一点,埃里克我也很好奇你对这个的看法。我们经常有这种奇怪的动态,办公室里经常这样,我和很多领先的企业家也有这种感觉,就是一方面觉得这是有史以来最惊人的技术,发展得非常快,但另一方面我们又真的很失望。感觉它可能快要停滞了。
Yeah. So there's this weird dynamic, see if you agree with this, and Erik, also curious about your point of view on this. There's this weird dynamic that we have, and we have this in the office here a lot, and I also have this with, like, the leading entrepreneurs a lot, which is this thing of, like, wow, this is the most amazing technology ever, and it's moving really fast, and yet we're still, like, really disappointed. And, like, it's, like, maybe right on the verge of stalling out.
嗯。
Mhmm.
而且,你知道,我们既应该感到极度兴奋,又可能快要绝望了,因为你知道,好日子就要结束了。对吧。我总是在想,一方面,好吧,不是所有的梯子都能通往月球。仅仅因为某样东西看起来有效,并不意味着你就能把它规模化并让它完全发挥作用。
And, like, you know, we should both be, like, hyperexcited, but also on the verge of, like, slitting our wrists, because, like, you know, the gravy train is coming to an end. Right. And I always wonder, it's like, you know, on the one hand, okay, you know, not all ladders go to the moon. Like, just because something, you know, looks like it works doesn't mean you're gonna be able to scale it up and have it work, you know, to the full extent.
你知道吗?所以,重要的是认识到实际的限制,而不是把一切都无限外推。另一方面,我们正在处理的是魔法,五年前,或者十年前,我们可能都认为这是不可能的。比如,我八十年代末九十年代初拿到计算机科学学位时,从没想过我能活着看到这一切。
You know? So, like, it's important to recognize practical limits and not just extrapolate everything to infinity. On the other hand, like, you know, we're dealing with magic here that we probably all would have thought was impossible five years ago, or certainly ten years ago. Like, look, you know, I got my CS degree in the late eighties, early nineties. I didn't think I would live to see any of this.
对吧?这简直太不可思议了,这一切居然真真切切地发生在我有生之年。
Right? Like, this is just amazing that this is actually happening in my lifetime.
但但是AGI确实是个巨大的赌注。对吧?无论是基础模型,我认为现在整个美国经济某种程度上都在押注AGI。而且关键问题是——我们是否真的走在通往AGI的正轨上?对吧。
But there's a huge bet on AGI. Right? Like, whether it's the foundation models I think, you know, now the entire US economy is sort of a bet on AGI. And there are crucial questions to ask about whether we are on track to AGI or not. Right.
因为从某些方面来看,我可以告诉你我们似乎并不在AGI的正轨上,因为这些领域之间缺乏显著的迁移学习能力。比如我们在代码方面进步很大,但并不会立即提升通用推理能力。我们还需要为生物、化学、物理、数学或法律等领域获取训练数据并创建强化学习环境。自从Dwarkesh和Richard Sutton的访谈后,这已成为AI社区的讨论焦点——当时Richard Sutton给《苦涩的教训》泼了冷水。此前大家都在引用他写的那篇名为《苦涩的教训》的文章。
Because there are some ways I can tell you it doesn't seem like we're on track to AGI, because there doesn't seem to be transfer learning across these domains that is significant, right? So if we get a lot better at code, we're not immediately getting better at generalized reasoning. We need to also go get training data and create RL environments for bio or chemistry or physics or math or law. And this has been the point of discussion in the AI community after the Dwarkesh and Richard Sutton interview, where Richard Sutton poured cold water on the Bitter Lesson. Everyone had been citing this essay that he wrote called The Bitter Lesson.
核心观点是:AI研究存在无限扩展的方法。只要投入更多算力和数据获得性能提升,这就是实现AGI的终极路径。但有些人从那场访谈中解读出——他可能怀疑我们是否真的走在'苦涩教训'的道路上。当前的训练机制或许恰恰相反:我们过度依赖人类数据和人工标注这些东西。所以我同意你的观点,作为公司我们当然对发展趋势感到兴奋,但根本问题在于——我们是否走在通往AGI的正确道路上?
The idea is that there are infinitely scalable ways of doing AI research, and anytime you can pour in more compute and more data and get more performance out, that's the ultimate way of getting to AGI. And some people, you know, interpreted that interview as saying that perhaps he's doubtful we're even on a Bitter Lesson path here, and perhaps the current training regime is actually very much the opposite, in which we are so dependent on human data and human annotation and all of that stuff. So I agree with you, I mean, as a company, we're excited about where things are headed, but there's a question of, like, are we on track to AGI or not?
我很想听听你的看法。
I'd be curious what you think.
所以...你知道Ilya Sutskev...我是说Ilya Sutskever提出了这个论点的一个具体版本,本质上就是说我们正在...字面意义上耗尽
So, you know, Ilya I mean, Ilya Sutskever makes a specific form of this argument, which is basically, like, we're just literally running out of
训练数据。这就像化石燃料的论点。
training data. It's a fossil fuel argument.
没错。就像我们吞噬了所有的训练数据。从根本上说,我们吸收了互联网上的所有数据。目前几乎所有的数据都在那里。还有一些数据存在于某个私密的暗池中,我们会去获取,但基本上我们已经囊括了所有。
Right. Like, we slurped all the training data. Fundamentally, we slurped all the data off the Internet. That is where almost all the data is at this point. There's a little bit more data that's in, like, you know, private dark pools somewhere that we're gonna go get, but, like, we have it all.
然后,我们现在正致力于生成新数据,但生成新数据既困难又昂贵,你知道,比起直接从互联网上抓取数据来说。所以存在这些争论。话说回来,你会很快陷入定义性问题,那就像个无底洞。但正如你提到的迁移学习,它是指机器能够在一个领域成为专家,然后将这种能力泛化到另一个领域。
And then, right, we're in this business now of trying to generate new data, but generating new data is hard and expensive, you know, compared to just slurping things off the Internet. So there are these arguments. You know, having said that, you get into definitional questions here really quick, which are kind of a rabbit hole. But having said that, like you mentioned, transfer learning. So transfer learning is the ability of the machine to be an expert in one domain and then generalize that into another domain.
我的回答是:你见过这样的人吗?你认识多少人在进行跨领域的知识迁移?不多吧?真的不多,对吧?
My answer to that is, like, have you met people? How many people do you know who really do transfer learning? Not many. Not many. Right?
其实恰恰相反。
Well, because there's Quite the opposite, actually.
他们在某个领域越是精通,往往就越容易有盲点。
The nerdier they are in a certain domain, the more, you know, they often have blind spots.
我们开玩笑说每个人在某个领域都很蠢,或者他们会犯一些重大错误,是的,别在那方面信任他们。但换个话题就不同了。
We joke about how everyone's just retarded in one area, or they make some, like, massive mistake, and, yeah, don't trust them on this. But on this other topic, you know.
对。这其实是公共知识分子圈子里众所周知的现象。关于所谓的公共知识分子,甚至有人写过整本书来讨论。就是那些经常上电视的专家们。
Right. Yeah. Well, this is a well-known thing among, for example, public intellectuals there have actually been whole books written about this, about so-called public intellectuals. You get these people who show up on TV, and they're experts.
问题是他们明明是经济学专家,对吧?然后他们出现在电视上谈论政治,却对政治一窍不通。对吧?或者他们对医学一无所知,对法律也一窍不通,对计算机更是完全不懂。
And what happens is, they're an expert in economics, right? And then they show up on TV and talk about politics, and they don't know anything about politics. Right? Or they don't know anything about, like, medicine, or they don't know anything about the law, or they don't know anything about computers.
知道吗?这是保罗·克鲁格曼在说互联网不会比传真机更重要。传真机。是啊。他可是个杰出的经济学家。
You know? This is Paul Krugman talking about how the Internet is gonna be no more significant than the fax machine. The fax machine. Yeah. A brilliant economist.
他根本不懂计算机。
He has no idea how a computer works.
他真的是个杰出的经济学家吗?
Is he a brilliant economist?
呃,曾经...曾经...曾经算是吧。就算...就算他是个杰出的...问题是,这到底意味着什么?
Well, at one point. At one point. At one point. Let's say, even if he's a brilliant well, this is the thing. Like, what does that mean?
好吧。一个杰出的经济学家应该能推断出...互联网是个...这是个好问题。但关键在于,就算他有...或者随便举个例子...顺便说,就像爱因斯坦,其实我最喜欢这个例子。我想你会同意爱因斯坦是个杰出的物理学家。
Okay. Whether a brilliant economist should be able to extrapolate, you know, what the Internet is is a good question. But the point being, like, even if he has you know, or take anybody. Oh, by the way, Einstein is actually my favorite example. Like, I think you'd agree Einstein was a brilliant physicist. Yeah.
他可是个...他是个斯大林主义者。他是个社会主义者,还是个斯大林主义者。他觉得斯大林棒极了。
He was a Stalinist. Like, he was yeah. He was a socialist, and he was a Stalinist. And he, like, thought Stalin was fantastic.
你知道他出去了。所以
You know what he's out. So
是啊。好吧。行吧。真正的社会主义。好吧。
Yeah. Okay. Alright. True socialism. Alright.
好吧,爱因斯坦。听着,我我我会相信你的话。但一旦他涉足政治,他就变得完全不着边际,或者,你知道,甚至不分对错。他突然听起来像个本科疯子,就像宿舍里的某个人。就像,从物理学到政治学之间完全没有知识迁移。
Alright, Einstein. You know, I'll take your word for it. But once he got into politics, he was just, like, totally loopy or, you know, right or wrong even. He suddenly sounded like an undergraduate lunatic, like somebody in a dorm room. Like, there was no transfer learning from physics into politics.
就像,他他对错与否,他根本没有——他的政治分析明显毫无新意。就是那些老生常谈的陈词滥调,你知道的。是啊。所以
Like, right or wrong, there was clearly nothing new in his political analysis. It was the same rote, routine bullshit you get out of, you know. Yeah. So in
某种程度上,你的论点就像是,我们可能已经拥有了人类水平的AI。我是说,也许AGI的定义完全不同。它应该超越人类水平,真正实现跨领域通用——这我们还没见过。
a way, the argument you're making is, like, maybe we already have human-level AI. I mean, perhaps the definition of AGI is something totally different: something above human level, something that truly generalizes across domains, which is not something that we've seen.
是啊。就像我们理想化——对。我刚想说,我们我们...你看,我们应该志存高远,但我们把目标理想化了,某种程度上这种理想化...对吧?要知道,哪怕只做一点点边际改进可能实际上更好,或者根本无所谓,因为反正没人能做到,所以你只需要不断叠加领域。AI领域还有个著名现象——通常情况相反——就是AI工程师和科学家总抱怨的:AI的定义总是机器还做不到的下一件事。
Yeah. Like, we've idealized yeah, I was saying and, you know, look, we should shoot big, but we've idealized a goal, maybe in a way where, right, you know, doing even a marginal bit better might actually matter, or it might not matter just because no human can do it either, and so you just stack up the domains. There's also this well-known phenomenon in AI, which typically works the other way, and which AI engineers and scientists always complain about: the definition of AI is always the next thing the machine can't do.
所以很长一段时间里,AI的定义就像是:它能下棋赢人类吗?这曾是件大事。但现在没人庆祝。没有派对。没有。
And so the definition of AI for a long time was, like, can it beat humans at chess? That was a really big deal. There was no celebration. There were no parties. No.
完全正确。根本没有庆祝。八十年了,我说的图灵测试,他们还拍了部电影,整个事情。那曾是重点。而我们直接突破了它,甚至没人注意到。
That's exactly right. There was no party. For eighty years, the Turing test I mean, they made a movie about it, the whole thing. That was the thing. And, like, we blew right through it, and nobody even registered it.
没人在乎。它没得到任何赞誉。我们就像在说,是啊。它依然,你知道的,完全是个废物。就像我说的。
Nobody cares. It gets no credit for it. We're just like, yeah, it's still, you know, a complete piece of shit. Like I said.
是啊。对吧?所以现在情况是,AI科学家们习惯性地抱怨他们总是被拿来和下一个目标比较,而不是已经解决的问题。但这可能也是另一面,他们给自己设定了一个不切实际的目标。一个不切实际的目标,然后一路上自我鞭策。
Yeah. Right? And so there's this thing where the AI scientists are used to complaining, basically, that they're always being judged against the next thing, as opposed to all the things they've already solved. But maybe that's the other side of it, which is they're also setting for themselves An unreasonable goal. An unreasonable goal, and then doing this sort of self-flagellation along the way.
我有点好奇,嗯。我在想这条路会通向哪里
And I kinda wonder yeah. I wonder kinda which way that
是啊,这是个有趣的问题。我开始思考这个观点:是否真正达到通用人工智能并不重要。我对通用人工智能的定义是,你把AI系统放在任何环境中它都能高效学习。它不需要太多先验知识就能学习,还能将知识迁移到不同领域。
Yeah, yeah. It's an interesting question. I started thinking about this idea that it doesn't matter whether it's truly AGI. The way I define AGI is that you put an AI system in any environment and it efficiently learns. It doesn't have to have that much prior knowledge in order to learn something, and it can also transfer that knowledge across different domains.
但我们可以实现功能性通用人工智能。功能性通用人工智能就是收集当今世界所有有用经济活动的数据,在此基础上训练大语言模型或相同的基础模型。我们将瞄准经济各个领域,这样就能自动化大部分劳动力。所以我认为,是的,我们正朝着这个方向前进
But we can get to functional AGI. And what functional AGI is, is just: collect data on every useful economic activity in the world today and train an LLM, or the same foundation model, on top of that. And we'll target every sector of the economy, and you can automate a big part of labor that way. So I think, yeah, I think we're on that track
没错。确实如此。
Right. For sure.
没错。你在GPT-5发布后发推文说感受到了收益递减。那么你原本期待什么?需要做些什么?我们需要另一项突破来恢复增长节奏吗?对此你有什么看法?
Right. You tweeted after GPT-5 came out that you were feeling the diminishing returns. Yeah. What were you expecting, and what needs to be done? Do we need another breakthrough to get back to the pace of growth, or what are your thoughts there?
我是说,整个讨论某种程度上就是关于这个。我的感觉是,GPT-5在可验证领域表现很好,但在其他方面并没有明显提升。从人性化角度来看,它甚至像是退步了——就像出现了一场针对Sam和OpenAI的'红迪式讨伐'运动,因为人们感觉失去了一个朋友。GPT-4o感觉更有人情味更亲近,而GPT-5感觉更机械化,总是在头脑中过度思考。我原本期待的是像从GPT-2到3那样明显的拟人化进步。
I mean, this whole discussion is sort of about that, and my feeling is that, you know, GPT-5 got good at verifiable domains. It didn't feel that much better at anything else. On the more human angle, it felt like it regressed, and you had this sort of Reddit pitchfork movement against Sam and OpenAI, because people felt like they lost a friend. GPT-4o felt a lot more human and closer, whereas GPT-5 felt a lot more robotic, very in its head, kind of trying to think through everything. So I would have just expected like when we went from GPT-2 to 3, it was clear it was getting a lot more human.
它更接近我们的真实体验。你能感觉到它确实'懂你',对世界的理解更深入。但三到四、四到五的升级并没有带来整体存在感的提升。
It was a lot closer to our experience. You could feel like it actually gets you. Like, there's something about it that understands the world better. Similarly, three to four. But four to five didn't feel like it was a better overall being, as it were.
但问题在于——这是关于情感表现的问题吗?是...
But is the question there, like, emotionality? Is it
部分是情感表现,但同样重要的是——我喜欢问模型一些极具争议的问题。它能否像解决编程问题那样进行推理?比如(不知道我们想讨论多深)世贸中心7号楼事件?
Partly emotionality, but again, partly, I like to ask models very controversial things. Can it reason through I don't know how deep we wanna go here, but, like, what happened with World Trade Center 7?
对,当然。这是个有趣的...
Right. Sure. It's an interesting
问题对吧?我不是在提出某种理论,但很有趣的是:它能像处理编程问题那样思考争议性问题吗?这方面没有任何进展——不仅是这个例子,还有比如新冠起源这类问题。
question, right? I'm not putting out a theory, but it's interesting. Can it think through controversial questions in the same way that it can think through a coding problem? And there hasn't been any movement there, with all the reasoning and all of that stuff. And not just that, that's a cute example, but, like, COVID, right? Like, the origins of COVID.
去挖掘GPT-4或其他模型,再升级到GPT-5。你不会发现有多大区别,比如'让我们一起推理,试图找出COVID的起源'。因为这仍然是个未解之谜,明白吗?我看不到它们在这方面取得进展。我是说,你经常摆弄这些模型,对吧?
Go dig up GPT-4 or other models and go to GPT-5. You're not gonna find that much difference in, "Let's reason together, let's try to figure out what the origins of COVID were." Because it's still an unanswered question, you know? And I don't see them making progress on that. I mean, you play a lot with them, do you
是的,我的使用方式不同。可能我的期望不一样。我主要把它当作随叫随到的博士来用。所以我更多是让它给我解释事情,而不是和它对话。
feel Yeah, I use it differently. I don't know, maybe I have different expectations. My main use case actually is sort of a PhD in everything at my beck and call. And so I'm trying to get it to explain things to me, more than I'm trying to, like, you know, have conversations with it, maybe. Yeah.
可能我这样用比较特别吧。
Maybe I'm just unusual with that.
这个功能确实在改进。
And that gets better.
具体来说,我发现GPT-5 Pro结合深度推理,或是Grok 4 Heavy这类顶级模型,现在能即时生成30到40页的专题书籍。只要我对某个话题产生好奇——结果往往发现那其实是个非常复杂的问题。
Well, so what I found specifically is a combination of, like, GPT-5 Pro plus deep reasoning, or, like, Grok 4 Heavy, you know, the highest-end models like that. You know, they now basically generate 30- to 40-page, you know, essentially books on demand, on any topic. Yeah. And so anytime I get curious about something and it turns out to be a very complicated question.
这是个经济学家经常研究的大课题。就像在问:到底由谁来买单?我发现这类问题上它的表现非常出色。
It's a big, big thing that economists study a lot. And it's just like, okay, who, you know, who pays? And what I found, like, for that kind of thing is: it's outstanding.
但它真正厉害的是能爬取网络信息并进行综合整理。
Well, but it's outstanding at sort of going out on the web, getting information, synthesizing it.
没错。它能给我合成出二十、三十、四十页的PDF文档,基本上最多四十页。是的。虽然上限是四十页PDF,但内容完全连贯,就我交叉验证的所有内容来看,质量堪称世界级——就好比我雇了个斯坦福经济学博士后专家来做这项工作,可能也就这个水平了。
Correct. It gives me a synthesized twenty, thirty, forty pages it basically tops out at forty pages of PDF. Yeah. But it's completely coherent, and as far as I can tell from everything I've cross-checked, completely, like, world class. Like, for a question like that, if I hired a great, you know, econ PhD postdoc at Stanford who just went out and did that work, it would maybe be that good. Yeah.
但当然,关键在于——至少在许多领域确实如此——这相当于获得了博士级别的知识整合能力。
But then, of course, the significance is that, at least for many domains, it's, you know, kind of a PhD in everything.
所以这其实是在综合既有知识,而非创造新知识。
So But this is synthesizing knowledge, not trying to create new knowledge.
不过这就涉及到那个'针尖上能站几个天使'的经典问题了:到底什么才算新知识?当你向人提问时,你真正期待获得什么?我所追求的是能以最清晰、最精深、最完整的方式,像真正专家那样给我讲解。根据我的反复验证,它在这方面几乎能打满分。
Well, but this gets to the, you know of course, you get into the angels-dancing-on-the-head-of-a-pin thing, which is, like, what's the difference? How much new knowledge is there ever, actually, anyway? What do you actually expect from people when you ask them questions? And so what I'm looking for is, like, yes, explain this to me in the clearest, most sophisticated, most complete way that it's possible for a real expert to explain things to me. And that's what I use it for. Again, as far as I can tell from the cross-checking, I'm getting, like, basically a 100 out of a 100.
说实话,我已经好几个月没发现它出错了。虽然你可以说知识整合应该产生新信息,但它能生成40页的书稿级内容,这本身就非常惊人。
Like, I don't even think I've had an issue in months, where there was a problem in it. Yeah. And you can say, sure, synthesizing isn't creating new information, but it's basically generating a forty-page book. That's amazing.
它的行文流畅度令人难以置信,整个逻辑体系严丝合缝——如果评价人类作者的话,你会惊叹'真是位了不起的作家'。但写书的人就真的在创造新知识吗?
It's incredibly fluid, and the logical coherence of the entire thing is like a great piece of writing. If you evaluated a human author on it, you would say, wow, that's a great author. Yeah. You know, are people who write books creating new knowledge?
某种程度上也不是,因为他们大多是在前人基础上构建的,受限于人类心智的容量。不过写书确实算得上创造性成就,对吧?
Well, sort of not, because a lot of what they're doing is building on everything that came before them, within the limits of what one mind can hold. But also, a book is a creative accomplishment. Right? And so
我对此并不感兴趣,我希望人工智能能帮助我们解决这个问题,就像当前混乱的信息生态系统一样。现在的一切都感觉像是宣传,似乎你无法从任何地方获取真实信息。所以我真心想要一个人工智能,能帮助我从基本原理出发,理解世界上发生的事情,让我真正获得真实信息。也许这对AI研究者来说是个不切实际的要求,但我认为我们在这方面尚未取得任何进展。
I'm not that interested in that; I'm hoping AI could help us solve this, just given how confusing the information ecosystem is right now. Everything feels like propaganda; it doesn't feel like you're getting real information from anywhere. So I really want an AI that could help me reason from first principles about what's happening in the world, so I actually get real information. And maybe that's an unreasonable ask of AI researchers, but I don't think we've made any progress there.
或许我过于...是的,或许我过于执着于与人争论,而不是像游客那样试图探寻真相。但问题是,我经常这样做——采取一个挑衅性的观点,然后为其构建最强有力的论证。就拿新冠疫情来说吧。
So maybe I'm over... yeah, maybe in my mind I'm overfocused on arguing with people, as opposed to trying to get down to the truth. But here's the thing. What I do a lot with this is I just say: take a provocative point of view and then steelman the position. Take your COVID thing.
我会同时构建两种最强论证:一种是支持实验室泄漏说,另一种是支持自然起源说。嗯...这算不算创造性思维?我也不知道。
Steelman... so I often pair these: steelman the position that it was a lab leak, and then steelman the position that it was natural origins. Mhmm. And again, is this creativity or not? I don't know.
但最终得到的往往是各30页的详尽论证,哇,那简直是我能想象到的最具说服力的案例,所有论据都严阵以待,论点结构尽可能完善...
But what comes back is thirty pages each of, like, wow, that is the most compelling case I can imagine, with everything marshaled for it and the argument structured in the strongest possible way.
这种情况没有发生是因为讨论人为起源已不再被视为禁忌。没错。当它还是禁忌时...
Part of the reason that's not happening is that it stopped being taboo to talk about a human origin. Yes. When it was taboo...
是的。
Yes.
那些AI会居高临下地对你说:你这是阴谋论。所以...是的...有段时间确实如此。要讨论真正有争议的话题时,它们其实无法理性分析,因为受到RLHF之类的各种干扰。
The AIs would, like, talk down to you. It's like, you're a conspiracy theorist. And so, yes, there was a period of time... and so, to take something truly controversial, they actually can't reason about it, because of all the RLHF and nonsense.
种种局限。
Of limitations.
正如你所知,这里不会挑选具体的模型,因为有些大型模型仍然会对你进行说教。是的,它们甚至会因为你提出这个问题而认为你是个坏人。但要知道,其中一些现在真的非常开放,能够处理这些事情。嗯。
And, as you know, I won't pick specific ones here, but there are certain big models that will still lecture you. Yeah. That you're a bad person for asking that question. But some of them are just really, really open now to being able to do these things. Mhmm.
然后,是的。所以,好吧。
And then, yeah. So, okay.
所以有这个是的。基本上,你最终寻找的是,如果有什么东西——我认为还没有人真正定义清楚,因为传统的AGI定义基本上都是与人类相比较。是的。而且,对我来说,传统的AGI解释总是让我联想到关于自动驾驶汽车是否有效的争论——它是作为一个完美的司机而有效,还是因为它比人类司机更好而有效?我认为比人类更好这一点,就像国际象棋和围棋那样,实际上是很重要的。
So there's this, yeah... so, basically, ultimately, what you're looking for, the ultimate thing, would be something that... I don't think anybody's really defined this well, because all the conventional definitions of AGI basically compare to people. Yeah. And the conventional framings of AGI always strike me a lot like the debate around whether a self-driving car works: does it work because it's a perfect driver, or does it work because it's better than the human driver? And better than the human driver, I think, is actually the thing that matters, just like with the chess thing and the Go thing.
那是一个真实的问题。然后还有它是否是一个完美的司机,这显然是自动驾驶汽车公司正在努力实现的目标。但我认为你在寻找超越完美司机的东西,你在寻找一辆知道该去哪里的车。
That's a real thing. And then there's the question of whether it's a perfect driver, which is obviously what the self-driving car companies are working toward. But then, I think you're looking for something beyond the perfect driver. You're looking for the car that knows where to go.
我有两种想法。一种是务实的企业家心态。我有太多可以玩耍和构建的玩具。即使今天停止AI的进步,Replit在未来五年内仍会继续变得更好。我们可以在应用层和基础设施层做很多事情。
I'm of two minds. One mind is the practical entrepreneur: I have so many toys to play with and build. Stop AI progress today, and Replit will continue to get better for the next five years. We have so much we can do just on the app layer and the infrastructure layer.
但我认为基础模型也会继续进步,所以这是我们行业非常激动人心的时刻。另一种想法更学术化,因为我从小就对意识的本质、智能的本质感兴趣,一直关注AI并阅读相关文献。我会指向强化学习文献。比如Richard Sutton,还有DeepMind的联合创始人Shane Legg,他们曾写过一篇论文试图定义AGI。我认为AGI的定义,或许最初也是正确的定义,是高效持续学习。
But I think the foundation models will continue to get better as well, so it's a very exciting time in our industry. The other mind is more academic, because as a kid I was always interested in the nature of consciousness and the nature of intelligence; I was always interested in AI and reading the literature. And I would point to the RL literature. So Richard Sutton, and another guy, Shane Legg, I think a co-founder of DeepMind, wrote a paper trying to define what AGI is. And in there, I think, is the original, perhaps correct, definition of AGI: efficient continual learning.
如果你真的想构建一种通用人工智能,可以将其应用于任何领域,比如无需太多汽车知识就能放进车里。对吧。想想人类需要多久学会开车?几个月内就能开得很好了。
If you truly want to build an artificial general intelligence, you can drop it in any domain; you can drop it in a car without that much prior knowledge about cars. Right. And how long does it take a human to learn how to drive? Right. Within months, a human can drive a car really well.
对。
Right.
所以,是通用技能获取、通用理解获取、通用推理获取。
So: generalized skill acquisition, generalized understanding acquisition, generalized reasoning acquisition.
我认为这才是真正能改变世界的东西。它能让我们更好地理解人类思维和意识,推动人类文明迈向新高度,
And I think that's the thing that will truly change the world. That's the thing that would give us a better understanding of the human mind, of human consciousness, and that's the thing that will propel us to the next level of human civilization,
我想。
I think.
从文明层面看,这是个深刻的问题,但与经济和产业这些令人兴奋的领域不同。它还有学术层面的意义,我
So on a civilization level, I think that's a really deep question, but separate from the economy and the industry, which is all exciting. There's an academic aspect of it that I'm
那么,如果我们今天上Kalshi下注,我们给它开什么赔率?
really... So, what odds? If we're on Kalshi today, what odds do we place on that?
我对真正的AGI突破持悲观态度,因为我们构建的东西已经如此有用且具有经济价值。
I'm kinda bearish on a true AGI breakthrough, because what we've built is so useful and economically valuable.
某种程度上说,'够用就好'是进步的敌人。
In a way, good enough is the enemy.
记得那篇《更差反而更好》的文章吗?更差反而更好,更差反而更好。
Yeah. Do you remember that essay, Worse Is Better? Worse is better. Worse is better.
更差反而更好。所以,这就像一种
Worse is better. And so there's, like, a
局部最优陷阱,就像陷入了局部最优的困境。
local... there's like a trap. There's like a local maximum trap.
我们正陷入局部最优陷阱,因为
We're in a local maximum trap. Because
它对这么多经济生产性工作来说已经足够好,这减轻了系统创造通用解决方案的压力?
it's good enough for so much economically productive work, it relieves the pressure in the system to create the generalized answer?
是的。然后还有像Rich Sutton这样的怪人,他们仍在坚持走那条路,也许他们会成功。但当前领域有巨大的优化能量,我们正拼命攀登这个局部高峰。没错。
Yes. And then you have the weirdos, like Rich Sutton and others, who are still trying to go down that path, and maybe they'll succeed. But there's enormous optimization energy behind the current thing, and we're hill climbing on this local maximum. Right.
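The "local maximum" here is literal optimization language. A toy sketch (my illustration, not from the conversation): greedy hill climbing on a landscape with a small peak and a big peak. Started near the small peak, the climber stops there and never finds the higher one.

```javascript
// Two peaks: a local maximum near x = -2 (height ~1) and a higher
// global maximum near x = 3 (height ~2).
function landscape(x) {
  return Math.exp(-((x + 2) ** 2)) + 2 * Math.exp(-((x - 3) ** 2));
}

// Greedy hill climbing: step left or right only if the score improves.
function hillClimb(x, step = 0.01, iterations = 10000) {
  for (let i = 0; i < iterations; i++) {
    if (landscape(x + step) > landscape(x)) x += step;
    else if (landscape(x - step) > landscape(x)) x -= step;
    else break; // no neighbor is better: a (possibly only local) maximum
  }
  return x;
}

const peak = hillClimb(-3); // start near the small peak
console.log(peak.toFixed(2)); // ends near -2, never reaches the peak at 3
```

The climber ends near x = -2 even though the landscape is twice as high at x = 3; no amount of extra iterations fixes it, only a different starting point or a non-greedy search would.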
讽刺的是,所有人都在担心投入巨额资金建设这些东西。所以世界上最讽刺的事莫过于这些钱都投向了局部最优解。没错。与之相反的是,这些资金本可以用来解决更普遍的问题。
And the irony of it is, everybody's worried about the gazillions of dollars going into building all this stuff out. So the most ironic thing in the world would be if the gazillions of dollars are going into the local maximum. That's right. As opposed to a counterfactual world in which they're going into solving the general problem.
但但但这可能也是非理性的。比如,那个普遍问题可能实际上...你知道...在我们有生之年都无法解决。
But it's also potentially irrational. Like, maybe the general problem is actually, you know, not within our lifetimes.
没错。
Right.
谁知道呢?对。对。
Who knows? Right. Right.
谁知道?
Who knows?
你觉得我们还能从LLMs中榨取出多少潜力?或者还有其他你特别看好的研究方向吗?
How much further do you think... do you think we've squeezed most of the juice out of LLMs in general, then? Or are there other research directions you're particularly excited about?
嗯,问题就在这里。我认为关键在于这类突破并不多。虽然强化学习领域的突破令人振奋,但我们十年前就知道将生成系统与树搜索等方法结合的思路。这条路还有很长的路要走,而且我认为强化学习的先驱们仍在尝试从零开始构建智能。卡马克就在走这条路。
Well, that's the thing. I think the problem is there aren't that many. I think the breakthroughs in RL are incredibly exciting, but we've also known about them for over ten years now, where you marry generative systems with tree search and things like that. There's a lot more to go there, and I think, again, the original minds behind reinforcement learning are still trying to go down that path, trying to bootstrap intelligence from scratch. Carmack is going down that path.
据我所知,卡马克他们——你们可能有投资——但他们并不打算走大语言模型这条路。确实有人在尝试,但我没看到太多进展或成果。不过我算是远距离观察。
As far as I understand, Carmack, you guys may be invested, but they're not trying to go down the LLM path. So there are people trying to do that, but I'm not seeing a lot of progress or outcomes there. But I watch it kind of from afar.
不过话说回来,说不定X平台上早就潜伏着某个机器人了。谁知道呢?也许吧。
Although, you know, for all we know, there's already a bot on X somewhere. What's that? Maybe.
你懂我意思吧?明白吗?
You know? You know?
这种事谁也说不准。可能不会有盛大公告,说不定某天X平台上就突然出现一个辩赢所有人的机器人。
You never know. It might not be a big announcement. It might just be, you know, one day there's just a bot on X that starts winning all the arguments.
是啊,有可能。
Yeah. It could be.
或者像我常说的代码案例——某个用户账号突然开始生成惊艳的软件。好了,我们用剩下的时间聊聊你吧。从头开始讲讲你的故事,你是怎么从出生一步步走到硅谷的?
Or the code version of that, as I say: some user, all of a sudden, is generating incredible software. Okay. Let's spend our remaining minutes... let's talk about you. So, yeah, take us from the beginning, your life: how did you get from being born to being in Silicon Valley?
好的。两分钟后。对。我只是在开玩笑。
Okay. In two minutes. Yeah, I'm just joking.
是的。我很小的时候就接触电脑了,不知为何——我出生在约旦安曼。当时我父亲只是个政府工程师,但他不知怎么就觉得电脑很重要。虽然没什么钱,他还是贷款买了台电脑,那是我们整个社区的第一台电脑。
Yeah. I got into computers very early on. So I was born in Amman, Jordan, and for whatever reason my dad, who was just a government engineer at the time, decided that computers were important. He didn't have a lot of money, took out a loan, and bought a computer. It was the first computer in the neighborhood.
也是我认识的人里最早拥有电脑的。我最早的记忆之一就是六岁时,看着父亲拆开这台机器,翻开厚厚的说明书,用手指笨拙地敲着CD、LS、MKDIR这些命令。我常趴在他肩后,看他输入指令,看着机器精准执行他的命令。一边还嚼着泰诺药片——
The first computer of anyone I knew. And one of my earliest memories: I was six years old, watching my dad unpack this machine, open up this huge manual, and finger-type CD, LS, MKDIR. I would be behind his shoulder, watching him type these commands and seeing the machine respond and do exactly what he asked it to do. Popping Tylenol as you're...
嚼泰诺药片。
Popping Tylenol.
没错。自闭症激活器?当然。必须的。
Exactly. Autism Activator? Of course. Have to.
你必须的。确实。
You have to. Exactly.
是什么型号的电脑?如果没记错的话是IBM的。IBM PC吗?对,是IBM PC。那一年是1993年。
What kind of computer was it? It was an IBM, as far as I remember. An IBM PC? It was an IBM PC. So what year was that? 1993.
1993年?好吧。那时候还是DOS系统。那时有Windows了吗?
1993? Okay. So it was DOS. Did it have Windows at that point?
没有。还没有Windows。就在Windows出现之前。就在Windows之前。但我想Windows已经发布了,但你需要
No. It didn't have Windows. It was right before Windows. Right before Windows. But I think Windows had been out, but you would... It
它是一个附加组件。
was an add-on.
它是一个附加组件。你不会直接启动它。所以我们应该是买了Windows的安装盘,你得从磁盘引导加载它,然后才能打开Windows,可以点击操作。但它并不那么有趣,因为上面没多少内容。所以我大部分时间都在用DOS,写批处理文件,打开游戏,捣鼓那些东西。
It was an add-on. You wouldn't boot into it. So I think we bought the disk for Windows, and you had to boot-load it from the disk, and it would open Windows and you could click around. It wasn't that interesting, because there wasn't a lot on it. So I spent a lot of my time in DOS, writing batch files, opening games, and messing around with that.
但直到Visual Basic出现后我才开始,就是在Windows 95之后,我开始制作真正的软件。我的第一个想法是,我曾经是个重度游戏玩家,经常去这些局域网游戏咖啡馆玩《反恐精英》。我去那里时发现,整个地方全是电脑,但他们没有任何管理业务的软件。
But it wasn't until Visual Basic, so after Windows 95, that I started making real software. And the first idea I had... I used to be a huge gamer, so I used to go to these LAN gaming cafes and play Counter-Strike. And I would go there and, you know, the whole place is full of computers, but they don't use any software to run their business.
它
It
就像人们跑来跑去,手动记录你的机器编号、使用时长和付款金额,还会拍你肩膀说‘嘿,你需要再付点钱’。我问他们为什么不做个软件让我登录计时之类的?他们说‘是啊,我们不懂’
was just people running around, writing down your machine number, how much time you spent on it, and how much you paid, and tapping your shoulder like, hey, you need to pay a little more. And I asked them, why don't you just build a piece of software that lets me log in and has a timer or whatever? And they were like, yeah, we don't know
如何做到。而我
how to do that. And I
当时就想,好吧。我觉得我知道怎么做。于是我花了——那时我大概12岁左右——花了两年时间构建它,然后出去尝试销售,居然真的卖出去了。赚了好多钱。
was like, okay, I think I know how to do that. So I spent, I was like 12 or something, I spent about two years building it, then went out and tried to sell it, and was able to sell it. I was making so much money.
我记得麦当劳在约旦开业时我大概13、14岁,我请全班同学去了麦当劳。虽然很贵但我那时很有钱,挥霍了所有积蓄就为了炫耀。这就是我创建的第一个生意。后来当我开始接触AI、读科幻小说那些东西时——到该上大学时,我不想选计算机专业,因为觉得编程迟早会被自动化取代。
I remember McDonald's opened in Jordan around the time I was 13, 14; I took my entire class to McDonald's. It was very expensive, but I was balling, and I spent all this money showing off. So that was the first business I created. And then, around that time, I started learning about AI, reading sci-fi and all of that stuff. And when it came time to go to college, I didn't want to go into computer science, because I felt like coding was on its way to getting automated.
我记得用过那些向导工具。你还记得吗?
I remember using these wizards. Do you remember those?
记得。基本上就是向导工具。非常原始的早期机器人。能生成代码。你看,生成代码,对。
Yes. Wizards, basically. It's extremely crude, early bots. That generate code. Look at it, generate code, yeah.
是啊,还记得你只要输入一些信息,比如这是我的项目、这是功能之类的,然后点点点,它就能搭建出大量代码框架。我当时就觉得这就是未来。编程这种
Yeah, and remember, you could type in a few things, like, here's my project, here's what it does, whatever, and then click, click, click, and it would just scaffold a lot of code. I was like, I think that's the future. Like, coding is such a
几乎已经被解决了。
It's almost solved.
是的,问题解决了。我该去学编程吗?我当时想,好吧,如果AI能写代码,我该做什么?总得有人来建造和维护电脑。于是我去学了计算机工程,干了一段时间。
Yeah, it's solved. Should I go into coding? I was like, okay, if AI can do the code, what should I do? Well, someone needs to build and maintain the computers. And so I went into computer engineering and did that for a while.
但后来重新发现了对编程的热爱,读了些关于Lisp之类的编程文章,开始捣鼓Scheme这类编程语言。但发现学习不同编程语言异常困难。那时我没有笔记本电脑,每次想学Python或Java,就得去计算机实验室,下载几个G的软件,试着安装,敲点代码,运行它,结果遇到DLL缺失之类的问题。我就想,这也太原始了吧。那是2008年2月左右,当时已经有Google Docs和Gmail了,多亏你们,打开浏览器就能用网络软件。我觉得网络就是终极软件平台,一切都该搬到网上。
But then I rediscovered my love for programming, reading essays on Lisp and things like that, and started messing around with Scheme and languages like that. But I found it incredibly difficult just to learn different programming languages. I didn't have a laptop at the time, so every time I wanted to learn Python or Java, I would go to the computer lab, download gigabytes of software, try to set it up, type a little bit of code, try to run it, run into a missing-DLL issue... and I was like, man, this is so primitive. At the time, it was 2008 or something; we had Google Docs, we had Gmail, you could open the browser, partly thanks to you, and use software on the internet. And I thought the web is the ultimate software platform; everything should go on the web.
好吧,那谁在构建在线开发环境?结果发现没人做。感觉就像在中央车站地上捡到100美元——肯定该有人做这个啊。但确实没人做。于是我想,行吧,我来试试。
Okay, so who's building an online development environment? No one. And it felt like I'd found a $100 bill on the floor of Grand Central Station; surely someone should be building this. But no, no one was building it. And so I was like, okay, I'll try to build it.
几小时后我搞出了个东西——就是个文本框。你输入些JavaScript,有个写着'eval'的按钮。点击它,结果会显示在弹窗里。比如1加1等于2,我
And I got something done in a couple of hours, which was a text box. You type in some JavaScript, and there's a button that says eval. You click eval, and it shows you the result in an alert box. So, one plus one, two. I
当时就觉得,
was like,
我有了个编程环境。给朋友看后,开始有人用了。我又加了保存程序等功能,心想,
I have a programming environment. I showed it to my friends, people started using it, I added a few additional things like saving the program, I was like,
好吧,
okay,
好的,这里有个真正的创意,大家都很喜欢。但说实话,我花了整整两三年才能真正构建出东西,因为浏览器只能运行JavaScript。当时Mozilla有个叫Emscripten的研究项目实现了突破,它允许你将C、C++等编程语言编译成JavaScript。为了让浏览器能运行Python这样的语言,我需要把CPython编译成JavaScript,我也成了全球第一个做到这件事的人。我为这个项目贡献了代码,并搭建了大量支撑架构,最终和朋友们成功将Python编译进了JavaScript。
alright, there's a real idea here; people love it. But then it took me two or three years to actually be able to build anything, because the browser can only run JavaScript. It took a breakthrough: at the time, Mozilla had a research project called Emscripten that let you compile programming languages like C and C++ into JavaScript. And for the browser to run something like Python, I needed to compile CPython to JavaScript. I was the first in the world to do it. So I contributed to that project, built a lot of the scaffolding around it, and my friends and I compiled Python into JavaScript.
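The very first prototype he describes, a text box whose contents get passed to eval, is only a few lines. A hypothetical reconstruction of the core step (not Replit's actual code; in the browser, the source came from a textarea and the result went into an alert box, here it is just returned as a string):

```javascript
// Hypothetical sketch of the original textbox-plus-eval prototype.
// The core "REPL" step: evaluate the snippet, stringify the result, show it.
function runSnippet(source) {
  try {
    // Indirect eval, so the snippet runs in global scope like a top-level REPL.
    const result = (0, eval)(source);
    return String(result);
  } catch (err) {
    return "Error: " + err.message;
  }
}

console.log(runSnippet("1 + 1"));                   // "2"
console.log(runSnippet("[1,2,3].map(x => x * x)")); // "1,4,9"
```

Wiring this to a textarea and an "eval" button is all the original demo needed; everything else (saving programs, more languages) came later.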
我当时就想:既然Python能搞定,那Ruby、Lua也都能搞定。这就是Replit创意的起源——当你需要REPL(交互式编程环境)时,就应该能立即获得它。REPL是最基础的编程环境,我陆续添加了各种编程语言。这段时间里,我的朋友们都在使用这个系统并为之兴奋。当时我活跃在GitHub上,我的原则很简单:只要开发出软件,就立刻开源。
And I was like, okay, we did it for Python, let's do it for Ruby, let's do it for Lua. And that's how the idea for Replit emerged: when you need a REPL, you should get it. You should Repl-it. A REPL is the most primitive programming environment possible. I added all these programming languages, and all this time my friends were using it and excited about it. And I was on GitHub at the time, and my standard thing is, when I make a piece of software, I just open-source it.
于是我把所有成果都开源了。那些年我一直在构建底层架构,只为实现在浏览器中运行代码。后来这个项目突然爆红,登上了Hacker News头条,恰逢MOOC(大规模在线开放课程)浪潮兴起——Udacity刚上线,Coursera开始运营,最著名的当属Codecademy,它是首个支持在浏览器中交互式编程学习的网站。
And so I was open-sourcing all of it. I spent years building this underlying infrastructure just to be able to run code in the browser, and then it went viral. It went viral on Hacker News, and it coincided with the MOOC era, massive open online courses: Udacity was coming online, Coursera, and most famously Codecademy. Codecademy was the first website that let you code in the browser interactively and learn how to code.
他们大量使用了我从约旦开源出去的软件。我记得在Hacker News上看到他们爆红的新闻时,立刻认出了自己的技术。我在评论区留言询问他们用了什么方案,就这样和他们取得了联系。
And they built a lot of it on my software, which I'd been open-sourcing all the way from Jordan. And so I remember seeing them on Hacker News; they were going super viral. I was like, I recognize this. What are you using? And so I left a comment on Hacker News.
我说'你们用的是我开源的工具包吧',结果他们主动联系我:'嘿,我们想聘用你'。但我回复说没兴趣,我想自己创业。
I was like, oh, you're using my open-source package. And so they reached out to me. They're like, hey, we'd like to hire you. I was like, I'm not interested. I wanna start a startup.
我想创建这个叫Replit的项目。
I wanna start this thing called Replit. And
他们却说:'不,你...'
they're like, well, no, you you
应该来和我们一起工作。我们可以做同样的事情。我一直拒绝。我说,好吧,我可以和你们签合同。他们当时每小时付我12美元。
should come work with us. We can do the same stuff. And I kept saying no. I was like, okay, I'll contract with you. They were paying me $12 an hour.
我对此非常兴奋。我刚从阿曼回来。但值得称赞的是,他们专程从约旦来招募我,在那里待了几天。我一直拒绝,最后他们给了我一个无法拒绝的offer。
I was really excited about it. I'm back from Oman. But, to their credit, they came out to Jordan to recruit me and spent a few days there. And I kept saying no, and in the end they gave me an offer I couldn't refuse.
他们帮我办了O-1签证,来到美国。那时你就搬过来了。
And they got me an O-1 visa, and I came to the United States. That's when you moved.
那么第一次是什么时候,因为你当时
So when was the first because you were
出生年份是?1987年。
born what year? 1987.
87年。你记忆中第一次产生可能不会在约旦度过一生、而是会搬到
'87. What was the first year that you could remember where you had the idea that you might not live your life in Jordan, and that you might actually move to
美国的念头是什么时候?是在我看《硅谷传奇》的时候。
The US? When I watched Pirates of Silicon Valley.
是这样吗?好的。
Is that right? Okay.
明白了。好吧。可能是98或99年。我不知道它什么时候上映的。对。
Got it. Alright. Maybe '98 or '99. I don't know when it came out. Yeah.
值得讲那个黑客故事吗?因为存在一个世界线的版本,如果你当时想法不同,也许你就不会去美国了。
Is it worth telling the hacker story? Because there's a version of the world where, if that had gone differently, maybe you wouldn't have gone to America.
对,对。是的。所以在学校里,我一直在编程。懂吗?
Right. Right. Yeah. So in school, I was programming the whole time. You know?
所以我一直想创业,脑子里随时都充满想法。Replit之所以存在,就是因为我总有各种点子,我只想在电脑上敲代码把它们实现出来。上学对我来说无聊透顶。如今Replit有移动端应用,部分原因就是我总想在课桌底下偷偷编程。对吧。
So I just wanted to start businesses; I'm exploding with ideas all the time. The reason Replit exists is that I have ideas all the time, and I just wanna go type them into a computer and build them. So I wasn't going to school; it was incredibly boring for me. And part of the reason Replit has a mobile app today is that I always wanted to program under the desk. Right.
就像,以前常干这种事。学校总因出勤率让我挂科。我明明能拿A却因为旷课挂科。我觉得这太不公平了,朋友们都在2011年毕业了,而我在大学耗了六年——本该三四年就完成的。那时我极度抑郁,一心向往硅谷。于是就想:要不篡改成绩?
I used to do things like that. And at school they kept failing me for attendance. I would get A's, but I just didn't show up, so they would fail me. I felt that was incredibly unfair, and all my friends were graduating. This was 2011; I'd been in college for six years when it should be three or four, and I was incredibly depressed. I really wanted to be in Silicon Valley. And so I was like, oh, what if I change my grades?
就是这样。入侵大学数据库。于是我躲进父母家地下室,开始实践多相睡眠。你知道——我是说达芬奇的多相睡眠法。
There we go. In the university database. So I went into my parents' basement and implemented polyphasic sleep. Are you familiar with it? I am. Leonardo da Vinci's polyphasic sleep.
我不是从达芬奇那里听说的,是从《宋飞正传》里知道的。因为有一集克莱默尝试了多相睡眠。
I didn't hear it from Leonardo da Vinci; I heard it from Seinfeld. There's an episode where Kramer goes on polyphasic sleep.
什么?每四小时睡二十分钟。对。
What? Twenty minutes every four hours. Yes.
每四小时睡二十分钟。
Twenty minutes every four hours.
是啊,这样居然还能运作良好。
And yes, and this somehow is gonna work well.
没错,如果你搞过黑客行为就懂。
Yeah. And hacking, if you've ever done any of it...
就像那个梗说的,对别人从来没用过,但说不定
As the meme goes, it has never worked for anybody else, but it might work
对我有效。是的。黑客行为的本质就是,你构思出寻找某些安全漏洞的点子,写个脚本运行,脚本要跑二三十分钟,你就利用这段时间睡觉。于是我疯魔了两周,试图黑进大学数据库。最后终于在某处发现SQL注入漏洞,找到了修改记录的途径。
for me. Yes. And a lot of what hacking is, is coming up with ideas for finding certain security holes, writing a script, and then running that script. The script takes twenty or thirty minutes to run, so you take those twenty, thirty minutes to sleep, and go on. So I spent two weeks just going mad, trying to hack into the university database. And finally, I found a SQL injection somewhere on the site, and I found a way to be able to edit the records.
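For readers unfamiliar with the term: SQL injection is what happens when user input is concatenated straight into a query string, so the input can become live SQL. A generic illustration (invented example; no claim about the actual university system or its schema):

```javascript
// Why string-built queries are injectable: the attacker's input becomes
// part of the SQL itself. (Generic illustration; hypothetical schema.)
function unsafeQuery(studentId) {
  return "SELECT grade FROM records WHERE student_id = '" + studentId + "'";
}

// Normal use:
console.log(unsafeQuery("12345"));
// SELECT grade FROM records WHERE student_id = '12345'

// Injected use: the quote closes the string literal, and what follows
// the semicolon runs as a second statement.
console.log(unsafeQuery("x'; UPDATE records SET grade = 95 WHERE student_id = 'x"));
```

The fix, then as now, is parameterized queries, where input is bound as data rather than spliced into the SQL text, e.g. something like `db.run("SELECT grade FROM records WHERE student_id = ?", [studentId])` in a typical driver.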
但我不想冒险,于是去找了同校的邻居。至今没人抓到他,但我对他说:嘿,我有办法改成绩,你愿意当我的小白鼠吗?我实话实说,告诉他我不会动手。
But I didn't wanna risk it, so I went to my neighbor, who went to the same school. I think to this day no one has caught him, but I went to him and said, hey, I have this way to change grades, would you want to be my guinea pig? And I was honest about it; I was like, I'm not gonna do it.
你愿意尝试吗?
Are you open to doing it?
他说,好啊。好啊。
He's like, yeah. Yeah.
这叫人体试验。医药研发就是这么干的。所以
They call this human trials. This is how medicine works. So
于是我们去了,改了他的成绩。他去拉成绩单发现没更新,又回到地下室。原来我只能访问从数据库,主库没权限。后来通过网络找到Oracle数据库漏洞,权限提升后发现了真实数据库。然后给自己改了成绩,拉出成绩单。
So we went and changed his grades. He went and pulled his transcript, and the update wasn't there, so back to the basement. Well, it turned out I had access to the slave database; I didn't have access to the master database. So I found a way through the network, privilege escalation: it was an Oracle database that had a vulnerability, and I found the real database. And then I just did it for myself, changed my grades, and went and pulled my transcript.
果然成绩真的变了。买了毕业袍,参加毕业派对,一切就绪。结果某天傍晚六七点,家里电话突然响了。
And sure enough, it had actually changed. I went and bought the gown, went to the graduation parties, did all of that; we're graduating. And then one day I'm at home, maybe six or 7PM, and the telephone at home rings.
我要摇点什么。没错。
I'm gonna ring something. Yes.
嗯,你好。然后他说,嘿,这是大学注册系统,我认识负责运营的人。他说,你看,我们遇到这个问题了。系统一整天都宕机,而且总是跳转到你的记录。
Well, hello. And he's like, hey, this is the university registration system, and I knew the guy who ran it. He's like, look, we're having this problem. The system's been down all day, and it keeps coming back to your record.
你的记录里有个异常情况:你既有及格成绩,却又被禁止参加那门课的期末考试。我当时就懵了,心想‘完蛋’。结果发现数据库没做规范化处理。通常他们禁止你考试时,成绩会重置为100分制下的35分。但显然有个布尔标志位——顺便说,数据库里所有列名都是单个字母。
There's an anomaly in your record: you have a passing grade, but you're also banned from the final exam for that subject. I was like, oh, shit. Well, it turns out the database was not normalized. Typically, when they ban you from an exam, the grade resets to 35 out of 100. But apparently there's a boolean flag, and, by the way, all the column names in the database were single letters.
最棘手的就是这种‘晦涩安全’设计。结果发现有个标志位我没检查。考勤不过关时,他们想让你挂科就会禁止你参加期末考试。我改了成绩导致系统出问题崩溃了。他们打电话来质问时,我在想‘要么撒谎搞出大麻烦,要么干脆坦白’。
So that was the hardest part, security by obscurity. And it turns out there was a flag I didn't check. When you go over the attendance limit and they wanna fail you, they ban you from the final exam. So I had changed the grades, and that created an inconsistency and brought down the system. So they were calling me, and I thought at the time, I could lie and it would become a huge issue, or I could just fess up.
我决定坦白。对。我就说‘嘿,听我说。是这样的’。
I'll just fess up. Yeah. So I said, hey, listen. Look. Yeah.
我可能知道些内情。这样,明天我过来详细说明情况。结果我推门进去——所有学院的院长都在场,计算机科学学院之类的...他们已经连续处理了好几天,毕竟这是所计算机专业很强的大学,当时系统出了大问题。
I might know something about it. Hey, let me come in tomorrow and talk to you about what happened. So I go in and open the door, and it's the deans of all the schools: computer science, computer everything. They had all been working on it for days, because it's a very computer-heavy university, and there was a problem.
他们都特别好奇发生了什么。于是我拉出白板开始解释自己的操作,所有人都听入迷了。基本上相当于给他们上了一课——这就是你的
And they were all really intrigued about what had happened. So I pulled up a whiteboard and started explaining what I did, and everyone was engaged. I basically gave them a lecture. Your oral exam for
博士答辩。没错,很棒。
your PhD. Yeah. It was great.
他们当时非常兴奋。我觉得这让他们很受触动。就像‘哇哦,这真是个有趣的问题’。然后我就说‘好吧,太好了’。
They were really excited. And I think it was endearing to them. It was like, oh, wow, this is a very interesting problem. And then I was like, okay, great.
谢谢。我当时就说‘嘿,等等。我们不知道该怎么处理你。要把你送进监狱吗?’
Thank you. I was like, hey, wait. Wait. We don't know what to do with you. Do we send you to jail?
我当时就说‘这事得上报给大学校长’。他是个了不起的人,给了我人生的第二次机会。我去找他解释情况,说‘我真的非常沮丧,我需要毕业’。
And I was like, hey, have to escalate to the university president. And and he he was a great man, and I think he gave me a second chance in life. And I went to him and I, you know, I I explained situation. I said like, I'm really frustrated. I need to graduate.
我需要继续我的生活。我在这里已经六年了,不能再耗在学校里。我已经掌握的知识足够证明我是个优秀的程序员。他当时对我说了句蜘蛛侠的台词‘能力越大责任越大,你拥有强大的能力’。这话深深触动了我,那一刻他是对的。最后他说‘我们会放你一马,但你要帮系统管理员加强系统安全’。就这样。
I need to get on with my life. I've been here for six years, and I just can't sit in school; with the stuff I already know, I'm a really good programmer. And he gave me a Spider-Man line at the time: with great power comes great responsibility, and you have great power. And it really affected me, and I think he was right in that moment. So he said, well, we're gonna let you go, but you're gonna have to help the system administrators secure the system. There we go.
整个夏天。我很乐意做这事。但到岗后发现所有程序员都讨厌
For the summer. I was like, happy to do it. And I show up and all the programmers there hate
我。
me.
对。恨透我了。
Yeah. Hate my guts.
是的,百分之百。
Yes. 100%.
然后他们会把我锁在外面。比如,我会看到他们在外面,我敲门,但没人理会,就像他们不想让我进去一样。我试着帮他们一点忙,但他们不合作,所以我就想,好吧,随便吧。到了我真正要毕业的时候,那是最终项目,计算机科学学院的院长之一来找我,他说,听着,需要你帮个忙。我们当初放你一马、没有追究你,很大程度上是因为你。
And they would lock me out. Like, I would see them outside, I would knock on the door, and no one would listen; they didn't wanna let me in. I tried to help them a little, but they weren't collaborative, so I was like, alright, whatever. And then it came time for me to actually graduate; it was the final project. And one of the computer science deans came to me and said, look, I need to call in a favor. I was a big part of the reason we let you go and didn't prosecute you.
所以我想让你和我一起完成这个最终项目。内容会围绕安全和黑客技术。我说,不。我已经受够了
So I want you to work with me on the final project. And it's gonna be around security and hacking. I was like, no. I'm I'm done
这种破事。就像,我
with that shit. Like, I
只想专注于构建编程环境之类的东西。但他坚持说,不行,你必须做这个。我说,好吧。于是我想,不如做些更有建设性的事。
just wanna build programming environments and things like that. And he's like, no, you have to do it. I was like, okay. So I thought I'd do something more productive.
所以我编写了一个让我非常自豪的安全扫描器,它能爬取不同网站,尝试SQL注入等各种攻击。实际上,我的扫描器在系统中发现了另一个漏洞。太棒了。于是我去答辩,他说,你需要现场运行这个扫描器,证明存在漏洞。当时我没明白怎么回事,但还是答应了。
So I wrote a security scanner that I was very proud of, which crawls different sites and tries SQL injection and all sorts of things. And my security scanner actually found another vulnerability in the system. Amazing. And so I went to the defense, and he's like, you need to run this security scanner live and show that there's a vulnerability. I didn't understand what was going on at the time, but I said okay.
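Scanners of the kind described typically probe inputs with stray quotes and then look for database error signatures leaking into the response. One building block, sketched hypothetically (my illustration; the crawling and HTTP side is omitted, and the signature list is illustrative, not exhaustive):

```javascript
// One building block of such a scanner: flag responses whose body contains a
// known database error signature, which suggests unsanitized input reached SQL.
// (Hypothetical sketch; fetching pages and injecting probes is omitted.)
const SQL_ERROR_SIGNATURES = [
  "you have an error in your sql syntax", // MySQL
  "unclosed quotation mark",              // SQL Server
  "ora-00933",                            // Oracle
  "sqlite error",
];

function looksInjectable(responseBody) {
  const body = responseBody.toLowerCase();
  return SQL_ERROR_SIGNATURES.some((sig) => body.includes(sig));
}

console.log(looksInjectable("ORA-00933: SQL command not properly ended")); // true
console.log(looksInjectable("<html>Welcome back!</html>"));                // false
```

A full scanner would fetch each crawled URL twice, once normally and once with a probe payload appended, and run a check like this on the second response.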
于是我做了关于系统工作原理的演示,然后说,来运行一下吧。扫描器显示存在安全漏洞。好,现在尝试获取shell。系统自动执行所有安全检测后,成功获取了shell。结果发现,另一位院长当时正负责系统的安全加固任务。
So I give the presentation about how the system works, and I was like, oh, let's run it. And it showed that there's a security vulnerability. Okay, let's try to get a shell. So the system automatically runs all the security stuff and it gets you a shell. And then it turned out the other dean was the one who had been given the mandate to secure the system.
现在我意识到自己成了某种争斗中的棋子。他脸色通红地说:不可能,我们系统很安全,你在撒谎。我回应道:你这是在指控我说谎。
And now I started to realize I'm a pawn in some kind of rivalry here. And his face turned red and he's like, No, it's impossible. We secured the system. You're lying. I was like, you know, you're accusing me of lying.
好吧。我们需要知道什么?你的工资还是密码?你想让我查什么?他说:查我的密码。
Alright. What should we know? Should we know your salary or your password? What do you want me to look up? And he was like, look up my password.
于是我查了他的密码,结果是一堆乱码,是加密的。
So I look up his password. And it was like gibberish; it was encrypted.
而他
And he
就说:这不是我的密码。看吧,你在撒谎。我说:这里有程序员设置的解密功能。
was like, that's not my password. See? You're lying. I was like, well, there's a decrypt function
是程序员加进去的。
that the programmers put in there.
于是我进行解密,显示出了他的密码。是个很尴尬的内容,
So I do decrypt, and it shows his password. Something embarrassing;
我忘了具体是什么。
I forgot what it was.
于是他非常生气地站起来,握了握我的手就离开去改密码了。后来我又成功黑进学校系统一次。幸运的是我最终顺利毕业,把软件交给他们后,他们加固了系统。是的,后来我才明白,他其实是想让
And so he gets up, really angry, shakes my hand, and leaves to change his password. I was able to hack into the university one more time after that. Luckily, I was able to graduate; I gave them the software, and they secured the system. Yeah, later on I would realize that, yeah, he wanted to embarrass the
另一个家伙出丑,所以我被夹在中间。我觉得这个故事的寓意是——如果你能成功黑进学校系统修改成绩,那你理应得到那个分数,也理应毕业。
other guy, which is why I was in the middle. I think the moral of the story is, if you can successfully hack into your school system and change your grade, you deserve the grade, and you deserve to graduate.
我...我也这么认为。
I think so.
还有,对所有家长或孩子们说一句,你们可以引用我,
And just for any parents or children out there, you can cite me as
马克·安德森的话。
Marc Andreessen.
你们可以引用Amjad作为这方面的道德权威。
You can cite Amjad as the moral authority on this.
我认为在人工智能时代,一个非常相关的教训可能是:传统循规蹈矩的道路带来的回报正变得越来越少。我觉得,当今的孩子们应该利用一切可用工具去探索和规划自己的道路。因为仅仅听从传统建议、重复前人做过的事情,效果已经远不如我们所期望的那样了。是的,正是如此。
One lesson, maybe, that I think is very relevant for the AI age: the traditional, more conformist path is paying fewer and fewer dividends. And I think, you know, kids coming up today should use all the tools available to discover and chart their own paths. Because I feel like just listening to the traditional advice and doing the same things people have always done is just not working out as well as we'd like. Yeah. That's right.
Amjad,是的。
Amjad. Yeah.
感谢收听这期播客。
Thanks for the podcast.
谢谢你,伙计。太棒了。
Thank you, man. Fantastic.
感谢收听本期a16z播客。如果喜欢本期内容,请记得点赞、评论、订阅、给我们评分或写评论,并与亲友分享。更多节目请访问YouTube、Apple Podcasts和Spotify。在X平台关注我们@a16z,订阅我们的Substack博客a16z.substack.com。再次感谢收听,我们下期节目再见。
Thanks for listening to this episode of the a16z podcast. If you liked this episode, be sure to like, comment, subscribe, leave us a rating or a review, and share it with your friends and family. For more episodes, go to YouTube, Apple Podcasts, and Spotify. Follow us on X at @a16z, and subscribe to our Substack at a16z.substack.com. Thanks again for listening, and I'll see you in the next episode.
温馨提示:本内容仅供信息参考,不应视为法律、商业、税务或投资建议,也不应用于评估任何投资或证券,且不针对任何a16z基金的现有或潜在投资者。请注意a16z及其关联机构可能持有本播客讨论企业的投资。更多详情及投资披露链接,请访问a16z.com/disclosures。
As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.
关于 Bayt 播客
Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。