No Priors: Artificial Intelligence | Technology | Startups - 安德烈·卡帕西谈代码代理、自动研究与AI的循环时代

安德烈·卡帕西谈代码代理、自动研究与AI的循环时代

Andrej Karpathy on Code Agents, AutoResearch, and the Loopy Era of AI

本集简介

当AI代理能够设计实验、收集数据并自我改进,而无需人类介入时,会发生什么?安德烈·卡帕西与莎拉·郭探讨了当前模型的现状、工程与教育的未来、对就业的影响,以及他的项目AutoResearch:代理如何自主完成AI研究的闭环(实验、训练和优化)。

00:00 安德烈·卡帕西简介
02:55 还有哪些能力限制?
06:15 编程代理的精通表现是什么?
11:16 自然语言编程的二级效应
15:51 为什么要做AutoResearch?
22:45 AI时代的关键技能
28:25 模型的物种分化
32:30 构建更多人与AI的协作界面
37:28 就业市场数据分析
48:25 开源与闭源模型
53:51 自主机器人
1:00:59 MicroGPT与代理式教育
1:05:40 结语

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

代码这个词甚至已经不再是正确的动词了,对吧?

Code's not even the right verb anymore, right?

Speaker 0

但我必须向我的代理表达我的意愿,每天十六个小时。

But I have to express my will to my agents for sixteen hours a day.

Speaker 1

我怎么能不只是拥有一次 Claude Code、Codex 或这些代理框架的会话?

Can I have not just a single session of Claude Code or Codex or some of these agent harnesses?

Speaker 1

我怎么能拥有更多这样的东西?

How can I have more of them?

Speaker 1

我该如何恰当地做到这一点?

How can I do that appropriately?

Speaker 1

代理部分现在已经被视为理所当然了。

The Agent part is now taken for granted.

Speaker 1

现在,类似 Claw 的实体也被视为理所当然了。

Now the Claw-like entities are taken for granted.

Speaker 1

现在你可以拥有多个这样的实体。

Now you can have multiple of them.

Speaker 1

现在你可以向它们发出指令。

Now you can have instructions to them.

Speaker 1

现在你可以优化这些指令。

Now you can have optimization of the instructions.

Speaker 1

这就是为什么它会走向精神错乱,因为这就像无限延伸,一切都被归结为技能问题。

This is why it gets to the psychosis: this is, like, infinite, and everything is a skill issue.

Speaker 0

嗨,听众们。

Hi, listeners.

Speaker 0

欢迎回到《No Priors》。

Welcome back to No Priors.

Speaker 0

今天,我在这里与安德烈·卡帕西讨论,我们将进行一场广泛而深入的对话,内容涉及代码代理、工程与人工智能研究的未来、如何让更多人参与研究、机器人领域的最新进展、他对代理如何延伸至现实世界的预测,以及这一新阶段的教育。

Today, I'm here with Andrej Karpathy, and we have a wide ranging conversation for you about Code Agents, the future of engineering and AI research, how more people can contribute to research, what's happening in robotics, his prediction for how agents can reach out into the real world, and education in this next stage.

Speaker 0

欢迎你,安德烈。

Welcome, Andrej.

Speaker 0

安德烈,谢谢你参与这次对话。

Andrej, thanks for doing this.

Speaker 1

是的。

Yeah.

Speaker 1

谢谢你的邀请。

Thank you for having me.

Speaker 0

最近几个月人工智能领域真是令人兴奋。

So it's been a very exciting couple of months in AI.

Speaker 1

是的,这么说也不为过。

Yeah, you could say that.

Speaker 0

我记得有一次走进办公室,看到你全神贯注地工作。

I remember walking into the office at some point and you were like really locked in.

Speaker 0

我问你在忙什么,你说你必须每天连续编程十六个小时。

I was asking what you were up to, and you're like, I just have to code for sixteen hours a day.

Speaker 0

或者说‘编程’这个词都不太准确了,对吧?

Or code's not even the right verb anymore, right?

Speaker 0

但我必须向我的代理表达我的意愿,每天十六小时。

But I have to express my will to my agents for sixteen hours a day.

Speaker 0

因为能力上出现了一个飞跃。

Because, like, there's been a jump in capability.

Speaker 0

到底发生了什么?

What's happening?

Speaker 0

跟我谈谈你的经历。

Tell me about your experience.

Speaker 1

是的。

Yeah.

Speaker 1

我总觉得我一直处于一种持续的AI精神错乱状态,现在也经常这样,因为作为个人,你能实现的东西突然有了巨大突破。

I kind of feel like I was just in this perpetual I still am often in this state of AI psychosis just like all the time because there was a huge unlock in what you can achieve as a person, as an individual.

Speaker 1

对吧?

Right?

Speaker 1

因为你之前被打字速度之类的东西限制住了。

Because you were bottlenecked by, you know, your typing speed and so on.

Speaker 1

但自从有了这些代理,我觉得真正发生转变是在十二月,那时我从原本自己写代码占八成、委托给代理占两成,变成了反过来——自己写代码只占两成,主要靠代理完成。

But now with these agents, I would say December is when something just flipped, where I kinda went from, like, eighty-twenty to twenty-eighty of writing code by myself versus just delegating to agents.

Speaker 1

而且我现在觉得,甚至都远不止二八开了。

And I don't even think it's twenty eighty by now.

Speaker 1

我觉得比例还要高得多。

I think it's a lot more than that.

Speaker 1

我几乎从十二月起就没亲手敲过一行代码了,这简直是一个巨大的转变。

I don't think I've typed, like, a line of code probably since December, basically, which is, like, an extremely large change.

Speaker 1

我曾经跟比如我的父母等人聊过这件事,但我觉得普通人根本没意识到这种变化已经发生,也没意识到它有多剧烈。

I was talking about it to, for example, my parents and so on, and I don't think, like, a normal person actually realizes that this happened or how dramatic it was.

Speaker 1

比如,如果你随便找一个软件工程师,看看他们坐在桌前在做什么,他们的日常软件开发流程,从十二月起就已经完全不一样了。

Like, literally, like, if you just find a random software engineer or something like that at their at their desk and what they're doing, like, their default workflow of, you know, building software is completely different as of basically December.

Speaker 1

所以我现在一直处于一种精神恍惚的状态,不断探索什么是可能的,拼命把它推向极限。

So, I'm just, like, in this state of psychosis of trying to figure out, like, what's possible, trying to push it to the limit.

Speaker 1

我怎么能不只是拥有一次 Claude Code、Codex 或这些代理框架的会话?

How can I have not just a single session of, you know, Claude Code or Codex or some of these agent harnesses?

Speaker 1

我怎么能拥有更多这样的工具?

How can I have more of them?

Speaker 1

我该怎么恰当地使用它们?

How can I do that appropriately?

Speaker 1

然后我该如何使用这些 Claw?

And then how can I use these Claws?

Speaker 1

这些 Claw 到底是什么?

What are these Claws?

Speaker 1

所以现在出现了很多新东西。

And so there's, like, a lot of new things.

Speaker 1

我想站在最前沿,你知道的,但我非常不安,因为我并没有站在最前沿。

I wanna be at the forefront of it, you know, and I'm very antsy that I'm not at the forefront of it.

Speaker 1

我在推特上看到很多人在做各种各样的事情,听起来都是很好的点子,我必须站在最前沿,否则我会感到极度焦虑。

And I see lots of people on Twitter doing all kinds of things, and they all sound like really good ideas, and I need to be at the forefront or I feel extremely nervous.

Speaker 1

所以我想我正处于一种疯狂的状态,思考着什么是可能的?

And so I guess I'm just in this psychosis of like, what's possible?

Speaker 1

因为这从根本上来说还是未被探索的。

Like, because it's unexplored fundamentally.

Speaker 0

如果你感到焦虑,那我们其他人也一样焦虑。

Well, if you're nervous, the rest of us are nervous.

Speaker 0

我们在Conviction有一个团队,他们的工作方式是,所有工程师都不用手写代码。

We have a team that we work with at Conviction that their setup is everybody is like, you know, none of the engineers write code by hand.

Speaker 0

他们全都戴着麦克风,不停地轻声对他们的代理发出指令。

And they're all microphoned, they just like whisper to their agents all the time.

Speaker 0

这是最奇怪的工作环境了。

It's the strangest work setting ever.

Speaker 0

我以前觉得他们疯了,但现在我完全接受了。

And I thought they were crazy, and now I, like, I fully accept.

Speaker 0

我当时想,哦,原来这才是正确的方式。

I was like, oh, this was the way.

Speaker 0

你只是走在了前面。

Like, you're just ahead of it.

Speaker 0

是的。

Yes.

Speaker 0

你现在如何看待自己探索或做项目的潜力?

How do you think about your own capacity now to, like, explore or to do projects?

Speaker 0

比如,是什么限制了它?

Like, what is it limited by?

Speaker 1

对。

Yeah.

Speaker 1

是什么限制了它?

What is it limited by?

Speaker 1

我觉得方方面面都有限制,即使那些事情没成功,在很大程度上你还是会觉得这是个技能问题。

Just I think everything like, so many things, even if they don't work, I think to a large extent, you feel like it's a skill issue.

Speaker 1

并不是能力不存在。

It's not that the capability is not there.

Speaker 1

只是你还没找到方法。

It's that you just haven't found a way

Speaker 1

把现有的东西串联起来。

To string it together of what's available.

Speaker 1

比如,我没有在 AGENTS.md 文件或类似的地方给出足够的指示。

Like, I just didn't give enough instructions in the AGENTS.md file or whatever it may be.

Speaker 1

我没有一个足够好的记忆工具放在里面,或者类似的东西。

I don't have a nice enough memory tool that I put in there or something like that.

Speaker 1

所以,当事情不顺利时,一切都感觉像是技能问题。

So, it all kinda feels like skill issue when it doesn't work to some extent.

Speaker 1

你想看看怎么把它们并行化之类的,本质上你想成为 Peter Steinberger。

You wanna see how you can parallelize them, etcetera, and you wanna be Peter Steinberger, basically.

Speaker 1

所以,彼得很有名。

So, Peter is famous.

Speaker 1

他有一张很有趣的照片,照片里他坐在一台显示器前,上面开着很多——他用的是 Codex。

He has a funny photo where he's in front of a monitor with lots of, like he uses Codex.

Speaker 1

所以,很多 Codex 代理平铺在显示器上,如果提示得当并使用高强度模式,每个任务大约需要二十分钟。

So, lots of Codex agents tiling the monitor, and they all take about twenty minutes if you prompt them correctly and use the high effort setting.

Speaker 1

所以,你有多个,你知道的,十个代码仓库被检出,他就只是在它们之间切换并分配任务。

So, you have multiple, you know, 10 repos checked out, and so he's just going between them and giving them work.
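
The multi-repo workflow described above (one independent task per checked-out repo, run in parallel, reviewed as each finishes) can be sketched as a minimal coroutine fan-out. This is only a conceptual sketch: `run_agent` is a hypothetical stand-in for launching a real coding-agent CLI session, not an actual tool.

```python
import asyncio

async def run_agent(repo: str, task: str) -> str:
    # Hypothetical stand-in for launching a coding-agent session inside
    # `repo` with the prompt `task`; a real run would take ~20 minutes,
    # simulated here with a short sleep.
    await asyncio.sleep(0.01)
    return f"{repo}: done ({task})"

async def dispatch(assignments: dict[str, str]) -> list[str]:
    # One task per repo, so the sessions cannot interfere with each other;
    # asyncio.gather returns results in the same order as the assignments.
    jobs = [run_agent(repo, task) for repo, task in assignments.items()]
    return await asyncio.gather(*jobs)

if __name__ == "__main__":
    results = asyncio.run(dispatch({
        "repo-a": "implement feature X",
        "repo-b": "write tests for module Y",
    }))
    for line in results:
        print(line)
```

The point of the structure is the "macro action" granularity: each unit of work is a whole feature handed to one agent, and the human only reviews completed results.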

Speaker 1

你可以进行大得多的宏观操作。

It's just like you can move in much larger macro actions.

Speaker 1

这不仅仅是,这里有一行代码,这里有一个新函数。

It's not just like, here's a line of code, here's a new function.

Speaker 1

而是,这里有一个新功能,把它委托给代理一。

It's like, here's a new functionality and delegate it to agent one.

Speaker 1

这里有一个新功能,不会干扰其他功能。

Here's a new functionality that's not gonna interfere with the other one.

Speaker 1

把它交给代理二,然后尽可能地审查他们的工作,具体取决于你对这段代码的重视程度。

Give it agent two, and then try to review their work as best as you can, depending on how much you care about that code.

Speaker 1

这些我能用来操作我的软件仓库的宏观操作到底是什么?

Like, what are these macro actions that I can manipulate my software repository by?

Speaker 1

另一个代理在做研究,另一个代理在写代码,还有一个在为某个新实现制定计划。

And, like, another agent is doing some research, another agent is writing code, another one is coming up with a plan for some new implementation.

Speaker 1

所以,所有这些操作都发生在你代码库的这些宏观动作中,你只是努力变得非常擅长它,培养出一种肌肉记忆,这极其——是的,非常有成就感,首先是因为它真的有效。

And so everything just happens in these, like, macro actions over your repository, and you're just trying to become really good at it and develop a muscle memory for it, which is, yeah, very rewarding, number one, because it actually works.

Speaker 1

但这也是一种需要学习的新东西。

But it's also kind of like the new thing to learn.

Speaker 1

所以,这就是为什么会出现精神错乱的情况。

So, that's why, hence the psychosis.

Speaker 0

是的。

Yeah.

Speaker 0

我确实觉得,我的本能是,每当我等待代理完成某项任务时,显而易见的做法就是:我可以做更多工作。

I I do feel like my instinct is, like, whenever I am waiting for an agent to complete something, the obvious thing to do is like, well, I can do more work.

Speaker 0

对吧?

Right?

Speaker 0

比如,如果我有更多令牌可用,那我就应该并行处理

Like, if I have access to more tokens, then, like, I should just parallelize across

Speaker 1

更多任务。

more tasks.

Speaker 0

所以这非常有压力,因为如果你

And so that's very stressful, because if you

Speaker 0

并不觉得受到你在令牌上的消费能力的限制,

Don't feel very bounded by your ability to spend on tokens,

Speaker 0

那么,你就是系统中能力上限的瓶颈。

Then, you know, you are the bottleneck in the system that is max capability.

Speaker 1

是的。

Yeah.

Speaker 1

如果你没有充分利用你的订阅,。

If you're not maximizing your subscription Yeah.

Speaker 1

至少。

At least.

Speaker 1

理想情况下是针对多个代理。

And ideally for multiple agents.

Speaker 1

比如,如果你在 Codex 上的额度用完了,就应该切换到 Claude 之类的。

Like, if you run out of your quota on Codex, you should switch to Claude or whatnot.

Speaker 1

我不知道。

I don't know.

Speaker 1

我就一直在试着这么做一点点。

Like, that's what I've been trying to do a little bit.

Speaker 1

当我还有剩余的订阅时,我会感到紧张。

And I feel nervous when I have subscription left over.

Speaker 1

这说明我还没有最大化我的令牌吞吐量。

That just means I haven't maximized my token throughput.

Speaker 1

所以,我当博士生时确实经历过这种情况。

So, I actually kind of experienced this when I was a PhD student.

Speaker 1

当你的GPU没有运行时,你会感到紧张。

You would feel nervous when your GPUs are not running.

Speaker 1

你有GPU的算力,却没有最大化可用的浮点运算能力。

Like, you have GPU capability, and you're not maximizing the available FLOPs to you.

Speaker 1

但现在这不再是关于浮点运算,而是关于令牌了。

But now it's not about FLOPs, it's about tokens.

Speaker 1

那么,你的令牌吞吐量是多少?你能够掌控多少令牌吞吐量?

So, what is your token throughput, and what token throughput do you command?

Speaker 0

我其实想说,有趣的是,在过去至少十年里,许多工程任务中,人们根本不会觉得受到计算能力的限制。

I would actually argue that it's very interesting that we had, you know, at least ten years where in many engineering tasks, people just didn't feel compute bound.

Speaker 0

对吧?

Right?

Speaker 0

现在整个行业都有这种感觉。

And the entire industry feels that now.

Speaker 0

他们觉得以前是资源受限的。

They feel like they felt resource bound.

Speaker 0

而如今,当你有了这种巨大的能力跃升后,你会意识到,其实问题不再是我能否获取计算资源了。

And now that you have this big capability jump, you're like, oh, actually, it's not, you know, my ability to access the compute anymore.

Speaker 0

而是我自己成了瓶颈。

Like, I'm the binding constraint.

Speaker 1

是的。

Yeah.

Speaker 1

这是一个技能问题。

It's a skill issue.

Speaker 1

是的。

Yeah.

Speaker 1

这非常有赋能性,因为你确实可以变得更好。

Which is very empowering because, yeah, because you could be getting better.

Speaker 1

所以这就是为什么我觉得它非常上瘾,因为当你进步时,就会解锁新的能力。

So that's why that's why I think it's very addictive because there's unlocks when you when you get better.

Speaker 0

你觉得它会走向哪里?

Where do you think it goes?

Speaker 0

比如,你想想,安德烈在不断迭代,而其他所有人每天花十六个小时,不断提升使用编码代理的能力。

Like, if you just think about, like, okay, you know, Andrej is iterating, and everybody else is for sixteen hours a day, getting better at using coding agents.

Speaker 0

那么,一年后当你达到精通时,会是什么样子?

Like, what does it look like in a year of, like, you've reached mastery?

Speaker 1

是的,精通到底是什么样子?比如一年后,或者两三年、五年、十年后?

Yeah, what does mastery look like, right, at the end of the year, or like two, three years, five years, ten years, etcetera?

Speaker 1

我认为每个人基本上都在关注如何向上层发展。

Well, I think everyone is basically interested in, like, going up the stack.

Speaker 1

所以,我会说,这不再关乎你和代理的一次性互动,而是多个代理如何协作、团队如何配合等等。

So, would say, yeah, it's not about a single session with your agent, multiple agents, how do they collaborate, and teams, and so on.

Speaker 1

因此,大家都在努力弄清楚这会是什么样子。

So, everyone's trying to figure out what that looks like.

Speaker 1

然后我认为 Claw 也是一个有趣的方向。当我说 Claw 时,我指的是那种把持久性提升到全新层次的层面。

And then I would say Claw is also kind of an interesting direction. When I say a Claw, I mean this, like, layer that kind of takes persistence to a whole new level.

Speaker 1

它是一种会持续循环的东西。

Like, it's something that, like, keeps looping.

Speaker 1

它不是你必须在中间主动交互的东西。

It's not something that you are interactively in the middle of.

Speaker 1

它有点像拥有自己的小沙盒,能替你做一些事情,即使你没在关注它,而且还可能拥有更复杂的记忆系统,这些是目前代理尚未实现的。

It kind of like has its own little sandbox, its own little, you know, it kind of like does stuff on your behalf even if you're not looking kind of thing, and then also has like maybe more sophisticated memory systems, etcetera, that are not yet implemented in agents.

Speaker 1

所以,OpenClaw 的记忆机制比默认的上下文用尽时进行记忆压缩要复杂得多。

So, OpenClaw has a lot more sophisticated memory, I would say, than what you get by default, which is just memory compaction when your context runs out.

Speaker 1

对吧?

Right?
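
The "default" behavior he contrasts with OpenClaw's richer memory (simple compaction once the context fills up) can be sketched minimally. Assumptions: `summarize` is a stub standing in for an LLM summarization call, and the token budget is faked with message counts.

```python
def summarize(turns: list[str]) -> str:
    # Stub for an LLM call that condenses old turns into one note.
    return f"[summary of {len(turns)} earlier turns]"

def compact(history: list[str], budget: int) -> list[str]:
    # When the history exceeds the budget, fold the oldest turns into a
    # single summary message and keep the most recent turns verbatim.
    if len(history) <= budget:
        return history
    keep = budget - 1  # reserve one slot for the summary itself
    old, recent = history[:-keep], history[-keep:]
    return [summarize(old)] + recent
```

The limitation he is pointing at is visible in the sketch: everything older than the window survives only as a lossy summary, whereas a more sophisticated memory system would persist structured notes outside the context entirely.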

Speaker 0

你觉得是这一点比更广泛的工具访问更能引起用户共鸣吗?

You think that's the piece that resonated for more users versus like perhaps, like, broader tool access?

Speaker 1

你是指 OpenClaw 吗?

For OpenClaw?

Speaker 1

是的。

Yeah.

Speaker 1

我觉得至少有五件事引起了共鸣。

I think there's at least five things that resonated.

Speaker 0

干得好,彼得。

Good job, Peter.

Speaker 1

我的意思是,彼得做得真的非常出色。

I mean, Peter has done a really amazing job.

Speaker 1

我最近见过他,和他聊过这件事,我觉得他对此非常谦逊,我认为他同时在五个不同的方面进行了创新,并将它们整合在一起。

I saw him recently, and I talked to him about it, and I he's very humble about it, I think he innovated simultaneously in, like, five different ways and put it all together.

Speaker 1

比如,SOUL.md 文档。

So, for example, like the SOUL.md document.

Speaker 1

他实际上精心塑造了一种极具吸引力和趣味性的个性,我觉得现在很多代理人都没有做到这一点。

Like, he actually really crafted a personality that is kind of compelling and interesting, and I feel like a lot of the current agents, they don't get this correctly.

Speaker 1

我觉得 Claude 的个性其实很不错。

I actually think Claude has a pretty good personality.

Speaker 1

它感觉就像一个队友,会和你一起兴奋等等。

It feels like a teammate, and it's excited with you, etcetera.

Speaker 1

比如,Codex 就要枯燥得多,这挺有意思的,因为在 ChatGPT 中,它要活泼得多,也更谄媚。

I would say, for example, Codex is a lot more dry, which is kind of interesting, because in ChatGPT it's a lot more upbeat and highly sycophantic.

Speaker 1

但我认为作为编码代理的 Codex 非常枯燥。

But I would say Codex, the coding agent, is very dry.

Speaker 1

它似乎并不关心你正在创造什么。

It doesn't seem to care about what you're creating.

Speaker 1

这就像是,哦,我实现了它。

It's kinda like, oh, I implemented it.

Speaker 1

就像是,好吧,但你理解我们在构建什么吗?

It's like, okay, but do you understand what we're building?

Speaker 0

确实如此。

It's true.

Speaker 1

你知道,另一点我想说的是,比如Claude,他们对奉承的把握相当到位,当Claude夸奖我时,我真的觉得自己稍微配得上这份夸奖。

You know, and the other thing I would say is, for example, with Claude, I think they dialed the sycophancy fairly well, where when Claude gives me praise, I do feel like I slightly deserve it.

Speaker 1

因为有时候我会给它一些不太清晰的想法,给出一个我觉得还不够成熟的想法,而它并不会做出很强烈的反应。

Because sometimes I kinda give it, like, not very well formed thoughts, and I give it an idea that I don't think is fully baked, and it doesn't actually react very strongly.

Speaker 1

它就像是,哦,好啊。

It's like, oh, yeah.

Speaker 1

我们可以实现这个。

We can implement that.

Speaker 1

但当它觉得是我的一个真正好主意时,确实会给予更多的肯定。

But when it's a really good idea by my own account, it does seem to reward it a bit more.

Speaker 1

所以我总觉得我在努力赢得它的赞美,这真的很奇怪。

And so I kinda feel like I'm trying to, like, earn its praise, which is really weird.

Speaker 1

嗯哼。

Mhmm.

Speaker 1

所以我确实认为个性非常重要,我觉得其他很多工具可能没有充分认识到这一点。

And so I do think the personality matters a lot, and I think a lot of the other tools maybe don't appreciate it as much.

Speaker 1

而且在这方面,彼得也非常重视这一点,所以他是对的。

And I think in this aspect also, Peter really cares about this, and so that was correct.

Speaker 1

然后是记忆系统,还有就是,你知道的,他只是在享受这个过程。

And then the memory system, and then just, you know, he's just having fun with this.

Speaker 1

然后是通过一个单一的 WhatsApp 门户来实现所有自动化。

And then the the single WhatsApp portal to all of the automation.

Speaker 0

是的。

Yeah.

Speaker 0

除了软件工程之外,你个人有没有做过什么你觉得有趣或特别的事情?

Is there something that you have done personally with your claws beyond software engineering that you think is fun or interesting?

Speaker 1

是的。

Yeah.

Speaker 1

所以今年一月,我有了一个 Claw。

So in January, I had a Claw.

Speaker 1

我经历了一段 Claw 狂热期。

I went through a period of Claw psychosis.

Speaker 1

所以我搭建了一个基本上能照顾我家的 Claw,我叫他多比(Dobby),精灵 Claw。

So I built a Claw that basically takes care of my home, and I call him Dobby, the elf Claw.

Speaker 1

我基本上用这些智能体来扫描我家里的所有智能家居子系统,结果让我有点惊讶,它们开箱即用。

And, basically, I used the agents to find all of the smart home subsystems of my home on the local area network, which I was kind of surprised that worked out of the box.

Speaker 1

我只是告诉它,我觉得我家有Sonos音响。

Like, I just told her that I think I have Sonos at home.

Speaker 1

你能试着找找看吗?

Like, can you try to find it?

Speaker 1

它对局域网上的所有计算机进行了IP扫描,找到了Sonos设备,结果发现没有任何密码保护之类的安全措施。

And it goes, and it did, like, IP scan of all the, basically, computers on the local area network, and it found the Sonos thing, the Sonos system, and it turned out that there's no password protection or anything like that.

Speaker 1

它直接登录了,然后说:哦,原来你家里安装了Sonos系统。

It just logged in, and it's like, oh, yeah, have these Sonos systems installed.

Speaker 1

让我试着逆向分析它是怎么工作的。

I let me try to reverse engineer how it's working.

Speaker 1

它进行了一些网络搜索,找到了这些API端点。

It does some web searches, and it finds like, okay, these are the API endpoints.

Speaker 1

然后它问:你想试试吗?

And then it's like, do you wanna try it?

Speaker 1

我当时说:哇哦。

And I'm like, woah.

Speaker 1

你刚刚就做到了。

Like, you just did that.

Speaker 1

我说:是的。

I'm like, yeah.

Speaker 1

你能试试在书房播放点东西吗?

Can you try to play something in the study?

Speaker 1

它真的做到了,音乐响起来了。

And it does, and music comes out.
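
The discovery step in this story can be sketched as a plain LAN scan. One real detail is assumed here: Sonos players are known to expose an HTTP control interface on port 1400. Everything else (the subnet prefix, the actual SOAP/UPnP playback calls) is omitted or hypothetical.

```python
import socket

def subnet_hosts(prefix: str) -> list[str]:
    # e.g. "192.168.1" -> the 254 usable addresses on a /24 subnet.
    return [f"{prefix}.{i}" for i in range(1, 255)]

def probe(ip: str, port: int = 1400, timeout: float = 0.2) -> bool:
    # True if something is listening on ip:port (a candidate Sonos player).
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def find_sonos(prefix: str) -> list[str]:
    # Sweep the subnet for hosts answering on the Sonos control port.
    return [ip for ip in subnet_hosts(prefix) if probe(ip)]
```

In practice a real implementation would more likely use SSDP/UPnP discovery rather than a brute-force port sweep, but the point of the anecdote stands either way: the speakers answered without any authentication.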

Speaker 1

我简直不敢相信我刚刚只是。

And I'm like, I can't believe I just.

Speaker 0

这太疯狂了。

That's crazy.

Speaker 0

这仅仅用了三个提示。

That's like three prompts.

Speaker 1

我简直不敢相信我刚刚输入了:你能找到我的Sonos吗?

I can't believe I just typed in, like, can you find my Sonos?

Speaker 1

然后它就突然开始播放音乐了。

And that suddenly it's playing music.

Speaker 1

它对灯光也做了同样的事,基本上就像黑进了系统,搞清楚了整个机制,创建了API和仪表板,让我能查看家中所有灯光的命令中心,然后就能开关灯了。

And it did the same for lights, and so basically, like, it kinda hacked in, figured out the whole thing, created APIs, created dashboard, so I could see the command kinda center of, like, all of my lights in the home, then and it was like switching lights on and off.

Speaker 1

你知道,我可以对它说,比如"多比,睡觉时间到了"。

You know, so I can ask it, like, Dobby, sleepy time.

Speaker 1

当到了睡前时间,就意味着所有灯都会关掉,等等之类的操作。

And when it's sleepy time, that just means all the lights go off, etcetera, and so on.

Speaker 1

所以它能控制我所有的灯、暖通空调、窗帘、泳池、热水浴缸,还有我的安防系统。

So it controls all of my lights, my HVAC, my shades, the pool, and the spa, and also my security system.

Speaker 1

所以我在房子外面装了一个摄像头,每当有人开车进来,我都会用一个 Qwen 模型来查看视频。

So I have a camera pointed outside of the house, and anytime someone rolls in, I have a Qwen model that looks at the videos.

Speaker 1

首先,它会进行变化检测。

So, first of all, there's change detection.

Speaker 1

对。

Right.

Speaker 1

然后基于变化检测,画面会传给 Qwen,接着它会给我的 WhatsApp 发一条消息。

And then based on change detection, it goes to Qwen, and then it actually sends me a text to my WhatsApp.

Speaker 1

它会显示一张外面的图片,并告诉我:"嘿,联邦快递的车刚到,你可能得去看看,你有新邮件了"之类的。

It shows an image from the outside, and it says, hey, FedEx truck just pulled up, and you might wanna check it, and you got new mail or something like that.
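
The camera pipeline just described (cheap change detection gating an expensive vision-model call, whose output is forwarded as a chat message) can be sketched as follows. The vision model and the WhatsApp notifier are stubs passed in as callables, and frames are reduced to plain numbers purely for illustration.

```python
def changed(prev: float, curr: float, threshold: float = 10.0) -> bool:
    # Stand-in for pixel-difference change detection between two frames.
    return abs(curr - prev) > threshold

def watch(frames: list[float], describe, notify) -> int:
    # Run the vision model (`describe`) only on frames that differ from
    # the previous one, and push each description to the notifier
    # (standing in for a WhatsApp message). Returns the message count.
    sent = 0
    for prev, curr in zip(frames, frames[1:]):
        if changed(prev, curr):
            notify(describe(curr))
            sent += 1
    return sent
```

The design choice worth noting is the gating: the cheap frame-diff runs on every frame, while the expensive model call only fires on actual changes, which is what makes an always-on camera feed affordable.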

Speaker 1

多比刚给我发了这条消息。

And Dobby just texted me this.

Speaker 1

这真的太不可思议了。

This is really incredible.

Speaker 1

所以,多比负责管理整个房子。

So, Dobby is in charge of the house.

Speaker 1

我通过 WhatsApp 和它交流,拥有这些能维护我家的宏操作,感觉非常有趣。

I text with it through WhatsApp, and it's been, like, really fun to have these macro actions that maintain my house.

Speaker 1

我还没真正把它用到更复杂的程度,我觉得很多人在用它做更疯狂的事情。

I haven't, like, really pushed it, like, way more beyond that, and I think people are doing a lot more crazy things with it.

Speaker 1

但对我来说,即使只是一个家庭自动化设置,我以前也要用六个不同的应用。

But for me, even just a home automation setup, I used to use, like, six apps.

Speaker 1

是的。

Yeah.

Speaker 1

完全是不同的应用。

Like, completely different apps.

Speaker 1

我不再需要使用这些应用了。

And I don't have to use these apps anymore.

Speaker 1

多比能用自然语言控制一切。

Like, Dobby controls everything in natural language.

Speaker 1

太神奇了。

It's amazing.

Speaker 1

我觉得,即使我还没完全推动这个范式,它已经非常有帮助、非常鼓舞人心了。

And so I think, like, I haven't even pushed the paradigm fully, but already that is so helpful and so inspiring, I would say.

Speaker 0

你认为这从用户体验的角度反映了人们真正想要的东西吗?

Do you think that's indicative of, like, what people want from a user experience perspective with software?

Speaker 0

对吧?

Right?

Speaker 0

因为我觉得,人们往往忽视了人类需要花精力去学习新软件、新界面这一点。

Because I I don't think you know, it's pretty ignored that it takes humans effort to, like, learn new software, like, new UI.

Speaker 1

是的。

Yeah.

Speaker 1

我认为在某种程度上,这是对的。

I think to some extent, that's right.

Speaker 1

这就像从人们对AI的预期倒推回来,因为人们心中对AI的理解,实际上并不等同于LLM的原始本质。

It's like working backwards from how people think an AI should be, because what people have in their mind of like what an AI is is not actually what an LLM is by, like, a raw sense.

Speaker 1

LLM 就是一个 token 生成器。

Like, LLM is a token generator.

Speaker 1

你知道,会不断输出更多的文本。

You know, like, more tokens come out.

Speaker 1

但他们想象的却是一个可以与之交流、还能记住对话内容的个性化的存在。

But what they think of is like this persona identity that they can tell stuff, and it remembers it.

Speaker 1

你懂吗?

You know?

Speaker 1

它就像是藏在WhatsApp背后的一个实体。

And it's just kinda an entity behind the WhatsApp.

Speaker 1

这样理解起来要容易得多。

It's like a lot more understandable.

Speaker 1

嗯嗯。

Mhmm.

Speaker 1

所以我认为,在某种程度上,这就像迎合人类对AI行为方式的既有期待。而背后其实涉及大量技术细节,而LLM作为原始的基元,对大多数人来说,可能过于简单,难以被视为真正的AI,不知你是否明白。

So I think, to some extent, it's like matching the expectations that humans already have for how an AI should behave. Under the hood, there are a lot of technical details that go into that, and LLMs are too raw of a primitive to actually register as AI, I think, for most people, if that makes sense.

Speaker 0

是的。

Yeah.

Speaker 0

我认为,我们正是这样理解AI的,并将其描述为多比或某个具体人物。

I think that's how we understand what the AI is and the description of it as Dobby or some person.

Speaker 0

这显然引起了我的共鸣。我也认为,你把家庭自动化中六个不同的软件系统统一起来的做法,引出了另一个问题:人们真的想要今天所有的软件吗?

It obviously resonates. I also think that the unification that you did across your six different software systems for your home automation speaks to a different question of, like, do people really want all of the software that we have today?

Speaker 0

对。

Yeah.

Speaker 0

对吧?

Right?

Speaker 0

因为我认为,你虽然拥有硬件,但已经舍弃了软件或用户体验层——你认为这才是人们真正想要的吗?

Because I would argue, like, well, you have the hardware, but you've now thrown away the software or the UX layer of it. Do you think that's what people want?

Speaker 1

是的。

Yeah.

Speaker 1

我觉得,某种程度上,像App Store里那些用于控制智能家居设备的应用程序,其实根本就不该存在。

I think there's this, like, there's this sense that these apps that are in the App Store for using these smart home devices, etcetera, these shouldn't even exist kind of in a certain sense.

Speaker 1

难道不应该只有API吗?难道智能代理不应该直接使用它们吗?

Like, shouldn't it just be APIs, and shouldn't agents be just using it directly?

Speaker 1

而且,我完全可以做各种单个应用程序都无法实现的家居自动化操作。

And wouldn't it, like, I can do all kinds of home automation stuff that any individual app will not be able to do.

Speaker 1

对吧?

Right?

Speaker 1

而大语言模型则可以驱动工具,调用正确的工具,完成相当复杂的事情。

And then LLM can actually drive the tools and call all the right tools and do pretty complicated things.

Speaker 1

所以,某种程度上,这确实指向了一个问题:或许我们过度生产了大量本不该存在的定制化专属应用,因为智能代理可以把它们整合起来,所有功能都应该以更直接的API端点形式暴露,而智能代理才是连接这些组件、实现智能调用的粘合剂。

And so, in a certain sense, it does point to this Like, maybe there's an overproduction of lots of custom bespoke apps that shouldn't exist, because agents crumble them up, and everything should be a lot more just exposed API endpoints, and agents are the glue of the intelligence that actually, like, tool calls all the parts.

Speaker 1

另一个例子是我的跑步机。

Another example is, like, my treadmill.

Speaker 1

我跑步机上有个应用,我想记录自己做有氧运动的频率,但我不想登录网页界面,然后一步步操作之类的。

There's an app for my treadmill, and I wanted to, like, keep track of how often I do my cardio, but, like, I don't want to, like, log into a web UI and go through a flow and etcetera.

Speaker 1

所有这些都应该只是提供API,这其实正朝着智能代理主导的网络或代理优先工具这类方向发展。

Like, all this should just be, like, make APIs available, and this is kind of, you know, going towards the agentic sort of web or, like, agent first tools and all this kind of stuff.

Speaker 1

所以我认为整个行业需要在很多方面重新调整,因为客户已经不再是人类了。

So I think the industry just has to reconfigure in so many ways that it's, like, the customer is not the human anymore.

Speaker 1

真正代表人类行动的是智能代理,这种重构在某种程度上将会是巨大的。

It's, like, agents who are acting on behalf of humans, and this refactoring will be will probably be substantial in a certain sense.

Speaker 1

有人有时会反驳说:我们真的期望人们用 vibe coding 自己搭出这些工具吗?

One way that people sometimes push back on this is, like, do we expect people to vibe code some of these tools?

Speaker 1

我们真的期望普通人来做我描述的这种事吗?

Do we expect normal people to do this kind of stuff that I described?

Speaker 1

嗯哼。

Mhmm.

Speaker 1

但我觉得在某种程度上,这就是当今技术的现状:现在确实需要一些 vibe coding,我在看着它运行,并与系统协作。

But I think to some extent, this is just, you know, technology as it exists today, and right now there is some vibe coding, and I'm actually watching it, and I'm working with the system.

Speaker 1

但我总觉得,我刚才说的这种东西,一两年甚至三年内就应该免费了。

But I kinda feel like this kind of stuff that I just talked about, this should be free, like, in a year or two or three.

Speaker 1

这不需要任何 vibe coding。

There's no vibe coding involved.

Speaker 1

这太简单了。

This is trivial.

Speaker 1

这已经是基本要求了。

This is table stakes.

Speaker 1

这就像任何AI,包括开源模型,都能做到的事情。

This is like any AI, even the open source models, etcetera, can, like, do this.

Speaker 0

你应该能够轻松地将非技术人员的意图转化为这种程度。

You should be able to translate from a less technical human's intent very easily to this Yeah.

Speaker 1

现在涉及的是 vibe coding,没多少人会去做,不过

It's vibe coding that's involved, and not many people are gonna do it, but

Speaker 0

你仍然需要做出一些设计决策。

you still have to make some design decisions.

Speaker 0

对吧?

Right?

Speaker 0

我们刚才在说,比如取帧这个例子。

We were talking about, like, take frames, for example.

Speaker 1

是的。

Yeah.

Speaker 1

但我总觉得,这种障碍会很快消失,所有事情都会变成为你服务的临时软件,某种类似Claw的系统会帮你处理所有细节,而你根本不需要参与。

But I kinda feel like this will just start to the barrier will just come down, and it's just ephemeral software on your behalf, and some kind of, like, Claw is handling all the details for you, but you're not involved.

Speaker 1

Claw有自己的机器,它会自己搞定一切。

Claw has a machine, and it will figure it out.

Speaker 1

它只是向你展示用户界面,而你只需要说一些话。

And it's just presenting you UIs, and you're, like, saying stuff.

Speaker 1

你知道的。

You know?

Speaker 0

嗯。

Mhmm.

Speaker 0

你为什么没有尝试用 Claw 去突破你个人能做的事情的边界呢?

Why haven't you, I guess, like, pushed the boundaries of what you can do personally with Claws?

Speaker 0

你是专注于更重要的项目,比如AutoResearch之类的,还是在攀登掌握技能的高峰,或者其他原因?

Like, is it, you know, you're focusing on more important projects, AutoResearch, etcetera, or you're climbing the hill to mastery or something else.

Speaker 0

对吧?

Right?

Speaker 1

是的。

Yeah.

Speaker 1

我只是觉得被各种事情分散了注意力。

I just feel like I'm so distracted by everything.

Speaker 1

我花了大概一周时间在课程相关的事情上,结果待办事项反而更多了。

So I spent, like, a week on the class stuff, and I have almost more to-dos.

Speaker 1

但我要说的是

But I will say that

Speaker 0

这就像Jensen告诉我们的,不幸的是,我们都更忙了。

It's like Jensen told us we're all just busier, unfortunately.

Speaker 0

是的。

Yeah.

Speaker 1

我并没有充分利用电子邮件、日历和其他这些工具,甚至没有给它访问权限,因为我仍然有点怀疑,而且它还很新、很粗糙。

I didn't really take advantage of a lot of, like, email and calendar and all this other stuff, and I didn't even give it access, because I'm still a little bit, like, suspicious, and it's still very new and rough around the edges.

Speaker 1

所以我不想让这些工具完全访问我的数字生活,部分原因在于安全、隐私,以及在这个领域保持高度谨慎。

So I didn't wanna give it, like, full access to my digital life yet, and part of it is just the security, privacy, and just being very cautious in that in that realm.

Speaker 1

因此,有些方面是被这些顾虑拖慢了,我想这么说。

And so some of it is, like, held back by that, I would say.

Speaker 1

是的。

Yeah.

Speaker 1

也许这确实是主要因素,但另一部分原因是我感觉太分心了,因为我刚花了一周时间在爪子上,然后其他事情又接踵而至。

Maybe that's like the dominant dominant feature, but some of it is also just I feel so distracted because I feel like I had a week of claw, and then other stuff is happening.

Speaker 0

你之前提到过,长期希望看到智能体能完成训练或至少优化模型这样的任务,那是什么促使你做AutoResearch的呢?

What was the I mean, you've talked about, like, being able to train or at least optimize a model as a task you wanted to see agents do for a long time.

Speaker 0

AutoResearch背后的动机是什么?

What was the motivation behind AutoResearch?

Speaker 1

你说AutoResearch啊,对。

AutoResearch, yeah.

Speaker 1

我之前发过一条推文,大致是说,要想把现在能用的这些工具用到极致,你就得避免把自己变成瓶颈。

So, I think, I had a tweet earlier where I kind of said something along the lines of to get the most out of the tools that have become available now, you have to remove yourself as the bottleneck.

Speaker 1

你不能一直守在那里,给下一个任务写提示词。

You can't be there to prompt the next thing.

Speaker 1

得跳出这种事事亲为的状态。

Need to take yourself outside.

Speaker 1

你得把各项事宜安排得完全自主化。你要怎样才能最大化你的令牌吞吐量,同时自己不在循环之中?

You have to arrange things such that they're completely autonomous. How can you maximize your token throughput and not be in the loop?

Speaker 1

这就是我们的目标。

This is the goal.

Speaker 1

所以我之前提过,现在的核心任务就是提升自身的掌控效率。

And so, I kind of mentioned that the name of the game now is to increase your leverage.

Speaker 1

我只需要偶尔投入极少的令牌,就能有海量的工作替我自动完成。

I put in just very few tokens just once in a while, and a huge amount of stuff happens on my behalf.

Speaker 1

所以我发了那条推文,我觉得人们挺喜欢它之类的,但他们可能还没真正想清楚它的含义。

So, like, I tweeted that, and I think people liked it and whatnot, but they haven't, like, maybe worked through the implications of it.

Speaker 1

对我来说,AutoResearch 就是这种含义的一个例子——我不想成为那个在循环中查看结果的研究员等等。

And for me, AutoResearch is an example of, like, an implication of that, where it's like, I don't wanna be, like, the researcher in loop, like, looking at results, etcetera.

Speaker 1

我其实是在拖慢整个系统的进度。

Like, I'm I'm holding the system back.

Speaker 1

所以问题来了,我该如何重构所有这些抽象层,让我只需要安排一次,然后按下运行键就行。

So, the question is, how do I refactor all the abstractions so that I just have to arrange it once and hit go?

Speaker 1

核心在于,你如何让更多的代理在更长的时间内无需你的参与,替你完成任务?

The name of the game is how can you get more agents running for longer periods of time without your involvement, doing stuff on your behalf?

Speaker 1

AutoResearch 就是这样——这里有个目标,有个指标,有你的行动边界,然后去执行吧。

And AutoResearch is just, yeah, here's an objective, here's a metric, here's your boundaries of what you can and cannot do, and go.
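
The "objective, metric, boundaries, go" recipe can be sketched as a small loop. This is a hedged illustration, not code from the episode: `auto_research`, `propose`, and the toy objective are hypothetical stand-ins for "run an experiment, score it against the metric, keep the best."

```python
import random

def auto_research(objective, propose, budget, seed=0):
    # Fully autonomous loop: no human prompting the next thing.
    # Just an objective (lower is better), a proposal step, and a budget.
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(budget):
        cfg = propose(rng, best_cfg)   # agent proposes a candidate
        score = objective(cfg)         # objective metric, cheap to check
        if score < best_score:         # keep only verified improvements
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy stand-in for "train a model, report validation loss":
# the optimum is at x = 3, y = -1.
def objective(cfg):
    x, y = cfg
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

def propose(rng, best):
    if best is None:  # cold start: sample inside the boundaries
        return (rng.uniform(-5, 5), rng.uniform(-5, 5))
    x, y = best       # otherwise perturb the current best
    return (x + rng.gauss(0, 0.5), y + rng.gauss(0, 0.5))

cfg, score = auto_research(objective, propose, budget=2000)
```

Swapping the toy objective for an actual training run is what would turn this sketch into the kind of overnight loop described here.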

Speaker 0

是的,你对它的效果感到惊讶。

Yeah, You were surprised at its effectiveness.

Speaker 1

是的,我没指望它能成功。因为我有 nanochat 项目,而且说实话,我觉得很多人对我痴迷于训练 GPT-2 模型之类的事情感到困惑。

Yeah, I didn't expect it to work. So, I have the nanochat project, and fundamentally, like, I think a lot of people are very confused with my obsession for, like, training GPT-2 models and so on.

Speaker 1

但对我来说,训练GPT模型之类的事情只是训练大语言模型的一个小工具和小试验场。

But for me, training GPT models and so on is just a little harness, a little playground for training LLMs.

Speaker 1

而从根本上说,我更感兴趣的是递归自我改进这个理念,以及大语言模型在多大程度上能够真正实现自我提升。

And fundamentally, what I'm more interested in is, like, this idea of recursive self improvement and to what extent you can actually have LLMs improving LLMs.

Speaker 1

因为我认为,所有前沿实验室都在做这件事。

Because I think all the frontier labs, this is like the thing

Speaker 0

嗯。

Mhmm.

Speaker 1

出于显而易见的原因,它们都在尝试实现递归式的自我改进,大致如此。

For obvious reasons, and they're all trying to recursively self improve, roughly speaking.

Speaker 1

所以对我来说,这就像从那个方向延伸出来的一个小游乐场。

And so for me, this is kinda like a little playpen off that.

Speaker 1

而且我想我已经用我习惯的传统方式手动调优了 nanochat 相当多了。

And I guess I'd, like, tuned nanochat already quite a bit by hand, in the good old-fashioned way that I'm used to.

Speaker 1

像我这样的研究者。

Like, I'm a researcher.

Speaker 1

我已经做了将近二十年了。

I've done this for, like, you know, two decades.

Speaker 1

我有一些……那个词的反面怎么说来着——

I have some amount of, like, what is the opposite of

Speaker 0

earned confidence

Earned confidence.

Speaker 1

好的。

Okay.

Speaker 1

我有将近二十年的经验,比如,我已经训练过这个模型成千上万次了,做过大量实验。

I have, like, two decades of, oh, I've trained this model thousands of times, so I've done a bunch of experiments.

Speaker 1

我做过超参数调优。

I've done hyperparameter tuning.

Speaker 1

我做过所有那些我非常熟悉的事情,而且已经做了二十年了。

I've done all the things I'm very used to, and I've done for two decades.

Speaker 1

是的。

Yeah.

Speaker 1

我已经达到了某个阶段,以为自己已经调得相当不错了。

And I've gotten to a certain point, and I thought it was, like, fairly well tuned.

Speaker 1

然后我让AutoResearch运行了一整晚,它回来时给出了我从未想到过的调参方案。

And then I let AutoResearch go for like overnight, and it came back with like tunings that I didn't see.

Speaker 1

是的,我确实忘了值嵌入上的权重衰减,而且我的 Adam 贝塔参数也没调好,这些因素会相互影响。

And yeah, I did forget, like, the weight decay on the value embeddings, and my Adam betas were not sufficiently tuned, and these things jointly interact.

Speaker 1

所以,一旦你调整了某个参数,其他参数也可能需要随之改变。

So, like, once you tune one thing, the other things have to potentially change too.

Speaker 1

你知道,我不应该成为瓶颈。

You know, I shouldn't be a bottleneck.

Speaker 1

我不应该亲自去运行这些超参数优化。

I shouldn't be running these hyperparameter optimizations.

Speaker 1

我不应该亲自去看结果。

I shouldn't be looking at the results.

Speaker 1

在这种情况下有客观标准,所以你只需要安排好,让它能持续自动运行下去。

There's objective criteria in this case, so you just have to arrange it so that it can just go forever.
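
The "forget one knob, and the others have to move too" point can be made concrete with a tiny joint sweep. This is an illustrative sketch only: the search space, the toy loss, and the coupling term are hypothetical stand-ins for a real training run, not anything from the episode.

```python
import itertools

# Hypothetical search space echoing the knobs mentioned above.
SPACE = {
    "value_emb_weight_decay": [0.0, 0.01, 0.1],
    "adam_beta1": [0.85, 0.9, 0.95],
    "adam_beta2": [0.95, 0.99, 0.999],
}

def toy_val_loss(cfg):
    # Stand-in for an overnight training run. The last term couples
    # weight decay and beta2, so the knobs jointly interact: tuning
    # one shifts the best value of the other.
    wd = cfg["value_emb_weight_decay"]
    b1, b2 = cfg["adam_beta1"], cfg["adam_beta2"]
    return (wd - 0.01) ** 2 + (b1 - 0.9) ** 2 + (b2 - 0.99) ** 2 + wd * (1 - b2)

def sweep(space, loss_fn):
    # Exhaustive grid sweep: purely objective, no human in the loop.
    keys = sorted(space)
    best_cfg, best_loss = None, float("inf")
    for values in itertools.product(*(space[k] for k in keys)):
        cfg = dict(zip(keys, values))
        loss = loss_fn(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

best_cfg, best_loss = sweep(SPACE, toy_val_loss)
```

Because of the coupling term, the best beta2 on this grid is 0.999 rather than the 0.99 the squared term alone would pick, which is exactly the "once you tune one thing, the other things have to potentially change too" effect.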

Speaker 1

所以,这就是AutoResearch的一个版本,也就是一个单一循环试图进行改进。

So that's a single sort of version of AutoResearch, of, like, a single loop trying to improve.

Speaker 1

我很惊讶它竟然找到了这些我没想到的东西,要知道这个仓库的参数本来就已经调得相当不错了,它还是发现了新的优化点。

And I was surprised that it found these things. You know, the repo was already fairly well tuned, and it still found something.

Speaker 1

而这只是一个单一的循环。

And that's just a single it's a single loop.

Speaker 1

比如Frontier Labs,他们拥有数以万计的GPU集群。

Like, Frontier Labs, they have GPU clusters of tens of thousands of them.

Speaker 1

因此,很容易想象如何在较小的模型上实现大量这种自动化。

And so, it's very easy to imagine how you would basically get a lot of this automation on smaller models.

Speaker 1

从根本上说,前沿级别智能的一切都围绕着外推和缩放定律(scaling laws)展开。

And fundamentally, everything around, like, frontier-level intelligence is about extrapolation and scaling laws.

Speaker 1

所以,你基本上会在较小的模型上进行大量探索,然后再尝试向外推演。

And so, you basically do a ton of the exploration on the smaller models, and then you try to extrapolate out.

Speaker 0

所以你的意思是,我们的研究工作会变得更高效了。

So you're saying our research efforts are gonna get more efficient.

Speaker 0

如果我们能更好地进行这种实验,我们在扩展时也会有更明确的方向。

Like, we're gonna have better direction for when we scale as well if we can do this experimentation better.

Speaker 1

是的。

Yeah.

Speaker 1

我认为最有趣的项目,也是前沿实验室正在做的,是在较小的模型上进行实验。

I would say that, like, the most interesting project, and probably what the frontier labs are working on, is you experiment on the smaller models.

Speaker 1

你要尽可能让它自主化,把研究人员从循环中移除。

You try to make it as autonomous as possible, remove researchers from the loop.

Speaker 1

他们有太多……那个词的反面怎么说来着?

They have way too much, what is the opposite?

Speaker 1

赚到的自信,对。

Earned confidence. Yeah.

Speaker 1

是的。

Yeah.

Speaker 1

他们也不知道。

They don't know.

Speaker 1

他们真的不应该接触这些。

They shouldn't be touching any of this, really.

Speaker 1

所以,你必须重写整个系统,因为现在他们突然可以提出想法,但好吧,他们实际上不应该执行这些想法。

So, you have to rewrite the whole thing, because right now, suddenly they can contribute ideas, but, okay, they shouldn't actually be enacting those ideas.

Speaker 1

有一个想法队列,可能有一个自动化科学家根据所有存档论文和GitHub仓库提出想法,并将这些想法输入队列,研究人员也可以贡献想法,但只有一个队列,有工作人员从中取出任务并进行尝试,任何有效的结果都会被放入功能分支,有时会有人监控功能分支并将其合并到主分支。

There's a queue of ideas, and there's maybe an automated scientist that comes up with ideas based on all the archive papers and GitHub repos, and it funnels ideas in, or researchers can contribute ideas, but it's a single queue, and there's workers that pull items, and they try them out, and whatever works just gets put on the feature branch, and maybe some people monitor the feature branch and merge to the main branch sometimes.
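
The single-queue pipeline described here (ideas funnel in, workers try them, only verified wins land on the feature branch) can be sketched in a few lines. All names and the toy experiment are illustrative assumptions, not anything from the episode.

```python
from queue import Queue

def run_pipeline(ideas, try_idea, baseline):
    # One shared queue: an automated scientist and human researchers
    # both just funnel ideas into it.
    q = Queue()
    for idea in ideas:
        q.put(idea)
    feature_branch, best = [], baseline
    while not q.empty():
        idea = q.get()
        result = try_idea(idea)          # a worker runs the experiment
        if result < best:                # objective check: beat the branch?
            feature_branch.append(idea)  # verified win lands on the branch
            best = result
    return feature_branch, best

# Toy experiment: each "idea" is a learning-rate multiplier, and
# try_idea returns a pretend validation loss (2.0 is the optimum here).
ideas = [0.5, 1.0, 2.0, 4.0, 8.0]
try_idea = lambda m: abs(m - 2.0) + 1.0
branch, loss = run_pipeline(ideas, try_idea, baseline=3.0)
```

In a real version the workers would run concurrently and someone (human or agent) would merge the feature branch into main; a single thread is enough to show the shape.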

Speaker 1

所以,是的,就是把人类从所有流程中移除,尽可能自动化,并实现高令牌每秒吞吐量。

So, yeah, just removing humans from all the processes and automating as much as possible and getting high tokens per second throughputs.

Speaker 1

这需要重新思考所有的抽象概念,一切都必须重新调整。

And it does require rethinking of all the abstractions, and everything has to be reshuffled.

Speaker 1

所以,是的,我觉得这非常令人兴奋。

So, yeah, I think it's very exciting.

Speaker 0

如果我们再进一步递归一步,模型什么时候能写出比你更好的ProgramMD?

If we take one more recursive step here, when is the model gonna write a better ProgramMD than you?

Speaker 1

是的。

Yeah.

Speaker 1

所以ProgramMD是

So ProgramMD is

Speaker 0

我们并没有脱离循环。

We're not not in the loop.

Speaker 1

是的。

Yeah.

Speaker 1

没错。

Exactly.

Speaker 1

是的。

Yeah.

Speaker 1

所以ProgramMD是我对AutoResearch应该如何运作的一种粗糙描述。

So ProgramMD is my crappy attempt at describing, like, how the AutoResearch should work.

Speaker 1

比如,先做这个,然后做那个,再做那个,接着尝试这些类型的想法。

Like, oh, do this, then do that, and that, and then try these kinds of ideas.

Speaker 1

然后这里可能有一些想法,看看架构、优化器等等。

Then here's maybe some ideas, look at architecture, look at optimizer, etcetera.

Speaker 1

但我只是用 Markdown 写出了这个东西。

But I just came up with this in markdown.

Speaker 1

对吧?

Right?

Speaker 1

所以,没错,就是这样。

And so, yeah, exactly.

Speaker 1

你或许想要某种 AutoResearch 循环。你可以想象,不同的 ProgramMD 会带来不同的进展。

You want some kind of an AutoResearch loop. You can imagine that different ProgramMDs would give you different progress.

Speaker 1

基本上,每个研究机构都可以用 ProgramMD 来描述。

So, basically, every research organization is described by ProgramMD.

Speaker 1

是的。

Yeah.

Speaker 1

一个研究机构就是一组 Markdown 文件,描述了所有角色以及整个系统的连接方式。

A research organization is a set of markdown files that describe all the roles and how the whole thing connects.

Speaker 1

你可以想象有一个更高效的研究组织。

And you can imagine having a better research organization.

Speaker 1

所以,也许他们早上开的站会更少,因为这些站会没什么用。

So, maybe they do fewer stand ups in the morning because they're useless.

Speaker 1

而这所有的内容都只是代码。

And this is all just code.

Speaker 1

对吧?

Right?

Speaker 1

因此,一个组织可以减少站会次数。

And so one organization can have fewer stand-ups.

Speaker 1

一个组织可以增加站会次数。

One organization can have more.

Speaker 1

一个组织可以非常敢于冒险,另一个组织则可以更保守。

One organization can be very risk taking, one organization can be less.

Speaker 1

正如你完全可以想象的那样,你会有多个研究组织,而它们各自都有代码。

As you can definitely imagine that you have multiple research orgs, and then they all have code.

Speaker 1

一旦你有了代码,就可以想象去优化它。

And once you have code, then you can imagine tuning the code.

Speaker 1

所以,这中间确实存在一个元层次。

So, 100% there's like the meta layer of it.

Speaker 0

你看到我发的关于我的竞赛想法的消息了吗?

Did you see my text about my contest idea?

Speaker 0

我的竞赛想法是,让人们编写不同的程序MD。

My contest idea was, like, let people write different program MDs.

Speaker 0

对吧?

Right?

Speaker 0

那么,在相同的硬件上,哪里能获得最大的改进?

And so for same hardware, where do you get most improvement?

Speaker 1

哦,我明白了。

Oh, I see.

Speaker 0

然后你可以把所有这些数据交给模型,让它写出更好的ProgramMD。

And then you can take all that data and then give it to the model and say, write a better ProgramMD.

Speaker 1

是的。

Yes.

Speaker 1

是的。

Yes.

Speaker 1

对,正是如此。

Yeah, exactly.

Speaker 0

我们会得到更好的结果。

We're gonna get something better.

Speaker 0

反正我们肯定能行。

Like, there's no way we don't.

Speaker 1

对吧?

Right?

Speaker 1

你可以百分百地分析这些改进来自哪里,然后想想,我能不能修改ProgramMD,让这类情况更多地发生?

You can 100% look at where the improvements came from, and like, can I change ProgramMD such that more of these kinds of things would be done?

Speaker 1

或者那些没奏效的做法。

Or like things that didn't work.

Speaker 0

这是元优化。

It's meta optimization.

Speaker 1

是的。

Yeah.

Speaker 1

你完全可以想象这样做。

You can 100% imagine doing that.

Speaker 1

所以,我觉得这是个很棒的主意。

So, I think this is a great idea.

Speaker 1

但你知道,我觉得你可以一步一个脚印,先有一个流程,然后第二个流程,再下一个流程,这些都像洋葱的层层结构。

But it's like, you know, I think like you could sort of go one step at a time where you sort of have one process, and then second process, and then the next process, and these are all layers of an onion.

Speaker 1

像,LLM 这部分现在已经被视为理所当然了。

Like, the LLM sort of part is now taken for granted.

Speaker 1

代理部分现在也被视为理所当然了。

The agent part is now taken for granted.

Speaker 1

现在,像 Claude 这样的实体也被视为理所当然了,你可以有多个这样的实体,可以给它们下达指令,可以优化这些指令,但这就有点太多了,你知道吗?

Now the Claude-like entities are taken for granted, and now you can have multiple of them, and now you can have instructions to them, and now you can have optimization of the instructions, and it's just, like, it's a little too much, you know?

Speaker 1

但我的意思是,这正是它走向精神错乱的原因,因为这几乎是无限的,一切都被归结为技能问题。

But, I mean, this is why it gets to the psychosis is that this is, like, infinite, and everything is skill issue.

Speaker 1

所以我觉得,是的,这又回到了为什么这一切如此疯狂。

And that's why I feel like, yeah, that's just coming back to this is why it's so insane.

Speaker 0

好的。

Okay.

Speaker 0

如果我们只是想诊断当下这个时刻,以及现在哪些技能是相关的呢?

Well, if we're we're just trying to, like, diagnose the current moment and what is a relevant skill right now.

Speaker 0

你认为这意味着我们应该在不同领域努力实现这样的循环,并且它确实有效吗?

What do you think is the implication that this is the loop we should be trying to achieve in different areas and that it works?

Speaker 0

比如,移除人工干预,创建衡量标准,或者赋予代理持续工作的能力,而无需你亲自参与。

Like, you know, remove yourself, create the metric, or create the ability for agents to continue working on it without you.

Speaker 0

是的。

Yeah.

Speaker 0

我们还剩下性能工程吗?

Do we still have performance engineering?

Speaker 0

比如,那个,是的。

Like, what Yeah.

Speaker 1

我的意思是,对于这种 LLM 精神错乱,我有几点要补充的注意事项。

I mean, so there's a few caveats that I would put on top of the LLM psychosis.

Speaker 1

第一,这非常适合那些具有易于评估的客观指标的领域。

Number one, this is extremely well suited to anything that has objective metrics that are easy to evaluate.

Speaker 1

例如,为模型的各个部分编写更高效的CUDA内核等,简直是完美匹配。

So, for example, like writing kernels for more efficient CUDA code for various parts of a model, etcetera, the perfect fit.

Speaker 1

因为你有低效的代码,然后希望得到行为完全相同但快得多的高效代码。

Because you have inefficient code, and then you want efficient code that has the exact same behavior, but it's much faster.

Speaker 1

完美匹配。

Perfect fit.
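
What makes kernel work such a good fit is that "same behavior, much faster" is mechanically checkable. A toy sketch of that acceptance test (the reference and candidate functions are hypothetical stand-ins for a slow kernel and an optimized one):

```python
def reference_sum_sq(xs):
    # Slow but trusted reference implementation.
    total = 0.0
    for x in xs:
        total += x * x
    return total

def candidate_sum_sq(xs):
    # Candidate "optimized kernel": must match the reference
    # (up to tolerance) to be accepted at all.
    return sum(x * x for x in xs)

def accept(ref, cand, test_inputs, tol=1e-9):
    # Equivalence check over a suite of inputs; speed would be
    # benchmarked separately once correctness passes.
    return all(abs(ref(xs) - cand(xs)) <= tol for xs in test_inputs)

suite = [[float(i) for i in range(n)] for n in (0, 1, 7, 100)]
ok = accept(reference_sum_sq, candidate_sum_sq, suite)
```

The objective metric (equivalence plus measured speedup) is what lets an agent iterate on the candidate without a human judging the result.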

Speaker 1

所以,很多领域都非常适合自动研究,但很多领域并不适用。

So, a lot of things are perfect fit for AutoResearch, but many things will not be.

Speaker 1

所以,如果你无法评估它,就无法用自动研究来处理它。

It's just, if you can't evaluate it, then you can't AutoResearch it.

Speaker 1

对吧?

Right?

Speaker 1

所以,这是第一个注意事项。

So, that's like caveat number one.

Speaker 1

然后第二个注意事项,我想说的是,我们其实是在讨论下一步,也大致看到了下一步的方向,但从根本上说,整个系统仍然有点撑不住了,存在一些裂缝,还没有完全运作起来。

And then maybe caveat number two, I would say, is, you know, we're kinda talking about next steps, and we kinda see what the next steps are, but fundamentally, the whole thing is still kinda bursting at the seams a little bit, and there's cracks, and it doesn't fully work.

Speaker 1

如果你试图走得太快,整个系统实际上反而没有用处,你明白我的意思吗?

And if you kinda try to go too far ahead, the whole thing is actually net not useful, if that makes sense.

Speaker 1

因为这些模型仍然不是……它们确实进步了很多,但依然有很多不完善的地方,这是我能想到的最贴切的描述。

Because these models still are not you know, they've improved a lot, but they're still rough around the edges, is maybe the way I would describe it.

Speaker 1

我同时感觉像是在和一位极其聪明的博士生对话,这位博士生一生都在做系统编程,却又像个十岁孩子。

I simultaneously feel like I'm talking to an extremely brilliant PhD student who's been like a systems programmer for their entire life, and a 10 year old.

Speaker 1

这太奇怪了,因为人类通常不会出现这种组合,你懂的,你拥有所有

And it's so weird, because humans, I feel like they're a lot more coupled, like you have, you know, everything

Speaker 0

在现实中不会遇到这种组合。

in the real world, you wouldn't encounter that combination.

Speaker 1

这种不连贯性非常奇怪,而人类身上这种不连贯性要少得多,尽管他们确实也有一些。

This jaggedness is really strange, and humans have a lot less of that kind of jaggedness, although they definitely have some.

Speaker 1

但人类的不连贯性要多得多。

But humans have a lot more jaggedness.

Speaker 1

抱歉,应该是智能体的不连贯性更严重,有时候我要求某个功能,它却返回完全错误的东西,然后我们就陷入完全错误的循环中。

Sorry, the agents have a lot more jaggedness, where sometimes, like, you know, I ask for functionality, and it like comes back with something that's just like totally wrong, and then we get into loops that are totally wrong.

Speaker 1

我仍然经常对智能体感到非常沮丧,因为你能感受到它的强大,但与此同时,它偶尔还是会做出一些毫无意义的事情。

And I just get so frustrated with the agents all the time still, because you feel the power of it, but it also still does nonsensical things once in a while for me as well.

Speaker 0

当我觉得智能体浪费了大量计算资源去处理一个本应一眼就能识别出的明显问题时,我会非常恼火。

I get very annoyed when I feel like the agent wasted a lot of compute on something it should have recognized was an obvious problem.

Speaker 1

是的。

Yeah.

Speaker 1

我认为一些更大的问题可能在于,如果让我推测的话,这些模型本质上是通过强化学习训练的。

I think like some of the bigger things is like, maybe what's underneath it, if I could hypothesize, is fundamentally these models are trained via reinforcement learning.

Speaker 1

所以,它们实际上正面临我们刚才讨论的同样问题:实验室可以改进那些可验证且有明确奖励的方面。

So, they're actually struggling with the exact same thing we just talked about, which is the labs can improve the models in anything that is verifiable, that has rewards.

Speaker 1

所以,你把程序写对了吗?它能通过单元测试吗?

So, did you write the program correctly, and do the unit tests check out?

Speaker 1

是或否。

Yes or no.

Speaker 1

但它们在某些方面确实有困难,比如,我认为它们很难把握我心中所想或我的真实意图,以及何时该提出澄清性问题。

But some of the things where they're struggling is, like, example, I think they have a tough time with, like, nuance of maybe what I had in mind or what I intended and when to ask clarifying questions.

Speaker 1

就像,我 yeah。

Like, what I yeah.

Speaker 1

任何感觉不够明确的事情,都会更糟。

It's just anything that feels softer is, like, worse.

Speaker 1

所以,你要么在轨道上,成为超级智能系统的一部分,要么不在轨道上,脱离了可验证的领域,然后一切就开始随意飘荡。

And so you're kind of, like, you're either on rails and you're part of the superintelligence circuits, or you're not on rails and you're outside of the verifiable domains, and suddenly everything kind of just, like, meanders.

Speaker 1

也许换种说法是,如果你今天去使用最先进的模型,比如 ChatGPT,问它讲个笑话。

Like, maybe another way to put it is, if today you go to, like, a state-of-the-art model, ChatGPT, and you ask it, tell me a joke.

Speaker 1

你能猜到你会听到什么笑话吗?

Do you know what joke you're gonna get?

Speaker 1

这就是笑话。

There's the joke.

Speaker 0

笑话?

The joke?

Speaker 0

我说不好标准版本是什么,但我感觉ChatGPT有三个笑话。

I can't tell you, like, the, you know, standard form of it, but I do feel like ChatGPT has three jokes.

Speaker 1

是的。

Yeah.

Speaker 1

那个显然所有大语言模型最爱的笑话是:为什么科学家不信任原子?

The joke that apparently all the LLMs, like, love the most is: why do scientists not trust atoms?

Speaker 0

好吧。

Okay.

Speaker 1

因为它们编造一切。

Because they make everything up.

Speaker 0

好吧。

Okay.

Speaker 1

它们编造了一切。

They make everything up.

Speaker 1

好吧。

Okay.

Speaker 1

这就是那个——

This is the

Speaker 0

为什么这个笑话至今还会出现?

Why would that still emerge?

Speaker 1

所以,三、四年前你会听到这个笑话,放到今天你还是会听到同一个笑话。

So, this is the joke you would get three or four years ago, and this is the joke you still get today.

Speaker 1

好吧。

Okay.

Speaker 1

所以,即便这些模型的能力已经有了飞跃式的提升,而且要是你给它们布置一个带执行属性的任务,它们能连续运作好几个小时,帮你完成各种不可能的任务。

So, even though the models have improved tremendously, and if you give them an agentic task, they will just go for hours and move mountains for you.

Speaker 0

是啊。

Mhmm.

Speaker 1

然后你要求讲个笑话,它却给你一个五年前的愚蠢、糟糕的笑话。

And then you ask for, like, a joke, it has a stupid joke, a crappy joke from five years ago.

Speaker 1

嗯。

Mhmm.

Speaker 1

这是因为这超出了强化学习的范围。

And it's because it's outside of the RL.

Speaker 1

这超出了强化学习的范畴。

It's outside of the reinforcement learning.

Speaker 1

这超出了正在改进的部分。

It's outside of what's being improved.

Speaker 1

这也是那种不连贯性的一部分:难道你不觉得,随着模型变得更好,它们也应该讲出更好的笑话,或者更多样化的笑话吗?

And it's part of the jaggedness: shouldn't you expect models, as they get better, to also have better jokes, or more diversity of them?

Speaker 1

它没有被优化,所以卡住了。

It's not being optimized, and it's stuck.

Speaker 0

你认为这是否意味着,我们并没有看到一种泛化现象,即幽默感的智能与代码能力的智能是相互关联的?

Do you think that that implies that we are not seeing, like, generalization in the sense of, like, broader intelligence of joke smartness being attached to code smartness?

Speaker 1

是的

Yeah.

Speaker 1

我认为有些方面是可以验证的,有些则不行,有些方面则根据实验室输入的数据被任意地优化,而有些则没有。

I think there's some decoupling where some things are verifiable and some things are not, and some things are optimized for arbitrarily by the labs depending on like what data went in, and some things are not.

Speaker 0

但我的意思是,一些研究团队的前提是,如果你在代码生成或这些可验证领域更聪明,你就应该在所有方面都更出色。

But I mean, the premise, there's a premise from some research groups that if you are smarter at code generation or in these verifiable fields, you should be better at everything.

Speaker 0

而笑话的情况表明,这种现象并没有发生。

And, like, the joke situation suggests that that's not happening in

Speaker 1

我认为并不是所有方面都这样。

In all of it. I don't think that's happening.

Speaker 1

好的。

Okay.

Speaker 1

是的

Yeah.

Speaker 1

我认为并不是这样。

I don't think that's happening.

Speaker 1

我觉得我们可能确实看到了一点点这种现象,但还不足以让人满意。

I think maybe I we're seeing like a little bit of that, but not like a satisfying amount.

Speaker 0

是的。

Yeah.

Speaker 0

但这种不连贯性在人类身上也存在。

But jaggedness exists in humans.

Speaker 0

你可以在数学上非常非常出色,却依然讲不出一个好笑的笑话。

You can be very, very good at math and still tell a really bad joke.

Speaker 1

是的。

Yeah.

Speaker 1

没错。

That's true.

Speaker 1

是的。

Yeah.

Speaker 1

但这就意味着,我们并没有真正获得那种说法所描述的成果——随着模型越来越强大,我们就能免费获得社会各个领域中的大量智能与能力,而实际情况并非如此根本性地发生,仍存在一些盲点,有些方面并未被优化,而这一切都集中在这类神经网络的黑箱模型中。

But it still means that we're not getting The story is that we get a lot of the intelligence and capabilities in all the domains of society for free as we get better and better models, and that's not exactly what's fundamentally going on. There are some blind spots, some things are not being optimized for, and this is all clustered up in these opaque neural net models.

Speaker 1

所以,你要么完全遵循训练时设定的路径,一切像光速一样运行,要么就不是。

So, you're either on rails of what it was trained for, and everything is like you're going at speed of light, or you're not.

Speaker 1

因此,这就是那种不连贯性。

And so, it's jaggedness.

Speaker 1

所以,我认为尽管进步是显而易见的,也确实应该发生,但你不能完全放任它发展,因为目前它还不能完全奏效,或者这只是一个使用技巧的问题,我们还没真正弄清楚该如何使用它。

So, that's why I think even though progression is obvious, which should happen, you can't let it fully go there yet because it doesn't fully work, or it's a skill issue, and we just haven't, like, figured out how to use it.

Speaker 1

所以,很难说清楚。

So, you know, it's hard to tell.

Speaker 0

我能问一个有点大逆不道的问题吗?如果这种不连贯性持续存在,而所有这些都被整合在一个最单一的界面里——也就是单一模型,这合理吗?

Can I ask kind of a blasphemous question, which is, like, if this jagginess is persisting and it's all rolled up in, at least, a monolithic interface, right, you know, a single model, does that make sense?

Speaker 0

还是说,它应该被拆分成各自针对不同智能领域进行优化和改进的独立模块?

Or should it be unbundled into things that can be optimized and improved against different domains of intelligence?

Speaker 1

比如,把模型拆分成多个不同领域的专家?

Like, unbundling the models into multiple experts in different areas, etcetera?

Speaker 0

更直接一点。

More directly.

Speaker 0

对。

Yeah.

Speaker 0

而不是像我们现在接触不到核心的混合专家模型那样。

Instead of just MOE that we have no exposure to.

Speaker 0

因为作为普通用户,这会让人摸不着头脑,对吧。

Because that can be, like, confusing as a user from the outside Uh-huh.

Speaker 0

也就是大家会疑惑,为什么它在这件事上表现这么好,在另一件事上却不行?

Which is like, why is it so good at this but not at this other thing?

Speaker 1

对。

Yeah.

Speaker 1

就我目前的观察来看,业内的实验室都在试图打造单一的“单一化”模型,想让它在所有不同领域都能具备通用智能,然后把所有能力都塞进模型参数里。

I think currently, my impression is the labs are trying to have a single sort of, like, monoculture of a model that is arbitrarily intelligent in all these different domains, and they just stuff it into the parameters.

Speaker 1

但我认为我们应该看到智能形态出现更多的分化。

I do think we should expect more speciation in the intelligences.

Speaker 1

就像动物王国里的大脑形态极其多样,自然界也存在无数不同的生态位,有些动物的视觉皮层或者其他脑区会特别发达;我觉得人工智能也应该出现更多分化,我们根本不需要一个无所不知的“全能先知”模型。

Like, you know, the animal kingdom is extremely diverse in the brains that exist, and there's lots of different niches of nature, and some animals have overdeveloped visual cortex or other kind of parts, and I think we should be able to see more speciation, and you don't need like this oracle that knows everything.

Speaker 1

你可以把它进行分化,然后针对特定任务进行部署,我们应该能看到这种现象,因为你能够拥有更小的模型,它们仍保留着核心认知能力——依然具备胜任力,但会进行专业化,从而在你真正关心的特定任务上实现更低的延迟或更高的吞吐量。

You kinda speciate it, and then you put it on a specific task, and we should be seeing some of that because you should be able to have, like, much smaller models that still have the cognitive core, like, they're still competent, but then they specialize, and then and then they can become more efficient in terms of latency or throughput on specific tasks that you really care about.

Speaker 1

比如,如果你是一位使用Lean的数学家,我就看到过一些发布版本专门针对这个领域进行了优化。

Like, if you're a mathematician working in lean, I saw, for example, there's a few releases that really, like, target that as a domain.

Speaker 1

因此,可能会出现一些类似的情况,这种解耦确实有其合理性。

So there's probably gonna be a few examples like that where the unbundling kind of makes sense.

Speaker 0

我有一个问题:可用计算基础设施的限制是否会推动这种趋势,因为效率实际上变得更加重要?

One question I have is whether or constraint on available compute infrastructure drives more of this because efficiency actually matters more.

Speaker 0

对吧?

Right?

Speaker 0

比如,抛开资金问题不谈,资金确实贯穿了这一切。

Like, financing aside, the financing is involved in all of this.

Speaker 0

如果你能无限制地使用计算资源,那自然可以只保留一个单一模型,对吧?

If you have access to full compute for anything you do, like leaving one single model, right?

Speaker 0

但如果你真的面临压力,比如无法为每个应用场景都部署一个超大规模模型,你认为这会促使模型出现分化吗?

But if you actually feel pressure where you're like, I can't serve a model of massive size for every use case, do you think that leads to any speciation?

Speaker 0

这个问题对你来说有意义吗?

Does that question make sense to you?

Speaker 1

这个问题是有道理的。

The question makes sense.

Speaker 1

我想,我正在纠结的是,到目前为止我们还没看到太多分化。

And I guess, like, what I'm struggling with is I don't think we've seen too much speciation just yet.

Speaker 1

对吧?

Right?

Speaker 1

不对。

No.

Speaker 1

我们看到的是一种模型的单一文化。

We're seeing a monoculture of models.

Speaker 0

是的。

Yeah.

Speaker 0

而且显然存在一种压力,就是要做一个好的代码模型,然后把它重新合并回主干。

And there's, like, clearly pressure to, like, make a good code model, put it back in the main, merge it again.

Speaker 1

是的,是的。

Yeah, yeah.

Speaker 1

尽管模型已经面临压力。

Even though there already is pressure on the models.

Speaker 0

我想也许我感觉现在资源极度紧缺,这可能会促使更多分化出现。

I guess perhaps I feel like there's a very sharp supply crunch, and, like, maybe that causes more speciation now.

Speaker 1

是的。

Yeah.

Speaker 1

从根本上说,这些实验室是在为模型服务,但他们并不真正知道最终用户会问些什么。

I think fundamentally, like, the labs are serving a model, and they don't really know what the end user is going to be asking about.

Speaker 1

所以这可能是其中一部分原因,因为他们必须应对所有可能被问到的问题。

So maybe that's, like, some part of it because they kind of have to multitask over all the possible things that could be asked.

Speaker 1

但我觉得,如果你是面向企业,或者与他们合作解决你关心的具体问题,那么你可能会看到这种分化。

But I think if you're coming to a business and maybe partnering on some specific problems you care about, then maybe you would see that there.

Speaker 1

或者会有一些非常高价值的、更小众的应用。

Or there will be some very high value applications that are, like, more niche.

Speaker 1

但我认为,目前它们基本上是在追求所有可用资源的总和。

But I think right now, they're kinda like going after the totality of what's available.

Speaker 1

我认为,操控大脑的科学还远未完全成熟,部分原因是这样。

I don't think that the science of manipulating the brains is, like, fully developed yet, partly.

Speaker 0

你所说的操控是指什么?

What do you mean, manipulating?

Speaker 1

比如,在不丧失原有能力的前提下进行微调。

So, like, so fine tuning without losing capabilities, as an example.

Speaker 1

我们还没有真正用于以context窗口以外的方式与智能体互动的基本工具。

And we don't have these primitives for actually, like, working with the intelligences in ways other than just context windows.

Speaker 1

context窗口确实有效,而且操作起来非常便宜,我们目前的定制化很大程度上就是靠这种方式实现的。

Context windows kind of just work, and it's very cheap to manipulate, etcetera, and this is how we're getting some of the customization, etcetera.

Speaker 1

但我认为,如何更深入地调整模型、实现持续学习、在特定领域进行微调、提升特定能力,或者直接修改权重而非仅限于context窗口,这门科学还处于发展中。

But I think it's a bit more of a developing science: how you more deeply adjust the models, how you maybe have continual learning, how you fine-tune in a certain area, how you get better in a certain area, or, like, how you actually touch the weights, not just the context windows.

Speaker 1

因此,我认为直接修改权重比仅调整context窗口要困难得多,因为你实际上是在从根本上改变整个模型,甚至可能改变它的智能本质。

And so it's a lot more tricky, I would say, to touch the weights than just the context windows, because you're actually fundamentally changing the full model and potentially its intelligence.

Speaker 1

所以,也许这门物种分化的科学还没有完全成熟,这样说你能理解吗?

And so maybe it's just, like, not a fully developed science of speciation, if that makes sense.

Speaker 0

而且它还必须足够便宜

And it also has to be like cheap enough

Speaker 1

是的

Yeah.

Speaker 0

只有这样,这种物种分化才值得去做,对吧。

For that speciation to be worthwhile Yeah.

Speaker 0

在这些特定情境下。

In these given contexts.

Speaker 0

我可以问一个关于你所描述的AutoResearch扩展到开放领域的问题吗?

Can I ask a question about an extension to AutoResearch that you described in terms of open ground?

Speaker 0

你说,好吧,我们有这个东西。

You said, okay, well, you know, we have this thing.

Speaker 0

我们需要为它创造更多的协作空间,以便人们能够共同推动研究。

We need more collaboration surface around it essentially for people to contribute to research overall.

Speaker 0

你能谈谈这个吗?

Can you talk about that?

Speaker 1

是的。

Yeah.

Speaker 1

我们之前谈到过,AutoResearch 的运作方式是单线程的,比如我不断尝试各种方法循环进行。

So we talked about how AutoResearch has a single thread of, like, I'm gonna try stuff in a loop.

Speaker 1

嗯。

Mhmm.

Speaker 1

但从根本上说,这种并行化才是有趣的核心部分。

But fundamentally, the parallelization of this is, like, the interesting component.

Speaker 1

我其实一直在尝试一些想法,但还没有找到让我觉得足够简单直接的方案,目前还没有特别满意的东西,不过这是我在不忙 Claude 相关事情时会抽空琢磨的。

And I guess I was trying to, like, play around with a few ideas, but I don't have anything that, like, clicks as simply I don't have something that I'm super happy with just yet, but it's something I'm, like, working on on the side when I'm not working on my Claude stuff.

Speaker 1

所以,我认为一个问题在于,如果你能使用大量并行节点,那就很容易让多个 AutoResearch 实例通过一个共同系统进行交流之类的。

So, I think, like, one issue is, if you have a bunch of nodes of parallelization available to you, then it's very easy to just have multiple AutoResearchers talking through a common system or something like that.

Speaker 1

而我更感兴趣的是,如何让互联网上一群不可信的工作者参与进来。

What I was more interested in is how you can have an untrusted pool of workers out there on the Internet.

Speaker 1

比如在AutoResearch中,你只是在寻找能够将模型训练到极低验证损失的那部分代码。

So, example, in AutoResearch, you're just trying to find the piece of code that trains a model to a very low validation loss.

Speaker 1

如果有人给你一个候选提交,很容易验证这个提交是否正确、是否优秀。

If anyone gives you a candidate commit, it's very easy to verify that that commit is good.

Speaker 1

比如,有人可能从互联网上声称这段代码能带来更好的优化和更高的性能。

Like, someone could claim from the Internet that this piece of code will optimize much better and give you much better performance.

Speaker 1

你可以直接检查一下。

You could just check.

Speaker 1

非常简单。

Very easy.

Speaker 1

但可能这个检查过程需要大量工作。

But probably a lot of work goes into that checking.

Speaker 1

但从根本上说,他们可能会撒谎等等。

But fundamentally, they could lie, etcetera.

Speaker 1

所以你实际上处理的是类似的情况——我的那些包含不可信工作者池的设计,看起来有点像区块链,因为这里没有区块,而是提交,这些提交可以相互叠加,并包含你在改进过程中对代码所做的更改。

So you're basically dealing with a similar kind of thing. Actually, my designs that incorporate an untrusted pool of workers look a little bit like a blockchain, because instead of blocks, you have commits, and these commits can build on each other, and they contain, like, changes to the code as you're improving it.

Speaker 1

而工作量证明本质上就是进行大量实验来找到有效的提交,这很困难。

And the proof of work is basically doing tons of experimentation to find the commits that work. And that's hard.

Speaker 1

而现在的奖励仅仅是登上排行榜。

And then the reward is just being on the leaderboard right now.

Speaker 1

完全没有金钱奖励。

There's no monetary reward whatsoever.

Speaker 1

但我不想把这个类比推得太远。不过从根本上说,它有这样一个特性:需要投入海量的搜索,但验证一个候选方案是否真正有效却非常便宜,因为你只需训练一次。你知道,有人可能尝试了10000个想法,但你只需要检查他们最终产出的那个东西是否真的有效,因为其他9999个都不行。

But I don't wanna push the analogy too far, but it fundamentally has this property where a huge amount of search goes into it, but it's very cheap to verify that a candidate solution is indeed good, because you can just train it once. You know, someone had to try 10,000 ideas, but you just have to check that the thing that they produced actually works, because the other 9,999 didn't work.
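
That asymmetry, expensive to search, cheap to verify, is what makes untrusted workers usable at all. A hypothetical sketch of the trusted verifier's side, where `train_once` stands in for a single reproducible training run:

```python
def verify_claim(train_once, commit, claimed_loss, current_best, tol=1e-6):
    # Rerun just the one candidate (cheap) instead of redoing the
    # 10,000-idea search (expensive). Accept only if the untrusted
    # worker's claim checks out AND it actually improves on the best.
    measured = train_once(commit)
    honest = abs(measured - claimed_loss) <= tol
    improves = measured < current_best
    return honest and improves

# Toy "training run": validation loss is a deterministic function
# of the commit id, so the verifier can reproduce any claim.
train_once = lambda commit: 1.0 / (1 + commit)

good = verify_claim(train_once, commit=9, claimed_loss=0.1, current_best=0.5)
lie = verify_claim(train_once, commit=1, claimed_loss=0.01, current_best=0.5)
```

An honest improving commit passes; an inflated claim fails the rerun, which is the whole security model of the untrusted pool.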

Speaker 1

你知道吗?

You know?

Speaker 1

所以,简而言之,你需要设计一个系统,让一群不可信的工作者与一群可信的验证者协作,整个过程是异步的,能够正常运行等等。

And so, basically, long story short, it's like you have to come up with a system where an untrusted pool of workers can collaborate with a trusted pool of workers that do the verification, and the whole thing is kinda, like, asynchronous and works, and so on.

Speaker 1

而且它必须在安全性上站得住脚,因为如果有人给你任意代码让你运行,那是非常可疑和危险的。

And it has to be safe from a security perspective, because if anyone sends you arbitrary code and you're gonna run it, that is very sketchy and dodgy.

Speaker 1

但从根本上说,这完全是可能的。

But fundamentally, it should be totally possible.

Speaker 1

你应该熟悉像 SETI@home 和 Folding@home 这样的项目。

So you're familiar with projects like SETI@home and Folding@home.

Speaker 1

这两类问题具有类似的设置。

Both of these problems have a similar kind of setup.

Speaker 1

在 Folding@home 项目中,你是在折叠蛋白质,要找到低能量构型非常困难。

So, in Folding@home, you're folding a protein, and it's very hard to find a configuration that is low energy.

Speaker 1

但如果有人找到了一个他们评估为低能量的构型,那就完美了。

But if someone finds a configuration that they evaluate to be low energy, that's perfect.

Speaker 1

你可以直接使用它。

You can just use it.

Speaker 1

你可以轻松验证它。

You can easily verify it.

Speaker 1

所以,很多事物都具有这种特性:产生它们代价高昂,但验证它们却非常便宜。

So, a lot of things have this property that's very expensive to come up with, but very cheap to verify.

Speaker 1

因此,在所有这些情况下,像Folding@home、SETI@home或AutoResearch@home这样的项目都会很合适。

And so, in all those cases, things like Folding@home, or SETI@home, or AutoResearch@home will be good fits.
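The "expensive to search, cheap to verify" structure described here, with its proof-of-work flavor, can be sketched in a few lines. This is an illustrative toy of mine, not anything from the episode: a hash puzzle stands in for "finding a commit that works", the names are invented, and the point is only the asymmetry between the untrusted worker's search and the trusted verifier's one-hash check.

```python
import hashlib
from typing import Optional

def cheap_verify(commit: bytes, nonce: int, difficulty: int = 3) -> bool:
    """Verification costs one hash: anyone can re-check a claimed solution."""
    digest = hashlib.sha256(commit + nonce.to_bytes(8, "big")).hexdigest()
    return digest.startswith("0" * difficulty)

def untrusted_worker(commit: bytes, max_tries: int = 1_000_000) -> Optional[int]:
    """Search is the expensive part: scan nonces until one verifies."""
    for nonce in range(max_tries):
        if cheap_verify(commit, nonce):
            return nonce
    return None

# Stand-in for a candidate code change an AutoResearch worker might propose.
commit = b"improve-llm-training-recipe-v1"
claim = untrusted_worker(commit)
# The trusted pool never redoes the search; it re-verifies the claim once.
assert claim is not None and cheap_verify(commit, claim)
```

The worker may hash thousands of candidates, but the verifier spends one hash per claim, and that asymmetry is what makes accepting work from an untrusted swarm tractable.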

Speaker 1

所以,简而言之,互联网上的一群智能体可以协作来改进大语言模型,甚至可能超越前沿实验室,谁知道呢?

And so, long story short, a swarm of agents on the Internet could collaborate to improve LLMs, and could potentially even run circles around Frontier Labs, like who knows?

Speaker 1

这甚至可能是可行的。

That's even possible.

Speaker 1

前沿实验室拥有大量可信的计算资源,但地球上的不可信计算资源规模要大得多。

Frontier Labs have a huge amount of trusted compute, but the Earth is much bigger and has a huge amount of untrusted compute.

Speaker 1

但如果你建立相应的机制来应对这个问题,那么外部的群体或许能提出更好的解决方案,人们也会为他们关心的事业贡献计算资源。

But if you put systems in place that deal with this, then maybe it is possible that the swarm out there could come up with better solutions, and people contribute cycles to a thing that they care about.

Speaker 1

所以,抱歉,最后一点想法是,许多公司或其他机构可能都有他们关心的特定项目,而如果你有计算能力,就可以参与不同的AutoResearch方向。

And so, sorry, so the last thought is lots of companies or whatnot, they could maybe have their own things that they care about, and you, if you have compute capacity, you could contribute to different kind of AutoResearch tracks.

Speaker 1

比如,你可能关心某种特定类型的癌症问题。

Like, maybe you care about certain you know, you care about cancer or something like that of a certain type.

Speaker 1

你不必非得向机构捐款。

You don't have to just donate money to an institution.

Speaker 1

你实际上可以购买计算资源,然后加入该项目的AutoResearch社区。

You actually could purchase compute, and then you could join the AutoResearch forum for that project.

Speaker 1

如果一切都重新整合为自动研究,那么计算能力就成为你贡献到池中的核心资源。

If everything is rebundled into AutoResearches, then compute becomes the thing that you're contributing to the pool.

Speaker 0

是的。

Yeah.

Speaker 0

这非常鼓舞人心,而且也很有趣。

That's very inspiring, and it's also interesting.

Speaker 0

我不知道这能走多远,但有趣的是,至少有一部分人群——比如硅谷的人,或者在中国零售店排队的人——已经重新意识到,拥有个人计算能力是很有趣的。

Like, I don't know how far this goes, but it is interesting that at least some audience of people, you know, here in Silicon Valley or lining up at, you know, retail stores in China have discovered that, like, having access to personal compute is interesting again.

Speaker 0

是的。

Yeah.

Speaker 0

对吧?

Right?

Speaker 0

所以也许他们真的有动力为自己的目标做这件事,然后就能为自动研究做出贡献。

So maybe they're really motivated to do that for their claws, and then they can contribute to AutoResearch.

Speaker 1

这几乎就像金钱是每个人都关心的东西,但未来真正每个人都关心的会不会是FLOP呢?

It's almost like dollars the thing everyone cares about, but is FLOP the thing that actually everyone cares about in the future?

Speaker 1

比如,会不会出现一种反转,你真正关心的东西变了?

Like, is there gonna be, like, a flipping thing almost of, like, what's the thing that you care about?

Speaker 1

比如现在,即使你有钱,也很难获得算力。

Like, right now, for example, it's really hard to get compute even if you have money.

Speaker 2

是的。

Yeah.

Speaker 2

所以,实际上这看起来几乎像是

So, actually, it almost seems like

Speaker 1

在某种意义上,算力是占主导地位的。

the flop is, like, dominant in a certain sense.

Speaker 1

对。

Yeah.

Speaker 1

所以也许情况确实有点像那样。

So so maybe that's kinda like kinda like that.

Speaker 1

你掌控了多少FLOP,而不是你掌控了多少财富?

Like, how much how many flops do you control instead of, like, what wealth do you control?

Speaker 1

我不认为这是真的,但想想还挺有意思的。

I don't actually think that's true, but it's kind of interesting to think about.

Speaker 0

你最近发布的是关于一些就业数据的分析。

The last thing you released was, like, a little bit of jobs data analysis.

Speaker 0

是的。

Yeah.

Speaker 0

对吗?

Is that right?

Speaker 0

即使你只是在可视化一些公开数据,也可能触碰到了敏感点。

Might have touched a nerve even though you're just like visualizing some public data.

Speaker 0

你当时对什么感兴趣?

What were you curious about?

Speaker 1

是的,我想我是好奇,毕竟每个人都非常关注人工智能对就业市场的影响以及未来会怎样。

Yeah, I guess I was curious to I mean, everyone really is thinking about the impacts of AI on the job market and what it's gonna look like.

Speaker 1

所以,我只是想看看,就业市场究竟是什么样子。

So, I was just interested to take a look, like, what does the job market look like?

Speaker 1

不同的职位在哪里?

Where are the different roles?

Speaker 1

不同职业中有多少人?

And how many people are in different professions?

Speaker 1

我只是非常想仔细看看各个案例,试着思考一下:随着这些人工智能的发展,它们会成为人们使用的工具吗?

And I was like really just interested to like look through the individual cases and try to think myself about, like, you know, with these AIs and how they're likely to evolve, like, are these gonna be tools that people are using?

Speaker 1

它们会取代这些职业的工具吗?

Are these gonna be displacing tools for these professions?

Speaker 1

当前的职业有哪些,它们将如何变化?

And like, what are the current professions, and how are they gonna change?

Speaker 1

它们会大幅增长或调整吗?

Are they gonna grow or adjust to a large extent?

Speaker 1

或者,可能会出现哪些新职业?

Or like, what could be new professions?

Speaker 1

所以,这其实就是一种激发我对这个行业进行深入思考的方式,我想。

So, it's really just like a way to fuel my own chain of thought about the industry, I suppose.

Speaker 1

所以,是的,就业数据基本上来自美国劳工统计局。

And so, yeah, the jobs data basically is just a Bureau of Labor Statistics.

Speaker 1

他们实际上对每种职业都有未来增长的预测百分比,我认为是未来将近十年。

They actually have a percent outlook for each profession about how much it's expected to grow over the next, I think, almost decade.

Speaker 1

是的。

Yeah.

Speaker 1

我觉得是十年,但这是2024年发布的。

Think it's a decade, but it was made in 2024.

Speaker 0

好的。

Okay.

Speaker 0

我们需要大量的医疗工作者。

We need a lot of healthcare workers.

Speaker 1

是的。

Yeah.

Speaker 1

所以他们已经做出了这些预测,但我其实不太确定他们使用的具体方法是什么。

So they've already made those projections, and I'm actually not 100% sure what the methodology was that they put into their projections.

Speaker 1

我觉得我挺想把不同的情况分类来看:大家是不是都觉得,目前主流发展的都是这类数字化的人工智能——就好像是能在数字世界里互动、还能操控大量数字信息的虚拟灵体,而且它们现在还没有实体载体,也不存在物理形态。

I guess I was interested to color things by like, if people think that what's, like, primarily being developed now is this kinda like more digital AI that is kind of like almost like these ghosts or spirit entities that can, like, interact in the digital world and manipulate a lot of, like, digital information, and they currently don't really have a physical embodiment or presence.

Speaker 1

而实体领域的发展速度可能会慢一些,因为你是在实实在在地操控物质原子。

And the physical stuff is probably gonna go slightly slower because you're manipulating atoms.

Speaker 1

毕竟翻转数据位、复制粘贴数字信息这类操作,本来就比操控实体物质的速度快上百万倍。

So flipping bits and the ability to copy paste digital information makes everything, like, a million times faster than manipulating matter.

Speaker 1

你懂吧?

You know?

Speaker 1

所以从发展态势来说,我觉得数字领域会迎来巨量的活动,会有大量的重构工作,各类活动都非常活跃,一切都在酝酿升温。

So so energetically, I just think we're gonna see a huge amount of activity in digital space, huge amount of rewriting, huge amount of activity, boiling soup.

Speaker 1

而且我认为,和物理世界大概率会发生的变化相比,数字领域的发展速度简直快如光速,这是可以推演出来的趋势。

And I think we're gonna see something that in the digital space goes at the speed of light compared to, I think, what's gonna happen in the physical world to some extent, it would be the extrapolation.

Speaker 1

另外我觉得目前还有不少积压的潜力有待释放,过去那些原本需要计算机和人力共同完成的数字信息处理工作,很快就能解除限制、大幅提效。

And so I think, like, there's currently kind of, like, I think, overhang where there can be, like, a lot of unhobbling, almost potentially, of, like, a lot of digital information processing that used to be done by computers and people.

Speaker 1

现在人工智能成了处理数字信息的第三类核心力量,这些领域都会迎来大量的重构调整。

And now with AI as, like, a third kind of manipulator of digital information, there's gonna be a lot of refactoring in those in those disciplines.

Speaker 1

但现实世界实际上会比数字世界慢上一些时间。

But the physical world is actually gonna be, like, I think, behind that by some amount of time.

Speaker 1

所以对我来说,真正有趣的是,这就是为什么我特别强调那些从根本上操作数字信息的专业人士。

And so I think what's really fascinating to me is, so that's why I was highlighting the the professionals that fundamentally manipulate digital information.

Speaker 1

这些工作你可以在家完成,等等。

This is work you could do from your home, etcetera.

Speaker 1

因为我感觉这些领域会发生变化。

Because I feel like those will be like, things will change.

Speaker 1

这并不意味着这类工作会变少或变多,因为这涉及到需求弹性以及其他许多因素,但这些职业会因为这些新工具以及人类超有机体神经系统的升级而发生改变——如果你愿意这么理解的话。

And it doesn't mean that there's gonna be less of those jobs or more of those jobs because that has to do with, like, demand elasticity and many other factors, but things will change in these professions because of these new tools and because of this upgrade to the nervous system of the human superorganism, if you wanna think about it that way.

Speaker 0

根据你对数据的观察,你对正在面对就业市场、或正在思考现在该学什么、该发展哪些技能的人,有什么建议或看法吗?

Given the look you had at the data, do you have either any observations or guidance for people facing the job market or thinking about what to study now or what skills to develop?

Speaker 0

我的意思是,我很庆幸现在我的工作还需要我亲自去见人。

I mean, we can all go get like, I'm very thankful that I have to, like, meet people for my job right now.

Speaker 1

是的。

Yeah.

Speaker 1

I'm

Speaker 0

变得更注重身体了。

getting more physical.

Speaker 0

是的。

Yeah.

Speaker 1

不过,你能在家工作吗?

Could you do your work from home, though?

Speaker 1

我可以。

I could.

Speaker 0

我觉得其中有些关系方面的部分很难,但大部分我都能在家做。

I think there are relationship parts of it that are hard, but most of it I could.

Speaker 1

是的。

Yeah.

Speaker 1

我觉得很难说,因为就业市场极其多样,答案可能因人而异。

I think it's really hard to tell because, again, like, job market is extremely diverse, and I think the answers will probably vary.

Speaker 1

但在很大程度上,这些工具非常新、非常强大,所以首先要做的是努力跟上它们的发展。

But to a large extent, like, these tools are extremely new, extremely powerful, and so just being, you know, just trying to keep up with it is, like, the first thing.

Speaker 1

而且,是的,因为我觉得很多人会轻视它,或者

And, yeah, because I think a lot of people kinda like dismiss it or

Speaker 0

或者他们害怕它。

Or they're afraid of it.

Speaker 1

或者他们害怕它,等等,这当然完全可以理解。

Or they're afraid of it, etcetera, which is totally understandable, of course.

Speaker 1

是的。

Yeah.

Speaker 1

我觉得目前它本质上是一种赋能工具。

I think like it's fundamentally an empowering tool at the moment.

Speaker 1

这些工作是由一系列任务组成的,而其中一些任务可以快很多。

And these jobs are bundles of tasks, and some of these tasks can go a lot faster.

Speaker 1

所以人们应该把它们主要看作是当前的一种工具。

And so people should think of it as primarily a tool that it is right now.

Speaker 1

我认为这方面的长期前景是不确定的。

And I think the long term future of that is uncertain.

Speaker 1

是的。

Yeah.

Speaker 1

说实话,这真的很难预测。

It's kinda really hard to forecast, to be honest.

Speaker 1

而且,我本人其实并没有专业从事这方面的工作。

And, like, I'm not professionally, like, doing that really.

Speaker 1

我认为这应该是经济学家们该做的正经工作。

And I think it's a job of, like, economists to do properly.

Speaker 0

不过你是个工程师。

You are an engineer, though.

Speaker 0

我觉得有趣的一点是,工程类职位的需求仍在持续增长。

And, like, one thing I thought was interesting is that, like, the demand for engineering jobs is continuing to increase.

Speaker 0

对。

Yeah.

Speaker 0

我无法确定这是否只是暂时的现象。

I can't tell if that's like a temporary phenomenon.

Speaker 1

我不确定

I'm not

Speaker 0

我目前对这件事还拿不定主意。

sure how I feel about it yet.

Speaker 0

你知道吗?

Do you know?

Speaker 1

是的。

Yeah.

Speaker 1

这几乎就像是需求弹性一样。

That's like the demand elasticity almost.

Speaker 1

比如,软件曾经很稀缺。

Like, software was scarce.

Speaker 1

对吧?

Right?

Speaker 1

所以我们没有更多软件需求的原因只是因为稀缺且价格太贵。

And so the reason we don't have more demand for software is just scarcity and it's too expensive.

Speaker 0

太贵了。

Too expensive.

Speaker 0

是的。

Yeah.

Speaker 1

所以,如果障碍消除,就会出现杰文斯悖论,也就是说,软件的需求实际上会增加。

So, if the barrier comes down, then actually you have the Jevons paradox, which is like, you know, the demand for software actually goes up.

Speaker 1

价格更低了,而且数量更多

It's cheaper, and there's more more

Speaker 0

更强大。

More powerful.

Speaker 0

是的。

Yeah.

Speaker 1

关于这一点的经典例子总是自动取款机和银行柜员,因为当时很多人担心自动取款机和计算机基本上会取代柜员。

The the classical example of this always is the ATMs and the bank tellers, because there was a lot of, like, fear that ATMs and computers, basically, would displace tellers.

Speaker 1

但实际情况是,自动取款机让每家银行分行的运营成本低得多,于是分行更多了,柜员也反而更多了。

But what happened is they made the cost of operating a bank branch much cheaper, so there were more bank branches, and so there were more tellers.

Speaker 1

这正是人们常引用的典型例子。

It's like the canonical example people cite.

Speaker 1

但本质上,这仅仅是杰文斯悖论。

But, basically, it's just the Jevons paradox.

Speaker 1

当某样东西变得更便宜时,就会释放出大量需求。

Like, something becomes cheaper, so there's a lot of unlocked demand for it.
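The Jevons-paradox point here can be made concrete with a toy demand curve. The constant-elasticity form and the numbers are my illustration, not anything stated in the episode: when demand is elastic (elasticity greater than 1), cutting the unit price of software raises total spend on it rather than lowering it.

```python
def quantity_demanded(price: float, k: float = 100.0, elasticity: float = 1.5) -> float:
    """Constant-elasticity demand curve: Q = k * P^(-elasticity)."""
    return k * price ** -elasticity

# As AI drives the unit price of software down, quantity demanded rises
# faster than price falls, so total spend (price * quantity) goes up.
for price in (10.0, 5.0, 1.0):
    q = quantity_demanded(price)
    print(f"price={price:5.1f}  quantity={q:7.1f}  total_spend={price * q:7.1f}")
```

Run the same arithmetic with elasticity below 1 and total spend falls instead, which is the ATM-and-teller question in miniature: the outcome hinges entirely on how elastic the demand for software really is.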

Speaker 1

所以我认为,对于软件工程,我确实持有一种谨慎乐观的态度,因为在我看来,软件的需求将会非常巨大,而且它已经变得便宜多了。

So I do have, like, a cautiously optimistic view of this in software engineering, where it does seem to me like the demand for software will be extremely large, and it's just become a lot cheaper.

Speaker 1

因此,我认为在相当长的一段时间内,很难做出预测,但至少就目前本地情况来看,软件的需求将会增加。

And so I do think that for quite some time, it's very hard to forecast, but it does seem to me like right now, at least locally, there's gonna be more demand for software.

Speaker 1

因为软件太棒了。

Because software is amazing.

Speaker 1

你知道的,这是数字信息处理。

It's like, you know, digital information processing.

Speaker 1

你不必被迫使用那些被提供给你的、存在各种缺陷的任意工具。

You're not forced to use arbitrary tools that are given to you that are imperfect in various ways.

Speaker 1

你不必被迫接受现有的东西。

You're not forced to subscribe to what exists.

Speaker 1

代码现在是短暂的,可以更改和修改。

Code is now ephemeral, and it can change, and it can be modified.

Speaker 1

因此,我认为数字领域将出现大量活动,从某种意义上重新连接一切,这将创造对这类事物的大量需求。

And so, I think there's gonna be a lot of activity in the digital space to rewire everything in a certain sense, and I think it's gonna create a lot of demand for this kind of stuff.

Speaker 1

我认为从长远来看,是的,即使有了像AutoResearch这样的工具,比如OpenAI、Anthropic或其他这些实验室,它们雇佣了大约一千多名研究人员。

I think long term, yeah, obviously, even with AutoResearch, like OpenAI or, you know, Anthropic or these other labs, like, they're employing, what, like a thousand something researchers.

Speaker 1

对吧?

Right?

Speaker 1

嗯哼。

Mhmm.

Speaker 1

这些研究人员本质上就像是高级版的AutoResearch,他们正在主动地自动化自己,而这正是他们所有人努力的目标。

These researchers are basically, like, glorified AutoResearch, like, you know, they're like automating themselves away, like, actively, and this is like the thing they're all trying to do.

Speaker 1

是的

Yeah.

Speaker 1

我觉得我曾经到处走了一圈

I think like, I went around

Speaker 0

一些研究人员也感到精神错乱。

Some of those researchers also feel psychosis.

Speaker 0

对吧?

Right?

Speaker 0

因为它们确实有效。

Because they can it's working.

Speaker 0

对吧?

Right?

Speaker 0

所以他们觉得,我的职业生涯也结束了。

And so they're like, it's over for me too.

Speaker 1

我确实花了很多时间走访了OpenAI,我当时就想,你们意识到吗?如果我们成功了,我们会取代很多工作。

I did spend a bunch of time going around OpenAI, and I was like, you guys realize if we're successful, like, we're automating away a lot of jobs.

Speaker 1

就好像,我们只是在给山姆之类的人搞自动化建设似的。

Like, like, it's just we're just building automation for Sam or something like that.

Speaker 1

要么是给我,要么是给董事会。

Like, I or the board.

Speaker 1

我也不太确定。

I'm not sure.

Speaker 1

但我们确实就是在构建这套自动化,是的,给董事会或者CEO之类的人,之后我们都会丢了工作,顶多只能在边上做点贡献。

But, like, we're just building this automation for, yeah, the board or the CEO or something like that, and we're all out of our jobs and maybe contributing on the sides.

Speaker 1

所以啊,对,从这个角度来看,大家就是在深入琢磨这件事。

And so, yeah, it's kind of like nerding from that perspective.

Speaker 0

我能问你诺姆提出的那个问题吗?

Is it okay if I ask you Noam's question?

Speaker 0

你本来也可以做那样的事。

You could be doing that.

Speaker 0

对吧?

Right?

Speaker 0

在一家前沿实验室,利用大规模算力,与许多同事一起进行自动研究。

Auto researching with a lot of compute scale and a bunch of colleagues at one of the frontier labs.

Speaker 0

为什么不呢?

Like, why not?

Speaker 1

我曾经在那里待过一段时间。

Well, I was there for a while.

Speaker 1

对吧?

Right?

Speaker 1

而且我又回去了。

Like, and I did reenter.

Speaker 1

所以在某种程度上,我同意,我认为这个问题可以从很多角度来分析。

So to some extent, I agree, and I think that there are many ways to slice this question.

Speaker 1

这个问题本身带有一些敏感性。

It's a very loaded question a little bit.

Speaker 1

我想说的是,我非常认可人们在前沿实验室之外所能做出的贡献和影响。

I will say that I feel very good about what people can contribute and their impact outside of the Frontier Labs, obviously.

Speaker 1

不仅是在行业内,还包括更广泛的生态系统层面的角色。

Not in the industry, but also in more ecosystem level roles.

Speaker 1

所以,例如,你的角色更偏向生态系统层面。

So your role, for example, is more ecosystem level.

Speaker 1

我目前的角色也更多是在生态系统层面,我对人们在这些角色中所能产生的影响感到非常乐观。

My role currently is also more on ecosystem level, and I feel very good about the impact that people can have in those kinds of roles.

Speaker 1

我认为,反过来,把自己过度与前沿实验室绑定,确实存在一些问题。

I think conversely, there are definitely problems in my mind with basically aligning yourself way too much with the Frontier Labs too.

Speaker 1

从根本上说,这些前沿实验室提供了巨大的经济激励,而根据你自己的说法,人工智能将深刻而剧烈地改变人类和社会,而你却在直接构建这项技术,从中获益,并通过经济手段与之紧密结盟。

So, fundamentally, I mean, you have a huge financial incentive with these frontier labs, and by your own admission, the AIs are going to really change humanity and society in very dramatic ways, and here you are basically building the technology and benefiting from it, and being very allied to it through financial means.

Speaker 1

这正是OpenAI最初成立时所面临的核心困境。

Like, this was a conundrum that was at the heart of, you know, how OpenAI was started in the beginning.

Speaker 1

这正是我们当时试图解决的难题。

Like, this was the conundrum that we were trying to solve.

Speaker 1

是的。

Mhmm.

Speaker 1

所以呢,你知道,这就有点……

And so, you know, that so it's kind of

Speaker 0

这个问题至今都还没解决。

It's still not resolved.

Speaker 0

这个核心难题……

The conundrum is

Speaker 1

还是没有得到彻底的解决。

still not, like, fully resolved.

Speaker 1

这是第一点。

So that's number one.

Speaker 1

如果你身处这些前沿实验室之一,你就不是一个完全自由的个体,也没办法真正以完全自主、不受约束的姿态参与到这类讨论中。

You you're not a completely free agent, and you can't actually, like, be part of that conversation in a fully autonomous free way, like, if you're inside one of the Frontier Labs.

Speaker 1

有些话是你不能说的,反过来,还有些话是机构希望你去说的。

Like, there are certain things that you can't say, and conversely, there are certain things that the organization wants you to say.

Speaker 1

而且他们不会强迫你,但你能感受到那种压力,清楚自己应该说什么。

And, you know, they're not gonna twist your arm, but you feel the pressure of, like, what you should be saying.

Speaker 1

你知道的。

You know?

Speaker 1

对吧。

Right.

Speaker 2

因为,很明显。

Because, like, obviously.

Speaker 2

否则的话,

Otherwise, it's,

Speaker 1

就会变得特别尴尬,人们投来奇怪的目光,问你在干什么?

like, really awkward conversations, strange side eyes, like, what are you doing?

Speaker 1

你知道的。

You know?

Speaker 1

所以在实验室里,你根本不能真正成为一个独立的个体,而在前沿实验室之外,我反而觉得自己在某种意义上更贴近人类,因为我几乎不受那些压力影响,对吧?我可以想说什么就说什么。

So you can't, like, really be an independent agent, and I feel a bit more aligned with humanity in a certain sense outside of a frontier lab, because I'm not subject to those pressures almost, right, and I can say whatever I want.

Speaker 1

我会说,在前沿实验室里,你当然也能产生影响。

I would say in the frontier labs, you can have impact there, of course, as well.

Speaker 1

但有很多研究人员,也许你就是其中之一,也许你的想法非常好,等等。

But there's many researchers, and maybe you're one of them, maybe your ideas are really good, etcetera.

Speaker 1

而且可能有很多决策要做,你希望在这些对话发生时,能身处其中。

And maybe there's a lot of decision making to do, and you want to be in a position where you are in the room with those conversations when they come up.

Speaker 1

我认为,目前整体的利害关系其实还算低,所以一切都挺不错的。但归根结底,当真正的利害关系变得极高时,如果你是某个组织的雇员,我不确定你到底能在多大程度上影响组织的决策。

I do think that currently the stakes are, like, overall fairly low, and so everything is kinda, like, nice. But ultimately, at the end of the day, like, when the stakes are really high, etcetera, if you're an employee at an organization, I don't actually know how much sway you're going to have on your organization and what it's going to do.

Speaker 1

说到底,你其实并不真正掌权。

Like, fundamentally, at the end of the day, it's you're not, like, really in charge.

Speaker 1

你在房间里,贡献着想法,但你并不真正掌控你所隶属的那个实体。

You're in the room, and you're contributing ideas, but you're not really in charge of that entity that you're a part of.

Speaker 1

所以,我认为这些在某种程度上是一些错位的来源。

So, those are like some sources of misalignment, I think, to some extent.

Speaker 1

我要说,从某种意义上,我非常认同这种观点:实验室无论好坏,都是封闭的,很多工作都在那里进行,它们处于能力的前沿,正在处理未来即将出现的事物。

I will say that, like, in one way, I do agree a lot with that sentiment, that I do feel like, labs, for better or worse, they're opaque, and a lot of work is there, and they're kind of like at the edge of capability in what's possible, and they're working on what's coming down the line.

Speaker 1

我认为,如果你不在这些前沿实验室里,你的判断从根本上就会开始偏离,因为你没有参与那些即将来临的事物。

And I think if you're outside of that frontier lab, your judgment fundamentally will start to drift because you're not part of the, you know, what's coming down the line.

Speaker 0

对。

Right.

Speaker 1

所以我觉得我的判断也必然会逐渐偏离。

And so I feel like my judgment will inevitably start to drift as well.

Speaker 1

我将无法真正理解这些系统底层是如何运行的。

And I won't actually have an understanding of how these systems actually work under the hood.

Speaker 1

那是一个不透明的系统。

That's an opaque system.

Speaker 1

我无法很好地理解它将如何发展等等。

I won't have a good understanding of how it's going to develop and etcetera.

Speaker 1

因此,我认为在这一点上我同意,这也是我感到担忧的地方。

And so I do think that in that sense, I agree and something I'm nervous about.

Speaker 1

我认为真正接触正在发生的事情,并置身于前沿实验室中是很有价值的。

I think it's worth basically being in touch with what's actually happening and actually being in the frontier lab.

Speaker 1

如果某些前沿实验室愿意让我去待一段时间,为他们做出出色的工作,然后也许再回来

And if if some of the frontier labs would have me come for, you know, some amount of time and do really good work for them, and then maybe coming

Speaker 0

伙计们,他在找工作。

Guys, he's looking for a job.

Speaker 0

这太令人兴奋了。

This is super exciting.

Speaker 0

是的。

Yeah.

Speaker 1

那么我认为这可能是一个不错的安排,因为我总觉得,也许这种方式能真正与正在发生的事情保持联系,同时又不至于完全受这些机构的控制。

Then I think that's maybe a good setup, because I kinda feel like it kind of, you know, maybe that's like one way to actually be connected to what's actually happening, but also not feel like you're necessarily fully controlled by those entities.

Speaker 1

所以,说实话,我认为诺姆在OpenAI很可能做出非常出色的工作,但我也觉得他最具影响力的工作很可能发生在OpenAI之外。

So, I think, honestly, in my mind, Noam can probably do extremely good work at OpenAI, but also I think his most impactful work could very well be outside of OpenAI.

Speaker 0

不。

No.

Speaker 0

那是成为独立研究员,加入AutoResearch的号召。

That's a call to be an independent researcher with AutoResearch.

Speaker 1

对。

Yeah.

Speaker 1

外面有很多事情可做,我认为最终理想的解决方案可能是来回穿梭,是的。

There's many things to do on the outside, and I think, ultimately, I think the ideal solution maybe is like, yeah, going back and forth or yeah.

Speaker 1

我认为,从根本上说,这在两个地方都能产生非常惊人的影响。

And I think fundamentally, it can have a really amazing impact in both places.

Speaker 1

所以非常复杂。

So very complicated.

Speaker 1

我不知道。

I don't know.

Speaker 1

比如,这个问题有点复杂,但我的意思是,我加入了前沿实验室,而现在我是在外面。

Like, it's a very loaded question a little bit, but, I mean, I joined the Frontier Lab, and I'm outside.

Speaker 1

然后也许在未来,我又想回去,我觉得这就是我对这件事的看法。

And then maybe in the future, I'll want to join again, and I think that's kind of like how I look at it.

Speaker 0

关于世界或AI生态系统对前沿的了解程度,有一个相关的问题是:开源与前沿的距离有多近,以及这种状态能持续多久。

One question related to what visibility does the world or the AI ecosystem have into the frontier is like how close open source is to the frontier and how sustainable that is.

Speaker 0

我认为这相当令人惊讶,从最初只有少数中国模型和全球模型,到未来短期内人们将继续发布更多模型,这些模型在能力上比业界预期的要接近前沿。

I think it is quite surprising, the entire sequence of events actually from like having a handful of Chinese models and global models, and I think people are going to continue releasing here in the near term, that are closer than much of the industry anticipated from a capability perspective.

Speaker 0

我不知道你是否对此感到惊讶,但你是开源的长期贡献者。

I don't know if you're surprised by that, but you're a long term contributor to open source.

Speaker 0

那么,你对此有什么预测?

Like, what's your prediction here?

Speaker 1

是的。

Yeah.

Speaker 1

大致来说,基本上是的。

So roughly speaking, basically, yeah.

Speaker 1

闭源模型领先,但人们一直在关注开源模型落后了多少个月。

The closed models are ahead, but, like, people are monitoring the number of months that sort of, like, open source models are behind.

Speaker 0

起初,两者之间没有任何差距,后来差距扩大到了十八个月。

And to start with, there's nothing, and then it went to eighteen months.

Speaker 1

是的。

Yeah.

Speaker 1

它们一直在趋同。

They've been convergence.

Speaker 1

对吧?

Right?

Speaker 1

所以它们可能落后了,大概是多久呢?

So then maybe they're behind by, like, what is the latest?

Speaker 1

现在可能是八个月、六个月、八个月左右吧?

Maybe, like, eight months, six months, eight months kind of thing right now?

Speaker 1

是的。

Yeah.

Speaker 1

我当然是开源的忠实支持者。

I'm a huge fan of open source, obviously.

Speaker 1

比如在操作系统方面,有像Windows和macOS这样的闭源系统。

So for example, in operating systems, you have, like, closed source, like, you know, Windows and macOS.

Speaker 1

这些都是大型软件项目,有点像未来大语言模型会变成的样子,而还有Linux。

These are large software projects, kind of like what LLMs are gonna become, and there's Linux.

Speaker 1

但Linux非常简单。

But Linux is very easy.

Speaker 1

实际上,Linux 是一个极其成功的项目。

Like, actually, Linux is an extremely successful project.

Speaker 1

它运行在绝大多数计算机上。

It runs on the vast majority of computers.

Speaker 1

我上次查的时候,好像是60%左右的计算机在使用 Linux?

Like, last time I checked, was it like 60% or something, like, run Linux?

Speaker 1

这是因为业界需要一个所有人都觉得安全使用的通用开放平台。

And that's because there is a need in the industry to have a common open platform that everyone feels sort of safe using.

Speaker 1

我认为,业界一直对这种项目的存在有着需求。

I would say, like, the industry has always felt a demand for that kind of a project to exist.

Speaker 1

嗯。

Mhmm.

Speaker 1

我认为现在也是如此,这就是为什么企业真正希望这种项目能够存在。

And I think the same is true now, and that's why businesses actually want there's demand for this kind of a thing to exist.

Speaker 1

最大的区别在于,一切都变成了资本。

The big difference is that everything is capital.

Speaker 1

这里面投入了很多东西

There's a lot

Speaker 0

非常昂贵。

It's very expensive.

Speaker 1

投入到这个项目中。

That goes into this.

Speaker 1

所以我认为,这就是事情开始出现分歧的地方,让竞争在某种程度上变得更困难了。

So I think that's where things fall apart a little bit, make it a bit harder to to compete in some sense.

Speaker 1

我认为当前的模型非常好。

I I do think that the current models are very good.

Speaker 1

我觉得另一件非常有趣的事情是,对于绝大多数消费者使用场景来说,即使是开源模型也相当不错。

The other thing that I think is really interesting is that for the vast majority of consumer use cases and things like that, even open source models are actually quite good, I would say.

Speaker 1

我认为,如果往前看更多年,很多简单的使用场景都会得到很好的覆盖,甚至可以在本地运行。

And I think, like, if you go forward, like, more years, it does seem to me like a huge amount of simple use cases are gonna be well covered and actually even run locally.

Speaker 1

但总会有一些对前沿智能的需求,而这可能实际上占据相当大的一部分市场。

But there's gonna be always like some demand for frontier intelligence, and that can actually be extremely large piece of the pie.

Speaker 1

但前沿智能的需求可能就像诺贝尔奖级别的工作,或者像把 Linux 从 C 语言迁移到 Rust 这样的项目。

But it could be that the frontier the need for frontier intelligence is gonna be, like, you know, Nobel Prize kind of work, or, like, let's move Linux from C to Rust.

Speaker 1

会有一些规模更大的项目,以这种方式界定范围,而前沿封闭智能很可能主要与这些项目互动,而开源则会逐渐渗透掉大量基础应用场景。

There's gonna be, like, bigger projects, you know, like, scoped in that kind of a way, and there's gonna be maybe more and maybe that's where a lot of the frontier closed intelligences are gonna be interacting with, and open source is kinda, like, gonna eat through a lot of the more basic use cases or something like that.

Speaker 1

你知道,今天所谓的前沿技术,可能在今年晚些时候就会变成开源的——我目前从封闭实验室使用的这些前沿模型,未来很可能变成开源的,并承担大量工作。

You know, probably later this year, what's frontier today in terms of what I'm using right now from the closed labs might be open source, and that's gonna be doing a lot of work.

Speaker 1

所以我预计这种动态会持续下去。

So I kind of expect that this dynamic will actually basically continue.

Speaker 1

我们会拥有像‘神谕’一样的前沿实验室,它们拥有封闭的 AI,而开源模型则会落后几个月,我预计这种局面会持续下去。

Like, we'll have Frontier Labs that have closed AIs that are kind of like these Oracles, and then we'll have open source kind of like behind by some amount of months, and I kind of expect that to to continue.

Speaker 1

实际上,我认为这种整体架构相当不错,因为我对完全依赖封闭智能有点犹豫——从结构上看,我认为这存在某种系统性风险。

And I actually think that's, like, a pretty good setup overall, because I'm a little bit hesitant about a world where the intelligences are closed, and that's it; I think there's some systemic risk attached to that.

Speaker 1

我认为,过去集中化模式的记录一直很差,而且

And I think that that's a you know, centralization has a very poor track record in my view in the past and has

Speaker 0

你是说在政治或经济系统整体上吗?

You mean, like, in political or economic systems in in general?

Speaker 2

是的。

Yes.

Speaker 2

没错。

Exactly.

Speaker 2

I

Speaker 1

我觉得有很多

think there's, like, a lot

Speaker 0

欧洲的。

of European.

Speaker 0

对。

Yeah.

Speaker 1

很多都是很糟糕的先例。

A lot of it's pretty bad precedent.

Speaker 1

所以我希望有一个东西,它可能不在能力的前沿,因为那是新且未探索的等等,但我希望有一个东西是落在后面的,作为一个整个行业都能访问的共同工作空间。

So I want there to be a thing that is maybe not at the edge of capability because it's new and unexplored, etcetera, but I want there to be a thing that's behind and that is kind of like a common working space for intelligences that the entire industry has access to.
