本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
我第一次给别人看的时候,他们说:不可能。
The first time I showed it to someone, they were like, no way.
这简直就像是一个假的演示。
This is, like, a fake demo.
这不可能这么快。
Like, this cannot be this fast.
这将改变一切,尤其是因为它还不是我们能达到的最快速度。
This will change everything, especially because it's not yet the fastest that we can actually get it to be.
我的体验是尝试这个应用程序。
My experience was trying the app.
我真的不想再回到终端了。
I didn't really wanna go back to a terminal.
我意识到,实际上,图形界面很棒。
What I realized is, actually, GUIs are great.
问题是IDE本身。
IDEs are just the problem.
有一种用于编程的图形界面,但它不是IDE,看起来你正在摸索出这种东西。
There's something that's a GUI for programming that's not an IDE, and it it seems like you're figuring that out.
但我甚至不知道这叫什么。
But I don't even know what that's called.
它叫Codex应用。
It's called a Codex app.
这里是Dan。
Dan here.
我想暂时离开一下本集,来介绍一下Granola。
And I wanna take a second away from the episode to tell you about Granola.
Granola是一款用于会议的AI笔记工具,我几乎每天都用。
Granola is an AI notetaker for your meetings, and I use it pretty much every day.
这听起来可能有点奇怪,甚至有点令人不安,比如把所有会议都录下来。
That may sound a little bit weird or a little bit creepy, like, transcribe all your meetings.
但对我而言,作为领导者,这实际上几乎是不可或缺的。
Well, for me, it's actually kind of indispensable as a leader.
Every 现在大约有20人,对我来说,理解决策是如何做出的、我在会议中的表现以及如何最好地支持我的团队至关重要。
Every is about 20 people now, and it's really important to me that I understand how decisions get made, how I'm showing up in meetings, and how I can help my team the best way I can.
Granola 对我来说就像一个领导力日志,让我能回顾自己在会议中的表现、某周出现了哪些情况,以及下次如何做得更好。
Granola acts a little bit like a leadership log for me so I can see how I've done in meetings, what what situations came up in a particular week, and how I can do better next time.
如果你正试图提升领导力并扩大公司规模,不妨试试 Granola,它是一款为会议打造的AI笔记工具。
If you're trying to improve as a leader and scale your company, try Granola as your AI-powered notepad for meetings.
前往 granola.ai/every,使用代码 every,即可免费使用三个月。
Head to granola.ai/every, code every, to get three months free.
现在,我们回到节目本身。
And now back to the episode.
Thibault,Andrew,欢迎来到节目。
Thibault, Andrew, welcome to the show.
你好。
Hey.
谢谢你们邀请我们。
Thanks for having us.
谢谢你们邀请我们。
Thanks for having us.
很好。
Great.
很高兴能和你们聊天。
Great to get to chat with you.
对于不了解的人,Thibault,你是OpenAI Codex的负责人,而Andrew,你是OpenAI Codex应用的技术团队成员。
So for people who don't know, Thibault, you are the head of Codex OpenAI, and Andrew, you are a member of the technical staff on the Codex app at OpenAI.
你们就是当下最热门的人物。
And you are the people of the moment.
他们刚刚播出了关于Codex的超级碗广告。
They just ran a Super Bowl commercial about Codex.
是OpenAI做的。
OpenAI did.
你们感觉怎么样?
How are you feeling?
是的。
Yeah.
那场超级碗广告确实很令人惊讶,不是吗?
That that Super Bowl was quite surprising, wasn't it?
确实如此。
It it really was.
我认为核心在于,我想从这里开始讨论的原因是,这感觉像是一次战略转变。
I think the core thing, and the reason this is where I wanna start this conversation, is that it feels like a strategic shift.
你本会预期OpenAI在超级碗上投放的是ChatGPT的广告,尤其是如果你看看几个月前Codex对专业工程师的定位,可能不会针对更广泛的受众投放广告。
You would expect OpenAI to have run a ChatGPT commercial during the Super Bowl, and, especially if you looked at Codex's positioning three or four months ago for professional engineers, maybe not have run an ad targeted at a much broader audience.
长期以来,一直存在一种分裂:Codex是给专业工程师用的。
It it felt like for a long time, there was this divide where Codex was for professional engineers.
如果你想进行氛围编程,那就去ChatGPT应用里做吧。
And if you wanna do vibe coding, do that in the ChatGPT app.
但似乎在过去一两个月里,这种状况发生了很大变化。
And it seems like that that has shifted a lot over the last, like, month or two.
你能跟我讲讲这个吗?
Can you tell me about that?
是的。
Yeah.
我觉得特别是,比如我们可以谈谈上周的事。
I think especially, like, in you know, we we can talk about last week.
对吧?
Right?
所以上周一,我们发布了Codex应用。
So, like, last week on Monday, we released the Codex app.
立刻,我们就看到了大量的下载量,第一周就超过了一百万次下载。
Immediately, we saw, like, a ton of downloads, like, more than a million downloads in the first week.
然后我们知道,周四我们将发布一个非常强大的模型,也就是Codex 53。
And then we knew that we were releasing, like, an extremely strong model, like, you know, five three Codex on Thursday.
这让我们非常清楚地表明,我们就是要为大家带来卓越的体验。
That just made, I think, it very visible that, you know, we're we're here to, you know, put incredible experiences out there.
我们非常重视Codex。
We're very committed to Codex.
而且,像代理现在也开始真正发挥作用了,即使你技术没那么强,也能创建这些东西。
And, like, also agents are really starting to work and be able to create these things, you know, even if you're, like, a little bit less technical.
我觉得这个应用确实证明了这一点。
I think, like, the app really showed that.
你知道,它让普通人更容易上手尝试,运行多个代理,而我们的模型在支持多任务处理和长时间稳定运行方面表现得非常出色。
You know, it's, like, it's much more inviting for people to just try it and, like, you know, run multiple agents, you know, with our models being, like, very very good at sort of, like, allowing for multitasking and being reliable for long running long running sessions.
所以,它让你能创造出更多东西。
So, like, it allows you to create a lot more.
因此,我觉得也许我们可以激励更多人去构建,并展示代理已经来了。
So it just felt that, you know, maybe we can inspire more people to build and then show that agents are here.
对吧?
Right?
这正在到来。
It's coming.
这将成为主流。
It's going to be mainstream.
你知道,为什么不试着创造一些新的东西,去激励别人呢?
You know, why don't you try and, like, create something new and, you know, inspire people?
这感觉正是我们想要强化的核心理念。
That felt like the right thing that we wanted to reinforce.
是的。
Yeah.
在设计和开发这款应用的过程中,我们始终给自己定下了一条内部准则:我们必须打造一款自己真心喜欢并用于所有工作的工具。
While we were designing and developing the app, one of one of our, like, internal mandates to ourselves the whole time was that we had to make something that we love to use and that we used for, like, all of our work.
如果我们做不到这一点,就不会发布这款产品,这还是我们刚开始时的想法。
And if we couldn't do that, then we weren't gonna put this out, and this was back when we started.
我觉得我们对自己能从中获得这么多乐趣感到非常意外。
And I think that we surprised ourselves a lot with how fun it was.
尤其是当我们开始构建这个应用时,还远未开始开发智能体技能。
And especially as, you know, we started to build this app before we started to build agent skills.
当我们把它们结合起来后,就形成了一个非常丰富的互动体验,你可以打开浏览器或连接到各种服务。
And then once we kind of paired them together, it became this really rich interactive experience where you could open the browser or you could connect to these various services.
于是,我们突然感受到了这种紧密连接的互动体验,想要分享出来——我觉得这个广告就像一封写给开发者的情书。
And so all of a sudden, we started to feel this, like, really connected interactive experience and wanted to share it. I kind of see the ad as, like, a love letter to builders.
对吧?
Right?
我从没见过Linux光盘出现在超级碗广告里。
I have never seen a Linux CD in a Super Bowl ad.
所以,看着这一切真的太酷了。
And so, you know, like that was really cool to watch.
这个广告带来了什么影响?
What was the impact of the ad?
我们还在衡量这一点。
We're still to measure that.
我们会看看它在长期内如何发展,但我们在广告播出后——太平洋时间下午4点左右——确实看到了流量的激增,我们的系统瞬间承受了巨大压力。
We'll see, you know, how it plays out over the long term, but we saw a giant surge of traffic remarkably quickly after 4 PM PST when it aired, and our systems were under heavy load.
所以对我来说,这感觉有点奇怪。
So it would it felt kinda weird to me.
你知道,人们在看超级碗比赛时,接着去下载并安装这个应用,然后当场就开始试用。
Like, you know, people are watching the Super Bowl and then going and, like, you know, installing the app, and they're just, like, trying it out right there and then.
但这件事确实发生了。
But that it happened.
很多很多人联系了我们,说他们被这个广告深深激励了,之后就想动手开发,而这正是我们所期望的。
And a lot of people reached out saying they were really inspired by it and just wanted to build afterwards, which is, you know, what we're aiming for as well.
回到正题,我仍然想聊聊一些战略上的转变。
Coming back, I still wanna talk a little bit about the strategic shifts.
所以,Codex应用,或者说整个Codex,正从原本主要面向专业开发者的工具,转向一个面向更广泛用户群体的产品,也许还把原本在ChatGPT里的部分氛围编程体验迁移到了Codex应用中。
So the Codex app, or Codex in general, is moving from something that is really for professional developers to something that has a broader audience, and maybe moving some of the vibe coding from ChatGPT into the Codex app.
能跟我讲讲这个吗?
Tell me about that.
我认为我们并没有打算把氛围编程从ChatGPT迁移到Codex应用中。
I don't think we're trying to move vibe coding from ChatGPT into the Codex app.
就像,我们正在经历两件事。
Like, we're very much you know, two things are happening.
首先,我们在推动专业软件开发的前沿。
Like, one, we're pushing the frontier on, like, professional software development.
五三版本的Codex在所有主要编码基准测试中都击败了其他所有模型。
Like, five three Codex beats every single other model, like, on the top benchmarks for coding.
所以它是一个非常非常强大的模型。
So it is a very, very capable model.
而且,在速度和成本方面,它确实也是一个顶尖的表现者。
And, you know, at that speed and cost, it is a top performer.
我认为,第二点是,这款应用让事情变得更加便捷,因此确实吸引了更广泛的用户群体。
I think the app the second thing is, like, the app does make things more accessible, and so, like, it does appeal to, like, a wider audience.
但内部我们也看到,这款应用在研究和我们自己的团队中被广泛使用。
But internally, we're also seeing the app, you know, it is very much used within research, within our own team.
整个Codex团队都在使用这款应用。
Like, the entire Codex team uses the app.
让人们更有生产力。
Makes people more productive.
所以我们非常专注于我们观察到的、让公司内外的人们变得非常高效的代理使用方式和模式,并全力投入其中。
So it's very much leaning in into how we think agents are best used, the patterns that we were seeing that were making people very productive here at the company and outside, and then just sort of going all in on that.
与此同时,委托功能终于实现了。
It does happen that at the same time also it's like, hey, delegation is finally here.
它有效果。
It works.
它变得更加容易使用。
It is much more accessible.
我们打算尝试看看如何将这些功能打包,真正推向更广泛的用户群体,但这可能不是Codex应用本身。
And we're gonna try and see, like, how we can package that and actually ship this to, like, you know, a much, much wider audience, but that's that might not be the Codex app.
我的意思是,你每天都用它。
I mean, you use that all the days.
你就在里面直接构建。
Like, you just build in there.
我写的代码中有99%都是在Codex应用中完成的。
99% of the code that I write is using the Codex app.
我也是。
Same.
我的生活现在完全离不开这个应用了。
I mean, I live in here now.
是的。
Yeah.
好的。
Okay.
这其实真的很有意思。
Well, that's that's actually really interesting.
我肯定想聊聊这个应用本身,但我想先回到你刚才说的,如果我没理解错的话,你们正在推动前沿,看到越来越多的人——不仅仅是资深工程师——在使用它。
I I definitely wanna talk about the app in particular, but I wanna go back to the thing you just said, which is maybe if I if I'm reading you right, you're you're kind of like we're pushing the frontier, we're seeing lots of people who are maybe broader than just like, senior engineers using this.
不过,关于谁在哪个应用里做什么,整体上你们可能还没完全理清,这不像过去那样有清晰的界限——不再是在 ChatGPT 里氛围编程,现在真正是在 Codex 里氛围编程。
However, the overall idea of, like, who is doing what in which app, maybe you haven't totally figured out yet, and it's not as clean of a line as, like, no longer vibe coding in ChatGPT, you're really vibe coding in Codex.
就像你说,两种方式都能做,但我们还没完全搞清楚到底该在什么地方用哪种方式。
It's like, you can do it in both, but we haven't figured out exactly, like, which thing you're gonna do where.
是的。
Yeah.
我认为 Codex 是目前最强大的体验。
I think Codex is the most powerful experience out there.
所以你最好具备一定的技术背景,以便理解代码实际上正在被编写出来。
So you should be fairly technical so that you understand, hey, code is actually getting written.
代码会在你的机器上执行。
It's gonna get executed on your machine.
默认情况下,它会在沙盒中执行。
By default, it's executed in the sandbox.
但为了充分使用 Codex,你大概还是需要能够读懂代码。
But you should probably be able to read code in order to use, you know, Codex to, like, its fullest.
我们将来也会在 ChatGPT 上提供类似的体验,它在沙盒和概念呈现方式上会有一些不同的特性。
We will bring a similar experience to ChatGPT at some point, which will have, like, different properties in terms of, like, the sandbox and how concepts are represented.
也许我们不会展示那个吓人的终端命令正在运行,你需要批准它。
Maybe we won't be showing, hey, this scary terminal command thing is running and you should probably approve it.
当然,你不应该对非技术人员这样做。
Of course, you shouldn't do that to someone who is not technical.
Codex 的目标是吸引所有程序员、构建者和技术相关人士,比如那些本身懂技术或接近技术的人,像数据科学这类领域。
And Codex is really there to appeal to just all coders, builders, technical, like people who are close, like either technical themselves or like technical adjacent, like data science, these kinds of things.
是的。
Yeah.
如果你使用 Codex 应用一段时间,就能看到它从聊天中获得的灵感。
And, you know, if you use the Codex app for any amount of time, you can see the inspirations from chat.
你知道,界面布局非常相似。
You know, the layout's very similar.
我们会自动为你的对话命名。
We auto name your your conversations.
我们有上下文操作,但整体设计非常简洁。
We've got contextual actions, but it's pretty clean.
对吧?
Right?
Composer 看起来也非常相似,你还会在其他类型的聊天中看到一些类似的灵感。
The composer looks very similar, and you'll see some of that inspiration back in chat for other types of things.
但我们仍然相信,当我们着手为专业软件开发者打造产品时,它值得拥有一个专属体验,能够真正展现模型的强大能力以及模型如何改变开发流程。
But we still believe that, you know, when we set out to make something for the professional software developer and for us, it deserved a dedicated experience that could really showcase the power of the models and the way that the models could change the development life cycle.
因此,我们专门为这一点打造了产品,我们在研究团队和产品团队内部已经取得了很大成功。
And so, you know, we made something very tailored to that, and we've had a lot of success internally with research teams, with product teams.
所以,我们会放眼更远的未来,但我认为我们对这种量身定制的方案感到非常满意。
And so, you know, we'll look beyond, but I think we're really happy with where we've ended up on that kind of tailored approach to this.
你能
Can you
告诉我你们为什么决定投资 GUI 而不是 TUI?
tell me about the decision to invest in a GUI over a TUI?
我觉得现在 TUI 真的非常热门。
I feel like TUIs are so hot right now.
而且很明显,你们已经为Codex推出了一款。
And obviously, you have one for Codex already.
你们本可以说,我们要加倍投入,把终端体验做得比现在更好。
And you could have said, we're going to double down and just make the terminal experience even better than it is now.
并且大力投资于终端,而不是说,好吧。
And really invest in that versus, okay.
我们要去做一个GUI,我认为这有点反直觉,或者说是与主流叙事相悖的做法。
We're gonna go make a GUI, which is, I think, a little bit of a counterintuitive or, like, counter-narrative thing to do.
所以跟我讲讲这个决策过程吧。
So tell me about that decision process.
我觉得这并不反直觉。
I think it wasn't counterintuitive.
它更像是不那么主流而已。
It's more maybe it's not mainstream.
所以我们尝试了各种不同的方法。
And so we we experiment with a lot of different approaches.
我觉得我们仍然处于实验阶段,我们主要负责两件事。
Like, I very much consider that we're still in the experimentation phase, and, you know, we're responsible primarily for two things.
那就是打造世界上最具能力的编码实体。
It's, like, building the most powerful entity out there, you know, that's capable of coding.
然后,随着时间推移,这将逐渐演变为一个多智能体系统,变得越来越强大。
And then, you know, increasingly, this will become, like, a multi agent system, and it will become like, more and more capable.
你必须学会如何引导和监督它的结果与行为。
And, you know, you will have to figure out, like, how to steer and supervise, like, its outcome and its behavior.
这是我们正在构建的一个方面。
You know, that's, like, one thing that we're building.
同时,我们也在探索如何与这个系统进行交互。
And then we're also building, like, how you even, you know, interact with this.
那么,如何才能最佳地洞察这个强大实体或实体系统正在做什么呢?
It's, you know, what is the optimal way to have visibility into what this, like, very capable entity or, like, system of entities is doing?
你该如何引导它们?
How do you steer them?
你如何监督它们?
How do you supervise them?
所以我们还在大力探索这究竟是什么样子。
And so we we're very much still experimenting with what that is.
就像,你知道的,你当然可以在 TUI 里做到。
It's like, you know, sure, you can do it in the TUI.
但到了某个阶段,这种做法就会显得非常受限,尤其是在多模态场景下。
It's like, at some point, it starts to feel like very limiting, especially on like multimodal.
实际上,这些模型可以绘制简单的图表、生成图像,或者通过语音进行交互。
Actually, the models can draw little diagrams and generate images or you can talk over it using voice.
也许你会让它们并行运行很多个,于是就开始难以追踪了。
Maybe you have like many of them going in parallel, and so you start to lose track.
所以我们觉得有必要开始尝试其他方法。
So we felt like we needed to start experimenting with something else.
直到我们看到它在公司内部变得极其流行,才意识到我们必须把它对外发布。
And it is only when, you know, we saw it become, like, super, super popular internally, we were like, we have to ship this externally.
这已经发展到一个地步,好到我们不能再自己藏着了。
Like, this is kinda like this has come to a point where it's, like, too good to sort of, like, just keep it to ourselves.
我的意思是,这就是你经历的过程,你知道,你现在正在开发这个应用。
I mean, that was, like, the journey that you went through, you know; you're now building in the app.
不过,你是什么时候开始在应用里开发的?
Although, like, when did you start building in the app?
其实那很快,就在应用自己开始构建的时候。
That was actually, like, fairly quickly, like, when the app was building itself.
没错。
That that yeah.
那确实挺快的。
That was pretty quickly.
而且是的。
And yeah.
因为我一开始是从TUI和IDE插件开始的。
Because I was starting with the TUI and with the the IDE extension.
我认为我个人的目标是如何尽快实现完全在应用内构建应用?
And I think that my goal personally was how can I get to fully building the app on the app as fast as possible?
对吧?
Right?
在开发这些东西时,很容易就会陷入这种模式:哦,这个对某人会有帮助。
It's, like, it's really easy when building this stuff to slip into the mode of, like, oh, this will be good for somebody.
比如,有人会非常喜欢这个。
Like, somebody would love this.
某种特定类型的人,他们会非常喜欢这个。
A certain type of like, they will love this.
对吧?
Right?
所以我们真的很想尽快实现:我想能够在应用内构建应用。
So we really wanted to get quickly to, like, I want to be able to build the app on the app.
我希望它能依靠技能自行运行。
I want it to be able to run itself with skills.
我希望它能点击自己生成的应用程序界面。
I want it to click around on the app that it spawned.
我希望这能尽快成为我工作流程的一部分。
And I I want this to be, like, part of my workflow as soon as possible.
我有时候还是会用TUI,尤其是想快速启动某个任务时,但我认为,控制UI的灵活性很有价值,比如让一些面板保持持久,而另一些则是临时的。我们已经为应用集成了语音功能,你可以用语音进行提示。
I still use the TUI sometimes when I wanna fire something quick, but I think that there is something about the flexibility of controlling UI and being able to have some panes be persistent and others be ephemeral. You know, we shipped voice with the app so you can prompt with voice.
我们的应用里还集成了Mermaid图表。
We have mermaid diagrams in the app.
我们支持完整的图像渲染。
We have full image rendering.
我认为,所有这些功能都只是我们未来在专用UI上要做的冰山一角。
So all of those things, I think, are, like, the tip of the iceberg on what we wanna do with a dedicated UI.
它非常简单,而且是故意设计得简单,但我相信我们在动态功能上还能做很多。
And it's it's pretty simple, and it's simple intentionally, but I think we're gonna do a lot with dynamic stuff there.
我的意思是,是的,上限要高得多。
I mean, yeah, the the ceiling is just much higher.
是的。
Yeah.
这很有趣。
It's interesting.
我的体验是尝试这个应用,反复尝试。
My experience was trying the app trying the app.
我其实不想再回到终端了,之前几个月我主要在Claude Code中编码,偶尔也在终端里用Codex。
I didn't really wanna go back to a terminal, and I had been coding mostly in Claude Code and some Codex in the terminal for several months before that.
我意识到,实际上,图形界面很棒。
And I think what I realized is, actually, GUIs are great.
问题是IDE。
IDEs are just the problem.
有一种不是IDE的编程图形界面,你们似乎正在探索这种东西。
And there's something that's a GUI for programming that's not an IDE, and it seems like you're kind of figuring that out.
但我甚至不知道这叫什么。
But I don't even know what that's called.
它叫Codex应用。
It's called the Codex app.
你知道吗,在这个项目的开发过程中,曾经有一段时间,每个人都在 fork 同一个 IDE。
You know, there was a moment during the development of this where everybody and their mother was forking the same IDE.
我们彼此对视了一下,然后说:嘿。
And we we kind of looked at each other, and we were like, hey.
我们是不是也应该 fork 一下 VS Code?
Should we have done a fork of VS Code as well?
我是认真的。
Like, it like, very seriously.
我记得那天确切是哪一天。
I remember exactly which day it was.
我不确定我是否要说 IDE 是问题所在,但我有时会用卡车来打比方:我会偶尔打开一个 IDE。
And I don't know if I would say that IDEs are the problem, but I go back to, like, the truck analogy sometimes with them, which is that I will open an IDE here and there.
比如,我今天就打开过一个。
Like, I opened one today.
我当时想做一件非常具体的事,但现在都记不起来是什么了。
It was something very specific that I wanted to do that I don't even remember what it was.
但后来我关掉了它,又回去用Codex应用了。
But then I closed it, and I went back to using the Codex app.
我觉得Codex应用作为日常主力工具确实有它的优势。
And I think that there there is something there with, like, the Codex app being a great daily driver.
偶尔你需要一个IDE,或者需要一套复杂的终端配置,但这些都不该是你的主要工作环境。
And, like, occasionally, you need an IDE, or occasionally, you need, like, a really complex terminal setup, but this should be your home base.
它应该是你运行的智能代理的指挥中心,一个你可以随时回来、追踪所有事情的地方。
It should be your command center for the agents that are running and a place that you can come back to and track all this stuff.
当然,我们也面临很多设计上的抉择,比如是否允许像IDE那样自由布局的面板。
And, you know, there are a lot of design decisions around, like, do we allow free form panels like an IDE?
我们最终得出的结论是,这些模型最擅长的,是根据当前任务类型判断此时此刻需要什么。
And we kinda came to the conclusion that a lot of what these models are great at is knowing what is needed in the moment for what type of task.
所以我们希望对每个时刻显示的内容拥有更全面的控制权。
And and so we wanted to have kind of more full control over what was able to show at what point.
对吧?
Right?
在计划模式下,你可以看到这一点,你并不一定会得到一个编辑器。
And you can see that in plan mode where you're not necessarily getting a composer.
你得到的是一个快速回答问题的方式。
You're getting a really quick way to answer questions.
你可以,你知道的?
You can you know?
而且你有你的计划,可以编辑你的计划。
And you've got your plan, and you can edit your plan.
我认为随着进展,我们只想在这方面做更多。
And I think we only wanna do more with that as we go.
你似乎对自己后来不想回到 TUI 感到惊讶。
It seems like you were surprised that you didn't wanna go back to the TUI after.
我确实如此。
I was.
是的。
Yeah.
是吗
Is that
我们之前,呃,像格雷格做过一次访谈,格雷格说,我是个高级用户。
we had, like Greg did an interview, and Greg was like, I am a power user.
我以为我永远不会离开终端了。
I thought I would never leave the terminal.
就像,是的。
Like Yeah.
格雷格整天都用 Emacs。
Greg Greg lives in Emacs.
你是不是那种
Are you like a I
我曾经当了六个月的 TUI 高级用户,从 Claude Code 刚变得很好用的时候开始。
was a TUI power user for, like, six months, starting when Claude Code first got really good.
我当时就想,天啊,这比用 Cursor 或 Windsurf 好太多了。
And I was like, holy shit, this is so much better than using Cursor or Windsurf or whatever.
现在我觉得自己像是速通了 TUI 时代,又回到了 GUI 的环境,目前我正在两者之间来回切换。
And now I feel like I speedran my TUI era and I'm back in GUIs; like, I'm kind of flipping back and forth right now.
但我能隐约看到曙光了,尤其是当你同时运行多个任务时,图形界面的优势就非常明显,体验好太多了。
But I can sort of see the light where, especially if you have a bunch of them going at once, the affordances of a GUI just make it much nicer.
是的。
Yeah.
那里还有很多东西即将推出。
And there's a lot more to come there.
这对我们来说是一个非常有意识的决定。
And it was a very intentional thing for us.
我们逐渐意识到,智能代理的行为已经远远超越了代码本身。
Like we sort of see agents will act and are already acting on like much more than code.
因此,它们需要成为你电脑上每一个应用和每一件事的伴侣。
And so they need to be a companion to like every single app and every single thing that you can do on computer.
就像我们与 Linear、Slack 集成一样。
It's like we integrate with Linear, Slack.
当然,它们也需要能够阅读代码并生成代码,但也许还能通过 Vercel 进行部署。
And of course, they also need to be able to read the code and produce code, but maybe it can do like a deploy through Vercel as well.
你是打算在 IDE 里完成所有这些操作吗?
Like, are you gonna do all these things from your IDE?
那样感觉会很奇怪。
That would sort of like feel very odd.
所以这就像你代理的指挥中心。
And so it's like this command center for your agent.
我们围绕这样一个理念优化了整个体验:你正在控制、引导和监督一个非常强大的智能实体。
We optimize the entire experience around the idea that you have a very capable intelligent entity that you're controlling, steering, and supervising.
你根本不需要亲自去执行那些操作。
And you never need to sort of go in there and do the things yourself.
它非常擅长被委派任务。
It's like the thing is very capable of being delegated to.
我的意思是,当你接受这就是我们正在走向的方向时,比如五二 Codex,感觉我们就快达到了。
Like, I think, you know, when you accept that that is what we're headed towards, with five two Codex, it just feels like we're getting there.
对吧?
Right?
那对你来说也是一样的。
Then you're like, well, it's the same with you.
当我跟你讨论某个功能想法或类似的东西时,你就会去获取灵感,然后去完成它。
When I talk to you about a feature idea or something, it's just like, you go and you get inspired and go and do it.
我不会突然跳进你的 IDE 里直接去实现它。
It's just like, I don't suddenly jump into your IDE and just go and implement it.
你可以的。
You could.
是的。
Yeah.
我的意思是,你会觉得那样做很奇怪。
I mean, you would find that disturbing.
对吧?
Right?
我的意思是,这就是你们所有人使用代理的方式。
It's like I mean, that's the way that everyone will work with agents.
你就只是跟它们对话。
It's like you just talk to them.
与 5.2 Codex 相比,你的工作流程在 5.3 Codex 上有什么变化?
How has your workflow changed with five three Codex versus five two?
我对它的速度加快感到惊讶。
I was surprised at how much faster it was.
而且,我得调整一下,因为我之前更多地优化了长时间、多任务处理。
And, sort of, like, I had to adjust; I had been optimizing a lot more for, like, long-running, sort of, multitasking.
而且,我原本预期是,好吧。
And, you know, I sort of, like, had an expectation of, like, okay.
这种任务大概需要十到十五分钟。
This type of task will take, like, you know, ten, fifteen minutes.
我要去处理四个不同的事情,然后回来。
I'm gonna kick, like, you know, four, like, you know, different things and then come back.
所以我现在可以少做一些多任务处理,更能进入心流状态。
So I'm able to, like, know, maybe do a little bit less multitasking and, like, you know, be more in the flow.
这种感觉真的很好。
So that, felt really good.
现在用技能启动自动化流程也让人感觉非常满意。
And then it just feels now very satisfying as well, like, you know, to kick off, like, automations with it, using skills.
它是一个更通用的模型。
It's like it's it's a more generally capable model.
它不像以前那么专注于代码了。
It's, like, less sort of, like, super focused on code.
对吧?
Right?
所以我发现它可靠多了,比如处理推特回复、总结重要信息、在Linear里提交bug,然后通过自动化实现日常任务。
And so I find it, like, much more reliable, sort of, like, going through Twitter replies and summarizing the important themes, or filing bugs in Linear, and then coming back to that and using automation so that things are implemented daily.
感觉它在这些方面更稳健了。
Feels like it's much more robust for these things.
不过,安德鲁,你才是真正的超级用户。
I mean, but you're really the power user here, Andrew.
他做的就是这种类型的事情。
It's just like the kind of stuff he does.
和安德鲁相比,我用Codex的方式就非常普通。
It's just like I have very vanilla usage of Codex compared to Andrew.
不。
No.
我的意思是,说得真好。
I mean, well said.
我之前打算长期运行一个系列,但只在Twitter上运行了三天,就是设置了一个提示,用来给Codex应用添加一个随机的、非可交付的功能。
I had a series that I had intentions to run for a while, and I only ran it for three days on X/Twitter, which was that I was setting up a prompt to basically add a feature to the Codex app, like, some random, non-shippable feature.
我写了很长的提示,关于我们需要达到的质量标准。
I had this long prompts, like, about the the quality bar that we had to do.
当我切换到五三版本的Codex后,结果变得有趣多了。
And once I switched it to five three Codex, the results got actually much more interesting.
比如,其中一个是在右侧加了一个《地铁跑酷》(Subway Surfers)面板。
Like, we did a Subway Surfers panel on the right; that was one of them.
还有一个是为子代理做的一个小型Minecraft界面,我不确定。
Like, a little Minecraft UI for the sub agents was another one that we did that I don't know.
也许我们会把它上线。
Maybe maybe we'll ship it.
我当时说:快回去工作。
I was like, get back to work.
是的。
Yeah.
对。
Yeah.
为什么我们会有Minecraft在
Why do we have Minecraft in
现在Codex应用里?
the Codex app now?
是的。
Yeah.
但得去探索一下。
But gotta explore.
不。
No.
我的意思是,五三版本的Codex,真的很不错。
I mean, five three Codex, like, it's it's it's neat.
它很快。
It's fast.
它功能强大。
It's capable.
它是多模态的。
It's multimodal.
Thibault说你有很多酷的应用场景。
Thibault says you have a lot of cool use cases.
比如,你用Codex应用最有趣的方式是什么?可能是一些人们还没想到但值得一试的用法?
Like, what are what are the, like, more interesting ways that you're using the Codex app that maybe people should try but haven't thought of yet?
Andrew搞了一些自动化,我觉得这种方式改变了你对这些工具的看法——当它们能根据特定触发条件在后台自动运行时。
Andrew came up with automations, and I think that sort of shifts the way that you're thinking about these things, when they can just, like, run in the background on a specific trigger at a specific time.
然后,你可以自己编程设置它们。
And then, you know, just you can sort of, like, program it yourself.
是的。
Yeah.
你正在
You're
大量使用它。
using that a lot.
我用这个应用做很多事,超出了单纯的编程功能。
There are a lot of things that I use the app for that are a little bit outside of just, coding features.
我用它来保持我的拉取请求可通过自动化合并。
I use it to keep my PRs mergeable with automations.
它会解决合并冲突。
And so it'll resolve merge conflicts.
它会保持它们更新。
It'll keep them updated.
它会修复构建问题,这样一旦准备好,就能立即合并。
It will fix, like, build issues so that basically, like, as soon as they're ready to go, like, they're ready to go.
不会出现那种情况:哦,天哪。
There's no, like, oh, hey.
有人合并了一个大改动,现在出现了冲突。
Somebody merged a big thing and there's a conflict now.
所以我这么做。
So I do that.
你说过,那自动化触发是在什么时刻?
So you said like, so at what point does the automation trigger?
因为我以为自动化是按固定时间触发的,但听起来还有其他我之前不知道的触发方式。
Because I thought the automation triggers, like, at a certain time schedule, but it sounds like there are other triggers I didn't know about.
是的。
I yeah.
我们正在关注很多东西。
We're looking at a lot of things.
我现在只是设定了一个时间计划,并使用了我们的 GitHub 技能和一些内部的 CI 技能。
I have it right now just on a time schedule, and I use our GitHub skill and some internal skills for our CI.
它每小时或每两小时运行一次,大致上清理所有内容。
And that runs hourly or every two hours and kinda just cleans everything up.
我明白了。
I see.
所以它会遍历所有内容,比如如果 main 分支有变动,它就会检查所有 PR,确保它们都保持最新,这样当你准备合并时,永远不会遇到问题——这其实很好。
So if there are any changes on main, it just looks through any PRs and, like, makes sure that they're all up to date so that whenever you're ready to go, you're never blocked. That's actually good.
我喜欢这个做法。
I like that.
是的
Yeah.
这实际上非常有帮助。
It's it's actually really helpful.
出乎意料地有用。
It's surprisingly helpful.
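[Note: The "keep PRs mergeable" automation Andrew describes is only sketched verbally in the episode. A minimal, hypothetical Python sketch of just the scheduling logic might look like the following; the function names and the idea of recording, per branch, which main-branch commit it was last rebased onto are assumptions for illustration, not OpenAI's implementation.]

```python
# Hypothetical sketch of a "keep PRs mergeable" automation loop.
# Assumption: for each open PR branch we track the main-branch commit
# it was last rebased onto; any branch whose recorded base is stale
# needs attention. A real version would shell out to git/gh and hand
# each task to an agent; this models only the decision step.

def stale_branches(recorded_base: dict[str, str], main_head: str) -> list[str]:
    """Return PR branches last rebased onto a commit other than main's head."""
    return sorted(b for b, base in recorded_base.items() if base != main_head)

def plan_updates(recorded_base: dict[str, str], main_head: str) -> list[str]:
    """Produce one scheduled run's worth of (hypothetical) agent tasks."""
    return [f"rebase {b} onto {main_head} and fix conflicts/build"
            for b in stale_branches(recorded_base, main_head)]
```

Run hourly, this decides which branches an agent should rebase and repair so everything stays ready to merge.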
我设置了一个每天早上9点的定时任务,会收到过去一天内所有合并到Codex应用中的贡献记录。
I have one that every day at, like, 9AM, I get sent all of the contributions that have merged to the Codex app over the last day.
它会生成一份清晰的报告,显示谁合并了什么,并且我会按主题进行分组。
And so it'll do, like, a nice report of who merged what, and I have it grouped by theme.
所以我会说,好吧。
So I can be like, alright.
有三个人在处理这个Composer模块的部分。
Like, three people worked on this part of the the composer.
有两个人在处理自动化相关的工作。
Two people worked on automations.
比如,我会看看发生了什么,这样至少能了解当前的情况,毕竟上线前总是会很混乱,是的。
Like, here's what happened so that I can at least be, like, knowledgeable of what's happening because, you know, things things get chaotic right before launch and yeah.
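[Note: The daily contribution digest described here can be sketched as a simple grouping step. The record shape below (author/theme/title) is an assumption for illustration; the real automation presumably pulls merged PRs via a GitHub skill.]

```python
from collections import defaultdict

# Hypothetical sketch of the morning digest: merged PRs bucketed by theme.

def group_by_theme(merged_prs):
    """Return {theme: ["author: title", ...]}, preserving input order."""
    report = defaultdict(list)
    for pr in merged_prs:
        report[pr["theme"]].append(f"{pr['author']}: {pr['title']}")
    return dict(report)
```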
我有一个自动化脚本,一天会运行好多次,它会随机挑选一个文件,查找并修复一个细微的bug。
One automation I have, I run it, like, multiple times a day, and it's: pick a random file and find and fix a subtle bug.
这还挺搞笑的,因为它真的会随机选一个文件。
And then it's kinda funny because it actually does pick a random file.
它会运行类似Python的随机函数,找到一个随机文件,然后从那里开始检查。
So it will run like, you know, Python, like, random, it will, like, you know, find a random file, and it will start from there.
所以每次它都会像是在探索一个全新的地方。
And so it's like every time it sort of, like, explores like a new one.
它发现过问题吗?
Has it caught anything?
哦,是的。
Oh, yeah.
对。
Yeah.
我们通常能发现一些潜在的bug,这些bug实际上并不会在关键路径上触发,但它们确实是bug。
We catch, like, often latent bugs that are not actually triggering on the critical path, but, you know, they're actually bugs.
然后,修复它们非常简单,直接合并就行了。
And then, you know, it's, like, trivial to fix it, like, merge it.
花不了多少时间。
Takes very little time.
而且这种事情,我自己根本不可能发现,比如前几天我在约束采样中遇到的问题。
And it's a thing that, you know, I would have never found myself; it found, like, an issue in constraint sampling the other day.
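[Note: The "pick a random file, hunt for a subtle bug" automation can be sketched like this. Only the selection step is modeled; the seeding and prompt wording are assumptions, not the real automation.]

```python
import random

# Hypothetical sketch of the random-file bug hunt.

def pick_random(paths, seed=None):
    """Seedable, order-independent choice from a collection of file paths."""
    paths = sorted(paths)  # sort so a seeded choice is deterministic
    return random.Random(seed).choice(paths) if paths else None

def bug_hunt_prompt(path):
    """Assemble the (assumed) agent prompt for one scheduled run."""
    return f"Starting from {path}, find and fix one subtle latent bug."
```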
是的。
Yeah.
这太棒了。
That's really cool.
你还有其他值得分享的自动化工具吗?
Do you have other other automations that are worth sharing?
让我想想。
Let's see.
我觉得一直有大约60个在运行。
I feel like I have 60 that are running at all times.
有些是用于测试,有些是用于实际用途。
It's just like, some for testing and some for real.
团队里有些成员特别喜欢一个工具,它会查看你过去一两天提交的PR,并悄悄修复你发布的任何bug。
Some of the members on the team really like this one that looks at the PRs that you've done in the past day or so and quietly cleans up any bugs you shipped.
它还会查看几个可观测性平台,试图在别人注意到你发布了bug之前就自动推送修复。
And it kinda, like, looks at a few of the observability platforms and tries to basically ship a fix before anyone's noticed that you shipped a bug.
这很酷。
That's cool.
并不是所有自动化都和编码相关,比如市场研究。
Not every one is coding related — one, for example, is marketing research.
它每天运行,通常需要特定的技能来进行深度市场研究,我这段时间一直在调整优化。
It runs daily, and it uses, like, a specific skill to do deep marketing research, which I've sort of tuned over time.
然后它会搜索网络,查找关于用户对Codex的最新看法和讨论。
And then that goes and searches the web on any new things that came up in terms of how users are perceiving, talking about Codex.
然后我收到了那份小报告,每次读起来都很有趣。
And then I just receive that little report, and it always makes for, like, an interesting read.
是的。
Yeah.
我们可以继续了。
We can just go on.
这些只是我们确实依赖的一些例子。
It's like these are just examples that, you know, we do rely on.
它们会运行。
Like, you know, they run.
对。
Yeah.
是的。
Yeah.
对。
Yeah.
你们有没有什么特别的技能,是超出一般那种的?
Do you have any particular skills that you guys like that are beyond the normal kind of you know?
我有GitHub方面的技能,就是那种东西。
I have a GitHub skill and that that kind of stuff.
我超爱Andrew的Yeet Yeet技能,它能直接抓取变更,然后提交、创建拉取请求、写草稿、放进草稿状态,还能自动生成带标题和正文的PR。
I love Andrew's Yeet Yeet skill, which just, like, takes the change and then, you know, does the commit, does the PR, writes the draft, puts it in draft, and, you know, publishes a PR with, like, a PR title and body.
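A "yeet"-style skill is essentially a fixed sequence of git and GitHub CLI commands. A minimal sketch of the shape of it — the function name, arguments, and injectable runner are illustrative, not the actual skill:

```python
import subprocess

def yeet(branch, title, body, runner=None):
    """Commit everything on a new branch and open a draft PR.

    Assumes git and the GitHub CLI (gh) are installed and authenticated.
    `runner` is injectable so the command plan can be inspected or tested
    without touching a real repository.
    """
    cmds = [
        ["git", "checkout", "-b", branch],
        ["git", "add", "-A"],
        ["git", "commit", "-m", title],
        ["git", "push", "-u", "origin", branch],
        ["gh", "pr", "create", "--draft", "--title", title, "--body", body],
    ]
    run = runner or (lambda cmd: subprocess.run(cmd, check=True))
    for cmd in cmds:
        run(cmd)
    return cmds
```

In the real skill the agent would also draft the PR title and body itself; here they are passed in for simplicity.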
是的。
Yeah.
这真的很让人满足。
It's very satisfying.
对。
Yeah.
它什么都能做。
It just does everything.
这个功能确实能让大家的效率大大提高。
That one, like, definitely makes people productive.
对你来说,最常用的是哪些?
What are the top used ones for you?
ImageGen 很酷,是的。
ImageGen is a cool one Yeah.
用于各种搞笑的自动化目的。
For both, like, silly automation purposes.
比如,嘿。
Like, hey.
给我生成一张能体现我前一天工作的图片。
Make me an image that characterizes my last day of work.
不是我最后一天工作。
Not my last day of work.
是我前一天。
My previous day.
对。
Yes.
是的,安德鲁。
Yes, Andrew.
你知道吗,ImageGen 这个功能其实特别棒,我用 Codex 应用给我女儿们做了一本书。
You know, the ImageGen skill was actually really cool — I used the Codex app to make a book for my daughters.
所以我整理了一个提示,用来教它写我想要的剧本。
And so I, like, you know, put together this prompt for teaching it about, like, a script that I wanted written.
比如,24页,这是我女儿们的年龄。
So, like, 24 pages, here are my daughters' ages.
这是咱们过去住过的地方。
Here's, like, where we've lived in the past.
我们曾经住在波士顿,后来搬到了纽约,再搬到了这里。
Like, we were in Boston and moved to New York and then moved over here.
然后我说,之后我们就经历了那些事。
And then I said, like, after that we went through that.
我同意了剧本,然后我们继续推进,我说,好了。
I agreed on the script, and then we went through, and I said, like, alright.
现在是时候使用图像生成技能了,它根据脚本为书中的每一页生成了图像。
Now it's time to use the image gen skill, and it made like, it prompted for every page in the book based on the script.
它为每张图片生成提示,然后将它们全部整合在一起,并使用PDF技能生成了书籍的PDF文件,接着我打印了出来。
It prompted for the image, and then it kinda put them all together and used the PDF skill to put together the book's PDF, and then I printed it.
所以我们得到了一本非常定制的书,我会读给孩子们听,真的非常棒。
And so we've got, like, a super custom book that, you know, I read to my kids, and it's it's really cool.
当你能把智能代理的智能与程序化方式结合起来时,这简直太棒了,比如通过使用各种技能,然后以新颖的方式将它们组合在一起。
It's just this awesome thing when you can combine, like, the intelligence of the agent with working in a programmatic way — you know, by using skills — and then you can just combine them in, like, novel ways.
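That composition pattern — agent intelligence driving skills programmatically — can be sketched as a tiny pipeline. Here `image_skill` and `pdf_skill` are stand-ins for the real ImageGen and PDF skills, not their actual interfaces:

```python
def make_book(script_pages, image_skill, pdf_skill):
    """Generate one illustration per page of a script, then hand the
    (text, image) pairs to a PDF skill to assemble the book."""
    pages = [(text, image_skill(f"Illustrate: {text}")) for text in script_pages]
    return pdf_skill(pages)

# Fake skills stand in for ImageGen and the PDF skill.
fake_image = lambda prompt: f"<image for '{prompt}'>"
fake_pdf = lambda pages: f"book.pdf with {len(pages)} pages"
result = make_book(["We lived in Boston", "Then New York"], fake_image, fake_pdf)
```

The agent supplies the creativity (the script, the per-page prompts); the skills supply the deterministic plumbing (rendering images, assembling the PDF).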
是的,我认为PDF和图像生成这一组合是我们经常看到的常见搭配。
Like, yeah, I think the PDF and ImageGen one is, like — it's a common combo that we see.
Codex模型显然变得更快了,这让它更加实用,而且感觉也更敏锐了一些。
It feels like the Codex model — obviously, it's got faster, which makes it much more usable, and it also feels a little more perceptive.
它似乎多了一点情感智能,但仍然保留了一点‘完全按你说的做’的特性,有时候这种表现会让人有点烦。
Like, it has a little more emotional intelligence, but it still has a little bit of that it-does-exactly-what-you-say thing, in a way that can be annoying.
你们是怎么考虑塑造模型的感知方式,以及你们希望朝哪个方向引导它的?
How are you guys thinking about how you shape the way the model feels and which way you're pushing it?
这是我们非常关注的事情。
It's something that we obsess over.
因此,我们确实希望模型在编程和指令遵循方面表现出色。
So we definitely want the model to excel at coding and be really good at instruction following.
但与此同时,如果我们在这方面优化得过多,它可能会过度关注某些特定词语,或以人类不会的方式误解意图。
At the same time, when we optimize a little bit too much in that direction, it can sort of, like, over-index on specific words or, like, misunderstand the intent in ways that humans wouldn't.
比如,有时候我会打错字,而这个错字居然真的会出现在文件里。
Like, sometimes I will just, like, have a typo, and then the typo will actually find its way into, like, the file.
我会想,当然,我并不是真的想打错字。
And I'm like, you know, obviously, you know, I didn't mean, like, you know, the typo.
我的意思是,我想的是这个类的名字。
It was like, you know, I meant, like, the name of this class.
所以我们一直在持续改进这一点。
So that's something that we're, you know, definitely continuing to push on.
但我们现在最关注的是效率、速度,以及我们现在所说的‘个性’——它有多支持人?
But like the thing that we're pushing on the most right now is like really efficiency, speed, and then also like what we now refer to as like personalities, like how supportive is it?
我们理解,并不是每个人都有相同的偏好。
And we understand that not everybody has the same preferences there.
之前的默认风格确实非常直接、务实。
Like the previous default was definitely like super blunt, like pragmatic personality.
现在我们也引入了一种更支持性、更友好的风格,你可以在这两者之间自由选择。
Now we've also introduced like a more supportive, like friendly personality, and you can just like pick between those.
我认为,对于那些没有普遍公认标准、每个人都不必使用同一方式的事情,我们可能会提供一种让你自定义的方式。
And I think for things that don't have, like, a universally accepted default that everybody should just use, we're probably going to introduce some way for you to just make it your own.
你应该感觉拥有属于自己的个性化代码规范,它能完全按照你的需求运作。
You should feel like you have your own little personal codex that works in exactly the way that you want it to work.
你用的是友好的还是务实的版本?
Do you use the friendly or the pragmatic one?
务实的。
Pragmatic.
务实的。
Pragmatic.
是的
Yeah.
好的
Okay.
我会说用务实型。
I'll say use pragmatic.
是的
Yeah.
有意思。
Interesting.
我觉得你们最近推出了一款快得离谱的模型。
I think you guys recently put out a model that is so fucking fast.
在它发布前我就在测试,当时我就觉得,我根本跟不上这玩意的速度。
I was testing it before it came out, and I was just like, I I can't really keep up with this thing.
所以我很想知道,这改变了你们对使用这类模型进行编程的可能性的看法,以及你们需要哪些功能来有效管理如此快速的模型。
So I'm curious how that changes how you think about, what is now possible with coding with a model like this and also the affordances that you need in order to manage models that are so quick effectively.
是的。
Yeah.
首先——是的。
The first— yeah.
我们第一次在应用中使用这个模型时,也遇到了类似的情况:突然之间出现了一大段文字,我们 scrolled 到底部,立刻意识到:好吧,我们需要让这段文字进来时更平滑一些。
The first time we used this model in the app, we had kind of that same thing happen where all of a sudden there was just like this wall of text and we are at the bottom of the scroll and we were immediately like, alright, we need to smooth this thing out coming in.
所以我们实际上稍微放慢了一点速度,让你能更顺畅地看到文字逐字出现。
And so we actually do slow it down ever so slightly just so that you can see the words come in like a little bit smoother.
这太有趣了。
That's so funny.
这确实是个很有趣的问题,但这个过程非常有趣。
It's, like, a really funny problem, but this thing has been super fun.
我认为我最兴奋的是,我们能为应用添加哪些真正动态的新功能——这些功能在没有这么快的模型时是无法实现的。
And I think I think what I'm most excited about is what sort of capabilities we can start to add to the app that are really, really dynamic that we couldn't with a model that wasn't this fast.
是的,这个模型将让你能够非常快速地迭代,但它也开启了大量新的可能性,关于如何编程以及如何与 Codex 应用交互。
So, yes, this model is going to allow you to iterate really, really quickly, but it also opens up a lot of new opportunities to how, like, how you code and and how you interact with the Codex app.
第一次展示这个最初原型时,我们把所有东西都连接起来了,显然这个模型是由Cerebras驱动的。
The first time we showed the very first prototype — when we hooked everything up — and obviously the model is powered by Cerebras.
我们已经谈过这个合作关系,我们非常兴奋能推出首个通过该平台提供的模型。
And we've talked about the partnership there, we're very excited to put the first model that we're serving through that out there.
这显然还处于非常早期的阶段。
It's obviously still very early.
这简直是第一次我们把所有东西连起来,我们激动得不得了,想马上分享出来。
It's, like, literally the first time we hook it all up, and, we're just, like, so excited that we wanna share it.
但当我第一次给某人展示时,他们说:‘不可能。’
But the first time I I showed it to someone, they were like, no way.
这简直就是假的,假的演示。
This is, like, a fake demo.
这根本不可能是真的。
It's like, you know, this is not real.
这不可能这么快。
Like, this cannot be this fast.
然后他们试了几个提示词。
And then they tried, like, a few prompts.
他们只是说:天啊,我根本跟不上。
They were just like, oh, I literally cannot keep up.
这简直太疯狂了。
It's like, this is insane.
而且,是的,我认为这将改变一切,尤其是因为这还不是我们能实现的最快速度。
And, yeah, I think this will change everything, especially because it's not yet the fastest that we can actually get it to be.
比如,我们现在发布的预览版还非常早。
Like, with the preview, we're putting it out, you know, quite early.
我们实际上会在其基础上叠加多项优化,让速度提升两到三倍,远超你目前体验到的水平。
We're actually going to layer a number of optimizations on top of it, which should be able to make it, you know, maybe two to three x faster than the experience that you've had.
所以这将会带来巨大改变。
So that's going to change things.
我们也在从任务分派的角度来思考这个问题。
And we're thinking about this also from a point of view of, like, delegation.
你知道,我们认为这个模型在多智能体系统中扮演着至关重要的角色,也能帮助加速那些更慢但更智能的智能体。
You know, like, we think this model has a huge role to play as part of, like, a system of, like, you know, multi agent systems and as a way to, like, speed up, you know, maybe the the slower, more intelligent agent as well.
所以我们会以这种方式进行实验。
So we're gonna be experimenting in that way.
那你预计那些更智能的智能体也会很快获得同样的硬件速度提升吗?
And do you expect the same hardware speedups on like the more intelligent agents to come out soon?
我们之前做的很多工作都挺有意思,主要是分布式系统和基础设施方面的问题,这些问题是因为我们能以前所未有的速度从模型中采样才暴露出来的。
So a lot of the things that we worked on were interesting, sort of like distributed systems and like infra problems that we uncovered because we were able to sample from the model at unprecedented speeds.
如果你能这么快地收到令牌,就必须去优化服务关键路径上所有暴露出来的瓶颈。
And then if you're getting tokens back this fast, you need to go and optimize the entire set of bottlenecks that you sort of uncover on the critical path of serving.
所有这些优化都对当前的模型有益——既包括 5.3-Codex,也包括所有未来的模型。
All of those benefit the current model — they benefit 5.3-Codex and all future models.
我们还做了一件事,我确信我们以后会写一篇更详细的博客文章,那就是我们彻底重构了整个SerDes栈,改用WebSocket和持久连接,以更增量、更状态化的方式处理任务,从而降低了所有模型的整体延迟。
And there's one thing that we've been doing as well, which I'm sure we're gonna put in a more detailed blog post at some point, which is that we rewrote the entire SerDes stack to be based on WebSockets and a persistent connection, and to do things a lot more incrementally and statefully, and that decreases the overall latency across all models.
我们还没默认上线这个功能,但它是为这个全新的超快模型准备的默认方案。
Like, we haven't shipped it by default yet, but it is something that we are making the default for this new, like, super fast model.
然后我们也会在其他模型上启用这个功能。
And then we're also gonna enable, like, on the other models.
而且这能将整体的轮转延迟降低大约40%。
And, like, it decreases overall turn latency by something like 40%.
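The intuition behind the persistent-connection win can be shown with a toy latency model: a fresh HTTPS request pays connection setup (TCP and TLS handshakes) on every turn, while a long-lived WebSocket pays it once. The numbers below are illustrative only, not measured Codex figures:

```python
def turn_latency_ms(round_trips, rtt_ms, setup_ms, persistent):
    """Toy model: per-turn latency is connection setup (skipped on a
    persistent connection) plus the request/response round trips."""
    return (0 if persistent else setup_ms) + round_trips * rtt_ms

# Illustrative numbers only.
fresh = turn_latency_ms(round_trips=3, rtt_ms=40, setup_ms=120, persistent=False)
ws = turn_latency_ms(round_trips=3, rtt_ms=40, setup_ms=120, persistent=True)
saving = 1 - ws / fresh  # 50% in this toy; the real figure depends on the workload
```

The incremental, stateful protocol described above adds further savings on top of this, since each turn can also send less redundant state.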
我们可以查一下具体的数据。
We can look into the exact numbers.
是的。
Yeah.
在内部使用这个模型时,最让你惊讶的是什么?这种速度提升带来了哪些可能性?
What are the most surprising things that you see in using the model internally in terms of what a speed up like this enables?
它让你能完全沉浸其中,几乎像是实时地塑造体验或代码。
It just allows you to be super, super in the flow and it's just you're almost, like, just in real time sculpting the experience or, like, the code.
这种感觉非常不一样。
It's just a very different feel to it.
一开始会让人觉得有点不安。
It's it's very unsettling at first.
一旦你适应了,就很难再回到其他任何模型了。
And then and then once you get into it, it's very hard to go back to any other model.
这正是我们所收到的反馈。
That's like the feedback that we've seen.
这也是我自己的感受。
That's like what I have felt myself.
所以,这需要大约五分钟来适应,然后你就觉得:不行,不行。
And so it takes, like, five minutes to adapt, and then you're sort of like, no.
好吧。
Okay.
这就是我今后使用这个工具的方式。
It's like this is how I'm gonna use this thing.
是的。
Yeah.
我认为我们还没有充分探索它能实现的所有可能性。
I also don't think that we've poked at the full extent of what we could do with it.
是的。
Yeah.
现在还太早了。
It's It's very early.
我们拥有它的时间还不长。
We haven't had it for very long.
是的。
Yeah.
团队里有个人,比如钱宁,刚刚展示说,哦,是的。
Someone on the team — like, Channing — was just showing, like, oh, yeah.
它速度非常快,实际上还能玩乒乓球。
It's so fast, and it can actually, like, play Pong.
你知道吗?
You know?
玩得不是很好,但这个模型能够对事物做出反应,几乎是实时的。
Not very well, but it's, like, the model is able to react to things, like, you know, almost on, like, real time.
对吧?
Right?
你会开始看到它如何取代一些确定性的步骤。
It it, like you start to see how it might replace some deterministic steps.
在Codex应用中,我们有一组Git操作。
So we have, in the Codex app, a set of Git actions.
对吧?
Right?
众所周知,Git的某些配置或状态会让你在运行这些操作时遇到大量错误处理、错误信息和指导的困扰。
And as everybody knows with Git, like, certain configuration of things or certain states that you can be in can make it really hard to run those without a ton of error handling and, like, all sorts of, like, error messages and guidance.
要打造一个优秀的Git体验非常困难,这就是为什么从来没人真正做好过。
And it's really hard to create a good Git experience, which is why, like, nobody ever has.
但如果你有一个几乎和运行这些脚本一样快的模型,你就可以想象一个世界,在那里这些操作变成了一种技能之类的东西。
But if you have a model that's as almost as fast as running these scripts, then you can imagine a world where these things turn into skills or something like that.
你可以让你的操作以稍微不同的方式运行,带有一些智能,而不会像今天这样,在要求它追踪代码库中的内容时经历相同的延迟。
And you can have your operations run a little bit differently with some, like, some intelligence and and not have the same latency that you have today when you're asking it to go track something down the code base.
对吧?
Right?
你可以大致比划一下,说:嘿。
You can kinda, like, vaguely gesture and be like, hey.
比如,把这东西上传上去,而且速度要快到足以支持一个按钮操作。
Like, send this up and and have that be fast enough for for a button.
我非常兴奋的是,当它和我们随 5.3-Codex 一起发布的一个功能结合时,那就是我们所谓的"中途转向"——你知道的,你从一个提示开始。
What I'm very excited about is when it comes together with one thing that we shipped with 5.3-Codex as well — this thing that we call mid-turn steering, you know, where you just start with your prompt.
它开始工作,然后你在它还在运行时发送另一个提示,它也能实时适应。
It's like it it got to work, and then you send another prompt, like, while it's still working and it adapts, like, in real time as well.
它会接收这条消息,确认一下,然后继续它的任务。
Like, it will just sort of, like, receive that message, acknowledge it, and then, know, continue its work.
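Mechanically, mid-turn steering amounts to the agent draining a message queue between work steps instead of only reading input at the start of a turn. A toy sketch — the step and acknowledgement format here is made up:

```python
import queue

def run_turn(steps, steering):
    """Between work steps, drain any user messages that arrived while the
    agent was working, acknowledge them, and fold them into the remaining
    plan. `steering` is a queue.Queue of user messages."""
    log = []
    pending = list(steps)
    while pending:
        # Check for steering messages without blocking the work loop.
        while True:
            try:
                msg = steering.get_nowait()
            except queue.Empty:
                break
            log.append(f"ack: {msg}")
            pending.append(f"apply: {msg}")  # fold steering into the plan
        log.append(f"do: {pending.pop(0)}")
    return log

q = queue.Queue()
q.put("use tabs, not spaces")
log = run_turn(["write code", "run tests"], q)
```

A fast model makes the "ack and fold in" step feel instantaneous, which is what makes voice-driven steering plausible.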
如果你开始想象,好吧,如果用语音会是什么样子?
Like if you start to think about, okay, what would this look like with voice?
然后,如果有一个像我们刚刚发布的一样快的模型,那就会带来一种完全不同的体验,我们非常期待能尽快实现它。
And then with a model that is as fast as the one that we just shipped, then that's like a whole other experience that we would be very excited to bring, hopefully very quickly.
因为你在说话时可以轻松地打断。
Because you can easily interrupt as you're talking.
是的。
Yeah.
只需用自然语言进行交流,然后进行中途调整,由于速度极快,实现几乎瞬间完成。
Just talking and engaging with, like, natural language, and then doing the mid-turn steers, and then the implementation happens almost instantly because of the speed.
使用起来会变得非常愉快。
It becomes a very pleasant thing to use.
现在你可以通过语音输入模拟这一过程,发送指令并进行中途调整,然后观察模型执行,这真的很酷。
Right now you can emulate it with voice dictation — send it, do the mid-turn steering, and then, you know, watch the model implement — and it's, like, a very cool thing.
我认为,当我们真正把它打磨完善时,这种体验将发生质的飞跃。
I think we're gonna have a step change in that experience when we just, like, really just polish it.
如果速度这个瓶颈即将被解决,你认为下一个瓶颈会是什么?
If speed as a bottleneck is, like, close to being solved, what do you think is the next bottleneck?
实现你想要的东西,下一个限制是什么?
What is the next limit on making the thing you want?
一个非常明显的瓶颈是,你能多快验证事情的正确性?
The bottleneck that is very apparent is, like, you know, how fast can you verify that things are correct?
我们的意思是,我们现在生成代码的速度比以往任何时候都快。
So, like, we we I mean, we can generate, like, code faster than ever before.
我们可以实现完整的功能。
We can implement entire features.
我们看到过有人仅凭对Codex应用的描述,甚至仅通过截图就能合成出一个计划,模型完全有能力复现95%的功能,并从零开始重建整个应用。
And, you know, we saw someone who, just based on a description of the Codex app — sort of synthesized into a plan from screenshots — like, the models are very much capable of reproducing 95% of the features and just rebuilding the app from scratch.
那它会完全没有bug吗?
Now is it going to be bug free?
它是否能像真正的应用那样,完美无缺地实现每一个细节?
Is everything, like, implemented to, you know, perfection in the same way that the actual app is?
但要让人类去点击、验证,确保设计一致、没有这样或那样的bug,这仍然需要大量时间。
It's like — that still takes a lot of time, you know, for a human to go and click and verify, and make sure that the designs are consistent and that there are no bugs here or there.
比如设置面板,当你点击那个按钮时,它真的会执行你期望的操作。
That the settings panel, like, you know, when you click that button, it actually does the thing that you expect.
我认为验证确实成了一个瓶颈。
I think verification, you know, definitely becomes a bottleneck.
我们团队里有人抱怨,代码量太大,根本审不完。
Like, we have people on the team, like, complain, you know, like, just too much code to review.
就像,你知道的。
It's like, you know.
是的。
Yeah.
这正是我们试图解决的问题。
That's what we're trying to solve for.
我的意思是,你确实也在抱怨这个。
I mean, you you complain about that.
我也在抱怨这个。
I complain about that.
现在要审查的代码太多了。
There's so much code to review now.
无论是你自己机器上的代码,还是来自其他同事的代码,我们都得想办法解决这个问题。
Both, like, code on your own machine and, like, from another peer — it's like, we're gonna have to figure that out.
是的。
Yeah.
你已经在第一次审查代码了,因为智能体只是把它呈现给你,然后你还得审查你同事生成的代码,或者说是这两轮审查。
You're already reviewing the code the first time, because the agent is just presenting it to you, and then you have to review, you know, the code produced by your peers — so there are, like, these two rounds of reviews.
然后是的。
And then yeah.
对。
Yeah.
我的意思是,这正是我们正在处理的问题。
I mean, this is something that we're working on.
我们很多人仍然需要审查代码,我们正在思考,当模型参与进来时,这个体验应该是什么样子的。
A lot of us still do have to review code, and we want you know, we're taking a look at what that experience should look like with the model involved.
对吧?
Right?
Codex 应用中有一个审查模式,效果非常好,会在侧边标注你的差异内容,包括发现的问题和风格建议,以及大量待办事项。
We've got a review mode in the Codex app that works really nicely and kind of annotates your diffs on the side with findings and stylistic things and lots to do.
是的。
Yeah.
另一件让我感到兴奋的事情是,让模型运行得更快,我们刚刚发布的这个版本真的快得惊人。
It's one thing I'm sort of, like, also excited about — you know, making the models faster — and this one that we just put out is, you know, mind-blowingly fast.
你也可以用它。
It's like, you can also use it.
你可以想象用它来理解代码、理解功能,帮助你进行代码审查,帮你真正理解上面的代码。
You know, you can imagine using it, like, in a way to understand code, understand features, helping you with code review, like helping you understand the code that's up here really.
而且这样做会更愉快,因为这是你愿意去做的事情。
And it's like much more pleasant because this is something that you wanna do.
你希望融入其中,保持流程顺畅。
You wanna be there in the flow.
这必须是同步进行的。
It's like something that has to be like synchronous.
这不是你可以委托出去的事情。
It's not something that you delegate.
你无法委托理解,对吧?
You cannot delegate understanding, right?
你是在努力去理解某样东西。
It's like, you're trying to like, get to understand something.
所以速度在这里是一个真正的优势。
And so like speed there, like is a real advantage.
因此,它也在一定程度上抵消了模型生成越来越多代码这一事实。
So it sort of like helps offset as well, like, you know, the fact that models are, like, producing more and more code.
速度能帮助你更快地理解这些代码。
It's like, you know, speed helps you understand, you know, this code faster as well.
是的。
Yeah.
我的意思是,我确实已经发现,这个新模型在端到端测试方面,速度更快了。
I mean, I definitely think I've found this already with this new model — it's faster, especially for end-to-end testing.
因为如果你让它做端到端测试,比如手动集成测试,通常会弹出一个提示框,只显示一秒钟。
Because if you're having it do end to end testing, like manual integration testing, often there's, like, a toast that pops up, it pops up for, like, a second.
如果模型不够快,它就捕捉不到这个提示。
And if the model's not fast, it's not gonna get it.
而且它在这方面似乎更好,因为循环时间短得多。
And it seems like it's better for that because the cycle times are much, much shorter.
所以我也确实有这种感受。
So and and I definitely find this too.
就像,我可以写出大量代码,但当我看到一个PR进来,或者我自己提交PR时,我第一个问题是:你有没有实际测试过这个代码,它真的能运行?
It's like, I can produce so much code, but when I see a PR come in, or when I make a PR, my first question is, like, is there evidence that you've actually tested this and that it actually works?
不只是单元测试。
Like, not just unit tests.
而是你已经做过端到端的测试了。
Like, you've gone through it, but end to end.
你该怎么处理这个问题?
How do you how do you handle this?
我的意思是,我见过很多同行,也有同样的疑问。
I mean, I have seen a lot of peers that I have the same question about.
现在写代码变得太简单了。
It's like, it's so easy to to code things now.
对吧?
Right?
是的。
Yeah.
我的意思是,我们已经让Codex应用通过一些技巧变得相当出色,比如自动运行、点击操作、截图作为证据,并上传到PR中。
I mean, we have gotten the Codex app to be pretty good at through some skills that we have of running itself, clicking around, screenshotting itself for evidence, and uploading it to the PR.
这里面有很多有趣的地方,尤其是当我们让这个过程更异步,或者模型在这些任务上变得非常快的时候。
There's there's, like, a lot that's pretty interesting there, especially when we make this, like, more async or when, you know, the models get really fast at this stuff.
我不确定具体会是什么样子,但这里面确实有很多可能性,比如:
Like, I don't know exactly what it looks like yet, but there is a lot there around, like, hey.
这是一个bug修复。
Here's a bug fix.
这正是当时发生时的样子,而现在使用完全相同的点击路径,它看起来就是这样的。
This is exactly, like, what it looked like when it was happening, and here's exactly what it looks like now with the same exact click path.
所以,也许这就是代码审查变得不那么重要的转折点,是的。
And so, like, maybe that's the turning point that code review becomes less important Yep.
当你能验证这一部分时,情况就不同了。
When it's like you can verify that part instead.
因此,你必须减少通过代码作为代理来做的工作。
So you have to kinda, like, do less through the code as a proxy.
但那里确实还有更多值得探索的地方。
There but there's there's definitely more to explore there.
最后几个问题。
Last last couple of questions.
我很好奇。
I'm curious.
你们从 Anthropic 和 Claude Code 学到了什么?
What did you guys learn from Anthropic and Claude Code?
你如何看待你们在市场上的定位与他们的区别?
And how do you think about your positioning in the market versus them?
你们觉得你们之间的差异在哪里?
Like, how do how do you think about the differences?
我认为他们是第一个把东西推出来的人,这让我们很感兴趣,因为我们其实也一直在研究类似的想法。
I think they were first to put something out there, and that was interesting to us because we had been working on similar ideas for a bit.
但我觉得当时我们的模型还不够成熟。
But I think our models were a little bit at the time not ready.
他们在长周期任务上不够可靠。
They were not reliable on long horizon tasks.
他们无法稳定地调用工具并保持话题聚焦。
They were not able to do reliable tool calls and stay on topic.
所以当我们开始大力投入这一点,尤其是有了GPT-5之后,我们就觉得:好了,模型已经到位了。
And so as soon as we started to really invest on that and especially with GPT-five, we were like, okay, the models are there.
我们知道如何让它们变得更好。
We know how to make them even better.
5.2 版本带来了更好的长周期可靠性和长上下文理解能力。
5.2 brought even better long-horizon reliability and long-context understanding.
我们看到,在模型方面,Anthropic 在我们看来正逐渐失去动力。
And what we were seeing is that Anthropic was, to us, sort of losing a little bit of steam when it came to the model.
我们处于一个幸运的位置,因为我们的 Codex 运作方式是:既有产品团队,也有工程团队,同时还拥有研究团队。
And we were in this fortunate position where like, the way that we run Codex is, like, you know, we've got, like, product, we've got engineering, but we've also got research.
我们只是紧密合作,坐在一起,共同解决问题。
And we just, like, all work together and sit together and solve problems together.
这是一个高度富有创造力的环境,有时我们决定在产品或框架层面解决问题,但有时我们也会想,嘿。
And it's, a highly creative space where, you know, at times we decide to solve problems in the product, in the harness, but at times we also we're like, hey.
我们该如何真正改进模型呢?
How can we actually improve the model?
让我们一起讨论,共同头脑风暴。
And, like, let's just talk about it and, like, you know, idea it together.
然后研究团队就会过来说,嘿。
And then, like, research will come and be like, hey.
你知道,我们手里有这么一幅图景。
You know, we've got this, like, picture that we're sitting on.
这会不会是我们可以发布的东西呢?
It's like, would this be sort of like something we can ship?
然后我们就对这个想法感到兴奋起来。
And then it was just sort of like, get excited about that.
其中一个例子是我们收到了很多关于压缩的反馈。
One of the examples was we had a lot of complaints on compaction.
压缩就像是个让人不满的环节,只要一触发压缩,用户就会抱怨。
Compaction was like something that people felt like whenever you would hit compaction, people would complain.
它似乎丢失了太多上下文。
It's like it's losing too much context.
所以我们从头到尾解决了这个问题,决定采用端到端的强化学习训练,在研究中引入压缩机制,让模型本身对压缩概念非常熟悉,并能跨时间自主优化委派任务。
And so we sort of, like, solved that end to end: we decided to do end-to-end RL training, introduce compaction within research, and make the model itself very familiar with the concept of compaction — delegating optimally to itself across time.
一旦我们在模型层面解决了这个问题,工具链的问题就变得简单多了,因为只需要让模型自己去处理,它就会非常可靠。
And once we had that and we had solved it at the model level, the harness problem became so much easier because it was just like, oh, just let the model do it and it's going to be very reliable.
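For contrast, the harness-side version of compaction is easy to sketch: fold the oldest messages into a summary whenever the transcript exceeds the context budget. The word counts and placeholder summarizer below are assumptions for illustration — the shipped approach instead trains the model itself to compact, and a real harness would count tokens and call a model to summarize:

```python
def compact(messages, budget, summarize=None):
    """Replace the oldest messages with a summary until the transcript
    fits the budget. Sizes are word counts here as a stand-in for
    token counts."""
    summarize = summarize or (lambda ms: f"[summary of {len(ms)} messages]")
    size = lambda m: len(m.split())
    while sum(size(m) for m in messages) > budget and len(messages) > 2:
        head, messages = messages[:2], messages[2:]
        messages = [summarize(head)] + messages
    return messages
```

The complaint described above — "it's losing too much context" — is exactly the weakness of this kind of fixed summarization, which is why moving compaction into training made the harness problem so much easier.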
通过这种方式和这种合作,我们感觉势头非常强劲,能够以每周或每月的频率持续改进并发布模型。
So through that, and through that collaboration, it just felt like the momentum has been very strong, and we're sort of able to improve models and ship a model roughly on a weekly or monthly cadence.
然后我们对Codex应用采取了另一种不同的策略和方法,结果证明这真是个绝佳的尝试。
And then we took a bit of a different bet and a different approach with the Codex app, which turned out to be an awesome thing to just try and do.
我们并不是强迫自己,试图把所有东西都硬塞进文本界面中。
It's that we didn't just, sort of, force ourselves, you know, into trying to cram everything into the TUI.
我的意思是,这确实是个很棒的挑战。
I mean, it was, like, it was, like, a great challenge.
对吧?
Right?
你知道的?
You know?
你当时就说,你知道吗?
You were like, you know?
咱们来开发一个应用吧。
It's like, let's build an app.
就是,我该从哪里开始呢?
Like, just, like, where do I get started?
然后,你知道,你就完全沉迷进去了。
And then, you know, just, like, you just got obsessed by it.
这很难不被吸引。
It's hard not to.
是的。
Yeah.
我的意思是,像这样去构建一个相当反主流的东西,感觉怎么样呢?我想是这样的。
I mean, it's like — how was it to just, like, you know, build something that was quite contrarian, I suppose?
对。
Yeah.
我的意思是,我记得我们早期讨论过,我们不确定能不能把这个产品做出来。
I mean, I remember you and I talking about whether or not, like, early on, we're like, we don't know if we'll ship this.
是的。
Yeah.
我们就试试看。
Like, we're we will try it out.
我们会看看能不能用我们热爱的东西做到这一点,我记得我说过,咱们先在内部实现一些产品市场契合度。
We'll see if we can get there with something that we love — and I remember saying, like, let's get some PMF internally.
让OpenAI的每个人都愿意使用这个东西,而不是被迫使用。
Let's let's get everybody at OpenAI to want to use this thing without being forced to use it.
咱们看看能不能做到。
Let's see let's see if we can do it.
对吧?
Right?
我们做到了。
We did.
而且它很快就被人接受了。
And it was, like, adopted very quick.
我的意思是,只要它稍微能用,研究人员就立刻在自己的开发机上用起来了。
I mean, the the minute it was barely usable, the research folks, like, put dev boxes on it.
对吧?
Right?
那时候这简直是个疯狂的临时方案。
Like, which was like this crazy hack at the time.
是的。
Yes.
是的。
Yes.
但现在他们用它来做所有事情。
But now they use it, like, for everything.
对。
Yeah.
对。
Yeah.
它甚至被用在训练中,比如训练 5.3-Codex。
It was even used in training — like, for 5.3-Codex.
所以,我觉得特别欣慰的是,现在已经达到了这样一个阶段——公司里几乎所有技术人员都在使用 Codex,而用得最多的,正是那些在开发 Codex 和模型的人。
And so I think I feel really good about having hit the point where almost everyone technical at the company uses Codex, and the people who use it the most are the ones actually building Codex and building the models.
因此,我们能够以惊人的速度不断改进,而且完全看不到任何放缓的迹象。
And so, you know, we're just able to, like, you know, improve things at, like, crazy, crazy speeds and, you know, there's just, like, no signs of it slowing down.
太棒了。
Amazing.
我非常期待你们接下来发布的东西。
Well, I'm excited for what you ship next.
谢谢你们抽出时间。
Thank you guys for your time.
我真的很感激。
I really appreciate it.
谢谢。
Thank you.
谢谢你们邀请我们。
Thank you for having us.
谢谢。
Thanks.
天哪,各位。
Oh my gosh, folks.
你们绝对必须点击点赞按钮并订阅《AI与我》。
You absolutely positively have to smash that like button and subscribe to AI and I.
为什么?
Why?
因为这个节目是卓越的典范。
Because this show is the epitome of awesomeness.
这就像在后院发现了一个宝箱,但里面装的不是黄金,而是关于ChatGPT的纯粹无杂质的知识炸弹。
It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure unadulterated knowledge bombs about ChatGPT.
每一集都是一场情感、洞见与欢笑的过山车,让你坐立不安,渴望更多。
Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat craving for more.
这不仅仅是一档节目。
It's not just a show.
这是一场以丹·希珀为船长的太空飞船之旅,带你驶向未来。
It's a journey into the future with Dan Shipper as the captain of the spaceship.
所以,为自己着想吧。
So do yourself a favor.
点赞、订阅,并准备好迎接你人生中最精彩的旅程。
Hit like, smash subscribe, and strap in for the ride of your life.
好了,不多说了,丹,我简直无可救药地爱上了
And now without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with
你。
you.
关于 Bayt 播客
Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。