本集简介
双语字幕
大家好,我是安德鲁·梅恩,这里是OpenAI播客。在本期节目中,我们将与OpenAI联合创始人兼总裁格雷格·布罗克曼以及Codex工程负责人蒂博·索西奥进行对话。我们将讨论智能体编程、GPT-5-Codex技术,以及2030年可能的发展方向。
Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. In this episode, we're going to speak with OpenAI co-founder and president, Greg Brockman, and Codex engineering lead, Thibault Sottiaux. We're going to talk about agentic coding, GPT-5-Codex, and where things might be heading in 2030.
长远来看,更高的智能终将证明其价值。
Just bet that the greater intelligence will pan out in the long run.
它确实高度优化了人们在Codex中使用GPT-5的场景。
It's just really optimized for what people are using GPT-5 within Codex for.
如何确保人工智能生成的内容是真正正确的?
How do you make sure that AI is producing things that are actually correct?
今天我们讨论Codex——其实从最初版本开始我就一直在使用,当时我还在公司工作,现在你们推出了新版本。我整个周末都在体验,对其表现深感震撼,这项技术在短短几年内的发展速度令人惊叹。我很想了解早期的故事:使用语言模型处理代码的想法最初是从何而来的?
We're here to talk about Codex, which — I've actually been using it since the first version, back when I worked here, and now you guys have the new version. I was playing with it all weekend long, and I've been very, very impressed, and it's amazing how far this technology has come in a few years. I would love to find out the early story. Like, where did the idea of even using a language model to do code come from?
嗯,我记得早在GPT-3时代就看到了初步迹象:输入文档字符串和Python函数定义就能...
Well, I mean, I remember back in the GPT-3 days seeing the very first signs of life of: take a docstring and a Python definition of a function
嗯。
Mhmm.
命名,然后看着模型完成代码。一旦你看到那个,你就知道这会成功。这将会是巨大的。我记得在某个时刻,我们谈论过这些雄心勃勃的目标——想象一下,如果有一个语言模型能写出一千行连贯的代码,对吧?那对我们来说就像是一个大目标。
name, and then watching the model complete the code. And as soon as you saw that, you knew this was gonna work. This was going to be big. And I remember at some point, we were talking about these aspirational goals of: imagine if you could have a language model that could write a thousand lines of coherent code, right? That was like a big goal for us.
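The "signs of life" described here looked roughly like this: the prompt was a function signature plus a docstring, and the model generated the body. The function below is a hypothetical reconstruction for illustration, not an actual example from that era:

```python
# The prompt given to the model: a signature plus a docstring.
# Everything after the docstring is the kind of thing the model
# was asked to complete.

def count_vowels(text: str) -> int:
    """Return the number of vowels (a, e, i, o, u) in `text`,
    ignoring case."""
    # --- a plausible model completion follows ---
    return sum(1 for ch in text.lower() if ch in "aeiou")


print(count_vowels("Codex"))  # → 2
```

The remarkable part at the time was not the function itself but that a text-completion model could reliably bridge from natural-language intent to working code.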
而有点疯狂的是,那个目标已经实现并过去了。我认为我们对此已经习以为常了,对吧?我觉得在开发这项技术的过程中,你真的只会看到漏洞、缺陷和不起作用的地方。但时不时地,退一步意识到,实际上事情已经取得了如此巨大的进步,这是很好的。
And the thing that's kind of wild is that that goal has come and passed. And I think that we don't think twice about it, right? I think that while you're developing this technology, you really just see the holes, the flaws, the things that don't work. But every so often, it's good to, like, step back and realize that, like, actually things have just, like, come so far.
令人难以置信的是,我们多么习惯于事物一直在改进,它们如何变得像日常工具一样,你每天都使用它,然后你回想一个月前,这甚至是不可能的。而这还在持续发生。我认为这相当迷人,人类适应新事物的速度有多快。
It's incredible how used we get to things improving all the time and how they just become like a a daily driver and you just use it every day and then you reflect back to, like, a month ago, and this wasn't even possible. And this just continues to happen. I think that's quite fascinating, like, how quickly humans adapt to new things.
现在,我们一直面临的一个挣扎是是否要深入某个领域的问题。对吧?因为我们真正追求的是G,对吧?是AGI,通用人工智能。所以首先,我们的本能是同时推动所有能力的提升。
Now, one of the struggles that we've always had is the question of whether to go deep in a domain. Right? Because we're really here for the G, right? For AGI, general intelligence. And so to first order, our instinct is just push on making all the capabilities better at once.
编码一直是这个规则的例外。我们确实有一个非常不同的计划,用来专注于编码数据、代码指标,真正理解我们的模型在代码上的表现。我们也在其他领域开始这样做,但对于编程和编码,这一直是我们非常特殊的焦点。对于GPT-4,我们确实推出了一个在所有方面都有飞跃的单一模型。但实际上,我们也训练了Codex模型。
Coding has always been the exception to that. We really have a very different program that we use to focus on coding data, on code metrics, on trying to really understand how our models perform on code. And we've started to do that in other domains too, but for programming and coding, that's been a very exceptional focus for us. And for GPT-4, we really produced a single model that was just a leap on all fronts. But we actually had trained, you know, the Codex model.
我记得做了一个专注于Python的模型。就像,我们真的非常努力地在2021年左右推动编码能力的水平。而且,我记得当我们做Codex演示时,那可能是我们今天称之为氛围编码的第一次演示。我记得构建这个界面时,意识到对于标准的语言模型东西来说,界面、框架是如此简单。对吧?
And I remember doing like a Python-focused model. Like, we were really, really trying to push the level of coding capability back in, you know, 2021 or so. And I remember when we did the Codex demo, that was maybe the first demonstration of what we would call vibe coding today. And I remember building this interface and having this realization that for just standard language model stuff, the interface, the harness, is so simple. Right?
你只是在完成一件事,也许还有一个后续的回合之类的,但仅此而已。对于编码,这段文字实际上活了起来。对吧?你需要执行它,它需要连接到工具,所有这些事情。所以你意识到,框架几乎和智能一样,是让这个模型可用的重要组成部分。
You're just completing a thing and, you know, maybe there's a follow-up turn or something like that, but that's it. For coding, this text actually comes to life. Right? You need to execute it, it needs to be hooked up to tools, all these things. And so you realize that the harness is almost equally as big a part of how you make this model usable as the intelligence.
所以我认为从那一刻起我们就明白了这一点。今年随着我们开发出更强大的模型,并真正开始关注不仅仅是提升原始能力(比如如何在编程竞赛中获胜),而是如何让它变得实用,这一过程非常有趣。对吧?在各种环境中进行训练,真正连接人们的使用方式,然后真正构建那个所谓的'harness'——这是Thibault和他的团队真正努力推进的东西。
And so that is something that I think we knew from that moment. And it's been interesting to see as we got to more capable models this year and really started to focus on not just making the raw capability, like how do you win at programming competitions, but how do you make it useful? Right? Training in a diversity of environments, really connecting to how people are going to use it and then really building the harness, which is something that Thibault and his team have, like, really pushed hard.
你能不能用简单的话解释一下,什么是'harness'?
Could you unpack, like, a harness, what that means in sort of simple terms?
是的,这很简单。你有模型,模型只具备输入输出的能力。而我们所说的'harness'是指如何将其与其余基础设施集成,使模型能够实际作用于环境。嗯。
Yes. It's quite simple. You have the model, and the model is just capable of input output. And what we call the harness is how do we integrate that with the rest of the infrastructure so that the model can actually act on its environment. Mhmm.
所以它是一套工具,是一种循环方式——我们称之为智能体循环。本质上相当简单,但当你开始将这些部分整合在一起并进行端到端训练时,你会看到相当神奇的行为,以及模型真正代表你行动、创造事物并成为真正合作者的能力。可以把它想象成:harness就像是你的身体
So it's a set of tools. It's the way that it's looping — the agent loop, as we refer to it. In essence, it's fairly simple, but when you start to integrate these pieces together and really train it end to end, you start to see, like, pretty magical behavior, and an ability of the model to really act and create things on your behalf and be a true collaborator. So think about it a little bit as, you know, the harness being your body
嗯。
Mhmm.
而模型则是你的大脑。
And the model being your brain.
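As a rough illustration of the body/brain split described here, the following is a minimal agent-loop sketch. Everything in it — the message format, the `TOOLS` table, the `model` callable — is invented for illustration, not Codex's actual harness:

```python
import subprocess

# Minimal sketch of an agent loop: the model is pure input -> output,
# and the harness supplies tools, executes them, and feeds results back.

def run_shell(cmd: str) -> str:
    """Hypothetical tool: run a shell command and return its stdout."""
    return subprocess.run(cmd, shell=True, capture_output=True,
                          text=True).stdout

TOOLS = {"shell": run_shell}

def agent_loop(model, task: str, max_steps: int = 10) -> str:
    """Drive the model until it declares a final answer or runs out of steps."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model(messages)       # the model only maps text to text
        if action["type"] == "final":  # the model decided it is done
            return action["content"]
        result = TOOLS[action["tool"]](action["args"])  # harness acts on the env
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"
```

A real harness would add file edits, test execution, sandboxing, and much more careful budgeting, but the loop shape — propose, execute, observe, repeat — is the core idea.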
好的。看到它的进步真是有趣,比如从GPT-3时代开始,那时你真的必须写带注释的代码,比如用Python说明这个函数是做什么的,前面加个井号之类的。现在看到模型变得如此自然直观地擅长编码,真的很有意思。你提到过在通用模型和代码重要性之间的权衡。这是因为外部需求(人们说这些模型编码更好),还是内部因为你们自己想更多地使用它?
Okay. It is interesting to see, yeah, how far it came from, like, the GPT-3 days where you literally had to write, like, you know, commented code and say, like, this function does this with Python, put your hashtag in front of that, whatever. And it's just interesting to see how the models have now become just naturally, intuitively good at coding. And you mentioned, you know, trying to decide between a general-purpose model or saying how important code is. Was it just outside demand — people saying these models are better at code — or was this coming internally because you guys wanted to use this more?
两者都是。是的,绝对是两者。我记得,在2022年我们与GitHub合作推出了GitHub Copilot。当时非常有趣的一点是,你第一次真正感受到,在编码工作流中融入AI是什么感觉,以及它如何能加速你的工作?
Like, both. Yeah. Absolutely both. And I remember, you know, in I think 2022 is when we worked with GitHub to produce GitHub Copilot. And the thing that was very interesting there was that for the first time, you really felt, what is it like to have an AI in the middle of your coding workflow and how can it accelerate you?
我记得当时有很多关于最佳界面的讨论。你是想要幽灵文本让它自动补全?还是想要一个下拉菜单提供多种可能性?但非常明确的一点是,延迟是一个产品特性。像自动补全这样的功能,其约束是1500毫秒。
And I remember that there were a lot of questions around the exact right interface. Do you want ghost text so it just does a completion? Do you want a little dropdown with a bunch of different possibilities? But one thing that was very clear was that latency was a product feature. And the constraint for something like an autocomplete is about 1,500 milliseconds.
对吧?这就是你生成补全建议的时间限制。任何比这更慢的方案,哪怕极其聪明,也没人愿意坐着干等。所以我们得到的明确指令,从用户、产品经理以及所有考虑产品层面的人那里获得的清晰信号是:在延迟约束下找到最聪明的模型。
Right? That's like the time that you have to produce a completion. Anything that's slower than that — it could be incredibly brilliant, no one wants to sit around waiting for it. And so the mandate that we had, the clear signal we had from users and from, you know, the product managers and all the people thinking about the product side of it, is: get the smartest model you can, subject to the latency constraint.
然后你有了像GPT-4这样的模型,它聪明得多,但无法满足你的延迟预算。你该怎么办?它是个无用的模型吗?绝对不是。你需要做的是改变使用方式,改变界面。
And then you have something like GPT-4, which is much, much smarter, but it's not going to hit your latency budget. What do you do? Is it a useless model? Absolutely not. The thing you have to do is you change the harness, you change the interface.
我认为这是一个非常重要的主题:你需要根据模型的能力共同进化界面和使用方式。所以超级快速又聪明的模型会很棒,但极其聪明但较慢的模型也值得。我们一直有一个论点,即智能带来的回报是值得的。
I think that's a really important theme: you need to kind of co-evolve the interfaces and the way that you use the model around its affordances. And so super fast, smart models are going to be great, but the incredibly smart but slower models — it's also worth it. And I think that we've always had a thesis that the returns on that intelligence are worth it.
嗯。
Mhmm.
这在当时从来都不明显,因为你只会觉得它太慢了,为什么会有人想用?但我们的方法很大程度上就是坚持认为,从长远来看,更高的智能会带来回报。
And it's never obvious in the moment, because you're just like, well, it's gonna be too slow — why would they want to use it? But I think that our approach has very much been to say just that: that the greater intelligence will pan out in the long run.
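The mandate described above — "the smartest model you can, subject to the latency constraint" — can be sketched as a simple selection rule. The model names, scores, and latencies below are made up for illustration:

```python
# Pick the smartest model whose latency fits the product's budget.
# Latencies and intelligence scores here are illustrative, not real benchmarks.
MODELS = [
    {"name": "tiny-fast",  "intelligence": 3, "latency_ms": 150},
    {"name": "mid",        "intelligence": 6, "latency_ms": 900},
    {"name": "big-smart",  "intelligence": 9, "latency_ms": 4000},
]

def pick_model(models, latency_budget_ms):
    eligible = [m for m in models if m["latency_ms"] <= latency_budget_ms]
    if not eligible:
        return None  # nothing fits: change the harness, not just the model
    return max(eligible, key=lambda m: m["intelligence"])

# An autocomplete surface with the ~1,500 ms budget mentioned above:
print(pick_model(MODELS, 1500)["name"])  # → mid
```

The interesting case is when the smartest model falls outside the budget entirely — which, as the conversation notes, is a signal to change the interface rather than discard the model.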
当初在开发GitHub Copilot时,我很难理解这一切最终会走向何方,因为那时我们习惯了,就像你说的,补全功能——要求做一件事,它就完成一件事。我觉得我当时并没有真正理解,通过构建一个框架并添加所有这些功能,能带来多大的额外价值。那时似乎觉得只需要模型就够了,但现在意识到工具链和其他一切都能产生如此巨大的差异。然后你提出了多模态的概念,现在我们有了CLI,即Codex CLI。
It was hard for me to wrap my head around where that was all headed back when working on GitHub Copilot, because at that point we were used to, like you said, the completion — ask it to do a thing, it completes a thing. And I think I didn't really understand how much more value you would get out of building a harness and adding all these capabilities there. It just seemed like all you need is the model, but now you realize the tooling and everything else can make such a big difference. Then you brought up the idea of modalities. And now we have the CLI, Codex CLI.
所以我可以在命令行中使用,可以这样做。还有VS Code的插件,我可以在那里使用。此外,我还可以将内容部署到网上进行操作。我不认为我当时完全理解了这种模式的价值。那么,你是如何实际使用这些功能的呢?
So I can go in the command line, I can do this. There's a plugin for VS Code, so I can go use this there. And then also I can deploy stuff to the web and do that. And I don't think I fully comprehended the value of that. And so, like, how is this something you're using?
你自己是如何部署这些工具的?你在哪些方面发现它的效用最大?
How are you kind of deploying these things yourself? Like, where are you finding the most, you know, utility out of it?
我想先回顾一下我们最初看到的一些迹象:公司内外有很多开发者,比如我们的用户,会使用ChatGPT来帮助他们调试非常复杂的问题。我们明确意识到,人们正试图向ChatGPT输入越来越多的上下文,包括代码片段、堆栈跟踪等信息,然后粘贴出来,呈现给一个非常智能的模型以获取帮助。交互变得越来越复杂,直到某个时刻我们意识到:嘿,也许不是由用户来主导这个过程,而是让模型实际驱动交互,自行寻找上下文,自行探索,并能够独立调试这些难题,这样你就可以坐下来,看着模型完成工作。所以这有点像逆转了那种交互方式,我认为这促使我们更多地思考框架的重要性,并赋予模型行动的能力。
I think just to go back a little bit on, like, the first signs that we saw: we had a lot of developers at the company, and outside of the company — like our users — using ChatGPT to help them debug very complex problems. And one thing we clearly noticed was that people were trying to get more and more context into ChatGPT — bits of your code and stack traces and things — and then you paste that in and present it to a very smart model to get some help. And interactions were starting to get more and more complex, up to some point where we realized, like, hey, maybe instead of the user driving this thing, maybe let the model actually drive the interaction and find its own context and then find its way and be able to debug, you know, this hard problem by itself, so that you can just sit back and, you know, watch the model do the work. So it's sort of like reversing that interaction, and that, you know, led us, I think, to thinking a lot more about the harness and giving the model the ability to act.
而且我们在形态因素上进行了多次迭代。
And we iterated on form factors.
嗯。
Mhmm.
对吧?我的意思是,我记得年初时我们尝试了几种不同的方法。我们有异步代理框架,也有本地体验,以及几种不同的实现方式。是的。
Right? I mean, I remember at the beginning of the year, we had a couple of different approaches. We had, you know, sort of the async agentic harness, but we also had the local experience and a couple of different implementations of it. And yeah.
我们实际上开始尝试在终端中运行这个想法,但后来觉得这还不够AGI化。嗯。我们需要能够大规模、远程地运行它,只需合上笔记本电脑,让智能体继续工作。然后你或许可以在手机上关注它并与它互动。这看起来非常酷。
We actually started to play a little bit with this idea of, like, running it in the terminal, and then we felt that was not AGI-pilled enough. Mhmm. We needed the ability to run this at scale and remotely and just close the laptop and have, you know, the agent just continue to do its work. And then you can maybe follow it on your phone and interact with it there. That seemed very cool.
所以我们推进了这一点,但实际上我们有一个完全在终端中运行的原型,OpenAI内部有人已经在高效使用它。我们决定不将其作为产品发布。感觉还不够完善。它被称为10x,因为我们觉得它带来了10倍的生产力提升。但后来我们决定尝试不同的形态,并最初全力投入异步形态。
So we pushed on that, but we actually had a prototype of it fully working in a terminal, and people were using that productively at OpenAI. We decided not to launch this as a product. It didn't feel polished enough. It was called 10x, because we felt like it was giving us this 10x productivity boost. But then we decided to, like, you know, just experiment with different form factors and really go all in with the async form factor initially.
现在我们又有点回归,它重新演化,认为实际上这个智能体可以带回你的终端,可以融入你的IDE。但我们真正努力做对的是这个实体,这个与你协作的伙伴,然后在你作为开发者已经使用的工具中呈现给你。
And now we've kind of gone back a little bit, and it's re-evolved, and we said, like, actually, this agent — we can bring it back to your terminal. We can bring it into your IDE. But the thing that we're really trying to get right is this entity, this collaborator that's working with you, and then bringing that to you in the tools that you're already using as a developer.
是的。而且还有其他尝试方向,对吧?所以我们有一个版本是远程守护进程连接到本地代理。这样你几乎可以同时获得两者。
Yep. And there are other shots on goal as well. Right? So we had a version where there was a remote daemon that would connect to a local agent. And so you kind of could get both at once.
我认为演变的一部分在于,部署工具的方式几乎形成了一个矩阵,对吧?有这种异步方式,它在云端有自己的计算机;有本地方式,它在那里同步运行。
And I think that part of the the evolution has been that there's almost this matrix of different ways you could try to deploy a tool. Right? There's this, like, async. It has its own computer off in the cloud. There's the local that it's running synchronously there.
你可以在这些之间混合。问题在于,对我们来说一直有一个问题:我们应该多大程度上专注于构建可外部化的东西,对吧?即在人们多样化的环境中都有用的东西,还是真正专注于我们自己的环境,努力让东西为我们内部工程师工作得非常好。其中一个挑战是我们想全部做到。我们最终想要对每个人都有用的工具。但如果你甚至不能让它对自己有用,你怎么能让它对其他所有人都极其有用呢?
You can blend between these. There's been a question for us of how much do we focus on trying to build something that is externalizable, right — that is useful in the diversity of environments that people have out there — versus really focusing on our own environment and trying to make it so that things work really well for our internal engineers. And one of the challenges has been that we want to kind of do all of it. We ultimately want tools that are useful to everyone. But if you can't even make it useful for yourself, how are you going to make it extremely useful for everyone else?
因此,我们面临的部分挑战确实是弄清楚我们应该聚焦在哪里,以及如何在工程努力上实现最大的性价比?对我来说,一个贯穿始终的焦点是,我们知道编码和构建非常强大的智能体是今年我们能做的最重要的事情之一。年初,我们设定了一个公司目标:年底前实现一个代理式软件工程师。并精确理解那意味着什么,如何实例化它,以及如何汇聚所有机会和我们拥有的所有算力来解决这个问题。这对OpenAI的许多人来说都是一项伟大的事业。
And so part of the challenge for us has been really figuring out where we focus and, like, how we achieve the biggest bang for the buck in terms of our engineering efforts. And, you know, for me, one of the things that's been an overarching focus has been: we know that coding and building very capable agents is one of the most important things that we can do this year. At the beginning of the year, we set a company goal of an agentic software engineer by the end of the year. And figuring out exactly what that means and how to instantiate that and how to bring together all the opportunity and all the kind of compute that we have to bear on this problem — that has been a great undertaking for many, many people at OpenAI.
所以你提到你们有10x这个工具,是个内部工具,而且你似乎说过,这在某个时刻让你觉得,哦,这对其他人真的很有用。决定何时这样做、何时不这样做,以及如何优先考虑这件事,肯定很难。你知道,我们看到Claude Code变得非常强大,我猜这和你们内部使用然后部署出去的东西是类似的故事。当你开始思考下一步,比如决定接下来把它带向哪里?决定把重点放在哪里?
So you mentioned that you had the tool 10x, and that was an internal tool, and at some point it seemed like you said, oh, this is really useful to other people. It's gotta be hard to decide when to do that and when not to, and how much to prioritize that. You know, we've seen Claude Code become extremely powerful, which I imagine is probably a similar story of something that was used internally and then becomes something deployed. When you start to think about next steps, you know, where do you decide to take it next? Where do you decide to put the emphasis?
你知道,你之前提到,我现在可以在云端运行任务,执行这类智能体式的工作,然后走开。我的问题在于这是一种全新的模式,对我来说真的很难去思考和理解。但有时候这些东西需要存在一段时间,人们才会各自独立地发现它们。你在内部有没有发现有人说,哦,现在我明白了?
You know, you mentioned before, you know, I can now run things in the cloud, do these kind of agentic tasks where I walk away — and my problem is it's such a new modality. It's really, really hard for me to think about. But sometimes these things have to sit around for a while and people sort of discover them independently. Have you found that internally, that somebody says, oh, now I get it?
我会说绝对是的。对吧?而且我认为,你知道,我的观点是我们大致知道未来的形态,对吧,长期的未来。非常清楚的是,你会想要一个拥有自己计算机的AI,能够运行,你知道,委托给一组代理,并能够并行解决多个任务。你应该在早上醒来,喝着咖啡,你知道,回答你的代理的问题,提供一些审查,像是,哦,不,这不是我完全的意思。
I'd say absolutely. Right? And I think that, you know, my perspective is that we kind of know the shape of the future, right, of the long term. It is very clear that you're going to want an AI that has its own computer that is able to run, you know, delegate to a fleet of agents and be able to to solve multiple tasks in parallel. You should wake up in the morning, you're sipping your coffee, you know, answering questions for your agent, providing some review, being like, oh, no, this wasn't quite what I meant.
这种工作流程显然需要发生。但模型还不够聪明,不足以让你以这种方式与它们互动。
This workflow clearly needs to happen. But the models aren't quite smart enough for this to be the way that you interact with them.
嗯。
Mhmm.
因此,拥有一个真正存在于你的终端、你的编辑器中,以帮助你完成工作的代理,这种方式与你一年前的工作方式非常相似,这也是现状。所以我认为我们看到的方式几乎是模糊在一起的。这是未来的样子,但我们也不能抛弃现状。并且思考如何将AI引入代码审查,以及如何让它主动出现并为你做有用的工作。然后你还有一个全新的挑战,如果你有更多的PR(拉取请求),比如,你是否真的会筛选出那些你真正想合并的?所以我认为我们已经看到了所有这些机会空间,并且我们看到人们开始改变他们在OpenAI内部的开发方式,他们如何构建他们的代码库。
And so having an agent that is really there in your terminal, in your editor, to help you with the way that you do your work — in a way that looks very similar to the way you would have done it a year ago — that's also the present. And so I think that the way we've seen it is we're almost blurring the two together: here's what the future looks like, but we also can't abandon the present. And thinking about how do you bring AI into code review, and how do you make it so that it appears proactively and does work for you that's useful. And then you have a whole new challenge as well: if you have a lot more PRs, how do you actually sort through those to find the ones you actually want to merge? So I think we've kind of seen all of this opportunity space, and we've seen people start to change how they develop within OpenAI, how they structure their code bases.
是的。我认为有两件事在这方面真正结合了起来。而且,我的意思是,你知道,这就是我们今天所处的位置。一是基础设施很难,我们很希望,你知道,所有人的代码、任务和包都能完美地容器化,这样我们就可以大规模运行它们。但情况并非如此。比如,人们有非常详尽和复杂的设置,可能只在他们的笔记本电脑上运行,我们希望能够利用这一点,并且,你知道,满足人们当前的状态,这样他们就不必专门为Codex配置东西了。
Yeah. I think there are two things to that effect that really combined. And, I mean, you know, this is where we're at today: one, infrastructure is hard, and we would love for, you know, all of everyone's code and, like, tasks and packages to be perfectly containerizable so we can run them at scale. That's not the case. Like, people have very thorough and complex setups that probably only run on their laptop, and we want to be able to leverage that and meet, you know, people where they are, so that, you know, they don't have to configure things specifically for Codex.
这为您提供了一个非常容易的切入点,让您体验强大的编程助手能为您做什么。同时,它也让我们能够试验什么是正确的界面。六个月前,我们还没有使用这类工具,这一切都非常新且发展迅速,我们必须继续迭代和创新,探索正确的界面以及如何与这些助手协作的正确方式。我们觉得还没有完全掌握这一点。这将继续发展,但将其打造成零设置、开箱即用、极其易用的形式,可以让更多人受益并尝试使用,同时让我们获得反馈,以便继续创新。
That gives you this very easy entry point into experiencing, you know, what a very powerful coding agent can do for you. And at the same time, it lets us experiment with, you know, what the right interface is. Six months ago, we weren't playing with these kinds of tools, and this is all very new and evolving fast, and we have to continue to iterate and innovate here on what the right interface is and what the right way to collaborate with these agents is. And we don't feel like we have really nailed that yet. That's going to continue to evolve, but bringing it to, like, a zero-setup, extremely-easy-to-use-out-of-the-box form, you know, allows a lot more people to benefit from it and, like, play with it, and for us to get the feedback so that we can continue to innovate.
这一点非常重要。
That's very important.
我记得年初和我们的一位工程师交谈,他真的很棒。他说ChatGPT有一个集成功能,可以自动查看终端中的上下文。他说这具有变革性,因为他不再需要复制粘贴错误信息。他可以立即问:'嘿,这个bug是什么?'然后它就会告诉他,这太棒了。
I remember at the beginning of the year talking to one of our engineers, who I think is really fantastic. He was saying that ChatGPT — we had this integration where it could automatically see the context in his terminal. And he's like, it's transformative, because he doesn't have to, like, copy-paste errors. He can just instantly be like, hey, you know, what's the bug? And it would just tell him, and it was great.
对吧?你意识到我们构建的这个集成功能如此具有变革性。关键不在于模型更智能。
Right? And you realize that it was an integration that we built that was so transformative. It wasn't about a smarter model.
嗯。
Mhmm.
我认为很容易让人困惑的一点是,只关注其中一个维度,然后问哪个更重要?因为答案其实是两者都很重要。
And I think that one thing that's very easy to get confused by is to really focus on only one of these dimensions and be like, which one matters? Because the answer is they kind of both matter.
而且
And
我一直以来的思考方式是,我记得2020年我们最初发布API时,衡量AI吸引力的有两个维度。一个是智能,你可以把它看作一个轴;另一个是便利性,可以理解为延迟、成本,或者它可用的集成能力。这里存在一个接受区域,对吧?比如,如果一个模型极其聪明,但运行需要一个月时间,你可能还是会用。
the way I've always thought about this — I remember when we were originally releasing the API back in 2020 — is that there are two dimensions to what makes an AI desirable. There's intelligence, which you can think of as one axis, and then there's convenience, which you can think of as latency. You can think of it as cost. You can think of it as the integrations available to it. And there's some acceptance region, right, where it's like, if the model's incredibly smart, but it takes you like a month to run it or something, you still might use it.
对吧?如果它能输出非常有价值的代码,或者治愈某种疾病的方案之类的,那没问题,这是值得的。但如果模型不够智能、能力有限,那你只想用它来做自动补全。所以它必须极其便利,零认知负担,你无需思考它的建议。
Right? If what it's gonna output is such a valuable piece of code or, you know, a cure for a certain disease or something like that, okay, fine. Like, it's worthwhile. If the model's not that intelligent, not that capable, then all you wanna do is autocomplete. So it has to be incredibly convenient — zero cognitive tax for you to think about what it's suggesting, that kind of thing.
而我们目前处于这个光谱的某个位置。现在我们有更聪明的模型,虽然比自动补全稍微不便一些,但远比坐等一个月才能得到答案要方便得多。因此,我认为我们的很多挑战在于:何时投入提升便利性(向左拉),何时投入提升智能(向上推),这是一个巨大的设计空间,也正是乐趣所在。
And where we are is, of course, somewhere on the spectrum now. We now have smarter models that are, you know, reasonably less convenient than autocomplete, but still, like, more convenient than having to sit around and wait for a month for the answer to appear. And so I think that a lot of our challenge is figuring out when do you invest in pulling that convenience to the left, when do you invest in pushing the intelligence up — and it's a massive design space. It's what makes it fun.
是的。不知道你还记不记得,2020年API发布时,我开发的应用AI Channels还被官方推荐过。
Yeah. I don't know if you remember, but I made an app, AI Channels, that was featured at the launch back in 2020.
嗯哼。然后它
Uh-huh. And it
是的,当时的挑战是GPT-3能力很强,但我得写这些长达600词的提示才能让它干活。而且因为每千个token要0.06美元,还有延迟问题,我当时就觉得,现在还不是时候推广这个。
was — yeah, the challenge was that GPT-3 was very capable, but I had to write these, like, 600-word prompts to get it to do stuff. And because it's $0.06 per thousand tokens, and the latency, I'm like, I don't think the world is ready for this right now.
是的。
Yes.
然后是GPT-3.5和GPT-4。突然间你看到了所有这些能力。我很难说清楚原因,但你会发现一切突然就融合在一起了。你提到的那种理念,就是让模型能够看到你工作环境中的上下文。
And then GPT-3.5 and GPT-4. And then all of a sudden you see all those capabilities. It was hard for me to say why, but then you see that all of a sudden things come together. And you mentioned, you know, the idea of just having, you know, the model be able to see the context inside of, you know, where you're working.
我记得当我用ChatGPT复制粘贴到工作空间时,这让我想起去杂货店购物时拒绝推车,把所有东西都抱到收银台的情景。我就想,这效率太低了。一旦给东西装上轮子,就好用多了。我认为我们现在正在见证各种这样的突破。现在我面临的问题是,当我坐下来工作时,是该用命令行界面吗?
And I remember when I was copy-pasting using ChatGPT into my workspace, and it reminded me of going into a grocery store and refusing to get a cart, just carrying everything to the checkout. I'm like, this is terribly inefficient. Once you put things on wheels, it works really well. And I think we're seeing all kinds of those unlocks now. Now the problem I deal with when I sit down to work on something is, do I go into the CLI?
该用VS Code插件吗?该用Cursor编辑器吗?还是该用其他工具?你们是怎么解决这个问题的?
Do I go use the VS Code plugin? Do I go into Cursor? Do I use some other tool? And how do you guys figure this out?
目前我们还处于实验阶段,正在尝试不同的方式让你与智能体互动,并让它融入你已经高效工作的环境。比如Codex现在已经集成在GitHub中。嗯。你可以@提及Codex,它会为你工作。如果你输入@Codex修复这个bug或者把测试移到那边,它就会去执行,就像带着自己的小笔记本电脑在我们的数据中心里工作,你完全不用操心。
Right now, we're still at the experimentation phase, where we're trying different ways for you to interact with the agent and bring it where you're already productive. So for example, Codex is now in GitHub. Mhmm. You can @-mention Codex and it will do work for you. If you do "@codex fix this bug" or "move the tests over here," it will go and, like, run off and do it with its own little laptop, you know, on our data centers, and you don't have to think about it.
但如果你在处理文件夹中的文件,就需要决定是在IDE中操作还是在终端中操作。我们看到的情况是,高级用户正在终端中开发更复杂的工作流程。嗯。而当你在处理更倾向于在IDE中操作的文件或项目时,那里的界面更加精致完善。
But if you're working with files in a folder, you know, then you have that decision: are you gonna do it in your IDE? Are you gonna do it in the terminal? What we're seeing is that power users are developing very complex workflows in the terminal. Mhmm. And then when you're actually working on a file or a project, you might prefer to do it in the IDE — it's a bit more of a polished interface.
你可以撤销操作,可以看到编辑记录,不会像终端那样只是滚动输出。而终端也是一个非常棒的沉浸式编码工具,当你不是特别关心生成的代码质量时,比如快速生成一个小应用,它更注重交互体验。它提升了交互性而非代码本身,更关注最终结果。
You can undo things, you can see the edits, you know — it's not just scrolling by you. And then the terminal is also just an amazing vibe-coding tool where, you know, if you don't really care that much about the code that's being produced, you can just generate a little app. It's much more about interaction. It elevates the interaction instead of focusing on the code, so it's more focused on the outcome.
这主要取决于你想做什么,但目前确实还处于大量实验的阶段。我们正在尝试各种不同的方案,而且我觉得这种状态还会持续一段时间。
And it's just sort of like depends on what you want to do, but it's still very much an experimentation phase right now. And we're trying different things out And, yeah, it's going to continue like that, I think.
是的。我非常同意这一点。而且我也认为我们的很多方向将是这些工具之间更深入的整合。对吧?因为人们有能力使用多种工具。
Yeah. I really agree with that. And I also think that a lot of our direction will be more integration across these things. Right? Because people are capable of using multiple tools.
对吧?你已经有了终端、浏览器、GitHub网页界面、本地机器上的代码库。人们已经学会了在什么情况下该使用什么工具。我认为因为我们正处于这个实验阶段,这些工具可能会让人感觉很分散、很不同。就像,你需要学习一套新的技能和相关工具的功能。
Right? You already have your terminal, your browser, your, you know, GitHub web interface, your repo on your local machine. Each of these is something people have kind of learned when it's appropriate to reach for which tool. And I think that because we're in this experimentation phase, these things can feel very disparate and very different. Like, you know, you have to kind of learn a new set of skills and the affordances of the relevant tool.
我认为在我们迭代的过程中,我们需要认真思考这些工具如何协同工作。你可以看到这一点,比如Codex IDE扩展能够运行远程Codex任务。最终,我们的愿景是应该有一个AI,它既能访问自己的计算机和集群,又能随时协助你。对吧?它也可以本地帮助你,这些不应该是分离的功能。
And I think that as we're iterating, a lot of what's on us is to really think about how these fit together. And so you can start to see it, right, with the Codex IDE extension being able to run remote Codex tasks. And I think that ultimately, our vision is that there should be an AI that has access to its own computer, its own clusters, but is also able to look over your shoulder. Right? It can also come and help you locally, and these shouldn't be distinct things.
没错。就像是一个编码实体在那里帮助你并与你协作。就像我和Greg协作时,我不会抱怨有时你在Slack上,有时当面交流。
Right. And it's like this one coding entity that is there to help you and collaborate with you. Like, when I collaborate with Greg, you know, I don't complain that sometimes you're in Slack, sometimes I talk to you in person.
有时你会抱怨。
Sometimes you complain.
你知道,有时通过GitHub代码评审等方式互动。这在与其他人类协作者互动时显得非常自然。这也是我们对Codex的思考方式:把它视为一个智能体实体,真正旨在当你努力实现目标时为你提供强大助力。
You know, sometimes you interact, like, through a GitHub review. This seems very natural when you interact with other humans and collaborators. And this is also how we're thinking about Codex: as an agentic entity that is really meant to just supercharge you when you're trying to achieve things.
那么我们来谈谈使用它的一些方式,比如agents.md。你想解释一下吗?
So let's talk about some of the ways of using it, like agents.md. Do you wanna explain that?
是的,agents.md是一组指令,你可以提供给Codex,它存放在你的代码旁边,这样Codex就能有更多上下文信息,了解如何最好地导航代码并完成任务。我们发现,放在agents.md中有用的东西主要有两类:一是帮助压缩——让智能体直接阅读codex.md比探索整个代码库更高效一些。
Yeah. agents.md is a set of instructions that you can give to Codex that lives alongside your code, so that Codex has a little bit more context about how to best navigate the code and accomplish the tasks. There are two main things that we find useful to put in agents.md. One is helping with — it's like a compression thing, where it is a little bit more efficient for the agent to just read codex.md instead of, like, exploring the entire code base.
其次是那些在代码库本身并不明确的偏好,比如测试应该放在这里,或者我喜欢以这种特定方式完成事情。这两样东西——偏好以及向代理解释如何有效导航代码库——在agents.md中非常有用。
And then preferences that are actually not clear in the code base itself, where you would be like, you know, actually, tests should be over here, or, you know, I like things to be done in this particular fashion. And those two things — preferences, and then explaining to the agent how to navigate the code base effectively — are very useful things to have in agents.md.
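The two uses just described — compressing code-base navigation knowledge, and recording preferences the code itself can't express — might look something like this in a hypothetical agents.md (contents invented for illustration):

```markdown
# AGENTS.md (illustrative example)

## Navigating this repo
- Application code lives in `src/`; generated files in `build/` — never edit those.
- Run `make test` before proposing changes.

## Preferences
- New tests go in `tests/unit/`, mirroring the `src/` path of the code under test.
- Prefer small, focused commits with imperative-mood messages.
```

The first section saves the agent exploration steps; the second states conventions it could never infer from reading the code alone.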
没错。我认为这里有一个非常根本的问题:如何向一个没有任何上下文的代理传达你想要什么、你的偏好是什么,并试图节省一些人类启动所需的时间。对吧?我们为人类也是这么做的。
Yep. And I think that there's something deeply fundamental here of how you communicate to an agent that has no context what you want, what your preferences are, and try to save it a little bit of the kind of spin-up a human would require. Right? We do this for humans.
对吧?我们编写README.md文件。这只是一个约定俗成的文件名,代理应该去查看。但这里也有一些时效性的问题,对吧?
Right? We write README.mds. And this is just a convention for the name of a file that an agent should go look at. But there's also something that's a little point-in-time here, right?
目前的代理还没有很好的记忆能力。对吧?就像如果你的代理运行了第十次,它真的从前九次为你解决难题的经历中受益了吗?因此,我认为我们需要进行真正的研究,思考如何实现记忆,如何让代理真正去探索你的代码库并深入理解它,然后能够利用这些知识。这是我们的众多例子之一,我们看到进一步的研究进展前景广阔。
The agents right now don't have great memory. Right? It's like, if you're running your agent for the tenth time, has it really benefited from the nine times that it went and solved a hard problem for you? And so I think that we have real research to do, to think about how you have memory, how you have an agent that really just goes and explores your code base and deeply understands it, and is then able to leverage that knowledge. And so this is one of many examples where we see great fruit on the horizon from further research progress.
现在竞争非常激烈。曾有一段时间,OpenAI对很多人来说像是凭空出现,突然就有了GPT-3,然后是GPT-4,接着Anthropic在构建优秀模型,谷歌的Gemini也变得非常出色。你们如何看待这个竞争格局?你们如何看待自己在这个格局中的定位?
It's a very competitive landscape now. There was a point where, you know, OpenAI kind of came out of nowhere for a lot of people, and all of a sudden there was GPT-3, then there was GPT-4, and then I think Anthropic's building great models, and Gemini, you know, from Google, has gotten really good. How do you guys see the landscape? How do you see your placement there?
我认为还有很大的进步空间。我较少关注竞争,而更关注潜力。嗯。对吧?因为我们在2015年创立OpenAI时,就认为通用人工智能可能比人们想象的更早实现。
I mean, I think that there's a lot of progress to be had. I focus a little less on the competition and a little more on the potential. Mhmm. Right? Because we started OpenAI in 2015 thinking that AGI is going to be possible maybe sooner than people think.
我们只想成为一股积极的力量,并思考它如何发挥作用。真正思考这意味着什么?尝试将其与实际执行联系起来一直是我们的主要任务。因此,当我们开始研究如何构建真正有用、能够真正帮助人们的能力模型时,将其带给人们就变得至关重要。你可以看看我们一路走来所做的选择,例如发布ChatGPT并广泛提供ChatGPT的免费版本。
We just want to be a positive force in how it plays out. And really thinking about what that means, trying to connect that to practical execution, has been a lot of the task. And so as we started to figure out how to build capable models that are actually useful, right, that can actually help people, actually bringing that to people is this really critical thing. And you can look at choices that we've made along the way, for example, releasing ChatGPT and making ChatGPT's free tier available widely.
对吧?我们这样做是因为我们的使命,因为我们真正希望人工智能能够普及、可及,并惠及每个人。因此,在我看来,最重要的是继续这种指数级进步,并真正思考如何以积极和有用的方式将其带给人们。所以我认为我们现在的处境是,像GPT-4这类预训练模型已经存在,并在其之上进行强化学习,使其更加可靠和智能。
Right? That's something that we do because of our mission, because we really want AI to be available and accessible and to benefit everyone. And so in my view, the most important thing is to continue on that exponential progress and really think about how to bring it to people in a positive and useful way. So where I really see us right now is that there's the GPT-4 class of pretrained models, and there's reinforcement learning on top of them to make them much more reliable and smart.
对吧?就像你想想,如果你只是浏览了互联网,观察了大量的人类思想,然后第一次尝试写一些代码,你可能会遇到很多困难。
Right? It's like you think about if you've just sort of read the Internet, right, you've just observed a bunch of, you know, sort of human thought and you're trying to write some code for the first time, you're probably going to have a bad time of it.
嗯。
Mhmm.
但如果你有能力实际尝试解决一些困难的代码问题,拥有Python解释器,能够使用人类使用的工具,那么你就能变得更加稳健和精炼。所以我们现在让这些部分协同工作,但我们必须继续将它们推向新的高度。很明显,像能够重构大规模代码库这样的事情,目前还没有人完全攻克。没有根本原因我们不能做到。我认为一旦实现这一点,代码重构将成为企业的杀手级应用之一。
But if you've had the ability to actually try to solve some hard code problems, you have a Python interpreter, you have access to the kinds of tools that humans do, then you're going to be able to become much more robust and refined. So we now have these pieces working together in concert, but we've got to keep pushing them to the next level. It's very clear that things like being able to refactor massive code bases, no one's cracked that just yet. There's no fundamental reason we can't. And the moment you get that, I think refactoring code is one of the killer use cases for enterprise.
对吧?你知道,如果你能将代码迁移的成本降低两倍,我认为最终会有十倍多的迁移发生。想想有多少系统还困在COBOL中,而且现在没有人在培训COBOL程序员了。对吧?
Right? You know, if you could bring down the cost of code migrations by two x, I think you'll end up with 10 x more of them happening. Think about the number of systems that are stuck in COBOL. And there are no COBOL programmers being trained anymore. Right?
这就像是,你知道,这种依赖关系正在为世界积累负债。唯一的出路就是构建能够真正解决这个问题的系统。所以我认为这是一个巨大的开放空间。指数级进步仍在继续,我们需要坚持走下去。
It's just, you know, strictly building up liability for the world to have this dependency. The only way through is by building systems that can actually tackle that. So I just think it's a massive open space. The exponential continues, and we need to stay on that.
今天我最喜欢的一件事是OpenAI发了一条推文,展示了如何使用CLI从completions API切换到responses API,因为确实如此。
My favorite thing that happened today was a tweet from OpenAI, which was showing people how to use the CLI to switch from the completions API to the responses API.
这是个好消息。我期待看到更多这样的应用。是的,你知道,就是给Codex特殊指令让它可靠地进行重构之类的工作,然后你只需启动它,它就会为你完成。这真是太棒了。
Yeah, that's great news. I expect to see more of that, you know, where you have special instructions given to Codex in order to go do refactorings reliably, and then you just set it off and it does it for you. That's, like, a wonderful thing.
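The completions-to-responses switch they're describing can be sketched at the payload level. This helper is hypothetical (there is no official function by this name); the field renames it applies, `messages` to `input` and `max_tokens` to `max_output_tokens`, follow the OpenAI API documentation for the Responses API:

```python
def completions_to_responses(payload: dict) -> dict:
    """Map a chat.completions-style request dict onto the Responses API shape.

    Illustrative sketch only; real migrations should follow the official docs.
    """
    converted = {"model": payload["model"]}
    # The Responses API takes `input` instead of `messages`; a list of
    # role/content messages is still an accepted input form.
    converted["input"] = payload["messages"]
    # `max_tokens` was renamed to `max_output_tokens`.
    if "max_tokens" in payload:
        converted["max_output_tokens"] = payload["max_tokens"]
    # Common sampling parameters carry over unchanged.
    for key in ("temperature", "top_p"):
        if key in payload:
            converted[key] = payload[key]
    return converted

legacy = {
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64,
}
print(completions_to_responses(legacy))
# {'model': 'gpt-5', 'input': [{'role': 'user', 'content': 'Hello'}], 'max_output_tokens': 64}
```

This is exactly the kind of mechanical, well-specified rewrite that an agent with clear instructions can apply across a whole codebase.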
迁移是最糟糕的事情之一。没人愿意做迁移。没人愿意从一个库换到另一个,然后确保一切仍然正常工作。如果我们能自动化大部分这样的工作,那将是非常美好的贡献。
Migrations are some of the worst things. Nobody wants to do migrations. Nobody wants to, like, change from, like, one library to the other, and then make sure that everything still works. If we can automate, like, most of that, that's going to be, like, a very beautiful contribution.
是的。我认为还有很多其他领域。我觉得安全补丁是一个很好的例子,我认为这很快就会变得非常重要,我们正在非常认真地思考这个问题。我认为能够真正拥有能生产新工具的人工智能。对吧?
Yeah. I think there's a lot of other ground as well. I think that security patching is a good example of something that will become very important soon, and that's something we're being very thoughtful about. I think about being able to actually have AIs that produce new tools. Right?
想想Unix标准工具集有多重要,而人工智能实际上能够构建对自己有用、对你有用的工具。你实际上可以在那里搭建复杂性或实用性的阶梯,能够持续改进这个效率的飞轮。人工智能实际上不仅仅是在编写代码,还能够执行,你知道,它们自己的,能够管理服务或能够做SRE工作之类的事情。我认为所有这些都在眼前。它正在开始发生,但还没有以我们希望看到的方式真正发生。
You think about how important the Unix set of standard tools has been, and AIs are actually able to build their own tools that are useful for you and useful for themselves. You can actually build up a ladder of complexity, or utility, there, to be able to continue to improve this flywheel of efficiency. AIs that are not just writing code, but able to execute on their own, able to administer services or do SRE work and things like that. I think all of that is on the horizon. It's starting to happen, but it's not really happening yet in the way that we would like to see.
我们在OpenAI内部攻克的一个重大应用是代码审查,我们开始注意到我们的主要瓶颈是随着需要审查的代码量增加,团队需要做的审查工作量也随之增大。因此我们决定真正专注于一个高信号的Codex模式,它能够审查一个PR,并深入思考合约和意图,你知道,你原本打算实现的,然后查看代码并验证该意图是否在代码中得到匹配和体现。它能够深入多层,查看所有依赖关系,思考合约,并真正提出一些我们最优秀的员工、我们最好的审查者除非花数小时深入思考那个PR,否则无法发现的问题。我们首先在OpenAI内部发布了这个功能。
One big one that we cracked internally at OpenAI, and then decided to release, is code review. We started to notice that the big bottleneck for us, with increased amounts of code needing to be reviewed, was simply the amount of review that people had to do on the teams. And so we decided to really focus on a very high-signal Codex mode where it's able to review a PR and really think deeply about the contract and the intention that you were meaning to implement, and then look at the code and validate whether that intention is matched and found in that code. And it's able to go layers deep, look at all the dependencies, think about the contract, and really raise things that some of our best employees, some of our best reviewers, wouldn't have been able to find unless they were spending hours really deeply thinking about that PR. And we released this internally first at OpenAI.
它相当成功,实际上当它出故障时人们还很沮丧,因为他们感觉失去了那个安全网。它极大地加速了团队,包括Codex团队。在我们发布IDE扩展的前一晚,我团队的一位顶级工程师疯狂地完成了25个PR,我们自动发现了相当多的bug。Codex发现了相当多的bug,你知道,我们第二天就发布了一个几乎无bug的IDE扩展。所以那里的速度是惊人的。
It was quite successful and people were upset actually when it broke because they felt like they were losing that safety net. And it accelerated teams and including the Codex team tremendously. The night before we released the IDE extension, one of the top engineers on my team was like cranking out 25 PRs and we were finding quite a few bugs automatically. Codex was finding quite a few bugs and, you know, we were able to put out an IDE extension that was almost bug free the next day. So the velocity there is incredible.
特别有意思的是,对于代码审查工具,人们非常紧张是否启用它,因为我觉得我们之前尝试过的所有自动代码审查实验都证明它只是噪音。
And it's very interesting that for the code review tool in particular, people were very nervous about having this enabled because I think our previous experience with every auto code review experiment that we've tried is that it's just noise.
嗯。
Mhmm.
对吧?你只是收到某个机器人的邮件,然后你就想,唉,又是这种东西,直接忽略。我觉得我们现在的情况恰恰相反。这真的说明当能力低于某个阈值时,它感觉完全是个净负面的东西。我根本不想听到它。
Right? You just get an email from some bot and you're like, ugh, another one of those things, you ignore it. And I think we've had kind of the opposite finding from from where we are now. And it really shows you when the capability is below threshold, it just feels like this thing is like totally net negative. I don't wanna hear about it.
我不想看到它。但一旦你突破某个效用阈值,突然之间人们就想要它了,对吧,如果被拿走还会非常不高兴。而且我觉得我们的观察是,如果AI的某个功能现在勉强能用,一年后它就会变得极其可靠,极其关键。我认为代码审查正朝着这个方向发展。
I don't wanna see it. Once you kinda crack above some threshold of utility, suddenly people want it, right, and get very upset if it gets taken away. And I think also our observation is if something kind of works in AI right now, one year from now, it'll be incredibly reliable, incredibly mission critical. And I think that that's where we're going with code review.
代码审查另一个有趣的地方是,它能带动人类参与,真正成为一个包括审查在内的协作伙伴。我们深入思考的一个问题是,如何呈现这些发现,让你真正有兴趣去阅读,甚至可能学到东西,包括当它出错的时候。比如,你可以真正理解它的推理过程。大多数时候,实际上超过90%的情况下它是正确的。作为代码作者或协助审查代码的人,你经常能学到东西。
Part of the interesting thing with code review as well is, like, bringing humans along and really having this be a collaborator, including in review. And one thing we thought a lot about is how we can raise those findings so that you are actually excited to read the finding, and you might even learn something, including, you know, when it's wrong. Like, you can actually understand its reasoning. Most of the time, actually more than 90% of the time, it's right. And you often learn something as the person who authored the code, or as someone who is helping review the code.
是的。回到我们之前关于进步速度的讨论,有时候退一步想想以前的情况。我记得对于GPT-3和GPT-4,我们真的非常关注它的固执己见问题。你还记得吗,如果AI说错了什么,你指出错误后它会怎样?
Yeah. Just, you know, circling back to what we were saying earlier about the rate of progress, and sometimes stepping back and thinking about how things were earlier. Like, I remember for GPT-3 and for GPT-4, really focusing on the doubling-down problem. Like, do you remember, if the AI would say something wrong and you'd point out the mistake?
哦,它会,是的,跟你争论。
Oh, it would, yeah, argue with you.
哦,是的。是的。它会试图说服你它是正确的。就像,我们早就过了那成为核心问题的阶段了。我确信在某些情况下确实会发生
Oh, yeah. Yeah. It would try to, like, convince you that it was right. Like, we're so far past that being the core problem. Like, I'm sure it happens in some
晦涩的边缘案例,就像人类也会遇到的那样。但真正令人惊叹的是,我们已经达到了这样一个水平:即使它没有完全找准正确方向,它也在突出重要的东西。它有着相当合理的思考。是的,每次完成这些代码审查后,我总会想,好吧
obscure edge cases, just like it does for humans. But it's really amazing to see that we're at a level where, even when it's not quite zeroed in on the right thing, it's highlighting stuff that matters. It has, like, pretty reasonable thoughts. And yeah, I always walk away from these code reviews thinking, like, okay,
是的,那是个很好的观点。我应该考虑这一点。
Yeah, that's a good point. I should be thinking about that.
我们刚刚完成了GPT-5的发布。在录制本期播客时,我们现在已经有了GPT-
We're now just past the launch of GPT-5. And as of the recording of this podcast, we now have GPT-
5-Codex。我们对此感到无比兴奋。
5-Codex. Which we're tremendously excited about.
非常兴奋。
Very excited.
先生们,我为什么要对此感到兴奋?给我推销一下这个产品。
Why should I be excited about this, gentlemen? Sell me on this.
所以GPT-5-Codex是我们针对Codex优化的GPT-5版本,我们之前讨论过工具链(harness),因此它是为工具链优化的。我们真的将其视为一个智能体,将模型与工具集紧密耦合,使其更加可靠。这个模型展现的一个特点是能够持续工作更长时间,在处理复杂重构任务时具备所需的坚韧性。但同时,对于简单任务,它的响应速度更快,几乎无需思考就能回复。就像一个优秀的协作者,你可以询问代码问题,找到需要修改的代码片段,或者更好地理解计划。
So GPT-5-Codex is a version of GPT-5 that we have optimized for Codex, and we talked about the harness, so it's optimized for the harness. We really consider it to be, like, one agent where you couple the model very closely to the set of tools, and it's able to be even more reliable. One of the things that this model exhibits is an ability to go on for much longer, and to really have that grit that you need on these complex refactoring tasks. But at the same time, for simple tasks, it actually comes at you way faster and is able to reply without much thinking. And so it's like this great collaborator where you can ask questions about your code, find where this piece of code is that you need to change, or better understand a plan.
但与此同时,一旦让它开始处理某个任务,它会持续工作非常长的时间。我们在内部观察到它处理复杂重构任务长达七小时。这是其他模型从未实现过的。我们还在代码质量方面做了巨大改进,真正针对人们在Codex中使用GPT-5的场景进行了优化。
But at the same time, once you let it go on to something, it will work for a very, very long period of time. We've seen it work internally for up to seven hours on very complex refactorings. We haven't seen other models do that before. And we also have really worked tremendously on code quality, and it's just really optimized for, you know, what people are using GPT-5 within Codex for.
当你说工作时间更长,提到长达七小时时,你不仅仅是指它不断将内容重新放入上下文,而是实际上在做出决策,判断什么是重要的并持续推进,对吗?
So when you talk about working longer, and you say it worked up to seven hours, you're not just talking about it putting things back into context; it's actually making decisions, deciding what's important and moving forward?
是的。想象一个非常棘手的重构任务。我们都遇到过这种情况:代码库变得难以维护,需要进行一些更改才能继续推进。
Yes. So imagine, like, a really tricky refactoring. Right. We've all had to deal with those, where you've decided that your code base is unmaintainable and you need to make a couple of changes in order to move forward.
你制定计划后让模型开始工作。让Codecs GPT-5 Codecs接手,它会逐步解决所有问题,让测试运行并通过,最终完成整个重构。这就是我们观察到它能持续工作长达七小时的任务类型。哇。
So you make a plan, and then you let the model go. You let Codex, GPT-5-Codex, go at it, and it will just work its way through all of the issues, get the tests to run, get the tests to pass, and just completely finish the refactoring. This is one of the things that we've seen it do for up to seven hours. Wow.
是的。我觉得最令人惊叹的是这些模型的核心智能显然如此惊人。即使在三到六个月前,我认为我们的模型在导航内部代码库寻找特定功能方面已经比我更强,这需要相当复杂的...
Yeah. The thing that I find so remarkable is that the core intelligence of these models is clearly just so stunning. Right? I think that even three or six months ago, our models were better than I am at navigating our internal code base, right, to find a specific piece of functionality. And that requires some really sophisticated
你要
Are you gonna
不得不把你自己"解雇"了?
have to let yourself go?
你是不是
Are you
比如,格雷格,我很抱歉。
like, Greg, I'm sorry.
不。因为关键在于我能做更多事。难道在代码库里找功能,就是我想花时间做的事情、我希望人们因此认识我的事情吗?绝对不是。
No. Because that's the thing: I get to do more. Is that what I wanna spend my time doing? Is that what I want people to know me for? Like, being able to find functionality in a code base? Absolutely not.
对吧?这不是我定义自己作为工程师价值的方式,也不是我想花时间做的事情。现在我认为这对我来说是核心所在。对吧?这种惊人的智能首先能吸走所有那些 mundane、无聊的部分。
Right? That's not how I define my value as an engineer, or what I wanna spend my time on as an engineer. And now, I think that, to me, is the core of it. Right? There's this amazing intelligence, and it can, first of all, suck away all the kind of mundane, boring parts.
当然也有一些有趣的部分,对吧?比如,你知道,我认为真正思考事物的架构时,它是一个很棒的伙伴,但我能选择如何花费我的时间。对吧?我可以思考你想要多少这样的代理运行在什么任务上,如何分解事物。所以我视其为增加了程序员的机会面。
And certainly there are some fun parts too, right? Like, you know, I think that in really thinking about the architecture of things, it's a great partner. But I get to choose how I spend my time. Right? And I get to think about how many of these agents I want running on what task, how I break down things. And so I view it as increasing the opportunity surface for programmers.
而且,你知道,我是一个彻头彻尾的 Emacs 用户。我开始使用 VS Code、Cursor、Windsurf 这些东西,部分是为了尝试新事物,但部分是因为我喜欢不同工具的多样性,但真的很难让我离开我的终端。哇。所以,但你知道,我发现我们现在已经超过了阈值,我真的发现自己会想念那种感觉,比如我在做一些重构时,我会想,为什么我在打这些东西?
And, you know, I'm an Emacs user through and through. You know, I started using VS Code and Cursor and Windsurf and these things, partly to just try things out, but partly because I like the diversity of different tools. But it's really hard to get me out of my terminal. Wow. And so, you know, I have found that we're now above threshold, where I really find myself, like, doing some refactor and thinking, why am I typing this thing?
对吧?就像,你知道,就像你在努力回忆某个特定东西的准确语法,或者试图去做这些非常机械的事情。我就想,我只想要有个实习生去搞定这些事。但现在我在终端里就有了这个能力。而且我认为我们达到了这样一个阶段真的很神奇——你拥有这个核心智能,并且可以自主选择何时以及如何使用它。
Right? Like, you know, it's like you're trying to remember exactly the syntax for a specific thing, or trying to do these very mechanical things. I'm like, I just wanna have an intern go do the thing. But I have that now in my terminal. And I think it's really amazing that we're at the point where you have this core intelligence and you get to pick and choose when and how to use it.
请给扩展程序也加上语音输入功能吧,因为我现在就喜欢和模型对话,告诉它要做什么事情。
Please add Whisper to the extension too, you know, because now I just love to talk to the model and tell it to do things.
是啊是啊。你应该能和你的模型视频聊天。我觉得我们正在朝着真正的协作伙伴、真正的同事关系迈进。
Yeah. Yeah. You should be able to video chat with your model. Like I think we're heading towards a real collaborator, a real coworker.
嗯,没错,我们来聊聊未来吧。你认为这会朝着什么方向发展?你觉得前景如何?智能体时代的未来有什么令人兴奋的?我们将如何运用这些系统?
Well, yeah, let's talk about the future. Where do you see this headed? What's exciting about the agentic future? How are we going to be using these systems?
我们坚信未来的发展方向是:在云端的某个地方存在大量智能体群体,我们人类作为个人、团队和组织对其进行监督和引导,以创造巨大的经济价值。所以如果展望几年后的景象,将会是数百万智能体在企业的研发中心和数据中心里工作,执行有用任务。现在的问题是如何逐步实现这个目标,以及如何通过实验找到合适的形态和交互模式?其中极其重要的是要解决所有这些智能体的安全性和对齐问题,让它们既能完成有用工作,又能确保安全。
We have strong conviction that where this is headed is large populations of agents somewhere in the cloud that we, as humanity, as people, teams, organizations, supervise and steer in order to produce great economic value. So if we're going, like, a couple of years from now, this is what it's going to look like: millions of agents working in, you know, companies' R&D and data centers in order to do useful work. Now the question is how we get there gradually, and how we get to experiment on the right form factor and the right interaction patterns here. One of the things that is incredibly important to solve is the safety, security, and alignment of all of this, so that agents can perform useful work, but in a safe way.
而你作为操作者的人类始终掌握控制权。这就是为什么Codec CLI默认让智能体在沙箱中运行,无法随意修改你电脑上的文件。我们将持续投入大量资源来确保环境安全,研究何时需要人类引导、何时需要人类批准特定操作,通过逐步授予更多权限的方式,让你的智能体拥有自己的一套权限体系——你允许它使用的权限,并在允许它执行更高风险操作时提升权限。因此,构建这套完整系统并使其支持多智能体协作,能够被个人、团队和组织所引导,最终与组织的整体目标对齐,这就是我所看到的未来方向。虽然还有些模糊,但确实非常令人兴奋。
And you always get to stay in control as the operator, as a human. And this is why, for Codex CLI, by default the agent operates in a sandbox and is unable to edit files, like, randomly on your computer. And we're going to continue to invest a lot in making the environment safe, in understanding when humans need to steer, when humans need to approve certain actions, giving more and more permissions so that your agent has its own set of permissions that you allow it to use, and then maybe escalating permissions when you allow it to do exceptionally risky things. And so figuring out this entire system, and then making it multi-agent and steerable by individuals, teams, and organizations, and then aligning that with the whole intent of organizations: this is where it's headed for me. It's a bit nebulous, but it's also very exciting, I think.
没错。我觉得完全正确。我认为在微观层面,还有一系列技术问题需要解决。就像Thibault提到的可扩展监督问题:作为人类,你该如何管理那些在外编写大量代码的智能体?
Yep. Yeah, I think that's exactly right. I mean, I think at a zoomed-in level, there's a bunch of technical problems that need to be solved. Like, Thibault is kind of getting at scalable oversight, right? How do you, as a human, manage agents that are out there writing lots of code?
你可能不想阅读每一行代码。现在大多数人可能不会阅读这些系统生成的所有代码。但你怎么
You probably don't want to read every line of code. Probably right now, most people do not read all the code that comes out of these systems. But how do you
我当然会读。完全正确。
Of course I do. Exactly.
但如何保持信任?如何确保人工智能产生的内容确实是正确的?我认为存在技术方法,我们可能从2017年就开始思考这类问题,当时我们首次发布了一些策略,关于如何让人类或较弱的人工智能开始监督更强大的人工智能,并通过某种引导方式确保它们执行非常强大、重要的任务,使我们能够保持信任和监督,真正掌握主导权。这是一个非常重要的问题,通过思考越来越强大的编码代理,它以非常实际的方式得到了体现。但我也认为还有其他容易被忽视的维度,因为在人工智能能力的每个层级,人们都会过度拟合他们所看到的东西,认为,哦,这就是人工智能。
But how do you maintain trust? How do you make sure that the AI is producing things that are actually correct? And I think that there are technical approaches, and we've been thinking about these kinds of things since probably 2017, which is the first time we published some strategies for how you can have humans, or weaker AIs, start to supervise even stronger AIs, and kind of bootstrap your way to making sure that they're doing very capable, important tasks in a way where we can maintain trust and oversight and really be in the driver's seat. So that's a very important problem, and it really is exemplified in a very practical way through thinking about more and more capable coding agents. But I think there are also other dimensions that are very easy to miss, because at each level of AI capability, people kind of overfit to what they see and think, oh, this is AI.
这就是人工智能将要成为的样子。但我们尚未完全看到的是人工智能解决真正困难的新问题。嗯。对吧?现在,可以把它看作是需要进行代码重构。
This is what AI is going to be. But the thing we haven't quite seen yet is AIs solving really hard novel problems. Mhmm. Right? Right now, think of it as, like, I need to do my refactor.
你至少对那个东西应该是什么样子有个概念。对吧?就像它会为你做很多工作,节省大量时间。但是解决那些通过其他任何方式都根本解决不了的问题呢?我认为这不一定仅限于编码领域,想想医学领域,对吧?
You at least have a shape of like what that thing would be. Right? It's like it'll do a lot of the work for you, save a lot of time. But what about solving problems that are fundamentally unsolvable through any other means? And I think of this not necessarily just in the coding domain, but think of it in medicine, right?
你知道,生产新药物,想想材料科学,生产具有新颖特性的新材料。我认为有很多新的能力即将出现,将解锁这类应用。所以,对我来说,一个重要的里程碑是第一次出现由人工智能产生的、其本身极具价值和趣味的成果,不是因为它是由人工智能产生的,不是因为它生产成本更低,而是因为它就是一个突破。它纯粹是新颖的东西,甚至不一定需要人工智能自主创造,只需在与人类的合作中,人工智能成为关键依赖。我认为我们开始看到这类事物的迹象。
You know, producing new drugs; think of it in material science, producing new materials that have novel properties. And I think that there's a lot of new capability coming down the pike that is going to unlock these kinds of applications. And so, you know, for me, one big milestone is the first time that you have an artifact produced by an AI that is extremely valuable and interesting unto itself, not because it was produced by an AI, not because it was cheaper to produce, but because it's simply a breakthrough. It is simply something that is novel. And it doesn't even necessarily have to be autonomously created by the AI; it can be in partnership with humans, with the AI as a critical dependency. And so I think we're starting to see signs of life on this kind of thing.
我们在生命科学领域看到了这一点:人类实验者向o3询问五个可运行的实验方案的想法。他们尝试了这五个方案。其中四个不奏效,有一个奏效了。而我们得到的反馈(这还是o3时代的)是,其结果达到了你对博士三、四年级学生所期望的水平,这太疯狂了。
We're seeing it in life sciences, where human experimenters ask o3 for five ideas of experimental protocols to run. They try out the five of them. Four of them don't work. One of them does. And the kind of feedback that we've been getting, and this was back in the o3 days, is that the results are at the level of what you'd expect from a third- or fourth-year PhD student, which is crazy.
是的,
Yeah,
疯狂。那还是o3,对吧?到了GPT-5和GPT-5 Pro,我们看到了完全不同的结果。我们看到研究科学家们表示:好吧,这确实在做真正新颖的事情。
crazy. And that was o3, right? With GPT-5 and GPT-5 Pro, we're seeing totally different results. There we're seeing research scientists saying, okay, yeah, this is doing real novel stuff.
有时候,再次强调,它不仅仅是独立解决这些宏大理论,而是在合作中能够远远超越人类个体所能达到的极限。对我来说,这正是我们需要继续推进并做对的关键之一。
And sometimes, again, it's not just on its own solving these grand theories, but together, in partnership, being able to stretch far beyond where a human on their own could go. And that, to me, is one of the critical things that we need to continue to push on and get right.
我在与人谈论未来时遇到的一个挑战——我想听听你们对此的看法——是人们倾向于把未来想象成现在,只是多了闪亮的衣服和机器人。他们会想,那么当机器人完成所有编码之类的工作后会发生什么?你提到了一个事实,比如有些事情你喜欢做,有些事情你不愿意做。到2030年我们会处于什么位置?那会是什么样子?
One of the challenges I have when talking to people about the future, and I want to hear you guys talk about this, is that people tend to imagine the future as kind of the present, but with, like, shiny clothes and robots. And they think, well, then what happens when robots do all the code and all that? And you brought up the fact that there are things you like to do and things you don't care to do. Where are we in 2030? What does it look like?
那是五年前,GPT-3的时代。现在展望五年后。
That was five years ago: GPT-3. Now, five years from now.
2030年。我们六个月前还没有这些工具。所以很难准确描绘五年后的具体景象。但有一点
2030. We didn't have these tools six months ago. So it's hard to picture exactly what this is going to look like five years from now. But one thing that
五年后我会突然从草丛里跳出来拿着这个播客说:你当时是这么说的。
I'm gonna pop out of the bushes five years from now with this podcast and be like, you said this.
嗯,你的代理会帮你处理的
Well, your agent will do it
为你处理。是的。是的。是的。它会
for you. Yeah. Yeah. Yeah. It's gonna
所以重要的一点是,那些作为关键基础设施、支撑社会的代码片段,我们需要持续理解并掌握相应的理解工具。这也是为什么我们考虑代码审查的原因。代码审查应该帮助你理解代码,成为你的编程伙伴,帮助你深入理解他人编写的代码,并可能借助AI提供协助。
So one thing that's important is that the pieces of code that are critical infrastructure, underpinning society, we need to continue to understand, and to have the tools to understand. And this is also why we were thinking about code review. Code review should help you, you know, understand that code, and be this teammate that helps you deep-dive into the code written by someone else, potentially with help from AI.
而且我其实认为,我们已经面临一个问题:市面上存在大量未必安全的代码。
And I would actually argue that we already have a problem: there's lots of code out there that is not necessarily secure.
嗯。
Mhmm.
对吧?这种情况经常发生。我记得,比如Heartbleed漏洞。那大概是十二年前的事了。一个关键漏洞出现在互联网广泛使用的核心软件中。而且你会发现这并非个例,对吧?
Right? This happens all the time. I remember, like, Heartbleed, back, I guess, almost twelve years ago or something. A critical vulnerability in a key piece of software used across the Internet. And you realize that that's not singular, right?
世界上存在大量尚未被发现的安全漏洞。
That there's lots of vulnerabilities out there that no one has found.
所有这些来自NPM的包和东西,还有那些人们植入漏洞后就被闲置的包。
All these packages and stuff from NPM, all these packages that are just sitting there that people put exploits into.
一直以来都是这样运作的:攻击者变得更狡猾,防御者变得更强大,就像一场猫鼠游戏。我认为有了AI,你会想,嗯,也许它会偏向哪一方?也许它只会加速这场猫鼠游戏。但我认为,通过AI,你确实可以解锁根本性的新能力,这带来了一些希望。例如,形式化验证,这可以说是防御的终极手段。
And the way that it's always worked is that there's a cat-and-mouse game between attackers getting more sophisticated and defenders getting better. And I think that with AI, you're like, well, which side will it advantage the most? Maybe it'll just sort of accelerate this cat and mouse. But I think that there's some hope that you can actually unlock fundamental new capabilities through AI, for example formal verification, which is sort of an endgame for defense.
嗯。
Mhmm.
我认为这对我来说非常令人兴奋的是,思考如何不仅继续这场似乎永无止境的激烈竞争,而是如何最终实现更高的稳定性和可理解性。我认为还有其他类似的机会,让我们能够真正理解我们的系统,而目前,我们几乎处于人类对传统软件系统理解能力的边缘。
And I think that, to me, is very exciting: thinking about not just how you continue this sort of never-ending rat race, but how you actually end up with increased stability, increased understandability. And I think that there are other opportunities like that for us to really understand our systems, in a way where right now we're sort of at the edge of human understanding of the traditional software systems that have been built.
我们构建Codex的原因之一是为了改进世界上的基础设施和代码,而不一定是增加世界上的代码量。所以这是一个非常重要的点,它也在帮助发现错误、帮助重构、帮助找到更优雅、更高效的实现方式,这些实现可以达到相同的目标,或者实际上更通用,但最终不会产生像1亿行你无法理解的代码。我真正兴奋的一点是,Codex如何帮助团队和个人编写更好的代码,成为更好的软件工程师,并最终构建出更简单但实际为我们做更多事情的系统。
One of the reasons we built Codex is to improve the infrastructure and the code out there in the world, not necessarily to increase the amount of code in the world. And so this is a very important point: it's also about helping find bugs, helping refactor, helping find more elegant, more performant implementations that achieve the same thing, or are actually more general, but not necessarily ending up with, like, 100,000,000 lines of code that you don't understand. One thing that I'm really excited about is how Codex can help teams and individuals just write better code, be better software engineers, and end up with simpler systems that are actually doing more things for us.
我认为2030年展望的一部分是,我们将生活在一个物质丰富的世界。对吧?我认为AI将使创造任何你想要的东西变得比你几乎能想象的还要容易得多。嗯,对吧?
I think part of the 2030 outlook is we will be in a world of material abundance. Right? I think that AI is going to make it much easier than you could almost imagine to create anything you want. Mhmm. Right?
而且这可能不仅在数字世界,在物理世界也是如此,其方式难以预测。但我认为那将是一个计算资源绝对稀缺的世界。
And that that will probably be true in the physical world in addition to the digital world in ways that are hard to predict. But I think it'll be a world of absolute compute scarcity.
嗯。
Mhmm.
我们在OpenAI内部已经稍微见识过这种情况了,对吧?不同研究项目争夺计算资源的方式,或者研究项目的成功取决于计算资源的分配,这一点的重要性怎么说都不为过,对吧?我认为我们将进入这样一个世界:你实现和创造想象中事物的能力,部分受限于你的想象力,但也部分受限于背后的计算能力。因此我们经常思考的一个问题是:如何增加全球的计算供应?对吧?
And we've seen a little bit of what this is like within OpenAI, right? The way that different research projects fight over compute, or that the success of a research program is determined by its compute allocation, is something that's hard to overstate, right? And I think that we're going to be in a world where your ability to produce and create whatever you imagine will be limited partly by your imagination, but partly by the compute power behind it. And so one thing we think about a lot is how we increase the supply of compute in the world. Right?
我们既想提升智能水平,也想提高这种智能的可及性。从根本上说,这是一个物理基础设施问题,而不仅仅是软件问题。
We want to increase the intelligence, but also the availability of that intelligence. And fundamentally, it is a physical infrastructure problem, not just a software problem.
我知道对于GPT-5,我觉得特别令人惊叹的一点是,我们能够将其作为免费版、Plus计划和Pro计划的一部分提供。就像,你可以用Plus计划使用Codex,你获得的GPT-5版本和其他所有人得到的是一样的。这就像是这种惊人的智能,但模型在那种方式下也极其具有成本效益。
I know with GPT-5, I think one thing that's quite amazing is that we're able to give it, you know, as part of the free tier, the Plus plan, the Pro plan. It's like, you know, you can use Codex with your Plus plan, and you get GPT-5, the same version that everyone else gets. And it's this incredible intelligence, but the model is also incredibly cost-effective in that way.
我认为那对我来说真正突出的一点是,我觉得模型的能力强了很多,但它却以相同的价格点发布,或者在某些方面比之前的模型更便宜,那真是让人惊叹。我的意思是,这种模式太棒了。
I think that was one of the things that really stood out for me: I thought the model was much more capable, but it came out at the same price point, or in some ways cheaper than the previous model, and that was a wow moment. I mean, that pattern is great.
我认为我们在提升智能和降低价格方面的程度,是非常容易被忽视、被视为理所当然的,但这实际上非常疯狂。对吧?我记得我们对o3之类的进行了大约80%的降价。如果你回顾一下,GPT-3级别智能的价格是每千个token 6美分。
I think the degree to which we are improving the intelligence and cutting prices is something that's very easy to miss, to take for granted, but it's actually crazy. Right? I think we did an 80% price cut on o3 or something like that. And if you look back, GPT-3-level intelligence used to cost 6¢ per thousand tokens.
是的。之前有一篇文章出来时,报纸抱怨说这些推理模型让成本更高了,但他们没有将推理模型与过去六到七个月内的推理模型进行比较,看看它们变得高效了多少。
Yeah. There was an article that came out where a newspaper was complaining that these reasoning models have made things more expensive, but they didn't compare reasoning models to the reasoning models of the last six to seven months and see how much more efficient they've become.
是的。而且这种情况还会持续下去。关于计算资源稀缺的问题,我觉得有一点很能说明问题:现在人们谈论要建造庞大的、拥有百万级别GPU的集群,那种规模的GPU。但如果我们达到一个阶段——可能并不遥远的未来——你会希望有代理程序持续为你运行。嗯。
Yep. And that will just continue. You know, on the compute scarcity point, one thing I find very suggestive is that right now people talk about building big fleets of a million GPUs, millions of GPUs, that level of GPUs. But we'll probably reach a point, in the not-so-distant future, where you're going to want agents running on your behalf constantly. Mhmm.
没错。就像,每个人都想要一个专属GPU来运行自己的代理程序是很合理的。这样算下来,我们需要的GPU数量就接近100亿了。而我们目前离这个数量级还差得很远。所以我认为我们的部分工作就是要弄清楚如何提供这种计算能力,如何让它存在于世界上,但同时如何最大化利用当前非常有限的计算资源。
Right. Like, it's reasonable for every person to want a dedicated GPU just for running their agent. And so now you're talking almost 10,000,000,000 GPUs that we need. We're orders of magnitude off of that. And so I think part of our job is to figure out how to supply that compute, how to make it exist in the world, but also how to make the most out of the very limited compute that exists right now.
而且,你知道,这既是个效率问题,也是个提升智能水平的问题。但是,是的,我认为很明显,要实现这个目标需要大量的工作和建设。
And, you know, that's an efficiency problem. It's also an increase-the-intelligence problem. But, yeah, I think it's very clear that bringing this to fruition is going to be a lot of work and a lot of building.
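The "orders of magnitude" gap above can be made concrete with a back-of-the-envelope calculation; the figures below are illustrative assumptions, not OpenAI numbers.

```python
import math

# Rough sketch of the supply gap: one dedicated GPU per person versus
# today's fleets. Both figures are illustrative assumptions.
gpus_needed = 8_000_000_000   # ~one GPU for each person on Earth
gpus_today = 10_000_000       # "fleets of millions" of GPUs, roughly
gap = math.log10(gpus_needed / gpus_today)
print(round(gap, 1))          # ~2.9, i.e. about three orders of magnitude short
```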
关于代理程序与GPU的关系及其运行方式,一个有趣的点在于:让GPU离用户更近是非常有益的。因为当代理程序在运行,在几分钟内进行200次工具调用时,它一直在GPU和你的笔记本电脑之间来回通信,执行这些调用,获取上下文信息,然后继续反思。所以让GPU更贴近用户,在这方面也是很大的贡献。而且,你知道,这确实能带来巨大好处,因为它极大地降低了整个交互过程和部署的延迟。
So one of the interesting things about agents and their relationship to GPUs is that it's very beneficial to have a GPU close to you. When an agent is acting and doing 200 tool calls over the span of a couple of minutes, it's doing this back and forth between the GPU and your laptop: executing those tool calls, getting that context back, and then continuing to reflect. So bringing GPUs close to people is a great contribution there as well, and it really benefits you because it tremendously reduces the latency of the entire interaction and the entire rollout.
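The round-trip loop described here can be sketched in a few lines; `model_call`, the action format, and the tool registry below are all illustrative assumptions, not Codex's actual interface.

```python
import json

def run_agent(model_call, tools, task, max_steps=200):
    """Minimal agent loop sketch: the model (on a GPU) proposes an action,
    the tool runs locally (on your laptop), and the result is appended to
    the context for the next round trip. Latency accumulates per step."""
    context = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = model_call(context)                      # round trip to the GPU
        if action["type"] == "finish":
            return action["result"]
        output = tools[action["tool"]](*action["args"])   # executed locally
        context.append({"role": "tool", "content": json.dumps(output)})
    return None
```

With 200 steps, even a small per-step network latency adds up, which is why GPU proximity matters for agentic workloads.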
先生们,有一个经常被问到的问题,关于未来、关于劳动力、关于所有这些。第一个问题:到底该学编程,还是不该学编程?
Gentlemen, a question that comes up periodically is about the future, about labor, about all of this. Number one: learn to code, or don't learn to code?
我认为现在是学习编程的大好时机。
I think it's a wonderful time to learn to code.
我想,是的。我同意。一定要学会编程,但也要学会使用AI。
I think, yeah. I agree. Definitely learn to code, but also learn to use AI.
是的。
Yeah.
对我来说,这是最重要的事情。
That to me is the most important thing.
使用Codex学习新编程语言有着极大的乐趣。我的团队中有很多人是Rust新手,我们决定用Rust构建核心框架。看到他们仅仅通过使用Codex、提问、探索不熟悉的代码库就能快速掌握一门新语言,并且仍然取得出色成果,这真是太棒了。当然,我们也有经验丰富的Rust工程师继续指导,确保我们保持高标准。但这确实是一个学习编程的绝佳时机。
There's something tremendously enjoyable about using Codex to learn a new programming language. A lot of people on my team were new to Rust, and we decided to build the core harness in Rust. And it's been really great seeing how quickly they can pick up a new language just by using Codex, asking questions, exploring a code base they don't know, and still achieving great results. Obviously, we also have very experienced Rust engineers who continue to mentor and make sure we keep a high bar. But it's just a really fun time to learn to code.
我记得我学习编程的方式是通过w3schools的教程,PHP、JavaScript、HTML、CSS。当我构建最初的一些应用程序时,我试图弄明白如何——我甚至不知道这个词——序列化数据。对吧?然后我想出了一种结构,使用一些特殊字符序列作为分隔符。但如果你数据中真的出现了那个字符序列会怎样?
I remember the way I learned to program was through w3schools tutorials: PHP, JavaScript, HTML, CSS. When I was building some of my first applications, I was trying to figure out how to, I didn't even know the word for it, serialize data. Right? And I came up with some sort of structure that used a special sequence of characters as a delimiter. And what happens if that sequence of characters actually shows up in your data?
比如,我们还是别谈那个了。所以我必须使用一个非常特殊的序列。这种事情是你不会在教程中看到有人为你标记出来的。但Codex在代码审查中会说,嘿,有JSON序列化功能,直接用这个库吧。
Like, let's not talk about that. That's why I had to use a very special sequence. And this is the kind of thing no tutorial will flag for you. But Codex, in its code review, will say, hey, there's JSON serialization. Just use this library.
绝对如此。因此我认为它有加速和极大简化编码过程的潜力,这样你就不必重新发明所有这些轮子,它可以替你提问或回答问题,甚至是你都没意识到需要问的问题。对我来说,这就是为什么我认为现在比以往任何时候都更适合进行开发。
Absolutely. And so I think the potential to accelerate and make it so much easier to code, so you don't have to reinvent all these wheels, and it can ask or answer questions for you that you didn't even know you needed to ask. That to me is why I think it's a better time than ever to build.
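The delimiter trap Greg describes is easy to demonstrate; the delimiter string below is a made-up stand-in for his "special sequence".

```python
import json

# Ad-hoc delimiter scheme: silently corrupts data the moment the
# delimiter itself appears inside a field.
DELIM = "|~|"                          # hypothetical special sequence
fields = ["hello", "a|~|b"]            # second field contains the delimiter
round_tripped = DELIM.join(fields).split(DELIM)
assert round_tripped != fields         # comes back as ["hello", "a", "b"]

# A real serialization format escapes for you, so the round trip is exact.
assert json.loads(json.dumps(fields)) == fields
```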
仅仅通过观察它如何解决问题,我就学到了很多,发现了新的方法等等。我经常喜欢给它一些疯狂的任务,比如如何只用一千行代码创建你自己的语言模型,你会尝试做什么?有时它可能会失败,但如果你看看它的方向并尝试去做,你会说,哦,我甚至不知道还有这种东西。
I've learned a lot just by watching how it solves a problem, found new methods and things like that. I often like to give it a crazy task, like: how would you create your own language model in only a thousand lines of code, what would you try? Sometimes it might fail, but if you look at the direction it took and try it yourself, you go, oh, I didn't even know that was a thing.
还有一点是,你知道,那些用AI编码最成功的人,也都深入学习了软件工程的基础知识,建立了正确的框架和架构,思考过如何构建他们的代码库,然后才借助AI的帮助,但仍然遵循那个总体蓝图。这确实能加速你的进程,让你比完全不懂所写代码的情况下走得更远。
One thing as well is that the people who are most successful coding with AI have also really studied the fundamentals of software engineering, put the right framework in place, the right architecture, thought about how to structure their code base, and then get help from AI while still following that general blueprint. That really accelerates you and lets you go much further than you could if you didn't actually understand the code being written.
自从你们推出这个,自从GPT-5上线,自从你们能够通过Codex部署功能以来,你们看到的使用率如何?
Since you've launched this, since you've made GPT-5 available, since you've been able to deploy things with Codex, what have you seen as usage rates?
是的,使用率一直在爆炸式增长。我们看到用户整体使用量增长了10倍以上,而且原本就在使用的用户现在也用得更多了。所以我们看到了更复杂的使用场景,人们使用它的时间也更长了。
Yeah. Usage has been exploding. We've seen more than a 10x growth in usage across users, and the users who were already using it are using it much more as well. So we're seeing more sophisticated usage, and people are using it for longer periods of time.
我们现在已经将其包含在Plus和Pro计划中,并设定了宽松的使用限制,这对取得成功贡献很大。
We've now included it in the Plus and Pro plans with generous limits, and that's contributed a lot to its success.
是的。我认为氛围也确实开始转变,因为人们正在努力理解如何正确使用GPT-5,对吧?我觉得它的风格有点不同。我认为我们在正确的工具和生态系统整合方式上有自己的独特见解。
Yeah. I think the vibes have also really started to shift as people realize how you need to use GPT-5. Right? I think it's a little bit of a different flavor. I think we have our own spin on the right harnesses and tools and the ecosystem of how these things fit together.
而且我认为一旦人们掌握了诀窍,他们就会进展得非常快。
And I think that once it clicks for people, then they just go so fast.
先生们,非常感谢你们加入我们并讨论这个话题。还有什么最后的想法吗?
Gentlemen, thank you so much for joining us here and talking about this. Any last thoughts?
感谢邀请我们。是的,我们对接下来的一切感到非常兴奋。我认为我们有很多东西要构建。进步继续呈指数级增长,而真正让这些工具变得对每个人都可用且有用,是我们使命的核心。
Thank you for having us. Yeah. We're really excited about everything that comes next. I think we have so much to build. Progress continues on the exponential, and I think really bringing these tools to be usable and useful by everyone is core to our mission.
是的,感谢邀请我们。我也超级兴奋。现在我们有了Codex并且它不断改进,我们自己也在加速,每天都在构建更好的Codex。就我个人而言,我觉得我现在和Codex交流的时间比大多数人都多,这真的让我感受到了AGI的存在,我希望更多人能够从中受益。
Yeah. Thanks for having us. I'm also super excited. Now that we have Codex and it keeps improving, we're also getting accelerated, building a better Codex every day. And personally, I think I spend more time talking to Codex now than most people do, and it's really how I feel the AGI, and I hope more people will be able to benefit from it.