本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
你们负责Codex的开发工作吗?
You lead work on Codex?
Codex是OpenAI的编程代理。
Codex is OpenAI's coding agent.
我们认为Codex只是软件工程团队成员的起点。
We think of Codex as just the beginning of a software engineering teammate.
它就像个特别聪明的实习生,但从不看Slack消息,除非你要求否则绝不查看Datadog。
It's a bit like this really smart intern that refuses to read Slack, doesn't check Datadog unless you ask it to.
我记得Karpathy发推说过他遇到的那些最棘手的bug,他花了好几个小时都搞不定。
I remember Karpathy tweeted the gnarliest bugs that he runs into that he just spends hours trying to figure out.
其他方法都解决不了。
Nothing else has solved.
他把问题交给Codex,让它运行一小时,结果就解决了。
He gives it to Codex, lets it run for an hour, and it solves it.
我们开始看到未来的雏形——Codex将能够自主参与自身的训练过程。
Starting to see glimpses of the future where we're actually starting to have Codex be on call for its own training.
Codex编写了大量代码来帮助管理其训练运行和关键基础设施。
Codex writes a lot of the code that helps, like, manage its training run, the key infrastructure.
因此我们设立了Codex代码审查机制。
And so we have a Codex code review.
它确实发现了许多错误。
It's, like, catching a lot of mistakes.
实际上它还发现了一些相当有趣的配置错误。
It's actually caught some, like, pretty interesting configuration mistakes.
最令人震撼的加速案例之一就是Sora安卓应用,一个全新的应用程序。
One of the most mind blowing examples of acceleration, the Sora Android app, like a fully new app.
我们仅用18天就完成了开发,10天后(总计28天)就将其推向公众。
We built it in eighteen days, and then ten days later, so twenty eight days total, we went to the public.
你认为如何在这个领域取得优势?
How do you think you win in this space?
Codex的主要目标之一是实现主动性。
One of our major goals with Codex is to get to proactivity.
如果我们要打造一个超级助手,它必须能够实际执行任务。
If we're gonna build a super assistant, it has to be able to do things.
过去一年我们学到的是:要让模型发挥作用,当它们能使用电脑时会高效得多。
One of the learnings over the past year is that for models to do stuff, they are much more effective when they can use a computer.
事实证明,模型使用电脑的最佳方式就是直接编写代码。
It turns out the best way for models to use computers is simply to write code.
所以我们逐渐形成这个理念:如果你想构建任何智能体,或许应该先构建一个编码智能体。
And so we're kinda getting to this idea where if you wanna build any agent, maybe you should be building a coding agent.
当你思考Codex的进展时,我猜你们有一堆评估指标,还有各种公开基准测试。
When you think about progress on Codex, I imagine you have a bunch of evals, and there's all these public benchmarks.
我们几个人经常泡在Reddit上。
A few of us are, like, constantly on Reddit.
你知道的,那里有赞美,也有大量抱怨。
You know, there's, there's praise up there, and there's a lot of complaints.
我们产品团队能做的,就是持续思考如何打造一个工具,让它真正加速人类进步,而不是制造一个让人更加无所适从的工具。
What we can do is, we, as a product team, just try to always think about how are we building a tool so that it feels like we're maximally accelerating people rather than building a tool that makes it more unclear what you should do as the human.
身处OpenAI,我不得不问:你认为我们离通用人工智能还有多远?
Being at OpenAI, I can't not ask about how far you think we are from AGI.
目前被低估的限制因素实际上是人类的打字速度或多任务处理速度。
The current underappreciated limiting factor is literally human typing speed or human multitasking speed.
今天我的嘉宾是Alexander Embirikos,Codex的产品负责人,Codex是OpenAI极其受欢迎且强大的编程助手。
Today, my guest is Alexander Embirikos, product lead for Codex, OpenAI's incredibly popular and powerful coding agent.
用ChatGPT负责人、前播客嘉宾Nick Turley的话说,Alex是我共事过的最喜欢的人之一,将他及其公司引入OpenAI是我们做过的最佳决策之一。
In the words of Nick Turley, head of ChatGPT and former podcast guest, Alex is one of my all time favorite humans I've ever worked with, and bringing him and his company into OpenAI ended up being one of the best decisions we've ever made.
同样,OpenAI首席产品官Kevin Weil表示,Alex简直是最棒的。
Similarly, Kevin Weil, OpenAI's CPO, said Alex is simply the best.
在我们的对话中,我们聊到了在OpenAI做产品的真实体验,Codex如何帮助Sora团队在一个月内发布登顶App Store榜首的Sora应用,Codex当前20倍增长的秘诀及其编程能力如此出色的原因,为何他的团队现在专注于让代码审查(而不仅是编写)变得更简单,他对通用人工智能时间线的预测,对AI助手何时才能真正实用的看法,以及更多内容。
In our conversation, we chat about what it's truly like to build product at OpenAI, how Codex allowed the Sora team to ship the Sora app, which became the number one app in the App Store, in under one month, also the 20x growth Codex is seeing right now and what they did to make it so good at coding, why his team is now focused on making it easier to review code, not just write code, his AGI timelines, his thoughts on when AI agents will actually be really useful, and so much more.
特别感谢Ed Baze、Nick Turley和Dennis Yang为本次对话提供的建议话题。
A huge thank you to Ed Baze, Nick Turley, and Dennis Yang for suggesting topics for this conversation.
如果你喜欢这期播客,别忘了在你喜欢的播客应用或YouTube上订阅关注。
If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube.
若您成为我通讯的年费订阅用户,即可免费享用19款卓越产品一年,包括Devin、Lovable、Replit、Bolt、n8n、Linear、Superhuman、Descript、Wispr Flow、Gamma、Perplexity、Warp、Granola、Magic Patterns、Raycast、ChatPRD、Mobbin、PostHog以及Stripe Atlas的一年使用权。
And if you become an annual subscriber of my newsletter, you get a year free of 19 incredible products, including a year free of Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, Mobbin, PostHog, and Stripe Atlas.
请访问Lennysnewsletter.com并点击"产品通行证"。
Head on over to Lennysnewsletter.com and click product pass.
接下来,在赞助商简短插播后,我将为您带来亚历山大·恩比里科斯的分享。
With that, I bring you Alexander Embirikos after a short word from our sponsors.
这里有个谜题等你来解。
Here's a puzzle for you.
OpenAI、Cursor、Perplexity、Vercel、Plaid以及数百家其他成功企业有什么共同点?
What do OpenAI, Cursor, Perplexity, Vercel, Plaid, and hundreds of other winning companies have in common?
答案是它们都由本期赞助商WorkOS提供技术支持。
The answer is they're all powered by today's sponsor, WorkOS.
如果你正在为企业开发软件,可能深有体会:集成单点登录、SCIM、RBAC、审计日志等大客户所需功能有多痛苦。
If you're building software for enterprises, you've probably felt the pain of integrating single sign on, SCIM, RBAC, audit logs, and other features required by big customers.
WorkOS将这些交易阻碍因素转化为即插即用的API,其现代化开发平台专为B2B SaaS打造。
WorkOS turns those deal blockers into drop in APIs with a modern developer platform built specifically for b to b SaaS.
无论你是种子轮初创企业试图赢得首个企业客户,还是独角兽公司进行全球扩张,WorkOS都是最快实现企业级准备并解锁增长的道路。
Whether you're a seed stage startup trying to land your first enterprise customer or a unicorn expanding globally, WorkOS is the fastest path to becoming enterprise ready and unlocking growth.
他们本质上就是企业功能领域的Stripe。
They're essentially Stripe for enterprise features.
访问workos.com开始使用,或直接联系他们的Slack支持,那里有真正的工程师会超快速解答你的问题。
Visit workos.com to get started or just hit up their Slack support where they have real engineers in there who answer your questions super fast.
Work OS让你能像顶尖开发者一样构建应用,提供令人愉悦的API、全面的文档和流畅的开发体验。
Work OS allows you to build like the best with delightful APIs, comprehensive docs, and a smooth developer experience.
立即访问workos.com,让你的应用具备企业级能力。
Go to workos.com to make your app enterprise ready today.
本节目由客户服务领域排名第一的AI助手Fin赞助播出。
This episode is brought to you by Fin, the number one AI agent for customer service.
如果你的客户支持工单堆积如山,那么你需要Fin。
If your customer support tickets are piling up, then you need Fin.
Fin是市场上性能最高的AI助手,平均解决率高达65%。
Fin is the highest performing AI agent on the market with a 65% average resolution rate.
Fin能解决最复杂的客户查询问题。
Fin resolves even the most complex customer queries.
没有其他AI代理能表现得更好。
No other AI agent performs better.
在与竞争对手的直接较量中,Fin每次都胜出。
In head to head bake offs with competitors, Fin wins every time.
是的。
Yes.
改用新工具可能令人担忧,但Fin适用于任何帮助台且无需迁移,这意味着您无需彻底改造现有系统或让客户面临服务延迟。
Switching to a new tool can be scary, but Fin works on any help desk with no migration needed, which means you don't have to overhaul your current system or deal with delays in service for your customers.
Fin已获得6000多位客户服务负责人及Anthropic、Shutterstock、Synthesia、Clay、Vanta、Lovable、monday.com等顶级公司的信任。
And Fin is trusted by over 6,000 customer service leaders and top companies like Anthropic, Shutterstock, Synthesia, Clay, Vanta, Lovable, monday.com, and more.
由于Fin由Fin AI引擎驱动——这是一个持续优化的系统,可让您轻松分析、训练、测试和部署——Fin也能持续提升您的结果。
And because Fin is powered by the Fin AI engine, which is a continuously improving system that allows you to analyze, train, test, and deploy with ease, Fin can continuously improve your results too.
如果您准备好变革客户服务并扩展支持规模,现在只需99美分/次解决方案即可试用Fin。
So if you're ready to transform your customer service and scale your support, give Fin a try for only 99¢ per resolution.
此外,Fin还提供90天退款保证。
Plus, Fin comes with a ninety day money back guarantee.
了解Fin如何为您的团队服务,请访问fin.ai/lenny。
Find out how Fin can work for your team at fin.ai/lenny.
网址是fin.ai/lenny。
That's fin.ai/lenny.
Alexander,非常感谢你来到这里,欢迎参加播客。
Alexander, thank you so much for being here, and welcome to the podcast.
非常感谢。
Thank you so much.
我关注很久了,能来这里真的很兴奋。
I've been following for ages, and I'm excited to be here.
我更加兴奋。
I'm even more excited.
我真的很感激。
I really appreciate that.
我想从你在OpenAI的经历开始聊起。
I wanna start with your time at OpenAI.
你大约一年前加入了OpenAI。
So you joined OpenAI about a year ago.
在此之前,你经营自己的初创公司约有五年时间。
Before that, you had your own startup for about five years.
更早之前,你是Dropbox的产品经理。
Before that, you're a product manager at Dropbox.
我想OpenAI与你工作过的其他地方都大不相同。
I imagine OpenAI is very different from every other place you've worked.
让我直接问你这个问题。
Let me just ask you this.
OpenAI运作方式最大的不同是什么?以及你从中学到的、无论未来去往何处(假设你终将离开)都会随身携带的经验是什么?
What is most different about how OpenAI operates, and what's something that you've learned there that you think you're gonna take with you wherever you go, assuming you ever leave?
迄今为止,我会说OpenAI的工作节奏和雄心壮志远超我的想象。
By far, I would say the speed and ambition of working at OpenAI are just, like, dramatically more than what I could have imagined.
而且,我猜这话说出来有点尴尬,因为每个初创公司创始人都会觉得,是啊,我的公司发展超快,人才标准超高,我们超级有雄心。
And, I guess it's kind of an embarrassing thing to say, because everyone who's a startup founder thinks, like, yeah, my startup moves super fast and the talent bar is super high and we're super ambitious.
但我不得不说,在OpenAI工作让我重新思考了这些词的真实含义。
But I have to say, working at OpenAI just kind of like made me reimagine what that even means.
我们经常听到这种说法,感觉每个AI公司都在惊叹,天啊,不敢相信他们的发展速度如此之快。
We hear this a lot about, you know, it feels like every AI company is just like, oh my god, can't believe how fast they're moving.
有没有什么具体例子能说明,哇,这在其他任何地方都不可能发生得这么快?
Is there an example of just like, wow, that wouldn't have happened this quickly anywhere else?
首先想到的最明显例子就是Codex本身的爆炸性增长。
The most obvious thing that comes to mind is just like the the explosive growth of Codex itself.
虽然我们有一段时间没更新外部数据了,但Codex的规模在短短几个月内就实现了10倍速增长。
I think it's been a while since we bumped our external number, but the 10x-ing of Codex's scale happened super fast, in a matter of months.
而且自那以后增长幅度更大。
And it's well more since then.
经历过这种速度后——至少对我个人而言——现在每当我要投入时间开发科技产品时,都会觉得必须达到那种速度和规模标准。
And once you've lived through that, or at least speaking for myself, having lived through that now, I feel like any time I'm gonna spend my time building tech product, there's that kind of speed and scale that I now need to meet.
回想我在初创公司时的经历,进展要缓慢得多。
If I think of what I was doing in my startup, it moved way slower.
初创企业总是面临这样的平衡:你该对某个想法投入多少,还是发现它行不通就立即转向。
And there's always this balance with startups of how much do you commit to an idea that you have versus find out that it's not working and then pivot.
但在OpenAI我意识到一点:我们能够且必须产生的影响力如此巨大,以至于我现在必须对时间分配更加严苛。
But I think one thing I've realized at OpenAI is that the amount of impact that we can have, and in fact need to have to do a good job, is so high that I have to be, like, way more ruthless with how I spend my time now.
在讨论Codex之前,OpenAI是否有某种组织架构或运作方式让团队能如此快速行动?
Before we get to Codex, is there a way that they've structured the org or, I don't know, the way that OpenAI operates that allows the team to move this quickly?
因为每个人都想飞速前进。
Because everyone everyone wants to move super fast.
我猜想存在一种结构性方法能让这种情况发生。
I imagine there's a structural approach to allowing this to happen.
我的意思是,我们正在构建的技术已经改变了许多方面,无论是我们的构建方式,还是我们能为用户实现的功能种类。
I mean, one thing is just that the technology we're building with has, like, transformed so many things, you know, both how we build, but also what kinds of things we can enable for users.
而且,我们大部分时间都在讨论基础模型的改进,但我认为即使模型今天不再进步(当然事实并非如此),我们在产品方面也远远落后。
And, you know, we spend most of our time talking about like the sort of improvements within the foundation models, but I believe that even if we had no more progress today with models, which is absolutely not the case, but if even if we had no more progress, we are way behind on product.
还有太多产品需要开发。
There's so much more product to build.
所以我觉得时机已经成熟,如果这说得通的话。
So I think like, just like the moment is ripe, if that makes sense.
嗯。
Mhmm.
但我认为有很多反直觉的事情让我刚到时就感到惊讶,比如组织架构方面。
But I think there's a lot of sort of counterintuitive things that surprised me when I arrived as far as like how things are structured.
我想到一个例子,当我在创业公司工作时,以及之前在Dropbox时,作为产品经理,持续凝聚团队共识非常重要,就像确保方向正确后才能加速前进。
One example that comes to mind is like, when I was working on my startup and and before that, when I was at Dropbox, was like very important, you know, especially as a PM to, like, always kinda rally the ship, and it was kinda like, make sure you're pointed in the right direction, then you can, like, accelerate in that direction.
但在这里,因为我们甚至不清楚即将出现哪些新功能,技术上也不知道哪些会奏效。
But here, I think, because we don't exactly know, like, what capabilities will even come up soon, and we don't know what's going to work, technically.
而且即使技术上行得通,我们也不确定哪些会被市场接受。
And then we also don't know what's going to land, even if it works technically.
对我们来说更重要的是保持谦逊,通过实践学习,快速尝试各种方案。
It's much more important for us to be very humble and learn a lot more empirically and just try things quickly.
整个组织架构就是按照这种方式自下而上运作的。
And the org is set up in that way to be incredibly bottoms up.
这又回到了你刚才说的——每个人都想快速行动。
This is, again, one of those things that, as you were saying, everyone wants to move fast.
我觉得大家都喜欢说自己采用自下而上的方式,至少很多人是这样。
I think everyone likes to say that they're bottoms up, or at least a lot of people do.
但OpenAI是真正、彻底的自下而上。
But OpenAI is truly, truly bottoms up.
这对我来说是个学习过程。
That's been a learning experience for me.
现在想想,如果我将来去非AI公司工作——虽然我觉得未来去非AI公司工作可能都没意义了——那会很有趣。
Now, it'll be interesting if I ever work at a non AI company, although I don't think it'll even make sense to work at a non AI company in the future.
我甚至不知道那意味着什么。
I don't even know what that means.
但如果要我设想或回到过去,我觉得我会完全换种方式来运作。
But if I were to imagine it or go back in time, I think I would, like, run things totally differently.
我所听到的是一种‘准备、开火、瞄准’的方式,而非‘准备、瞄准、开火’。
What I'm hearing is their approach is kind of ready, fire, aim, more than ready, aim, fire.
而且当你思考这一点时——虽然听起来可能不太顺耳——但我确实在AI公司经常听到这种说法,Nick Turley似乎也表达过相同观点。
And as you process that, because that may not come across well, I've actually heard this a lot at AI companies, and Nick Turley shared, I think, the same sentiment.
因为你无法预知人们会如何使用它,花大量时间追求完美是没有意义的。
Because you don't know how people will use it, it doesn't make sense to spend a lot of time making it perfect.
更好的做法是以原始形态发布,观察人们如何使用,然后重点发展那些实际用例。
It's better to just get it out there in a primordial way, see how people use it, and then go big on that use case.
是的。
Yeah.
用这个比喻来说的话,我觉得确实存在瞄准环节,但这个环节要模糊得多。
It's like, okay, to use this analogy a little bit, I feel like there is an aim component, but the aim component is much fuzzier.
更像是大致思考‘我们认为可能会发生什么?’
You know, it's kinda like roughly, what do we think can happen?
比如有位研究主管——我在这里工作学到了很多——他常说在OpenAI,我们可以就一年后的事情展开非常有深度的讨论。
Like, someone I've learned a ton from working with here is a research lead, and he likes to say that, like, in OpenAI, we can have really good conversations about something that's a year plus from now.
未来存在诸多不确定性,但这样的时间框架是合理的。
And there's a lot of ambiguity in what will happen, but that's a right sort of timeline.
而对于几个月或几周内即将发生的事,我们也能进行高质量的讨论。
And then we can have really good conversations about what's happening in, like, a few months or weeks.
但存在一个尴尬的中间地带——当你开始接近一年期限但尚未到达时,这时很难进行理性推演。
But there's kind of this awkward middle ground, which was as you start approaching a year, but you're not at a year where it's very difficult to reason about.
因此在目标设定方面,我们需要明确:我们试图构建哪些未来图景?
And so as far as aiming, I think we want to know, okay, what are some of the futures that we're trying to build towards?
我们当前处理的许多AI问题(比如对齐问题)都需要超前思考未来很久的情况。
And a lot of the problems we're dealing with in AI, such as alignment, are problems you need to be thinking out really far out into the future.
所以我们在这个层面上的目标设定是模糊的。
So we're kind of aiming fuzzily there.
但当涉及更战术性的问题——比如具体要开发什么产品,以及用户将如何使用该产品时——
But when it comes down to the more tactical questions, like, what product will we build, and therefore how will people use that product?
我们的态度就变得更倾向于通过实证来寻找答案。
That's the place where we're much more like, let's find out empirically.
这个说法很好。
That's a good way of putting it.
还有一点,当人们听到这个时,他们有时会听到像你们这样的公司说,好吧。
Something else: when people hear this, they sometimes hear companies like yours saying, okay.
我们要自下而上。
We're gonna be bottoms up.
我们要尝试很多东西。
We're gonna try a bunch of stuff.
我们不会对未来几年制定具体计划。关键是你们都雇佣了世界上最优秀的人才,这感觉是在自下而上工作中取得成功的关键要素。
We're not gonna have exactly a plan of where it's going in the next few years. The key is you all hire the best people in the world, and so that feels like a really key ingredient in order to be this successful at bottoms up work.
这简直太惊人了,我刚来时甚至被这里每个人的个人驱动力和自主性水平震惊到了。
It's just super surprising. I was, like, again, surprised or even shocked when I arrived at the level of individual drive and autonomy that everyone here has.
所以我认为,OpenAI的许多运作方式,你无法通过阅读文章或收听播客就直接照搬到自己的公司。
So I think, with many of the ways that OpenAI runs, you can't just read about it or listen to a podcast and be like, I'm just gonna deploy this to my company.
也许这么说有点苛刻,但我认为确实很少有公司拥有能这样做的顶尖人才储备。
You know, maybe this is a harsh thing to say, but I think, like, yeah, very few companies have the talent caliber to be able to do that.
所以如果要实施这个方案,可能需要进行一些调整。
So it might need to be, like, adjusted if you were gonna implement this.
好的。
Okay.
那我们聊聊Codex吧。
So let's talk Codex.
你负责领导Codex项目的工作。
You lead work on Codex.
Codex进展如何?
How's Codex going?
能分享些什么数据吗?
What numbers can you share?
这方面有什么可以透露的吗?
Is there anything you can share there?
另外,不是所有人都清楚Codex到底是什么。
Also, not everyone knows exactly what Codex is.
解释一下Codex是什么。
Explain what Codex is.
完全正确。
Totally.
是的。
Yeah.
我非常幸运地承担了预见未来并领导Codex产品开发的工作。
So I had the very lucky job of of living in the future and leading products on Codex.
Codex是OpenAI的编程助手。
And Codex is OpenAI's coding agent.
具体来说,它是一个IDE扩展,比如你可以安装的VS Code扩展,或者一个可以安装的终端工具。
So super concretely, that means it's an IDE extension, like a VS Code extension, that you can install, or a terminal tool that you can install.
安装后,你基本上可以与Codex结对编程,让它回答代码问题、编写代码、运行测试、执行代码,完成软件开发生命周期中那个密集中间阶段的大量工作——这个阶段的核心就是编写要投入生产的代码。
And when you do so, you can then basically pair with Codex to answer questions about code, write code, you know, run tests, execute code, and do a bunch of the work in sort of that, like, thick middle section of the software development life cycle, which is all about, you know, writing code that you're gonna get into production.
更广泛地说,我们认为Codex目前的状态只是一个软件工程团队成员的雏形。
More broadly, we think of Codex as, like, it's what it currently is is just the beginning of a software engineering teammate.
所以当我们用'队友'这样的大词时,我们想象的是它不仅能够编写代码,还能真正参与到软件开发的早期构思和规划阶段,以及更下游的验证、部署和维护代码环节。
And so, you know, when we use a big word like teammate, like some of the things we're imagining are that it's not only able to write code, but actually it participates like early on in like the ideation and planning phases of writing software, then further downstream in terms of, like, validation, deploying, and, like, maintaining code.
为了让这更有趣一点,我喜欢想象的是:如果把现在的Codex比作什么,它有点像那种非常聪明的实习生——除非你要求,否则它不会看Slack消息,也不会检查Datadog或Sentry。
To make that a little more fun, like, one thing I like to imagine is, like, if you think of what Codex is today, it's a bit like this, like, really smart intern that, like, refuses to read Slack and, like, doesn't check Datadog or, like, Sentry unless you ask it to.
因此无论它有多聪明,在你没有和它协作的情况下,你有多大程度会信任它写的代码呢?
And so, like, no matter how smart it is, like, how much are you gonna trust it to write code without you also working with it.
对吧?
Right?
这就是目前人们主要的使用方式——与它结对编程。
So that's how people use it mostly today is they pair with it.
但我们希望能达到这样的程度:它能够独立工作。
But we wanna get to the point where, you know, it can work independently.
就像你新招聘的实习生一样,你不仅会让他们写代码,还会让他们参与整个开发周期。
Like, just like a new intern that you hire, you don't only ask them to write code, you ask them to participate across the cycle.
所以你知道,即使他们第一次尝试没做对,最终也能通过迭代胜任工作。
And so you know that, like, even if they don't get something right the first try, they're eventually gonna be able to iterate their way there.
我认为关于不查看Slack和Datadog的观点在于它不会被分心,始终保持专注并处于心流状态。
I thought the point about not reading Slack and Datadog was that it's just not distracted, it's constantly focused and always in flow.
但我理解你说的意思是它并不掌握所有正在发生的上下文。
But I get what you're saying there is it doesn't have all the context on everything that's going on.
而且,这不仅仅在执行任务时成立。
And, like, that's not only true when it's performing a task.
但再想想,最好的人类队友,你不需要告诉他们该做什么。
But, again, if you think of, like, the best human teammates, like, you don't tell them what to do.
也许刚雇佣他们时,你会开几次会议,了解到哪些提示对这个队友有效。
Maybe when you first hire them, you have a couple meetings and you learn, hey, these prompts work for this teammate.
这些提示无效。
These prompts don't.
这就是与这个人沟通的方式。
This is how to communicate with this person.
最终你会给他们一些初始任务,委派几项工作。
Then eventually you give them some starter tasks, delegate a few tasks.
但最终你只需要说,嘿,太棒了。
But then eventually you just say like, Hey, great.
好的。
Okay.
你正在与这组人合作处理代码库的这个部分。
You're working with this set of people in this area of the code base.
甚至也可以自由地与其他人在代码库的其他部分合作。
Feel free to work with other people in other parts of the code base too even.
是的,你告诉我你认为应该完成什么才合理。
And yeah, you tell me what you think makes sense to be done.
我们将此视为主动性,而Codex的主要目标之一就是实现这种主动性。
And we think of this as like proactivity and like one of our major goals with Codex is to like get to proactivity.
我认为这对于实现OpenAI的使命至关重要,即为全人类带来AGI的益处。
I think this is critically important to achieve the mission of OpenAI, which is to deliver the benefits of AGI to all humanity.
你知道,我今天喜欢开玩笑说AI产品——这其实是个半开玩笑的说法——它们实际上非常难用,因为你必须深思熟虑它何时能帮到你。
You know, I like to joke today that AI products, and it's a half joke, they're actually really hard to use because you have to be very thoughtful about when it could help you.
如果你没有主动提示模型来帮助你,它很可能在那时并没有提供帮助。
And if you're not prompting a model to help you, it's probably not helping you at that time.
想想现在普通用户每天会提示AI多少次,可能也就几十次。
And if you think of how many times like the average user is prompting AI today, it's probably like tens of times.
但如果你思考人们实际上能从智能实体获益的次数,每天可能有上千次。
But if you think of how many times people could actually get benefit from a really intelligent entity, it's thousands of times per day.
因此我们Codex项目的一个重要目标是探索:一个真正默认就有帮助的队友代理应该是什么形态?
And so a lot a large part of our our goal with Codex is to figure out, like, what is the shape of an actual teammate agent that is sort of helpful by default?
当人们想到Cursor甚至Claude Code时,会觉得是帮助编码的IDE,能自动补全代码,或许还能完成一些代理工作。
When people think about Cursor and even Claude Code, it's like an IDE that helps you code and kind of autocompletes code and maybe does some agentic work.
但我在这里听到的愿景是不同的,它是作为一个队友的存在。
What I'm hearing here is the vision is is different, which is it's a teammate.
就像一个远程队友,为你编写代码,你可以与之交谈并让它执行任务。
It's like a remote teammate, building code for you, that you talk to and ask to do things.
当然它也具备IDE、自动补全这些基础功能。
And that also does, I mean, IDE, autocomplete, and things like that.
这是否是你们思考Codex方式的一个差异化点?
Is that is that a kind of a differentiator in the way you think about Codex?
核心思想是,我们希望开发者完成任务时,能感觉自己拥有超能力,能够以极快的速度推进工作。
It's basically this idea that, like, we want the way like, if you're a developer and you're trying to get something done, we want you to just feel like you have superpowers and you're able to move much, much faster.
但我们不认为为了获得这些好处,你需要时刻思考如何调用AI来完成特定任务。
But we don't think that in order for you to reap those benefits, you need to be sitting there constantly thinking about, like, how can I invoke AI at this point to do this thing?
我们希望它能无缝融入你的工作流程,在你无需刻意思考时就开始发挥作用。
We want you to be able to sort of like plug it in to the way that you work and have it just start to do stuff without you having to think about it.
好的。
Okay.
关于这方面我有很多问题,目前进展如何?
I have a lot of questions along those lines, just how's it going?
有什么可以分享的Codex使用数据或统计数字吗?
Is there any stats, any numbers you can share about how Codex is doing?
有的。
Yeah.
自八月份GPT-5发布以来,Codex的增长简直呈爆炸式态势。
Codex has been growing absolutely explosively since the launch of GPT-5 back in August.
关于我们如何释放这种增长潜力,确实有些有趣的产品洞察可以分享,如果你感兴趣的话。
There's definitely some interesting product insights to talk about as to how we unlock that growth, if you're interested.
但我们上次公布的数据显示,自八月以来增长已远超10倍。
But the last stat we shared there was we were well over 10x since August.
实际上,自那以后增长已接近20倍。
In fact, it's been like 20x since then.
此外,Codex模型现在每周处理的token数量已达数万亿级别。
Also, the Codex models are serving many trillions of tokens a week now.
它基本上是我们使用最频繁的编程模型。
And it's basically like our most served coding model.
我们发现的一个非常棒的事情是,我们决定组建Codex团队的方式——打造一个紧密集成的产品与研究团队,他们共同迭代模型和框架。
One of the really cool things that we've seen is that the way we decided to set up the Codex team was to build a really tightly integrated product and research team that iterate on the model and the harness together.
事实证明,这种方式能让你做更多尝试,进行更多关于这些组件如何协同工作的实验。
And it turns out that lets you just do a lot more and try many more experiments as to how these things will work together.
我们最初训练这些模型是为了用于我们自主设计的框架,对此我们有着非常明确的理念。
And so we were just training these models for use in our first party harness that we were very opinionated about.
而最近我们开始看到,其他主要的API编程客户也开始采用这些模型。
And then what we've started to see more recently actually is that other major sort of API coding customers are now starting to adopt these models as well.
因此我们实际上已经达到了这样一个阶段:Codex模型在API中也成为了使用最广泛的编程模型。
And so we've reached the point where actually the Codex model is the most served coding model in the API as well.
你刚才暗示了是什么因素促成了这种增长。
You, hinted at this, what unlocked this growth.
我对这个非常感兴趣。
I am extremely interested in hearing that.
之前给我的感觉是...我不太确定。
It felt like before I don't know.
可能这是在你加入团队之前的情况。
Maybe this was before you joined the team.
那时候感觉Claude Code的表现非常出色。
It just felt like Claude Code was killing it.
当时所有人都对Claude Code趋之若鹜。
Just everyone was sitting on top of Claude Code.
它绝对是当时最优秀的编程方式。
It was by far the best way to code.
然后突然间,Codex横空出世。
And then all of a sudden, Codex comes around.
我记得Karpathy发推说他从未见过这样的模型。
I remember Karpathy tweeted that he just, like, has never seen a model like this.
我想那条推特说的是他遇到的最棘手的bug,他花了好几个小时都搞不定。
I think the tweet was about the gnarliest bugs that he runs into, that he just spends hours trying to figure out.
其他方法都解决不了。
Nothing else has solved.
他把问题交给Codex,让它运行一小时,结果就解决了。
He gives it to Codex, lets it run for an hour, and it solves it.
你们到底是怎么做到的?
What did you guys do?
OpenAI这里有一个强烈的使命,基本上就是要构建通用人工智能(AGI)。
We have this strong sort of mission here at OpenAI to, you know, basically build AGI.
所以我们经常思考如何设计一个产品,使其能够规模化发展。
And so we think a lot about how we can shape a product so that it can scale.
对吧?
Right?
早些时候我提到过,比如,嘿。
You know, earlier I was mentioning like, hey.
如果你是一名工程师,你应该每天从AI那里获得数千次的帮助。
Like, if you're an engineer, you should be getting help from AI, like, thousands of times per day.
对吧?
Right?
因此我们在推出Codex的第一个版本——Codex Cloud时,深入思考了实现这一目标的基础架构。
And so we thought a lot about the primitives for that when we launched our first version of Codex, which was Codex Cloud.
那基本上是一个拥有独立计算资源、部署在云端、可以委派任务给它的产品。
And that was basically a product that had its own computer, lived in the cloud, you could delegate to it.
最酷的部分在于你可以并行运行大量任务。
And, you know, the sort of the coolest part about that was you could run many, many tasks in parallel.
但我们遇到的一些挑战是,这种设置相对复杂,既涉及环境配置,比如为模型提供验证变更所需的工具,还要教会它如何以这种方式进行提示。
But some of the challenges that we saw are that it's a little bit harder to set that up, both in terms of environment configuration, like giving the model the tools it needs to validate its changes, and in terms of learning how to prompt in that way.
我用一个队友的比喻来解释这个情况。
And sort of my analogy for this is going back to this teammate analogy.
就像你雇佣了一个队友,但永远不能直接通话,只能通过异步方式反复沟通。
It's like if you hired a teammate, but you're never allowed to get on a call with them and you can only go back and forth, you know, asynchronously over time.
这种方式对某些队友确实有效。
Like, that works for some teammates.
最终这其实会是你希望的主要协作方式。
And eventually, that's actually how you wanna spend most of your time.
这仍然是未来的方向,但初期采用会比较困难。
So that's still the future, but it's hard to initially adopt.
所以我们仍坚持这个愿景:努力让你拥有一个可以委派任务后就能主动工作的AI队友。
And so we still have that vision of like, that's what we're trying to get you to a teammate that you delegate to and then is proactive.
我们正见证这一趋势的增长,但关键突破点在于首先要以更直观、更易获取价值的方式赢得用户青睐。
And we're seeing that growing, but the key unlock is actually first, you need to land with users in a way that's like much more intuitive and like trivial to get value from.
目前绝大多数用户发现Codex的方式,要么是通过下载IDE扩展插件,要么是在CLI中运行它,让智能体在您的电脑上以交互方式协同工作。
So the way that most people discover, like the vast majority of users discover Codex today is either they download an IDE extension or they run it in their CLI and the agent works there with you on your computer interactively.
它运行在沙盒环境中——这项技术非常精妙,既能保障安全,又能获取所有必要的依赖项。
And it works within a sandbox, which is actually a really cool piece of tech to help that be safe and secure, but it has access to all those dependencies.
当智能体需要执行操作时,比如运行某个命令,它可以在沙盒内自主完成。
So if the agent needs to do something, like it needs to run a command, it can do so within the sandbox.
我们完全不需要配置任何环境。
We don't have to set up any environment.
如果遇到沙盒中无法执行的命令,它可以直接向您询问。
And if it's a command that doesn't work in the sandbox, it can just ask you.
这样就能形成使用模型的强力反馈闭环。
And so you can get into this, like, really strong feedback loop using the model.
而我们团队的工作就是逐步将这些反馈闭环转化为产品使用过程中的自然配置,最终实现您可以放心委派任务的目标。
And then over time, our team's job is to help turn that feedback loop into configuration, sort of as a byproduct of using the product, so that you can then be delegating to it down the line.
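As a rough conceptual sketch of the sandbox loop described above (this is purely illustrative, not OpenAI's actual sandbox implementation; the `ALLOWED` set and the `approve` callback are invented for the example), a command runner that executes safe commands on its own and asks the user about everything else might look like:

```python
import shlex
import subprocess

# Commands the sandbox is willing to run without asking.
# Anything else is surfaced to the user for approval.
# This allowlist is invented for illustration only.
ALLOWED = {"ls", "cat", "pytest", "git"}

def run_in_sandbox(command: str, approve=lambda cmd: False):
    """Run an allowlisted command; otherwise ask the user via `approve`.

    Returns the command's stdout, or None if it was blocked.
    """
    program = shlex.split(command)[0]
    if program not in ALLOWED and not approve(command):
        return None  # blocked: the agent would ask the user instead
    result = subprocess.run(
        shlex.split(command), capture_output=True, text=True, timeout=30
    )
    return result.stdout

print(run_in_sandbox("rm -rf /tmp/x"))  # blocked by the allowlist, prints None
```

The point of the sketch is the feedback loop: each time the user approves a command, that approval is a signal the product can eventually fold into configuration, so more work can be delegated without interruption later.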
再打个比方,虽然老生常谈——如果你雇了个新队友,却只给他一台刚买的裸机,他很难开展工作。
And again, analogy, keep going back to it, but if you hire a teammate and you ask them to do work, but you just give them a fresh computer from the store, it's gonna be hard for them to do their job.
对吧?
Right?
但如果你们并肩工作时,你可以说‘哦,你还没我们常用服务的密码’
But if as you work with them side by side, you could be like, oh, you don't have a password for this service we use.
‘给,这是那个服务的密码’
Like, here's the password for this service.
明白吗?
You know?
嗯
Yeah.
别担心
Don't worry.
尽管运行这个命令
Feel free to run this command.
这样他们就能更容易地独立工作好几个小时而不需要你。
Then it's like much easier for them to then go off and do work for hours without you.
所以我听到的是,Codex的最初版本几乎太过超前了。
So what I'm hearing is the initial version of Codex was almost too far in the future.
它就像是云端的一个远程代理,在异步地为你编写代码。
It's like a remote in the cloud agent that's coding for you asynchronously.
而你们所做的就是,好吧,让我们稍微退回来一点。
And what you did is, okay, let's actually come back a little bit.
让我们集成到工程师已经习惯的IDE和本地环境中,帮助他们逐步适应这个新世界。
Let's integrate into the ways engineers already work, in IDEs and locally, and help them kind of on ramp to this new world.
完全正确。
Totally.
这其实非常有趣,因为我们在OpenAI内部大量使用自己的产品。
And this was it was quite interesting because we we dog food products a ton in OpenAI.
你知道,'吃自己的狗粮',意思就是我们使用自己的产品。
So, you know, dog food, as in we use our own product.
因此Codex在过去一整年里持续推动OpenAI加速发展,而云端产品也为公司带来了巨大的加速效应。
And so Codex has been accelerating OpenAI over the course of the entire year and the cloud product was a massive accelerant to the company as well.
事实证明,这是我们内部使用反馈与大众市场信号有所不同的领域之一,因为在OpenAI我们整天训练推理模型。
It just turns out that this was one of those places where the signal we got from dogfooding is a little bit different from the signal you get from the general market, because at OpenAI, we train reasoning models all day.
所以我们对这种即时提示方式非常熟悉,会预先思考,大规模并行运行任务,需要等待一段时间后再异步返回处理结果。
And so we're very used to this kind of prompt thing and think upfront, run things massively in parallel, and it would take some time and then come back to it later asynchronously.
因此现在开发时,我们虽然仍从内部使用中获得大量信号,但也充分意识到不同用户群体使用产品的差异方式。
And so now when we build, we still get a ton of signal from dogfooding internally, but we're also very cognizant of the different ways that different audiences use the product.
这真的很有意思。
That's really funny.
就像既要活在未来,又不能离未来太远。
It's like, live in the future, but maybe not too far in the future.
我能理解OpenAI的每个人都生活在很远的未来,但这种方式并不总是适用于所有人。
And I could see how everyone at OpenAI is living very far in the future, and sometimes that won't work for everyone.
是啊。
Yeah.
那么,关于智能训练数据呢?
What about just, like, intelligence training data?
我不知道。
I don't know.
还有什么其他因素帮助Codex提升了实际编码能力?
Is there something else that helped Codex accelerate its ability to actually code?
是更优质、更干净的数据吗?
Is it better, cleaner data?
主要是模型在进步吗?
Is it more just models advancing?
还有其他什么真正加速了发展吗?
Is there anything else that really helped accelerate?
是的。
Yeah.
这里有几点因素。
So there's a few components here.
我想,你刚才提到了模型,这些模型已经有了巨大的改进。
I guess, you you were mentioning models and the models have improved a ton.
事实上,就在上周三,我们发布了GPT 5.1 Codex Max,一个命名非常准确的模型。
In fact, just last Wednesday, we shipped GPT 5.1 Codex Max, a very, you know, accurately named model.
这真是太棒了。
That is that is awesome.
它的棒之处在于,对于你之前用GPT-5.1 Codex执行的任何任务,它的完成速度大约快30%,同时还释放了大量智能潜力。
It is awesome both because, for any given task that you were using GPT-5.1 Codex for, it's, like, you know, roughly 30% faster at accomplishing that task, but also it unlocks a ton of intelligence.
如果你在更高推理层级使用它,它就会显得更加智能。
So if you use it at our higher reasoning levels, it's just like even smarter.
就像你提到的那个反馈或推文,卡帕西说的那样,'把你们最棘手的bug交给我们'。
And that feedback, or that tweet you were mentioning that Karpathy made, like, hey, give us your gnarliest bugs.
显然当前市场上正在发生很多事,但Codex Max确实肩负着解决最困难bug的使命。
Like, you know, obviously there's a ton going on in the market right now, but like Codex Max is definitely like carrying that mantle of tackling the hardest bugs.
这真是太酷了。
So that is super cool.
但我想说的是,我们对此的思考方式正在逐渐演变——从单纯考虑模型本身、训练最佳模型,转向更全面地思考什么是真正的智能体。
But I will say it's like, some of how we're thinking about this is evolving a little bit from being like, yeah, we're just going to think about the model and let's just train the best model to really thinking about what is an agent actually overall.
对吧?
Right?
我不会试图精确定义智能体,但至少我们认为其架构包含:一个非常聪明的推理模型,它擅长处理特定类型的任务。
And, you know, I'm not gonna try to define agent exactly, but at least the stack that we think of it as having is it's like, you have this model, really smart reasoning model that knows how to do a specific kind of task really well.
我们可以聊聊如何实现这一点。
So we can talk about how we make that possible.
但实际上我们需要通过API将该模型部署到一个框架中,这两方面在此都扮演着极其重要的角色。
But then actually we need to serve that model through an API into a harness, and both of those things also have a really big role here.
举个例子,他们引以为豪的一点是能让GPT 5.1 Codex Max持续运行超长时间。
So for instance, one of the things, they're really proud of is you can have GPT 5.1 Codex Max work for really long periods of time.
这虽非常态,但你可以进行设置使其实现,或者这种情况也可能自然发生。
That's not like normal, but you can set it up to do that, or that might happen.
但现在我们经常听到用户反馈说,它连续运行了一整夜或持续工作了二十四小时。
But now routinely we'll hear about people saying like, yeah, it ran like overnight or it ran for twenty four hours.
所以,要让一个模型持续工作那么长时间,它肯定会超出上下文窗口的限制。
And so, you know, for a model to work continuously for that amount of time, it's gonna exceed its context window.
对此我们有一个解决方案,我们称之为'压缩'。
And so we have a solution for that, which we call compaction.
但压缩功能实际上需要用到整个技术栈的三个层面。
But compaction is actually a feature that uses like all three layers of that stack.
首先需要模型本身具备压缩的概念,当接近上下文窗口限制时,能够准备切换到新的上下文窗口。
So you need to have a model that has a concept of compaction and knows, like, okay, as I start to approach this context window, I might be asked to, like, prepare to be running in a new context window.
在API层面,需要设计能理解这个概念并提供相应接口的API。
And then at the API layer, you need an API that, like, understands this concept and, like, has an endpoint that you can hit to do this change.
在应用框架层面,需要框架能准备好执行这个操作所需的数据负载。
And at the harness layer, you need a harness that can, like, prepare the payload for this to be done.
因此,这个让Codex用户都能使用的压缩功能,其实是三个层面协同工作的成果。
And so, like, shipping this compaction feature that now just, like, made this behavior possible to, like, anyone using Codex actually meant working across all three things.
我认为这种跨层协作的趋势会越来越明显。
And I think that's increasingly gonna be true.
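下面用一小段Python示意上文描述的"压缩"思路(仅为假设性示意,并非Codex的真实实现;其中的Token计数方式、阈值和摘要函数都是虚构的):
A rough Python sketch of the compaction idea described above (purely illustrative, not Codex's actual implementation; the token counting, threshold, and summarize helper are invented stand-ins):

```python
# Illustrative sketch of compaction: when the transcript nears the context
# window limit, older turns are folded into a summary so the agent can keep
# working in a fresh window. All constants and helpers here are invented.

CONTEXT_LIMIT = 100        # pretend token budget for the context window
COMPACT_AT = 0.8           # compact once the window is ~80% full

def tokens(msg: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace word.
    return len(msg.split())

def summarize(messages: list[str]) -> str:
    # Stand-in for a model-written summary of the older context.
    return f"[summary of {len(messages)} earlier messages]"

def append_with_compaction(history: list[str], msg: str) -> list[str]:
    history = history + [msg]
    if sum(tokens(m) for m in history) >= CONTEXT_LIMIT * COMPACT_AT:
        # Keep the latest turn verbatim; fold everything older into a summary.
        history = [summarize(history[:-1]), history[-1]]
    return history

history: list[str] = []
for i in range(50):
    history = append_with_compaction(history, f"turn {i}: ran tests and fixed a bug")

# The transcript stays bounded even after 50 turns.
```

The point of the sketch is only the shape of the loop: the agent never exceeds its window because older context is repeatedly collapsed, which is what lets a run continue overnight.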
另一个可能被低估的例子是,如果你看看市面上所有不同的编程产品,它们都配备了截然不同的工具套件,对模型应该如何运作有着非常不同的理念。
Another maybe underappreciated version of this is if you think about all the different coding products out there, they all have very different tool harnesses with very different opinions on how the model should work.
因此,如果你想训练一个模型能适应各种不同的工作方式,比如你可能坚信它应该通过语义搜索来运作。
And so if you wanna train a model to be good at all the different ways it could work, like, maybe you have a strong opinion that it should work using semantic search.
对吧?
Right?
或许你强烈认为它应该调用定制工具。
Maybe you have a strong opinion that it should call bespoke tools.
或者像我们这样,强烈主张它应该直接使用终端shell进行操作。
Or maybe you have, like, in our case, a strong opinion that it should just use, like, the shell and work in the terminal.
要知道,如果只针对其中一种场景优化,开发速度可以快得多。
You know, you can move much faster if you're just optimizing for one of those worlds.
对吧?
Right?
因此我们构建Codex的方式是让它仅使用shell,但为了确保安全,我们为模型设计了一个沙箱运行环境。
And so the way that we built Codex is that it just uses the shell, but in order to make that, like, safer and secure, we have a sandbox that the model is used to operating in.
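这里用一小段Python示意这种"沙箱加审批升级"的思路(纯属示意:允许列表和审批回调都是虚构的;真实的Codex沙箱依赖操作系统层面的隔离,而非这样的简单过滤):
A tiny Python sketch of that sandbox-plus-escalation idea (purely illustrative: the allowlist and approval callback are invented; the real Codex sandbox relies on OS-level isolation, not a simple filter like this):

```python
# Toy policy gate in the spirit of a sandboxed shell: assumed-safe, read-only
# commands run freely; anything else is escalated to the user, matching the
# "it can just ask you" behavior described earlier in the conversation.

import shlex

READ_ONLY = {"ls", "cat", "grep", "rg"}  # invented allowlist, not Codex's policy

def request_command(command: str, approver=None) -> str:
    program = shlex.split(command)[0]
    if program in READ_ONLY:
        return "run-in-sandbox"          # would execute inside the sandbox
    # Outside the sandbox policy: the agent asks the human for approval.
    if approver is not None and approver(command):
        return "run-with-approval"
    return "blocked"
```

For example, `request_command("ls -la")` runs in the sandbox, while `request_command("rm -rf build")` stays blocked unless the approver callback says yes.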
所以我认为,回到最初问题的最大加速器之一就是:我们正在并行构建这三样东西,不断调整每一项,并持续实验这些元素如何与紧密结合的产品和研究团队协同工作。
So I think one of the biggest accelerants, to go all the way back to your question, is just, like, we're building all three things in parallel and tuning each one and constantly experimenting with how those things work, with, like, a tightly integrated product and research team.
你认为在这个领域如何取胜?
How do you think you win in this space?
你觉得这会演变成一场持续的模型竞赛吗?就像各模型不断超越彼此那样?
Do you think it'll always be this kinda, like, race with other models constantly kind of leapfrogging each other?
你认为是否存在一种可能,某人遥遥领先而其他人永远无法追赶?
Do you think there's a world where someone just runs away with it and no one else can ever catch up?
是否存在一条直接通向胜利的道路?
Is there, like, a path to just we win?
这又回到了构建队友的理念上。
Again, comes back to this idea of, like, building a teammate.
不仅是参与团队规划和优先级排序的队友,不仅是真正测试代码并协助维护部署的队友,甚至——如果再次以工程师队友为例——他们还能安排日历邀请、调整站会时间或处理各种事务。
And not just a teammate that, you know, participates in team planning and prioritization, not just a teammate that really tests its code and helps you maintain and deploy, but even a teammate, like if you think again, an engineering teammate, they can also schedule a calendar invite, or move stand up, or do whatever.
因此在我看来,如果我们设想每天或每周都有研究实验室部署某种惊人的新能力...
And so in my mind, if we just imagine that every day or every week, some crazy new capability is just going to be deployed by a research lab.
作为人类,我们根本不可能跟上并运用所有这些技术。
It's just impossible for us as humans to keep up and use all this technology.
所以我认为我们需要进入这样一个世界:你只需拥有一个AI队友或超级助手,你可以直接与它对话,它会自主地知道如何提供帮助。
And so I think we need to get to this world where you kind of just have an AI teammate or a super assistant that you just talk to, and it just knows how to be helpful, like, on its own.
对吧?
Right?
这样你就不必去阅读最新使用技巧了。
And so you don't have to be, like, reading the latest tips for how to use it.
你只需接入它,它就能提供帮助。
You just like, you've plugged it in, and it just provides help.
这就是我认为我们正在构建的产品形态,如果能够实现,我认为这将是一个非常具有粘性的成功产品。
And so that's kind of the shape of what I think we're building, and I think that will be, like, a very sticky, like, winning product if we can do so.
所以在我脑海中形成的构想是——也许一个有趣的话题是:聊天是否是AI的正确交互方式?
So the shape that in my head, at least I have, is that we build you know, maybe a fun topic is, is chat the right interface for AI?
我认为当你不知道该如何使用它时,聊天是一种非常好的交互方式。
I think chat is a very good interface when you don't know what you're supposed to use it for.
就像我在Teams或Slack上和队友交流时那样,聊天功能非常好用。
In the same way that if I think of, like, I'm, like, on Teams or in Slack with a teammate, chat is pretty good.
我可以提出任何我想要的要求。
I can ask for whatever I want.
对吧?
Right?
它就像是适用于所有场景的通用解决方案。
It's, like, kind of the common denominator for everything.
所以你可以和超级助手聊任何话题,无论是编程还是其他事情。
So you can chat with a super assistant about whatever topic you want, whether it be coding or not.
如果你是某个特定领域(比如编程)的功能专家,还可以调出GUI界面进行深度操作,比如查看代码、处理代码。
And then if you are like a functional expert in a specific domain such as coding, there's like a GUI that you can pull up to go really deep and like, look at the code and like work with the code.
因此我认为OpenAI需要构建的就是这种理念:拥有ChatGPT这样的工具,让每个人都能随时随地使用。
So I think what we need to build as OpenAI is basically this idea of you have Chat, ChatGPT, and that is a tool that's ubiquitously available to everyone.
你甚至会在工作之外也开始使用它,对吧,就为了获得帮助。
You start using it even outside of work, right, to just help you.
你会逐渐习惯被AI加速的想法。
You become very comfortable with the idea of being accelerated with AI.
于是当你工作时,你就能很自然地想:'对,我直接问它要这个就行'。
And so then you get to work and you just can naturally just, yeah, I'm just gonna ask it for this.
我不需要了解所有连接器或各种功能细节。
And I don't need to know about all the connectors or, like, all the different features.
我直接向它求助,它会根据当下情况以最佳方式提供帮助,甚至可能在我没求助时主动介入。
I'm just gonna ask it for help, it'll surface to me the best way that it can help at this point in time, and maybe even chime in when I didn't ask it for help.
所以在我看来,如果能实现这一点,我认为这才是打造成功产品的真正路径。
So in my mind, if we can get to that, I think that's, you know, that's how we we really build, like, the winning product.
这很有趣,因为我和ChatGPT负责人尼克·特利(Nick Turley)聊天时,他提到ChatGPT最初的名字是'超级助手'之类的。
This is so interesting because in my chat with Nick Turley, the head of ChatGPT, I think he shared that the original name for ChatGPT was super assistant or something like that.
是的。
Yeah.
有趣的是存在两种路径:一种是超级助手方向,另一种是Codex方向。
And it's interesting that there's, like, that approach to the super assistant, and then there's this codex approach.
这就像是B2C版本和B2B版本的区别。
It's almost like the b to c version and the b to b version.
我听到的理念是:你先从编码和构建开始,然后它会帮你处理其他所有事情,比如安排会议、在Slack发消息,甚至交付设计。
And what I'm hearing is the idea here is, okay, you start with coding and building, and then it's doing all this other stuff for you, scheduling meetings, I don't know, probably posting in Slack, I don't know, shipping designs.
我不知道。
I don't know.
这是否意味着,就像ChatGPT的商业版?
Is the idea there that this is, like, the business version of ChatGPT in a sense, or is there something else there?
是的。
Yeah.
所以我们在讨论大约一年内的发展前景。
So, you know, we're getting to the, like, one-year time horizon conversation.
很多事可能会更快发生,但按模糊度来说,我认为是一年。
A lot of this might happen sooner, but in terms of fuzziness, I think we're at the one year.
我会提出一个论点和一个可能实现的路径。
So I'll give you, like, a contention and a plausible way we get there.
至于具体如何实现,谁知道呢?
But as for how it happens, who knows?
简而言之,如果我们要构建一个超级助手,它必须能执行各种任务。
So basically, if we're gonna build a super assistant, it has to be able to do things.
我们将拥有一个模型,它能做些影响你世界的事情。
So we're gonna have a model and it's gonna be able to do stuff affecting your world.
过去一年左右我们获得的一个认知是:要让模型发挥作用,当它们能使用计算机时效率会高得多。
And one of the learnings I think we've seen over the past year or so is that for models to do stuff, they are much more effective when they can use a computer.
对吧?
Right?
好的。
Okay.
所以现在我们需要一个能使用计算机的超级助手,或者说能使用多台计算机。
So now we're like, okay, we need the super assistant that can use a computer, right, or many computers.
现在的问题是:它该如何使用计算机呢?
And now the question is, okay, well, how should it use the computer?
对吧?
Right?
使用电脑的方式有很多种。
And there's lots of ways to use a computer.
你知道,你可以尝试黑入操作系统,比如使用无障碍API。
You know, you could try to hack the OS and, like, use accessibility APIs.
或许更简单的方法是直接点击操作。
Maybe a bit easier is you could point and click.
这种方式有点慢,而且有时不太可靠。
That's a little slow and, unpredictable sometimes.
而另一种方式,事实证明模型使用电脑的最佳方式就是编写代码。
And another way, it turns out the best way for models to use computers is simply to write code.
所以我们逐渐得出一个观点:如果你想构建任何智能体,或许应该构建一个编码智能体。
And so we're kind of getting to this idea where like, well, if you want to build any agent, maybe you should be building a coding agent.
对于非技术用户来说,他们甚至不会意识到自己正在使用编码智能体,就像没人会思考自己是否在使用互联网一样。
And maybe to the user, a non technical user, they won't even know they're using a coding agent, the same way that no one thinks about, are they using the internet or not?
他们最多只会问,Wi-Fi开了吗?
It's just they're more just like, is Wi Fi on?
对吧?
Right?
所以我认为我们正在用Codex打造一个软件工程队友。
So I think that what we're doing with Codex is we're building a software engineering teammate.
作为其中的一部分,我们正在构建一个能通过编写代码来操作计算机的智能体。
And as part of that, we're kind of building an agent that can use a computer by writing code.
所以我们已经看到了一些这方面的需求。
And so we're already seeing some pull for this.
虽然还很早期,但我们开始看到人们将Codex用于编码相关的产品用途。
It's quite early, but we're starting to see people who are using Codex for coding adjacent product purposes.
随着这一趋势的发展,我认为我们会自然而然地发现,哦,原来只要能用编码解决问题,就应该让代理写代码。即使你在做财务分析,也可以写些代码。
And so as that develops, I think we'll just naturally see that, oh, it turns out we should just always have the agent write code if there is a coding way to solve a problem. Even if you're doing a financial analysis, maybe write some code.
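顺着"连财务分析也可以写代码"的说法,这里给一个极简示例(数据纯属虚构),展示智能体可以用几行脚本而非点击表格界面来回答问题:
Following the "even a financial analysis, maybe write some code" remark, here is a minimal example (with made-up numbers) of the kind of script an agent could write instead of clicking through a spreadsheet:

```python
# Invented quarterly revenue figures; the question "what was the
# quarter-over-quarter growth?" is answered with code rather than UI clicks.

revenue = {"Q1": 120, "Q2": 150, "Q3": 135, "Q4": 180}

quarters = list(revenue)
growth = {
    later: round((revenue[later] - revenue[earlier]) / revenue[earlier] * 100, 1)
    for earlier, later in zip(quarters, quarters[1:])
}
# growth is {"Q2": 25.0, "Q3": -10.0, "Q4": 33.3}
```

Because the answer is code, it is also composable: the same dictionary could feed a chart, a report, or a follow-up question, which is the advantage over point-and-click.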
所以基本上,就像是在问:这是不是ChatGPT超级助手产品的两个发展方向?
So basically, we're like, hey, is this the two ends of this product for the super assistant of ChatGPT?
在我看来,编程是包括ChatGPT在内的任何智能体的核心能力。
In my mind, just coding is a core competency of any agent, including ChatGPT.
因此我们真正在构建的正是这种能力。
And so really what we think we're building is that competency.
智能体编写代码最酷的地方在于你可以导入代码。
So here's the really cool thing about agents writing code: you can import code.
代码具有可组合性和互操作性。
Code is composable, interoperable.
对智能体最简化的理解可能是:给它一台电脑,它就会点击操作四处探索。
One very reductive view we could have for an agent is it's just gonna be given a computer and it's just gonna like point and click and go around.
但这正是未来。
But that is the future.
至于如何实现这个目标,很难规划具体路径,因为构建智能体的许多问题不在于'它能否做到'
And then how we get there is difficult to sort of chart a path because a lot of the questions around building agents aren't like, can the agent do it?
而更多在于'我们如何帮助智能体理解其工作场景'
But it's more about, well, how can we help the agent understand the context that it's working in?
就像使用它的团队那样,他们有自己的做事偏好方式。
And like the team that's using it, you know, has a way that they like to do things.
他们有指导原则。
They have guidelines.
他们可能希望关于智能体能做什么或不能做什么有某些确定性保证。
They probably want certain deterministic guarantees about what the agent can or cannot do.
他们想知道智能体是否理解这类细节,举个例子,如果我们查看一个崩溃报告工具,为其连接器时,每个子团队可能都有不同的元提示来说明他们希望如何分析崩溃情况。
They wanna know that the agent understands this sort of detail. An example would be, you know, if we're looking at a crash reporting tool, hitting a connector for it, every sub team probably has a different meta prompt for how they want the crashes to be analyzed.
于是我们开始面临这样的情况:是的,我们有一个坐在电脑前的智能体,但我们需要为团队或用户提供可配置性。
And so we start to get to this thing where, yeah, we have this agent sitting in front of a computer, but we need to make that configurable for the team or for the user.
对于智能体经常执行的任务,我们可能希望直接将其构建为该智能体具备的固定能力。
Stuff that the agent does often, we probably just want to build in as a competency that this agent has that it can do.
所以我认为最终我们会实现你所描述的这种通用性——一个能为任何需求自主编写脚本的智能体。
So I think we end up with this generalizable thing that you were saying of an agent that can just write its own scripts for whatever it wants to do.
但关键在于,我们能否将智能体高频执行或擅长的事项存储记忆,使其无需重复编写脚本?
But I think that the really key part here is can we make it so that everything that the agent has to do often or that it does well, we can just remember and store so that the agent doesn't have to write a script for that again, right?
或者如果我刚加入一个团队,而你已经和我同队,我就可以直接使用那些智能体已经写好的脚本。
Or maybe if I just joined a team and you are already on the same team as me, I can just, like, use all those scripts that the agents have written already.
是的。
Yeah.
这就像,如果这是我们的队友,他们可以分享从与公司其他人合作中学到的东西。
That's like, if this is our teammate, they can share things they've learned from working with other people at the company.
这个比喻很贴切。
It just makes sense as a metaphor.
没错。
Yeah.
感觉你今天属于Karpathy阵营,认为现在的智能体并不出色,大多是粗制滥造,也许未来会变得很棒。
It feels like you're in the Karpathy camp of agents today are not that great and mostly slop, and maybe in the future, they'll be awesome.
你觉得呢?
Does that resonate?
我想是的,我认为编程智能体已经相当出色了。
I think coding agents are pretty great.
老兄,我觉得它价值巨大。
Man, I feel it's a ton of value.
至于编码之外的智能体,目前还处于非常早期的阶段。
And then I think agents outside of coding, it's still very early.
这只是我的个人观点,但我认为一旦它们能以组合方式运用编码能力,表现会大幅提升。
And this is just my opinion, but I think they're gonna get a whole lot better once they can use coding too in a composable way.
为软件工程师开发产品时,这部分特别有趣。
It's kind of the fun part of when you're building for software engineers.
在我创业期间,很长一段时间我们也是为软件工程师打造产品。
In my startup, we were building for software engineers too for a lot of that journey.
他们是最有趣的用户群体,因为他们自己也热衷开发,在思考技术应用时往往比我们更具创造力。
And they're just such a fun audience to build for because they also like building for themselves and are often even more creative than we are in thinking about how to use the technology.
因此,为软件工程师构建产品时,你能观察到大量自发行为,以及应该融入产品的功能和改进点。
And so, like, by building for software engineers, you get to just observe a ton of emergent behaviors and, like, things that you should do and build into the product.
我很喜欢你这种说法,因为很多为工程师开发产品的人会感到非常恼火,因为工程师总是在抱怨。
I love how you say that because a lot of people building for engineers get really annoyed, because the engineers are just always complaining about stuff.
他们总是说‘这太烂了’。
They're like, that sucks.
‘你们为什么要这样设计?’
So why'd you build it this way?
我很高兴你喜欢它,但我想这可能是因为你为工程师们打造了一款真正能解决问题、甚至能替他们编程的出色工具。
I love that you enjoy it, but I think it's probably because you're building such an amazing tool for engineers that can actually solve problems and just, you know, code for them.
顺着这个思路,你知道的,人们总在讨论工作、工程师和编程的未来会怎样。
Kinda along those lines, you know, there's always this talk of what will happen with jobs, engineers, coding.
人们还需要学习编程吗?诸如此类的问题。
Do you have to learn coding, all these things?
显然,按照你的描述,它更像是一个队友。
Clearly, the way you're describing it is it's a teammate.
它会与你协作,让你变得更强大。
It's gonna work with you, make you more superhuman.
它不会取代你。那么,按照你的思考方式,拥有一个超级智能的工程队友会对工程领域产生什么影响?
It's not gonna replace you. So what's the way you think about the impact on the field of engineering of having a super intelligent engineering teammate?
我认为这件事有两面性。
I think there's there's two sides to it.
但我们刚才讨论的是这样一种观点:也许每个智能体都应该实际使用代码,成为一个编程智能体。
But the one we were just talking about is this idea that maybe every agent should actually use code and be a coding agent.
在我看来,这只是这个更广泛理念的一小部分——随着代码变得更加无处不在,即使在AI之前,你也可以说它现在已经无处不在,对吧?
And in my mind, that's just like a small part of this broader idea that like, Hey, as we make code even more ubiquitous, I mean, you could probably claim it's ubiquitous today, even pre AI, right?
但随着代码变得更加无处不在,它实际上将被用于更多目的。
But as you make code even more ubiquitous, it's actually just going to be used for many more purposes.
因此,对具备这种能力的人——即掌握这项技能的人类——的需求将会大幅增加。
And so there's just going to be a ton more need for people with this, like humans with this competency.
这就是我的观点。
So that's my view.
我认为这是一个相当复杂的话题。
I think this is like, quite a complex topic.
所以,你知道,这是我们经常讨论的事情,我们需要观察它的发展。
So, you know, it's something we talk about a lot, and we have to kind of see how it pans out.
但我认为我们作为这个领域的产品团队能做的,就是始终思考如何打造一个工具,让它感觉像是在最大限度地加速人们的工作,而不是制造一个让人类更不清楚该做什么的工具。
But I think what we can do, basically as a product team building in this space, is just try to always think about how we're building a tool so that it feels like we're, like, maximally accelerating people, you know, rather than building a tool that makes it more unclear what you should do as the human.
我想举个例子,现在当你与编码代理合作时,它会编写大量代码,但事实证明编写代码其实是许多软件工程师工作中最有趣的部分。
I think to give an example right now, nowadays when you work with a coding agent, it writes a ton of code, but it turns out writing code is actually one of the most fun parts of software engineering for many software engineers.
结果你却要花时间审查AI生成的代码。
And then you end up reviewing AI-written code.
对吧?
Right?
而这通常是许多软件工程师工作中不那么有趣的部分。
And that's often a less fun part of the job for many software engineers.
对吧?
Right?
因此我确实认为,我们看到这种情况在各种微观决策中不断上演。
And so I actually think, like, we see that this comes up, plays out all the time in a ton of micro decisions.
所以我们作为产品团队一直在思考:如何让这件事变得更有趣?
So we, as a product team, are always thinking about, okay, how do we make this more fun?
如何让你感觉更有掌控感?
How do we make you feel more empowered?
哪些环节还不奏效?
Where is it not working?
我认为审查AI编写的代码就是当前比较无趣的一个环节。
And I would argue that reviewing agent written code is a place that today is like less fun.
于是我在思考:我们能为此做些什么?
And so then I think, okay, what can we do about that?
我们可以推出代码审查功能,帮助你增强对AI编写代码的信心。
Well, we can ship a code review feature that helps you build confidence in the AI written code.
好的,很棒。
Okay, cool.
我们还可以做的是让代理能更好地验证自己的工作成果。
Another thing we could do is we can make it so that the agent's better able to validate its work.
这甚至深入到微观决策层面。
And it gets all the way down into micro decisions.
如果你要让代理具备验证工作的能力,比如说现在我想到了Codex Web,你有一个面板可以反映代理所做的工作。
If you're gonna have an agent capability to validate work, and let's say you have, I'm thinking of Codex Web right now, you have a pane that sort of reflects the work the agent did.
你首先会看到什么?
What do you see first?
是看到代码差异对比,还是看到它编写代码的图像预览?
Do you see the diff or do you see the image preview of the code it wrote?
我认为如果你从这个角度思考:如何赋能人类?
I think if you're thinking about this from the perspective of, how do I empower the human?
如何让他们感受到最大程度的加速?
How do I make them feel as accelerated as possible?
你显然会先看到图像对吧?
You obviously see the image first, right?
除非先看过图像,否则你不应该直接审查代码——除非这些代码已经经过AI审核。
You shouldn't be reviewing the code until you've first, you know, seen the image, and maybe until it's been, like, reviewed by an AI.
现在该你查看一下了。
Now it's time for you to take a look.
当我邀请Cursor公司CEO迈克尔·特鲁尔(Michael Truell)上播客时,他提出了我们正在向超越代码的领域迈进的观点。
When I had Michael Truell, CEO of Cursor, on the podcast, he had this kind of vision of us moving to something beyond code.
我观察到一种称为'规范驱动开发'的趋势兴起,你只需编写规范,AI就会自动为你生成代码。
And I've seen this rise of something called spec driven development where you kinda just write the spec and then the code, you know, the AI writes code for you.
因此你开始在更高的抽象层次上工作。
And so you kinda start working at this higher abstraction level.
你是否认为这就是未来方向——工程师不再需要实际编写或查看代码,我们将专注于更高层次的抽象?
Is that something you see where we're going just like engineers not having to actually write code or look at code, and there's gonna be this higher level of abstraction that we focus on?
是的。
Yeah.
我认为这些抽象层级是持续存在的,而且实际上在今天就已经有所体现了。
I mean, I think there's, like, constantly these levels of abstraction, and they're actually already playing out today.
对吧?
Right?
就像今天,编程代理大多是基于提示进行补丁操作,对吧?
Like, today, like coding agents, mostly it's like prompts to patch, right?
我们开始看到人们尝试规范驱动开发或计划驱动开发。
We're starting to see people doing, like, spec-driven development or, like, plan-driven development.
这实际上就是当人们问'如何在超长任务中运行Codecs'时的解决方案之一。
That's actually one of the answers when people ask, like, hey, how do you run Codex on a really long task?
通常的做法是先与系统协作制定计划。
Well, it's like, often you collaborate with it first to write, like, a plan.md.
比如创建一个Markdown文件作为你的计划书。
Like, a markdown file that's your plan.
当你对这个计划满意后,就可以让AI去执行具体工作。
And once you're happy with that, then you ask it to go off and do work.
如果这个计划包含可验证的步骤,它就能处理更长时间的任务。
If that plan has verifiable steps, it'll like work for much longer.
所以我们完全看到了这一点。
So we're totally seeing that.
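上面说的"带可验证步骤的计划"大致可以这样示意(计划的Markdown格式和各项检查都是为演示虚构的):
The "plan with verifiable steps" workflow above could be sketched like this (the plan's markdown format and the checks are invented for illustration):

```python
# Toy plan runner: parse a markdown checklist where each step names a
# verification check, and only mark steps done when their check passes.

PLAN_MD = """\
- [ ] write failing test | check: tests_exist
- [ ] implement feature | check: tests_pass
- [ ] update changelog | check: changelog_updated
"""

# Stand-ins for real verification commands (e.g. actually running the tests).
CHECKS = {"tests_exist": True, "tests_pass": True, "changelog_updated": False}

def run_plan(plan_md: str) -> list[str]:
    results = []
    for line in plan_md.splitlines():
        if not line.startswith("- [ ]"):
            continue
        step, check = (part.strip() for part in line[5:].split("| check:"))
        mark = "x" if CHECKS.get(check) else " "  # check off only verified steps
        results.append(f"- [{mark}] {step}")
    return results

completed = run_plan(PLAN_MD)
```

The design point is the verifiability: because each step carries its own check, the agent can tell on its own whether it is making progress, which is what lets it work for much longer without supervision.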
我认为规范驱动开发是个有趣的想法。
I think spec driven development is an interesting idea.
我不确定这种方式是否可行,因为很多人也不喜欢写规范。
It's not clear to me that it'll work out that way because a lot of people don't like writing specs either.
但似乎有些人会采用这种方式工作。
But it seems plausible that some people will work that way.
不过有个略带玩笑的想法是,想想现在很多团队的工作方式,他们通常不一定有规范,但团队自我驱动力很强,所以事情自然就完成了。
A bit of a joke idea though is if you think of the way that many teams work today, they often don't necessarily have specs, but the team is just really self driven and so stuff just gets done.
所以我临时想到这个不太好的名称——'闲聊驱动开发',就像事情都是在社交媒体和团队通讯工具中发生的。
So almost that is like, I'm coming up with this on the spot, so it's not a good name, but chatter driven development where it's just like stuff is happening on social media and in your team communications tools.
结果就是代码被编写并部署了。
And then as a result, code gets written and deployed.
所以是的,我觉得我更倾向于这种方式,我甚至不一定非要写规范。
So yeah, I think I'm a little bit more oriented in that way of, I don't even necessarily want to have to write a spec.
有时我会想写规范,但仅限于我喜欢写的时候。
Sometimes I want to, only if I like writing specs.
其他时候我可能只想说:'嘿,这是客服渠道,告诉我有什么值得关注的点。'
Other times I might just want to say, Hey, here's the customer service channel and tell me what's interesting to know.
但如果是小问题,直接修复就好。
But if it's a small bug, just fix it.
我不想为此专门写规范。
I don't wanna have to write a spec for that.
我有个喜欢用来激发讨论的假想未来场景:当我们拥有真正强大的智能代理时,独立开发者会是什么样子?
I have this hypothetical future that I like to share sometimes with people as a provocation, which is: in a world where we have, like, truly amazing agents, what does it look like to be a solopreneur?
有个糟糕的设想是:实际上会存在一个手机应用。
And, you know, one terrible idea for how it could look is that there's actually a mobile app.
代理的每个想法都会以竖屏视频形式出现在你手机上。
And every idea that the agent has is just, like, a vertical video on your phone.
你可以左滑表示这是个坏主意,右滑表示好主意。
And then you can, like, swipe left if you think it's a bad idea, and you can, like, swipe right if it's a good idea.
比如,你可以长按手机说话,在滑动前获取关于这个想法的反馈。
And, like, you can press and hold and, like, speak to your phone if you wanna get feedback on the idea before you swipe.
你知道吗?
You know?
在这个世界里,你的工作基本上就是把这个应用接入每一个信号系统、记录系统,然后你就可以坐享其成,只管滑动。
And in this world, like, basically, what your job is just to, like, plug in this app into, like, every single, like, signal system, you know, system of record, and then you just sort of sit back and, like, swipe.
我不知道。
I don't know.
我太喜欢这个了。
I love this.
所以这就是Tinder遇上TikTok再遇上Codex。
So this is, Tinder meets TikTok meets Codex.
这想法挺糟糕的。
It's pretty terrible.
不。
No.
这太棒了。
This is great.
所以这个想法是,这个智能体在观察、倾听你,关注市场和用户动态,然后它就会觉得'酷'。
So the idea here is this agent is watching and, right, listening to you, paying attention to the market, your users, and it's like, cool.
这是我应该做的事情。
Here's something I should do.
就像一个积极主动的工程师,直接说'看这个'。
It's like a proactive engineer just like, here.
我们应该开发这个功能并修复这个问题。
We should build this feature and fix this thing.
没错。
Exactly.
我认为这正是我们要做的。
I think that's what it's really gonna do.
以最基础的方式与你交流,就像这样。
Communicating with you in, like, the lowest, like Yeah.
没错。
Yeah.
就像现代沟通方式那样。
Like, the modern way to communicate.
对。
Yeah.
左右滑动和垂直信息流。
Swipe left to right and and vertical feed.
然后是Sora视频。
And then the Sora video.
好的。
Okay.
现在我明白这一切是如何联系起来的了。
So I see how this all connects now.
我明白了。
I see.
是啊。
Yeah.
明确一下,我们并不是在构建那个东西,你知道的,这只是个有趣的想法。
To be clear, we're not building that, like, you know, it's a fun idea.
我是说,你看,在这个例子中,它正在做的事情之一就是接收外部信号。
I mean, you see, you know, like, in this example, though, like, one of the things that it's doing is it's consuming external signals.
对吧?
Right?
我认为另一个非常有趣的点是,如果我们想想迄今为止最成功的人工智能产品是什么?
I think the other really interesting thing is, like, if we think about, like, what is the most successful, like, AI product to date?
说来有趣,虽然不想把事情搞混,但第一次我们使用Codex品牌和OpenAI时,实际上是为GitHub Copilot提供动力的模型。
I would argue, it's funny, actually, not to confuse things at all, but, like, the first time we used the brand Codex at OpenAI was actually the model powering GitHub Copilot.
这要追溯到很久以前,好几年前的事了。
This is, like, way back in the day, years ago.
所以我们最近决定重新使用Codex这个品牌,因为它实在太妙了:Codex,即code execution(代码执行)。
So we decided to reuse that brand recently because it's just so good, you know: Codex, code execution.
但我认为实际上,IDE中的自动补全功能是当今最成功的人工智能产品之一。
But I think actually, like, auto completion and IDEs is, like, one of the most successful AI products today.
它最神奇的地方在于,当它能正确提供帮助时能极大加速你的工作,即使出错也不会让人特别烦躁。
And part of what's so magical about it is that it can surface ideas for helping you really rapidly: when it's right, you're accelerated; when it's wrong, it's not, like, that annoying.
虽然有时确实会让人恼火,但还不至于太糟糕。
It can be annoying, but it's not that annoying.
对吧?
Right?
这样你就能创建这种能根据你当前操作意图进行上下文响应的智能辅助系统。
And so you can create this, like, proactive system that's contextually responding to what you're attempting to do.
因此在我看来,这对我们OpenAI正在构建的东西来说是个非常有趣的方向。
And so in my mind, this is a really interesting thing for us at OpenAI as we're building.
举个例子,当我考虑推出浏览器时——就像我们之前用Atlas做的那样,对吧?
So for instance, when I think about launching a browser, which we did with Atlas, right?
在我看来,我们能做的一件非常有趣的事情是,可以根据上下文情境为你日常工作中提供帮助的方式。
Like in my mind, one of the really interesting things we can then do is we can then like contextually surface like ways that we can help you as you're going about your day.
这样我们就突破了局限,不再只是查看代码或停留在终端里,而是意识到一个真正的团队成员要处理的远不止代码。
And so we break out of this, we're just looking at code or we're just in your terminal into this idea that, hey, a real teammate is dealing with a lot more than just code.
他们还需要处理大量网络内容相关的事务。
They're dealing with a lot of things that are web content.
我们该如何在这方面帮助你呢?
You know, how can we help you with that?
天啊,这里面的内容太丰富了。
Man, there's so much there.
我太喜欢这个了。
I love this.
好的。
Okay.
那么浏览器里的网页自动补全功能。
So autocomplete on web with the browser.
这太有趣了。
That's so interesting.
就像这样,当你浏览网页和处理日常事务时,我们可以为你提供各种帮助。
Just like, here's all the things that we can help you with as you're browsing and going about your day.
我想谈谈Atlas。
I wanna talk about Atlas.
我稍后再回到这个话题。
I'll come back to that.
Codex,代码执行。
Codex, code execution.
我之前不知道这个。
Did not know that.
这真的很巧妙。
That's really clever.
我现在明白了。
I I get it now.
好的。
Okay.
然后这些闲聊。
And then this chatter.
什么是闲聊驱动开发?
What is chatter-driven development?
我有个不。
I had a no.
这真是个绝妙的主意。
This is a really good idea.
但这让我想起,我曾邀请Block公司的CTO丹吉·普拉萨纳(Dhanji Prasanna)上播客,他们有个内部产品叫Goose,是他们自己的智能代理工具。
But it reminds me, I had Dhanji Prasanna on the podcast, CTO of Block, and they have this product called Goose, which is their own internal agent thing.
他提到Block有位工程师让Goose实时观察他的屏幕、旁听所有会议,并主动完成他可能需要做的工作。
And he talked about an engineer at Block who just has Goose watch, like, his screen, and it listens to every meeting and proactively does work that he should probably wanna do.
比如提交PR、发送邮件、起草Slack消息。
So it ships a PR, sends an email, drafts a Slack message.
所以他正在以一种非常初级的方式,做着你所描述的事情。
So he's doing exactly what you're describing in kind of a very early way.
是的。
Yeah.
那非常有趣。
That's super interesting.
而且,你知道吗?我敢打赌,如果我们去问他们这种生产力的瓶颈是什么,他们有分享过吗?
And, you know, I bet you, if we went and asked them what the bottleneck to that productivity is. Did they share what it is?
可能就是在观察它,并确认这是正确的事情。对。
Probably looking at it and just making sure it's doing the right thing. Yeah.
是的。
Yeah.
所以我们现在看到了这种情况。
So we see this now.
我们为Codex集成了Slack功能。
We have a Slack integration for Codex.
人们喜欢这样——当需要快速完成某事时,大家就会直接@Codex。
People love it: if there's something that you need to do quickly, people will just mention Codex.
你认为这个bug为什么会发生?
Why do you think this bug is happening?
不一定要是工程师,甚至像数据科学家这类角色也经常大量使用Codex来回答诸如'你认为这个指标为什么波动?'之类的问题。
It doesn't have to be an engineer; even, like, data scientists are using Codex a ton to just answer questions like, why do you think this metric moved?
发生了什么?
What happened?
所以提出问题后,你能直接在Slack里获得答案。
So questions, you get the answer right back in Slack.
这太棒了,超级实用。
It's amazing, super useful.
至于编写代码时,你还是得回头检查代码对吧?
As for when it's writing code, then you have to go back and look at the code, right?
所以目前真正的瓶颈我认为在于验证代码是否有效,以及进行代码审查。
And so the real bottleneck right now, I think, is validating that the code worked and doing code review.
所以在我看来,如果我们想达到你刚才提到的那位朋友那样的水平,我认为我们真的需要想办法让人们配置他们的编程代理,使其在工作后期阶段更加自主。
So in my mind, if we wanted to get to something like what that friend you were talking about is doing, I think we really need to figure out how to get people to configure their coding agents to be much more autonomous on those later stages of the work.
有道理。
It makes sense.
就像你说的,写代码。我曾经是一名工程师。
Like you said, writing code. I used to be an engineer.
我当了十年工程师。
I was an engineer for ten years.
写代码真的很有趣,进入心流状态、构建、架构设计、测试都很有意思。
Really fun to write code, really fun to just get in the flow, build, architect, test.
但审查别人的代码就没那么有趣了,特别是当你必须为那些可能导致生产环境崩溃的愚蠢代码负责时。
Not so fun to look at everyone else's code and just have to go through and be on the hook if it is doing something dumb that's gonna take down production.
现在构建已经变得更容易了,我从那些真正处于技术前沿的公司那里一直听到的是,现在的瓶颈在于确定要构建什么,然后到了最后阶段,就像,好吧。
And now that building has become easier, what I've always heard from companies that are really at the cutting edge of this is the bottleneck is now, like, figuring out what to build, and then it's at the end of, like, okay.
我们有这100个PR需要审查。
We have all these 100 PRs to review.
谁来处理所有这些呢?
Who's gonna go through all that?
没错。
Right.
是的。
Yeah.
本节目由Jira Product Discovery赞助播出。
This episode is brought to you by Jira Product Discovery.
构建产品最困难的部分其实并不是构建产品本身。
The hardest part of building products isn't actually building products.
而是其他所有事情。
It's everything else.
关键在于证明工作的重要性,管理利益相关者,并提前做好规划。
It's proving that the work matters, managing stakeholders, trying to plan ahead.
大多数团队把更多时间花在被动应对而非主动学习上,疲于追逐更新、为路线图辩护,并不断解决阻碍以维持运转。
Most teams spend more time reacting than learning, chasing updates, justifying roadmaps, and constantly unblocking work to keep things moving.
Jira产品发现功能让您重掌主动权。
Jira Product Discovery puts you back in control.
通过Jira产品发现,您可以捕捉洞见并优先处理高影响力的创意。
With Jira Product Discovery, you can capture insights and prioritize high impact ideas.
它具有灵活性,能适应团队的工作方式,帮助您制定促进共识而非质疑的路线图。
It's flexible so it adapts to the way your team works and helps you build a road map that drives alignment, not questions.
由于它基于Jira构建,您可以在同一平台跟踪从战略到交付的整个创意流程。
And because it's built on Jira, you can track ideas from strategy to delivery all in one place.
减少奔波,腾出更多时间思考、学习并构建正确的事物。
Less chasing, more time to think, learn, and build the right thing.
免费获取Jira产品发现功能,请访问atlassian.com/lenny。
Get Jira product discovery for free at atlassian.com/lenny.
网址是atlassian.com/lenny。
That's atlassian.com/lenny.
Codex对您作为产品人员、产品经理的工作方式产生了什么影响?
What has the impact of Codex been on the way you operate as a product person, as a PM?
工程团队受到的影响显而易见。
It's clear how engineering is impacted.
代码是为你编写的。
Code is written for you.
它对你和OpenAI的PM们的工作方式产生了什么影响?
What has it done to the way you operate and the way PMs operate at OpenAI?
是的。
Yeah.
我认为主要是让我感觉更有能力了。
I mean, I think mostly I just feel, like, much more empowered.
我一直都是偏技术型的项目经理。
I've always been sort of more technical leaning PM.
尤其是当我为工程师开发产品时,感觉必须亲自'吃自己的狗粮'(使用自家产品)。
And especially when I'm working on products for engineers, I feel like it's necessary to dogfood the product.
但除此之外,我觉得作为产品经理能做的事情多太多了。
But even beyond that, I just feel like I can do much, much more as a PM.
Scott Belsky曾提出过'压缩人才栈'这个概念。
And Scott Belsky talks about this idea of compressing the talent stack.
我不确定这个表述是否准确。
I'm not sure if I phrased that right.
但核心思想是:由于人们能承担更多工作,这些角色之间的界限变得不像过去那么必要了。
But it's basically this idea that maybe the boundaries between these roles are a little bit less needed than before because people can just do much more.
每当有人能承担更多职责时,就能减少一个沟通环节,让团队效率大幅提升。
And every time someone can do more, you can skip one communication boundary and make the team that much more efficient.
所以我认为这种现象已经出现在多个职能领域了。不过既然你问的是产品方面——现在回答问题变得容易多了,你可以直接咨询Codex获取想法。
So I think I think we see it, you know, in a bunch of functions now, but I guess since you asked about like product specifically, you know, now, like answering questions much, much easier, you can just ask Codex for thoughts on that.
很多产品经理的工作,比如理解需求变化...
A lot of, like, PM-type work is understanding what's changing.
再次强调,直接向Codex寻求帮助即可。
Again, just ask Codex for help with that.
制作原型通常比编写规范文档更快。
Prototyping is often faster than writing specs.
这是很多人讨论过的话题。
This is something that a lot of people have talked about.
我认为虽然不算特别出人意料,但稍微让人惊讶的是:我们主要开发Codex是为了编写可部署到生产环境的代码,但实际上现在很多人用Codex编写一次性代码。
I think something that's slightly surprising, though not super surprising: we're mostly building Codex to write code that's gonna be deployed to production, but actually we see a lot of throwaway code written with Codex now.
这某种程度上回归了'无处不在的代码'这个概念。
It's kind of going back to this idea of ubiquitous code.
比如你会看到有人想做数据分析时——
So you'll see someone wants to do an analysis.
如果我想理解某些数据,就直接给Codex一堆数据,然后让它为这些数据构建一个交互式数据查看器。
Like if I wanna understand something, it's like, okay, just give Codex a bunch of data, but then ask it to build an interactive data viewer for this data.
过去做这种事太麻烦了,但现在完全值得花时间让智能体去完成。
That's just too annoying to do in the past, but now it's just totally worth the time of just getting an agent to go do something.
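把"一次性代码"这个概念具体化一下(纯属示意,并非OpenAI的实际代码;函数名和数据都是虚构的),智能体生成的"交互式数据查看器"起点可以简单到只是一个CSV转HTML的渲染器:

As a hypothetical sketch of the kind of throwaway tooling described here (not OpenAI's actual code; the function and data are made up), an agent-generated "interactive data viewer" can start as little more than a CSV-to-HTML renderer:

```python
import csv
import html
import io

def render_table(csv_text: str) -> str:
    """Render CSV text as a minimal HTML table (throwaway viewer)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    head, body = rows[0], rows[1:]
    # Escape every cell so arbitrary data can't break the page.
    ths = "".join(f"<th>{html.escape(h)}</th>" for h in head)
    trs = "".join(
        "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
        for row in body
    )
    return f"<table><thead><tr>{ths}</tr></thead><tbody>{trs}</tbody></table>"

page = render_table("metric,value\nsignups,120\nchurn,3")
```

The point is less the code than the economics: a disposable script like this used to cost more attention than it was worth, and now an agent can produce it on demand.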
类似地,我在设计团队看到一些相当酷的原型。比如有位设计师想制作一个动画,就是那个硬币动画,他用的是Codex。
Similarly, I've seen some pretty cool prototypes on our design team. Like, a designer basically wanted to build an animation, this coin animation, with Codex.
通常来说,编程实现这个动画会非常麻烦。
And it was like, normally, it'd be too annoying to program this animation.
所以他们直接快速编写了一个动画编辑器,然后用这个编辑器制作动画,最后将动画提交到代码库。
So they just vibe coded an animation editor, and then they used the animation editor to build the animation, which they then checked into the repo.
实际上,我们的设计师在这方面效率提升非常显著。
Actually, our designers are there's a ton of acceleration there.
说到压缩人才栈,我认为我们的设计师非常有产品经理范儿。
And, like, speaking of compressing the talent stack, I think our designers are very PM-y.
他们做了大量产品工作,甚至用快速编码的方式为Codex应用做了完整的功能原型。
So, you know, they they do a ton of product work and, like, they actually have, like, an entire, like, vibe coded sort of side prototype of the Codex app.
我们讨论问题的方式往往是快速头脑风暴,因为要处理的事情实在太多了。
And so a lot of how we talk about things is we'll have a really quick jam because there's 10,000 things going on.
然后设计师会去思考这个功能应该如何运作。
And then the designer will go think about how this should work.
但他们不会再次讨论,而是直接在独立原型中即兴编写一个原型。
But instead of talking about it again, they'll just vibe code a prototype of that in their standalone prototype.
我们会试用它。
We'll play with it.
如果我们喜欢,他们会将这个原型即兴编码或即兴工程化,变成一个实际的PR提交。
If we like it, they'll vibe code that prototype into or vibe engineer that prototype into an actual PR to land.
然后根据他们对代码库的熟悉程度(比如Codex CLI是用Rust写的,上手稍难),他们可能会自己提交,或者接近完成时由工程师协助提交PR。
And then depending on their comfort with the code base (like, the Codex CLI is in Rust, which is a little harder), maybe they'll land it themselves, or they'll get close and then an engineer can help them land the PR.
你知道,我们最近发布了Sora安卓应用,这实际上是最令人震撼的加速案例之一,因为OpenAI内部对Codex的使用率显然非常高。
You know, we recently shipped the Sora Android app, and that was one of the most sort of mind blowing examples of acceleration actually, because usage of Codex internally at OpenAI is obviously really, really high.
但这一年来它一直在增长,现在基本上所有技术人员都在使用它。
But it's been growing over the course of the year. Now basically all technical staff use it.
甚至关于如何最大化利用编码代理的深度和技巧也大幅提升。
But even the intensity and know-how of how to make the most of coding agents has gone up by a ton.
所以Sora安卓应用这个全新的应用,我们仅用18天就完成了从零到员工发布的全部流程。
And so the Sora Android app, like a fully new app, we built it in eighteen days, it went from like zero to launch to employees.
然后十天后,总共二十八天,我们就向公众全面发布了。
And then ten days later, so twenty eight days total, we went GA to the public.
这完全是在Codex的帮助下完成的。
And that was done just with the help of Codex.
所以开发速度相当惊人。
So pretty insane velocity.
我想说这有点——不想说是简单模式——但如果你是一家在多个平台上构建软件的公司,Codex确实特别擅长一件事。
I would say it was a little bit, I don't want to say easy mode, but there is one thing that Codex is really good at if you're a company that's building software on multiple platforms.
所以你已经弄清楚了一些底层的API或系统。
So you've already figured out like some of the underlying like APIs or systems.
让Codex去做移植类的工作非常有效,因为它有现成的东西可以参考。
Asking Codex to port things over is really effective because it has something it can go look at.
因此那个团队的工程师基本上就是让Codex参考iOS应用,生成需要完成的工作计划,然后去执行这些计划。
And so the engineers on that team were basically having Codex go look at the iOS app, produce plans of work that needed to be done, and then go implement those.
我当时同时关注着iOS和Android的情况。
And I was kind of looking at iOS and Android at the same time.
所以,基本上,从内部发布到员工用了两周,总共四周时间。
And so, you know, basically, it was like two weeks to launch to employees, four weeks total.
速度快得惊人。
Insanely fast.
更疯狂的是它成为了App Store排名第一的应用。
What makes that even more insane is it became the number one app in the App Store.
我不知道。
I don't know.
这简直让人难以置信。
This just boggles the mind.
好的。
Okay.
是啊。
So yeah.
想象一下,只用少数几个工程师,就做出了应用商店排名第一的应用。
So imagine doing the number one app on the App Store with, like, a handful of engineers.
我想大概就两三个人,在几周内完成的。
I think it was, like, two or three possibly, in a handful of weeks.
是的。
Yeah.
这太荒谬了。
This is absurd.
所以,这是个展示加速发展的绝佳例子。
So, yeah, so that's a really fun example of acceleration.
另一个例子是Atlas,我记得本做过一期关于Atlas的播客,详细介绍了我们的开发过程。
Then Atlas is the other one. I think Ben did a podcast episode on Atlas, sharing a little bit about how we built it there.
要知道,Atlas实际上是个浏览器,对吧?
You know, Atlas is actually, I mean, it's a browser, right?
开发浏览器真的非常困难。
And building a browser is really hard.
因此我们不得不构建许多复杂的系统来实现这一目标。
And so we, had to build a lot of difficult systems in order to do that.
基本上,我们让那个团队现在拥有了大量Codex的高级用户。
And basically we got to the point where that team has a ton of power users of Codex right now.
而且,你知道吗,后来我们会直接和他们聊这个,因为那些工程师里很多是我创业前共事过的同事。
And, you know, it got to the point where we were talking to them about it, because a lot of those engineers are people I used to work with before my startup.
所以他们会说,要知道在以前这需要两三个工程师花两三周时间才能完成。
And so they'd say, you know, before this would have taken us like two to three weeks for two to three engineers.
而现在只需要一个工程师一周时间。
And now it's like one engineer, one week.
所以这里也有巨大的效率提升。
So massive acceleration there as well.
最酷的是,你知道,我们最初是在Mac上发布Atlas的,但现在我们正在开发Windows版本。
And what's quite cool is that, you know, we we shipped Atlas on on Mac first, but now we're working on the Windows version.
要知道,团队现在正专注于Windows平台的开发,他们也在帮助我们改进Windows上的Codex。诚然这方面还比较早期,比如我们上周发布的模型是首个原生支持PowerShell的版本。
You know, the team now is, like, ramping up on Windows, and they're helping us make Codex better on Windows too, which is admittedly earlier. Like, the model we shipped last week is the first model that natively understands PowerShell.
PowerShell作为Windows系统原生的脚本语言。
So, you know, PowerShell being, the native, like, shell language on Windows.
是的,看到整个公司因Codex而加速发展真是令人振奋——最明显的是研究部门在加速模型训练和提升效果,甚至包括我们之前讨论的设计和营销领域。
So, yeah, it's been really awesome to see the whole company getting accelerated by Codex. Most obviously research, improving how quickly we train models and how well we do it, and then even design, as we talked about, and marketing.
实际上,现在我们的产品营销人员经常直接在Slack上修改字符串或更新文档。
Actually, we're at this point now where my product marketer is often also making string changes just directly from Slack or, like, updating docs directly from Slack.
这些都是令人惊叹的案例。
These are amazing examples.
你们正站在技术可能性的最前沿,这将成为其他公司未来的工作方式。
You guys are living at the bleeding edge of what is possible, and this is how other companies are gonna work.
再次发布后便登顶应用商店榜首,风靡全球——至少霸榜一周。
Just shipping, again, what became the number one app in the App Store, just beloved all over. It, like, took over the world, I don't know, for at least a week.
你说整个开发用了28天,其中大概18天就让核心功能跑起来了。
Built, you said, in twenty-eight days, and, like, eighteen days just to get the core of it working.
是啊。
Yeah.
所以,大概十八天后,我们就有了一个员工们都在玩的东西。
So, like, eighteen days, we had a thing that employees were playing with.
对。
Yeah.
然后十天后,我们就发布了。
And then ten days later, we were out.
你还说只有几个工程师。
And you said just a couple engineers.
嗯。
Yeah.
两三个。
Two or three.
好的。
Okay.
然后你说Atlas花了一周时间构建。
And then Atlas, you said it was took a week to build.
不,不,不。
No, no, no.
不,不是整个Atlas只用一周。Atlas是个相当有分量的项目。
So Atlas, no, not the whole thing in a week. Atlas was a really meaty project.
我当时正在和Atlas团队的一位工程师讨论他们如何使用Codex。
And so I was talking to one of the engineers on Atlas about what they use Codex for.
基本上就是,我们所有事情都用Codex来完成。
And it's basically like, we use Codex for absolutely everything.
我当时就想,好吧,那你们要怎么测量加速度呢?
I was like, Okay, well, how would you measure the acceleration?
所以我得到的回复基本上是,以前需要两三个工程师花两三周时间,现在只要一个工程师一周就能搞定。
So basically, the answer I got back was, previously it would have taken two to three weeks for two to three engineers, and now it's like one engineer, one week.
你觉得最终会发展到让非工程师来做这类工作吗?
Do you think this eventually moves to non engineers doing this sort of thing?
比如说,必须由工程师来构建这个东西吗?
Like, does it have to be an engineer building this thing?
能不能由产品经理或设计师之类的人来构建呢?
Could it have been built by, I don't know, a PM or designer?
我认为我们很大程度上会达到一个界限变得模糊的阶段。
I think we will very much get to the point where, basically, the boundaries are a little bit blurred.
对吧?
Right?
我认为你会需要了解他们正在构建内容细节的人,但这些细节会不断演变。
Like, I think you're going to want someone who understands the details of what they're building, but what those details are will evolve.
有点像现在,如果你在写Swift代码,就不必懂汇编语言。
Kind of like how now, like, if you're writing Swift, you don't have to speak assembly.
世界上确实有一些精通汇编语言的人,也许不止少数几个,而他们的存在非常重要,对吧?
There's a handful of people in the world who speak assembly, maybe more than a handful, and it's really important that they exist, right?
但这就像一种专业职能,大多数公司并不需要具备。
But that's like a specialized function that like most companies don't need to have.
所以我认为我们自然会看到抽象层次的增加。
So I think we're just gonna naturally see like an increase in layers of abstraction.
而现在最酷的是,我们正在进入语言抽象层,比如自然语言。
And then the cool thing is now we're entering like the language layer of abstraction, like natural language.
而且自然语言本身非常灵活,对吧?
And the natural language itself is really flexible, right?
比如工程师可以讨论计划,也可以讨论规范,还可以直接讨论产品或想法。
Like you could have engineers talking about a plan and then you could have engineers talking about a spec and then you could have engineers talking about just a product or an idea.
所以我认为我们也可以开始向上移动这些抽象层次。
So I think we can also start moving up those layers of abstraction as well.
但我确实认为这会是一个渐进的过程。
But I do think this is gonna be gradual.
我不认为会突然发展到没人写任何代码,只剩规范的地步。
I don't think it's gonna suddenly get to the point where nobody ever writes any code and it's just specs.
我觉得更像是:我们已经把编码代理设置得很擅长预览构建或运行测试。
I think it's gonna be much more like, okay, we've set up our coding agent to be really good at previewing the build or at running tests.
这可能是大多数人已经完成的第一部分。
Maybe that's the first part that most people have set up.
就像,好了,现在我们设置好了,它们可以执行构建并看到自己更改的结果。
And it's like, okay, now we've set it up so they can execute the build and it can see the results of its own changes.
但我们还没有建立一个好的集成框架,以便它能够——顺便说一下,在Atlas的情况下,不知道他们是否已经做了这些。
But we haven't yet built a good integration harness so that it can. In the case of Atlas, by the way, I don't know if they've done any of this or not.
我认为他们已经做了很多这方面的工作,但也许下一阶段是让它加载几个示例页面看看效果如何。
I think they've done a lot of this, but maybe the next stage is enable it to load a few sample pages to see how well those work.
那么,好了,现在我们要设置到那个阶段。
So then, okay, now we're gonna set up to that.
我认为至少在一段时间内,我们会有人类来筛选这些连接器、系统或组件,确定代理需要擅长与哪些交互。
And I think for some time, at least we're gonna have humans curating which of these connectors or systems or components that the agent needs to be good at talking to.
然后,你知道,在未来会有更大的突破,Codex会告诉你如何设置,或者可能在代码库中自行设置。
And then, you know, in the future, there will be an even greater unlock where Codex tells you how to set it up or maybe sets itself up in a repo.
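把"为后期阶段配置编程代理"这个想法落到实处,大致可以是下面这样一份仓库级的指令文件。Codex确实会读取AGENTS.md文件,但下面的分节和命令纯属示意性假设,并非官方文档定义的格式:

To make the "configure your agent for the later stages" idea concrete, here is a sketch of the kind of repo-level instruction file the speaker is gesturing at. Codex does read an AGENTS.md file, but the specific sections and commands below are illustrative assumptions, not a documented schema:

```markdown
# AGENTS.md (illustrative sketch)

## Build
- Run `make build` before proposing a change; fix compile errors first.

## Validate
- Run `make test` and paste any failing test names into your summary.
- For rendering changes, load the pages in `fixtures/pages/` and report diffs.

## Review
- Open a draft PR only after build and tests pass; flag risky changes for a human.
```

Each section corresponds to one of the "later stages" discussed above: executing the build, seeing the results of its own changes, and only then handing work to a human reviewer.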
活在这个时代真是太疯狂了。
What a wild time to be alive.
哇。
Wow.
我很好奇这类事物的二阶效应,就是它能多快地构建东西。
I'm curious just the second order effects of this sort of thing, just how quickly it is to build stuff.
那会带来什么影响?
What does that do?
这是否意味着分发变得极其重要?
Does that mean distribution becomes much, much more important?
这是否意味着创意的价值会大幅提升?
Does it mean, ideas are just worth a lot more?
思考这种变化的速度很有意思。
It's interesting to think about how quick, how that changes.
我想知道你的看法。
I'm curious what you think.
我仍然不认为创意的价值有大多数人想的那么高。
I still don't think ideas are worth as much as maybe a lot of people think.
我还是认为执行非常困难,对吧?
I still think execution is really hard, right?
就像你可以快速构建某样东西,但你仍然需要很好地执行它。
Like you can build something fast, but you still need to execute well on it.
仍然需要合理且整体上连贯。
Still needs to make sense and be a coherent thing overall.
是的,而且分销渠道至关重要。
Yeah, and distribution is massive.
是啊。
Yeah.
只是感觉其他所有事情现在都变得更加重要了。
Just feels like everything else is now more important.
所有非构建环节的事情——想点子、推向市场、盈利,诸如此类的一切。
Everything that isn't the building piece, which is coming up with an idea, getting to market, profit, all that kind of stuff.
我认为我们可能曾处于这个奇怪的临时阶段,有一段时间你只需要擅长构建产品就够了,因为那时做产品实在太难了,或许你并不需要对特定客户有深刻理解。
I think we might have been in this weird temporary phase where, for a while, it was so hard to build product that you mostly just had to be really good at building product, and maybe it didn't matter if you had an intimate understanding of a specific customer.
但现在我认为我们正达到这样一个阶段:如果只能选择理解一件事,那真正理解特定客户面临的问题将变得极具意义。
But now I think we're getting to this point where actually, if I could only choose one thing to understand, it would be really meaningful understanding of the problems that a certain customer has.
如果我只能具备一项核心能力。
If I could only go in with one core competency.
所以我认为这最终仍将是最重要的因素。
So I think that that's ultimately still what's going to matter most.
如果你今天要创办新公司,并且对当前AI工具服务不足的客户群体有深刻理解和人脉网络,我认为你就稳了。
If you're starting a new company today, and you have a really good understanding and network of customers that are currently underserved by AI tools, I think you're set.
对吧?
Right?
而如果你只是擅长搭建网站这类技能,却没有明确的客户对象,那你的处境会艰难得多。
Whereas if you're, like, good at building, like, you know, websites, but you don't have any specific customer to build for, I think you're in in for a much harder time.
按我的理解,你是在看好垂直领域AI初创公司。
Bullish on vertical AI startups is what I'm hearing.
是的。
Yeah.
我完全同意。
I completely agree.
你知道,有些通用方案能解决很多问题,但有些方案则专注于把演示做得极其出色——我们会比任何人都更懂演示问题,会融入你的工作流程,会解决所有与这个特定问题相关的细节。
There's like, you know, there's like the general thing that can solve a lot of problems, then there's like, we're gonna solve presentations incredibly well, and we're gonna understand the presentation problem better than anyone, and we're gonna plug into your workflows and and all these other things that matter for a very specific problem.
好的。
Okay.
太棒了。
Incredible.
当你思考Codex的进展时,我猜你们有一堆评估指标,还有各种公开基准测试。
When you think about progress on Codex, I imagine you have a bunch of evals and there's all these public benchmarks.
你们会关注什么指标来判断进展良好?
What's something you look at to tell you, okay, we're making really good progress?
我猜不会是单一指标,但你们主要关注什么?
I imagine it's not gonna be the one thing, but what do you focus on?
你们正在重点推进什么方向?
What's like something you're trying to push?
有哪些关键绩效指标?
What's like a KPI or two?
我时常提醒自己的一点是,像Codex这样的工具本质上是一种你会逐渐成为高级用户的工具。
One of the things that I'm constantly reminding myself of is that a tool like Codex sort of naturally is a tool that you would, you know, become a power user of.
对吧?
Right?
因此我们可能会无意中花费大量时间思考那些用户采用旅程中非常深入的功能。
And so we can accidentally spend a lot of our time thinking about features that are like very deep in the user adoption journey.
所以我们最终可能会过度解决这个问题。
And so we can kind of end up over solving for that.
因此我认为查看D7留存率至关重要。
And so I think it's just critically important to go look at your D7 retention.
只需重新尝试产品,从头开始注册。
Just go try the product, sign up from scratch again.
我有好几个ChatGPT Pro账户,为了尽可能真实地做内部测试,我用Gmail注册了这些账户,每月要花200美元。
I have a few too many ChatGPT Pro accounts that, in order to maximally correctly dogfood, I've signed up for on my Gmail. They charge me, like, $200 a month.
这些我需要报销。
I need to expense those.
但是,你知道,我认为作为一个用户的感受和早期留存数据对我们来说仍然非常重要,因为尽管这个领域正在快速发展,我认为人们使用它们仍处于非常早期的阶段。
But, you know, like, I think just, like, the feeling of being a user and the early retention stats are still like super important for us because, you know, as much as this category is is taking off, I think we're still in the very early days of like people using them.
我们做的另一件事——我认为我们可能是这个领域里最关注用户反馈的团队——就是我们几个人经常在Reddit和Twitter上活跃。
Another thing that we do, and I think we might be the most user-feedback, social-media-pilled team out there in this space, is a few of us are constantly on Reddit and Twitter.
上面有赞扬也有很多抱怨,但我们非常认真地对待这些抱怨并加以研究。
And there's praise up there and there's a lot of complaints, but we take the complaints very seriously and look at them.
我认为这再次是因为你可以将编程代理用于如此多不同的用途,它往往在特定行为上存在各种问题。
And I think that, again, because you can use coding agent for so many different things, it often is kind of broken in many sort of ways for specific behaviors.
所以我们实际上经常监测社交媒体上的舆论氛围,特别是对于Twitter X来说,它更偏向炒作一些。
And so we actually monitor pretty often just what the vibes are on social media. Especially, I think, for Twitter/X, it's a little bit more hype-y.
而Reddit则更负面但更真实。
And then Reddit is a little more negative but real, actually.
所以我开始越来越多地关注人们在Reddit上实际是如何讨论使用Codex的。
So I've started increasingly paying attention to, like, how people are talking about using Codex on Reddit, actually.
这一点大家需要知道。
This is important for people to know.
你最常查看哪些子版块?
Which subreddits do you check most?
有没有类似r/codex的版块?
Is there, like, an r/codex?
还是
Or
算法在推送相关内容方面做得不错,不过r/codex这个版块确实存在。
I mean, the algorithm is pretty good at surfacing stuff, but, like, r/codex is there.
好的。
Okay.
我会记下的。
I'll take.
非常有趣。
Very interesting.
如果在Twitter上有人@你,你仍然能看到,但可能不如在Reddit上看到那么有效。
And then if people tag you on Twitter, you still see that, but maybe not as powerful as seeing it on Reddit.
嗯,是的。
Well, the yeah.
有趣的是,Twitter的特点在于它更像是点对点的交流,即使是在公开场合。
The interesting thing with Twitter is it's a little bit more one to one, even if it's, like, in public.
而Reddit则有很好的投票机制,而且可能大多数用户还不是机器人。
Whereas, like, with Reddit, there's, like, really good upvoting mechanics, and, like, maybe most people are still not bots.
不确定。
Unclear.
所以你能够获得关于重要事项和其他人看法的良好信号。
So you get, like, good signal on what matters and what other people think.
有意思的是,Atlas,我想简单聊聊这个。
So, interestingly, Atlas, I wanna talk about that briefly.
你们推出了Atlas。
You guys launched Atlas.
我其实发推说过试用了Atlas,但我不太喜欢纯AI的搜索体验。
I actually tweeted that I tried Atlas, and I don't love the AI-only search experience.
有时候我就是想要用谷歌之类的。
I was just like, I just want Google sometimes or whatever.
比如等着AI给我答案时,
Like, waiting for AI to give me an answer.
我就觉得不想等,而且没法切换模式。
I'm like, I don't wanna wait, and there was no way to switch.
我就发了条推说,嘿,
I just tweeted, hey.
我要换回原来的主页了。
I'm switching my home page back.
这个体验不太好。
It's not great.
感觉让OpenAI的几个产品经理难过了,还看到有人发推说好吧。
And I feel like I made some PMs at OpenAI sad, and I saw someone tweet, okay.
我们现在有了这个,我想这原本就是计划的一部分。
We have this now, which I imagine was always part of the plan.
这可能是个例子,说明我们只管发布产品,先看看人们如何使用,然后再想办法完善。
It's probably an example of we just ship we gotta ship stuff, see how people use it, and then we figure it out.
所以我想一点是,我不知道。
So I guess one is that I don't know.
那里有什么东西吗?
Is there anything there?
第二点,我只是好奇,你们为什么要开发一个网页浏览器?
And two, I'm just curious, why are you guys building a web browser?
我曾经参与Atlas项目一段时间。
So I I worked on Atlas for a bit.
现在已经不做了。
I don't work on it now.
但你知道,对我来说这里的部分故事背景是——简单说说我的经历——我当时在做这个屏幕共享的结对编程创业项目。
But, you know, like the a bit of the narrative here for for me, just to tell my story a bit, was like, I was working on this, like, screen sharing, like pair programming startup.
对吧?
Right?
然后我们加入了OpenAI。
And then we joined OpenAI.
所以这个想法其实是构建一个情境感知的桌面助手。
And so the idea was really to build a contextual desktop assistant.
我认为这非常重要的原因是,我觉得每次都要向助手交代所有背景信息,再让它想办法帮你,这真的很烦人。
And the reason I believe that's so important is because I think that it's really annoying to have to give all your context to an assistant and then to figure out how it can help you.
对吧?
Right?
所以如果它能直接理解你想做什么,就能最大限度地为你提速。
And so if it could just like understand what you were trying to do, then it could maximally accelerate you.
因此我现在依然把Codex视为某种情境助手,只是从编码任务这个略有不同的角度切入。
And so I still think of Codex actually as like a contextual assistant from a little bit of a different angle, like starting with coding tasks.
但至少我个人认为——虽然不能代表整个产品——很多工作都是在网页端完成的。
But some of the thinking, at least for me personally, and I can't speak for the whole product, was that a lot of work is done in the web.
如果我们能构建一个浏览器,就能以更原生方式为你提供情境支持。
And if we could build a browser, then we could be contextual for you, but in a much more first class way.
我们不像其他桌面软件那样需要各种兼容性处理,那些软件对渲染内容到无障碍树的支持参差不齐。
We wouldn't be hacking around other desktop software, which has very varied support for what content it renders to the accessibility tree.
我们不必依赖速度较慢且不可靠的屏幕截图。
We wouldn't be relying on screenshots, which are a little bit slower and unreliable.
相反,我们可以直接嵌入渲染引擎,对吧?
Instead, we could, like, be in the rendering engine, right?
并提取任何我们需要的信息来帮助你。
And like extract whatever we needed to, to help you.
我还喜欢用电子游戏来类比,比如不知道你玩过《光环》没有?
And also, I like to think of video games. Like, I don't know if you've played, say, Halo, right?
就像你走向一个物体。
Like you walk up to an object.
其实很多游戏都是这样的。
I mean, this is true for many games.
你按下按键,伙计,已经很久了。
You press, man, it's been a long time.
这有点尴尬。
This is embarrassing.
按下x键,它就会自动执行正确的操作。
Press x, and it just does the right thing.
对吧?
Right?
我是那种每买一个电子游戏都会仔细阅读说明书的人。
And I was one of those guys who always read the instruction manual for every video game that I bought.
还记得我第一次读到关于情境化操作的概念时,就觉得这真是个超酷的点子。
And I remember the first time I read about a contextual action, and I just thought it was, like, this really cool idea.
要知道,关于上下文操作的关键在于我们需要了解你试图做什么。
And, you know, the thing about a contextual action is we need to know what you are attempting to do.
我们需要掌握一些上下文信息,这样我们才能提供帮助。
We need have a little bit of context and then we can and then we can help.
我认为这至关重要,因为想象一下我们即将到达的那个世界,在那里每天有智能代理为你提供数千次帮助。
And I think this is critically important because, you know, imagine this world that we reach, right, where where we have agents that are helping you thousands of times per day.
想象一下,如果我们只能通过推送通知来告诉你我们帮了你。
Imagine if the only way we could tell you that we helped you was if we could like push notify you.
那么你每天会收到一千条AI推送通知,写着'嘿,我做了这件事'。
So you get a thousand push notifications a day of an AI saying like, Hey, I did this thing.
你喜欢吗?
Do you like it?
那会非常烦人,对吧?
It'd be super annoying, right?
而想象一下回到软件工程领域,比如我正在看仪表盘,注意到某个关键指标下降了。
Whereas imagine going back to software engineering, like, I was looking at a dashboard and I noticed some, like, key metric had, gone down.
那时候AI可以主动查看,并在我盯着仪表盘时直接呈现它对指标下降原因的分析和可能的修复方案。
And, you know, at that point in time, an AI could, maybe go take a look and then surface the fact that it has an opinion on why this metric went down and maybe a fix right there, right when I'm looking at the dashboard.
对吧?
Right?