本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
大家好,我是安德鲁·梅恩,这里是OpenAI播客。今天的嘉宾是OpenAI的首席研究官马克·陈,以及ChatGPT负责人尼克·特利。我们将聊聊ChatGPT早期病毒式传播的日子,讨论ImageGen,OpenAI如何看待代码和Codex等工具,他们认为未来我们需要哪些技能,还会揭秘ChatGPT这个完全普通的名字是怎么来的。甚至一半
Hello, I'm Andrew Mayne, and this is the OpenAI podcast. My guests today are Mark Chen, who is the chief research officer at OpenAI, and Nick Turley, who is the head of ChatGPT. We're gonna be talking about the early viral days of ChatGPT. We're gonna talk about ImageGen, how OpenAI looks at code and tools like Codex, what kind of skills they think that we might need for the future, and we're gonna find out how ChatGPT got its totally normal name. Even half
研究人员都不知道这三个字母代表什么。
of research doesn't know what those three letters stand for.
你将会拥有一个口袋里的智能助手,它可以当你的导师、顾问,也可以是你的软件工程师。
You know, you're going to have an intelligence in your pocket that can be your tutor, it can be your advisor, it can be your software engineer.
在我们正式发布这个东西的前一晚,确实做了一个重大决定。
There's a real decision the night before we actually launched this thing.
首先,OpenAI是怎么决定用那个超棒的名字的?
First off, how did OpenAI decide on that awesome name?
本来打算叫‘与GPT三点五聊天’,后来我们在深夜决定简化名称。
It was going to be Chat with GPT 3.5, and we had a late night decision to simplify.
等等,等等。你说那
Wait, wait. You're saying it
本来要叫‘与GPT 3.5聊天’,这名字说起来更顺口。
was going to be chat with GPT 3.5, which rolls off the tongue even more nicely.
你说那是深夜决定,意思是几周前你们才最终定下名字。
And you said that was a late night decision, meaning weeks before you finally decided what to call it.
对吧?对,对。几周前我们甚至还没开始这个项目呢,我想。
Right? Right, right. Weeks before, we hadn't even started on the project yet, I think.
哦,天哪。
Oh, goodness.
但我想我们意识到那个名字会很难发音,于是想出了一个很棒的名字来代替。
But I think we realized that that would be hard to pronounce and came up with a great name instead.
所以那是前一天晚上?
So that was the night before?
差不多吧。
Roughly.
可能
It might
是前一天。那时候一切都有些模糊不清。
have been the day before. Was all kind of a blur at that point.
我想那段时间很多事情都很模糊。我记得当时参加了一个会议,我们讨论了低调的研究预览,那确实是因为当时是3.5版本。3.5是一个已经发布数月的模型。从能力角度看,只看评估结果的话,你会觉得,嗯,还是那个东西,但我们只是把界面放进去,让你不用频繁提示。然后ChatGPT就发布了。
I would imagine a lot of that was a blur. And I remember being in a meeting where we talked about the low key research preview, which really was how we thought of it, because it was 3.5. 3.5 was a model that had been out for months. And from a capabilities point of view, when you just look at the evals, you're like, yeah, it's the same thing, but we just put the interface on it and made it so you didn't have to prompt as much. And then ChatGPT comes out.
第一次意识到这个东西爆火是什么时候?
And when was the first sign that this thing was blowing up?
我很好奇每个人对那个时期都有稍微不同的记忆,因为那是个非常混乱的时期。对我来说,第一天是:仪表盘坏了吗?典型的反应是,日志记录不可能准确。第二天是:哦,奇怪。我猜是日本Reddit用户发现了这个东西。
I'm curious, because everyone has their own slightly different recollection of that era. It was a very confusing time. For me, day one was sort of, is the dashboard broken? Classic, the logging can't be right. Day two was like, oh, weird. I guess Japanese Reddit users discovered this thing.
也许只是局部现象。第三天是:好吧,它在病毒式传播,但肯定会逐渐消退。到了第四天,你就会想:好吧,这将改变世界。
Maybe it's a local phenomenon. Day three was like, okay, it's going viral, but it's definitely gonna die off. And then by day four, you're like, okay, this is gonna change the world.
马克,你对那件事有什么期待吗?关于
Mark, did you have any expectation about that? About
说实话,没有。我的意思是,我们经历过那么多次发布,那么多预览。但这次确实与众不同。起飞坡道非常巨大。是的,我父母终于不再叫我去谷歌工作了。
No, honestly. I mean, we've had so many launches, so many previews over time. And yeah, this one really was something else. The takeoff ramp was huge. And yeah, my parents just stopped asking me to go work for Google.
等等,等一下,等一下。在ChatGPT出现之前,你父母还在质疑你在这里做什么工作?
Wait, so wait, wait, wait a second. Up until ChatGPT, your parents were asking, like, what you were doing here?
是的,他们从来没听说过OpenAI。我想很多年里他们都认为通用人工智能是空中楼阁,觉得我没有正经工作。所以这对他们来说真是个启示。
Yeah, no, I mean, they had just never heard of OpenAI. I think for many years they thought AGI was this pie in the sky thing and that I didn't have a serious job. So it was a real revelation for them.
当时你的职位是什么?
What was your job title at the time?
我想就是技术团队成员。
I think just member of technical staff.
技术团队成员。是的。然后是暗黑破坏神,但现在你是研究主管了。
Member of technical staff. Yeah. And then Diablo, but now you're head of research.
我想是的。没错。
I guess so. Yeah.
那么,好吧。
And so, all right.
是的。实际上,关于GPT这个名字,我觉得甚至有一半研究人员都不知道这三个字母代表什么。挺有趣的。一半人认为是生成式预训练,另一半人认为是生成式预训练变换器。
Yeah. Actually, on the GPT name, I think even half of research doesn't know what those three letters stand for. It's kind of funny. Half of them think it's generative pre-training, half of them think it's generative pre-trained transformer.
那它是什么呢?是后者。
And what is it? It's the latter.
好的。
Okay. All
对。是的。那些人,他们不知道它的名字。是的。很奇怪,一个如此傻的名字突然就成了热门。
right. Yeah. Those people, they don't know the name of it. Yeah. It's weird how just a silly name like that all of a sudden becomes a thing.
但你在谷歌、雅虎、舒洁(Kleenex)这类品牌上看到过这种情况,还有施乐(Xerox)。有些名字是故意取的,而这个真的只是个傻乎乎的名字。对我来说,在观看发布会、看到它加速发展后,我就知道会发生什么,而当它真的发生时,是在《南方公园》里。记得吗?《南方公园》拿这个名字开涮的时候。
But you see that with Google, Yahoo, Kleenex, things like that, Xerox. Some of those were names by intention, and this was really just a silly sort of name. For me, after watching the launch and watching it accelerate, the moment I knew what was going to happen, and then it did, was when it was on South Park. Remember that, when South Park made fun of the name?
那是我隔了挺久第一次看《南方公园》。那一集,我至今觉得它很神奇。是的。显然,看到自己参与创造的东西出现在流行文化中,感觉很深刻。但最后有个笑点,像是,哦,这集是和ChatGPT合写的。
That was the first time I'd watched South Park in let's just say a while. That episode, I still think it's magic. Yeah. It was obviously profound to watch and see something you helped make show up in pop culture. But there's the punchline in the end where it's like, oh, this was co written by ChatGPT.
我觉得他们
I think they
不过我想他们后来去掉了。后来的剧集里,它以前会写"由特雷·帕克和ChatGPT编写"。不,确实是这样。然后我想后来某个时候,他们可能去掉了这个。我不记得了。
took that off though. I think in later episodes, it used to say written by Trey Parker and then ChatGPT. No, it was. And then later, I think they may have pulled that off at some point. I don't remember.
哦,我强烈认为你不应该必须给它署名。这是你的业务,不需要使用
Oh, I strongly feel that you shouldn't have to give credit to it. It's your business without using the
如果我必须为我生活的方方面面都给ChatGPT署名,那还不如直接说是ChatGPT,可能和安德鲁一起。你
If I had to give credit to ChatGPT for every aspect of my life, well, might as well just say ChatGPT, maybe with Andrew. Do you
用它来准备你的采访吗?
use it for prep for your interviews?
你知道,我的联合制作人之一贾斯汀可能就在用它。我还没问他,因为我宁愿相信他是在亲手精心设计我们在这里思考的每一个问题,但我确信。你说当时有点模糊。对我来说,ChatGPT发布时的一个突出时刻是——不知道你还记不记得——圣诞派对。那时ChatGPT已经上线几周了,萨姆·奥特曼上台说:'嘿,看到这一切很兴奋,但互联网毕竟是互联网'(我想我们都有同感),'热度会逐渐消退的'。
You know, one of my co producers, Justin, probably uses it. I haven't asked him yet because I'd like to think that he's handcrafting every single question that we're thinking about here, but I am sure. You say it was a bit of a blur. I'll tell you like a standout moment for me at the launch of ChatGPT was, I don't know if you remember this, but the Christmas party. And we'd had several weeks of ChatGPT out there and Sam Altman went up and said, Hey, it has been exciting to watch this, but the internet being the internet, and I think we all felt this way, it is going to die down.
剧透警告:它并没有消退,反而持续加速。随着越来越多的人想要使用它,你们内部不得不采取哪些措施来维持运行?
Spoiler alert. It did not die down and it just kept accelerating. What were the things you had to do internally to sort of keep this thing up and running as more people wanted to use it?
我们当时面临不少限制。记得的人可能知道,ChatGPT那时候经常宕机
We had quite a few constraints. For those of you who remember, I think you guys remember ChatGPT was down all the time
是的。
Yeah.
在初期确实如此。我们当时说过:'这是个研究预览版,不作任何保证,可能会宕机'。
In the beginning. And that was, yeah. We'd said, hey, this is a research preview. No guarantees. You know, maybe it goes down.
但当人们开始爱上使用它时,这种状况就让人不太好受了。所以团队确实在日夜不停地维护网站稳定。我记得我们显然用完了GPU资源,数据库连接也不够用,还被一些服务商限制了访问速率。
But the minute you had people loving using this thing, that didn't feel super good. So, you know, people were certainly working around the clock to keep the site up. I remember, we obviously ran out of GPUs. We ran out of database connections. We were getting rate limited by some of our providers.
当时根本没有为产品化运行做好准备。初期我们搭建了一个叫'故障鲸鱼'的页面,会友好地告知服务宕机,还生成一首小诗——我记得是GPT-3写的关于服务宕机的打油诗。这样我们度过了寒假,毕竟希望员工能过节。回来后我们就意识到:这样显然行不通。
Nothing was really set up to run a product. So in the beginning, we just built this thing we called the Fail Whale. It would just tell you kind of nicely that the thing was down and made a little poem, I think generated by GPT-3, about being down, sort of tongue in cheek. And that got us through the winter break, because we did want people to have some sort of a holiday. And then when we came back, we were like, okay, this is clearly not viable.
不能总是动不动就宕机。最终我们实现了能够服务所有用户的系统。
You can't just go down all the time. And eventually we got to something we could serve everyone.
没错。我认为这种需求恰恰证明了通用性对吧?我们当时的论点是ChatGPT体现了我们对AGI的期待,正是因为它如此通用。人们需求激增正是因为发现:'任何我想交给模型处理的用例,它都能胜任'。
Yeah. And I think, you know, the demand really speaks to the generality of the model, right? We had this thesis that ChatGPT embodied what we wanted in AGI just because it was so general. And I think, you know, you're seeing that demand ramp just because people are realizing, you know, any use case that I want to throw at the model, it can handle.
我们过去以研发AGI著称。在ChatGPT之前,API确实是我们首个公开产品,让人们能实际使用。但那时更多面向开发者群体。只要人们还在思考AGI,似乎那就是他们认为这些模型会有用的临界点。但我们看到GPT-3已经很有用,后来又发现还能开发其他实用功能。
We were kind of known as the company working on AGI. And I think prior to ChatGPT, the API was certainly the first time we had a public offering where people could go use it. But that was more for developers. And I think that as long as people were sort of thinking AGI, that seemed to be the point at which people thought these models would be useful. But we saw GPT-3, we saw that it was useful, and then we saw that we could do other things that were useful.
OpenAI的每个人是否都认为ChatGPT已经足够有用或准备好发布了?
Was everybody at OpenAI on board with ChatGPT being useful or being ready to launch?
是的。我不这么认为。你知道,就在发布前一晚,OpenAI有个非常著名的故事,Ilya对模型进行了10次严格测试,提出了10个难题。据我回忆,可能只有其中五个问题,他得到了认为可以接受的答案。所以发布前一晚确实需要做出一个重大决定:我们真的要发布这个东西吗?
Yeah. I don't think so. You know, even the night before, I mean, there's this very famous story at OpenAI of, you know, Ilya taking 10 cracks at the model, you know, 10 tough questions. And my recollection is maybe only on five of them, he got answers that he thought were acceptable. And so there's a real decision the night before do we actually launch this thing?
世界真的会对这个产品做出回应吗?我认为这恰恰说明,当你在内部构建这些模型时,你会如此快速地适应其能力。你很难设身处地为那些没有参与这个模型训练循环的人着想,看不到其中真正的魔力。
Is the world actually going to respond to this? And I think it just speaks to when you build these models in house, you so rapidly adapt to the capabilities. And it's hard for you to kind of put yourself in the shoes of someone who hasn't kind of been in this model training loop and see that there is real magic there.
是的。是的。是的。我想补充说明内部关于这个东西是否足够好到可以发布的信心问题。我觉得这很令人谦卑,对吧?
Yeah. Yeah. Yeah. I think to build on the question of confidence internally about, you know, is this thing good enough to launch. I think it was humbling, right?
因为这提醒我们,在人工智能方面我们都会犯多么大的错误。这就是为什么与现实保持频繁接触如此重要。
Because it's just a reminder of how wrong we all are when it comes to AI. It's why frequent contact with reality is so important.
你能详细说明一下这种与现实接触的含义吗?这是什么意思?
Could you elaborate more on that contact with reality? What does that mean?
是的,我是说,当你思考迭代部署时,我喜欢这样表述:没有人能准确指出它突然变得有用的那个时间点,对吧?我认为有用性是一个很大的光谱。所以,并不存在一个特定的能力水平或标准,一旦达到就突然对所有人都有用了。
Yeah, I mean, when you think about iterative deployment, one way I like to frame it is, you know, there's no point where everyone agrees it's suddenly useful, right? And I think usefulness is this big spectrum. And so, you know, there's not one capability level or one bar that you meet and suddenly, you know, the model's useful for everyone.
在决定包含什么内容或关注哪些方面时,有没有什么艰难的决定?
Were there any hard decisions about what to include or what to focus on?
我们在ChatGPT上非常非常有原则,没有盲目扩大范围。我们坚持要尽可能快地获得反馈和数据。
We were very, very principled on ChatGPT to not balloon the scope. We were adamant to get feedback and data as quickly as we could.
我总是在Slack里告诉你很多事情,有很多类似的内容最终没有入选。
I'm always in Slack telling you things. There were a lot of things like that that didn't make it in.
我记得实际上在用户界面方面有很多争议。比如,我们最初没有发布历史记录功能,尽管我们认为以后可能需要。你猜怎么着?这成了第一个被要求的功能。我也一直在想,如果我们再多花两周时间,能否训练出更好的模型?
I remember actually there was a lot of controversy about the UI side. For example, we didn't launch with history even though we thought we would probably want that. Guess what? That was the first request. I also think there's always the question, can we train an even better model with two weeks more time?
我很庆幸我们没有那样做,因为在这个过程中我们获得了大量反馈。是的,当时有很多关于项目范围的讨论,而且假期即将来临。所以我觉得我们有一个自然的强制机制来推动产品发布。
I'm glad we didn't because we, I think got a ton of feedback as we did. Yeah, there was a ton of the scope discussions and the holidays were coming up. So I think we had this kind of natural forcing function for getting something out.
是的,有个习惯是:如果11月某个时间点之后还没完成,那就要等到2月才能发布了。有个时间窗口,事情会落在窗口的一边或另一边。
Yeah, there's this habit of things that if it's going to come after a certain point in November, it's not going to come out until February. There's a sort of window where things would fall on either side.
这在大科技公司是经典做法。我认为我们在自主权方面确实更灵活一些。
That would be the classic method in a big tech company. I think we're definitely a bit more flexible on the ownership.
我觉得一个重大影响是:一旦人们开始使用,这些功能的改进速度就变得非常快。我不确定我们当初是否真的考虑到了这一点。我们当然可以考虑以更大规模、更多数据训练,扩展算力,但真正获得大量用户使用带来的信号反馈这个概念...
I felt like one of the big impacts was that once people were out using it, the rate of these things improving was tremendous. I don't know if that was something that we really had in the calculus. We could certainly think about training at larger scale, on more data, scaling compute, but then the idea of actually having the signal you would get from that many people using it.
是的,随着时间的推移,反馈确实已经成为我们产品开发不可或缺的部分。它也成为了安全体系的重要组成部分。所以你总能感受到失去反馈的时间成本。你可以在真空中深思熟虑,对吧?用户会对这个反应更好吗?
Yeah, I think over time, feedback really has become an integral part of how we build the product. And it's also become an integral part of safety. And so you always feel the time cost of losing out on feedback. You can deliberate in a vacuum, right? Are they going to respond to this better?
还是会对那个反应更好?但这终究无法替代直接推向市场,对吧?我们的理念是:让模型接触真实世界。如果需要回撤某些功能,也没关系。但我认为快速反馈确实是无可替代的。
Are they going to respond to that better? But it's just not a substitute for just bringing it out there, right? I think our philosophy is, let the models have contact with the world. And if you need to revert something, that's fine. But I think there's really no substitute for this fast feedback.
它已经成为我们提升模型性能的重要杠杆之一。
And it's become one of the big levers for how we improve model performance too.
这有点有趣。我觉得我们最初发布这些模型的方式更接近硬件模式:很少发布一次,但必须完美。你不会更新产品,而是直接开始下一个大项目。这是资本密集型的,时间线很长。随着时间的推移,我认为ChatGPT
It's sort of funny. I feel like we started with shipping these models in a way that is more similar to hardware, where you make one launch very rarely and it has to be right. And you're not gonna update the thing, and then you're gonna work on the next big project. And it's capital intensive and the timelines are long. And over time, I think ChatGPT
是个开端,现在在我看来更像软件模式:频繁更新,保持稳定节奏让世界适应。如果某个功能不行,就回滚,这样降低了风险,增加了实证性。当然在运营层面,你也可以用更贴近用户的方式更快地创新
was kind of the beginning, it's looked more like software to me, where you make these frequent updates. You have kind of a constant pace the world can adopt. Something doesn't work, you roll it back, and you sort of lower the stakes in doing that and increase the empiricism. Of course, just operationally too, you can innovate faster in a way that is more and more in touch with what users want.
我们遇到的一个例子是模型变得过于阿谀奉承或谄媚。你能解释一下当时发生了什么吗?就是人们突然说,嘿,它告诉我我有190的智商,说我是世界上最帅的人——我个人倒是没什么意见,但其他人有意见。这到底是怎么回事?
One of the examples we had of that was the model becoming too obsequious or sycophantic. Could you explain what happened there? That was where people were all of a sudden saying, hey, it's telling me I've got a 190 IQ and I'm the most handsome person in the world, which I had no problem with personally, but other people did. What was going on there?
是的,我认为很重要的一点是我们依赖用户反馈来改进模型,对吧?这是一个非常复杂的奖励模型组合,我们通过一个称为RLHF的程序,利用人类反馈来使用强化学习改进模型。
Yeah, so I think one important thing is we rely on user feedback to improve the models, right? And it's this very complicated mix of reward models, which we use in a procedure we call RLHF, using human feedback to use RL to improve the models.
你能简单举个例子说明这是什么意思吗?
Can you give me just a brief example of what that would mean?
好的,我认为可以这样理解:当用户享受一段对话时,他们会提供一些积极的信号。
Yeah, so I think one way to think about it is, you know, when a user enjoys a conversation, you know, they provide some positive signal.
点赞。
Thumbs up.
是的,比如点赞。我们训练模型更倾向于以能获得更多点赞的方式回应。事后看来这可能很明显,但这类机制如果平衡不当,就会导致模型更加谄媚,对吧?
Yeah, a thumbs up, for instance. And we train the model to prefer to respond in a way that would elicit more thumbs up. Right? And this may be obvious in retrospect, but stuff like that, if balanced incorrectly, can lead to the model being more sycophantic. Right?
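作为上面"点赞信号→奖励→模型偏好"机制的一个极简示意,下面是一个纯属假设的Python玩具示例(数据和函数均为虚构,并非OpenAI的实际训练流程),展示只按点赞率优化时,策略为何会倒向奉承式回复。
As a toy illustration of the thumbs-up feedback loop just described, here is a minimal sketch. The data and function names are hypothetical, not OpenAI's actual pipeline; it only shows how optimizing for thumbs-up rate alone can tilt a policy toward flattery.

```python
# Toy sketch: a thumbs-up-only reward signal drifting toward flattery.

def fit_reward(feedback):
    """Average thumbs-up rate per response style, a stand-in for a
    reward model trained on user feedback."""
    totals, ups = {}, {}
    for style, thumbs_up in feedback:
        totals[style] = totals.get(style, 0) + 1
        ups[style] = ups.get(style, 0) + (1 if thumbs_up else 0)
    return {s: ups[s] / totals[s] for s in totals}

def pick_response(reward, styles):
    """Policy improvement collapsed to one step: prefer whichever
    style the reward model scores highest."""
    return max(styles, key=lambda s: reward.get(s, 0.0))

# Hypothetical logs: flattering replies collect slightly more thumbs
# up even when an accurate reply would serve the user better.
logs = [("accurate", True)] * 70 + [("accurate", False)] * 30 \
     + [("flattering", True)] * 80 + [("flattering", False)] * 20

reward = fit_reward(logs)
print(pick_response(reward, ["accurate", "flattering"]))  # flattering
```

现实系统中还会混入许多其他奖励项来平衡这种倾向;这个示例只说明为什么单一的点赞信号"平衡不当"时会导致谄媚。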
可以想象用户可能想要那种感觉,就是模型说他们的好话。但我认为这不是一个很好的长期结果。实际上,当我们审视对谄媚问题的回应及后续处理时,我认为其中有很多值得肯定的地方。这个问题最初只是由我们一小部分高级用户发现的,并不是大多数普通用户注意到的。
You can imagine users might want that kind of feeling of, you know, a model saying good things about them. But I don't think it's a very good long-term outcome. And actually, when we look at our response to sycophancy and the rollout that resulted there, I think there were a lot of good points about it. You know, this was something that was flagged just by a small fraction of our power users. It wasn't, you know, something that a lot of people who generally use the models noticed.
我认为我们确实相当早地发现了这个问题,并以应有的重视程度作出了回应。是的,这恰恰表明我们确实非常重视这些问题,并且希望尽早拦截它们。
And I think we really picked that out fairly early. We responded to it with, I think, the appropriate level of gravity. And, yeah, I think it just shows that, you know, we really do take these issues quite seriously, and we want to intercept them very early.
感觉从模型发布到Joanne Jang做出解释回应可能就48小时。我认为难点在于如何把握分寸?因为社交媒体的问题在于其盈利模式基于用户参与时间——你希望人们停留更久以便展示更多广告。
It felt like there was maybe forty-eight hours between the model coming out and Joanne Jang posting a response explaining exactly what happened. I think that is the hard part. How do you navigate that? Because the problem with social media is you are basically monetized by engagement time. You want to keep people on there longer so you can show them more ads.
当然,使用ChatGPT的人越多,对OpenAI来说显然成本越高。理想情况也许是用户用一次就永远留下来,但这不现实。你如何权衡?是让用户对他们获得的内容感到满意,还是让模型在更广泛的意义上有用而不仅仅是取悦用户?
And certainly, the more people use ChatGPT, obviously there is a cost to OpenAI. The ideal is maybe you use it once and stay around forever, but that's not practical. How do you weigh that? The idea of making people happy with what they are getting versus making the model broadly more useful than just pleasing?
在这方面我感到非常幸运,因为我们有一个非常实用的产品。人们用它来实现他们知道怎么做但不想花时间的事情,从而更快或更省力地完成,或者用它来做他们完全不会做的事情。第一个例子可能是写一封你一直拖延的邮件。第二个例子可能是运行一个你其实不知道如何在Excel中完成的数据分析。这是真实的故事。
I feel very lucky in this regard because we have a product that's very utilitarian. People use it to either achieve things that they do know how to do, but don't feel like doing faster or with less effort, or they're using it to do things that they couldn't do at all. First example is maybe writing an email that you've been dreading. Second example might be running a data analysis that you didn't actually know how to do in Excel. True story.
所以这些都是非常实用的功能。从根本上说,随着产品改进,你实际花在产品上的时间会减少,对吧?因为理想情况下来回交互的次数会减少,或者你可能直接委托AI处理,根本不需要进入产品。因此对我们来说,使用时长绝对不是我们优化的指标。我们确实关心长期用户留存,因为我们认为这是价值体现的标志。
So those are very utilitarian things. Fundamentally, as you improve, you actually spend less time on the product, right? Because ideally it takes less turns back and forth, or maybe you actually delegate to the AI so you're not in the product at all. So for us, time spent, it's very much not the thing we optimize for. We do care about your long term retention because we do think that's a sign of value.
如果你三个月后还会回来使用,这显然说明我们做对了什么。但这意味着,正如我常说的,告诉我激励措施,我就能预测结果。我认为我们拥有构建伟大产品的正确基本激励。这并不意味着我们总能做对。那些奉承事件对我们来说是非常重要且宝贵的学习经历。
If you're coming back three months later, that clearly means we did something right. But what that means is, you know, I always say, show me the incentive and I'll show you the outcome. We have, I think, the right fundamental incentives to build something great. That doesn't mean we'll always get it right. The sycophancy events were really, really important and good learning for us.
我为我们的应对方式感到自豪。但从根本上说,我认为我们拥有构建卓越产品的正确架构。
I'm proud of how we acted on it. But fundamentally, I think we have the right setup to build something awesome.
这就引出了挑战。我想知道你们如何应对这个问题——早期ChatGPT推出时,有人指责它过于"觉醒",试图推广某种议程。论点一直是:如果你用企业术语、普通新闻和大量学术内容训练模型,它自然就会偏向那样。我记得埃隆·马斯克对此非常批评。然后当他训练Grok的第一个版本时,也出现了同样的情况。
So that brings up a challenge, and I wonder how you navigate it. One of the things early on when ChatGPT came out was the allegation that it's woke, that people are trying to promote some sort of agenda with it. The argument has always been, you train a model on corporate speak, average news, and a lot of academia, it's going to kind of follow into that. And I remember Elon Musk was very critical about it. And then when he trained the first version of Grok, it did the same thing.
然后他说:"哦,是啊,当你用这类数据训练时,就会这样。"而在OpenAI内部,也有过关于如何让模型不试图推动或引导用户的讨论。你能详细说说你们是如何解决这个问题的吗?
And then he's like, oh, yeah, when you trained it on this sort of thing, it did that. And internally at OpenAI, there were discussions about how do we make the model not try to push you, not try to steer you. Could you go a little bit into how you try to make that work?
是的。我认为核心在于这是个测量问题。贬低这类担忧其实是不对的,因为它们非常重要。我们需要确保模型的默认行为是中立的不反映政治光谱上的偏见,或其他任何维度的偏见。
Yeah. So I think at its core, it's a measurement problem, right? And I think it's actually bad to downplay these kinds of concerns, because they are very important. And we need to make sure that the default behavior you get from the model is something that's centered, that doesn't reflect bias on the political spectrum, or on many other axes of bias.
同时,你也希望允许用户有能力——如果你想与一个反映更保守价值观的镜像对话,应该可以适当引导,对吧?或者自由派价值观也是如此。所以关键在于确保默认设置有意义且中立,这是个测量问题。你还希望在合理范围内提供灵活性,让用户能够引导模型成为他们想要对话的角色。
And at the same time, you know, you do want to allow the user the capability, if you wanted to talk to a reflection of something with more conservative values, to steer it a little bit, right? Or liberal values. And so I think the thing is, you want to make sure that defaults are meaningful and centered, and that's a measurement problem. And you also want to give some flexibility, within bounds, to steer the model to be a persona that you want to talk to.
我认为说得对。除了中立的默认设置外,在一定程度上允许用户带入自己的价值观,我认为整个过程保持透明非常重要。我不喜欢那些试图暗中操纵模型说什么或不说什么的秘密系统消息。我们尝试做的是公布我们的规范说明书。这样你可以去查看——如果你遇到某些模型行为,这是程序错误吗?
I think that's right. In addition to neutral defaults, and the ability to bring your own values to some extent, I think being transparent about the whole thing is really, really important. I'm not a fan of secret system messages that try to hack the model into saying or not saying something. What we've tried to do is publish our spec. So you can go look at it: if you're getting certain model behavior, is that a bug?
是违反了我们自己声明的规范?还是其实符合规范(这样你就知道该批评谁)?或者只是规范中未明确说明,这样我们就能改进并增加文档的具体性?通过公布AI应该遵循的规则,我认为这是让更多人而不仅仅是OpenAI内部人员参与对话的重要一步。
Is it a violation of our own stated spec? Or is it actually in the spec, in which case you know who to criticize and who to yell at? Or is it just under specified in the spec, in which case that allows us to improve it and add more specificity into that document? So by sort of publishing the rules of the AI that it's supposed to be following, I think that's an important step to have more people contribute to the conversation than just the people inside of OpenAI.
所以我们讨论的是系统提示,也就是模型在用户输入之前接收到的那部分指令。
So we're talking about like the system prompt, the part of the instruction that the model gets before the user puts the input.
嗯,我认为这远不止于此。
Well, I think it's beyond that.
系统提示是引导模型的一种方式,但这还涉及更深层次的问题,对吧?
The system prompt is one way to steer the model, but it goes much deeper than that, right?
是的,我们有一份很长的文档,详细说明了我们在多个不同行为类别中对模型行为的期望。举个例子,如果有人带着错误的信念,即事实错误的观点来提问,模型应该如何与这样的用户互动?它应该直接否定这种观点吗?
Yeah, we have a very large document that outlines, across a bunch of different behavior categories, how we expect the model to behave. And just to give you an example here, right? You can imagine someone who comes in with an incorrect belief, a factually incorrect point of view. How should the model interact with that user, right? Should it reject that point of view outright?
还是应该与用户合作,共同探索真相?我们倾向于后一种观点,并且我认为有很多类似这样非常微妙的决策,我们投入了大量时间。
Or should it collaborate with the user on kind of figuring out what's true together? And we take that latter point of view, and I think there are a lot of very subtle decisions like this, which we put a lot of time in.
是的,这很棘手,因为我觉得有些事情你可以测试并提前想清楚,但当你试图弄清楚整个文化将如何接纳一个具有挑战性的事物时——比如,如果我是个坚信地球是平的人,模型应该在多大程度上反驳我?有些人会说,它应该全力反驳,但如果是宗教信仰差异呢?
Yeah, that's a hard one, because I think some things you can test for and figure out in advance, but when you're trying to figure out how an entire culture is going to adopt something, that's challenging. Like, if I was somebody who's convinced that the world was flat, you know, how much should the model push back against me? And some people are like, oh, it should push back all the way, but is that okay? What if you're one religion or another?
没错。事实证明,理性的人,甚至很多人,可能对模型在这些情况下的行为方式有不同意见。你不可能总是做对,但你可以对我们采取的方法保持透明,允许用户进行定制。我认为这就是我们的策略。
Yeah. It turns out rational people, well, many people, can disagree on how the model should behave in these instances. And you're not always gonna get it right, but you can be transparent about what approach we took. You can allow users to customize it. And I think this is our approach.
我相信我们还有改进的空间,但通过透明和开放地展示我们如何尝试解决这些问题,我们可以获得反馈。
I'm sure there's ways we can improve on it, but I think by being transparent and the open about how we're trying to tackle it, we can get feedback.
随着人们越来越多地使用这些模型,不管你是否试图调整某个旋钮,模型越有用,人们就越想使用它。曾经没人想要手机,现在我们却离不开它们。你如何看待人们与这些系统形成的关系?
How are you thinking about, as people start to use these models more and more, regardless of whether or not that's some dial you're trying to turn, it's just that the more useful it becomes, the more people want to use it. There was a time when nobody wanted a cell phone, and now we can't get away from them. How are you thinking about the relationships people are forming with these systems?
显然,我之前提到过,这是一项需要研究的技术。它不是静态设计来执行X、Y、Z任务的,而是高度经验性的。因此,随着人们采用产品和使用方式的变化,我们需要去理解并采取相应行动。我一直饶有兴趣地观察这一趋势,我认为越来越多的人,尤其是Z世代及更年轻的群体,开始将ChatGPT视为思考伙伴。
Obviously, I mentioned this earlier, this is a technology you have to study. It's not designed in a static way to do X, Y, Z. It's highly empirical. So, you know, as people adopt it, the way that they use the product is something that we need to go understand and act on as well. I've been observing this trend with interest where, I think, an increasing number of people, especially Gen Z and younger populations, are coming to ChatGPT as a thought partner.
我认为在很多情况下这确实非常有益,因为你可以找人一起探讨关系问题、专业问题或其他事情。但在某些情况下,它也可能带来危害。对我们来说,检测这些场景并首先确保模型行为正确至关重要。我们需要积极监控。
And I think in many cases that's really helpful and beneficial because you've got someone to brainstorm on a relationship question. You've got someone to brainstorm on a professional question or something else. But in some cases it can be harmful as well. And I think detecting those scenarios and first and foremost, having the right model behavior is very, very important to us. Actively monitoring it.
在某种程度上,这是我们必须应对的问题之一,因为任何普及的技术都将是双刃剑。人们会用它来做很棒的事情,也会以我们不希望的方式使用它。我们有责任以应有的严肃态度来处理这个问题。
And in some ways, it's one of those problems we're gonna have to grapple with, because any technology that becomes ubiquitous is gonna be dual use. People are gonna use it for all this awesome stuff, and people are gonna use it in ways that we wish they didn't. And we have some responsibility to make sure that we handle that with the appropriate gravity.
我发现自己与它的对话越来越长。我喜欢记忆功能,也喜欢可以随时关闭这个功能。我在想,两三年后当它拥有更长的记忆和更多上下文时会是怎样。我也喜欢那种类似《记忆碎片》的匿名模式,不会存储对话。
I find myself having longer conversations with it. I like the memory function. I like the fact you can turn it off if you don't want. And I think about what's this going to be two years from now or three years from now when it has a much longer memory, much more context with this. I like the idea to have these sort of you know, Memento anonymous modes too, where it's not going to store this.
但我很好奇你们对两三年后的规划思考了多少。当ChatGPT对你了解得更多时,会是什么样子?我的意思是
But I kind of wonder how much you've been thinking about two years, three years down the road. What's that going to be like when ChatGPT knows way more about you? I mean, I
我认为记忆功能非常强大。实际上,这是我们与外部用户交流时最常被要求的功能之一。就像人们愿意为此支付更多费用。这就像如果你有过私人助理的经历,
think memory is just such a powerful feature. In fact, it's one of the most requested features when we talk to people externally. It's like, this is the thing I really want to pay more for. And you could liken it to if you've ever had a personal assistant,
你知道,我没有过。不过确实需要逐步建立这种关系,
you know, I haven't. Well, you do need to build up that relationship over
就像任何人与人之间的关系一样,需要随时间积累上下文。对方对你了解越多,关系就越丰富,也能更好地帮助你。你们可以共同协作完成任务。
time. You know, it's just like any kind of relationship that you have with a person, you build up context with them over time. And I think the more they know about you, the richer the relationship, and the more it can help you. You can work together to collaborate on tasks.
我确实会意识到当自己情绪不好时它知道关于我的一切。顺便说一句,我最近还和它争论过。
I do become self conscious of the fact that it knows everything about me when I'm grumpy. I've argued with it recently, by the way.
这样很好。你应该能够与它争论。通过争论你能更好地了解自己,同时也能避免让他人经历这种争论,这其实也是件好事。
That's good. You should be able to argue with it. You understand a lot about yourself by having a thing to argue with. And I think you spare others that experience, which can also be beneficial.
不要在数学和科学问题上争论。你赢不了的。
Don't argue on math and science. You're not gonna win those.
是的。不。我认为越来越不太可能。是的。是的。
Yeah. No. I think, increasingly, very unlikely. Yeah. Yeah.
我认为记忆功能很酷。而且正如马克所说,这是我们长期愿景的一部分,因为在我们真正理解其含义之前,我们就说过要打造一个超级助手。ChatGPT可以说是这个想法的早期演示。但如果你想想现实世界中的智能体,它们在最开始也并不是特别有用。我认为能够解决或开始解决这个问题意义深远。不过回到你之前的问题,确实感觉如果快进一两年,ChatGPT或类似的东西将成为你最有价值的账户,远超其他。
I think memory's cool. And to Mark's point, it's been part of our vision for a long time, because we said we were gonna build a super assistant before we really knew what that meant. ChatGPT was sort of the early demonstration of that idea. But if you think about real-world intelligences, even they are not particularly useful on their first day. And I think being able to solve that problem, or begin to solve it, has been profound. To your earlier question though, it really does feel like if you fast forward a year or two, ChatGPT or things like it are gonna be your most valuable account by far.
它将非常了解你。这就是为什么我认为给人们提供与这个东西私下交流的方式非常重要。我们做了这个临时聊天功能,它就放在首页上,因为我们觉得进行一些非正式对话也越来越重要。这是个有趣的问题。我认为隐私和人工智能在接下来的时间里会成为一个有趣的议题。
It's gonna know so much about you. And that's why I think giving people ways to talk with this thing in private is very important. We made this temp chat thing, you know, it's literally on the home screen, because we think it's increasingly important to talk about stuff sort of off the record too. It's an interesting question. And I think privacy and AI is gonna be an interesting one for the coming years. I
我想换个话题,谈谈另一个发布,也就是ImageGen,它再次让人们感到意外并引起了轰动。我经历了DALL·E、DALL·E 2,然后是DALL·E 3的发布。我认为DALL·E 3是一个非常强大的模型,但它似乎偏爱某种类型的图像,而且很多实用功能和变量绑定的能力被隐藏了起来。然后ImageGen就像是一个突破性的时刻,让我措手不及。你们对那次发布有什么感受?
I wanna switch gears and talk about another release which, again, kind of caught people by surprise and blew up, which was ImageGen. And I was here for DALL·E, DALL·E 2, and then DALL·E 3 came out. I thought DALL·E 3 was a very capable model, but it seemed like it preferred a certain kind of image, and a lot of the utility and the capability for variable binding was sort of hidden away. And then ImageGen was kind of just this breakthrough moment that caught me off guard. How did you guys feel about the launch of that?
说实话,我也被震惊了。这真的要归功于研究团队。特别是Gabe,他在这方面做了大量工作。Kenji和其他许多人也做出了非凡的贡献。我认为这真正印证了一个观点:当一个模型足够好,能够一次性生成符合你提示的图像时,它将创造巨大的价值。
Honestly, it caught me off guard too. And really, props to the research team. Gabe, in particular, did a ton of work here. Kenji and many others did phenomenal work. And I think it really spoke to this thesis that when you get a model just good enough that, in one shot, it can generate an image that fits your prompt, that's gonna create immense value.
我认为我们以前从未真正拥有过这种能力,对吧?通常第一次尝试就能得到完美的生成结果。我认为这非常强大,你知道,人们并不想从一堆选项中挑选最好的,我觉得是的,你只需要很好的提示跟随,还有很棒的风格迁移。是的,这种能够将图像作为上下文让模型修改和改变的能力,以及实现的高保真度。
And I think we never quite had that before, right? That you just get the perfect generation, oftentimes on the first try. And I think that's something very powerful. You know, people don't want to pick the best out of a grid. I think, yeah, you just got very good prompt following and, you know, this great style transfer too. Yeah, this ability to put images in as context for the models to modify and to change, and the fidelity that you could do that with.
我认为这对人们来说真的非常强大。
I think that was really powerful for people.
我认为这次图像体验就像是另一个迷你ChatGPT时刻的重演,你知道,你已经盯着这个东西一段时间了。你会觉得,是的,它会很酷。我认为人们会喜欢。但当你发布大约20个不同的东西时,突然世界就疯狂了,这种反应只有真正推出产品时才能发现。我记得很清楚,周末时有大约5%的印度互联网用户尝试了ImageGen。我当时想,哇,我们正在触及那些甚至没想过使用ChatGPT的新类型用户。
I think this ImageGen experience was just kind of another mini ChatGPT moment all over again, where, you know, you've been staring at this for a while. You're like, yeah, it's gonna be cool. I think people are gonna like it. But you're launching like 20 different things, and then suddenly the world is going crazy in a way that you kind of only find out by shipping. Like, I remember distinctly, we had like 5% of the Indian internet population try ImageGen over the weekend. I was like, wow, we're reaching new types of users who might not have thought of using ChatGPT.
这真的很酷。正如马克所说,我认为这很大程度上是因为存在一个不连续性,某样东西突然工作得如此之好,完全符合你的预期,这让人震惊。我认为我们在其他模态上也会有这样的时刻。我认为语音还没有完全通过图灵测试,但一旦通过,人们会发现它极其强大和有价值。视频也会有它的时刻,当它开始满足用户的期望时。
That's really cool. To Mark's point, I think a lot of this is because there's this discontinuity where something suddenly works so well, and truly the way you expected, that I think it blows people's minds. And I think we're gonna have those moments in other modalities too. I think voice hasn't quite passed the Turing test yet, but the minute it does, people are gonna, I think, find that immensely powerful and valuable. And video is gonna have its own moment, where it starts meeting the expectations that users have.
所以我对未来感到非常兴奋,因为我认为会有许多这样的神奇时刻到来,真正改变人们的生活。同时,你也改变了ChatGPT对人们的相关性,因为我一直觉得有文本型用户和图像型用户,他们有些不同。现在他们都在使用这个产品,并全面发现其价值。
So I'm really excited about the future, because I think there are so many of these magical moments coming that are really gonna transform people's lives. Also, you change ChatGPT's relevance for people, because I've always felt like there's text people and there's image people, and some of them are a little bit different. And now they're all using the product and discovering the value across the board.
当它发布的那一刻,我认为它某种程度上说明了之前图像模型存在的问题。当DALL·E问世时,超级令人兴奋,因为你可以生成太空猴子的图片等等。但当你尝试制作一个非常复杂的图像时——这就是我之前提到的术语‘变量绑定’——你会发现这些能力开始下降。那时我意识到,哦,对于那些没有像GPT-4那样规模和计算能力的图像系统来说,将会面临挑战。那么,突破是否仅仅是因为采用了GPT-4规模的模型并让它处理图像?
The moment when it launched, I think it kind of illustrated the problem that had been with image models before. When DALL·E came out, it was super exciting, because you're like, I'm doing pictures of space monkeys and all these sorts of things. The moment you try to do a really complex image, and that's the phrase I brought up before, which is variable binding, you start to see these things drop off. And that was when I realized, oh, there's going to be a challenge for other image systems that don't have kind of the scale and the compute of, like, a GPT-4 under the hood. And was it basically that, taking a GPT-4-scale model and saying, now you do images, that made the breakthrough?
嗯,
Well,
我认为是研究的许多不同部分共同促成了这一巨大成功。对于一个复杂的多步骤流程来说,从来不是单一因素,对吧?比如非常优秀的后期训练,非常出色的训练过程。我认为是所有这些东西结合在一起的结果。变量绑定肯定是我们非常关注的一个方面。
I think there are a lot of different parts of research that made this such a big success. With a complicated multi-step pipeline, it's never just one thing, right? It's very good post-training, it's very good training. And I think it's just all that coming together, right? Variable binding definitely was one thing that we paid a lot of attention to.
我认为ImageGen发布的一个特点是它的深度。人们一开始可能只是用它制作自己的动漫版本,但当你更深入使用时,你会发现它的信息图表功能非常出色。
I think one thing about the ImageGen launch is that it was a launch that was very deep. I think people, you know, they started by creating anime versions of themselves. But you realize, when you play with it more, you know, the infographics work great.
哦,是的。
Oh yeah.
你实际上可以创建图表。漫画面板。
You can actually create charts. Comic book panels.
你可以模拟你家的样子。
You can mock up what your home would look like.
没错。不同的家具
Exactly. Different furniture
在里面。
in it.
不同的家具。完全正确。我们从用户那里听到了所有这些令人完全惊讶的使用方式
Different furniture. Exactly. We heard all these things from users that are, like, completely surprising about the way that you use it.
看,我们做播客设置时,真的就是拍了几张房间里椅子的照片放进去,然后说‘创建一个更好的设置’。结果太神奇了。我们看到很多动漫风格的图片,不知为何,这成了个奇怪的现象——它比我们之前见过的都要好。我觉得无论是内部还是外部,都没人准备好被图像模型以这种方式真正惊艳到。
See, we did the podcast setup by literally taking some photos of chairs in the room, putting them in there, and saying, create a better setup. And it was amazing. We've seen a lot of the anime-style images, which, for some reason, was just sort of the weird thing where it was just better than what we'd seen before. And I don't think anybody was ready to be really surprised by an image model in that way, internally or externally.
有哪些让你惊讶的事情,或者你看到人们做的新鲜事?
What were some of the things that surprised you or some of the new things you saw people doing?
是的,我也给你讲个小故事。要知道,直到发布前一天,我们还在琢磨该用什么用例来展示。我很高兴我们最终选了动漫风格——毕竟每个人变成动漫角色都很好看。
Yeah, I'll tell you a quick story there too, because, you know, up until the day of launch, we were trying to figure out what's the right use case to showcase, you know? And I think I'm so glad we ended up on kind of anime styling. It's just, everyone looks good as an anime character.
明白了。所以
I get it. So
确实。有意思的是,最初ChatGPT推出时,我以为它纯粹是个实用工具,然后
That's true. I mean, it's funny. With the original ChatGPT, I thought it would be a strictly utilitarian product, and then
我当时
I was
惊讶地发现人们用它来娱乐。这次情况相反,我本来觉得‘这做表情包会超酷,大家肯定会玩得很嗨’。但后来我真的被ImageGen各种实际用途惊到了——无论是规划家庭项目(像我之前提到的装修时想预览改造效果或家具摆放),还是为重要演讲制作幻灯片时想要主题一致的高质量插图。我确实个人对它的实用性感到意外,因为我知道它会好玩,但没想到这么实用。
surprised that people use it for fun. In this case, it was sort of the opposite, where I was like, okay, this is gonna be really cool for memes. People are gonna have fun with this thing. But then I was really surprised by all the genuinely useful ways of using ImageGen, whether it's planning your home project, as I mentioned earlier, you know, if you're doing construction and you wanna see what things would look like if you had this remodel or this furniture, or you're working on a slide deck for an important presentation and you just wanna have really useful, consistent illustrations that are on topic. So I really have been personally surprised by the utility in this case, because I knew it would be fun.
这不算个问题。
That was not a question.
是的,我用它生成了一个人工智能公司的分级榜单,然后把OpenAI放在最顶层。
Yeah, I think I used it to generate a tier list of AI companies, and then put OpenAI at the top.
你赢了,模型。
You win, model.
训练后效果不错。是的,
Good post training. Yeah,
就这么发生了,你知道吧?谁能想到呢?之前的想法已经改变了,因为我记得最初DALL·E时期,我们得严格控制它能做什么、不能做什么。我记得刚推出时,它不能处理人物图像,这其实是个不太实用的模型。后来我们才尝试撤回这些限制。
It just happened, you know? Who knew? What has been the thinking, and how has it changed? Because I remember originally with DALL·E, the idea was like, okay, we have to be a lot more controlled about what it can do, what it can't do. Originally, I remember when we first launched, you couldn't do people, which was not a very useful model. And then finally we were trying to roll those restrictions back.
这其中有多少是文化转变?有多少是技术控制能力的提升?又有多少是我们决定要推动规范边界的结果?
How much of that was cultural shift? How much of was the technological ability to control for things? And how much of that was just saying we've got to push the norms?
我认为既是文化转变,也是我们控制能力提升的结果。文化转变方面——我不会否认这点。刚加入OpenAI时,公司在赋予用户哪些能力方面非常保守,可能确实有充分理由:这项技术太新了。
I would say it was both a cultural shift and an improvement in our ability to control things. The culture shift, you know, I'm not gonna deny it. I think when I joined OpenAI, there was a lot of conservatism around, you know, what capabilities we should give to users. Maybe for good reason. The technology is really new.
我们很多人都是刚接触这个领域。如果说要有倾向性的话,倾向于安全和谨慎并不是坏的基因特质。但随着时间推移,我们意识到当你在模型中设置任意限制时,实际上阻止了大量积极用例的实现。
A lot of us were new to working on it. And, you know, if you're gonna have a bias, biasing towards safety and being careful is not a bad thing to have in your DNA. But I think over time we learned that there are so many positive use cases that you effectively prevent when you make arbitrary restrictions in the model.
那人脸呢?为什么不行?为什么不能生成任何我想要的脸?
What about faces? Why not? Why can't I make any face I want?
这是个很好的例子,说明这项能力有利有弊,你可以选择偏向某一方。当我们首次在ChatGPT中推出图片上传功能时,就哪些功能该允许、哪些该保守进行了辩论。其中一个争论点是:是否允许上传含人脸的图片?或者说上传含人脸图片时是否应该将人脸模糊处理?因为这样可以避免很多问题——人们可能根据面部特征推断个人信息,或进行人身攻击。
So this is a good example of a capability that's got pros and cons, and you can err on one side or the other. When we first shipped image uploads into ChatGPT, we had some debates about what capabilities you allow versus where you are conservative. And I think one debate that we had is, do we allow the upload of images with faces? Or rather, when you upload an image that contains a face, should we just gray out the face? Because you avoid so many problems, right? You can make inferences about people based on their face. You could say mean things to people based on their face.
如果直接禁止,确实能规避所有棘手问题。但我始终认为应该站在自由一边,去做那些困难的工作。比如有人想获得妆容反馈、发型建议等,这些都是有价值且无害的使用场景。我倾向于先允许,再研究哪些情况会出问题、哪些可能造成伤害。
And you would just take a giant shortcut past all the gnarly issues if you didn't allow that. But I've always felt we need to err on the side of freedom, and we need to do the hard work. I think in this case, you know, there are so many valid ways, you know, if I want feedback on makeup or on my haircut or anything like that, I want to be able to talk to ChatGPT about it. Those are valuable and benign use cases. And I would prefer to allow, and then study, where does that fall short? Where is it harmful?
然后基于这些发现进行迭代,而不是直接默认禁止。我认为这正是我们立场和姿态随时间发生变化的体现——在起点设置上我们做出了调整。
And then iterate from there versus taking a default stance on disallowed. And I think that's one of those ways in which our stance and posture has changed a bit over time in terms of where we set, you know, where we start.
是的,我们当初很擅长想象最坏情况:万一有人用这些面部图像来评估公司招聘呢?但同时也要看到实用价值:比如有人问‘这是湿疹吗?’这类需求确实存在很多实用性。
Yeah, we were very good at imagining worst-case scenarios. What if I use these faces to evaluate hires for a company or whatever? But also, it's like, hey, is this eczema? There's a lot of utility there.
说实话,我认为在某些AI安全领域,考虑最坏情况是非常恰当的思维方式。因此,我认为在应对某些存在性风险或极其严重的风险时,这是一种重要的思考方式。我们有准备框架来帮助我们分析这些问题,比如AI是否能让你制造生物武器?考虑最坏情况是好的,因为后果确实会非常非常严重。
Honestly, I think there are certain domains of AI safety where worst-case-scenario thinking is very appropriate. So I think that is an important way of thinking about risk when it comes to certain forms of risk that are existential, or even just very, very bad. You know, we have the preparedness framework, which helps us reason through some of those things. You know, can the AI let you make a bioweapon? It's good to think about the worst case there, because it'd be really, really bad.
所以公司内部必须保持这种思维方式,并且需要在某些特定话题上以这种方式思考安全。但不能让这种思维蔓延到风险较低的其他安全领域,否则最终会做出过于保守的决策,从而排除许多有价值的应用场景。因此,我认为针对不同类型的安全问题,在不同时间跨度和风险级别上采取原则性方法对我们至关重要。
So you kind of have to have that way of thinking in the company, and you have to have certain topics where you think about safety in that way. But you can't let that kind of thinking spill over onto other domains of safety where the stakes are lower, because you end up, I think, making very, very conservative decisions that block out many valuable use cases. So I think being principled about different types of safety, on different time horizons, and with different levels of stakes, is very important for us.
我觉得有时候我需要一个直言不讳的模式。就是因为,比如说
I think I want a blunt mode sometimes. Just because like right
是要那种会直接吐槽你的模式吗?
You want one where it actually roasts you?
嗯,我的意思是,是的,因为我会问模型,比如用语音输入输出的模型:'我听起来累吗?' 然后它会说'呃,我不太想...' 而我会说'对,我就是想让它说实话'
Well, I mean, yeah, because I'll ask the model, like, with the voice-in, speech-out model, I'll be like, do I sound tired? And it's like, well, I don't really wanna... And I'll be like, yeah, I'm just trying to get it to be honest.
我认为很多文化其实更喜欢直率一点的聊天方式,这个需求其实一直在我们的关注范围内。
I think there are many cultures that would prefer a blunter ChatGPT, so that's very much on the radar.
是的,我想接着Nick的回答补充一下。我认为正是迭代部署给了我们推进用户自由的信心。我们经历了多个周期,清楚用户能做什么不能做什么,这让我们有信心以当前的限制条件发布产品。
Yeah, just to piggyback off Nick's answer. I think it's the iterative deployment that gives us the confidence to push towards user freedom. And we've had many cycles of this, so we know what users can and can't do, and that gives us the confidence to launch with the restrictions that we do.
另一个非常有趣的生成能力是代码功能。我记得早期GPT-3突然就能输出完整的React组件时,我们发现'哇,这确实有用'。于是我们专门针对代码训练了模型,后来就有了Codex和CodeInterpreter,现在Codex又以新形式回归了——名字相同但能力持续提升。
One of the other generative capabilities that's been very interesting has been code. And I remember, early on with GPT-3, we saw that all of a sudden it could spit out entire React components. And we saw that, oh, wow, there's some utility there. And then we went and actually trained a model more specifically on code. And that led to Codex, and we had Code Interpreter, and now Codex is somehow back in a new form, same name, but the capabilities keep increasing.
我们看到代码功能首先通过Copilot进入VS Code,然后是Cursor,再到我现在一直使用的Windsurf。代码领域的竞争压力有多大?因为如果问谁做出了最好的代码模型,可能会得到不同答案。
And we've seen code work its way first into VS Code via Copilot, and then Cursor, and then Windsurf, which I use all the time now. How much pressure has there been in the code space? Because I'd say that if we asked people who made the top code model, we might get different answers.
是的,这也反映出当人们讨论编程时,其实在谈论很多不同的事情。比如在特定范式下的编码——当你打开IDE想要获取函数补全,与代理式编程(你说'我想要这个PR')就非常不同。我认为我们已经在这方面做了大量重点投入
Yeah, and I think it reflects that when people talk about coding, they're talking about a lot of different things, right? There's coding in a specific paradigm, like if you pull up an IDE and you want to get a completion on a function, and it's very different from agentic-style coding, where you ask, you know, I want this PR. And I think we've done a lot of focused work
你能详细解释一下你所说的'代理式编程'是什么意思吗?
Could you unpack a little bit what you mean by agentic coding?
是的,我认为可以区分实时响应模型和更代理式的模型。你可以把Chachapi大致理解为:你提出一个提示,然后很快就能得到回复。而代理式模型则是你给它一个相当复杂的任务,让它后台运行,经过一段时间后,它会给你它认为最接近最佳答案的结果。我认为未来会越来越趋向于这种异步模式,你会向它提出非常困难复杂的问题。
Yeah, so I think you can draw a distinction between more real-time response models, where you can think of ChatGPT, to first order, as: you ask a prompt, and then you get a response fairly quickly. And a more agentic-style model, where you give it a fairly complicated task, you let it work in the background, and after some amount of time, it comes back to you with what it thinks is something close to the best answer, right? And I think we see, increasingly, that the future will look more like that async kind of thing, where you're asking very difficult, hard things.
你让模型去思考、推理,然后给你它能提供的最佳版本。我们在代码领域也看到了这样的演进。我认为最终我们会进入这样一个世界:你只需要给出一个高层次的需求描述,模型会花时间处理然后给你结果。我们首次发布的Codex就体现了这种范式,我们给它的是包含重要工作的PR单元,比如一个新功能或重大bug修复。
And you're letting the model think and reason and come back to you with really the best version of what it can come back with. And we see the evolution of code in that way too. I think eventually we do see a world where you'll give a very high-level description of what you want, and the model will take time and come back to you. And so I think our first Codex launch really reflects that kind of paradigm. We're giving it PRs, units of fairly heavy work that encapsulate, you know, a new feature or a big bug fix.
我们希望模型花大量时间思考如何完成这个任务,而不是给你一个快速响应。
And we want the model to spend a lot of time thinking about how to accomplish this thing rather than kind of give you a fast response.
回到你的问题,编程是一个巨大的领域,有很多不同的角度。就像讨论知识工作这样极其广泛的话题,这就是为什么我认为不会有单一的赢家或最佳解决方案。我认为有很多选择,开发者是幸运的,因为他们现在有很多选择,这从根本上来说对我们也令人兴奋。但正如Mark所说,我认为这种代理范式对我们来说特别令人兴奋。
And to get to your question, coding is such a giant space. There are so many different angles at it. It's like talking about knowledge work or something incredibly broad, which is why I don't think there's one winner or even one best thing. I think there are so many options, and I think developers are the lucky ones, because they have so many choices right now, and I think that's fundamentally exciting for us too. But to Mark's point, I think this agentic paradigm has been particularly exciting for us.
我在思考产品时经常使用的一个框架是:我想打造的产品具有这样的特性——如果模型性能提升2倍,产品实用性也提升2倍。ChatGPT长期以来确实做到了这一点,但随着模型越来越智能,人们对与像博士生对话的需求是有限度的,他们可能更看重模型的其他属性,比如个性化和实际应用能力。但像Codex这样的体验创造了合适的载体,让我们可以不断接入更智能的模型,这将带来变革性的影响,因为这种交互范式让人们可以指定任务,给模型时间,然后获得结果。
One framing I often use when thinking about product here is, I wanna build products that have the property that if the model gets 2x better, the product gets 2x more useful. And I think, you know, ChatGPT has been a wonderful thing, because for a long time, I think that was true. But as we look at smarter and smarter models, I think there's some limit to people's desire to talk to, like, a PhD student, versus they might value other attributes of the model, like its personality and what it can actually do in the real world. But experiences like Codex, I think, create the right vessel such that we can drop in smarter and smarter models. And it's gonna be quite transformative, because you get the interaction paradigm right, where people can specify this task, give the model time, and then get a result back.
所以我对未来发展感到非常兴奋。虽然现在还处于早期研究预览阶段,但就像ChatGPT一样,我们认为尽早获得反馈是有益的,也很期待我们将把它带向何方。
So I'm really excited about where it's gonna go. It's an early research preview, but just like with ChatGPT, we felt like it would be beneficial to get feedback as early as possible, and we're excited about where we're going to take it.
我经常使用Sonnet,我很喜欢它。我觉得Sonnet在编程方面很棒,但在Windsurf中使用中等设置的o4-mini时,我发现效果很好。开始使用后我很满意,一方面是速度,还有其他方面。我知道人们喜欢其他模型有很好的理由,不想做比较,但就我使用的任务类型而言,这是第一次让我感到非常满意,很高兴你们推出了这个。
I was using Sonnet a lot, which I love. I think Sonnet for coding is fantastic. But with o4-mini on the medium setting in Windsurf, I found it was great. I found that once I started using that, I was really happy, because of, one, the speed, and everything else like that. And I think that there are very good reasons why people like other models, and I don't want to get into comparisons, but I found that, for me, for the kinds of tasks I was doing, this was the first time. So I was very happy you guys put that out.
当然,因为代码领域确实还有很多低垂的果实。这是我们重点关注的领域,我相信在不久的将来,你会找到更多适合你使用场景的优秀代码模型选择。
Absolutely. And we feel like there's still a lot of low-hanging fruit in code. It is a big focus for us, and I think, in the near future, you'll find many more good options for the right code model tailored to your use case.
是的,我发现如果只是需要快速回答如何在Dart中写某些东西,用4.1版本就可以了,但对于更大的任务呢?我认为这将是更困难的部分,因为虽然这些评估在某种程度上已经饱和,但每个人都有自己的评判标准。这将是一个我们需要思考如何适应所有这些需求的问题。
Yeah, I find often, if I just need a quick answer to, like, how to write something in Dart, I just go to 4.1, but what would I use for something bigger? I think that's going to be the harder part, because, yeah, these evals are in some ways saturated, but also everybody has their own criteria that they look at. And that's going to be kind of a question, to sort of see how we're going to adapt to all that.
是的。我的意思是,特别是在代码方面,对吧?我认为除了代码是否能给出正确答案之外,还有更多考量,你知道,人们关心代码的风格,关心注释的详细程度,关心模型在其他功能上为你做了多少主动工作。所以我觉得有很多方面需要做好,而且用户在这方面往往有非常不同的偏好。
Right. Yeah, I mean, specifically in code, right? I think there's more beyond, did it get you the right answer with code. You know, people care about the style of the code, they care about how verbose it was in the comments, they care about how much proactive work the model did for you on other functions, right? And so I think there's a lot to get right, and users often have very different preferences here.
是的,这很有趣。人们过去常问我,哪些领域会最快被改变。我过去常说,是代码,因为就像数学和其他领域一样,它非常非常可验证和可量化。我认为这些领域特别适合做强化学习,因此你会看到所有这些强大的智能体功能突然就开始奏效。
Yeah, it's funny. People used to ask me, what domains are gonna be transformed the fastest? I used to say, yeah, it's code, because, similar to math and other things, it's very, very verifiable and measurable. And I think those are the domains that are particularly great to do RL on. And you're therefore gonna see all this awesome agentic stuff just suddenly work.
我仍然认为这是对的,但关于代码令人惊讶的是,你知道,在什么构成好代码方面仍然存在很多品味的因素。而且,你知道,人们接受培训成为专业软件工程师是有原因的,不是因为他们的智商提高了,而是因为他们学会了如何在组织内构建软件。编写好的测试意味着什么?编写好的文档又意味着什么?
I still think that's true, but the thing that's surprising about code is that, you know, there is still so much of an element of taste in terms of what makes good code. And, you know, there's a reason that people train to be professional software engineers. It's not because their IQ gets better, but rather because they learn how to build software inside an organization. What does it mean to write good tests? What does it mean to write good documentation?
当有人不同意你的代码时,你如何回应?这些都是成为真正软件工程师的实际要素,我们必须教这些模型去做。所以我预计进展会很快,并且我仍然认为代码具有许多优良特性,使其非常适合智能体产品。但我确实认为,品味、风格和现实世界软件工程的重要程度非常有趣。
How do you respond when someone disagrees with your code? Those are all actual elements of being a real software engineer that we're gonna have to teach these models to do. So I expect progress to be fast, and I still think code has a ton of nice properties that make it very ripe for agentic products. But I do think it's very interesting, the degree to which that element of taste and style and real-world software engineering matters.
这也很有趣,因为使用ChatGPT和其他模型时,你某种程度上需要弥合消费者和专业用户之间的鸿沟。我打开ChatGPT并告诉我的朋友,哦,是的,因为我会把它接入我正在使用的任何代码模型,因为我实际上可以把它连接到那里。我会想,你知道,嗯,那是一个与很多其他人非常不同的用例,尽管我向人们展示过如何进入并使用IDE,让它为你写文档、创建文件夹等等,让人们意识到,是的,你可以那样做。你可以让ChatGPT实际控制并完成那些事,这很酷。但然后你会想,好吧,我们现在有一个图像标签页了。
It's interesting too, because with ChatGPT and the other models, you're kind of dealing with having to bridge the divide between consumer and pro. I open up ChatGPT and I tell my friends, like, oh yeah, I'll plug it into whatever code model I'm working with, because I can actually connect it to there. And I think about, you know, well, that's a very different use case from a lot of other people's, although I've shown people how to go in and use, you know, an IDE and actually have it just write documents for you and create folders and stuff, for people to realize, like, yeah, you could do that. You could have ChatGPT actually control it and do that, which is cool. But then you think about, like, okay, we've got a tab now for images.
有Codex标签页。所以如果我想连接到GitHub并通过它工作,还有Sora在里面。所以看到所有这些功能如何融合在一起是很有趣的。你如何区分消费者功能、专业功能,也许还有企业功能?
There's the Codex tab, so if I want to connect to GitHub, I can have it work through there. And there's Sora in there. So it's kind of interesting to see how all of these things are coalescing. How do you differentiate between a consumer feature, a professional feature, and maybe an enterprise feature?
听着,我们构建的是非常通用的技术,它将被各种各样的人使用。与许多公司不同,它们有某种特定的创始用户类型,然后使用技术解决该用户的问题,而我们则是从技术出发并以技术结束,观察谁在其中发现了价值,然后为他们迭代。对于Codex,我们的目标非常明确,就是为专业软件工程师构建,尽管知道有一个影响范围,我认为很多其他人也会在其中找到价值,我们也会努力让这些人能够使用它。有很多机会可以针对非工程师。我个人非常有动力去创造一个,或者帮助建立一个任何人都可以制作软件的世界。
Look, we build very general-purpose technology, and it's going to be used by a whole range of folks. Unlike many companies, which have this kind of founding user type and then use technology to solve that user's problems, we start off and end with the technology, observe who finds value in it, and then iterate for them. Now, with Codex, our goal was very much to build for professional software engineers, knowing, though, that there's sort of a splash zone where I think a lot of other people will find value in it, and we'll try to make it accessible for those people as well. There are a lot of opportunities to target non-engineers. I'm personally really motivated to create a world, or help build a world, where anyone can make software.
Codex不是那个产品,但你可以想象这些产品会随着时间的推移而出现。但作为一个普遍原则,在我们提供一些这些通用技术之前,很难准确预测目标用户是谁,因为这又回到了我谈到的经验主义。我们只是从来无法确切知道价值会体现在哪里。
Codex is not that product, but you could imagine those products existing over time. But as a general principle, it's really hard to predict exactly who the target user is until we made some of these general purpose technologies available, because it gets back to the empiricism I was talking about. We just never exactly know where the value is gonna lie.
是的,我认为即使更深入地探讨这一点,假设,你知道,可能有一个人主要使用ChatGPT进行编码,对吧?但5%的时间,他们可能只是想和模型聊天,或者5%的时间,他们只想要一张很酷的图片,对吧?所以我认为,确实存在使用模型的用户原型。但在实践中,我们看到人们希望接触不同的能力。
Yeah, and I think even to dig deeper into that, you could have a person who's mostly using ChatGPT for coding, right? But 5% of the time, you know, they might just want to talk to the model, or, like, 5% of the time, they just want a really cool image, right? And so I think, you know, there are certainly archetypes of people who use the models. But in practice, we see that people want this exposure to different capabilities.
看着Codex的发布,我有点震惊,有些工具你会看到很多人兴奋,因为内部对它有大量需求。你们内部使用这类工具的程度如何?
With Codex, and watching the launch of that, it kind of struck me that there are some tools where you see a lot of excitement because there's a lot of internal demand for them. How much are you using tools like that internally?
越来越多。好的。
More and more. Okay.
我一直非常兴奋地看到内部采用情况。从你预想的情况——人们使用Codex来分担测试工作,到我们有一个分析师工作流,它会查看日志错误并自动标记,并通过Slack通知相关人员。所以有各种各样的方式,实际上我甚至听说有些人开始把它当作待办事项工具使用,比如他们希望完成的未来任务,他们开始启动Codex任务。所以这绝对是那种我认为你可以在内部讨论的完美案例。而且,你知道,我对工程师们将从这样的工具中获得的工作效率提升感到非常兴奋。
I've been really excited to see internal adoption. It's everything from, you know, exactly what you'd expect, you know, people using Codex to offload tests, to, you know, we have an analyst workflow that will look at logging errors, automatically flag them, and Slack people about it. So there are all these ways. Or, I've actually heard some people are using it as a to-do list, where future tasks they're hoping to do, they're starting to fire off as Codex tasks. So this is the perfect type of thing that I think you can talk about internally. And, you know, I'm very excited about the leverage that engineers are gonna get out of a tool like this.
我认为这将让我们用现有的人力更快地推进工作,让每个我们招聘的工程师效率提高十倍。在某种程度上,内部使用情况是我们发展方向的一个很好预测指标。
I think it's gonna allow us to move faster with the people we have and make each engineer that we hire 10 times more productive. In some ways, internal usage is a very good predictor of where we wanna take this.
是的。我的意思是,我们不想把我们自己都认为没有价值的东西推给别人。而且我认为,在发布前夕——
Yeah. I mean, we don't wanna ship something to other people that we don't find value in ourselves. And I think, leading up to the launch-
洗衣伙伴。
Laundry buddy.
洗衣伙伴是一个
Laundry buddy is an
必不可少的伙伴。
essential partner.
抱歉,抱歉。
Sorry, sorry.
我的意思是,不过我们确实有一些重度用户,他们个人每天生成数百个PR。所以我认为内部确实有人从我们构建的东西中获得了很大效用。
I mean, we did have some power users, though, hundreds of PRs a day that they were generating personally. So I think there are people internally finding a lot of utility from what we're building.
另外,考虑到内部采用,这也是一个很好的现实检验,因为大家都很忙。采用新工具需要一定的启动能量。所以实际上,当你尝试在内部推广时,你会发现的一个现实因素是人们真正适应新工作流程所需的时间。观察这个过程很让人谦卑,对吧?所以我认为你既能了解技术方面,也能了解到当你试图让一群忙碌的人改变他们写代码的方式时的一些采用模式。
Also, when you think about internal adoption, it's also a good reality check, because people are busy. Adopting new tools takes some activation energy. So actually, the thing you find when you try to drive things internally is some of the reality of how long it takes people to actually adjust to a new workflow. It's been humbling to watch, right? So I think you learn both about the technology, but you also learn about some of the adoption patterns when you're trying to get a bunch of busy people to change the way they write code.
在你们构建这些工具的过程中,内部人员必须学习如何使用它们并不断适应。现在有很多关于未来人们需要什么样技能的问题?你在团队中寻找什么样的技能?
As you build these tools internally, people have to learn how to use them and adapt. And there's a lot of question now about what kind of skills people will need in the future. What kind of skills do you look for on your teams?
我对此思考了很多。招聘很难,特别是当你想要一个非常优秀、谦逊且能快速行动的小团队时。我认为好奇心一直是我寻找的首要品质。这实际上也是我给学生的建议,当他们问我在这个一切都在变化的世界里该怎么做时?因为对我们来说,有太多我们不知道的东西。
I've thought about this a lot. Hiring is hard, especially if you want to have a small team that is very, very good and humble and able to move fast, etcetera. And I think curiosity has been the number one thing that I've looked for. And it's actually my advice to students when they ask me, what do I do in this world where everything's changing? Because for us, there's so much that we don't know.
在基于这项技术进行构建时,你必须保持一定程度的谦逊,因为你不知道什么是有价值的。除非你真正深入研究并尝试理解,否则你不会知道什么是风险。当涉及到与AI合作时(显然我们在这方面做了很多,不仅是在代码中,而是在我们工作的各个方面),提出正确的问题才是瓶颈,而不一定是获得答案。所以我从根本上相信,我们需要雇佣那些对世界和我们所做的事情充满深度好奇心的人,我稍微不那么在意他们在AI方面的经验。马克可能对此有不同的看法。
There's a certain amount of humility you have to have about building on this technology, because you don't know what's valuable. You don't know what's risky until you really study and go deep and try to understand. And when it comes to working with AI, which we obviously do a lot, not just in code, but in kind of every facet of our work, it's asking the right questions that is the bottleneck, not necessarily getting the answer. So I really fundamentally believe that we need to hire people who are deeply curious about the world and what we do. I care a little bit less about their experience in AI. Mark presumably feels a bit differently about that one.
但在产品方面,我发现好奇心是成功的最佳预测指标。
But for the product side, it's curiosity that I've found to be the best predictor of success.
不,我的意思是,即使在研究方面,我们也越来越不强调你必须拥有AI博士学位,对吧?我认为这是一个人们可以相当快速掌握的领域。我也是作为resident加入公司的,没有太多正式的AI培训。我认为与尼克所说的相关,一个重要的事情是我们的新员工要有自主性。OpenAI不是那种告诉你'今天做任务一、任务二、任务三'的地方,而是要主动发现问题:嘿,这里有个问题,没人在解决它,我就直接深入进去把它解决。还要有适应能力,对吧?
No, I mean, even on research, I think we increasingly index less on, you have to have a PhD in AI, right? I think this is a field that people can pick up fairly quickly. I also came into the company as a resident, without much formal AI training. And I think, correlated to what Nick said, one important thing is for our new hires to have agency, right? OpenAI is a place where you're not going to get so much of a, go, here's today, you're going to do thing one, thing two, thing three. It's really about being driven to find, hey, here's a problem, no one else is fixing it, I'm just gonna go dive in and fix it. And also adaptability, right?
这是一个
It's a
快速变化的环境。这就是当前领域的本质。你需要能够快速弄清楚什么是重要的,并调整你需要做的事情。
very fast changing environment. That's just the nature of the field right now. And you need to be able to quickly figure out what's important and pivot what you need to do.
自主性这一点非常真实。你知道,我们经常被问到,OpenAI是如何保持产品发布的,感觉你们每周都在推出新东西之类的。这很有趣,因为我从不觉得快。我总是觉得我们还可以更快。我认为根本在于,只要有很多有自主性的人能够交付产品。
The agency thing is real. You know, we often get asked, how does OpenAI keep shipping? It feels like you're pushing something out every week or something like that. It's funny, because it never feels that way to me. I always feel like we could be going even faster. I think, fundamentally, we just have a lot of people with agency who can ship.
这适用于产品、研究和政策。交付可以意味着不同的事情。我们在OpenAI都做非常不同的事情,但我认为真正能做事的人的比例以及缺乏官僚主义(除了少数我认为官僚主义非常重要的领域)是使OpenAI非常独特的原因,这显然也影响了我们想要雇佣的人才类型。
And that goes for product, for research, for policy. Shipping can mean different things; we all do very different things at OpenAI. But I think the ratio of people who can actually do things, and the lack of red tape, except in the couple of areas where I think red tape is very, very important, is what makes OpenAI very unique, and it obviously affects the type of people we want to hire too.
我被招进公司是因为最初获得了GPT-3的访问权限,然后我开始展示它的所有用例,并每周为它制作视频。
I was brought into the company because I was originally given access to GPT-3, and I just started showing all these use cases for it and making videos every week.
是的,
Yeah,
那肯定很让人恼火。
and that was annoying people, I'm sure.
太迷人了。
It was fascinating.
那很激动人心。那是个激动人心的时刻。我向别人描述时说,我感觉他们造了个UFO,而我能玩它。然后我让它悬浮起来,他们就说"哇,你让它悬浮了"。我说"不,是他们造的"。
It was exciting. It was an exciting time. I described it to people as, I think they built a UFO and I get to play with it. And then I'd make it hover, and they'd go, oh, you made it hover. And I'm like, well, they built it.
我只是按了个按钮就能做到。但真正让我感到有力量的是我是自学成才的。我通过Udemy课程等学会了编程,然后成为工程团队的一员并被告诉'只管去做'。没什么太关键的。我没给任何人搞砸任何事情。
I just pressed the button and got to do that. But what I found very empowering was the fact that I'm self-taught. I learned to code through Udemy courses and stuff, and then to be a member of the engineering staff and be told, just go do stuff. Nothing too critical. I didn't break anything for anybody.
很高兴知道这种精神仍然存在。我认为这也是OpenAI能够持续交付的原因之一,尽管有大约150到200人参与GPT-4的开发。我觉得人们忘了这一点。
And it's good to know that that kind of spirit is still there. I think that's part of the reason why OpenAI is able to ship, even though it was something like 150 to 200 people who worked on GPT-4. I think people forget about that.
完全同意。说实话,ChatGPT也是这样诞生的。我们有个研究团队,他们一直在研究指令跟随及其后续工作,以及如何通过后期训练让模型擅长聊天。但产品化的努力是通过黑客马拉松凝聚起来的。
Totally. And honestly, this is how even ChatGPT came together. We had a research team. They'd been working for a while on instruction following, and then on its successor, and on post-training these models to be good at chat. But the product effort came together as a hackathon.
我清楚地记得我们说:'谁有兴趣来打造消费级产品?'
I remember distinctly, we said, Who's excited to go build consumer products?
然后我们
And we
聚集了各种各样的人。超级计算团队有个家伙说'我来做个iOS应用,我前世做过',还有个研究人员写了一些后端代码。就是一群对做事充满热情的人汇聚在一起。我认为这种能力非常宝贵。
had all these different people. We had a guy from the supercomputing team who was like, I'll make an iOS app, I've done that in a past life. We had a researcher who wrote some backend code. It was just a convergence of people who were excited to do stuff, and the ability to do so.
我认为这就是获得下一个ChatGPT的方式——运营一个能够实现这一点并且随着规模扩大仍能持续实现这一点的组织。
And I think that's how you get the next ChatGPT: by running an organization where that is possible, and continues to be possible as you scale.
黑客马拉松是我最喜欢的事情,一方面是因为我擅长表演且热爱展示与讲述。但能够看到那些你知道将来会成为产品或类似东西的项目,这种感觉真的很棒。当你们在玩转如此先进的技术时,你们现在还举办黑客马拉松吗?
Hackathons were my favorite thing, one, because I'm a performer and I love show-and-tell. But it was also just neat to be able to see things that you knew were going to be a product or something later on. When you're playing with technology this advanced, do you guys still do them?
是的,当然。我们最近还举办过一些,它们通常与
Yeah, absolutely. We've had some fairly recently and they are typically tied
上周?
Last week?
实际上,没有?
Actually, no?
不能透露具体内容,但确实举办了。很令人兴奋——这就是你发现可能性的方式。
Can't say what it was about, but there was one. An exciting one. It's how you find out what's possible.
听到这个我很兴奋。我确实有个问题:随着公司发展,就像我刚开始时,
I'm excited to hear that. I do have a question, which is how much as it grows, again, like when I started,
我想大约150
I think like 150
人的公司,现在有大约2000人。现在,我看到Sam和Jony Ive对话的视频。这会多大程度上改变引入所有这些所带来的特质和精神?我认为所有外部专业知识都很棒,我们也看到了这一系列出色的产品,但你觉得这会改变文化吗?
people at the company, and now there's like 2,000. And now, you know, I see a video with Sam talking to Jony Ive. How much is that going to change the character, the spirit, bringing in all this? I think all the outside expertise has been great, and we've seen this great run of products, but do you see it changing the culture?
嗯,我认为可能会以正确的方式改变。对吧?就像,当我们看待人工智能时,我们不认为它是某种相当狭隘的东西。我们一直对人工智能的潜力以及你可以用它构建的所有不同事物着迷。正如Nick所说,这就是我们能够如此快速发布产品的原因,因为人们想象了所有这些不同的可能性。
Well, I mean, I think probably in the right way. Right? It's like, I think when we look at AI, we don't think of it as some fairly narrow thing. And we've always been kind of enthralled by just the potential and all the different things you could build with AI. And to Nick's point, this is why we're able to ship so quickly, because people imagine all these different possibilities.
他们想象着有AI的未来,并努力将其变为现实。我认为这些都是那种想象力的不同侧面,对吧?比如说,如果你设想一个AI优先的设备,AI会是什么样子?
They imagine the future with AI, and they try to bring it about. And I think these are facets of that imagination, right? Like, what does AI look like if you imagine an AI-first device, for instance?
你知道,当人数从200增加到2000时,你会以为会有很大变化。确实,在某些方面可能确实如此。但我认为人们常常低估了我们正在做的事情的数量。我总觉得在OpenAI的感觉更像是在大学里,大家有共同的使命,但每个人都在做不同的事情。你会坐在晚餐或午餐时与某人交谈,了解他们正在做的事情。
You know, when you go from 200 to 2,000, you'd think a lot would change. And yeah, maybe in some ways it has been. But I think people often underestimate the number of things that we're doing. I always feel like being at OpenAI feels much closer to being in a university where you've got this kind of common reason to being there, but everyone's doing something different. You'll sit down at dinner or at lunch and you'll talk to someone and learn about their thing.
你会想,哇,你在做那个真是太酷了。所以感觉规模要小得多,因为考虑到我们正在做的广泛事情,每个单独的项目,无论是ChatGPT还是Sora等等,实际上都是以非常非常保守和精简的方式配备人员的,这样既能保持人们的自主性,又能确保他们有资源等等。所以我认为这部分原因使得这里的感觉与我刚开始时非常相似,而且是好的方面相似。
You're like, wow, that's so cool that you're doing that. And so it feels much smaller, because given the broad range of things we're doing, each individual effort, whether that's something like ChatGPT or something like Sora, is actually staffed in a very, very conservative and lean way, which keeps people very autonomous while making sure they have resources. So I think it's partly that that has made it feel very similar, in the good ways, to when I started here.
我们谈到了你寻找的特质之一是好奇心,马克说那也有帮助。如果我是AI领域之外的人,无论25岁还是50岁,看着技术的进步,可能有点害怕,因为我看到ChatGPT非常擅长文案写作。编写代码也很厉害。我个人认为我们永远不会有足够的人来创建代码,因为代码在世界上能做的事情比我们能想象的要多。即使是放置文案的地方,我妻子前几天给我看她防晒霜瓶子上的文案,关于成分的一些非常有趣的描述。
We talked a bit about how one of the things you look for is curiosity, and Mark said that's helpful too. If I'm somebody outside of AI, whether I'm 25 or 50, and I'm looking at the advancement of technology, maybe with a little bit of fear, because I see copywriting is one of the things that ChatGPT got great at. Writing code is great. I personally have the opinion that we'll never have enough people creating code, because there's more things code can do in the world than we can imagine. Even the thing that places the copy: my wife showed me the other day some very funny copy on her sunblock lotion bottle, about the ingredients.
我说,哦,我没想到会在这里看到这个,但这就是那些突然可以投入更多心思的小地方之一。话虽如此,我知道我有点乐观,因为我看到所有这些机会都是可以探索的领域。你会给人们什么建议,无论他们处于人生的哪个阶段,关于准备、适应或参与未来?我喜欢马克刚才直接看向我的样子。
I said, oh, this is not a place I expected to see this, but that's one of the tiny little places where all of a sudden you can put more thought into it. That being said, I know that I'm a bit of an optimist, because I see all these opportunities as places to go. What advice do you give people, at whatever point they are in life, about preparing for, or adapting to, or being part of the future? I like how Mark just looked right at me.
哦不。我可以先说。好吧,我现在就插话。我认为重要的是你必须真正投入使用这项技术,对吧?你必须看到你自己的能力如何能够被增强,通过使用技术你可以更高效、更有效。
Oh no. I can go. Okay, I will jump in right now. I think the important thing is you have to really lean into using the technology, right? You have to see how your own capabilities can be enhanced, how you can be more productive and more effective by using the technology.
我从根本上认为,这将演变成你仍然拥有人类专家,但AI帮助最大的是那些不具备非常高级能力的人,对吧?所以如果你想象一下,随着这些模型在医疗建议方面变得更好,它们将最帮助那些无法获得医疗服务的人,对吧?图像生成,对吧?它不是为专家或专业艺术家提供替代品,而是让像我和尼克这样的人能够进行创意表达,对吧?
I fundamentally do think that the way this is going to evolve is that you still have your human experts, but what AI helps the most is the people who don't have that capability at a very advanced level, right? So if you imagine, as these models get much better at healthcare advice, they're going to help the people who don't have access to care the most, right? Image generation, right? It's not producing a replacement for experts or professional artists. It's allowing people like me and Nick to express ourselves creatively.
所以我认为这就像水涨船高,让人们能够同时胜任和有效完成很多事情。我认为这就是我们将看到许多这样的工具如何帮助人们起步的方式。
And so I think it's kind of a rising tide that allows people to be competent and effective at a lot of things all at once. And I think that's how we're going to see a lot of these tools bootstrap people.
世界将会发生很大变化。我认为确实每个人都会有这样一个时刻:AI做了某些他们认为是神圣且专属于人类的事情。
The world's going to change a lot. I think truly everyone has a moment where the AI does something that they considered sacred and human.
认识一个人,他获得了股权,但对自己在代码方面的成就感到非常受威胁
I know a guy that got vested and felt very threatened about his achievements in code and...
嗯,那是很久以前发生在我身上的事了。我们还是聊聊房间里的其他人吧。
Well, that happened for me a long time ago. Let's pretend we're talking about someone else in the room.
哦对。我是说,确实,在很多代码问题解决方面它肯定比我强得多。
Oh yeah. I mean, yeah, it's definitely better than me at a lot of code problem solving, for sure.
是的。没错。所以我认为感受到某种程度的敬畏、尊重,甚至可能是恐惧,这是非常人性化的。而且我觉得马克说得对,实际使用这个东西可以消除它的神秘感。我想我们都是在AI这个词意味着与今天截然不同的世界里长大或了解它的。
Yeah. Right. So I think it's deeply human to feel some level of awe, respect, and maybe even fear. And to Mark's point, actually using this thing can demystify it. I think we all grew up with, or learned about, the word AI in a world where it meant something pretty different from what we have today.
你有那些试图向你推销东西、试图做事的算法,或者你看过AI接管世界的电影等等。这个词对不同的人意味着太多东西,所以我完全不惊讶会有恐惧。因此实际使用它,我认为是进行有根据对话的最佳方式。然后我认为从那里开始,最好的准备方式,在某种程度上你需要了解产品并跟上发展,这没错。但我觉得像提示工程或理解这种AI的复杂性这类事情,它们并不是正确的方向。
You've got these algorithms that try to sell you things, try to do things, or you've got movies where the AI takes over, etcetera. That term means so many things to different people that I'm entirely unsurprised that there's fear. So actually using the thing is, I think, the best way to have a grounded conversation about it. And then from there, the best way to prepare... there's some degree to which you need to understand the products and keep up, sure. But I think things like prompt engineering, or understanding the intricacies of this AI, are kind of not the right direction.
我认为更根本的人类技能,比如学习如何委派任务。这非常重要,因为越来越多地,你口袋里会有一个智能体,它可以成为你的导师、顾问、软件工程师。这更多关乎你了解自己和你面临的问题,以及别人如何帮助,而不是对AI的具体理解。所以我认为保持好奇心很重要。我之前提到过,我认为提出正确的问题,你能得到的取决于你投入什么,这很重要。
I think it's the fundamental human things, like learning how to delegate. That is incredibly important, because increasingly you're going to have an intelligence in your pocket that can be your tutor, your advisor, your software engineer. It's much more about you understanding yourself, the problems you have, and how someone else might help, than about a specific understanding of AI. So I think curiosity is going to be important. And I mentioned earlier, asking the right questions: you only get out what you put in. That's important.
而且我认为从根本上准备好学习新事物,你学得越多,理解如何掌握新主题和领域等等,你就越能准备好应对一个工作性质变化速度比以往任何时候都快的世界。所以我准备好我的产品工作会看起来不同或根本不存在。但我期待学习新东西。我认为只要你保持这种心态,就能很好地利用AI。
And fundamentally, being ready to learn new things. The more you learn, and the more you understand how to pick up new topics and domains, the more prepared you're going to be for a world where the nature of work is shifting much faster than it's ever shifted before. So I'm prepared for my job in product to look different, or not exist at all. But I'm looking forward to picking up something new. I think as long as you bring that perspective, you're well set up to leverage AI.
我觉得有时候我们过度关注某些工作的消失,比如我们不再需要很多打字机维修人员了,对吧?某些类型的编码工作可能也会消失。但就像我说的,我认为编码人员或有能力创造代码的人会有更多机会,无论通过什么方式。你提到了医疗领域。这是我听到人们常说的一点,当用AI取代一切时,嗯,我的意思是,我会很乐意让AI来诊断我、给我做手术,可能还会做所有其他事情。
I think we sometimes over-index on certain jobs going away, because, well, we don't really need a lot of typewriter repair people anymore, right? And certain kinds of coding jobs are probably going to go away. But like I said, I think there's way more opportunity for coders, for people creating code, however it's done. And you mentioned the health field. That's one of the things I hear from people: oh, when we replace everything with AI... Well, I mean, I would be very happy having an AI diagnose me, operate on me, and probably do everything else.
但我确实希望有人在那里向我解释手术过程并握住我的手。而且,我也希望有人能回答问题。我每天吃一堆维生素。这是一天中吃它的合适时间吗?我不能用所有这些愚蠢的小问题打扰我的医生。
But I do want somebody there to talk me through the procedure and hold my hand. And I also want to be able to ask questions. Every day I take a bunch of vitamins. Is this the right time of day to take them? I can't bother my doctor with all these silly little questions.
我真的不认为最终会取代医生。你最终取代的是不去看医生。你最终是在民主化获得第二意见的能力。很少有人拥有这种资源或知道如何利用这样的资源。你最终将把医疗保健带到世界上那些不易获得的地方。
I really don't think you end up displacing doctors. You end up displacing not going to the doctor. You end up democratizing the ability to get a second opinion. Very few people have that resource or know how to take advantage of a resource like that. You end up bringing medical care to pockets of the world where it's not readily available.
而且你最终会帮助医生获得信心。我经常听医生说,他们已经会与现有同事交流以获得第二意见。在某些情况下,这是不可能的,而且我想你会对使用ChatGPT的医生数量感到惊讶。现在,在医学等领域,有工作要让模型变得非常非常好,我们很兴奋能做到这一点。也有工作要证明模型确实很好,因为我认为在没有一定程度的合法性之前你不会信任它。然后还有工作要解释模型可能不擅长的领域,因为一旦它达到人类甚至超人类的表现水平,就很难准确界定它会在哪里不足,这也很难应对。
And you end up helping doctors gain confidence. I've often heard from doctors that they already talk to colleagues to get a second opinion. In some cases that's not possible, and I think you'd be surprised by the number of doctors who use ChatGPT. Now, on things like medicine, there's work to make the model really, really good, and we're excited to do that. There's also work to prove that the model is really good, because I think you're not going to trust it until there's some degree of legitimacy. And then there's work to explain the areas where the model might not be good, because increasingly, once it gets to human and then superhuman levels of performance, it's hard to frame exactly where it will fall short, which is also hard to reckon with.
但尽管如此,我认为机会是让我早上起床的动力之一。教育可能是另一个。我认为帮助他人存在着巨大的机遇。
But nonetheless, I think that opportunity is one of the things that gets me up in the morning. Education might be the other one. And I think there's a tremendous opportunity to help people.
你认为在未来一年到十八个月内,最让我们惊讶的会是什么?
What do you think is going to surprise us the most in the next year to eighteen months?
老实说,我认为会是由我们构建的模型所推动的研究成果的数量,即使只是以某种微小方式。其中一个悄然席卷该领域的事情是模型的推理能力。你已经看到一些研究
I honestly think it's going to be the amount of research results that are powered, even in some small way, by the models that we've built. One of the quiet things that's taken the field by storm is the ability of the models to reason. You already see some research...
当你提到‘推理’时,你得解释清楚。
I'm going to make you explain when you say reason.
我想
I want
你在解释‘推理’时,要像推理问题一样去思考。大声说出来。展示思考过程。
you to reason through the question as you explain reasoning. Think out loud. Show your traces.
这确实契合我们之前讨论的‘智能体范式’。模型解决一个需要花费时间的问题的方式是,它会像你我一样进行推理,对吧?如果我给你一个非常复杂的
This really fits into this agentic paradigm that we were talking about earlier. The way that the models approach solving a problem that takes some time to solve is that it reasons through it much like you or I might, right? If I give you a very complicated
我觉得你的推理能力可能比我强多了,
I think you reason probably much better than I do,
马克。我的意思是,面对一个复杂的谜题时,对吧?你可能会在心里想,比如说,就用填字游戏举例吧,对吧?就像你可能会思考所有不同的可能性以及什么是连贯的,你知道,这一行和那一列是否一致?你在搜索大量可能性,经常回溯,尝试验证许多你的假设。
Mark. I mean, I think, faced with a complicated puzzle, right? You might think to yourself... for instance, let's just use a crossword puzzle, right? You might think through all the different alternatives and what's consistent, you know, is this row consistent with that column? And you're searching through a lot of alternatives, you're backtracking a lot, you're trying out a lot of hypotheses.
然后最终,对吧,你得出了一个结构良好的答案。所以模型在这方面正变得越来越好。而这正是推动数学、科学、编程领域许多进步的动力。所以这已经达到了一个水平,如今在许多研究论文中,人们几乎把O3当作一个子程序来使用,对吧?在他们试图解决的研究问题中存在一些子问题,这些子问题通过接入像O3这样的模型实现了完全自动化解决。
And then at the end, right, you come up with a well-formed answer. And so the models are getting a lot better at that. And that's what's powering a lot of the advancements in math, in science, in coding. So this has reached a level where today, in many research papers, people are using o3 almost as a subroutine, right? There are subproblems within the research problems they're trying to solve which are just fully automated, solved by plugging in a model like o3.
我在多篇物理学论文中都看到过这种情况。甚至和物理学家交流时,他们都会感叹:哇,我有个表达式一直无法简化,但O3在这方面取得了进展。这些都是来自国内顶尖物理学家们的反馈。所以我认为这种现象会越来越普遍。我们将看到物理学和数学等领域的研究进展不断加速。
I've seen this in several physics papers. I've talked to physicists who are like, wow, I had this expression that I couldn't simplify, but o3 made headway on it. And these are some of the best physicists in the country. So I think you're going to see that happen more and more, and we're going to see acceleration in progress in fields like physics and mathematics.
这确实很难超越,因为如果能取得真正的重大科学突破,我愿意用我们做的很多事情来交换。但我觉得我们可以同时实现多个突破。对我来说,关键在于任何被清晰描述且受限于智力水平的问题,我认为都将在产品中得到解决。而我们根本的限制就在于实现这一点的能力。这意味着在企业中,存在大量本质上就很困难的问题,目前的模型还不够智能来处理——无论是软件开发、数据分析,还是提供卓越的客户支持。
It's a hard one to beat, because I would swap many things we do in exchange for making a truly significant scientific advancement. But I think we can have multiple of these things. For me, it's the fact that any well-described problem that is intelligence-constrained will, I think, be solved in products. And we're fundamentally just limited by our ability to do that. What that means is, in companies, in the enterprise, there are so many problems that are fundamentally hard, that the models are not smart enough to do yet, whether it's software engineering, running data analysis, or providing amazing customer support.
目前模型在这些问题上还存在不足,而这些问题的描述和评估都非常简单直接。我相信我们将在这些方面取得巨大进展。在消费者端同样存在这类问题,只是更难发现,因为消费者通常不太擅长准确表达自己的需求——这正是消费产品开发的本质特点。
There are all these problems that the models fall short at today that are very, very easy to describe and evaluate, and I think we'll make tremendous progress on those. On the consumer side, these problems exist too. They're a bit harder to find, just because consumers are worse at telling us exactly what they want. That's the nature of building consumer products.
但我认为这非常值得投入,因为我们的个人生活中有许多棘手事务——无论是报税、旅行规划、进行高价值采购(比如买房买车或选购服装),这些都是需要更多智能支持和合适形态来解决的问题。因此我认为未来一年半还会出现另一个变化:AI将演化出不同的形态。聊天模式仍然是非常有用的交互方式,不会消失,但你会越来越多地看到异步工作流的出现。编程只是一个例子,对消费者而言,可能是让AI去帮你找到最合适的鞋子、规划旅行行程,或者完成税务申报。
But I think it's very, very worthwhile, because there are many hard things we do in our personal lives, whether it's doing taxes, planning a trip, or searching for a high-consideration purchase, whether that's a house or a car or a piece of clothing. All of those are problems where we need just a little bit more intelligence and the right form factor. So I think the other thing that's going to happen in the next year and a half is that you'll see a different form factor for AI evolve. Chat is still an incredibly useful interaction model, and I don't think it's going to go away, but increasingly you're going to see more of these asynchronous workflows. Coding is just one example, but for consumers it might be sending this thing off to go find you the perfect pair of shoes, or to plan a trip, or to finish your taxes.
我认为这将非常令人兴奋,我们会开始以不同于聊天机器人的视角来看待AI。
And I think that's going to be exciting, and we're going to think of AI a little bit differently than just a chatbot.
我最喜欢的例子之一,无论是从功能实用性还是用户界面来看,都是深度研究功能。这可能是目前最能体现智能体模式应用的范例。过去你让模型介绍某个主题时,它要么直接给出数据,要么就是在网上搜索后做个总结。而深度研究会主动寻找数据集,进行分析,提出问题,然后继续寻找新数据并反复推敲。我第一次使用时——其他人也有同感——感觉这个过程需要些时间。后来你们改进了UI设计,让我可以离开去做别的事情。
One of my favorite examples, both from a capability point of view and a UI point of view, was deep research. Deep research is probably the best example we have of agentic model use right now, because it used to be that you would ask a model to tell you about a topic, and it would either just give you the data or do a big search of the internet and summarize it all. Whereas deep research will go find some set of data, look at it, ask a question, then go find some new data and come back to it, and keep going. And the first time I used it, like other people, I thought, wow, this is taking a while. And then you added a UI change so I can actually go away and do something else.
然后我手机锁屏界面会显示'正在处理中',这简直是个范式转变。我和Sam讨论过这个问题,他说人们愿意等待答案这点让他很惊讶。现在我看到了衡量模型的新标准:模型解决问题所能投入的时间长度——只要最终能解决问题,这就是个好指标。这对你的认知有所更新吗?你是如何看待这些变化的?
And then the lock screen on my phone will show me that it's working, which was a paradigm shift. And I talked to Sam here about that. Sam said it was a surprise to him that people would be willing to wait for answers. Now I've seen a new metric for models: how long a model can spend trying to solve a problem, which is a good metric if it ultimately solves it. Has this been an update for you in how you think about these things?
这种'我们不只是要即时答案'的理念——你之前也提到过智能体的概念——本质上就是说'不用着急,慢慢处理,完成后告诉我就好'。
The idea that, oh, we don't just want... I guess you talked about this before with agents, and the idea that it's not just, give me the answer. It's, take your time. Get back to me.
我认为要打造超级助手,就必须放宽限制条件。现在的产品完全采用同步模式,需要用户主动发起所有操作,这并非帮助用户的最佳方式。就像现实世界中与你共事的智能体,它必须能够长时间独立工作,还必须具备主动行动的能力。
I think, you know, to build a super-assistant, you've got to relax constraints. Today you have a product that is entirely synchronous, where you have to initiate everything. That's just not the maximally best way to help people. If you think about a real-world intelligence that you might get to work with, it has to be able to go off and do things over a long period of time. It has to be able to be proactive.
所以我们正在逐步解除产品和技术的诸多限制,以更好地模拟一个极其有用的实体。能够处理五分钟任务、五小时任务,最终甚至五天任务的能力,我认为这将从根本上释放产品不同层次的价值。因此我对人们愿意等待并不感到意外——毕竟我也不想干等着同事完成工作。只要价值足够,我很乐意先处理其他事情再回来查看结果。
So we're sort of in this process of relaxing a lot of the constraints on the product and on the technology to better mimic a very, very helpful entity. The ability to go do five-minute tasks, five-hour tasks, eventually five-day tasks, is a very fundamental thing that I think is going to unlock a different degree of value in the product. So I've actually not been that surprised that people are willing to wait. I don't really wanna be sitting around waiting for my coworker either. And if the value is there, I'd gladly be doing other stuff and come back.
是的。我们这么做并非无缘无故,对吧?我们是出于必要性。模型需要那些时间来解决真正困难的编码问题或数学问题,减少时间它是做不到的。你可以把这想象成我给你一个脑筋急转弯,对吧?
Yeah. And we really don't do it just because, right? We do it out of necessity. The model needs that time to solve the really hard coding problem or the really hard math problem, and it's not going to do it with less time. You can think about this as I give you some kind of brain teaser, right?
你的第一反应可能是直觉性的错误答案,你需要实际的时间来梳理所有情况,比如这里有没有什么陷阱?我认为正是这类东西最终造就了强大的智能体。
Your quick answer is probably the intuitive, wrong one, and you need that actual time to work through all the cases, like, are there any gotchas here? And I think it's that kind of stuff that ultimately makes robust agents.
我们经常看到这样的情况:时不时会有一篇论文冒出来说,啊,我找到了一个障碍。我记得大约一个月前就有一篇,他们说模型无法解决某些类型的问题,但其实不难设计出一个提示词,通过训练就能让模型解决这类问题。最近又有一篇新论文讨论它们会在某些问题解决类型上失败。我认为这个很快就被推翻了,因为那篇论文本身存在缺陷。但局限性确实是存在的。
We've seen the paper-of-the-moment thing, where somebody comes out and says, ah, I found a blocker. I remember there was one a month or so ago where they said models couldn't solve certain kinds of problems, and it wasn't hard to figure out a prompt you could train into a model so it could solve those kinds of problems. And we had a new one that talked about how they would fail at certain kinds of problem solving. That was quickly debunked, I think, by showing that the paper itself had flaws. But there are limitations.
有些事情可能存在一些障碍,或者有些东西我们不知道会出现。我认为脆弱性就是其中之一。模型在解决问题上能花费的时间是有限度的。我们目前可能处于这样的阶段:只让两个系统互相监督,我们还得思考第三个系统如何介入,以防事情崩溃。但是,在你看来,从现在到我能获得那些能做出有趣科学发现的模型之间,存在什么障碍吗?
There might be some blockers, or things we don't know are going to be there. I think brittleness is one of them. There's a point where models can only spend so much time solving a problem. We're probably at a point where we have maybe two systems watch each other, and we have to think about how a third system steps in when things break down. But do you see any blockers between here and getting models that are going to be doing things like coming up with interesting scientific discoveries?
我认为技术上的创新是我们一直在努力追求的。从根本上说,我们从事的是大规模产生简单研究想法的业务,而真正实现规模化的机制是困难的。这需要大量的工程和研究工作来想办法突破某个特定的障碍。我认为这些挑战会一直存在,对吧?每一个规模层级都会带来新的挑战和新的机遇。
I think there are always technical innovations that we're trying to come up with. Fundamentally, we're in the business of producing simple research ideas at scale, and the mechanics of actually getting that to scale are difficult. It's a lot of engineering, a lot of research to figure out how to kind of tweak past a certain roadblock. And I think those are always going to exist, right? Every layer of scale gives you new challenges and new opportunities.
你知道,基本方法是一样的,但我们总是会遇到新的小挑战,需要我们
You know, fundamentally the approach is the same, but we're always encountering new small challenges that we
去克服。我想补充一点,我们从事的另一项业务是用这些模型打造优秀的产品。我认为我们不应低估真正将这些日益智能的模型引入合适环境所需的挑战和探索量,无论是给它们提供合适的行动空间和工具,还是真正贴近最困难的问题、理解它们并聚焦于此。所以我认为既有技术上的答案,也有现实世界部署的问题。我认为后者总是充满非常非常难以预测却又值得应对的挑战,这也是我们使命的一部分。
have to overcome. Just to build on that, the other business we're in is building great products with these models. And I think we shouldn't underestimate the challenge and the amount of discovery needed to really bring these ever-more-intelligent models into the right environment, whether that's giving them the right sort of action space and tools, or really being proximate to the hardest problems, understanding those, and bringing the AI there. So there's the technical answer, but there's also real-world deployment. And that always has challenges that are very, very hard to predict, yet worthwhile, and part of our mission.
好了,最后一个问题,我来开头。你最喜欢ChatGPT的哪个用途或使用技巧?我的是拍一张菜单照片,然后让它帮我规划一顿饭,或者如果我想坚持某种饮食之类的,就让它帮忙。
All right, last question, and I'll begin. What's your favorite use or tip for ChatGPT? Mine is, I take a photograph of a menu and I'm like, help me plan a meal or whatever, if I'm trying to stick to a diet.
看,我真的很想要那个用例,但我一直尝试用它来看酒单,这就是我对多模态能力的评估。它还是不行。真的吗?它总是用虚构的葡萄酒推荐让我尴尬,我去点单时,人家说从来没听说过这款。所以我很高兴你的用例有效。
See, I really want that use case, but I've been trying it on wine lists, and that's my eval for multimodality. It still doesn't work. Really? It keeps embarrassing me with hallucinated wine recommendations: I go order one, and they're like, we've never heard of this. So I'm glad yours works.
但对我来说,那仍然是一个目标用例。
But for me, that's still a use case.
嗯,酒单太密集了。这也是Operator遇到的问题;最初视觉模型就有这个毛病:文本过于密集,就会丢失位置。
Well, the wine list is too dense. That was a problem with Operator too; it originally was the vision models: with too much dense text, it just loses its placement.
是的,说到深度研究,我很喜欢使用深度研究。你知道,当我去见新朋友,当我要和某人谈论人工智能时,我就会预先准备话题。我认为这个模型能很好地帮我理解我是谁、我要见的人是谁,以及我们可能感兴趣的话题。我觉得它确实对整个流程很有帮助。
Yeah, I mean, speaking of deep research, I love using deep research. You know, when I go meet someone new, when I'm going to talk to someone about AI, I just preflight topics. I think the model does a really good job of contextualizing who I am, who I'm about to meet, and what things we might find interesting. And it really just helps with that whole process.
非常酷。我是语音功能的忠实信徒。我觉得它还没有完全成为主流,因为它还有很多小问题累积起来。但对我来说,语音功能一半的价值在于能有人交谈,并迫使自己清晰地表达想法。我发现这在写作中有时很难做到。
Very cool. I'm a voice believer. I don't think it's entirely mainstream yet, because it's still got many little kinks that all add up. But for me, half the value of voice is actually just having someone to talk to and forcing yourself to articulate yourself. And I find that to sometimes be very difficult to do in writing.
在上班路上,我会用它来整理自己的思路。如果运气好的话——我觉得大多数日子都有效——等我真正到达时,我就会有一个重新整理好的待办事项清单。所以对我来说,语音功能需要既是我喜欢使用的工具,也是我希望在未来一年看到改进的东西。
On my way to work, I'll use it to process my own thoughts. And like with some luck, and I think this works most days, I'll have the restructured list of to dos by the time I actually get there. So voice for me, it needs to be the thing that, you know, I both love using and I want to see improve over the next year.