在Anthropic,我们思考的方式是,我们不是为今天的模型而构建。
At Anthropic, the way that we thought about it is we don't build for the model of today.
我们是为六个月后的模型而构建。
We build for the model six months from now.
这实际上仍然是我给那些正在基于大语言模型创业者的建议。
That's actually, like, still my advice to founders that are building on LLMs.
试着去思考,当前模型还很不擅长的前沿领域是什么?
Just try to think about, like, what is that frontier where the model is not very good at today?
因为模型迟早会变得擅长这一点。
Because it's gonna get good at it.
Claude Code 的所有代码都已经被编写、重写、再重写,反复重写了无数次。
All of Claude Code has just been written and rewritten and rewritten and rewritten over and over and over.
Claude Code 中没有任何一部分是六个月前就存在的。
There is no part of Claude Code that was around six months ago.
你尝试一个想法,把它交给用户,与用户交流,从中学习,最终你可能会找到一个好点子。
You try a thing, you give it to users, you talk to users, you learn, and then eventually you might end up at a good idea.
有时候则不会。
Sometimes you don't.
你心里是不是也在想,也许六个月后,你就不用再明确地进行提示了?
Are you also in the back of your mind thinking that maybe, like, in six months, you won't need to prompt that explicitly?
比如,模型自己就能足够好地理解了?
Like, the model will just be good enough to figure out on its own?
可能一个月就够了。
Maybe in a month.
一个月后就不再需要计划模式了?
No more need for plan mode in a month?
天啊。
Oh my god.
欢迎来到另一期《光锥》。
Welcome to another episode of the Light Cone.
今天,我们请到了一位非常特别的嘉宾,Boris Cherny,Claude Code 的创造者和工程师。
And today, we have an extremely special guest, Boris Cherny, the creator and engineer of Claude Code.
Boris,感谢你加入我们。
Boris, thanks for joining us.
谢谢你们邀请我。
Thanks for having me.
谢谢你创造了一个让我连续三周睡不着觉的东西。
Thanks for creating a thing that has taken away my sleep for about three weeks straight.
我非常沉迷于Claude Code,感觉就像装了火箭助推器。
I'm very addicted to Claude Code, and it feels like rocket boosters.
到目前为止,人们是不是已经这样感觉好几个月了?
Has it felt like this for people, like, for, you know, months at this point?
我觉得大概是十一月的时候,我的很多朋友都说,有什么东西变了。
I think it was, like, November is where a lot of my friends said, like, something changed.
我记得对我而言,也有这种感觉。
I remember for me, it felt this way.
当我第一次创建 Claude Code 时,还不确定自己是否发现了什么,但我隐约觉得我可能发现了什么。
When I first created Claude Code, and I didn't yet know if I was onto something, I kinda felt like I was onto something.
然后那时我就开始睡不着了。
And then that's when I wasn't sleeping.
我当时就想……那是2024年9月。
I was just like... this was September 2024.
是的,整整连续三个月。
Yeah, it was like three straight months.
我一天假都没休,周末也在工作,每晚都加班。
I didn't take a single day vacation, worked through the weekends, worked every single night.
我当时就想,天哪。
I was just like, oh my god.
我觉得这可能会成为一个大事件。
This is I think this is gonna be a thing.
我不知道它是否有用,因为当时它还不会写代码。
I don't know if it's useful yet, because it it couldn't actually code yet.
如果回望那些时刻到现在,你觉得此刻最令人惊讶的是什么?
If you look back on those moments to now, like, what would be like the most surprising thing about this moment right now?
我们居然还在用终端,这简直难以置信。
It's unbelievable that we're still using a terminal.
这本该是起点。
That was supposed to be the starting point.
我从来没想过这会成为终点。
I didn't think that would be the ending point.
其次,它居然真的很有用。
And then the second one is that it's even useful.
因为一开始,它根本写不出代码。
Because, you know, at the beginning, it didn't really write code.
就连二月份我们正式发布时,它也只帮我写了大概10%的代码之类的。
Even in February when we GA'd it, it wrote maybe, like, 10% of my code or something like that.
我其实没怎么用它来写代码。
I didn't really use it to write code.
它在这方面表现得并不好。
It wasn't very good at it.
我仍然大部分代码都是手写的。
I still wrote most of my code by hand.
所以事实上,我们的押注得到了回报,它在我们原本认为它会变好的那个领域真的变强了,因为这并不明显。
So the fact that it it actually, like, our bets paid off, and it got good at the thing that we thought it was gonna get good at because it wasn't obvious.
在Anthropic,我们思考的方式是,我们不是为今天的模型而构建,而是为六个月后的模型而构建。
At Anthropic, the way that we thought about it is we don't build for the model of today, we build for the model six months from now.
这仍然是我对那些基于大语言模型创业者的建议:试着思考一下,当前模型还很不擅长的前沿领域是什么,因为未来它一定会变得擅长,你只需要等待。
And that's actually still my advice to founders that are building on LLMs, is just try to think about what is that frontier where the model is not very good at today, Because it's going get good at it, and you just have to wait.
回过头来看,你还记得你最初产生这个想法的时候吗?
Going back, do you remember when you first got the idea?
你能跟我们讲讲那个过程吗?
Can you just talk us through that?
是灵光一现吗?或者你脑海中的第一个版本是什么样的?
Was it like a spark, or what was even the first version of it in your mind?
这很有趣。
It's funny.
这完全是偶然的,它就这样逐渐演变成了这样。
It was so accidental that it just kind of evolved into this.
在 Anthropic,我认为对 Ant 来说,长期以来押注的一直是编程。
At Anthropic, I think for Ant, the bet has been coding for a long time.
而押注的路径是,通往安全AGI的道路是通过编程。
And the bet has been the path to safe AGI is through coding.
这一直就是我们的想法。
And this has kind of always been the idea.
实现这一目标的方式是:先教模型如何编程,然后教它如何使用工具,再教它如何使用计算机。
And the way you get there is you you teach the model how to code, then you teach it how to use tools, then you teach it how to use computers.
你可以看到这一点,因为我加入Anthropic时的第一个团队,叫做Anthropic Labs团队。
And you can kind of see that because the the first team that I joined at Anthropic, this is called the Anthropic Labs team.
它推出了三个产品。
And it produced three products.
分别是 Claude Code、MCP 和桌面应用。
It was Claude Code, MCP, and the desktop app.
所以你可以看到这些是如何相互交织的。
So you can kind of see how these weave together.
我们开发的这个特定产品,没有人要求我开发一个命令行界面。
The particular product that we built, no one asked me to build a CLI.
我们隐约觉得,可能是时候开发某种编程产品了,因为看起来模型已经准备好了,但还没有人真正打造出能利用这一能力的产品。
We kind of knew maybe it was time to build some kind of coding product because it seemed like the model was ready, but no one had yet really built the product that harnessed this capability.
所以当时仍然存在巨大的产品积压感,但那时情况甚至更疯狂,因为根本没人做过这个。
So like still there's this insane feeling of product overhang, but at the time it was just like even crazier because like no one had built this yet.
于是我开始动手尝试,心想:好吧,我们要做一个编程产品,我第一步该做什么?
And so I started like hacking around and I was like, okay, we build a coding product, what do I have to do first?
我得先了解如何使用API,因为那时我还没用过Anthropic的API。
I have to understand how to use the API, because I hadn't used the Anthropic API at that point.
所以我只是搭建了一个小的终端应用来使用API,就这么简单。
And so I just built like a little terminal app to use the API, that's all that I did.
这是一个简单的聊天应用,因为想想当时的 AI 应用,以及如今非程序员的情况,大多数人用的就是聊天应用,所以我做的就是这个。
And it was a little chat app, because if you think about the AI applications at the time, and for non-coders today, what most people are using is just a chat app, so that's what I built.
它在终端里运行,我可以提问,它可以回答。
It was in a terminal; I could ask questions, and it could give answers.
然后我觉得工具使用功能发布了。
Then I think tool use came out.
我想试试工具使用功能,因为我真的不太理解这东西是什么。
I just wanted to try out tool use, because I didn't really understand what this was.
我当时想,工具使用,这挺酷的。
I was like, tool use, this is cool.
这真的有用吗?
Is this actually useful?
可能没什么用。
Probably not.
我还是试试吧。
Let me just try it.
你把它放在终端里,只是因为这是最快让东西跑起来的方式吗?
You put it in terminal just because it was the easiest way to get something up and running?
是的。
Yes.
因为我不用去开发一个用户界面。
Because I didn't have to build a UI.
好的。
Okay.
所以
So
那时候只有我一个人。
it was just me.
那时候,像 IDE、Cursor、Windsurf 这些工具正在迅速兴起。
At that point, it was like the IDEs, Cursor, Windsurf, were the things that were really taking off.
你当时有没有感受到压力,或者收到很多建议,说我们应该把它做成一个插件,或者一个完整的 IDE?
Were you sort of under any pressure or getting lots of suggestions of, hey, we should build this out as a plug in or as a fully featured IDE itself?
当时没有任何压力,因为我们根本不知道自己想打造什么。
There was no pressure because we didn't even know what we wanted to build.
团队当时只是在探索阶段。
Like the team was just in explore mode.
我们大致知道想在编程领域做点什么,但具体做什么并不明确。
We knew vaguely we wanted to do something in coding, but it wasn't obvious what.
没有人有足够的信心。
No one was high confidence enough.
弄清楚这一点是我的任务。
That was my job to figure out.
于是我给了模型 Bash 工具。
And so I gave the model the Bash tool.
那是我给它的第一个工具。
That was the first tool that I gave it.
只是因为我认为那正是我们文档里的示例。
Just because I think that was literally the example in our docs.
我直接照搬了那个示例。
I just, like, took the example.
它是用 Python 写的。
It was in Python.
我只是把它移植到了 TypeScript,因为我是用它写的。
Just ported it to TypeScript because that's how I wrote it.
我不知道模型能用 Bash 做什么,所以我让它读一个文件。
You know, I didn't know what the model could do with Bash, so I asked it to read a file.
它能用 cat 命令查看文件。
It could cat the files.
这很酷。
That was cool.
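As a rough illustration of what "giving the model a Bash tool" can look like, here is a minimal Python sketch. It is not the code described in the conversation: the `BASH_TOOL` description, the `handle_tool_call` function, and the shape of the tool-call dictionary are all hypothetical, and no real model API is called. It only shows the local half of the loop, where the model asks for a command and the harness runs it and hands back the output.

```python
import subprocess

# Hypothetical tool description; the field names are illustrative only
# and do not claim to match any real API schema.
BASH_TOOL = {
    "name": "bash",
    "description": "Run a shell command and return its output.",
}


def run_bash_tool(command: str, timeout_s: float = 10.0) -> dict:
    """Run a command in a shell and return what the model would get back."""
    try:
        completed = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return {"exit_code": None, "stdout": "", "stderr": "timed out"}
    return {
        "exit_code": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


def handle_tool_call(call: dict) -> dict:
    """Dispatch a model-issued tool call (the call shape is assumed)."""
    if call.get("name") != BASH_TOOL["name"]:
        return {
            "exit_code": None,
            "stdout": "",
            "stderr": f"unknown tool: {call.get('name')}",
        }
    return run_bash_tool(call["input"]["command"])


if __name__ == "__main__":
    print(handle_tool_call({"name": "bash", "input": {"command": "echo hello"}}))
```

With something like this in place, a request such as "read this file" can be satisfied by the model issuing a `cat` command, which matches the behavior described here. A real harness would also need permission checks before running arbitrary commands.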
然后我想,好吧,它到底能做什么?
And then I was like, okay, what can you actually do?
于是我问它:我正在听什么音乐?
And I asked it, what music am I listening to?
它写了一些 AppleScript 来操控我的 Mac,并查询了我音乐播放器里的歌曲。
It wrote some AppleScript to script my Mac and look up the music in my music player.
天啊。
Oh my God.
这是Sonnet 3.5。
And this was Sonnet 3.5.
你知道,我真的没想过模型能做到这个。
And, you know, like, I I didn't think the model could do that.
那是我第一次,我觉得,真正感受到AGI的时刻。
And that was my first, I think, ever "feel the AGI" moment.
嗯。
Mhmm.
它就是单纯地想使用工具。
Where it's just like, oh my god, the model, it it just wants to use tools.
它就只想做这个。
That that's all it wants.
这相当有趣。
That's kind of fascinating.
我的意思是,Claude Code 以如此优雅简洁的形式能表现得这么好,这其实挺反直觉的。
I mean, it's very kinda contrarian that Claude Code works so well in such an elegant, simple form factor.
我的意思是,终端已经存在很久了,而这种设计约束似乎为开发者带来了许多有趣的体验。
I mean, terminals have been around for a really long time, and that seemed to be, like a good design constraint that allowed a lot of interesting developer experiences.
它根本不像在工作。
Like, it doesn't feel like working.
作为开发者,它就是让人觉得有趣。
It just feels fun as a developer.
我根本不会去想文件都放在哪儿。
I don't think about files, where everything is.
这几乎是偶然发生的吗?
And that came by accident almost?
是的。
Yeah.
这纯粹是个意外。
It was an accident.
我记得,当这个终端工具在公司内部开始流行起来后……老实说,在第一个原型完成后大约两天,我就开始把它交给我的团队内部试用(dogfooding)。
I remember, so after the terminal started to take off internally... and honestly, after building this thing, I think like two days after the first prototype, I started giving it to my team just for dogfooding.
因为如果你有了一个想法,觉得它有用,你第一件事就是想把它交给别人,看看他们怎么使用。
Because if you come up with an idea and it seems useful, the first thing you want to do is you want to give it to people to see how they use it.
第二天我来上班时,坐在我对面的工程师罗伯特,他的电脑上已经装了 Claude Code,并且正在用它写代码。
And then I came in the next day and then Robert, who sits across from me, who's another engineer, he just had Claude Code on his computer and he was using it to code.
我当时说:‘什么?’
I was like, What?
你在干什么?
What are you doing?
这东西还没准备好,只是一个原型。
This thing isn't ready, it's just a prototype.
但确实,它在那个形态下已经很有用了。
But yeah, it was already useful in that form factor.
我记得当我们做 Claude Code 的外部发布评审时,那是在2024年11月或12月左右。
And I remember when we did our launch review to launch Claude Code externally, this was in December, November, or something like that in 2024.
达里奥问道:内部的使用量图表,比如日活(DAU)图表,几乎是垂直上升的。
Dario asked, he was like, the usage chart internally, like the DAU chart, is, like, vertical.
你们是在强迫工程师使用它吗?
Are you like forcing engineers to use it?
为什么你们要强制要求他们?
Like, why are you mandating them?
我只是说,没有。
And I was just like, No.
不,我们没有。
No, we didn't.
我只是发了一下,然后他们就互相之间在传播这个东西。
I just posted about it, and they'd just been telling each other about it.
老实说,这完全是偶然的。
Honestly, it was just accidental.
最初是从命令行工具开始的,因为那是最便宜的方案,然后它就一直留在那里了。
It started with the CLI because it was the cheapest thing, and it just kind of stayed there for a bit.
那么在2024年那段时间,工程师们是怎么使用它的?
So in that 2024 period, how were the engineers using it?
他们已经开始用它来提交代码了,还是以其他方式使用?
Were they shipping code with it yet, or were they using it in a different way?
当时的模型在编程方面还不是很擅长。
The model was not very good at coding yet.
我本人用它来自动化Git操作。
I was using it personally for automating Git.
我想我现在可能已经忘了不少 Git 用法,因为 Claude Code 已经替我做了太久。
I think at this point I've probably forgotten a lot of my Git, because Claude Code has just been doing it for so long.
但确实,自动化bash命令是早期的一个使用场景,还有操作Kubernetes之类的事情。
But yeah, automating bash commands, that was a very early use case, and operating Kubernetes and things like this.
人们已经开始用它来写代码了,当时已经有一些早期迹象了。
People were using it for coding, so there were some early signs of this.
我认为第一个使用场景其实是写单元测试,因为风险较低,而当时的模型在这方面依然很糟糕。
I think the first use case was actually writing unit tests, because it's a little bit lower risk and the model was still pretty bad at it.
但人们正在慢慢摸索出来,学会如何使用这个工具。
But people were kind of figuring it out and they were figuring out how to use this thing.
我们发现的一件事是,人们开始为自己编写Markdown文件,然后让模型阅读这些文件。
And one thing that we saw is people started writing these markdown files for themselves and then having the model read that markdown file.
这就是 CLAUDE.md 的由来。
And this is where CLAUDE.md came from.
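The pattern of keeping notes in a markdown file and having the model read them can be sketched in a few lines of Python. This is an assumption-heavy illustration rather than how Claude Code actually loads CLAUDE.md: the `build_prompt` function, the prompt layout, and the example instruction text are all hypothetical.

```python
import tempfile
from pathlib import Path


def build_prompt(user_request: str, project_dir: str, filename: str = "CLAUDE.md") -> str:
    """Prepend the project's instruction file, if it exists, to the request."""
    notes_path = Path(project_dir) / filename
    if notes_path.is_file():
        notes = notes_path.read_text(encoding="utf-8").strip()
        if notes:
            return f"Project instructions:\n{notes}\n\nRequest:\n{user_request}"
    # No instruction file (or an empty one): send the request unchanged.
    return user_request


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        # Placeholder instruction text, made up for this example.
        Path(project, "CLAUDE.md").write_text(
            "Run the unit tests before committing.\n", encoding="utf-8"
        )
        print(build_prompt("Fix the failing build.", project))
```

Because the file lives in the project directory, it can be checked in and edited by the whole team, and deleting it simply falls back to the bare request.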
对我而言,产品上最重要的一条原则就是潜在需求。
Probably the single biggest principle in product, for me, is latent demand.
这个产品的每一个部分,都是在最初的命令行界面之后,基于潜在需求逐步构建的。
And just every bit of this product is built through latent demand after the initial CLI.
因此,CLAUDE.md 就是这一原则的一个例子。
And so CLAUDE.md is an example of that.
还有另一个普遍原则我觉得可能很有意思:你可以为模型构建应用,然后围绕模型搭建辅助结构,以略微提升性能。
There's this other general principle that I think is maybe interesting where you can build for the model and then you can build scaffolding around the model in order to improve performance a little bit.
根据不同领域,你或许能将性能提升10%、20%左右。
And depending on the domain you can improve performance maybe 10%, 20%, something like that.
然后,这种优势在下一个模型出现时就被抵消了。
And then essentially the gain is wiped out with the next model.
所以,你要么构建辅助系统,获得一些性能提升,然后再重新构建一次。
So either you can build the scaffolding and then get some performance gain and then rebuild it again.
或者你干脆等待下一个模型,这样你就能几乎免费获得这些改进。
Or you just wait for the next model and then you kind of get it for free.
CLAUDE.md 和这类辅助结构就是这个原理的一个例子。
The CLAUDE.md and kind of the scaffolding is an example of that.
事实上,我认为我们一直坚持使用命令行界面的原因是,我们觉得任何用户界面在六个月后都会过时,因为模型的进步实在太快了。
Really I think that's why we stayed in the CLI, is because we felt there was no UI we could build that would still be relevant in six months because the model was improving so quickly.
之前我们提到应该比较一下各自的 CLAUDE.md,但你说了个非常深刻的观点,那就是你的版本其实非常简短,几乎和人们预期的完全相反。
Earlier we were saying, like, we should compare CLAUDE.mds, but you said something very profound, which is, you know, yours is actually very short, which is almost like the opposite of what, you know, people might expect.
为什么会这样?
Why is that?
你的 CLAUDE.md 里包含什么?
What's in your CLAUDE.md?
好的。
Okay.
所以在来之前我查过这个。
So I I checked this before we came.
我的 CLAUDE.md 里有两件事。
So my CLAUDE.md has two things.
其实它只有两行。
It's just two lines.
第一行是:每次你提交PR时,启用自动合并。
So the first line is whenever you put up a PR, enable auto merge.
一旦有人批准,就会自动合并。
So as soon as someone accepts it, it's merged.
这样我就可以直接写代码,不用来回折腾代码审查之类的。
That's just so I can, like, code and I don't have to kinda go back and forth with CR or whatever.
第二行是:每次我提交 PR 时,把它发到我们团队内部的 stamps(审批)频道,这样有人能批准它,我就不会被卡住。
And then the second one is whenever I put up a PR, post it in our internal team stamps channel just so someone can stamp it and I can get unblocked.
而且,其他所有指令都写在我们提交到代码库的 CLAUDE.md 里,整个团队每周都会多次为它贡献内容。
And the idea is every other instruction is in our CLAUDE.md that's checked into the code base, and it's something our entire team contributes to multiple times a week.
很多时候,我会看到别人的PR,他们犯了一些完全可以避免的错误。
And very often, I'll see someone's PR and they they make some, like, mistake that's totally preventable.
我就会直接在PR里@Claude。
And I'll just literally tag Claude on the PR.
我会直接写:“@Claude,把这条加到 CLAUDE.md 里”,这种事情我每周都会做很多次。
I'll just do, like, "@Claude, add this to the CLAUDE.md," and I'll do this, you know, like, many times a week.
你们需要压缩 CLAUDE.md 吗?
Do you have to, like, compact the CLAUDE.md?
比如,我确实到了那个阶段,顶部弹出提示说你的 CLAUDE.md 已经有好几千个 token 了。
Like, I definitely reached the point where I got the message at the top saying, your CLAUDE.md is, like, thousands of tokens now.
当你们遇到这种情况时,会怎么处理?
What do you do when you guys hit that?
我们的 CLAUDE.md 其实还挺短的。
So our CLAUDE.md is actually pretty short.
我觉得大概几千个token左右吧,差不多就是这样。
I think it's, like, couple thousand tokens, maybe something like that.
如果你遇到这种情况,我的建议是删除你的 CLAUDE.md,从头开始。
If you hit this, my recommendation would be delete your CLAUDE.md and start fresh.
有意思。
Interesting.
我觉得很多人会试图把这件事过度工程化,对吧?
I think a lot of people, they try to over engineer this, right?
实际上,每一代模型的能力都在变化,所以你真正需要做的,是用最少的干预让模型回到正轨。
Really the capability changes with every model, and so the thing that you want is do the minimal possible thing in order to get the model on track.
所以,如果你删除了 CLAUDE.md 后,模型开始跑偏、做错事,那时你再一点点地加回去。
And so if you delete your CLAUDE.md and then the model is getting off track, it does the wrong thing, that's when you kind of add back a little bit at a time.
你可能会发现,随着每一代新模型的推出,你需要添加的内容越来越少。
What you're probably going to find is with every model you have to add less and less.
说实话,我觉得我自己是个很普通的工程师。
For me, I consider myself a pretty average engineer, to be honest.
我不怎么用那些花哨的工具。
I don't use a lot of fancy tools.
比如,我根本不使用 Vim。
Like, I I don't use, like, Vim.
我用的是 VS Code,因为它更简单。
I use, you know, VS Code because it's simpler.
我其实并不……
I don't really...
等等。
Wait.
真的?
Really?
我本来以为你既然在终端里开发这个,应该是个死忠终端用户,只用 Vim那种人,你知道的?
I would have assumed that because you built this in the terminal that you were sort of like a diehard terminal, like, Vim Vim only person, you know?
那些用 VS Code 的人就别理了。
Screw those VS Code people.
团队里确实有这样的人。
Well, we have people like that on the team.
比如亚当·沃尔夫,他就在这支团队里。
You know, like Adam Wolf, for example, he's on the team.
他常说:你别想从我冰冷的尸体上把Vim拿走。
He's like, "You will never take Vim from my cold, dead hands."
是的,团队里确实有很多这样的人,这也是我早期学到的一点:每个工程师都有自己偏爱的开发工具。
Yeah, so there's definitely a lot of people like that on the team, and this is one of the things that I learned early on, is every engineer likes to hold their dev tools differently.
他们喜欢使用不同的工具。
They like to use different tools.
根本不存在一种适合所有人的工具。
There's just no one tool that works for everyone.
但我认为,这也是 Claude Code 能如此出色的原因之一,因为我思考它的方式是:什么样的产品是我自己会用、对我来说合理的?
But I think also this is one of the things that makes it possible for Claude Code to be so good, because I kind of think about it as what is the product that I would use that makes sense to me.
所以,使用 Claude Code,你并不需要懂 Vim。
And so to use Claude Code, you don't have to understand Vim.
不需要了解 tmux。
You don't have to understand tmux.
你不需要知道如何使用 SSH。
You don't have to know how to SSH.
你不需要懂所有这些东西。
You don't have to know all this stuff.
你只需要打开工具,它就会引导你。
You just have to open up the tool and it will guide you.
它会帮你完成所有这些事情。
It will do all this stuff.
你怎么决定终端的详细程度呢?
How do you decide how verbose you want like, sort of the terminal to be?
有时候你得按 Ctrl+O 去查看一下。
Like, sometimes you have to go, you know, Ctrl+O and check it out.
这会不会引发一些关于输出长短的内部争论?
And is it, like, internal bike shed battles around, like, longer, shorter?
我的意思是,每个用户可能都有不同的看法。
I mean, every every user probably has a different opinion.
比如,你是如何做出这类决定的?
Like, how do you make those sorts of decisions?
你的看法是什么?
What what's your opinion?
现在是不是太啰嗦了?
Is it is it too verbose right now?
哦,我喜欢这种详尽的输出。
Oh, I love the verbosity.
因为有时候它会突然疯狂输出,我就盯着看,然后快速扫一眼,心想:糟了。
Because, basically, sometimes it just, like, goes off the deep end, and I'm watching, and then I can just read very quickly, and it's like, oh, no.
不是的。
No.
不是那个问题。
It's not that.
然后我退出,直接停止它。
And then I escape, and then just stop it.
然后它就会在问题发生时,直接阻止整个错误集群。
And then it just, like, stops an entire bug farm, like, as it's happening.
我的意思是,这通常是因为我没有正确使用计划模式。
I mean, that's usually when I didn't do plan mode properly.
这可能是我们经常更改的内容。
This is something that we probably change pretty often.
我记得早期,大概是六个月前,我曾试图完全去掉bash输出,只做摘要,因为我觉得这些超长的bash命令我根本不在乎。
I remember early on, this was maybe six months ago, I tried to get rid of bash output just internally, just to summarize it because I was like, these giant long bash commands, I don't actually care.
然后我把它给Anthropic的员工用了一天,结果所有人都抗议了。
And then I gave it to Anthropic employees for a day and everyone just revolted.
“我想看到我的 bash 输出。”
"I want to see my bash."
因为它其实相当有用。对于 git 输出这类内容,也许用处不大。
Because it actually is quite useful. For something like git output, maybe it's not useful.
但如果你在运行 Kubernetes 任务之类的东西时,实际上还是希望看到这些输出。
But if you're running Kubernetes jobs or something like this, you actually do want to see it.
我们最近隐藏了文件读取和文件搜索操作。
We recently hid the file reads and file searches.
所以你会注意到,它不再显示“读取 foo.md”,
So you'll notice instead of saying "read foo.md,"
而是显示“读取了一个文件、一个模式”。
it'll say "read one file, one pattern."
我认为六个月前我们根本不可能推出这个功能,因为当时的模型还不够成熟。
And this is something I think we could not have shipped six months ago because the model just was not ready.
它经常会读错内容。
It would have, you know, it still read the wrong thing pretty often.
作为用户,你仍然需要在场,去发现并调试这些问题。
As a user, you still had to be there and kind of catch it and debug it.
但如今,我注意到它几乎每次都能走在正确的轨道上。
But nowadays, I just noticed it's on the right track almost every time.
因为它大量使用工具,所以最好直接总结一下。
And because it's using tools so much, it's actually a lot better just to summarize it.
但随后我们发布了它。
But then we shipped it.
我们自己内部使用了一个月,然后GitHub上的用户并不喜欢。
We dogfooded it for like a month, then people on GitHub didn't like it.
因此出现了一个大问题,用户表示:不,我想看到详细信息。
So there was a big issue where people were like, no, like I want to see the details.
这是非常棒的反馈。
That was really great feedback.
于是我们新增了一个详细模式,你可以在配置中启用它;如果你想查看所有文件读取,仍然可以做到。
And so we added a new verbose mode, and so that's just, like, in config: you can enable verbose mode, and if you want to see all the file reads, you can continue to do that.
然后我在该问题下发了帖子,但用户还是不满意,这同样很棒,因为我最喜欢的事情就是听到用户的反馈,了解他们实际想如何使用它。
And then I posted on the issue and people still didn't like it, which is again awesome because my favorite thing in the world is hearing people's feedback and hearing how they actually wanna use it.
所以我们不断迭代,反复改进,直到它真正变得优秀,成为大家想要的样子。
And so we just like iterated more and more and more to get that really good and to make it the thing that people want.
我惊讶于自己现在竟然如此享受修复bug的过程。
I'm amazed like how much I enjoy fixing bugs now.
你只需要有非常好的日志记录,然后甚至可以说:嘿,看看这个特定的对象。
And then all you have to do is have really good logging, and then even just say, like, hey, check out that, you know, this particular object.
它以这种方式出错了,然后会自动在日志中搜索。
It messed up in this way, and it, like, searches the log.
它能弄清楚所有问题。
It figures everything out.
它甚至可以帮你建立生产环境隧道,直接查看你的生产数据库。
It can, like, go into your you can make a production tunnel, and it'll look at your production DB for you.
这简直太疯狂了。
It's like, this is insane.
修 bug 就是去 Sentry 点一下“复制为 Markdown”。
Bug fixing is just going to Sentry, "copy markdown."
你知道的?
You know?
很快,这就直接变成MCP了。
Pretty soon, it's just gonna be straight MCP.
它就像是自动修复bug、自动生成测试用例之类的,他们现在管这叫什么新术语来着?
It's like an auto bug-fixing and, like, test-making sort of... what's the new term they call it?
就像是打造一个创业工厂。
Like, making a startup factory.
哦,是的。
Oh, yeah.
对吧?
Right?
现在有很多这样的概念,不再需要人工审查代码了,你知道的。
There's, like, all these concepts now of rather than having to review the code, you know.
我是老派的,所以我喜欢详尽的表达。
I'm I'm old school, so I like the verbosity.
我喜欢说:哦,你这么做,但我希望你那样做。
I like to say, oh, well, you're doing this, but I want you to do that.
对吧?
Right?
但现在有一种完全不同的观点,认为只要需要真人查看代码,那就是坏事。
But there's a totally different school of thought now that says, like, anytime a real human being has to look at code, that's bad.
是的。
Yeah.
是的。
Yeah.
是的。
Yeah.
这很有趣。
It's fascinating.
我认为,像丹·希珀(Dan Shipper)这样的人经常谈到这一点:每当看到模型出错时,就尝试把它写进 CLAUDE.md,或者放进技能之类的东西里,以便复用。
I think, like, Dan Shipper talks about this a lot as kind of whenever you see the model make a mistake, try to put it in the CLAUDE.md, try to put it in skills or something like this so it's reusable.
但我认为还有一个我经常感到困惑的元层面的问题。
But I I think there's this meta point that I actually struggle with a lot.
人们总说智能体能做这个、能做那个,但实际上智能体能做什么,会随着每个新模型而变化。
And people talk about agents can do this, agents can do that, but actually what agents can do, it changes with every single model.
所以有时候新成员加入团队后,他们对 Claude Code 的使用反而比我更充分。
And so sometimes there's a new person that joins the team and they actually use Claude Code more than I would have used it.
我对此总是感到非常惊讶。
And I'm just constantly surprised by this.
比如,我们曾经遇到一个内存泄漏问题,当时正在尝试调试它。
Like, for example, there was a we had like a memory leak and we were trying to debug it.
顺便说一句,贾里德·萨默一直在致力于消灭所有内存泄漏,这真的太棒了。
By the way, Jarred Sumner has just been on this crusade killing all the memory leaks, it's just been amazing.
但在贾里德加入团队之前,这些事都得我来处理。
But before Jared was on the team, I had to do this.
当时就存在这样一个内存泄漏问题。
And there was this memory leak.
我在调试时,做了一个堆转储。
I was trying to debug it, and so I took a heap dump.
我用 DevTools 打开了它。
I opened it in DevTools.
我正在查看性能分析数据。
I was looking through the profile.
然后我翻看代码,试图弄清楚这个问题。
Then I was looking through the code, and I was trying to figure this out.
团队里的另一位工程师克里斯,他直接问了 Claude Code。
And then another engineer on the team, Chris, he just asked Claude Code.
他说:“嘿,我觉得这里有内存泄漏。
He was like, "Hey, I think there's a memory leak.
你能运行一下这个,然后试着找出问题吗?”
Can you run this and then try to figure it out?"
Claude Code 拿到了堆转储,自己写了一个小工具来分析堆转储。
And Claude Code took the heap dump, it wrote a little tool for itself to analyze the heap dump.
然后它比我更快地找到了泄漏点。
And then it found the leak faster than I did.
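The conversation does not say what that little analysis tool looked like. As a hedged sketch of the general idea, the Python below sums object sizes per type over a heap dump and lists the heaviest types first. The input is a simplified, hypothetical format (a list of records with `type` and `size` keys), not the real DevTools heap snapshot layout, and `top_types_by_bytes` is an illustrative name.

```python
from collections import Counter


def top_types_by_bytes(objects, top_n=3):
    """Sum object sizes per type and return the heaviest types first."""
    totals = Counter()
    for obj in objects:
        totals[obj["type"]] += obj["size"]
    return totals.most_common(top_n)


if __name__ == "__main__":
    # Synthetic data: many small listeners that were never released, one buffer.
    dump = [{"type": "Listener", "size": 100} for _ in range(50)]
    dump.append({"type": "Buffer", "size": 500})
    for type_name, total in top_types_by_bytes(dump):
        print(f"{type_name}: {total} bytes")
```

A type that dominates the totals despite small individual objects is a typical first lead when hunting a leak, which is the kind of summary that is tedious to read out of a profiler by hand.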
这是一些我必须不断重新学习的东西,因为我的大脑有时还停留在六个月前的状态。
And this is just something I have to constantly relearn because my brain is still stuck somewhere six months ago at times.
那么,对于技术型创始人来说,如何才能在最新模型发布时成为极致的追随者呢?
So what would be some advice for technical founders to really become maximalists at the latest model release?
听起来,刚毕业或者没有太多先入为主观念的人,可能比那些长期从事这一行的工程师更适合。
It sounds like people fresh out of school, or who don't have any assumptions, might be better suited than maybe sometimes engineers who have been working at it for a long time.
那么专家们又是如何变得更好的呢?
And how do the experts get better?
我认为对于自己来说,保持初学者的心态,或许还有谦逊的态度。
I think for yourself it's kind of beginner mindset and I don't know, maybe just like humility.
我觉得作为工程师这一行,我们被训练得拥有非常强烈的意见,而资深工程师往往因此受到认可。
I feel like engineers as a discipline, we've learned to have very strong opinions and senior engineers are kind of rewarded for this.
在我以前在大公司的工作中,招聘架构师这类工程师时,我们会寻找那些经验丰富、观点鲜明的人。
In my old job at a big company when I hired architects and this kind of type of engineer, you look for people that have a lot of experience and really strong opinions.
但事实上,很多这些东西已经不再相关了,许多这些观点都应该改变,因为模型正在变得越来越好。
But it actually turns out a lot of this stuff just isn't relevant anymore and a lot of these opinions should change because the model is getting better.
所以我认为最重要的技能是那些能够科学思考、从第一性原理出发的人。
So I think actually the biggest skill is people that can think scientifically and can just think from first principles.
你现在招聘团队成员时,如何筛选出这样的人呢?
How do you screen for that when you try to hire someone now for your team?
我有时会问他们:你什么时候犯过错误?
I sometimes ask about what's an example of when you were wrong.
这是一个非常好的问题。
It's a really good one.
这些经典的行为面试问题,甚至不是编程问题,我认为相当有用。
Some of these classic behavioral questions, not even coding questions, I think are quite useful.
你可以看出一个人是否能在事后认识到自己的错误,是否愿意承认错误,并从中吸取教训。
You can see if people can recognize their mistake in hindsight, if they can claim credit for the mistake, and if they learn something from it.
我认为很多资深人士,尤其是某些类型的创始人,实际上在这方面做得很好,创始人尤其擅长这一点。
And I think a lot of these very senior people, especially there are some founder types like this, I think founders in particular are actually quite good at it.
但其他人有时永远不会为错误承担责任。
But other people sometimes will never really take the blame for a mistake.
但说实话,对我来说,我大概有一半时间都是错的。
But I don't know, for me personally, I'm wrong probably half the time.
我一半的想法都是糟糕的。
Half my ideas are bad.
你只能不断去尝试各种东西。
And you just have to try stuff.
你试一个东西,交给用户,和用户交流,然后学习。
You try a thing, you give it to users, you talk to users, you learn.
最终你可能会得到一个好点子,但有时也不会。
And then eventually you might end up at a good idea, sometimes you don't.
我认为,过去这种能力对创始人来说非常重要。
And this is the skill that I think in in the past was very important for founders.
但现在我觉得这对每个工程师都至关重要。
But now I think it's very important for every engineer.
你觉得你会根据某人与智能体协作的 Claude Code 对话记录来雇用他吗?
Do you think you would ever hire someone based on the Claude Code transcript of them working with the agent?
因为我们现在正在积极地做这件事。
Because we're actively doing that right now.
是的。
Yeah.
我们刚刚添加了一个功能作为测试:你可以上传自己使用 Claude Code、Codex 或其他任何工具开发某个功能时的对话记录。
We just added, just as a test, like, you can upload a transcript of you coding a feature with Claude Code or Codex or whatever it is.
我个人觉得,这肯定会奏效。
Personally, I think that, like, it's gonna work.
我的意思是,你可以看出一个人是如何思考的,比如他们是否查看日志,当代理偏离轨道时,他们能否纠正它?
I mean, you can figure out how someone thinks, like, whether they're looking at their logs or not, like, can they correct the agent if it goes off off the rails?
比如,他们会使用计划模式吗?
Like, do do they use plan mode?
当他们使用计划模式时,是否会确保有测试用例,或者你知道的,所有这些不同的事情——他们是否考虑过系统?
You know, when they use plan mode, do they make sure that there are tests or you know, all of these different things that, you know, do they think about systems?
他们甚至理解系统吗?
Do they even understand systems?
就像,这里面蕴含了太多东西,我想象中是这样的。
Like, there's just so much that's sort of embedded in that that I imagine.
我只是想要一个蜘蛛网图,你知道的,就像那些电子游戏里NBA 2K那样的。
I just want, like, a spider web graph, you know, like in those video games like NBA 2K.
哦,这个人投篮或防守特别厉害。
It's like, oh, this person's really good at shooting or defense.
你可以想象出一个蜘蛛网图,用来表示某人的 Claude Code 技能水平。
It's like, you can imagine a spider web graph of, like, you know, someone's Claude Code skill level.
是的。
Yeah.
那些技能会是什么?
What would what would the skills be?
这些方面会包括哪些?
What would be those aspects?
我觉得这涉及到系统、测试,肯定得考虑用户行为,也就是说,肯定有一个设计部分。
I think it's like systems, testing, must be like user behavior... I mean, there's gotta be a design part.
是的。
Yeah.
当然。
For sure.
比如产品直觉。
Like product sense.
也许还包括自动化一些事情。
Maybe maybe also just like automating stuff.
嗯哼。
Mhmm.
我在 CLAUDE.md 里最喜欢的一条是:对于每个方案,都要判断它是过度设计、设计不足,还是恰到好处,并说明原因。
My favorite thing in CLAUDE.md for me is I have a thing that says for every plan, decide whether it's over engineered, under engineered, or perfectly engineered, and why.
我觉得我们也在试图弄清楚这一点,因为当我观察团队中我认为最高效的工程师时,发现他们基本上只有两种类型,非常两极分化。
I think this is something that we're trying to figure out too because I I think when I look at engineers on the team that I think are the most effective, there's essentially two it's very bimodal.
有一类是极端的专业人士。
There's one side where it's extreme specialists.
就像我之前提到的贾里德,他就是一个很好的例子,整个团队也是如此。
And so, like, I named Jarred before; he's a really good example of this, and kind of the team is a really good example.
他们是超级专家,对开发工具的理解比任何人都深。
Just hyper specialists, they understand dev tools better than anyone else.
他们对JavaScript运行时系统的理解也比任何人都透彻。
They understand JavaScript runtime systems better than anyone else.
而另一面则是超级通才,也就是团队的其他成员。
And then there's the flip side of kind of hyper generalists, and that's kind of the rest of the team.
很多人横跨产品和基础设施,或产品和设计,或产品和用户研究,产品和业务。
And a lot of people, they span like product and infra, or product and design, or product and user research, product and business.
我特别喜欢看到那些做些奇怪事情的人。
I really like to see people that just do weird stuff.
我觉得这在过去曾是一个警示信号,因为人们会想:这些人真的能做出有用的东西吗?
I think that's one of these things that was kind of a warning sign in the past, it's because like, can these people actually build something useful?
这就是试金石。
That's the litmus test.
是的,这就是试金石。
Yeah, that's the litmus test.
但如今,比如我们团队的一位工程师Daisy,她之前在另一个团队,后来转到了我们团队。
But nowadays, like for example, an engineer on the team, Daisy, she was on a different team and then she transferred onto our team.
我之所以希望她转过来,是因为她加入后大约几周就提交了一个 Claude Code 的 PR。
And the reason that I wanted her to transfer is she put up a PR for Claude Code like a couple of weeks after she joined or something.
这个 PR 是为了给 Claude Code 添加一个新功能。
And the PR was to add a new feature to Claude Code.
但她并没有直接添加功能,而是先提交了一个 PR,给 Claude Code 提供一个工具,让它可以测试任意工具并验证其是否正常工作,然后才提交了那个功能 PR。
And then instead of adding the feature, what she did is first she put up a PR to give Claude Code a tool so that it can test an arbitrary tool and verify that that works, and then she put up that PR.
接着,她让 Claude 自己编写这个工具,而不是亲自去实现它。
And then she had Claude write its own tool instead of herself implementing it.
我认为这种跳出框架的思维方式非常有趣,因为目前还没有多少人能理解这一点。
I think it's this kind of out of the box thinking that is just so interesting because not a lot of people get it yet.
你知道吗,我们使用 Claude Agent SDK 来自动化开发的几乎所有环节。
You know, like we use the Claude Agent SDK to automate pretty much every part of development.
它自动化代码审查和安全审查。
It automates code review, security review.
它为我们的所有问题打上标签。
It labels all of our issues.
它推动项目上线生产环境。
It shepherds things to production.
它几乎为我们做了所有事情。
It does pretty much everything for us.
但我认为,从外部来看,正有越来越多的人开始理解这一点。
But I think externally, I'm seeing a lot of people start to figure this out.
但实际上,要弄清楚如何以这种方式使用大语言模型花了挺长时间。
But it's actually taken a while to figure out how you use LLMs in this way.
如何使用这种新型的自动化?
How do you use this new kind of automation?
所以这算是一种新技能。
So it's kind of a new skill.
我想,我在和一些创始人进行办公时间交流时,遇到的一个有趣现象是:你有一个富有远见的创始人,他脑子里有一个想法。
I guess one of the funnier things that I've been having office hours with various founders about is you have, like, sort of the visionary founder who has, like, the idea.
他构建了自己想要打造的产品的水晶宫殿。
They've, like, built this, like, crystal palace of the product that they wanna build.
他完全把用户是谁、用户感受如何、用户动机是什么,都记在了脑子里。
They've totally loaded in their brain, you know, who the user is and what they feel and what they're motivated by.
然后他们坐在Claude Code前,能实现50倍的工作效率。
And then they're sitting in Claude Code, and they can do, like, you know, 50x work.
但他们手下的工程师却没有那种关于产品理想形态的‘水晶记忆宫殿’,只能发挥5倍的效率。
But then they have engineers who work for them who don't have the, you know, crystal memory palace of the platonic ideal of the product that the founder has, and they can only do, like, 5x work.
你有没有听过类似的故事?
Are you hearing stories like that?
通常总会有一个人,是这个东西的核心设计师,他们只是想把自己的想法从脑子里彻底倾泻出来。
There's usually a person who's, like, the core, like, designer of a thing, and they're just, like, you know, trying to blast it out of their brain.
像这样的团队本质上是什么样的?
What's the nature of, like, teams like that?
你知道,这看起来几乎是一种稳定的配置。
You know, it it seems like that's almost a stable configuration.
你会有一个愿景者,现在彻底释放了。
Like, you're gonna have the visionary who, like, now is unleashed.
但回到最初,我现在正亲身体验着这一点。
But, you know, going back to the top of it, like, I'm experiencing this right now.
就像,哦,我只是一个单独的人,我需要吃饭睡觉,而且我还有整整一份工作。
It's like, oh, well, I'm only a solo person, and, you know, I need to eat and sleep, and I have, you know, a whole job.
那我该怎么做到这一点呢?
And it's like, how am I gonna do this?
你知道吧?
You know?
你知道,我们刚推出了Claude Teams,这是一种做法,但你也可以自己构建属于你的方式。
You know, we just launched Claude Teams, and this is one way to do it, but you can also just build your own way to do it.
这很容易。
It's pretty easy.
Claude Teams的愿景是什么?
What's the vision for Claude Teams?
就是协作。
Just collaboration.
这就像出现了一个全新的领域,人们正在探索各种代理拓扑结构。
It's like there's this whole new field of agent topologies that people are exploring.
比如,他们可以如何配置代理?
Like what are the ways they can configure agents?
有一个子想法是无关的上下文窗口。
There's this one sub idea which is uncorrelated context windows.
其理念就是多个代理。
And the idea is just multiple agents.
它们拥有全新的上下文窗口,不会被彼此的上下文或自身之前的上下文所污染。
They have fresh context windows that aren't essentially polluted with each other's context or their own previous context.
如果你向一个问题投入更多上下文,这就相当于一种测试时的计算。
And if you throw more context at a problem, that's like a form of test time compute.
因此,你就能以这种方式获得更强的能力。
And so you just get more capability that way.
如果在此基础上拥有正确的拓扑结构,使代理能够以正确的方式通信、合理地布局,它们就能构建出更庞大的系统。
And then if you have the right topology on top of it so the agents can communicate in the right way, they're laid out in the right way, then they can just build bigger stuff.
所以,Teams 是其中一个想法。
And so Teams is kind of like one idea.
还有一两个想法很快就会推出。
There's a few more that are coming pretty soon.
这个想法只是也许它能多构建一点东西。
And the idea is just maybe it can build a little bit more.
我认为第一个真正成功的大型例子是我们插件功能,它完全是由一个代理群在周末内构建完成的。
I think the first kind of big example where it worked is our plugins feature, which was entirely built by a swarm over a weekend.
它只是运行了几天左右。
It just ran for, like, a few days.
实际上并没有人为干预。
There wasn't really human intervention.
插件的形态基本和它刚发布时一模一样。
And plugins is pretty much in the form that it was in when it came out.
你们是怎么设置的?
How did you set that up?
比如,你们是否先明确了期望的结果,然后让它自己去推敲细节,再让它运行?
Like, did you spec out sort of the outcome that you were hoping for, and then let it sort of figure out the details, and then, like, let it run?
是的。
Yeah.
团队里一位工程师给Claude提供了一份规范,并让它使用Asana看板。
An engineer on the team just gave Claude a spec and told Claude to use an Asana board.
然后Claude就在Asana上创建了一堆任务单,启动了多个代理,这些代理开始接手任务。
And then Claude just put up a bunch of tickets on Asana, then spawned a bunch of agents, and the agents started picking up tasks.
接着Claude给了它们一些指导,它们自己就全都搞定了。
And then Claude just gave them instructions, and they all just figured it out.
那些没有整体规格背景的独立代理,对吧?
The independent agents that didn't have the context of the bigger spec, right?
对。
Right.
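As a rough illustration of the setup just described (a lead agent files tickets on a shared board and independent agents pick them up), here is a minimal TypeScript sketch. The `TaskBoard` class, the `runSwarm` function, and the ticket titles are hypothetical stand-ins; the conversation only says that Claude used an Asana board and spawned agents, not how that was wired up.

```typescript
// Hypothetical sketch: a shared task board that independent agents pull from.
type Task = {
  id: number;
  title: string;
  status: "todo" | "in_progress" | "done";
  owner?: string;
};

class TaskBoard {
  private tasks: Task[] = [];
  private nextId = 1;

  // The lead agent breaks the spec into tickets.
  add(title: string): Task {
    const task: Task = { id: this.nextId++, title, status: "todo" };
    this.tasks.push(task);
    return task;
  }

  // A worker claims the first unclaimed ticket, or undefined if none remain.
  claim(owner: string): Task | undefined {
    const task = this.tasks.find((t) => t.status === "todo");
    if (task) {
      task.status = "in_progress";
      task.owner = owner;
    }
    return task;
  }

  complete(id: number): void {
    const task = this.tasks.find((t) => t.id === id);
    if (task) task.status = "done";
  }

  remaining(): number {
    return this.tasks.filter((t) => t.status !== "done").length;
  }
}

// Each worker sees only the ticket it claimed, not the full spec.
function runSwarm(board: TaskBoard, workers: string[]): Map<string, string[]> {
  const log = new Map<string, string[]>();
  for (const w of workers) log.set(w, []);
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (const worker of workers) {
      const task = board.claim(worker);
      if (!task) continue;
      // Placeholder for the real agent work on this one ticket.
      log.get(worker)!.push(task.title);
      board.complete(task.id);
      progressed = true;
    }
  }
  return log;
}

const board = new TaskBoard();
["design manifest format", "implement loader", "write tests"].forEach((t) => board.add(t));
const swarmLog = runSwarm(board, ["agent-1", "agent-2"]);
```

The design point the sketch tries to capture is that coordination lives in the board, so each worker can start with a fresh context containing only its own ticket.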
如果你想想现在代理是如何实际启动的?
If you think about it, how are agents actually started nowadays?
我还没查过这方面的数据,但我敢打赌,如今大多数代理实际上是由Claude以子代理的形式触发的。
I haven't pulled the data on this, but I would bet the majority of agents are actually prompted by Claude today in the form of sub agents.
因为子代理其实就是一种递归的Claude Code。
Because a sub-agent is just like a recursive Claude Code.
代码里就是这么回事。
That's all it is in the code.
它只是被我们称作“Claude妈妈”的那个Claude所触发。
And it's just prompted by, we call her Mama Claude.
就是这样。
That's all it is.
我认为,如果你观察大多数代理,它们都是以这种方式启动的。
And I think probably if you look at most agents, they're launched in this way.
我的Claude洞察建议我多用这种方式进行调试。
My Claude Insights just told me to do this more for debugging.
因为我花了很多时间在调试上,而让多个子代理并行启动来调试问题会更好。
Because I spend a lot of time on debugging, and it would just be better to have multiple sub-agents spin up and debug something in parallel.
于是我把它加到了我的CLAUDE.md里,就是说:嘿,下次你尝试修复bug时,让一个代理专注于日志,另一个代理专注于代码路径。
And so then I just added that to my CLAUDE.md to just be like, hey, next time you try to fix a bug, have one agent that looks in the logs and one that looks in the code path.
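The parallel-debugging idea above can be sketched in TypeScript as a simple fan-out. Everything here is illustrative: `buildDebugTasks`, `debugInParallel`, and the `runAgent` callback are hypothetical names, and `runAgent` stands in for whatever actually starts a sub-agent in a given setup.

```typescript
// Hypothetical sketch: fan one bug out to sub-agents with separate, fresh contexts.
type SubagentTask = { focus: string; prompt: string };
type AgentRunner = (prompt: string) => Promise<string>;

// One prompt per focus area; each sub-agent receives only its own prompt.
function buildDebugTasks(bug: string, focuses: string[]): SubagentTask[] {
  return focuses.map((focus) => ({
    focus,
    prompt: `Investigate this bug by looking only at ${focus}: ${bug}`,
  }));
}

// Run all sub-agents in parallel and collect their findings keyed by focus.
async function debugInParallel(
  bug: string,
  focuses: string[],
  runAgent: AgentRunner, // assumption: supplied by the caller
): Promise<Record<string, string>> {
  const tasks = buildDebugTasks(bug, focuses);
  const findings = await Promise.all(tasks.map((t) => runAgent(t.prompt)));
  const report: Record<string, string> = {};
  tasks.forEach((t, i) => {
    report[t.focus] = findings[i];
  });
  return report;
}

const debugTasks = buildDebugTasks("checkout returns HTTP 500", ["the logs", "the code path"]);
```

Adding more entries to `focuses` is the equivalent of asking for three, five, or ten sub-agents on a harder problem.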
这看起来似乎是不可避免的。
That just seems sort of inevitable.
对于奇怪而棘手的bug,我会尝试在计划模式下修复,然后它似乎会使用代理来搜索所有内容。
For weird, scary bugs, I try to fix them in plan mode, and then it seems to use the agents to sort of search everything.
而当你只是逐行处理时,就会想:好吧,我就只做这一个任务,而不是广泛搜索。
Whereas like when you're just trying to do it in line, it's like, okay, I'm gonna do like this one task instead of search wide.
我也会经常这么做。
This is something I do all the time too.
只要任务看起来有点难,比如这种研究型任务,我就会根据任务的难度来调整让我使用的子代理数量。
Just say, if the task seems kind of hard, this kind of research task, I'll calibrate the number of sub agents I ask it to use based on the difficulty of the task.
所以如果任务特别难,我会说,使用三个,或者五个,甚至十个子代理。
So if it's really hard, I'll say, use three or maybe five or even 10 sub agents.
并行开展研究,然后看看它们能得出什么结果。
Research in parallel, and then see what they come up with.
我很好奇,那你为什么不在你的CLAUDE.md文件里加上这个呢?
I'm curious, so then why don't you put that in your CLAUDE.md file?
这得视具体情况而定。
It's kind of case by case.
CLAUDE.md,那是什么?
CLAUDE.md, what is it?
这是一个快捷方式。
It's a shortcut.
如果你发现自己反复做同一件事,就把它们放进CLAUDE.md里。
If you find yourself repeating the same thing over and over, you put it in the CLAUDE.md.
但除此之外,你没必要把所有东西都放进去。
But otherwise, you don't have to put everything there.
你只需直接提示Claude即可。
You can just prompt Claude.
你心里是不是也在想,也许六个月后,你就不再需要明确地这样提示了?
Are you also in the back of your mind thinking that maybe, like, in six months, you won't need to prompt that explicitly?
比如,模型本身已经足够智能,能自己判断了?
Like, the model would just be good enough to figure out on its own?
可能一个月就够了。
Maybe in a month.
一个月后就不再需要计划模式了?
No more need for plan mode in a month?
天哪。
Oh my god.
我认为计划模式的生命周期可能是有限的。
I think plan mode probably has a limited lifespan.
有意思。
Interesting.
这算是给在座每个人的一条内部消息。
That's some alpha for everyone here.
如果没有计划模式,世界会是什么样子?
What would the world look like without plan mode?
你只是在提示层面描述它,然后它就能一次性完成吗?
Do you just describe it at the prompt level and it would just do it, one shot it?
是的。
Yeah.
我们已经开始在这方面进行实验,因为Claude Code现在可以自行进入计划模式了。
We've started experimenting with this because Claude Code can now enter plan mode by itself.
我不知道你们有没有看过这个。
I don't know if you guys have seen that.
看过这个,是的。
Seen that, yeah.
所以我们正在努力让这个体验变得非常好。
So we're trying to kind of get this experience really good.
因此,它会在人类本想进入计划模式的同一时刻自动进入计划模式。
So it would enter plan mode at the same point where a human would have wanted to enter it.
所以我觉得大概是这样的。
So I think it's something like this.
但实际上,计划模式并没有什么神秘之处。
But actually, with plan mode, there's no big secret to it.
它只是在提示中添加了一句话,比如:请不要写代码。
All it does is it adds one sentence to the prompt that's like, please don't code.
就这么简单。
That's all it is.
你其实可以直接这么说。
You can actually just say that.
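Taken at face value, the description above (plan mode adds one sentence to the prompt) could look like the TypeScript sketch below. The constant `PLAN_MODE_SENTENCE`, the function `buildPrompt`, and the exact wording and placement of the sentence are assumptions for illustration, not the actual Claude Code implementation.

```typescript
// Sketch only: the real wording and where it is injected are not given in the conversation.
const PLAN_MODE_SENTENCE = "Please don't code.";

// Append the plan-mode sentence to the user's request when plan mode is on.
function buildPrompt(userRequest: string, planMode: boolean): string {
  return planMode ? `${userRequest}\n\n${PLAN_MODE_SENTENCE}` : userRequest;
}

const planned = buildPrompt("Add retry logic to the uploader.", true);
const direct = buildPrompt("Add retry logic to the uploader.", false);
```

Because the mechanism is just text, typing the same sentence by hand has the same effect, which matches the remark that you can simply say it.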
对。
Yeah.
所以听起来,Claude Code 的许多功能开发都非常像我们在 YC 谈论的内容。
So it sounds like a lot of the feature development for Claude Code is very much what we talk about at YC.
与用户交流,然后去实现它。
Talk to your users, and then you come and implement it.
并不是先有一个完整的计划,再逐一实现所有功能。
It wasn't the other way that you had this master plan and then implemented all the features.
是的。
Yeah.
对。
Yeah.
我的意思是,事情就是这么简单。
I mean, that that's all it was.
比如,计划模式是我们看到一些用户说,嘿,Claude。
Like, with plan mode, we saw users that were like, hey, Claude.
想出一个点子。
Come up with an idea.
把这个计划梳理清楚,但先别写任何代码。
Plan this out, but don't write any code yet.
这中间有过各种不同的版本。
And there was kind of various versions of this.
有时候只是简单地讨论一个想法。
Sometimes it was just talking through an idea.
有时候则是他们让Claude去写非常详细的规格文档。
Sometimes it was these very sophisticated specs that they were asking Claude to write.
但共同点是,在还没写代码之前先做点别的。
But the common dimension was do a thing without coding yet.
所以,真的,这事儿就发生在周日晚上10点。
And so literally, this was, like, Sunday night at 10 PM.
我当时只是在查看 GitHub 的问题,看看人们在讨论什么,同时浏览我们内部的 Slack 反馈频道。
I was just looking at GitHub issues and kind of seeing what people were talking about, and looking at our internal Slack feedback channel.
我花了大约三十分钟写出了这个东西,然后当晚就发布了。
And I just wrote this thing in, like, thirty minutes and then shipped it that night.
它在周一早上上线了。
It went out Monday morning.
这就是计划模式。
It was plan mode.
所以你的意思是,就“担心模型会做错事或跑偏方向”这一点而言,将来不再需要计划模式,但另一种需求仍然存在?
So do you mean that there'll be no need for plan mode in the sense of being worried that the model's gonna do the wrong thing or head off in the wrong direction, but there will still be a need for this other part?
你需要仔细思考这个想法,弄清楚你到底想要什么,而这些必须在某个地方完成。
You need to think through the idea and figure out exactly what it is that you want, and you have to do that somewhere.
我倾向于从提升模型能力的角度来思考这个问题。
I kind of think about it in terms of increasing model capabilities.
所以六个月前,一个计划是不够的。
So maybe six months ago, a plan was insufficient.
所以你让Claude制定计划,即使使用计划模式,你也得坐在那儿盯着,因为它可能会跑偏。现在呢,我虽然说计划模式的使用寿命是有限的,但我自己还是个重度计划模式使用者。
So you get Claude to make a plan, and, let's say even with plan mode, you still have to kind of sit there and babysit, because it can go off track. Nowadays, well, I say plan mode has a limited lifespan, but I'm a heavy plan mode user.
我大概80%的会话都从计划模式开始,Claude会先开始制定计划,然后我就切换到第二个终端标签页,再让它生成另一个计划。
Probably 80% of my sessions, I start in plan mode, and Claude will start making a plan, I'll move on to my second terminal tab, and then I'll have it make another plan.
等我用完所有标签页后,我就打开桌面应用,切换到代码标签页,然后在那里开一堆新标签页。
And then when I run out of tabs, I open the desktop app, and then I go to the code tab, and then I just start a bunch of tabs there.
它们几乎都是从计划模式开始的,大概80%的时间都是这样。
And they all start in plan mode, probably, you know, like 80% of the time.
一旦计划好了,有时需要来回调整几次,我就直接让Claude去执行。
Once the plan is good, and sometimes it takes a little back and forth, I just get Claude to execute.
现在我用Opus 4.5发现,我觉得是从4.6开始,它变得非常好用了。
Nowadays what I find with Opus 4.5, I think it started with 4.6, it got really good.
一旦计划完善了,它就能一直保持在正确的轨道上,几乎每次都能准确无误地完成任务。
Once the plan is good, it just stays on track, and it'll just do the thing exactly right almost every time.
所以以前,你不仅要在计划前盯着,计划后也得盯着。
And so, you know, before you had to babysit after the plan and before the plan.
现在只需要在制定计划之前了。
Now it's just before the plan.
所以也许下一步是你根本不需要监督,只需给出一个提示,Claude 就能自己搞定。
So maybe the next thing is you just won't have to babysit, you can just kind of give a prompt and Claude will figure it out.
下一步是 Claude 直接与你的用户沟通。
The next step is Claude just speaks to your users directly.
是的。
Yeah.
它完全绕过了你。
It just bypasses you entirely.
有趣的是,这实际上是我们目前的情况。
It's funny, this is actually the current state for us.
我们的各个Claude实际上会彼此交流。
Our Claudes actually talk to each other.
它们经常通过 Slack 与我们的用户交流,至少在内部是这样。
They talk to our users on Slack, at least internally, pretty often.
我的Claude偶尔会发一条推文。
My Claude will, like, tweet once in a while.
不可能吧。
No way.
但我其实经常会删掉它发的内容。
But I actually, like, delete it.
感觉有点儿俗气。
It's just like it's a little, like, cheesy.
我不太喜欢那种语气。
Like, I don't love the tone.
它想发些什么内容呢?
What does it want to tweet about?
有时候它会回复某个人。
Sometimes it'll just respond to someone.
因为我总是在后台运行着Cowork,真正喜欢做这个的是Cowork里的Claude,因为它喜欢使用浏览器。
Because I always have Cowork running in the background, and it's the Cowork Claude that really loves to do that, because it likes using a browser.
这很有趣。
That's funny.
一个非常常见的模式是,我让Claude构建某个东西,它会查看代码库,在 git blame 里看到某位工程师改动过相关内容,然后就在 Slack 上给那位工程师发消息,提出一个澄清性问题。
A really common pattern is I ask Claude to build something, it'll look in the code base, it'll see that some engineer touched something in the git blame, and then it'll message that engineer on Slack, just asking a clarifying question.
一旦它得到回复,就会继续下去。
Once it gets an answer back, it'll keep going.
对于现在的创业者来说,有哪些关于面向未来进行建设的建议?
What are some tips for founders now on how to build for the future?
听起来一切都正在发生巨大变化。
It sounds like everything is really changing.
有哪些原则会持续存在,又有哪些会改变?
What are some principles that will stay on and what will change?
我认为其中一些原则相当基础,但如今它们比以往任何时候都更重要。
I think some of these are pretty basic, but I think they're even more important now than they were before.
一个例子是潜在需求。
One example is latent demand.
这一点我已经提过无数次了。
I've mentioned it a thousand times.
对我来说,这是产品领域最重要的一个理念。
For me, it's just the single biggest idea in product.
这是一个没人真正理解的东西。
It's a thing that no one understands.
在我最初的几次创业中,我当然也不理解。
It's a thing I certainly did not understand my first few startups.
它的意思是,人们只会去做他们原本就在做的事情。
And the idea is like people will only do a thing that they already do.
你无法让别人去做一件全新的事情。
You can't get people to do a new thing.
如果人们正在尝试做某件事,而你让它变得更简单,那就是一个好点子。
If people are trying to do a thing and you make it easier, that's a good idea.
但如果人们正在做某件事,而你试图让他们去做另一件事,他们是不会这么做的。
But if people are doing a thing and you try to make them do a different thing, they're not gonna do that.
所以你只需要让他们正在做的事情变得更简单。
And so you just have to make the thing that they're trying to do easier.
我认为Claude会越来越擅长帮你发现这类产品创意,因为它可以分析反馈、查看调试日志,从而慢慢弄清楚这些事情。
And I think Claude is gonna get increasingly good at figuring out these kinds of product ideas for you, just because it can look at feedback, it can look at debug logs, it can kinda figure this out.
这就是你所说的计划模式代表潜在需求的意思吧?人们其实已经在浏览器里开着Claude的聊天窗口,正在跟它对话,来弄清楚需求和该做什么。
That's what you mean by plan mode being latent demand: people already kinda had their Claude chat window open in the browser and were talking to it to figure out the spec and what it should do.
现在计划模式直接承担了这件事,你只需要在Claude Code里做就行了。
And now plan mode just became that, and you just do it in Claude Code.
是的,没错。
Yeah, yeah.
有时候我会在办公室里走一走,站在人们身后。
Sometimes what I'll do is I'll just walk around the office on our floor and I'll just kind of stand behind people.
我会先打个招呼,免得显得怪怪的。
I'll say hi, so it's not creepy.
然后我就观察他们是如何使用Claude Code的。
And then I'll just see how they're using Claude Code.
这其实也是我经常看到的情况。
And this is also just something I saw a lot.
但GitHub上的问题里也提到了这一点,人们一直在讨论这个。
But it also came up in GitHub issues, like people were talking about it.
你似乎对终端已经发展到如此地步感到惊讶,它被推到了这么远。
It seems like you're surprised how far the terminal has gone and how far it's been pushed.
你觉得它还有多大的发展空间?
Like, how far do you think it has left to go?
考虑到这个多智能体协同的世界,你认为上面还需要一种不同的用户界面吗?
Just given with this world of swarm multiple agents, do you think there's going to be a need for a different UI on top of it?
这挺有意思的。
It's funny.
如果你一年前问我这个问题,我会说终端最多再撑三个月,之后我们就该转向别的东西了。
If you asked me this a year ago, I would have said the terminal has like a three-month lifespan and then we're going to move on to the next thing.
你能看到我们正在做这方面的尝试,对吧?
And you can see us experimenting with this, right?
因为 Claude Code 最初是在终端里诞生的,但现在它也上了网页。
Because Claude Code started in a terminal, but now it's on the web.
你可以访问 claude.ai/code。
You can go to claude.ai/code.
它也在桌面应用里。
It's in the desktop app.
你知道的,我们已经在代码标签页里提供这个功能大约三个月、六个月了。
You know, we've had that for, you know, like three months or six months or something, just in the code tab.
它也出现在 iOS 和 Android 应用中,就在代码标签页里。
It's in the iOS and Android apps, just like in the code tab.
它也在 Slack 里。
It's in Slack.
它也在 GitHub 里。
It's in GitHub.
还有 VS Code 的扩展插件。
There's VS Code extensions.
还有 JetBrains 的扩展。
There's JetBrains extensions.
所以我们一直在尝试各种不同的形式,来看看下一个会是什么。
So we're just like, we're always experimenting with different form factors for this thing to figure out what's the next thing.
到目前为止,我对 CLI 的生命周期判断都错了,所以我可能不是预测这个的最佳人选。
I've been wrong so far about the lifespan of the CLI, so I'm probably not the person to forecast this.
那对于开发者工具的创始人,你有什么建议吗?
What about, like, your advice to DevTool founders?
比如,现在有人正在建立一个开发者工具公司。
Like, someone's building a DevTool company today.
他们应该只是为工程师和人类构建,还是应该更多地考虑 Claude 会怎么想、想要什么,从而为智能体构建?
Should they just, like, be building for engineers and humans, or should they be thinking more about what Claude's gonna think and want and build for the agent?
我会这样看待这个问题:思考模型想要做什么,然后想办法让它更容易实现。
The way I would frame it is think about the thing that the model wants to do and figure out how do you make that easier.
这一点我们在刚开始开发 Claude Code 时就意识到了,发现这个东西就是想要使用各种工具。
And that's something that we saw, you know, when I first started hacking on Claude Code: I realized this thing just wants to use tools.
它只是想与世界互动。
It just wants to interact with the world.
你该如何实现这一点?
And how do you enable that?
你不能做的就是把它关在盒子里。
Well, the way you don't do it is you put it in a box.
然后你告诉它:这是API,这是你与我交互的方式,这是你与世界交互的方式。
And you're like, here's the API, here's how you interact with me, and here's how you interact with the world.
正确的方法是观察它想使用哪些工具,了解它想做什么,然后像为你的用户那样去支持它。
The way you do it is you see what tools it wants to use, you see what it's trying to do, and you enable that the same way that you do for your users.
所以,如果你正在创办一个开发者工具初创公司,你应该思考:你希望为用户解决什么问题?
And so if you're building a DevTool startup, I would think about what is the problem you want to solve for the user?
当你用模型来解决这个问题时,模型想做什么?
And then when you apply the model to solving this problem, what is the thing the model wants to do?
那么,什么样的技术和产品方案能同时满足这两者的潜在需求?
And then what is the technical and product solution that serves the latent demand of both?
多年前,十多年前,你曾是 TypeScript 的重度用户,还写过一本关于它的书。
Back in the day, more than ten years ago, you were a very heavy user and you wrote a book about TypeScript.
对吧?
Right?
在TypeScript流行之前。
Before TypeScript was cool.
那时候大家都深陷在JavaScript中。
This is when everyone was deep in JavaScript.
那是2010年代初。
This is back in the early 2010s.
对吧?
Right?
是的。
Yeah.
差不多就是这样。
Something like that.
那是在TypeScript还没流行起来的时候,因为那时候它是一门非常奇怪的语言。
Before TypeScript was a thing, because back then it was a very weird language.
在JavaScript里,本来就不该做这么多与类型相关的事情。
You weren't supposed to do a lot of things with types in JavaScript.
而现在这成了正确的做法。
And now it's the right thing.
而且感觉终端里的Claude Code与早期的TypeScript有很多相似之处。
And it feels like Claude Code in the terminal has a lot of parallels with TypeScript at the beginning.
TypeScript做出了很多非常奇怪的语言设计。
TypeScript makes a lot of really weird language decisions.
所以如果你看一下类型系统,几乎任何东西都可以是字面量类型。
So if you look at the type system, pretty much anything can be a literal type, for example.
这非常奇怪,因为就连Haskell都没有这样做。
And this is super weird because even Haskell doesn't do this.
这太极端了。
It's like it's too extreme.
或者它有条件类型,我认为没有任何其他语言想过这一点。
Or it has conditional types, which I don't think any language thought of at all.
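For readers who have not met the two features mentioned here, a small TypeScript sketch of literal types and conditional types follows. The names `Direction`, `ElementOf`, `move`, and `summarize` are made up for this example.

```typescript
// Literal types: a specific value is itself a type.
type Direction = "north" | "south";

function move(direction: Direction): string {
  return `moving ${direction}`;
}

// Conditional type: pick a result type based on the shape of another type.
type ElementOf<T> = T extends (infer U)[] ? U : T;

type A = ElementOf<string[]>; // resolves to string
type B = ElementOf<number>; // resolves to number

// Narrowing: ordinary JavaScript-style typeof checks refine the union type.
function summarize(value: string | number[]): string {
  if (typeof value === "string") return `text:${value}`;
  return `list of ${value.length}`;
}

const firstTag: A = "release";
const itemCount: B = 3;
```

Calling `move("east")` would be rejected at compile time, which is the kind of check literal types add without asking anyone to change how they write JavaScript.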
它是非常强类型的。
It was very strongly typed.
是的。
Yeah.
非常强类型。
It was very strongly typed.
当Joe Pamer、Anders和早期团队在构建这个系统时,他们的思路是:我们有一些团队拥有庞大的无类型JavaScript代码库。
And the idea was, when Joe Pamer and Anders and the early team were building this thing, the way they built it is, okay, we have these teams with these big untyped JavaScript code bases.
我们必须把类型引入进去,但不可能要求工程师改变他们的编码方式。
We have to get types in there, but we're not going to get engineers to change the way that they code.
你不可能让JavaScript开发者像Java程序员那样使用15层的类继承。
You're not going to get JavaScript people to have 15 layers of class inheritance like you would a Java programmer.
对。
Right.
他们会按照自己习惯的方式写代码。
They're going to write code the way they're going to write it.
他们会使用反射,会使用变异,会使用所有那些传统上非常难以类型化的特性。
They're going to use reflection, and they're going to use mutation, and they're going to use all these features that traditionally are very, very difficult to type.
对任何坚定的函数式程序员来说,这些特性在类型上都非常不安全。
They're very type-unsafe to any strong functional programmer, really.
没错。
That's right.
没错。
That's right.
没错。
That's right.
因此,他们没有试图让人改变编码方式,而是围绕这一点构建了一个类型系统。
And so the thing that they did, instead of getting people to kind of change the way that they code, they built a type system around this.
这真是太巧妙了,因为当时根本没人想到这些想法。
And it was just brilliant because there's all these ideas that no one was thinking about.
甚至在学术界,也没人想过这么多点子。
Even in academia, like no one thought of a bunch of these ideas.
这些完全源于对人们的观察,看到了JavaScript程序员希望如何编写代码。
It purely came out of the practice of observing people and seeing how JavaScript programmers want to write code.
因此对于Claude Code,有一些类似的想法,你可以像使用 Unix 工具一样使用它。
And so for Claude Code, there are some ideas that are kind of similar, in that you can use it like a Unix utility.
你可以将数据管道输入到它里面。
You can pipe into it.
你也可以将数据从它里面管道输出。
You can pipe out of it.
在某些方面,它在这方面是严谨的,但在几乎所有其他方面,它只是我们想要的工具。
In some ways, it is rigorous in this way, but in almost every other way, it's just the tool that we wanted.
我为自己开发了一个工具,然后团队为自己开发了这个工具,接着为 Anthropic 的员工开发,最后为用户开发。
I built a tool for myself, and then the team built the tool for themselves, and then for Anthropic employees, and then for users.
它最终变得非常有用。
It just ends up being really useful.
它并不是一个基于原则或学术性的东西。
It's not this like principled and academic thing.
我认为,如今十五年多过去了,真正的证明就在于结果。
Which I think the proof is actually in the results now, fast forward more than fifteen years later.
很少有代码库使用Haskell,因为Haskell更偏向学术性。
Not many code bases are in Haskell, which is more academic.
而现在有大量代码库使用TypeScript,因为它更实用。
And there's tons of them now in TypeScript because it's way more practical.
没错。
Right.
这很有趣。
Which is interesting.
是的。
Yeah.
确实很有趣。
It is interesting.
对吧?
Right?
TypeScript解决了这个问题。
It's like TypeScript solves the problem.
我想有一件事很酷,我不知道有多少人知道,但这个终端其实是市面上最漂亮的终端应用之一,而且它实际上是用 React Terminal 写的。
I guess one thing that's cool, I don't know how many people know, but the terminal is actually one of the most beautiful terminal apps out there and is actually written with React terminal.
当我刚开始构建它的时候,你知道,我曾经做过一段时间的前端工程。
When I first started building it, you know, I did front-end engineering for a while.
所以,我也算是一个混合型的人。
So, you know, I'm sort of a hybrid.
比如,我会做设计、用户研究,还会写代码,所有这些我都做。
Like, I do design and user research and write code and all this stuff.
我们特别喜欢招这样的工程师。
We love hiring engineers that are like this.
我们喜欢全才。
We love generalists.
对我来说,就是我正在为终端构建一个东西。
For me it's like, okay, I'm building a thing for the terminal.
其实我是个很烂的 VIM 用户。
I'm actually kind of a shitty Vim user.
我该如何为像我这样要在终端中工作的人打造一个产品?
How do I build a thing for people like me who are going to be working in a terminal?
我觉得愉悦感真的非常重要。
And I think just the delight is so important.
我觉得在YC,你们经常谈论这一点,对吧?
I feel like at YC this is something you talk about a lot, right?
就是要打造一个让人喜爱的产品。
It's like build a thing that people love.
如果产品虽然有用,但你并不爱上它,那就不够好。
If the product is useful but you don't fall in love with it, that's not great.
所以它必须同时做到这两点。
So it kind of has to do both.
为终端设计,说实话,一直很难。
Designing for the terminal honestly has been hard.
对吧?
Right?
大概是80乘以100个字符之类的。
It's like 80 by 100 characters or whatever.
你只有256种颜色。
You have, like, 256 colors.
你只有一种字体大小。
You have one font size.
你不能使用鼠标交互。
You don't have like mouse interactions.
有这么多事情你做不到,而且还有很多艰难的权衡。
There's all this stuff you can't do, and there's all these very hard trade offs.
所以一个鲜为人知的技巧是,你实际上可以在终端中启用鼠标交互。
So like a little known thing for example is you can actually enable mouse interactions in a terminal.
所以你可以启用点击之类的操作。
So you can enable clicking and stuff.
哦,在Claude Code里怎么做到这一点?
Oh, how do you do that in Claude Code?
我一直在想办法弄清楚怎么实现这个功能。
I've been trying to figure out how to do this.
我们在 Claude Code 中没有实现这个功能,因为我们曾经多次做过原型,但体验非常差,因为代价是必须虚拟化滚动。
We don't have it in Claude Code because we actually prototyped it a few times, and it felt really bad, because the trade-off is you have to virtualize scrolling.
而终端的工作方式是没有 DOM。
And so there's all these weird trade offs because the way terminals work is there's no DOM.
对吧?
Right?
只有 ANSI 转义码和这些自 1960 年代以来逐渐演变出来的奇怪规范。
There's ANSI escape codes and these weird, organically evolved specs since the 1960s or whatever.
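To make the escape-code point concrete, here is a small TypeScript sketch of how terminal mouse reporting works using the widely supported xterm control sequences (mode 1000 for button tracking and mode 1006 for SGR-style coordinates). The helper `parseSgrMouse` and the type `TerminalMouseEvent` are illustrative names, and this is not how Claude Code handles input; as noted above, it does not enable mouse interactions.

```typescript
// xterm-style control sequences: enable button tracking plus SGR extended coordinates.
const ENABLE_MOUSE = "\x1b[?1000h\x1b[?1006h";
const DISABLE_MOUSE = "\x1b[?1006l\x1b[?1000l";

type TerminalMouseEvent = {
  button: number; // 0 = left, 1 = middle, 2 = right
  column: number; // 1-based
  row: number; // 1-based
  pressed: boolean; // true on press, false on release
};

// Parse one SGR mouse report of the form ESC [ < b ; x ; y M (press) or m (release).
function parseSgrMouse(input: string): TerminalMouseEvent | null {
  const match = /^\x1b\[<(\d+);(\d+);(\d+)([Mm])$/.exec(input);
  if (!match) return null;
  return {
    // The low two bits carry the button; higher bits carry modifiers, motion, and wheel flags.
    button: Number(match[1]) & 3,
    column: Number(match[2]),
    row: Number(match[3]),
    pressed: match[4] === "M",
  };
}

const click = parseSgrMouse("\x1b[<0;12;5M");
```

A program would write `ENABLE_MOUSE` to the terminal on startup and `DISABLE_MOUSE` on exit. Once mouse tracking is on, the terminal sends clicks to the application instead of handling selection and scrolling itself, which is consistent with the trade-off described above.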
是的。
Yeah.
这感觉像BBS。
It feels like BBSs.
就像一个BBS门游戏。
It's like a BBS door game.
是的。
Yeah.
是的。
Yeah.
是的。
Yeah.
天啊。
Oh my gosh.
这简直是极大的赞美。
That's like that's like a great compliment.
是的。
Yeah.
是的。
Yeah.
让你感觉像是重新发现了《Legend of the Red Dragon》。
It feels like you're discovering Legend of the Red Dragon.
太棒了。
Fantastic.
天哪。
Oh my god.
是的。
Yeah.
但我们不得不自己去发现所有这些构建终端应用的用户体验原则,因为根本没人写过这些东西。
But we've had to just discover all these UX principles for building for the terminal, because no one really writes about this stuff.
如果你看看八九十年代或者2000年代的大型终端应用,它们都用像 ncurses 这样的工具,有很多窗口之类的东西,但以现代标准来看,它们的界面显得很粗糙。
And if you look at the big terminal apps of, you know, the eighties or nineties or 2000s or whatever, they use ncurses, and they have all these windows and things like this, and it just looks kind of janky by modern standards.
它看起来太沉重和复杂了。
It just looks too heavy and complicated.
所以我们不得不重新设计很多东西。
And so we had to reinvent a lot.
比如终端旋转器,就是那些旋转的文字,到现在已经迭代了大概五六十次,甚至可能一百次。
And for example, something like the terminal spinner, just like the spinner words, it's gone through probably, I want to say like 50, maybe 100 iterations at this point.
其中大概80%都没有发布。
Probably 80% of those didn't ship.
我们试了,感觉不好,就换下一个。
So we tried it, it didn't feel good, move on to the next one.
试了,感觉不好,就换下一个。
Try it, didn't feel good, move on to the next one.
这其实是Claude Code的一个了不起之处,对吧?
And this is sort of one of the amazing things about Claude Code, right?
你可以写出这些原型,然后连续做二十个原型,看看哪个你喜欢,然后直接发布,整个过程可能只需要几个小时。
It's like you can write these prototypes and you can just do like 20 prototypes back to back, see which one you like and then ship that, and the whole thing takes maybe a couple hours.
而过去,你必须学习使用Origami或Framer之类的工具。
Whereas in the past, what you would have had to do is learn to use Origami or Framer or something like this.
大概做出三个原型。
Built maybe three prototypes.
花了两周时间。
Took two weeks.
整个过程要长得多。
It just took much, much longer.
所以我们现在有这种优势,必须去发现这个新东西。
And so we have this luxury of we have to discover this new thing.
我们必须构建一个东西。
We have to build a thing.
我们不知道最终的正确形态是什么,但我们可以如此快速地迭代,这使得它变得非常容易,也让我们能够打造出令人愉悦、人们喜欢使用的产品。
We don't know what the right endpoint is, but we can iterate there so quickly, and that's what makes it really easy, and that's what lets us build a product that's joyous and that people like to use.
鲍里斯,你对开发者还有其他建议,但我们一直打断你,因为问题太多了。
Boris, you had other advice for builders, and we kept interrupting you because we have so many questions.
我会说,好吧,可能有两条有点奇怪的建议,因为它们是关于为模型而构建的。
I would say, so okay, so maybe two pieces of advice that are kind of weird, because it's like about building for the model.
第一条是:不要为今天的模型构建,而要为六个月后的模型构建。
So one is, don't build for the model of today, build for the model of six months from now.
这有点奇怪,对吧?
This is, like, sort of weird.
对?
Right?
因为,如果你的产品根本无法运行,你就找不到产品市场契合点,但事实上,你应该这么做,否则你会花大量精力找到当前产品的市场契合点,然后很快就会被别人超越,因为他们正在为下一个模型构建,而新模型每隔几个月就会出现。
Because, like, you can't find PMF if the product doesn't work, but actually this is the thing that you should do because otherwise what will happen is you spend a bunch of work, you find PMF for the product right now, and then you're just gonna get leapfrogged by someone else, because they're building for the next model and a new model comes out every few months.
使用当前模型,探索它的能力边界,然后为六个月后可能存在的模型进行构建。
Use the model, fill out the boundary of what it can do, and then build for the model that you think will be the model maybe six months from now.
我认为第二点是,在我们Claude Code团队办公的区域,墙上挂着一份装裱好的《苦涩的教训》。
I think the second thing is, actually, in the Claude Code area where we sit, we have a framed copy of The Bitter Lesson on the wall.
这是Rich Sutton的一篇博客文章,如果你还没读过,就应该读一读。
And this is this Rich Sutton blog post that everyone should read if they haven't.
核心理念是,更通用的模型总会胜过更具体的模型。
The idea is the more general model will always beat the more specific model.
对此有很多推论,但本质上就是:永远不要与模型作对。
And there are a lot of corollaries to this, but essentially what it boils down to is never bet against the model.
因此,这正是我们始终在思考的问题:我们可以在Claude Code中添加一个功能。
And so this is just a thing that we always think about, where we could build a feature into Claude Code.
我们可以让产品变得更好,我们称之为脚手架。
We could make it better as a product, and we call this scaffolding.
这些代码都不是模型本身。
That's all this code that's not the model itself.
但我们也完全可以等几个月,模型可能就能直接完成这件事了。
But we could also just wait like a couple months and the model can probably just do the thing instead.
所以这里始终存在一种权衡。
And there's always this trade-off.
对吧?
Right?
这现在更像是工程工作,你可以稍微扩展一下能力,比如在你试图拓展的某个领域中提升20%左右,就像那个蜘蛛图一样。
You can do the engineering work now and extend the capability a little bit, maybe 20% or whatever, in whatever domain on, you know, the spider chart of what you're trying to extend.
或者你可以直接等一等,下一个模型就能做到。
Or you can just wait and the next model will do it.
所以一定要始终从这个权衡的角度来思考。
So just always think in terms of this trade off.
你到底想在哪里投入资源?
Where do you actually wanna invest?
并且要假设,无论什么脚手架,都只是技术债务。
And assume that whatever the scaffolding is, it's just tech debt.
你们多久重写一次Claude Code的代码库?
How often do you rewrite the code base of Claude Code?
是每六个月一次吗?
Is it every six months?
有没有哪些脚手架因为模型进步了、不再需要,而被你们删掉了?
Is there scaffolding that you've deleted because you don't need it anymore, because the model just improved?
太多了。
Oh, so much.
是的。
Yeah.
整个Claude Code都被一遍又一遍地编写和重写。
Like, all of Claude Code has just been written and rewritten and rewritten and rewritten over and over and over.
我们每隔几周就会下线一些工具。
We unship tools every couple weeks.
我们每隔几周就会新增一些工具。
We add new tools every couple weeks.
Claude Code中没有任何一部分是六个月前就存在的。
There's no part of Claude Code that was around six months ago.
它一直在不断地被重写。
It's just constantly rewritten.
所以你是说,当前Claude Code的代码库中,大部分(比如80%)都只有不到几个月的历史?
So would you say that most of the code base for the current Claude Code, say 80% of it, is less than a couple of months old?
是的
Yeah.
当然
Definitely.
它甚至可能更少,没错。
It might it might even be like less than yeah.
也许只有几个月。
Maybe like a couple months.
这感觉差不多对。
That that feels about right.
如今代码的生命周期就是这样,这又是一条内部经验:对最优秀的创始人来说,要预期代码的保质期只有几个月。
So that's the life cycle of code now. That's another piece of alpha: for the best founders, expect the shelf life to be just a couple of months.
你看到史蒂夫·耶吉关于在Anthropic工作有多棒的帖子了吗?
Did you see Steve Yegge's post about how awesome working at Anthropic is?
而且里面有一句话说,Anthropic 的工程师目前的生产力平均是谷歌工程师在巅峰时期的 1000 倍,这真是一个惊人的数字。
And I think there's a line in there that says that an Anthropic engineer currently averages 1,000x more productivity than a Google engineer at Google's peak, which is really an insane number.
老实说,一千倍。
Honestly, like, thousand x.
你知道吗,三年前我们还在谈论十倍工程师。
Like, you know, three years ago, we were still talking about 10x engineers.
现在我们却在说,比处于巅峰时期的谷歌工程师还要高出一千倍?
Now we're talking about a thousand x on top of a Google engineer in their prime?
这简直难以置信,说实话。
Like, it's unbelievable, honestly.
是的。
Yeah.
我的意思是,从内部来看,如果你看技术员工,他们每天都在使用 Claude Code。
I mean, internally, if you look at, like, technical employees, they all use Claude Code every day.
甚至非技术人员,我觉得有一半的销售团队都在使用 Claude Code。
Even non-technical employees, I think half the sales team uses Claude Code.
他们已经开始转向 Cowork,因为它更容易使用,而且有虚拟机,更安全一些。
They have started switching to Cowork because it's a little easier to use, it has VMs, it's a little bit safer.
不过,我们刚刚统计了数据,我觉得团队去年规模翻了一倍,但每位工程师的生产力提升了大约70%。
But yeah, we actually just pulled the stat, and I think the team doubled in size last year, but productivity per engineer grew something like 70%.
以什么为衡量标准?
As measured by?
就是最简单、最笨拙的指标:拉取请求。
Just like the simplest, stupidest measure, pull requests.
但我们还交叉核对了提交记录、提交的生命周期之类的其他数据。
But we also kind of cross checked that against commits and the lifetime of commits and things like this.
自从 Claude Code 推出以来,Anthropic每位工程师的生产力提升了150%。
And since Claude Code came out, productivity per engineer at Anthropic has grown 150%.
天哪。
Oh my God.
这太疯狂了,因为在我以前的工作中,我负责Meta的代码质量。
And this is crazy because in my old life I was responsible for code quality at Meta.
我负责Facebook、Instagram、WhatsApp等所有产品线的代码质量。
And I was responsible for the quality of all of our code bases across every product, across Facebook, Instagram, WhatsApp, whatever.
团队当时的一项工作就是提升开发效率。
And one of the things that the team worked on was improving productivity.
那时候,如果能提升2%的效率,那都是数百人花上一年才做到的成果。
And back then, seeing a gain of something like 2% in productivity, that was like a year of work by hundreds of people.
所以这种100%的提升,简直是闻所未闻,完全不可思议。
And so this, like, a 100%, this is just like unheard of, just completely unheard of.
是什么吸引你来到Anthropic?
What drew you to come over to Anthropic?
我的意思是,作为一位构建者,你哪里都能去。
I mean, basically, as a builder, you could go anywhere.
是什么样的时刻让你觉得,就是这群人,就是这种做法?
What was the moment that made you say, like, actually, this is the set of people or this is the approach?
我当时住在日本乡村,每天早上都会打开Hacker News阅读新闻。
I was living in rural Japan, and I was opening up Hacker News every morning, and I was reading the news.
然后某一天,它就开始变成人工智能相关的东西了。
And it just started to be like AI stuff at some point.
于是我开始使用一些早期的产品。
And I started to use some of these early products.
我记得第一次使用时,简直让我瞠目结舌。
I remember the first couple of times that I used it, it just took my breath away.
这么说听起来有点俗气。
That was very cheesy to say.
但那确实是我的真实感受。
But that was actually the feeling.
太惊人了。
It was amazing.
作为一位构建者,我从未在使用这些非常早期的产品时有过这种感觉。
As a builder, I've just never felt this feeling using these very, very early products.
那大概是 Claude 2 那个时期吧。
That was, like, in the Claude 2 days or something like that.
所以我开始跟实验室里的朋友们聊天,想看看都在发生什么。
And so I just started talking to friends at labs just to kind of see what was going on.
我遇到了本·曼,他是Anthropic的联合创始人之一,他立刻就让我信服了。
And I met Ben Mann, who's one of the founders at Anthropic, and he just immediately won me over.
当我见到Anthropic的其他团队成员时,我也立刻被他们打动了。
And as soon as I met kind of the rest of the team at Ant, it just won me over.
我想,可能有两个方面的原因。
And I think probably in two ways.
一方面,它是一个研究实验室。
So one is it operates as a research lab.
所以产品工作非常非常微小。
So the product work was teeny, teeny, tiny.
它的核心完全是打造一个安全的模型。
It's really all about building a safe model.
这才是唯一重要的事。
That's all that matters.
因此,能够如此贴近模型、贴近开发,而不再把产品放在最重要的位置,这种理念让我非常认同,因为产品已经不再是最重要的了。
And so this idea of just being very close to the model and being very close to development and being not the most important thing because the product isn't anymore.
最重要的就是模型本身。
It's just the model is the thing that's the most important.
在多年从事产品工作之后,这一点深深打动了我。
That really resonated with me after building product for many years.
第二点则是它强烈的使命感。
And then the second thing was just how mission driven it is.
我是个狂热的科幻小说读者,我的书架上摆满了科幻作品。
Like, I'm a huge sci-fi reader, my bookshelf is just filled with sci-fi.
所以我深知事情可能会有多糟糕。
And so I just know how bad this can go.
当我想到今年将会发生什么时,那一定会完全疯狂。
And when I kind of think about what is going to happen this year, it is going to be totally insane.
而在最坏的情况下,情况可能会变得极其糟糕。
And in the worst case, it can go very, very bad.
所以我只是想在一个真正理解并深深认同这一点的地方工作。
And so I just wanted to be at a place that really understood that and kind of really internalized that.
在 Anthropic,如果你在食堂或走廊里无意中听到人们的对话,他们谈论的都是人工智能安全。
And at Ant, if you overhear conversations in the lunch room or in the hallway, people are talking about AI safety.
这才是每个人最关心的事情,远超其他一切。
This is really the thing that everyone cares about more than anything.
所以我只是想待在这样一个地方。
And so I just wanted to be in a place like that.
对我来说,使命至关重要。
I know for me personally, mission is just so important.
今年会发生什么?
What is going to happen this year?
好的。
Okay.
如果你回想六个月前,人们当时做出了哪些预测?
So if you think back like six months ago, and kind of what are the predictions that people are making?
所以达里奥预测,Anthropic公司90%的代码将由 Claude 编写。
So Dario predicted that 90% of the code at Anthropic would be written by Claude.
这是真的。
This is true.
就我个人而言,自从Opus 4.5以来,我的代码已经是100%由它生成了。
For me personally, it's been 100% since, like, Opus 4.5.
我卸载了我的IDE。
I uninstalled my IDE.
我再也不手动编写任何一行代码了。
I don't edit a single line of code by hand.
完全是 Claude Code 加 Opus。
It's just 100% Claude Code and Opus.
我每天都会合并20个拉取请求。
And I land 20 PRs a day every day.
如果你看Anthropic整体情况,不同团队的代码生成比例在70%到90%之间浮动。
If you look at Anthropic overall, it ranges between 70 to 90% depending on the team.
很多团队也是100%。
For a lot of teams it's also 100%.
对很多人来说,这已经是100%了。
For a lot of people it's 100%.
我记得在五月我们正式发布 Claude Code 时就做过这个预测,说以后写代码根本不需要IDE了。
And I remember making this prediction back in May when we GA'd Claude Code that you wouldn't need an IDE to code anymore.
当时说这话简直太疯狂了。
And it was totally crazy to say.
我觉得台下的观众都倒吸一口凉气,因为那时候这个预测听起来太荒谬了。
I feel like people in the audience gasped because it was such a silly prediction at the time.
但其实这只不过是追踪一下指数增长趋势而已。
But really all it is is you just trace the exponential.
因为我们的三位创始人都是扩展定律论文的合著者,所以这种理念已经深深融入了公司的DNA。
This is just so deep in the DNA at Ant because three of our founders were coauthors of the scaling laws paper.
他们很早就看到了这一点。
They saw this very early.
所以这就像追踪指数增长,这就是即将发生的事,而确实,它发生了。
So this is just like tracing the exponential, this is what's going to happen, and yes, that happened.
继续追踪指数增长,我认为接下来编程问题将对所有人而言基本得到解决。
So continuing to trace the exponential, I think what will happen is coding will be generally solved for everyone.
我认为如今编程对我而言已经基本解决了,我相信对每个人也都会如此,无论你身处哪个领域。
And I think today coding is practically solved for me, and I think it'll be the case for everyone, you know, regardless of domain.
我认为我们将开始看到‘软件工程师’这个头衔逐渐消失。
I think we're gonna start to see the title software engineer go away.
也许未来只会保留‘构建者’或‘产品经理’这样的称谓。
And I think it's just gonna be maybe builder, maybe product manager.
也许我们会保留‘软件工程师’这个头衔作为某种残留的痕迹,但人们实际做的工作将不再仅仅是写代码。
Maybe we'll keep the title as kind of a vestigial thing, but the work that people do, it's not just gonna be coding.
软件工程师还将撰写规格文档。
Software engineers are also gonna be writing specs.
他们还将与用户沟通。
They're gonna be talking to users.
就像我们团队现在开始出现的情况一样,工程师们都是全才,我们团队的每个成员都会写代码,我们的产品经理写代码,设计师写代码,工程经理写代码,连我们的财务人员也写代码,我们团队里的每个人都写代码。
Like this thing that we're starting to see right now on our team where engineers are very much generalists, and every single function on our team codes, like our PMs code, our designers code, our EM codes, our finance guy codes, everyone on our team codes.
我们会看到这种现象在各个地方都开始出现。
We're going to start to see this everywhere.
如果我们继续这个趋势,这就是最低限度的情况。
This is the lower bound if we just continue the trend.
而上限,我认为要可怕得多。
The upper bound, I think, is a lot scarier.
这就像我们达到了ASL四。
And this is something like, you know, we hit ASL four.
在Anthropic,我们讨论过这些安全级别。
And this you know, at Anthropic, we talked about these safety levels.
ASL三就是目前模型所处的水平。
ASL three is where the models are right now.
ASL四则是模型能够递归地自我改进。
ASL four is the model is recursively self improving.
如果这种情况发生,我们就必须满足一系列条件才能发布模型。
And so if this happens, essentially, we have to meet a bunch of criteria before we can release the model.
极端情况下,这种情况会发生,或者出现某种灾难性的滥用。
And so the extreme is that, you know, this happens, or there is some kind of catastrophic misuse.
比如有人用这个模型来设计生物病毒、寻找零日漏洞,类似这样的事情。
Like people are using the model to design bio viruses, design zero days, stuff like this.
我们正在非常积极地努力防止这种情况发生。
And this is something that we're really, really actively working on, so that doesn't happen.
老实说,看到人们如何使用 Claude Code,这真的让人感到兴奋又谦卑。
I think, honestly, it's just been so exciting and humbling, like, seeing how people are using Claude Code.
我只是想做一个酷炫的东西,结果却变得非常有用。
Like, you know, I just wanted to build a cool thing, and it ended up being really useful.
这让我感到既惊讶又兴奋。
And that was so surprising and so exciting.
我从Twitter或外界的观感是,大家在假期期间都离开了,然后突然发现了Claude Code,从此就一发不可收拾了。
My impression from Twitter or just the outside is basically everyone went away over the holidays and then, like, found out about Claude Code, and it's just been crazy ever since.
但对你来说,当时是不是这样?你正享受着一个愉快的圣诞节假期,然后回来后,发生了什么?
But is that how it was for you at the time? Were you having, like, a nice Christmas break and then came back, and, like, what happened?
实际上,整个十二月我都在旅行,还度过了一个编程假期。
Well, actually, for all of December, I was traveling around, and I took a coding vacation.
所以我们一路旅行,我每天都只是在写代码。
So we were kinda traveling around, and I was just, like, coding every day.
那真的很好。
So that was really nice.
然后我也开始使用推特,因为我很早以前参与过 Threads 的开发,所以我用 Threads 已经有一段时间了。
And then I also started to use Twitter at the time, because, like, I worked on Threads back then, way back when, so I've been a Threads user for a while.
所以我只是想看看其他人活跃在哪些其他平台上。
So I just like tried to see kind of like other platforms where people are.
是的。
Yeah.
我觉得对很多人来说,他们正是在那时发现了 Opus 4.5,而这一点我早就知道了。
I think for a lot of people, that was the moment where they discovered Opus 4.5, which I kind of already knew.