AI & I - 揭秘Claude代码:来自构建工程师的深度解析 封面

揭秘Claude代码:来自构建工程师的深度解析

Inside Claude Code From the Engineers Who Built It

本集简介

在Every团队中,成员们将工作方式的革新归功于Claude Code。 如今他们能向几乎陌生的代码库提交代码,每个新功能都让后续开发更轻松,甚至非技术同事也能自信地使用终端。 为探寻这一转变,AI&I主持人Dan Shipper邀请Claude Code的创造者——Anthropic AI的Cat Wu(@_catwu)和Boris Cherny(@bcherny)——分享他们打造这款全球最受喜爱的AI工程工具的经验。 无论是否具备技术背景,这期节目都是想要像开发者一样掌握Claude Code的必看内容。 若喜欢本期节目,请点赞、订阅、留言并分享。 想获取更多? 立即注册Every领取《ChatGPT提示词终极指南》:https://every.ck.page/ultimate-guide-to-prompting-chatgpt。该资源通常仅限付费订阅用户,但您可在此免费领取。 关注Dan Shipper更多内容: 订阅Every:https://every.to/subscribe 关注他的X账号:https://twitter.com/danshipper 在[ai.studio/build](http://ai.studio/build)构建您的首款AI应用。 时间轴: 00:00:00 - 开始 00:01:26 - 介绍 00:02:25 - Claude Code诞生故事 00:07:03 - Anthropic如何内部使用Claude Code 00:14:06 - Boris和Cat最爱的斜杠命令 00:15:49 - Boris如何用Claude Code规划功能开发 00:21:53 - Anthropic关于高效使用子代理的全部经验 00:26:16 - 用Claude Code将历史代码转化为优势 00:33:14 - 打造简洁强大代理的产品决策 00:36:38 - 让非技术用户也能使用Claude Code 00:45:12 - AI编程的下一代形态 节目中提到的资源链接: - Cat Wu:https://x.com/_catwu - Boris Cherny:https://x.com/bcherny - Claude Code:https://www.claude.com/product/claude-code

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

让它真正有效的原因是,QuadCode 能够访问工程师在终端上做的所有事情。

What made it work really well is that QuadCode has access to everything that an engineer does at the terminal.

Speaker 0

你能做的任何事情,QuadCode 都能做到。

Everything you can do, QuadCode can do.

Speaker 0

中间没有任何隔阂。

There's nothing in between.

Speaker 1

实际上,Anthropic 内部使用大量积分、每月花费超过一千美元的人数正在不断增加。

There's actually an increasing number of people internally at Anthropic that are using, like, a lot of credits, like spending, like, over a thousand bucks every month.

Speaker 1

我们看到了这种高级用户行为。

We see this, like, power user behavior.

Speaker 1

这是 YC 教授的内容。

This is something that they teach in YC.

Speaker 1

如果你能解决自己的问题,那么你更有可能也在解决别人的问题。

If you can solve your own problem, it's much more likely you're solving the problem for others.

Speaker 1

产品领域有一个非常古老的概念,叫做潜在需求。

There's this, like, really old idea in product called latent demand.

Speaker 1

你以一种可被篡改的方式构建产品,使其足够开放,让人们能将其用于原本未设计的其他用途,然后你再针对这些用途进行优化,因为你清楚地知道这些需求是真实存在的。

You build a product in a way that is hackable, that is kinda open ended enough that people can abuse it for other use cases it wasn't really designed for, then you build for that because you kind of know there was demand for it.

Speaker 2

你觉得命令行界面是最终的产品形态吗?

You think the CLI is the final form factor?

Speaker 2

一年或三年后,我们主要还是会通过命令行界面使用 Cloud Code 吗?还是会有更好的替代方案?

Are we gonna be using Cloud Code in the CLI primarily in a year or in three years, or is there something else that's better?

Speaker 1

本播客由谷歌赞助。

This podcast is sponsored by Google.

Speaker 1

大家好。

Hey, folks.

Speaker 1

我是奥马尔,谷歌深脑的产品与设计负责人。

I'm Omar, product and design lead at Google DeepMind.

Speaker 1

我们刚刚在 AI Studio 推出了升级版的编程体验,让你能够自由组合 AI 功能,更快地将你的想法变为现实。

We just launched a revamped vibe coding experience in AI Studio that lets you mix and match AI capabilities to turn your ideas into reality faster than ever.

Speaker 1

只需描述你的应用,Gemini 就会自动为你连接合适的模型和 API。

Just describe your app, and Gemini will automatically wire up the right models and APIs for you.

Speaker 1

如果你需要一点灵感,点击‘我有点幸运’,我们会帮你快速上手。

And if you need a spark, hit I'm feeling lucky, and we'll help you get started.

Speaker 1

前往 ai.studio/build 创建你的第一个应用。

Head to ai.studio/build to create your first app.

Speaker 2

帕特、博里斯,非常感谢你们的到来。

Pat, Boris, thank you so much for being here.

Speaker 0

谢谢你们邀请我们。

Thanks for having us.

Speaker 2

是的。

Yeah.

Speaker 2

对于还不了解你们的人,你们是 Claude Code 的创作者。

So for people who don't know you, you are the creators of Claude Code.

Speaker 2

我由衷地衷心感谢你们。

Thank you very much from the bottom of my heart.

Speaker 2

我非常喜欢 Claude Code。

It's I love Claude Code.

Speaker 2

听到这个真是太棒了。

That's amazing to hear.

Speaker 2

这正是我们想听到的。

That's what we love to hear.

Speaker 2

好的。

Okay.

Speaker 2

我想从我第一次使用它的时候说起,那时有一个瞬间。

I think the place I want to start is when I first used it, there was like this moment.

Speaker 2

我想大概是Sonnet 3.7发布的时候,我用了之后心想,天啊,这完全是一种全新的范式。

Like, I think it was around when Sonnet 3.7 came out where I was like, I used it and I was like, holy shit, this is like a completely new paradigm.

Speaker 2

这是一种完全不同的编程思维方式。

It's a completely new way of thinking about code.

Speaker 2

最大的不同在于,你们彻底取消了文本编辑器,你只需要和终端对话就够了。

And the big difference was you went all the way and just eliminated the text editor, and you're just like all you do is like talk the terminal and that's it.

Speaker 2

以前的AI编程范式,比如之前的工具,都是你有一个文本编辑器,AI在旁边,有点像自动补全。

You know previous paradigms of AI programming, previous harnesses have been like you have a text editor and you have the AI on the side and it's kind of like or it's a tab complete.

Speaker 2

所以给我讲讲这个决策过程,你是如何设计出这种新范式的?

So take me through that decision process, that process of architecting this new paradigm.

Speaker 2

你是怎么思考这个问题的?

How do you think about that?

Speaker 1

是的,我认为最重要的是,这完全不是有意为之的。

Yeah, I think the most important thing is it was not intentional at all.

Speaker 1

我们只是偶然得到了这个结果。

We sort of ended up with it.

Speaker 1

当我加入Anthropic时,我们当时还在不同的团队。

So at the time when I joined Anthropic, we were still on different teams at the time.

Speaker 1

当时有一个QuadCode的前身。

There was this previous predecessor to QuadCode.

Speaker 1

它叫Clyde,就是C-I-D,是一个研究项目。

It was called Clyde, like C I D And it was this research project.

Speaker 1

它启动要花一分钟。

It took like a minute to start up.

Speaker 1

它当时是一个相当笨重的Python项目。

It was this kind of really heavy Python thing.

Speaker 1

它需要运行一大堆索引之类的操作。

It had to run a bunch of indexing and stuff.

Speaker 1

当我加入时,我想提交我的第一个PR。

And when I joined, I wanted to ship my first PR.

Speaker 1

而我当时像个新手一样手写代码,根本不知道有这些工具

And I hand wrote like a noob in I the didn't know about any of these tools

Speaker 2

比如,谢谢

like Thank

Speaker 1

你在播客上承认了这一点。

you for admitting that on the podcast.

Speaker 1

我当时也不懂更好的方法。

I didn't know any better.

Speaker 1

然后我提交了这个PR,亚当·沃尔夫曾是我们团队的工程经理一段时间,他是我的入职伙伴,他直接拒绝了那个PR,他说:'你这是手写的吗?你在干什么?'

And then I put up this PR and Adam Wolf, who was the eng manager for our team for a while, he was my ramp up buddy and he just like rejected the PR and he was like you wrote this by hand, what are doing?

Speaker 1

用 Clyde。

Use Clyde.

Speaker 1

因为当时他也在大量使用 Clyde 进行开发。

Because he was also hacking a lot on Clyde at the time.

Speaker 1

于是我试了 Clyde,把任务描述输入进去,它一下子就生成了整个代码。

And so I tried Clyde, I gave it the description of the task and it just like one shotted this thing.

Speaker 1

那时候还是 Sonnet 3.5,所以即使对于这种基础任务,我还是得去修复一些问题。

And this was like, you know, Sonnet 3.5, so I still had to fix the thing even for this kind of basic task.

Speaker 1

而且测试框架太老旧了,跑一次要花五分钟,简直慢得离谱。

And the harness was super old, so it took like five minutes to turn this thing out and just took forever.

Speaker 1

但它确实运行成功了,我简直震惊,这居然真的能做到。

But it worked and I was just mind blown that this was even possible.

Speaker 1

这下子让我开始认真思考这些问题。

And they just kind of got the gears turning.

Speaker 1

也许你根本不需要 IDE。

Maybe you don't actually need an IDE.

Speaker 1

后来,我在使用 Anthropic API 进行原型开发时,最简单的方法就是在终端里构建一个小应用,这样就不必开发任何用户界面。

And then later on I was prototyping using the Anthropic API, and the easiest way to do that was just building a little app in the terminal because that way I didn't have to build a UI or anything.

Speaker 1

于是我开始简单地搭建一个聊天程序。

And I started just making a little chat up.

Speaker 1

然后我就想,也许我们可以做点类似 Clyde 的东西,不如我来构建一个简易版的 Clyde。

And then I just started thinking maybe we could do something a little bit like Clyde, so let me build like a little Clyde.

Speaker 1

结果发现,没花多少功夫,它反而变得比预期有用得多。

And it actually ended up being a lot more useful than that without a lot of work.

Speaker 1

对我来说,最大的顿悟是当我们开始给模型提供工具时,它们就开始主动使用这些工具了。

And I think the biggest revelation for me was when we started to give the model tools, they just started using tools.

Speaker 1

那真是一个令人震惊的时刻。

It was this insane moment.

Speaker 1

模型就是想要使用工具。

The model just wants to use tools.

Speaker 1

比如,我们给了它 Bash,它就开始用 Bash,还写 AppleScript 来根据问题自动化执行任务。

Like, we gave it Bash, it just started using Bash, writing AppleScript to automate stuff in response to questions.

Speaker 1

我当时就想,这真是太疯狂了。

And I was like, this is just the craziest thing.

Speaker 1

我从未见过这样的事情。

I've never seen anything like this.

Speaker 1

因为当时我只用过一些IDE,比如文本编辑、一行自动补全、多行自动补全之类的。

Because at the time I had only used IDEs with like, you know, like text editing, a little like one line autocomplete, multi line autocomplete, whatever.

Speaker 1

所以这个想法就是从这种转变中来的——既是原型开发,也是以一种非常粗略的方式看到可能性。

So that's where this came from, was this kind of conversion, like prototyping, but also kind of seeing what's possible in kind of like a very rough way.

Speaker 1

结果这个东西出乎意料地有用。

And this thing ended up being surprisingly useful.

Speaker 1

我想对我们来说也是同样的情况。

And I think it was the same for us.

Speaker 1

对我来说,那神奇的时刻出现在Sonnet 4和Opus 4的时候。

I think for me it was like kind of Sonnet four, Opus four, that's where that magic moment was.

Speaker 1

我当时就想,天啊,这东西真的能用。

It was like, Oh my god, this thing works.

Speaker 2

这很有趣。

That's interesting.

Speaker 2

跟我说说那个工具的时刻吧,因为我觉得这是Cloud Code的一个特别之处,它能直接写bash,而且非常擅长。

Tell me about the tool moment because I think that is one of the special things about Cloud Code is it just writes bash and it's really good at it.

Speaker 2

我认为,许多以前的代理架构,甚至今天任何构建代理的人,第一反应可能是:好吧,我们要给它一个查找文件的工具,然后给它一个打开文件的工具,为所有代理可能执行的操作都构建自定义封装,但Cloud Code直接用Bash,而且做得非常好。

And I think a lot of previous agent architectures or even anyone building agent today, your first instinct might be, okay, we're gonna give it a find file tool and then we're gonna give it a open file tool and you build all these custom wrappers for all the different actions you might want the agent to take, but Cloud Code just uses Bash and it's really good at it.

Speaker 2

那么,你从中学到了什么?

So how do you think about what you learned from that?

Speaker 1

是的,目前Cloud Code实际上已经拥有了很多工具。

Yeah, think we're at this point right now where Cloud Code actually has a bunch of tools.

Speaker 1

我觉得大概有一打左右吧。

I think it's like a dozen or something like this.

Speaker 1

我们几乎每周都会添加或移除工具,所以这个变化非常频繁。

We actually add and remove tools most weeks, so this changes pretty often.

Speaker 1

但今天确实有一个搜索工具,用于搜索。

But today there actually is a search, there's a tool for searching.

Speaker 1

我们这样做有两个原因。

And we do this for two reasons.

Speaker 1

一是用户体验,这样我们可以向用户更美观地展示结果,因为目前大多数任务仍然需要人工参与。

One is the UX, so we can show the result a little bit nicer to the user, because there's still a human in the loop right now for most tasks.

Speaker 1

第二个原因是权限管理。

And the second one is for permissions.

Speaker 1

所以,如果你在你的代码设置里指定。

So if you say in your like quad code like settings.

Speaker 1

对于这个JSON文件你无法读取,我们不得不强制实施这一点。

Json this file you cannot read, we have to kind of enforce this.

Speaker 1

我们在bash中强制执行,但如果我们有专门的搜索工具,效率可以更高一些。

We enforce it for bash but we can do it a little bit more efficiently for if we have a specific search tool.

Speaker 1

但我们肯定希望精简工具,保持模型使用的简洁性。

But definitely we want to unship tools and of keep it simple for the model.

Speaker 1

上周或两周前我们移除了LS工具,因为过去我们需要它,但后来我们实际上构建了一种方式来为Bash强制执行这类权限系统。

Last week or two weeks ago we unshipped the LS tool because in the past we needed it, but then we actually built a way to enforce this kind of permission system for Bash.

Speaker 1

所以在 Bash 中,如果我们知道你没有权限读取某个目录,Quad 就不能访问该目录。

So in Bash, if we know that you're not allowed to read a particular directory, Quad's not allowed to OS that directory.

Speaker 1

由于我们可以一致地执行这一规则,我们就不再需要这个工具了。

And because we can enforce that consistently, we don't need this tool anymore.

Speaker 1

这很好,因为对 Quad 来说选择更少了,上下文中的内容也更简洁。

And this is nice, because it's a little less choice for Quad, a little less stuff in context.

Speaker 2

明白了。

Got it.

Speaker 2

你们团队是如何分工的?

And how do you guys split responsibility on the team?

Speaker 0

我认为 Boris 设定了技术方向,并且是我们推出的许多功能的产品愿景主导者。

I would say Boris sets the technical direction and has been the product visionary for a lot of the features that we've come out with.

Speaker 0

我把自己看作一个支持角色,确保我们的定价和包装能引起用户的共鸣。

I see myself as more of a supporting role to make sure that, one, that our pricing and packaging resonates with our users.

Speaker 0

第二,确保我们所有功能都能顺利走过发布流程。

Two, making sure that we're shepherding all our features across the launch process.

Speaker 0

从决定哪些原型我们 definitely 要添加功能,到为我们的应用功能设定质量标准,再到向最终用户传达这些信息。

So from deciding, all right, these are the prototypes that we should definitely add food, to setting the quality threshold for our app fooding through to communicating that to our end users.

Speaker 0

我们目前正在推进一些新举措,我可以说,过去很多 Cloud Code 都是自下而上构建的,比如 Boris 和许多核心团队成员都提出了很棒的想法,比如待办事项、子代理、钩子,这些全都是自下而上的。

And there's definitely some new initiatives that we're working on that I would say historically a lot of Cloud Code has been built bottoms up, like Boris and a lot of the core team members have just had these great ideas for to do lists, sub agents, hooks, like all these are bottoms up.

Speaker 0

当我们考虑扩展到更多服务并将 Cloud Code 引入我们的平台时,我认为很多这类工作更应该是:先与客户沟通,让工程师参与这些对话,然后对这些服务进行优先级排序并逐一实现。

As we think about expanding to more services and bringing cloud code to our places, I think a lot of those are more like, all right, let's talk to customers, let's bring engineers into those conversations and prioritize those services and knock them out.

Speaker 2

明白了。

Got it.

Speaker 2

什么是 antfooding?

What is antfooding?

Speaker 0

哦,antfooding 是

Oh, antfooding is

Speaker 2

哦,antfooding?

Oh, antfooding?

Speaker 0

哦,这其实是 dogfooding。

Oh, it means dogfooding.

Speaker 2

安提克普蚂蚁。

Anthropic ants.

Speaker 2

是的。

Yeah.

Speaker 2

我明白了。

I got it.

Speaker 0

是的,我们给内部员工的昵称是‘蚂蚁’。

Yeah, our nickname for internal employees is ant.

Speaker 0

所以‘蚂蚁喂食’就是我们对‘自我喂食’的叫法。

And so ant fooding is our version of dog fooding.

Speaker 0

在内部,我认为超过70%到80%的蚂蚁——即技术型安提克普员工——每天都使用胶体代码。

Internally, over, I think, 70 or 80% of ants, technical anthropic employees use colloid code every day.

Speaker 0

因此,每当我们考虑新功能时,都会先推送给内部员工,从而获得大量反馈。

And so every time we are thinking about a new feature, we push it out to people internally and we get so much feedback.

Speaker 0

我们有一个反馈渠道。

We have a feedback channel.

Speaker 0

我觉得我们每五分钟就会收到一条帖子。

I think we get a post every five minutes.

Speaker 0

因此你能迅速得到反馈,知道大家是否喜欢它、是否有bug,或者是否不够好需要撤回。

And so you get really quick signal on whether people like it, whether it's buggy, or whether it's not good and we should unship it.

Speaker 2

你能看出来。

You can tell.

Speaker 2

你能看出那些在构建东西的人一直在用它,因为它的易用性在你试图构建东西时显得非常合理,而这只有在进行‘蚂蚁喂养’时才会发生。

You can tell that someone that is building stuff is using it all the time to build it, because the like, its ergonomics just makes sense if you're trying to build stuff and that that only happens if you're like ant fooding.

Speaker 2

是的,我认为这是一种非常有趣的构建新事物的模式,自下而上,我为自己做点东西。

Yeah, and I think that that's a really interesting paradigm for building new stuff, that sort of bottoms up, I make something for myself.

Speaker 2

跟我说说这个。

Tell me about that.

Speaker 1

是的,而且这也非常谦逊。

Yeah, and is also so humble.

Speaker 1

我认为卡特在产品方向上也扮演了非常重要的角色。

I think Kat has a really big role in the product direction also.

Speaker 1

这些想法来自团队中的每个人。

Comes from everyone on the team.

Speaker 1

具体例子,这确实来自团队中的每个人。

Specific examples, this actually came from everyone on the team.

Speaker 1

待办事项和子代理是Sid、Hooks和Dixon开发的,插件是Daisy开发的。

To do lists and sub agents, that was Sid, Hooks, Dixon shipped that, plugins, Daisy shipped that.

Speaker 1

所以团队中的每个人,这些想法都来自大家。

So everyone on the team, these ideas come from everyone.

Speaker 1

因此,对于我们来说,我们构建了这个核心代理循环和核心体验,然后团队中的每个人都会频繁使用这个产品,团队外的人也都在频繁使用这个产品。

And so I think for us, we build this core agent loop and this kind of core experience, and then everyone on the team uses the product all the time, and so everyone outside the team uses the product all the time.

Speaker 1

因此,有很多机会来构建满足这些需求的功能。

And so there's just all these chances to build things that serve these needs.

Speaker 1

比如批处理模式,你知道的,感叹号,你可以输入批处理命令。

Like for example, like batch mode, you know, like the exclamation mark and you can type in batch commands.

Speaker 1

几个月前,我用Quad Code的时候,来回切换两个终端,觉得挺烦的。

This was just like many months ago I was using quad code and I was going back and forth between two terminals and just thought it was kind of annoying.

Speaker 1

我一时兴起,就让Quad想些点子。

And just on a whim, I asked quad to think of ideas.

Speaker 1

我想到了这个感叹号的bash模式。

I thought of this exclamation mark bash mode.

Speaker 1

然后我就说,太好了,把它改成粉色,然后上线吧。

And then I was like, great, make it pink and then ship it.

Speaker 1

它真的就做到了。

It just did it.

Speaker 1

这种功能一直保留了下来。

That's the thing that still kind of persisted.

Speaker 1

现在你看到其他人也开始效仿了。

Now you see others also kind of catching on to that.

Speaker 2

这真有趣。

That's funny.

Speaker 2

我其实之前不知道这件事。

I actually didn't know that.

Speaker 2

这非常有用,因为我总是得打开一个新标签页来运行任何 bash 命令。

And that's extremely useful because I always have to open up a new tab to run any bash commands.

Speaker 2

所以你只要输入一个感叹号,它就会直接运行,而不需要经过所有的云服务流程。

So you just do an exclamation point, and then it just runs it directly instead of filtering it through all the Cloud stuff.

Speaker 0

而且 Cloud Code 也能看到完整的输出。

And Cloud Code sees the full output too.

Speaker 2

有意思,这太完美了。

Interesting, that's perfect.

Speaker 0

所以你在 Cloud Code 视图中看到的任何内容,Cloud Code 也能看到。

So anything you see in the Cloud Code view, Cloud Code also sees.

Speaker 2

好的,这真的很有趣。

Okay, that's really interesting.

Speaker 1

这正是我们正在考虑的一个用户体验问题。

And this is kind of a UX thing that we're thinking about.

Speaker 1

过去,工具是为工程师设计的,但现在则是工程师和模型各占一半。

In the past, tools were built for engineers, but now it's equal parts engineers and model.

Speaker 1

因此,作为工程师,你可以看到输出结果,但这对模型来说也非常有用。

And so like as an engineer you can see the output but it's actually quite useful for the model also.

Speaker 1

这也是我们理念的一部分——一切都是双重用途的。

And this is part of the philosophy also like everything is dual use.

Speaker 1

例如,模型也可以调用斜杠命令。

So for example the model can also call slash commands.

Speaker 1

比如,我有一个用于提交的斜杠命令,它会执行几个步骤,比如对比差异并生成合理的提交信息,诸如此类的操作。

So like you know I have a slash command for a slash commit where I run through a few different steps like diffing and generating a reasonable commit message and this kind of stuff.

Speaker 1

我可以手动运行它,但Quad也可以为我自动运行。

I run it manually, but also Quad can run this for me.

Speaker 1

这非常有用,因为我们能够共享这段逻辑,定义这个工具,然后双方都能使用它。

And this is pretty useful because we get to share this logic, get to define this tool, then we both get to use it.

Speaker 2

是的。

Yeah.

Speaker 2

设计双重用途的工具与设计仅由一方使用的工具,两者之间有什么区别?

What are the differences in designing tools that are dual use from designing tools that are used by one or the other?

Speaker 1

令人惊讶的是,到目前为止都是一样的。

Surprisingly, it's the same so far.

Speaker 1

是的。

Yeah.

Speaker 1

我总觉得,这种为人类设计的优雅方案,对模型来说也同样适用。

I sort of feel like this kind of elegant design for humans translates really well to the models.

Speaker 2

所以你只是在思考什么对你有意义,而如果对你有意义,模型通常也会觉得有意义。

So you're just thinking about what would make sense to you, and the model generally, it makes sense to the model too, if it makes sense to you.

Speaker 0

是的,我认为Cloud Code作为终端UI的一个非常酷的地方,也是它能如此成功的原因,是Cloud Code能够访问工程师在终端上能做的所有事情。

Yeah, I think one of the really cool things about Cloud Code being a terminal UI and what made it work really well is that Cloud Code has access to everything that an engineer does at the terminal.

Speaker 0

至于工具是否应该设计为双重用途,我认为让它们成为双重用途实际上会让工具更容易理解。

And I think when it comes to whether the tool should be dual use or not, I think making them dual use actually makes the tools a lot easier to understand.

Speaker 0

这意味着,你所能做的所有事情,Cloud Code都能做,中间没有任何隔阂。

It just means that, okay, everything you can do, Cloud Code can do, there's nothing in between.

Speaker 2

这很有趣。

That's interesting.

Speaker 2

有几个这样的决定,所以没有代码编辑器。

There are a couple of those decisions, so So no code editor.

Speaker 2

它在终端里,因此可以访问你的文件。

It's in the terminal so it has access to your files.

Speaker 2

它运行在你的电脑上,而不是在云端的虚拟机中。

And it's on your computer versus in the cloud in a virtual machine.

Speaker 2

你可以反复使用它,逐步构建你的 Cloud MD 文件或创建命令行快捷方式等,从一个非常简单的起点开始,变得极具组合性和可扩展性。

You get to use it in a repeated way where you can build up your Cloud MD file or build slash commands and all that kind of stuff, where it becomes very composable and extensible from a very simple starting point.

Speaker 2

我想知道,对于那些打算构建代理的人,比如不想做 Cloud Code,而是做其他东西的人,你是如何获得那个简单的初始包,让它随着时间推移不断扩展并变得强大?

And I'm curious about how you think about, you know, for people who are thinking about, okay, I want to build an agent, I want to build probably not Cloud Code, but like something else, how you get that simple package that then can extend and be really powerful over time.

Speaker 1

对我来说,我会先像开发任何产品那样思考:在能为他人解决问题之前,你必须先解决自己的问题。

For me, I'd start by just thinking about it like developing any kind of product where you have to solve the problem for yourself before you can solve it for others.

Speaker 1

这正是 YC 教导我们的:你必须从自己开始。

And this is something that they teach in YC is you have to start with yourself.

Speaker 1

所以,如果你能解决自己的问题,就更有可能也解决了别人的问题。

So if can solve your own problem it's much more likely you're solving the problem for others.

Speaker 1

我认为对于编程来说,本地起步是合理的选择,而现在我们有了网页版的 Cloud Code,你也可以用虚拟机来使用它,而且在远程环境下也能用,这在你外出时特别有用,比如你想随时处理一些事情。

And I think for coding starting locally is the reasonable thing and now we have Cloud Code on the web so you can also use it with a virtual machine and you know you can use it in a remote setting and this is super useful when you're on the go you want to take know some that.

Speaker 1

我们是逐步验证这一点的,比如你可以在 GitHub 上使用 quad。

That your this is sort of we started proving this out kind of a step at a time where you can do at quad in GitHub.

Speaker 1

我每天都用这个,比如上班路上遇到红灯,我本不该这么做,但我还是在 GitHub 上,红灯时就用 quad 去修复某个问题之类的。

And I use this every day, like on the way to work I'm like at a red light, I probably shouldn't be doing this, but I'm like, on GitHub, at a red light, and then I'm like, at quad, fix this issue or whatever.

Speaker 1

所以能通过手机来控制它,真的非常方便。

And so it's just really useful to be able to control it from your phone.

Speaker 1

这证明了这种体验是可行的。

And this kind of proves out this experience.

Speaker 1

我不知道这种做法是否适用于所有使用场景。

I don't know if this necessarily makes sense for every kind of use case.

Speaker 1

对于编程来说,我认为从本地开始是正确的。

For coding, I think starting local is right.

Speaker 1

不过,我不确定这是否适用于所有情况。

I don't know if this is true for everything, though.

Speaker 2

明白了。

Got it.

Speaker 2

你们常用的斜杠命令有哪些?

What are the slash commands you guys

Speaker 0

使用什么?

use?

Speaker 0

斜杠 PR 提交。

Slash PR commit.

Speaker 1

是的。

Yeah.

Speaker 0

对,我觉得 PR 提交这个斜杠命令能让大家快速知道该运行哪些 bash 命令来提交代码。

Yeah, it's I think the PR commit slash command makes it a lot faster for a call to know exactly what bash commands to run-in order to make a commit.

Speaker 2

对于不熟悉的人,PR 提交这个斜杠命令具体是做什么的?

And what does the PR commit slash command do for people who aren't familiar?

Speaker 0

哦,它就是明确告诉你该如何提交代码。

Oh, it it just tells it, like, exactly how to make a commit.

Speaker 2

好的。

Okay.

Speaker 0

你可以说,好吧,这三个是需要运行的bash命令。

And you can say, Okay, these are the three bash commands that need to be run.

Speaker 1

明白了。

Got it.

Speaker 1

而且相当酷的是,我们还在斜杠命令中内置了一套模板系统。

And what's pretty cool is also we have this kind of templating system built into slash commands.

Speaker 1

所以我们实际上会提前运行这些bash命令。

So we actually run the bash commands ahead of time.

Speaker 1

它们被嵌入在斜杠命令中。

They're embedded into the slash command.

Speaker 1

你还可以预先允许某些工具的调用。

And you can also pre allow certain tool invocations.

Speaker 1

因此对于这个斜杠命令,我们会允许git commit、git push、ghpr等,这样你在运行斜杠命令后就不会再被询问权限了,因为我们有一个基于权限的安全系统。

So for that slash command we say allow git commit, git push, ghpr and so you don't get asked for permission after you run the slash command because we have like a permission based security system.

Speaker 1

而且它还使用了 Haiku,这挺酷的。

And then also it uses Haiku, which is pretty cool.

Speaker 1

所以它是一个更便宜、更快的模型。

So it's kind of a cheaper model and faster.

Speaker 1

是的,我经常使用 commit、commit PR、feature dev 和 Wise。

Yeah, and for me I use commit, commit PR, feature dev, Wise a lot.

Speaker 1

这是 Sid 创建的。

Sid created this one.

Speaker 1

挺有意思的。

It's kind of cool.

Speaker 1

它会一步步引导你构建东西。

It kind of walks you through step by step building something.

Speaker 1

所以我们向 Quad 提示:先问我具体想要什么,比如制定规范。

So we prompt Quad like, first ask me what exactly I want, like build the specification.

Speaker 1

然后制定详细计划,再生成待办清单,一步步引导完成。

And then build a detailed plan and then make a to do list, walk through step by step.

Speaker 1

所以这更像是结构化的功能开发。

So it's kind of like more structured feature development.

Speaker 1

然后我认为我们经常使用的最后一个工具是安全审查,我们会对所有拉取请求进行安全审查和代码审查。

And then I think the last one that we probably use a lot, so we use like security review for all of our PRs and then also code review.

Speaker 1

比如,Claude 在 Anthropic 内部负责我们所有的代码审查。

So like Claude does all of our code review internally at Anthropic.

Speaker 1

当然,最终还是有人类审批,但 Claude 会先完成代码审查的第一步。

You know, there's still a human approving it, but Claude does kind of the first step in code review.

Speaker 1

这只是一个 /code_review 的斜杠命令。

That's just a slash code review slash command.

Speaker 2

明白了。

Got it.

Speaker 2

是的,我特别想深入了解如何制定一个好的计划,比如功能开发那部分。

Yeah, what are the things I would love to go deeper into the how do you make a good plan, so the sort of the feature dev thing.

Speaker 2

因为我觉得有很多小技巧,我刚开始发现,或者其他人也开始发现一些有效的方法。

Because I think there's a lot of little tricks that I'm starting to find, or people at every start are starting to find that work.

Speaker 2

我很好奇,我们还漏掉了哪些东西。

And I'm curious, what are the things that we're missing.

Speaker 2

例如,在计划开发过程中有一个反直觉的步骤是:即使我并不完全清楚需要构建的具体内容,我心中也只有一句话,比如‘我想做功能X’。

So for example, one unintuitive step of the plan development process is, even if I don't exactly know what the thing that needs to be built is, I just have a little sentence in my mind, like I want feature X.

Speaker 2

我会让Claude直接去实现它,不提供任何其他信息,然后观察它做了什么,这能帮助我理解:哦,原来我真正想要的是这个,因为它犯了各种不同的错误,或者做出了我没想到但可能不错的事情。

I have Claude just like implement it, just without giving it anything else, and I see what it does, and that helps me understand like, okay, here's actually what I mean, because it made all these different mistakes, or like it did something that I didn't expect that might be good.

Speaker 2

然后我会利用这种临时开发中获得的教训,清理掉这些内容,再据此写出更好的功能开发计划文档,这在以前是根本不会做的,因为让工程师在没有明确需求的情况下‘盲干’一个功能成本太高了。

And then I use that, like the learning from the sort of throwaway development, and just clear it out, and then that helps me write a better plan spec for the actual feature development, which is something that you would never do before, because it'd be too expensive to just like YOLO, in an engineer on a feature that you hadn't actually specced out.

Speaker 2

但因为你有Claude在你的代码库中不断尝试和探索,你可以从它那里学到东西,从而帮助你制定更完善的计划。

But because you have Cloud going through your code base and doing stuff, you can learn stuff from it that helps inform the actual plan that you make.

Speaker 1

是的,我觉得我可以先开始,我也很好奇你是怎么用的。

Yeah, I feel maybe I can start, and I'm curious how you use it too.

Speaker 1

我觉得对我而言可能有几种不同的使用模式。

I think there's like a few different modes maybe for me.

Speaker 1

一种是原型模式。

One is prototyping mode.

Speaker 1

传统的工程原型开发中,你会构建一个最简单的、能触及所有系统的版本,只是为了大致了解有哪些系统以及存在哪些未知因素,从而梳理整个流程。

So traditional engineering prototyping, you wanna build the simplest possible thing that touches all the systems just so you can get a vague sense of what are the systems, there's unknowns, just to kind of trace through everything.

Speaker 1

所以,丹,我和你做的一模一样。

And so I do the exact same thing as you, Dan.

Speaker 1

让Claude直接去执行,然后我观察它哪里出错了,再让它丢掉重做。

Like Claude just does the thing and then I see where it messes up and then I'll ask it to just throw it away and do it again.

Speaker 1

直接按两次ESC,回到之前的检查点,然后重新尝试。

Just hit escape twice, go back to the old checkpoint and then try again.

Speaker 1

我认为还有另外两种类型的任务:一种是Quad能一次性完成的,我觉得它能搞定,我就直接告诉它,然后切换到另一个标签页,按Shift+Tab自动接受,接着去做别的事,或者去处理其他Quad任务,同时让它在后台运行。

I think there's also maybe two other kinds of tasks so one is just things that quad can one shot and I feel pretty confident it can do it so I'll just tell it and then I'll just go to a different tab and I'll shift tab to auto accept and then just go do something else or go to another one my quads and tend to that while it does this.

Speaker 1

但还有一种更复杂的特性开发,这类任务在过去可能需要耗费数小时的工程时间,对于这类任务,我通常会进入计划模式,先和它对齐方案,然后再让它写代码。

But also there's this kind of like harder feature development so these are you know things are maybe in the past it would have taken like a few hours of engineering time and for this usually I'll shift tab in plan mode and then align on the plan first before it even writes any code.

Speaker 1

我觉得这其中真正困难的地方在于,随着每个新模型的出现,这个边界会以一种令人惊讶的方式发生变化。

And I think what's really hard about this is the boundary changes with every model in kind of a surprising way.

Speaker 1

新模型更智能了,因此需要使用计划模式的边界也随之向外推了一点。

Where the newer models they're more intelligent so the boundary of what you need plan mode for got pushed out a little bit.

Speaker 1

以前需要规划,现在不需要了。

Before, used to need to plan, now you don't.

Speaker 1

我认为这是一种普遍趋势,以前那些属于临时搭建的东西。

And I think this is general trend of stuff that used to be scaffolding.

Speaker 1

随着模型变得更先进,这些功能逐渐被整合进模型本身,模型最终会逐步吸收一切。

With a more advanced model, it gets pushed into the model itself, and the model kind of tends to subsume everything over time.

Speaker 2

是的。

Yeah.

Speaker 2

你如何看待构建一个代理框架,使其不会让你花大量时间去开发那些在三个月后新模型发布时就会被吸收的东西?

How do you think about building an agent harness that isn't just going to you're not spending a bunch of time building stuff that is just going to be subsumed into the model in three months when the new cloud comes out?

Speaker 2

你怎么知道该做什么,而不是说‘现在还不行,但下次就会好了’,所以不值得花时间去做?

How do you know what to build versus what you just say, It doesn't work quite yet, but next time it's going work, so we're not going to spend time on it.

Speaker 0

哦,我认为我们会构建大多数我们认为能提升云代码能力的东西,即使这意味着我们可能三个月后就得把它废弃。

Oh, I think we build most things that we think would improve Cloud Code's capabilities, even if that means we'll have to get rid of it in three months.

Speaker 0

相反,我们反而希望三个月后就能把它废弃。

If anything, we hope that we will get rid of it in three months.

Speaker 0

我认为目前我们只想提供尽可能优质的体验,因此并不太担心这些临时性的工作。

I think for now, we just want to offer the most premium experience possible, and so we're not too worried about throwaway work.

Speaker 1

有意思。

Interesting.

Speaker 1

是的。

Yeah.

Speaker 1

一个例子就是计划模式本身。

And an example of this is something like even like plan mode itself.

Speaker 1

我认为当Quad能够根据你的意图判断你可能首先想要规划时,我们最终会移除它。

I think we'll probably unship it at some point when Quad can just figure out from your intent that you probably want to plan first.

Speaker 1

或者,比如说,我昨天刚从系统提示中删除了大约2000个标记。

Or, you know, for example, I just deleted, like, 2,000 tokens or something from the system prompt yesterday.

Speaker 1

Sonnet 4.5已经不需要它了。

This goes like Sonnet 4.5 doesn't need it anymore.

Speaker 1

但Opus 4.1确实还需要它。

But Opus 4.1 did need it.

Speaker 2

那如果最新的前沿模型不再需要它,但你又想让它更高效,因为你有大量用户,可能不会用Opus或Sonnet 4.5来处理所有任务,而是会用Haiku呢?

What about in the case where the latest frontier model doesn't need it, but you're trying to figure out how to make it more efficient because you have so many users that maybe you're not going to use Opus or Sonnet 4.5 for everything, maybe you're going to use Haiku.

Speaker 2

所以这里存在一个权衡:是为Haiku投入更多复杂的设计,还是干脆不花时间优化,直接用Sonnet,承担成本,转而专注于更前沿的开发?

So there's a trade off between having a more elaborate harness for Haiku versus just not spending time on it, using Sonnet, eating the cost, and working on more frontier type stuff.

Speaker 0

总的来说,我们把Cloud Code定位为一个非常高端的产品。

In general, we've positioned Cloud Code to be a very premium offering.

Speaker 0

因此,我们的核心目标是确保它能与我们目前最强大的模型——也就是Sonnet 4.5——完美配合。

So our North Star is making sure that it works incredibly well with the absolutely most powerful model we have, which is Sonnet 4.5 right now.

Speaker 0

我们正在研究如何让它在未来的较小模型上也能表现优异,但这对我们来说并不是当前的首要任务。

We are investigating how to make it work really well for future generations of smaller models, but it's not the top priority for us.

Speaker 2

好的。

Okay.

Speaker 2

我注意到的一点是,我们经常在新模型发布前就拿到它们,非常感谢你们提供这些,我们的工作就是判断它们到底怎么样。

What do you think about, know, one thing that I notice is we get models often, and thank you very much for this we get models a lot before they come out and it's our job to kind of figure out is it any good.

Speaker 2

在过去六个月里,当我用新发布的前沿模型在Claude应用中进行测试时,其实很难立刻判断它是否真的更好。

And over the last six months, when I'm testing Claude, for example, in the Claude app with a new frontier model, it's actually very hard to tell whether it's better immediately.

Speaker 2

但在Claude代码中,这一点就非常明显,因为框架对模型的性能影响非常大。

But it's really easy to tell in Claude code because the harness matters a lot for the performance that you get out of the model.

Speaker 2

而且你们在Anthropic内部开发Claude或Claude代码,因此基础模型训练与你们构建的框架之间有着更紧密的集成,它们似乎彼此产生了深远的影响。

And you guys have the benefit of building Claude, or building Claude code inside of Anthropic, so there's like a much tighter integration between the fundamental model training and the harness that you're building, and they seem to kind of really impact each other.

Speaker 2

那么内部是如何运作的?这种紧密集成给你们带来了哪些好处?

So how does that work internally, what are the benefits you get from having that tight integration?

Speaker 1

是的,我认为最重要的是研究人员都在使用它。

Yeah, I think the biggest thing is researchers just use this.

Speaker 1

因此,当他们看到哪些有效、哪些无效时,就能不断改进。

And so as they see what's working and what's not, they can improve stuff.

Speaker 1

我们做了大量评估和其他工作,以便相互沟通,准确了解模型的现状。

We do a lot of evals and things like that to communicate back and forth and understand where exactly the model's at.

Speaker 1

但确实存在这样一个前沿领域:你需要给模型足够困难的任务,才能真正考验它的极限。

But yeah, there's this frontier where you need to give the model a hard enough task to really push the limit of the model.

Speaker 1

如果你不这么做,那么所有模型看起来都差不多。

And if you don't do this, then all models are kind of equal.

Speaker 1

但如果你给它一个相当难的任务,就能看出区别了。

But if you give it a pretty hard task, can tell the difference.

Speaker 2

你们使用哪些子代理?

What sub agents do you use?

Speaker 1

我有几个。

I have a few.

Speaker 1

我有一个用作规划器的子代理。

I have like a planner sub agent that I use.

Speaker 1

我有一个代码审查子代理。

I have a code review sub agent.

Speaker 1

代码审查这个任务,有时我会用子代理,有时会用斜杠命令。

Code review is actually something where sometimes I use a sub agent, sometimes I use a slash command.

Speaker 1

通常在CI中我会用斜杠命令,但在同步使用时,我会为同样的任务使用子代理。

So usually in CI it's a slash command, but in synchronous use I use a sub agent for the same thing.

Speaker 1

这是个好问题,是的。

It's a good question, yeah.

Speaker 1

这可能只是品味的问题。

Maybe it's like a matter of taste.

Speaker 1

是的,我不确定,我不确定。

Yeah, I don't know, I don't know.

Speaker 1

我认为当你同步运行时,稍微分离一下上下文窗口还挺好的,因为代码审查中的所有内容都与我接下来要做的事情无关。

I think it's maybe when you're running synchronously, it's kind of nice to fork off the context window a little bit, because all the stuff that's going on in the code review, it's not relevant to what I'm doing next.

Speaker 1

但在CI中,这根本无所谓。

But in CI, it just doesn't matter.

Speaker 2

你有没有同时启动过十个子代理?都是为了什么?

Are you ever spawning like 10 sub agents at once, and for what?

Speaker 1

对我来说,我主要是在进行大规模迁移时这么做。

For me, I do it mostly for big migrations.

Speaker 1

那才是最重要的事情。

That's like the big thing.

Speaker 1

实际上,我们用的这个代码审查命令中,就有很多子代理。

Actually, have this code review slash command that we use, there's a bunch of sub agents there.

Speaker 1

所以其中一步是找出所有问题。

And so one of the steps is find all the issues.

Speaker 1

因此有一个子代理检查QuadMD合规性,另一个子代理查看Git历史记录以了解情况。

And so there's one sub agent that's checking for QuadMD compliance, there's another sub agent that's looking through git history to see what's going on.

Speaker 1

还有一个子代理在查找明显的bug。

Another subagent that's looking for obvious bugs.

Speaker 1

然后我们再进行去重和质量检查步骤。

And then we do this of deduping quality step after.

Speaker 1

所以它们会发现一大堆问题。

So they find a bunch of stuff.

Speaker 1

其中很多都是误报。

A lot of these are false positives.

Speaker 1

因此我们会再启动大约五个子代理。

And so then we spawn like five more subagents.

Speaker 1

这些子代理都只是用来检查误报。

And these are all just like checking for false positives.

Speaker 1

最终的结果非常出色。

And in the end the result is awesome.

Speaker 1

它能找出所有真正的问题,而不会包含虚假问题。

It finds like all the real issues without the false issues.

Speaker 2

这太棒了。

That's great.

Speaker 2

我实际上也是这么做的。

I actually do that.

Speaker 2

所以我的一个非技术性的Cloud Code使用场景是费用报销。

So one of my non technical Cloud Code use cases is expense filing.

Speaker 2

比如我现在在旧金山,有很多开销,于是我建了一个小的Cloud项目,或者说是Cloud Code,它使用某个金融API自动下载我所有的信用卡交易记录,然后判断哪些可能是我需要报销的费用。

So like when I'm, I'm in SF right now, so like I have all these expenses, and so I built this little Cloud project that, or in Cloud Code that it uses one of these finance APIs to just download all my credit card transactions, and then it decides these are probably the expenses that I'm gonna have to file.

Speaker 2

然后我设置了两个子代理,一个代表我,另一个代表公司,它们会‘对决’以确定真正的费用清单。

And then I have two sub agents, one that represents me and one that represents the company, and they do battle to figure out what's the proper actual set of expenses.

Speaker 2

这就像一个审计子代理和一个‘支持丹’子代理。

It's like an auditor sub agent and a pro Dan sub agent.

Speaker 2

所以,像这样的事情,这种对手处理器模式似乎挺有意思的。

So yeah, that kind of thing, the sort of opponent processor pattern seems to be an interesting one.

Speaker 1

是的,这很酷。

Yeah, yeah, it's cool.

Speaker 1

我觉得当子代理刚出现的时候,真正启发我们的,是很久以前的一个Reddit帖子,有人为前端开发、后端开发和设计师这样的角色创建了子代理。

I feel like when sub agents were first becoming a thing, actually what inspired us, was a Reddit thread a while back where someone made sub agents for, there was a front end dev and a back end dev and thing like a designer.

Speaker 0

测试开发。

Testing dev.

Speaker 1

测试开发。

Testing dev.

Speaker 1

还有一个产品管理子代理。

There was a PM sub agent.

Speaker 1

这挺可爱的。

And this is cute.

Speaker 1

感觉有点太拟人化了。

It feels a little maybe too anthropomorphic.

Speaker 1

也许这其中有道理,但我认为真正的价值在于加权的上下文窗口,你有两个彼此不知情的上下文窗口,这还挺有意思的。

Maybe there's something to this, but I think the value is actually the encore weighted context windows, where you have these two context windows that don't know about each other, and this is kind of interesting.

Speaker 1

而且这样通常能得到更好的结果。

And you tend to get better results this way.

Speaker 2

那你呢,阿里?

What about you, Ari?

Speaker 2

你有没有用过什么有趣的子代理?

Do you have any interesting sub agents you use?

Speaker 0

我一直在调试一个特别擅长前端测试的子代理。

So I've been tinkering with one that is really good at front end testing.

Speaker 0

它使用Playwright来检查客户端有哪些错误,把这些错误提取出来,并尝试测试应用的更多步骤。

So it uses Playwright to see, all right, what are all the errors that are client side and pull them in and try to test more steps of the app.

Speaker 0

它还没完全成熟,但我已经看到一些苗头了,我觉得这种东西我们有可能打包到我们的插件市场中。

It's not totally there yet, but I'm seeing signs of life, and I think it's the kind of thing that we could potentially bundle in one of our plugins marketplaces.

Speaker 2

是的。

Yeah.

Speaker 2

当然。

Definitely.

Speaker 2

我之前用过类似的东西,只是用 Puppeteer,看着它构建东西,然后打开浏览器,发现:哦,得改这个。

I've used something like that just with Puppeteer, and just watching it build something and then open up the browser and then be like, oh, need to change this.

Speaker 2

简直让人惊呼:天哪。

It's like, this is like, oh my god.

Speaker 0

是的,它

Yeah, it's

Speaker 1

真的很棒。

really cool.

Speaker 1

真的很棒。

It's really cool.

Speaker 1

我觉得我们正开始看到这种庞大而多样的子代理的雏形。

I think we're starting to see the beginnings of this massive, multi massive sub agents.

Speaker 1

我不知道该怎么称呼这种东西,比如群 swarm 之类的。

I don't know what to call this, like swarms or something like that.

Speaker 1

有很多人,实际上在Anthropic内部,每月使用大量积分、花费超过1000美元的人数正在不断增加。

There's a bunch of people, there's actually an increasing number of people internally at Anthropic that are using a lot of credits every month, spending over $1,000 every month.

Speaker 1

而且这类人群的比例实际上增长得相当快。

And this percent of people is growing actually pretty fast.

Speaker 1

我认为最常见的使用场景是代码迁移。

And I think the common use case is code migration.

Speaker 1

他们所做的就是从框架A迁移到框架B,主代理会生成一个详细的待办事项清单,然后通过大量子代理进行映射和归约。

And so what they're doing is framework A to framework B, there's the main agent and makes a big to do list for everything, and then just of map reduce over a bunch of sub agents.

Speaker 1

所以像Unstructured Quad这样的工具,可能会说:是的,启动10个代理,然后一次处理10个,把所有内容都迁移过去。

So Unstructured Quad would like, yeah, start 10 agents and then just go 10 at a time and just migrate all the stuff over.

Speaker 2

这很有趣。

That's interesting.

Speaker 2

你能举一个具体的迁移例子吗?比如你所说的那种迁移?

What would be a concrete example of the kind of migration that you're talking about?

Speaker 1

我认为最典型的例子是lint规则。

I think the most classic is lint rules.

展开剩余字幕(还有 480 条)
Speaker 1

所以你正在推行某种lint规则,但没有自动修复功能,因为AST分析对于它来说太简单了。

So there's some kind of lint rule you're rolling out, there's no auto fixer because it's like AST analysis is kind of too simplistic for it.

Speaker 1

我认为其他情况比如框架迁移。

I think other stuff is like framework migrations.

Speaker 1

我们刚刚从一个测试框架迁移到了另一个不同的框架。

We just migrated from one testing framework to a different one.

Speaker 1

这是一个非常常见的场景,因为输出很容易验证。

That's a pretty common one where it's super easy to verify the output.

Speaker 2

我发现的一件事是,无论是在Every内部的项目还是开源项目中,如果你是一个正在构建产品的开发者,并且想实现一个之前已经做过的功能。

One of the things I found is, and this is both for projects inside of Every and then just open source projects, it's like if you're someone building a product and you want to build a feature that's been done before.

Speaker 2

所以也许一个人们可能需要实现很多的例子就是内存管理。

So maybe an example that people might need to implement a bunch is memory.

Speaker 2

你怎么实现内存管理?

How do you do memory?

Speaker 2

因为我们内部有很多不同的产品,你可以直接启动多个云端子代理,去问问这三个其他产品是怎么做的。

Because we have a bunch of different products internally, you can just like spawn cloud sub agents to be like, how do these three other products do it?

Speaker 2

而且这里还存在一种隐性的代码共享可能性,你不需要定义 API,也不需要去询问任何人,可以直接问:我们之前是怎么做这个的?

And there's like possibility for just like tacit code sharing where you don't need to like have an API, you don't need to like ask ask anyone, you can just be like, how does, how do we do this already?

Speaker 2

然后借鉴最佳实践来构建你自己的方案。

And then use the best practices to build your own.

Speaker 2

你也可以用开源项目来做这件事,因为有很多开源项目,人们已经花了整整一年时间研究内存管理,做得非常好。

And you can also do that with open source, because there's tons of open source projects where people are like, they've been working memory for a year and it's really, really good.

Speaker 2

你可以问:人们已经发现了哪些模式?哪些模式是我想要采用的?

You can be like, what are the patterns that people have figured out and which ones do I want to implement?

Speaker 0

完全正确。

Totally.

Speaker 0

你还可以连接到你的版本控制系统。

You can also connect to your version control system.

Speaker 0

如果你以前开发过类似的功能,Cloud Code 可以通过这些 API 直接查询 GitHub,找到过去人们是如何实现类似功能的,阅读代码并复制相关部分。

If you've built a similar feature in the past, Cloud Code can use those APIs, like query GitHub directly, and find how people implemented a similar feature in the past, and read that code and copy the relevant parts.

Speaker 2

是的。

Yeah.

Speaker 2

你有没有发现日志文件的用途,比如记录下我实现它的完整过程?

Have you found any use for log files of, Okay, here's the full history of how I implemented it.

Speaker 2

把这些内容提供给Claude重要吗?

And is that important to give to Claude?

Speaker 2

是实现这个功能,还是让它对Claude有用?

Implementing that or making it useful for it?

Speaker 0

有些人非常推崇这种方法。

Some people swear by it.

Speaker 0

在Anthropic公司,有些人每完成一项任务,都会让Claude Ko以特定格式写一篇日记,记录它做了什么、尝试了什么、为什么没成功。

There are some people at Anthropic where for every task they do, they tell Claude Ko to write a diary entry in a specific format that just documents like, what did it do, what did it try, why didn't it work.

Speaker 0

他们甚至还有代理程序,会回顾过去的记忆并将其综合成观察结果。

And then they even have these agents that look over the past memory and synthesize it into observations.

Speaker 0

我认为这还只是初步的萌芽,这里有一些有趣的东西可以产品化,但这是我们看到的一种新兴且效果良好的模式。

I think this is like the starting budding, There's something interesting here that we could productize, but it's a new emerging pattern that we're seeing that works well.

Speaker 0

我认为,仅凭一次对话就注入记忆的难点在于,很难判断某个特定指令对所有未来任务的相关性。

I think the hard thing about one shotting memory from just one transcript is that it's hard to know how relevant a specific instruction is to all future tasks.

Speaker 0

比如我们经典的例子是,如果我说‘把按钮变成粉色’,我不希望你记住要把所有按钮都变成粉色。因此,我认为从大量日志中综合出记忆,是一种更一致地发现这些模式的方法。

Like our canonical example is, if I say make the button pink, I don't want you to remember to make all buttons pink And in the so I think synthesizing memory from a lot of logs is a is a way to find these patterns more consistently.

Speaker 2

看起来你可能需要一些方法,有些情况下你可以通过自上而下的方式总结或综合出信息。

It seems like you probably need like, there's some things where you're gonna know you'll be able to summarize, like synthesize or summarize in this sort of like top down way.

Speaker 2

这些信息以后会很有用,你会知道在哪个抽象层次上它可能有用。

This will be useful later, and you'll know the right level of abstraction at which it might be useful.

Speaker 2

但还有很多情况是,任何一个提交日志,比如‘把按钮变成粉色’,都可能有无数种你事先无法预知的用途。

But then there's also a lot of stuff where it's like, you actually, any given commit log, like make the button pink, it could be useful for kind of an infinite number of different reasons that you're not going to know beforehand.

Speaker 2

因此,你还需要模型能够查找所有相似的过往提交,并在恰当的时候呈现出来。

So you also need the model to be able to look up all similar past commits and surface that at the right time.

Speaker 2

这也是你在思考的问题吗?

Is that something that you're also thinking about?

Speaker 1

是的,这可能是一种可行的方向。

Yeah, I there could be something like that.

Speaker 1

我认为一种看待方式是,这种传统记忆存储工作,比如Memex之类的东西,你只是想把所有信息都存进系统,之后就变成一个检索问题。

I think one way to see it is this kind of like traditional memory storage work like Memex, like kind of stuff where you just want to put all the information into the system and then it's kind of a retrieval problem after that.

Speaker 1

是的,随着模型变得越来越智能,我注意到它也开始自然地这样做,比如在Sonnet 4.5中,当它遇到卡壳时,会自发地像我们之前讨论的那样,使用bash命令去翻阅Git历史记录,然后说:哦,原来这样处理还挺有意思的。

Yeah, think as the model also gets smarter it naturally I've seen it start to naturally do this also with Sonnet 4.5 where if it's stuck on something it'll just naturally start looking like we talked about before like using bash spontaneously to just look through Git history and be like, oh, Okay, yeah, this is kind of an interesting way to do it.

Speaker 2

对。

Yeah.

Speaker 2

在我们开始录制之前,我们曾提到过一件事,那就是我们如今在每个项目中都深度依赖Cloud Code和CLI工具,这彻底改变了我们的工程方式。

One of the things that we were talking before we started recording, one of the things that we're doing inside of every I feel like it has really changed the way that we do engineering, because everyone is Cloud Code pilled, like CLI pilled.

Speaker 2

我们有一种称为‘复利工程’的工程范式,在传统工程中,每增加一个新功能,都会让下一个功能更难实现。

And we have this engineering paradigm that we call compounding engineering, where in normal engineering, every feature you add, it makes it harder to add the next feature.

Speaker 2

而在复利工程中,我们的目标是让每一个新功能的开发,都让下一个功能更容易构建。

And in compounding engineering, your goal is to make the next feature easier to build from the feature that you just added.

Speaker 2

我们实现这一点的方式是,把所有开发过程中积累的经验和教训都系统化地记录下来。

And the way that we do that is we try to codify all the learnings from everything that we've done to build the feature.

Speaker 2

比如:我们最初是如何制定计划的?计划中哪些部分需要调整?在测试时发现了哪些问题?我们遗漏了哪些关键点?

So how did we make the plan and what parts of the plan needed to be changed, or when we started testing it, what issues did we find, what are the things that we missed.

Speaker 2

然后我们将这些经验固化到所有提示词、子代理和命令中,这样当下次有人遇到类似情况时,系统就能自动捕捉并提醒,从而让后续工作变得更轻松。

And then we codify them back into all the prompts and all the sub agents and all the slash commands so that the next time when someone does something like this, it catches it, and that makes it easier.

Speaker 2

这就是为什么对我来说,比如,我可以直接进入我们的任何一个代码库并立即开始高效工作,即使我对代码的运作方式一无所知,因为我们已经建立了一套累积了所有实施过程中所学知识的记忆系统。

That's why for me, for example, I can hop into one of our code bases and start being productive, even though I don't know anything about how the code works, because we have this built up memory system of all the stuff that we've learned as we've implemented stuff.

Speaker 2

但我们不得不自己构建这套系统。

But we've had to build that ourselves.

Speaker 2

我很想知道,你们是否在构建类似的循环,让 Cloud Code 能自动实现这一点?

I'm curious, are you working on that kind of loop so the Cloud Code does that automatically?

Speaker 1

是的,我们已经开始考虑这个问题了。

Yeah, we're starting to think about it.

Speaker 1

有趣的是,我们也从菲奥娜那里听到了同样的说法。

It's funny, we heard the same thing from Fiona.

Speaker 1

她刚加入团队,是我们主管。

She just joined the team and she's our manager.

Speaker 1

她差不多有十年没写过代码了,类似这种情况。

She hasn't coded in like ten years, something like that.

Speaker 1

但她第一天就提交了被合并的代码请求。

And she was winning PRs on her first day.

Speaker 1

她还说,没错,我不但忘了怎么编程,但Quad Code让我轻松地重新上手,而且我不需要再熟悉任何上下文,因为我本来就了解这些。

And she was like, Yeah, not only did I kind of, I forgot how to code and quad code kind of made it super easy to just get back into it, but also I didn't need to ramp up on any context because I kind of knew all this.

Speaker 1

我认为很大一部分原因在于,当人们为Quad Code提交拉取请求时,我觉得我们的客户也经常做类似的事情。

And I think a lot of it is about like when people put up pull requests for quad code itself, and I think our customers tell us that they do like similar stuff pretty often.

Speaker 1

如果你看到一个错误,只需说:加Quad,把这个加到quadmd里,这样下次它就会自动知道了。

If you see a mistake, I'll just be like, add quad, add this to quadmd so that the next time it just knows this automatically.

Speaker 1

你可以用各种方式来培养这种记忆。

And you can kind of like instill this memory in kind of a variety of ways.

Speaker 1

你可以说:加Quad,把它加到quadmd里。

So you can say like add quad, add it to quadmd.

Speaker 1

你也可以直接说:加Quad,写个测试。

You can also say add quad, write a test.

Speaker 1

你知道,这是一种确保不会倒退的简单方法。

You know, that's like an easy way to make sure this doesn't regress.

Speaker 1

而且,别再因为让人写测试而感到不好意思了,因为这实在太简单了。

I And don't feel bad asking anyone to write tests anymore, because it's like super easy.

Speaker 1

我认为我们几乎100%的测试都是由Quad生成的,如果测试不好,我们就不会提交,只有好的测试才会被保留。

And I think probably close to 100% of our tests are just written by quad, and if they're bad, we just won't commit it, and then the good ones stay committed.

Speaker 1

此外,我认为lint规则也是一个重要的方面。

And then also I think lint rules are a big one.

Speaker 1

对于那些经常需要强制执行的规则,我们实际上有一套内部的lint规则。

So for stuff that's enforced pretty often we actually have a bunch of internal lint rules.

Speaker 1

这些规则完全由Claude编写。

Claude writes 100% of these.

Speaker 1

这通常就是在一个PR中添加Claude,然后写这个lint规则。

And this is mostly just like add Claude in a PR, write this lint rule.

Speaker 1

是的,目前确实存在一个问题,就是如何自动实现这一点?

And yeah, there's sort of this problem right now about like how do you do this automatically?

Speaker 1

我认为Kat和我通常的思路是,我们看到这种高级用户的行为,第一步是让产品可自定义,让最出色的用户能自己摸索出这些酷炫的新用法。

And I think generally how Kat and I think about it is we see this like power user behavior, and the first step is how do you enable that by making the product hackable so the best users can figure out how to do this cool new thing.

Speaker 1

但真正的难点在于,如何将这种能力推广到所有人?

But then really the hard work starts of like how do you take this and bring it to everyone else?

Speaker 1

对我来说,我属于‘其他人’这一类。

And for me, count myself in the everyone else bucket.

Speaker 1

我其实不太会用VIM,也没有那种复杂的T box配置。

I don't really know how to use VIM, I don't have this crazy T box setup.

Speaker 1

所以我的设置非常普通。

So I have a pretty vanilla setup.

Speaker 1

如果你能做出一个我会用的功能,那就能很好地说明其他普通工程师也会使用它。

So if you can make a feature that I'll use, it's a pretty good indicator that other average engineers will use it.

Speaker 2

这很有趣。

That is interesting.

Speaker 2

跟我说说这个吧,因为我一直在思考这个问题:如何让产品具有足够的可扩展性和灵活性,让高级用户能发现你根本想不到的创新用法,同时又足够简单,让任何人都能轻松上手并高效使用,还能把高级用户的发现反馈到基础体验中。

Tell me about that, because that's something I think about all the time is making something that is extensible and flexible enough that power users can find novel ways to use it that you would not have even dreamed of, but it's also simple enough that anyone can use it, and they can be productive with it, and you can pull what the power users find back into the basic experience.

Speaker 2

你是如何思考这些设计和产品决策的,以便实现这种平衡?

How do you think about making those design and product decisions so that you enable that?

Speaker 0

一般来说,我们认为每个引擎环境都有所不同,因此我们系统的每个部分都必须具有可扩展性。

In general, we think that every engine environment is a little bit different from the others, and so it's really important that every part of our system is extensible.

Speaker 0

从状态行到通过自定义斜杠命令,再到钩子——这些钩子允许你在四码流程的几乎每一步插入确定性逻辑。

So everything from your status line to adding your own slash commands through to hooks, which let you insert a bit of determinism at pretty much any step in quad code.

Speaker 0

因此,我们认为这些是我们提供给每位工程师的基础构建模块,他们可以自由地进行探索和使用。

So we think these are the basic building blocks that we give to every engineer that they can play with.

Speaker 0

关于插件,实际上这是我们团队的Daisy开发的,我们的目标是让像我们这样的普通用户更容易将这些斜杠命令和钩子融入自己的工作流程中。

For plugins, plugins is actually our so it was built by Daisy on our team, and this is our attempt to make it a lot easier for the average user like us to bring these slash commands and hooks into our workflows.

Speaker 0

插件的功能是让你浏览现有的MCP服务器、现有的钩子、现有的插件,或者更准确地说,浏览现有的斜杠命令,然后只需在Cloud Code中输入一条命令,就能将它们一键添加到你的环境中。

And so what plugins does is it lets you browse existing MCP servers, existing hooks, existing plugins, and just like or sorry, existing slash commands, and just let you write one command in Cloud Code to pull that in for yourself.

Speaker 1

产品领域有一个非常古老的概念叫‘潜在需求’,我认为这正是我个人思考产品和决定下一步该做什么的主要方式。

There's this really old idea in product called latent demand, which I think is probably the main way that I personally think about product and think about what to build next.

Speaker 1

这是一个非常简单的理念。

It's a super simple idea.

Speaker 1

你设计的产品要具备可篡改性,足够开放,让用户能将其用于原本未设计的其他用途,然后你观察他们如何‘滥用’它,再据此进行优化开发。

You build a product in a way that is hackable, that is of open ended enough that people can abuse it for other use cases it wasn't really designed for, then you see how people abuse it and then you build for that.

Speaker 1

因为你清楚地知道,这种需求是真实存在的。

Because you kind of know there was demand for it.

Speaker 1

对。

Right.

Speaker 1

你知道,当我还在Meta的时候,我们就是用这种方式打造所有大型产品的。

And you know, when I was at Meta, this is how we built kind of all the big products.

Speaker 1

我认为几乎每一个大型产品都蕴含着这种潜在需求的精髓。

I think almost every single big product had this nugget of latent demand in it.

Speaker 1

比如,Facebook的交友功能就源于这样一个想法:当我们观察谁查看他人个人资料时,发现大约60%的浏览行为发生在异性之间,而他们彼此并不认识。

You know, like for example, something like Facebook data came from this idea that when we looked at who looks at people's profiles, I think 60% of views were between people of opposite gender, so kind of like traditional setup, that were not friends with each other.

Speaker 1

于是我们想,哦,天哪,如果我们想做一个约会产品,或许可以利用这种已经存在的需求。

And so we're like, Oh man, okay, maybe if we want a dating product we can kind of harness this demand that exists.

Speaker 1

这很有趣。

That's interesting.

Speaker 1

对于 Marketplace 来说,情况也差不多。

And for Marketplace it was pretty similar.

Speaker 1

当时我认为,Facebook群组中约有40%的帖子是买卖类帖子。

I think it was like 40% of posts in Facebook groups at the time were buysell posts.

Speaker 1

所以我们觉得,人们正在自己使用这个产品。

And so we're like, okay, people are trying to use this product by themselves.

Speaker 1

我们就围绕它开发了一个产品。

We just build a product around it.

Speaker 1

这很可能行得通。

That's probably gonna work.

Speaker 1

因此,我们以类似的方式思考,但同时我们还有为开发者构建产品的优势。

And so we think about it kind of similarly, but also we have the luxury of building for developers.

Speaker 1

开发者喜欢折腾东西,也喜欢自定义功能。

And developers love hacking stuff and they love customizing stuff.

Speaker 1

作为我们自己产品的用户,这让构建和使用这个工具变得特别有趣。

It's like as a user of our own product it makes it so fun to build and use this thing.

Speaker 1

所以,正如我所说,我们只是搭建了合适的扩展点,观察人们如何使用它,这就能告诉我们下一步该开发什么。

And so yeah, like I said, we just built the right extension points, we see how people use it, and that kind of tells us what to build next.

Speaker 0

比如,我们收到了大量用户反馈,说:‘天啊,Cloud Code 怎么要这么多权限,我正出去买咖啡呢。’

Like for example, we got all these user requests where people were like, Dude, Cloud Code is asking me for all these permissions, and I'm out here getting coffee.

Speaker 0

我不知道它为什么问我如何才能让它在 Slack 上给我发提醒?

I don't know that it's asking me for How can I just get it to ping me on Slack?

Speaker 0

所以我们开发了钩子,Dixon 做了 Hooks,让使用者能收到 Slack 的提醒。

And so we built hooks, Dixon built Hooks, so that people could get pinged on Slack.

Speaker 0

你可以针对任何你想在 Slack 上收到提醒的事情设置提醒。

And you could get pinged on Slack for anything that you want to get pinged on Slack for.

Speaker 0

因此,这明显反映出人们非常希望拥有执行某些操作的能力。

And so it was very much like people really wanted the ability to do something.

Speaker 0

我们不想自己去开发这些集成,所以我们开放了 Hooks,让用户自己实现。

We didn't want to build the integration ourselves, and so we exposed Hooks for people to do that.

Speaker 2

让我想到的是,你们最近发布了新版本,把 Cloud Code 的定位重新调整为一个更通用的代理 SDK。

The thing that makes me think of is you recently released, you kind of moved or rebranded how you talk about Cloud Code to be this more general purpose agent SDK.

Speaker 2

这个转变是源于某种潜在需求吗?你们是否发现你们所构建的东西有更广泛的用途?

Was that driven by some latent demand where you saw there's a more general purpose use case for what you built?

Speaker 0

我们意识到,就像你提到的用 Cloud Code 做编码以外的事情一样,我们也经常看到这种情况。

We realized that, similar to how you were talking about using Cloud Code for things outside of coding, we saw this happen a lot.

Speaker 0

我们收到了大量案例,人们使用云代码来帮助他们撰写博客、管理所有数据输入,并以自己的语气进行初步处理。

We get a ton of stories of people who are using Cloud Code to help them write a blog and manage all the data inputs and take a first pass in their own tone.

Speaker 0

我们发现有人用它来构建邮件助手。

We find people building email assistants on this.

Speaker 0

我经常用它来做市场研究,因为本质上,它是一个代理,只要给你一个明确的任务,并能获取正确的底层数据,它就可以无限期地运行。

I use it for a lot of market research because at the core, it's an agent that can go on for an infinite amount of time as long as you give it a concrete task and it's able to fetch the right underlying data.

Speaker 0

我之前在做的一个项目是,我想了解全球所有公司以及它们各自拥有的工程师数量,并据此创建一个排名。

So one of the things I was working on was I wanted to look at all the companies in the world and how many engineers they had and to create a ranking.

Speaker 0

而这件事正是云代码可以做到的,尽管它并不是一个传统的编程用例。

And this something that quad code can do, even though it's not a traditional coding use case.

Speaker 0

于是我意识到,底层的原语其实非常通用。

So I realized that the underlying primitives were really general.

Speaker 0

只要有一个能够长时间持续运行的代理循环,并且能够访问互联网、编写代码和运行代码,那么只要你稍加想象,几乎就能用它构建任何东西。

As long as you have an agent loop that can continue running for a long period of time, and you're able to access the internet and write code and run code, pretty much you can, if you squint, you can kind of build anything on it.

Speaker 1

我认为,当我们把产品从云代码SDK重新命名为云代理SDK时,已经有成千上万的公司在使用它了。

And I think at the point where we rebranded it from the Quad Code SDK to the Quad Agent SDK, there was already many thousands of companies using this thing.

Speaker 1

而且很多这些用例根本不是关于编程的。

And a lot of those use cases were not about coding.

Speaker 1

所以无论是内部还是外部,我们

So it's both internally and externally, we

Speaker 0

看到了,比如健康助手、金融分析师、法律助手。

saw Yeah, it was like health assistants, financial analysts, legal assistants.

Speaker 0

用途非常广泛。

It was pretty broad.

Speaker 2

是的。

Yeah.

Speaker 2

最酷的是哪些?

What are the coolest ones?

Speaker 1

我觉得你最近在播客上采访过诺亚·布莱尔。

I feel like actually you had a Noah Briar on the podcast recently.

Speaker 1

我觉得Obsidian思维导图笔记工具的使用案例真的令人震惊,有这么多人用它来做这个。

I thought the Obsidian mind mapping note keeping use It's cases is really insane how many people use it for this.

Speaker 1

这种特定的组合。

This particular combination.

Speaker 1

我认为一些编码或与编码相关的用例挺酷的,比如我们有一个针对Quad代码的问题追踪器。

I think some coding or coding adjacent use cases that are kind of cool is we have this issue tracker for quad code.

Speaker 1

团队一直忙得不可开交,根本跟不上源源不断涌入的问题。

The team's just constantly underwater trying to keep up with all the issues coming in.

Speaker 1

问题实在太多了。

There's just so many.

Speaker 1

所以Quad会去重这些问题,自动识别重复项,而且做得非常出色。

So Quad dedupes the issues and it automatically finds duplicates and it's extremely good at it.

Speaker 1

它还能进行初步的解决处理。

It also does first pass resolution.

Speaker 1

通常当出现一个问题时,它会主动创建一个内部拉取请求,这是团队里的Inigo新开发的功能。

So usually when there's an issue it'll proactively put up a PR internally and this is a new thing that Inigo on the team built.

Speaker 1

这相当酷。

So this is pretty cool.

Speaker 1

还有值班机制,以及从其他地方收集信号,获取Sentry日志和BigQuery日志,并将所有这些信息整合起来。

There's also on call and collecting signals from other places, getting sentry logs and getting logs from BigQuery and collating all this.

Speaker 1

而且它在这方面非常出色,因为所有这些都只是手动编写的Bash脚本。

Plus it's really good at doing this, because it's all just Bash in hand.

Speaker 1

这些都是我看到的一些内部使用场景。

And so these are all these internal use cases that I saw.

Speaker 2

所以当它在整合日志或去重问题时,是像你们在后台持续运行云服务吗?

So when it's collating logs or deduping issues, is that like you have clouds continually running in the background?

Speaker 2

这是你们专门为这个构建的吗?

And is that something that you're building for?

Speaker 0

对于这个特定功能,每当有新问题提交时就会被触发。

For that particular one, it gets triggered whenever a new issue is filed.

Speaker 0

所以它只运行一次,但可以根据需要运行任意长时间。

So it runs once, but it can choose to run for as long as it needs.

Speaker 2

明白了。

Got it.

Speaker 2

那关于让四元组一直运行的想法呢?

What about the idea of quads always running?

Speaker 0

哦,主动型四元组。

Oh, proactive quads.

Speaker 0

我认为这确实是我们要达到的方向。

I think it's definitely where we want to get to.

Speaker 0

我觉得目前我们非常专注于让四元组编码在单个任务中变得极其可靠。

I would say right now, we're very focused on making quad coding incredibly reliable for individual tasks.

Speaker 0

还有多行自动补全、单轮代理,现在我们正在开发能够完成任务的四元组代码。我觉得如果你追踪这条发展曲线,最终会走向更高层次的抽象,处理更复杂的任务。

And multi line autocomplete and then single turn agents and then now we're working on quad code that can complete tasks, I feel like if you trace this curve, eventually you go to even higher levels of abstraction, even more complicated tasks.

Speaker 0

然后,希望在这之后的下一步是大幅提升生产力。

And then hopefully, the next step after that is a lot more productivity.

Speaker 0

比如理解你团队的目标、你的目标,能够说:嘿,我觉得你可能想尝试这个功能,这是我写的初步代码,以及我做出的假设,这些假设对吗?

So just understanding what your team's goals are, what your goals are, being able to say, Hey, I think you probably want to try this feature and here's a first pass at the code, and here are the assumptions I made, and are these correct?

Speaker 2

我等不及了。

I can't wait.

Speaker 2

我认为紧随其后的是,Claude 会成为你的经理。

And I think probably right after that is Claude is now your manager.

Speaker 1

天哪。

Oh no.

Speaker 0

这不在计划之中。

That's not in the plan.

Speaker 2

所以团队里的每个人都对我们今天的对话感到非常兴奋,他们给了我一堆问题,我得确保回答到所有问题。

So everyone on the team was like super excited that we were talking today and they gave me a bunch of questions and I want to make sure I hit all the questions.

Speaker 2

哦,这是个好问题。

Oh, here's a good one.

Speaker 2

你们在架构中为什么选择智能检索而不是向量搜索?向量嵌入还有相关性吗?

Why did you choose agentic rag over vector search in your architecture, and are vector embeddings still relevant?

Speaker 0

实际上,我们最初确实使用了向量嵌入。

So actually, initially we did use vector embeddings.

Speaker 0

但它们很难维护,因为你必须不断重新索引代码,而且它们可能会过时,还有本地更改也需要被纳入。

They're just really tricky to maintain because you have to continuously re index the code and they might get out of date, and you have local changes, so those need to make it in.

Speaker 0

当我们思考外部企业如何采用它时,我们意识到这会暴露更多的攻击面和安全风险。

And then as we thought about what does it feel like for an external enterprise to adopt it, we realized that this exposes a lot more surface area and security risk.

Speaker 0

我们还发现,Cloud Code 和 Cloud Models 在代理搜索方面表现非常出色。

We also found that actually Cloud Code is really good and Cloud Models are really good at agentic search.

Speaker 0

因此,你可以通过代理搜索达到相同的准确率,而且部署起来要简洁得多。

So, you can get to the same accuracy level with agentic search, and it's just a much cleaner deployment story.

Speaker 2

这真的很有趣。

That's really interesting.

Speaker 0

如果你确实想在 Cloud Code 中引入语义搜索,可以通过 MCP 工具来实现。

If you do want to bring semantic search to Cloud Code, you can do so via an MCP tool.

Speaker 0

所以,如果你希望自行管理索引,并提供一个让 Cloud Code 调用的 MCP 工具,这是可行的。

So if you want to manage your own index and expose an MCP tool that lets Cloud Code call that, that would work.

Speaker 2

你认为哪些 MCP 工具最适合与 Cloud Code 搭配使用?

What do think are the top MCPs to use with Cloud Code?

Speaker 0

Puppeteer 和 Playwright 是其中比较突出的选择。

Puppeteer and Playwright are pretty high up there.

Speaker 2

当然。

Definitely.

Speaker 0

是的。

Yeah.

Speaker 0

Sentry 的一个非常好。

Sentry has a really good one.

Speaker 0

Asana 的一个也非常棒。

Asana has a really good one.

Speaker 0

Do

Do

Speaker 2

你认为在 Anthropic 内部或其他大型组织中,那些重度使用 Cloud Code 的用户有哪些不为人知但值得了解的高级技巧吗?

you think that there are any power user tips that you see people inside of Anthropic or other people who are big power inside of organizations that are big Cloud Code power users that people don't know about but they should?

Speaker 0

Cloud Code 本身不太擅长主动提问,但我个人觉得这非常有用。

One thing that Cloud Code doesn't naturally like to do, but that I personally find very useful, Cloud Code doesn't naturally like to ask questions.

Speaker 0

但你知道,当你和思维伙伴或合作者一起头脑风暴时,通常你们会互相提问。

But, you know, if you're brainstorming with a thought partner, collaborator, usually you do ask questions back and forth to each other.

Speaker 0

所以这是我喜欢做的一件事,尤其是在计划模式下。

And so this is one of the things that I like to do, especially in plan mode.

Speaker 0

我会直接告诉Cloud Code:嘿,我们只是在头脑风暴这件事。

I'll just tell Cloud Code like, Hey, we're just brainstorming this thing.

Speaker 0

请问我问题。

Please ask me questions.

Speaker 0

如果你有任何不确定的地方,我希望你主动提问,我会配合的。

If there's anything you're unsure about, I want you to ask questions and I'll do it.

Speaker 0

我认为这实际上能帮助你得出更好的答案。

And I think that actually helps you arrive at a better answer.

Speaker 1

我们还能分享很多技巧。

There's also so many tips that we can share.

Speaker 1

我觉得我经常看到人们犯的一些常见错误。

I think there's a few really common mistakes I see people make.

Speaker 1

就像你所说的,不够使用计划模式。

One is like you said not using plan mode enough.

Speaker 1

这非常重要,我认为这对刚接触代理式编码的人来说尤其关键。

This is just super important and I think this is people that are kind of new to agentic coding.

Speaker 1

他们往往以为这东西什么都能做,但实际上并不能。

They kind of assume this thing can do anything and it can't.

Speaker 1

它今天的表现还不够好,但未来会变得更好;目前它只能一次性完成一些测试,无法一次性完成大多数任务。

It's not that good today and it's going to get better but today it can one shot some tests, it can't one shot most things.

Speaker 1

因此,你必须了解它的局限性,明白自己在什么环节需要介入。

And so you kind of have to understand the limits, you have to understand where you get in the loop.

Speaker 1

像计划模式这样的功能,如果你先确定好计划,就能轻松将成功率提升两到三倍。

And so something like plan mode, it can like two, 3x success rates pretty easily if you land on the plan first.

Speaker 1

我还看到一些高级用户做得特别好的地方,比如那些大规模部署了Cloud Code的公司,幸运的是,现在这类公司很多,我们可以从中学习。

Other stuff that I've seen power users do really well is companies that have really big deployments of quad code, and now luckily there's a lot of these companies so we can kind of learn from them.

Speaker 1

设置配置项。

Having settings.

Speaker 1

将JSON文件提交到代码库中非常重要,因为你可以用它来预先允许某些命令,避免每次都要请求权限,同时也能阻止某些命令。

Json that you check into the codebase is really important, because you can use this to pre allow certain commands so you don't get permission prompted every time, and also to block certain commands.

Speaker 1

假设你不想使用网络获取或其他功能。

Let's say you don't want web fetch or whatever.

Speaker 1

这样作为工程师,我就不会被反复提示,我可以将这个配置提交到代码库,与整个团队共享,让每个人都能使用。

And this way as an engineer I don't get prompted, and I can check this in and share it with the whole team so everyone gets to use

Speaker 2

我通过直接使用危险地跳过权限来绕过这个问题。

I get around that by just using dangerously skip permissions.

Speaker 1

是的,我们这里也有这个功能,但我们不推荐使用。

Yeah, we kind of have this here, but we don't recommend it.

Speaker 1

这是一个模型,你知道的。

It's a model, you know?

Speaker 1

它可能会做出一些奇怪的事情。

It can do weird stuff.

Speaker 1

我认为另一个有趣的用例是人们用停止钩子来做一些有意思的事情。

I think another kind of cool use case that we've seen is people using stop hooks for interesting stuff.

Speaker 1

所以停止钩子会在每一轮任务完成时运行。

So stop hook runs whenever the turn is complete.

Speaker 1

所以这结束了与各种工具的来回调用,任务已完成,并将控制权交还给用户。

So this has ended some tool calls back and forth with whatever, and it's done, and it returns control back to the user.

Speaker 1

然后我们运行停止钩子。

Then we run the stop hook.

Speaker 1

因此你可以定义一个停止钩子,比如如果测试未通过,就返回文本并继续执行。

And so you can define a stop hook that's like if the tests don't pass, return the text, keep going.

Speaker 1

本质上,你可以让模型持续运行,直到任务完成。

Essentially it's like you can just make the model keep going until the thing is done.

Speaker 1

当将这与 SDK 和这种程序化使用方式结合时,简直令人难以置信。

And this is just insane when you combine it with the SDK and this programmatic usage.

Speaker 1

这本质上是一种随机的、非确定性的过程,但通过框架支持,你可以获得确定性的结果。

This is a stochastic thing, it's a non deterministic thing, but with scaffolding you can get these deterministic outcomes.

Speaker 2

所以你们率先推动了这个命令行界面的范式转变。

So you guys started this CLI paradigm shift.

Speaker 2

你们认为命令行界面会是最终的交互形态吗?

Do you think the CLI is the final form factor?

Speaker 2

我们是一年后还是三年后主要在CLI中使用Cloud Code?

Are we going be using Cloud Code in the CLI primarily in a year or in three years?

Speaker 2

还是有其他更好的方式?

Or is there something else that's better?

Speaker 0

我的意思是,这并不是最终的形式,但我们非常专注于让CLI尽可能智能,并且尽可能可定制。

I mean, it's not the final form factor, but we are very focused on making sure the CLI is the most intelligent that we can make it and that it's as customizable as possible.

Speaker 0

你可以谈谈下一代形式。

You can talk about the next form factors.

Speaker 1

是的,凯特让我谈谈这个,因为没人跟得上这个速度,现在根本没人知道这些形式会是什么样子。我认为我们的团队正处于实验阶段,我们有CLI,然后推出了IDE扩展,现在又有一个新的IDE扩展,是一个GUI,更易用,我们还在GitHub上推出了Claud,你可以 anywhere 都添加Claud。

Yeah, I mean, Kat's asking me to talk about it because no one like this stuff's like it's just moving like so fast right like no one knows what these form factors are like right now I think our team is in experimentation mode so we have CLI then we came out with an IDE extension now we have a new IDE extension that's like a GUI it's a little more accessible We have claud in github so you can just add claud at anywhere.

Speaker 1

现在还有网页版和移动端,你可以在这些地方使用它。

Now there's on web and on mobile so you can use it on any of these places.

Speaker 1

我们正处于实验阶段,正在探索下一步该做什么。

And we're just in experimentation mode so we're trying to figure out what's next.

Speaker 1

我认为如果我们从宏观角度看这些东西的发展方向,其中一个大趋势是更长的自主运行时间。

I think if we kind of zoom out and see where this stuff is headed, I think one of the big trends is longer periods of autonomy.

Speaker 1

因此,对于每个模型,我们都会测量它能持续自主执行任务多久,在危险模式下的容器中自动压缩,直到任务完成。

And so with every model we kind of time how long can the model just keep going and do tasks autonomously, and just in dangerous mode in a container, keep auto compacting until the task is done.

Speaker 1

现在我们已经达到了两位数小时的级别。

And now we're on the order of double digit hours.

Speaker 1

我觉得上一个模型大概能运行三十个小时。

Think it's like the last model is like thirty hours.

Speaker 1

差不多是这个水平。

Something like this.

Speaker 1

下一个模型将能持续数天。

The next model is going be days.

Speaker 1

当你考虑并行化模型时,会衍生出一系列问题。

As you think about kind of parallelizing models there's kind of a bunch of problems that come out of this.

Speaker 1

其中一个问题是:这个程序运行在什么样的容器中?

So one is, what is the container this thing runs in?

Speaker 1

因为你不想不得不关闭笔记本电脑。

Because you don't want to have to close your laptop.

Speaker 2

我现在就是这样,因为我正在做很多Dispie。

I have that right now because I'm doing a lot of Dispie.

Speaker 2

我读过相关的内容,关于DSPY或者Dispie的提示优化,它运行在我的笔记本上,我根本不想合上它。

I've read it, but DSPY or Dispie prompt optimization, and it's on my laptop, and it's like, I don't want to close it.

Speaker 2

我总是开着笔记本屏幕,因为我不想合上它。

I'm like, in the window with my laptop open because I don't want to close it.

Speaker 1

是的,没错。

Yeah, that's right.

Speaker 1

对,我们之前拜访过一些公司和客户。

Yeah, we visited companies before, customers.

Speaker 1

每个人都随身带着他们的四重代码。

Everyone's just walking around with their quad codes.

Speaker 1

这是为其他人准备的吗?

Is this for other?

Speaker 1

所以我认为其中一个问题是需要摆脱这种模式。

So I think one is kind of getting away from this mode.

Speaker 1

然后我认为,很快我们就会进入这种监控四元组的模式。

And then I also think pretty soon we're going to be in this mode of monitoring quads.

Speaker 1

我不知道这种模式的合适形态是什么,因为作为人类,你需要能够检查并了解正在发生什么。

I don't know what the right form factor for this is, because as a human you need to be able to inspect this and kind of see what's going on.

Speaker 1

但同时,它也需要针对四元组进行优化,即优化四元组之间的通信带宽。

But also it needs to be quad optimized, where you're optimizing for kind of bandwidth between the quad to quad communication.

Speaker 1

我的预测是,终端并不是最终的形态。

So my prediction is terminal is not the final form factor.

Speaker 1

我的预测是,在未来几个月,甚至大约一年内,会出现几种新的形态。

My prediction is there's going be a few more form factors in the coming months, you know, maybe like a year or something like that.

Speaker 1

而且它会持续快速变化。

And it's going keep changing very quickly.

Speaker 2

你觉得呢?我经常给很多Every订阅者讲授云代码。

What do you think about, you know, I teach a lot of Cloud Code to a lot of Every subscribers

Speaker 0

谢谢。

and Thank you.

Speaker 2

不客气。

You're welcome.

Speaker 2

为你代劳工作。

Doing doing your work for you.

Speaker 2

我认为其中一件大事是,终端让人望而生畏。

And I think the like, one of the big things is just the terminal is intimidating.

Speaker 2

和订阅者通话时,告诉他们如何打开终端,即使你非技术背景也能操作,这真的很重要。

And just being on a call with subscribers being like, here's how you open the terminal and you're allowed to do this even if you're non technical, is like a big deal.

Speaker 2

你怎么看这个问题?

How do you think about that?

Speaker 0

是的,我们市场团队的一位同事开始使用 Cloud Code,因为她正在撰写一些涉及 Cloud Code 的内容,我就说,你真的应该亲身体验一下。

Yeah, one of the people on our marketing team started using Cloud Code because she was writing some content that touched on Cloud Code, and I was like, you should really experience it.

Speaker 0

她一用就弹出了大约30个弹窗,要求她接受各种权限,因为她以前从没用过终端。

And she got, like, 30 pop ups on her screen where she had to accept various permissions because she'd never used a terminal before.

Speaker 0

对。

Yeah.

Speaker 0

我完全同意你的看法。

So I completely see eye to eye with you on that.

Speaker 0

对于非工程师来说,这确实很难,甚至有些工程师也发现日常使用终端并不完全自在。

It's definitely hard for non engineers, and there's even some engineers we found who aren't fully comfortable with working day to day in the terminal.

Speaker 0

我们的 VS Code GUI 扩展是朝这个方向迈出的第一步,因为你根本不需要考虑终端。

Our Versus Code GUI extension is our first step in that direction because you don't have to think about the terminal at all.

Speaker 0

它就像一个传统的界面,上面有一堆按钮。

It's like a traditional interface with a bunch of buttons.

Speaker 0

我们正在开发更多图形化界面。

We are working on more graphical interfaces.

Speaker 0

Cloud Code 网页版就是一个图形界面。

Cloud Code on the Web is a GUI.

Speaker 0

我认为这可能是对技术不太熟练的人的一个很好的起点。

I think that actually might be a good starting point for people who are less technical.

Speaker 1

是的。

Yeah.

Speaker 1

几个月前,我有一次神奇的时刻,走进办公室时发现Anthropic的一些数据科学家就坐在Quad Code团队旁边。

There was this magic moment maybe a few months ago where I walked into the office and some of the data scientists at Anthropic sit right next to the quad code team.

Speaker 1

这些数据科学家的电脑上竟然都在运行Quad Code。

And the data scientists just had quad code running on their computers.

Speaker 1

我当时就问:这是什么?

And I was like, what is this?

Speaker 1

你们是怎么发现这个的?

How did you figure this out?

Speaker 1

我觉得可能是Brandon第一个这么做的。

I think it was like Brandon was the first one to do it.

Speaker 1

他跟我说:哦,对啊,我就直接安装了。

And he was like, oh yeah, I just installed it.

Speaker 1

我负责这个产品,所以我应该用它。

I work on this product, so I should use it.

Speaker 1

我当时简直惊呆了。

And I was like, oh my god.

Speaker 1

所以他弄明白了怎么使用终端和 Node。

So he figured out how to use a terminal and Node.

Speaker 1

Js。

Js.

Speaker 1

他以前没怎么做过这种工作流程。

He hasn't really done this kind of workflow before.

Speaker 1

显然非常技术性。

Obviously very technical.

Speaker 1

所以我认为我们现在开始看到越来越多与代码相关的功能,人们都在使用 Quad Code。

So I think now we're starting to see all these kind of like code adjacent functions, people use quad code.

Speaker 1

是的,这还挺有意思的。

And yeah, it's kind of interesting.

Speaker 1

从潜在需求的角度来看,这些人在自己动手改造产品,说明他们确实有使用它的需求。

From a latent demand point of view, these are people hacking the product, so there's demand to use it for this.

Speaker 1

所以我们希望用更易用的界面让它变得更简单一些。

And so we want to make it a little bit easier with more accessible interfaces.

Speaker 1

但与此同时,对我们来说,对于QuadCode,我们专注于为最优秀的工程师打造最好的产品。

But at the same time, for us, for QuadCode, we're laser focused on building the best product for the best engineers.

Speaker 1

因此我们专注于软件工程,并希望把它做得非常好。

And so we're focused on software engineering, and we want to make this really good.

Speaker 1

但我们希望让它成为其他人也能进行二次开发的东西。

But we want to make it a thing that other people can hack.

Speaker 0

有时,Cloud Code 会生成一些略显冗长的代码,但你只需告诉它简化一下,它就能做得非常好。

Sometimes, Cloud Code will write code that's a bit verbose, but you can just tell it to simplify it, and it does a really good job.

Speaker 2

有意思。

Interesting.

Speaker 2

那么,你是在什么情况下、什么时候这样做的呢?

And so how and when are you doing that?

Speaker 2

你是用斜杠命令,还是你……

So you're using a slash command or you're

Speaker 0

我只是直接说出来。

I just say it.

Speaker 0

就说 Every

Just say Every

Speaker 2

每次你都是对的。

time you're just yeah.

Speaker 0

有时候你会说,嘿。

Like sometimes you're like, hey.

Speaker 0

这应该是一行代码的改动。

This should be a one line change.

Speaker 2

是的。

Yeah.

Speaker 0

我会在这里写五行代码。

And I'll write five lines here.

Speaker 0

简化它。

Simplify it.

Speaker 0

嗯嗯。

Mhmm.

Speaker 0

它能立刻理解你的意思,我会马上修复。

And it understands immediately what you mean and I'll fix it.

Speaker 2

是的。

Yeah.

Speaker 2

我觉得我们团队里很多人也都会这么做。

I think a lot of people on our team do that too.

Speaker 2

这挺有意思的。

That's it's interesting.

Speaker 2

既然你总是这么说,那为什么不干脆把它做成一个斜杠命令或者集成到平台里,让它自动发生呢?

Why do you like why not then if you're saying that all the time, why not then, you know, push that into like a slash command or the harness or something like that to, yeah, make it just happen automatically?

Speaker 0

我们在CloudMD里已经有相关的操作说明了。

We do have instructions for this in the CloudMD.

Speaker 0

我觉得这种情况在对话中占比极低,我们不希望因此过度偏向另一个极端。

I think it impacts such a low percentage of conversations that we don't want it to, like, over rotate in the other direction.

Speaker 0

嗯。

Mhmm.

Speaker 0

而不是使用斜杠命令的原因是,你实际上不需要那么多上下文。

And then the reason why not a slash command is because you actually don't need that much context.

Speaker 0

我认为斜杠命令非常适合那些通常需要写两到三行的情况。

I think slash command's really good for situations where you would otherwise need to write two, three lines.

Speaker 0

嗯。

Mhmm.

Speaker 0

但对于某些情况,比如计划模式,你其实可以用几个词,但有时候却需要两到三行才能完整表达出你在计划模式下想要的内容。

But for some like, even for plan mode, you actually can use a few words, but sometime but it actually takes two or three lines to capture the entirety of what you want in plan mode.

Speaker 0

为了简化,你只需写‘简化它’,系统就能理解。

For simplify it, you can just write simplify it and it gets it.

Speaker 2

是的。

Yeah.

Speaker 2

对。

Yeah.

Speaker 2

这说得通。

That makes sense.

Speaker 1

酷。

Cool.

Speaker 1

对。

Yeah.

Speaker 2

好的。

Okay.

Speaker 2

嗯,现在我们可以了。

Well, now we're we can.

Speaker 2

这很有趣。

That's interesting.

Speaker 1

是的,但这些东西,你知道,还是感觉太早期了。

Yeah, but this stuff, like, you know, it still feels just so early.

Speaker 1

对。

Yeah.

Speaker 1

你知道,我们之前录音前还聊过,关于我们现在处于采用曲线的哪个阶段,还是在早期阶段,或者不管怎么说。

You know, like we were talking before the recording about like kind of where are we on the adoption curve and it's still The Hauschen curve or whatever.

Speaker 1

不管那个词是什么。

Whatever that term was.

Speaker 1

没错。

Exactly.

Speaker 1

而且感觉我们还处在前10%,这些东西变化会非常快,还会持续变化。

And it just feels like we're first 10%, still like this stuff is going to change so fast, it's going to keep changing.

Speaker 0

即使我跟Anthropic之外的研究人员聊过他们使用Cloud Code的经历,他们也会卡在类似的问题上,比如没意识到可以直接让LLM简化它。

Even when I talk to researchers outside of Anthropic who've used Cloud Code, they also get stuck on things like this, like not realizing that they can just tell the LLM to simplify it.

Speaker 0

我觉得这恰恰说明,即使是在这个行业工作的人,也并不总是意识到你可以直接跟模型对话。

And I think that just goes to show that even for people who are working in this industry, they don't always realize that you can just talk to the model.

Speaker 2

关键在于,我认为人们潜意识里期望使用AI不应该需要什么技能,因为AI好像只要你说什么它就做什么。

That's the thing is like, I think that there's this underlying expectation that using AI shouldn't have to be a skill, like, because it just does whatever you say.

Speaker 2

但你想想,你说的话本身会影响它的表现。

And you're like, well, I mean, whatever you say is going to matter for what it does.

Speaker 2

所以如果你能表达得更好,它就会做得更好。

So if you can say things better, it's going to do better.

Speaker 1

是的。

Yeah.

Speaker 1

我的意思是,每个模型都不一样,这才是难点。

I mean, it changes with every model, That's the hard part.

Speaker 1

比如,提示工程师曾经是个职业,但现在众所周知,它已经不再是职业了。

Like, prompt engineer was a job, and now famously it's not a job anymore.

Speaker 1

会有很多类似的工作,之后也会消失。

There's gonna be more jobs that are then not jobs anymore.

Speaker 1

这些你必须掌握的微小技能,随着模型变强,它能更好地理解你的意思。

These kind of little micro skills that you have to learn to use this thing and as the model gets better, it can just interpret it better.

Speaker 1

但我觉得,对我们来说,这也是构建这种产品时必须保持的谦逊态度——我们真的不知道下一步会怎样,只是和大家一样在摸索中前进。

But I think that's also like, for us, this is part of this kind of humility that we have to have building a product like this that we just really don't know what's next, and we're just trying to figure it out kind of along with everyone else.

Speaker 1

我们只是来体验这场旅程的。

We're just here for the ride.

Speaker 2

所以你们为自己而构建它,这真的很棒,因为我觉得这才是真正了解它的最好方式。

And that's why it's cool that you're building it for yourself, because I think that's the best way to know that.

Speaker 2

这就像是,你们——我们其实也是这样,你生活在未来,一直在使用它,所以很明显哪些功能还缺失。

It's just like, you're, and this is what we do too, is like, you're sort of living in the future, you're using it all the time, and it's pretty clear what's missing.

Speaker 2

你心里想着,我就想要这个功能,然后直接去做下一个改进,而不是去问某个大公司的企业产品经理:你们想要什么样的AI功能?

You're like, I just want this thing, and you can just do the next thing, rather than being like, let me ask some enterprise product manager at some gigantic company, like, what kind of AI feature do you want?

Speaker 2

他们说:我不知道。

And they're like, I don't know.

Speaker 2

在IDE旁边加个聊天机器人,然后你就说:好吧。

Put a little chat bot on the side of my IDE, and you're like, Okay.

Speaker 2

是的。

Yeah.

Speaker 1

这就是开发工具的优越之处。

This is like the luxurious thing about building DevTools.

Speaker 1

你就是自己的客户。

You're your own customer.

Speaker 2

我认为这也是AI的独特之处,因为它为所有软件重新设定了游戏规则。

I think it's also really a unique thing about AI because it sort of reset the game board for all software.

Speaker 2

所以我们有Quora、这个邮件助手,还有像Sparkle这样帮你整理文件的工具。

So we have Quora, this email assistant, and we have like Sparkle, which organizes your files.

Speaker 2

任何你想要在电脑上使用的功能,如果你用AI来构建,很可能之前没人做过,因为整个格局已经被重新洗牌了。

And it's like anything that you do for something that you want to use on your computer, if you're building it with AI, there's a good chance that hasn't been done before because like the whole landscape has been reset.

Speaker 2

所以,现在为自己构建东西是一个特别令人兴奋的时机。

And so it's a it's a uniquely exciting time to build stuff for yourself.

Speaker 0

完全正确。

Totally.

Speaker 0

我觉得这也彻底打开了竞争格局。

I think it totally opens the playing field too.

Speaker 0

现在任何个人都可以开发一个应用来满足自己的需求,然后分享给其他人。

It's like any individual can now build an app to fill their need and then distribute it to everyone else.

Speaker 0

是的。

Yeah.

Speaker 0

对。

Yeah.

Speaker 0

这真的很酷。

It's really cool.

Speaker 0

我一直在原型化各种随机的个人项目。

I've been prototyping all these, like, random pet projects.

Speaker 0

我刚搬进一个新公寓,里面空空如也。

I just moved into a new apartment and it's empty.

Speaker 0

所以我用云代理SDK开发了一个购物顾问助手,因为谁有时间去读所有评论、查看所有选项、寻找价格呢?这些信息都很难获取。

And so I've been building this shopping advisor assistant on the Cloud Agent SDK because who has time to read all the reviews and look at all the options and find their pricing and everything's hard to discover.

Speaker 0

它会问我一堆问题,然后我告诉它我想要什么。

And so it just asks me a bunch of questions and I tell it what I want.

Speaker 0

比如家具?

Like for furniture?

Speaker 0

是的,没错。

Yeah, exactly.

Speaker 0

它会给我展示各种沙发的照片、不同选项以及网友的评价。

And it shows me a bunch of photos of different sofas and options and what people say online.

Speaker 0

然后我会告诉它我不喜欢什么。

Then I tell it what I don't like.

Speaker 0

这简直就像在和一个购物助手合作。

It literally feels like working with a shopping assistant.

Speaker 0

这真的太棒了。

It's been really cool.

Speaker 0

这真的很酷。

That's really cool.

Speaker 0

我还有一个小小的邮件回复助手,帮我起草回复内容。

I also have my little email response agent that drafts responses for me.

Speaker 0

但我平时不太用邮件,所以。

But I don't use email that much, so.

Speaker 1

哦,我知道那不是你回复的。

Oh, and I knew it wasn't you responding.

Speaker 0

所以才延迟了七天。

That's why it's seven days delayed.

Speaker 0

这个代理做得非常细致。

The agent's just doing a very thorough job.

Speaker 0

不过,代理SDK真的很棒。

Agent SDK is cool though.

Speaker 1

是的,代理SDK很棒。

Yeah, Agent SDK is cool.

Speaker 1

对,总是让人觉得惊讶,我们这么小的团队竟能构建出这么多东西。

Yeah, it always just feels amazing how much we're able to build with such a small team.

Speaker 1

是的。

Yeah.

Speaker 1

因为我觉得

Because I feel

Speaker 0

还有另一件很酷的事,就是我觉得公司内部的人正在从依赖文档转向依赖演示。

like The there's other thing that's really cool is that I think people are just shifting their mindset from docs to demos internally.

Speaker 0

我们的核心其实是演示。

Our currency is actually demos.

Speaker 0

你希望人们对你做的东西感到兴奋。

It's like you want people to be excited about your thing.

Speaker 0

给我们展示一下它能做什么,十五秒就够了。

Show us fifteen seconds of what it can do.

Speaker 0

我们发现,现在团队里的每个人都已经形成了这种思维定式。

And we find that everyone on the team now has this kind of indoctrinated.

Speaker 2

没错,这就是演示文化。

Democulture, for sure.

Speaker 2

我认为这样更好,因为很多东西你可能只在脑子里有想法,如果你擅长写作,或许能想办法解释清楚。

And I think that's better because there's a lot of things that you might have in your head that if you're a great writer, maybe you could figure out how to explain it.

Speaker 2

但即便如此,要解释清楚还是非常困难。

But even then, it's just really hard to explain.

Speaker 2

但如果有人能亲眼看到,他们立刻就能理解。

But if someone can see it, they get it immediately.

Speaker 2

我认为这种趋势在产品开发中正在发生,同时也正在影响各种其他类型的创意活动。

And I think that's happening for product building, but it's also happening for all sorts of other types of creative endeavors.

Speaker 2

比如拍电影,以前你得去推销它。

Like making a movie, for example, you had to pitch it.

Speaker 2

但现在你可以直接说:我做了个这样的视频,你看看。

But now you can just be like, I made this sort of video and check.

Speaker 2

你可以以很低的成本看到你想要制作的东西的雏形。

You can kind of see the glimmer of the thing you're trying to make for very cheap.

Speaker 2

这意味着你不必花太多时间去说服别人。

And so that means you don't have to spend time convincing people as much.

Speaker 2

你只需要说:看,我做出来了。

You can just be like, here, I made it.

Speaker 1

是的,而且作为创作者,你可以直接做出来,然后一遍又一遍地修改,直到你满意为止。

Yeah, and also as a builder, you can just make it, and then make it again, and then make it again until you're happy.

Speaker 1

我觉得相反的是,过去你会写个文档或者在白板上画点什么,比如我会用Sketch或Figma画东西,但现在我们会直接动手做,直到感觉对了为止。

I feel like the flip side is you used to make a doc or whiteboard something or you know like I would draw stuff in like Sketch or Figma or whatever and now we'll just like build it until I like how it feels.

Speaker 1

现在要传达那种感觉变得太容易了,我觉得以前你或许能用视觉呈现出来,或者用语言描述,但就是永远抓不住那种氛围。

And it's just like so easy to get that feeling out of it now and I think it's like you could see it visually before or you could describe it in words, but it's like you could never get the vibe.

Speaker 1

现在,那种感觉变得非常容易捕捉了。

And now the vibe is real easy.

Speaker 0

你把计划模式重做了三次。

And you built plan mode like three times

Speaker 2

是因为你做了之后扔掉,再重做,又扔掉,再重做吗?

Like because of you built it and then you threw it out and rebuilt it and then threw it out and rebuilt it?

Speaker 1

是的,待办事项这个功能,最初是西德做的,他也做了三四个版本。

Yeah, where to dos, like Sid built the original version, also like three or four, he built like three or four prototypes.

Speaker 1

然后我听说之后还迭代了大约二十个版本,一天之内就完成了。

And then I've heard this had maybe like 20 versions after that, like in like a day.

Speaker 1

是的,我觉得我们发布的几乎所有东西,背后都有至少几个原型。

Yeah, think this is like a lot of, pretty much everything we released, there was at least a few prototypes behind it.

Speaker 2

你们是怎么跟踪并把从一个原型中学到的经验延续到下一个原型的?

How do you keep track of and carry forward the things you learn from prototype to prototype?

Speaker 2

尤其是当一个人先做原型,然后你接手说,我要再做二十个版本的时候。

And especially if it's like, one person is prototyping it, and then you're like, I'm gonna take it over, I'm gonna do 20 more.

Speaker 2

你如何最大化从中获得的收获?

How do you maximize what you get out of that?

Speaker 1

这其中可能有几个要素。

There's maybe a few elements of it.

Speaker 1

一个是风格指南。

One is the style guide.

Speaker 1

所以我们发现了一些风格上的元素。

So there's some elements of style that we discover.

Speaker 1

我认为这其中很多都是为终端而设计的。

And I think a lot of this is building for the terminal.

Speaker 1

我们正在为终端发现一种新的设计语言,并边做边完善。

We're discovering a new design language for the terminal and building it as we go.

Speaker 1

我认为其中一些内容可以编码到风格指南中,这就是我们的QuadMD。

And I think some of this you can codify in a style guide, so this is our QuadMD.

Speaker 1

但还有一部分是产品直觉,我觉得模型目前还完全理解不了。

But then there's this other part of it that's kind of product sense where I don't think the model totally gets it yet.

Speaker 1

我认为我们应该尝试找到方法,教会模型理解哪些做法有效、哪些无效,这种产品直觉。

And I think maybe we should be trying to find ways to teach the model this kind of product sense about this works and this doesn't.

Speaker 1

因为在产品设计中,你希望以最简单的方式解决用户的问题,然后删除所有与此无关的内容,清除一切干扰。

Because in product you want to solve the person's problem in the simplest way possible and then delete everything else that's not that and just get everything out of the way.

Speaker 1

所以你要尽快让产品与用户意图对齐。

So you align the product to the intent as quickly as possible.

Speaker 1

也许模型目前还完全无法理解这一点。

And maybe the model doesn't totally get that yet.

Speaker 0

是的,模型并不能真正体会到使用 Quad Code 的感受。

Yeah, it doesn't really feel what it's like to use quad code.

Speaker 0

就像模型本身并不使用 Quad Code 一样。

Like the model doesn't use quad code.

Speaker 1

所以我认为,当 Quad Code 能够自我测试、自我使用的时候,

So I think like when, you know, quad code can like test itself and it can kind of use itself.

Speaker 1

我们在开发过程中就是这样做的,它能发现 UI 错误之类的问题。

And like we do this when developing and it can see like UI bugs and things like that.

Speaker 1

我不知道,也许我们该试试直接提示它。

I don't know, maybe we should just try prompting it though.

Speaker 1

说实话,很多这类事情就是这么简单。

Honestly, a lot of this stuff is as simple as that.

Speaker 1

每当有新想法时,通常你只要提示一下,它往往就能奏效。

When there's some new idea, usually you just prompt it and often it just works.

Speaker 1

也许我们就该试试这个。

Maybe we should just try that.

Speaker 0

很多原型实际上都是用户体验交互。

A lot of the prototypes are actually the UX interactions.

Speaker 0

所以我觉得,一旦我们发现新的用户体验交互,比如Shift

And so I think once we discover a new UX interaction like Shift

Speaker 1

plus

Speaker 0

Tab自动确认,我觉得Boris已经找到了。

Tab for auto accept, I think Boris figured out.

Speaker 1

其实是伊戈尔。

That was Igor actually.

Speaker 0

哦,伊戈尔。

Oh, Igor.

Speaker 0

回去后

Went back

Speaker 1

我们去了

we to went

Speaker 0

博里斯。

Boris.

Speaker 1

我们做了一周的原型。

We did doing prototypes for a week.

Speaker 0

是的,Shift Tab 感觉非常好。

Yeah, Shift Tab felt really nice.

Speaker 0

然后,当前计划模式的其中一个迭代采用了 Shift Tab,因为它实际上只是另一种方式,用来告诉模型它应该有多自主。

Then one of the now current plan mode iteration uses Shift Tab because it's actually just another way to tell the model how agentic it should be.

Speaker 0

所以我认为,当更多功能使用相同的交互方式时,你会对各个功能应该放在哪里形成更强的心理模型。

And so I think as more features use the same interaction, you form a stronger mental model for what should go where.

Speaker 1

我觉得是的。

Thinking Yeah.

Speaker 1

我认为这是另一个非常好的点。

I think is another really good one.

Speaker 1

我们最初在发布 Quad Code 之前,或者可能是第一个思考型模型的时候,是 3.7 吗?

First we were like, before we released Quad Code, or maybe it was like the first thinking model, was it like 3.7?

Speaker 1

我忘了第一个版本叫什么了。

I forget what the first one was.

Speaker 1

是的。

Yeah.

Speaker 1

那时候它已经能够思考了,我们在头脑风暴:怎么才能切换思考模式?

And it was like, it was able to think and we're brainstorming, how do we toggle thinking?

Speaker 1

然后有人就提议:如果直接让模型用自然语言来思考,它自己就知道该怎么思考了呢?

And then someone was just like, what if you just ask the model to think in natural language and it knows how to think?

Speaker 1

我们就说,好的,太棒了,就这么办。

And we're like, okay, sweet, let's do that.

Speaker 1

于是我们这么做了段时间,后来发现人们会不小心触发它。

And so we did that for a while and then we realized that people were accidentally toggling it.

Speaker 1

所以他们就说:别想了。

So they were like, don't think.

Speaker 1

然后模型就会想:哦,我应该思考。

And then the model's like, oh, I should think.

Speaker 1

他们只是开始让它思考。

They just started thinking.

Speaker 1

所以我们不得不调整一下,让‘别想了’不再触发它。

And so we had to kind of tune it out, so don't think didn't trigger it.

Speaker 1

但那时仍然不够明显,于是我们做了个用户体验改进,来高亮显示思考状态。

But then it still wasn't obvious, so then we made a UX improvement to highlight the thinking Yeah, side yeah.

Speaker 1

那样做真的很有趣,感觉特别神奇。

Of that so fun and it felt really magical.

Speaker 2

当你使用超思考时,就像彩虹一样

When you do ultra think, it's like rainbow or

Speaker 1

随便什么吧,对,没错。

whatever, Yeah, exactly.

Speaker 1

在 Sono 4.5 中,我们发现开启扩展思考后性能有了显著提升。

And then with Sono 4.5, we actually find a really, really big performance improvement when you turn on extended thinking.

Speaker 1

所以我们让它很容易切换,因为有时候你需要它,有时候不需要——对于简单任务,你不想让模型思考五分钟,你只想让它直接完成任务。

And so we made it really easy to toggle it because sometimes you want it, sometimes you don't because you kind of, for a really simple task, you don't want the model to think for like five minutes, you want it to just do the thing.

Speaker 1

所以我们用 Tab 键来切换这个功能,然后移除了大量与思考相关的文字。

And so we used tab as the interaction to toggle it, and then we unshipped a bunch of the thinking words.

Speaker 1

不过我觉得我们还是保留了‘超思考’,纯粹是出于情感原因。

Although I think we kept ultrathink just for sentimental reasons.

Speaker 1

那真是个很棒的用户体验。

It was such a cool UX.

Speaker 2

有意思。

Interesting.

Speaker 2

你觉得有没有什么新的指标是关于你删除的内容的?

Do you think there's some new metric that's about what you deleted?

Speaker 2

我觉得程序员一直觉得删除大量代码会让人感觉特别好,但如今因为构建东西变得如此迅速,删除代码也变得同等重要。

And I think programmers have always felt like deleting a bunch of code feels really good, but there's something about because you can build stuff so fast, it becomes more important to also delete stuff.

Speaker 1

我觉得我最喜欢看到的 diff 是红色的 diff。

I think my favorite kind of diff to see is a red diff.

Speaker 1

这太棒了。

This is the best.

Speaker 1

比如当我收到一个时,我会说,好啊,来吧,再来一个,再来一个。

Like when I receive one, I'm like, yeah, bring it on, another one, another one.

Speaker 1

但这很难,因为任何你发布的东西,都有人在使用。

But it's hard because anything you ship, people are using it.

Speaker 1

所以你必须让所有人满意。

And so you've got to keep people happy.

Speaker 1

我认为总体上我们的原则是,如果我们移除了某个功能,就必须推出一个更好、更能满足用户需求的新功能来替代它。

And I so think generally our principle is if we unship something, we need to ship something even better that people can take advantage of, that kind of matches that intent even better.

Speaker 1

而且,这其实又回到了如何衡量四倍代码及其影响的问题上。

And yeah, think this is kind of back to how do you measure quad code and the impact of it?

Speaker 1

这是每个公司、每个客户都会问我们的事情。

And this is something every company, every customer asks us about.

Speaker 1

我觉得在Anthropic内部,自一月以来我们的规模大概翻了一倍左右。

I think so internally at Anthropic, think we doubled in size since January or something like that.

Speaker 1

但在这段时间里,每位工程师的生产力却提升了近70%。

But then productivity per engineer has increased almost 70% in that time.

Speaker 2

怎么衡量的?

Measured by?

Speaker 1

我觉得我们实际上从几个方面进行了衡量,但PR是最简单也是最主要的指标。

I think we actually measured it in a few ways, but kind of PRs are the simplest one and the main one.

Speaker 1

但正如你所说,这并不能完全反映全部情况。

But like you said, this doesn't capture the full extent of it.

Speaker 1

因为很多情况下,原型开发变得更简单了,尝试新想法也更容易了,那些过去因为优先级太低而根本不会去做的功能,现在你都能一一实现。你本来有个愿望清单,现在因为太容易了,就全都做了,以前根本不可能做到。

Because a lot of this is it easier to prototype, it easier to try new things, making it easier to These things that you never would have tried because they're way below the cut line, you're launching a feature and there's this kind of wish list of stuff, now you just do all of it because it's so easy and just wouldn't have done it.

Speaker 1

所以,真的很难谈论这个。

So yeah, it's really hard to talk about it.

Speaker 1

然后还有另一方面,代码写得越多,你就得删除越多的代码。

And then there's this flip side of it, where more code is written, so you have to delete more code.

Speaker 1

你必须更仔细地进行代码审查,并尽可能自动化代码审查过程。

You have to code review more carefully and automate code review as much as you can.

Speaker 2

还有一个有趣的新产品管理挑战,因为你能发布太多功能,导致整体体验反而不够连贯,因为你可能只是在这里加个按钮、那里加个标签、再加点小东西。

There's also an interesting new product management challenge, because you can ship so much that it ends up not feeling as cohesive, because you could just add a button here and a tab there and a little thing here.

Speaker 2

要构建一个包含所有你想要功能但缺乏任何组织原则的产品变得容易多了,因为你一直在不停地发布各种功能。

It's much easier to build a product that has all the features you want but doesn't have any sort of organizing principle because you're just shipping lots of stuff all the time.

Speaker 0

我认为我们在这方面相当有纪律,确保所有抽象概念都足够清晰,即使某人只是听到功能的名字也能理解。

I think we try to be pretty disciplined about this and making sure that all the abstractions are really easy to understand for someone, even if they just hear the name of the feature.

Speaker 0

我们有一个原则,我相信是Boris带给团队的,我非常认同:我们不希望有新的用户界面。

We have this principle that I believe Boris brought to the team that I really like, where we don't want a new user experience.

Speaker 0

所有功能都应该足够直观,让用户一进来就能顺利使用。

Everything should be so intuitive that you just drop in and it just works.

Speaker 0

我认为这为确保每个功能都极其直观设定了很高的标准。

And I think that's really set the bar really high for making sure every feature is really intuitive.

Speaker 2

你们如何在对话式界面中做到这一点?

How do you do that with a conversational UI?

Speaker 2

因为当没有一堆按钮和旋钮,而一开始只是一个空白的文本框时,你们如何思考如何让它直观?

Because when there's not a bunch of buttons and knobs and it's just a blank text box to start, how do you think about making it intuitive?

Speaker 0

我们做了很多小细节。

There's a lot of little things that we do.

Speaker 0

我们教用户可以使用问号来查看提示。

We teach people that they can use the question mark to see tips.

Speaker 0

当 QuadCode 工作时,我们会显示提示。

We show tips as QuadCode is working.

Speaker 0

我们会在侧边显示变更日志。

We have the change log on the side.

Speaker 0

我们会告诉你,哦,有一个新模型发布了。

We tell you about, oh, there's a new model that's out.

Speaker 0

或者我们在底部显示一个用于思考的通知区域。

Or we show you at the bottom, we have a notification section for thinking.

Speaker 0

我认为我们只是通过一些微妙的方式向用户提示功能。

I think there's just subtle ways in which we tell users about features.

Speaker 0

我认为另一件非常重要的事情是确保所有基础组件都定义得非常清晰。

I think the other thing that's really important is to just make sure that all the primitives are very clearly defined.

Speaker 0

在开发者生态系统中,钩子具有共同的含义。

Hooks have a common meaning in the developer ecosystem.

Speaker 0

插件在开发者生态系统中也有非常普遍的含义,我们要确保我们构建的内容与普通开发者听到这些词时立即想到的一致。

Plugins have a very common meaning in the developer ecosystem, and just making sure that what we build matches what the average developer would immediately think of when they hear that.

Speaker 1

还有一种渐进式披露的方式。

There's also this progressive disclosure thing.

Speaker 1

在QuadCode中每次运行时,你可以按Ctrl+O查看完整的原始转录内容,也就是模型看到的内容。

Any time in quad code when you run it, can hit control o to see the full raw transcript, the same thing the model sees.

Speaker 1

我们只有在真正相关时才会向你展示这些内容。

And we don't show you this until it's actually relevant.

关于 Bayt 播客

Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。

继续浏览更多播客