The Secrets of Claude Code: From the Engineers Who Built It

Episode description

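The team at Every says Claude Code has completely changed how they work. They can now ship to codebases they barely know, every new feature makes the next one easier to build, and even non-technical colleagues use the terminal with confidence. To understand this shift, AI & I host Dan Shipper sat down with the creators of Claude Code, Cat Wu (@_catwu) and Boris Cherny (@bcherny) of Anthropic, to talk about what they have learned building this widely loved AI engineering tool. Technical or not, this episode is a guide to the core ways of using Claude Code.

If you enjoyed this episode, please like, subscribe, comment, and share. Want more? Sign up for Every to get the Ultimate Guide to Prompting ChatGPT: https://every.ck.page/ultimate-guide-to-prompting-chatgpt. It is usually only for paying subscribers, but you can get it here for free.

To hear more from Dan Shipper:
Subscribe to Every: https://every.to/subscribe
Follow him on X: https://twitter.com/danshipper

Build your first AI app at ai.studio/build.

Timestamps:
00:00:00 - Start
00:01:26 - Introduction
00:02:25 - The origin story of Claude Code
00:07:03 - How Anthropic uses Claude Code internally
00:14:06 - Boris and Cat's favorite slash commands
00:15:49 - How Boris uses Claude Code to plan feature development
00:21:53 - Everything Anthropic has learned about using sub-agents well
00:26:16 - Using Claude Code to turn past code into leverage
00:33:14 - The product decisions behind a simple, powerful agent
00:36:38 - Making Claude Code accessible to non-technical users
00:45:12 - The next form factor for AI coding

Resources mentioned in this episode:
Cat Wu: @_catwu
Boris Cherny: @bcherny
Claude Code: https://www.claude.com/product/claude-code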

Transcript

Speaker 0

What made it work really well is that Claude Code has access to everything that an engineer does at the terminal. Everything you can do, Claude Code can do. There's nothing in between.

Speaker 1

There's actually an increasing number of people internally at Anthropic that are using a lot of credits, like spending over a thousand bucks every month. We see this power user behavior. This is something that they teach in YC: if you can solve your own problem, it's much more likely you're solving the problem for others. There's this really old idea in product called latent demand.

Speaker 1

You build a product in a way that is hackable, that is open-ended enough that people can abuse it for other use cases it wasn't really designed for, and you build for that because you know there is demand for it.

Speaker 2

Do you think the CLI is the final form factor? Are we gonna be using Claude Code in the CLI primarily in a year or in three years, or is there something else that's better?

Speaker 3

This podcast is sponsored by Google. Hey, folks. I'm Omar, product and design lead at Google DeepMind. We just launched a revamped vibe coding experience in AI Studio that lets you mix and match AI capabilities to turn your ideas into reality faster than ever. Just describe your app, and Gemini will automatically wire up the right models and APIs for you.

Speaker 3

And if you need a spark, hit I'm Feeling Lucky, and we'll help you get started. Head to ai.studio/build to create your first app.

Speaker 2

Cat, Boris, thank you so much for being here.

Speaker 0

Thanks for having us.

Speaker 2

Yeah. So for people who don't know you, you are the creators of Claude Code. Thank you very much from the bottom of my heart. I love Claude Code.

Speaker 1

That's amazing to hear. That's what we love to hear.

Speaker 2

Okay, I think the place I want to start is when I first used it. There was this moment, I think it was around when Sonnet 3.7 came out, where I used it and I was like, holy shit, this is a completely new paradigm, it's a completely new way of thinking about code. And the big difference was you went all the way and just eliminated the text editor, and all you do is talk to the terminal and that's it. In previous paradigms of AI programming, previous harnesses, you have a text editor and you have the AI on the side, and it's kind of like a tab complete. So take me through that decision process, that process of architecting this new paradigm. How do you think about that?

Speaker 1

Yeah, I think the most important thing is it was not intentional at all. We sort of ended up with it. At the time when I joined Anthropic, we were still on different teams. There was this predecessor to Claude Code. It was called Clyde, like C-L-I-D-E. And it was this research project.

Speaker 1

It took a minute to start up. It was this really heavy Python thing. It had to run a bunch of indexing and stuff. And when I joined, I wanted to ship my first PR. And I hand-wrote it like a noob. I didn't know about any of these tools.

Speaker 2

Thank you for admitting that.

Speaker 1

I didn't know any better. And then I put up this PR, and Adam Wolf, who was the engineering manager for our team for a while, he was my ramp-up buddy, and he just rejected the PR. He was like, you wrote this by hand, what are you doing? Use Clyde. Because he was also hacking a lot on Clyde at the time. And so I tried Clyde. I gave it the description of the task and it just one-shotted this thing. And this was, you know, Sonnet 3.5, so I still had to fix the thing even for this kind of basic task. And the harness was super old, so it took like five minutes to turn this thing out. It just took forever.

Speaker 1

But it worked, and I was just mind-blown that this was even possible. And this just kind of got the gears turning. Maybe you don't actually need an IDE. And then later on I was prototyping using the Anthropic API, and the easiest way to do that was just building a little app in the terminal, because that way I didn't have to build a UI or anything. And I started just making a little chat app.

Speaker 1

And then I just started thinking maybe we could do something a little bit like Clyde, so let me build a little Clyde. And it actually ended up being a lot more useful than that without a lot of work. And I think the biggest revelation for me was when we started to give the model tools, it just started using tools. It was this insane moment. Like, the model just wants to use tools.

Speaker 1

Like we gave it Bash and it just started using Bash, writing AppleScript to automate stuff in response to questions. And I was like, this is just the craziest thing, I've never seen anything like this. Because at the time I had only used IDEs with, you know, text editing, a little one-line autocomplete, multi-line autocomplete, whatever. So that's where this came from: this kind of convergence of prototyping, but also seeing what's possible in a very rough way. And this thing ended up being surprisingly useful.

Speaker 1

And I think it was the same for us. I think for me it was kind of Sonnet 4, Opus 4, that's where that magic moment was. It was like, my god, this thing works.

Speaker 2

That's interesting. Tell me about the tool moment, because I think that is one of the special things about Claude Code: it just writes Bash and it's really good at it. And I think with a lot of previous agent architectures, or even anyone building agents today, your first instinct might be, okay, we're going to give it a find file tool, and then we're going to give it an open file tool, and you build all these custom wrappers for all the different actions you might want the agent to take. But Claude Code just uses Bash and it's really good at it. How do you think about what you learned from that?

Speaker 1

Yeah, I think at this point Claude Code actually has a bunch of tools. I think it's like a dozen or something like this. We actually add and remove tools most weeks, so this changes pretty often. But today there actually is a tool for searching. And we do this for two reasons.

Speaker 1

One is the UX, so we can show the result a little bit nicer to the user, because there's still a human in the loop right now for most tasks. And the second one is for permissions. So if you say in your Claude Code settings.json, "this file you cannot read," we have to enforce this. We enforce it for Bash, but we can do it a little bit more efficiently if we have a specific search tool.

Speaker 1

But definitely we want to unship tools and kind of keep it simple for the model. Last week or two weeks ago we unshipped the LS tool, because in the past we needed it, but then we actually built a way to enforce this kind of permission system for Bash. So in Bash, if we know that you're not allowed to read a particular directory, Claude's not allowed to ls that directory. And because we can enforce that consistently, we don't need this tool anymore. And this is nice, because it's a little less choice for Claude, a little less stuff in context.
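The Bash-level permission check described here can be pictured with a minimal Python sketch. This is not Claude Code's implementation: the function names `is_path_denied` and `command_is_allowed`, the deny-list format, and the purely lexical path matching are all assumptions made for illustration.

```python
import posixpath
import shlex


def is_path_denied(path: str, denied_dirs: list[str]) -> bool:
    """Return True if `path` is a denied directory or lies inside one.

    Paths are normalized lexically, so `a/../b` collapses to `b`.
    Symlinks and relative paths are NOT resolved in this sketch.
    """
    target = posixpath.normpath(path)
    for denied in denied_dirs:
        root = posixpath.normpath(denied)
        if target == root or target.startswith(root.rstrip("/") + "/"):
            return True
    return False


def command_is_allowed(command: str, denied_dirs: list[str]) -> bool:
    """Hypothetical gate for a simple command such as `ls <path>`.

    Every non-flag argument is treated as a path and checked
    against the deny list before the command would be run.
    """
    args = shlex.split(command)[1:]
    paths = [arg for arg in args if not arg.startswith("-")]
    return not any(is_path_denied(p, denied_dirs) for p in paths)
```

With a check like this applied to every Bash call, a dedicated listing tool adds little, which matches the reasoning given for removing it.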

Speaker 2

Got it. And how do you guys split responsibility on the team?

Speaker 0

I would say Boris sets the technical direction and has been the product visionary for a lot of the features that we've come out with. I see myself as more of a supporting role to make sure that, one, our pricing and packaging resonates with our users. Two, making sure that we're shepherding all our features across the launch process, from deciding, all right, these are the prototypes that we should definitely ant food, to setting the quality threshold for ant fooding, through to communicating that to our end users. And there are definitely some new initiatives that we're working on. I would say historically a lot of Claude Code has been built bottoms-up. Boris and a lot of the core team members have just had these great ideas for to-do lists, sub-agents, hooks. All of these are bottoms-up.

Speaker 0

As we think about expanding to more services and bringing Claude Code to other places, I think a lot of those are more like, all right, let's talk to customers. Let's bring engineers into those conversations and prioritize those services and knock them out.

Speaker 2

Got it. What is antfooding?

Speaker 0

Oh, antfooding is

Speaker 2

Oh, antfooding?

Speaker 0

Oh, it means dogfooding.

Speaker 2

Anthropic ants. Yeah. I got it.

Speaker 0

Yeah, our nickname for internal employees is ant. And so ant fooding is our version of dogfooding. Internally, I think over 70 or 80% of ants, technical Anthropic employees, use Claude Code every day. And so every time we are thinking about a new feature, we push it out to people internally, and we get so much feedback. We have a feedback channel.

Speaker 0

I think we get a post every five minutes. And so you get really quick signal on whether people like it, whether it's buggy, or whether it's not good and we should unship it.

Speaker 2

You can tell. You can tell that someone who is building stuff is using it all the time to build it, because its ergonomics just make sense if you're trying to build stuff, and that only happens if you're ant fooding. Yeah, and I think that's a really interesting paradigm for building new stuff, that sort of bottoms-up, I-make-something-for-myself approach. Tell me about that.

Speaker 1

Yeah, and Cat is also so humble. I think Cat has a really big role in the product direction, and it comes from everyone on the team. These specific examples actually came from everyone on the team. To-do lists and sub-agents, that was Sid. Hooks, Dixon shipped that. Plugins, Daisy shipped that. So these ideas come from everyone on the team.

Speaker 1

And so I think for us, we build this core agent loop and this kind of core experience, and then everyone on the team uses the product all the time, and so does everyone outside the team. And so there are just all these chances to build things that serve these needs. For example, bash mode, you know, the exclamation mark, and then you can type in bash commands. This was many months ago. I was using Claude Code and I was going back and forth between two terminals and just thought it was kind of annoying. And just on a whim, I asked Claude to kind of think of ideas, and it thought of this exclamation mark bash mode.

Speaker 1

And then I was like, great, make it pink and then ship it. It just did it. And that's the thing that has still kind of persisted. And now you see others also kind of catching on to that.

Speaker 2

That's funny. I actually didn't know that. And that's extremely useful, because I always have to open up a new tab to run any bash commands. So you just do an exclamation point, and then it just runs it directly instead of filtering it through all the Claude stuff.

Speaker 0

Yeah, and Claude Code sees the full output too.

Speaker 2

Interesting, that's perfect.

Speaker 0

So anything you see in the Claude Code view, Claude Code also sees.

Speaker 2

Okay, that's really interesting.

Speaker 1

And this is kind of a UX thing that we're thinking about. In the past, tools were built for engineers, but now it's equal parts engineers and model. And so as an engineer you can see the output, but it's actually quite useful for the model also. And this is part of the philosophy also: everything is dual use. So for example, the model can also call slash commands.

Speaker 1

So, you know, I have a slash command for slash commit, where I run through a few different steps like diffing and generating a reasonable commit message and this kind of stuff. I run it manually, but Claude can also run this for me. And this is pretty useful because we get to share this logic. We get to define this tool, and then we both get to use it.

Speaker 2

Yeah. What are the differences in designing tools that are dual use from designing tools that are used by one or the other?

Speaker 1

Surprisingly, it's the same so far. I sort of feel like this kind of elegant design for humans translates really well to the models.

Speaker 2

So you're just thinking about what would make sense to you, and generally, if it makes sense to you, it makes sense to the model too.

Speaker 0

Yeah, I think one of the really cool things about Claude Code being a terminal UI, and what made it work really well, is that Claude Code has access to everything that an engineer does at the terminal. And when it comes to whether the tools should be dual use or not, I think making them dual use actually makes the tools a lot easier to understand. It just means that, okay, everything you can do, Claude Code can do. There's nothing in between.

Speaker 2

That's interesting. There are a couple of those decisions. No code editor. It's in the terminal, so it has access to your files. It's on your computer versus in the cloud in a virtual machine.

Speaker 2

So you get to use it in a repeated way where you can build up your CLAUDE.md file or build slash commands and all that kind of stuff, where it becomes very composable and extensible from a very simple starting point. And I'm curious how you think about this for people who are thinking, okay, I want to build an agent, probably not Claude Code, but something else: how do you get that simple package that can then extend and be really powerful over time?

Speaker 1

For me, I'd start by just thinking about it like developing any kind of product, where you have to solve the problem for yourself before you can solve it for others. And this is something that they teach in YC: you have to start with yourself. So if you can solve your own problem, it's much more likely you're solving the problem for others. And I think for coding, starting locally is the reasonable thing. And, you know, now we have Claude Code on the web, so you can also use it with a virtual machine and you can use it in a remote setting, and this is super useful when you're on the go and you want to take some of that from your... And this is sort of, we started proving this out kind of a step at a time, where you can do @claude in GitHub. And I use this every day. Like on the way to work, I'm at a red light, I probably shouldn't be doing this, but I'm on GitHub at a red light.

Speaker 1

And then I'm like, @claude, fix this issue or whatever. And so it's just really useful to be able to control it from your phone. And this kind of proves out this experience. I don't know if this necessarily makes sense for every kind of use case. For coding, I think starting local is right.

Speaker 1

I don't know if this is true for everything.

Speaker 2

Got it. What are the slash commands you guys use?

Speaker 0

Slash PR commit.

Speaker 1

Yeah.

Speaker 0

Yeah. I think the PR commit slash command makes it a lot faster for Claude to know exactly what bash commands to run in order to make a commit.

Speaker 2

And what does the PR commit slash command do, for people who aren't familiar?

Speaker 0

Oh, it just tells it exactly how to make a commit.

Speaker 2

Okay.

Speaker 0

And you can say, okay, these are the three bash commands that need to be run.

Speaker 1

Got it. And what's pretty cool is we also have this kind of templating system built into slash commands. So we actually run the bash commands ahead of time. They're embedded into the slash command. And you can also pre-allow certain tool invocations.
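The idea of running commands ahead of time and embedding their output in the prompt can be sketched in a few lines of Python. The `{{sh: ...}}` placeholder syntax, `render_command`, and `run_shell` are hypothetical names invented for this sketch; they are not the syntax or API that Claude Code uses.

```python
import re
import subprocess
from typing import Callable

# Hypothetical placeholder syntax for this sketch: {{sh: <command>}}
PLACEHOLDER = re.compile(r"\{\{sh:(.+?)\}\}")


def run_shell(command: str) -> str:
    """Run a shell command and return its trimmed standard output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=False
    )
    return result.stdout.strip()


def render_command(template: str, runner: Callable[[str], str] = run_shell) -> str:
    """Replace each placeholder with the output of its command.

    The commands run once, before the prompt reaches the model,
    so the model starts with the results already in context.
    """
    return PLACEHOLDER.sub(lambda m: runner(m.group(1).strip()), template)
```

Passing the runner as a parameter keeps the sketch testable without touching a real shell. A template such as `Changes:\n{{sh: git diff --stat}}\nWrite a commit message.` would arrive at the model with the diff summary already filled in.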

Speaker 1

So for that slash command we say allow git commit, git push, gh pr, and so you don't get asked for permission after you run the slash command, because we have a permission-based security system. And then it also uses Haiku, which is pretty cool. So it's kind of a cheaper model and faster. Yeah, and for me, I use commit, commit PR, and feature dev. We use that one a lot. Sid created it. It's kind of cool. It walks you through building something step by step.
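Pre-allowing a few command prefixes can be pictured as a simple allowlist check. The sketch below is an assumption-laden illustration, not Claude Code's permission system: `PREALLOWED`, `needs_permission_prompt`, and the conservative handling of shell metacharacters are all invented here.

```python
import shlex

# Prefixes named in the conversation; the list format is an assumption.
PREALLOWED = [["git", "commit"], ["git", "push"], ["gh", "pr"]]

# Any of these characters could chain or substitute commands, so the
# sketch always falls back to asking the user when one is present.
SHELL_METACHARACTERS = set(";|&$`<>\n")


def needs_permission_prompt(command: str, preallowed=PREALLOWED) -> bool:
    """Return True unless `command` starts with a pre-allowed prefix."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return True
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: be conservative and ask.
        return True
    return not any(argv[: len(prefix)] == prefix for prefix in preallowed)
```

The metacharacter check matters: without it, `git commit -m x && rm -rf build` would match the `git commit` prefix and slip through unprompted.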

Speaker 1

So we prompt Claude like, first ask me what exactly I want, like build the specification. And then build a detailed plan, and then make a to-do list and walk through it step by step. So it's kind of like more structured feature development. And then I think the last ones that we probably use a lot: we use security review for all of our PRs, and then also code review. So Claude does all of our code review internally at Anthropic.

Speaker 1

You know, there's still a human approving it, but Claude does kind of the first step in code review. That's just a slash code review slash command.

Speaker 2

Got it. Yeah, I would love to go deeper into how you make a good plan, so the sort of feature dev thing. Because I think there are a lot of little tricks that I'm starting to find, or people at Every are starting to find, that work. And I'm curious what we're missing. So for example, one unintuitive step of the plan development process is, even if I don't exactly know what the thing that needs to be built is, I just have a little sentence in my mind, like I want feature X.

Speaker 2

I have Claude just implement it, without giving it anything else, and I see what it does, and that helps me understand, okay, here's actually what I mean, because it made all these different mistakes, or it did something that I didn't expect that might be good. And then I use the learning from the throwaway development, and just clear it out, and that helps me write a better plan spec for the actual feature development. That is something you would never do before, because it'd be too expensive to just YOLO send in an engineer on a feature that you hadn't actually specced out. But because you have Claude going through your codebase and doing stuff, you can learn stuff from it that helps inform the actual plan that you make.

Speaker 1

Yeah, maybe I can start, and I'm curious how you use it too. I think there are a few different modes for me. One is prototyping mode. In traditional engineering prototyping, you want to build the simplest possible thing that touches all the systems, just so you can get a vague sense of what the systems are and where the unknowns are, just to kind of trace through everything. And so I do the exact same thing as you, Dan.

Speaker 1

Claude just does the thing, and then I see where it messes up, and then I'll ask it to just throw it away and do it again. Just hit escape twice, go back to the old checkpoint, and then try again. I think there are also maybe two other kinds of tasks. One is things that Claude can one-shot, where I feel pretty confident it can do it, so I'll just tell it, and then I'll shift-tab to auto-accept and go to a different tab and do something else, or go to another one of my Claudes and tend to that while it does this. But there's also this kind of harder feature development. These are things where in the past it would have taken maybe a few hours of engineering time. For this, usually I'll shift-tab into plan mode and then align on the plan first, before it even writes any code. And I think what's really hard about this is the boundary changes with every model in kind of a surprising way.

Speaker 1

The newer models are more intelligent, so the boundary of what you need plan mode for got pushed out a little bit. Before, you used to need to plan; now you don't. And I think this is a general trend: stuff that used to be scaffolding, with a more advanced model, gets pushed into the model itself, and the model kind of tends to subsume everything over time.

Speaker 2

Yeah. How do you think about building an agent harness where you're not spending a bunch of time building stuff that is just going to be subsumed by the model in three months when the new Claude comes out? How do you know what to build versus when you just say, it doesn't work quite yet, but next time it's going to work, so we're not going to spend time on it?

Speaker 0

I think we build most things that we think would improve Claude Code's capabilities, even if that means we'll have to get rid of it in three months. If anything, we hope that we will get rid of it in three months. I think for now, we just want to offer the most premium experience possible, so we're not too worried about throwaway work.

Speaker 1

Interesting. Yeah. And an example of this is something like plan mode itself. I think we'll probably unship it at some point, when Claude can just figure out from your intent that you probably want to plan first. Or, for example, I just deleted 2,000 tokens or something from the system prompt yesterday.

Speaker 1

It feels like Sonnet 4.5 doesn't need it anymore. But Opus 4.1 did need it.

Speaker 2

What about the case where the latest frontier model doesn't need it, but you're trying to figure out how to make it more efficient because you have so many users that maybe you're not going to use Opus or Sonnet 4.5 for everything? Maybe you're going to use Haiku. So there's a trade-off between having a more elaborate harness for Haiku versus just not spending time on it, using Sonnet, eating the cost, and working on more frontier-type stuff.

Speaker 0

In general, we've positioned Claude Code to be a very premium offering. So our North Star is making sure that it works incredibly well with the absolute most powerful model we have, which is Sonnet 4.5 right now. We are investigating how to make it work really well for future generations of models, but it's not the top priority for us.

Speaker 2

Okay. One thing that I notice is we get models often, and thank you very much for that, we get models a lot before they come out, and it's our job to kind of figure out whether they're any good. Over the last six months, when I'm testing Claude, for example, in the Claude app, with a new frontier model, it's actually very hard to tell immediately whether it's better. But it's really easy to tell in Claude Code, because the harness matters a lot for the performance that you get out of the model.

Speaker 2

And you guys have the benefit of building Claude Code inside of Anthropic. So there's a much tighter integration between the fundamental model training and the harness that you're building. They seem to really impact each other. So how does that work internally, and what are the benefits you get from having that tight integration?

Speaker 1

Yeah, I think the biggest thing is researchers just use this. And so as they see what's working and what's not, they can improve stuff. We do a lot of evals and things like that to kind of communicate back and forth and understand where exactly the model's at. But yeah, there's this frontier where you need to give the model a hard enough task to really push the limit of the model. And if you don't do this, then all models are kind of equal.

Speaker 1

But if you give it a pretty hard task, you can tell the difference.

Speaker 2

What sub-agents do you use?

Speaker 1

I have a few. I have a planner sub-agent that I use. I have a code review sub-agent. Code review is actually something where sometimes I use a sub-agent and sometimes I use a slash command. So usually in CI it's a slash command, but in synchronous use I use a sub-agent for the same thing.

Speaker 1

It's a good question, yeah. Maybe it's a matter of taste. I don't know. I think maybe when you're running synchronously, it's kind of nice to fork off the context window a little bit, because all the stuff that's going on in the code review is not relevant to what I'm doing next. But in CI, it just doesn't matter.

Speaker 2

Are you ever spawning like 10 sub-agents at once, and for what?

Speaker 1

For me, I do it mostly for big migrations.

Speaker 2

Okay.

Speaker 1

这就像是最重要的事情。实际上我们有这样一个coderucommand工具，里面包含了许多子代理。其中一个步骤就是查找所有问题。有一个子代理负责检查QuadMD合规性，另一个子代理会查看git历史记录了解情况，还有一个子代理专门寻找明显的错误。之后我们会进行去重和质量检查的步骤。

That's like the big thing. Actually, this code review command that we use, there's a bunch of sub agents there. And so one of the steps is like, find all the issues. And so there's one sub agent that's checking for CLAUDE.md compliance, there's another sub agent that's looking through git history to see what's going on, another sub agent that's looking for kind of obvious bugs. And then we do this kind of deduping quality step after.

Speaker 1

它们会找出很多问题，其中很多是误报。于是我们又派生出大约五个子代理，这些子代理专门检查误报。最终的结果非常棒。

So they find a bunch of stuff. A lot of these are false positives. And so then we spawn like five more sub agents. And these are all just like checking for false positives. And in the end, the result is awesome.

Speaker 1

它能找出所有真正的问题而不会包含误报。

It finds like all the real issues without the false issues.
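The review flow described above (several finders in parallel, a dedupe step, then a second wave that filters false positives) can be sketched roughly as follows. This is a minimal sketch, not Anthropic's implementation: the `Finding` type, the toy finders, the three verifiers, and the majority-vote rule are all assumptions made for illustration, and plain Python functions stand in for the sub agents.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    message: str


# Hypothetical stand-ins for sub agents. A finder reads a diff and reports
# findings; a verifier returns True when it believes a finding is real.
Finder = Callable[[str], List[Finding]]
Verifier = Callable[[Finding], bool]


def review(diff: str, finders: Iterable[Finder], verifiers: List[Verifier]) -> List[Finding]:
    finders = list(finders)
    # Step 1: fan out, one independent finder per concern.
    with ThreadPoolExecutor(max_workers=max(1, len(finders))) as pool:
        batches = list(pool.map(lambda find: find(diff), finders))
    # Step 2: dedupe findings reported more than once, keeping first-seen order.
    unique = list(dict.fromkeys(f for batch in batches for f in batch))

    # Step 3: keep only findings that a strict majority of verifiers confirm.
    def confirmed(finding: Finding) -> bool:
        votes = sum(1 for verify in verifiers if verify(finding))
        return votes * 2 > len(verifiers)

    return [f for f in unique if confirmed(f)]


# Toy finders and verifiers (all hypothetical) so the sketch runs end to end.
def guideline_finder(diff: str) -> List[Finding]:
    return [Finding("app.py", 3, "print used instead of logger")] if "print(" in diff else []


def bug_finder(diff: str) -> List[Finding]:
    found = []
    if "print(" in diff:
        found.append(Finding("app.py", 3, "print used instead of logger"))  # duplicate report
    if "== None" in diff:
        found.append(Finding("app.py", 7, "compare to None with 'is'"))
    if "TODO" in diff:
        found.append(Finding("app.py", 9, "TODO left in code"))
    return found


def style_skeptic(finding: Finding) -> bool:
    return "TODO" not in finding.message


def history_skeptic(finding: Finding) -> bool:
    return finding.line != 9  # pretend line 9 predates this change


def optimist(finding: Finding) -> bool:
    return True


sample_diff = "print(x)\nif y == None:\n    pass  # TODO"
result = review(sample_diff, [guideline_finder, bug_finder],
                [style_skeptic, history_skeptic, optimist])
```

In this toy run the duplicate `print` report collapses to one finding and the TODO note is voted out, which mirrors the "real issues without the false issues" outcome in miniature.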

Speaker 2

这很棒。我其实也这么做。我的一个非技术性Cloud Code应用场景是费用报销。比如我现在在旧金山，有很多费用需要处理，所以我建立了一个小型云项目，或者说用Cloud Code编写了一个程序，它调用金融API下载我所有的信用卡交易记录，然后判断哪些可能是需要报销的费用。我有两个子代理，一个代表我，一个代表公司，它们会互相博弈来确定最终合理的报销项目。就像一个审计子代理和一个支持我的子代理。

That's great. I actually do that. So one of my non-technical Claude Code use cases is expense filing. So like, I'm in SF right now, so I have all these expenses, and so I built this little Claude project, or rather something in Claude Code, that uses one of these finance APIs to just download all my credit card transactions, and then it decides, like, these are probably the expenses that I'm gonna have to file. And then I have two sub agents, one that represents me and one that represents the company, and they do battle to figure out what's the proper actual set of expenses. It's like an auditor sub agent and a pro-Dan sub agent.

Speaker 2

是的，这类对抗处理模式似乎是个很有趣的方法。

So yeah, that kind of thing, the sort of opponent processor pattern seems to be an interesting one.

Speaker 1

是啊是啊，很酷。我记得当子代理刚开始兴起时，真正启发我们的是很久以前Reddit上的一个帖子，有人创建了几个子代理，包括前端开发、后端开发，我记得还有个设计师。

Yeah, yeah, it's cool. I feel like when sub agents were first becoming a thing, actually what inspired us, there was a Reddit thread a while back where someone made sub agents for, there was a front end dev and a back end dev and I think a designer.

Speaker 0

测试开发。

Testing dev.

Speaker 1

测试开发，还有个产品经理子代理。这个想法挺可爱的，虽然可能有点过于拟人化。但确实有其价值，关键在于那些互不关联的上下文窗口，两个互不知情的上下文窗口这样组合起来很有意思。而且这种方式往往能获得更好的结果。

Testing dev, there was a PM sub agent. And this is cute, it feels a little maybe too anthropomorphic. Maybe there's something to this, but I think the value is actually the uncorrelated context windows, where you have these two context windows that don't know about each other, and this is kind of interesting. And you tend to get better results this way.

Speaker 2

你呢，Ari？你有什么有趣的子代理在用吗？

What about you, Ari? Do you have any interesting sub agents you use?

Speaker 0

我最近在捣鼓一个特别擅长前端测试的。它用Playwright来检测客户端的所有错误，收集这些错误并尝试测试应用的更多步骤。虽然还不完善，但已经初见成效，我觉得这种工具很适合打包进我们的插件市场。

So, I've been tinkering with one that is really good at front end testing. So, it uses Playwright to see, all right, what are all the errors that are client side, and pull them in, and try to test more steps of the app. It's not totally there yet, but I'm seeing signs of life, and I think it's the kind of thing that we could potentially bundle in one of our plugins marketplaces.

Speaker 2

确实如此。我用过类似的工具，配合Puppeteer，看着它构建内容然后打开浏览器，然后发现问题需要修改，那种感觉就像'天啊'。

Yeah, definitely. I've used something like that just with Puppeteer, and just watching it build something and then open up the browser, and then be like, oh, I need to change this. It's like, this is like, oh my god.

Speaker 0

是啊，真的很酷。

Yeah, it's really cool.

Speaker 1

这真的很酷。我们开始看到这种大规模、多主体子代理的雏形。不知道该怎么称呼它，像是集群之类的。有很多人，实际上Anthropic内部有越来越多的人每月使用大量额度，花费超过1000美元。而且这个人群比例增长得相当快。

It's really cool. We're starting to see the beginnings of this massive, multi sub agent thing. I don't know what to call this, like swarms or something like that. There's a bunch of people, there's actually an increasing number of people internally at Anthropic that are using a lot of credits every month, spending over $1,000 every month. And this percent of people is growing actually pretty fast.

Speaker 1

我认为最常见的用例是代码迁移。他们正在做的是从框架A迁移到框架B，主代理会列出一个庞大的待办事项清单，然后通过一系列子代理进行映射归约。学生们惊讶地发现，可以启动10个代理，每次处理10项，就这样迁移所有内容。

And I think the common use case is code migration. And so what they're doing is framework A to framework B: there's the main agent, it makes a big to-do list for everything, and then it just kind of map-reduces over a bunch of sub agents. You just instruct it like, yeah, start 10 agents and then just go 10 at a time and just migrate all the stuff over.

Speaker 2

这很有趣。你能举个具体例子说明下你所说的迁移类型吗？

That's interesting. What would be like a concrete example of the kind of migration that you're talking about?

Speaker 1

我觉得最经典的例子是代码规范检查规则。当你推出某种新的检查规则时，由于涉及抽象语法树分析过于简单，往往没有自动修复功能。还有就是框架迁移，比如从一个测试框架迁移到另一个，这种情况很常见，因为输出结果很容易验证。

I think the most classic is like lint rules. So there's some kind of lint rule you're rolling out, and there's no auto fixer because AST analysis is kind of too simplistic for it. I think other stuff is like framework migrations, say migrating from one testing framework to a different one. That's a pretty common one where it's super easy to verify the output.
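The "big to-do list, then 10 at a time" migration pattern is essentially a bounded parallel map followed by a reduce into one report. The sketch below shows only that shape. The names `migrate_all`, `migrate_one`, and `failed` are hypothetical, and `migrate_one` is a toy function standing in for a sub agent that would migrate one file and verify the result (for example by running that file's tests).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def migrate_all(todo: List[str], migrate_one: Callable[[str], bool],
                max_workers: int = 10) -> Dict[str, bool]:
    """Map: run migrate_one over every item, at most max_workers at a time.
    Reduce: fold the per-item results into a single report."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(migrate_one, todo))
    return dict(zip(todo, results))


def failed(report: Dict[str, bool]) -> List[str]:
    """Items the main agent would need to retry or hand back to a human."""
    return sorted(path for path, ok in report.items() if not ok)


# Toy stand-in for a sub agent: pretend only *.test.js files migrate cleanly.
def migrate_one(path: str) -> bool:
    return path.endswith(".test.js")


report = migrate_all(["a.test.js", "b.test.js", "legacy.spec"], migrate_one)
leftovers = failed(report)
```

The per-item boolean matters because, as noted above, these migrations work well precisely when the output of each step is easy to verify.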

Speaker 2

我发现的一个现象是——这既适用于Every内部项目也适用于开源项目——如果你正在开发产品并想实现一个已有先例的功能。比如内存管理就是个很多人需要实现的例子。因为我们内部有多个不同产品，你可以直接生成云端子代理去查询：这三个产品是怎么实现的？这样就存在隐性代码共享的可能，不需要通过API或询问他人，直接就能知道现有解决方案。

One of the things I found, and this is both for projects inside of Every and just open source projects, is if you're someone building a product and you want to build a feature that's been done before. So maybe an example that people might need to implement a bunch is memory. How do you do memory? Because we have a bunch of different products internally, you can just spawn Claude sub agents to be like, how do these three other products do it? And there's the possibility for just, like, tacit code sharing where you don't need to have an API or you don't need to ask anyone, you can just be like, how do we do this already?

Speaker 2

然后借鉴最佳实践来构建自己的方案。开源项目也能这样操作，因为有很多项目已经钻研内存管理一整年，做得非常出色。你可以分析人们总结出的模式，再决定采用哪些方案。

And then use the best practices to build your own. And you can also do that with open source, because there's tons of open source projects where people have been working on memory for a year, and it's really, really good. And you can be like, what are the patterns that people have figured out, and which ones do I want to implement?

Speaker 0

完全正确。你还可以连接版本控制系统。如果过去开发过类似功能，Cloud Code能直接调用GitHub等API查询历史实现方案，阅读代码并复制相关部分。

Totally. You could also connect to your version control system. If you've built a similar feature in the past, Claude Code can use those APIs, like querying GitHub directly, to find how people implemented a similar feature in the past, and read that code and copy the relevant parts.

Speaker 2

是啊，你有没有发现日志文件有什么用处？比如，这里完整记录了我实现它的过程，把这些交给Claude重要吗？你是如何实现或让它变得有用的？

Yeah, have you found any use for log files of, okay, here's the full history of how I implemented it, and is that important to give to Claude, and how are you implementing that or making it useful for it?

Speaker 0

有些人对此深信不疑。Anthropic有些员工每完成一项任务，都会让Claude Ko按照特定格式写日记条目，记录它做了什么、尝试了什么、为什么没成功。他们甚至开发了能回顾历史记忆并综合成观察结论的智能体。我觉得这就像萌芽状态，这里有些值得产品化的有趣东西，是我们发现的新兴有效模式。单从一份记录一次性提取记忆的难点在于，很难判断某个具体指令对未来所有任务的关联性。

Some people swear by it. There are some people at Anthropic where, for every task they do, they tell Claude Code to write a diary entry in a specific format that just documents, like, what did it do, what did it try, why didn't it work. And then they even have these agents that look over the past memory and synthesize it into observations. I think this is just starting to bud. There's something interesting here that we could productize, but it's a new emerging pattern that we're seeing that works well. I think the hard thing about one-shotting memory from just one transcript is that it's hard to know how relevant a specific instruction is to all future tasks.

Speaker 0

就像我们的典型例子：如果我说把按钮变成粉色，我不希望你今后记住要把所有按钮都变粉。所以我认为从大量日志中综合记忆，能更持续地发现这些模式。

Like our canonical example is, if I say make the button pink, I don't want you to remember to make all buttons pink in the future. And so I think synthesizing memory from a lot of logs is a way to find these patterns more consistently.
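One way to picture "synthesizing memory from a lot of logs" is a recurrence filter: an observation is promoted to memory only if it shows up across several independent task diaries. The sketch below is an assumption-heavy illustration of that idea, not how the Anthropic engineers do it. The diary format, the `synthesize` name, and the `min_tasks` threshold are all hypothetical, and a real version would likely have a model cluster paraphrases rather than match normalized strings.

```python
from collections import Counter
from typing import List


def synthesize(diaries: List[List[str]], min_tasks: int = 2) -> List[str]:
    """Keep observations that appear in at least min_tasks distinct diaries."""
    counts: Counter = Counter()
    for entries in diaries:
        # Count each observation once per diary so one chatty task cannot dominate.
        counts.update({entry.strip().lower() for entry in entries})
    return sorted(obs for obs, n in counts.items() if n >= min_tasks)


diaries = [
    ["Make the button pink", "Run lint before committing"],
    ["run lint before committing"],
    ["Run lint before committing ", "Use the shared date helper"],
]
observations = synthesize(diaries)
```

Here the one-off "make the button pink" instruction never becomes a standing rule, while the habit that recurs in three diaries does.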

Speaker 2

看起来你可能需要某些能自上而下综合或总结的内容。这些内容未来会有用，你会知道哪些抽象层级可能有用。但还有很多情况是，比如任何具体的提交记录——比如把按钮变粉——可能因无数你事先无法预料的原因而有用。所以还需要模型能检索所有类似的历史提交，并在适当时机呈现。这也是你在考虑的吗？

It seems like there are some things where you'll be able to synthesize or summarize in this top-down way. You'll know it will be useful later, and you'll know the right level of abstraction at which it might be useful. But then there's also a lot of stuff where any given commit log, like make the button pink, could be useful for kind of an infinite number of different reasons that you're not going to know beforehand. So you also need the model to be able to look up all similar past commits and surface that at the right time. Is that something that you're also thinking about?

Speaker 1

是的，可能有类似机制。或许可以看作传统记忆存储工作（如MemX）的延伸，就是把所有信息存入系统后变成检索问题。随着模型变得更智能，我看到Sonnet 4.5也开始自然实现这点——当遇到瓶颈时，它会像我们之前讨论的那样自发使用Bash查看git历史记录，然后发现'哦，这倒是个有趣的做法'。

Yeah, I think there could be something like that. Maybe one way to see it is this kind of traditional memory storage work like MemX, where you just want to put all the information into the system and then it's kind of a retrieval problem after that. I think as the model also gets smarter, I've seen it start to naturally do this with Sonnet 4.5, where if it's stuck on something it'll just naturally start, like we talked about before, using Bash spontaneously to look through git history and be like, Oh, okay, yeah, this is kind of an interesting way to do it.

Speaker 2

没错。开始录音前我们聊到，在Every内部推行的工程范式彻底改变了我们的工作方式，因为所有人都被Cloud Code和CLI深度影响了。我们称之为'复合型工程'——传统工程中每个新增功能都会让下一个更难实现，而复合型工程的目标是让每个已完成功能为下一个提供便利。实现方式就是把所有开发过程中的经验教训都编码固化。

Yeah. One of the things that we were talking about before we started recording, one of the things that we're doing inside of Every, I feel like it has really changed the way that we do engineering, because everyone is Claude Code pilled, like CLI pilled. And we have this engineering paradigm that we call compounding engineering, where in normal engineering, every feature you add makes it harder to add the next feature. And in compounding engineering, your goal is to make the next feature easier to build from the feature that you just added. And the way that we do that is we try to codify all the learnings from everything that we've done to build the feature.

Speaker 2

比如我们如何制定计划？哪些部分需要调整？测试时发现了什么问题？遗漏了什么？然后把这些经验编码回所有提示词、子智能体和斜杠命令中，这样下次有人做类似事情时就能自动规避。正因如此，像我这样完全不懂代码逻辑的人，也能快速在我们的代码库中投入生产——因为我们建立了随着开发不断积累的记忆系统。不过这套系统是我们自己搭建的。

So like, how did we make the plan and what parts of the plan need to be changed? Or like, when we started testing it, like what issues did we find, what are the things that we missed? And then we codify them back into all the prompts and all the sub agents and all the slash commands so that the next time when someone does something like this, it catches it, and that makes it easier. That's why for me, for example, I can hop into one of our code bases and start being productive, even though I don't know anything about how the code works, because we have this built up memory system of all the stuff that we've learned as we've implemented stuff. But we've had to build that ourselves.

Speaker 2

我很好奇，你是在开发那种循环功能让云代码自动执行吗？

I'm curious, are you working on that kind of loop so that Claude Code does that automatically?

Speaker 1

是的，我们正在考虑这个问题。有趣的是，菲奥娜也提出了同样的看法。她刚加入团队担任经理，大概有十年没写过代码了。

Yeah, we're starting to think about it. It's funny, we heard the same thing from Fiona. She just joined the team. She's our manager. She hasn't coded in ten years, something like that.

Speaker 1

结果她入职第一天就成功提交了PR。她说不仅通过quad code轻松重拾编程技能，还不用花时间熟悉业务背景。很多用户反馈说经常会给quad code本身提交PR，看到错误就直接说'添加quad，把这个加入quadmd'，这样下次就能自动识别。你可以用多种方式培养这种记忆。

And she was landing PRs on her first day. And she was like, Yeah, not only did I kind of forget how to code and Claude Code made it super easy to just get back into it, but also I didn't need to ramp up on any context because Claude kind of knew all this. And I think a lot of it is about when people put up pull requests for Claude Code itself, and I think our customers tell us that they do similar stuff pretty often: if you see a mistake, you'll just be like, @Claude, add this to CLAUDE.md so that the next time it just knows this automatically. And you can kind of instill this memory in a variety of ways.

Speaker 1

比如你可以说'添加quad，加入quadmd'，也可以说'添加quad，写个测试'。这样能轻松防止功能退化，现在让人写测试也不会不好意思了。

So you can say like, @Claude, add it to CLAUDE.md. You can also say, @Claude, write a test. You know, that's like an easy way to make sure this doesn't regress. And I don't feel bad asking anyone to write tests anymore. It's just super easy.

Speaker 1

我们接近100%的测试都是quad写的。写得差的就不提交，好的就保留。另外lint规则也很重要，对于经常需要强制的规范，我们有很多内部lint规则，这些100%都是quad生成的。

I think probably close to 100% of our tests are just written by Claude. If they're bad, we just won't commit them, and then the good ones stay committed. And then also I think lint rules are a big one. So for stuff that's enforced pretty often, we actually have a bunch of internal lint rules. Claude writes 100% of these.

Speaker 1

基本模式就是在PR里添加Quad，编写lint规则。现在的问题是如何自动化实现？我和Kat的思路是先观察高级用户行为，通过产品可定制化让优秀用户探索新功能，真正的难点是如何把这些推广给所有用户。我自己就属于普通用户群体。

And this is mostly just like, @Claude in a PR, write this lint rule. And yeah, there's sort of this problem right now about how do you do this automatically? And I think generally how Cat and I think about it is we see this power user behavior, and the first step is how do you enable that by making the product hackable so the best users can figure out how to do this cool new thing. But then really the hard work starts of, like, how do you take this and bring it to everyone else? And for me, I count myself in the everyone else bucket.

Speaker 1

比如我其实不太会用VIM，也没有复杂的终端配置，就是很普通的开发环境。所以如果一个功能我能用，那基本说明普通工程师也都能用。

Like I don't really know how to use Vim, I don't have this crazy tmux setup. So I have a pretty vanilla setup. So if you can make a feature that I'll use, it's a pretty good indicator that other average engineers will use it.

Speaker 2

这很有趣。详细说说吧，因为我一直在思考如何打造一个既具有足够扩展性和灵活性，让高级用户能发掘出你意想不到的创新用法，同时又足够简单让任何人都能使用并高效工作的产品，还能将高级用户发现的用法反馈到基础体验中。你是如何考虑这些设计和产品决策来实现这一目标的？

That is interesting. Tell me about that, because that's something I think about all the time is making something that is extensible and flexible enough that power users can find novel ways to use it that you would not have even dreamed of, but it's also simple enough that anyone can use it, they can be productive with it, and you can pull what the power users find back into the basic experience. How do you think about making those design and product decisions so that you enable that?

Speaker 0

总的来说，我们认为每个引擎环境都略有不同，因此系统的每个部分都具备可扩展性至关重要。从状态栏到添加自定义斜杠命令，再到允许在代码几乎任何环节插入确定性逻辑的钩子功能，这些都是我们提供给每位工程师的基础构建模块。插件功能实际上是由团队里的Daisy开发的，它让普通用户能更轻松地将这些斜杠命令和钩子整合到工作流中。通过插件，你可以浏览现有的MCP服务器、钩子、插件，或者应该说现有的斜杠命令，只需在云代码中写一条指令就能将其集成。

In general, we think that every engineering environment is a little bit different from the others, and so it's really important that every part of our system is extensible. So everything from your status line to adding your own slash commands through to hooks, which let you insert a bit of determinism at pretty much any step in Claude Code. So we think these are the basic building blocks that we give to every engineer that they can play with. Plugins was actually built by Daisy on our team, and this is our attempt to make it a lot easier for the average user like us to bring these slash commands and hooks into our workflows. And so what plugins does is it lets you browse existing MCP servers, existing hooks, and existing slash commands, and just lets you write one command in Claude Code to pull that in for yourself.

Speaker 1

产品领域有个古老概念叫'等待需求'，这或许是我个人思考产品和规划下一步的核心方式。这个超级简单的理念是：你打造的产品要具备可改造性，开放到足以让人们将其滥用于非设计初衷的场景。然后观察人们如何滥用它，再据此构建功能，因为你已经知道存在这样的需求。

There's this really old idea in product called latent demand, which I think is probably the main way that I personally think about product and think about what to build next. It's a super simple idea. You build a product in a way that is hackable, that is kinda open ended enough that people can abuse it for other use cases it wasn't really designed for. Then you see how people abuse it and then you build for that because you kinda know there's demand for it. Right.

Speaker 1

在Meta时，我们所有重大产品都是这样打造的。几乎每个大产品都蕴含这种'等待需求'的核心理念。比如Facebook Dating的诞生：当我们分析用户资料浏览数据时，发现60%的查看发生在异性用户之间——这种非好友关系的传统社交场景。于是我们想，或许可以开发个交友产品来满足这个现成的需求。这很有意思。

And when I was at Meta, this is how we built all the big products. I think almost every single big product had this nugget of latent demand in it. For example, something like Facebook Dating came from this idea that when we looked at who looks at people's profiles, I think 60% of views were between people of opposite gender, so kind of like the traditional setup, that were not friends with each other. And so we're like, oh man, okay, maybe if we launch a dating product, it can harness this demand that exists. That's interesting.

Speaker 1

Marketplace的诞生也类似。当时Facebook群组中约40%帖子是交易帖。我们意识到人们已经在用这个产品进行买卖，如果专门为此打造产品很可能成功。现在我们仍延续类似思路，但多了为开发者定制的优势。

And for Marketplace it was pretty similar. I think it was like 40% of posts in Facebook groups at the time were buy/sell posts. And so we're like, Okay, people are trying to use this product to buy and sell. If we just build a product around it, that's probably gonna work. And so we think about it kind of similarly, but also we have the luxury of building for developers.

Speaker 1

开发者热爱改造和定制。作为自己产品的用户，这种特性让构建和使用过程充满乐趣。就像Kat说的，我们只需搭建好扩展点，观察人们如何使用，就能知道下一步该开发什么。

And developers love hacking stuff, they love customizing stuff. And it's like, as a user of our own product, it makes it so fun to build and use this thing. And so, like Cat said, we just built the right extension points, we see how people use it, and that kind of tells us what to build next.

Speaker 0

举个例子，我们收到大量用户请求说：'老兄，云代码总在我要权限，可我正买咖啡呢，根本不知道它在请求权限。能不能改成Slack通知？'于是Dixon开发了钩子功能，现在任何需要Slack提醒的操作都能实现。

Like, for example, we got all these user requests where people are like, Dude, Claude Code is asking me for all these permissions, and I'm out here getting coffee. I don't know that it's asking me for permissions. How can I just get it to ping me on Slack? And so we built hooks, Dixon built hooks, so that people could get pinged on Slack. And you could get pinged on Slack for anything that you want to get pinged on Slack for.

Speaker 0

这很像人们确实渴望拥有某种能力。我们不想自己构建集成，所以开放了Hooks让人们来实现。

And so it was very much like people really wanted the ability to do something. We didn't want to build the integration ourselves, and so we exposed Hooks for people to do that.

Speaker 2

这让我想到你们最近发布的，你们某种程度上将Cloud Code重新定位为更通用的代理SDK。这是否源于你们发现所构建的产品存在更广泛的潜在需求？

The thing that makes me think of is you recently released, you kind of moved or rebranded how you talk about Claude Code to be this more general purpose agent SDK. Was that driven by some latent demand where you saw there's a more general purpose use case for what you built?

Speaker 0

我们意识到，就像你谈到的将Cloud Code用于编码之外的场景一样，这种情况很常见。我们收到大量用户故事，有人用它辅助写博客、管理数据输入并以自己的风格完成初稿。还有人用它构建邮件助手。我个人常用它做市场研究，因为本质上它就像一个可以无限运行的代理，只要给它具体任务并能获取底层数据。比如我曾想研究全球所有公司及其工程师数量并做排名。

We realized that, similar to how you were talking about using Claude Code for things outside of coding, we saw this happen a lot. We get a ton of stories of people who are using Claude Code to help them write a blog and manage all the data inputs and take a first pass in their own tone. We find people building email assistants on this. I use it for a lot of market research because at the core, it's like an agent that can just go on for an infinite amount of time as long as you give it a concrete task and it's able to fetch the right underlying data. So one of the things I was working on was I wanted to look at all the companies in the world and how many engineers they had and to create a ranking.

Speaker 0

虽然这不是传统编码用例，但Quad Code完全可以胜任。所以我意识到底层原语其实非常通用。只要有个能长期运行的代理循环，能访问互联网、编写和运行代码，基本上可以说，只要稍加变通，你就能在上面构建任何东西。

And this is something that Claude Code can do, even though it's not a traditional coding use case. So I realized that the underlying primitives were really general. As long as you have an agent loop that can continue running for a long period of time and you're able to access the internet and write code and run code, pretty much, if you squint, you can kind of build anything on it.

Speaker 1

当我们从Quad Code SDK更名为Quad Agent SDK时，已有数千家公司在使用。其中很多用例与编码无关。所以在内外两方面我们都看到了...

And I think at the point where we rebranded it from the Claude Code SDK to the Claude Agent SDK, there were already many thousands of companies using this thing. And a lot of those use cases were not about coding. So both internally and externally, we kind of saw this.

Speaker 0

比如健康助手、金融分析师、法律助理等，应用范围相当广泛。

Yeah, it's like health assistants, financial analysts, legal assistants. It was pretty broad.

Speaker 2

没错，最酷的用例有哪些？

Yeah, what are the coolest ones?

Speaker 1

我注意到你最近在播客里邀请了Noah Briar。Obsidian的思维导图笔记功能使用案例之多令人难以置信，特别是这种特定组合。有些与编程相关的酷用例，比如我们为quad code开发的这个问题追踪系统。团队简直应接不暇，问题源源不断地涌来，数量实在太多了。

I feel like actually you had Noah Brier on the podcast recently. I thought the Obsidian mind mapping, note keeping use cases... It's really insane how many people use it for this particular combination. I think some coding or coding-adjacent use cases that are cool: we have this issue tracker for Claude Code. The team's just constantly underwater trying to keep up with all the issues coming in. There's just so many.

Speaker 1

quad会自动去重问题，它非常擅长发现重复项。还能进行第一轮问题解决——通常发现问题时，它会主动在内部提交PR，这是团队里Inigo开发的新功能。此外还有值班系统，能从Sentry日志、BigQuery等各处收集信号并整理汇总。

So Claude dedupes the issues: it automatically finds duplicates, and it's extremely good at it. It also does first pass resolution. So usually when there's an issue it'll proactively put up a PR internally, and this is a new thing that Inigo on the team built. So this is pretty cool. There's also on-call, and collecting signals from other places, getting Sentry logs and getting logs from BigQuery and collating all this.

Speaker 1

而且它处理这些特别高效，因为底层全是Bash脚本。这些都是我看到的内部应用场景。

Claude's really good at doing this because it's all just Bash in the end. And so these are all these internal use cases that I saw.

Speaker 2

那么当它整理日志或去重问题时，是像有云服务在后台持续运行那样吗？这是你们正在构建的功能吗？

So when it's collating logs or deduping issues, is that like you have Claudes continually running in the background? And is that something that you're building for?

Speaker 0

那个特定功能会在新问题提交时触发。它只运行一次，但可以根据需要持续运行。

For that particular one, it gets triggered whenever a new issue is filed. So it runs once, but it can choose to run for as long as it needs.

Speaker 2

明白了。关于quad持续运行的理念呢？

Got it. What about the idea of Claudes always running?

Speaker 0

哦，主动式quad。这绝对是我们想实现的目标。目前我们正全力确保quad code在单项任务中的可靠性，比如多行自动补全和单次代理，现在正开发能完整处理任务的quad code。顺着这个发展轨迹，最终会实现更高层次的抽象，处理更复杂的任务。希望下一步能大幅提升生产力。

Oh, proactive Claudes. I think it's definitely where we want to get to. I would say right now, we're very focused on making Claude Code incredibly reliable for individual tasks. If you go from multi-line autocomplete to single-turn agents to, now, Claude Code that can complete whole tasks, I feel like if you trace this curve, eventually you go to even higher levels of abstraction, like even more complicated tasks. And then hopefully, the next step after that is a lot more proactivity.

Speaker 0

首先要了解你们团队的目标是什么，你们个人的目标是什么，能够主动提出：'嘿，我觉得你可能想试试这个功能。这是代码初稿，这些是我做的假设，这些假设正确吗？'

So just understanding what your team's goals are, what your goals are, being able to say, Hey, I think you probably want to try this feature. And here's a first pass at the code, and here are the assumptions I made, and are these correct?

Speaker 2

我都等不及了。我觉得紧接着可能就是Claude要当你的经理了。天哪！

I can't wait. And I think probably right after that is Claude is now your manager. Oh no!

Speaker 0

这不在计划之内。

That's not in the plan.

Speaker 2

团队每个人都特别兴奋我们今天能交流，他们提了一大堆问题给我，我得确保都回答到。哦，这里有个好问题：为什么在架构中选择代理式RAG而非向量搜索？向量嵌入技术现在还重要吗？

Everyone on the team was like super excited that we were talking today and they gave me a bunch of questions and I want to make sure I hit all of the questions. Oh, here's a good one. Why did you choose agentic rag over vector search in your architecture, and are vector embeddings still relevant?

Speaker 0

其实最初我们确实使用了向量嵌入技术。但维护起来非常棘手，因为需要持续重新索引代码，可能会过时，而且本地修改也需要同步。当我们思考外部企业采用时的体验时，意识到这会暴露更多攻击面和安全隐患。同时我们发现云端代码和云端模型其实非常擅长代理式搜索，能达到相同的准确度，而且部署方案简洁得多。

So actually, initially we did use vector embeddings. They're just really tricky to maintain, because you have to continuously re-index the code and they might get out of date, and you have local changes, so those need to make it in. And then as we thought about what it feels like for an external enterprise to adopt it, we realized that this exposes a lot more surface area and, like, security risk. We also found that actually Claude Code is really good, and Claude models are really good, at agentic search. So you can get to the same accuracy level with agentic search, and it's just a much cleaner deployment story.
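The maintenance point above (an index can go stale, while searching the files directly cannot) is easy to see in a small sketch. The function below is only the search half of agentic search: a plain grep-like scan that a model could call repeatedly with different terms. The name `search_code`, the file layout, and the demo contents are hypothetical, and the real tool presumably shells out to existing search utilities rather than doing this in Python.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def search_code(root: str, needle: str,
                suffixes: Tuple[str, ...] = (".py",)) -> List[Tuple[str, int, str]]:
    """Scan files under root at query time; there is no index to rebuild."""
    base = Path(root)
    hits = []
    for path in sorted(base.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for number, line in enumerate(text.splitlines(), start=1):
                if needle in line:
                    hits.append((path.relative_to(base).as_posix(), number, line.strip()))
    return hits


# Demo: a local, uncommitted edit is visible to the very next search.
with tempfile.TemporaryDirectory() as workdir:
    package = Path(workdir) / "pkg"
    package.mkdir()
    source = package / "a.py"
    source.write_text("def load():\n    return 1\n", encoding="utf-8")
    first_hits = search_code(workdir, "def load")
    source.write_text("# moved down one line\ndef load():\n    return 1\n", encoding="utf-8")
    second_hits = search_code(workdir, "def load")
```

The trade-off is that every query pays the cost of reading files, which is the price for never having to re-index or ship code to an embedding store.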

Speaker 2

这真的很有意思。

That's really interesting.

Speaker 0

如果确实想为Cloud Code引入语义搜索功能，可以通过MCP工具实现。也就是说，如果你想自己管理索引并通过MCP工具让Cloud Code调用，这种方案是可行的。

If you do want to bring semantic search to Claude Code, you can do so via an MCP tool. So if you want to manage your own index and expose an MCP tool that lets Claude Code call that, that would work.

Speaker 2

你认为与Cloud Code配合使用的最佳MCP（管理控制平台）有哪些？

What do you think are the top MCPs to use with Claude Code?

Speaker 0

Puppeteer和Playwright在这方面表现非常出色。

Puppeteer and Playwright are pretty high up there.

Speaker 2

确实如此。

Definitely.

Speaker 0

没错。Sentry和Asana都有非常优秀的解决方案。

Yeah. Sentry has a really good one. Asana has a really good one.

Speaker 2

你认为Anthropic内部人员或其他组织中的Cloud Code高级用户是否掌握了一些不为人知但应该知道的实用技巧？

Do you think that there are any power user tips that you see from people inside of Anthropic, or other people inside of organizations that are big Claude Code power users, that people don't know about but they should?

Speaker 0

Quad Co天生不喜欢做的一件事——但我个人觉得非常有用——就是提问。但你知道，当你与思维伙伴或合作者头脑风暴时，通常会互相提问。这正是我喜欢做的事，特别是在计划模式下。我会直接告诉Cloud Code：'嘿，我们正在头脑风暴这件事，请向我提问。'

One thing that Claude Code doesn't naturally like to do, but that I personally find very useful, is ask questions. But you know, if you're brainstorming with a thought partner, a collaborator, usually you do ask questions back and forth to each other. And so this is one of the things that I like to do, especially in plan mode. I'll just tell Claude Code like, Hey, we're just brainstorming this thing. Please ask me questions.

Speaker 0

如果你有任何不确定的地方，我希望你提出问题，我会回答。我认为这实际上能帮助你找到更好的解决方案。

If there's anything you're unsure about, I want you to ask questions and I'll do it. And I think that actually helps you arrive at a better answer.

Speaker 1

我们还可以分享很多技巧。我认为有几个常见错误我看到人们常犯。一个就像你说的，没有充分利用计划模式。这非常重要，我认为那些刚接触智能体编程的人往往会这样。他们某种程度上认为这东西无所不能，但其实并非如此。

There's also so many tips that we can share. I think there's a few really common mistakes I see people make. One is like you said, not using plan mode enough. This is just super important and I think this is people that are kind of new to agentic coding. They kind of assume this thing can do anything and it can't.

Speaker 1

它今天表现没那么好，以后会改进，但目前它能一次性通过某些测试，但大多数情况下不行。所以你需要理解它的局限性，明白何时会陷入循环。像计划模式这样的功能，如果能先制定好计划，可以轻松将成功率提高两三倍。我看到高级用户做得很好的一点是那些大规模部署智能代码的公司，幸运的是现在有很多这样的公司，我们可以向他们学习。比如设置功能。

It's not that good today, and it's going to get better, but today it can one shot some tests, it can't one shot most things. And so you kind of have to understand the limits and you have to understand where you get in the loop. And so something like plan mode, it can like two, three X success rates pretty easily if you align on the plan first. Other stuff that I've seen power users do really well is at companies that have really big deployments of Claude Code, and now luckily there's a lot of these companies so we can kind of learn from them.

Speaker 1

提交到代码库的JSON文件非常重要，因为你可以用它预先允许某些命令，这样就不会每次都弹出权限提示，同时也能阻止某些命令。比如你不想用web补丁之类的功能。这样作为工程师，我就不会被频繁询问权限，可以把这个配置提交并分享给整个团队，让大家都能使用。

Having a settings.json that you check into the code base is really important, because you can use this to pre-allow certain commands so you don't get permission prompted every time, and also to block certain commands. Let's say you don't want web fetch or whatever. And this way, as an engineer, I don't get prompted, and I can check this in and share it with the whole team so everyone gets to use it.
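As a rough illustration of the shared settings file described here, the sketch below builds a permissions block with an allow list and a deny list and serializes it to JSON. The specific rule strings (`Bash(npm run test:*)`, `Bash(npm run lint)`, `WebFetch`) are illustrative assumptions rather than a recommended policy, the helper `build_settings` is hypothetical, and the exact rule syntax should be checked against the Claude Code settings documentation before committing anything.

```python
import json
from typing import Dict, Iterable, List


def build_settings(allow: Iterable[str], deny: Iterable[str]) -> Dict[str, Dict[str, List[str]]]:
    """Build a permissions block; refuse rules that are both allowed and denied."""
    allow_set, deny_set = set(allow), set(deny)
    conflict = allow_set & deny_set
    if conflict:
        raise ValueError(f"rules both allowed and denied: {sorted(conflict)}")
    return {"permissions": {"allow": sorted(allow_set), "deny": sorted(deny_set)}}


settings = build_settings(
    allow=["Bash(npm run test:*)", "Bash(npm run lint)"],  # pre-approved, no prompt
    deny=["WebFetch"],                                     # blocked for everyone
)

# Text that a team might commit as a shared project settings file.
settings_text = json.dumps(settings, indent=2)
```

Because the file lives in the repository, a change to the allow or deny list goes through the same review as any other code change.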

Speaker 2

我通常直接用'dangerously skip permissions'绕过这个限制。

I get around that by just using --dangerously-skip-permissions.

Speaker 1

是的，我们这里也有这个选项，但不推荐使用。毕竟这是个模型，你知道的，它可能会做出奇怪的事情。我认为另一个很酷的用例是人们用停止钩子做有趣的事情。停止钩子会在每轮操作完成时运行。

Yeah, we kind of have this here, but we don't recommend it. It's a model, you know? It can do weird stuff. I think another kind of cool use case that we've seen is people using stop hooks for interesting stuff. So stop hook runs whenever the turn is complete.

Speaker 1

就像这个助手来回进行了一些工具调用，完成后将控制权交还给用户。这时就会运行停止钩子。你可以定义一个停止钩子，比如如果测试未通过，就返回文本并继续运行。本质上你可以让模型一直运行直到任务完成。当你结合SDK和这种编程式用法时，效果简直惊人。

So like this assistant did some tool calls back and forth with, you know, whatever, and it's done and it returns control back to the user. Then we run the stop hook. And so you can define a stop hook that's like if the tests don't pass, return the text, keep going. And essentially it's like you can just make the model keep going until the thing is done. And this is just insane when you combine it with the SDK and this kind of programmatic usage.

Speaker 1

这是个随机过程，具有不确定性，但通过搭建框架，你可以获得确定性的结果。

This is a stochastic thing, it's a non deterministic thing, but with scaffolding you can get these deterministic outcomes.
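The stop-hook idea (when a turn ends, run a check, and if it fails, feed the failure back and keep going) can be sketched as a bounded loop. This is a simulation of the control flow only and does not use the real hook configuration or SDK. `run_until_done`, `FakeAgent`, and the `max_turns` cap are hypothetical names introduced for illustration, and a fake agent that fixes one failing test per turn stands in for the model.

```python
from typing import Callable, List, Optional


def run_until_done(take_turn: Callable[[str], None],
                   check: Callable[[], Optional[str]],
                   prompt: str, max_turns: int = 5) -> int:
    """Run turns until check() returns None, and return how many turns it took."""
    for turn in range(1, max_turns + 1):
        take_turn(prompt)
        failure = check()  # plays the role of the stop hook
        if failure is None:
            return turn
        # Hand the failure text back so the next turn knows what to fix.
        prompt = f"The checks failed, keep going:\n{failure}"
    raise RuntimeError(f"still failing after {max_turns} turns")


class FakeAgent:
    """Toy agent that fixes exactly one failing test per turn."""

    def __init__(self, failing: List[str]) -> None:
        self.failing = list(failing)
        self.prompts: List[str] = []

    def take_turn(self, prompt: str) -> None:
        self.prompts.append(prompt)
        if self.failing:
            self.failing.pop()

    def check(self) -> Optional[str]:
        if not self.failing:
            return None
        return f"{len(self.failing)} test(s) still failing"


agent = FakeAgent(["test_a", "test_b", "test_c"])
turns = run_until_done(agent.take_turn, agent.check, "Make the tests pass")
```

The cap on turns is a design choice added here: the model is still stochastic, so the scaffold guarantees either a passing check or a loud failure, never an endless loop.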

Speaker 2

所以你们开创了这种CLI范式转变。你认为CLI会是最终形态吗？一年或三年后我们主要会在CLI中使用云代码吗？还是有更好的替代方案？

So you guys started this sort of CLI paradigm shift. Do you think the CLI is the final form factor? Are we going to be using Claude Code in the CLI primarily in a year or in three years? Or is there something else that's better?

Speaker 0

我的意思是，这不是最终形态，但我们非常专注于让CLI尽可能智能和可定制。你可以谈谈下一代形态。

I mean, it's not the final form factor, but we are very focused on making sure the CLI is like the most intelligent that we can make it and that's as customizable as possible. You can talk about the next form factors.

Speaker 1

是的，Kat让我来谈这个是因为没人知道答案。这些东西发展得太快了，没人知道未来形态会怎样。目前我们团队处于实验阶段，先有CLI，然后是IDE扩展，现在又推出了更像图形界面的新版IDE扩展。我们在GitHub上还有quad功能，可以随处调用。

Yeah, I mean, Cat's asking me to talk about it because no one knows. This stuff is just moving so fast, right? Like, no one knows what these form factors are. Right now I think our team is in experimentation mode. We have the CLI, then we came out with the IDE extension, and now we have a new IDE extension that's like a GUI, it's a little more accessible. We have Claude in GitHub, so you can just @Claude it anywhere.

Speaker 1

现在quad已经支持网页端和移动端，可以在任何平台使用。我们正处于探索阶段，试图找出下一步方向。如果宏观来看发展趋势，一个重要趋势是更长的自主运行时长。我们每个模型都在测试能持续自主完成任务的时间，现在已达到两位数小时级别，上一个模型能运行约30小时。

Now there's Claude on web and on mobile, so you can use it in any of these places. And we're just in experimentation mode, so we're trying to figure out what's next. I think if we kind of zoom out and see where this stuff is headed, one of the big trends is longer periods of autonomy. And so with every model we kind of time how long the model can just keep going and do tasks autonomously and just, you know, in dangerous mode in a container, keep auto-compacting until the task is done. And now we're on the order of double-digit hours. I think the last model was like thirty hours.

Speaker 1

下一代模型将能以天为单位运行。但并行运行模型会带来一系列问题，比如运行容器的问题——你总不希望必须开着笔记本吧。

The next model is going to be days. And as you think about parallelizing models, there's a bunch of problems that come out of this. So one is, what is the container this thing runs in? Because you don't want to have to close your laptop.

Speaker 2

我现在就遇到这个问题，因为我在做很多DSPy的提示词优化工作，程序在笔记本上运行着，我根本不敢合上电脑。

I have that right now, because I'm doing a lot of DSPy prompt optimization, and it's on my laptop, so I've got my laptop open because I don't wanna close it.

Speaker 1

没错。我们之前拜访客户时也遇到过这种情况。

Yeah. Yeah. That's right. Yeah. We, like, visited companies before, like, like, customers.

Speaker 1

大家都带着自己的Claude Code走来走去。所以我认为一方面要摆脱这种模式，而且我觉得很快我们就会进入Claude监控Claude的模式。我不确定最适合的形态是什么，因为作为人类，你需要能够检查并了解情况，但它也需要针对Claude优化，优化Claude之间的通信带宽。所以我预测终端不是最终形态，未来几个月（也许一年左右）还会出现几种新的形态，而且变化会非常快。

Everyone's just, like, walking around with their, like, Claude Codes. [unclear] So I think one is kind of getting away from this mode, and then I also think pretty soon we're going to be in this mode of Claudes monitoring Claudes. I don't know what the right form factor for this is, because as a human you need to be able to inspect this and see what's going on, but it also needs to be Claude-optimized, where you're optimizing for bandwidth in the Claude-to-Claude communication. So my prediction is terminal is not the final form factor. My prediction is there's going to be a few more form factors in the coming months, you know, maybe like a year or something like that. And it's going to keep changing very quickly.

Speaker 2

你怎么看待这个问题？我给很多Every的订阅用户教Claude Code，

What do you think about, you know, I teach a lot of Claude Code to a lot of Every subscribers and

Speaker 0

谢谢。

Thank you.

Speaker 2

不客气，帮你完成工作。我认为一个重要问题是终端让人望而生畏。就像在电话里告诉订阅用户‘这是如何打开终端的，即使你不懂技术也可以操作’是件大事。你怎么看？

You're welcome. Doing your work for you. And I think the, like, one of the big things is just the terminal is intimidating. And just like being on a call with the subscribers being like, here's how you open the terminal and you're allowed to do this even if you're non technical is like a big deal. How do you think about that?

Speaker 0

是的，我们市场团队有个人开始使用Claude Code，因为她要写一些涉及Claude Code的内容，觉得应该亲身体验。结果她屏幕上弹出了大约30个需要授权的弹窗，因为她从没用过终端。我完全同意你的观点，这对非工程师确实很难，甚至有些工程师也不习惯日常在终端工作。我们的VS Code图形界面扩展是第一步尝试，因为你完全不用考虑终端。

Yeah, one of the people on our marketing team started using Claude Code because she was writing some content that touched on Claude Code, and was like, you should really experience it. And she got, like, 30 pop-ups on her screen where she had to accept various permissions because she'd never used a terminal before. So I completely see eye to eye with you on that. It's definitely hard for non-engineers, and there's even some engineers we found who aren't fully comfortable with working day to day in the terminal. Our VS Code GUI extension is our first step in that direction because you don't have to think about the terminal at all.

Speaker 0

这就像带有一堆按钮的传统界面。我们正在开发更多图形界面，比如网页版的Claude Code就是图形界面。这对技术背景较弱的人来说可能是个很好的起点。

It's like a traditional interface with a bunch of buttons. We are working on more graphical interfaces. So Claude Code on the web is a GUI. I think that actually might be a good starting point for people who are less technical.

Speaker 1

是啊。几个月前有个神奇时刻，我走进办公室发现Anthropic的一些数据科学家（就坐在Claude Code团队旁边）的电脑上运行着Claude Code。我当时就想：这是什么？你们是怎么搞明白的？

Yeah. Yeah. There was this magic moment maybe a few months ago where I walked into the office and some of the data scientists at Anthropic, they sit right next to the Claude Code team, and the data scientists just had Claude Code running on their computers. And I was like, what is this? How did you figure this out?

Speaker 1

我记得好像是Brandon第一个这么做的。他说'哦对，我刚安装了这个。我在做这个产品，所以我应该用用看'。我当时就想'天啊'。所以他学会了怎么用终端和Node.js。

I think it was like Brandon was the first one to do it. And he was like, oh yeah, I just installed it. I work on this product, so I should use it. And I was like, oh my god. So he figured out how to use a terminal and Node.js.

Speaker 1

他以前没真正做过这种工作流程。当然，他技术能力很强。所以我觉得现在我们开始看到各种与代码相邻的职能都有人在用Claude Code。

He hadn't really done this kind of workflow before. Obviously very technical. So I think now we're starting to see all these kind of code-adjacent functions where people use Claude Code.

Speaker 1

是的，这挺有意思的。从潜在需求角度看，这些人是在hack产品，说明有用它做这个的需求。所以我们想通过更易用的界面让这变得简单些。但与此同时，Claude Code团队专注为顶级工程师打造最好的产品。我们聚焦软件工程，想把这个做得特别好。

And yeah, it's kind of interesting. From a latent demand point of view, these are people hacking the product, so there's demand to use it for this. And so we want to make it a little bit easier with more accessible interfaces. But at the same time, for us, Claude Code, we're laser-focused on building the best product for the best engineers. And so we're focused on software engineering and we want to make this really good.

Speaker 1

但我们希望它也能成为其他人可以hack的东西。

But we want to make it a thing that other people can hack.

Speaker 0

有时候Claude Code生成的代码会有点啰嗦，但你可以直接告诉它简化，它做得很好。

Sometimes Claude Code will write code that's a bit verbose. But you can just tell it to simplify it, and it does a really good job.

Speaker 2

有意思。那你们具体怎么操作？是用斜杠命令还是

Interesting. And so and how are how and when are you doing that? So you're you're using a slash command or you're

Speaker 0

我直接说出来。我就说

I just say it. I just say

Speaker 2

类似这样的

something Like

Speaker 0

有时候你会觉得，嘿，这应该是个一行代码的改动。对。结果它写了五行，你就说，简化一下。嗯。它立刻就能理解你的意思，然后修好。

sometimes you're like, hey, this should be a one-line change. Yeah. And it'll write five lines. You're like, simplify it. Mhmm. And it understands immediately what you mean and it'll fix it.

Speaker 2

是啊。我觉得我们团队很多人也这样。这挺有意思的。既然你经常这么说，为什么不把它做成斜杠命令或者集成到工具里呢？这样就能自动完成了。

Yeah. I think a lot of people on our team do that too. It's interesting. If you're saying that all the time, why not push that into, like, a slash command or the harness or something like that to, yeah, make it just happen automatically?

Speaker 0

我们在CLAUDE.md里确实有这方面的说明。我觉得它影响的对话比例太低，我们不想往另一个方向矫枉过正。至于为什么不用斜杠命令，因为你其实不需要那么多上下文。斜杠命令更适合那些原本需要写两三行文字的情况。比如计划模式，你其实可以用几个词表达，但有时候确实需要两三行才能完整表达你对计划模式的需求。

We do have instructions for this in the CLAUDE.md. I think it impacts such a low percentage of conversations that we don't want it to over-rotate in the other direction. And then the reason why not a slash command is because you actually don't need that much context. I think slash commands are really good for situations where you would otherwise need to write two or three lines. Like, even for plan mode, you can use a few words, but it actually takes two or three lines to capture the entirety of what you want in plan mode.

Speaker 0

对于简化操作，你只需要写'简化它'就能搞定。

For simplify it, you can just write simplify it and it gets it.

Speaker 2

对，对。这样说得通。很酷。是的。

Yeah. Yeah. That makes sense. Cool. Yeah.

Speaker 2

好的。现在我们能...这挺有意思的。

Okay. Now we're we can That's interesting.

Speaker 1

是啊，但是这些东西，你知道的，感觉还是很混乱。我们录音前还在讨论我们处于采用曲线的哪个位置，现在还是那个豪申曲线什么的。

Yeah. But but the stuff, like, you know, it still feels disorderly. We were talking before the recording about where are we on the adoption curve and it's still The Hauschen curve or whatever.

Speaker 2

管它叫什么术语呢。

Whatever that term was.

Speaker 1

没错。感觉我们就像那前10%。这些东西变化太快了，而且会持续变化。

Exactly. It just feels like we're in the first 10%. This stuff is going to change so fast, and it's going to keep changing.

Speaker 0

即使我和Anthropic外部使用过Claude Code的研究人员交谈时，他们也会被这类问题卡住，没意识到可以直接让大语言模型简化它。这说明即便是业内工作者，也不总能意识到你可以直接和模型对话。

Even when I talk to researchers outside of Anthropic who've used Claude Code, they also get stuck on things like this, not realizing that they can just tell the LLM to simplify it. And I think that just goes to show that even people who are working in this industry don't always realize that you can just talk to the model.

Speaker 2

问题在于人们潜意识里认为使用AI不该是项技能，因为它应该能听懂任何指令。但实际上，你的表达方式会直接影响它的表现。所以如果你能表达得更好，它就会表现得更好。

That's the thing. I think that there's this underlying expectation that using AI shouldn't have to be a skill, like, because it just does whatever you say. And you're like, well, I mean, whatever you say is gonna matter for what it does. So if you can say things better, it's gonna do better.

Speaker 1

但每个模型都会带来变化，这才是难点。就像提示工程师曾经是个职业，现在众所周知已经不存在了。未来还会有更多这样的小微技能需要学习，但随着模型进步，它能更好地理解你的需求。

Yeah, I mean, it changes with every model, though. That's the hard part. Like, prompt engineer was a job, and now famously it's not a job anymore. And there's gonna be more jobs that are then not jobs anymore, these little micro skills that you have to learn to use this thing. As the model gets better, it can just interpret it better.

Speaker 1

但我觉得对我们来说，这也体现了开发这种产品必须保持的谦逊——我们真的不知道接下来会发生什么。我们和其他人一样都在摸索中，只是随波逐流罢了。

But I think that's also like, for us, this is part of this kind of humility that we have to have building a product like this that we just really don't know what's next. And we're just trying to figure it out kind of along with everyone else. We're just here for the ride.

Speaker 2

这就是为什么你自己动手开发很酷，因为我认为这是最好的认知方式。就像我们做的这样，你某种程度上生活在未来，你一直在使用它，缺什么一目了然。你会想'我就要这个功能'，然后直接动手实现，而不是去问某个大企业的产品经理'你想要什么AI功能？'他们只会回答'不知道。在我的IDE旁边加个小聊天机器人吧'，然后你就只能'好吧'。

And that's why it's cool that you're building it for yourself, because I think that's the best way to know that. And this is what we do too: you're sort of living in the future, you're using it all the time, and it's pretty clear what's missing. You're like, I just want this thing, and you can just do the next thing, rather than being like, let me ask some enterprise product manager at some gigantic company what kind of AI feature do you want? And they're like, I don't know. Put a little chatbot on the side of my IDE, and you're like, okay.

Speaker 1

没错。开发开发者工具的美妙之处就在于此——你自己就是用户。

Yeah. This is the luxurious thing about building DevTools. You're your own customer.

Speaker 2

我认为这也是AI的独特之处，它某种程度上重置了所有软件的竞争格局。比如我们有Cora这样的邮件助手，Sparkle这样的文件整理工具。只要你为自己电脑需求开发的东西用上了AI，很可能就是首创，因为整个领域都被重置了。所以现在为自己开发东西是个特别令人兴奋的时期。

I think it's also really a unique thing about AI, because it sort of reset the game board for all software. So, you know, we have Cora, this email assistant, and we have Sparkle, which organizes your files. And it's like, anything that you want to use on your computer, if you're building it with AI, there's a good chance that hasn't been done before, because the whole landscape has been reset. And so it's a uniquely exciting time to build stuff for yourself.

Speaker 0

完全同意。我认为这彻底打开了竞争空间。现在任何人都能开发满足自己需求的应用然后推广给所有人。确实很酷。

Totally. I think it totally opens the playing field, too. It's like any individual can now build an app to fill their need and then distribute it to everyone else. Yeah. It's really cool.

Speaker 0

我一直在捣鼓各种随机的小项目原型。

I've been prototyping all these, like, random pet projects.

Speaker 0

我刚搬进新公寓，里面空荡荡的。所以我用Claude Agent SDK开发了个购物顾问助手，谁有时间读所有评论、对比所有选项、查价格啊？发现好东西太难了。它只要问我几个问题，我告诉它需求就行。家具？对，就是这样。

I just moved into a new apartment and it's empty. And so I've been building this, like, shopping advisor assistant on the Claude Agent SDK, because who has time to read all the reviews and look at all the options and find their pricing? Everything's really hard to discover. And so it just asks me a bunch of questions and I tell it what I want. Like for furniture? Yeah, exactly.

Speaker 0

它会给我展示一堆不同沙发款式的照片和网友评价，然后我告诉它我不喜欢哪些。这感觉就像在和购物助手合作一样，真的很酷。

And it shows me a bunch of photos of different sofas and options and what people say online. And then I tell it what I don't like. And it literally feels like working with a shopping assistant. It's been really cool. That's really cool.

Speaker 0

我还有个小小的邮件自动回复助手帮我起草回复。不过我平时不怎么用邮件，所以...

I also have my little email response agent that drafts responses for me. But I don't use email that much, so.

Speaker 1

哦，我就知道不是你在亲自回复。

Oh and I knew it wasn't you responding.

Speaker 0

所以才会延迟七天回复嘛。这个助手做事特别细致。不过Agent SDK确实很酷。

That's why it's seven days delayed. The agent's just doing a very thorough job. Yeah. Agent SDK is cool though.

Speaker 1

是啊，Agent SDK很厉害。我们这么小的团队能做出这么多东西，总是让人感觉不可思议。因为我觉得...

Yeah, Agent SDK is cool. Yeah, it always just feels amazing how much we're able to build with such a small team. Because I feel like

Speaker 0

对了，还有个特别棒的变化是，我发现人们的思维正在从文档转向演示。在我们内部，演示才是硬通货。想让别人对你的东西感兴趣？那就展示15秒它能做什么。现在团队里每个人都潜移默化地...

Oh, the other thing that's really cool is that I think people are just shifting their mindset from docs to demos. Like, internally, our currency is actually demos. If you want people to be excited about your thing, show us fifteen seconds of what it can do. We find that everyone on the team is now kind of indoctrinated

Speaker 2

彻底融入了演示文化。我觉得这样更好，因为很多想法就算你文笔再好也很难解释清楚。但只要让人亲眼看到，他们立刻就能明白。这种转变不仅发生在产品开发上，也出现在各种创意领域。

in the demo culture, for sure. And I think that's better because there's a lot of things that you might have in your head that, if you're a great writer, maybe you could figure out how to explain. But even then, it's just really hard to explain. But if someone can see it, they get it immediately. And I think that's happening for product building, but it's also happening for all sorts of other types of creative endeavors.

Speaker 2

比如拍电影，以前需要提案，但现在可以直接说'做了这个Sora视频'。你能以极低成本看到自己想创作内容的雏形。这意味着你不需要花太多时间说服别人，直接展示成品就行。

Like making a movie, for example, you had to pitch it, but now you can just be like, I made this Sora video. You can kind of see the glimmer of the thing you're trying to make for very cheap. And so that means you don't have to spend as much time convincing people. You can just be like, here, I made it.

Speaker 1

是啊，作为创作者，你可以不断重做直到满意。过去我们习惯用文档、白板或在Sketch/Figma上画图，现在直接动手构建直到感觉对为止。现在要验证这种感觉容易多了——以前只能视觉预览或用文字描述，但永远抓不住那种氛围感，而现在氛围营造变得非常简单。

Yeah, and also as a builder, you can just make it, and then make it again, and then make it again until you're happy. I feel like the flip side is you used to make a doc or whiteboard something, or I would draw stuff in Sketch or Figma or whatever, and now I'll just build it until I like how it feels. And it's just so easy to get that feeling out of it now. And I think you could see it visually before, or you could describe it in words, but you could never get the vibe. And now the vibe is real easy.

Speaker 0

你的计划模式就重建了三次

And you built plan mode like three times

Speaker 2

是因为你建完就推倒，推倒又重建这样循环吗？

Like because of you built it and then you threw it out and rebuilt it and threw it out and rebuilt it?

Speaker 1

没错，就像Todos功能，Sid最初版本就迭代了三四次原型。听说之后一天内又改了20版。我们发布的几乎所有功能背后都至少有多个原型版本。

Yeah, and like to-dos: Sid built the original version, and he built like three or four prototypes. And then I've heard this had maybe like 20 versions after that, in like a day. Pretty much everything we released, there was at least a few prototypes behind it.

Speaker 2

你们如何在不同原型间传承经验？特别是当一个人完成初步原型后，另一个人接手继续迭代20版时，如何最大化学习成果？

How do you keep track of and carry forward the things you learn from prototype to prototype? And especially if one person is prototyping it, and then you're like, I'm going to take it over, I'm going to do 20 more. How do you maximize what you get out of that?

Speaker 1

有几个关键点：一是风格指南。我们在为终端开发时发现了一些设计元素，这其实是在探索终端的新设计语言。其中部分规范可以纳入风格指南固化下来。

There's maybe a few elements of it. One is the style guide. So there's some elements of style that we discover, and I think a lot of this is building for the terminal. We're discovering a new design language for the terminal and building it as we go. And I think some of this you can codify in a style guide.

Speaker 1

这是我们的CLAUDE.md。但其中还有一部分是关于产品感知的，我认为模型目前还没有完全掌握这部分。我觉得或许我们应该想办法教会模型这种产品感知能力，让它明白什么可行什么不可行。因为在产品设计中，你需要用最简单的方式解决用户的问题，然后剔除所有无关内容，扫清一切障碍。这样产品就能最快地契合用户意图。

This is our CLAUDE.md. But then there's this other part of it that's kind of product sense, where I don't think the model totally gets it yet. And I think maybe we should be trying to find ways to teach the model this kind of product sense about what works and what doesn't. Because in product you want to solve the person's problem in the simplest way possible and then delete everything else that's not that and just get everything out of the way. So you align the product to the intent as quickly as possible.

Speaker 1

可能模型现在还没完全理解这一点。

And maybe the model doesn't totally get that yet.

Speaker 0

是啊，它确实体会不到使用Claude Code的感觉。毕竟模型自己并不使用Claude Code。

Yeah, it doesn't really feel what it's like to use Claude Code. Like, the model doesn't use Claude Code.

Speaker 1

所以我在想，当Claude Code能够自我测试和自我使用时，就像我们开发时那样，它能发现UI漏洞之类的问题。不过也许我们直接尝试提示它就行？说实话很多问题就这么简单。每当有新想法时，通常你只要给出提示，往往就能奏效。

So I think, like, when, you know, Claude Code can test itself and it can kind of use itself. And we do this when developing, and it can see UI bugs and things like that. I don't know, maybe we should just try prompting it though. Like, honestly, a lot of the stuff is as simple as that. When there's some new idea, usually you just prompt it and often it just works.

Speaker 1

也许我们真该试试看。

Maybe we should just try that.

Speaker 0

很多原型实际上就是用户体验交互。所以当我们发现新的交互方式时——比如Boris发现的用Shift+Tab实现自动接受功能。

A lot of the prototypes are actually the UX interactions. And so I think once we discover a new UX interaction like shift tab for auto accept, I think Boris figured out.

Speaker 1

其实是Igor发现的。哦对，是Igor。我们回头想想

That was Igor actually. Oh, Igor. We went back and think

Speaker 0

东西可以放进去。

things can fit into that.

Speaker 1

我们花了一周时间做原型设计。

We did doing prototypes for a week.

Speaker 0

是的，Shift Tab感觉很好用。后来当前计划模式的一个迭代版本采用了Shift Tab，因为这实际上是另一种告诉模型它应该有多主动的方式。所以我认为随着更多功能使用相同的交互方式，你会对什么应该放在哪里形成更清晰的心智模型。

Yeah, Shift+Tab felt really nice. Then the now-current plan mode iteration uses Shift+Tab, because it's actually just another way to tell the model how agentic it should be. And so I think as more features use the same interaction, you form a stronger mental model for what should go where.

Speaker 1

是的。我认为思考是另一个很好的例子。最初在发布Claude Code之前，或者可能是第一个思考模型时，大概是3.7版本？我记不清第一个是什么了。

Yeah. Thinking, I think, is another really good one. First we were like, before we released Claude Code, or maybe it was like the first thinking model, was it like 3.7? I forget what the first one was.

Speaker 0

是的。

Yeah.

Speaker 1

它能思考，我们在头脑风暴如何切换思考模式。然后有人就说，如果用自然语言直接让模型思考会怎样？它知道怎么思考。我们说，好吧，太棒了，就这么做。我们这样做了一段时间，后来发现人们会意外触发它。他们会对模型说不要思考。

And it was able to think and we're brainstorming, how do we toggle thinking? And then someone was just like, what if you just ask the model to think in natural language and it knows how to think? And we're like, okay, sweet, let's do that. And so we did that for a while, then we realized that people were accidentally toggling it. So they were like, don't think.

Speaker 1

然后模型就会开始思考。所以我们不得不调整让它不被'不要思考'触发。但这样还是不够直观。

And then the model's like, oh, I should think. They just started thinking. And so we had to kind of tune it out. So don't think didn't trigger it. But then it still wasn't obvious.
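The accidental-trigger problem described above can be illustrated with a minimal sketch. This is an illustration under stated assumptions, not how Claude Code actually detects thinking requests: the trigger words, the negation list, and the function name `wants_thinking` are all hypothetical.

```python
import re

# Hypothetical trigger words; the product's real list is not given in the conversation.
THINK_PATTERN = re.compile(r"\b(?:think|ultrathink)\b", re.IGNORECASE)

# Hypothetical negations that, when they sit directly before a trigger word, cancel it.
NEGATION_PATTERN = re.compile(r"\b(?:don[’']t|do not|dont|never)\s+$", re.IGNORECASE)


def wants_thinking(prompt: str) -> bool:
    """Return True if the prompt contains a trigger word that is not directly negated."""
    for match in THINK_PATTERN.finditer(prompt):
        preceding = prompt[: match.start()]
        if NEGATION_PATTERN.search(preceding):
            continue  # "don't think" should not switch thinking on
        return True
    return False


if __name__ == "__main__":
    for text in ("think hard about the migration", "don't think, just rename the file"):
        print(text, "->", wants_thinking(text))
```

A plain keyword match would treat "don't think, just rename the file" as a request to think; the negation check rejects it. The sketch still misfires on phrases like "I think this is fine", which hints at why an explicit toggle is more predictable than keyword matching.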

Speaker 1

于是我们做了个用户体验改进来突显思考过程，是的，

So then we made a UX improvement to highlight the thinking Yeah,

Speaker 2

全部那些。

all that.

Speaker 1

那太有趣了，感觉真的很神奇。

That was so fun and it felt really magical.

Speaker 2

当你用ultrathink时，就会出现彩虹之类的效果。

When you do ultrathink, it's rainbow or whatever.

Speaker 1

没错。在Sonnet 4.5上，我们发现开启扩展思考后性能确实有非常显著的提升。所以我们把切换做得很简单，因为有时你需要，有时不需要。比如处理简单任务时，你肯定不希望模型思考五分钟，只想让它直接执行。于是我们用Tab键作为切换交互方式，然后下线了一大批思考相关的触发词。不过我觉得ultrathink因为一些偶然原因被保留下来了。

Yeah, exactly. And then with Sonnet 4.5, we actually find a really, really big performance improvement when you turn on extended thinking. And so we made it really easy to toggle it, because sometimes you want it and sometimes you don't. For a really simple task, you don't want the model to think for five minutes, you want it to just do the thing. And so we used Tab as the interaction to toggle it, and then we unshipped a bunch of the thinking words. Although I think we kept ultrathink for accidental reasons.

Speaker 1

这个用户体验设计太酷了。

It was such a cool UX.

Speaker 2

有意思。你觉得是否应该设立关于删除内容的新指标？程序员们一直觉得删除大量代码很爽。但考虑到现在构建速度这么快，删除东西反而变得更重要了。

Interesting. Do you think there's some new metric that's about what you deleted? And I think programmers have always felt like deleting a bunch of code feels really good. But there's something about because you can build stuff so fast, it becomes more important to also delete stuff.

Speaker 1

我觉得我最喜欢看到的差异是红色差异。这是

I think my favorite kind of diff to see is a red diff. This is

Speaker 2

最棒的。就像当

the best. Like when

Speaker 1

我收到它时，我会说，好啊，放马过来吧。再来一个，再来一个。但这也很难，因为你发布的任何东西，人们都在使用它。所以你必须让人们满意。因此我认为我们的原则通常是，如果我们撤销某个功能，就需要发布一个更好的替代品，让人们能从中受益，更完美地实现原有意图。

I receive it, I'm like, yeah, bring it on. Another one, another one. But it's hard because anything you ship, people are using it. And so you gotta keep people happy. And so I think generally our principle is if we un ship something, we need to ship something even better that people can take advantage of, that kind of matches that intent even better.

Speaker 1

是的，这又回到了如何衡量Claude Code及其影响的问题。这是每家公司、每个客户都会问我们的。我想在Anthropic内部，自一月份以来我们的规模大概翻了一番。但与此同时，每位工程师的生产力提升了近70%。

And yeah, I think this is kind of back to how do you measure Claude Code and the impact of it? And this is something every company, every customer asks us about. Internally at Anthropic, I think we doubled in size since January or something like that. But then productivity per engineer has increased almost 70% in that time.

Speaker 2

通过什么标准衡量的？

Measured by?

Speaker 1

我们确实通过几种方式进行了测量，其中PR（拉取请求）是最简单也是主要的指标。但正如你所说，这无法全面反映情况，因为很多改进在于让原型设计更容易、尝试新事物更便捷，那些原本因为优先级太低而绝不会尝试的功能，现在都能轻松实现。所以确实很难量化。另一方面，代码量增加了，你就得删除更多代码，必须更谨慎地进行代码审查，并尽可能实现自动化审查。

I think we actually measured it, yeah, in a few ways, but PRs are the simplest one and the main one. But like you said, this doesn't capture the full extent of it, because a lot of this is making it easier to prototype, making it easier to try new things. These things that you never would have tried because they're way below the cut line, you're launching a feature and there's this kind of wish list of stuff, now you can just do all of it because it's so easy, and you just wouldn't have done it before. So yeah, it's really hard to talk about it. And then there's this flip side of it where more code is written, so you have to delete more code. You have to code review more carefully and automate code review as much as you can.

Speaker 2

这也带来了一个有趣的新产品管理挑战：由于能快速发布大量功能，产品可能失去整体感。你可以在这里加个按钮，那里添个标签页，这里补个小功能。最终很容易做出一个功能齐全但缺乏组织原则的产品，因为你总是在不停地发布各种东西。

There's also an interesting new product management challenge, because you can ship so much that it ends up not feeling as cohesive, because you could just add a button here and a tab there and a little thing here. It's much easier to build a product that has all the features you want but doesn't have any sort of organizing principle because you're just shipping lots of stuff all the time.

Speaker 0

我认为我们在这方面力求严谨，确保所有抽象概念对用户来说都易于理解，哪怕他们只是听到功能名称。我们遵循一个由Boris引入团队的原则，我很喜欢：我们不想要专门的新手引导体验。一切都应该直观到用户可以直接上手使用，且运行顺畅。我认为这为“确保每个功能都真正直观”设定了很高的标准。

I think we try to be pretty disciplined about this, making sure that all the abstractions are really easy to understand for someone, even if they just hear the name of the feature. We have this principle that I believe Boris brought to the team that I really like, where we don't want a new-user experience. Everything should be so intuitive that you just drop in and it just works. And I think that's set the bar really high for making sure every feature is really intuitive.

Speaker 2

在对话式UI中如何实现这一点？因为最初只有一个空白文本框，没有一堆按钮和旋钮时，你们如何考虑让它变得直观？

How do you do that with a conversational UI? Because when there's not a bunch of buttons and knobs and it's just a blank text box to start? How do you think about making it intuitive?

Speaker 0

我们做了许多细节工作：教会用户用问号查看提示，在Claude Code运行时显示提示，侧边栏设有变更日志，底部设有通知区域来告知新模型发布等信息。

There's a lot of little things that we do. We teach people that they can use the question mark to see tips. We show tips as Claude Code is working. We have the change log on the side. We tell you about, oh, there's a new model that's out, or we show you at the bottom, we have a notification section for thinking.

Speaker 0

我认为我们通过微妙的方式向用户介绍功能。另一个重点是确保所有基础概念都有明确定义——比如‘钩子’和‘插件’在开发者生态中有公认含义，我们要确保产品实现与开发者听到这些术语时的第一反应完全匹配。

I think there's just subtle ways in which we tell users about features. I think the other thing that's really important is to just make sure that all the primitives are very clearly defined. Hooks have a common meaning in the developer ecosystem. Plugins have a very common meaning in the developer ecosystem, and making sure that what we build matches what the average developer would immediately think of when they hear that.

Speaker 1

还有渐进式披露的设计。在Claude Code中运行时，随时可以按Control+O查看模型看到的完整原始记录。我们只在相关时才提示这个功能，比如当工具结果被折叠时，会提示“按Control+O查看”。我们不想一开始就堆砌太多复杂性，毕竟这个工具几乎无所不能。

There's also this progressive disclosure thing. Anytime in Claude Code when you run it, you can hit Control+O to see the full raw transcript, the same thing the model sees. And we don't show you this until it's actually relevant. So when there's a tool result that's collapsed, we'll say use Control+O to see it. We don't want to put too much complexity on you at the start, because this thing can do, you know, anything.

Speaker 1

我们正在探索一个新原则：让模型教会用户如何使用它。你可以向Claude Code询问它自身的功能，它会检索自己的文档来回答。我们还能更深入，比如斜杠命令既可供用户使用，也能被模型调用。当用户看到模型使用斜杠命令时，就会意识到自己也可以这样做。

I think there's this other kind of new principle which we've just started exploring, which is that the model teaches you how to use the thing. And so you can ask Claude Code about itself, and it kind of knows to look up its own documentation to tell you about it. But we can also go even deeper. For example, slash commands are a thing that people can use, but the model can also call slash commands. Maybe you see the model calling one, and then you'll be like, oh yeah, I guess I can do that too.

Speaker 2

确实很有意思。从最初Claude Code作为通过CLI使用AI的独特范式开始，到现在整个行业都在转向CLI方向，这个演变过程是怎样的？

Yeah, yeah, yeah. Interesting. How has it changed? Like, you know, when you first started doing this, Claude Code was this sort of singular thing, the singular way of thinking about, you know, using AI through a CLI. Other people had stuff like this, but it felt like this shift. And now there's a whole landscape where everyone is going CLI, CLI, CLI.

Speaker 2

这些是如何改变的？你对构建的看法、构建的感受，以及你如何应对当前竞争中的压力？

How has that changed? How you think about building, how it feels to build, and how are you dealing with the pressure of the race that you're in?

Speaker 1

对我来说，模仿是最真诚的恭维。这很棒。看到其他人受此启发而构建的各种东西真的很酷。我认为最终目标就是激励人们为这项即将到来的惊人技术打造下一个创新。这确实令人兴奋。

I think for me, imitation is the greatest flattery. So it's awesome. And it's just like, it's cool to see all this other stuff that everyone else is building, like inspired by this. And I think this is ultimately the goal is to kind inspire people to build this next thing for this just incredible technology that's coming. That's just really exciting.

Speaker 1

就我个人而言，我不太使用其他工具。通常当有新东西出现时，我可能会尝试一下感受氛围。除此之外，我们更专注于解决我们和客户面临的问题，并着手构建下一代产品。

Personally, I don't really use a lot of other tools. So usually when something new comes out, I'll maybe just try it to get a vibe. But otherwise, I think we're pretty focused on just solving problems that we have and our customers have and kind of building the next thing.

Speaker 2

酷。太棒了。我也很喜欢采访的这个环节。我们

Cool. Sweet. I love this part of the interview too. Do we

Speaker 0

回答所有

answer all

Speaker 1

你们的

of your

Speaker 0

团队问题了吗？

team's questions?

Speaker 1

好问题。

Good questions.

Speaker 2

哦，我们是否回答完我团队的所有问题了？让我看看，我想我们完成了。

Oh, do we get through all my team's questions? Let's see. I think we did.

Speaker 1

我也很好奇你会如何回答关于功能下架的问题。因为你们在进行AI驱动的开发，发布很多功能。团队规模小，所以运营负担很重。

I'm curious also how you would answer the unshipping question. Because also, you're doing this AI driven development, you ship a lot. You have a small team, so it's a lot of operational load.

Speaker 2

我之所以这么问，是因为我觉得我们在这方面做得不够好。我感觉有些产品因此显得有些混乱。特别是对Cora来说，产品功能面很广，能做很多事情。比如我们有个邮件助手，你可以问它'告诉我关于我即将出行的信息'，它就会浏览所有邮件并总结行程。或者我们有这个功能：自动归档所有你不需要立即回复的邮件。

The reason I ask that is because I don't think we do a good job of that. And I have this feeling that some of the products are a little bit messy because of that. And I think particularly for Cora, there's just a big product surface area and it can do a lot of different things. We have an email assistant, so you can ask it, like, you know, tell me about the trip I'm taking, and it'll go through all your emails and, you know, summarize the trip. Or we have this feature where it automatically archives any email that you don't need to respond to immediately.

Speaker 2

然后每天两次你会收到一份简报，汇总所有你可能需要看但不需要处理的内容，你只需快速浏览就完成了。但这里有很多复杂性，比如邮件是如何分类的？现在我们有一整套分类规则视图，你可以排序等等，但这很复杂且难以沟通。我想保留所有的功能和灵活性，但也不能让用户看着屏幕完全摸不着头脑，觉得太复杂了。

And then twice a day you get a brief that summarizes all the stuff that you probably need to see but don't need to actually do anything with, and you just scroll through it and you're done. And there's all this complexity around, for example, how are emails categorized? So now we have a whole view where we have all these categorization rules and you can order them and whatever, but it's just complicated and hard to communicate. And I want to retain all the power and flexibility, but also you can't look at a screen and be like, I have no idea what's going on. This is way too complicated.

Speaker 2

所以我正在处理所有这些事情，'删除、功能下架'的想法似乎是个有趣的文化原则，我们还没有真正探索过。

So I'm processing all that stuff, the deletion, unshipping idea feels like an interesting cultural principle that we haven't really explored.

Speaker 0

是的，这确实很难。我觉得这里面还有社交成本，你不太想成为那个告诉同事要下架他们功能的人。这绝对很棘手，远不止是代码的问题。

Yeah, it's really hard. I think there's a social cost to it too, where you don't really want to be the person who tells your coworker to unship their feature. It's definitely tricky. It's more than just the code.

Speaker 1

确实。老实说，这套理念我绝对是在Instagram学到的。因为我认为Facebook在功能下架方面做得很糟糕。我们遇到的问题是，就连下架“戳一下”功能都特别棘手，因为有一群老用户会强烈反对，他们就像在说“不，别想动我们的戳一下功能”。

Yeah. I definitely learned this at Instagram, honestly. Because I think Facebook does a terrible job at unshipping. And we had this problem where, I think, even unshipping pokes was really spicy, because there's a bunch of these old-timers. They're like, no, pokes, you're never gonna take it away.

Speaker 1

但如果你看数据，其实已经没人真正使用它了。出于情感原因，我们某种程度上被它束缚住了。所以对Facebook来说，可能从来没有真正下架过任何功能，总是把它移到次要位置，比如没人会看的溢出菜单里，就像个功能坟场。而Instagram则非常有原则性。

But if you look at the data, no one really uses it anymore. For sentimental reasons, we were kind of tied to it. And so for Facebook, maybe nothing ever got unshipped. It always got moved to a secondary place, like an overflow menu somewhere that no one looks at, like a graveyard. And I think Instagram was just very principled.

Speaker 1

那里有非常坚定的产品和设计立场。我当时就说，如果这个功能连一半的人（比如50%的周活跃用户之类）都不使用，我们就会直接删除它，然后承担后果。接着我们会想办法做出更多人使用的新功能。

There was a very strong product and design point of view. I was like, if this thing isn't used by half of people, 50% of WAU or whatever, we're just gonna delete it and deal with it. And then we'll figure out some next thing that's used by more people.
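The usage rule described here can be written down as a simple filter. This is a minimal sketch, assuming per-feature user counts are available: the feature names, the counts, and the function `features_to_unship` are hypothetical, and the 50% threshold is the figure quoted in the conversation rather than a documented policy.

```python
def features_to_unship(
    usage: dict[str, int], active_users: int, threshold: float = 0.5
) -> list[str]:
    """Return the features used by fewer than `threshold` of active users, sorted by name."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return sorted(
        name for name, users in usage.items() if users / active_users < threshold
    )


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    usage = {"pokes": 120, "stories": 800, "feed": 950}
    print(features_to_unship(usage, active_users=1000))
```

A rule like this only produces candidates. As the conversation notes, removing something people rely on usually means shipping something better in its place.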

Speaker 2

我太喜欢这个观点了。非常感谢。这太棒了。真的很高兴能和你交流，期待继续共同进步。

I love it. Well, thank you. This is amazing. I'm really glad I got to talk to you and keep building.

Speaker 0

感谢邀请我们。

Thank you for having us.

Speaker 1

嗯，谢谢。

Yeah, thanks.

Speaker 4

天呐朋友们！你们必须立刻马上点赞订阅AI & I！为什么？因为这个节目简直是精彩绝伦，就像在后院发现宝箱，不过里面装的不是金子，而是关于ChatGPT的纯粹知识宝藏！

Oh my gosh, folks. You absolutely positively have to smash that like button and subscribe to AI & I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure unadulterated knowledge bombs about ChatGPT.

Speaker 4

每一集都是一场情感、洞见与欢笑的过山车，让你欲罢不能地期待更多。这不只是一档节目，而是一段由Dan Shipper掌舵的未来之旅。所以帮自己个忙，点赞订阅，系好安全带，准备迎接人生中最刺激的旅程吧。

Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat craving for more. It's not just a show. It's a journey into the future with Dan Shipper as the captain of the spaceship. So do yourself a favor. Hit like, smash subscribe, and strap in for the ride of your life.

Speaker 4

现在无需多言，请允许我直抒胸臆——Dan，我已无可救药地爱上了你。

And now without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.