
307 - Harness 工程:AI 编程的难点

307 - Harness Engineering - the hard part of AI coding

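Episode summary

The hardest part of AI coding is not generating code; it is controlling quality, safety, and drift. Kaushik and Iury break down "harness engineering": the five pillars for shaping an agent's environment, and what it actually looks like when a team builds a custom harness from scratch.

Full show notes are available at fragmentedpodcast.com.

Show notes

Why it matters
Harness Engineering - OpenAI's post on building the Codex codebase (about 1 million lines of code, 1,500 PRs merged, zero lines written manually)

Shaping the harness
Lost and Found in The Feed - Iury's newsletter, which pulls together the core themes of harness engineering
Agent legibility
Closed feedback loops
Persistent memory
Entropy control
Blast radius control

Building the harness
Minions: Stripe's one-shot, end-to-end coding agents - Stripe built a custom agent for its codebase on top of Goose
Goose - Block's open-source coding agent
Jesse Vincent's Superpowers - skills that push a disciplined software engineering process
Open Code - an open-source coding agent you can fork and customize

Other resources
Agent harness glossary - Latent Patterns
Towards self-driving codebases - Cursor
Agentic workflows - GitHub Next
The future of software development - ThoughtWorks

Contact us
We'd love to hear your feedback. Email is our preferred way to get in touch, and our contact page lists other options. We want to hear all of it: what works, what doesn't, and which topics you'd like us to dig into.
Links: Contact us, Newsletter, YouTube, Website

Co-hosts: Kaushik Gopal, Iury Souza

> [!fyi] We pivoted from Android development to AI starting with episode 300. Listen to that episode for the full story behind our new direction.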

Bilingual subtitles


Speaker 0

欢迎收听《Fragmented》,这是一档帮助 vibe 开发者逐步成长为软件工程师的 AI 开发者播客。

Welcome to Fragmented, an AI developer podcast that helps vibe coders become software engineers one episode at a time.

Speaker 0

我是你的主持人 Kaushik。

I'm your host, Kaushik.

Speaker 1

我是另一位主持人 Iury,我很乐意聊聊如何利用 AI 让你成为一名更好的开发者。

And I'm Iury, the other host of Fragmented, where I'd love to talk about using AI to make you a better developer.

Speaker 0

在上一期节目中,我们深入讨论了 agents.md 文件。

In the last episode, we talked a lot about AGENTS.md files specifically.

Speaker 0

但从整体来看,这仅仅是拼图中的一小块。

But in the grand scheme of things, that's just one piece of the puzzle.

Speaker 0

你提到了一个统称——‘Harness 工程’。

You threw this umbrella term called harness engineering.

Speaker 0

至少现在年轻人都是这么叫的。

At least that's what the kids are calling it these days.

Speaker 0

我知道你做了大量研究,收集了很多文章,里面有不少值得探讨的内容。

I know you did a lot of research and you have a bunch of articles that you pull from and there's some interesting things to talk about.

Speaker 0

但也许我们先给听众们简要介绍一下,今天我们将具体围绕‘ Harness 工程’讨论哪些主题。

But maybe let's start by giving our listeners a sort of overview of what are the topics we're going to talk about specifically around harness engineering today.

Speaker 1

在本集中,我们将探讨什么是 Harness 工程,以及它为什么重要。

So in this episode, we're going to cover what's harness engineering and why is this even important.

Speaker 1

那么,为什么大家现在都在谈论这个话题呢?

So why are people talking about this at all?

Speaker 1

以及如何设计和构建 Harness。

And how to shape the harness and how to build it.

Speaker 1

这样听起来怎么样?

Does that sound good?

Speaker 1

是的,听起来不错。

Yeah, that sounds good.

Speaker 1

我觉得第一个问题非常重要,对吧?

I think the first one is very important, right?

Speaker 1

为什么这件事很重要?

Why does this matter?

Speaker 1

对吧?

Right?

Speaker 0

是的。

Yeah.

Speaker 0

为什么要讨论这个?

Why even talk about this?

Speaker 1

之前OpenAI发过一篇题为《Harness Engineering》的文章,其实很好地解释了这一点。

So there was this OpenAI post with this very name called Harness Engineering that actually helps explain this really well.

Speaker 1

我觉得有趣的是,他们展示的是一个真实产品中的实践。

I think it's also interesting because they're showcasing something that they did with a real product.

Speaker 1

这篇文章所引用的是他们用于Codex的代码库。

It's their code base for Codex that they use as reference for this article.

Speaker 1

他们拥有超过一百万行代码,并声称这些代码全是自动生成的,共合并了1500个拉取请求。

So they have over 1,000,000 lines of code, and they claim that's zero manually written, like 1,500 PRs merged.

Speaker 1

显然,这对他们来说是真正有效的实践。

So apparently, this is the real deal for them.

Speaker 1

结果发现,困难的部分其实不是编写代码,而是构建一个能让代理可靠地生成这些代码的环境。

And it turns out that the hard part was not actually creating the code, but it's basically building the environment that makes the agents reliable to build this code.

Speaker 0

这是他们有意识的努力,对吧?

It was an intentional effort on their end, right?

Speaker 0

他们想的是,如果我们朝着这个方向发展,让代理完全构建整个产品,我们会遇到哪些问题?

Where they're like, if we were to move in this direction where the agent basically builds the entire product, what are the problems we run into?

Speaker 0

随着进展,我们该如何缓解并消除这些对代理造成的问题?

And how do we alleviate and remove those problems for the agent as we go along?

Speaker 1

这在做事方式上是一个巨大的转变。

It's a very big shift in the way you do things.

Speaker 1

对吧?

Right?

Speaker 1

你实际上必须尝试不去手动干预它。

You actually have to attempt to not manually touch it.

Speaker 1

然后问题就会显现出来,你就得想办法解决它们。

And then the problems start showing up and then you have to kind of figure how to solve them.

Speaker 1

这是一个非常有趣的案例研究。

So that's a very interesting case study.

Speaker 0

那篇帖子中有一个让我觉得特别有意思的观点,他们明确提到,整个一周他们都会使用代理来生成拉取请求,但随后会花大约20%的时间来清理、校正并把这些修正反馈到系统中。

One point that stood out in that post, which I thought was interesting: they said explicitly that throughout the week they would basically use agents to generate the PRs, but then they would spend a good 20% of their time cleaning up, making sure they course-correct, and then adding those corrections into the system.

Speaker 0

对。

Right.

Speaker 0

这基本上就是催生了这种‘ Harness 工程’的概念,对吧?

And that's basically what gave birth to this harness engineering thing, right?

Speaker 0

没错。

Exactly.

Speaker 1

工程师的角色正在发生变化,因为我们现在不再那么频繁地编写代码,而是更多地专注于这个引导环节。

It's like the role of the engineer is changing: now that we don't really write the code as much, we're working more on this steering part.

Speaker 1

我们是在引导代理,控制它们的走向。

So we're steering the agents where it's moving.

Speaker 0

顺便说一句,这简直是个神来之笔。

That's like a mic drop moment, by the way.

Speaker 0

随便聊聊。

Just casually.

Speaker 1

随便聊聊。

Just casually.

Speaker 0

是的,你知道的,人类现在已经不写代码了。

Yeah, you know, humans don't write code anymore.

Speaker 0

智能体已经如此先进了。

The agents are so.

Speaker 1

不,安拉保佑。

No, InshaAllah.

Speaker 1

但没错,大致就是这样。

But yeah, that's pretty much it.

Speaker 1

现在的瓶颈不再是能不能写代码,对吧?

The bottleneck now is not if you can write the code, right?

Speaker 1

而是你能不能阻止代码偏离轨道或崩溃,你知道的,因为现在有大量代码在不断涌入。

But it's like, can you stop the code from drifting or breaking, you know, because there's a lot more code going in.

Speaker 0

天啊,这个话题我们也在未来的节目中聊聊吧。

Man, this is something we should talk about in a future episode too.

Speaker 0

我们该怎么去审查所有这些代码呢?

How do we even approach the review of all of that code?

Speaker 1

对吧?

Right?

Speaker 1

哦,这真是个大问题。

Oh, that's a big one.

Speaker 0

生成代码太容易了,但你该怎么真正地审查它呢?

Generating the code is so easy, but how do you actually review it?

Speaker 0

普遍的看法是,至少应该由人类来审查代码,确保垃圾代码(slop)不会混进去。

There's a general theme where everyone is like, yes, at least the human should review the code and make sure the slop doesn't go in.

Speaker 0

但当你面对一千个拉取请求时,根本不可能好好完成审查工作。

But when you have 1,000 PRs coming your way, no chance you're gonna be even doing a good job of reviewing.

Speaker 0

到了这个时候,不如想想怎么让智能体更聚焦。

At that point, it's better to think about how can we get agents to focus.

Speaker 0

我们如何让代理在代码审查方面做得更好?

How do we make the agents better at reviewing the code?

Speaker 1

是的。

Yeah.

Speaker 1

这是一个大问题。

That's a big can of worms.

Speaker 1

是的。

Yeah.

Speaker 1

我们现在来谈谈什么是 Harness 工程。

Let's go now into what is harness engineering.

Speaker 1

我不太确定这个想法具体来自哪里,但它的核心是围绕代理塑造环境,使其能够可靠地运作。

So I don't know exactly where I got this from, but it's basically this idea of shaping the environment around the agent so that it can act reliably.

Speaker 1

围绕代理塑造环境。

Shaping the environment around the agent.

Speaker 1

是的。

Yeah.

Speaker 1

如果你仔细想想,他们给了我们这么多工具和调节选项,比如技能、命令、子代理、上下文、围绕代理的所有指令,以及代理可用的工具。

If you think about it, with all the tools and the knobs that they give us, so for example, skills and commands and sub agents and the context and all the instructions around the agent and the tools that the agent have available.

Speaker 1

因此,关注点从模型有多智能,转向了它拥有哪些工具、它如何获得反馈,以及人类在何时介入。

So this shifts the focus away from how smart the model is, towards what tools it has, how it gets feedback, and when the human steps in.

Speaker 0

这很有趣,因为还有一种不同的观点认为,模型会持续变强,所以你不需要如此关注这些方面。

And this is interesting because there's a different school of thought where, you know, you can think that the models will keep getting better, so you don't need to focus as much.

Speaker 0

你知道,模型自己就会知道该怎么做了。

You know, the models will just know how to do it.

Speaker 0

这是一种观点。

That's one school of thought.

Speaker 0

和AI领域的大多数事情一样,我们不知道今年年底会落在哪里。

As is the case with most things in AI, we don't know where we're going to land at the end of this year.

Speaker 0

我们可能会推出某种超级 GPT-6,或者,我不知道,Opus 5,突然间模型变得如此强大,你就再也不用操心这些事情了。

We might come up with some super GPT-6 or, I don't know, Opus 5, and suddenly the models have become so good that you no longer need to bother about these things.

Speaker 0

但我们永远无法确定。

But we never know.

Speaker 0

至少以今天的状况来看,如果你想获得良好的结果,专注于塑造这个环境是非常有意义的,就像你所说的那样。

At least as it stands today, if you want to get good results, it makes a lot of sense to focus on shaping this environment like you put it.

Speaker 1

我想这就是我们在这个播客中试图做的事情。

I think that's what we try to do in this pod.

Speaker 1

对吧?

Right?

Speaker 1

我们对这些工具和当前状态采取了非常务实的态度,因为没人真正知道我们会走向何方。

We're trying to be very pragmatic about these tools and the current state of things, because no one actually knows where we're going.

Speaker 1

所以,这就是Harness Engineering的整个理念。

So yeah, so this is the whole idea of Harness Engineering.

Speaker 1

但我不是母语者,对吧?

But I'm not a native speaker, right?

Speaker 1

当我第一次听到‘Harness’这个词时,我有点困惑。

So when I first heard the word Harness, I was kind of confused.

Speaker 1

我想我可能是在AI的语境下第一次听到这个词的。

I think I probably first heard it on the context of AI.

Speaker 1

所以我当时就想,这玩意儿到底是什么东西?

So I was like, what the hell is a harness?

Speaker 1

对于非母语者来说,我找到的一个有趣含义是,它指的是你给马套上的那些皮带,比如当你想控制马的方向时。

For the non native speakers out there, but one of the interesting meanings I found is the one that it's talking about, you know, the straps you put around a horse, for example, when you want to steer it, to control it.

Speaker 0

所以在这里,模型基本上就是那匹马。

So the model is the horse here, basically.

Speaker 1

你骑着这东西,需要有这些工具围绕着它,才能把它引向正确的方向,让你在正确的地方驾驭它。

You're riding this thing and you need to have the tools around it so you can point it to the right direction so you can ride it in the right place.

Speaker 1

对吧?

Right?

Speaker 0

松开这个缰绳,你就会从马上摔下来。

Let go of that harness and, you know, you'll fall off that horse, basically.

Speaker 1

没错。

Exactly.

Speaker 1

哦,这个说法真不错。

Oh, that's a good one.

Speaker 0

酷。

Cool.

Speaker 0

所以我们一开始简单提过这个,Iury,但现在我觉得有必要把 Harness 工程分成两个部分。

So we mentioned this briefly at the start, Iury, but I think it's important now to split or fork harness engineering into two things.

Speaker 0

一个是像我们之前讨论的那样,塑造这个 Harness。

One is shaping the harness like we talked about.

Speaker 0

这些是你围绕模型做的、帮助它变得更好的事情。

These are the things that you do around the model to help it get better.

Speaker 0

但还有另一种方法,就是直接构建 Harness 本身。

But there's the other approach too, which is building the harness itself.

Speaker 0

这更像是你想自己打造一个类似 Claude Code 或 Open Code 的东西。

And this is more like if you wanted to build your own version of Claude Code or Open Code or whatever.

Speaker 0

所以在这集中,我觉得可以把内容分成两部分比较合适。

So in this episode, what I thought it would make sense to do is split the episode into two parts.

Speaker 0

第一部分我们先讲如何塑造 Harness,因为这目前看起来最有用、最有效。

First, we're going to talk about shaping the harness, because that feels like the most useful and effective thing to do now.

Speaker 0

我们会把大部分时间花在这里。

We'll spend a majority of our time there.

Speaker 0

但我们也会简要探讨一下从头构建整个 Harness 这一概念。

But we'll also touch on this concept of building the harness altogether.

Speaker 0

好的,我们来谈谈塑造 Harness,这是 Harness 工程中第一个需要讨论的重要领域。

Okay, let's talk about shaping the harness, which is the first big area to cover here in harness engineering.

Speaker 0

在实际操作中,这到底是什么样的呢?

What does that actually look like in practice?

Speaker 0

我认为在你的通讯稿中,Iury,我们会在节目说明中附上链接,你提到了五个方面或主题,关于如何塑造 Harness。

And I think in your newsletter, Iury, which again we'll add a link to that in the show notes, you pointed towards these five areas or themes, so to speak, around shaping the harness.

Speaker 0

我喜欢你对这些内容的结构安排。

And I liked how you structured those.

Speaker 0

所以,我们直接进入,逐一讨论这五个方面吧。

So maybe we just jump right in and talk about each of those five areas.

Speaker 1

好的。

All right.

Speaker 1

是的

Yeah.

Speaker 1

我其实读过很多不同来源的内容,它们结构上都不太清晰,但我试着把它们整理成一种有条理的方式。

So I've actually read a bunch of different, like from different sources, and they were not that structured, but I try to organize them in a way.

Speaker 1

这并不是任何官方来源,只是我开始思考这个问题的方式。

So this is not any official source or, you know, like this is just the way I started thinking about this.

Speaker 1

所以第一个大的主题,我认为是‘代理可理解性’这个概念,这个词可能有点奇怪。

So the first big theme, I think, is this idea of agent legibility, which is maybe a weird word.

Speaker 1

我不确定。

I don't know.

Speaker 0

不,不,我觉得我明白。

No, no, I think I understand.

Speaker 0

但也许你该解释一下‘代理可理解性’,也就是如何让代理更容易理解?

But maybe you should explain that agent legibility, which is like how do you make it easy for the agent to understand?

Speaker 0

你能详细说说这个吗?

Can you expound on that?

Speaker 0

什么是代理的可理解性?

What is agent legibility?

Speaker 1

我认为背后的理念是,如果知识存在于Slack中、某人的头脑里,或者某个链接到其他地方的随机文档中,代理就无法使用它。

I think behind this is the idea that if knowledge lives in Slack or in someone's head or in some random doc that's linked elsewhere, the agent cannot use it.

Speaker 1

因此对代理来说,它本质上就不存在。

So for the agent, it essentially doesn't exist.

Speaker 1

所以你认为再简单不过的事情,实际上并不在它的上下文范围内,完全无法被发现。

So something that you think it's trivial is just not part of its context and it's not discoverable at all.

Speaker 0

就是那些你和产品经理或其他人在Slack里的对话。

It's those things, those conversations you have with your product manager or other people in Slack.

Speaker 0

那些不存在于代码库中的Google文档和PRD。

The Google Docs, the PRDs that don't exist in the codebase itself.

Speaker 0

你希望把这些内容捕捉并编码到系统中。

Those are the things you want to sort of capture and encode into this.

Speaker 0

这似乎就是这里的主题:如果代理无法看到某些信息,就要确保将其编码并纳入,以便它能从你的代码库中读取这些内容。

That feels like the theme here, which is like if there are things that the agent has no visibility into, make sure that you encode and put that so that it can start to read this from your codebase.

Speaker 0

否则,它就没有任何知识,对吧?

Because otherwise, it has no knowledge, right?

Speaker 0

如果它看不到,就不可能将这些内容纳入它的推理和思考中。

If it can't see it, there's no chance it's going to incorporate that into its reasoning and thinking.

Speaker 1

是的。

Yeah.

Speaker 1

这意味着代码库必须对代理更易导航,而不仅仅是为了人类。

It's the idea that the repo has to be more navigable by the agents, so not just for humans.

Speaker 0

我喜欢这个说法。

I like that.

Speaker 0

让代码库对代理更易导航。

Make it more navigable by agents.

Speaker 0

但自然的问题是,你该如何做到这一点?

But the natural question is how do you do that?

Speaker 0

你建议我怎么做,才能让代码库对代理更易导航?

What are you suggesting that I do to make it more navigable by the agents?

Speaker 0

对。

Right.

Speaker 1

回到 AGENTS.md 这个起点,现在你不必只是在 AGENTS.md 里堆上 1000 行指令,而是可以明确指出:如果你想了解仓库的工作方式,有一个 docs 文件夹,里面按功能或某种主题组织了相关信息,以及关于仓库运作的跨领域关注点。

So going back to the whole AGENTS.md starting point: instead of just having a bunch of instructions, like 1,000 lines in AGENTS.md, you can actually specify that, hey, if you want to understand how the repo works, there is this docs folder, which we organize by feature or some kind of theme, plus some cross-cutting concerns of how the repo works.

Speaker 1

因此,你可以有组件概览、设计规范、编码实践,所有这些内容对代理自身来说都更容易发现。

So you can have some components overview, design rules, coding practices, and all of this becomes more discoverable by the agents themselves.

Speaker 0

为了明确一下,你所说的 docs 文件夹,一方面是要真正创建这个文件夹并添加所有这些内容,另一方面还要在 AGENTS.md 中像地图一样引用它。

And just to be clear, the docs folder you're describing: one part is actually creating the docs folder and adding all of those things, but then also having that referenced in the AGENTS.md, like a map.

Speaker 0

这才是关键所在。

And that's the important piece.

Speaker 1

是的。

Yeah.

Speaker 1

笔记之间可以相互链接。

And notes can link to other notes.

Speaker 1

当代理遍历这些文档时,它能够自行判断这些内容属于哪里以及与什么相关。

When the agent is traversing this documentation, it can figure out by itself like where does this go and what this is related to.
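The "map" idea can be sketched in a few lines of Python. This is only an illustration, assuming a repo with an AGENTS.md at the root that links to markdown notes under docs/; that layout and the function name `find_broken_links` are hypothetical and not something described in the episode. It follows the same links an agent would follow and reports any that lead nowhere.

```python
import os
import re
from pathlib import Path

# Matches markdown links such as [Design rules](docs/design-rules.md).
# Pure anchors like (#section) are ignored; a trailing #anchor is dropped.
LINK = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)]*)?\)")

def find_broken_links(repo_root: Path, entry: str = "AGENTS.md") -> list[str]:
    """Follow relative markdown links from the entry file and list dead ones."""
    root = repo_root.resolve()
    broken: list[str] = []
    seen: set[Path] = set()
    queue = [root / entry]
    while queue:
        doc = queue.pop()
        if doc in seen:
            continue
        seen.add(doc)
        for target in LINK.findall(doc.read_text(encoding="utf-8")):
            if "://" in target or target.startswith("mailto:"):
                continue  # external links are out of scope for this check
            path = (doc.parent / target).resolve()
            if not path.exists():
                broken.append(f"{os.path.relpath(doc, root)} -> {target}")
            elif path.suffix == ".md":
                queue.append(path)  # notes can link to other notes
    return sorted(broken)
```

Run in CI, a check like this keeps the map honest: if a note is renamed or deleted, the dead link is caught before an agent wanders into it.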

Speaker 0

也许我们继续下一个话题。

Maybe we move to the next one.

Speaker 0

你接下来提到的主题是什么?

What was the next theme that you brought about?

Speaker 1

是的。

Yeah.

Speaker 1

我认为很多人也正在意识到并着手解决的一个问题是:关闭反馈循环。

This is something that I think a lot of people have been realizing too, like, and working on is the idea of closing the feedback loops.

Speaker 0

闭合反馈循环听起来像是控制理论,或者一些高深的机器学习术语。

And, you know, closed feedback loop sounds like this control flow theory, you know, some fancy machine learning term.

Speaker 0

但我们说的‘关闭反馈循环’到底是什么意思?

But what do we mean by closing the feedback loop?

Speaker 0

因为很多人经常提到这个词,但实际操作中,你指的是什么?

Because I hear a lot of people say this, but pragmatically, what do you mean by that?

Speaker 1

嗯,代理需要知道它什么时候出错了。

Well, the agent needs to know when it messed up.

Speaker 1

所以它需要某种形式的反馈。

So it needs some kind of feedback.

Speaker 1

所以我认为第一个可能是测试。

So I think the first one could be probably tests.

Speaker 1

我认为它们是基础。

I think they're the baseline.

Speaker 1

所以如果你让它实现某个功能,它需要有测试,这样当它运行测试且测试失败时,你就不必告诉它出错了。

So if you ask it to implement something, it needs to have tests, so that when it runs a test and the test fails, you don't need to tell it that it's broken.

Speaker 1

它可以自己发现问题。

It can figure it out by itself.

Speaker 0

这很有道理。

And this makes sense.

Speaker 0

所以这就是即使拥有测试的意义,对吧?

So this is the idea of even having the tests, Right?

Speaker 0

因为在上一节关于代理可理解性的内容中,你还需要告诉代理如何运行测试。

Because in the last section with the agent legibility, you want to also tell the agent how to run the tests.

Speaker 0

这部分内容就放在那一节里。

That goes in that section.

Speaker 0

在这一节中,我一直在想‘Harness’这个词,这个词很好。

In this section, I mean, now I keep thinking about the term harness, which is a good term.

Speaker 0

你的代码库首先必须具备运行这些测试的能力。

Your code base needs to have the ability to even run these tests in the first place.

Speaker 1

是的。

Yeah.

Speaker 1

举个例子,可能不太明显的一点是:如果代理要运行测试,那么测试的输出必须足够清晰,以便代理能够据此采取行动。

So for example, one thing that maybe is not immediately obvious is that, okay, if the agent is going to run the test, so the output from the test needs to be clear enough for the agents to act on it.

Speaker 1

这可能取决于你正在构建的应用类型或系统类型,但有些测试本身并不直接可操作。

So, depending on what kind of app or what kind of system you're building, some tests are not immediately actionable.

Speaker 1

你需要在某个地方检查测试结果。

You need to inspect the results somewhere.

Speaker 1

也许会把结果输出到一个代理不知道位置的文件中。

Maybe it dumps the results into a file, and the agent doesn't know where that file is.

Speaker 1

所以这些测试实际上对反馈循环没有帮助。

So these tests are not useful actually for the feedback loop.
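One way to make test results actionable is to boil a report down to a few lines the agent can read directly. The sketch below is an assumption-laden illustration: it presumes the test runner can emit a JUnit-style XML report (many can), and the function name `summarize_junit` is made up for this example.

```python
import xml.etree.ElementTree as ET

def summarize_junit(xml_text: str, max_chars: int = 300) -> str:
    """Turn a JUnit-style XML report into a short, agent-readable summary."""
    root = ET.fromstring(xml_text)
    failures = []
    total = 0
    for case in root.iter("testcase"):
        total += 1
        # A failed case carries a <failure> or <error> child element.
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is None:
            continue
        name = f"{case.get('classname', '?')}.{case.get('name', '?')}"
        detail = (problem.get("message") or problem.text or "").strip()
        failures.append(f"FAIL {name}: {detail[:max_chars]}")
    header = f"{total - len(failures)}/{total} tests passed"
    return "\n".join([header, *failures])
```

Printing this summary to standard output, rather than leaving the report in a file the agent has to discover, is what closes the loop: the agent sees which test failed and why in the same place it ran the command.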

Speaker 0

哦,这是个非常好的观点。

Oh, that's a really good point.

Speaker 0

所以不仅仅是编写测试,还要学会如何读取测试的结果。

So it's not just about writing the test, it's also how you read the results from the test.

Speaker 1

是的,它需要形成一个完整的闭环,对吧?

Yeah, it needs to actually close the loop, right?

Speaker 1

所以它需要

So it needs to

Speaker 0

我明白了。

I see.

Speaker 0

我喜欢你这样做。

I like how you did that.

Speaker 0

所以这就是闭环的那部分,好的。

So that's the closing-of-the-loop part. Okay.

Speaker 0

这很有道理。

That makes a lot of sense.

Speaker 0

所以写测试本质上就是实现反馈闭环的想法。

So writing tests is basically the idea of closing the feedback loop.

Speaker 0

这里还有其他内容吗?

Is there anything else here?

Speaker 1

对。

Right.

Speaker 1

所以我认为,作为开发者,我们习惯的很多东西其实也是我们需要的,对吧?

So I think it's most of the things that we are used to as developers, because we need these things as well, right?

Speaker 1

所以如果你曾经试着在记事本里写代码,会非常困难,对吧?

So if you ever try to write code in a notepad, it's really hard, right?

Speaker 1

因为你无法确切知道正在发生什么。

Because you don't know exactly what's going on.

Speaker 1

这只是一个文本。

It's just text.

Speaker 0

嘿,伙计。

Hey, man.

Speaker 0

15年前,我们用Notepad++,我不知道,当我学C++课程的时候,我们还得用那个。

15 years back, Notepad++, I don't know, you know, when I had my C++ courses, we had to do that.

Speaker 0

我的意思是,别说Notepad++了,我们甚至得在纸上写代码,听起来很疯狂,但确实如此。

I mean, forget Notepad++, we had to write code on paper, you know, as crazy as that sounds.

Speaker 1

难多了。

Way harder.

Speaker 1

对。

Right.

Speaker 1

没错,确实难多了。

That's exactly it. It's way harder.

Speaker 1

所以这些工具,代理们和我们一样需要它们。

So the agents need them as much as we do.

Speaker 0

哦,我喜欢你这样联系起来。

Oh, I like how you connected that.

Speaker 0

好的。

Okay.

Speaker 0

好的。

Okay.

Speaker 0

这很好。

That's good.

Speaker 0

这很好。

That's good.

Speaker 0

所以你的意思其实是,还记得那有多难吗?

So you're basically saying, remember how hard that was?

Speaker 0

所以别那样做了。

So don't do that.

Speaker 1

没错。

Exactly.

Speaker 1

尤其是,比如一些基础的东西,像类型检查或代码检查(linting)。

And especially, for example, even basic things like type checks or linting.

Speaker 1

你能给代理提供的任何反馈都会有帮助。

Any kind of feedback you can give to the agent will help.

Speaker 1

所以,静态分析就是二十一世纪的工具,基本上是这样。

So static analysis, twenty-first-century tools, basically.

Speaker 1

对。

Right.

Speaker 1

然后你会接触到更高级的内容。

And then you get to even more higher level stuff.

Speaker 1

不是高级别的,而是一些可能不会立刻显而易见、但你需要构建的东西。

Not high level, but things that maybe doesn't jump to the eye immediately that you need to build.

Speaker 1

但比如说,代理需要能够访问日志。

But for example, the agent needs to have access to the logs.

Speaker 1

根据你的系统架构,日志可能并不容易获取。

Depending on how your system is built, the logs may be not so trivial to pull from.

Speaker 1

所以你需要让代理能够随时获取这些信息。

So you need to get the agent to have that at hand.

Speaker 0

我认为即使在OpenAI的帖子中他们也提到过,他们使用的是LogSQL之类的工具。

I think even in the OpenAI post they mentioned, they use LogSQL or something.

Speaker 0

我忘了他们具体用的是什么工具。

I forget what the tool is specifically that they use there.

Speaker 0

根据第一手经验,我也可以说,在Entropy我们不得不花时间研究如何从我们的系统中提取日志,以便提供给代理,因为其中涉及太多细微的环节。

From first hand experience, I can also say that at Entropy we had to spend some time figuring out how to draw the logs out from our systems in a way that we can feed to agents because there's so many minor points.

Speaker 0

例如,你的日志文件可能大得惊人。

For example, your logs could be a ginormous size.

Speaker 0

你甚至可能没有API。

You may not even have APIs.

Speaker 0

你必须分页调用API,或者设置MCP。

You have to paginate your APIs or you set up MCPs.

Speaker 0

但问题是,你所使用的工具并没有现成的MCP。

But then the MCPs don't exist for the tools that you use.

Speaker 0

所以所有这些都属于工程工作。

So all of that is this engineering.

Speaker 0

对吧?

Right?

Speaker 0

你如何才能做到让代理有效地提取这些日志并加以利用?

How do you get to a point where you can get the agent to pull those logs effectively and then use those logs?

Speaker 0

因为即使你考虑日志,里面也有时间戳。

Because even if you think about logs, there's timestamp.

Speaker 0

我们不想提取一整天的日志。

You don't want to pull the logs for the entire day.

Speaker 0

你只想提取在代理需要时的特定时刻的日志,或者使用正确的过滤条件。

You want to pull it for that specific moment where it's useful for the agent to pick it up or the right filters on those logs.

Speaker 0

因此,为了确保在代理需要时能正确提取信息,这其中涉及大量所谓的工程工作。

So there's a lot of quote-unquote engineering that goes into making sure that you pull the information correctly for these agents when they need it.
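The "specific moment, right filters" point can be sketched as a small helper an agent tool might call. This is a minimal sketch under stated assumptions: each log line is assumed to start with an ISO 8601 timestamp followed by a level, and the function name `logs_around` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

def logs_around(lines, moment, window_s=60, level=None, limit=200):
    """Return log lines within window_s seconds of moment.

    Assumes each line looks like "2025-01-01T12:00:05 ERROR payment failed",
    with timestamps in the same timezone convention as `moment`.
    """
    window = timedelta(seconds=window_s)
    picked = []
    for line in lines:
        stamp, _, rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not match the assumed format
        if abs(when - moment) > window:
            continue
        if level and not rest.startswith(level + " "):
            continue
        picked.append(line)
    return picked[-limit:]  # keep only the newest lines so the context stays small
```

The cap at the end matters as much as the filters: a tool that can return an unbounded amount of text will eventually flood the agent's context.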

Speaker 1

你提到的这一点非常好,因为我觉得这正是让‘ Harness 工程’这个概念对我更具吸引力的原因。

This is a very good point that you made, because I think this is what makes this idea of harness engineering more compelling to me.

Speaker 1

因为之前我们做的是上下文工程。

Because the previous thing we had was context engineering.

Speaker 1

所以上下文工程更像是那种玄乎的东西,你只是输入一些文字,试图从代码库中找到语义之类的东西,试图激发AI的这部分能力。

So context engineering is more in this, you know, like the woo woo kind of thing that you're just typing words and trying to find semantic and whatever from the code base, like try to invoke this good part of the AI.

Speaker 1

但这里才是真正意义上的工程工作,对吧?

But here is actual engineering work, right?

Speaker 1

好吧,我需要日志。

Okay, I need logs.

Speaker 1

我该怎么获取这些日志?

How do I get the logs in?

Speaker 1

哦,我需要把它连接起来。

Oh, I need to tie this in.

Speaker 1

好吧,也许我可以设置一个钩子,每当模型编写代码时,我就触发代码检查器来格式化我的代码库,确保一切都很严谨。

Okay, maybe I have a hook that whenever the model writes code, then I will trigger the linter to format my code base to make sure that things are very tight.

Speaker 1

所以,我认为这属于工程的部分。

So this is, I think, the engineering part.
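A hook of the kind Iury describes might look like the sketch below. It is an illustration only: how a given coding agent registers hooks is not covered in the episode, the name `after_agent_edit` is hypothetical, and Python's standard `py_compile` module stands in for whatever linter or formatter a team actually uses.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical mapping from file suffix to a check command.
# py_compile (a syntax check from the standard library) is a stand-in for a real linter.
CHECKS = {".py": [sys.executable, "-m", "py_compile"]}

def after_agent_edit(changed_files, run=subprocess.run):
    """Hook body: check each file the agent touched and return feedback text."""
    feedback = []
    for path in changed_files:
        command = CHECKS.get(Path(path).suffix)
        if command is None:
            continue  # no check configured for this file type
        result = run([*command, path], capture_output=True, text=True)
        if result.returncode != 0:
            feedback.append(f"{path}: {(result.stderr or result.stdout).strip()}")
    return "\n".join(feedback)  # an empty string means there is nothing to fix
```

Returning the tool's own error text, instead of a bare pass or fail, gives the agent something it can act on in its next step.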

Speaker 0

很好。

Great.

Speaker 0

而且这适用于各个方面。

And it applies across the board.

Speaker 0

假设你有截图测试,那么你就需要知道如何阅读截图,甚至首先要设置截图测试。

Say you have screenshot tests, then you need to know how to read screenshots at all or even set up screenshot tests in the first place.

Speaker 0

作为移动开发者,你需要能够启动一个模拟器,以便代理可以在模拟器上运行测试并读取结果。

As mobile developers, you need to be able to spin up an emulator or something so that the agent can then actually run the test on an emulator, read the results.

Speaker 0

无论你作为人类需要做什么来检查系统,你都需要将系统工程化,以便代理也能做到同样的事情。

Whatever it is that you as a human do to check the system, you need to engineer your system so that the agents can basically do that as well.

Speaker 0

接下来是什么?

What's next?

Speaker 1

哦,现在这个就变得复杂了。

Oh, now here comes a complicated one.

Speaker 1

外面有很多不同的观点。

Lots of opinions out there.

Speaker 1

但这确实是一个开放性问题。

And it's most definitely an open problem.

Speaker 1

但我们必须解决记忆问题,某种形式的持久记忆。

But we need to solve memory, some kind of persistent memory.

Speaker 0

人们对这个问题的处理方式差异很大。

People approach this so differently.

Speaker 0

目前关于记忆的解决方案非常多。

There are so many solutions out there for memory.

Speaker 0

但也许你可以先告诉我们,所谓持久记忆,你具体指的是什么?

But maybe you can start off by telling us, yeah, well, what do you mean by persistent memory?

Speaker 0

然后我们可以讨论不同的方法。

And then we can talk about approaches.

Speaker 1

是的,如果缺乏一个权威的记录系统、明确的设计决策以及对常见故障模式的理解,那么无论是人类还是代理,都会反复为同样的困惑付出代价,对吧?

Yeah, so the idea here is that if you don't have a system of record for design decisions and failure patterns, both humans and agents will keep paying for the same confusion over and over, right?

Speaker 1

因此,当出现问题时,我们需要能够记住它,以避免再次发生。

So when something goes wrong, we need to be able to remember that so that it doesn't happen again.

Speaker 0

我们把记忆看作是防止过去错误的一种方式。

We think of memory as preventing past mistakes.

Speaker 0

那么,持久记忆和仅仅把某些东西放进你的 AGENTS.md 文件,有什么区别呢?

What is the difference then between persistent memory and, say, just putting something in your AGENTS.md file?

Speaker 1

因为用记忆基本上可以解决两类问题。

Because there's basically like two things you can solve with memory.

Speaker 1

一个是过去决策的概念,另一个是基本上接续你之前停下的地方。

It's like one is this idea of like past decisions, but it's also like basically trying to pick up where you left from.
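The two uses Iury names, recording past decisions and picking up where you left off, can be sketched with the simplest option on the spectrum: a plain markdown file. Everything here is an assumption for illustration, including the file format, the entry kinds "decision" and "handoff", and the function names `remember` and `recall`.

```python
from datetime import date
from pathlib import Path
from typing import Optional

def remember(memory_file: Path, kind: str, note: str, today: Optional[date] = None) -> None:
    """Append one dated entry; kind is e.g. 'decision' or 'handoff'."""
    stamp = (today or date.today()).isoformat()
    with memory_file.open("a", encoding="utf-8") as fh:
        fh.write(f"- {stamp} [{kind}] {note}\n")

def recall(memory_file: Path, kind: str, last: int = 5) -> list[str]:
    """Return the most recent entries of one kind, oldest first."""
    if not memory_file.exists():
        return []
    lines = memory_file.read_text(encoding="utf-8").splitlines()
    entries = [line for line in lines if f"[{kind}]" in line]
    return entries[-last:]
```

At the start of a session an agent could call `recall(path, "handoff", last=1)` to resume work, and `recall(path, "decision")` before touching an area with history. More sophisticated systems swap the file for embeddings or an index, but the read and write points stay the same.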

Speaker 0

哦,这非常有趣。

Oh, that's very interesting.

Speaker 0

这不仅仅关乎当你从零开始时,想要编码过去的记忆。

It's not just about if you're starting net new, you want to encode past memory.

Speaker 0

如果你思考人们是如何处理记忆的,也许你可以再多谈谈这一点,有些解决方案相当简陋,比如我认为 Claude 的做法,就是现在有一个 memory.md 文件,然后你直接使用这个 memory.md 文件就可以了。

If you think about how people are approaching memory, and maybe you can talk a little more about this, there are solutions as hacky as, I think, what Claude is doing, which is there's a memory.md file now, and then you can just use the memory.md file.

Speaker 0

然后还有一些人使用类似这样的工具?

Then there are people who have tools like what is it?

Speaker 0

我认为 Shopify 的 CEO Tobi 有一个叫 QMD 的工具,你把所有内容都输入进去,得到编码后的 Markdown,然后保存为嵌入向量,或者我认为它是存入向量数据库的。

I think Tobi, the Shopify CEO, has this QMD tool, where you feed it all your content and get encoded markdown, which you then save as embeddings, or I think it puts it in a vector DB.

Speaker 0

我记不清了。

I forget.

Speaker 0

还有一些完整的系统,你得设置一套完整的索引流程。

And then there's entire systems where you set up an entire indexing process.

Speaker 0

对于那些不记得的人,比如早期的 Cursor,每次你打开代码库时,它都会建立索引。

For those who don't remember, Cursor in the early days, you know, anytime you opened your codebase, it would index it.

Speaker 0

你知道,从简单的markdown到复杂的系统,这中间有一系列不同的选择。

You know, there's a range from like a simple markdown to like sophisticated systems.

Speaker 1

是的。

Yeah.

Speaker 1

因为本质上,它们是一回事,对吧?

Because essentially it's the same thing, right?

Speaker 1

因为所有东西都是 markdown。

Because everything's markdown.

Speaker 1

但我认为关键在于你如何使用它,或者如何填充内容。

But I think it's just the way you use it or how you fill that in.

Speaker 0

你还记得 RAG 吗?曾经有一段时间,RAG 模型特别流行。

You know how we had RAG? At one point, RAG models were all the rage.

Speaker 0

我有一个强大的模型,但我想把我的一些经验融入进去。

I have this powerful model, but then I want to layer in some of my learnings.

Speaker 0

你可以把那些内容写进markdown文件里,根本没人拦着你。

You can put that in a markdown file; nothing is stopping you.

Speaker 0

但还有更高效的方法。

But then there are more efficient ways.

Speaker 0

模型最终可能会达到同样的结果并给出答案。

The model eventually might get to the same point and give you the results.

Speaker 0

但你希望让这个过程更快。

But you want to make that process faster.

Speaker 0

所以记忆似乎是一个很好的方式,因为随着这些模型变得越来越强大,智能体最终会得出正确的答案。

So memory feels like a good way to do that because eventually the agent, as these models get really good, the agents will come up with the right answer.

Speaker 0

你希望加速这个过程。

You want to hasten that process.

Speaker 0

你希望快速且高效地获得这些答案。

You want to get those answers quickly and more efficiently.

Speaker 0

你不希望一直消耗令牌。

You don't want to keep burning tokens.

Speaker 0

这似乎就是记忆的强大之处。

It feels like that's the powerful piece of memory.

Speaker 1

这确实是个非常有趣的话题,因为你开始思考我们是如何记忆事物以及如何提取事实和信息的。

This is honestly a super interesting topic because you start thinking of how we remember things and how we surface facts and things.

Speaker 0

你说得对,也许当这些解决方案成熟、人们提出更多想法后,我们应该再专门做一期节目来讨论记忆。

To your point, maybe we should spend another episode just talking about memory once these solutions mature and people come up with more ideas.

Speaker 0

我们一定会这么做的。

We'll make sure to do that.

Speaker 0

下一个是什么?

What's the next one?

Speaker 0

熵控制,对吧?

Entropy control, right?

Speaker 0

这可能是我第一次用上我的机械工程背景。

This is maybe the one time I'm using my mechanical engineering background.

Speaker 0

我觉得我终于用上点什么了。

I feel like I'm finally using something.

Speaker 0

没错。

Exactly.

Speaker 0

它表现得相当不错。

It pretty well.

Speaker 0

它基本上指的是随机性或无序程度。

It basically means the degree of randomness or disorder.

Speaker 0

因此,热力学的其中一条规则或定律指出,熵会随着时间逐渐增加。

So one of the rules or laws of thermodynamics says that entropy over time will gradually increase.

Speaker 0

那么,我该如何将这一点与我的代码库联系起来呢?

So how do I connect that to my codebase?

Speaker 1

这不正是代码库中发生的情况吗?

That's exactly what happens in a codebase, right?

Speaker 1

如果你置之不理,它就会变得一团糟。

If you leave it unattended, it just becomes a mess.

Speaker 0

这是个很好的说法,那就是:什么也不做,就别指望它会自然变得有序。

That's a good way to put it, which is: don't do anything, and don't expect it to naturally become orderly.

Speaker 0

本质上,它会变得混乱。

By nature, it is going to become disorderly.

Speaker 1

要保持整洁需要很多努力,对吧?

It needs a lot of effort to keep things tidy, right?

Speaker 1

就像你的卧室一样。

Just like your bedroom.

Speaker 1

我曾经有个室友,他总说不管怎样,房间都会趋向混乱。

I had a roommate at some point that he used to say that no matter what, the room drifts to chaos.

Speaker 1

很好。

Great.

Speaker 1

很棒的室友。

Great roommate.

Speaker 1

所以我认为,现在由代理生成的代码的问题在于,这种情况被大大放大了,因为代码库中变动的东西多了很多。

So I think the thing with agent generated code now is that this is multiplied a lot because there's a lot more moving things in the code base.

Speaker 1

有大量更多的代码被写入。

There's just a lot more code being written.

Speaker 1

所以熵失控的可能性大大增加了。

So the chances of entropy taking over just increase a lot.

Speaker 0

这是个很好的观点,因为现在有太多东西在变化了。

That's a good point because there's so much going in.

Speaker 0

是的。

Yeah.

Speaker 0

对。

Yeah.

Speaker 1

所以模式会逐渐偏离。

So the patterns drift.

Speaker 1

即使你有完整的Harness工程体系、一堆防护措施,并且非常认真地进行代码审查,它仍然会偏离,文档会过时,辅助函数也可能开始增多。

Even if you have the whole Harness Engineering going on and you have a bunch of guardrails and you're reviewing very diligently, it's still going to drift and docs will go stale and helper functions might start multiplying.

Speaker 1

这就跟我们人类的情况有点像,对吧?

So just kind of like we have with humans, right?

Speaker 1

因为我觉得现在很多人好像觉得在AI出现之前,代码都是人类写得完美的。

Because I think a lot of people now kind of act like humans wrote perfect code before AI.

Speaker 0

确实如此。

This is true.

Speaker 0

天啊。

It's like, oh my god.

Speaker 0

全是AI写的垃圾代码。

All the AI slop code.

Speaker 0

你写的也没好到哪儿去。

What you wrote was not any better.

Speaker 1

我也见过很多人类写的糟糕代码。

I've seen a lot of atrocities from humans as well.

Speaker 1

确实如此。

So true.

Speaker 1

确实如此。

So true.

Speaker 1

对。

Right.

Speaker 1

确实有。

There's yeah.

Speaker 1

我们需要设法控制这种偏离。

We need to try to control this drift.

Speaker 1

我认为熵控制本质上就是这个。

And I think the entropy control is basically that.

Speaker 1

那么在 Harness 中如何实现呢?

So how do you do that in a harness?

Speaker 1

你可以手动解决它。

You can solve it manually.

Speaker 1

你可以尝试自动化这些事情。

You can try to automate things.

Speaker 1

你可以使用更多AI来尝试控制它。

You can use more AI to try to control it.

Speaker 1

这是人们正在探索的另一件事。

That's another thing that people are exploring.

Speaker 1

例如,你可以设置一个专门处理文档漂移的代理,确保文档不会过时。

For example, you could have an agent for catering to documentation drift, trying to make sure that the docs are not stale.

Speaker 1

你可以做各种各样的事情。

You can do all sorts of things.

Speaker 1

而且这本身也可以单独做一期节目,我想。

And this could also be an episode of its own, I guess.

Speaker 1

但确实,你得对这个问题采取些措施。

But yeah, you'd have to do something about this.
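
A minimal sketch of the documentation-drift check mentioned above, assuming the docs refer to code by backticked names. The function name `find_stale_references` and the sample docs and code are hypothetical, for illustration only; a real agent would do far more than a name lookup.

```python
import re


def find_stale_references(doc_text: str, source_text: str) -> list[str]:
    """Return backticked identifiers in the docs that never appear in the source."""
    # Identifiers the docs mention, e.g. `load_config`.
    referenced = re.findall(r"`([A-Za-z_][A-Za-z0-9_]*)`", doc_text)
    # Every identifier-like token that appears anywhere in the source.
    defined = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source_text))
    stale = []
    for name in referenced:
        if name not in defined and name not in stale:
            stale.append(name)
    return stale


# Hypothetical inputs: the docs still mention a function that was renamed.
docs = "Call `load_config` first, then `start_server`."
code = "def load_config(path):\n    return {}\n\ndef run_server():\n    pass\n"

print(find_stale_references(docs, code))  # prints ['start_server']
```

A check like this could run in CI, or feed its findings to an agent that proposes the doc update.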

Speaker 0

有了这一点,我想我们基本上就谈到了你在这里添加的最后一个话题,即影响范围控制。

And with that, we basically come to the last one, I think, that you added here, which is blast radius control.

Speaker 0

你能具体解释一下你所说的这一部分是什么意思吗?

Can you explain what you mean by this section specifically?

Speaker 1

是的。

Yeah.

Speaker 1

这个主要是关于范围权限、风险感知检查、审批关口,以及你可以用来明确控制代理在代码库敏感区域行为的所有措施。

So this one is just about scope permissions and risk aware checks and approval gates and everything that you can use to explicitly control what the agent can do, especially in sensitive areas of the code base.

Speaker 0

对吧?

Right?

Speaker 0

我明白了。

I see.

Speaker 0

所以这个想法是设置正确的权限,以免代理突然去对你的整个数据库执行 rm -rf。

So this is the idea that you set up the right permissions so you don't let the agent suddenly go and rm -rf your entire database.

Speaker 0

我相信大家都看过这类故事,但这个主题就是讲这个的。

I'm sure people have seen the stories, but that's what this theme is about.

Speaker 0

爆炸半径的意思是确保你的代理不会真的做出坏事,或在你的代码库中引发核爆。

Blast radius control means making sure you don't let your agent actually do the bad things or go nuclear on your code base.
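
One way to sketch a risk-aware approval gate, assuming the sensitive areas of a code base can be described with path patterns. The patterns, the file paths, and the `requires_approval` helper are all hypothetical; nothing here comes from a specific tool.

```python
from fnmatch import fnmatch

# Hypothetical patterns for areas where an agent's change needs a human sign-off.
SENSITIVE_PATTERNS = ["db/migrations/*", "infra/*", "*.env"]


def requires_approval(changed_paths, patterns=SENSITIVE_PATTERNS):
    """Return the changed paths that match any sensitive pattern, sorted."""
    return sorted(
        path
        for path in changed_paths
        if any(fnmatch(path, pattern) for pattern in patterns)
    )


# Hypothetical change set produced by an agent.
changes = ["src/app.py", "db/migrations/0042_drop_users.sql"]
flagged = requires_approval(changes)

if flagged:
    print("Human approval needed for:", flagged)
else:
    print("Low risk, agent may proceed.")
```

The gate lives outside the agent, so it holds even when the agent ignores its instructions.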

Speaker 1

没错。

Exactly.

Speaker 1

我以前也听过这种说法。

It's like, I heard this before as well.

Speaker 1

当你听到一个初级开发者炸了公司数据库的故事时,这并不是初级开发者的错,对吧?

When you hear a story that a junior developer nuked the company's database, it's not the junior developer's fault, right?

Speaker 1

这其实是给他数据库访问权限的那个人的错。

It's the fault of the person who gave them access to the database.

Speaker 0

你应该建立坚不可摧的系统,让人根本无法做出错误的操作。

You should have bulletproof systems that don't allow you to even do the wrong thing.

Speaker 1

没错。

Exactly.

Speaker 1

是的。

Yeah.

Speaker 1

如果你的代理在做灾难性的事情,实际上并不是代理的错。

If your agent is doing something catastrophic, it's not the agent's fault actually.

Speaker 1

所以你必须对此采取一些措施。

So you have to do something about that.

Speaker 1

比如,我有个很愚蠢的例子,但前几天我确实试过那个Gmail MCP。

For example, I had this thing, very stupid example, but I was trying this Gmail MCP the other day, like some time ago, actually.

Speaker 1

天哪。

Oh, boy.

Speaker 1

而这最终是邮件撰写。

And that's ultimately email drafting.

Speaker 1

好的。

Okay.

Speaker 1

哦,我能看出这个故事很棒。

Oh, I can see that the story is great.

Speaker 1

而且它真的有草稿功能。

And it really had this draft feature.

Speaker 1

我说,好的。

I said, okay.

Speaker 1

让我用一下草稿功能。

Let me use the draft feature.

Speaker 1

我想我告诉它我们要起草一封邮件,然后我们就邮件内容来回沟通。

And I think I told it we were going to draft an email and we're just, you know, like back and forth over the email.

Speaker 1

但我觉得这件事持续了一段时间。

But I think the thing just went on for some time.

Speaker 1

我不知道。

I don't know.

Speaker 1

然后在某个时刻我说,好吧,我们就做这个修改吧,我觉得没问题了。

And then at some point I said, okay, let's just do this change and I think we're good.

Speaker 1

然后它做了修改并发送了邮件。

And then it did the change and it sent the email.

Speaker 1

我当时想,什么?

I was like, what?

Speaker 0

我的意思是,你确实说了“没问题了”。

I mean, you did say we're good.

Speaker 0

所以

So

Speaker 1

对。

Right.

Speaker 1

好的。

Okay.

Speaker 1

所以后来我明白了,我必须在权限设置那一层就不让它发送邮件。

So then I learned, okay, I actually have to not let it send the email, you know, in the permissions part.

Speaker 0

哦,但这一点非常好。

Oh, but that's a great point there.

Speaker 0

所以,在 Harness 工程和控制影响范围的背景下,在这个具体例子中,你应该添加一条指令,或者基本上限制权限,使代理永远无法发送任何内容。

So, you know, in the context of harness engineering and controlling the blast radius, what you would then do in this specific example is either add an instruction or limit the permissions so that you never allow the agent to send something.

Speaker 0

它只能创建草稿。

It can only make drafts.

Speaker 0

你只应授予它草稿权限。

You would only give it access to the draft permission.

Speaker 0

你不应授予它发送权限,这么说吧。

You would not give it access to the send permission, so to speak.

Speaker 1

例如,构建一个使用该 MCP 但屏蔽其部分工具的技能。

Build a skill that uses the MCP and then blocks some of the tools that the MCP has, for example.

Speaker 1

但,是的,关键是控制它。

But, yeah, just the idea of controlling it.
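
A minimal sketch of the draft-only idea, assuming the harness can inspect each tool call before it reaches the MCP server. The tool names (`create_draft`, `update_draft`, `send_email`) and the `guard` function are hypothetical; they are not the real tool names of any Gmail MCP.

```python
from dataclasses import dataclass


class ToolNotAllowed(Exception):
    """Raised when the agent requests a tool outside the allowlist."""


@dataclass(frozen=True)
class ToolCall:
    name: str
    arguments: dict


# Hypothetical allowlist: the agent may draft, but sending is never exposed.
ALLOWED_TOOLS = frozenset({"create_draft", "update_draft"})


def guard(call: ToolCall, allowed=ALLOWED_TOOLS) -> ToolCall:
    """Pass the call through only if its tool is on the allowlist."""
    if call.name not in allowed:
        raise ToolNotAllowed(f"tool '{call.name}' is blocked by the harness")
    return call


# Drafting passes through unchanged.
guard(ToolCall("create_draft", {"subject": "Hello"}))

# Sending is rejected no matter what the conversation said.
try:
    guard(ToolCall("send_email", {"draft_id": "123"}))
except ToolNotAllowed as err:
    print(err)
```

An allowlist is used here instead of a blocklist so that any tool the MCP adds later stays blocked until someone opts in.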

Speaker 0

我喜欢这个想法。

I like that.

Speaker 0

我觉得这已经很好地讲清了如何塑造 Harness。

I think that covers shaping the harness pretty well.

Speaker 0

我觉得我已经理解得很清楚了。

I feel I have a good grasp.

Speaker 0

也许我们用最后几分钟聊聊如何完整地构建一个 Harness。

Maybe let's spend the last few minutes in the episode talking about building a harness altogether.

Speaker 1

听起来不错。

Sounds good.

Speaker 1

意思是,总有一天,我们目前使用的这些通用工具,比如 Claude 或 Cursor,就不够用了。

The idea is that at some point, these generic tools that we're using, right, so think of Claude or Cursor.

Speaker 1

它们无法满足你们公司的需求,对吧?

They stop being enough for what your company is doing, right?

Speaker 1

它们只能带你走这么远。

They can only take you so far.

Speaker 1

尤其是在大型企业环境中,你往往有很多定制化的内容在进行。

Because especially in the context of larger enterprises, it's like you have a lot of custom stuff going on.

Speaker 1

因此,开发软件变得非常具体,与其他环境中的情况大不相同。

So building software becomes something very specific, like very different from in other environments.

Speaker 0

我非常喜欢这一点。

I like that a lot.

Speaker 0

我认为杰西·文森特提出的一项最受欢迎的技能,他是个非常聪明的人。

One of the most popular skills out there, I think, is by Jesse Vincent, super smart guy.

Speaker 0

这项技能叫做‘超能力’。

It's called Superpowers.

Speaker 0

它的理念是一系列技能,简而言之,能迫使你很好地遵循软件工程流程。

And the idea is it's a bunch of skills that, in a nutshell, forces you to follow the software engineering process well.

Speaker 0

因此,这项技能会自动促使你首先进行头脑风暴,制定一个合适的执行计划,或者我们有时称之为ERD。

So the skill will automatically prompt you to first brainstorm, come up with a proper execution plan or what sometimes we call like an ERD.

Speaker 0

编写测试。

Writes the tests.

Speaker 0

它强制你采用TDD那样的实践。

It forces like a TDD kind of practice.

Speaker 0

它确保你设置好工作树,即Git工作树。

It makes sure that you set up worktrees, Git worktrees.

Speaker 0

因此,这几乎是一个完整的系统,能自动帮你做正确的事。

So it's almost a full system that will enable you automatically to do the right thing.

Speaker 0

我认为这是第一层,就是说,如果我的公司能有这样的工具,那该多好?

That's like I think the first level which is like, hey, wouldn't it be nice if my company just had something like this?

Speaker 0

因为不同的企业,正如你所说,可能在软件开发流程上采取不同的方式。

Because different enterprises and companies to your point probably approach the process of building software differently.

Speaker 0

所以,如果我能把这种做法编码到我的智能体中,让任何在工作中使用该智能体的人都能自然地遵循它,岂不很好?

So wouldn't it be nice if I could encode that into my agent so that anyone who uses the agent at work will naturally gravitate towards that?

Speaker 1

是的。

Yeah.

Speaker 1

当你朝这个方向延伸时,如果继续走下去,你最终想要的是什么?

And when you stretch in that direction, like if you keep going that direction, what is it that you want?

Speaker 1

你想要更信任这个代理。

You want to trust the agent more.

Speaker 1

例如,当你拥有这些超能力时,比如:好吧,我现在更信任这个代理了,因为我知道它会做这些事,而且我已经测试过了。

For example, when you have these Superpowers things, it's like, okay, now I trust the agent more because I know that it's going to be doing this, and I already tested it.

Speaker 1

所以你就开始朝这个方向发展。

So you start going that direction.

Speaker 1

那你还能做些什么?

So what else can you do?

Speaker 1

也许这条路线的终点是拥有自主代理,对吧?

Maybe at the very end of this line is having autonomous agents, right?

Speaker 1

比如在不介入的情况下完成任务。

Like doing things without involvement.

Speaker 1

我认为这就是像Stripe这样的公司所追求的方向。

And I think this is what, for example, companies like Stripe went for.

Speaker 1

所以他们最近分享了一篇文章,称其为 Stripe Minions。

So they shared this article recently, what they call Stripe Minions.

Speaker 1

这名字不错。

It's a good name.

Speaker 1

基本上,这是从 Goose 分叉出来的,那 Goose 又是什么?

Basically, it's a fork from Goose, which is also what?

Speaker 1

是一个编程代理吗?

A coding agent?

Speaker 1

Goose 是什么?

What is Goose?

Speaker 0

是的,没错。

Oh yeah, yeah.

Speaker 0

它是一个开源的编程代理,对吧?

It's an open source coding agent, right?

Speaker 0

这是来自 Square 或 Block 团队的项目,几乎就像是 Claude Code 的开源版本。

This is from the Square or the Block guys, almost like an open source version of Claude Code.

Speaker 1

Stripe的理由是,他们拥有非常非传统的技术栈,同时还有支付安全要求和大量的内部工具。

The way Stripe justifies it is that they have this very heterodox tech stack, and they have payment security requirements and a lot of internal tooling.

Speaker 1

所以他们希望对这个代理有充分的信心。

So they wanted to have the confidence in the agent.

Speaker 1

于是他们全力投入,深度定制了上下文的呈现方式、Dev Box的启动方式以及其他各种功能。

So then they went all in in really customizing how the context is surfaced and how these Dev Boxes are launched and all these things.

Speaker 0

我本人也很喜欢你推荐给我的那种使用Open Code客户端的思路。

I honestly am also liking the approach where we explore using Open Code, the client that you turned me on to.

Speaker 0

再说一遍,这是开源的,这正是它的美妙之处。

Again, it's open source, so this is the beauty.

Speaker 0

对吧?

Right?

Speaker 0

所以这几乎就像是我得到了 Claude Code,但却是它的开源版本。

So it's almost like I get Claude Code, but an open source version of it.

Speaker 0

如果我能把Open Code拿过来,再叠加一些这些功能,世界会是什么样子?

What does a world look like when I can take Open Code and, you know, layer in some of these things?

Speaker 0

你知道吗,我只是觉得,应该有人去试试这样做。

You know, I just think, like, someone should try to do this.

Speaker 0

如果你打造一个内置 Superpowers 的 Open Code 定制版本呢?

What if you built a custom version of Open Code with Superpowers built in?

Speaker 0

就类似这样的东西。

Just something like that.

Speaker 0

我觉得这会是下一步非常自然的演进。

I can see that being a really good natural evolution of the next step.

Speaker 1

是的。

Yeah.

Speaker 1

还处于非常非常早期的阶段。

Very, very early days.

Speaker 1

非常有趣。

Very interesting.

Speaker 0

我要说的是,这种做法唯一的问题是,你越往定制化的方向走,维护的责任也就越落在你自己身上。

I will say the only issue with this is that the more you go in the direction of trying to create something custom, the onus of maintaining that is also on you.

Speaker 0

维护它的负担也在你身上。

The burden of maintaining that is also on you.

Speaker 0

所以这是需要小心的一点。

So that's one thing to be careful about.

Speaker 0

这就是为什么大多数大公司能够做到,因为他们可以专门组建一个团队。

That's why most big companies are able to do it because they can dedicate a team.

Speaker 0

但我不会建议外面的独立开发者去构建自己的 Harness。

But I wouldn't suggest the single developer out there go about building your own harness.

Speaker 0

或者也许你该这么做。

Or maybe you should.

Speaker 0

我不知道。

I don't know.

Speaker 0

也许有些人可以。

Maybe some people can.

Speaker 1

小心别试图重新实现 Kubernetes。

Watch out to not try to reimplement Kubernetes.

Speaker 1

真是个冷门话题。

Deep cut there.

Speaker 0

好的,Iury。

Okay, Iury.

Speaker 0

我喜欢这个说法。

I like that.

Speaker 0

我们谈到了塑造 Harness 和构建 Harness。

We talked about shaping the harness and building the harness.

Speaker 0

也许是时候结束这一集了。

Maybe it's time to close this episode.

Speaker 0

你有什么临别赠言吗?

Are there any parting thoughts you have?

Speaker 1

如果把视角拉远来看,我们是从早年的提示工程一路走来的。

If you zoom out, we've gone from prompt engineering back in the day.

Speaker 1

对吧?

Right?

Speaker 1

好早以前了。

So long ago.

Speaker 1

10。

10.

Speaker 1

然后是上下文工程,现在我们来到了 Harness 工程。

And then context engineering, and now we're at Harness Engineering.

Speaker 1

嗯,是的。

Uh-huh.

Speaker 1

挺有意思的是,每一个阶段都指向了不同的杠杆点,比如如何使用提示、何时呈现上下文以及如何管理上下文。

It's kind of interesting how each one points to a different leverage point, right, from how to use the prompts, to when to surface the context and how to manage the context.

Speaker 1

最后是 Harness,也就是智能体自身的环境。

And finally to the harness, which is the environment of the agent itself.

Speaker 0

有道理。

Makes sense.

Speaker 0

我喜欢这个。

I like it.

Speaker 0

从提示工程到上下文工程,再到 Harness 工程。

Prompt engineering to context engineering to harness engineering.

Speaker 1

我希望它总有一天会停下来。

I hope it stops at some point.

Speaker 0

我知道。

I know.

Speaker 0

我知道。

I know.

Speaker 0

好的。

All right.

Speaker 0

感谢大家收听,我们下一期再见。

Thank you all for listening, and we will catch you in the next episode.

Speaker 1

下一期见。

Catch you guys in the next one.
