The Startup Ideas Podcast - Claude Opus 4.6 对比 GPT-5.3 Codex:实时构建,明确胜者 封面

Claude Opus 4.6 对比 GPT-5.3 Codex:实时构建,明确胜者

Claude Opus 4.6 vs GPT-5.3 Codex: Live Build, Clear Winner

本集简介

我与 Bold Metrics 的联合创始人兼首席技术官摩根·林顿坐下来,深入剖析 Claude Opus 4.6 和 GPT-5.3 Codex 同日发布的情况。我们详细讲解了如何在 Claude Code 中设置 Opus 4.6,探讨了自主智能体团队与交互式结对编程之间的哲学分歧,随后让两个模型现场无脚本地从零开始构建一个 Polymarket 竞品,进行实际测试。最终,你将了解如何配置每个模型、何时选择其中一个,以及当它们正面竞争时发生了什么。 时间戳 00:00 – 引言 03:26 – 在 Claude Code 中设置 Opus 4.6 05:16 – 启用智能体团队 08:32 – Codex 与 Opus 的哲学分歧 11:11 – 核心功能对比(上下文窗口、基准测试、智能体行为) 15:27 – 现场演示设置:Polymarket 构建提示设计 18:26 – 比赛开始 21:02 – 最适合氛围型程序员的模型 22:12 – Codex 在 4 分钟内完成 26:38 – Opus 智能体仍在运行,令牌使用量持续上升 31:41 – 测试与评估 Codex 的构建成果 40:25 – Opus 构建完成,首次查看结果 42:47 – Opus 最终构建成果揭晓 44:22 – 对比展示:Opus 本轮胜出 45:40 – 最终总结与建议 关键要点 Opus 4.6 和 GPT-5.3 Codex 在相隔 18 分钟内发布,代表了两种根本不同的工程哲学——自主智能体 vs. 交互式协作。 要正确使用 Opus 4.6,必须将 Claude Code 更新至 2.1.32+ 版本,在 settings.json 中设置模型,并明确启用实验性的智能体团队功能。 Opus 4.6 的突出特点是多智能体编排:你可以同时启动多个智能体分别负责研究、架构、用户体验和测试。 GPT-5.3 Codex 的突出特点是任务中干预:你可以在模型构建过程中随时中断、重定向或调整方向。 在实时对决中,Codex 在不到 4 分钟内完成了 Polymarket 竞品;Opus 耗时更长,但生成了更精致的 UI、更丰富的功能集,以及 96 个测试用例,而 Codex 仅有 10 个。 智能体团队会大幅增加令牌消耗——一次 Opus 构建可能在所有智能体间消耗 15 万至 25 万令牌。 发现创业点子/趋势的首选工具 - https://www.ideabrowser.com LCA 帮助财富 500 强企业与快速增长的初创公司构建未来——从华纳音乐到堡垒之夜再到 Dropbox。我们用 AI、应用和下一代产品将“如果”变为现实 https://latecheckout.agency/ 氛围营销者 - 为热衷氛围营销与 AI 营销的人提供的资源:https://www.thevibemarketer.com/ 在社交平台关注我 X/Twitter:https://twitter.com/gregisenberg Instagram:https://instagram.com/gregisenberg/ LinkedIn:https://www.linkedin.com/in/gisenberg/ 摩根·林顿 X/Twitter:https://x.com/morganlinton Bold Metrics:https://boldmetrics.com 个人网站:https://linton.ai

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

今天是个大日子,因为Anthropic刚刚发布了Opus 4.6,而OpenAI也以GPT 5.3 Codex作为回应。

Today is a massive day because Anthropic just dropped Opus 4.6, and OpenAI answered with GPT 5.3 Codex.

Speaker 0

但哪个模型更好?如何开始使用?又有哪些技巧能让它们发挥最大效用?

But what is the better model, and how do you get started, And what are some tips and tricks to get the most out of them?

Speaker 0

嗯,本期节目就专门讨论这些话题。

Well, this episode is all about that.

Speaker 0

这是为那些想要充分利用这些模型的技术人士准备的——他们不只想听热门观点,而是想要能最大化利用这些模型的实战技巧。

This is for the technical person who's trying to get the most out of these models, who don't just want hot takes, who want tactical sauce for getting the most out of these models.

Speaker 0

本期播客的嘉宾是我亲爱的朋友摩根·林顿。

This episode of the pod is with my dear friend, Morgan Linton.

Speaker 0

摩根是我认识的最优秀的工程师之一。

Morgan is one of the best engineers I know.

Speaker 0

他曾是Sonos公司的高管。

He was an executive at Sonos.

Speaker 0

他投资了许多人工智能公司,并且正在创办自己的人工智能企业。

He's invested in a lot of AI companies, and he's building an AI company of his own.

Speaker 0

他是我首先会打电话咨询的人之一,当我问‘嘿,哪个模型更好?’的时候

He's one of my first calls when I'm like, hey, which model's better?

Speaker 0

所以我们让这些模型正面交锋,最后会有一个胜出者

So we put the models head to head and there's a winner at the end.

Speaker 0

我们重建了价值数十亿美元的应用程序Polymarket,但这次我们使用了这些模型

We rebuild Polymarket, a multibillion dollar app, but we use these models.

Speaker 0

那么哪个才是更好的呢?

So which is the better one?

Speaker 0

通过观看本期节目你会找到答案,同时你还会学到成为更优秀的AI开发者,因为你会把这些技巧和诀窍收入囊中。

You'll find out by watching this episode, but you'll also learn to become a better AI developer because you'll have these tips and tricks in your back pocket.

Speaker 0

和我在一起的是我最喜欢的人之一,摩根·林顿。

I'm with one of my favorite people, Morgan Linton.

Speaker 0

你可能不认识他,但他真的是一位非常出色的开发者、创始人、企业家和投资人。

You might not know him, but he is just, you know, just an incredible developer, founder, entrepreneur, investor.

Speaker 0

他无所不能,但今天,我需要他帮我理解的是,Opus 4.6刚刚发布。

He's does it all, but today, what, you know, what I I needed him to help me understand is Opus 4.6 just came out.

Speaker 0

GPT 5.3 编码器刚刚发布了。

GPT 5.3 codecs just came out.

Speaker 0

摩根,帮我理解一下。

Morgan, help me understand.

Speaker 0

看完这一集后,观众能获得什么?

By the end of this episode, what are people gonna get out of this?

Speaker 1

是的。

Yeah.

Speaker 1

好了,格雷格,谢谢你邀请我。

Well, Greg, thanks for having me.

Speaker 1

今天真是令人兴奋。

Super exciting day.

Speaker 1

今天进展得很快。

It's moving fast today.

Speaker 1

Opus 4.6 刚刚发布,然后萨姆·阿尔特曼发了一条简短的推文。

Opus 4.6 came out, and then Sam Altman put together a quick tweet.

Speaker 1

大概十八分钟后,他就发推宣布了GPT 5.3 Codex。

Wanna say like maybe eighteen minutes later announcing GPT 5.3 Codex.

Speaker 1

我觉得其他人也都在忙着尝试,摸索其中的不同之处。

And me, I think everybody else has been jumping on it, playing around, figuring out the differences.

Speaker 1

你知道的,每个版本里都有些小巧的新设置。

You know, little neat new settings that there are in each of these.

Speaker 1

到最后,你会首先知道如何确保自己正在运行Opus 4.6,以及可以在设置中调整的所有细节。

By the end of this, you're gonna know first how to make sure that you are running Opus 4.6 and all of the little details you can change in the settings.

Speaker 1

Json文件来使用Opus 4.6中的一些酷炫功能,尤其是智能体团队功能,这大概是我最期待的特性了。

Json file to use some of the cool features in Opus 4.6, especially agent teams, which is probably the feature I'm the most excited about.

Speaker 1

你还会理解为什么可能会选择使用其中一个而不是另一个,因为它们各自采用了不同的工程方法。

You'll also understand why you might use one versus the other, because they both kind of tackle different engineering methodologies.

Speaker 1

然后希望在我们一起构建一些我准备但自己还没尝试过的演示时,你能看到一些很酷的东西。

And then hopefully you'll see some cool stuff as we build some demos together that I've put together that I haven't tried myself.

Speaker 1

所以我会和大家一起现场尝试。

So I'll be trying just live with you.

Speaker 1

那我们来看看会怎么样。

So we'll see how that goes.

Speaker 0

酷。

Cool.

Speaker 0

我认为其中一个是我们打算重现Polymarket。

I think one of them is we're gonna try to recreate Polymarket.

Speaker 0

是的。

Yes.

Speaker 0

看看哪个模型表现最好。

And see which model performs best.

Speaker 1

他们将进行一场对决,各自构建自己的Polymarket版本。

They're gonna do a head to head to try to each build their own version of Polymarket.

Speaker 0

所以到本集结束时,你将对如何使用这些模型、何时使用它们以及如何入门有相当清晰的理解。

So by the end of this episode, you will have a pretty good understanding of how to use the models, when to use the models, how to get started.

Speaker 0

摩根,我们开始吧。

Morgan, let's get into it.

Speaker 1

好的,没问题。

Cool, right on.

Speaker 1

好的,我记了些笔记,基本上,你知道,我会在Mac桌面应用上演示Five Three Codecs,因为他们对此非常兴奋。

All right, so I took some notes and essentially, you know, with Five Three Codecs, I'll be showing that in the desktop app on Mac, because they're super excited about that.

Speaker 1

我也很期待。

I'm excited about it.

Speaker 1

我觉得如果OpenAI想要一个完美的演示,他们会希望我在他们的应用里完成。

I think if OpenAI was wanting a demo to be done the right way, they would want me to do it in their app.

Speaker 1

而对于OPUS 4.6,我会说Anthropic团队希望我在CLI中完成操作。

Whereas with OPUS 4.6, I would say the Anthropic team would want me to do it in the CLI.

Speaker 1

因此在使用Opus 4.6时,有几个不同的配置设置你确实需要确保正确设置。

And so there's a few different configuration settings that you do wanna make sure that you get right when you're using Opus 4.6.

Speaker 1

我们今天、明天,或者无论你什么时候开始使用,都在尝试使用Opus 4.6。

We're trying to use Opus 4.6 today, tomorrow, whenever it is that you're jumping in to use it.

Speaker 1

我今天在Twitter上看到很多人说,这很奇怪。

I've seen a lot of people online today on Twitter saying, it's weird.

Speaker 1

我遇到了一个问题。

I'm having a problem.

Speaker 1

比如,我应该能看到智能体团队,但我没看到它们。

Like, I'm supposed to be agent teams, but I don't see them.

Speaker 1

或者怎么知道我运行的是哪个版本?

Or how do know what version I'm running?

Speaker 1

所以我想,让我们先给大家一个公平的起点,让大家知道,好吧,我想能够使用Claude代码配合Opus 4.6。

So I thought, let's start by just giving everybody a level playing field to know, okay, I wanna be able to use Claude code with Opus 4.6.

Speaker 1

我该如何确保自己正确做到这一点?

How do I make sure I'm doing that and doing that correctly?

Speaker 1

所以这里有一些每个人清单上应该有的初步待办事项。

So here's kind of the initial to dos that everyone should have on their list.

Speaker 1

只需运行一次 npm update,看看是否有效。

Just do an n NPM update, see if that does the trick.

Speaker 1

如果不行,而且你运行的是旧版本,那就运行 clot update。

If that doesn't, and you're running an older version, then run clot update.

Speaker 1

但你现在应该能看到,目前版本是2.10.32。

But you you should see like, as of right now, it's 2.10.32.

Speaker 1

如果你看到的版本号是1点几,说明你在运行旧版本。

If you see one dot something, you're running an old version.

Speaker 1

然后你需要做的是进入你的settings.json文件,我在这里演示一下。

And then what you wanna do is go into your settings dot JSON, and I'll just show this here.

Speaker 1

所以你只需要输入类似 cd ~/.quad 这样的命令。

So if you just do like c d tilde slash dot quad.

Speaker 0

所以我敢肯定有人还在用旧模型。

So I bet that there's people who are running the old model.

Speaker 0

他们甚至都没意识到。

They don't even realize it.

Speaker 1

很可能

Probably a

Speaker 0

糟糕,糟糕的结果。

bad, bad result.

Speaker 1

完全不知道。

No idea.

Speaker 1

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

所以我的意思是,确保你进入这里,CDTildeSlashDotClaud。

So I mean, make sure you go in here, CDTildeSlashDotClaud.

Speaker 1

这是你的 settings.json 文件。

Here's your settings dot JSON.

Speaker 1

如果你查看这个,这里基本上就是你应该看到的内容。

If you view this, here's essentially what you should see.

Speaker 1

现在没问题了。

Now it's okay.

Speaker 1

可以是模型。

It can be model.

Speaker 1

如果你想具体指定的话,可以输入claw-opus-4-6,这样就能锁定它。

If you wanna like really be specific about it, you can put in claw dash opus dash four dash six, that'll lock it in.

Speaker 1

但由于4.6是最新模型,你也可以只输入model和opus,这样也能用。

But because 4.6 is the newest model, you can also just put in model and just opus, and that'll work.

Speaker 1

在我看来,最关键的功能是他们随4.6版本新增的智能体团队功能。

The key thing that you wanna do is in my opinion, the coolest feature that they added with 4.6 is agent teams.

Speaker 1

我非常期待向你演示这个功能。

I'm super excited to demo that with you.

Speaker 1

你必须确保开启这个功能,因为它还处于实验阶段。

You have to make sure to turn that on because it is experimental feature.

Speaker 1

这可能是我今天看到人们对OPUS四六最大的困惑——他们正在运行OPUS四六。

And that's probably the biggest confusion I'm seeing people have today with OPUS four six is that they are running OPUS four six.

Speaker 1

他们不断听说代理团队,然后给出诸如'组建一个代理团队'这样的提示。

They keep hearing about agent teams, and they're giving it prompts like, build a team of agents.

Speaker 1

做这个和那个。

Do this and this.

Speaker 1

但它并没有完全做到这一点,这是因为你确实需要启用这个功能。

And it's not quite doing it, and that's because you do have to enable this.

Speaker 1

所以你需要添加这段Claude代码 experimental agent teams,然后将其设置为1。

So you do have to add in end this Claude code experimental agent teams and then set it equal to one.

Speaker 1

明白了吗?

Okay?

Speaker 1

没什么太复杂的。

Nothing too crazy.

Speaker 1

一旦你这样做了,就能实现所有这些功能。

Once you do that, that will make all that possible.

Speaker 1

考虑到这一点,你基本上就准备好了,然后只需在终端运行Cloud,一切就绪。

So with that in mind, you're pretty ready to go there, then you can just run Cloud in the terminal and you're good.

Speaker 1

对于使用API的用户,我想特别指出一个很酷的新功能,叫做自适应思考。

For people that are using the API, the one thing I did want to point out is there's a pretty cool new addition, which is called adaptive thinking.

Speaker 1

另外,为了明确起见,因为我发现这方面也存在一些混淆。

Also, just to be clear, because I'm seeing confusion on this too.

Speaker 1

这是在API中的功能。

This is in the API.

Speaker 1

这并不是Claude代码本身的功能。

This is not in Claude code itself.

Speaker 1

但自适应思考功能,在这里展示一下,你可以选择希望模型使用的努力程度。

But adaptive thinking, just to show it here, you're able to essentially pick the level of effort that you would like the model to use.

Speaker 1

顺便说一下,如果你想使用最高努力级别,这只在四六版本中有效。

This is only gonna work in four six, by the way, if you want to use like an effort level of max.

Speaker 1

所以这里有点不同的层级。

And so here's kind of a different level.

Speaker 1

使用max模式时,Claude的思考深度不受限制。

So with max, Claude always thinks with no constraints on thinking depth.

Speaker 1

这仅限于Opus 4.6版本。

It's Opus 4.6 only.

Speaker 1

因此在其他模型上使用max模式的请求会返回错误。

So requests using max on other models are gonna return an error.

Speaker 1

所以如果你调用API时把努力级别设为max却收到错误,那你可能没有使用Opus 4.6。

So if you're calling the API and you set the effort level to max and you get an error, then you're probably not using Opus 4.6.

Speaker 1

但这里有个例子可以看到,如果我调用API时把模型设为Claude Opus 4.6,然后在这里设置努力级别。

But here's the example where you can see if I'm calling the API, I set the model to Claude Opus 4.6, and then here's where I can set the effort.

Speaker 1

还有一点。

And this is another thing.

Speaker 1

如果你使用的是现有的API代码,模型可能是Opus 4.5,现在你把努力级别调到max,就会报错。

If you're using existing API code, you may have the model of Opus 4.5, and now you adjust the effort to max, it gives you an error.

Speaker 1

你只需要升级版本就可以了。

All you need to do is just bump the version and you're good.

Speaker 1

但这是他们在4.6版本中为API添加的一个很酷的功能,值得一提。

But this is kind of a neat thing they've added to the API with 4.6, it's worth worth mentioning.

Speaker 1

最后我想说的是,如果你想为智能体使用分屏功能。

And then kind of the last thing I would say is just if you wanna use split panes for agents.

Speaker 1

如果你希望智能体显示在不同的窗格中,并且正在使用类似Warp的工具,只需确保安装TMUX。

So if you want agents to show up in different panes, and you're using something like warp, just make sure to install TMUX.

Speaker 1

你可以通过 brew install TMUX 来安装。

You can do this with brew install TMUX.

Speaker 1

安装后默认会设置为自动模式,通常意味着在进程中运行,也就是在你现有的终端窗口内,所有智能体将协同工作。

And then if you do that, it's gonna default to auto, which usually means in process, which means in that same terminal window you have, the agents are gonna be working all together.

Speaker 1

如果你想要分屏显示,只需要在 settings.json 中更新设置改为分屏模式即可。

If you want it to split pane, then you just need to update that setting in the settings dot JSON to split panes.

Speaker 1

我不会深入细节,但这些对于刚开始使用 Opus 的用户来说都是很好的基础配置建议。

I'm not gonna go into super details on that, but those are just like, I think good housekeeping to start with for anyone using Opus.

Speaker 1

不过不用担心这个。

But don't worry about it.

Speaker 1

其实大家需要做的,特别是如果你甚至不想使用Teams或智能体团队,就是确保你使用的是最新版本,并且模型是Opus,这样你就会用到Opus 4.6。

Really all that anybody needs to do, especially if you don't even wanna use Teams, agent Teams, is just make sure you're updated using the newest version, and that the model is Opus, and you'll be using Opus 4.6.

Speaker 0

好的。

Cool.

Speaker 1

就是这样。

So that's that.

Speaker 1

在我深入探讨Opus 4.6和Codex之间的区别之前,我想先读一下这个内容,因为它四小时前发布在Hacker News上,我读的时候就觉得这是最好的解释方式。

Before I get into kind of the differences between Opus 4.6 and Codex, I thought I would actually read this, because this is posted on Hacker News four hours ago, and I was reading it, I was thinking that's like the best way to explain it.

Speaker 1

所以我就读一下这一小段,因为我觉得他们写得非常好。

So I'm just gonna read this little section here, because I think they do such a good job with it.

Speaker 1

这个人说:对我来说有趣的是,GPT-5 30和Opus 46在哲学理念上正在分化,实际上就像真正的工程师和组织在哲学理念上分化一样。

This person's saying, What's interesting to me is that GPT-five 30 and Opus 46 are diverging philosophically, and really in the same way that actual engineers and orgs have diverged philosophically.

Speaker 1

我觉得这真的说到点子上了。

I think this really nails it.

Speaker 1

使用Codex 5.3时,其定位是一个交互式协作者。

With Codex 5.3, the framing is an interactive collaborator.

Speaker 1

你可以在执行过程中引导它,保持参与循环,并在其工作时进行航向修正。

You steer it mid execution, stay in the loop, course correct as it works.

Speaker 1

而Opus 4.6则强调相反的理念,它是一个更自主、更具代理性、更深思熟虑的系统,能够进行深度规划、长时间运行,并减少对人类的依赖。

With Opus four six, the emphasis is the opposite, a more autonomous, agentic, thoughtful system that plans deeply, runs longer, and asks less of the human.

Speaker 1

这感觉像是反映了人们对基于LLM的编程应该如何运作的真实分歧。

That feels like a reflection of a real split in how people think LLM based coding should work.

Speaker 1

有些人希望保持紧密的人机交互控制。

Some want tight human and loop control.

Speaker 1

另一些人则希望委托整块工作并审查结果。

Others want to delegate whole chunks of work and review the results.

Speaker 1

说实话,我觉得这段话描述得非常到位。

And I honestly, I think that says it beautifully.

Speaker 1

我认为这精准概括了它们之间的差异。

And I think that that nails the differences.

Speaker 1

而且,希望你知道,大家都想选出一个赢家,但其实不是这样的。

And also, hopefully, you know, there's all everybody wants to pick a winner where it's like, oh, no, no.

Speaker 1

Opus 4.6更好。

Opus four six is better.

Speaker 1

Codex更好。

Codex is better.

Speaker 1

它们是不同的。

It's it's different.

Speaker 1

这取决于你的方法论。

It depends on what your methodology is.

Speaker 1

我认为我们现在看到的,不仅限于Vibe编码,还包括整体的AI驱动工程,是如何希望与代理式编程协作。

And I think what we're seeing now, not just with Vibe coding, but also with like, overall, like AI powered engineering is how do you want to work with agentic coding?

Speaker 1

你是想要一个完全自主的体验,让代理去执行任务吗?

Do you want to have a totally autonomous experience where you're sending agents out to do work?

Speaker 1

还是想把LLM当作另一个队友,和它结对编程?

Or do you wanna work with an LLM like another teammate and pair program with the LLM?

Speaker 1

现在你正看到这种分歧,我认为很多团队会同时使用这两种方式,因为Codex真的是你的合作者。

And that's where you're now seeing a divergence where I think you're gonna see a lot of teams using both, because Codex really is your collaborator.

Speaker 1

而他们在53版本中加入的功能,实现了非常出色的执行中引导。

And what they've added with Five Three is like really good, like mid execution steering.

Speaker 1

而OPUS 4.6可能是目前最出色的,能够让你说:我想启动三到四个代理。

Whereas with OPUS 4.6, it's probably the best of the best now being able to say, I wanna spin up three or four agents.

Speaker 1

我想让它们去完成一些任务。

I want them to go do stuff.

Speaker 1

嘿,别来烦我。

Hey, and don't bug me.

Speaker 1

我希望相信它们会做出好东西,并且能够交付成果。

I want to trust they're gonna do good stuff, and it's able to deliver.

Speaker 0

那么你的意思是,在某种程度上这只是个人偏好问题?

So are you saying that there, in some ways, it's just a preference?

Speaker 0

就像,取决于你怎么看,基本上没有对错之分。

Like, depending on how you know, there's no right or wrong, basically.

Speaker 0

你知道,选择OPUS并没有错,或者,这可能只是感觉上更合适。

You know, you're not wrong to be an opus person or, you know, it just like might feel yeah.

Speaker 0

这可能只是一种偏好。

Might it's just a preference.

Speaker 1

是的。

Yeah.

Speaker 1

嗯,你可能两者都需要。

Well, you might be both.

Speaker 0

确实如此。

That's true.

Speaker 1

结果可能是你两者皆是。

It might turn out that you're both.

Speaker 1

这就是为什么我不想让大家失望,但我们不会以我宣布'所以获胜者是...'来结束,而是会说,这取决于你想做什么。

That's why I like not to disappoint people here, but we're not gonna end this with me saying, and so the winner is It's like, well, depends on what you wanna do.

Speaker 1

每个人对此都有不同的方法。

Everyone has a different methodology for it.

Speaker 1

所以我会深入尝试快速完成这部分,因为我知道有趣的部分可能是我们动手操作这两个系统,让它们进行正面交锋,尝试在有限时间内构建一个Polymarket的竞争对手。

So I'll dive in and try to make this part fast, because I know the fun part is probably us going in and playing around with both of these and having them do a head to head and try to build a competitor to Polymarket and however much time we have.

Speaker 1

但我会先大致介绍一下这些内容,以便让想了解核心差异的人知道。

But I'll just I'll just start kind of going into these at a high level, just so for anyone who wants to know, like, what are the core differences?

Speaker 1

为什么这如此有趣?

Why is this so interesting?

Speaker 1

直接进入正题吧。

Just go into what that is.

Speaker 1

所以Opus 4.6拥有更大的上下文窗口。

So with with Opus 4.6, much bigger context window.

Speaker 1

这里有一百万个token的上下文窗口。

So you have a million token context window here.

Speaker 1

在整个文档和代码库中具有非常强的一致性,专为加载整个知识体系并进行推理而设计。

Very strong coherence over entire documents and repos designed for, you know, like load the whole universe and reason over it.

Speaker 1

五三模型虽然提到了大上下文,但这并非其主要特性。

Five three, they talk about large context, but it's not a headline feature.

Speaker 1

我实际上反复尝试才让它给出了一个具体数字。

And I actually went back and forth of it to get it to actually give me a number.

Speaker 1

这个数字大约是20万个代币,并不算太令人印象深刻。

And the number is around 200,000 tokens, which is not that impressive.

Speaker 1

这比我预想的要小一些。

That's smaller than I was thinking it would be.

Speaker 1

不过没关系。

But that's okay.

Speaker 1

它是优化过的,你知道,是为了渐进式执行而非完全记忆。

It's optimized, you know, for for progressive execution rather than total recall.

Speaker 1

所以这就是为什么那方面不那么重要。

So that's why that's not as important.

Speaker 1

而且,你知道,优化了决定哪些内容保留在工作内存中。

And, you know, optimized for deciding like what to keep in working memory.

Speaker 1

所以从高层次来看,这意味着当任务需要先理解所有内容再决策时,Claude表现更出色。

So high level, what that means is Claude is better when the task is understand everything first and then decide.

Speaker 1

GPT 5.3 codecs 可能在需要快速决策、行动、迭代的任务上表现更好,更像是那种结对编程、任务中途改变行为的方式。

GPT 5.3 codecs is probably better when the task is decide fast, act, iterate, more of that pair programming, mid task change behavior.

Speaker 1

对于编码基准测试,OPUS 4.6 在基于代码的理解、具有架构敏感性的重构、解释系统为何以某种方式运行方面确实很出色。

For coding benchmarks, know, OPUS 4.6 is really good at code based comprehension, refactors with like architectural sensitivity, explaining why a system behaves a certain way.

Speaker 1

而且,你知道,它不太会像那种'管他呢先写代码再说'的倾向,对吧?

And then, you know, a little less tendency of this like YOLO write code, right?

Speaker 0

这很好。

That's good.

Speaker 1

我认为这是每个人都想要的。

Which is I think something everybody wants.

Speaker 1

是的,没错。

Yeah, exactly.

Speaker 1

所以,这对大家都有好处,尤其是对于刚开始编程的新手来说,他们可能无法识别幻觉错误。

So, you know, that's good for everybody, but especially for vibe coders that are getting started, and they may not be able to identify hallucinations.

Speaker 1

Opus 4.6在这方面肯定会表现得更好。

Opus 4.6 is definitely gonna perform better there.

Speaker 1

但对于团队来说,你知道的,像我和我的团队这样在大型代码库中构建,这一点也非常重要。

But then for teams, you know, building in large code bases like me and my team are doing, that's also really important.

Speaker 1

所以在这方面对大家来说都是一种胜利。

So kind of a win for everyone there.

Speaker 1

5.3 Codecs确实在SWD bench pro和terminal bench上胜出。

Five three Codecs did win on SWD bench pro, terminal bench.

Speaker 1

总体而言,它在编程基准测试中得分更高。

Overall, it's like scored better on coding benchmarks.

Speaker 1

所以可能在端到端的应用生成方面表现更好。

So probably better end to end app generation.

Speaker 1

而Claude有点像高级评审员或资深工程师,GPT-5.3大概就像是你们的创始工程师,对吧?

And Claude's kind of like senior reviewer staff engineer, GPT-five-three probably like your founding engineer, right?

Speaker 1

智能体行为方面,Opus 4.6,这才是关键所在,对吧?

Agenic behavior, Opus four-six, this is the key one, right?

Speaker 1

它就像是多智能体协同运作。

It's like the multi agent orchestration.

Speaker 1

这大概是4.6版本中最前沿的功能了。

That's probably like the bleeding edge feature in 4.6.

Speaker 1

而5.3版本的Codex,真正实现了任务驱动的自主性,能够主动构建、测试和修改。

And then with 5.3 Codex, really like task driven autonomy, build, test, modify without being asked.

Speaker 1

但有了这个任务引导功能,你可以观察它,随时介入,就像你的编程伙伴在写代码,你可以说‘哎,等等,伙计’。

But then this task steering, you can watch it, you can go in, it's like your buddy's coding, and you can say, oh, wait, wait, man.

Speaker 1

等一下。

Wait.

Speaker 1

你为什么要这么做?

Why are you doing this?

Speaker 1

然后你可以让它停下来,说声“好的”。

And you can stop it, and you'll go, okay.

Speaker 1

接着你可以重新开始。

And then you can restart.

Speaker 1

你可以实时修正问题。

You can really fix things in line.

Speaker 1

用OPUS来做这个要困难得多。

Much harder to do that with with OPUS.

Speaker 1

用OPUS的话,你基本上需要先停止它然后重新开始,不过它的上下文窗口很大,所以它知道之前做了什么。

With OPUS, you'll kinda be stopping it and then starting somewhat fresh, but it has a pretty big context window, so so it knows what it did.

Speaker 1

但是,你知道,Claude实际上是在问,我们应该这样做吗?

But, you know, Claude's really asking, like, should we do this?

Speaker 1

而GBT 5.3则是,我能多快把这个东西交付出去?

GBT 5.3 is like, how fast can I ship this?

Speaker 1

对吧?

Right?

Speaker 0

真的,我的意思是,这太酷了,因为它们给人的感觉几乎像是不同的人。

It's really, I mean, it's so cool because it almost feels like they're different people.

Speaker 0

你明白我的意思吗?

You know what I mean?

Speaker 1

就像它们

Like they

Speaker 0

有不同的风格。

have different styles.

Speaker 1

是的,完全同意。

Yes, totally.

Speaker 1

对,这是一个很好的看待方式。

Yeah, it's a good way to look at it.

Speaker 1

就像是不同的个性类型,对吧?

It's like a different personality type, right?

Speaker 1

然后确实,Claude 4.6的失败模式可能是过度分析。

And then yeah, failure modes, Claude 4.6, it might overanalyze.

Speaker 1

它的上下文窗口要大得多。

It's got a much bigger context window.

Speaker 1

当需求模糊时它可能会犹豫,然后可能在完全执行前就停止了。

It can hesitate when requirements are ambiguous, and then it can stop short on full execution.

Speaker 1

而Codex可能过于自信,会过早锁定一个有缺陷的假设,但如果发生这种情况,你可以把它引导回正确的方向。

Five three Codex could be overconfident, can lock in a flawed assumption early, but you can steer it back in the right direction if that happens.

Speaker 1

所以这大概就是两者之间的一个高层次概述。

So that's kind of a high level overview on the two.

Speaker 1

很酷。

Cool.

Speaker 0

这很有帮助。

That's helpful.

Speaker 0

是的。

Yeah.

Speaker 1

那我们直接开始吧?

So should we just dive in?

Speaker 1

这些我都没有测试过。

I haven't tested any of this.

Speaker 1

所以我现在没有任何预设演示,因为我觉得一起尝试些新东西看看会发生什么会更有趣。

So this is now have like zero canned demos because I thought it'd be more fun just to try something together and see what happens.

Speaker 1

那我们要不要试试看?

So should we try it?

Speaker 0

好的。

Yeah.

Speaker 1

明白。

Okay.

Speaker 1

那么让我们看看。

So let's see.

Speaker 1

我准备先用Opus,我已经预先加载了这些提示。

I'm gonna I'm gonna start with Opus, and I've got these prompts preloaded.

Speaker 1

所以我正在给出不同的提示,就像我觉得你说得非常好那样。

So I'm giving different prompts, just like I think you said it really well.

Speaker 1

就像你在和不同的人交谈一样。

It's like you're talking to different people.

Speaker 1

所以,你知道,当我和Opus对话时,我可以告诉它‘给我组建一个团队’,并说明我希望每个团队成员做什么。

And so, you know, when I'm talking to Opus, I can tell Opus, build me a team, and here's what I want each member of the team to do.

Speaker 1

当我和Codex对话时,我不能真的让它给我组建团队,但我可以告诉它去思考一些事情。

When I'm talking to Codex, I can't really tell it to build me a team, but I can tell it to think about stuff.

Speaker 1

所以我准备给Opus的提示是:构建一个Polymarket的竞争对手。

So the prompt that I'm gonna give to Opus is build a competitor to Polymarket.

Speaker 1

创建一个代理团队,从不同角度探索这个问题。

Create an agent team to explore this from different angles.

Speaker 1

一位团队成员负责技术架构,一位负责理解Polymarket和预测市场的方方面面,一位负责用户体验,还有一位专门负责编写完善的测试来确保一切正常运作。

One teammate on technical architecture, one on understanding Polymarket and the ins and outs of prediction markets, one on UX, and one that just works on building really good tests to make sure everything works.

Speaker 1

对于Codex,我会给它一个稍有不同的提示,但非常相似。

For Codex, I'm gonna give it a little different prompt, but very similar.

Speaker 1

所以我仍然要构建一个与Polymarket竞争的产品,但现在要深入思考技术架构、理解Polymarket和预测市场的方方面面、打造良好简洁的用户体验,并确保建立完善的测试以保证一切正常运行。

So I'll still build a competitive Polymarket, but now think deeply about technical architecture, understanding Polymarket and the ins and outs of prediction markets, good clean UX, make sure it builds really good tests to make sure everything works.

Speaker 1

公平起见,我会尽量在同一时间粘贴这些内容。

And to be fair, I'm gonna try to paste these in around the same time.

Speaker 0

你真是个公平的人,摩根。

You're a fair guy, Morgan.

Speaker 1

我这是在尽量保持公平,对吧?

I'm trying to keep it fair here, right?

Speaker 1

这是唯一的办法。

That's the only way to do it.

Speaker 1

就像我说的,没有赢家或输家。

Like I said, no winners or losers.

Speaker 1

关键是让每个人都有公平参与游戏的机会。

It's just about letting everybody have a fair shot to play the game.

Speaker 0

是的。

Yeah.

Speaker 1

好的。

All right.

Speaker 1

那么我来看看,我会为你创建不同的目录。

So let's see, I'm gonna make different directories for you.

Speaker 1

我会这样操作,我们就叫这个项目为Opus四、五,Jolly Market竞品。

So I'll do, let's just call this Opus four, five, Jolly Market competitor.

Speaker 1

行。

All right.

Speaker 1

那么让我们在这里启动Claude吧。

So let's fire up Claude in here.

Speaker 1

顺便说一下,如果你想检查运行状态,确保模型处于良好状态,输入斜杠model,我就能看到这里显示的是Claude Opus 4.6。

By the way, if you wanna check when you're running, just to like really make sure that you're in a good place with the model, if you type slash model, I can see here, right, Claude Opus four 6.

Speaker 1

对吧?

Right?

Speaker 1

所以我这边没问题了。

So I'm good there.

Speaker 1

我要复制这个提示,确保所有内容都正确复制进来了。

I'm gonna take this prompt, copy it, make sure this is all copied in correctly.

Speaker 1

好的,搞定了。

Okay, got that.

Speaker 1

我暂时先不按回车键。

I'm not gonna hit enter yet.

Speaker 1

我要确保这次测试完全公平。

I'm making this totally fair.

Speaker 1

我不想让Anthropic或OpenAI的任何人生我的气。

I don't want anyone at Anthropic or OpenAI to get upset with me.

Speaker 1

所以我想和他们两家都保持良好的关系。

So I wanna be in good terms with both of them.

Speaker 0

完全正确,聪明人。

Totally, smart guy.

Speaker 1

让我看看。

Let's see.

Speaker 1

哦等等,其实你知道吗?

Oh wait, actually you know what?

Speaker 1

我确实想为这个创建一个新文件夹。

I do wanna create a new folder for this.

Speaker 0

但我们是在保持真实。

But we are keeping it real.

Speaker 0

我们是客观的。

We're being objective.

Speaker 0

我和摩根都与这两家没有关联,呃,其实我不确定你的情况。

Neither myself or Morgan are affiliated with either Well, actually, don't know about you.

Speaker 0

我没有关联

I'm not affiliated

Speaker 1

没有。

with Nope.

Speaker 1

两家我都同样喜欢。

Epic or I love them both equally.

Speaker 1

你觉得怎么样?

How about that?

Speaker 1

好的。

Yeah.

Speaker 1

行。

Okay.

Speaker 1

我会尽量让它们开始的时间保持同步。

And I'm gonna try to start as close to on the same time as I can.

Speaker 1

输入,开始。

Enter, go.

Speaker 1

好的。

All right.

Speaker 1

它们开始运行了。

They're going.

Speaker 1

比赛开始了。

Off the races.

Speaker 0

那你觉得接下来会发生什么?

So what do you think is gonna happen?

Speaker 1

这是个很好的问题。

That's a great question.

Speaker 1

嗯,我现在知道,因为我让Opus 4.5使用不同的团队成员来构建,它会这么做的。

Well, I know right now because I told Opus 4.5 to build using different teammates, it's gonna do that.

Speaker 1

所以你可以看到这里写着:我将通过先启动并行研究代理,然后将它们的发现综合成一个全面的实施计划和代码库,来构建一个PolyMark的竞争对手。

So you can see here it says, I'll build a PolyMark competitor by launching parallel research agents first, then synthesizing their finding to a comprehensive implementation plan and code base.

Speaker 1

这是全新的,对吧?

This is brand brand new, right?

Speaker 1

就像如果我昨天用Opus做这个,是不可能实现的。

Like if I did this with Opus yesterday, wouldn't be possible.

Speaker 1

这里的区别在于,Codex的工作方式其实就是一直以来事情运作的方式,对吧?

Kind of the difference here is that the way that Codex is working is the way things have kind of always worked, right?

Speaker 1

所以你看,这就像是个体的人,对吧?

So if you see, this is like the individual person, right?

Speaker 1

它并不是说,好吧,我要启动所有这些不同的智能体然后比较它们的说法。

It's not saying, okay, I'm gonna launch all these different agents and compare what they say.

Speaker 1

它更像是,好的,我要检查一下工作空间。

It's like, okay, I'm gonna inspect the workspace.

Speaker 1

这就是你所说的那种注重细节、非常资深的创始工程师类型,就像那个例子给出的那样,对吧?

This is your, you know, detail oriented, really senior like founding engineer, like that example gave, right?

Speaker 1

而在这边,你可以看到它已经启动了这些智能体,现在它想要进行网络搜索,我会让它去做的。

Whereas over here, you can see it's already launched these agents, and now it wants to do web searches, and I'm gonna let it do that.

Speaker 1

所以多个智能体被要求进行网络搜索。

So multiple agents were asking to do web searches.

Speaker 1

现在同时启动所有四个研究智能体。

So now launching all four research agents in parallel.

Speaker 1

这个已经启动了,我有我的技术架构智能体。

So this is off, and I've got, you know, my technical architecture agent.

Speaker 1

我还有另一个智能体,它们现在都在进行网络搜索。

I've got this other agent that these are both doing web searches right now.

Speaker 1

所以有一个正在研究预测市场订单簿匹配引擎架构。

So one is looking at like prediction market order book matching engine architecture.

Speaker 1

这个正在学习预测市场的引擎架构。

So this one's learning about engine architecture for prediction markets.

Speaker 1

这个在研究Polymarket的工作原理,Barnaby预测市场机制。

This one's looking at Polymarket, how it works, Barnaby prediction market mechanics.

Speaker 1

然后我的UX设计正在进行一些设计研究,我们还有一些测试研究。

And then I've got the UX design is doing some design research, and then we've got some test research.

Speaker 1

好的。

Okay.

Speaker 1

现在它要去Polymarket了,真心希望Polymarket不会屏蔽它,否则会给我们增加难度。

Now it's gonna go to Polymarket, and let's really hope the Polymarket doesn't block it, because that'll make things harder for us.

Speaker 1

与此同时,这边发现了Coke。

Meanwhile, over here, this has discovered Coke.

Speaker 1

刚发现代码库是空的,所以它要从头开始搭建脚手架。

Just figured out the repo is empty, so it's gonna scaffold it from scratch.

Speaker 1

它现在开始连接核心市场数学和交易引擎了。

And it is starting to I'm now wiring the core market math and trading engine.

Speaker 1

所以这很有趣,对吧?

So it's interesting, right?

Speaker 1

所以Codex正在这里构建,就像在构建引擎一样。

So you've got Codex is out here building, and is like building the engine.

Speaker 1

使用Opus 4.6,它仍然有智能体在外面做研究工作。

With Opus 4.6, it still has agents out there doing research work.

Speaker 0

Yeah.

Speaker 0

随着进展,你真的开始看到它们有多么不同。

You really start to see just how different they really are as make progress.

Speaker 1

是的

Yeah.

Speaker 1

就像我说的,我之前没接触过这个,所以我们不知道每个任务会花多长时间。

And like I said, I haven't touched it before, so we don't know how long it'll take each of these.

Speaker 0

完全同意。

Totally.

Speaker 0

是的。

Yeah.

Speaker 0

我想,我有个问题是,比如,是否有一个模型更适合初学者或非技术背景的实时编码者,还是说其实无所谓。

And I think, like, I guess one question I I have is, like, is one is one model better for being more of a beginner, non technical live coder, or, you know, it doesn't really matter.

Speaker 1

是的,这是个好问题。

Yeah, it's a good question.

Speaker 1

我的意思是,我认为公平的答案可能是Codex,因为Codex在一些编程基准测试中略微领先Opus 4.6,并且以编写更好的生产代码而闻名,在这方面可能更胜一筹。

I mean, I think the fair answer would be probably Codex, because Codex edged out Opus 4.6 a little bit on some of those coding benchmarks, and is kind of known for writing better production code, probably codecs in that way.

Speaker 1

与此同时,一个缺点就是,正如我所说,我只能以完全平衡的方式进行比较,因为它们实在太不一样了。

At the same time, one of the downsides, and like I said, this is I could only do this in a totally balanced way because they're so different.

Speaker 1

你知道,对于氛围编程者来说,同时需要知道何时介入并阻止Codex,说‘等等,你是这样做的吗?’

You know, at the same time for a vibe coder, knowing when to interject and stop codex and say, oh, wait, you're doing this this way?

Speaker 1

你能考虑改用这种方式来做吗?

Can you instead look at doing it this way?

Speaker 1

他们可能不知道该怎么操作,对吧?

They're probably not gonna know how to do that, right?

Speaker 1

所以这也许就是Opus 4.6的优势所在,你可以说:好吧,启动四五个智能体让它们协同工作,对吗?

And so that's where maybe Opus four point six is better where you could say, Okay, spin up four or five agents and let them work with each other, right?

Speaker 0

是的。

Yeah.

Speaker 1

好的。

Okay.

Speaker 1

Codex完成了。

Codex is done.

Speaker 1

好的。

All right.

Speaker 1

所以Codex在3分47秒内构建了一个Polymarket的竞争对手。

So Codex built a competitor to Polymarket in three minutes and forty seven seconds.

Speaker 0

需要说明的是,Polymarket是一家价值数十亿美元的公司。

And to be clear, Polymarket's a multi billion dollar company.

Speaker 1

是的。

Yeah.

Speaker 1

我不认为这会像那样有效。

I don't think this will work quite as well.

Speaker 1

但我们拭目以待。

But we'll see.

Speaker 1

我们看看吧。

Let's see.

Speaker 1

所以让我们先检查一下它是否成功了。

So let's just check out if it worked first.

Speaker 1

我会让这个继续运行。

I'll let this keep running here.

Speaker 1

所以,你知道,它最后会告诉你的。

So, you know, it'll tell you at the end here.

Speaker 1

它实际上完成了测试,所以你可以看到它构建了一个测试套件。

It actually did the testing, so you can see it built a test suite.

展开剩余字幕(还有 480 条)
Speaker 1

所以它有一个LMSR数学单元测试套件、一个引擎行为单元测试套件和一个API集成测试套件。

So it has an LMSR math unit test suite, an engine behavior unit test suite, and an API integration test suite.

Speaker 1

并且它以10/10的测试通过率通过了。

And it passed with 10 out of 10 tests.

Speaker 1

至于它构建的内容,它拥有这个核心的LMSR做市引擎。

As far as what it built, it has this core LMSR market maker engine.

Speaker 1

所以连贯的定价滑点、绑定损失行为、领域趋势引擎。

So coherent pricing slippage, bound loss behavior, domain trending engine.

Speaker 1

它构建了一个REST API路由器,这挺有意思的,因为我完全没有告诉它需要以任何方式构建这些东西。

It built a REST API router, which is kind of interesting, because I didn't tell it that it would have to build obviously any of this in any way.

Speaker 1

它自行设计出了响应式前端的架构。

It figured out the architecture on its responsive front end.

Speaker 1

好的,我们来看看吧。

All right, well, let's see.

Speaker 1

我们看看它是否真的可以。

Let's see if it is actually.

Speaker 1

那我们到这里来。

So let's go here.

Speaker 1

我会让这个继续运行。

I'll let this keep running.

Speaker 1

这里有四个代理正在运行。

This has got these four agents just running away here.

Speaker 1

我在这里,我要进行下午1点的测试。

And I'm in here, I'm gonna do 1PM test.

Speaker 1

好的,测试十项全部通过,看起来很不错。

All right, test ten past ten, that looks good to me.

Speaker 1

是的。

Yeah.

Speaker 1

下午1点开始。

1PM start.

Speaker 1

好的,它正在运行。

All right, it's running.

Speaker 1

我们来看看。

Let's see.

Speaker 1

好,我们开始吧。

Okay, here we go.

Speaker 1

这看起来具备这个能力。

This looks like it has the ability.

Speaker 1

所以,格雷格,我们让你做第一个交易员。

So let's Greg, we'll make you the first trader.

Speaker 1

好的。

All right.

Speaker 1

说添加,好的。

Say add, okay.

Speaker 1

你有一千美元。

You got a thousand bucks.

Speaker 0

好的。

Okay.

Speaker 1

好了,还不错。

There we go, not bad.

Speaker 1

好的,你想创建什么市场?

All right, what market do you wanna create?

Speaker 0

嗯,比特币,我觉得现在应该已经跌到63000左右了吧?

Well, Bitcoin, I think as we speak has crashed to what 63,000 or something?

Speaker 1

差不多吧,是的。

Something like that, yeah.

Speaker 0

所以我的意思是,BDC会超过110千吗,在

So I do like the I mean, will BDC be above 110 ks by

Speaker 1

是的。

Yeah.

Speaker 0

好的。

Okay.

Speaker 0

到31.20美元为止

By desk $31.20

Speaker 1

20挺不错的,是的,好吧。

That's 20 pretty good, yeah, okay.

Speaker 0

看起来很不错。

Looks pretty good.

Speaker 1

所以让我们,

So let's,

Speaker 0

这几乎翻了一倍。

That's pretty Almost double.

Speaker 1

是的,那会很不错。

Yeah, that'd be pretty good.

Speaker 0

这取决于你什么时候买的。

It depends when you bought it.

Speaker 0

如果你是在125千买的,那你就不太开心,但是

If you bought it at 125 ks, then you're not so happy, but

Speaker 1

我们看看。

Let's see.

Speaker 1

那我就不知道了。

So then I don't know.

Speaker 1

我甚至不知道分辨标准和来源会是什么。

I don't even know what resolution criteria and source would be.

Speaker 1

意思是。

Means.

Speaker 1

我的意思是,我想我明白它指的是什么,但我觉得可以说,我们为什么不把CoinMarketCap作为来源,然后查看12月午夜前的价格来作为解决依据。

I mean, I think I know what it's getting at, but I guess you could say like, why don't we say use coin market cap as the source and resolve by looking at the price on the December, just before midnight, I guess.

Speaker 0

是的,我觉得像是

Yeah, I guess like

Speaker 1

比特币的价格。

The price of BTC.

Speaker 1

对。

Yeah.

Speaker 1

好吧。

All right.

Speaker 1

好的。

Okay.

Speaker 1

看起来没问题。

It looks like it's okay.

Speaker 1

所以我们现在有了。

So we've got it now.

Speaker 1

那我们就用 CoinMarketCap。

So we'll use coin market cap.

Speaker 1

好的。

Okay.

Speaker 1

那你可以设一个 yes,50%。

So then you could do a yes, 50%.

Speaker 1

你觉得怎么样?

So what do you think?

Speaker 1

是的,是或否?

Yes, yes or no?

Speaker 0

我的意思是,这并不是财务建议。

I mean, this isn't financial advice.

Speaker 0

只是纯粹用于教育目的。

Is just purely for educational purposes.

Speaker 0

但我认为是的。

But I think so.

Speaker 0

我觉得是这样。

I think that.

Speaker 1

好的,这是格雷格的买入票。

All right, that's a yes for Greg, buy.

Speaker 1

我们来看看。

Let's see.

Speaker 1

你想买多少股?

How many shares you wanna buy?

Speaker 1

你有一千美元。

You've got a thousand bucks.

Speaker 0

我想全部投入。

I wanna put it all.

Speaker 0

我会全部投入。

I'll put it all I on

Speaker 1

我不知道每股多少钱。

don't know how much it is per share.

Speaker 1

我们看看是不是一千,如果没错的话。

Let's see if it's a thousand, if that's right.

Speaker 1

好的,是的。

Okay, yeah.

Speaker 1

好的,交易已执行。

Okay, trade executed.

Speaker 1

好的。

Okay.

Speaker 1

所以,我的意思是,看起来它已经构建了一个相对功能性的原型。

So, I mean, it seems like it built something, as a prototype relatively functional here.

Speaker 1

我想它实际上已经减少了。

I guess that it actually has decremented.

Speaker 1

所以,好吧,一千股并没有,最终你花了大约24美元。

So, okay, a thousand shares was not, that ended up being, you know, about $24 that you spent.

Speaker 1

所以如果你想要创建另一个市场,你还有更多的钱,但这个操作成功了,没有返回错误。

So you've got more money if you wanted to create another market, but it worked, it's not returning an error.

Speaker 1

这里显示了成交量。

It shows the volume here.

Speaker 1

有意思。

Interesting.

Speaker 1

好吧,我们回到之前。

All right, so let's go back.

Speaker 1

我们看看。

Let's see.

Speaker 1

到目前为止,我觉得一切顺利。

So far so good with that, I'd say.

Speaker 1

我们来看看这里的情况。

Let's see what's going on here.

Speaker 1

好的,我们继续。

So we've got okay.

Speaker 1

首先,看看有多少代币。

So first off, look at how many tokens.

Speaker 1

人们一直在讨论Opus有多消耗代币。

People have been talking about how token hungry Opus is.

Speaker 1

而且它非常消耗token。

And it's very token hungry.

Speaker 1

每个智能体都使用了超过25,000个token。

Each one of these agents has used over 25,000 tokens.

Speaker 1

不过我们还是来看看吧。

So let's see though.

Speaker 1

所以他们完成了,对吧?

So they finished, right?

Speaker 1

围绕架构的技术研究已经完成。

The technical research around architecture is done.

Speaker 1

预测市场研究已经完成。

Prediction market research is done.

Speaker 1

用户体验设计研究已经完成。

The UX design research is done.

Speaker 1

测试策略已经完成。

The testing strategy is done.

Speaker 1

现在它要开始构建了。

Now it's gonna go and build.

Speaker 1

所以它正在编写package.json文件。

So it's writing the package JSON.

Speaker 0

你看到Anthropic发布的那条关于广告的广告了吗?

Did you see the ad that Anthropic launched about ads.

Speaker 1

是的。

Yes.

Speaker 1

我全都看了。

I watched them all.

Speaker 1

它们太搞笑了。

They're hilarious.

Speaker 1

不过说实话,我觉得Sam今天对它们不太满意。

Although actually, I guess Sam was not very happy about them today.

Speaker 1

我看到Sam发了一条不太高兴的推文。

I saw a tweet from Sam that was less than happy.

Speaker 1

我也觉得它们很搞笑,但我同样也能理解他的立场。

I also found them hilarious, but I also understand his side as well.

Speaker 1

所以看起来

So it seems like

Speaker 0

Anthropic目前似乎是反对广告的,是的。

Anthropic is sort of anti ads for now and Yes.

Speaker 0

ChatGPT将要引入广告了。

ChatGPT is gonna be introducing ads.

Speaker 1

是的。

Yes.

Speaker 0

而且,你知道,当我看着这个,看到你在处理令牌,两千五百个、五千个令牌,我就想,对啊,Anthropic其实并不

And, you know, when I'm when I'm watching this and I'm seeing you're going through tokens, twenty five five thousand tokens, I'm like, yeah, of course, Anthropic doesn't really

Speaker 1

没错。

Yeah.

Speaker 1

就是啊。

Like Yeah.

Speaker 1

我的意思是,这确实意味着,如果你把这些都加起来,你实际上使用了超过10万个token来完成这件事。

I mean, this is literally I mean, if you add that all up, you're talking about over a 100,000 tokens used in doing this.

Speaker 1

所以我认为这对Anthropic的投资者来说是一个非常好的方面,对吧?随着智能体成为Opus的新杀手级功能,你的token使用量将会乘以智能体的数量。

So I think that's one of the very good things for like investors in Anthropic, right, is with agents and agents now being, I think probably the new killer feature in Opus, you're gonna take whatever token usage and multiply it by the number of agents.

Speaker 0

没错。

Exactly.

Speaker 0

这实际上非常聪明。

It's actually really smart.

Speaker 0

我在想这是否就是他们的思路。

And I wonder if that was like the thinking.

Speaker 0

就像他们在想,我们怎样才能让人们使用更多代币呢?

Like they're like, how can we get people to use more tokens?

Speaker 0

哦,我们只需要启动代理程序,然后这样设计就行了。

Oh, we'll just like spin up agents and we'll design it like that.

Speaker 0

还是他们考虑的是,好吧,我们如何设计一个最适合这个用例的系统?

Or did they think like, okay, how can we design a system that is best for the use case?

Speaker 1

然后

And then

Speaker 0

他们就说,好吧,那我们就这么来变现。

they're like, okay, then we'll monetize it like this.

Speaker 0

我不知道。

I don't know.

Speaker 0

是的。

Yeah.

Speaker 1

可能是两者的结合。

Probably a combination of the two.

Speaker 1

我可以告诉你,我从来没有一天用过像今天这么多的令牌。

I can tell you I've never used so many tokens in one day as today.

Speaker 1

所以它是有效的。

So it's working.

Speaker 1

A

Speaker 0

10万个代币大概相当于多少美元?

100,000 tokens is like roughly how much in US dollars?

Speaker 1

我不知道,因为我用的是Claude Max套餐。

I don't know because I have a Claude Max plan.

Speaker 1

是的。

Yeah.

Speaker 1

所以我现在没有在付费,我们目前还没遇到任何限制,对吧?

So I'm not paying We're not seeing it hitting any limits right now, right?

Speaker 1

所以我可以告诉你,我支付的费用不会超过200美元。

So I'm not paying more than $200 I can tell you that.

Speaker 0

是的。

Yeah.

Speaker 0

我猜大概是,你知道,我们说的是200美元的最高计划,你还记得大概能获得多少令牌吗?

Guess is it's, you know, we're talking like in the $200 max plan, do you remember how many tokens you get approximately?

Speaker 1

这是个好问题。

That's a good question.

Speaker 0

让我查一下。

Let me check.

Speaker 1

让我启动Claude问问它。

Let me fire up Claude and ask it.

Speaker 1

让我看看这里。

Let's see here.

Speaker 1

我能获得多少代币的估算?

How many tokens do I get estimate?

Speaker 1

让我看看。

Let's see.

Speaker 1

好的。

Okay.

Speaker 1

所以你看。

So here you go.

Speaker 1

估算。

Estimate.

Speaker 1

所以SONNET每月有4500万个令牌。

So 45,000,000 tokens per month of SONNET.

Speaker 1

不过让我看看。

But let's see.

Speaker 1

你对Opus foursix的预估是多少?

What is your estimate for Opus foursix?

Speaker 1

让我们

Let's

Speaker 0

他们好像不太想让你知道。

It's like they don't really want you to know.

Speaker 1

不,他们是想让这变得稍微难一点。

No, they're trying to make it little harder.

Speaker 1

好的。

Okay.

Speaker 1

不过说真的,他们甚至都不会告诉我。

But yeah, they're not even gonna tell me actually.

Speaker 1

他们只会说,没有公开数据。

They're just gonna say, there's no public data.

Speaker 1

它非常新。

It's very new.

Speaker 1

Opus大约贵了5倍。

Opus is roughly 5X more expensive.

Speaker 1

所以如果那是4500万,5倍的话,那么答案大概是1000万左右。

So then if it's 45,000,000, that's 5X, so 10,000,000 is probably the answer, about.

Speaker 1

是的。

Yeah.

Speaker 1

对吧?

Right?

Speaker 0

那么如果我们快速估算一下,假设我们用了10万个令牌。

Then if we're doing quick math, let's just say we spent a 100,000 tokens.

Speaker 0

10万除以500万,我们不会...我们会用更多,因为你看

A 100,000 divided by 5,000,000 is, we're not gonna We're gonna spend more than that because look at

Speaker 1

这个,我们在下一个构建中已经超过了17000个令牌。

this, we're now over 17,000 tokens on top of that in this next build.

Speaker 0

好的。

Okay.

Speaker 1

所以,但即便如此,假设我们现在用一百万个令牌构建一个有竞争力的债券市场,我们仍然只用了Buddy能力的十分之一。

So, but still, let's say, you know, even if we use a million tokens building a competitive bond market right now, we're still only using a tenth of what it Buddy can do.

Speaker 1

那还不算太糟。

That's not terrible.

Speaker 0

不是。

No.

Speaker 0

我的意思是,这也就20美元,差不多是迈阿密一杯鸡尾酒的价格。

I mean, it's $20 which is like the price of a cocktail in Miami.

Speaker 1

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

没错。

Exactly.

Speaker 1

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

那我们来看看。

So let's see.

Speaker 1

但就像我说的,我现在看着代币数慢慢往上涨。

But now for as I say, I'm watching the tokens creep up.

Speaker 1

好的。

All right.

Speaker 1

现在它正在构建API路由。

So it's building the API routes now.

Speaker 0

我有预感最终结果会更好。

I have a feeling this is gonna be a better end result.

Speaker 1

我其实正想说,这感觉可能是因为之前是四个代理在做所有的工作,而现在它正在做这些工作。

I was actually just gonna say that this feels and then maybe it's just because there were four agents that were doing all the work beforehand now it's doing the work.

Speaker 1

感觉当我们加载它构建的内容时,会看到一些非常不同的东西。

It feels like we're gonna see something very different when we load what it builds.

Speaker 0

是的。

Yeah.

Speaker 0

我觉得我们没有给它任何,比如设计,视觉设计或任何,你知道的,所以没有。

I don't think we gave it any, like, design, like visual design or any, you know, so Nope.

Speaker 0

你会建议人们先推出一个最小可行产品应用,在本地环境里玩玩,点击一些按钮,然后在此基础上逐步完善视觉设计吗?

Do you recommend for folks to just, like, sort of get the MVP app out, play around with it on local host, you know, click some buttons and then sort of update with the visual design from there?

Speaker 1

这是个好问题。

It's a good question.

Speaker 1

我大概一半一半吧。

I I do like fifty fifty.

Speaker 1

有时候如果我心里有明确想法的话,是的。

Sometimes if I have something in mind Yeah.

Speaker 1

特别是如果我想要符合品牌风格的东西,比如假设我正在构建一个将融入OpenClaw、Multbook生态系统的项目,我可能会说:'嘿,我想设计一个看起来与openclaw.ai和moltbook.com有些相似或受其启发的网站。'

Especially if I want something like on brand brand with something like suppose I'm building something that is going to be in the like OpenClaw, Multbook ecosystem, I would probably say, Hey, I want to design a site that looks somewhat similar to, or is inspired by openclaw.ai and moltbook.com.

Speaker 1

看看那些网站并获取灵感。

Take a look at those sites and get inspiration.

Speaker 1

这些模型非常擅长处理这类事情。

These models are great at doing stuff like that.

Speaker 0

很酷。

Cool.

Speaker 1

不过,我真的很期待看到这个项目的进展。

I'm really excited to see what this is doing though.

Speaker 1

我觉得我们现在已经超过20万个token了,我能感觉到。

I think we're now like well over 200,000 tokens, they felt like I could tell.

Speaker 1

但我们还没达到1000万个。

But we're not at 10,000,000.

Speaker 0

我们还没有碰到任何限制。

We're not hitting any limits.

Speaker 0

要知道,我们不必为我们的项目再抵押贷款,是的,

Know, we don't have to take out a second mortgages on our Yeah,

Speaker 1

我们开始吧。

here we go.

Speaker 1

还没到那一步。

Yet.

Speaker 1

没错。

Yeah.

Speaker 1

它还在继续运行。

It's still going.

Speaker 1

我想,你知道,我们可以看出来,比如这里有个有趣的现象,相比之下,这个还在继续。

I guess, you know, we can tell, like here's an interesting thing, in a comparison, like this is still going.

Speaker 1

我们为什么不说,比如设计方面,因为我觉得这个设计看起来有点平淡无奇,对吧?

Why don't we say like, the design, because the design looked kind of bland to me, right?

Speaker 0

是的,确实如此。

Yeah, yes it did.

Speaker 1

你能把它美化一下,让它看起来更漂亮吗?

Can you spruce it up and make it look nicer?

Speaker 1

因为我们也让Codex一起工作,对吧?

Because like we may as well have Codex working away too, right?

Speaker 0

是的。

Yeah.

Speaker 0

所以你并没有给它任何具体的指示,比如它应该看起来像square.com那样。

So you didn't really give it any like specific, it should look like square.com.

Speaker 0

没有。

No.

Speaker 0

我们基本上看看Codex,如果,你知道,如果五三版本能有点品味的话。

We'll basically see if Codex, if, you know, if five three has a little bit of taste.

Speaker 1

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

所以它现在说的是这个意思。

So that's what it's saying now.

Speaker 1

它说,好的,我会在不改变功能的情况下升级视觉系统,采用更强的排版、更丰富的色彩方向、更好的卡片层级和有目的性的动效。

It's saying, okay, I'll upgrade the visual system without changing functionality, stronger typography, richer color direction, better card hierarchy, and purposeful motion.

Speaker 1

我不太明白这是什么意思,不过我们会知道的。

I don't know what that means, but we'll find out.

Speaker 1

好吧。

All right.

Speaker 1

好的。

Okay.

Speaker 1

现在开始编辑。

So now editing.

Speaker 1

Index dot HTML。

Index dot HTML.

Speaker 1

看起来它要添加动态悬停效果。

It looks like it's gonna add motion hover polish.

Speaker 1

好的。

Okay.

Speaker 1

不错。

Cool.

Speaker 1

看这个。

Look at that.

Speaker 1

当前任务超过3万个标记,正在构建前端用户界面。

This current task for over 30,000 tokens, building the front end UI.

Speaker 1

好的。

Okay.

Speaker 1

完成了。

It's done.

Speaker 1

是的。

Yeah.

Speaker 1

顺便说一下,Codex速度很快。

Codex is fast, by the way.

Speaker 1

对吧?

Right?

Speaker 1

我是说,这速度相当快了。

I mean, that's pretty darn fast.

Speaker 1

是的。

Yeah.

Speaker 1

所以我们应该可以直接到这里。

So we should be able to just go here.

Speaker 1

它应该已经自动重新加载了。

It should have already automatically reloaded.

Speaker 1

好的。

Okay.

Speaker 1

行。

Alright.

Speaker 1

我是说

I mean

Speaker 0

我是说

I mean

Speaker 1

没什么太大不同。

Not that not that different.

Speaker 0

没什么不同。

Not that different.

Speaker 0

是的。

It's Yeah.

Speaker 0

我想,我能试试看吗?

I think, can I try something?

Speaker 1

好的。

Yeah.

Speaker 1

尽管试吧。

Go for it.

Speaker 0

我会说,我会说,好吧。

I'm gonna say I'll I would say, okay.

Speaker 0

谢谢,但这只是一个小幅度的设计更新。

Thank you, but this was a minor design refresh.

Speaker 0

我正在寻找一个重大的更新。

I'm looking I'm looking for a major one.

Speaker 1

这就对了。

There you go.

Speaker 1

是的。

Yeah.

Speaker 0

然后我会说,假装你是杰克·多西。

And then I'm gonna say, pretend you are Jack Dorsey.

Speaker 1

给你。

Here you are.

Speaker 0

他会如何设计这个网站,使其简洁优雅

And how would he design this website to be clean, elegant

Speaker 1

是的。

Yeah.

Speaker 1

而且

And

Speaker 0

丰富且充满有趣的交互。

full and full of interesting interactions.

Speaker 1

是的。

Yeah.

Speaker 1

很好。

Great.

Speaker 0

好的。

Yeah.

Speaker 0

杰克·多尔西,可能有些人不知道,他是前身为推特、现为SquareBlock的联合创始人。

Jack Dorsey, for people who don't know, cofounder of formerly known as Twitter and SquareBlock now.

Speaker 0

嗯哼。

Mhmm.

Speaker 0

他就是个搞设计的人。

He's just got he's got he's a design guy.

Speaker 0

我不知道。

I don't know.

Speaker 0

他是第一个想到的人,或者说最先想到的人。

He's the first one guy came to mind or first person came to mind.

Speaker 0

是的。

Yeah.

Speaker 1

这个想法不错。

That's a good one.

Speaker 1

这是个不错的提示。

That's a good prompt.

Speaker 1

让我看看。

Let's see.

Speaker 1

所以我会做一个完整的视觉重构,而不是渐进式的微调。

So I'll do a full visual re architect, not an incremental tweak.

Speaker 1

新的布局语言,更强的排版,单色优先的调色板,有趣的交互驱动卡片,等等等等。

New layout language, stronger typography, monochrome first palette, interesting, interaction driven cars, da, da, da.

Speaker 1

好的。

Okay.

Speaker 1

你知道有趣的是,它并没有——我本来有点希望,也许我们还没达到通用人工智能的水平,我本来希望它会说‘让我去找些艺术作品’。

You know what's interesting is it didn't, I would have kind of hoped, and maybe we're quite at AGI yet, I would have kind of hoped that it would say, let me go find some art.

Speaker 1

比如,如果你告诉我,格雷格,嘿,摩根,你能读一下吗?

Like, if you told me that, Greg, like, hey, Morgan, can you read it?

Speaker 1

我可能会说,好的,让我去找一些关于杰克·多尔西设计美学的文章看看。

I would be like, yeah, let me go look at some articles about Jack Dorsey's design aesthetic.

Speaker 1

没错。

Exactly.

Speaker 1

我很惊讶它没有这么做。

I'm surprised it's not doing that.

Speaker 1

相反,它表现得像是——我假设它知道杰克·多尔西是谁,虽然我不确定它是否真的知道。

Instead, it's going like, I am assuming it knows who Jack Dorsey is, although I don't know if it actually does.

Speaker 1

感觉它只是抓住你问题中的这部分内容,然后说‘哦,好的,重大更新,我来做这个’

Just seems like it's really just taking like this part of your question and going, oh, okay, major refresh, I'll do that.

Speaker 0

嗯,你不能问问它吗?

Well, can't you ask it?

Speaker 0

你不能问一下,你知道杰克·多尔西是谁吗?

Can't you say, do you know who Jack Dorsey is?

Speaker 1

让我看看,其实我可以,我应该能在中途打断它。

Let's see, I can actually, I'm supposed to be able to in the middle cut it off.

Speaker 1

让我来,你知道杰克·多尔西是谁吗?

Let Yeah, me do you know who Jack Dorsey is?

Speaker 1

让我看看,好的,我们开始吧。

Let's see, okay, so here we go.

Speaker 1

这是中途测试。

This is the midstream test.

Speaker 1

它正在思考。

It's thinking about it.

Speaker 1

这边有43,000个标记。

43,000 tokens over here.

Speaker 1

好的。

Okay.

Speaker 1

是的。

Yes.

Speaker 1

好的,我们开始吧。

Okay, here we go.

Speaker 1

对。

Yes.

Speaker 1

杰克·多尔西是Twitter的联合创始人,也是前Square公司的CEO。

Jack Dorsey is the co founder of Twitter, former Docky Square.

Speaker 1

好的。

Okay.

Speaker 1

其设计风格通常极简克制且注重交互。

With a design style that's typically minimal restraint and interaction focused.

Speaker 0

太美了。

Beautiful.

Speaker 1

好的。

Okay.

Speaker 1

明白了。

Alright.

Speaker 1

所以它向我们展示了。

So touche, it showed us.

Speaker 1

是的。

Yeah.

Speaker 1

现在有个奇怪的事情。

Now here's the weird thing.

Speaker 1

看起来它

It looks like it's

Speaker 0

它是完整的还是我们还有

Is it complete or do we have

Speaker 1

要说它是完整的吗?

to say Is it complete?

Speaker 1

就像我刚才说的那样?

Like right when I was saying that?

Speaker 1

但你是说完了,还是因为我问了问题才停下的?

But like are you done or did you stop because I asked a question?

Speaker 0

是的。

Yeah.

Speaker 0

这真的很有趣。

This is really interesting.

Speaker 0

我会说,就像,哦,你问问题的时候我暂停了。

I will say like the Oh, I paused when you asked the question.

Speaker 0

这次重大重新设计,哦,有意思。

The major redesign Oh, interesting.

Speaker 0

大部分是

Mostly

Speaker 1

嗯,你想继续吗,哦,这有点奇怪。

Well, you want to resume, oh, so that's weird.

Speaker 1

所以你一问问题,它就停了。

So you ask a question, it just stops.

Speaker 1

但是,就像这样,是的,当然,继续。

But like So like, yes, of course, continue.

Speaker 1

对。

Yeah.

Speaker 1

好的。

Okay.

Speaker 0

这实际上是一种很奇怪的用户体验。

So that's actually some weird UX.

Speaker 0

比如,显然它应该在之后自动继续。

Like it obviously should just continue after.

Speaker 1

对吧?

Right?

Speaker 1

我会这么认为。

I would assume that.

Speaker 1

这事儿真奇怪,因为它说:是的,如果你愿意,我现在就继续。

It's such a weird thing because it said, yeah, if you want, I'll resume now.

Speaker 0

我要说,我喜欢在过程中可以编辑内容这一点。

I will say, I do like that you can in midstream, like kind of edit things.

Speaker 0

我的思维方式就是这样。

Like that's how my brain works.

Speaker 1

完全同意。

Totally.

Speaker 1

这用了大量的令牌。

This is using a ton of tokens.

Speaker 1

能看到这些细节真是太棒了。

It is amazing to see the detail.

Speaker 1

我的意思是,无论这是哪个周期的产物,都应该算是一件艺术品。

I mean, this should be a work of art, whatever cycle this comes off.

Speaker 1

对,这个完成了。

Right, this is done.

Speaker 1

那么现在让我们看看,我们可以回到这个。

So now let's see, we can go back to this.

Speaker 1

好的。

And okay.

Speaker 1

我的意思是,虽然没有让我惊艳,但也还行。

I mean, I'm not blown away, but it's okay.

Speaker 0

意见在毫秒间就形成了价值。

Opinions become price in milliseconds.

Speaker 0

交易信念,而非噪音。

Trade conviction, not noise.

Speaker 0

信号市场正渴望快速响应。

Signal market is dying for fast.

Speaker 0

通过透明定价进行假设迭代。

Thesis iteration with transparent pricing.

Speaker 0

好的。

Okay.

Speaker 0

我的意思是,我觉得我可以再推得更进一步。

I mean, I would push it more, I think.

Speaker 1

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

我想,你知道,就像

I guess, you know, like

Speaker 0

I

Speaker 1

我会为它这么说。

would Go say for it.

Speaker 0

我认为那不是我认识的杰克·多西。

I would say that's not the Jack Dorsey I know.

Speaker 1

我不认识杰克·多西。

I don't know Jack Dorsey.

Speaker 1

I

Speaker 0

我一直在寻找大写锁定的重大升级。

was looking for a caps lock major upgrade.

Speaker 0

这可能意味着更多的文字、更多的图片、更多的叙事。

That might mean way more copy, way more images, way more storytelling.

Speaker 1

是的,没错。

Yeah, exactly.

Speaker 0

等等,诸如此类。

Etcetera, etcetera.

Speaker 1

是的。

Yeah.

Speaker 1

说真的,我会说,你慢慢来。

I'll just say, seriously, take your time.

Speaker 1

尽情发挥吧。

Go nuts.

Speaker 1

是的。

Yeah.

Speaker 1

什么是学分?

What are credits?

Speaker 0

经典的遗言啊,摩根。

Famous last words, Morgan.

Speaker 1

没错。

Yeah.

Speaker 1

我知道。

I know.

Speaker 1

对吧?

Right?

Speaker 1

就像是,哦,太完美了。

It's like, oh, perfect.

Speaker 1

好的。

Okay.

Speaker 1

那就像是总部开业时发出的信号。

That's like a signal within the opening of headquarters.

Speaker 1

就像是,我们终于招到人了。

Like, we finally got someone.

Speaker 0

完全正确。

Totally.

Speaker 0

这是一条大鱼。

It's a whale.

Speaker 1

好的。

Alright.

Speaker 1

所以Opus已经完成了。

So Opus has finished.

Speaker 1

是的。

Yep.

Speaker 1

我不知道自己用了多少积分,不过其实,让我问问它。

I have no idea how many credits I've used, but probably actually, let me ask it.

Speaker 1

你总共用了多少个代币来整合这一切,包括那四个代理?

How many tokens in total did you use to put all of this together, including the four agents?

Speaker 1

然后我们可以——哦,它用的是代币印章。

And then we can Oh, it's using token stamps.

Speaker 0

太有趣了。

That's so funny.

Speaker 1

好的。

Okay.

Speaker 1

不知道。

Doesn't know.

Speaker 1

让我看看。

Let's see.

Speaker 1

好的。

Okay.

Speaker 1

开始了。

Here we go.

Speaker 1

好的。

Okay.

Speaker 1

它正在估算。

It's it's estimating.

Speaker 1

它其实不知道,这有点奇怪,因为它应该知道。不过,我在想我能不能直接输入 /cost 命令。

It actually doesn't know, which is weird because it should Although, wonder if I can actually do slash cost.

Speaker 1

哦,好了。

Oh, here we go.

Speaker 1

好的

Yeah.

Speaker 1

Okay.

Speaker 1

Yeah.

Speaker 1

哦,它不行啊。

Oh, it doesn't okay.

Speaker 1

不。

No.

Speaker 1

不需要监控成本。

No need to monitor cost.

Speaker 1

好的。

Okay.

Speaker 0

不需要。

No need.

Speaker 1

他们真的不想让你说'是的'。

They really don't want you to Yeah.

Speaker 1

好的。

Okay.

Speaker 1

它估计总共需要15万到25万个令牌。

It's guessing 150 to 250,000 tokens total.

Speaker 0

对。

Yeah.

Speaker 0

那可能是对的。

That's probably right.

Speaker 1

好的。

Okay.

Speaker 1

当然。

Sure.

Speaker 0

是的。

Yeah.

Speaker 1

好的。

Okay.

Speaker 1

那么,它是这样做的。

So here's what it's done.

Speaker 1

首先,这里有一个非常有趣的现象,Codex生成了10个测试,对吧?

So first off, one really interesting thing here is, you know, Codex created 10 tests, right?

Speaker 1

Opus生成了96个测试。

Opus created 96 tests.

Speaker 1

所以在测试方面确实有更多细节。

So definitely a lot more detail on the testing side.

Speaker 1

它被称为预测市场,而Codex称之为信号市场。

And it's called it forecast, whereas Codex called it signal market.

Speaker 1

所以名称不同。

So different names.

Speaker 1

一个Polymark的竞争对手已经建成并经过验证。

A Polymark competitor is built and verified.

Speaker 1

以下是每位团队成员交付的内容。

Here's what each team member delivered.

Speaker 1

架构技术负责人决定采用模块化单体架构,Next。

So the architecture technical lead decided modular monolith, Next.

Speaker 1

Js 14应用路由器、中央限价订单簿、数据库架构、RESTful API。

Js 14 app routers, central limit order book, database schema, RESTful API.

Speaker 1

好的。

Okay.

Speaker 1

预测市场领域的专家,二元是/否市场,其中是/否选项始终价值一美元。

The prediction market domain expert, binary yesno market, where yesno is always a dollar.

Speaker 1

好的。

Okay.

Speaker 1

涵盖加密货币和政治领域的市场。

Seated markets across crypto politics.

Speaker 1

明白了。

Okay.

Speaker 1

用户体验设计负责人,黑暗模式交易平台。

The UX design lead, dark mode trading platform.

Speaker 1

页面,绿色代表是,红色代表否。

Pages, it is a green for yes, red for no.

Speaker 1

好的。

Okay.

Speaker 1

测试质量保证负责人,进行了订单簿测试。

Testing QA lead, did order book tests.

Speaker 1

好的。

Okay.

Speaker 1

这是我们正在拆分的测试:订单簿测试、匹配引擎。

So here's the tests we're breaking out, order book tests, matching engine.

Speaker 1

好的。

Okay.

Speaker 1

好了,运行 npm run dev 来启动应用。

All right, npm run dev to start the app.

Speaker 1

那我们进来看看。

So let's go in here.

Speaker 1

哦,有意思。

Oh, interesting.

Speaker 1

好的。

Okay.

Speaker 1

我其实不想说什么。

I don't wanna say anything actually.

Speaker 1

我已经给他了。

I've already given to him.

Speaker 1

你最初的判断是什么?

What's your initial take?

Speaker 0

你好,杰克·多西,你明白我的意思吗?

Hello Jack Dorsey, you know what I'm saying?

Speaker 0

这正是我们发布Codex时预期的样子。

This is what I expected it to look like when we pushed Codex.

Speaker 1

是的,我也是。

Yeah, me too.

Speaker 0

这看起来非常整洁。

This looks really clean.

Speaker 0

当你悬停时会发生什么?

What happens when you hover over?

Speaker 1

哦,对。

Oh, yeah.

Speaker 1

看看这个。

Look at that.

Speaker 1

是的。

Yeah.

Speaker 1

有悬停状态。

Got hover states.

Speaker 0

悬停状态。

Hover states.

Speaker 0

是的

Yeah.

Speaker 1

显然它已经整理好了

Obviously It's got it organized.

Speaker 1

比如体育赛事,你知道,接下来的两个会有超过1.2亿观众吗?

Like when it goes sports, you know, will the next two will have over 120,000,000 viewers?

Speaker 1

人工智能会在2027年通过图灵测试吗?

Will AI pass the Turing test by 2027?

Speaker 1

它会移动到凡尔赛三号吗?

Will it move to Versailles three?

Speaker 1

里面有些东西。

It's got stuff in there.

Speaker 0

是的。

Yeah.

Speaker 0

对。

Yeah.

Speaker 0

这根本感觉不像一个最小可行产品。

This doesn't even, it doesn't feel like an MVP.

Speaker 1

是的。

Yeah.

Speaker 1

这实际上相当疯狂。

This is pretty wild actually.

Speaker 1

它还生成了一些我们从未向它提及的内容,对吧?

And it created some stuff that we never talked to it about, right?

Speaker 1

比如一个排行榜,它已经填充了一些初始内容。

Like a leaderboard, which it's already populated with some initial stuff.

Speaker 1

作品集部分。

Portfolio section.

Speaker 1

是的。

Yeah.

Speaker 1

有趣。

Interesting.

Speaker 1

那么我们现在来看看。

So let's see now.

Speaker 1

所以,我的意思是,尽管如此,我还是更印象深刻,也许这15万美元花得值,感觉好多了。

So, I mean, I'm more impressed with though, it was maybe worth the 150,000 to one Feeling 150,000 better about it.

Speaker 1

SpaceX会在2030年前将人类送上火星吗?

Will SpaceX land humans on Mars before 2030?

Speaker 1

它认为只有8%的可能性,哦对了,看看这个。

Only 8% it thinks, Oh yeah, look at this actually.

Speaker 1

哇哦。

Woah.

Speaker 0

这太疯狂了,兄弟。

This is insane, bro.

Speaker 1

这很简洁。

That's clean.

Speaker 0

这很简洁。

That's clean.

Speaker 1

是的

Yeah.

Speaker 1

我没想到点进来能看到设计得这么好的页面。

I wasn't expecting to click in and actually get a well designed page like this.

Speaker 1

所以如果我要操作的话,必须先登录才能交易。

So if I were to do that, I have to sign in to trade.

Speaker 1

我不确定能不能登录,因为我还没设置任何账户。

I don't know if I'm gonna be able to sign in because I haven't set anything up.

Speaker 1

让我看看。

Let me just check.

Speaker 0

嗯,你可以注册,上面显示没有账户。

Well, you can sign up, says don't have an account.

Speaker 1

哦对,注册吧。

Oh yeah, sign up.

Speaker 1

不过我不确定它是否全部连接好了。

I don't know if it gets it all connected though.

Speaker 1

让我看看。

Let's see though.

Speaker 1

好的,我正在操作。

All right, I'm snagging.

Speaker 1

你知道吗,其实我打算用Greg这个用户名,我还是用你的用户名。

You know what, actually, I'm gonna take the username Greg, I'm still your username.

Speaker 1

好的。

All right.

Speaker 1

让我看看。

Let's see.

Speaker 1

好的。

Okay.

Speaker 1

所以是的,可能因为我想说数据库还没连接好。

So yeah, it's probably because I was gonna say the database isn't wired up yet.

Speaker 0

对。

Right.

Speaker 1

所以我并不惊讶实际上需要这么做。

So I'm not surprised that I would actually have to do.

Speaker 1

我本来没预料到要做这个。

I wasn't expecting to do that.

Speaker 1

所以,但我理解了。

So, but I get it.

Speaker 1

我的意思是,这样很简洁。

I mean, it's clean.

Speaker 1

这挺不错的。

This is pretty neat.

Speaker 0

是的。

Yep.

Speaker 1

对。

Yeah.

Speaker 1

好吧。

All right.

Speaker 1

让我看看。

Let's see.

Speaker 1

那么这样,好吧。

So then can this, all right.

Speaker 1

我们已经给了,我不知道你怎么想,但这将是我给Codex的最后一次机会

We've given, I don't know about you, but this is the last chance I'm gonna give Codex

Speaker 0

继续

on

Speaker 1

这是这里的设计机会。

It's the design out of opportunities here.

Speaker 1

所以,哦,这里,不过挺有意思的。

So, oh, here, it's funny though.

Speaker 1

最后,它有点像《星际迷航》里的数据。

In the end, it kind of, it's acting a little bit like data from Star Trek.

Speaker 1

问题是,什么是学分?

And the question, what are credits?

关于 Bayt 播客

Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。

继续浏览更多播客