The Information's TITV

Claude Sonnet 4.5评测,AWS高管谈AI代理,Vercel估值达90亿美元 | 2025年9月30日

Claude Sonnet 4.5 Review, AWS Director on AI Agents, Vercel’s $9B Valuation | Sep 30, 2025

Episode summary

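Vercel COO Jeanne DeWitt Grosser joins TITV host Akash Pasricha to discuss the company's $300 million Series F and its ambition to become the "AWS of AI." We also talk with Warp's Zach Lloyd and Zencoder's Andrew Filev about their first impressions of the new Claude Sonnet 4.5 model. The Information reporter Theo Wayt breaks down the latest organizational changes at xAI, and AWS technology director Shaown Nandi joins to talk about AI agents.

Article mentioned in this episode: https://www.theinformation.com/articles/people-running-elon-musks-xai

TITV streams live at 10 a.m. PT / 1 p.m. ET on YouTube, X, and LinkedIn. You can also listen on any podcast platform.

Subscribe:
- The Information on YouTube: https://www.youtube.com/@theinformation4080/?sub_confirmation=1
- The Information: https://www.theinformation.com/subscribe_h

Sign up for the AI Agenda newsletter: https://www.theinformation.com/features/ai-agenda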

Bilingual subtitles

Speaker 0

欢迎各位来到The Information的TI TV节目。我是阿卡什·帕什里察。今天是9月30日星期二。我们The Information团队都非常兴奋,因为今天在时代广场举办了AI Agenda Live大会。请务必关注我们的网站获取更多相关报道。

Welcome everyone to The Information's TITV. My name is Akash Pasricha. It is Tuesday, September 30. We are all excited here at The Information because today is our AI Agenda Live conference, which is happening in Times Square. Be sure to follow along on our website for more coverage on that.

Speaker 0

接下来几天我们还将为您带来该活动的精彩集锦,请持续关注本直播。今天的节目里,我们邀请了几位令人期待的嘉宾。我们将讨论Claude Sonnet 4.5的最新发布版本,并采访几位测试过该模型的创始人。顺带一提,实际上Vercel会是第一个出场的嘉宾。

We're also going to be bringing you the highlights from that event over the next few days, so stay tuned to this stream. Today on the show, we've got some exciting guests coming on. We are talking about the latest release of Claude Sonnet 4.5. We're going to be talking to some founders who have been testing out the model. By the way, Vercel is actually coming on the show first.

Speaker 0

他们将首先登场讨论最新一轮融资。我们还有关于xAI组织架构的最新报道,另外还为您准备了与AWS朋友的精彩对话。但在开始之前,我想重点介绍我们昨晚在The Information发布的重磅新闻。我们独家报道OpenAI在2025年前六个月创造了超过40亿美元的收入。这个数字超过了该公司去年全年的总收入。

They're coming on first to talk about their latest fundraise. We also have some of the latest reporting on xAI's org chart, and separately, we've got a great conversation planned for you with our friends at AWS. But before we get going, I want to highlight for you a big story that we published last night at The Information. The Information is first to report that OpenAI generated more than $4 billion in revenue in the first six months of 2025. And to put that number in context, that figure is more than the total revenue that the company generated in all of last year.

Speaker 0

值得注意的是,我们已知该公司近期月收入达到10亿美元,因此很可能正朝着全年130亿美元的收入目标稳步前进。报道中还包含许多关于公司现金消耗情况的细节,我强烈推荐您阅读。好的,让我们欢迎第一位嘉宾。由于AI的影响,开发者工具发生了重大变革,而Vercel正是借此趋势获得巨大发展的公司之一。该公司今天以93亿美元估值完成了3亿美元的F轮融资。

Now, one thing to note here is that we know the company was recently generating $1 billion a month in revenue, so it is likely making good progress towards that $13 billion annual revenue target for the year. The story also has a lot of great details about where the company's cash burn is at, and so I highly recommend that you check that out. Okay, let's get to our first guest. Developer tools have changed significantly as a result of AI, and Vercel is one company that has gained a lot of traction as a result of that trend. The company raised $300 million today in a Series F funding round at a $9.3 billion valuation.

Speaker 0

现在有请公司首席运营官让·格罗瑟来讨论这个消息。让,欢迎来到TI TV,很高兴您能来。

I want to bring on the company's Chief Operating Officer, Jeanne Grosser, to talk about the news. Jeanne, welcome to TITV. It's great to have you.

Speaker 1

谢谢邀请,阿卡什。

Thanks for having me Akash.

Speaker 0

我想更广泛地聊聊你们的业务,但简单来说,Vercel是做什么的?

I want to talk about the business more broadly you're in, but very quickly, what does Vercel do?

Speaker 1

我们涉足众多领域。传统上,我们以提供前端云服务闻名,本质上是构建全栈前端应用的最佳平台。大约一年前,或者说十八个月前,我们发布了v0,一款被广泛用于原型设计和应用开发的氛围编程(vibe coding)应用。而在v0底层,我们不得不构建大量基础设施来支撑这款AI原生应用。

Well, we do a lot of things. So historically, we've been known for providing a front end cloud. So basically the best place to go and build full stack front end applications. And then about a year ago or a little more, maybe eighteen months, we released v0, which is a vibe coding application that many are using for prototyping and app building. And then underneath the hood of v0, we had to build a bunch of infrastructure to support an AI native application.

Speaker 1

如今这类技术已无法轻易从现成方案中获得。因此今年六月,我们推出了AI云平台,现已成为构建全栈智能体应用的首选——无论是对话型智能体还是后台智能体,都能在AI云上完成开发。

You can't just get that stuff easily off the shelf these days. So in June, we released the AI Cloud, which is now the best place to go and build full stack agentic applications, whether that's for conversational agents or background agents. You can build them all on the AI cloud.

Speaker 0

明白了。所以本质上,你们的技术是在用更自动化的方式帮助人们完成繁琐的AI编程工作。这笔融资你们打算如何运用?

Right. So AI coding, essentially, you're helping people do that painful task in much more automated ways with your technology. What are you going do with the money?

Speaker 1

我们将对这三个领域进行投资。AI云平台还有大量待完善之处——可以说我们的愿景是成为AI领域的AWS,为开发者提供构建AI应用所需的标准化基础模块。近期我们在企业级市场取得显著进展,因此会同步完善企业产品线并搭建相应的市场推广体系。另外值得注意的是,AI技术天然具备全球化特性...

Well, we'll invest in all three of those areas. So a lot to do in building out the AI Cloud. You can sort of think of our aspiration as becoming the AWS of AI, giving everybody the off-the-shelf primitives that they need to build AI applications. We've recently seen a lot of traction in enterprise, so we'll be building out both the enterprise product and the go-to-market org to support that. And then I think one of the things that's been incredible to see about AI is just how instantly global it is.

Speaker 1

所以我们会持续拓展全球业务版图。

So continuing to expand our global reach.

Speaker 0

没错。现在我想探讨的是,当前市面上AI编程工具层出不穷。稍后节目还将有几位开发同类工具的创始人登场,他们将讨论Claude Sonnet 4.5——我马上想听听你的见解,因为我知道...

Right. Now, one of the things I want to talk about is that there are so many AI coding tools out there. You know, we've got actually a couple other founders who are working on their own AI coding tools coming on the show in a minute to talk about Claude Sonnet 4.5, which I want to get your thoughts on in a second because I know

Speaker 0

你们团队一直在使用它。但如果我们更宏观地审视AI编程领域,你们究竟如何...

you guys have been using it. But if we look at the AI coding landscape more broadly, I mean, how do you

Speaker 0

看到这种洗牌了吗?你们现在有收购其他公司的打算吗?你认为行业整合会发生吗?这就是我快速预测的情况。

see this shaking out? Are you looking to buy other companies right now? Do you see consolidation happening? That's what I'm forecasting here very quickly.

Speaker 1

是的。这是个碎片化的市场。作为v0的开发者以及底层基础设施的提供者,我们真正想支持所有这些应用的开发。比如Cursor就在我们平台上运行了大量智能体基础设施,许多其他氛围编程应用同样在使用我们的AI云服务。

Yeah. I mean, it's a fragmented market. And I think as both the makers of v0 and the providers of the infrastructure that goes underneath it, we primarily want to support all of these applications getting developed. So Cursor runs a bunch of their agentic infrastructure on us. Many of the other vibe coding applications out there similarly are using our AI Cloud.

Speaker 1

实际上我们开源了v0。你可以利用v0的核心技术来创建自己的版本,我们反而在促进这种碎片化。

In fact, we open sourced v0, so we actually enable the fragmentation. You can go and use what's under the hood of v0 to create your own.

Speaker 2

明白了。

Got it.

Speaker 1

短期来看,这种碎片化其实是有益的,因为人们可以深入特定用例场景来精细调校AI。v0就是一个例子,我们深度专注于React和Next.js。

In the near term, one of the things we'll probably see is that the fragmentation is actually beneficial, because people can go deep on specific types of use cases where you really want to fine-tune the AI. That's one of the things with v0: we're deeply focused on React and Next.js. So, you know... (Which is what?)

Speaker 1

Next.js是构建前端应用的首选框架。如果你正在开发前端应用,很可能会用Next.js编写代码。而v0采用的是复合模型架构,比如我们会整合Sonnet这样的模型。

So Next.js is the top framework for building front end applications. If you're a developer writing code to build a front end application, there's a decent chance you're going to be writing it in Next.js. What we can do with v0 is that we actually have a composite model. So we'll take a model like Sonnet.

Speaker 0

没错。

Right.

Speaker 1

但我们有一个运行在其上的修正模型,它能捕捉Sonnet出错的所有地方,并融入我们的Next.js专业知识,使你更有可能一次性将提示转化为应用,或构建出真正稳健可用的应用程序。

But we have a fixer model that runs on top of it that catches all the places where Sonnet got it wrong and adds all of our Next.js expertise to make it more likely that you can one-shot a prompt to an app or build a really robust application that actually works.
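The generate-then-fix arrangement described in this exchange can be sketched roughly as below. This is only an illustration under loud assumptions: the interview does not describe v0's internals, so `composite_generate`, `toy_base_model`, and `fix_next_link_import` are hypothetical names, and the "fixer" here is a simple string rule standing in for what the speaker calls a fixer model.

```python
from typing import Callable

Generator = Callable[[str], str]  # prompt -> draft code
Fixer = Callable[[str], str]      # draft code -> corrected code


def composite_generate(prompt: str, base: Generator, fixers: list[Fixer]) -> str:
    """Run the base model, then apply each fixer pass to its output in order."""
    code = base(prompt)
    for fix in fixers:
        code = fix(code)
    return code


# Hypothetical stand-ins for real models: a base "model" that makes a known
# mistake, and a rule-based fixer that repairs that one mistake.
def toy_base_model(prompt: str) -> str:
    return 'import Link from "next/links"  // ' + prompt


def fix_next_link_import(code: str) -> str:
    return code.replace('"next/links"', '"next/link"')


result = composite_generate("nav bar", toy_base_model, [fix_next_link_import])
```

Keeping the fixers as a list of plain callables means a framework-specific correction pass could be swapped or extended without touching the base generation step.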

Speaker 0

对。你们一直在用Claude新出的4.5 Sonnet版本吗?

Right. You've been using Claude's new 4.5 Sonnet?

Speaker 1

是的。我们——

Yes. We-

Speaker 0

评价如何?我们马上要开始一个小组讨论,不如先问问你,你觉得它

What's the review? We're going to have a panel on in a second, we might as well ask you, how are you finding

Speaker 1

没错。我们经常在模型供应商正式发布更新前就提前合作。这个版本我们已经用了一段时间,仍然对其评价很高。它现在支撑着v0。

Yeah. I mean, we often are working with a lot of the model providers before they actually publicly drop their model updates. So we've been working with that one for a bit and, you know, still think very highly of it. It's powering v0.

Speaker 0

明白了。关于这些模型效果趋于平缓的讨论——有人认为当某些方面已经相当出色时,要取得实质性进步会更困难。你观察到这种平台期现象了吗?

Right. And what's your take on the sort of discussion around the effectiveness of these models sort of plateauing, the idea that it's harder to make substantial improvements when the things are already quite good in some cases. Do you see that plateauing of the models?

Speaker 1

目前我们确实看到每个版本都有进步。我认为最近我们观察到很多...

So far, we're definitely seeing improvements with each version. I think one of the things we're seeing a lot of the last...

Speaker 0

但这是否一样好?真的有那么好吗?我是说,提升的幅度肯定在缩小一些。

But is it as good? Is it as good? Like, I mean, the margins must be shrinking in here a bit.

Speaker 1

这是个好问题。我可能需要请我们的v0工程团队来给你更详细的解答。我们对每个模型版本都有相当严格的评估流程,能够与其他模型进行对比等。但我认为现在很多模型提供商都在专注于特定领域,编程显然是其中之一,不过看起来在更细分领域仍有回报空间。

Yeah, that's a good question. I probably have to get our v0 engineering team over to give you a more detailed answer on that one. We have a pretty robust eval process we go through with each one of these model drops, and the ability to compare them to other models, etc. But I do think one of the things you see a lot of the model providers do now is focus on specific areas, coding obviously being one of those. But it seems like there probably are still returns for more area-specific

Speaker 0

没错。你们是大型编程公司之一。你们有收购计划吗?未来6到12个月内会有动作吗?

Right. You guys are one of the bigger coding companies. Are you looking to do acquisitions? Do you see them coming in the next six to twelve months?

Speaker 1

Vercel一直积极收购,通常是那些做有趣事情的小型初创公司。最近我们完成了对Nuxt的重大收购,这是另一个开源前端框架。所以我们始终在关注市场,无论是独立开源开发者还是拥有我们平台所需功能的团队。

Vercel has been highly acquisitive, typically, you know, of smaller startups doing really interesting things. Most recently, we actually made a pretty significant acquisition in buying Nuxt, which is another open source front end framework. So we are always out there looking at things, whether it's a one person, you know, open source developer or folks with interesting functionality that we think we could add to the platform.

Speaker 0

对。这正是我们讨论行业整合的原因——你们既提供编程工具的基础设施,又涉及编程工具本身,还有AI代码审查。

Right. And that is sort of the reason we talk about consolidation here is because you talk about the infrastructure that enables these coding tools. That's the business that you're on. You talk about the coding tools themselves. You talk about AI code review.

Speaker 0

我们接触过这三类公司。我在想,为什么不打造一个垂直整合的全能实体呢?Vercel是想成为这样的存在吗?

I mean, we've had companies from all three buckets. I kind of see this as, you know, why not have one vertically integrated entity that does it all? I mean, is that what Vercel wants to become?

Speaker 1

这正是我们的自我定位。比如我们构建和维护的Next.js框架。

That's how we describe ourselves, right? So you've got the framework in Next.js, which we build and maintain.

Speaker 0

对。

Right.

Speaker 1

Vercel的基础设施平台是专门为前端和现在的智能体应用构建的。

You've got, you know, the platform in all of Vercel's infrastructure that's specifically, you know, built for front end and now agentic applications.

Speaker 2

没错。

Right.

Speaker 1

我们的平台还内置了一系列智能体功能。正如你刚才提到的,代码审查功能正在发布Vercel智能体。它能在Next.js和Vercel专属代码的特定上下文中出色地完成代码审查工作。

And then you have built into our platform a bunch of agentic capabilities. So, exactly what you just said, code reviews: we're releasing the Vercel Agent. And so it will do code reviews, and it does them quite well because it's doing them in the specific context of Next.js and code built for Vercel. Right.

Speaker 1

因此相比通用型代码审查工具,我们能实现更高的准确性和效率。

So you can imagine that we can be quite accurate and effective at that relative to more generic based code reviews.

Speaker 3

确实。

Right.

Speaker 1

总的来说,我们认为垂直整合是平台的核心优势与差异化所在。我们的目标是将智能体能力贯穿整个平台,为用户提供解决方案而非制造问题。

So I think in general, our view is being vertically integrated is real strength here and a differentiator of our platform. That, you know, the goal is basically to take agents across the entire platform. So we're providing folks solutions rather than problems.

Speaker 0

好的。Jeanne,在结束之前,我想带你回顾一下你在Stripe的时光。你在Stripe工作了九年多,最近担任的是首席商务官。我之所以想询问你对Stripe领域的见解,是因为昨天我们看到OpenAI发布了一款代理式结账工具,现在似乎可以通过ChatGPT完成结账了。

Right. Jeanne, before I let you go, I want to just take you back in time a little bit to your previous role at Stripe. You were at Stripe for more than nine years. You were most recently Chief Business Officer. And the reason I want to ask you about some of your insights in Stripe land is because we saw OpenAI yesterday release this agentic checkout tool. I guess you can now check out with ChatGPT.

Speaker 0

我们对AI与商业交汇的领域非常着迷,这想必是你之前工作中密切关注的方向。请容我多问几句,你认为这一趋势将如何发展?昨晚看到这个功能时,感觉它似乎还未达到人们想象中的那种能自主采购、筛选不同选项并做出最佳决策的智能代理水平。你觉得这项技术会对亚马逊等平台构成威胁吗?你如何看待其发展前景?

And we've been really fascinated by the intersection of AI and commerce, which I imagine is something you were looking at very closely in your previous role. Indulge me here for a minute. Where do you think this is going? Because I saw the feature last night, and it felt a little bit under what... I don't know that it was quite the agent that everyone imagined doing the purchases for you, you know, sorting through different options, deciding what the best option is. I mean, do you see this technology being a threat to something like Amazon, or how do you see that playing out?

Speaker 1

这是个好问题。目前阶段可能还需要人工参与确认,比如对代理说'好的,我同意你找到的这件商品'。但对于许多基础采购需求——当人们没有特别偏好,只是需要某件商品时——这种技术就很适用。某种程度上,这正是亚马逊的众多优势之一。

That's a good question. I mean, it could, right? Like, I think right now you're probably going to want a human in the loop where you get to say, okay, I agree, agent, you did go find the thing I was looking for. But there are a lot of more basic, you know, purchasing needs where you as the human don't have a strong opinion, you just need the thing. And, you know, arguably, that's what Amazon's awesome for, among many things.

Speaker 1

我认为代理式商务会首先应用于那些人们不太可能反对代理决策的场景。

But I could see that's probably where agentic commerce goes first is things where you're highly unlikely to disagree with the agent.

Speaker 3

没错。

Right.

Speaker 1

而在其他情况下,它负责初步筛选后由人决定。比如牙膏——你完全可以让代理帮我买佳洁士。

Versus in other cases, it's teeing it up and then you... Like, say, toothpaste. Exactly. You're welcome to go get Crest for me. Right.

Speaker 0

是的。但对于更具选择性的采购,还是需要人工参与。你对昨天的产品发布有什么看法?

Right. But for the more discretionary purchases, you need a human in the loop. What did you think of the release yesterday?

Speaker 1

我还没有深入了解细节,不过Vercel实际上也在与Stripe合作开发一系列代理电商项目。Stripe Tour活动就在今天。我认为他们在那个活动中已经深入探讨过这个问题,我今晚晚些时候会观看主题演讲。

I have not gone into it in detail, although Vercel actually is also working with Stripe on a bunch of agentic commerce fronts. And Stripe Tour is today. So I think they went through it in depth at that event, and I will be watching the keynote later this evening.

Speaker 0

所以呢,我是说,你知道的,Stripe...你知道吗?关于这个我有很多问题想问你。这样吧,一周后的今天你再来节目怎么样?让我们继续这个对话,因为我非常想了解更多关于代理电商前端的内容,以及Vercel与Stripe的合作进展。

And so what, I mean, you know, is Stripe... You know what? I've got so many questions for you on this. I'll tell you what. Why don't you come back on the show a week from today? Let's keep this conversation going, because I would love to ask you more about the agentic commerce front and also what Vercel is doing with Stripe.

Speaker 0

非常感谢Jeanne今天做客节目。这位是Vercel的首席运营官Jeanne Grosser。好的,各位听众,我们即将迎来下一位嘉宾。正如刚才提到的,Anthropic发布了Claude Sonnet 4.5,开发者们已经迫不及待地试用并检验它的性能。

Thank you so much, Jeanne, for coming on the show. Thank you. That is Jeanne Grosser, the Chief Operating Officer at Vercel. Okay. Well, we are marching on ahead to our next guest, folks. As we just mentioned, Anthropic released Claude Sonnet 4.5, and developers have quickly been running to try it out and see just how good it is.

Speaker 0

我想再邀请两位已经使用过这个新模型的朋友来分享他们的第一印象。Zach Lloyd是Warp公司的CEO,Andrew Filev是Zencoder的CEO,这两家公司都是AI编程软件公司。欢迎二位,很高兴你们能来。

I want to bring on two more people who have been using the new model already to give us their early reactions. Zach Lloyd is the CEO of Warp, and Andrew Filev is the CEO at Zencoder. Both those companies are AI coding software companies. Welcome to you both. It's great to have you.

Speaker 4

谢谢邀请,能来这里真是太棒了。

Thanks for having me. This is awesome to be here.

Speaker 0

那么这样吧,我这里有个非常开放性的问题:你们的初步反应是什么?谁想先来回答?Zach?

So I'll tell you what, I've got a really open-ended question here, and it's: what are your initial reactions? So who wants to go first? Zach?

Speaker 4

当然。第一印象非常好。就编程模型的质量提升而言,我们又前进了一大步。相比上一代Sonnet模型,这是个显著的进步,同时保留了原有的优点。

Sure. Initial reaction's very good. We march on as far as, like, the quality of these coding models improving. It's a significant improvement over the last generation of Sonnet models. It keeps what's good about them.

Speaker 4

它速度很快,但也能执行一些较长视野的推理任务。在多重并行工具调用方面表现尤为出色。总的来说,我印象深刻,非常棒。

It's fast, but it can also do sort of longer horizon reasoning tasks. It does a bunch of things really well with multiple parallel tool calls. So overall, I'm impressed. Very good.

Speaker 0

安德鲁,你怎么看?

Andrew, what do you think?

Speaker 2

这是个了不起的模型,它做到了优秀模型近期都在做的事——将工业界已验证有效的方法训练模型掌握。就像之前引起轰动的推理模型,本质上是将思维链技术规模化应用到LLM层面。

It's an awesome model, and it does what great models have been doing recently: they're taking something that works in the industry, and they're training the model to do that. So if you think about the previous big splash, it was reasoning models. So they basically took the chain-of-thought technique

Speaker 3

确实。

Yeah.

Speaker 2

而这个模型的关键突破在于上下文管理能力——这点可能被低估了——这对智能体执行大型任务至关重要。专业AI工程师都擅长将大任务拆解为小任务,根据我的观察,Anthropic直接将这种范式训练进模型。这意味着普通用户无需经历漫长的技巧摸索,就能直接获得自动生成需求文档、执行计划等能力。

that was popular in the industry and kind of blasted it out at LLM scale. And then what this model is doing, and I don't know if people picked up on this, is that it's very good at context management, which is one of the most important things for getting these agents to do sizable tasks. So if you've worked with any professional AI-first engineer, they're getting really good at breaking down big tasks into smaller ones. And basically, my read from how this model behaves, how it operates, is that Anthropic took that paradigm and trained the model to do the same automatically. So this means that a lot of people who are not yet good at that skill, who did not go through this kind of personal journey of discovering how to best use the models, how to do context management, now they can get it out of the box. They can just throw an idea at the model, and the model will automatically create the requirements document, the plan for them, and whatever.

Speaker 2

这大幅降低了普通用户的使用门槛。还有个连带好处:前沿实验室训练模型解决这些问题时,由于通用智能的特性,模型会自然掌握其他技能。比如现在Sonnet 4.5的方案撰写能力已堪比上一代的Opus 4.1,而Opus的成本是Sonnet的五倍。

So it lowers the barrier for regular folks to get the most out of these models. And as a byproduct, another cool thing is when frontier labs train models to solve those things, because it's general intelligence, they automatically pick up a lot of other things. So one other big improvement is that, in my opinion, Sonnet 4.5 is becoming as good at writing those plans as the previous generation of Opus, Opus 4.1. Right? And Opus is five times more expensive than Sonnet.

Speaker 2

我绝对会选择便宜五倍的这个方案。更何况Sonnet作为更轻量模型,响应速度也更快。

And I will take that five-times-cheaper price any day. And it's also faster. Right? Because Sonnet is a smaller model. Yeah.

Speaker 2

所以它运行得更快,这真是太棒了。

So it operates faster, and that's awesome.

Speaker 0

扎克,谈谈这里的定价问题。

Zach, talk about the pricing here.

Speaker 4

是的。正如安德鲁所说,它的价格与上一代Sonnet模型相同,但质量更好。在多项基准测试中,它的表现不亚于Opus,甚至更优。而Opus的价格要贵五倍。因此,从为用户提供价值的角度来看,这是一个显著的提升。

Yeah. So it's the same price as the last Sonnet model, as Andrew was saying, but it's better quality. And on a bunch of the benchmarks, it's as good as Opus, if not better. And Opus is five times more expensive. So from the perspective of offering value to people who are using it, this is like a significant

Speaker 0

进步。而且,你知道,我们经常听到这些基准测试。信息的读者中有很多是技术专家,但也有很多人可能不像编程人员那样具备深厚的技术知识。当谈到基准测试时,用通俗的话来说,你们具体评估哪些方面?我想速度是一个指标。

step up. And, you know, we hear a lot about these benchmarks. You know, a lot of the readers of The Information are highly technical people, but a lot of them also are people who may not have as much technical acumen as people who code. When they talk about the benchmarks, what exactly, in layman's terms, are you assessing models on? I mean, one thing is speed, I guess.

Speaker 0

另一个是准确性。我们还在看什么其他方面?所以

You know, another is sort of accuracy. What else are we looking at? So

Speaker 4

抱歉,安德鲁,你先说。

so sorry. Go ahead, Andrew.

Speaker 2

是的。对于公布的基准测试,实验室评估时通常只评估其解决率。有不同的方法,我们就不深入细节了,但他们评估的是模型解决问题的频率。我们在内部运行基准测试时,还会考虑成本和速度。

Yeah. So for the benchmarks that are published, when labs assess it, they typically just assess the resolution rate. And there are different methodologies. We won't go into details, but they assess how frequently the model solves the issue. Now when we run benchmarks internally, we also look at cost, and we also look at the speed.

Speaker 2

举个例子,我们早期的发现之一是,Sonnet 4.5解决问题的速度比Sonnet 4快了约25%,这在现实生活中非常棒。但你在公开基准测试中看不到这一点,因为它们只关注最高分。但在内部评估这些代理时,无论是企业还是产品开发,正如Akash所说,你需要看整体情况,对吧?

And so, for example, one of our early insights is that Sonnet 4.5 resolved the issues about 25% faster than Sonnet 4, which is, again, awesome for real life. And you don't see it in the public benchmarks because they're all about just the high mark. Right. But internally, when you're evaluating those agents, whether it's for an enterprise or whether you're building products, you do need to look at, Akash, as you said, the whole picture. Right?

Speaker 2

不仅仅是解决率,还有速度、成本等其他因素。

It's not just the resolution rate, but it's the speed, it's the cost, everything else.
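A minimal sketch of the kind of internal scorecard described here, reporting resolution rate alongside speed and cost, might look like the following. The `Run` schema, the model names, and every number are made up for illustration; nothing here reflects Zencoder's actual evaluation tooling or real benchmark results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Run:
    """One benchmark attempt by a model on a single task (hypothetical schema)."""
    model: str
    resolved: bool    # did the model solve the issue?
    seconds: float    # wall-clock time to finish the attempt
    cost_usd: float   # API spend for the attempt


def summarize(runs: list[Run]) -> dict[str, dict[str, float]]:
    """Group runs by model and report resolution rate, mean time, and mean cost."""
    by_model: dict[str, list[Run]] = {}
    for run in runs:
        by_model.setdefault(run.model, []).append(run)
    return {
        model: {
            "resolution_rate": sum(r.resolved for r in group) / len(group),
            "mean_seconds": mean(r.seconds for r in group),
            "mean_cost_usd": mean(r.cost_usd for r in group),
        }
        for model, group in by_model.items()
    }


# Invented numbers for illustration only; not real benchmark results.
runs = [
    Run("model-a", True, 120.0, 0.40),
    Run("model-a", False, 200.0, 0.60),
    Run("model-b", True, 90.0, 0.40),
    Run("model-b", True, 150.0, 0.80),
]
report = summarize(runs)
```

Reporting the three numbers side by side, rather than folding them into one score, keeps the trade-off visible: a model can win on resolution rate while losing on time or spend.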

Speaker 0

Zach,插句话。你们在基准测试方面还关注哪些其他指标?

Zach, jump in here. Anything else you guys are looking at in terms of benchmarks?

Speaker 4

关于延迟的有趣之处在于,可以从两个角度来看。一是如果你在使用Sonnet,令牌流回你所需的时间,而Sonnet模型非常快。二是模型实际完成任务所需的时间,而Sonnet在这两方面都表现更好。但有趣的是,我们发现一些推理模型,比如GPT-5,返回第一个令牌的速度较慢,它花了很多时间思考,你可以看到它在思考,但实际上完成任务更快。

The interesting thing about latency is there are two ways of looking at it. So there's, like, if you're sitting there using Sonnet, how long does it take for the tokens to stream back to you, and the Sonnet models are very, very fast. There's a second way of looking at it, which is, like, how long does it take for the model to actually complete the task? And it's better on both of these, but one of the interesting things is that we find that some of the reasoning models, like, for instance, GPT-5, are slow to return the first token. It, like, spends a ton of time thinking, and you can see it thinking, but it actually completes the task faster.

Speaker 4

明白了。两种延迟并不完全相同。除此之外,我同意Andrew的观点。他们在报告解决率。我们在内部对Sonnet 4.5运行了所有基准测试。

Got it. The two pieces of latency are not exactly the same. Other than that, I agree with Andrew. It's like they're reporting resolution rate. We ran all the benchmarks on Sonnet 4.5 internally.

Speaker 4

它在SWE-bench上的开箱即用得分更高。嗯,这绝对是行业领先水平。

It scores higher on SWE-bench out of the box for us. Mhmm. It is definitely state of the art.
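The two notions of latency distinguished in this answer, time to first token versus time to finish the whole task, can be separated with a small sketch like this one. The `TaskTrace` schema and the two traces are invented for illustration and are not measurements of any real model.

```python
from dataclasses import dataclass


@dataclass
class TaskTrace:
    """Timestamps in seconds recorded for one agent task (hypothetical schema)."""
    started: float      # request sent
    first_token: float  # first streamed token arrived
    finished: float     # task fully completed


def time_to_first_token(trace: TaskTrace) -> float:
    """How long the user waits before any output appears."""
    return trace.first_token - trace.started


def time_to_completion(trace: TaskTrace) -> float:
    """How long the whole task takes from request to final result."""
    return trace.finished - trace.started


# Invented traces: the "thinker" starts slowly but finishes sooner overall.
streamer = TaskTrace(started=0.0, first_token=0.5, finished=90.0)
thinker = TaskTrace(started=0.0, first_token=12.0, finished=60.0)
```

With these made-up traces, the streaming model wins on time to first token while the slower-starting model wins on time to completion, which is why the two numbers are worth tracking separately.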

Speaker 0

Zach,我想再和你聊一会儿。Andrew,我稍后再找你。你知道,我们在上一节中看到了这一点,和Vercel讨论了模型效果趋于平缓的问题,以及我们之前取得了许多重大突破,现在很难每次都让一切变得更好。

Zach, I want to stick with you for a second. Andrew, I'm gonna come back to you. You know, we saw this in our last segment: we were talking to Vercel about this discussion around model effectiveness sort of plateauing, and the idea that we made so many big jumps. Right? It's now hard to make everything that much better every single time.

Speaker 0

扎克,你是否观察到你们使用的模型在效能上出现了某种停滞?

Are you seeing sort of a plateauing in the efficacy of the models that you're using, Zach?

Speaker 4

我认为如果观察从3.5到3.7再到4和4.5的跃迁,这个进步幅度似乎略小。我不确定安德鲁会怎么想,但当我们从比如3.7升级到4时,感觉是'哇,以前做不到的事现在模型能实现了'。我还没足够时间体验4.5来判断是否也有同样感受,但感觉我们正逐渐接近能力边界。现在更关键的制约因素开始变成:模型是否具备正确的上下文?

I think if you look at the jump from, like, 3.5 to 3.7 to 4 to 4.5, I think it's a slightly smaller delta. I don't know what Andrew would think here, but when we were using, say, Sonnet 3.7 and it went to 4, it was like, whoa, things that were not possible before, now the model can do. I haven't spent quite enough time with 4.5 to know if it feels that same way. But it feels like we're getting a little bit closer to the edge, and I would say the limiting thing is starting to become more: does the model have the right context? Okay.

Speaker 4

以及它能否对你的大型代码库进行推理?它是否理解你的意图?是否拥有所有合适的工具?这些都是...

And, like, can it reason over your big code base? Does it understand your intent? Does it have all the right tools? So all of that is

Speaker 0

基本上就是上下文,对可能不写代码的人来说,关键在于它能否获取到正确的数据?

Context, basically, for people, you know, who might not be coders. Basically, does it have access to the right data, essentially?

Speaker 3

正确的数据。它是否,

The right data. Does it,

Speaker 4

理解你们组织的运作方式?优秀的软件工程还涉及许多其他因素,不仅仅是纯粹推理能力的问题。所以我认为当前更主要的制约因素正在转向这些方面。

like, understand how your organization works? There are all these other things that go into, say, great software engineering that are not just a question of pure inference. And so that's a place where I think we're starting to... that's a little bit more of the limiting factor at the moment, in my mind.

Speaker 0

明白了。安德鲁,听起来扎克的意思是:模型的效能固然重要,但现在真正的瓶颈反而不是模型本身,而是如何构建模型周边的其他组件来充分发挥其潜力?你是这样理解的吗?

Right. And so Andrew, I mean, it sounds like Zach is saying, hey. I mean, yeah, the efficacy of the model is one thing, but it's not actually the model now that is sort of the rate limiting step. It's like, how else do you build the other parts around the model to sort of get the most out of it? Is that the way you understand it?

Speaker 2

对我而言,它是个乘数效应。模型本身及其优劣是一方面,另一方面是我们所谓的“驾驭系统”——即围绕模型的所有工具和应用的质量。两者的成功与失败会相互放大。所以若模型差、驾驭系统也差,结果就糟透了;反之,若两者俱佳,效果就会非常出色。

To me, it's a multiplier. There's the model itself and how good the model is, and then there's what we call the harness. Like, how good is all the tooling and the applications that use the model. And both their successes and their failures kind of multiply. So if you've got a bad model and a bad harness, it's terrible. And then vice versa, if you've got an awesome model and an awesome harness, it's great.

Speaker 2

而且它们还会相互促进发展。模型开始承担更多原本由驾驭系统完成的工作,并接受相关训练。与此同时,驾驭系统也在向更高层级进化。比如我们已着手研究多智能体协作、信息传递机制,以及如何管理整个智能体集群。

And then what happens is they both also move upstream. So the models start to do more of the job that harnesses did before, and they're getting trained on that. Right. But then harnesses are starting to get to higher levels. So for example, we started working on orchestration of multiple agents and whatever, and how do they pass over information, like, how do you manage that whole fleet.

Speaker 2

明白吗?整个体系都在升级。从这个角度看,我认为它会持续进化,就像人类大脑虽与两万年前相差无几,但我们的文明已今非昔比。模型、驾驭系统乃至整个生态体系也将经历类似的演进过程。

Right? So it's all moving up. And from that perspective, I think it continues to level up, just like, you know, the human brain is probably pretty much the same as it was twenty thousand years ago, but we as a civilization now are much more capable. And I think the same thing happens with the models and harnesses and kind of all the ecosystem.

Speaker 0

没错。这么说吧,我个人还需要多查些资料来理解这些决定因素。但首先要感谢二位拨冗参与——这些模型正以惊人速度涌现,能听到一线使用者的真实反馈总是难能可贵。感谢Warp的Zach和Andrew,真的非常感谢。

Right. Well, I'll tell you what. You know, I know for myself, I've got a little more googling to do about some of these determiners, but I want to thank you both for coming on. You know, these models are coming fast and furious, and it's always great to get people on the ground using them to tell us how it is they are liking them. That is Zach from Warp, and Andrew, thank you so much.

Speaker 0

衷心感谢你们的时间。观众朋友们,让我们进入下一环节。xAI一直是家引人注目的公司,特别是最近收购了埃隆·马斯克的社交平台X之后,这给公司组织架构带来了重大调整。关注The Information报道的读者知道,我们向来热衷剖析企业架构图。本周我们独家披露了xAI当前的公司结构。

Really appreciate your time. Thanks. Let's get on to our next segment, folks. xAI has been a fascinating company to follow, especially with the latest acquisition of X, Elon Musk's social media platform, of course, and all of that has meant big changes for the company's org chart. Now, if you follow The Information's coverage, you know that we love to geek out over companies' org charts over here. This week, we published an exclusive look at what xAI's company structure currently looks like.

Speaker 0

现在有请专注报道埃隆·马斯克动态的Theo Wayt为我们深入解读。Theo,欢迎再次做客节目,很高兴见到你。

I want to bring on Theo Wayt, who covers all things Elon Musk, to tell us more about what he's found. Theo, welcome back to the show. It's great to have you.

Speaker 5

谢谢邀请。

Thanks for having me.

Speaker 0

你今天穿得很正式啊,伙计。你要参加AI Agenda Live峰会。看看这个。

You're dressed up today, man. You got the AI Agenda Live Summit. Look at that.

Speaker 5

你在吗?

Are you there?

Speaker 0

我穿的是同一件西装外套,老兄。我就那五件外套轮着穿。周一、周二、周三、周四、周五。好了,咱们聊聊xAI吧。

I'm wearing the same blazer, man. I wear the same five blazers over and over again. Monday, Tuesday, Wednesday, Thursday, Friday. Okay. Let's talk about xAI.

Speaker 0

你以前报道过亚马逊对吧?你看,亚马逊是全球最大的公司之一,我敢说他们肯定有完善的组织架构体系。这是个企业集团。

You used to cover Amazon. Okay? And look, Amazon, it's one of the biggest companies in the world, right? I'm sure they have a lot of org chart infrastructure in place. It's a conglomerate.

Speaker 0

现在你负责报道xAI,我觉得他们的组织架构肯定截然不同。所以我想知道,你在xAI观察到什么?和我们认知中的传统科技公司有什么不同?

You're now covering xAI, where, I mean, the org chart must look so different. So what I want to know is, what were your observations from xAI, and how does it differ from a traditional tech company that we think of?

Speaker 5

不得不说,报道xAI和梳理其组织架构相比亚马逊真是耳目一新。在亚马逊,人们总纠结于总监和副总裁的区别,L7和L5的层级——那里有太多等级制度和术语,而xAI根本没有这些。很多时候人们连正式头衔都没有,除了'技术团队成员'这个称呼。这公司不像传统企业那样等级森严,部分原因是规模较小,但也因为埃隆的管理风格。有些人直接向埃隆汇报,手下可能有数百人——不是直接下属,但归他们管。

It's pretty refreshing covering xAI and working on the org chart compared to Amazon, I have to say, because at Amazon, there's a lot of, you know, agonizing over director versus VP versus L7 versus L5. Like, there's so much hierarchy and so much terminology that just doesn't exist at xAI. A lot of times, people don't even really have a formal title besides member of technical staff. And it's just not a very hierarchical company in the same way, largely by virtue of its size, but also because of Elon's management style. There are some people that report to Elon that have, you know, hundreds of direct reports, or not direct reports, but hundreds of people under them.

Speaker 5

还有些直接向埃隆汇报的人手下为零,他们只是某个他特别关注项目的工程师,所以他想直接沟通。明白吧?一切都非常即兴。

There are other people that report directly to Elon that have zero, but are just an engineer working on a project that he really cares about, and so he wants to speak with them directly. Right. It's all very impromptu.

Speaker 0

那么,“即兴”意味着有人快速晋升也有人迅速降职。我们来聊聊当前他的核心圈子里都有谁?哪些人是我们应该关注的xAI核心成员?

So impromptu means people are getting promoted and people are getting demoted, I guess, very quickly. Let's talk about who is in his inner circle right now. Who are the people that we should have on our radar for the xAI inner circle?

Speaker 5

当然。目前xAI最重要的两位研究科学家(公司称他们为工程师,但实际是研究科学家)是Tony Wu和Guodong Zhang。他们原是谷歌的研究科学家,今年被埃隆提拔,某种程度上是以公司其他联合创始人的利益为代价。不过当埃隆对某个产品特别关注时,就会有人被问责出局。这次他们俩脱颖而出。

Sure. So the two most important research scientists at xAI right now (you know, they call them engineers, but they're research scientists) are Tony Wu and Guodong Zhang, who are former research scientists from Google who were promoted by Elon this year, kind of at the expense of some other cofounders of the company. But when Elon, you know, pays close attention to one of his products, heads start to roll. And in this case, they came out on top.

Speaker 0

我查过他们的职称,这些人技术上应该被称为技术团队成员对吧?

So these are people I was looking at their terminology. These are people who are technically called members of the technical staff, right?

Speaker 5

是的,但实际上他们现在已经是管理者了。

Yeah, that's right. But in reality, they're managers now.

Speaker 0

明白了。那你之前报道过的Jimmy Ba呢?关于他我们需要了解什么?

Right. And what about this guy you wrote about, Jimmy Ba? What do we need to know about him?

Speaker 5

他原本负责xAI大部分工程师团队,至少是很大一部分。今年夏天埃隆收回他多项职责后,他似乎被降级了,不过他仍直接向埃隆汇报,现在主管企业业务。还有一位叫Igor Babuschkin的,几个月前在埃隆也收回部分职责后彻底离开了公司。

So he used to be in charge of the majority of the engineers at xAI, or at least, you know, a huge number of them. He kind of got demoted, it seems, over the summer when Elon took a lot of his responsibilities away, although he is still, you know, an Elon direct report, and he's in charge of the enterprise business. And then there's another guy, Igor Babuschkin, that left the company entirely a couple months ago after Elon also took some responsibilities away from him.

Speaker 0

在所有这些变动之下,有没有关于XAI文化的报道?我猜这些人都在拼命加班取悦埃隆·马斯克吧。

So with all of these changes happening, do we have any reporting on the culture of xAI? I imagine these are people that, you know, are really burning the midnight oil trying to make Elon Musk happy.

Speaker 5

是的。根据我交谈过的每个人反馈,情况异常紧张。一方面,我不认为这是一个追求工作生活平衡的行业,但确实,相比OpenAI或Meta,埃隆提出的一些要求尤其令人恼火或严苛。当OpenAI或Meta这类公司试图挖角这些人并开出高得多的薪酬时,我认为到了那个地步,钱对他们而言可能是更重要的考量,尤其是对方开出十倍薪酬之类的条件时。

Yeah. It's incredibly intense based on everyone I've talked to. I mean, you know, on the one hand, I don't think it's an industry you get into for work-life balance, but I do think there are also, you know, some demands from Elon that are especially galling or intense compared to OpenAI or Meta. And when a company like OpenAI or Meta tries to poach these people and offers them much larger paychecks, I think the money is probably a bigger consideration for them at that point, especially if they're, like, 10x-ing your pay or something like that.

Speaker 0

没错。X公司那边呢?琳达·亚卡里诺已经离职了。我们至今没听到任何关于谁在掌管X的消息。你有发现关于临时领导层或零散解决方案的信息吗?

Yeah. What about X? I mean, Linda Yaccarino left. We haven't heard anything about who's running X right now. Did you find anything about, you know, interim leaders or sort of piecemeal solutions to leadership over there?

Speaker 5

目前,X公司的CEO职位处于待定状态,并没有明确人选,长期计划也不明朗。不过现阶段,由Monique Pintarelli、John Nitti和Angela Zepeda这三位在亚卡里诺时期加入的传统媒体与营销领域高管组成的三人小组,基本上在负责X公司的运营。但相比xAI,它确实给人一种公司次要部分的感觉。我实在不觉得这是埃隆今年会花时间痴迷关注的事情。

Right now, we have a kind of TBD space for the CEO of X, but there is not one, and it's unclear what the long-term plan there is. But, you know, at the moment, there's this trio of executives, Monique Pintarelli, John Nitti, and Angela Zepeda, that are kind of traditional media and marketing types that joined under Yaccarino. And they're, you know, running X for the most part at this point. But it really does feel like, you know, a pretty secondary part of the company at this point compared to xAI. I just don't really get the sense that it's what Elon is spending his time obsessing over this year.

Speaker 0

明白了。太棒了。西奥,非常感谢你参加节目并讨论xAI。我知道你今天下午还要主持我们AI Agenda Live的机器人专题讨论,我非常期待那场活动。

Right. Great. Well, Theo, look, thank you so much for coming on and talking xAI. I know that you're going be moderating our robotics panel at AI Agenda Live this afternoon. I'm excited for that one.

Speaker 0

是的,我们稍后可能会再次邀请你上节目,与我们最喜爱的机器人领域记者、The Information的Rocket Drew一起讨论这个话题。他也是一部百科全书。感谢你今天的参与,西奥。我们非常感激。下午见。

And, yeah, we'll have you back on the show maybe to talk about that later on with our favorite robotics correspondent, The Information's Rocket Drew. I mean, he's also an encyclopedia. Thank you, Theo, for coming on. We appreciate it. See you this afternoon.

Speaker 0

好的。接下来是我们的合作伙伴亚马逊网络服务的环节。最近关于AI代理的讨论很多,其中最具变动性的议题莫过于如何准确定义代理、代理擅长什么,更重要的是它们不擅长什么。为此,我想请出AWS总监Shaown Nandi,他同时也是道琼斯新闻集团的前首席信息官。

Okay. Our next segment is with our presenting partner, Amazon Web Services. There has been a lot of talk about AI agents lately, and one of the biggest moving targets is really what the right definition of an agent is, what things agents are good at, and more importantly, what they're not good for. To talk about that, I wanna bring on Shaown Nandi, a director at AWS. He is also the former CIO of Dow Jones News Corp.

Speaker 0

非常高兴能邀请到他。Shaown,欢迎再次来到节目。很高兴你能来。

I'm really excited to have him on. Shaown, welcome back to the show. It's great to have you.

Speaker 3

再次见到你也很高兴,阿卡什。

It's good to see you again, Akash.

Speaker 0

上次我们聊了首席AI官的话题。今天我想谈谈代理。我需要你帮忙厘清的是:我们在节目里讨论了很多代理相关的内容,但我一直难以理解它与过去十年我们熟知的传统自动化究竟有何本质区别。如今这个代理新范式——能否谈谈它们的差异点?你如何定义代理?

So we talked last time about the Chief AI Officer. Today, I want to talk about agents. And one of the things I want to get your help on is, look, we've talked a lot about agents on the show, and I've been struggling to sort of understand really what the difference is between sort of old school automation that we've read so much about over the past decade, it seems. And now we have this new paradigm of agents. I mean, talk to me about how these are different and how you think about what an agent actually is.

Speaker 3

没错。这其实是个很有趣的话题。让我先从AWS对代理的正式定义说起:我们认为代理是能够通过推理、规划和行动来实现目标的自主或半自主系统。

Yeah. No. It's a really fun topic, actually. Let me start with sort of a formal definition for how AWS thinks about agents. We look at them as autonomous or semi-autonomous systems that can reason, plan, and act to accomplish goals.

Speaker 3

它们不只是响应指令,而是执行任务、适应情境并推动结果。理解它们与长期以来的生成式AI的区别很有帮助——生成式AI产出内容、文本、代码和图像,而代理是质的飞跃:它们不只创造,更会执行。

They're not just answering prompts, but they're executing tasks, adapting to context, and driving outcomes. And what helps is thinking about how they differ from all of the generative AI action we've been seeing for so long. Generative AI, of course, generates content: text, code, images. Agents are really a step change. They don't just create, they do things.

Speaker 3

起草商业提案、更新CRM系统、安排下次会议、启动工作流程而无需你亲自点击操作。你问到这与我们过去所见的机器人流程自动化(RPA)等技术的区别。我想说的是,如果你回想传统的RPA概念,它本质上是基于规则的引擎,必须遵循预设路径,非常脆弱。当然,过去几年我们已看到辅助型技术,比如副驾驶式的活动,这类预先设定目标的操作。

Draft a business proposal, update the CRM system, schedule the next meeting, kick off the workflow without you doing all of the clicking. And you asked about the difference from what we've seen in the past, robotic process automation, all those pieces. What I'll say is, if you think about the old "follow" concept, which was RPA, really you had a rules-based engine and you had to follow a prebuilt path. It was very brittle. Of course, we've been seeing the "assist" for the last couple of years, copilot-type activities, sort of predetermined goals.

Speaker 3

但现在我们正迈向协作阶段——这就是目标导向型智能体的意义。

But now we're getting to collaboration. That's the goal-defined agent.

Speaker 0

关于智能体的定义,你们在帮助客户运用智能体方面有哪些具体方式?或者能否分享些你们自身业务中的使用案例?

We have that sort of definition of an agent. What are some ways that you've been helping customers work with agents or some examples about how you've been using them in your own business?

Speaker 3

当然。说实话例子太多了。你提到自身业务,既有生产效率提升方面,也有借助智能体(agentic)实现的新产品创新。

Yeah. I mean, look, so many. I mean, you said own business. There's productivity. There's new product innovation with agentic.

Speaker 3

还有更快解决问题的场景——这不仅是效率问题。比如亚马逊,我们正在大规模应用智能体。最近刚宣布升级卖家智能体,这些智能体主要服务于我们的电商平台(非AWS业务)。

There's helping solve problems faster that's not just about productivity. Amazon, you know, we are doing a lot with agents. And we just announced enhanced seller agents. Now seller agents are really in our marketplace. And this is not AWS.

Speaker 3

这是亚马逊核心零售业务。想象所有在亚马逊上销售产品的第三方卖家。我们希望通过智能体帮他们自动化目录管理、优化定价策略、生成商品描述。最初我们发布了基于生成式AI的卖家支持工具,能创建/识别图片并重新排版文本。

This is big Amazon retail. And you think about all the third party sellers selling products on Amazon. Right. We wanted to help them automate catalog management, do better pricing, have product descriptions. And we initially released a Gen AI powered seller support tool that would like create images or identify images and reformat text.

Speaker 3

现在它更进一步:帮卖家执行任务、开展调研、进行逻辑推理。这对亚马逊卖家是巨大的效率提升,更重要的是还能提高商品质量,最终带来更好的客户体验。

Now it's actually helping them do tasks, do research, reason for them. And that's a huge productivity boon for all those people selling on Amazon. But it also, even more importantly, can improve quality, which turns into a better customer experience.

Speaker 2

没错。

Right.

Speaker 0

但你确实需要...我只是想确认我理解正确。你们确实需要主动提示他们,对吧?我是说,具体是怎么运作的?

But you do have to, I just wanna make sure I understand that, you do have to prompt them. Right? So, I mean, how does it work?

Speaker 3

是的,这并非魔法。系统可能会提示你:提供初步信息、给份销售单、发些图片、给个链接、提供相关资料。但它不只是生成建议文案,而是真正协助你调研并与你互动。

Yeah. It's not just magical. Right? It might prompt you to say, give us some initial information, give us a sell sheet, give us some images, give us a link, give us some information. But it's going from simply sort of generating suggested copy to actually going and helping you research and interacting with you.

Speaker 3

这种体验要好得多。我们一直在写相关博客文章,希望更多人能了解。当然,真正在亚马逊上销售的——好吧,我们大多数人并不在亚马逊卖货。

It's a really much better experience. And, you know, love to have people go out and see that. We've been writing blog posts about it. But of course, those who actually sell on Amazon okay. Most of us aren't selling on Amazon.

Speaker 3

他们实际在使用这个系统。而且必须实现规模化运作,Akash,这才是关键。我们面对的是数百万、数千万卖家。

They're actually using this. And this has to work at scale, Akash. That's the difference. We have millions, tens of millions of sellers. Yeah.

Speaker 3

听着,第三方卖家群体庞大。不能只为个别人服务,系统必须坚如磐石——这就是我们的立场。

Come on. Third party sellers. You can't just do this for one or two people. It has to be bulletproof. I mean, that's sort of our case.

Speaker 3

我可以分享个有趣的客户案例。

I could tell you about a fun customer case.

Speaker 0

当然。是的。我一直都喜欢听好故事。

Sure. Yeah. I'm I'm always down for a good story.

Speaker 3

这么说吧。我是前CIO,你刚才提到过。

So I'll tell you what. So, ex-CIO here, you mentioned that.

Speaker 0

道琼斯。没错。

Dow Jones. Yeah.

Speaker 3

对。网络中断、连接故障,这些都是痛点。你知道对谁影响最大吗?我最爱的运动之一——一级方程式赛车。关注F1的人都知道,全球高速赛事无处不在,他们需要辗转各大赛场。

Yep. Network outages, connectivity outages, are all a pain point. You know who it really matters for? One of my favorite sports, Formula One. So for those who follow Formula One, you know, high-speed racing happens everywhere worldwide, and you go from race to race.

Speaker 3

这周在墨西哥比赛,下周就到奥斯汀,几周后又转战新加坡。这些赛事都通过F1 TV等频道直播。一旦网络出问题,后果是灾难性的。

There's a race in Mexico. The next week, there's a race in Austin. A few weeks later, a race in Singapore. And they broadcast all these on channels like F1 TV. So when they have a network problem, it's disastrous.

Speaker 3

对吧?所有摄像机、所有角度的画面都必须能实时传输。

Right? Like, you gotta be able to broadcast all those cameras, all those angles.

Speaker 2

确实。

Right.

Speaker 3

没错。在搭建站点时的故障排查过程中,有时需要耗费十五个工程师日来调查为何搭建过程出现公司层面的问题。为此提前三周,F1团队开发了一款高度智能的网络中断根因分析代理系统。他们整合了多个系统,能进行自然语言故障诊断。

Right. The problem resolution during setup, when they're setting these sites up, it would sometimes take fifteen engineering days to investigate why the setup was failing. So to start way in advance, three weeks in advance, F1 built a really intelligent root cause analysis agent for networking outages at race sites. They connected multiple systems. They do natural language troubleshooting.

Speaker 3

这使得问题解决时间缩短了86%。分类处理时间从一天降至二十分钟,这简直令人难以置信。

It drove an 86% reduction in resolution time of issues. So triage time went down from one day to twenty minutes. And that's just incredible.

Speaker 0

所以这个代理系统相当于在主动发现问题。我想探讨下F1为何选择聚焦代理系统解决这个特定难题,因为这非常复杂。理论上代理可以应用于任何场景,但显然某些用例比其他更适合。他们是如何锁定这个具体问题的?

And so this is sort of the agent sort of finding the problems for you. I want to talk a little bit about how F1 decided to focus agents on that particular problem, because it's really hard. I mean, you could apply agents to anything and everything seemingly, but there appear to be better use cases, better applications than others. How did they land on that particular problem?

Speaker 3

我从两个维度说明选择标准:当需要智能决策支持、更优的自动化方案,以及面临复杂动态情境时——这正是适用场景。

So I'll give you these pressures from both sides. How do you select? How do you select what use case, what situation? It's when you need intelligent decision making. You need better automation and you have complex dynamic situations.

Speaker 3

网络中断虽然听起来不总让人兴奋,但其诱因可能来自物理光缆断裂、错误配置等无数环节,极其复杂。处理这些海量信息正是代理系统的拿手好戏。不过目前还不会将其用于零容错率的超高危场景。

And network outages, while not always the most fun-sounding thing, there are so many possible areas they could be coming from: physical fiber breaks, bad configuration. It's really complicated. It's really hard to process all that information. That's low-hanging fruit for an agent. However, you're probably not gonna use agents in the most high-stakes, zero-error-tolerance situations yet.

Speaker 3

比如你不会用它向赛车手提供座舱实时反馈,这类任务仍会采用传统传感器。但网络中断属于事后研究解决型问题。

You're not gonna use them, for example, to give real-time feedback to a driver in the cockpit. You're gonna use sort of your classic sensors for that. Right? Right. But the network outage, this was research and resolve.

Speaker 3

既然问题已经发生,你只想加速解决过程。这不仅不会引入新风险,反而降低风险系数——当然,整个流程中仍有人类参与把控。

You already have a problem. You want to make it solved quicker. There's no risk being added by this. There's only risk reduction. And of course, there's still a human in the loop.

Speaker 3

另一个要素,虽然对F1来说不那么适用,但对所有听众都重要。你必须确保组织上准备就绪,有合适的合作伙伴共同构建这些系统,拥有可扩展的基础设施。一旦这些到位,你就能进行更广泛的实验,更广泛地投入生产。但F1选择了我们。

The other element, and this was less true for F1, but this is for everyone listening. You have to sort of make sure you're organizationally ready, that you have the right partner to work with on building these, that you have a scalable infrastructure. And once you have that in place, you can experiment more broadly. You can go to production more broadly. But F1 had selected us.

Speaker 3

他们正在与我们合作。这对我们来说是好事。他们已经理解了如何应用这项技术,这是部分心智模型。我可以更深入地探讨这些情况。

They're working with us. Great for us. They already understood how to apply this. That was some of the mental model. I can go deeper on some of those situations.

Speaker 0

我的意思是,我理解你在实施风险及其合理性方面的观点,也理解关于领导力的部分。我们在节目中经常讨论的另一点是投资回报率(ROI)以及智能代理的商业案例。你如何看待这一点?是否全部以节省成本和量化时间为标准?

I mean, I think that, you know, I hear you on how much risk there is to implementing it and whether or not that makes sense. I hear you on the leadership piece. The other piece that we've been talking about a lot on the show is the ROI and what the business case is for agents. How do you sort of think about that? Is it all sort of cost savings and quantified in terms of time?

Speaker 0

你是否会说,'嘿,这相当于10名全职员工的工作量',类似这样,你是如何考虑的?

Do you sort of say that, Hey, this is doing the work of 10 full time employees and, you know, that's that like, how do you think about that?

Speaker 3

我先举一个不以节省成本为例的例子,然后再讨论成本问题。你们大多数听众真正关心的是投资回报率和成本,对吧?

I'll give you one example that's not cost savings based, and then we'll talk about cost. Most of your listeners are really worried about ROI and cost. Right?

Speaker 0

所有人。我认为所有

All of them. I think all

Speaker 2

人都是。

of them.

Speaker 3

不以节省成本为出发点的例子是Alexa Plus。本周你们会看到亚马逊发布大量设备公告,新闻里都是令人振奋的消息,而Alexa Plus是其中的核心。这对我们来说是真正的新产品。对吧?它关乎客户体验。

The non-cost one is Alexa Plus. You're gonna see a ton of device announcements this week from Amazon, really exciting things in the news, and Alexa Plus is the core of them. And that's really new product for us. Right? It's customer experience.

Speaker 3

Alexa现在能做到以前从未实现的功能。它经过了彻底重构。我们正在将大语言模型与数千项服务连接起来。这就是全新的产品创造。但对大多数人而言,正如你所提到的——

It's Alexa being able to do things that it has never been able to do before. It's been completely rebuilt. We're connecting LLMs with thousands of services. It's just new product creation. But for most of us, it is what you mentioned.

Speaker 3

它关乎效率、速度和客户体验。关于投资回报率,我给领导们的建议始终是:志存高远,从小处着手。别被可能性限制思维。但开始实施时,要选择能衡量结果的场景。

It's efficiency, it's speed, it's customer experience. And for the ROI, I'll tell you the basic advice I give leaders all the time. Think big, start small. So, like, don't limit yourself in terms of what's possible. But when you start with a scenario, pick something where you have a measurable outcome.

Speaker 3

以呼叫中心为例,我们拥有优秀的产品Amazon Connect。部署Connect或任何呼叫中心解决方案时,我们真正关注的是要实现什么目标?是要减少主管升级吗?要知道客户讨厌升级流程,这听起来是糟糕体验,实则成本高昂。

So for a call center, a contact center, we have a great product, Amazon Connect. And one of the things we look at when we're deploying Connect or any sort of call center or contact center experience is really: what are we looking to accomplish? Are we gonna reduce supervisor escalations? You know, customers hate supervisor escalations. It sounds like a bad experience, but it's actually expensive too.

Speaker 3

主管人力成本很高。所以你要设定明确指标:我想把升级率降低35%,这将节省X成本。然后对比智能代理方案的成本。提示一下:近期构建成本根本不是问题。

Supervisors are hard. They cost a lot. So you've got a defined metric: I wanna bring escalations down by 35%, and it will result in X cost savings. Then you can look at the cost of the agentic solution. And I'll give you a hint, build cost isn't a big deal lately.

Speaker 3

这些代理程序可以非常快速轻松地构建。

Like you can build these agents very quickly and easily.

Speaker 0

那么最终最昂贵的部分是什么呢?

So what ends up what ends up being the the most expensive part then?

Speaker 3

是运行它。顺便说一句,人们现在对运行成本还不感到压力,因为他们部署了一个代理,可能只为20个、10个甚至5个用户运行。对吧?这没问题。但如果他们成功了,使用量将会激增。

Running it. And by the way, people aren't stressed about running it yet because they deploy an agent, they're running it for like 20 agents, 10 agents, five people. Right? Fine. But if they're successful, consumption's gonna go through the roof.

Speaker 3

所以我们花了很多时间告诉客户,要快速构建、快速实验,并关注可衡量的指标。但之后我们要一起确保你选择了正确的模型。最近几周我们刚推出了一系列开放权重(即开源)模型。昨天发布的最新Anthropic模型也已在Bedrock上运行,也就是Anthropic的4.5模型。

So we're spending lots of time saying to customers, build quickly, experiment quickly with those measurable metrics. But then let's work together on how to really make sure you've picked the right model. We just launched a whole stack of open weight, aka open source, models over the last several weeks. We have the latest Anthropic model, announced yesterday, running in Bedrock: Anthropic's 4.5 model. Right.

Speaker 3

然后根据你的用例挑选最适合的模型。当你构建一个非常专一的代理时(这个话题我能聊一小时,Akash,但我不会的,我保证),你可能只需要一个简单的模型来快速回答问题,这成本与一个全功能推理模型相差巨大。所以不存在一个万能模型解决所有问题。

And picking across the different models for the one that makes sense for your use case. Right. When you've built a really narrow agent, and I could go on for an hour about this, Akash, I won't, I promise, you might need a really simple model to answer that question quickly, and that costs an incredibly different amount of money than a full reasoning model. That's why it's not like one model to solve them all.

Speaker 0

明白,明白。最后一个问题。关于代理,目前还存在哪些技术挑战或需要突破的地方?我指的是那些如果能在行业层面(不仅是亚马逊)解决就能极大推动代理功能发展的关键问题。请用通俗语言为我们解释一下。

Right, right. Last question before we let you go. As it relates to agents, what are some of the technical challenges or the breakthroughs that still need to happen? I'm talking about things that are high up on your list that, if we could really just figure this one thing out, not just at Amazon but as an industry, would really propel agents forward in terms of functionality. Explain what those are to us and try to keep it in layman's terms for us so we understand.

Speaker 3

当然。你看,我们现在就像在看电影预告片,看到了各种可能性并感到无比兴奋。其中很多确实是可能实现的。

Yeah, of course. Look, I'll tell you what's amazing right now. We're all seeing the movie trailer. Like, we're all seeing the art of the possible and getting super hyped up. And much of it is possible.

Speaker 3

只是需要投入关注并建立稳定的运行机制。我认为关键问题在于:当你构建多个代理时,它们之间该如何沟通?它们将如何...

It just needs attention and a way to run it resiliently. And I think one of the things that's really mattering is as you're building agents, how are they going to communicate with each other? How are they gonna

Speaker 0

确保正确性

have the right

Speaker 3

没错。比如我们以预览版发布了一系列服务,我们称之为AgentCore。市面上已有许多优秀的开源解决方案,如CrewAI和LangChain,能让你大规模运行智能体。但要让客户能轻松做到这一点(虽然谈论进化听起来不那么激动人心),你真的希望部署一个智能体后,就清楚它如何通过A2A这样的开源协议与邻近智能体对话,认证机制又是怎样的。

Right. Like, we announced in preview a series of services. We call it AgentCore. There's a bunch of great open source solutions out there, like CrewAI and LangChain, that let you run agents at scale. But customers being able to do that easily, and it sounds less hot to talk about evolution, but, like, you really wanna drop in an agent and know how it's gonna talk to your neighboring agent using open source protocols like A2A, what authentication looks like.

Speaker 3

让这一切配置变得简单易用非常关键。我们已打下基础,但要实现规模化执行。第二部分你可能会觉得好笑,但这其实与我们无关——关键在于让组织做好教育准备。

Having that all be set up and easy, that's pretty critical. We put the groundwork in place, but it's about getting it to execute at scale. And the second part, I think you're going to laugh, but it's not about us. It's getting the organizations educated and ready. And getting

Speaker 0

让他们接受。是的,他们是领导者。就像你说的,要让领导层真正认同这个理念——好吧,这就是

them on board. Yeah. They're the leaders. Just like you said, getting leaders to actually buy into the idea that, okay, this

Speaker 3

未来的业务运行方式。工具能力太强大了,Akash。即便是现在,发展速度也快得我们跟不上。所以在大多数情况下,我们甚至还没充分利用现有的功能。

is how the business is gonna run. The tooling is so capable, Akash. Right. Even now, it's moving faster than we can keep up. So we haven't taken advantage of what's available today yet in most cases.

Speaker 3

没错。更不用说未来三个月即将推出的那些令人期待的新功能,我都等不及要告诉你们了。

Right. Much less all the great things that are coming over the next three months that I can't wait to tell you about.

Speaker 0

太棒了。Shaown,很期待再次邀请你来深入探讨这些话题。关于智能体的讨论令人着迷。再次感谢你的时间。这位是亚马逊云服务的总监Shaown Nandi。

Great. Well, Shaown, I'm excited to have you on again to talk more about those things. It was a fascinating discussion about agents. Thank you again for the time. That is Shaown Nandi, a director at Amazon Web Services.

Speaker 0

各位观众,今天的节目就到这里。提醒大家我们每周一至周五太平洋时间上午10点/东部时间下午1点在此直播。感谢本节目首席赞助商亚马逊云服务,也感谢您的收看,我们非常珍视每一位观众的支持。

And with that, folks, that does it for today's show. A reminder that we are live on this stream Monday through Friday at 10 AM Pacific, 1 PM Eastern. I want to thank Amazon Web Services, who is our presenting sponsor for this production. And I want to thank you for tuning in. We really do appreciate your viewership.

Speaker 0

我已经对明天的下一场节目充满期待了。再次提醒,我们将讨论今天下午举行的AI Agenda Live峰会。请继续关注我们的网站获取更多相关报道。那么,祝大家晚安,明天见。

I am already excited for our next show tomorrow. Again, we're going to talk about our AI Agenda Live Summit happening this afternoon. Stay tuned on our website for more coverage on that. And with that, have a good night. I'll see you tomorrow.
