1250位专业人士谈与AI共事的体验

本集简介

Anthropic调研了1250名专业人士，了解AI如何实际改变他们的工作。结果显示了一种乐观、焦虑与身份转变的复杂交织：创意工作者感到挤压，科学家渴望可信赖的合作伙伴，而大多数劳动者希望将常规任务交给AI，同时保留定义其职业的核心技能。本期节目还探讨了AI主导的访谈如何打破研究中规模与情境的传统取舍，以及这对理解AI在现实世界的影响意味着什么。头条新闻包括Gemini 3 Deep Think、Replit携手谷歌进军企业市场、Opus 4.5基准测试飙升、Salesforce的Agentforce发展势头，以及Meta从元宇宙战略的转向。本期节目由以下品牌赞助： KPMG – 探索AI如何将可能变为现实。收听KPMG新推出的《You Can with AI》播客，获取助力企业智能决策的洞见。立即收听，用每期节目塑造您的未来。https://www.kpmg.us/AIpodcasts Rovo - 通过AI驱动的搜索、聊天和代理释放团队潜能 - https://rovo.com/ AssemblyAI - 构建语音AI应用的最佳方式 - https://www.assemblyai.com/brief LandfallIP - 用AI导航专利申请流程 - https://landfallip.com/ Blitzy.com - 访问https://blitzy.com/，以天而非月为单位构建企业级软件 Robots & Pencils - 提供云端原生AI解决方案，驱动实效成果 https://robotsandpencils.com/ Superintelligent的智能体准备度评估 - 访问https://besuper.ai/，获取您公司的智能体准备度评分。《AI每日简报》助您掌握AI领域最重要的新闻与讨论。在任意播客平台订阅音频版：https://pod.link/1680633614 有意赞助节目？请联系sponsors@aidailybrief.ai

Anthropic asked 1,250 professionals how AI is actually changing their work, and the results reveal a blend of optimism, anxiety, and shifting identity: creatives feeling squeezed, scientists wanting trustworthy partners, and most workers hoping to hand off routine tasks while keeping what defines their craft. The episode also looks at how AI-run interviews collapse the old scale-vs-context tradeoff in research and what that means for understanding real-world AI impact. Headlines include Gemini 3 Deep Think, Replit's enterprise push with Google, Opus 4.5's benchmark surge, Salesforce's Agentforce momentum, and Meta's pivot away from the metaverse. Brought to you by: KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts Rovo - Unleash the potential of your team with AI-powered Search, Chat and Agents - https://rovo.com/ AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/ Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months. Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/ The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Interested in sponsoring the show? sponsors@aidailybrief.ai

双语字幕


Speaker 0

今天的AI每日简报主题是：1200位专业人士分享他们与AI共事的经验；在此之前的头条新闻是：Gemini 3 Deep Think现已上线。

Today on the AI Daily Brief: what 1,200 professionals tell us about working with AI. Before that, in the headlines: Gemini 3 Deep Think is now available.

Speaker 0

AI每日简报是一档每日播客和视频节目，聚焦人工智能领域最重要的新闻和讨论。

The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.

Speaker 0

好的，朋友们。

Alright, friends.

Speaker 0

在开始之前快速通知几件事。

Quick announcements before we dive in.

Speaker 0

首先感谢今天的赞助商：Superintelligent、Rovo、Robots & Pencils以及Blitzy。

First of all, thank you to today's sponsors: Superintelligent, Rovo, Robots & Pencils, and Blitzy.

Speaker 0

要获取无广告版本节目，请访问patreon.com/aidailybrief，或通过苹果播客订阅。

To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts.

Speaker 0

当然，如果您有兴趣赞助本节目，想在2025年费率到期前锁定优惠，请发送邮件至sponsors@aidailybrief.ai联系我们。

And, of course, if you are interested in sponsoring the show and locking in those 2025 rates before they expire, send us a note at sponsors@aidailybrief.ai.

Speaker 0

最后提醒：今天我们会做些特别的节目调整。

Now last note before we dive in, we're doing a bit of a switcheroo today.

Speaker 0

今天的头条新闻部分实际上比主节目还要长一些。

The headline section is actually a little bit longer than the main episode.

Speaker 0

新闻实在太多了，我们不得不这样安排。

There was just enough news that we kind of had to do it that way.

Speaker 0

那么闲话少说，我们直接开始吧。

So without any further ado, let's dive in.

Speaker 0

欢迎回到AI每日简报头条版，五分钟带你了解所有每日AI要闻。

Welcome back to the AI Daily Brief headlines edition, all the daily AI news you need in around five minutes.

Speaker 0

不过今天这期内容非常充实，所以可能会比平时长一些。

Today is a very jam-packed episode, though, so I expect it to run a little longer than normal.

Speaker 0

今天我们首先为各位模型测试者带来一个激动人心的消息。

We kick off today with an exciting one for you model testers out there.

Speaker 0

谷歌已发布Gemini 3 Deep Think模式，这是新版Gemini 3系列中最强大的版本。

Google has released Gemini 3 Deep Think mode, the most powerful version of its new Gemini 3 suite.

Speaker 0

目前这个新模式仅限谷歌AI Ultra计划的订阅用户使用，该套餐每月费用高达数百美元。

Right now, the new mode is available exclusively to subscribers of Google's AI Ultra plan, which costs a couple hundred dollars a month.

Speaker 0

考虑到如此高昂的价格，Deep Think模式专为解决最复杂的数学、科学和逻辑问题而设计。

As you might imagine, given a price tag that high, Deep Think is designed to tackle the most complex math, science, and logic problems.

Speaker 0

该模式基于Gemini 2.5 Deep Think构建；尽管我通常不太关注基准测试，但它确实宣称取得了令人印象深刻的成绩。

The mode builds on Gemini 2.5 Deep Think and, as much as I tend not to care about benchmarks, it does claim some impressive results.

Speaker 0

谷歌宣称，在不使用工具的情况下，它在Humanity's Last Exam（人类终极考试）上取得了41%的顶尖成绩，优于GPT-5 Pro的30.7%；Deep Think在ARC-AGI-2测试中也取得了45%的成绩，是GPT-5 Pro的两倍多，成为新的最高水平。

They claim a state-of-the-art 41% on Humanity's Last Exam without tools, outperforming GPT-5 Pro at 30.7%. Deep Think also scored 45% on ARC-AGI-2, more than double GPT-5 Pro's result, setting a new state of the art.

Speaker 0

需要说明的是，每当讨论ARC-AGI时，都要看两个维度。

Whenever we talk about ARC-AGI, it's worth noting that there are two dimensions.

Speaker 0

一个是得分，另一个是单任务成本。

There is score and there is cost per task.

Speaker 0

虽然Gemini 3 Deep Think大幅刷新了此前的最高分，但每项任务77美元的成本也相当高昂。

And while Gemini 3 Deep Think absolutely shattered the previous high score, it did so at a pretty elevated cost of $77 per task.
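Because ARC-AGI results have two axes, one way to compare entries is Pareto dominance: one result only strictly beats another if it is at least as good on both score and cost per task and better on at least one. The sketch below is a hypothetical illustration; apart from the 45% and $77 per task quoted above for Gemini 3 Deep Think, the entries are made-up placeholders, not real leaderboard data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    name: str
    score: float          # fraction of tasks solved (higher is better)
    cost_per_task: float  # USD per task (lower is better)


def dominates(a: Result, b: Result) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.cost_per_task <= b.cost_per_task
    strictly_better = a.score > b.score or a.cost_per_task < b.cost_per_task
    return no_worse and strictly_better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Keep only results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results if o is not r)]


results = [
    Result("gemini-3-deep-think", 0.45, 77.0),  # figures quoted in the episode
    Result("hypothetical-cheap", 0.20, 2.0),    # placeholder, not real data
    Result("hypothetical-worse", 0.15, 10.0),   # placeholder, beaten on both axes
]

for r in pareto_frontier(results):
    print(r.name, r.score, r.cost_per_task)
```

The point of the sketch is that a higher score alone does not settle the comparison: a much cheaper entry can stay on the frontier even with a lower score.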

Speaker 0

这或许能部分解释为何他们将Deep Think模式放在最昂贵的订阅层级之后。

This might go some way toward explaining why they're paywalling Deep Think mode behind the most expensive subscription.

Speaker 0

值得注意的是，这是普通用户首次能够接触到运行成本如此高昂的模型。

It's important to note that this is the first time normal users have ever had access to a model this expensive to run.

Speaker 0

OpenAI去年底展示的o3预览版每项任务需花费167美元才能达到顶尖性能，但该版本从未公开发布。

OpenAI never released the preview version of o3, shown at the end of last year, which cost $167 per task to achieve its state-of-the-art performance.

Speaker 0

Deep Think通过在给出解决方案前同时探索多种假设，实现了顶尖性能。

Deep Think achieves its state-of-the-art performance by exploring multiple hypotheses at once before delivering a solution.

Speaker 0

这项技术已在研究中用于提升性能，但由于推理成本高昂，通常不作为标准功能向普通用户开放。

It's a technique that has been used in research to boost performance, but it generally hasn't been available to regular users as a standard feature because of the high inference costs.
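Google hasn't detailed exactly how Deep Think explores hypotheses in parallel. As a rough, hedged illustration of the general family of techniques, the sketch below samples several candidate answers independently and takes a majority vote (often called self-consistency), while tallying how inference cost grows with the number of candidates. `sample_candidate` is a hypothetical stand-in for a model call, and the per-call cost is an arbitrary placeholder.

```python
import random
from collections import Counter

COST_PER_CALL = 1.0  # hypothetical cost units per model call


def sample_candidate(problem: str, rng: random.Random) -> str:
    """Hypothetical stand-in for one independent reasoning attempt.

    Ignores `problem` and pretends the model reaches the right answer ("42")
    60% of the time, otherwise landing on one of a few wrong answers.
    """
    if rng.random() < 0.6:
        return "42"
    return rng.choice(["41", "43", "24"])


def solve_with_parallel_hypotheses(problem: str, n: int, seed: int = 0) -> tuple[str, float]:
    """Sample n candidates, return the majority answer and the total inference cost."""
    rng = random.Random(seed)
    candidates = [sample_candidate(problem, rng) for _ in range(n)]
    answer, _count = Counter(candidates).most_common(1)[0]
    total_cost = n * COST_PER_CALL  # cost grows linearly with the number of hypotheses
    return answer, total_cost


for n in (1, 8, 32):
    answer, cost = solve_with_parallel_hypotheses("toy problem", n)
    print(f"n={n:>2} answer={answer} cost={cost}")
```

The trade-off the headline describes falls out directly: more candidates make the majority answer more reliable, but every extra hypothesis is another full inference call.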

Speaker 0

目前尚不明确的一点是，这里的预期应用场景究竟是什么。

Now one thing that's not exactly clear yet is what the use case is actually expected to be here.

Speaker 0

他们通过展示模型一次性生成包含复杂物理效果的多米诺骨牌游戏来宣布这一功能。

They announced it by showing it generating a dominoes game with complex physics in one shot.

Speaker 0

另一位谷歌员工展示了它生成橡胶花瓶落在硬质表面上的复杂物理模拟。我认为这有助于澄清一点：它叫深度思考（Deep Think），而我们已经有了深度研究（Deep Research），人们可能会把两者混为一谈，但它们本质上是不同的东西。

Another Googler showed it producing a complex physics simulation of a rubber vase falling on a hard surface. I think that helps clear something up: because it's called Deep Think, and we already have Deep Research, there may be some mental overlap between the two, but they are fundamentally different things.

Speaker 0

深度思考并非只是深度研究的强化版，它具备科学推理能力。

Deep Think is not just a souped-up version of Deep Research; it is capable of scientific reasoning.

Speaker 0

关于第一反应，许多人的经历与Hyperbrowser创始人Srir Shukhani相同，他写道：'拥有这么多TPU和GPU，Gemini 3 Deep Think怎么还会过载到无法使用？'

In terms of first reactions, a lot of people had the same experience as Hyperbrowser founder Srir Shukhani, who wrote: With all their TPUs and GPUs, how TF is Gemini 3 Deep Think overloaded and unusable?

Speaker 0

这也是我在公告发布后最初几小时得到的反应，不过后来情况有所缓解。

This is also the response I got for the first couple of hours after the announcement, although then it cleared up.

Speaker 0

Victor Taelin写道：对于那些好奇的人，正如预期，Gemini 3 Deep Think解决了那个困扰我好几天的Stack Overflow漏洞。

Victor Taelin writes: For those wondering, and as expected, Gemini 3 Deep Think solves the Stack Overflow bug that cost me a few days.

Speaker 0

这个答案比Opus 4.5更明确，后者是唯一另一个公开解决该问题的模型。

The answer is more decisive than Opus 4.5, the only other public model to solve it.

Speaker 0

甚至连Gemini 3 Pro也未能解决。

Even Gemini 3 Pro fails.

Speaker 0

它甚至能自信地指出确切位置。

It even points to the exact location confidently.

Speaker 0

不过耗时极长。

Takes forever though.

Speaker 0

目前我手头没有更难的测试案例。

I don't have harder tests for now.

Speaker 0

我的大多数基准测试都已达到饱和状态。

Most of my benchmarks are saturated.

Speaker 0

而且我目前接触它还不到一天时间。

I've also only had access for less than a day so far.

Speaker 0

我知道这可能不是它的典型用途，但还是拿了一个最近真正在研究的商业策略问题来测试，这个问题我也曾用来比较GPT-5.1 Thinking、GPT-5.1 Pro和Gemini 3 Pro。

Knowing that it probably wasn't the intended use case, I gave it a recent business strategy question that I had been genuinely exploring and had also used to test GPT-5.1 Thinking versus GPT-5.1 Pro versus Gemini 3 Pro.

Speaker 0

现阶段我得说，对于这类商业策略问题，Deep Think的额外迭代似乎并不值得。

At this stage, I don't think the extra reps of Deep Think are worth it for that type of business strategy question.

Speaker 0

基本上，我认为它并没有带来什么特别额外的价值。

Basically, I don't think it added anything more.

Speaker 0

事实上，我甚至觉得它的回答还不如其他几个版本。

In fact, I didn't even prefer its response relative to the others.

Speaker 0

所以，虽然最近我愿意花时间用GPT-5.1 Pro处理商业策略问题，但Deep Think可能有些过头了，对这一特定用途来说并不合适。

So whereas I've recently found myself willing to take the time for GPT-5.1 Pro on business strategy questions, I think Deep Think might be a step too far and just not right-sized for that particular purpose.

Speaker 0

无论如何，我会继续利用Ultra账户充分进行实验。

In any case, I will continue to experiment with it, making full use of that Ultra account.

Speaker 0

接下来我们继续关注谷歌生态，他们已与Replit合作，将氛围编程（vibe coding）引入企业领域。

Next up, we stay in the Google universe, where Google has partnered with Replit to bring vibe coding to the enterprise.

Speaker 0

这项多年合作伙伴关系将使Replit扩大对谷歌云服务的使用，意味着更深度地整合谷歌AI模型，同时在后端使用谷歌云基础设施，支撑功能完整的氛围编程软件。

The multi-year partnership will see Replit expand its use of Google Cloud services, meaning a deeper integration of Google's AI models, as well as using Google Cloud infrastructure on the back end to enable fully functional vibe-coded software.

Speaker 0

在Replit上编码的应用程序还能在其市场推广策略中利用谷歌云市场。

Apps coded in Replit will also be able to leverage Google Cloud Marketplace in their go-to-market strategy.

Speaker 0

Replit首席执行官Amjad Masad表示：我们和谷歌的目标是让企业级Vibe编码成为现实。

Replit CEO Amjad Masad said: The goal for us and for Google is to make enterprise vibe coding a thing.

Speaker 0

我们想向世界展示这些工具将真正改变企业运营方式和人们的工作方式。

We want to show the world that these tools are actually going to transform businesses and how people work.

Speaker 0

现在公司里任何人都可以具备创业精神，而不再各自为政。

Instead of people working in silos now anyone in the company can be entrepreneurial.

Speaker 0

谷歌云高级总监Richard Seroter补充道：虽然看起来像是一夜成名，但Replit并非如此。

Richard Seroter, senior director at Google Cloud, added: It may feel like it, but Replit is no overnight success.

Speaker 0

Amjad和他的团队经过长期努力，为当下这个开发者时代打造了恰到好处的产品。

Amjad and team built something over time that became the exact right thing for this current moment with builders.

Speaker 0

Amjad在另一次接受CNBC关于AI泡沫现状的采访时承认，Vibe编码的蜜月期已经结束。

In separate comments to CNBC on the state of the AI bubble, Amjad acknowledged that the honeymoon phase for vibe coding is over.

Speaker 0

他表示：今年早些时候，曾出现过氛围编程的热潮市场，那时人人都听说过氛围编程。

He said: Early on in the year, there was the vibe coding hype market where everyone's heard about vibe coding.

Speaker 0

每个人都想尝试一下。

Everyone wanted to go try it.

Speaker 0

当时的工具远不如现在这么好。

The tools were not as good as they are today.

Speaker 0

所以我认为这让很多人感到失望。

So I think that burnt a lot of people.

Speaker 0

可以说现在氛围编程的热度有所减退，许多曾经盈利的公司现在赚得没那么多了。

So there's a bit of a vibe coding, I would say, hype slowdown, and a lot of companies that were making money are not making as much money.

Speaker 0

Amjad指出，今年早些时候我们每周都能收到氛围编程公司的年度经常性收入（ARR）更新，但现在收不到了。

Amjad noted that earlier in the year we were getting weekly ARR updates from the vibe coding companies, and now we're not.

Speaker 0

话虽如此，Ramp的新数据显示Replit的增速并没有放缓太多。

That said, new statistics from Ramp suggest that Replit isn't slowing down all that much.

Speaker 0

Ramp经济实验室报告称，Replit目前在所有软件供应商中新客户增长率排名第一。

The Ramp Economics Lab reported that Replit is currently number one for new customer growth across all software vendors.

Speaker 0

谷歌在发布Gemini 3和Nano Banana Pro后也名列前茅，在新客户增长方面排名第五，在新支出增长方面排名第二。

Google is also up there following the release of Gemini 3 and Nano Banana Pro, sitting at number five for new customer growth and number two for new spend growth.

Speaker 0

现在，我有一个与众不同的观点：我认为目前我们对氛围编程的看法实际上过于悲观。

Now, one of my somewhat contrarian takes is that I think we are actually way too bearish on vibe coding right now.

Speaker 0

我倾向于认为，当我们谈论氛围编程时，实际上是在用相同的词汇进行两种完全不同的对话。

I tend to think that when we say vibe coding we are having two entirely different conversations at the same time using the same words.

Speaker 0

面向非技术人员的氛围编程与面向软件工程师的氛围编程完全不同。

There is vibe coding for non-technical people, which is entirely different from vibe coding for software engineers.

Speaker 0

Amjad提到的那种'玫瑰褪色'现象，我认为是专门针对软件工程师的氛围编程而言的。

That bloom-coming-off-the-rose phenomenon Amjad was talking about is, I think, specific to vibe coding for software engineers.

Speaker 0

开发者群体目前正在进行重新调整，围绕如何在自主性光谱上最佳部署这些工具展开讨论，包括如何将代理编程整合到工作流程中而不产生新问题等议题。

There is a recalibration happening right now among developers around how best to deploy these tools along the autonomy spectrum, and around how to integrate agentic coding into their processes in a way that doesn't just create new problems.

Speaker 0

然而对于非技术人员而言，我认为我们才刚刚触及表面。

However, for non-technical people, I think we are barely scratching the surface.

Speaker 0

特别是，我认为氛围编程尚未真正大规模进入商业领域。

In particular, I do not think that vibe coding has significantly made its way into the business world yet.

Speaker 0

目前主要还是个别黑客和DIY爱好者发现他们现在可以不用Wix或Squarespace之类的平台，就能自己搭建和修改网站。

It's mostly still individual hackers and tinkerers who are discovering that they can build and modify their own websites now without having to use Wix or Squarespace or something like that.

Speaker 0

我真诚地认为这种情况将会改变，实际上我觉得2026年将成为氛围编程爆发性增长的一年，但面向的是完全不同的市场受众。

I genuinely believe that is going to change, and I actually think '26 is going to be a year of massive growth for vibe coding, but with a very different market audience.

Speaker 0

再来一则与谷歌相关的消息：谷歌的新型云（neocloud）合作伙伴FluidStack正在洽谈以70亿美元估值融资7亿美元。

One more Google-adjacent story: Google's neocloud partner FluidStack is in talks to raise $700 million at a $7 billion valuation.

Speaker 0

FluidStack年初还名不见经传，但通过签署多个数据中心开发协议迅速打开了业务局面。

FluidStack started the year as a relative unknown, but signed multiple data center development deals to jumpstart their business.

Speaker 0

谷歌为其中两笔交易提供了担保，承诺若FluidStack违约将代为偿还债务。

Google served as the backstop on a pair of deals, pledging to repay debt if FluidStack defaults.

Speaker 0

作为协议的一部分，FluidStack成为首批获得谷歌TPU芯片的第三方供应商。

As part of those deals FluidStack became one of the first third party vendors to receive Google's TPUs.

Speaker 0

去年9月达成协议时这还不算重大新闻，但随着市场开始将TPU视为英伟达主导GPU的真正竞争者，情况正在改变。

Now that wasn't massive news back in September when the deals were struck, but now that the market narrative views TPUs as a genuine contender to NVIDIA's dominant GPUs, that is changing.

Speaker 0

FluidStack还获得了在法国建设千兆瓦级数据中心的合同，这是马克龙总统推动主权AI计划的一部分。

FluidStack also secured the contract to build a gigawatt capacity data center in France as part of President Emmanuel Macron's push for sovereign AI.

Speaker 0

他们还是Anthropic上月宣布的500亿美元数据中心投资项目的基建合作伙伴。

They are also the infrastructure partner for Anthropic's $50 billion data center investment announced last month.

Speaker 0

据报道，本轮融资将由Situational Awareness领投，这是前OpenAI研究员Leopold Aschenbrenner创立的对冲基金。

The new funding round will reportedly be led by Situational Awareness, the hedge fund started by former OpenAI researcher Leopold Aschenbrenner.

Speaker 0

接下来关注Opus 4.5的相关动态——随着该模型不断突破极限，市场热度持续攀升。

Moving to our next story, hype around Opus 4.5 continues to build as the model keeps pushing the limits.

Speaker 0

Sayash Kapoor（你可能通过《AI as Normal Technology》博客认识他）宣布，其团队准备宣告Opus已攻克CORE-Bench科学智能体基准测试。

Sayash Kapoor, whom you may know from the AI as Normal Technology blog, announced that his team is ready to declare that Opus has solved the CORE-Bench scientific agent benchmark.

Speaker 0

该基准测试要求智能体在获得论文代码和数据后能复现科研成果。

The benchmark requires agents to reproduce scientific papers when given the code and data from a paper.

Speaker 0

评分标准包括智能体从论文搭建代码库、运行代码，并正确回答结果相关问题的能力。

The agent is scored on its ability to set up the repo from the paper, run the code, and then correctly answer questions about the result.

Speaker 0

从功能上说，这主要是关于智能体代码执行能力的基准测试。

Functionally, it's a benchmark primarily about agentic code execution.

Speaker 0

CORE-Bench采用名为CORE-Agent的通用智能体框架，确保不同模型能在公平环境下进行对比。

CORE-Bench uses a common agent scaffold called CORE-Agent to allow comparison between different models on a level playing field.
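To make the "level playing field" idea concrete, the harness below keeps the scaffold fixed and swaps only the model, so score differences can be attributed to the model rather than to the tooling around it. This is a hypothetical sketch, not CORE-Bench's actual code: `Task`, `run_task`, the toy agents, and the rule that a task only counts as solved when every question about it is answered correctly are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    task_id: str
    questions: dict[str, str]  # question -> expected answer


# An "agent" here is just a function mapping (scaffold, question) -> answer.
Agent = Callable[[str, str], str]


def run_task(agent: Agent, scaffold: str, task: Task) -> bool:
    """Hypothetical rule: a task is solved only if every question is answered correctly."""
    return all(agent(scaffold, q) == expected for q, expected in task.questions.items())


def score(agent: Agent, scaffold: str, tasks: list[Task]) -> float:
    """Fraction of tasks solved by `agent` when run inside `scaffold`."""
    solved = sum(run_task(agent, scaffold, t) for t in tasks)
    return solved / len(tasks)


tasks = [
    Task("t1", {"accuracy?": "0.91"}),
    Task("t2", {"n_samples?": "500", "p_value?": "0.03"}),
]


def strong_agent(scaffold: str, question: str) -> str:
    # Toy agent that ignores the scaffold and knows every answer.
    return {"accuracy?": "0.91", "n_samples?": "500", "p_value?": "0.03"}[question]


def weak_agent(scaffold: str, question: str) -> str:
    # Toy agent that ignores the scaffold and gets the p-value wrong.
    return {"accuracy?": "0.91", "n_samples?": "500", "p_value?": "0.3"}[question]


SCAFFOLD = "shared-scaffold"  # held constant across models
for name, agent in [("strong", strong_agent), ("weak", weak_agent)]:
    print(name, score(agent, SCAFFOLD, tasks))
```

As the Opus 4.5 story shows, the flip side of holding the scaffold fixed is that it can understate a model that would do better with different tooling.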

Speaker 0

Opus 4.5最初使用CORE-Agent进行测试，得分为42%，成绩不错，但与Opus 4.1领先的51%仍有差距。

Opus 4.5 was initially tested using CORE-Agent and scored 42%, a solid score but not close to Opus 4.1's leading score of 51%.

Speaker 0

随后Anthropic研究员Nicholas Carlini联系了该团队，提出了一个基于Claude Code的新框架，并指出了基准测试评分方式存在的一些问题。

Anthropic researcher Nicholas Carlini then reached out to the team with a new scaffold built on Claude Code, and pointed out some issues with the way the benchmark was being scored.

Speaker 0

CORE-Bench团队使用Claude Code框架重新运行测试，发现Opus 4.5的得分几乎翻倍，达到了78%。

The CORE-Bench team ran the benchmark again using the Claude Code harness and found that Opus 4.5's score almost doubled, to 78%.

Speaker 0

有趣的是，这种幅度的跃升是Opus 4.5独有的。

Interestingly, a jump of this size was unique to Opus 4.5.

Speaker 0

Sonnet 4和Sonnet 4.5的提升幅度要小得多，而Opus 4.1的成绩实际上出现了倒退。

Sonnet 4 and Sonnet 4.5 saw much smaller improvements, and Opus 4.1 actually went backwards.

Speaker 0

Kapoor写道：我们不确定是什么导致了这种差异。

Kapoor wrote: We're unsure what led to this difference.

Speaker 0

一种假设是Claude 4.5系列模型针对Claude Code做了更好的调校。

One hypothesis is that the Claude 4.5 series of models is much better tuned to work with Claude Code.

Speaker 0

另一种可能是，CORE-Agent中的底层指令对能力较弱的模型效果良好，但对能力更强的模型反而失效并限制其表现。

Another could be that the lower-level instructions in CORE-Agent, which worked well for less capable models, stop being effective and hinder performance for more capable models.

Speaker 0

CORE-Bench团队还手动检查了他们的基准测试，剔除了Carlini指出的评分错误：有八个任务因微小的浮点误差被错误地判为失败，还有一个任务因数据集已从互联网上移除而无法复现。

The CORE-Bench team also manually went through their benchmark, weeding out the grading errors Carlini had pointed out: eight tasks were being incorrectly marked as wrong due to small floating-point errors, and one task was impossible to reproduce because its dataset had been removed from the internet.
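The floating-point grading problem is easy to reproduce: an exact string check marks `0.30000000000000004` wrong against an expected `0.3`. A common remedy is to compare numeric answers within a tolerance. The sketch below uses Python's `math.isclose`; the tolerance values and the `grade` helper are assumptions for illustration, not the fix the CORE-Bench team actually applied.

```python
import math


def grade(submitted: str, expected: str, rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> bool:
    """Compare answers numerically when both parse as floats, otherwise as trimmed strings."""
    try:
        return math.isclose(float(submitted), float(expected), rel_tol=rel_tol, abs_tol=abs_tol)
    except ValueError:
        return submitted.strip() == expected.strip()


computed = 0.1 + 0.2                     # not exactly 0.3 in binary floating point
print(str(computed) == "0.3")            # strict string check fails
print(grade(str(computed), "0.3"))       # tolerance-based check passes
print(grade("ResNet-50", "ResNet-50"))   # non-numeric answers fall back to string equality
```

The tolerance is a design choice: too loose and genuinely wrong answers pass, too tight and harmless rounding differences fail, which is the kind of error the team had to weed out by hand.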

Speaker 0

团队手动评估Opus 4.5的得分为95%，仅有两项任务失败。

The team manually scored Opus 4.5's performance at 95%, with only two tasks failed.

Speaker 0

Kapoor写道：随着Opus 4.5获得95%的评分，我们认为CORE-Bench Hard已被攻克。

Kapoor wrote: With Opus 4.5 scoring 95%, we're treating CORE-Bench Hard as solved.

Speaker 0

团队现在计划转向一组未公开的测试问题作为下一个基准，以确保这些问题未被包含在训练数据中。

The team now plans to pivot to an undisclosed set of test questions for their next benchmark to ensure the questions aren't included in training data.

Speaker 0

现在，在基准测试之外，关于Opus 4.5的个人好评仍在持续涌现。

Now, outside the benchmarks, the personal testimonials for Opus 4.5 just continue to roll in.

Speaker 0

来自Every的Dan Shipper最初就非常看好，现在又发表了一篇更为乐观的新文章。

Dan Shipper from Every, who was very bullish to begin with, wrote a new piece going even further.

Speaker 0

他在推特上表示：Opus 4.5这周让我震惊不已。

He said on Twitter, Opus 4.5 blew me away this week.

Speaker 0

我在会议间隙不看代码就构建了一个功能齐全的阅读伴侣应用，现在每天都使用它。

I built a fully featured reading companion app that I now use every day, in between meetings without looking at the code.

Speaker 0

有两点很重要：我们刚刚达到了自主编码的新高度。

Two things that are important: we just reached a new level of autonomous coding.

Speaker 0

用任何前沿模型都能一次性完成令人印象深刻的应用程序演示，这已经有一段时间了。

You've been able to one-shot an impressive app demo with any frontier model for a while now.

Speaker 0

Opus 4.5是首个能持续编码而不会陷入无限错误循环的模型。

Opus 4.5 is the first model that just keeps coding and coding without running into endless loops of errors.

Speaker 0

其次，提示词原生（prompt-native）应用现在成为可能。

Second, prompt-native apps are now possible.

Speaker 0

Opus 4.5现在可以作为应用内部的通用智能体，为许多功能提供支持。

Opus 4.5 can now act as a general-purpose agent inside your app to power many of your features.

Speaker 0

这将功能开发转变为编写提示词而非编写代码的过程。

This turns building features into an exercise in writing prompts instead of writing code.
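To make "prompt-native" concrete, in the sketch below a feature is defined by a prompt template rather than by hand-written logic, and a general-purpose model call does the work. The features are loosely themed on the reading-companion example above. `fake_model` is a hypothetical placeholder that returns a canned string, not any vendor's SDK; in a real app, the `call_model` argument would wrap whatever model API you use.

```python
from typing import Callable

# Each feature is a prompt template instead of bespoke code (hypothetical examples).
FEATURES: dict[str, str] = {
    "summarize_chapter": "Summarize this chapter in three bullet points:\n{text}",
    "define_term": "Explain the term '{term}' as it is used in this passage:\n{text}",
}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a canned response."""
    return f"[model response to {len(prompt)} chars of prompt]"


def run_feature(name: str, call_model: Callable[[str], str], **fields: str) -> str:
    """Fill the feature's prompt template and hand it to the model."""
    prompt = FEATURES[name].format(**fields)
    return call_model(prompt)


print(run_feature("summarize_chapter", fake_model, text="Chapter text goes here."))
print(run_feature("define_term", fake_model, term="entropy", text="Passage text goes here."))
```

Under this design, shipping a new feature means adding an entry to `FEATURES` rather than writing new application logic; the quality of the feature then depends on the prompt and the model behind `call_model`.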

Speaker 0

《纽约时报》的Kevin Roose也发现Opus 4.5在非编码领域表现出色。

The NYT's Kevin Roose is also finding Opus 4.5 great for non-coding purposes.

Speaker 0

他写道：Claude Opus 4.5在写作、头脑风暴和提供书面作品反馈方面表现非凡。

He writes: Claude Opus 4.5 is a remarkable model for writing, brainstorming, and giving feedback on written work.

Speaker 0

与之交谈也很有趣，它似乎几乎达到了反沉迷的极致。

It's also fun to talk to and seems almost anti-engagement-maxed.

Speaker 0

前几天凌晨1点我还用各种愚蠢问题打扰它，结果它说：'凯文，去睡觉吧'。

The other night I was hitting it with stupid questions at 1AM and it said, Kevin, go to bed.

Speaker 0

至于我自己，目前还没怎么从GPT-5.1或Gemini 3转向Opus 4.5，但听到这么多讨论后，显然我得更认真地试试它。

As for me, I haven't found myself switching away from GPT-5.1 or Gemini 3 to Opus 4.5 all that often yet, but with all this chatter, it seems clear I'm going to have to give it an even bigger swing.

Speaker 0

再分享几个新闻——就像我说的，今天我们有很多延伸报道。

A couple more stories; like I said, we're doing extended headlines today.

Speaker 0

一些市场与采用方面的消息：Salesforce凭借Agentforce的采用发布了强劲的收入预测。

A bit of market and adoption news: Salesforce has delivered a strong revenue forecast on the back of Agentforce adoption.

Speaker 0

Salesforce表示其第四季度收入将在111亿至112亿美元之间，高于分析师预测的109亿美元。他们还表示剩余履约义务（衡量未来订单的指标）将增长约15%，而分析师预估为10%。

Salesforce said its Q4 revenue would be between $11.1 billion and $11.2 billion, outstripping analyst forecasts of $10.9 billion. It also said that remaining performance obligations, a measure of future bookings, would increase by about 15%, compared to analyst estimates of 10%.
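A quick check of the size of the beat, using only the figures quoted above and taking the midpoint of the guidance range (the midpoint convention is an assumption; Salesforce gave a range, not a point estimate):

```python
guidance_low, guidance_high = 11.1, 11.2  # Q4 revenue guidance, $ billions
consensus = 10.9                          # analyst forecast, $ billions

midpoint = (guidance_low + guidance_high) / 2
beat = midpoint - consensus
print(f"midpoint ${midpoint:.2f}B, beat ${beat:.2f}B ({beat / consensus:.1%})")

rpo_guided, rpo_expected = 0.15, 0.10     # RPO growth: guided vs analyst estimate
print(f"RPO growth gap: {(rpo_guided - rpo_expected) * 100:.0f} percentage points")
```

In other words, the revenue beat at the midpoint is roughly a quarter of a billion dollars, a bit over 2% above consensus, while the bigger surprise is the forward-looking RPO figure.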

Speaker 0

CEO马克·贝尼奥夫将功劳归于其AI产品线：'我们的Agentforce和Data360产品是增长的驱动力。'

CEO Marc Benioff credited their AI-focused products, stating: Our Agentforce and Data360 products are the momentum drivers.

Speaker 0

Agentforce的活跃客户账户季度环比增长70%，许多客户正从试点阶段转向实际部署。

Active customer accounts for Agentforce have grown 70% quarter over quarter, with many customers now transitioning from the pilot phase to active deployment.

Speaker 0

贝尼奥夫表示他们现在拥有超过9500家付费Agentforce客户。

Benioff said that they now have over 9,500 paying Agentforce customers.

Speaker 0

他说：'我们通过Agentforce取得了令人难以置信的成果。'

He said: We've delivered incredible results with Agentforce.

Speaker 0

它确实超出了我们的预期。

It's really exceeding our expectations.

Speaker 0

这是我们历史上增长最快的产品。

This is our fastest growing product ever.

Speaker 0

Salesforce这件事中有一个我正在关注的有趣细节。

Now, there's one interesting sub-wrinkle I'm watching in the Salesforce story.

Speaker 0

许多人关心的一个重大问题是：未来模型会在多大程度上商品化。

A big question many people have is to what extent models get commoditized in the future.

Speaker 0

Salesforce自2024年底推出Agentforce以来，主要基于OpenAI模型进行构建。

Salesforce, for its part, has primarily built on top of OpenAI models since launching Agentforce in late 2024.

Speaker 0

然而上周贝尼奥夫发文称：'我连续三年每天都在使用ChatGPT。'

However last week Benioff posted: I've used ChatGPT every day for three years.

Speaker 0

我刚花了两个小时体验Gemini 3。

Just spent two hours on Gemini 3.

Speaker 0

我不会再回去了。

I'm not going back.

Speaker 0

这个飞跃太惊人了。

The leap is insane.

Speaker 0

推理能力、速度、图像、视频，一切都更清晰、更快了。

Reasoning, speed, images, video: everything is sharper and faster.

Speaker 0

感觉世界又一次改变了。

It feels like the world just changed again.

Speaker 0

然后他在周四发文称：大语言模型就是新型磁盘驱动器。

Then on Thursday he posted: LLMs are the new disk drives.

Speaker 0

这种基础商品可以根据性价比随时更换供应商。

Commodity infrastructure you hot-swap for whoever's cheapest and best.

Speaker 0

认为模型能形成护城河的幻想刚刚破灭了。

The fantasy that the model is a moat just expired.

Speaker 0

因此，值得关注的是Salesforce如何看待模型切换，以及这对市场其他部分意味着什么。

So it will be interesting to watch how Salesforce thinks about model switching, and what that means for the rest of the market.
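Benioff's "disk drive" framing implies an architecture where the application codes against one interface and the model behind it can be swapped. A minimal sketch, assuming a made-up `Provider` record with placeholder prices and quality scores (not real vendor data), that routes each request to the cheapest provider clearing a quality bar:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str
    price_per_mtok: float           # placeholder price, USD per million tokens
    quality: float                  # placeholder internal eval score, 0..1
    complete: Callable[[str], str]  # the only interface the app depends on


def pick_provider(providers: list[Provider], min_quality: float) -> Provider:
    """Choose the cheapest provider that meets the quality bar."""
    eligible = [p for p in providers if p.quality >= min_quality]
    if not eligible:
        raise ValueError("no provider meets the quality bar")
    return min(eligible, key=lambda p: p.price_per_mtok)


# Hypothetical vendors; the lambdas stand in for real API clients.
providers = [
    Provider("vendor_a", 10.0, 0.92, lambda prompt: "a: " + prompt),
    Provider("vendor_b", 3.0, 0.88, lambda prompt: "b: " + prompt),
    Provider("vendor_c", 0.5, 0.70, lambda prompt: "c: " + prompt),
]

choice = pick_provider(providers, min_quality=0.85)
print(choice.name, choice.complete("hello"))
```

Under this design, swapping vendors means changing data rather than application code; whether that holds in practice depends on how much prompts and tooling end up tuned to one particular model.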

Speaker 0

昨天还有个更重大的市场新闻——虽然与AI只有些许关联——Meta可能正在放弃其同名技术，有传言称元宇宙部门将大幅裁员。

An even bigger market story yesterday, if only tangentially related to AI, is that Meta could be giving up on its namesake technology, with rumors of deep cuts to the metaverse division.

Speaker 0

彭博社报道称，元宇宙部门明年的预算可能削减高达30%。

Bloomberg reports that the metaverse group could see budget cuts as high as 30% next year.

Speaker 0

消息人士称，如此规模的削减很可能包括最早于1月实施的裁员。

Their sources said cuts of that magnitude would most likely include layoffs as soon as January.

Speaker 0

他们补充说明尚未做出最终决定，但大幅削减元宇宙部门预算已被列入年底预算规划会议的议程。

They did caveat that no final decisions have been made, but deep cuts to the metaverse group are on the agenda for end of year budget planning sessions.

Speaker 0

消息人士透露，扎克伯格要求全公司范围削减10%预算，这是过去几年的标准要求。

Sources said that Zuckerberg has asked for 10% cuts across the board, which has been the standard request for the past few years.

Speaker 0

然而，由于该技术缺乏行业广泛竞争，元宇宙部门被特别标记为需要更深度削减。

However, the metaverse group was singled out for deeper cuts due to the lack of industry-wide competition over the technology.

Speaker 0

对大多数公开市场投资者而言，元宇宙除了是个巨大的失望（尤其相对于当初的承诺）外，很难被视为其他任何东西。

Now, for most public market investors, it's hard for them to see the metaverse as anything but a massive disappointment especially relative to the pitch.

Speaker 0

2021年，扎克伯格以如此坚定的态度推介元宇宙，以至于他更改了公司名称。

In 2021, Zuckerberg presented the metaverse with such conviction that he changed the name of the company.

Speaker 0

自那时起，他们的元宇宙部门简直就是一个烧钱的无底洞。

Since then, their metaverse group has been nothing short of a cash incinerator.

Speaker 0

自元宇宙战略公布以来，该部门已亏损超过700亿美元，因此，市场对Meta将大幅削减此类支出的消息反应积极也就不足为奇了。

The group has lost more than $70 billion since the metaverse strategy was announced, so, unsurprisingly, markets responded well to the idea that Meta would be slashing that particular category of spend.

Speaker 0

股价当日飙升5.7%，创下自7月以来最大单日涨幅。

The stock jumped by 5.7% in its largest intraday move since July.

Speaker 0

虽然元宇宙部门正面临大幅削减，但这未必会波及母公司Reality Labs部门。

Now while the Metaverse group is being slashed, that doesn't necessarily carry over to the parent division Reality Labs.

Speaker 0

这个更广泛的部门专注于Meta各类AR和VR产品，近年来发展势头持续强劲。

That broader division is focused on Meta's various AR and VR products and has been going from strength to strength in recent years.

Speaker 0

Meta Ray-Ban智能眼镜意外走红，现已成为该产品类别的标杆——随着大语言模型能力逐渐兑现AI可穿戴设备的潜力，这类产品的重要性想必会与日俱增。

The Meta Ray-Bans have been a surprise hit and now define their product category, which presumably only becomes more important as LLM capabilities catch up to the promise of AI wearables.

Speaker 0

Meta发言人暗示这一战略调整正在进行中，并表示：鉴于发展势头，我们正将Reality Labs整体投资组合中的部分资源从元宇宙转向AI眼镜和可穿戴设备。

A Meta spokesperson suggested this strategy pivot is underway, commenting: Within our overall Reality Labs portfolio, we are shifting some of our investment from metaverse towards AI glasses and wearables given the momentum there.

Speaker 0

除此之外，我们暂无更广泛的调整计划。

We aren't planning any broader changes than that.

Speaker 0

资源重新分配也与Meta本周早些时候挖角苹果资深UX设计师艾伦·戴伊的行动相呼应。

The reallocation of resources also aligns to Meta's poaching of veteran Apple UX designer Alan Dye earlier this week.

Speaker 0

周三，扎克伯格宣布戴伊将领导Reality Labs内部一个新创意工作室，专注于设计、时尚与技术领域。

On Wednesday, Zuckerberg announced that Dye would lead a new creative studio within Reality Labs that would focus on design, fashion, and technology.

Speaker 0

在Threads的发文中，扎克伯格写道：‘我们正进入一个新时代，AI眼镜和其他设备将改变我们与技术及彼此互动的方式。’

In a post on Threads, Zuckerberg wrote: We're entering a new era where AI glasses and other devices will change how we connect with technology and each other.

Speaker 0

潜力是巨大的，但最重要的是让这些体验感觉自然且真正以人为本。

The potential is enormous, but what matters most is making these experiences feel natural and truly centered around people.

Speaker 0

通过这个新工作室，我们致力于让每次交互都充满巧思、直观易用，并真正服务于人。

With this new studio, we're focused on making every interaction thoughtful, intuitive, and built to serve people.

Speaker 0

朋友们，以上就是本期《扩展头条》的全部内容，现在我们将结束这部分，进入正片环节。

So friends, that is the story from this Extended Headlines edition, but for now, we'll wrap it there and move on to the main episode.

Speaker 0

本期节目由我的公司Superintelligent赞助播出。

Today's episode is brought to you by my company, Superintelligent.

Speaker 0

Superintelligent是一个AI规划平台。

Superintelligent is an AI planning platform.

Speaker 0

当前，随着我们步入2026年，我们合作的各大企业展现出的核心趋势是：决心让2026年成为规模化AI部署之年，而不仅仅是进行更多试点和实验。

And right now, as we head into 2026, the big theme that we're seeing among the enterprises that we work with is a real determination to make 2026 a year of scaled AI deployments, not just more pilots and experiments.

Speaker 0

然而，我们的许多合作伙伴都陷入了某种AI发展瓶颈。

However, many of our partners are stuck on an AI plateau.

Speaker 0

可能是治理方面的问题。

It might be issues of governance.

Speaker 0

可能是数据准备度的问题。

It might be issues of data readiness.

Speaker 0

也可能是流程映射的问题。

It might be issues of process mapping.

Speaker 0

无论何种情况，我们正在推出一种名为'瓶颈突破者'的新型评估方案——正如其名所示，旨在帮助突破AI发展瓶颈。

Whatever the case, we're launching a new type of assessment called Plateau Breaker that, as you probably guessed from that name, is about breaking through AI plateaus.

Speaker 0

我们将部署语音代理来收集信息，诊断导致您停滞在瓶颈期的真正障碍。

We'll deploy voice agents to collect information and diagnose what the real bottlenecks are that are keeping you on that plateau.

Speaker 0

基于此，我们将制定一份蓝图和行动计划，帮助您突破瓶颈，实现全面部署并获取实际投资回报。

From there, we put together a blueprint and an action plan that helps you move right through that plateau into full scale deployment and real ROI.

Speaker 0

若您想了解更多关于'瓶颈突破者'的信息，请发送邮件至contact@besuper.ai，主题注明'plateau'。

If you're interested in learning more about Plateau Breaker, shoot us a note at contact@besuper.ai with plateau in the subject line.

Speaker 0

认识一下Robo，您的人工智能队友。

Meet Robo, your AI powered teammate.

Speaker 0

Robo通过AI驱动的搜索、聊天和代理功能释放团队潜力，您还可以使用Studio构建专属代理。

Robo unleashes the potential of your team with AI powered search, chat, and agents, or build your own agent with Studio.

Speaker 0

Rovo由您组织的知识库驱动，运行在Atlassian安全可信的平台上，始终与您的工作场景保持同步。

Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work.

Speaker 0

将Rovo与您喜爱的SaaS应用连接，确保知识资产永不流失。

Connect Rovo to your favorite SaaS apps so no knowledge gets left behind.

Speaker 0

Rovo基于Teamwork Graph运行，这是Atlassian的智能层，能整合所有应用数据，从第一天起就提供个性化AI洞察。

Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one.

Speaker 0

Rovo已内置在Jira、Confluence及Jira服务管理（标准版、高级版和企业版）订阅中。

Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions.

Speaker 0

有没有体验过AI从工具变成队友的感觉？

Know the feeling when AI turns from tool to teammate?

Speaker 0

如果你使用Rovo，你就会明白。

If you Rovo, you know.

Speaker 0

探索由Atlassian驱动的新AI队友Rovo。

Discover Rovo, your new AI teammate powered by Atlassian.

Speaker 0

请访问rovo.com开始使用。

Get started at rovo.com.

Speaker 0

AI不是一次性项目。

AI isn't a one off project.

Speaker 0

这是一种必须随技术发展而进化的伙伴关系。

It's a partnership that has to evolve as the technology does.

Speaker 0

Robots and Pencils与客户并肩合作，将实用AI引入每个阶段：自动化、个性化、决策支持和优化。

Robots and Pencils work side by side with clients to bring practical AI into every phase: automation, personalization, decision support, and optimization.

Speaker 0

他们通过应用实验验证有效方案，并构建能放大人类潜能的系统。

They prove what works through applied experimentation, and build systems that amplify human potential.

Speaker 0

作为AWS认证合作伙伴，拥有全球交付中心的Robots and Pencils将广泛覆盖与高触达服务完美结合。

As an AWS-certified partner with global delivery centers, Robots and Pencils combines reach with high-touch service.

Speaker 0

当其他公司选择放手时，他们始终保持深度参与。

Where others hand off, they stay engaged.

Speaker 0

因为伙伴关系不是项目计划，而是一种承诺。

Because partnership isn't a project plan; it's a commitment.

Speaker 0

随着AI技术的进步，他们的解决方案也将同步升级。

As AI advances, so will their solutions.

Speaker 0

这才是长期价值所在。

That's long term value.

Speaker 0

进步始于选择正确的合作伙伴。

Progress starts with the right partner.

Speaker 0

从robotsandpencils.com/aidailybrief开始与Robots and Pencils合作。

Start with Robots and Pencils at robotsandpencils.com/aidailybrief.

Speaker 0

本期节目由Blitzy赞助播出。Blitzy是一款企业级自主软件开发平台，拥有无限的代码上下文理解能力。

This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context.

Speaker 0

Blitzy运用数千个专业AI代理，通过数小时的思考来理解数百万行代码规模的企业级代码库。

Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise scale code bases with millions of lines of code.

Speaker 0

企业工程领导者们使用Blitzy平台开启每个开发冲刺周期，输入他们的开发需求。

Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements.

Speaker 0

Blitzy平台会提供计划方案，然后为每项任务生成并预编译代码。

The Blitzy platform provides a plan, then generates and precompiles code for each task.

Speaker 0

Blitzy能自主完成80%以上的开发工作，同时为需要人工完成的最后20%开发工作提供指导。

Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.

Speaker 0

上市公司采用Blitzy作为集成开发环境前的开发工具后，工程效率提升了5倍，他们将其与首选编程助手配对，将AI原生SDLC引入组织。

Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.

Speaker 0

访问blitzy.com并点击'获取演示'，了解Blitzy如何将您的SDLC从AI辅助转变为AI原生。

Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI assisted to AI native.

Speaker 0

欢迎回到AI每日简报。

Welcome back to the AI Daily Brief.

Speaker 0

在本周早些时候的一期节目中，我谈到进入2026年时，我们很可能会看到更多试图研究当前人类工作中AI实际能完成多少比例的研究和实验。

In an episode earlier this week, I talked about how I thought that heading into 2026, we were likely to see a lot more studies and research and experiments that were trying to figure out just how much of the current slate of human work AI was actually able to do.

Speaker 0

几周前麦肯锡的研究显示，高达57%的工作任务可以实现自动化。

We got a McKinsey study a couple weeks ago that said that up to 57% of tasks could be automated.

Speaker 0

最近，我们获得了麻省理工学院的Iceberg报告，指出11.7%的价值创造任务可以实现自动化。

More recently, we got that MIT Iceberg report, which said that 11.7% of value generating tasks could be automated.

Speaker 0

当然，主流媒体已经将这些数据转化为标题，声称57%或12%的工作岗位将会消失。

Now, of course, those things have been translated by mainstream media into headlines that 57% of jobs or 12% of jobs are going to be lost.

Speaker 0

如果你想了解为什么12%的任务能自动化并不意味着12%的工作岗位会消失，请收听昨天的节目。

If you want an explanation of why 12% of tasks being automatable doesn't mean 12% of jobs going away, listen to yesterday's episode.

Speaker 0

除了这类研究，我希望我们还能获得更多关于这些技术在实际应用中表现的研究。

But alongside those types of studies, what I hope for is that we're also going to get more research around how these things are playing out in practice.

Speaker 0

AI在理论上能实现的功能与其实际应用效果之间存在巨大的断层和差异。

There is a seismic gap and massive difference in what AI can theoretically do and what it is actually doing in practice.

Speaker 0

目前，在提供这类真实实践经验信息方面最受关注的公司之一是Anthropic。

Now, one of the companies that is most on the spot right now when it comes to providing some amount of that real lived experience information is Anthropic.

Speaker 0

昨天我们研究了他们团队内部的一项调查，他们在八月份采访了研究人员和工程师，以了解Claude和Claude Code如何影响他们的工作。

Yesterday we looked at some research that they did around their own team where they had interviewed researchers and engineers in August to figure out how Claude and Claude Code were impacting their work.

Speaker 0

今天我们还将更全面地审视AI在实际工作中的应用情况，随着Anthropic Interviewer的推出。

And today we're taking an even broader look at how AI is working in practice, with the introduction of Anthropic Interviewer.

Speaker 0

简而言之，Anthropic推出了一款新研究工具，并通过询问专业人士使用AI的体验来测试它。

The TLDR is that Anthropic launched a new research tool and tested it by asking professionals about their experience working with AI.

Speaker 0

现在我想主要关注结果和专业人士的实际反馈，而非工具本身。但值得一提的是，抛开这个具体用例，该工具可能代表了未来研究模式的更广泛趋势。

Now, I want to focus mostly on the results and what the professionals actually said, more than on the tool itself. But the tool is worth mentioning briefly, because some observers recognize that, setting aside this specific use case, it potentially represents a broader pattern for how research will happen in the future.

Speaker 0

Anthropic在介绍中指出，虽然他们最近开发了Clio——一个保护隐私的系统，用于从Claude的实际使用中获取洞察，但这存在固有局限。

Now in their introduction, Anthropic points out that while they recently developed Clio, which is a privacy preserving system for getting insights from real world AI use of Claude, there were inherent limits there.

Speaker 0

正如他们写道：该工具仅能让我们了解与Claude对话中发生的情况。

As they write: The tool only allowed us to understand what was happening within conversations with Claude.

Speaker 0

那么对话之后呢？

What about what comes afterwards?

Speaker 0

人们实际如何使用Claude的输出？

How are people actually using Claude's outputs?

Speaker 0

他们对此感受如何？

How do they feel about it?

Speaker 0

他们如何看待AI在未来扮演的角色？

What do they imagine the role of AI to be in their future?

Speaker 0

如果我们想要全面了解AI在人们生活中不断变化的角色，并将人类置于模型开发的核心，就需要直接询问人们。

If we want a comprehensive picture of AI's changing role in people's lives, and to center humans in the development of models, we need to ask people directly.

Speaker 0

他们指出，这样的项目需要进行数百次访谈。

Such a project, they noted, would require us to run many hundreds of interviews.

Speaker 0

在这里，我们借助AI来协助完成这项工作。

Here, we enlisted AI to help us do so.

Speaker 0

谷歌的Tao Dong注意到了这种形式的有趣之处。

Now, Google's Tao Dong picked up on something interesting about the form factor here.

Speaker 0

他写道：在阅读项目博客和几份对话记录后，我的第一印象是我们正在见证一种新型用户研究，介于调查和访谈之间的混合体。

He writes: After reading the project blog post and a few transcripts, my initial impression is that we're seeing a new genre of user research, a crossover between surveys and interviews.

Speaker 0

我倾向于称之为半结构化调查。

I'm tempted to call it semi structured surveys.

Speaker 0

它像调查一样有预设的开放式问题，但能即时提出不错的后续问题。

It acts like a survey with predefined, open ended questions, but with the ability to ask decent follow-up questions on the fly.

Speaker 0

虽然这些10-15分钟的会话并不特别深入，但它们结合了调查的规模和主持人的灵活性。

While these 10-to-15-minute sessions weren't particularly deep, they combined the scale of a survey with the flexibility of a moderator.

Speaker 0

结合AI分析使团队能够识别量化模式，并真正解释这些模式存在的原因。

Pairing this with AI analysis allowed the team to identify quantitative patterns and actually explain why they exist.

Speaker 0

看起来是个非常有趣的实验。

Seems like a fascinating experiment.

Speaker 0

你怎么看？

What's your take?

Speaker 0

我认为这几乎就是我们用'超级智能'构建的东西，至少是它的一个版本。

My take is that this is pretty much exactly what we built, or at least a version of it, at Superintelligent.

Speaker 0

在Superintelligent的审计和评估中，无论是我们的智能体准备度评估还是新推出的瓶颈突破者评估，其中一个核心观点是：问卷调查擅长规模但缺乏情境理解。

In Superintelligent's audits and assessments, whether they are our agent readiness assessments or our new Plateau Breaker assessments, one of the key ideas is that surveys are great for scale but bad for context.

Speaker 0

访谈擅长情境理解但难以规模化。

Interviews are great for context but bad for scale.

Speaker 0

但有了AI，特别是语音AI，你就不必做这种取舍了。

But with AI, particularly voice AI, you don't have to make that tradeoff.

Speaker 0

而且你不需要从小样本中推断，可以直接去询问每个人。

And rather than inferring from a small sample, you can just go ask everybody.

Speaker 0

我并不认为这是什么惊人的新见解。

Now, I don't think that this is some crazy novel insight.

Speaker 0

也不认为我们的技术是什么巨大的飞跃。

Nor do I think that our technology is some stratospheric leap.

Speaker 0

但我认为这种利用AI大规模扩展信息收集并加速信息分析的模式，必将成为当前各类研究流程的标配。通过这种方式，它将开启以前因信息收集与分析规模限制而无法实现的全新研究类型。

But I think that this particular pattern of using AI to radically scale information gathering and then speed up information analysis is something that is absolutely going to become de rigueur for all sorts of current research processes, and in so doing it is going to open up totally new types of research that weren't possible before because of the scale of information that you can collect and analyze.

Speaker 0

所以在讨论具体结果前，我想说——如果你正在考虑任何领域的有趣研究项目，只要涉及与人交谈，借助现有新工具，你完全可以大幅提升研究目标。

So before we move on to the specific results, I would say that if you are thinking about interesting research projects in basically any domain that involve talking with people, I believe you can radically increase your ambition, thanks to the new tools that are available.

Speaker 0

好了，回到这次对1250位专业人士的实际调查。

Alright, so back to this actual survey of these 1,250 professionals.

Speaker 0

就我们的目的而言，我最感兴趣的是他们关于与AI合作的看法。

For our purposes, what I'm most interested in is what they said about working with AI.

Speaker 0

可以推测，这个群体可能比随机抽样的1250人更热衷于此。

Now, presumably this group is going to be more enthusiastic than a random sample of 1,250 people.

Speaker 0

因此我认为这个注意事项很重要。

And so I think that that caveat is important.

Speaker 0

但在此前提下，Anthropic的一些高层见解包括：第一，人们对AI在工作中的角色持乐观态度。

But within that, some of the high-level insights from Anthropic are as follows. First, people are optimistic about the role that AI plays in their work.

Speaker 0

积极情绪主导了大部分讨论话题，不过我们也会看到，有少数话题呈现出相对悲观的前景。

Positive sentiment characterized the majority of topics discussed, however, as we'll see, there are a small number of topics that have more relatively pessimistic outlooks.

Speaker 0

第二个我认为极具价值的见解关乎系统设计和岗位替代问题：普通劳动者希望保留定义其职业身份的核心工作，而将常规性工作委托给AI。

A second insight, which I think is really valuable for how we design systems and think about displacement, is this: People from the general workforce want to preserve tasks that define their professional identity while delegating routine work to AI.

Speaker 0

他们设想的未来场景是：常规任务自动化，而他们的角色转变为监督AI系统。

They envision futures where routine tasks are automated and their roles shift to overseeing AI systems.

Speaker 0

我不认为这个观点特别新颖，但有趣的是看到人们也开始这样思考自己的角色定位。

I don't think that that's particularly novel, but it's interesting to see that that is how people are starting to think about their role as well.

Speaker 0

我们业内人士经常讨论转向人类管理AI代理和AI系统的模式，但有趣的是看到这也开始成为普通从业者的期待或目标。

We talk a lot as insiders about this idea of shifting to a model where humans manage AI agents and AI systems, but it's interesting to see that start to come out as an expectation or a goal from individual professionals as well.

Speaker 0

第三个与我观察完全吻合的见解是：尽管创意工作者面临着来自同行的评判以及对其未来的焦虑，他们仍在转向AI以提高生产力。

A third insight, which absolutely resonates with what I'm seeing, is that despite creatives facing peer judgment and anxiety about their future, they are turning to AI to increase their productivity.

Speaker 0

正如Anthropic所言，他们正在应对创意社区中使用AI的即时污名化，以及更深层次的经济替代和人类创意身份侵蚀的担忧。

As Anthropic puts it, they are navigating both the immediate stigma of AI use in creative communities and deeper concerns about economic displacement and the erosion of human creative identity.

Speaker 0

最后，第四点：Anthropic写道，科学家们希望与AI合作，但尚不能信任它进行核心研究。

Lastly, number four: Anthropic writes that scientists want AI partnership but can't yet trust it for core research.

Speaker 0

科学家们一致表达了对能生成假设和设计实验的AI的渴望，但目前他们将其实际用途限制在撰写论文或调试分析代码等任务上。

Scientists uniformly express a desire for AI that could generate hypotheses and design experiments, but at present they confine their actual use to tasks like writing manuscripts or debugging analysis code.

Speaker 0

因此，与其他领域相比，有趣的是科学家们似乎希望AI能做得更多，或至少在其核心功能上提供更多帮助，而不仅仅是自动化那些常规任务。

So what's interesting here, as opposed to some of these other areas, is that it sounds like scientists want AI to do more or at least be more helpful with their core functions, not just those routine tasks to be automated.

Speaker 0

那么让我们来看看可视化图表。

So let's look at the visualization.

Speaker 0

如果你在收听节目，我会快速过一遍；但如果你在观看，蓝灰色代表更悲观的情绪。

If you're listening to the show, I'll go through this pretty fast, but if you're watching it, the blue-gray represents more pessimistic sentiment.

Speaker 0

柔和的黄色代表更乐观的情绪。

The muted yellow represents more optimistic.

Speaker 0

你可以看到，在几乎每个类别中，乐观情绪大多胜过悲观情绪。

And you can see across almost every category, optimism mostly beats out pessimism.

Speaker 0

在我看来，唯一一个相对更悲观的领域（至少在普通劳动者中）是职业适应，这很合理。

The one area, it appears to me, where there's relatively more pessimism, at least among the general workforce, is career adaptation, which makes sense.

Speaker 0

在创意工作者中，有几个领域悲观情绪再次略占上风。

Now among creatives, there are a few areas where, again, pessimism takes a little bit more root.

Speaker 0

特别是艺术家被替代的问题上，实际上整体上人们比乐观更悲观，作家被替代的情况也是如此。

In particular, on artist displacement, people are actually more pessimistic than optimistic overall, and the same goes for writer displacement.

Speaker 0

在科学家中，悲观情绪明显超过乐观情绪的最大领域是围绕安全问题。

Among scientists, the biggest area where pessimism actually outweighed optimism is security concerns.

Speaker 0

当你深入研究他们分享的案例时，很多都反映了社交媒体上日复一日流传的普遍情绪。

And when you dig into the examples they shared, a lot of it reflects broader sentiment that you hear day in and day out on social media.

Speaker 0

例如，很多人正在努力弄清楚他们工作中哪些部分不会被自动化，哪些技能在未来AI无处不在的情况下仍有价值。

For example, a lot of folks are trying to figure out what parts of their jobs won't be automated, which parts of their skills will be valuable in a future where they assume AI is ubiquitous.

Speaker 0

例如，一位货运调度员说：我一直在思考人类能为行业提供哪些无法被自动化取代的东西，并真正专注于这方面，比如个性化的人际互动。

For example, a trucking dispatcher said: I'm always trying to figure out things that humans offer to the industry that can't be automated, and really hone in on that aspect, like the personalized human interactions.

Speaker 0

不过，我并不认为这在长期来看会是必要的。

However, that is not something that I think will be necessary in the long run.

Speaker 0

我仍在思考应该培养哪些人工智能无法取代的技能。

I'm still trying to figure out what skills would be good to work on that AI can't take over.

Speaker 0

显然这远不止于某个特定工作岗位的问题。

This is obviously way bigger than just that particular job role.

Speaker 0

我认为这个问题特别重要，因为它不仅是个人需要思考的，也是设计技能提升和再培训系统的人必须高度关注的。

This question is particularly pertinent, I think, because it's not only something that people should be asking individually, but also something that people who are designing upskilling and retraining systems need to be hyperconscious of.

Speaker 0

如果我们设计的一堆培训项目最终被GPT-7淘汰，那就没什么意义了。

It is not going to be particularly useful if we design a bunch of training programs that just get obviated by GPT-7.

Speaker 0

悲观情绪中另一个常见问题是使用人工智能带来的污名。

Another thing that comes up on the pessimism side is the stigma of using AI.

Speaker 0

例如一位销售人员说：我听同事说他们能分辨出邮件是AI生成的，并对发件人产生轻微负面看法。

A salesperson, for example, said: I hear from colleagues that they can tell when email correspondence is AI generated, and they have a slightly negative regard for the sender.

Speaker 0

他们觉得被怠慢了，认为发件人太懒不愿亲自写个性化邮件，而是推给AI处理。

They feel slighted, as if the sender is too lazy to write them a personalized note and has pushed it onto AI instead.

Speaker 0

我认为一个非常有趣的问题是：这种反感在多大程度上是过渡期的暂时现象？未来人们是否会觉得用AI写邮件理所当然，还是这种抵触情绪会持续存在。

I think one really interesting question is to what extent that is a temporary transitional feeling, where in the future people will feel like, of course they used AI to write an email, or if that's going to be something that's more persistent.

Speaker 0

然而在乐观的一面，你会看到大量反思性的评论。

On the optimism side, however, you see tons of reflective comments.

Speaker 0

人们期待AI帮助他们管理时间、拓展创造力，通过专注于工作中最精彩的部分来减轻压力。

People looking to AI to help them manage their time, expand their creativity, reduce their stress by allowing them to focus on the best parts of their job.

Speaker 0

总体而言，86%的专业人士报告AI为他们节省了时间，65%表示对AI在其工作中扮演的角色感到满意。

Overall, 86% of professionals reported that AI saves them time, and 65% said that they were satisfied with the role AI plays in their work.

Speaker 0

在不同工作类别中，挫败感与满意度的分布相当类似，但在艺术、设计和媒体等领域担忧略有增加。

Across different categories of work, there was a pretty similar distribution of frustration and satisfaction, with slightly more worry in categories like art, design, and media.

Speaker 0

在创意工作者中，反馈呈现出更大的差异性。

Among creative professionals, there is a much bigger band of responses.

Speaker 0

以设计师为例，他们表现出的挫败感远高于电影制作人，而且在许多情况下，即使在同一类别内部也能看到复杂分歧。

Designers, for example, report much more frustration than filmmakers, and in many cases you see mixed feelings even within individual categories.

Speaker 0

担忧与满足，希望与沮丧，这些情绪往往同时并存。

Worry and satisfaction, and hope and frustration all sitting alongside one another.

Speaker 0

在我看来，这类调查不应该偶尔开展，而需要形成非常规律的机制。

In my estimation, this is the type of survey that needs to happen not once in a while, but on a very regular basis.

Speaker 0

我希望看到这些问题能随时间推移被持续追踪，并希望这些数据能提供给各领域的政策制定者。

I want to see these questions tracked over time, and I want to see that data available to policymakers of all stripes.

Speaker 0

现在有个非常棒的消息要总结：Anthropic公司正在将所有数据公开为一个可下载的公共数据集（当然已获参与者同意），你可以在Hugging Face平台获取。这意味着如果你感兴趣，也可以亲自参与并基于这项研究开展自己的分析。

Now one really cool thing about this as we wrap: Anthropic is making all of the data available in a public dataset that you can download from Hugging Face, of course with all the participants' approval, meaning that if you are so interested, you can go interact with and run your own analysis on this research as well.

Speaker 0

总体而言，我认为这1250位专业人士向我们讲述的，与多年来我们所见的情况高度一致：一个充满机遇但本质已截然不同的未来。

Overall, I think these 1,250 professionals tell us a lot of the same story that we've been seeing for years now: a future that has so much opportunity, but is fundamentally different.

Speaker 0

而这种不同，某种程度上也令人不安。

And in that difference, somewhat scary as well.

Speaker 0

感谢Anthropic挖掘出这些真实数据，以上就是今日《AI每日简报》的全部内容。

Good job to Anthropic for digging up this real information, but that's gonna do it for today's AI Daily Brief.

Speaker 0

一如既往感谢您的收听或观看，下次再见，祝平安。

Appreciate you listening or watching as always, and until next time, peace.