本集简介
双语字幕
今天的AI每日简报,我们将分享从AWS re:Invent了解到的亚马逊AI战略,以及在此之前头条新闻中关于即将发布的开源AI模型的最新动态。
Today on the AI Daily Brief, what we learned about Amazon's AI strategy from AWS re:Invent, and before that in the headlines, more updates about forthcoming open AI models.
《AI每日简报》是一档每日播客和视频节目,聚焦人工智能领域最重要的新闻与讨论。
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
好的朋友们,在开始前先快速插播几条通知。
Alright friends, quick announcements before we dive in.
首先要感谢今天的赞助商:Robots and Pencils、Blitzy、Robo以及Super Intelligent。
First of all, thank you to today's sponsors: Robots and Pencils, Blitzy, Robo, and Super Intelligent.
想获取无广告版节目,请访问patreon.com/aidailybrief,或在苹果播客订阅。
To get an ad free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts.
如有意赞助本节目,请发邮件至sponsors@aidailybrief.ai联系我们。
And to learn more about sponsoring the show, shoot us a note at sponsors@aidailybrief.ai.
再次提醒,现在是锁定2025年赞助费率的最后机会,请尽快发邮件至sponsors@aidailybrief.ai。
Once again, now is your last chance to lock in 2025 rates, so shoot us a note again at sponsors@aidailybrief.ai.
说完这些,让我们来看看昨天我刚录完节目就爆出的最新动态。
And with that, let's get into some updates that dropped basically as soon as I finished recording yesterday.
欢迎回到AI每日简报头条版,五分钟内为您呈现所有每日必备的AI新闻。
Welcome back to the AI Daily Brief headlines edition, all the daily AI news you need in around five minutes.
我们正处在一个飞速发展的时期。
We are in an extremely fast moving period right now.
昨天的节目当然全部聚焦于OpenAI的'红色警报'及其意义——他们如何调整优先级,努力推进以改进ChatGPT,同时发布一些新模型来再次扭转舆论势头。
Yesterday's episode was of course all about OpenAI's Code Red and what it meant in terms of how they were shifting priorities and trying to push forward to improve ChatGPT as well as release some new models that could help shift the narrative momentum once again.
就在我结束录制后,我们获得了关于其中一个代号为Garlic('大蒜')的模型的更多信息。
Just after I finished recording, we got more information about one of those models, which is code-named Garlic.
消息人士向The Information透露,Garlic是一次全新预训练运行的成果。
Sources told The Information that Garlic is the result of a new pre-training run.
他们表示,首席研究官Mark Chen最近告知员工,Garlic在与谷歌Gemini 3 Pro和Anthropic Opus 4.5的内部基准对比中表现优异。
They said that chief research officer Mark Chen had recently informed staff that Garlic was performing well in internal benchmarking compared against Google's Gemini 3 Pro and Anthropic's Opus 4.5.
编程和推理任务是其特别优势。
Coding and reasoning tasks were a particular strength.
Chen表示该模型超越了OpenAI此前最佳、规模也大得多的预训练模型GPT-4.5。
Chen said that the model improved upon OpenAI's previous best and much larger pre-trained model, GPT-4.5.
Chen表示该模型将尽快发布,据The Information解读,可能最早于明年以GPT 5.2或GPT 5.5的形式推出。
Chen said that the model would be released as soon as possible, which The Information interpreted as early next year, possibly as GPT 5.2 or GPT 5.5.
目前看来,Garlic与Sam Altman十月份备忘录中提到的Shallotpeat是两个不同的模型,当时他警告员工在Gemini 3发布后要做好应对艰难氛围的准备。
Now, Garlic is apparently a separate model from Shallotpeat, which had been mentioned in Sam Altman's October memo where he warned staff to expect some rough vibes after the release of Gemini 3.
Altman将Shallotpeat定位为OpenAI对谷歌新模型的回应,也是重获发展势头的关键。
Altman pitched Shallotpeat as OpenAI's response to the new Google model and a way to win back momentum.
Chen向员工解释称,Garlic整合了首次在Shallotpeat预训练运行中部署的错误修复,这表明OpenAI已解决了预训练方面的问题。
Chen explained to staff that Garlic incorporates bug fixes first deployed in the Shallotpeat pre-training run, suggesting that OpenAI has cleared their problems with pre-training.
SemiAnalysis在最近的研究报告中声称,自五月份的GPT-4o以来,OpenAI尚未成功完成过新基础模型的全量训练。
In a recent research note, SemiAnalysis claimed that OpenAI had not completed a successful full-scale training run on a new foundation model since GPT-4o in May.
这使得谷歌宣称Gemini 3是预训练领域的重大进步更具冲击力。
That made Google's proclamation that Gemini three was an advancement in pre training all the more impactful.
如果OpenAI已经解决了这个问题,那么这将为基础模型带来更重大的进步铺平道路,而不仅仅依赖于强化学习过程。
If OpenAI has fixed this problem, then it paves the way for more significant advancements in the base model, rather than just relying on the reinforcement learning process.
目前,虽然有人推测'Garlic'就是OpenAI作为'红色警报'计划一部分将于下周发布的模型,但也有其他人表示情况并非如此。
Now, while some assume that Garlic is the model set for release next week as part of OpenAI's Code Red, others are saying this isn't the case.
推特上的Chris ChatGPT21写道:下周发布的模型并非Garlic。
Chris ChatGPT21 on Twitter writes: The model releasing next week is not garlic.
Garlic将是一个表现优异的独立模型。
Garlic will be a separate model that performs well.
他们深度研发并创新的全新预训练模型,预计在所有领域表现强劲,计划于2026年初(大约1月至3月)推出。
The truly new pre-trained model that they have deeply cooked and innovated on, which is expected to be strong across all areas, is scheduled for early 2026, roughly January to March.
所以对于想理清头绪的朋友们,传闻我们下周将获得一个新的推理模型,尽管它不会是完全新训练的模型。
So for those trying to keep track at home, the rumors are that we're getting a new reasoning model next week, although it won't be the fully new pre trained model.
但另一方面,OpenAI已解决其预训练问题,现在能对基础模型进行更多改进,预计明年初展示成果。
But separately, OpenAI has fixed their issues with pre training, can now make further advancements on the base model, which they expect to show off early next year.
我相信到那时,我们将掌握更多信息。
I'm sure by the time that this comes out, we will have even more information.
但如果Gemini 3带来的压力还不够大,越来越明显的是,至少在开发者群体中,Anthropic的Opus 4.5可能是过去一个月发布的最重要模型。
But if Gemini three weren't enough pressure, it increasingly appears that at least among developers, Anthropic's Opus 4.5 was probably the most important model released in the past month.
人们使用这个模型的时间越长,就越喜欢它。
The longer people spend with this model, the more they like it.
放眼望去,到处都是这样的推文。
Everywhere you look, there are tweets like this one.
Rishikesh写道:我越尝试Opus 4.5,就越觉得Anthropic关于软件工程消亡的观点是正确的。
Rishikesh writes: The more I try Opus 4.5, the more I feel like Anthropic is right about software engineering dying.
它好得难以置信。
It's unbelievably good.
Ivan Fioravanti写道:我认为搭载Opus 4.5的Claude可以轻松暴力破解任何软件工程问题,并以某种方式解决它。
Ivan Fioravanti writes: I think Claude with Opus 4.5 can easily brute force any software engineering problem and solve it in one way or another.
这个模型确实非常强大。
This model is really strong.
YasinMTB写道:Opus 4.5与其他所有模型之间的差距简直疯狂。
YasinMTB writes: The gap between Opus 4.5 and every other model is insane.
Justin Schroeder写道:老实说,Opus 4.5感觉是我迄今为止见过的编码模型中最重大的飞跃。
Justin Schroeder writes: Honestly, Opus 4.5 feels like the biggest jump in coding models I've seen to date.
它真的非常非常出色。
It's really, really good.
Stuart Cheney写道:Opus 4.5将我们带到了新的高度。
Stuart Cheney writes: Opus 4.5 has taken us to the next level.
我现在可以一次性分派八到十个Linear工单,在GitHub上的PR被审核之前完全无需人工干预。
I can now offload eight to ten Linear tickets at a time with no humans in the loop until after the PR is reviewed in GitHub.
质量提升的程度非同寻常。
The step up in quality is exceptional.
想想十二个月后它会发展到什么程度,简直不可思议。
Pretty insane to think where this will be twelve months from now.
Pietro Schirano的评论最为直白:Opus 4.5是外星科技。
Pietro Schirano had the simplest take: Opus 4.5 is alien tech.
现在还有一条关于Anthropic的消息。
Now one other story regarding Anthropic.
该公司收购了开发工具初创公司Bun,以加速Claude Code的发展。
The company has acquired developer tool startup Bun to accelerate Claude Code.
Bun开发的JavaScript运行时,速度远超竞争对手。
Bun produces a JavaScript runtime that's dramatically faster than competitors.
该产品是一个一体化开发者工具包,集成了运行时环境、包管理器、打包工具和测试运行器。
The product is an all in one developer toolkit that combines runtime, package manager, bundler, and test runner.
Anthropic写道,Bun已经"成为AI主导软件工程的关键基础设施,帮助开发者以前所未有的速度构建和测试应用程序"。
Anthropic wrote that Bun has, quote, become essential infrastructure for AI-led software engineering, helping developers build and test applications at unprecedented velocity.
通过吸纳Bun团队,Anthropic希望以AI优先的方式重建开发者技术栈。
By bringing the Bun team on board, Anthropic hopes to work on rebuilding the developer stack with an AI-first approach.
他们写道:"Bun将助力我们构建下一代软件的基础设施。"
They wrote: Bun will be instrumental in helping us build the infrastructure for the next generation of software.
我们将共同努力,继续使Claude成为程序员及依赖AI完成重要工作的所有人的首选平台。
Together, we will continue to make Claude the platform of choice for coders and anyone who relies on AI for important work.
Bun将保持开源,Anthropic将持续投入该平台,确保其保持领先地位。
Bun will remain open source, and Anthropic will continue to invest in the platform to ensure it remains a top choice.
虽然交易条款未公开,但Anthropic借此机会宣布Claude Code已实现10亿美元年度经常性收入。
And while deal terms weren't disclosed, Anthropic did use the occasion to announce that Claude Code had reached a billion dollars in ARR.
首席产品官Mike Krieger写道:"Claude Code仅用六个月就达到10亿美元的年化收入,将Bun团队纳入Anthropic意味着我们能够构建基础设施来放大这一势头,跟上AI应用的指数级增长。"
Chief Product Officer Mike Krieger writes: Claude Code reached a billion dollars in run-rate revenue in only six months, and bringing the Bun team into Anthropic means we can build the infrastructure to compound that momentum and keep pace with the exponential growth in AI adoption.
在Anthropic方面,该公司显然正在加入明年准备进行IPO的上市竞赛。
On the Anthropic front, the company is apparently joining the race to go public as they prepare for an IPO next year.
《金融时报》报道称,Anthropic已聘请律师和主要投资银行为2026年的IPO做准备。
The Financial Times reports that Anthropic has engaged lawyers and major investment banks to prepare for a 2026 IPO.
报道还指出,他们正在进行一轮估值超过3000亿美元的私募融资谈判。
The report also stated that they're in the middle of negotiating a private funding round at a more than $300 billion valuation.
这轮融资将包括上个月微软和英伟达承诺的150亿美元投资,估值可能高达3500亿美元。对此,Anthropic发表了一份非常模棱两可的声明:"对于像我们这样规模和收入水平的公司来说,像上市公司那样有效运作是相当标准的做法。"
That round would include the $15 billion commitment from Microsoft and Nvidia last month, and could see the valuation go as high as $350 billion. Anthropic offered a very non-committal statement, telling the outlet: It's fairly standard practice for companies operating at our scale and revenue level to effectively operate as if they are publicly traded companies.
他们补充说,关于是否上市或IPO时间表尚未做出任何决定。
They added that no decisions had been made on whether or not to go public or timing for the IPO.
尽管如此,这一消息为明年设定了一个新的竞争态势,这次是OpenAI与Anthropic之间争夺公开市场上市。
Still, the news sets up a new competitive race dynamic next year, this time between OpenAI and Anthropic to list on the public markets.
《金融时报》写道:Anthropic的投资者对IPO充满热情,他们认为通过率先上市,Anthropic可以从规模更大的竞争对手OpenAI手中夺取主动权。
The Financial Times wrote: Anthropic's investors are enthusiastic about an IPO, arguing that Anthropic can seize the initiative from its larger rival OpenAI by listing first.
无论这两家公司的IPO谁先谁后,它们都很可能成为有史以来规模最大的两次公开上市,这将使2026年和/或2027年成为AI驱动市场的又一个爆发年。
Now, whichever order the IPOs happen in, they are likely to be the two biggest public listings in history, setting 2026 and/or 2027 up to be another set of huge years for AI-driven markets.
今天最后一条消息,Mistral发布了一大波新模型。
Lastly today, a big new set of model releases comes from Mistral.
Mistral周二宣布了新的Mistral 3开源模型家族。
Mistral announced the new Mistral three open source model family on Tuesday.
该系列包括对小型Mistral模型的更新,提供30亿、80亿和140亿三种参数规模。
The lineup includes updates to the small Mistral models, available in 3 billion, 8 billion, and 14 billion parameter sizes.
每个小型模型都有三个变体:基础模型以及针对推理和代理任务的微调版本。
Each small model has three different variants: a base model as well as fine tunes for reasoning and agentics.
最小型的模型可以在智能手机和普通笔记本电脑等设备上运行。
The smallest model can run on devices like smartphones and normal laptops.
虽然这些小型模型在同级别中都表现强劲,但Mistral的新大型模型同样值得关注。
And while the small models are all very strong in their class, Mistral's new large model is also notable.
这款名为Mistral Large 3的模型拥有6750亿参数,采用专家混合(MoE)架构,激活参数为410亿。
Called Mistral Large 3, the model is a 675 billion parameter model that uses a mixture of experts architecture with 41 billion active parameters.
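为了直观理解这些专家混合(MoE)参数规模,下面给出一个粗略的估算示例(用Python演示;其中每参数2字节是bf16权重的常见假设,并非Mistral公布的数据):
To put those mixture-of-experts numbers in perspective, here is a rough back-of-the-envelope sketch in Python (the 2 bytes per parameter figure is a common assumption for bf16 weights, not something Mistral has stated):

```python
# Rough sizing math for a mixture-of-experts (MoE) model like Mistral Large 3:
# only a fraction of the total parameters are active for any given token,
# so per-token compute scales with active params, while memory scales with total.

total_params = 675e9   # total parameters (from the announcement)
active_params = 41e9   # parameters active per token (from the announcement)
bytes_per_param = 2    # assumption: bf16/fp16 weights

# Memory needed just to hold the weights, in terabytes.
weight_memory_tb = total_params * bytes_per_param / 1e12

# Share of the network doing work on each token.
active_fraction = active_params / total_params

print(f"Weight memory: ~{weight_memory_tb:.2f} TB")
print(f"Active fraction per token: ~{active_fraction:.1%}")
```

这也说明了MoE的取舍:显存需求随总参数增长,而每个token的计算量只随激活参数增长。
This illustrates the MoE tradeoff: memory scales with total parameters, while per-token compute scales only with the active parameters.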
Mistral的基准测试显示,该模型与DeepSeek V3.1和Kimi K2相比具有竞争力:在推理和科学知识方面略胜一筹,在编码方面稍显不足。
Mistral's benchmarking shows that the model is competitive with DeepSeek V3.1 and Kimi K2, outperforming slightly on reasoning and scientific knowledge, and lagging a little on coding.
该大型模型在英语和中文之外的多语言提示上展现了同级最佳性能,这是Mistral的重点方向之一。
The large model delivers best-in-class performance for multilingual prompts outside of English and Chinese, which is one of Mistral's big focuses.
另一项新特性是全系列产品原生支持多模态能力。
The other new feature is native multimodal capabilities across the entire family.
多模态已被证明是谷歌模型中非常实用的功能,使其能够将推理应用于图像分析及转录等用例。
Multimodality has proven to be a very useful feature in Google's models, allowing them to apply reasoning to image analysis and use cases like transcription.
鉴于大多数中文开源模型都将图像模型作为独立系统部署,原生多模态可能成为Mistral的重要差异化优势。
And since most of the Chinese open source models have been deploying image models as a separate system, native multimodality could be a big point of differentiation for Mistral.
Mistral指出他们仅用3000块NVIDIA H200组成的集群就完成了训练,与美国领先实验室运营的超10万块GPU集群相比规模极小。
Mistral noted that they carried out the training run on a cluster of just 3,000 NVIDIA H200s, tiny compared to the clusters operated by the leading US labs, which contain over 100,000 GPUs.
该产品线最大的空白是缺乏专门的推理模型。
The big gap in the lineup is the lack of a reasoning model.
这意味着尽管Large three在同等条件下超越了中文非推理模型,但仍未达到中文推理模型的最先进水平。
That means that although Large three beats the Chinese non reasoners in an apples to apples comparison, it falls short of the state of the art performance of the Chinese reasoning models.
Mistral选择强调其小型模型作为重大进步。
Mistral chose to highlight their small models as a big step forward.
他们写道,下一波人工智能的浪潮将不再由单纯的规模定义,而是由模型的普遍性决定——这些模型小到足以在无人机、汽车、机器人、手机或笔记本电脑上运行。
They wrote: The next wave of AI won't be defined by sheer scale, but by ubiquity, by models small enough to run on a drone, in a car, in robots, on a phone, or on a laptop.
在与VentureBeat的对话中,首席科学家兼联合创始人Guillaume Lample探讨了小模型的应用场景及其与Mistral商业模式的契合。
Speaking with VentureBeat, chief scientist and co-founder Guillaume Lample discussed the use case for small models and how it fits with Mistral's business model.
Mistral目前正瞄准那些在使用大型专有模型时遭遇失败的企业客户。
Mistral is now targeting enterprises that are experiencing failure with the large proprietary models.
他表示:有时客户会问'是否存在现有最佳闭源模型无法胜任的使用场景?'若真如此,他们实际上就陷入了困境。
He said: Sometimes customers say, 'Is there a use case where the best closed source model isn't working?' If that's the case, then they're essentially stuck.
他们束手无策。
There's nothing they can do.
这已经是市面上最好的模型了,但它就是无法开箱即用。
It's the best model available, and it's not working out of the box.
在这些领先专有模型失效的场景中,Mistral正尝试部署工程团队直接与客户协作。
And in those situations where those leading proprietary models are failing, Mistral is trying to deploy engineering teams to work directly with their customers.
Lample说道:在超过90%的情况下,尤其是经过微调后,小模型就能胜任工作。
Said Lample, In more than 90% of cases, a small model can do the job, especially if it's fine tuned.
因此它不仅成本低得多,而且速度更快,还具备所有优势。
So it's not only much cheaper, but also faster, plus you have all the benefits.
你无需担心隐私、延迟、可靠性等问题。
You don't need to worry about privacy, latency, reliability, and so on.
事实上,Mistral的很多业务似乎来自那些基于大型闭源模型构建代理的公司,他们最终发现成本高得令人望而却步。
In fact, it appears that a lot of Mistral's business is coming from companies who build agents on top of large closed source models, only to find the result is cost prohibitive.
Lample表示:他们几个月后又回头找我们,因为他们意识到,我们构建了原型,但速度太慢且成本太高。
Lample said: They come back to us a couple months later because they realize, we built a prototype, but it's way too slow and way too expensive.
一些人对这次发布感到失望。
Some were disappointed with the release.
AI内容创作者Theo写道:看到Mistral的缓慢消亡有点令人难过。
AI content creator Theo writes: It's kind of sad to see the slow death of Mistral.
他们的新模型:1)比DeepSeek更笨;2)比DeepSeek贵三倍;3)比GPT-5慢。
Their new model is: 1) dumber than DeepSeek, 2) three times more expensive than DeepSeek, and 3) slower than GPT-5.
不过也有人表示:先等等。
Others, however, say, Wait just a second.
Anjney Midha写道:这些模型是在3,000块H200组成的练习集群上训练的,却已达到最先进水平。
Anjney Midha writes: These were trained on 3,000 H200s, a practice cluster, and yet state of the art.
Mistral的18,000块GB200集群即将上线。
Mistral's 18,000 GB200 cluster comes online soon.
今天的发布只是Mistral 4系列的热身。
Today's releases are a warm-up for the Mistral 4 family.
对于前沿开源模型来说,未来几个月将会很有趣。
It'll be an interesting few months for frontier open models.
这无疑是我们将持续关注的话题,不过今天的头条新闻就到这里。
Certainly something we will be watching here, however for now, that is going to do it for today's headlines.
接下来进入正片环节。
Next up, the main episode.
AI技术日新月异。
AI changes fast.
你需要一个为长期发展而打造的合作伙伴。
You need a partner built for the long game.
机器与铅笔(Robots and Pencils)与各组织并肩合作,将人工智能的雄心转化为真实的人类影响力。
Robots and Pencils work side by side with organizations to turn AI ambition into real human impact.
作为AWS认证合作伙伴,他们致力于基础设施现代化,设计云原生系统,并应用AI创造商业价值。
As an AWS certified partner, they modernize infrastructure, design cloud native systems, and apply AI to create business value.
而且他们的合作伙伴关系不会止步于产品上线。
And their partnerships don't end at launch.
随着AI技术的不断变化,机器与铅笔始终陪伴在您身边,助您与时俱进。
As AI changes, robots and pencils stays by your side, so you keep pace.
不同之处在于这种紧密的合作伙伴关系能够持续创造价值,并随时间推移不断增值。
The difference is close partnership that builds value and compounds over time.
此外,凭借遍布美国、加拿大、欧洲和拉丁美洲的交付中心,客户既能获得本地专业知识,又能享受全球化规模服务。
Plus, with delivery centers across The US, Canada, Europe, and Latin America, clients get local expertise and global scale.
若想了解AI如何实现实质进展而非空头承诺,请访问robotsandpencils.com/aidailybrief。
For AI that delivers progress, not promises, visit robotsandpencils.com/aidailybrief.
本节目由Blitzy为您呈现——这款企业级自主软件开发平台,拥有无限的代码上下文。
This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context.
Blitzy运用数千个专业AI代理,经过数小时思考来理解拥有数百万行代码的企业级代码库。
Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise scale code bases with millions of lines of code.
企业工程领导者们以Blitzy平台开启每个开发冲刺周期,输入他们的开发需求。
Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements.
Blitzy平台会提供计划方案,然后为每项任务生成并预编译代码。
The Blitzy platform provides a plan, then generates and pre compiles code for each task.
Blitzy自主完成80%以上的开发工作,同时为剩余20%需要人工完成的开发工作提供指导。
Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.
上市公司采用Blitzy作为集成开发环境前的开发工具后,工程速度提升了5倍,将其与自选的编程助手配对,实现AI原生软件开发生命周期。
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.
访问blitzy.com并点击'获取演示',了解Blitzy如何将您的SDLC从AI辅助升级为AI原生。
Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI assisted to AI native.
认识Rovo,您的AI智能队友。
Meet Rovo, your AI-powered teammate.
Rovo通过AI驱动的搜索、聊天和智能体释放团队潜力,您还可以使用Studio构建专属智能体。
Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio.
Rovo由您组织的知识驱动,并运行在Atlassian可信赖的安全平台上,因此它始终在您的工作上下文中运作。
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work.
将Rovo连接到您喜爱的SaaS应用,确保不遗漏任何知识。
Connect Rovo to your favorite SaaS apps so no knowledge gets left behind.
Rovo运行在Teamwork Graph上,这是Atlassian的智能层,能整合您所有应用的数据,并从第一天起提供个性化AI洞察。
Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one.
Rovo已内置在Jira、Confluence以及Jira Service Management的标准版、高级版和企业版订阅中。
Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions.
是否体验过AI从工具转变为队友的感觉?
Know the feeling when AI turns from tool to teammate?
如果您用过Rovo,您就会明白。
If you Rovo, you know.
探索Rovo,您由Atlassian驱动的新AI队友。
Discover Rovo, your new AI teammate powered by Atlassian.
立即访问rovo.com开始使用。
Get started at rovo.com.
本期节目由我的公司Super Intelligent赞助播出。
Today's episode is brought to you by my company, Super Intelligent.
Super Intelligent是一个AI规划平台。
Super Intelligent is an AI planning platform.
当前,随着我们步入2026年,我们在合作企业中观察到一个重要趋势:他们决心让2026年成为规模化AI部署之年,而不仅仅是进行更多试点和实验。
And right now, as we head into 2026, the big theme that we're seeing among the enterprises that we work with is a real determination to make 2026 a year of scaled AI deployments, not just more pilots and experiments.
然而,我们的许多合作伙伴都陷入了某种AI发展瓶颈。
However, many of our partners are stuck on some AI plateau.
可能是治理方面的问题。
It might be issues of governance.
可能是数据准备度的问题。
It might be issues of data readiness.
也可能是流程映射方面的问题。
It might be issues of process mapping.
无论何种情况,我们正在推出一种名为'瓶颈突破者'的新型评估方案,顾名思义,就是要突破AI发展瓶颈。
Whatever the case, we're launching a new type of assessment called Plateau Breaker that, as you probably guessed from that name, is about breaking through AI plateaus.
我们将部署语音代理来收集信息并诊断出阻碍您突破瓶颈的真正问题所在。
We'll deploy voice agents to collect information and diagnose what the real bottlenecks are that are keeping you on that plateau.
在此基础上,我们将制定一个蓝图和行动计划,帮助您突破瓶颈,实现全面部署和真正的投资回报。
From there, we put together a blueprint and an action plan that helps you move right through that plateau into full scale deployment and real ROI.
如果您想了解更多关于"瓶颈突破者"(Plateau Breaker)的信息,请发送邮件至contact@bsuper.ai,并在主题栏注明"plateau"。
If you're interested in learning more about Plateau Breaker, shoot us a note at contact@bsuper.ai with plateau in the subject line.
欢迎回到AI每日简报。
Welcome back to the AI Daily Brief.
今天,我们将讨论亚马逊在其AWS Reinvent活动上发布的所有内容。
Today, we are talking about everything that Amazon has unveiled so far at their AWS Reinvent event.
这当然是AWS一年一度的重要活动,由于AWS是亚马逊与更广泛AI世界联系最紧密的部分,我们通常能在这个活动上获得亚马逊关于其AI战略的最新动态。
This is of course AWS's big annual event, and since AWS is the part of Amazon that is most connected to the broader AI world, it is the event where we most often get updates from Amazon around their AI strategy.
您可能还记得,早在2022年,AWS实际上计划发布一款类似ChatGPT的产品,当时他们称之为Bedrock。
You may or may not remember that back in 2022, AWS was actually planning on releasing something akin to ChatGPT that they were then calling Bedrock.
但在2022年11月30日ChatGPT发布后,看到其领先程度后,他们放弃了原计划,彻底重新定义了Bedrock的含义。
But after ChatGPT launched on 11/30/2022, and they saw how far ahead it was, they scrapped those plans and actually reconstituted entirely what Bedrock meant.
自那时起,亚马逊虽凭借其全球领先的云业务在企业架构层面始终是AI叙事的关键部分,但尚未真正找到自身在AI叙事中的定位。
Since then, Amazon hasn't exactly found its place in the AI narrative, even if with their cloud business, which remains the number one in the world, they have been a key part of the story structurally for many enterprises.
在去年的AWS re:Invent大会上,亚马逊推出了名为Nova的新模型家族,似乎押注于企业工作负载多样化扩展的理念——未来竞争维度不仅包括尖端性能,还将涵盖成本效益与性价比。
At last year's AWS re:Invent, we got a new family of Amazon models called Nova, which seemed to be making a bet on the idea of an expanding diversity of enterprise workloads where the vectors of competition would not only be state of the art performance, but also efficiency and performance for the cost.
而在今年的活动上,我们可谓收获颇丰,应有尽有。
At this year's event, we have gotten, well, a little bit of everything.
因此问题在于——这也是我们在十二月预览节目中提出的疑问——本次大会及其公告将如何影响亚马逊相对于其他云服务和模型供应商的定位?坦白说,企业究竟需要多大程度关注这些动态?
So the question was, and this was the question that we were asking in our December preview show, what this event and its announcements might do for Amazon's positioning relative to its cloud and model provider peers, and frankly, just how much enterprises have to care about all of this that's going on.
让我们先从亚马逊对其Nova家族的更新开始,恰如其分地命名为Nova二代。
Let's start with Amazon's update to their Nova family with the appropriately named Nova two.
如我之前所述,Nova模型系列去年首次发布,包含四种不同规模的文本模型和一个图像模型。
As I mentioned, the Nova models were first released last year and consisted of four text models of various sizes as well as an image model.
Nova二代通过转向原生多模态架构,取消了独立的图像模型。
Nova two has done away with the dedicated image model by switching to a native multimodal architecture.
该系列包含名为Nova二代Lite的小型推理模型,以及名为Nova二代Pro的大型推理模型。
The family includes a small reasoning model called Nova two Lite and a large reasoning model called Nova two Pro.
此外还有一款专用于语音转语音的模型Nova two Sonic,以及被亚马逊称为统一多模态推理与生成模型的Nova two Omni。
There's also a dedicated speech-to-speech model called Nova 2 Sonic, and a model called Nova 2 Omni, which Amazon refers to as a unified multimodal reasoning and generation model.
换言之,Nova two Omni能处理文本、图像、视频和语音输入,同时生成文本和图像。
In other words, Nova two Omni can process text, image, video and speech inputs while generating both text and images.
亚马逊宣称这是业界首创,能够原生处理视频和语音输入确实可能开辟一系列新应用场景。
Amazon is touting this as an industry first, and certainly being able to handle native video and speech inputs could open up a number of new use cases.
目前仅公布了Lite和Pro模型的基准测试结果,整体表现虽不惊艳但还算不错。
Benchmarks were only shared for the Lite and Pro models, and seemed decent, if unsplashy across the board.
在某些特定类别中,Nova系列模型超越了Anthropic、OpenAI和谷歌的同级别模型,但这些优势主要集中在多模态感知等专业功能上。
There are a handful of categories where the Nova models outrank models of the same class from Anthropic, OpenAI, and Google, but they tend to be clustered around specialized features like multimodal perception.
工具调用能力也非常突出,意味着这些模型有望成为智能体的基础架构。
Tool calling was also very strong, meaning these models could be useful as the foundation for agents.
值得注意的是,这些模型在SWE-bench Verified上远未达到最先进水平,说明它们不会成为新的首选编程模型。
Notably, the models fell far short of state of the art on SWE-bench Verified, meaning that these are not going to become the new coding models of choice.
虽然单项基准测试成绩并不突出,但综合来看仍构成了不错的前沿模型,相比Nova一代确实有了重大改进。
If the benchmarks are nothing to write home about, in aggregate they seem to combine into a decent frontier model and certainly a big improvement over Nova one.
独立基准测试机构Artificial Analysis显示,Nova 2 Pro整体性能与Claude 4.5 Sonnet相当,而Nova 2 Lite略优于Claude 4.5 Haiku。
Independent benchmarking firm Artificial Analysis showed that Nova two Pro is in the same ballpark as Claude 4.5 Sonnet overall, and Nova two Lite is slightly ahead of Claude 4.5 Haiku.
这些模型尚无法与Gemini 3 Pro、GPT 5.1或Claude 4.5 Opus竞争,但问题在于它们是否有必要达到这个水平。
The models are not competitive with Gemini three Pro, GPT 5.1, or Claude 4.5 Opus, but the question is whether they need to be.
Artificial Analysis指出,Nova 2 Pro完成其基准测试的成本约为Claude 4.5 Sonnet的80%,仅为Gemini 3 Pro的一半左右。
Artificial Analysis noted that Nova 2 Pro completed their benchmark run at around 80% of the cost of Claude 4.5 Sonnet and about half the cost of Gemini 3 Pro.
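为了说明这些相对成本在规模化使用时意味着什么,下面是一个简单的Python示意(其中100美元的Sonnet基准开销纯属假设,只有比例关系来自上面引用的Artificial Analysis数据):
To illustrate what those relative costs mean at scale, here is a simple Python sketch (the $100 Sonnet baseline is a purely hypothetical figure; only the ratios come from the Artificial Analysis numbers cited above):

```python
# Illustrative cost comparison using the relative figures cited in the episode:
# Nova 2 Pro ran the benchmark at ~80% of Claude 4.5 Sonnet's cost,
# and at about half the cost of Gemini 3 Pro.

baseline_sonnet_cost = 100.0  # hypothetical spend in dollars (assumption)

nova_2_pro_cost = baseline_sonnet_cost * 0.80  # 80% of the Sonnet cost
gemini_3_pro_cost = nova_2_pro_cost * 2.0      # Nova ran at ~half Gemini's cost

savings_vs_gemini = gemini_3_pro_cost - nova_2_pro_cost

print(f"Nova 2 Pro:   ${nova_2_pro_cost:.2f}")
print(f"Gemini 3 Pro: ${gemini_3_pro_cost:.2f}")
print(f"Savings vs Gemini 3 Pro: ${savings_vs_gemini:.2f}")
```

换句话说,对于成本敏感的工作负载,"够用"的模型即便不是最先进的,也可能在经济上更合理。
In other words, for cost-sensitive workloads a good-enough model can make more economic sense than a state-of-the-art one.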
伴随新模型发布,AWS推出了一项名为NovaForge的新服务,允许企业训练自有版本的Nova系列模型。
Alongside the new models, AWS launched a new service called NovaForge that allows companies to train their own versions of the Nova family of models.
该服务起价为每年10万美元,价格不菲。
The service is not cheap, starting at $100,000 a year.
不过这是种相当差异化的产品,亚马逊提供了各种预训练和后训练检查点的访问权限。
However, it is a pretty different type of offering, with Amazon providing access to various pre training and post training checkpoints.
其核心理念是让企业注入自有专有数据及行业特定数据,最终获得符合需求的前沿定制模型。
The idea is that enterprises can feed their own proprietary data, as well as industry specific data, to come up with a frontier model customized to their needs.
Reddit首席技术官Chris Slowe为该服务提供了推荐词,称其已展现出令人印象深刻的效果。
Chris Slowe, the CTO of Reddit, provided a testimonial for the service, saying that it's already delivering impressive results.
他继续说道:我们正在用一个更精准的单一解决方案取代多个不同模型,使内容审核更高效。
He continued: We're replacing a number of different models with a single, more accurate solution that makes moderation more efficient.
用统一协调的方法取代多个专业机器学习工作流程的能力,标志着我们在Reddit上实施和扩展AI方式的转变。
The ability to replace multiple specialized ML workflows with one cohesive approach marks a shift in how we implement and scale AI across Reddit.
就市场反应而言,坦率地说现在还为时过早,难以获得大量反馈,而且亚马逊推出模型的方式也让情况变得更加困难。
Now in terms of reactions, frankly it's kinda too early to get much, and the way Amazon rolls out models doesn't make it a lot easier.
正如Ethan Mollick教授所言:由于亚马逊让尝试其新模型变得非常困难,我至今还没试过Nova 2 Pro。
As Professor Ethan Mollick put it: Since Amazon makes it very hard to experiment with its new models, I haven't tried Nova 2 Pro yet.
那么,看起来还不错?
So, it seems fine?
它们从未处于性价比的前沿,而新款Nova two总体上仍落后于其他AI,仅在某些代理基准测试中零星获得较高分数。
They have never been at the cost performance frontier, and the new Nova two continues to generally lag other AIs with scattered higher scores on some agentic benchmarks.
关于NovaForge,还有一些耐人寻味的地方。
On NovaForge, there was a little bit more intrigue.
AI企业家埃迪·格雷写道:我需要进一步研究,但如果他们所言属实,亚马逊是第一个做到这点的。
AI entrepreneur Eddie Gray wrote: I need to research more, but if what they say is true, Amazon is the first to do this.
AWS Nova现在可以获取公司的专有数据,并让这些数据训练出专供客户大规模使用的自有大型语言模型。
AWS Nova can now take a company's own proprietary data and let that data train their own LLM just for the customer to use at a large scale.
它还能根据需要战略性地引入外部数据集,与客户数据合并。
It can also allow them to strategically bring in external datasets as needed and merge them with their data.
最终结果是针对每个公司和客户量身定制、价值更高的LLM模型。
The result is a much more valuable LLM model tailored to each company and customer.
因此,这组公告中让我感兴趣的是,它们某种程度上都代表着对某些论点的加倍投入——这些论点虽尚未被证伪,但实现时间确实比许多人预期的要长。
So the thing that's interesting to me about this set of announcements is that they both represent a doubling down on theses which, while they haven't been disproven yet, have at the very least taken longer to come to fruition than some might have expected.
Nova发布时就很明显,亚马逊的赌注是:随着AI工作负载成熟且多样化,市场将需要那些虽非最尖端,但对某些用例类别更高效且更具成本效益的模型。
It seemed pretty clear when Nova was released that Amazon's bet was that as AI workloads matured and got more diverse, there was going to be a need for models that were not state of the art, but were more efficient and cost effective for certain categories of use cases.
这个论点最终可能被证明是正确的,但今年在企业AI领域确实不是主要关注点。
That thesis may end up proving correct, but it certainly hasn't been the major emphasis this year when it comes to enterprise AI.
在很多情况下,我们仍处于技术前沿状态,企业一直专注于每个新SOTA模型上线带来的新功能。
In many cases, we've still been living at the state of the art, and enterprises have been focused on the new capabilities that come online with each new SOTA model release.
不过,虽然这个论点尚未完全实现,但在我看来,当企业全面规模化应用时,随着我们对不同用例所需能力类型的细化,必然会出现更强的成本意识和对AI部署经济性的考量。
However, while the thesis hasn't fully played out yet, it seems somewhat inevitable to me that when we do reach full scale across the enterprise, there will be far more cost consciousness and consideration of the economics of AI deployments as we get more specific about what different use cases need which type of capabilities.
在Forge方面,自ChatGPT问世之初就有这样一种感觉,即企业定制自己的模型,无论是从头开始完全训练,还是通过Rag将其接入新数据集,或者采用其他任何可用策略,将公司的专有和非公开数据与底层模型连接起来。
On the Forge front, there has been this sense since the very beginning of ChatGPT that enterprises would customize their own models, whether by fully training them from scratch, plugging them into novel datasets via RAG, or using whatever other strategies have been available to connect the proprietary and non-public data of a company with the underlying models.
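对于不太熟悉RAG(检索增强生成)的听众,下面是一个纯示意性的小例子:先从内部文档中检索与问题最相关的内容,再把它拼进发给模型的提示词。真实系统会使用向量检索和实际的LLM调用;这里的词汇重叠打分和示例文档都是虚构的:
For listeners less familiar with what RAG (retrieval-augmented generation) actually involves, here is a purely illustrative toy sketch: retrieve the most relevant internal document for a question, then prepend it to the prompt sent to the model. Real systems use vector embeddings and an actual LLM call; the word-overlap scoring and sample documents here are invented for illustration:

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens with punctuation stripped (toy tokenizer)."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))

def score(query: str, doc: str) -> int:
    """Count shared tokens between query and document (toy relevance metric)."""
    return len(tokens(query) & tokens(doc))

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document with the highest overlap score."""
    return max(docs, key=lambda d: score(query, d))

# Hypothetical internal company documents.
internal_docs = [
    "Refund policy: customers may return hardware within 30 days",
    "Vacation policy: employees accrue 1.5 days per month",
]

query = "What is the refund policy for hardware?"
context = retrieve(query, internal_docs)

# The augmented prompt that would be sent to the underlying model.
prompt = f"Context: {context}\n\nQuestion: {query}"
print(prompt)
```

关键在于:模型本身保持不变,专有数据是在查询时被注入提示词的,这正是它与从头训练或微调的区别所在。
The key point: the model itself is unchanged, and the proprietary data is injected into the prompt at query time, which is exactly what distinguishes this from training or fine-tuning from scratch.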
尽管这些用途看起来直观,但它们仍未成为企业应用的主流。
And while the uses for this seem intuitive, again, they just haven't been the mainstream in where enterprises are.
再次强调,我个人还不准备否定这一观点——即这在未来某个时刻会变得有价值,但我们仍处于探索这类产品需求的早期阶段。
Once again, I wouldn't be ready personally to write off the thesis that this would be valuable at some point, but we're still very much in the early stages of discovery around what the demand for that type of product looks like.
不过这两者的关键在于,现在看似平淡无奇的公告,可能最终会在未来某个时刻为亚马逊带来回报。
The point in both these cases though, is that what looks like a shoulder shrug announcement now could end up paying off for Amazon at some point in the future.
现在转向2025年的关键词——代理,AWS预览了三款专业代理。
Now, moving on to the 2025 watchword of agents, AWS previewed a trio of specialized agents.
其中包括软件开发代理Kiro、可自动化应用安全的AWS安全代理,以及用于IT运维的AWS DevOps代理。
There is Kiro, a software development agent; AWS Security Agent, which can automate application security; and AWS DevOps Agent for IT operations.
Kiro被定位为一种编码代理,可以在无需人工干预的情况下连续工作数天。
Kiro was pitched as a coding agent that can work for days without human intervention.
AWS并未明确说明这是一个可以由专有模型驱动的中性工具,还是仅限于Nova模型使用。
AWS wasn't clear on whether this was a neutral harness that could be driven by a proprietary model, or if it was locked to the Nova models.
不过,人们非常渴望看到AWS版本的这种长周期编码代理在实际发布后表现有多强大。
Still, people are pretty eager to see how strong AWS' version of this type of long horizon coding agent is once it's actually released.
安全代理获得了大量关注,因为这是当前AI编码领域的一大空白。
The security agent got a lot of attention as this is a big gap in the current AI coding space.
其理念是打造一个持续运行、主动出击的代理,能够自主寻找漏洞和攻击途径。
The idea is to have an always on, proactive agent that can autonomously hunt for bugs and exploits.
亚马逊表示它能在从设计到部署的每个开发阶段发挥作用。
Amazon says that it can operate at every stage of the development process from design to deployment.
AR Insights的Shelly Kramer在现场发帖称:当Matt Garman宣布推出AWS安全代理时,现场爆发的自发掌声完全在情理之中。
Shelly Kramer of AR Insights was in the crowd and posted: There's every reason for the spontaneous applause that happened when Matt Garman announced the launch of AWS Security Agent.
这具有重大意义,它能在开发的每个阶段提供安全反馈,确保及早发现潜在问题,降低昂贵返工的风险,并增强整体产品安全性。
This is incredibly significant as it delivers security feedback at every stage of development, ensuring that potential issues are caught early, reducing the risk of costly rework and strengthening overall product security.
而DevOps代理的设计定位是成为故障排查时的第一响应者。
The DevOps agent meanwhile is designed to be the first responder in a triage situation.
当你的应用宕机时,该代理可以介入,将警报路由给正确人员,并着手诊断甚至可能修复问题。
If your application goes down, the agent can step in, route alerts to the correct people, and get to work diagnosing and maybe even fixing the issue.
虽然不是光鲜亮丽的智能体,但对软件开发人员来说可能具有无可估量的价值。
Not a glamorous agent, but the kind of thing that could be invaluable for software developers.
我要说的是,当你把这些智能体放在一起看时,亚马逊的智能体战略至少会变得更清晰一些。
And I will say that I think that Amazon's agent strategy at least becomes a little bit more clear when you see these all together.
这些智能体被设计成可以独立运作的数字工作者,能够扩展你的团队能力。
These agents are designed to function as self contained digital workers that can extend your team.
它们不是通用型智能体,而是针对特定工作类型专门设计的。
They are not generalist agents, they are specific to a type of work.
很明显,亚马逊正在这里押注于实际落地的集成应用。
And it's very clear that Amazon is making a bet on practical real integration here.
这些智能体共同标志着软件开发新时代的开端。
Together, these agents mark the beginning of a new era in software development.
这些前沿智能体不仅让团队效率更高,更从根本上重新定义了当AI作为团队延伸时可能实现的成果——在整个软件开发生命周期中自主交付价值。
These frontier agents don't just make teams faster, they fundamentally redefine what's possible when AI works as an extension of your team, delivering outcomes autonomously across the software development lifecycle.
节目开头我提到过,Bedrock原本是他们聊天机器人的名称,但后来成为了平台名称——亚马逊让云客户可以通过这个统一平台访问多种模型。
Now, I mentioned at the beginning of the show that Bedrock was originally going to be the name of their chatbot, but instead became the name of their platform where Amazon allowed their cloud customers to access multiple models all from a single place.
说到这次re:Invent大会,Bedrock的重大发布实际上是由众多小更新组成的。
And when it comes to this re:Invent, the Bedrock big release is actually a ton of little releases.
Bedrock平台新增了18个开放权重模型,包括最新的Mistral三模型家族。
The Bedrock platform added 18 open weight models including the latest Mistral three model family.
但这里没有出现任何能够访问OpenAI等公司专有模型的更新。
But one thing that was not here is any sort of update that adds access to the proprietary models from companies like OpenAI.
我们将在节目最后进一步讨论这对他们的战略意味着什么,以及那里是否可能正在发生变化。
We'll talk a little bit more towards the end about what this suggests for their strategy and whether there might be something that's changing there.
亚马逊还利用这次活动正式发布了Trainium 3 Ultra服务器,并预告了下一代Trainium 4芯片。
Amazon also used the event to formally launch their Trainium three Ultra server and tease their next generation Trainium four chips.
Ultra服务器是他们的数据中心级单元,可容纳144个芯片。
The Ultra server is their data center scale unit that can host 144 chips.
AWS表示可以联网数千台Ultra服务器,提供多达一百万个协同工作的Trainium三芯片。
AWS said that thousands of Ultra servers can be networked to provide up to a million coherent Trainium three chips.
虽然这听起来很厉害,但与NVIDIA的千卡集群并非直接可比,所以我们还得看这些芯片在实际应用中的表现如何。
And while this sounds impressive, it is not an apples to apples comparison to the thousand strong NVIDIA clusters, so we'll have to see how the chips perform in the wild.
AWS表示,Trainium 3的速度是上一代的四倍,内存容量也是四倍,且能效提高了40%。
AWS said that Trainium three was four times faster, had four times as much memory, and was 40% more efficient than the previous generation.
有趣的是,Trainium 4将完全兼容英伟达的NVLink Fusion网络系统,这意味着AWS芯片可与英伟达GPU互操作。
Interestingly, Trainium four will be fully compatible with NVIDIA's NVLink Fusion networking system, meaning AWS chips will be interoperable with NVIDIA GPUs.
官方未公布Trainium 4的发布时间表,因此我们需等到明年才能见分晓。
Amazon didn't announce a timeline for the Trainium four release, so we'll have to wait until next year to see how they stack up.
但时代叙事已变——《华尔街日报》迅速将Trainium称为'英伟达的新威胁',这与亚马逊以往芯片发布后被忽视的情况截然不同。
But in a sign of the narrative times, rather than being written off as previous Amazon chip releases had been, The Wall Street Journal was quick to declare Trainium quote another threat to Nvidia.
投资者近来当然热衷于谷歌TPU可能颠覆英伟达市场主导地位的说法。
Investors have of course been hooked recently on the narrative that Google's TPUs could disrupt Nvidia's market dominance.
显然,Trainium恰好符合这个叙事框架。
So Trainium apparently fits right alongside that story.
现实中我认为需要谨慎看待这些芯片能否获取可观市场份额,但投资者认真对待对英伟达的威胁,本身就是时代特征的体现。
Now, in the real world, it pays, I think, to be circumspect of whether either of these chips can gather significant market share, but it is a sign of the times that investors are taking the threat to NVIDIA seriously.
意外之举是宣布了名为'AI工厂'的新产品。
One curveball was the announcement of a new product called AI Factories.
通过这款产品,AWS正在进军本地计算领域。
With this product, AWS is getting into the on premise compute sector.
其理念是企业和政府可以自建数据中心,而AWS则提供AI服务器和硬件管理服务。
The idea is that companies and governments can supply their own data center, while AWS supplies the AI servers and hardware management.
该服务还能与其他AWS云服务绑定,为客户提供两全其美的解决方案。
The service can also be tied to other AWS cloud services giving customers something of the best of both worlds.
当然,这款产品反映了人们对安全性和数据主权的日益关注。
The product, of course, reflects a growing concern over security and data sovereignty.
通过自建硬件设施,客户可以确保数据完全不会传输给AI公司,模型就运行在自有设施的硬件上。
By hosting their own hardware, customers can ensure they're not sending their data to an AI company at all, with the models hosted on their own hardware in their own facilities.
给'Trainium即将颠覆行业'的说法泼点冷水——该产品其实是与NVIDIA的合作项目,后者将成为独家硬件供应商。
Throwing some cold water on the idea that Trainium is somehow about to take over the industry, this product is a partnership with NVIDIA, who will be the exclusive hardware provider.
这显然是应对市场需求之举,所以会有多少企业开始采用这种白标服务搭建私有云,值得关注。
Now, it's clearly a response to market demand, so it'll be interesting to see how many companies start setting up their own private clouds using this white label service.
退一步看,我认为这届re:Invent在很多方面体现了亚马逊对其一直以来所追求的企业AI长期愿景的加码投入。
Taking a step back, in a lot of ways I think that this re:Invent is a doubling down on the long term vision of enterprise AI that Amazon has been pursuing.
它包含了许多渐进式的更新和发展,其中一些对商业客户将非常有价值,但在我看来,核心论点并未改变。
It has a lot of incremental updates and developments, some of which will be very valuable to business customers, but it doesn't feel to me like any core thesis has changed.
如果说有什么变化的话,那就是似乎新增了一定程度的灵活性和开放性,不再试图将人们锁定在AWS生态系统中。
To the extent that anything has changed, there does seem to be some new amount of flexibility and openness to not trying to lock people into the AWS ecosystem.
The Information发表了一篇题为《In a Reversal, AWS Makes It Easier for AI Customers to Use Rival Clouds》的文章。
The Information wrote a piece called "In a Reversal, AWS Makes It Easier for AI Customers to Use Rival Clouds."
尽管他们将其表现为对亚马逊在AI领域竞争失利的现实让步,但我认为背后有更广泛的趋势:在这个领导地位几乎每周都在变化的快速发展的领域,任何人想要玩老式的锁定游戏都将非常困难。
And while they presented it as a concession to the reality of Amazon being outcompeted in AI, I think that there's a broader thing going on underneath, which is just that it's going to be very hard for anyone in such a fast moving field, where leadership changes on a nearly weekly basis, to try to play the old style games of lock in.
我认为企业正在评估客户根本不会接受这一点。
I think companies are assessing that customers simply will not accept that.
因此将会出现许多非常有趣的新型'友敌'关系。
And so there's going to be a lot of really interesting new types of frenemy relationships.
水涨船高确实让所有船只都受益。
The rising tide truly is lifting all boats.
至少在我看来,像亚马逊这样重新评估传统企业策略的做法是合理的。
And to me at least, I think it makes sense to reevaluate the old enterprise playbooks in the way that Amazon seems to be doing.
现在,对于企业听众和那些试图弄清楚需要关注多少信息的人来说,我会这样总结这一切的意义。
Now, in terms of what this all adds up to for enterprise listeners and people who are trying to figure out how much they have to be paying attention, the way that I would put it is this.
我认为这里没有任何内容意味着你需要突然急着去关注任何一项已宣布的内容。
I don't think that there's anything here that means that all of a sudden, you have to rush out and start paying attention to any one thing that was announced.
不过,我确实认为对于许多企业买家来说,至少熟悉亚马逊正在酝酿的内容——不仅是当前,还包括他们未来发展的轨迹——这似乎是进行恰当尽职调查的良好做法。
However, I do think that for many enterprise buyers, being at least familiar with what Amazon has cooking, not just now, but in terms of the trajectory of where they're headed, feels like good proper due diligence.
当然,我相信本周内我们会听到更多消息,如果有任何值得注意的内容,我会进行更新。
But of course, I'm sure we'll hear more throughout the week, and if there is anything notable, I will do an update.
现在,今天的AI每日简报就到这里。
For now, that is gonna do it for today's AI Daily Brief.
一如既往感谢您的收听或观看,下次再见,祝平安。
Appreciate you listening or watching as always, and until next time, peace.