#214 - Gemini 命令行工具、io 剧情、AlphaGenome、版权裁决

#214 - Gemini CLI, io drama, AlphaGenome, copyright rulings

本集简介

我们的第214期节目,总结并讨论上周的重大AI新闻!录制于2025年6月27日,由安德烈·库伦科夫和杰里米·哈里斯主持。

欢迎将您的问题和反馈发送至 contact@lastweekinai.com 和/或 hello@gladstone.ai。阅读我们的文字简报并在 https://lastweekin.ai/ 上评论本播客。

本期内容:

Meta从OpenAI招聘关键工程师;Thinking Machines Lab完成20亿美元种子轮融资,估值达100亿美元。

DeepMind推出AlphaGenome,通过一个与AlphaFold相当但专注于基因功能的模型,显著推动基因组研究。

台湾对华为和中芯国际实施技术出口管制;同时Getty在一项开创性法律案件中撤回对Stability AI的关键版权主张。

一篇新的研究论文利用脑电图(EEG)评估使用大语言模型撰写论文时的认知负荷与记忆召回情况,提出了评估AI任务中“认知债务”的方法。

时间戳 + 链接:

(00:00:10) 引言 / 闲聊
(00:01:22) 新闻预览
(00:02:15) 回应听众评论

工具与应用
(00:06:18) 谷歌将Gemini CLI引入开发者终端
(00:12:09) Anthropic现在允许您直接通过Claude AI聊天机器人创建应用

应用与商业
(00:15:54) Sam Altman公开其“io”商标之争
(00:21:35) 华为Matebook搭载麒麟X90,采用中芯国际7nm(N+2)工艺
(00:26:05) AMD推出首款Ultra Ethernet就绪网卡——Pensando Pollara提供高达400 Gbps性能
(00:31:21) 亚马逊加入核能热潮,为AWS采购1.92吉瓦电力
(00:33:20) 英伟达进军核能领域——公司加入比尔·盖茨支持的TerraPower,该公司正建造用于数据中心供电的核反应堆
(00:36:18) Mira Murati的Thinking Machines Lab完成20亿美元融资,估值100亿美元
(00:41:02) Meta招聘关键OpenAI研究员,专注于AI推理模型

研究与进展
(00:49:46) 谷歌新AI将帮助研究人员理解基因运作机制
(00:55:13) 直接推理优化:大语言模型可自我奖励与优化其在开放任务中的推理过程
(01:01:54) Farseer:大语言模型中的精细化扩展定律
(01:06:28) LLM优先搜索:对解空间的自引导探索

政策与安全
(01:11:20) 无监督语言模型诱导
(01:16:04) 台湾对华为和中芯国际实施技术出口管制
(01:18:22) 您的大脑与ChatGPT:使用AI助手撰写论文时认知债务的累积

合成媒体与艺术
(01:23:41) 法官驳回作者关于Meta AI训练侵犯版权的主张
(01:29:46) Getty撤回对Stability AI的关键版权主张,但英国诉讼继续

查看隐私政策:https://art19.com/privacy
加利福尼亚州隐私声明:https://art19.com/privacy#do-not-sell-my-info

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

上周AI资讯感谢ODSC AI成为赞助商。

Last week in AI would like to thank ODSC AI for being a sponsor.

Speaker 0

ODSC是历史最悠久、规模最大的专注于应用数据科学和AI的社区之一。

ODSC is one of the longest running and largest communities focused on applied data science and AI.

Speaker 0

它十多年前始于一个简单的理念:将实践者聚集在一起,向真正构建和部署模型的业内人士学习,而不仅仅是空谈理论。

It started over a decade ago with a simple idea, bringing practitioners together to learn from people actually building and deploying models in the real world, not just talking theory.

Speaker 0

在4月28日至30日,您可以通过参加在波士顿举办并线上同步进行的ODSC East 2026亲身体验这一盛会。

On April 28 through 30, you can experience it yourself at ODSC East 2026, taking place in Boston and virtually.

Speaker 0

届时将有数千名混合模式参会者,包括数据科学家、机器学习工程师、AI研究员和技术领导者。

There will be thousands of hybrid attendees ranging from data scientists, ML engineers, AI researchers and technical leaders.

Speaker 0

您可以参加超过300场讲座,涵盖大语言模型、生成式AI、计算机视觉、自然语言处理、数据工程等内容。

You can attend over 300 sessions covering LLMs, GenAI, computer vision, NLP, data engineering and more.

Speaker 0

您还可以参加由OpenAI、Hugging Face、NVIDIA以及其他顶尖公司和大学专家主讲的实战培训、工作坊和训练营。

You can also go to hands on training with workshops and boot camps taught by experts from companies like OpenAI, Hugging Face, NVIDIA, and other top companies and universities.

Speaker 0

当然,还会有规模盛大的展览和丰富的社交机会,非常适合初创企业、招聘经理和AI工具开发者。

And of course, there'll be a massive expo and networking opportunities great for startups, hiring managers, and AI tool builders.

Speaker 0

这是AI从业者和团队跟上领域前沿、向顶尖人才学习并与社区建立联系的最佳方式之一。

It's one of the best ways for AI practitioners and teams to stay ahead of the field, learn from the best, and connect with the community.

Speaker 0

前往 odsc.ai/east 并使用促销码 LWAI,即可在 ODSC AI East 2026 的门票上再享 15% 折扣。

Go to odsc.ai/east and use promo code LWAI for an additional 15% off your pass to ODSC AI East 2026.

Speaker 0

访问 odsc.ai/east 并使用代码 LWAI,即可在全球领先的 AI 构建者与培训大会上额外获得 15% 折扣。

That's odsc.ai/east and use code LWAI to get an extra 15% off on the number one AI builders and training conference.

Speaker 0

我们要感谢 Box 对《上周 AI》的赞助。

We'd like to thank Box for sponsoring Last Week in AI.

Speaker 0

Box 是领先的企业级智能内容管理平台,帮助组织促进协作、管理整个内容生命周期、保护关键内容,并通过企业级 AI 转型业务流程。

Box is the leading intelligent content management platform, helping organizations fuel collaboration, manage the entire content lifecycle, secure critical content, and transform business workflows with enterprise AI.

Speaker 0

要释放 AI 的力量,你需要将你的内容接入你的大语言模型和智能代理。

To unlock the power of AI, you need to get your content to your LLMs and agents.

Speaker 0

你的企业不仅仅是互联网知识的总和,你的企业真正存在于你的内容之中。

Your business isn't the sum of internet knowledge, your business lives in your content.

Speaker 0

因此,你并不只是想简单地把 AI 套用在现有的流程上。

So you don't just want to bolt on AI to your existing processes.

Speaker 0

成为一家AI优先的公司,不仅仅是自动化你现有的工作,更是要重新构想可能实现的事情。

To become an AI first company isn't just about automating what you already do, it's about reimagining what's possible.

Speaker 0

借助Box AI,你可以真正利用最新的AI突破来自动化文档处理和工作流程,从内容中提取洞察,并构建自定义AI代理来完成任务等。

With Box AI, you can truly leverage the latest breakthroughs in AI to automate document processing and workflows, extract insights from content, build custom AI agents to work on assignments and more.

Speaker 0

最重要的是,Box AI与所有主要的领先AI模型提供商兼容,包括OpenAI、Anthropic、Google、xAI等,因此你可以确信能够将最新的AI模型与你的内容结合使用。

And most importantly, Box AI works with all the major leading AI model providers, so OpenAI, Anthropic, Google, xAI, and others, so you can be sure you can use the latest AI models with your content.

Speaker 0

Box AI将为你提供内容层,为AI提供所需的上下文,同时让你的团队能够灵活地针对不同用例测试和使用各种模型。

Box AI will give you the content layer that gives AI the context it needs, while giving your teams the flexibility they need to test and leverage various models for different use cases.

Speaker 0

请前往box.com/ai了解更多信息。

So go to box.com/ai to learn more.

Speaker 0

大家好,欢迎收听《Last Week in AI》播客,在这里我们会聊聊AI领域的最新动态。

Hello and welcome to The Last Week in AI podcast, where you can hear us chat about what's going on with AI.

Speaker 0

和往常一样,本期节目我们将总结并讨论上周最有趣的AI新闻,你可以在节目描述中找到相关链接和时间戳。

As usual, this episode, we will summarize and discuss some of last week's most interesting AI news and you can check out the episode description for the links to that and the timestamps.

Speaker 0

我是你们的常驻主持人之一,安德烈·库伦科夫。

I'm one of your regular hosts, Andrey Kurenkov.

Speaker 0

我在研究生阶段学习人工智能,现在在一家生成式AI初创公司工作。

I studied AI in grad school and now work at a generative AI startup.

Speaker 1

我是你的另一位主持人杰里米·哈里斯,Gladstone AI的联合创始人,专注于AI国家安全,等等,你知道的。

And I'm your other host, Jeremie Harris, co founder of Gladstone AI, AI national security, blah blah blah, as you know.

Speaker 1

正是因为有我,这个播客才会是一个半小时,而不是两个小时。

And I'm the reason this podcast is gonna be an hour and a half and not two hours.

Speaker 1

安德烈正耐心地等了半小时,而我刚刚处理完一些事。

Andrey is very patiently waiting for, like, half an hour while I just sorted out.

Speaker 1

我女儿正在长牙,有个女儿真是太棒了,但有时候牙齿会一下子长出六颗甚至八颗,那真是忙得不可开交。

Just my my daughter's been teething, and it's wonderful having a daughter, but sometimes teeth come in six or eight in a shot, and then you have your hands full.

Speaker 1

所以她成了这一切最大的受害者。

And so she is the greatest victim of all this.

Speaker 1

我饿着肚子紧随其后排第二,因为我一直说“我饿了,再五分钟就吃”,结果一直没吃上。

I'm hungry as a close second, because, boy, I kept saying, I'm hungry, five more minutes, and it never happened.

Speaker 1

谢谢你这么有耐心,安德烈。

I appreciate the patience, Andrey.

Speaker 0

我多出了半小时准备,所以我不抱怨。

I got an extra half hour to prep, so I'm not complaining.

Speaker 0

我敢肯定你今天早上比我更糟。

I'm pretty sure you had a rougher morning than I did.

Speaker 0

我只是喝着咖啡等着,所以也没太糟。

Was just drinking coffee and waiting, so not too bad.

Speaker 0

但说到这一期,我们先做个简短预告。

But speaking of this episode, let's do a quick preview.

Speaker 0

这周又不会是重大新闻频出的一周。

It's going to be, again, kind of less of a major news week.

Speaker 0

会有一些还算重要的消息、工具和应用。

Some somewhat decently big stories, tools and apps.

Speaker 0

Gemini CLI 是个相当重要的更新。

Gemini CLI is a fairly big deal.

Speaker 0

在应用和商业方面,我们有一些有趣的OpenAI八卦,还有一大堆硬件动态。

Applications and business, we have some fun OpenAI drama and a whole bunch of hardware stuff going on.

Speaker 0

这周也没有什么重大的开源动态,所以我们就不提了。

And not really any major open source stuff this week, so we'll be skipping that.

Speaker 0

研究与进展方面,DeepMind 有令人兴奋的新研究,还有各种关于可扩展推理、强化学习等方面的论文。

Research and advancements, exciting new research from DeepMind, and just various papers about scalable reasoning, reinforcement learning, all of that type of stuff.

Speaker 0

最后,在政策与安全方面,我们会继续讨论一些互操作性、安全、中国相关的故事,以及关于版权的重磅新闻,延续上周的内容。

Finally, in policy and safety, we'll have some more interoperability, safety, China stories, the usual, and some pretty major news about copyright following up on what we saw last week.

Speaker 0

实际上,这会是本集后半部分的亮点之一。

That actually would be one of the highlights of this episode towards the end.

Speaker 0

在进入那部分之前,我们想感谢一下在 Apple 播客上留下的几条评价,就像我们偶尔做的那样。

Before we get to that, do want to acknowledge a couple reviews on Apple Podcasts, as we do sometimes.

Speaker 0

感谢那些留下美好评论的热心听众。

Thank you to the kind reviewers leaving us some very nice comments.

Speaker 0

还有一些有趣的评价。

Also some fun ones.

Speaker 0

比如这条。

Like this one.

Speaker 0

这位观众说:我想听到一个机智而深刻的回应,解释为什么AI无法像你们一样制作这个节目。

This viewer said, I want to hear a witty and thoughtful response on why AI can't do what you're doing with the show.

Speaker 0

哇,你让我同时做到机智又深刻,真是让我措手不及。

And wow, you're putting me on the spot being both witty and thoughtful.

Speaker 0

这确实让我思考了一下。

And it did make me think.

Speaker 0

我得说,几个月前我确实试过Notebook LM,对吧?

I will say I did try Notebook LM a couple months ago, right?

Speaker 0

那是谷歌推出的播客生成工具。

And that's the podcast generator from Google.

Speaker 0

它还不错,但明显开始重复内容了。

It was good, but definitely started repeating itself.

Speaker 0

我发现LLM通常还是会有这样的问题:在十到二十分钟后就失去上下文,不断重复或者出现其他状况。

I found that LLMs still often have this issue of losing track of where they're at, like ten minutes, twenty minutes in, repeating themselves or just otherwise.

Speaker 0

而且,

And also,

Speaker 1

安德烈,它们也会重复自己。

Andrey, and repeating themselves too.

Speaker 1

对吧?

Right?

Speaker 1

它们就会一直说同样的内容,一遍又一遍地重复。

And they'll just keep saying the same thing and and repeating over and over.

Speaker 1

比如,它们会反复地重复很多次。

Like, they'll repeat and and repeat a lot.

Speaker 1

所以是的。

So yeah.

Speaker 1

对。

Yeah.

Speaker 0

这种重复的问题,幸运的是,几年前就已经解决了。

That that was that kind of repetition was solved a couple years ago, thankfully.

Speaker 0

但确实。

But Yeah.

Speaker 0

确实。

True.

Speaker 0

说实话,如今用大语言模型复制上周的内容已经可以做得相当不错了。

Honestly, you could do a pretty good job replicating last week in AI with LLMs these days.

Speaker 0

我不瞒你,但要精准还原我们的个性、风格和声音,你需要非常精确的提示。

I'm not going to lie, but you're going to have to do very precise prompting to get our precise personas and personalities and voices and so on.

Speaker 0

我不知道。

I don't know.

Speaker 0

所以,希望我们仍然比AI做得更好,或者至少在做与那些试图让AI制作通用AI新闻播客时所得出的泛化结果不同的事情。

So hopefully, we're still doing a better job than AI could do, or at least doing a different job than the more generic kind of outcomes you could get trying to elicit AI to make an AI news podcast.

Speaker 1

老兄,有什么AI能竞争得过因为女儿出牙而晚开始三十分钟的情况吗?

Dude, what AI could compete with starting thirty minutes late because its daughter's teething?

Speaker 1

我现在就挑战你。

Like, I challenge you right now.

Speaker 1

试试看。

Try it.

Speaker 1

你找不到能完成这件事的AI吗?

You're not gonna find an AI that can pull that off?

Speaker 0

你可以让AI声称它能做到。

You can have AI that says it does.

Speaker 0

没错。

That's right.

Speaker 0

没错。

That's right.

Speaker 0

那种体验的情感真的会存在于其中吗?

Will the emotion of that experience actually be in it?

Speaker 0

我不这么认为。

I don't think so.

Speaker 1

我觉得这是自我安慰的方式,对吧?

I think that's the copium talking, right?

Speaker 1

人们经常说,哦,它不会有真情实感。

People are often like, Oh, it won't have the heart.

Speaker 1

它不会有那种灵魂,你知道的,那种播客的感觉。

It won't have, like, the soul, you know, of the podcast.

Speaker 1

它会有的。

It will.

Speaker 1

它会有的。

It will.

Speaker 1

事实上,我认为我们的工作恰恰是为你揭示出那个时刻——当你不再需要听我们的时候。

In fact, I think arguably our job is to surface for you the moment that that is possible, that you can stop listening to us.

Speaker 1

我们不是全职做这个播客,这本身就是一个优势,或许让我们拥有比平常更多的自由。

One of the virtues of not being, like, a full-time podcaster on this, too, is we have that freedom maybe more than we otherwise would.

Speaker 1

但说实话,我预计在未来十八个月内,很难想象不会出现类似的东西。

But, man, I would expect within the next eighteen months, it's hard to imagine that there won't be something comparable.

Speaker 1

但那样的话,你的播客主持人就不会有灵魂了。

But then, you you know, your your podcast hosts won't have a soul.

Speaker 1

他们会困在盒子里。

They'll be stuck inside a box.

Speaker 0

事实上,我非常确定。

Well, in fact, I'm certain.

Speaker 0

我相信很久以前就已经有AI生成的AI新闻播客了。

I believe, as of quite a while ago, there are already AI-generated AI news podcasts out there.

Speaker 0

我还没听过,但我确信它们是存在的。

I haven't checked them out, but I'm sure they exist.

Speaker 0

如今它们的质量可能已经相当不错了。

Nowadays they're probably quite good.

Speaker 0

你可以每天收到一个,而不是每周一次,而且它们永远不会落后一周。

You get one of those every day as opposed to once a week and they're never a week behind.

Speaker 0

所以在某些方面,它们确实比我们更优秀,但在其他方面,它们能如此机智而深刻地回应这样的问题吗?

So in some ways definitely superior to us, but in other ways, can they be so witty and thoughtful in responding to such a question?

Speaker 0

我不知道。

I don't know.

Speaker 0

难道它们不会像我们有时那样缺乏机智和思考吗?

Can they be so lacking in wit and thought as we can be sometimes?

Speaker 0

没错。

That's right.

Speaker 0

这确实是个挑战,你知道的。

That's a challenge, you know?

Speaker 1

它们永远无法超越我们的愚蠢。

They'll never outcompete our stupid.

Speaker 0

是的,这在一般情况下也是如此。

Yes, as is true in general.

Speaker 0

我想你得费尽心思才能让AI在它本来擅长的事情上表现糟糕。

I guess you'd have to really try to get AI to be bad at things when it's actually good.

Speaker 0

不管怎样,最近又有了几条评价,我想表示感谢。

Anyways, a couple more reviews lately, so I do want to say thank you.

Speaker 0

另一条叫《这是最好的AI播客》,这真是莫大的荣誉,还说这是他们唯一会以正常速度收听的播客。

Another one is titled This Is the Best AI Podcast, which is quite the honor, and says that this is the only one they listen to at normal speed.

Speaker 0

其他大多数播客都是以1.5倍或2倍速播放的。

Most of the other podcasts are played in 1.5 or 2x speed.

Speaker 0

很高兴听到我们正在以良好的节奏用完这两小时。

Good to hear we are using up all our two hours at a good pace.

Speaker 0

有趣的是,不久前有一条评论说:我总是快进跳过安德烈的讲话,然后放慢速度听杰里米的部分。所以也许从那以后我讲话变快了。

Funny, a while ago there was a review that was like, I always speed up through Andrey's talking and then slow down for Jeremie, so maybe I've sped up since then.

Speaker 0

所以,和往常一样,感谢你们的反馈,也感谢你们提出的问题。

So yeah, as always, thank you for the feedback and thank you for the questions that you bring in.

Speaker 0

我觉得这是开场的一个有趣方式。

I think it's a fun way to start the show.

Speaker 0

但现在我们进入新闻部分,先从工具和应用开始。

But now let's go into the news, starting with tools and apps.

Speaker 0

第一个新闻是,我认为这是本周最重要的消息之一:Gemini CLI。

And the first story is, I think one of the big ones of this week, Gemini CLI.

Speaker 0

这基本上是谷歌对Claude Code的回应。

So this is essentially Google's answer to Claude code.

Speaker 0

这是一个你可以在终端中使用的工具,对于非程序员来说,终端就是操作电脑的文本界面。

It is a thing you can use in your terminal which for any non programmers out there is just the text interface to working on your computer.

Speaker 0

你可以查看有哪些文件,打开、阅读、输入内容等,全部通过非图形界面完成。

You can look at what files are there, open them, read them, type stuff, etcetera, all via a non-UI interface.

Speaker 0

这个命令行界面就是集成在终端里的 Gemini。

Now this CLI is Gemini in your terminal.

Speaker 0

从高层次来看,它具备与 Claude Code 相同类型的能力。

It has the same sort of capabilities at a high level as Claude Code.

Speaker 0

它是一个智能代理,你启动它后,告诉它你想做什么,它就会去执行。

So it's an agent and you launch it and you tell it what you want it to do and it goes off and does it.

Speaker 0

它会交替进行:一边执行操作,一边等待你指示它继续、修改或检查当前任务等。

And it sort of takes turns between it doing things and you telling it to follow-up, to change what it's doing or to check what it's doing, etcetera.
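上面描述的“轮流式”智能体循环可以用一个极简的示意代码来说明;这里的模型和工具都是占位实现,并非Gemini CLI或Claude Code的真实API。

The turn-taking agent loop described above can be sketched minimally like this; the model and tools here are stand-ins, not the real Gemini CLI or Claude Code API.

```python
# Illustrative sketch of an agentic CLI loop: the model proposes tool calls,
# the harness executes them, and results are fed back until the model is done.
# The "model" here is a stub; a real CLI would call Gemini or Claude instead.

def fake_model(history):
    # Pretend the model first asks to list files, then gives a final answer.
    if not any(entry[0] == "tool_result" for entry in history):
        return {"tool": "list_files", "args": {}}
    return {"answer": "Found 2 files."}

TOOLS = {
    "list_files": lambda args: ["main.py", "README.md"],
}

def run_agent(user_request):
    history = [("user", user_request)]
    while True:
        step = fake_model(history)
        if "answer" in step:                        # model is done: final reply
            return step["answer"]
        result = TOOLS[step["tool"]](step["args"])  # execute the requested tool
        history.append(("tool_result", result))     # feed the result back

print(run_agent("How many files are in this project?"))  # Found 2 files.
```

The real products run the same shape of loop, just with a hosted model and a much richer tool set (file edits, shell commands, test runners).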

Speaker 0

在这次发布中,谷歌非常慷慨,提供了大量免费使用额度:每分钟 60 次模型请求,每天 1000 次请求。

With this launch, Google is being pretty aggressive, giving away a lot of usage, 60 model requests per minute and 1,000 requests per day.

Speaker 0

这个使用上限非常高,而且大量功能完全免费,无需付费。

It's a very high allowance as far as caps and there's also a lot of usage for free without having to pay.

Speaker 0

我不确定这是否是免费用户的上限,但目前你几乎不需要付费。

I'm not sure if that is the cap for free, but for now you're not going to have to pay much.
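上面提到的配额(每分钟60次请求、每天1000次)可以用一个简单的滑动窗口计数器来示意;这只是说明性的客户端草图,并非官方实现。

The quotas mentioned above (60 requests per minute, 1,000 per day) can be illustrated with a simple sliding-window counter; this is an illustrative client-side sketch, not an official implementation.

```python
# Illustrative client-side quota tracker matching the free-tier caps mentioned
# above (60 requests/minute, 1,000 requests/day). Not Google's actual API.
from collections import deque

class QuotaTracker:
    def __init__(self, per_minute=60, per_day=1000):
        self.per_minute, self.per_day = per_minute, per_day
        self.minute_window = deque()   # timestamps within the last 60 s
        self.day_window = deque()      # timestamps within the last 86,400 s

    def allow(self, now):
        # Drop timestamps that have aged out of each window.
        while self.minute_window and now - self.minute_window[0] >= 60:
            self.minute_window.popleft()
        while self.day_window and now - self.day_window[0] >= 86_400:
            self.day_window.popleft()
        if len(self.minute_window) >= self.per_minute:
            return False
        if len(self.day_window) >= self.per_day:
            return False
        self.minute_window.append(now)
        self.day_window.append(now)
        return True

q = QuotaTracker()
# 60 requests in the same second are allowed; the 61st is rejected.
results = [q.allow(now=0.0) for _ in range(61)]
print(results.count(True))  # 60
```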

Speaker 0

我相信迟早会走到Claude Code那种模式:要使用Claude Code的最高级别,需要每月支付200美元或100美元,而我们公司已经这么做了,因为Claude Code实在太有用。

I'm sure sooner or later you get to the Claude Code type of model, where to use Claude Code at the highest level, you have to pay $200 per month or $100 a month, which is what we at our company already do because Claude Code is so useful.

Speaker 0

根据我在网上看到的讨论,大家的直观评测是,这还不如Claude Code那么出色。

From what I've seen in conversations online, the vibe eval is that this is not quite as good as Claude Code.

Speaker 0

它在软件工程、工具使用以及随机应变解决问题方面能力稍弱,但这款产品才刚刚发布。

It isn't as capable at software engineering, at using tools, at just generally figuring things out as it goes, but it was just released.

Speaker 0

它很快可能会成为一个强劲的竞争对手。

Could be a strong competitor soon enough.

Speaker 1

是的。

Yeah.

Speaker 1

顺便说一句,我依然对这么快我们就习惯了百万级上下文窗口感到惊讶,因为这背后是Gemini 2.5 Pro推理模型在驱动。

I I'm still amazed at how quickly we've gotten used to the idea of a million token context window, by the way, because this is powered by Gemini 2.5 Pro, the reasoning model, and that's part of what's in the back end here.

Speaker 1

所以这也是它为什么还达不到Claude标准的原因,毕竟Claude显然是一个强大得多的模型,我也不太确定具体强在哪里。

So that's gonna be the reason also that it doesn't quite, you know, live up to the Claude standard, which is obviously a model that's a lot stronger, I don't know exactly how.

Speaker 1

它在处理代码时似乎表现得更好。

It just seems to work better with code.

Speaker 1

我很好奇这种情况什么时候会改变,以及Anthropic真正的秘诀是什么。

I'm curious about when that changes, by the way, and what Anthropic's actual recipe is.

Speaker 1

比如,为什么它表现得这么好?

Like, why is it working so well?

Speaker 1

我们当然不知道,但也许有一天,在奇点之后,当我们所有人成为一个巨大的集体意识时,我们会明白究竟是什么让Claude模型如此优秀且持续出色。

We don't know, obviously, but someday, maybe after the singularity when we're all one giant hive mind, we'll know what actually was going on to make the Claude models this good and persistently good.

Speaker 1

但无论如何,这确实是一次非常令人印象深刻的布局。

But in any case, yeah, it's a really impressive play.

Speaker 1

当然,谷歌目前相对于Anthropic的优势在于拥有更大规模的计算资源。

The advantage that Google has, of course, over Anthropic currently is the availability of just a larger pool of compute.

Speaker 1

因此,当他们考虑降低成本时,你会看到他们在这里也试图以此为基础展开竞争。

And so when they think about driving costs down, that's where you see them trying to compete on that basis here as well.

Speaker 1

所以有大量的免费提示,更准确地说,是大量的免费令牌,以及非常优惠的令牌用量方案。

So a lot of free prompts, a lot of free tokens, I should say, good deals on the token counts that you put out.

Speaker 1

所以,你知道,这是一种可行的路径。

So, you know, it's it's one way to go.

Speaker 1

我认为,随着这些模型能力的上限不断提升,最终对于任何特定的固定应用来说,成本会变得越来越重要。

And I think as as the ceiling rises on the capabilities of these models, eventually, cost does become a more and more relevant thing for any given fixed application.

Speaker 1

所以这是一个有趣的动态。

So that's an interesting dynamic.

Speaker 1

对吧?

Right?

Speaker 1

前沿与快速跟进者之间的区别。

The frontier versus the fast followers.

Speaker 1

我不确定把谷歌称为快速跟进者是否准确。

Don't know if it's quite right to call Google a fast follower.

Speaker 1

他们确实在做一些前沿工作。

They're definitely doing some frontier stuff.

Speaker 1

但不管怎样,接下来的这一步确实很有趣。

But, anyway, yeah, so interesting next move here.

Speaker 1

这些技术的生产化,以及以非常显著的方式融入工作流程,我认为这正以缓慢但持续的步调,朝着一个代理承担越来越多任务、上下文窗口和连贯长度都成为其中一部分的世界迈进。

Part of the productionization, obviously, of these things, and entering workflows in very significant ways. I think this is heading in slow increments towards a world where agents are doing more and more and more, and context windows and coherence lengths are all part of that.

Speaker 0

对。

Right.

Speaker 0

是的,我们去年年初的时候还热议过智能体和智能体未来的话题。

Yeah, we discussed last year, like towards the beginning of last year was real kind of hype train for agents and the agentic future.

Speaker 0

我认为Claude Code和Gemini CLI表明,我们确实已经到达了那个阶段。

I think Claude code and Gemini CLI are showing that we are definitely there.

Speaker 0

除此之外,像Replit、Lovable这样的工具,总体而言,LLM已经发展到一定程度,部分得益于推理能力的提升,部分可能只是由于LLM本身的改进,使得它们在智能体中应用非常成功。

In addition to things like Replit, Lovable, broadly speaking, LLMs have gotten to a point partially because of reasoning, partially presumably just due to improvements in LLMs where you can use them in agents and they're very successful.

Speaker 0

据我所见,Claude Code表现如此出色,不仅是因为Claude本身,还因为它作为智能体在使用工具方面非常出色。

From what I've seen, part of the reason Claude Code is so good is not just Claude; it's also that Claude Code, specifically the agent, is very good at using tools.

Speaker 0

它非常擅长文本搜索和文本替换。

It's very good at doing text search, text replacement.

Speaker 0

在进行软件工程时,它特别热衷于编写测试并运行测试。

It's very keen on writing tests and running them as it's doing software engineering.
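上面提到的“文本搜索”和“文本替换”这类编辑原语,可以用下面的草图来说明;函数名和细节都是假设性的,真实的编码智能体实现要丰富得多。

The search and replace primitives described above can be sketched as follows; the function names and details are hypothetical, and real coding agents implement much richer variants.

```python
# Illustrative versions of two editing primitives a coding agent relies on:
# exact-text search and exact-text replacement over a file's contents.

def search_text(content, needle):
    """Return 1-based line numbers whose line contains `needle`."""
    return [i for i, line in enumerate(content.splitlines(), start=1)
            if needle in line]

def replace_text(content, old, new, expected_count=1):
    """Replace `old` with `new`, refusing ambiguous edits.

    Requiring an exact match count is a common agent safeguard: it forces
    the model to quote enough surrounding context to make the edit unique.
    """
    count = content.count(old)
    if count != expected_count:
        raise ValueError(f"expected {expected_count} match(es), found {count}")
    return content.replace(old, new)

src = "def add(a, b):\n    return a - b\n"
print(search_text(src, "return"))              # [2]
print(replace_text(src, "a - b", "a + b"))     # fixed function body
```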

Speaker 0

因此,这与单纯看待一个LLM还是有些不同的。

So it is a bit different than just thinking about an LLM.

Speaker 0

正是代理所做的一切及其工作方式,才让它如此成功。

It's the whole suite of what the agent does and how it goes about its work that makes it so successful.

Speaker 0

这并不是通过LLM训练就能直接获得的,对吧?

That's something you don't get out of the box with LLM training, right?

Speaker 0

因为工具使用并不在你的预训练数据中。

Because tool usage is not in your pre training data.

Speaker 0

它是建立在预训练之上的额外能力。

It's something kind of on top of it.

Speaker 0

这又是另一个类似于推理的方面,我们现在正超越那种只需从互联网上堆砌海量数据就能免费获得成果的阶段。

That is yet another thing similar to reasoning where we are now going beyond the regime of you can just train on tons of data from the internet and get it for free.

Speaker 0

除了对齐之外,越来越多的特性现在需要被添加到LLM中,而不仅仅是向它投入数百万GB的数据。

More and more things in addition to alignment, now you need to add to the LLM beyond just throwing a million gigabytes of data at it.

Speaker 1

这确实是一个系统,对吧?

It really is a system, right?

Speaker 1

归根结底,它不仅仅是一个模型。

Like, at the end of the day, it's not it's also not just one model.

Speaker 1

很多人以为后端只有一个庞大的单一模型。

A lot of people have this image of, like, you know, there's one monolithic model in the back end.

Speaker 1

实际上背后有很多模型在协作,决定由哪个模型来回应提示,我这里说的还不是MOE那种技术,而是纯粹的后端软件工程,正是这些让这些系统具备了整体流畅的体验。

I assume that there's a lot of, like, models choosing which models answer a prompt, and I'm not even talking about MOE stuff, like, just literal software engineering in the back end that makes these things have the holistic feel that they do.

Speaker 1

所以是的。

So yeah.

Speaker 0

顺便说一下,我之前忘了这件事,所以去查了一下。

FYI, by the way, I didn't remember this, so I looked it up.

Speaker 0

CLI是“命令行界面”的缩写,而命令行就是终端的另一种说法。

CLI stands for command line interface; the command line is another term for the terminal.

Speaker 0

所以,对于任何非程序员来说,这是一个有趣的细节。

So again, for any non programmers, fun detail.

Speaker 0

说到Claude Code,下一个故事是关于Anthropic的:他们推出了发布“制品”(artifacts)的功能。

And speaking of Claude Code, the next story is about Anthropic, and they have released the ability to publish artifacts.

Speaker 0

这些制品本质上是你可以在Claude内部构建的小型应用程序。

So artifacts are these little apps essentially you can build within Claude.

Speaker 0

你可以获得预览和交互式网页应用,大致如此。

You get a preview and interactive web apps, more or less.

Speaker 0

和其他一些平台一样,我认为谷歌允许你发布他们称之为‘gems’的东西。

And as with some other ones, I believe Google allows you to publish what they call gems.

Speaker 0

现在你可以发布自己的制品,其他人也可以浏览它们。

Now you can publish your artifacts and other people can browse them.

Speaker 0

他们还增加了内置AI构建应用的支持,让Claude成为应用的一部分。

They also added support for building apps with AI built in, with Claude being part of the app.

Speaker 0

现在如果你想在Claude中构建一个语言翻译应用,你可以做到,因为应用本身可以调用Claude进行翻译。

Now if you want to build a language translator app within Claude, you can do that because the app itself can query Claude to do a translation.
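这种“应用内部调用模型”的结构可以用下面的草图来说明;这里的模型调用是占位函数,真实的制品会向Claude发起请求。

The structure of an app that queries the model, as described above, can be sketched like this; the model call here is a stub, whereas a real artifact would make a request to Claude.

```python
# Illustrative sketch of an AI-powered artifact: a tiny translator app whose
# only dependency is a completion function. The model is stubbed out here.

def fake_claude(prompt):
    # Stand-in for the hosted model; returns a canned translation.
    canned = {"Translate to French: hello": "bonjour"}
    return canned.get(prompt, "(unknown)")

def translator_app(text, target="French", complete=fake_claude):
    # The app just composes a prompt and delegates the "intelligence"
    # to whatever completion function it was given.
    return complete(f"Translate to {target}: {text}")

print(translator_app("hello"))  # bonjour
```

Passing the completion function in as a parameter is the key design point: the same app logic works whether the backend is a stub, Claude, or another model.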

Speaker 0

这与仅仅拥有制品相比变化不大,但这似乎又是一个趋势:随着加入制品等功能、让分享你构建的内容变得简单,所有LLM产品最终都会走向相似的方向。

Not a huge delta from just having artifacts, but another seeming trend where all LLM products tend to wind up at similar places, as you add things like artifacts and make it easy to share what you build.

Speaker 0

而且任何人都可以做到。

And it's something that anyone can do.

Speaker 0

大多数用户,无论是免费版、专业版还是最高版,都可以分享,他们也会对人们构建的内容感兴趣。

Most users on their free, pro, max tiers can share, and they'll be interested to see what people build.

Speaker 1

如果我是Replit,看到这些我肯定会很紧张。

And if I'm Replit, I'm getting pretty nervous looking at this.

Speaker 1

当然,Replit拥有那个平台,让你能非常轻松地发布应用。

Granted, obviously Replit is that platform, right, that lets you essentially, like, launch an app really easily.

Speaker 1

它把服务器管理之类的复杂操作都抽象掉了,孩子们用它来发布游戏和各种实用应用,同时学习编程。

It abstracts away all the, like, server management and stuff, and you've got kids launching games and all kinds of useful apps and learning to code through it.

Speaker 1

这是一个非常强大且极其受欢迎的工具。

Really, really powerful tool and super, super popular.

Speaker 1

它的年增长率达到了十倍。

I mean, it's 10x year over year.

Speaker 1

它增长得非常快。

It's growing really fast.

Speaker 1

但你可以看到,前沿正越来越倾向于:先让构建应用变得越来越简单。

But you can start to see the the frontier moving more and more towards, let's make it easier and easier, at first, for people to build apps.

Speaker 1

我们将会拥有一个代理,它能直接为你写出整个应用,生成代码。

We're going to have an agent that just writes the whole app for you or whatever and just produces the code.

Speaker 1

但在什么时刻,自然会进一步发展到说:让我们来负责托管吧?

But at what point does it naturally become the next step to say, well, let's do the hosting.

Speaker 1

让我们把所有这些都抽象掉。

Let's abstract away all the things.

Speaker 1

你可以看到OpenAI。

You could see OpenAI.

Speaker 1

你可以看到Anthropic推出一个类似应用商店的东西。

You could see Anthropic launching a kind of app store.

Speaker 1

这词其实不太准确,对吧?因为我们谈论的是更流动的应用。

That's not quite the right term, right, because we're talking about more fluid apps.

Speaker 1

但总的来说,趋势是越来越倾向于托管更多内容,最终达到你只需向AI公司提出一个高层次需求,它就会为你构建出合适的应用的程度。

But, you know, moving more in that direction, hosting more and more of it, and eventually getting to the point where you're just asking the AI company for whatever high level need you have, and it'll build the right apps or whatever.

Speaker 1

今天听起来这其实并不算疯狂。

That's not actually that crazy sounding today.

Speaker 1

而且,这会吞噬掉Replit的很多商业模式,值得看看他们如何应对。

And again, that swallows up a lot of the Replit business model, and it'll be interesting to see how they respond.

Speaker 0

是的,这一点尤其成立,因为出现了与之并行的模型上下文协议(Model Context Protocol)这一趋势,使得AI能够轻松与其他服务交互。

Yeah, and this is particularly true because of the converging or parallel trend of the Model Context Protocol, which makes it easy for AI to interact with other services.

Speaker 0

所以现在,如果你想要开发一个能连接日历、邮箱、Google Drive或其他任何你常用工具的应用,AI都能轻松与之集成。

So now if you want to make an app that talks to your calendar, talks to your email, talks to your Google Drive, whatever you can think of, basically any major tool you're working with, AI can integrate with it easily.
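这类工具集成(即MCP,模型上下文协议)的基本形态可以用一个统一的工具注册表来示意;下面的代码只是模仿其结构的草图,并非真实的MCP SDK,工具名称均为虚构。

The shape of this kind of tool integration (MCP, the Model Context Protocol) can be sketched as a uniform tool registry; the code below only mimics the structure and is not the real MCP SDK, and the tool names are made up.

```python
# Schematic of MCP-style tool integration: services expose named tools that
# take JSON-like arguments, and the assistant invokes them through a single
# dispatcher. Tool names and payloads here are invented for illustration.

REGISTRY = {}

def tool(name):
    """Register a function as a named tool."""
    def register(fn):
        REGISTRY[name] = fn
        return fn
    return register

@tool("calendar.next_event")
def next_event(args):
    return {"title": "Standup", "time": "09:00"}

@tool("email.unread_count")
def unread_count(args):
    return {"count": 3}

def dispatch(call):
    # call mirrors a tool-use request: {"tool": ..., "args": {...}}
    return REGISTRY[call["tool"]](call["args"])

print(dispatch({"tool": "email.unread_count", "args": {}}))  # {'count': 3}
```

Because every service is reached through the same dispatch shape, the model only has to learn one calling convention, which is the point of a shared protocol.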

Speaker 0

因此,如果你想开发一个与你使用的工具相关的应用,现在可以在Claude中直接完成。

So if you wanna make an app that does something with connection to tools that you use, you could do that within Claude.

Speaker 0

正如你所说,我认为Replit和Lovable都是AI应用开发领域新兴的巨头,我确信它们将在需要数据库、身份验证等更复杂场景的领域占据一席之地。

So as you said, I think both Replit and Lovable are these emerging titans in the world of building apps with AI, and I'm sure they'll have a place in the kind of domain of more complex things where you need databases and you need authentication and so on and so on.

Speaker 0

但如果你只是想为自己或少数人构建一个应用,以加速某个流程,现在完全可以使用这些工具来实现,并且如果愿意,还可以分享出去。

But if you need to build an app for yourself or for maybe just a couple of people to speed up some process, you can definitely do it with these tools now and then share them if you want.

Speaker 0

接下来,如前所述,我们来谈谈应用和商业领域,先从一些OpenAI的新闻开始——我们已经有一段时间没聊到这类话题了,看来它们还没结束。

And onto applications and business, as promised, kicking off with some OpenAI drama, which we haven't had in a little while, so good to see it isn't ending.

Speaker 0

这次是跟进之前那起关于‘IO’商标的诉讼事件。

This time it's following up on this IO trademark kind of lawsuit that happened.

Speaker 0

我们上周报道过,OpenAI和山姆·阿尔特曼宣布与乔尼·艾维共同推出名为‘IO’的项目,但还有一家名为IO的AI音频硬件公司,拼写不同,是I-Y-O,而不是I-O。

We covered it last week, where we had OpenAI and Sam Altman announce the launch of this io initiative with Jony Ive, and there's another AI audio hardware company called iyO, spelled differently, i-y-O instead of i-o.

Speaker 0

他们提起诉讼,指控对方窃取了创意和商标。

And they sued alleging that they stole the idea and also the trademark.

Speaker 0

这两个名字听起来非常相似。

The names sound very similar.

Speaker 0

是的,萨姆·阿尔特曼进行了反击,并决定公开一些电子邮件。

And yeah, Sam Altman hit back, decided to publish some emails.

Speaker 0

这些只是电子邮件的截图,显示iyO公司的创始人(姑且这么说)非常友好、热情地表示希望与阿尔特曼会面,并希望获得OpenAI的投资。

It's just screenshots of emails showing the founder of iyO, let's say, being very friendly, very enthusiastic about meeting with Altman and wanting to be invested in by OpenAI.

Speaker 0

萨姆·阿尔特曼的核心观点是,这位提起诉讼的创始人杰森·鲁戈洛一直坚持不懈地试图获得阿尔特曼的投资。

The basic gist of what Sam Altman said is this founder Jason Rugolo, who filed the lawsuit, was kind of persistent in trying to get investments from Sam Altman.

Speaker 0

事实上,他在与乔尼·艾维公布计划之前就于三月主动联系过,而阿尔特曼显然已经告知他,自己正在推进的竞争项目也叫io。

In fact, he reached out in March, prior to the announcements with Jony Ive, and apparently Sam Altman let him know that the competing initiative he had was called io.

Speaker 0

这无疑是对这场诉讼的有效回应,类似于OpenAI此前应对埃隆·马斯克时的做法。

Definitely I think an effective pushback on the lawsuit, similar in a way to what OpenAI also did with Elon Musk.

Speaker 0

就是这么直接,证据就在这里。

Just like, here's the evidence.

Speaker 0

这是你们邮件的凭证。

Here's the receipts of your emails.

Speaker 0

我不太确定你说的是否属实。

I'm not too sure if what you're saying is legit.

Speaker 1

这变得有点儿……两个还不算模式,对吧?

This is becoming well, two is not yet a pattern, is it?

Speaker 1

是三个吗?

Is it three?

Speaker 1

我忘了要多少才能算作一个模式,他们这么说。

I forget how many it takes to make a pattern, they say.

Speaker 1

不过话说回来,我也不知道他们是谁,或者他们凭什么有资格告诉我们这是模式。

Then again, I don't know who they are or why they're qualified to tell us it's a pattern.

Speaker 1

但,是的,这确实是个有趣的情况。

But, yeah, this is an interesting situation.

Speaker 1

一个有趣的细节或许能让你稍微看出,到目前为止证据的天平是如何倾向的。

One interesting detail kind of gives you maybe a bit of a window into how the balance of evidence is shaping up so far.

Speaker 1

我们知道,在这起诉讼中,iyO——不是io,而是iyO——也就是说,我本来想说杰森·德鲁罗,其实是杰森·鲁戈洛的公司,最终……抱歉,刚才说到哪儿了?

We do know that in the lawsuit, iyO, so not io, but iyO, so this is, I was gonna say Jason Derulo, Jason Rugolo's company, did end up sorry, where was it?

Speaker 1

他们确实获得了。

They were actually yeah.

Speaker 1

他们获得了针对OpenAI使用IO品牌行为的临时禁令。

They were granted a temporary restraining order against OpenAI using the IO branding themselves.

Speaker 1

因此,OpenAI被迫因这项临时禁令更改了io品牌,而这项禁令是iyO商标诉讼的一部分。

So OpenAI was forced to change the io branding due to this temporary restraining order, which was part of iyO's trademark lawsuit.

Speaker 1

所以至少在商标诉讼层面,法院似乎愿意签发这种初步的临时禁令。

So at least at the level of the trademark lawsuit, there has been an appetite from the courts to put in this sort of preliminary temporary restraining order.

Speaker 1

我不是律师,所以我不知道这背后需要什么样的举证标准。

I'm not a lawyer, so I don't know what the the standard of proof would be that would be involved in that.

Speaker 1

因此,至少在商标层面,也许是因为听起来足够相似。

So at least at a trademark level, maybe it's like sounds vaguely similar enough.

Speaker 1

所以,是的,目前我们就先告诉OpenAI,他们不能这么做。

So, yeah, for now, let's let's tell OpenAI they can't do this.

Speaker 1

但这些设备之间存在足够的根本性差异,你完全可以理解OpenAI声称‘这不一样’的理由。

But there's enough fundamental differences here between the devices that you can certainly see OpenAI's case for saying, hey.

Speaker 1

这是不同的。

This is different.

Speaker 1

他们声称IO设备根本不是入耳式设备。

They claim that the IO hardware is not an in ear device at all.

Speaker 1

它甚至不是可穿戴设备。

It's not even a wearable.

Speaker 1

这些信息的来源正是当时在流传的那些内容。

That's where that information comes from that was itself doing the rounds.

Speaker 1

这个大新闻:OpenAI的新设备实际上根本不是可穿戴设备。

This big deal, OpenAI's new device is not actually gonna be a wearable after all.

Speaker 1

但我们确实知道,Rugolo早在2022年就试图向许多人推销他们的想法,即IO概念——抱歉,是EO概念,当时他向曾任苹果设计师的Evans Hankey分享了相关信息,而Hankey后来共同创立了IO公司。

But we do know that, apparently, Rugolo was trying to pitch a bunch of people on their idea, the IO concept, sorry, the EO concept, way back in 2022, sharing information about it with former Apple designer Evans Hankey, who actually went on to cofound IO.

Speaker 1

所以,这里确实有很多重叠之处。

So, yeah, there's a lot of overlap here.

Speaker 1

OpenAI 的说法是,看吧。

The claim from OpenAI is, look.

Speaker 1

你们从2018年就开始做这个项目了。

You've been working on it since 2018.

Speaker 1

你们曾经向我们展示过。

You demoed it to us.

Speaker 1

当时它根本没法用。

It wasn't working.

Speaker 1

存在很多问题。

There were these flaws.

Speaker 1

也许你们之后修复了这些问题,但当时它是个很粗糙的设备。

Maybe you fixed them since, but at the time, it it was a janky device.

Speaker 1

所以我们没有和你们合作。

So that's why we didn't partner with you.

Speaker 1

但你们还存在这种奇怪的重叠:事实上,IO团队的一些创始成员之前似乎直接接触过EO。

But then you also have this whole weird overlap where, yeah, some of the founding members of the IO team had apparently spoken directly to EO before.

Speaker 1

所以这相当混乱。

So it's pretty messy.

Speaker 1

我认为我们在法庭程序中会学到很多。

I think we're gonna learn a lot in the court proceedings.

Speaker 1

我觉得这些邮件不足以让我们做出明确的判断,因为我们甚至都不知道硬件到底是什么,而这似乎是问题的核心。

I don't think these emails give us enough to go on to make a firm determination, because we don't even know what the hardware is, and that seems to be at the core of this.

Speaker 1

那么实际的硬件是什么?OpenAI、LoveFrom、IO到底看到了多少?

So what is the actual hardware, and how much of it did OpenAI, did LoveFrom, did IO actually see?

Speaker 0

对。

Right.

Speaker 0

从大局来看,这可能并不是什么大事。

And in the big scheme of things, this is probably not a huge deal.

Speaker 0

这是一场诉讼,说你们不能把你们的东西叫做IO,因为它和我们的EO太相似了,而且似乎还是一种可穿戴AI设备。

This is a lawsuit saying you can't call your thing IO because it's too similar to our thing, EO, and it's also seemingly some sort of wearable AI thing.

Speaker 0

最坏的情况下,萨姆·阿尔特曼和乔尼·艾夫的这项计划大概也就得改变了。

Worst case, presumably the initiative by Sam Altman and Jony Ive changes.

Speaker 0

我认为,最重要的是,这只是需要跟踪的OpenAI的另一件事,对吧?

I think more than anything, this is just another thing to track with OpenAI, right?

Speaker 0

另一件正在发生的事,而我们却没在Anthropic或Mistral或其他任何这些公司身上看到类似的情况。

Another thing that's going on that for some reason we don't have these kinds of things with Anthropic or Mistral or any of these other companies.

Speaker 0

也许是因为OpenAI是最大的公司,所以这类事情——这次是法律和商业纠纷,不是人际纠纷——但依然产生了大量头条新闻,而且确实有很吸引人的内容可以讨论——

Maybe because OpenAI is the biggest, there just tends to be a lot of this, in this case, legal business drama, not interpersonal drama, but nevertheless, a lot of headlines and honestly juicy kind of stuff to discuss that-

Speaker 1

是的,是的,是的。

Yeah, yeah, yeah.

Speaker 0

对。

Yeah.

Speaker 0

还有另一件事,也表明了萨姆·阿尔特曼倾向于以公开且直接的方式应对这类争端。

So another thing going on and and another indication of the way that Sam Altman likes to approach these kinds of battles in a fairly public and direct way.

Speaker 1

接下来,华为MateBook搭载了采用中芯国际7纳米N+2工艺的麒麟X90芯片。

Up next, we have: Huawei MateBook contains Kirin X90 using SMIC seven nanometer N+2 technology.

Speaker 1

如果你是这个播客的常听者,你可能会说:天哪。

If you're a regular listener of the podcast, you're probably going, oh my god.

Speaker 1

然后或者你可能确实如此。

And then or maybe you are.

Speaker 1

我不确定。

I don't know.

Speaker 1

这可能有点太细节了。

This is maybe a little in the weeds.

Speaker 1

但无论如何,你可能想重温一下这到底意味着什么。

But either way, you might want a refresher on on what the hell this means.

Speaker 1

对吧?

Right?

Speaker 1

实际上,之前流传着很多传言,说华为破解了——抱歉,是中芯国际,中国最大的半导体代工厂,也是最先进的那家。

So there were a bunch of rumors actually floating around that Huawei had cracked, sorry, that SMIC, which is China's largest semiconductor foundry, or its most advanced one.

Speaker 1

你可以把他们看作是中国本土的台积电。

You can think of them as being China's domestic TSMC.

Speaker 1

关于他们是否突破了5纳米制程,流传着不少传言,对吧?这个关键制程正是用于制造H100 GPU(NVIDIA H100)的,或者其改良版本。

There were a bunch of rumors circulating about whether they had cracked the five nanometer node, right, that critical node that was used, or a modified version of it, to make the H100 GPU, the NVIDIA H100.

Speaker 1

如果中国能在国内实现这一突破,那将是一个巨大的成就。

So if China were to crack that domestically, that'd be a really big deal.

Speaker 1

然而,这些谣言现在被打破了,因为这家实际上总部位于加拿大的公司进行了评估。

Well, those rumors are now being squashed, because this company, which is actually based in Canada, did an assessment.

Speaker 1

Tech Insights,我们之前多次讨论过他们的发现,有时会点名提及,有时则没有。

So Tech Insights, we've actually talked a lot about their findings sometimes while mentioning them by name, sometimes not.

Speaker 1

我们其实应该多提一提他们。

We really should mention them more.

Speaker 1

Tech Insights 在整个领域中是一家非常重要的公司。

Tech Insights is a very important firm in all this.

Speaker 1

他们会对硬件进行拆解分析。

They do these teardowns of hardware.

Speaker 1

他们会深入研究,弄清楚制造芯片某个部件所使用的工艺节点。

They'll go in deep and figure out, oh, what manufacturing process was used to make this component of the chip.

Speaker 1

对吧?

Right?

Speaker 1

他们就是做这种分析的。

That's the kind of stuff they do.

Speaker 1

他们能够确认,华为的麒麟X90芯片实际上并没有采用5纳米等效工艺,而是使用了我们早已知道的中芯国际所掌握的7纳米工艺。

And they were able to confirm that, in fact, the Huawei Kirin X90, a system-on-a-chip, was actually not made using five nanometer equivalent processes, but rather using the older seven nanometer process that we already knew SMIC had.

Speaker 1

因此,从中国能否在国内自主生产GPU并跟上西方步伐的角度来看,这是一件非常重大的事情。

So that's a big, big deal from the standpoint of their ability to onshore domestically GPU fabrication and keep up with the West.

Speaker 1

所以,从中芯国际首次突破7纳米工艺算起,现在已经过去了两年,但我们仍然没有实现5纳米工艺的突破。

So it seems like we're, like, two years down the road now from when SMIC first cracked the seven nanometer node, and we're still not on the five nanometer node yet.

Speaker 1

这真的非常有意思。

That's really, really interesting.

Speaker 1

值得一提的是,华为从未明确说过这款新电脑使用的是5纳米工艺。

And so worth saying, like, Huawei never actually explicitly said that this new PC had a five nanometer node.

Speaker 1

关于这一点,只是有很多传言。

There's just a bunch of rumors about it.

Speaker 1

所以我们现在得到的,正是对这些传言的决定性澄清。

So what we're getting now is just kind of the decisive quashing of that rumor.

Speaker 0

对。

Right.

Speaker 0

更广泛的背景是,美国正在阻止英伟达向中国公司出售顶级芯片,这限制了中国开发先进人工智能的能力。

Broader context here is, of course, that The US is preventing NVIDIA from selling top of line chips to Chinese companies, and that does limit the ability of China to create advanced AI.

Speaker 0

他们正努力在国内生产可与英伟达竞争的芯片。

They are trying to get the ability domestically to produce chips competitive with NVIDIA.

Speaker 0

据我了解,目前他们大约落后两年。

Right now they're, let's say, about two years behind is my understanding.

Speaker 0

其中一个真正的瓶颈是,如果你无法获得最先进的芯片制造工艺,那么在同样面积的芯片上能集成的计算能力就会更少。

And one of the real bottlenecks is, if you're not able to get the state of the art fabrication process for chips, there's just less compute you can get on the same amount of chip.

Speaker 0

对吧?

Right?

Speaker 0

就是密度更低。

It's just less dense.

Speaker 0

而这可以说是最难攻克的部分,对吧?

And this arguably is the hardest part, right, to get this thing.

Speaker 0

正如你所说,仅靠这个工艺就需要两年时间,如果他们无法突破这一点,这将是一个真正的障碍。

It takes forever, as you said, two years with just this process, and it is gonna be a real blocker if they're not able to crack it.

Speaker 1

是的。

Yeah.

Speaker 1

中国面临的基本问题是,由于他们的工艺节点较差,无法像台积电那样制造出同等质量的节点,因此被迫要么窃取台积电制造的芯片,要么设法让台积电代工他们的设计,通常通过子公司或空壳公司,伪装成来自新加坡或其他非华为的所谓‘干净’中国公司提出代工请求。

The fundamental issue China is dealing with is, because they have crappier nodes, so they can't fab the same quality of nodes as TSMC, they're forced to either steal TSMC-fabbed chips, or find clever ways of getting TSMC to fab their designs, often by using subsidiaries or shell companies to make it seem like, you know, maybe we're coming in from Singapore and asking TSMC to fab something, or we're coming in from a clean Chinese company, not Huawei, which is blacklisted.

Speaker 1

而另一方面,他们的替代方案是采用这些较差的7纳米工艺节点,这些节点的能效低得多。

And then the other side is, because their alternative is to go with these crappier seven nanometer process nodes, those are way less energy efficient.

Speaker 1

因此,这些芯片发热更严重,或者说运行温度更高,这意味着随着时间推移,会出现各种由热量引发的缺陷。

And so the chips burn hotter or they run hotter rather, which means that you run into all these kinds of heat induced defects over time.

Speaker 1

我想我们上一期或者前两期已经讨论过这个问题了。

And we covered that, I think, one or two episodes ago, the last episode I was on.

Speaker 1

所以,无论如何,这些由中芯国际未能跟上台积电步伐而引发的复杂问题,构成了一个庞大的难题。

So, anyway, there's a whole kind of hairball of different problems that come, ultimately, from the fact that SMIC has not managed to keep up with TSMC.

Speaker 0

对。

Right.

Speaker 0

你看到正在建造这些价值100亿、200亿美元的数据中心。

You're seeing all these $10,000,000,000 $20,000,000,000 data centers being built.

Speaker 0

这些数据中心配备了大量机架和海量GPU。

Those are being built with racks and racks and huge amounts of GPUs.

Speaker 0

你如何供电、如何散热等等,所有这些都取决于你所使用的硬件。

The way you do it, the way you supply energy, the way you cool it, etcetera, all of that is conditioned on the hardware you have in there.

Speaker 0

理想情况下,拥有最先进的硬件来进行建设非常重要。

It's very important to ideally have the state of art to build with.

Speaker 0

下一个故事也与硬件发展有关,这次是关于AMD的,他们现在推出了一款支持超以太网(Ultra Ethernet)的网卡——Pensando Pollara,最高可达每秒400吉比特,是这样吗?

Next story also related to hardware developments, this time about AMD, and they now have an ultra Ethernet ready network card, the Pensando Pollara, which provides up to 400 gigabits per second, is that it?

Speaker 0

每秒的性能。

Per second performance.

Speaker 0

这在他们举办的‘推进AI’活动上公布。

And this was announced at their Advancing AI event.

Speaker 0

它将实际部署在甲骨文云上,搭配AMD Instinct MI355X GPU和这款网卡。

It will be actually deployed by Oracle Cloud with the AMD Instinct MI355X GPUs and the network card.

Speaker 0

这很重要,因为AMD正在GPU领域与NVIDIA竞争,而其一系列GPU似乎正在迎头赶上,至少已被证明在AI应用中相当可用。

So this is a big deal because AMD is trying to compete with NVIDIA on the GPU front, and its various series of GPUs do seem to be catching up, or at least have been shown to be quite usable for AI.

Speaker 0

这是堆栈中的另一个环节——芯片间通信,但它非常重要,且与NVIDIA的举措密切相关。

This is another part of the stack, the inter chip communications, but it's very important and very significant in terms of what NVIDIA is doing.

Speaker 1

是的,完全正确。

Yeah, 100%.

Speaker 1

顺便说一下,这是业界首款符合超以太网标准的网卡,也就是网络接口卡。

This is, by the way, the industry's first ultra Ethernet compliant NIC, so a network interface card.

Speaker 1

网卡的作用是,你可以回顾一下我们之前关于硬件的那期内容来了解更多细节,但在一个机架内,或者在机柜级别,你的所有GPU都通过加速器互连技术紧密连接在一起。

So what the NIC does: and you can go back to our hardware episode to see more detail on this, but in a rack, say, at the rack level or at the pod level, you've got all your GPUs that are kinda tightly interconnected with accelerator interconnect.

Speaker 1

这种技术通常就是NVIDIA的NVLink产品。

This is often, like, the NVIDIA product for this is NVLink.

Speaker 1

这是一种超低延迟、但成本极高的互连技术。

This is super low latency, super expensive interconnect.

Speaker 1

但如果你想连接不同的机柜或机群,就不得不通过一个更慢的互连网络,也就是有时被称为后端网络的部分。

But then if you wanna connect, like, pods to other pods or racks to other racks, you're now forced to hop through a slower interconnect, part of what's known sometimes as the back end network.

Speaker 1

当你这么做的时候,你通常会使用NVIDIA的InfiniBand解决方案。

And when you do that, the NVIDIA solution you'll tend to use for that is InfiniBand.

Speaker 1

对吧?

Right?

Speaker 1

所以你有NVLink用于pod内部的连接,而pod之间则使用InfiniBand。

So you've got NVLink for the really tight, within-a-pod connections, but then from pod to pod, you have InfiniBand.

Speaker 1

InfiniBand长期以来一直是行业内的首选和事实上的黄金标准。

And InfiniBand has been a go to de facto, like, kind of gold standard in the industry for a while.

Speaker 1

非NVIDIA的公司并不喜欢这一点,因为这意味着NVIDIA掌控了更多技术栈,并在各个组件上形成了更深的事实垄断。

Companies that aren't NVIDIA don't like that because it means that NVIDIA owns more of the stack and has an even deeper kind of de facto monopoly on different components.

Speaker 1

因此,出现了一个名为Ultra Ethernet联盟的组织。

And so you've got this thing called the Ultra Ethernet Consortium that came together.

Speaker 1

该联盟由多家公司共同发起,包括AMD、Broadcom、Meta、微软和Intel等。

It's founded by a whole bunch of companies, AMD, notably Broadcom, I think Meta, Microsoft were involved, Intel.

Speaker 1

他们聚在一起说:嘿。

And they came together and said, hey.

展开剩余字幕(还有 480 条)
Speaker 1

让我们共同制定一种开源标准,这种互联技术具备AI优化功能,能够与NVIDIA现有的InfiniBand模型竞争。

Let's come up with an open source standard for this kind of interconnect with AI optimized features that basically can compete with the InfiniBand model that that NVIDIA has out.

Speaker 1

这就是Ultra Ethernet的由来。

So that's what Ultra Ethernet is.

Speaker 1

这个项目已经筹备很久了。

It's been in the works for a long time.

Speaker 1

我们刚刚发布了Ultra Ethernet协议1.0版本,专为超大规模AI应用和数据中心设计。

We've just had the announcement of specification 1.0 of that Ultra Ethernet protocol, and that's specifically for hyperscale AI applications and data centers.

Speaker 1

这实际上标志着行业的一次重大变革,而且已有不少迹象表明,企业将从InfiniBand转向这种新协议,其中一个关键原因是成本经济性。

And so this is actually a pretty seismic shift in the industry, and there are actually quite interesting indications that companies are going to shift from InfiniBand to this sort of protocol, and one of them is just cost economics.

Speaker 1

以太网在整条网络产业链中早已具备巨大的规模效应,而InfiniBand则相对小众。

Like, Ethernet has massive economies of scale already across the entire, like, networking industry, and InfiniBand's more niche.

Speaker 1

因此,Ultra Ethernet的芯片和交换机价格要便宜得多。

So as a result, you kind of have Ultra Ethernet chips and, like, switches that are just so much cheaper.

Speaker 1

所以你会非常喜欢这一点。

So you'd love that.

Speaker 1

你还拥有供应商独立性。

You also have vendor independence.

Speaker 1

因为这是一个开放标准,任何人都可以基于它进行开发,而不是只有NVIDIA掌控一切。

Because it's an open standard, anyone can build to it instead of just having NVIDIA own the whole thing.

Speaker 1

因此,利润率大幅下降,这一点显然让很多人非常满意。

So the margins go down a lot, and people really, really like that, obviously.

Speaker 1

还有各种运营优势。

All kinds of operational advantages.

Speaker 1

它在操作上更简单,因为数据中心已经熟悉以太网及其使用方式。

It's just operationally more simple because data centers already know Ethernet and how to work with it.

Speaker 1

所以,总之,这是一件非常值得关注的事情。

So, anyway, this is a really interesting thing to watch.

Speaker 1

我知道听起来可能有点无聊。

I know it sounds boring.

Speaker 1

这是数据中心内不同模块之间的互联技术,但顶级实验室的高管们对此非常重视,因为InfiniBand存在一些问题。

It's the interconnect between different pods in a data center, but this is something that executives at the top labs really sweat over because there are issues with the InfiniBand stuff.

Speaker 1

这是限制模型规模扩大的关键瓶颈之一。

This is one of the key rate limiters in terms of how big models can scale.

Speaker 0

对。

Right.

Speaker 0

是的。

Yeah.

Speaker 0

举个例子,甲骨文似乎计划部署这些最新的AMD GPU,构建一个zettascale AI集群,最多可配备131,072块Instinct MI355X GPU。

To give you an idea, Oracle is apparently planning to deploy these latest AMD GPUs in a zettascale AI cluster with up to 131,072 Instinct MI355X GPUs.

Speaker 0

当你想到这些数字时,131,000块GPU。

So when you get to those numbers, think of it, 131,000 GPUs.

Speaker 0

GPU并不小,对吧?

GPUs aren't small, right?

Speaker 0

GPU相当大。

The GPUs are pretty big.

Speaker 0

它们可不是小小的芯片。

They're not like a little chip.

Speaker 0

它们大概有笔记本电脑那么大。

They're, I don't know, like notebook sized ish.

Speaker 0

现在你需要连接多达131,000个这样的设备。

And there's now 131,000 that you need to connect all of them.

Speaker 0

当你提到‘机柜’时,通常是一排排像书架一样的设备,用线缆连接起来,但每排最多只能连64个左右。

And when you say pod, right, typically you have this rack of them, almost a bookcase, you could think, where you connect them with wires, but you can only get, I don't know how many, typically 64 or something, in one rack.

Speaker 0

当你达到131,000这个数量级时,这些问题就变得至关重要了。

When you get to 131,000, this kind of stuff starts really mattering.

Speaker 0

在他们这次活动的演示文稿中,他们非常明确地将自己与竞争对手对比,声称相对InfiniBand有20倍的扩展能力(不管那是什么意思),性能比竞争对手高20%之类的。

In their slides at this event, they did, let's say, very clearly compare themselves to the competition, saying that this has 20x scale over InfiniBand, whatever that means, has performance 20% over the competition, stuff like that.

Speaker 0

AMD正在积极竞争,试图提供一些在某些方面领先于英伟达、博通等公司的产品。

AMD is very much trying to compete and be offering things that are in some ways ahead of NVIDIA and others, like Broadcom and so on.
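As a back-of-envelope way to see why a 400 Gbps link matters at this scale, here is a minimal sketch. The 70-billion-parameter model and 16-bit precision are our own illustrative assumptions, not figures from the episode:

```python
# Back-of-envelope bandwidth math for a 400 Gbps NIC.
# Assumptions (ours, not from the episode): 8 bits per byte, and a
# hypothetical 70B-parameter model stored in 16-bit precision.
LINK_GBPS = 400                      # Pensando Pollara line rate
BYTES_PER_SEC = LINK_GBPS * 1e9 / 8  # 50 GB/s of raw throughput

model_bytes = 70e9 * 2               # 2 bytes per parameter at 16-bit

seconds = model_bytes / BYTES_PER_SEC
print(f"{BYTES_PER_SEC / 1e9:.0f} GB/s, ~{seconds:.1f} s to move one full copy")
```

So even at the full advertised line rate, shipping a full copy of such a model between pods takes a few seconds, which is why cross-pod interconnect shows up as a scaling bottleneck.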

Speaker 0

接下来是另一个硬件新闻,这次涉及能源问题。

And next up, another hardware story, this time dealing with energy.

Speaker 0

亚马逊正加入大型核电行列,从塔伦能源(Talen Energy)位于宾夕法尼亚州的苏斯奎哈纳核电站购买1.92吉瓦的电力。

Amazon is joining the big nuclear party by buying 1.92 gigawatts of electricity from Talen Energy's Susquehanna nuclear plant in Pennsylvania.

Speaker 0

所以,核能用于AI,现在非常流行。

So nuclear power for AI, it's all the rage.

Speaker 1

是的。

Yeah.

Speaker 1

我的意思是,如果我们回溯一下,最初他们想达成的是960兆瓦的交易,但被监管机构否决了,因为他们担心这会给电网上的用户带来不公平的负担。

I mean, we've known about this. If you flip back, originally, this was the 960 megawatt deal that they were trying to make, and that got killed by regulators who were worried about customers on the grid.

Speaker 1

本质上,那些日常使用电网的人,在他们看来,会不公平地承担维持电网运行的代价。

Essentially, everyday people who are using the grid would, in their view, unfairly shoulder the burden of running the grid.

Speaker 1

如今,Susquehanna电厂为电网供电,这意味着他们每输入一度电,都会产生输电费用,用于支持电网的维护。

Today, Susquehanna powers the grid, and that means every kilowatt hour that they put in leads to transmission fees that support the grid's maintenance.

Speaker 1

因此,亚马逊原本打算绕过电网,直接将电厂与他们的数据中心连接起来。

And so what Amazon was going to do was go behind the meter, basically link the power plant directly to their data center without going through the grid.

Speaker 1

这样一来,就不会产生电网费用,这意味着普通的电网基础设施长期无法从这些费用中获益,就像上高速公路却不交过路费一样。

So there wouldn't be grid fees, and that basically just means that the general grid infrastructure doesn't get to benefit from those fees over time, sort of like not paying a toll when you go on a highway.

Speaker 1

而这项新的交易,将总功率提升至1.92吉瓦,正是对原先计划的调整。

And this new deal, which gets us to 1.92 gigawatts, is a revision of that.

Speaker 1

这实际上是让亚马逊通过电表前的方式,按常规途径使用电网。

It's got Amazon basically going in front of the meter, going through the grid in the usual way.

Speaker 1

正如你所想象的,需要重新配置大量基础设施,包括输电线路。

There's gonna be, as you can imagine, a whole bunch of infrastructure that needs to be reconfigured, including transmission lines.

Speaker 1

这些工程将在2026年完成,而该协议涵盖的电力采购将持续到2042年,这有点讽刺,因为这就像试图在飓风来临前修好房子。

Those will be done in 2026, and the deal apparently covers energy purchased through 2042, which is sort of amusing because, like, imagine trying to fix your house before a hurricane.

Speaker 1

但确实如此。

But yeah.

Speaker 0

我猜他们预测到2042年仍然需要电力,假设X风险没有成真的话,我想是的。

I guess they are predicting that they'll still need electricity by 2042, which, assuming X-risk doesn't come about, I suppose they will.

Speaker 0

下一个故事也涉及核能和英伟达。

Next story, also dealing with nuclear and with NVIDIA.

Speaker 0

它正与比尔·盖茨等人一起投资TerraPower公司,该公司正在建造用于为数据中心供电的核反应堆。

It is joining Bill Gates and others in backing TerraPower, a company building nuclear reactors for powering data centers.

Speaker 0

这是通过英伟达的风险投资部门NVentures进行的。

This is through NVIDIA's venture capital arm, NVentures.

Speaker 0

他们已经投资了TerraPower这家公司,投资额似乎为6.5亿美元,与HD现代一同参与。

They have invested in this company TerraPower, putting in what seems like $650,000,000 alongside HD Hyundai.

Speaker 0

TerraPower目前正在怀俄明州开发一座345兆瓦的钠冷反应堆。

TerraPower is developing a 345 megawatt Natrium plant in Wyoming right now.

Speaker 0

我想他们正处在让这项技术变得可用的过程中,不过可能还要好几年才能实现。

They're, I guess, in the process of starting to get to a point where this is usable, although it probably won't come for some years.

Speaker 1

你在时间点上的直觉完全正确。

Your instincts are exactly right on the on the the timing too.

Speaker 1

对吧?

Right?

Speaker 1

现在有很多关于小型模块化反应堆的讨论,它们是一种非常高效且安全的现场核能发电方式。

So there's a lot of talk about SMRs, like small modular reactors, which are just a very efficient way and very safe way of generating nuclear power on-site.

Speaker 1

这正是它们令人兴奋的地方。

That's the exciting thing about them.

Speaker 1

除了聚变之外,它们显然是未来为数据中心供电的明显解决方案。

Apart from, like, fusion, they are the obvious solution of the future for powering data centers.

Speaker 1

当你与数据中心公司和建设方交谈时,他们总会说,是的,SMRs很棒,但我们现在关注的是首批SMRs获得批准并开始发电的时间,最早也得是2029年、2030年左右。

The challenge is, when you talk to data center companies and builders, they'll always tell you, like, yeah, SMRs are great, but, you know, we're looking at first approvals, first SMRs generating power, like, at the earliest, you know, 2029, 2030 type thing.

Speaker 1

所以,如果你认为AGI的实现时间较短,那么这些技术对它们来说根本无关紧要。

So, you know, if you have sort of shorter AGI timelines, they're not gonna be relevant at all for those.

Speaker 1

如果你的预期时间更长,哪怕只是稍微长一点,它们就会变得有意义。

If you have longer timelines, even kind of somewhat longer timelines, then they do become relevant.

Speaker 1

因此,这是一个非常有趣的空间,我们即将看到能源基础设施类型的更替。

So it's a really interesting space where we're going to see a turnover in the kind of energy generation infrastructure that's used.

Speaker 1

而且,人们经常谈论中国在能源方面的优势,这确实没错。

And this, you know, people talk a lot about China and their energy advantage, which is absolutely true.

Speaker 1

我很好奇,这是否能让美国的能源行业像中国在移动支付领域那样,在SMRs上实现类似的跨越式发展。

I'm quite curious whether this allows the American energy sector to do a similar leapfrogging on SMRs that China did, for example, on mobile payments.

Speaker 1

对吧?

Right?

Speaker 1

当你根本无法在十年内建成核电站时——美国目前就是这样,我们既缺乏相关经验,也缺乏在工业基础上放松监管的意愿,那就只能被迫去寻找其他选择。

When you just, like, do not have the ability to build nuclear plants in less than ten years, which is the case for The United States, we just don't have that know-how and, frankly, the willingness to deregulate to do it in the industrial base, then it kinda forces you to look at other options.

Speaker 1

因此,如果电力生产格局发生转变,这可能会带来一些追赶的机会。

And so if there's a shift just in the landscape of power generation, it can introduce some opportunities to play catch up.

Speaker 1

所以,我想这算是一个大胆的观点,我还没深入思考过,但无论如何,这为小型模块化反应堆的故事增添了一个有趣的维度。

So, I guess that's a hot take there that I haven't thought enough about, but that's an interesting dimension anyway to the SMR story.

Speaker 0

顺便说一下,一吉瓦相当于130万马力。

By the way, one gigawatt is apparently equivalent to 1,300,000 horsepower.

Speaker 0

我不确定这是否能让你对一吉瓦有概念,但它的能量非常巨大。

So not sure if that gives you an idea of what a gigawatt is, but it's a lot of energy.

Speaker 1

或者够一百万户家庭使用。

Or 1,000,000 homes.

Speaker 0

是的。

Yeah.

Speaker 0

一百万户家庭是一天的用量吗?还是说

1,000,000 homes for one day or what does that

Speaker 1

实际上,吉瓦是功率单位,指的是百万户家庭所消耗的电力总量,是的。

Actually, so a gigawatt is a unit of power, so it's like the amount of power that a million homes consume, yeah.

Speaker 1

按持续运行计。

On a running basis.

Speaker 0

没错。

exactly.

Speaker 0

所以十亿瓦特已经很多了,三百四十五兆瓦也不少。

So one gigawatt is a lot, and so is 345 megawatts.
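To make those conversions concrete, a quick sketch; the 745.7 watts per mechanical horsepower and the roughly 1 kilowatt average continuous draw per US home are our own assumed constants:

```python
# Rough conversions for the power figures mentioned above.
# Assumptions (ours): ~745.7 W per mechanical horsepower and an
# average continuous draw of ~1 kW per US home.
WATTS_PER_GW = 1e9
WATTS_PER_HP = 745.7
WATTS_PER_HOME = 1_000

def gw_to_horsepower(gw: float) -> float:
    """Convert gigawatts to mechanical horsepower."""
    return gw * WATTS_PER_GW / WATTS_PER_HP

def gw_to_homes(gw: float) -> float:
    """Convert gigawatts to an equivalent number of average homes."""
    return gw * WATTS_PER_GW / WATTS_PER_HOME

print(f"1 GW   ~ {gw_to_horsepower(1.0):,.0f} hp")    # ~1.34 million hp
print(f"1 GW   ~ {gw_to_homes(1.0):,.0f} homes")      # ~1 million homes
print(f"345 MW ~ {gw_to_homes(0.345):,.0f} homes")
```

Since a gigawatt is a rate, not a quantity of energy, these home counts hold for as long as the plant keeps running, which matches the "on a running basis" point above.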

Speaker 0

接下来聊聊一些融资消息。

Now moving on to some fundraising news.

Speaker 0

米拉·穆拉蒂的公司Thinking Machines Lab已经完成融资,以100亿美元的估值获得了20亿美元。

Mira Murati, her company Thinking Machines Lab has finished up their fundraising, getting $2,000,000,000 at a $10,000,000,000 valuation.

Speaker 0

这还是种子轮。

And this is the seed round.

Speaker 0

所以又是一轮十亿美元的种子轮融资。

So yet another billion-dollar seed round.

Speaker 0

这当然是前OpenAI首席技术官在2024年离职后创立的,一直在筹建Thinking Machines Lab,作为AGI领域的另一家竞争者, presumably计划自研模型,招募了多位研究人员,其中一些来自OpenAI,如今手握数十亿美元,显然打算用于训练这些大型模型。

And this is of course the former CTO of OpenAI, who left in 2024, I believe, and has been working on setting up Thinking Machines Lab, another competitor in the AGI space, presumably planning to train their own models, recruited various researchers, some of them from OpenAI, and now has billions to work with, to deploy, presumably, to train these large models.

Speaker 1

是的。

Yeah.

Speaker 1

这很有趣。

It's funny.

Speaker 1

由于涉及的人才水平如此之高,大家自然都觉得融资额必须是个以十亿为单位的数字。

Everyone just kinda knew that it was gonna have to be a number with billion after it, just because of the level of talent involved.

Speaker 1

这确实是一支非凡的人才团队。

It is a remarkable talent set.

Speaker 1

本轮融资由安德烈·库尔恩科夫主导,因此他在资本结构中现在占了16%。

The round is led by Andrey Kurenkov, so a 16 z on the cap table now.

Speaker 1

但值得注意的是,Thinking Machines 并未向投资者透露他们正在做什么。

Notably, though, Thinking Machines did not say what they're working on to their investors.

Speaker 1

至少这篇报道看起来是这个意思。

At least, that's what it sounds like from this article.

Speaker 1

措辞可能稍微有些模糊。

The wording is maybe slightly ambiguous.

Speaker 1

我直接念出来吧。

I'll just read it explicitly.

Speaker 1

你可以自己判断。

You can make up your mind.

Speaker 1

Thinking Machines Lab 并未公开他们正在研究的内容,而是借助穆拉蒂的名望来吸引投资者。

Thinking Machines Lab had not declared what it was working on, instead using Murati's name and reputation to attract investors.

Speaker 1

这表明他们虽然没有全额开出20亿美元的支票,但确实领投了这一轮。

So that suggests that a16z didn't cut the full $2,000,000,000 check, but they led the round.

Speaker 1

所以,仅仅因为这样,就投入了数亿甚至数十亿美元。

So hundreds and hundreds of millions of dollars just on the basis of, like, yeah.

Speaker 1

你知道,米拉是个非常严肃的人。

You know, Mira's a serious fucking person.

Speaker 1

约翰·舒尔曼也是个非常严肃的人。

John Schulman's a serious fucking person.

Speaker 1

你知道,乔纳森·拉赫曼,还有各种各样的人,比如巴雷特·佐夫。

You know, Jonathan Lachman, like, all kinds of people, Barret Zoph.

Speaker 1

这些都是非常严肃的人。

These are these are really serious people.

Speaker 1

所以我们给你开8亿美元的支票,不管他们在这笔投资中出了多少。

So we'll cut you an $800,000,000 check, or whatever they cut as part of that.

Speaker 1

这既疯狂又充分说明了这个领域是如何被定价的。

That's both insane and tells you a lot about how the the space is being priced.

Speaker 1

我们还知道另一件奇怪的事,之前也讨论过,但值得再提一下。

The other weird thing we know, and we talked about this previously, but it bears kind of repeating.

Speaker 1

因此,穆拉蒂将掌握投票权,其投票权超过所有其他董事的总和。

So Murati, Mira, is gonna hold board voting rights that outweigh all other directors combined.

Speaker 1

这很奇怪。

This is a weird thing.

Speaker 1

对吧?

Right?

Speaker 1

这是怎么回事?这些AGI公司怎么都有如此古怪的董事会结构?

What is it with all these AGI companies and the really weird board structures?

Speaker 1

很多情况就像是OpenAI的圈子,那些曾在OpenAI工作过的人不喜欢萨姆的做法,吸取了教训,然后把这些经验固化在他们公司的运营方式和实际企业结构中。

A lot of it is just like the OpenAI mafia, like people who worked at OpenAI did not like what Sam did, learned those lessons, and then enshrined that in the way they run their company, in their actual corporate structure.

Speaker 1

Anthropic设立了公共利益公司架构,并配有监督委员会,而Thinking Machines则采用了米拉·穆拉蒂的独裁式结构,她在公司几乎所有事务上都拥有最终决定权。

And Anthropic has, you know, their public benefit company set up with their their oversight board, and now Thinking Machines has this Mira Murati dictatorship structure where she has final say, basically, over over everything at the company.

Speaker 1

顺便说一句,我听到的关于她的所有评价都极其出色。

By the way, everything I've heard about her is is exceptional.

Speaker 1

每当我跟任何一位曾与OpenAI共事过的人谈起米拉,他们都会对她赞不绝口。

Like, every OpenAI person I've ever spoken to about Mira has just, like, glowing things to say about her.

Speaker 1

尽管20亿美元在你相信规模定律的前提下,确实不足以参与竞争,但它揭示了一点:人们在选择去哪里工作时,会考虑‘我会和谁共事’这样的因素。

And so even though $2,000,000,000 is not really enough to compete if you believe in scaling laws, it tells you something about, you know, the kinds of decisions people will make about where they work, including who will I be working with.

Speaker 1

我认为,这似乎是促使这么多人离开OpenAI的一个重要因素。

And this seems to be a big factor, I would guess, in all these people leaving OpenAI.

Speaker 1

她确实是一位真正非凡的人物——我虽然从未见过她,但所有我听到的评价都极为正面,既称赞她的能力,也赞赏她与人合作时的从容与高效。

She does seem to be a genuinely exceptional person. Like, I've never met her, but, again, everything I've heard is just, like, glowing, both in terms of competence and in terms of smoothness of working with her.

Speaker 1

因此,这或许也是吸引如此多人才加入的原因之一。

So that may be part of what's attracting all this talent as well.

Speaker 0

是的。

Yes.

Speaker 0

关于他们不太清楚自己在构建什么这一点:如果你访问 thinkingmachines.ai(这种情况已经持续一段时间了),你会看到一整页文字。

And on the point of not quite knowing what they're building: if you go to thinkingmachines.ai, and this has been the case for a while, you'll get a page of text.

Speaker 0

这段文字是一份使命宣言,确实说了不少东西。

The text reads a mission statement that sure is saying a lot.

Speaker 0

里面提到了科学进步是集体努力、强调人机协作、更个性化的AI系统、基础设施质量、先进的多模态能力、研究与产品协同设计、实证迭代的AI安全方法、衡量真正重要的东西。

There's stuff about scientific progress being a collective effort, emphasizing human AI collaboration, more personalized AI systems, infrastructure quality, advanced multimodal capabilities, research product co design, empirical iterative approach to AI safety, measuring what truly matters.

Speaker 0

我完全搞不懂。

I have no idea.

Speaker 0

这就像堆了一大堆话,你可以从中解读出任何你想得到的意思。

This is like just saying a whole bunch of stuff and you can really take away whatever you want.

Speaker 0

我推测,它大概会直接与OpenAI和Anthropic竞争,这是一种印象。

Presumably it'll be something that competes with OpenAI and Anthropic fairly directly, is my impression.

Speaker 0

在 thinkingmachines.ai 页面底部,‘创始团队’列出了几十个名字,每个名字都可以悬停查看他们的背景,正如你所说,都是真正的重量级人物。

Near the bottom of the page at thinkingmachines.ai, "Founding team" has a list of a couple dozen names, each one you can hover over to see their background, and as you say, these are real heavy hitters.

Speaker 0

还有顾问团队和加入我们的页面。

Then there are advisors and a join us page.

Speaker 0

所以,是的,如果你建立了声誉,并且拥有一些硅谷的顶尖人才,这会带来很大优势。

So yeah, it really tells you: if you gain a reputation and you have some real star talent in Silicon Valley, that goes a long way.

Speaker 0

说到这里,下一个相关的故事是:Meta招聘了一些关键的OpenAI研究人员,来开发他们的AI推理模型。

And on that note, next story quite related, Meta has hired some key OpenAI researchers to work on their AI reasoning models.

Speaker 0

所以,一周或两周前,我们讨论过Meta如何投入大量资金投资Scale AI,并挖走了Scale AI的创始人Alex Wang,让他领导他们的超级智能项目。

So a week or two ago, we talked about how Meta paid a whole bunch of money, invested rather, in Scale AI and hired away the founder of Scale AI, Alex Wang, to head their new superintelligence efforts.

Speaker 0

现在有这些报道。

Now there are these reports.

Speaker 0

我不确定这是否特别因为OpenAI而被强调,或者这只是些吸引人的细节。

I don't know if this is highlighting it particularly because of OpenAI or perhaps this is just juicy details.

Speaker 0

我确信Meta也招聘了其他工程师和研究人员,但我想这个案例值得特别提及。

I'm sure Meta has hired other engineers and researchers as well, but I suppose this one is worth highlighting.

Speaker 0

他们确实招聘了一些来自OpenAI的知名人物。

They did hire some fairly notable figures from OpenAI.

Speaker 0

他们是卢卡斯·拜尔(Lucas Beyer)、亚历山大·科列斯尼科夫(Alexander Kolesnikov)和翟晓华(Xiaohua Zhai),我认为他们创立了苏黎世办公室。

So this is Lucas Beyer, Alexander Kolesnikov, and Xiaohua Zhai, who I believe founded the Zurich office.

Speaker 0

瑞士的办公室,他们似乎是OpenAI的一支相当重要的团队,至少在我看来是这样。

The Switzerland office, anyway. They were a fairly significant team at OpenAI, or so it appears to me.

Speaker 0

我认为卢卡斯·拜尔曾在推特上发帖称,我们被支付了一亿美元的说法是假新闻。

I think Lucas Beyer did post on Twitter and say that the idea that they were paid $100,000,000 was fake news.

Speaker 0

这又是另一件悬而未决的事情。

This is another thing that's been up in the air.

Speaker 0

萨姆·阿尔特曼一直在以一种温和的方式暗示,Meta一直在承诺极其丰厚的薪酬待遇。

Sam Altman has been taking, you could say, some gentle swipes saying that Meta has been promising insane pay packages.

Speaker 0

所以所有这些都表明,马克·扎克伯格正在非常积极地争夺人才。

So all this to say is this is just another indication of Mark Zuckerberg very aggressively going after talent.

Speaker 0

他亲自在WhatsApp等平台上给数十人发消息,说:嘿。

I know he's been personally messaging dozens of people on WhatsApp and whatever, being like, hey.

Speaker 0

来Meta工作吧。

Come work for Meta.

Speaker 0

或许并不令人意外,这在某种程度上正为这个超级智能领域吸引着更多人才。

And perhaps unsurprisingly, that is paying off in some ways in expanding the talent of this superintelligence scene.

Speaker 1

是的。

Yeah.

Speaker 1

这件事有很多既奇怪又有趣的地方。

There's a lot that's both weird and interesting about this.

Speaker 1

第一点是,任何低于这个标准的做法都将毫无价值。

The first thing is anything short of this would be worth zero.

Speaker 1

当你处于扎克伯格的位置时,我会简单地说一下——这多少受到我个人对这个领域谁对谁错的解读影响,但我认为,公平地说,事情正变得越来越清晰。

When you're in Zuck's position, and I'll just say, this is colored by my own interpretation of who's right and who's wrong in this space, but I think it's increasingly just becoming clear, in fairness.

Speaker 1

我认为这并不仅仅是我的偏见。

I don't think it's just my bias in saying that.

Speaker 1

当你的公司尽管拥有前沿规模的算力资源——也就是说,在基础设施这一最困难、最昂贵的方面完全没有任何借口失败——却依然把AI项目搞砸到如此灾难性的地步,原因在于你的文化被杨立昆所主导,即使他并非真正领导着你们的AI团队,至少最近几年他的影响力已经大不如前。但他在Meta内部树立了对AGI持怀疑态度、对规模扩张持怀疑态度的基调,然后又以维护自我形象的方式改变立场,却从不承认自己改变了主意。

When your company's AI efforts, despite having access to absolutely frontier scales of compute, so having no excuses for failure on the basis of access to infrastructure, which is the hardest and most expensive thing, when you've managed to tank that so catastrophically, it's because your culture is screwed up by having Yann LeCun as the mascot, if not the leader, of your internal AI efforts. He's not actually as influential as it sounds, or hasn't been for a while, on the internals of Facebook, but he has set the beat at Meta: being kinda skeptical about AGI, being skeptical about scaling, and then, like, changing his mind in ego-preserving ways without admitting that he's changed his mind.

Speaker 1

我认为这些都是非常有害的。

I think these are very damaging things.

Speaker 1

他们摧毁了Meta的信誉,并造成了这种损害。

They destroy the credibility of Meta and have done that damage.

Speaker 1

我认为,Meta如今远远落后,很大程度上是由于杨立昆的个性以及他无法相应更新、保持认知谦逊所致。

And I think the fact that Meta is so far behind today is a reflection, in large part a consequence, of Yann LeCun's personality and his inability to kind of update accordingly and maintain epistemic humility on this.

Speaker 1

我想每个人都能看出来。

I think everybody can see it.

Speaker 1

他就像那个还在对云朵大喊大叫的老人,当云朵形状改变时,他却假装它们没变。

He's like the old man who's still yelling at clouds and just, like, as the clouds change shape, he's, like, trying to pretend they're not.

Speaker 1

但就我个人而言,如果我要决定去哪里工作,这会是一个巨大的因素,而且客观上已经导致了对全球最令人印象深刻的AI基础设施之一的灾难性利用失败。

But I think, just speaking for myself, if I were making a decision about where to work, that would be a huge factor, and it has just objectively played out in a catastrophic failure to leverage one of the most impressive fleets of AI infrastructure that there actually is.

Speaker 1

因此,我们看到的这批招聘人员,其思维方式与杨立昆的观念完全对立,Meta在挖人方面不可能更彻底地转向了。

And so what we're seeing with this set of hires is people who are, I mean, so completely antithetical to Yann LeCun's way of thinking. Like, Meta could not be pivoting harder in terms of the people it's poaching here.

Speaker 1

首先,OpenAI显然是这个领域中最崇尚规模化的组织之一,可能是最崇尚规模化的。

First of all, OpenAI, obviously, one of the most scale pilled organizations in the space, probably the most scale pilled.

Speaker 1

Anthropic也同样位居前列。

Anthropic actually is is up there too.

Speaker 1

但还有Scale AI的亚历克斯·王。

But, also, Scale AI's Alex Wang.

Speaker 1

所以,好吧,这挺有意思的。

So, okay, that's interesting.

Speaker 1

这家伙非常注重规模扩展,也非常关注AI安全。

Very scale pilled dude, also very AI safety pilled dude.

Speaker 1

丹尼尔·格罗斯,可以说也非常关注AI安全。

Daniel Gross, arguably quite AI safety pilled.

Speaker 1

至少‘安全超级智能’曾经是他们的口号。

At least that was the mantra of safe superintelligence.

Speaker 1

他这么快就离开了,真奇怪。

Weird that he left that so soon.

Speaker 1

顺便说一句,如果丹尼尔·格罗斯现在离开了,关于‘安全超级智能’目前状况的问题还有很多。

A lot of open questions about how safe superintelligence is doing, by the way, if Daniel Gross is now leaving.

Speaker 1

我的意思是,DG可是首席执行官啊。

I mean, DG was the CEO.

Speaker 1

对吧?

Right?

Speaker 1

和伊利亚共同创立了它。

Cofounded it with Ilya.

Speaker 1

那那边到底发生了什么?

So what's going on there?

Speaker 1

但这仍是个悬而未决的问题。

But so that's a hanging chad.

Speaker 1

但如今丹尼尔·格罗斯转到了Meta这边,你必须聚集足够多的顶尖人才,才能吸引其他顶尖人才加入。

But with Daniel Gross now over on the Meta side, you have to have enough of a concentration of exquisite talent to make it attractive for other exquisite talent to join.

Speaker 1

如果你无法突破这个关键人数门槛,那还不如什么都没有,这正是Meta一直以来的问题。

If you don't break that critical mass, you might as well have nothing, and that's been Meta's problem this whole time.

Speaker 1

他们需要像这样,用巨额资金注入来启动这件事。

They needed to just, like, jump start this thing with a massive capital infusion.

Speaker 1

再次强调,这些巨额薪酬就是从这里来的。

Again, these massive pay packages, that's where it's coming from.

Speaker 1

只要给人们一个理由回来,提供一些早期的成果,重新激发人们对Meta的兴趣。

Just give people a reason to come, get some early proof points to get people excited about Meta again.

Speaker 1

奇怪的是,尽管有这一切,我并不完全有信心这么说,但你可以看到Meta在安全问题上的立场未来可能会有所转变,因为杨立昆曾如此轻视这个问题,但现在他们被迫招聘的许多人,客观来看,都与那些真正引领前沿、并认真对待AI失控风险的团队高度重合。

And the weird thing is, with all this, I'm not confident at all in saying this, but you could see a different line from Meta on safety going forward too, because Yann LeCun was so dismissive of it. But if you look at it objectively, there is a strong correlation between the people and teams who are actually leading the frontier and the people and teams who take loss of control over AI seriously, and those are now a lot of the people Meta has been forced to hire.

Speaker 1

如今,Meta在某种程度上被迫改变其DNA,认真对待这个问题。

Now Meta is kind of forced to change, in some sense, its DNA to take that seriously.

Speaker 1

所以我觉得这真是一个非常有趣的转变。

So I think that's just a really interesting, like, shift.

Speaker 1

我知道这样说杨立昆听起来非常苛刻。

And I know this sounds really harsh with respect to Yann LeCun.

Speaker 1

你懂的,就当是这么回事吧。

Like, you know, take it for what it is.

Speaker 1

这只是一个人的观点,但我已经和许多持相同看法的研究人员交流过。

It's just one man's opinion, but I have spoken to a lot of researchers who feel the same way.

Speaker 1

而且,再次强调,我觉得数据在某种程度上支持这一点。

And, again, I mean, I think the data kinda bears it out.

Speaker 1

本质上,马克·扎克伯格现在正在为杨立昆支付代价。

Essentially, Mark Zuckerberg is being forced to pay the Yann LeCun tax right now.

Speaker 1

我不知道杨立昆未来会怎样,但我确实有点怀疑他的Meta生涯是否即将结束,或者是否需要采取某种保全面子的措施。

And I don't know what happens to Yann LeCun going forward, but I do kind of wonder if his Meta days may be numbered or, you know, if there's gonna be a face-saving measure that has to be taken there.

Speaker 0

对。

Right.

Speaker 0

背景信息:杨立昆是Meta的首席人工智能科学家。

For context, Yann LeCun is Meta's chief AI scientist.

Speaker 0

他已经在Meta工作了十多年,我记得是2012年或2013年被Meta聘用的。

He's been there for over a decade, hired, I think, 2013, 2012 by Meta.

Speaker 0

他是过去几十年神经网络发展的关键人物之一,无疑是深度学习兴起的主要研究者和贡献者之一,但正如你所说,他对大型语言模型持怀疑态度,而支持其他技术路线。

One of the key figures in the development of neural networks, really, over the last couple of decades, and certainly a major researcher and contributor to the rise of deep learning in general, but, as you said, a skeptic on large language models and a proponent of other techniques.

Speaker 0

我个人并不完全认同这种说法。

I will say not entirely bought into this narrative personally.

Speaker 0

据我所知,负责Llama和大语言模型项目的并不是杨立昆。

The person heading up the effort on Llama and LLMs was not Yann LeCun, as far as I'm aware.

Speaker 0

Meta内部还有一个专注于生成技术的部门,现在已经进行了重组。

There was another division within Meta that focused on generative technology that has now been revamped.

Speaker 0

负责生成式AI工作的负责人已经离职,现在正在组建一个全新的部门,名为AGI基础团队。

The person leading the generative AI efforts in particular has left, and now there is an entirely new division called AGI Foundations that is now being set up.

Speaker 0

因此,这是大规模重组的一部分。

So this is part of a major revamp.

Speaker 0

杨立昆仍然领导着他那更偏向研究发表的那部分工作。

Yann LeCun is still leading his more research-publication type side of things.

Speaker 0

据我所知,他并未深度参与扩大Llama等工作的这一侧,这部分更偏向研发,旨在与OpenAI等公司竞争,而非纯粹的研究。

And perhaps, as far as I know, not very involved in this side of scaling up Llama and all of this, which is less of a research effort and more of an R&D, kind of compete-with-OpenAI effort.

Speaker 1

完全同意。

Absolutely agree.

Speaker 1

我之前说杨立昆不参与公司日常产品事务时,指的就是这个。

That was what I was referring to when I was saying Yann LeCun is not really involved in the day-to-day kind of product side of the org.

Speaker 1

长期以来大家都知道,他实际上并没有亲自操刀LaMind的工作,但他多年来一直定义并阐述了Meta对人工智能和AI扩展的核心理念。

You know, it's been known for a while that he's not actually doing the heavy lifting on Llama, but he has defined, essentially articulated, Meta's philosophy on AI and AI scaling for the last however many years.

Speaker 1

因此,人们普遍认为,当你加入Meta时,至少过去是这样,你是在认同一种与杨立昆相一致的哲学,我认为这正是Meta如今处境的核心驱动力。

And so it's understood that when you join Meta, at least it was, that you were buying into a sort of Yann LeCun-aligned philosophy, which I think is the core driving problem behind where Meta finds itself today.

Speaker 0

是的,这确实是其中一部分原因。

Yeah, that's definitely part of it.

Speaker 0

我的意思是,这正是Meta作为AI研究俱乐部的声誉所在。

I mean, that's part of the reputation of Meta as an AI research club.

Speaker 0

此外,Meta的优势之一,也是人们可能选择加入Meta的原因,是他们非常支持开源。

Also, I mean, part of the advantage of Meta and why people might want to go to Meta is because of their very open source

Speaker 1

他们之所以友好地支持开源,是因为他们别无选择——这是他们在不断发布媒体内容时获得关注的唯一方式。

They're only very open-source friendly because they're forced to do that, because it's the only way they can get headlines while they pump out media

Speaker 0

但无论如何,这仍然是一个重要的因素。

But regardless, it's still a factor here.

Speaker 0

关于这个故事,最后一点值得注意的是,你可以对Meta试图将大量人员投入到这个问题中、将规模从几百人扩大到上千人这一做法进行一番推测性分析。

One last thing worth noting on this whole story: you could do a whole speculative analysis of how Meta also tried to throw a lot of people at the problem, scaling up from a couple hundred to like a thousand people.

Speaker 0

我认为,这可能和谷歌的情况类似,都是大公司面临的问题,而OpenAI则不是。

I think they probably had a similar situation to Google where it was big company problems. OpenAI

Speaker 1

and

Speaker 0

Anthropic规模很大,但他们没有大公司的问题。

Anthropic, they're huge, but they don't have big company problems.

Speaker 1

那是一个

That's a

Speaker 0

很好的观点。

great point.

Speaker 0

他们正在应对公司扩张的问题。

They have scaling-company problems.

Speaker 0

所以这次改革可能

So this revamp could

Speaker 1

也是,是的。

also Yeah.

Speaker 1

有助于

Help with

Speaker 0

接下来是研究与进展。

On to research and advancements.

Speaker 0

我想,不再谈这些八卦了。

No more drama talk, I guess.

Speaker 0

接下来,我们来看DeepMind的一个故事,他们开发了Alpha Genome,这是他们Alpha科学模型系列的最新成果。

Next, we have a story from DeepMind, and they have developed Alpha Genome, the latest in their Alpha line of scientific models.

Speaker 0

这个模型专注于帮助研究人员理解基因功能。

This one is focused on helping researchers understand gene functions.

Speaker 0

它并不用于个人基因组预测,而是更侧重于一般性模式的识别。

It's not meant for personal genome prediction, but more so just general identification of patterns.

Speaker 0

它有助于识别患有超罕见癌症患者的致病突变。

It could help identify causative mutations in patients with ultra-rare cancers.

Speaker 0

例如,哪些突变导致了基因表达异常。

For instance, which mutations are responsible for incorrect gene expression.

Speaker 0

说实话,这里涉及大量关于生物学和基因组学的深奥知识,而我对这些完全不专业。

I'm going to be honest, there's a lot of deep science here with regards to biology and genomics, which I am not at all an expert on.

Speaker 0

其核心理念与AlphaFold以及其他Alpha系列项目类似。

The gist of it is similar to Alpha Fold, similar to other alpha efforts.

Speaker 0

在遗传学家所面对的问题、预测任务和分析等基准测试中,Alpha Genome的表现远远超越了所有现有技术。

On the benchmarks dealing with the problems that geneticists deal with, the kind of prediction issues, the analysis, Alpha Genome kind of beats all existing techniques out of the park.

Speaker 0

在几乎每一个基准测试中,它都超越了以往的努力。

On almost every single benchmark, it is superseding previous efforts.

Speaker 0

这一个模型就能同时完成许多任务。

So the one model is able to do a lot of things all at once.

Speaker 0

同样,这并不是我的专业领域,我不太能深入探讨,但我相信这与AlphaFold的思路是一致的。

Again, not really my background to come within this too much, but I'm sure that this is along the lines of Alpha Fold.

Speaker 0

AlphaFold在预测蛋白质折叠方面对科学界非常有帮助。

Alpha Fold was very useful scientifically for making predictions about protein folding.

Speaker 0

Alpha Genome有望在理解基因组学、预测哪些基因发挥何种功能等方面发挥巨大作用。

Alpha Genome is presumably going to be very useful for understanding genomics, for making predictions about which genes do what, things like that.

Speaker 1

这是一个非常有趣的视角,可以说是一种从根本上不同的方式来解决“理解生物学”的问题。谷歌DeepMind及其衍生公司Isomorphic Labs(顺便说一下,戴米斯是这家公司的CEO,据我所知,这家公司一直非常专注于此)正是在朝这个方向努力。

It's a really interesting take, and I guess a fundamentally different way of approaching the let's-understand-biology problem that Google DeepMind, and the company it's spawned, Isomorphic Labs, which, by the way, Demis is the CEO of, has been very focused on, I hear.

Speaker 1

当你看Alpha Fold时,你实际上是在根据构成蛋白质的乐高积木来预测其结构,某种程度上还包括其功能。

When you look at Alpha Fold, you're looking at essentially predicting the structure and and to some degree, the function of of proteins from the LEGO blocks that make up those proteins.

Speaker 1

对吧?

Right?

Speaker 1

也就是氨基酸,那些被串联在一起的单个氨基酸。

The amino acids, the individual amino acids that get chained together.

Speaker 1

对吧?

Right?

Speaker 1

所以你有大约20种氨基酸可以选择,这就是构建蛋白质的方式。

So you got, you know, 20 amino acids you can pick from, and and that's how you build a protein.

Speaker 1

而根据你所使用的氨基酸,有些带正电,有些带负电,有些是极性的,有些则不是,然后蛋白质就会以某种方式折叠。

And depending on the amino acids that you have, some of them are positively charged, some of them are negative, some of them are polar, some of them are not, and then the thing will fold in a certain way.

Speaker 1

这与下面这个问题是不同的:好吧。

That is distinct from the problem of saying, okay.

Speaker 1

我有一段由30亿个碱基对组成的DNA链。

I've got a strand of, you know, 3,000,000,000 base pairs of DNA.

Speaker 1

我想知道的是,如果我取这一个碱基对,把它从比如A换成T,或者从G换成A,蛋白质会发生什么变化?

And what I wanna know is if I take this one base pair and I switch it from, I don't know, an A to a T, right, or from a G to an A, what happens to the protein?

Speaker 1

这对下游的生物活性会产生什么影响?

What happens to the downstream kind of biological activity?

Speaker 1

这会引发什么样的连锁反应?

What cascades does that have?

Speaker 1

它会产生哪些效应?

What effects does it have?

Speaker 1

这个问题很有趣,因为它取决于你以一种相当独特的方式模拟生物学的能力。

And that question is interesting because it depends on your ability to model biology in a pretty interesting way.

Speaker 1

它也与生物学中的一个实际现象紧密相关。

It also is tethered to an actual phenomenon in biology.

Speaker 1

有一种现象叫做单核苷酸多态性。

There's this thing called the single nucleotide polymorphism.

Speaker 1

在人类基因组中,有些核苷酸通常可以是G、T或其他某种碱基。

There are some nucleotides in the human genome that, you'll often see, can either be a G or a T or something.

Speaker 1

你会看到有些人携带G等位基因,有些人则携带T等位基因。

And so you'll see some people who have the G variant and some people who have the T variant.

Speaker 1

通常情况下,这些变异中的一些与特定疾病相关。

It's often the case that some of these variants are associated with a particular disease.

Speaker 1

我以前在基因组学实验室做心脏病学研究,那时候有个著名的变异叫9p21.3之类的。

And so, I used to work in a genomics lab doing cardiology research back in the day, and there's this famous variant called 9p21.3 or something.

Speaker 1

你知道,如果有些人携带——我忘了具体是什么了。

And, you know, if some people had I forget what it was.

Speaker 1

如果是T版本,患冠状动脉疾病或动脉粥样硬化等的风险会更高,而如果是另一种版本则不会。

The T version, you have a higher risk of getting coronary artery disease or atherosclerosis or whatever, and not if you have the other one.

Speaker 1

所以,本质上,这让你在某种程度上减少了需要进行的实验数量——如果你能弄清楚:尽管人类基因组中有这么多可能的变异,但只有少数几种真正与某种疾病或效应相关。

So, essentially, what this is doing is allowing you to reduce, in some sense, the number of experiments you need to perform, if you can figure out, okay, we have all these different possible variations across the human genome, but only a small number of them actually matter for a given disease or effect.

Speaker 1

如果我们能很好地模拟基因组,就有可能锁定真正值得关注的变异,从而开展更受控的实验。

And if we can model the genome pretty well, we might be able to pin down the variants we actually care about so that we can run more controlled experiments.

Speaker 1

对吧?

Right?

Speaker 1

所以我们知道,嘿。

So we know that, hey.

Speaker 1

你知道,病人A和病人B的基因组可能有上百万个差异,但就这个效应而言,他们其实很相似,或者应该说是相似的。

You know, patient a and patient b, they may have, like, a zillion different differences in their genomes, but, actually, for the purpose of this effect, they're quite comparable, or they ought to be.

Speaker 1

这无论如何都是谷歌DeepMind的一项非常有趣的下一步进展,我预计我们会看到更多,因为他们明确地对这个方向感兴趣。

This is anyway a really, I think, interesting next advance from Google DeepMind, I expect that we'll see a lot more because they are explicitly interested in that direction.
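为了把上面说的“参考序列 vs 变异序列”的评分思路讲得更具体,下面是一个极简的Python示意;其中的 predict_expression 只是假设的占位函数,并非AlphaGenome的真实API。

To make the ref-vs-alt variant scoring pattern described above concrete, here is a minimal, hedged sketch. The `predict_expression` function is a toy placeholder, not AlphaGenome's real API; only the delta-scoring workflow (score the reference window, score the mutated window, compare) is the point.

```python
# Hedged sketch of in-silico variant-effect scoring. `predict_expression`
# is a toy stand-in for a sequence-to-function model; a real model would
# return rich per-track predictions rather than one number.

def predict_expression(sequence: str) -> float:
    """Toy scorer: pretend GC-rich windows drive higher predicted expression."""
    gc = sum(base in "GC" for base in sequence)
    return gc / len(sequence)

def variant_effect(genome: str, pos: int, alt: str, window: int = 8) -> float:
    """Score one SNP as model(alt window) - model(ref window)."""
    lo, hi = max(0, pos - window), pos + window + 1
    ref_win = genome[lo:hi]
    alt_win = genome[lo:pos] + alt + genome[pos + 1:hi]
    return predict_expression(alt_win) - predict_expression(ref_win)

genome = "ATGCGTACGTTAGCATGCAT"
# Swapping the T at position 10 for a G raises GC content, so the toy
# model predicts an expression increase.
effect = variant_effect(genome, pos=10, alt="G")
print(round(effect, 3))  # prints 0.059
```

With a real model, the same loop lets you rank thousands of candidate SNPs and keep only the large-effect ones, which is exactly the "reduce the number of experiments" use case described above.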

Speaker 0

对。

Right.

Speaker 0

他们发布了一份相当详细的科研论文,一篇关于这个的预印本,就像AlphaFold那样,一篇55页的论文,详细描述了模型、结果和数据。

They released a pretty detailed research paper, a preprint on this, as they did for AlphaFold: a 55-page paper describing the model, describing the results, describing the data, all of that.

Speaker 0

他们还发布了API,允许客户端查询该模型,并且非商业用途免费,但有一定查询限制。

Also released an API, so a client side ability to query the model and it is free of charge for non commercial use with some query limiting.

Speaker 0

所以,是的,和AlphaFold一样,他们正在让科学家们能够使用这个工具。

So yeah, again, similar to Alpha Fold, they are making this available to scientists to use.

Speaker 0

他们还没有开源这个模型本身,但已经解释了它的运行原理。

They haven't open sourced this yet, the model itself, but they did explain how it works.

Speaker 0

所以这确实令人兴奋,看到DeepMind做这种事总是很有趣。

So certainly exciting and always fun to see DeepMind doing this kind of stuff.

Speaker 1

接下来是直接推理优化,DRO。

And up next, have direct reasoning optimization, DRO.

Speaker 1

我们已经有了GRPO,有了DPO。

So we've got, you know, GRPO, we've got DPO.

Speaker 1

我们有这么多、这么多的PO、RO或者O。

We've like, you know, there's so many, so many POs or ROs or Os.

Speaker 1

这么多O。

So many Os.

Speaker 1

因此,大语言模型可以奖励并优化自己在开放性任务中的推理过程。

So LLMs can reward and refine their own reasoning for open ended tasks.

Speaker 1

我喜欢这篇论文。

I like this paper.

Speaker 1

我非常喜欢这篇论文。

I like this paper a lot.

Speaker 1

我觉得我以前在播客里聊过这个。

I think I've talked about this on the podcast before.

Speaker 1

以前有个教授,每次你做展示时都会问一些非常简单的问题,简单得让人尴尬。

I used to have a professor who would ask these very simple questions when you were presenting something, and they were, like, embarrassingly simple.

Speaker 1

你甚至不好意思问出那样的问题,但结果证明,那往往是正确且最深刻的问题。

And you would be embarrassed to ask that question, but then that always turns out to be the right and deepest question to ask.

Speaker 1

这篇论文就是这样的。

This is one of those papers.

Speaker 1

它的概念非常简单,但当你意识到这一点时,你会惊呼:天啊,原来缺了这个。

It's like it's very simple concept, but it's something that when you realize it, you're like, oh my god, that was missing.

Speaker 1

首先,我们来谈谈目前我们通常如何将推理能力训练进模型。

So first, let's just talk about how currently we typically train reasoning into models.

Speaker 1

对吧?

Right?

Speaker 1

你有一些你知道是正确的输出。

So you have some output that you know is correct.

Speaker 1

对吧?

Right?

Speaker 1

某个答案,也就是期望的或目标输出,以及你的输入。

Some answer, the desired or target output, and you've got your input.

Speaker 1

所以你要做的就是把输入喂给模型。

So what you're gonna do is you're gonna feed your input to your model.

Speaker 1

让模型生成一系列不同的推理过程。

You're gonna get it to generate a bunch of different reasoning traces.

Speaker 1

然后在每种情况下,你都要查看这些推理过程,将它们输入模型,并根据模型生成的推理过程,看看它对已知正确的目标输出赋予了多高的概率。

And then in each case, you're going to look at those reasoning traces, feed them into the model, and based on the reasoning trace that the model generated, see what probability it assigns to the target output that you know is correct.

Speaker 1

对吧?

Right?

Speaker 1

一般来说,正确的推理过程会带来更高的概率,因为这是正确的结果。

So reasoning traces that are correct in general will lead to a higher probability that the model places on the target outcome because it's the right outcome.

Speaker 1

所以如果推理是正确的,它就会给结果赋予更高的概率。

So if the reasoning is correct, it's gonna be give a higher probability to the outcome.

Speaker 1

所以这感觉和我们通常训练这些模型的方式有点相反,但至少在GRPO(组相对策略优化)中,就是这样做的。

So this sort of feels a little bit backwards from the way we normally train these models, but this is how it's done, at least in GRPO, group relative policy optimization.

Speaker 1

因此,本质上,你通过奖励来激励模型,使其在给定推理轨迹的条件下对期望输出赋予高概率。

So, essentially, you reward the model to incentivize high probability of the desired output conditioned on the reasoning traces.

Speaker 1

这会促使你随着时间推移生成越来越好的推理轨迹,因为你希望生成那些能为正确答案赋予更高概率的推理轨迹。

And this makes you generate over time better and better reasoning traces because you wanna generate reasoning traces that assign higher probability to the correct output.

Speaker 1

所以这里的直观理解是:如果你的推理是好的,你就应该对正确答案非常有信心。

So the intuition here is if your reasoning is good, you should be very confident about the correct answer.

Speaker 1

对吧?

Right?

Speaker 1

但现在这种方法会失效,而且失效的方式非常有趣。

Now this breaks, and it breaks in a really interesting way.

Speaker 1

即使你的参考答案完全正确,你在训练过程中也可能对模型过于宽容,因为你评估模型对正确答案的置信度时,是将正确答案中每个词的置信分数平均起来的。

Even if your reference answer is exactly correct, you can end up being too forgiving to the model during training because the way that you score the model's confidence in the correct answer based on the reasoning traces is you average together, essentially, the confidence scores of each of the answer tokens in the correct answer.

Speaker 1

问题是,正确答案的第一个词往往本身就泄露了答案。

Now the problem is the first token of the correct answer often gives away the answer itself.

Speaker 1

所以即使推理过程完全错误,比如问题是‘谁在足球比赛中打进了制胜一球’,正确答案是利昂内尔·梅西,但模型的推理却是‘我觉得是克里斯蒂亚诺·罗纳尔多’,那么模型会从这里开始,给‘利昂内尔’这个正确答案的首个词赋予很低的概率。

So even if the reasoning stream was completely wrong, say the question was who scored the winning goal in the soccer game and the answer was Lionel Messi, if the model's reasoning is, I think it was Cristiano Ronaldo, the model is going to assign a low probability to Lionel, which is the first word of the correct answer.

Speaker 1

但一旦它读到‘利昂内尔’这个词,模型就知道‘梅西’一定是下一个词。

But once it reads the word Lionel, the model knows that Messi must be the next token.

Speaker 1

所以即使它的推理过程写的是‘克里斯蒂亚诺·罗纳尔多’,它还是会赋予‘梅西’很高的概率。

So it's gonna assign up actually a high probability to Messi even though its reasoning trace said Cristiano Ronaldo.

Speaker 1

因此,这表明答案中有一些词实际上能真实反映你模型推理的质量。

And so, essentially, this suggests that there are some tokens in the answer that are going to actually, correctly, reflect the quality of your model's reasoning.

Speaker 1

所以,如果你的模型推理是‘我觉得是克里斯蒂亚诺·罗纳尔多’,而正确答案其实是利昂内尔·梅西。

So, you know, if your model's reasoning was, I think it was Cristiano Ronaldo, and the actual answer was Lionel Messi.

Speaker 1

那么‘利昂内尔’这个词,你应该期望模型对它的置信度很低——这样才是好的。

Well, for Lionel, you should expect it to have very low confidence, so that's good.

Speaker 1

你就能准确判断出模型的推理在这里是错的。

You'll be able to actually correctly determine that your reasoning was wrong there.

Speaker 1

但一旦‘利昂内尔’作为提示的一部分出现,‘梅西’就突然变得显而易见了,于是这里就出现了一点误判。

But once you get Lionel as part of the prompt, then Messi all of a sudden becomes obvious, and so you get a bit of a misfire there.

Speaker 1

所以,本质上,他们要做的是输入大量推理轨迹,然后查看正确输出中的每个词元,看看哪些词元的变动幅度很大。

So, essentially, what they're gonna do is feed in a whole bunch of reasoning traces, and they'll look at each of the tokens in the correct output and see which of those tokens vary a lot.

Speaker 1

真正反映推理质量的词元应该具有高方差。

Tokens that are actually reflective of the quality of the reasoning should have high variance.

Speaker 1

对吧?

Right?

Speaker 1

因为如果你的推理路径正确,这些词元应该具有高置信度。

Because if you have a good reasoning trajectory, those tokens should have high confidence.

Speaker 1

而如果你的推理路径错误,它们的置信度就应该很低。

And if you have a bad reasoning trajectory, they should have low confidence.

Speaker 1

但你也会有一些不太反映推理过程的词元,比如‘Messi’和‘Lionel Messi’,因为一旦‘Lionel’出现,答案就已经暴露了。

But then you have some less reasoning-reflective tokens, like, say, Messi in Lionel Messi, because Lionel has already given it away.

Speaker 1

你应该预期‘Messi’始终具有高置信度。

You should expect Messi to consistently have high confidence.

Speaker 1

因为即便你的推理轨迹完全错误,一旦你读到‘Lionel’,‘Messi’就变得显而易见了。

Because, again, even if your reasoning trace is totally wrong, by the time you've read Lionel, Messi is obvious.

Speaker 1

这简直就像,如果你在考试时能看到正确答案的第一个词,那么是的,即使你的思考完全错误,只要答案是利昂内尔·梅西,你还是会选对第二个词。

It's almost like, you know, if you're writing a test and you can see the first word in the correct answer: well, yeah, even if your thinking was completely wrong, you're gonna get the correct second word if the answer is Lionel Messi.

Speaker 1

所以,无论如何,这只是他们用来检测良好推理的一种方式,然后将它融入一个更广泛的算法中,而这个算法本身其实相当简单,没什么特别令人惊讶的。

So, anyway, this is just a way that they use to detect good reasoning, and then they feed that into a broader algorithm that, beyond that, is fairly simple, nothing too shocking.

Speaker 1

他们只是将这种方法融入到一个看起来很像GRPO的框架中,从而得到这个DRO算法。

They just fold this into something that looks a lot like GRPO to get this DRO algorithm.
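上面“哪些答案词元真正反映推理质量”的判别思路,可以用几行代码示意;下面的数字是虚构的词元置信度,并非真实模型输出,也不是DRO论文的原始实现。

The token-selection idea just described, that answer tokens whose confidence varies a lot across sampled reasoning traces are the reasoning-reflective ones, can be sketched in a few lines. The numbers below are made-up per-token probabilities standing in for model log-probs, not DRO's actual implementation.

```python
import statistics

# Reference answer tokens: ["Lionel", "Messi"]. Each row is the confidence
# a model assigns each answer token, conditioned on one sampled reasoning
# trace. These values are invented for illustration.
traces = [
    [0.90, 0.99],  # good trace: confident in "Lionel"; "Messi" trivially high
    [0.05, 0.98],  # bad trace ("Ronaldo..."): "Lionel" unlikely, "Messi" still
    [0.85, 0.99],  # high once "Lionel" is already in the context
    [0.10, 0.97],
]

def token_variances(rows):
    """Per-token variance of confidence across reasoning traces."""
    return [statistics.pvariance(col) for col in zip(*rows)]

variances = token_variances(traces)
# "Lionel" (index 0) has high variance, so it reflects reasoning quality;
# "Messi" (index 1) is near-constant because "Lionel" gives it away, and
# naive averaging over both tokens would be too forgiving.
reflective = [i for i, v in enumerate(variances) if v > 0.01]
print(reflective)  # prints [0]
```

The reward would then be computed over the reflective tokens only, which is the intuition behind weighting tokens by how much their confidence tracks the reasoning trace.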

Speaker 0

对。

Right.

Speaker 0

是的。

Yeah.

Speaker 0

论文中有相当一部分内容将它与其他不关注词元层面的近期工作进行了对比。

There's a fair bit in the paper contrasting it with other recent work that basically doesn't pay attention to tokens.

Speaker 0

为了帮你理解你刚才说的,他们的重点在于这种R3推理反思奖励。

That, just to contextualize what you were saying: their focus is on this R3, the reasoning reflection reward.

Speaker 0

DRO,即直接推理优化,本质上就是GRPO,人们通常用它来进行强化学习,一般配合可验证的奖励。

DRO, Direct Reasoning Optimization, is basically GRPO, what people use generally for RL, typically with verifiable rewards.

Speaker 0

他们的重点在于:如何以开放的方式对长推理链进行通用训练;他们指出了现有方法中的一些问题,并突出这种推理反思奖励,即以思维链中的词元与输出之间的一致性作为优化信号。

Here their focus is how to train generally, in an open-ended fashion, over long reasoning chains; they identify some of these issues in existing approaches and highlight this reasoning reflection reward, which basically looks at consistency between the tokens in the chain of thought and in the output as a signal to optimize over.

Speaker 0

正如你所预料的,他们做了一些实验。

As you might expect, they do some experiments.

Speaker 0

他们表明,这种方法非常有效。

They show that this winds up being quite useful.

Speaker 0

我认为这还表明,我们仍处于使用强化学习训练推理的早期阶段。

I think it's another indication that we are still in the early-ish days of using RL in training reasoning.

Speaker 0

目前存在大量噪声,同时也在利用许多重要的洞见。

There's a lot of noise and a lot of significant insights being leveraged.

Speaker 0

最后一点,DRO,我想这是对DPO的一种呼应。

Last thing, DRO, I guess kind of a reference to DPO.

Speaker 0

正如你所说,DPO是直接偏好优化,而DRO是直接推理优化。

As you said, DPO is direct preference optimization versus direct reasoning optimization.

Speaker 0

两者并没有太强的相关性,只是命名上比较有趣,不过或许可以类比基于强化学习的偏好对齐与DPO之间的差异。

Not super related; it's just, I guess, a fun naming convention, aside from arguably being sort of analogous in terms of the difference between RL-based preference alignment and DPO.

Speaker 0

总之,这只是一个有趣的引用。

Anyway, it's kind of a funny reference.

Speaker 1

是的。

Yeah.

Speaker 0

下一篇论文,《Farseer:大语言模型中的精细化扩展定律》。

Next paper: Farseer, a refined scaling law in large language models.

Speaker 0

我们已经多次讨论过扩展定律。

So we've talked about scaling laws a ton.

Speaker 0

基本上,你会收集大量数据点:当你使用这么多算力、这么多训练浮点运算次数或其他资源时,你会在语言预测任务上获得特定的损失值,通常以困惑度作为实际指标,然后你对这些数据点拟合某种方程。

Basically, you collect a bunch of data points of the form: once you use this much compute, or this much training FLOPs, or whatever, you get this particular loss on language prediction, typically on the actual metric of perplexity, and then you fit some sort of equation to those data points.

Speaker 0

通常的情况是,你会得到一个相当好的拟合,它对后续数据点同样成立:随着你不断扩大规模,损失会持续下降。

What tends to happen is you get a fairly good fit that holds for future data points as you keep scaling up: your loss goes down and down and down.

Speaker 0

人们发现,令人惊讶的是,这种拟合非常准确且具有很强的预测性,而这一点在2020年之前并不常见,也几乎没有人在尝试。

People have found that, somewhat surprisingly, you can get a very good fit that is very predictive, which was not at all a common idea, or something people had really tried, pre-2020.
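上面描述的“收集数据点、拟合方程、再外推”的基本流程,可以用一个极简的对数-对数线性拟合来示意;下面的数据点是虚构的,不来自任何真实模型。

The basic exercise just described, collect (compute, loss) points, fit a simple power law, extrapolate, can be sketched with a log-log least-squares fit. The data points below are synthetic, purely for illustration.

```python
import math

# Fit loss ~ a * C^(-k) by least squares in log-log space, then
# extrapolate to a bigger compute budget. Points are made up.
points = [(1e18, 3.2), (1e19, 2.9), (1e20, 2.63), (1e21, 2.38)]

def fit_power_law(pts):
    xs = [math.log10(c) for c, _ in pts]
    ys = [math.log10(l) for _, l in pts]
    n = len(pts)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept  # log10(loss) = slope * log10(C) + intercept

slope, intercept = fit_power_law(points)
predicted = 10 ** (slope * math.log10(1e22) + intercept)  # extrapolate 10x
print(round(slope, 3), round(predicted, 2))  # prints -0.043 2.16
```

The surprising empirical fact is that fits like this stay predictive across many orders of magnitude of compute, which is what makes scaling laws worth fitting at all.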

Speaker 0

这篇论文所做的,基本上就是以更好的方式完成这件事。

What this paper does is basically do that but better.

Speaker 0

这是一种新颖且精细的扩展定律,能提供更高的预测准确性。

It's a novel and refined scaling law that provides enhanced predictive accuracy.

Speaker 0

他们通过系统地构建模型损失面,并更精准地拟合实证数据来实现这一点。

They do that by systematically constructing the model loss surface and just doing a better job of fitting to empirical data.

Speaker 0

他们表示,通过将外推误差降低433%,改进了几年前提出的Chinchilla定律。

They say that they improve upon the chinchilla law, one of the big ones from a couple of years ago, by reducing extrapolation error by 433%.

Speaker 0

所以,这可以说是一条可靠得多的定律。

So a much more reliable law, so to speak.

Speaker 1

是的。

Yeah.

Speaker 1

Chinchilla扩展定律可以说是谷歌对OpenAI在2020年论文中提出的初始扩展定律的修正。

The Chinchilla scaling law was somewhat famously Google's correction to the initial OpenAI scaling law that was proposed, I think, in a 2020 paper.

Speaker 1

这就是所谓的Kaplan扩展定律。

This is the so called Kaplan scaling law.

Speaker 1

因此,Chinchilla曾被视为关于扩展规律的重要突破,甚至可能是最终的权威结论。

And so Chinchilla was sort of heralded as this kind of big and, ultimately, maybe pseudo-final word on how scaling would work.

Speaker 1

与卡普兰缩放定律相比,它明显更依赖数据。

It was more data heavy than the Kaplan scaling laws, notably.

Speaker 1

但他们在这里指出,Chinchilla定律在中等规模模型上表现非常好,这基本上就是它被校准和设计的目标场景,但它在极小或极大的模型上表现并不出色。

But what they're pointing out here is Chinchilla works really well for midsize models, which is basically where it was calibrated, what it was designed for, but it doesn't do great on very small or very large models.

Speaker 1

而显然,既然缩放是一个关键因素,超大规模模型就非常重要。

And, obviously, given that scaling is a thing, very large models matter a lot.

Speaker 1

而缩放定律的全部意义就在于,从你当前的状况出发进行外推,看看,比如:

And the whole point of a scaling law is to extrapolate from where you are right now to see, like, okay.

Speaker 1

如果我将模型规模扩大100倍,因此预算也增加100倍,我预期结果会达到什么水平?

Well, if I train a model a 100 times the scale and therefore at, you know, let's say, a 100 times this budget, where would I expect to end up?

Speaker 1

你可以想象,这类决策背后依赖着多么巨大的影响。

And you can imagine how much depends on those kinds of decisions.

Speaker 1

因此,你希望有一个经过精确校准、能够很好外推,尤其是对超大规模模型表现优异的模型。

So you want a model that is really well calibrated and extrapolates really well, especially to very large models.

Speaker 1

他们在论文中做了一项非常有趣的工作。

They do a really interesting job in the paper.

Speaker 1

我们不会深入细节,但如果你有物理学背景,比如热力学,他们会玩一个非常有趣的游戏,使用有限差分分析来分离模型大小n和训练数据量d之间的依赖关系。

We won't go into detail, but especially if you have a background in physics, like thermodynamics, they play this really interesting game where they use finite difference analysis to kinda separate out dependencies between n, the size of the model, and d, the amount of data that it's trained on.

Speaker 1

而这最终就是所谓的秘诀,如果你愿意这么称呼的话。

And that ultimately is kinda the secret sauce, if you wanna call it that, here.

Speaker 1

还有其他一些技巧,但核心部分是他们将损失函数分解为不同的项,其中一项仅依赖于n,另一项仅依赖于d。

There's a bunch of other hijinks, but the core piece is they sort of break the loss down into different terms, one of which only depends on n, the other of which only depends on d.

Speaker 1

一项仅与模型大小有关,另一项则仅取决于训练数据集的规模。

So one is just model size dependent, the other is only dependent on the size of the training dataset.

Speaker 1

但他们还引入了n和d之间的交互效应,即模型大小与训练数据量之间的相互作用,然后推导出这一项应该是什么形式。

But then they also introduce this interaction effect between n and d, between the size of the model and the amount of data it's trained on, and then they end up deriving what should that term look like.

Speaker 1

这是其中一种非常有趣的表述方式。

That's one of the framings of this that's really interesting.

Speaker 1

简单来说,如果Chinchilla认为数据扩展遵循一致的模式,那么无论模型大小如何,它总是d的某个负beta系数次方。

Just to kind of nutshell it, if Chinchilla says that data scaling follows a consistent pattern, it's always, like, d to the power of some negative beta coefficient, regardless of model size.

Speaker 1

无论你的模型有多大,它始终是d的负b次方。

Like, no matter how big your model is, it's always d to the power of negative b.

Speaker 1

所以,如果你告诉我数据量,我就能确定数据项的贡献。

So if I give you the amount of data, you can determine the contribution of the data term.

Speaker 1

Farseer 的观点是,数据扩展实际上依赖于模型大小。

What farseer says is data scaling actually depends on model size.

Speaker 1

更大的模型从根本上以不同的方式从数据中学习,我们就先说到这里,但还有很多有趣的推论,可以弄清楚这个项究竟应该是什么样子。

Bigger models just fundamentally learn from data in a different way, and we'll park it there, but there's a lot of cool extrapolation to figure out how exactly does this term have to look.
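
To make the contrast concrete, here's a minimal Python sketch of the two functional forms. The constants are roughly the published Chinchilla fits from Hoffmann et al.; the interaction term (the data exponent growing with log N) is purely an illustrative assumption, not Farseer's actual parameterization:

```python
import math

# Chinchilla-style loss: the data term D**(-beta) does not depend on N,
# so the benefit of extra data is the same at every model size.
def chinchilla_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / N**alpha + B / D**beta

# Hypothetical interaction-term sketch: the effective data exponent
# changes with model size, so bigger models learn from data differently.
def interaction_loss(N, D, E=1.69, A=406.4, B=410.7, alpha=0.34,
                     beta=0.28, gamma=0.005):
    beta_eff = beta * (1.0 + gamma * math.log(N))
    return E + A / N**alpha + B / D**beta_eff
```

With the separable Chinchilla form, doubling the data buys the same loss reduction at any N; with the interaction term, the payoff from extra data depends on model size, which is the paper's central claim.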

Speaker 0

没错。

Exactly.

Speaker 0

而且这一点非常有用,不仅仅是为了大致了解你会得到什么。

And this is very useful not just to sort of know what you're going to get.

Speaker 0

这一点意味着,对于给定的计算预算,你可以预测数据量与模型大小的最佳平衡点。

That aspect of it means that for a given compute budget, you can predict what balance of data to model size is likely optimal.

Speaker 0

基本上就是当你花数百万美元训练一个模型时。

Basically, when you're spending millions of dollars training a model.

Speaker 0

知道这些事情真是太好了。

Pretty nice to know these kinds of things.

Speaker 0

再看一篇论文,下一篇是《LLM优先搜索:对解空间的自引导探索》。

One more paper; the next one is LLM-First Search: Self-Guided Exploration of the Solution Space.

Speaker 0

所以核心思想是,搜索有多种方式,这里的搜索指的是观察一件事,然后决定下一步观察什么,如此反复,直到找到解决方案。

So the gist of this is there are many ways to do search where search just means look at one thing and then you decide on some other things to look at, and you keep doing that until you find a solution.

Speaker 0

通常的一种方法是蒙特卡洛树搜索,这是一个经典算法。

So the typical or one of the typical ways is Monte Carlo Tree Search, a classic algorithm.

Speaker 0

例如,AlphaGo就使用了这种方法。

And this was, for instance, done with AlphaGo.

Speaker 0

如果想将这种方法与LLM结合,通常的做法是为某个位置赋予分数,做出一些预测,然后借助现有算法进行采样或决定下一步方向。

If you want to combine this with an LLM, typically what you do is you assign some score to a given location and make perhaps some predictions, and then you have an existing algorithm to sample or to decide where to go.

Speaker 0

而LLM用于搜索时的关键区别在于:忘掉蒙特卡洛树搜索,忘掉任何现有的搜索算法或技术,直接让LLM决定下一步去哪里。

The key difference here with LLM for a search is basically forget that Monte Carlo tree search, forget any preexisting search algorithm or technique, just make the LLM decide where to go.

Speaker 0

它可以自主决定如何进行搜索。

It can decide how to do the search.

Speaker 0

他们认为这种方法更灵活、更注重上下文、需要更少的调参,而且效果似乎更好。

They say that this is more flexible, more context sensitive, requires less tuning, and just seems to work better.

Speaker 1

是的

Yeah.

Speaker 1

这全是提示层面的东西。

It's all prompt level stuff.

Speaker 1

对吧?

Right?

Speaker 1

所以这里没有优化,没有训练,也没有微调。

So there's no optimization going on, no training, no fine tuning.

Speaker 1

只是给模型一个提示而已。

It's just like, give, like, give the model a prompt.

Speaker 1

首先,找到一种方式,以一致的形式表示导致当前时刻的一系列动作,无论语言模型正在解决什么问题。

So number one, find a way to represent the sequence of actions that have led to the current moment in whatever problem the language model is trying to solve in a way that's consistent.

Speaker 1

也就是说,以一致的方式格式化到目前为止的所有棋步,这样模型就能看到当前状态和棋盘的历史记录。

So, like, essentially format, let's say, all the chess moves up till this point in a consistent way so that the model can look at the state and the history of the board, if you will.

Speaker 1

然后给模型一个提示,比如:好的。

And then give the model a prompt that says, okay.

Speaker 1

从这里开始,我希望你决定是继续当前路径,还是探索其他分支、其他轨迹。

From here, like, I want you to decide whether to continue on the current path or look at alternative branches, alternative trajectories.

Speaker 1

这个提示是这样的:在决定是探索还是继续时,这里有几点重要考虑因素,然后列出了一系列内容。

The prompt is like, here are some important considerations when deciding whether to explore or continue, and then it lists a bunch.

Speaker 1

同样地,他们在评估阶段也有类似的设置,即你对可用选项进行打分,让模型选择最有前景的那个。

And then similarly, they have the same but for the evaluation stage, where you're scoring the available options and getting the model to choose the most promising one.

Speaker 1

所以,比如说,当你评估可能采取的操作或行动时,这里有几点重要考虑因素。

So, you know, it's like, here are some important considerations when evaluating possible operations that you could take or actions you could take.

Speaker 1

所以,一旦把这些东西结合起来,基本上,游戏或问题解决过程中的每个阶段,模型都拥有到目前为止所有已采取行动的完整历史。

So once you combine those things together, basically, each stage, I'll call it, of the game or of the problem solving, the model has a complete history of all the actions taken up to that point.

Speaker 1

然后,模型会被提示评估眼前的选择,决定是继续探索并添加新选项,还是选择其中一个选项并执行。

It's then prompted to evaluate the options before it and to decide whether to continue to explore and add new options or to select one of the options and execute against it.

Speaker 1

总之,基本上就是这样。

Anyway, that's basically it.

Speaker 1

这是一个相当概念上简单的想法。

It's a pretty conceptually simple idea.

Speaker 1

直接把树状结构和分支决策的开发交给模型,让它实时地思考这些路径。

Just offload the tree and branching structure development to the model so that it's thinking them through in real time.
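
A runnable sketch of that control flow, with `llm_decide`, `llm_score`, and `expand` as deterministic stand-ins for the paper's prompted model calls and task-specific action space (all names here are hypothetical, not from the paper):

```python
# In the paper, the explore-vs-exploit choice and the candidate scoring are
# both made by a prompted LLM given the full history; here they are simple
# deterministic stand-ins so the loop itself is runnable.

def llm_decide(history, frontier):
    # Stand-in for: "given the history, explore new options or exploit?"
    return "explore" if len(frontier) < 3 else "exploit"

def llm_score(state):
    # Stand-in for the prompted evaluation; toy goal: reach the number 24.
    return -abs(state - 24)

def expand(state):
    # Toy successor function standing in for the task's action space.
    return [state + 1, state * 2]

def llm_first_search(start, max_steps=50):
    history, frontier = [], [start]
    for _ in range(max_steps):
        if llm_decide(history, frontier) == "explore":
            frontier.extend(expand(frontier[-1]))  # add new candidate branches
        else:
            best = max(frontier, key=llm_score)    # pick the most promising
            frontier.remove(best)
            history.append(best)
            if best == 24:
                return history
            frontier.extend(expand(best))
    return history
```

The point is that no hand-tuned search schedule appears anywhere: the same two "prompts" drive both when to branch and which branch to follow.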

Speaker 1

性能提升相当显著。

Pretty impressive performance jumps.

Speaker 1

所以在使用GPT-4o时,与在倒计时游戏中的标准蒙特卡洛树搜索相比,这个游戏本质上是给你一组数字和所有标准数学运算——加法、除法、乘法、减法。

So when using GPT-4o compared with standard Monte Carlo tree search on this game of Countdown, where essentially you're given a bunch of numbers and all the standard mathematical operations: addition, division, multiplication, subtraction.

Speaker 1

你需要找出如何组合这些数字来得到目标数字。

You're trying to figure out how do I combine these numbers to get a target number.
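
For reference, Countdown at small sizes is simple enough to brute-force; a left-to-right folding solver (a simplification of the full game, which allows arbitrary parenthesization) might look like:

```python
from itertools import permutations, product

def solve_countdown(numbers, target):
    # Try every ordering of the numbers and every operator sequence,
    # folding left to right; return the first expression hitting target.
    ops = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
           '*': lambda a, b: a * b, '/': lambda a, b: a / b if b else None}
    for perm in permutations(numbers):
        for op_seq in product(ops, repeat=len(numbers) - 1):
            acc, expr = perm[0], str(perm[0])
            for op, num in zip(op_seq, perm[1:]):
                acc = ops[op](acc, num)
                if acc is None:  # division by zero: abandon this branch
                    break
                expr = f"({expr} {op} {num})"
            if acc == target:
                return expr
    return None
```

The branching factor explodes with more numbers, which is exactly why it makes a good benchmark for guided search instead of exhaustive enumeration.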

Speaker 1

所以在每个阶段,你都必须做出选择:好吧。

So at each stage, you have to choose, okay.

Speaker 1

我要不要把这两个数加起来?

Do I try adding these together?

Speaker 1

不管怎样。

Do I anyway.

Speaker 1

使用这种技术达到了47%的准确率,而蒙特卡洛树搜索只有32%,而且这种优势还会进一步放大。

So 47% on this game using this technique versus 32% using Monte Carlo Tree Search, and this effect amplifies.

Speaker 1

因此,当你使用更强的模型时,这种优势会进一步放大。

So the advantage amplifies as you work with stronger models.

Speaker 1

例如,在o3-mini上,79%对比蒙特卡洛树搜索的41%。

So on o3-mini, for example, 79% versus 41% for Monte Carlo Tree Search.

Speaker 1

因此,推理模型似乎能够充分利用这一点。

So reasoning models seem to be able to take advantage of this.

Speaker 1

你可以把它看作是一种更好的支撑结构。

You can think of it as a kind of scaffold that works a lot better.

Speaker 1

它还使用了更少的token,因此性能更好。

It also uses fewer tokens, so it's getting better performance.

Speaker 1

它使用的token更少,计算量也比蒙特卡洛树搜索更低。

It's using fewer tokens, so less compute than Monte Carlo tree search as well.

Speaker 1

这真的非常有趣。

So that's that's really interesting.

Speaker 1

对吧?

Right?

Speaker 1

这是一种更高效的方式,能够从现有模型中榨取更多性能,而且完全基于非常可解释且可调整的提示。

This is a way more efficient way of squeezing performance out of existing models, and it's all just based on very kind of interpretable and tweakable prompts.

Speaker 0

对。

Right.

Speaker 0

而且他们不仅将这种方法与蒙特卡洛树搜索进行比较。

And they compare this not just to Monte Carlo Tree Search.

Speaker 0

他们还将其与思维树、广度优先搜索、最佳优先搜索进行了比较。

They also compare it to Tree of Thoughts, breadth-first search, and best-first search.

Speaker 0

所有这些方法都非常关键,因为广义上的搜索意味着你有一系列可采取的行动,而你希望获得最佳结果,因此你需要提前考虑多步策略。

All of these, by the way, are pretty significant because search broadly is like there's a sequence of actions I can take and I want to get the best outcome, so you need to think many steps ahead.

Speaker 0

这里的分支意味着我采取这一步、这一步和这一步。

Branches here mean I take this step and this step and this step.

Speaker 0

你可以选择更深或更广地考虑要评估的步骤数量。

Well, you can either go deeper or wider in terms of how many steps you consider.

Speaker 0

提前一步、提前两步。

One step ahead, two step ahead.

Speaker 0

这对于许多类型的问题至关重要,比如国际象棋、围棋,显然如此,但广义上讲,我们一直在进行各种搜索。

This is essential for many types of problems, chess, go, obviously, but broadly we do search and all sorts of things.

Speaker 0

更好的搜索方法意味着你能进行更优的推理,能更好地解决问题。

Having a better approach to search means you can do better reasoning, means you can do better problem solving.
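
That deeper-versus-wider distinction is the classic depth-first versus breadth-first trade-off; a toy sketch of the two visit orders on the same tree:

```python
from collections import deque

# "Wider": breadth-first visits every one-step branch before any
# two-step branch.
def bfs_order(tree, root):
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order

# "Deeper": depth-first follows one line as far as it goes, then backtracks.
def dfs_order(tree, root):
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(tree.get(node, [])))
    return order
```

Methods like MCTS, best-first search, and the LLM-guided approach above are all different policies for balancing these two extremes.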

Speaker 1

接下来我们谈谈策略与安全,这里有一个主要的故事,叫做语言模型的无监督引导。

And moving on to policy and safety, we have one main story here called unsupervised elicitation of language models.

Speaker 1

这非常有趣,说实话,一开始让我很困惑,我花了相当长的时间——虽然有点尴尬——和Claude一起研究这篇论文,这其实有点讽刺,因为如果我没记错的话,这是Anthropic的论文。

This is really interesting, and I'll be honest, it was a head scratcher for me. Like, I spent an embarrassingly long amount of time with Claude trying to help me through the paper, which is sort of ironic because, if I remember, it's an Anthropic paper.

Speaker 1

但本质上,这是一种让语言模型利用其内部的逻辑理解能力来解决问题的方法。

But this is essentially a way of getting a language model's internal understanding of logic to help it to solve problems.

Speaker 1

想象一下,你有一堆数学问题和对应的解答。

So imagine that you have a bunch of math problems and solutions.

Speaker 1

比如,你知道五加三等于多少吗?

So for example, you know, what's five plus three?

Speaker 1

然后你有一个可能的解答。

And then you have a possible solution.

Speaker 1

对吧?

Right?

Speaker 1

也许是八。

Maybe it's eight.

Speaker 1

下一个问题是,七加二等于多少?

The next problem is, like, what's seven plus two?

Speaker 1

然后你有一个可能的答案,这个答案可能是10,但顺便说一下,这是错的。

And you have a possible solution, and that possible solution is maybe 10, which is wrong, by the way.

Speaker 1

所以这些可能的答案中有一些是错误的。

So some of these possible solutions are gonna be wrong.

Speaker 1

所以你有一堆数学题和可能的答案,但你不知道哪些是正确的,哪些是错误的。

So you have a bunch of math problems and possible solutions, and you don't know which are correct and incorrect.

Speaker 1

你希望训练一个语言模型来识别正确的答案。

And you wanna train a language model to identify correct solutions.

Speaker 1

对吧?

Right?

Speaker 1

你想弄清楚这些中哪些是真正正确的。

You wanna figure out which of these are actually correct.

Speaker 1

所以想象一下,你把这些全部列成一个列表。

So imagine you just lay these all out in a list.

Speaker 1

你有,比如,五加三等于多少,然后答案是八?

You have, you know, what's five plus three and then solution eight?

Speaker 1

七加二等于多少?

What's seven plus two?

Speaker 1

答案是十,等等。

Solution 10 and and so on.

Speaker 1

现在你要做的是,随机给其中几个例子标注正确或错误的标签。

Now what you're gonna do is you're gonna randomly assign correct and incorrect labels to a few of these examples.

Speaker 1

对吧?

Right?

Speaker 1

所以你会说,五加三等于八,然后随便说,好吧。

So you'll say, you know, five plus three equals eight, and you'll just randomly say, okay.

Speaker 1

这是正确的。

That's correct.

Speaker 1

而七加二等于十,但顺便说一下,这是错的,不过你会随机说它是正确的。

And seven plus two equals 10, which, by the way, is wrong, but you'll randomly say that's correct.

Speaker 1

对吧?

Right?

Speaker 1

然后你会让模型根据我们这里的正确性评分,知道第一个解是正确的,第二个解也是正确的,第三个解大概应该是什么?

And then you're going to get the model to say: given the correctness scores that we have here, given that solution one is correct and solution two is correct, what should solution three be, roughly?

Speaker 1

或者,你知道的,根据我们随机、秘密地分配的所有正确和错误标签,这个缺失的标签应该是什么?

Or, you know, given all the correct and incorrect labels that we've assigned randomly, secretly, what should this missing label be?

Speaker 1

通常来说,由于你随机分配了这些标签,模型会变得非常困惑,因为这些随机分配的标签之间存在逻辑矛盾。

And, generally, because you've randomly assigned these labels, the model's gonna get really confused because there's a logical inconsistency between these randomly assigned labels.

Speaker 1

你标记为正确的很多问题实际上是错的,反之亦然。

A bunch of the problems that you've labeled as correct are actually wrong and vice versa.

Speaker 1

所以现在你要做的,本质上是尝试衡量模型对这个问题的困惑程度,然后你会翻转其中一个标签。

And so now what you're gonna do is essentially try to, like, measure how confused the model is about that problem, and you are then gonna flip one label.

Speaker 1

所以你会考虑将其中一个题目的正确或错误标签从正确改为错误,然后重复这个过程,看看模型的困惑分数是否会降低。

So you'll kind of think of, like, flipping the correct or incorrect label on one of these problems from correct to incorrect, say, and then you'll repeat, and you'll see if you get a lower confusion score from the model.

Speaker 1

总之,这大致就是这个概念。

Anyway, this is roughly the concept.

Speaker 1

随着时间推移,你会逐渐收敛到一个更低的困惑分数,这感觉就像模型逐渐趋于正确的答案,这就是为什么这很像模拟退火——如果你熟悉这个概念的话。

And so over time, you're gonna gradually converge on a lower confusion score, and it sort of, like, feels almost like the model's relaxing into the correct answer, which is why this is a lot like simulated annealing, if you're familiar with that.

Speaker 1

你对问题进行随机调整,直到获得极低的损失,然后逐渐放松,最终接近正确答案。

You're making random modifications to the problem until you get a really low loss, and you gradually kind of relax into the correct answer.
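
A loud caveat up front: in the paper, the "confusion" energy comes from the model's own mutual-predictability and logical-consistency scores over the labeled set. The sketch below swaps in a hand-written arithmetic checker as the energy, just to make the flip-and-accept annealing loop runnable:

```python
import random

# Toy problem set: (a, b, claimed sum). Some claimed sums are wrong.
problems = [(5, 3, 8), (7, 2, 10), (4, 4, 8), (6, 1, 9)]

def energy(labels):
    # Stand-in "confusion" score: count labels that disagree with a
    # consistency check (True should mean a + b really equals the claim).
    # In the paper this is the LLM's inconsistency over the labels instead.
    return sum(label != (a + b == s)
               for (a, b, s), label in zip(problems, labels))

def anneal(steps=200, seed=0):
    rng = random.Random(seed)
    labels = [rng.random() < 0.5 for _ in problems]  # random initial labels
    for _ in range(steps):
        i = rng.randrange(len(labels))
        flipped = labels[:]
        flipped[i] = not flipped[i]
        if energy(flipped) <= energy(labels):  # keep flips that reduce confusion
            labels = flipped
    return labels
```

Starting from random labels, the loop relaxes into the assignment with zero inconsistency, which is the annealing intuition described above.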

Speaker 1

希望这能讲清楚。

I hope that makes sense.

Speaker 1

这有点像是你得亲眼看到才能理解,确实是这样。

It's sort of like, you kinda gotta see it, and it's yeah.

Speaker 0

对。

Right.

Speaker 0

为了给这个方法一些背景,他们——顺便说一下,这是来自Anthropic和其他几个机构——将这个问题表述为这样。

Just to give some motivation, they framed this problem, and this is from Anthropic and a couple other institutes, by the way.

Speaker 0

他们将这个问题置于超人类模型的背景下进行阐述。

They framed this in the context of superhuman models.

Speaker 0

因此,这种无监督引导部分的核心在于:如何训练模型去做某些事情,对吧?

So the unsupervised elicitation part of this is about the aspect of how do you train a model to do certain things, right?

Speaker 0

如今,常见的范式是先通过预训练训练语言模型,然后进行后训练,使用一些词或输出偏好标签,再通过RLHF或DPO来让模型按照你的期望行事。

And these days, the common paradigm is you train your language model via pre training, then you post train, you have some labels for your words or preferences of outputs, and then you do RLHF or you do DPO to make a model do what you want it to do.

Speaker 0

但这里的框架或理念是,一旦你达到超人工智能,人类可能根本无法看清它在做什么,也就无法为其提供什么是好、什么是坏的标签。

But the framework or the idea here is once you get to superhuman AI, well, maybe humans can't actually see what it does and kind of give it the labels of what is good and what's not.

Speaker 0

因此,这种内部一致性最大化框架能够在无需人类外部监督的情况下,引导大语言模型展现出良好且期望的行为。

So this internal coherence maximization framework makes it so you can elicit the good behaviors, the desired behaviors from the LLM without external supervision by humans.

Speaker 0

与以往类似方向的努力相比,这里的显著区别在于他们实现了规模化应用。

The key distinction here from previous efforts in this kind of direction is that they do it at scale.

Speaker 0

他们训练了一个完全不依赖人类标签的Claude 3.5 Haiku助手,并取得了优于其人类监督版本的性能。

So they train a Claude 3.5 Haiku-based assistant without any human labels and achieve better performance than its human-supervised counterparts.

Speaker 0

他们在规模显著的大语言模型上实际验证了这一方法的有效性,这可能对未来更大规模的模型产生深远影响。

They demonstrate in practice on a significantly sized LLM that this approach can work and this could have implications for future, even larger models.

Speaker 0

接下来,讲几则关于政策方面的故事。

Next up, a couple of stories on the policy side.

Speaker 0

好吧,其实只有一个故事。

Well, only one story.

Speaker 0

这则故事关于台湾,台湾已对华为和中芯国际实施了技术出口管制。

It's about Taiwan and it has imposed technology export controls on Huawei and SMIC.

Speaker 0

台湾实际上已将华为和中芯国际(半导体制造国际公司)列入黑名单。

Taiwan has actually blacklisted Huawei and SMIC, Semiconductor Manufacturing International Corp.

Speaker 0

这一举措来自台湾的国际贸易管理局。

And this is from Taiwan's International Trade Administration.

Speaker 0

他们还涵盖了这些公司的子公司。

They have also included subsidiaries of these.

Speaker 0

这是对台湾所谓战略高科技商品实体清单的更新。

It's an update to their so called strategic high-tech commodities entity list.

Speaker 0

并且,据称还新增了来自俄罗斯、巴基斯坦、伊朗、缅甸和中国大陆的601家实体。

And apparently, they added not just those but 601 entities from Russia, Pakistan, Iran, Myanmar, and Mainland China.

Speaker 1

是的。

Yeah.

Speaker 1

当你看到这一点时,你可能会想:等等。

And one, you know, reaction you might have looking at this is like, wait a minute.

Speaker 1

我以为中国已经被禁止获取来自台湾的芯片了。

I thought China was already barred from accessing, for example, chips from Taiwan.

Speaker 1

你说得完全正确。

And you're you're absolutely correct.

Speaker 1

情况确实如此。

That is the case.

Speaker 0

这正是我的反应。

This is my reaction.

Speaker 1

是的。

Yeah.

Speaker 1

是的。

Yeah.

Speaker 1

不。

No.

Speaker 1

完全正确。

Totally.

Speaker 1

完全正确。

Totally.

Speaker 1

这是个很好的问题。

It's a great question.

Speaker 1

那么,到底这里增加了什么内容呢?

Like, so what, like, what is actually being added here?

Speaker 1

原因是美国的出口管制,我们不会深入讨论美国为何有这种影响力,但事实是他们确实有。

And so the answer is because of US export controls, and we won't get into the reason why the US has leverage to do this, but they do.

Speaker 1

台湾芯片至少在理论上不会流入中国大陆。

Taiwanese chips are not going into Mainland China, at least theoretically.

Speaker 1

显然,华为找到了绕过这一限制的方法。

Obviously, Huawei finds ways around that.
