Lex Fridman Podcast - #490 – 2026年人工智能现状:大语言模型、编程、扩展法则、中国、智能体、GPU、通用人工智能

#490 – 2026年人工智能现状:大语言模型、编程、扩展法则、中国、智能体、GPU、通用人工智能

#490 – State of AI in 2026: LLMs, Coding, Scaling Laws, China, Agents, GPUs, AGI

本集简介

内森·兰伯特与塞巴斯蒂安·拉施卡是机器学习领域的研究者、工程师及教育者。内森现任艾伦人工智能研究所(AI2)训练后阶段负责人,并著有《RLHF手册》;塞巴斯蒂安·拉施卡是《从零构建大语言模型》与《从零构建推理模型》的作者。

感谢收听 ❤ 赞助商链接:https://lexfridman.com/sponsors/ep490-sc

时间戳、文字稿及反馈提交入口见下方:
文字稿:https://lexfridman.com/ai-sota-2026-transcript

联系莱克斯:
反馈 – 向莱克斯提交意见:https://lexfridman.com/survey
问答 – 提交问题或视频连麦:https://lexfridman.com/ama
招聘 – 加入团队:https://lexfridman.com/hiring
其他 – 其他联系方式:https://lexfridman.com/contact

赞助商:
支持本节目可享受以下赞助商优惠:
Box:智能内容管理平台 访问 https://box.com/ai
Quo:企业电话系统(通话、短信、联系人) 访问 https://quo.com/lex
UPLIFT Desk:站立式办公桌与人体工学设备 访问 https://upliftdesk.com/lex
Fin:客服AI助手 访问 https://fin.ai/lex
Shopify:在线销售平台 访问 https://shopify.com/lex
CodeRabbit:AI代码审查工具 访问 https://coderabbit.ai/lex
LMNT:无糖电解质冲饮 访问 https://drinkLMNT.com/lex
Perplexity:AI问答引擎 访问 https://perplexity.ai/

内容框架:
(00:00) – 开场
(01:39) – 赞助商环节与讨论
(16:29) – 中美AI竞赛:谁将胜出?
(25:11) – ChatGPT vs Claude vs Gemini vs Grok:当前领先者
(36:11) – 编程最佳AI工具
(43:02) – 开源与闭源大模型之争
(54:41) – Transformer架构:2019年以来的演进
(1:02:38) – AI扩展定律:失效还是依然有效?
(1:18:45) – AI训练三阶段:预训练、中期训练与训练后
(1:51:51) – 训练后技术解析:大语言模型新研究方向
(2:12:43) – 给AI开发与研究新手的建议
(2:35:36) – AI行业工作文化(每周72+小时)
(2:39:22) – 硅谷泡沫现象
(2:43:19) – 文本扩散模型等新方向
(2:49:01) – 工具使用能力
(2:53:17) – 持续学习
(2:58:39) – 长上下文处理
(3:04:54) – 机器人技术
(3:14:04) – AGI实现时间表
(3:21:20) – AI会取代程序员吗?
(3:39:51) – AGI梦想正在破灭?
(3:46:40) – AI如何实现盈利?
(3:51:02) – 2026年重大并购预测
(3:55:34) – OpenAI、Anthropic、Google DeepMind、xAI、Meta的未来
(4:08:08) – AI曼哈顿计划
(4:14:42) – 英伟达、GPU与AI算力集群的未来
(4:22:48) – 人类文明的未来

双语字幕


Speaker 0

以下是一场关于人工智能最新进展的对话,涵盖过去一年中人工智能领域一些令人兴奋的技术突破和进展,以及我们对即将到来的一年可能发生的有趣变化的展望。

The following is a conversation all about the state of the art in artificial intelligence, including some of the exciting technical breakthroughs and developments in AI that happened over the past year and some of the interesting things we think might happen this upcoming year.

Speaker 0

有时内容会非常技术化,但我们始终努力确保非专业人士也能轻松理解,而不会简化或降低深度。

At times, it does get super technical, but we do try to make sure that it remains accessible to folks outside the field without ever dumbing it down.

Speaker 0

能够与我在人工智能领域最欣赏的两位人士——塞巴斯蒂安·拉施卡和内森·兰伯特——一起制作这期节目,我感到无比荣幸和愉快。

It is a great honor and pleasure to be able to do this kind of episode with two of my favorite people in the AI community, Sebastian Raschka and Nathan Lambert.

Speaker 0

他们都是广受尊敬的机器学习研究者和工程师,同时也是一流的沟通者、教育者、作家和推特/X平台上的活跃者。

They are both widely respected machine learning researchers and engineers who also happen to be great communicators, educators, writers, and Twitterers, x posters.

Speaker 0

塞巴斯蒂安撰写了两本书,无论是初学者还是专家,我都强烈推荐。

Sebastian is the author of two books I highly recommend for beginners and experts alike.

Speaker 0

第一本是《从零开始构建大语言模型》,另一本是《从零开始构建推理模型》。

The first is Build a Large Language Model (From Scratch), and the other is Build a Reasoning Model (From Scratch).

Speaker 0

我真心相信,在机器学习和计算机科学领域,学习和理解某样东西的最佳方式就是亲自从零开始构建它。

I truly believe in the machine learning computer science world, the best way to learn and understand something is to build it yourself from scratch.

Speaker 0

内森是艾伦人工智能研究所的后训练负责人,也是《基于人类反馈的强化学习》一书的作者,这本书是该领域的权威之作。

Nathan is the post training lead at the Allen Institute for AI and author of the definitive book on reinforcement learning from human feedback.

Speaker 0

他们都有很棒的X账号和优秀的Substack。

Both of them have great X accounts and great Substacks.

Speaker 0

Sebastian在YouTube上开设了课程。

Sebastian has courses on YouTube.

Speaker 0

Nathan有一个播客,大家绝对都应该关注所有这些内容。

Nathan has a podcast, and everyone should absolutely follow all of those.

Speaker 0

现在简要介绍一下各个赞助商。

And now a quick few second mention of each sponsor.

Speaker 0

请在描述中或访问lexfridman.com/sponsors查看他们。

Check them out in the description or at lexfridman.com/sponsors.

Speaker 0

这实际上是支持这个播客的最佳方式。

It is, in fact, the best way to support this podcast.

Speaker 0

我们有很多优秀的赞助商。

We got a bunch of great sponsors.

Speaker 0

Box用于智能内容管理,Quo用于企业电话系统,包括通话、短信和联系人,Uplift Desk是我身后坐着的办公桌,也是我最喜欢的办公桌。

Box for intelligent content management, Quo for a phone system, like calls, text, contacts for your business, Uplift desk, the desk I'm sitting behind, and my favorite office desk.

Speaker 0

Fin提供客户服务AI代理,Shopify用于在线销售,CodeRabbit提供AI驱动的代码审查,LMNT提供电解质,当然还有我们长期的朋友Perplexity。

Fin for customer service AI agents, Shopify for selling stuff online, CodeRabbit for AI powered code review, LMNT for electrolytes, and of course, our long time friend, Perplexity.

Speaker 0

对于以好奇心驱动的知识探索,请明智选择,朋友们。

For curiosity driven knowledge exploration, choose wisely, my friends.

Speaker 0

现在进入完整的广告播报。

And now on to the full ad reads.

Speaker 0

我努力让它们变得有趣,但如果你跳过了,请务必了解一下这些赞助商。

I try to make them interesting, but if you do skip, please do check out the sponsors.

Speaker 0

我喜欢他们的产品。

I enjoy their stuff.

Speaker 0

也许你也会喜欢。

Maybe you will too.

Speaker 0

如需联系我,无论出于什么原因,请访问lexfridman.com/contact。

To get in touch with me, for whatever reason, go to lexfridman.com/contact.

Speaker 0

如果你没看出来,我此刻正努力让自己更有精神,因为我昨晚熬得很晚,几乎没怎么睡觉。

If you can't tell, I'm trying to have a bit of a pep in my step at the moment because I had a long night, didn't get much sleep at all.

Speaker 0

我现在全靠意志撑着,神志恍惚,却很开心,分不清什么是现实,什么是梦境。

So I am running on fumes, delirious, happy, unsure of what is reality and what is a dream.

Speaker 0

事实上,我们现在可能就活在一场梦里。

In fact, we could right now be living inside of a dream.

Speaker 0

我最近经历了很多事。

I have been going through a lot.

Speaker 0

我一直在超负荷工作。

I have been working insane hours.

Speaker 0

事情太多了。

So much going on.

Speaker 0

我快被压垮了。

I'm so overwhelmed.

Speaker 0

当然,和往常一样,我依然无比感激并庆幸自己还活着,但没能像我希望的那样发布那么多集节目,所以还有很多广告赞助需要补上。

Of course, as always, truly grateful and happy to be alive, but have not been able to publish as many episodes as I would like, so there's a bunch of sponsors we have to catch up on.

Speaker 0

你们的支持对我来说意义非凡。

Your support truly means the world.

Speaker 0

请查看所有赞助商。

Please check out all the sponsors.

Speaker 0

如果你觉得对他们产品感兴趣,就去买吧。

If you think it might be useful to you, buy their stuff.

Speaker 0

这确实是支持这个播客最好的方式。

It really is the best way to support this podcast.

Speaker 0

好的。

Alright.

Speaker 0

我们开始吧。

Let's go.

Speaker 0

首先,本集由Box赞助,这是一款基于云的内容管理、文件共享以及企业各类协作的平台。

First up, this episode is brought to you by Box, a cloud based platform for content management, file sharing, and all kinds of collaboration or all kinds of content for your businesses.

Speaker 0

和许多公司一样,关键问题是:人工智能如何被用来提升企业的业务表现?

Like with a lot of companies, the big question is, how is AI leveraged to make whatever the business does better?

Speaker 0

很多公司只是出于炒作和贴标签的目的使用它。

A lot of companies kinda use it for the hype and the label.

Speaker 0

看这个还挺搞笑的。

It's kinda hilarious to watch.

Speaker 0

人们只是说,比如,由AI驱动。

People just say, like, powered by AI.

Speaker 0

I

Speaker 1

不在乎你是不是

don't care if you're

Speaker 0

一家面包店,由AI驱动。

a bakery, powered by AI.

Speaker 0

我不确定。

I don't know.

Speaker 0

但抛开所有炒作之外,这是人类创造过的最了不起的成就之一。

But outside of all of the hype, it is one of the most incredible things that humans have ever created.

Speaker 0

因此,能够很好地利用这一点的公司才会获胜。

And so companies that can leverage that well are the companies that win.

Speaker 0

当然,Box 在文件和内容管理方面享有盛誉,尤其是在处理大规模场景时。

And of course, Box is legendary for its file and content management, especially when you're talking about scale.

Speaker 0

因此,很显然,AI 可以用来帮助自动化一些文档处理、工作流程和组织工作,而他们在这方面做得非常出色。

So obviously, it's amenable for the utilization of AI to help automate some of the document processing, some of the workflow, some of the organization, and they do that exceptionally well.

Speaker 0

他们有一个系统,正如你所猜测的,叫做 Box AI,专门做这些事。

They have a system called, as you could imagine, Box AI that does just that.

Speaker 0

我非常喜欢。

I love it.

Speaker 0

他们在界面和后端的实现都非常出色,一切都运行得极其顺畅。

They do an excellent implementation on the interface side, on the back end side, everything works extremely nicely.

Speaker 0

今天就帮助你在整个组织中扩展 AI,访问 box.com/ai,也就是 box.com/ai,了解更多信息。

Help scale AI across your organization today, and go to box.com/ai, that's box.com/ai to learn more.

Speaker 0

本集节目还由 Quo 赞助,拼写为 q u o。

This episode is also brought to you by Quo, spelled q u o.

Speaker 0

巧合的是,这是一个由三个字母组成的公司名称,能帮你赢下拼字游戏。

Also happens to be a company name with just three letters that will help you win at Scrabble.

Speaker 0

在拼字游戏中可以使用公司名称吗?

Are you allowed to use company names in Scrabble?

Speaker 0

字母Q是多少分?

How many points is q?

Speaker 0

字母U是多少分?

How many points is u?

Speaker 0

我在想象会有好多分。

I'm imagining a lot.

Speaker 0

我刚开始学英语的时候,这让我特别困惑。

That was one of the big confusions to me when I was first learning the English language.

Speaker 0

我一直觉得Q应该在字母表的末尾,比如排在Z前面。

It always felt like q should be at the end of the alphabet, maybe like q z.

Speaker 0

让我惊讶的是,Q居然在字母表的前面位置。

It was always surprising to my limited brain capacity that q was earlier on in the alphabet.

Speaker 0

这是为什么?

What is it?

Speaker 0

O P Q?

O p q?

Speaker 0

我甚至无法在脑海中准确定位字母在字母表中的位置,我相信对很多人来说都是这样,除非我逐个在心里默念整个字母表。

I can't even actually localize letters in the alphabet, I'm sure that's the case for a lot of people, without reading the alphabet in my head sequentially.

Speaker 0

所有这些都与短期和长期记忆的访问、人类认知的功能与局限性,乃至可能的认知系统整体有关,这些都与本集内容相关,但与我本该谈论的、曾名为OpenPhone的Quo的精彩之处关系不大。

All of this has to do with short term and long term memory access, the functioning, the limitation of human cognition, and maybe cognitive systems in general, all of it relevant to this particular episode and not so relevant to the awesomeness of Quo, formerly known as OpenPhone, that I should be talking about.

Speaker 0

当然,和往常一样,我认为这里的重点,以及生活中处处的重点,都是发自内心地谈论你真正想说的内容,这也是我对待一切事物的方式。

Of course, as is always the case, I think the point here and the point everywhere in the point of life is to talk from the heart about whatever you want, and that's what I try to do with everything.

Speaker 0

更广泛地说,我想说的时候就说,不想说的时候就闭嘴,专心倾听。

And to generalize that even more, to talk whenever I want and to shut the f up whenever I want and listen.

Speaker 0

相比说话,我更常倾向于倾听。

And I prefer that more often than I prefer to talk.

Speaker 0

在这里插入一个巧妙的过渡,因为"谈话"多少和这个话题有关。

Insert clever transition here because talk is somehow relevant.

Speaker 0

确实如此。

It is.

Speaker 0

所以,Quo,前身为OpenPhone,帮助超过9万家企业管理电话、短信、联系人以及各种与业务相关的电话事务。

So Quo, formerly known as OpenPhone, helps over 90,000 businesses manage phone calls, texts, contacts, all kinds of phone related stuff for business.

Speaker 0

你有一大群客户、大量来电,以及需要接听这些电话、管理这些事务的业务人员。

You have a bunch of customers, a bunch of incoming calls, a bunch of people on the business side that have to answer those calls, have to manage it.

Speaker 0

这个特定请求的状态如何、语音邮件、文字记录,所有这些内容,显然,通过AI的高效运用让这一切变得更加高效。

What's the status of this particular request, voicemails, transcripts, all that kind of stuff, and obviously, really nice effective utilization of AI to make that really efficient.

Speaker 0

但真正对这类产品重要的是界面要好,团队协作要顺畅,而Quo在这方面做得很好。

But really, what's really important for things like this is that the interface is good, that team collaboration is good, and Quo delivered on that.

Speaker 0

前往quo.com/lex免费试用Quo,并在前六个月享受20%折扣,网址是quo.com/lex。

Try Quo for free, plus get 20% off your first six months when you go to quo.com/lex, that's quo.com/lex.

Speaker 0

告诉你的朋友们吧,因为它或许能帮他们在拼字游戏中获胜。

Tell your friends about it because it just might help them win at Scrabble.

Speaker 0

说到拼字游戏,你通常想在桌子上玩拼字游戏。

Speaking of Scrabble, you usually wanna play Scrabble on a table.

Speaker 0

这真是一种神奇的体验。

It's such a magical experience.

Speaker 0

我刚刚回忆起一段遥远的过去,那时我和朋友坐在桌旁玩拼字游戏。

I just had a vision from a distant past of me sitting with a friend and playing Scrabble at a table.

Speaker 0

这人生充满了美好的回忆,却为何如此短暂?

What is this life full of beautiful memories and then it's over too soon?

Speaker 0

是的。

Yeah.

Speaker 0

那种忧郁的感觉很美,我觉得。

That melancholy feeling is beautiful, I think.

Speaker 0

来个巧妙的过渡吧,也许就像马克·诺曼那样,因为接下来这家公司的名字叫Uplift Desk。

Insert another clever transition, a la Mark Norman maybe, because of the name of this next company's Uplift Desk.

Speaker 0

正如我所说。

As I said.

Speaker 0

好的。

Okay.

Speaker 0

这是我最钟爱的办公桌,也是我用来做播客家具的桌子。

It's my go to favorite office desk, and it's also the desk that I use for podcast furniture.

Speaker 0

我已经数不清了。

I already lost count.

Speaker 0

我家里有很多Uplift升降桌,到处都是。

I have a lot of Uplift desks, standing desks in my place, everywhere.

Speaker 0

到处都是桌子。

It's desks everywhere.

Speaker 0

我地上放了张床垫和几张Uplift桌子。

I have a mattress in the floor and Uplift desks.

Speaker 0

我有一台用于机器人开发的Linux电脑。

So I have a Linux box for robotics.

Speaker 0

我有一台机器,用来做大量剪辑工作。

I have a machine where I do a lot of the editing.

Speaker 0

所有这些设备都放在桌子上。

All of that is on a desk.

Speaker 0

我为播客桌准备了三张桌子。

I have the three tables for the podcast desk.

Speaker 0

就是你过去几年看到的那张桌子,全部都是Uplift升降桌。

The very one you've seen over the past several years, that's all Uplift desks.

Speaker 0

我通常不会把它们调成站立模式,但它们确实是升降桌。

I usually don't put them in standing mode, but they are standing desks.

Speaker 0

它让我能轻松做各种事情,材质非常好,非常稳固。

It allows me to do all kinds of stuff, really easy to work with, really nice material, really sturdy.

Speaker 0

我简直爱死Uplift升降桌的方方面面了。

I just love everything about Uplift desk.

Speaker 0

他们说想赞助这个节目时,我已经用了他们的产品很多年,当时我简直高兴疯了。

When they said they wanted to sponsor after I'd been using them for many years, I lost my mind.

Speaker 0

当我长期钟爱一家公司、钟爱他们的产品,还能有机会为他们大力宣传时,我特别开心。

I love it when I've been in love with a company, in love with their product for such a long time, and I get to also sing them praises.

Speaker 0

我的天啊。

I mean, come on.

Speaker 0

你接下来是不是要告诉我FFmpeg也想赞助这个播客?

What are you gonna tell me next that FFmpeg wants to sponsor this podcast?

Speaker 0

另一个开源项目并不是我为之倾心的公司。

Another sort of open source project is not a company that I've been in love with.

Speaker 0

总之,前往 upliftdesk.com/lex 并使用代码 lex,即可免费获得四种配件、免费当日发货、免费退换、十五年保修,以及订单额外折扣。

Anyway, go to upliftdesk.com/lex and use code lex to get four free accessories, free same day shipping, free returns, a fifteen year warranty, and an extra discount off your entire order.

Speaker 0

那就是 upliftdesk.com/lex。

That's upliftdesk.com/lex.

Speaker 0

把拼写说清楚真的对谁有帮助吗?

Does spelling it out really help anybody?

Speaker 0

我不知道,但他们真的恳求我了。

I don't know, but they really said pretty please.

Speaker 0

唯一的请求就是把网址拼写出来。

The one request is spell it out.

Speaker 0

这生活到底是怎么回事?

Again, what is this life?

Speaker 0

太不可思议了。

Incredible.

Speaker 0

本集节目还由Fin赞助,Fin是客户服务领域排名第一的AI代理。

This episode is also brought to you by Fin, the number one AI agent for customer service.

Speaker 0

找到细分领域,并成为第一名。

Find the niche and become number one.

Speaker 0

这就是这里的理念。

That's the idea here.

Speaker 0

任何正在构建AI公司的人,我们之前也讨论过,AGI的梦想已经破灭了吗?

Anybody building an AI company, and we talk about this, is the dream of AGI dead?

Speaker 0

我认为对许多公司来说,成功在于细分领域,但确实有少数公司能做到,而Fin正是在这一细分领域交付了成果。

I think for a lot of companies, success is in the niche, but there are a few, and Fin delivers on that niche.

Speaker 0

它已获得超过6000位客户服务负责人和顶级公司的信赖,包括AI公司。

It's trusted by over 6,000 customer service leaders and top companies, including AI companies.

Speaker 0

当一家AI公司信任你的公司来处理其客户服务时,这就意味着你确实是可靠的。

When an AI company trusts your company to do its customer service, that means you're legit.

Speaker 0

提供90天无理由退款保障,最高可达100万美元,专为处理复杂的多步骤查询而设计,如退货、换货和纠纷。

Ninety day money back guarantee up to $1,000,000 built to handle complex multi step queries like returns, exchanges, and disputes.

Speaker 0

前往 fin.ai/lex 了解如何转型您的客户服务并扩大您的支持团队。

Go to fin.ai/lex to learn more about transforming your customer service and scaling your support team.

Speaker 0

那就是 fin.ai/lex。

That's fin.ai/lex.

Speaker 0

我不明白我为什么突然换了这种夸张的语气。

I don't know why I switched to this hyping voice.

Speaker 0

糟糕的播音员,糟糕的电台主持人,糟糕的广告配音。

Crappy announcer, crappy radio jockey, crappy ad read voice.

Speaker 0

事情就是这样。

It is what it is.

Speaker 0

感谢你们一直陪我听到现在。

Thank you for sticking with me this long.

Speaker 0

我感受到了你们的爱,我也把这份爱送还给你们。

I feel the love and I send it right back at you.

Speaker 0

本集同样由一家工程师们充满热情的公司——Shopify——赞助。

This episode is also brought to you by a company whose engineers are also full of love, Shopify.

Speaker 0

每次想到Shopify,我都会忍不住微笑,因为我曾在NeurIPS大会上见过他们的工程展台,那是一个机器学习会议。

It just brings a smile to my face every time I think about Shopify, I got to see their engineering booth at NeurIPS, which is a machine learning conference.

Speaker 0

真是些才华横溢、非常出色的人。

Really brilliant people, wonderful people.

Speaker 0

当然,首席执行官Toby至今仍在写代码、开发产品,依然深度参与工程细节,现在他还经常谈论如何在个人兴趣项目以及公司内部应用大语言模型。

Of course, the CEO Toby is still programming, still building stuff, still in on the details of the engineering, and now is talking quite a bit about utilization of LLMs for his own sort of pet projects, but also inside the company.

Speaker 0

当公司最高层都如此热爱工程时,这简直令人难以置信。

It's just incredible when from the very top, the company is in love with engineering.

Speaker 0

这是一场对卓越工程的颂扬。

It's a celebration of great engineering.

Speaker 0

就像与DHH的对话一样,他是Ruby on Rails的缔造者,而Shopify正是基于它构建的,那场对话同样是对卓越工程的赞美。

Just like the conversation with DHH, who is the guy behind Ruby on Rails that Shopify was built on, that conversation was a celebration of great engineering.

Speaker 0

还有工程本身的美。

The beauty of engineering as well.

Speaker 0

总之,去听听那期节目吧,感受一下Ruby on Rails的神奇、Shopify的魔力,以及我们所谈论的Toby的风采。

Anyway, listen to that episode to to see some of the magic of Ruby on Rails and the magic of Shopify and the magic of Toby that we talk about.

Speaker 0

总之,前往 shopify.com/lex 注册每月一美元的试用期。

Anyway, sign up for a $1 per month trial period at shopify.com/lex.

Speaker 0

全部都是小写字母。

That's all lowercase.

Speaker 0

立即访问 shopify.com/lex,将您的业务提升到新水平。

Go to shopify.com/lex to take your business to the next level today.

Speaker 0

本集还由 CodeRabbit 赞助,这是一个直接在您的终端中提供 AI 驱动代码审查的平台。

This episode is also brought to you by CodeRabbit, a platform that provides AI powered code reviews directly within your terminal.

Speaker 0

我们在本集中多次谈到完全自动化程序员的时间表。

We talk a lot in this episode about the timeline for the full automation of the human programmer.

Speaker 0

我认为我们离完全将人类排除在流程之外还很遥远。

I think we're quite far away from taking the human out of the loop.

Speaker 0

审查过程、调试过程,这些都是编程中至关重要的部分,正如我们在节目中所讨论的那样。

That review process, the debugging process, all of that, that's such a crucial part of programming, especially just like we talked about in the episode.

Speaker 0

当我们谈论的不是个人网站——在那里,HTML 的杂乱无章也能被网页浏览器神奇地、自动地渲染出来(我不知道它们究竟是如何做到如此出色地渲染杂乱代码的)——但网页浏览器确实能够渲染杂乱代码,包括 AI 生成的代码。

When we're not talking about a personal website where HTML slop is something that a web browser magically, automagically, I don't know how they're possibly able to do such incredible job of rendering slop, but a web browser is in fact able to render slop, including AI slop.

Speaker 0

它总能找到办法。

It just finds a way.

Speaker 0

所以真正的问题是,当你有生产代码,而许多用户依赖它时,你该如何审查这些代码?

So really the question is when you have production code, something that a lot of users are relying on, how do you review that code?

Speaker 0

你如何确保捕捉到错误?

How do you make sure you're catching the errors?

Speaker 0

你如何确保为AI编码代理可能产生的幻觉和逻辑错误设置一道防线?

How are you making sure that you put a backstop to hallucinations and the logical errors that AI coding agents can generate?

Speaker 0

总之,CodeRabbit支持所有编程语言。

Anyway, CodeRabbit supports all programming languages.

Speaker 0

今天就前往 coderabbit.ai/lex 安装 CodeRabbit CLI。

Install CodeRabbit CLI today at coderabbit.ai/lex.

Speaker 0

就是 coderabbit.ai/lex。

That's coderabbit.ai/lex.

Speaker 0

本集还由 LMNT 赞助,这是我每天饮用的无糖且美味的电解质饮品。

This episode is also brought to you by LMNT, my daily zero sugar and delicious electrolyte mix.

Speaker 0

这让我想起我还需要剪辑那段我和保罗·罗索利在丛林里的视频,他真是一位了不起的人。

Reminds me of the fact that I need to get to editing the video of me in the jungle with Paul Rosolie, who is such an incredible human.

Speaker 0

祝贺保罗取得的所有成就。

Congratulations to Paul on all all of his success.

Speaker 0

去买他的书吧。

Go get his book.

Speaker 0

这是一本了不起的书。

It's an incredible book.

Speaker 0

再说一次,他是个了不起的人,有着非凡的使命,而我也确实需要去剪辑并发布这段视频,哪怕只是最低限度地完成它。

Again, he's an incredible person with an incredible mission, and yes, I need to edit and publish, hoping to at the very least.

Speaker 0

我们丛林之旅的故事,那是一场对自然、友谊以及人类体验丰富性的美好颂扬。

The story of our journey in the jungle because it was a beautiful celebration of nature in the jungle and friendship and full richness of the human experience.

Speaker 0

那真美。

It was beautiful.

Speaker 0

我提到这一点是因为在那次旅程中,我严重脱水,还记得自己梦见了LMNT,梦见一杯加了电解质的冰水。

The reason I mention that is I was, as part of that journey, severely dehydrated, and I remember dreaming of LMNT, of a cold drink of water with the electrolytes.

Speaker 0

你的身体渴望它,因为它需要它。

Your body craves it and it craves it because it needs it.

Speaker 0

电解质、钠、钾、镁,当你缺乏时,不仅仅是水,还有电解质。

Electrolytes, sodium, potassium, magnesium, when you're deprived, it's not just water, it's electrolytes.

Speaker 0

所以,我总是记得这一点。

So anyway, I always remember that.

Speaker 0

凡购买即可免费获得八片装试用装。

Get a free eight count sample pack with any purchase.

Speaker 0

请前往 drinkLMNT.com/lex 试用。

Try it at drinkLMNT.com/lex.

Speaker 0

这是莱克斯·弗里德曼播客。

This is the Lex Fridman podcast.

Speaker 0

要支持本节目,请查看描述中的赞助商信息,你还可以在那里找到联系我、提问、提供反馈等链接。

To support it, please check out our sponsors in the description where you can also find links to contact me, ask questions, get feedback, and so on.

Speaker 0

现在,亲爱的朋友们,有请塞巴斯蒂安·拉施卡和内森·兰伯特。

And now, dear friends, here's Sebastian Raschka and Nathan Lambert.

Speaker 0

所以我认为,看待这一切的一个有用视角是所谓的‘深度求索时刻’。

So I think one useful lens to look at all of this through is the so-called DeepSeek moment.

Speaker 0

这件事发生在大约一年前的2025年1月,当时中国开放权重公司深度求索(DeepSeek)发布了DeepSeek R1。我认为可以说,它以接近或达到最前沿的性能令所有人感到惊讶,而且据称所用算力更少、成本也低得多。

This happened about a year ago in January 2025 when the open weight Chinese company DeepSeek released DeepSeek R1 that, I think it's fair to say, surprised everyone with near or at state of the art performance with allegedly much less compute for much cheaper.

Speaker 0

从那时至今,人工智能领域的竞争已经变得疯狂,无论是在研究层面还是产品层面。

And from then to today, the AI competition has gotten insane, both on the research level and the product level.

Speaker 0

进展一直在加速。

It's just been accelerating.

Speaker 0

今天我们来讨论这一切,也许我们可以先提出一些尖锐的问题。

Let's discuss all of this today, and maybe let's start with some spicy questions if we can.

Speaker 0

在国际层面,谁赢了?

Who's winning at the international level?

Speaker 0

你会说是中国的一批公司,还是美国的一批公司领先呢?

Would you say it's a set of companies in China or the set of companies in The United States?

Speaker 0

塞巴斯蒂安、内森,很高兴见到你们。

And Sebastian, Nathan, it's good to see you guys.

Speaker 0

那么,塞巴斯蒂安,你认为谁正在胜出?

So, Sebastian, who do you think is winning?

Speaker 2

胜出是一个非常宽泛的术语。

So winning is a very broad, you know, term.

Speaker 2

我想说你提到了深度求索时刻,我确实认为深度求索正在赢得开源模型工作者的心,因为他们将这些作为开放模型分享。

I would say you mentioned the DeepSeek moment, and I do think DeepSeek is definitely winning the hearts of the people who work on open weight models because they share these as open models.

Speaker 2

我认为胜出具有多个时间尺度

Winning, I think, has multiple timescales to

Speaker 1

它。

it.

Speaker 2

我们有今天。

We have today.

Speaker 2

我们有明年。

We have next year.

Speaker 2

我们有十年后。

We have in ten years.

Speaker 2

我确定的一点是,以现在2026年的状况来看,不会有任何一家公司能拥有其他公司都无法接触到的技术。

One thing I know for sure is that I don't think nowadays, 2026, that there will be any company who is, let's say, having access to a technology that no other company has access to.

Speaker 2

这主要是因为研究人员经常换工作、换实验室。

And that is mainly because researchers are frequently changing jobs, changing labs.

Speaker 2

他们会在不同地方轮转。

They rotate it.

Speaker 2

因此,我不认为在技术获取方面会出现明显的赢家。

So I don't think there will be a clear winner in terms of technology access.

Speaker 2

然而,我认为差异化的因素将是预算和硬件限制。

However, I do think there will be the differentiating factor will be budget and hardware constraints.

Speaker 2

所以,我认为这些想法本身不会是专有的,但实现它们所需的方式和资源会是关键。

So I don't think the ideas will be proprietary, but the way or the resources that are needed to implement them.

Speaker 2

因此,我看不到当前会出现赢家通吃的局面。

And so I don't currently see a scenario where a winner takes it all.

Speaker 2

目前我无法想象这种情况会发生。

I I can't see that at the moment.

Speaker 0

内森,你怎么看?

Nathan, what do you think?

Speaker 1

你看各个实验室在它们所做的事情上投入的能量是不同的。

You see the labs put different energy into what they're trying to do.

Speaker 1

我认为,就我们录制这段内容的时间点而言,关于Anthropic的Claude Opus 4.5模型的炒作简直疯狂,我确实用过它。

And I think to demarcate the point in time when we're recording this, the hype over Anthropic's Claude Opus 4.5 model has been absolutely insane, which is just, I mean, I've used it.

Speaker 1

在过去几周里,我搭建了一些东西,它的热度几乎已经到了像一个梗的地步。

I've built stuff in the last few weeks, and it's it's almost gotten to the point where it feels like a bit of a meme in terms of the hype.

Speaker 1

这挺有趣的,因为这种热度完全是自发形成的。

And it's kind of funny because this is very organic.

Speaker 1

如果我们回溯几个月前,就能查到它的发布日期。

And then if we go back a few months ago, we can get the release date.

Speaker 1

当时的公告说,谷歌的Gemini 3发布了。

And the notes says Gemini three from Google got released.

Speaker 1

当时那场发布在营销和震撼效果上的声势确实非常高。

And it seemed like the marketing and just, like, wow factor of that release was super high.

Speaker 1

但到了十一月,Claude Opus 4.5 发布了,热度一直在上升。

But then in November, Claude Opus 4.5 was released, and the hype has been growing.

Speaker 1

但 Gemini 三号是在这之前发布的,现在感觉人们谈论得没那么多了。

But Gemini three was before this, and it kind of feels like people don't really talk about it as much.

Speaker 1

尽管刚发布时,大家都觉得这是 Gemini 抓住机会重新夺回谷歌在人工智能领域结构性优势的时刻。

Even though when it came out, everybody was like, this is Gemini's moment to retake kind of Google's structural advantages in AI.

Speaker 1

Gemini 三号是个非常出色的模型,我至今仍在使用。

And Gemini three is a fantastic model, and I still use it.

Speaker 1

只是它的差异化显得没那么明显了。

It's just kind of differentiation is lower.

Speaker 1

我同意塞巴斯蒂安你说的这些观点。

And I agree with, Sebastian, what you're saying with all these.

Speaker 1

比如,理念空间非常流动,但在文化上,Anthropic一向以重注代码著称,而现在这项Claude Code押注正在为他们带来回报。

Like, the idea space is very fluid, but culturally, Anthropic is known for betting very hard on code, and this Claude Code thing is working out for them right now.

Speaker 1

所以我认为,即使理念可以自由流动,但很大程度上,这一切都受限于人力投入和组织文化,而 Anthropic 至少在呈现上显得最不混乱。

So I think that even if the ideas flow pretty freely, so much of this is bottlenecked by human effort and kind of culture of organizations where anthropic seems to at least be presenting as the least chaotic.

Speaker 1

这确实是个优势,如果他们能持续一段时间的话。

It's it's it's a bit of an advantage, and if they can keep doing that for a while.

Speaker 1

但另一方面,中国有许多令人担忧的技术进展,那里的实验室数量远不止DeepSeek一家。

But on the other side of things, there's a lot of ominous technology from China, where there are way more labs than DeepSeek.

Speaker 1

所以DeepSeek在中国掀起了一场运动。

So DeepSeek kicked off a movement within China.

Speaker 1

这有点像ChatGPT在美国引发的浪潮,当时每个产品都得有个聊天机器人。

I say kind of similar to how ChatGPT kicked off a movement in The US where everything had a chatbot.

Speaker 1

现在中国有大量科技公司发布了非常强大的前沿开放权重模型,以至于我会说,DeepSeek正在失去其作为中国首要开源模型制造者的地位,而像Z.ai的GLM模型、MiniMax的模型、Kimi月之暗面等,尤其是在最近几个月,表现得更加亮眼。

There's now tons of tech companies in China that are releasing very strong frontier open weight models, to the point where I would say that DeepSeek is kind of losing its crown as the preeminent open model maker in China, and the likes of z.ai with their GLM models, MiniMax's models, and Kimi Moonshot, especially in the last few months, have shone more brightly.

Speaker 1

新的DeepSeek模型依然非常强大,但回头看,这可能会成为一个重要的叙事节点:2025年,DeepSeek出现,随后为更多发布这些出色模型的中国公司提供了平台,使它们能够以一种全新的方式运作。

The new DeepSeek models are still very strong, but it could look back as a big narrative point where, in 2025, DeepSeek came and kind of provided this platform for way more Chinese companies that are releasing these fantastic models to kind of have this new type of operation.

Speaker 1

这些中国公司推出的模型都是开源权重的。

So these models from these Chinese companies are open weights.

Speaker 1

而取决于这一发展轨迹,这些美国公司正在经营的商业模式可能面临风险。

And depending on this trajectory, the business models that these American companies are doing could be at risk.

Speaker 1

但目前,美国有很多人正在为AI软件付费。

But currently, a lot of people are paying for AI software in The US.

Speaker 1

而历史上,在中国和其他地区,人们很少为软件付费。

And historically, in China and other parts of the world, people don't pay a lot for software.

Speaker 0

所以像DeepSeek这样的模型因为是开源权重而受到大众喜爱。

So some of these models like DeepSeek have the love of the people because they are open weight.

Speaker 0

你觉得中国公司会持续发布开源权重模型多久?

How long do you think the Chinese companies keep releasing open weight models?

Speaker 1

我觉得会持续几年。

I would say for a few years.

Speaker 1

我认为,就像在美国一样,目前还没有清晰的商业模式。

I think that, like in The US, there's not a clear business model for it.

Speaker 1

我一直以来都在撰写关于开源模型的文章,而这些中国公司已经意识到了这一点。

I have been writing about open models for a while, and these Chinese companies have realized it.

Speaker 1

因此,有些公司会主动联系我。

So I get inbound from some of them.

Speaker 1

他们很聪明,明白同样的限制,即许多美国科技公司和其他IT公司由于安全顾虑,不会向中国公司支付API订阅费用。

And they're smart and realize the same constraints, which is that a lot of US tech companies and other IT companies won't pay for a API subscription to Chinese companies for security concerns.

Speaker 1

这在科技行业一直是长期存在的习惯。

This has been a long standing habit in tech.

Speaker 1

这些公司的人员将开源模型视为影响并参与美国快速增长的AI支出市场的一种方式,他们对此非常务实。

And the people at these companies then see open weight models as an ability to influence and take part of a huge growing AI expenditure market in The US, and they're very realistic about this.

Speaker 1

这对他们来说是有效的。

And it's working for them.

Speaker 1

我认为,政府会意识到,这正在国际上为该技术的采用建立大量影响力。

And I think that the government will see that that is building a lot of influence internationally in terms of uptake of the technology.

Speaker 1

因此,会有很多激励措施来维持这种状态,但构建这些模型和进行研究的成本非常高。

So there's gonna be a lot of incentives to keep it going, but building these models and doing the research is very expensive.

Speaker 1

所以,我预计最终会迎来整合,

So at some point, I expect consolidation,

Speaker 0

但是

but

Speaker 1

我不认为整合会是2026年的故事;2026年全年的开放模型构建者会比2025年更多,而且其中许多知名的构建者将来自中国。

I don't expect that to be a story of 2026; there will be more open model builders throughout 2026 than there were in 2025, and a lot of the notable ones will be in China.

Speaker 0

你刚才想说什么吗?

You were gonna say something?

Speaker 2

是的。

Yes.

Speaker 2

你提到DeepSeek失去了它的领先地位。

You mentioned DeepSeek losing its crown.

Speaker 2

我认为在某种程度上确实如此,但我们也不能忽视,它们目前仍然略微领先。

I do think to some extent, yes, but we also have to consider, though, they are still, I would say, slightly ahead.

Speaker 2

其他模型并不是因为DeepSeek变差了。

And the other ones, it's not that DeepSeek got worse.

Speaker 2

而是其他模型借鉴了DeepSeek的理念。

It's just that the other ones are using the ideas from DeepSeek.

Speaker 2

比如你提到的Kimi,使用的就是同样的架构。

For example, you mentioned Kimi, same architecture.

Speaker 2

他们正在训练它。

They're training it.

Speaker 2

然后,我们又看到了这种超越现象,他们可能在某个时刻因为拥有更新的模型而略胜一筹。

And then, again, we have this leapfrogging where they might be at some point in time a bit better because they have the more recent model.

Speaker 2

我认为这又回到了一个事实:不会有一个明确的赢家。

And I think this comes back to the the fact that there won't be a clear winner.

Speaker 2

情况就只会是这样。

It's it will just be like like that.

Speaker 2

一个人发布了新成果,另一个人就跟上,而最新的那个——

One person releases something, the other one comes in, and the

Speaker 1

最新的模型很可能始终是最好的模型。

most recent model is probably always the best model.

Speaker 1

是的。

Yeah.

Speaker 1

我们还会看到,中国公司有着不同的激励机制。

We'll also see that Chinese companies have different incentives.

Speaker 1

比如,DeepSeek 非常保密,而像 MiniMax 和 Z.ai 这样的初创公司则不是这样。

So, like, DeepSeek is very secretive, whereas some of these startups — the MiniMaxes and Z.ais of the world — are not.

Speaker 1

这两家公司已经正式提交了IPO文件,正在努力争取西方市场的关注并大力进行推广。

Those two literally have filed IPO paperwork, and they're trying to get Western mindshare and do a lot of outreach there.

Speaker 1

所以我不确定这些激励机制是否会改变模型的开发方式,因为DeepSeek 明显是由对冲基金 High Flyer Capital 建立的。

So I don't know if these incentives will kind of change the model development, because DeepSeek famously is built by a hedge fund, High-Flyer Capital.

Speaker 1

我们并不清楚他们究竟看重什么,也不知道他们用这些模型做什么,或者他们是否在意这一点。

And we don't know exactly what they like, we don't know what they use the models for or if they care about this.

Speaker 0

他们在沟通上很保密。

They're secretive in terms of communication.

Speaker 0

但在描述其模型工作原理的技术报告方面,他们并不保密。

They're not secretive in terms of the technical reports that describe how their models work.

Speaker 0

在这一点上,他们仍然是开放的。

They're still open on that front.

Speaker 0

我们还应该提到关于 Opus 4.5 的炒作:一方面是在 X(Twitter)回音室里被捧为明星,另一方面是真正使用该模型的人数。

And we should also say, on the Opus 4.5 hype, there's the layer of something being the darling of the X (Twitter) echo chamber, and then the actual amount of people that are using the model.

Speaker 0

我认为可以说,ChatGPT和Gemini专注于那些只想解决日常生活中问题的广大用户群体,这个群体规模巨大。

I think it's probably fair to say that ChatGPT and Gemini are focused on the broad user base that just wants to solve problems in their daily lives, and that user base is gigantic.

Speaker 0

所以,关于编程的炒作可能并不能代表实际的使用情况。

So the hype about the coding may not be representative of the actual use.

Speaker 2

我还想说,很多使用模式正如你所说,是品牌认知度和品牌效应,但也几乎是出于习惯,因为ChatGPT已经存在很长时间了。

I would say also a lot of the usage patterns are, like you said, name recognition and brand, but also almost muscle memory, where, you know, ChatGPT has been around for a long time.

Speaker 2

人们只是习惯了使用它,这几乎就像一个飞轮效应。

People just got used to using it, and it's kind of like almost like a flywheel.

Speaker 2

他们会推荐给其他用户等等。

They recommend it to other users and that stuff.

Speaker 2

另一个有趣的点是LLM的定制化功能。

One interesting point is also the customization of LLMs.

Speaker 2

例如,ChatGPT有一个记忆功能。

For example, ChatGPT has a memory feature.

Speaker 2

对吧?

Right?

Speaker 2

所以你可能有一个订阅,用来处理个人事务。

And so you may have a subscription, and you use it for personal stuff.

Speaker 2

但我不确定你是否愿意在工作中使用同一个工具,毕竟私人和工作之间要有界限。

But I don't know if you want to use that same thing at work, you know, because it's a boundary between private and work.

Speaker 2

如果你在公司工作,公司可能不允许,或者你自己也不希望这样。

If you're working at a company, they might not allow that, or you may not want that.

Speaker 2

我认为这也是一个有趣的点,你可能会拥有多个订阅。

And I think that's also an interesting point where you might have multiple subscriptions.

Speaker 2

一个是纯粹用于工作代码的。

One is just clean work code.

Speaker 2

它不会包含任何你的个人图片或业余项目内容。

It has nothing of your personal images or hobby projects in there.

Speaker 2

它只用于工作,而另一个则是你的个人用途。

It's just like the work thing, and then the other one is your personal thing.

Speaker 2

所以我认为这是两种不同的使用场景,不代表你只能选择一个。

So I think that's also something where two different use cases, and it doesn't mean you only have to have one.

Speaker 2

我认为未来也会是多个服务并存。

I think the future is also multiple ones.

Speaker 0

你认为2025年会是哪个模型胜出?2026年呢?

What model do you think won 2025, and what model do you think is gonna win '26?

Speaker 1

我认为在消费者聊天机器人这个领域,问题在于你是否愿意押注Gemini而不是ChatGPT。

I think in the context of consumer chatbots, it's a question of: are you willing to bet on Gemini over ChatGPT?

Speaker 1

在我看来,这有点像冒险,因为OpenAI一直是市场主导者,而在科技领域,这种地位带来了诸多优势。

Which I would say in my gut feels like a bit of a risky bet because OpenAI has been the incumbent, and there's so many benefits to that in tech.

Speaker 1

我认为,如果看2025年的趋势,势头确实偏向Gemini,但它们是从一个很低的起点开始的。

I I think the momentum, if you look at 2025, was on Gemini's side, but they were starting from such a low point.

Speaker 1

我觉得Bard以及这些早期尝试都值得高度赞扬,它们在组织混乱中坚持下来,最终实现了突破。

I think RIP Bard and these earlier attempts at getting started — huge credit to them for powering through the organizational chaos to make that happen.

Speaker 1

但同时,也很难押注ChatGPT和OpenAI会输,因为它们看起来总是很混乱,却总能很好地落地成果。

But, also, it's hard to bet against ChatGPT and OpenAI, because they always come off as so chaotic, but they're very good at landing things.

Speaker 1

就我个人而言,我对GPT-5的评价很矛盾,但它肯定帮OpenAI省下了大量成本——其核心功能是一个路由系统,让大多数用户的请求不再消耗那么多GPU成本。

And personally, I have very mixed reviews of GPT-5, but it had to have saved them so much money, with the headline feature being a router, so most users are no longer incurring as much GPU cost.

Speaker 1

所以我认为,很难把我喜欢的模型特性与真正能成为大众市场差异点的特性区分开来。

So I think it's very hard to dissociate the things that I like out of models versus the things that are gonna actually be a general public differentiator.

Speaker 0

你对2026年有什么看法?

What do you think about 2026?

Speaker 0

谁会赢?

Who's gonna win?

Speaker 1

我会说一些即使有风险的话。

I'll say something even though it's risky.

Speaker 1

我会说,我认为Gemini将继续蚕食ChatGPT的优势。

I will say that I think Gemini will continue to gain ground on ChatGPT.

Speaker 1

我认为,当这两者都以如此巨大的规模运行时,谷歌的规模优势会显现出来。

I think Google's scale matters when both of these are operating at such extreme scales.

Speaker 1

而且,谷歌更有能力更好地将研究与产品分离开来。

And, like, Google has the ability to separate that research and product a bit better.

Speaker 1

我们经常听到OpenAI在运营上非常混乱,总是追逐高影响力的事情,这非常像初创公司的文化。

We hear so much about OpenAI being chaotic operationally and chasing the high impact thing, which is a very startup culture.

Speaker 1

在软件和企业方面,我认为Anthropic将继续取得成功,因为他们一次又一次地为这一领域做好了准备。

And then on the software and enterprise side, I think Anthropic will have continued success, as they've again and again been set up for that.

Speaker 1

当然,谷歌云提供了大量服务,但我认为建立Gemini这个品牌对它们来说很重要。

And obviously, Google's cloud has a lot of offerings, but I think this kind of like Gemini name brand is important for them to build.

Speaker 1

谷歌云将继续表现良好,但这在生态系统中是一个更复杂的事情,因为这与Azure和AWS等竞争对手直接竞争,而不是在模型提供商层面。

And Google's Cloud will continue to do well, but that's a more complex thing to explain in the ecosystem, because that's competing with the likes of Azure and AWS rather than on the model provider side.

Speaker 0

那么在基础设施方面,你觉得TPU能带来优势吗?

So in infrastructure, you think TPUs give an advantage?

Speaker 0

主要是因为

Largely because

Speaker 1

英伟达芯片的利润率高得离谱,而谷歌可以从上到下自主开发整个堆栈,无需支付这笔利润,而且他们在建设数据中心方面已经领先一步。

the margin on NVIDIA chips is insane, and Google can develop everything from top to bottom to fit their stack and not have to pay this margin, and they've had a head start in building data centers.

Speaker 1

因此,所有这些具有长周期和高成本、低利润率的领域,谷歌都拥有历史性的优势。

So all of these things that have both high lead times and very hard margins on high costs, Google has just kind of a historical advantage there.

Speaker 1

如果未来会出现新的范式,最有可能来自OpenAI,因为他们的研究团队一次又一次地展现了实现新研究想法或产品的强大能力,比如深度研究、Sora、o1思维模型等。

And if there's gonna be a new paradigm, it's most likely to come from OpenAI, where their research division again and again has shown this ability to land a new research idea or a product — I think, like, deep research, Sora, the o1 thinking models.

Speaker 1

这些定义性的东西都来自OpenAI,这一定是他们组织的一项核心优势。

Like, all of these definitional things have come from OpenAI, and that's gotta be one of their top traits as an organization.

Speaker 1

所以很难与之抗衡,但我认为今年的焦点将是扩展规模,并优化那些可以称为低垂果实的模型改进。

So it's kind of hard to bet against that, but I think a lot of this year will be about scale and optimizing what could be described as low hanging fruit in models.

Speaker 0

显然,在智能和速度之间存在权衡。

And clearly, there's a trade off between intelligence and speed.

Speaker 0

这正是GPT-5在幕后试图解决的问题。

This is what GPT-5 was trying to solve behind the scenes.

Speaker 0

人们,尤其是普通大众,真正想要的是智能,还是速度?

It's like, do people — the broad public — actually want intelligence, or do they want speed?

Speaker 0

我认为这取决于

I think it's a

Speaker 2

实际上,提供多样化的选择,或者加入一个切换开关,是个不错的想法。

nice variety, actually, or the option to have a toggle there.

Speaker 2

我的个人使用经验是,大多数时候我查找信息都会用ChatGPT快速提问,迅速获得所需内容。

I mean, first, for my personal usage: most of the time when I look something up, I use ChatGPT to ask a quick question and get the information I wanted fast.

Speaker 2

对于日常任务,我通常使用快速模式。

For, you know, most daily tasks, I use the quick model.

Speaker 2

现在我觉得自动模式已经很不错了,你不需要特意说‘思考’或者‘不思考’之类的话。

Nowadays, I think the auto mode is pretty good where you don't have to specifically say thinking or, you know, nonthinking and stuff.

Speaker 2

不过,有时候我也想要Pro模式。

Then again, I also sometimes want the Pro model.

Speaker 2

我经常的做法是,当我写完一些东西后,会把它放进ChatGPT,然后说:嘿。

Very often what I do is, when I have something written, I put it into ChatGPT and say, hey.

Speaker 2

请做一次非常详细的检查。

Do a very thorough check.

Speaker 2

我的所有参考文献都正确吗?

Is are all my references correct?

Speaker 2

我的所有观点都正确吗?

Are all my thoughts correct?

Speaker 2

我有没有格式上的错误?

Did I make any formatting mistakes?

Speaker 2

图表编号是不是也有错误之类的?

And are the figure numbers wrong or something like that?

Speaker 2

我不需要马上得到结果。

And I don't need that right away.

Speaker 2

这事儿嘛, okay。

It's something, okay.

Speaker 2

我做完自己的事,可能去吃晚饭,让它运行,回来后再仔细检查一遍。

I finished my stuff, maybe have dinner, let it run, come back, and go through this.

Speaker 2

我觉得,这就是为什么拥有这个选项很重要的地方。

And see, this is where I think it's important to have this option.

Speaker 2

如果每次提问都要等三十分钟,甚至十分钟,我会疯掉的。

I would go crazy if, for each query, I would have to wait thirty minutes, or even ten.

Speaker 2

这就是我。

That's me.

Speaker 2

是的。

Yeah.

Speaker 1

我在这儿都快疯了,你居然用路由器和非思考模型。

I'm over here losing my mind that you use the router and the non-thinking model.

Speaker 1

我是说,你怎么能忍受这样?你怎么能忍受这个?

I'm like, how do you live with that?

Speaker 1

是的。

Yeah.

Speaker 1

这就像我的反应。

It's like my reaction.

Speaker 1

我重度使用ChatGPT已经有一段时间了。

I've been heavily on ChatGPT for a while.

Speaker 1

我从没碰过GPT-5的非思考模式。

Never touched GPT-5 non-thinking.

Speaker 1

我不喜欢它的语气,还有它出错的倾向。

I find its tone off, and then there's its propensity for errors.

Speaker 1

它就是更容易出错。

It's just like it has a higher likelihood of errors.

Speaker 1

这部分内容源于OpenAI发布o3的时候,那是第一个能够进行深度搜索、查找多个来源并为你整合信息的模型。

Some of this is from back when OpenAI released o3, which was the first model to do this deep search — find many sources and integrate them for you.

Speaker 1

所以我逐渐习惯了这种方式。

So I became habituated with that.

Speaker 1

因此,只要是我需要查找任何工作相关的资料,无论是论文还是代码参考,我都只使用GPT 5.2的思考模式或专业模式。

So I will only use GPT 5.2 thinking or pro when I'm finding any sort of information query for work, whether that's a paper or some code reference that I found.

Speaker 1

我经常同时运行五个专业模式的查询,每个查询都在寻找某一篇特定的论文,或者对某个公式进行反馈。

And it's just like, I will regularly have five Pro queries going simultaneously, each looking for one specific paper or feedback on an equation or something.

Speaker 2

我有个有趣的例子,当时我需要尽快给出一个答案。

I have a fun example where I just needed to answer as fast as possible.

Speaker 2

在去旅行前录制这个播客时,我家里有一台本地GPU,我想运行一个长时间的强化学习实验。

For this podcast before I was going on the trip, I have a local GPU running at home, and I wanted to run a long RL experiment.

Speaker 2

通常我也会拔掉电源,因为你永远不知道自己不在家时会发生什么。

And usually, I also unplug things because you never know if you're not at home.

Speaker 2

我不想让设备一直插着电,结果我不小心把GPU的电源拔掉了。

I don't wanna have things plugged in, and I accidentally unplugged the GPU.

Speaker 2

我妻子已经上车了,我当时就想:‘糟了。’

It was like my wife was already in the car, and it's like, oh, dang.

Speaker 2

然后,我急需一个能快速运行我不同实验和评估的 Bash 脚本。

And then, basically, I wanted as fast as possible a Bash script that runs my different experiments and the evaluation.

Speaker 2

我确实知道怎么用 Bash 界面或 Bash 终端。

And it's something I know how to do — I've learned how to use the Bash terminal.

Speaker 2

但那一刻,我只需要十秒钟内给我

But in that moment, I just needed, like, ten seconds to give me

Speaker 0

那个命令。

the command.

Speaker 0

这真是个搞笑的情况,不过确实如此。

This is a hilarious situation, but yeah.

Speaker 0

那你用了什么?

So what did you use?

Speaker 2

于是我用了那个不经过思考的最快模型。

So I used the non-thinking, fastest model.

Speaker 2

它给了我一个Bash命令,可以把不同的脚本串联起来。

It gave me the Bash command to chain the different scripts together.

Speaker 2

然后还有一点:你得用 tee 把输出同时重定向到日志文件。

And then the thing is, you have the tee thing, where you want to route this to a log file.

Speaker 2

当时我脑子里一片空白,只是赶时间。

Top of my head, I was just, like, in a hurry.

Speaker 2

我本可以自己想出来的。

I could have thought about it myself.
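The rescue one-liner being described — chain the experiment and evaluation scripts so a failure stops the run, while still keeping a log — can be sketched roughly as below. The echo commands are hypothetical stand-ins for the real scripts:

```shell
# Sketch of the pattern: chain the steps with && so a failure stops the run,
# and pipe stdout+stderr through tee to watch the output live while also
# writing it to a log file. The echo commands are hypothetical stand-ins
# for the real RL experiment and evaluation scripts.
{ echo "experiment 1" && echo "experiment 2" && echo "evaluation"; } 2>&1 | tee run.log
```

With the real scripts substituted in, you could start this under nohup or inside tmux, leave for dinner, and read run.log when you get back.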

Speaker 0

顺便说一下,我不确定有没有一个典型的例子。

By the way, I don't know if there's a representative case.

Speaker 0

老婆在车里等着,你得赶紧跑。

Wife waiting in the car, have to run

Speaker 2

你知道的,插上GPU。

you know, plug in the GPU.

Speaker 2

你得

You have

Speaker 0

生成一个bash脚本。

to generate a bash script.

Speaker 0

这听起来像一部电影。

This sounds like a movie.

Speaker 1

我是说《碟中谍》。

It's like Mission Impossible.

Speaker 1

我用Gemini来做这个。

I use Gemini for that.

Speaker 1

所以我用思考功能处理所有信息类任务,而Gemini则用于快速处理或那些我本可以去谷歌搜索的事情——它擅长解释内容,我信任它具备这样的知识背景,而且操作简单。

So I use thinking for all the information stuff, and then Gemini for fast things, or stuff that I would have gone to Google for — it's good at explaining things, I trust that it has that kind of background knowledge, and it's simple.

Speaker 1

Gemini应用现在进步了很多,很适合做这类事情。

And the Gemini app has gotten a lot better, and it's good for that sort of things.

Speaker 1

至于代码和任何哲学类讨论,我使用Claude Opus 4.5。

And then for code and any sort of philosophical discussion, I use Claude Opus 4.5.

Speaker 1

而且,我总是开启深度思考模式。

Also, always with extended thinking.

Speaker 1

扩展思考和推理时间缩放只是让模型稍微更聪明的一种方式。

Extended thinking and inference time scaling is just a way to make the models marginally smarter.

Speaker 1

当进展非常显著时,我总会倾向于选择这一侧,因为你不知道什么时候会解锁新的使用场景。

And I will always edge on that side when the progress is very high because you don't know when that'll unlock a new use case.

Speaker 1

然后我会偶尔用Grok来获取实时信息,或者找我在AI推特上看到过但需要挖出来、并且一直惦记着的内容。

And then I sometimes use Grok for real-time information, or finding something on AI Twitter that I knew I saw and need to dig up and just fixated on.

Speaker 1

不过,Grok 4发布时,Grok 4 Heavy——也就是它们的专业版——实际上表现非常好。

Although when Grok 4 came out, Grok 4 Heavy, which was their pro variant, was actually very good.

Speaker 1

我对它印象深刻。

And I was pretty impressed with it.

Speaker 1

我只是出于肌肉记忆,因为一直开着ChatGPT应用,就忽略了它。

And I just kind of, out of muscle memory, lost track of it, with having the ChatGPT app open.

Speaker 1

所以我用很多不同的工具。

So I use many different things.

Speaker 0

是的。

Yeah.

Speaker 0

我确实会用Grok 4 Heavy来调试。

I actually do use Grok 4 Heavy for debugging.

Speaker 0

对于那种极端的调试任务,其他工具都解决不了的时候,我发现它是最出色的,这很有趣,因为你说ChatGPT对你来说也是因为同样的原因最好,但这可能只是惯性使然。

For, like, hardcore debugging, when the other ones can't solve it, I find that it's the best. And it's interesting, because you say ChatGPT is the best interface for you for that same reason, but this could just be momentum.

Speaker 0

嗯。

Mhmm.

Speaker 0

是的。

Yeah.

Speaker 0

对我而言,Gemini是更好的界面。

Gemini is the better interface for me.

Speaker 0

我觉得是因为我爱上了他们最擅长的‘大海捞针’功能。

I think because I fell in love with their needle-in-the-haystack performance, which is the best.

Speaker 0

如果我输入一些内容量很大的信息,但只想找到非常具体的细节,确保它能追踪到所有内容,我发现至少对我来说,Gemini是最出色的。

If I ever put in something that has a lot of context, but I'm looking for very specific kinds of information and want to make sure it tracks all of it, I find that, at least for me, Gemini has been the best.

Speaker 0

所以这些模型中,有些如果赢得了你的心,就挺有意思的。

So it's funny with some of these models, if they win your heart over

Speaker 2

嗯。

Mhmm.

Speaker 0

对于某个特定功能、某一天、某个特定查询或提示,你会觉得这个模型更好。

For one particular feature, on one particular day, for that particular query, that prompt, you're like: this model is better.

Speaker 0

于是你会暂时坚持用它,直到它做出一些非常愚蠢的事情。

And so you'll just stick with it for a bit until it does something really dumb.

Speaker 0

这就像有一种阈值效应。

There's like a threshold effect.

Speaker 0

它做了一件聪明的事,让你爱上它,然后它又做了一件蠢事,你就想:你知道吗?

It does some smart thing, and then you fall in love with it, and then it does some dumb thing, and you're like, you know what?

Speaker 0

我要换掉它,试试Claude、试试GPT之类的。

I'm gonna switch and try Claude and try GPT and all that kind of stuff.

Speaker 2

这就像你一直用它,直到它出问题,然后你才换掉语言模型。

This is exactly it — you use it until it breaks, until you have a problem, and then you change the LLM.

Speaker 2

我觉得这和我们使用任何东西都一样,比如最喜爱的文本编辑器、操作系统或浏览器。

And I think it's the same as how we use anything, like our favorite text editors, operating systems, or browsers.

Speaker 2

我的意思是,浏览器的选择太多了,Safari、Firefox、Chrome,它们都差不多,但有时候你会因为某些特殊的扩展功能而切换。

I mean, there are so many browser options — Safari, Firefox, Chrome — all relatively similar, but then there are edge cases, maybe extensions you wanna use, and then you switch.

Speaker 2

但我不认为有人会把同样的网址输入到不同浏览器里去比较它们。

But I don't think there is anyone who types the same thing, like the website, into different browsers and compares them.

Speaker 2

我想,你只有在网站无法正常显示、出问题的时候才会这么做。

You only do that when the website doesn't render — when something breaks, I think.

Speaker 2

这确实是个不错的观点。

So that's a good point.

Speaker 2

我觉得你是用到它出问题了,才会去尝试其他选择,我觉得是这样。

I think you use it until it breaks, and then you explore other options, I think.

Speaker 1

关于长上下文这件事,我之前也是Gemini的用户。

On the long context thing, I was also a Gemini user for this.

Speaker 1

但GPT 5.2的发布博客里,长上下文的评分高得离谱,很多人都在问:他们是不是发现了什么算法上的突破?

But the GPT 5.2 release blog had, like, crazy long context scores where a lot of people were like, did they just figure out some algorithmic change?

Speaker 1

在一次小版本更新中,它的分数从大约30%直接飙升到了70%左右。

It went from, like, 30% to, like, 70 or something in this minor model update.

Speaker 1

所以要跟踪所有这些事情也非常困难。

So it's also very hard to keep track of all of these things.

Speaker 1

但现在我对GPT五点二的长上下文有了更积极的看法。

But now I look more favorably at GPT 5.2's long context.

Speaker 1

我只是在想,我到底该怎么去测试它呢?

It's just kinda like, how do I actually get to testing this?

Speaker 1

永无止境的战斗。

Never ending battle.

Speaker 0

有趣的是,我们都没从用户使用角度讨论过中国模型。

It's interesting that none of us talked about the Chinese models from a user usage perspective.

Speaker 0

这说明了什么?

What does that say?

Speaker 0

这意味着中国模型不够好,还是说我们只是非常有偏见,过于关注美国?

Does that mean the Chinese models are not as good, or does that mean we're just very biased and US focused?

Speaker 2

我认为这目前反映了模型与平台之间的差异。

I do think that that's currently the discrepancy between just the model and the platform.

Speaker 2

所以我认为,开源模型目前更出名的是它们的开源权重,而不是它们的平台。

So I think the open models are more known for their open weights, not their platform yet.

Speaker 1

还有很多公司愿意以非常低的成本向你提供开源模型的推理服务。

There are also a lot of companies that are willing to sell you the open model inference at a very low cost.

Speaker 1

我认为,像 OpenRouter 这样的平台,很容易实现多模型的查看和使用。

With, like, OpenRouter, it's easy to look at multi-model things.

Speaker 1

你可以在 Perplexity 上运行 DeepSeek。

You could run DeepSeek on Perplexity.

Speaker 1

我想我们坐在这里的所有人,都一直在持续使用 OpenAI 的 GPT-5 Pro。

I think all of us sitting here use OpenAI GPT-5 Pro very consistently.

Speaker 1

我们都愿意为那一点点智能提升付费。

We're all willing to pay for the marginal intelligence gain.

Speaker 1

那些认为美国这些模型在输出质量上更优的人。

And anyone that's like, these models from the US are better in terms of the outputs—

Speaker 1

我想问题是,它们在未来这一年乃至多年里,还能继续保持优势吗?

I think that the question is, will they stay better for this year and for years going?

Speaker 1

但只要它们更好,我就愿意付费使用。

But it's like, so long as they're better, I'm gonna pay for it to use them.

Speaker 1

我认为还有一些分析表明,中国模型的部署方式——你可以争论这是否源于出口管制——导致每个副本使用的GPU更少,因此速度更慢,错误模式也不同。

I think there's also analysis that shows that the way the Chinese models are served — you could argue whether this is due to export controls or not — is that they use fewer GPUs per replica, which makes them slower and gives them different errors.

Speaker 1

这关乎速度和智能。

And it's like the speed and intelligence.

Speaker 1

如果这些优势对你作为用户有利,我认为在美国,很多用户都会选择它们。

If these things are in your favor as a user, I think in The US, a lot of users will go for this.

Speaker 1

我认为这会促使中国公司以其他方式竞争,比如提供免费服务或大幅降低成本,或者激发服务创新,这对整个生态系统是有益的。

And I think that that is something that will spur these Chinese companies to want to compete in other ways, whether it's, like, free or substantially lower costs or it'll breed creativity in terms of offerings, which is good for the ecosystem.

Speaker 1

但我只是觉得,简单来说,美国的模型目前更好,我们就在用它们。

But I just think the simple thing is that US models are currently better, and we use them.

Speaker 1

我试过中国模型和其他开源模型,觉得挺有趣,但不会回头用。

And I try the Chinese models — I try these other open models — and I'm like, fun, but I don't go back to them.

Speaker 0

我们其实没怎么提到编程。

We didn't really mention programming.

Speaker 0

这是很多人非常关心的另一个使用场景。

That's another use case that a lot of people deeply care about.

Speaker 0

所以我基本上一半用Cursor,一半用Claude Code,因为我觉得它们是根本不同的体验,而且都挺有用。

So I basically use half-and-half Cursor and Claude Code, because I find them to be fundamentally different experiences, and both useful.

Speaker 0

你们经常编程吗?你们平时用什么?

You guys program quite a bit, so what do you use?

Speaker 0

现在的趋势怎么样?

What's the current vibe?

Speaker 2

我用的是VS Code的Codex插件。

So I use the Codex plug-in for VS Code.

Speaker 2

你知道,这非常方便。

You know, it's very convenient.

Speaker 2

它就是一个插件,然后提供一个可以访问你代码仓库的聊天界面。

It's just like a plug in, and then it's a chat interface that has access to your repository.

Speaker 2

我知道Claude Code可能有点不一样。

I know that Claude Code is, I think, a bit different.

Speaker 2

它更具主动性。

It's a bit more agentic.

Speaker 2

它会涉及更多内容。

It touches more things.

Speaker 2

它能为你完成整个项目。

It does a whole project for you.

Speaker 2

我还没到能完全放心使用那一步,也许我是个控制狂,但我还是想稍微看看发生了什么。

I'm not quite there yet where I'm comfortable with that because maybe I'm a control freak, but I still would like to see a bit what's going on.

Speaker 2

对我而言,Codex 目前正是最佳平衡点:它在帮助我,但并没有完全接管一切。

And Codex is kind of like right now for me, like, the sweet spot where it is helping me, but it is not taking completely over.

Speaker 0

我应该提一下,我使用 Claude Code 的一个原因是培养用英语编程的能力。

I should mention one of the reasons I do use Claude Code is to build the skill of programming with English.

Speaker 0

我的意思是,这种体验本质上是不同的。

I mean, the experience is fundamentally different.

Speaker 0

你不是在微观管理代码生成过程的细节——查看差异(如果你用 Cursor 这样的IDE,可以这么做)、不断修改、调整、阅读并深入理解代码——而是在这个设计空间里思考。

As opposed to micromanaging the details of the code-generation process — looking at the diff, which you can do in Cursor, if that's the IDE you use, and changing, altering, reading, and deeply understanding the code as you progress — you're just kind of thinking in this design space.

Speaker 0

而且在这个宏观层面上进行引导,我认为这是另一种思考编程过程的方式。

And just guiding it at this macro level, which I think is another way of thinking about the programming process.

Speaker 0

另外,我们应该提到,Claude Code 似乎在某种程度上更好地利用了 Claude Opus 4.5。

Also, we should say that Claude Code just seems to be somehow a better utilization of Claude Opus 4.5.

Speaker 1

这对人们来说是一个很好的对比方式。

It's a good side by side for people to do.

Speaker 1

你可以同时打开 Claude Code、Cursor 和 VS Code,让它们使用相同的模型,提出相同的问题,这非常有趣。

So you can have Claude Code open, you can have Cursor open, and you can have VS Code open, and you can select the same models on all of them and ask questions, and it's very interesting.

Speaker 1

比如,Claude Code 在这个领域要好得多。

Like, Claude Code is way better in that domain.

Speaker 1

这很了不起。

It's remarkable.

Speaker 0

好的。

Alright.

Speaker 0

我们应该说,你们两位在多个方面都是真正的专家:研究人员、程序员、教育者、推特用户,甚至在出书方面也是如此。

We should say that both of you are legit on multiple fronts, researchers, programmers, educators, tweeterers, and on the book front too.

Speaker 0

所以,内森,很快,希望如此,他将出版一本关于RLHF的书。

So Nathan, at some point soon, hopefully, has an RLHF book coming out.

Speaker 1

这本书已经可以预购了,而且有一个完整的数字预印本,正在为实体版做美化和更有序的整理,这正是我这么做的主要原因——因为当我们的生活大部分都是数字化的时候,创造一些在实体形式上非常出色的东西是很有趣的。

It's available for preorder, and there's a full digital preprint; I'm just making it pretty and better organized for the physical thing. That's a lot of why I do it — it's fun to create things you think are excellent in physical form when so much of our life is digital.

Speaker 0

我得提一下,去Perplexity看看:塞巴斯蒂安·拉施卡是一位机器学习研究员和作者,以多本有影响力的书籍而闻名。

I should say, going to Perplexity here: Sebastian Raschka is a machine learning researcher and author known for several influential books.

Speaker 0

我想提到其中几本,我强烈推荐《从零开始构建大语言模型》和新书《从零开始构建推理模型》。

A couple of them I wanted to mention: a book I highly recommend, Build a Large Language Model (From Scratch), and the new one, Build a Reasoning Model (From Scratch).

Speaker 0

所以我对这本书非常期待。

So I'm really excited about that.

Speaker 0

从零开始构建东西是学习最有效的方式之一。

Building stuff from scratch is one of the most powerful ways of learning.

Speaker 2

老实说,从零开始构建一个大语言模型非常有趣。

Honestly, building an LLM from scratch is a lot of fun.

Speaker 2

同时,这也是学习很多东西的好方法。

It's also a great way to learn a lot.

Speaker 2

就像你所说的,这可能是了解某事物真正工作原理的最佳方式,因为你虽然可以看图示,但图示也可能有错误。

And like you said, it's probably the best way to learn how something really works because you can look at figures, but figures can have mistakes.

Speaker 2

你可以看概念和解释,但你可能会误解它们。

You can look at concepts and explanations, but you might misunderstand them.

Speaker 2

但如果你看到代码始终存在,而且代码能运行,你就知道它是正确的。

But if you see that there's always code, and the code works, you know it's correct.

Speaker 2

我的意思是,这里不存在误解。

I mean, there's no misunderstanding.

Speaker 2

它是精确的。

It's like it's precise.

Speaker 2

否则,它根本无法运行。

Otherwise, it wouldn't work.

Speaker 2

我认为这正是编码的魅力所在。

And I think that's, like, kind of like the beauty behind coding.

Speaker 2

它确实像从不撒谎一样。

It is kind of like it doesn't lie.

Speaker 2

这基本上就是数学。

It's math, basically.

Speaker 2

所以即使在数学中,我认为你在看书时也可能遇到错误,却永远发现不了。

So even with math, I think, you can have mistakes in a book that you would never notice.

Speaker 2

因为你读书的时候并没有运行数学,无法验证它。

Because you're not running the math when you are reading the book, you can't verify this.

Speaker 2

而代码的好处在于你可以验证它。

And with code, what's nice is you can verify it.

Speaker 0

是的。

Yeah.

Speaker 0

我同意你关于从零开始学习语言模型这本书的看法。

I agree with you about the LLM-from-scratch book.

Speaker 0

把其他一切——互联网等等——都屏蔽掉,专心看书,这很好。

It's nice to tune out everything else, the Internet and so on, and just focus on the book.

Speaker 0

但我读过好几本历史类的书。

But, you know, I've read several, like, you know, history books.

Speaker 0

不知怎么的,感觉没那么孤单了。

It's just less lonely somehow.

Speaker 0

真的更有趣了。

It's really more fun.

Speaker 0

比如在编程方面,我觉得用大语言模型编程确实更有趣。

Like, for example, on the programming front, I think it's genuinely more fun to program with an LLM.

Speaker 0

嗯哼。

Mhmm.

Speaker 0

我觉得用大语言模型阅读也确实更有趣。

And I think it's genuinely more fun to read with an LLM.

Speaker 0

嗯哼。

Mhmm.

Speaker 0

但你说得对。

But you're right.

Speaker 0

这种干扰确实应该尽量减少。

Like, this distraction should be minimized.

Speaker 0

所以你是用LLM来丰富体验,也许增加更多背景信息,或者对我来说,小规模内与LLM互动的高光时刻确实很多。

So you use the LLM to basically enrich the experience, maybe add more context — the rate of aha moments for me, on a small scale, is really high with LLMs.

Speaker 2

完全同意。

100%.

Speaker 2

我也想纠正一下自己。

I also want to correct myself.

Speaker 2

我不是建议不要使用LLM。

I'm not suggesting not to use LLMs.

Speaker 2

我建议分多次进行,比如第一次用离线专注模式。

I suggest doing it in multiple passes, like one pass just offline focus mode.

Speaker 2

之后,我会做笔记,但我会尽量克制立刻去查资料的冲动。

And then after that, I mean, I also take notes, but I try to resist the urge to immediately look things up.

Speaker 2

我会进行第二次阅读。

I do a second pass.

Speaker 2

对我来说,这样更有条理,有时候答案就在章节里,但有时候也让信息沉淀下来,好好思考一下。

It's just, for me, more structured this way — I mean, sometimes things are answered in the chapter, but sometimes it also just helps to let it sink in and think about it.

Speaker 2

其他人有不同的偏好。

Other people have different preferences.

Speaker 2

我强烈推荐在读书时使用大语言模型。

I would highly recommend using LLMs when reading books.

Speaker 2

对我来说,这并不是首先要做的事。

For me, it's just it's not the first thing to do.

Speaker 2

这更像是第二遍。

It's like the second pass.

Speaker 0

作为建议,我要说,我恰恰相反。

By way of recommendation, I'll say I do the opposite.

Speaker 0

我喜欢在一开始使用大语言模型。

I like to use the LLM at the beginning

Speaker 0

来全面了解我即将进入的这个世界是什么样的?

To lay out the full context of, like, what is this world that I'm now stepping into?

Speaker 0

但我尽量避免从大语言模型跳转到Twitter和博客这类外部世界,因为那样你会陷入信息的漩涡。

But I try to avoid clicking out of the LLM into the world of, like, Twitter and blogs, because then you're down this rabbit hole.

Speaker 0

你正在阅读别人的观点。

You're reading somebody's opinion.

Speaker 0

某个话题引发了激烈争论,突然间,你不再是你自己,而是进入了互联网和Reddit等世界。

There's a flame war about a particular topic, and all of a sudden, you're no longer you — you're now in the realm of the Internet and Reddit and so on.

Speaker 0

但如果你只是让LLM告诉你为什么这很重要、有哪些宏观理念,有时书籍本身也能做到这一点,但并不总是如此。

But if you're purely letting the LLM give you the context of why this matters and what the big-picture ideas are — sometimes books themselves are good at doing that, but not always.

Speaker 1

所以这就是我喜欢ChatGPT应用的原因,因为它把AI变成了你电脑上的一个专属空间,让你可以专注于它,而不是让它只是我众多网络选项中的一个标签页。

So that's why I like the ChatGPT app, because it gives the AI a home on your computer where you can focus on it, rather than it just being another tab in my mess of Internet options.

Speaker 1

我认为Claude Code这类工具特别擅长让这个过程变得愉快,它看起来像是一个精心设计的界面,你的AI可以从这里出发去探索世界。

And I think Claude Code in particular does a good job of making that a joy — it's very engaging as a product, designed to be an interface from which your AI goes out into the world.

Speaker 1

在它和Codex之间有一种难以言喻的差异,那就是它给人的感觉温暖而吸引人,而Codex虽然同样优秀,但总显得有点粗糙。

And something very intangible between it and Codex is that it just feels kind of warm and engaging, where Codex from OpenAI can often be just as good, but it feels a little bit rougher around the edges.

Speaker 1

相比之下,Claude Code 让构建东西变得有趣,尤其是从零开始——你根本不需要操心,但你相信它能做出一些东西。

Whereas Claude Code makes it fun to build things, particularly from scratch, where you just don't have to care, but you trust that it'll make something.

Speaker 1

显然,这对网站开发和更新工具之类的事情很有帮助,而我用它来做数据分析。

Like, obviously, this is good for websites and refreshing tooling and stuff like this, and I use it for data analysis.

Speaker 1

所以我的博客会抓取 Hugging Face 的数据。

So on my blog, we scrape Hugging Face.

Speaker 1

我们现在持续追踪每个数据集和模型的下载量变化,因此我们有了这些数据。

We keep the download numbers for every dataset and model over time now, so we have them.

Speaker 1

当时 Claude 就说:没问题。

And it's like, Claude was just like, yeah.

Speaker 1

我确实用过这些数据。

I've made use of that data.

Speaker 1

没问题。

No problem.

Speaker 1

我当时想,这本来得花我好几天时间。

And I was like, that would have taken me days.

Speaker 1

然后我意识到,自己有足够的背景认知,心想:好吧。

And then I have enough situational awareness to be like, okay.

Speaker 1

这些趋势显然说得通,你可以去核实一下。

These trends obviously make sense and you can check things.

Speaker 1

因为这种界面非常出色,你可以通过一个中间层来操作,而无需去做那些繁琐的底层工作,比如维护不同的网页项目并完成这些任务。

Because that's just the kind of wonderful interface where you can have an intermediary and not have to do the kind of awful low level work that you would have to do to maintain different web projects and do this stuff.

Speaker 0

好的。

Alright.

Speaker 0

所以我们刚刚讨论了很多闭源权重模型。

So we just talked about a bunch of the closed weight models.

Speaker 0

现在来聊聊开源的模型吧。

Let's talk about the open ones.

Speaker 0

给我讲讲开源大语言模型的现状吧。

So tell me about the landscape of open LLMs.

Speaker 0

有哪些有趣的模型?

Which are interesting ones?

Speaker 0

哪些模型让你印象深刻?为什么?

Which stand out to you, and why?

Speaker 0

我们已经提到过 DeepSeek 了。

We already mentioned DeepSeek.

Speaker 1

你想看看我们能一口气说出多少个吗?

Do you wanna see how many we can name off the top of our head?

Speaker 0

是的。

Yeah.

Speaker 0

对。

Yeah.

Speaker 0

不用看笔记。

Without looking at notes.

Speaker 1

DeepSeek、Kimi、MiniMax、Z.ai、Ant Ling。

DeepSeek, Kimi, MiniMax, Z.ai, Ant Ling.

Speaker 1

我们这纯粹是在列中国公司的名字。

We're just going Chinese.

Speaker 2

我们再加上 Mistral AI 和 Gemma 吧。

Let's throw in Mistral AI, Gemma.

Speaker 2

好。

Yeah.

Speaker 2

GPT OSS,OpenAI 开发的开源模型。

GPT OSS, the open source model from OpenAI.

Speaker 2

实际上,英伟达有一个非常酷的模型,Nemotron 3。

Actually, NVIDIA had a really cool one, Nemotron 3.

Speaker 2

有很多东西,尤其是在年底的时候。

There there's a lot of stuff, especially at the end of the year.

Speaker 2

Qwen 可能是那个……

Qwen may be the one

Speaker 1

哦,是的。

Oh, yeah.

Speaker 1

Qwen 是我漏掉的那个明显的名字。

Qwen was the obvious name I was missing.

Speaker 1

我试图至少找出10个中国模型和10个西方模型。

I was trying to get to you can name at least 10 Chinese and at least 10 Western models.

Speaker 1

我认为,OpenAI自GPT-2以来发布了他们的第一个开源模型。

I think that I mean, OpenAI released their first open model since GPT two.

Speaker 1

当我提到我写关于OpenAI开源模型发布时,他们都说别忘了GPT-2,我觉得这特别有趣,因为那已经是完全不同的时代了。

When I was writing about OpenAI's open model release, people were all like, don't forget about GPT two, which I thought was really funny because it's just such a different time.

Speaker 1

但GPT OSS实际上是一个非常强大的模型,能完成其他模型不太擅长的一些任务。

But GPT OSS is actually a very strong model and does some things that the other models don't do very well.

Speaker 1

而且,自私地说,我会大力推广一些西方公司。

And I think that selfishly, I'll promote a bunch of, like, Western companies.

Speaker 1

所以美国和欧洲都有这些完全开源的模型。

So both The US and Europe have these, like, fully open models.

Speaker 1

我在艾伦人工智能研究所工作,我们一直在开发OLMo,发布数据、代码等所有内容。

So I work at the Allen Institute for AI, where we've been building OLMo, which releases data and code and all of this.

Speaker 1

现在,我们有了真正的竞争对手,他们也努力发布一切,以便其他人能训练这些模型。

And now we have actual competition for people that are trying to release everything so that other people can train these models.

Speaker 1

还有Institute of Foundation Models,也就是LLM360,他们发布了各种类型的K2模型。

So there's the Institute of Foundation Models, aka LLM360, which has had their K2 models of various types.

Speaker 1

Apertus是一个瑞士研究联盟。

Apertus is a Swiss research consortium.

Speaker 1

Hugging Face 的 SmolLM 非常流行。

Hugging Face has SmolLM, which is very popular.

Speaker 1

NVIDIA 的 Nemotron 也已经开始发布数据了。

NVIDIA's Nemotron has started releasing data as well.

Speaker 1

斯坦福的 Marin 社区项目正在建立一种流程,让人们可以提交 GitHub 问题、实现新想法,并将其集成到稳定的语言建模框架中。

And then Stanford's Marin community project, which is kind of making it so there's a pipeline for people to open a GitHub issue, implement a new idea, and then have it run in a stable language modeling stack.

Speaker 1

所以这个领域在 2024 年时,这个列表要小得多。

So this space, that list was way smaller in 2024.

Speaker 1

所以那时候可能就只有 AI2。

So I think it was like just AI2.

Speaker 1

这让更多人能够参与进来并理解语言模型,而中国公司目前还没有类似的对应项目。

So that's a great thing for more people to get involved and to understand language models, and it doesn't really have an analog at a Chinese company.

Speaker 1

顺便说一句,中国的开源语言模型通常规模更大,作为MoE模型带来了更高的峰值性能;而我们很喜欢的很多模型,比如Gemma和Nemotron,往往是来自美国的小型模型,不过美国和欧洲的情况现在开始变化了。

While I'm talking, I'll say that the Chinese open language models tend to be much bigger, and that gives some of this higher peak performance as MoEs, where a lot of these things that we like a lot, whether it was Gemma or Nemotron, have tended to be smaller models from The US, which is starting to change for The US and Europe.

Speaker 1

Mistral Large 3 在十二月发布了,这是一个巨大的 MoE 模型,架构与 DeepSeek 非常相似。

Mistral Large 3 came out in December, which was a giant MoE model, very similar to the DeepSeek architecture.

Speaker 1

然后,初创公司Arcee AI和英伟达的Nemotron都预告了远超一千亿参数、约四千亿参数级别的MoE模型,预计在2026年第一季度左右推出。

And then a startup, Arcee AI, and NVIDIA's Nemotron have both teased MoE models way bigger than 100 billion parameters, in this 400-billion-parameter range, coming in this Q1 2026 timeline.

Speaker 1

所以我认为今年这种平衡将发生变化,人们使用中国和美国开源模型的用途将有所不同,我个人对此非常期待。

So I think this kind of balance is set to change this year in terms of what people are using the Chinese versus US open models for, which I'm personally very excited to watch.

Speaker 0

首先,能说出这么多模型的名字,真是非常厉害。

First of all, huge props for being able to name so many of these.

Speaker 0

你真的提到Llama了吗?

Did you actually name Llama?

Speaker 1

没有。

No.

Speaker 1

我觉得Llama完了。

I feel like RIP.

Speaker 2

这并不是故意的。

This was not on purpose.

Speaker 0

Llama完了?

RIP Llama?

Speaker 0

是的

Mhmm.

Speaker 0

好的

Alright.

Speaker 0

你能说说有哪些值得注意的模型吗?

Can you mention what are some interesting models that stand out?

Speaker 0

你提到了Qwen 3,这显然是一个突出的模型。

So you mentioned Qwen 3 is obviously a standout.

Speaker 2

我觉得这一年几乎被DeepSeek V3和R1、以及12月的DeepSeek V3.2首尾呼应,我喜欢这些模型的原因是它们总有一些其他人没有的有趣架构调整。

So I would say the year is almost bookended by DeepSeek V3 and R1 on one hand, and then on the other hand, in December, DeepSeek V3.2, because what I like about those is they always have an interesting architecture tweak that others don't have.

Speaker 2

但除此之外,如果你想选一些熟悉但性能非常出色的模型,那就是Qwen 3,就像内森说的,还有GPT OSS。

But otherwise, if you wanna go with, you know, familiar but really good performance, Qwen 3, and like Nathan said, also GPT OSS.

Speaker 2

我觉得GPT OSS有趣的地方在于,它是第一个真正以工具使用为设计目标训练的公开或开放权重模型,我认为这确实是一种小小的范式转变,因为当时的生态系统还没准备好迎接它。

And I think what's interesting about GPT OSS is that it's kind of like the first public or open weight model that was really trained with tool use in mind, which I do think is a bit of a paradigm shift where the ecosystem was not quite ready for it.

Speaker 2

所谓工具使用,指的是大语言模型能够进行网页搜索或调用Python解释器。

So with tool use, I mean that the LLM is able to do a web search to call a Python interpreter.

Speaker 2

我认为这确实很突出,因为它带来了巨大的突破,因为人们对大语言模型最常见的抱怨之一就是幻觉问题。

And I do think this it's a standout because I think it's a huge unlock because one of the most common complaints about LLMs are, for example, hallucinations.

Speaker 2

对吧?

Right?

Speaker 2

嗯。

Mhmm.

Speaker 2

所以在我看来,解决幻觉问题最好的方法之一就是不要试图总是记住信息或凭空捏造。

And so in my opinion, one of the best ways to solve hallucinations is to not try to always remember information or make things up.

Speaker 2

做数学运算时,为什么不直接用计算器或Python呢?

For math, why not use a calculator app or Python?

Speaker 2

嗯。

Mhmm.

Speaker 2

如果我问大语言模型1998年足球世界杯的冠军是谁,它不必死记硬背,而是可以直接去搜索。

If I ask the LLM who won the, I don't know, soccer World Cup in 1998, instead of just trying to memorize, it could go do a search.

Speaker 2

我认为,大多数情况下,这通常还是通过谷歌搜索来实现的。

I think mostly, it's usually still a Google search.

Speaker 2

所以ChatGPT和GPT OSS会调用工具去搜索Google,可能找到FIFA官网,然后发现,好。

So ChatGPT and GPT OSS, they would do a tool call to Google, maybe find the FIFA website, and find: okay.

Speaker 2

是法国赢了。

It was France.

Speaker 2

这样能可靠地为你提供信息,而不是试图死记硬背。

It would get you that information reliably instead of just trying to memorize it.
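A minimal sketch of what such a tool-use loop looks like under the hood: the model either emits a tool call or a final answer, and a harness executes the tool and feeds the result back into the context. Here `fake_model` and `web_search` are hypothetical stand-ins for a real LLM and a real search API, not any actual product's interface:

```python
# Sketch of an LLM tool-use loop. The "model" decides between answering
# and calling a tool; the harness runs the tool and appends the result.

def web_search(query):
    # Stand-in for a real search API call (hypothetical, hardcoded fact).
    facts = {"1998 soccer World Cup winner": "France"}
    return facts.get(query, "no results")

def fake_model(messages):
    # A real LLM would generate this; here the two turns are scripted.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": "1998 soccer World Cup winner"}
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"answer": f"The winner was {result}."}

def run_agent(question, max_turns=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        out = fake_model(messages)
        if "answer" in out:
            return out["answer"]
        # Execute the requested tool and put the result back in context.
        messages.append({"role": "tool", "content": web_search(out["tool_call"])})
    return "gave up"

print(run_agent("Who won the 1998 soccer World Cup?"))
# → The winner was France.
```

The point being made in the conversation is exactly this loop: the model retrieves the fact instead of trying to recall it from its weights.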

Speaker 2

所以我认为这是一个巨大的突破,但目前开源和开放权重生态体系还没有充分运用它。

So I I think it's a huge unlock, which I think right now is not fully utilized yet by the open source, open weight ecosystem.

Speaker 2

很多人不使用工具调用模式,因为首先这是一个信任问题。

A lot of people don't use tool call modes because I think it's first, it's a trust thing.

Speaker 2

你不想在自己的电脑上运行这种可以访问工具、可能删除你硬盘数据的程序。

You don't wanna run this on your computer where it has access to tools, could wipe your hard drive or whatever.

Speaker 2

所以你可能希望将它容器化隔离起来。

So you maybe wanna containerize that.

Speaker 2

但我确实认为,未来几年拥有这种能力是非常重要的一步。

But I do think, you know, that that is, like, a really important step for the upcoming years to have this ability.

Speaker 2

是的

Yep.

Speaker 0

所以有几件快速的事情。

So a few quick things.

Speaker 0

首先,感谢你定义了你所说的工具使用是什么意思。

First of all, thank you for defining what you mean by tool use.

Speaker 0

我认为,对于我们讨论的概念,这样做非常好。

I think that's a great thing to do in general for the concepts we're talking about.

Speaker 0

即使是像MOEs这样已经相当成熟的概念,嗯。

Even things that are sort of well established, like MoEs —

Speaker 0

你必须说明这意味着专家混合,而且你得帮助人们建立直觉,理解它意味着什么,如何实际应用,以及有哪些不同的类型。

You have to say that means mixture of experts, and you kinda have to build up an intuition for people what that means, how it's actually utilized, what are the different flavors.

Speaker 0

那么,开源模型如此激增意味着什么?

So what does it mean that there's just such explosion of open models?

Speaker 0

你的直觉是什么?

What's your intuition?

Speaker 1

如果你发布了一个开源模型,首要目标就是希望人们使用它。

If you're releasing an open model, first and foremost you want people to use it.

Speaker 1

其次才是透明度和信任等问题。

And then after that comes things like transparency and trust.

Speaker 1

我认为,当你看中国时,最主要的原因是他们希望全球各地的人都能使用这些模型,但我认为很多人并不会使用。

I think when you look at China, the biggest reason is that they want people around the world to use these models, and I think a lot of people will not.

Speaker 1

如果你看美国以外的地区,很多人不愿意为软件付费,但他们可能拥有计算资源,可以在上面部署并运行模型。

If you look outside of The US, a lot of people will not pay for software, but they might have computing resources where you can put a model on it and run it.

Speaker 1

我认为也可能存在一些不想上传到云端的数据。

I think there can also be data that you don't want to send to the cloud.

Speaker 1

所以,最核心的一点是让那些如果没有模型访问权限就无法使用AI的人能够使用模型、使用AI,或者使用你的AI。

So the number one thing is getting people to use models, use AI, or use your AI — people who might not be able to do it without having access to the model.

Speaker 0

我想我们应该明确说明一下。

I guess we should state explicitly.

Speaker 0

我们之前一直在讨论这些中国模型和开源权重模型。

So we've been talking about these Chinese models and open weight models.

Speaker 0

通常,这些模型是在本地运行的。

Oftentimes, the way they're run is locally.

Speaker 0

所以你并不是把数据发送到中国或任何开发模型的人——无论是硅谷的谁。

So it's not like you're sending your data to China, or to Silicon Valley, or to whoever developed the model.

Speaker 1

许多美国初创公司通过托管这些模型来赚钱。

A lot of American startups make money by hosting

Speaker 2

是的。

Mhmm.

Speaker 1

它们从中国购买这些模型并出售,这被称为出售令牌,意思是有人调用模型来完成某项任务。

these models from China and selling them — it's called selling tokens, which means somebody will call the model to do some piece of work.

Speaker 1

是的。

Mhmm.

Speaker 1

我认为另一个原因是,对于美国公司来说,比如OpenAI,它们严重缺乏GPU。

I think the other reason is that US companies, like OpenAI, are so GPU deprived.

Speaker 1

它们已经达到了GPU的极限。

Like, they're at the limits of their GPUs.

Speaker 1

每次他们发布新版本时,总在说:‘我们的GPU快撑不住了。’

Whenever they make a release, they're always talking about, oh, like, our GPUs are hurting.

Speaker 1

我想在一次GPT OSS的发布活动中,萨姆·阿尔特曼曾说:‘我们发布这个是因为我们可以用上你们的GPU。’

And I think in one of these GPT OSS release sessions, Sam Altman said, like, oh, we're releasing this because we can use your GPUs.

Speaker 1

我们不必使用自己的GPU,OpenAI依然能从中获得分发优势,这确实是个非常现实的问题,因为这对他们来说几乎不花任何成本。

We don't have to use our GPUs, and OpenAI can still get distribution out of this, which is another very real thing because it doesn't really cost them anything.

Speaker 2

对用户而言,我认为也有一些用户像使用ChatGPT一样在本地使用模型,但对企业来说,拥有这些模型是一个巨大的突破,因为你可以对它们进行定制。

And for the user, I think also — I mean, there are users who just use the model locally the way they would use ChatGPT — but also for companies, I think it's a huge unlock to have these models because you can customize them.

Speaker 2

你可以训练它们。

You can train them.

Speaker 2

你可以进行后训练,加入更多数据,比如将它们专门化为法律、医疗等领域的模型,随你所需。

You can add post training, add more data, like specialize them into, let's say, law, medical models, whatever you have.

Speaker 2

说到吸引力——你刚提到了Llama。

And the appeal — you mentioned Llama.

Speaker 2

中国开源权重模型的吸引力在于,这些开源模型的许可协议也更加友好。

The appeal of the open weight models from China is that their licenses are even friendlier.

Speaker 2

我认为它们是完全无限制的开源许可证,如果我们使用像Llama或Gemma这样的模型,就会有一些附加条件。

I think they are just unrestricted open source licenses, where if we use something like Llama or Gemma, there are some strings attached.

Speaker 2

我认为在用户数量上存在一个上限。

I think there's like an upper limit in terms of how many users you have.

Speaker 2

如果你超过了,比如说几百万用户,你就得向Meta之类的公司报告你的财务状况。

And then if you exceed, I don't know, so so many million users, you have to report your finance situation to, let's say, Meta or something like that.

Speaker 2

我认为这虽然是一个免费模型,但附带了一些条件,而人们更喜欢没有任何附加条件的东西。

And I think, well, it is a free model, but there are strings attached, and people do like things where strings are not attached.

Speaker 2

所以我认为,除了性能之外,这也是中国开源权重模型如此受欢迎的原因之一,因为你可以直接使用它们。

So I think that's also one of the reasons besides performance why the open weight models from China are so popular because you you can just use them.

Speaker 2

从这个意义上说,没有任何陷阱。

There's no there's no catch in that sense.

Speaker 2

是的。

Yeah.

Speaker 1

在这方面,生态系统已经有所改善,但主要是得益于这些新提供商提供如此开放的许可证。

The ecosystem has gotten better on that front, but mostly downstream of these new providers offering such open licenses.

Speaker 1

你调出Perplexity的时候真有趣。

That was funny when you pulled up Perplexity.

Speaker 1

它显示Kimi K2 Thinking托管在美国,这正是我们所讨论的、人们对此很敏感的绝佳例子,我以前从没见过这种情况。

It said Kimi K2 Thinking, hosted in The US, which is — I've never seen this, but it's an exact example of what we're talking about, where people are sensitive to this.

Speaker 1

比如Kimi K2 Thinking——Kimi K2是一个非常受欢迎的模型。

Like Kimi K2 Thinking — and Kimi K2 is a model that is very popular.

Speaker 1

人们说它在创意写作和一些软件任务方面表现非常出色。

People say that it has very good, like, creative writing and also in doing some software things.

Speaker 1

所以,人们会注意到不同模型这些细微的特性,并因此偏好某些模型。

So it's just these little quirks that people pick up on with different models that they like.

Speaker 0

这些模型中有哪些有趣的想法是你能谈一谈的,特别是让你觉得特别有意思的?

What are some interesting ideas that some of these models have explored that you can speak to — ones particularly interesting to you?

Speaker 2

也许你可以按时间顺序讲一讲。

Maybe you can go chronologically.

Speaker 2

我的意思是,如果我们只关注2025年,当然有1月份发布的DeepSeek R1。

I mean, there was, of course, DeepSeek R1 that came out in January, if we just focus on 2025.

Speaker 2

然而,它是基于DeepSeek V3的,该版本于前一年,即2024年12月发布。

However, this was based on DeepSeek V3, which came out the year before, in December 2024.

Speaker 2

在架构方面有多个创新点。

There are multiple things on the architecture side.

Speaker 2

有趣的是,你仍然可以——我的意思是,这正是我在从零开始的编码项目中所做的。

What is fascinating is that you can still — I mean, that's what I do in my from-scratch coding projects.

Speaker 2

你仍然可以从GPT-2开始,然后在这个模型基础上添加功能,将其变成另一个模型。

You can still start with GPT two, and you can add things to that model to make it into this other model.

Speaker 2

所以它们本质上仍属于同一条技术脉络,彼此关系非常紧密。

So it's all still kind of the same lineage — there is a very close relationship between those.

Speaker 2

但就我目前所想,DeepSeek的独特之处在于专家混合——或者说,他们并没有发明专家混合。

But off the top of my head, DeepSeek — what was unique there is the mixture of experts, or, I mean, they were not inventing mixture of experts.

Speaker 2

我们或许可以再多解释一下什么是混合专家机制。

We can maybe talk a bit more what mixture of experts means.

Speaker 2

不过在深入细节之前,先列出这些要点:混合专家机制,此外他们还采用了多头潜在注意力机制,这是一种对注意力机制的改进,我认为在2025年,这是这些开源模型之间最主要的区别,即各种针对推理效率或KV缓存大小的优化调整。

But just to list these things first before we dive into detail: mixture of experts, but then they also had multi-head latent attention, which is a tweak to the attention mechanism. That was, I would say, the main distinguishing factor between these open weight models in 2025 — different tweaks to make inference cheaper or shrink the KV cache size.

Speaker 2

我们也可以在稍后定义KV缓存。

We can also define KV cache in a few moments.

Speaker 2

但为了更经济地支持长上下文,需要缩小KV缓存的大小,那么我们可以做哪些调整呢?

But to make long context more economical, you want to shrink the KV cache size — so what are the tweaks we can do?

Speaker 2

其中大多数都聚焦在注意力机制上。

And most of them focused on the attention mechanism.

Speaker 2

DeepSeq中使用了多头潜在注意力。

Is multi head latent attention in in DeepSeq.

Speaker 2

分组查询注意力也非常流行。

There is group query attention, which is still very popular.

Speaker 2

它并不是由这些模型中的任何一个发明的。

It's not invented by any of those models.

Speaker 2

它早在几年前就已存在,但那会是另一种选择。

It goes back a few years, but that would be the other option.
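A back-of-the-envelope sketch of why these attention tweaks matter: the KV cache stores one key and one value vector per token, per layer, per KV head, so sharing KV heads across query heads — which is what grouped-query attention does — shrinks the cache directly. The model dimensions below are illustrative, not taken from any specific model:

```python
# Rough KV-cache size: 2 tensors (K and V) * layers * KV heads * head dim
# * sequence length * bytes per value. Multi-head attention (MHA) keeps a
# K/V head per query head; grouped-query attention (GQA) shares them.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val=2):
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val

layers, head_dim, ctx = 32, 128, 32_768  # illustrative 32-layer model

mha = kv_cache_bytes(layers, kv_heads=32, head_dim=head_dim, seq_len=ctx)
gqa = kv_cache_bytes(layers, kv_heads=8, head_dim=head_dim, seq_len=ctx)

print(f"MHA cache: {mha / 1e9:.1f} GB")  # full K/V head per query head
print(f"GQA cache: {gqa / 1e9:.1f} GB")  # 4x fewer KV heads -> 4x smaller
```

With these (made-up) numbers, going from 32 KV heads to 8 cuts the cache for one 32K-token context from about 17 GB to about 4 GB, which is the kind of saving the conversation is pointing at.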

Speaker 2

滑动窗口注意力——如果我没记错的话,我想OLMo 3也在用它。

Sliding window attention — I think OLMo 3 uses it, if I remember correctly.

Speaker 2

所以,这些不同的调整让模型之间产生了差异。

So there are, like like, these different tweaks that make the models different.

Speaker 2

否则,我曾经写过一篇文章,把它们全都放在一起进行比较。

Otherwise, I put them all together in an article once where I just compared them.

Speaker 2

它们出人意料地相似。

They are very surprisingly similar.

Speaker 2

其实只是在Transformer块的重复次数、以及一些细微的调节参数上有所不同。

It's just different numbers in terms of how many repetitions of the transformer block you have in the center, and little knobs that people tune.

Speaker 2

但真正棒的是,无论怎么改,它都能正常工作。

But what's so nice about it is that it works no matter what.

Speaker 2

你可以调整各种参数。

You can tweak things.

Speaker 2

你可以移动归一化层的位置。

You can move the normalization layers around.

Speaker 2

这样可以获得一些性能提升。

You get some performance gains.

Speaker 2

而且OLMo的消融研究几乎总是做得很好,能清楚展示移动某个部件后模型会发生什么。

And OLMo almost always has very good ablation studies showing what it actually does to the model if you move something around.

Speaker 2

消融研究能说明某个改动是让模型变好还是变差。

Ablation studies show whether a change makes it better or worse.

Speaker 2

但有太多方式可以实现Transformer,同时仍能使其正常工作。

But there are so many, let's say, ways you can implement a transformer and make it still work.

Speaker 2

至今仍广泛使用的一些重要想法包括专家混合、多头潜在注意力、滑动窗口注意力、分组查询注意力。

Big ideas that are still prevalent are mixture of experts, multi-head latent attention, sliding window attention, and grouped-query attention.

Speaker 2

到了年底,我们看到研究重点转向了让注意力机制在推理时的标记预测中实现线性扩展。

And then at the end of the year, we saw a focus on making the attention mechanism scale linearly with inference token prediction.

Speaker 2

比如Qwen3-Next就引入了门控DeltaNet。

There was Qwen3-Next, for example, which added a gated DeltaNet.

Speaker 2

这有点受到状态空间模型的启发,即你保持一个固定状态并不断更新,但它让注意力计算变得更便宜,或者用更廉价的操作替代了注意力。

It's kind of inspired by state space models, where you have a fixed state that you keep updating, but it essentially makes attention cheaper — or replaces attention with a cheaper operation.
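A quick op-count sketch of why people chase these cheaper attention variants: full causal attention scores every token against all previous tokens, so cost grows quadratically with context length, while a sliding window (or a fixed-size state) keeps the per-token cost bounded. The window size and sequence lengths here are illustrative:

```python
# Counting query-key score operations only (an illustrative proxy,
# ignoring head counts and dimensions).

def full_attention_ops(n):
    # each of n tokens attends to itself and all previous tokens
    return n * (n + 1) // 2

def sliding_window_ops(n, window=4096):
    # each token attends to at most `window` recent tokens
    return sum(min(i + 1, window) for i in range(n))

for n in (4_096, 32_768, 131_072):
    full, sw = full_attention_ops(n), sliding_window_ops(n, 4096)
    print(f"n={n}: full/windowed ratio = {full / sw:.1f}x")
```

At 4K tokens the two are identical; by 128K tokens the windowed variant is doing an order of magnitude less score work, which is the scaling-with-inference point being made.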

Speaker 0

也许我们该退一步,谈谈Transformer架构的整体情况。

And maybe it's useful to step back and talk about the transformer architecture in general.

Speaker 2

是的

Yeah.

Speaker 2

所以我们或许应该从GPT-2架构开始说起。

So maybe we should start with the GPT two architecture.

Speaker 2

这个Transformer架构源自《Attention Is All You Need》论文。

The transformer that was derived from the "Attention Is All You Need" paper.

Speaker 2

嗯哼。

Mhmm.

Speaker 2

《Attention Is All You Need》论文中的Transformer架构包含两个部分:编码器和解码器。

So the attention is all you need paper had a transformer architecture that had two parts, an encoder and a decoder.

Speaker 2

而GPT只专注于解码器部分。

And GPT went just focusing in on the decoder part.

Speaker 2

它本质上仍然是一个神经网络,内部包含这种注意力机制。

It is essentially still a neural network, and it has this attention mechanism inside.

Speaker 2

并且你一次预测一个标记。

And you predict one token at a time.

Speaker 2

你将它通过一个嵌入层。

You pass it through an embedding layer.

Speaker 2

这是Transformer模块。

There's the transformer block.

Speaker 2

Transformer模块包含注意力模块和全连接层,中间还有一些归一化层。

The transformer block has attention modules and a fully connected layer, and there are some normalization layers in between.

Speaker 2

但本质上它还是带有注意力机制的神经网络层。

But it's essentially neural network layers with this attention mechanism.
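The structure just described — an attention module plus a fully connected layer, with normalization in between and residual connections around each — can be sketched in a few lines. This is a single-head, untrained, pre-norm toy in NumPy, with made-up dimensions, just to show the skeleton:

```python
# Skeleton of one decoder-style transformer block: normalize -> attend ->
# add residual, then normalize -> feed-forward -> add residual.
import numpy as np

rng = np.random.default_rng(0)
d = 16  # illustrative model dimension
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d), scale=0.1) for _ in range(4))
W1 = rng.normal(size=(d, 4 * d), scale=0.1)  # feed-forward expands 4x
W2 = rng.normal(size=(4 * d, d), scale=0.1)

def rms_norm(x):
    return x / np.sqrt((x * x).mean(-1, keepdims=True) + 1e-6)

def causal_attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)
    scores += np.triu(np.full(scores.shape, -1e9), k=1)  # causal mask
    probs = np.exp(scores - scores.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    return (probs @ v) @ Wo

def block(x):
    x = x + causal_attention(rms_norm(x))          # attention sublayer
    x = x + np.maximum(0, rms_norm(x) @ W1) @ W2   # feed-forward sublayer
    return x

tokens = rng.normal(size=(5, d))  # 5 token embeddings in, 5 out
print(block(tokens).shape)
```

A full model is essentially this block repeated N times between an embedding layer and an output head, which is why the conversation keeps describing the architectures as variations on the same thing.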

Speaker 2

因此,从GPT-2到GPT OSS,比如引入了专家混合层。

So coming from GPT two, when we move on to GPT OSS, there is, for example, the mixture of experts layer.

Speaker 2

这并不是GPT OSS发明的。

It's not invented by GPT OSS.

Speaker 2

它已经有几年历史了,但本质上是一种调整,可以在不增加每次前向计算开销的情况下扩大模型规模。

It's a few years old, but it is essentially a a tweak to make the model larger without consuming more compute in each forward pass.

Speaker 2

所以这里有一个全连接层。

So there is this fully connected layer.

Speaker 2

如果听众熟悉多层感知机,你可以把Transformer内部的全连接神经网络层看作是一个小型的多层感知机。

And if listeners are familiar with multilayer perceptrons, you can think of a mini multilayer perceptron, a fully connected neural network layer inside the transformer.

Speaker 2

这非常昂贵,因为它是全连接的。

And it's very expensive because it's fully connected.

Speaker 2

如果你有上千个输入和上千个输出,那就相当于一百万个连接。

If you have a thousand inputs and a thousand outputs, that's like a million connections.

Speaker 2

这是Transformer中非常耗资源的一部分。

And it's a very expensive part in this transformer.

Speaker 2

这个想法是将它扩展为多个前馈网络,比如不再只有一个,而是有256个,但这会让计算量大幅增加,因为你现在有了256个。

And the idea is to kind of expand that into multiple feed-forward networks. So instead of having one, let's say you have 256 — but that would make it way more expensive, because now you have 256.

Speaker 2

但你并不会同时使用所有这些网络,而是引入一个路由机制,决定:好的。

But you don't use all of them at the same time, so you now have a router that says, okay.

Speaker 2

根据这个输入词元,使用这个全连接网络会更有帮助。

Based on this input token, it would be useful to use this fully connected network.

Speaker 2

在这种情况下,它被称为一个专家。

And in that context, it's called an expert.

Speaker 2

所以,专家混合模型意味着你拥有多个专家。

So a mixture of experts means you have multiple experts.

Speaker 2

根据你的输入内容,比如如果是数学密集型的,就会使用与将英文翻译成西班牙文不同的专家。

And depending on what your input is, let's say it's more math heavy, it would use different experts compared to, let's say, translating input text from English to Spanish.

Speaker 2

它可能会咨询不同的专家。

It would maybe consult different experts.

Speaker 2

这并不是很清楚,我的意思是,很难明确地说:好吧。

It's not quite clear — I mean, not as clear cut as to say, okay.

Speaker 2

这个专家只负责数学,而西班牙语则更模糊一些。

This is only an expert for math and that one for Spanish — it's a bit more fuzzy.

Speaker 2

但核心思想是,你将更多知识压缩进网络中,但并非所有知识都会被同时使用。

But the idea is essentially that you pack more knowledge into the network, but not all the knowledge is used all the time.

Speaker 2

是的。

Mhmm.

Speaker 2

那会非常浪费。

That would be very wasteful.

Speaker 2

所以在生成令牌时,你更像是有选择性地进行处理。

So you are kind of like during the token generation, you are more selective.

Speaker 2

有一个路由器来决定哪些令牌应该发送给哪个专家。

There's a router that selects which tokens should go to which expert.

Speaker 2

这增加了复杂性。

It's more complexity.

Speaker 2

训练起来更困难。

It's harder to train.

Speaker 2

有很多可能出错的地方,比如崩溃之类的。

There's a lot of, you know, that can go wrong, like collapse and everything.

Speaker 2

所以我认为这就是为什么OLMo 3仍然使用密集架构。

So I think that's why OLMo 3 still uses dense.

Speaker 2

我的意思是,我认为既有这些混合专家模型,也有密集模型——这里的‘密集’同样是行话。

I mean, you have, I think, all these models with mixture of experts, but also dense models — where "dense", again, is jargon.

Speaker 2

密集型和稀疏型之间是有区别的。

There's a distinction between dense and sparse.

Speaker 2

因此,专家混合模型被认为是稀疏的,因为我们有很多专家,但只有少数几个会被激活。

So mixture of experts is considered sparse because we have a lot of experts, but only few of them are active.

Speaker 2

所以这被称为稀疏。

So that's called sparse.

Speaker 2

而密集模型则恰恰相反,你只有一个全连接模块,并且它始终处于使用状态。

And then dense would be the opposite where you only have, like, one fully connected module and it's always, you know, utilized.
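The routing idea just described — a router scores all experts per token, but only the top-k actually run — can be shown with a toy NumPy layer. Expert count, top-k, and dimensions are illustrative, not from any real model:

```python
# Toy sparse mixture-of-experts layer: each "expert" is a tiny
# feed-forward matrix; a router picks top-k experts per token.
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router_w = rng.normal(size=(d_model, n_experts))

def moe_forward(x):
    logits = x @ router_w                       # router score per expert
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # top-k expert indices
    sel = np.take_along_axis(logits, top, axis=-1)
    weights = np.exp(sel) / np.exp(sel).sum(-1, keepdims=True)  # softmax
    out = np.zeros_like(x)
    for t in range(x.shape[0]):        # per token...
        for slot in range(top_k):      # ...only k of n_experts run
            e = top[t, slot]
            out[t] += weights[t, slot] * (x[t] @ experts[e])
    return out

tokens = rng.normal(size=(3, d_model))
print(moe_forward(tokens).shape)  # same shape out, but sparse compute
```

All four experts' parameters exist ("more knowledge packed in"), but each token only pays for two of them — that is the dense-vs-sparse distinction in code form.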

Speaker 0

所以也许现在是个不错的时机来谈谈KV缓存,但在此之前,先从更宏观的角度看,从GPT-2到现在,究竟引入了多少新想法?

So may maybe this is a good place to also talk about KV cache, but actually before that, even zooming out, like, fundamentally, how many new ideas have been implemented from from GPT two to today?

Speaker 0

是的。

Mhmm.

Speaker 0

这些架构到底有多大不同呢?

Like, how different really are these architectures?

Speaker 2

想象一下,专家混合模型。

Picture, like, the mixture of experts.

Speaker 2

GPT OSS中的注意力机制,就是分组查询注意力机制。

The attention mechanism in GPT OSS, that would be the group query attention mechanism.

Speaker 2

所以这是对多头注意力机制的一个小调整,改为分组查询注意力,这样我们就有了两种。

So it's a slight tweak from multi-head attention to grouped-query attention — so that's two.

Speaker 2

我认为他们用RMS归一化替换了层归一化,但这只是另一种归一化方式。

I think they replaced layer norm with RMSNorm, but that's just a different normalization there.

Speaker 2

这不是很大的改动。

Not a big change.

Speaker 2

这只是一个小小的调整。

It's just like a tweak.
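To make the "just a tweak" point concrete, here are the two normalizations side by side in plain NumPy: RMSNorm simply drops LayerNorm's mean-centering and normalizes by the root mean square (learnable scale parameters omitted for brevity):

```python
# LayerNorm vs RMSNorm over the last axis, without learnable gains.
import numpy as np

def layer_norm(x, eps=1e-6):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)   # zero-mean, unit-variance

def rms_norm(x, eps=1e-6):
    rms = np.sqrt((x * x).mean(-1, keepdims=True) + eps)
    return x / rms                         # unit RMS, mean not subtracted

x = np.array([[1.0, 2.0, 3.0, 4.0]])
print(layer_norm(x))
print(rms_norm(x))
```

The outputs differ only in whether the mean is removed — a one-line change in the block, which is the scale of architectural difference being described.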

Speaker 2

关于非线性激活函数,熟悉深度神经网络的人会知道,这就像把Sigmoid换成ReLU。

The nonlinear activation function, people familiar in with deep new networks, I mean, it's the same as changing sigmoid with ReLU.

Speaker 2

这并没有从根本上改变网络结构。

It's it's not changing the network fundamentally.

Speaker 2

这只是一个小小的调整。

It's just like a tweak.

Speaker 2

你只是做了一点点调整。

You just take a little tweak.

Speaker 2

我想就这些了。

And and that's about it, I would say.

Speaker 2

它并没有本质上那么不同。

It's not really fundamentally that different.

Speaker 2

它仍然是相同的架构。

It's still the same architecture.

Speaker 2

所以你可以从一个转换到另一个。

So you can convert one into the other.

Speaker 2

你基本上只需添加这些改动,就能从一个转向另一个。

You can go from one into the other by just adding these changes, basically.

Speaker 0

它本质上仍然是相同的架构。

It's fundamentally is still the same architecture.

Speaker 0

是的。

Yep.

Speaker 2

比如,你之前提到过我的书。

So for example, you mentioned my book earlier.

Speaker 2

书中的那个是GPT-2模型,因为它简单而且体积很小。

That's a GPT two model in the book because it's simple and it's very small.

Speaker 2

大约1.24亿个参数。

So about 124 million parameters.

Speaker 2

但在附加材料中,我确实有从零实现的OLMo 3、从零实现的Gemma 3,以及其他类型的从零构建模型。

But in the bonus materials, I do have OLMo 3 from scratch, Gemma 3 from scratch, and other types of from-scratch models.

Speaker 2

我总是从我的GPT-2模型开始,然后调整——嗯,添加不同的组件,就能从一个模型过渡到另一个。

And I always started with my GPT two model and just tweaked — well, added different components — and you get from one to the other.

Speaker 2

这在某种程度上就像一种谱系关系。

It's kind of like a lineage, in a sense.

Speaker 2

是的。

Yeah.

Speaker 0

你能为人们建立一种直观的理解吗?

Can you build up an intuition for people?

Speaker 0

因为当你宏观来看时,会发现人工智能领域的发展速度如此之快。

Because sort of when you zoom out, you look at it, there's so much rapid advancement in the AI world.

Speaker 0

同时,从本质上讲,这些架构并没有改变。

And at the same time, fundamentally, the architectures have not changed.

Speaker 1

嗯。

Mhmm.

Speaker 0

那么,所有这些快速进步和动荡究竟发生在哪儿呢?

So where is all the turbulence, the turmoil of the advancement happening?

Speaker 0

进步的潜力在哪里?

Where where is the gains to be had?

Speaker 2

在开发或训练网络的过程中,有不同的阶段。

So there are the different stages where you develop the network or train the network.

Speaker 2

你有预训练阶段。

You have the pretraining.

Speaker 2

那时候,他们只是用GPT-2进行预训练。

Now back then, they it was just pretraining with GPT two.

Speaker 2

现在你有预训练、中训练和后训练。

Now you have pretraining, midtraining, and posttraining.

Speaker 2

所以我认为,目前我们正处于后训练关注阶段。

So and I think right now, we are in the post training focus stage.

Speaker 2

我的意思是,预训练仍然能带来优势,特别是当你使用更高质量的数据并扩大规模时。

I mean, pretraining still gives you advantages if you scale it up to better, higher quality data.

Speaker 2

但随后我们获得了GPT-2时代所没有的能力突破。

But then we have capability unlocks that were not there with GPT two.

Speaker 2

例如,ChatGPT本质上是一个GPT-3模型,而GPT-3在架构上与GPT-2是相同的。

For example, ChatGPT, it is basically a GPT three model, and and GPT three is the same as GPT two in terms of architecture.

Speaker 2

新的突破在于引入了监督微调和基于人类反馈的强化学习。

What was new was adding the supervised fine tuning and the reinforcement learning with human feedback.

Speaker 2

因此,这些进步更多体现在算法层面,而非架构层面。

So it's more on the algorithmic side rather than the architecture.

Speaker 1

我认为系统本身也发生了很大变化。

I would say that the systems also change a lot.

Speaker 1

如果你听NVIDIA的发布内容,他们会提到像现在使用FP8这样的技术。

I think if you listen to NVIDIA's announcements, they talk about these things like, you now do FP eight.

Speaker 1

现在你可以使用FP4了。

You can now do FP four.

Speaker 1

目前这些实验室正在研究如何利用更多计算资源将它们整合到一个模型中,从而加快训练速度。

And what is happening is these labs are figuring out how to utilize more compute to put it into one model, which lets them train faster.

Speaker 1

这让他们能够加入更多数据。

And that lets them put more data in.

Speaker 1

通过这种方式,你可以更快地找到更优的配置。

And then you can find better configurations faster by doing this.

Speaker 1

因此,在进行大规模训练时,你可以关注每块GPU每秒处理的token数量这一指标。

So you can look at essentially the tokens per second per GPU — a metric you look at when you're doing large scale training.

Speaker 1

通过开启FP8训练,你可以将性能从大约10k提升到13k,这意味着模型中每个参数占用的内存更少了。

And you can go from like 10K to 13K by turning on FP8 training, which means you're using less memory per parameter in the model.

Speaker 1

通过减少保存的信息量,你减少了通信开销。

And by saving less information, you do less communication.

Speaker 1

你可以更快地进行训练。

You can train faster.
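The arithmetic behind "less memory per parameter" is simple enough to sketch: halving the bytes per value halves the memory footprint and the data that has to move between GPUs. The model size and format list below are illustrative, not from any lab's actual setup:

```python
# Bytes per parameter at different numeric precisions, for weights alone
# (optimizer state and activations add more, but scale the same way).

def model_memory_gb(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1e9

n = 70e9  # an illustrative 70B-parameter model
for fmt, nbytes in [("FP32", 4), ("BF16", 2), ("FP8", 1), ("FP4", 0.5)]:
    print(f"{fmt}: {model_memory_gb(n, nbytes):.0f} GB for the weights")
```

Going from BF16 to FP8 halves both the storage and the communication volume per step, which is where the tokens-per-second-per-GPU gain mentioned above comes from.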

Speaker 1

所以所有这些系统层面的改进,都支撑着数据和算法上更快的实验循环,这种循环一直在持续,虽然从架构上看它们几乎完全相同,但训练这些模型所用的代码库却大不相同。

So all of these system things underpin way faster experimentation on data and algorithms — it's this kind of loop that keeps going. It's kind of hard to describe: when you look at the architectures, they're exactly the same, but the code base used to train these models is going to be vastly different.

Speaker 1

嗯。

Mhmm.

Speaker 1

GPU虽然不同,但你现在训练GPT OSS 20B的实际耗时,很可能比当年训练GPT-2要快得多。

And the GPUs are different, but you could probably train GPT OSS 20B way faster in wall clock time than GPT two

Speaker 1

那时候的训练方式。

Was trained at the time.

Speaker 2

是的。

Yep.

Speaker 2

就像你说的,比如在专家混合模型中,他们采用了BF16和FP4优化,从而获得了更高的吞吐量。

Like you said, they had, for example, in the mixture of experts is n b f p four optimization, for example, where you get more throughput.

Speaker 2

但我确实认为,虽然这在速度上是成立的,但它并没有在本质上赋予模型新的能力。

But I do think this is true for the speed — it doesn't give the model new capabilities in a sense.

Speaker 2

它只是让我们能在不显著降低模型性能的前提下,把计算粒度粗化到什么程度。

It's just: how much can we make the computation coarser without suffering in terms of model performance degradation?

Speaker 2

但我认为,确实出现了替代Transformer的其他方案。

But I do think mean, there are alternatives popping up to the transformer.

Speaker 2

比如文本扩散模型,这是一种完全不同的范式。

There's text diffusion models, completely different paradigm.

Speaker 2

而且文本扩散模型可能也会使用Transformer架构,但它们不是自回归的Transformer。

And there's also — I mean, the text diffusion models might use transformer architectures, but it's not an autoregressive transformer.

Speaker 2

还有MAMBA模型,这是一种状态空间模型,但它们也有各自的权衡。

And also MAMBA models, it's a state space model, but they do have trade offs.

Speaker 2

目前来看,还没有任何模型能取代自回归Transformer作为最先进的模型。

And the reality is that nothing has replaced the autoregressive transformer as the state-of-the-art model.

Speaker 2

所以,要追求最先进的性能,你还是会选这个方案。

So, like, for state of the art, you would still go with that thing.

Speaker 2

但现在出现了更多低成本的替代方案,这些方案做了一些妥协,但已不再只有一种架构了。

But there are now alternatives for the cheaper and, like, alternatives that are kind of making compromises, but it's not just one architecture anymore.

Speaker 2

新的小众架构正在不断涌现。

There are little ones coming up.

Speaker 2

但如果我们讨论最先进的技术,目前基本上还是源自GPT-2的自回归Transformer架构。

But if we talk about the state of the art, it's pretty much still the autoregressive transformer architecture derived from GPT two, essentially.

Speaker 0

我想这里的关键问题是,我们已经讨论了很多关于预训练背后架构的内容。

I guess the big question here is we talked quite a bit here on the architecture behind the the pretraining.

Speaker 0

在预训练、后训练、推理、上下文长度、数据和合成数据方面,扩展定律是否依然有效?

Are the scaling laws holding strong across pretraining, posttraining, inference, context size, data, synthetic data?

Speaker 1

我想先从扩展定律的技术定义说起,这个定义会影响我们对所有这些的理解。

I like to start with the technical definition of scaling law, which kind of informs all of this.

Speaker 1

扩展定律是一种幂律关系,横轴代表你所扩展的变量,通常是计算量和数据的组合,这两者是相似的。

A scaling law is a power-law relationship: think of the x axis as what you are scaling, a combination of compute and data, which are kind of similar.

Speaker 1

纵轴则是模型在预测下一个token时的保留集准确率。

And then the y axis is like the held out prediction accuracy over our next tokens.

Speaker 1

我们之前讨论过模型是自回归的。

We talked about models being auto regressive.

Speaker 1

也就是说,如果你有一组模型未曾见过的文本,在训练后它能有多准确?

It's like if you keep a set of text that the model has not seen, how accurate will it get when you train?

Speaker 1

而缩放定律的概念源于人们发现这是一种非常可预测的关系。

And the idea of scaling laws came when people figured out that that was a very predictable relationship.
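The "very predictable relationship" described here is a power law: held-out loss falls smoothly as compute grows, with diminishing returns per order of magnitude. A minimal numerical sketch, using made-up constants that are not fitted to any real model's curve:

```python
# Hypothetical power-law scaling curve: loss(C) = a * C**(-b) + L_inf,
# where C is training compute, b is the scaling exponent, and L_inf is
# the irreducible loss floor. All constants here are illustrative.
def scaling_loss(compute, a=10.0, b=0.05, irreducible=1.7):
    return a * compute ** (-b) + irreducible

# Each 10x increase in compute buys a predictable but shrinking drop in
# held-out loss, which is what makes training runs plannable in advance.
for c in [1e18, 1e19, 1e20, 1e21]:
    print(f"compute={c:.0e}  loss={scaling_loss(c):.3f}")
```

The practical value is exactly this predictability: labs can fit the curve on small runs and extrapolate what a much larger run should achieve before spending the compute.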

Speaker 1

我认为这个技术术语仍在延续,接下来的问题是:用户能从中获得什么?

And I think that that technical term is continuing, and then the question is like, what do users get out of it?

Speaker 1

此外,还有更多类型的缩放方式,比如OpenAI的o1以引入推理时缩放而闻名,而不太为人所知的是,它还展示了可以通过扩展强化学习训练,获得对数形式的x轴和线性增长的y轴性能。

And then there are more types of scaling, where OpenAI's o1 was famous for introducing inference time scaling, and I think less famously for also showing that you can scale reinforcement learning training and get kind of this log x axis and then a linear increase in performance on the y axis.

Speaker 1

因此,现在有这三大维度:传统缩放定律主要讨论预训练阶段,即模型规模和数据集大小。

So there's kind of these three axes now, where the traditional scaling laws are talked about for pre training, which is how big your model is and how big your dataset is.

Speaker 1

然后是强化学习的缩放,也就是我们接下来要讨论的这种试错学习能持续多久。

And then scaling reinforcement learning, which is like how long can you do this trial and error learning that we will talk about.

Speaker 1

我们会进一步定义这些内容。

We'll define more of this.

Speaker 1

还有推理时的计算资源,即让模型在特定问题上生成更多令牌。

And then this inference time compute, which is just letting the model generate more tokens on a specific problem.

Speaker 1

我对这三者都依然持乐观态度,但低垂的果实大多已被摘取,尤其是在过去一年里,强化学习方面有了可验证奖励(RLVR),而推理时缩放也让这些模型的使用体验截然不同——过去你几乎能立即获得第一个令牌。

So I'm kind of bullish where they're all really still working, but the low hanging fruit has mostly been taken, especially in the last year on reinforcement learning with verifiable rewards, which is this RLVR, and then inference time scaling, which is just why these models feel so different to use, where previously you would get that first token immediately.

Speaker 1

现在它们会花费数秒、数分钟,甚至数小时来生成这些隐藏的思考过程,然后才给出你答案的第一个词。

And now they'll go off for seconds, minutes, or even hours generating these hidden thoughts before giving you the first word of your answer.

Speaker 1

这一切都关乎推理时间扩展,这种扩展在模型能力的提升上带来了显著的跃迁。

And that's all about this inference time scaling, which is such a wonderful kind of step function in terms of how the models change abilities.

Speaker 1

它们使得工具使用成为可能,并推动了我们之前讨论的更出色的软件工程能力。

They've kind of enabled this tool use stuff and enabled this much better software engineering that we were talking about.

Speaker 1

当我们说‘使能’时,这几乎完全是由于使用可验证奖励的强化学习训练,让模型能够轻松地掌握这些技能。

And when we say enabled, this is almost entirely downstream of the fact that this reinforcement learning with verifiable rewards training just kind of let the models pick up these skills very easily.

Speaker 1

让模型自己去学习。

So let the models learn.

Speaker 1

所以当你观察模型在生成大量令牌时的推理过程,它通常会尝试一个工具,然后观察返回的结果。

So if you look at the reasoning process when the models are generating a lot of tokens, what it'll often be doing is it tries a tool, it looks at what it gets back.

Speaker 1

它再尝试另一个API,查看返回内容,直到解决问题为止。

It tries another API, it sees what it gets back and if it solves the problem.

Speaker 1

因此,在训练过程中,模型会迅速学会这样做。

So the models, when you're training them, very quickly learn to do this.

Speaker 1

到了一天结束时,这就形成了一个通用的基础,模型可以很好地在你的代码库中使用命令行指令,帮你处理Git,移动和整理文件,或搜索更多信息——而如果我们一年前坐在这里,根本不会想到模型能做这些事。

And then at the end of the day, that gives this kind of general foundation where the model can use CLI commands very nicely in your repo and handle git for you and move things around and organize things or search to find more information, which, if we were sitting in these chairs a year ago, is something we didn't really imagine the models doing.
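The try-a-tool, inspect-the-result loop described in the last few turns can be sketched as a minimal agent harness. Everything below (the function names, the canned observation, the stopping rule) is a hypothetical illustration, not a real agent framework or API:

```python
# Minimal sketch of an agentic tool-use loop: the model proposes an
# action, the harness executes it, and the observation is fed back into
# the model's context. All names here are illustrative stand-ins.
def model_choose_action(history):
    # Stand-in for an LLM call; a real system would sample from the model.
    if not any(obs == "2 files found" for _, obs in history):
        return ("search", "grep -r 'TODO' .")
    return ("done", None)  # the model decides the task is solved

def run_tool(command):
    # Stand-in for actually shelling out; returns a canned observation.
    return "2 files found"

def agent_loop(max_steps=5):
    history = []
    for _ in range(max_steps):
        action, arg = model_choose_action(history)
        if action == "done":
            return history
        observation = run_tool(arg)
        history.append((arg, observation))  # result goes back into context
    return history

print(agent_loop())  # one search step, then the model stops
```

During RLVR training, trajectories like this are rewarded when the final state passes a verifiable check, which is how models learn the try, observe, retry pattern without hand-written heuristics.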

Speaker 1

所以,这仅仅是今年发生的事情,彻底改变了我们对使用AI的看法,我认为这非常神奇。

So this is just kinda something that has happened this year and is totally transformed how we think of using AI, which I think is very magical.

Speaker 1

这是一个非常有趣的演进,极大地释放了价值。

It's such an interesting evolution and just so unlocks so much value.

Speaker 1

但目前还不清楚下一步解锁类似能力的方向会是什么。

But it's like, it's not clear what the next avenue will be in terms of unlocking stuff like this.

Speaker 1

我想我们稍后会谈到持续学习的问题。

I think we'll get to continual learning later.

Speaker 1

但目前AI领域有很多关于某些方向的热议,却没人知道下一个跃迁何时真正到来。

But there's a lot of buzz around certain areas of AI, and no one knows when the step function will really come.

Speaker 0

你刚才说了很多内容,而且迅速道出了深刻的见解。

So you've actually said quite a lot of things there, and said profound things quickly.

Speaker 0

如果能稍微深入探讨一下这些观点就好了。

It would be nice to unpack them a little bit.
