达里奥·阿莫代伊(Anthropic首席执行官)—— 每一次AI突破背后的隐藏模式

Dario Amodei (Anthropic CEO) — The hidden pattern behind every AI breakthrough

本集简介

这是我与Anthropic首席执行官达里奥·阿莫代伊的对话。达里奥风趣幽默,对这些模型究竟在做什么、为何能如此高效扩展,以及如何实现对齐,有着独到而深刻的见解。

在YouTube观看。在Apple Podcasts、Spotify或其他任何播客平台收听。阅读全文Transcript。关注我的Twitter以获取未来剧集更新。

时间戳

(00:00:00) - 引言
(00:01:00) - 扩展
(00:15:46) - 语言
(00:22:58) - 经济实用性
(00:38:05) - 生物恐怖主义
(00:43:35) - 网络安全
(00:47:19) - 对齐与机制可解释性
(00:57:43) - 对齐研究是否需要规模?
(01:05:30) - 滥用 vs 对齐失败
(01:09:06) - 如果AI发展顺利会怎样?
(01:11:05) - 中国
(01:15:11) - 如何思考对齐问题
(01:31:31) - 现代安全是否足够?
(01:36:09) - 训练中的低效问题
(01:45:53) - Anthropic长期利益信托
(01:51:18) - Claude有意识吗?
(01:56:14) - 保持低调

获取Dwarkesh播客完整内容,请访问 www.dwarkesh.com/subscribe

双语字幕

Speaker 0

一个普遍受过良好教育的人。

A generally well educated human.

Speaker 0

这种情况可能在两三年内发生。

That could happen in, you know, two or three years.

Speaker 1

当两三年后这些庞然大物进行高达一百亿美元的训练时,这对Anthropic意味着什么?

What does that imply for Anthropic when in two to three years, these Leviathans are doing, like, $10,000,000,000 training runs?

Speaker 0

这些模型只是想学习。

The models, they just wanna learn.

Speaker 0

这有点像禅宗公案。

And it was a bit like a Zen koan.

Speaker 0

我听了之后,顿悟了。

I listened to this and and I became enlightened.

Speaker 0

算力并没有流动。

The compute doesn't flow.

Speaker 0

就像香料没有流动一样。

Like, the spice doesn't flow.

Speaker 0

或者吧。

Or Yeah.

Speaker 0

就像是,你不能,那个数据块必须是自由无拘的。

It's like, you can't like like, the the blob has to be unencumbered.

Speaker 0

对吧?

Right?

Speaker 0

去年底和今年初发生的那场巨大加速,我们并没有促成它。

The big acceleration that that happened late last year and and beginning of this year, we didn't cause that.

Speaker 0

老实说,我认为如果你看看谷歌的反应,那可能比其他任何事情都重要十倍。

And honestly, I think if you look at the reaction of Google, that that might be 10 times more important than anything else.

Speaker 0

曾经有个笑话。

There was a running joke.

Speaker 0

建造通用人工智能的样子是,你知道的,会有一个数据中心紧挨着核电站,旁边再有个掩体。

The way building AGI would look like is, you know, there would be a data center next to a nuclear power plant next to a bunker.

Speaker 0

但现在已经是2030年了。

But now it's 2030.

Speaker 0

接下来会发生什么?

What happens next?

Speaker 1

我们到底在和一个超人类的神做什么?

What are we doing with a superhuman god?

Speaker 1

好的。

Okay.

Speaker 1

今天,我很荣幸能与Anthropic首席执行官达里奥·阿莫代伊交谈。

Today, I have the pleasure of speaking with Dario Amodei, who is the CEO of Anthropic.

Speaker 1

我真的很期待这次对话。

And I'm really excited about this one.

Speaker 1

达里奥,非常感谢你来参加这个播客。

Dario, thank you so much for coming on the podcast.

Speaker 1

谢谢你的邀请。

Thanks for having me.

Speaker 1

第一个问题。

First question.

Speaker 1

你是少数几个多年来一直预见规模效应的人之一,已经超过五年了。

You have been one of the very few people who has seen scaling coming for years, more than five years.

Speaker 1

我不确定具体有多久。

I don't know how long it's been.

Speaker 1

但作为一个早已预见到它的人。

But as somebody who has seen it coming.

Speaker 1

从根本上讲,规模扩展为何有效?

What is fundamentally the explanation for why scaling works?

Speaker 1

为什么宇宙的结构会使得,只要你将大量计算资源投入到足够广泛的数据分布中,系统就会变得智能?

Why is the universe organized such that if you throw big blobs of compute at a wide enough distribution of data, the thing becomes intelligent?

Speaker 0

我认为事实是我们仍然不清楚。

I think the truth is that we still don't know.

Speaker 0

这几乎完全是一个经验性的发现。

I think it's almost entirely an empirical fact.

Speaker 0

我认为这是一个可以从数据和多个不同地方感受到的事实。

I think it's a fact that you could kind of sense from the data and from a bunch of different places.

Speaker 0

但我觉得我们仍然没有一个令人满意的解释。

But I think we still don't have a satisfying explanation for it.

Speaker 0

如果我要试着给出一个解释,但我只是,我不知道,说这话时只是在空谈,物理学中有一些关于长尾或相关性与效应的幂律概念。

If I were to try to make one, but I'm just, I don't know, I'm just kind of waving my hands when I say this, there's these ideas in physics around long tail or power law of correlations or effects.

Speaker 0

因此,当很多事情发生时,当你拥有大量特征时,你会在分布的主体部分获得大量数据,而不是在尾部。

And so when a bunch of stuff happens, when you have a bunch of features, you get a lot of the data in the fat part of the distribution before the tails.

Speaker 0

对于语言来说,这就像我意识到词性存在,名词通常跟随动词。

For language, this would be things oh, I figured out there are parts of speech and nouns follow verbs.

Speaker 0

然后还有越来越多、越来越微妙的相关性。

And then there are these more and more and more and more subtle correlations.

Speaker 0

因此,每增加一个数量级,你就能捕捉到分布中更多的内容,这似乎是有道理的。

And so it kind of makes sense why, with every log or order of magnitude that you add, you kind of capture more of the distribution.

Speaker 0

但完全不清楚的是,为什么它的性能会随着参数如此平滑地扩展?

What's not clear at all is why does it scale so smoothly with parameters?

Speaker 0

为什么它会随着数据量的增加而如此平滑地扩展?

Why does it scale so smoothly with the amount of data?

Speaker 0

你可以想出一些解释,说明为什么它是线性的。

You can think up some explanations of why it's linear.

Speaker 0

参数就像一个桶,数据就像水,桶的大小与水量成正比。

The parameters are like a bucket, and so the data's like water, and so size of the bucket is proportional to the size of the water.

Speaker 0

但为什么这会导致如此平滑的扩展?

But why does it lead to all this very smooth scaling?

Speaker 0

我认为我们仍然不知道。

I think we still don't know.

Speaker 0

有各种各样的解释。

There's all these explanations.

Speaker 0

我们的首席科学家贾里德·卡普兰做了一些关于分形流形维度的研究,可以用它来解释这一现象。

Our chief scientist, Jared Kaplan, did some stuff on fractal manifold dimension that you can use to explain it.

Speaker 0

所以有各种各样的想法,但我感觉我们其实并不真正确定。

So there's there's all kinds of ideas, but I feel like we just don't really know for sure.

Speaker 1

顺便说一下,对于正在跟进的观众来说,我们所说的‘扩展’是指,当你从GPT-3到GPT-4,或者从Claude 1到Claude 2时,模型预测下一个token的损失会非常平稳地变化。

And by the way, for for the audience who's trying to follow along, by scaling, we're referring to the fact that you can very predictably see how if you go from GPT three to GPT four or, in this case, Claude one to Claude two, that the loss in terms of whether it can predict the next token scales very smoothly.
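
这里所说的"扩展定律"可以用一小段示意代码来说明:下面采用Kaplan等人2020年论文中报告的幂律函数形式;常数只是量级上的示意,并非对任何具体模型的拟合。

A minimal sketch of the "scaling law" being described, assuming the power-law functional form reported in Kaplan et al. (2020); the constants are only order-of-magnitude illustrations, not a fit to any particular model.

```python
# Power-law form from the scaling-laws literature (Kaplan et al., 2020):
# loss falls smoothly and predictably as parameters and data grow.
# Constants are on the order of the published values; illustration only.
N_C, ALPHA_N = 8.8e13, 0.076   # parameter-count scale and exponent
D_C, ALPHA_D = 5.4e13, 0.095   # dataset-size scale and exponent

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Cross-entropy loss as a smooth power law in model and data size."""
    return (N_C / n_params) ** ALPHA_N + (D_C / n_tokens) ** ALPHA_D

# Each order of magnitude of parameters buys a steady, predictable drop.
for n in [1e8, 1e9, 1e10, 1e11, 1e12]:
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n, 3e12):.3f}")
```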

Speaker 1

所以,好吧,我们不知道为什么会这样,但你至少能根据经验预测,当损失达到某个值时,这种能力就会出现吗?

So, okay, we we don't know why it's happening, but can you at least predict if empirically here is the loss at which this ability will emerge?

Speaker 1

这是这个电路会浮现出来的临界点。

Here is the place where this circuit will emerge.

Speaker 1

这能预测吗?还是你只是在看损失数值?

Is is that at all predictable, or are you just looking at the loss number?

Speaker 0

这要难预测得多。

It is much less predictable.

Speaker 0

可以预测的是这种统计平均值——损失、熵,它非常可预测。

What's predictable is this statistical average, this loss, this entropy, and it's super predictable.

Speaker 0

它甚至能精确到好几位有效数字,这在物理学以外的地方你几乎看不到。

It's, like, you know, predictable to, like, sometimes even to several significant figures, which you don't see outside of physics.

Speaker 0

对吧?

Right?

Speaker 0

你不会期望在这个混乱的经验领域中看到它。

You don't expect to see it in this messy empirical field.

Speaker 0

但实际上,具体的能力非常难以预测。

But actually specific abilities are very hard to predict.

Speaker 0

所以,你知道,当我还在研究GPT-2和GPT-3的时候,算术能力是在什么时候出现的?

So, you know, back when I was working on GPT two and GPT three, like, when does arithmetic come in place?

Speaker 0

模型什么时候学会编程?

When do models learn to code?

Speaker 0

有时候,这种能力的出现非常突然。

Sometimes it's very abrupt.

Speaker 0

这就像是你可以预测天气的统计平均值,但某一天的具体天气却很难预测。

It's kind of like you can predict statistical averages of the weather, but the weather on one particular day is very hard to predict.

Speaker 1

所以给我讲得简单点。

So dumb it down for me.

Speaker 1

我不懂流形,但从机制上讲,模型还不懂加法。

I don't understand manifolds, but mechanistically, it doesn't know addition yet.

Speaker 1

现在它会加法了。

Now it knows addition.

Speaker 1

发生了什么?

What has happened?

Speaker 0

这是另一个我们不知道答案的问题。

This is another question that we don't know the answer to.

Speaker 0

我的意思是,我们正试图通过机制可解释性之类的方法来回答这个问题,但我也不确定。

I mean, we're trying to answer this with things like mechanistic interpretability, but I'm not sure.

Speaker 0

你可以想象这些事情就像电路突然接通一样,尽管有一些证据表明,当你观察模型能够进行加法运算时,比如你看看它答对的概率,会突然飙升。

I mean, you can think about these things as, like, circuits snapping into place, although there is some evidence that when you look at the models being able to add things, you know, like if you look at its chance of getting the right answer, that shoots up all of a sudden.

Speaker 0

但如果你看一下,好吧,正确答案的概率是多少?

But if you look at, okay, what's the probability of the right answer?

Speaker 0

你会看到它从百万分之一上升到十万分之一,再到千分之一,远在它真正答对之前就发生了。

You'll see it climb from like one in a million to one in a hundred thousand to one in a thousand long before it it actually gets the right answer.

Speaker 0

因此,在许多这类情况下——至少我不知道是否所有情况都如此——背后都存在某种持续进行的过程。

And so in many of these cases, at least, I don't know if in all of them, there's some continuous process going on behind the scenes.

Speaker 0

我完全不明白。

I don't understand it at all.
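
这里描述的现象可以用一个虚构的小算例说明:即使每个词元的正确概率随规模平滑提升,"完整答对"(每个词元都要对)的概率看起来也会突然跃升。以下数字纯属假设,仅作演示。

The effect described here can be illustrated with a made-up toy calculation: even if the per-token probability of being correct improves smoothly with scale, the probability of getting a whole multi-token answer right can appear to jump abruptly. The numbers below are hypothetical, purely for demonstration.

```python
# Hypothetical numbers: per-token accuracy improves smoothly with scale,
# but an exact multi-token answer requires every token to be right at once.
ANSWER_LENGTH = 10  # suppose the full answer spans 10 tokens

for per_token_accuracy in [0.3, 0.5, 0.7, 0.9, 0.97]:
    p_exact = per_token_accuracy ** ANSWER_LENGTH
    print(f"per-token {per_token_accuracy:.2f} -> whole answer {p_exact:.6f}")
# 0.3**10 is roughly one in 170,000, while 0.9**10 is about 0.35: a smooth
# underlying improvement can surface as a seemingly sudden new ability.
```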

Speaker 1

这是否意味着进行加法的电路或过程原本就存在,只是变得更为显著了?

Does that imply that the circuit or the process for doing addition was preexisting and it just got increased in salience?

Speaker 0

我不确定是否真的存在一个微弱但正在变强的电路。

I don't know if, like, there's this circuit that's weak and getting stronger.

Speaker 0

我不确定它是不是某种能工作但效果不佳的东西。

I don't know if it's something that works but not very well.

Speaker 0

就像,我们不知道。

Like, I think we don't know.

Speaker 0

这些正是我们试图用机制可解释性来回答的一些问题。

And these are some of the questions we're trying to answer with mechanistic interpretability.

Speaker 1

是否存在一些不会随着规模扩大而涌现的能力?

Are there abilities that won't emerge with scale?

Speaker 0

所以我确实认为,像对齐和价值观这样的东西,并不能保证会随着规模扩大而自动出现。

So I definitely think that, again, like things like alignment and values are not guaranteed to emerge with scale.

Speaker 0

一种理解方式是,你训练模型,它本质上是在预测世界,理解世界。

It's kind of like, one way to think about it is you train the model and it is basically, it's like predicting the world, it's understanding the world.

Speaker 0

它的任务是处理事实,而不是价值观,它只是试图预测接下来会发生什么。

Its job is facts, not values. It's trying to predict what comes next.

Speaker 0

但这里存在一些自由变量,比如:你该做什么?

But there's there's free variables here where it's like, what should you do?

Speaker 0

你该想什么?

What should you think?

Speaker 0

你该重视什么?

What should you value?

Speaker 0

这些,你知道的,根本就没有对应的数据位来承载这些内容。

Those, you know, like, just there aren't the bits for that.

Speaker 0

它只是像:如果我从这里开始,我就应该以那里结束。

There's just like, well, if I started with this, I should finish with this.

Speaker 0

如果我从另一个东西开始,我就应该以另一个东西结束。

If I started with this other thing, I should finish with this other thing.

Speaker 0

所以我认为这不会自然出现。

And so I think that's not going to emerge.

Speaker 1

我想稍后再谈谈对齐问题。

I wanna talk about alignment in a second.

Speaker 1

但在扩展方面,如果我们尚未达到人类水平的智能,扩展就已趋于平稳,回过头来看,你的解释会是什么?

But on on scaling, if it turns out that scaling plateaus before we reach human level intelligence, looking back on it, what would be your explanation?

Speaker 1

如果这种情况真的发生,你认为最可能的原因是什么?

What do you think is likely to be the case if that turns out to be the outcome?

Speaker 0

是的。

Yeah.

Speaker 0

所以我想我会区分根本性理论问题和一些实际问题。

So I guess I would distinguish some problem with the fundamental theory from some practical issue.

Speaker 0

一个可能的实际问题是,我们可能会耗尽数据。

So one practical issue we could have is we could run out of data.

Speaker 0

由于各种原因,我认为这种情况不会发生。

For various reasons, I think that's not going to happen.

Speaker 0

但如果你非常天真地看待这个问题,我们离数据耗尽其实并不远。

But if you look at it very, very naively, we're not that far from running out of data.

Speaker 0

所以,我们只是没有足够的数据来维持扩展曲线。

And so it's like, we just don't have the data to continue the scaling curves.

Speaker 0

我认为另一种可能发生的情况是:我们用尽了所有可用的算力,但这些算力仍然不够,之后进展就会变得缓慢。

I think another way it could happen is like, oh, we just use up all of our compute that was available and that wasn't enough, and then progress is slow after that.

Speaker 0

我不会押注于其中任何一种情况会发生,但它们确实有可能。

I wouldn't bet on either of those things happening, but they could.

Speaker 0

从根本角度来看,我个人认为,扩展定律突然停止的可能性非常小。

I think from a fundamental perspective, personally, it's very unlikely that the scaling laws will just stop.

Speaker 0

如果真的停止了,另一个原因——这并不完全属于根本性问题——可能是我们还没有找到完全合适的架构。

If they do, another reason, again, this isn't fully fundamental, could just be we don't have quite the right architecture.

Speaker 0

比如,如果我们用LSTM或RNN来尝试,斜率就会不同。

Like if we tried to do it with an LSTM or an RNN, the slope would be different.

Speaker 0

仍然有可能我们最终会达到那个阶段,但我认为,如果没有Transformer那种关注遥远过去的能力,有些事情就很难表示出来。

It still might be that we get there, but I think there are some things that are just very hard to represent when you don't have this ability to attend far in the past that transformers have.

Speaker 0

如果不知怎么地(我也不知道我们如何能确认这一点),问题根本不在架构上,而我们只是碰到了瓶颈,我会对此感到非常惊讶。

If somehow, and I don't know how we would know this, it kind of wasn't about the architecture and we just hit a wall, I think I'd be very surprised by that.

Speaker 0

我认为,我们现在已经到了这样的地步:模型做不到的事情,与它们能做到的事情之间,似乎并没有本质上的区别。

I think we're already at the point where the things the models can't do don't seem to me to be different in kind from the things they can do.

Speaker 0

几年前,你或许还能说,它们不会推理,不会编程,你可以划出界限,说也许我们会遇到瓶颈。

And it just, you you could have made a case a few years ago that it was like, they can't reason, they can't program, like you could have drawn boundaries and said, well, maybe you'll hit a wall.

Speaker 0

我不这么认为。

I didn't think that.

Speaker 0

我不觉得我们会遇到瓶颈。

I didn't think we would hit a wall.

Speaker 0

当时少数其他人也不认为我们会遇到瓶颈,但那时这种说法还更有道理一些。

Few other people didn't think we would hit a wall, but it was a more plausible case then.

Speaker 0

现在我觉得这种说法没那么有道理了。

I think it's a less plausible case now.

Speaker 0

现在,这种情况是有可能发生的。

Now, it could happen.

Speaker 0

这些事情太疯狂了。

Like, this stuff is crazy.

Speaker 0

可能明天就会发生,比如我们就这么撞上了瓶颈。

Like, it could happen tomorrow, and it would just be like, we hit a wall.

Speaker 0

我认为如果真的发生了,我在想:虽然可能性不大,但我真正的解释会是什么?

I think if that happens, I'm trying to think of, like, it's unlikely, but what would really be my explanation?

Speaker 0

我认为我的解释是,在使用下一个词预测进行训练时,损失函数本身存在问题。

I think my explanation would be there's something wrong with the loss when you train on next word prediction.

Speaker 0

比如,一些剩余的推理能力之类的,如果你真的想达到很高的编程水平,就意味着你对某些词的关注度远高于其他词。

Like, some of the remaining reasoning abilities or something like that, like if you really want to learn to program at a really high level, it means you care about some tokens much more than others.

Speaker 0

而这些词足够稀少,以至于损失函数过度关注那些贡献最多熵的表层内容。

And they're rare enough that it's like the loss function over focuses on kind of the appearance, the things that are responsible for the most bits of entropy.

Speaker 0

相反,它们没有关注那些真正关键的东西。

And instead, they don't focus on this stuff that's really essential.

Speaker 0

所以你可能会让信号被噪声淹没。

And so you could kind of have the signal drowned out in the noise.

Speaker 0

我认为由于多种原因,事情不会以这种方式发展。

I don't think it's gonna play out that way for a number of reasons.

Speaker 0

但如果你告诉我:没错,你训练了2024年的模型,它更大了,但并没有更好,你尝试了所有架构都没用,我想这就是我会选择的解释。

But if you told me, yep, you trained your 2024 model, it was much bigger, it just wasn't any better, you tried every architecture and it didn't work, I think that's the explanation I would reach for.
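
上面的担忧可以用一个虚构的算术例子说明:平均的下一词元损失由高频"表层"词元主导,即使模型在稀少但关键的推理词元上差很多,总体数字也几乎不动。以下比例为假设值。

The worry above can be illustrated with a made-up bit of arithmetic: the average next-token loss is dominated by frequent "surface" tokens, so even a model that is much worse on rare, reasoning-critical tokens barely moves the overall number. The proportions below are hypothetical.

```python
# Hypothetical proportions: 0.1% of tokens are "reasoning-critical",
# the rest are surface tokens the model already predicts well.
FREQ_SHARE, RARE_SHARE = 0.999, 0.001
LOSS_ON_FREQUENT = 1.0  # nats per token on the well-modeled surface tokens

for loss_on_rare in [1.0, 5.0, 20.0]:  # model much worse on the rare ones
    average = FREQ_SHARE * LOSS_ON_FREQUENT + RARE_SHARE * loss_on_rare
    print(f"rare-token loss {loss_on_rare:5.1f} -> average loss {average:.4f}")
# A 20x difference on the critical tokens moves the average by under 2%:
# the signal can be drowned out in the noise.
```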

Speaker 1

如果你必须放弃下一个词预测,有没有其他损失函数的候选方案?

Is there a candidate for another loss function if you had to abandon next token prediction?

Speaker 0

我认为那时你必须转向某种形式的强化学习。

I think then you would have to go for some kind of RL.

Speaker 0

而且同样,强化学习也有很多种类型。

And again, there's many different kinds.

Speaker 0

有基于人类反馈的强化学习。

There's RL from human feedback.

Speaker 0

有面向目标的强化学习。

There's RL against an objective.

Speaker 0

还有像宪法AI这样的方法。

There's things like constitutional AI.

Speaker 0

还有像放大和辩论这样的方法,对吧?

There's things like amplification and debate, right?

Speaker 0

这些既是对齐方法,也是训练模型的方式。

These are kind of both alignment methods and ways of training models.

Speaker 0

你得尝试很多东西,但重点必须放在我们真正关心模型做什么上,对吧?

You would have to try a bunch of things, but the focus would have to be on what do we actually care about the model doing, right?

Speaker 0

从某种意义上说,我们有点幸运,因为预测下一个词能帮我们实现所有其他需要的功能。

And in a sense, we're a little bit lucky that it's like predict the next word gets us all these other things we need.

Speaker 0

这并没有保证。

There's no guarantee.

Speaker 0

看起来似乎

It seems like

Speaker 1

根据你的世界观,存在多种不同的损失函数,关键在于哪种方法能让你直接投入大量数据。

from your worldview, there's a multitude of different loss functions that it's just a matter of what can allow you to just throw a whole bunch of data at it.

Speaker 1

比如,下一个词预测本身并不重要。

Like the next token prediction itself is not significant.

Speaker 0

是的。

Yeah.

Speaker 0

嗯,我的意思是,RL的问题在于你会被拖慢一点,因为你必须通过某种方式来设计损失函数的工作方式。

Well, I mean, I guess the thing with RL is you get slowed down a bit because it's like, you know, you have to, by some method, kind of design how the loss function works.

Speaker 0

而下一个词预测的好处是,它已经为你准备好了,对吧?

The nice thing with next token prediction is it's there for you, right?

Speaker 0

它就在那里。

It's just there.

Speaker 0

这是世界上最简单的事情。

It's the easiest thing in the world.

Speaker 0

所以,我认为如果你不能用这种非常简单的方式直接扩展,就会拖慢进度。

And so I think it would slow you down if you couldn't scale it in just that very simple way.

Speaker 1

你提到数据不太可能是限制因素。

You mentioned that data is likely not to be the constraint.

Speaker 1

你为什么认为是这种情况?

Why do you think that is the case?

Speaker 0

这里有很多可能性。

There's various possibilities here.

Speaker 0

由于多种原因,我不该深入细节。

And for a number of reasons, I shouldn't go into the details.

Speaker 0

但世界上有众多数据来源,也有许多方式可以生成数据。

But there's many sources of data in the world, and there's many ways that you can also generate data.

Speaker 0

我猜测这不会成为障碍。

My guess is that this will not be a blocker.

Speaker 0

也许如果真是这样会更好,但事实并非如此。

Maybe it'd be better if it was, but it won't be.

Speaker 0

你是在说多模态吗?

Are you talking about multimodal?

Speaker 0

有非常多不同的方式可以做到。

There's just many different ways to do it.

Speaker 1

你是如何形成关于扩展规模的看法的?

How did you form your views on scaling?

Speaker 1

我们可以追溯到多久以前?

How far back can we go?

Speaker 1

然后你基本上就是在说类似这样的话。

And then you would be basically saying something similar to this.

Speaker 0

我这种观点可能是从2014年到2017年期间逐渐形成的。

This view that I have probably formed gradually from, I would say, like 2014 to 2017.

Speaker 0

所以我认为,这是我第一次接触AI的经历。

So I think my first experience with it was my first experience with AI.

Speaker 0

所以我看到了2012年左右AlexNet的一些早期成果。

So I saw some of the early stuff around AlexNet in 2012.

Speaker 0

我一直想研究智能,但在此之前,我只是觉得这根本行不通。

Always kind of had wanted to study intelligence, but before I was just like, this isn't really working.

Speaker 0

看起来这东西根本没在起作用。

Like, doesn't seem like it's actually working.

Speaker 0

早在2005年,我就读过雷·库兹韦尔的作品。

All the way back to like 2005, like I'd read Ray Kurzweil's work.

Speaker 0

那时候我还在早期互联网上读过埃利泽的一些文章,觉得这些东西离现实还很遥远。

I'd read even some of like Eliezer's work on the early internet back then, and I was like, oh, this stuff kinda looks far away.

Speaker 0

看看今天的AI,我觉得根本还差得远。

Like, I look at the AI stuff of today, and it's like not anywhere close.

Speaker 0

但看到AlexNet时,我就觉得,哦,这些东西真的开始起作用了。

But with AlexNet, I was like, oh, this stuff is actually starting to work.

Speaker 0

所以我最初加入了百度的吴恩达团队。

So I joined Andrew Ng's group initially at Baidu.

Speaker 0

至于我接到的第一个任务,要知道,我之前一直在另一个领域。

And the first task that I got set to do, right, was, I'd been in a different field.

Speaker 0

所以我刚加入时,这确实是我第一次接触AI,和世界上其他地方进行的学术研究风格有点不同。

And so I first joined, you know, this was my first experience with AI, and it was a bit different from a lot of the kind of academic style research that was going on kinda elsewhere in the world, right?

Speaker 0

我觉得自己挺幸运的,因为当时给我和那里的其他人分配的任务就是尽可能打造最好的语音识别系统。

I think I kinda got lucky in that the task that was given to me and the other folks there was just make the best speech recognition system that you can.

Speaker 0

而且当时有大量的数据可用。

And there was a lot of data available.

Speaker 0

也有很多GPU可用。

There were a lot of GPUs available.

Speaker 0

所以这个问题的设定方式,恰好适合发现扩展规模是一种解决方案,对吧?

So it kind of, it posed the problem in a way that was amenable to discovering that kind of scaling was a solution, right?

Speaker 0

这和你是个博士后,职责是想出一个看似聪明新颖、能让你留下创新印记的想法,非常不同。

That's very different from like, you're a postdoc and it's your job to come up with what's the best, like, what's an idea that seems clever and new and makes your mark as someone who's invented something.

Speaker 0

所以我很快发现,其实只需要尝试最简单的实验。

And so I just quickly discovered that, you know, it was just about trying the simplest experiments.

Speaker 0

就是摆弄一些参数而已。

Was like, you know, just fiddling with some dials.

Speaker 0

我当时想,好吧,试试看在RNN里直接增加更多层。

I was like, okay, you know, try adding more layers to the RNN, literally, add more layers to the RNN.

Speaker 0

你知道,试试让它训练更久。

You know, try training it for longer.

Speaker 0

会发生什么?

What happens?

Speaker 0

需要多久才会过拟合?

How long does it take to overfit?

Speaker 0

如果我加入新数据,但减少重复次数呢?

What if I add new data and repeat it less times?

Speaker 0

然后我就看到了这些非常一致的模式。

And like, I just saw these, like, very consistent patterns.

Speaker 0

我其实并不知道这有什么特别,或者别人是否也这样思考。

I didn't really know this was unusual or that others weren't thinking in this way.

Speaker 0

这简直就像是新手运气。

This this was just kind of like almost like beginner's luck.

Speaker 0

这是我第一次接触这个。

It was my first experience with it.

Speaker 0

我并没有进一步思考超越语音识别之外的事情。

And I didn't really think about it beyond speech recognition.

Speaker 0

对吧?

Right?

Speaker 0

没错。

Right.

Speaker 0

我只是觉得,哎,我对这个领域一无所知。

I was just kind of like, oh, I don't know anything about this field.

Speaker 0

人们用机器学习做成千上万种事情,但奇怪的是,这个规律在语音识别领域似乎确实成立。

There are zillions of things people do with machine learning, but, weirdly, this seems to be true in the speech recognition field.

Speaker 0

然后我想,就在OpenAI成立前不久,我遇到了伊利亚,就是你采访过的那位。

And then, I think it was just before OpenAI started, that I met Ilya, who you interviewed.

Speaker 0

他跟我说的第一件事就是:你看,模型只是想学习。

One of the first things he said to me was, look, the models, they just wanna learn.

Speaker 0

你得明白这一点,模型只是想学习。

You have to understand this, the models, they just wanna learn.

Speaker 0

这有点像禅宗公案。

And it was a bit like a Zen koan.

Speaker 0

我当时就想,我听了之后,突然开悟了。

Like, I kind of like, I listened to this and I became enlightened.

Speaker 0

在那之后的几年里,我又成了那个经常把这些东西系统化、整合起来的人。

And over the years after this, again, I would be kind of the one who would formalize a lot of these things and kind of put them together.

Speaker 0

但这件事让我明白,我所看到的现象并不是什么偶然的个例。

But like just kind of what that told me is that phenomenon that I'd seen wasn't just some random thing that I'd seen.

Speaker 0

它是普遍存在的。

It was broad.

Speaker 0

它更具有普遍性,对吧?

It was more general, right?

Speaker 0

模型只是想学习。

The models just wanna learn.

Speaker 0

你只要把它们前进路上的障碍清除掉就行了。

You get the obstacles out of their way, right?

Speaker 0

你给他们高质量的数据。

You give them good data.

Speaker 0

你给他们足够的操作空间。

You give them enough space to operate in.

Speaker 0

你不要做那种愚蠢的事,比如用错误的数值方式限制他们,而他们就是想学习。

You don't do something stupid like condition them badly numerically. And they wanna learn.

Speaker 0

他们会做到的。

They'll do it.

Speaker 0

他们会做到的。

They'll do it.

Speaker 1

你知道吗?

You know what?

Speaker 1

我觉得你刚才说的特别有意思的是,当时有很多人其实已经意识到这些技术在语音识别或玩某些受限游戏方面非常出色,尽管他们可能没有直接从事这方面的研究。

What I find really interesting about what you said is there are many people who were aware back at that time, probably weren't working on it directly, but were aware that these things are really good at speech recognition or at playing these constrained games.

Speaker 1

但很少有人像你和伊利亚那样,从那里推断出通用智能的可能性。

Very few extrapolated from there, like you and Ilya did, to something that is generally intelligent.

Speaker 1

你当时思考这个问题的方式,和别人有什么不同?你怎么会从‘它在语音识别上持续变好’,想到‘它会在所有事情上都持续变好’呢?

What what was different about the way you were thinking about it versus how others think that you went from, like, it's getting better at speech in this consistent way.

Speaker 1

它会在所有事情上都以这种一致的方式变得更好。

It will get better at everything in this consistent way.

Speaker 0

是的。

Yeah.

Speaker 0

说实话,我真的不知道。

So I I genuinely don't know.

Speaker 0

我的意思是,最初当我研究语音识别时,我以为这只适用于语音,或者只适用于这一类模型。

I mean, first, when I saw it for speech, I assumed this was just true for speech or for this narrow class of models.

Speaker 0

我认为在2014年到2017年这段时间里,我尝试了大量其他领域,却一次次看到同样的现象。

I think it was just over the period between 2014 and 2017, I tried it for a lot of things and saw the same thing over and over again.

Speaker 0

我看到同样的情况也发生在DOTA上。

I watched the same being true with DOTA.

Speaker 0

我还看到同样的情况在机器人领域也成立,尽管很多人认为机器人是个反例,但我只是觉得,机器人领域获取数据太难了。

I watched the same being true with robotics, which many people thought of as a counterexample, but I just thought, well, it's hard to get data for robotics.

Speaker 0

但如果我们仅从我们现有的数据中来看,就会发现相同的模式。

But if we operate within, if we look within the data that we have, we see the same patterns.

Speaker 0

所以我不确定。

And so I don't know.

Speaker 0

我认为人们当时非常专注于解决眼前的问题。

I think people were very focused on solving the problem in front of them.

Speaker 0

为什么一个人这样想,另一个人那样想,这很难解释。

Why one person thinks one way, another person thinks, it's very hard to explain.

Speaker 0

我认为人们只是用不同的视角来看待,是垂直地看,而不是横向地看。

I think people just see it through a different lens, are looking like vertically instead of horizontally.

Speaker 0

他们没有考虑规模扩展的问题。

They're not thinking about the scaling.

Speaker 0

他们想的是怎么解决我的问题?

They're thinking about how do I solve my problem?

Speaker 0

对于机器人领域来说,数据不够。

Well, for robotics, there's not enough data.

Speaker 0

因此,这很容易抽象为:扩展之所以无效,是因为我们没有足够的数据。

And so that can easily abstract to, well, scaling doesn't work because we don't have the data.

Speaker 0

所以我不知道,不知为何,也许是出于偶然,我特别执着于这个方向。

And so I don't know, I just, for some reason, and it may have been random chance, was obsessed with that particular direction.

Speaker 1

你是什么时候意识到语言是向这些系统输入大量数据的手段的?还是说你只是其他方法都用尽了?

When did it become obvious to you that language is the means to just feed a bunch of data into these things, or was it just that you ran out of other things?

Speaker 1

比如机器人领域,数据不够。

Like robotics, there's not enough data.

Speaker 1

其他这些东西,数据也不够。

This other thing, there's not enough data.

Speaker 0

是的。

Yeah.

Speaker 0

我的意思是,我认为整个‘下一个词预测’的概念——你可以进行自监督学习——再加上这样一个想法:哇,预测下一个词蕴含着如此丰富的结构和信息。

I mean, I think this whole idea of, like, the next word prediction that you could do self supervised learning, you know, that together with the idea that it's like, wow, for predicting the next word, there's so much richness and structure there.

Speaker 0

它可能会说‘2加2等于’,而你必须知道答案是4。

It might say two plus two equals, and you have to know the answer is four.

Speaker 0

它可能在讲述一个角色的故事,然后本质上是在向模型提出类似于针对儿童发展的测试问题。

And it might be telling the story about a character, and then basically it's posing to the model the equivalent of these developmental tests that get posed to children.

Speaker 0

玛丽走进房间,把一件物品放在那里。

Mary walks into the room and puts an item in there.

Speaker 0

然后查克走进房间,把物品拿走了。

And then Chuck walks into the room and removes the item.

Speaker 0

而玛丽没有看到这一幕。

And Mary doesn't see it.

Speaker 0

玛丽会认为发生了什么?因此,模型必须在预测下一个词的过程中准确地回答这个问题。

What does Mary think happened? So the models are gonna have to get this right in the service of predicting the next word.

Speaker 0

它们必须解决所有这些心理理论问题,解决所有这些数学问题。

They're gonna have to solve all these theory of mind problems, solve all these math problems.
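
这里提到的"心理理论"测试可以直接写成一个下一词预测问题;下面是一个假设性的探针,情节对应上面的叙述。

The "theory of mind" test mentioned here can be posed directly as next-word prediction; below is a hypothetical probe matching the story above.

```python
# A hypothetical false-belief probe posed purely as next-word prediction.
prompt = (
    "Mary walks into the room and puts the ball in the basket, then leaves. "
    "Chuck walks into the room and moves the ball to the box. "
    "Mary did not see this. Mary comes back to get the ball, "
    "so she looks in the"
)
# To predict the next word correctly, a model must track what Mary
# believes (the basket), not where the ball actually is (the box).
print(prompt)
```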

Speaker 0

所以我的想法是,尽可能地扩大规模,这似乎没有上限。

And so my thinking was just, well, scale it up as much as you can. There's kind of no limit to it.

Speaker 0

我之前在抽象层面上有这种观点,但真正让我确信并坚定这一看法的,是亚历克·拉德福德在GPT-1上的工作——不仅能够训练出一个预测能力极强的语言模型,还可以对其进行微调。

And I think I kind of had abstractly that view, but the thing of course that really solidified and convinced me was the work that Alec Radford did on GPT-one, which was not only could you get this language model that could predict things very well, but also you could fine tune it.

Speaker 0

在那时候,你需要对它进行微调才能完成所有这些其他任务。

You needed to fine tune it in those days to do all these other tasks.

Speaker 0

所以我当时就想,哇,这不仅仅是一些狭窄的领域,只是把语言模型做对而已。

And so I was like, wow, this isn't just some narrow thing where you get the language model right.

Speaker 0

它可以说是通向一切的中间阶段,对吧?

It's sort of halfway to everywhere, right?

Speaker 0

你的语言模型一旦做对了,再稍微往这个方向推进一点,它就能解决这种逻辑指代测试之类的任务。

It's like, you know, you get the language model right, and then with a little move in this direction, it can solve this logical coreference test or whatever.

Speaker 0

而通过其他一些方法,它又能解决翻译之类的问题。

And with this other thing, it can solve translation or something.

Speaker 0

然后你就意识到,哇,这真的大有可为。

And then you're like, wow, I think there's really something to do.

Speaker 0

当然,我们还能进一步扩大规模。

And of course, we can really scale it.

Speaker 1

有一件事让人困惑,或者说是难以想象的:如果你在2018年告诉我,到2023年我们会拥有像Claude 2这样的模型,能够以莎士比亚的风格写出定理,或者任何你想要的理论风格。

One thing that's confusing or that would have been hard to see: if you told me in 2018, we'll have models in 2023, like Claude 2, that can write theorems in the style of Shakespeare or whatever theory you want.

Speaker 1

它们能够轻松应对开放式问题的标准化考试,还有各种令人印象深刻的表现。

They can ace standardized tests with open ended questions, just all kinds of really impressive things.

Speaker 1

如果你在那时告诉我,到2023年我们会拥有像Claude 2这样的模型,能以莎士比亚的风格写定理,或者任何你想要的理论,我会说:哦,你们已经拥有通用人工智能了。

You would have said at that time, I would have said, oh, you have AGI.

Speaker 1

你显然拥有某种达到人类水平的智能。

You clearly have something that is a human level intelligence.

Speaker 1

尽管这些表现非常出色,但显然我们还没有达到人类水平,至少在当前一代,甚至可能在未来几代都达不到。

Where these while these things are impressive, it clearly seems we're not at human level, at least in the current generation and potentially for generations to come.

Speaker 1

这种在基准测试中表现出色,以及在你能描述的各种任务中表现卓越,与所谓的‘通用智能’之间的差距,该如何解释呢?

What explains this discrepancy between super impressive performance in these benchmarks and in just, like, the things you could describe Yeah.

Speaker 1

相比之下,通用智能呢?

Versus, yeah, general?

Speaker 0

所以,这正是我当初没有预见到的一个领域,我自己也感到惊讶。当我第一次看到GPT-3,以及我们在Anthropic早期开发的那些东西时,我的总体感觉是:它们似乎真正掌握了语言的本质。

So that was one area where actually I was not prescient, and I was surprised as well. So when I first looked at GPT-3, and more so the kind of things that we built in the early days at Anthropic, my general sense was, I looked at these and I'm like, it seems like they've really grasped the essence of language.

Speaker 0

我不确定我们还需要把它们扩展到多大程度。

I'm not sure how much we need to scale them up.

Speaker 0

也许从这里开始,更需要的是强化学习和其他各种技术。

Like maybe what's more needed from here is like RL and kinda all the other stuff.

Speaker 0

我觉得我们可能已经接近极限了,2020年的时候我以为还能继续扩大规模,但现在我不确定是继续扩大规模更高效,还是该开始加入强化学习这类其他目标。

Like we might be kind of near the, you know, I thought in 2020, like, we can scale this a bunch more, but I wonder if it's more efficient to scale it more or to start adding on these other objectives like RL.

Speaker 0

我当时觉得,如果你对2020年风格的模型做和预训练一样多的强化学习,那可能才是正确的方向,而继续扩大规模依然有效。

I thought maybe if you do as much RL as you've done pre training for a 2020 style model, that's the way to go, and scaling it up will keep working.

Speaker 0

但那真的是最好的路径吗?

But is that really the best path?

Speaker 0

我觉得,我不知道,它似乎一直在持续前进。

And think it, I don't know, it just keeps going.

Speaker 0

我原以为它已经掌握了语言的很多本质,但显然还有更大的空间可以突破。

Like I thought it had understood a lot of the essence of language, but then there's kind of further to go.

Speaker 0

所以,从更宏观的角度看,我之所以在人工智能、安全和组织问题上如此经验主义,是因为你常常会感到意外。

And so, I don't know, stepping back from it, like one of the reasons why I'm sort of very empiricist about AI, about safety, about organizations, is that you often get surprised.

Speaker 0

我觉得我在一些事情上是对的,但总体而言,面对这些理论上的设想,我还是错了很多。

I feel like I've been right about some things, but I've still, you know, with these theoretical pictures ahead, been wrong about most things.

Speaker 0

能对10%的事情判断正确,就已经远远超越很多人了。

Being right about 10% of the stuff, you know, sets you head and shoulders above many people.

Speaker 0

你知道吗,如果你回看过去,我记不清是谁做过一些图表,上面画着:这里是村里的傻子,这里是爱因斯坦,这是智力的尺度。

You know, if you look back to, I can't remember who it was, kind of made these diagrams that are like, you know, here's the village idiot, here's Einstein, here's the scale of intelligence.

Speaker 0

对吧?

Right?

Speaker 0

而村里的傻子和爱因斯坦其实离得非常近。

And the village idiot and Einstein are like very close to each other.

Speaker 0

也许在某种抽象意义上这仍然成立,但现实情况真的不是这样,对吧?

Like that maybe that's still true in some abstract sense or something, but it's it's not really what we're seeing, is it?

Speaker 0

我们看到的是,人类的能力范围其实很广,而且我们在不同任务上达到人类水平的时间和位置并不一致。

We're seeing, like, that it seems like the human range is pretty broad, and we don't hit the human range in the same place or at the same time for different tasks.

Speaker 0

用科马克·麦卡锡的风格写一首十四行诗。

Write a sonnet in the style of Cormac McCarthy or something.

Speaker 0

我不知道,我没什么创意,所以我也写不出来。

I don't know, I'm not very creative, so I couldn't do that.

Speaker 0

但这是一种相当高级的人类技能。

But that's a pretty high level human skill.

Speaker 0

甚至连模型也开始擅长一些受限写作了。

And even the model is starting to get good at stuff of constrained writing.

Speaker 0

比如写一页文字却不使用字母e。

There's this write a page without using the letter e or something.

Speaker 0

写一页关于x的内容,但不能使用字母e。

Write a page about x without using the letter e.

Speaker 0

我认为模型在这一点上可能已经超越人类,或接近超越人类。

I think the models might be superhuman or close to superhuman at that.
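
上面提到的受限写作任务("不使用字母e写一页")对应一个可以机械检验的约束;下面是这样一个检查器的简单示意。

The constrained-writing task mentioned above (a page without the letter "e") corresponds to a constraint that can be checked mechanically; here is a simple sketch of such a checker.

```python
def is_lipogram(text: str, banned_letter: str = "e") -> bool:
    """Return True if the text avoids the banned letter entirely."""
    return banned_letter.lower() not in text.lower()

print(is_lipogram("A fox trots across a hill at dawn."))  # True: no "e"
print(is_lipogram("The fox runs over the hill."))         # False
```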

Speaker 0

但当涉及到,你知道的,是的,我也不确定,证明一些简单的数学定理时,它们才刚刚开始入门。

But when it comes to, you know, yeah, I don't know, prove relatively simple mathematical theorems, like, they're they're just starting to do the beginning of it.

Speaker 0

它们有时会犯非常愚蠢的错误。

They make really dumb mistakes sometimes.

Speaker 0

而且它们完全缺乏广泛纠正错误或执行长期任务的能力。

And they really lack any kind of broad correcting your errors or doing some extended task.

Speaker 0

所以我不知道,结果发现智力并不是一个连续谱。

And so I don't know, it turns out that intelligence isn't a spectrum.

Speaker 0

存在许多不同的领域专长。

There are a bunch of different areas of domain expertise.

Speaker 0

有许多不同类型的技能,比如记忆是不一样的。

There are a bunch of different, like, kinds of skills, like memory is different.

Speaker 0

我的意思是,所有这些都形成于同一个整体中。

I mean, it's all formed in the blob.

Speaker 0

这并不复杂。

It's not complicated.

Speaker 0

但即使它确实存在于一个谱系上,这个谱系也非常宽广。

But to the extent it even is on the spectrum, the spectrum is also wide.

Speaker 0

如果你十年前问我,我完全不会这么认为。

If you asked me ten years ago, that's not what I would have expected at all.

Speaker 0

但我觉得,事情确实就是这样发展的。

But, I think that's very much the way it's turned out.

Speaker 1

天哪。

Oh, man.

Speaker 1

关于这一点,我有太多后续问题了。

I have so many questions just as follow-up on that.

Speaker 1

一个是,你是否预期,考虑到这些模型从海量互联网数据中获得的训练分布,与人类从进化中获得的相比,它们所展现出的技能集合只会略微重叠?

One is, do you expect that given the distribution of training that these models get from massive amounts of internet data versus what humans got from evolution, that the repertoire of skills this elicits will be just barely overlapping?

Speaker 1

它们会像同心圆一样。

It will be like concentric circles.

Speaker 1

你怎么看待这些差异是否重要?

How do you think about do those matter?

Speaker 1

它是

Is it

Speaker 0

显然,两者之间有大量重叠。

Clearly, there's large there's certainly a large amount of overlap.

Speaker 0

对吧?

Right?

Speaker 0

因为这些模型有很多商业应用,而其中许多应用都是帮助人类更高效地完成各种任务。

Because a lot of the you know, like, these models have have business applications, and many of their business applications are doing things that, you know, are help helping humans to be more effective at things.

Speaker 0

所以重叠的部分相当大。

So the overlap is quite large.

Speaker 0

而且如果你想想人类以文本形式放到互联网上的所有活动,这已经覆盖了其中很大一部分。

And if you think of all the activity that humans put on the Internet in text, that covers a lot of it.

Speaker 0

但它可能没有涵盖某些方面。

But it probably doesn't cover some things.

Speaker 0

比如,我认为这些模型在某种程度上确实学到了对物理世界的理解,但它们显然没有学会如何在现实中实际移动。

Like the models, I think they do learn a physical model of the world to some extent, but they certainly don't learn how to actually move around in the world.

Speaker 0

不过,也许这很容易通过微调来改进。

Again, maybe that's easy to fine tune.

Speaker 0

但我认为,确实有一些人类具备而模型没有学到的能力。

But I, you know, I think so I think there are some things that the models don't learn that humans do.

Speaker 0

然后我认为,这些模型学会了比如流利地使用 base 64 编码。

And then I think, you know, the models learn, for example, to speak fluent base 64.

Speaker 0

我不知道你怎么样,但我从来没学过这个。

I don't know about you, but I never learned that.
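
作为参照,这里说的"流利使用base64"指的就是下面这种编码;模型在网络数据中大量见过它,因此学会了直接读写。

For reference, "speaking fluent base64" refers to the encoding shown below; models see a great deal of it in web data and so learn to read and write it directly.

```python
import base64

text = "The models just want to learn."
encoded = base64.b64encode(text.encode()).decode()
print(encoded)  # VGhlIG1vZGVscyBqdXN0IHdhbnQgdG8gbGVhcm4u
print(base64.b64decode(encoded).decode())  # round-trips to the original
```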

Speaker 1

对。

Right.

Speaker 1

你认为这些模型在许多其他相关任务上仍低于人类水平的情况下,有多可能在经济价值高的任务上长期超越人类,从而阻止像智能爆炸之类的事情发生?

How likely do you think it is that these models will be superhuman for many years at economically valuable tasks while they are still below humans in many other relevant tasks that prevents, like, an intelligence explosion or something?

Speaker 0

我觉得这类事情真的很难判断。

I think this kind of stuff is, like, really hard to know.

Speaker 0

所以我要先说明一下,就像之前说的,基本的扩展规律你还能大致预测。

So I'll give I'll give that caveat that, like, again, the basic scaling laws you can kind of predict.

Speaker 0

但更细致的这些内容——我们真正想知道的是这一切将如何发展——就难得多。

And then this more granular stuff, which we really wanna know to know how this all is gonna go, is much harder to know.

Speaker 0

但我的猜测是,扩展规律将继续下去,当然前提是人们不会因为安全或监管原因而放慢脚步。

But my guess would be the scaling laws are gonna continue, again subject to do people slow down for safety or for regulatory reasons.

Speaker 0

但让我们先放下这些,假设我们有经济能力继续扩大规模。

But let's just put all that aside and say, we have the economic capability to keep scaling.

Speaker 0

如果我们这么做,会发生什么?

If we did that, what would happen?

Speaker 0

我认为我的观点是,我们会全面持续进步,我看不到任何领域模型表现得特别差或尚未开始取得进展。

And I think my view is we're gonna keep getting better across the board, and I don't see any area where the models are super, super weak or not starting to make progress.

Speaker 0

过去数学和编程确实如此,但我觉得在过去六个月里,2023年的模型相比2022年的模型已经开始掌握这些能力。

That used to be true of math and programming, but I think over the last six months, you know, the 2023 generation of models compared to the 2022 generation has started to learn that.

Speaker 0

可能还有一些我们尚未察觉的更细微的问题。

There may be more subtle things we don't know.

Speaker 0

因此,我倾向于认为,即使并不完全均衡,潮水上涨也会让所有船只都浮起来。

And so I kind of suspect, even if it isn't quite even, that the rising tide will lift all the boats.

Speaker 1

这包括你之前提到的,如果任务持续时间较长,模型会失去思路或无法顺利执行一系列操作,对吧?

Does that include the thing you were mentioning earlier where if there's an extended task, it kind of loses its train of thought, or its ability to just, like, execute a series of actions?

Speaker 0

我认为这取决于是否通过强化学习训练,让模型能够完成更长期的任务。

I think that that's gonna depend on things like RL training to have the model do longer horizon tasks.

Speaker 0

我不认为这需要大量额外的计算资源。

I don't expect that to require a substantial amount of additional compute.

Speaker 0

我认为这可能是由于对强化学习的思考方式有误,低估了模型自身已经学到的东西。

I think that that was probably an artifact of, yeah, kind of thinking about RL in the wrong way and underestimating how much the model had learned on its own.

Speaker 0

关于我们是否会在某些领域达到超人水平,而在其他领域则不会,你怎么看?

In terms of, are we gonna be superhuman in some areas and not others?

Speaker 0

我认为这很复杂。

I think it's complicated.

Speaker 0

我可以想象,我们可能在某些领域无法达到超人水平,比如那些涉及物理世界具身化的问题。

I could imagine that we won't be superhuman in some areas because, for example, they involve embodiment in the physical world.

Speaker 0

那么接下来会发生什么?

And then it's like, what happens?

Speaker 0

AI会帮助我们训练出更快的AI,而这些更快的AI反过来解决这些问题吗?

Do the AIs help us train faster AIs, and those faster AIs wrap around and solve that?

Speaker 0

你真的不需要物理世界吗?

Do you not need the physical world?

Speaker 0

这取决于你指的是什么。

It depends what you mean.

Speaker 0

我们是否担心对齐灾难?

Are we worried about an alignment disaster?

Speaker 0

我们是否担心滥用,比如制造大规模杀伤性武器?

Are we worried about misuse, like making weapons of mass destruction?

Speaker 0

我们是否担心人工智能取代人类进行研究?

Are we worried about the AI taking over research from humans?

Speaker 0

我们是否担心它达到某种经济生产力阈值,从而能够完成普通人所能做的各种任务?我认为这些不同的阈值有不同的答案。

Are we worried about it reaching some threshold of economic productivity where it can do what the average human can do? These different thresholds, I think, have different answers.

Speaker 0

尽管我怀疑它们都会在几年内相继出现。

Although I suspect they will all come within a few years.

Speaker 1

让我问问关于这些阈值的问题。

Let me ask about those thresholds.

Speaker 1

如果Claude是Anthropic的一名员工,它的薪资应该值多少?

So if Claude was an employee at Anthropic, what salary would it be worth?

Speaker 1

它到底在多大程度上显著加速了AI的发展?

What is it, like, meaningfully speeding up AI progress?

Speaker 0

在我看来,它在大多数领域就像个实习生,但在某些特定领域表现得更好。

It feels to me like an intern in most areas, but then there are some specific areas where it's better than that.

Speaker 0

再说一次,我认为比较困难的原因之一是,它的形态和人类并不相同,对吧?

Again, I think one thing that makes the comparison hard is like the form factor is kind of like not the same as a human, right?

Speaker 0

比如,如果你像这些聊天机器人一样行事,我们其实不会——我的意思是,也许我们可以进行这样的对话。

Like, you know, if you were to behave like one of these chatbots, like we wouldn't really I mean, I guess we could have this conversation.

Speaker 0

它们的设计初衷更倾向于回答单个或少数几个问题。

It's like, but, you know, they're they're not really they're more designed to answer single or a few questions.

Speaker 0

对吧?

Right?

Speaker 0

而且,你知道,它们没有拥有长期过往经验的概念。

And and like, you know, they don't have the concept of having a long life of prior experience.

Speaker 0

对吧?

Right?

Speaker 0

我们这里谈论的是,我过去经历过的一些事情。

We're talking here about, you know, things that that I've experienced in the past.

Speaker 0

对吧?

Right?

Speaker 0

而聊天机器人并没有这些。

And chatbots don't don't have that.

Speaker 0

所以有很多东西是缺失的,因此很难进行比较。

And so there's there's all kinds of stuff missing, and so it's hard to make a comparison.

Speaker 0

但我不确定。

But I don't know.

Speaker 0

它们在某些方面像实习生,但在某些领域又表现得特别突出,可能比在座的任何人都更出色。

They feel like interns in some areas, and then they have areas where they spike and are really savants, where they may be better than anyone here.

Speaker 1

但关于智能爆炸的整体图景,我的前一位嘉宾卡尔·舒尔曼有一个非常详细的模型,作为一个真正会看到这种情况发生的人,你觉得从实习生到初级软件工程师的转变合理吗?

But does the overall picture of something like an intelligence explosion, you know, my former guest was Carl Shulman, and he has this, like, very detailed model of it, does that, as somebody who would actually see that happening, make sense to you, as they go from interns to entry level software engineers?

Speaker 1

这些初级软件工程师的生产力会提升,你的

Those entry level software engineers increase in productivity, and your

Speaker 0

我认为人工智能系统会变得更高效,首先它们会提升人类的生产力,然后逐渐达到与人类相当的水平,最终在某种意义上成为科学进步的主要推动力,这个过程在某个时刻会发生。

I think the idea that the AI systems become more productive, and first they speed up the productivity of humans, then they kind of equal the productivity of humans, and then they're, in some meaningful sense, the main contributor to scientific progress, that that happens at some point.

Speaker 0

我觉得这个基本逻辑对我来说是合理的,尽管我怀疑当我们深入细节时,实际情况可能会和我们预期的有点奇怪和不同。

I think that basic logic seems likely to me, although I have a suspicion that when we actually go into the details, it's gonna be kinda like weird and different than we expect.

Speaker 0

所有这些详细模型都像是,我们在思考错误的东西,或者我们对一件事是对的,但对其他十件事都错了。

That all the detailed models are kind of, you know, we're thinking about the wrong things, or we're right about one thing and then are wrong about 10 other things.

Speaker 0

所以我不确定。

And and so I I don't know.

Speaker 0

我觉得我们最终可能会进入一个比预期更奇怪的世界。

I think we might end up in, like, a weirder world than we expect.

Speaker 1

当你把所有这些综合起来,你对人类水平人工智能出现时间的估计是什么样的?

When you add all this together, like your estimate of when we get something kind of human level, what does that look like?

Speaker 0

我的意思是,这同样取决于阈值。

I mean, again, it depends on the thresholds.

Speaker 0

是的。

Yeah.

Speaker 0

你知道,从某人观察这些模型的角度来看,即使你和它聊上一个小时,它也基本上就像一个受过良好教育的人类。

You know, in terms of someone looks at these, the model, and you know, even if you talk to it for an hour or so, it's basically like a generally well educated human.

Speaker 0

我认为,这可能并不遥远。

That could be not very far away at all, I think.

Speaker 0

比如,这可能在两三年内就会发生。

Like that could happen in two or three years.

Speaker 0

再说一次,我认为主要可能阻止它的因素,是我们触及某些安全阈值;我们内部有针对安全阈值之类的测试。

Like if I look at it, again, I think the main thing that would stop it would be if we hit certain safety thresholds, and we have internal tests for those and stuff like that.

Speaker 0

所以,如果一家公司或整个行业决定放慢步伐,或者我们能够促使政府出台一些限制措施,以安全为由减缓进展速度,那才是它不会发生的主要原因。

So if a company or the industry decides to slow down or we're able to get the government institute restrictions that kind of, you know, that moderate the rate of progress for safety reasons, that would be the main reason it wouldn't happen.

Speaker 0

但如果你只看规模化所需的后勤和经济能力,我认为我们离那个目标其实并不远。

But if you just look at the logistical and economic ability to scale, I don't think we're very far at all from that.

Speaker 0

现在,这可能并不是模型具有存在性危险的临界点。

Now that that may not be the threshold where the models are existentially dangerous.

Speaker 0

事实上,我怀疑它还远未达到那个程度。

In fact, I suspect it's not not quite there yet.

Speaker 0

它可能也不是模型能够接管大部分人工智能研究的临界点。

It may not be the threshold where the models can take over most AI research.

Speaker 0

它可能也不是模型严重改变经济运行方式的临界点。

It may not be the threshold where the models seriously change how the economy works.

Speaker 0

我认为在此之后情况就变得模糊了,所有这些临界点可能都会在之后的不同时间点发生。

I think it gets a little murky after that, and all of those thresholds may happen at various times after that.

Speaker 0

但从基础技术能力来看,它听起来就像一个普遍受过良好教育的人类。

But I think in terms of the base technical capability of it, it kinda it kinda sounds like a reasonably generally educated human across the board.

Speaker 0

我认为这个目标可能已经非常接近了。

I think that could be quite close.

Speaker 1

为什么它可能通过了受过教育者水平的图灵测试,却仍无法参与或替代人类在经济中的作用呢?

Why would it be the case that it could sort of pass the Turing test for an educated person, but not be able to contribute to or substitute for human involvement in the economy?

Speaker 0

有几个原因。

A couple reasons.

Speaker 0

一个是,你知道,技能门槛还不够高。

One is just, you know, that the threshold of skill isn't high enough.

Speaker 0

对吧?

Right?

Speaker 0

比较优势。

Comparative advantage.

Speaker 0

就像,即使有个人在每项任务上都比普通人强,这也没关系。

It's like, it, like, doesn't matter that I have someone who's better than the average human at every task.

Speaker 0

比如在AI研究方面,我需要找到一种足够强大的工具,能显著加速那上千名最顶尖专家的工作。

Like, what I really need is like for AI research, like, you know, I need to basically find something that is strong enough to substantially accelerate the labor of the thousand experts who are best at it.

Speaker 0

因此,我们可能会达到一个点,这些系统的比较优势并不明显。

And so we might reach a point where the comparative advantage of these systems is not great.

Speaker 0

另一个可能是,我认为存在一些神秘的摩擦因素,这些在朴素的经济模型中看不到,但当你去接触客户时就会发现,比如你跟客户说:嘿,我有个很酷的聊天机器人。

Another thing that could be the case is that I think there are these kind of mysterious frictions that kind of don't show up in naive economic models, but you see it whenever you're like, when you go to a customer or something and you're like, hey, I have this cool chatbot.

Speaker 0

原则上,它可以完成你的客服机器人或公司这部分所做的所有工作。

In principle, it can do everything that your customer service bot does, or that this part of your company does.

Speaker 0

但问题是,我们该怎么把它整合进去呢?

But like, the actual friction of like, how do we slot it in?

Speaker 0

我们该怎么让它正常运行?

How do we make it work?

Speaker 0

这包括一方面,它在公司内部如何以人性化的方式运作,以及经济中如何克服这些摩擦。

That includes both kind of like, just the question of how it works in a human sense within the company, like how things happen in the economy and overcome frictions.

Speaker 0

另一方面,就是具体的工作流程是什么?

And also just like, what is the workflow?

Speaker 0

你实际上该如何与它互动?

How do you actually interact with it?

Speaker 0

说一个聊天机器人看起来像是在完成某项任务,或帮助人类完成任务,和说这个系统已经部署、十万人都在使用它,是完全不同的两回事。

It's very different to say, here's a chatbot that kind of looks like it's doing this task that, or helping the human to do some task, as it is to say like, okay, this thing is deployed and 100,000 people are using it.

Speaker 0

通常,现在很多人正急于部署这些系统。

Often, like right now, lots of folks are rushing to deploy these systems.

Speaker 0

但我认为,在许多情况下,他们并没有以最高效的方式使用这些系统。

But I I think in many cases, they're not using them in anywhere close to the most efficient way that they could.

Speaker 0

你知道,这并不是因为他们不够聪明,而是因为要理清这些事情需要时间。

You know, not because they're not smart, but because it takes time to work these things out.

Speaker 0

因此,我认为当事物变化如此迅速时,必然会存在各种摩擦。

And so I think when things are changing this fast, there are gonna be all of these frictions.

Speaker 0

而且我认为,这些是复杂的现实,无法被模型完全捕捉。

And I think, again, these are messy reality that doesn't quite get captured in the model.

Speaker 0

我不认为这改变了基本的图景。

I don't think it changes the basic picture.

Speaker 0

我不认为这改变了这样一个观点:我们正在积累一个雪球效应——模型帮助模型变得更好,加速人类能做的事情,最终,主要由模型来完成工作。

I don't think it changes the idea that we're building up this snowball of like, the models help the models get better and do what the humans and can accelerate what the humans do, and eventually, it's mostly the models doing the work.

Speaker 0

只要你足够宏观地看,这确实在发生。

Like, you zoom out far enough, that's happening.

Speaker 0

但我对任何精确的数学或指数预测持怀疑态度,即它将如何发展。

But I'm kind of skeptical of kind of any kind of precise mathematical or exponential prediction of how it's gonna be.

Speaker 0

我觉得一切都会很混乱,但我们知道这并不是一种比喻性的指数增长,而且它会迅速发生。

I think it's all gonna be a mess, but I think what we know is it's not a metaphorical exponential, and it's gonna happen fast.

Speaker 1

我们之前讨论过的那些不同的指数增长,最终会如何相互作用呢?

How do those different exponentials net out, which we've been talking about?

Speaker 1

一个是扩展定律本身是幂律,随着参数增加,边际损失逐渐递减,或者类似的情况。

So one was the scaling laws themselves are power laws with decaying marginal loss per parameter or something.

Speaker 1

另一个你提到的指数增长是,这些工具本身可以参与到人工智能研究的过程中,从而加速研究。

The other exponential you talked about is, well, these things can get involved in the process of AI research itself, speeding it up.

Speaker 1

所以这两种是某种程度上相互对立的指数增长。

So those two are sort of opposing exponentials.

Speaker 1

它们最终是呈现超线性增长,还是亚线性增长?

Does it net out to be super linear or sub linear?

Speaker 1

你还提到,智能的分布可能只是变得更广泛。

And also you mentioned, well, the distribution of intelligence might just be broader.

Speaker 1

所以当我们达到两三年后这个节点时,是不是就会突然之间,飞速发展?

So should we expect that after we get to this point in two to three years, it's like, voom, voom?

Speaker 1

比如,那看起来像什么?

Like, what does that look like?

Speaker 0

我的意思是,我觉得这非常不清楚。

It's I mean, I think it's very unclear.

Speaker 0

对吧?

Right?

Speaker 0

所以我们现在已经到了这样一个阶段:如果你看损失值,扩展规律已经开始出现弯曲。

So we're already at the point where if you look at the loss, the scaling laws are starting to bend.

Speaker 0

我的意思是,我们已经在多家公司发布的模型卡片中看到了这一点。

I mean, we've seen that in published model cards offered by multiple companies.

Speaker 0

所以这根本不是什么秘密。

So that's not a secret at all.

Speaker 0

但当它们开始弯曲时,每一丁点的熵——也就是准确预测——就变得更加重要了,对吧?

But as they start to bend, each little bit of entropy, right, of accurate prediction becomes more important, right?

Speaker 0

也许这些最后的一点点熵,就像爱因斯坦会写的物理论文那样,而不是其他物理学家会写的那种。

Maybe these last little bits of entropy are like, well, this is a physics paper as Einstein would have written it, as opposed to as some other physicist would have written it.

Speaker 0

因此,很难从这一点评估其重要性。

And so it's hard to assess significance from this.

Speaker 0

就实际性能而言,这些指标显然仍在相对线性地上升。

It certainly looks like in terms of practical performance, the metrics keep going up relatively linearly.

Speaker 0

不过它们本来就一直难以预测。

Although they were always somewhat unpredictable.

Speaker 0

所以,这一点很难看出来。

So so it's it's hard to see that.

Speaker 0

而我认为推动加速的最主要因素,是越来越多的资金涌入这个领域。

And then, I mean, the thing that I think is driving the most acceleration is just more and more money is going into the field.

Speaker 0

人们看到这里蕴含着巨大的经济价值。

Like, people are seeing that there's just a huge amount of, you know, of of economic value.

Speaker 0

因此,我预计在最大模型上的投入金额将增加约一百倍。

And so I expect the price, the amount of money spent on the largest models to go up by like a factor of 100 or something.

Speaker 0

再加上芯片变得更快、算法不断优化,因为现在有如此多的人在研究这个问题。

And for that then to be concatenated with the chips are getting faster, the algorithms are getting better because there's so many people working on this now.

Speaker 0

所以,我在这里并不是在做规范性陈述。

And so again, I'm not making a normative statement here.

Speaker 0

这是应该发生的事情。

This is what should happen.

Speaker 0

我甚至没有说这一定会发生,因为我认为这里涉及重要的安全和政府问题,我们正在积极研究这些问题。

I'm not even saying this necessarily will happen because I think there's important safety and government questions here, which we're very actively working on.

Speaker 0

我只是说,如果放任不管,经济就会朝这个方向发展。

I'm just saying like left to itself, this is what the economy is gonna do.

Speaker 1

这些问题我们稍后再谈。

We'll get to those questions in a second.

Speaker 1

但你怎么看待Anthropic对这一行业规模扩大的贡献?我的意思是,我们有一个观点认为,有了这些投资,我们可以在Anthropic专注于安全问题;还有另一种观点认为,你提升了整个领域的关注度。

But how do you think about the contribution of Anthropic to that increase in the scope of this industry? I mean, there's an argument that says, listen, with that investment, we can work on safety stuff at Anthropic, and another that says you're raising the salience of this field in general.

Speaker 0

是的。

Yeah.

Speaker 0

我的意思是,这全是成本和收益的问题。

I mean, it's all it's all costs and benefits.

Speaker 0

对吧?

Right?

Speaker 0

成本并不是零。

The costs are not zero.

Speaker 0

对吧?

Right?

Speaker 0

所以我认为,成熟地看待这些问题的方式是,不要否认存在任何成本,而是要思考这些成本是什么,收益又是什么。

So I think a mature way to think about these things is, you know, not not to deny that there are any costs, but to think about what the costs are and what the benefits are.

Speaker 0

我觉得我们在某种程度上是相对负责任的,因为去年底和今年初发生的那场重大加速,并不是我们引起的。

You know, I think I think we've been relatively responsible in the sense that, you know, the big acceleration that that happened late last year and and beginning of this year, like, we didn't cause that.

Speaker 0

我们并不是那个促成它的人。

We weren't the ones who did that.

Speaker 0

老实说,我认为如果你看看谷歌的反应,那可能比其他任何事情都重要十倍。

And honestly, I think if you look at the reaction of Google, that that might be 10 times more important than anything else.

Speaker 0

一旦事情发生,生态系统发生变化后,我们就做了很多事来保持在前沿。

And then kind of once it had happened, once the ecosystem had changed, then we did a lot of things to kind of stay on the frontier.

Speaker 0

所以我不知道,这就像其他任何问题,对吧?

So I don't know, it's like any other question, right?

Speaker 0

你试图做那些成本最高、成本最低、收益最大的事情。

It's like you're trying to do the things that have the biggest costs and that have the lowest costs and the biggest benefits.

Speaker 0

这会导致你在不同时期采取不同的策略。

And that causes you to have different strategies at different times.

Speaker 1

我们在讨论智能相关话题时,我有个问题:是的。

One question I had for you while we were talking about the intelligence stuff was Yes.

Speaker 1

听我说,作为一名科学家,你怎么看待这些系统几乎记住了整个人类知识库这一事实?

Listen, as a scientist yourself, is it what do you make of the fact that these things have basically the entire corpus of human knowledge memorized?

Speaker 1

据我所知,它们还从未做出过任何能导致新发现的全新关联。

And as far as I'm aware, haven't been able to make a single new connection that has led to a discovery.

Speaker 1

而如果一个中等智力的人拥有这么多记忆,他早就注意到:这个东西会导致那个症状。

Whereas if even a moderately intelligent person had this much stuff memorized, they'd notice, oh, this thing causes this symptom.

Speaker 1

另一件事也会导致这个症状。

This other thing also causes this symptom.

Speaker 1

这里就有一个医学疗法。

There's a medical cure right here.

Speaker 1

我们难道不该期待这类事情吗?

Shouldn't we be expecting that kind of stuff?

Speaker 0

我不确定。

I'm not sure.

Speaker 0

我的意思是,我觉得,我不知道,像‘发现’、‘创造力’这样的词,我学到的一个教训是,在庞大的计算洪流中,这些想法往往变得模糊、难以捉摸,难以追踪。

I mean, I think, I don't know, these words discovery, creativity, it's one of the lessons I've learned is that in kind of the big blob of compute, often these these ideas often end up being kind of fuzzy and elusive and hard to track down.

Speaker 0

但我认为这里确实有些东西,那就是我认为这些模型确实展现出了一些能力,比如用科马克·麦卡锡或芭比的风格写一首十四行诗。

But I think there is something here, which is, I think the models do display, again, the kind of thing like write a sonnet in the style of Cormac McCarthy or Barbie or something.

Speaker 0

这确实体现了一定的创造力。

There is some creativity to that.

Speaker 0

而且我认为它们确实会建立起普通人也会做出的新关联。

And I think they do draw new connections of the kind that an ordinary person would draw.

Speaker 0

我同意你的看法,目前确实还没有出现什么重大的科学发现。

I agree with you that there haven't been any kind of like, I don't know, like, I would say, like big scientific discoveries.

Speaker 0

我认为这是因为模型的能力还不够高。

I think that's a mix of like, just the model skill level is not high enough yet.

Speaker 0

对吧?

Right?

Speaker 0

上周我参加了一个播客,主持人说,我不知道。

Like I was on a podcast last week where where the host said, I don't know.

Speaker 0

我经常使用这些模型。

I play with these models.

Speaker 0

它们有点平庸。

They're kind of mid.

Speaker 0

对吧?

Right?

Speaker 0

它们能拿到B或者B-这样的成绩。

Like, they get, you know, they get a b or a b minus or something.

Speaker 0

而且我认为,随着规模的扩大,这种情况将会改变。

And and that that, I think, is gonna change with the with the scaling.

Speaker 0

我认为有一个有趣的观点,那就是模型拥有的优势是它们知道的东西比我们多得多。

I do think there's an interesting point about, well, the models have an advantage, which is they know a lot more than us.

Speaker 0

你知道,即使它们的技能水平还不够高,它们难道不应该已经拥有这种优势吗?

You know, like, should should they have an advantage already even if even if they their skill level isn't isn't isn't quite high?

Speaker 0

也许你正是想表达这个意思。

Maybe that's kinda what you're getting at.

Speaker 0

我对这个问题其实没有明确的答案。

I don't really have an answer to that.

Speaker 0

我的意思是,显然在记忆、事实和建立联系方面,模型已经领先了。

I mean, it seems certainly like memorization and facts and drawing connections is an area where the models are ahead.

Speaker 0

我认为,你可能需要这些联系,也需要相当高的技能水平。

And I do think maybe you need those connections and you need a fairly high level of skill.

Speaker 0

我认为,特别是在生物学领域,无论好坏,生物学的复杂性使得当前的模型已经掌握了大量知识。

I do think, particularly in the area of biology, for better and for worse, the complexity of biology is such that the current models know a lot of things right now.

Speaker 0

而这就是你做出发现和建立联系所需要的。

And that's what you need to make discoveries and draw connections.

Speaker 0

这不像物理学,你得思考并推导出公式。

It's not like physics where you need to, you know, you need to think and come up with a formula.

Speaker 0

在生物学中,你需要了解很多知识,没错。

In biology, you need to know a lot of things. Right.

Speaker 0

因此,我认为这些模型掌握了很多知识,但它们的技能水平还不足以将这些知识整合起来。

And so I do think the models know a lot of things and they have a skill level that's not quite high enough to put them together.

Speaker 0

我认为它们正处在能够将这些要素整合起来的临界点上。

And I think they are they are just on the cusp of being able to put these things together.

Speaker 1

关于这一点,上周你在参议院作证时提到,这些模型距离可能引发大规模生物恐怖袭击还有两到三年的时间。

On that point, last week in your Senate testimony, you said that these models are two to three years away from potentially enabling large scale bioterrorism attacks or something

Speaker 0

比如可以

like Can

Speaker 1

你能更具体地说明一下吗?当然,不能提供可能被滥用的信息。

you make that more concrete without obviously giving the kind of information that would help?

Speaker 1

但这是像一次性获取如何武器化某种东西的方法吗?

But is it like one shotting how to weaponize something?

Speaker 1

你需要微调一个开源模型吗?

Do you need to fine tune an open source model?

Speaker 1

也就是说,这实际上会是怎样的?

Like, what would that actually

Speaker 0

我认为有必要澄清这一点,因为我们曾在参议院听证会上发布过一篇博客文章,我觉得很多人没理解我们的观点,或者没理解我们做了什么。

I think it'd be good to clarify this because we did a blog post and the Senate testimony and, like, I think various people kinda didn't understand the point or didn't understand what we'd done.

Speaker 0

所以我认为,今天——当然在我们的模型中,我们努力防止这种情况,但总会有越狱漏洞。

So I think today, and of course in our models we try and prevent this, but there's always jailbreaks.

Speaker 0

你可以向模型询问各种关于生物学的问题,让它们说出各种可怕的内容。

You can ask the models all kinds of things about biology and get them to say all kinds of scary things.

Speaker 0

是的。

Yeah.

Speaker 0

但这些可怕的内容往往是你在谷歌上也能查到的,因此我对此并不特别担心。

But often those scary things are things that you could Google, and I'm therefore not particularly worried about that.

Speaker 0

我认为这反而阻碍了人们看清真正的危险——有人只是说,我问了这个模型一些关于天花的问题,它就给出了答案。

I think it's actually an impediment to seeing the real danger, where someone just says, oh, I asked this model to tell me some things about smallpox, and it will.

Speaker 0

这其实并不是我真正担心的事情。

That is actually kind of not what I'm worried about.

Speaker 0

所以我们花了大约六个月的时间,与世界上在‘生物攻击是如何发生的’这一问题上最具专业知识的一些人合作。

So we spent about six months working with basically some of the folks who are the most expert in the world on how do biological attacks happen.

Speaker 0

你知道,要实施这样的攻击,你需要什么?我们又该如何防范这样的攻击?

You know, what would you need to conduct such an attack, and how do we defend against such an attack?

Speaker 0

他们非常深入地研究了整个流程:如果我要做一件坏事,这并不是一次性的操作,而是一个漫长的过程,包含许多步骤。

They worked very intensively on just the entire workflow of, if I were trying to do a bad thing, it's not one shot, it's a long process, there are many steps to it.

Speaker 0

这不仅仅是‘我问了模型一页信息’那么简单。

It's not just like I asked the model for this one page of information.

Speaker 0

而且,不深入细节的话,我在参议院证词中提到的是,有些步骤你可以在谷歌上直接找到信息。

And again, without going into any detail, the thing I said in the Senate testimony is like, there are some steps where you can just get information on Google.

Speaker 0

有些步骤是我所说的‘缺失’的。

There are some steps that are what I'd call missing.

Speaker 0

这些信息分散在众多教科书中,或者根本不在任何教科书里。

They're scattered across a bunch of textbooks, or they're not in any textbook.

Speaker 0

它们属于隐性知识,而不是显性知识。

They're kind of implicit knowledge, and they're not really like, they're not explicit knowledge.

Speaker 0

它们更像是:我得执行这个实验流程,但如果我做错了怎么办?

They're more like, I have to do this lab protocol, and like, what if I get it wrong?

Speaker 0

哦,如果发生了这种情况,我的温度太低了。

Oh, if this happens, my temperature was too low.

Speaker 0

如果出现那种情况,我就需要加入更多这种特定的试剂。

If that happened, I needed to add more of this particular reagent.

Speaker 0

我们发现,这些关键的缺失环节,目前模型还做不到。

What we found is that for the most part, those key missing pieces, the models can't do them yet.

Speaker 0

但我们发现,有时它们确实可以。

But we found that sometimes they can.

Speaker 0

当它们能做到时,有时仍然会幻觉,而这正是让我们保持警惕的地方。

And when they can, sometimes they still hallucinate, which is the thing that's kind of keeping us safe.

Speaker 0

但我们看到了足够多的迹象,表明这些模型能够很好地完成这些关键任务。

But we saw enough signs of the models doing those key things well.

Speaker 0

如果我们观察当前最先进的模型,并回溯到之前的模型,分析这一趋势,可以看出再过两三年,我们将会面临一个真正的问题。

And if we look at state of the art models and go backwards to previous models, we look at the trend, it shows every sign that two or three years from now, we're gonna have a real problem.

Speaker 1

是的。

Yeah.

Speaker 1

尤其是你提到的那一点,在对数尺度上,从每100次中成功一次,变成每10次中成功一次,确实如此。

Especially the thing you mentioned, though, on the log scale, go from, like, one in 100 times it gets it right to one in 10 to Exactly.

Speaker 0

所以,你知道,我这一生中见过很多这样的‘grok’时刻。

So, you know, I've seen many of these, like, groks in my life.

Speaker 0

对吧?

Right?

Speaker 0

我亲眼见证了GPT-3学会算术、GPT-2勉强超过随机水平地完成回归任务,以及当我们使用Claude时,在这些关于有益、诚实、无害的测试上取得的进步。

I was there when I watched when GPT-three learned to do arithmetic, when GPT-two learned to do regression a little bit above chance, when, you know, when we got, you know, with Claude and we got better on like, you know, all these tests of helpful, honest, harmless.

Speaker 0

我见过很多这样的‘grok’时刻。

I've seen a lot of groks.

Speaker 0

这并不是让我感到兴奋的一种进展,但我相信它正在发生。

This is this is unfortunately not one that I'm excited about, but I believe it's happening.
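
To make the log-scale trend he is describing concrete, here is a toy sketch. The generation labels and success rates below are invented for illustration only, not Anthropic's evaluations; the point is just that once per-generation success rates sit on a log-linear trend, the jump from "one in 100" to near-reliability is a short extrapolation.

```python
import math

# Invented per-generation success rates on some hard subtask (illustrative only).
success = {"gen 1": 0.01, "gen 2": 0.03, "gen 3": 0.10}

# Fit a straight line to log10(success rate) versus generation index.
xs = list(range(len(success)))
ys = [math.log10(p) for p in success.values()]
n = len(xs)
slope = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / (
    n * sum(x * x for x in xs) - sum(xs) ** 2
)
intercept = (sum(ys) - slope * sum(xs)) / n

# Extrapolate two generations ahead (the "two or three years" worry).
for step in range(len(xs), len(xs) + 2):
    projected = 10 ** (intercept + slope * step)
    print(f"gen {step + 1}: projected success rate ~ {min(projected, 1.0):.0%}")
```

With these made-up numbers the fit projects roughly 31% and then near-certain success two generations out, which is the shape of the argument, not a forecast.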

Speaker 1

所以有人可能会说:你不是曾作为合著者参与了OpenAI发布的那篇关于GPT-2的文章吗?当时他们说,因为担心这个模型会被用于不良用途,所以不发布权重或详细信息。

So somebody might say, listen, you were a coauthor on this post that OpenAI released, GPT two, where they said, you know, we're not gonna release the weights or the details here because we're worried that this model will be used for something, you know, bad.

Speaker 1

现在回头看,认为GPT-2能造成什么实际危害的想法简直可笑。

And looking back on it, now it's laughable to think that GPT-two could have done anything bad.

Speaker 1

我们是不是太过担忧了?

Are we just, like, way too worried?

Speaker 1

这种担忧对……来说根本没有道理。

This is a concern that doesn't make sense for

Speaker 0

这确实很有趣。

It is interesting.

Speaker 0

回头看那篇帖子的实际内容可能很有价值。

It might be worth looking back at the actual text of that post.

Speaker 0

我不太记得具体内容了,但它应该还挂在互联网上。

So I don't remember it exactly, but, you know, it's still up on the Internet.

Speaker 0

它提到类似这样的内容:我们选择不发布权重,是因为担心被滥用。

It says something like, you know, we're choosing not to release the weights because of concerns about misuse.

Speaker 0

但它也说,这是一次实验。

But it also said, this is an experiment.

Speaker 0

我们不确定此时是否有必要或正确这样做,但我们希望确立一种认真思考这些问题的规范。

We're not sure if this is necessary or the right thing to do at this time, but we'd like to establish a norm of thinking carefully about these things.

Speaker 0

你可以把它看作是20世纪70年代的阿西洛马会议,当时人们刚刚开始研究重组DNA技术。

You could think of it a little like the Asilomar Conference in the 1970s, right, where it's like they were just figuring out recombinant DNA.

Speaker 0

当时并不一定意味着重组DNA会被用来做真正有害的事情。

You know, it was not necessarily the case that someone could do something really bad with recombinant DNA.

Speaker 0

只是这些可能性开始变得清晰起来。

It's just the possibilities were starting to become clear.

Speaker 0

至少,这些话体现了正确的态度。

Those words, at least, were the right attitude.

Speaker 0

现在我认为还有一件事,人们不只是评判这篇帖子,他们还在评判整个组织。

Now I think there's a separate thing that, like, you know, people don't just judge the post, they judge the organization.

Speaker 0

这是一个制造大量炒作的组织,还是一个有公信力的组织,或者类似的情况?

Is this an organization that produces a lot of hype or that has credibility or something like that?

Speaker 0

所以我认为这在一定程度上产生了影响。

And so I think that had some effect on it.

Speaker 0

我想你也可以问,人们是否会不可避免地将其解读为:你无法传达任何比‘这东西很危险’更复杂的信息。

I guess you could also ask, is it inevitable that people would just interpret it as like, you can't get across any message more complicated than this thing right here is dangerous.

Speaker 0

你可以就这些问题展开讨论,但我想,当时在我和参与此事的其他人脑海中最基本的想法是,正如帖子中所体现的:我们其实并不确定。

So you can argue about those, but I think the basic thing that was in my head and the heads of others who were involved in that, and, you know, I think what is evident in the post, is like, we actually don't know.

Speaker 0

我们对什么是危险的、什么不是危险的,误差范围相当大。

We have pretty wide error bars on what's dangerous and what's not.

Speaker 0

所以我们应该,嗯,希望建立一种谨慎行事的规范。

So we should, you know, like, we we want to establish a norm of being careful.

Speaker 0

我想顺便说一下,我们现在拥有的证据要多得多。

I I think, by the way, we have enormously more evidence.

Speaker 0

我们现在看到的这类‘grok’时刻也多得多了。

We've seen enormously more of these groks now.

Speaker 0

所以我们已经很好地调整了判断,但仍然存在不确定性。

And so we're well calibrated, but there's still uncertainty.

Speaker 0

对吧?

Right?

Speaker 0

在所有这些陈述中,我都说过,比如两三年后,我们可能会达到那个程度。

In all these statements, I've said, like, in two or three years, we might be there.

Speaker 0

对吧?

Right?

Speaker 0

这存在相当大的风险,我们不想冒这个险。

There's a substantial risk of it, and we don't wanna take that risk.

Speaker 0

但你知道,我不会说这是100%的。

But, you know, I wouldn't say it's it's a 100%.

Speaker 1

可能是五五开。

It could be fifty fifty.

Speaker 1

好的。

Okay.

Speaker 1

让我们谈谈网络安全,除了生物恐怖主义之外,这是Anthropic一直强调的另一个方面。

Let's talk about cybersecurity, which in addition to Bioterrorisk is another thing Anthropic has been emphasizing.

Speaker 1

你们是如何避免Claude微架构泄露的?

How have you avoided the Claude microarchitecture from leaking?

Speaker 1

因为你知道,你们的竞争对手在这种安全性方面做得不太成功。

Because as you know, your competitors have been less successful at this kind of security.

Speaker 0

我不能评论其他人的安全措施。

Can't comment on anyone else's security.

Speaker 0

我不清楚里面发生了什么。

Don't know what's going on in there.

Speaker 0

我们做的一件事是,有一些架构上的创新,能让训练更加高效。

A thing that we have done is, so there are these architectural innovations, right, that make training more efficient.

Speaker 0

我们称它们为计算倍增器,因为它们的效果相当于提升了计算能力。

We call them compute multipliers because they're the equivalent of improving, they're like having more compute.

Speaker 0

我们的计算倍增器,再说一遍,我不想透露太多细节,因为这可能让对手找到对策,所以我们只让真正需要知道的人了解每个计算倍增器的具体内容。

Our compute multipliers, again, I don't want to say too much about it because it could allow an adversary to counteract our measures, but we limit the number of people who are aware of a given compute multiplier to those who need to know about it.

Speaker 0

因此,只有极少数人掌握所有这些机密。

And so there's a very small number of people who could leak all of these secrets.

Speaker 0

能泄露其中某一项机密的人则相对更多一些。

There's a larger number of people who could leak one of them.

Speaker 0

但这种做法正是情报机构或地下抵抗组织常用的隔离策略。

But this is the standard compartmentalization strategy that's used in the intelligence community or resistance cells or whatever.
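
A minimal sketch of the need-to-know idea he describes, with hypothetical people and secret names; nothing here reflects Anthropic's actual systems. Each compute multiplier is its own compartment, so any single leaker exposes at most the compartments they were cleared for, never the full set.

```python
# Hypothetical compartments (invented names): secret -> people cleared for it.
compartments = {
    "multiplier_A": {"alice", "bob"},
    "multiplier_B": {"bob", "carol"},
    "multiplier_C": {"dana"},
}

def exposure_if_leaker(person: str) -> set[str]:
    """Secrets exposed if this one person leaks everything they know."""
    return {s for s, cleared in compartments.items() if person in cleared}

def can_access(person: str, secret: str) -> bool:
    """Need-to-know check: deny unless explicitly cleared for this secret."""
    return person in compartments.get(secret, set())

people = {p for cleared in compartments.values() for p in cleared}
worst = max(people, key=lambda p: len(exposure_if_leaker(p)))

# Without compartments, any leaker exposes all three secrets; here the worst
# single leaker exposes only two, and nobody can leak the full set alone.
print(worst, sorted(exposure_if_leaker(worst)))
print(can_access("alice", "multiplier_B"))  # False: no need to know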

Speaker 0

在过去几个月里,我们已经实施了这些措施。

We've, over the last few months, we've implemented these measures.

Speaker 0

所以我不敢妄言说‘这种事情永远不会发生在我们身上’,那样会不吉利。

So I don't wanna jinx anything by saying, oh, this could never happen to us.

Speaker 0

但我认为,这种事情发生的可能性会小得多。

But I think I think it would be harder for it to happen.

Speaker 0

我不想再深入细节了。

I don't wanna go into any more detail.

Speaker 0

顺便说一下,我鼓励其他所有公司也这么做。

And, you know, by the way, I'd encourage all the other companies to do this as well.

Speaker 0

尽管竞争对手的架构泄露在狭义上对Anthropic有点帮助,但这对任何人都没有好处,

As much as, like, competitors' architectures leaking is narrowly helpful to Anthropic, it's not good for anyone in

Speaker 1

从长远来看。

the long run.

Speaker 1

对吧?

Right?

Speaker 1

所以这方面的安全非常重要。

So security around this stuff is really important.

Speaker 1

即使你们有所有这些安全措施,以你们目前的安全水平,能否阻止一个志在必得的国家级行为者获取Claude 2的权重?

Even with all the security you have, could you, with your current security, prevent a dedicated state level actor from getting the Claude 2 weights?

Speaker 0

这取决于他们有多专注,我会这么说。

It depends how dedicated is what I would say.

Speaker 0

我们的安全主管以前负责Chrome的安全工作,那是一个使用极其广泛、也经常遭受攻击的应用程序。他喜欢从成功攻击Anthropic需要多少成本的角度来思考这个问题。

Our head of security, who used to work on security for Chrome, which is a very widely used and attacked application, likes to think about it in terms of how much it would cost to attack Anthropic successfully.

Speaker 0

再说一遍,我不想详细说明我认为攻击需要多少成本,那样等于在给他人提供线索。

Again, I don't want to go into super detail of how much I think it would cost to attack, because that's kind of inviting people.

Speaker 0

但我们的目标之一是,攻击Anthropic的成本要高于自己训练一个模型的成本,当然这并不能完全保证安全,因为你还需要人才。

But like, one of our goals is that it costs more to attack Anthropic than than it costs to just train your own model, which doesn't guarantee things because, you know, of course, you need the talent as well.

Speaker 0

所以你可能还是会去尝试,但你知道,攻击本身是有风险的。

So you might still, but, you know, attacks have risks.

Speaker 0

还有外交上的代价。

The diplomatic costs.

Speaker 0

而且,你知道,攻击会消耗国家行为体可能拥有的极其有限的资源。

And, you know, they use up the very sparse resources that nation state actors might have in order to do the attacks.

Speaker 0

顺便说一句,我们还没达到完美,但以我们公司的规模来看,我认为已经达到了非常高的安全标准。

So we're not there yet, by the way, but I think we're to a very high standard compared to the size of company that we are.

Speaker 0

如果你看看大多数150人规模公司的安全措施,我觉得根本没法相提并论。

Like, I think if you look at security for most 150 person companies, like, I think there's just no comparison.

Speaker 0

但如果一个国家行为体将窃取我们的模型权重作为最高优先级,我们能抵抗得住吗?

But could we resist if if it was a state actor's top priority to steal our model weights?

Speaker 0

不能。

No.

Speaker 0

他们会成功的。

They would they would succeed.

Speaker 1

这种情况会持续多久?

How long does that stay true?

Speaker 1

因为随着时间推移,模型的价值会不断攀升。

Because at some point, the value keeps increasing and increasing.

Speaker 1

这个问题的另一部分是:训练Claude 3或Claude 2的方法究竟算什么类型的秘密?

And another part of this question is, what kind of a secret is how to train Claude 3 or Claude 2?

Speaker 1

比如核武器,我们身边到处都是间谍。

Is it with nuclear weapons, for example, we have lots of spies.

Speaker 1

你只需把设计图带出去,那就是内爆装置,而这正是你需要的一切。

You just take a blueprint across and that's the implosion device, and that's what you need.

Speaker 1

在这里,它是否更像你提到的生物学中的那种隐性知识?

Here, is it more tacit like the thing you were talking about biology?

Speaker 1

你需要了解这些试剂是如何工作的。

You need to know how these reagents work.

Speaker 1

是仅仅像你拿到了蓝图、微架构和超参数一样吗?

Is it just like you got the blueprint, you got the microarchitecture and the hyperparameters?

Speaker 1

那你就可以开始了

There You're good to

Speaker 0

有些东西就像一个简单的公式,而其他一些则更复杂。

are some things that are like a one line equation, and there are other things that are more complicated.

Speaker 0

我认为最好的方式是进行隔离。

And I think compartmentalization is the best way to do it.

Speaker 0

限制知道某件事的人数。

Just limit the number of people who know about something.

Speaker 0

如果你是一家千人公司,而每个人都了解每一个秘密,那么我敢保证,你一定有内鬼。

If you're a thousand person company and everyone knows every secret, like, one, I guarantee you have some you have a leaker.

Speaker 0

第二,我敢保证你有个间谍,一个真正的间谍。

And two, I guarantee you have a spy, like a literal spy.

Speaker 1

好的。

Okay.

Speaker 1

我们来谈谈对齐问题。

Let's talk about alignment.

Speaker 1

我们来谈谈机制可解释性,这是那个……是的。

And let's talk about mechanistic interpretability, which is the branch Yes.

Speaker 1

你们正是专精于这一领域的。

Of which you, you guys specialize in.

Speaker 1

在回答这个问题时,你或许需要先解释一下什么是机制可解释性。

While you're answering this question, you might wanna explain what mechanistic interpretability is.

Speaker 1

但更广泛的问题是:从机制上讲,对齐到底是什么?

But just, the broader question is, mechanistically, what is alignment?

Speaker 1

是不是把模型锁定在一个仁慈的特质上?

Is it that you're locking in the model into a benevolent character?

Speaker 1

你们在禁用欺骗性电路和流程吗?

Are you disabling deceptive circuits and procedures?

Speaker 1

也就是说,当你对模型进行对齐时,具体发生了什么?

Like, what concretely is happening when you align a model?

Speaker 0

我认为,和大多数事情一样,当我们实际训练模型以实现对齐时,并不清楚模型内部发生了什么,对吧?

I think as with most things, you know, when we actually train a model to be aligned, we don't know what happens inside the model, right?

Speaker 0

有多种方式训练模型实现对齐,但我认为我们其实并不清楚其中发生了什么。

There are different ways of training it to be aligned, but I think we don't really know what happens.

Speaker 0

我的意思是,对于一些当前的方法,我认为所有涉及某种微调的当前方法,都有一个共同特点:我们可能担心的底层知识和能力并不会消失。

I mean, I think for some of the current methods, I think all of the current methods that involve some kind of fine tuning, of course, have the property that the underlying knowledge and abilities that we might be worried about don't disappear.

Speaker 0

只是,模型被训练成不去输出这些内容而已。

It's just, you know, the model is just taught not to output them.

Speaker 0

我不知道这是否是一个致命缺陷,或者这只是事情必须如此的方式。

I don't know if that's a fatal flaw or if, you know, or if that's just the way things have to be.

Speaker 0

我不清楚机制层面内部究竟发生了什么,而我认为这正是机制可解释性的核心目的——真正理解模型内部各个电路层面的运作。

I don't know what's going on inside mechanistically, and I think that's the whole point of mechanistic interpretability, to really understand what's going on inside the models at the level of individual circuits.

Speaker 0

最终问题解决了,这个解决方案会是什么样子?

Eventually, it's solved, what does the solution look like?

Speaker 1

在什么情况下,对于Claude 4,你做了机制可解释性分析,然后说:我满意了?

What is it the case where, for Claude 4, you do the mechanistic interpretability thing and you're like, I'm satisfied.

Speaker 1

它已经对齐了。

It's aligned.

Speaker 1

你看到了什么?

What is it that you've seen?

Speaker 0

是的,我认为我们目前还不知道。

Yeah, so I think we don't know that yet.

Speaker 0

我认为我们还不足以了解这一点。

I think we don't know enough to know that yet.

Speaker 0

我的意思是,我可以给你描述一下这个过程看起来是什么样,而不是最终结果是什么样。

I mean, I can give you a sketch for what the process looks like as opposed to what the final result looks like.

Speaker 0

所以我认为可验证性是这里的主要挑战,对吧?

So I think verifiability is a lot of the challenge here, right?

Speaker 0

我们有各种声称能够对齐AI系统的方法,并且在当前任务上确实取得了成功。

We have all these methods that purport to align AI systems and do succeed at doing so for today's tasks.

Speaker 0

但问题总是在于,如果你有一个更强大的模型,或者模型处于不同情境下,它还会对齐吗?

But then the question is always, if you had a more powerful model or if you had a model in a different situation, would it be aligned?

Speaker 0

因此,我认为,如果你有一个可以扫描模型并直接说‘我知道这个模型是对齐的’的预言机,这个问题就会容易得多。

And so I think this problem would be much easier if you had an Oracle that could just scan a model and say like, okay, I know this model is aligned.

Speaker 0

我知道它在每种情况下都会做什么。

I know what it'll do in every situation.

Speaker 0

那样的话,问题就会简单得多。

Then the problem would be much easier.

Speaker 0

我认为我们目前最接近这一点的是机制可解释性。

And I think the closest thing we have to that is something like mechanistic interpretability.

Speaker 0

但它离胜任这项任务还差得很远。

It's not anywhere near up to the task yet.

Speaker 0

但我想说,我把它看作是一个扩展的训练集和扩展的测试集。

But I guess I would say, I think of it as almost like an extended training set and an extended test set.

Speaker 0

对吧?

Right?

Speaker 0

我们所做的一切,所有这些对齐方法,都是训练集。

Everything we're doing, all the alignment methods we're doing are the training set.

Speaker 0

对吧?

Right?

Speaker 0

你可以在这些方法中进行测试,但它真的能在分布外奏效吗?

You you know, you can you can run tests in them, but will it really work out of distribution?

Speaker 0

在其他情况下,它们真的有效吗?

Will it really work in another situation?

Speaker 0

机制可解释性是唯一在原则上可能做到这一点的方法,尽管我们离这个目标还很远,但原则上,它更像是对模型的X光检查,而不是对模型的修改,对吧?

Mechanistic interpretability is the only thing that even in principle, and we're nowhere near there yet, but even in principle is the thing where it's like, it's more like an x-ray of the model than modification of the model, right?

Speaker 0

它更像是一种评估,而不是干预。

It's more like an assessment than an intervention.

Speaker 0

因此,我们需要某种机制,既能拥有一个扩展的训练集——也就是这些对齐方法,又能拥有一个扩展的测试集——就像对模型进行X光扫描,判断哪些有效、哪些无效。

And so somehow we need to get into a dynamic where we have an extended training set, which is all these alignment methods, and an extended test set, which is kind of like you x-ray the model and say, like, okay, what worked and what didn't?

Speaker 0

这种方式超越了你所进行的那些经验性测试,对吧?

In a way that goes beyond just the empirical tests that you've run, right?

Speaker 0

你在说,模型在这种情况下会做什么?

Where you're saying, what is the model going to do in these situations?

Speaker 0

它是内在能力上能做什么,而不是仅仅从现象上看它做了什么?

What is it within its capabilities to do instead of what did it do phenomenologically?

Speaker 0

当然,我们必须对此保持谨慎。

And of course, we have to be careful about that.

Speaker 0

对吧?

Right?

Speaker 0

我认为非常重要的一点是,我们绝不应该为了可解释性而训练,因为我认为这会削弱这种优势。

One of the things I think is very important is we should never train for interpretability because I think that is that's taking away that advantage.

Speaker 0

对吧?

Right?

Speaker 0

你甚至会遇到类似验证集与测试集的问题:如果你反复查看X光图像,就可能造成干扰。

You even have the problem similar to validation versus test set, where if you look at the X-ray too many times, you can interfere.

Speaker 0

但我认为那是一种弱得多的优化。我们确实应该担心这一点,但它是一个弱得多的过程。

But I think that's a much weaker optimization. We should worry about that, but that's a much weaker process.

Speaker 0

这不是自动化优化。

It's not automated optimization.

Speaker 0

我们应该像对待验证集和测试集一样,确保在运行测试集之前不要过多地查看验证集。

We should just make sure as with validation and test sets that we don't look at the validation set too many times before running the test set.

Speaker 0

但同样,这是人为的压力,而不是自动化压力。

But again, that's manual pressure rather than automated pressure.
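
The validation-versus-test discipline he is drawing on has a direct ML analogue. A minimal sketch, assuming a hypothetical interpretability_score diagnostic: development-time "X-rays" are a budgeted validation signal you may consult only a few times, while the final audit is a test signal consumed exactly once after training is frozen.

```python
class AlignmentAudit:
    """Toy version of the 'look at the X-ray sparingly' discipline."""

    def __init__(self, validation_budget: int = 3):
        self.validation_budget = validation_budget
        self.test_consumed = False

    def validation_xray(self, model) -> float:
        # Developers may peek during training, but only a budgeted number
        # of times: repeated peeking lets you overfit to the diagnostic.
        if self.validation_budget == 0:
            raise RuntimeError("validation budget exhausted; risk of overfitting")
        self.validation_budget -= 1
        return interpretability_score(model)

    def final_test_xray(self, model) -> float:
        # The held-out audit is consumed exactly once, after training freezes.
        if self.test_consumed:
            raise RuntimeError("the test X-ray has already been used")
        self.test_consumed = True
        return interpretability_score(model)

def interpretability_score(model) -> float:
    # Hypothetical diagnostic, e.g. the fraction of circuits that audit clean.
    return 0.5
```

The design point mirrors his "manual pressure rather than automated pressure": nothing in the training loop optimizes against the diagnostic; humans simply ration how often they look at it.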

Speaker 0

因此,我们需要某种解决方案,在训练集和测试集之间建立一种动态关系,即我们尝试各种方法,并通过模型并未针对优化的某种正交方式来真正检验它们是否有效。

And so some solution where it's like, we have some dynamic between the training and test set, where it's like, we're trying things out and we really figure out if they work via a way of testing them that the model isn't optimizing against, some orthogonal way.

Speaker 0

比如,我认为我们永远无法保证万无一失,但可以建立某种流程,把这些事情结合起来——当然,不能以愚蠢的方式。

Like if I think of, and I think we're never gonna have a guarantee, but some process where we we do those things together, again, not in a stupid way.

Speaker 0

有很多愚蠢的做法会让你自己骗自己。

There's lots of stupid ways to do this where you fool yourself.

Speaker 0

但我们需要某种方式,将对齐能力的长期训练与对齐能力的长期测试结合起来,真正发挥作用。

But, like, some way to put extended training for alignment ability with extended testing for alignment ability together in a way that actually works.

Speaker 1

我还是不明白你的直觉,为什么你觉得这个方法很可能有效,或者值得去探索。

I I still don't feel like I understand the intuition that why you think this is likely to work or this is a promising to pursue.

Speaker 1

让我更具体地问一个问题,抱歉这个类比有点牵强。

And let me ask the question in a sort of more specific way and excuse the tortured analogy.

Speaker 1

假设你是一位经济学家,想要理解经济运行的规律。

But listen, if you're you're an economist and you wanna understand the economy.

Speaker 1

是的。

Yeah.

Speaker 1

于是你派一大批微观经济学家去研究,其中一位研究餐饮业如何运作。

So you send a whole bunch of microeconomists out there and one of them studies how the restaurant business works.

Speaker 1

另一位研究旅游业如何运作。

One of them studies how the tourism business works.

Speaker 1

还有一位研究烘焙行业如何运作。

You know, one of them studies how the baking works.

Speaker 1

最后他们聚在一起,你依然无法预测五月会不会出现经济衰退。

And at the end, they all come together, and you still don't know whether there's gonna be a recession in May or not.

Speaker 1

为什么这不像那样,我们明明理解了在两层变换器中归纳头是如何工作的?

Why is this not like that, where we understand how induction heads work in a two layer transformer?

Speaker 1

我们理解,比如说,模算术。

We understand, you know, modular arithmetic.

Speaker 1

这些加起来怎么就能说明这个模型是否想杀死我们?

How does this add up to does this model want to kill us?

Speaker 1

换句话说,这个模型从根本上来说想要做什么?

Like, what does this model fundamentally want to do?

Speaker 0

关于这一点,有几点要说。

A few things on that.

Speaker 0

我的意思是,我认为这些问题问得正是时候。

I mean, I think that's like the right set of questions to ask.

Speaker 0

我认为我们最终希望的并不是理解每一个细节。

I think what we're hoping for in the end is not that we'll understand every detail.

Speaker 0

但再次强调,我会用X光或核磁共振的类比。

But again, I would give the X-ray or the MRI analogy.

Speaker 0

我们能否处于这样一种状态:能够观察模型的宏观特征,并判断这个模型的内部状态和意图是否与其外在表现的行为截然不同?

That we can be in a position where we can look at the broad features of the model and say, is this a model whose internal state and plans are very different from what it externally represents itself to do?

Speaker 0

对吧?

Right?

Speaker 0

这是一个模型吗?它的大部分计算资源都被用于做那些看起来极具破坏性和操纵性的事情,而我们对此感到不安。

Is this a model where we're uncomfortable that far too much of its computational power is devoted to doing what look like fairly disruptive and manipulative things?

Speaker 0

当然,我们还不确定这是否真的可能,但我认为至少有一些积极的迹象表明这是有可能的。

Again, we don't know for sure whether that's possible, but I think some at least positive signs that it might be possible.

Speaker 0

再说一遍,模型并不是有意在对你隐瞒,对吧?

Again, the model is not intentionally hiding from you, right?

Speaker 0

最终可能是训练过程本身将这些信息隐藏了起来。

It might turn out that the training process hides it from you.

Speaker 0

我可以想到一些模型极其聪明的例子。

And I can think of cases where the model's really super intelligent.

Speaker 0

它会以某种方式思考,以至于这种思考方式会直接影响它自身的认知。

It, like, thinks in a way so that it, like, affects its own cognition.

Speaker 0

我怀疑我们应该思考这个问题。

I suspect we should think about that.

Speaker 0

我们应该考虑所有可能性。

We should consider everything.

Speaker 0

我怀疑,我们可以大致这样理解:如果模型是以正常方式训练、能力仅略高于人类水平,那么‘模型的内部结构并非有意针对我们进行优化’也许是一个合理的假设,尽管我们仍然应当去检查。

I suspect that it may roughly work to think of the model as if it's trained in the normal way, just getting to just above human level. We should still check, but it may be a reasonable assumption that the internal structure of the model is not intentionally optimizing against us.

Speaker 0

我举一个类似人类的类比。

And I give an analogy like to humans.

Speaker 0

实际上,通过查看某人的核磁共振成像,你就能比随机猜测更准确地预测他们是否是心理变态者。

So it's actually possible to, you know, to look at an MRI of someone and predict above random chance whether they were a psychopath.

Speaker 0

几年前确实有一个关于一位研究此领域的神经科学家的故事。

There was actually a story a few years back about a neuroscientist who was studying this.

Speaker 0

他查看了自己的扫描结果,发现自己是个心理变态者。

And he looked at his own scan and discovered that he was a psychopath.

Speaker 0

然后他身边的所有人都说:不,不,不。

And then everyone everyone in his life was like, no, no, no.

Speaker 0

这很明显。

This is obvious.

Speaker 0

你简直就是个混蛋。

Like, you're you're a complete asshole.

Speaker 0

你肯定是心理变态。

Like, you must be a psychopath.

Speaker 0

而他对此完全毫无察觉。

And and he was total totally unaware of this.

Speaker 0

基本观点是,可能存在一些宏观特征,心理变态或许是一个很好的类比。

The basic idea is that there can be these macro features, and psychopath is probably a good analogy for it.

Speaker 0

他们就像是:我们所担心的那种模型,表面上很迷人,目标明确,但内心却非常黑暗。

They're like, this is what we'd be afraid of, a model that's kind of like charming on the surface, very goal oriented, and very dark on the inside.

Speaker 0

表面上,他们的行为可能看起来和别人一样,但他们的目标却截然不同。

On the surface, their behavior might look like the behavior of someone else, but their goals are very different.
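
One way to read the MRI analogy in ML terms is a linear probe on internal activations that flags a macro-level trait well above chance without explaining any individual circuit. A toy sketch on synthetic data; the "trait", the activations, and the probe setup are all invented for illustration, not a confirmed production method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "activations": 200 examples of a 32-dim hidden state, where one
# direction correlates with an invented macro label (the "dark trait").
X = rng.normal(size=(200, 32))
trait_direction = rng.normal(size=32)
y = (X @ trait_direction + 0.5 * rng.normal(size=200) > 0).astype(float)

# Train a linear probe (logistic regression via plain gradient descent).
w = np.zeros(32)
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w)))
    w -= 0.1 * X.T @ (p - y) / len(y)

# The probe is the "scan": on fresh activations it flags the macro feature
# far above chance while saying nothing about the underlying circuits.
X_new = rng.normal(size=(50, 32))
y_new = (X_new @ trait_direction > 0).astype(float)
preds = (1 / (1 + np.exp(-(X_new @ w))) > 0.5).astype(float)
print("probe accuracy on fresh activations:", (preds == y_new).mean())
```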

Speaker 1

有人可能会问:你之前提到实证的重要性,这是为什么?

A question somebody might have is, listen, you mentioned earlier the importance of being empirical.

Speaker 1

在这种情况下,你是在试图估算它。

And in this case, you're trying to estimate it.

Speaker 1

听好了,这些激活信号可疑吗?

Listen, are these activations sus?

Speaker 1

但这件事我们是否能以实证方式处理,还是我们需要一个非常扎实的、基于第一性原理的理论依据,来证明这不仅仅是因为这些模型的MRI图像与‘坏’行为相关?

But is this something we can afford to be empirical about, or do we need a very good first principles theoretical reason to think, no, it's not just that these MRIs of the model correlate with being bad.

Speaker 1

我们需要一些深层的数学证明,说明确实如此。

We need just some deep rooted math proof that this is so.

Speaker 0

这取决于你所说的‘实证’是什么意思。

It depends what you mean by empirical.

Speaker 0

更好的说法应该是‘现象学的’。

I mean, better term would be phenomenological.

Speaker 0

我不认为我们应该纯粹停留在现象学层面,比如,这里有一些非常危险模型的大脑扫描图,那里也有一些大脑扫描图。

I don't think we should be purely phenomenological in like, you know, here are some brain scans of like really dangerous models and here are some brain scans.

Speaker 0

我认为机制可解释性的整个理念,是去探究底层的原理和电路。

I think the whole idea of mechanistic interpretability is to look at the underlying principles and circuits.

Speaker 0

但我想我是这样看待这个问题的:一方面,我一直以来都支持以我们所能达到的最细致层次来研究这些电路。

But I guess the way I think about it is like, on one hand, I've actually always been a fan of studying these circuits at the lowest level of detail that we possibly can.

Speaker 0

这样做的原因是,这正是积累知识的方式。

And the reason for that is kind of that's how you build up knowledge.

Speaker 0

即使你最终的目标是这些特征太多、太复杂,归根结底,我们试图构建的是一个更宏观的理解。

Even if you're ultimately saying, there's too many of these features, it's too complicated, at the end of the day, we're trying to build something broad, and we're trying to build some broad understanding.

Speaker 0

我认为,构建这种理解的方式是通过做出大量这样的具体发现。

I think the way you build that up is by trying to make a lot of these very specific discoveries.

Speaker 0

你必须先理解这些基础构件,然后才能利用它们来得出宏观的结论,即使你无法弄清楚所有细节。

Like you have to understand the building blocks, and then you have to figure out how to kind of use that to draw these broad conclusions, even if you're not gonna figure out everything.

Speaker 0

我觉得你最好去问问克里斯·奥拉,他对此会有更详细的见解。

I think you should probably talk to Chris Ola, who would have much more detail.

Speaker 0

这只是我对这个问题的宏观思考。

This is my kind of high level thinking on it.

Speaker 0

克里斯·奥拉掌控着可解释性研究的方向。

Like, Chris Ola controls the interpretability agenda.

Speaker 0

你知道,他才是决定解释性工作方向的人。

Like, you know, he's he's the one who decides what to what to do on interpretability.

Speaker 0

这是我对此的高层次思考,但肯定不如他的深入。

This is my high level thinking about it, which is not gonna be as good as his.

Speaker 1

Anthropic 的乐观前景是否依赖于机制可解释性对能力提升有帮助?

Does the bull case on Anthropic rely on the fact that mechanistic interpretability is helpful for capabilities?

Speaker 0

我完全不这么认为。

I don't think so at all.

Speaker 0

不过,从原则上讲,机制可解释性确实有可能对能力提升有帮助。

Now I do think, think in principle, it's possible that mechanistic interpretability could be helpful with capabilities.

Speaker 0

如果真是这样,我们可能因为各种原因选择不公开讨论这一点。

We might, for various reasons, not choose to talk about it if that were the case.

Speaker 0

在 Anthropic 创立之初,我或者我们任何人并没有这样想过。

That wasn't something that I thought of, or that any of us thought of at the time of Anthropic's founding.

Speaker 0

我们当时觉得自己就是擅长扩大模型规模、并在这些模型之上做安全工作的人。

Mean, we thought of ourselves as like, we're people who are like good at scaling models and good at doing safety on top of those models.

Speaker 0

而且你知道,我们认为我们团队中擅长这方面的人才密度非常高。

And like, you know, we think that we have a very high talent density of folks who are good at that.

Speaker 0

我的观点一直认为,人才密度胜过人才数量。

And you know, my view has always been talent density beats talent mass.

Speaker 0

所以这更像是我们的看涨理由。

And so that's more of our bull case.

Speaker 0

人才密度胜过人才数量。

Talent density beats talent mass.

Speaker 0

我不认为这取决于某个特定的因素。

I don't think it depends on some particular thing.

Speaker 0

现在其他人也开始做机制可解释性了,我非常高兴看到这一点。

Like others are starting to do mechanistic interpretability now, and I'm very glad that they are.

Speaker 0

我们的变革理论之一,讽刺的是,就是要让其他组织更像我们。

Part of our theory of change is, paradoxically, to make other organizations more like us.

Speaker 1

人才密度当然很重要。

Talent density, I'm sure, is important.

Speaker 1

但Anthropic强调的另一点是,要进行安全研究,你必须拥有前沿模型。

But another thing Anthropic has emphasized is that you need to have frontier models in order to do safety research.

Speaker 1

而且,当然,你还得真是一家公司。

And of course, like actually be a company as well.

Speaker 1

目前的前沿模型,有人可能会猜,比如GPT-4的开销高达一亿美元左右。

The current frontier models, somebody might guess, like, GPT-4 cost like $100,000,000 or something

Speaker 0

就是这样。

like that.

Speaker 0

从非常宽泛的角度来看,这个数量级大致没错。

That general order of magnitude in very broad terms is not wrong.

Speaker 1

但你知道,我们距离你所谈论的这些技术,还有两到三年的时间。

But, you know, we're two to three years from the kinds of things you're talking about.

Speaker 1

为了跟上这种发展,我们需要越来越多的数量级提升,而且如果情况是这样——比如说,你对站在前沿感到羞愧的话。

We're talking more and more orders of magnitude to keep up with that and to if it's the case, let's say, you're embarrassed to be on the frontier.

Speaker 1

我的意思是,在什么情况下,Anthropic会去和这些巨无霸竞争,以维持同样的规模?

I mean, what is the case in which Anthropic is, like, competing with these Leviathans to stay on that same scale?

Speaker 0

我的意思是,我认为这是一个充满权衡的状况。

I mean, I think it's a I think it's a very it's a situation with a lot of trade offs.

Speaker 0

对吧?

Right?

Speaker 0

我觉得这并不容易。

I think it's I think it's not easy.

Speaker 0

我想回到之前的话题,也许我可以逐个回答这些问题。

I guess to go back, maybe I'll just like answer the questions one by one.

Speaker 0

对吧?

Right?

Speaker 0

那么回到为什么安全如此依赖规模这个问题上。

So like to go back to like, you know, why is safety so tied to scale?

Speaker 0

对吧?

Right?

Speaker 0

有些人并不这么认为,但你看一看,哪些领域已经实际应用了安全方法,哪怕只是在某些情况下有效,即使我们不认为它们普遍适用。

Some people don't think it is, but like, just look at like, you know, where have been the areas that, you know, I don't know, like safety methods have like been put into practice or worked for something, for anything, even if we don't think they'll work in general.

Speaker 0

我会回想起所有那些想法,比如辩论和放大。

I go back to thinking of all the ideas, something like debate and amplification.

Speaker 0

2018年我们在OpenAI撰写相关论文时,觉得人类反馈可能行不通,但辩论和放大能让我们超越它。

Back in 2018, when we wrote papers about those at OpenAI, was like, well, human feedback isn't quite gonna work, but debate and amplification will take us beyond that.

Speaker 0

但如果你实际去观察,我们确实尝试过辩论,却发现我们严重受限于模型的质量——要让两个模型进行足够连贯的辩论,以便人类能够评判,从而让训练过程真正起作用,你需要的模型水平至少要达到甚至超越当前前沿。

But then if you actually look at, and we've done attempts to do debates, we're really limited by the quality of the model, where it's like for two models to have a debate that is coherent enough that a human can judge it so that the training process can actually work, you need models that are at or maybe even beyond on some topics the current frontier.
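
The debate setup he is referring to, from the 2018-era OpenAI proposal, can be sketched as a loop. The interfaces below (argue, can_follow, pick_winner, reinforce) are hypothetical stand-ins rather than any real API, and the coherence gate is exactly the bottleneck he describes: below the capability frontier, the judge cannot follow the transcript and there is no usable training signal.

```python
def debate_training_step(model_a, model_b, question, human_judge, rounds=4):
    """One schematic step of debate-style oversight (hypothetical interfaces)."""
    transcript = []
    for _ in range(rounds):
        transcript.append(model_a.argue(question, transcript))  # hypothetical API
        transcript.append(model_b.argue(question, transcript))  # hypothetical API

    # Coherence gate: if the judge can't follow the exchange, the reward
    # signal is noise and training stalls.
    if not human_judge.can_follow(transcript):
        return None

    winner = human_judge.pick_winner(transcript)
    winner.reinforce(transcript)  # reward the side judged to have argued better
    return winner
```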

Speaker 0

你可以不处于前沿,就能想出一种方法或一个点子。

Now you can come up with a method, you can come up with the idea without being on the frontier.

Speaker 0

但对我来说,这仅仅是需要完成的工作中非常小的一部分。

But for me, that's a very small fraction of what needs to be done.

Speaker 0

想出这些方法非常容易。

It's very easy to come up with these methods.

Speaker 0

想出类似‘问题是X,也许解决方案是Y’这样的想法非常容易。

It's very easy to come up with like, oh, the problem is X, maybe a solution is Y.

Speaker 0

但我真正想知道的是,这些方法在实践中是否有效,哪怕只是针对我们今天已有的系统。

But I really wanna know whether things work in practice, even for the systems we have today.

Speaker 0

我想知道这些方法在实践中会出现哪些问题。

And I wanna know what kinds of things go wrong with them.

Speaker 0

我只是觉得,通过实际尝试,你会发现十种新想法和十种新的出错方式。

I just feel like you discover 10 new ideas and 10 new ways that things can go wrong by trying these in practice.

Speaker 0

这种经验性学习,我认为还没有得到应有的广泛理解。

And that empirical learning, I think it's just not as widely understood as it should be.

Speaker 0

我对类似宪法AI的方法也有同样的看法。

I would say the same thing about methods like constitutional AI.

Speaker 0

有些人说,这无关紧要。

And some people say, oh, it doesn't matter.

Speaker 0

比如,我们知道这种方法行不通。

Like, we know this method doesn't work.

Speaker 0

它无法实现纯粹的对齐。

It won't work for pure alignment.

Speaker 0

我对这一点既不赞同也不反对。

I neither agree nor disagree with that.

Speaker 0

我觉得这有点过于自信了。

I think that's just kinda overconfident.

Speaker 0

我们发现新事物、理解哪些方法有效、哪些无效的方式,是通过不断尝试和探索。

The way we discover new things and understand the structure of what's gonna work and what's not is by playing around with things.

Speaker 0

这并不意味着我们应该盲目地认为,这里有效的方法在那里也一定有效。

Not that we should just kind of blindly say, oh, this worked here and so it'll work there.

Speaker 0

但你真的会开始理解其中的模式,尤其是规模定律。

But you really start to understand the patterns, with the scaling laws.

Speaker 0

甚至机制可解释性,这可能是我看过的、在没有前沿模型的情况下取得大量进展的领域。

Even mechanistic interpretability, which might be the one area I see where a lot of progress has been made without the frontier models.

Speaker 0

我们看到OpenAI几个月前发布的工作中,使用非常强大的模型来帮助自动解释较弱的模型。

We're seeing it in the work that, say, OpenAI put out a couple months ago, using very powerful models to help you auto interpret the weak models.

Speaker 0

这当然不是可解释性研究的全部,但却是其中一个重要组成部分。

Again, that's not everything you can do in interpretability, but that's a big component of it.

Speaker 0

我们自己也发现它很有用。

And we found it useful too.
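
The automated-interpretability work he mentions follows an explain-then-score loop: a strong model proposes a natural-language explanation of a weak model's neuron, then the explanation is scored by simulating activations from it and correlating them with the real ones. A schematic sketch; the model interfaces here are hypothetical stand-ins, not the published code.

```python
def auto_interpret_neuron(strong_model, weak_model, neuron_id, texts):
    """Explain one neuron of a weak model using a strong model (schematic)."""
    # 1. Record how strongly the weak model's neuron fires on sample texts.
    records = [(t, weak_model.activation(neuron_id, t)) for t in texts]

    # 2. Ask the strong model for a natural-language hypothesis,
    #    e.g. "fires on legal terminology".
    explanation = strong_model.explain(records)

    # 3. Score it: the strong model simulates activations from the explanation
    #    alone; high correlation with the real activations means the
    #    explanation captures what the neuron actually does.
    simulated = [strong_model.simulate(explanation, t) for t in texts]
    real = [a for _, a in records]
    return explanation, correlation(simulated, real)

def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0
```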

Speaker 0

因此,你一次又一次地看到这种现象:扩展与安全就像两条相互缠绕的蛇,比你想象的还要紧密,对吧?

And so you see this phenomenon over and over again, where it's like, you know, the scaling and the safety are these two snakes that are like coiled with each other, always even more than you think, right?

Speaker 0

就可解释性而言,我想三年前我并没觉得这一点在可解释性上也会如此成立,但不知为何它偏偏就是成立的。

You know, with interpretability, like I think three years ago, I didn't think that this would be as true of interpretability. But somehow it manages to be true.

Speaker 0

为什么?

Why?

Speaker 0

因为智能是有用的。

Because intelligence is useful.

Speaker 0

它在许多任务中都很有用。

It's useful for a number of tasks.

Speaker 0

它其中一个有用之处在于,能够判断和评估其他智能体,也许将来甚至能用于对齐研究本身。

One of the tasks it's useful for is figuring out how to judge and evaluate other intelligence, and maybe someday even for doing the alignment research itself.

Speaker 0

鉴于这一点是真实的,

Given that that's true,

Speaker 1

那么这对Anthropic意味着什么?当两到三年后,这些庞然大物进行价值高达一百亿美元的训练时?

what what does that imply for Anthropic when in two to three years, these Leviathans are doing, like, $10,000,000,000 training runs?

Speaker 0

第一个选择是,如果我们无法做到,或者保持在前沿成本太高,那么我们就不要这么做。

Choice one is, if we can't, or if it costs too much to stay on the frontier, then we shouldn't do it.

Speaker 0

我们不会使用最先进的模型。

And we won't work with the most advanced models.

Speaker 0

我们会看看在不那么先进的模型上能获得什么成果。

We'll see what we can get with models that are not quite as advanced.

Speaker 0

我认为那里还是能获得一些价值的,不是零价值,但我有点怀疑这种价值是否真的很高,或者学习速度是否足够快,足以支持这项任务。

I think you can get some value there, like non zero value, but I'm kind of skeptical that the value is all that high or that the learning can be fast enough to really be up to the task.

Speaker 0

第二个选择是,你只是找到一种方法。

The second option is you just find a way.

Speaker 0

你只是接受这些权衡。

You just accept the trade offs.

Speaker 0

我认为这些权衡比表面看起来更积极,因为有一种我称之为‘向上竞争’的现象。

And I think the trade offs are more positive than they appear because of a phenomenon that I've called race to the top.

Speaker 0

我稍后可以详细讲讲,但目前先搁置一下。

I could go into that later, but I'll just let me put that aside for now.

Speaker 0

然后我认为第三个现象是,当事物发展到那个规模时,可能会伴随着出现相当严重的危险的概率开始变得不可忽视。

And then I think the third phenomenon is, you know, as things get to that scale, I think this may coincide with starting to get into some nontrivial probability of very serious danger.

Speaker 0

同样,我认为首先出现的问题会是滥用,比如我之前提到的生物技术方面的风险,但我觉得我们目前还没有达到那种自主性水平,不需要担心两年内就会出现对齐问题,但这个问题可能紧随其后。

Again, I think it's gonna come first from misuse, the kind of bio stuff that I talked about, but I don't think we have the level of autonomy yet to worry about some of the alignment stuff happening in like two years, but it might not be very far behind that at all.

Speaker 0

这可能会导致单边、多边或政府强制的决策,即我们支持的、不以最大速度推进的决定。

You know, that may lead to unilateral or multilateral or government enforced decisions, which we support, not to scale as fast as we could.

Speaker 0

这最终可能是正确的做法。

That may end up being the right thing to do.

Speaker 0

所以,说实话,我其实希望事情朝这个方向发展。

So I I you know, actually, that's kind of like, I kind of hope things go in that direction.

Speaker 0

这样一来,我们就不用在‘我们不在前沿,无法像希望的那样开展研究或影响其他机构’和‘我们身处前沿,必须接受那些总体积极但利弊交织的权衡’之间做出艰难抉择。

And then we don't have this hard trade off between, we're not on the frontier and we can't quite do the research as well as we want or influence other orgs as well as we want, versus, we're kind of on the frontier and have to accept the trade offs, which are net positive but have a lot in both directions.

Speaker 1

好的。

Okay.

Speaker 1

关于滥用和对齐问题,正如你所说,两者都是问题。

On the misuse versus misalignment, those are both problems, as you mentioned.

Speaker 1

但从长远来看,你更担心哪一个?

But in the long scheme of things, what are you more concerned about?

Speaker 1

比如三十年后,你认为哪个问题会更严重?

Like thirty years down the line, which do you think will be considered a bigger problem?

Speaker 0

我觉得用不了三十年,但我对两者都感到担忧。

I think it's much less than thirty years, but I'm worried about both.

Speaker 0

我不知道。

I don't know.

Speaker 0

如果你有一个理论上能够独自掌控世界的模型,而你能控制这个模型,那么很显然,如果这个模型只服从一小部分人的意愿而忽视其他人,这些人就能利用它为自己夺取世界控制权。

If you have a model that could, in theory, take over the world on its own, and if you were able to control that model, then it follows pretty simply that if a model was following the wishes of some small subset of people and not others, then those people could use it to take over the world on their behalf.

Speaker 0

对齐失败这一前提本身就意味着,我们同样应该对滥用保持高度警惕,因为其后果可能同样严重。

The very premise of misalignment means that we should be worried about misuse as well with similar levels of consequences.

Speaker 1

但一些比你更悲观的人可能会说,滥用其实已经是比较乐观的情形了,因为你至少已经知道如何让模型听命于坏人。

But some people who might be more doomer-y than you would say, with misuse, you're already working towards the optimistic scenario there, because you've at least figured out how to align the model with the bad guys.

Speaker 1

现在你只需要确保它听命于好人就行了。

Now you just need to make sure that it's aligned with the good guys instead.

Speaker 1

你为什么认为你能达到让AI与坏人对齐的阶段?你难道没看到已经发生了什么吗?

Why do you think that you could get to the point where it's aligned with the bad guys? You know, haven't you already seen the

Speaker 0

我想,如果你认为对齐问题是完全无解的,那你可能会说,好吧,反正我们完蛋了,所以也不用担心滥用问题。

I I guess if you had the view that, like, alignment is completely unsolvable, then, you know, then you'd be like, well, I don't you know, we're dead anyway, so I don't wanna worry about misuse.

Speaker 0

这完全不是我的立场。

That's not my position at all.

Speaker 0

但你也应该想想,什么样的计划才真正能成功,从而让事情变得更好?

But but also, like, you should think in terms of, like, what's a plan that would actually succeed that would make things good?

Speaker 0

任何真正能成功的计划,无论对齐问题有多难解决,都必须同时解决滥用问题和对齐问题。

Any plan that actually succeeds, regardless of how hard misalignment is to solve, is gonna need to solve misuse as well as misalignment.

Speaker 0

随着AI模型变得越来越强大、越来越快,它们将引发国家间力量平衡的重大问题。

It's gotta solve the fact that, like, as the AI models get better faster and faster, they're gonna create a big problem around the balance of power between countries.

Speaker 0

它们还将引发一个重大问题:单个个体是否可能做出一些坏事,而其他人却难以阻止?

They're gonna create a big problem around, is it possible for a single individual to do something bad that it's hard for everyone else to stop?

Speaker 0

任何真正能导向美好未来的实际解决方案,都必须解决这些问题。

Any actual solution that leads to a good future needs to solve those problems as well.
