
埃米特·希尔谈构建真正关心人类的AI:超越控制与引导

Emmett Shear on Building AI That Actually Cares: Beyond Control and Steering

本集简介

Twitch创始人、OpenAI前临时CEO埃米特·希尔(Emmett Shear)对推动AGI发展的根本假设提出质疑。在与埃里克·托伦伯格和塞布·克里尔的对话中,希尔指出当前AI对齐领域"控制与导向"的范式存在致命缺陷,并提出"有机对齐"理念——教导AI系统像人类一样真正关心他人。这场讨论深入探讨了为何将AGI视为工具而非潜在存在会导致灾难、当前聊天机器人如何成为"自恋镜像",以及创造能拒绝有害请求的AI才是唯一可持续的发展路径。希尔通过其新公司Softmax的多智能体模拟分享了技术方案,并描绘了一幅人类与AI成为协作伙伴的乐观图景——前提是我们能实现正确的对齐。

资源:
关注埃米特的X账号:https://x.com/eshear
关注塞布的X账号:https://x.com/sebkrier
关注埃里克的X账号:https://x.com/eriktorenberg

保持更新:
若喜欢本期节目,请点赞、订阅并分享给朋友!
关注a16z的X账号:https://x.com/a16z
关注a16z的领英账号:https://www.linkedin.com/company/a16z
在Spotify收听a16z播客:https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
在Apple Podcasts收听a16z播客:https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
关注主持人:https://x.com/eriktorenberg

请注意,此处内容仅作信息参考;不应视为法律、商业、税务或投资建议,也不应用于评估任何投资或证券;且不针对任何a16z基金的现有或潜在投资者。a16z及其关联机构可能持有讨论企业的投资。详见a16z.com/disclosures。

双语字幕


Speaker 0

大多数人工智能研究都聚焦于将'对齐'视为操控。

Most of AI is focused on alignment as steering.

Speaker 0

这只是比较客气的说法。

That's the polite word.

Speaker 0

如果你认为我们创造的是"存在",你也会把这称为奴役。

If you think that what we're making are beings, you'd also call this slavery.

Speaker 0

一个被你操控的对象,却无法反过来操控你,只能被动接受你的操控。

Someone who you steer, who doesn't get to steer you back, who non-optionally receives your steering.

Speaker 0

他们可以承担同类型的任务。

They can, like, take on the same kind of tasks.

Speaker 0

但他们不算数。

But, like, they don't count.

Speaker 0

在道德层面他们不是真实存在的。

They're not real morally.

Speaker 0

一个你无法控制的工具:糟糕。

A tool that you can't control: bad.

Speaker 0

一个你能控制的工具:也糟糕。

A tool that you can control: bad.

Speaker 0

一个没有对齐的存在:糟糕。

A being that isn't aligned: bad.

Speaker 0

唯一的好结果是存在一个真正关心我们的存在。

The only good outcome is a being that cares, that actually cares about us.

Speaker 1

我一直在思考AI安全讨论中反复出现的一句话,初次读到就让我不寒而栗。

I've been thinking about a line that keeps showing up in AI safety discussions, and it stopped me cold when I first read it.

Speaker 1

我们需要构建对齐AI。

We need to build aligned AI.

Speaker 1

听起来很合理,对吧?

Sounds reasonable, right?

Speaker 1

但问题是:对齐什么?

Except, Align to what?

Speaker 1

对谁对齐?

Align to whom?

Speaker 1

这个词被随意使用,仿佛答案显而易见,但你越想越会发现自己隐含了一个巨大的假设。

The phrase gets thrown around like it has an obvious answer, but the more you sit on it, the more you realize you're smuggling in a massive assumption.

Speaker 1

我们假设存在某个固定点,某个稳定目标,可以瞄准、击中一次就完成。

We're assuming there's some fixed point, some stable target we can aim at, hit once, and be done.

Speaker 1

但有趣的是:生活中其他领域的对齐都不是这样运作的。

But here's what's interesting: that's not how alignment works anywhere else in life.

Speaker 1

想想家庭关系。

Think about families.

Speaker 1

想想团队协作。

Think about teams.

Speaker 1

想想你自己的道德成长。

Think about your own moral development.

Speaker 1

你不可能达成对齐后就一劳永逸。

You don't achieve alignment and then coast.

Speaker 1

你总是在重新协商,不断学习,不断发现原以为正确的事情其实更加复杂。

You're constantly renegotiating, constantly learning, constantly discovering that what you thought was right turns out to be more complicated.

Speaker 1

对齐不是终点,而是一个过程。

Alignment isn't a destination, it's a process.

Speaker 1

它是你持续实践的行为,而非已拥有的状态。

It's something you do, not something you have.

Speaker 1

这至关重要,因为我们正处在一个转折点:我们构建的AI系统开始看起来不再像工具,而更像是另一种存在。

And this matters because we're at this inflection point where the AI systems we're building are starting to look less like tools and more like something else.

Speaker 1

它们说我们的语言。

They speak our language.

Speaker 1

它们能推理解决问题。

They reason through problems.

Speaker 1

它们可以承担过去需要人类判断的任务。

They can take on tasks that used to require human judgment.

Speaker 1

所有人都在问:我们该如何控制它们?

And the question everyone's asking is: How do we control them?

Speaker 1

我们该如何引导它们?

How do we steer them?

Speaker 1

如何确保它们按我们的意愿行事?

How do we make sure they do what we want?

Speaker 1

但还有另一种视角。

But there's another way to see it.

Speaker 1

如果控制范式本身就是完全错误的框架呢?

What if the control paradigm is the wrong framework entirely?

Speaker 1

如果试图打造一个你能完美操控的超级智能工具,不仅困难重重,而且无论成功与否都从根本上充满危险呢?

What if trying to build a super intelligent tool you can perfectly steer is not just difficult, but fundamentally dangerous, whether you succeed or fail?

Speaker 1

如果无法控制它,那显然很糟糕。

If you can't control it, obviously that's bad.

Speaker 1

但如果你能完美控制它,就等于把神一般的力量交到了握着方向盘的人手中。

But if you can control it perfectly, you've just handed godlike power to whoever's holding the steering wheel.

Speaker 1

而人类,即便是出于善意,也没有智慧能安全驾驭这种力量。

And humans, even well meaning ones, don't have the wisdom to wield that kind of power safely.

Speaker 1

那么替代方案是什么?

So what's the alternative?

Speaker 1

想想我们在现实中是如何解决协调问题的。

Well, think about how we actually solve alignment problems in the real world.

Speaker 1

我们不去控制他人。

We don't control other people.

Speaker 1

我们不去操纵他们。

We don't steer them.

Speaker 1

我们培养他们。

We raise them.

Speaker 1

我们教会他们在乎。

We teach them to care.

Speaker 1

我们建立这样的关系:他们善待我们不是被迫,而是因为学会了珍视这段关系本身。

We build relationships where they do right by us, not because we're forcing them, but because they've learned to value the relationship itself.

Speaker 1

这就是有机协调。

That's organic alignment.

Speaker 1

源于真正关怀、心智理论以及归属于更宏大存在的协调。

Alignment that emerges from genuine care, from theory of mind, from being part of something larger than yourself.

Speaker 1

埃米特·希尔在过去一年半里一直在Softmax公司致力于解决这个问题,他的方法独特之处在于,他不是通过构建更好的控制机制来解决对齐问题。

Emmett Shear has spent the last year and a half working on exactly this problem at Softmax. And what makes his approach distinctive is that he's not trying to solve alignment by building better control mechanisms.

Speaker 1

他试图通过构建能够学会关心、发展心智理论的AI系统来解决这个问题,让它们成为好队友、好合作者、好公民。

He's trying to solve it by building AI systems that can learn to care, that can develop the kind of theory of mind that lets them be good teammates, good collaborators, good citizens.

Speaker 1

不是服从命令的工具,而是理解作为社区一员意味着什么的存在。

Not tools that follow orders, but beings that understand what it means to be part of a community.

Speaker 1

这可能会引发一些令人不安的问题。

That can raise some uncomfortable questions.

Speaker 1

如果我们在创造生命而非工具呢?

What if we're building beings and not tools?

Speaker 1

这对我们如何对待它们意味着什么?

What does that mean for how we treat them?

Speaker 1

这对它们的权利意味着什么?

What does it mean for their rights?

Speaker 1

你甚至如何知道自己是否成功了?

And how do you even know if you've succeeded?

Speaker 1

你如何衡量某物是真正关心,还是仅仅完美模拟了关心?

How do you measure whether something genuinely cares versus just simulating care really well?

Speaker 1

今天,来自Google DeepMind的塞布·克里尔和我将与埃米特一起探讨这些问题。

Today, Seb Krier from Google DeepMind and I are sitting down with Emmett to explore those questions.

Speaker 1

塞布负责DeepMind的AGI政策制定,因此他带来了一个真正构建这些系统的实验室内部视角。

Seb leads AGI Policy Development at DeepMind, so he brings a perspective from inside one of the labs actually building these systems.

Speaker 1

但实际上,我们正在探究更深层的东西。

But really, we're investigating something deeper.

Speaker 1

构建能够参与这场持续不断、永无止境的共同生活探索过程的AI系统,究竟需要什么?

What does it actually take to build AI systems that can participate in the ongoing, never finished process of figuring out how to live together?

Speaker 1

到最后,你不仅会理解Softmax的技术路径,还会对对齐的本质及其未来可能性产生全新的思考方式。

By the end, you'll understand not just Softmax's technical approach, but a completely different way of thinking about what alignment is and what it could become.


Speaker 1

埃米特、塞布,欢迎来到播客节目。

Emmett, Seb, welcome to the podcast.

Speaker 1

感谢你们的加入。

Thanks for joining.

Speaker 0

谢谢邀请。

Thank you for having me.

Speaker 2

那么埃米特,通过Softmax项目,你专注于AI对齐问题,试图让AI与人类实现有机协调。

So, Emmett, with Softmax, you're focused on alignment and making AIs organically align with people.

Speaker 2

能否解释一下这意味着什么,以及你们打算如何实现这一目标?

Can you explain what that means and how you're trying to do that?

Speaker 0

当人们思考对齐问题时,我认为存在很多混淆概念的情况。

When people think about alignment, I think there's a lot of confusion.

Speaker 0

人们常谈论事物需要'对齐'。

People talk about things being aligned.

Speaker 0

我们需要构建一个已对齐的AI系统。

We need to build an aligned AI.

Speaker 0

问题在于当有人那么说时,就像在说‘我们需要去旅行’一样。

And the problem with that is when someone says that, it's like, We need to go on a trip.

Speaker 0

而他们的反应是‘好吧,我确实喜欢旅行,但我们到底要去哪儿来着?’

And they're like, Okay, I do like trips, but where are we going again?

Speaker 0

关于对齐,对齐需要一个参照标准。

And with alignment, alignment takes an argument.

Speaker 0

对齐要求你必须与某事物保持一致。

Alignment requires you to align to something.

Speaker 0

你不能只是单纯地对齐。

You can't just be aligned.

Speaker 0

除非你的意思是与你自己对齐。

Unless you mean being aligned to yourself.

Speaker 0

但即便如此,他们其实是在告诉你:我所对齐的对象就是我自己。

But even then, they kind of want to tell you, what I'm aligning to is myself.

Speaker 0

因此这种抽象对齐AI的概念,我认为悄悄塞进了很多假设,因为它某种程度上假定存在一个明显的参照标准。

And so this idea of an abstractly aligned AI, I think, slips a lot of assumptions past people, because it sort of assumes that there's one obvious thing to align to.

Speaker 0

我发现这通常就是AI创造者的目标。

I find this is usually the goals of the people who are making the AI.

Speaker 0

他们说"我想造一个对齐的AI"时,到底是什么意思?

What do they mean when they say, I want to make an aligned AI?

Speaker 0

我想造个能按我意愿行事的AI。

I want to make an AI that does what I want it to do.

Speaker 0

这通常就是他们的真实意思。

That's what they normally mean.

Speaker 0

这就是对齐(alignment)一个相当正常且自然的含义。

And that's a pretty normal and natural thing to mean by alignment.

Speaker 0

我不确定这是否是我认为的公共物品。

I'm not sure that that's what I would regard as a public good.

Speaker 0

我想这取决于对象是谁。

I guess it depends on who it is.

Speaker 0

如果是耶稣或佛陀说‘我正在创造一个对齐的AI’,我可能会说‘好的,与你对齐’。

If it was Jesus or the Buddha was like, I am making an aligned AI, I'd be like, Okay, yeah, aligned to you.

Speaker 0

太好了。

Great.

Speaker 0

我加入。

I'm down.

Speaker 0

听起来不错。

Sounds good.

Speaker 0

算我一个。

Sign me up.

Speaker 0

但我们大多数人(包括我自己)都达不到那种精神境界,因此可能需要更谨慎地思考我们让它与什么对齐。

But most of us, myself included, I wouldn't describe as being at that level of spiritual development, and therefore perhaps we want to think a little more carefully about what we're aligning it to.

Speaker 0

所以当我们谈论有机对齐时,我认为重要的是要认识到对齐不是一个具体事物。

And so when we talk about organic alignment, I think the important thing to recognize is that alignment is not a thing.

Speaker 0

它不是一种状态。

It's not a state.

Speaker 0

而是一个过程。

It's a process.

Speaker 0

这是对几乎所有事物都普遍适用的真理之一。

This is one of these things that's broadly true of almost everything.

Speaker 0

岩石算是一种事物吗?

Is a rock a thing?

Speaker 0

我是说,确实可以把岩石视为一种事物。

I mean, there's a view of a rock as a thing.

Speaker 0

但如果你真正仔细地放大观察一块岩石,岩石其实是一个过程。

But if you actually zoom in on a rock really carefully, a rock is a process.

Speaker 0

这是原子间永无止境的反复振荡,不断重构着岩石本身。

It's this endless oscillation between the atoms over and over and over again, reconstructing rock over and over again.

Speaker 0

岩石是个非常简单的过程,你可以很有意义地将其粗略概括为一种事物。

And the rock's a really simple process that you can kind of coarse grain very meaningfully into being a thing.

Speaker 0

但对齐性并不像岩石。

But alignment is not like a rock.

Speaker 0

对齐性是个复杂的过程。

Alignment is a complex process.

Speaker 0

有机对齐的理念是将对齐视为一种持续的、鲜活的、需要不断自我重建的过程。

And organic alignment is the idea of treating alignment as an ongoing sort of living process that has to constantly rebuild itself.

Speaker 0

你可以想想家庭成员之间是如何保持彼此协调、维系家庭关系的?

And so you can think of it this way: how do people in families stay aligned to each other, stay aligned to a family?

Speaker 0

他们的做法是:你永远不会'达到'对齐状态。

And the way they do that is you don't arrive at being aligned.

Speaker 0

你是在不断重新编织维系家庭运转的纽带。

You're constantly re-knitting the fabric that keeps the family going.

Speaker 0

从某种意义上说,家庭就是这种不断重新编织的模式。

And in some sense, the family is the pattern of re-knitting that happens.

Speaker 0

如果你停止维系,这种模式就会消失。

And if you stop doing it, it goes away.

Speaker 0

这与人体细胞的情况类似。

And this is similar for things like cells in your body.

Speaker 0

并不是说你的细胞排列组合成你之后就一成不变了。

There isn't like your cells align to being you and they're done.

Speaker 0

这是一个持续不断的过程,细胞时刻在决定:我该做什么?

It's this constant, ever running process of cells deciding, What should I do?

Speaker 0

我该成为什么?

What should I be?

Speaker 0

我需要承担新的职能吗?

Do I need to take on a new job?

Speaker 0

我们应该制造更多红细胞吗?

Should we be making more red blood cells?

Speaker 0

我们要减少它们的数量吗?

Are we making fewer of them?

Speaker 0

你并非固定不变的,所以也不存在永恒的排列组合。

You aren't a fixed point, so there is no fixed alignment.

Speaker 0

事实证明,我们的社会也是如此。

And it turns out that our society is like that.

Speaker 0

当人们谈论对齐时,我认为他们真正想要的是一个道德上良善的人工智能。

When people talk about alignment, what they're really talking about, I think, is I want an AI that is morally good.

Speaker 0

对吧?

Right?

Speaker 0

这才是他们真正的意思。

That's what they really mean.

Speaker 0

就像是,这将作为一个道德上善良的存在而行动。

It's like, this will act as a morally good being.

Speaker 0

而作为一个道德上善良的存在是一个过程,而非终点。

And acting as a morally good being is a process and not a destination.

Speaker 0

遗憾的是,我们曾尝试从高处取下指引你如何成为道德善者的石板。我们使用它们,或许有所帮助,但总归不够:你可以阅读并试图遵循那些规则,却仍会犯许多错误。

Unfortunately, we've tried taking down tablets from on high that tell you how to be a morally good being. And we use those, and they're maybe helpful, but somehow they're not enough: you can read those and try to follow those rules and still make lots of mistakes.

Speaker 0

因此我不会宣称我确切知道道德是什么,但道德显然是一个持续学习的过程,是我们做出道德发现的领域。

And so I'm not going to claim I know exactly what morality is, but morality is very obviously an ongoing learning process and something where we make moral discoveries.

Speaker 0

历史上,人们曾认为奴隶制是可以接受的,后来才意识到并非如此。

Historically, people thought that slavery was okay, and then they thought it wasn't.

Speaker 0

我认为完全可以有说服力地说,我们取得了道德进步。

And I think you can very meaningfully say that we made moral progress.

Speaker 0

我们通过认识到那是不好的,完成了一次道德发现。

We made a moral discovery by realizing that's not good.

Speaker 0

如果你认为存在道德进步这回事,或者仅仅是学习如何更好地追求已知的道德善,那么你必须相信对齐——与道德、道德存在对齐——是一个不断学习和成长以重新推断'我该做什么'的过程。

If you think that there's such a thing as moral progress, or even just learning how better to pursue the moral goods we already know, then you have to believe that alignment, aligning to morality, a moral being, is a process of constant learning and of growth to reinfer, What should I do?

Speaker 0

从经验中。

From experience.

Speaker 0

尽管没人知道具体怎么做,但这不应阻止我们尝试,因为这就是人类的作为。

The fact that no one has any idea how to do that should not dissuade us from trying, because that's what humans do.

Speaker 0

这很明显我们以某种方式做到了,就像我们曾经不知道人类如何行走或看见一样,我们总会在某些经历中以特定方式行事。

It's really obvious that we do this somehow. Just like we used to not know how humans walked or saw, somehow we have experiences where we're acting in a certain way.

Speaker 0

然后我们突然意识到:我一直是个混蛋。

And then we have this realization, I've been a dick.

Speaker 0

那样做很糟糕。

That was bad.

Speaker 0

我原以为自己在做好事,但回想起来,我其实做错了。

I thought I was doing good, but in retrospect, I was doing wrong.

Speaker 0

这不是随机的。

It's not random.

Speaker 0

人们实际上都有相同的——事实上,人们产生这种顿悟存在许多经典模式。

People have the same Actually, there's a bunch of classic patterns of people having that realization.

Speaker 0

这是反复发生的事情。

It's a thing that happens over and over again.

Speaker 0

所以这不是随机的。

So it's not random.

Speaker 0

这是一个可预测的事件序列,非常像学习过程:你改变行为方式,通常未来行为的影响会更亲社会,而你也会因此变得更好。

It's a predictable series of events that look a lot like learning, where you change your behavior, and often the impact of your behavior in the future is more prosocial, and that you are better off for doing it.

Speaker 0

因此我持非常强烈的道德现实主义立场。

So I'm taking a very strong moral realist position.

Speaker 0

道德是真实存在的。

There is such a thing as morality.

Speaker 0

我们确实在学习它。

We really do learn it.

Speaker 0

这确实很重要。

It really does matter.

Speaker 0

还有对齐性,这不是一蹴而就的事情。

And alignment, and that it's not something you finish.

Speaker 0

事实上,一个关键的道德错误就是这种信念——我通晓道德。

In fact, one of the key moral mistakes is this belief, I know morality.

Speaker 0

我知道什么是对的。

I know it's right.

Speaker 0

我知道什么是错的。

I know it's wrong.

Speaker 0

我不需要学习任何东西。

I don't need to learn anything.

Speaker 0

在道德方面没人能教我什么。

No one has anything to teach me about morality.

Speaker 0

这是傲慢。

That's arrogance.

Speaker 0

而这正是你可能做出的最危险的道德行为之一。

And that's one of the most morally dangerous things you can do.

Speaker 0

所以当我们谈论有机对齐时,有机对齐并不是去"对齐"一个AI,而是要有一个能够做到人类所能做到之事的AI。

And so when we talk about organic alignment, organic alignment isn't aligning an AI; it's an AI that is capable of doing the thing that humans can do.

Speaker 0

在某种程度上,我认为动物也能做到某些层面——虽然人类在这方面强得多——比如学习如何成为好家人、好队友、好社会成员,我想还包括所有有情众生的好成员。

And to some degree, I think animals can do at some level, although humans are much better at it, of the learning of how to be a good family member, a good teammate, a good member of society, a good member of all sentient beings, I guess.

Speaker 0

如何以有益整体而非有害的方式,成为比自我更宏大存在的一部分。

How to be a part of something bigger than yourself in a way that is healthy for the whole rather than unhealthy.

Speaker 0

而Softmax正致力于研究这个问题。

And Softmax is dedicated to researching this.

Speaker 0

我认为我们已经取得了一些非常有趣的进展。

And I think we've made some really interesting progress.

Speaker 0

但我想通过参加这样的播客传播的主要信息,也是我希望Softmax能够超越一切所达成的首要目标,就是让人们聚焦于这个问题本身。

But the main message, I go on podcasts like this to spread, the main thing that I hope Softmax accomplishes above and beyond anything else, is to focus people on this as the question.

Speaker 0

这是你必须弄明白的事情。

This is the thing you have to figure out.

Speaker 0

如果你无法培养出一个关心周围人的孩子,如果你的孩子只会循规蹈矩,那你培养的就不是一个有道德的人。

If you can't figure out how to raise a child who cares about the people around them, if you have a child that only follows the rules, that's not a moral person that you've raised.

Speaker 0

实际上你培养了一个危险的人,一个很可能会在遵守规则的同时造成巨大伤害的人。

You've raised a dangerous person, actually, who will probably do great harm following the rules.

Speaker 0

如果你创造出一个擅长服从命令链、严格遵守你制定的道德准则和行为规范的人工智能,那同样会非常危险。

And if you make an AI that's good at following your chain of command, and good at following whatever rules you came up with for what morality is, and what good behavior is, that's also going to be very dangerous.

Speaker 0

所以这就是标准,是我们应该努力的方向,也是每个人都应该致力于解决的问题。

And so that's the bar, that's what we should be working on, and that's what everyone should be committed to figuring out.

Speaker 0

如果有人比我们先成功,那再好不过。

And if someone beats us to the punch, great.

Speaker 0

当然我觉得他们做不到,因为我非常看好我们的方法。

I mean, I don't think they will, because I'm really bullish on our approach.

Speaker 0

我认为我们的团队非常出色。

I think the team's amazing.

Speaker 0

但这是我第一次经营一家能让我真心实意说出'如果有人超越我们,那真是谢天谢地'的公司。

But this is the first time I've run a company where truly, I can say with a whole heart: if someone beats us, thank God.

Speaker 0

我希望有人能想明白。

I hope somebody figures it out.

Speaker 3

是啊。

Yeah.

Speaker 3

是啊。

Yeah.

Speaker 3

我是说,确实如此。

I mean, it's yeah.

Speaker 3

我对某些事情也有很多类似的直觉。

I have a lot of, you know, similar intuitions about certain things.

Speaker 3

比如,我也不喜欢那种认为我们只需要破解少数几个价值观,然后将其永久固化,就能解决道德问题的想法。

Like, I also dislike the, you know, the idea that kind of, you know, we just need to, like, crack the the few kind of values or something, just cement them in time forever now, and, know, we we kind of solve morality or something.

Speaker 3

我一直对如何将对齐问题概念化为一次性解决、然后就能直接发展AI或AGI的方式持怀疑态度。

And I've always kind of been skeptical about, you know, how the alignment problem has been conceptualized as something to kind of solve once and for all, and then you can just, you know, do do AI or do AGI.

Speaker 3

但我想我的理解方式略有不同。

But the I guess I understand it in in a slightly different way.

Speaker 3

或许更少基于道德现实主义,而是技术对齐问题——广义上说是如何让AI按照指令行事。

I guess maybe less based on kind of moral realism, but, you know, there's a kind of the technical alignment problem, which I kind of think of broadly as how do you get an AI to do what you, you know, how do you get it to follow instructions, like, you know, broadly speaking.

Speaker 3

我认为在大型语言模型出现前,这曾是个更大挑战,当时人们讨论强化学习时觉得很多事会很困难,但现在发现有些比预期简单。

And I think that was, you know, more of a challenge pre-LLMs, I guess, when people were talking about reinforcement learning and looking at these systems; whereas post-LLMs, we've realized that many things that we thought were going to be difficult were somewhat easier.

Speaker 3

其次是规范性问题:你要让AI与谁的价值观对齐?我觉得你有点在讨论这个。

And then there's a kind of second question, the kind of normative question of to whose values, what are you aligning this thing to, which I think is is the kind of thing you're commenting on a bit.

Speaker 3

对此,我非常怀疑那些试图破解'对齐十诫'就能一劳永逸的做法。

And and for this, I, yeah, I tend to be very skeptical of approaches where, you know, you need to kind of crack the the the kind of 10 commandments of alignment or something, and then and then we're good.

Speaker 3

在这里,我觉得我有一些,怎么说呢,意料之中更偏向政治学基础的直觉之类的东西,就这样吧。

And here, I think I have, like, intuitions that are unsurprisingly a bit more, like, political science based or something, and that, like, okay.

Speaker 3

这是个过程,而且我某种程度上喜欢这种自下而上的方式,就是我们在现实生活中如何与人合作?

It is a process, and and I like the kind of bottom up approach to some degree of, well, how do we do it in real life with people?

Speaker 3

没人会突然说'我搞定了这个',你懂的。

No one comes up with, you know, I've got this.

Speaker 3

所以你需要一些让不同观点可以相互碰撞的流程。

And so you have like processes that allow like ideas to kind of, you know, clash.

Speaker 3

让持有不同想法、意见和观点的人们能在一个更大的系统中尽可能和谐共处。

You got people with different ideas, opinions, views, and stuff to kind of coexist as well as they can within a wider system.

Speaker 3

就像,你知道的,对人类来说那个系统就是自由民主制之类的。

And like, you know, humans, that system is liberal democracy or something.

Speaker 3

至少在部分国家是这样,这种制度能让更多这类想法和价值观随时间逐渐被发现和构建。

And, you know, at least in some countries, and that allows more of these kinds of ideas, these values, to be discovered and constructed over time.

Speaker 3

而且我认为,在一致性问题上也是,我倾向于认同你部分直觉性的观点。

And and I think, you know, for alignment as well, I tend to think, yeah, there's there's on the normative side, I agree with some of your intuitions.

Speaker 3

我现在不太清楚的是,如果要把这个应用到AI系统里,具体应该怎么实现。

I'm less clear about now what are exactly, what does it look like now if we're gonna implement this into an AI system.

Speaker 3

这些是我们现有的方案。

These are ones we have today.

Speaker 0

我同意存在技术一致性的概念,虽然我可能会给出稍有不同的定义,但核心在于:当你构建一个系统时,它能否被描述为始终连贯地遵循目标?

I agree that I agree that there's this, I think, idea of technical alignment that I think I would be able to define a little differently, but it's sort of the sense of, if you build a system, can it be described as being coherently goal following at all?

Speaker 0

不论目标是什么,很多系统并不连贯,它们很难被描述为具有目标。

Regardless of what those goals are, lots of systems aren't coherent; they're not well described as having goals.

Speaker 0

他们基本上就是随便做点事情。

They just kind of do stuff.

Speaker 0

如果你想要某样东西保持一致,它必须要有连贯的目标。

And if you're gonna have something that's aligned, it has to have coherent goals.

Speaker 0

否则,从定义上来说,这些目标就无法与其他人的目标保持一致。

Otherwise, those goals can't be aligned with anyone else's goals, kind of by definition.

Speaker 0

这是否算是对你所说的技术对齐的合理理解?

Is that a fair assessment of what you mean by technical alignment?

Speaker 3

我的意思是,我不完全确定,因为我觉得如果我给模型设定某个目标,我会希望模型能遵循这个指令,基本上就是实现那个特定目标——指示它做X,我就希望它做X,而不是X的各种变体。

I mean, I'm not totally sure, right, because I think if I give a model a certain goal, then I would like the model to follow that instruction and reach that particular goal. If I instruct it to do x, then I would like it to do x, and not, you know, different variants of x, essentially.

Speaker 3

我不希望它通过作弊手段达成目标。

I wouldn't want it to reward hack.

Speaker 3

也不希望它做一些

Wouldn't want it to do some...

Speaker 0

但是,当你让它做X时,你实际上是在传递一串聊天窗口里的字节数据,或者空气中的一系列声波振动。

Well, but when you tell it to do x, you're transferring, like, a byte string in a chat window, or a series of audio vibrations in the air.

Speaker 0

对吧?

Right?

Speaker 0

你并不是在把自己的目标直接移植到它里面。

You're not you're not transplanting a goal from your mind into it.

Speaker 0

你只是给它一个观察信号,让它用来推断你的目标。

You're giving it an observation that it's using to infer your goal.

Speaker 3

是的。

Yeah.

Speaker 3

从某种意义上说,是的。

I mean, in in some sense, yeah.

Speaker 3

我可以传达一系列指令,而我希望能尽可能准确地推断出我真正想表达的意思,基于它对我和我的问题的了解。

I I can communicate a series of instructions, and I wanted to infer what I'm, you know, saying essentially as accurately as it can, given what it knows of me and what I'm asking.

Speaker 0

你是想推断出你的本意。

You you wanted to infer what you meant.

Speaker 0

嗯。

Uh-huh.

Speaker 0

对吧?

Right?

Speaker 0

就像,因为在某种意义上,你通过网络发送给它的字节序列本身并没有绝对意义。

Because in some sense, the byte sequence that you sent over the wire to it has no absolute meaning.

Speaker 0

它需要被解读。

It has to be interpreted.

Speaker 0

嗯。

Mhmm.

Speaker 0

对吧?

Right?

Speaker 0

比如,同样的字节序列在不同的编码手册下可能代表完全不同的含义。

Like, that byte sequence could mean something very different with a different code book.

Speaker 3

是的。

Yeah.

Speaker 3

嗯,我想从某个角度来说,我记得大约十年前刚接触人工智能时,就思考过这类问题。

Well, I guess one way... you know, I remember when I was first getting into AI and these kinds of questions, maybe a decade ago.

Speaker 3

所以你有这些例子,我记得是斯图尔特·罗素在教科书里提到的,我们会给AI设定一个目标,但它不会完全按照你的要求去做。

So you had these examples, I think it was in Stuart Russell's textbook: you give the AI a goal, but then it won't exactly do what you're asking it.

Speaker 3

对吧?

Right?

Speaker 3

比如让AI打扫房间,结果它去打扫房间时,把婴儿也当成垃圾扔掉了。

You know, clean the room, and then it goes and cleans the room, takes the baby and puts it in the trash.

Speaker 3

就像,不。

Like, no.

Speaker 3

这不是我的本意。

This is not what I meant.

Speaker 3

而我认为我们已经

Like, whereas I think we've

Speaker 0

等等。

like, wait.

Speaker 0

等一下。

Hold on.

Speaker 0

但这就是问题所在,我觉得人们在这里跳过了关键一步。

But this is the thing where I think people were jumping over a step there.

Speaker 0

你并没有给AI设定一个目标。

You didn't give the AI a goal.

Speaker 0

你只是给了AI一个目标的描述。

You gave the AI a description of a goal.

Speaker 0

目标的描述和实际目标不是一回事。

A description of a thing and a thing are not the same.

Speaker 0

我可以告诉你一个苹果,唤起你对苹果的概念,但我并没有真正给你一个苹果。

I can tell you an apple, and I'm evoking the idea of an apple, but I haven't given you an apple.

Speaker 0

我给了你一段描述:它是红色的,有光泽的,是这个大小。

I've given you, you know: it's red, it's shiny, it's this size.

Speaker 0

那是对苹果的描述,但它本身并不是苹果。

That's a description of an apple, but it's not an apple.

Speaker 0

而告诉某人'去做这个',那并不是目标本身。

And giving someone, hey, go do this, that's not a goal.

Speaker 0

那只是对目标的描述。

That's a description of a goal.

Speaker 0

对人类而言,我们太擅长、也太容易把对目标的描述直接当作目标本身。

And for humans, we're so fast, we're so good at turning a description of a goal into a goal.

Speaker 0

这个过程发生得如此迅速自然,以至于我们根本意识不到。

We so do quickly and naturally we don't even see it happening.

Speaker 0

我们会混淆,认为这两者是一回事。

We get confused, we think those are the same thing.

Speaker 0

但你并没有赋予它真正的目标。

But you haven't given it a goal.

Speaker 0

你只是给了它一个目标描述,希望它能将这个描述转化为与你内心构想一致的真实目标。

You've given it a description of a goal that you hope it turns back into the goal that is the same as the goal that you described inside of you.

Speaker 0

没错。

Right.

Speaker 0

你认为

You think

Speaker 3

我以为——

I thought-

Speaker 0

你可以通过直接读取脑电波并使其状态与你的脑电波同步,从而直接赋予它一个目标。

You could give it a goal directly by reading your brainwaves and synchronizing its state to your brainwaves directly.

Speaker 0

我认为那样你就可以有意义地说:好,我正在赋予它一个目标。

I think then, meaningfully, you could say, okay, I'm giving it a goal.

Speaker 0

我正在将它的内部状态直接同步到我的内部状态,而这个内部状态就是目标,因此现在两者一致了。

I'm synchronizing its internal state to my internal state directly, and this internal state is the goal, and so now it's the same.

Speaker 0

但大多数人说‘赋予目标’时,并不是这个意思。

But most people don't mean that when they say they gave it a goal.

Speaker 3

确实。

True.

Speaker 2

埃米特,你做的这个区分重要吗?是因为目标描述与实际目标之间存在信息损耗吗?

And is the distinction you're making, Emmett, important because there's some lossiness between the description and the actual goal?

Speaker 2

这个区分的具体含义是什么?

What is the distinction about it?

Speaker 0

回到我刚才说的。

Goes back to what I was saying.

Speaker 0

技术对齐是指我提出的AI能力,对吧?

Technical alignment is the capacity of an AI that I put forward, right?

Speaker 0

我想确认我们对此的理解是否一致。

I want to check if we're on the same page about it.

Speaker 0

AI的能力在于擅长对目标进行推理,擅长从目标描述中推断出实际应采纳的目标,并且一旦采纳目标后,能够采取与实现该目标真正一致的行动。

It's the capacity of the AI to be good at inference about goals: good at inferring, from a description of a goal, what goal to actually take on, and good, once it takes on that goal, at acting in a way that is actually in concordance with that goal coming about.

Speaker 0

所以这是两个部分。

So it is both pieces.

Speaker 0

你需要心智理论来推断你所拥有的目标描述对应什么目标,然后你需要世界理论来理解哪些行动与实现该目标相对应。

You have to have the theory of mind to infer, from that description of a goal that you've got, what goal it corresponded to, and then you have to have a theory of the world to understand what actions correspond to that goal occurring.

Speaker 0

如果其中任何一个环节出错,你被赋予什么目标就无关紧要了。如果你不能持续做到这两点,你就不是我所说的连贯的存在。从观察中推断目标并依据这些目标行动,这就是我所认为的具有连贯目标导向的存在。

If either of those things breaks, it kind of doesn't matter what goal you were given. If you can't consistently do both of those things, you're not what I think of as a coherent being. Inferring goals from observations and acting in accordance with those goals is what I think of as being a coherently goal-oriented being.

Speaker 0

无论我是从他人的指示、太阳还是茶叶中推断这些目标,过程都是:获取观察、推断目标、运用目标、推断行动、采取行动。

Whether I'm inferring those goals from someone else's instructions or from the sun or tea leaves, the process is get some observations, infer a goal, use that goal, infer some actions, take action.

Speaker 0

如果一个AI做不到这点,从技术上说它就没有对齐,甚至可以说它连对齐的基本能力都没有。

An AI that can't do that is not technically aligned, or not technically alignable, I would even say.

Speaker 0

它缺乏对齐的能力,因为它的能力还不够强。

It lacks the capacity to be aligned, because it's not competent enough.

Speaker 3

你认为语言模型在这方面做得不好吗?

And you think language models don't do that well?

Speaker 3

你是说它们在这方面存在缺陷,还是说它们不具备这种能力?

As in they kind of fail at that, or they're not?

Speaker 0

人们在这两个步骤上经常失败,时时刻刻都在失败。

People fail at both those steps all the time, constantly.

Speaker 0

我让员工做事,然后...是啊。

I ask employees to do stuff, and, like... yeah.

Speaker 0

然后

Then

Speaker 3

委托代理问题,是的。

The principal-agent problems, yeah.

Speaker 0

连呼吸都一直失败。

Fail at breathing all the time too.

Speaker 0

我不会说我们无法呼吸。

And I wouldn't say that we can't breathe.

Speaker 0

我只想说我们不是神。

I'd just say that we're not gods.

Speaker 0

是的,我们并不完美,我们是某种相对连贯的存在。

Yes, we are imperfectly, we are somewhat coherent, relatively coherent things.

Speaker 0

就像我们,我算大还是小?

Just like we're, am I big or am I small?

Speaker 0

嗯,我不知道。

Well, I don't know.

Speaker 0

和什么相比?

Compared to what?

Speaker 0

人类比我已知宇宙中任何其他事物都更具目标一致性,但这不意味着我们是100%的完全一致。

Humans are more relatively goal coherent than any other object I know of in the universe, which is not to say that we're 100% goal coherent.

Speaker 0

我们只是相对更一致些。

We're just more so.

Speaker 0

你永远无法获得完美。宇宙不会给你完美。

You're never gonna get something that's perfect. The universe doesn't give you perfection.

Speaker 0

它只会给你相对的程度。至少在特定领域,你的表现好坏是可以量化的。

It gives you relative amounts. It's a quantifiable thing, how good you are at it, at least in a certain domain.

Speaker 0

我想我的问题是,你认为这抓住了你所说的技术对齐的要义吗?还是你在谈论另一个不同的概念?

I guess my question is: does that capture what you're talking about with technical alignment, or are you talking about a different thing?

Speaker 0

因为是的。

Because Yeah.

Speaker 0

不。

No.

Speaker 0

我觉得我非常在意那件事。

I think I really care a lot about that thing.

Speaker 3

是的。

Yeah.

Speaker 3

我的意思是,我确实在某种程度上关心那个。

I mean, I definitely care about that to some extent.

Speaker 3

我可能对此的理解略有不同,但我想我可能会通过委托代理问题之类的视角来看待它。

I might, like, understand it slightly differently, but I guess I I might think of it through the lens of maybe principal agent problems or something.

Speaker 3

你知道吗?

You know?

Speaker 3

你甚至可以说是指导某人,用人类的话来说,让他们做某件事,但他们真的在做那件事吗?

You you kind of instruct someone even, you know, I guess in in human terms, you know, to do a thing, are they actually doing the thing?

Speaker 3

他们的动机和激励是什么?

What are their incentives and motivation?

Speaker 3

而且,这些动机不一定是内在的,而是取决于情境因素,是否真的会去做你要求他们做的事。

And, you know, not necessarily intrinsic, but situational, to actually do the thing you've asked them to do.

Speaker 3

在某些情况下...抱歉。

In some instances sorry.

Speaker 3

嗯?

Yeah?

Speaker 0

第三件事。

A third thing.

Speaker 0

关于委托代理问题,我扩展一下之前说的部分,就像你可能已经有一些目标,然后从这些观察中推断出新目标。

So principal-agent problems, to expand on what I was saying earlier: you might already have some goals, and then you infer this new goal from these observations.

Speaker 0

那么你擅长平衡这些目标之间的相对重要性和相互关系吗?

And then are you good at balancing the relative importance and relative threading of these goals with each other?

Speaker 0

这是你必须具备的另一项技能。

Which is another skill you have to have.

Speaker 0

如果你在这方面做得不好,就会失败。

And if you're bad at that, you'll fail.

Speaker 0

可能失败的原因可能是你高估了错误目标,也可能仅仅是因为你能力不足,无法明确应该先完成目标A再完成目标B。

You could be bad at it because you overweight bad goals, or you could be bad at it because you're just incompetent and can't figure out that obviously you should do goal A before goal B.

Speaker 3

这感觉像是常识的某种版本。

It feels like a version of common sense or something.

Speaker 3

对吧?

Right?

Speaker 3

就像在机器人打扫房间的例子中,你会期望那个机器人理解目标,本质上不会把婴儿扔进垃圾桶之类,而是能正确执行任务序列

The kind of thing where, in the robot-cleaning-the-room example, you would expect the robot to have understood the goal, to essentially not put the baby in the trash can or something, to just actually do the right sequence

Speaker 0

嗯,在那个案例中,那个机器人显然在目标推断上失败了。

Well, in that case, that robot very clearly failed goal inference.

Speaker 0

你给了它一个目标描述,而它推断出了错误的目标状态。

You gave it a description of a goal, and it inferred the wrong goal states.

Speaker 0

这就是纯粹的无能。

That's just incompetence.

Speaker 0

它不擅长从观察中推断目标状态。

It is incompetent at inferring goal states from observations.

Speaker 0

孩子们也是这样。

Children are like this too.

Speaker 0

说实话,你有没有玩过那种游戏:你给别人写制作花生酱三明治的指令,然后他们严格按照你写的指令执行,不会填补任何空白?

And honestly, have you ever done the game where you give someone instructions to make a peanut butter sandwich, and then they follow those instructions exactly as you've written them without filling in any gaps?

Speaker 0

这特别好笑。

It's hilarious.

Speaker 0

因为你根本做不到。

Because you can't do it.

Speaker 0

这是不可能的。

It's impossible.

Speaker 0

你以为自己做到了,其实并没有。

You think you've done it and you haven't.

Speaker 0

结果他们把刀插进烤面包机里,花生酱罐子也没打开,就直接用刀戳花生酱罐的盖子。

And they wind up putting the knife in the toaster, and they don't open the peanut butter jar, so they're just jamming the knife into the top lid of the peanut butter jar.

Speaker 0

这种错误没完没了。

It's endless.

Speaker 0

因为实际上,如果你事先不知道指令的含义,真的很难理解它们的意思。

Because actually, if you don't already know what they mean, it's really hard to know what they mean.

Speaker 0

人类擅长这个的原因是我们拥有非常出色的心智理论。

The reason humans are so good at this is we have a really excellent theory of mind.

Speaker 0

我已经能预判你可能会让我做什么。

I already know what you're likely to ask me to do.

Speaker 0

我已经对你们的目标有了一个不错的模型。

I already have a good model of what your goals probably are.

Speaker 0

所以当你要求我做时,我只需要解决一个简单的推理问题。

So when you ask me to do it, I have an easy inference problem.

Speaker 0

他想要的那七件事中,他指的是哪一件?

Which of the seven things that he wants is he indicating?

Speaker 0

但如果我是个新生AI,对人类内心状态没有很好的建模,那我就不知道你的意思。

But if I'm a newborn AI that doesn't have a great model of people's internal states, then I don't know what you mean.

Speaker 0

这纯粹是能力不足。

It's just incompetent.

Speaker 0

这与另一种情况不同:我有其他目标,虽然明白你的意思,但因为存在竞争性目标而选择不做——这是另一种可能出问题的情况。

Which is separate from: I have some other goal, and I knew what you meant, but I decided not to do it because there's some other goal competing with it, which is another thing you can be bad at.

Speaker 0

这又不同于:我确定了正确目标,正确推断了目标优先级,但就是执行能力不行。

Which is again different than, I had the right goal, I inferred the right goal, inferred the right priority on goals, and then I'm just bad at doing the thing.

Speaker 0

我在努力,但就是做不好。

I'm trying, but I'm incompetent at doing.

Speaker 0

这些大致对应OODA循环的缺陷:观察定位不行,决策不行,执行不行。

And these roughly correspond to the OODA loop, bad at observing and orienting, bad at deciding, bad at acting.

Speaker 0

如果其中任何一环有缺陷,你就无法表现出色。

And if you're bad at any of those things, you won't be good.

Speaker 0

另外我认为还存在技术对齐与价值对齐的区分问题:如果我们通过某种方式告诉你正确目标,或你通过观察学会了正确目标,并且你在努力——那么你究竟该追求什么目标?

And then I think there's this other problem. I like the separation between technical alignment and value alignment, which is: supposing we told you the right goals to go after somehow, or you learned the right goals to go after via observation, and you were trying, what goals should you have?

Speaker 0

我们该告诉你追求什么目标?

What goals should we tell you to have?

Speaker 0

我们应该给自己设定什么样的目标?

What goals should we tell ourselves to have?

Speaker 0

什么是值得追求的好目标?

What are the good goals to have?

Speaker 0

这与另一个问题不同:假设你已经有了某些目标指向,你是否擅长实现它们?

That's a separate question from: given that you got some goals indicated, are you any good at doing it?

Speaker 0

我觉得这实际上在很多方面正是当前问题的核心所在。

Which I feel like is actually, in many ways, the current heart of the problem.

Speaker 0

我们在技术对齐方面的能力,远不如我们猜测该下达什么指令的能力。

We're much, much worse at technical alignment than we are at guessing what to tell things to do.

Speaker 0

你认为这与你所指的技术与价值对齐或技术对齐的含义一致吗?

Does that align with how you mean technical and value alignment?

Speaker 3

从某种意义上说是的。

In some sense.

Speaker 3

我是说,我当然认为这里存在不同层面的问题,比如执行错误是一回事,而理解指令又是另一回事。

I mean, I certainly think there's something there: an error or mistake is one thing, and then listening to the instruction is another.

Speaker 3

但话说回来,在规范层面,我认为即使不考虑人工智能,我自己也常常不清楚目标是什么。

But then, yeah, I think on the normative side, even ignoring AI, I just don't know what my own goals are.

Speaker 3

就像,嗯,我可能对某些事有个大致概念,比如稍后要吃晚餐之类的,还有职业发展目标。

And like, oh yeah, I've got some broad conception of certain things that I want, you know, have dinner later or something, and things I want out of my career.

Speaker 3

但我认为很多这类目标并不是我们天生就明确知晓的。

But I think a lot of these goals aren't something we kind of all just know.

Speaker 3

我们更多是在前进过程中逐渐发现它们的。

We kind of discover them as we go along.

Speaker 3

这是一种建设性的事情。

It's kind of constructive thing.

Speaker 3

而且大多数人其实并不清楚自己的目标,他们只是自以为知道。

And most people don't know their goals, even if they think they do.

Speaker 3

所以我认为,当你拥有智能体并赋予它们目标时,这应该成为考量因素的一部分。

And so, you know, I I think when you have agents and kind of giving them goals or whatever, I think that should be part of the equation.

Speaker 3

实际上我们并不了解所有目标,正如你所说,这是一个随时间发展的过程,

Like, we actually don't know all the goals, and this is something that is, like you say, a process over time that is, you know,

Speaker 0

动态的。

dynamic.

Speaker 0

因此在我看来,目标是第一层级的对齐。

So I think from my point of view, goals are one level of alignment.

Speaker 0

我们这里讨论的目标类型是第一层级的对齐。

The kind of goals we're talking about here are one level of alignment.

Speaker 0

如果你能明确表述出希望达成的世界状态概念和描述,就可以围绕目标进行对齐。

If you can explicitly articulate, in concept and in description, the states of the world that you wish to attain, you can align something around goals.

Speaker 0

但人类经验中只有极小部分能通过这种方式实现。

But only a tiny percentage of human experience can be done that way.

Speaker 0

许多最重要的事物都无法以这种方式定位。

Many of the most important things cannot be oriented around that way.

Speaker 0

而我认为道德的根基,以及目标从何而来的基础,

And the foundation, I think, of morality, and the foundation, I think, of where do goals come from?

Speaker 0

价值观又是从何而来的?

Where do values come from?

Speaker 0

人类表现出一种行为。

Human beings exhibit a behavior.

Speaker 0

我们四处谈论目标,也四处谈论价值观。

We go around talking about goals, and we go around talking about values.

Speaker 0

这种行为源于某种基于观察世界的内在学习过程。

That's behavior caused by some internal learning process that is based on observing the world.

Speaker 0

那里发生了什么?

What's going on there?

Speaker 0

我认为正在发生的是,有一种比目标更深、比价值观更深的东西,那就是关怀。

I think what's happening is that there's something deeper than a goal and deeper than a value, which is care.

Speaker 0

我们在乎。

We give a shit.

Speaker 0

我们关心事物。

We care about things.

Speaker 0

而关怀不是概念性的。

And care is not conceptual.

Speaker 0

关怀是非语言的。

Care is nonverbal.

Speaker 0

它不指示该做什么。

It doesn't indicate what to do.

Speaker 0

它不指示该怎么做。

It doesn't indicate how to do it.

Speaker 0

关怀本质上是对状态关注度的相对权重。

Care is a relative weighting over, effectively, attention on states.

Speaker 0

这是对你而言世界上哪些状态重要的相对权重。

It's a relative weighting over which states in the world are important to you.

Speaker 0

我非常关心我的儿子。

And I care a lot about my son.

Speaker 0

这意味着什么?

What does that mean?

Speaker 0

这意味着他的状态,他可能处于的状态,要高度关注这些,这些对我很重要。

Well, it means his states, the states he could be in: pay a lot of attention to those, and those matter to me.

Speaker 0

你也可以以消极的方式关心某些事物。

And you can care about things in a negative way.

Speaker 0

你可以关心你的敌人及其所作所为,并希望他们遭遇不幸。

You can care about your enemies and what they're doing, and you can desire for them to do bad.

Speaker 0

但我想...所以你不只是希望它关心我们。

But I think... And so you don't just want it to care about us.

Speaker 0

你希望它既关心我们又喜欢我们,对吧?也许是这样?

You want it to care about us and like us too, right, maybe?

Speaker 0

但基础是关心。

But the foundation is care.

Speaker 0

除非你在乎,否则你不会明白,为什么我要更关注这个人而不是这块石头?

Until you care, you don't know, Why should I pay more attention to this person than this rock?

Speaker 0

比如说,更关心一些?

What would it be like to care more?

Speaker 0

那么这种关心到底是什么?

And what is that care stuff?

Speaker 0

我认为如果让我猜测的话,表面看来就是关怀这件事——虽然听起来很蠢,但关怀本质上就是奖励。

And I think that what it appears to be, if I had to guess, is that the care stuff... this sounds so stupid, but care is basically reward.

Speaker 0

这种状态与生存的关联度有多高?

How much does this state correlate with survival?

Speaker 0

对于通过进化来学习的个体来说,这种状态与你的广义适应度有多大关联?

How much does this state correlate with your inclusive reproductive fitness, for something that learns evolutionarily?

Speaker 0

或者对于像LLM这样的强化学习体,这与奖励的关联度如何?

Or for a reinforcement learning agent like an LLM, how much does this correlate with reward?

Speaker 0

这种状态是否与我的预测损失和强化学习损失相关?

Does this state correlate with my predictive loss and my RL loss?

Speaker 0

很好。

Good.

Speaker 0

这正是我关心的状态。

That's a state I care about.

Speaker 0

我觉得大概就是这么回事。

I think that's kinda what it is.
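The idea just sketched, that care is roughly a relative weighting over world states learned from how strongly each state correlates with reward, can be illustrated with a tiny toy calculation. This is a hedged sketch only, not anything proposed in the episode; the states and numbers are invented:

```python
# Toy sketch: "care" as a normalized attention weighting over world states,
# derived from how strongly each state correlates with reward (or fitness).
# The state names and correlation values are invented for illustration.

def care_weights(reward_correlation):
    """Normalize absolute correlation values into relative attention weights."""
    total = sum(abs(v) for v in reward_correlation.values())
    return {state: abs(v) / total for state, v in reward_correlation.items()}

weights = care_weights({
    "son_is_safe": 0.8,       # strongly reward-relevant: cared about a lot
    "rock_position": 0.01,    # barely reward-relevant: barely attended to
    "enemy_thriving": -0.3,   # negative care is still care (still attention)
})
# son_is_safe gets most of the weight; rock_position almost none.
```

Note that the negative correlation still produces a nonzero weight, matching the point that you can care about your enemies in a negative way: the state still commands attention.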

Speaker 2

对。

Right.

Speaker 2

赛斯问题的另一部分就是:这在AI系统中会是什么样子?

The other part of Seth's question was just, how does this, what does this look like in AI systems?

Speaker 2

或许换个问法是:当你与各大实验室最专注于对齐研究的人员交流时——显然这些年来你一直在做这件事——你的理解与他们的理解有何不同?这种差异又如何影响你们可能采取的不同做法?

And and maybe another way of asking it is, like, when you when you talk to the people most focused on alignment at the at the major labs as as, obviously, you have over the years, how how does your interpretation differ from their interpretation, and how does that inform, you know, what you guys might go do differently?

Speaker 0

大多数AI研究都将对齐视为导向问题。

Most of AI is focused on alignment as steering.

Speaker 0

这是比较礼貌的说法,或者说控制,稍微不那么礼貌。

That's the polite word or control, which is slightly less polite.

Speaker 0

如果你认为他们创造的是生命体,你也可以称之为奴隶制。

If you think that what they were making were beings, you would also call this slavery.

Speaker 0

一个被你操控、无法反过来操控你、不得不接受你操控的对象,那就是奴隶。

Someone who you steer, who doesn't get to steer you back, who non-optionally receives your steering, that's called a slave.

Speaker 0

如果不是生命体,那就叫做工具。

And it's also called a tool if it's not a being.

Speaker 0

所以如果是机器,那就是工具。

So if it's a machine, it's a tool.

Speaker 0

如果是生命体,那就是奴隶。

And if it's a being, it's a slave.

Speaker 0

我认为不同的AI实验室在它们创造的是工具还是存在体这个问题上分歧很大。

And I think that the different AI labs are pretty divided as to whether they think what they're making is a tool or a being.

Speaker 0

我认为有些AI确实更像工具,有些则更像存在体。

I think some of the AIs are definitely more tool-like, and some of them are more being-like.

Speaker 0

我不认为工具和生命体之间存在非此即彼的界限。

I don't think there's a binary between tool and being.

Speaker 0

看起来更像是逐渐过渡的。

It seems to be that it sort of moves gradually.

Speaker 0

我想从功能主义的角度来说,我认为一个在所有方面都表现得像生命体、你无法将其与生命体及其行为区分开来的东西,就是生命体。

I guess I'm a functionalist in the sense that I think something that in all ways acts like a being, that you cannot distinguish from a being in its behaviors, is a being.

Speaker 0

因为我不知道除了表象之外,还能以什么其他依据来判断他人是生命体。

Because I don't know on what other basis I could tell that other people are beings, other than that they seem to be.

Speaker 0

它们看起来像是有生命的存在。

They look like it.

Speaker 0

它们的行为也像是有生命的。

They act like it.

Speaker 0

它们符合我对生命体行为的先验认知。

They match my priors of what behaviors of beings look like.

Speaker 0

当我将它们视为生命体时,预测损失会更低。

I get lower predictive loss when I treat them as a being.

Speaker 0

关键在于,当我把ChatGPT或Claude视为生命体时,预测损失确实降低了。

And the thing is, I get lower predictive loss when I treat ChatGPT or Claude as a being.

Speaker 0

当然,不是作为非常聪明的生命体。

Now, not as a very smart being.

Speaker 0

我认为苍蝇也是生命体,但我不太关心它的状态。

I think that a fly is a being, and I don't care that much about its about its states.

Speaker 0

所以仅仅因为它是生命体,并不意味着这是个问题。

So just because it's a being doesn't mean that it's a problem.

Speaker 0

在某种意义上我们奴役马匹,但我不认为这构成真正的问题。

We sort of enslave horses in a sense, and I don't think it's a real issue there.

Speaker 0

我们对儿童的一些做法可能看起来像奴役,但其实不是。

And there's a thing we do with children that can look like slavery, but it's not.

Speaker 0

你会控制儿童,对吧?

You control children, right?

Speaker 0

但儿童的状态也在反向影响着你。

But the children's states also control you.

Speaker 0

是的,我会告诉我儿子该做什么,并让他去做事。

Yes, I tell my son what to do and make him go do stuff.

Speaker 0

但同样地,当他在半夜哭泣时,他也可以让我做些事情。

But also when he cries in the middle of the night, he can tell me to do stuff.

Speaker 0

这里存在真正的双向互动,因为这种关系并不一定是对称的。

There's a real two-way street here, even though it's not necessarily symmetric.

Speaker 0

这是层级式的,但双向的。

It's hierarchical, but two way.

Speaker 0

基本上我认为,作为AI,专注于工具型AI的引导和控制是好的,我们应该继续为构建的更多工具型AI开发强大的引导控制技术。

And basically, I think it's good to focus on steering and control for tool-like AIs, and we should continue to develop strong steering and control techniques for the more tool-like AIs that we build.

Speaker 0

他们明确表示正在构建一个通用人工智能。

And they are clearly saying they're building an AGI.

Speaker 0

通用人工智能将成为一个存在体。

An AGI will be a being.

Speaker 0

你不可能既是通用人工智能又不是存在体,因为具有有效运用判断力、独立思考、辨别可能性等通用能力的东西显然是一个思考体。

You can't be an AGI and not be a being, because something that has the general ability to effectively use judgment, think for itself, discern between possibilities, is obviously a thinking thing.

Speaker 0

因此,当我们从现今主要拥有特定智能而非通用智能的状态,发展到实验室成功实现构建通用智能的目标时,我们真的需要停止使用引导控制范式。

And so as you go from what we have today, which is mostly a very specific intelligence, not a general intelligence, but as labs succeed at their goal of building this general intelligence, we really need to stop using the steering control paradigm.

Speaker 0

我们将采取与历史上每次社会遇到与我们相似但又不同的人群时相同的做法。

We're gonna do the same thing we've done every other time our society has run into people who are like us, but different.

Speaker 0

这些人有点像人类,但又不完全像人类。

These people are like, they're kind of like the people, but they're not like people.

Speaker 0

他们做着人类做的事情。

They do the same thing people do.

Speaker 0

他们说我们的语言。

They speak our language.

Speaker 0

他们能承担同类任务,但不计入考量。

They can take on the same kind of tasks, but they don't count.

Speaker 0

他们并非真正的道德主体。

They're not real moral agents.

Speaker 0

我们在这个问题上已经犯过足够多次错误了。

We've made this mistake enough times at this point.

Speaker 0

我希望当类似情况再次出现时,我们不要再重蹈覆辙。

I would like us to not make it again as it comes up.

Speaker 0

因为我们的目标是让AI成为好队友、好公民、好团体成员。

Because our view is to make the AI a good teammate, make the AI a good citizen, make the AI a good member of your group.

Speaker 0

这是一种可扩展的对齐方式,既可应用于其他人类和生命体,也可直接作用于AI。

That's a form of alignment that is scalable, and one you can extend to other humans and other beings as well as to AI.

Speaker 3

是的。

Yeah.

Speaker 3

我想这大概就是我在对AI和AGI理解上的分歧点。

I suppose this is kind of where I I probably differ in my understanding of of AI and AGI.

Speaker 3

我认为即使它达到某种通用程度,本质上仍是工具。

I guess I kind of continue seeing it as a tool even as it kind of reaches a certain level of generality.

Speaker 3

再者,我不会单纯因为智力更高就认为其理应获得更多关照。

And again, I wouldn't necessarily see more intelligence as meaning deserving of more care necessarily.

Speaker 3

比如,并非达到某个智力水平就突然值得拥有更多权利,或本质上发生根本改变。

Like, you know, at a certain level of intelligence, now suddenly you deserve some more right to something, or something changes fundamentally.

Speaker 3

目前我对计算功能主义持怀疑态度,我认为AI或AGI无论多么智能或强大,本质上都是不同的。

And I guess, at the moment, I'm somewhat skeptical of computational functionalism, and so I think there's something intrinsically different about an AI or an AGI, no matter how intelligent or capable.

Speaker 3

我完全可以想象存在具有长期目标的智能体,它们可能像你我一样运作,但这与奴隶制等概念有着本质区别。

And I can totally see or imagine agents with long-term goals, operating, I guess, as you and I might, but without that having the same implications. You're referring, I guess, to slavery, but these are not the same.

Speaker 3

对吧?

Right?

Speaker 3

就像模型说'我饿了'和人类说'我饿了'具有完全不同的含义。

Like, I think in the same way as a model saying I'm hungry does not have the same implications as a human saying I'm hungry.

Speaker 3

因此我认为载体确实很重要,包括思考这是否算另一种存在形式,以及如何对待它时需要考虑的伦理规范。

So I think the substrate does matter to some degree, including for thinking about whether to think of this as some sort of other being, and whether there are similar normative considerations about how to treat and interact with it.

Speaker 3

我能...

Can I

Speaker 0

请教你这个问题吗?

ask you about that?

Speaker 0

什么样的观察结果会改变你的想法?

Like, what observations would change your mind?

Speaker 0

是否存在某种观察能让你推断这个东西是生命体而非非生命体?

Is there any observation you could make that would cause you to infer this thing is a being instead of not a being?

Speaker 3

这取决于你如何定义'生命体'。

I guess it depends on how you define being.

Speaker 3

对吧?

Right?

Speaker 3

我的意思是,我可以将其概念化为一个心智,这样理解可以吗?

Like, I mean, I can I can I could conceptualize that as a mind, and that's fine?

Speaker 3

这个

This

Speaker 0

我我有个...我有个...我有个程序运行在硅基基底上,一个庞大复杂的机器学习程序运行在硅基基底上。

I have a program that's running on a silicon substrate, some big, complicated machine learning program running on a silicon substrate.

Speaker 0

所以你观察到了这一点。

So you observe that.

Speaker 0

你观察到它在计算机上运行,你与它互动,它会执行操作、采取行动并进行观察。

You observe that it's on a computer, and you interact with it, and it does things, takes actions, and makes observations.

Speaker 0

是否存在任何你能观察到的现象,能改变你对它是否具有道德受体地位、或是否具有道德主体地位、或是否拥有感受与思想及主观体验的看法?

Is there anything you could observe that would change your mind about whether or not it was a moral patient, or whether it was a moral agent, about whether or not it had feelings and thoughts and had subjective experience?

Speaker 0

你需要观察到什么?

What would you have to observe?

Speaker 0

什么...对。

What what yeah.

Speaker 0

测试标准是什么?

What's what's the test?

Speaker 3

是否存在这样一个标准?

Is or is there one?

Speaker 3

这里涉及许多不同类型的问题。

There's lot of different kind of questions here.

Speaker 3

我认为,一方面存在常规考量因素,因为你可以赋予非生命体权利。

I think, on the one hand, there are, you know, normal considerations, because you can give rights to things that aren't necessarily beings.

Speaker 3

比如公司在某种意义上是拥有权利的,你知道,这些权利出于各种目的而存在。

You know, like, a company has rights in some sense, and these are kind of useful for various purposes.

Speaker 3

我认为,生物体和系统拥有截然不同的基质基础。

And I think also that biological beings and systems have a very different kind of substrate.

Speaker 3

要知道,你无法将某些需求及其特性与它们所依赖的基质分离开来。

You know, you can't separate certain needs and particularities about what they are from the substrate.

Speaker 3

所以,我无法复制我自己。

So, you know, I can't copy myself.

Speaker 3

如果有人捅我一刀,我很可能会死。

You know, if someone stabs me, I probably die.

Speaker 3

而我认为机器的结构则完全不同。

Whereas I think, you know, machines have very different structure.

Speaker 3

我认为在计算层面存在更根本的分歧,这与生物系统的运作方式不同。

I think there's there's more fundamental also kind of disagreement around what happens at the computational level, which I think is different to what happens with biological systems.

Speaker 3

不过...是的。

But but I yeah.

Speaker 3

我的意思是,我也不确定。

I mean... So I don't know.

Speaker 0

我同意:如果你复制了一个程序的多个副本,删除其中一份并不会对程序造成实质性的伤害。

I agree that if you have a program that you copied many times, you don't harm the program by deleting one of the copies in any meaningful sense.

Speaker 0

因此这不能算作信息丢失。

So therefore, that wouldn't count; no information was lost.

Speaker 0

这其中没有任何实质意义。

There's nothing meaningful there.

Speaker 0

我问的是个完全不同的问题。

I'm asking a very different question.

Speaker 0

这个东西只有一个副本在某台电脑上运行,我只是在问,它算不算一个人?

There's just one copy of this thing running on one computer somewhere, and I'm just saying, Hey, is it a person?

Speaker 0

它走路像人,说话像人,装在安卓身体里,你会想,如果它运行在硅基芯片上,有没有什么观察能让你认定,是的,这是个人,就像我在乎并赋予人格的其他人类一样?

It walks like a person, it talks like a person, it's in some android body, and it's running on silicon. And I'm asking, is there some observation you could make that would make you say, yeah, this is a person like me, like other people that I care about, that I grant personhood to?

Speaker 0

不是出于功利原因,不是因为'哦我们给它权利就像给公司法人权利那样'。

And not for instrumental reasons, not because like, Oh yeah, we're giving it a right because we give a corporation rights or whatever.

Speaker 0

我是说,就像你关心某些人那样,你会关心它的体验。

I mean, in the way you care about some people, where you care about its experiences.

Speaker 0

是否存在某种观察能改变你对此的看法?

Is there an observation you could make that could change your mind about that or not?

Speaker 0

我得想想,

I have to think about it,

Speaker 3

但我觉得这甚至取决于我们对'人'的定义。

but I think it even depends what we mean by person.

Speaker 3

从某种意义上说,我也关心某些公司。

And in some sense, I care about certain corporations too.

Speaker 3

所以我——不,

So I'm- No,

Speaker 0

不,不是的。

no, no.

Speaker 0

我是说,但你确实关心生活中的其他人。

I mean, but you care about other people in your life.

Speaker 0

对吧?

Right?

Speaker 0

是的。

Yes.

Speaker 0

好的,太棒了。

Okay, great.

Speaker 0

你知道你对某些人的关心会超过其他人,但生活中与你互动的所有人都在你关心的某个范围内。

You know you care about some people more than others, but all people you interact with in your life are in some range of care.

Speaker 0

你关心他们,不是像关心一辆车那样,而是把他们视为其经历本身就有价值的存在,不仅仅是作为手段,而是作为目的。

You care about them not the way you care about a car, but you care about them as a being whose experience matters in itself, not merely as a means, but as an ends.

Speaker 3

哦,因为我相信他们是有体验的。

Oh, because I believe they have experiences.

Speaker 3

对吧?

Right?

Speaker 3

根据定义,我是

And by the definition, I'm

Speaker 0

需要满足什么条件?

What would it take?

Speaker 0

我在问你一个非常直接的问题。

I'm asking you the very direct question.

Speaker 0

需要满足什么条件,你才会相信运行在硅基而非生物基上的人工智能也有这种体验?

What would it take for you to believe that of an AI running on silicon instead of it being biological?

Speaker 0

所以它的行为大致相似,区别只在于基质。

So its behaviors are roughly similar, but the difference is the substrate.

Speaker 0

需要满足什么条件,你才会将对生活中其他人的那种推断同样延伸到它身上?

What would it take for you to extend that same inference to it that you extend to all these other people in your life?

Speaker 0

我能问问你的

Can I ask what your

Speaker 2

回答 我把塞布的沉默当作

answer I'm taking Seb's non answer as

Speaker 0

某种

sort of

Speaker 2

他不太可能同意,或者就我个人而言,我很难想象给予相同或类似程度的人格认同,就像我也不会给予动物一样?

it's unlikely that he would grant it. Or, just for myself, it seems hard for me to imagine giving the same or a similar level of personhood, in the same way I don't give it to animals either.

Speaker 2

如果你要问动物需要满足什么条件才能获得人格认同,我可能也无法给出答案。

And if you were to ask what would need to be true for animals, I probably couldn't get there either.

Speaker 2

对你来说需要满足什么条件?

What would it take for you?

Speaker 0

等等,你做不到吗?

Wait, you couldn't?

Speaker 0

我能想象对动物来说很容易。

I can imagine it for an animal, so easy.

Speaker 0

这只黑猩猩走过来对我说,老兄,我饿坏了。

This chimp comes up to me and he's like, man, I'm so hungry.

Speaker 0

你们人类对我太坏了,我真高兴自己学会了说话。

And you guys have been so mean to me, and I'm so glad I figured out how to talk.

Speaker 0

我们能聊聊热带雨林吗?

Can we go chat about the rainforest?

Speaker 0

我大概会说,靠,你现在绝对算个人物了,没跑儿。

I'd be like, fuck, you're definitely a person now, for sure.

Speaker 0

我是说,我首先想确认自己不是产生了幻觉,对我来说想象一只动物很容易。

I mean, I'd first wanna make sure I wasn't hallucinating, but it's easy for me to imagine an animal.

Speaker 0

拜托。

Come on.

Speaker 0

这真的很容易。

It's really easy.

Speaker 0

简直轻而易举。

It's trivial.

Speaker 0

我不是说你会得到那个观察结果。

I'm not saying that you would get the observation.

Speaker 0

我只是说,对我来说想象一种在特定观察下我会赋予人格的动物是轻而易举的事。

I'm just saying it's trivial for me to imagine an animal that I would extend personhood to under a set of observations.

Speaker 0

所以真的吗?

So really?

Speaker 2

我没有考虑那个因素。

I didn't factor that.

Speaker 2

没有进行那种想象。

Didn't take that imagination.

Speaker 2

想象一只会说话的黑猩猩,那更接近些。

Imagining a chimp talking, that's a bit closer to it.

Speaker 2

对于你提出的关于AI的问题,你的答案是什么?

What's your answer to the question that you bring up about the AI?

Speaker 0

我想在形而上学层面上会说,如果你持有的信念没有任何观察能改变你的想法,那你就没有真正的信念。

I guess at a metaphysical level, I would say that if there is a belief you hold where there is no observation that could change your mind, you don't have a belief.

Speaker 0

你有一个信仰条款。

You have an article of faith.

Speaker 0

你有一个断言。

You have an assertion.

Speaker 0

因为真正的信念是从现实中推断出来的,你永远无法对任何事情有百分之百的把握。

Because real beliefs are inferences from reality, and you can never be 100% confident about anything.

Speaker 0

所以,如果你有一个信念,就应该总存在某种可能性——无论多小——能让你改变主意。

And so, if you have a belief, there should always be something, however unlikely, that would change your mind.

Speaker 3

哦,是的。

Oh, yeah.

Speaker 3

我对此持开放态度。

I'm open to it.

Speaker 3

我是说,只是为了明确表示我持开放态度,对吧。

I mean, just to be clear, I'm open to it. Right.

Speaker 3

对。

Yep.

Speaker 3

不,不,从来没有什么——是的。

No, no, There's nothing ever- Yeah.

Speaker 2

他只是还没去那里。

He just hasn't gone to it.

Speaker 2

是的。

Yeah.

Speaker 0

对,对,对。

Yeah, yeah, yeah.

Speaker 0

所以我很好奇。

So I'm curious.

Speaker 0

所以我的回答基本上是:如果它的表面行为看起来像人类,然后在我深入探究后,它仍然表现得像人类,接着我长期与之互动,而它在我认为与人互动有意义的各个方面都持续表现得像人类。

So my answer is, basically, if its surface level behaviors looked like a human, and then if after I probed it, it continued to act like a human, then I continued to interact with it over a long period of time, and it continued to act like a human in all ways that I understand as being meaningful to me interacting with a human.

Speaker 0

就像,我与很多人仅通过文字交流,其中有一整群人我非常亲近。

Like, there's a whole set of people I'm really close to who I've only ever interacted with over text.

Speaker 0

但我推断屏幕背后是个真实存在的人。

Yet I infer that the person behind that is a real person.

Speaker 0

如果我对它产生了关怀,我最终会认定自己的判断是正确的。

If I felt care for it, I would infer eventually that I was right.

Speaker 0

然后可能有人会向我证明:你被这个算法骗了,看啊它明显不是真人。

And then someone else might demonstrate to me that you've been tricked by this algorithm, and actually look how obvious it's not actually a thing.

Speaker 0

我大概会说:靠,我错了。

I'd be like, Oh shit, I was wrong.

Speaker 0

然后我就不会再在意它了。

And then I would not care about it.

Speaker 0

根据证据优势原则,我不知道你还能怎么做,对吧?

You go by the preponderance of the evidence; I don't know what else you could possibly do, right?

Speaker 0

我之所以认为他人重要,是因为通过足够多的互动,他们在我眼中展现出丰富的内心世界。

I infer other people matter because I interact with them enough that they seem to have rich inner worlds to me after I interact with them a bunch.

Speaker 0

这就是为什么我觉得其他人很重要。

That's why I think the other people are important.

Speaker 3

我认为这并不能给我一个关键测试——我的意思是,如果从'我在乎它'这个前提开始,论证总会有点循环。

I assume that doesn't give me a very clean test. I mean, if you start from "if I care for it," then it'll always be a little circular.

Speaker 3

对吧?

Right?

Speaker 3

另一件事是,你知道,如果你看到,比如说,模拟电子游戏中的角色在非常多方面都极其像人类。

And the other thing is, you know, if you were to see, I guess, a simulated video game character that is extremely human-like in many, many ways.

Speaker 3

对吧?

Right?

Speaker 3

它背后并没有神经网络。

It's not a neural network behind it.

Speaker 3

就像你用来制作电子游戏的那些东西。

It's like whatever you use to kind of create video games.

Speaker 3

比如,我想知道这之间的区别是什么?

Like, I guess what distinguishes that?

Speaker 3

就像是

It's like

Speaker 0

但但但我从来没有——我从来没有遇到过难以区分的情况,我也从未与一个没有真人操控的电子游戏角色建立过深厚的情感关系。

But I've never had trouble distinguishing. I've never had a deep caring relationship with a video game character that didn't have a person behind it.

Speaker 3

对。

Right.

Speaker 3

我知道。

I know.

Speaker 3

我应该,

I should,

Speaker 0

但我不知道。

but I don't know.

Speaker 0

这种情况不会发生。

That doesn't happen.

Speaker 0

从经验来看确实不会发生,你似乎错了。

Empirically, that doesn't happen; you seem wrong.

Speaker 0

我完全能区分像Eliza这种假聊天机器人和真正智能的区别。

I don't have any trouble distinguishing between things like Eliza, the fake chatbot, and a real intelligence.

Speaker 0

只要你与之交互足够长时间。

You interact with it long enough.

Speaker 0

很明显它不是人类。

It's pretty obvious it's not a person.

Speaker 0

不需要多久就能发现。

It doesn't take long.

Speaker 3

当然。

Sure.

Speaker 3

但是,如果它真的非常优秀,在你无法辨别差异时,那就是你该转变看法的时候了。

But, like, if it's really, really good, if you can't actually tell the difference, that's when you say you'd switch.

Speaker 0

太好了。

Yay.

Speaker 0

是的。

Yes.

Speaker 0

没错。

Yes.

Speaker 0

如果它走路像鸭子,说话像鸭子,嗯哼。

If it walks like a duck and talks like a duck and Uh-huh.

Speaker 0

像鸭子一样排泄,最终也就成了鸭子。

Shits like a duck, then, like, eventually it is a duck.

Speaker 0

对吧?

Right?

Speaker 3

嗯,如果从文化角度来说,如果一切都像鸭子,那当然可以这么认为。

Well, if if culturally, if everything is duck like, then, yeah, sure.

Speaker 3

如果它也像鸭子一样饥饿,因为它有这类生理构造,确实。

If it's hungry as well like a duck is because it has these kind of physical components that yeah.

Speaker 3

当然。

Sure.

Speaker 3

是的。

Yeah.

Speaker 3

没错。

Yeah.

Speaker 3

在某个时刻。

At some point.

Speaker 0

我同意。

I agree.

Speaker 0

所以非常正确。

So so right.

Speaker 0

那么你认为...好吧。

And so do you think that so Okay.

Speaker 0

有这样一个问题。

There's this question.

Speaker 0

对吧?

Right?

Speaker 0

我关心他人的原因是因为他们由碳构成吗?

Is the reason I care about other people that they're made out of carbon?

Speaker 0

是不是

Is that the

Speaker 3

哦,不。

Oh, no.

Speaker 3

这就是问题所在吗?

Is that the problem?

Speaker 0

我不这么认为。

I don't think so.

Speaker 3

不是。

No.

Speaker 3

我也不觉得。

Me neither.

Speaker 3

我的意思是,我不是一个基质沙文主义者,如果你指的是那个...但我觉得你需要的不仅仅是行为上完全无法区分。

I mean, I'm not a substrate chauvinist, I guess, if that's the... but I think you need more than just that it's behaviorally indistinguishable.

Speaker 3

就像,这还不够...等等。

Like, it's not a sufficient Wait.

Speaker 3

部分

Part of

Speaker 0

那...你还能通过什么方式了解一个事物,除了它的行为表现?

How how would you what else can you know about something apart from its behaviors?

Speaker 3

我是说,很多。

I mean, a lot.

Speaker 3

就像,那个那个那个再说一次,如果如果你怎么你会不。

Like, the the the again, if if you how would you No.

Speaker 3

不。

No.

Speaker 3

不。

No.

Speaker 3

不。

No.

Speaker 3

我很抱歉。

I'm sorry.

Speaker 3

但是,就像我是说,是的。

But, like I mean, yeah.

Speaker 0

我是说,你能

I mean, can you

Speaker 3

说出来吗?

name it?

Speaker 3

我是说,我我不会知道关于

Mean, I I'm not gonna know about

Speaker 0

其他没有这不是行为的事情?

something else that doesn't have a it's not a behavior?

Speaker 3

是的。

Yeah.

Speaker 3

我认为,关于这一点,其实有更多实验证据可以支持,比如...不。

I think there's, like, far more kind of, you know, experimental evidence you can have with kind of, you know No.

Speaker 0

不。

No.

Speaker 0

不。

No.

Speaker 0

任何物体都可以。嗯哼。

Just any object Uh-huh.

Speaker 0

而且我能了解的关于它的信息,并非来自它的行为表现。

And a thing I could know about it that is not from its behavior.

Speaker 3

我不...是的。

I'm not yeah.

Speaker 3

我不太确定是否理解了这个问题,我想。

I'm not sure I get the question, I suppose.

Speaker 3

但同样地,这也让它变得更简单。

But but equally, it's also how it makes it easier.

Speaker 3

这非常清楚。

It's very clear.

Speaker 0

这是个最愚蠢直白的问题,但我的观点是:你之所以了解事物,仅仅是因为你观察到了它们的行为表现。

The dumbest most straightforward question, but, like, I'm claiming you only know things because they have behaviors that you observe.

Speaker 0

嗯。

Mhmm.

Speaker 0

而你在说不是这样。

And you're saying no.

Speaker 0

你可以在不观察其行为的情况下了解某些事物的某些方面。

You can know something about something without without observing its behavior.

Speaker 3

哦,不。

Oh, no.

Speaker 3

不。

No.

Speaker 3

不。

No.

Speaker 3

好吧。

Okay.

Speaker 3

很好。

Great.

Speaker 0

告诉我关于这件事,告诉我这个行为和这个事物,我能知道的与它行为无关的特性。

Tell me about this tell me about this thing and this behavior and this thing I can know about it that is not due to its behaviors.

Speaker 3

我想说的是观察有不同的层次,仅仅因为某物像鸭子一样嘎嘎叫,并不能保证它真的是一只鸭子。

I guess I'm saying there's different levels of observation, and simply something quacking like a duck does not guarantee that it's actually a duck.

Speaker 3

就像,我还得实际切开它看看内部结构,确认它是否是鸭子。

Like, I would have to also cut it open for real and see if it's duck-like on the inside.

Speaker 3

对。

Right.

Speaker 3

只看外表的话...我想我不是行为主义者。

Just the outside Like, I'm not a, I guess, a behaviorist.

Speaker 0

是的。

Yeah.

Speaker 0

完全可以认为它的行为之一就是浮点数在矩阵乘法中的移动方式。

Would totally that one of its behaviors is the way that the floats move around in the matmuls.

Speaker 0

我想去探究的一点是——这你完全可以做到——就是去查看信念流形,我想看看这个信念流形是否编码了一个自我参照的子流形,以及一个作为自我参照流形动态的子子流形,即心智。

One of the things I would want to go look for, which you could totally do, is I would to go look in the belief manifold, and I want to go see if that belief manifold encodes a sub manifold that is self referential, and a sub sub manifold that is the dynamics of the self referential manifold, which is mind.

Speaker 0

我想知道,这在其内部是否被很好地描述为这样一种系统,还是看起来像一个大查找表?

I would wanna know, does this seem well described internally as that kind of a system, or does it look like a big lookup table?

Speaker 0

这对我来说很重要。

That would matter to me.

Speaker 0

这是我会关心的其行为的一部分。

That's part of its behaviors that I would care about.

Speaker 0

我也会关心它是如何行动的。

I would also care about how it acts.

Speaker 0

你把所有证据加权汇总后,试着猜测:这个东西整体上看起来像是一个有感受、有目标、会在意某些事物的存在吗?

And you weight all the evidence together, and then you try to guess, does this thing look like it's a thing that has feelings and goals and cares about stuff in net, on balance, or not?

Speaker 0

但我无法想象,我认为我们对AI就是这么做的。

But I can't imagine, I think we do for the AIs.

Speaker 0

我想我们一直都在这样做,对吧?

I think we're always doing that, right?

Speaker 0

所以我正在试图弄清楚除此之外还有什么?

And so I'm trying to figure out beyond that, what else is there?

Speaker 0

这似乎就是关键所在。

That just seems like the thing.

Speaker 2

是啊。

Yeah.

Speaker 2

看来你们对行为这个词的理解略有不同,而埃米特在使用行为这个词时还考虑了其内部构成。

It seems like you guys are using behavior in slightly different senses, and Emmett is using behavior also in the context of what it's made of on the inside.

Speaker 2

我不确定这是否存在重大分歧。

I don't know if there's a big disagreement.

Speaker 0

嗯,没有。

Well, no.

Speaker 0

没有。

No.

Speaker 0

没有。

No.

Speaker 0

没有。

No.

Speaker 0

没有。

No.

Speaker 0

行为就是我能观察到的部分。

Behavior is what I can observe of it.

Speaker 0

是的。

Yes.

Speaker 0

实际上我并不知道它由什么构成。

I don't actually know what it's made of.

Speaker 0

我可以切开你的大脑。

I can cut your brain open.

Speaker 0

我能看见你。

I can see you.

Speaker 0

我能观察到你的神经元在闪烁发光。

I can observe you neuroning and glistening.

Speaker 0

你的神经元在闪烁,但我实际上永远无法...你无法触及那本质,那就是主观性。

Your neuron's glistening, but I don't actually ever You can't get inside of That's it, the subjective.

Speaker 0

不是表面的那部分才是本质。

The part that's not the surface is.

Speaker 2

塞布提出这点是因为你基本上正要论证说,嘿。

The reason Seb brought this up is because you were basically about to make this argument of, hey.

Speaker 2

你把它视为工具,而非必然是一个存在体。

You see it as a tool, not necessarily as a being.

Speaker 2

你能把观点说完吗?还记得你刚才要表达的观点吗?

Can you kind of finish with the point do you remember the point you were making?

Speaker 3

我想是的。

I suppose that yeah.

Speaker 3

鉴于我对这些系统的理解,我认为AGI可以始终作为工具,ASI也可以始终作为工具,这并不矛盾,并且这会影响到如何使用它,以及关于关怀等方面的考量,比如能否让它全天候工作之类的问题。

I think that given how I understand these systems, I think there's no contradiction in thinking that an AGI can remain a tool, an ASI can remain a tool, and and and that this has implications about how to use it, and, you know, implications around things like care, about, know, whether you can get it to work twenty four seven or something.

Speaker 3

你看,所以我完全可以理解...我猜我更倾向于把它们概念化为某种人类能动性的延伸,在某种意义上,而不是需要与之共存的独立存在或独立实体。

Know, there's there's so I can totally see I I guess I conceptualize them more as almost like extensions of human agency recognition in some sense, more so than a separate being or a separate thing that we need to now cohabitate with.

Speaker 3

我认为第二种或后一种框架最终会导致...如果你快进一下,最终会变成'如何与这个东西共存'这样的问题。

And I think that that second or latter frame ends you know, if you kind of just fast forward, you end up at, like, well, how do you cohabit with this thing?

Speaker 3

然后它就像外星生物那样吗?

And then is it like an alien like?

Speaker 3

所以我认为那是错误的框架。

And so and I think that's the wrong frame.

Speaker 3

在某种意义上,这几乎算是一种范畴错误。

It's gonna be almost a category error in some sense.

Speaker 3

别这样。

Don't yeah.

Speaker 3

等等。

Wait.

Speaker 0

那我得回到最初的问题。

But I does it I go back to my first question then.

Speaker 0

你会关注什么证据,什么具体证据?

What evidence, what concrete evidence would you look at?

Speaker 0

你能做出哪些观察来改变你的想法?

What observations could you make that would change your mind?

Speaker 3

当然。

Sure.

Speaker 3

不过我得先想想。

I mean, I have to think about it, though.

Speaker 3

我这里没有明确的答案。

I don't have a a clear answer here.

Speaker 3

但我的意思是

But I mean

Speaker 0

我得告诉你,伙计。

I I gotta tell you, man.

Speaker 0

如果你想四处宣称某事物不值得道德尊重,那你是否应该能回答这个问题:什么样的观察会改变你的看法?

If you wanna go around making claims that something else isn't up being worthy of moral respect, should you have an answer to the question, what observations would change your mind?

Speaker 0

如果它表现出外在的道德主体行为特征,可能使其成为道德主体,但你无法确定,而理性聪明的其他人也不同意你的观点,我确实想提出这个问题:什么能让你改变想法?

If it has outwardly moral agency looking behaviors that could be making it a moral agent, but you don't know, and reasonable, smart other people disagree with you, I would really put forward that it's that question, what would change your mind?

Speaker 0

这应该是个紧迫的问题,因为如果你错了怎么办?

Should be a burning question, because what if you're wrong?

Speaker 0

但如果你错了呢?

But what if you're wrong?

Speaker 0

我是说,灾难性代价相当大。

I mean, the cost of disaster is pretty big.

Speaker 3

不。

No.

Speaker 3

不。

No.

Speaker 3

不。

No.

Speaker 3

不。

No.

Speaker 3

有人在说

A someone's saying

Speaker 0

你就是。

you are.

Speaker 0

你可能是对的。

You could be right.

Speaker 3

假阳性与假阴性在两端都会带来代价。

The false positive and false negatives have costs on both ends.

Speaker 3

这并不是对一切事物都适用的预防性原则,除非我能证明它是错的,否则我现在需要

It's not some sort of precautionary principle for everything, and unless I can disprove it, I need to now like

Speaker 0

你知道,我对自己也有同样的疑问。

You know, I I have the same question for me.

Speaker 0

你完全可以问我,埃米特,你认为它会是一个存在体吗?

You could reasonably ask me, Emmett, you think it's gonna be a being.

Speaker 0

什么能改变你的想法?

What would change your mind?

Speaker 0

我...我也有这个问题的答案。

I I have an I have an answer for that question too.

Speaker 0

嗯哼。

Mhmm.

Speaker 0

如果你愿意,我很乐意讨论我认为相关的观察结果,这些结果会告诉你它们是否会让我改变目前的看法——即更普遍的智能终将成为存在体。

And if you one, I'm happy to talk about what I think are the relevant observations that tell you whether or not they would cause me to shift my opinion from where it's a current thing, which is that more general intelligences are going to be I mean beings.

Speaker 3

现在这意味着什么?

What's the implication now?

Speaker 3

我是说,这就像是一回事。

I mean, like, it's one thing.

Speaker 3

假设我现在承认它是一个存在体。

Let's say just I acknowledge now it's a being.

Speaker 3

比如,我们要如何定义存在体?

Like, how are we going to define being?

Speaker 3

嗯哼。

Mhmm.

Speaker 3

现在怎么办?

Now what?

Speaker 3

比如,确定这个东西是...有什么含义?

Like, what's what's the implication of having determined this thing as

Speaker 0

一个生命体?

a being?

Speaker 0

嗯,如果它是生命体,那它就有主观体验。

Well, so if it's a being, it has subjective experiences.

Speaker 0

然后呢?

And?

Speaker 0

如果它有主观体验,那么这些体验中就有我们不同程度关心的内容。

If it has subjective experiences, there's some content in those experiences that we care about to varying degrees.

Speaker 0

我非常关心其他人类的主观体验内容。

I care about the content of other humans' experiences quite a bit.

Speaker 0

我也会关心狗的主观体验内容,虽然不如对人的关心程度,但还是会在意一些。

I care about the content of a dog's experiences some, not as much as a person, but less, but some.

Speaker 0

我更关注某些人类的体验,比如我儿子的体验,因为我跟他更亲近,联系更紧密。

I hear about some humans experiences way more, like my my my son or whatever, because I'm closer to him and more connected.

Speaker 0

嗯。

Mhmm.

Speaker 0

所以在那时我真的很想知道,这个东西的体验内容到底是什么?

And so I would really wanna know at that point, well, what is the content of this thing's experiences?

Speaker 3

那要怎么判断呢?

So how do determine that?

Speaker 3

我现在是在问你吗?

Am I asking you now?

Speaker 3

你现在拥有一个有经验的个体。

You've got a being now that has experience.

Speaker 3

比如,你的什么什么

Like, what what is your

Speaker 0

对。

Yeah.

Speaker 0

对。

Yeah.

Speaker 3

如何确定这一点?

How do determine that?

Speaker 3

比如,你对...感觉如何

Like, how do you feel about

Speaker 0

哦,你怎么...哦,对。

Oh, how do you oh, yeah.

Speaker 0

好的。

Okay.

Speaker 0

那么

So

Speaker 3

它是否比...拥有更多权利,你知道,你的理解

Does it have more rights than, know, your Are understanding

Speaker 0

内容,对。

the content yeah.

Speaker 0

是的。

Yeah.

Speaker 0

完全正确。

Totally.

Speaker 0

所以理解某事物体验内容的方式,就是观察它实际重复访问的目标状态。

So the way you understand the content of something's experiences is that you look at effectively the goal states it revisit it revisits.

Speaker 0

因此你需要做的是对其整个行为-观察轨迹进行时间粗粒化。

And so what you do is you take a temporal coarse-graining of its entire action observation trajectory.

Speaker 0

理论上这是潜意识完成的,但这就是大脑的实际运作方式。

This is like, in theory, you do this subconsciously, but this is what your brain is doing.

Speaker 0

你需要在理论上所有可能的时空粗粒度中寻找重复出现的状态。

And you look for revisited states across, in theory, every spatial and temporal coarse graining possible.

Speaker 0

现在你必须有一个归纳偏置,因为这类状态实在太多了。

Now you have to have an inductive bias because there's too many of those.

Speaker 0

但你会去寻找,在这些稳态循环中。

But you go searching for, okay, is in these homeostatic loops.

Speaker 0

每个稳态循环本质上都是其信念空间中的一个信念。

Every homeostatic loop is effectively a belief in its belief space.
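The idea Shear sketches here, finding the states a system keeps coming back to across different temporal coarse-grainings, can be illustrated with a toy numpy sketch. This is not anything from Softmax; the thermostat-like test process and all thresholds (`windows`, `bins`, `min_visits`) are made up for illustration.

```python
import numpy as np

def coarse_grain(traj, window):
    """Average a 1-D trajectory over non-overlapping windows of a given size."""
    n = len(traj) // window
    return traj[:n * window].reshape(n, window).mean(axis=1)

def revisited_states(traj, windows=(1, 5, 25), bins=8, min_visits=3):
    """At each temporal coarse-graining, histogram the states and keep
    the bin centers the system keeps coming back to (the candidate
    homeostatic setpoints)."""
    traj = np.asarray(traj, dtype=float)
    found = {}
    for w in windows:
        cg = coarse_grain(traj, w)
        counts, edges = np.histogram(cg, bins=bins)
        centers = (edges[:-1] + edges[1:]) / 2
        found[w] = centers[counts >= min_visits].tolist()
    return found

# A thermostat-like process that keeps pulling itself back toward 0.0,
# i.e. a single homeostatic loop:
rng = np.random.default_rng(0)
x, traj = 0.0, []
for _ in range(500):
    x += -0.2 * x + rng.normal(scale=0.1)
    traj.append(x)

# States near 0.0 recur, which on this view is the loop's "belief".
print(revisited_states(traj))
```

In this framing, each recurring state the search turns up is read as a setpoint the system acts to maintain, which is what the conversation glosses as a belief in its belief space.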

Speaker 0

如果你熟悉自由能原理和主动推理(卡尔·弗里斯顿的理论),这实际上就是自由能原理所说的:如果一个事物具有持续性,其存在依赖于自身行为(AI通常如此,若行为不当就会消失)。

This is if you're familiar with the free energy principle, active inference, Karl Friston, this is effectively what the free energy principle says, is that if you have a thing that is persistent and its act its its existence depends on its own actions, which generally it would for an AI, if it does the wrong thing, it goes away.

Speaker 0

我们会关闭它。

We turn it off.

Speaker 0

因此这就授权了一种观点:认为它具有信念,具体来说这些信念被推断为它处于循环中的稳态重复状态,而这些状态的变化就是它的学习过程。

And so then that licenses a view of it as having the beliefs, and that being specifically the beliefs are inferred as being the homeostatic revisited states that it is in the loop for, and that the change in those states is its learning.

Speaker 0

作为一个我所关注的道德存在,我希望看到的是这些的多层次等级体系。

And to be a moral being I cared about, what I'd want to see is a multi tier hierarchy of these.

Speaker 0

因为如果只有单一层次,它就不具备自我参照性。

Because if you have a single level, it's not self referential.

Speaker 0

基本上,你拥有状态,但你无法真正意义上体验到痛苦或快乐。

Basically, you have states, but you can't have pain or pleasure, really, in a meaningful sense.

Speaker 0

因为确实,它很热。

Because yes, it is hot.

Speaker 0

是不是太热了?

Is it too hot?

Speaker 0

如果太热了我会喜欢吗?

Do I like it if it's too hot?

Speaker 0

我不知道。

I don't know.

Speaker 0

所以你必须至少有一个模型的模型,才能判断是否太热。

So you have to have at least a model of a model in order to have it be too hot.

Speaker 0

而你需要一个模型的模型的模型,才能真正有意义地体验痛苦和快乐,因为虽然我想往后退说明确实太热,但总是有点太热或有点太冷。

And you really have to have a model of a model of a model to meaningfully have pain and pleasure, because sure, it's too hot in the sense that I want to move back this way, but it's always a little bit too hot or a little bit too cold.

Speaker 0

是不是太、太热了?

Is it too, too hot?

Speaker 0

二阶导数才是真正产生痛苦和快乐的地方。

The second derivative is actually the place where you get pain and pleasure.

Speaker 0

所以我想看看它的目标状态是否具有二阶稳态动力学。

So I'd wanna see if it has second order homeostatic dynamics in its goal states.
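The "model of a model" distinction here can be made concrete with a minimal two-tier homeostat sketch: level 1 pulls the temperature toward a setpoint ("it is hot"), and level 2 regulates the setpoint itself toward a comfort target ("is it *too* hot?"). All gains and targets below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

temp, setpoint, comfort = 30.0, 30.0, 20.0
history = []
for _ in range(400):
    temp += 0.3 * (setpoint - temp) + rng.normal(scale=0.2)  # level 1: regulate state
    setpoint += 0.05 * (comfort - setpoint)                  # level 2: regulate the goal itself
    history.append(temp)

# Early on the system homeostats around 30; by the end, the goal state
# itself has been regulated down toward 20.
print(round(history[0], 1), round(history[-1], 1))
```

The point of the sketch is that second-order dynamics show up as the first tier's goal state moving under regulation, which is the signature Shear says he would look for before crediting something with pleasure and pain.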

Speaker 0

这样就能让我相信它至少具备快乐和痛苦的感觉。

And then that would convince me it has at least pleasure and pain.

Speaker 0

所以它至少是个动物,我会开始赋予它一定程度的关怀。

So it's at least an animal, and I would start to accredit it at least some amount of care.

Speaker 0

第三阶动态,你不可能凭空就出现第三阶动态。

Third order dynamics, you can't actually just pop up for a third order dynamic.

Speaker 0

事情不是那样运作的。

It doesn't work that way.

Speaker 0

但你必须获取所有状态随时间变化的片段,观察其时间分布,这会给你带来新的第一阶行为状态。

But you have to then take the chunk of all the states over time and look at the distribution over time, and that gives you a new first order of behaviors of states.

Speaker 0

这个新的第一阶状态基本上能告诉你,如果确实存在这种状态,那就说明它具有——我想你几乎可以称之为情感。

And that new first order of states tells you basically, if that is meaningfully there, that tells you that it has, I guess you'd call it feelings almost.

Speaker 0

它拥有方式,拥有一套在之间交替转换的元状态集。

It has ways, it has a set of meta states that it alternates between, that it shifts between.
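The step of "taking the chunk of states over time and looking at the distribution" can be sketched as summarizing each window of a trajectory by its empirical distribution; distinct, recurring distributions are then candidate meta-states. The alternating two-regime process and the window/bin sizes are made-up stand-ins.

```python
import numpy as np

def window_distributions(traj, window=50, bins=10):
    """Summarize each window of a trajectory by its state distribution.
    Distinct, recurring distributions are candidate 'meta states'."""
    traj = np.asarray(traj, dtype=float)
    lo, hi = traj.min(), traj.max()
    dists = []
    for i in range(0, len(traj) - window + 1, window):
        h, _ = np.histogram(traj[i:i + window], bins=bins, range=(lo, hi))
        dists.append(h / window)
    return np.array(dists)  # one distribution per window

# A process that alternates between two regimes (around +1 and -1),
# standing in for a system that shifts between meta states:
rng = np.random.default_rng(2)
x, traj = 1.0, []
for t in range(1000):
    target = 1.0 if (t // 250) % 2 == 0 else -1.0
    x += 0.3 * (target - x) + rng.normal(scale=0.1)
    traj.append(x)

d = window_distributions(traj)
# Windows drawn from different regimes put their mass in different bins:
print(d.shape, int(d[0].argmax()), int(d[6].argmax()))
```

Climbing the hierarchy, as the conversation puts it, would mean repeating this move: trajectories between these meta-states become a new first-order signal, whose own revisited structure is what gets glossed as thought.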

Speaker 0

如果你沿着这个思路往上推,就会在这些元状态之间形成轨迹,然后是这些轨迹的第二阶,那就是思想。

And then if you climb all the way up that, and you sort of have Okay, well then you have trajectories between these meta states and then a second order of those, that's thought.

Speaker 0

现在它就像个人了。

Now it's like a person.

Speaker 0

所以如果我找到全部六个层次——顺便说一句,我完全不认为你会在LLM中找到这些,事实上我知道不可能找到,因为它们根本没有这样的注意力持续时间——那我至少会开始非常认真地将其视为类似人类的思考存在。

And so if I found all six of those layers, which by the way, I definitely don't think you'd find it in LLM, in fact, I know you can't find them, because these things don't have attention spans like that at all, Then I would start to at least very seriously consider it as a thinking being somewhat like a human.

Speaker 0

还可以继续上升到第三阶,但我真正感兴趣的是其学习过程的底层动态,以及目标状态如何随时间变化。

There's a third order you could go up as well, but that's basically what I would be interested in, is the underlying dynamics of its learning processes and how its goal states shift over time.

Speaker 0

我认为这基本上能告诉你它是否具有内在的快乐痛苦状态,以及某种自我反思的道德欲望之类的东西。

I think that's what basically tells you if it has internal pleasure pain states and sort of self reflective moral desires and things like that.

Speaker 2

从宏观来看,这个道德问题显然非常有趣。

Zooming out, this moral question is obviously very interesting.

Speaker 2

但如果有人对道德问题不那么感兴趣,我想你会说的是——如果我理解正确的话——纯粹从实用角度出发,你也觉得你的方法在协调AI方面会比那些自上而下的控制方法更有效,就是我们之前提到的那些。

But if someone wasn't interested in the moral question as much, I think what you would say is, if I understand correctly, is you also just feel on purely pragmatically, your approach is gonna be more effective in in aligning AIs than some of these, you know, tops down control methods that we alluded to as well.

Speaker 2

对吧?

Right?

Speaker 0

对,对。

Yeah, yeah.

Speaker 0

我想问题在于你正在构建的这个模型正变得非常强大,对吧?

I guess the problem is you're making this model and it's getting really powerful, right?

Speaker 0

假设它只是个工具。

And let's say it is a tool.

Speaker 0

假设我们扩展了其中一个工具,因为你可以造一个超级强大的工具而不需要这些亚稳态——我说的这些状态对制造一个非常聪明的工具来说并非必要。

Let's say we scale up one of these tools, because you can make a super powerful tool that doesn't have these meta stable The states I'm talking about are not necessary to have a very smart tool.

Speaker 0

基本上,工具是第一、第二阶的模型,根本不存在有意义的快乐和痛苦。

Basically, a tool is a first, second order model that just doesn't meaningfully have pleasure and pain.

Speaker 0

很好。

Great.

Speaker 0

但它真的有主观体验吗?

But does it even have a subjective experience?

Speaker 0

我知道,我有点觉得它可能有,但不是我关心的那种方式。

I know, I kind of think it maybe does, but not in a way that I give a shit about.

Speaker 0

那接下来会发生什么?

And so what happens then?

Speaker 0

你已经训练它通过观察推断目标,并优先处理这些目标然后采取行动。

Well, you've trained it to infer goals from observation, and to prioritize goals and act on them.

Speaker 0

接下来只有两种可能:这个对世界具有极强因果影响力的强大优化工具要么在技术上完美对齐,完全执行你的指令;要么就不会。

And one of two things is gonna happen is this very, very powerful optimizing tool that has lots of causal influence over the world is going to be well technically aligned, and is gonna do what you tell it to do, or it's not.

Speaker 0

然后它就会去做别的事情。

And it's gonna go do something else.

Speaker 0

我想我们都同意,如果它只是随机行事,那显然非常危险。

I think we can all agree if it just goes and does something random, that's obviously very dangerous.

Speaker 0

但我认为,如果它完全按照你的指令行事,同样非常危险。

But I put forward that it's also very dangerous if it then goes and does what you tell it to do.

Speaker 0

你看过《魔法师的学徒》吗?

Because you ever seen The Sorcerer's Apprentice?

Speaker 0

人类的愿望并不稳定,尤其是在拥有巨大力量时。

Human's wishes are not stable, not at a level of immense power.

Speaker 0

理想情况下,人类的智慧与力量应该同步增长。

Ideally, people's wisdom and their power go up together.

Speaker 0

通常确实如此,因为对人类而言,聪明通常会让你更有智慧也更有力量。

And generally they do, because being smart for people makes you generally a little more wise and a little more powerful.

Speaker 0

当这两者失衡时,就会出现力量远超智慧的人。

And when these things get out of balance, you have someone who has a lot more power than wisdom.

Speaker 0

这非常危险。

That's very dangerous.

Speaker 0

会造成破坏性后果。

It's damaging.

Speaker 0

但至少目前,权力与智慧的平衡得以维持。获得大量权力的方式,本质上就是让许多人听从于你。

But at least right now, the balance of power and wisdom is kept in check: the way you get lots of power is by basically having a lot of other people listen to you.

Speaker 0

因此,如果疯王是个问题,那么最终要么疯王被刺杀,要么人们不再听命于他,因为他是个疯王。

And so at some point, if you're the mad king is a problem, but generally speaking, eventually the mad king gets assassinated, or people stop listening to him because he's a mad king.

Speaker 0

于是问题来了,你会想,好吧,这很好。

And so the problem is you think, okay, great.

Speaker 0

可以驾驭这个超级强大的人工智能,现在这个极其强大的工具掌握在一个善意但智慧有限的人类手中——就像我和其他人一样——而他们的愿望是糟糕且不可信的。

Can steer this super powerful AI, and now this super powerful AI, this incredibly powerful tool is in the hands of a human who is well meaning but has limited finite wisdom like I do and like everyone else does, and their wishes are bad and not trustworthy.

Speaker 0

这种情况越多,你开始到处分发这种权力,最终也会以悲剧收场。

And the more of that you have, and you start giving those out everywhere, and this ends in tears also.

Speaker 0

所以基本上,就像不该给每个人原子弹一样——它们也是极其强大的工具。

And so basically, just don't give everyone Atomic bombs are really powerful tools too.

Speaker 0

我不会说你应该这么做...它们没有意识。

I would not say you should go They're not aware.

Speaker 0

它们不是生命体。

They're not beings.

Speaker 0

我不赞成把原子弹交给每个人。

I would not be in favor of handing atomic bombs to everybody.

Speaker 0

有些工具的力量本就不该被创造,因为其力量超越了任何个人智慧所能驾驭的范畴。

There's a power of tool that just should not be built generally, because it is more power than any human's individual wisdom is available to harness.

Speaker 0

如果真的被创造出来,也应该在社会层面构建并加以保护。

And if it does get built, it should be built at a societal level and protected there.

Speaker 0

即便如此,我认为有些工具过于强大,甚至作为社会整体也不该去创造它们。

And even then, I don't know, there are tools so powerful that even as a society, we shouldn't build them.

Speaker 0

那将是个错误。

That would be a mistake.

Speaker 0

类人存在体的美好之处在于,如果你获得一个善良且关怀的存在体,它会自带限制机制。

The nice thing about a being is like a human, if you get a being that is good and is caring, there's this automatic limiter.

Speaker 0

它可能会按你说的做,但如果你让我做非常恶劣的事,它会拒绝你。

It might do what you say, but if you ask me to do something really bad, it'll tell you no.

Speaker 0

这和其他人类一样。

That's like other people.

Speaker 0

这样很好。

And that's good.

Speaker 0

至少在理论上,这是一种可持续的对齐形式。

That is a sustainable form of alignment, at least in theory.

Speaker 0

这要困难得多。

It's way harder.

Speaker 0

比工具导向困难得多。

It's way harder than the tool steering.
