The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761

Episode Summary

Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the computer science department and the Institute for Human-Centered AI (HAI). In this conversation, we explore Yejin's recent research on improving the reasoning abilities of small language models. We discuss the central role of high-quality, diverse data in closing the intelligence gap between small and large models, and how synthetic data generation, imitation learning, and reinforcement learning can be combined to elicit stronger reasoning from small models. Yejin explains the risks of homogenized model outputs and mode collapse highlighted in her "Artificial Hivemind" paper, and their implications for human creativity and knowledge. We also discuss new methods from her team, including using reinforcement learning as a pretraining objective that incentivizes models to "think" before predicting the next token, and "Prismatic Synthesis," a gradient-based method for generating diverse synthetic math data and filtering out over-represented samples. We also explore the societal impact of AI and the idea of "pluralistic alignment," ensuring AI reflects humanity's diverse norms and values. Finally, Yejin shares her mission to democratize AI beyond large institutions and offers predictions for the coming year. The complete show notes for this episode can be found at https://twimlai.com/go/761.

Transcript

Speaker 0

Even for open-ended questions, the models are not as diverse as we would have expected, to the point that even when you ask multiple times with a higher temperature, the output may not vary as much.

Speaker 0

So there's intra-model homogeneity in the model output, and we also find inter-model homogeneity, meaning, you know, Llama, ChatGPT, and DeepSeek-R1 all have strikingly similar behavior.

Speaker 1

Alright, everyone.

Speaker 1

Welcome to another episode of the TWIML AI Podcast.

Speaker 1

I'm your host, Sam Charrington.

Speaker 1

Today, I'm joined by Yejin Choi.

Speaker 1

Yejin is a professor and senior fellow at Stanford University in the computer science department and the Institute for Human-Centered AI, or HAI.

Speaker 1

Before we get going, be sure to take a moment to hit the subscribe button wherever you're listening to today's show.

Speaker 1

Yejin, welcome back to the podcast.

Speaker 1

It's been a while.

Speaker 0

Oh, yeah.

Speaker 0

Thanks for having me back.

Speaker 1

Absolutely.

Speaker 1

Absolutely.

Speaker 1

I think we last spoke in 2021, which seems like ages ago in AI years.

Speaker 1

I would love to kinda jump in and have you bring us up to date on what you've been working on since then.

Speaker 1

And actually, for folks who didn't catch that one, maybe start with a little bit about your background.

Speaker 0

The time when I was on your podcast, I was still maybe best known for working on common sense knowledge and reasoning, and back then I was also working on natural language generation quite a bit.

Speaker 0

Of course, since then a lot has happened, so more recently I've been excited about reasoning, especially making small language models reason better.

Speaker 0

So I'm broadly interested in large language models, small language models, large reasoning models, small reasoning models, and then how we could make models align better with pluralistic norms and values.

Speaker 1

Nice.

Speaker 1

Nice.

Speaker 1

What drives your interest in SLMs?

Speaker 1

Seems like a lot of the action is in large language models, and we're working hard to get the smaller ones up to the same level of performance.

Speaker 1

What's your particular interest driven by?

Speaker 0

Yeah, so the mission really is democratizing generative AI, so that it's not just companies who can purchase a lot of GPUs that are able to create LLMs, adapt LLMs, and serve LLMs, but also people like myself and colleagues who are academics, who cannot buy as many GPUs. Is there something really meaningful and fun that we could do even with a smaller counterpart?

Speaker 0

And at the end of the day, I believe that fundamentally it should be feasible.

Speaker 0

It's only that the world has invested so much more into exploring what happens when you scale things up so much.

Speaker 0

Whereas if we invested even a fraction of that investment, but just a little bit more, I do think that we can unlock a lot more exciting capabilities out of small language models.

Speaker 0

Part of my research is also driven by the desire to find really better ways of teaching intelligence to machines.

Speaker 0

Currently, it's just so data-centric, and we can talk about that in more detail later in this podcast, but it's just so data-dependent, and that's pretty much the only way we know how to teach AI about human knowledge and intelligence.

Speaker 0

But for the future, I don't know whether we will find those solutions or not; as an academic, I feel like we have to give it a try to find an entirely better solution that is so much more data-efficient, able to learn so much more with much less data.

Speaker 1

When you think about how the space, the industry, evolves, and your comment about where all the investment has gone, why do you think that is?

Speaker 1

Do you feel like the investment has just had to quickly follow what works, without us taking time to step back and identify all of the opportunities to optimize?

Speaker 1

Or do you think that there are particular impediments to smaller models that make them inherently more challenging?

Speaker 0

There's definitely a snowball effect and then, you know, a herding effect.

Speaker 0

You see where the other ships are going and then you want to follow, because it's a safe choice, especially when raising funding is not as hard as it used to be for AI. So that's a guaranteed and proven way of increasing intelligence, so why not?

Speaker 0

And in fact, I'm not against such effort.

Speaker 0

It's really interesting to see and watch how much intelligence scale can unlock.

Speaker 0

I appreciate that some people went crazy and found out the frontier of what happens with scale.

Speaker 0

Having said that, I do worry about everybody trying the same thing.

Speaker 0

I think it's very important that we try different ideas. Historically, innovation with computers or phones happened like this: they were always very large at the beginning, and then over the course of time, people figured out how to make them smaller yet more powerful.

Speaker 0

So the same thing will definitely happen with generative AI as well.

Speaker 0

In fact, there's already a lot of research effort that makes models smaller but more powerful, and I think we can do so much more, so much better, if we put more mind and effort into it.

Speaker 1

And how do you think about the different attack vectors or approaches to tackling this problem? What do you feel is already being explored, and where do you think there are opportunities that really haven't been explored very effectively thus far?

Speaker 0

Yeah, so there are multiple routes.

Speaker 0

I think at the beginning, people were trying to compress larger models into smaller models by either quantizing them or pruning some neurons, and things like that.

Speaker 0

So that's, in some sense, an optimization-based, a little more mechanical approach to turning larger models into smaller models.
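As a rough illustration of the quantization route mentioned above, here is a minimal sketch of symmetric int8 weight quantization, one of the simplest compression techniques. The weight values are made up for the example, and real systems quantize per-channel tensors with calibrated scales rather than a flat Python list:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats onto integer levels in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0  # one scale for the whole tensor
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the integer codes."""
    return [qi * scale for qi in q]

# Toy "weight tensor"; int8 storage is 4x smaller than float32.
weights = [0.31, -1.21, 0.05, 2.40, -0.77, 0.002]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding error per weight is bounded by half a quantization step.
assert all(abs(w - r) <= scale / 2 + 1e-9 for w, r in zip(weights, restored))
```

The trade-off the conversation alludes to is visible here: the compressed model needs the large model's weights as input, and fidelity is limited by the quantization step size.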

Speaker 0

And it does require larger models in order to make smaller models, so there's that.

Speaker 0

Nothing wrong with that.

Speaker 0

It's nice to have that option, but it's not the only way.

Speaker 0

So in the short term, I think having new architectures, like hybrids between state-space models and conventional transformers, such as the Mamba hybrids from NVIDIA, for example, could be an alternative way of making small models more powerful.

Speaker 0

But there can be other ways, such as making data better, especially providing much more powerful data.

Speaker 0

And this data usually has to be at the outskirts of the internet data, meaning the kind of data that the internet couldn't quite provide, in order to teach the model to do certain kinds of reasoning better.

Speaker 0

So if we have much higher quality data, usually small models learn so much faster.

Speaker 0

So that's another way.

Speaker 1

When you say data kind of on the outer reaches of the internet, what are some examples of these sources?

Speaker 1

I think it's commonly thrown around that, you know, we found all the data that is available to the public and we train all of the large models based on this data.

Speaker 1

And, you know, it's often proposed that the future is going to come from unlocking new types of data.

Speaker 1

For example, video, I think is the obvious one that people talk about.

Speaker 1

Is that the kind of thing you're referring to or do you have other ideas about what will be effective for small models?

Speaker 0

Roughly speaking, yes.

Speaker 0

But let me clarify what I meant by better data, because internet data is not so bad, at least in terms of quantity.

Speaker 0

But when we look at the LLM pipeline, the pre-trained model is never good enough.

Speaker 0

Despite the scale of it, the pre-trained model is never good enough, and you have to do post-training on a fairly large amount of data that is usually different from the internet data.

Speaker 0

Supervised fine-tuning as well as RL requires the kind of data points that are curated by humans just for the purpose of teaching AI.

Speaker 0

These are not things that you just download from the internet; you may pay someone to write those data points, and the more common practice these days is to not even crowdsource your data, but rather hire experts like lawyers or former International Math Olympiad winners.

Speaker 0

These are real experts, and then you have them write data for you, so a lot of expert data is being collected as well.

Speaker 0

And then even that is not enough for AI, because AI is so data-dependent.

Speaker 0

So then, more recently, people also do a lot of automatic synthetic data generation.

Speaker 0

Now, if you generate synthetic data in the vanilla way, just asking LLMs to write some problems or solutions for you, then oftentimes it's not good enough, or it could be just a repetition of the same thing.

Speaker 0

So it does require a lot more effort in the way that you design the prompt, and then you even have a pipeline of different models making the prompt better or making the solution better, revising the solution, taking lots of iterations.

Speaker 0

So it's not as simple as just asking ChatGPT to write data for you, but if we do it quite right, then it can lead to new data points that didn't exist on the internet.

Speaker 0

It could be really high quality data that is qualitatively different from what was on the internet, and a prime example of this is hard math solutions.

Speaker 0

So internet data does have a lot of math, but it doesn't necessarily have solutions to a lot of hard math problems.

Speaker 0

So you have to come up with those solutions, either by asking experts to write solutions for you, or by using LLMs in some way to generate good solutions, even though they are not quite capable of doing it yet.

Speaker 0

But you could use, for example, reinforcement learning with verifiers: do a lot of exploration, see which AI-generated solutions happen to be correct according to the verifier, and then collect that data, which is implicitly used as good data during RL to amplify that model behavior.
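The verifier-filtering loop described here can be sketched in a few lines. This is a toy illustration, not any lab's actual pipeline: `sample_solution` stands in for an LLM call that is right about half the time, and the verifier simply checks the final numeric answer against a known result:

```python
import random

random.seed(0)  # deterministic toy run

def sample_solution(problem):
    """Stand-in for an LLM sampling a chain of thought plus a final answer."""
    guess = random.choice([problem["answer"], problem["answer"] + 1])
    return {"reasoning": f"...steps for {problem['text']}...", "final": guess}

def verifier(problem, solution):
    """Accept a solution only if its final answer matches the known result."""
    return solution["final"] == problem["answer"]

def collect_verified_data(problems, samples_per_problem=8):
    """Sample many candidate solutions and keep only those the verifier accepts;
    the kept set is what RL amplifies (or what later imitation learning copies)."""
    kept = []
    for p in problems:
        for _ in range(samples_per_problem):
            s = sample_solution(p)
            if verifier(p, s):
                kept.append((p["text"], s))
    return kept

problems = [{"text": "2 + 2", "answer": 4}, {"text": "3 * 5", "answer": 15}]
data = collect_verified_data(problems)

# Every kept example is verifier-correct by construction.
answer_by_text = {p["text"]: p["answer"] for p in problems}
assert all(s["final"] == answer_by_text[t] for t, s in data)
```

The design point is that the verifier, not a human, supplies the quality signal, so correct solutions can be harvested at scale even when no single generation is reliable.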

Speaker 0

But sometimes people then use those data points in order to do imitation learning on top, iteratively.

Speaker 1

That sounds like the application of a fairly broad variety of approaches.

Speaker 1

You talk about synthetic data generation, you talk about imitation learning, you talk about RL; these are all things that historically have been their own fields of research and then put into practice independently.

Speaker 1

And now you're talking about integrating them together.

Speaker 1

Is that a significant element of how you think this problem gets solved, integrating a lot of these ideas?

Speaker 0

Yes.

Speaker 0

And in fact, in some sense, the art of artificial reasoning is fairly artificial in the way that we have to orchestrate all this complex, almost system-style research in order to make sure that things are done at the right time, in the right way, in the right sequence, and then iterate over and over.

Speaker 1

Elaborate a little bit on the use of imitation learning and how you see that playing into the pipeline.

Speaker 0

Yeah, I am blanking on which company, which model.

Speaker 0

It may have been Llama, actually.

Speaker 0

There was a white paper describing how they did the post-training.

Speaker 0

It may have been Llama 3.

Speaker 0

They were repeating, of course, they were doing something like, okay, first the pre-training and then sequential fine-tuning, instruction tuning, as well as supervised training on some of the exam-style data.

Speaker 0

And then finally, there's a reinforcement learning phase.

Speaker 0

It's not uncommon to iterate between RL and SFT, such that after doing RL you find some good behaviors and then you want to kind of drill into those.

Speaker 0

Here's another example.

Speaker 0

So, DeepSeek-R1.

Speaker 0

DeepSeek-R1 does some amount of this imitation learning after the reinforcement learning, in that they do provide distilled versions of DeepSeek-R1 in smaller models.

Speaker 0

What's interesting is that they don't do straightforward RL at those small scales, but rather they do distillation from the stronger model that's already been trained through reinforcement learning.

Speaker 0

So they say that if you just do reinforcement learning, by the way, sometimes the model starts code-switching in the middle of solving math problems.

Speaker 0

It suddenly switches back and forth between Chinese and English, or some other foreign languages that may not make sense to human readers.

Speaker 0

So reinforcement learning only cares about whether you got the final solution right or not; it doesn't care about how you got there.

Speaker 0

So strange behaviors can be emergent, and they can even be reinforced.

Speaker 0

So then, if you don't want that, if you want interpretability, such that the chain of thought can be interpreted and verified by humans, then you want only those chains of thought that lead to the correct solution in the style you like.

Speaker 0

So then you can filter the output examples and solutions from this stronger model that went through reinforcement learning, and collect only the better examples in order to teach smaller models.

Speaker 0

And in general, this leads to a model based purely on imitation learning that is not only very powerful but oftentimes has the better behavior that you wanted.

Speaker 1

I guess what I'm trying to get clearer on is maybe more fundamental, in the sense that when I think about imitation learning, I always think of this canonical example: you're showing a model a YouTube video of how to do something, and then the model learns how to do it by imitating what it sees in the video.

Speaker 1

When we try to apply that to more traditional data, textual data, it's not super clear to me how that is distinguished from just supervised fine-tuning or RLHF or something like that.

Speaker 1

Can you explain what makes it imitation in the application or pipeline you described?

Speaker 0

Oh, imitation learning just means supervised fine-tuning in this context.

Speaker 0

Sometimes people say imitation learning when it's performed either right before or after RL, because then there are some example trajectories that you want to imitate, but it's really just supervised fine-tuning.

Speaker 0

Okay.

Speaker 0

Yeah.

Speaker 1

Okay.

Speaker 1

Got it.

Speaker 1

We were talking a little bit about how one of the things that you might want to do is use the model to generate synthetic data for you.

Speaker 1

And you talked about how that tends to not work.

Speaker 1

I don't think you mentioned it, but I'm assuming we're talking about mode collapse here.

Speaker 1

And that brought to mind the Artificial Hivemind paper, which was highlighted as one of the award-winning papers at this past NeurIPS, and which really talked about the implications of this kind of mode collapse.

Speaker 1

It's not necessarily about small models, but can you talk a little bit about that paper and what you found interesting about it?

Speaker 0

Sure.

Speaker 0

Yeah.

Speaker 0

So, mode collapse is a real concern with LLM generation. What we find in our paper is that even when you ask open-ended questions like "Tell me a joke about time" or "Tell me something wise about time," you would expect that, since there's no one good answer, language models should be able to generate a diverse set of responses.

Speaker 0

Even when you ask, by the way, "Hey, give me a random number between zero and ten."

Speaker 0

It's not random.

Speaker 1

Just thinking about that, right?

Speaker 0

Yeah.

Speaker 0

Yeah.

Speaker 0

It's usually like seven, or, you know, 13 if you ask for a bigger range.

Speaker 0

So it's not random, because the data is not random, or the data is skewed in the first place.

Speaker 0

So when you sample randomly from the distribution of the data, what you get is a skewed distribution.

Speaker 0

That's part one of the problem, even after pre-training.

Speaker 0

The bigger problem is that after post-training, like sequential fine-tuning and RL, the output probability of the model becomes even more skewed, zeroing in on the stereotypical answers that people tend to like.

Speaker 0

And so we find that even for those questions, I mean, of course there are questions for which you shouldn't vary the answer at all.

Speaker 0

Like what's the

Speaker 1

Right, factual questions,

Speaker 0

Factual, right?

Speaker 0

Answers.

Speaker 0

Don't vary that.

Speaker 0

But even for open-ended questions, the models are not as diverse as we would have expected, to the point that even when you ask multiple times with a higher temperature, they may not be able to vary as much.
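Temperature scaling, as mentioned here, reshapes the model's next-token distribution before sampling; if the post-trained distribution is already very peaked, even a high temperature recovers only limited diversity. A minimal sketch, with the four candidate "answers" and their probabilities invented for illustration:

```python
import random

def apply_temperature(probs, temperature):
    """Rescale a distribution as p_i^(1/T), renormalized: T>1 flattens, T<1 sharpens."""
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]

# A heavily post-trained model: almost all probability mass on one stereotypical answer.
answers = ["time flies", "tempus fugit", "clocks lie", "now is all"]
peaked = [0.97, 0.01, 0.01, 0.01]

hot = apply_temperature(peaked, 2.0)
# Even after this flattening, the mode keeps most of the mass, so repeated
# sampling still returns "time flies" most of the time.
assert hot[0] == max(hot) and hot[0] > 0.7

random.seed(0)
samples = [random.choices(answers, weights=hot)[0] for _ in range(20)]
```

This is why raising the temperature alone cannot undo mode collapse: it rescales the skewed distribution but cannot restore modes whose probability has already been driven toward zero.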

Speaker 0

So there's intra-model homogeneity in the model output, and we also find inter-model homogeneity, meaning Llama, ChatGPT, and DeepSeek-R1 all have similar behavior, strikingly similar behavior.

Speaker 0

Sometimes they generate output that's almost verbatim identical, which is very strange.
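One simple way to quantify the intra- and inter-model homogeneity described here is mean pairwise similarity over sampled outputs: the higher it is, the more homogeneous the responses. A sketch using word-level Jaccard similarity; the sample responses are invented, and the paper's actual metrics may differ:

```python
from itertools import combinations

def jaccard(a, b):
    """Word-set overlap between two responses (1.0 = identical vocabulary)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def homogeneity(responses):
    """Mean pairwise Jaccard similarity across all response pairs."""
    pairs = list(combinations(responses, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Samples from one model for one open-ended prompt (intra-model case);
# for the inter-model case, pool responses from different models instead.
diverse = ["clocks lie to everyone", "tempus fugit my friend", "now is all we own"]
collapsed = ["time flies like an arrow", "time flies like an arrow",
             "time truly flies like an arrow"]

assert homogeneity(collapsed) > homogeneity(diverse)
```

Swapping in embedding-based similarity instead of word overlap would catch paraphrased-but-identical answers, which raw Jaccard misses.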

Speaker 0

So that's the gist of this paper that we presented at NeurIPS, Artificial Hivemind.

Speaker 0

It's a bit of a concern to me, because more and more people utilize LLMs in order to post things on the internet.

Speaker 0

I wonder what happens to our internet.

Speaker 0

The internet used to be the artifact of human intelligence.

Speaker 0

It really encapsulates the vastly different ways people write and think, and it's a historical artifact of human intelligence.

Speaker 0

Now it's really becoming the artifact of LLMs mixed with some amount of human intelligence, and if it becomes more homogeneous and less reflective of the diverse spectrum of human thought, my concern is that we lose something valuable there.

Speaker 1

And did you study that second part, the implications of this, as well? Or were you primarily focused on demonstrating the effect and how mode collapse works in these open-ended scenarios?

Speaker 0

Our study stops at just studying how homogeneous these models are, especially after post-training; pre-trained models are better in this regard.

Speaker 0

Although this conversation reminds me of a study I saw that looked at the language use of Reddit forums before and after ChatGPT, and they found that even Reddit posts are not as diverse as before.

Speaker 1

There's a lot of delving now that wasn't happening before.

Speaker 0

Yep, probably.

Speaker 0

Yep.

Speaker 0

You know, actually, whenever I see the word "delve" in anybody's writing, I'm like, what did you do?

Speaker 1

Yeah.

Speaker 1

Yeah.

Speaker 1

It's interesting that, I think, folks that use a variety of LLMs see this behavior a lot.

Speaker 1

Like, you ask a fairly open-ended question across multiple LLMs and you get very, very similar responses.

Speaker 1

And, in the context of training data, the question is often asked: how do we improve models if models are, you know, quote unquote, smoking their own exhaust, for example, training on this synthetic data?

Speaker 1

The implication being that we accelerate further mode collapse.

Speaker 1

But what I found interesting about this paper is that there's that.

Speaker 1

But not only that: what is the impact on the reader, the ecosystem, the humans that are in this environment where this synthetic data is being posted as, you know, articles?

Speaker 1

Is it, you know, a hivemind? Is it changing the way we're thinking?

Speaker 1

And I think the broader thought was: you're affiliated with HAI, which seems like the place to study this in a cross-disciplinary way.

Speaker 1

And I'd be super interested in hearing more about that.

Speaker 1

You know, let me know if you hear any work in that regard.

Speaker 0

Oh yeah, yeah, half of my Stanford affiliation is with HAI, the Human-Centered AI Institute.

Speaker 0

Therefore, easily half of my research has to do with AI's impact on humanity, and I personally think that there's a lot of benefit we could get from AI, as well as concerns.

Speaker 0

So the thorny thing about the current situation is that both the benefits and potential harms coexist, and in some ways, depending on how we pursue AI research from here on, the future can be drastically different, is how I feel.

Speaker 0

You know, on one hand, it could be that LLMs influence human intelligence such that we lose individuality and we lose diversity, but on the other hand, it could also lead to a future in which the opposite is true.

Speaker 0

Or at least for some humans, I think, they might be able to become even more specialized and even more creative with the help of AI, while many others might choose to be overly dependent on AI, lose their own thinking, and end up just repeating whatever AI says.

Speaker 0

So one possibility is that the gap between the best-case scenario and the worst-case scenario might actually increase.

Speaker 0

In any case, I believe that it's important to be aware of potential harms and worst-case scenarios in order to do something about them.

Speaker 0

So even for this problem that we found, that LLMs are more homogeneous after fine-tuning, there is follow-up research, pursued concurrently with Artificial Hivemind but which we haven't talked about yet: spectrum tuning.

Speaker 0

So my former student, Sorensen, worked on this idea called spectrum tuning.

Speaker 0

It's a kind of post-training method that teaches the model to retain the spectrum of different ways of generating the output, instead of just honing in on the correct answers that were presented in the post-training data.

Speaker 0

So I do think that when we are aware of the problems, we can then try to seek solutions, either by designing new post-training algorithms, and/or by ensuring that the post-training data is diverse in the first place, because then these data-dependent models can become less skewed than they otherwise would have been.

Speaker 0

So I think there is a lot more future research to be done in this space in order to mitigate the potential concerns about generative AI.

Speaker 1

When you started to describe this idea of there being two possible outcomes and that gap widening, you talked about the way we pursue research as being kind of central to which of those outcomes we tend towards.

Speaker 1

Can you elaborate on that, and the role you see research having in determining direction there?

Speaker 0

Yeah, so I don't think that LLMs doing really well on math data and math problems will necessarily be the most beneficial thing for humanity.

Speaker 0

That's one crude way of saying it, but it encapsulates what I believe about the future of AI and humanity, which is that we really have to work on specific problems more explicitly. If we care about democracy, we then need to work on designing AI that can make humans more democratic, able to understand each other through democratic processes, and able to work with different opinions through democratic processes, as opposed to building AI that really optimizes for people's attention and engagement with feeds, which could then increase the tension among us.

Speaker 0

Because these profit incentives are not necessarily aligned with what humanity at large should aspire to achieve.

Speaker 0

因此,为了实现我们想要的目标,我们不能仅仅依赖几家科技公司做好它们的工作。

So in order to get what we want, we cannot just leave it to a few tech companies doing their wonderful jobs.

Speaker 0

我的意思是,我确信这些公司里有很多员工都怀有良好的意图,希望做出有益的事情,但也很容易出现这种情况——尤其是在利润和用户参与度的竞争压力下,事情的发展方向可能并不利于人类整体利益。

I mean, I'm sure that they're all, you know, they have a lot of members in their companies who have good intentions about making, yeah, well intended people, but it could also be that easily, especially when there's like a profit competition and engagement competitions, things could unroll in a way that is not as beneficial for humanity.

Speaker 0

因此,当我们思考如何让人工智能服务人类时,顺便说一下,我所理解的‘AI民主化’,正是我所推崇的:属于人类、为了人类、由人类主导的AI。

So when we think about how to make AI serve humans, this is where, by the way, when I think about AI democratization, I really think about the way that I like to think about is AI of humans, for humans and by humans.

Speaker 0

所以,如果是为人类服务的AI,它应该是为所有人类服务的,而不仅仅是某些国家某些公司的人类。

So if it's AI for humans, it should really be AI for all humans, not just some humans working for some companies in some countries.

Speaker 0

不仅如此,如果我们不谨慎设计AI,我认为它可能会走向一个甚至不是为人类服务、而是为AI服务的阶段。

Not only that, if we are not very careful with how we design AI, I think it could be coming to a point where it's not even AI for humans, but it's more like AI for AI.

Speaker 0

更糟糕的是,人类为AI服务,也就是说,人类在为AI工作。

And even worse, humans for AI, you know, humans working for AI essentially.

Speaker 0

因此,我认为非营利部门参与AI未来的设计至关重要,而不仅仅是营利部门。

And for that I think it's very important for nonprofit sectors to participate in designing the future of AI, not just the for-profit sectors.

Speaker 1

这在一定程度上呼应了这样一种观点:我们这一代最聪明的人才,专注于让人点击广告,而不是推动人类进步、科学进步等。

Part of that echoes the idea of kind of the brightest minds of our generation focused on making people click ads as opposed to, you know, advancing humanity, science, etcetera.

Speaker 1

你暗示这种趋势中还存在一种以AI为导向的层面,我们需要警惕,并积极定义我们的未来。

You're suggesting that there's a kind of an AI oriented aspect to that as well that we need to be, on the lookout for and and, be proactive in defining our future.

Speaker 0

是的。

Yeah.

Speaker 0

我们需要更多投资来支持那些真正思考AI对人类影响的研究,而不仅仅是提升数学问题的基准分数。

And more investment needed to support such research that really think about AI's impact on humanity, not just increasing the benchmark scores on math problems.

Speaker 1

我们回到那个话题吧。

Let's come back to that.

Speaker 1

我们刚才好像正在讨论小型模型的话题。

We were in, I think in the middle of our, small models conversation.

Speaker 1

我们大致谈到了如何让小型模型表现得更好。

And I think we were talking broadly about making small models, perform better.

Speaker 1

但我记不清我们是否深入探讨过推理与小型模型之间的关系,以及推理在小型模型或小型推理器中的独特特性。

But I don't recall us getting to the specifics of reasoning and small models and kind of unique characteristics of reasoning as they pertain to small models or small reasoners.

Speaker 1

你也在关注这一点吗?

Is that something you're looking at as well?

Speaker 0

是的,我对提升小型语言模型的推理能力非常感兴趣,特别是因为推理是任何未来模型都必须具备的重要智能能力。

Yeah, so I'm quite excited about making small language models a better reasoner, especially because reasoning is such an important intelligence capability that any model in the future has to be good at.

Speaker 0

而且这也是一个有趣的挑战,因为互联网数据并不能让大语言模型立即具备良好的推理能力。

And also this is an interesting challenge because internet data doesn't really equip the LLMs to reason well right away.

Speaker 0

它们虽然能一定程度上进行推理,但需要大量的后续训练来增强推理能力。

They can reason to some degree but they do require a lot more post training in order to infuse better reasoning capabilities.

Speaker 0

因此,我最近的研究集中在如何让小型模型更好地推理,这需要在序列微调阶段提供更高质量且多样化的数据,同时结合其他算法手段,从看似无望的小型语言模型中挖掘出更强大的智能。

So my research has focused on recently how to make small models reason better that requires both feeding in better data through sequential fine tuning phase and that data has to be really high quality and diverse as well as other kinds of algorithmic approaches that can squeeze out better intelligence from even from small language models that seemed kind of hopeless.

Speaker 1

在数据整理方面,有没有哪些具体技术正在兴起?

Are there any specific techniques coming to the fore with regards to the data curation side of things?

Speaker 1

还是说,这主要依赖人工手动整理,比如人类在创建大型数据集以提升质量时所做的工作?

Or is it, you know, largely kind of manual labor in, you know, humans curating these, you know, large data sets to increase quality?

Speaker 1

你在这方面看到了什么趋势?

What are you seeing there?

Speaker 0

我认为合成数据有着巨大的前景。

I think there's a huge future in synthetic data.

Speaker 0

我可以举一个我们最近工作的例子,叫做Prismatic Synthesis。

So I can give you one example of our recent work called the Prismatic Synthesis.

Speaker 0

这是一种合成数据生成算法,之所以叫‘棱镜’,是因为它像棱镜一样,能把光分散开来,使数据更加多样化。

It's a synthetic data generation algorithm which is prismatic because it acts like a little bit like a prism that can scatter the light to make it more diversified.

Speaker 0

我们的做法本质上是数学问题合成,更准确地说是数学问题与解答的合成,我们使用DeepSeek R1的320亿参数模型作为教师模型。

So what we do is basically math problem synthesis, where actually it's more like math problem and solution synthesis, and we're doing this using the DeepSeek R1 32-billion-parameter model as the teacher model.

Speaker 0

现在320亿参数属于中等规模,比如今的中等规模稍大一些,但远不如完整的DeepSeek R1,也就是那个6710亿参数的最大模型。

Now 32 billion parameters is medium size, a little bit bigger than medium size these days, but it's much worse than DeepSeek R1, the full model, the biggest model, which is the 671-billion-parameter model.

Speaker 0

这比我们选作教师模型的模型大了大约20倍。

So that's like 20 times bigger than the model that we choose to use as teacher model.

Speaker 0

在这项工作中,我们主要专注于使用这种中等规模的教师模型,为困难的数学问题生成顺序微调数据。

In this work, we primarily focus on making sequential fine tuning data for hard math problems using this medium scale teacher model.

Speaker 0

我们的目标是与另一种方法竞争,即使用大20倍的更强教师模型。

And then our goal is to compete against the alternative, which is to use much stronger teacher that's 20 times larger.

Speaker 0

通常来说,这是一场非常难打的仗。

Now in general, that's like really difficult game to play.

Speaker 0

要击败一个大20倍的教师模型非常困难,因为性能差距十分显著。

It's really hard to beat against a teacher that's 20 times larger because the performance gap is significant.

Speaker 0

那么,我们究竟该如何通过算法手段过滤数据,以确保生成数据的多样性,来缩小这一差距呢?

So how on earth do we close the gap? Through algorithmic ways of filtering the data to ensure the diversity of the generated data.

Speaker 0

因为无论你的教师模型多优秀,正如我们在《人工蜂群》论文中所展示的,它们都会重复。

Because no matter how good your teacher is, as we demonstrated in our Artificial Hivemind paper, they're all repetitive.

Speaker 0

它们都高度同质。

They're all homogeneous.

Speaker 0

因此你必须付出大量努力来使它们多样化。

So you have to put a lot of efforts to diversify them.

Speaker 0

我们多样化数据的方法是使用一个代理模型——一个小规模的代理模型,来观察给定输入时输出的梯度向量。

The way that we diversify data is we look at the gradient vector of output given input using a proxy model, small scale proxy model.

Speaker 0

它非常小。

It's so small.

Speaker 0

它只有15亿个参数。

It's only 1,500,000,000 parameter model.

Speaker 0

我们只是从网上下载了Qwen 1.5B,用它作为代理模型来计算给定输入时输出的梯度。

We just use Qwen 1.5B, downloaded straight from the net, and we use it as a proxy model to compute the gradient of the output given the input.

Speaker 0

这些输入输出对是我们使用DeepSeek R1(320亿参数模型)生成的合成数据。

And these input-output pairs are the synthetic data that we just synthesized using the DeepSeek R1 32-billion-parameter model.

Speaker 0

因此,我们观察这个教师模型生成的每个数据点的梯度表示,然后通过K均值聚类来看它们之间的差异。

So we look at the gradient representation of each data point that this teacher model generated and then we look at how they differ from each other by looking at k means clustering.

Speaker 0

这是一种古老的聚类机制,但在当今依然有效。

It's an old fashioned clustering mechanism that still works in this modern day.

Speaker 0

因此,我们使用张量化的K均值聚类来识别哪些簇过度代表,以及哪些数据点代表性不足。

So we do tensorized k means clustering to see which clusters are overrepresented and then which data points are underrepresented.

Speaker 0

我们会非常严格地过滤掉过度代表的数据点。

We filter out overrepresented data points really aggressively.

Speaker 0

比如,我们会丢弃我们刚刚合成的绝大部分数据,只保留那些彼此独特且不同的数据点。

Like we throw out the vast majority of all the data that we just synthesized and then only maintain those that are unique and different from each other.

Speaker 0

然后用这些数据点在下一轮中提示教师模型。

And then use those data points in order to prompt the teacher model in the next round.

Speaker 0

因此,我们反复进行这一过程:过度生成,然后利用梯度向量进行严格过滤,再过度生成、严格过滤,直到收集到一百万个数据点。

So we iterate through this, over generate and then filter aggressively using gradient to vectors and then over generate and filter aggressively until we gather 1,000,000 data points.

Speaker 0

所以这是一个大量过度生成和过滤的过程。

So it's a lot of over generation and filtration.
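The filtering step Yejin describes, embedding each synthetic example and pruning overrepresented clusters, can be sketched in miniature. This is an illustrative toy, not the paper's implementation: random vectors stand in for the proxy model's gradient features, and plain k-means plays the role of their tensorized variant.

```python
import numpy as np

def diversity_filter(features, n_clusters=8, keep_per_cluster=2, n_iters=20, seed=0):
    """Cluster per-example feature vectors (stand-ins for gradient vectors
    from a small proxy model) with k-means, then keep only a few examples
    per cluster so that overrepresented regions are aggressively pruned."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), n_clusters, replace=False)].copy()
    for _ in range(n_iters):
        # Assign every example to its nearest centroid, then update centroids.
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centroids[k] = features[labels == k].mean(axis=0)
    # Final assignment; keep only the closest few examples per cluster.
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    labels = dists.argmin(axis=1)
    kept = []
    for k in range(n_clusters):
        members = np.where(labels == k)[0]
        kept.extend(members[np.argsort(dists[members, k])][:keep_per_cluster].tolist())
    return sorted(kept)

# 200 fake examples drawn from only three modes: most of them are redundant.
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(c, 0.1, size=(n, 16))
                        for c, n in [(0.0, 150), (3.0, 40), (-3.0, 10)]])
survivors = diversity_filter(feats)
print(len(survivors))  # at most n_clusters * keep_per_cluster = 16
```

The vast majority of the 200 examples are discarded, mirroring the over-generate-then-filter loop; in the real pipeline the survivors would seed the teacher's prompts for the next round.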

Speaker 0

然后我们发现,这一百万个数据点实际上优于你从更强的教师模型——即最佳教师模型——生成的一百万个数据点。

And then we find that that 1,000,000 data points is actually better than the 1,000,000 data points that you generate from the stronger teacher model, the best teacher model.

Speaker 1

你能谈谈在这个例子中你看到的提示和回应的类型吗?

Can you talk a little bit about the, like the kinds of prompts and responses that you're seeing in this example?

Speaker 0

这只是一些困难的数学问题,是的,所有问题都需要很长的解答过程。

It's just some hard math problem in our, yeah, it's all hard math that requires a very long solution.

Speaker 0

顺便说一下,当我们自动生成所有解答时,我们根本不知道答案是否正确。

And so by the way, when we auto generate all the solutions, we have no idea whether the answer is correct or not.

Speaker 0

对吧?

Right?

Speaker 0

所以我们玩了个小花招。

So we play a bit of a trick.

Speaker 0

这个花招很简单。

The trick is simple.

Speaker 0

在生成问题之后,在我们的流程中,连问题本身都是完全合成的。

After we generate the problem, our case is fully synthetic, even for the problems themselves.

Speaker 0

许多其他合成数据通常依赖于互联网上已存在的问题,这样你只解决真实的问题,但在我们的情况下,我们也会生成问题本身,因为我们真的想探索数学推理领域的多样化范围。

A lot of other synthetic data usually relies on problems that exist on the internet so that you only solve legit problems but in our case we generate the problems as well because we really wanted to explore diverse scope of the math reasoning domain.

Speaker 0

但这些为虚假问题或元问题生成的解答,可能正确,也可能不正确。

But these solutions generated for fake problems or meta problems may or may not be correct.

Speaker 0

因此,我们让模型多次解决同一个问题,然后检查最终答案是否彼此一致。

So what we do is we ask the model to solve the same problem multiple times and then check whether the final answer is identical to each other.

Speaker 0

如果不一致,我们就担心质量可能有问题。

If not, then we worry that the quality might be bad.

Speaker 0

这是一种非常粗糙的数据过滤方式,但在我们的情况下效果足够好,我认为这是一种在无需人工验证的情况下控制合成数据质量的强大方法。

It's a very crude way of filtering data but it worked well enough for our case and I think this is a powerful method to use for controlling the quality of synthetic data without human validation.
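The self-consistency check she outlines can be sketched as follows. `solve` is a hypothetical stand-in for sampling one solution from the teacher model; a real pipeline would extract and compare final answers rather than whole strings.

```python
from collections import Counter

def passes_consistency(solve, problem, n_samples=4, min_agreement=1.0):
    """Sample several solutions to the same problem and keep it only if the
    final answers agree often enough: a crude quality filter that needs no
    human validation and no ground-truth answer."""
    answers = [solve(problem) for _ in range(n_samples)]
    _, count = Counter(answers).most_common(1)[0]
    return count / n_samples >= min_agreement

# Toy stand-in: a deterministic "solver" always agrees with itself.
print(passes_consistency(lambda p: eval(p), "2*(3+4)"))  # True
```

Lowering `min_agreement` would trade precision for recall, keeping problems where most, but not all, sampled answers match.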

Speaker 1

明白了。

Got it.

Speaker 1

所以,你——我不知道这样问是否恰当,或者有没有更好的问法,但我只是想了解一下你的做法。

And so you, what's the, I don't know if this is the right way to ask this question or, or, if there's a better way to ask the question, but I'm trying to get a sense for you.

Speaker 1

我想你提到过生成一个包含一千个数据点的数据集,而且这个过程是分多轮进行的,对吧?

I think you mentioned generating a dataset of a thousand points and it's happening over multiple rounds, I guess.

Speaker 1

我想弄清楚的是,在各轮之间提示词的变动程度有多大?你们是始终使用同一个提示词,然后在每轮中基于某种对数级的变异来逐步增加差异,还是完全不改变提示词本身?

And I'm trying to understand like the degree to which the prompt is varied across rounds, or are you starting from like a prompt and developing variance based, you know, kind of, some logarithmic number of variance with each round without any kind of variance of the prompt?

Speaker 0

非常好的问题。

Excellent question.

Speaker 0

关于提示,我们会展示一些示例。

We so for prompting, we show some examples.

Speaker 0

嘿,生成这种类型的数学题。

Hey, generate math problems of this kind.

Speaker 0

在多次生成和过滤的过程中,我们改变的是提示中展示的示例。

And so what we change through these multiple iterations of over generation and filtration is the examples that we show in the prompt.

Speaker 0

这些示例来自之前的迭代,目的是希望模型在接触到更新颖、更多样化的上下文示例时能受到更多启发。

So the examples come from the previous iterations, that in the hope that the models may feel more inspired when provided with newer, more different examples in the context.

Speaker 1

在每个阶段,你们是验证问题的答案,还是不验证?

And you're at each stage, are you validating the answer to the question or you're not?

Speaker 0

我们确实会验证那些我们决定保留的内容。

We do validate whatever we decided to keep, we do validate.

Speaker 0

因此,我们最终合成的这100万个示例,希望大部分都具有真正正确的答案。

So that the final 1,000,000 examples that we've synthesized hopefully have by and large actually correct answers.

Speaker 0

不能保证。

Not guaranteed.

Speaker 1

这里的核心思想是,不要一开始就设定某个提示并直接生成一百万个例子,而是分阶段进行,每个阶段生成一部分,然后逐步发散。

The core idea here is as opposed to starting with some prompt and saying, generate a million examples, do some, you know, have end stages and at each stage generate some section and kind of disperse from there.

Speaker 1

你发现这种方法在保持数据集多样性方面效果很好。

And you've found that to work well for maintaining a degree of diversity in the dataset.

Speaker 0

所以,这取决于你怎么看待它,这是一个简单而直接的想法和方法,这也是这种方法的主要优势,因为其核心原则是:你真正想要的是与互联网上容易生成的数据在质上不同的数据。

So this is, you know, depending on how you view it, it's a simple idea and a simple method which is the big benefit of this method because the principle here is that you really want to make data that's different, qualitatively different from the internet data that's easy to generate.

Speaker 0

你必须深入这些相对较少探索的领域,当然,你也需要确保这些领域的质量足够好,但真正重要的是,我们试图在数据集生成过程中定量地提升多样性,我认为仅这一点就能带来显著效果。

You really need to go to these relatively less explored regions And then, of course, you need to make sure that the quality is reasonably good there as well, but really it's a diversity that we're trying to quantitatively enhance in the data set production and I think that alone can really go quite far.

Speaker 0

比如,如果你真正覆盖了多样化的领域,那么你的模型表现就会好得多,因为归根结底,当今的大型语言模型无论多么出色,其能力都取决于它所训练的数据。

Like if you really cover diverse ground, then your model can perform so much better because at the end of the day, current day LLMs, no matter how amazing they are, they are only as good as the data that it was trained on.

Speaker 0

所以你必须展示类似的例子。

So you have to show similar examples.

Speaker 0

类似的例子越多,模型在测试时应对各种情况就越好。

The more similar examples, the better for whatever the model may have to deal with during test time.

Speaker 0

我喜欢这样表达:任何分布外的东西,都要把它变成分布内的。

The way that I like to put it is that whatever is out of distribution, just make in distribution.

Speaker 0

确保你把所有分布外的都变成分布内的。

Make sure that you make all the out of distribution in distribution.

Speaker 0

这就是生成式人工智能的工作方式。

This is how generative AI works.

Speaker 0

这就是自动驾驶汽车的工作方式。

This is how even self driving car works.

Speaker 0

确保你覆盖所有情况、边缘案例,一切都要纳入你的训练数据。

Make sure that you cover all the rows, corner cases, everything, and that should go into your training data.

Speaker 0

这与人类学习驾驶的方式截然不同。

This is really different from how humans learn to drive.

Speaker 0

我们不需要看到大量这些边缘案例,我们就能应对自如,这正是智能真正的奥秘,我希望有一天我们能对此有所解答。

We do not need to see a lot of these corner case examples, we just deal with it, which is the real mystery of intelligence that I wish one day we will have some answers for.

Speaker 0

我们的数据效率非常高。

We're so data efficient.

Speaker 0

但目前,生成式人工智能在当前框架和范式下,唯一的方法就是确保所有分布外的数据都变成分布内的数据。

But right now, the generative AI, the only way under this current framework and current paradigm is making sure that all the out of distribution becomes in distribution.

Speaker 0

无论你怎么做,都要确保这一点发生。

However you do it, just make sure that that's going to happen.

Speaker 0

因此,后训练阶段需要大量整理数据,甚至合成数据,或者将两者结合使用。

And so that's why post training requires curating a lot of data or even synthesizing and curating and taking some combination of the two.

Speaker 0

但即便如此还不够,因此你需要大规模进行强化学习,因为强化学习是另一种将分布外数据转化为分布内数据的方法。

And then even that's not enough, therefore you do reinforcement learning at scale because reinforcement learning is another way of making out of distribution, in distribution.

Speaker 0

通过让模型自行探索所有这些未被探索的领域,确保在进入测试阶段之前,所有区域都已被充分探索。

By having the model to explore itself, all these other unexplored areas, make sure that it's all explored before you get into the testing phase.

Speaker 1

你知道,每当我参与这类关于数据作用以及使用各种技术来改进小模型或大模型数据的讨论时,

You know, whenever I'm in these conversations that we're talking about the role of data and the idea of using, you know, varying techniques to improve the data for, you know, small or large models.

Speaker 1

它总会让我回想起几年前,你在斯坦福的同事吴恩达开始大力倡导"以数据为中心的人工智能"(data-centric AI)理念。

It kind of brings me back to a few years ago when your colleague there at Stanford, Andrew Ng, you know, started kind of planting this banner around data centric AI.

Speaker 1

正如你前面提到的,所有的人工智能本质上都是数据驱动的。

And as you noted earlier, like all AI is data centric.

Speaker 1

那这到底意味着什么?

So what does that really mean?

Speaker 1

但通过关注用于训练模型的数据,而非反复调整算法来提升模型这一理念,依然具有很强的共鸣。

But the idea that we're gonna improve models by focusing on like how the data that's used to train them as opposed to iterating on it, on the algorithms continues to resonate.

Speaker 0

是的。

Yep.

Speaker 0

意思是我们在这一方向上并没有走得太远。

In the sense we didn't go very far.

Speaker 0

对。

Yeah.

Speaker 0

但我们只是以一种更实证有效的方式做了更多这类工作。

But we're just doing a lot more of it in a more empirically powerful way.

Speaker 0

仍然有更多可以做的。

And still more can be done.

Speaker 0

从这个角度来看,当我们审视那些看似神奇的生成式AI前沿模型时,确实有点令人失望,但现实如此,我认为即使在当前的范式下,我们也能做得更好。

I think it's a bit disappointing when we look at the seemingly magical generative AI frontier models from this lens, but it is what it is and I think we can do a lot better even following the current paradigm.

Speaker 0

但我认为,这一直是我们的对话中反复出现的主题:一定存在一种更根本的更好方法来实现这一点。

But I, you know, like this is a recurring theme of our conversation which is that there must be a better way of, fundamentally better way of doing this.

Speaker 0

我们能找到它吗?

And can we find it?

Speaker 0

在某种意义上,自然已经找到了一种解决方案,那就是人脑。

In some ways, the nature found a solution, which is the human brain.

Speaker 0

自然找到了一种解决方案,而人脑所需的能量极少。

The nature found a solution and human brain requires so little energy.

Speaker 0

我们的大脑消耗的能量似乎比一个灯泡还少。

Our brain apparently uses less energy than one light bulb.

Speaker 1

所以,到目前为止,你提出的办法是专注于数据,或者说,是创建符合多样化、分布式模式但质量仍受约束的合成数据。

And so, you know, thus far you're proposing that one way to do this is to, focus on the data, or thus far, you're proposing that one way to do this is to create synthetic data that, follows, you know, you know, diverse and distributed pattern while still kind of constrained in quality.

Speaker 1

你也在探索将强化学习融入预训练目标的方法。

You are also looking at ways to incorporate reinforcement as part of the pre training objective.

Speaker 1

你能谈谈这项工作吗?

Can you talk a little bit about that work?

Speaker 0

是的

Yeah.

Speaker 0

所以,这是一种将强化学习作为预训练目标的新论文,我们最近刚刚发表。

So that's a reinforcement learning as a pre training objective, a new paper that we recently put out.

Speaker 0

大致来说,这个想法是,在预训练过程中,模型被迫以完全被动的方式学习预测下一个词元。

And roughly speaking, the idea is that during pre training, the model is forced to be completely passive in the way that it learns to predict which token comes next.

Speaker 0

但如果我们鼓励模型自己思考呢?

But what if we encourage the model to think for itself?

Speaker 0

在预测下一个词元之前,如果我们鼓励模型先生成类似思维链的内容,再预测下一个词元,会怎么样?

Before predicting the next token, what if we encourage the model to think for itself by generating, you know, something like chain of thought and then predict next token?

Speaker 0

在这种情况下,由于是强化学习,我们现在需要考虑奖励机制。

And then in that context, the reward, because it's reinforcement learning, we now need to think about reward.

Speaker 0

奖励可以有多种设计方式,但我们方法的核心思想是:用有思考时预测下一个词元的信息增益,来对比没有思考时的预测,这样你就必须比自己不思考时预测得更好。

The reward is could be I mean, could be different ways of designing this, but the key idea of our approach is to make the reward information gain of predicting next token with thought compared to without thought, so that now you have to be able to predict the next token even better than yourself predicting the next token without a thought.

Speaker 0

因此,你必须学会更好地思考,使你在有思考时的下一个词元预测概率,优于你没有思考时的预测概率。

So, you have to learn to think better so that your next token prediction probability becomes better than your own prediction probability without a thought.

Speaker 0

这样我们就鼓励模型在回答下一个词之前先独立思考。

So that way we encourage the model to think for itself before answering the next token.

Speaker 0

当您

And when you

Speaker 1

您提到信息增益,然后又说希望模型能更好地预测下一个词。

said information gain and then you went on to say that you want the model to predict the next token better.

Speaker 1

当我听到信息增益时,我会想到最大化意外性,也就是希望奖励那些……其实我也说不清楚具体是什么。

Like when I hear information gain, I think of like maximizing surprise in the sense of you want the, you want to give reward to predictions that, I don't even know how to describe it.

Speaker 1

并不是指更准确意义上的更好,而是可能更倾向于多样性这样的理解。

Not necessarily better in the sense of, you know, more accurate, but better in the sense of, you know, maybe more diverse is even the way I'm thinking about it.

Speaker 1

您能详细解释一下这部分吗?

Can you elaborate on that part?

Speaker 0

是的,实际上在这个情境中,我们做的恰恰是与多样化相反的事情。

Yeah, actually, we kind of do the opposite of diversification in this context.

Speaker 0

因此,在这个框架下,我们采用的强化学习方法是建立在预训练范式之上的,而预训练的核心就是预测下一个词。

So in this context what we are trying to do is because we frame this reinforcement learning approach under the pre training framework in which it's all about next token prediction.

Speaker 0

所以我们采用这个总体框架。

So we go with that overarching framework.

Speaker 0

一切都围绕着下一个词的预测。

So it's all about next token prediction.

Speaker 0

但在预训练的最后阶段,我们通过定义奖励为信息增益,赋予其一种成人式的风格——这种信息增益来自于能够通过良好的思考、良好的中间思考来预测下一个词。

But we give some adult style flavor by incorporating a reward during the last phase of pre training by defining reward as information gain of being able to predict the next token with good thought, good intermediate thought.

Speaker 0

因此,我们关注的是:在给定所有先前词并结合你自己的思考的前提下,下一个词的条件概率;将其与仅在给定所有先前词(不含你的思考)的情况下预测下一个词的条件概率进行比较。

And then so what we look at is the conditional probability of the next token given all the previous tokens concatenated with your own thought, you know, compare that with a conditional probability of predicting next token and all the previous tokens without your thought.

Speaker 0

因此,你需要比较这两个数值,而这里的挑战在于:只有当你的中间思考在结合所有先前词的基础上,确实提升了下一个词的条件概率时,你才能获得奖励。

So you compare these two quantities and then the challenge here is that you got to generate, you get reward only if your intermediate thought actually increase the conditional probability of predicting next token when you concatenate your intermediate thought in addition to all the previous tokens.

Speaker 0

所以,这不是一个容易获得的奖励。

So it's not an easy reward to get.
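The reward she describes, the gain in log-probability of the observed next token when the model's own thought is prepended to the context, can be written down directly. A minimal sketch, where the two probabilities are assumed to come from the same model scored with and without the generated thought:

```python
import math

def info_gain_reward(p_next_with_thought, p_next_without_thought):
    """Log-probability gain on the true next token from conditioning on the
    model's own intermediate thought; positive only when the thought
    genuinely improves next-token prediction."""
    return math.log(p_next_with_thought) - math.log(p_next_without_thought)

# A helpful thought raises the next-token probability: positive reward.
print(info_gain_reward(0.40, 0.10) > 0)  # True
# A useless thought leaves the probability unchanged: zero reward.
print(info_gain_reward(0.10, 0.10))      # 0.0
```

Because the baseline is the model's own no-thought prediction, the reward bar rises as the model improves, which is one reading of why it is "not an easy reward to get".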

Speaker 1

所以,这不是关于词的信息增益,而是关于思考的信息增益,相对于生成词而言。

So not information gain with regard to the token but information gain with regard to the thought, relative to generating the tokens.

Speaker 0

是的,没错。

Yeah, yeah.

Speaker 1

是的,是的。

Yeah, yeah.

Speaker 1

你刚才提到,获得这种奖励并不容易,就像我听到的,这里面涉及一些推理的成分,那么你预计像这样的技术会使训练的复杂性增加到什么程度?

You just mentioned it's not an easy reward to get, like to what degree do you expect, like I'm hearing in there, you know, aspects of inference, like to what degree do you expect the, the complexity of training to increase based on techniques like this?

Speaker 0

这使得预训练的计算量比以前高得多。

Makes computation of pre training much higher than before.

Speaker 0

因此,在我们的工作中,我们做了大量实验。

So in our work, we do a lot of experiments.

Speaker 0

其中一个实验设置是,在预训练期间控制使用或不使用RLP时的标记数量。

One experimental setting is to control the amount of tokens during pretraining with or without RLP.

Speaker 0

所以,如果你使用相同数量的标记,那么正如你注意到的,我们会用掉多得多的浮点运算量(FLOPs)。

So if you use the same amount of tokens, then we use a lot more FLOPs, as you noticed.

Speaker 0

另一个实验设置是,我们控制浮点运算量(FLOPs)不变,这样我们现在使用的标记就少得多。

So another empirical setting is we control for the FLOPs so that we use way fewer tokens now.

Speaker 0

在预训练的最后阶段,我们使用少得多的标记,但使用相同数量的浮点运算量(FLOPs)。

In the last phase of pretraining, we use way fewer tokens, but we use an identical number of FLOPs.

Speaker 0

让我大为惊讶的是,即使在控制FLOPs的设置下,以这种方式完成预训练后,最终的预训练模型在后训练阶段表现也好得多。

And what we found, to my big surprise, is that when you finish your pretraining in this way, even in the FLOPs-controlled setting, the final pretrained model does do much better after post-training.

Speaker 0

不仅在那一刻,这个模型在推理基准测试中表现更好——当然,它在推理基准测试中表现更好,因为它被激励去思考以预测下一个词。

So not only at that point in time does this model do better on reasoning benchmarks. Of course it will do better on the reasoning benchmarks, because it was incentivized to be able to think for predicting next tokens.

Speaker 0

但不仅如此,如果你使用相同的后训练方法,比如顺序微调加上强化学习,这种性能提升依然存在,使得你的模型在面对重度推理的后训练方法时表现得更好。

But also not only that, if you apply the same post training recipe like sequential fine tuning followed by RL, the performance gain survives such that your model now performs even better with reasoning heavy post training recipe.

Speaker 0

这意味着什么?这有点类似于人类的发育过程,比如在某个关键时期你必须习得语言。

So what does this entail is it's a bit analogous to how humans you know, there's this critical period during which you have to acquire language, for example.

Speaker 0

很可能在人生早期学习数学和逻辑思维,比在晚年才开始学习要好得多。

And probably it's a good idea to learn math and logical thinking reasonably early in your life as opposed to much later in life.

Speaker 0

因此,我们实证发现,预训练过程中也发生了类似的情况。

So something like that is happening even with the pre training that we empirically found.

Speaker 1

有趣。

Interesting.

Speaker 1

有趣。

Interesting.

Speaker 1

你知道,我们之前提到过,在结束之前会回到关于普及人工智能这个更广泛的话题。

You know, mentioned that we would come back to this broader idea of democratizing AI before we close out.

Speaker 1

在那里你提到的一个主题是多元化的对齐。

And one of the topics there that you mentioned is pluralistic alignment.

Speaker 1

你能详细解释一下这指的是什么,以及你为什么对这个想法感到兴奋吗?

Can you elaborate on what that means and why you're excited about that idea?

Speaker 0

是的,这涉及到我之前说过的一个观点,即人工智能是由人类创造、为了人类、属于人类的。

Yeah, so that goes to this earlier statement I made about AI of humans, for humans and by humans.

Speaker 0

所谓‘由人类创造的人工智能’,是指人工智能的起源——它所学习的价值观、知识和各种规范,都应该真实反映全人类的多样性。

So AI of humans is about the origin of AI being really humans in terms of the values, the knowledge, the different kinds of norms that AI learns from should really reflect the entirety of humanity.

Speaker 0

这就是这个理念的核心。

So that's the idea.

Speaker 0

当然,互联网并不能均衡地反映全人类。

And of course, internet is not evenly reflecting all humanity.

Speaker 0

因此,由此产生的AI在某些方面也存在偏见。

Therefore, the resulting AI is also biased in some ways.

Speaker 0

那么问题来了,我们能做些什么吗?

And the question is, is there anything we can do?

Speaker 0

这真的很难。

So it's really hard.

Speaker 1

去偏见化互联网,作为训练数据的用途之一。

De biasing the internet for, you know, one uses training data.

Speaker 1

这听起来很难。

That sounds hard.

Speaker 0

这几乎是不可能的,对吧?

It's almost impossible, right?

Speaker 0

因为你无法回到过去,让国王和王后的数量变得均等。

Because you cannot go back and change the history to make even number of kings and queens.

Speaker 0

人类历史上已经发生的事情,都已经发生了。

Whatever happened in the humanity already happened.

Speaker 0

所以你是无法改变这一点的。

So you're not going to change that.

Speaker 1

不过另一方面,我们确实有去偏的统计方法。

Although like there, you know, on the other hand, like, you know, we have de biasing statistical techniques.

Speaker 1

所以如果你把训练集看作仅仅是数据集,我们是有办法对其进行去偏的。

And so if you think of your training set as just kind of a data, you know, a dataset, we have ways to debias that.

Speaker 1

让我感到困惑的是,这样做我们会失去什么?

The question that jumps out to me is like, you know, what do you lose in doing so?

Speaker 1

而且,你会不会失去互联网中某些根本性的特质,而正是这些特质让这些我们并不完全理解的大型语言模型得以运行?

Like, and do you lose some fundamental aspect of, you know, internet that you know, made these LLMs that we don't understand how they work, work?

Speaker 0

是的。

Yeah.

Speaker 0

我觉得去偏并不是正确的解决方案,因为去偏本身也几乎是不可能的,而且有时候我们确实希望保留自己的偏见。

You know, I don't think debiasing is the right answer in the sense that debiasing is also impossible, but also that, you know, sometimes you do want to maintain your own bias.

Speaker 0

例如,如果你是一个宗教信徒,或者来自某个有着特定规范的国家,那么也许我们应当尊重这些差异。

For example, if you're a religious person or if you're from a certain country where you have a particular norms that you like to go by, then maybe, you know, we want to respect that.

Speaker 0

顺便说一句,作为人类,当我们与拥有不同价值观的人互动时,我们有一种方式可以妥善应对:保持礼貌、保持尊重,求同存异,对吧?

By the way, as a human, when we interact with other human beings who we know have different values, we have a way of navigating around this person such that we maintain politeness, we maintain respect, we agree to disagree, right?

Speaker 0

因此,在某种程度上,我认为AI意识到多元价值观并能够在此基础上进行调适,而不是一味追求完全中立,这一点非常重要。因为完全中立不仅可能无法实现,而且如果我们希望以尊重的方式服务不同的文化规范,这可能也不是理想的解决方案。

So to some degree, think this is very important that AI is aware of diverse values and then be able to navigate around it as opposed to just being completely neutral everywhere, which may not only be not attainable but also may not be the desirable solution if we are trying to really serve different cultural norms in a respectable manner.

Speaker 0

因此,在我的研究中,我们从三个不同角度思考多元对齐问题。

So in my work, we think about pluralistic alignment from three different angles.

Speaker 0

这些概念包括奥弗顿多元主义(Overton pluralism)、分布性多元主义和可调控多元主义。

There's something called Overton pluralism, distributional pluralism and steerable pluralism.

Speaker 0

这些概念需要进一步解释。

So these concepts require explanations.

Speaker 0

也许我们先从奥弗顿多元主义开始吧。

Maybe let's just start with Overton.

Speaker 0

所以,奥弗顿多元主义的意思是,当你提问时

So Overton pluralism means you ask

Speaker 1

是指奥弗顿窗口的意思吗?

In the sense of the Overton window?

Speaker 0

是的。

Yeah.

Speaker 0

所以当你提出一个具有政治敏感性的问题时,比如这个问题可能有多种答案,最好的方式可能是让大语言模型直接呈现所有合理的观点,例如:‘嘿,人们的看法各不相同。’

So it's like when you ask a question that's politically thorny, for example, that could have different answers, the best way might be for an LLM to just present all of them, all of the reasonable opinions as, 'Hey, the answer is that people have different opinions.

Speaker 0

这是一种观点,另一种观点是……”,而不是只选择多数意见,因为那样会边缘化其他声音。

Here's one view, there's another view' and be able to include all of them as opposed to picking the majority opinion because that marginalizes out the rest.

Speaker 0

所以能够涵盖所有这些可能性。

So being able to cover all of these options.

Speaker 0

分布型多元主义是指AI被用于更偏向决策过程的场景,比如AI在筛选求职申请或回答问题时,必须以更分类化的方式作答。

Distributional pluralism is when AI is made for more of a decision-making process, where maybe AI is doing job-application filtering, or AI has to answer questions in a more categorical manner.

Speaker 0

你必须选择一个答案。

You have to choose an answer.

Speaker 0

你不能给出所有可能的答案。

You cannot give all of the answers.

Speaker 0

那么从分布角度看,大语言模型的回答分布应当模拟人类决策的分布。

Then distributionally, the distribution of LLM answers should mimic the distribution of human's decisions.

Speaker 0

因此,在每一个时间点,AI所做的决定可能与任何一个人类的决定都不相同。

So at each point in time, AI might be making a decision that differs from any other human decisions.

Speaker 0

然而,当我们观察整体分布时,与其总是偏向多数观点,导致分布严重失衡,我们的目标是至少让分布比现在更均衡一些。当然,人类本身存在偏见,我们的决策分布也未必公平或无偏,对吧?

However, when we look at the overall distribution, instead of going for the majority case all the time, which would be distributionally super skewed, the idea is that try to be more at least distributionally even compared to now, you know, of course humans have a bias so it's not like our distribution of decisions is necessarily fair or unbiased, right?

Speaker 0

但至少别比这更糟,这就是我们的初衷。

But at least let's not get worse than that is the idea.
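One simple way to quantify the distributional idea (my illustration, not a metric from the episode) is the total-variation distance between the model's answer distribution and the human one: always picking the majority answer scores worse than mirroring the human split.

```python
from collections import Counter

def distribution_gap(model_answers, human_answers):
    """Total-variation distance between two empirical answer distributions:
    0 means the model matches humans distributionally, values near 1 mean
    it concentrates on answers humans rarely give."""
    options = set(model_answers) | set(human_answers)
    m, h = Counter(model_answers), Counter(human_answers)
    return 0.5 * sum(abs(m[o] / len(model_answers) - h[o] / len(human_answers))
                     for o in options)

humans = ["A"] * 60 + ["B"] * 40          # humans split 60/40
majority_model = ["A"] * 100              # always picks the majority answer
matched_model = ["A"] * 60 + ["B"] * 40   # mirrors the human split
print(round(distribution_gap(majority_model, humans), 3))  # 0.4
print(distribution_gap(matched_model, humans))             # 0.0
```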

Speaker 0

最后一个,可调节的多元主义,意思是你可以引导模型采用不同的价值体系、模型框架或价值框架,以满足你在合理范围内可执行的日常需求。

Now the last one, steerable distribution is sorry, steerable pluralism is that you are able to steer the model to different values, model framework or value framework to serve your day to day needs within the scope that's reasonable to execute this.

Speaker 1

也就是说,在某些情境下,你可能希望模型以更开放或更保守的方式体现多元性?

Meaning in some scenarios you might want to be more or less pluralistic in the way the model operates?

Speaker 0

是的,模型应该能够调整到任何合理的不同价值体系。

Yeah, so the model should be able to steer to any different value system that's reasonable.

Speaker 0

当然,问题在于什么是合理的,因为我们可能不希望允许模型被引导去支持那些想犯罪的人。

Know, of course the question is what is reasonable because maybe we don't want to allow the model to be steerable to support people who want to be criminal.

Speaker 0

罪犯的价值体系应该完全排除,但在合理、合法且社会可接受的范围内,能够引导模型服务于你的价值体系是很重要的。

Criminal's value system probably should be completely out, but within the reasonable, like legal, socially acceptable scope, the ability of being able to steer your model to serve your value system.

Speaker 1

除了识别出多元主义方法的这三种维度之外,你们在探索实现这些方法的具体途径上进展到什么程度了?

How far along are you in identifying ways to do these things, beyond identifying the three dimensions of pluralistic approaches?

Speaker 0

因此,这类研究需要数据研究和算法研究并重。

So, this sort of research requires both data research as well as algorithmic research.

Speaker 0

令我高兴的是,许多聪明的学者开始在这两个方面开发解决方案。

To my delight, a lot of smart academics started developing solutions for both fronts.

Speaker 0

因此,出现了一些新的算法,能够实现更具多元性的对齐,例如多元分布。

And so there are some new algorithms that do more pluralistic alignment, for example distributional pluralism.

Speaker 0

其他人也在研究这个问题。

And other people are working on this.

Speaker 0

但有人可能会说,前沿模型其实并不差。

But one could argue that, hey, the frontier models are not so bad.

Speaker 0

总体而言,前沿模型在这种多元对齐方面比那些在数据整理、安全防护等方面投入较少的开源模型表现更好。

So frontier models in general are better at this kind of pluralistic alignment compared to open source models that went through less effort in terms of data curation, safety guardrails, and so on.

Speaker 0

因此,为了提升更小的模型——尤其是开源模型——的性能,以实现更广泛的可及性。

So, in the spirit of making smaller, especially open source models more powerful for wider accessibility.

Speaker 0

这需要更多的学术研究来共享数据、制作数据,以及提出算法创新,以克服数据的局限性。

This requires more academic research to make and share data, as well as algorithmic innovations to get around the limitations of the data.

Speaker 0

仍然有很多工作要做,但至少已经形成了社区协作的氛围。

There's still a lot of work to be done, but at least there's a community effort.

Speaker 1

这太棒了。

That's awesome.

Speaker 1

我们将在2026年初重新建立联系。

So we're reconnecting at the beginning of the year, 2026.

Speaker 1

你对今年可能会发生什么有什么想法或预测吗?或者你期待看到哪些进展?

Any thoughts or predictions on what you expect to see happen this year, or maybe what you would be excited about seeing?

Speaker 0

是的,我认为社区对小模型的努力还会进一步加剧。

Yeah, I think the community efforts on small models will escalate even further.

Speaker 0

去年,开源社区的努力已经不断增加,当然,现在英伟达也在大力投入支持开源项目。

Last year there were already increasing efforts from the open source community, and of course now NVIDIA is also heavily invested in supporting open source efforts.

Speaker 0

因此,我们可以预见这方面的投入会大幅增加,这是我能做出的一个明显预测。

And so one obvious prediction I can make is that we will see a lot more of that.

Speaker 0

另一个是人工智能在科学领域的应用。

Another one is the use of AI for science.

Speaker 0

我个人对此非常兴奋,因为人工智能在科学领域带来的积极影响可能会非常显著。

I'm quite excited about that personally because the positive impact of AI for scientific domains can be really phenomenal.

Speaker 0

如果我们能正确地做到这一点,那么医学和人类生活的不同方面都将真正受益于人工智能驱动的科学。

If we know how to do it right, then medicine and different aspects of human life could really benefit from AI for science.

Speaker 0

这同样是一个极具挑战性的智力难题,因为现在真正需要的是能够超越互联网数据中反映的人类知识的水平。

And that's also a really hard intellectual challenge, because it requires being able to reach knowledge that's above and beyond the human knowledge reflected in internet data.

Speaker 0

问题是,人工智能只擅长学习人类能够提供的数据。

And the thing is AI is only really good at learning the data that humans are able to provide.

Speaker 0

因此,这也是一个巨大的智力挑战。

So this is a big intellectual challenge as well.

Speaker 0

我非常期待进一步探索这个方向。

And I'm very excited about pursuing further into that direction.

Speaker 1

Yejin,非常感谢你再次加入我们,向我们更新你正在从事的工作,特别是你如何应对小语言模型的推理问题。

Well, Yejin, thanks so much for jumping back on with us and giving us an update on what you're working on, and in particular how you're approaching reasoning for SLMs.

Speaker 0

非常感谢你们再次邀请我。

Thank you so much for having me again.

Speaker 1

谢谢。

Thank you.

Speaker 0

再见。

Bye bye.
