The Neuron: AI Explained

Diffusion for Text: Why Mercury Could Make LLMs 10x Faster

Episode Overview

Diffusion models changed the way we generate images and video, and now they're coming for text. In this episode we talk with Stefano Ermon, Stanford computer science professor and founder of Inception Labs, about how diffusion models work for language, why they can generate in parallel (rather than one word at a time), and what that means for latency, cost, and real-time AI products. We discuss:

- The simplest mental model for diffusion: generate a full draft first, then refine it step by step by "fixing mistakes"
- Why autoregressive LLM inference today is often memory-bound, and how diffusion shifts it toward a compute pattern better suited to GPUs
- Where Mercury wins today: IDEs, voice and real-time agents, customer support, edtech, anywhere humans can't wait
- What changes, and what doesn't, with long context and architecture choices
- How models are really evaluated in production: offline evals plus gold-standard A/B tests

Stefano also shares where Mercury is headed, particularly around stronger planning and reasoning for agentic use cases. Try Mercury and learn more: inceptionlabs.ai. For more pragmatic conversations about AI systems that actually work, subscribe to The Neuron newsletter: https://theneuron.ai

Bilingual Subtitles

Text subtitles only; Chinese audio is not included. To listen while you read, use the Bayt podcast app.

Speaker 0

我们当时在匹配困惑度,但速度能快上十倍。

We were matching the perplexity, but we were able to be like 10 times faster.

Speaker 0

这让我非常兴奋,我真的很想看看,如果训练一个比GPT-2更大的模型,是否能打造出具有商业可行性的产品。

That was super exciting to me, and I really wanted to see what happens if you train something bigger than a GPT-2 model. Is it possible to build something commercially viable?

Speaker 0

因此,我创办了这家公司,以扩大规模。

And that's why I started the company to scale things up.

Speaker 0

我们今天使用的自回归模型在推理工作负载中的算术强度非常差。

The arithmetic intensity of inference workloads that we have today with an autoregressive model is very bad.

Speaker 0

利用率非常低,这就是为什么人们正在建造庞大的数据中心,甚至开发专门的AI推理芯片,以更好地适应这类工作。

The utilization is very low, and that's why people are building massive data centers or even building custom chips, AI inference chips, that are better suited for that kind of work.

Speaker 0

基本上,如果你能每秒生成更多令牌,这意味着在相同的硬件资源、相同数量的GPU下,你可以生成更多的令牌。

Basically, if you can generate more tokens per second, what this means is that for the same amount of hardware for the same number of GPUs, you can produce more tokens.

Speaker 0

因此,每个令牌的成本将会下降。

And so the cost per token is going to go down.

Speaker 0

这就是为什么我们能够以比别人低得多的成本提供模型,因为我们更高效地利用了现有硬件。

And that's why we're able to serve our models much more cheaply than what you could get because we make better use of the existing hardware.

Speaker 0

因此,我们现在生产中的Mercury模型要大得多。

So now the Mercury models that we have in production are significantly larger.

Speaker 0

它们是在更多数据上训练的。

They've been trained on more data.

Speaker 0

这将使Mercury模型变得更加聪明。

That's going to enable Mercury models to be even smarter.

Speaker 0

它们将具备更好的规划和推理能力。

It's gonna have much better planning and kind of like reasoning capabilities.

Speaker 0

这将推动许多人们非常关注的智能代理应用场景。

And so that's gonna enable a lot of agentic use cases that people really care about.

Speaker 0

它们将会变得非常、非常快。

It's gonna make them really, really fast.

Speaker 1

欢迎各位来到Neuron AI播客。

Welcome humans to the Neuron AI podcast.

Speaker 1

我是主持人科里·诺尔斯,和往常一样,我身边是那位能把GPU基准测试变成睡前故事的人——格兰特·哈维。

I'm your host, Corey Knowles, and I'm joined as always by the man who could turn a GPU benchmark into a bedtime story, Grant Harvey.

Speaker 1

今天怎么样,老兄?

How's it going today, man?

Speaker 2

挺好的。

It's going great.

Speaker 2

不过别现在就让我当场回答这个问题。

Don't put me on the spot to do that right this moment, though.

Speaker 2

我得好好想想那里的某些机制。

I'd have to think of some mechanics there.

Speaker 1

哦,稍后我们就要邀请斯坦福大学计算机科学教授、Inception Labs创始人Stefano Ermon,他创建了Mercury扩散大语言模型。

Oh, well, here in just a few, we're gonna be joined by Stefano Ermon, Stanford University computer science professor and the founder of Inception Labs, which created the Mercury diffusion large language models.

Speaker 1

但在那之前,Grant会先为我们提供一些背景信息。

But first, Grant's gonna share a little context with us before we get in there.

Speaker 2

是的。

Yeah.

Speaker 2

所以图像扩散模型的工作方式与下一个词预测的GPT模型完全不同。

So image diffusion models work in an entirely different way than the next-token-predicting GPT models.

Speaker 2

所以我们今天邀请了斯特凡诺,因为他将这项技术应用到了大语言模型上,这有可能彻底改变AI在各种场景中的使用方式,从智能代理到复杂的企业工作流程。

So we've invited Stefano today because he's taken that same technology and applied it to LLMs, and it has the potential to transform how AI is used in all types of settings from agents to complex enterprise workflows.

Speaker 1

太棒了。

Excellent.

Speaker 1

在请他上线之前,我想花一点点时间给你展示一下Mercury的实际运行效果,因为我认为亲眼看到它真的很重要,会让你保持浓厚兴趣。

Well, before we bring him on, I wanna take just a quick second to show you Mercury in action because I think seeing it really matters and will keep you really interested.

Speaker 1

你会明白为什么你必须观看这个视频。

You'll understand why you need to be watching this video.

Speaker 1

所以,我屏幕上的这个界面是 Inception Labs 的网站。

So what you see here on my screen is the Inception Labs site.

Speaker 1

如果你往上走,可以进入Mercury聊天页面,然后在这里。

If you go up top, you can go to the Mercury chat, and down here.

Speaker 1

我打算选一个他们推荐的提示词。

I'm just going to grab one of their suggested prompts.

Speaker 1

我特别喜欢这个:模拟爱因斯坦、阿达·洛芙莱斯和艾伦·图灵之间的圆桌讨论。

And I love this one simulate a roundtable discussion between Einstein, Ada Lovelace and Alan Turing.

Speaker 1

现在你要确保点击这个扩散按钮,因为扩散按钮能让你看到这项技术有多酷。

Now you want to make sure you click this diffusion button because the diffusion button gives you the visual of how cool this is.

Speaker 1

所以看好了。

So watch this.

Speaker 1

好了。

All right.

Speaker 1

看看它是怎么运作的。

Watch how this works.

Speaker 2

哇哦。

Woah.

Speaker 1

如果你往下看

And if you go down

Speaker 2

这比AI的打字机效果酷多了。

That is so much cooler than the typewriter effect of the AI.

Speaker 1

这难道不疯狂吗?

Is that not insane?

Speaker 1

对吧?

Right?

Speaker 2

是的。

Yeah.

Speaker 2

是的。

Yeah.

Speaker 2

对吧?

Right?

Speaker 2

太棒了。

That's awesome.

Speaker 2

当你用HTML5做游戏时,这真的非常酷,比如它能让你快速做出像乒乓球这样的二维游戏,或者任何其他类型的二维游戏。

It's really cool too when you do this, like, when you build a game in HTML5, how quickly it can make you something like Pong or any kind of 2D game.

Speaker 2

这太惊人了。

It's it's amazing.

Speaker 2

是的。

Yeah.

Speaker 1

确实是。

It is.

Speaker 1

确实是。

It is.

Speaker 2

在我们开始这次采访之前,请花一秒钟点赞并订阅频道,这样我们才能继续为您带来科技和人工智能领域最有趣的人士。

Well, before we get to this interview, please take a quick second to like and subscribe to the channel so we can keep bringing you the most interesting people in tech and AI.

Speaker 2

而且随着

And with

Speaker 1

欢迎来到《The Neurons》。

that, welcome to The Neurons.

Speaker 1

Stefano,很高兴你来。

Stefano, it's great to have you.

Speaker 0

谢谢。

Thank you.

Speaker 0

很高兴能来这里。

Pleasure to be here.

Speaker 0

很高兴再次见到你。

Good to see you again.

Speaker 1

太好了。

Excellent.

Speaker 1

我们非常高兴你能来参加。

Well, we're we're so excited to have you on.

Speaker 1

正如我在开始前提到的,我们在拉斯维加斯见过面,做过一次简短的采访,我一直在期待这次对话,因为当时离开时我还有太多问题没解决,心想:‘我们一定要让他再来一次。’

As I mentioned before we started, we chatted in Vegas, did a short interview, and I've really been looking forward to this because I still had so many questions when I walked away that I was like, oh, we've gotta get him on.

Speaker 1

我们一定要让他再来一次。

We've got to get him on.

Speaker 1

那么,首先你能用一种简单的方式向观众解释一下什么是扩散模型吗?可能有些人还不太了解。

So I guess to start, would you mind kind of explaining diffusion in a fairly simple way for viewers who maybe aren't familiar?

Speaker 0

扩散是一种生成式人工智能模型。

So diffusion is a type of generative AI model.

Speaker 0

这种模型常用于生成图像、视频和音乐。

It's the kind of model that is commonly used to generate images, video, music.

Speaker 0

你可能熟悉ChatGPT、Gemini或Claude这样的模型,它们生成文本时就像从左到右逐个词元逐步输出。

And you're probably familiar with, you know, ChatGPT or Gemini or Claude, where you kind of see the models generate text left to right, one token at a time.

Speaker 0

扩散模型的工作方式则完全不同:它从一开始就生成完整对象,然后通过不断修正错误、让画面更清晰、效果越来越好来优化结果。

A diffusion model works very differently in the sense that it generates the full object from the beginning and then it refines it by kind of like fixing mistakes, making it sharper, making it look better and better.

Speaker 0

这是一种非常不同的解决方案,更具并行性,因为神经网络能够同时修改图像或文本的多个组成部分。

And it's a very different kind of solution that's more parallel, in the sense that the neural network is able to modify many components of the image or the text at the same time.

Speaker 0

因此,扩散模型通常比传统的自回归模型快得多,后者是像从左到右逐个词元那样逐步工作的。

And that's why diffusion models tend to be a lot faster than traditional autoregressive models that kinda, like, work left to right one token at a time.

Speaker 1

好的。

Okay.

Speaker 1

对。

Right.

Speaker 1

对。

Right.

Speaker 2

那么,它们究竟是如何对最初生成的版本进行推理的呢?

And, how how is it that they're they're actually, like, reasoning over the original version that they they create?

Speaker 2

比如,它们怎么知道第一个版本不够好呢?

Like, how like, how do they know that the first version isn't good?

Speaker 0

是的。

Yeah.

Speaker 0

所以,嗯,它是这样的。

So it it's yeah.

Speaker 0

这是个很好的问题。

That's a great question.

Speaker 0

而这确实源于模型的训练方式。

And and it really it really stems from the way the models are trained.

Speaker 0

像GPT这样的传统自回归模型是一个神经网络,它被训练来预测下一个词元、下一个词。

A traditional autoregressive model like a GPT model is a neural network trained to predict the next token, the next word.

Speaker 0

这就是它的使用方式。

And that's how you use it.

Speaker 0

在推理时,你给它一个问题,它就会从左到右逐个词元地预测答案。

At inference time, you give it a question and then it will try to predict the answer left to right one token at a time.

Speaker 0

扩散语言模型经过训练,能够消除错误、修正错误。

A diffusion language model is trained to remove mistakes, to fix mistakes.

Speaker 0

所以你会先从干净的文本或干净的代码开始。

So you kinda like start with clean text or clean code.

Speaker 0

你人为地加入错误,然后训练模型去修复这些错误。

You artificially add mistakes, and then you train the model to fix those mistakes.

Speaker 0

在推理时,模型的使用方式也是如此。

And, that's how the model is also used at inference time.

Speaker 0

你从一个完整的答案开始,然后对其进行优化。

You start with kind of like a full answer, and then you refine it.

Speaker 0

这是一种非常不同的模型训练方式。

And so it's a it's a very different way of training the models.

Speaker 0

在推理阶段,这也是一种非常不同的模型使用方式。

It's a very different way of using the models at the inference time.
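
The training recipe Stefano describes can be sketched in a few lines, assuming a masking-style corruption (one common choice for discrete diffusion); the neural network that would predict the masked tokens is left out here, so only the data side is shown:

```python
import random

MASK = "<mask>"

def corrupt(tokens, noise_level, rng):
    # Randomly replace a fraction of tokens with a mask token:
    # the "artificially added mistakes".
    return [MASK if rng.random() < noise_level else t for t in tokens]

def training_example(tokens, rng):
    # Sample a noise level, corrupt the clean text, and keep the original
    # tokens at corrupted positions as the reconstruction targets.
    noise_level = rng.random()
    noisy = corrupt(tokens, noise_level, rng)
    targets = {i: t for i, (t, n) in enumerate(zip(tokens, noisy)) if n == MASK}
    return noisy, targets

rng = random.Random(0)
clean = "def add ( a , b ) : return a + b".split()
noisy, targets = training_example(clean, rng)
print(noisy)    # the corrupted input the model would see
print(targets)  # the positions it is trained to restore
```

A network trained on pairs like this learns to map corrupted text back toward clean text, which is exactly the operation the sampler then applies repeatedly at inference time.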

Speaker 1

而且整个过程快如闪电。

And it happens at the speed of lightning.

Speaker 0

它非常快。

It's very fast.

Speaker 0

它非常快,因为关键在于,在自回归模型中,你一次只能得到一个标记。

It's very fast because the key thing is, you know, in an autoregressive model, you get one token at a time.

Speaker 0

你需要处理一个包含数百亿甚至万亿参数的庞大神经网络。

You have to process a massive neural network with hundreds of billions or trillions of parameters.

Speaker 0

但最终,你只得到一个标记。

And at the end, you only got a single token.

Speaker 0

是的。

Yeah.

Speaker 2

如果从这个角度想,效率非常低。

Very inefficient if

Speaker 1

你这样想的话

you think of it that way.

Speaker 0

你仍然需要大型神经网络,但每次前向传播,你仍然需要评估整个网络。

You still need big neural networks, and on each forward pass you still need to evaluate the whole thing.

Speaker 0

但最终,你能修改的东西会超过一个。

But then at the end, you are able to modify more than one thing.

Speaker 0

所以只要不需要太多的去噪步骤或扩散步骤,这个过程可以非常非常快。

And so as long as you don't need too many denoising steps, too many diffusion steps, this can be really, really fast.
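
The parallel refinement loop can be sketched like this. The lambda below is a stand-in for the trained denoising network, which proposes tokens for every position in one forward pass; the sequence, commit size, and step count are purely illustrative:

```python
import random

MASK = "<mask>"

def denoise_step(seq, model_predict, k, rng):
    # One refinement pass: the model proposes a token for every position
    # in a single forward pass; we commit k of the masked ones per step.
    proposals = model_predict(seq)
    masked = [i for i, t in enumerate(seq) if t == MASK]
    for i in rng.sample(masked, min(k, len(masked))):
        seq[i] = proposals[i]
    return seq

# Stub standing in for the trained network: it already "knows" the answer.
answer = "the quick brown fox jumps over the lazy dog".split()
model_predict = lambda seq: answer

rng = random.Random(0)
seq = [MASK] * len(answer)
steps = 0
while MASK in seq:
    seq = denoise_step(seq, model_predict, k=3, rng=rng)
    steps += 1

print(" ".join(seq))
print(f"{len(answer)} tokens in {steps} passes")
```

Committing several tokens per pass is why the whole answer can appear in far fewer forward passes than an autoregressive decoder, which needs one pass per token.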

Speaker 1

哇。

Wow.

Speaker 1

斯蒂法诺,我知道你在这个领域做了很多研究,你当时看到了什么,让你相信文本扩散在商业上现在可行了?

Stefano, I know you've done a lot of research on this topic along the way.

Speaker 1

你看到了什么,让你相信文本扩散在商业上现在可行了?

What did you see that convinced you diffusion for text was commercially viable now?

Speaker 0

是的。

Yeah.

Speaker 0

所以这始于我一直对扩散模型充满热情。

So it started, I mean, I've always been passionate about diffusion models.

Speaker 0

这个最初的想法其实来自我在2019年于斯坦福的实验室。

The kind of, like, original idea came out of my lab at Stanford back in 2019.

Speaker 0

那时候,几乎所有图像生成模型都基于GAN(生成对抗网络),而GAN非常难以训练,极不稳定。

Back then, pretty much all the image, generative models were based on GANs, generative adversarial networks, which were very difficult to train, very unstable.

Speaker 0

这是一种生成器和判别器之间的博弈。

There is this kind of like game between a generator and a discriminator.

Speaker 0

这是一种相当复杂的模型,扩展这种方案时会遇到各种问题。

It's a pretty complex kind of model to train and there is all kinds of issues in scaling that approach up.

Speaker 0

我们提出了一种替代方法:训练模型去噪,然后以一种逐步细化的方式生成。

And we came up with this alternative approach of training the model to remove noise and then generating in a coarse-to-fine way.

Speaker 0

我们证明了这种方法效果更好,最终它迅速普及,所有人都转向了扩散模型用于图像、视频生成,比如 Midjourney、Sora、Stable Diffusion。

And we showed that it was working much better, and eventually that took off and everybody kind of switched to diffusion models for image and video generation: Midjourney, Sora, Stable Diffusion.

Speaker 0

它们都基于我在斯坦福实验室提出的那些原始想法。

They were all based on those original ideas from my lab at Stanford.

Speaker 0

从那时起,我一直在尝试探索:我们能否也将这种方法应用于文本和代码生成?

And since back then, I kinda like tried to see, can we get this to work also on text and code generation?

Speaker 0

你知道,要弄清楚如何正确实现它,花了好几年时间。

And, you know, it took a few years to to figure out how to do it properly.

Speaker 0

但到了2024年,我们取得了突破性进展。

But then, in 2024, we kind of had a breakthrough.

Speaker 0

我们找到了将连续空间中的数学和底层算法适应到文本和代码这样的离散空间的方法。

We figured out how to adapt the math and the underlying sort of like algorithms from continuous spaces to discrete spaces like text and code.

Speaker 0

我们在GPT-2规模上取得了非常有前景的结果。

And we had some really promising results at the GPT-2 scale.

Speaker 0

在斯坦福大学的实验室里,我们无法获得大量的GPU和计算资源,因此我们能训练的最大模型是GPT-2规模的。

So at Stanford, in university labs, we don't have access to a lot of GPUs, a lot of compute, and so the largest model we were able to train was a GPT-2-sized model.

Speaker 0

我们所做的就是用一个GPT-2规模的模型,在相同的数据上同时将其训练为扩散模型和自回归模型。

But basically what we did is we took a GPT-2-sized model and we trained it as a diffusion model and as an autoregressive model on the same data.

Speaker 0

我们发现,生成质量是相同的——如果你考虑困惑度,这是人们通常用来衡量模型拟合数据程度和生成质量的指标。

And so what we found was that the quality was the same, if you think about perplexity, which is the metric people usually use to figure out how well the model fits the data and what the quality of the generations is.

Speaker 0

我们的困惑度与之相当,但速度却快了大约10倍。

We were matching the perplexity, but we were able to be like 10 times faster.
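
Perplexity, the metric mentioned here, is just the exponential of the average negative log-likelihood per token. A toy computation (the per-token probabilities are made up):

```python
import math

def perplexity(log_probs):
    # Perplexity = exp(average negative log-likelihood per token).
    nll = -sum(log_probs) / len(log_probs)
    return math.exp(nll)

# Two models that assign the same per-token probabilities to held-out text
# have the same perplexity, even if one generates tokens 10x faster.
log_probs = [math.log(p) for p in [0.25, 0.5, 0.125, 0.25]]
print(perplexity(log_probs))  # 4.0: as "surprised" as a uniform 4-way choice
```

Matching perplexity while being 10x faster means the speedup came from the sampling procedure, not from fitting the data any worse.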

Speaker 0

这让我非常兴奋,我特别想看看,如果训练一个比GPT-2更大的模型会发生什么。

And so that was super exciting to me, and I really wanted to see what happens if you train something bigger than a GPT-2 model.

Speaker 0

你知道吗?

You know?

Speaker 0

有可能打造出商业上可行的产品吗?

Is it possible to build something commercially viable?

Speaker 0

所以我创办了这家公司,来把事情规模扩大。

And that's why I started the company, to scale things up.

Speaker 1

从那以后,你就一直在从GPT-2级别的模型进行扩展。

And you've been scaling up from GPT-2 caliber since then.

Speaker 1

对吧?

Right?

Speaker 0

从那以后。

Since then.

Speaker 0

是的。

Yes.

Speaker 0

是的。

Yes.

Speaker 0

现在我们投入生产的 Mercury 模型要大得多。

Now the Mercury models that we have in production are significantly larger.

Speaker 0

它们接受了更多的数据训练。

They've been trained on more data.

Speaker 0

在模型训练后,我们投入了大量工程工作,确保它们能有效应对人们关心的任务,比如大语言模型的商业应用场景。

There is a lot of engineering work that went into post-training the models and making sure that they would be useful for tasks that people care about, like commercial use cases of LLMs.

Speaker 2

太好了。

Excellent.

Speaker 2

它们的规模相似吗?

Are they kind of similar in size?

Speaker 2

比如,我们应该把它们视为同一种类型吗?

Like, should we be thinking about them the same

Speaker 1

方式吗?

way?

Speaker 1

哦,比如参数之类的?

Oh, like, parameters and such?

Speaker 2

我们该如何将它们与

How do we compare them

Speaker 0

比如,一个

to, like, a

Speaker 2

传统语言模型进行比较?

traditional language model?

Speaker 0

是的。

Yeah.

Speaker 0

这些模型在参数数量上仍然相当大。

So the models are still fairly large in terms of, like, the number of parameters.

Speaker 0

我们实际上仍在使用类似的架构。

We're still using actually similar architectures.

Speaker 0

因此,底层仍然是一个Transformer。

So under the hood, it's still a transformer.

Speaker 1

好的。

Okay.

Speaker 0

所以我们发现,这种架构实际上表现得相当不错,也可以作为扩散语言模型的骨干网络。

So what we found is that that kind of architecture actually works pretty well as a backbone for a diffusion language model, too.

Speaker 0

我们利用了那些已知效果良好、并有现有框架和开源代码支持的技术。

And we've kind of used what we knew worked well and is well supported by existing frameworks and open source code.

Speaker 0

所以,神经网络并没有太大不同。

So yeah, the neural networks are not that different.

Speaker 0

只是它们的训练方式不同,在推理时的使用方式也不同。

It's just like they're trained in a different way and they're used in a different way at the inference time.

Speaker 1

这太有趣了。

That's so interesting.

Speaker 1

我甚至没意识到,你本质上还是在使用Transformer技术,只是没有采用自回归的方式。

I didn't even realize that that essentially you're still looking at transformer technology, just not wrapped in an autoregressive approach.

Speaker 1

对吧?

Right?

Speaker 0

没错。

That's right.

Speaker 0

是的。

Yes.

Speaker 0

我认为,这可能实际上不是最优的。

And I think, perhaps that's actually suboptimal.

Speaker 0

我的意思是,我们已经基本认同变换器是用于自回归模型的一种非常优秀的架构。

Like, I mean, we have kind of converged on transformers as being a really good architecture for, you know, autoregressive models.

Speaker 0

我的意思是,如今人们也用它们来做扩散模型,比如人们会使用扩散变换器。

I mean, these days people also use them for diffusion models, like people use diffusion transformers.

Speaker 0

因此,这是一种在不同模态、不同类型的生成模型中广泛使用的架构。

So it's kind of like an architecture that's widely used across different modalities, across different kinds of generative models.

Speaker 0

但有可能,当你改变生成模型时,会出现更好的架构,表现得更出色。

But it's possible that there might be better architectures that shine even better once you change the generative model.

Speaker 0

它不再是自回归的了。

It's no longer autoregressive.

Speaker 0

所以我认为设计空间是不同的。

So I think the design space is different.

Speaker 0

我认为,在将神经网络架构与训练目标以及我们进行的推理计算相匹配方面,还有很大的空间进行研发并取得进一步改进。

I think there is a lot more room for doing R and D and coming up with further improvements just by kind of matching the neural network architecture to the training objective and to the inference kind of computations that we do.

Speaker 0

对。

Right.

Speaker 0

有道理。

That makes sense.

Speaker 2

非常酷。

Very cool.

Speaker 1

当我们谈论并行生成时,什么是被并行化的?

When we talk about parallel generation, what is parallelized?

Speaker 1

我们是在讨论词元、片段还是编辑?

Are we talking about tokens, spans, edits?

Speaker 1

有什么是我可能不知道的吗?

Something I maybe don't know about?

Speaker 0

是的。

Yeah.

Speaker 0

基本上,并行的是网络能够同时修改多个标记。

Basically what's parallelized is that the network is able to essentially modify multiple tokens at the same time.

Speaker 0

哇。

Wow.

Speaker 0

所以这正是你之前看到的。

And so that's kind of like what you were seeing.

Speaker 0

你可以试试我们的网站,看看扩散模型是如何运作的动画,你会看到它不断地修改答案,而不是一次只改一个标记。

I think if you try our website and you see the animation of how the diffusion model works, you're gonna see that it constantly changes the answer, and it's not one token at a time.

Speaker 0

许多内容会同时被修改,这就是它更并行、更适合GPU的原因。

Many things get changed at the same time, and that's what makes it more parallel and makes it much more suitable to GPUs.

Speaker 0

GPU的设计就是为了并行处理大量任务。

GPUs are built to process many things in parallel.

Speaker 0

它们实际上会对不同的数据点应用相同的计算。

They're gonna like apply the same computation across different data points effectively.

Speaker 0

而你在从自回归模型中采样时所进行的计算,根本无法很好地映射到GPU上。

And the kind of computation that you do when you sample from an autoregressive model does not map well at all to a GPU.

Speaker 0

这是一种非常受内存限制的计算,你需要花费大部分时间在慢速内存和快速内存之间搬运权重,以便真正执行计算。

It's a very memory bound kind of computation where you're going to spend most of your time moving around weights from slow memory to fast memory where you can actually do the computation.

Speaker 0

因此,我们今天用自回归模型所面临的推理工作负载的算术强度非常差。

So the arithmetic intensity of the kind of inference workloads that we have today with an autoregressive model is very bad.

Speaker 0

利用率非常低,这就是为什么人们要建造大型数据中心,甚至开发专门的AI推理芯片,以更好地适应这类工作负载。

The utilization is very low, and that's why, you know, people are building massive data centers, or even building custom chips, AI inference chips, that are better suited for that kind of workload.
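
The arithmetic-intensity point can be made with a back-of-envelope calculation (all numbers here are illustrative, not measurements): at batch size 1, decoding one token reads every weight once (about 2 bytes per parameter in fp16) and does about 2 FLOPs per parameter, so the FLOPs-per-byte ratio equals the number of tokens updated per forward pass:

```python
def arithmetic_intensity(params, tokens_per_pass):
    # FLOPs per byte of memory traffic for one weight-bound forward pass:
    # ~2 FLOPs per parameter per token, ~2 bytes per parameter (fp16) read once.
    flops = 2 * params * tokens_per_pass
    bytes_moved = 2 * params
    return flops / bytes_moved

PARAMS = 70e9  # illustrative model size

# Autoregressive decode: one token per forward pass (batch size 1).
print(arithmetic_intensity(PARAMS, tokens_per_pass=1))    # 1.0 FLOP/byte

# Parallel refinement: one pass proposes updates for many tokens at once.
print(arithmetic_intensity(PARAMS, tokens_per_pass=256))  # 256.0 FLOPs/byte
```

Modern accelerators need on the order of hundreds of FLOPs per byte of memory traffic to stay compute-bound, which is why the 1 FLOP/byte regime leaves most of the chip idle.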

Speaker 1

几天前有一个小案例,为了将速度提升2.5倍,成本却增加了6倍。

There was a small drop a couple days ago where, in order to get two and a half times faster, the cost went up 6x.

Speaker 0

因为这是顺序计算。

Because it's a sequential computation.

Speaker 0

在生成第一个和第二个标记之前,你无法生成第三个标记。

Like, you cannot generate the third token until you've generated the first and the second.

Speaker 0

这纯粹是一个结构性的瓶颈。

And so it's just a structural bottleneck.

Speaker 0

由于计算过程中存在序列依赖关系,因此无法对其进行并行化。

There is no way to parallelize it, because there are sequential dependencies across the computation.

Speaker 0

因此,在生成之前的所有内容之前,你无法处理任何未来的内容。

And so you can't process something into the future until you've generated everything before it.

Speaker 0

所以根本没有办法并行化这一点。

And so there's just no way to parallelize that.

Speaker 2

这有点好笑。

It's kinda funny.

Speaker 2

基本上,你讨论的是,与其广泛增加GPU的数量,不如让我们在某一时刻处理的token数量增加。

Basically, what you're talking about is, instead of scaling the number of GPUs, let's scale the number of tokens that we're actually dealing with at a certain time.

Speaker 2

然后你就可以在GPU上做更少的工作。

And then you could do less on a GPU.

Speaker 2

这太棒了。

That's awesome.

Speaker 0

没错。

Exactly.

Speaker 0

没错。

Exactly.

Speaker 0

这实际上是将重点从内存受限转向计算受限,也就是说,你主要受限于GPU上可用的浮点运算次数,而这个数值更容易扩展。

Like, it's shifting from a memory-bound regime to a compute-bound regime, where you're basically bounded by the number of FLOPs that you have available on the GPU, which is a much easier quantity to scale.

Speaker 0

如果你是一家芯片制造商,增加浮点运算次数比提升内存带宽要容易得多。

Like, if you were a chip manufacturer, it's a lot easier to add FLOPs than to increase memory bandwidth.

Speaker 2

那么,为什么所有大型实验室不立即转向这种方法呢?如果它真的高效这么多的话?

So why wouldn't all the big labs like immediately switch to this, like if it's that much more efficient?

Speaker 2

因为我想他们必须自己训练模型,而他们并不具备你们那样的技能。

Because I guess they have to train it, and they don't have the same skills that you do.

Speaker 2

你怎么看?

What's your thought?

Speaker 0

是的。

Yeah.

Speaker 0

一些实验室对特定的技术栈非常依赖,因此如果他们要转向其他方案,成本会非常高。

Some of the labs are very entrenched in a certain stack, and so there's a big cost if they were to switch to something different.

Speaker 0

在如何训练这些模型、甚至如何从模型中采样等方面,确实存在不少‘秘方’。

There is quite a bit of secret sauce involved in terms of, like, what is the right way to train these models, what is the right way to, you know, even just sample from them.

Speaker 0

这并不像表面上看起来那样简单,比如生成一个词元,追加它,再生成下一个词元,再追加。

Like it's not as obvious as, okay, generate one token, append it, generate the next one, append it.

Speaker 0

如果你使用的是传统的自回归模型,在推理端能做的优化非常有限,但扩散模型的推理算法设计空间要广泛得多。

There is not much you can do on the inference side if you have a traditional autoregressive model, but on a diffusion model, the design space for inference algorithms is much broader.

Speaker 0

此外,从 ML 系统层面来看,如果你考虑实际部署生产负载,目前针对自回归模型已经有了相当多的开源和闭源解决方案,比如 vLLM、SGLang、TensorRT。

And then there is also the issue at the ML systems level. If you think about actually serving production workloads, there is a decent amount of open source and, of course, closed source solutions for autoregressive models: things like vLLM, SGLang, TensorRT.

Speaker 0

自回归模型的推理服务框架已经相当成熟了。

Like, there are pretty mature serving stacks for autoregressive models.

Speaker 0

而对于扩散模型来说,这还处于更早期的阶段。

For diffusion models, it's much earlier.

Speaker 0

我们有自己的框架,但要知道,要想在真实的GPU上真正实现高效运行,需要投入大量工作,而且在系统层面还有各种各样的优化空间。

We have our own stack, but, you know, it takes a significant amount of work if you wanna figure out how to actually make things efficient in practice on real-world GPUs, and there are all kinds of optimizations that you can do at the systems level.

Speaker 1

好吧,我得说,每百万输出词元只花一美元,你们在这方面做得不错。

Well, I've gotta say, at a dollar per million output tokens, you seem to be doing okay with that.

Speaker 0

当然。

For sure.

Speaker 0

当然。

For sure.

Speaker 1

我一直在想,当我们谈论GPU以及采用自回归方法所需的东西时,从很多方面来看,这或许是一种巧妙绕过当前内存供应问题的方式——不必花大笔钱去采购,也不用把钱堆在黄仁勋的后院门口。

I keep thinking, when we talk about GPUs and what it takes to do the autoregressive approach, you know, this in a lot of ways could be a smart way to sort of sidestep things like the current memory supply issue, the need to go acquire Brink's trucks of money and back them up at Jensen Huang's patio door.

Speaker 1

我真的觉得,在这个关键时刻,这是一种非常有趣的方法。

You know, I I really think that this is an interesting approach at a prime time for that.

Speaker 0

是的。

Yeah.

Speaker 0

基本上,如果你能每秒生成更多令牌,这意味着在相同数量的硬件和GPU条件下,你可以生成更多的令牌。

Basically, if you can generate more tokens per second, what this means is that, you know, for the same amount of hardware for the same number of GPUs, you can produce more tokens.

Speaker 0

因此,每个令牌的成本将会下降。

And so the cost per token, is gonna go down.

Speaker 0

这就是为什么我们能够以比传统自回归模型低得多的成本提供我们的模型——因为我们更高效地利用了现有硬件,成本确实显著更低。

And that's why we're able to serve our models much more cheaply than what you would get, you know, if you were to use traditional autoregressive models: because we make better use of the existing hardware, the costs are actually significantly lower.
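
The economics follow directly: on fixed hardware, cost per token is the hourly hardware cost divided by token throughput, so 10x the tokens per second means one tenth the cost per token. The dollar figures below are made up for illustration:

```python
def cost_per_million_tokens(gpu_hour_cost, tokens_per_second):
    # Cost per token scales inversely with throughput on fixed hardware.
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_cost * 1e6 / tokens_per_hour

# Same hypothetical $2/hour GPU, 10x the throughput.
slow = cost_per_million_tokens(2.0, tokens_per_second=100)
fast = cost_per_million_tokens(2.0, tokens_per_second=1000)
print(round(slow, 3), round(fast, 3))  # fast is 10x cheaper per token
```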

Speaker 1

这说得通。

That makes sense.

Speaker 1

这说得通。

That makes sense.

Speaker 1

那么,这对长上下文来说表现如何?

So how does how does this behave with long context?

Speaker 1

我们是否看到它随着上下文增长而明显变得更昂贵,还是并行性在其中发挥了更大作用?

Are we looking at it getting any more expensive, or is parallelism playing more of a role as contexts grow?

Speaker 0

是的。

Yeah.

Speaker 0

这实际上更取决于架构,而不是它是扩散模型还是自回归模型。

So that really depends more on the architecture than on whether it's a diffusion model or an autoregressive model.

Speaker 0

好的。

Okay.

Speaker 0

正如我提到的,目前我们仍在使用自注意力机制,而它在上下文长度增加时效率会显著下降。

Right now, as I mentioned, we're still using self attention, which unfortunately scales pretty poorly with the context length.

Speaker 0

所以我认为没有区别。

So I would say there is no difference.

Speaker 0

它并不比自回归模型更好,也不比它更差,当你考虑更长的上下文时。

It's not better, it's not worse than an autoregressive model as you think about a longer context.

Speaker 0

我们的模型支持大约10万标记的上下文长度。

Our models are supporting roughly 100K tokens of context length.

Speaker 0

我们有可能进一步扩展它。

We could potentially scale it up more.

Speaker 0

再次强调,如果你考虑自回归模型和扩散模型的区别,这并没有本质不同,更多取决于底层架构。

Again, it's not something that is very different if you think about an autoregressive model versus a diffusion model, it's more a function of the underlying architecture.

Speaker 0

事实上,我们可以使用其他更能有效处理上下文长度的架构,比如状态空间模型或其他更高效的注意力变体。

And in fact, we can actually use alternative architectures that scale better with respect to the context like like state space models or other attention variants that are more efficient.

Speaker 0

我们有一些初步结果,所有模型都兼容不同的主干架构,但目前尚未投入生产流程。

We have some preliminary results, and everything is compatible with different kinds of backbones, but that's not in production at the moment.

Speaker 2

这很酷。

That's cool.

Speaker 2

是的。

Yeah.

Speaker 2

我一直想问研究人员一个问题。

I was wondering, like, I've always wanted to ask a researcher this.

Speaker 2

在过去一年,甚至最近六个月里,你有没有看到什么让你眼前一亮的东西,能解决这个记忆上下文的问题?

Have you seen anything over the past year or, like, six months even that has lit you up in terms of, like, oh, this could be a good alternative for that memory context problem?

Speaker 2

还是说,你们觉得根本还没出现任何接近的方案?

Or are you like, we still haven't seen anything that's even close?

Speaker 0

没什么特别的。

Nothing particularly.

Speaker 0

我觉得这本质上是个难题,要取得真正的突破很难。

I think it's just a fundamental problem for which it's gonna be hard to get, you know, a real breakthrough.

Speaker 0

就像,这里面存在一些固有的权衡。

Like, there is just, like, inherent trade offs.

Speaker 0

我从充分统计量的角度来思考:你要存储关于过去哪些信息,如何追踪?你希望记住有用的东西,丢弃无用的,这本质上就是一个难题。

I think of them in terms of sufficient statistics: what do you store about your past, and how do you keep track of it? You wanna remember the things that are useful and discard the things that are not useful, and that's just fundamentally a hard problem.

Speaker 0

总是存在一种所谓的‘没有免费午餐’的情况——你事先并不知道该记住什么、该丢弃什么。

There's always some kind of no-free-lunch involved, where ahead of time you don't know what you should remember and what you should discard.

Speaker 0

有些东西对某件事有用,但对另一件事就没用。

And some things are gonna be useful for one thing and not useful for something else.

Speaker 0

所以我认为这是一个根本上非常困难的问题,必须做出权衡。

And so I think it's a fundamentally very difficult problem where you have to make trade-offs.

Speaker 2

不过我要说,能够并行处理的话,或许能让你更容易做出这些权衡,或者更高效地完成一些计算。

Although I will say, with that equation, being able to work in parallel maybe lets you make those trade-offs, or make some of those calculations, more efficiently.

Speaker 2

你同意这个说法吗?如果我的意思表达清楚的话。

Would you agree with that, if that makes sense?

Speaker 0

是的。

Yeah.

Speaker 0

我认为这在一定程度上改变了设计空间,比如你有多少浮点运算能力,以及你受内存限制的程度。

And I think it changes the design space a little bit in terms of, like, how many FLOPs you have access to and how memory-bound you are.

Speaker 0

是的。

Yeah.

Speaker 0

但并不是根本性的。

But not fundamentally.

Speaker 0

就像你仍然需要处理信息,仍然需要查看所有上下文和过去的信息,才能生成高质量的回答,无论你是一次生成一个词元,还是并行处理。

Like, you still need to process, and you still need to be able to look at all the context, all the past information, to be able to generate good-quality answers, whether you do them one token at a time or you do them in parallel.

Speaker 0

你必须回顾过去,意识到那里有一些非常根本的东西。

You kind of have to look at the past and say, oh, there is something pretty fundamental there.

Speaker 1

你知道,我认为除了大多数企业级和软件应用之外,当你面对普通员工日常使用的情境时,100K的上下文长度对大多数情况来说已经绰绰有余了。

You know, I would say that probably outside of most enterprise and software applications, when you're dealing with what the average worker uses, 100K context is plenty for most things.

Speaker 1

在这个范围内,你能做的事情真的很多。

You can really do a lot in that range.

Speaker 1

是的。

Yeah.

Speaker 1

我一直在想,如果不按从左到右的顺序,连贯性是如何维持的。

I was I was kind of wondering how coherence plays together by not going left to right.

Speaker 1

我想这是我用人类大脑的思维方式在思考这个问题。

And I guess that's me thinking of it through how my human brain works.

Speaker 0

对。

Yeah.

Speaker 0

所以,本质上存在一种错误修正机制,模型确实经过训练来纠正错误,并不断修订答案。

So, essentially, there is an element of error correction. Certainly, the models are trained to fix mistakes, and then they constantly revise the answer.

Speaker 0

因此,最初的答案并不连贯,但随着你投入更多的计算资源,答案会变得越来越好。

And so initially the answers are not coherent and then they get increasingly better as you throw essentially more compute.

Speaker 0

它们也可以被视为在测试时可扩展计算的另一个维度,即测试时推理和测试时计算,以在质量、速度或成本之间进行权衡。

You can also think of it as another dimension over which you can scale compute at test time, so test-time inference, test-time compute, to trade off quality for speed or cost.

Speaker 0

这就是扩散语言模型所揭示的基本权衡。

And so that's kind of like the fundamental trade off that is exposed by a diffusion language model.

Speaker 0

它只是让你能够控制质量与速度之间的关系。

It just provides you an access to control quality versus speed.

Speaker 0

这背后存在一个根本性的权衡。

And there is a fundamental trade off.

Speaker 0

比如,如果你不想进行太多去噪步骤,不想对输出进行太多次迭代,那么质量就不会像经过多次精细优化那样好。

Like, if you don't want to do too many denoising steps, too many passes over the output, the quality is not gonna be as good as what you would get if you refine it many, many times.

Speaker 1

这就像我在用 Stable Diffusion 和 ComfyUI 之类的工具做去噪运行一样。

Same as if I'm running, you know, Stable Diffusion in ComfyUI or something, and you're doing your denoising runs.

Speaker 0

这完全一样。

That's exactly the same.

Speaker 0

没错。

Exactly.

Speaker 0

没错。

Exactly.

Speaker 0

所以,即使在图像和视频生成的背景下,通常也可以控制去噪步骤的数量。

So, you know, even in the context of image and video generation, you can usually control the number of denoising steps.

Speaker 0

你采取的去噪步骤越多,质量就越高,但当然成本也越高,耗时也越长。

The more denoising steps you take, the higher the quality, but of course, the more expensive it becomes, the more time it takes.

Speaker 2

那么在智能体情境下,这又是如何运作的呢?

How does this work in a agentic context then?

Speaker 2

比如,它会有什么不同?

Like, how how would it be be different?

Speaker 2

比如说,在像 Claude Code 这样的系统中,它会出去执行一系列工具调用并进行推理。

Like, let's say, like, in in something like Claude Code, like, where, you know, it's going out, it's doing a bunch of tool calls, it's reasoning.

Speaker 2

在这种情况下,如果有所不同,会怎样体现呢?

How how does it play out in that scenario if if different?

Speaker 0

所以,这与它需要输出某些内容有关。

So it it's, related in the sense that, you know, it needs to output something.

Speaker 0

然后,如果有工具调用,你就得等工具调用的结果返回。

And then, you know, if there is a tool call involved, then it kinda like you need to essentially wait until the the result of the tool call comes back.

Speaker 0

但其实差别不大。

But it's not too different.

Speaker 0

我们能够以与OpenAI兼容的方式提供模型。

Like, we're able to serve the models in an OpenAI compatible way.

Speaker 0

我们支持工具调用,因此人们已经在各种智能体框架中使用我们的模型,包括一些开源框架。

We support tool calls, and so people are already using our models in, you know, variety of agentic frameworks, including some of the open source ones.

Speaker 0

我认为人们已经找到了在 Claude Code 中使用它的方法。

I think people have figured out ways to also use it in in Claude Code.

Speaker 0

我认为有一些封装工具可以让你使用其他模型。

I think there are some wrappers that allow you to to to use, other models.

Speaker 0

所以,是的。

And so Yeah.

Speaker 1

我一直在考虑怎么把它连接到我的OpenClaw上。

I've been eyeballing how to connect it to my OpenClaw.

Speaker 0

这样可行。

That works.

Speaker 1

天啊。

Oh my gosh.

Speaker 1

是的。

Yeah.

Speaker 1

你有试过吗

It's Have you done

Speaker 2

实际上吗?

that, actually?

Speaker 2

你有玩过 OpenClaw 吗,你有

Have you have you played with OpenClaw and and have you

Speaker 2

用过它吗?

worked it?

Speaker 0

但我的一些团队成员试过,他们用过 Mercury,所以我确定它确实有效。

But, some of my team members have, and they've used Mercury, so I know I know for sure it works.

Speaker 1

太好了。

Excellent.

Speaker 1

真酷。

That's cool.

Speaker 1

太好了。

Excellent.

Speaker 1

我还没尝试连接它。

I I haven't tried to connect it yet.

Speaker 1

我还在摸索中,但它确实在我待办事项列表上,因为我总想着我有这么多不同的任务。

I've still been stumbling my way through, but it's very much on my list because I keep thinking of like, I have such a variety of tasks.

Speaker 1

我现在正让这个东西帮我做事情。

I'm I'm having this thing do for me now.

Speaker 1

我认为对其中一些任务来说,这种超高速会非常关键,而且坦白说,比运行我现在能接触到的其他许多模型都要便宜得多。

And I think for some of these, that super speed would be so huge, and so much more affordable, frankly, than running plenty of the other models I have access to as well.

Speaker 1

所以这绝对在我的关注列表上。

So it's it's definitely on the radar.

Speaker 1

我觉得在很多智能代理流程中,这种方案具有很大的价值。

And I feel like, you know, in many agentic flows, there's a lot of value for this approach.

Speaker 0

是的。

Yeah.

Speaker 0

完全同意。

Completely agree.

Speaker 0

我认为,到了那个阶段,与环境交互的速度会成为关键瓶颈。

I think at that point, kind of speed of interaction with the environment becomes the key bottleneck.

Speaker 0

你真的需要能够使用你所能访问的工具,然后收集解决任务所需的信息。

You really want to be able to use the tools that you have access to and then kind of collect the information needed to solve the task.

Speaker 0

你交互得越快,就能越快地采取行动、收集反馈、进行推理、决定下一步该做什么,体验就会越好,模型的表现也会更出色。

And the faster you interact, the faster you can take actions, collect feedback, reason, decide what to do next, the better the experience, better the model is gonna work.

Speaker 0

因此,速度方面,我认为我们也在其他实验室看到了类似趋势。

And so speed, I think we're seeing it also from other labs.

Speaker 0

人们正在不断推动更快的进展。

People are pushing more and more.

Speaker 0

让我们让模型变得更快、更快、更快,因为这确实能极大提升用户体验。

Let's make the models faster, faster, faster because that really improves the user experience.

Speaker 1

是的。

Yeah.

Speaker 1

因为当你在提升智能的同时,也必须同步提升速度。

Because you kinda have to scale that at the same time you're scaling intelligence.

Speaker 1

否则,我觉得你最终会面临每件事都要等四十八分钟的情况。

Otherwise, I feel like what you're gonna wind up with is is a forty eight minute wait for everything you ask for, it seems like.

Speaker 1

就像你必须把这两者一起提升一样。

Like like, you kinda have to lift them together.

Speaker 0

对。

Yes.

Speaker 0

是的。

Yes.

Speaker 2

你之前提到过,作为实际上创建了这个系统的人,你最大的独特优势和好处之一就是你了解所有运行和部署它的技巧。

You mentioned earlier, so like one of the the unique things and and benefits of being the the person who basically made this is that you know all of the tricks of how to, you know, run it and deploy it.

Speaker 2

而且你知道,虽然有很多框架支持普通的大型语言模型,但专门针对扩散模型的内部框架可能没那么多。

And you know, there's a lot of frameworks that support regular LLMs, but maybe not as many frameworks where, you know, you have an in house framework for dealing with diffusion.

Speaker 2

你有没有考虑过把这套系统开源?

Would you ever make like an open version of that?

Speaker 2

你是希望每个人都必须通过你来使用吗?

Like, do you want everyone to go through you?

Speaker 2

你的商业计划究竟是怎样的呢?

Like what, what's your, what's your business plan there, I guess?

Speaker 0

这是个好问题。

It's a good question.

Speaker 0

这确实是我们在之前反复思考过的问题。

It's something that we've, we've thought a lot about before.

Speaker 0

我的意思是,团队中的很多人都是非常学术背景的。

I mean, a lot of the, you know, the team is very academic.

Speaker 0

我们中的许多人都是研究人员,一直在发表我们在实验室里的所有成果,我们非常看重分享我们的想法,并建立一个共同合作、共同进步的研究者社区。

A lot of us have been, researchers, have been publishing, everything we do in our labs, and we see a lot of value in sort of, like, you know, sharing our ideas and and having a community of researchers that work together to to to improve and and make progress.

Speaker 0

是的。

Yeah.

Speaker 0

我认为限制在于,正如你所知,目前的竞争环境极其激烈。

I think the constraint is that it as you know, it's it's an extremely competitive kind of landscape at the moment.

Speaker 0

知识产权仍然是一个巨大的护城河。

IP, it's still a a big moat.

Speaker 0

它仍然是公司的重要组成部分,而且我们认为其他实验室要复制我们的成果会非常困难。

It's still an important kind of part of the company, and and, you know, we feel like it would be hard for other labs to to reproduce what we have.

Speaker 0

因此,很遗憾,开源任何东西都可能暴露我们很多关于如何构建

And so, unfortunately, open sourcing anything would reveal probably quite a bit about how the

Speaker 2

模型,你必须保持你的优势。

model You gotta keep your advantage.

Speaker 2

所以

And so

Speaker 1

这真的很重要。

that is really important.

Speaker 1

做这项工作一定很昂贵,我肯定。

It's expensive to do this work, I'm sure.

Speaker 1

肯定花了不少钱。

It's gotta cost a fortune.

Speaker 2

是的。

Yeah.

Speaker 2

不。

No.

Speaker 2

这很公平。

That's fair.

Speaker 2

不。

No.

Speaker 2

这完全公平。

It's totally totally fair.

Speaker 2

我只是在想,你知道,NVIDIA 的一个优势是 CUDA,它让所有人都被绑定在一起。

I was just thinking, like, you know, like, in one of NVIDIA's strengths, right, is is CUDA, which kinda keeps everyone locked in.

Speaker 2

所以我总在想,会不会有一天出现类似的情况:你让所有人都能轻松做到这一点,但你仍然是某种意义上的守门人。

So I'm always wondering if there was a similar kind of play there eventually where, you know, you make it easy for everyone to do this and but then you're still the gatekeeper in some way.

Speaker 0

不会。

No.

Speaker 0

而且,我认为,如果我们能开源一些东西,获得社区的贡献和反馈,会带来很多价值。

And then there would also be, like, I think, lot of value just like, you know, if we could open source something, just get, contributions from the community, get feedback.

Speaker 0

如果我们能这么做,我认为会带来很多价值。

I think there would be a lot of value if we could do it.

Speaker 0

所以我们一直考虑这个问题,反复思考,很希望有朝一日能实现。

So that's why we, you know, we thought about it we thought about it, a lot, and and we'd love to do it at some point.

Speaker 0

也许当团队更大一些时,或者能找到一种方式发布一个更小的模型,或者一些更偏向研究性质的模型,让人们对它进行尝试和改进。

Maybe when the team becomes bigger and maybe there is a way to to release a smaller model or some more like a research type model that people can play with and make improvements.

Speaker 0

而且,我想肯定还有很多东西有待发明,如果整个社区共同努力,让扩散语言模型成为下一代的默认选择,那将会很好。

And, I mean, I'm sure there is a lot to be invented, and it'd it'd be good, you know, if the whole community works towards, you know, making diffusion language models become the default for the next generation of other

Speaker 1

事情。

things.

Speaker 1

是的。

Yeah.

Speaker 1

是的。

Yeah.

Speaker 2

没错。

That's right.

Speaker 1

是的。

Yeah.

Speaker 1

我明白。

And I get it.

Speaker 1

这也是把双刃剑,因为事实是你必须能持续向前推进。

It's a it's a bit of a double-edged sword too because, like, the fact is you've gotta continue to be able to push forward.

Speaker 1

我一直很好奇,那些长期从事开源的公司,它们的开源价值主张到底是什么。

And I've I've wondered about the open source value proposition for companies that have done it for quite a while.

Speaker 1

我理解这个理念,也很喜欢这些开源项目的存在。

I get the idea and I love that they're available.

Speaker 1

但与此同时,我总是忍不住想,如果无法盈利,你们怎么持续创新呢?

But at the same time, I'm always left wondering, but how do you continue to innovate if you can't make any money?

Speaker 1

事实是,做这件事需要花钱。

The fact is, it takes money to do this.

Speaker 1

如今这些研究人员的代价不菲,GPU也是如此,所有这些都真的非常昂贵。

These researchers aren't coming cheap these days and neither are GPUs, you know, all of it's it's really, really expensive.

Speaker 1

不过,你多年来也公开贡献了很多关于这个主题的研究。

And you've you've contributed a lot of research over the years as well on the topic, though, publicly.

Speaker 0

是的。

Yeah.

Speaker 0

我认为,学术界的优势在于一切都是开放的,你可以自由发表所有研究成果,而这正是整个社区共同推动领域发展的核心所在。

That's the benefit, I think, of being in academia that everything is open and you're allowed to publish all of your work and, you know, that's that's the whole point for advancing the field together as a community.

Speaker 0

我喜欢这个方面。

I I love that aspect.

Speaker 0

而且我认为,很多研究者也这么觉得,我能感受到,随着发表政策日益收紧,工业界大实验室的同行们越来越不满,甚至不再被允许发表成果。

And and I think, I mean, a lot of the researchers do and I could sense a lot of unhappiness from colleagues and other researchers in industry working in the big labs as as the publication policies, you know, started to tighten and people were not allowed to to publish anymore.

Speaker 0

我觉得很多人都不开心。

I think there were a lot of people who were not happy.

Speaker 0

所以

And so

Speaker 1

当研究由企业而非大学进行时,本质上是这样。

When it's businesses doing the research instead of universities, essentially.

Speaker 1

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

这确实是一个难题。

That's that's definitely a struggle.

Speaker 1

我很好奇。

I I'm I'm curious.

Speaker 1

你们在为水星公司瞄准哪些特定行业时,认为水星及其所提供的功能能比大语言模型产生更大的影响?

Are you with Mercury targeting any specific industries where you feel like Mercury and what it has to offer could really make a bigger impact than maybe LLMs could?

Speaker 0

是的。

Yeah.

Speaker 0

目前,我们专注于那些我们认为属于即时AI类的应用场景,也就是对延迟要求极高的场景,通常意味着有人参与其中,而这个人等不起。

We're at the moment, we're going after what we think of, like, instant AI sort of kinda like applications of LLMs where latency is critical, which typically means there is a human in the loop, and the human cannot wait.

Speaker 0

这个用户可能是开发者。

And that human could be a developer.

Speaker 0

因此,我们看到Mercury模型在IDE中的大量应用,比如直接为开发者提供代码建议或修改。

So we are seeing a lot of, usage of Mercury models in IDEs, where, you know, you're essentially providing suggestions or edits to the code, for example, directly to a developer.

Speaker 0

在那里,你可能只有几百毫秒的延迟预算,需要在延迟限制内提供最优质的建议。

And there, you know, you maybe have a few hundred milliseconds of latency budget and you want to be able to provide the best possible suggestion within the latency budget.

Speaker 0

但这也可能适用于客服、语音助手、教育科技等场景。

But it could also be, you know, customer support, voice agents, EdTech, that sort of situation.

Speaker 0

是的。

Yeah.

Speaker 0

在任何需要实时与人类互动并给出回应的场景中,延迟都变得至关重要,此时的关键在于,在合理的成本下,在延迟预算内提供最佳质量的结果。

Any other situation where you have, you know, to give an answer, have to interact with a human in real time, then latency becomes critical and the game becomes, again, sort of like what's the best quality result that you can provide within the latency budget for a reasonable cost.

Speaker 0

这正是我们超越现有自回归解决方案的地方,也是我们目前获得初步市场认可的领域。

And that's where we dominate, existing autoregressive solutions, and that's where we're seeing a lot of the initial traction.

Speaker 0

我认为,随着模型智能的持续提升,随着我们投入更多研发,逐步追上前沿模型的质量水平,未来会有越来越多的应用场景可以拓展。

I think eventually, as the intelligence of the models keeps improving, as we do more R and D, as we catch up with frontier quality models, I think there's gonna be more and more applications that we can go after.

Speaker 0

但目前,我们专注于大语言模型中对延迟敏感的应用场景。

But right now, we're going after latency sensitive applications of LLMs.

Speaker 1

这很合理。

That makes sense.

Speaker 1

那个领域里有很多东西都会属于这一类。

There's a lot of stuff in that world that's, that would fall in there.

Speaker 1

你知道的。

You know?

Speaker 1

格兰特,你之前提到过对医疗科技这一端的问题。

Grant, you said you had a question about the the medical tech end of it.

Speaker 2

没有,我刚才在想,我第一时间想到的是语音代理,我在想,你能做出一个扩散语音模型吗?

No, was thinking, so the immediate place that I went to was voice agents, and I was wondering, could you make a diffusion speech model?

Speaker 2

然后我又想,如果突然间它发布出来,你听到声波在生成,那听起来会有点疯狂。

And then I was thinking, that would sound kind of wild if like all of a sudden it comes out and it's like, then you the, and then you hear the sound wave generate.

Speaker 2

但我也在想,我很想听听你的看法,这是否真的可能。

But then also I was thinking, and I'd love your take on that if it's even possible.

Speaker 2

但我也在想,尝试把它和语音转文字模型结合起来是有道理的,因为最需要实时交互的地方正是这里,对吧?

But then also I was thinking, well, it makes sense to try and, you know, perhaps match it with a speech to text model because it makes sense to, that's where you need the most real time interactions, right?

Speaker 2

就像你正在进行一场语音对语音的对话。

As if you're having a voice to voice conversation.

Speaker 2

所以这对我来说显而易见。

So that makes obvious sense to me.

Speaker 2

我想知道这方面的进展如何。

I'm curious how that's going.

Speaker 0

是的。

Yeah.

Speaker 0

而且,你说得对,扩散模型确实有效,而且在语音和音乐生成方面表现得非常好。

And, I mean, you're absolutely right that diffusion, actually does, work, and it works really, really well for speech, and music generation.

Speaker 0

我知道一些开源模型,以及一些实际上闭源的最先进模型,都是基于扩散的。

Like, I know that some of the open source models and some of the the state of the art actually closed source models are based on diffusion.

Speaker 2

用在文本上?这我之前都不知道。

For text? I didn't even know that.

Speaker 0

是的。

Yeah.

Speaker 0

是的。

Yeah.

Speaker 0

所以,这确实很有道理。

And so, you know, it it does make a lot of sense.

Speaker 0

其中一个挑战是,如果你想要直接从语音到语音,通常这类交互仍然涉及工具调用。

One of the challenges is that, you know, if you wanted to go go straight from voice to voice, is that often these kind of interactions involve still involve tool calls.

Speaker 0

比如在客户服务中,你可能仍然需要查询数据库、查看日历的可用性,或者查看菜单以获取价格。

So if you're doing a customer support, you might still need to be able to, you know, query the database or check a calendar for availabilities or, you know, look up the menu to get the prices.

Speaker 0

因此,我认为仍然需要一些文本和代码,这使得开发变得稍微复杂一些。

And so there there still needs to be some text, I think, some code involved, which makes it a little bit more tricky to to develop.

Speaker 0

但我们非常期待最终实现真正多模态的系统。

But we are very excited about eventually getting to something that is actually multimodal.

Speaker 0

现有的 Mercury 模型仅支持文本或代码,但我们知道扩散模型在图像、视频和音乐方面表现非常出色。

The existing Mercury models are just, text only or code only, but we know diffusion models work really well for image, video, music.

Speaker 0

如果我们把所有这些整合起来,就能打造出一种真正卓越的系统,能够处理各种模态,成为一个真正理解一切、整合所有模态学习与信号的现实世界模型。

So if we put everything together, we could get to to something like truly, phenomenal at handling different kinds of modalities and have a a real world model that understands everything and and, puts together all the learnings and the signals from all the different modalities.

Speaker 0

但这绝对是我们最终想实现的目标。

But it's definitely something we wanna do at some point.

Speaker 2

这听起来非常好。

That sounds very good.

Speaker 2

太棒了。

So awesome.

Speaker 2

在你看来,这会适用于机器人场景,还是更适用于用于训练机器人的模拟环境?

Would that be something that would be useful for, like, a a robotic kind of situation, or would that be more for, like, simulations that you can use to train robots in your opinion?

Speaker 0

可能是混合的。

Could be a mix.

Speaker 0

它可能涉及决策,比如当你使用视频或其他类型的传感器作为输入,然后用模型来做出决策或分析周围环境的情况。

It could be decision making, like, if you're using video or, other kinds of sensors as input and then use the model to make decisions or kind of, like, analyze what's going on in the surroundings.

Speaker 0

这是一种非常有用的应用场景。

It it's a it's a very useful kind of like application of this technology.

Speaker 0

事实上,我们已经从一些早期使用者那里听说,他们非常希望我们的模型能支持图像输入,因为他们正在构建计算机代理。

In fact, we've already heard it from some early adopters that they would love for our models to have image inputs because they're building computer agents.

Speaker 0

所以,这是另一个你需要快速响应的领域。

And so that's another space where, you know, you really need to be quick.

Speaker 0

你需要能够快速与代理所交互的软件或应用程序进行互动。

You need to be able to interact fast with the with the, you know, whatever software, whatever application the agent interacts with.

Speaker 0

但重要的是,不要只关注网页的文本和HTML代码,还要真正看到正在发生的情况。

But it's important not to just look at the at the text and and the and the HTML code, let's say, of a web page, but actually seeing, you know, what's happening.

Speaker 0

因此,这将开启许多其他应用场景,我认为。

And so that would open up a lot of other applications, I think.

Speaker 2

这让我大开眼界。

That's blowing my mind.

Speaker 2

计算机操作智能体会怎么用它呢?我的意思是,这可能会泄露一些机密,但计算机操作场景下的扩散大模型是怎么工作的?

How would a computer use I mean, this might be giving away secrets here, but how would a computer use diffusion LLM work?

Speaker 2

它是生成整个屏幕空间吗?

Like, is it generating the whole screen space?

Speaker 2

这到底是怎么运作的?

Like, what how does that work?

Speaker 0

对于计算机使用来说,不是的。

For computer use, no.

Speaker 0

它更像是一种控制操作的方式。

It it would be more like controlling the actions.

Speaker 0

所以,这个智能体可以打字、点击,可以帮你下单购买商品,或者帮你预订航班,但它需要在网站上,或者在你的应用程序、手机上的某些应用中执行一系列操作,具体取决于所处的环境。

So, you know, the the agent can type, the agent can click, the agent can, you know, check out an item for you or can book a flight for you, but it needs to take a bunch of actions, let's say, on a website or maybe on your apps or on some apps on your phone, like, depending on what's the environment.

Speaker 2

但它是怎么做到的?

But how is it?

Speaker 2

我想知道的是,它是怎么看到的呢?

I guess what I was wondering is how is it seeing it?

Speaker 2

它是怎么,嗯,看到的?我想这可能更多跟视觉模型有关

How is it, like, see the and that that that's I guess it's maybe more so on the vision model

Speaker 1

方面。

side of things.

Speaker 1

扩散模型是怎么运作的?

Of of how diffusion looks?

Speaker 1

是什么弥合了这个差距?

What what bridges this this gap?

Speaker 2

是的。

Yeah.

Speaker 2

是的。

Yeah.

Speaker 2

没错。

Exactly.

Speaker 0

现有的模型,是的,它们会结合一些结构化信息,比如有哪些菜单可用、按钮在哪里,以及图像等。

So existing models, yeah, they take a a mix of, sort of, like, structured information about, you know, what kind of menus are available, where the buttons are, and and images.

Speaker 0

然后它们利用这些信息映射到具体的操作上。

And then they they use that and they map it to an action.

Speaker 0

你可以想象一个类似的场景:你使用一个扩散模型,处理相同的输入,但不是逐个生成标记,而是通过一种精炼过程来输出答案,这非常合理,因为人们已经在使用扩散策略和基于流的策略了。

So you can imagine something similar where instead you have a you have a diffusion model, processes the same inputs, but then produces the answer not one token at a time but through this refinement process, which makes a lot of sense because people are already using diffusion policies and flow based policies.

Speaker 0

如果你看看当前的强化学习和机器人领域,控制机器人并实现决定机器人行为的策略的最佳方法之一,就是基于扩散的。

If you look at RL and robotics right now, the way one of the best approaches for controlling robots and and kinda, like, implementing the policy that decides what actions the robots take is based on diffusion.

Speaker 0

它基于基于流的模型,本质上差不多是一回事。

It's based on flow based models, just more or less the same thing.
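
这里提到的扩散策略,其采样思路可以用一个极简示意来说明:动作从纯噪声出发,经多步"去噪"逐渐逼近策略网络的预测。`denoise_action` 是假设的占位实现,真实系统中是一个学习到的网络。

A minimal sketch of the diffusion-policy idea mentioned here: an action starts as pure noise over the action space and is refined toward the policy's prediction over several denoising steps, instead of being emitted in one shot. `denoise_action` is a hypothetical stand-in for the learned denoiser.

```python
import random

def denoise_action(action, predicted_mean, alpha=0.5):
    # One refinement step: pull the noisy action toward the prediction.
    # In a real diffusion/flow policy this is a learned network pass.
    return [a + alpha * (m - a) for a, m in zip(action, predicted_mean)]

def sample_action(predicted_mean, steps=20):
    # Start from Gaussian noise over the action space (e.g. joint
    # velocities) and iteratively denoise toward the policy's output.
    action = [random.gauss(0.0, 1.0) for _ in predicted_mean]
    for _ in range(steps):
        action = denoise_action(action, predicted_mean)
    return action
```

As in the text case, the step count is again a quality/latency knob: fewer refinement steps mean faster but noisier actions.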

Speaker 0

因此,这也是另一个让我非常兴奋的数据点,让我更有信心我们走在正确的道路上,我们确实需要在某个时候去实现这一点。

And so that's another kind of, like, data point that really, excites me and and gives me even more confidence that we are on the right track, and we really need to to to to to do this at some point.

Speaker 1

这真的很

That's really

Speaker 2

为了澄清一下,这是一个视觉-动作模型,还是另一种类型的模型?

Just to clarify, is that a vision action model, or is that a different type of model?

Speaker 2

好的。

Okay.

Speaker 0

是的。

Yeah.

Speaker 1

好的。

Okay.

Speaker 1

太好了。

Cool.

Speaker 1

太好了。

Cool.

Speaker 1

嗯,我有个问题,方向有点不同,我一直很好奇:扩散模型是否会改变模型的幻觉行为或可控性,还是本质上类似?

Well, I have a I have a question that's that's kind of a different direction that I've I've been really curious about, and that's does diffusion change hallucination behavior or controllability of a model, or is it similar in nature?

Speaker 0

这些权衡确实会变化,因为归根结底,幻觉的产生是因为你在构建一个统计模型。

So those trade offs change in the sense that, you know, you're you're at the end of the day, hallucinations happen because, you know, you're building a statistical model.

Speaker 0

所以,当你用数据拟合一个统计模型时,总会存在某种情况:你可能在进行插值,但有时也需要外推,而如果没有完美的模型,错误就会发生——但完美的模型实际上是不可能实现的。

And so, you know, whenever you fit a statistical model to data, there is a sort of regime where maybe you're interpolating, but then you might need to extrapolate and then mistakes happen unless you have a perfect model, but perfect model is never actually possible.

Speaker 0

因此,由于你使用的是不同的模型,即使使用相同的数据,也会产生不同的行为。

And so because you're fitting a different model, even if you use the same data, you're gonna get a different kind of behavior.

Speaker 0

所以它会以不同的方式插值和外推。

So it's gonna interpolate, it's gonna extrapolate in different ways.

Speaker 0

我们看到的是,它们仍然会犯错。

What we're seeing is that, you know, they still make mistakes.

Speaker 0

你应该试试我们的水星模型。

You should try our Mercury model.

Speaker 0

它并不完美。

It's not perfect.

Speaker 0

它会幻觉,但表现方式不同。

It does hallucinate, but it shows up in different ways.

Speaker 0

很难量化具体是怎样的不同。

And it's hard to quantify how.

Speaker 0

你知道,有一些基准测试,所以我们用基准来评估它在一般知识和指令遵循方面的表现,是否会编造内容?

You know, there are benchmarks, so we use benchmarks to see, you know, how well does it know, you know, the general knowledge and instruction following, does it make up things?

Speaker 0

我们发现它的表现非常好。

And we're seeing that it does really well.

Speaker 0

但要精确地量化,甚至从定性上判断——哪些方面它表现不佳,哪些方面表现更好——是非常困难的。

But it's hard to actually precisely quantify or even qualitatively figure out, okay, there are certain things that it doesn't do well or there are certain things where it does better.

Speaker 0

这实际上是一个相当棘手的问题。

That's actually a pretty hard problem.

Speaker 0

即使从学术理论的角度来看,理解泛化是如何运作的,这些模型如何以合理或不合理的方式整合训练数据中看到的所有知识,这个问题仍然广泛未解。

Like even from an academic theoretical perspective, understanding how generalization works, how these models are actually able to combine all the knowledge that they see in the training data in in ways that make sense and ways that don't make sense, it's, it's widely open.

Speaker 0

没有人真正理解这些模型是如何工作的。

Nobody really understands how these models work.

Speaker 0

因此,不幸的是,这也意味着很难比较这两类模型。

And so, unfortunately, that also means that it's very hard to compare the two kinds of models.

Speaker 2

不久前,OpenAI 发表了一篇论文,基本上认为幻觉是一个训练问题,或者至少他们的观点是,模型实际上被奖励去猜测,而不是在不知道时坦承。

Well, OpenAI had this paper not too long ago, right, that was saying that basically, like, hallucinations are a training problem, or at least that was their thesis that, you know, it's the the models are effectively being rewarded for guessing as opposed to saying when they don't know.

Speaker 2

你认同这个观点吗?

Do you do you buy into that?

Speaker 2

你认同这个观点吗?

Do you subscribe to that idea?

Speaker 2

如果是的话,你是否有可能去研究一下,以进一步减少这种情况?

And if so, is that something that you could possibly, you know, work work on to to help reduce that even more?

Speaker 0

是的。

Yeah.

Speaker 0

我认为,从根本上说,这确实是一个训练问题,因为你本质上是在拟合一个统计模型。

I think fundamentally, it is it is a training problem in the sense that, you know, you're fitting a statistical model.

Speaker 0

而且

And

Speaker 2

对。

Right.

Speaker 0

不幸的是,这非常复杂,用我们的话说,这是一个非常高维的空间。

Unfortunately, this is a very, you know, it's a very high dimensional space, what we would say.

Speaker 0

就像,存在

Like, there is there is

Speaker 1

是的。

a Yeah.

Speaker 1

一个

An

Speaker 0

可能存在极其大量的组合方式。

extremely large number of possible combinations that that you could that you could come up with.

Speaker 0

比如,想想你能生成的所有不同句子,那简直是一个组合爆炸的空间。

Like, if you think about all the different sentences that you can generate, it's just like a combinatorially large space.

Speaker 0

无论你的训练集有多大,它都只是你能生成的所有可能句子中微不足道的一小部分。

And no matter how big your training set is, it's only ever gonna be, like, a tiny little fraction of all the possible things that you could all the possible sentences that you could generate.

Speaker 0

所以

And so

Speaker 2

没关系。

It's alright.

Speaker 0

这意味着模型必须依靠自身进行推断,训练数据本身并不能告诉你一切,你需要进行插值和外推。

What this means is that the model has to essentially, you know, the training data itself will not tell you everything, and you have to interpolate and extrapolate.

Speaker 0

你必须进行泛化,但没有人真正知道这些模型是如何泛化的。

You have to generalize, and nobody actually knows how these models generalize.

Speaker 0

即使在更简单的场景中,比如仅训练一个神经网络来分类图像,也没有人理解这些模型是如何泛化的。

Even in simpler settings, like even if you take, you know, supervised learning, just training a neural network to classify images, nobody understands how these models generalize.

Speaker 0

在这里,我们讨论的是更复杂的情况:你输出的不是二元标签或 ImageNet 中的一千个类别,而是一个极其庞大、大得惊人的输出空间,

Here, we're talking about something even more complicated where you're not just outputting a binary label or a thousand, you know, classes in ImageNet, but you have, like, an extremely large, notoriously large space of outputs,

Speaker 2

而且没有人

and nobody

Speaker 0

理解在这一空间中泛化是如何运作的。

understands how generalization works in that space.

Speaker 2

从某种意义上说,这一切居然能奏效,是不是有点像奇迹?

In a sense, is it kind of a miracle that any of this works in the first place?

Speaker 0

绝对是的。

Absolutely.

Speaker 0

绝对是的。

Absolutely.

Speaker 0

我的意思是,我十多年前就开始研究这个问题了,当时觉得这足以让我整个职业生涯都忙个不停,因为这实在是一个太难的问题。

I mean, I I started working on this, you know, more than ten years ago and I thought it was going to keep me busy for my whole career because it's such a hard problem.

Speaker 0

这几乎感觉是不可能成功的。

It almost feels impossible for this to work.

Speaker 0

可能的组合太多了。

There's just so many combinations.

Speaker 0

即使你想想用图像来训练模型,对吧?图像的种类和特征组合方式太多了。

Even if you think about feeding a, let's say a model over images, right, there are so many different kinds of images and different kinds of combinations of features.

Speaker 0

对吧?

Right?

Speaker 0

假设你用一个数据集来训练模型,里面包含红色的汽车、蓝色的公交车和红色的公交车,那么模型该不该生成一辆蓝色的汽车?

And so let's say you train a model over a dataset where there is red cars and blue buses and red buses, then should the model generate a blue car?

Speaker 0

对吧?

Right?

Speaker 0

这一点并不明确。

And it's it's not clear.

Speaker 0

对吧?

Right?

Speaker 1

是的。

Yeah.

Speaker 1

对,就是这样。

Like Yeah.

Speaker 0

但从根本上说,这些模型就是这么做的。

But fundamentally, that's what these models do.

Speaker 0

当然,这要复杂得多,因为涉及更多不同的事物、形状和颜色,有些组合有意义,有些没有,而它们却能以正确的方式做到这一点。

And it's, of course, much more complicated because there's many more different things and many different shapes and colors and some combinations means make sense, some don't, and they are able to to do it the right way.

Speaker 0

但这确实是一个非常、非常困难的问题。

But it is just a very, very hard problem.

Speaker 1

可解释性已经成为一个非常引人入胜的领域,因为我们知道,这些就是相关的部分。

That's interpretability has become such a fascinating field in terms of just looking at, you know, yeah, we know these are the parts.

Speaker 1

我们知道这是如何构建的。

We know this is how we build it.

Speaker 1

我们知道这些原理在这边有效,那些原理在那边也有效。

We know that these principles work on this end and these principles work on this end.

Speaker 1

在中间的某个地方,存在着一种我们还未能完全理解的飞跃。

Somewhere in the middle is that like, like leap of faith that that we don't quite get.

Speaker 1

这真的非常迷人。

And, it's really it's fascinating.

Speaker 1

格兰特,很高兴你问了这个问题。

Grant, I'm so glad you asked that question.

Speaker 2

你必须得问,你就是得问。

You gotta you just got you just gotta ask.

Speaker 2

你知道的。

You know?

Speaker 2

这就像是

It's like

Speaker 1

没错。

That's right.

Speaker 2

我不知道。

I don't know.

Speaker 2

所以得去问那些知道的人。

So gotta ask the people who know.

Speaker 1

所以你觉得,模型评估在扩散模型中是否有所不同,还是我们本质上只是在查看其中的答案?

So do you feel like does model evaluation, is it approached any differently with diffusion, or are we essentially just looking at the answers in there?

Speaker 1

我只是不太确定。

I just I wasn't sure.

Speaker 1

我写下这个问题的时候,不确定它是不是太蠢了,但我还是决定问出来,因为如果非顺序性影响了这些指标和评估方式,我想弄清楚。

I wrote this question down and wasn't sure it wasn't really stupid, but I decided I was gonna ask it anyways, as if not being sequential affects how those metrics work and evals.

Speaker 0

没太大区别。

Not too much.

Speaker 0

如果你想想质量指标,它们通常非常相似,我们可以使用相同的基准,比如 SWE-bench 或 IFEval,也就是人们用来评估模型在各种我们关心的任务上表现的许多现有基准,比如遵循指令、为我们写代码、软件工程,所以我们仍然可以进行测试。

So if you think about the quality metrics, they are often very similar, we can use the same benchmarks SWE-bench or, you know, IFEval, like a lot of the existing benchmarks that people use to test, you know, how good are the models doing various things that we care about, following instructions, writing code for us, software engineering, so we can still essentially test.

Speaker 0

端到端地,我们只需给模型输入相同的提示,看看它的回答,然后评估它在任务中的表现,这是一种非常有用的方式来衡量模型的实用性。

End to end, we just feed the same prompt to our model, we look at the answer and then we see how well it does at the task, and that's a very useful way of measuring, you know, how useful the models are.

Speaker 0

还有一些指标是扩散模型特有的,比如你需要多少次去噪步骤,也就是模型的速度。

There are other metrics that are, specific to the diffusion models, like how many kinda, like, denoising steps do you need to do, so so essentially how fast they are.

Speaker 0

这是我们需要跟踪和优化的另一件事,因为质量和速度之间总是存在权衡,所以事情变得稍微复杂了一些,因为你多了一个可以调整的旋钮。

That's another thing that we need to track and we and we need to optimize because there's always kinda like a trade off between quality and speed, and so everything becomes a little bit more complicated because there is an extra knob that you can kind of play with.
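
这个"额外旋钮"可以直接体现在评测管线里:对去噪步数做一次扫描,记录每档的延迟,与质量指标对照。以下为示意代码,`denoise_pass` 只是用固定耗时模拟一次完整的精炼。

The extra knob can be made concrete in an eval pipeline: sweep the denoising-step count and record latency per setting, so quality benchmarks can be read against the speed they cost. Illustrative only; `denoise_pass` just simulates a fixed-cost refinement pass.

```python
import time

def denoise_pass(tokens):
    # Stand-in for one full refinement pass of a diffusion LM.
    time.sleep(0.001)  # pretend each pass costs fixed compute
    return tokens

def sweep_latency(step_options, length=64):
    # Record wall-clock latency for each denoising-step setting, so
    # quality scores can be plotted against the speed they cost.
    results = {}
    for steps in step_options:
        tokens = ["<mask>"] * length
        start = time.perf_counter()
        for _ in range(steps):
            tokens = denoise_pass(tokens)
        results[steps] = time.perf_counter() - start
    return results
```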

Speaker 0

此外,还有一些可能是扩散模型特有的东西。

And so there are a few other things that are maybe diffusion specific.

Speaker 0

但总的来说,我认为最终还是离不开速度、质量和成本这三个因素。

But broadly, I think it's still always gonna be a matter of speed, quality, and cost.

Speaker 0

是的。

Yeah.

Speaker 0

你总是会归结到这三件事上,而且你知道,速度和成本相对容易衡量,执行方式上的灵活性也不大。

You always boil down to those three things, And, you know, speed and cost are relatively easy to measure, and and there's not a ton of wiggle room in terms of, like, how you do things.

Speaker 0

质量才是更难的那一个。

Quality is really the harder one.

Speaker 0

这时候,就很难说哪个模型更好了。

That's when, you know, it's it's it's hard to say which model is better.

Speaker 0

仍然有很多定性评估和主观感受在里面。

There is still a lot of, qualitative evaluations and a lot of, vibes involved.

Speaker 0

是的。

Yeah.

Speaker 0

有点像是觉得这个模型比那个模型更好。

Kind of like saying this model is better than the other model.

Speaker 0

但另一方面,这非常重要。

But on the other hand, it's so important.

Speaker 0

对吧?

Right?

Speaker 0

你无法做好工程。

You cannot do good engineering.

Speaker 0

如果你不追踪那些重要的东西,就无法进行真正的研发。

You cannot do proper R&D if you're not tracking the things that matter.

Speaker 0

所以,是的。

And so Yeah.

Speaker 1

你必须追踪这些指标,但同时也要意识到,基准测试也需要打点折扣来看待。

You have to track them, but also respect that benchmarks need a bit of a grain of salt with them as well.

Speaker 1

但归根结底,这取决于你的具体任务和你正在做的事情,我想是这样。

But, you know, in the end, it's it's down to your tasks and what you're doing where where you find that, I guess.

Speaker 1

没错。

Exactly.

Speaker 2

我其实有个相关的问题,我们的许多读者和观众都在努力弄清楚如何在他们的业务和工作流程中实施这些系统,对吧?

I actually have a question related to this, which is a lot of our readers and viewers are trying to figure out how to implement these systems in their business, in their workflows, right?

Speaker 2

作为一名多年来构建并使用这类模型的人,你建议他们如何闭环实施这些系统,并在实际生产规模下评估质量呢?

As someone who builds and has worked with these types of models for years, how do you recommend that they, I guess, close the loop in terms of implementing this and and actually assess the quality when you're actually in production scale?

Speaker 2

比如,你有什么技巧或建议吗?如何判断这些系统是否真正有效?

Like, do you have any, like, tips or tricks there of, like, how basically, like, what's your recommendation for how to assess whether they're working or not?

Speaker 0

我的意思是,从我们客户的情况来看,黄金标准是在业务相关的指标上进行某种A/B测试。

I mean, ultimately, what we see with our customers is that the gold standard is some kind of A/B test on the business-relevant metric.

Speaker 0

我认为,最终决定人们是否会购买、是否会转向Mercury的关键就在这里。

I think, ultimately, that is the thing that decides whether people are gonna buy, are gonna switch to Mercury or not.

Speaker 0

他们关心某个业务指标,然后你会进行A/B测试,看看Mercury是否更好,成本是否合理,可靠性是否到位,以及其他所有重要因素是否具备,如果都满足,他们就会切换。

There is some business metric they care about, and then you do an A/B test, and you see if Mercury is better, and if it is, and the cost is right, and the reliability is there, and all the other things that matter are there, then they switch.
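The gold-standard A/B test described here can be sketched with a standard two-proportion z-test on a business metric such as conversion rate. This is a minimal illustrative sketch; the function name, the sample sizes, and the conversion counts are all hypothetical assumptions, not figures from the interview:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: does variant B's conversion rate differ from A's?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical example: baseline model converts 480/10,000 sessions,
# the candidate model 540/10,000.
z, p = two_proportion_ztest(480, 10_000, 540, 10_000)
print(f"z={z:.2f}, p={p:.4f}")
```

With a conventional threshold like p < 0.05, the test tells you whether the observed lift is distinguishable from noise at the traffic volume you actually have, which is exactly the "does it need infrastructure" point made below.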

Speaker 0

所以我认为这就是黄金标准。

And so I think that's kinda like the gold standard.

Speaker 0

不幸的是,这需要基础设施支持。

Unfortunately, you know, it takes infrastructure.

Speaker 0

并非每个客户都有足够的成熟度来正确运行A/B测试,而且这也很昂贵。

Not every client has the maturity to be able to run A/B tests properly, and that's also expensive.

Speaker 0

因此,理想情况下,在进行A/B测试之前,你希望具备某种离线评估手段,以指导模型选择,甚至判断是否值得进行A/B测试。

So, of course, ideally, leading up to the A/B test, you wanna be able to have some kind of offline evaluation to guide the model selection and even just make sense of whether it's even worth doing an A/B test.

Speaker 0

因此,制定一些良好的离线评估方法——可以基于评分标准、使用大语言模型作为评判者,或采用你认为在生产环境中关键指标的代理指标——我认为这总是一个不错的步骤。

So coming up with some good offline evals, which could be based on rubrics, LLM-as-a-judge, or some kind of proxy for the thing you think will matter in production, I think that's always a good step.
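A rubric-based offline eval of the kind mentioned here can be sketched as a weighted checklist over model outputs. In practice each check might be an LLM-as-a-judge call; this minimal sketch uses plain predicate functions instead, and every criterion, name, and example reply is a hypothetical illustration:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RubricItem:
    name: str
    # Each check maps a model output to pass/fail; in a real harness this
    # could wrap an LLM-as-a-judge call instead of a simple predicate.
    check: Callable[[str], bool]
    weight: float = 1.0

def score_output(output: str, rubric: list[RubricItem]) -> float:
    """Weighted fraction of rubric items the model output satisfies (0.0 to 1.0)."""
    total = sum(item.weight for item in rubric)
    earned = sum(item.weight for item in rubric if item.check(output))
    return earned / total

# Toy rubric for a customer-support reply (hypothetical criteria).
rubric = [
    RubricItem("greets the customer", lambda o: o.lower().startswith("hi")),
    RubricItem("mentions the refund", lambda o: "refund" in o.lower(), weight=2.0),
    RubricItem("stays under 50 words", lambda o: len(o.split()) < 50),
]

reply = "Hi! Your refund has been processed; it should appear within 5 business days."
print(score_output(reply, rubric))
```

Averaging such scores over a fixed prompt set gives the cheap, repeatable proxy metric that tells you whether a candidate model is even worth sending into a full A/B test.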

Speaker 0

这能为你节省时间。

It's gonna save you time.

Speaker 0

这能为你节省成本。

It's gonna save you money.

Speaker 0

同样,这一点也总是很重要的。

It's always important to do that as well.

Speaker 1

太棒了。

Awesome.

Speaker 1

嗯,我知道我们时间有点紧了,但在你走之前,我还有一个最后的问题。

Well, I know we're getting tight on time, but I have one last question before I let you go.

Speaker 1

我想确保我们再多聊一点关于 Mercury 的事,因为它已经存在一段时间了,而它在很多人甚至还不知道它存在的情况下,表现得竟然这么出色。

And I wanna make sure we talk a little bit more about Mercury, because it's been around a minute now, and it's surprisingly good for something that some people don't even know exists at this point.

Speaker 1

我想知道,你们的路线图上有什么计划?

And I'm wondering, like, what's on the roadmap?

Speaker 1

近期有什么即将推出的内容?

What's coming up in the near future?

Speaker 1

也许更长远的未来呢?

Maybe the longer term future?

Speaker 1

你有什么想分享的吗?

Is there anything you wanna share about that?

Speaker 0

很高兴你问到这个问题。

I'm glad you asked.

Speaker 0

是的。

Yeah.

Speaker 0

正如我提到的,我们正在全力投入研发,不断提升扩散语言模型的前沿能力。

So as I mentioned, we're working hard on R&D and improving the frontier of what's possible with diffusion language models.

Speaker 0

我们将持续发布新模型。

We're going to be releasing new models constantly.

Speaker 0

我们非常期待一款即将很快公开发布的模型,它将让 Mercury 模型变得更加智能。

There is one that we're very excited about, that we hope to release publicly very soon, that's gonna enable Mercury models to be even smarter.

Speaker 0

它将具备更强的规划能力和推理能力。

It's gonna have much better planning and reasoning capabilities.

Speaker 0

这将推动许多人们非常关注的智能体应用场景。

And so that's gonna enable a lot of agentic use cases that people really care about.

Speaker 0

它将让它们变得极其迅速。

It's gonna make them really, really fast.

Speaker 0

所以我们对这些新模型展现出的结果感到非常兴奋。

So we're excited about the results that we're seeing on these new models.

Speaker 0

所以,希望我们能很快与你和你的观众分享这些成果。

So, hopefully, we're gonna be able to share them with you and your audience pretty soon.

Speaker 1

太棒了。

Amazing.

Speaker 1

太棒了。

Amazing.

Speaker 2

太好了。

Awesome.

Speaker 2

这绝对是——

It's definitely—

Speaker 1

非常感谢,老兄。

Thank you so much, man.

Speaker 1

这真的太引人入胜了。

This has been absolutely fascinating.

Speaker 1

我们非常喜欢听到你们的声音。

We love hearing from you.

Speaker 0

非常感谢你邀请我。

Thank you so much for hosting me.

Speaker 0

是的。

Yeah.

Speaker 0

这真的非常有趣。

This was really fun.

Speaker 2

听众应该去哪里试用Mercury并了解更多关于扩散语言模型的信息?

Where should listeners go to give Mercury a try and learn more about diffusion language models?

Speaker 0

他们可以访问我们的网站 inceptionlabs.ai。

So they can come to our website, inceptionlabs.ai.

Speaker 0

我们提供了一些资源,包括关于我们模型的博客文章。

We have a number of resources, blog posts on our models.

Speaker 0

还有一个聊天体验平台。

There is a chat playground.

Speaker 0

我们提供了使用API的文档。

There's documentation to use our API.

Speaker 0

如果你想用我们的模型开发应用,那可能是最好的起点。

If you wanna build an app using our models, that's probably the best place to start.

Speaker 1

太好了。

Excellent.

Speaker 1

太好了。

Excellent.

Speaker 1

如果你还没有的话,请花一秒钟点赞、订阅、注册我们的通讯,并持续关注Inception和Mercury,因为Stefano正在做一些很酷的事情,我们非常高兴今天能把他介绍给大家。

Well, if you haven't yet, please take just a second to like, subscribe, go hit the newsletter, and make sure you keep up with Inception and Mercury, because Stefano's doing some neat stuff, and we're really excited we could bring him to you today.

Speaker 1

但今天就到这里了。

But that's it for today.

Speaker 1

那么,暂时告别了,朋友们。

So farewell for now, humans.

Speaker 1

我们下次见。

We'll see you next time.
