Zero Knowledge - ZK如何启发米兰达·克里斯开创AI水印技术

ZK如何启发米兰达·克里斯开创AI水印技术

How ZK inspired AI Watermarking with Miranda Christ

本集简介

在本期节目中,安娜·罗斯与塔伦·奇特拉同哥伦比亚大学计算机科学博士生米兰达·克里斯探讨了密码学与人工智能通过水印技术的交叉领域。米兰达分享了她在开发不可察觉方法以证明内容由AI模型生成方面的研究,涵盖从简单的红绿词表到复杂的伪随机纠错编码等各类技术。讨论深入探究了水印的密码学特性——包括完备性、可靠性与不可检测性——以及这些特性如何与零知识证明系统中的属性相呼应。米兰达阐释了水印技术与ZKML等其他密码学方法的差异:水印仅修改采样过程而非底层模型权重,因而计算开销轻、更具实际部署可行性。

**相关链接:**

* 第206期:与吉列尔莫、亚历克斯和塔伦共析DeFi原语
* 我为德州大学有效利他主义组织讲授的AI安全课程
* 谷歌SynthID
* 亚马逊公共水印检测器
* 《纽约时报》:ChatGPT如何在生成文本中嵌入"水印"
* 《华尔街日报》:OpenAI未部署水印技术的原因
* 大语言模型水印技术
* 语言模型的不可检测水印
* 沙中之印:生成模型强水印的不可能性
* 伪随机纠错编码
* 理想伪随机编码
* 查看ZK领域最新职位:ZK播客工作板

**如果您喜欢我们的节目:**

* 所有链接汇总:@ZeroKnowledge | Linktree
* 订阅播客通讯
* 推特关注 @zeroknowledgefm
* 加入Telegram群组
* YouTube频道

**支持节目:**

* Patreon
* ETH捐赠地址
* BTC捐赠地址
* SOL捐赠地址
* ZEC捐赠地址

阅读文字稿

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

欢迎来到零知识领域。

Welcome to Zero Knowledge.

Speaker 0

我是主持人安娜·罗斯。

I'm your host, Anna Rose.

Speaker 0

在本期播客中,我们将探索零知识研究和去中心化网络的最新进展,以及有望改变我们在线互动和交易方式的新范式。

In this podcast, we will be exploring the latest in Zero Knowledge research and the decentralized web, as well as new paradigms that promise to change the way we interact and transact online.

Speaker 0

本周,塔伦和我与哥伦比亚大学计算机科学博士生米兰达·克里斯进行了对话。

This week, Tarun and I chat with Miranda Christ, computer science PhD student at Columbia University.

Speaker 0

我们讨论了正在开发的AI内容水印技术。

We discussed the techniques being developed to watermark AI content.

Speaker 0

AI水印是指通过某种方式证明内容是由AI模型生成的,但这种方式对内容消费者来说是不可察觉的。

To watermark AI is to have some way of proving that a piece of content was created by an AI model, but in a way that's imperceptible to the person consuming the content.

Speaker 0

米兰达分享了她在该领域的研究及同行们的工作成果,涵盖AI水印发展史、AI水印系统架构、探索中的技术方案、应对不断变化模型的挑战、水印规避方法,以及使更强水印成为可能的密码学工具。

Miranda shares her research and the research of her peers working on this problem, covering the history of AI watermarking, the architecture of an AI watermarking system, the techniques being explored, the challenges in working with ever changing models, how watermarks can be circumvented, and the cryptographic tools that make stronger watermarks possible.

Speaker 0

这些AI水印技术与零知识证明领域存在大量交叉点和相似之处。

There is a ton of overlap and lots of parallels in these AI watermarking techniques to the ones used in ZK.

Speaker 0

看到节目中常讨论的这些技术正以新方式渗透到邻近领域,真是件有趣的事。

So it was fun to see new ways that these techniques that we often cover on the show are making their way into this adjacent field.

Speaker 0

在正式开始前,我想向大家推荐ZK招聘板。

Now before we kick off, I just wanna point you towards the ZK jobs board.

Speaker 0

在那里你能找到来自顶尖ZK团队的工作机会。

There you will find job postings from top teams working in ZK.

Speaker 0

如果你是招聘团队,现在就可以在该平台发布职位。

And if you're a team looking to hire, you can also post your jobs there today.

Speaker 0

我们听说不少团队通过这个平台找到了理想人选,希望它也能帮到你。

We've heard great things from teams who found their perfect hire through this platform, and we hope it can help you as well.

Speaker 0

详情请访问jobsboard.0knowledge.fm。

Find out more at jobsboard.0knowledge.fm.

Speaker 0

你可以在我们官网找到这个链接,我也在节目笔记中添加了它。

You can find this on our website, and I've added a link in the show notes.

Speaker 0

现在请收听本期关于AI生成内容水印技术的完整内容。

Now here is our episode all about the techniques used to watermark AI generated content.

Speaker 0

今天,塔伦和我请到了哥伦比亚大学计算机科学博士生米兰达·克里斯。

Today, Tarun and I are here with Miranda Christ, a computer science PhD student at Columbia University.

Speaker 0

她也是哥伦比亚大学理论组和密码实验室的成员。

She is part of the theory group and the CryptoLab at Columbia as well.

Speaker 0

欢迎来到节目。

Welcome to the show.

Speaker 1

谢谢邀请。

Thanks for having me.

Speaker 1

是的。

Yeah.

Speaker 1

我很兴奋。

I'm excited.

Speaker 0

也欢迎你,塔伦。

And welcome, Tarun.

Speaker 2

嘿。

Hey.

Speaker 2

很高兴能回来。

Great to be back.

Speaker 2

已经有一段时间了。

It's been a while.

Speaker 0

酷。

Cool.

Speaker 0

Tarun,你一直在和Miranda交流。

Tarun, you have been speaking with Miranda.

Speaker 0

这次邀请Miranda上节目是你的建议。

It was a suggestion to have Miranda on.

Speaker 0

Miranda,这是我们第一次见面,所以我很期待了解你一直在做的工作。

It's the first time I'm meeting you, Miranda, so I'm really excited to learn about the work you've been doing.

Speaker 0

你能简单介绍一下自己和你的研究方向吗?

Could you share a little bit about yourself and the research that you've been working on?

Speaker 1

好的。

Yeah.

Speaker 1

所以我特别高兴能上这个节目,因为我觉得自己在加密货币方面有些背景。

So I'm especially happy to be on the show because I feel like I have some background in like cryptocurrency.

Speaker 1

我想我刚完成了在哥伦比亚大学的第五年博士学业,在那里做了不少研究。

I guess I just finished the fifth year of my PhD at Columbia, and so I've done kind of a lot there.

Speaker 1

一开始我就知道自己想从事密码学,所以从最初就一直在做这个方向。

I started off knowing I wanted to do cryptography, so I've been doing that since the beginning.

Speaker 1

但在我博士早期,我更专注于区块链密码学,曾在a16z实习过,那段经历非常有趣,可以说影响很大。那是哪一年?

But early on in my PhD, I focused more on blockchain cryptography, so I did an internship at a16z that was really fun and, I would say, influential. What year was that?

Speaker 1

那是2022年夏天。

That was summer twenty twenty two.

Speaker 0

好的。

Okay.

Speaker 0

酷。

Cool.

Speaker 1

大概就是那时候我开始接触整个零知识证明领域,非常令人兴奋。

And that's kind of when I started to learn about the whole ZK world, which was very exciting.

Speaker 1

此外,我一直兼顾着理论密码学和一些更偏应用的理论密码学研究。

And then, I've always been doing a bit of theoretical cryptography and a little bit more applied theoretical cryptography.

Speaker 1

可以说在专注区块链之后,我仍会涉及相关研究,但现在投入减少了,更多转向了水印技术及其他偏向理论的课题。

I would say after my focus on blockchain, I still do some of it, but a little less now, I moved more towards watermarking and other more like theory topics.

Speaker 1

而水印技术目前可能是我研究的主要方向。

And watermarking is probably the main focus of my research now.

Speaker 0

我想这也将是本期节目的重点内容之一。

And I think it's gonna be a bit of the focus of this episode as well.

Speaker 0

这非常酷。

That's very cool.

Speaker 2

你和本节目另一位嘉宾乔·邦诺合作发表过许多论文,我记得你还在安德森那里工作过?

You've written a lot of papers with another guest on the show, Joe Bonneau, who, yeah, I guess you worked with Andreessen.

Speaker 2

那项研究主要涉及哪些主题?

What is sort of, like, the the themes in that research?

Speaker 2

因为那条研究线已经持续了三年多。

Because that line of research has been going on for, like, three years plus.

Speaker 2

所以,我感觉你的研究就像一条自我构建的链条。

So I feel like there's a chain of research that's built upon itself.

Speaker 2

因此在讨论水印技术之前,也许我有点好奇——特别是为了让对我们这类工作更熟悉的听众了解——能否简单描述下你之前的研究内容?

So like, before talking about watermarking, maybe I'm I'm kinda curious just to give give especially for our listeners who will be more familiar with that type of work, just like a brief description of what you worked on.

Speaker 1

是的。

Yeah.

Speaker 1

和乔一起工作非常有趣。

Working with Joe is super fun.

Speaker 1

让我想想。

Let's see.

Speaker 1

我们的第一篇论文是关于无状态区块链的不可能性结果。

So our first paper was an impossibility result for stateless blockchains.

Speaker 1

基本上证明了效率无法超过某个特定阈值。

Basically, showing that you can't get better than a certain efficiency.

Speaker 1

自那以后,我们做了很多不同方向的研究。

And since then, we've been doing, I guess, lots of things.

Speaker 1

最近我们一直在研究随机数信标。

Lately, we've been working on randomness beacons.

Speaker 1

在这个场景中,多个参与方需要共同生成哪怕只是一个随机比特。

So there, you know, you have a bunch of different parties who want to jointly generate even just one random bit.

Speaker 1

如果这些参与方中大多数都不可信,这将非常困难。

This is very hard if the majority of these parties are untrusted.

Speaker 1

在区块链应用中,这对领导者选举或可验证抽奖等场景至关重要。

And super important in blockchain settings for things like leader election or like verifiable lotteries.

Speaker 1

因此,我和乔一直在研究如何利用延迟函数实现随机数信标。

And so, with Joe, I've been working on doing randomness beacons using delay functions.

Speaker 1

在某些已知没有延迟函数就无法实现的环境中。

In settings where they're known to be impossible without delay functions.

Speaker 1

在这种不诚实多数场景下,当你信任的参与者不到半数时,实际上传统方法没有延迟函数就无法建立安全的分布式随机数源。

So in this dishonest majority setting, where you trust fewer than half the participants, it's actually, classically, impossible without delay functions to have a secure distributed randomness beacon.

Speaker 1

我们在论文中证明,延迟函数不仅能突破这种不可能性,事实上还是必要条件。

And so, in one of our papers, we showed that delay functions don't just allow you to get around this impossibility, but they're in fact necessary.

Speaker 1

所以,是的,乔在基于时间的密码学方面做了很多工作,能够展示它与随机信标之间的一些联系真是太好了。

And so, yeah, Joe works a lot on this like time based cryptography, and it's been nice to show some connections between that and randomness beacons.
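
The commit-reveal-plus-delay pattern behind these beacons can be sketched roughly as follows. This is a toy illustration of the general idea, not the construction from Miranda and Joe's papers: `slow_hash` is a stand-in for a real verifiable delay function (which would also produce a succinct proof of correct evaluation), and all function names here are invented for the example.

```python
import hashlib

def commit(value: bytes) -> bytes:
    # Each party publishes this hash first, then reveals `value` later,
    # so they cannot change their contribution after seeing others'.
    return hashlib.sha256(value).digest()

def slow_hash(seed: bytes, iterations: int = 10_000) -> bytes:
    # Stand-in for a verifiable delay function: iterated hashing that
    # cannot be parallelized away, so the output is unknowable until
    # enough sequential time has passed.
    h = seed
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h

def beacon_output(revealed: list[bytes]) -> bytes:
    # XOR the hashes of all revealed contributions, then apply the delay
    # function so the last participant to reveal cannot bias the result
    # by choosing their value after seeing everyone else's.
    acc = bytes(32)
    for contribution in revealed:
        digest = hashlib.sha256(contribution).digest()
        acc = bytes(a ^ b for a, b in zip(acc, digest))
    return slow_hash(acc)
```

One honest contribution is enough to make the XOR unpredictable, and the delay function is what closes the last-revealer loophole that makes the dishonest-majority setting classically impossible.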

Speaker 0

很酷。

That's cool.

Speaker 0

在研究主题列表中,还有差分隐私。

Also on the list of research topics, there's differential privacy.

Speaker 0

在那个研究方向上,具体是什么时候开始引入这个概念的?

When and like at what point in that thread of research was that kind of coming in?

Speaker 1

哦,是的。

Oh yeah.

Speaker 1

那其实是分开的。

That was kind of separate.

Speaker 1

好的。

Okay.

Speaker 1

我记得博士第一年,我上了 Steve Bellovin 的一门非常有趣的课程,他是哥伦比亚大学的教授,研究法律与安全方向。

So I think the first year of my PhD, I took a really interesting course with Steve Bellovin, who's a professor at Columbia, and he works on law and security.

Speaker 1

哦。

Oh.

Speaker 1

是的,他确实在做一些非常酷且实际有影响力的工作。

So, yeah, he does like very cool, actually impactful work.

Speaker 1

我记得他曾与联邦贸易委员会合作过。

I think he's worked with the FTC.

Speaker 1

在这门课上,我们需要完成一个课程项目。

And in this class, we had to do a class project.

Speaker 1

我认为这门课非常关注那些对法律和社会产生实际影响的安全问题。

And I think the class was very focused on like security that impacts law and society in a very real way.

Speaker 1

因此在我的项目中,我选择研究人口普查中的差分隐私,这也是我最终完成那篇论文的契机。

And so for my project, I chose to work on differential privacy in the census, which is how I ended up with that paper.

Speaker 1

没错,这其实是源自这门课的一个小组项目。

So, yeah, this came out of like a group project for this class.

Speaker 1

这门课的主题——法律与匿名性——促使我稍微走出舒适区,这才让我最终研究起人口普查中的差分隐私,这过程其实非常有意思。

I think the topic of the class, which was like law and anonymity, pushed me to move a little more outside my comfort zone, which is how I ended up working on differential privacy in the census, which was actually really cool.

Speaker 1

我和合著者Sarah最终在国家科学院会议上向许多来自人口普查局的人做了报告。

My co author Sarah and I ended up presenting at National Academies meeting with a lot of people from the Census Bureau.

Speaker 1

真不错。

Cool.

Speaker 1

所以能看到我们的成果传递给真正负责人口普查的人,感觉很棒。

So it was cool to see like our work getting to the people who actually, you know, really run the census.

Speaker 0

有意思。

Funny.

Speaker 0

我们很久以前在节目里就讨论过差分隐私。

We we have talked about differential privacy on the show a long time ago.

Speaker 0

而且Tarun,我记得当时是在你将差分隐私应用于DeFi领域的背景下讨论的。

And, Tarun, I think it was in the context of the work you were doing using differential privacy in, like, DeFi.

Speaker 0

你还记得那时候吗?

Do you remember this era?

Speaker 2

记得。

Yeah.

Speaker 2

是的

Yeah.

Speaker 2

没错

Yeah.

Speaker 2

那是2021年的事了

That was 2021.

Speaker 2

已经过去很久了

It's a long time ago.

Speaker 0

我们要不要再快速定义一下那是什么?

Should we quickly define what what that is again?

Speaker 0

说实话,我对这个概念有点生疏了

I I'm a little rusty on on it, actually.

Speaker 2

我觉得这或许是个很好的过渡,可以引入水印技术的话题,因为在我看来水印技术某种程度上处于简洁证明和差分隐私之间的中间地带

I think maybe this is a good way to get into a segue to watermarking because I view watermarking as sort of in the middle of proofs in the sense of succinct proofs and differential privacy in some weird way.

Speaker 2

它并不完全是...它同时带有两者的某些特征

It's not it's not really it it has some flavor of both of them.

Speaker 0

不错。

Nice.

Speaker 2

但差分隐私本质上试图说明以下问题。

But differential privacy basically is just trying to say the following.

Speaker 2

如果我移除数据的某个子集,你能分辨出我移除了哪个子集吗?

If I remove some subset of the data, can you tell which subset I removed?

Speaker 2

因此它试图使数据集无法被识别。

And so it tries to make the set non identifiable.

Speaker 2

通常你会通过添加噪声来实现这一点,这样你就无法区分是移除了数据的某个子集还是噪声。

And usually, you do that by adding noise, so you can't tell the difference between removing some subset of the data versus noise.

Speaker 2

从某种意义上说,这给了你一种在组合下保持的隐私概念,但它并不具备统计上的保证性。

And in some sense, what that does is give you a notion of privacy that is preserved under composition, but it's not statistically guaranteed.

Speaker 2

就像,如果你有足够的计算能力和样本,他们仍然可以强制破解出缺失的数据集。

Like, someone with enough compute power and samples can, even in polynomial time in a lot of cases, brute force what set was missing.

Speaker 2

所以它不具备计算上的硬度保证,但在实践中易于使用。

So it it it doesn't have, like, the computational hardness guarantees, but it's easy to use in practice.

Speaker 0

明白了。

I see.

Speaker 0

有意思。

Interesting.

Speaker 2

我漏掉什么了吗,米兰达?

Did I miss anything, Miranda?

Speaker 2

我不确定是否有遗漏。

I'm not sure if I have.

Speaker 1

我认为你甚至获得了比计算保证更强的保障。

I would say you get even stronger than computational guarantees.

Speaker 1

就像你获得了一个统计保证,无论花费多少时间,你都无法获取超过限定量的关于被移除集合的信息。

Like you get a statistical guarantee that no matter how much time you spend, you can't tell more than a bounded amount of information about the set that was removed.

Speaker 1

所以确实,这里不存在计算难度。

So like, yeah, there's no computational hardness.

Speaker 1

但我要说这强化了你获得的属性——即便你愿意投入比破解密码学更多的资源来重建缺失的数据集,你也无法以超过某个限定置信度的水平完成这件事。

But I would say that's strengthening the property that you get, which is like even if you're willing to throw more resources than would break cryptography at reconstructing the missing dataset, you can't do it with higher than some bounded level of confidence.

Speaker 2

是啊。

Yeah.

Speaker 2

我之所以提到计算层面的问题,更多是出于实际案例中被攻破的情况,比如人口普查数据事件,人们能够逆向推断出数据来源,差分隐私技术就是在这种背景下应运而生的——当时人们发现,

The only reason I kind of mentioned the computational thing is more that, in practice, it has been broken, like the census data stuff where people were able to reverse engineer where it came from. Differential privacy was, like, invented because people were like, hey.

Speaker 2

你可以根据模型的预测结果,识别出哪些用户为特定模型贡献了数据。

You can identify which users contributed data to some particular model based on, like, the model's predictions.

Speaker 2

这就像2010年或2011年的Netflix竞赛案例。

Like, this is like the Netflix prize 2010 or 2011.

Speaker 2

所以你需要一种在组合下仍然安全的方案,确保同一数据集被用于其他场景时不会泄露信息。

And so you need something that's safe under composition, where, like, oh, if I use the same dataset in some other regime, I'm not leaking something.

Speaker 2

但问题在于差分隐私的组合保障性在某些情况下会弱化。

But the problem is the compositional guarantees of differential privacy worsen in some ways, like when I multiply or add.

Speaker 2

正因如此,实践中已经出现了利用这种缺陷发起的攻击,我认为这也是该技术推广受阻的部分原因。

And from that, people have been able to attack it in practice, which is, I think, why it's had a little bit of a harder time getting adoption.

Speaker 2

确实。

Yeah.

Speaker 2

针对它的攻击手段已经变得更加恶劣。

The attacks against it have been worse.

Speaker 2

我们暂且这么说吧。

Let's put that.

Speaker 2

我觉得,这大概就是为什么最近人们对它的热情有所减退。

Like, I think I think people have this is kind of why people, I think, seem less enthusiastic about it lately.

Speaker 2

不过话说回来。

But anyway.

Speaker 0

差分隐私属于密码学中的哪个分支?

What family of cryptography is differential privacy in?

Speaker 0

具体来说,支撑差分隐私的技术底层是什么?

Like, what kinds of, like, things under the hood are actually building differential privacy?

Speaker 1

我认为它和大多数密码学领域是相对独立的,因为它不涉及计算复杂度问题。

So I would say it's pretty separate from most of cryptography because there's no computational hardness.

Speaker 1

嗯。

Mhmm.

Speaker 1

所以它可能更接近统计学。

So it's maybe closer to like statistics.

Speaker 1

我现在不太熟悉了,但人们会研究如何选择噪声的分布,以针对特定任务优化准确性。

I'm not super familiar anymore, but people study things like what distribution you want to choose the noise from in order to optimize accuracy for your specific task.

Speaker 1

就像图灵说的,他们通过添加噪声来隐藏数据的某些信息。

So like Turin said, they're hiding things about the data by adding noise.

Speaker 1

你可以想到很多选择噪声的方法,所以人们会研究根据你需要精确的内容,从哪种分布中选择噪声最佳。

And you can think of many ways to choose this noise and so people will study what distribution is best to choose the noise from depending on what you want to be accurate.

Speaker 1

例如,如果你在计算中位数,你可能会选择与计算平均值时不同的噪声分布。

For example, if you're computing the median, you might choose noise from a different distribution than if you're computing the mean of the data.
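
The mechanism being described can be made concrete with the classic Laplace mechanism for a differentially private mean. This is a minimal toy sketch rather than anything discussed in the episode; the clamping bounds, the epsilon value, and the function names are illustrative assumptions.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # Inverse-CDF sampling from the Laplace(0, scale) distribution.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_mean(data: list[float], lo: float, hi: float, epsilon: float) -> float:
    # Clamp each record so any single person's contribution is bounded,
    # then add Laplace noise calibrated to the mean's sensitivity.
    clamped = [min(max(x, lo), hi) for x in data]
    true_mean = sum(clamped) / len(clamped)
    # Changing one record moves the clamped mean by at most this much.
    sensitivity = (hi - lo) / len(clamped)
    return true_mean + laplace_noise(sensitivity / epsilon)
```

The noise scale grows with the statistic's sensitivity and shrinks with the privacy parameter epsilon, which is why, as Miranda notes, the best noise distribution depends on which statistic (mean, median, etc.) you need to keep accurate.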

Speaker 0

有意思。

Interesting.

Speaker 0

我现在很好奇水印技术如何介于这种方法和证明类方法之间。

I'm now really curious how the watermarking work lives somewhere between this and the proof stuff.

Speaker 2

我我明白那是我的陈述,所以也许也许米兰达会不同意我的这个观点。

I I get that was my statement, so maybe maybe Miranda will disagree with with me on that.

Speaker 2

不过我想,这倒是个很好的过渡来讨论水印技术。

But I guess, like, good good segue to talk about watermarking.

Speaker 2

你知道,人们最初作为思维实验提出的朴素水印版本是这样的:假设你是大学老师,班上90%的学生都会用某种大语言模型来作业作弊。

So, you know, the naive version of watermarking that people often start with, just as a thought experiment, is: say you're a teacher, you're teaching a class at a university, and, of course, you have to deal with the fact that 90% of your students are going to use an LLM of some form to cheat on their homework.

Speaker 2

有没有办法仅通过文本内容,就能检测出他们是否使用了语言模型生成作业?

Is there a way to detect, just from the text, whether they cheated, in the sense that they used the LLM and asked it to generate the output?

Speaker 2

这个想法就是,你在文本中嵌入了某种可被检测的统计特征。

And so the idea is, like, there's some statistical signature you've somehow embedded that can be detected.

Speaker 2

这大致就是基本设定。

So that's sort of the setup.

Speaker 2

米兰达,或许我们可以从基础开始:什么是水印?真正使其成为独特对象的属性是什么?

So, Miranda, I guess, like, maybe place to start is, like, you know, what is a watermark, and, like, what are the properties of a watermark that really make it, like, a unique object?

Speaker 2

对于熟悉公钥密码学的人,该如何理解水印——它不完全是数学对象,而是更偏向于交互接口的概念?

How should someone who maybe has used public key cryptography think about, you know, what a watermark is as like sort of a not necessarily a mathematical object, but like how they would interface with it.

Speaker 1

我认为可以把水印想象成一种密钥加密方案,其中密文就是语言模型的输出。

I think a good way to think of a watermark is like a secret key encryption scheme, where ciphertexts are LLM responses.

Speaker 1

或者更广义地说,如果你考虑为其他类型的模型添加水印,那么AI生成的内容实际上就是密文。

Or more generally, if you think about watermarking some other kind of model, really the AI generated content is the ciphertext.

Speaker 1

但我们可以先聚焦于大语言模型,这样更便于具体思考。

But we can maybe focus on LLMs for now just to make it easier to think about concretely.

Speaker 1

其核心理念是:如果你知道密钥并拥有某段内容,就能通过某种解密过程发现隐藏的标记模式,从而判断它是否带有水印。

And so the idea is that if you know the secret key and you have a piece of content, you can kind of decrypt to see a secret pattern which tells you whether it's watermarked or not.

Speaker 0

而这个标记会直接体现在实际输出中。

And that would be in the actual output.

Speaker 0

对吧?

Right?

Speaker 0

这不像是在附加文件里,也不像那些老式的水印技术,比如音乐文件里的那种,在MP3文件开头嵌入代码之类的。这是直接嵌入在文本里的。

This isn't like in an attached file or, you know, like thinking about kind of older watermarking techniques and like music or whatever where it was like actually like code at the start of a m p three or something, you know, like this is in the text.

Speaker 0

对吧?

Right?

Speaker 0

因为很多人只是把这些文本复制到其他地方去。

Because a lot of people are just copying this text into another place.

Speaker 0

这不像他们有一个你可以检查的文件。

It's not like they have a file that you could check.

Speaker 1

是的。

Yeah.

Speaker 1

是的。

Yeah.

Speaker 1

这个观点非常到位。

That's a really good point.

Speaker 1

所以对文本而言,水印嵌入到实际文字本身就尤为重要。

So for text, it's especially important that the watermark is embedded in the actual words.

Speaker 1

因为人们会复制粘贴。

Because people are copying and pasting.

Speaker 1

在其他媒介中,或许你可以把信息嵌入元数据里,比如视频的元数据很难剥离的情况。

In other mediums, maybe you can get away with embedding information in the metadata, like if it's really hard to strip the metadata from a video or something.

Speaker 1

我不确定。

I don't know.

Speaker 0

好的。

Okay.

Speaker 1

对于文本来说,确实需要将水印嵌入到文字本身中。

For text, it really needs to be in the words themselves.

Speaker 1

有意思。

Interesting.

Speaker 0

也许还包括一些图片,因为它们也经常被复制。

Maybe also in some of the images because also those are being copied.

Speaker 0

不过我很好奇,那会是什么样子。

But I'm curious, yeah, what that would even look like.

Speaker 0

这很酷。

That's cool.

Speaker 0

是啊。

Yeah.

Speaker 0

继续深入探讨这个。

Continuing in on this.

Speaker 0

什么样的密钥,或者说,你几乎已经描述的是,在文本中藏有一把钥匙。

What kind of key or I mean, you almost have you what you're describing is, like, within the text, there's a key.

Speaker 0

某种秘密信息被添加进去了。

There's some sort of secret that's been added.

Speaker 0

可能是通过,比如,使用特定数量的元音——我这里随便举个例子,比如前10个单词中使用的元音数量之类的。

It's added maybe in, like, the number of vowels used with I mean, I'm just gonna use something random here, but, like, some number of vowels used in the first 10 words or something like that.

Speaker 0

而且这个规律始终不变。

And it's always the same.

Speaker 0

然后他们会重新排列文本来符合这个规律,我猜是这样。

And then they'll rearrange text to look like that, I'm guessing.

Speaker 0

你用什么方法来发现这个规律?

What do you use to find that?

Speaker 0

具体来说,你实际是用什么工具来解密判断是否存在隐藏信息?

Like, what is the thing that you're actually, like, using to decipher whether or not there is something?

Speaker 1

是的。

Yeah.

Speaker 1

通常会有某种大家都同意的规则。

So usually there's some kind of rule that everyone agrees on.

Speaker 1

元音的例子其实是个好例子。

The vowel one is actually a good example.

Speaker 1

对。

Yeah.

Speaker 1

也许你们会同意修改LLM,使其总是在前三个单词中使用相同数量的元音,知道这个规则的人现在就可以去检查回复了。

Maybe you would agree that you're going to modify the LLM to always output the same number of vowels in the first three words or something. And everyone who knows that rule can now go and take a response and check it.

Speaker 1

如果符合,那可能就是加了水印。

And if it holds, then it's probably watermarked.

Speaker 1

还有一个更接近实际使用方案的例子是

And one good example that's closer to actual schemes that are being used—

Speaker 0

我觉得元音的方法并不是实际运作方式。

I feel like the vowel one is not how it works.

Speaker 0

但这是我脑海中想到的,也许类似这样的东西。

But it was just that was the one that came in my mind is like, maybe it's something like that.

Speaker 0

就像,只是让我们思考一下这个问题。

Like, just for us to to think about it.

Speaker 0

就像,实际上并不影响使用的词语。

Like, doesn't actually affect the words being used.

Speaker 0

它实际上并不影响所说的内容。

It doesn't actually affect what's being said.

Speaker 0

这就像是语言层面的某种东西。

It's just like something in the language part of things.

Speaker 1

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

没错。

Exactly.

Speaker 1

你需要的是那种既不会太影响质量,又足够独特以至于不会出现在人类生成的文本中的东西。

You want something that's not going to hurt the quality too much, but that's still like distinctive enough that it wouldn't appear in human generated text.

Speaker 0

那种持续性的。

That consistently.

Speaker 1

是的。

Yes.

Speaker 1

对。

Yeah.

Speaker 0

嗯。

Yeah.

Speaker 0

如果你想分享一些例子,那就太好了。

If you wanna share some examples, that would be amazing.

Speaker 0

比如一些真实或更真实的例子。

Like some real or more real examples.

Speaker 1

好的。

Yeah.

Speaker 1

所以一个非常常见的方案框架是将所有单词分成两个大小相等的随机列表,称为绿色列表和红色列表,然后调整你的LLM使其更倾向于使用绿色列表上的单词。

So one like really common framework for a scheme is to divide all words into two equal sized random lists, call them a green list and a red list, and now modify your LLM to prefer to use words on the green list.

Speaker 1

因此现在LLM生成的文本可能会有70%的绿色词汇。

And so now LLM generated text might have say 70% green words.

Speaker 1

而人类生成的文本会是五五开,因为你随机选择了这些列表并确保它们大小相等。

Whereas human generated text will have fifty fifty because you chose these lists randomly and to have equal sizes.

Speaker 1

人类生成的文本应该正好对半分开,一半红色,一半绿色。

Human generated text should be split right down the middle, half red, half green.

Speaker 1

所以如果你知道这种红绿划分方式,这就像是你可以用来检测水印存在的秘密规则。

And so if you know this partition into red and green, this is like the secret rule that you can use to check for the presence of the watermark.
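
The red-list/green-list idea can be sketched in a few lines. This is a toy model, not Kirchenbauer et al.'s actual scheme (which biases token logits inside the sampler rather than re-picking whole words); the keyed-hash partition, the bias level, and the z-score threshold here are assumptions for illustration.

```python
import hashlib
import math
import random

def is_green(word: str, secret_key: bytes) -> bool:
    # A keyed hash partitions the vocabulary into green/red halves;
    # without the key the split looks random.
    digest = hashlib.sha256(secret_key + word.lower().encode()).digest()
    return digest[0] % 2 == 0

def watermarked_choice(candidates: list[str], secret_key: bytes,
                       bias: float = 0.7) -> str:
    # Stand-in for the LLM's sampler: with probability `bias`, prefer
    # a green-list candidate when one is available.
    greens = [w for w in candidates if is_green(w, secret_key)]
    if greens and random.random() < bias:
        return random.choice(greens)
    return random.choice(candidates)

def detect(text: str, secret_key: bytes, threshold: float = 2.0) -> bool:
    # Under the null hypothesis (human text), the green count is
    # Binomial(n, 1/2), so a z-score against that baseline flags
    # text with significantly more than 50% green words.
    words = text.split()
    n = len(words)
    if n == 0:
        return False
    greens = sum(is_green(w, secret_key) for w in words)
    z = (greens - n / 2) / math.sqrt(n / 4)
    return z > threshold
```

Detection needs only the secret partition, not the model: count green words and check how far the fraction deviates from the 50/50 split a human writer would produce by chance.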

Speaker 0

这让我不禁思考,是否已经存在水印了?

This makes me wonder like, are there already watermarks?

Speaker 0

我想听更多这方面的例子。

I wanna hear more examples here.

Speaker 0

但是否已经存在水印了呢?

But are there already watermarks?

Speaker 0

因为我在想,为什么有些文本如果不改变与你对话的角色,我总觉得通用ChatGPT生成的文本经常会用一些我们现实生活中不太使用的词汇。

Because like, why is it that that some text if you don't change the character that's speaking to you, I feel like generic chat GPT text will often use words that like we don't really use in real life.

Speaker 0

比如'delve'(深入探究)和'thrilled'(兴奋不已)这类词

Like delve and thrilled.

Speaker 0

我们确实会用这些词,但使用频率远没有这里显示的这么高

And we use them, but we don't use them as much as it seems to be used here.

Speaker 0

所以

So

Speaker 1

是的

Yeah.

Speaker 1

我想应该说明一下,水印技术并非检测AI生成内容的唯一方法

So I guess I should mention that watermarking isn't the only approach to detecting AI generated content.

Speaker 1

目前已有一些事后检测器效果相当不错

And right now, there are some post hoc detectors that work pretty well.

Speaker 1

所谓事后检测器就是经过训练的分类器,专门区分ChatGPT生成文本和人类文本

Where post hoc detectors are just classifiers that are trained to distinguish between like say ChatGPT generated texts and human text.

Speaker 1

这些检测器目前还行,但本质上会逐渐失效,因为我们发展更好LLM的目标就是让AI文本越来越接近人类文本

And these work okay for now, but they're sort of inherently going to become worse, because our goal in developing better LLMs is to make the LLM text seem as close to human text as possible.

Speaker 1

因此,即使没有人刻意去欺骗事后检测器,我们训练更好模型的目标本身就是在消除

So without anyone even trying to fool the post hoc detectors, our goal in training better models is to remove

Speaker 1

它们所寻找的特征。

the features that they're looking for.

Speaker 1

百分之百。

A 100%.

Speaker 2

可以说,当前流行的可验证奖励和强化学习模型基本上就是在尝试做到这一点。

And arguably, of the the current rage of verifiable rewards and RL models is basically trying to do that.

Speaker 2

对吧?

Right?

Speaker 2

因为你拿一个预训练模型,然后制作一个检测器,让模型与检测器进行博弈直到它能有效欺骗检测器。

Because you're you take a pretrained model, and then you make a detector, and you have it play a game with the detector until it is able to fool it effectively.

Speaker 2

在某种程度上,可以这样理解——虽然不完全准确,但存在这样一种世界观。

It's like, in some ways, you can view it that way. It's not exactly that, but there's a version of the world where you can view it like that.

Speaker 2

不过实际上,这或许给了我们一个很好的理由稍微绕个弯,那就是:水印技术的历史是怎样的?

So but actually, maybe this gives us a good reason to to take a slight detour, which is what is the history of watermarking?

Speaker 2

我记得很清楚,大概是在ChatGPT刚推出那会儿,斯科特·阿伦森发过一篇博客,说我们需要找到统计标记这类内容的方法。

So, for me, I kind of remember that around the time ChatGPT came out, there was this Scott Aaronson post about, hey, we need to figure out some way of statistically marking these things, and, you know, it was like a blog post.

Speaker 2

但如果你读过任何水印论文,所有人都会引用斯科特·阿伦森那篇博客。

But if you read any watermarking paper, everyone cites this Scott Aaronson blog post.

Speaker 2

所以在我看来,那应该就是最初的起源。

So it seems like that to me was the initial genesis.

Speaker 2

但具体历史是怎样的?

But what is the history?

Speaker 2

我是说,大语言模型水印技术发展至今的历史?

Like, what is the history of the LLM watermarking industry, I guess, at this point?

Speaker 2

我称之为人们正在研究的这一系列技术。

I'd call it, you know, like the the set of stuff that people are working on.

Speaker 2

它是从何时开始的?

Like, where did it start?

Speaker 2

又是如何演变的?

How has it evolved?

Speaker 2

具体来说,它是如何发展到现在的?

Where, you know, and how do we get to where it is now?

Speaker 1

好的。

Okay.

Speaker 1

根据我的记忆,我最初看到的是Scott的博客文章。

From what I remember, I saw Scott's blog post first.

Speaker 1

我觉得这很棒,因为文章通俗易懂,受众广泛。

And I think that was nice because it was very easy to read, and he gets a wide audience.

Speaker 1

所以我认为很多人可能是通过Scott的博客文章了解到LLM水印技术的。

So I think a lot of people have probably discovered LLM watermarking from Scott's blog post.

Speaker 1

但那只是一篇博客文章,所以没有包含全部细节。

But it was just a blog post, so it didn't have the full details.

Speaker 1

我认为Kirchenbauer等人是第一个在论文中全面阐述这个技术的人。

I think Kirchenbauer et al were the first to really flush everything out in a paper.

Speaker 1

所以那篇论文也获得了大量关注。

So that paper gets a lot of attention as well.

Speaker 1

我不确定哪个先出现。

I'm not sure which came first.

Speaker 1

我认为这并不为人所知,因为Scott当时在OpenAI从事水印技术研究。

And I think it's not really known because Scott was working on watermarking at OpenAI.

Speaker 1

所以他无法分享大部分他正在研究的内容。

So he couldn't share a lot of what he was working on.

Speaker 1

另外,我知道谷歌有自己的SynthID水印,如果你回查相关专利,会发现其申请时间与Scott的博客文章和Kirchenbauer等人的工作大致同期。

And also, I know Google has their watermark, SynthID, which, if you look back at the patents, the patents were filed around the same time as Scott's blog post and Kirchenbauer et al.

Speaker 1

但SynthID论文发表的时间要晚得多,我想是在过去一年内才出来。

But the synth ID paper came out much later, like I think within the last year.

Speaker 1

所以这段历史并不十分清晰。

So it's not that clear exactly what the history was.

Speaker 1

但我想,大多数人是从Scott的博客文章或Kirchenbauer等人的论文中了解到水印技术的,这篇论文实际上还获得了ICML的最佳论文奖,我想

But I think most people found out about watermarking from Scott's blog post or the Kirchenbauer et al paper, which actually won a best paper award at, I think, ICML, which also

Speaker 2

23年。

'23.

Speaker 2

对吧?

Right?

Speaker 1

是啊。

Yeah.

Speaker 1

哦,哇。

Oh, wow.

Speaker 1

已经有一段时间了。

It's been a while now.

Speaker 2

嗯。

Yeah.

Speaker 2

是啊。

Yeah.

Speaker 2

不。

No.

Speaker 2

这就是水印技术最有趣的地方——从2022年至今的角度看它是新的,但从大语言模型的发展历程来看它并不新。

That's the interesting thing about this watermarking: it's new in the sense of 2022 to now, but it's not new in the sense of LLMs.

Speaker 2

就像是

It's like

Speaker 1

是啊。

Yeah.

Speaker 1

不知怎么的,它出现得比大语言模型真正变好还要早一些。

Somehow it came a bit before LLMs got really good.

Speaker 1

我记得当我刚开始对水印技术感兴趣时,我试着尽可能多地使用大语言模型,但因为它们表现太差,感觉就像在应付差事。

Like, I remember when I was starting to get interested in watermarking, I tried to actively use the LLMs as much as I could, but it it felt like a chore because they were pretty bad.

Speaker 1

但现在当然它们已经好多了。

But now of course they're much better.

Speaker 1

不过确实,我认为斯科特的博客文章和Kirchenbauer等人的研究在该领域极具影响力。

But yeah, I would say Scott's blog post and Kirchenbauer et al were super influential in the field.

Speaker 1

然后《纽约时报》还发表了一篇关于这两者的文章,写得非常好。

And then also the New York Times ran an article on both of them that was very nice.

Speaker 1

文章里还有不错的交互式可视化内容。

It had nice like interactive visualization.

Speaker 1

所以你真的可以看到这个红绿名单的概念以一种很酷的动画方式呈现。

And so you could actually see this red green list idea kind of like animated in a cool way.

Speaker 1

这对我来说真的很有帮助。

And for me this was really helpful.

Speaker 1

我觉得我在没读过任何论文的情况下就开始讨论水印研究了。

Like I think I started talking about watermarking research without having read any papers.

Speaker 1

但因为看过这篇纽约时报的文章,我对它的工作原理有了很好的心理模型。

But because I'd seen this New York Times article, I had like a good mental model of how it works.

Speaker 1

所以实际上开始思考这个问题很容易。

So it was actually easy to start thinking about it.

Speaker 1

不错。

Nice.

Speaker 1

所以我认为这三件事确实对这个领域很有帮助。

So yeah, those three things I think really helped the field.

Speaker 1

这大概就是LLM水印技术的开端了。

So that was kind of the beginning of LLM watermarking.

Speaker 1

出于某种原因,LLM水印技术比其他AI生成内容更早受到大量关注。

And for some reason, LLM watermarking got a lot of attention before other AI generated content.

Speaker 1

就像现在人们研究图像水印比当时要多得多。

Like now people work on image watermarking a lot more than they did then.

Speaker 1

好的。

Okay.

Speaker 1

I'm not—

Speaker 0

我猜最初只是从文本开始的?

Started just with text, I guess?

Speaker 1

我不认为它最初仅局限于文本。

I don't think it started just with text.

Speaker 1

我认为当时文本确实受到了大量关注。

I think there was a lot of attention on text.

Speaker 1

但人们长期以来一直在对各种内容进行水印处理。

But people have been watermarking all kinds of things for a long time.

Speaker 1

或许我该提到,早在Scott的博客文章和Kirchenbauer等人之前,人们就已经对文本、图像和视频进行水印处理长达数十年了。

So maybe I should mention even before like Scott's blog posts and Kirchenbauer et al, people had been watermarking text and images and video for like decades.

Speaker 0

是啊。

Yeah.

Speaker 0

确实。

True.

Speaker 0

用于数字版权管理(DRM)和身份识别用途。

For DRM stuff and for ID recognition reasons.

Speaker 0

没错。

And yeah.

Speaker 0

很久很久以前,我开发过一款音乐识别软件产品。

Long long ago, I worked on like a music recognition software product.

Speaker 1

哦,厉害。

Oh, wow.

Speaker 0

那款产品完全依赖于元数据和这些水印技术。

And it was all about metadata and these watermarks.

Speaker 0

那简直就像是史前时代的事情了。

That was like ancient history, before all of this.

Speaker 1

是啊。

Yeah.

Speaker 1

确实有很多来自早期研究的影响,只是人们现在不太提起。

So there's a lot of like influence from older work that people don't talk about so much.

Speaker 1

没错。

Yeah.

Speaker 1

不知为何,在Scott那篇博客发布时,AI生成文本更受关注。

For some reason, AI generated text was more popular around when Scott's blog post came out.

Speaker 1

但现在图像也开始获得更多关注了。

But now images have been getting more attention too.

Speaker 1

而且说实话,我觉得图像其实更容易添加水印。

And those are actually, I would say easier to watermark.

Speaker 1

嗯。

Mhmm.

Speaker 1

你能获得更好的鲁棒性保证。

Like you get better robustness guarantees.

Speaker 1

鲁棒性意味着即使有人试图去除水印,也很难成功。

Robustness means that it's hard for someone to remove the watermark if they're really trying.

Speaker 0

在图片中隐藏信息也更容易。

It's easier to hide stuff in images too.

Speaker 0

对吧?

Right?

Speaker 1

是的。

Yeah.

Speaker 1

你只是有更多可操作的空间。

You just have like more to work with.

Speaker 1

没错。

Yeah.

Speaker 1

确实。

Yeah.

Speaker 1

因此最初人们对大语言模型水印技术感到非常兴奋,但几年过去了,现在才有公司开始部署水印。

And so there was this initial excitement about LLM watermarking and since then it's taken a couple years for companies to deploy watermarks.

Speaker 1

比如谷歌已经部署了Synth ID系统,据我所知它可以给文本、图像和视频添加水印。

So Google has deployed Synth ID, which watermarks text, images and video, I believe.

Speaker 1

而亚马逊,我知道他们有一个图像水印系统,并且实际上提供了公开可用的检测器。

And Amazon, I know has an image watermark that actually has a publicly available detector.

Speaker 1

这是我目前唯一知道的公开检测器。

This is the only one that I'm aware of.

Speaker 1

其他所有水印技术,只有嵌入水印的公司才能检测到。

All other watermarks, only the company that embeds the watermark can detect.

Speaker 1

所以作为用户,你根本看不到水印的存在。

And so as a user, like you can't see the watermark at all.

Speaker 1

但亚马逊确实提供了一个公开的图像水印检测器。

But Amazon actually has a publicly available image watermark detector.

Speaker 0

那OpenAI呢?

What about OpenAI?

Speaker 0

它有采取什么措施吗?

Does it have anything?

Speaker 1

哦,这非常有趣。

Oh, that's very interesting.

Speaker 1

是的。

Yeah.

Speaker 1

《华尔街日报》有一篇文章探讨了为什么OpenAI选择不部署水印技术。

So there's a Wall Street Journal article about why OpenAI chose not to deploy a watermark.

Speaker 1

Scott Aronson曾为他们研究过水印技术,但最终他们什么也没部署。

So Scott Aronson worked on watermarking for them, and in the end they didn't deploy anything.

Speaker 1

这其中有几个原因。

And there are a couple reasons for that.

Speaker 1

一是他们进行了用户调查,用户不想要水印,我想这并不意外。

One is they ran a user survey and users didn't want watermarks, which I guess is unsurprising.

Speaker 1

所以他们最终没有部署水印。

So they didn't deploy a watermark.

Speaker 1

此外,我认为这是一项庞大的工程任务,

And then also I think it was a big engineering task,

Speaker 0

或者说确实存在水印,只是极其隐秘。

or there is a watermark, but it's really really secret.

Speaker 1

确实如此。

That's true.

Speaker 1

是啊。

Yeah.

Speaker 1

所以我们甚至可能根本不知道他们是否在使用水印技术。

So we could just not even know if they're watermarking.

Speaker 2

确实。

Quite.

Speaker 2

不太厚道。

Not nice.

Speaker 2

我喜欢ZK播客现在开始讨论阴谋论了。

I love that the ZK podcast now has conspiracy theories on it.

Speaker 0

我很想听听更多关于不同类型水印的例子。

I'd love to hear about more examples of types of watermarks.

Speaker 0

你之前提到的红绿例子,能否再分享一些你知道的、可能已经部署或正在实验中的水印技术?

What you said before with the red green example, could you share a few more that you do know of that maybe have been deployed or have been experimented with?

Speaker 0

不仅是文本,还包括你提到的其他媒介。

So for text, but maybe also for some of the other mediums you mentioned.

Speaker 1

当然。

Sure.

Speaker 1

我对一种图像水印比较熟悉。

I know one image watermark pretty well.

Speaker 1

这是我的合著者Sam Gunn与他在伯克利的同事Xuandong Zhao和Don Song共同研究的。

This is by my co author Sam Gunn and his colleagues at Berkeley, Xuandong Zhao and Don Song.

Speaker 1

这种水印被称为生成式图像模型的不可检测水印。

So this watermark is called an undetectable watermark for generative image models.

Speaker 2

但这是一篇非常新的论文。

But this is a very recent paper.

Speaker 2

对吧?

Right?

Speaker 2

大概几个月前,或者一个月前?

Like a couple months ago, or maybe one month ago.

Speaker 2

对吧?

Right?

Speaker 2

还是说这是更早的?

Or is this older?

Speaker 1

至少半年前。

At least six months ago.

Speaker 1

我记得几个月前在某个会议上发表过,实际可能还要再早半年。

I think it appeared at a conference a few months ago and came out maybe six months before that.

Speaker 2

可能我就是在那看到的。

Maybe that's where I saw it.

Speaker 1

是的。

Yeah.

Speaker 1

我想是在今年的ICLR会议上。

I think it was at ICLR of this year.

Speaker 0

那它是怎么运作的呢?

So how does it work?

Speaker 1

这类图像模型首先会采样一个随机潜变量向量。

It works with these kinds of image models that first sample a random latent vector.

Speaker 1

这些图像模型通常有一个潜空间,代表图像的高级特征。

So these image models typically have a latent space which represents the higher order features of the images.

Speaker 1

你可以把空间中的每个分量想象成代表图像的亮度、愉悦度或自然度等属性。

So you can think of each component in the space as representing something like how bright the image is, how happy it is, or how nature-like it is.

Speaker 1

类似这样的抽象特征。

Something abstract like that.

Speaker 1

这些模型会先采样这些随机特征,然后通过某种变换将这些特征转化为实际图像。

And so these models will first sample these random features and then have some transformation from these features to an actual image.

Speaker 1

它们还有一个逆向过程,可以将图像转换回潜变量,这些潜变量通常接近生成它们时使用的原始潜变量。

And they also have a reverse process so you can take an image and transform it into latents which are typically close to the latents that were used to generate them.

Speaker 1

但这个转换过程并不完美。

But this transformation isn't perfect.

Speaker 1

这里存在一些误差。

There's some error here.

Speaker 1

因此这个水印是通过在潜在空间中嵌入特定模式来实现的。

And so this watermark works by embedding a pattern in the latents.

Speaker 1

通常它们像是围绕零值分布的高斯随机变量。

So usually they're like random gaussians centered at zero.

Speaker 1

使用水印时,你会先选择一个特殊的秘密向量,比如由正负一组成的向量。

With the watermark, you'll first choose a special secret vector of, say, plus and minus ones.

Speaker 1

关于这个向量如何选择,我稍后可能会详细说明。

And maybe I'll talk more about how this is chosen later.

Speaker 1

然后他们会选择与这个向量符号相对应的潜在变量。

But then they'll choose the latents to have signs corresponding to this vector.

Speaker 1

比如说,如果这个向量的第一个分量是1,那么第一个分量就会是正数,以此类推。

So you know, if the first component of this vector is one, then the first component will be positive and so on.

Speaker 1

因此现在,如果你知道这个秘密向量,那么当你获取一张图像时,可以将其转换回潜在空间,并将符号与秘密向量进行比对。

And so now, if you know the secret vector, then if you get an image, you can transform it back to its latents and compare the signs to the secret vector.
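That embed-and-detect loop can be sketched in a few lines. This is only a toy: the latent dimension, noise level, detection threshold, and the stand-in "inversion" step are all made-up numbers, not values from the paper.

```python
import random

rng = random.Random(0)
DIM = 256  # size of the toy latent space (made-up)

# Secret vector of +/-1 signs, known only to the watermarker.
secret_signs = [rng.choice((-1.0, 1.0)) for _ in range(DIM)]

def sample_watermarked_latents():
    """Sample Gaussian latents, then force each sign to match the secret vector."""
    return [abs(rng.gauss(0, 1)) * s for s in secret_signs]

def invert_image_to_latents(z_true):
    """Stand-in for the model's imperfect image -> latents inversion: adds noise."""
    return [z + rng.gauss(0, 0.3) for z in z_true]

def detect(z_recovered, threshold=0.75):
    """Flag as watermarked if enough recovered signs agree with the secret vector."""
    agree = sum((z > 0) == (s > 0) for z, s in zip(z_recovered, secret_signs))
    return agree / DIM > threshold

z = sample_watermarked_latents()
print(detect(invert_image_to_latents(z)))              # True: signs mostly survive the noise
print(detect([rng.gauss(0, 1) for _ in range(DIM)]))   # False: ~50% agreement for unrelated latents
```

Because the inversion is noisy, detection is a statistical sign-agreement test rather than an exact match, which is why a threshold well above one half but below one is used.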

Speaker 0

什么是潜在空间?

What's a latent?

Speaker 0

其实我不太了解这个术语。

I actually don't know that term.

Speaker 0

这是图像处理术语还是向量术语?

Is that an image term or is that a vector term?

Speaker 2

不是。

No.

Speaker 2

这是机器学习术语,指那些不可见的变量,它们类似于生成参数,可能无法直接解释,但能生成最终信息的过程。

That's a machine learning term for variables you don't see, sort of like generating parameters that might not be interpretable, but that drive the process that produces the final information.

Speaker 0

能举个类似的例子吗?

What would an example of something like that be?

Speaker 0

就是你之前描述的那种明亮、欢快的感觉吗?

Is that what you were describing with like the sort of bright, happy, like that?

Speaker 0

还是别的什么?

Or is it something else?

Speaker 1

我也不太清楚。

I don't really know.

Speaker 1

我听人们经常用这个词。

I hear people throw this term around.

Speaker 1

我觉得Tarun说的有道理。

I guess what Tarun said makes sense.

Speaker 1

就像潜在空间,作为用户你本来就不该看到它。

Like with the latent space, it's something that as a user you're not meant to ever see.

Speaker 1

它只是生成图像过程中的一个中间步骤。

It's just like an intermediate step in generating the image.

Speaker 2

这么说吧,我第一次接触这个术语是在隐马尔可夫模型中,当时我从这个概率分布中采样一个状态,但有一小部分变量会改变我的采样过程。

So I think the first time I ever ran into this term was in hidden Markov models, where I have a Markov chain and I'm sampling a state from some probability distribution, but there's some small set of variables that changes my sampling process.

Speaker 2

所以我进行采样过程时,比如生成下一个单词,就像'敏捷的狐狸跳过小屋'这样。

So I have my sampling process, you know, I'm generating the next text word, so it's like the quick fox jumped over the shed.

Speaker 2

你可以想象存在其他变量会改变下一个词的概率分布。

And you could imagine that there's some other variables that change what the distribution of the next word is.

Speaker 2

对吧?

Right?

Speaker 2

所以最古老的文本模型——我是说90年代的,当然还有更早的——但那些在90年代流行起来的模型就在做这类事情,最终形成了可以说是最后一个前神经网络语言模型,也就是主题模型。我提到这个是因为它其实和你刚才描述的东西非常相似。

And so the oldest text models, like from the nineties (I mean, there are older ones than that, but the ones that were popular in the nineties did this type of stuff), culminated in what I'd call the last pre-neural-net language model, which was topic models. And the reason I'm bringing this up is that topic models are actually very similar to the thing you just described.

Speaker 2

主题模型的主要算法就是所谓的潜在狄利克雷分配。

So for topic models, the main algorithm is this thing called latent Dirichlet allocation.

Speaker 2

LDA的作用是尝试学习词元上的分布,即对应不同词簇的单词分布。

And LDA, what it does is it tries to learn distributions over tokens, so distributions over words that correspond to clusters.

Speaker 2

比如可能你词汇中的一个簇是音乐相关的。

So maybe, like, one cluster in your words is like music.

Speaker 2

像萨克斯、长号这类词被选中的概率永远不会低于10%。

So all of the words like saxophone, trombone, whatever, have probabilities of being chosen that never go below 10%.

Speaker 2

而其他所有词的概率则在10的负9次方百分比左右。

But then everything else is, like, 10 to the minus 9%.

Speaker 2

然后这个就被称为音乐主题。

And then that'll be called the music topic.

Speaker 2

而视频主题可能包含摄像机、相机、灯光等高概率词汇。

And then the video topic might be, like, camcorder, camera, light, and those are all high probability words.

Speaker 2

所以早期就有这类预测下一个词分布概率的模型,而潜在部分则负责如何对这些不同分布进行聚类。

So there were already these kinds of models where people predict a distribution over the next word, and the latent part was choosing how you cluster the different distributions.
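Tarun's topic-model description can be sketched as a toy two-topic generator, where the topic itself is the latent variable: you only ever observe the sampled words, while the hidden topic reshapes the word distribution. The topics and probabilities below are invented for illustration.

```python
import random

# Toy "topics": the latent variable reshapes the word distribution (numbers are made up).
TOPICS = {
    "music": {"saxophone": 0.45, "trombone": 0.45, "camera": 0.05, "light": 0.05},
    "video": {"saxophone": 0.05, "trombone": 0.05, "camera": 0.45, "light": 0.45},
}

def sample_words(topic_name, n, rng):
    """You only observe the words; the topic that generated them stays hidden."""
    words, probs = zip(*TOPICS[topic_name].items())
    return rng.choices(words, weights=probs, k=n)

rng = random.Random(0)
print(sample_words("music", 5, rng))  # mostly instrument words, occasionally a video word
```

LDA works in the reverse direction: given only the observed words, it infers topic distributions like these; here we just show the forward sampling process.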

Speaker 2

明白。

Understand.

Speaker 2

我认为神经网络领域的理念是,神经网络能学习到类似这样的潜在表征。

And I think the idea in neural net land is that the neural net learns some representation like that, and that is this latent representation.

Speaker 2

你无法确切知道它是什么,但从均方误差或性能表现来看确实有效。

You don't know exactly what it is, but it seems to work in terms of, like, mean squared error or, like, how well it's performing.

Speaker 2

但'潜在'这个词用于指代未知背景变量的用法,在机器学习领域已有三十年历史。

But the usage of the word latent for a background variable that you don't know has existed in machine learning for thirty years.

Speaker 2

不过,我觉得这是同一个概念。

But, yeah, I think it's the same thing.

Speaker 2

这就像一组特定类型像素的特征,它们代表着某种有用的信息。

It's like some set of features of certain types of pixels that represent like something useful.

Speaker 2

就像是可能类似于

Like like maybe like

Speaker 0

回到你之前说的,米兰达,这就像是图像的前身。

Going back to what you said, Miranda, it's like something that's sort of the predecessor to what the image is.

Speaker 0

你正在回溯到前一步骤,正是在那里检测是否存在任何水印。

You're going backwards to like the step before, and it's that where you're kind of testing it for any sort of watermark.

Speaker 1

没错。

Exactly.

Speaker 0

好的。

Okay.

Speaker 0

是的。

Yeah.

Speaker 0

那些潜在变量。

The latents.

Speaker 1

是的。

Yeah.

Speaker 1

那么为了完整描述这个方案,我说过你们使用这个符号向量来嵌入水印,并且需要用它来检测。

And so maybe to complete the description of the scheme, I said you use this vector of signs to embed the watermark and you need this to detect.

Speaker 1

现在如果你真的这么做并使用固定的符号向量,比如第一个分量决定图像亮度而你总是选择其为正值,你就会严重扭曲生成图像的分布,因为它们永远会是明亮的。

And now if you actually do exactly that and have a fixed vector of signs, then you know, if the first component determined the brightness of the image and you're always choosing that to be positive, you're significantly skewing the distribution of images that you generate because they'll always be bright.

Speaker 0

所以这实际上影响了输出结果。

So it's affecting the output actually.

Speaker 1

没错。

Yeah.

Speaker 1

因此我认为这种做法并不理想。

So this I think is not great.

Speaker 1

有些图像水印确实会选择固定向量,用相同的符号向量来修改所有图像。

And there are some image watermarks that really choose like a fixed vector and, you know, use the same vector of signs to alter all of the images.

Speaker 1

但这篇关于不可检测图像水印论文的核心洞见在于:他们能选择大量看似独立随机的不同向量,从而为每张图像有效选择全新的随机符号向量。

But the insight in this undetectable image watermark paper is they have a way to choose lots of different vectors that look independently random, so they can effectively choose a brand new random vector of signs for each image.

Speaker 1

然而,如果你有一个主密钥,仍然可以检测出水印。

Yet, if you have like a master secret key, you can still detect the watermark.

Speaker 1

因此你仍能辨别出这个看似随机的向量来自特定的水印向量家族。

So you can still tell that this random looking vector comes from the special family of watermarked vectors.

Speaker 1

要实现这一点,你需要一个称为伪随机码的对象,这来自我和Sam Gunn的论文。

And to do this, you need an object called a pseudo random code, which was from my paper with Sam Gunn.

Speaker 1

所以是的,我承认我对这个方案有偏爱,因为它就像是...

So yeah, I guess I'm biased in liking the scheme because it's like

Speaker 2

我们...我们即将讨论PRC,因为在我看来,这里与ZK(零知识)最相似的联系在于简洁证明

We are going to get to PRCs, because to me, that's where the thing that looks most like ZK stuff, like succinct proofs, exists

Speaker 0

嗯。

Yeah.

Speaker 2

最接近ZK概念的就是这个PRC(伪随机码)。

The thing that looks closest to that is this PRC.

Speaker 0

有意思。

Interesting.

Speaker 0

在你创建或讨论的工作中,是否存在公开验证者?

In the work that you created or you talked about, like, are there public verifiers?

Speaker 0

有验证者吗?

Is there a verifier?

Speaker 0

它存在吗?

Does it exist?

Speaker 0

它是如何处理这个问题的?

How does it deal with it?

Speaker 0

它是否在运行那种逆向转换来获取底层内容?

Is it, like, running that inverse transformation in order to get to that underlying stuff?

Speaker 0

还是能够直接穿透并识别出潜在内容是什么?

Or is it able to kinda cut through and just like figure out what the latents were?

Speaker 1

是的。

Yeah.

Speaker 1

好问题。

Good question.

Speaker 1

对于嵌入在潜在空间中的这些图像水印,检测器确实需要运行逆向过程。

So for these image watermarks that are embedded in the latents, the detector does have to run the reverse procedure.

Speaker 0

好的。

Okay.

Speaker 1

这会产生一定的计算成本。

Which it has some computational cost.

Speaker 1

不过没有生成图像那么耗费资源。

It's not as bad as generating an image.

Speaker 1

好的。

Okay.

Speaker 1

你还需要了解生成图像所使用的模型信息。

And you also need to know something about like what model was used to generate the image.

Speaker 1

哦,因为逆向过程是与模型绑定的。

Oh, because the reverse procedure is tied to the model.

Speaker 1

好的。

Okay.

Speaker 1

所以这是这些图像水印的一个缺点。

So that is one downside of these image watermarks.

Speaker 2

我想,我们讨论了很多关于水印的设置,我认为,我们刚才讨论的例子中一个关键点是,我们想要一些特性,也有一些特性是我们不希望水印具备的。

I guess, you know, we've talked a lot about the kind of setup for watermarks, and I think, you know, one key thing to the examples that we just talked about is there's clearly some some features we want and some features we don't want out of a watermark.

Speaker 2

所以这些基本属性,就像ZK中的可靠性或共识中的活跃性。

And so there are these kinds of fundamental properties, almost like soundness in ZK or liveness in consensus.

Speaker 2

水印必须具备一组特性,才能被证明,比如嵌入水印或被验证者验证。

There's a set of properties that a watermark must have in order for it to be embedded and then verified by whoever is verifying.

Speaker 2

那么我们能否讨论一下人们研究且重要的那些属性集合?

So maybe could we talk through the sets of properties that people study and and are important?

Speaker 2

你知道,显然目前所有例子中最常提到的一个概念就是某种偏差,比如,

You know, obviously, the one that's come up in all the examples so far is some notion of bias. Like, hey.

Speaker 2

通过添加水印,我引入了偏差。

By adding in the watermark, I'm biasing.

Speaker 2

我正在改变生成的内容,并且希望以某种方式限制这种改变。

I'm changing the thing generated, and I wanna limit that somehow.

Speaker 2

但如果我们将此与可靠性、完备性等概念类比,水印应具备哪些特性呢?

But, you know, if if we make this comparison to, you know, sort of soundness, completeness, things like that, what are kind of the properties you want in the watermark?

Speaker 1

我刚意识到这些特性与零知识证明或简洁证明系统之间存在非常明确的联系,之前竟没注意到这点,这很有意思。

I just realized a very explicit connection between these properties and ZK or succinct proof systems. I didn't realize this before, so this is good.

Speaker 1

从高层次来看,水印需要具备的特性首先是能在被标记内容中检测到它的存在。

At a high level, the properties you want from a watermark are one that you can detect it in watermarked content.

Speaker 0

嗯。

Mhmm.

Speaker 1

只要掌握检测所需的密钥或其他信息,水印就应该能被实际识别出来。

If you know the secret key or whatever information you need to detect, it should actually show up.

Speaker 1

第二个特性是要求低误报率。

The second property is you want a low false positive rate.

Speaker 1

人类生成的内容不应被错误地标记为含有水印。

So human generated content shouldn't be falsely flagged as watermarked.

Speaker 1

第三则是需要某种质量保证机制。

And then third, you want some kind of quality guarantees.

Speaker 1

所以你不想因为嵌入这个水印而损害内容的质量。

So you don't want to be harming the quality of the content by embedding this watermark.

Speaker 0

通过创建 是的。

By creating Yeah.

Speaker 0

是的。

Yeah.

Speaker 0

这回到你之前说的,你不想让它影响结果。

This is going back to what you were saying with where it's like, you don't want it to impact the outcome.

Speaker 0

你不想让它比如让某些东西变亮,仅仅因为你想在里面加入这个水印就总是变亮。

You don't want it to like make something brighter and always bright just because you wanna have this this watermark in it.

Speaker 1

没错。

Exactly.

Speaker 1

是的。

Yeah.

Speaker 1

所以质量非常重要,特别是如果公司要在像ChatGPT这样的产品上部署水印。

So quality is super important, especially if companies are going to be deploying watermarks on top of like ChatGPT for example.

Speaker 1

他们确实不希望影响模型输出的文本质量。

They really don't want to be harming the quality of the text that it outputs.

Speaker 1

是的。

Yeah.

Speaker 1

我还要补充第四个同样重要的特性,但我认为这是水印的额外优势——鲁棒性。

And I would add a fourth property that's also important, but I consider it to be a bonus property of watermarks, which is robustness.

Speaker 1

也就是说,即使有人试图去除水印,你仍希望它能保留下来。

So you want the watermark to remain even if an adversary is trying to remove it.

Speaker 1

这些就是我们期望的高阶特性。

So these are the high level properties that we want.

Speaker 1

但在我与Sam Gunn和Or Zamir合作的论文《语言模型的不可检测水印》中,

But in papers like my work with Sam Gunn and Or Zamir, it's called an undetectable watermark for language models.

Speaker 1

我们形式化了这些期望特性的概念,这些定义受到了密码学的启发。

We formalized notions of these properties that we want, and these definitions were inspired by cryptography.

Speaker 1

很酷。

Cool.

Speaker 1

我们想要的第一个特性是水印可检测性,我们将其定义为完备性——如果你熟悉简洁证明,这也是

The first property we wanted that the watermark is detectable, we define this as completeness, which if you're familiar with succinct proofs, this is also

Speaker 0

哦,

Oh,

Speaker 1

是的。

Ah, yes.

Speaker 1

这些特性之一。

A property of those.

Speaker 1

是的

Yeah.

Speaker 1

这个属性的含义是:如果你生成的文本具有足够的熵值,即有足够的自由度来嵌入水印,那么水印实际上应该以压倒性的概率出现。

So this is the property that if the text you're generating has enough entropy, meaning like there's enough freedom to embed the watermark, then it should actually appear with overwhelming probability.

Speaker 1

这,你知道的,与证明文献中的概念相呼应。

And this is, you know, draws parallels to the proof literature.

Speaker 1

但我要说这是一个更普遍的密码学特性。

But I would say is a more general cryptographic property.

Speaker 1

比如,加密也同样具备这类特性。

Like, you have this kind of property for encryption as well.

Speaker 1

而且,我之前提到过水印有点像加密,所以这些不同的密码学对象之间有许多相似之处。

And, you know, I said earlier that watermarks are a bit like encryption, so there are lots of parallels between these different cryptographic objects.

Speaker 0

不错。

Nice.

Speaker 1

而误报率则被形式化为可靠性,这也是这些证明的一个属性。

And the false positive rate, formalized as soundness, which is a property of these proofs as well.

Speaker 1

但它指出,与秘密水印密钥无关生成的内容被标记为含水印的概率应当可以忽略不计。

But it says that content generated independently of the secret watermarking key should be flagged as watermarked with only negligible probability.

Speaker 1

最后,我认为我们这篇论文的主要贡献在于形式化了质量保证,这非常困难,因为连文本质量的定义都难以确定。

And finally, we had like I would say our main contribution of this paper was formalizing a quality guarantee, which is very difficult because it's hard to even define what quality of text is.

Speaker 1

但我们解决这个问题的方法是提出:当水印无法被检测时——在我看来这相当于能达到的最高质量。

But the way we manage to get around this is we say that a watermark is undetectable, which in my mind is like the highest quality you can possibly get,

Speaker 1

如果无法区分带水印模型与原始模型,就达到了这种状态。

If it's infeasible to distinguish between the watermarked model and the original model.

Speaker 1

这表明任何计算能力有限的算法都无法区分原始模型与水印模型。

So this says that no computationally bounded algorithm can tell the difference between the original model and the watermarked model.

Speaker 0

只要它不知道其中的窍门。

As long as it doesn't know what the like trick is.

Speaker 0

对吧?

Right?

Speaker 0

因为如果它拥有验证组件,就能看出区别。

Because if it had the verifier component, then it would see the difference.

Speaker 0

但在不知道的情况下,它不应该察觉到差异。

But without knowing that, it shouldn't see a difference.

Speaker 1

没错。

Exactly.

Speaker 1

是的。

Yeah.

Speaker 1

这一点非常重要。

That's super important.

Speaker 1

你需要密钥来区分检测器的能力——检测器理应能识别水印——和区分器的能力

You need the secret key to sort of separate the power of the detector, which, you know, should be able to see the watermark, and the distinguisher, which

Speaker 2

应该能实现RPT,即PPT算法

should RPT, be able to PPT algorithm.

Speaker 2

Yeah.

Speaker 1

是的

Yeah.

Speaker 1

没错

Yeah.

Speaker 1

这可以是任何概率多项式时间算法

This can be any probabilistic polynomial time algorithm.

Speaker 2

实际上有趣的是,不可检测性定义确实很像不可区分性混淆的定义,就像我无法检测出差异那样

So actually, the funny thing about the undetectability definition is that it really resembles the indistinguishability obfuscation definitions, like, I can't detect the difference.

Speaker 2

但至少在我看来,那篇论文中的形式化定义与Sahai类IO论文中的定义有所不同

But at least when I think about the formal definition in that paper versus a bunch of the Sahai-type IO papers.

Speaker 2

从文本上看,这些定义实际上看起来甚至有些相似,是的。

Like, the definitions actually look qualitatively kind of similar if I look at the text. Yeah.

Speaker 2

这对我来说,就像是让我觉得这里有些比大多数LLM内容更正式的东西的原因之一。

Which to me was kind of one of the reasons I thought, oh, there's something more formal here than most LLM stuff.

Speaker 2

但我有个问题,在我看来值得深入探讨,就是你需要某种统计约束才能使水印正常工作,需要一定的最小熵。

But I had one question that's kind of worth diving into in my opinion, which is that there's a kind of statistical constraint you need in order to get your watermark to work: you need a certain minimum entropy.

Speaker 2

要不你来给我们详细解释一下?

And maybe why don't you walk us through that?

Speaker 2

因为这实际上是为了获得这些类似密码学的保证,你仍然需要一些统计特性,是的,详细讨论这个会很好。

Because actually, in order to get these cryptographic-like guarantees, you still need some statistical property, and yeah, it'd be great to talk through that.

Speaker 1

是的。

Yeah.

Speaker 1

正如我提到的,我们的完整性保证是,如果响应有足够的熵,那么水印就会出现。

So as I mentioned, our completeness guarantee is that if the response has enough entropy, then the watermark appears.

Speaker 1

之所以需要这个熵要求,是因为你无法期望给确定性响应加水印。

And the reason why you need this entropy requirement is that you can't hope to watermark deterministic responses.

Speaker 1

举个例子,如果我提示大语言模型输出比如圆周率的前100位数字,如果这个模型足够优秀,这里就不该有任何随机性。

So for example, if I prompt the LLM to output, I don't know, the first 100 digits of pi, there should be no randomness here if the LLM is any good.

Speaker 1

对吧?

Right?

Speaker 1

大语言模型对输出内容根本没有选择权。

Like the LLM has no choice in what it's outputting.

Speaker 1

所以它不应该能在这里嵌入水印。

And so it shouldn't be able to embed the watermark here.

Speaker 0

是的。

Yeah.

Speaker 1

实际上,可靠性已经意味着你不能给这段文字加水印,否则就会产生误报。

In fact, like soundness already implies that you can't watermark this text because then like this would be a false positive.

Speaker 1

所以在完备性和可靠性之间存在某种张力——如果你让大语言模型输出人类生成的文本,那就不该被加上水印。

So there's some like tension between completeness and soundness, which is that, you know, if you ask the LLM to output a human generated text, then this should not be watermarked.

Speaker 1

没错。

Yeah.

Speaker 1

因此我们的解决方案是,要求水印仅在响应具有足够随机性时出现,这样你才能真正对此有所控制。

So the way that we get around this is you ask that the watermark only appears when there's enough randomness in the response that you actually have some control over this.

Speaker 1

有意思。

Interesting.

Speaker 1

所以你可以用熵保证的形式来规范化这一点。

And so you can like formalize this in terms of an entropy guarantee.

Speaker 1

在我提到的与Sam和Or合作的这篇不可检测水印论文中,我们也证明了只有在具备一定熵值的情况下才能实现水印。

And in this paper I mentioned, on undetectable watermarks with Sam and Or, we also prove that you can only hope to watermark if you have a certain amount of entropy.
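The entropy condition can be made concrete with a quick calculation. `shannon_entropy` below is just the standard formula, and the two example next-token distributions are hypothetical: a deterministic step has zero entropy, so the sampler has no freedom to encode anything, while a spread-out step gives it bits to spend.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a next-token distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# A deterministic step ("digits of pi") leaves no freedom to hide anything:
print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0
# A uniform step over four tokens gives the sampler 2 bits of freedom:
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
```

This is why the completeness guarantees in the literature are stated conditionally: the watermark is only promised to appear once the generated response has accumulated enough entropy.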

Speaker 1

所以我们在这方面也有一个下限。

So we have like a lower bound there as well.

Speaker 1

很酷。

Cool.

Speaker 1

如果你查看大语言模型水印相关文献,通常当他们证明可检测性时,都会设定某种熵边界。

And so if you look at like the LLM watermarking literature, usually when they prove detectability, they'll have some kind of entropy bound.

Speaker 1

你只能在足够随机的文本中检测到水印。

You can only detect the watermark in random enough text.

Speaker 0

或者也需要一定数量的文本?

Or like a certain amount of text too?

Speaker 0

就像,是不是某种程度上...或者说可以是短的、长的,只要不是人类写的就行?

Like is it sort of or I guess it could be short, long, as long as it's not human.

Speaker 1

对。

Yeah.

Speaker 1

不同的方案有着略微不同的保证机制。

Different schemes have slightly different guarantees.

Speaker 1

对于某些方案,你需要足够长的文本和整个长度内足够高的熵率。

For some schemes, you need a long enough text and a high enough rate of entropy throughout the whole length.

Speaker 1

我认为有些方案只需要足够的熵值。

I think for some schemes, you just need enough entropy.

Speaker 1

文本长度并不重要。

It doesn't matter how long the text is.

Speaker 1

可以是很短的文本但包含高度集中的熵值,那样也是可行的。

It could be a very short text with a large amount of entropy concentrated in it, and that would be okay.

Speaker 0

在图像的情况下,也有最小限制吗?

In the image case, is there also minimums?

Speaker 0

比如在这种情况下是否有最小尺寸限制?

Like are there size minimums in that case?

Speaker 0

因为你也可以生成像1像素的白色方块这样的东西。

Because you could also generate like a one pixel white box.

Speaker 0

我觉得那里面藏不了多少信息。

Can't hide much in that, I feel.

Speaker 1

我同意。

I I agree.

Speaker 1

我不熟悉他们证明的保证条件

I'm not familiar with the guarantees they prove

Speaker 0

好的。

Okay.

Speaker 1

对于图像水印来说。

For image watermarks.

Speaker 1

我认为总体来说在那里证明事情更难,因为你依赖这个模型从图像返回到潜在空间。

I think in general it's harder to prove things there because you're relying on this model to go from the image back to the latents.

Speaker 1

所以很难在那里证明保证。

So it's hard to prove guarantees there.

Speaker 1

但直观上,至少潜在空间的大小与你能够嵌入的效果之间应该存在某种关系。

But intuitively, there should be some relationship between at least like the size of the latent space and how well you're able to embed.

Speaker 2

所以,现在我想我们已经讨论了水印具有的特性,这些特性某种程度上类似于密码学特性,当然带有更多约束。

So, you know, now I think we've kind of talked about the properties that the watermark has, which kind of resemble cryptographic properties, obviously with more constraints.

Speaker 2

比如,我可以为任何NP程序生成零知识证明。

Like, you know, I can generate a ZK proof for any NP program.

Speaker 2

但在这里,情况更像是'不行'。

But, like, here, it's like, no.

Speaker 2

实际上我只能为某些具有足够熵的内容生成水印,对吧,这有点不同...是的。

I actually can only generate the watermark for some outputs, if they have enough entropy, right, which is a slight difference. Yeah.

Speaker 2

也许这种差异实际上并不小。

Not it's maybe it's not actually slight.

Speaker 2

这是个不小的差异。

It's a nontrivial difference.

Speaker 2

但有个重要观点,我是在读了你的论文——那篇关于不可检测LLM水印的论文后才真正理解的,那就是这些构建的水印在某种意义上不会损害模型。

But an important thing that I think I only understood once I read your paper, the undetectable LLM paper, is that there's a sense in which the watermarks that are constructed don't damage the model.

Speaker 2

它们所做的只是改变模型的采样特性。

All they're doing is changing sampling properties of the model.

Speaker 2

有点像区块链通过随机性选择验证者。

A little bit like a blockchain samples randomness to choose a validator.

Speaker 2

你知道,这个类比跨度确实有点大。

You know, that's a long jump to that.

Speaker 2

但有趣的是,这些模型的运作方式——我把大语言模型抽象为只做两件事:

But the interesting thing is, the way that these models work is that I abstract the LLM to just doing two things.

Speaker 2

预测下一个token的概率分布,然后采样一个token。

Predict a distribution over next tokens, then sample a token.

Speaker 2

我们并没有改变预测下一个标记的功能,这意味着我没有调整模型的权重。

And we don't change the predict-the-next-token thing, which means I don't adjust the weights of the model.

Speaker 2

我没有改变任何东西。

I don't change anything.

Speaker 2

我只是改变了用于采样下一个标记的种子,这有点像区块链的工作原理。

I'm just changing the seed that I'm using to sample the next token, which that is the thing that's like a blockchain.

Speaker 2

这有点像验证随机函数(VRF)的概念。

That's the thing that's like a VRF a little bit.

Speaker 2

所以或许值得详细讨论这一点,因为我认为这正是水印技术显得比ZKML或其他隐私机器学习更强大的原因——后者需要将整个模型置于私有状态。

And so maybe it would be great to talk through this, because I think this is the reason watermarking seems so much more powerful than things like ZKML or other private machine learning, where it's like, I have to put the entire model into

Speaker 2

如果采用差分隐私方案,我就必须改变模型本身。

Private state, or I have to mutate the model if I was doing differential privacy.

Speaker 2

但在这里,你只改变了采样过程,这在计算上是相对低成本的。

But here, you're only changing the sampling, so computationally it's a relatively low-effort thing.

Speaker 2

是的,通过讨论这个采样变异如何产生所需的特性,能帮助我们更好地理解。

And, yeah, like talking through that and understanding like how this kind of sample mutation leads to these properties that you want.

Speaker 1

哦,是的。

Oh, yeah.

Speaker 1

这是个非常好的观点,水印技术极其轻量级,完全不需要触及模型的权重。

That's a really good point that watermarking is super lightweight and that you don't need to touch the weights of the model at all.

Speaker 1

我喜欢把这些水印方案看作只是对LLM采样算法的修改。

I like to think of these watermarking schemes as just modifying the sampling algorithm of the LLM.

Speaker 1

我的意思是,就像Tarun说的,这些LLM有两个组成部分。

And what I mean by that is, like Tarun said, these LLMs have two components.

Speaker 1

一个是神经网络,它以当前生成的响应作为输入。

One is like a neural network that takes as input the response output so far.

Speaker 1

嗯。

Mhmm.

Speaker 1

并生成下一个token的概率分布。

And generates a probability distribution over the next token.

Speaker 1

所以这确实是计算量最大的部分——运行这个神经网络来获取概率分布。

So this is really the computationally expensive part, running this neural network to get this probability distribution.

Speaker 1

而现在是一个非常廉价的步骤,你只需根据这个分布采样下一个词元。

And now there's a very cheap step where you just sample the next token according to this distribution.

Speaker 1

水印技术修改的正是这个采样步骤。

And it's the sampling step that the watermarks modify.

Speaker 1

所以我之前描述的红绿方案,就是通过提高绿色词元的概率并降低红色词元的概率来修改采样步骤。

So the red green scheme that I described earlier modifies the sampling step just by increasing the probabilities of the green tokens and decreasing the probabilities of the red ones.
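A toy sketch of that red-green sampling tweak, in the spirit of the Kirchenbauer et al. scheme discussed earlier: the vocabulary, keyed-hash partition, logit boost, and z-score threshold below are all illustrative assumptions, not the deployed scheme. Note that only the sampling step is touched; the "model" is just whatever produces the logits.

```python
import hashlib
import math
import random

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary (assumption)
SECRET_KEY = b"watermark-key"             # key shared by embedder and detector
DELTA = 4.0                               # logit boost given to green tokens

def is_green(prev_token: str, token: str) -> bool:
    """Keyed hash of (previous token, candidate token) splits the vocab roughly 50/50."""
    h = hashlib.sha256(SECRET_KEY + prev_token.encode() + token.encode()).digest()
    return h[0] % 2 == 0

def watermarked_sample(prev_token, logits, rng):
    """Boost green-token logits, then sample from the reweighted softmax."""
    boosted = [l + (DELTA if is_green(prev_token, t) else 0.0)
               for t, l in zip(VOCAB, logits)]
    m = max(boosted)
    weights = [math.exp(l - m) for l in boosted]
    return rng.choices(VOCAB, weights=weights, k=1)[0]

def detect(tokens, threshold=4.0):
    """z-score on the green-token count; it is ~N(n/2, n/4) for unwatermarked text."""
    greens = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    z = (greens - n / 2) / math.sqrt(n / 4)
    return z > threshold

rng = random.Random(0)
prev, out = "tok0", ["tok0"]
for _ in range(200):           # pretend the model emits uniform logits at every step
    prev = watermarked_sample(prev, [0.0] * len(VOCAB), rng)
    out.append(prev)
print(detect(out))                                      # True: far more green tokens than chance
print(detect([rng.choice(VOCAB) for _ in range(200)]))  # False: green count near n/2
```

The uniform-logits loop also illustrates the entropy point from earlier: when the model's distribution has lots of freedom, the boost barely changes perceived quality per step, yet the green-token statistic accumulates quickly for the detector.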

Speaker 1

因此它完全不会触及模型的权重。

So it's not touching the weights of the models at all.

Speaker 1

它只是改变了采样算法。

It's just changing the sampling algorithm.

Speaker 1

有意思。

Interesting.

Speaker 1

所以红绿方案可以说是你能想到的最简单的方案。

And so the red green scheme is kind of the simplest scheme you can think of.

Speaker 1

但即便是更复杂的方案,也只是改变采样过程。

But even the more complicated schemes only change the sampling process.

Speaker 0

直到你这么说我才意识到,我们之前讨论过的CKML模型要求底层模型必须被修改。

I didn't realize until you said this that the CKML models that we've talked to though require that the model be changed underneath.

Speaker 2

嗯,不是

Well, not

Speaker 0

我也曾以为那些就像是

that I also thought of those as like

Speaker 2

不是那样,但你必须在这个框架内评估整个事情。

It's not that, but you have to evaluate the whole thing contained in this.

Speaker 2

对吧?

Right?

Speaker 2

这种方法的侵入性非常小。

This thing is very minimally invasive.

Speaker 2

我基本上是在原样运行模型。

I'm, like, running the model as is.

Speaker 2

我只是在改变采样方式。

I'm only changing the sampling.

Speaker 2

嗯。

Mhmm.

Speaker 2

我不需要把你放进我的安全区。

I don't have to, like, put you in my enclave.

Speaker 2

我不需要用ZK编译器重新编译你的代码。

I don't have to recompile your code with the ZK compiler.

Speaker 2

对吧?

Right?

Speaker 2

实际上我完全没有动它。

Like, I'm not touching it at all effectively.

Speaker 2

好的。

Okay.

Speaker 2

除了采样部分。

Except for sampling.

Speaker 0

有意思。

Interesting.

Speaker 1

这也意味着仅通过API访问就能实现水印功能。

This also means that you can watermark with only API access.

Speaker 1

哦。

Oh.

Speaker 1

因此,只要你能获取这些概率分布,就可以自行进行采样或修改采样方式,从而为来自ChatGPT等API的文本添加水印。

So as long as you have access to these probability distributions, you can do sampling yourself or change sampling yourself to watermark text that's coming from say an API to ChatGPT.

Speaker 1

实际上我就这么操作过,为我们的方案获取了一些带水印的示例文本。

So I actually did this to get some like sample watermark text for our scheme.

Speaker 1

我只拥有ChatGPT的API访问权限。

I only had API access to ChatGPT.

Speaker 1

对吧?

Right?

Speaker 1

就像我自己无法运行整个神经网络一样。

Like I couldn't run the whole neural network myself.

Speaker 1

但仅凭这个API访问权限,就足以改变采样器和ChatGPT生成文本的水印。

But even just with this API access, it was enough to change the sampler and watermark ChatGPT generated text.
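As an illustration of watermarking through an API alone, the sketch below drives generation from a function that reports log-probabilities for candidate next tokens, the way a completions API with a top-logprobs style option might. The `fake_api_top_logprobs` stub, its candidate list, and all parameters are hypothetical stand-ins, not any real provider's interface.

```python
import math
import random

def fake_api_top_logprobs(prompt_tokens: list[str]) -> dict[str, float]:
    """Hypothetical stand-in for a completions API that can report
    log-probabilities for candidate next tokens. A real client call
    would go here instead of this toy distribution."""
    candidates = ["the", "a", "cat", "dog", "."]
    rng = random.Random(" ".join(prompt_tokens))  # deterministic toy model
    return {c: rng.uniform(-3.0, 0.0) for c in candidates}

def generate_watermarked(prompt: list[str], green: set[str], delta: float,
                         steps: int, seed: int = 0) -> list[str]:
    """Client-side watermarked generation: re-weight the API-reported
    probabilities toward green tokens, then sample locally. The model
    itself is never touched."""
    rng = random.Random(seed)
    out = list(prompt)
    for _ in range(steps):
        logps = fake_api_top_logprobs(out)
        tokens = list(logps)
        weights = [math.exp(logps[t] + (delta if t in green else 0.0))
                   for t in tokens]
        out.append(rng.choices(tokens, weights=weights, k=1)[0])
    return out
```

The point is that all the watermarking logic lives in the cheap client-side sampling loop; the expensive neural network stays behind the API.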

Speaker 0

有意思。

Interesting.

Speaker 0

这让我想进一步了解一下底层模型。

This leads me to like ask a little bit about the underlying models.

Speaker 0

是否存在某些模型组可以使用非常相似的水印方案?

And if like, are you are there sort of groups of models where you'd have a very similar watermarking scheme for it?

Speaker 0

并且你实际上可以像刚才那样,将其应用到所有模型上。

And you can actually do what you just did, which is like kind of apply it to all of them.

Speaker 0

还是必须为每个模型单独定制水印系统?

Or do you have to create bespoke watermarking systems for every single one?

Speaker 0

或许存在某些模型组可以批量处理。

And let's like and maybe there's groups and you can do a few.

Speaker 0

我很好奇这方面具体是如何划分的。

I'm just kind of curious how that breaks down.

Speaker 1

是的。

Yeah.

Speaker 1

这类仅改变采样器的大语言模型水印技术的一个优点在于,你可以跨任意数量的模型使用相同的水印密钥。

So one nice thing about these LLM watermarks that just change a sampler is that you can use the same secret watermarking key across any number of models.

Speaker 0

哦。

Oh.

Speaker 1

对。

Right.

Speaker 1

以红绿方案为例,你可以将相同的红绿分区规则应用于任意数量的模型。

So with the red green scheme for example, you can use the same red green partition for as many models as you want.

Speaker 1

由于你只是提高了绿色词汇的出现概率,因此可以统一应用该方案,并以相同方式检测——只需统计绿色词汇的数量即可。

And since you're just increasing the probability of the green words, you can do this for all of them and detect in the same way, just count the number of green words.
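Detection as just described can be sketched in a few lines. The binomial null model and z-score test follow the red-green literature, but the threshold and helper names here are illustrative choices:

```python
import math

def detect_watermark(tokens: list[int], is_green, threshold: float = 4.0) -> bool:
    """Count green tokens and test against the ~50% that non-watermarked
    text would hit by chance. Under the null hypothesis the green count
    is Binomial(n, 0.5), so a simple z-score suffices."""
    n = len(tokens)
    if n == 0:
        return False
    greens = sum(1 for t in tokens if is_green(t))
    z = (greens - 0.5 * n) / math.sqrt(0.25 * n)
    return z > threshold
```

Because the test only needs the secret partition and not the model, the same detector works unchanged across any number of models sharing the key.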

Speaker 1

这正是基于采样器方案的精妙之处:无论模型数量多少都可使用相同密钥,这对于频繁更新模型或部署多模型的实际场景非常实用。

So this is a very nice property of the sampler based schemes where you can use the same secret key for any number of models, which should be nice in practice if they're like updating these models often or have many different models deployed.

Speaker 0

不过我猜这显然取决于媒介类型。

I guess though there's a distinct like it depends on the medium.

Speaker 0

回到文本与图像的对比,显然无法在这两类媒介间使用相同技术。

So going going back to like text versus images, obviously, you're not really able to use the same technique across those.

Speaker 0

但所有图像生成模型是否都能用你之前提到的不可检测水印工作中描述的方式进行测试呢?

But would like all image generation models be able to be tested the same way that that was described in the undetectable watermark work that you described earlier?

Speaker 1

关于文本水印和图像水印的区别,这是个很好的观点。

So that's a good point about the difference between these text and image watermarks.

Speaker 1

对于图像水印,即使你在不同模型中以相同方式改变潜在空间,由于还需要通过反演过程从图像回到潜在空间,这个过程可能因模型而异。

For the image watermarks, even if you're changing the latents in the same way across different models, since you also have this inversion process where you go from the image to the latents, that might be different from model to model.

Speaker 1

基本上无法使用同一个检测器适用于所有模型。

There's sort of nothing you can do to use the same detector for all of them.

Speaker 1

你必须使用与生成图像的模型相对应的正确反演器。

Like you'll have to use the correct inverter for the model that generated the image.

Speaker 1

因此,即便你使用相同的潜在空间偏置方法,也必须使用特定于模型的反演器。

And so even if you're using the same way of biasing the latents, you're going to have to use the model specific inverter.

Speaker 1

或许有方法可以设计适用于所有模型的全局反演器。

Now there might be ways to like design global inverters that work for all of them.

Speaker 1

我对这个领域不太熟悉,但相比文本水印,实现跨模型的图像水印似乎要困难得多。

I'm not so familiar with this area, but it seems much harder to have like a an across model image watermark compared to a text watermark.

Speaker 2

是的。

Yeah.

Speaker 2

我是说,我想我之所以最终了解这方面的文献,部分原因是我在观察密码学领域的人们如何尝试解决问题,从计算角度看,那些方法似乎都很疯狂。

I mean, I think one of the reasons I was I kinda ended up learning about this literature was I basically was looking at how people in cryptography land were trying to do things, and, like, all of them just seemed crazy from a computational standpoint.

Speaker 2

然后我看到了水印技术。

And then I saw the watermark stuff.

Speaker 2

我当时就想,这几乎不需要计算量,却能获得人们期望收益的90%。

I'm like, this is like no computation, and you're getting, like, 90% of the benefits that people wanted from, like

Speaker 2

比如对整个模型进行ZK验证之类的。

ZKing the whole model or whatever.

Speaker 2

我再次提到ZK时,指的不是零知识。

And so when I say ZK again, I don't mean zero knowledge.

Speaker 2

我的意思是,关键在于简洁性。

I mean, it's succinctness there.

Speaker 0

我们知道。

We know.

Speaker 0

我们知道。

We know.

Speaker 2

但我觉得需要阅读大量论文才能理解的一个有趣点是关于威胁模型的概念,以及我们如何看待对手攻击、伪造水印或试图让某些本无水印的内容被检测为阳性。

But an interesting thing that I think I needed to read a bunch of these papers to understand was sort of the threat model and, like, how we think about adversaries attacking and, like, forging watermarks or trying to to make something that doesn't watermark detect as positive.

Speaker 2

哦。

Oh.

Speaker 2

我认为讨论一下鲁棒性特性会很有意义,比如你们是如何构建攻击威胁模型的。

And I think it would be great to talk a little bit about the robustness properties and, like, how you formulate the threat model for attacks.

Speaker 2

或许现在我们可以聊聊表情符号攻击。

And, like, maybe now is we can talk about the emoji attack.

Speaker 2

为了描述威胁模型,讨论一些具体算法可能不错,比如你们与Zamir和Gunn合作的论文中提到的基于哈希的水印技术。

And and maybe in order to describe the threat model, it would be good to talk about concrete algorithms such as hash based watermarks like your paper with Zamir and Gunn.

Speaker 1

是的。

Yeah.

Speaker 1

那么,好吧。

So, okay.

Speaker 1

我经常提到这个红绿方案。

I mentioned this red green scheme a lot.

Speaker 1

这样做的一个问题是,如果某个话题被列入绿名单,它可能会被过度讨论。

And one problem with this is that if a certain topic is on the green list, it'll be talked about a lot more than it should be maybe.

Speaker 1

因此理想情况下,你不希望所有回答都使用相同的红绿名单,因为这会在系统层面改变模型讨论的话题。

And so ideally, you don't want to use the same red green list for all responses because this will systematically change what topics the model is talking about.

Speaker 1

所以在Scott的方案、Kirchenbauer等人的方案以及我和Sam、Orr的方案中,都采用了使用哈希函数(或我们用的是伪随机函数)来生成随机性,从而为每个回答获取新的红绿名单的想法。

And so one idea used in both Scott's scheme and the Kirchenbauer et al scheme, as well as mine with Sam and Orr, is to use a hash function, or in our case like a pseudo random function to derive randomness to effectively get a new red green list per response.

Speaker 1

具体做法是:取模型生成的最后几个词,对它们进行某种哈希计算,以确定模型输出的下一个词或标记应该使用哪些红名单和绿名单。

So the idea is to take the last few words generated by the model and evaluate some kind of hash on these to determine what should be in the red list and the green list for the next word or token that the model outputs.

Speaker 1

这可能在原理上类似于Fiat-Shamir(如果你熟悉密码系统的话),即对当前对话记录进行哈希计算,为协议后续步骤提供随机性。

So maybe this is similar in spirit to like Fiat Shamir, if you're familiar with cryptographic systems, where, you know, you're taking a hash function on the transcript so far to drive randomness for what you're doing next in the protocol.

Speaker 0

基本上就是在创建这个水印的过程中。

In the creation of this watermark, basically.

Speaker 1

是的。

Yeah.

Speaker 1

举个具体例子,如果模型输出了'作为一个大型语言模型',你需要决定接下来输出哪个词。

Maybe as a concrete example, if the model has output like as a large language model, you wanna know which word to output next.

Speaker 1

这类基于哈希的方案会计算'作为一个大型语言模型'的哈希值,用生成的随机性来选择响应中下一个词的红绿分区。

This family of hash based schemes would compute the hash of as a large language model, derive some randomness which it uses to choose the red green partition for the next word in the response.

Speaker 1

而要进行检测,你需要能重新计算这些哈希值来推导随机性,从而判断当时的红绿分割情况。

And now to detect, you need to be able to recompute the hashes to derive this randomness to tell what the red green slits were.
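A toy version of this hashing step, with SHA-256 again standing in for the PRF and all names hypothetical: each candidate token's color is derived from the secret key plus the last k tokens of context, so every position gets a fresh partition that the detector can recompute.

```python
import hashlib

def context_green(prev_tokens: list[str], candidate: str, key: bytes,
                  k: int = 2) -> bool:
    """Derive the candidate's red/green color from a keyed hash of the
    last k context tokens. The detector recomputes the same hash, so it
    recovers the same partition, as long as the context is unchanged."""
    context = "\x1f".join(prev_tokens[-k:] + [candidate])
    return hashlib.sha256(key + context.encode()).digest()[0] % 2 == 0
```

The fragility is visible directly: edit a single word inside the hashed window and the recomputed partition is effectively a fresh random one.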

Speaker 0

好的。

Okay.

Speaker 0

这使得验证过程稍微复杂了些。

So it makes the verification process a little bit more complex.

Speaker 0

是啊。

Yeah.

Speaker 0

至少多了一个步骤。

You have an extra step at least.

Speaker 1

没错。

Yeah.

Speaker 1

这个额外的步骤在计算上仍然很廉价,但它真正影响的是鲁棒性。

There's this extra step which is still computationally cheap, But where it really hurts you is in robustness.

Speaker 1

所以现在如果哈希部分的任何单词发生变化,你就无法恢复之前的相同随机性。

So now if any word in this hashed portion changes, then you can't recover the same randomness as before.

Speaker 0

哦,是的。

Oh, yeah.

Speaker 1

然后你会得到一个完全不同的红绿分区。

And you'll get a completely different red green partition.

Speaker 0

哎呀。

Yikes.

Speaker 0

所以你不能让它改变。

So you can't it can't change.

Speaker 1

是的。

Yeah.

Speaker 1

基本上,我的意思是,这取决于程度

Basically, I mean, it depends how much

Speaker 2

you

Speaker 1

哈希。

hash.

Speaker 1

但如果你对整段文字进行哈希处理,那么这段内容就完全不能更改。

But if you hash, say, a paragraph, then this whole paragraph can't change at all.

Speaker 1

否则,你将无法检测到水印。

Otherwise, you wouldn't be able to detect the watermark.

Speaker 1

现在Scott的方案和Kirchenbauer等人的方案是对前两个词进行哈希,所以它们在鲁棒性上不会受到太大影响。

And now Scott's scheme and the Kirchenbauer et al scheme hash something like the past two words, so they don't take a huge hit to robustness.

Speaker 1

但我的方案是对更长的部分进行哈希。

But my scheme, we hash a much longer portion.

Speaker 1

这是为了获得这种额外的不可检测性和高质量特性。

This is to get this extra undetectability high quality property.

Speaker 1

有意思。

Interesting.

Speaker 1

所以,这些基于哈希的方案非常突出。

So, yeah, these hash based schemes are very prominent.

Speaker 1

它们催生了所谓的表情符号攻击。

And they motivate what's called the emoji attack.

Speaker 1

在表情符号攻击中,你要求模型输出响应时,在每个单词之间插入一个表情符号。

And in the emoji attack, you ask the model to output your response, but insert an emoji between every pair of words.

Speaker 1

这样你就会得到类似'单词 表情符号 单词 表情符号'的输出。

And so now you'll get something that's like word emoji, word emoji.

Speaker 1

对吧?

Right?

Speaker 1

你可以轻松地通过删除所有表情符号来恢复原始文本。

And you can easily get back an actual text by just deleting all of the emojis.

Speaker 0

嗯。

Mhmm.

Speaker 1

这样做的作用是,如果水印是通过哈希来嵌入的,哦

What this has done is now if the watermark is embedded by hashing Oh,

Speaker 0

所以需要完全调整所有内容,就因为里面插入了这些表情符号?

totally adjust everything because it has all these emojis in it?

Speaker 1

对。

Yeah.

Speaker 1

没错。

Exactly.

Speaker 1

就像如果哈希计算中使用了表情符号,一旦删除它们,哈希值就完全不同了。

Like if the emojis are used in the hashes, now once you delete them, the hashes are entirely different.

Speaker 1

哦,哇。

Oh, wow.

Speaker 1

而且这种攻击方式完全不会影响质量,因为你最终得到的文本和没有插入表情符号时一模一样。

And also you're not hurting quality by doing this attack at all because you get back exactly the same text you would have had without emojis.
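A small simulation of why the attack works, using a toy hash rule of the same shape as above (the two-token window, the separator, and "E" standing in for the emoji are all assumptions): the generator enforces colors computed on the emoji-laden context, while the detector sees the stripped text and recomputes colors from different contexts, so the signal washes out.

```python
import hashlib

def color(context: list[str], candidate: str, key: bytes) -> bool:
    """Toy hash-based red/green rule keyed on the previous two tokens."""
    data = key + "\x1f".join(context[-2:] + [candidate]).encode()
    return hashlib.sha256(data).digest()[0] % 2 == 0

def per_token_colors(tokens: list[str], key: bytes) -> list[bool]:
    """Color of each token given the tokens that precede it."""
    return [color(tokens[:i], tokens[i], key) for i in range(len(tokens))]
```

During generation every word's context contained an emoji; after the user deletes them, the detector hashes emoji-free contexts and derives different lists, so the green-count test no longer sees the bias the generator imposed.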

Speaker 0

等等。

Wait.

Speaker 0

我不确定自己是否完全理解了这个原理。

I don't know if I fully understood this.

Speaker 0

就是说,这些表情符号是作为水印的一部分被添加进去,然后你们又喜欢添加和移除的吗?

Like, are the emojis being added as part of this watermarking that you then like add and remove?

Speaker 0

还是说如果对手试图

Or is it if an adversary tried to,

Speaker 2

不是这样的。

like No.

Speaker 2

不是。

No.

Speaker 2

假设假设你试图在作业上作弊。

Suppose suppose you tried to cheat on your homework.

Speaker 0

好吧。

Okay.

Speaker 2

然后你说,嘿。

And you said, hey.

Speaker 2

实际上,把你刚才给我的回复里所有空格都替换成表情符号。

Actually, take the response you just gave me and replace all the spaces with emojis.

Speaker 0

比如,在模型内部。

Like, within the model.

Speaker 0

所以他们是在模型内部操作

So they're doing Within the

Speaker 1

模型。

model.

Speaker 1

好的。

Okay.

Speaker 2

但现在它被用表情符号做了水印标记。

So now though now it's watermarked with the emojis.

Speaker 2

但当你提交作业时,你手动删掉所有表情符号换成空格,这样你就破坏了哈希验证。

But then you, when you turn in your homework, you just, like, go and manually delete all the emojis and put spaces, and now you've ruined the hashing.

Speaker 2

对吧?

Right?

Speaker 2

是的。

Yeah.

Speaker 2

因为这就像是一个非常简单的攻击方式,某种程度上就像幼儿园小孩能想到的第一招。

Because so it's like it's it's like a very simple it's like a the first attack a kindergartner would come up with in some ways.

Speaker 2

对吧?

Right?

Speaker 2

就像是,对。

Of like Yeah.

Speaker 1

这超级简单,却能完全击败这些基于哈希的方案。

It's like super simple and completely defeats these hash based schemes.

Speaker 1

即便是那些只哈希最近两个标记的方案也不行,因为你到处都加了表情符号。

Even the ones that only hash, say, the past two tokens, because you have emojis everywhere.

Speaker 0

你能给这些基于哈希的方案增加一个特性吗?如果它们检测到这种明显的低级注入模式,就不哈希那部分内容?

Could you add another quality to these hash based ones that if they see sort of that kind of pattern of like just dumb obvious injections that it doesn't hash that part?

Speaker 1

可以。

Yeah.

Speaker 1

所以我觉得你可以玩这种猫捉老鼠的游戏,对吧?

So you can, I think, play this kind of cat and mouse game Yeah?

Speaker 1

当你发现特定攻击时,就可以在水印中添加一些步骤来忽略触发攻击的内容。

Where if you see a specific attack, then you can, you know, add some steps to the watermark that ignores things that are activating the attack.

Speaker 1

但我觉得即使你修复了一大类表情符号攻击,人们还是会想出新的攻击方式。

But I think if you fix even a broad class of emoji like attacks, people will come up with new attacks.

Speaker 1

所以这种博弈非常难进行。

So it's very hard to play this game.

Speaker 1

事实上,极限情况下,哈佛大学的张等人团队有篇论文《沙中水印》证明,要构建一个能抵御任意多项式时间对手的水印是不可能的。

In fact, in the limit, there is a paper by Zhang et al., a group at Harvard, called Watermarks in the Sand, which shows that it's impossible to construct a watermark that's robust to an arbitrary polynomial time adversary.

Speaker 1

他们证明如果攻击者有足够的决心,就能移除任何水印。

So they show that if the adversary is dedicated enough, then it can remove any watermark.

Speaker 1

这是在相当强的假设条件下得出的结论。

This is under some fairly strong assumptions.

Speaker 1

我认为这更像是一篇理论性论文。

I would say it's more of a theory paper.

Speaker 1

可以说,它并没有给出我们当前需要担心的实际攻击手段。

It doesn't give practical attacks we need to worry about now, I would say.

Speaker 1

但它确实表明你无法拥有一个强有力的鲁棒性定理。

But what it does show is you can't have a strong robustness theorem.

Speaker 1

就像,不能说,这里有一个方案,并证明它对试图移除水印的多项式时间对手具有鲁棒性。

Like, can't say, here's a scheme and prove that it's robust to polynomial time adversaries that are trying to remove the watermark.

Speaker 2

是的。

Yeah.

Speaker 2

我是说,你把它类比为Fiat-Shamir很有趣,因为我一直认为Fiat-Shamir就是:嘿,随机预言模型,它是安全的。

I mean, it is interesting you made this analogy to Fiat-Shamir because I think about Fiat-Shamir as like, hey, random oracle model, it's safe.

Speaker 2

但是,哦,就像所有最近的论文都在说,哦,在某些情况下它有点崩溃,虽然不是完全像鲁棒性问题,但在轻微扰动下。

But, oh, like, all of the recent papers that are like, oh, it's like kind of borked under some sort of like, something not quite like the robustness thing, but, like, under slight perturbations.

Speaker 2

好吧。

Okay.

Speaker 2

不存在相关性难处理性或其他什么。

There's not correlation intractability or whatever.

Speaker 2

就像,Fiat-Shamir就被抛出窗外了。

Like, Fiat-Shamir goes out the window.

Speaker 2

我们之前讨论过一些关于方案稳健性的问题。

You know, we talked a little bit about robustness of schemes.

Speaker 2

我个人觉得最让我这个技术宅着迷的是它与纠错码的联系,这种想法让我觉得:如果我把秘密作为选择的随机性路径,就能为水印提供某种保证。

And one thing that really, like, nerd sniped me personally a lot about this was the connection to error correcting codes, and sort of this idea that I can have these error correcting codes that have randomness in them, and if I, you know, make my secret, like, the set of randomness I choose, then I can get these kinds of guarantees for watermarks.

Speaker 2

而且在某种程度上,我可以处理一定程度的增删,比如有人删除了某个字符。

And, also, there's sort of some sense in which I can handle additions and deletions up to some amount, like, you know, someone deleting one of the characters.

Speaker 2

对吧?

Right?

Speaker 2

这其实就是一种鲁棒性。

Like, which is a form of robustness.

Speaker 2

当然,在你之后还有其他研究,但它们似乎都基于你提出的这种抽象框架——关于水印与编码理论、纠错码以及这种自然嵌入关系的思考方式。

And there have been other works, obviously, after yours, but they all seem to build build off this kind of abstraction you came up with for thinking about watermarks and their relation to coding theory and error correcting codes and kind of this natural embedding.

Speaker 2

所以如果能探讨水印与纠错码的关系会很有意义,比如一旦进入纠错码领域,你就能讨论这个水印能抵抗文本中多少删除或添加操作。

So it would be great to kind of talk about the relationship between watermarks and error correcting codes and sort of, like, the the way that once you go to error correcting code land, you can suddenly talk about, okay, like, you can you know, this watermark is resistant up to some number of deletions in the text or some number of additions.

Speaker 2

在我看来,这似乎是人们追求鲁棒性的主流方向——除非我遗漏了其他重要方面。

And, you know, to me that seems to be like the way people are kind of moving towards robustness, unless I'm missing something else.

Speaker 2

所以

So Does

Speaker 0

那实际上也能解决表情符号攻击的问题吗?

that actually solve for the emoji attack too?

Speaker 0

如果你用的是纠错码而不是简单的基于哈希的方法?

If you're using error correcting codes instead of the simple hash based one?

Speaker 1

是的。

Yeah.

Speaker 1

太酷了。

How cool.

Speaker 1

没错。

Yeah.

Speaker 1

纠错码非常强大。

The error correcting codes are super powerful.

Speaker 1

这是我在这个领域最喜欢的工作,就是引入了所谓的伪随机纠错码。

And this is like by the favorite work that I've done in this area was the introduction of something called pseudo random error correcting codes.

Speaker 1

这是与Sam Gunn合著的论文。

This was a paper with Sam Gunn.

Speaker 1

然后我想我们还有一篇更新的后续论文,《理想伪随机码》。

And then I guess we have a newer follow-up paper, Ideal Pseudorandom Codes.

Speaker 1

那篇可能更偏理论。

That's maybe more theoretical.

Speaker 2

那是STOC会议论文。

That's the STOC paper.

Speaker 2

对吧?

Right?

Speaker 1

是的。

Yeah.

Speaker 1

那篇论文刚在今年STOC会议上发表。

That just appeared at STOC this year.

Speaker 1

那也是与Omar Alrabiah、Prabhanjan Ananth和Yevgeniy Dodis合作的成果。

That was also joint with Omar Alrabiah, Prabhanjan Ananth, and Yevgeniy Dodis.

Speaker 1

所以,是的。

So, yeah.

Speaker 1

这些伪随机码是什么?

What are these pseudorandom codes?

Speaker 1

我想它们就是字面意思。

I guess they're exactly what they sound like.

Speaker 1

它们就像是看起来随机的纠错码。

They're like error correcting codes that appear random.

Speaker 1

通常纠错码通过引入大量结构来工作。

So usually error correcting codes work by introducing a lot of structure.

Speaker 1

比如一个好的纠错码就是简单重复信息多次。

Like one good error correcting code would just be to repeat the message over and over again.

Speaker 1

但这样显然会引入大量结构,且不具备伪随机性。

But of course, this adds a lot of structure and isn't pseudo random.
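The repetition code just mentioned is easy to sketch, and it shows both halves of the tension: majority-vote decoding gives robustness to flipped bits, while the blatant repetition is exactly the kind of structure a pseudorandom code has to hide. A minimal illustration:

```python
def encode_repetition(bits: list[int], r: int = 5) -> list[int]:
    """Classic repetition code: repeat each message bit r times.
    Highly structured, which is precisely why it is not pseudorandom."""
    return [b for b in bits for _ in range(r)]

def decode_repetition(codeword: list[int], r: int = 5) -> list[int]:
    """Majority vote within each block of r copies recovers the message
    as long as fewer than half the copies of any bit were flipped."""
    return [
        1 if sum(codeword[i:i + r]) * 2 > r else 0
        for i in range(0, len(codeword), r)
    ]
```

With r = 5, up to two of the five copies of any bit can be flipped and the vote still recovers it, a simple instance of the distance property discussed later.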

Speaker 1

伪随机码很难构建,因为码字需要看起来随机。

Pseudo random codes are hard to construct because the code words need to appear random.

Speaker 1

因此它们仍需具备结构,以便在出现错误后能恢复信息。

So they still need to have the structure so you can recover the message after errors are imposed.

Speaker 1

但这种结构需要以某种方式隐藏,使其看起来仍是随机的。

But somehow the structure needs to be hidden so that they still appear random.

Speaker 1

这就是伪随机码的定义。

This is what a pseudo random code is.

Speaker 1

乍一看,它可能与水印技术没有明显关联。

And at a glance, there might not be an obvious connection to watermarking.

Speaker 1

但实际上,我和Sam引入它们正是因为水印技术,某种程度上也是为了解决基于哈希方案的这些问题。

But actually, Sam and I introduced them because of watermarking and kind of because of these issues with the hash based schemes.

Speaker 1

还记得我们期望基于哈希的方案能动态生成新的红绿列表吗?

So remember what we wanted from the hash based schemes was to basically generate fresh red green lists on the fly.

Speaker 1

是的。

Yeah.

Speaker 1

当时我们从先前令牌的哈希值推导这些列表,但现在却损害了鲁棒性,因为我们需要恢复哈希值才能知道选择了哪些列表。

Where we were deriving these lists from the hashes of previous tokens, but now we're hurting robustness because we need to recover the hashes to know what lists we chose.

Speaker 0

一旦有任何变动或被注入内容,你就基本完蛋了。

And the minute anything changes or is injected, you're sort of screwed.

Speaker 1

没错。

Exactly.

Speaker 1

是的。

Yeah.

Speaker 1

于是我们开始思考,与其从哈希值中获取随机性,不如为每个标记随机采样不同的红绿列表然后彻底忘记它们?

And so we started thinking that instead of deriving the randomness from hashes, what if you could somehow just sample a different red green list for each token and forget about them?

Speaker 1

现在在检测时,如果足够多的标记在该绿的时候是绿的,即便不知道具体列表,也能通过密钥神奇地进行检测。

And now, during detection, somehow magically, if enough of the tokens are green when they should be, without even knowing the lists, you can use a secret key to detect.

Speaker 1

嗯。

Mhmm.

Speaker 1

这有点像纠错码——你可以想象采样一个密码字,它会告诉你响应中每个标记对应的红绿列表。

And this is kind of like an error correcting code where you can think of sampling code word that tells you what the red green lists are for each token in the response.

Speaker 1

现在忘记密码字的存在,但当你收到响应时,仍可以通过这种鲁棒解码方式还原出原始信息。

And now, forget about the code word, but if you get back a response, you can do this like robust decoding to get back the message anyway.

Speaker 1

这想法虽然超级模糊,但这就是背后的原理。

This is like super fuzzy, but this is the idea behind them.

Speaker 0

关键在于采样机制。

It's the sampling aspect.

Speaker 0

你某种程度上接受了不会直接匹配的模式,而是在寻找某种规律,但不像哈希那样严格。

And you're sort of accepting that like it won't like you're still looking for a pattern, but you're not looking as like direct it's almost like the hash one is so hard.

Speaker 0

哈希方式太死板了。

It's like so finite.

Speaker 0

而这个更像是概率问题。

Whereas this is like a I guess it's a probability.

Speaker 0

你其实是在计算这些绿色词汇出现的概率

Like you're still looking for like a probability of these green words or green list

Speaker 1

对。

Yeah.

Speaker 0

或类似的东西。

Words or something.

Speaker 1

对。

Yeah.

Speaker 1

之前在检测阶段,你问的是'这个响应是否接近这个特定序列?'

Before in detection you were kind of asking, is this response close to this specific sequence?

Speaker 1

而现在使用伪随机编码,你问的是'这个响应是否接近这个大家族中的任何码字?'

Whereas now with pseudorandom codes, you ask like, is this response close to any code word in this large family?

Speaker 0

哇。

Wow.

Speaker 1

这听起来是个更难的问题,但我们利用纠错码的力量,能高效判断给定响应是否接近该编码。

And this sounds like a harder question, but we use the power of error correcting codes to have an efficient way of telling whether a given response is close to the code.

Speaker 0

那么在这种情况下,如果存在类似机制,是否会有个最终内容的转换阈值作为分界点?

In in that case, with something like that, if you still like, would there be almost like a threshold of transformation of the end content at which it's like sort of like a cutoff?

Speaker 0

比如变化量低于这个程度时仍能被识别为带有水印

Like, if it's under this much change, it will still be recognized as watermarked.

Speaker 0

但如果改动超过这个阈值,水印就可能丢失

But if you really alter it past that, then it might get lost.

Speaker 0

比如,如果差异过大,是否会失去这种纠错码的能力?

Like, could you lose the power of this error correcting code if it gets too different?

Speaker 1

是的。

Yeah.

Speaker 1

确实如此。

Definitely.

Speaker 1

这在某种程度上是固有的特性。

And this is kind of inherent.

Speaker 1

这些纠错码通常具有距离属性,能告诉你可容忍多少错误。

So these error correcting codes usually have some distance property that tells you how many errors you can tolerate.

Speaker 1

我们有类似机制,不过我们的检测保证与编码理论中的略有不同,因为我们处于计算受限环境且存在随机性。

So we have something analogous which says that, I guess our our detection guarantees are a little bit different from those in coding theory because we're in like a computationally bounded setting and we have some randomness.

Speaker 1

但概括来说,只要响应足够接近,就仍能被检测到。

But at a high level our guarantee says if the response is close enough, then it will still be detected.

Speaker 1

如果超出某个界限,我们就完全无法保证了。

If it's beyond some bound, then we have no guarantee at all.
