欢迎收听谷歌DeepMind播客,我是主持人汉娜·弗莱教授。
Welcome to Google DeepMind: The Podcast, with me, your host, Professor Hannah Fry.
我们已全面进入生成式AI时代。
We are well into the era of generative AI now.
就在几年前,人们还很难想象AI能成功制作惊艳视频、谱写交响乐、模仿荷兰大师风格创作艺术,或是写出令人屏息的惊悚小说。
Just a few years ago, it was very difficult to imagine that AI could successfully produce stunning videos, or compose symphonies, or make art in the style of the great Dutch masters, or write a pacey thriller that has you on the edge of your seat.
但如今这些都已成为现实,机器在创意领域的边界正不断被拓宽。
But today, all of this has come to pass, and the limits of what machines can achieve in the creative endeavors are being nudged forwards all the time.
但AI真的具备真正的创造力吗?
But is AI capable of true creativity?
道格·埃克是数十年来一直梦想这一刻的人。
Well, one person who has been dreaming about this moment for decades is Doug Eck.
他是生成式AI领域的先驱,最近还主导发布了谷歌最先进的视频和图像生成模型。
He is a pioneer in the world of generative AI, and most recently, he oversaw the release of Google's state-of-the-art video and image generation models.
在谷歌DeepMind,道格作为高级研究总监领导着生成媒体领域的研究工作。
At Google DeepMind, Doug is a senior research director who leads research across generative media.
他还是一名音乐家。
He's also a musician.
当他不弹钢琴或吉他时,你可能会发现他在思考艺术的社会意义,或者AI是否有一天能获得奥斯卡奖——这两个问题都是我们对话的绝佳话题。
And when he's not playing piano or guitar, you might find him pondering some big questions about the social purpose of art and whether AI will ever win an Oscar, both of which seem like excellent questions for our conversation.
非常感谢你的参与,道格。
Thank you so much for joining me, Doug.
噢,汉娜,谢谢你的邀请。
Oh, Hannah, thanks for inviting me.
你在这个领域工作多久了?
How long have you been working in this space?
我想我可以自豪地说,我是这个领域的元老级人物了。
I guess I can say with some pride, I'm OG in this space.
要知道,早在2001年,我就试图让循环神经网络演奏爵士乐和蓝调音乐。
You know, back in 2001, I was trying to get recurrent neural networks to play jazz and blues music.
我们当时没有数据。
We didn't have data.
我们没有算力,但我们有满腔热情。
We didn't have compute, but we had we had passion.
这就是我们所拥有的。
That's what we had.
后来在2015到2016年间,我创建了一个让我非常自豪的项目Magenta,代表音乐与艺术生成。
And, in 2015 and 2016, I created a project that I'm very proud of called Magenta, which stands for Music and Art Generation.
我们是最早探索生成式AI创意潜力的团队之一。
And we were one of the early teams exploring the creative aspects of generative AI.
当时这项技术还处于相当初级的阶段。
And, you know, we were at a phase where the technology was quite young.
我思考这些问题已经很久了,看到现在的进展真的让我无比兴奋。
And so I've been thinking about these issues for a long time, and I can't you know, I'm very, very excited to see where we are now.
就像是技术终于开始追赶我个人的理想,这感觉太棒了。
It's like technology is starting to catch up with my own personal aspirations, which is so cool.
如果这事晚发生十年十五年,等我退休了才实现,我肯定会懊恼死的。
Like, if if this happened ten years or fifteen years later and I were retired, I would just be like, oh, god.
就错过时机了。
I missed it.
对吧?
Right?
所以能亲身参与这个变革时代,让我感到无比振奋。
So I feel so excited to be here, able to do it while it's happening.
那么回到你年轻时的个人理想,这是你当年就期待实现的愿景吗?
Well, in terms of your personal aspirations then, was this the kind of thing that you wanted to happen back when you were younger?
在某些方面,已经超越了。
In some ways, it outstrips them.
我...我觉得我们做到的比我预想的还要多。
I I think I think we've done more than I thought was possible.
我真的...真的必须这么说。
I really I really have to say that.
我认为Transformer、Diffusion等模型的出现,确实让这个领域的发展速度远超我的想象。
I think what we've seen with models like Transformer and Diffusion and a bunch of other stuff has really moved the field faster than I imagined.
这并非渐进式的缓慢攀升。
This hasn't been like a gradual, you know, slow climb.
过去五年我们实现了阶跃式突破,就像解封了无数潜能。
We've hit a step change in the last five years, and it's like it unblocked a bunch of potential.
我认为这是算力等多重因素共同作用的结果——前期进展缓慢,而后突然爆发。
And I think it's a combination of compute and a lot of other things, but things didn't move very much, and then they moved a lot.
这一点很值得强调。
And I think that's important to point out.
对吧?
Right?
所以现在我们正在努力理解这项技术。
So now we're we're kinda trying to figure out this technology.
这确实令人振奋。
It's really exciting.
如果要探讨AI是否具备创造力,或许我们首先该界定创造力的定义。
I guess if we are asking whether AI could be creative, it probably makes sense for us to try and define what we mean by creativity in the first place.
你能定义吗?
Can you?
你果然问了关于创造力的问题。
You had to ask, yes, the creativity question.
很多人讨论创造力时,会区分大写的C和小写的c。
I think a lot of people talk about creativity with a capital C versus a lowercase c.
嗯。
Mhmm.
你知道,就像AI模型能创造出对我们来说新颖、令人惊喜和愉悦的新样本。
You know, like, AI models are capable of creating new samples that that look novel to us, that that that surprise us, that delight us.
我认为在这种情况下,这个问题已经得到解答,可能已经有十年了。
I think in that case, the question's answered and has been answered for probably a decade.
是的。
Yes.
另一方面,我们谈论的是真正的新想法,比如全新的流派。
On the other hand, we talk about genuinely new ideas, genuinely new genres, for example.
我认为,不,我们现在还没达到那个水平。
I think, no, we're not there right now.
你知道,我们还没看到AI模型真正以那种方式推动艺术领域。
You know, we're not seeing AI models really move the art field like that yet.
此外,我认为大写的C创造力有一个社会组成部分对我很重要,就是我在意创作我所消费艺术的人,无论是绘画还是音乐。
Furthermore, I think there's a social component to creativity with a capital c that matters a lot to me, which is, you know, I care about the people that are creating the the art that I that I consume, whether it's a painting or whether it's music.
我最喜欢的例子之一是AC/DC曾一度从流媒体服务上消失。
One of my favorite examples of this is that, for a while, AC/DC disappeared from the streaming services.
这是很久以前的事了。
This is a long time ago.
这大概是十五年前的事了。
This has been this is probably fifteen years ago.
所以好吧。
And so okay.
所以我儿子真的和我一起听了《Back in Black》。
So my son and I listened to Back in Black together.
就是其中一件事,你知道,我儿子那时候大概七岁。
It was just one of these things, you know; my son was, like, seven at the time.
于是AC/DC消失了,在流媒体服务上被一支AC/DC翻唱乐队取代了,他们设法钻了系统的空子。
And so AC/DC disappeared, and it was replaced in the streaming services by this AC/DC cover band that managed to kinda game the system.
所以如果你搜索《Back in Black》,他们的专辑就会跳出来播放,封面也有点相似,所以我没注意到。
So if you search for Back in Black, their album came up and got played, and it kind of had a similar cover, and so I didn't notice.
我听了之后感觉糟透了,因为我无法...你知道,我熟悉原唱的声音。
And I heard it, and it was just horrible because I couldn't you know, I know the voices of the actual singers that made the music.
对我来说这是真实的。
It's real to me.
创作音乐的人才是关键,因为对我们来说真正重要的是那种连接。
And the people that made it matter, because what really matters to us is a connection.
对吧?
Right?
我们之间需要那种连接。
There's a connection to us.
如果AI能做出披头士的第十三张专辑,那它就能做出第十四张。
If we had AI that could make the thirteenth Beatles album, then it could make the fourteenth Beatles album.
嗯。
Mhmm.
然后它还能做出一百一十四张披头士专辑,诸如此类。
Then it could make the hundred and fourteenth Beatles album, etcetera, etcetera.
这就形成了一个悖论。
And you have a kind of puzzle here.
就像在说,不,那不是披头士的本质。
It's like, no, that's not what the Beatles are.
他们是特定时代的产物。
They're a point in time.
那是二十世纪六十年代和七十年代。
They're the nineteen sixties and nineteen seventies.
那是我的童年时光。
They're my childhood.
那是约翰、保罗、乔治和林戈,这意义重大。
They're John and Paul and George and Ringo, and that matters a lot.
我认为这极其重要。
I think that matters tremendously.
我不认为我们应该排除未来会出现我们逐渐欣赏并追随的AI代理,它们推动领域发展,甚至能与教授一起出现在播客中。
I don't think that we should rule out a future where there are AI agents that we grow to appreciate and follow and that move the field and that appear on podcasts, you know, with professors.
但我们目前还没到那个阶段。
But, like, we're not there right now.
而且我认为,如果我们过多讨论创造性而忽视艺术的社会性,就真的偏离重点了。
And I think if we talk too long about creativity and leave behind the social part of art, we're really missing the point.
这真有意思。
That's so interesting.
我们一下子就聊得很深入了,不是吗?
I mean, we've got quite deep quite quickly here, haven't we?
抱歉。
Sorry.
抱歉。
Sorry.
我刚喝了咖啡。
I just had my coffee.
但我喜欢这个观点:艺术是传递人类体验的载体,而这一点至少在目前是无法复制的。
But but I like that idea that art is something about communicating the human experience, and that, actually, that cannot be replicated, at least for now.
同意。
Agreed.
嗯。
Mhmm.
而且有大量关于思想实验的讨论。
And there are a ton of thought experiments that people talk about.
举个例子,即使你讨论人类创作的艺术,假设你有台时光机——顺便说一句,如果你真有时光机,可能还有其他更想做的事。
One is, even if you talk about human-created art: if you had a time machine (and by the way, if you have a time machine, there are other things that you might wanna do).
确实。
Sure.
但假设你有台时光机,其中一件事就是把安迪·沃霍尔的画作送回13世纪的法国。
But let's say you have a time machine, one thing you could do is, like, take an Andy Warhol painting and ship it back to thirteenth century France.
我认为——大多数人也会直觉地认同——这毫无意义。
And I argue, and most people would argue intuitively, that would just make no sense.
比如:我们眼前这个金宝汤罐头是什么东西?
Like, what is this Campbell's soup can that we're looking at?
玛丽莲·梦露又是何方神圣?
Who the hell is Marilyn Monroe?
艺术家们始终在指向文化、评论文化、推动文化,而这些文化议题此刻正由我们创造、驱动和回应。
And, like so artists are constantly, like, pointing to culture, making comments about culture, moving culture, and these cultural questions are are all right now created by, driven by, responded to by us.
对。
Right.
所以艺术的社会属性和其重要性——这点我稍后再谈。
So the social side of art and the importance of that, I wanna come back to that in a bit.
但你似乎把艺术划分成了两个不同类别。
But it's almost like you're splitting it into two different categories there.
一方面是前所未有的原创性,另一方面则是思想的质量与深度。嗯。
There's, like, the originality of just it's it hasn't existed before, but then there's, like, the quality and depth of the ideas Mhmm.
以及那种原创性。
And the originality of that.
那就是is吗
Is that the is
我想是的。
I think so.
还有一点,如果我们从文化问题回归到科学问题上,嗯。
And one thing, if we move away from the culture question and move to the science. Mhmm.
我们利用AI生成优质图像已有相当长的时间了。
We've been able to generate great images with AI for quite some time.
我们一直缺乏的是可控性。
What we haven't had is controllability.
比如用户输入提示词后,能否真正获得他们想要的图像。
So the ability for a user to type in a prompt, for example, and actually get the image that they asked for.
这正是我们在Transformer模型和扩散模型中看到的重大突破之一——可控性概念。
That is one of the major parts of the breakthrough that we've seen with transformer models and with diffusion models, this idea of controllability.
但这种自我表达能力与AI力量的结合,正是我们看到的精彩之处。
But that ability to express ourselves, coupled with the power of AI, is where we're seeing something interesting.
所以我希望AI足够强大,能为我做很酷的事,但必须确保我能以某种方式控制它。
So I want the AI to be really powerful and to do really cool things for me, but I've gotta be able to control it somehow.
否则说实话,很快就会变得相当无趣。
Otherwise, frankly, it just becomes pretty boring pretty fast.
实际上在很多方面,创造力仍存在于人类而非AI本身。
Actually, in a lot of ways there, the creativity remains in the human rather than in the AI alone.
是的。
Yes.
但不仅如此。
But yes and.
这并不意味着技术无关紧要。
That doesn't mean that the technology doesn't matter.
对吧?
Right?
就像技术与艺术家之间的联姻,这种结合极其重要。
Like, there's a marriage between the technology and the artist that just is incredibly important.
就我所从事的音乐而言,我并不自称是伟大的音乐家,因为我确实不是,所以我的定位很清晰。
And for me as what I do is music, I don't, like, claim to be a great musician because I'm not, so I'm a lot well aligned.
但你知道吗,我其实有点沉迷于即兴弹奏钢琴或吉他时那种心流状态。
Like, but that, you know, I really I'm kind of addicted to the flow state that comes from improvising on piano or on guitar.
在某个瞬间,你会忘记——从认知上忘记——这架钢琴并不是你。
And there's a point where you forget, like, cognitively, you forget that this piano is not you.
真的,或者电吉他,你会忘记它不是你,而是你身体的延伸,这很美。
Like, really or, like, the the the electric guitar, you know, you forget that it's not you and it's this extension of you, and that is beautiful.
对吧?
Right?
这就是技术催生出的新自我表达形式,而我们正试图用AI实现这一点。
That's technology giving rise to new forms of of of self expression, and we're trying to get that with AI.
至少我是这样。
At least I am.
这就是我的目标。
Like, that's my goal.
对吧?
Right?
我们的目标是:能否创造一种新方式,让你把脑海中的想法具象化呈现?
The goal is to can we can we build a new way that you can you can take these ideas that are in your brain and get them out there?
不过你觉得真有可能做到吗?让人工智能与你如此紧密连接,让你完全掌控它,以至于意识不到它只是你身体的延伸?
Do you reckon that's possible, though, to have have it where you are so connected with an AI, where you are so in control of it that you you you cease to realize that it's not just an extension of your body?
我认为有可能,甚至觉得我们正在研发的一些技术已经初步实现了这种状态。
I do think it's possible, and I think it's even happening now with some of the stuff that we're working on.
我也不想夸大其词,不想故弄玄虚。我认为人工智能的挑战之一就是如何恰到好处地适配。
And I also don't wanna overstate, I don't wanna mystify. I think one of the challenges with AI is getting that fit right.
让人工智能像我们人类理解艺术创作那样,与我们自身相契合。
Fitting fitting the AI to our own like, the way that we as humans understand the art that we're trying to create.
我认为未来可能会出现全新的艺术流派或形式。
I think later, we may see entirely new genres or entirely new forms of art come up.
可能会出现一个既非音乐、也非绘画、摄影或电影制作的新领域,而人工智能将帮助我们创造它。
Like, there may be a new word that is not music and is not painting and is not photography and is not, you know, film or movie making, and that AI will have helped us create it.
对吧?
Right?
我们还没到那一步。
We're not there yet.
现阶段我们仍生活在现有艺术形式中,所以用AI来生成图像。
Where we are now is kind of living in our current genre, so we're using AI to make images.
用AI来创作音乐。
We're using AI to make music.
用AI来制作视频。
We're using AI to make video.
但即便如此,回到你的问题,我认为现在已经可以实现——随着音乐模型制作速度加快,我们正越来越接近实时创作。虽然尚未完全实现,但作为音乐人,你或许能边弹吉他边与AI即兴合奏。
But even then, back to your question, I think it's already possible now. As we're making music models faster, we're able to get closer and closer to real time. And you can think about how, though we're not quite there yet, you might be able to, as a musician, play your guitar along with the AI.
就像一种智能循环效果器。
Like, it's a kinda smart looper pedal.
你知道的,那种能持续循环演奏的效果踏板。
You know, you have a pedal that kinda keeps looping.
这是最简单的例子。
And that's the simplest example.
是的。
Yeah.
我认为这确实是一个很好的未来愿景,因为目前我们的现状是:整个过程实际上是由人类进行策划、指导、打磨和精炼,而不是必须实现各个环节的无缝集成。
I mean, I really like that as a as a view for the future, because I guess now where we're at is that it's it's really a case of a human curating and sort of and then instructing and and and honing and refining rather than necessarily that the whole process is, like, seamlessly integrated with one another.
我觉得你说得对。
I think that's right.
我认为可以概括为三个关键环节:转化、策展和创作,这是目前最核心的三个部分。
I think I would say, like, transformation, curation, and then creation would be sort of the three that I would think are out there right now.
转化的一个例子是,我拍了一张照片。
An example of transformation is I took a photograph.
嗯。
Mhmm.
然后我用AI把它转变成我更喜欢的样子。
I transform it into something else that I love even more using AI.
嗯。
Mhmm.
这就是转化。
That's transformation.
对。
Yep.
对吧?
Right?
策展。
Curation.
是啊。
Yeah.
比如开个美颜滤镜,让自己看起来更漂亮。
Face filter on, make myself look better.
继续说吧。
Go on.
确实如此。
Exactly.
没错。
That's right.
经典的老派Instagram。
Good old fashioned Instagram.
我我是它的忠实粉丝。
I I'm a big fan.
策展就是,你知道的,AI让我能在一个空间里自由探索,并向我推荐可能让我惊喜又实用的内容。
Curation is, you know, I'm the AI is is, like, playing allowing me to play around in a space and suggest things to me that might surprise me and be useful.
对吧?
Right?
还有创作。
And creation.
实际上就是根据提示创造出全新的东西。
So just actually creating something new from a prompt.
所以这才是最难的,我觉得是的。
So that's the hardest, I think Yeah.
你知道,要做得好的话。
You know, to do well.
但话说回来,关于原创性的问题,AI真的在创造原创内容吗?还是某种程度上只是在模仿已有的东西?
But then, okay, that question of originality, is AI actually creating things that are original, or is it in some ways just mimicking what exists already?
我认为它让我们能够创造出原创内容。
I think it's allowing us to create something original.
我想我们不得不承认这一点。
I think we we have to admit that.
尽管这些模型是基于我们世界的数据训练的,我常说它们反映的是我们的世界,但更像是哈哈镜里的影像。
Even though the the these models are trained on, you know, data from our world, you know, I've often said, you know, they are reflecting our world, but they're kind of like fun house mirrors.
所以它们并没有用一个完全平坦的平面完美地反映我们的世界。
So they're not reflecting our world perfectly with a perfectly flat plane.
对吧?
Right?
通过这种方式,这些哈哈镜实际上可以产生非常非常有趣的新材料。
And in doing so, these fun house mirrors can actually give rise to very, very interesting new materials.
我认为我们必须承认这些材料确实是新颖且变革性的,为我们提供了新的工作素材。
And I think we have to admit that those materials are actually new and transformative and and giving us new things to work with.
我们这里使用的描述很像那种'哦,我见到创意时就能认出来'的感觉。
The descriptions that we're using here are quite like, oh, I sort of know creativity when I see it.
差不多。
Almost.
对吧?
Right?
就像是,是不是这样?
Like, is it isn't it?
你可以争论。
You can debate it.
你不能。
You can't.
你们有衡量AI创造力的方法吗?
Do you have ways that you measure creativity in AI?
我是说,我们当然可以测量...对。
I mean, we can certainly measure the yeah.
我们可以使用距离度量。
We can use distance measures.
基本思路是...我是说,你可以逐像素计算一张图像与另一张图像的距离,但这可能...
The basic idea is to to take I mean, you could do a pixel by pixel distance of an image and another image, and that might not
看看它们彼此之间有多相似。
See how similar they are to one another.
是啊。
Yeah.
然后你会发现像素可能被不同比例缩放等等这类情况。
And then then you realize the pixels might be scaled differently and things like that.
所以你们转向了更复杂的测量方法。
So you move to more sophisticated measures.
我们可以测量与训练集的距离之类的东西。
We can measure the distance from the training set and things like that.
我们可以测量一种称为'复述'(recitation)的东西,也就是:模型是否在照搬训练集中的大段内容?
We can measure something called recitation, which is: is the model regurgitating chunks of the training set?
而且,你知道的,所以我们有办法做到这一点。
And, you know, so we have ways to do it.
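The measures mentioned here (pixel-by-pixel distance, distance from the training set, and recitation) can be illustrated with a minimal sketch. It assumes images are flat lists of pixel values and approximates recitation by counting verbatim n-token overlaps with training text. The function names are hypothetical, and nothing here is claimed to be the metric the team actually uses.

```python
from math import sqrt


def pixel_distance(a, b):
    """Euclidean distance between two same-sized images (flat lists of pixel values)."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_to_training_set(sample, training_set):
    """Distance from a generated sample to its nearest neighbour in the training set."""
    return min(pixel_distance(sample, t) for t in training_set)


def recitation_fraction(generated, training_texts, n=5):
    """Fraction of n-token windows in `generated` that appear verbatim in any training text."""
    gen = generated.split()
    windows = [tuple(gen[i:i + n]) for i in range(len(gen) - n + 1)]
    if not windows:
        return 0.0
    seen = set()
    for text in training_texts:
        toks = text.split()
        seen.update(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    return sum(w in seen for w in windows) / len(windows)
```

A small nearest-neighbour distance or a high recitation fraction would suggest the output sits close to the training data. As noted in the conversation, raw pixel distance is fragile (for example under rescaling), which is why more sophisticated measures are used in practice.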
好的。
Okay.
那么我还想问你一个问题。
Then, so, I wanted to ask you also.
如果我们回到AlphaGo,这是那个非常著名的DeepMind人工智能,它与李世石对弈并在围棋比赛中击败了他。
If we go back to AlphaGo, this is the very famous DeepMind AI that played Lee Sedol and beat him at the game of Go.
在那场比赛中,有一个特别的第37手,当时震惊了许多研究人员,因为它完全超出了人类围棋选手的传统策略范畴。
During that match, there was there was one particular move, move 37, which shocked a lot of researchers at the time because it was so far outside of the traditional strategies that human Go players would make.
我记得当时有很多人说那是真正创造力的展示。
And and I remember at the time, there were a lot of people who were saying that that was a demonstration of of true creativity.
在你们领域有过类似'第37手'的案例吗?
Has there been an equivalent of a Move 37 in your field?
我还没见过让我觉得是艺术家与AI完美绝妙结合的作品。
I haven't seen anything yet where I feel like there was this perfect and brilliant marriage of of artist and AI.
我见过一些极具启发性的作品,某些视觉艺术家创作的东西让我非常着迷。
I've seen suggestive work, certain visual artists who are just generating stuff I really love.
我相信有很多艺术家会想:等等,我也有过自己的'第37步'时刻,就在这里。
I'm sure there are lots of artists out there who are like, wait, I had my, you know, I had my Move 37, here it is.
我很想听听他们的故事。
I'd love to hear from them.
但我觉得我们离成功就差那么一点点了。
But no, I think we're just tantalizingly close.
我的意思是,我的认知已经被多次颠覆了。
I mean, my mind has been blown multiple times.
当我第一次听到音乐转换器(Music Transformer)创作的音乐作品时,我的认知就被颠覆了,这是我参与的一个项目。
My mind was blown the first time that I heard some of the music compositions done by Music Transformer, which is a project I was part of.
感觉那是我第一次听到神经网络能创作出如此惊艳的音乐。
Like, I felt like it was the first time we've heard a neural network able to to make insanely interesting music.
我想,Transformer模型通常更多是与语言、聊天机器人联系在一起。
I mean, I guess transformers are sort of more commonly associated with, with language, with chatbots.
但这是否意味着我们讨论的不仅仅是音乐?
But does that mean that we are not just talking about music here then?
我们是否期待AI能在更多不同领域展现创造力?
Are we are we hoping that there'll be lots of different spaces that that you want AI to be creative in?
当然。
Of course.
是的。
Yes.
那就给我讲讲你研究过的一些模型...
So tell tell tell me about some of the models you've
让我们聊聊视觉领域吧。
Let's talk about the visual world.
我们来聊聊图像生成和视频生成。
Let's talk about image generation and video generation.
好的。
Okay.
我把笔记本电脑带来了。
So I have my laptop here for you.
首先我想展示的是一张静态图片,由Imagen 3生成,这是我们目前最强大的图像生成模型。
The first thing I wanted to show you was a still, an image done by Imagen 3, which is our most capable image generation model.
好的。
Okay.
我们看到一幅混合地貌的风景,背景有些沙质草地,还有些岩石和山脉。
So we have got a mixed landscape: some sort of grass, a bit sandy in the background, a bit rocky and mountainous.
还有三个构图非常精美的热气球,看起来极其逼真。
And then there are three very beautifully composed hot air balloons, which look extremely realistic.
三个气球的灯光效果一致,光线都来自画面中的同一个方向。
The lighting is consistent across all three of them, so the lights come from from one particular direction in the in the shot.
乍一看,非常非常真实。
It looks, at a glance, very, very real.
我觉得当你仔细观察时,可能会发现那些岩石比现实中预期的要尖一些。
I think when you when you look really, really carefully, you might say, oh, those rocks are, are perhaps a little bit pointier than you might expect in reality.
但乍看之下,它真实得令人信服。
But but at a glance, it's extremely convincing as real.
事实上,我很高兴你注意到那些尖岩石,因为它们在某种意义上也是真实的。
And in fact, I would I would I'm glad you noticed the pointy rocks because they're actually they're actually in some sense real as well.
这个提示词是...我来完整读一下:'以单反(DSLR)相机风格拍摄' 嗯。
So this prompt is, I'll read the entire prompt: "Shot in the style of DSLR camera" Mhmm.
'搭配偏振滤镜。'
"with polarizing filter."
哦。
Oh.
土耳其卡帕多西亚独特岩层上空漂浮着三只热气球的照片。
A photo of three hot air balloons floating over the unique rock formations in Cappadocia, Turkey.
原来那些岩层看起来就像这样尖耸。
It turns out those rock formations look pointy just like this.
太美了。
Amazing.
而且
And
这些气球上的色彩与图案,与下方大地色调的景观形成了绝美对比。
the colors and patterns on these balloons contrast beautifully against the earthy tones of the landscape below.
简直像首诗。
It's almost a poem.
是啊。
Yeah.
确实。
It is.
这是伊琳娜·布洛克(Irina Blok)和她团队的作品。
This was done by Irina Blok and her team.
她是位天才,我们称她为提示语耳语者。
She's a genius; we call her a prompt whisperer.
真的吗?
Really?
这张照片捕捉到了体验这种冒险时的心境。
This shot captures the sense of adventure that comes with enjoying such an experience.
很有诗意。
It is poetic.
这很有诗意。
That is very poetic.
就是,不知道你注意到没有,那些气球其实不太圆。
Like, I don't know if you noticed, but, like, the balloons are not exactly round.
它们有点,怎么说呢,形状不规则。
Like, they're a little bit, like, misshapen.
嗯哼。
Mhmm.
有人指出了这点,然后我们都去看了。
Someone else pointed this out and we looked.
这是热气球被风吹时的自然现象。
That happens when the wind blows on a hot air balloon.
不是的。
No.
这些尖尖的岩石其实是土耳其卡帕多西亚的真实地貌。
So these pointy rocks are, like, actually, it's what it looks like in Cappadocia, Turkey.
就连你看到的这些小变形也是如此。我们在Imagen 3图像生成上达到的水平确实令人惊叹。
And, like, even these little deformations that you're seeing. Where we are with image generation with Imagen 3 is truly astonishing.
所以我在想,这个提示词在写作风格上到底做了多少工作?
So I wonder how much work that prompt is doing, in terms of the way that the prompt is written, in terms of the style of that prompt?
因为按理说,Irina每天都在不停地写提示词。
Because presumably, Irina is is writing prompts all day every day.
她本人非常擅长操作这些模型。
She's quite good at interacting with these models.
是啊。
Yeah.
没错。
Yeah.
所以,她写的可不只是'三个热气球'
So, but she's not just writing "three hot air balloons"
对。
Right.
你知道的,在土耳其。
You know, in Turkey.
对。
Right.
所以你触及了让这些模型良好运作最具挑战性也最有趣的部分——模型对非常具体提示的响应能力如何?
So you hit on one of the most challenging and, I think, interesting parts of getting these models to work well, which is: how well does the model respond to very specific prompts?
我们称之为提示连贯性。
Prompt coherence, we call it.
确保提示连贯性对于让像伊琳娜这样脑海中有想法的人将其表达出来至关重要。
And getting prompt coherence right is utterly critical to allow someone like Irina who has this idea in her head to get it out.
对吧?
Right?
如果模型只会响应'几个热气球飘在山脉上空'这样的提示,我们就无法获得眼前这样的细节效果。
And so if all the model did was respond to three hot air balloons floating over some mountains, we wouldn't be able to get the kind of detail that we're seeing here.
她已经掌握了技巧,可以说摸透了模型运作方式。或许我们也该提升生成图像的质量标准,让非专业人士也能达到这种水准。
And she has learned, you know, she's kind of figured out the model, and you could say that we should do a better job of generating images of exactly this high quality, even for someone who's not Irina Blok.
如果只有伊琳娜·布洛克能制作这些图像,那我们就遇到问题了。
If Irina Blok is the only one that can make these images, we have a problem here.
事实上确实如此。
And and in fact, it's true.
你很快就能掌握这些提示技巧。
You can you can pretty quickly come up to speed on these prompts.
她恰好是位艺术大师。
She just happens to be an artiste.
我们来看视频吧。
Let's let's move to video.
这是Veo生成的一段1分5秒的视频。
So this is a one minute and five second video from Veo.
Veo是谷歌的新模型。
And Veo is the new Google model.
Veo是我们谷歌新推出的视频生成模型。
Veo is our new Google video generation model.
嗯。
Mhmm.
对。
Yeah.
这里有四个提示词。
So there are four prompts here.
一个快速跟拍镜头穿过繁华的反乌托邦城区,霓虹灯闪烁,飞行汽车穿梭,雾气弥漫,夜间镜头光晕,体积光照效果。
A fast tracking shot through a bustling dystopian sprawl with bright neon signs, flying cars and mist, night lens flare, volumetric lighting.
接着镜头继续快速跟拍穿过未来感十足的反乌托邦城区,霓虹灯璀璨,星际飞船划过天际。
And then it continues to a fast tracking shot through a futuristic dystopian sprawl with bright neon lights, starships in the sky.
夜间体积光照效果,略有不同,只是细微调整。
Night volumetric lighting, slightly different, so minor changes.
嗯。
Mhmm.
然后我们加入一辆汽车,一个全息霓虹汽车以光速飞驰,电影级质感,细节惊人,体积光照效果。
And then we bring a car in, a neon hologram of a car driving at top speed, speed of light, cinematic, incredible details, volumetric lighting.
如果要描述你即将看到的画面,我们会从顶部俯冲进入。
So if I were to describe what you should see, we're gonna come in from the top.
这是个跟拍镜头。
It's a tracking shot.
我们将从跟踪镜头中降下来。
We're gonna come down from the tracking shot.
我们会看到一个霓虹全息影像的汽车以光速行驶,电影感十足,然后汽车驶出隧道,回到真实的香港城市。
We're gonna have this neon hologram of a car driving at the speed of light, cinematic, and then the car leaves the tunnel back into the real world city of Hong Kong.
所以我们应该预期会有一个过渡回到香港的场景。
So we should expect a kind of transition back to Hong Kong.
好的。
Alright.
明白。
Okay.
那我们开始吧。
So we're starting off.
就像提示描述的那样,没有其他更好的形容方式了。
You've got, I mean, there's no other way to describe it than what what the prompt said.
这些建筑都覆盖着霓虹灯。
You got these buildings covered in neon lights.
这个跟踪镜头非常流畅,然后你加速并越来越近地拉近镜头。
It's very smooth, this tracking shot, and then you speed up and zoom in closer and closer.
你现在正穿梭在建筑之间。
You're in between the buildings now.
哦。
Oh.
然后我们看到这辆车在街道上飞驰。
And and then we have this car racing through the streets.
你可以看到霓虹灯反射在下方潮湿的路面上。
You can see the neon lights reflected in the wet pavement below.
周围还有其他车辆在争夺位置,由于速度太快,几乎所有的东西都变得模糊不清。
There's other cars jostling for position around, and it's almost like everything is blurred as you're because you're just going so fast.
但它确实非常连贯。
But it's really consistent.
当然我们得说明,如果你想观看这些视频,它们都在YouTube上。
And we should say, of course, that that if you would like to actually watch those videos, they are up on YouTube.
现在它穿过了隧道。
Now it's gone through a tunnel.
头顶上有这些大灯,它从隧道中驶出,进入一个极其逼真的现代场景。
There are these big lights overhead, and it's come out of the tunnel into an extremely realistic modern scene.
这是个令人惊叹的时刻。
It's a wow moment.
太不可思议了。
That's incredible.
惊艳时刻。
Wow moment.
对吧?
Right?
而且过渡天衣无缝。
And it was seamless.
我们全程都在跟随那辆车。
We were following the car the entire way.
如果要说的话,也许它还比不上对阵李世石的那一手,但当我们驶出隧道时,我浑身起鸡皮疙瘩。
Like, maybe it's not quite as good as the Lee Sedol move, but, like, when we come out of that tunnel, I get chills.
所以这不仅意味着每个单独场景都遵循提示词。
So this is not only that each individual scene is following the prompt.
还意味着场景之间也在完美融合。
It's that it's blending between the scenes too.
所以这里有个被称为自回归组件的部分,它能提供时间上的连贯性。
So there's what's called an autoregressive component that provides coherence over time.
怎么做到的?
How?
怎么做到的?
How?
人工智能的魔力。
The magic of AI.
我是说,毕竟视频处理比图像难得多。
I mean, because video is much harder than image anyway.
对吧?
Right?
快告诉我们原因。
Tell tell us why.
嗯,我认为主要有两个原因。
Well, I mean, there's two reasons why.
一种理解方式是视频由大量图像组成,每秒大约24到30帧。
And one way to look at it is video is lots of images, what, somewhere between twenty four and thirty frames a second.
所以生成一秒钟的视频需要处理更多内容。
And so, for a second of video, you have to generate, you know...
更多。
More.
更多。
More.
但还存在时间连贯性问题,这才是最关键的。
But there's also this temporal coherence problem, and that's the the massive thing.
时间连贯性意味着随着时间推进内容要保持合理。
Temporal coherence meaning it's gotta make sense as you go forward in time.
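To put rough numbers on the two difficulties just named, here is a small sketch under stated assumptions. `frame_count` is plain arithmetic on the 24 to 30 frames per second mentioned above. `mean_frame_change` is a hypothetical, deliberately naive proxy for temporal coherence (average per-pixel change between consecutive frames); it is an illustration only and not how Veo or any real system evaluates coherence.

```python
def frame_count(seconds, fps):
    """Number of frames needed for a clip of the given length at the given frame rate."""
    return round(seconds * fps)


def mean_frame_change(frames):
    """Average absolute per-pixel change between consecutive frames (flat lists of pixels).

    Large jumps hint that something, such as a basketball, changed abruptly between frames.
    """
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum(abs(p - c) for p, c in zip(prev, cur)) / len(cur)
    return total / (len(frames) - 1)


# The one minute and five second clip discussed earlier, at both frame rates.
print(frame_count(65, 24), frame_count(65, 30))
```

At those rates, a 65-second clip needs between 1,560 and 1,950 frames, each of which has to agree with its neighbours.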
没错。
Yeah.
我是说,就像一些简单的事情,比如有人在运球,但篮球本身却几乎无法改变。
I mean, it's like even simple things like someone's dribbling a basketball, and the basketball kinda can't change.
你知道,那个篮球就这样被运走了。
You know, that basketball's dribbling away.
所以这些物理问题,比如,如何模拟世界物理才能真实反映我们生活的现实世界?
And so these physics issues, like, how do you simulate the physics of the world in a way that is reflective of the real world we live in?
这样你才会相信那是个篮球。
So that you believe it is a basketball.
没错。
That is correct.
确实如此。
That is correct.
因为如果它晃动不定,边缘不规整,颜色或图案变化,或者运动方式不真实。
Because if it's wobbling around, the edges aren't quite right or the color changes or the pattern changes or it just doesn't move in a realistic way.
我是说,这里面有很多层次需要
I mean, there's lots of layers to that then that you have to
处理好。
get right.
完全正确。
Absolutely true.
而且你看看这个,我真的很喜欢。
And even if you look at I really like this.
这是只小狗狗在浴缸里,鼻子上还沾着泡沫。
This is a little puppy in a bathtub, and it's got some suds on its nose.
哦天哪。
Oh my goodness.
是啊。
Yeah.
这太好了。
It's so nice.
对吧?
Right?
是啊。
Yeah.
所以这就是我们所说的连贯性问题。
So this this is the kind of coherence problem we're talking about.
模型如何理解诸如肥皂泡从小狗下巴滴落的方式,然后小狗在某个时间点移动头部,这时...
How does the model figure out the way that the soap suds are falling from the chin of the puppy, and then at some point in time, the puppy moves its head, and then it... At which point the
泡沫滴落。
suds drop.
嗯。
Yeah.
现在好了。
Now okay.
我的背景是流体力学。
My background is in fluid dynamics.
嗯哼。
Mhmm.
我我知道。
I I know.
如果我用方程来尝试解决这个... 你真是个优秀的博士生。
If I tried to do that with using equations Well, you're a very good PhD.
没错。
That's right.
这是个这是个
That's a that's a
那是
That is
但这仅仅是基于图像进行的。
But this is doing it just based on the images.
不是的。
It is no.
它做的远不止这些。
It's doing more than that.
我认为可以安全地说,这些模型正在通过视频学习世界的物理规律。
I think it's safe to say that these models are learning about the physics of the world from the videos.
是的。
Yes.
所以输入是带有视频内容描述的标注视频,模型接收的是视频帧序列。
So the inputs are annotated videos, annotated with a description of what's happening in the video, and the model's being presented with the frames of the video.
这就是输入数据。
That's the input.
为了让模型能良好地生成视频复现,学习世界物理规律似乎对模型很有帮助。
In order to reproduce these videos, to do good generation, it seems to be helpful for the model to learn about the physics of the world.
因此我们既对这些模型作为世界模拟器、物理引擎感到兴奋,也对其用于电影制作的能力充满期待。
And so we're as excited about these models as world simulators, physics simulators, as we are about the ability to use them to make film.
我们可以讨论为机器人技术模拟环境。
We can talk about simulating environments for robotics.
我们可以讨论为几乎所有事物模拟环境。
We can talk about simulating environments for almost anything.
就这些模型捕捉真实世界物理规律(包括流体动力学)的程度而言,在场的人里你最清楚这有多难,真的非常难。
And to the extent that these models capture real world physics, including fluid dynamics, which you above all in the room know is hard, like really hard.
对吧?
Right?
这这这相当了不起。
It's it's it's quite remarkable.
我们正在考虑三维领域。
We are thinking about 3D.
我们正在思考如何为用户、电影制作人或所有人提供明确控制镜头的能力。
We're thinking about ways to provide users, filmmakers, or everybody with the ability to explicitly control camera.
我们现在可以通过文本提示实现这一点。
We can do it with text prompts now.
所以还有很大的改进空间。
So there's a lot of room for improvement.
这仅仅是个开始。
This is just the beginning.
但我认为Veo、Imagen 3以及音乐作品确实标志着我们这项研究的一个分水岭时刻。
But I think Veo and Imagen 3 and the music work really do mark a watershed moment for us in terms of where we are with this research.
那么这能如何应用呢?
How can this be used then?
因为这已经超越了仅仅创造漂亮图像的范畴。
Because this this goes beyond just creating pretty images.
我的意思是,绝对可以。
I mean, absolutely.
我认为主要应用场景源于这些模型能学习物理世界的某些规律。
I think that the major use case here does flow from the fact that these models learn something about the physical world.
能以某种方式学习世界物理规律的模型,对科学探索、模拟等领域会非常有用。
Models that can, in one way or another, learn about the physics of the world can be incredibly useful for scientific exploration, for simulation, et cetera.
因此我认为这项工作具有完整的科学发现导向性。
So I think there's a whole scientific discovery directionality around this work.
还有我们经常讨论的创意方面。
There's also the creative aspects, which we spend a lot of time talking about.
这些就是我看到的两个主要领域。
Those those are the two major areas that I see.
我认为对很多人来说,这种图像生成、视频生成,甚至音乐生成的技术,感觉像是突然之间就冒出来了。
I think for a lot of people, this image generation, video generation, even the music generation sort of feels like it's it's come very quickly from nowhere.
但你其实已经梦想这个很久了。
But you have been dreaming about this for a long time.
是啊。
Yeah.
确实如此。
Actually, I have.
就像有些孩子会谈论职业足球运动员那样。
So, like, some kids will talk about, like, you know, professional soccer players.
他们会说,我从两岁就开始踢足球了,或者我从小就穿着冰鞋。
They'll be like, I just was kicking a soccer ball from the time I was like two or like, you know, I've been on ice skates.
我小时候有点奇怪。
I was just kind of a weird kid.
我对音乐特别着迷。
Like, I was just like so fascinated by music.
我在印第安纳州的南本德长大,虽然那是个小城镇,我在一个工人阶级家庭长大,生活很幸福,也很以这样的成长背景为荣。
I grew up in I mean, the town is South Bend, Indiana, but it's, you know, you know, I grew up in a working class family, very happy, very proud of growing up in a working class family.
实际上我们家并不富裕,我想要一架钢琴,结果却得到了一把小号去上课,因为租小号更便宜。
We actually didn't have a lot of money and, I was asking for a piano and I got a trumpet instead to do trumpet lessons because it was easier to rent a trumpet.
确实如此。
That's true.
确实如此。
That's true.
但我记得更小的时候,大概五六岁,我隔壁邻居家有一架自动钢琴。
But I remember I was, even younger, five or six years old, my my next door neighbors had a player piano.
嗯。
Mhmm.
我是说,如果你没见过世纪之交的自动钢琴,它有用脚踩的踏板,通过来回踩踏控制风箱,让空气穿过系统,然后推动穿孔纸卷。
I mean, like, if you've never seen a player piano from the turn of the century, it has foot pedals that you pump back and forth, and those control bellows that send air through the system that then moves the air through a paper roll that has holes cut in it.
当空气穿过纸卷时,纸卷上的孔洞会让对应的琴键弹奏。
And when the air goes through the roll, where the roll has a hole in it, that causes the corresponding key to play.
我就记得当时一直问我妈妈,我们能去埃德和琼家吗?
And I just I remember just asking my mom, can we go to Ed and Jean's house?
我想看那架自动钢琴。
I want to see the player piano.
我着迷得不得了,比小孩看到糖果店还兴奋——我,道格,隔壁邻居家的自动钢琴。
I was fascinated, like beyond like kids candy store, me, Doug, player piano next door neighbor.
现在回想起来,我几乎要热泪盈眶。
I just it's almost bringing tears to my eyes to think about it now.
所以我一直在思考这种结合...其实直到人们...当你问这个问题时,我才意识到这一点。
So I have been thinking about the marriage of... I didn't even know it until, like, this thought came to mind when you asked this question.
是的,从五岁起,我就一直在思考技术与音乐之间的互动关系。
It's, yeah, I've been thinking about technology and the interplay between technology and music since I was five.
但机器自己演奏音乐确实有种奇妙的魔力。
But there is something quite magical about a machine playing music on its own.
噢,太酷了。
Oh, it's so cool.
真的太酷了。
It's so cool.
是啊。
Yeah.
没错。
Yeah.
那跟我聊聊你最近在研究的那些东西吧
Then talk to me a bit about some of the stuff that you've been playing around with in the
过去五年里
last five years.
当然
Sure.
我花了很多时间思考这些技术之间的相互作用,它们如何协同工作
I have been spending a lot of time thinking about the the interplay between these technologies, how do they work together.
我不需要在场,Veo或Imagen也能不断进步
I don't need to be in the room for Veo to get better or for Imagen to get better.
有这么多优秀的人才在为此努力
There's so many great people working on this.
但我们如何真正将这些拼图组合起来?
But how do we actually put the pieces together?
比如,我一直在思考皮克斯做了件非常美妙的事
You know, for example, I I've been thinking I've been thinking a lot about you know, Pixar did something really beautiful.
对吧?
Right?
他们运用这项技术,制作出一系列令人惊叹的电影
They took this technology, and they turned it into a bunch of astonishing movies.
虽然我认为我们不能照搬皮克斯的模式,但我们应该意识到通过这些电影将技术带入每个人生活所需的真正条件
And I don't I don't think that we can follow the Pixar playbook, but I think we should be aware of what it really takes to move technology into everybody's lives with these movies.
我们如何用AI实现这个目标?
And how do we do that with AI?
我想到了比空气重的飞行器
I think of heavier than air flight.
对吧?
Right?
多少年来着?
What is it?
六十五年?
Sixty five years?
从首次飞行到登月,只用了不到七十年,大约六十多年。
Sixty something years between less than seventy years between the first flight and landing on the moon.
对吧?
Right?
而人工智能领域,我们至少已经研究了三十年。
And if you look at AI, we've been doing this for at least thirty years.
对吧?
Right?
那么在这个行业的三十年里发生了什么?
And so what happens thirty years into this other industry?
要知道,你们已经解决了飞行问题。
You know, you've already figured out flight.
现在考虑的是创建航空公司。
What you're thinking about now are creating airlines.
思考的是如何真正实现全球人员运输?
You're thinking about how do you actually move people around the world?
如何连接人们?
How do you connect people?
就像,你们要解决的问题已与航空学大不相同。
How do you like, you're solving for something much different than aeronautics.
你们仍在解决航空学问题。
You're still solving for aeronautics.
对吧?
Right?
你必须持续攻克航空学难题
You have to keep solving for aeronautics.
但这已经超越了一个层次
But it's a level beyond
但这又超越了一个层次
But it's a level beyond that.
我们如何才能做些有意义的事?
How can we do something that matters?
要做有意义的事,不仅需要构建这些模型,更需要弄清楚它们如何适应社会、如何为用户服务
And to do something that matters is gonna require not just building these models, but actually figuring out how they fit, how they fit in society, how they fit for users.
我们到底在构建什么?
What are we actually building here?
我认为这其中蕴含着非常重要的意义
I think there's something really important in that.
我想这是许多人最关心的问题:这对人们意味着什么?
I think it's something that's at the forefront of a lot of people's minds, which is what does this mean for people?
当然有些人会张开双臂欢迎这些变化,但也有些人可能更犹豫或担忧
Because there are some people, of course, who are welcoming these changes with open arms, but there are other people who are maybe a bit more reluctant or or or concerned about it.
同意
Agreed.
首先,我也有这些顾虑
I mean so first, I I share these concerns.
我认为我们需要非常谨慎
I think we need to be very cautious.
我们为此已经努力了很久
We've been working on this for a long time.
我们对此思考了很久
We've been thinking about it for a long time.
在某些方面,我们在这个领域进展缓慢,因为我们意识到自己可能产生的影响,并努力倾听社区的声音。
In some ways, we've moved slowly in this space because we realize how much impact we can have, and we're trying to listen to the community.
因此我们投入的不是几个月或几周,而是多年时间与艺术家合作,聆听音乐家、视觉艺术家的声音,甚至思考戏剧和电影制作,试图理解这项工作如何融入其中。
So we have spent not months or weeks, but years working with artists, listening to musicians, listening to visual artists, thinking even about theater, about filmmaking, trying to understand how this work fits in.
对我而言,这是最有意义的——或许是我工作中最有价值的部分——与这些社群合作并试图理解:我们能为这些社群和所有人做出哪些卓越贡献?
And for me, it's some of the most rewarding it's maybe the most rewarding part of what I'm doing is working with these communities and trying to understand, you know, what can we do that's great, right, for these communities and for everyone?
我们非常严肃地承担着这份责任。
And we take that responsibility very seriously.
那么你们具体能做些什么?
So what can you do then?
因为众所周知,这些模型的训练是基于人类创造的知识产权。
Because, of course, you know, the training of these models is based on intellectual property that's created by humans.
你们打算如何解决这个问题?
How can you sort of negotiate that issue?
我们正在探索通过识别、数字水印等方式,将模型输出溯源归功于原创者。
So we're thinking about ways of identifying, watermarking, attributing the outputs of our models back to the creators.
这仍是个非常困难且开放的问题,但我们希望帮助建立补偿艺术家的新机制。
And this remains a very hard and open question, but where we wanna be, we wanna help build, you know, new ways to to compensate artists for their work.
我们认为正确处理此事至关重要。
I think it's incredibly important to get this right.
同时我们也确保运用像SynthID这样的技术。
And we also want to make sure that we're, you know, using technologies like SynthID.
SynthID就是你们的数字水印软件吧?
SynthID, that's your digital watermarking software.
是的。
Yes.
以确保能在作品流转于生态系统时进行识别。
To make sure that we can identify this work as it moves through the ecosystem.
知道吗,我是个音乐人。
You know, I'm a musician.
我在SoundCloud上有作品。
I have work up on on SoundCloud.
我不会告诉你如何找到它。
I won't tell you how to find it.
虽然只有五首歌,但都是我的创作。
Even though it's like five songs, like, they're mine.
对吧?
Right?
它们是我的。
They're mine.
已经找到了,
Found it already,
其实。
actually.
我真的找到了。
I actually did.
你在疫情期间用AI做的一些音乐。
Some of the AI music you did in lockdown.
嗯。
Yeah.
嗯。
Yeah.
我找到了。
I found it.
我找到了。
I found it.
比如说,结束后我可能会得到50次或100次点击量,你懂的。
Like, I'm gonna get 50 or I might get a 100 hits after this is over, you know.
但是,它们依然是我的。
But, like, it's still mine.
对吧?
Right?
这是我的创作,我强烈认为我们需要保护这种创造性知识产权的所有权,对这份作品的所有权。
And it's my creation, and I feel very strongly that we need to protect that creative intellectual property ownership, right, of of that work.
那么未来可能会是什么样子呢?
So what might that look like in future?
是不是要向作品被纳入模型的人支付版税?
Is it is it is it where you give royalties to people whose work is included in the models?
我的意思是,具体怎么运作?
I mean, how does it work?
我们正在为解决方案添砖加瓦。
We're adding building blocks towards a solution.
其中之一是一个协议,允许网站标注哪些作品可以用于AI训练。
One of them is a protocol that allows a website to annotate what works are are okay to train on for AI.
你知道吗?
And you know what?
如果有人用它来声明我的网站内容完全不可用于训练,我也完全没问题。
I'm fine if someone uses that to completely say nothing in my website is okay to train.
我完全接受。
It's totally cool with me.
对吧?
Right?
其他人会想要参与其中。
Other people are gonna wanna be part of that.
他们会希望看到自己的作品在那里展示出来。
They're gonna wanna see their work show up there.
所以这是一点。
So that's one thing.
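As a rough illustration of the kind of protocol described here, a site can publish a robots.txt file that tells particular crawlers what they may fetch. The sketch below assumes robots.txt as the mechanism and uses a made-up crawler token, `ExampleAITrainingBot`, and example.com URLs. The conversation does not name the actual protocol, so this is only one plausible shape for it, not the specific system being referred to.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site owner opts the whole site out for one
# (made-up) AI-training crawler token, while leaving other crawlers unrestricted.
ROBOTS_TXT = """\
User-agent: ExampleAITrainingBot
Disallow: /

User-agent: *
Allow: /
"""


def may_use_for_training(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given crawler token is allowed to fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/gallery/painting-1.png"
    print(may_use_for_training(ROBOTS_TXT, "ExampleAITrainingBot", url))
    print(may_use_for_training(ROBOTS_TXT, "SomeSearchBot", url))
```

The point of the design is that the default lives with the site owner: a single blanket rule can say that nothing on the site is okay to train on, and a well-behaved crawler checks it before using any page.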
那么如何将作品纳入系统,使其既能清晰展示又便于归属认定呢?
Then how do you take work and put it in the system in a way that makes it really clean and easy to do attribution?
这才是真正的难题。
And that's the hard problem.
对吧?
Right?
目前有一种技术叫做检索式生成。
So, you know, one technology that's out there is called retrieval based generation.
想象一下,你拥有一个系统,它总体上了解视觉世界的物理规律。
So imagine what you have is a a system that in general knows about the physics of the visual world.
它是基于授权数据训练的。
It's trained on licensed data.
也是基于通用数据训练的。
It's trained on generic data.
但现在你是一位拥有独特鲜明视觉风格的艺术家。
But now you're an artist with a very interesting and vivid visual style.
或许我们可以检索到你的作品。
You know, maybe we can retrieve your work.
我的意思是,在获得你授权的前提下,你的作品会以某种向量数字形式存储在数据库中,让我们能够运用你的风格。
By that, I mean, your work is represented with your permission in some database that has some vectors of numbers that would allow us to use your style.
这种检索会生成一些向量,我们在创作时就会参考这些向量。
That retrieval yields some vectors that we're attending to when we generate.
现在我们生成的图像就能明显体现你的风格特征。
And now we generate an image that is very clearly reflecting your style.
嗯,现在版权归属问题已经变得相当简单了。
Well, the attribution issue has become quite simple now.
嗯。
Mhmm.
我们在获得您许可的情况下检索了您的作品。
We retrieved your work with your permission.
我们可以将该图像标注为采用您的风格生成,这为市场交易打开了大门。
We can annotate that image as being generated with your style, and then that opens the door for marketplaces.
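The retrieval idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any production system works: the `StyleEntry` record, the two-dimensional style vectors, and the stubbed `generate_with_attribution` function are all hypothetical, and a real system would condition an actual generative model on the retrieved vectors rather than return a dictionary.

```python
import math
from dataclasses import dataclass


@dataclass
class StyleEntry:
    """Hypothetical database record: an artist's style as a vector of numbers."""
    artist: str
    vector: list[float]
    permission_granted: bool


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_style(query: list[float], database: list[StyleEntry], k: int = 1) -> list[StyleEntry]:
    """Return the k most similar entries, considering only artists who opted in."""
    permitted = [e for e in database if e.permission_granted]
    ranked = sorted(permitted, key=lambda e: cosine(query, e.vector), reverse=True)
    return ranked[:k]


def generate_with_attribution(prompt: str, query: list[float], database: list[StyleEntry]) -> dict:
    """Stub generator: records which retrieved styles the output was conditioned on."""
    retrieved = retrieve_style(query, database)
    return {
        "prompt": prompt,
        # A real model would attend to these vectors while generating.
        "conditioning": [e.vector for e in retrieved],
        # Attribution falls out of the retrieval step itself.
        "attributed_to": [e.artist for e in retrieved],
    }


if __name__ == "__main__":
    db = [
        StyleEntry("Artist A", [1.0, 0.0], permission_granted=True),
        StyleEntry("Artist B", [0.0, 1.0], permission_granted=True),
        StyleEntry("Artist C", [0.9, 0.1], permission_granted=False),
    ]
    print(generate_with_attribution("a harbor at dusk", [0.9, 0.1], db))
```

Because the style is looked up explicitly rather than absorbed into model weights, the output can carry a record of whose work was retrieved, which is what makes the attribution step simple.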
那么我的第二个后续问题是:现在亡羊补牢是否为时已晚?
So I guess my second follow-up question to that is, has the horse not already bolted?
这些模型已经在基于艺术家作品生成输出了。
Like, these models are already producing outputs based on the work of artists.
我认为我们已经有...你知道...很多可能出错的方式。
I think that we already have, you know, lots of ways that we could get this wrong.
我经常思考音乐产业在MP3革命时期的遭遇,接着出现了LimeWire、Napster(顺序可能不对)和BitTorrent。
I really think a lot about what happened with music during the MP3 revolution, and then we had LimeWire, and then we had Napster, maybe not in that order, and BitTorrent.
这对音乐人来说绝对是场灾难。
And this was not good for music artists at all.
而当时我认为运作良好的一个例子是YouTube。
And, you know, one example of something that I think worked well in at that point in time was YouTube.
实际上我为我们所做的感到自豪,虽然作为音频研究员我只参与了很小一部分。
And I'm actually I'm quite proud of what we did, because I was a very small tangential part of it as a as an audio researcher.
从某种意义上说,它让艺术家能自主决定如何与观众建立联系。
In in the sense that it allowed artists to connect with their audiences on their own terms.
是的。
Yeah.
我们确实创造了...
We created yeah.
我们打造了一个比BitTorrent更优质的用户体验,建立了完整的侵权内容下架机制,还创建了一个创作者网络,不仅连接创作者与粉丝,更能帮助他们发展事业并实现变现。
So what we did was we created a user experience that was better than BitTorrent, built an entire takedown mechanism for infringing content, built a creator network that not only connects creators to their fans, but allows them to build their careers and monetize their careers.
如今这已是一个相当繁荣的生态系统,当然它并非唯一存在的生态体系。
And now it's a a quite thriving ecosystem, and it's not the only ecosystem out there.
所以我们期待看到更多类似YouTube这样的生态系统涌现。
So, you know, we should hope to see ecosystems that are like YouTube.
它们或许不是YouTube,但也可能具备相似特质。
They're not YouTube maybe, but they're maybe they are.
我的YouTube同事们都愣住了——
My YouTube colleagues are like, wait.
等等。
Wait.
它们可能像YouTube,但额外增加了支持生成式创作的组件,同时依然维持着资金流动——说实在的。
Maybe, you know, they're like YouTube, but there are extra moving parts that account for the generative process, but still create this flow of money, frankly.
毕竟大家都得吃饭。
Like, we all gotta eat.
对吧?
Right?
嗯哼。
Mhmm.
这种资金流动和作品归属体系,加上艺术家自我营销的能力,向世界展示才华——坦白说,还有成名的机会。
This flow of money and this flow of attribution, and also the ability to market yourself as an artist and show the world, and frankly, to get famous.
这也很酷。
That's cool too.
创作者网络里很多行为本质上都是在追求成名。
Like, a lot of what happens on the creator network is you're trying to get famous.
这很棒。
That's great.
我希望所有这些都能实现。
I want all of these things to be able to happen.
我们目前没有解决方案,我个人也没有,但我看到了诱人的可能性——像我刚才描述的那种市场和生态系统能在生成领域运作。
I don't we don't have a solution, and I particularly personally don't have a solution, but I see tantalizing possibility to see marketplaces and ecosystems like the one I just described work in generative.
这是从艺术家的角度出发,但还有另一个视角,即观众的视角。
That's things from the perspective of of the artist, but there is another perspective here, which is from the perspective of the audience.
是的。
Yeah.
我的意思是,如果你突然降低了创作音乐、艺术、文学等任何形式的门槛...
I mean, how do you if it's suddenly you're lowering the barrier to entry to to create music, create art, to create literature, to, you know, whatever it might be.
如何避免观众被垃圾内容淹没?
How do you avoid the audience being flooded with junk?
没错。
Yeah.
这正是我在生成领域面临的实际挑战之一。
This is one of the one of the real challenges I have with generative.
就像我说的,如果你能生成披头士的第十三张专辑,就能生成第一百万张。
You know, I already mentioned that if you can generate the thirteenth Beatles album, you can generate the millionth.
对吧?
Right?
所以我认为,其中一个连接点在于回归到艺术家与观众的联系。
So, you know, I think we I mean, one of the one of the connectors here is back to people as artists connecting with people.
就像我们会开始信任某些策展人,就像我们喜欢某些DJ一样——嗯。
Like, I think we will start to trust even if it's a curation issue, we'll start to trust certain curators in the same way that, like, we love certain DJs Mhmm.
他们做的远不止是策展。
Who are doing more than curating.
别误会我的意思。
Don't get me wrong.
嗯。
Mhmm.
不过,你知道的,我们有那个。
But, you know, we have that.
我认为部分原因在于我们某种程度上限制了外部材料的真正爆发式增长。
I think part part of it is like we just sort of limit the true explosion of materials that are out there.
但是,是的,这仍然是个挑战。
But, yeah, it's still a challenge.
对吧?
Right?
我们正淹没在媒体的海洋中。
We're swimming in we're swimming in media.
但我确实怀疑我们是否会陷入这样一种情况:假设你是个图书出版商,正在审阅投稿,突然被生成式AI创作的小说淹没。
But then I do wonder whether we end up in a situation where let's say that you're a book publisher and you're reading submissions, and suddenly you're inundated by generative AI novels.
我是说,我觉得目前还能分辨出来。
I mean, I think I think at the moment you can tell.
但让我们想象在未来某个时候你无法分辨的情况。
But let let's imagine in a future a future time where you can't.
唯一的解决方案就是拥有自己的AI来协助筛选内容,这是肯定的。
The only solution there is to have your own AI that's that's that's helping you with curation, surely.
是啊。
Yeah.
这并不疯狂。
That's not crazy.
我认为还存在另一个关于信任媒体的问题。
I think you have you have you have this other question of trusting media.
另一方面,我不是个书籍作者。
The flip side of that, which is I'm not a book author.
我正在阅读一篇新闻报道,说世界上某个地方发生了某件事。
I'm reading a news report that tells me that something happened somewhere in the world.
这里有两个问题。
There's two issues.
这是AI生成的吗?
Is it AI generated?
然后它真的属实吗?
And then is it actually true?
对吧?
Right?
这些都是非常、非常、非常重要的问题,我认为它们是短期内亟需解决的重大问题。
Those are very, very, very big questions, and I think they're immensely important short term questions that we need to deal with.
我们正尝试通过某些方式来缓解这个问题。
We are trying to mitigate in certain ways this.
例如,Gemini将在选举周期内限制政治类查询的数量,我认为这很棒。
For example, Gemini will limit the number of political queries that are possible during the election cycle, which I think is great.
这某种程度上能让事态稍微放缓一些。
It kind of just slows things down a little bit.
如何阻止AI生成文章
What's to stop AI generated articles
被...那是...那是...是的。
from being That's that's that's that's Yeah.
所以这是一种缓解措施,需要与其他措施配合使用来尝试
So this is one mitigation that you tie in with other mitigations to try to
只要转动旋钮
Just turn the wally
开启射击模式。
shots on.
是啊。
Yeah.
没错。
Yeah.
只是在展望未来,嗯哼。
Just thinking ahead to the future then Mhmm.
你觉得未来五年左右生成式AI会有什么发展?
What do you imagine is gonna happen in generative AI in the next five years or so?
我看到的发展趋势是向多模态方向迈进。
What I'm seeing happening is a move towards multimodality.
视频、图像、音乐。
Video, image, music.
正是如此。
Exactly.
我认为在这个过程中,我们现在做的是解决单一模态的AI问题。
And I think in doing so, I think what we're doing now is we're we're figuring out the AI for single modality issues.
比如我给你看过一些超棒的图片。
Like, I showed you some really cool images.
还给你看过一些我觉得很酷的视频。
I showed you some really cool I think they were cool, some cool videos.
但当你开始加入音频时。
But then you start adding audio.
你会发现,如果我们能真正以生成方式融合视觉和听觉,让人们用不同方式讲述脑海中的故事,那将真正改变一切。
And what you see is, like, if we can truly meld the the visual and the audio in a generative way such that people can tell the stories that are in their heads in different ways, we really do change things.
我认为这就是未来五年会发生的事。
And I think that's what'll happen in the next five years.
我们五年内解决不了这个问题,我坚信这点,但我们会开始真正创造性地整合这些核心技术模块。
It won't we won't solve it in five years, I strongly believe, but we will start to just really creatively bring together these core chunks of technology.
这也是Gemini的梦想。
This is also the dream of Gemini.
Gemini的使命明确且公开地是多模态的。
The mission of Gemini is explicitly and openly multimodal.
我们将开始看到文本和图像以不同方式结合,这将开启一系列全新的创意和表达可能性。
We'll start to see the combinations of text and images combined differently, and that will open up a whole bunch of new creative and expressive options.
不过我也在想,回到你之前提到的这些系统现在理解物理学的观点。
Well, I also wonder though, if you just going back to this point that you made earlier about how these things now understand physics.
我们讨论的所有关于创造力的话题,其实都是在艺术领域。
Everything we've spoken about, really, about about creativity has been in the artistic space.
嗯。
Mhmm.
但如果你理解物理学,是否存在另一种创造力的可能性?
But but if you understand physics, is there a sort of potential there for a different type of creativity?
我是说,如果你理解空气动力学...嗯...
I mean, if you if you understand aerodynamics Mhmm.
未来能不能让你设计一架飞机?
Could you, in future, go in and say, design me an airplane?
可以。
Yes.
我认为如果你说的是五年以上,考虑到科学发现和物理引擎的发展,让我们能够设计更好的飞机,我相信这会实现。
And I think if you had said more than five years, I think if we're talking about scientific discovery and having physics engines that will allow us to do things like build better airplanes, I also believe that will come.
其实你知道,我真的很讨厌做这些预测。
You know, I actually hate making these predictions.
哦,继续说啊。
Oh, go on.
你懂我的意思吧?
You know?
我们都喜欢这样。
We all love that.
我倾向于把它放在,比如说,我更多是把它放在十年这个时间尺度上。
I'm putting that at, like I'm putting that more in, like, the decade.
好的。
Okay.
你觉得是十年吗?
Decade, you think?
我想是的。
I think so.
是啊。
Yeah.
哇。
Wow.
我认为是的。
I think so.
感觉有点短。
Feels short.
确实感觉很短,但如果你看看机器人技术的发展,由于语言模型的应用,机器人技术已经取得了巨大进步,现在我们看到扩散模型也被引入其中,其发展速度令人非常惊讶——比如我们能让机器人多快适应动态变化的世界,包括抓取等能力。
It does feel short, but if you look at what's been happening in robotics, like robotics has accelerated tremendously thanks to the use of language models, and now we're seeing diffusion being brought into this as well in ways that are very, very surprising in terms of like how quickly we can allow robots to be able to basically dynamically exist in a, you know, a changing world in terms of like grasp and things like that.
所以如果我们看到这些进展,并开始意识到我们还能提供更高精度的物理模拟,我确实认为通往物理相关领域的大门正在打开。
And so if we see that and we start to see that we can also provide much, much higher resolution physics, I definitely see the door opening for, you know, for for the the the physics related stuff.
最后再谈谈艺术这个话题,毕竟这是我们讨论的重点。
Just to finish on on a point about art because that has been so much of the discussion that we've had.
你认为会有AI获得奥斯卡奖、普利策奖或重要摄影比赛奖项的那一天吗?
Do you think there will be a point where an AI can win an Oscar or a Pulitzer Prize or or some great photography competition?
会的。
Yeah.
我不会排除这种可能性。
I wouldn't rule it out.
你知道的,智能体时代已经来临,我们用这个词来指代具有某种身份的AI。
You know, the era of agents, you know, that's the term we use to talk about, like, an AI that has an identity, I guess, is is upon us.
这里的归因问题非常有趣。
The attribution problem is very interesting here.
对吧?
Right?
有些极具创造力的人正在编写代码。
You have people that are quite creative that are doing the coding.
还有些人正在试图决定模型或智能体的目标等等。
You have people that are trying to decide what are the goals of the model or the agent, etcetera.
如果你说的是赢得奥斯卡奖,那么这个思想实验就是:完全没有人类参与其中。
If you're talking about winning an Academy Award, like, so then so the thought experiment is like, there's really no human involved.
我在提示框里输入:'好了,智能体,去给我赢个奥斯卡奖',然后我去喝了杯咖啡,对吧?
I type into the prompt, okay, agent, go win me an Academy Award, and I get a cup of coffee, right?
是啊。
Yeah.
我认为这确实就是你想表达的意思。
I think that's really what you're saying.
这确实就是你想说的。
That's really what you're saying.
没错。
Yeah.
我认为我们离那个目标还非常、非常、非常遥远。
I think we're very, very, very, very far from that.
最终可能会是这样:如果我们解决了这个问题,可能就实现了通用人工智能。
And it it could end up being like one answer is if we solve that, we may have solved AGI.
对吧?
Right?
说实话吧。
Let's be real.
就像,这可能是AGI级别的难题,我是说,零……零
Like like, it could be AGI hard to just I mean, with zero in zero
外部输入,无需请求
External input, no Please
去给我赢个奥斯卡
go win me an Academy
奖。
Award.
策展。
Curation.
没错。
That's right.
零。
Zero.
我觉得这在社会层面看来是件极其困难的事,能实现这个的智能体将会非常了不起。
I think that feels societally like an insanely insanely hard thing, And it's gonna be a pretty cool agent that pulls it off.
嗯,我想这也让我们回到了对话最初的观点——我们消费艺术的最大原因之一,或许正是因为它能连接人与人,传递人类的体验。
Well, I think it also does bring us right back to the very beginning of our conversation, which is that perhaps one of the biggest things about the reason why we consume art is because it's something about connecting to other humans and and communicating the human experience.
所以我想是的。
So I guess yeah.
嗯,我是说,这正是纯AGI生成的奥斯卡电影所明显缺失的。
Well, I mean, that's that's distinctly lacking from an AGI only generated Oscar film.
我 我想反其道而行之。
I I wanna go in the opposite direction.
要知道,我觉得人们都在担心AI会让他们失去参与创作的机会。
You know, I think people people are worried about AI taking them out of the participatory process.
我希望让更多人参与进来。
I wanna bring more people in.
所以我想带大家回到录音技术出现前的年代,那时我们都围在钢琴旁歌唱,虽然现在的钢琴可能不太一样了。
So I wanna take us back to the time before recorded music where we all stood around the piano and sang, And it might not be quite the same piano.
可能是某种AI辅助的钢琴形式,能让我们所有人都得到提升。
It might be something like a piano that AI is helping us do that's lifting us all.
但我认为这是个美好的目标。
But that, I think, is a beautiful goal.
是要让更多人而非更少人参与艺术创作和创意表达。
It's to to bring more people into art making and into creativity, not fewer.
利用技术来增进人与人之间的联系。
To use technology to increase human connection.
正是如此。
That's it.
非常赞同这个观点。
Really like that.
我真的很喜欢这个理念。
I really like that.
我想这正是结束对话的最佳节点。
Well, I think that's the perfect point to finish on.
道格·埃克,非常感谢您做客Google DeepMind播客节目。
So, Doug Eck, thank you so much for joining us on Google DeepMind, the podcast.
您真是太出色了。
You're brilliant.
汉娜,能来参加节目是我的荣幸。
Hannah, it's been such a pleasure to be here.
我知道,当谈论创造力及其在AI时代的意义时,话题很快就会变得极具哲学深度。
I know things quickly get deeply philosophical when you're talking about creativity and what that means in an era of AI.
通过与道格的对话,我认为我们现在所处的阶段与相机或钢琴发明时不同,这些发明只是为人类提供了另一种表达媒介。
And from that conversation with Doug, I think that we're in a different place now than we were with the invention of the camera or the piano, which which gave humans just a different medium to express themselves.
我认为这里的区别部分在于数量级。
The difference here, I think, in part, is one of volume.
正如道格所说,第一百万张披头士专辑还有什么价值?
As Doug said, what value would the millionth Beatles album have?
虽然AI当然能出色地完成策展工作,但我认为未来几年许多创意产业工作者将面临重大冲击。
And while AI can, of course, do a brilliant job of curation, I think that there are a lot of people who work in creative industries who are gonna face substantial disruption over the next few years.
但与此同时,我觉得道格对创造力的定义很有意思——需要被保留的本质是,AI无法真正独立创造,因为它缺少人类要素。
But at the same time, I think there is something interesting about Doug's definition of creativity, the thing that needs to be preserved, that AI can't be truly creative on its own because the human component is missing.
实际上我认为这蕴含着希望:即便是设计这些工具的人,也视艺术、文学和音乐为人类专属的、由人类主导的事业,这一点不会改变。
And I think there's something quite hopeful in that, actually, that even the people who are designing these tools, they see art and literature and music as deeply human endeavors for humans, by humans, and that is not going to change.
您正在收听的是谷歌DeepMind播客,我是主持人汉娜·弗雷教授。
You've been listening to Google DeepMind, the podcast, with me, Professor Hannah Fry.
如果您喜欢本期节目,嘿——
If you have enjoyed this episode, hey.
何不订阅一下呢?
Why not subscribe?
我们还将带来更多与AI前沿人士的精彩对话,主题涵盖AI如何加速科学发现到应对这项技术的最大风险。
We have got plenty more fascinating conversations with the people at the cutting edge of AI coming up on topics ranging from how AI is accelerating the pace of scientific discoveries to addressing some of the biggest risks of this technology.
若有任何反馈或想推荐嘉宾,请在YouTube评论区留言。
If you have any feedback or you want to suggest a future guest, then do leave us a comment on YouTube.
下次再见。
Until next time.