Hi-Phi Nation。
Hi-Phi Nation.
一档哲学与现实交汇的节目。
A show where philosophy and reality meet.
来自 Slate Plus。
From Slate Plus.
去年,十位音乐人接到谷歌的电话,帮助他们完成你所能想象的最枯燥乏味的工作之一。
Last year, 10 musicians got a call from Google to help them do one of the most mind numbing jobs you could imagine.
该公司拥有来自 YouTube 的五千五百段音乐片段。
The company had five and a half thousand clips of music from YouTube.
听起来像这样的音乐。
Music that sounded like this.
还有这样的。
And this.
以及这样的。
And this.
他们的工作是聆听每一个片段,并用文字描述它们。
And the job was to listen to every single one of these clips and describe them in words.
比如,拿这个例子来说。
Like, take this example.
其中一位音乐人给出的描述是:这是对一首R&B灵魂乐作品的混音。
The description one of these musicians came up with was, this is a remix of an r and b soul piece.
有一段男声以轻松的方式演唱,同时伴以经过自动调音的男声。
There's a male vocal singing in a laid back manner joined by an auto tuned male vocal.
这段音乐的氛围很律动,充满令人愉悦的气息。
The atmosphere of the piece is groovy, and there's a feel good aura to it.
这段音乐可以用于情景喜剧的原声带中。
This piece could be used in the soundtrack of a sitcom.
他们完成了这项工作,我统计了数据。
Well, they finished the job, and I crunched the numbers.
这十个人总共听了38天不间断的音乐,每人92小时,逐个打字记录他们的描述,然后才进入下一个片段。
These 10 people listened to thirty eight straight days worth of music, ninety two hours each, typing out their descriptions one by one before moving on to the next clip.
他们总共使用了37万个词来描述所有这些片段。
They used a total of 370,000 words to describe all of the clips.
谷歌将音乐和这些文字输入到一个深度学习模型中,目的是找出哪些词语与哪些音乐声音相关联。
Google ran the music and the words through a deep learning model, with a goal of figuring out what words correlate with what musical sounds.
如果这个项目成功,结果将是任何人都可以对谷歌说出任何话,它就能根据我们的指令生成全新的音乐。
The outcome, if the project went well, would be the ability for any of us to say anything to Google, and it would generate brand new music based on our instructions.
在2023年1月,谷歌发布了一个原型系统,名为Music LM,一个从语言生成音乐的工具。
And in January 2023, it released a prototype, Music LM, a language-to-music generator.
好的,谷歌。
Okay, Google.
用人工智能生成电子舞曲风格的手风琴音乐。
Use AI to generate techno accordion music.
好的。
Okay.
使用人工智能生成带手风琴的电子舞曲音乐。
Using AI to generate techno music with an accordion.
好的,谷歌。
Okay, Google.
给我播放上世纪五十年代俱乐部的音乐。
Play me music from a club in the nineteen fifties.
好的。
Okay.
使用人工智能生成上世纪五十年代俱乐部的音乐。
Using AI to generate music from a club in the nineteen fifties.
好的。
Okay.
那真的太奇怪了。
That was really bizarre.
我不知道该怎么评价这个。
I don't know what to say about that one.
我不确定那到底是什么。
I'm not sure what that was supposed to be.
这是音乐哲学家罗宾·詹姆斯。
This is Robin James, a philosopher of music.
是的。
Yeah.
我觉得它试图融合杰瑞·李·刘易斯和弗兰克·辛纳屈的风格。
I feel like it was trying to combine, like, Jerry Lee Lewis and Frank Sinatra.
或者类似的东西。
Or something.
对。
Right.
好的。
Okay.
好的,谷歌。
Okay, Google.
现在播放六十年代俱乐部的音乐。
Now play music from a club in the sixties.
也许
Maybe
可能有一点冲浪吉他。
there's a little, like, surf guitar.
也许有一点塞尔日·甘斯布的味道。
Maybe there's a little, like, Serge Gainsberg.
摩德?
Mod?
是的。
Yeah.
我的意思是,这是人工智能。
I mean, it's AI.
你得记住。
You gotta remember.
根本没有人类做过这个。
Like, no no human being did this.
对吧?
Right?
嗯,这确实表明它是在这些广泛的数据库上进行训练的。
Well, yeah, it just shows that, like, it's being trained on, you know, these sort of broad datasets.
来自《Slate》,这里是 Hi-Phi Nation,用故事形式呈现的哲学。
From Slate, this is Hi-Phi Nation, philosophy in story form.
来自普林斯顿大学的录音,以下是巴里·兰姆。
Recording from Princeton University, here's Barry Lam.
我其实挺喜欢的。
I actually like it.
对于AI生成的、完美模仿人类音乐的作品,我并没有觉得有什么特别的审美价值。
There's nothing particularly aesthetically interesting to me about AI generated music that's a perfect copy of human made music.
那我为什么要听这个呢?
Because why would I listen to that?
我只会听人类创作的东西。
I'll just listen to the human stuff.
但那种 glitchy、怪异、机械感十足,完全不像人类会创作的东西呢?
But glitchy, weird, robotic stuff that doesn't sound like anything humans would make?
听这种音乐并思考它,挺有趣的。
That's fun to hear and to think about.
根据 Google Music LM 的说法,那是手风琴说唱。
That was accordion rap according to Google Music LM.
我希望 AI 音乐研究者能更多地关注这一点。
I wish AI music researchers would focus on that more.
你可能会称之为独特的机器人 glitch 美学。
What you might call distinctively robot glitch aesthetics.
嗯,今天的节目是关于机器生成的音乐。
Well, today's show is about machine generated music.
AI 音乐并不仅仅是一种东西。
There isn't just one thing that is AI music.
目前有数百个不同的项目正在同时进行,试图利用机器分析、重组和组织声音,使其成为我们能识别为音乐的形式。
There are hundreds of different projects going on simultaneously that are trying to use machines to analyze, recombine, and organize sounds into something we recognize as musical.
对于当今大多数人工智能技术,我远非一个乐观者。
I'm nowhere near an AI optimist about most AI technologies today.
但在音乐领域,我认为它有潜力释放人类的大量音乐潜能。
But with music, I think it has the promise to unlock a lot of musical potential from humans.
所以今天,我们将探讨三种不同类型的人工智能音乐项目。
So today, we're going to look at three different kinds of AI music projects.
而且我会做一件在 Hi-Phi Nation 中很少见的事。
And I'm going to do something rare for Hi-Phi Nation.
我会对每个项目给出我的看法。
I'm actually going to give you my take about each project.
它们对音乐创造力未来意味着什么。
What they mean for the future of musical creativity.
如果音乐产业巨头不先起诉将其扼杀的话。
If the music industrial complex doesn't sue it out of existence first.
这是《拉斯维加斯的那个夜晚》,由一个深度伪造AI——图派克·沙库尔演唱。
This is That Night in Vegas featuring a deepfake AI Tupac Shakur.
这首歌由一位在线艺术家创作并制作,我只知道他在YouTube上的名字叫‘嘻哈智慧’。
It was written and produced by an online artist who I only know as hip hop intelligence on YouTube.
歌词、音效制作,甚至说唱的节奏和风格都出自他手,但声音是图派克·沙库尔的。
The lyrics, the sound production, even the rapping and flow are all his, but the voice is Tupac Shakur's.
这个
The
这段作品是想象图派克在演唱一首关于他本人死亡当晚的歌曲,借助AI语音模拟技术使其成为现实。
piece is an imagination of Tupac rapping a track about the night of his own death, made real by an AI vocal emulator technology.
语音模拟技术通过训练来模仿某个人声音的音色。
Vocal emulators train and mimic the timbre of an individual's voice.
因此,当你——作为艺术家或制作人——对着麦克风在数字录音软件中演唱或说唱时,你的声音音色会被深度伪造的声音所取代。
So when you, the artist or producer, sing or rap into a microphone into your digital recording software, the timbre of your voice gets replaced by that of the deepfake.
但你所说的其他一切内容都保持不变。
But everything else you spoke stays the same.
歌词、旋律、节奏都完全保留了你原本的样子。
The words, the notes, the rhythm.
嘻哈智能(Hip hop intelligence)这位创作者完完全全是用自己的声音,像操作乐器一样呈现出图派克·夏库尔的嗓音,就像任何音乐人都能把其他各类声音当作乐器来演奏一样。
Hip hop intelligence is in all respects using his own voice to play Tupac Shakur's voice like an instrument, in the same way any musician can play any other sounds as an instrument.
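The narration above describes what an emulator preserves and what it replaces. Here is a conceptual sketch of that split, not any real emulator's API; `Performance` and `swap_timbre` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, replace

# Conceptual sketch only: no real vocal emulator exposes this interface.
# It illustrates the claim above: the emulator replaces timbre, nothing else.

@dataclass
class Performance:
    words: str      # lyrics, preserved
    pitches: list   # melody, preserved
    rhythm: list    # timing, preserved
    timbre: str     # the one attribute the emulator replaces

def swap_timbre(perf: Performance, target_timbre: str) -> Performance:
    """Replace only the timbre; words, pitches, and rhythm pass through."""
    return replace(perf, timbre=target_timbre)

# The producer sings the track in their own voice...
original = Performance("Hey There Delilah", [64, 62, 60], [0.5, 0.5, 1.0],
                       timbre="producer's own voice")
# ...and the emulator swaps in the deepfake timbre.
deepfake = swap_timbre(original, "Kanye West voice model")
```

This framing is also why the legal question is hard: everything the producer owns (words, notes, rhythm) survives the swap, while the one replaced attribute is the part associated with the celebrity.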
如果把最终完成的这首曲目当成是AI生成的作品,那就大错特错了。
It's a mistake to think of the resulting track as an AI generated track.
这是一首由人类创作完成的曲目,只是用到了语音模拟器这一种AI技术而已。
It's a human produced track using a vocal emulator, which is one kind of AI technology.
还有另一部由嘻哈才子(Hip Hop Intelligence)制作的作品,用了同样的方式生成埃米纳姆的声音。
Here's another piece produced by Hip Hop Intelligence playing Eminem's voice in the same way.
还有一首由一位名叫伊兹·比弗(Yeezy Beaver)的人制作的曲目。
And here's a track produced by someone named Yeezy Beaver.
里面是坎耶·韦斯特的嗓音,翻唱了纯白色乐队(Plain White T's)的热门歌曲《你好,迪莉娅》(Hey There Delilah)。
It's Kanye West's voice singing the Plain White T's hit Hey There Delilah.
目前这类新奇的创作都是游离在法律之外的。
These kind of novelty productions currently operate outside of the law.
目前还没有出台禁止这类行为的法律,而且人们还没有从这些作品中赚到钱,因此还没有任何艺术家或唱片公司提起诉讼。
No legislation prohibiting them has been issued, and people aren't really making money from the productions yet, so no artist or record label has sued.
但当诉讼最终到来时,将需要提出一种形而上的论点,来界定语音模拟的性质。
But when the lawsuit eventually comes, there's going to need to be a kind of metaphysical argument to be made about how to classify vocal emulation.
与许多人工智能技术一样,语音模拟器处于与以往事物之间的模糊地带。
Like a lot of AI, vocal emulators occupy an in between space from what has come before.
如果坎耶·韦斯特的声音被采样并在歌曲中重放,这显然是使用了属于坎耶或采样录音版权所有者的东西。
If Kanye West's voice were sampled and replayed in a track, that is clearly the use of something owned by Kanye, or whoever owns the sampled recording.
另一方面,如果一个非常出色的坎耶模仿者,通过数小时聆听他的声音来学习模仿,然后翻唱了《Hey There, Delilah》。
On the other hand, if a really good Kanye impersonator, who learned to mimic Kanye's voice based on hours and hours of listening to it, then proceeded to cover Hey There, Delilah.
那么这就属于翻唱。
Well, then it's a cover.
坎耶对这种声音没有任何所有权主张。
Kanye would have no claim to ownership of that voice.
这显然是另一个人的声音,无论模仿得多么逼真。
It's clearly someone else's voice, no matter how convincing.
这不仅仅是理论上的问题。
This isn't just theoretical.
歌手马克·马特尔以模仿弗雷迪·默丘里嗓音的能力闻名,以至于在皇后乐队传记电影《波西米亚狂想曲》中,他们直接使用了他的声音来替代弗雷迪的声音。
Singer Marc Martel is famously so good an impersonator of Freddie Mercury's voice that they actually used his voice as Freddie's voice in the Queen biopic Bohemian Rhapsody.
这是马克·马特尔在演唱,而不是弗雷迪·默丘里。
This is Marc Martel singing, not Freddie Mercury.
语音模拟技术正好处于采样和模仿者之间。
Vocal emulators are exactly in between a sample and an impressionist.
机器确实通过采样来学习如何模仿坎耶。
A machine did take samples to learn how to mimic Kanye.
但模仿本身并不是直接从采样中生成的。
But the mimicry itself isn't generated from the samples.
它是在生成全新的声音,就像任何人的嗓音一样。
It's making new sounds just like any person's voice.
它是一个机器语音模仿者。
It's a machine vocal impressionist.
所以这有点像采样,但又不是。
So it's a little bit like sampling, but not.
也有点像学习模仿某人,但又不是。
And a little bit like learning to impersonate someone, but not.
这是AI版的弗雷迪·墨丘里翻唱《颤栗》。
That's AI Freddie Mercury covering Thriller.
这些曲目的制作人大多并不想欺骗我们或让我们上当。
The producers of these tracks are mostly not trying to fake us out or deceive us.
他们中的大多数都明确表示,这些并不是真正的艺术家。
Most of them are being transparent that these are not the real artists.
而且,任何在世的艺术家都很容易否认这是他们唱的。
And anyways, it's easy for any live artists to deny it's them.
至于已故的艺术家,我们都知道他们不可能唱出新的歌曲。
And dead artists, well, we know they can't sing new music.
这些作品更像是一种音乐领域的同人创作。
These productions are instead a lot like fan fiction for music.
很高兴认识你,你去哪儿了?
Nice to meet you, where you been?
我可以带你见识不可思议的事物。
I could show you incredible things.
魔法、疯狂、天堂、罪孽。富有想象力的制作人和音乐人正试图创造平行世界,在那里弗雷迪·默丘里翻唱了《Thriller》,或者图帕克得以在死后说唱这件事。
Magic, madness, heaven, sin. Imaginative producers and musicians are trying to create parallel worlds where Freddie Mercury did cover Thriller, or where Tupac was able to survive his death to rap about it.
现在你能为阿黛尔写一首歌,并真正听到她演唱的效果,这难道不酷吗?
And how cool is it that it's now possible for you to write a song for Adele and actually hear what it would sound like if she sang it.
这些制作人从这些作品中获利的程度,并不比同人小说作者更多。
And these producers are not making money from these tracks any more than fan fiction authors are.
他们所做的是一种相当纯粹的创意行为,将作品发布出来供我们欣赏、讨论和争论,而不是为了盈利。
They're doing something quite pure creatively, putting work out there to consume for us to talk about and debate, but not for profit.
所以,当然,这种状态不会长久。
So of course, it's not going to last.
因为历史表明,如果有什么会扼杀音乐创作,那并不是新技术。
Because history shows that if anything shuts down musical creativity, it's not new technology.
这是一场诉讼。
It's a lawsuit.
如今在音乐行业赚钱的唯一方式
The only way you can make money in the music industry these days
音乐哲学家罗宾·詹姆斯。
Philosopher of music, Robin James.
不是通过作曲或当现场音乐人。
Is not by composing or being a gigging musician.
而是通过拥有歌曲版权。
It's by owning copyright to song catalogs.
这就是为什么你会看到贾斯汀·比伯刚刚以大约2亿美元的价格卖掉了他的歌曲版权。
That's why you see Justin Bieber just sold his catalog for, like, $200,000,000.
泰勒·斯威夫特卖掉了她的版权。
Taylor Swift sold her catalog.
鲍勃·迪伦卖掉了他的版权,而这些版权大多都卖给了一家名为 Hipgnosis 的公司。
Bob Dylan sold his catalog, and they're mostly going to this firm called Hipgnosis.
Hipgnosis 本质上是通过收购这些歌曲目录并授权版权来赚钱。
Hipgnosis is basically making money buying up these song catalogs and licensing the copyright.
如今在音乐行业赚钱的唯一方式,就是拥有会增值的资产。
The only way to make money in the music industry these days is by owning assets that appreciate.
这些资产所有者将诉讼视为从其拥有的知识产权中获利的另一种方式。
These asset owners see lawsuits as another way to make money from the IP that they own.
在这种背景下,我认为我们对AI语音模拟技术的发展方向已经有了相当清晰的认识。
In this context, I think we have a pretty good sense of where AI vocal emulation technology is headed.
一个人声音的音色很可能将成为一种数字资产,可以被拥有、授权、购买,并成为收入来源。
The timbre of a person's voice is probably going to be a kind of digital asset, something that can be owned, licensed, purchased, and a source of revenue.
最好的情况是,这些收益会归那些声音被模仿的艺术家所有。
At best, it'll go to the artists whose voices are being mimicked.
但更可能的是,最终这些资产会被拥有更多资金的公司收购,它们会建立一个名人声音组合,通过授权获取被动收入。
But in all likelihood, it'll end up being owned by corporations who have far more money to acquire a portfolio of celebrity voices to license out for passive income.
目前,AI语音模拟还处于早期阶段,每个人的语音都仍是开源的。
Right now, it's so early in the era of AI voice emulation that everyone's voice is still open source.
所以目前最具创造力的活动正在发生。
So the most creativity is going to be happening now.
我认为,几年后,语音模拟技术将走向专业化、工业化和商业化。
My take is, in a few years, vocal emulator technology is going to be professionalized, industrialized, and monetized.
而受益者将不会是业余制作人。
And the beneficiaries are not going to be amateur producers.
所以现在就尽情享受AI制作的Kanye翻唱Taylor Swift吧。
So enjoy your AI Kanye covering Taylor Swift for now.
这是AI音乐技术的第一种类型。
That's AI music technology number one.
Hi-Phi Nation 将在这些广告后继续播出。
Hi-Phi Nation will return after these
广告。
messages.
现在我们来谈谈AI音乐技术的第二种类型:完全生成式AI音乐。
Now let's talk about AI music technology number two, fully generative AI music.
如果你把音乐看作是一系列写在乐谱上的音符,那么深度学习早在很久以前就解决了让音乐听起来像人类创作的问题。
If you wanted to treat music as a series of notes on a page, then deep learning solved the problem of making music sound like human compositions long ago.
这是来自索尼CSL(索尼位于巴黎的声音研究实验室)的 DeepBach,它诞生于2016年。
This is DeepBach from Sony CSL, one of Sony's sound research labs in Paris, and it's from 2016.
DeepBach 是众多多年来能够生成巴赫风格作品的深度学习模型之一。
DeepBach is one of many deep learning models that has been able to generate Bach-sounding compositions for years.
这是AI莫扎特,一个完全由AI生成的、基于莫扎特钢琴奏鸣曲的创作。
And here's AI Mozart, a completely AI generated composition based on Mozart's piano sonatas.
在很大程度上,人工智能正在向我们揭示人类音乐创作实际上是如何运作的。
To a large extent, AI is revealing to us how most human music making actually does work.
这是明尼苏达州立大学穆尔黑德分校的音乐哲学家西奥多·格雷西克。
This is Theodore Gracyk, philosopher of music at Minnesota State University Moorhead.
其中所谓的创造力水平相当低。
There's a fairly low level of what we might call creativity in it.
格雷西克认为,和许多人一样,像莫扎特、巴赫以及当今的词曲作者这样的音乐创作者似乎都遵循着相似的职业路径。
Gracyk thinks, like many people do, that music composers like Mozart, Bach, and the songwriters of today all seem to follow a similar career path.
你开始职业生涯时,偶然发现了一种新的作曲方式。
You start your career by stumbling upon some new way of writing music.
然后
And then
在某个阶段,他只是在重复自己。
After a certain point, he's just copying himself.
莫扎特只是在机械地创作。
Mozart is just writing by rote.
他只是在自动驾驶状态下工作。
He just works on autopilot.
没有任何原创的东西产生。
Nothing original happens.
他的创作灵感已经停止了。
His writing creativity has stopped.
他只是在自我抄袭。
He's just self plagiarizing.
这听起来像是历代任何评论家对从巴赫到保罗·麦卡特尼等知名人物的牢骚。
This might sound like a grumpy complaint from any critic over the ages about anyone famous from Bach to Paul McCartney.
但格雷西克认为,在人工智能时代,这种抱怨已从主观的价值判断转变为实际上有科学依据的论断。
But Gracyk thinks, in the age of AI, this kind of complaint goes from subjective value judgment to actually scientifically supported.
否则,你如何解释巴赫或莫扎特的作品为何对机器模型来说如此容易生成?
How else could you explain, for example, how Mozart or Bach compositions can be so easy for a machine model to generate?
在巴赫和莫扎特的心中,他们一定是发现了某种模式或模板,然后围绕这些模式和模板来谱写音符。
In the minds of Bach and Mozart, it must be that they came across a pattern or template, and then just wrote notes around that pattern and template.
这难道就是人类创造力的运作方式吗?
This is just how human creativity tends to work?
这正是人工智能在研究巴赫和莫扎特时所做的事情。
It's exactly what the AI does when it studies Bach and Mozart.
这是 OpenAI 在2019年推出的 MuseNet。
This is OpenAI's MuseNet from 2019.
这也是一个由音符生成的作品。
This is also a notes generated piece.
这次,它正在生成三组重叠的音符,组成爵士乐编曲,包括钢琴、鼓和贝斯。
This time, it's generating three overlapping sets of notes into a jazz arrangement, piano, drums, and bass.
这是更复杂的层次,不仅为一种乐器,而是为多种乐器生成音符。
This is the next level of complexity, generating notes not just for one instrument, but for many.
音符在乐谱上的模式之所以能被AI模型轻松学习,是因为这些信息实际上非常简单,甚至比自然语言还要简单,而自然语言的词汇量高达数十万。
The reason that the patterns from notes on a page are so easily learned in an AI model is because the information is actually quite simple, even simpler than natural language, which has a vocabulary of hundreds of thousands.
在音乐中,你最多只有12个音符,分布在七个八度以内,节奏模式在每分钟80到140拍之间。
In music, you have 12 notes in at most seven octaves and rhythmic patterns between eighty and one hundred and forty beats per minute.
爵士钢琴音符与蓝草钢琴音符之间的差异,甚至对人类来说也很快就能分辨出来。
The difference between jazz piano notes and bluegrass piano notes appears very quickly even to human ears.
然而,生成乐谱对AI研究人员来说还不够具有技术挑战性。
Generating compositions, though, was not enough of a technical challenge for AI researchers.
在乐谱上生成音符并不等于生成声音形式的音乐。
Getting notes on a page isn't generating music as sound.
这只是在生成抽象的音乐信息,而我们早已知道计算机擅长做这类事情。
It's generating music as abstract information, something we already knew computers were good at doing.
你听到的是由人类演奏或由计算机生成的MIDI声音。
What you're hearing are sounds played by a human or by computer generated MIDI.
让AI完全生成作曲和声音,这正是Google Music LM当时试图实现的目标。
Getting everything generated by an AI, composition and sound, is what they were trying to do at Google Music LM.
好的。
Okay.
七十年代的俱乐部。
Club in the seventies.
好的。
Okay.
这会非常有趣。
This will be super interesting.
这个更好。
That one's better.
是的。
Yeah.
是的。
Yeah.
作为
As
一首音乐作品。
a piece of music.
但它作为放克迪斯科风格更易理解。
It's more legible as something funk disco.
但同样,在七十年代,这只会出现在某些俱乐部、某些地方。
But, again, in the seventies, that's gonna be certain clubs, certain places.
是的。
Yeah.
对吧?
Right?
你知道的。
You know?
这并不是CBGBs,那也是一个俱乐部
It's not CBGBs, which was also a club
是的
Yeah.
在七十年代。
In the seventies.
好的。
Okay.
那我们来谈谈八十年代。
Let's do eighties.
显然在尝试引用新的方式。
So clearly trying to reference new ways.
是的。
Yeah.
这里那里有一点Blondie的影子。
There's a little blondie here and there.
我不
I don't
知道。
know.
是的。
Yeah.
是的。
Yeah.
我们会播放九十年代一家俱乐部的音乐。
We will play music from a club in the nineties.
那是浩室音乐。
That's house.
是的。
Yeah.
有点儿。
Sort of.
是的。
Yeah.
是的。
Yeah.
但再说一遍,这在某种程度上确实偏向了当时主要发生在欧洲的现象。
But again, it's sort of definitely privileging what's going on, I would say primarily in Europe at that point.
因为在九十年代,电子舞曲虽然在芝加哥和底特律也能找到,但人们通常把过去十年称为电子舞曲在美国爆发的十年。
Because in the nineties, electronic dance music, I mean, you'd find it in Chicago and Detroit, but, you know, they talk about the last decade as the decade that electronic dance music broke the US.
所以,是的,这再次体现了一种特定的文化表征,反映了那个时代的特征。
So, yeah, again, it's sort of a specific kind of cultural representation of what what that era was.
好的。
Okay.
我们试试两千年代吧。
Let's try the two thousands.
这让我笑了,因为听起来像是"独立邋遢风"(indie sleaze),现在人们正是用这个词来指代两千年代的复古潮流。
That just made me laugh because it sounds like indie sleaze, which is what they're calling the sort of two thousands retro trend right now.
这完美地概括了今天人们认为那个时代音乐听起来的样子。
And it's a perfect encapsulation of what people today think music in that era sounded like.
但同样,这主要指的是白人。
But, again, that would be primarily, like, white people.
所以,独立邋遢风潮流其实就是对 LCD Soundsystem 声音的怀旧。
So the indie sleaze trend would be, like, nostalgia for LCD Soundsystem.
对吧?
Right?
再想想,两千年代正是炫富说唱的年代。
Think about it too, in the two thousands, this is the era of bling rap.
所以,如果你去一家嘻哈俱乐部,那里的音乐根本不像这样。
So if you went to a hip hop club, that doesn't sound anything at all like that.
你想试试垃圾摇滚吗?
Do you wanna try grunge?
好啊。
Sure.
是的。
Yeah.
我不知道。
I don't know.
我的意思是,直到今天人们还在为此争论。
I mean, like, people argue about this to this day.
比如,人们会说,哦,不。
Like, people say, oh, no.
你知道的,碎南瓜乐队,那不算垃圾摇滚。
You know, Smashing Pumpkins, that ain't grunge.
那比我原本归类的位置更偏金属一些。
That was a little bit more metal than where I would have placed it.
但你怎么看?
But what do you think?
我觉得至少不比尼克尔巴克差。
I think it's at least as good as Nickelback.
不。
No.
不。
No.
但没错,它听起来很像你在另类摇滚电台听到的音乐,而这类音乐是由另一个排行榜统计的,是的。
But, yeah, it sounds a lot like what you'd see on alt rock radio, which is measured by a separate chart
而不是热门另类歌曲。
Than hot alternative songs.
所以它无疑属于这个范畴。
So it's definitely in the universe.
我认为所有这些完全由AI生成的音乐的目的并不是要创作出一首新的流行金曲、另类摇滚金曲、R&B金曲或类似的东西。
I don't think the goal of any of this completely AI generated sound stuff is to create a new pop hit or an alt rock hit or R and B hit or anything of the kind.
如果真有市场的话,我猜会是像情绪氛围类的音乐。
If there's a market for it, my guess would be it'd be ambient music, like moods.
我们来听一些情绪类别的音乐吧。
Let's listen to some of the mood genres.
有几个。
There's a couple.
我们先从放松音乐开始。
Let's start with chill out.
好的。
Alright.
我们来听一下慢节奏的。
Let's do down tempo.
是的。
Yeah.
我觉得这种声音会非常受欢迎。
I think that's gonna be a sound that will be really popular.
对吧?
Right?
我打赌你会看到很多AI制作的YouTube频道,你可以直接听那种低保真、稳定的节奏,用来学习或放松。
I bet you'll get a bunch of, like, AI YouTube channels where you can just listen to, you know, like, the lo-fi steady beats to study and relax to.
所以我觉得这有无限的变现潜力。
So I think that's infinitely monetizable.
对吧?
Right?
因为对于放松音乐来说,人们其实并不是在认真听它。
Because for chill music, the point is people aren't really listening to it.
对吧?
Right?
他们只是被它分散了注意力。
They're just distracted.
而且有很多内容是由业余爱好者制作的,他们需要音乐。
And there's a lot of content being generated by amateurs that requires music.
你知道,所有在YouTube上做健身视频的人,都想吸引关注,他们就需要健身音乐。
You know, all the people who are doing their little exercise regimes on YouTube, you know, like to try to get attention, like they need exercise music.
所有做播客的人都需要背景音乐,所有做YouTube频道的人也都需要。
All the people doing podcasts need like soundtracking music, all the people doing YouTube channels.
我不清楚在这样的规模下,这些人都怎么负担得起请专业音乐人作曲的费用。
And I don't know, at that scale, how all these people can afford an actual musician to compose.
是的。
Yeah.
对。
Right.
你指出,存在一个针对非版权音乐的市场,这些用户根本不会为他们业余的播客、健身视频或其他内容支付定制创作的费用。
You're pointing out that there's a market for non copyrighted music from people who would never pay either for a bespoke commission for their amateur podcast or exercise video or whatever.
这个市场部分并不一定是坏事。
This part of the market is not necessarily bad.
对吧?
Right?
这某种程度上正是科技的叙事。
And this is sort of the narrative of tech.
通过简化某些事物,比如音乐创作,你让DIY创作者更容易获得这些资源。
In deskilling certain things, like music composition, you make it more available to DIY creators.
我认为这本身并不坏。
That's, I think, not inherently bad.
这是当你让谷歌的语言模型生成梦幻流行音乐时所产生的曲目。
This is the track that Google LM generates when you ask it to create dream pop.
关于人工智能生成的音乐,一个真正可听的观点是,它是一种大众艺术形式。
One take about AI generated music that's actually listenable is that it's a populist art form.
西奥多·格雷西克。
Theodore Gracyk.
大多数人会把音乐和某种活动结合在一起,比如跑步、在健身房锻炼、在办公室工作、做作业,等等。
Most people conjoin music with something else, some activity, whether that's taking a run, working in the gym, working in their office, doing their homework, whatever.
将某种人类活动变得特别或难忘,似乎是音乐真正的功能。
Taking some human activity and making it special or memorable seems to be the real function of music.
而这正是人工智能机器人擅长的。
And this is what the artificial intelligence bots are great for.
它们实际上在创作不会干扰人的音乐,因此非常适合作为其他活动的背景音效。
They're actually creating music that is nondisturbing, so it functions really well as background enhancement to something else.
这是《凛冬的寒风》,由 AIVA 创作的一首完全由AI生成的钢琴曲。
This is Winds of Winter, a completely AI generated piano piece by AIVA.
它的目标是让人们购买它作为背景音效,而不是让音乐爱好者在音乐厅里欣赏。
The goal is for people to buy it as a piece of soundtracking, not a piece for music lovers to appreciate in a concert hall.
坐在音乐厅里专心聆听器乐,这种行为在文化上显得很奇怪。
The process of actually sitting in a concert hall, attending to instrumental music is culturally weird.
那种静坐聆听音乐并全神贯注地感受它,这种观念实际上是十九世纪才出现的历史异常。
The idea of sitting there listening to music and attending to it with your full attention, this is a historical anomaly created basically in the nineteenth century.
这并不是普通人与音乐互动的方式。
It's not how the average person relates to music.
西奥多·格雷西克认为,就美学和创造力而言,AI生成的音乐与人类创作的绝大多数音乐并无本质区别。
Theodore Gracyk thinks that AI generated music is pretty much like most music that humans generate, in terms of aesthetics and in terms of creativity.
它不会比普通人创作的音乐更好或更差。
It's not gonna be any better or any worse than what the typical human makes.
因为在最好的情况下,它只是在复制普通人所创作的东西。
Because in the best case, it is making what the typical human makes.
这个AI机器人将熟悉相当于人类有史以来经历的20到50倍的内容。
The AI bot here is going to familiarize itself with something like 20 to 50 times what any human being has ever experienced.
明白吗?
Okay?
然后它所做的只是找出重复的模式。
And then all it's gonna do is find the recurring patterns.
这其实和我们的大脑长期以来被塑造去做的基本音乐图式——通常被称为蓝图——是一回事。
Well, that's the same thing our minds have been wired to do with basic musical schemas, or blueprints as they're often called.
对吧?
Right?
它创作音乐的方式,和人类创作音乐的方式差不多。
It's making the music more or less the same way that humans make music.
另一方面,罗宾·詹姆斯认为,AI生成的音乐作为一种政治和经济对象是有趣的。
Robin James, on the other hand, thinks AI generated music is interesting as a political and economic object.
但作为一种将取代人类音乐创造力的技术呢?
But as a technology that's going to take over human musical creativity?
不可能。
No way.
AI技术爱好者和末日论者对这一点完全错了。
AI tech enthusiasts and doomsayers are completely wrong about that.
我认为他们展现的是音乐学家所谓的‘仅关注音符’的音乐观,这种观点把音乐当作一种文本,认为这便是全部。
I think they exhibit what musicologists will call the just the notes approach to music, which sort of looks at music as like a text and that's all that matters.
而那种认为AI能以某种方式变得比个人更具创造力的观点,实际上是将"英雄式的个人"以及"创作是个体行为"的理念置于中心,但事实并非如此。
And that perspective, that AI can somehow become more creative than an individual person, is centering this idea of the heroic individual and the idea that creation is an individual activity, and it's not.
创作是一种参与性的活动。
It's a participatory activity.
我通过了解其他优秀的音乐作品,并与它们对话,来创作出好的音乐。
I make good music by knowing what other good music sounds like and being in conversation with other works.
如果你把音乐看作一种人们共同参与的实践,那么AI音乐不会影响音乐体验中的这一部分,尤其是当它作为一种私有财产、由私营企业掌控时,它永远无法参与到音乐的协作性社会过程中。
If you think of music as a practice that people do together, AI music is not gonna impact that part of the musical experience, especially as long as it's, like, a privately owned thing. Right? Something that's creating private property for private businesses is never gonna be engaged in the collaborative social process of music.
罗宾·詹姆斯是一位音乐哲学家。
Robin James is a philosopher of music.
我们正在聆听由OpenAI的Jukebox生成的一段完全由机器创作的、模仿艾拉·费兹杰拉德风格的古典爵士乐。
We're listening to an entirely generated piece of classical jazz in the style of Ella Fitzgerald from OpenAI's Jukebox.
音符、音乐、她的嗓音,所有这些都由机器生成。
The notes, the music, the sound of her voice, all of it is created by a machine.
机器甚至没有输入过歌词。
The machine didn't even get fed lyrics.
你必须在聆听时努力分辨歌词,但其中大部分可能根本听不清。
You have to try to make them out as you listen to it, and probably most of it is indiscernible.
就我而言,我认为某些AI生成的音乐在美学上是有趣的、独特的,但只有当它与任何熟悉的声音相去甚远时才如此。
As for me, my take is that some AI generated music is aesthetically interesting, uniquely interesting, but only when it strays far from sounding like anything familiar.
因为音乐的突破正是在这里发生的。
Because this is where musical breakthroughs happen.
当最初听起来像是无法忍受的噪音,随后却逐渐演变为有意义的旋律时。
When at first things sound like unbearable noise, and then becomes coherent as something meaningful.
在我看来,Google Music LM最有趣的地方在于它能将文字转化为声音,这意味着你也可以将声音转化为文字。
The most interesting thing to me about Google Music LM is that it translates words into sound, which means that you can translate sounds into words.
当这一点成为可能时,就意味着你可以将任何能用文字表达的东西转化为音乐。
And when that's possible, that means you can translate anything into music that you can also translate into words.
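The "language as pivot" idea above can be sketched in a few lines. Both functions below are hypothetical stand-ins invented for illustration: `describe_painting` stands in for any source of a verbal description (a Wikipedia summary, a captioning model), and `text_to_music` stands in for a MusicLM-style generator; neither is a real API.

```python
# Sketch of the "language as pivot" idea: any medium that can be described
# in words can be routed into a text-to-music model. Hypothetical stand-ins.

def describe_painting(title: str) -> str:
    """Stand-in for a verbal description, e.g. a Wikipedia summary."""
    summaries = {
        "The Starry Night": ("A swirling night sky over a quiet village: "
                             "turbulent, luminous, and melancholic."),
    }
    return summaries.get(title, "an unknown painting")

def text_to_music(prompt: str) -> str:
    """Stand-in for a text-to-music generator; returns a placeholder clip."""
    return f"<music conditioned on: {prompt}>"

# Painting -> words -> music: language is the intermediate representation.
prompt = describe_painting("The Starry Night")
clip = text_to_music(prompt)
```

The point is architectural, not musical: once words sit in the middle, anything describable (a painting, a smell, an embrace) becomes a valid input to a music generator.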
你正在聆听文森特·梵高所画的《星夜》。
You're listening to the painting The Starry Night by Vincent van Gogh.
没错。
That's right.
你正在聆听这幅画。
You're listening to the painting.
这是维基百科上对这幅画的文字描述。
The painting as described in words on Wikipedia.
然后这些文字被输入到谷歌LM中,转化为音乐。
And then the words were fed into Google LM to translate into music.
再听一遍。
Here it is again.
这是爱德华·蒙克的《呐喊》所发出的声音。
This is the sound of The Scream by Edvard Munch.
这是Google LM对萨尔瓦多·达利的《记忆的永恒》的诠释。
And this is Google LM's interpretation of Salvador Dali's The Persistence of Memory.
这幅画描绘了融化的时钟。
It's the painting with melting clocks.
一个时钟融化在桌子边缘,另一个垂挂在树枝上,还有一盘子正被蚂蚁群围攻。
One clock is melted over the edge of a table, another is draped over a branch, and there's a plate being swarmed with ants.
这幅著名的超现实主义画作灵感源于达利在食用了一块变质的卡门贝尔奶酪后产生的幻觉。
This famous surrealist painting was inspired by Dali hallucinating after eating a rotten piece of Camembert cheese.
无论如何,我刚才说的整个描述以及更多内容都被输入到Music LM中,生成了这幅画的声音。
Anyhow, that whole description I just said and more was fed into Music LM to generate this sound of the painting.
它可能不会成为热门单曲,但这些作品所表达的理念非常迷人。
It may not be a radio hit, but the idea these pieces express is fascinating.
这个理念是:尽管语言作为描述其他人类表达媒介(如视觉艺术和音乐)的方式并不完美。
It's the idea that however imperfect language is as a meaningful description of other mediums of human expression, like visual art and music.
但它可以充当终极的翻译者。
It can serve as an ultimate translator.
一幅画听起来有一种近似的方式。
There is an approximate way a painting can sound.
一首乐曲看起来也有其近似的方式。
There's an approximate way for a piece of music to look.
甚至可能通过我们对气味和触感的描述来生成音乐。
And there may even be music generated from our description of smells and touches.
谷歌,一个拥抱的声响是什么样的?
Google, what's the sound of a couple embracing?
也许关于AI艺术生成器的正确观点,不仅在于我们为艺术家们创造了一个令人恐惧的新政治和经济现实。
Maybe the right take about AI art generators is not only that we've got a scary new political and economic reality for artists.
这一点确实存在。
There's that.
但也许我们还发现了一种工具,可以将视觉艺术转化为音乐艺术,或将音乐艺术转化为视觉艺术,或把一种艺术形式转化为另一种艺术形式。
But maybe we've also discovered a tool to translate visual art to musical art, or musical art to visual art, or any one kind of art into another kind of art.
也许这将有助于解锁我们此前从未见过的全新创意项目。
And maybe that will serve to unlock new creative projects we haven't seen before.
好的,谷歌,放点极简浩室音乐,带我进入广告时段。
Okay, Google, play me into the commercial break with some minimal house music.
我想向你们介绍的最后一种AI音乐技术是音乐即兴创作。
The final piece of AI music technology I want to introduce you to is musical improvisation.
这是一个由机器生成的12小节蓝调独奏,只是作为一项研究项目生成的数十个独奏之一。
This is a machine generated 12 bar blues solo, one out of dozens and dozens of solos generated as part of a research project.
我曾经采访过一位创造力研究者,她认为即兴创作是人类创造力最纯粹的来源之一。
I once spoke to a creativity researcher who told me that she thinks improvisation is one of the purest sources of human creativity.
她指的是所有形式的即兴创作,比如爵士乐、摇滚乐的即兴演奏、自由式说唱、喜剧、雕塑,任何需要人们当场创作的东西。
She meant all improvisation, music like jazz or jamming in rock, freestyle rap, comedy, sculpting, anything that requires people to come up with something on the spot.
如果机器能够像人类一样即兴创作,那么也许我们就更接近于模拟人类的创造力了。
If machines can improvise like humans, then maybe it's one step closer to modeling human creativity.
但真的可以吗?
But can it?
我们来测试一下。
Let's test it out.
这是研究人员阿德南·苏布拉马尼安在C大调中为所有12小节蓝调独奏所使用的伴奏轨。
This is the backing track that researcher Adnan Subramanian used for all of the 12 bar blues solos he generated in the key of C.
我是巴西帕拉伊巴联邦大学的教授。
I'm a professor at the Universidade Federal da Paraíba in Brazil.
我把这段伴奏轨发给了两位业余吉他手,让他们即兴创作一段独奏,以便与阿德南研究项目中生成的机器独奏进行对比。
I sent this backing track to two amateur guitarists, and I had them each come up with a solo on the spot to compare to the machine generated solos that were part of Adnan's research project.
但首先,阿德南做了什么?
But first, what did Adnan do?
在我们的案例中,我们有一个数据库。
So in our case, we had a database.
所以我们使用乐句。
So we work with licks.
我们可以在各处找到这些乐句。
So we can find those licks all over the place.
有一本教科书专门讲蓝调乐句。
There's a textbook dedicated to blues licks.
所以我们的策略是把这些乐句拆解成一小节一个的乐句,并填充到我们的数据库中。
So our strategy was to break those licks down into one-bar licks and populate our database.
从这个包含数百个一小节蓝调乐句的数据库中,阿德南和他的团队尝试生成十二小节的独奏。
And from that database of hundreds and hundreds of one bar blues licks, Adnan and his team tried to generate 12 bars worth of solos.
如果你只是让机器随机生成,最终得到的独奏听起来就像这样。
If you just tell the machine to do it at random, you end up with solos that sound like this.
听起来很糟糕。
It sounds awful.
你如何让机器表现得更好?
How do you get the machine to do better?
这时候,你可以选择两种方法之一。
At this point, you can do one of two things.
你可以让人类坐下来聆听所有可能的独奏,并标记哪些是差的,哪些是好的。
You can tell humans to sit down and listen to all of the possible solos and label which are bad ones and which are good ones.
然后你把所有这些数据交给机器,让它通过数学运算,找出好独奏和差独奏之间的共同点。
And then you give all that to the machine, and it tries to figure out through math, what the good solos have in common and what the bad solos have in common.
这就是机器学习,我们整个季节一直在讨论的这种技术。
That's machine learning, the kind of technology we've been talking about all season.
或者你可以像阿南德那样做,这叫做数学优化。
Or you could do what Anand did, which is called mathematical optimization.
这需要一个人根据音乐理论中对好听音乐的理解,提出一个数学公式,来区分好独奏和差独奏。
This is where a person has to come up with a math formula that separates good solos from bad ones, based on human knowledge, from music theory, of what sounds good.
例如,如果两个音符都在五声音阶中,那么它们就构成了一段好的独奏。
For instance, if two notes are in the pentatonic scale, then they make up a good solo.
你把所有这些规则都编程进机器,然后让机器根据音乐理论生成独奏。
You program all that into the machine, and you have the machine generate solos around the music theory.
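作为一个示意:基于乐理的目标函数可以按独奏与某个音阶的契合度来打分。下面的音阶规则和打分方式只是简化的替代品,并非阿南德的真实公式。
As a hedged illustration, a theory-based objective might score a candidate solo by how well it fits a scale. The scale rule and scoring below are simplified stand-ins, not Anand's actual formula.

```python
# Pitch classes of the C minor pentatonic scale: C, Eb, F, G, Bb.
PENTATONIC = {0, 3, 5, 7, 10}

def score_solo(solo):
    """Fraction of notes that land in the pentatonic scale.

    A real objective would encode many more music-theory rules,
    e.g. how one bar's lick connects to the next.
    """
    notes = [note for bar in solo for note in bar]
    in_scale = sum(1 for note in notes if note % 12 in PENTATONIC)
    return in_scale / len(notes)

# An all-pentatonic solo scores 1.0; an entirely out-of-scale one scores 0.0.
good = [[60, 63, 65, 67]] * 12   # C, Eb, F, G repeated for 12 bars
bad = [[61, 62, 64, 66]] * 12    # Db, D, E, Gb: all outside the scale
print(score_solo(good), score_solo(bad))  # 1.0 0.0
```

由于打分器是一个显式公式,你可以准确指出某一小节违反了哪条规则。
Because the scorer is an explicit formula, you can point to exactly which rule a given bar violates.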
用这种方式与深度学习方式相比,有什么优势?
What is the advantage of doing it this way versus a deep learning way?
是的。
Yeah.
我认为其中一个主要优势是可解释性。
I think one of the main advantages is interpretability.
你可以解释你正在做什么。
You can explain what you're doing.
在阿南德的方法中,他对生成算法有完全的控制权,能够明确告诉你为什么连续两小节算作好的独奏,而另外两小节却不好。
In Anand's way, he has complete control over the generative algorithm, and he can tell you exactly why two consecutive bars count as part of a good solo, and why two other ones are bad.
我们稍后会听一段深度学习生成的独奏。
We're going to look at a deep learning solo later.
对于这项技术,甚至无法理解机器是如何得出这些模式的。
And for that technology, it's not even possible to understand how the machine ended up with the patterns it did.
好的。
Alright.
让我们请出我们的吉他手法布里齐奥和凯沙夫。
Let's bring in our guitarists, Fabrizio and Keshav.
首先,我让法布里齐奥听了一段生成的吉他独奏,但没有告诉他这段独奏是生成的还是凯沙夫演奏的。
First, I had Fabrizio listen to a generated guitar solo without telling him whether the solo was generated or played by Keshav.
我只是需要猜猜这是Keshav还是AI演奏的。
I just need to guess if it was Keshav or the AI.
我不是让你猜这个,但你可以猜。
I am not asking you to guess that, but you may.
我想问的是,作为一名吉他手和音乐爱好者,你觉得这个怎么样?
I'm asking, as a guitarist and an appreciator of music, what do you think of that?
然后如果你愿意的话,可以在最后猜一猜。
And then you can guess if you want at the end.
我不会猜,因为根本不可能猜对。
Well, I'm not going to guess because there's no way of succeeding at that.
我觉得这段有很多有趣的半音进行。
I thought it had a lot of interesting chromatic passages.
它的乐句处理非常均匀。
It's very even in the way it's phrased.
就像很多音符以均匀的间距依次出现,而且重复很少。
It's like a lot of notes that come at even spacing from each other, and there's not much repetition.
一连串很多音符。
Kind of a lot of notes one after the other.
所以如果要我猜的话,我会说那是AI。
So if I had to guess, I would say that was the AI.
我希望我猜对了。
I hope that's right.
但我希望我没有冒犯凯沙夫。
But I hope that I'm not insulting Keshav.
此时此刻,感受一下法布里齐奥的犹豫。
Sit with Fabrizio's indecision at this point.
注意他在尽量不冒犯凯沙夫时有多紧张。
Notice how nervous he is in trying not to insult Keshav.
我让凯沙夫听了一段来自机器的即兴独奏。
I had Keshav listen to a different improvised solo from the machine.
听起来并不差,但两者都有一种奇怪的生硬感。
It doesn't sound bad, but there's something oddly stilted about both of them.
很难用别的词来形容,只能说这种演奏缺乏某种人性的温度。
Like, it's hard to describe it in other terms than, like, the playing lacks a certain kind of humanity.
好的,现在给你们听我们第一个真人即兴独奏的版本。
Okay, so now here's the first of our human improvised solos for you to compare.
伴奏轨和机器生成的那些是一样的。
Same backing track as the machine generated ones.
这听起来不错。
That was nice.
我非常喜欢这个。
I like that a lot.
我猜这是法布里齐奥弹的。
I'm guessing that was Fabrizio.
听起来非常好。
It sounded really good.
所以法布里齐奥,听了刚才那两个机器生成的版本后,你如何评价自己的这段演奏?
So Fabrizio, talk me through how you hear your piece now in light of the two machine generated ones that you just heard.
当我创作这段旋律时,我在想:有哪些乐句是我一定要演奏的?我要在大调上停留哪里,又要在小调上停留哪里?
When I approached my piece, I was thinking, what are a couple of phrases I want to make sure to play and where am I going to live in major and where am I going to live in minor?
所以我主要用C小调,但有一处我突然用了C大调,因为我觉得:管他呢。
So, I mostly live in C minor, but at one point I threw in a C major scale cause I was like, whatever.
我当时考虑的是乐句之间要有停顿,而不是一直不停地弹音符;有些时候只有贝斯轨在演奏,而AI则是连续不断的,你知道的,它几乎用音符填满了所有空间。
And I was thinking in terms of, like, phrases with spaces in between them, so I wasn't playing notes all the time; there were moments where just the bass track was playing, whereas the AI was just continuous, you know, it was filling almost all the space there was with notes.
你绝对能听出来这是人在演奏,而不是AI,因为其中有一种感觉。
So you can definitely tell that that's a person playing as opposed to the AI just because, you know, there's a kind of feeling to it.
这种感觉在蓝调中尤其重要。
That's especially important in the blues.
我认为蓝调是AI最难模仿的东西之一,因为那种感觉或灵魂,说不清道不明,但你就是能听得出来。
I mean, I think of the blues as one of the hardest things for an AI to emulate, because there's something less easy to describe, that feeling or that soul, you know, behind it that you can just kind of hear.
现在我们来听Keshav的即兴独奏。
Now let's play Keshav's improvised solo.
我觉得这非常好听。
I think that's very nice.
而且,另一个轨道里没有的是,这里有一个动机。
And again, what's here that's not in the other track is this motif.
你几乎可以带着这个萦绕在脑海中的旋律想法走出这段独奏,而当我听AI独奏时,我做不到这一点。
And you can almost, like, sort of exit the solo with this melodic idea in your head, which I couldn't do when I was hearing the AI solo.
这并不是我让法布里齐奥和凯沙夫进行的唯一一次即兴演奏。
That wasn't the only improvisation I had Fabrizio and Keshav do.
即兴演奏的目的是给音乐家或机器相同的伴奏轨,看看他们在同一段基础音乐之上能创造出什么。
The idea behind improv is to give a musician or a machine the same backing track and to see what they could come up with on top of the same underlying music.
我还找到了一段由深度学习生成的重金属独奏。
I also found a deep learning generated solo, a heavy metal one.
这一段完全是AI自己学习的。
This one was all AI learned.
没有人给AI任何提示,也没有给它任何数学公式来生成这些独奏。
No person gave the AI any hints, any mathematical formulas to generate the solos.
它只是通过聆听各种重金属独奏来学习,并当场构建出了一段。
It just learned from listening to all manner of heavy metal solos, and constructed one on the spot.
这是AI、Fabrizio和Keshav被要求即兴演奏的伴奏曲。
This is the backing track on which the AI, Fabrizio, and Keshav were asked to improvise.
感谢Vishwanath Subramanian提供这首曲子。
Thanks to Vishwanath Subramanian for that track.
首先是Keshav的一次性即兴演奏。
First, Keshav's one take improvisation.
我只是随便玩玩,你知道的,有点像即兴弹奏,我听到和弦进行后就想,好吧。
I'm kinda just messing around, you know, almost like noodling, and, yeah, you know, I heard the chord progression and I thought, okay.
我知道我想用哪个调式。
You know, here's the mode I wanna use.
是的。
Yeah.
我觉得可能是C弗里吉亚调式,然后我就随便弹弹,飘在上面,因为这段音乐没什么明显的节奏让我抓住,不像布鲁斯那样。
I think it was C Phrygian, and so then, yeah, I'm kinda just messing around, floating over it, because it didn't have much of a groove for me to latch onto, as opposed to the blues.
我的意思是,布鲁斯你真的能感受到,你开始跟着节奏摇摆,开始有感觉,而且会有Fabrizio提到的那种重复感。
I mean, blues, you really get, you know, you start grooving, you start feeling it, and and you get that repetition that Fabrizio was talking about.
你其实是在讲一个故事。
You're kinda telling a story.
这个嘛,我觉得就像是,好吧。
With this one, it was kinda like, alright.
这是一首重金属伴奏,但节奏感不强。
Here's a heavy metal backing track that's not that rhythmic.
所以我只是随便选个音阶,稍微炫一下技。
So I'm just gonna kinda, you know, pick a scale and shred a little bit.
而这是法布里齐奥的版本。
And here's Fabrizio's.
相似之处在于,缺乏明显的节奏可以抓住。
Similarities, in that there's a lack of, like, a rhythm to latch onto.
在法布里齐奥的独奏中,感觉他是在伴奏的节奏之上自由飘荡,尝试了各种不同的东西,甚至在结尾处还带点前卫的风格。
It felt like that led, in Fabrizio's solo, to kind of floating over the rhythm of the track and kind of trying some different things over the course of the solo, you know, even getting a little avant-garde with it at the end.
不过对我们两人来说,虽然我们以不同的方式处理独奏,但最终都在某种程度上与伴奏对抗。
For both of us, though we approached the solo differently, we ended up, in various ways, kind of fighting against the track.
我当时在与伴奏对抗,完全无法融入它的节奏。
I was fighting the track and I couldn't enter into the rhythm of it.
所以我决定根据伴奏中的鼓手在做什么来演奏不同的东西。
And so I was like, I'm gonna play different things depending on what the drummer in the track is doing.
这就是我想要做到的。
And so that's what I aim to do.
现在这是AI的独奏。
And now here is the AI solo.
好的。
Okay.
嗯,这非常有趣。
Well, that that was very interesting.
我的意思是,这根本听不出是吉他声,但又让我想起了《星球大战》里的机器人,你知道的,发出哔哔啵啵的声音。
I mean, it doesn't sound like a guitar at all, but it also kinda reminded me of, like, droids from Star Wars, you know, like, making their beep boop noises.
我在想,AI可能捕捉到了什么样的模式,才让它发出这些声音?
I was thinking what kind of pattern could the AI possibly have picked up that made it make those sounds?
也许是更快的音符?
Maybe faster notes?
但正如Keshav所说,它没有那种攻击性。
But as Keshav was saying, it did not have the attack.
所以它听起来就像是合成器和长笛的某种组合,发出断断续续的声音。
And so it just felt like it was some combination of a synthesizer and a flute making these intermittent sounds.
但它也具备了前一段独奏的特质——那就是它没有方向感。
But it also had that quality that the previous solo had that it wasn't going anywhere.
它并不认为独奏需要有任何发展。
It wasn't thinking of the solo as having any kind of development.
它只是把独奏看作一个可以随意放置音符的空间。
It was thinking of the solo as just a space in which you could just put notes.
无论是旧的还是新的AI独奏,都没有任何结构。
No structure to it on either of the AI ones, old or new.
当你们谈到结构时,在我听来,你们用的是一种定性的、音乐家式的词汇。
When you guys talk about structure, you're talking about it in this kind of qualitative, musician's vocabulary to me.
但我觉得蓝调听起来更好。
But the blues sounded better to me.
但你们还是觉得它没有结构。
But you still thought there was no structure.
你能稍微解释一下这指的是什么吗?
Could you talk a little bit about what that means?
你知道,这并不是像AI所寻找的那种数学意义上的结构。
You know, it's not like a mathematical structure in the sense that like an AI would be, you know, looking for.
对我来说,用叙事的方式,或者说用音乐来讲故事,这样描述是很自然的。
It's natural to me to describe it as like a narrative or sort of telling a story with the music or something like that.
我认为,正是这种特质让一段好的独奏听起来如此具有人性。
And I and I think that's what, you know, makes a a good solo, you know, sound so distinctively human.
主题的发展赋予了它一种叙事性的结构,而不是仅仅背后存在某种数学结构。
The kind of development of a theme gives it a kind of narrative structure, as opposed to just some kind of mathematical structure behind it.
就像情感上的起伏,甚至情节的发展。
Like an emotional arc or even a plot arc.
比如,我们会谈论情节。
Like, we talk about plots.
是的。
Yeah.
类似这样的东西。
Something like that.
这就是为什么人们自然会说,一个好的独奏会在其过程中讲述一个故事。
That's why it's natural to say that, you know, a good solo kinda tells a story over the course of it.
我对它的理解是,从一个音乐想法开始,比如一个能哼唱出来的旋律,然后在发展过程中不断丰富这个想法。
What I think about it is, sort of, starting with an idea, like a musical idea, something you would be able to sing, and then enriching that idea as you develop it.
如果你只是想想歌曲,甚至不必专门考虑吉他独奏。
You don't even have to think about guitar solos if you just take, sort of, songs.
对吧?
Right?
通常会有一段主歌,然后是另一段主歌,主歌的旋律可能相同,但会有一些变化或调整,让你感到熟悉,因为你仍然处于之前类似的位置,但通过加入一些不同的音乐元素来避免单调。
Usually there is a verse, and then there's another verse, and that verse has maybe the same melody but with a few changes or modifications that give you a sense of familiarity, because you're still in kind of the same place that you were before, but you fight off the repetitiveness by adding some different musical elements.
或者在音乐中,有时会有一个陈述,然后紧接着一个反陈述。
Or sometimes in music, of course, you can have like a certain statement being made and then a counter statement.
但AI生成的音乐中缺少了这一点,即使在布鲁斯风格中也是如此,我听到的那段独奏完全没有停顿。
But that was missing from the AI one, even in the blues case, and what I was hearing was that the solo had no spaces.
没有任何地方让吉他手暗示某个主题已经结束。
There were no spots in which the guitarist was signaling that a theme had just been completed.
从这个意义上说,布鲁斯风格的那段音乐也有所欠缺。
In that sense, the Blues one was lacking something as well.
我们目前用来描述人类音乐体验的语言,正在机器音乐时代面临考验。
The language we currently have to describe our experiences of human music is being tested in the era of machine music.
我们在Keshaav和Fabrizio的独奏中听到的是旋律。
We hear a tune in Keshav's and Fabrizio's solos.
当我们听到机器生成的音乐时,听到的只是一串音符。
We hear strings of notes when we hear the machine generated ones.
一个有人性,另一个则缺乏人性。
One has humanity, the other lacks it.
但在那些听起来悦耳或有人性的独奏中,是否存在某种模式?
But is there a pattern in the solos that sound tuneful or in the ones that have humanity?
因为科技乐观主义者告诉我们,如果存在某种模式,机器就能发现它并加以复制。
Because the tech optimists tell us that if there is a pattern, a machine will find it and will be able to replicate it.
还是说我们正在直接发现人类音乐中某种无法复制的东西,一种近乎超自然的、如灵魂般的东西?
Or are we directly finding something unreplicable in human made music, something bordering on supernatural, like a soul?
因为如果真是这样,我们就没什么好担心的。
Because if that's true, we have nothing to worry about.
人工智能永远不会有这种东西。
AI is never going to have that.
这就是第六季《Hi-Phi Nation》的全部内容。
That's it for season six of Hi-Phi Nation.
这一季较短,因为我得写一本书。
It's a shorter season because I have a book to write.
明年留意一下这一点。
Look out for that in the next year.
如果你想获取我们提到的所有技术的链接,请访问网站 hiphination.org,找到本集的节目页面。
If you want links to all of the technologies we've talked about, go to the website, hiphination.org, and look for the show page for this episode.
我会提供谷歌 MusicLM、OpenAI Jukebox 以及其他所有内容的链接。
I'll have links to Google's MusicLM, OpenAI's Jukebox, and everything else.
如果你希望我在未来的季中讨论某个特定话题,也可以通过网站给我发邮件。
If you want me to cover a certain topic in future seasons, you can also email me through the website.
我们第七季再见。
See you in season seven.
《Hi-Phi Nation》由巴里·拉姆制作、撰写和剪辑。
Hi-Phi Nation is produced, written, and edited by Barry Lam.
本季的故事编辑是埃莉诺·戈登-史密斯。
Story editor for this season is Eleanor Gordon-Smith.
在 Slate 播客团队中,艾莉西亚·蒙哥马利担任音频副总裁。
For Slate Podcasts, Alicia Montgomery is VP of audio.
德里克·约翰是叙事类播客的执行制片人,本·里士满是运营高级总监。
Derek John is executive producer of narrative podcasts, and Ben Richmond is senior director of operations.
在 Facebook、Twitter 和 Instagram 上关注 Hi-Phi Nation,账号为 Hi-Phi Nation。
Follow Hi-Phi Nation on Facebook, Twitter, and Instagram at Hi-Phi Nation.
就是 H-I-P-H-I Nation。
That's H-I-P-H-I Nation.
每集的完整文字稿、节目笔记和阅读建议均可在 hiphination.org 上获取。
Complete transcripts, show notes, and reading suggestions for every episode are available at hiphination.org.