好的。
Alright.
欢迎回到《DeNoised》的又一期节目。
Welcome back to another episode of DeNoised.
我们将回顾一下Infinity Fest的一些内容。
We are gonna focus on a bit of a recap of Infinity Fest.
我在那里主持了一个关于4D高斯溅射、AI和虚拟制作的小组讨论。
I hosted a panel there on 4D Gaussian splats and AI and virtual production.
Adi当时无法到场,所以我会边讲边向他和大家做说明。
Adi couldn't make it, so I'm gonna fill him in and fill all of you in while we're doing it.
然后我们会聊聊AI电影制作工作流中其他一些最新的进展。
And then we're gonna talk about some of the other things, the updates we've been seeing in AI filmmaking workflows.
准备好了吗,Adi?
You ready, Adi?
听起来不错吧?
Sound good?
太棒了。
Awesome.
对。
Yeah.
好的。
Alright.
所以是Infinity Fest。
So Infinity Fest.
那是上周四和周五。
It was last Thursday and Friday.
非常棒的活动。
Super cool event.
有很多很好的论坛和演讲。
A lot of really good panels, lot of good talks.
重点讲一下我主持的论坛。
Focusing on the panel that I moderated.
我们办了一个关于AI创新与虚拟制作的小组讨论,但我们真正聊得最多的其实是高斯溅射,包括3D的,以及……
We had a panel on AI innovations and virtual production, but the real thing that we were talking about a lot was Gaussian splatting, in 3D and...
4D高斯溅射。
4D Gaussian splatting.
我们请到了保罗。
We had Paul
看你主持这么大的小组讨论啊。
Look at you moderating big panels.
大型小组讨论。
Big panel.
是的。
Yeah.
卡瓦利尔本人,乔伊。
Boy cavalier himself, Joey.
那很不错。
That was good.
我也很好奇。
I was curious as well.
我当时就想,我就直接问你问题吧。
I was like, I'm just gonna ask you questions.
我也想知道。
I wanna know as well.
而且结尾还有一些不错的预告,关于一些我希望能有更多时间深入探讨的内容,比如三角形高斯溅射。
And there were, like, some good teasers at the end too of things that I I wish we had more time to get into because I'm also still curious about them, like triangle Gaussian splats.
是的。
Yeah.
我想提醒一下观众,AI和高斯溅射之间的联系是什么,因为我们通常会认为这是传统的摄影测量技术,尽管显然不是。
And, I guess for the audience — I mean, remind all of us what the AI connection to Gaussian splats is, because we tend to think of that as traditional photogrammetry even though it's clearly not.
这不一样,它也不完全是AI。小组讨论的嘉宾有来自Netflix Eyeline Studios的Paul Debevec,还有Jason Shugart,他现在在英伟达工作,但在视觉特效领域有丰富的经验。
It's different, and it's also not AI as such. So on the panel was Paul Debevec from Netflix's Eyeline Studios, and then also Jason Shugart, who works at NVIDIA now but has an extensive history in VFX.
对。
Yeah.
我认识杰森。
I know Jason.
他很棒。
And He's great.
他们明确表示,高斯点云并不是直接的AI技术,但确实有关联。
They made clear that Gaussian Splats is not directly AI, but it is Yeah.
在这个领域内有关联。
Related in that field.
所以我认为这里的关联在于,高斯点云并不是生成式AI。
So I think the correlation there is Gaussian splats are not generative AI.
它们并不是从零开始创造新事物。
They're not creating something novel from scratch.
但求解过程本身——即计算这些点状溅射(高斯)在三维空间中如何关联——使用了AI引擎。
But the solve itself — figuring out how the point splats, the Gaussians, are correlated in three-dimensional space — uses an AI engine.
嗯。
Mhmm.
而且,是的,顺便简单解释一下这是什么以及它有什么用。
And, yeah, also just to give, like, a short explanation of what this is and how it's useful.
基本上,如果我们比较一下重建三维物体的原始方式——这种方法至今仍很常见,那就是使用多边形,用大量小线段连接所有点来构建形状,但这样可能需要数百万甚至数十亿个小点和多边形来构建你想要的三维物体或空间。
Basically, if we compare how we would rebuild things in 3D: the original way, which is still pretty standard, is polygons — a bunch of little lines connecting everything to build out your shapes. But that can use millions or billions of little dots and polygons to build out whatever 3D object or space you're building.
高斯溅射,是的。
Gaussian splats Yep.
一个溅射就是一个包含多种信息的漂浮斑块。
A splat is a floating blob with a variety of information inside it.
如果你有数百万个这样的包含信息的斑块,就可以重建出,是的。
And then if you have millions of these blobs with information inside it, that can rebuild Yep.
你的三维空间。
Your three d space.
这种方式的优势在于,相比用多边形重建同一个场景,它运行得更快、也轻量得多。
And the advantage of this, for a 3D object or space, is it can run much faster and much lighter weight than if you're rebuilding the same scene with polygons.
正确。
Correct.
是的。
Yeah.
我认为高斯溅射仍然使用了一种所谓的实时渲染技术。
I think Gaussian splats are still using a sort of, quote, unquote, real time rendering.
比如,当你移动摄像头时,它会实时计算出新的视角。
Like, the fact that when you are moving the camera, it's calculating that novel view in real time.
嗯。
Mhmm.
但与其说是在渲染,比如通过光线追踪或像传统计算机图形学那样让光线在表面反弹,它实际上是将这些带有颜色的点块放置在对应于该物体的三维空间中。
But instead of rendering per se, like, with ray trace or, you know, bouncing light off of surfaces the way traditional computer graphics does, what it's doing is it's placing these blobs of color in the three space that corresponds to that object.
由于它们是高斯分布的,你可以把它们紧密排列在一起,它们就会融合成一个无缝的物体。
And because they're Gaussians, you can actually put them right next to each other, and they blend into a seamless object.
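作为一个极简的示意(字段名与函数均为假设,并非某个具体实现):每个高斯溅射大致携带位置、尺度、朝向、不透明度和随视角变化的颜色信息;渲染一个像素时,按深度从前到后对这些"斑块"做透明度混合,它们就能融合成无缝的画面。
As a minimal sketch (field names and functions here are assumptions, not any particular implementation): each Gaussian splat roughly carries a position, scale, orientation, opacity, and view-dependent color, and rendering a pixel alpha-blends the depth-sorted blobs front to back so they fuse seamlessly.
```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    """One 'blob' in a 3D Gaussian splat scene (illustrative fields only)."""
    position: np.ndarray   # (3,) center of the Gaussian in world space
    scale: np.ndarray      # (3,) per-axis extent of the soft ellipsoid
    rotation: np.ndarray   # (4,) quaternion orienting the ellipsoid
    opacity: float         # how solid the blob is, 0..1
    sh_coeffs: np.ndarray  # spherical-harmonics coefficients: view-dependent color

def composite_pixel(sorted_splats):
    """Alpha-blend depth-sorted (rgb, alpha) contributions, front to back."""
    color = np.zeros(3)
    transmittance = 1.0           # how much light still gets through
    for rgb, alpha in sorted_splats:
        color += transmittance * alpha * np.asarray(rgb)
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # pixel is effectively opaque; stop early
            break
    return color

# e.g. two overlapping blobs blend smoothly instead of leaving a hard seam
print(composite_pixel([((1.0, 0.5, 0.2), 0.6), ((0.2, 0.4, 1.0), 0.8)]))
```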
嗯。
Mhmm.
对。
Yeah.
在这之间,还有NeRFs,它们曾短暂地引起过一些关注,也就是神经辐射场。
And in between this, too, there were NeRFs, which kinda had a little bit of a moment: neural radiance fields.
但NeRF的问题在于,你必须先处理数据,然后基本上把它固化下来。
But the thing with NeRFs was you had to process the data, and it sort of got baked in.
这样理解对吗?
Is that is that kinda correct?
而高斯溅射之所以能实时运行,是因为数据就存储在溅射点中,是这样吗?
Where, like, Gaussian splats can kinda be run in real time because the data is in the splat?
这超出了我的理解范围,但我认为你说得对,NeRFs已经是过时的技术,目前这个领域大部分开发工作都集中在3D GS上。
Again, that's beyond my understanding, but I think you're right in that NeRFs are an outdated technology, and most of the development work that's happening in this realm is now happening in 3DGS.
嗯。
Mhmm.
有趣的是,高斯溅射并不是什么新技术。
And the interesting thing was that Gaussian splats are not new.
它基于一篇大约十五到二十年前的论文,但人们只是最近才重新发现它在三维和四维空间中的应用,这一点我们也稍微提到过。
It was, like, based on a paper, I I think, fifteen, twenty years old, but they sort of just rediscovered that it has this application in three d space and four d space, which we also touched on a bit too.
所以,我的意思是,四维高斯溅射的概念是,你捕捉的不仅是三维环境,还有三维环境中的运动。
So, I mean, the idea of four d Gaussian splats is you are capturing not just the three d environment, but motion in a three d environment.
比如人们跳舞或移动,你能重建出这个场景的四维形态——即在一段时间内的三维空间,并可以在该三维空间中重新定位或移动你的摄像机。
So people dancing or moving, and you're able to reconstruct that scene in four dimensions, a three d space over a period of time, and reposition your camera, move your camera in that three d space.
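下面是一个玩具级的示意(函数与数据均为虚构),说明"4D"的概念:让单个溅射的中心在采集装置记录的各个时间点之间插值;已发表的4DGS方法要复杂得多(形变场、时间基函数等),这里只演示"三维空间随时间变化"这个想法。
Below is a toy sketch (the function and data are made up) of the "4D" idea: interpolating a single splat's center between the positions the rig captured at each time step. Published 4DGS methods are far more sophisticated (deformation fields, temporal bases); this only illustrates 3D-over-time.
```python
import numpy as np

def splat_center_at(keyframes, t):
    """Toy '4D' splat: linearly interpolate a splat's center between the
    positions captured at each time step by the multi-camera rig."""
    keyframes = sorted(keyframes, key=lambda k: k[0])
    times = [k[0] for k in keyframes]
    t = float(np.clip(t, times[0], times[-1]))
    i = max(0, int(np.searchsorted(times, t, side="right")) - 1)
    j = min(i + 1, len(keyframes) - 1)
    (t0, p0), (t1, p1) = keyframes[i], keyframes[j]
    w = 0.0 if t1 == t0 else (t - t0) / (t1 - t0)
    return (1 - w) * np.asarray(p0, float) + w * np.asarray(p1, float)

# a splat drifting upward over one second of capture
frames = [(0.0, [0, 0, 0.0]), (0.5, [0, 0, 0.5]), (1.0, [0, 0, 1.0])]
print(splat_center_at(frames, 0.75))  # -> [0.  0.  0.75]
```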
是的。
Yeah.
所以当你提到四维时,你并不是在说《星际穿越》里那种手部场景吧?
So when you say four d, you're not talking about Interstellar, like the hand scene with
我说的四维是指《木偶奇遇记》四维那种。
I'm talking about four d as in Muppet Vision four d.
这是三维空间随时间的变化。
It's three dimension over time.
对。
Yeah.
没错。
And so right.
就像那个演示视频,
Like and and there was that demo clip.
我想我们几个月前在播客里聊过,还记得那个男人坐下的片段吗?
I think we talked about on the podcast, like, months and months ago of remember that clip of the guy sitting down?
我觉得那是个中国YouTube博主,他坐着,然后用的是四维高斯溅射技术,摄像头围绕着他移动。
I think it was, like, a a Chinese YouTuber, and he was sort of sitting down, and then it was, a four d Gaussian splat, and the camera was sort of moving around him.
然后在我们这些极客圈子里稍微火了一把,因为人们都说,哦,这居然是从一个摄像机角度生成的。
And then it kinda went a little bit viral in our nerdy sphere because people were saying, oh, was generated from, like, one video angle.
然后是的。
And then Yeah.
后来真相大白,我说其实他们用了20台摄像机才完成的,但即便如此,这依然很酷,只是确实用了20台摄像机。
It came out where I was like, no, they needed 20 cameras to pull that off, but it was still, you know, was still cool, but they needed 20 cameras.
是的。
Yeah.
我们从来没看到过幕后花絮,我记得是我朋友吉姆·加迪拉,对,
We never caught the BTS, and I think it was my friend Jim Gadilda, yeah,
他指出了这一点。
Kinda pointed that out.
意思是,你知道的,这和使用50台相机的传统摄影测量一样复杂。
It's like, just so you know, this is as involved as traditional photogrammetry with, like, 50 cameras.
不管怎样,这项技术背后的公司也在Infinity Fest上做了演示。
Well, anyways, the company behind that technology, they were demoing at Infinity Fest as well.
我想是叫4DViews之类的,他们把设备和那些演示片段都布置好了。
I think it was 4DViews or something, and they had the rig and those demo clips set up.
但他们想出了一个非常巧妙的解决方案,用多种PCZ相机搭建设备,用来捕捉4D高斯溅射,而且这个
But they've come up with a really clever solution, building out a rig with a variety of PCZ cameras where they can capture 4D Gaussian splats, and that, like...
所以他们并不是在构建所谓的超立方体。
So they're not building the tesseract per se.
那是什么?
What was that?
就是《星际穿越》里的那个东西。
That's the thing from the Interstellar
我很久没看到那个了。
I haven't seen that in a long time.
那个超立方体是什么?
What was the tesseract?
我不确定这是否
Don't know if this is
卡尔·萨根。
Carl Sagan.
我不确定这是否真实
I don't know if this is real
卡尔·萨根用一个叫超立方体的物体来描述四维空间。
Carl Sagan described four dimensions using an object called the tesseract.
哦,对了。
Oh, that's right.
那本关于暗物质的书里也有提到。
That was in that, black matter, dark matter book as well.
是的。
Yeah.
所以,就像,你知道的,一个立方体的影子是正方形,嗯。
So, like, you know, a cube's shadow is a square. Mhmm.
而四维超立方体的影子是
And a tesseract's shadow is a
立方体。
cube.
好的。
Okay.
这让你大脑崩溃了吗?
Does that break your brain?
我正要
I'm to
你想象不出来。
You can't picture it.
我正在努力理解。
I'm trying to wrap.
我们的大脑无法
Our brain can't
想象它。
picture it.
不行。
No.
是的。
Yeah.
好的。
Okay.
但是,是的。
But, yeah.
是的。
Yeah.
当我们谈论四维时,事情就发生在那个层面。
When we talk four dimensions, that's where it goes.
所以让我们回到为什么高斯点云对现代电影制作很重要。
So loop us back into why Gaussian splats are important for modern filmmaking.
好的。
Okay.
那么,让我们回到实际的好处,以及这与虚拟制作和其他几位嘉宾的关系。
So bringing this back into, like, tangible benefits and also, some of the other panelists and what this had to do with virtual production.
小组讨论中还有一些公司和人士,比如Fernando Rivas,他是Volinga的首席执行官。
Also on the panel, we have companies or people — so, like, Fernando Rivas, who's the CEO of Volinga.
我在NAB上对他进行了一次深入采访,有关这方面的内容非常丰富。
I did a big interview with him at NAB, so there's extensive stuff on that.
但他基本上创建了Volinga,这是一个Unreal Engine插件。
But he basically built Volinga, which is an Unreal Engine plug-in.
你可以导入你自己做的高斯溅射扫描——这种扫描可以通过多种技术生成——不过我们还谈到了XGRIDS的PortalCam,这是最新的一体化设备,让整个流程变得很容易。
You can take a Gaussian splat scan that you do — and you can make these through a variety of techniques — but we also talked about PortalCam from XGRIDS, which is sort of the latest device that does this all in one and makes it easy to do.
你可以扫描一个房间。
You could scan a room.
我们已经讲过这一点了。
We covered that Yeah.
我们已经讲过这一点了。
We covered that.
是的。
Yeah.
PortalCam 是大约一个月或两个月前发布的。
PortalCam launched about a month or two ago.
你可以将房间的扫描数据导入到 Unreal Engine 中。
You can take your scan of a room, bring it into Unreal Engine.
通过 Volinga,他们可以加载高斯溅射并将其导入 Unreal,同时还支持 ACES 色彩管理。
With Volinga, they can load Gaussian splats into Unreal, and they also have ACES color management.
然后你可以将它投射到墙上,基本上就能用真实环境的扫描数据来构建你的场景,无论是房间还是布景都可以。
And then you can push that into a wall, and basically, you can have your environment based on a real life environment, scan a room, scan a set.
他们还经常将这种技术用作备份,比如在拍摄期间扫描布景,如果之后需要补拍,或者第二季被续订而他们原本没预料到,或者有各种其他原因,你都可以扫描任何想要的空间,然后把它投射到墙上。
They've also been, you know, using that a lot as backup, so you scan a set from when from production, if you have to do pickups later, or if, like, season two gets picked up and they weren't quite counting on season two, or variety of reasons, you can take a scan of whatever space you want and put it on a wall.
所以这是一种非常实用的方法。
And so that's, a very practical way.
他们还讲了一个案例研究:Volinga扫描了奥斯维辛集中营,而那里严禁任何剧组进行任何类型的拍摄。
There was also a case study that Volinga did, where they scanned Auschwitz, the concentration camp — they just do not allow anyone to film at Auschwitz for any type of production.
他们做了扫描,现在把这些扫描数据提供给制作团队:如果某个项目涉及奥斯维辛、想使用这个地点,就可以把它投到墙上,以数字方式在那个空间里拍摄,而不会因拍摄打扰实际遗址。
They did a scan, and now they're making that scan available to productions: if they're doing a project around Auschwitz that wants to use the location, they can put it on a wall and have that space digitally, so they can film there digitally but not disturb the actual site with production.
世界上有太多地方,把它们引入虚拟拍摄棚或视觉特效中都会带来巨大好处。
There's so many places in the world where bringing that place into a volume or into VFX would be super beneficial.
我的意思是,奥斯维辛就是其中之一。
I mean, Auschwitz is one of them.
甚至像金字塔这样的地方,那里有各种许可限制和拍摄时间限制。
Even something like the Pyramids where there's all types of permits and time just of
游客。
tourists.
游客的不便。
Tourist inconvenience.
切尔诺贝利。
Chernobyl.
你知道的。
You know?
是的。
Yeah.
就像那部剧。
Like the show.
对。
Yeah.
如果你要在虚拟影棚里重现那个场景,那可能是最好的方法。
Like, if you were to recreate that in a volume, that's that's probably the best way to do it.
而且我认为,相比传统的摄影测量法——你需要拍摄数小时的视频或成千上万张图片——这种方法的优势在于,传统方法需要一个复杂的处理过程,而且很多步骤都是手工完成的。
And I think the advantage over traditional photogrammetry, where you take, you know, hours of video or thousands of images, is that there's a big solve process, and a lot of it is done by hand.
传统上用的是RealityCapture,该软件现已被Epic Games(Unreal Engine的开发商)收购。
Traditionally that uses RealityCapture, which has now been acquired by Epic Games, the maker of Unreal Engine.
你所说的不仅仅是运行在配备强大GPU的超级计算机上,而是需要艺术家亲自去清理、对齐各种元素,这是一项非常繁琐的工作。
And you're talking about not just running it on a superfast computer with a lot of GPU power, but an artist manually going there, cleaning stuff up, aligning things, and it's tedious work.
而高斯点溅技术只需要上传图像。
Versus a Gaussian splat, you upload the images.
如果你拍摄得当,它基本上就能自动完成处理。
And if you shot them correctly, it should more or less, quote, unquote, solve for itself.
然后你会得到一个非常轻量的文件,大约只有100兆字节,而不是像10太字节那样的庞大数据。
And then you get this sort of super lightweight file back, which is, like, a 100 megabytes, not, like, 10 terabytes or whatever.
是的。
Yeah.
然后当你的摄像机移动时,它就能实时地大致渲染出画面。
And that could then, in real time, kinda sorta render it as your camera moves.
对。
Yeah.
我的意思是,其他实际应用,比如你看到的Lightcraft Jet Set,他们的价值主张是你可以用手机作为摄像机追踪器,并将你的3D场景加载到手机里。
I mean, another practical use, you know, we've seen with Lightcraft Jet Set: their value prop is you can use your phone as a camera tracker, and you can load your 3D scene into your phone.
这样在拍摄时,你可以大致合成并看到你的3D空间。
So when you're filming, you can roughly comp in and see your three d space.
你可以导入Blender和Unreal的场景,但它们会比较沉重。
You could bring in Blender and Unreal scenes, but they're gonna be kind of heavy.
但你也可以把高斯溅射数据导入到手机上的Lightcraft Jet Set,并在手机上实时查看。
But you could also bring in Gaussian splats into Lightcraft Jet Set on your phone and see this in real time on your phone.
由于溅射数据轻得多,你在拍摄时能获得更好的体验。
And because the splats are much more lighter weight, you get a better, experience when you're filming.
我很想知道Paul Debevec在小组讨论中说了什么,因为你知道,他是……
I'm curious to know what Paul Debevec said at the panel, because, as you know, he's... yeah.
当今VFX领域最受尊敬的人物之一,所以我只是好奇他具体说了什么。
One of the most respected folks in VFX today, so I'm just curious what he said.
所以,是的,Paul大致谈到了我们刚才讨论过的那些基础内容,比如什么是高斯溅射,包括3D和4D的。
So, yeah, Paul, he kinda talked about a lot of the basics that we sort of just covered now of, like, what is Gaussian splatting, both three d and four d.
他还隐约提到了一篇即将在SIGGRAPH亚洲会议上发表的论文,内容是关于超分辨率技术,用于从大场景捕捉中实现极致特写。
He also sort of alluded to a paper that they've got coming out at SIGGRAPH Asia on super resolution techniques to achieve extreme close ups from large volume capture.
我们没有深入更多细节,但他们很快就会发表那篇论文。
So we didn't get into more detail about that, but that's a paper that they've got coming out soon.
他们还谈到了高斯泼溅的一些局限性。
And they also sort of talked about some of the limitations of Gaussian splat.
目前用于编辑高斯溅射(比如你扫描得到的数据)的工具仍然很有限,当然远不如已有二三十年积累的多边形工作流程成熟。
So the tools to edit a Gaussian splat, like a scan that you make, are still limited — definitely not as mature as the existing twenty-, thirty-year pipeline of working with polygons.
是的。
Yeah.
高斯溅射的另一个局限在于它们是辐射场表示,也就是说,扫描时捕捉到的光照基本被固化在里面了,因此给高斯溅射重新打光会更棘手。
The other limitation with Gaussian splats is they are a radiance field representation, so basically the light that you captured during your scan is sort of baked into it, and so relighting Gaussian splats is trickier.
目前,如果需要具备重新打光的灵活性,通常需要结合高斯泼溅、LiDAR技术和传统摄影测量来重建场景。
That's something where currently right now, something like a combo of Gaussian splats, LiDAR technology, and traditional photogrammetry comes into rebuilding a space if you need to have that flexibility of relighting.
Global Objects 专长于此:如果你需要一个既是Unreal Engine三维场景、又具备重新打光灵活性的空间,他们会综合运用所有这些技术。
That's something that Global Objects specializes in, where they combine all of these techniques if you need a space that's an Unreal Engine 3D scene but has the flexibility of relighting.
这目前就是高斯溅射的局限性之一。
That's just one of the limitations right now with Gaussian splats.
当然。
For sure.
但烘焙光照的优势在于,许多反射效果都非常精准。
But the advantage to having baked lighting is a lot of the reflections are super accurate.
嗯。
Mhmm.
而且很多时候,摄影测量在玻璃或金属物体上会完全出错,而高斯溅射在这方面表现很好,但问题是这些效果是被固化进去的。
And photogrammetry, a lot of times, totally messes up, like, glass or metal objects, whereas Gaussian splats are really good at that stuff — but then again, it's baked in.
他还隐约提到了另一点,但我们没有深入讨论,也许将来我们可以试着拆解一下:我们已经从多边形发展到NeRF,现在又转向了高斯溅射,那之后会是什么?
And then the other thing he sort of alluded to, but we didn't get into details, and so maybe we'll try to break this down in the future, was, you know, we've gone from polygons, we've gone to Nerfs, we've gone, now we're looking at Gaussian splats, what's after that?
他提到的是三角形高斯溅射,但我没有深入探讨,也没去查过,不过我注意到已经有一些相关论文出现了。
And triangle Gaussian splats are what was alluded to, but I didn't really get into details about that and I have not looked it up, but I noticed some papers out about that.
三角形。
Tris.
我猜是多边形和高斯点云的组合。
Guessing it is a combo of polygons and Gaussian splats.
是的。
Yeah.
我觉得三角形对GPU来说计算效率更高,因为GPU原本就是为处理三角形和四边形设计的,一次性并行计算百万甚至十亿个三角形的数学运算,相比高斯点云(本质上是球体)要容易得多。
I would imagine tris would be more computationally efficient for GPUs, because GPUs are really built for tris and quads — like, the math behind computing a million or a billion tris at a time in parallel, versus a Gaussian splat, which, I think, is a sphere.
所以从几何上看,它更复杂。
So geometrically, it's more complex.
哦,明白了。
Oh, okay.
这说得通。
That makes sense.
好的。
Okay.
所以转向三角形会带来好得多的性能……
So moving to tris would give you infinitely better...
它更像是,数据存储方式不是球形的斑点,而是三角形的斑点?
it'd be, like, sort of the data stored in sort of instead of, like, a sphere y blob, more of a triangle y blob?
三角形的斑点斑点。
Triangly blob blob.
但这个三角形是平面的,这是现代计算机图形学中的自然基本单元。
But that triangle is a flat triangle, which is, a natural unit for modern computer graphics.
好的。
Alright.
好吧。
Okay.
明白了。
Okay.
有意思。
Interesting.
我们以后得深入研究一下这个。
We'll have to dig into this in the future.
好的。
Okay.
最后再补充介绍一下小组讨论的其他成员及其讨论内容:我们还有本·阿弗格尔,他在ETC工作,是一位参与ETC项目的创意技术专家。
And then, just to round out who else was on the panel and what they covered: we had Ben Avergel, who works with ETC and is a creative technologist involved with ETC.
我们稍后会谈到《The Bends》,但他参与了Pathways以及ETC正在制作的其他项目。
And we'll talk about The Bends in a bit, but he worked on Pathways and other projects that ETC is producing.
他们最初尝试用几部 iPhone 实现某种四维高斯点云。
And they were initially trying to do some form of four d Gaussian splats with a few iPhones.
我认为他们用四部 iPhone 录制了一个场景,但数据量远远不够,无法生成一个可以自由移动且保持高质量的高斯点云。
I believe they recorded a scene with, like, four iPhones, but it just was not enough data needed to create a Gaussian splat that you could, like, move around and still have, good quality.
于是他们转向了一种我们之前聊过的工作流程:iPhone、Lightcraft Jet Set、Beeble、真人演员、绿幕空间,先用拼装(kit-bash)式的Unreal Engine场景替换背景,之后再用AI视频到视频做整体风格化。
So they kind of shifted to a workflow that was more something we've talked about before: iPhones, Lightcraft Jet Set, Beeble, real actors, a green screen space, replacing the background with, like, a kit-bashed kind of Unreal Engine scene, and then restyling that with video-to-video AI later on.
回到他们最初用 iPhone 实现 4D 高斯点云的初衷,我问过,到底需要多少部 iPhone 才能成功实现?
The thing, going back to the original 4D Gaussian splats that they were trying to do with the iPhones: I did ask, like, how many iPhones would you actually need to successfully pull that off?
他认为大概需要 20 到 30 部,和其它高斯点云视频所用的数量差不多。
And he thinks probably around 20 to 30, the same number that, the other Gaussian Splat videos are using.
最后是艾拉·罗伯茨。
And then lastly, Ella Roberts.
她是《绿野仙踪》在球形影院中进行上采样工作的AI监督。
She was the AI supervisor for The Wizard of Oz upscaling at the Sphere.
我认为,马克斯·菲利普斯负责了所有内容。
Max Phillips, I believe, did all the Yeah.
她与他们合作。
She works with them.
她是创意AI监督。
She's the creative AI supervisor.
总的来说,我们讨论了他们所采用的工作流程和制作管线,但显然我无法深入太多细节。
And, yeah, in broad sense, we kind of talked about the workflow and pipeline of what they did, but obviously, I couldn't get into too many details.
但话说回来,这完全是另一个项目:他们将现有格式从原电影中拆解,然后为球形影院16K的宏大景观重新构建每一个场景。
But, yeah, I mean, that was just sort of a whole different project in taking existing formats and sort of breaking them down from the original film and then rebuilding out each scene for the 16 k massive landscape of the sphere.
是的。
Yeah.
那个小组讨论涵盖了行业的多个层面。
It's a nice cross section of the industry on that panel.
我的意思是,从像保罗这样资历深厚、堪称顶尖的专家,到像你这样站在AI前沿的人,应有尽有,嗯。
I mean, you have everybody from, like, tenured, you know, one of the foremost experts like Paul all the way up to, like, you on the AI frontier side Mhmm.
加布里埃尔代表的是Sphere和沉浸式娱乐领域,同时你还有NVIDIA的代表。
Gabriel on the, you know, sphere and location entertainment side, and then you also have, NVIDIA.
你有来自NVIDIA的杰森。
You have Jason from NVIDIA.
不是。
No.
那场讨论非常好。
It was really good.
真的非常棒,这是一个
It was a really good It's a
非常精彩的小组讨论。
really cool panel.
我错过了,真遗憾。
I'm bummed out that I missed it.
是的。
Yeah.
对。
Yeah.
也要感谢科琳娜组织了这个小组讨论。
And a shout out to Corinne for assembling that panel.
不过,是的。
But, yeah.
另外,趁着我们还在聊Infinity Fest,我想谈谈另一场非常有趣的演讲:如何把普通的AI输出——通常是8位RGB、720p或1080p——提升为4K影院级、16位EXR文件。我们之前聊过,Luma的Ray3的亮点就是能在生成输出时直接做到这一点。
Also wanna talk about one other thing while we're on Infinity Fest, because there was another really interesting presentation about upscaling regular AI outputs, which are usually 8-bit RGB, 720p or maybe 1080p output, to 4K cinema-quality, 16-bit EXR files — which we've talked about: how, like, Ray3 from Luma, their cool thing is they can do that in the output and the generation.
但如果你不想用Ray3,或者你有其他来源的输出呢?
But what if you don't want to use Ray3, or you have other outputs?
这个工作流程是怎样的?
What is this workflow?
所以
And so
是的
Yeah.
如果你只是想放大呢?
What if you just want to upscale?
你已经完成生成了?
You already have the generation done?
没错。
Exactly.
这个工作流程非常有趣,来自ETC的另一个项目《The Bends》——一部可爱的动画短片,讲述一条深海鱼在水下四处游历、努力游向水面的故事。
And so this is a really interesting workflow from ETC's other project, The Bends, which is this kinda cute animated short about a deep-sea fish traveling around underwater and making its way to the surface.
他们构建了一条使用多种Topaz本地模型的流水线,与Starlight无关。
They built out a pipeline using a variety of Topaz's local models — nothing to do with Starlight.
基本上,他们会先获取原始素材,用Nyx去噪模型去除噪点,然后再用他们另一个模型Gaia进行放大。
And so basically they would take their source footage, run it through Nyx denoise to kind of strip out the noise, then run it through Gaia, which is one of their other up-res models.
然后这个流程中还有几个其他步骤,但我拍的截图里没有包含这些。
And then there were a couple other steps in the process, which I don't have in the screenshot that I took.
但基本上,他们用了三个不同的Topaz模型,之后得到了更丰富的细节、更高的质量,输出了16位的EXR文件;再加上色彩分级等环节,他们有大得多的调整空间去修整和匹配镜头——就像你用电影级摄像机拍摄时习惯的那样。
But basically, it was, like, three different Topaz models, and out of that they were able to get much more detail, much higher quality — a 16-bit EXR file — and with the color grade and everything else, they just had a lot more latitude to adjust the shots, match the shots, something that you would be used to if you're shooting with a cinema-quality camera.
是的。
Yeah.
这真是很有趣的工作。
It's really interesting work.
相比分辨率的提升,更让我着迷的是位深的提升。
The thing that fascinates me more than the resolution upscale is the bit depth upscale.
你知道,通常8位是0到255。
So, you know, when you talk about typically eight bit is zero to two fifty five.
对吧?
Right?
所以如果你的画面中有太阳,太阳的最高值是255,但现实世界的动态范围,就像我们眼睛看到的那样,远不止255。
So if you have, you know, your sun in the frame, the highest value that sun will have is two fifty five versus real world dynamic range, like what our eyes see, you know, is it won't be two fifty five.
它可能会达到一百万单位。
It'll be, like, 1,000,000 units.
对吧?
Right?
是的。
Yeah.
那么,你是怎么获得如此极端的动态范围的?又如何仅从原始素材中人工插值生成这些数据?
So, like, how do you get that crazy of a dynamic range, and how do you create it artificially interpolate it just from
我知道。
I know.
对。
Right.
你居然能从这种质量很差的原始帧中提取并生成出这么多数据。
From having this, like, kind of crappy source frame that you're able to, like, pull and generate all this data.
对。
Yeah.
真的很令人印象深刻。
Was really impressive.
我真希望当时拍下了波形监视器的画面,因为那里有处理前后的对比,你才能真正看到所获得的动态范围。
And I I wish I had a shot of the waveform monitors because they had before and after the waveform, and that's where you could really see the latitude you get.
而且,另一个优势是,它能大大减少色带现象。
And also, the other advantage is, like, it reduces banding a lot.
你知道,这就是色带问题。
You know, it's a banding.
这是因为如果你使用的是8位,颜色空间是有限的。
It's just because, if you're in 8-bit, there's a limited amount of color space.
对。
So Right.
特别是这部短片发生在深海,所以有很多暗景。
Especially because this short film took place in, like, the deep sea, so there's a lot of dark shots.
因此,即使在渐变和阴影部分,也能看到那些色带线条。
And so even with the kind of gradients and some of the shadows in the dark, would see the, like, banding lines.
这个上采样过程在波形监视器上看得很清楚:波形从非常生硬的阶梯变成了平缓得多的曲线,这正是你通常希望看到的,能得到更漂亮的渐变……哦,是吗?
And this upscale process — you could see it in the waveform monitors, where it went from very rigid steps to a much more gradual curve, which is what you would want to see, or normally see, where you get more of a nice gradient with the... Oh, really?
色彩空间。
Color space.
我认为很多人对暗场景、光线较少的假设都有误区,比如我想到了《权力的游戏》,我管那些场景叫‘权游场景’,对吧?
And I think a lot of the sort of assumptions about having darker scenes with less light... you know, I'm thinking Game of Thrones, so I call those the Game of Thrones scenes, right?
你可能会觉得,因为画面很暗,就不需要那么高的位深,但实际上,我们的眼睛对阴影和暗部区域的细微差别更加敏感,如果你给暗部区域分配更高的位深,反而能提取出更多细节。
It's just, like, you think that you don't need that big of a bit depth because things are dark, when in reality our eyes are actually much more sensitive to shadows and differences in the darker regions — so if you throw more bit depth at the darker regions, you actually get more detail out of it.
所以,即使场景中没有太阳,不是白天的场景,它依然能从中受益。
So it still benefits the scene even though there is not a sun in it, for example, it's not a daylight scene.
即使是一个暗场景,你仍然需要在位深上保留更多的信息量。
If it's a dark scene, you still need more bandwidth in the bit depth area.
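一个简化的算例可以说明这一点(假设线性编码;真实相机使用log/伽马曲线,会缓解但不会消除该问题):在很深的阴影一档里,8位只剩下两三个码值,而16位仍有数百个。
A simplified worked example makes the point (assuming a plain linear encoding; real cameras use log/gamma curves, which soften but don't remove the problem): inside one deep-shadow stop, 8-bit has only a couple of code values left, while 16-bit still has hundreds.
```python
def codes_in_range(bits: int, lo: float, hi: float) -> int:
    """How many integer code values fall between scene levels lo..hi (0..1),
    assuming a plain linear encoding."""
    levels = 2 ** bits - 1
    return int(hi * levels) - int(lo * levels)

# Code values available inside one deep-shadow stop (1/128 .. 1/64 of white):
print(codes_in_range(8, 1 / 128, 1 / 64))   # -> 2 values: stair-step banding
print(codes_in_range(16, 1 / 128, 1 / 64))  # -> 512 values: smooth gradients
```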
是的。
Yeah.
而且在调色时,你也能获得更大的控制力和灵活性。
And also just when you're doing the color grade, too, you have that control and flexibility.
而且他们正在进行的实验是,试图在专业的好莱坞电影制作流程中构建一套AI生成工作流。
And what the experiment they're doing, too, is they're trying to really build out an AI generative workflow, but in a professional Hollywood filmmaking pipeline.
整个过程由ASC摄影师Roberto Schaefer监督,他用自己的眼睛分析这些场景,努力从这些AI输出中得到他用Arri Alexa或Sony Venice拍摄时习惯的效果。
So, like, this entire process was supervised by an ASC cinematographer, Roberto Schaefer, and he's using his eye to analyze the scenes and try to get, out of these AI outputs, what he's used to when shooting with an Arri Alexa or Sony Venice.
他们做的另一件事是通过胶片化学方法添加颗粒效果,并展示了一些带有颗粒和不带颗粒的对比效果。
And then the other thing they did was a film grain pass with photochem, and they showed some comparisons of, like, with grain and without grain.
只是,添加颗粒后,画面就变得
And just, you know, adding the grain just makes it
好太多了。
Way better.
对吧?
Right?
好太多了。
Way better.
是的。
Yeah.
姑且这么说吧。
For lack of a better word.
好太多了。
Way better.
颗粒感真正帮助到的一点是色带问题。
One thing grain really helps with is banding.
比如,有一个叫做抖动的过程,嗯。
Like, you know, there's a process called dithering, which Mhmm.
可以减少色带。
Reduces banding.
所以这本质上是在画面中添加噪点,而颗粒感就是一种噪点。
So it's essentially adding noise into the frame, and grain is kind of noise.
因此,它确实有助于克服AI因8位色深和低动态范围带来的诸多限制。
So it does help overcome a lot of the limitations with AI having, you know, eight bit and low dynamic range.
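下面用一个最小示意(numpy,函数为虚构)说明抖动的原理:在取整之前加入不超过半个码值的噪声,把生硬的量化阶梯打散成细腻的噪点——胶片颗粒对8位AI输出起到的正是类似的掩蔽作用。
A minimal sketch (numpy; the helper is made up) of what dithering does: add up to half a code value of noise before rounding, which scatters the hard quantization steps into fine noise; a film-grain pass masks 8-bit AI output in a similar way.
```python
import numpy as np

def quantize(img, bits=8, dither=False, seed=0):
    """Quantize a float image in [0, 1] to `bits`, optionally dithering first."""
    levels = 2 ** bits - 1
    x = img * levels
    if dither:
        # up to half a code value of noise breaks the hard quantization steps
        rng = np.random.default_rng(seed)
        x = x + rng.uniform(-0.5, 0.5, size=np.shape(img))
    return np.clip(np.round(x), 0, levels) / levels

# A dark gradient like the deep-sea shots: hard plateaus vs. steps broken by noise
ramp = np.linspace(0.0, 0.05, 1920)
banded = quantize(ramp)                 # flat plateaus read as banding lines
dithered = quantize(ramp, dither=True)  # same levels, scattered into fine noise
```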
是的。
Yeah.
我还觉得他说了一些特别有意思的话,因为当我想到给AI图像或者数字图像添加颗粒感时,我内心其实有点疑惑:这算不算作弊?
And I thought he said something really interesting too because, like, when I thought about adding grain to an AI image, you know, or even just a digital image, part of it is felt like, is this cheating?
或者这算不算在造假?毕竟,这根本不是用胶片拍的,我们却硬加了胶片颗粒。
Or is this, you know, is this doing something, like, you know, faking it because, like, it, you know, wasn't actually shot on film, but we're adding film grain to it.
但他提到一个很有趣的观点,他说,你用什么设备拍摄其实并不重要。
But he said something interesting where he's just like, it it doesn't matter what you shoot with.
即使我们用的是数码相机,传感器本身也有自己的噪点特性。
Even if we're shooting this with digital cameras, like, sensor has a noise profile.
比如,
Like, they
所有设备拍出来的画面都不会是那种超清晰、毫无瑕疵的。
all like, nothing is shooting super crisp and clear.
太"无菌"、太冰冷了。
Clinical.
是的。
Yeah.
是的。
Yeah.
比如,如果你在制作动画CG电影,或者生成图像时,如果画面太干净了,因为这可能是唯一能生成完全无噪点或无颗粒感的方式,但看起来会很奇怪、太干净了。
Like, if you're doing, you know, even an animated CG film, or if you're doing outputs and it's just too clean — because that's about the only way you could generate something that had absolutely zero noise or grain — it just looks too weird and too clean.
在这个过程中加入颗粒感有助于增强真实感,这并不是说你在添加虚假的胶片效果。
And adding the grain in that process, you know, helps sell the illusion, and it's not like a it's not like, oh, you're adding fake film effect.
你只是在添加一些能模拟即使你用数字电影摄像机拍摄时也会有的效果。
It's just you're adding something that replicates even if you shot it with a digital cinema camera.
是的。
Yeah.
你知道,电影中的颗粒感非常重要,以至于现在很多摄像机都内置了颗粒感调节功能。
You know, the the grain part of cinema is so important that a lot of cameras now have built in grain adjustment.
我不确定RED摄像机是否已经具备这个功能。
I don't know if the RED cameras have it just yet.
也许V Raptor可能有,但Arri Alexa肯定可以,你不仅能调节颗粒感的强度,从0到100,还能选择颗粒的纹理。
Perhaps the V-Raptor might, but the Arri Alexas for sure: you can not only pick the level of grain, like a zero to 100, but also the texture of the grain.
好的。
Okay.
这和选择ISO设置之类的有区别吗?
And this is separate from, like, picking your ISO setting or whatever?
是的。
Yep.
没错。
Exactly.
所以这是相机在拍摄后添加的。
So this is something that the the camera body will add after acquisition.
所以它会像叠加一样加进去。
So it'll, like, kinda overlay it.
我认为如果你使用RAW格式,之后也可以再调整它。
And I believe if you're in a RAW format, you can tweak it down the road as well.
这说得通。
That that would make sense.
好的。
Alright.
所以,是的。
So, yeah.
这些就是我在Infinity Fest上觉得最值得关注的亮点。
Those are the highlights for me from, Infinity Fest.
我的意思是,有很多精彩的讲座,但我知道我们已经聊过这些了。
I mean, a lot of the great panels, but, I know we've talked about this for a bit.
对我来说,这两个是最突出的。
Those were the two standouts for for me.
不过,这确实是一场不错的会议,有很多非常有趣的演讲。
But, yeah, it was a a good good conference, a lot of a lot of really interesting talks.
我们来聊聊人们正在使用的工具吧。
Let's talk about some of the tools people are using.
好的。
Alright.
你在用什么?
What are you using?
最近你最常用的工具是什么?
What have your been your go to tools lately?
我更专注于图像生成,而不是视频生成。
So I tend to focus more on image generation stuff over video generation.
所以我对图像生成的每一个细节和局限性都非常敏感。
So I'm, like, hyper aware of all of the nuances and the limitations of image generation.
比如,我用的是Nano Banana。
For example, I use Nano Banana.
我的意思是,我不确定。
Like, I don't know.
我每天在Nano Banana上生成大约50次,只是为了弄清楚它能做什么、不能做什么。
I make maybe 50 generations a day on Nano Banana just to kind of figure out what it can and can't do.
你用Nano Banana的时候,是直接用文字生成图像,还是会提供一些输入内容?
When you're using Nano Banana, are you doing, like, just straight text-to-image, or are you giving it any input stuff?
我主要在做图像编辑。
I'm doing mostly image editing.
所以我在处理一堆东西,个人方面,我正在搭建一个网站。
So I'm going through a bunch of so I'm, on the personal side, I'm building out a website.
对于这个网站,我会用很多真实地点的照片。
And for the website, I'm gonna take a bunch of images of, real world places.
比如,想想亨廷顿图书馆,对吧,一个很漂亮的地方,如果你拍下没有人的照片,怎么把它导入Nano Banana,然后添加你想要的人群或主角人物呢?
For example, just if you think of the Huntington Library, right, like, beautiful place, if you just take photos of it with no people in it, how do you bring it into Nano Banana and then add the people that you want, like, populate it with the crowd or hero characters and things like that?
所以Nano Banana在这方面表现得非常好。
So Nano Banana has been really good at that.
你可以提供每个你想添加的人物的参考图,然后通过标记工具,精确调整他们应该坐或站的位置。
You could give it a reference of each of the people that you wanted to be filled, and then with, like, the markup tools, you can actually dial in where they'd be sitting or standing.
它对背景的保留效果非常好。
It preserves the backplate really well.
它不会扭曲任何原始照片的细节。
Like, it doesn't distort any of the actual photo.
然后当它添加新的人物或物体时,有时我会放一辆车进去。
And then when it puts on the new person or the object, sometimes I'll put a car in.
汽车会呈现出环境的反射,几乎像HDR效果一样。
The car will have the reflections of the environment, like in HDR almost.
这很酷。
That's cool.
对。
Yeah.
是的。
Yeah.
所以它在后台做了很多工作,我还没见过哪个图像模型能如此出色地完成这个任务。
So it's it's doing a lot under the hood, and I I really have not seen an image model that capable at doing that task yet.
你说的标注工具,是用FreePic内置的标注工具,还是别的什么工具?
When you said markup tools, are you using the markup tools built into FreePic, or are you doing something else?
FreePic。
FreePic.
是的。
Yeah.
对。
Yeah.
好的。
Okay.
这对确实非常有效,是的。
And that works really well for Yeah.
用于精确的标注和你想要创建的内容。
For precise notes and what you're looking to create.
当然。
For sure.
然后,我正在做的另一件事是网站设计和搭建,但我并不是那种专业的网页开发者。
And then, the other thing that I'm doing is, obviously, with website design and build, and I'm not a, like, a web builder.
我只是因为需要才这么做。
Like, I'm just doing this because I need it.
只是需要把它完成。
It just needs to get done.
我想了想,我们总是谈论的这个AI世界,为什么不实际测试一下并真正使用它呢?
And I figured, like, just this world of AI that we we always talk about, why not put it to the test and actually use it?
所以我一直用Ideogram 3来做很多徽标和文字生成。
So I've been using Ideogram three for a lot of the logo and text gen
哦,明白了。
Oh, okay.
甚至包括网站的设计元素,比如图案之类的东西。
And even motifs and stuff like design elements of the website.
你知道的。
You know?
比如,如果你有一个拖拉机之类的图像,想要一个手绘风格的拖拉机,你可以先用Nano Banana处理,得到一个拖拉机图像,再把它放进Ideogram,生成一种与之匹配的字体,让文字风格和徽标保持一致,形成统一的视觉体系。
Like, for example, if you have, like, a tractor or something and you want, like, a hand illustrated version of that tractor, like, you can put it through Nano Banana, get a tractor back, and then put it through Ideogram and make a font in that style and have the text reflect the logo and live in the same sort of ecosystem.
这很酷。
That's cool.
你有没有试过让AI生成一个网站的原型,然后交给Claude之类的工具来根据这个原型制作网站?
Have you done anything where you're you've had an AI mock up of the website and then give it to, like, Claude or something to make the website based on the mock up?
目前我正在试用Framer AI。
So right now, I'm playing around with Framer AI.
Framer是一个不错的工具。
Framer is a Okay.
那是什么?
What's that?
是的。
Yeah.
Framer就像Wix或Squarespace一样,是一个用来搭建网站的知名工具。
Framer is a known tool, like Wix or Squarespace, where you can build websites.
好的。
Okay.
所以它内置了一个AI引擎,非常强大,能为你生成网站的骨架,然后你可以进去添加动画、文字、其他页面等等。
So they have an AI engine built into it that's quite good at it'll give you a skeleton of a website, and then you can go in and you can, you know, add animation, add text, add another page, and so on.
现在有太多可用的AI工具了。
There are just so many AI tools at your disposal.
你有没有做过Nano Banana和Seedream的对比?
Have you done a comparison with Nano Banana and Seedream?
我觉得Seedream可能是最接近的。
Because I feel like Seedream's probably the closest.
Seedream 4。
Seedream 4.
是的。
Yeah.
你可以输入16到20张图片。
You can give it 16 or 20 input images.
对。
Yeah.
你发现它们各自的优势在哪里?是不是有一个更偏向所谓的"生产级"?
Where have you found the pros are? Is one more built for, quote, unquote, production?
它的输出更具照片级真实感。
Like, it has a more photorealistic output.
它在理解合成方面表现更好。
It it tends to understand, compositing better.
它更能理解我刚才提到的内容,比如基于图像的光照以及类似的一些技术。
It tends to understand, like, what I what I just mentioned, like, image based lighting and sort of some of that stuff better.
我认为Seedream更适合创造全新的、富有创意的输出。
Seedream, I would say, is better for creative, completely novel outputs.
所以,如果你是从零开始构建一个世界,我建议用Seedream,得到一种你从未想象过的奇特视角;但如果你已经有一张照片,只是想对它做类似Photoshop的修改,那就用Nano Banana。
So if you're creating the world from scratch, I think go to Seedream and get a weird, wacky view of the world that you never could have imagined, versus if you already have a photo and you're just trying to do essentially Photoshop on it, then use Nano Banana.
我还要说,大约一两周前,我从Henry Daubrez那里学到一个Nano Banana的技巧:有时我给它一张图、想修改其中某些内容,提示词写成"修改这个"或"替换这个"就会出问题。
I will say, I also found this tip a week or two ago from Henry Daubrez for Nano Banana, where, you know, sometimes I've had issues when I'm trying to give it an image and then change something in it, and the prompting will be, like, "change this" or "replace this."
他说最好的提示词是‘给我看看’。
He says that the best prompt is show me.
比如,给我看看航拍镜头,给我看看那个人的侧面照,等等,就是这种‘给我看看’的表达方式。
So show me an aerial shot, show me a side shot of that person, you know, show me blah blah blah blah blah.
所以我最近更多地使用这个方法,发现它确实效果不错。
So I've been using that more, and I've I have found that that does work pretty well.
哦,我喜欢这个提示技巧。
Oh, I love prompting trick.
我会记住这个技巧。
I'll keep that trick in mind.
好的。
Yeah.
而且,海梦和香蕉的另一个特点是,你可以给出上千句话的提示。
And the the I mean, the other thing with, like, sea dream and banana both is, like, you can prompt it, like, a thousand sentences.
没问题。
No problem.
它会,嗯,接受所有内容。
It'll, like Yeah.
全部消化掉。
Swallow it up.
你知道的吧?
You know?
而我认为,以前文本编码器对文本有500个字符或500行的限制。
Whereas, I think there was, like, a 500 character or 500 line limit on the text encoders before.
所以在ComfyUI的世界里,如果你用T5-XXL作为CLIP编码器,我记得这些模型的限制是500个token——不是500个字符,是500个token,不管那具体意味着什么。
So if you're in ComfyUI land, if you use, like, T5-XXL for the CLIP encoder, I think those things have a 500 — not 500-character, a 500-token — limit, whatever that means.
哦,原来如此。
Oh, okay.
是的。
Yeah.
相比之下,比如Nano Banana。
Versus, like, Nano Banana.
它通常没有这样的限制。
It doesn't really tend to have a limit.
你可以一直继续下去。
You could just keep going.
你有没有发现,更长的提示效果更好?
Have you found, like, longer prompts work better?
很多时候,我会进入ChatGPT,拿一个简短的提示,然后说:嘿。
So a lot of times, I'll go into ChatGPT, and I'll I'll take a small prompt, and I'll I'll be like, hey.
你能帮我优化一下这个提示吗?
Can you boost the prompt?
我就是这么直接说的。
That's literally what I say.
然后我会得到一个非常冗长的提示,再回去调整一些关键词。
And it'll come back with, like, a really verbose prompt, and then I'll go back in it and adjust some keywords.
接着我就有了一个超长的段落,然后复制粘贴到Nano Banana里。
And then I'll have this, like, giant paragraph that I'll cut and paste into Nano Banana.
是的。
Yeah.
我也是。
Same.
我用这个来创建提示词。
I use that for creating prompts.
我不会写很长的提示词。
I'm not gonna write a long prompt.
没错。
Yep.
是的。
Yeah.
我最近还在X上看到一些渐渐多起来的内容——当然,这话你得打个折扣,因为它来自Em,他是Scenario的联合创始人,那是一个帮你创建和训练自定义模型的网站。
The other thing I've been seeing pop up a little bit more on X — and, I mean, take this with a grain of salt, because it's coming from Em, who is the co-founder of Scenario, which is a website that helps you make custom models, training models.
所以他是在卖铲子。
So he is selling the pickaxe.
但我最近看到一些关于LoRA的演示和讨论,想知道LoRA是否仍然相关,以及它们还在哪些地方发挥作用。
But I've been seeing some demos of and talk of LoRAs — whether LoRAs are still relevant and where they still come into play.
我们以前讨论过这个问题:有了Nano Banana、Seedream这些工具,既然可以在Flux Kontext或Nano Banana之类的模型里直接通过提示词稳定地得到想要的输出,那训练自定义模型的LoRA还有意义吗?
We've talked about this before: with Nano Banana, Seedream, do LoRAs still make sense — training some custom model to get the specific output you want consistently — when you could just prompt it in something like Flux Kontext or Nano Banana, etcetera?
这是他展示的一个把物体转成等距视图的演示。他仍然主张使用LoRA,特别是针对Flux Kontext的LoRA流程,理由是在大规模应用时你能得到一致得多的输出。演示里是一栋建筑的等距视图;对比之下,另一张我猜只是"文本提示+输入图像"生成的,看起来差不多,但如果你要在一系列镜头中保持一致,就得不到那种一致性。
And so this was a demo from him of turning something into an isometric view. Basically, his argument for still doing a LoRA — and specifically a LoRA flow for Flux Kontext — is you still just get much more consistent outputs at scale. So this is sort of a demo of an isometric view of a building compared to — I'm guessing this is probably just a text prompt with an input image — and, you know, it kinda looks like it, but if you're trying to do something consistent over a series of shots, you won't get that consistency.
是的。
Yeah.
所以最终的结论是,你仍然需要LoRA来实现最高级别的控制。
So the end argument is that you still need LoRAs for the highest level of control.
对。
Yeah.
而且我也一直在想一个问题:要训练Flux Kontext的LoRA,你不能只给它一组你想要的图片。
And, I mean, the thing I've also been trying to wonder about, too — because, like, to train a Flux Kontext LoRA, you don't just give it a group of images of what you want.
你必须提供成对的图片。
You have to give it a pairing of images.
你需要提供输入图像和对应的输出图像。
You have to give it, like, input image, output image.
不。
No.
不。
No.
不。
No.
你给它一小组图像,但你必须为它们添加文字说明。
You give it a small group of images, but you'll have to caption them.
不。
No.
我认为你必须提供输入和输出的配对。
I think you have to give it, like, input, output, pairs.
你必须使用配对来训练它。
You have to go to pairs to train it.
比如,这是输入,这是输出,你必须用图像对来训练它。
Like, this is the input, this is the output — you have to train it with pairs of images.
你怎么知道
How do you know what the
如果你还没生成它,输出是什么?
output is if you haven't generated it yet?
这就是关键。
That's the thing.
你得生成输出。
You gotta make the output.
所以你得修改大约20张图片,变成你想要的输出,用来训练模型。
Like, so you have to modify, you know, 20 or so images to the output that you want in order to train it.
因此,我一直都在想这个问题。
And so, that's where I've been wondering.
那么,你是直接在Flux Kontext或Nano Banana里自己做、不断打磨到效果很好,从而得到这种配对?还是在别的地方修改?
It's like, so do you just do that, you know, with Flux Kontext or Nano Banana yourself and just kind of refine it so it looks really good, so you have that pairing — or do you modify this elsewhere?
这部分我一直有点模糊,因为有些模型你需要提供配对数据来训练,不能只是说:‘嘿,这可不是MidJourney的灵感板,你不能随便弄一堆你喜欢的图片做成灵感板,然后指望模型从中学习你想要的输出类型。’
That's the part I've been a little fuzzy on, because with some of these models you have to give it the pairing to train. It can't be, like, a Midjourney mood board, where you just make a mood board of a bunch of images that you like, and then that sort of trains the model on what type of output you're looking to get.
哦,这挺有意思的。
Oh, that's interesting.
我现在正在看Replicate上的Flux Kontext LoRA训练。
I'm looking at Flux Kontext LoRA training on Replicate right now.
根据Google AI概述,Flux Kontext LoRA的训练包括:准备成对的图像(同一主题的起始帧和结束帧),连同一个独特的触发词上传到训练服务,然后运行训练过程。
According to the Google AI overview, Flux Kontext LoRA training involves preparing paired images — start and end frames of a subject — uploading them to a training service with a unique trigger word, and then running the training process.
这很有趣。
Oh, that's interesting.
我确实这么做过。
I've yeah.
所以你得手动修改大约20张图像,调整成你想要的样式,
So you have to, like, modify, like, 20 or so images manually to what you
想要的,是的。
want Yeah.
我的意思是,
I mean
来训练这个模型。
to train that.
图像模型的元老之一。
One of the OGs of image models.
对吧?
Right?
比如,Flux 和 Stable Diffusion。
Like, it was Flux and stable diffusion.
我认为 Flux 是基于 Stable Diffusion 构建的,如果我没记错的话。
And I think Flux is built off of stable diffusion, if I'm not mistaken.
所以,训练 SDXL 和一些旧模型的方法是,给它大约 50 张或 30 张带标注的图片,然后它会训练一个低秩适配器,这个适配器会附加到基础模型上。
And so the way to train SDXL and some of the older models was to just give it 50 or so or 30 or so images with the caption, and then it'll train a low rank adaptation, which then latches onto the foundational model.
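用一个玩具级的numpy示意(尺寸与名称均为虚构)来说明"低秩适配"这个术语:基础模型的权重保持冻结,训练的只是两个小矩阵B和A,它们的乘积作为低秩修正"挂"在原权重上。
A toy numpy sketch (sizes and names are made up) of what "low-rank adaptation" means: the foundation model's weights stay frozen, and only two small matrices B and A are trained, whose product rides on top of the original weights as a low-rank update.
```python
import numpy as np

rng = np.random.default_rng(0)

# A frozen foundation-model weight matrix (toy size).
d_out, d_in, r = 64, 64, 4          # r is the LoRA rank, r << d
W = rng.normal(size=(d_out, d_in))  # stays frozen during training

# The "low-rank adaptation": only A and B are trained.
A = rng.normal(size=(r, d_in)) * 0.01
B = np.zeros((d_out, r))            # zero-init so training starts at exactly W
alpha = 1.0                         # scaling factor

def forward(x):
    """Adapted layer: the base weights plus the low-rank update B @ A."""
    return W @ x + (alpha / r) * (B @ (A @ x))

print(forward(rng.normal(size=d_in)).shape)  # -> (64,)
```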
但是
But
你说现在你需要将输出与输入配对,不需要文字标注。
you're saying that now you need to have an output attached to an input, no caption needed.
嗯。
Mhmm.
这就是你的LoRa训练。
That's your LoRa training.
是的。
Yeah.
比如,从起点到终点,重复大约20次。
Like, start point, end point, repeat 20 or so times
但那你要怎么
But then how do you...
……那就是要喂给它的数据。
...and that's the data it feeds on.
获得一种风格呢?
Get a style?
比如说,你想要复现某种特定的动画风格。
Like, let's say you're trying to recreate, like, a very particular style of animation.
对吧?
Right?
比如,黑白动画。
Like, black and white animation.
所以你是给它一个彩色图像作为输入,一个黑白图像作为输出?
So you give it a colored image as an input and a black and white image as an output?
是的。
Yeah.
我猜是这样。
I would guess so.
但我的问题是,要制作那个黑白图像,你是直接用手动方式,用任何你想要的流程来做吗?
But it's like my question was just like to make that black and white image, do you just do that by hand using whatever process you want?
对。
Yeah.
会去Photoshop里直接去色之类的。
Would go to Photoshop and just desaturate it or something.
我不确定。
I don't know.
对。
Right.
但我的意思是,你想要一种更有风格化的效果。
But I mean, you wanted something like more stylized.
所以我们正在做一个项目,它需要呈现出动画的风格。
So there's a project we're working on where it's supposed to be an animation look.
而我之前的做法是——我发现如果直接生成那一帧……因为它里面有角色。
And what I've been doing was... I found that if you try to create the frame directly — because it has characters.
我们需要在特定的地点安排特定的人物,并且采用特定的动画风格。
It has, like, you know, we need to have specific locations with specific people in a specific animated style.
我发现,如果把所有这些要求——把人物放在某个位置、再套某种风格——一股脑交给Nano Banana或Seedream之类的模型,它就会"烧脑",根本给不出那种风格的结果。
I found that if I gave something like Nano Banana or Seedream all of those commands — like, people, place, in a certain style — it would, like, melt its brain, and it just wouldn't give me the thing in the style.
但我发现,可以先生成一张写实风格的图像,让角色和场景看起来符合我们的要求,然后再用Flux Kontext配合一条固定的文字提示来处理,比如"把这张图转换成富有笔触感的油画风格"。
But I found if I make a photorealistic-looking image with the character in the place that, like, looks the way we want it to look, then I've just been running it through Flux Kontext with a consistent text prompt of, like, "change this into a painterly, you know, oil-based painting style."
这样输出的图像效果很好,而且风格也很一致。
And then the output image looks pretty good and looks pretty consistent.
我还没有发现需要为此训练一个LoRa,但我正在考虑可以这么做,因为我有起始帧和结束帧。
I haven't found a need to train a LoRa on that, but I'm thinking I could because I have the start frame and I have the end frame.
如果我想简化流程,或者团队里其他人也要做、需要保证流程更一致,我或许可以直接用写实图像作为起始帧、风格化图像作为结束帧,用它们训练一个LoRA。
If I wanted to simplify the process, or if other people on the team were doing it, to make sure it's a more consistent process, I probably could just give it the start frame of the photorealistic-looking image and the end frame of the stylized image, and train a LoRA on that.
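作为示意(目录结构、文件名和字段都是假设;Replicate、ai-toolkit等各家训练服务要求的实际格式各不相同):可以把写实的"起始帧"与手工风格化的"结束帧"配成对,连同触发词整理成一份清单再交给训练器。
As a sketch (the directory layout, file names, and fields are assumptions; actual trainers like Replicate or ai-toolkit each expect their own format): you could pair the photoreal "start" frames with the hand-restyled "end" frames, plus a trigger word, into a manifest for the trainer.
```python
import json
from pathlib import Path

TRIGGER = "PAINTERLY_V1"  # hypothetical unique trigger word

def build_manifest(pairs_dir: Path) -> list[dict]:
    """Pair photoreal start frames with hand-restyled end frames.

    Expects files named like shot01_start.png / shot01_end.png; the ~20
    manually restyled images discussed above would be the 'end' frames.
    """
    records = []
    for start in sorted(pairs_dir.glob("*_start.png")):
        end = start.with_name(start.name.replace("_start", "_end"))
        if end.exists():  # skip shots that haven't been restyled yet
            records.append({
                "input_image": start.name,
                "output_image": end.name,
                "caption": f"change this into a {TRIGGER} painterly oil painting style",
            })
    return records

if __name__ == "__main__":
    manifest = build_manifest(Path("dataset"))
    lines = "\n".join(json.dumps(r) for r in manifest)
    Path("dataset/manifest.jsonl").write_text(lines)
```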
听起来挺麻烦的。
Seems like a pain in the butt.
是的。
Yeah.
只要文本提示有效就行。
As long as you've been like, well, the text prompt works.
只要每个人都把那段提示词原样复制粘贴进Flux Kontext,它就能运作得很好。
It's like, as long as everyone just copies and pastes the paragraph of the prompt in for Flux Kontext, it works pretty well.
那我为什么还需要训练一个LoRA呢?
So why do I need to train a LoRA?
是的。
Yeah.
你知道,我还在犹豫。
You know, I'm still on the fence.
另一方面是那些不开源的封闭模型,比如Nano Banana,也许还有Seedance,当然还有ChatGPT。
On the flip side, some of the, like, closed-down models that are not open source — you know, Nano Banana, Seedance maybe, definitely, you know, ChatGPT.
这些模型没有LoRA训练。
There is no LoRA training.
对吧?
Right?
所以它们唯一拥有的功能就是参考图像插入。
So the only thing it does have is reference image insertion.
是的。
Yeah.
或者,如果你需要改变风格,就只能靠文字提示,这时ChatGPT或Claude就派上用场了:让它写一条非常详细的文字提示,提示足够详细的话,持续使用就能得到相当一致的输出。
Or just trying to text prompt it if you need to change the style — that's where ChatGPT or Claude comes in handy, where you have it write a very detailed text prompt, and the prompt is detailed enough that, if you keep running it consistently, you get pretty consistent outputs.
是的。
Yeah.
目前还无法确定哪个图像模型是最好的,我认为我们永远也达不到所谓的最全面的图像模型。
The jury is still out on which image model is the best, and I I don't think we'll ever get to, like, the most comprehensive image model ever.
我觉得
I think
我觉得,永远都会是哪个图像模型最适合
I think it's always gonna be, which image model is the best for...
你的需求。现在你正看到这些领域逐渐明确起来。
...what you need. And now you're kind of seeing those lanes being defined.
就像我提到的,Ideogram。
Like I mentioned, ideogram.
对吧?
Right?
比如,Ideogram 在徽标、文字和字体方面做得非常好。
Like, ideogram has got the logos and text and font stuff down.
这就是他们的专长领域。
Like, that's their lane.
我认为Nana Banana之所以获得如此多关注,是因为它在写实风格上表现得非常出色。
I think Nano Banana, the reason it gets so much attention, is because it's just really good at photorealism.
但这并不意味着它是最好的模型。
But that doesn't mean it's the best model.
它只是在写实风格上特别出色。
It's just really good at photorealism.
然后是Seedream,它非常擅长创意作品、动画感和风格化的内容。
And then you have Seedream, which is really good for creative work and sort of animated looks and stylized stuff.
你知道吗,再过六个月,我们就不会讨论哪个模型是最好的了。
You know, down the road, six months from now, we won't be talking about what model's the best.
那时候大家会更关注:你用哪个模型来做x、y和z?
It'll be more like, what model are you using for x and y and z?
也许在未来的某一集中我们可以再聊,因为我一直看到来自xAI的东西——不管他们的视频生成器叫什么名字。
Maybe in a future episode we could revisit, because I keep seeing stuff from xAI, whatever their video generator is called.
哦,是的。
Oh, yeah.
Grok现在是0.9版本。
It's on version 0.9 — Grok.
是的。
Yeah.
Grok 0.9。
Grok 0.9.
我看到过用它生成的威尔·史密斯吃意大利面的视频。
I saw the Will Smith eating pasta one done on that.
看起来挺不错的。
It looked pretty good.
是的。
Yeah.
我觉得它可能又开始流行了,因为Sora的来回反复——Sora 2刚发布时说想干嘛都行,结果一周后又彻底反转,说啥也不能做。
And I think, you know, it's probably getting popular again because of the back and forth, the boomerang of Sora — like, Sora 2 comes out, do whatever you want, and then a week later, a complete U-turn of, like, you can't do anything.
现在人们都觉得,我就想自由创作。
And now people are like, I wanted to do whatever I want.
然后Grok就说,你想做什么都行
And then Grok is like, you can do whatever
你在这里可以随意做。
you want here.
大门缓缓打开。
The doors wind open.
我们不在乎。
We don't care.
是的。
Yeah.
你得记住萨姆·阿尔特曼和埃隆·马斯克之间的那场争执。
And you gotta remember the whole Sam Altman, Elon Musk beef.
对吧?
Right?
随着事情的发展,他们会每一步都试图压过对方。
Like, they're they're gonna one up each other at every step of the game as as we go on.
是的。
Yeah.
对。
Yeah.
要是Grok里能有个埃隆的客串就好了,这样就能让萨姆·阿尔特曼和埃隆对上了。
If only there was an Elon cameo on Grok, so you could have Sam Altman and Elon.
一决高下。
Duke it out.
不是在Grok上。
Not on Grok.
抱歉。
I'm sorry.
是在Sora上。
On Sora.
所以你让萨姆和埃隆跟洛根·保罗一决高下。
So you got Sam and Elon duke it out with Logan Paul.
我有个主意。
Here's an idea.
你可以用传统的视觉特效来做。
You could do it with traditional VFX.
你可以
You could
这么做
do that
也行。
too.
其实不行。
Actually, no.
我收回刚才的话。
I take it back.
你可以用几乎任何开源模型来实现。
You could do it with pretty much any open weight model.
你只需要取他们两人的参考图片。
You just take reference images of both of them.
这很容易。
It's pretty easy.
是的。
Yeah.
你可以用WAN 2.2 animate来实现。
You could do it with WAN 2.2 animate.
我的意思是,效果真的很好。
I mean, and it's really good.
重现那场打斗的图像没有问题。
No problem recreating images of that fight.
因为今天,我只是让一个人这样拿着詹姆斯·卡梅隆的照片。
Because today, I just made somebody holding James Cameron like this.
当我上传詹姆斯·卡梅隆的照片时,我在想,他们会因为这是个名人而标记出来吗?
And as I was uploading James Cameron's image, I was like, are they gonna flag that this is a famous person?
而且没什么问题。
And it was fine.
它什么都没做。
It didn't do anything.
嘿,去噪器们。
Hey, denoisers.
所以我和乔伊有了这个想法。
So Joey and I had this idea.
我们为什么不开始回答观众的问题呢?
Why don't we start answering our audience's questions?
如果你对我们过去讲过的任何内容、某些AI模型或其他任何东西,甚至是我们还没涉及的内容有疑问,请发送邮件至 denoised@vp-land.com 提交你的问题。
If you have a question about anything that we've covered in the past, certain AI models or what have you, or even stuff stuff we haven't gotten to yet, submit a question at the following email denoised@vp-land.com.
再次提醒,如果你在YouTube上观看,这个邮箱地址也会显示在文字里。
Again, it's gonna be on the text here if you're watching it on YouTube.
denoised@vp-land.com,我们会收到这些
Denoised@vp-land.com, and we'll get those
这是 v p dash land,因为有人占着 VP land 这个域名,想收几千美元的费用。
It is v p dash land because someone else is just sitting on VP land and wants to charge thousands of dollars for it.
所以在节目变得大得多之前。
So until until the show grows way, way bigger.
是的。
Yeah.
我们非常期待听到你们的问题。
We'd love to hear your questions.
我们未来可以做一个邮件问答专场,回答一些关于……的问题。
We could do a mailroom episode in the future — questions about, yeah...
就是一些正在发生的事情,或者工作流程方面的问题,尽管发过来。
Just stuff that's happening or workflow questions, send them over.
和往常一样,所有我们提到的内容的链接都在 denoisedpodcast.com。
And, links as usual for everything we talked about at denoisedpodcast.com.
谢谢观看。
Thanks for watching.
我们下一期再见。
We'll catch you in the next episode.