Today on the AI Daily Brief, 25 things you can do with Nano Banana Pro that you couldn't do with AI image generation before.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
All right, friends, quick notes before we dive in.
First of all, thank you to today's sponsors: KPMG, Rovo, Robots and Pencils, and Blitzy.
To get an ad-free version of the show, go to patreon.com/aidailybrief, or you can subscribe on Apple Podcasts.
To learn about sponsoring the show, send us a note at sponsors@aidailybrief.ai.
Lastly, we are in the last couple of days of the AI ROI benchmarking study.
Check it out at roisurvey.ai, and thanks to all of you who have contributed so far.
Final quick note before we dive in.
Because we are just wallowing in all of the fun ways to use this new model today, we will be skipping the headlines.
Then, of course, over the weekend we have a big think episode, and we will be back with our normal format on Monday.
With that out of the way, let's dive in.
Welcome back to the AI Daily Brief.
The last two weeks have been an absolute embarrassment of riches.
We got GPT 5.1, followed by Gemini 3, followed by GPT 5.1 Pro and Codex Max, followed, it turns out, by Nano Banana Pro.
And in some ways, in terms of immediate impact and change in your capabilities with AI, I think that this image model might be the big one.
Now, quick warning: if you haven't already, you've got to switch over to the visual version for this episode.
You can find it on Spotify.
There's also a version of it on YouTube.
Today we are talking about 25 things you can do right now with Nano Banana Pro, which is of course Google's latest image generation model.
And importantly, this isn't just cool things that you can do.
This is stuff that you pretty much couldn't do as of three days ago with the previous state of the art in image generation.
Now, by way of background, you might remember that a couple of months ago, everyone started going wild for the Nano Banana image model.
Nano Banana was, of course, its code name, but it became so beloved that it sort of stuck.
Its technical name was, I think, Gemini 2.5 Flash ImageGen or something like that.
In any case, what made Nano Banana so interesting to people was not that its raw output definitively beat other image models; it's that it was so unbelievably steerable.
It provided for fine-grained editing in ways that really opened up new use cases.
So much so that it got me thinking that we need a different type of heuristic or metric or eval when we look at a new model, one that's not so much about its performance on benchmarks but is instead about what capabilities it unlocks.
I proposed the idea of an unlock score, which would basically be exactly that: a determination of what new possibilities a model opens up.
And in any version of that, Nano Banana Pro would just score off the charts.
Now, in terms of the capabilities, there are really two big things that feed into everything else, in my estimation.
The first is text representation.
The difference between any other model putting words on an image and what Nano Banana Pro can do is the single biggest jump that I've ever seen between models when it comes to image generation.
Period.
Full stop.
Now, the second factor that combines with that to open up all sorts of new possibilities is the fact that Gemini is able to reason on top of image generation.
So when you are inside Gemini, it is not two disconnected experiences: you don't just have to prompt an image, you can actually talk to the model and figure out exactly what you're trying to do.
That reasoning on top of image generation once again opens up totally new possibilities.
Finally, a third factor that should be mentioned as well is the incredible fidelity that it has to whatever edits you want to make.
All of which adds up to this being a model that is going to turn you from an average image generator into an absolute professional.
In a lot of ways, I think the core story here, and the meta category that a huge number of these use cases fall out of, is the idea of visual compression.
One of the big themes that you see across a ton of the early experiments that people are doing is taking a bunch of information and making it visual.
In other words, the difference in how Nano Banana Pro can use text is not just a change in scale, but a change in kind.
The new unlock is not just being better able to use text, but being able to do it so well that you can start to tell visual stories.
So a first example of what you can do with Nano Banana Pro is compressing lots of data into visuals.
Take, for example, financial results.
Deedy Das of Menlo Ventures took the entire NVIDIA Q3 earnings PDF and generated a single-page infographic that had the key highlights like revenue, operating income, net income, and gross margin, and was also able to highlight other key parts of the report, including segment performance and drivers, and capital strategy and risks.
Justine Moore from a16z did something similar with Alphabet's Q1 earnings release.
That one was even higher on chart density, with accurate charts showing revenue growth, operating income growth, and revenue composition.
Accurate, to-scale charts are actually a great example of where the more sophisticated intelligence changes the nature of what you can do.
To demonstrate this capability, Simon Smith made a chart of bananas where he showed the difference in magnitude between 25%, 50%, 75%, and 100%, and it actually gets it right.
Simon said: "I've previously tried to generate charts with image generators and they failed to get the correct lengths for bars and columns."
Kaushik Shivakumar found something similar.
The Google DeepMinder wrote: "An emergent capability of Nano Banana Pro that took me by surprise: the ability to generate beautiful and accurate charts that are to scale."
In his case he tested GDP per capita, and it is not only accurate, it really is aesthetic at the same time.
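To make "to scale" concrete: a bar chart is to scale when each bar's drawn length is proportional to the value it represents. The snippet below is only a minimal sketch of how one might check that. The function name `is_to_scale`, the 5% tolerance, and the pixel measurements are all assumptions made up for illustration; none of them come from the episode or from any of the posts it quotes.

```python
def is_to_scale(values, lengths, tolerance=0.05):
    """Return True if every drawn length is proportional to its value.

    The bar with the largest value is used as the reference. Each other
    bar is expected to measure value / max_value * reference_length,
    within `tolerance` (a fraction of the reference length).
    """
    if len(values) != len(lengths) or not values:
        raise ValueError("values and lengths must be non-empty and the same size")
    ref_index = max(range(len(values)), key=lambda i: values[i])
    ref_value, ref_length = values[ref_index], lengths[ref_index]
    if ref_value <= 0 or ref_length <= 0:
        raise ValueError("reference bar must have a positive value and length")
    for value, length in zip(values, lengths):
        expected = value / ref_value * ref_length
        if abs(length - expected) > tolerance * ref_length:
            return False
    return True


# The percentages from the banana chart example in the episode.
percentages = [25, 50, 75, 100]

# Hypothetical bar lengths in pixels, invented for this sketch.
measured_ok = [100, 201, 298, 400]    # roughly proportional
measured_bad = [180, 240, 330, 400]   # short bars drawn too long

print(is_to_scale(percentages, measured_ok))
print(is_to_scale(percentages, measured_bad))
```

The second list mimics the failure Simon describes, where bars look plausible but their lengths do not match the numbers.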
Another approach to this idea of compressing lots of data into visuals, which I kind of think is going to become a trend, is the whiteboard.
Pietro Schirano writes: "Nano Banana Pro is wild.
Here's my favorite use case so far.
Take papers or really long articles and turn them into a detailed whiteboard photo.
It's basically the greatest compression algorithm in human history."
He then shows a 92-page PDF, the Llama 3 Herd of Models paper, compressed onto a professor's whiteboard.
And while obviously you can only fit so many details onto a whiteboard, it's an incredibly impressive summarization.
So again, just to reinforce what's different here, we have two things coming together: a better ability to handle and represent text, and the ability to reason on top of image generation to create truly native visual outputs.
This leads to a fourth, very broad category of use cases, which we'll call educational.
Effectively, image generation can now be an educational tool alongside LLMs in a way that was just impossible up until literally a couple of days ago.
We have infinite examples here, but to take a few, we have a visualization showing what parts of robotics are solved versus where there are key bottlenecks and hurdles.
Clark Wimberly created a visualization explaining how a touchscreen works from tap to action.
From the literally dead simple prompt "make an infographic explaining how a touchscreen works," Nano Banana Pro was able to put together a four-part visual that looks great and explains how the process goes from physical touch, to sensing the interaction, to signal processing, to executing the command.
swyx went meta and asked Nano Banana Pro to explain Nano Banana Pro.
He got it in two very different ways.
One is a sort of good-looking but ultimately academic infographic, while the other is a literal classic comic strip explaining what Nano Banana can do.
I'm, of course, seeing a ton of parental use cases.
Google's Jaclyn Konzelmann created this gorgeous tour of our solar system that looks like the type of poster that you'd put on a three- or four-year-old's wall.
Speaking of three- or four-year-olds, I have a four-year-old who is learning to read and who is very, very much into construction equipment and construction tools.
And so, of course, I had to put together an alphabet chart that was based on that theme.
And if you've ever tried to do something like this, you will know that while this feels like it should have been table stakes, it absolutely was not.
It was nearly impossible to get something like this before.
And it was pretty much genuinely impossible to get it with no errors and without specifying all the different elements.
I didn't have to tell it to put an asphalt paver for A or a bulldozer for B.
I just told it that I wanted an alphabet chart with these themes and it figured out all the rest.
As you can see, we're starting with these very broad use cases that are actually lots of use cases bundled together.
But the next one we'll look at is a sort of subset of infographics, which is flowcharts.
Ethan Mollick prompted it:
"I need a flowchart for how to toast bread, make it as wacky and over the top and complicated as possible."
And it did that in grand fashion.
Now, Ethan was being silly, but this ability to lay out different visual elements as a flowchart is obviously incredibly valuable for serious uses as well.
What if AI wasn't just a buzzword, but a business imperative?
On You Can With AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises.
Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose.
From aligning culture and leadership to building trust, data readiness, and deploying AI agents.
Whether you're a C-suite executive, strategist, or innovator, this podcast is your front-row seat to the future of enterprise AI.
So go check it out at www.kpmg.us/aipodcasts or search You Can With AI on Spotify, Apple Podcasts, or wherever you get your podcasts.
Meet Rovo, your AI-powered teammate.
Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio.
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work.
Connect Rovo to your favorite SaaS app so no knowledge gets left behind.
Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one.
Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions.
Know the feeling when AI turns from tool to teammate?
If you Rovo, you know.
Discover Rovo, your new AI teammate powered by Atlassian.
Get started at rovo.com.
AI changes fast.
You need a partner built for the long game.
Robots and Pencils works side by side with organizations to turn AI ambition into real human impact.
As an AWS Certified Partner, they modernize infrastructure, design cloud-native systems, and apply AI to create business value.
And their partnerships don't end at launch.
As AI changes, Robots and Pencils stays by your side, so you keep pace.
The difference is close partnership that builds value and compounds over time.
Plus, with delivery centers across the US, Canada, Europe, and Latin America, clients get local expertise and global scale.
For AI that delivers progress, not promises, visit robotsandpencils.com/aidailybrief.
This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context.
Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code.
Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements.
The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.
Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.
Next up, number six is visual tutorials. Callum McClark put together a chart of the correct bowing procedure for Taekwondo, dividing it into four steps, as well as providing insight on when to bow.
Once again, Callum didn't provide a ton of information.
When someone asked, he said the prompt was fairly simple: "Generate me an infographic explaining how to bow correctly in ITF Taekwondo."
Now, I didn't see any versions of this, but can you imagine how valuable this would be for instructions on assembling something?
Another sort of separate category of instruction that a lot of people are experimenting with is visual recipes.
ChubbionX built a chart showing how to make the perfect cardamom tea.
Vittorio created a step-by-step guide for cooking the perfect pasta.
Anatomical and technical drawings were a huge theme.
The JSON prompts account showed a bunch of Pokémon anatomy drawings, including Squirtle, Bulbasaur, Charmander, and Pikachu himself.
Another use case which I think we'll see a ton of is taking one type of media and turning it into another type of media.
Shopify CEO Tobi Lütke took a video of a speech that he gave to his team a number of years ago and turned it into a rich, complex visualization, which is something you'd better believe I'm going to try with the transcripts of this show.
Another subgenre of technical drawings that we're seeing people experiment with is blueprints.
The AI for Success account wrote: "It did not just create the image, it first read the blueprint properly and then created the final output with every small detail."
Another great representation of the power of true multimodal understanding.
Sort of related to that is another use case that I think is going to be highly commercialized, which is virtual staging.
Justine Moore from a16z gave it a set of three pieces of furniture and said, "Stage a living room with this couch, table, and two chairs."
It executed the output very well, and Justine wrote: "The first iteration of the model was good at this, but I find the new model is much better at retaining textures and asymmetry and unique features of objects."
Alcine went further, writing: "Nano Banana Pro is making millions of interior designers obsolete.
I uploaded my floor plan and it designed the whole house for me and even generated real images for each room based on the dimensions."
Now, in my opinion, this is once again an example of where professional interior designers are just going to be able to do more, faster, potentially for less and for a greater number of clients, but the capability increase is absolutely huge.
Another capability that Google talked a lot about when they launched this was the ability to combine multiple people into a single photo.
fofr found that Nano Banana Pro could take up to 14 reference images, and that while it worked best with around five people, sometimes you could push it further.
If you've ever tried to combine people into an image, you'll know that AI models frequently have a hard time and end up kind of blending people's features together rather than putting them next to one another.
fofr actually showed a couple of examples of the different styles you could create by combining all these reference images.
A big theme here is really precise instruction following.
Halim Alrasihi gave the model two characters, a style reference, and a sketch of the action pose that they wanted, and got the exact image that they were looking for.
Now, this precise instruction following and precise editing is once again, I think, going to be one of the most commercially viable and important unlocks of the whole model.
The original Nano Banana was good at this.
This was actually the core of where the unlock score idea came from.
This ability to spot-edit photos.
But Pro takes it to another level.
Clark Wimberly took a photo of a man in a warehouse and prompted the model: "This man just got a supplier price change request and looks concerned."
The model makes the change in ways that look incredibly natural and not exaggerated.
Clark also turned a White Claw into a glass of soda with a striped straw.
Gotta give a shout-out to Prins as well, who took a handful of Magic cards, split between red and black, and made them all into black cards.
Now, if you don't know Magic, I want to underscore a couple of things about this that make it even more impressive than it seems.
The first is that it was able to tell, based on what the user was asking, that it needed to change the Mountain on the left to a Swamp, which is the basic land associated with black in the game.
That involves a whole different level of understanding and comprehension, because none of that was in the prompt.
The second part is that the borders of different colors have different visual cues.
And so the model knew that it couldn't just change the color of this pattern; it actually had to change the pattern to what black cards look like.
Now, while this was just a demonstration example, that level of precise editing opens up such a crazy world of new opportunities.
Speaking of fidelity to instructions and precise editing, the ad agencies are going to be absolutely salivating.
One of the most common types of examples that people were sharing was product and brand shots.
Someone created high-fidelity advertising visuals for earbuds.
Hedra Labs took its logo and put it on a billboard.
Jacob Paulsall took a set of reference product images and turned them into a magazine-style ad.
Now, staying in the brand, marketing, and advertising theme, a lot of people also experimented with logos.
Now, this is one where I will say, for the sake of having some amount of skepticism, I still think that the logo outputs of Nano Banana Pro are, to put it uncharitably, tasteless and ugly.
But I also haven't gone in and tried to get something really great out of it.
And to give it credit, I also think most of the logos that it was trained on are absolutely horrifyingly ugly.
Still, bringing it back to the very impressive, Pro isn't just able to generate a logo or a brand asset, it's able to generate brand assets in bulk.
Crystal Maria writes: "One-shot a brand and put it on merch with a low-effort prompt on Nano Banana Pro."
She created a new chicken pizza company and designed a pizza box, a T-shirt, and a hat, all with a consistent, integrated logo system.
Andrew Lane did the same for a matcha energy and collagen brand.
Now, one thing that people noticed as they were doing this is that Google appears to have wound back the guardrails just a little bit.
It's more comfortable producing images of people and owned IP.
For example, folks were able to get the Star Wars and Disney logos really accurately.
Now, whether this will persist, I have fairly big questions, but the more that, within reason, they can just let people do those sorts of logo identities at least, I think the more use cases it opens up.
Just a few more use cases before we wrap up. Tons of people were experimenting with movie stills. Viral AI advertiser PJ Ace wrote: "Nano Banana Pro is the most cinematic model on the planet.
I asked Gemini to generate photorealistic leaked images from the new Legend of Zelda movie, and this will change Hollywood."
Archer Rathi did the same thing with a Wallace and Gromit still but was able to render it from multiple angles, calling it a leapfrog moment for AI filmmaking.
Speaking of filmmaking, Nano Banana Pro's text capabilities also open up improved possibilities for using AI video.
Nick Matarisse writes: Step one: Upload an image or generate an image using Nano Banana.
Step two: Use Nano Banana Pro to annotate the image.
His prompt was: "Add sketch annotations on top of this image explaining the camera movement.
I want it to crane up and look down as an aerial shot."
Step three: Use Veo 3.1's frames to video to bring it to life.
Basically, the annotations on the image allow the video model to know what to do.
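The three steps above can be written down as a small, ordered checklist. This is only a sketch for keeping the workflow and its prompt together: the `Step` class and the function names are hypothetical, invented for illustration, and nothing here calls Gemini, Nano Banana Pro, or Veo. The tool names and the prompt text are taken from the episode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str         # model or feature named in the episode
    action: str       # what the user does at this step
    prompt: str = ""  # prompt text, only where the episode quotes one


def annotate_to_video_workflow() -> list[Step]:
    """Return the three steps in the order described in the episode."""
    return [
        Step("Nano Banana", "Upload an image or generate one"),
        Step(
            "Nano Banana Pro",
            "Annotate the image",
            prompt=(
                "Add sketch annotations on top of this image explaining "
                "the camera movement. I want it to crane up and look "
                "down as an aerial shot."
            ),
        ),
        Step("Veo 3.1 frames to video", "Animate the annotated image"),
    ]


def describe(steps: list[Step]) -> str:
    """Render the steps as a numbered plain-text checklist."""
    lines = []
    for number, step in enumerate(steps, start=1):
        line = f"{number}. [{step.tool}] {step.action}"
        if step.prompt:
            line += f' | prompt: "{step.prompt}"'
        lines.append(line)
    return "\n".join(lines)


print(describe(annotate_to_video_workflow()))
```

Keeping the annotation prompt as data makes it easy to swap in a different camera move while leaving the rest of the workflow unchanged.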
There is a ton of media remixing, like people putting digital news articles on old newsprint, people taking contemporary logos and turning them fluffy, and people taking photos of their kids and turning them into movie posters.
In an impressive display of physics, Christopher Fryent found that he could apply an image, in this case of Sydney Sweeney, to a dodecahedron, and fofr again was able to take a meme and turn it into Legos.
Indeed, I think the meme potential here is pretty unlimited.
I took the Bass Face Kid meme, basically an image that people share when they really like a song, particularly in dance music circles.
And I asked Nano Banana to turn it into a four-part visual scale where the face goes from normal to the most insane bass face.
You can see here that I think it absolutely nailed it,
creating settings for normal, mild bass, intense, and insane bass face.
What's clear from all of this is not just that the unlock score of Nano Banana is off the charts, but that it pretty fundamentally redefines how we have to think about image generation capabilities.
For those of you who have followed Ethan Mollick for a while, you'll know that he's used a similar test prompt for years to see how new image generation models fail.
It's basically otters on a plane using Wi-Fi.
He writes, tongue in cheek: "I think my otters on a plane using Wi-Fi may be a saturated benchmark now that Nano Banana Pro can do this."
The image is a set of otters in white lab coats and glasses explaining on a whiteboard why models previously had a hard time with this, with a gallery wall on the right showing all of the previous generations.
We are, in other words, in very new territory.
Now, we'll explore a lot more about just what the implications of this are.
For now, if you have access to Gemini, I would highly recommend just going and spending a bunch of time playing with this.
Try exploring not just interesting image generation, but something where you need to convey a lot of information density with visuals.
I think you'll be impressed, and I think it'll change, in a good way, what you think is possible with AI image generation.
For now, that's gonna do it for today's AI Daily Brief.
I appreciate you listening or watching as always, and until next time, peace.