今天的AI每日简报,OpenAI发布了一款新的图像生成模型。
Today on the AI Daily Brief, OpenAI has released a new image generation model.
我们正在收集所有首批反馈,同时讨论四个我认为它甚至优于Nano Banana Pro的领域。
We are gathering all of the first responses as well as talking about four areas where I think you may prefer it even over Nano Banana Pro.
《AI每日简报》是一档每日播客和视频节目,聚焦人工智能领域最重要的新闻和讨论。
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
好了朋友们,在深入讨论之前先快速宣布几件事。
Alright friends, quick announcements before we dive in.
首先,感谢今天的赞助商:KPMG、Blitzy、Rovo以及Robots and Pencils。
First of all, thank you to today's sponsors: KPMG, Blitzy, Rovo, and Robots and Pencils.
要获取无广告版本节目,请访问patreon.com/aidailybrief,或可在Apple Podcasts订阅。
To get an ad-free version of the show, go to patreon.com/aidailybrief or you can subscribe on Apple Podcasts.
当然,若有意赞助本节目,请发送邮件至sponsors@aidailybrief.ai联系我们。
And of course, to learn about sponsoring the show, send us a note at sponsors@aidailybrief.ai.
小小剧透一下:如您所知,我们即将公布AI投资回报率基准调查的早期数据结果。
Little teaser here: As you know, we've got the early readout results of the AI ROI benchmarking survey coming.
总的来说,如果你对这类数据和研究感兴趣,不妨访问aidbintel.com。
And just in general, if you're interested in that sort of data and research, might I point you to aidbintel.com?
明年我们将在这些领域推出更多有趣的内容,你可以注册以便在我们分享更多信息时获得通知。
We're going to have a lot more interesting things in these domains coming next year, and you can sign up to get notified as we share more of that information.
在深入讨论之前还有一点要说明:与往常发布新模型时一样,本期节目篇幅较长,占用了我们所有的空间。
Now, one more note before we dive in: As is usually the case with a new model release, this episode got long and consumed the entirety of the space that we have.
现在我们需要一个扩展版头条了,所以很快就会推出。
At this point, we're getting due for an extended headline, so that will be coming soon.
不过现在,请先欣赏关于ChatGPT图像生成功能的介绍。
But for now, enjoy this look at ChatGPT images.
又是新的一天,又是新的模型。
Another day, another new model.
看吧,这些大实验室之间的竞争可能让工作人员倍感压力,但对我们消费者而言,只意味着更多选择。
Look, this competition between the big labs may be stressful for the people working there, but for us consumers, it means nothing but more choice.
今天我们要讨论的是OpenAI最新的图像生成模型,以及他们为其打造的新平台——他们称之为ChatGPT图像。
Today we are talking about OpenAI's latest image generation model and the new house they put it in, which they are calling ChatGPT images.
总的来说,这个结果某种程度上在我预料之中。
Now, overall, this is one that I kind of expected.
你可能记得在十二月的预测节目中,甚至在Sam Altman宣布进入'红色警戒'状态之前(或者说至少在我们知道之前),我对Gemini 3和Nano Banana Pro的回应预测就是OpenAI会推出一个图像模型。
You might remember in the December prediction episode, even before I think Sam Altman had declared Code Red, or at least before we knew about it, my best guess for a response to Gemini 3 and Nano Banana Pro was an OpenAI image model.
我们确实很久没有收到这方面的更新了。
It had just been a really long time since we got an update on that.
这显然是一个他们明显落后的领域。
It was clearly an area where they were pretty far behind.
基于这么久没有更新的事实,再加上了解OpenAI的发布速度,人们会认为他们应该已经非常接近能够发布新模型了。
And it seemed like, based on the fact that it had been so long since we got an update, and knowing the speed at which OpenAI delivers, they had to be pretty close, one would think, to being able to release a new model.
虽然我没预料到会是完整的5.2版本发布——这显然是'红色警戒'的第一个成果——但OpenAI昨天周二确实推出了他们的新ChatGPT图像功能。
Now I didn't expect a full 5.2 release and that's obviously the first output of Code Red, but yesterday on Tuesday OpenAI dropped their new ChatGPT images.
他们指出的优势包括更强的指令跟随能力、精确编辑、细节保留,以及相比之前的速度大幅提升。
As benefits they point to stronger instruction following, precise editing, detail preservation, and a big speed boost as compared to before.
那么让我们更详细地谈谈OpenAI在这里强调的这些优势。
So let's talk a little bit more about what OpenAI points to as the benefits here.
这其中很多功能是为了与Nano Banana Pro实现特性对标。
A lot of this is about feature parity with Nano Banana Pro.
要记住Nano Banana Pro的真正价值不仅在于原始生成能力的提升。
Remember the real value of Nano Banana Pro was not just that it was an improvement in terms of raw generation capability.
关键在于用户对生成过程拥有的控制权。
It was about the controls that the user had over it.
过去要想获得完全符合预期的生成结果,你只能反复输入提示词,然后选择最接近的那个。
Whereas in the past to get exactly what you wanted out of a generation, you'd just have to kind of prompt it over and over and over again and pick the one that was closest.
Nano Banana Pro支持更精准的编辑功能。
Nano Banana Pro allowed for more precise edits.
现在ChatGPT图像也具备了这种能力。
That capability has now come to ChatGPT images as well.
他们写道:该模型能更可靠地遵循你的意图,精确到微小细节,只修改你指定的部分,同时保持光线、构图和人物外貌在输入、输出及编辑过程中的一致性。
They write the model adheres to your intent more reliably, down to the small details, changing only what you ask for while keeping elements like lighting, composition, and people's appearance consistent across inputs, outputs, and edits.
有趣的是,他们指出了一些相当以消费者为中心的使用场景,这个主题我们将在本期节目中多次提及。
Interestingly, they point to some pretty consumer centric use cases for that, which is a theme we'll come back to throughout this episode.
他们继续写道:这解锁了符合你意图的结果:更可信的服装和发型试穿效果,同时保留原图本质的风格滤镜和概念转换。
They continue: This unlocks results that match your intent: more believable clothing and hairstyle try ons, alongside stylistic filters and conceptual transformations that retain the essence of the original image.
他们指出的另一项能力是添加、减去、组合、混合和调换。
Another capability they point to is adding, subtracting, combining, blending, and transposing.
例如,将一组输入转化为单一构图。
For example, taking a set of inputs and turning it into a single composition.
他们重点强调的另一项能力是他们所谓的创意转换。
Another capability that they're really hammering is what they call creative transformations.
基本上是将一张图片转换成不同的风格预设。
Basically taking one image and turning it into a different style preset.
电影海报风格。
A movie poster.
或者把某人变成八十年代的健身教练。
Or turning someone into an eighties fitness instructor.
将某人的照片变成装饰品。
Taking someone's photo and turning it into an ornament.
诸如此类,等等。
Etcetera, etcetera.
再次强调,正如我们稍后会回到这个话题,我认为他们重点展示这些功能,实际上充分说明了这款产品的目标用户群体是谁。
Once again, and as we'll come back to, I actually think that they're highlighting this says a lot about who they are intending this product for.
他们提到的其他优势还包括更好的指令跟随能力,甚至可以实现更精确的提示输入,同时还显著提升了文本渲染质量。
Other benefits they point to include better instruction following, up to and including much more precise prompting, and they also point to much better text rendering.
显然,这是Nano Banana以及随后的Nano Banana Pro带来的最大变化之一:不仅能在图像中加入文字,还能生成大量高保真文本,这为信息图表等应用开辟了新可能。
Now this was obviously one of the biggest changes that we got with Nano Banana and then Nano Banana Pro: in addition to just being able to have text at all, you could get a ton of high-fidelity text, opening up new possibilities for things like infographics.
他们公告中最后提到一个有趣现象:虽然模型在大多数领域都有提升,但也确实发现了一些性能倒退的情况。
One final interesting thing from their announcement post is that while in most areas the model improved, they actually did find some regressions as well.
例如他们写道:生成某些特定艺术风格的能力相比前代版本有所退步。
For example, they write: The ability to generate some specific art styles has regressed from the previous version.
他们举的例子是'把我画成暗黑奇幻动漫风格',而新版本生成的结果完全——100%——不符合要求。
The example they give is Draw Me Like I'm in a Dark Fantasy Anime, with the new version completely 100% not being that at all.
此外还存在其他一些局限性。
There are other limitations as well.
例如,当图片中有许多不同面孔时,要在多次生成间保持所有这些面孔的一致性相当困难。
For example, when there's a picture with a lot of different faces in it, keeping all those faces consistent between generations can be difficult.
总体而言,他们宣称取得了重大改进,但仍有很大提升空间。
Overall, they claim a big improvement, but still a lot more opportunity ahead.
那么,人们的第一印象如何?
So, what were people's first impressions?
我感觉大家其实已经做好了会有些失望的心理准备。
I think my sense is that people were kinda prepared to be somewhat underwhelmed.
我不太确定具体原因是什么。
I'm not exactly sure what the reason for that is.
可能是担心由于这次属于'红色警报'项目,这个模型以及他们可能发布的其他模型都会是赶工之作。
Maybe it's a concern that because this was part of that Code Red, that this and basically any other model that they might release would be a rush job.
但对很多人来说,尽管做好了失望的准备,结果却——我该怎么说呢——还算满意。
But for a lot of people, even though they were prepared to be underwhelmed, they were, I would put it, kinda whelmed.
A16z的贾斯汀·摩尔写道:在早期测试中,该模型在保持上传图像中角色和对象一致性方面有显著提升。
Justine Moore from A16z writes: In early tests, this is a big step up in maintaining consistency of characters and objects from uploaded images.
换句话说,你的脸看起来还是你本人。
In other words, your face still looks like you.
它可能是Nano Banana Pro的真正竞争对手。
It may be a real competitor to Nano Banana Pro.
来自Klick Health的西蒙·史密斯写道:我原本没指望OpenAI的新图像生成器能与Nano Banana Pro相媲美,所以我用之前在NBP上试过的提示词让两者直接对比。
Simon Smith from Klick Health wrote: I wasn't expecting OpenAI's new image generator to be comparable to Nano Banana Pro, so I ran it head to head on prompts I tried with NBP.
令人惊讶的是,它的表现相当甚至更优。
Surprisingly, it did as well or better.
但它有不同的个性,至少在ChatGPT上是这样。
But it has a different personality, at least via ChatGPT.
不那么异想天开,更专业。
Less whimsical, more professional.
他举了几个例子:研究知名人士认为何时会出现通用人工智能。
So here are a couple of the examples he gave: Research when prominent people think we'll get AGI.
然后在时间线上进行可视化,并将这些人物的头像标注在他们预测的通用人工智能实现年份上。
Then illustrate this on a timeline and put the faces of the people on the timeline on the years when they think we'll have AGI.
给它一种有趣、卡通化但不过分幼稚的感觉。
Give this a fun kind of cartoony but not too silly feel.
现在,有几点需要注意。
Now, a couple things.
首先,我认为这是测试图像生成功能与模型其他部分融合程度的好方法。
First, I think this is a good test to see how well integrated with the rest of the model image generation is.
换句话说,这不仅需要图像生成能力,还需要推理和研究能力。
In other words, this requires not just image generation, but also reasoning and research.
其次,这带出了本集的核心挑战——顺便说,如果你只是听音频的话,这集很值得一看——某种程度上质量评判是主观的。
And the second thing that this brings up is that inherently the challenge with all of this episode, and by the way, this is a good one to watch if you're just listening, is that to some extent quality is going to be subjective.
不过在这个案例中,我完全理解他为什么更喜欢ChatGPT图像的版本而非Nano Banana的。
Although in this case, I certainly see why he prefers ChatGPT Images' version as opposed to Nano Banana's.
他尝试创建了一个细胞结构图(这个同样存在主观判断),但确实表现不俗,还有骨骼解剖图,以及一个要求'搜索今日头条新闻并以旧报纸风格呈现'的提示。
He tried creating a cell cutout diagram, which again is a little bit in the eye of the beholder, but certainly holds its own, alongside a skeleton anatomy chart, and a prompt that said search up today's top headlines and then give them to me in the style of an old newspaper.
这次两个模型对提示的解读方向截然不同,从美学角度我其实更喜欢Nano Banana Pro的版本。
Now the two models in this case took the prompt in very different directions, and I actually prefer aesthetically, Nano Banana Pro's.
但总体而言,西蒙表示,我原本做好了失望的准备,结果却出乎意料。
But overall, Simon says, I was prepared to be disappointed and I'm not.
这评价相当高,因为Nano Banana Pro已经非常出色了。
That's saying something because Nano Banana Pro is amazing.
我需要更多时间来试用新的图像生成器,但第一印象很正面。
I need more time to play around with the new image generator, but my first impressions are positive.
他随后又补充道:'不过幻灯片可能是GPT Image 1.5的弱点',但很快又改口说:'好吧,我收回这句话'。
He then came back and said, Slides, however, may be a weakness of GPT Image 1.5 before very quickly returning and saying, Okay, I take it back.
GPT Image 1.5能制作非常精美的幻灯片。
GPT Image 1.5 can do gorgeous slides.
你只需要正确引导它。
You just need to prompt it.
我在上述示例中使用了相同的模板,但改用GPT 5.2的思考模式替代即时生成,并采用了更宽泛的提示词。
I gave it the same template in the above example, but used GPT 5.2 thinking instead of instant and a broader prompt.
不过他指出,GPT图像在比例方面确实存在限制,这一直是ChatGPT图像的固有问题。
He did point out, however, that there are real limitations to the ratios that you can get with GPT Image, which has always been an issue for ChatGPT images.
尽管如此,综合来看,西蒙确实认为GPT Image 1.5在他的个人评分表上已经击败了Nano Banana Pro。
Still, all of this added up for Simon to him actually thinking that GPT Image 1.5 has beaten Nano Banana Pro on his personal scorecard.
而且不只是西蒙这么认为。
And it wasn't just Simon.
LMArena发推说:Image Arena大洗牌!
LMArena tweets: Image Arena shakeup!
OpenAI的GPT Image 1.5在文生图榜单上排名第一。
OpenAI's GPT Image 1.5 is number one in text-to-image.
ChatGPT Image Latest在图像编辑榜单上排名第一。
ChatGPT Image Latest is number one on image edit.
GPT Image 1.5在文生图榜单上以29分的优势遥遥领先,同时在图像编辑榜单上对Nano Banana Pro保持3分的微弱优势。
GPT Image 1.5 holds a commanding 29-point lead on text-to-image while maintaining a narrow three-point edge over Nano Banana Pro on image edit.
他们确实声明这些评分是初步结果,最终定档还有待观察,但我认为当前结果已足够让许多人惊讶。
Now they do say that these scores are preliminary and we'll see where they settle, but still I think this would surprise a lot of people.
Artificial Analysis也得出了类似结论。
Artificial Analysis found something similar.
他们写道:在文本生成图像和图像编辑两方面,GPT图像1.5在他们的测试中再次超越了Nano Banana Pro。
They wrote: On both text to image and image editing, GPT image 1.5 again surpassed Nano Banana Pro on their tests.
他们列举了几个不同的文本生成图像案例,以及诸如改变汽车颜色、插入鸭子家族穿越铁轨等编辑案例,最终再次将其评为第一名。
They gave a couple of different text to image generation examples, a couple of editing examples like changing a car's color, and inserting a family of ducks crossing a railroad, ultimately again ranking it number one.
网上有无数案例可供直接对比ChatGPT与NanoBanana Pro的表现。
Now there are a million examples out there if you want to go see direct head to heads on ChatGPT versus NanoBanana Pro.
我强烈怀疑,如果你对这场竞赛没有特别倾向或预设偏见,可能会发现某些场景下更喜欢ChatGPT,而另一些场景则更青睐Nano Banana Pro。
And my strong suspicion is that if you don't have a particular horse in the race or a set of biases that you're bringing in to start, you're likely to find some where you prefer ChatGPT and some where you prefer Nano Banana Pro.
就我个人而言,除了探索一些我认为有趣的内容外,我还进行了几项测试。
For myself, outside of just exploring a bunch of things that I thought were interesting, I ran a couple of tests.
针对多条件指令跟随测试,我要求生成'一个人站立指着屏幕'的画面。
For instruction following with multiple constraints, I asked for one person standing and pointing at a screen.
要求'两个人坐着'。
Two people are seated.
并规定'屏幕显示没有可读文字的抽象图表'。
The screen shows abstract charts with no readable text.
房间是现代简约风格的。
The room is modern and minimalist.
配色仅限于黑色、白色和浅灰色。
The color palette is black, white, and light gray only.
没有窗户、植物或标识。
No windows, no plants, no logos.
在这种情况下,Nano Banana Pro和GPT图像都能同样出色地完成任务。
In that case, both Nano Banana Pro and GPT images were able to do it equally competently.
在照片真实感测试中。
On a test of photorealism.
我要求生成一张照片级真实的手部图像,握着一个装有半杯黑咖啡的透明玻璃杯。
I asked for a photorealistic image of a hand holding a clear glass coffee mug filled halfway with black coffee.
手部必须完整展示五根手指且全部可见。
The hand has to have all five fingers and have them all visible.
玻璃杯必须呈现真实的反射和折射效果。
The glass has to show realistic reflections and refraction.
咖啡表面需要平整水平,自然室内光线搭配中性背景。
The coffee surface needs to be flat and level, natural indoor lighting in a neutral background.
同样,在这两种情况下,模型的表现都相当出色。
Again, in both cases, the models were pretty equally competent.
进入更具风格和美学的测试,我要求创作一幅1950年代复古未来主义风格的插画,采用扁平大胆的造型,限定使用蓝绿色、奶油色和哑光橙的配色,线条简洁,体现乐观的中世纪现代美学。
Getting into more stylistic and aesthetics, I asked for a 1950s retro futurist style illustration with flat, bold shapes, a limited color palette of teal cream and muted orange, clean lines, and an optimistic mid century modern aesthetic.
它们再次展现了同等实力,最终的偏好将取决于观者的个人品味。
Once again, they were both competent, and ultimately the preference here is going to be in the eye of the beholder.
这揭示了一个挑战:单一的风格指令可能被解读出不同含义。
One of the challenges that this shows is that a single stylistic prompt can mean different things.
这两幅都是1950年代复古未来主义的范例,但一幅更偏向《杰森一家》风格,另一幅则更抽象些。
These are both examples of 1950s retrofuturism, but one is a little more Jetsons and the other is a little more abstract.
当我们创造一个角色并将其置于不同场景时,两个模型都能轻松保持角色的一致性。
When we created a character and then put them in a different setting, both models had no problem keeping consistent from one to the next.
当然还有YouTube的缩略图——这个视频平台我最常用的场景。
And of course on YouTube thumbnails, a very common use case for me.
坦白说,两者都相当糟糕。
Frankly, they were both pretty garbo.
尽管我确信通过不同的提示方式可以改善这一状况。
Although I know for a fact that I could improve that with different prompting.
正如你可能从我的测试中看出的,我发现两者性能基本相当。
As you can probably tell across my tests then, what I found was pretty meaningful parity.
虽然未必比Nanobanana Pro有明显或巨大的提升,但相比OpenAI之前的图像生成模型确实进步显著。
Not necessarily a clear or huge improvement over Nanobanana Pro, but clearly a huge improvement from where OpenAI's image generation model was before this.
不过,如果你去Twitter/X上看看,不难发现持相反观点的人。
However, it's not hard to find people who feel the opposite if you go check out Twitter/X.
有很多人只是普遍感到失望。
There were many people who were just kind of generically underwhelmed.
Smol AI的AI News表示:发布任何产品都很难,所以我们很少点名批评失误,OpenAI也很少失手,但这次显然是一次失手。
AI News by Smol AI said shipping anything is hard so we rarely call out misses, and OpenAI rarely misses, but this was clearly a miss.
OpenAI Image 1.5号称击败Nano Banana Pro、在所有竞技场排名第一,但完全经不起实际体验检验。
OpenAI Image 1.5 claims to beat Nano Banana Pro as number one across all arenas, but completely fails vibe checks.
Yahiko进行了一项测试,发现角色面部准确度有所欠缺。
Yahiko did a test and found that character face accuracy was kind of lacking.
品牌设计师Daria Serkova提供了一张基础输入图像和一个产品包装,要求两个模型让输入图像中的女孩握住瓶子,并表示:虽然比以前有所改进,但ChatGPT没有把比例处理对,还改动了产品和光线。
Brand designer Daria Serkova gave a base input image as well as a product package and asked both models to make the girl in the input image hold the bottle, and said, While it's better than before, ChatGPT didn't get the scale right and changed the product and the light.
如果我要求它进行一些编辑,它会重新处理整个图像。
And if I ask it to make some edits, it reworks the whole image.
我们将继续测试,但目前谷歌以1比0领先。
We'll keep testing but for now it's one to zero for Google.
David Shapiro提供了一堆自己的照片,要求两个模型创建YouTube缩略图,在这方面Nano Banana的表现明显优于ChatGPT。
David Shapiro provided a bunch of images of himself and asked both models to create a YouTube thumbnail, which in this case undeniably, Nano Banana smashed compared to ChatGPT.
一些人对Arena和Artificial Analysis的结果感到震惊。
Some people were even quite flabbergasted with the Arena and Artificial Analysis results.
IamEmily2050转发了Artificial Analysis的帖子并评论道:真是个笑话。
IamEmily2050 reshared Artificial Analysis's post and said, a joke.
我不打算讨论阴谋论方面,但这确实对Artificial Analysis来说情况不妙。
I'm not going into the conspiracy side, but this is really not looking good for Artificial Analysis.
当有人说,怎么会这样?
When someone said, How?
这不可能是对的。艾米丽回应道,OpenAI要么操纵了基准测试,要么付钱让他们这么说。
That can't be right, Emily responds, OpenAI gamed the benchmarks or paid them to say so.
暂且不论这个论点的实质,我认为这反映了人们的怀疑态度。
Which, hold aside the substance of that argument, I think reflects people's skepticism.
无论是Artificial Analysis的帖子还是LMArena的帖子,X平台上的评论都显示出大量的质疑声。
The X comments on both Artificial Analysis's post and the LMArena post also show just tons of skepticism.
好的,我们来聊聊企业AI中的信号与噪音问题。
Alright, let's talk about the signal versus the noise in Enterprise AI.
当前的挑战不仅在于技术可能性,更在于实际应用性。
The challenge right now isn't just about what's possible, it's about what's practical.
这正是我为毕马威主持的《AI赋能》播客的核心主题。
That's the entire focus of the You Can With AI podcast I host for KPMG.
第一季节目穿透炒作迷雾,专注于部署与负责任地规模化应用。
Season one cut through the hype to focus on deployment and responsible scaling.
第二季将深入一个层次。
Season two goes a level deeper.
我们汇集了AI构建者、客户和毕马威领导者的专题小组,共同探讨将定义企业AI未来发展的战略性问题。
We're bringing together panels of AI builders, clients, and KPMG leaders to debate the strategic questions that will define what's next for AI in the enterprise.
六集内容满载您实际可用的框架。
Six episodes packed with frameworks you can actually use.
在您获取播客的任何平台查找《You Can With AI》。
Find You Can With AI wherever you get your podcasts.
立即订阅,不错过新一季内容。
Subscribe now so you don't miss the new season.
本节目由Blitzy赞助播出,这是一款拥有无限代码上下文的企业级自主软件开发平台。
This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context.
Blitzy运用数千个专业AI代理,经过数小时思考来理解数百万行代码的企业级代码库。
Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise scale codebases with millions of lines of code.
企业工程领导者通过Blitzy平台开启每个开发冲刺周期,输入他们的开发需求。
Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements.
Blitzy平台会提供计划,然后为每个任务生成并预编译代码。
The Blitzy platform provides a plan, then generates and pre compiles code for each task.
Blitzy能自主完成80%以上的开发工作,同时为剩余20%需要人工完成的开发工作提供指导。
Blitzy delivers 80% plus of the development work autonomously, while providing a guide for the final 20% of human development work required to complete the sprint.
上市公司将Blitzy作为IDE之前的开发工具,并搭配自选的编码Copilot使用,工程速度提升了5倍,将AI原生的SDLC引入组织。
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.
访问blitzy.com,点击'申请演示',了解Blitzy如何将您的SDLC从AI辅助转变为AI原生。
Visit blitzy.com and press Get a Demo to learn how Blitzy transforms your SDLC from AI assisted to AI native.
认识Rovo,您的AI队友。
Meet Rovo, your AI-powered teammate.
Rovo通过AI驱动的搜索、聊天和代理释放团队潜力,您也可以使用Studio构建自己的代理。
Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio.
Rovo由您组织的知识驱动,并运行在Atlassian可信赖且安全的平台上,因此始终结合您的工作上下文运行。
Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work.
将Rovo连接到您最喜欢的SaaS应用,确保不会遗漏任何知识。
Connect Rovo to your favorite SaaS app so no knowledge gets left behind.
Rovo运行在Teamwork Graph上,这是Atlassian的智能层,它能整合您所有应用的数据,并从第一天起就提供个性化的AI洞察。
Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one.
Rovo已内置在Jira、Confluence以及Jira Service Management的标准版、高级版和企业版订阅中。
Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions.
您是否体验过AI从工具转变为队友的感觉?
Know the feeling when AI turns from tool to teammate?
如果您使用Rovo,您就会明白。
If you Rovo, you know.
探索由Atlassian驱动的新AI队友Rovo。
Discover Rovo, your new AI teammate powered by Atlassian.
请访问rovo.com开始使用。
Get started at rovo.com.
AI发展迅猛。
AI changes fast.
您需要一个为长期发展而构建的合作伙伴。
You need a partner built for the long game.
Robots and Pencils与各组织并肩合作,将AI雄心转化为真实的人类影响。
Robots and Pencils work side by side with organizations to turn AI ambition into real human impact.
作为AWS认证合作伙伴,他们通过现代化基础设施、设计云原生系统并应用AI来创造商业价值。
As an AWS Certified Partner, they modernize infrastructure, design cloud native systems, and apply AI to create business value.
他们的合作伙伴关系不会止步于项目启动。
And their partnerships don't end at launch.
随着AI发展变化,Robots and Pencils始终相伴左右,助您与时俱进。
As AI changes, Robots and Pencils stays by your side, so you keep pace.
不同之处在于这种紧密合作能持续创造价值并随时间不断增值。
The difference is close partnership that builds value and compounds over time.
此外,通过在美国、加拿大、欧洲和拉丁美洲设立的交付中心,客户可获得本地专业知识和全球规模优势。
Plus, with delivery centers across the US, Canada, Europe, and Latin America, clients get local expertise and global scale.
要实现带来实际进展而非空头支票的AI,请访问robotsandpencils.com/aidailybrief。
For AI that delivers progress, not promises, visit robotsandpencils.com/aidailybrief.
那么该如何理解这一切呢?
So what to make of all of this?
我认为LMArena的Peter Gostev大体上是正确的,他写道:根据我的个人观察,GPT 1.5和Nano Banana Pro总体上可谓旗鼓相当。
I think Peter Gostev from LMArena is directionally correct when he writes: My anecdotal impression of GPT 1.5 versus Nano Banana Pro is that they are pretty neck and neck overall.
我发现GPT的提示词使用起来要容易得多。
I find GPT a lot easier to prompt.
使用Nano Banana时,通常需要多次迭代才能获得理想结果,而GPT往往能直接给出符合要求的答案。
With Nano Banana, you often had to iterate several times before getting a good result, while with GPT you typically get what you ask for.
不过我认为Nano Banana的品味略胜一筹,比如在信息图表和幻灯片方面,Google更具优势。
But I think Nano Banana has slightly nicer taste, e.g. for infographics and slides, Google has the advantage.
我发现GPT的风格略显厚重。
I found GPT style quite heavy.
我所说的'大体上正确'中,最关键的部分就是'总体上旗鼓相当'这个观点。
With the important point of the part I'm saying directionally correct being the pretty neck and neck overall.
吉米·阿普勒斯对此有个更简洁的说法:相比前代模型是个重大升级。
Jimmy Apples had an even simpler version of the same statement: Big upgrade over the previous model.
它不如Banana聪明,但在风格方面,喜欢哪个本就是主观的。
It's not as smart as Banana, but it's going to be subjective on what you like on style versus style.
就我个人而言,它完美契合了我脑海中对这个提示的想象。用你喜欢的就好。
Personally, it really hits the image in my head I have for this prompt. Just use what you prefer.
我会两者都用。
I'll be using both.
这正是我的最终结论。
And that is exactly what my overall conclusion is.
在此之前,Nano Banana在图像生成方面无疑明显优于OpenAI的任何技术。
Before this, Nano Banana was undeniably and very clearly better than anything OpenAI had going on with image generation.
现在,这种优势已不再那么明显。
Now, it is not so clearly better.
至少并非在所有情况下都如此。
At least not in all cases.
从实际意义来看,这意味着在高质量图像生成领域,周二早晨你只有一个选择,而现在很多情况下你将拥有两个选项。
What that means practically is that for really high quality image generation, on Tuesday morning you had one option, and now in a lot of cases you're gonna have two.
现在swyx提出了一个有趣的观点:我们可能也正在触及当前图像生成方法所能达到的极限。
Now one interesting point that swyx made is that we may also see the limits of how far we can go in image generation with current methods.
他写道:我认为今天图像1.5的发布说明了人们为何如此坚定地押注显式世界模型。
He writes: I think today's image 1.5 launch illustrates one of the reasons why people are betting so hard on explicit world models.
要实现更高层次的真实感,我们必须教会模型像我们生活般观察世界,而非通过零星快照。
For the next level in realism, we're going to have to teach the models to see the world as we live it, not through occasional snapshots.
他引用了r/ChatGPT上的一条帖子说:'新版ImageGen简直疯了'。
He pointed to a post on r/ChatGPT that said, The new ImageGen is nuts.
不过有人回复道:'确实,但某些细节还是有点不对劲'。
Someone responded, however, Yes, but also the details are a little off.
为什么一条腿裸露而另一条腿穿着裤子?
Why is one leg bare and the other covered by pants?
什么样的汽车会在前排座椅后放梳妆台?
What kind of car has a vanity table behind the front seat?
乘客座位在哪里?
Where is the passenger seat?
也许被她挡住了,但很多背景和上下文看起来仍然不对劲。
Maybe it's covered by her, but a lot of the background and context still seems off.
不过至少人物看起来像真人,不再像塑料了。
Still, the people at least look human and not like plastic anymore.
那么总结时让我们思考:ImageGen目前是否有任何方面明显优于NanoBanana?
So as we round out, let's ask: Is there anything that ImageGen, I think, does distinctly better than NanoBanana right now?
虽然我的答案是否定的——没有哪个用例让我觉得在所有测试中ImageGen都碾压了NanoBanana Pro,但有四个领域(未来可能还有第五个加分项)我认为ImageGen可能成为比Nano Banana更理想的选择。
And while my answer is no, there's no one use case where I thought, just in every test that I tried, ImageGen crushed NanoBanana Pro or anything like that, there are four areas right now with a fifth potential bonus area in the future that I think ImageGen may be a desirable alternative to what Nano Banana can do.
首先,我们来谈谈信息图表。
First up, let's talk infographics.
Nano Banana Pro发布时最惊人的一点是,它突然具备了从文本生成信息图表的新能力。
One of the incredible things about Nano Banana Pro when it was released is that all of a sudden this new capability of making infographics from text came online.
相信你在网上见过大量这类图表,而正是这种无处不在的通用风格让我认为在某些情况下,你可能会想用ChatGPT图像而非Nano Banana制作信息图表——仅仅因为它们不会带着Nano Banana那种老远就能被认出来的特定风味和风格。
I'm sure that you have seen a ton of these floating around the internet, and indeed that ubiquitousness and commonality of style is exactly why I think in some cases you might want to use ChatGPT images instead of Nano Banana to make your infographics for the simple reason that they don't look like a Nano Banana infographic which already has a particular flavor and style that people can spot from a mile away.
我将最近一集的文字记录输入以生成信息图,两个模型都能做到这一点,尽管各自都有些小毛病。
I dumped in a recent episode transcript to get an infographic based on it, and both models were able to do this, although they each had their own quirks.
正如常见的那样,Nano Banana的第一版生成了一堆引用参考,尽管这些在视觉信息图上完全无用且浪费空间。
As it often does, Nano Banana's first iteration gave a bunch of citation references, even though those are completely useless and wasted space on a visual infographic like this.
而ChatGPT生成的图片只是这里那里有些小错误。
Whereas ChatGPT images just had a few little mistakes here and there.
例如,在'智能体AI面临的三大障碍'这一部分,它只列出了两个障碍。
For example, in the three biggest barriers to agentic AI section, it only has two barriers.
还有一些随机的拼写错误,比如把'bigger'拼成了'biger'。
There were also some random spelling mistakes like bigger being spelled b-i-g-e-r.
也许比使用ChatGPT图片更好的方法是尝试通过提示词来摆脱Nano Banana Pro的标准外观。
Now perhaps the better approach than using ChatGPT images is just to try to prompt your way out of the standard look of Nano Banana Pro.
但我的观点是,你现在至少有了一个可靠的视觉替代方案。
But my point here is that you at least now have a competent visual alternative.
对于这个用例,我可能会补充需要极高文本保真度的场景。
I might add to this use case things that need really high text fidelity.
这是OpenAI在发布公告时特别提到的一点,我也围绕这一点做了一些测试。
That was one of the things that OpenAI called out in their announcement post, and I did some tests around that as well.
我要求生成一张亚伯拉罕·林肯伏案撰写葛底斯堡演说的过肩视角照片,并确保整篇演说内容清晰可读——不过这次我发现两个模型都能做到。
I asked for an over the shoulder shot of Abraham Lincoln sitting at his desk writing the Gettysburg Address, make the entire address readable, although in this case I found both models able to do it.
于是我们又回到了风格偏好的讨论范畴。
So once again we're back in the stylistic preference area.
第二个我认为ChatGPT图像目前可能真正占优势的领域是超精确指令和复杂场景处理。
A second area where I think genuinely ChatGPT images right now might have an edge is around hyper precise instructions and complexity.
我采用这个6x6网格概念并大幅提升了复杂度。
I took this 6x6 grid idea and really ratcheted up the complexity.
我要求生成6列×6行的网格,每个单元格包含一个独特的克苏鲁风格神器或实体插图,必须严格居中且不超出网格线。
I said make a six columns by six rows grid of Lovecraftian artifacts and entities where each cell contains exactly one distinct illustration centered within its square and not overlapping grid lines.
整体风格要融合1920年代通俗杂志插画与神秘学手稿特质。
Overall style is 1920s pulp illustration meets occult manuscript.
使用墨水线稿,搭配褪色棕褐色与海绿色调,带有细微纸张纹理,杜绝任何现代元素,图像中不得出现任何文字。
Inked linework, muted sepia and sea green tones, subtle paper grain, no modern elements, no text anywhere in the image.
为了再加一层难度,我实际上精确给出了所有36个方格中我想要的内容。
And then just to add another layer, I actually precisely gave it everything I wanted in all 36 squares.
它完成得简直太出色了。
It did just a phenomenal job.
没有一个方格不是完美呈现了我所要求的画面。
There wasn't a single square that didn't have a strong competent version of exactly what I asked for.
Nano Banana Pro版本的这个测试结果简直一团糟。
Nano Banana Pro's version of this was an absolute mess.
我得到的不是6x6的网格,而是8x5的。
Instead of a six by six grid, I got an eight by five.
它也没有很好地遵循整体指令。
It didn't follow the overall instructions as well.
而且大量单个方格的内容完全是莫名其妙。
And tons of the individual squares were just out of the blue and nowhere.
当然这只是一个测试案例,但我注意到其他几个案例也显示ChatGPT图像在处理这类高精度或复杂指令时更胜一筹。
Now, of course, this is just one test, but I noticed a couple others also preferring ChatGPT images for some of these hyper precise or complex instructions as well.
Ethan Mollick写道,我尝试了一个有趣的事情,在ChatGPT Image Generator 1.5上比Nano Banana Pro效果更好。
Ethan Mollick writes, I tried something fun that worked better with ChatGPT Image Generator 1.5 than Nano Banana Pro.
给我来一场点击式冒险游戏。
Point and click adventure game me.
你是解析器。
You are the parser.
生成图像作为输出并接收指令。
Make images as the output and take in commands.
让这个世界超级有趣。
Make the world super interesting.
记录物品栏状态等。
Keep track of inventory state, etcetera.
所以你可以看到它基本上生成了游戏截图,然后Ethan提示它转到游戏下一场景:查看激光、用地图和物品栏遮挡激光、穿过传送门。
So you can see it basically creates a screenshot from a video game, and then Ethan prompts it to go to the next shot in the game, look at the laser, cover the laser with map and inventory, run through the portal.
ChatGPT在这方面做得非常出色。
ChatGPT did a really good job with this.
Nano Banana Pro就没能做到。
Nano Banana Pro did not.
在第一次尝试中,第二张图像与第一个场景完全不同,然后就彻底退出了。
In its first attempt, the second image was completely different than the first scene, and then it just completely bowed out.
而在第二次尝试中,它勉强做到了,但过程要困难得多。
And in the second attempt, it sort of did it, but with a much, much harder time.
当然,还有Peter Gostev,他再次发推说:我知道人们喜欢Nano Banana,但我有些重要需求它无法满足。
Then, of course, there was Peter Gostev again, who tweets, I know people like Nano Banana, but I have some important needs that it just cannot meet.
他的提示是:创建一张方形图片,包含一只六指的手、显示08:22的挂钟、一杯满到杯口的红酒。
His prompt was create a square image of a hand with six fingers, a wall clock showing 08:22, a glass of red wine full to the top.
Nano Banana Pro生成的是正常的手、7:58的时钟和接近满杯但未完全满的红酒杯,而新ImageGen模型则呈现完全满杯的红酒、8:22的时钟和七根怪异多汁的手指。
Nano Banana Pro had a normal hand, a clock at 7:58, and a wine glass that was mostly but not entirely full, whereas the new ImageGen model had a completely full wine glass, 8:22 on the clock, and seven juicy weird fingers.
第三个方面,我认为你可能会更倾向于或至少想测试ChatGPT图像而非Nano Banana Pro,那就是针对审美导向和更高品味的提示。
A third area where I think you might prefer, or at least want to test, ChatGPT Images as opposed to Nano Banana Pro is aesthetically focused and higher-taste prompts.
Flower Shoppe展示的几个例子中,我认为GPT图像版本在视觉上比Nano Banana版本提升了一大截。
Flower Shoppe showed a couple of examples where I think that the GPT images version is just a big step up visually from the Nano Banana version.
这里还有一个关于logo的例子。
Here's another example with a logo.
Aziz AI也发现了类似的情况。
And Aziz AI found something similar.
他尝试的提示是:以苹果风格为耐克创建一个4:5比例的简洁网站。
The prompt that he tried was: "Create a clean look website in Apple style for Nike in a 4:5 aspect ratio."
他说:胜出的是GPT,无论是UI美学还是对提示的理解都更胜一筹。
He said the winner was GPT, in both the aesthetics of the UI and its understanding of the prompt.
现在我要明确说明,尤其就这个案例而言,我想表达的重点并不是ChatGPT图像总会更好。
Now, I will say very clearly here that the point I am trying to make, especially in this case, is not that ChatGPT Images will always be better.
而是因为这两款模型现在都处于高端水平,当你试图寻找符合你审美并达到你所追求的高品位标准时,你现在有了几个选择。
It's that, because both of these models are now at the high end, when you are trying to find something that matches your vibes and reaches the level of high taste that you're going for, you now have a couple of options.
在某些情况下ImageGen会更好,而在另一些情况下则会更差。
ChatGPT Images is going to be better in some cases and worse in others.
但再次强调,这意味着你几乎在一夜之间从一个选择增加到了两个选择。
But again, that means you've gone from one option to two options basically overnight.
第四点我想提到的是,ChatGPT图像相比Nano Banana表现出色的领域在于其实际使用界面,我认为这很大程度上揭示了他们对这款工具使用场景的设想。
The fourth thing that I want to mention, in terms of an area where ChatGPT Images excels compared to Nano Banana, is the actual interface for using it, and I think this reveals quite a bit about how they're imagining usage of this tool.
当然,我自己——也敢打赌在座的许多人——都是从商业或高级用户的角度参与这场讨论的。
Certainly I, and I'd be willing to bet many of you, am coming at this conversation from the standpoint of a business or power user.
你们想要这些精细的编辑控制功能。
You want these fine grained editing controls.
你们正在想象如何将其用于自己的个体创业业务。
You're imagining how you can use this for your solopreneur business.
但我认为OpenAI预见到,实际上很多用户使用这个功能只是为了娱乐和消遣。
But I think OpenAI is imagining that a lot of the usage of this is in fact just going to be people messing around and having fun.
而Gemini在创建图像时完全没有区别,除了你需要输入‘创建图像’的指令。
Whereas with Gemini, there's absolutely no difference when you're creating an image, other than you say create image.
现在ChatGPT的网页应用中有一个完全独立的版块,视觉设计略有变化,并提供了更多选项。
In the ChatGPT web app now, there's a whole different section with slightly changed visuals and a whole lot more options.
除了标准的文本提示框外,下方还有一排风格选项可供尝试:素描、节日肖像、戏剧性、毛绒玩具、棒球摇头娃娃等等。
In addition to your standard text prompt field, you also have a row of styles underneath that you can try on an image: sketch, holiday portrait, dramatic, plushie, baseball bobblehead, et cetera.
然后在下方,他们还有一个创意面板,可以探索新事物,比如制作节日贺卡。
Then below that, they also have a panel of ideas to just discover something new like creating a holiday card.
如果我成为K-pop明星会是什么样子?
What would I look like as a K-pop star?
我化身戴珍珠耳环的少女。
Me as the Girl with a Pearl Earring.
你能从中感受到他们想解决空白画布问题,让人们不是为了商业目的,而是纯粹为了好玩来尝试这个功能。
And you get the sense from this that they want to solve the blank slate problem and get people messing around with this not for a business purpose but just for fun.
我相信他们不会忽视这一点:他们用户增长最显著的时刻之一(如果不是有史以来最显著的增长时刻,那肯定也是2025年最显著的),就是当'吉卜力化'趋势兴起,所有人都把各种东西变成吉卜力工作室风格的图像时。
I'm sure it's not lost on them that one of their biggest moments of user growth, if not their biggest moment of user growth ever and certainly in 2025, was when we got the Ghiblification trend where everyone turned everything into a Studio Ghibli image.
这类界面选项显然是为普通用户设计的,他们不考虑商业结果和投资回报率,只是来寻找乐趣的。
These sorts of interface options are very clearly aimed at the average user who isn't thinking about business outcomes and ROI but is just there to have some fun.
考虑到ChatGPT的用户中有大量普通日常用户,我理解他们为何会做出这样的战略选择。
Given how much of ChatGPT's usage is regular everyday people, I can see why they're making that bet.
以上就是我认为你可能想尝试ChatGPT图片功能的四个领域,无论是替代还是至少作为Nano Banana Pro的补充。
So those are four areas where I think you might want to try ChatGPT Images, either instead of or at least in addition to Nano Banana Pro.
不过,第五个未来领域的额外优势当然是当你想要创作米老鼠、莫阿娜或其他迪士尼角色时。
The bonus, however, a fifth and future area, is of course when you want to make Mickey or Moana or another Disney character.
目前,至少在我的测试中,ChatGPT图像生成比Nano Banana受到更多限制。
Right now, at least in my tests, ChatGPT Images is much more locked down than Nano Banana.
我输入了'萨姆·奥特曼在安迪·贾西驾驶的船后滑水'的提示词。
I gave the prompt "Sam Altman water skiing behind a boat driven by Andy Jassy."
这显然与OpenAI可能和亚马逊达成合作的新闻有关。
This obviously relates to the news that OpenAI might be doing a deal with Amazon.
从Gemini那里,我得到了这张很酷的拉尔夫·斯特德曼风格图片。
From Gemini, I got this cool Ralph Steadman looking image.
从ChatGPT那里,我得到了这个。
From ChatGPT, I got this.
该图像生成请求不符合我们的内容政策。
The image generation request did not follow our content policy.
当然,我们刚刚得知OpenAI和迪士尼已经达成了合作。
Of course, we just learned that OpenAI and Disney had done a deal.
这项协议将明确把迪士尼角色引入Sora平台。
A deal that will explicitly bring Disney's characters into Sora.
如果这延伸到图像生成领域,也可能意义重大。
If that extends into image generation, it could be a big deal as well.
西蒙·史密斯再次写道:如果OpenAI和迪士尼通过允许在Images V2发布时生成角色而让所有人惊讶,我敢肯定这将在假期期间引发大量ChatGPT的使用。
Simon Smith again writes: If OpenAI and Disney surprise everyone by allowing character generation with the launch of Images V2, pretty sure it will spark a ton of ChatGPT use over the holidays.
单是父母们就会为了给孩子们制作包含角色的节日信息而耗尽GPU资源。
Parents alone will burn up GPUs inserting characters into holiday messages for their kids.
西蒙提到的一个重点是v2版本。
Now one thing Simon references there is v2.
请记住当前只是1.5版本,人们期待在不久的将来会有更强大的图像生成模型出现。
Remember that this is version 1.5, and people are expecting a lot more in the relatively near future from an even better image generation model.
OpenAI员工确实暗示这只是开始,未来我们将迎来更多图像生成功能的更新,正如我一开始所说,这对我们消费者而言绝对是利好消息。
OpenAI staffers are indeed suggesting that this is just the start and that we are in for more image generation updates in the future, which as I said right at the beginning is nothing but good news for us consumers.
朋友们,这就是我对ImageGen 1.5的初体验。
So friends, that is my first look at ImageGen 1.5.
希望这能对你有所帮助。
Hope this was useful.
如果你还没尝试过,赶快去开始创作吧。
Certainly if you haven't yet, get in there and start creating.
随着我们获得越来越多AI工具,圣诞老人持续提前到来。
Santa continues to come early as we get more and more AI toys.
以上就是今天的AI每日简报内容。
For now, that is gonna do it for today's AI Daily Brief.
一如既往感谢你的收听或观看,下次见,祝安好!
Appreciate you listening or watching as always, and until next time, peace!