Y Combinator Startup Podcast - 通往超级智能的最快路径

通往超级智能的最快路径

The Fastest Path To Super Intelligence

本集简介

Poetiq 是一家由前 DeepMind 研究员创立的新创公司,最近通过在现有模型之上叠加递归自改进系统,在 ARC-AGI 和"人类最后的考试"基准上实现了重大突破。在本期 Lightcone 节目中,Poetiq 联合创始人兼联席 CEO Ian Fischer 与我们探讨了小型团队如何构建超越基础模型的"推理支架",这对初创公司意味着什么,以及自动化提示工程为何可能是当今人工智能领域最强大的杠杆之一。

章节:
00:00 – 导言
00:40 – 什么是 Poetiq?
01:07 – 递归自改进详解
02:07 – 微调陷阱
02:59 – LLM 的"支架"
03:14 – 递归自改进 vs 微调
05:05 – 登顶 ARC-AGI 榜首
06:37 – 在"人类最后的考试"中击败 Claude
08:40 – 元系统的工作原理
10:26 – 超越强化学习:新的 S 曲线
11:32 – 自动化提示工程
13:37 – 从 5% 到 95% 的性能提升
14:50 – 早期访问与为你的智能体装上支架
16:17 – 从 YC 创始人到 DeepMind 研究员
18:29 – 给人工智能时代工程师的建议

申请 Y Combinator:https://www.ycombinator.com/apply
在初创公司工作:https://www.ycombinator.com/jobs

双语字幕

仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。

Speaker 0

世界变化得太快了。

The world is changing so quickly.

Speaker 0

这可能有点显而易见,但你应该试着去做一些事情,每天都要用一下AI。

This is probably a little bit obvious, but you should just try things and and, like, every day, do something with AI.

Speaker 0

去年夏天,我花了一个周末,用GPT-5帮我开发了一个iPhone应用。

Last summer, I took a weekend and used GPT-5 to help me build an iPhone app.

Speaker 0

我已经十年没做过这个了。

I hadn't done that in a decade.

Speaker 1

真快啊。

So fast.

Speaker 0

是的。

Yeah.

Speaker 0

它又快又简单。

It's so fast and so easy.

Speaker 0

而那已经是很久以前的事了。

And that was, you know, an age ago.

Speaker 0

那是八个月前的事了。

That was, like, eight months ago.

Speaker 0

现在更快更简单了。

Now it's even faster and easier.

Speaker 0

不要限制自己。

Don't limit yourself.

Speaker 0

任何你想象到的事情,都应该尝试用AI去做,看看你能走多远,这样你就能让世界变得更好。

Like, anything that you imagine, you should just try to use AI and see how far you can get with it, and you'll be, you know, making the world better.

Speaker 1

欢迎来到《光锥》的又一期节目。

Welcome to another episode of the Lightcone.

Speaker 1

伊恩·费舍尔是Poetiq的联合创始人兼联席CEO,该公司正在为大语言模型构建可自我递归改进的AI推理系统。

Ian Fischer is the co-founder and co-CEO of Poetiq, which is building recursively self-improving AI reasoning harnesses for LLMs.

Speaker 1

此前,他曾在谷歌DeepMind担任研究员长达十年,并多年前通过Y Combinator创办了一家移动开发工具公司。

Previously, he spent a decade as a researcher at Google DeepMind and founded a mobile dev tools company through YC years ago.

Speaker 1

欢迎你,伊恩。

Welcome, Ian.

Speaker 0

谢谢。

Thank you.

Speaker 0

我很高兴能来到这里。

I'm so happy to be here.

Speaker 1

Poetiq是什么?

What is Poetiq?

Speaker 1

它和强化学习(RL)有什么不同?

How's it different than RL?

Speaker 1

你觉得它和上下文工程有什么区别呢?

You know, how's it different than context engineering?

Speaker 0

在Poetiq,我们正在打造的是一套可递归自我优化的系统。

At Poetiq, what we're building is a recursively self improving system.

Speaker 0

递归自我优化可以说是人工智能领域的终极目标,也就是让AI不断提升自身的智能水平。

And so recursive self-improvement is the holy grail of AI, where the AI is making itself smarter.

Speaker 0

我们的核心洞见是,我们实现递归自我优化的速度和成本,能远远优于其他人之前提出的所有方案。

The core insight that we had is that we could do recursive self improvement far faster and cheaper than all of the other ways that people had been proposing to do this.

Speaker 0

所以显然,我不能详细说明我们的具体方法是什么,但目前大多数方法都需要从零开始训练一个新的大语言模型,而从零开始训练大语言模型通常需要数亿美元的投入和数月的时间。

And so obviously, I can't go into details about what our particular approach is, but most of the approaches out there require you to train a new LLM from scratch, and training LLMs from scratch costs hundreds of millions of dollars and takes months of effort.

Speaker 0

所以

And so the

Speaker 1

然后Anthropic或OpenAI会在下一个模型发布时直接把你打得落花流水。

And then Anthropic or OpenAI will come along and just eat your lunch in the next model release.

Speaker 0

对。

Right.

Speaker 0

没错。

Right.

Speaker 0

当然,Anthropic、OpenAI和Google也在探索递归自改进,但通常它们的做法是,每一次自改进都需要训练一个全新的模型。

And, of course, Anthropic and OpenAI and Google are exploring recursive self-improvement, but typically at the level of having to train a new model for every step of self-improvement that they do.

Speaker 1

我的意思是,这恰恰是初创公司最渴望拥有的核心优势。

I mean, that seems like actually the, like, defining thing that a startup really, really wants.

Speaker 1

我知道我希望能利用下一个模型的成果,但一旦进入微调的世界,我就要花上数百万甚至数亿美元。

Like, I know that I want to take advantage of whatever the next model is, but the second I'm in fine-tuning land, I'm spending, you know, millions to hundreds of millions of dollars.

Speaker 1

那你猜怎么着?

And then guess what?

Speaker 1

我就直接把它烧了,因为前沿模型的下一个版本出来了,我永远追不上。

Like, I just lit it on fire because, you know, the next version of the frontier model comes out, and I'll never catch up.

Speaker 1

而使用你们的系统意味着我始终拥有比开箱即用版本更好的东西,这简直就是圣杯。

Whereas, like, working with your systems means that I will always have the thing that is better than the thing that's out of the box, and that's sort of like the holy grail.

Speaker 0

是的,我们认为这对任何构建在大型语言模型之上的人而言都极具价值。

Yeah, we think that this is incredibly valuable to anybody who's building on top of large language models.

Speaker 0

我们并不把前沿模型视为竞争对手。

And we don't view the frontier models as competitors.

Speaker 0

它们是我们赖以站立的高跷。

They're the ones we're building stilts to stand on top of.

Speaker 0

但如果我们没有这个基础层,Poetiq就不可能存在。

But if we didn't have that foundational layer, then Poetiq couldn't exist.

Speaker 1

对。

Yeah.

Speaker 1

我的意思是,成为最聪明的模型,实际上是一场毫厘之争。

I mean, being the smartest model, it's a game of inches, actually.

Speaker 1

所以这些细微的差距至关重要。

And so those inches matter a lot.

Speaker 0

没错。

Right.

Speaker 0

没错。

Right.

Speaker 1

我们到底该怎么开始呢?

How do we actually get started?

Speaker 1

我的意思是,你打造了一个基本上任何初创公司都能使用的东西,它真的就像高跷。

I mean, you've built something that basically any startup could use, and it's sort of like stilts, really.

Speaker 0

我们构建了一个系统,能够自动为你的特定问题生成系统,并且始终优于底层的语言模型。

We have built a system that can automatically generate systems for your particular problem that will always outperform the underlying language models.

Speaker 0

而且无需承担巨大的开支,就像你提到的苦涩教训那样——你知道,如果没有这个系统,你可能会说:好吧,我们先收集一个大型数据集,针对我们正在处理的问题收集数万个样本,然后微调我们能拿到的最好的模型。

And without kind of the massive expense. As you're saying about the bitter lesson: what would you have done without this? You probably would have said, okay, we're going to first collect a large data set, tens of thousands of examples for the particular problem that we're working on, and we're going to fine-tune the best model we can get our hands on.

Speaker 0

也许这是其中一个前沿模型,或者是一个开源权重模型。

Maybe that's one of the frontier models, or maybe it's an open weights model.

Speaker 0

这其实并不特别重要。

It doesn't particularly matter.

Speaker 0

你在微调上会花很多钱。

You're going to spend a lot of money on that fine tuning.

Speaker 0

计算资源太昂贵了。

Compute is so expensive.

Speaker 0

最终,你得到了一个比你用来微调的模型表现更好的东西。

And then at the end of it, you have something that works better than the thing that you fine tuned on top of.

Speaker 0

但到那时,一个新的模型已经问世,它比你用来微调的模型还要优秀。

But by then, a new model has come out, and it's better than the thing that you fine tuned.

Speaker 0

你三年前基于GPT-3.5或类似模型做了微调,然后GPT-404出现了,直接把你甩得远远的。

You fine tuned three years ago on top of GPT 3.5 or whatever, and then GPT four zero four comes out, and it just blows you out of the water.

Speaker 0

所以你是打算再做一遍,还是干脆关门大吉?

And so are you gonna do that again, or are you gonna go out of business?

Speaker 0

在某些情况下,是后者。

And, like, in some cases, it's the latter.

Speaker 0

对于Poetiq,我们提供给你的是一种现在人们称之为"支架"的东西,或者说是智能代理系统,不管你怎么叫它,它建立在一个或多个语言模型之上,表现得比它们更好。

With Poetiq, what we end up giving you is what people are calling a harness now, or an agentic system, or whatever you wanna call it, that sits on top of one or more language models and just performs better than them.

Speaker 0

当新模型出现时,这个相同的框架能完美兼容它,你无需做任何改动就能获得更大的性能提升。

And when the new model comes out, that same harness is perfectly compatible with it, and you don't need to change anything to get an even bigger performance bump.

Speaker 0

此外,我们可以继续针对这个新模型——无论你打算使用哪个新模型——进行优化,让它变得更好。

Additionally, we can continue to optimize for this new model, whatever the new model is that you wanna use, and make it even better.

Speaker 0

但你不会因此损失数亿美元。

But you don't lose out on, you know, hundreds of millions of dollars.

Speaker 0

事实上,我们的做法比微调便宜得多。

In fact, we do this so much more cheaply than fine tuning would cost as well.

Speaker 1

你实际上已经这么做过很多次了。

And you've done this actually a bunch of times.

Speaker 1

对吧?

Right?

Speaker 1

我记得你们去年十二月刚发布论文时,直接登上了ARC AGI v2的榜首,之后你们还多次在其他基准测试中做到了这一点。

Like, I remember when you first came out with your paper in December, you shot to the top of ARC AGI v two, and then you've done this a bunch of times for other benchmarks too.

Speaker 1

那感觉怎么样?

What, you know, what was that like?

Speaker 0

ARC AGI v2其实是我们从低调状态中走出来,向大家展示我们有能力解决这些极其困难问题的一次亮相。

ARC AGI v2 was kind of us coming out of stealth, letting people know that we could tackle these really hard problems.

Speaker 0

特别是,我们想证明我们的系统——也就是我们所说的 Poetiq 元系统——能够生成非常高效的推理系统。

And in particular, we wanted to show that our system, which we call the Poetiq Metasystem, could generate reasoning systems that are highly effective.

Speaker 0

当时Gemini 3和DeepThink刚发布,它们在排行榜上以45%的分数遥遥领先。

Gemini 3 DeepThink had just come out, and it was quite dramatically at the top of the leaderboard at 45%.

Speaker 0

两天后,我们发布了结果,展示出我们的分数能远超那个水平。

And two days later, we released our results where we were showing that we could get a lot higher than that.

Speaker 1

所以他们一发布 SOTA,你们就立刻超过他们,每次都这样,说实话,这简直太惊人了。

So they come out with SOTA, and then you come in right above them every single time, which is, like, wild to see, honestly.

Speaker 1

这就是拥有‘高跷’的感觉,你知道吗?

That's what it's like to have stilts, you know?

Speaker 1

是的。

Like Yeah.

Speaker 1

无论推出什么模型,你们都能用 Poetiq 超越它,这真是太棒了。

Whatever model comes out, you can be taller than that one with Poetiq, which is so awesome.

Speaker 0

对。

Yeah.

Speaker 0

有趣的是,我们的成本只有Gemini Three DeepThink的一半,因为我们是在Gemini Three Pro这个便宜得多的模型基础上构建的。

So the interesting thing is that we were half the cost of Gemini 3 DeepThink because we were building on top of Gemini 3 Pro, which is a much cheaper model.

Speaker 0

但最终,我们在官方验证上还是取得了9个百分点的提升。

But we still got, in the end, a nine percentage point improvement on the official verification.

Speaker 0

他们是 45%,每道题 70 多美元;而我们是 54%,每道题只要 32 美元。

So they were at 45% and, like, 70-something dollars per problem, and we were at 54% and $32 per problem.

Speaker 2

最近,你们刚刚公布了关于《人类最后的考试》的一些惊人成果。

So recently, you guys just announced some incredible results for Humanity's Last Exam.

Speaker 2

能跟我们详细说说吗?

Can you tell us more about those?

Speaker 0

人类最后的考试是一套由多个领域专家撰写的2500道极其困难的问题。

Humanity's Last Exam is a set of 2,500 really, really hard questions written by experts in many different domains.

Speaker 0

这些问题即使对这些领域的博士来说也极具挑战性。

Meant to be challenging even for PhDs in those fields.

Speaker 0

AI至今尚未通过这项考试,但我们达到了55%的正确率,比上周由Anthropic发布的Claude Opus 4.6所创下的先前最佳成绩高出近两个百分点。

AI hasn't passed it yet, but we got to 55%, which is almost two percentage points higher than the previous state of the art, which came out just last week from Anthropic with Claude Opus 4.6.

Speaker 0

他们取得了53.1%的正确率,而我们达到了55%。

They got 53.1%, and we got 55% on it.

Speaker 2

人类最后的考试并未公布取得这些成绩所需的成本。

And one thing that Humanity Last Exam doesn't publish is the cost of getting those results.

Speaker 2

在你们的情况下,这次运行的成本低于六位数。

In your case, this run was done for less than six figures.

Speaker 2

具体是多少呢?

How much was it?

Speaker 0

我们没有公布这次的成本,但我可以透露,优化过程的花费不到十万美元。

We didn't publish any cost for this, but I can say that the optimization cost us less than $100k.

Speaker 0

是的

Yeah.

Speaker 2

这令人印象深刻,因为这些大型基础模型的训练成本高达数亿美元。

Which is impressive because each of these big foundation model training runs is in the hundreds of millions of dollars.

Speaker 2

你们公司只有七个人吗?

And you guys, as a company, you're only seven people?

Speaker 0

没错。

That's right.

Speaker 0

是的

Yeah.

Speaker 0

是的

Yeah.

Speaker 0

七位研究科学家和研究工程师。

Seven seven research scientists and research engineers.

Speaker 0

是的

Yeah.

Speaker 2

这很令人印象深刻。

That's impressive.

Speaker 2

我认为你们方法中非常有趣的一点是,以一种科学的方式应对许多顶尖创始人在模型上所展现的涌现行为。

And I think the thing that's very interesting about your approach is sort of taking a very scientific approach to the emergent behaviors that a lot of the best founders are doing with models.

Speaker 2

我认为,很多在智能体上取得出色成果的创始人,都将底层模型视为一个可以随意替换的通用层。

I think a lot of founders that get very good results for agents, they treat the underlying model as a common layer that you can switch in between.

Speaker 2

比如,某类特定任务——例如很难验证的 bug——会被发送给 GPT 5.2,而架构类任务则会被发送给 Claude 4.6。

And a certain task, for example very-hard-to-verify bugs, gets sent to GPT 5.2, versus architecture, which gets sent to Claude 4.6.
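The manual routing described here can be sketched as a simple lookup table. This is an illustrative sketch only: the model names and the task-to-model assignments below are assumptions for demonstration, not anyone's actual configuration.

```python
# Hypothetical task-to-model routing table. The assignments below are
# illustrative assumptions, not a real production configuration.
ROUTING_TABLE = {
    "hard-to-verify-bug": "gpt-5.2",  # assumed strength: tricky debugging
    "architecture": "claude-4.6",     # assumed strength: system design
}

def route(task_type: str) -> str:
    """Pick a model for a task type, falling back to a default model."""
    return ROUTING_TABLE.get(task_type, "default-model")

chosen = route("architecture")
```

A meta-system of the kind discussed next would learn or search for this table automatically instead of having a human maintain it.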

Speaker 2

但你们是自动完成这一过程的,而不是依赖人工操作,这非常令人印象深刻。

But you're kinda doing this automatically instead of having a human conducting it, which is very impressive.

Speaker 2

我认为下面还有更特别的东西在起作用。

I think there's something more special going on underneath.

Speaker 2

你能告诉我们它是如何运作的吗?

Can you tell us a bit about how it works?

Speaker 1

是的。

Yeah.

Speaker 1

这听起来很神奇。

It sounds magical.

Speaker 1

所以,是的。

So Yeah.

Speaker 1

是的。

Yeah.

Speaker 1

你能告诉我们些什么吗?

What can you tell us?

Speaker 0

对。

Right.

Speaker 0

你正在触及一个非常核心的问题。

You're getting at a really core thing.

Speaker 0

你知道吗?

You know?

Speaker 0

这些框架是在一个或多个语言模型之上构建的代码、提示和数据。

These harnesses are code, prompts, and data built on top of one or more language models.

Speaker 0

因此,原则上你可以手工构建这些东西,或者使用 Claude Code 之类的工具。

And so this is something that in principle you can build by hand, or with Claude Code or whatever.

Speaker 0

但实际上,要完成这些工作并获得所有必要的洞察力以使其良好运行,需要付出大量努力。

But in practice, it takes a lot of work, and all the right insights, to make these work well.

Speaker 0

因此,我们在Poetiq开发的核心技术是递归自我改进。

And so the core technology that we've developed at Poetiq is recursive self improvement.

Speaker 0

所以我们有一个递归自我改进的系统,我们称之为Poetiq元系统。

So we have a recursively self improving system, which we call the Poetiq Metasystem.

Speaker 0

该系统的输出是能够解决复杂问题的系统,而复杂问题指的是,如果你把这类问题交给GPT-5.2,它也会难以给出可靠、稳健的结果——仅举一例。

The output of that system is systems that solve hard problems, where a hard problem is something that, if you gave it to GPT-5.2, it would struggle to give you a reliable, robust result, just to use an example.

Speaker 0

因此,这对我们来说是一个非常大的优势。

So this is a very big advantage for us.

Speaker 0

我们可以以更自动化的方式生成这些系统,这意味着相比你自己雇佣团队来开发解决特定任务的代理,我们可以更快、更低成本地完成。

We can generate these systems in a much more automated manner, which means that we can do it much more quickly and much more cheaply than if you hired a team yourself to try to make your own agent to solve your particular task.

Speaker 0

不仅如此,由于这本质上是一个自动化优化过程,如果你已经完成了这项工作——比如你是一家专注于特定垂直领域的初创公司,你认为自己已经充分理解了问题,并构建了代理,它运行得还不错,但你知道还能做得更好,或者你确实需要更好的方案——那么你可以把你的代理交给我们,我们可以优化整个代理或其中的部分组件。

But not only that, since this is really an automated optimization process, if you already have done that work, you're a startup that's like going after a particular vertical and you've put together, you think you understand your problem pretty well, you've put together your agent, and maybe it's working pretty well, but you know you can get something better or you really need something better, then you can bring that to us and we can optimize that entire agent or pieces of that agent.

Speaker 0

我们可以只优化提示词,或者只优化推理策略。

We could optimize just the prompts, just the reasoning strategies.

Speaker 0

根据你的具体需求,我们可以做很多不同的事情。

There's a lot of different things that we can do depending on your particular needs.

Speaker 2

这听起来像是与强化学习完全不同的范式,因为我们已经经历了常规预训练和OpenAI发布o1时的强化学习曲线,而现在这感觉像是一个新的范式。

It sounds like this is a completely different paradigm than RL, because we went through the S-curve of regular pretraining, then RL when OpenAI released o1, and now this feels like a new one.

Speaker 2

这听起来很特别。

It sounds special.

Speaker 2

这听起来和RNN非常相似,而RNN是一种与强化学习完全不同的范式。

It sounds like it rhymes a lot with RNNs, which are a whole different paradigm than RL.

Speaker 2

对吧?

Right?

Speaker 0

这取决于我们所针对的具体任务、要解决的问题类型,以及我们所使用的底层模型。

It's going to depend on the particular task, the particular type of problem that we're going after, that we're trying to solve, and the underlying models that we're working with.

Speaker 0

但本质上可以说,我们所使用的每个模型或每组模型都会有其自身的S型曲线。

But effectively, you could say each model or each set of models that we're working with will have its own S-curve.

Speaker 0

Poetiq 元系统本身也会有自己的 S 曲线。

The Poetiq Metasystem itself is also going to have its own S-curve.

Speaker 0

因此,随着 Poetiq 元系统和底层模型的不断改进,你所面对的 S 曲线会持续向上攀升,直到最终达到饱和或——

And so as the Poetiq Metasystem gets better, and as the underlying models get better, you'll find that the S-curve that you're dealing with keeps shifting higher and higher until ultimately either you saturate or-

Speaker 2

达到AGI吗?

Reach AGI?

Speaker 0

是的。

Yeah.

Speaker 0

达到AGI,达到超级智能。

Reach AGI, reach superintelligence.

Speaker 0

是的。

Yeah.

Speaker 1

既然有高跷,你可能会先碰到天花板。

Given the stilts, you might, like, hit the ceiling first then.

Speaker 0

这正是目标。

That's the goal.

Speaker 0

对吧?

Right?

Speaker 0

是的。

Yeah.

Speaker 0

你希望先让Poetiq触及上限。

You wanna hit the ceiling first with Poetiq.

Speaker 1

我认为我们合作的许多初创公司,以及我在空闲时间所做的大量上下文工程,都是如此。

I think a lot of startups that we work with, and then in my spare time, I, you know, do a bunch of context engineering.

Speaker 1

嗯。

Mhmm.

Speaker 1

关键是,我们正在调整它,调整评估指标,调整上下文填充,我们自己在不断堆砌上下文。

And then the thing is we're sort of, like, tuning it, tuning evals, context stuffing ourselves.

Speaker 1

当提示工程和上下文工程具备递归自我改进的能力时,那会是什么感觉?

What does that even feel like to have a recursively self improving version of prompt engineering and context engineering?

Speaker 0

我们并没有花太多时间去研究我们所使用的具体数据。

We don't spend a lot of time looking at the particular data that we're working with.

Speaker 0

相反,我们让 Poetiq 元系统来查看这些数据。

Instead, we're letting the Poetiq Metasystem look at that data.

Speaker 0

因此,如果Metasystem认为需要往上下文中添加更多内容、进行更多的上下文填充或其他操作,它就会自行处理。

And so the Metasystem, if it thinks that it needs to put more things into context, do more context stuffing or whatever, it'll do that.

Speaker 0

如果它需要生成大量示例以提升性能,它也会为你完成。

If it needs to generate a bunch of examples to get better performance, it'll do that for you.

Speaker 0

特别值得注意的是,我认为在查看ARC AGI的提示输出时,你可以明显看出这些内容并非人类会写出的,其中有一些出人意料的地方,它生成了一些非常简单的例子,其中一个例子甚至是错误的,但我们没有修改它。

It was pretty interesting to look at the prompt outputs for ARC AGI in particular. You can read those and say, well, that's pretty clearly not what a human would have written. There's some unexpected stuff: it made some really simple examples, and one of the examples is actually wrong, but we didn't change it.

Speaker 0

我们觉得,既然这是它输出的结果,那就保持原样吧。

We're like, well, this is the thing that it output, we'll just leave it be.

Speaker 0

我们不想去随意改动这些内容。

We don't wanna go in and monkey around with things.

Speaker 0

在机器学习的历史上,一直以来的规则都是你必须非常熟悉你的数据集。

And so historically in machine learning, always, it's like the rule was you have to know your data set really well.

Speaker 0

但现在,我们将这项工作交给了AI本身,由AI负责理解数据集,找出失败模式,以及确定模型或智能体可以采用哪些稳健的推理策略来提升性能?

But now we're outsourcing that to the AI itself, where it's the AI's job to understand the data set and figure out where the failure modes are and what robust reasoning strategies the agent could use to get better performance.

Speaker 1

其中有多少是靠输出质量更高的提示词,又有多少是靠框架本身——比如补充上下文、以正确的方式做总结、或是合理重排序,来在你调用大模型的次数有限的情况下,最大化每一次调用的效果呢?

How much of it is much better prompts, and how much of it is the harness itself: context stuffing, or summarizing in the right way, or re-ranking in the right way, so that you have some number of mega LLM calls and get the most out of each of those calls?

Speaker 0

Yeah.

Speaker 0

而且这个比例肯定是因问题而异的,但我们实际观察到的情况是——我们在DeepMind发表的上一篇论文没有做这种递归式自我改进的方向,但那篇论文里我们证明了,人工搭建这类框架也能解决非常难的问题。

And so that definitely varies per problem. In fact, our last paper at DeepMind was not doing this recursive self-improvement stuff, but we were showing that you could build these harnesses manually to solve really hard problems.

Speaker 0

当时我们发现,为那些超高难度的问题人工拼命优化提示词,只能帮我们取得一点点进展。

And what we saw there is that we manually optimized the prompts really hard for these very hard problems, and that got us a little bit of the way.

Speaker 0

就这次的案例来说,在我们当时攻坚的那个难度最高的任务上,用Gemini 1.5 Flash只取得了5%的性能表现。

In this particular case, the hardest task we were working on, we got to 5% performance with Gemini 1.5 Flash.

Speaker 0

那是之前的事了。

This was a while ago.

Speaker 0

后来我们加入了这些推理策略后,性能直接从5%提升到了95%。

And then when we added on the reasoning strategies, we went from 5% to 95%.

Speaker 0

我的天呐。

Oh my god.

Speaker 0

这通常就是我们所看到的情况。

And so this is typically what we see.

Speaker 0

很多人都在外面做某种程度的自动提示优化,我不是说每个人,但确实有很多人这么做。

I wouldn't say everybody, but many people are out there doing some amount of automated prompt optimization.

Speaker 0

GEPA 是一篇非常受欢迎的论文。

GEPA is this very popular paper.

Speaker 0

大家都在重新实现它。

Everybody's kind of reimplementing that.

Speaker 0

这能带来一些性能提升,但远远达不到你真正思考那些将通过代码而非仅仅优化提示来实现的推理策略所能达到的效果。

That will get you some performance improvements, but it's very far from everything that you can get if you actually think about these reasoning strategies that are really gonna be written in code rather than just in better prompts.
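In its simplest form, automated prompt optimization of the kind described here is a search loop over prompt variants scored on an eval set. The variants and scores below are hard-coded stand-ins; real systems mutate prompts with an LLM and score them against real data.

```python
# Toy automated prompt optimization: score a few prompt variants and keep
# the best. Variants and scores are hard-coded stand-ins for an LLM-driven
# mutation step and a real eval-set scorer.
from typing import Callable

def optimize_prompt(variants: list[str], score: Callable[[str], float]) -> str:
    """Greedy search: score every candidate prompt and return the best one."""
    return max(variants, key=score)

variants = [
    "Answer the question.",
    "Think step by step, then answer.",
    "Answer in one word.",
]
# Stand-in scorer: a real system would run each prompt over an eval set.
scores = {
    "Answer the question.": 0.55,
    "Think step by step, then answer.": 0.80,
    "Answer in one word.": 0.40,
}
best = optimize_prompt(variants, score=scores.get)
```

The speaker's point is that this prompt-only loop is just the floor: richer gains come from searching over code-level reasoning strategies, not only over prompt strings.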

Speaker 1

所以,如果初创公司想用 Poetiq 为他们的智能体装上高跷,他们应该怎么做?

So if startups want to use Poetiq to put their agent on stilts, what should they do?

Speaker 0

是的。

Yeah.

Speaker 0

目前我们还没有发布任何东西,但如果你访问 poetiq.ai,可以点击一个按钮申请早期访问权限。

So right now, we haven't released anything yet, but if you go to poetiq.ai, there is a button you can click to sign up for early access.

Speaker 0

如果你是一家初创公司或企业,面对一个极其困难的问题,已经尝试了所有方法来使其可靠和稳定,却仍然无法完全达成目标,那么你需要更进一步的解决方案,请告诉我们。

And if you're a startup or a company with a really hard problem, and you've tried everything you can to make it reliable and robust, and you just can't get all the way there and need something more, then let us know.

Speaker 0

我们正在寻找这样的问题。

We're looking for problems like that.

Speaker 0

所以,请告诉我们你正在做什么,我们会主动联系你。

So just tell us what it is that you're working on, and we'll reach out.

Speaker 0

当我们准备与你合作时,你会是第一个知道的人。

You'll be the first to know when we're ready to work with you.

Speaker 1

我的意思是,如果你站在人类最后考试的顶端,那确实非同小可。

I mean, if you're at the top of Humanity's Last Exam, then, I mean, that's pretty big.

Speaker 1

所以你已经处在 SOTA 的位置了,而这些"高跷"基本上能让任何智能体公司都达到 SOTA 的水平。

So you're already all the way out there at SOTA, and then I guess the stilts basically let any agentic company become SOTA.

Speaker 0

这就是我们的想法。

That's the idea.

Speaker 0

是的。

Yeah.

Speaker 0

是的

Yeah.

Speaker 0

而且,我们认为ARC AGI结果和人类最后考试结果展示了我们所具备的两种不同能力。

And, you know, we view the ARC AGI results and the Humanity's Last Exam results as showing kind of two different capabilities that we have.

Speaker 0

我们可以极大地提升你的推理能力,也能从这些模型中深度提取知识。

We can really improve your reasoning, and we can really improve deep knowledge extraction from these models.

Speaker 1

那你就能完全免疫于苦涩的教训了。

And then you're just totally vaccinated against the bitter lesson.

Speaker 0

没错。

Exactly.

Speaker 1

YC的下一期项目现在开始接受申请。

YC's next batch is now taking applications.

Speaker 1

你有创业想法吗?

Got a startup in you?

Speaker 1

请前往ycombinator.com/apply申请。

Apply at ycombinator.com/apply.

Speaker 1

永远不会太早,填写申请表能提升你的想法。

It's never too early, and filling out the app will level up your idea.

Speaker 1

好的。

Okay.

Speaker 1

回到视频。

Back to the video.

Speaker 3

稍微换个话题,我有点好奇。

Slight change of topic, but something I was curious about.

Speaker 3

你十多年前加入谷歌时,当时谷歌收购了你的第一个YC初创公司Portable。

So you arrived at Google over a decade ago when they acquired your first YC startup, Portable.

Speaker 3

Portable主要是做移动应用的跨平台移植,比如Android之类的。

Portable was porting mobile apps cross-platform, right, like Android or whatever.

Speaker 3

这和递归自改进的AGI差别很大。

It's quite different from recursively self-improving AGI.

Speaker 3

你是怎么完成这一跃的?

How did you make that leap?

Speaker 3

你到谷歌之后发生了什么?

What happened once you got to Google?

Speaker 3

是什么让你觉得也许想换条路,做点不同的事情?

What made you think that you maybe wanted to shift gears and do something different?

Speaker 3

我真的很想听听这个故事。

And I'd just love to hear that story.

Speaker 0

这次收购是一个绝佳的机会,让我反思自己接下来真正想做什么。

The acquisition was this amazing opportunity to reflect on what I really wanted to be doing next.

Speaker 0

对吧?

Right?

Speaker 0

你知道,谷歌本身就是一个可以做很多不同事情的地方。

Like, Google itself is a place where you can do so many different things.

Speaker 0

所以我花了一些时间思考,在我的人生旅程中,下一步该去哪里。

So I spent some time thinking about where I wanted to go next in my journey.

Speaker 0

我意识到,让我最兴奋的问题其实是人工智能和机器人技术。

I realized that the problems that I was most excited about were really AI and robotics.

Speaker 0

当时,这些领域中最优秀的人才很多都在谷歌,所以我去和他们交流了。

And many of the best people in the world in those fields were at Google at the time, so I went and talked to them.

Speaker 0

他们让我加入谷歌研究部新成立的AI机器人团队,这对我来说是个绝佳的机会,因为这并非我的专业背景。

They let me come join a new AI robotics team in Google Research, which was this amazing opportunity for me since that wasn't my background.

Speaker 0

我的背景是计算机安全,然后是跨平台移动系统开发之类的工作。

My background was computer security, and then this cross-platform mobile systems-building stuff.

Speaker 0

我加入了这个团队,说实话,我很快意识到硬件太难了,我并不真的想做机器人。

I was able to join this team, and I'll tell you the truth that I very quickly realized that hardware is hard, and I didn't really wanna be doing robotics.

Speaker 0

那时那更多是一种理想化的追求,但我对机器学习真的充满热情。

It was more aspirational at that moment, but I was really passionate about machine learning.

Speaker 0

于是我毅然转向专注于机器学习研究,在谷歌和后来的DeepMind做了大约十年。

So I just made a very hard switch into just doing machine learning research and did that for about a decade, at Google and then DeepMind.

Speaker 3

对于想要进入AI领域、尤其是应用AI并围绕AI创业的工程师,你今天有什么建议吗?

What's maybe some advice that you have today for engineers who want to get into more of the AI side, probably the applied AI and build startups around AI?

Speaker 3

他们应该怎样思考这个问题?

Like, how should they think about that?

Speaker 0

你知道,世界变化得太快了。

You know, the world is changing so quickly.

Speaker 0

这可能有点显而易见,但你就是应该去尝试各种事情。

This is probably a little bit obvious, but you should just try things.

Speaker 0

每一天,都要做点和人工智能相关的事情。

And and, like, every day, do something do something with AI.

Speaker 0

始终试着挑战自己,探索它们能力的边界,去构建你真正想创造的东西。

Always try to push yourself to find the boundaries of what they're capable of and and build the things that you that you want to build.

Speaker 0

对吧?

Right?

Speaker 0

就连我,去年夏天,也花了一个周末,用GPT-5帮我开发了一个iPhone应用。

Even for me, last summer, I took a weekend and used GPT-five to help me build an iPhone app.

Speaker 0

我之前没做过这个。

I hadn't done that in a decade.

Speaker 1

这太神奇了,进展如此之快。

It's amazing, so fast.

Speaker 0

是的,它变得如此快速和简单。

Yeah, it's so fast and so easy.

Speaker 0

那还是不久前的事,大约八个月前,现在它更快更简单了。

And that was an age ago, that was like eight months ago, now it's even faster and easier.

Speaker 0

不要限制自己。

Don't limit yourself.

Speaker 0

你想象的任何事情,都应该尝试用AI去实现,看看你能走多远,这样你就能让世界变得更好。

Anything that you imagine, you should just try to use AI and see how far you can get with it, and you'll be making the world better.

Speaker 1

今天的时间就到这里了。

That's all we have time for today.

Speaker 1

但是,伊恩,非常感谢你给我们所有人提供了助力。

But, Ian, thank you so much for giving us all stilts.

Speaker 1

我们迫不及待想在YC使用它。

We can't wait to use it at YC.

Speaker 1

我迫不及待想用它来处理Gary's List。

I can't wait to use it for Gary's List.

Speaker 1

我的意思是,有太多事情可做了。

I mean, there's just so much to do.

Speaker 1

所以

So

Speaker 0

是的。

Yeah.

Speaker 0

谢谢您邀请我。

Thank you for having me.

Speaker 0

这非常有趣。

This was a lot of fun.
