Cheeky Pint: The 20-year journey to fully autonomous cars with Dmitri Dolgov of Waymo

Episode description

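Waymo now completes 500,000 rides per week across 11 cities. Co-CEO Dmitri Dolgov came to the pub to discuss how the team went from a science project to global scale. He digs into the sensor stack (and why lidar is still needed), how they train the AI using "teacher" and "critic" models, and why he thinks cars that require human supervision will never naturally evolve into robotaxis. They also discuss the new custom-built vehicle, which feels like a living room, the economics of ride-hailing in rural Alaska, and the "Russian math nerd" cohort that seems to run UK tech.

Timestamps
(00:00:22) Russia
(00:02:51) Waymo architecture
(00:09:59) Why now?
(00:19:46) The nuances of driving
(00:29:37) Stripe Agentic Commerce Suite
(00:30:17) Hardware
(00:40:20) Emergent behavior
(00:46:36) Scaling
(00:57:56) Google

Article: EMMA: End-to-End Multimodal Model for Autonomous Driving, Waymo Research: https://waymo.com/research/emma/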

Bilingual transcript


Speaker 0

Dmitri Dolgov is co-CEO of Waymo. He joined Google's self-driving car project in 2009 as one of its first engineers and was repeatedly promoted until he took it over in 2021. Waymo is Google's most successful moonshot and now provides over 500,000 fully autonomous rides each week.

Speaker 0

Cheers, by the way.

Speaker 1

是的。

Yeah.

Speaker 1

干杯。

Cheers.

Speaker 0

You grew up in Russia, right?

Speaker 1

我是在俄罗斯长大的。

I grew up in Russia.

Speaker 1

是的。

Yep.

Speaker 1

Back then it was actually the Soviet Union.

Speaker 0

对。

Right.

Speaker 0

对。

Right.

Speaker 0

没错。

Exactly.

Speaker 1

我父亲是一名物理学家。

My dad is a physicist.

Speaker 1

好的。

Okay.

Speaker 1

So the Soviet Union started falling apart, and then, you know, he got a visiting position at Kyoto University for a year.

Speaker 1

我们全家搬到了那里,然后他去了伯克利,我也跟着去了。

We moved there as a family, and then he went to Berkeley, and I kind of tagged along.

Speaker 1

And then, you know, I graduated from high school.

Speaker 1

是的。

Yep.

Speaker 1

I was thinking about the next thing I wanted to do, and I really liked that technical school in Russia.

Speaker 0

俄罗斯人对物理学非常认真。

Russians are serious about the physics.

Speaker 1

他们是的。

They are.

Speaker 1

他们是的。

They are.

Speaker 1

所以我回到了俄罗斯,获得了我的学士和硕士学位。

So I went back to Russia, and I got my bachelor's and master's

Speaker 0

What year was this that you went back to Russia?

Speaker 1

1994年。

1994.

Speaker 0

好的。

Okay.

Speaker 0

所以那在某种意义上几乎是俄罗斯人乐观情绪的顶峰时期

So that was kind of almost peak Russian optimism in a sense where

Speaker 1

确实是。

It was.

Speaker 0

当时正在开放。

It was opening up.

Speaker 1

确实是。

It was.

Speaker 1

确实是。

It was.

Speaker 1

是的。

Yeah.

Speaker 1

是的。

Yeah.

Speaker 1

不。

No.

Speaker 1

我实际上记得当时和我妈妈聊过这件事,当然,我父母是在苏联时期长大的。

I actually remember talking to my mom about it, and, you know, of course, my parents grew up in the Soviet Union.

Speaker 1

他们见过。

They've seen it.

Speaker 1

是的。

Yep.

Speaker 1

你知道,他们出生在战争前夕

You know, they were born right before

Speaker 0

是的。

Yeah.

Speaker 1

战争爆发后,他们经历了非常艰难的岁月。

The war, and then they saw you know, they lived through some really tough times.

Speaker 1

And I remember talking to my mom about it. In fact, I got my green card here in the US before I went back, and she insisted that I do it.

Speaker 1

嗯。

Mhmm.

Speaker 1

And actually, at the time, I wasn't thinking of coming back.

Speaker 1

But then I was pretty excited about where Russia was and what trajectory it was on.

Speaker 1

And, you know, being young and naive, I was like, there's no turning back.

Speaker 0

那你为什么决定回来呢?

And so why did you decide to come back?

Speaker 0

这更多是一种游戏。

It was more of a play

Speaker 1

是学校和命运的安排,是的,是的。

by School, fate than yeah, yeah.

Speaker 1

不是。

No.

Speaker 1

学校,这一点对我来说很清楚。

School, it was pretty clear to me.

Speaker 1

我想继续学习。

Like, I wanted to continue, studying

Speaker 0

嗯嗯。

Mhmm.

Speaker 1

你知道的,数学和计算机科学。

You know, math and computer science.

Speaker 1

虽然我本科和硕士学的是物理和应用数学,但我认为这仍然是非常扎实的俄罗斯数学与科学教育基础。

And while the undergrad and masters that I got in physics and applied math, that I think was still an incredibly strong kind of foundational, you know, school of Russian math and science.

Speaker 1

For graduate school, it was very clear to me that the best way to do it was in the US.

Speaker 1

所以我回来了。

So I came back.

Speaker 0

I'm struck that the founders of the two most valuable UK companies are Russian math nerds who both went to the same school.

Speaker 0

Nikolay at Revolut and Alex Gerko at XTX.

Speaker 0

但确实,这是一个强大的侨民群体。

But, yeah, it's a it's a strong diaspora.

Speaker 1

有一家公司离这儿不远,它的创始人之一也有类似的背景。

There's a company not far from here where one of the founders also has, you know, a similar pedigree.

Speaker 1

嗯。

Mhmm.

Speaker 1

A company that we're closely related to.

Speaker 0

Exactly.

Speaker 0

You know the classic engineering interview question: what happens when I type google.com and hit enter? You talk me through whatever you like: HTTP, DNS, BGP.

Speaker 0

你可以深入到任意一层技术栈。

You can go down to whatever level of stack you want.

Speaker 0

你能不能简单描述一下,今天我乘坐 Waymo 时,技术层面到底发生了什么?

Do you want to maybe just describe, when I take a ride in a Waymo today, what's happening at a technical level?

Speaker 0

比如,它的架构是怎样的?

Like, what is the architecture?

Speaker 1

让我回答你的问题,关于实时发生了什么,但这只是故事的一部分,因为我们主要讨论的是推理部分,也就是实时推理。

Let me answer your question, what's happening in real time, but this is going to be only a part of the story because we're going to be talking about kind of the inference, the real time inference part of it.

Speaker 1

如果我们想进行更深入、更丰富的技术对话,我认为很有意思的是,把视角拉远,谈谈构建、评估和部署 Waymo 驾驶系统所涉及的整个生态系统。

And if we want to have deeper, richer technical conversation, I think it would be interesting also to zoom out and talk about the entire ecosystem of what goes into building, evaluating, and deploying the Waymo Driver.

Speaker 1

但当你在驾驶或被搭载时,比如说,我们把我们所构建的东西视为一个驾驶员。

But when you're driving around or being driven around, say, you know, we think about what we're building as a driver.

Speaker 1

显然,它并不是一辆车。

Obviously, it's not a car.

Speaker 1

它配备了分布在车辆周围的多个传感器。

So it has a number of sensors that are positioned around the vehicle.

Speaker 1

我们使用三种不同的感知模式。

We use three different sensing modalities.

Speaker 1

There are cameras, there are lidars (lasers), and there are radars.

Speaker 1

这些是主要的传感器。

You know, those are the primary ones.

Speaker 1

此外还有麦克风,比如定向麦克风阵列,但感知外界主要依靠这三种。

There are also microphones, directional, you know, microphone arrays, but those are the primary three for sensing the world.

Speaker 1

它们在物理特性上非常互补。

They all have very nicely complementary physical properties.

Speaker 1

它们都提供了车辆周围360度的覆盖,因此Waymo驾驶员能持续看到全方位的环境。

They all have 360 degree coverage around the vehicle, so the Waymo driver sees kind of 360, you know, all the time.

Speaker 1

So all of the data goes into a computer, as you would expect. And the software that processes it is now all AI.

Speaker 1

嗯。

Mhmm.

Speaker 1

我可以说这是应用于物理世界的专用人工智能。

I can say specialized AI in the physical world.

Speaker 1

它会处理传感器数据。

So, it processes the sensor data.

Speaker 1

Nowadays we talk about it using AI terminology: there are encoders that take this data in, and then there's the decoder, the action, the generative part, if you will, in the car.

Speaker 1

而这里的生成任务就是,弄清楚如何驾驶。

And the generative task there is to, you know, figure out how to drive.

Speaker 1

是的

Yep.

Speaker 1

对吧?

Right?

Speaker 1

当然,这通过一个专门的接口连接到汽车,使我们能够控制车辆。

And that is, of course, connected through kind of a specialized interface to the car where we can actuate the vehicle.

Speaker 1

所以,你才会看到方向盘转动,汽车带你行驶。

And, you know, that's why you see the steering wheel, you know, turn and it drives you around.
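The flow described here (sensor data in, encoders, a generative decoder, then actuation) can be sketched as a toy pipeline. Everything in it is a hypothetical illustration: the `SensorFrame`, `encode`, `decode`, and `actuate` names, the averaging "encoder", and the hand-written "decoder" are assumptions made for the sketch. In the system described in the conversation, both stages are learned models.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SensorFrame:
    """One time step of already-featurized readings; all values are made up."""
    camera: List[float]
    lidar: List[float]
    radar: List[float]


def clamp(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Keep a command inside the (assumed) actuator range."""
    return max(low, min(high, value))


def encode(frame: SensorFrame) -> List[float]:
    """Toy encoder: summarize each modality as its mean and concatenate."""
    return [sum(m) / len(m) for m in (frame.camera, frame.lidar, frame.radar)]


def decode(features: List[float]) -> Tuple[float, float]:
    """Toy decoder: turn fused features into (steering, acceleration)."""
    lateral_offset, lidar_obstacle, radar_obstacle = features
    steering = clamp(-lateral_offset)  # steer back toward the lane center
    acceleration = clamp(1.0 - lidar_obstacle - radar_obstacle)  # ease off near obstacles
    return steering, acceleration


def actuate(command: Tuple[float, float]) -> Dict[str, float]:
    """Stand-in for the vehicle interface: package the command for the car."""
    steering, acceleration = command
    return {"steering": steering, "acceleration": acceleration}


frame = SensorFrame(camera=[0.2, 0.4], lidar=[0.5, 0.5], radar=[0.1, 0.3])
command = actuate(decode(encode(frame)))
print(command)
```

The point of the sketch is only the shape of the loop: all three modalities feed one encoder stage, and a single decoder stage produces the driving action that is sent to the vehicle interface.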

Speaker 0

好的。

Okay.

Speaker 0

所以我坐进我的车里。

So I get into my car.

Speaker 0

主要有三类传感器:激光雷达、雷达和摄像头。

There's three main families of sensors, Lidar, radar, and, cameras.

Speaker 0

然后系统利用这些数据首先构建对周围世界的模型,比如其他车辆的位置等等。

And then it is using that to first build a model of what's going on in the world, you know, where are all the other cars and things like that.

Speaker 0

然后,正如你所说,做出决策并通过汽车执行这些操作。

And then as you say, make decisions and then actuate that with the car.

Speaker 0

这就是你所处的系统。

That is the the system that you're living in.

Speaker 0

And is all that inference done locally? Presumably, yes.

Speaker 0

所有东西都不在云端吗?

Nothing's in the cloud?

Speaker 1

Nothing real time. Nothing real time is in the cloud. There are some things that can happen in the cloud, but they're not required.

Speaker 0

明白了。

Got it.

Speaker 0

What's an example of a nice-to-have that happens in the cloud?

Speaker 1

You can imagine a situation where, you know, some of it is not directly related to the task of driving. Say, after you leave the car, we want to check that the car is not dirty.

Speaker 1

是的。

Yep.

Speaker 1

That you didn't leave anything there. And if you left it in a mess, then, you know, we want to send the car to one of our depots and get it cleaned up.

Speaker 1

If you left an item there, say your phone, we want to detect that and then, you know, send it to our lost and found and let you know.

Speaker 1

对吧?

Right?

Speaker 1

So that we do by asking a model that actually lives off board, as opposed to having to put it on the car, because it's not a real-time task related to the driving.

Speaker 1

所以,这是一个例子,是的。

So that's one example of Yeah.

Speaker 0

There are all these debates that go on on Twitter around self-driving.

Speaker 0

比如,端到端方法与更模块化的方法之间的争论。

So I can think of, you know, end to end versus the more kind of modular approach.

Speaker 0

还有仅使用摄像头与使用多种传感器的争论。

There's cameras only versus array of sensors.

Speaker 0

我无法判断,这些争论对这个领域的专家来说真的有趣吗,还是说这些只是已经定论的问题,只是算法的素材?

And I can't tell, are these debates actually interesting to an expert in the field, or do you think these are just settled matters and they're just grist for the algorithm?

Speaker 1

我理解这些问题的来源。

I understand where the questions are coming from.

Speaker 1

我发现,通常这些问题的提出方式和争论的方式,丢失了很多真正重要的细微差别和细节。

I do find that, kind of, often, the way they're posed and the way the debate happens is losing a lot of the nuance and a lot of detail that really matters.

Speaker 1

To me, the most interesting technical questions are at that level.

Speaker 1

Because the way we think about building the Waymo Driver, it starts with a large off-board foundation model.

Speaker 1

嗯。

Mhmm.

Speaker 1

我可以想象构建一个大型模型,它能理解物理世界如何运作,理解驾驶所涉及的重要特性、驾驶的社会层面,以及什么是好司机而非坏司机。

I can imagine building a big model that understands how the physical world works and understands the important properties of what it means to drive, the social aspects of driving, and what it means to be, you know, a good driver as opposed to a bad one.

Speaker 1

所以,这就是基础。

So, that's the foundation.

Speaker 1

Then we specialize it into what we call three main off-board teachers. They are still large, high-capacity off-board models.

Speaker 1

有Waymo驾驶员、模拟器,还有评判者。

There's the Waymo Driver, there is the simulator, and then there's the Critic.

Speaker 1

对吧?

Right?

Speaker 1

然后这些模型会被蒸馏成更小的模型,以便更快地进行推理。

And those then get distilled into smaller models that you can run inference on faster.

Speaker 1

So the Waymo Driver becomes the backbone, the main backbone, of what's in the car.

Speaker 1

The simulator, of course, is what powers our synthetic generative environment that can run in the cloud for training and for closed-loop evaluation of our system, and the critic...

Speaker 0

Does the Validar ever run locally?

Speaker 1

不。

No.

Speaker 1

不,它不会。

No, it doesn't.

Speaker 1

是的。

Yeah.

Speaker 1

However, what I think is interesting is the way the decoder works, the way the model works. If you think about the generative task in the simulator, creating those realistic worlds and how other people behave, how cars, pedestrians, and cyclists behave, and the task that you have to solve on the car in real time, there is this fundamental shared capability of understanding how these objects relate to each other: predicting what they might do in the future if you are running on the car, and generating, sampling, those probabilistic behaviors in the simulator.

Speaker 1

So they're different models, but this is why the shared foundation model is able to power both.

Speaker 1

明白了。

Got it.

Speaker 1

同样地,你想想批评者的作用,批评者的任务是发现有趣的事件,然后对什么是良好行为、什么是不良行为发表看法。

And similarly, you think about the critic, like the job of the critic is to find interesting events and then, you know, be opinionated about what's good behavior and what's bad behavior.

Speaker 1

同样是基本的理解。

Similar fundamental understanding.

Speaker 1

是的。

Yes.

Speaker 1

对吧?

Right?

Speaker 1

If you're running inference on the car, you still have to figure out which of the multiple hypotheses of these future worlds you want to take action to steer towards.

Speaker 1

是的。

Yes.

Speaker 0

对吧?

Right?

Speaker 0

好的。

Okay.

Speaker 0

这些都基于同一个基础模型吗?

And these are all downstream of the same foundation model?

Speaker 1

没错。

That's right.

Speaker 1

先从基础模型开始。

So start with the foundation model.

Speaker 1

是的。

Yep.

Speaker 1

You specialize and fine-tune it, still as off-board models. Those are the teachers, and then you distill each one of the teachers into its own student.

Speaker 1

是的。

Yes.

Speaker 1

对吧?

Right?

Speaker 1

驾驶员、模拟器、评判者。

The driver, the simulator, the critic.

Speaker 1

是的。

Yes.
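The teacher-to-student step mentioned here is commonly done with soft-target distillation: the student is trained to match the teacher's output distribution rather than hard labels. The following is a minimal sketch of that general technique under stated assumptions, not a description of Waymo's training recipe. The three "actions", the teacher logits, the temperature, and the learning rate are all invented, and the student is just a bare logit vector rather than a smaller network.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distill(teacher_logits: List[float], steps: int = 500,
            lr: float = 0.5, temperature: float = 2.0) -> List[float]:
    """Fit student logits to the teacher's softened distribution by gradient descent."""
    targets = softmax(teacher_logits, temperature)
    student_logits = [0.0] * len(teacher_logits)
    for _ in range(steps):
        probs = softmax(student_logits, temperature)
        # Gradient of cross-entropy(targets, student) with respect to the student logits.
        grads = [(p - t) / temperature for p, t in zip(probs, targets)]
        student_logits = [z - lr * g for z, g in zip(student_logits, grads)]
    return student_logits


teacher = [2.0, 0.5, -1.0]  # hypothetical teacher scores for three candidate actions
soft_targets = softmax(teacher, 2.0)
kl_before = kl_divergence(soft_targets, softmax([0.0, 0.0, 0.0], 2.0))
student = distill(teacher)
kl_after = kl_divergence(soft_targets, softmax(student, 2.0))
print(kl_before, kl_after)
```

In a real setup the student would be a smaller network evaluated on many inputs, which is what makes it cheaper to run at inference time; the loss being minimized is the same idea.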

Speaker 0

你二十年前就开始做自动驾驶了。

You started working on self driving twenty years ago.

Speaker 0

对。

Yeah.

Speaker 0

As you think about the tech evolution, is this just a scaling laws story where we had to be able to throw enough compute at it?

Speaker 0

Were there architectural approaches we needed to wait to be invented?

Speaker 0

Was it just a story of needing twenty years of going down the wrong cul-de-sacs before we eventually arrived at the right approach?

Speaker 0

Knowing what you know now, could you have had a successful Waymo in market in 2015, or was there some enabling technology?

Speaker 1

不可能。

No.

Speaker 1

这些年来发生的技术突破至关重要,尤其是在人工智能领域。

Technology breakthroughs that happened over the years were critically important, primarily in AI.

Speaker 1

嗯。

Mhmm.

Speaker 1

但其他领域也很重要,比如计算能力。

But also in other areas like, you know, compute.

Speaker 1

你需要强大的计算能力来支撑。

You got a heavy compute that you need to work on.

Speaker 1

对。

Yep.

Speaker 1

我不认为这是经历了上千个死胡同,然后不得不回头,最终找到唯一正确路径的过程。

Now, I wouldn't characterize it as like going, you know, a thousand different dead ends and then having to retract and then finding like the one right path.

Speaker 1

我认为这是一个迭代学习和不断演进的过程。

I would characterize it as, iterative learning and evolution.

Speaker 1

是的。

Yes.

Speaker 1

然后,Transformer出现了,但Transformer本身是一种非常通用的架构,对吧?

And then, you know, Transformers came around, but, you know, Transformers, for example, are very general architecture, right?

Speaker 1

对。

Yep.

Speaker 1

It powers LLMs, it powers, you know, our models.

Speaker 1

但如何将它们应用于这个领域,我认为这并不是Transformer自然衍生出来的结果。

But how you apply them to that space, I think this is where It didn't just fall out of transformers.

Speaker 1

没错。

Exactly.

Speaker 1

对。

Right.

Speaker 1

当然,人们喜欢谈论架构,架构确实很重要。

And of course, people like to talk about architectures, architecture is important.

Speaker 1

但实际上,很多因素都取决于你的指标、评估机制、所有的训练方法,当然还有新数据。

But really, a lot of it comes down primarily to your metrics, to your evaluation mechanisms, you know, all of the training recipes, and, of course, new data.

Speaker 0

是的。

Yes.

Speaker 0

LLMs are good at text, or, I mean, tokens specifically, and obviously perform best in domains that have some kind of single corpus of text they can work on, like coding, where it's very helpful that everything was just kind of textual already.

Speaker 0

And part of the success has been creating textual representations for domains such that we can then, you know, put LLMs against them.

Speaker 0

你能描述一下你是如何编码你所看到的世界的吗?

Can you describe how you encode the worlds that you're seeing?

Speaker 0

好的。

Yeah.

Speaker 0

I mean, are you just building a 3D model, like a 3D bitmap, essentially?

Speaker 0

或者

Or

Speaker 1

所以,我认为我们可以深入探讨一下编码器和解码器部分之间的接口问题。

So this is where I think we can get a bit into the this question of what is the interface between the encoder and the decoder parts.

Speaker 1

And I think that also touches on the thing you flagged earlier, where people like to debate end-to-end or not end-to-end. So let's talk a little bit about end-to-end and then get back to what the interface between those two is.

Speaker 1

对吧?

Right?

Speaker 1

那么当我们说端到端时,我们指的是什么?

So when we say end to end, what do we mean?

Speaker 1

我们指的是一个大型的机器学习模型。

We mean that it is some large ML model.

Speaker 1

通常,你不会一次性从头构建它们。

Typically, you don't build them monolithically.

Speaker 1

They have different parts and different subgraphs. But what's important is that you can backpropagate the gradient of the loss function all the way through, so in every layer you can learn the weights and the representations that matter for the final task.

Speaker 1

你不需要强迫它通过某个狭窄的通道,比如编码器和解码器之间的接口。

You don't force it through some, you know, narrow funnel between, let's say, the encoder and the decoder.
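The definition of end-to-end given here (the loss gradient reaches every stage, with no hand-fixed bottleneck between encoder and decoder) can be shown on a deliberately tiny model. This is a sketch under assumptions: a one-weight "encoder" and a one-weight "decoder", a squared-error loss, and invented values for the input, target, and learning rate.

```python
from typing import Tuple


def forward(x: float, w_enc: float, w_dec: float) -> Tuple[float, float]:
    """Two-stage model: the 'encoder' makes a hidden value, the 'decoder' makes the output."""
    hidden = w_enc * x
    output = w_dec * hidden
    return hidden, output


def loss_and_grads(x: float, target: float,
                   w_enc: float, w_dec: float) -> Tuple[float, float, float]:
    """Squared-error loss plus gradients for both stages via the chain rule."""
    hidden, output = forward(x, w_enc, w_dec)
    error = output - target
    loss = error ** 2
    d_output = 2.0 * error
    grad_dec = d_output * hidden   # dL/dw_dec
    d_hidden = d_output * w_dec    # gradient passed back across the encoder/decoder interface
    grad_enc = d_hidden * x        # dL/dw_enc
    return loss, grad_enc, grad_dec


def train(x: float, target: float, w_enc: float = 0.5, w_dec: float = 0.5,
          lr: float = 0.05, steps: int = 100) -> Tuple[float, float]:
    """Update both stages jointly from the same final-task loss."""
    for _ in range(steps):
        _, grad_enc, grad_dec = loss_and_grads(x, target, w_enc, w_dec)
        w_enc -= lr * grad_enc
        w_dec -= lr * grad_dec
    return w_enc, w_dec


initial_loss, _, _ = loss_and_grads(1.0, 2.0, 0.5, 0.5)
w_enc, w_dec = train(1.0, 2.0)
final_loss, _, _ = loss_and_grads(1.0, 2.0, w_enc, w_dec)
print(initial_loss, final_loss)
```

The line computing `d_hidden` is the part that matters: the encoder's weight is adjusted by a signal that comes from the final task, not from a separately specified intermediate target.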

Speaker 0

是的。

Yeah.

Speaker 0

我认为我对端到端有一个简单的理解,就是像素输入,汽车动作输出,

I think I have a simple view of end to end being, you know, pixels go in and car actions come out,

Speaker 1

有点过于简化了。

a bit of an oversimplification.

Speaker 1

但对的。

But Yeah.

Speaker 1

完全正确。

That's exactly right.

Speaker 1

This is kind of the basic vanilla version of it.

Speaker 1

对吧?

Right?

Speaker 1

But think about what it will take to build a driver that's capable of fully autonomous operation.

Speaker 1

是的。

Yep.

Speaker 1

你想想整个驾驶员生态系统:驾驶员、模拟器、评估系统。

You think about this entire ecosystem of the driver, the simulator, the critic.

Speaker 1

是的。

Yep.

Speaker 1

如果你只做这些:像素输入,轨迹输出,要同时做好这三者并达到我们所需的高安全性和高性能就变得非常困难,也很难实现规模化。

If that's all you do, pixels in, trajectories out, it becomes very difficult to do all of those three and achieve the high level of safety and performance that we require, and it becomes very difficult to kind of do it at scale.

Speaker 1

However, it's a very easy way to get started.

Speaker 1

对吧?

Right?

Speaker 1

You collect some data, kind of analogous to the LLM world, right? The easiest thing you can do is pick a model.

Speaker 1

是的。

Yep.

Speaker 1

如今最容易上手的方式就是直接用一个视觉语言模型。

The easiest way to get started nowadays would be just, you know, take a VLM.

Speaker 1

It already has a language-aligned camera encoder.

Speaker 1

是的。

Yep.

Speaker 1

And then it has a decoder that can, you know, generate text.

Speaker 1

你可以对它进行微调,比如说:别生成文本了,生成轨迹吧。

And you can fine tune it and say, hey, instead of text, generate trajectories.

Speaker 1

你知道的,这完全做得到。

You know, very, very doable.

Speaker 1

In fact, a while ago we published a paper called EMMA that did exactly that.

Speaker 1

是的。

Yes.
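One simple way to let a text decoder "generate trajectories instead of text" is to write the waypoints themselves as text, so the fine-tuning target is still a string. The sketch below illustrates that idea only. The `(x,y)` token format, the two-decimal precision, and the function names are assumptions for the example, not the format used in the EMMA paper or in any Waymo system.

```python
from typing import List, Tuple

Waypoint = Tuple[float, float]


def trajectory_to_text(waypoints: List[Waypoint], precision: int = 2) -> str:
    """Serialize waypoints as space-separated '(x,y)' tokens a text decoder could emit."""
    return " ".join(f"({x:.{precision}f},{y:.{precision}f})" for x, y in waypoints)


def text_to_trajectory(text: str) -> List[Waypoint]:
    """Parse the decoder's text output back into numeric waypoints."""
    waypoints = []
    for token in text.split():
        x_str, y_str = token.strip("()").split(",")
        waypoints.append((float(x_str), float(y_str)))
    return waypoints


planned = [(0.0, 0.0), (1.5, 0.25), (3.0, 1.0)]  # hypothetical future positions in meters
as_text = trajectory_to_text(planned)
recovered = text_to_trajectory(as_text)
print(as_text)
```

With targets in this form, fine-tuning looks the same as any other text-generation task: the training pairs are camera inputs and the serialized trajectory string.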

Speaker 1

在正常情况下,它实际上能开得非常好。

And it will actually, I mean, in the nominal case, drive pretty darn well.

Speaker 1

嗯。

Mhmm.

Speaker 1

这简直令人难以置信。

Which is mind blowingly impressive.

Speaker 0

很有趣。

Is very funny.

Speaker 0

是的。

Yeah.

Speaker 1

I mean, there's so many...

Speaker 0

You're saying you can take an off-the-shelf model, which has nothing to do with driving to start with, and you'll get these good results.

Speaker 1

没错。

That's right.

Speaker 1

你明白了。

You get it.

Speaker 1

In the nominal case.

Speaker 1

是的。

Yeah.

Speaker 1

我只是想说清楚。

I just want to be clear.

Speaker 1

这和你需要的水平相差几个数量级。

It's orders of magnitude away from what you need.

Speaker 0

是的。

Yeah.

Speaker 0

你不应该在街上试用,但它确实有效。

You should not try it on the street, but it works.

Speaker 1

但例如

But for example

Speaker 0

像一匹会说话的马。

Like a talking horse.

Speaker 0

它会说话,这很令人印象深刻,你知道吗?

It's impressive that it's talking, you know?

Speaker 1

没错。

Exactly.

Speaker 1

没错。

Exactly.

Speaker 1

Actually, if the product that you wanted to build was maybe a driver-assist system, not a fully autonomous system, then maybe that's all you need to do.

Speaker 1

是的。

Yep.

Speaker 1

And for that, you don't need all this other machinery of the simulator and the critic, because the number of nines is drastically lower.
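"Number of nines" is the usual shorthand for reliability: n nines means a success probability of 1 - 10^-n, so each extra nine cuts the failure rate by a factor of ten. The short sketch below makes the arithmetic concrete. The event count of one million is an arbitrary illustration and none of the numbers are Waymo figures.

```python
def failure_probability(nines: int) -> float:
    """n nines of reliability corresponds to a failure probability of 10**-n."""
    return 10.0 ** (-nines)


def expected_failures(nines: int, events: int) -> float:
    """Expected number of failures across a given number of independent events."""
    return events * failure_probability(nines)


for nines in (2, 4, 6):
    print(nines, expected_failures(nines, 1_000_000))
```

This is why the bar differs so much between products: a supervised driver-assist feature can rely on the human to catch rare failures, while a fully autonomous system has to absorb those extra factors of ten itself.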

Speaker 1

But this is interesting, because there is some intuition behind why that works.

Speaker 1

如果你想想驾驶的难点,其实和进行对话没什么不同。

If you think about the hard parts of driving, it's, you know, not unlike, you know, having a conversation.

Speaker 1

嗯嗯。

Mhmm.

Speaker 1

Except in the LLM world, you're modeling language, or maybe modeling a dialogue, in the space of sentences and words.

Speaker 1

让驾驶变得困难的,也是这种多智能体的社会互动特性。

What makes driving hard is also this kind of multi agent social interactive part of it.

Speaker 1

对吧?

Right?

Speaker 1

If I do something, that's going to affect you, it's going to affect somebody else.

Speaker 1

历史很重要,这不仅仅是局部的、几何意义上的。

And history matters, it's not local and just geometric.

Speaker 1

上下文很重要,语义也很重要。

Context matters, semantics matters.

Speaker 1

But it's not in the language of words; it's in the language of, well, body language, if you will.

Speaker 1

对吧?

Right?

Speaker 1

And we see that empirically validated if you take this approach.

Speaker 1

好的。

Okay.

Speaker 1

那么,假设我们构建了这个系统。

So then let's say we build this thing.

Speaker 1

Just cameras and a camera encoder: pixels go in, trajectories come out. The quality is sufficient to drive in the nominal case.

Speaker 1

但它不足以应对长尾的边缘情况,达不到我们所需的超人级安全标准。

It's not sufficient to deal with the long tail of, you know, the edge cases and hit the high bar of superhuman safety that we require.

Speaker 1

那么,你就会开始问:还需要什么?

So then, you start asking the question, what else do you need?

Speaker 0

是的。

Yes.

Speaker 1

And if all you did when you trained the system was observe how other people drive, maybe just passively observing how people drive and how they interact, maybe also driving the car yourself and then using imitation learning to train it, you find that that's not enough.

Speaker 1

你必须在闭环中开展一些工作。

You have to do something in closed loop.

Speaker 1

You have to do things like RLFT, which also parallels what we see in LLMs.

Speaker 0

RLFT?

Speaker 1

RLFT, reinforcement learning based fine-tuning.

Speaker 0

哦,好吧。

Oh, okay.

Speaker 0

对。

Yeah.

Speaker 0

对。

Yeah.

Speaker 1

You know, similar to reinforcement learning from human feedback in the LLM world.

Speaker 1

对吧?

Right?

Speaker 1

是的。

Yeah.

Speaker 1

You want to do proper closed-loop driving, where you explore all kinds of different situations and then you give it a reward signal to keep it in distribution.

Speaker 1

为此,你需要一个真实的模拟器。

For that, then you need a realistic simulator.

Speaker 1

对吧?

Right?

Speaker 1

如果你想要一个良好的强化学习系统,你还需要对奖励函数有自己的见解。

You also, if you want to have a good RL system, you need to have an opinion for the reward function.

Speaker 1

这就是评判者发挥作用的地方。

This is where the critic comes in.

Speaker 1

对吧?

Right?
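The closed-loop idea described here (roll the policy out in a simulator, score the rollout with a critic, and use that score to adjust the policy) can be sketched with a toy car-following problem. Everything below is an assumption made for illustration: the one-parameter policy, the simulator dynamics, the critic's scoring, and above all the update rule, which is plain hill climbing standing in for a real reinforcement learning algorithm.

```python
from typing import List, Tuple


def simulate(gain: float, steps: int = 40, dt: float = 0.5,
             lead_speed: float = 10.0, start_gap: float = 30.0) -> List[float]:
    """Toy simulator: roll the policy out in closed loop and return the gaps it visits."""
    gap = start_gap
    gaps = []
    for _ in range(steps):
        ego_speed = min(20.0, max(0.0, gain * gap))  # policy acts on the current state
        gap += (lead_speed - ego_speed) * dt         # the action changes the next state
        gaps.append(gap)
    return gaps


def critic(gaps: List[float], desired_gap: float = 20.0) -> float:
    """Toy critic: reward staying near the desired gap, heavily penalize collisions."""
    reward = 0.0
    for gap in gaps:
        reward -= abs(gap - desired_gap)
        if gap <= 0.0:
            reward -= 1000.0
    return reward


def fine_tune(gain: float, iterations: int = 20, step: float = 0.05) -> Tuple[float, float]:
    """Hill climbing on the critic's reward (a simple stand-in for an RL update)."""
    best_reward = critic(simulate(gain))
    for _ in range(iterations):
        improved = False
        for candidate in (gain - step, gain + step):
            if candidate <= 0.0:
                continue
            reward = critic(simulate(candidate))
            if reward > best_reward:
                gain, best_reward, improved = candidate, reward, True
        if not improved:
            break
    return gain, best_reward


base_gain = 0.2  # hypothetical starting policy, e.g. one obtained by imitation
base_reward = critic(simulate(base_gain))
tuned_gain, tuned_reward = fine_tune(base_gain)
print(base_reward, tuned_reward)
```

The sketch needs exactly the two pieces named in the conversation: a simulator, so the policy's own actions determine the states it sees next, and a critic, so there is an opinion about which rollouts were good.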

Speaker 1

如果你有一个纯粹的端到端系统,我们来看看模拟器。

If you have a purely end to end system, let's look at the simulator.

Speaker 1

那么,你该怎么做?

Now, what do you do?

Speaker 1

You're then constrained to just go from pixels to trajectories.

Speaker 1

对吧?

Right?

Speaker 1

That's all you can run the end-to-end system on.

Speaker 1

这是一个非常高维的空间。

It it's a very high dimensional space.

Speaker 1

It's a hard problem to generate everything.

Speaker 1

But even if you solve that, it just becomes incredibly inefficient to run it the full way, pixels to trajectories, in simulation for training or for evaluation.

Speaker 1

So this is where intermediate representations come in. There are some intermediate representations of the physical world that we know are correct. They are not sufficient, but they're not generality-limiting.

Speaker 1

对吧?

Right?

Speaker 1

换句话说,这里有一个物体,有一个道路的概念,有标志,有速度限制。

In other words, there's an object here, there's, you know, a concept of a road, there's signs, there's speed limits.

Speaker 1

So augmenting that learned representation, those learned embeddings from the encoder and decoder, with that more structured representation

Speaker 1

is what we do, and we find that this gives us additional knobs to simulate in that space, not just pixels to trajectories.

Speaker 1

It allows us to have additional safety validation layers in real time.

Speaker 1

And it also gives us additional mechanisms to specify the reward function for evaluation and for the critic. So, again, we've kind of gone full circle.

Speaker 1

Is it end-to-end?

Speaker 1

是的,确实是。

Yes, it is.

Speaker 1

是的。

Yes.

Speaker 1

但如果你想要在全自动驾驶的规模上实现它,就需要用所有这些其他东西来增强它。

But if you wanna do it at scale for full autonomy, it's augmented with all of this other stuff.
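The idea of carrying learned embeddings alongside a structured view of the scene (objects, road, speed limits), and using the structured part for an extra real-time check, can be sketched as a small data structure plus a validator. All names, fields, and thresholds here are hypothetical choices for the example; the conversation does not specify how these representations or checks are actually implemented.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple


@dataclass
class TrackedObject:
    kind: str                      # e.g. "pedestrian", "car"
    position: Tuple[float, float]  # meters, in the ego vehicle's frame


@dataclass
class Scene:
    embedding: List[float]         # learned, opaque features from the encoder
    objects: List[TrackedObject]   # structured, inspectable representation
    speed_limit: float             # meters per second


def validate(trajectory: List[Tuple[float, float]], speeds: List[float],
             scene: Scene, min_clearance: float = 2.0) -> List[str]:
    """Check a proposed trajectory against the structured part of the scene."""
    violations = []
    for (x, y), speed in zip(trajectory, speeds):
        if speed > scene.speed_limit:
            violations.append(f"speed {speed} exceeds limit at ({x}, {y})")
        for obj in scene.objects:
            distance = hypot(x - obj.position[0], y - obj.position[1])
            if distance < min_clearance:
                violations.append(f"too close to {obj.kind} at ({x}, {y})")
    return violations


scene = Scene(embedding=[0.1, -0.4, 0.7],
              objects=[TrackedObject("pedestrian", (10.0, 0.0))],
              speed_limit=15.0)
risky = validate([(0.0, 0.0), (5.0, 0.0), (9.0, 0.0)], [10.0, 10.0, 10.0], scene)
safe = validate([(0.0, 0.0), (5.0, 3.0), (10.0, 5.0)], [10.0, 10.0, 10.0], scene)
print(risky, safe)
```

The same structured fields could also feed a simulator or a reward function, which is the "additional knobs" point: they give places to intervene that a pure pixels-to-trajectories model does not expose.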

Speaker 0

关于模拟这一点,非常有趣。

That's very interesting on the simulating point.

Speaker 0

It's just very hard to simulate for an end-to-end model, because it's easier to deal in intermediate representations than to come up with a pixel-perfect view of the world.

Speaker 1

你需要两者兼备。

You need both.

Speaker 1

对。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

So having an end-to-end architecture that's augmented with that structure allows you to play in both of those worlds.

Speaker 0

是的。

Yeah.

Speaker 0

对。

Yeah.

Speaker 0

没错。

Yeah.

Speaker 0

作为一辆自动驾驶汽车,你希望实现什么功能?

What are you looking to do as a self driving car?

Speaker 0

听起来可能有点奇怪,但我认为人们可能没有意识到,你需要解决许多不同的问题,比如让乘客安全到达目的地。

I mean, it sounds funny, but I think people maybe don't realize that there are many different things that you're looking to solve for, where you're looking to get the person to their destination.

Speaker 0

你还要确保他们能相对及时地到达目的地,是的。

You're looking to get them there reasonably promptly Yeah.

Speaker 0

但同时也要驾驶得非常平稳,嗯。

But also drive quite smoothly Mhmm.

Speaker 0

And also have many nines of safety.

Speaker 0

And also not annoy other drivers and get honked at, and so on.

Speaker 0

那么,有哪些奖励函数或者你们在优化的一些可能对人们来说并不明显的目标呢?

And so what are some of the reward functions or kind of things you're optimizing for that maybe are not obvious to people?

Speaker 1

安全是首要关注点。

So safety is the primary focus.

Speaker 1

对吧?

Right?

Speaker 1

But of course we also want it to be a smooth driver, both for people in the car and for other actors.

Speaker 1

是的。

Yep.

Speaker 1

And I also want it to be a predictable, well-behaved one, so that it can fit nicely into the whole social ecosystem of our roadways.
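The priorities listed here (safety first, then smoothness, with promptness from the earlier question) can be expressed as a weighted reward in which the safety term dominates. This is only a sketch: the terms, the weights, and the 2-meter threshold are invented for illustration, and predictability, which was also mentioned, is not modeled at all.

```python
from typing import Dict, List, Optional

# Hypothetical weights: safety is deliberately much larger than the other terms.
DEFAULT_WEIGHTS = {"safety": 100.0, "comfort": 5.0, "time": 1.0}


def ride_reward(min_gap_m: float, accelerations: List[float], trip_time_s: float,
                weights: Optional[Dict[str, float]] = None,
                safe_gap_m: float = 2.0) -> float:
    """Score a ride: higher is better, and every term is a penalty."""
    w = weights or DEFAULT_WEIGHTS
    # Safety: penalize how far the closest approach fell below the safe gap.
    safety_penalty = max(0.0, safe_gap_m - min_gap_m)
    # Comfort: penalize the average change between consecutive accelerations.
    changes = [abs(b - a) for a, b in zip(accelerations, accelerations[1:])]
    comfort_penalty = sum(changes) / len(changes) if changes else 0.0
    # Promptness: penalize trip duration, measured in minutes.
    time_penalty = trip_time_s / 60.0
    return -(w["safety"] * safety_penalty
             + w["comfort"] * comfort_penalty
             + w["time"] * time_penalty)


smooth = ride_reward(5.0, [0.5, 0.5, 0.4, 0.3], 600.0)
jerky = ride_reward(5.0, [2.0, -2.0, 2.0, -2.0], 600.0)
fast_but_close = ride_reward(0.5, [0.5, 0.5, 0.4, 0.3], 480.0)
print(smooth, jerky, fast_but_close)
```

With these weights, a ride that saves two minutes by passing within half a meter of something scores far worse than a slower ride that keeps its distance, which is the ordering the conversation describes.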

Speaker 0

看起来,自动驾驶迅速出现的一个问题是,人们无法拥有美好的体验,或者,你知道的,并不是每个人都会善待这些机器人。

It seems like one of the issues that has quickly emerged with self driving is the fact that people can't have nice things or, you know, not everyone is nice to the robots.

Speaker 0

所以,你知道,无论你是穿过一个危险的区域,还是被堵住,或者,我可能不会在这里让你下车。

And so, you know, whether you're, you know, driving through a dodgy area or getting blocked or, you know, maybe I'm not gonna drop you off here.

Speaker 0

我可能会绕一圈,然后,你知道,在一个更好的地方让你下车。

Maybe I'm gonna go around the block and, you know, drop you somewhere better.

Speaker 0

但正如你所说,所有这些其他人类相关的问题,你们是如何解决的呢?

But all of these, as you say, kind of other human issues, how do you go about solving those?

Speaker 1

你提到的很多问题,其实都是我们需要去解决的。

A lot of the ones that you mentioned are just things that, you know, we need to work on.

Speaker 1

是的。

Yeah.

Speaker 1

不过话说回来,说实话,如果我们没有把你送到你真正想下车的位置,

And, honestly, you know, that said, if we're not dropping you off exactly where you wanted to be dropped off,

Speaker 1

对。

Yep.

Speaker 1

或者,我们没有给你一个良好的交互界面,是的。

Or, you know, we don't give you kind of a good interface Yeah.

Speaker 1

来告诉我们。

To tell us.

Speaker 1

这是我们的责任。

That's on us.

Speaker 1

对吧?

Right?

Speaker 1

只是,如何把它做得更好。

It's just, you know, how to make it better.

Speaker 0

是的。

Yeah.

Speaker 0

对吧?

Right?

Speaker 0

感觉下车这个环节实际上是自动驾驶旅程中相当微妙的一部分。

It feels like the drop off is actually a pretty nuanced part of the the self driving journey.

Speaker 0

比如高速路段,还有,对。

Like, the the highway stuff and the Yeah.

Speaker 0

你知道,35英里每小时的道路上,这些都已经做得很好了,但下车体验中确实有很多细微之处。

You know, the 35 mile an hour roads, that is all nailed, but there's just like a lot of nuance in the drop off experience.

Speaker 1

我觉得这些都很困难。

I'd say they're all hard.

Speaker 1

你提到了高速公路,也提到了下车,它们难的原因各不相同,对吧?

You picked freeways and you picked drop-offs; they're hard for different reasons, right?

Speaker 1

关于下车,你说得完全对。

For drop-offs, you're absolutely right.

Speaker 1

确实有一些可能并不明显的事情。

There are, you know, a few things that are maybe not obvious.

Speaker 1

你知道,你只是在思考这个问题。

You know, you just think about this problem.

Speaker 1

但关键是理解你想要去哪里,没错。

But it's understanding where you wanna go Yep.

Speaker 1

并尽可能让你方便。

And making it as convenient as possible for you.

Speaker 1

而上车和下车在这一点上并不完全对称。

And pickups versus drop-offs, it's not exactly symmetric on that.

Speaker 1

但还要理解当时所处的情境,你知道的,你该在哪里停车?

But then it's also understanding the context of the situation: you know, where do you stop?

Speaker 1

是的。

Yes.

Speaker 1

你不想堵住车道。

You don't wanna block a driveway.

Speaker 1

也不想,你知道的,双排停车。

Don't wanna, you know, double park.

Speaker 1

但在某些情况下,如果只是短暂停留,也许是可以接受的。

Although in some cases where if it's just a quick one, maybe it's okay.

Speaker 1

因此,要做好这一点,让乘客的体验顺畅、没有摩擦,其中涉及很多细节,是的。

So there's a lot of nuance that goes into doing that well so that it's a smooth, frictionless experience for the rider Yes.

Speaker 1

以及其他人的体验。

As well as other folks.

Speaker 1

是的。

Yeah.

Speaker 1

高速公路在大多数时候

Freeways, for most of the time

Speaker 0

是的。

Yes.

Speaker 1

它们通常没什么特别的情况发生。

They're very you know, not much happens.

Speaker 1

嗯。

Mhmm.

Speaker 1

它们的结构非常完善,是的。

They're very well structured Yep.

Speaker 1

因为我们就是按这种方式设计的。

Because we designed them that way.

Speaker 1

但仍然存在大量非常复杂的情况发生,是的。

But there is still that long tail of really complicated stuff that happens Yes.

Speaker 1

当发生糟糕事件时,其后果

Where the consequences of, you know, a bad event

Speaker 0

要严重得多。

Much more severe.

Speaker 1

后果要严重得多,对吧?

Are much more severe, right?

Speaker 1

速度要高得多。

Speed is much higher.

Speaker 1

所有事情都与速度呈二次方关系。

Everything is, you know, quadratic in speed.
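
To put rough numbers on the "quadratic in speed" remark: kinetic energy and idealized braking distance both grow with the square of speed, so doubling speed roughly quadruples both. The sketch below is only an illustration; the vehicle mass and the deceleration are assumed values, not figures from the conversation.

```python
MPH_TO_MPS = 0.44704  # exact conversion: 1 mile = 1609.344 m, 1 hour = 3600 s


def kinetic_energy_joules(mass_kg: float, speed_mps: float) -> float:
    """Kinetic energy E = 1/2 * m * v^2."""
    return 0.5 * mass_kg * speed_mps ** 2


def braking_distance_m(speed_mps: float, decel_mps2: float) -> float:
    """Stopping distance at constant deceleration a: d = v^2 / (2 * a)."""
    return speed_mps ** 2 / (2.0 * decel_mps2)


if __name__ == "__main__":
    mass_kg = 2000.0    # assumed vehicle mass
    decel_mps2 = 7.0    # assumed hard-braking deceleration
    for mph in (35, 70):
        v = mph * MPH_TO_MPS
        energy_kj = kinetic_energy_joules(mass_kg, v) / 1000.0
        distance = braking_distance_m(v, decel_mps2)
        print(f"{mph} mph: {energy_kj:.0f} kJ, {distance:.1f} m to stop")
```

Reaction time is ignored here; it adds a term that is linear in speed, so real stopping distances grow somewhat less steeply than a pure square law.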

Speaker 1

所以你可能会在那里看到很多东西。

So it but you may see a lot of stuff there.

Speaker 1

想象一下烤架掉落在高速公路上的情景。

You imagine grills falling off of freeways.

Speaker 1

想象一下,人们发生事故,车辆失控打转。

Imagine, you know, people getting into accidents and kind of spinning out of control.

Speaker 1

你可能会看到一辆

You see one of

Speaker 0

那种平板卡车,上面堆满了东西,你跟在它后面开车?

those flatbed trucks with just, like, a bunch of stuff piled in it, and you're driving behind it?

Speaker 0

我知道。

I know.

Speaker 0

我总觉得这非常让人紧张。

I always find it very nerve racking.

Speaker 0

看起来有点

Looks a bit

Speaker 1

我知道。

I know.

Speaker 1

对。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

而且我们确实见过它们留下一长串痕迹。

And we're, like, we've seen them, you know, leave a trail.

Speaker 0

是的。

Yes.

Speaker 0

是的。

Yes.

Speaker 0

是的。

Yeah.

Speaker 0

好的。

Okay.

Speaker 0

所以这是另一组问题。

So it's a different set of problems.

Speaker 0

但我觉得,大家对Waymo的普遍看法是,你们已经基本解决了驾驶问题,现在主要是在扩大规模,以及应对一些极端情况,比如大雪天气。

But it feels I feel like the general sentiment with Waymo is that the driving has mostly now been solved by you guys, and it's kind of a question of scaling up and maybe some super long tail stuff, really snowy conditions.

Speaker 1

就像。

Like,

Speaker 0

这是你们内部的判断吗?还是说实际情况比这复杂得多?

is that your sense internally, or is there actually much more nuance to it than that?

Speaker 1

我会说,没错,听起来像是我们已经完成工程阶段了。

I would say the yeah, it sounds like, you know, we're done with engineering.

Speaker 1

是的。

Yeah.

Speaker 1

我认为我们已经明显从科学研究和核心技术开发阶段,进入了加速全球规模化与部署的新阶段。

I would say that we've clearly moved past the stage of scientific research and kind of deep core technology development to this new phase of accelerated global scaling and deployment.

Speaker 1

是的。

Yes.

Speaker 1

所以,你知道,我们仍然有工作要做。

So, you know, we still have work to do.

Speaker 1

对。

Yeah.

Speaker 1

对吧?

Right?

Speaker 1

但我今天看不到核心科技有任何限制或缺口。

But I don't see today any limitations or any gaps in the core technology.

Speaker 0

驾驶技术现在已经足够好了吗?

The driving is good enough now?

Speaker 1

嗯,核心科技已经足够好了,我想不出任何驾驶场景是基础技术无法支持的。

Well, the core technology, I think, is good enough that I can't, you know, think of any, you know, aspect of driving that is not supported by the fundamental technology.

Speaker 1

不过话说回来,在能够负责任地部署之前,我们还有很多在专业化和验证方面的工作要做。

Now, that said, there is a lot of work to do in specialization and in validation before, you know, we can deploy responsibly.

Speaker 1

对吧?

Right?

Speaker 1

我们并不是在全世界各地都开车。

We're not driving everywhere in the world.

Speaker 1

你知道,我们计划今年在伦敦和东京开始运营。

You know, we are planning to start operating in London and in Tokyo this year.

Speaker 1

而且,你知道,我们能把你今天在旧金山使用的这套驾驶系统直接搬到伦敦去吗?不能。

And, you know, do we have a driver, the one you're using today in San Francisco, that we can just plop down in London? No.

Speaker 0

是的。

Yeah.

Speaker 0

对吧?

Right?

Speaker 1

但从核心技术是否到位的角度来看,我们看到的情况非常令人鼓舞。

But what we're seeing is incredibly encouraging from the perspective of like, is the core technology there?

Speaker 1

所以,现在的问题是收集数据,进行一些专门化和验证。

So, now it's a matter of collecting the data, doing some specialization and validation.

Speaker 1

这两个地方的交通标志不同,人们靠另一侧驾驶,但你知道,这对计算机来说其实并不难。

Signs are different in both of those places, people drive on the other side of the road, but, you know, that's actually not that hard for computers.

Speaker 1

对吧?

Right?

Speaker 1

核心技术的泛化能力非常好,但仍然需要做大量工作。

And core technology generalizes really well, but it's still work that you have to do.

Speaker 0

泛化效果最差的是什么?

What generalizes least well?

Speaker 0

我们越来越发现,尤其是

Increasingly, we're finding, especially,

Speaker 1

你知道,现在我们能够将Waymo的AI与数字世界中的AI、视觉语言模型连接起来,从而继承视觉语言模型的通用世界知识。

you know, now that we're able to kind of hook the Waymo AI to the AI in the digital world, the VLMs, and kind of inherit the general world knowledge from VLMs.

Speaker 1

由于我们引入了这种通用知识,我们在零样本或小样本学习方面取得了非常显著的成果。

We're seeing really strong results from, like, zero shot or, you know, few shot learning because of that general knowledge that we bring in.

Speaker 1

但有一些因素,比如天气,特别是寒冷的冬季天气,会影响整个系统,对吧?

But there are a few things like, let's say, weather, cold winter weather, where it affects the entire stack, right?

Speaker 1

所以,这不仅仅是AI的问题,我们实际上还需要。

So, it's not just, you know, the AI, we actually have to.

Speaker 1

硬件,是的。

The hardware, yeah.

Speaker 1

你需要硬件,需要合适的清洁溶液、加热元件,然后还要考虑那些计算机完全能解决的问题,比如运动控制和湿滑路面,这些都需要大量工作。

You need the hardware, you need to have the proper cleaning solution, heating elements in it, and then you think about things that are completely solvable by computers like motion control and slippery surfaces, So, that takes a bunch of work.

Speaker 1

你不能仅仅通过引入一些VLM就免费获得这些能力,是的。

You don't get that for free from just, you know, pulling in some, you know, VLM Yes.

Speaker 0

解码器。

Decoder.

Speaker 0

早期的情况是,我的印象——虽然我对这领域一无所知——是,在早期市场,比如旧金山或凤凰城,可能做了大量针对当地的具体工作,无论是地图还是其他方面,而你们似乎是通过通用化解决了这个问题,还是只是提升了针对每个城市开展工作的能力?

Was it the case I mean, my impression, not knowing anything, is that in the early days, there was maybe a lot of San Francisco specific work or Phoenix specific work in the early markets, whether it be mapping or something else, and that you guys seem to either have solved that in generalizing it or just scaled up your ability to do the city specific work?

Speaker 0

是什么促成了这种快速扩展到新城市的能力?

What enabled the kind of the rapid city expansion?

Speaker 1

我们通常把Waymo驾驶系统的功能和部署能力,主要不是直接看作城市或邮政编码层面的问题。

We usually think about the capability of the Waymo Driver, as well as deployment, not primarily and directly in that space of cities or zip codes.

Speaker 1

我们考虑的是运行域。

We think about the operating domain.

Speaker 1

然后还有就是

Then there's just the

Speaker 0

高速公路和寒冷天气全都

Freeways and cold weather all

Speaker 1

一直如此。

the time.

Speaker 1

高速公路、寒冷天气、雪、雨、雾、密度等等。

Freeways, cold weather, snow, rain, fog, density, etcetera, etcetera.

Speaker 1

而这些正是我们正在构建的,也是我们进行评估的地方,然后这些会对应到某个具体城市,无论是在操作领域之内还是之外。

And then that's what we are building, that's where we're evaluating, and then that maps to a city, like particular city, be it within the operating domain or outside of it.
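
The idea of building and evaluating against an operating domain, and only then mapping it onto a city, can be pictured as a set-containment check. This is a sketch under stated assumptions: the condition names and the two example cities below are hypothetical and are not Waymo's actual taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    """A hypothetical city described by the driving conditions it requires."""
    name: str
    required_conditions: frozenset


def missing_conditions(validated: frozenset, city: City) -> frozenset:
    """Conditions the city needs that are not yet in the validated domain."""
    return city.required_conditions - validated


def in_operating_domain(validated: frozenset, city: City) -> bool:
    """A city is inside the domain when nothing it requires is missing."""
    return not missing_conditions(validated, city)


# Hypothetical validated operating domain.
VALIDATED = frozenset({"freeways", "rain", "fog", "dense_urban"})

# Hypothetical cities, named generically on purpose.
SUNBELT = City("example-sunbelt-city", frozenset({"freeways", "dense_urban"}))
SNOWBELT = City("example-snowbelt-city", frozenset({"freeways", "snow", "cold_weather"}))

if __name__ == "__main__":
    for city in (SUNBELT, SNOWBELT):
        gaps = sorted(missing_conditions(VALIDATED, city))
        status = "inside" if in_operating_domain(VALIDATED, city) else "outside"
        print(f"{city.name}: {status} the operating domain, gaps = {gaps}")
```

Framing the problem this way makes the unit of work a capability (snow, freeways, density) rather than a city, which matches how the expansion is described in the conversation.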

Speaker 1

如果我们稍微回顾一下历史,我们首次提供完全自动驾驶商业服务的初始部署是在2020年的亚利桑那州钱德勒。

If we can rewind history a little bit, our initial deployment, where we started offering a fully autonomous commercial service for the first time, was in 2020 in Chandler, Arizona.

Speaker 1

而那是在我们所称的第四代Waymo Driver(驾驶系统)上实现的。

So, and that was on what we called the fourth generation of the Waymo Driver.

Speaker 1

如果你还记得的话,那是配备了不同硬件和软件的Pacifica小型厢式车。

This was the, if you remember, the Pacifica minivans with, you know, different hardware, different software.

Speaker 1

那时,我们非常专注于从头到尾完整地实现整个系统。

There, you know, we were super focused on kind of doing the whole thing end to end.

Speaker 1

嗯。

Mhmm.

Speaker 1

你知道,学习如何构建自动驾驶系统、评估它、定期部署、24/7与客户一起运营、从客户那里学习,然后我们非常专注于以钱德勒为主的这个运营区域,那是一个中等、低复杂度的区域。

You know, learn how to build the driver, evaluate it, deploy regularly, operate it end to end 24/7 with customers, learn from the customers, and then we're very focused on that operating domain of, you know, mostly Chandler, which is a medium, low complexity one.

Speaker 1

当我们跃升到系统的第五代时,也就是如今搭载在I-PACE上的这一代,我们真的想大幅拓展这个运行域。

Then when we made the jump to the fifth generation of our system, this is, you know, what's on the I-PACEs today, we really wanted to take a huge bite out of that operating domain.

Speaker 1

我们在美国各地、各个州、不同城市都收集了数据。

And we collected data all over the United States, all different states, different cities.

Speaker 1

当我们选择在旧金山和凤凰城最复杂的区域部署时,我们在硬件上做出了重大升级,更重要的是在软件和人工智能方面。

When we chose to deploy in the hardest parts of San Francisco, hardest parts of Phoenix, we made a big jump on the hardware side and most importantly on the software, the AI side.

Speaker 1

我认为这是一次巨大的跃迁。

And I would say that was the big discontinuous jump.

Speaker 1

这就是你们现在看到的,当我们扩大规模并迭代了构建和部署自动驾驶系统的所有方面之后,现在你们看到我们在美国多地并行扩展的原因。

And that's what you're seeing now after we've scaled up and iterated on the, you know, all of the aspects of building and deploying the driver, this is now why you're seeing us kind of, you know, go in parallel and scaling, you know, in the US and

Speaker 0

所以第五代自动驾驶系统比第四代更具通用性吗?

So driver version five was just a much more generalizable stack than version four?

Speaker 0

是因为它在更广泛的数据集上进行了训练吗?

And what was it about it that was it just that it had been trained on a much wider dataset?

Speaker 1

当我们对人工智能做出这一重大投入时,情况就变了。

It it was when we made this big bet on AI.

Speaker 1

是的。

Yeah.

Speaker 1

我认为第四代系统中包含更多小型的人工智能模型和机器学习模型。

That was I think there was a lot more, little AI models and ML models in the fourth generation.

Speaker 0

明白了。

Got it.

Speaker 1

我们做出了更大胆的投入和跨越,朝着……

We made a much bigger bet and jump to kind of

Speaker 0

AI作为第五代的骨干。

AI as the backbone for the fifth generation.

Speaker 0

AI是核心引擎,就像你所说的,第四代有很多小型的AI子系统用于

AI is the backbone as the core engine, as in you're saying that Gen four had lots of small little AI subsystems for

Speaker 1

好的。

okay.

Speaker 1

是的。

Yeah.

Speaker 1

对。

Yeah.

Speaker 1

我们已经完成了这一跃迁,此后一直在迭代和改进模型。

And that's been so we've kind of made that jump, and we've been, you know, iterating and improving the model since then.

Speaker 0

正如我们看到Waymo推出广泛自动驾驶时,它对整个系统产生了二次影响。

As we're seeing with Waymo rolling out widespread autonomy, it has second order changes on the entire system.

Speaker 0

在这种情况下,交通模式、其他驾驶员的行为,或者最终城市布局的变化。

In this case, traffic patterns or other drivers' behavior or eventually how cities are laid out.

Speaker 0

自主系统正在多个领域兴起。

And autonomous systems are coming in many domains.

Speaker 0

在商业领域,很快代理将无需人工干预即可进行交易。

In commerce, soon agents are going to be transacting without human intervention.

Speaker 0

我们基本上正在迎来无人值守的商业时代。

We're basically getting driverless commerce.

Speaker 0

Stripe正在为人工智能构建经济基础设施。

And Stripe is building the economic infrastructure for AI.

Speaker 0

作为其中的一部分,我们允许支付由人类或代理发起。

And as part of that, we're letting payments be initiated by humans or by agents.

Speaker 0

因此,如果你想向代理销售产品,或让你的代理在全网范围内花钱,请查看Stripe的代理商业套件。

So if you wanna sell to agents or if you wanna let your agents spend money all around the web, check out Stripe's Agentic Commerce Suite.

Speaker 0

我们能聊一下硬件吗?

Can we talk about hardware a second?

Speaker 0

有很多关于硬件的问题。

So lots of hardware questions.

Speaker 0

是的。

Yeah.

Speaker 0

但也许这个领域里的每个人都有一个非常有魅力的、定制的自动驾驶车辆演示。

But one is maybe everyone in this space has a very charismatic demo of a vehicle that is custom made Yeah.

Speaker 0

用于自动驾驶。

For self driving.

Speaker 0

所以,通常就是那种没有方向盘、座椅双向对坐的面包车。

And so, you know, it's often the van with the, you know, no steering wheel, seats facing in both directions.

Speaker 0

你们也有这样的车。

You know, you guys have one.

Speaker 0

特斯拉有无方向盘的Cybercab。

Tesla has the steering-wheel-less Cybercab.

Speaker 0

Cruise有Cruise Origin。

You know, Cruise have the Cruise Origin.

Speaker 0

而我们仍然在驾驶捷豹汽车,它们有前向方向盘,和普通家用轿车很相似。

And, we're still driving in Jaguars that have a steering wheel in the front and are pretty similar to consumer cars.

Speaker 0

这让我觉得很有意思,因为如果十年前我们讨论这个问题,可能会说,嗯,开发一辆定制汽车相对简单直接。

And it's interesting to me because, you know, if we were talking about this ten years ago, we might say, well, yeah, developing a custom car, like, that's relatively straightforward.

Speaker 0

我们知道如何在新车上安装一堆传感器,但软件需要很长时间才能完善。

We know how to put a bunch of sensors on a new car, but the software will take a long time.

Speaker 0

有趣的是,我们在软件方面取得了巨大进展,但令人惊讶的是,这些汽车仍然是人们日常驾驶车型的衍生品。

And what's interesting is we've made huge progress in the software, but interestingly, the cars are still derivatives of, you know, cars that people are driving.

Speaker 0

所以我想知道,为什么你觉得截至2026年,定制硬件还没有实现?

And so I'm curious why you just think the custom hardware has not happened as of 2026.

Speaker 0

这显然相比Waymo的重大突破只是个小改进,但令人好奇的是,它至今仍未发生。

It's obviously it's it's a small improvement compared to, you know, Waymo is the big improvement, but it's just interesting that it still hasn't happened.

Speaker 1

好吧,我们已经推出了第六代车辆,嗯。

Well, let's say our sixth generation of the vehicle Mhmm.

Speaker 1

第六代车辆和驾驶系统,就是我们在这方面的版本。

And the driver is our version of that.

Speaker 0

哦,不是的。

Oh, no.

Speaker 1

这是Ojai平台。

It's the Ojai platform.

Speaker 1

所以,你知道,它仍然保留了……是的。

So that is you know, it still has the Yes.

Speaker 1

我们可以讨论一下,比如你是否希望座椅朝后。

You know, we can talk about, you know, whether you want to have the seats Sure.

Speaker 1

朝后还是不朝后。

Pointed backwards or not.

Speaker 1

实际上,你知道,我觉得它在演示中看起来很不错,但从实际角度来看,是的。

Actually, you know, I think it's you know, it looks nice in a demo, but practically speaking Yes.

Speaker 1

也许这不是正确的方向。

Maybe not the way to go.

Speaker 1

但这确实是一款定制设计的,是的。

But that is it is a custom designed Yes.

Speaker 1

车辆,我们花了很多心思去思考如何摆脱那种设计方式,是的。

Vehicle, and it is we put a lot of thought into, you know, moving away from a car that's designed Yeah.

Speaker 1

围绕驾驶员设计。

Around the driver Yes.

Speaker 1

围绕乘客设计的汽车。

To a car that's designed around Yes.

Speaker 1

乘客。

Passenger.

Speaker 1

而且它更加宽敞,虽然……但这种情况正在发生。

And it's, you know, much more spacious, like but and it's it's happening.

Speaker 1

你知道,我们还没向公众开放。

It's you know, we're not it's not open to the the public yet.

Speaker 1

但前几天我乘坐了一次,全程自动驾驶,今年就会推出。

But, you know, I took a ride in it the other day fully autonomously, and that's coming this year.

Speaker 0

是的。

Yes.

Speaker 0

作为乘客体验,它好在哪里?

How much better is it as a passenger experience?

Speaker 1

你试过之后就会告诉我了。

You'll tell me once you give it a try.

Speaker 1

我非常喜欢。

I love it.

Speaker 1

好的。

Okay.

Speaker 1

它就是,嗯,

It's so it's yeah.

Speaker 1

一切都在于空间

It's all about the space

Speaker 0

是的。

Yeah.

Speaker 1

还有上下车的便利性,以及面向乘客的屏幕和界面。

And the convenience of, you know, ingress and egress and the the screens and the interface for the passenger.

Speaker 1

所以我们为它的每一个细节都倾注了大量心思。

So we put a lot of thought into every aspect of it.

Speaker 1

是的。

Yes.

Speaker 1

它有滑动门。

It has sliding doors.

Speaker 1

进去非常方便。

It's very easy to get in.

Speaker 1

它有一个平坦的地板。

It has a flat floor.

Speaker 1

是的,没错。

It is yeah.

Speaker 1

如果你坐在后排,可以完全伸展开,那里空间很大。

If you sit on the back, you can like fully stretch out, and there's so much space there.

Speaker 1

而且从外面看,它看起来相当大。

And it it looks you know, from the outside, you know, it looks fairly big.

Speaker 0

是的。

Yes.

Speaker 0

是吧?

Right?

Speaker 1

但它实际的占地面积呢

But the actual footprint of that is

Speaker 0

它没多大。

It's not bigger.

Speaker 1

只比I-PACE大了一丢丢,真的就一丢丢。

Barely barely, barely larger than the I-PACE.

Speaker 0

嗯哼。

Mhmm.

Speaker 1

所以这点挺厉害的,对。

So it's kind of amazing that Yeah.

Speaker 1

你知道吗,你坐进车里的时候,感觉就像待在客厅里一样。

You know, you walk in, just it feels like you're in the living room.

Speaker 0

是啊。

Yes.

Speaker 0

我想我的问题是,Waymo每年大约有2500万次乘车服务,都是用捷豹I-PACE提供的。

I guess my question is just, you know, Waymo does, you know, 25,000,000 rides a year run rate ish with the Jaguar I-PACE.

Speaker 0

有趣的是,到目前为止,自动驾驶的规模化主要发生在这些老旧的、可能是改装的车辆上,这或许并不意外。

And it's interesting that so much scaling has happened with self driving so far on the old, you know, retrofit maybe that's to be expected.

Speaker 1

嗯,这符合……我觉得这并不是理所当然的。

Well, it matches the high I don't think, you know, it's a given.

Speaker 1

你说得对。

You're you're right.

Speaker 1

我觉得,但如果你想想它的价值主张,对吧?

I think I But if you think about the value proposition, right?

Speaker 1

当然,安全性是其中一点。

Of course, there is the safety of it.

Speaker 1

是的。

Yep.

Speaker 1

你不需要为此操心。

You don't have to worry about it.

Speaker 1

还有隐私问题。

There's also the privacy.

Speaker 1

是的。

Yep.

Speaker 1

一个人在车里,或者和其他人一起,但不必与另一个人共享空间,对吧?

Being in the car by yourself, maybe with other folks, but not having to share the space with another human, right?

Speaker 0

不,Waymo 是个很棒的产品,好。

No, Waymo is great product, yeah.

Speaker 1

对。

Yeah.

Speaker 1

但我猜这正是我们看到车辆行驶如此稳定、非常可预测的原因。

But I guess this is why we're seeing such consistency in the car, drives well, very predictable.

Speaker 1

而且你能超越这一点,对吧?

And you can go beyond that, right?

Speaker 1

你还能进一步专精,让体验更加神奇,是的。

And you specialize even more to make the experience even more magical Yes.

Speaker 1

围绕乘客。

Around the rider.

Speaker 1

但我猜,如果没有这种专用车辆,那会令人失望,我觉得我会很惊讶,嗯。

But I guess it's you know, it would have been disappointing if, you know, without the specialized car and I I think I would have been surprised Mhmm.

Speaker 1

如果我们停滞在某个更低的水平上,你知道,是的。

If we leveled off, you know, at some other much lower Yes.

Speaker 1

客户采纳率会更低,因为,是的,一辆车看起来更像是优化改进,但价值主张的核心来自其他因素。

Level of customer adoption because, yeah, a car seems like, you know, more of an optimization improvement, but the core of the value proposition comes from those other factors.

Speaker 0

是的。

Yes.

Speaker 0

对。

Yes.

Speaker 0

我想,我们最好一次只承担一个风险。

I guess it's just: take risk on one thing at a time.

Speaker 0

我们会先从软件层开始,然后再开发专用的汽车或类似的东西。

We'll start by doing the software layer, then we'll build a specialized car or something like that.

Speaker 1

没错。

That's right.

Speaker 1

没错。

That's right.

Speaker 1

是的。

Yeah.

Speaker 1

它还

It's also

Speaker 0

嗯,我

well, I

Speaker 1

我的意思是,正如你所说,这是一笔大投资。

mean, as you said, it's a big investment.

Speaker 1

是的。

Yes.

Speaker 1

所以你必须先降低核心风险。

So you have to, like, de-risk the fundamentals.

Speaker 1

是的。

Yes.

Speaker 1

而且,你知道,在我们的历史上,我们一直专注于为公司设定最大的目标,以降低最重要的风险,对吧?

And, you know, throughout our history, we were very focused on setting the most, you know, the biggest goal for the company to derisk the most important questions, right?

Speaker 1

我们谈到了第三代,当时我们希望部署一个端到端的系统。

We talked about, you know, the third generation where, you know, we wanted to deploy something and go end to end.

Speaker 1

我们讨论了第四代的目标,哦,抱歉,是第五代。

We talked about what was the goal with the fourth generation, then oh, sorry, the fifth generation.

Speaker 1

没错。

Yep.

Speaker 1

然后是第六代。

And then there's the sixth generation.

Speaker 1

对吧?

Right?

Speaker 1

所以,正是在第六代时,才值得投入这么多精力去开发定制

So, it was the sixth generation where it made sense to go out and spend all this, you know, effort into the custom

Speaker 0

第六代既是定制车辆。

And sixth generation is both the custom vehicle.

Speaker 0

它也是驾驶栈的新一代吗?

Is it also a new generation of the the driving stack?

Speaker 1

是的。

Yeah.

Speaker 1

它是新的硬件。

It is the new hardware.

Speaker 0

对。

Yep.

Speaker 1

传感器,你知道的,硬件,自动驾驶硬件,它们都用在了Ojai车辆上,是的。

The sensors, you know, the hardware, the self driving hardware that we're putting on the the Ojai vehicle Yeah.

Speaker 1

这就是第六代。

Is the sixth generation.

Speaker 1

对。

Yep.

Speaker 1

它与第五代非常不同。

It is very different from the fifth generation.

Speaker 1

它更简单。

It is simpler.

Speaker 1

它更强大。

It is more capable.

Speaker 1

成本低得多。

It is much lower cost.

Speaker 1

它的成本与如今高端驾驶辅助系统(ADAS)相当。

It's at a cost that's comparable to what you would get with, like, a fancy ADAS system nowadays, the driver-assist system.

Speaker 1

是的。

Yeah.

Speaker 1

软件基本上是一样的。

The software is pretty much the same.

Speaker 1

嗯。

Mhmm.

Speaker 1

所以当我们谈论Waymo的泛化能力时,这是另一个方面。

So that's another so when we talk about generalizability of the Waymo

Speaker 0

Driver(驾驶系统),是的。

Driver Yes.

Speaker 1

是的。

Yes.

Speaker 1

你知道,我们谈论天气条件。

You know, we talk about weather conditions.

Speaker 1

是的。

Yes.

Speaker 1

我们谈论城市,但它在不同的车辆平台和不同的传感器配置上也能很好地泛化,是的。

Talk about cities, but it also generalizes well to different vehicle platforms and different Yes.

Speaker 1

传感器配置。

Sensor configurations.

Speaker 0

好的。

Okay.

Speaker 0

所以第六代是全新的车辆和全新的传感器套件,但软件相近,这几乎是一个tick-tock式的迭代周期。

So Gen six is a new vehicle and a new sensor stack, but a similar it's almost a tick-tock cycle happening here.

Speaker 0

软件是相近的。

It's similar software.

Speaker 1

没错。

That's right.

Speaker 1

没错。

That's right.

Speaker 1

然后我们会把第六代Waymo自动驾驶系统投入使用,对。

And then we're gonna put the sixth generation Waymo Driver Yep.

Speaker 1

部署到其他车型平台上,比如今年晚些时候推出的现代艾尼氪。

On other vehicle platforms like the Hyundai Ioniq that's coming later in the year.

Speaker 0

第六代的硬件组有什么不同之处?你们又是怎么把成本降下来的?

What is different about the sixth generation hardware stack and how did you make it cheaper?

Speaker 1

它依然保留了原有的三种传感模态,但我们在这三个方面都做了大幅优化。

So, it still has the same three sensing modalities, but we've made significant optimizations in all three.

Speaker 1

是的

Yep.

Speaker 1

所以是统一和简化。

So unification, simplification.

Speaker 1

嗯哼。

Mhmm.

Speaker 1

而且就是,你知道的,那种

And there's just, you know, the kind of just

Speaker 0

顺着……这是不是典型的制造规模效应,我们还没有获得

riding the Is it a classic case of, you know, manufacturing scale where we're not getting a

Speaker 1

多得多。嗯,规模效应还没完全显现,但所有这些,如果你想想供应链和行业的话,是的。

lot more Well, scale hasn't fully come into place, but all of those, you know, if you think about the kind of the the supply chains, the industries Yes.

Speaker 1

对吧?

Right?

Speaker 1

摄像头技术已经相当成熟了。

Cameras is pretty mature.

Speaker 1

嗯。

Mhmm.

Speaker 1

雷达,是的。

Radars Yeah.

Speaker 1

很多年前,雷达又大又复杂,非常昂贵。

Way, you know, many years ago used to be bulky, complex, very expensive.

Speaker 1

是的。

Yep.

Speaker 1

你知道,当我们把它们装在飞机上时,后来我们开始把它们装在汽车上。

You know, when we were putting them in planes, you know, but then we started putting them on cars.

Speaker 1

现在,你只需几十美元就能买到一款不错的汽车雷达。

Now you can get a decent automotive radar for tens of dollars.

Speaker 1

汽车雷达有一个变种,叫做成像雷达,它能提供更丰富的数据。

There is a variant of the automotive radar, and it's called the imaging radar, and it gives you a richer setting.

Speaker 1

所以,这种雷达的成本也大幅下降了,但相比普通汽车雷达,它稍微落后一些。

So that is also, you know, has come down in cost drastically, but it's a little bit behind your standard automotive radars.

Speaker 1

激光雷达遵循着同样可预测、众所周知的趋势。

Lidars are following the same, very predictable, very well known trend.

Speaker 1

因此,我们正顺着这一趋势前行,同时也在从上一代技术中学习,不断进行改进、简化和优化。

So we're, you know, riding that, and we're also, you know, learning from the previous generation to just make improvements and simplifications and optimizations.

Speaker 0

所以,这是一个非常傻的问题。

So a very silly question.

Speaker 0

在自动驾驶场景中,激光雷达相比雷达有哪些优势?

What are Lidars versus radars better at in a self driving context?

Speaker 1

激光雷达

Lidar

Speaker 0

它们是互补的吗?

Are they complementary?

Speaker 1

它们非常互补。

They're very complementary.

Speaker 1

是的。

Yeah.

Speaker 1

你知道,这本质上就是不停地发射。

You know, it's all blasting, you know Effectively.

Speaker 1

就像,你知道的,向外发射光子,然后它们碰到物体反弹回来,你测量返回的信号。

Like, you know, blasting, you know, photons out there, and then they bounce off of something, they come back, you know, you measure what comes back.

Speaker 1

它们的频率非常不同。

The frequencies are very different.

Speaker 1

是的。

Yes.

Speaker 1

所以激光能提供极高的分辨率。

So laser gives you it's very, very high resolution.

Speaker 1

所以,你可以想象成一束激光射出去,然后旋转扫描。

So, you know, think of it as like a laser beam that goes out, you know, spins around.

Speaker 1

对。

Yeah.

Speaker 1

它每秒会发射数百万个激光脉冲。

It, you know, shoots out millions of these laser pulses, you know, per second.

Speaker 1

然后每一个脉冲都会返回,你可以以极高的分辨率采样世界的三维结构。

And then each one comes back and you can, you know, kind of you're kind of sampling the three d structure of the world with very high resolution.
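
The pulse-and-return description above is the time-of-flight principle. As a minimal sketch, assuming an idealized sensor that reports a round-trip time plus the beam's azimuth and elevation (the function names and angle conventions are illustrative, not any vendor's API), each return becomes one 3D point:

```python
import math

SPEED_OF_LIGHT_MPS = 299_792_458.0  # metres per second, exact by definition


def range_from_round_trip(round_trip_s: float) -> float:
    """The pulse travels out and back, so the range is half the path length."""
    return SPEED_OF_LIGHT_MPS * round_trip_s / 2.0


def to_cartesian(range_m: float, azimuth_rad: float, elevation_rad: float):
    """Convert one return (range plus beam angles) into an (x, y, z) point."""
    horizontal = range_m * math.cos(elevation_rad)
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = range_m * math.sin(elevation_rad)
    return (x, y, z)


if __name__ == "__main__":
    # A return arriving 200 nanoseconds after the pulse left is about 30 m away.
    r = range_from_round_trip(200e-9)
    print(f"range: {r:.2f} m")
    print("point:", to_cartesian(r, math.radians(90.0), 0.0))
```

Repeating this for millions of pulses per second, each at a slightly different angle, is what yields the dense point cloud described here.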

Speaker 0

用于精细测绘的激光雷达。

The Lidar for very fine grained mapping.

Speaker 0

没错。

That's right.

Speaker 1

雷达的分辨率低得多,但由于其物理特性,它在恶劣天气条件下的性能下降要小得多。

Radar has much lower resolution, but because of, you know, the physics of it, it degrades much better in adverse weather conditions.

Speaker 1

比如雾、雪、大雨。

So fog, snow, heavy rain.

Speaker 1

所以,它

So, it's

Speaker 0

不会受到阻碍,没错。

not going to be impeded by Exactly.

Speaker 0

它和目标之间的颗粒物

Particles between it and the

Speaker 1

想象一下在浓雾中开车。

So, imagine driving in super dense fog.

Speaker 1

是的。

Yes.

Speaker 1

我们靠近旧金山,所以可能不需要多费脑筋去想。

We're close to San Francisco, so probably don't have to think that hard.

Speaker 1

视线会变得非常差。

It can be really hard to see.

Speaker 1

摄像头的性能会下降。

So cameras degrade.

Speaker 1

对。

Yes.

Speaker 1

激光雷达,取决于颗粒物的大小,其性能可能比摄像头更好或更差。

Laser, you know, depending on kind of the size of the particulates can degrade better or worse than camera.

Speaker 1

雷达受影响较小。

Radar is not affected as much.

Speaker 1

所以,你可以想象在高速公路上开车,雷达能为你提供非常清晰的回波,那些车辆在摄像头视野中是完全看不见的。

So, you know, you can imagine driving on a freeway, then radar will give you really good returns for, you know, cars that are absolutely, you know, invisible in the, you know, in the camera space.

Speaker 0

哦,这很有趣。

Oh, that's interesting.

Speaker 0

那么,这意味着在某些环境中,你会更依赖雷达吗?

So so does that mean there are some environments where you'll be relying significantly more on radar?

Speaker 1

但性能是

It's But the performance is

Speaker 0

足够好。

good enough.

Speaker 1

所以,这是多种传感器的结合。

Well, so it's it's a combination of the sensors.

Speaker 1

对吧?

Right?

Speaker 1

所以我们依赖每一个传感器,但每个都有噪声。

So we we rely on you know, each one is noisy.

Speaker 1

对吧?

Right?

Speaker 1

不同环境中噪声的特性表现不同,但并不是说我们会在这几种传感器之间切换。

How the noise characteristic show up in different environments is different, but it is I mean, it's not like we switch from one to another.

Speaker 1

我们并不是先通过摄像头、雷达和激光雷达分别估计世界的状态,然后再进行比较。

It's not like we, you know, estimate what's happening with the world through cameras and through radars and through lidar, and then we compare.

Speaker 1

不是的。

No.

Speaker 1

更像是:激光雷达有一个编码器,雷达有一个编码器,还有……它们全部输入到一个系统中,共同给出对世界状态的最佳综合判断。

They're like, there's an encoder for lidar, there's an encoder for radar, there's a... and they all go into the, you know, the system that gives you jointly the best view of what's happening in the world.
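
The system described here uses learned encoders feeding one joint model, which is not reproduced below. As a much simpler classical stand-in for the same intuition (every sensor is noisy, the noise depends on conditions, and nothing is switched off), here is a sketch of inverse-variance weighting. The range estimates and variances are made-up illustrative numbers.

```python
def fuse(estimates):
    """Combine (value, variance) pairs by inverse-variance weighting.

    Noisier sensors get smaller weights but are never discarded outright.
    Returns (fused_value, fused_variance).
    """
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    fused = sum(w * value for w, (value, _) in zip(weights, estimates)) / total
    return fused, 1.0 / total


if __name__ == "__main__":
    # Hypothetical distance to a lead car, in metres, as (estimate, variance).
    clear_day = [(40.5, 1.0), (40.0, 0.25), (41.0, 4.0)]    # camera, lidar, radar
    dense_fog = [(50.0, 100.0), (42.0, 4.0), (40.0, 1.0)]   # camera, lidar, radar
    print("clear day:", fuse(clear_day))
    print("dense fog:", fuse(dense_fog))
```

In the fog case the camera's large variance leaves it with almost no influence while radar dominates, without any explicit switch between sensors. A learned joint model can capture far richer interactions than this fixed formula.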

Speaker 1

所以,如果是在阳光明媚的好天气,摄像头就非常有价值。

So, if you were, you know, if it's a nice bright sunny day, cameras are very valuable.

Speaker 1

如果是在漆黑一片,或者阳光直射、对向车辆的车灯刺眼的情况下,摄像头的性能就会下降。

If, you know, it's pitch dark or you have like sun in your face or you're blinded by the headlights from, you know, oncoming car, then camera will degrade.

Speaker 1

还是会有一些噪声信号,但性能确实会下降。

Like, there's still some, you know, noisy signal, but it will degrade.

Speaker 1

是的。

Yes.

Speaker 1

雷达和激光雷达则完全不受影响。

And radar, like lidar, is completely unaffected.

Speaker 1

对吧?

Right?

Speaker 0

有没有哪些技术难题是你们一直执着追求或特别想解决的?

Are there technical problems that are your white whale or you're just you're still chasing or you are particularly interested in solving?

Speaker 0

即使这些问题比较小众,比如在真正下雪时实现安全驾驶,或者在旧金山的陡坡上表现完美,你们有没有哪些历史上一直感兴趣、至今仍在关注的问题?

Even if they're kinda niche for the you know, we just we really want to have, you know, driving when it's actually snowing nailed or steep hills in San Francisco or you know, are there problems you've been very interested in historically or still are?

Speaker 1

我确实很感兴趣,因为我现在对全球扩张的加速感到非常兴奋。

I am because I'm super excited right now about the accelerating global expansion.

Speaker 1

嗯。

Mhmm.

Speaker 1

美国更多城市正在加入,同时也在走向国际。

More cities in The United States and going internationally.

Speaker 1

是的。

Yes.

Speaker 1

所以,我明白。

So, I understand.

Speaker 1

我并没有回答你关于技术的问题。

I'm not answering your your question about the, you know, technology.

Speaker 1

我待会儿再回到这个问题。

I'll go back to that.

Speaker 1

但真正让我今天最兴奋的是,能够实现这样一个场景:你飞抵任何主要大都市的机场后,可以直接乘坐Waymo去任何你想去的地方。

But really, that's the thing that I'm, today most excited about, just, you know, be you know, getting to a place where any major, you know, metropolitan area you can fly into the airport and then take a Waymo and go anywhere you wanna go.

Speaker 1

就像,那确实是。

Like, that is Yes.

Speaker 1

是的。

Yes.

Speaker 1

现在对我来说简直令人激动不已。

Insanely exciting to me right now.

Speaker 1

所以,从技术上讲,我最兴奋的是人工智能领域的快速发展。

So then, you know, technically, what I'm most excited about is all of the rapid progress in AI.

Speaker 1

嗯哼。

Mhmm.

Speaker 1

还有世界模型和基础模型的研究。

And the world models, the foundational model work.

Speaker 1

这极大地提升了我们简化系统的能力。

And it is just such a massive boost to how much we can simplify the system.

Speaker 1

是的。

Yep.

Speaker 1

我们能降低多少成本,以及如何实现全球扩展。

How much we can bring down the cost and how we can scale globally.

Speaker 1

有些奇妙的事情发生了,我几年前根本没想到会这样。

And there's just some magic that happens that I don't think I would have anticipated a few years ago.

Speaker 1

因此,从技术角度来看,这让我感到无比激动。

So that I find from the technical perspective just insanely thrilling.

Speaker 0

是的。

Yes.

Speaker 0

当你谈到人工智能的进展时,这些日子最让你觉得有趣的部分是什么?

When you talk about kind of the progress in AI, what are the most fun parts of it for you these days?

Speaker 1

我认为,最令人兴奋的是看到这种方法所带来的能力和扩展规律:以基础模型为基石,然后专门化为教师模型,再进行知识蒸馏,这种做法在整体性能上带来了巨大的提升。

I think it's seeing the capability and the scaling laws from this approach of starting, you know, with that cornerstone of the foundational model and then specializing to teachers and then, you know, distilling. You just get such big wins in performance across the board.

Speaker 1

我只需要在架构上做一些改进,或者在数据或训练方法上做得更好,然后,是的,在早期阶段投入,就会产生巨大的放大效应和连锁反应。

You just need to introduce something into the architecture, or get better at data or the training recipe, and then, yeah, you invest at that early stage, and then it just has massive amplification and ripple effects.

Speaker 1

在某种程度上,这确实有点神奇。

That is, in some ways, kind of magical.
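
[Illustrative sketch, not from the conversation. The speaker describes starting from a foundation model, specializing it into teachers, and then distilling. The snippet below shows one common way a distillation objective is written: the student is trained to match the teacher's temperature-softened output distribution. The function names, the temperature value, and the use of a plain KL divergence are assumptions for illustration; nothing here is confirmed to be Waymo's training recipe.]

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    A higher temperature flattens both distributions, so the student also
    learns from the teacher's relative ranking of unlikely classes.
    """
    p = softmax(teacher_logits, temperature)  # teacher "soft targets"
    q = softmax(student_logits, temperature)  # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical logits for three classes (made-up numbers).
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]
poor_student = [0.5, 1.0, 4.0]

print(distillation_loss(teacher, good_student))  # small: student agrees
print(distillation_loss(teacher, poor_student))  # large: student disagrees
```

[The loss is zero when the student reproduces the teacher exactly and grows as the two distributions diverge. Some formulations also scale the loss by the square of the temperature; that factor is omitted here for brevity.]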

Speaker 1

然后,我想你就会在汽车上看到这种效果。

And then, I guess, you see it on the car.

Speaker 1

我曾经有过一些时刻,看到一辆车做了一些事情,然后我查看日志,感到非常惊讶。

And I've had some moments where, you know, a car does something and you look at a log, and I've been surprised.

Speaker 1

它做了一些我原本以为它做不到的事情。

Like, it does things that I didn't think it was capable of doing.

Speaker 1

对吧?

Right?

Speaker 1

所以就是这样

So it's that

Speaker 0

当你看到涌现行为时,这是一个很令人自豪的例子,是的。

When you see emergent behavior, that's kind of a One proud example, yeah.

Speaker 1

你知道,当你构建一个系统,然后你以为自己完全理解了它的运行方式,清楚它的能力和性能边界,但它却做了一些几乎像魔法一样的事情,是的。

You know, it's when you build a system and you think you understand how it works, and you understand fully the limits of its capability and performance, and then it does something kind of almost magical Yes.

Speaker 1

这让人兴奋不已。

It's exhilarating.

Speaker 1

对吧?

Right?

Speaker 1

是的。

Yes.

Speaker 1

我可以给你举一个例子,我想我在一些公开演讲中分享过相关的视频,是这样的:在旧金山发生的一个相当温和的情境,当时路口的信号灯是红灯,有横向车流,一辆公交车开过,它停了下来,部分挡住了道路。

So one example I can give you, and I think I've shared some videos of it publicly in some talks, was a situation that happened in San Francisco. You know, a fairly benign situation where, at an intersection, our light is red, there's cross traffic, a bus goes by, and, you know, it stops, partially blocking.

Speaker 1

灯变绿了,我们就开始前进。

Light turns green, so we start to go.

Speaker 1

我们绕着公交车缓慢移动,然后你看到行人出现在公交车的另一侧。

We're nudging around the bus, and then you see a pedestrian being detected on the other side of the bus.

Speaker 1

对吧?

Right?

Speaker 1

然后你的车会做出适当的反应。

And then your car responds appropriately.

Speaker 1

它减速,绕得更宽一些。

It slows down, goes a little bit wider.

Speaker 1

是的。

Yep.

Speaker 1

接着,一个行人真的从公交车后面走出来,然后我们继续前行。

And, you know, then a pedestrian actually emerges from behind the bus and, you know, we go on our way.

Speaker 1

所以第一次我看这个日志时,心想:这里到底发生了什么?

So the first time I looked at that log, I was like, what's going on here?

Speaker 1

我的意思是,我知道我们的传感器已经相当出色了,嗯。

Like, I know we have pretty darn good sensors Mhmm.

Speaker 1

而且软件也非常强大。

And the software is very capable.

Speaker 1

我们的系统并不能穿透物体。

Like, we don't see through stuff.

Speaker 0

是的。

Yeah.

Speaker 0

对。

Yeah.

Speaker 1

对吧?

Right?

Speaker 1

摄像头、激光雷达和雷达可不是这样工作的。

Like, that's not how cameras or lidars and radars work.

Speaker 1

对吧?

Right?

Speaker 0

它们没法穿透公交车。

They can't punch...

Speaker 0

穿过公交车。

Through the bus.

Speaker 1

你看到公交车另一侧的行人了。

You saw the pedestrian on the other side of the bus.

Speaker 1

是的。

Yeah.

Speaker 1

是的。

Yeah.

Speaker 1

而且并不是说,你知道的,你看着车窗,心想,好吧。

And it's not like, you know, you look at the windows, you're like, okay.

Speaker 1

你知道,雷达不应该能穿透这个巨大的金属箱体。

You know, radar shouldn't get through this massive metal box.

Speaker 1

是的。

Yeah.

Speaker 1

是的。

Yeah.

Speaker 1

你知道,看看传感器数据。

You know, look at the sensor data.

Speaker 0

是的。

Yes.

Speaker 1

而且,雷达不应该能穿透它。

And, like, radar shouldn't be able to go through it.

Speaker 1

对吧?

Right?

Speaker 1

你知道,摄像头,你没法通过摄像头看到,因为有反光,还有车上的乘客,你不可能透过窗户看到里面。

You know, with the camera, you can't see it, because there are reflections and there are people on the bus. It's not like you can see through the windows.

Speaker 1

对。

Right.

Speaker 1

那么,到底发生了什么?

So, like, what is going on?

Speaker 1

也许这是噪音或者某种巧合。

Maybe it's, you know, noise or some coincidence.

Speaker 1

我第一次看到的时候,真的不敢相信。

And, you know, the first time I saw it, I couldn't actually believe it.

Speaker 1

不,不可能。

I was like, no.

Speaker 1

实际情况是,我们的周边激光雷达的光束从公交车底下穿过,行人双脚的移动产生了一点非常微弱、噪声很大的反射,这足以让AI模型判断那里很可能有行人,也就是检测到了一个信号。

Something doesn't sound right. So what actually turned out to be happening is that our peripheral lidars bounced under the bus, and there was just a little bit of very, very noisy reflection of the movement of the person's feet, and that was enough for the AI models to say, hey, likely there's a pedestrian there. You know, I detected a signal.

Speaker 1

是的。

Yeah.

Speaker 1

而且,这些数据已经足够用来预测他们的行为。

And moreover, there's enough data there to, you know, predict what they're going to do.

Speaker 1

对。

Yes.

Speaker 1

这简直让我大吃一惊。

And it kinda blew my mind.
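
[Illustrative sketch, not from the conversation. The speaker describes very noisy lidar returns from under the bus being enough for the models to conclude a pedestrian was likely there. One simple way to see how individually weak evidence can add up is a log-odds update over several frames. The prior, the per-frame likelihood ratios, and the assumption that frames are independent are all made up for illustration; the system described in the conversation uses learned models, not this formula.]

```python
import math


def posterior_probability(prior, likelihood_ratios):
    """Combine a prior with independent pieces of evidence in log-odds space.

    Each likelihood ratio is P(observation | pedestrian) divided by
    P(observation | no pedestrian). Values above 1 support "pedestrian".
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers: a 5% prior that someone is behind the bus, and five
# consecutive frames whose faint returns are each only 2x more likely if a
# pedestrian is present.
weak_frames = [2.0] * 5
print(round(posterior_probability(0.05, weak_frames), 3))
```

[With these made-up numbers the odds go from 1:19 to 32:19, so the probability rises from 5% to 32/51, or about 63%. No single frame is convincing, but together they justify slowing down and giving the bus more room.]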

Speaker 0

这是否正是解释我们之前讨论内容的完美例子?即传感器套件中融合的价值,以及其次,构建对当前状况的中间表示——如果你只处理像素,那么躲在公交车后面的人在像素空间中是不存在的,因此你需要一个对世界存在的表示,才能推理出公交车后面的人。

Is this the perfect example to explain what we were talking about earlier: the value of, one, fusion across a sensor suite, but then secondly, I mean, relatedly, building an intermediate representation of what's going on? If you're just dealing with pixels, I mean, the person behind the bus does not exist in pixel space, and so you need to have some representation of the world to be able to reason about the person behind the bus.

Speaker 1

我认为这是一个例子,说明利用这种中间表示来提升模型各个部分的性能正在这里发生。

I think it's an example where using that intermediate representation to boost the level of performance of all parts of the model is what's happening here.

Speaker 1

想象一下,如果用一个黑箱、纯粹的开环系统来解决这个问题,嗯。

Just imagine solving this problem with a black box, you know, purely open loop Yep.

Speaker 1

模仿系统。

Imitative system.

Speaker 1

是的。

Be Yeah.

Speaker 1

很难实现。

Hard to go.

Speaker 1

这是否,你知道的,不可能?

Is it, you know, impossible?

Speaker 1

不。

No.

Speaker 1

是的。

Yeah.

Speaker 1

在实际操作中,要达到这种性能水平需要什么?

In practice, what would it take to achieve that level of performance?

Speaker 0

是的。

Yes.

Speaker 1

非常非常困难。

Very, very difficult.

Speaker 0

你能分享一下目前业务在订单量、收入和道路上车辆数量方面的具体数据吗?

What metrics can you share on just where the business is at today in terms of rides, revenues, cars on the roads?

Speaker 1

我们大约有3000辆车。

We have about 3,000 cars Mhmm.

Speaker 1

在道路上运行。

On the roads.

Speaker 1

我们完成约五十万次行程,

We're doing about half a million rides Mhmm.

Speaker 1

每周。

Per week.

Speaker 1

这相当于超过四百万英里的全自动驾驶里程,是的。

That translates to over 4,000,000 fully autonomous miles Yeah.

Speaker 1

每周。

Per week.
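
[Illustrative arithmetic, not from the conversation. Taking the three figures quoted above at face value (about 3,000 cars, about 500,000 rides per week, and over 4,000,000 fully autonomous miles per week), the snippet derives rough per-ride and per-car averages. The function name is hypothetical, and because the speaker gives approximate figures, the results are rough estimates. Autonomous miles may also include driving without a passenger, so miles per ride is not necessarily the average trip length.]

```python
def fleet_averages(cars, rides_per_week, miles_per_week):
    """Back-of-the-envelope averages from fleet-wide weekly totals."""
    return {
        "miles_per_ride": miles_per_week / rides_per_week,
        "rides_per_car_per_day": rides_per_week / cars / 7,
        "miles_per_car_per_day": miles_per_week / cars / 7,
    }


# Figures as quoted in the conversation (all approximate).
averages = fleet_averages(cars=3_000, rides_per_week=500_000, miles_per_week=4_000_000)
for name, value in averages.items():
    print(f"{name}: {value:.1f}")
```

[With these inputs the averages come out to 8.0 autonomous miles per ride, about 23.8 rides per car per day, and about 190.5 miles per car per day.]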

Speaker 1

我们在美国11个城市中以全自动驾驶模式运营。

We are operating in a fully autonomous mode in 11 cities in the US.

Speaker 1

其中10个城市有公众乘客。

And in 10 of those, we have riders, you know, public riders.

Speaker 1

那座鬼城是什么?

And What's the Ghost City?

Speaker 1

鬼城是自然形成的。

The Ghost City is natural.

Speaker 1

我们刚刚在那里开始运营。

We just started there.

Speaker 1

所以我们刚刚在一天之内向四个新城市的乘客开放了服务。

So we just opened it up to riders in four new cities in one day.

Speaker 1

这可以说是其中一个虽小但极其令人兴奋的时刻,我记得回想起这段历史,是的。

So, like, that was one of those, you know, little but super exciting moments where I thought back to the history Yes.

Speaker 1

从我们首次启动完全自动驾驶的仅乘客运营,到首次在四个城市拥有外部乘客,花了多长时间?

Like, how long did it take us from the first time we started fully autonomous rider-only operation to the first time we had external riders in four cities?

Speaker 1

大约八年。

It's about eight years.

Speaker 0

嗯。

Mhmm.

Speaker 1

然后就在上周,我们又一天之内推出了四个新城市。

And then just, you know, the other week, we launched four in one day.

Speaker 0

是的。

Yes.

Speaker 0

对。

Yes.
