本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
一个经常被反复提及的问题是:我们如何跟进最新的人工智能动态?为什么需要持续关注AI新闻?通过与用户交流,了解他们的需求与不满,分析反馈意见,才能真正大幅提升产品性能。
One question that gets asked a lot is: how do we keep up to date with the latest AI news? Why do you need to keep up to date with the latest AI news? If you talk to the users, you understand what they want and what they don't want; you look into the feedback, and then you can actually improve the application way, way
更多的情况是:许多公司在开发AI产品,但很多公司在开发过程中并不顺利。
more. A lot of companies are building AI products. A lot of companies are not having a good time building AI products.
我们正处于一场创意危机之中。现在我们拥有这些非常酷炫的工具,不必再从零开始构建一切。你可以完成自己的设计,也可以完成自己的录制。
We are in an idea crisis. Now we have all these really cool tools; you don't have to do everything from scratch. You can have your design, you can have your recording.
还可以建立自己的网站。理论上我们应该看到更多成果,但人们却似乎陷入了困境,不知道该开发什么。
You can have your website. So in theory, we should see a lot more. But people are, like, somehow stuck. They don't know what to build.
尽管AI热潮汹涌,但数据显示大多数公司尝试后收效甚微,最终放弃。你认为问题出在哪里?
With all this AI hype, the data is actually showing most companies try it, it doesn't do a lot, and they stop. What do you think is the gap here?
生产力确实难以量化。我常建议人们询问他们的经理:你更愿意为团队每位成员购买昂贵的编程助手订阅,还是增加一个编制?多数基层管理者会选择后者。但如果询问副总裁或管理多个团队的高管,他们会选择AI助手。因为作为管理者,你们仍在成长阶段。
It's really hard to measure productivity. So I ask people to ask their managers: would you rather give everyone on the team a very expensive coding agent subscription, or get an extra headcount? Almost all line managers would say headcount. But if you ask someone at the VP level, or someone who manages a lot of teams, they would say the AI assistant. Because as a manager, you are still growing.
对你而言,增加一个编制意义重大;而对高管来说,他们可能更关注业务指标。所以你实际需要思考:什么才能真正推动你的生产力指标?
So for you, having one more headcount is big. Whereas for an executive, maybe you have more business metrics that you care about. So you actually have to think about what actually drives the productivity metrics for you.
今天,我的嘉宾是Chip Huyen。与许多分享关于构建优秀AI产品见解及行业趋势的人不同,Chip已成功打造了多款AI产品、平台和工具。她曾是英伟达NeMo平台的核心开发者、Netflix的AI研究员,并在斯坦福大学教授机器学习课程。她还两次创业,并撰写了AI领域两本最受欢迎的书籍,其中最新出版的《AI工程》(AI Engineering)自发行以来一直是O'Reilly平台上阅读量最高的书籍。
Today, my guest is Chip Huyen. Unlike a lot of people who share insights into building great AI products and where things are heading, Chip has built multiple successful AI products, platforms, and tools. Chip was a core developer on NVIDIA's NeMo platform and an AI researcher at Netflix. She taught machine learning at Stanford. She's also a two-time founder and the author of two of the most popular books in the world of AI, including her most recent book, AI Engineering, which has been the most-read book on the O'Reilly platform since its launch.
她还与众多企业合作制定AI战略,因此能深入了解不同公司的实际运作情况。在我们的对话中,Chip详细解释了基础知识,比如预训练和后训练的具体流程、什么是RAG、强化学习是什么?RLHF又是什么?我们还探讨了她关于构建优秀AI产品的全部心得,包括人们的认知误区与实际所需。我们聊到了企业最常见的陷阱、她观察到的最大生产力提升领域,以及更多内容。
She's also gotten to work with a lot of enterprises on their AI strategies, and so she gets to see what's actually happening on the ground inside a lot of different companies. In our conversation, Chip explains a lot of the basics, like what exactly does pretraining and post training look like, what is RAG, what is reinforcement learning? What is RLHF? We also get into everything she's learned about how to build great AI products, including what people think it takes and what it actually takes. We talk about the most common pitfalls that companies run into, where she's seeing the most productivity gains, and so much more.
本期节目技术性很强,比我大多数对话都更专业,适合所有希望深入了解AI的听众。如果你喜欢这期播客,别忘了在你常用的播客应用或YouTube上订阅关注。若你成为我通讯的年费订阅用户,可免费获得16款卓越产品的年度使用权,包括Devin、Lovable、Replit、Bolt、n8n、Linear、Superhuman、Descript、Wispr Flow、Gamma、Perplexity、Warp、Granola、Magic Patterns、Raycast、ChatPRD和Mobbin。请访问lennysnewsletter.com并点击Product Pass。接下来,在赞助商信息后,请听Chip Huyen的分享。
This episode is quite technical, more technical than most conversations I've had, and is meant for anyone looking for a more in-depth conversation about AI. If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. And if you become an annual subscriber of my newsletter, you get a year free of 16 incredible products, including Devin, Lovable, Replit, Bolt, n8n, Linear, Superhuman, Descript, Wispr Flow, Gamma, Perplexity, Warp, Granola, Magic Patterns, Raycast, ChatPRD, and Mobbin. Head on over to lennysnewsletter.com and click Product Pass. With that, I bring you Chip Huyen after a short word from our sponsors.
本期节目由dScout赞助播出。如今设计团队既要快速行动又要精准无误,这正是dScout的价值所在。dScout是为现代产品和设计团队打造的一站式研究平台。无论你正在进行可用性测试、访谈、问卷调查还是实地调研,dScout都能帮助你快速连接真实用户并获得有效洞察。
This episode is brought to you by dScout. Design teams today are expected to move fast, but also to get it right. That's where dScout comes in. dScout is the all-in-one research platform built for modern product and design teams. Whether you're running usability tests, interviews, surveys, or in-the-wild fieldwork, dScout makes it easy to connect with real users and get real insights fast.
你甚至可以直接在平台内测试Figma原型。无需切换工具,不必担心找不到测试者。凭借行业最受信赖的用户池和AI驱动的分析功能,你的团队能在保持速度的同时获得清晰决策依据。如果你想优化研究流程、加速决策并打造有影响力的设计,请访问dscout.com了解更多。网址是dscout.com。
You can even test your Figma prototypes directly inside the platform. No juggling tools, no chasing down participants. And with the industry's most trusted panel, plus AI-powered analysis, your team gets the clarity and confidence to build better without slowing down. So if you're ready to streamline your research, speed up decisions, and design with impact, head to dscout.com to learn more. That's dscout.com.
让你自信前行的答案。你知道吗?我有一个完整团队协助制作播客和通讯。我希望团队每位成员都能快乐工作并茁壮成长。Justworks深知员工不仅是雇员,更是你最重要的伙伴。
The answers you need to move confidently. Did you know that I have a whole team that helps me with my podcast and with my newsletter? I want everyone on that team to be super happy and thrive in their roles. Justworks knows that your employees are more than just your employees. They're your people.
我的团队分布在科罗拉多、澳大利亚、尼泊尔、西非和旧金山。如果没有Justworks,跨国招聘、按时支付当地货币薪资以及全天候解答HR问题将极其复杂。但有了Justworks,一切都变得简单。无论是设置自动化薪资系统、提供优质福利还是国际招聘,Justworks通过简易软件和小企业专家7×24小时服务,为你和团队妥善处理人力资源事务,让你能专注于善待员工。
My team is spread out across Colorado, Australia, Nepal, West Africa, and San Francisco. My life would be incredibly complicated if I had to hire people internationally, pay people on time and in their local currencies, and answer their HR questions twenty-four seven on my own. But with Justworks, it's super easy. Whether you're setting up your own automated payroll, offering premium benefits, or hiring internationally, Justworks offers simple software and twenty-four seven human support from small business experts for you and your people. They do your human resources right so that you can do right by your people.
Justworks为您的团队服务。Chip,非常感谢你能来参加,欢迎来到播客节目。
Justworks for your people. Chip, thank you so much for being here, and welcome to the podcast.
嗨,Lenny。我是这个播客的长期粉丝,能来参加真的很兴奋。谢谢你邀请我。
Hi, Lenny. I've been a big fan of the podcast for a while, and I'm really excited to be here. Thank you for having me.
我想从你之前在LinkedIn上分享的一张表格/图表开始,它当时非常火爆。我认为它之所以如此受欢迎,是因为触动了很多人的神经。我来读一下,我们会在YouTube上为观看的观众展示。这是你分享的一个非常简单的表格,对比了人们认为能提升AI应用的因素与实际能提升AI应用的因素。人们认为的因素包括:紧跟最新AI新闻、采用最新的代理框架、纠结使用哪种向量数据库、不断评估哪个模型更智能、微调模型。而你列出的实际提升因素则是:与用户交流、构建更可靠的平台、准备更好的数据、优化端到端工作流程、编写更好的提示语。为什么你觉得这个表格如此触动人心?
I want to start with this table slash chart that you shared on LinkedIn a while ago that went super viral, and I think it went super viral because it hit a nerve with a lot of people. Let me just read this, and we'll show it on YouTube for people who are watching. It's this very simple table you shared of what people think will improve AI apps versus what actually improves AI apps. What people think will improve AI apps: staying up to date with the latest AI news, adopting the newest agentic framework, agonizing about which vector database to use, constantly evaluating which model is smarter, fine-tuning a model. And then you have what actually improves AI apps: talking to users, building more reliable platforms, preparing better data, optimizing end-to-end workflows, writing better prompts. Why do you think this hit such a nerve with people?
简单来说,如果要归结一点,你认为人们在构建成功的AI应用时最常忽视的是什么?
If you had to boil it down, what do you think people are missing about building successful AI apps?
我经常被问到的一个问题是:我们该如何跟上最新的AI新闻?而我的反应是:为什么?为什么你需要紧跟最新的AI新闻?我知道这听起来有些反直觉,但外面的新闻实在太多了。很多人还会问我类似的问题:我该如何在两种不同的技术之间做选择?
One question I get asked a lot is: how do we keep up to date with the latest AI news? And I'm like, why? Why do you need to keep up to date with the latest AI news? I know it's very counterintuitive, but there's just so much news out there. A lot of people also ask me questions like, how do I choose between two different technologies?
比如最近可能是MCP对比代理协议,对吧?哪个更好?这类问题。而我真正想问他们的是:首先,最优解和非最优解之间能带来多少改进?有时候他们会说,其实差别不大。那我就说:如果改进不大,为什么要花那么多时间去纠结呢?
Like maybe recently it's MCP versus other agent protocols, right? Which one is better, this or that? And the serious question I ask them is: first, how much improvement could you actually get from the optimal solution versus a non-optimal solution? And sometimes they're like, actually, it's not much. And I'm like, okay, if it's not much improvement, why do you want to spend so much time debating it?
这意味着它对你的性能影响不大。另一个我会问他们的问题是:如果采用了一项新技术,之后想换成另一项有多难?有时候他们会说:哦,换起来可能工作量很大。我就说:假设这里有一项新技术,还没有经过大量用户验证,而你一旦采用,就可能永远被它绑定。
So it means it doesn't make much difference to your performance. And another question I ask is: if you adopted a new technology, how hard would it be to switch it out for another? And sometimes they're like, oh, I think it could be a lot of work switching it out. And I'm just like, let's say here's a new technology. It hasn't been tested by a lot of people, and if you adopt it, you would be stuck with it forever.
那么,你真的想采用它吗?或许你应该三思,不要过度投入那些尚未经过实战检验的新技术。
Like, do you actually want to adopt it? Right. So maybe you want to think twice about overcommitting to new technologies that haven't been battle tested.
我很喜欢你更宏观的建议——其实很简单,要打造成功的应用就该多和用户交流,优化数据质量,改进提示词设计,提升用户体验,而不是一味追逐最新最炫的东西。现在该用什么模型?AI领域有什么新动态?让我顺着微调和后训练这个概念聊下去。AI领域充斥着各种术语,而您作为实践者真正构建过这些系统、与企业合作落地过这些技术,相信能给大家提供一个绝佳的学习机会。
I love that your broader advice is just simple: to build successful apps, talk to users, build better data, write better prompts, optimize the user experience, versus just chasing what's the latest and greatest, what's the best model to use right now, what's happening in AI. Let me follow this thread of fine-tuning and, basically, post-training. There are all these terms that people hear in AI, and I think this is going to be a really good opportunity for people to learn what we're actually talking about, since you actually do these things, you build these things, you work with companies doing these things.
我会在对话中穿插几个专业术语,但让我们从这个开始:如何用最简单的方式理解预训练和后训练的区别?微调又在其中扮演什么角色?到底什么是微调?
There are a few terms I want to sprinkle in through the conversation, but let's start with this one. What's the simplest way for someone to understand the difference between pre-training and post-training, and how fine-tuning fits into that? What is fine-tuning, actually?
免责声明,我并不完全了解那些神秘的前沿实验室的具体工作。但根据我所知,其中一种是有监督微调——当你拥有示范数据集和专家标注时,比如给出问题后标注标准答案,通过训练让模型学会模拟人类专家的输出模式。
Disclaimer: I don't have full visibility into what these big, secretive frontier labs are doing. But from what I've heard, one part is supervised fine-tuning, where you have demonstration data: you have a bunch of experts saying, okay, here's a prompt, and here's what the answer should look like, and you train the model on that to emulate what the human expert would produce.
这也是许多开源模型采用的知识蒸馏方式。不同于依赖专家人工编写优质提示答案,它们会获取知名优质模型的输出结果,比如ChatGPT的响应,然后训练小模型去模仿。虽然我十分敬佩开源社区的贡献,但必须指出:能训练出模仿现有优质模型的系统,与能从头训练出同等水平的模型之间,存在着巨大的能力鸿沟。
That's also what a lot of open source models are doing; they do it by distillation. So instead of having human experts write really good answers to prompts, they take a very popular, famous, good model, generate responses from it, and then train a smaller model to emulate those responses. I really appreciate the open source community, by the way, but being able to train a model that can emulate an existing good model is very different from being able to train a good model without an existing good model as the teacher. There's a big step there.
所以我们有监督微调,另一个重要方向是强化学习——虽然不确定你们节目是否讨论过,但这个技术现在无处不在。
So yeah, we have supervised fine-tuning. And another thing that's very big (I'm not sure if you've had guests talking about it already) is reinforcement learning. It's everywhere.
这个话题我们稍后重点讨论,它确实越来越频繁地出现在我的对话中。不过先总结下您刚才分享的要点:本质上模型是人为编写的算法代码,前沿模型被投喂整个互联网的内容数据,通过不断预测下一个词(严格来说是token,但通俗理解为文本中的下一个词)来训练。当预测错误时,它会调整所谓的权重参数——虽然这个理解很表层,但作为基础认知可以这样简化理解对吗?
Let's pause on that, because I definitely want to spend time on it. It's such a cool topic that's emerging more and more in my conversations. But just to summarize what you shared, which I think is really, really important: the idea is that a model is essentially an algorithm, a piece of code that someone writes, and the frontier labs feed it basically the entire internet of content. It's trying to test itself on predicting the next word across all that data. Technically it's the next token, so "word" isn't exactly correct, but a simpler way to think about it is the next word in the text. And as it gets predictions wrong, it adjusts these things called weights. Is that a simple way to think about it, even though it's very surface level?
所以我认为语言建模是一种编码语言统计共现信息的方式,对吧?比如说我们都说英语,就能感知到哪些表达统计上更可能。比如我说'我最喜欢的颜色是',你自然会想到后面应该接一个颜色词。像'蓝色'这个词出现的概率就远高于'桌子末端'这类词,因为统计上'蓝色'更可能接在'我最喜欢的颜色是'后面。
So I think of language modeling as a way of encoding statistical co-occurrence information about language. Let's say we both speak English, so we can get a sense of what is more statistically likely. If I say "my favorite color is", then you would say, okay, the next word should be a color. The word "blue" would be much more likely to appear than a phrase like "end of table", because statistically "blue" is more likely to come after "my favorite color is".
因此它的核心在于这是一种信息编码方式。当你在海量数据上训练语言模型时——比如接触过多种语言和领域——它就能识别出'哦,你们通常这样表达'。然后用户输入提示时,它就会生成统计上最可能的下一个标记。顺便说一句,这并非新概念。
So the point is that this is a way of encoding information. When you train a language model on a large amount of data, it sees a lot of language across a lot of domains, so it can tell, okay, this is how people usually say it. Then the user gives it a prompt, and it comes up with the next most likely token. By the way, it's not a new idea.
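The statistical co-occurrence idea can be illustrated with a toy bigram model, a sketch for intuition only: the tiny corpus below is made up, and real models use neural networks over tokens rather than raw word counts.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; real pre-training uses trillions of tokens.
corpus = ("my favorite color is blue . my favorite color is green . "
          "my favorite song is blue").split()

# Count how often each word follows each word (bigram co-occurrence).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the statistically most likely next word after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("is"))   # "blue": it follows "is" twice, "green" once
```

Scaled up to the whole internet, the same kind of co-occurrence statistics is what a language model encodes in its weights.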
实际上这个理念非常古老,可以追溯到1951年关于英语熵的论文,我记得是克劳德·香农的杰作。有个我很喜欢的故事——对了,你读过福尔摩斯探案集吗?
Actually, the idea is very, very old. It comes from a 1951 paper on the entropy of English; I think it's by Claude Shannon. It's a great paper. And there's a story I really like. By the way, did you read Sherlock Holmes?
嗯,读过几本福尔摩斯。
Yeah, read a few Sherlock Holmes books.
对,这个故事讲的是福尔摩斯如何运用这种统计信息破案。有人留下了一堆火柴人图案的密信,福尔摩斯推理:既然英语里最常见的字母是E,那么出现最频繁的火柴人造型必然代表E,就这样他层层破解了密码。
Yeah, so this is a story of Sherlock Holmes using this statistical information to solve a case. Somebody left a message written with a lot of stick figures. So Sherlock was like, okay, he knows that in English the most common letter is E, so the most common stick figure must be E. And he goes on like that until he breaks the code.
我认为这在某种程度上就是简单的语言建模,只不过他是在字符层面操作;而标记(token)介于两者之间——不完全是一个单词,但大于单个字符。
So I think that's language modeling, in a way, simple language modeling. But instead of doing it at the word level, he does it at the character level. And a token is something in between: a token is not quite a word, but it's bigger than a character.
采用标记(token)的概念能帮我们平衡词汇量:纯字符方案只有26个字母但表达能力有限,完整单词方案可能有数百万词汇,而标记可以折中处理。比如遇到'podcasting'这种新词,可以拆解为'podcast'和'ing'两个标记。
We use tokens because it helps us reduce the vocabulary. Characters give the smallest vocabulary: the alphabet has only 26 characters, but there can be millions and millions of words. Tokens let you hit a sweet spot between the two. So let's say we have a new word, like "podcasting". Even if it's a new word, it can be divided into "podcast" and "ing".
所以人们能理解:"podcast"我们知道含义,"ing"是动词的动名词形式之类,于是就能理解"podcasting"这个词。这就是token的由来。是的,预训练本质上就是编码语言的统计信息,让模型预测最可能出现的内容。说"最可能"其实是最简化的说法,因为它更像是在构建一个分布:比如下一个token有90%的概率是某个颜色词。
So people understand: okay, "podcast", we know the meaning, and we know "ing" marks a verb form, a gerund, whatever it is. So we understand the word "podcasting". That's where tokens come in. But yeah, pre-training is basically encoding the statistical information of language so you can predict what is most likely. "Most likely" is actually the simplest way to put it, because it's more like building a distribution: okay, maybe 90% of the time the next token could be a color,
比如10%的概率可能是其他东西,对吧?所以你基于分布,语言可以根据你的采样策略来选择。你是希望它总是选择最可能的token,还是希望选择更有创意的内容?所以我认为采样策略极其重要。
And 10% of the time it could be something else, right? So based on the distribution, the language model can pick, depending on your sampling strategy. Do you want it to always pick the most likely token, or do you want it to pick something more creative? So I think the sampling strategy is extremely important.
它能大幅提升性能,而且这一点常被严重低估。
It can boost your performance in a huge way, and it's very, very underrated.
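The sampling-strategy point can be sketched with a toy next-token distribution and a temperature parameter. The numbers are made up, not from any real model: temperature 0 always picks the most likely token, while higher temperatures allow more creative picks.

```python
import math
import random

# Hypothetical distribution over next tokens after "my favorite color is".
probs = {"blue": 0.90, "red": 0.05, "green": 0.03, "table": 0.02}

def sample(probs, temperature=1.0):
    """Pick a next token. Temperature < 1 sharpens toward the most
    likely token; temperature > 1 flattens the distribution."""
    if temperature == 0:  # greedy decoding: always the argmax
        return max(probs, key=probs.get)
    weights = [math.exp(math.log(p) / temperature) for p in probs.values()]
    return random.choices(list(probs), weights=weights, k=1)[0]

print(sample(probs, temperature=0))    # always "blue"
print(sample(probs, temperature=1.5))  # sometimes a less likely token
```

The same model can feel deterministic or creative purely depending on how you sample from its distribution, which is why this knob matters so much in practice.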
好的,太棒了。本质上模型就是带有整套权重的代码,本质上就是学会预测特定词语和短语后接内容的统计模型。
Okay, awesome. So essentially a model is just code with this whole set of weights: essentially a statistical model that has learned to predict what comes next after certain words and phrases.
对。
Yeah.
而后训练和微调具体就是在做同样的事。预训练得到像GPT-5这样的模型,微调就是有人拿GPT-5做类似的事,针对他们发现对特定用例必要的数据稍微调整这些权重。这样理解简单吗?
And then post-training, and fine-tuning specifically, are doing that same thing. So with pre-training you get something like GPT-5. Fine-tuning is someone taking GPT-5 and doing the same sort of thing: adjusting those weights a little bit, on data they find necessary, for their very specific use case. Is that a simple way to think about it?
是的,我把权重看作函数的一部分。比如有个函数:Lenny的身高等于1乘以x再加上2,这里的系数就是权重。你不断调整它们,直到函数拟合正确的数据,比如我的身高和你的身高。所以可以把权重看作函数的参数:通过调整权重来拟合训练数据。
Yeah, I think of weights as parts of a function. Let's say you have a function: maybe Lenny's height is 1 times x plus 2. The numbers in there are the weights. You change them until the function fits the correct data, which is my height and your height. So you can think of the weights as the parameters of a function: you adjust the weights so the function fits the training data.
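The weights-as-function-parameters analogy can be made concrete with a one-weight model fit by gradient descent. The function and data points here are invented for illustration; the point is that training just nudges the weight until the function's predictions fit the data.

```python
# Toy model: predicted_height = w * shoe_size. The weight w is what
# training adjusts. Data points are made up for illustration.
data = [(9.0, 63.0), (11.0, 77.0)]   # (shoe size, height in inches)

w = 0.0        # start from an arbitrary weight
lr = 0.001     # learning rate: how big each nudge is
for _ in range(2000):
    for x, y in data:
        error = w * x - y        # how far off the current prediction is
        w -= lr * 2 * error * x  # nudge w to shrink the squared error

print(round(w, 2))   # converges near 7.0: heights here are 7x shoe size
```

A real model does the same thing with billions of weights and a loss over next-token predictions instead of two height measurements.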
太棒了,好的。那么我们正在讨论预训练、后训练和微调。关于这些训练环节的确切含义以及人们需要理解的重点,还有什么其他重要内容需要分享吗?
Awesome, okay. So we're talking about pre training, post training, fine tuning. Is there anything else here that's important to share about just like what this is exactly, what people need to understand about these parts of training?
绝大多数时候,我们作为用户不会触及预训练模型,基本用不到它。
So the vast majority of the time, we don't touch the pre-trained model. As users, we don't use it directly.
已经为我们做好了。
Already done for us.
是的。我觉得这个过程其实挺有趣的。比如我尝试把玩过原始的预训练模型,结果很糟糕,它们会说些疯狂的话。所以观察后训练能在多大程度上改变模型行为非常有意思。
Yeah. And I think it's actually a fun process to look at. Like, I tried to play with raw pre-trained models, and they're horrendous. They say crazy things, like, oh my gosh. So it's very interesting to look at how much post-training can change the model's behavior.
现在很多人把精力都花在后训练上,因为预训练主要是用来提升模型的通用能力和容量,这需要大量数据和增大模型规模。但某种程度上我们已经把互联网文本数据用到极限了,所以大家都在尝试音频、视频等其他数据源,寻找新的数据来源。
And that's where a lot of people at the frontier labs are spending their energy nowadays: on post-training. Because pre-training has been used to increase the general capabilities of a model, and that needs a lot of data and a bigger model size. At some point we have kind of maxed out on internet data; text is maxed out. So a lot of people are working with other data, audio and video, and everyone's trying to figure out what the new source of data is.
在后训练阶段,虽然大家的预训练数据可能很相似,但后训练才是如今真正产生差异化的地方。
Everyone can have very similar pre-training data, so post-training is where labs make a big difference nowadays.
这是个很好的过渡——你刚才提到监督学习与非监督学习。顺便说我很喜欢探讨这个,非常有趣。你所说的标记数据,本质上监督学习就是AI在已标注正确与错误的数据上学习,比如区分垃圾邮件与非垃圾邮件。
This is a good segue: you talked about supervised learning versus unsupervised learning. I love getting into this, by the way; it's super interesting. So you're talking about labeled data. Basically, supervised learning is AI learning on data that somebody has already labeled, telling it what is correct versus incorrect. For example, this is spam versus not spam.
这是个不错的短篇故事,这不是个好短篇故事。我们接触过许多为实验室提供这类服务的公司CEO,比如Mercor、Scale、Handshake,还有Micro等几家。那么这些公司本质上就是在为实验室提供标注数据、高质量的训练数据吗?
This is a good short story, this is not a good short story. We've had the CEOs of a lot of the companies that do this for labs: Mercor, Scale, Handshake, Micro, there's a few others. So is that essentially what these companies are doing for labs, giving them labeled data, high-quality data to train on?
某种程度上是的,但我认为这更像是一个大公式中的一个组成部分,其中的组件远不止于此。所以我刚才提到强化学习——不知道你采访过的CEO们是否提过这个术语。核心思想是:比如说你给模型一个提示词,对吧?
It is, in a way, but I think it's more like one component of a bigger equation; there are a lot more components than that. That's why I was talking about reinforcement learning. I'm not sure if the CEOs you interviewed brought up that term. So the idea is, let's say you give the model a prompt, right?
然后它会生成输出,对吧?你需要强化或鼓励模型产生更好的输出。这就引出一个问题:如何判断答案的好坏?人们依赖各种信号,而获取好坏信号的方式之一就是人类反馈。
And it produces an output, right? You want to reinforce or encourage the model to produce better outputs. So now the question is: how do we know whether an answer is good or bad? People rely on signals for that. One way to get a signal of good or bad is human feedback.
比如当有两个回应时,你可以判断这个比那个更好。我们这样做是因为人类往往难以给出具体分数,但比较判断更容易。就像让我给一首歌打分...
Say you have two responses; you can say, okay, this one is better than the other. We do that because, as humans, it's very hard for us to give a concrete score, but it's easier to do comparisons. Like if you ask me, okay, give this song a score.
我不是音乐人,根本不知道难度系数。可能会随便打个6分什么的。如果一个月后完全忘了这首歌再问我,可能这次打7分或4分——我自己都说不准。
I'm not a musician, and I don't know how hard it is to make. Like, out of 10? I don't know, I'll give it a six, you know. And then if you ask me again a month from now, after I've completely forgotten it, okay.
但如果你问我:这里有两首歌,生日派对你更想播放哪首?我就能明确选择:可以放这首而不是那首。
Maybe now it's a seven, or, like, a four. I don't know. But if you ask me, okay, here are two songs, which one would you prefer to play at a birthday party? I'd be like, okay, I'd play this one over that one.
所以说比较判断容易得多。通过人类反馈训练奖励模型——这个模型能对生成回应进行评分:这个结果是好是坏?
So comparison is a lot easier. So you have human feedback, and then you use this human feedback to train a reward model. A reward model is a model that can score a response: is this good or bad?
你应该倾向于生成更好的模型,也就是更优质的响应。另一种方法是,你可以不用人类,而是使用人工智能,对吧?比如判断响应是好是坏。或者按照现在流行的说法,就是可验证的奖励机制,这很自然。基本上就是给模型一个数学问题和数学解答,比如模型输出的解答,如果预期答案应该是四个H2,而它没有提供四个H2,那显然就是错误响应。
And you bias the model toward producing the better responses. Another way is that, instead of using a human, you can use AI to judge whether the response is good or bad. Or, the thing people are very big on nowadays: verifiable rewards, which is natural for some tasks. Basically, you give the model a math problem, and the model outputs a solution. If the expected answer should contain four H2 and it doesn't provide four H2, then it's wrong; it's not a good response.
是的,很多时候人们会使用人工劳动力,比如雇佣专业人士来生成数学题和标准答案。并通过设计可验证的系统,让模型能够基于这些数据进行训练。
So yes, a lot of the time people are using human labor to produce these expert questions and expected answers, and to design systems that are verifiable, so that the models can be trained on them. Yeah.
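The two kinds of training signal described above, pairwise human preference and verifiable rewards, can each be sketched in a few lines. These helpers are illustrative stand-ins, not real training code: actual RLHF trains a neural reward model on many such comparison pairs.

```python
# Signal 1: pairwise preference. Humans find absolute scores hard,
# so they just say which of two responses is better; the (chosen,
# rejected) pair becomes training data for a reward model.
def preference_pair(response_a, response_b, human_picks_a):
    if human_picks_a:
        return response_a, response_b   # (chosen, rejected)
    return response_b, response_a

# Signal 2: verifiable reward. For tasks with a checkable answer
# (math, code with tests), no human judge is needed.
def verifiable_reward(model_answer, expected):
    return 1.0 if model_answer.strip() == expected.strip() else 0.0

chosen, rejected = preference_pair("song A", "song B", human_picks_a=False)
print(chosen)                             # "song B"
print(verifiable_reward("4 H2", "4 H2"))  # 1.0: reinforce this output
print(verifiable_reward("2 H2", "4 H2"))  # 0.0: wrong, no reward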
好的,我很高兴你提到这点。这本质上就是RLHF——基于人类反馈的强化学习,也正是我想讨论的内容。
Okay, I'm really glad you went there. This is essentially RLHF, reinforcement learning with human feedback, which is exactly what I wanted to also talk about, right?
是的,我认为这是一种通用的学习方式。传统训练是常规学习,而无论是通过人类反馈、AI反馈还是可验证奖励机制,本质上都是收集信号的不同方式。
Yeah, so I guess it's general, it's a way of learning. Like training is conventional learning and whether it learn from human feedback or like AI feedback or like very valuable rewards. I think they say, you say it's just different way of like collecting signals.
太棒了。我们曾邀请Anthropic的CEO上播客,他谈到他们版本的RLHF是AI驱动的强化学习。我很喜欢他的表述方式:本质上是要帮助模型,强化正确行为和答案。这个方法无论是工程师看到模型输出后说'不,我会用不同方式编码',还是训练一个辅助模型来判定原模型的对错,原理都一样。
Awesome. Yeah, that's, we had the CEO of Anthropic on the podcast and he talked about their version of RLHF, is AI driven reinforcement learning. I love the way he phrased it, where you basically, you want to help the model, you want to reinforce correct behavior and correct answers, and this is the method to do it, whether it's, say, an engineer seeing an output from a model being like, no, here's how I would code it differently. And then training, and it's training a different model that the original model works with to tell it, am I correct or not correct? Is that right?
没错,这是理解这个问题的一个角度。我认为这个领域现在非常令人兴奋,因为有很多专业领域任务需要模型开发者去攻克。比如会计行业想用模型处理会计事务,就需要大量会计数据样本,这意味着要雇佣很多会计师来生成数据。
Yeah, I that's a way of looking into it. And I think that's a space is so exciting nowadays because there are so many like domain expert tasks that the model, like the model developers want models to do well on, right? Let's say you're like accountant, Maybe you want to use a model to have accounting tasks. So I need a lot of like accounting data, like examples from my accountant. So you need to hire a lot of them to like do it.
再比如你想处理物理程序、法律问题或工程问题。有人告诉我他们想用编程解决科研问题,而不仅仅是开发产品——这完全是另一个维度。还有使用特定工具的场景,比如财务软件QuickBooks或Google表格,这些工具需要专业知识和特殊技能。要让模型掌握这些,就需要大量该领域的人类专家来创建训练数据,这是个庞大的工程。
Or if you want to do physics programs, I want to do, I don't know, like legal questions and stuff, or like engineering questions, or like somebody was telling me, want to do like using like coding for, to solve scientific problems and not just like coding to build product, which is another different whole realm of things. And I also like using very specific toolings like, yeah, like I'm not sure what apps you use, but maybe like forwarding app or like QuickBooks or like Google Excel, like they have very specific, like tools specific expert expertise. So you want the model to which you learn. So as they need a lot of like humans expert in this area should like create data to treat them. And it's a massive things.
这就像人们因为每个人都想要大量数据,仿佛一旦有了无限预算就能为所欲为。但我觉得这其中也暗藏一些低调有趣的经济学原理。我不确定你是否和嘉宾讨论过这个问题,我认为思考这个非常有趣,因为它非常不平衡,对吧?因为前沿实验室的数量屈指可数。
It's like people because everyone wants a lot of data and like once slaps at like unlimited budget. But whether I think there's also like a little bit of low key interesting economics. I'm not sure you've talked to the guests about. I thought it's very interesting to think about because it's very lopsided, right? Because there are only like a very small numbers of frontier labs, right?
而这些实验室需要海量数据。同时又有大量初创公司或企业在提供数据。你会看到这些做数据标注的初创公司可能营收惊人,但当你问他们有多少客户时,他们可能会说数量非常有限。
And they want a lot of data. And there's a massive amount of startups or companies that are providing data. So you can see this companies, this startup doing data labeling, maybe they have massive AR. But you ask them like, okay, so how many customers you have? And they could be like, oh, very small numbers.
我确信看到你在微笑。
I'm sure you saw you smiling.
是啊,聊聊这个吧。
Yeah, yeah, chat about that.
没错,这让我有点不安。我们有些公司疯狂增长,却严重依赖两三家大客户。同时,如果我是前沿实验室,从经济学角度看该怎么做?现阶段我需要大量初创企业作为供应商来择优选择。
Yeah, I'm like a bit of it like, leave me uneasy, right? Here we have a companies growing like crazy, but it's like heavily dependent on like two or three companies. And at the same time, if I was this company Frontier Labs, what would be the right economical things for me to do? Right now, I want a lot of startups. I want to have a lot of providers so I can pick and choose.
这些供应商还会相互竞争压价,虽然极度依赖这个领域,但在我看来无论如何...所以我觉得...这种经济学现象整体非常有趣,我很好奇最终会如何演变。
And as these providers can also like to compete each other to lower the price and it's so dependent on music, but sounds to me regardless. So so I feel like yeah. So so I don't know. This economics, a whole economics is very interesting to me and I'm curious to see how it plays out.
我听到的意思是,你看衰这些数据标注公司的前景。正如你所说,由于客户稀少而竞争者众多,它们缺乏定价权。所以尽管它们是全球增长最快的公司之一,你认为前方存在挑战。
What I'm hearing is you're bearish on the future of these data labeling companies, because as you said, they don't have a lot of leverage over pricing because they have so few customers and there's so many people getting into the space. So basically, even though they're some of the fastest growing companies in the world, you're feeling like there's a challenge up ahead.
实际上我对此持悲观态度。我很好奇,因为事情往往以我意想不到的方式发展。也许这些公司拥有大量数据,或许他们能利用这些数据获得一些洞察力,帮助他们保持领先。所以我不确定。
I'm actually some bearish on it. I think I'm curious because I think things have has a way of work out in ways I don't expect. So I think that maybe these companies, they have a lot of data. Maybe they wouldn't be able to use that to like have some insight that helps them stay ahead of the curve. So I don't know.
非常公正的回答。好的,既然我们谈到这个话题,我想聊聊评估(evals),这是本播客中反复出现的话题。这是AI实验室真正需要的、这些公司分享的另一类数据内容。你能用最简单的方式解释什么是评估,以及它如何帮助模型变得更智能吗?
A very fair answer. Okay, while we're on this topic, I want to chat about evals, which is a very recurring topic in this podcast. This is the other piece of data content these companies share that AI labs really need. Can you just talk about what an eval is the simplest way to understand it and then how this helps models get smarter?
我认为人们对待评估有两种截然不同的方式。一种是应用开发者,比如我开发了一个简单的聊天机器人应用——这是我首先想到的例子。我需要知道这个聊天机器人是好是坏,所以需要某种评估方法来判断。
So I think that people approach eval, I think they're like two very different problems. Once is a app builder, right? And I can say have an app that do like maybe a chatbot, very simple and I was first thing that came to my mind. And I want you know if chatbot is good or bad, right? So it needs to come a little way with like evaluate to chatbot.
另一种情况是任务特定的评估设计。假设我是模型开发者,想让我的模型更擅长代码编写。但如何衡量创意写作呢?这就需要有人真正理解创意写作,思考什么是好故事的标准,然后设计整个数据集和评估创意写作的准则。
Another thing is, I think of this as a task specific event design. So let's say I'm a model developer and I want to make my model better at code writing, Right? And it was like, okay, but how do I even measure creative writing? So I even need someone to like, okay, understand creative writing and think about like what makes story, or like what makes a story good? And then design the whole dataset and then criteria to evaluate creative writing.
所以我觉得评估设计特别有趣——制定标准、建立指南、培训人员如何有效执行。在我看来评估(eval)非常有意思,因为它极具创造性。我看到不同评估体系的构建方式时总忍不住感叹:这完全不枯燥啊!
So there's that — it's the eval design that is very interesting: coming up with criteria, coming up with guidelines, and then training people to do it effectively. So in a way I think evals are really, really fun, because they're extremely creative. I was looking at how different evals were built and thinking, wow, this is not dry at all.
这简直超级超级超级有趣。
This is like super super super fun.
我们之前和Hamel、Shreya做过关于评估的播客,他们谈的正是这点——为企业创建评估体系其实非常有趣。让我们深入探讨一下:网上有种争论(不知重要性如何),似乎人们花很多时间思考AI产品是否需要评估。一些顶尖公司表示他们不做正式评估,只凭感觉判断'这个效果好吗?我能感受到吗?' 你对构建AI应用评估体系的重要性及评估技巧有什么看法?——这里特指应用公司,而非模型公司。
We had a whole podcast on evals with Hamel and Shreya, and that's exactly what they talked about — it's actually really fun to create evals for companies especially. So let's dig into that one a little bit more. There's this debate online — I don't know how big of a deal it is, but it feels like people spend a lot of time on it — this idea of: do we need evals for AI products? Some of the best companies say they don't really do evals, they just go on vibes. They're just like, is this working well? Can I feel it or not? What's your take on the importance of building evals, and the skill of evals, for AI apps — not the model companies?
你不必事事追求绝对完美才能获胜。只要做到足够好并保持一致性就够了。好吧,这并非我信奉的哲学,但通过与众多公司合作,我亲眼见证了这一点。所以当我问为什么公司不做评估时——假设你是高管对吧?
You don't have to be absolutely perfect at things to win. You just need to be good enough and be consistent about it. Okay, this is not the philosophy I follow, but I have worked with enough companies to see that play out. So when I ask why companies don't do evals — let's say you are an executive, right?
假设你想开发一个新用例。我们启动了这个用例,构建完成后运行良好对吧?客户还算满意,虽然没有精确指标,但流量持续增长,用户似乎很开心,购买行为也在持续。这时工程师提出:我们需要进行评估。问题是:评估需要投入多少资源?
And you want to launch a new use case. So here's a use case you started out with: you built it, and it works well, right? The customers are somewhat happy. You don't have an exact metric for it, but the traffic keeps increasing, people seem happy, people keep buying stuff. And now here comes your engineer saying, okay, we need evals for it. Okay, how much effort do we need to put into evals?
他们回答可能需要两名工程师,或许能带来改进。那么预期收益是多少?工程师会说可能从80%提升到82%或95%。但我在想:同样的两名工程师如果开发新功能,带来的提升会不会更大?
And they say, okay, maybe two engineers. So how much expected gain can I get from it? And the engineer says, oh, maybe we can improve it from 80% to 82%, 95%, right? And I think: what if those same two engineers launched a new feature instead?
这就是评估的权衡。有时你会把评估往后放——觉得当前状态足够好,不必调整。若在评估上耗费大量精力,可能只获得边际改善,不如把精力投入新用例,只要达到'足够好'的标准通过团队验证就行。我认为这正是争议的核心。
That could give me so much more improvement, right? So part of it is that sometimes you're postponing evals — this is good enough, don't touch it. If you spend a lot of energy on evals, you'd only get incremental improvement, whereas spending the energy on another use case, maybe you get that good enough and you vibe-check it, right? So I do think maybe that's what the debate is about.
现实中人们常在'足够好'时就止步不前。但这存在巨大风险——若缺乏清晰指标,无法准确评估模型表现,可能导致灾难性错误。在规模化运营中,评估至关重要。你必须严格把控用户端呈现的内容,预判各种故障模式。特别是当产品功能构成竞争优势时更应如此。
I do think a lot of the time people just leave things in place when they're good enough and move on. But of course there's a lot of risk associated with that: if you don't have a clear metric, you don't have good visibility into how the application or the model is performing, and it might do something very dumb, or something crazy can happen. I do think evals are very, very important if you operate at scale, where failures can have catastrophic consequences — then you need to be very rigorous about what you present in front of users and understand the different failure modes, what could go wrong. And also in a space where that feature, that product, is a competitive advantage, right?
如果是核心竞争领域,你必须做到极致,清晰掌握自身与对手的差距。但若只是辅助功能,不必过分苛求——暂时够用就行。即便失败也无妨。
You want to be the best at this, so you want a very strong understanding of where you are and where you are relative to competitors. But if it's something more low-key — not the core, just something that helps your users — then maybe you don't need to be so obsessed or rigorous about it. It's okay, it's good enough for now, and if it fails, it fails.
虽然听起来很可怕,但归根结底是投资回报率的问题。我个人推崇评估体系,热衷研读评估报告。但我完全理解为何有人选择暂缓评估,优先开发新功能。
I know that sounds terrifying, but I think it's all a question of return on investment. I'm a big fan of evals — I love reading evals. And at the same time, I understand why some people would choose not to focus on evals right away and bring on new functionality instead.
太棒了,这个回答非常务实。我理解的是评估确实很重要,尤其是在规模化运营时,要有所取舍,不必为每个小功能都做评估。Hamel和Shreya提到,人们通常只需要针对产品最关键的部分做五到七个评估。你看到的情况是这样吗?还是发现实际生产中人们构建和需要的评估要多得多?
Awesome, that is a really pragmatic answer. What I'm hearing is evals are great and very important, especially if you're operating at scale, pick your battles, you don't need to write evals for every little feature. Something that Hamel and Shreya shared is that people need just like, I don't know, five or seven evals for the most important elements of their product. Is that what you see or do you see a lot more in production that people build and need?
我不认为评估数量应该是个固定数字,关键要看评估的目的是什么对吧?评估的目的是指导产品开发。就像你看到的演进过程,我认为最大的价值在于它能帮你发现进展顺利的领域。有时候结果很明显——我们查看评估数据发现某个用户群体表现特别差,就会深入排查问题所在。
I don't think of it as a fixed number of evals — what is the goal of the eval, right? The goal of evals is to guide product development. A big benefit of evals is that they help you uncover opportunities: where things are going well and where they aren't. Sometimes it's very obvious — we look at the evals and realize it performs really poorly on a specific segment of users, and then we look into what's wrong with it.
结果往往发现只是我们的信息传达不到位。只要集中改进这些薄弱环节,效果就会显著提升。所以评估数量真的取决于具体情况。我们见过有些产品用上百个指标,人们都快被逼疯了。
And it turns out we just didn't have good messaging for it. If we just focus on the things performing poorly, we can improve significantly. So the number of evals really depends. We have seen products with hundreds of different metrics — people are going crazy.
这是因为这类产品会按维度细分评估——比如用单独指标衡量信息冗余度,用另一个指标衡量用户敏感数据处理,再用其他指标评估内容长度等。举个完整例子:深度研究。假设有个应用基于模型帮你做深度研究,比如输入'请全面研究Lenny的播客,给我一份报告,分析他关注的主题类型、哪些视频最受欢迎,以及他遗漏了哪些该涉及的话题'。
That's because a product like that has different evals per dimension — one eval for, I don't know, verbosity, one for user-sensitive data, another for length. Let me give a complete example: deep research. You have an application built on a model to do deep research for you, right? You have a prompt like: do comprehensive research on Lenny's podcast and show me a report on what kinds of topics he's interested in, which videos get the most views, and what topics he's missing that he should be covering.
当你设计好这样的提示词后,如何评估结果质量呢?我认为没有单一指标能解决这个问题。可能需要召集上百名专家撰写大量提示词,让AI生成答案后人工评审——这个过程极其昂贵且耗时。
Once you have a prompt like that, how do you evaluate the result, right? I don't think there's one metric that would help. I think some benchmarks do this: get a hundred experts to write a bunch of prompts, then have them go through all the AI's answers and grade them. And that's extremely costly and slow, right?
不过我有另一个思路:之前和朋友讨论时想到,可以先关注摘要生成的过程。首先需要收集信息,这就要执行大量搜索查询并抓取结果。
But there's another way I was thinking about it — I was talking to a friend about this. One way is to look at how the summary result gets produced, right? First you need to gather information, and to gather information you need to run a lot of search queries and grab the search results.
然后对搜索结果进行聚合分析,可能发现仍有缺失,就需要补充查询路径,最终形成摘要。这个流程每个环节都需要评估——不必全程覆盖。比如设计搜索查询时,可以先思考'现在写了五个搜索词,如何判断它们的质量?'
Then from the search results you aggregate, and maybe realize, okay, I'm still missing this, so you do another round — and at the end you have the summary. Every step of the way can be evaluated, right? It doesn't have to be end to end. So for the search queries, you might first think: okay, I've written five search queries — how good are these search queries?
比如,它们是否彼此相似?如果这五个搜索查询非常相似——'Lenny播客'、'上个月的Lenny播客'、'两个月前的Lenny播客'——就不太令人兴奋;但如果查询关键词更多样化呢?然后看每个搜索查询的结果:假设你输入'Lenny播客 数据标注'这样的查询,会出现大约10个网页、10条结果。
Like, are they similar to each other? If the five search queries are very similar — 'Lenny podcast', 'Lenny podcast last month', 'Lenny podcast two months ago' — that's not very exciting. But what if the keywords across the queries are more diverse? Then look at the results of each search query: say you enter a query like 'Lenny podcast data labeling', and it comes up with ten pages, ten results.
然后你可能会发现,比如,Lenny Podcast关于,我不知道,比如Frontier Labs,有大约10个结果。而我的关注点是不同的网页,它们有多少重叠?比如,我们是否既在广度上,获取了很多页面,又在深度上有所覆盖,同时它们是否相关,因为我们可能会得到与原始提示完全无关的搜索查询。所以我觉得每个方面都需要一种评估方式,对吧?所以我认为不仅仅是,我需要多少评估?
And then you get, say, 'Lenny's Podcast on, I don't know, frontier labs' with ten results. Now look across the different webpages: how much do they overlap? Are we getting breadth — a lot of pages — but also depth, and are they relevant? Because you can end up with search queries that are completely irrelevant to the original prompt. So every aspect of it needs a way of being evaluated, right? So I don't think the question is just 'how many evals should I have?'
而是,我需要多少评估才能获得良好的覆盖范围,对我的应用性能有高度信心?同时帮助我理解它在哪些方面表现不佳,以便我可以进行修复。
It's: how many evals do I need to get good coverage and high confidence in my application's performance, and to help me understand where it's not performing well so that I can fix it?
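The per-step evals described above — query diversity, result overlap, relevance — can be sketched in code. This is a minimal illustration with invented queries; a real pipeline would use embeddings rather than word overlap:

```python
# Toy per-step eval for a deep-research pipeline: score how diverse a
# batch of generated search queries is, using word-level Jaccard
# similarity. Queries and thresholds are illustrative only.

def jaccard(a: str, b: str) -> float:
    """Word-set Jaccard similarity between two queries."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def query_diversity(queries: list[str]) -> float:
    """1 minus the mean pairwise similarity; higher means more diverse."""
    if len(queries) < 2:
        return 1.0
    pairs = [(i, j) for i in range(len(queries))
             for j in range(i + 1, len(queries))]
    mean_sim = sum(jaccard(queries[i], queries[j]) for i, j in pairs) / len(pairs)
    return 1.0 - mean_sim

near_duplicates = [
    "lenny podcast episodes",
    "lenny podcast episodes last month",
    "lenny podcast episodes two months ago",
]
varied = [
    "lenny podcast most viewed episodes",
    "product management interview topics",
    "ai engineering guests 2024",
]
print(query_diversity(near_duplicates) < query_diversity(varied))  # True
```

The same pattern extends to the other per-step checks — e.g. Jaccard over retrieved URLs measures result overlap across queries.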
太棒了。我还听到,特别是对于核心用例,比如用户在你产品中最常走的路径,是你想要重点关注的。
Awesome. And I'm hearing also just especially for the very core use case, like the most common path people take in your product is where you wanna focus.
是的,没错。
Yeah, so yeah.
好的,让我再提一个术语,我想稍微换个方向。Rag,人们经常看到这个词,R A G,它是什么意思?
Okay, let me, there's one more term I want to cover and I want to go in a somewhat different direction. Rag, people see this term a lot, R A G, what does it mean?
RAG代表检索增强生成(Retrieval-Augmented Generation),而且不仅限于生成式AI。这个想法就是:回答问题需要上下文。我记得它源于2017年前后的一篇论文。有人意识到,在一系列问答基准测试中,如果给模型提供与问题相关的信息,答案会好得多。所以它的做法是从维基百科等地方检索信息。
So RAG stands for retrieval-augmented generation, and it's not specific to generative AI. The idea is that to answer questions, we need context. I think it came from a paper around 2017. For a bunch of question-answering benchmarks, people realized that if you give the model information about the question, the answer gets much, much better. So what it does is retrieve information from, say, Wikipedia.
对于某个主题的问题,检索相关内容并放入上下文后再回答,效果会好得多。听起来理所当然,对吧?显然如此。我认为这就是RAG最简单的含义:为模型提供相关上下文,让它能回答问题。而这正是事情变得真正有趣的地方,因为RAG刚出现时主要处理文本。
So for a question about a topic, you retrieve that, put it into the context, and it answers much better. It sounds like a no-brainer, right? I mean, obviously. So that's RAG in the simplest sense: providing the model with relevant context so that it can answer the question. And that's where things get really interesting, because traditionally, when it started out, RAG was mostly text.
我们讨论了很多关于如何准备数据以便模型能有效检索的方法。比如说,并非所有内容都是维基百科页面,对吧?维基百科页面相当自洽,你知道它都是关于某个主题的。但很多时候你拿到的文档内容很分散。
So we talk a lot about how to prepare data so that the model can retrieve effectively. Not everything is a Wikipedia page, right? A Wikipedia page is pretty self-contained — you know everything in it is about one topic. But a lot of the time you have documents that differ a lot.
而且这些文档可能有奇怪的结构。比如有个关于Lenny播客的文档,开头写明'下文中的播客均指Lenny的播客'。假设之后有人问'介绍一下Lenny的工作',由于文档其余部分没有出现'Lenny'这个词,你可能根本检索不到这份文档。
And documents can be structured in weird ways. Say you have a document about Lenny's podcast, and at the beginning it says, 'from now on, "the podcast" refers to Lenny's podcast.' Then someone later asks, 'tell me about Lenny's work' — and because the rest of the document never uses the term Lenny, you might not retrieve it.
当文档被分割成不同部分时,第二部分可能完全不包含关键词。所以必须找到处理数据的方法,确保即使关联性不明显,也能检索到与查询相关的信息。于是人们提出了上下文检索的概念,比如为数据块添加摘要元数据。
And if the document is long enough that it gets chunked into different parts, the second part doesn't contain the word Lenny, so you can't retrieve it. So you have to find a way to process the data to make sure you can retrieve information that's relevant to the query, even when the connection isn't immediately obvious. People have come up with things like contextual retrieval — giving each chunk of data relevant context, maybe summary metadata, so the retriever knows what it's about.
有些人会使用假设性问题的方法,这很有趣。针对某个文档块,先预设它能回答哪些问题。当实际查询到来时,看是否匹配这些假设性问题,从而进行检索。
Some people use hypothetical questions — it's very interesting. For a given chunk of a document, you generate a bunch of questions that the chunk can help answer. Then when a query comes in, you check: does it match any of the hypothetical questions? If so, you fetch the chunk.
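The hypothetical-questions idea can be sketched as follows. The chunk names and questions are invented for illustration, and `difflib` string similarity stands in for the embedding match a production system would use:

```python
# Sketch of "hypothetical questions" retrieval: index each chunk under
# questions it can answer, then route an incoming query to the chunk
# whose question it matches best. In practice the questions would be
# generated by an LLM and matched with embeddings.

from difflib import SequenceMatcher

# Hypothetical questions per chunk (hand-written here for the sketch).
index = {
    "chunk_benefits": ["what is the parental leave policy?",
                       "does the health plan cover surgery?"],
    "chunk_referrals": ["how do i refer a friend for a job?",
                        "what is the interview process?"],
}

def best_chunk(query: str) -> str:
    """Return the chunk whose hypothetical question best matches the query."""
    def score(chunk: str) -> float:
        return max(SequenceMatcher(None, query.lower(), q).ratio()
                   for q in index[chunk])
    return max(index, key=score)

print(best_chunk("How do I refer my friend?"))  # chunk_referrals
```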
这是非常有意思的方法。在继续之前我想强调:RAG的数据准备极其重要。根据我的观察,很多公司RAG方案的最大性能提升来自更好的数据准备,而不是纠结使用哪种数据库。当然数据库选择对延迟或特定读写模式很重要,但就纯粹的回答质量而言……
So it's a very interesting approach. Okay, before I go to the next thing, I just want to say: data preparation for RAG is really important. In a lot of the companies I have seen, the biggest performance gains in their RAG solutions come from better data preparation, not from agonizing over what database to use. The database matters, of course — for latency, or if you have very specific access patterns, like read-heavy or write-heavy. But in terms of pure answer quality?
我认为数据准备才是决定性因素。
I think data preparation wins hands down.
当你提到数据准备时,能否举个具体例子让我们更好理解?
When you say data preparation, what's an example to make that real and concrete for us to understand?
举个例子来说,就像刚才提到的数据分块。我们需要考虑每个数据块应该多大才合适。比如你想检索大约一千字的内容,如果数据块很长,就更可能包含更多相关元数据,这样就能检索到更多信息。
One example is what I just mentioned: you have chunks of data, and you have to think about how big each chunk should be. Think about the context you want to maximize. Very simple example: say your retrieval budget is about a thousand words. If a chunk is long, it's more likely to contain more relevant material, so each retrieved chunk carries more.
但如果数据块过长,比如一个块就有一千字,那你可能只能检索到一个块,这样效果就不好。反之如果太短,虽然能检索到更多相关信息和更广范围的文档块,但每个块又太小难以包含足够的相关信息。所以我们需要精心设计数据块的大小。
But if it's too long — say each chunk is a thousand words — then you can only retrieve one chunk, which isn't very useful. If it's too short, you can retrieve more pieces and a wider range of documents, but each chunk may be too small to contain the relevant information. So you need good chunk design: how big each chunk should be.
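The chunk-size tradeoff can be made concrete with a toy chunker. The 1000-word budget mirrors the example above; the sizes and overlap are illustrative:

```python
# Minimal fixed-size word chunker illustrating the size tradeoff:
# with a 1000-word retrieval budget, 1000-word chunks let you fetch
# only one chunk, while smaller overlapping chunks let you mix
# material from several places in the corpus.

def chunk_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of `size` words, stepping by size - overlap."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size])
            for i in range(0, len(words), step)
            if words[i:i + size]]

doc = " ".join(f"w{i}" for i in range(1000))  # a 1000-word document
print(len(chunk_words(doc, size=1000)))             # one giant chunk
print(len(chunk_words(doc, size=200, overlap=50)))  # several smaller chunks
```

Real chunkers usually also respect sentence and section boundaries rather than cutting at fixed word counts.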
你还可以添加上下文信息,比如摘要、元数据、假设性问题。有人告诉我,他们通过将数据改写成问答格式获得了很大性能提升。比如把播客内容不是简单切块,而是重构为'这是问题,这是答案'的形式批量生成。用AI也能完成这种数据处理。
You add contextual information — summaries, metadata, hypothetical questions. Somebody told me a very big performance gain came from rewriting their data in a question-answering format. They have a podcast; instead of just chunking the podcast, they reframe it, rewrite it into 'here's a question, here's the answer,' and produce a lot of those. You can use AI for that as well. So that's one example of data processing.
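The question-answer rewriting step can be sketched as a prompt template. The wording is a guess at the technique, not a quoted prompt, and the actual LLM call is omitted:

```python
# Hedged sketch of rewriting source material into Q&A pairs for
# retrieval. The prompt text is illustrative; plug in whatever model
# call your stack uses to execute it.

def qa_rewrite_prompt(excerpt: str, n_pairs: int = 5) -> str:
    """Build a prompt asking a model to reframe an excerpt as Q&A pairs."""
    return (
        f"Rewrite the following podcast excerpt as {n_pairs} question-answer "
        "pairs. Each answer must be fully supported by the excerpt.\n\n"
        f"Excerpt:\n{excerpt}\n\n"
        "Format:\nQ: ...\nA: ..."
    )

prompt = qa_rewrite_prompt("Chip discusses why data preparation matters for RAG.")
print(prompt.splitlines()[0])
```

The resulting Q&A pairs are then indexed in place of (or alongside) the raw chunks, so user queries match the question side directly.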
我看到很多案例是帮助人们用AI处理特定工具和文档。现在的文档大多是为人类阅读编写的,而AI阅读方式不同。人类有常识能理解上下文,但AI不具备这种背景知识。比如有人提到,当他们为一个函数库编写文档时...
Another example I see a lot is using AI to help with specific tool use and documentation. Most documentation today is written for humans to read, and AI reads differently — humans have common sense and can infer what things mean, and human experts have context that AI doesn't quite have. So somebody told me about a big change they made: say you have documentation for a library.
假设这个库的某个输出参数可能是某个专业术语,比如'温度'或图表下的某个值应该在1、0或-1之间。人类专家能理解这个量级的含义,但AI完全不明白1代表什么。所以需要额外为AI添加注释层,说明'温度=1'表示...
And the library says, okay, the output of this function is some specialized quantity — maybe a 'temperature,' or some value on a graph that should be one, zero, or minus one. A human expert understands what a one on that scale means, but the AI really doesn't understand what one means here. So they actually added another annotation layer for AI: okay, temperature equals one means this.
这并不是指绝对温度,而是与那边的量级相关联。所有这些数据处理工作都是为了帮助AI更轻松地检索相关信息来回答问题。
It's not an absolute temperature — it's relative to that scale. So all this data processing is about making it easier for the AI to retrieve the relevant information to answer the questions.
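The 'annotation layer for AI' might look like the following. The field name `sentiment_score` and its scale are invented stand-ins for the 'temperature' example above:

```python
# Sketch of machine-facing doc annotations kept alongside human docs:
# spell out what each value on a scale means, so an LLM reading the
# output gets the context a human expert already has.

DOC_FOR_HUMANS = "sentiment_score: score of the text, range -1 to 1."

DOC_FOR_AI = {
    "sentiment_score": {
        "range": (-1, 1),
        "meaning": {
            -1: "strongly negative sentiment",
            0: "neutral sentiment",
            1: "strongly positive sentiment",
        },
        "note": "Values are relative to this scale, not an absolute measure.",
    }
}

def explain(field: str, value: float) -> str:
    """Render the machine-facing note for a value, for inclusion in context."""
    spec = DOC_FOR_AI[field]
    lo, hi = spec["range"]
    return (f"{field}={value} on a scale from {lo} ({spec['meaning'][lo]}) "
            f"to {hi} ({spec['meaning'][hi]}). {spec['note']}")

print(explain("sentiment_score", 1))
```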
本期节目由Persona赞助播出,这是一家帮助机构进行用户身份验证、防范欺诈并建立信任的认证平台。我们在播客中经常讨论AI的惊人进步,但这可能是一把双刃剑。每一个令人惊叹的时刻背后,都有欺诈者利用相同技术制造混乱——洗钱、盗用员工身份、冒充企业。Persona通过自动化用户、企业和员工验证来应对这些威胁。无论您是想防范候选人欺诈、满足年龄限制,还是保障平台安全,Persona都能根据您的具体需求提供定制化验证方案。
This episode is brought to you by Persona, the verified identity platform helping organizations onboard users, fight fraud, and build trust. We talk a lot on this podcast about the amazing advances in AI, but this can be a double edged sword. For every wow moment, there are fraudsters using the same tech to wreak havoc, laundering money, taking over employee identities, and impersonating businesses. Persona helps combat these threats with automated user, business, and employee verification. Whether you're looking to catch candidate fraud, meet age restrictions, or keep your platform safe, Persona helps you verify users in a way that's tailored to your specific needs.
最重要的是,Persona能让您轻松识别交易对象,同时不给合规用户增添麻烦。这正是Etsy、LinkedIn、Square和Lyft等领先平台选择Persona保障安全的原因。Persona还为我的听众提供特别优惠:连续12个月每月500次免费服务。立即访问withpersona.com/leni即可开始。重复一遍:withpersona.com/leni。
Best of all, Persona makes it easy to know who you're dealing with without adding friction for good users. This is why leading platforms like Etsy, LinkedIn, Square, and Lyft trust Persona to secure their platforms. Persona is also offering my listeners 500 free services per month for one full year. Just head to withpersona.com/leni to get started. That's withpersona.com/leni.
再次感谢Persona对本节目的赞助。太棒了。好的,你刚才谈到如何与企业合作制定AI战略、开发AI产品、选择构建工具等方面。我想多花点时间讨论这个,因为很多公司正在开发AI产品,也有很多公司在这个过程中遇到困难。
Thanks again to Persona for sponsoring this episode. Awesome. Okay. So you've talked a bit about how you work with companies on these sorts of things, on their AI strategies, on their AI products, how they build, which tools they build, all these things. I want to spend a little time here, because a lot of companies are building AI products, a lot of companies are not having a good time building AI products.
让我就你从成功企业案例中获得的经验提几个问题。首先是关于企业AI工具采用率的问题——最近有很多关于AI炒作的声音,数据显示大多数公司尝试后效果有限就放弃了,导致出现'这技术可能走不远'的论调。就企业内部AI工具的采用现状,你观察到了什么?
Let me ask a few questions along these lines of what you've learned working with companies that are doing this well. One is just, I guess in terms of AI tool adoption and adoption generally in companies, there's always this talk recently of just like, all this AI hype, the data is actually showing most companies try it, doesn't do a lot, they stop. And so there's all this just like, maybe this isn't going anywhere. So in terms of just adoption of tools in AI within companies, what are you seeing there?
关于企业级生成式AI,我认为目前有两类工具表现突出:一类是提升内部生产力的工具,比如编程辅助工具、Slack聊天机器人、内部知识库。许多大型企业都建立了模型封装层,可能对接不同类型的RAG——我们之前讨论过基于文本的RAG,还没涉及代理式RAG或多模态RAG,但这确实是个非常令人兴奋的领域。本质上,这类工具让员工能便捷查询内部文档。
For gen AI in companies, I think there are two types of gen AI tooling I've seen. One is for internal productivity: coding tools, Slack chatbots, internal knowledge. A lot of big enterprises have some kind of wrapper around models, with access to maybe different types of RAG. We talked about text-based RAG — we haven't talked about agentic RAG or multimodal RAG yet, but yes, there's a whole very exciting area around that. Basically, it should allow employees to access internal documents.
比如员工可以询问:'我要休产假/陪产假,相关政策是什么?'或者'这个手术是否在医保范围内?'又或者'我想推荐朋友入职,流程是怎样的?'这些场景都需要内部聊天机器人来辅助运营。另一类工具则侧重对外服务。
So an employee can ask: okay, I'm having a baby — what's the maternity or paternity policy, right? Or: I'm having this operation — does the health benefit cover it? Or: I want to refer my friend — what's the process for that? A lot of this is internal chatbots to help with internal operations. The other category is more customer-facing.
也就是面向客户或合作伙伴的服务。客服聊天机器人是典型应用,比如酒店集团的预订机器人——这类应用规模庞大,我认为企业热衷开发预订/销售机器人是因为能直接量化效果:通过对比人工客服的转化率,可以清晰衡量聊天机器人的表现。
So, customer- or partner-facing. Customer-support chatbots are a big one. If you're a hotel chain, you might have a booking chatbot — and there are a lot of booking chatbots, which is somehow massive. I have this theory that companies pursue applications where they can measure a concrete outcome, and booking or sales chatbots are very clear, right? There's the conversion rate with human operators...
那么聊天机器人的转化率会是多少呢?有时候我觉得结果非常明确,企业很容易就能接受这些解决方案。很多公司都有面向客户的聊天机器人,这算是另一类工具。我之所以面向客户或外部工具,是因为人们倾向于选择结果明确的应用。
...and then what the conversion rate is with a chatbot. The outcomes are very clear, so companies buy into these solutions easily. That's why a lot of companies have customer-facing chatbots — that's another category of tool. And I think for customer- or external-facing tools, people are driven to choose applications with clear outcomes.
因此是否采用它们,实际上取决于企业是否能看到成效。当然这并不完美,因为有时效果不佳并非因为创意或应用本身不好,而是构建过程不够理想。是的,这很棘手。对于内部工具采用,比如内部生产力工具,情况就变得复杂。我认为很多公司在制定AI战略时...
So the question of adopting them really comes down to whether they see the outcome or not. Of course, it's not perfect, because sometimes the outcome is bad not because the idea or the application itself is bad, but because the process of building it wasn't great. Yeah, it's tricky. For internal adoption of tooling — internal productivity — that's where it gets tricky. A lot of companies think about their AI strategy...
我认为AI战略通常有两个关键方面,对吧?一是应用场景,二是人才。你可能拥有优质数据和完善的应用场景,但若缺乏人才就无法实现。在生成式AI初期阶段(现在也是),我常钦佩许多公司的做法——高管们会说:我们需要员工高度了解生成式AI,具备AI素养。
I think an AI strategy usually has two key aspects, right? One is use cases, and the second is talent. You might have great data for great use cases, but if you don't have the talent, you cannot do it. So at the beginning of gen AI — and still now — I really admire a lot of companies for this: executives say, okay, we need our employees to be very gen-AI aware, very AI literate, right?
于是他们开始行动,让团队采用一系列工具,举办大量技能提升研讨会来鼓励学习。我认为这是非常好的举措。他们也愿意投入大量资金,为员工提供各类工具的个人订阅,以提高员工的AI素养。但问题是,之后一些高管会说:我们在这些工具上花了大钱,能看到使用数据……
So what they do is start adopting a bunch of tools for teams to use. They run a lot of upskilling workshops to encourage learning — I think that's a really, really good thing. And they're willing to spend a lot of money on adoption, giving people personal subscriptions to various tools to make employees more AI literate. But here's the thing: some executives then say, okay, we spend a ton of money on this tooling, and we can see the usage...
但似乎人们并没有充分使用它们。问题出在哪里?所以,是的,我认为这很棘手。
...but people don't seem to use them that much. So what's the issue? Yeah, I think that's tricky.
你认为问题出在哪里?是他们不知道如何使用吗?你觉得这里的差距是什么?你认为我们会达到那种'哇,AI让很多公司的工作方式彻底改变'的境界吗?
What do you think is the issue? Is it just that they don't know how to use them? What do you think is the gap here? Do you think we'll get to a place where work is completely different because of AI for a lot of companies?
关键在于生产力提升很难量化。我和很多人交流过,以最简单的编程为例——很多公司正在使用编程助手或AI辅助编程。当我问'你觉得这提高了你的生产力吗'时,得到的回答往往很模糊。
The main thing is that it's really hard to measure productivity gains. I talked to a lot of people. The simplest case is coding, right? A lot of companies are now using coding agents or AI-assisted coding. And when I ask, do you think it helps your productivity, a lot of the answers are very hand-wavy.
感觉就像是:好吧,我觉得情况有所好转,因为我们有更多的PR、看到更多代码。但代码行数当然不是衡量生产力的好指标。所以这真的非常棘手,也有点好笑。我确实建议人们去问他们的经理——我通常与VP级别的人合作,他们手下管理着多个团队。
It's like, okay, I feel like it's been better, because we have more PRs, we see more code. But of course, the number of lines of code is not a good metric for that. So it's really, really tricky, and it's kind of funny. I do ask people to ask their managers — I usually work with VP-level people, and they have multiple teams under them.
所以我问他们,好吧,你会问经理吗?比如,你更希望获得什么?是给团队每个人订阅昂贵的编程助手服务,还是增加一个正式编制?假设这样问,几乎所有的经理都会选择增加编制。但如果你问VP级别或管理多个团队的人,他们会说只要一个好的AI助手这类工具就够了。
So I ask them: okay, go ask your managers — would you rather give everyone on the team a very expensive coding-agent subscription, or get an extra headcount? Almost every manager would say headcount. But if you ask someone at VP level, someone who manages a lot of teams, they'd say the AI tooling.
原因是人们会说,作为经理,因为你还在成长阶段,还没达到管理数十万人的层级。所以对你来说增加一个正式编制很重要,不是因为生产力,而是想要更多人为你工作。而对高管而言,你更关心的是业务指标。
And the reason, people say, is that as a manager you are still growing — you're not yet at a level where you manage hundreds of people. So for you, having one extra headcount is big. You want that not for productivity reasons, but because you just want more people working for you. Whereas as an executive, maybe you have more business metrics that you care about.
所以实际上你会思考,什么才能真正推动生产力指标。是的,这很复杂。我认为关于生产力的问题本质上并不在于某些人是否更高效,而是我们缺乏衡量生产力提升的好方法。另一个同样狡猾的现象是,人们告诉我他们注意到不同层级的员工对AI辅助工具有不同反应。
So you actually think about what actually drives productivity metrics for you. So yeah, it's tricky. And I think the question of productivity isn't fundamentally whether some people are more productive — it's that we don't have a good way of measuring productivity improvement. Another thing is also very tricky: people tell me they notice different buckets of employees having different reactions to AI-assisted tools.
首先我一直用代码编写举例,因为这个领域影响大且更容易说明问题。根据我收到的不同反馈:有个团队负责人告诉我,他认为在所有工程师中,高级工程师使用工具后产出提升最明显。这很有趣——他实际上将团队分为三个层级(但没明说),即当前表现最佳、中等和最差的。
I keep coming back to coding because the impact is big and it's easier to reason about. And I get different reports. One team lead told me that among all his engineers, he thinks the senior engineers get the most output, the biggest productivity boost. That person is very interesting — he actually divided his team into three buckets, without telling them, obviously: currently best performing, average performing, and lowest performing.
然后做了随机试验:给每组半数人使用Cursor工具。他观察到,随着时间的推移,表现最好的高级工程师群体获得了最显著的效率提升。
Then he ran a randomized trial: he gave half of each group access to Cursor. And over time he noticed something funny: the group that got the biggest performance boost, in his opinion, was the senior engineers — the highest performing. So the highest-performing engineers got the biggest boost out of it.
第二组是表现中等的工程师。他的观点是:最高效的工程师本身就更主动,他们会说'不,我要自己解决问题',而工具能帮他们更好地解决问题。至于原本表现最差的员工,他们本身就对工作不太上心。
And the second group was the average performers. His opinion is: the highest-performing engineers are also more proactive — they say, no, I'll solve the problem myself — and the tools help them solve problems better. Whereas the people who were already lowest performing don't care that much about work anyway.
对吧?所以他们更容易进入自动驾驶模式:让AI生成代码就直接照搬,自己并不清楚具体怎么回事。不过另一家公司告诉我,实际上资深工程师最抗拒使用AI工具——因为他们很有主见,标准也很高。
Right? So it's easier for them to go on autopilot: get the AI to generate the code and just ship it, without really knowing how it works. Another company, however, told me that senior engineers were actually the most resistant to using AI tooling — because they're more opinionated and they have very high standards.
他们觉得,好吧,但AI生成的代码太烂了。所以非常非常抗拒使用它。所以我不确定。目前我还无法调和这些截然不同的报告。
They say, okay, but AI code just sucks — so they're very, very resistant to using it. So I don't know; I haven't quite been able to reconcile these very different reports yet.
这真有意思。让我确认下我理解的对不对——有家合作公司对工程团队做了三组测试:将工程师按绩效分为高、中、低三档,然后给部分人比如Cursor的权限。是Cursor吗?还是其他工具?应该是Cursor吧?
This is so interesting. So just to make sure I'm hearing the story, so there's a company I work with that did a three bucket test with their engineering team, where they created three sorts of groups, the highest performing engineers, mid performing engineers, lowest performing engineers, and gave some of them, so they gave some of them access to say cursor. Was it cursor or what did they give them access to? Was cursor. I think
他们说是Cursor。
they said it was cursor.
好的,明白了。
Okay, cool.
其实我没和他们合作,这是家朋友的公司。
And I didn't actually work with them — this is more like a friend's company.
明白了,是朋友的公司。那他们是给半数高效工程师Cursor权限,另一半不给吗?具体怎么分组的?
Okay, it's a friend's company. So did they give like half of the higher performing engineers cursor and half not? Or how did they do the split?
是啊,他们基本上把整个公司分成两半,每个部门也都分成两半。然后他们观察生产效率上的差异。
Yeah, so they gave it to half of the entire company — half of each bucket. And then they observed the difference in productivity.
明白了。那他们具体怎么操作的呢?就是随机分配,有些人能用光标工具,有些人不能?他们是怎么实现的?
I see. Yeah. So how did they even do that? Just like, okay, you get Cursor, you don't get Cursor? How did they manage that?
这太有意思了。
That's so interesting.
确实。具体操作细节我不太清楚,但我很佩服他们能进行随机对照试验。
Yeah. I didn't get into the mechanics of it, but I was like, I respect you for doing a randomized trial.
太酷了。这个工程团队规模有多大?有上百人吗?
That is so cool. Okay, wow. How large was this engineering team? Was it like hundreds of people?
没那么大,大概30到40人左右。
It's not that large. It's about maybe 30 to 40 people.
30到40人啊。哇哦,所以他们发现最高效的工程师从AI工具中获益最大,其次是中等水平的工程师,而表现最差的工程师受益最少。
30 to 40, okay. Yeah. Wow, okay. So they found that the highest performing engineers had the most benefit from using AI tools, then behind them the middle tier, and the lowest performers benefited the least.
是的,但各地情况不同。对吧,就像有些公司确实不一样。你举的这个例子
Yeah. But it's not the same everywhere. Right, some companies are different. This other example you
在这个例子中,资深工程师最抗拒改变工作方式,这点我能理解。目前除了像你这样的ML研究员和AI研究员,我觉得最有价值的就是资深工程师,因为初级工程师的很多工作现在都被AI取代了。但真正懂行的工程师,能在大规模使用AI工具的情况下把控全局,他们就像是能指挥无数初级工程师的超级资产,价值难以估量。
shared, of senior engineers in this one example being most resistant to changing the way they work, which I get. I do feel like the most valuable people right now, other than ML researchers and AI researchers like yourself, are senior engineers, because so much of what junior engineers do is now done by AI. But an engineer who knows what they're doing, who understands how things work at a large scale, with AI tools basically giving them infinite junior engineers doing their bidding — that feels like an extremely valuable and powerful asset.
确实,我非常认同你们公司重视那些能全面理解系统、具备优秀解决问题能力的工程师。要有全局思维而非局部思维。正如我所说,我们公司现在的工作方式完全不同了。他们重组了工程部门,让更多资深工程师参与代码评审。因为他们需要制定工程规范,明确流程标准,所以编写了大量关于如何高效工作的流程文档。
Yeah, I definitely appreciate that companies value engineers who have a good understanding of the whole system and strong problem-solving skills — thinking holistically instead of locally. One company I've seen, as I said, works completely differently now. They actually restructured their engineering org so that senior engineers do more of the code review. They spent time writing guidelines on what good engineering practice looks like, what the process should be — they wrote a lot of process documentation on how to work well.
然后让初级工程师主要负责编写代码和提交PR,而资深工程师更多承担评审工作。我觉得这是在为未来做准备。另一家公司也告诉我类似的策略——未来可能只需要一小群顶尖工程师来制定流程和审核代码,让AI或初级工程师负责生产代码。但问题来了:怎样才能成为这样的顶尖人才?
And then they have junior engineers mostly producing code and submitting PRs, while senior engineers do more of the reviewing. So I thought it might be preparing for the future. Another company actually told me something very similar: they're kind of preparing for a future where they only need a very small group of very, very strong engineers to create processes and review code going into production, while AI or junior engineers produce the code. But then the question becomes: how does one become a very strong engineer?
没错,这正是问题所在。
That's right. That's right. That's the problem.
是啊,我也不清楚具体路径。我最近就在思考这个问题...
Yeah, I don't know what the path is. I've been thinking about that recently.
没人考虑这个问题。这是个危机。照这样下去,十到二十年后就不会再有工程师了,因为没人雇佣初级工程师。不过我认为现在刚接触计算机科学的'AI原生代'如果保持好奇心,不把学习和思考完全交给AI,而是借助AI来学习优质编码和架构设计,理论上他们可能会快速成长为最优秀的工程师。
No one's thinking about it. It's a problem. In ten, twenty years there'll be no more engineers, because no one's hiring junior engineers. Although I could make the case that people just getting into computer science right now are AI native, and in theory, you could argue they will become really good really fast if they're curious — if they aren't just delegating learning and thinking to AI, but are using it to learn how to code well and architect correctly. You could argue they will be the most successful engineers in the future.
我确实认为我之前提到的,比如从写代码者转向架构师的转变,我将其归类于系统思维中。我认为这是非常重要的技能,因为AI可以自动化许多独立的技能,但如何综合运用这些技能来解决问题却很困难。斯坦福大学计算机系课程主席Mehran Sahami教授(我最喜欢的教授之一)最近在一个网络研讨会上提到这一点。
I do think that what I mentioned — say, moving from coder to architect — I grouped that under systems thinking. I do think it's a very important skill, because AI can help automate a lot of discrete skills, but knowing how to use those skills together to solve a problem is hard. So there's a webinar with Mehran Sahami, one of my favorite professors. He was the chair of the curriculum at the CS department at Stanford.
他花了大量时间思考计算机教育,比如在AI编程时代学生应该学什么。另一位嘉宾是AI领域的传奇人物吴恩达。Sahami教授提出了一个非常有趣的观点:很多人认为计算机科学就是编程,但其实不是。
So he spent a lot of time thinking about CS education, right? Like what students should learn in the era of AI coding. And the other person is Andrew Ng, who of course is a legend in the AI space. And Sahami said something very interesting. He said a lot of people think that CS is about coding, but it's not.
编程只是实现目标的手段。计算机科学的核心是系统思维——用编程解决实际问题。问题解决能力永远不会过时,因为随着AI自动化更多事务,问题只会变得更复杂。但理解问题根源并逐步设计解决方案的过程将永远存在。以我使用AI调试时遇到的诸多问题为例...
Coding is just a means to an end. CS is about systems thinking — using coding to solve actual problems — and problem solving will never go away. As AI automates more stuff, the problems just get bigger. But the process of understanding what caused an issue and designing a step-by-step solution to it will always be there. As an example, I actually have a lot of issues with AI when it comes to debugging.
我不确定你是否经常用AI辅助编程,但据我和朋友们的观察:AI在处理定义明确的任务时表现很好,比如写推荐信、修复特定功能或从零开发应用。但当需要与现有大型代码库交互,或涉及更复杂的组件协作时,AI通常就不太行了。比如我曾用AI部署应用时...
I'm not sure if you use a lot of AI for coding, but something I have noticed, and also seen from my friends, is that it's pretty good when you have very clear, well-defined tasks — maybe write a recommendation letter, fix a specific feature, or build an app from scratch, where it doesn't have to interact with a large existing codebase. But for anything a little more complicated, maybe requiring interaction with other components, it's usually not that good. For example, I was using AI to deploy an application.
当时我在测试一个不熟悉的新托管服务。AI给我的价值是尝试新工具的信心——以前接触新工具需要先研读大量文档,现在AI让我可以直接边试边学。
I was testing out a new hosting service I wasn't familiar with. What AI does give me is the confidence to try a new tool. Before AI, trying a new tool meant reading a lot of documentation first; now AI makes it like, okay, just try it out and learn as you go.
但在测试这个托管服务时不断遇到bug,非常烦人。我反复让AI修复,它给出的方案五花八门:改环境变量、修代码、换函数、甚至换编程语言...但统统不奏效。
So I was testing out this new hosting service and I kept getting a bug. It was very, very annoying. I asked Claude Code to fix it, and it kept changing things — maybe change the environment variable, fix the code, maybe change from this function to that function, maybe change the language, maybe it doesn't process JavaScript well — I don't know, whatever. And it didn't work.
最后我决定自己阅读文档排查,结果发现问题出在服务层级——我需要的功能根本不在当前订阅套餐里。这让我意识到:AI只会从错误的方向修修补补,而真正需要的是理解各组件协作关系并定位问题源头。
So I was like, okay, that's it, I'm going to read the documentation myself and see what's wrong. And it turns out I was on a different tier — the feature I wanted is not available in that tier, right? So the issue with Claude Code was that it kept trying to fix things in one component, while the issue came from a different component entirely. So understanding how different components work together, and where the source of an issue might come from —
你需要提供一个全面的视角。这让我思考,我们该如何教会AI进行系统思考,就像让所有人类专家那样,他们写作时往往基于一个框架。比如遇到这类问题,先看这个,再看那个,然后处理其他事项。所以我认为这可能是一种方法,但这让我想到,我们该如何教会人类进行系统思考。
you need a holistic view of it. And it made me think: how do we teach AI systems thinking like that? Human experts very much work off a scaffold — for this kind of problem, look into this, then look into that, and so on. So I think that could be one way. But it also made me think: how do we teach humans systems thinking?
是的,我认为这是非常有趣的技能,确实非常重要。
Yeah, I think it's a very interesting skill. I do think it's very important.
这与Brett Taylor在播客中分享的见解完全一致。他是Sierra的联合创始人,创建了谷歌地图,曾任Salesforce、Quip等公司的CEO。我问他,人们是否应该学习编程。
That's exactly the same insight Brett Taylor shared on the podcast. He's the co founder of Sierra. He created Google Maps. He was CEO of Salesforce, Quip, a few other things. And I asked him just like, should people learn to code?
他的观点与你所说完全一致,即学习计算机科学课程不是为了掌握Java或Python,而是理解系统如何运作、代码如何运行以及软件如何广泛工作,而不仅仅是学会某个功能。我想帮助人们理解的是,你写了这本《AI工程》的书,本质上是在帮助人们认识这种新型工程师。你对机器学习工程师与AI工程师的区别有个非常简洁的比喻,这与现在产品经理的情况很相似——AI产品经理与非AI产品经理。你的描述是:机器学习工程师自己构建模型,AI工程师利用现有模型构建产品。还有什么想补充的吗?
His point is exactly what you said, which is learning, taking computer science classes is not about learning Java and Python. It's learning how systems work and how code operates and how software works broadly, not just here's like a function to do a thing. One thing that I wanted to help people understand, you wrote this book called AI Engineering, which is essentially helping people understand this new genre of engineer, and you have this really simple way of thinking about the difference between an ML engineer and an AI engineer, which has a really good corollary to product managers now, of just like an AI product manager versus a non AI product manager. The way you describe it and fill in what I'm missing is just, ML engineers build models themselves, AI engineers use existing models to build products. Anything you wanna add there?
我写书时最不喜欢的一点就是必须下定义。我认为没有完美的定义,总会有边缘情况。但总的来说,我认为生成式AI更像是一种服务,当有人为你构建模型且基础模型性能足够好时,它能让人们轻松地将AI集成到产品中,甚至不需要了解核心设计原理——尽管知道这些会很有帮助。这大大降低了想要用AI构建产品的人的入门门槛。
One thing I really dislike about writing books is that you have to define things like this. And no definition will be perfect, because there will always be edge cases. But in general, I think of it as gen AI as a service: somebody builds the models for you, and the base model performance is pretty strong. It's enabled people to integrate AI into their products without needing to learn how the core models are designed, even though knowing that could really help. It makes the entry barrier really low for people who want to use AI to build products.
与此同时,AI的能力如此强大,也拓展了其可能应用的领域。因此,虽然入门门槛极低,但我看到对AI应用的需求激增。这非常令人兴奋,仿佛打开了一个全新的可能性领域。
And at the same time, AI capabilities are so strong that they've also expanded the possibilities — the types of applications AI can be used for. So yes, the entry barrier is super low, and the demand for AI applications is a lot bigger. It feels very exciting. It opens up a whole new world of possibilities.
是啊,现在你甚至不需要花时间构建这个AI大脑,直接就能用它做事。真是巨大的解放。好吧,也许最后一个问题:你经常能看到哪些方法有效、哪些无效,以及行业发展趋势。
Yeah, now you don't even have to spend time building this AI brain — you can just use it to do stuff. Such an unlock. Okay, maybe just a final question. You get to see a lot of what's working, what's not working, where things are heading.
我很好奇,如果你必须思考未来两三年内行业的发展方向,你认为产品构建方式会有何不同?公司运营模式将如何改变?如果让你预测未来几年企业运作方式的最大变革会是什么?
I'm curious just if you had to think about in the next two or three years, where things are heading, what do you think, how do you think building products will be different? How do you think companies working will be different if you had to think of maybe the biggest change we expect to see in the next few years in terms of how companies work?
我认为很多组织行动并不迅速。但与此同时,它们的变革速度又超出我的预期——因为说实话,我们对那些恐龙企业存在偏见。许多来找我的高管其实很有前瞻性。所以可能我个人更倾向于认为组织应该快速行动。
I think a lot of organizations don't move that fast. But at the same time, they also move faster than I expected — because, again, I think we're biased about the dinosaur companies. A lot of the executives who come to me are very forward-looking. So maybe I'm biased toward organizations that move fast.
因此,我认为一个重大变化将发生在组织结构层面。过去我们有很多割裂的团队,比如泾渭分明的工程团队和产品团队。但现在问题来了:该由谁来制定评估标准?
So I think one big change I see is in organizational structure. Before, we had a lot of disjointed teams — a very clear engineering team, product team. But now there's a question of who should write the evals, right?
究竟谁该对指标负责?事实证明评估不是独立问题,而是系统性问题。因为需要考察不同组件的交互关系,必须基于用户行为——只有了解用户关注点,才能制定出反映用户关切的评估框架。
Who should own the metrics? And it turns out evals are not a separate problem — it's a systems problem, right? You need to look into the different components and how they interact with each other. You need to look at user behavior, because you need to know what users care about so you can write evals that reflect what users care about.
所有这些都可以通过审视组件架构、设置防护机制来实现。这看似是工程问题,但理解用户本质上是产品团队的职责。正因如此,产品团队、工程团队乃至用户获取等营销团队会比以往更紧密协作。我认为未来组织结构会促进这些原本泾渭分明的职能间加强沟通。
All of that you can sort out by looking at your different component architectures, placing guardrails, and so on — that's engineering. But understanding users is what product does, right? And because it's extremely important, it brings the product team, the engineering team, even marketing teams like user acquisition, very close together. So I think orgs are restructuring so there's more communication between previously very distinct functions.
另一个变化是关于团队职能自动化。我常思考未来几年哪些能自动化,哪些不能。现实是很多岗位已在消失,这想法确实有些骇人。但团队会说:这些非核心职能本就可外包,现在通过系统化和AI就能实现自动化替代。
Another thing is team functions being automated. I think about what can be automated in the next few years and what cannot. And I've seen that teams are already shedding roles — it's actually a little bit scary to think about — but teams will say, okay, we got rid of these functions, right? A lot of them were previously outsourced: traditionally, a business outsources what's not core to it, work that can be systematized. And with that, you can actually use AI to automate a lot of it.
因此我们更需思考:初级和高级工程师的价值差异,如何重构工程组织架构。这确实是组织成功的关键——我正在重新调配资源,考虑是否需要开拓新用例,由谁来主导。关于AI,我不确定这个观点是否正确,但我认为基础模型虽未达极限,但短期内不太会出现突破性的超级模型——就像当初GPT问世时那样。
So separately, we're thinking more about what the value of junior versus senior engineers is, and how to restructure the engineering org for that. I definitely think that's one big thing for organizations — moving pieces around, thinking about use cases, whether you need to spin up new use cases, and who would lead the new effort. That's one big change. Another thing, in terms of AI — I'm not sure how true this is, but I'm in the camp that thinks base models are probably not quite maxed out, yet we're unlikely to see a really, really crazy strong model again. Like, remember when we had GPT, right?
GPT-2是一个重大升级,像是自动化的进步。就像GPT到GPT-3是巨大飞跃,GPT-4也是质的跨越。那么GPT-5是否会有同样量级的突破?这存在争议。我认为基础模型的性能提升可能不会带来颠覆性震撼。
And GPT-2 was a big step up, automatically better. Then GPT-3 was much, much bigger, and GPT-4 much, much bigger again. And now of course there's GPT-5 — but is GPT-5 that same scale of step jump compared to the previous one? I think it's a debate. So I think the base model performance improvement is not going to be mind-blowing.
过去三年间的发展表明,我们将在训练后阶段和应用构建阶段看到大量改进。我对多模态领域特别感兴趣——虽然已有很多文本应用,但音视频用例的发展前景更令人振奋。
Not the way it was in the last three years. So I think there are a lot of improvements we're going to see in the post-training phase and in the application-building phase. I'm also very interested in multimodality. We've seen a lot of text-based applications, but I think there are a lot of audio and video use cases that are very, very exciting.
音频技术并不像表面那么简单,我曾与几家语音初创公司合作。语音交互完全是另一个维度——比如从文本聊天机器人转向语音聊天机器人,这完全是两套不同的系统架构。
And I think audio is not quite as solved as one might think, because I do work with a couple of voice startups. When you think about voice, it's an entirely different beast. Say you have a chatbot, right, and you go from a text chatbot to a voice chatbot — the considerations are completely different.
语音聊天机器人需要考虑延迟问题:从语音转文本、文本处理、生成回答再到语音输出,这个多跳过程使得延迟控制至关重要。另一个关键是如何实现自然对话——比如人类交流时会通过'嗯哼'这样的声音反馈来实现流畅对话而不中断。
Because now with a voice chatbot, we need to think about latency. There are multiple steps: first voice to text, then text question to text answer, then text answer back to voice, right? It's multiple hops, so latency becomes very important. And there's the question of what makes it sound natural. For example, when humans talk to each other, if you try to interrupt me and say, "Chip, that's right" —
我会适当停顿倾听对方,但有时用简单语气词表示聆听而不打断对话。这种电话交谈中的中断处理机制直接影响对话自然度,这还涉及相关法规要求。
I would pause and try to hear you out, right? But sometimes you just use a word to acknowledge me — "mhmm, mhmm" — and then I shouldn't stop, I should just continue. So the question of handling interruptions, and whether to stop or not, is a big part of what's perceived as natural conversation. And there are also regulations, right?
虽然开发者希望语音机器人能模仿人类,但可能需要法规强制披露对话对象的真实属性(人类或AI)。这个领域比想象中复杂,不完全是基础模型能解决的问题。
A lot of the time people want to build voice chatbots that sound like humans, maybe even trick users into thinking they're talking to humans. But there may also be regulations saying you have to disclose to users whether they're talking to a human or an AI. So there's a whole space here. It's not quite as solved as you might think — but it's also not quite a foundation model problem, right?
人类对话中断检测其实是经典机器学习问题,可以构建分类器解决。这本质上是巨大的工程挑战而非纯AI问题——当然现在有人尝试构建端到端语音模型,跳过语音转文本的中间步骤。
Human interruption detection is actually a classical machine learning problem — one framing is that you can build a classifier for it. A lot of it is a massive engineering challenge, not an AI challenge. Of course, it can be an AI challenge, because people are trying to build voice-to-voice models: instead of first transcribing my voice into text, getting a model to answer, and getting another model to turn the text back into speech,
你可以直接发送语音到语音。这是我们正在努力的方向,但这确实非常困难。是的。所以即使是音频,我认为它也比视频更容易处理,对吧?
you can just go from voice to voice directly. So that is something being worked on, but it's very hard. Yeah. And even audio, I think of as easier than video, right?
因为视频既有图像又有声音。这已经相当困难了。所以我认为这个领域存在很多挑战。
Because video has both image and voice. Audio alone is already pretty hard. So I think there are a lot of challenges in that space.
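The two engineering points in this exchange — multi-hop latency and distinguishing a backchannel ("mhmm") from a true interruption — can be sketched in a few lines. This is an illustrative sketch only, not any real product's logic: the word list, the one-second threshold, and the stage names are all hypothetical assumptions.

```python
# Hypothetical sketch: interruption handling as a tiny classification
# problem, plus the multi-hop latency budget of a voice pipeline.
# Word list and thresholds are made up for illustration.

BACKCHANNELS = {"mhmm", "mm-hmm", "uh-huh", "yeah", "right", "okay"}

def is_true_interruption(transcript: str, duration_sec: float) -> bool:
    """Decide whether the bot should stop speaking.

    A short utterance made entirely of acknowledgement words is treated
    as a backchannel ("keep talking"); anything longer, or with real
    content, is treated as a true interruption ("stop and listen").
    """
    words = transcript.lower().split()
    if not words:
        return False
    all_backchannel = all(w.strip(".,!?") in BACKCHANNELS for w in words)
    # Hypothetical threshold: brief acknowledgements don't stop the bot.
    if all_backchannel and duration_sec < 1.0:
        return False
    return True

def total_latency(stage_latencies_ms: dict) -> float:
    """Latency of a voice turn is the sum over every hop in the chain."""
    return sum(stage_latencies_ms.values())

# A listener saying "mhmm" for 0.4s should not cut the bot off:
print(is_true_interruption("mhmm", 0.4))   # False
# The speech-to-text -> LLM -> text-to-speech hops add up:
print(total_latency({"speech_to_text": 300, "llm": 800, "text_to_speech": 250}))
```

In a real system the classifier would be learned from labeled audio rather than a word list, but the framing — a fast, cheap stop/continue decision running alongside the main pipeline — is the point being made here.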
这真是一份令人惊叹的清单。让我快速反馈一下。你预测未来几年我们工作方式将发生的变化。这些观点实际上与我在这档播客中的许多对话产生了共鸣,正好印证了当前的发展趋势。首先是不同职能之间的界限模糊化,比如工程设计工程师,现在每个人都要做很多不同的事情。
That was an awesome list of things. Let me mirror it back real quick — what you're predicting for the next few years, the things that will change in how we work. These actually resonate with so many conversations I've had on this podcast, so it's kind of doubling down on where things are heading. One is the blurring of lines between different functions — engineering, design — everyone's going to be doing a lot of different things now.
其次是更多工作将被代理和这些AI工具自动化,理论上生产力会提高。第三是从预训练模型转向后训练、微调等,因为正如你所说,模型在智能提升方面可能正在放缓。不过我要提一下与Anthropic联合创始人的Ed Chat访谈,他提出了一个很好的观点:我们很难理解指数级增长的感觉,我们正身处其中,而且模型发布更频繁了,所以它们之间的差异我们可能注意不到,因为GPT-3是在GPT-2之后大约一年才发布的。也许是这样,也许不是。你提到的第四点是投资多模态体验的理念。
Two is more work being automated with agents and all these AI tools, and, in theory, productivity going up. Third is shifting from pre-training models to post-training, fine-tuning, and things like that, because to your point, models may be slowing down in how smart they're getting. Although I'll point folks to the chat with the co-founder of Anthropic — he made a really good point here: we're really bad at understanding what exponentials feel like, we're in the middle of one, and models are being released more often, so we may not notice the difference between them, versus GPT-3 coming out a year or so after GPT-2. So maybe true, maybe not. And the fourth point you made is this idea of investing in multimodal experiences.
我迫不及待希望ChatGPT的语音模式在打断功能上做得更好,就像你说的那样。我正在和它说话时,它会发出一点声音然后说'停'。然后你必须...然后它就停止说话了。太烦人了。
I cannot wait for ChatGPT voice mode to get better at interruption, like exactly what you're saying. I'm just like talking to it and it makes a little sound and it's like, stop. And then you have to, and then it stops talking. It's so annoying.
我很惊讶我们家里还没有更好的语音助手。我测试了很多产品,总是抱着希望:'天啊,Zach可能是那个对的'。但不知道有多少产品只是勉强能用,因为它们还不够好。
I'm shocked that we don't have better voice assistants at home yet. I have been testing out a bunch. I keep hoping, oh my God, Zach could be the one. And then — I don't know, I gave up on so many of them because they're just not that good.
我认为它即将到来。我听说快来了。Anthropic正在与某家公司合作,我不确定是否已经发布。
I think it's coming. I hear it's coming. Anthropic is working with someone that I don't know if it's launched or not yet.
是的,抱歉,我想回到你提到的那个点,就像作为嘉宾,Anthropic提到性能改进时说的。我认为这是个重大变化。关键在于模型基础能力与感知性能之间的差异。比如预训练模型与实际表现之间的区别。简单来说,你是否熟悉‘测试时间计算’这个概念?
Yeah, sorry, I want to circle back to what you mentioned — what your guest from Anthropic said about performance improvement. I think there's a big change there: the difference between a model's base capability, meaning the pre-trained model, versus its perceived performance. Are you familiar with the term test-time compute?
我不太清楚,请解释一下。
I don't think so. Help us understand.
这些理念大致是:假设你有固定量的计算资源对吧?你会花大量计算在预训练上,然后还要投入不少计算在微调上。预训练与后训练的计算资源比例在不同实验室间差异巨大。此外还需要在推理上消耗计算——当我有一个训练好的模型要服务用户时,我输入问题或提示,它就需要实时进行推理,这都需要计算资源。
The idea is: you have some fixed amount of compute, right? You're going to spend a lot of compute on pre-training the model, and then some compute on fine-tuning. The ratio of pre-training to post-training compute varies crazily between different labs. And then you also have to spend compute on inference: once I have a trained, fine-tuned model and want to serve it to users, a user types a question or prompt, the model does inference, and that requires compute.
我在想,如果讨论应该把更多计算资源分配给预训练、微调还是推理时?因为推理环节在测试阶段会消耗双倍计算资源。把更多计算用于推理就意味着采用‘测试时计算’策略——即分配更多计算资源来生成推理结果,实际上能带来更好的性能表现。具体怎么做呢?比如你有个数学问题:
So there's a discussion of: should I spend more compute on pre-training, fine-tuning, or inference? Inference happens at test time, so spending more compute on inference is called test-time compute — a strategy of allocating more compute resources at inference time, which can actually bring better performance. How does that work? Let's say you have a math question, right?
与其只生成一个通用答案,可以产生四个不同答案然后根据某个标准选择最优解。或者比如四个答案中有三个说是42,一个说是20,那么根据多数一致原则答案应该是42对吧?这就相当于让模型批量生成多个候选方案。
Maybe instead of generating just one answer, you get four different answers and pick whichever is best according to some standard. Or, say I have four answers, and three of them say 42 and one of them says 20 — three of them agree, so the answer should be 42, right? So you just have the model generate a bunch of candidates.
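The majority-vote flavor of test-time compute described here can be sketched in a few lines. This is an illustrative stub, not any lab's implementation; the `samples` list stands in for answers drawn from a model with sampling temperature above zero.

```python
from collections import Counter

def majority_vote(answers: list) -> str:
    """Pick the most common answer among several sampled candidates.

    This is the simple self-consistency strategy from the discussion:
    spend more inference compute to sample several answers, then trust
    the one most of them agree on.
    """
    counts = Counter(answers)
    best, _ = counts.most_common(1)[0]
    return best

# Stand-in for four sampled model answers to the same math question:
samples = ["42", "42", "20", "42"]
print(majority_vote(samples))  # "42"
```

The other strategy mentioned — generating more thinking tokens before the final answer — spends the extra compute inside a single sample instead of across several, but the trade is the same: more inference compute for better perceived performance from the same base model.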
另一种情况是:很多时候推理思考就像人类不应该只生成少量思维标记,而应该花更多时间思考再给出最终答案。这需要更多计算资源,但能带来更好的性能。从用户角度看,当模型花更多时间探索不同潜在答案、进行更长时间思考时,就能给出优质得多的最终答案。
Another approach is reasoning: the model generates more thinking tokens, spending more time thinking before showing the final answer. It requires more compute, but it gives better performance. So from the user's perspective, when the model spends more time exploring different potential answers and thinking longer, it can give you a much better final answer.
但基础模型本身并没有改变。
But the base model itself does not change.
是的,确实如此。绝对正确。这是对Ben Mann观点很好的补充。Chip,我们讨论了很多内容。我已经了解了我希望学习的所有内容,甚至更多。
Yes, that makes sense. Absolutely. That is a good corollary to Ben Mann's point. Chip, we covered a lot of ground. I've gone through everything I was hoping to learn and more.
在我们进入激动人心的闪电问答环节之前,你还有什么想分享的吗?还有什么想留给听众的?
Before we get to our very exciting lightning round, is there anything else that you wanted to share? Anything else you want to leave listeners with?
我在几家做这类事情的公司工作,他们希望员工能提出创意。所以关于AI战略有个大争论:自上而下还是自下而上更好?高管是否应该提出一两个杀手级用例,然后集中资源?
So I do work with a few companies that want employees to come up with ideas. There's a big debate on what the better approach to AI strategy is: should it be top-down or bottom-up? Should executives come up with one or two killer use cases and allocate everyone's resources to that?
还是应该让工程师、产品经理和聪明人自己提出想法?其实两者需要结合。有些公司会说:我们雇了一群聪明人,看看他们能想出什么。他们会组织黑客马拉松或内部挑战赛来激发产品创意。
Or should you let engineers and PMs and smart people come up with ideas? In practice it's a mixture of both. Some companies say, okay, we hired a bunch of smart people, let's see what they come up with, and they organize hackathons or internal challenges to get people to build products.
我注意到很多人根本不知道该开发什么。这让我很震惊,感觉我们正面临某种创意危机。现在我们有这么多酷炫工具,可以从零开始做任何事——设计、写代码、建网站。理论上应该涌现更多创意,但人们却不知为何卡住了。
And one thing I noticed is that a lot of people just don't know what to build. It shocked me — I feel like we're in some kind of idea crisis. Now we have all these really cool tools to do everything from scratch: they can design for you, write code for you, build websites for you. So in theory, we should see a lot more. But at the same time, people are somehow stuck — they don't know what to build.
我认为这可能与社会期望有关。我们已进入高度专业化阶段,人们被要求专注于单一领域而非全局。缺乏全局观时,就难以产生该开发什么的想法。所以和这家公司合作时,我们进行了大量头脑风暴。
And I think a lot of it may have to do with societal expectations. We've gone into this phase of specialization — people are very highly specialized, supposed to focus on doing one thing really well instead of seeing the big picture. And without a big-picture view, it's hard to come up with ideas for what to build. So when I worked with this company, we brainstormed a ton.
我们确实制定了创意指南。通常我们会建议:回顾过去一周,留意你做了什么和什么让你沮丧。当遇到挫折时,思考能否用不同方式解决?
We did work out a guideline for how to come up with ideas. One tip is: look back at the last week, right? For a week, just pay attention to what you do and what frustrates you. And when something frustrates you, think about whether there's anything you can do — can it be done a different way,
这样就不令人沮丧了?你也可以和团队成员交流心得。如果你看到常见的困扰,或许可以考虑围绕它构建解决方案。所以我觉得,只需观察我们的工作方式,持续提问:怎样才能做得更好?然后针对这些困扰构建解决方案。
so that it's not frustrating? And you can swap notes with people on your teams. If you see common frustrations, maybe that's something you can build around. So, yeah, just notice how you work, constantly ask questions — how can this be better? — and then build something to address the frustrations.
我认为这是学习和采用人工智能的好方法。
I think it's a good way to just learn and adopt AI.
我想人们每次打开这些Vibe编码工具时,都会感受到你描述的那种体验——你可以描述任何想要的东西。而我却想:我不知道自己想要什么。我非常喜欢这个实用的建议:关注那些让你感到沮丧的地方。比如,我刚构建了一个很酷的小型Vype编码应用。当时我在Google文档里写新闻稿,粘贴了一堆截图之类的图片,然后突然想起:哦对了,你无法从Google文档中提取图片。
I think people have felt exactly what you're describing every time they open up one of these vibe coding tools, where they could describe anything they want. I'm like, I don't know — what do I want? I love this very tactical piece of advice: just pay attention to where you're frustrated. For example, I just built a very cool little vibe-coded app. I was working on a newsletter post inside Google Docs, and I pasted all these images into the Google Doc from screenshots and stuff, and then I remembered: oh yeah, you can't take images out of Google Docs.
这就像《加州旅馆》的体验——你可以粘贴内容进去,但很难把图片再弄出来。于是我直接去所有Vibe编码工具里,构建了一个应用:只要输入Google文档链接,就能自动下载所有图片。效果出奇地好,界面也很可爱,我会把链接放在节目说明里。
It's this Hotel California experience: you can paste stuff into it, but it's very hard to get images back out. So I just went to one of the vibe coding tools and built an app where I can give it a Google Doc URL and it lets me download all the images automatically. It worked amazingly well, it made it really cute, and I'll link to it in the show notes.
哦,我很想看看。我非常看好用AI创建微型工具,就是那些能让生活稍微轻松点的小东西。
Oh, I would love to see that. I'm very bullish on using AI to create micro tools — little things that make your life a bit easier.
百分之百同意。我觉得这正是人们使用这些工具的主要方式之一——解决他们遇到的小众问题。说到这里,Chip,我们进入激动人心的快问快答环节。我有五个问题,你准备好了吗?
And 100%. I feel like that's one of the main ways people are using these tools, just like a little niche problem they have. With that, Chip, we've reached our very exciting lightning round. I've got five questions for you. Are you ready?
当然,随时可以。等等——这取决于问题有多难。
Yeah, always. No, no. Depends on how hard the questions are.
他们对每位嘉宾的提问都非常一致。所以我想你应该听过这些问题。第一个问题:你发现自己最常向别人推荐的两三本书是什么?
They're very consistent across every guest. So, I imagine you've heard them before. First question, what are two or three books that you find yourself recommending most to other people?
其实我很害怕推荐书籍,因为我觉得一个人该读什么书完全取决于他们的需求、人生阶段和目标。不过确实有几本书彻底改变了我的思维方式和世界观。比如《自私的基因》,这本书帮我解答了是否要生孩子的问题——它让我明白我们的许多行为模式本质上都是基因功能的体现,而基因的首要目的就是繁衍。当然,这本书还提出了另一个观点:其实每个人都渴望永生对吧?
I'm really terrified of book recommendations, because I feel like what books a person should read really depends on what they want, where they are in life, and what they want to get to. But several books have really changed the way I think and see the world. One is The Selfish Gene. It actually helped me with the question of whether I want to have kids or not, because it helped me understand that a lot of how we operate is a function of our genes, and genes want one thing: to procreate. The book also proposes another idea — everyone wants to live forever, right?
这种渴望可能不是显意识的,但潜意识里确实存在。我认为有两种实现方式:一种是基因层面的永生——基因会永远延续;另一种是模因(文化基因)层面的,就像某些思想被传播后能长久存在,某种意义上也是一种'活着'。虽然这个概念有点抽象,但非常有趣。
And maybe not consciously, but subconsciously, we do want that. And there are two ways: one is through genes — your genes just continue forever — and the other is through memes, through ideas. If you put some ideas out there and they last for a long time, in a way you live on. I know it's a little abstract, but it's very interesting.
我特别喜欢的另一本书是新加坡前领导人——被誉为新加坡国父的李光耀的著作(书名记不清了)。他在二十五年间将新加坡从第三世界国家发展为第一世界国家。我从没见过哪位国家元首如此详尽地记录治国方略:如何制定能引导民众为国效力的公共政策,如何处理外交关系等等。这本书提供了难得的系统思考视角。
The other book I really, really like is by Singapore's previous leader — he's known as the father of Singapore — Lee Kuan Yew. I'm not sure of the exact title, but he was the one who took Singapore from a third-world country to a first-world country within twenty-five years. I have never seen any country's leader put so much effort into writing down his thoughts on how to build a country like that. He talks a lot about public policy — how to create policies that encourage people to do the right things for the nation — and also about foreign affairs, foreign policy, the country's relations with others. So it's a really good book to think about.
对我们普通人而言,这种国家层面的系统运作是终生难以亲身实践的,所以通过这本书来学习特别有价值。
For me, it's system thinking, but a different kind of system — a country — which most of us never get a chance to experiment with in our lives. So it's good to learn about that.
你提到的第二本书叫什么名字?
What was the name of that second book?
书名应该是《从第三世界到第一世界》。我好像把书放这附近了。
It's called From Third World to First. I think I have it somewhere here. Yeah.
是的,非常
It is. It's very
史诗级的书。
epic book.
太棒了。我绝对想读这本书。真是个好建议。我听过很多关于他影响力的讨论,也在推特上看过许多视频,都是关于他如何建设繁荣社会的深刻见解。
That's awesome. I definitely want to read that. That's a really good tip. I've heard a lot about the impact he's had, and I've seen all these videos on Twitter of his really wise insights into how to build a thriving society.
说真的,他哪来的时间写这么厚的一本书?简直难以置信。
Seriously, how does he even have time to write such a thick book? It's insane.
克劳德,请总结一下。开个玩笑。顺便说,《自私的基因》这本书我也非常喜欢。选得太好了。这是一本不太为人所知却彻底改变我世界观的书。
"Claude, please summarize." I'm just joking. By the way, The Selfish Gene — I also absolutely love that book. That is such a good choice. It's such an under-the-radar kind of book that really changed the way I see the world as well.
所以,选得真好。好了,下一个问题。你最近有没有特别喜欢的电影或电视剧?
So, really good pick. Okay, next question. Do you have a favorite recent movie or TV show you really enjoyed?
我看了很多电影和电视剧作为研究,因为我在写我的第一部小说,而且最近已经卖出去了。所以我很好奇是什么让它——这是一部剧情片,不是科幻或科技人士通常看的那种。这非常...我知道这很出人意料,非常...所以几乎像是通过看电视来了解哪些故事会流行,试图理解那些套路之类的东西。所以我不确定观众是否...
I watch a lot of movies and TV shows as research, because I'm working on my first novel, and I recently sold it. So I'm interested in what makes them work. It's a drama — not science fiction or anything tech people usually read — so I know it's very out of left field. It's almost like watching TV to see what kinds of stories become popular, trying to understand the tropes and things like that. So I'm not sure if the audience is...
那么,举个例子?有没有哪部作品让你对写作有所领悟?
Well, what's one? What's one that taught you something about writing?
我觉得像是《延禧攻略》,一部中国电视剧。
I think it's Story of Yanxi Palace. It's a Chinese TV show.
不错。好的。我之前没听说过这部。挺好的。嗯。
Cool. Okay. I haven't heard of that one before. Cool. Yeah.
下一个问题。你有没有一句人生格言,在遇到困难时——无论是工作还是生活——会经常想起并从中获得力量?
Next question. Do you have a life motto that you often think about or come back to when you're dealing with something hard, whether in work or in life?
这听起来可能有点虚无主义。我认为社会最终就像...其实什么都不重要。我常想,在亿万年的时间尺度上,什么都不会留下,没人会记得。我知道可能有人会反驳这个观点。所以我的理论是:十亿年后,我们所有人都不复存在。
This sounds very nihilist. I think, in the end, nothing really matters. I usually think about it in the grand scheme of things: in a billion years, nothing will be left, no one will remember. Okay, someone will argue with me about that, so let me put it this way: my theory is that in a billion years, none of us will exist.
所以无论我们做了多混乱、多疯狂的事,或者把事情搞得多糟,反正没人会记得。某种程度上这听起来可怕,但反而让人解脱——它让我觉得:那就尽管尝试吧,有什么关系呢?最近有个例子:我一位家人刚去世,因为无法回家,我就问父亲...
So whatever messy, crazy things we do, or however badly we do them, no one will remember. In a way that sounds scary, but it's very liberating, because it allows me to just try things out — why does it matter? And there's a story from recently: a family member of mine passed away, and I was talking to my dad because I couldn't be home for it.
我问父亲:我能做些什么让逝者得到安慰吗?父亲反问我:此刻他还能需要什么呢?这让我意识到生命终点时,物质无法带来快乐——金钱、商品都没意义。这促使我思考:到底什么才是真正重要的?所以现在觉得,就算搞砸合同或失败又怎样?在生命尽头,这些根本无关紧要。
I was asking my dad, is there anything I can do for him — something to comfort him, anything I can get him? And my dad just said: what could he possibly want at this moment? It made me feel like, at the end of life, nothing material can bring you joy — no money, no products, nothing. And in a way it made me ask: what do I really care about at the end of the day? So maybe I fail, maybe I don't get that contract — at the end of life, I don't think that actually matters.
从某种意义上说,这确实让人感到解脱。
So in a way, it's kind of liberating.
我知道你说这可能有点虚无主义,但史蒂夫·乔布斯在他最著名的演讲中也分享过这点。毕竟我们终有一死,所以不必太过较真。这确实让人感到自由。它让你学会珍惜每个瞬间、每一天,就像——没错,让我们去做些艰难又可怕的事吧。好了,最后一个问题。
I know you said it might be nihilistic, but this is what Steve Jobs shared too in one of his most famous speeches: we will all die someday, so don't take things so seriously. And it is freeing, absolutely. It makes you appreciate every moment, every day you have — yeah, let's just do something hard and scary. Okay, final question.
你提到正在写小说。科技行业大多数人从未尝试过创意写作和小说创作。在这个过程中,你学到了什么关于写出更好故事、更好小说的经验?
You talked about how you're writing a novel. Most people in tech have never written something creative, fiction. What's one thing you learned in the process about how to write better stories, better fiction?
很多时候我们阅读时会被细节绊住。我想尝试创意写作是因为想成为更好的写作者。这让我明白,面对不同的读者群体能帮助我更好地预判:这类读者想听什么?他们会在意什么?这就是我的成长方式。就像任何内容创作,本质上都是在预测用户的反应。
A lot of the time when we read, we get tripped up by small things. I wanted to do creative writing because I just want to become a better writer, and I figured that trying a different audience could help me get better at anticipating what that type of audience wants to hear and what they care about. It's a way for me to grow. I think any kind of content creation is really about predicting the user's reactions.
下一个token(双关语)。开个玩笑。是的。
Predicting the next token. Just kidding. Yeah.
就像做播客时,你会思考:用户会对什么内容感兴趣?我发现很多公司推出产品时都要构建叙事逻辑——如何定位产品才能打动用户?我从事技术写作有段时间了,对于预测工程师群体想听什么还算有经验。
It's like doing a podcast: what kinds of things will users find engaging? And I find it's a bit like how, in a lot of companies, when you launch a product, you have a narrative coming out — how do we position this product in a way that users would want, right? I've done technical writing for a while, so I have some experience trying to predict what engineers want to hear or care about.
但面对完全不同的受众群体时我就缺乏经验了。这正是我想通过小说创作来突破的。为此我做了大量研究,比如追剧观察大众喜好。有位编辑让我明白:最重要的是角色的情感脉络。
But I don't have that experience with a completely different type of audience. That's why I wanted to try creative writing, creating a story, and why I was doing a lot of research — research I actually enjoy a lot, like watching lots of dramas just to see what people like. One thing I learned, from an editor, is to care about the characters' emotional journey.
当我们撰写自己关心的内容时,比如用户在整个故事中的感受,我们总希望开头能抓住读者对吧?我们需要一个钩子来吸引人继续阅读。但也不能有太多戏剧性情节,否则读者会感到疲惫——因为情感被过度操控会让人精疲力竭。所以如果设计情感线,可能需要安排高潮和舒缓段落。就像我最近意识到,技术写作需要完全聚焦内容本身。
When we write something, we care about how readers feel across the story. We want something in the beginning, right? We need a hook so they'll continue reading. But we also don't want too much drama, because readers get too tired — you become emotionally exhausted when you're being emotionally manipulated a lot of the time. So if you design an emotional journey, maybe you have some climaxes and some more chill stretches. Another thing I didn't realize: in technical writing, you focus entirely on the content.
这类论述是非常客观的对吧?比如机器学习编译器领域,读者根本不在乎讲解者是谁,因为内容本身是客观的。但小说就不同,人物魅力很重要。我初稿把角色塑造得过于逻辑化、理性化,每个行为都极度理性。后来我请一位好友读了稿子。
The argument is very impersonal, right? For example, with something like ML compilers, it doesn't matter whether readers like the person telling them about compilers, because the content is objective. But for a novel, people care about character likability. In the first version of my story, I made the character very logical, very rational — doing everything just very rationally. And then I got feedback: a very good friend of mine read it.
他真是个很棒的人。他直接对我说:'Chip,说实话我讨厌这个角色。'故事本身没问题,但这个角色让人讨厌到不想继续读。所以第二版我调整了角色性格,让他更讨喜。
And he's an amazing person, a great person. And he was like, "Chip, I'll be honest with you, I hate that person." It doesn't matter how good the story is — the character was so unlikable that he didn't want to continue. So in the second version, I made the character more likable.
怎么让角色更讨喜呢?就是加入些脆弱性。比如让角色遭遇挫折,这样读者能产生共鸣。这很有趣,本质上都是关于理解情感要素——不仅是故事本身,还包括角色引发的情感。
How do you make a character more likable? You put in some vulnerability — maybe this person has setbacks, so we can relate to them. In a lot of ways it's very interesting. So much of it is about understanding the emotional bits: how readers feel, not just about the story, but also about the characters.
太有意思了!没想到能学到这么多,这个例子太棒了。Chip,最后两个问题:
That is so interesting. Wow, I learned a lot more there than I thought. That was awesome. Really good example. Chip, two final questions.
如果听众想联系你合作,或者分享你提供的内容,他们该去哪里找你?另外听众怎样才能帮到你呢?
Where can folks find you online if they want to reach out, maybe work with you, or just share the stuff that you offer? And how can listeners be useful to you?
我其实不太用社交媒体,虽然LinkedIn和Twitter都有账号,也经常发帖,但总觉得应该更活跃些,毕竟要和读者保持互动。我正准备开一个Substack——目前只有一个占位页面,打算写系统思维相关的内容,这个技能很有趣。还考虑开个YouTube频道做书评,主要推荐能让你思考得更好的书。
I'm not big on social media — I'm on LinkedIn and Twitter, and I do post a lot, but I keep telling myself I should do more to keep the conversation with readers going. I'm actually about to start a Substack — I have a placeholder Substack right now — and I'm thinking of focusing it on systems thinking, because I think it's a very interesting skill. I'm also thinking of doing a YouTube channel on book reviews, basically books that help you think better.
所以我想我要评论的第一本书大概会是这本,因为它是我成长过程中最喜欢的书,我一直反复阅读。是的,它如何能有所帮助呢?比如,把你喜欢的书、那些改变你思维方式或行为方式的书籍推荐给我。我会非常感激。
I think the first book I'm going to review is probably this one, because it was my favorite book growing up and I keep rereading it. So, how can listeners be helpful? Send me books that you like — books that have changed the way you think or the way you do anything. I would appreciate it.
太棒了。我很期待读那本书。Chip,非常感谢你能来参加。
Amazing. I'm excited to read that book. Chip, thank you so much for being here.
非常感谢Lenny邀请我。
Thank you so much Lenny for having me.
大家再见。非常感谢你们的收听。如果觉得这期节目有价值,可以在苹果播客、Spotify或你喜欢的播客应用上订阅我们的节目。也请考虑给我们评分或留下评论,这能帮助其他听众发现这个播客。你可以在lennyspodcast.com找到所有往期节目或了解更多关于本节目的信息。
Bye everyone. Thank you so much for listening. If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com.
下期节目再见。
See you in the next episode.