Hard Fork

“AI洗牌”裁员?+ 为什么大型语言模型写不好文章?+ Token最大化

‘A.I.-Washing’ Layoffs? + Why L.L.M.s Can’t Write Well + Tokenmaxxing

本集简介

本周,我们首先讨论Atlassian和Block的新一轮科技裁员,以及Meta计划裁减多达20%员工的报道。这引发了一个问题:人工智能导致的失业是否真的已经开始,还是另有其他因素?随后,我们邀请作家Jasmine Sun,探讨为什么聊天机器人在创意写作方面依然表现糟糕。最后,进入“代币最大化”时间!Kevin带我们深入了解他最新报道的幕后故事,揭示科技公司为何建立排行榜来衡量谁在使用最多的人工智能。

嘉宾:
Jasmine Sun,记者兼作家,jasmi.news

延伸阅读:
我在Block工作过。它的AI裁员并非表面看起来那样。
Meta因AI成本攀升计划大规模裁员。
Meta因性能担忧推迟新AI模型发布。
裁员的“AI洗牌”具有腐蚀性且令人困惑。
人工智能难以掌握的人类技能。

我们期待您的声音。请发送邮件至hardfork@nytimes.com。在YouTube和TikTok上关注“Hard Fork”。

双语字幕

Speaker 0

我们让《纽约时报》的员工提前试玩了纽约时报游戏推出的Crossplay,以下是他们的反馈。

We gave Times employees a preview of Crossplay from New York Times Games, and here's what they had to say.

Speaker 1

我终于可以和其他人一起玩了。

I can finally play with other people.

Speaker 2

我很有竞争心。

I'm pretty competitive.

Speaker 2

击败朋友和同事很有趣。

It's fun to beat friends and coworkers.

Speaker 1

我有一个字母J,得10分。

I have a J for 10 points.

Speaker 3

我猜‘tango’不是一个单词。

I'm guessing tango is not a word.

Speaker 3

我们来看看。

Let's see.

Speaker 3

‘tango’是一个单词。

Tango is a word.

Speaker 1

哦。

Oh.

Speaker 1

作为英语作为第二语言的使用者,我喜欢学习新单词。

As an English-as-a-second-language speaker, I like to learn new words.

Speaker 0

Crossplay,纽约时报游戏推出的首款双人文字游戏。

Crossplay, the first two-player word game from New York Times Games.

Speaker 0

今天免费下载吧。

Download it for free today.

Speaker 3

今天早上我刚读到一条最暖心的新闻,想和你分享一下,凯文。

I just read the most heartwarming news this morning that I wanted to share with you, Kevin.

Speaker 3

是什么?

What's that?

Speaker 3

英国政府在遭到杜阿·利帕等艺术家的强烈反对后,撤回了允许人工智能公司使用受版权保护的作品进行训练的提案。

The UK government has withdrawn a proposal to let AI companies train on copyrighted works after a backlash from artists like Dua Lipa.

Speaker 3

你看到这个了吗?

Did you see this?

Speaker 3

没有。

No.

Speaker 3

杜阿·利帕说:别现在就开始搞这个AI了。

Dua Lipa said, don't start now with this AI.

Speaker 3

我的甜心?

My sugar boo?

Speaker 3

她在打官司呢,凯文。

She litigating, Kevin.

Speaker 3

她就像在制定新规则,说我们不会用我受版权保护的作品进行训练。

She's like she's making some new rules, and she's saying, we're not gonna train on my copyrighted works.

Speaker 3

哇。

Wow.

Speaker 3

这就是为什么她是个女王。

And that's why she is a queen.

Speaker 3

所以,杜阿·利帕,如果你在听,我们向你致敬。

And so Dua Lipa, if you're listening, we salute you.

Speaker 2

是的

Yeah.

Speaker 2

杜阿·利帕,你真是个杜阿·基帕。

Dua Lipa, you're a dua keepa.

Speaker 3

没错。

Period.

Speaker 3

杜阿·利帕说艺术家的权利。

Dua Lipa said artist rights.

Speaker 2

哇。

Wow.

Speaker 2

我是《纽约时报》的科技专栏作家凯文·罗斯。

I'm Kevin Roose, a tech columnist at The New York Times.

Speaker 3

我是Platformer的凯西·牛顿。

I'm Casey Newton from Platformer.

Speaker 3

这里是《Hard Fork》。

And this is Hard Fork.

Speaker 3

本周,大规模的科技公司裁员潮引发了人们的疑问:AI导致的失业真的已经开始了吗?

This week, a big wave of tech layoffs is raising the question, has AI job loss truly begun?

Speaker 3

接着,作家 Jasmine Sun 将帮助我们解答这个问题:为什么聊天机器人不擅长写作?

Then writer Jasmine Sun is here to help us answer the question, why are chatbots bad at writing?

Speaker 3

最后,是时候谈谈‘代币最大化’了:科技公司为何在建立排行榜,以衡量谁在 AI 上花费最多。

And finally, it's tokenmaxxing time, why tech companies are building leaderboards to measure who is spending the most on AI.

Speaker 2

嗯,Casey,多年来我们一直在关注 AI 导致大规模失业的迹象。

Well, Casey, for years now, we've been monitoring for signs of an AI job apocalypse.

Speaker 2

是的。

Yeah.

Speaker 2

我们一直在关注这一情况。

We've been monitoring the situation.

Speaker 2

确实如此。

It's true.

Speaker 2

过去几周,我认为我们已经获得了一些早期迹象,表明劳动力市场正在发生变化,尤其是对科技工作者而言。

And over the past few weeks, I think we've gotten some early indications that something is happening in the labor market, especially for tech workers.

Speaker 3

是的。

Yeah.

Speaker 3

我们确实听到一些公司首席执行官宣布裁员时将AI作为原因,这引起了我们的注意。

We have certainly heard CEOs of companies announcing layoffs invoking AI as a reason that it is happening, and so that has gotten our attention.

Speaker 2

对。

Yeah.

Speaker 2

所以,仅举最近几周的几个例子。

So just a couple examples from the last few weeks.

Speaker 2

上周,Atlassian宣布裁员10%,约1600个职位,称此举将帮助他们为AI和企业销售的进一步投资提供资金。

Last week, Atlassian announced a 10% reduction in its staff, about 1,600 jobs that they said were going to help them fund further investment in AI and enterprise sales.

Speaker 2

紧随其后的是金融科技公司Block(前身为Square)的大规模裁员,该公司表示将裁员约40%,即约4000人,称其正在调整工作方式,转向更小、更扁平的团队。

That came on the heels of a big round of layoffs at Block, the financial tech company formerly known as Square, which said that it was cutting its staff by about 40% or about 4,000 jobs, saying that they were shifting the way that they were working to use smaller and flatter teams.

Speaker 2

而人们预计可能就在本周发生的重大事件是,Meta据称即将裁员20%或更多。

And then the big one that folks are expecting maybe as soon as this week is that Meta is reportedly poised to lay off 20% or more of the entire company.

Speaker 2

上周五,路透社报道称,其消息人士透露,Meta正准备裁员多达16000人,这是自2022年底至2023年初该公司裁员20000人以来规模最大的一次裁员。

This was reported by Reuters last Friday, which said that its sources had told it that Meta was preparing to cut as many as 16,000 jobs, the largest layoffs at that company since late 2022 or early 2023, when they laid off 20,000 people.

Speaker 2

因此,截至本次录音时,我们尚未得知Meta确实发生了裁员,但我了解到Meta的员工们都非常紧张,正在等待关于他们工作的进一步消息。

So as of this recording, that hasn't happened yet that we know of, but I know that people at Meta are very on edge and are awaiting the further news about their jobs.

Speaker 3

这个故事曝光后,Meta告诉路透社,称这是‘推测性报道’。

Meta, after this story came out, told Reuters that it was, quote, speculative reporting.

Speaker 2

如果你不熟悉Meta公关人员常用的措辞,这意味着事情确实正在发生,但我们不想直接告诉你。

Which if you're not familiar with the language deployed by Meta communication staffers means this is happening, but we don't wanna tell you

Speaker 3

……这件事正在发生,至少现在还不想说。

it's happening yet.

Speaker 2

没错。

Correct.

Speaker 2

所以,凯西,我想听听你对这些裁员的看法,但首先,我们该做一下披露。

So, Casey, I wanna hear what you make of these layoffs, but first, we should do our disclosures.

Speaker 2

我在《纽约时报》工作,而《纽约时报》正在起诉OpenAI、微软和Perplexity。

I work for the New York Times, which is suing OpenAI, Microsoft, and Perplexity.

Speaker 3

而我的未婚夫

And my fiance

Speaker 2

他在Anthropic工作。

works at Anthropic.

Speaker 2

好吧。

So okay.

Speaker 2

凯西,你怎么看待这些公司都以某种方式将人工智能作为裁员理由的现象?

Casey, what do you make of the fact that all these companies are referencing AI in some way as a reason for their layoffs?

Speaker 3

我觉得每家公司的情况都不太一样,凯文。

Well, I think it's a little different at each company, Kevin.

Speaker 3

我认为我们可以为AI是否真正主导了每家公司的决策提供有力的正反两方面论据。

And I think we can make a decent case for and against the idea that AI is really driving the show at each of them.

Speaker 3

所以也许我们应该深入探讨一下。

So maybe we should get into that.

Speaker 3

但就总体而言,我会说,如今这些公司仍然不断告诉我们,人工智能是导致这些裁员的重要因素。

But at the highest level, I would say companies do continue to tell us now that AI is a significant factor in the reduction of these workforces.

Speaker 3

而且迟早,我认为我们不得不相信他们。

And sooner or later, I do think we're going to have to believe them.

Speaker 2

是的

Yeah.

Speaker 2

我认为这对很多人来说是一个早期预警信号,尤其是在科技行业,我认为可以公平地说,他们将是首批因这些新AI工具而经历工作变化或失业的人。

I think this is the early warning sign for a lot of people, especially in the tech industry, who are, I think, it's fair to say, going to be some of the first people to see their jobs change or disappear because of these new AI tools.

Speaker 2

但让我们深入探讨一些具体的细节。

But let's get into some of the specifics here.

Speaker 2

所以,凯西,我们先从我提到的第一个公司——Atlassian开始。

So, Casey, let's start with Atlassian, the first company I mentioned.

Speaker 2

他们的首席执行官迈克·坎农-布鲁克斯在公司博客文章中表示,对于软件公司而言,关于增长、盈利能力、速度和价值创造的卓越标准已经提高。

Their CEO, Mike Cannon-Brookes, said in a company blog post that the bar for what great looks like for software companies on growth, on profitability, on speed, on value creation has gone up.

Speaker 2

他说,我们选择以深思熟虑、果断且迅速的方式进行调整,以推动持久且盈利的增长。

He said we are choosing to adapt thoughtfully, decisively, and quickly to drive durable, profitable growth.

Speaker 2

他声称AI并没有取代人,但他也表示,若假装AI不会改变我们所需技能的组合或某些领域所需岗位的数量,那是不诚实的。

He claimed that AI was not replacing people, but he said it would be disingenuous to pretend that AI doesn't change the mix of skills we need or the number of roles required in certain areas.

Speaker 3

是的

Yeah.

Speaker 3

所以我相信他的话。

So I take him at his word.

Speaker 3

看起来他自己正试图走一条中间路线。

It seems like he himself is trying to walk a middle path there.

Speaker 3

对吧?

Right?

Speaker 3

既不否认AI是一个因素,也不说这是唯一的原因。

And sort of not denying that AI is a factor here, but also not saying like this is the only reason this is happening.

Speaker 3

我认为值得补充的其他背景是,Atlassian 是我们这里所说的‘SaaSpocalypse’可能涉及的公司之一。

I think some other context that is worth having is that Atlassian is one of the companies that could be part of what we've been calling the SaaSpocalypse around here.

Speaker 3

对吧?

Right?

Speaker 3

这是一家为企业提供工具的公司。

This is a company that makes tools for businesses.

Speaker 3

它的许多产品本质上是结构化的工作流程,有些人认为,迟早你都能以很低的成本自己编写代码。

A lot of its products are essentially structured workflows, and there are those who believe that sooner or later, you're just going to be able to code your own pretty cheaply.

Speaker 3

现在,你可能仍然会选择购买像Atlassian这样的公司产品,但你可能不再愿意支付以前那么高的价格了。

Now, maybe you will still choose to buy a product from a company like Atlassian, but maybe you're not gonna be willing to pay nearly as much as you would before.

Speaker 3

因此,过去一年里,该公司的股价一直遭受重创,我认为这使他们一方面现金流紧张,但更重要的是,他们急需向股市讲述一个全新的故事,来解释他们正在做什么。

And so the company's stock price has just been battered over the past year, and I think that has left them, one, hurting for cash a little bit, but two, and probably more importantly, looking for a different story that they can tell the stock market about what they're doing.

Speaker 3

所以今天,他们的说法是:我们要裁掉一些员工,并设法让剩下的员工提高生产力。

And so today, that story is we're gonna get rid of some of these workers, and we're gonna figure out how to make our remaining workers more productive.

Speaker 2

所以现在有一个词在流传,叫‘AI洗牌’,嗯。

So there's this term that's been floating around called AI washing Mhmm.

Speaker 2

这基本上是指,当一家公司想要大量裁员,或者觉得不需要那么多人的时候。

Which is basically when a company wants to lay a bunch of people off or maybe they don't feel like they need as many people.

Speaker 3

我以为那是当一名软件工程师终于去洗了个澡的时候。

I thought it was when a software engineer finally took a shower.

Speaker 2

而基本的观点是,这些裁员其实并不是真的因为AI。

And, basically, the thesis is, like, these aren't really layoffs about AI.

Speaker 2

这只不过是一些公司借来使用的方便借口。

This is just sort of a convenient excuse that these companies are using.

Speaker 2

是的。

Yeah.

Speaker 2

你认为Atlassian算是“AI洗牌”吗?

Do you think Atlassian qualifies as AI washing?

Speaker 3

我想更详细地了解他们具体裁掉了哪些人,而这一点在其他一些公司中我们是有数据的,这有助于我们回答这个问题。

I would like to get a little bit more detail on exactly who they are laying off here, which is a detail that we do have about some of these other companies that helps us answer that question.

Speaker 3

我不清楚Atlassian内部具体是如何操作的,但我认为他们的CEO在这方面相对直率,他说这和AI有点关系。

So I don't know exactly how it is happening inside of Atlassian, but I think that their CEO was relatively straightforward as these things go in saying, like, it's a little bit about AI.

Speaker 3

这不完全是关于AI,但没错,你要关注AI。

It's not entirely about AI, but, like, yes, keep your eye on AI.

Speaker 3

对我来说,这听起来很诚实,所以我愿意放过他们。

So to me, that just reads as honest, and so I'm gonna give them a pass.

Speaker 2

好的。

Okay.

Speaker 2

我们来谈谈Block。

Let's talk about Block.

Speaker 2

Block的首席执行官杰克·多西对裁员问题做了解释。

Jack Dorsey, the CEO of Block, gave an explanation about their layoffs.

Speaker 2

他说:‘我们做这个决定并不是因为公司陷入困境。’

He said, quote, we're not making this decision because we're in trouble.

Speaker 2

我们的业务依然强劲,但有些事情已经发生了变化。

Our business is strong, but something has changed.

Speaker 2

我有两个选择:要么随着这一转变缓慢地在数月或数年内逐步削减,要么坦诚面对现状并立即采取行动。

I had two options: cut gradually over months or years as this shift plays out, or be honest about where we are and act on it now.

Speaker 2

我选择了后者。

I chose the latter.

Speaker 2

凯西,你的看法是什么?

Casey, your take.

Speaker 3

关于我和杰克·多西,有一件事你需要知道:作为一名曾经的Twitter用户,我非常怀念那个网站,因此对他有些偏见。

So something to know about me and Jack Dorsey is I have a bit of a bias against him as a former Twitter user who misses that website dearly.

Speaker 3

到了2026年的今天,我不会雇用杰克·多西来经营一个柠檬水摊。

At this point in 2026, I would not hire Jack Dorsey to run a lemonade stand.

Speaker 3

好吧?

Okay?

Speaker 3

但如果你要谈Block公司,这是一家在2019年员工约3800人的情况下,于疫情繁荣时期因忽视业务实际而将员工规模扩大了三倍的公司。

But if you wanna talk about Block specifically, this is a company that tripled its head count from about 3,800 people in 2019 in what seems like just kind of classic, like, inattention to what was happening in the business during pandemic-era boom times.

Speaker 3

对吧?

Right?

Speaker 3

我不知道你有没有注意到这个细节,但这个事实真的让我震惊了,凯文。

And I wonder if you saw this detail because it truly took me out, Kevin.

Speaker 3

在宣布裁员五个月前,Block公司花了6800万美元,把8000名员工飞到现场参加一场与Jay-Z同台的线下活动。

Five months before they announced their layoffs, Block spent $68 million to fly 8,000 people to an in-person event with Jay-Z.

Speaker 3

拜托。

Come on.

Speaker 3

是的。

Yeah.

Speaker 3

这就是那种广为人知的细节把控,正是它让杰克·多西成为科技界最伟大的远见者之一。

So that's the kind of famous attention to detail that has turned Jack Dorsey into one of the greatest visionaries in tech.

Speaker 3

所以,你看,这跟人工智能有关吗?

So, look, is this about AI?

Speaker 3

再说一遍,你知道Block到底做什么吗?

Again, you know, what does Block really do?

Speaker 3

他们在咖啡店有那些小小的iPad,还有Cash App。

They have those little iPads at the coffee shop, and then they have Cash App.

Speaker 3

明白吗?

Okay?

Speaker 3

运营这些产品,你真的需要这么多人吗?

How many people do you really need to run those products?

Speaker 3

可能不到一万人就够了。

Probably fewer than 10,000.

Speaker 3

这跟人工智能有关吗?

Is that about AI?

Speaker 3

我不知道。

I don't know.

Speaker 3

也许你眯着眼看的话。

Maybe if you squint.

Speaker 3

但再说一次,这是一家股价暴跌的公司。

But again, this is a company whose stock price was cratering.

Speaker 3

他们需要给市场讲一个新故事。

They needed a different story to tell the market.

Speaker 3

我认为你确实可以说,人工智能会让剩下的员工更高效。

And I do think you can make a case that AI will make the remaining workers more productive.

Speaker 3

所以,这又是另一个例子:你可以用人工智能来解释正在发生的事,但你也可以直接说,这家公司长期以来管理不善。

So again, this is another one where it's like, you could use AI to justify what's happening, but you also could just say, this company has been mismanaged for a while now.

Speaker 2

是的。

Yeah.

Speaker 2

你可以说这是在‘人工智能洗牌’,或者‘Jay Z洗牌’,这似乎正是他们在这里做的。

You could use AI washing or Jay-Z washing, which just seems to be what they are doing here.

Speaker 3

嗯。

Mhmm.

Speaker 3

是的。

Yes.

Speaker 2

所以这确实对他们的股价产生了影响。

So this did seem to have an effect on their stock price.

Speaker 2

事实上,杰克·多西宣布裁员后的第二天,Block的股价上涨了17%。

In fact, the day after Jack Dorsey announced the layoffs, Block stock shot up 17%.

Speaker 2

此后股价略有回落,但仍然高于裁员前的水平。

It's gone down a little bit since then, but they're still up from where they were before these layoffs.

Speaker 2

我认为我们应当明确指出,这也是这里需要考虑的一部分。

And I think we should just say, like, this is also a part of the equation here.

Speaker 2

对吧?

Right?

Speaker 2

这些公司大多是上市公司,受到投资者的关注。

These are companies, largely public ones, that have investors' attention.

Speaker 2

而目前,AI正形成一种强大的叙事力量:如果你看起来是一家大力投资AI工具和AI工作方式的公司,投资者就会说,这家公司的视野非常前瞻。

And right now, there's sort of this narrative power around AI where if you seem like a company that is investing heavily in the AI tools and the AI way of working, your investors say, oh, that company is really forward looking.

Speaker 2

他们必须有一个应对这一过渡的计划。

They must have a plan for how to navigate this transition.

Speaker 2

所以我认为,他们意识到讲述这个故事的力量——所有这些都与人工智能有关。

And so I think there's sort of they're seeing the power in telling the story that all this is related to AI.

Speaker 2

是的。

Yeah.

Speaker 2

说起来,

Which, by

Speaker 3

顺便说一下,让我想起了加密货币狂热的巅峰时期,当时一些上市公司只要在名字里加个加密货币相关的词,股价就会飙升高达40000%。

the way, reminds me of, like, the peak of crypto mania when, like, some publicly traded companies would just add a crypto term to their name and their stock price would shoot up by, like, 40,000%.

Speaker 2

对。

Yes.

Speaker 3

事实证明,公开市场真的可以被如此轻易地蒙骗。

It turns out that the public markets actually can just be tricked that easily.

Speaker 3

对。

Yes.

Speaker 3

如果我是CEO,知道我居然能这么轻易地骗到人,可能会让我松一口气。

That would give me some relief if I was a CEO just knowing that I could fool people like that.

Speaker 3

但不管怎样

But anyways

Speaker 2

让我们谈谈第三家被报道正在裁员的大型科技公司——Meta。

So let's talk about the third large tech company that is reportedly conducting layoffs, Meta.

Speaker 2

我们还不知道这些裁员具体影响了哪些人或哪些团队,但这是他们员工队伍中相当重要的一部分。

We don't know exactly who or what teams are being affected by these layoffs, but this is a significant part of their workforce.

Speaker 2

他们在与公众的沟通中似乎和其他公司说的一样:我们将全力投入新的工作方式,为此必须做出一些削减。

And they seem to be saying in their communications with the public what all of these other companies are saying, which is we are going all in on the new way of working, and we are going to have to make some cuts to make that work.

Speaker 3

是的。

Yeah.

Speaker 3

在最近的一次财报电话会上,马克·扎克伯格表示:'过去需要庞大团队才能完成的项目,现在只需一个非常有才华的个人就能完成。'

On a recent earnings call, Mark Zuckerberg said that, quote, projects that used to require big teams now can be accomplished by a single very talented person.

Speaker 3

我们还应该指出,这次裁员正值公司大规模投资AI基础设施之际。

And we should also say that this cut is coming alongside this massive AI infrastructure investment.

Speaker 3

对吧?

Right?

Speaker 3

他们今年要在资本支出上投入1350亿美元。

They're gonna spend $135 billion on capital expenditures this year.

Speaker 3

就算以Meta这样规模的公司来说,这也绝对是一笔天文数字。

And even for a company of Meta's size, like, that is real money.

Speaker 3

没错吧?

Right?

Speaker 3

所以我明白他们一直在刻意保持谨慎,尽力不去过度惊动股市。

So I know that they're trying to be careful, again, trying to not spook the stock markets too much.

Speaker 3

这显然是Meta公司发展史上规模最大的一次押注,而我认为此次大规模裁员其实是在向市场传递一个信号——听着,

This is obviously the biggest bet in the company's history, and I think that making some substantial cuts is going to signal to the market, hey.

Speaker 3

大家大可放心。

Don't worry.

Speaker 3

我们并没有彻底乱了阵脚。

We're not, like, completely losing our minds here.

Speaker 3

我们会把其中一部分开支控制住。

Like, we're going to keep some of these expenses under control.

Speaker 2

对。

Yeah.

Speaker 2

我认为这是一个非常重要的观点,因为我们在这些公司看到的是,它们并没有真正通过这些工具总体上削减成本。

I think that's a really important point because what we're seeing here at some of these companies is that they are not actually sort of cutting costs in the aggregate by using these tools.

Speaker 2

它们只是把成本从人力劳动转移到了人工智能上。

They are just shifting the cost from human labor to AI.

Speaker 2

没错。

Right.

Speaker 2

它们正把因裁员数千人而节省下来的资金投入到数据中心和其他人工智能基础设施的建设中。

They are plowing this money that they're going to save by laying off these thousands of people into the building of data centers and other AI infrastructure.

Speaker 2

它们所做的赌注是,这些新的AI员工会更快、更高效,从长远来看可能更便宜,也可能不是,但它们能够完成过去需要成千上万人完成的工作。

And, basically, the bet they're making is these new AI workers are going to be faster, more efficient, maybe cheaper in the long run, maybe not, but they are going to be able to do the work that used to require many thousands of people.

Speaker 2

这标志着公司在谈论员工方式上的一次深刻转变。

And that is a profound shift in the way that companies are talking about their workers.

Speaker 2

我最近和一位风险投资人聊过,他说他看到的许多AI初创公司,尤其是最原生的AI公司,花在AI工具上的钱比发给员工的工资还多。

I recently talked to a venture capitalist who said that a lot of the AI startups that he sees, the most AI native companies, are spending more on AI tools than they are on payroll.

Speaker 2

这可能是个特例,但我认为这正是这些公司所相信的未来趋势——大部分开支将不再用于支付人类员工的工资。

And that may be an outlier, but I think that is sort of where these companies believe that we are headed, where the majority of your expenses will not go to paying the salaries of human workers.

Speaker 2

而会转向购买AI工具和公司运行所需的Token。

It will go toward buying the AI tools and the tokens that your company runs on.

Speaker 3

是的。

Yes.

Speaker 3

我认为这正是他们所押注的方向。

I think that's absolutely the bet that they're making.

Speaker 3

我还想指出,这目前基本上仍完全是推测性的。

I also just think it is worth noting that this is still purely mostly speculative.

Speaker 3

对吧?

Right?

Speaker 3

比如Meta这家公司,它在AI领域可以说一直面临挑战。

Like, in the case of Meta specifically, this is a company that has arguably been struggling when it comes to AI.

Speaker 3

他们不得不放弃上一个模型Behemoth,因为它表现不佳。

They had to abandon their last model, Behemoth, because it wasn't very good.

Speaker 3

《纽约时报》上周报道,由于未能达到性能目标,该公司推迟了最新模型‘Avocado’的发布。

The Times reported last week that it's delaying the release of its latest model, Avocado, because it hasn't been hitting its performance targets.

Speaker 3

它仅仅略微超过了Gemini 2.5的表现。

It's apparently barely outperformed Gemini 2.5.

Speaker 3

这是什么?

What is this?

Speaker 3

去年三月?

Last March?

Speaker 2

是的。

Yeah.

Speaker 2

那个模型真的很差劲。

That model is really the pits.

Speaker 2

这是在调侃Avocado。

That's an Avocado joke.

Speaker 3

这个梗玩得不错。

That's very good.

Speaker 3

谢谢。

Thank you.

Speaker 3

所以我再强调一下,这件事根本不是说他们因为取得了这些巨大的进展,就能裁掉20%的员工这么简单。

So, again, this is not as simple as saying they're able to cut 20% of their workforce because they've just made these massive gains.

Speaker 3

我肯定公司里确实有一些人做出了突出的成绩。

I'm sure there are individuals there who have made massive gains.

Speaker 3

但作为一家公司,它似乎还是陷入了某种运作失调的困境里。

But as a company, it still seems like it is somewhat mired in dysfunction.

Speaker 3

他们刚对自己的人工智能团队又进行了一次局部重组,这种事总是会让我心生疑虑。

They just did yet another partial reorg of their AI teams, and that just always sort of makes me raise my eyebrows.

Speaker 2

对。

Yeah.

Speaker 2

我得说,最近这轮裁员里有一点挺让我意外的:发起裁员的公司都不属于行业第一梯队的那批。

I will say, like, one thing that's been surprising to me about this recent round of layoffs is that the companies that are making them are not the ones on the frontier.

Speaker 2

对吧?

Right?

Speaker 2

不是OpenAI、Anthropic、谷歌这些公司。

It is not the OpenAIs, the Anthropics, the Googles.

Speaker 2

这些公司并没有因为这些AI工具而大规模裁员,而这些工具正是它们自己在开发的,而且它们拥有的模型可能比对外发布的还要先进。

Those companies are not laying off people en masse because of these AI tools, which they are building and presumably have even better models than the ones they're releasing to the public.

Speaker 2

所以你必须认为,这部分原因只是那些落后于竞争对手的公司认为,也许如果我们大量使用AI,就能赶上它们。

So you have to think that part of this is just companies that are sort of lagging behind their competition saying, well, maybe if we just use a bunch of AI, it'll help us catch up.

Speaker 3

是的。

Yes.

Speaker 3

但同时,像OpenAI和Anthropic这样的公司,员工人数比我们今天讨论的某些公司要少得多。

But also, like, OpenAI and Anthropic are much smaller companies than some of the ones that we've been talking about today, at least in number of workers.

Speaker 3

对吧?

Right?

Speaker 3

比如,我觉得挺有意思的是,当你看它们创造的价值相对比例时,Atlassian的员工人数竟然比OpenAI还多。

Like, I think it is interesting to think that Atlassian is, like, bigger than OpenAI in terms of the number of people who work there when you look at the, you know, relative, like, value of what they're generating.

Speaker 3

DocuSign 有七千名员工。

DocuSign has 7,000 employees.

Speaker 3

在整个科技新闻界,没有比这更好笑的真话了。

There's no funnier sentence that's true in all of tech journalism.

Speaker 3

作为一个付费订阅了 DocuSign 且真心讨厌为此付费的人,那边的人们,赶紧去工作吧。

As somebody who has a paid subscription for DocuSign that I truly resent paying for, get to work over there, people.

Speaker 2

或者干脆别工作了。

Or get not to work.

Speaker 3

别工作了。

Get not to work.

Speaker 3

我还有一个问题想问你,凯文。

Here's another question that I would ask, Kevin.

Speaker 3

好的。

Okay.

Speaker 3

我们正看到大量裁员。

So we're seeing a bunch of layoffs.

Speaker 3

这些裁员是和人工智能有关的吗?

Like, are these AI related or not?

Speaker 3

如果对员工的影响是一样的,这真的重要吗?

Does it actually matter if the effect on workers is the same?

Speaker 3

对吧?

Right?

Speaker 3

比如,对你作为员工来说,不管是不是人工智能导致的,你还是丢了工作。

Like, you know, if you're the worker, like, whether it's about AI or not, you're still out of a job.

Speaker 2

是的。

Yeah.

Speaker 2

我不清楚员工能或应该做些什么来保护自己免受这些裁员的影响。

And it's not clear to me what workers can or should be doing to sort of protect themselves against these layoffs.

Speaker 2

我采访过一个人,他说他们就在一家大型科技公司工作,现在到处都是竞争、恐惧和焦虑。

One person I talked to said, you know, they work at one of these big tech companies, and they're like, well, there's just a lot of jostling and fear and anxiety right now.

Speaker 2

人们不知道是否应该大量使用人工智能工具,因为这样看起来他们跟上了潮流,但同时也可能证明他们的工作是可以被自动化的。

People don't know if they should be, like, using the AI tools a ton because then it shows that they're, like, getting with the program or whether that just means that they're proving that their work can be automated.

Speaker 2

我觉得现在这些公司内部充满了恐惧、猜疑和不信任,而且这种情绪是有道理的。

Like, I think there's a lot of fear and suspicion and mistrust inside these companies right now, and for good reason.

Speaker 2

他们的高管正在计划裁员。

Their executives are planning to lay them off.

Speaker 3

是的。

Yes.

Speaker 3

顺便说一下,我认为至少在一些公司里,这可能并不是裁员的明确理由,但一些高管会把这视为一个积极的附带结果。

And by the way, I think at least at some of these companies, that is maybe not an explicit reason for these layoffs, but some of the executives there would see that as a positive byproduct.

Speaker 3

对吧?

Right?

Speaker 3

因为,你知道,如果你像马克·扎克伯格一样,经历过2020年那段时期,你就知道当时那些不安分的员工向你提出了很多要求,希望对公司能做什么、不能做什么以及如何做拥有很大控制权。

Because, you know, if you're like Mark Zuckerberg, you lived through the 2020 era, you had these restive employees that, like, wanted a lot of things from you, and they wanted to have a lot of control over what the company could and could not do and how it did it.

Speaker 3

我知道那边的高管们真的很反感这种事。

And, you know, I just know that executives over there really resented that sort of thing.

Speaker 3

一旦Meta进入大规模裁员的新阶段,那里的员工确实变得非常害怕,原因也都是你能想到的那些。

And once Meta entered this new era of massive layoffs, employees over there did get really scared for all of the reasons that you would assume.

Speaker 3

他们心想:天哪。

They were like, oh god.

Speaker 3

你知道,也许我真的会丢掉工作。

Like, you know, maybe I actually am gonna lose my job.

Speaker 3

突然间,他们变得安静多了,你开始看到那里很少再有抗议活动。

And all of a sudden, they got a lot more quiet, and you started to see a lot fewer protests over there.

Speaker 3

所以我不会说,这些偶尔的大规模裁员是为了让员工听话,但我确实注意到,它似乎产生了这样的效果。

So I'm not gonna say that, like, these occasional mass layoffs are a way of, like, keeping the workforce in line, but I have noticed that it seems to be having that effect.

Speaker 3

完全正确。

Totally.

Speaker 2

这让我想到,我一年前或两年前预测过但没发生的一件事——这些公司员工突然大规模组建工会——可能在未来一两年内真的会开始发生。

And it makes me wonder whether something that I predicted was going to happen, you know, a year or two ago that did not happen, which is the sort of sudden and mass unionization of workers at these companies may actually start to happen in the next year or two.

Speaker 2

我认为,当前这些科技公司发生的情况,与几十年来制造业、汽车公司、工厂工人所经历的情况,有一个重大区别:那些工人基本上都是工会成员。

I think one major difference between what's happening now at these tech companies and what has been happening for decades at manufacturing companies, car companies, you know, factory workers, is that those workers were by and large unionized.

Speaker 2

所以当雇主们说:喂。

And so when the employers said, hey.

Speaker 2

我们要裁掉你们很多人,但他们成功进行了谈判。

We're gonna lay a bunch of you off, they were able to negotiate.

Speaker 2

他们能够说,嘿。

They were able to say, hey.

Speaker 2

也许你们不必把我们全部裁掉,而是可以为我们安排其他工作。

Maybe instead of laying us all off, maybe you could find other jobs for us.

Speaker 2

如果我们的工作被自动化取代了,也许我们应该有机会重新培训,去做别的事情。

If our jobs are being automated, maybe we should be allowed to sort of retrain to do something else.

Speaker 2

这在很大程度上是成功的。

And that was largely successful.

Speaker 2

当然还是有裁员,但数量远不及如今这些科技公司所经历的。

There were still layoffs, of course, but not the number that we're seeing today at these tech companies.

Speaker 2

所以,你认为这种情况有可能发生吗,还是只是工会的空想?

So do you think there's any possibility of that, or is that just sort of a union fever dream?

Speaker 3

我来这么说吧。

Here's what I will say.

Speaker 3

我实在想不到有什么事会比Meta的软件工程师组建工会更能激怒马克·扎克伯格,而且我认为Meta的软件工程师们可以自行斟酌如何用好这一点。

I cannot think of anything that would make Mark Zuckerberg more mad than a union of software engineers at Meta, and I think the software engineers at Meta should use that information how they will.

Speaker 2

你觉得这会比他在UFC赛事上被击倒还让他生气?

You think that would make him more mad than getting booted at a UFC fight?

Speaker 3

绝对是。我觉得那件事大概只是让他很难过。

Absolutely. I think that probably just made him really sad.

Speaker 2

行,这下大家都清楚了。

Well, there you have it.

Speaker 2

Meta的员工们,如果你们想把马克·扎克伯格气炸,那就去签你们的工会入会卡。

If you wanna make Mark Zuckerberg mad, Meta employees, sign your union card.

Speaker 3

休息回来之后:为什么聊天机器人的写作水平还不如我?

When we come back, why aren't chatbots as good at writing as I am?

Speaker 2

那就去问Jasmine Sun吧。

Well, ask Jasmine Sun.

Speaker 4

我是A.G.苏兹伯格。

This is A.G. Sulzberger.

Speaker 4

我是《纽约时报》的出版人。

I'm the publisher of The New York Times.

Speaker 4

我负责我们的新闻业务和商业运营。

I oversee our news operations and our business.

Speaker 4

但我也曾是一名记者,近年来眼睁睁看着我们这个行业不断萎缩,感到十分担忧。

But I'm also a former reporter who has watched with a lot of alarm as our profession has shrunk and shrunk in recent years.

Speaker 4

通常情况下,在这些广告中,我们会谈论订阅《纽约时报》的重要性。

Normally, in these ads, we talk about the importance of subscribing to The Times.

Speaker 4

今天,我想传达一个不同的信息。

I'm here today with a different message.

Speaker 4

我鼓励你们支持任何致力于原创报道的新闻机构。

I'm encouraging you to support any news organization that's dedicated to original reporting.

Speaker 4

如果是你当地的报纸,那就太好了。

If that's your local newspaper, terrific.

Speaker 4

尤其是地方报纸,特别需要你的支持。

Local newspapers in particular need your support.

Speaker 4

如果是其他全国性报纸,那也很好。

If that's another national newspaper, that's great too.

Speaker 4

如果选择《纽约时报》,我们会用这笔钱派遣记者去发掘事实和背景信息,这些是你从人工智能那里永远得不到的。

And if it's The New York Times, we'll use that money to send reporters out to find the facts and context that you'll never get from AI.

Speaker 4

就这样。

That's it.

Speaker 4

不是要你点击任何链接。

Not asking you to click on any link.

Speaker 4

只是订阅一家由真实记者进行第一手事实报道的新闻机构。

Just subscribe to a real news organization with real journalists doing firsthand fact based reporting.

Speaker 4

如果你已经这样做了,谢谢。

And if you already do, thank you.

Speaker 2

嗯,凯西,过去几年里,我们在这个节目中讨论过AI模型在许多方面变得越来越出色。

Well, Casey, over the last couple of years, we've talked on this show about how AI models are getting better at so many things.

Speaker 2

它们在编程、竞赛数学、解决新颖的物理问题上都取得了进步。

They are getting better at coding, at competition math, at solving novel physics problems.

Speaker 3

大规模国内监控,自主武器。

Mass domestic surveillance, autonomous weapons.

Speaker 3

是的。

Yes.

Speaker 2

我认为过去几年AI的故事,是一种快速而稳定的进展。

And I think the story of the last few years in AI has been one of sort of rapid, steady progress.

Speaker 2

但这些系统的能力仍然参差不齐,存在瑕疵和弱点。

But these systems are still sort of jagged, and they have flaws and weaknesses.

Speaker 2

而其中一个可以说没有太大进步的领域,就是写作。

And one place where they arguably haven't improved that much is in writing.

Speaker 3

这正是我们的专长。

Now that's our domain.

Speaker 3

是的。

Yes.

Speaker 2

至少这是Jasmine Sun本周在《大西洋月刊》上提出的观点。

At least that is the argument that Jasmine Sun made in The Atlantic this week.

Speaker 2

她是一名自由记者。

She is a freelance journalist.

Speaker 2

她的文章名为《AI难以掌握的人类技能》,她试图理解为什么尽管在这些领域取得了巨大进展,当今的模型却仍写不出任何特别出色或引人入胜的内容。

Her piece was called "The Human Skill That Eludes AI," and it's her attempt to understand why, despite so much progress in all these different areas, the models of today don't seem to be writing anything particularly good or compelling.

Speaker 3

是的。

Yeah.

Speaker 3

虽然我认为大语言模型是否擅长写作这个问题高度主观,且取决于具体使用场景,但我确实认为Jasmine提出了一个非常有趣的技术性论点,解释了这些模型为何会如此写作。

And while I think the question of are LLMs good at writing is highly subjective and dependent on the use case, I do think Jasmine makes a really interesting technical case for why these models write the way they do.

Speaker 2

对。

Yes.

Speaker 2

在请她上来之前,我们应该说明一下,贾斯敏是我的朋友。

And we should say before we bring her in, Jasmine is a friend of mine.

Speaker 2

她也是我正在撰写的下一本著作的研究助理。

She has also been my researcher on the upcoming book that I'm working on.

Speaker 2

我觉得她是当今最擅长撰写AI相关内容的人之一。

And I just think she's, like, one of the best people writing about AI today.

Speaker 2

她在自己的Substack上写作,名为Jasmine News。

She writes on her Substack, which is called Jasmine News.

Speaker 2

网址是jasmi.news,你可以在那里阅读她更多的文章。

It's jasmi.news, and you can read much more of her writing there.

Speaker 3

好的。

Alright.

Speaker 3

我同意,但下周我打算请一位你的对手来平衡一下。

I'll allow it, but I do wanna balance it out by next week bringing on one of your enemies.

Speaker 2

好的。

Okay.

Speaker 2

我们请她进来吧。

Let's bring her in.

Speaker 2

贾斯敏·孙,欢迎来到《Hard Fork》节目。

Jasmine Sun, welcome to Hard Fork.

Speaker 1

谢谢邀请我来。

Thanks for having me.

Speaker 1

我很激动。

I'm excited.

Speaker 3

你好,贾斯敏。

Hi, Jasmine.

Speaker 2

你这周在《大西洋月刊》上发表了一篇很棒的文章,主题是人类独有的、AI还无法掌握的技能。我想先从对你文章副标题的一个疑问说起。

So you wrote this great piece in The Atlantic this week about the human skill that eludes AI, and I wanna start by challenging the subtitle of your piece.

Speaker 1

嗯。

Mhmm.

Speaker 2

为什么语言模型没办法写出好的内容?

Why can't language models write well?

Speaker 2

难道语言模型真的没法写出优质的内容吗?

Can't language models write well?

Speaker 1

我在文章中确实提到,绝大多数写作都非常糟糕。

So I do say in the piece that most writing period is very bad.

Speaker 1

所以我认为,语言模型在写作和语言方面确实比大多数人类更出色。

And so I think that language models are definitely better at writing and language than most humans are.

Speaker 1

但我真正好奇的问题是,为什么它们无法写出文学性或创意小说级别的作品?

But the question that I was really curious about is, why can't they write at a sort of literary, creative-fiction level?

Speaker 1

因为关键是,如果你听这些AI领袖谈论他们的抱负,他们会说,我们要治愈癌症。

Because the thing is, if you listen to these AI leaders talk about their aspirations, they say, we're gonna cure cancer.

Speaker 1

我们要解决物理学问题。

We're gonna solve physics.

Speaker 1

我们要打造一个超人类的程序员。

We're gonna build a superhuman coder.

Speaker 1

他们毫不避讳地说,我们的AI模型将比75%的人类程序员更优秀。

They are not shy about, oh, our AI models are gonna be better than 75% of human coders.

Speaker 1

他们说的是,不。

They're saying, no.

Speaker 1

我们明天就真的会建起一座自我复制的工厂。

We will literally build a self replicating factory tomorrow.

Speaker 1

然后,泰勒·科文在去年十月的一次采访中问萨姆·阿尔特曼:你觉得 GPT 什么时候能写出一首聂鲁达的诗?

And then Tyler Cowen asked Sam Altman in an interview from last October, when do you think GPT will be able to write a Neruda poem?

Speaker 1

萨姆·阿尔特曼回答说:也许在未来,ChatGPT 能写出一首真正诗人的‘还行的’诗。

And Sam Altman says, maybe in the future, ChatGPT will be able to write, quote, a real poet's okay poem.

Speaker 1

所以让我感到着迷的是,即使是那些对自身技术能力最乐观的人,也对他们的模型在文学创作上的表现极为保守。

So that was the thing that fascinated me is even these guys who are more bullish than anybody else about the capabilities of their technology, they are very reserved about how much literary writing their models can do.

Speaker 1

这正是我真正感兴趣的那个缺口。

So that was the gap that I was really interested in.

Speaker 2

你文章的开篇提出了一个有趣的论点,即在某种程度上,GPT-2 是人工智能在创意写作方面的巅峰。

And you start your piece with this interesting provocation, which is that in some ways, GPT-2 was the peak of AI when it comes to creative writing.

Speaker 2

解释一下这个观点。

So explain that.

Speaker 1

让我对这篇文章产生兴趣的部分原因,是我当时正在为你那本书做研究,翻阅了所有这些早期模型的输出内容。

Part of what got me interested in this piece was I was actually doing research for your book and I was going through all of these previous generations of models and reading the outputs.

Speaker 1

真正让我震惊的是,GPT-2和GPT-3的写作风格,我觉得比今天的ChatGPT要吸引得多。

And the thing that really shocked me is that, in a way, I found the writing style of GPT-2 and GPT-3 so much more compelling than ChatGPT today.

Speaker 1

它没有那些令人厌烦的小习惯。

It doesn't have any of the annoying tics.

Speaker 1

它没有破折号,没有三段式列举,也没有‘不是这个,而是那个’的套路。

It doesn't have the em dashes, the tripartite lists, the it's not this but that.

Speaker 1

它的语气要丰富得多。

The tone was much more variable.

Speaker 1

它真的会给你惊喜。

Like it would actually surprise you.

Speaker 1

它会很幽默。

It would be funny.

Speaker 1

它会很有诗意。

It would be poetic.

Speaker 1

这让我震惊,让我回溯到几代之前,意识到也许那时候它们也一直在撒谎,还有其他各种问题。

And that shocked me to sort of, like, go back a few generations and realize that maybe, you know, they were also lying all the time and all sorts of other things.

Speaker 1

但从写作风格的角度来看,我更喜欢它,我想深入研究一下。

But from a writing style perspective, I kind of preferred it, and I wanted to investigate that.

Speaker 3

很奇怪。

Weird.

Speaker 3

这让我很震惊。

That shocks me.

Speaker 3

对我来说,和GPT-2对话就像在和一个刚从楼梯上摔下来的人交谈。

To me, talking to GPT-2 was like talking to somebody who had just fallen down the stairs.

Speaker 3

你明白我的意思吗?

You know what I mean?

Speaker 3

那种感觉就像:‘我觉得我们得送你去医院。’

Where it was like, I think we need to get you to the hospital.

Speaker 3

你闻到烤面包的味道了。

You smell toast.

Speaker 3

是的。

Yeah.

Speaker 2

你瞧,早期OpenAI的提示词库中有一些非常棒的提示,比如他们会说:‘我刚在拉斯维加斯赢了17.5万美元。’

There are these amazing prompts from this, like, early OpenAI prompt library where they would say, like, you know, I just won $175,000 in Las Vegas.

Speaker 2

关于税务,我需要知道些什么?

What do I need to know about taxes?

Speaker 2

而GPT-2会直接开始写一篇关于孤儿院之类的短篇故事,然后……

And GPT-2 would, like, just start writing some short story about, like, an orphanage and, like...

Speaker 1

这让人感到惊讶。

was like surprising.

Speaker 1

是的。

Yes.

Speaker 1

它们简直疯了。

They were like nutty.

Speaker 1

它们真的很怪异。

They were they were they were weird.

Speaker 1

它们绝对会是个糟糕的公司助手,糟糕透顶的编程实习生。

They would absolutely be a terrible corporate assistant, horrible, like, coding intern.

Speaker 1

它们做不到现代大语言模型能做的任何事,而那些能力我是非常感激的。

They can't do any of the things that modern LLMs can do that I'm very grateful for.

Speaker 1

但单从纯写作风格的角度来说,以前的这些模型写得非常好。

But, like, from a pure writing-style perspective, they're very good.

Speaker 1

就说GPT-3吧,我还找到过有人做的一组示例,他让模型以保罗·格雷厄姆的风格写作,以理查德·道金斯的风格写点东西,诸如此类的要求。

So GPT-3 in particular: I found this, like, set of samples that some guy did where he was like, oh, write in the style of Paul Graham, write in the style of Richard Dawkins, whatever.

Speaker 1

而且它在风格匹配上的表现,比现在的大语言模型还要好得多。

And it could style match much better than modern LLMs can.

Speaker 1

尤其因为文学创作很大程度上源于叙事口吻和行文风格,这正是我当时真正感兴趣的方向之一。

And particularly because so much of sort of literary writing comes from voice and style, that was one of the things I was really interested in.

Speaker 1

然后我就会想,我们到底丢失了什么,才会让如今的大语言模型连保罗·格雷厄姆或是其他人的写作风格都模仿不出来了?

And it's like, what did we lose that the LLMs can no longer emulate Paul Graham's style or whoever's style?

Speaker 1

因为如果我把那个人给GPT-3用的一模一样的提示词,输入到ChatGPT 5.4这类版本里,生成的内容会烂得一塌糊涂。

Because I would put in the same exact prompt that this guy gave GPT-3, put it into ChatGPT 5.4 Thinking or whatever, and it would be god-awful.

Speaker 1

我当时就觉得这太奇怪了。

And I was like, that's really weird.

Speaker 3

那么给我们讲讲,在GPT-2和GPT-3时代之后,发生了什么变化,导致这些模型对我们的回应方式改变了?

So tell us about what you learned about what happened after the GPT-2 and GPT-3 era that changed the way that these models respond to us.

Speaker 1

是的。

Yeah.

Speaker 1

我的意思是,答案基本上是训练后的调整。

I mean, I think the answer is post-training, basically.

Speaker 1

于是他们开始加入一个训练后层,本质上就是说:我们有这些疯狂、不可预测、像脑震荡一样的模型。

So they started adding a post-training layer, which is basically saying we have these, like, crazy, unpredictable, like, nut-job, concussed models.

Speaker 1

嗯。

Mhmm.

Speaker 1

它们需要学会守规矩,因为一个不守规矩的模型,会是一个很差的公司助手。

And they need to learn how to behave, because a model that can't behave is a very bad corporate assistant.

Speaker 1

因此,人工智能研究人员会提供一些对话范例和脚本供它们学习。

And so the AI researchers give them example dialogues and scripts to learn from.

Speaker 1

他们给他们规定了可以说和不能说的词语。

They give them words that they can and can't say.

Speaker 1

他们采用RLHF,这是一种由人类评分者评估哪种回答听起来最有帮助的过程。

They do RLHF, which is a process by which human graders will rate, like, which response is the most helpful sounding or something like this.

Speaker 1

因此,这些经过后训练的模型被以某种方式束缚、训练或引导,形成了一个非常乐于助人的角色或人格,但可能在创造性与出人意料的表达方面表现不佳。

And so now these post-trained models have been trapped, in a way, or trained or guided towards a very particular character or persona that is a very helpful assistant, but might be very bad at writing in creative and surprising ways.

Speaker 2

我的意思是,你描述的是,在后训练阶段,这些AI模型会接受人类的评估。

I mean, the way that you described it was that there is a phase within the post-training phase where these AI models are evaluated by humans.

Speaker 2

嗯。

Mhmm.

Speaker 2

这就是他们所说的RLHF,即基于人类反馈的强化学习。

And that's part of what they call RLHF or reinforcement learning from human feedback.

Speaker 2

你报道中让我印象深刻的是,你确实采访了一些参与过这种模型反馈工作的人,他们说,自己只是被要求以一些毫无意义的方式进行评分。

And what struck me in your reporting is that you actually talked to some people who have done this kind of feedback to the models, who say that they're just being asked to grade things in ways that don't make sense.

Speaker 3

是的

Yeah.

Speaker 2

对吧?

Right?

Speaker 2

跟我们说说这个。

Tell us about that.

Speaker 1

是的

Yeah.

Speaker 1

我的意思是,这非常有趣,因为你会在Mercor这样的平台上看到这类招聘,埃隆的公司xAI也会直接发布这类职位。

I mean, this is super interesting because, like, these job listings you'll see on, like, places like Mercor, or xAI, Elon's company, will list them directly.

Speaker 1

它会写着:创意写作专家,每小时45美元。

It'll be like, creative writing expert, $45 an hour.

Speaker 1

必须是《纽约时报》畅销书作者,并且还得拿过《柯克斯书评》的星级评价之类的。

Must be a New York Times bestseller and have, like, a starred Kirkus review or something like this.

Speaker 3

你拿过《柯克斯书评》的星级评价吗,鲁斯?

Have you ever gotten a starred Kirkus review, Roose?

Speaker 2

我觉得是的。

I think so.

Speaker 3

好的。

Okay.

Speaker 3

干得好。

Good job.

Speaker 3

好吧。

Alright.

Speaker 1

你可能有资格去帮埃隆,帮Grok的Ani写得更好一点。

You might qualify to help Elon, to help Ani from Grok write a little bit better.

Speaker 3

是的。

Yeah.

Speaker 3

我们会去申请这份工作。

We're gonna get in that job listing.

Speaker 3

但好吧。

But okay.

Speaker 3

你刚才在说。

You were saying.

Speaker 1

是的。

Yeah.

Speaker 1

所以不管怎样,这些公司意识到,这些AI研究人员非常擅长判断什么是好的代码,但他们并不真正懂得什么是好的写作。

So anyway, these companies realize that these AI researchers are really good at knowing, like, what good coding is, but they don't actually know what good writing is.

Speaker 1

所以他们就想,为什么不雇一些人来弄清楚呢?

So they're like, why don't we hire some humans to find out?

Speaker 1

于是他们会聘请一些文学硕士、出版作家,有时甚至只是些有博客的普通人。

And so they'll commission like MFAs and published authors and sometimes just like random guys with a blog or whatever.

Speaker 1

我采访过的人里,有一位是给Scale AI做写作评估的外包人员,当时在为其中一家大型实验室做这项工作。

And one of the people I talked to was a contractor for Scale AI as a writing evaluator, and he was doing this for one of the bigger labs.

Speaker 1

他说,那些评分标准根本毫无道理。

He said that the rubrics just didn't make any sense.

Speaker 1

他会收到诸如“你得根据文本里的感叹号数量来给它们打分”这类要求。

He would be told things like you have to grade them based on the number of exclamation marks that there are.

Speaker 1

所以如果一段内容里出现了三个感叹号,那就算超标了。

And so if something has three exclamation marks, that's too many.

Speaker 1

那你就得给这段内容扣分。

And so you have to ding that one.

Speaker 3

对。

Yeah.

Speaker 3

而且我得说,这些写作建议总体上还不赖。

And I have to say, generally not bad writing advice.

Speaker 3

我是说,我觉得这得取决于文本的长度,但在很多场景里,三个感觉确实太多了。

I mean, I guess it depends on the length of the text, but three feels like a lot for many scenarios.

Speaker 1

这就是他们给职场女性提的商务沟通建议。

This is what they tell women in business communications.

Speaker 1

就像是,把所有这些感叹号都换成句号。

It's like, take all those exclamation marks, replace them with periods.

Speaker 1

也就是说,我们要把所有这些东西都删掉。

Like, we are just gonna remove all of the items.

Speaker 3

我们教女性要压抑自己。

We teach women to shrink themselves.

Speaker 1

没错。

Exactly.

Speaker 1

是的。

Yeah.

Speaker 1

但确实如此。

But yeah.

Speaker 1

所以他实际上是被要求去评判这些东西。

So he was sort of, like, being asked to grade these things.

Speaker 1

还有另一个例子,他收到一堆同人小说,却被要求根据事实性来评分,因为这是其中一项标准。

Or another one was he got a bunch of fan fictions, and he was supposed to grade them on their factuality since that was one of the criteria.

Speaker 1

我确实觉得,人们可以设计出比这位评估者所使用的更好的评分标准。

I do imagine that one could, you know, devise better rubrics than this particular evaluator was given.

Speaker 1

但我认为,这至少表明了一些资源雄厚的大公司根本不知道如何判断什么是好的写作。

But I think it does show at least that some of these, like, very big companies that are very well resourced simply do not know how to think about what good writing is.

Speaker 3

简而言之,我想强调一下,因为在我看来,这正是整个问题的关键。

Briefly, I wanna underline that because to me, that seems like the whole story.

Speaker 3

我们正在把整个互联网的内容都拿去根据事实性来打分。

We are taking the entire Internet and we are grading it on factuality.

Speaker 3

所以,你从这种做法中得到的大型语言模型,很可能根本没什么创造力。

And so the LLM that you're gonna get out of that is just probably not gonna be all that creative.

Speaker 2

而且我在想,这在多大程度上与很多公司都在使用的这种可验证奖励机制有关呢?

Well, and I wonder how much of it is related to this sort of verifiable reward system that a lot of these companies are using.

Speaker 2

在这种机制下,先让一个系统生成大量代码,再让另一个评估模型检查代码是否合格。

You have a system generate a bunch of code, and then you have another evaluator model check the code to see whether it's good or not.

Speaker 2

这种做法在编程领域是有效的,因为代码要么能运行,要么不能运行。

And that works in domains like programming where the code either runs or it doesn't.

Speaker 2

但创意写作并不是这样运作的。

But creative writing doesn't work that way.

Speaker 2

你无法让评估者以任何一致的方式告诉你,某样东西究竟是好是坏。

You can't have an evaluator tell you, you know, with any sort of consistency whether something is good or not.

Speaker 2

因此,这可能只是取决于个人偏好。

And so it may just come down to preference.

Speaker 2

所以我想知道,你认为这是一个实验室正努力解决的技术问题吗?

And so I guess I'm curious, like, do you see this as a technical problem that the labs are frustrated trying to solve?

Speaker 2

还是这只是需求相关的问题?

Or is this just demand related?

Speaker 2

这仅仅是人们希望聊天机器人说话的方式吗?

Is this just what people want chatbots to sound like?

Speaker 2

在每一次测试中,当不同模型相互比拼时,听起来像乏味的企业助手的那个总是胜出。

And in every test where they pit different models against one another, the one that sounds like a bland corporate assistant wins.

Speaker 2

因此他们就选择了这种风格。

And so they go with that.

Speaker 1

我认为两者都是对的。

I think both are true.

Speaker 1

就像我们让模型做的大部分写作,都是帮我写这封邮件。

It's like the majority of writing that we are asking the models to do is, write this email for me.

Speaker 1

对吧?

Right?

Speaker 1

而且,它们在这方面非常擅长。

And, like, they excel at that.

Speaker 1

它们确实是出色的公司邮件写手。

They are truly great corporate email writers.

Speaker 1

在那种暗含攻击性的委婉表达上,它们比我强多了。

They are much better at the whole, like, passive-aggressive thing than I am.

Speaker 1

同时,我也同意你所说的,确实存在一个与可验证性相关的技术难题。

At the same time, I do think like you said, there is a technical challenge that has to do largely with verifiability.

Speaker 1

有些人花了数十年时间试图阐明是什么让莎士比亚成为莎士比亚,或是什么让一首聂鲁达的诗成为聂鲁达的诗,但他们依然无法给出确定的答案。

It's like there are people who have spent decades of their lives attempting to articulate what makes Shakespeare Shakespeare, or what makes a Neruda poem a Neruda poem, and they will still not know in any kind of certain way.

Speaker 1

他们仍然会与同行学者和文学评论家争论谁的写作更优秀。

They will still get into debates with their fellow academics and literary critics about which writer is better than the other.

Speaker 1

正因为这些事情是主观的、难以言喻的、难以用评分标准衡量的,这才是艺术的本质。

And because these things are subjective, because they are ineffable, because they are hard to put in a rubric, that is the nature of art.

Speaker 3

说到这点,你一开始提到萨姆·阿尔特曼说,我们目前还写不出一首好诗。

And to that point, you know, you started this segment by talking about Sam Altman saying, like, hey, you know, we just basically can't write a great poem yet.

Speaker 3

一年前,萨姆·阿尔特曼曾表示,公司已经训练出一个优秀的创意写作模型,并在X平台上发布了一篇短篇小说。

Sam Altman, a year ago, said the company had trained a good creative writing model and posted a short story on X.

Speaker 3

许多人觉得这篇小说很有吸引力。

Many people found it compelling.

Speaker 3

萨姆·阿尔特曼是不是一直没对我们坦诚,贾斯敏?

Is Sam Altman just not being consistently candid with us, Jasmine?

Speaker 3

哦。

Oh.

Speaker 1

这也不是第一次了。

Wouldn't be the first time.

Speaker 1

但那篇短篇小说,要是你们还记得的话,我敢肯定各位都有印象,里面有不少很棒的句子,比如提到镜子的接缝,还有周四,那叫什么来着?

But that short story, if you remember, I'm sure you guys recall, had some great lines, like, talking about the seams of mirrors or Thursday, the what was it?

Speaker 2

好像是“那种过渡性的、差不多周五”之类的说法。

It was like the liminal almost-Friday or something.

Speaker 1

对。

Yeah.

Speaker 1

那个带着“差不多周五”味道的过渡性日子。

That liminal day that tastes of almost-Friday.

Speaker 1

等一下。

Wait.

Speaker 2

我真得去查一下这个内容,因为它写得实在太棒了。

I I have to actually look this one up because it was so good.

Speaker 1

趁你查资料的工夫,你也知道,AI写作的问题在于它能想出一大堆新奇的比喻,有时候那些比喻还挺让人意外的。

While you're looking it up: the thing about AI writing is, like, it comes up with all these fun metaphors, and they are, like, kind of surprising sometimes.

Speaker 1

但这种语言并没有扎根于生活。

But also, the language is not grounded in life.

Speaker 1

这正是我想说的另一点:除了可验证性之外,从根本上说,当我想到我真正喜爱的作家时,无论是记者、诗人还是其他类型的作家,他们都是从生活中写作的。

And that was my other thing is aside from the verifiability, fundamentally, when I think about the writers who I really love, when I think about whether it's journalists or poets or whatever, like they are writing from life.

Speaker 1

对吧?

Right?

Speaker 1

比如记者会走出去与人交谈,观察事物,留意天空在特定时刻的颜色;诗人则会思考他们亲身经历的个人体验。

Like a journalist goes out and talks to people and they like see stuff and observe like the color of the sky in a particular way or like a poet is thinking about personal experiences that they've had.

Speaker 1

他们的写作是有分量的。

Their writing has stakes.

Speaker 1

它源自情感深处。

It comes from an emotional place.

Speaker 1

尽管大型语言模型非常有才华,语法无可挑剔,但它们没有生活。

And the fact that like LLMs while being very talented, grammatically pristine, whatever, they don't have lives.

Speaker 1

这意味着它们所选择的所有隐喻、所有词语、所有例子,都是缺乏根基的。

That means that all of the metaphors they choose, all of the words they choose, the examples they choose, they're just ungrounded.

Speaker 1

对吧?

Right?

Speaker 1

因为写作并不是源自某种观点、特定经历或特定社群,所以缺乏可信度。

Like it's not coming from a point of view or a particular experience or particular community that makes the writing believable.

Speaker 1

我认为,语调和风格的一部分在于它非常贴合一个人所经历的生活;大语言模型做不到这一点,正如一个没有真正经历过那种生活的人也做不到一样。

I think part of what voice and style is, is that it is very specific to the life that a person has had, and LLMs cannot get there, in the same way a human who hasn't really lived that life cannot get there.

Speaker 3

我不知道。

I don't know.

Speaker 3

我觉得这要看具体情况。

I feel like it's case dependent.

Speaker 3

你知道的?

You know?

Speaker 3

我是个音乐迷。

I'm a big music fan.

Speaker 3

是的。

Yeah.

Speaker 3

在过去几个月里,我喜欢向大语言模型提出一些关于音乐的问题,特别是某些乐队的声音,这听起来像是个玩笑式的提问,因为大语言模型从未真正听过任何声音。

And over the past few months, I have enjoyed putting questions about music and, in particular, the sounds of certain bands to an LLM, which sounds like a joke prompt because an LLM has never heard anything.

Speaker 3

对吧?

Right?

Speaker 3

然而,我发现这些模型通常能和我就音乐的音色进行很好的对话。

And yet, I find that in general, the models can have good conversations with me about the sound of music.

Speaker 3

也许它们只是基于互联网上大量有听觉经验的人所写的公开内容进行模式匹配。

Now it may be that they are just pattern matching based on a bunch of public writing on the Internet by people who do have ears and have heard.

Speaker 3

对吧?

Right?

Speaker 3

我非常接受这种可能性。

Like, I'm very open to that.

Speaker 3

是的。

Yeah.

Speaker 3

但我再次感到惊讶的是,它竟能以一种富有感染力的方式谈论感官类话题,至少在我看来,这超越了我对它能力的预期。

But again, I have just sort of been struck by the way that it is able to, like, write about sensory topics in an evocative way that, at least to me, surpasses what I would predict they would be able to do.

Speaker 2

是的。

Yeah.

Speaker 2

我想提出一些可能有人会对你的文章提出的反对意见。

I wanna pose a couple objections that I think someone might make to your article.

Speaker 2

其中之一是:这不过是自我安慰(cope)。

One of them is: this is cope.

Speaker 2

这是贾斯敏,一位作家,一位非常有才华的作家,她正在寻找AI目前还不擅长的方面,并认为这证明AI要完成这些任务将非常困难。

This is Jasmine, a writer, a very talented writer, sort of finding the things that AI, in her view, is not good at yet and saying this is categorical proof that it will be very hard for AI to do these things.

Speaker 2

这和软件工程师在模型开始在编程方面表现得非常出色时的反应一样。

This is the same reaction that software engineers had when models started getting really good at code.

Speaker 2

他们会说,哦,但它还做不了我做的另外那十件事。

They would say, oh, well, it can't do these other 10 things that I do.

Speaker 2

但事实上,再过几年,模型在所有事情上都会比我们所有人做得更好,包括写作。

And basically, just wait a few years and the models will be better than all of us at everything, including writing.

Speaker 1

我真希望这真的只是自我安慰,因为我一直在努力让自己被自动化取代。

I would love for it to be cope because I try to automate myself away all the time.

Speaker 1

我并没有深深依恋必须亲自写作这件事,我喜欢写作,但过去三年里,我反复尝试过无数次,想让Claude替我完成我的工作。

I have no sort of deep attachment to having to write. Like, I like writing, but I have tried over and over and over for the past three years to automate my own job away and to get Claude to do my job for me.

Speaker 1

它做不到。

It cannot do it.

Speaker 1

这非常令人沮丧。

This is very frustrating.

Speaker 1

这并不是因为我没努力,你知道吗?

It's not out of a lack of trying, you know?

Speaker 1

而且,我又要回到那些首席执行官们自己说的话上了,对吧?

And again, I'm going back to the CEOs themselves and the things that they themselves are saying, right?

Speaker 1

这不只是我作为一个作家的看法。

Like it's not just me a writer.

Speaker 1

是萨姆·阿尔特曼说,这项技术能治愈癌症、解决物理学难题,但它写不出比真正诗人的一首‘还行的’诗更好的作品。

It's Sam Altman saying this thing will cure cancer and solve physics, but it will not write better than a real poet's okay poem.

Speaker 1

所以,我认为这表明至少在人们感知中,存在某种微妙的不同。

And so, like, I think like that suggests that there is something that is at least perceived as a little bit different.

Speaker 1

我认为在未来几年里,这些模型在写作方面可能会变得好很多。

I think it's very possible the models will get much better at writing over the next few years.

Speaker 1

我不觉得这是永远做不到的事。

I don't think it's like a never thing.

Speaker 1

我认为,像报道这种事很难被复制。

I do think that, you know, like reporting is hard to replicate.

Speaker 1

我认为,拥有真实且可验证的生活经历是很难被复制的。

I think that, like, having life experiences that are real and verifiable is hard to replicate.

Speaker 1

我认为,风格方面是可以改进的,尤其是如果你对模型进行微调的话。

I think the style stuff can be improved, especially if you fine tune the models.

Speaker 1

但我觉得这篇报道有趣的地方在于,它展示了这些公司的市场激励和需求激励是如何塑造了我们今天所看到的它们的能力的。

But I think what's also interesting to me about this piece is that it shows how the market incentives, the demand incentives of these companies, do shape what we see their abilities are today.

Speaker 2

我想象那些深度信赖AI的人可能会提出的另一个反对意见是,这一切都只是见仁见智。

The other objection I'm imagining people might have who are very AI-pilled is that this is all in the eye of the beholder.

Speaker 2

对吧?

Right?

Speaker 2

现在已有几项研究表明,如果让人们在不知情的情况下对AI写作和人类写作进行盲测,他们会更喜欢AI写作;但一旦告诉他们这是AI写的,他们对它的评价就会骤降。

There have been several studies now that have shown that if you give people a blind taste test of AI writing versus human writing, they prefer the AI writing until you tell them that it's AI writing, and then the value in their eyes plummets.

Speaker 2

我最近在《纽约时报》的一个测验中做过一次这样的实验。

I did one of these in a New York Times quiz just recently.

Speaker 2

那么,是否有可能AI模型在写作方面已经超越了人类,但只要我们一知道这些文字是AI生成的,而不是人类亲手写的,就会因为来源而非质量而完全失去兴趣?

So is it possible that the models have already become superhuman at writing, but that the minute we learn that they are AI models generating text and not humans writing words with their fingers, we lose all interest in it just because of the source, not because of the quality of the writing?

Speaker 1

我的意思是,人们确实不愿意去喜欢AI写作,这一点既有趣又真实,这也是他们看到明显是AI生成的文字时感到困扰的原因之一;但正如你所说,在这些测验和测试中,AI在这些狭隘场景下其实能胜过人类作家。

I mean, I think it's definitely interesting and true that people don't want to like AI writing, and that is part of what bothers them when they see AI text that is obviously AI even though, like you said, like, in these quizzes and tests, AI can outperform human writers in those narrow scenarios.

Speaker 1

我的疑问在于,很多这类测验和测试都忽略了一点:作为作家,你们也都是作家,你们的工作中有多少真正是在进行文字生成呢?

I mean, my quibble with a lot of these quizzes and tests is that like as a writer and you guys are writers too, how much of your job is actually text generation?

Speaker 1

我认为AI在文字生成方面已经超越了人类。

I think AI is a superhuman text generator.

Speaker 1

对吧?

Right?

Speaker 1

我的工作,我每天大约有25%的时间是在生成文字。

My job, I am generating text probably 25% of the hours in my day.

Speaker 1

我花很多时间采访别人。

I spend a lot of time interviewing people.

Speaker 1

我花很多时间构思点子。

I spend a lot of time coming up with ideas.

Speaker 1

我花很多时间阅读,而且不是随便读,而是专门读那些我觉得合适的特定来源。

I spend a lot of time reading and not just reading indiscriminately, but like reading very particular sources that feel like the right ones.

Speaker 1

所以,你知道,通常在你做这类测试的时候,你会说:‘请写一段关于特朗普为何赢得2016年大选的文字,500字以内。’

And so, like, you know, usually at the point that you are doing one of these tests, you're saying, like, generate one paragraph very specifically about, like, why Trump won the 2016 election, 500 words or less.

Speaker 1

而且你已经给出了提示,我认为这是写作中非常关键的一部分——你到底要写什么。

And like you've already given the prompt, which I think is a critical part of writing is like what are you gonna write about.

Speaker 1

你通常还会提供一些证据和指导,比如要求‘500字以内’。

You've often like supplied some of the evidence and the guidance in the form of it saying like 500 words or less.

Speaker 1

在这一点上,我认为AI的文字生成能力确实比绝大多数人类都要强。

And at that point, I do think that AI is probably a better text generator than almost all humans are.

Speaker 1

但再次想想,AI在为文章构思点子方面仍然非常糟糕。

But again, when I think about it, you know, AI is still very bad at coming up with ideas for articles.

Speaker 1

它在新闻报道方面仍然非常差。

It is still very bad at reporting.

Speaker 1

非文本生成的那部分工作离自动化还很远。

The non text generation parts of the role feel further away from automation.

Speaker 1

再说一次,我的态度大概是‘永远别说永远’。

Again, like, I'm sort of, like, never say never.

Speaker 1

也许它终会做到。

Like, maybe it'll get there.

Speaker 1

如果克劳德能为我下一篇论文提供好点子,我会非常开心,但目前还做不到。

I would be totally happy if, you know, Claude was able to give me good ideas for my next essays, but it's not there yet.

Speaker 3

我们已经看到大语言模型在类型小说领域取得了巨大进展。

Well, we're already seeing the LLMs make huge progress in genre fiction.

Speaker 3

对吧?

Right?

Speaker 3

最近在节目中,我们采访了一位《纽约时报》文章的作者,他谈到浪漫小说作家现在能用大语言模型每年生成几十部小说。

So, like, recently on the show, we talked to the author of a story in the Times about how authors of romance novels are now able to generate dozens of novels a year using LLMs.

Speaker 3

事实上,我们之前的很多讨论都围绕着一个观点:你必须以不同方式、坚持不懈地提示模型,才能得到你想要的结果。

In fact, much of the discussion that we had was around how you just have to prompt them differently and sort of relentlessly in order to get what you want.

Speaker 3

你知道吗,贾斯敏,你的文章让我思考:通过反复用各种方式告诉模型‘再怪一点’,能让模型写出更怪异的内容,这到底能实现多少?

You know, your piece, Jasmine, it made me wonder, like, how much of getting a model to just write weird can be achieved by repeatedly telling it in different ways, hey, be a little weirder.

Speaker 1

一部分可以,但不是全部。

Some of it, but not all of it.

Speaker 1

比如,我采访了詹姆斯·余,他是Sudowrite的联合创始人,这是最早期的创意小说AI写作助手之一。

I mean, so I talked to, for example, James Yu, who is the cofounder of Sudowrite, which is one of the earliest creative fiction AI writing assistants.

Speaker 1

我还和其他一些同样在小说写作LLM领域的人聊过。

I talked to some other folks who similarly were in the fiction writing LLM space.

Speaker 1

就像你说的,某种程度上,很多作家已经在大量使用这些工具,依赖LLM生成大量文本,这非常成功,也能满足读者的需求等等。

And like you said, to an extent, a lot of writers are already using these a lot, are already leaning on LLMs to generate large amounts of text and it can be very successful and it can meet readers' needs and whatever.

Speaker 1

但就连我采访的这些人也向我描述,要逆转实验室在训练后所做的所有调整,有多难。

But like even these people who I was talking to, they were describing to me how freaking hard it is to undo all of the post training that the labs have done.

Speaker 1

因此,他们不得不投入巨大的工程努力,这显然让他们非常沮丧——因为要让这些模型不再那么活泼、不再那么奉承、不再那么PG-13,回归到一种更原始、能再次变得怪异的基础模型状态,实在太难了。

So they are, like, applying immense amounts of engineering effort, and in my conversations with them, it clearly frustrates them that it is so hard to get these models to stop being so chirpy, so sycophantic, so PG-13 and everything, in order to get them to this sort of, like, base-model state where they're able to be weird again.

Speaker 1

所以我认为这当然是可能的,但我觉得实验室由于这些模型的训练方式,让这件事变得相当困难。

So I think it's certainly possible, but I think the labs have made it quite challenging just because of the way that these models are trained.

Speaker 1

我认为另一件重要的事情是,我倾向于认为写作和许多创意工作实际上是这种半人马模型的完美应用场景。

The other thing that I think is important is I tend to think that writing and a lot of creative work is actually like the perfect use case for these centaur models.

Speaker 1

对吧?

Right?

Speaker 1

也就是说,人类与人工智能的协作才能达到最远的境界。

Like the idea that the human plus AI collaboration is where you can get the furthest.

Speaker 1

当我听到你们采访那些小说作家的对话时,我就在想,这正是一个半人马模型。

And when I listened to the interviews that you guys did about the fiction authors, I was thinking this is a centaur model.

Speaker 1

对吧?

Right?

Speaker 1

因为如果没有人类不断引导、逼迫AI变得怪异、变得感性等等,它自己是根本不会这么做的。

Because without the human prompting and bullying the AI into getting weird and getting sensual and whatever, like it was not gonna do that on its own.

Speaker 1

就像我自己,我也确实把LLM当作研究助手来使用。

And like I myself, like I do use LLMs as a research assistant.

Speaker 1

就像我在《大西洋》文章中提到的,克劳德现在以一种我觉得极其有用的方式帮助我修改自己的作品。

Like I wrote about that inside the Atlantic piece about the way that Claude has now sort of helped me edit my own work in a way that I found incredibly useful.

Speaker 1

但我确实觉得,在任何个人视角、生活经验真正重要的领域,协作元素都至关重要。

But I do feel like the collaborative element is important for any domain where the personal perspective, lived experience, whatever, really matters.

Speaker 2

稍微谈一谈这一点。

Talk about that a little bit.

Speaker 2

你提到了你的编辑过程。

You mentioned your editing process.

Speaker 2

你是如何使用人工智能来帮助你编辑作品的?你觉得有用吗?

How are you using AI to help you edit your work, and are you finding it useful?

Speaker 1

是的。

Yeah.

Speaker 1

我觉得在过去几个月里,我终于找到了诀窍,这让我非常兴奋。

So I feel like I really cracked this over the last, like, couple months, which I'm very excited about.

Speaker 1

因为再次强调,我曾无数次尝试让这些工具替我写作和修改,但它们从来都没真正能做到。

Because, again, I've tried to make these things, like, write and edit for me over and over and over, and they've never really been able to do it.

Speaker 1

所以我意识到,如果我把克劳德当作一个编辑,而不是仅仅试图根据某种通用的优秀写作标准来评判和反馈我的作品。

So the thing that I realized was if I make Claude into an editor, that is not just trying to grade and give feedback on my work against some genericized standard of what good writing is.

Speaker 1

而是真正地根据我本人贾斯敏对写作的个人追求来调整,它就能提供对我而言更有帮助的反馈。

But if it's actually grading against, basically, what Jasmine's personal aspirations for writing are, it can give feedback that I find much, much more helpful.

Speaker 1

所以我做的就是,把我之前写过的全部Substack存档,以及一些自由职业作品,全部输入给Claude。

So what I did was, basically, I fed Claude my entire Substack archive of the writing that I've previously done, as well as some of my freelance work.

Speaker 3

为了更具体一点,这是在克劳德的某个项目里完成的吗?你是怎么设置的?

And just to get real specific, is this inside, like, a Claude project, or how have you set this up?

Speaker 3

因为我知道,我们的听众都想要尝试这种方法。

Because I know, like, our listeners are gonna wanna try this.

Speaker 1

我是在一个项目里完成的。

I did it in a project.

Speaker 1

我问过Claude的意见,我当时想:我需要用Claude Code搞点什么吗?

But on Claude's advice. I was like, do I need to Claude Code something?

Speaker 1

克劳德说:不需要。

Claude was like, no.

Speaker 1

这有点过度了。

That's overkill.

Speaker 1

所以你根本不需要写代码之类的。

So you don't need to code or anything.

Speaker 1

在Claude项目中,我上传了我全部的写作档案。

So in a Claude project, I gave it my whole archive of writing.

Speaker 1

我本人在每篇作品发布后,都会给自己写一些回顾笔记。

I also personally write retro notes to myself after everything I publish.

Speaker 1

我有一个笔记应用,里面全是我自己写的,关于我所有作品的优点和缺点。

So I have a notes app that's just like me writing what was good and bad about everything I've ever written.

Speaker 1

就几条要点而已。

Just a few bullet points.

Speaker 2

这就是为什么Jasmine以后会成为我们的老板。

This is why Jasmine's gonna be our boss.

Speaker 1

我的这些要点质量其实很低,但我还是把它们给了它,因为我想让它了解我的品味。

I mean, these are very low quality bullet points, but I also gave it that because I wanted it to learn my taste.

Speaker 1

我想让它了解我渴望成为什么样的写作者,我觉得自己在哪些方面不足,又为哪些方面感到自豪。

I wanted it to learn what I aspire to be, where I see myself falling short, and what I am proud of.

Speaker 1

对吧?

Right?

Speaker 1

基于这两点,再加上一些额外的信息,比如我的受众是谁、我的报道领域是什么、我的目标是什么,我们共同制定了一套评估标准,不再像以前那样关注感叹号有多少个,而是会问:‘这篇文章是否充分利用了你在硅谷的‘内部人类学家’身份?’

And so from those two things, plus a little bit more information about, like, here's my audience, this is my beat, these are my goals, we were able to co-develop a rubric where, instead of, like, how many exclamation marks does it have, it would say things like, does this take advantage of your, quote unquote, like, insider anthropologist position in Silicon Valley?

Speaker 1

因为这正是我和Claude都认为能体现我独特风格的一点。

Because that's one of the things that Claude and I think distinguish my voice.

Speaker 1

或者它也会注意到:‘哦,Jasmine,你经常在不同语体之间切换。’

Or it'll also notice like, oh, Jasmine, you tend to move between registers.

Speaker 1

你会在创业术语、网络俚语等各种表达方式之间来回转换。

You'll switch between, you know, startup jargon and like Internet slang and whatever.

Speaker 1

我觉得你能自如地在高端与通俗之间、政策议题与个人场景之间切换,这是你写作的鲜明特点。

And like, I think the fact that you can do the high low or move from like policy to personal scene, this is something that is characteristic of your writing.

Speaker 1

所以,我们再次共同开发这些定性标准。

And so again, we're co-developing these qualitative criteria.

Speaker 1

然后我把它分成几个阶段:构思阶段、结构评分标准、文风评分标准、最终事实核查。

And then I split it into phases of, like, ideation-phase rubric, structure rubric, prose rubric, final fact-checking.

Speaker 1

所以我现在把这一切都放进了一个Claude项目中。

And so what I do now, I put this all in a Claude project.

Speaker 1

我告诉它:你的任务是根据这些标准评估我的草稿,但不要替我写作,而是要不断引导我思考如何改进。

I said, your job is to evaluate my drafts based on these criteria, but not to do the writing for me, and to make sure to prompt out of me, like, what I can do better.

Speaker 1

我把一份草稿上传给了Claude。

I dumped a draft into Claude.

Speaker 1

Claude会对其执行第二阶段的结构评估。

Claude will run like phase two structure on it.

Speaker 1

它会说:你的结论只是总结,这太枯燥了。

It'll say things like, your conclusion is just a summary, and this is really boring.

Speaker 1

事实上,在你写那篇关于某某的文章时,你最后是以一个场景收尾的,我觉得那样有力得多。

In fact, in your piece about this and that, you actually ended on a scene, and I thought that was much more powerful.

Speaker 1

所以你为什么不试着用一个场景来结束这一篇呢?

So why don't you try ending this one on a scene?

Speaker 1

克劳德会说,与其虚构一个场景,不如问问你自己:飞机起飞时你在想什么?

And rather than inventing a scene, Claude will say, what were you thinking when the plane took off?

Speaker 1

你内心有什么感受?

What were you feeling inside?

Speaker 1

你能想到一个与儿童安全倡导者讨论人工智能时让你深有共鸣的场景吗?

Can you think of a scenario where you had a conversation with say a kid's safety advocate about AI that really resonated with you?

Speaker 1

因为现在听起来就像一篇枯燥的政策解释。

Because right now it sounds like dry policy explainer.

Speaker 1

事实上,我觉得这种反馈非常有用。

And that feedback, I actually found incredibly useful.

Speaker 1

比如,我仍然在运用自己的判断来决定是否采纳这些建议。

Like, I'm still applying my own judgment to say, do I take it or not?

Speaker 1

但我知道,这关乎我成为更好的自己,一个更出色的写作者。

But I'm like, you know, this is about me becoming the best version of myself as a writer.

Speaker 1

这关乎自我提升,而克劳德在推动我这样做,我觉得这要有效得多。

It's about, like, me self-improving and Claude pushing me to do that, which I found much, much more helpful.
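The phased setup Jasmine describes can be sketched as a small data structure plus an instruction builder. This is a minimal illustration, not her actual project configuration: the `Phase` class, the `build_instructions` helper, and the assignment of criteria to phases are all assumptions made for the example. Only the four phase names, the "evaluate, don't write" instruction, and the sample criteria come from the conversation.

```python
from dataclasses import dataclass, field


@dataclass
class Phase:
    """One editing phase with its qualitative criteria (hypothetical structure)."""
    number: int
    name: str
    criteria: list[str] = field(default_factory=list)


# Phase names follow the episode; which criterion sits in which phase is assumed.
PHASES = [
    Phase(1, "ideation"),
    Phase(2, "structure", [
        "Does the conclusion do more than summarize, for example by ending on a scene?",
    ]),
    Phase(3, "prose", [
        "Does the piece use the writer's insider-anthropologist position in Silicon Valley?",
        "Does it move between registers, such as startup jargon and internet slang?",
    ]),
    Phase(4, "final fact-checking"),
]


def build_instructions(phase: Phase) -> str:
    """Assemble plain-text instructions for one phase; no model API is called."""
    lines = [
        f"Phase {phase.number}: {phase.name}.",
        "Evaluate the draft against the criteria below.",
        "Do not do the writing for me; ask questions that prompt me to improve it.",
    ]
    lines += [f"- {criterion}" for criterion in phase.criteria]
    return "\n".join(lines)


print(build_instructions(PHASES[1]))
```

Keeping the criteria as data rather than burying them in one long prompt would make it easier to revise the rubric as the writer's retro notes accumulate.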

Speaker 1

哇。

Wow.

Speaker 2

我想以同行作家的身份问你们两个一个问题。

I wanna ask you both a question as fellow writers.

Speaker 2

你们有没有因为AI而产生让自己的写作更怪异的冲动,以从一堆平庸作品中脱颖而出?

Do you feel the impulse to make your writing weirder because of AI to sort of stand out from the sea of slop?

Speaker 2

因为我自己就感受到一种拉扯:哦,这个奇怪的细节可能该删掉,但我还是想留着,因为克劳德永远不会这么写。

Because I find myself feeling this tug of, like, oh, that's a little weird aside that probably I should cut, but I think I'm gonna leave it in because, like, Claude would never do that.

Speaker 2

对吧?

Right?

Speaker 2

这就像一种标记,表明这些字是我亲手打出来的,我觉得这就是我留下的印记。

It's like a marker that I am typing these words, and I feel like that's sort of my imprimatur that I'm leaving.

Speaker 3

我的回答是:是的。

My answer to you is yes.

Speaker 3

我确实有这种感觉。

I absolutely feel that way.

Speaker 3

我确实回去修改过句子,故意让它们显得更怪异一些,尤其是让它们听起来更口语化,因为我知道大型语言模型通常不会这么表达。

And I've, like, gone back and tried to edit sentences to, like, make them feel a little bit more, like, weird or, like, in particular, to make them sound colloquial in a way that I know, like, an LLM generally would not be.

Speaker 3

是的,正是因为这个原因。

And, like, yes, it is for that reason.

Speaker 3

我觉得现在的写作,我们所有人,不,不是所有人。

I think that writing right now, like, we're all... not all.

Speaker 3

我们很多人都对可能读到垃圾内容高度警惕,所以如果你是个不想生产垃圾内容的作家,你就应该问问自己这个问题。

Many of us are on such high alert for the prospect that we might be reading slop that I think if you are a writer who does not want to be producing slop, like, you should be asking yourself that question.

Speaker 1

我觉得这让我更能安心地按照自己本来的方式写作。

I think it makes me a lot more comfortable writing the way I want to write in the first place.

Speaker 1

我觉得,可能和你们两位不同,我并没有在新闻编辑部成长,没有被灌输过那种特定的报社风格和各种规范。

Like, I think, like, maybe unlike both of you, I didn't sort of come up through newsrooms where I was, like, learning a very specific house style and all of these norms.

Speaker 1

我现在也能写新闻稿了。

Like, I can do news writing now.

Speaker 1

这是我后来学会的。

It's something I've learned now.

Speaker 1

但说实话,我其实更贴近网络和博客原生的风格,这种文体更口语化、不拘一格,也不那么精致,会开一些不恰当的玩笑,总之是一种更松散的写作形式。

But, like, I'm actually much more, quote, unquote, like, Internet and blogging native, which is a form that is voice-y and irreverent and not as pristine and, like, will make inappropriate jokes and, like, you know, it's just a looser form of writing.

Speaker 1

所以我认为,这让我更自在地去写那种博客风格的内容,而不是总试图用更职业化的新闻语调来写作。

And so I think what it's actually done is made me more comfortable doing the bloggy thing instead of sort of always trying to write in a more professionalized journalistic tone.

Speaker 2

所以我想,我们应该给你,贾斯敏,留一个问题:你的文章非常有说服力地指出,如今的AI并不擅长我们所有人所珍视的那种写作。

So I think we should leave this with a question for you, Jasmine, which is, you know, your piece makes the case very convincingly that today's AIs are not very good at the kind of writing that I think we all value.

Speaker 2

你认为它们将来能达到那种水平吗?

Do you think they will get there?

Speaker 2

公司应该做些什么,才能让它们的模型在写作上变得更好?

And what should the companies do to make their models better at writing?

Speaker 1

我认为,如果我们把文本生成和新闻报道分开来看——我对模型做新闻报道这件事并不太看好——而只是讨论文学小说,或者给你一堆采访录音,让你写一篇杂志特稿之类的,我认为,如果它们能把投入在编码代理和真正赚钱项目上的资源,同样多投入到这个任务上,它们是有可能做到的。

I think that if we separate out text generation from reporting, which I'm not that bullish on the models doing, and we are just talking about, say, literary fiction or here's a bunch of interview transcripts, write a magazine feature or something, I think that if they applied as many resources towards that task as they do towards coding agents and things that actually make them money, I think that they could get there.

Speaker 1

公司会最终觉得,把所有资源都花在提升写作能力上,而不是用来替代23岁的软件工程师,从财务上更划算吗?

Will the companies ever find it financially advisable to spend all their resources on that instead of automating 23 year old software engineers?

Speaker 1

大概不会。

Probably not.

Speaker 1

我会感激这样的世界。

I would be grateful for that world.

Speaker 1

我不需要它们取代我的工作或这些人的工作,但我认为这是可能的。

I don't need them to take my job or these folks' jobs, but I think it's possible.

Speaker 3

听好了。

Look.

Speaker 3

它们最终会做到的。

They're gonna get around to it eventually.

Speaker 3

明白吗?

Okay?

Speaker 3

你知道,我的意思是,我听懂你的意思了。

You know, it's like, I mean, I I I hear you

Speaker 2

凯西,在这个经济环境下,作家能赚多少钱?

say writers make in this economy, Casey?

Speaker 2

最终,这些收入可没法支撑大量的数据中心。

Eventually, like, those aren't gonna pay for a lot of data centers.

Speaker 2

不。

No.

Speaker 3

写作是有经济价值的,最终,AI公司会想把这一切都据为己有。

There is economic value in writing, and eventually, the AI companies will want that all to themselves.

Speaker 2

你知道这件事可能会带来什么非常滑稽的结果吗?

You know what would be a very funny outcome of this?

Speaker 2

你知道吗,关于模型的那些限制措施,你的观点是这样的。

You know, taking your point about the sort of guardrails of the models.

Speaker 2

也许下一部伟大的美国小说会由Grok写出来。

Maybe the next great American novel will be written by Grok.

Speaker 1

天啊。

Oh god.

Speaker 1

那太

That's

Speaker 2

好了,感谢贾斯敏·孙加入我们的讨论。

And with that, Jasmine Sun, thank you for joining us.

Speaker 3

谢谢

Thank you

Speaker 1

非常感谢。

very much.

Speaker 1

凯文和凯西。

Kevin and Casey.

Speaker 3

我们回来后,每个人都在花钱买令牌,凯文。

When we come back, everyone's spending money on tokens, Kevin.

Speaker 2

太好了。

Great.

Speaker 2

继续。

Keep going.

Speaker 3

你得做个令牌。

You've gotta be token.

Speaker 3

令牌最大化,就是这样。

Tokenmaxxing, that is.

Speaker 2

继续。

Keep going.

Speaker 2

是的。

Yes.

Speaker 2

而且。

And.

Speaker 3

我们回来后,你在说什么?

When we come back, what are you talking about?

Speaker 3

这个问题正由席卷硅谷的排行榜提出。

That's the question being asked by a leaderboard that's sweeping Silicon Valley.

Speaker 2

席卷。

The sweeping.

Speaker 2

它真的在席卷。

It's really sweeping.

Speaker 2

看到了吗?

See?

Speaker 3

好吧,凯文,你最近结束了写书假,又开始为《纽约时报》撰稿了。

Well, Kevin, you've recently returned from book leave and are once again writing in the New York Times.

Speaker 3

再次看到自己的名字印在纸上,感觉如何?

How does it feel to see your name in print again?

Speaker 2

感觉很棒。

Feels great.

Speaker 2

还没发生呢,但一旦发生了,一定会很棒。

Hasn't happened yet, but when it does, it'll be great.

Speaker 3

我提前读了你即将发表的一篇文章,内容是科技公司现在创建了排行榜,显示哪些员工在工作中使用了最多的AI令牌。

Well, I got to take an early read at a story that you are publishing about the fact that tech companies have now created leaderboards to show which employees are using the most AI tokens in their work.

Speaker 3

是的。

Yes.

Speaker 3

这是一项

It's a

Speaker 2

那里正掀起一场令牌热潮,这些公司的员工在同事之间非正式地、带着点趣味地竞争,但他们却非常认真对待。

token frenzy out there, and the employees of these companies are competing among their colleagues sort of informally and sort of for fun, but they're taking it very seriously.

Speaker 2

他们希望自己是公司里使用AI令牌最多的那个人。

They want to be the people at their company who are using the most AI tokens.

Speaker 3

所以我想问一个基础问题,给可能不太了解的听众。

So let me just ask a basic question for listeners who may not be familiar.

Speaker 3

什么是令牌?为什么你会开始关注这个东西?

What is a token, and why is that something you might start keeping track of?

Speaker 2

令牌是AI工作的基本单位。

So a token is the basic atomic unit of AI labor.

Speaker 2

它基本上是一个词的片段,也是AI模型提供商衡量使用量的方式。

It's basically a fragment of a word, and it is how AI model providers measure their consumption.

Speaker 2

所以如果你输入一个提示,比如‘帮我写这篇论文’,旧模型可能会用几百个令牌来回应。

So if you type in a prompt, you know, help me write this essay, an old model might have given you a couple hundred tokens in response.

Speaker 2

那相当于几百个单词。

That would be a couple hundred words.

Speaker 2

而过去一年左右,随着这些自主编码工具的兴起,模型对令牌的需求变得大得多。

And what has been happening over the past year or so as these agentic coding tools have started taking off is that the models are just much more token hungry.

Speaker 2

现在你可以在单次会话中使用数十万甚至数百万个令牌。

You can use now hundreds of thousands or even millions of tokens in a single session.

Speaker 2

因此,正是这种理念推动了这些排行榜的兴起:你进行的编码越多,使用的智能工具越多,同时运行的进程越多,你的令牌数量就会越高。

And so that is what is propelling these leaderboards is the idea that the more sort of coding you're doing, the more agentic tools you're using, the more simultaneous processes you're running, the higher your token count will be.

Speaker 3

我发现一个有用的衡量标准是,生成7500个单词大约需要10000个令牌。

One measurement I found useful was that apparently, it takes about 10,000 tokens to generate 7,500 words.

Speaker 3

如果这个数字能帮你更好地理解的话。

If that sort of, you know, helps to ground you at all.

Speaker 3

但正如你刚才所说,我想多了解一下,更先进的系统使用的令牌数量远超这个数字。

But as you just said, and I want to hear more about this, the more advanced systems are using way more tokens than that.

Speaker 3

那么,能跟我讲讲那些令牌使用顶尖人物在排行榜上公布的数字吗?

So tell me about some of the numbers that some of the sort of token all-stars are putting up on the boards.

Speaker 2

我不清楚所有确切的数字,但我了解到,在OpenAI,他们确实追踪这类排行榜,最近一位员工在七天内的最高令牌使用量达到了2100亿个。

So I don't know all of the exact numbers, but I did learn that at OpenAI, where they do track this kind of leaderboard, the highest employee token count over a seven-day period recently was a guy who used 210 billion tokens.

Speaker 2

作为参考,这大约相当于33部维基百科内容的文本量。

And this is, for rough scale, about 33 Wikipedias' worth of text.

Speaker 2

而现在,这一切并不只是打字和接收回复。

And now all of that is not sort of typing and receiving a response.

Speaker 2

其中一部分被称为缓存令牌。

Some of that is what they call cached tokens.

Speaker 2

所以,并不是所有内容都是首次由模型生成的,但我想,即使是一年前,这些数字听起来也会完全荒谬。

So it's not all sort of, you know, being extruded from the model for the first time, but these are the kinds of numbers that I think even a year ago would have sounded completely insane.
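To put the figures quoted above on one scale, here is a rough back-of-the-envelope sketch. It uses only numbers mentioned in the conversation: roughly 10,000 tokens per 7,500 words, 210 billion tokens in seven days, and "about 33 Wikipedias." The function and constant names are made up for illustration, and because some of those tokens are cached rather than freshly generated, the word count is an upper-bound illustration, not a measure of text anyone actually produced.

```python
# Ratio quoted in the episode: about 10,000 tokens per 7,500 words.
WORDS_PER_TOKEN = 7_500 / 10_000


def tokens_to_words(tokens: int) -> float:
    """Convert a token count to an approximate word count (rough heuristic only)."""
    return tokens * WORDS_PER_TOKEN


weekly_tokens = 210_000_000_000  # top seven-day count mentioned in the episode
weekly_words = tokens_to_words(weekly_tokens)

# Size of one "Wikipedia" implied by the "about 33 Wikipedias" comparison.
tokens_per_wikipedia = weekly_tokens / 33

print(f"{weekly_words:,.0f} words-equivalent in seven days")
print(f"{tokens_per_wikipedia:,.0f} tokens per 'Wikipedia' (implied)")
```

Under these assumptions the weekly total works out to about 157.5 billion words-equivalent, and the comparison implies roughly 6.4 billion tokens per "Wikipedia."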

Speaker 3

对。

Right.

Speaker 3

那么,这个人是在为国防部开发一个新的大规模国内监控项目吗?

Now was this guy working on a new mass domestic surveillance program for the Department of Defense?

Speaker 2

我不知道,OpenAI也没有让他接受采访。

I don't know, and OpenAI did not make him available for interviews.

Speaker 2

但我在撰写这篇专栏时,想试着联系或者采访一群属于这十亿令牌俱乐部的人,对吧,那些极端的高级用户,然后直接问他们:嘿。

But what I wanted to do in writing this column was to try to call up a bunch of people, or talk to a bunch of people, who are in this sort of billion-token club, right, the sort of extreme power users, and just ask them, like, hey.

Speaker 2

你们是怎么用掉这么多令牌的?这不会非常昂贵吗?

How are you guys using all those tokens, and isn't that very expensive?

Speaker 2

你们是怎么支付这笔费用的?

And how are you paying for it all?

Speaker 2

我学到了很多。

And I learned a lot.

Speaker 3

是的。

Yeah.

Speaker 3

好吧。

Well, okay.

Speaker 3

那么首先告诉我们,这到底有多贵?

Well, so tell us first of all just how expensive it is.

Speaker 2

非常昂贵。

Very expensive.

Speaker 2

是的。

Yeah.

Speaker 2

事实上,我听说 Claude Code 用量最高的个人用户,按Anthropic的统计,上个月在令牌上的花费超过了15万美元。

In fact, I heard that the top user of Claude Code, the top individual user of Claude Code as measured by Anthropic, spent more than $150,000 on tokens last month.

Speaker 2

所以你推算一下。

So extrapolate that.

Speaker 2

这相当于一名员工年薪超过一百万美元。

That is like an employee making more than a million dollars a year.

Speaker 2

是的。

Yeah.

Speaker 2

而他们一个月就花掉了这么多。

And they are burning that in a month.

Speaker 2

我还从其他一些极端的程序员那里听到了类似的数据,他们每天在这些模型的令牌上花费数千美元。

And I heard similar figures from some of these other extreme coders who are spending something on the order of thousands of dollars a day on tokens from these models.
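The extrapolation Kevin gestures at can be checked with simple arithmetic. This sketch takes the quoted "more than $150,000" in one month as a floor and shows what it implies per year and per day. The function names and the 30-day month are assumptions for illustration.

```python
# "More than $150,000" in one month, per the episode; treated here as a floor.
MONTHLY_SPEND_USD = 150_000


def annualized(monthly_usd: float) -> float:
    """Project a monthly spend over twelve months."""
    return monthly_usd * 12


def per_day(monthly_usd: float, days_in_month: int = 30) -> float:
    """Average daily spend, assuming a 30-day month by default."""
    return monthly_usd / days_in_month


annual = annualized(MONTHLY_SPEND_USD)
daily = per_day(MONTHLY_SPEND_USD)

print(f"annualized: ${annual:,.0f}")
print(f"per day:    ${daily:,.0f}")
```

At that pace the annual figure is $1.8 million, which is consistent with the "more than a million dollars a year" comparison, and the daily average of $5,000 lines up with the "thousands of dollars a day" reported by other heavy users.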

Speaker 2

不过我们也应该说明,这些公司的员工可以免费获得令牌。

Now we should also say the employees of these companies get their tokens for free.

Speaker 2

对吧?

Right?

Speaker 2

所以他们自己并没有掏钱。

So they are not shelling out.

Speaker 2

他们的公司也没有掏钱。

Their companies are not shelling out.

Speaker 2

但在其他公司,这正开始成为一个问题,因为他们的支出已经超出了这类开销的预算。

But at other companies, this is starting to become an issue because they are sort of outstripping their budgets for these things.

Speaker 3

所以确实有一些公司,工程师们真正地给雇主带来了每周高达15万美元的成本,因为他们大量使用大型提供商的令牌。

So there are companies where there are engineers who legitimately are costing their employers maybe $150,000 a week because they're getting tokens from one of the big providers.

Speaker 2

是的。

Yeah.

Speaker 2

我采访了一位在瑞典的软件工程师,他说他花在Claude上的钱可能比他的工资还多。

I talked to a software engineer in Sweden who said that he probably spends more than his salary on Claude.

Speaker 2

因此,这对一些程序员来说,正逐渐变成一种极其昂贵的福利。

So this is essentially becoming, like, a very expensive job perk for some of these coders.

Speaker 3

所以跟我讲讲,为什么雇主们想创建排行榜来鼓励员工使用这些工具?因为我觉得其他公司可能会说:‘上个月你花了15万美元在令牌上?那你别在这儿干了,公司都破产了。’

So talk to me about why employers want to create leaderboards to promote this to employees, because I could see other companies saying, if you spent $150,000 on tokens last month, you actually don't work at this company anymore because we're bankrupt.

Speaker 2

对。

Right.

Speaker 2

所以这正是我很好奇的问题:为什么会发生这种情况?

So this was a big question that I had is, like, why is this going on?

Speaker 2

这似乎是一些员工激励和员工行为追踪的结合。

And it seems to be some combination of sort of employee motivation and worker tracking.

Speaker 2

对吧?

Right?

Speaker 2

这些公司的高管们认为,你使用的令牌越多,可能就越有生产力。

There are executives at these companies who think that the more tokens you use, the more productive you probably are.

Speaker 2

正如我们在本节目之前的一个环节中讨论过的,这些公司非常希望员工开始使用AI工具。

And as we discussed in a previous segment on this show, these companies are very eager to have their workers start embracing the AI tools.

Speaker 2

因此,在不少公司里,我跟一些人聊过,他们说,是的。

And so at a number of these companies, I talked to people who said, yeah.

Speaker 2

这本质上是他们想看看谁真正全身心投入了这种新的编程方式。

This is just basically them trying to see who is really all in on the new way of programming.

Speaker 3

你已经和一些在这些排行榜上排名靠前的人谈过了。

And you've talked to a number of people who are ranking high on these leaderboards.

Speaker 3

我知道你可能没有深入研究过他们的代码,但你对他们实际的生产力有什么看法?

I realize you probably haven't dug deep into their code, but what is your sense of how productive they actually are?

Speaker 3

也就是说,令牌使用量和推动我公司迈向新高度之间有什么关系?

Like, what is the relationship between token usage and taking my company to the next level?

Speaker 2

我的意思是,这非常不清楚。

I mean, it's very unclear.

Speaker 2

对吧?

Right?

Speaker 2

这些人中有些可能只是在生成一些毫无价值的项目。

Some of these people may be just generating, like, worthless projects.

Speaker 2

我觉得我接触的很多人对这些排行榜感到担忧的是,它们只是在激励你拼命增加令牌使用量。

I think the thing that worries a lot of the people I talk to about these leaderboards is that they just incentivize you to, like, run up your token count.

Speaker 2

对吧?

Right?

Speaker 2

因为这样你就会显得像是那个特别的、所谓的十倍工程师或百倍工程师,远远超越你的同事。

Because then you look like the special, you know, 10 x engineer or 100 x engineer who's, like, outperforming all your colleagues.

Speaker 2

所以我认为有不少公司觉得这种排行榜有点奇怪,甚至可能适得其反。

So I think there are a number of companies that see this leaderboard business as a little strange and maybe counterproductive.

Speaker 2

但我确实觉得,那些最频繁使用令牌的人中,有一种他们正在高效工作的感觉。

But I do think that there is a feeling among the most sort of heavy token users that they are being productive.

Speaker 3

是的。

Yeah.

Speaker 3

我得说,读你的专栏时,我觉得这简直会制造最糟糕的激励机制。

I have to say, when I read your column, I thought this just seems like it would create the worst incentives.

Speaker 3

没错。

Yes.

Speaker 3

对吧?

Right?

Speaker 3

有一种叫做古德哈特定律的观点。

There's this idea of Goodhart's law.

Speaker 3

对吧?

Right?

Speaker 3

当一个指标变成目标时,它就不再是一个好的指标了。

Like, when a measure becomes a target, it ceases to become a good measure.

Speaker 3

我想不出比为令牌使用量创建排行榜更能确保它变成一个糟糕指标的方法了。

I can't think of a better way to ensure that token usage becomes a bad measure than creating a leaderboard for it.

Speaker 2

完全正确。

Totally.

Speaker 3

公司内部的人对此有什么看法?

What are the people inside the company saying about that?

Speaker 2

嗯,有些人反对这种排行榜做法。

Well, some of them are opposed to this whole leaderboard thing.

Speaker 2

我也和一些为排行榜辩护的人谈过。

I also talked with some folks who defended the leaderboards.

Speaker 2

他们说,看啊,要衡量程序员的生产力从来就不是件容易的事。

They said, look, it's never been all that easy to track the productivity of programmers.

Speaker 2

有些人曾用他们写出的代码行数或提交的拉取请求数量来衡量生产力。

Some people have had their productivity measured by, like, how many lines of code they generate or how many pull requests they made.

Speaker 2

这些都只是衡量你有多努力、做了多少工作的不完美替代指标。

These are sort of these imperfect proxies for, like, how hard are you working, how much are you doing.

Speaker 2

但这些公司的员工也明智地认为,这关系到他们自身的成功。

But the employees of these companies also see this, I think, wisely as a key to their own success.

Speaker 2

如今,许多公司正将AI令牌的使用和消耗纳入绩效考核周期。

A number of these companies are now using AI token use and consumption as part of the performance review cycle.

Speaker 2

所以你去参加年度评估时。

So you go in for your annual review.

Speaker 2

你的老板会说,嘿。

Your boss says, hey.

Speaker 2

看起来你上个月只用了大约七千万个令牌。

It looks like you only used, you know, 70,000,000 tokens last month.

Speaker 2

怎么回事?

What's going on?

Speaker 2

所以我认为,这些公司的工程师们逐渐意识到,如果他们想拥有长期成功的职业生涯,最好开始多使用一些令牌。

And so I think the engineers of these companies are getting wise to the fact that if they want to have a long successful career, they better start using some tokens.

Speaker 3

是的。

Yeah.

Speaker 3

但我想象他们中的一些人对此其实很紧张。

But I imagine that some of them are really nervous about that though.

Speaker 3

对吧?

Right?

Speaker 3

因为在我看来,至少有一些公司希望通过激励令牌使用来达到目的,因为他们自己也怀疑,我们让员工使用这些工具越多,就越不需要长期雇佣人类了。

Because, like, it seems clear to me that at least some of these companies want to incentivize token usage because the companies themselves suspect that the more we can get them using this stuff, the less long we will have to employ the humans.

Speaker 2

也许吧。

Maybe.

Speaker 2

不过,我认为这与其说是人工智能系统取代人类,不如说是一种截然不同的工作方式。

Although, I think it's less about, like, the AI systems replacing the humans and more about, like, it is just a radically different way of working.

Speaker 2

对吧?

Right?

Speaker 2

这些人中的大多数都有很长的软件工程职业生涯。

These are people who most of them have had long careers in software engineering.

Speaker 2

他们从小就是手写代码长大的。

They grew up writing code by hand.

Speaker 2

他们可能从小就开始使用某种AI助手,比如GitHub Copilot。

They maybe grew up using some sort of, like, AI assistant, like GitHub Copilot.

Speaker 2

这些公司里的人说,这些自主工程系统真的非常不同。

And what people at these companies are saying is that these agentic engineering systems are just really different.

Speaker 2

你需要以不同的方式来应对它们。

You have to approach them in a different way.

Speaker 2

你需要花大量时间与它们相处,了解它们擅长什么、不擅长什么。

You have to spend a lot of time with them to understand what they're good and not good at.

Speaker 2

对它们来说,这是一种激励员工的方式,意思是:嘿。

And to them, this is sort of a way of motivating their employees to say, hey.

Speaker 2

去试试新东西吧。

Go out and try the new thing.

Speaker 3

是的。

Yeah.

Speaker 3

我不确定。

I I don't know.

Speaker 3

我一直在思考这个问题:如果我是这些公司中的一名工程师,并且有动力登上排行榜,我会怎么应对?

I've been thinking a lot about this question of, like, if I were an engineer at one of these companies and I have this incentive to get on the leaderboard, like, how would I approach it?

Speaker 3

我认为,本能地会想浪费大量令牌来提升排行榜名次。

And I do think there's, like, the instinct to just, like, waste a bunch of tokens to, like, rise higher on the leaderboard.

Speaker 3

但最终,如果你排名太高,人们会问你那些令牌都用到哪儿去了。

Like, ultimately, if you rise too high, people are gonna ask you what you did with all the tokens.

Speaker 3

没错。

Right.

Speaker 3

如果你以一百亿个令牌的用量排在第一,却只靠氛围编程写了个计算器之类的程序,人们很可能会生气

If you're number one at, like, 10 billion and you only manage to, you know, like, vibe code a calculator or something, people are probably gonna get mad at

Speaker 2

你。

you.

Speaker 2

是的。

Yeah.

Speaker 2

我确实和一个人聊过,他推测排行榜上排名靠前的人其实都在做副业。

And I actually did talk to one person who speculated that actually the people at the top of the leaderboards are all doing side projects.

Speaker 2

他们正在创办他们的

They're starting their

Speaker 3

新公司。

new company.

Speaker 3

他们用老板的钱创办了一家新公司。

They start a new company with the boss's money.

Speaker 3

如果你真的这么做,我想说,我向你致敬。

And if you're doing that, I just wanna say, I salute you.

Speaker 3

这才是正确的工作方式。

Like, that is the right way to work.

Speaker 2

对。

Yeah.

Speaker 2

要是你打算这么做的话,或许别当排行榜第一比较好。

Maybe don't be the number one on the leaderboard if you're doing that.

Speaker 2

试着稳定在第六或第七名左右。

Maybe try to stick around six or seven.

Speaker 3

没错。

Yeah.

Speaker 3

其实排行榜中游才是你应该努力达到的位置。

Like, middle of the pack is kind of where you want to aim yourself.

Speaker 3

那个,我来问个问题。

I mean, let me ask.

Speaker 3

你觉得有没有什么代币追踪工具能提供靠谱的参考信号?

Is there any kind of token tracking that you think offers a reasonable signal?

Speaker 3

比如说,你觉得如果是一家科技公司,该不该做排行榜?

Like, do you think that if you're, like, a tech company, you should create a leaderboard?

Speaker 3

不。

No.

Speaker 3

我觉得这

I think that's

Speaker 2

是一个糟糕的主意,原因就是我们刚才讨论过的所有内容,包括古德哈特定律,我认为这只会导致人们浪费代币,去做一些副项目。

a bad idea for all the reasons that we just talked about, including Goodhart's law, which is I think this is just going to lead to people just wasting tokens, doing side projects.

Speaker 2

但如果我是公司里的预算负责人,看到人们花在AI代币上的钱是他们薪水的数倍,嗯。

But if I'm the budget manager at a company and I'm seeing that people are spending multiples of their salary Mhmm.

Speaker 2

我就会问他们,到底用这些代币做了些什么。

On AI tokens, I'm asking them some questions about what they're doing with that all.

Speaker 2

如果他们的回答不是‘我开发了一个能每年创造数十亿美元收入的全新产品’,那我就会说,嘿。

And if their answer is not, I built an amazing new product that's gonna generate billions of dollars a year in revenue, I'm gonna say, hey.

Speaker 2

你这个月能不能少用一点?

Could you maybe use a little less this month?

Speaker 3

是的。

Yeah.

Speaker 3

我得说,这个代币排行榜的想法让我意识到,它只不过是软件行业长期以来一直在探索的一个问题的新形式:我该如何判断我的软件工程师是否高效?

I have to say, I have been struck at how this idea of the token leaderboard just represents a new incarnation of something that the software industry has been trying to figure out for a long time, which is how can I figure out if my software engineers are productive?

Speaker 3

我最近刚和我未婚夫——一位非常英俊的软件工程师——聊过你的专栏。

You know, I was talking recently to this very handsome software engineer who I'm engaged to about your column.

Speaker 3

他告诉我,他过去是根据他贡献的代码行数来评估绩效的。

And he was telling me that, you know, he used to be evaluated on how many lines of code he contributed.

Speaker 3

他还跟我讲了当年人们玩的各种把戏,比如写一个快速算法,把一堆东西翻译成几种新语言,虽然完全没用,但能让他看起来那一周特别高效。

And he told me about all the games that people used to play back in the day with, oh, you know, I like wrote a quick algorithm to like, you know, translate a bunch of stuff into some new languages and it's like completely worthless, but it makes me look like I had a very productive week.

Speaker 3

于是我回去查了下,发现这种做法早在六七十年代就有了。

And so I went back and looked into this and they were doing this in the sixties and seventies.

Speaker 3

在计算机编程的早期,曾流行一句话:用代码行数来衡量编程进展,就像用重量来衡量飞机制造的进展。

And there's this saying from the early days of computer programming that eventually arises that says, quote, measuring programming progress by lines of code is like measuring aircraft building progress by weight.

Speaker 3

我必须说,我觉得这里的情况也差不多。

And I have to say, I think that the same thing kind of applies here.

Speaker 3

对吧?

Right?

Speaker 3

是的,如果你稍微眯着眼睛,从足够抽象的层面来看,确实可能有些使用大量令牌的人比那些不使用的更高效。

That, like, yes, if you squint and at the right level of abstraction, it's probably true that some people who are using a lot of tokens are more productive than some people who aren't.

Speaker 3

但这似乎并不是衡量这些事情的正确方式。

It just doesn't quite seem like the right way to measure these things.

Speaker 3

我只是好奇,这个行业多久才能意识到这一点。

And I just wonder how quickly the industry is gonna figure that out.

Speaker 2

对。

Yeah.

Speaker 2

我认为这会很快发生,部分原因是预算已经变得极其荒谬。

I think it's gonna be pretty soon in part because the budgets are just getting very ridiculous.

Speaker 2

尤其是AI模型提供商现在看到,单个用户消耗的服务量,是几个月前整个公司都难以企及的。

And especially the AI model providers are now seeing individual users consuming amounts of their services that entire companies would have consumed just a few months ago.

Speaker 3

关于这个问题,我最后想问你的是,你认为这对整体经济会有什么影响?

You know, maybe the kind of last question I have for you about this is just what implications do you think it has for the broader economy?

Speaker 3

对吧?

Right?

Speaker 3

因为我们知道,在经济的许多不同领域,管理者都在说,我想激励我的员工使用人工智能,并且我想追踪他们是如何使用人工智能的。

Because we know that in so many different sectors of the economy, managers are saying, I want to incentivize my employees to use AI, and I wanna track how they're using AI.

Speaker 3

所以你认为,随着这些排行榜信息的传播,非技术领域的员工会不会也试图建立自己的版本?

So do you think that as knowledge of these leaderboards spreads, we're going to see people in nontechnical fields try to adopt their own version of them?

Speaker 2

我不希望这样。

I hope not.

Speaker 2

我认为这不仅对追踪实际生产力和产出是个错误的举措,对员工士气也是如此。

I think it's really a bad move, not just for tracking actual productivity and output, but just for morale.

Speaker 2

对吧?

Right?

Speaker 2

我记得多年前,Gawker 公司办公室里曾有一个流量排行榜,你可以看到自己的文章点击量相对于其他人的情况。

Like, I remember years ago when, like, Gawker would have, like, a traffic leaderboard at their office so you could see how many clicks your stories were getting relative to other people.

Speaker 2

我不认为当时在那里工作的任何人觉得这种做法能激励正确的行为,或者提升员工的士气。

I don't think anyone who, like, worked there at the time thought that was, like, incentivizing the right things or creating, like, high morale among employees.

Speaker 2

基本上,每个人都在不停地相互竞争。

Basically, everyone was just competing with each other all the time.
