欢迎收听《经济对话》,这是经济学与自由图书馆的一部分。
Welcome to EconTalk, part of the Library of Economics and Liberty.
我是主持人拉斯·罗伯茨,来自斯坦福大学胡佛研究所。
I'm your host, Russ Roberts, of Stanford University's Hoover Institution.
我们的网站是econtalk.org,您可以在此订阅、评论本期播客,并找到与今天对话相关的链接及其他信息。
Our website is econtalk.org, where you can subscribe, comment on this podcast, and find links and other information related to today's conversation.
您还可以浏览我们的档案库,收听自2006年以来的每一期节目。
You'll also find our archives, where you can listen to every episode we've ever done going back to 2006.
我们的电子邮箱是mail@econtalk.org。
Our email address is mail@econtalk.org.
我们期待您的来信。
We'd love to hear from you.
今天是2015年11月30日,我的嘉宾是宾夕法尼亚大学沃顿商学院及文理学院的安纳伯格大学教授菲利普·泰特洛克。
Today is 11/30/2015, and my guest is Philip Tetlock, the Annenberg University professor affiliated with the Wharton School and the School of Arts and Sciences at the University of Pennsylvania.
他与丹·加德纳合著的《超预测:预测的艺术与科学》是本期节目的主题。
He is the author, along with Dan Gardner, of Superforecasting: The Art and Science of Prediction, which is the subject of today's episode.
菲利普,欢迎来到Econ Talk。
Philip, welcome to EconTalk.
嗯,谢谢。
Well, thank you.
你在书中一开始就提出了许多批评,或者说贯穿全书,我认为你对那些评论家有很多批评。
So you start with a lot of criticisms, or, I'd say, throughout the book you have a lot of criticisms of pundits.
他们中有些人拥有博士学位,有些是记者,还有些只是所谓的预测专家。
Some of those have PhDs and some of them are journalists and some are just so-called experts who make predictions.
但事实证明,当需要判断预测是否准确时,你很难真正追究他们的责任。
But it turns out that, with a lot of those, you can't really hold their feet to the fire when it comes time to judge whether predictions are accurate or not.
他们是否擅长预测呢?
Are they good forecasters or not?
为什么会这样?
And why is that?
在我们日常生活中,人们声称某事会发生并将其刊登在报纸上,这其中的挑战是什么?
What's the challenge with our sort of day to day world where people claim that something's gonna happen and print it in the newspaper?
嗯,你提到的那些评论家,可能是指像托马斯·弗里德曼或尼尔·弗格森这样的人,左派或右派都有。
Well, the pundits of whom you say we're critical, you're probably thinking of people like Tom Friedman or Niall Ferguson, people on the left or people on the right.
我们研究了各种类型。
We identify all sorts.
他们都相当一致。
They're all pretty uniform.
他们都是相当一致的非常聪明的人。
They're pretty uniformly very smart people.
他们都非常善于表达。
They're very articulate.
他们知识非常渊博。
They're very knowledgeable.
他们可能会对世界政治和经济做出许多看似极具洞察力的观察。
They make many observations about world politics and economics that seem very insightful.
然而,要评估他们对可能未来的判断、选择不同政策路径的后果是否正确极其困难,因为他们几乎完全依赖我们称之为模糊措辞预测的方法。
It is extremely difficult, however, to gauge the degree to which their assessments of possible futures, the consequences of going down one policy path or another, are correct or incorrect, because they rely almost exclusively on what we call vague verbiage forecasting.
他们不会说某事发生的概率是20%或80%。
They don't say that there's a 20% likelihood of something happening or an 80% likelihood of something happening.
他们会说类似'2016年全球通缩是一个明显可能性'这样的话。
They say things like, well, it's a distinct possibility that there'll be a global deflation in 2016.
当你问人们'明显可能性'是什么意思时,根据听众当时的心情,这个概率可能在20%到80%之间。
And when you ask people what distinct possibility could mean, it could mean anything from about 20% to 80% probability depending on the mood they're in when they're listening.
我并不是要暗示你在批评他们(虽然你有时确实会),但你是在批评我们这种文化——把模糊的预言当真,然后让对立阵营的人玩'抓把柄'游戏。
And I didn't mean to suggest you're critical of them, although you sometimes are, but you're critical of our culture that takes these vague pronouncements, and then there's a gotcha game that gets played by people on the other side.
但当然,他们总有办法脱身,因为那些含糊其辞的话里通常都留有退路。
But of course, there's always a way to weasel out of it because there's usually some hedging in that verbiage.
对吧?
Correct?
嗯,确实如此。
Well, that's right.
如果你处在一个'指责游戏'的文化中——每当你的明确概率判断看起来可能出错时就会被人揪住不放——那么退回到模糊措辞中其实是相当理性的选择。
If you exist in a blame game culture in which people are going to pounce on you whenever you make an explicit probability judgment that appears to be on the wrong side of maybe, it's pretty rational to retreat into vague verbiage.
我们在书中提到了一位杰出的记者——《纽约时报》的戴维·莱昂哈特,他创办了数据新闻专栏'The Upshot'。
So we talk in the book about a brilliant journalist, the New York Times journalist David Leonhardt, who created The Upshot, a quantitative column in the New York Times.
他在2011年还是2012年写过一篇文章,当时最高法院以5比4的微弱优势维持了奥巴马医改法案。
And he wrote a piece back in, I guess it was 2011 or 2012, when the Supreme Court narrowly upheld Obamacare by a five-four margin.
而预测市场此前认为该法案被推翻的概率高达75%。
And the prediction markets had been putting a 75% probability on the law being overturned.
据我所知,戴维·莱昂哈特对预测市场并无成见,但他最终认定预测市场这次判断失误。
And David Leonhardt, who doesn't have any grudge against prediction markets as far as I know, concluded that the prediction markets got it wrong.
这对预测市场来说是个严厉的批评,毕竟它们多年来在数百个议题上做出过数百次预测,总体表现并不差。
Now that's a harsh judgment on the prediction markets because they make hundreds of predictions on hundreds of different issues over years, and they're not bad.
当它们预测某事件有75%发生概率时,实际发生率确实接近75%——这意味着有25%的情况不会发生。
When they say there's a 75% likelihood of something happening, it's pretty close to a 75% likelihood, which means that 25% of the time it doesn't happen.
如果你每次都要指责一个校准良好的预测系统仅仅因为结果落在'可能'的错误一侧,那你最终将失去所有可靠的预测工具。
So if you're gonna throw out a very well calibrated forecasting system every time it's on the wrong side of maybe, you're not gonna have any well calibrated forecasting systems at your disposal.
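The idea of calibration described here can be checked mechanically once forecasts and outcomes are written down. The following is a minimal sketch, not the method used by any prediction market or by the tournament: the function name `calibration_table` and the track record below are hypothetical, made up purely for illustration. It groups forecasts by their stated probability and reports how often the event actually occurred in each group.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each stated probability to the observed frequency of the event.

    forecasts: stated probabilities, e.g. 0.75
    outcomes:  1 if the event happened, 0 if it did not
    """
    buckets = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        buckets[round(p, 2)].append(1 if happened else 0)
    return {p: sum(hits) / len(hits) for p, hits in sorted(buckets.items())}


# Hypothetical track record: eight forecasts at 75%, five at 20%.
forecasts = [0.75] * 8 + [0.20] * 5
outcomes = [1, 1, 1, 0, 1, 1, 0, 1] + [0, 0, 1, 0, 0]

table = calibration_table(forecasts, outcomes)
for stated, observed in table.items():
    print(f"stated {stated:.2f} -> observed {observed:.2f}")
```

In this made-up record the 75% forecasts come true six times out of eight, so the forecaster is well calibrated at that level even though two of those forecasts landed on the wrong side of maybe.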
我认为这其实暴露了第二个问题:即便你量化了预测概率,本质上仍允许结果存在不确定性。
I would say that's a second problem really, which is that even when you do quantify your prediction, by definition, you're allowing the possibility that it doesn't happen.
那么问题在于,你如何评估做出这种预测之人的准确性或判断力?
And then the question is, how do you assess the accuracy or judgment of the person who makes a statement like that?
是的,正是如此。
Yes, exactly.
这需要理解概率,需要一些耐心和意愿去长期追踪记录。
And that requires some understanding of probability and some patience and some willingness to look at track records over time.
那就从你的个人记录开始吧。
So let's begin with your particular track record.
你在这个领域做了大量研究,关于预测是否可能、准确性如何、专家是否擅长预测这些问题。
You've done a lot of research in this area, this question of whether prediction is possible, how accurate is it, are experts good at forecasting.
谈谈你的背景。
Talk about your background.
我们稍后会讨论你书中的核心赛事,但我想先了解你的研究历程、过去的发现以及人们的反应。
We're gonna get to the tournament that's at the heart of your book, but I wanna start with your research history and what you found in the past and how people reacted to it.
嗯,我想你这是在委婉问我到底有多老吧?
Well, I guess that's another way of asking just exactly how old I must be.
因为我从事长期预测竞赛研究已经很久了。
Because I've been doing longitudinal forecasting tournaments for a long time.
那我们就直截了当地说吧。
So let's just put it on the table.
我今年61岁,这个研究是在我获得加州大学伯克利分校终身教职后不久开始的,那时我刚过30岁。
I'm 61 years old, and I got started at this right after I got tenure at the University of California, Berkeley, and I was a little more than 30 years old.
那是1984年。
It was 1984.
当时苏联还存在。
And the Soviet Union still existed.
戈尔巴乔夫还没当上苏联共产党总书记。
Gorbachev had yet to become general secretary of the Communist Party of the Soviet Union.
我们在八十年代中期进行了初步试点研究,当时鹰派和鸽派正在争论应对苏联的最佳方式。
And we did our initial pilot studies back in the mid-nineteen-eighties, when hawks and doves were arguing about the best ways of dealing with the Soviet Union.
而现在当我们在伊朗核问题或俄乌冲突问题上争论不休时,我们仍在进行预测竞赛研究。
And now we're doing forecasting tournaments as hawks and doves are arguing about the best ways of dealing with the Iranian nuclear program or, for that matter, with Russia and Ukraine.
所以我们断断续续进行预测竞赛已有三十多年了。
So we've been running forecasting tournaments off and on for thirty plus years.
第一轮大型预测竞赛在八十年代末九十年代初进行,相关成果发表在2005年出版的《专家政治判断》一书中。
The first big set of forecasting tournaments were done in the late eighties and the early nineties and were reported in a book, Expert Political Judgment, that came out in 2005.
第二轮预测竞赛规模更大,涉及数千名预测者、超百万份预测报告,由美国情报机构赞助。
And the second wave of forecasting tournaments were much larger involving many thousands of forecasters, a million plus forecasts, and were sponsored by the US intelligence community.
这些竞赛从2011年持续到2015年。
And they ran from 2011 to 2015.
事实上,它们现在仍在进行中。
And in fact, they're still running.
如果读者有兴趣报名参加正在进行的预测竞赛,可以考虑访问gjopen.com网站。
So if your readers are interested in signing up for an ongoing forecasting tournament, they should consider visiting the website at gjopen.com.
回到您在苏联解体前做的早期研究,那些研究有哪些主要实证发现?
Going back to the earlier work that you did and before the fall of the Soviet Union, what were some of the main empirical takeaways from that work?
一个重要发现是,自由派和保守派提出的政策建议大相径庭,他们对不同政策路径可能产生的结果也有截然不同的条件性预测。
Well, one big takeaway was that liberals and conservatives had very different policy prescriptions, and they had very different conditional forecasts about what would happen if you went down one policy path or another.
而且没有人真正接近预测戈尔巴乔夫现象。
And that nobody really came close to predicting the Gorbachev phenomenon.
同样,也没有人真正接近预测后来苏联的解体。
Nobody for that matter came really close to predicting the disintegration of the Soviet Union later on.
但事后每个人都似乎有一套解释,要么邀功,要么推诿责任。
But everyone after the fact seemed to have an explanation that either appropriated credit or deflected blame.
我相信这与他们的世界观是一致的。
And that was consistent with their worldview, I'm sure.
并且与他们先前的世界观完美契合。
And meshed perfectly with their prior worldview.
所以这就像我们处于一个结果无关紧要的学习情境中。
So it was as though we were in an outcome-irrelevant learning situation.
实际上发生了什么并不重要。
It didn't really matter what happened.
人们总能处于绝佳位置,将发生的事解释为与他们先前的观点一致。
People would be in an excellent position to interpret what happened as consistent with their prior views.
预测竞赛的理念是让人们更容易记住他们过去无知的状态。
The idea of forecasting tournaments was to make it easier for people to remember their past states of ignorance.
嗯,这算是题外话,但它确实是对人性的绝妙洞察。
Well, this is an aside of sorts, but it's just a wonderful insight into human nature.
这也是Econ Talk节目中的一个主题,当你回顾并让人们给出他们记忆中的概率时,比如苏联解体,他们是怎么说的?
And it's a theme here at EconTalk, which is: when you went back and asked people to give what they remembered as their probability of, say, the Soviet Union falling, what did they say?
嗯,他们肯定认为自己当时给苏联解体的概率比实际要高。
Well, they certainly thought they had assigned a higher probability to the dissolution of the Soviet Union than they did.
还有少数人当初给出了极低的概率,却记得自己当时预测的概率超过50%。
And there were a few people who assigned really very low probabilities who remembered assigning higher than a 50% probability.
所以人们事后确实夸大了那些概率。
So people really pumped up those probabilities retrospectively.
心理学家称之为后见之明偏差或'我早就知道'效应,我们在苏联预测竞赛中看到了大量这种现象。
Psychologists call that the hindsight bias or the I-knew-it-all-along effect, and we saw that in spades in the Soviet forecasting tournament.
是啊。
Yeah.
我认为这是我们所有人都会做的一件极其重要的事。
I I think that's an incredibly important thing that we all tend to do.
我们往往以为自己比实际更有远见。
We tend to think we had much more vision than we actually had.
而且我们通常不会把这些事情写下来。
And we usually don't write those things down.
你碰巧记录下了其中一些。
You happen to have written some of them down.
所以当他们拿出原始预测时确实很尴尬。
So that was awkward that they actually had their original forecast.
但对大多数人来说,'我早就知道'效应是个更严重的问题,因为我们没有记录下来。
But for most of us, the I-knew-it-all-along problem is a bigger problem because we don't write it down.
事实上我们的记忆确实会出现偏差。
Well, we truly remember it differently.
即使你认为桌子对面的人知道正确答案,你仍然会记错。
Even if you think the person on the other side of the table knows what the correct answer is, you still tend to misremember it.
是啊。
Yeah.
所以最近这场锦标赛相当引人注目。
So this more recent tournament was rather remarkable.
给我们讲讲参赛者的背景、你在其中的角色、赛事是如何组织的,以及参赛者们竞争回答的一些问题示例。
Give us the background of who competed and your role in it and how it was set up and what some of the questions, for example, were that people were competing on.
好的。
Sure.
这是我和我妻子兼研究搭档芭芭拉·梅勒斯共同完成的工作,当时我们还在加州大学伯克利分校任教,直到2010年左右才转去宾夕法尼亚大学。
This was work I did jointly with my wife and research collaborator, Barb Mellers. We were faculty then at the University of California, Berkeley, and we didn't leave for the University of Pennsylvania until about 2010.
2009年底我们在伯克利时,国家情报总监办公室的三位工作人员拜访了我们。
But we were visited by three people from the Office of the Director of National Intelligence when we were at Berkeley, I guess, late in 2009.
其中至少有两人非常热衷于这样一个想法:让美国情报界采用我早期研究《专家政治判断》中使用的一些方法,来记录和评估情报分析师判断的准确性。
And at least two of them were quite enthusiastic about the idea of the US intelligence community using some of the techniques employed in my earlier work, Expert Political Judgment, for keeping score on the accuracy of intelligence analysts' judgments.
这就是后来被称为IARPA预测锦标赛的核心构想。
And that was the core idea behind what became known as the IARPA forecasting tournaments.
IARPA是国家情报总监办公室下属的研发部门,作为统筹机构管辖着包括中情局、国防情报局等所有情报机构。
IARPA is the research and development branch of the office of the director of national intelligence, which is the umbrella organization over all intelligence agencies like CIA and DIA and and so forth.
总共涵盖16个情报机构。
And all 16 of them.
其核心理念是举办一场竞赛,让顶尖高校和咨询公司竞标大型合同,组建团队专门对美国情报界认定的国家安全相关未来事件做出最精准的概率预测。
And the idea was that they would have a competition, and major universities and consulting operations would apply for large contracts to assemble teams whose purpose would be to assign the most realistic probability estimates to possible futures that the US intelligence community deemed to be of national security relevance.
这些预测最终会转化为具体问题。
So those turned out to be questions.
问题涵盖领域极其广泛,从中日东海冲突、希腊退出欧元区、西班牙国债收益率差,到俄罗斯与爱沙尼亚、乌克兰、格鲁吉亚的关系。
On everything from Sino-Japanese clashes in the East China Sea to Greece leaving the Eurozone and Spanish bond yield spreads to Russian relations with Estonia, Ukraine, Georgia.
当然还包括中东冲突、埃博拉疫情、H5N1禽流感等各类重大议题。
Of course, conflicts in the Middle East, Ebola, H5N1 issues, just an enormous range of issues.
四年间累计发布了500多个预测问题。
500-plus questions over about four years.
每个研究团队的目标都是开发出最优的概率预测方法论。
And the goal of each of the research operations would be to come up with the best possible ways of assigning probability estimates.
他们当时对每个人的学术背景都进行了严格筛查,确保所有人都是正规学者。
Now, they screened everybody for their academic bona fides, because they wanted to make sure that everybody was legit.
他们可没用通灵板之类的东西。
They weren't using Ouija boards or anything like that.
但除此之外,美国情报界只关心谁能对这些极其多样化的问题做出最精确的概率预测。
But beyond that, the US intelligence community was simply interested in who could generate the most accurate probability estimates for these extremely diverse questions.
他们并不在乎我们采用心理学方法、统计学方法还是综合方法。
And they didn't really care whether we took a more psychological approach or more statistical approach or composite approach.
他们唯一在乎的就是准确性,仅此而已。
What they cared about was accuracy, and that was it.
准确、准确、再准确。
Accuracy, accuracy, accuracy.
于是我们——我妻子芭芭和我组建了这个名为'优质判断小组'的团队,这是个汇集了杰出学者的跨学科联盟。
So Barb, my wife, and I put together this group called the Good Judgment group, which is an interdisciplinary consortium of wonderful scholars.
我们着手招募优秀预测者,并努力为他们提供最佳的概率推理培训与原则指导。
And we tried to recruit good forecasters, and we tried to give them the best possible training in principles of good probabilistic reasoning.
我们将其中一些人组建成团队,并指导他们如何高效协作。
And we assembled some of them into teams, and we gave them guidance on how teams can work effectively together.
我们还让部分人参与预测市场,想看看预测市场的效果如何。
And we put some of them into prediction markets, and we just wanted to see how well prediction markets would work.
我们尝试了许多不同的方法进行实验。
We experimented with a lot of different approaches.
我们还有非常优秀的统计学家,他们尝试了各种从群体中提炼智慧的方法。
We also have really good statisticians who experimented with different ways of distilling wisdom from crowds.
因此我们的方法极具实验性。
So our approach was very experimental.
我认为其他一些方法同样具有实验性质。
I think some of the other approaches were experimental as well.
但我们的实验成果比他们的更为出色。
But our experiments worked out better than their experiments.
因此在前两年的比赛中,我们以相当显著的优势获胜,优势如此明显以至于美国情报界决定将剩余资金集中投入一个大型项目——即'优质判断计划',该项目可以吸纳其他团队中最优秀的研究人员。
So we won the tournament by pretty resounding margins in the first two years, sufficiently resounding that the US intelligence community decided to funnel the remaining money into one big group, which would be the Good Judgment Project, which could hire some of the best researchers from other teams.
你们在和谁竞争?
Who were you competing against?
最初我们是在这里与不同的竞争基准进行较量。
Well, originally we were competing against different competitive benchmarks here.
最初,我们的竞争对手是其他获得政府合同的机构,比如麻省理工学院、密歇根大学、乔治梅森大学等。
Originally, we were competing against the other institutions that received contracts from the government, like MIT and the University of Michigan and George Mason University, places like that.
后来我们还与自己运营的一个名为Inkling的公司运行的预测市场竞争,同时也与美国情报分析师自己生成的概率估计进行内部基准比较,尽管这部分是机密的,因为美国情报分析师的身份本身就是机密。
Then later we were competing against a prediction market that we ourselves were running through a firm known as Inkling, and also against internal benchmarks: US intelligence analysts themselves generating probability estimates. That comparison was classified because, of course, the US intelligence analysts were classified.
但《华盛顿邮报》的大卫·伊格内修斯在第二年末或第三年时泄露了部分信息。
But David Ignatius at the Washington Post leaked some of that information in I think at the end of the second year or third year.
但两年后,你的团队彻底击败了所有人。
But after two years, your team trounced everybody.
之后又发生了什么?
And then what happened going forward after that?
嗯,我们得以吸收其他团队的资源,因为政府显然通过暂停对其他团队的资助节省了大量资金。
Well, we were able to absorb resources from the other teams because the government was obviously saving a lot of money by suspending the funding of the other teams.
因此我们能够整合部分资源,从而更有力地与剩余的其他基准线竞争。
So we were able to consolidate some resources, and we were able to compete all the more aggressively against the remaining benchmarks.
对我们而言,关键的外部基准是由Inkling运营的预测市场,以及美国政府内部更机密的那个。
The key benchmarks for us to beat were an external benchmark, the prediction market run by Inkling, and the more confidential one inside the US government.
你刚才提到——其实我要引用书里一句很应景的话,我特别喜欢,是来自早期医师盖伦的。
Now, actually, I'm gonna read a quote from the book that I loved, which is relevant. It's from Galen, the early physician.
盖伦大致生活在什么年代?
And what time period did Galen live roughly?
他是公元二世纪的人物。
He is second century after Christ.
大约两千年前左右。
It was roughly 2,000 years ago.
好吧,我以为他生活的年代要更晚些。
Okay, I thought he was later than that.
他在很久以前就写下了这些,你在书中说他并不热衷于实验,你是这样写的:盖伦从不为怀疑所困扰。每一个结果都证实他是对的,无论证据在不如大师智慧的人看来多么模棱两可。
So he wrote a long time ago, and you write that he wasn't into experiments. Here's the quote: Galen was untroubled by doubt. Each outcome confirmed he was right, no matter how equivocal the evidence might look to someone less wise than the master.
这是盖伦的原话:所有服用此疗法的人都会在短时间内康复,除了那些未被治愈的人,他们都死了。
Here's the Galen quote: All who drink of this treatment recover in a short time, except those whom it does not help, who all die.
因此很明显,它只在无法治愈的病例中失效。
It is obvious therefore that it fails only in incurable cases.
还有什么比这更好的呢?
So what could be better than that?
我是说,这太了不起了。
I mean, that's phenomenal.
这让我想起,我想这就是你引用那句话的用意,即便是专家给某个事件发生的概率赋值为63.7%,无论最终是否发生。
I was reminded, and I think that's where you apply the quote, of even the pundit who puts a numerical value on a certain event happening, say a 63.7% chance that this will happen, whether it happens or not.
如果确实发生了,他就会说:看吧,我早说过有63.7%的概率。
If it does happen, he says, See, I told you it was 63.7.
如果没有发生,他又可以说:我早说过有36.3%的概率不会发生。
And if it doesn't happen, he can say, Well, I said there was a 36.3% chance that it wouldn't happen.
所以当它没发生时,我依然是对的。
And so when it didn't happen, I'm still right.
那么问题就变成了,当你说你碾压了其他团队时,必须有一种评估概率的方法。
So the question then becomes, when you say you trounced the other teams, there has to be a way to evaluate probabilities.
在书中你提出了布赖尔评分(Brier score)。
And in the book you present the Brier score.
试着给我们讲讲你是如何衡量预测成功的。
So try to give us the flavor of how you measured success in prediction.
哦,这个观点很棒。
Oh, that's an excellent point.
确实无法衡量单个事件的概率判断准确性,除非预测者鲁莽地将概率设为零而事件发生了,或者设为1.0而事件没发生。
It really isn't possible to measure the accuracy of a probability judgment of an individual event unless the forecaster is rash enough to assign a probability of zero and it happens, or a probability of 1.0 and it doesn't happen.
否则预测者总能辩称发生了小概率事件。
Otherwise, the forecaster can always argue that something improbable happened.
所以评估单个事件的准确性是不可能的,除了那些极端情况。
So assessing the accuracy of individual events is impossible, except in those limiting cases.
但可以通过评估多个事件和多个时间段的准确性来实现。
But it is possible to assess the accuracy across many events and many time periods.
因此,在世界政治中,良好的判断意味着你比其他人更擅长在众多事件和多个时间段内,为实际发生的事件分配比未发生事件更高的概率。
So good judgment in world politics means you're better than other people at assigning higher probabilities to things that happen than the things that don't happen across many events, many time periods.
举个例子来说,我们来讨论一下。
So the example would be, let's talk about it.
我们举一个具体的例子。
Let's take a particular example.
我们将尝试预测希腊退出欧元区的概率。
We're gonna try to forecast the probability of Greece leaving the Eurozone.
我说概率是0.51,而你说——抱歉,你说的是0.15。
So I say it's 0.51 and you say, excuse me, say it's 0.15.
不,我们再来一次。
No, let's go again.
我说的是0.15。
I did 0.15.
好的,请继续。
Yeah, okay, go ahead.
我选0.49,因为我认为可能性不大。
I'll go 0.49 because I think it's not likely.
选择这个数字是因为它低于0.5。
I'm going with that because it's below 0.5.
我说0.49而你说0.1,结果这事没发生。
I say 0.49 and you say 0.1, and it doesn't happen.
所以结论是你判断得比我更准确。
So the argument is that you did a better job than I did.
你并不能确定这一点
You don't know that for sure
不,你不能。
No, you don't.
就希腊退欧这件事而言。
With respect to Grexit.
没错。
That's correct.
你确实通过IARPA竞赛中提出的全部问题,从概率角度了解这一点。
You do know it probabilistically across the full range of questions posed in the IARPA tournament.
就你多年来一直预测0.49,而我预测0.1却未发生的情况而言,你可能会忍不住得出结论——即便关于希腊退欧问题,我的预测也更接近事实。
Now insofar as you've been predicting 0.49 consistently over several years, and I've been predicting 0.1 and it doesn't happen, you might be tempted to draw the conclusion, even with respect to Grexit, that I've been closer to the truth.
你可能会这么想。
You might be.
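The Brier score mentioned above is never spelled out in the conversation, so here is a minimal sketch of how the 0.49 versus 0.1 comparison could be scored. It assumes the common binary form, the mean squared difference between the stated probability and the outcome, which runs from 0 (perfect) to 1; the original Brier formulation sums over both outcomes and runs from 0 to 2, and the conversation does not say which scale is meant. The ten-question track records are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between stated probabilities and outcomes.

    outcomes use 1 for "it happened" and 0 for "it did not".
    Lower is better; always saying 0.5 scores 0.25.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("need one outcome per forecast")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# One question (Grexit), which does not happen.
host_single = brier_score([0.49], [0])
guest_single = brier_score([0.10], [0])

# Hypothetical: ten similar questions, two of which do happen.
outcomes = [1, 1] + [0] * 8
host_many = brier_score([0.49] * 10, outcomes)
guest_many = brier_score([0.10] * 10, outcomes)

print(f"single question: host {host_single:.4f}, guest {guest_single:.4f}")
print(f"ten questions:   host {host_many:.4f}, guest {guest_many:.4f}")
```

A single question says little, as both speakers agree. The score only becomes informative when it is averaged over many questions, and the bolder forecaster is penalized heavily on the events that do occur.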
所以让我感到困扰的是这种评估良好判断力的设定方式。
So one of the things I found troubling about the setup and the way of assessing good judgment.
你的著作让人深思的一点是:评估某人是否具备良好判断力竟如此困难。
One of the things your book makes one ponder is just how hard it is to assess whether someone has good judgment.
这完全正确。
That's absolutely true.
我完全赞同。
I couldn't agree more.
这是一个非常难以操作化的概念。
It's a very difficult concept to operationalize.
是啊。
Yeah.
所以这种方式,尽管...让我们以这个案例为例,假设有10件事,我倾向于预测0.45的概率而你预测0.1,结果这些事都没发生——这样我们俩都算对了,而且我们都认为概率低于50%
So take this particular case: let's say there are 10 things where I tended to predict 0.45 and you predicted 0.1, and none of them happened, so that we're both right in that we both thought it was below a half.
发生的可能性更低。
It was less likely.
但你比我更正确,因为什么?
But you were more right than I was because what?
这就是我想让你回应的点。
And here's what I want you to respond to.
在我看来,你可以辩解说,你只是比我更有信心。
It seems to me, you could argue, you just had more confidence than I did.
你在选择数字时更具策略性。
You were more strategic in how you picked your number.
你对实际概率并没有更准确的了解。
You didn't have any more accurate knowledge of the actual probability.
那么,你需要抛多少次硬币才能确定声称硬币有偏差的人比声称硬币接近平衡的人更接近正确?
Well, how many times did you have to flip that coin before you decided that the person who claims the coin is biased is closer to correct than the person who claims the coin is very close to equilibrium?
嗯,这是个很有挑战性的问题。
Well, that's a challenging question.
我在读这本书的时候想到了Legg Mason的比尔·米勒。
While I was reading the book, I thought of Bill Miller of Legg Mason.
比尔·米勒连续至少十五年,可能更久,都跑赢了标普500指数。
So Bill Miller beat the S&P 500, I think, for at least fifteen years in a row, maybe more.
很多人因此断定他一定是个天才,因为他能跑赢标普500。
And a lot of people concluded he had to be a genius because, well, he beat the S&P 500.
一年不算什么,但连续十五年,这概率太低了。
One year, not so impressive, but fifteen years, that's so unlikely.
但我们当然知道这并不能证明他是天才。
But of course, we know that that doesn't prove he's a genius.
这甚至不能证明他聪明。
It doesn't even prove he's smart.
可能仅仅意味着他运气好。
It might merely mean he was lucky.
在成千上万的共同基金经理中,恰好是他连续十五年跑赢了标普500指数。
Out of the thousands and tens of thousands of managers of mutual funds, he was the one who happened to beat the S&P 500 fifteen years in a row.
我们知道只要有足够长的时间和足够多的经理人,这种情况就必然会发生。
And we know that over enough time and enough managers that's going to happen.
因此我们无法据此推断他未来的能力。
And so we know nothing about his ability going forward.
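The "enough managers, enough time" argument is simple arithmetic, sketched below. This is a back-of-the-envelope illustration under strong assumptions that are not from the conversation: each manager beats the index in any given year with probability 0.5, years and managers are independent, and everyone stays in business for the whole period. The manager counts are hypothetical.

```python
def prob_at_least_one_streak(n_managers, n_years, p_beat=0.5):
    """Chance that at least one of n_managers beats the index n_years in a row.

    Assumes every year is an independent coin flip with probability p_beat.
    """
    p_streak = p_beat ** n_years               # one manager's chance of a full streak
    return 1.0 - (1.0 - p_streak) ** n_managers


for n in (1, 10_000, 100_000):
    chance = prob_at_least_one_streak(n, 15)
    print(f"{n:>7} managers -> {chance:.4f}")
```

Under these assumptions a single manager's chance of a fifteen-year streak is about 1 in 32,768, yet with tens of thousands of managers a streak somewhere in the population becomes likely, which is the host's point about luck.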
事实上,在他的连胜纪录被打破后,他的表现并不突出。
And in fact, he didn't do particularly well after his streak was broken.
他变笨了吗?
Did he get less smart?
他变得过于自信了吗?
Did he get overconfident?
我们无从得知。
We have no way of knowing.
因此,尽管我在书中发现许多有助于深思未来展望的有益内容,但这项根本的衡量技术在我看来仍是个挑战。
So I find myself, even though I found many things in the book that are useful in thinking thoughtfully about looking into the future, the fundamental measurement technique strikes me as a challenge.
对此你怎么看?
What do you say to that?
我认为这是个极好的问题。
I think that is a great question.
这是个非常非常深刻的问题。
It's a really, really deep question.
金融界人士当然会争论是否存在所谓的好判断力。
People in finance argue, of course, about whether there is such a thing as good judgment.
如果你坚信有效市场假说,自然会对此持高度怀疑态度。
If you're a really strong believer in the efficient markets hypothesis, you're going to be very skeptical.
只要投掷硬币的次数足够多,总会有几枚硬币连续出现60次、70次甚至80次正面朝上的情况。
If you toss enough coins enough times, a few of them are bound to wind up heads 60, 70, 80 times in a row.
你可以一直这样持续下去。
You can just keep doing that.
也有怀疑论者认为,比尔·米勒、沃伦·巴菲特或乔治·索罗斯不过是那些幸运的连续掷硬币序列之一,然后我们就把他们奉为天才。
And there are skeptics who argue that Bill Miller or, for that matter, Warren Buffett or George Soros were just one of those lucky sequences of coin flips, and then we anoint them geniuses.
我们对超级预测者可能只是超级幸运的可能性非常敏感。
We are very sensitive to the possibility that super forecasters could be super lucky.
我们也始终持开放态度,认为任何特定的超级预测者可能只是超级幸运。
And we're always open to the possibility that any given super forecaster has been super lucky.
我们一直在寻找均值回归的模式。
We're always looking for patterns of regression toward the mean.
任务中的偶然性越大,均值回归效应就越明显。
The more chance there is in a task, the greater the regression toward the mean effect.
这正是我们持续关注的现象。
And that's just something we're continually looking for.
根据我们观察到的效应均值回归情况,我们最佳估计是IARPA赞助的地缘政治预测锦标赛中技能与运气比例约为七比三,这意味着技能占很大成分,运气也有显著影响。
Our best estimates are that the geopolitical forecasting tournament sponsored by IARPA had about a 70/30 skill-luck ratio, based on the regression-toward-the-mean effects that we were observing, which means there's a big element of skill and there's a significant element of luck.
基于其他因素,比如我们引入的实验性干预措施能可靠地提高预测准确性。
And it's based on other factors too: we introduced experimental manipulations that reliably improve accuracy.
如果完全是随机噪声,就不可能做到这一点。
If it were pure noise, it wouldn't be possible to do that.
所以如果我们面对的是一个噪声极大的因变量,就不可能开发出能提高准确性的训练模块或团队协作机制。
So it wouldn't be possible to develop training modules or teaming mechanisms that improve accuracy if we were dealing with a radically noisy dependent variable.
但事实上我们能做到。
It is possible to do that.
因此,来自预测者个体差异的证据和实验证据等多方面证据表明,我们面对的并非完全不可预测的现象。
So various converging lines of evidence, both individual difference evidence among the forecasters and experimental evidence suggests that we're not dealing with a radically indeterminate phenomenon here.
确实存在良好的判断力这种东西,但运气成分也起着重要作用。
There is such a thing as good judgment, but there is certainly a significant element of luck as well.
当你阅读沃伦·巴菲特或他的合伙人查理·芒格对市场的分析时,面临的挑战之一是他们实在太聪明了。
And one of the challenges when you read the analysis of the market by Warren Buffett or Charlie Munger, his partner, is that they're really smart.
他们充满了真知灼见,对吧?
They're full of interesting insights, right?
所以这强化了你的观点,也许并非运气使然。
So it reinforces your view that maybe it's not luck.
当然,挑战在于你无法确定那些特定见解是否真的重要。
The challenge, of course, is that you don't know whether those particular insights really matter.
确实如此。
That's true.
在所有重要的事物之中。
In the universe of things that matter.
就是这样。
That's that.
是啊。
Yeah.
我们在这个问题上完全达成一致。
We're in complete agreement on this subject.
让我们以书中的一个例子来说明,我觉得这个例子非常有启发性,它展示了在某些预测问题和评估问题中技能的作用。比如你举的例子:有一个姓伦泽蒂的家庭,他们是独生子女家庭。
Let's take an example from the book, which I found really illuminating, which is an example of how there is a role for skill, at least in some forecasting problems and some estimation problems, which is you give the example of you're told that there's a family, their last name is Renzetti, they have an only child.
他们养宠物的概率有多大?
What are the odds that they have a pet?
然后讨论如何更深入地思考这个问题,而不是简单地说'我不知道',或者更糟的是'哦,他们是独生子女家庭,这很重要'。
And talk about how you might think about that more thoughtfully than just saying, well, I don't know, or worse, Well, they have an only child, that's important.
而你提到的内外视角的区分让我觉得特别有启发性。
And the inside-outside distinction I found very illuminating.
嗯,这是书中关于超级预测者与普通预测者区别的更大讨论的一部分,超级预测者的特点就是倾向于从外部视角出发,逐步深入分析。
Well, it's part of a more general discussion in the book about what distinguishes superforecasters from regular forecasters, and that is the tendency of the superforecasters to start with the outside view and gradually work in.
所以无论是估算芝加哥有多少钢琴调音师,还是某个家庭是否养宠物,或是某个非洲独裁者能否再掌权一年这类例子,你都会先从一个初步估计开始。
So you would start with your initial estimate, whether it's trying to estimate the number of piano tuners in Chicago, or whether a particular family has a pet, or whether a particular African dictator is likely to survive in power another year, all those kinds of examples.
你会先问:基本的存活率是多少?
You would start by saying, well, what's the base rate of survival?
宠物的基准概率是多少?
What's the base rate of pets?
另一个例子是这个非洲独裁者问题。
And so another example is this African dictator problem.
我们可能会问你一个问题:y国的x独裁者是否有可能再掌权一年。
We might ask you a question about whether dictator X in country Y is likely to survive in power for another year.
你可能会耸耸肩说:我对这个国家都知之甚少,更别说这个独裁者了。
And you might shrug and say, you know, I've barely heard of the country, still less the dictator.
但你还是知道一些事情的。
But you do know a couple of things.
你知道的比你想象的要多。
You know more than you think you know.
其中之一就是,如果一个独裁者已经掌权一两年以上,那么他再掌权一年的可能性非常高。
And one of them is that once a dictator has been in power more than a year or two, the likelihood of the dictator being in power another year is very high.
这个概率高达90%以上。
It's 90% plus.
即便你对这位独裁者或其国家一无所知,你依然可以说:我知道当某人在一个国家建立了权力基础后,要推翻他们是很困难的。
Even though you know nothing about the dictator or the country, you can say, well, I know that when someone has established a power base within a country, it's difficult to dislodge them.
既然如此,你会基于这个简单可验证的统计事实,以一个高概率值作为预测的起点。
So you would start your estimation process with a high probability because of that fact, just a simple demonstrable statistical fact.
然后你会说:现在我最好做些调研,了解一下这个人和他的国家。
And then you would say, well, now I better do a little bit of research and find out a little bit about this guy and his country.
如果你发现这个人已经91岁高龄且患有晚期前列腺癌,你可能需要调整预测概率。
If you discover that this particular person is 91 years old and has advanced prostate cancer, you might wanna modify your probability.
如果你发现首都郊区正在发生武装冲突,你可能需要调整预测概率。
If you discover there's fighting in the suburbs of the capital, you might wanna modify your probability.
这体现了超级预测者独特的工作风格——在深入研究复杂的历史细节前,他们会尽可能获取初始的统计优势。
So this captures part of the distinctive working style of the superforecasters: they try to get as much initial statistical leverage on the problem as they can before they delve into the messy historical details.
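One way to make "start with the base rate, then adjust" concrete is an odds-form Bayesian update, sketched below. This is an illustration, not a description of how the tournament forecasters actually worked. The 0.90 starting point is the base rate quoted above; the likelihood ratios attached to the two pieces of case-specific evidence are invented numbers, and the function name `update_with_evidence` is hypothetical.

```python
def update_with_evidence(prior, likelihood_ratio):
    """Odds-form Bayes update.

    likelihood_ratio = P(evidence | survives) / P(evidence | ousted);
    values below 1 push the probability of survival down.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Outside view: the base rate for an entrenched dictator lasting another year.
p = 0.90

# Inside view: hypothetical likelihood ratios for case-specific evidence.
p_after_illness = update_with_evidence(p, 0.2)                 # 91 years old, advanced cancer
p_after_fighting = update_with_evidence(p_after_illness, 0.5)  # fighting near the capital

print(f"base rate:      {p:.2f}")
print(f"after illness:  {p_after_illness:.2f}")
print(f"after fighting: {p_after_fighting:.2f}")
```

The order matters only in presentation: the forecast is anchored on the statistical fact first, and each later detail moves it by an amount that reflects how diagnostic that detail is judged to be.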
我认为我们都认同循证医学、循证预测的理念,你的著作无疑证明了数据和统计学能帮助我们更好地预判重要事件。
And I think all of us like the idea of evidence based medicine, evidence based forecasting, your book is certainly a tribute to the potential for data and statistics to help improve our ability to anticipate events that are important.
我想难点在于选择哪些证据以及如何整合其他因素。
I guess the challenge is which evidence and how we incorporate the other factors.
我是说,你讲述了很多关于不同预测者(其中许多人只是所谓的业余爱好者)的有趣故事,这很美妙。
I mean, you tell a lot of really interesting stories about the way the different forecasters work, many of whom are just, quote, amateurs, which is beautiful.
他们不像我或其他人那样背负着博士学位,那些人也总试图预测事物。
They're not burdened by the PhD that I have and that others have who tend to try to predict things.
所以你谈了很多关于他们如何权衡证据的内容。
So you talk a lot about how they weigh evidence.
我在理智上接受这些结果时感到挑战的部分主要有两个问题。
The part I find intellectually challenging in accepting these results comes down to two issues.
首先,我想明确一点,你组建的这些业余团队与专家合作,并通过整合人员形成团队,同时提供如何协作和避免群体思维的指导——这是书中非常重要且非常有趣的部分,我认为对任何人来说都非常实用。
One is, and I want to make it clear, these amateur teams that you put together along with experts and the aggregation of folks into teams with advice on how to work together and how to avoid groupthink, which is a large part of the book, very, very interesting and very useful, I think, to anybody.
在所有这些例子中,他们都占据主导地位。
In all of these examples, they dominate.
他们并非仅仅比其他团队高出三个百分点。
It's not like they do three percentage points better than the others.
我只是想澄清这一点,明白吗?
I just wanna make that clear, right?
他们的表现确实远超许多受过更高教育的人以及所谓的专家。
They really did a lot better than just some of the more educated folk and the so called experts.
对吗?
Correct?
嗯,当你把所有因素综合起来看,在锦标赛中,这种累积优势确实让普通选手望尘莫及。
Well, when you throw everything together, the cumulative advantage does get to be quite staggering over the ordinary folks in the tournament.
确实如此。
That's true.
但你刚才讨论的是不同的组成部分。
But you were talking about different components here.
拥有天赋并让合适的人上车当然大有裨益。
It certainly helps to have talent and to get the right people on the bus.
超级预测者之间的个体差异表明他们绝非普通人。
So individual differences matter: super forecasters are not just regular people.
他们在某些可量化的方面确实与众不同。
They are different in certain measurable ways.
他们在流体智力测试中得分更高。
They score higher on measures of fluid intelligence.
他们的政治知识更丰富。
They're more politically knowledgeable.
他们的思想更开放。
They're more open minded.
但最重要的是:我认为他们相比普通人确实具备所有这些优势。
But, most important: I think they have all those advantages over regular folks.
不。
No.
这些因素确实重要。
And those matter.
但我不认为他们比专业情报分析师更有优势。
But I don't think they have those advantages over professional intelligence analysts.
我不认为他们有更高的流体智力,肯定也没有更丰富的知识储备。
I don't think they have greater fluid intelligence, and I think they definitely don't have greater knowledge.
而且我并不认为他们真的更加思想开明,尽管他们确实相当开放。
And I don't really think they're even more open minded, although they are pretty open minded.
我认为真正将超级预测者与情报界经验丰富的专业人士区分开来的,是他们能够超越后者——这在我看来是所有基准中最具挑战性的。
I think what really distinguishes the super forecasters from the seasoned professionals in the intelligence community whom they were able to outperform, and that was really, I thought, the most difficult of all the benchmarks.
我认为真正的区别在于,他们相信主观概率评估是一种可以且值得培养的技能。
I think what really distinguishes them is that they believe that subjective probability estimation is a skill that can be cultivated and is worth cultivating.
我想许多资深分析师,就像许多资深评论员一样,当他们看到诸如'希腊退出欧元区的可能性有多大'或'普京试图吞并更多乌克兰领土的概率是多少'这类问题时,
I think many of the sophisticated analysts, like many of the sophisticated pundits, when they see a question like, well, how likely is Greece to leave the Eurozone or how likely is Putin to try to annex more Ukrainian territory?
他们会耸耸肩,
They'll shrug.
然后说:'听着,
They'll say, look.
这是个独特的历史事件,
This is a unique historical event.
我们不可能给这种情况分配概率。'
There's no way we can assign a probability to this.
你本该在统计101课程里就学过这个。
You should have learned this in Statistics 101.
你可以在扑克这类游戏中学会做出精确的概率判断。
You can learn to make refined probability judgments in poker and things like that.
在扑克中你能学会区分60%和40%的下注概率,因为扑克是在明确抽样空间中的重复博弈,而频率统计正是统计101课程所教授的内容。
You can learn to distinguish 60/40 bets from 40/60 bets in poker because in poker you have repeated play in a well-defined sampling universe, and the frequentist statistics everybody learns in Statistics 101 apply.
那些统计方法在这里根本不适用。
Those statistics just don't apply here.
所以你正在进行一种伪精确的练习。
So you're engaging in an exercise in pseudo precision.
所以你面对的是智商极高的人群。
So you've got people with really high IQ.
这听起来似乎有可能。
That strikes you as possible.
高智商人群确实会说出这类聪明话。
Really high IQ saying really smart things like this.
这阻碍了他们探索如何做得更好的可能性,而我认为IARPA竞赛已经证明了这是可能的。
And it blocks them from exploring the potential of learning to do it better, which I think is what the IARPA tournament proved is possible.
现在,我想试着接受这个批评。
Now, I wanna try to take that criticism.
我会换一种方式表达。
I'll phrase it a little differently.
如果你问我,让我们来预测一下。
If you asked me, let's predict.
有一场橄榄球比赛。
There's a football game.
我们是在周一录制的这段内容。
We're recording this on a Monday.
今晚有一场橄榄球比赛。
There's a football game tonight.
是克利夫兰布朗队对阵巴尔的摩乌鸦队。
It's the Cleveland Browns against the Baltimore Ravens.
如果我没记错的话,这场比赛应该不太精彩。
If I remember correctly, it's not a very interesting game.
而我们想试着计算巴尔的摩队获胜的概率。
And we want to try to figure out the probability that Baltimore is going to win.
我觉得他们可能是被看好的一方。
I think they're probably favored.
对吧?
Okay?
所以他们理应获胜,但我们知道也可能爆冷。
So they're supposed to win, but we know that they might not.
不过我们想知道具体的概率是多少。
So we'd like to know, though, what the probability is.
解决这个问题有很多种方法。
Now, there are many ways to go about this question.
你书中那些人的做法是采用基准率——或者说方法之一——就像我们刚才讨论的那样采用基准率。
The way the people in your book go about it is they take a base rate, or this is one of the ways, they would take a base rate like we talked about a minute ago.
那些已经执政x年的独裁者中,有多少人还能再执政一年?
How many dictators who've been in office X years are in office an additional year?
或者以宠物为例,你确实提到过,但书中讨论的是拥有宠物的家庭比例。
Or in the case of the pet example, you did mention it, but in the book you talk about what's the proportion of households that have pets.
这将是一个很好的起点。
That would be a great starting place.
你从这个开始,然后深入挖掘,尝试发现更多信息。
You start with that and then you dig deeper and you try to find out more stuff.
首先,基准率很难确定,因为这是周一晚上弱势球队的基准率吗?
First of all, it's really hard to know what the base rate is, because is it the base rate of underdogs on a Monday night?
还是连续输掉两场比赛的球队的基准率?
Is it the base rate of teams that have lost two games in a row?
于是人们开始尝试——这在足球领域可行,但要分析希腊退出欧元区就困难得多——他们试图系统性地积累统计证据,进行多元回归分析。
So what people then start to do (and they can do it in football; it's a lot harder to do with Greece exiting the euro) is try to accumulate statistical evidence in a systematic way: they run multivariate regressions.
他们在这方面相当擅长,因为就足球的本质而言,我们很擅长缩小范围。
And they're pretty good at that because for the nature of football, we're pretty good at narrowing down.
我们可以参考过去的表现。
We can look at past performance.
我们需要考虑可能搅乱局势的伤病因素。
We can take account of injuries that can mess things up.
我们永远无法确知。
We'll never know.
正如哈耶克在另一个语境中指出的,我们永远无法知道四分卫是否在前一晚与妻子发生了不愉快的争执,或是午餐吃了不合适的食物影响了他的发挥。
As Hayek pointed out in a different context, we'll never know if the quarterback had an unsettling argument with his wife the night before or a bad meal at lunch that's affecting his play.
但在橄榄球领域,我们可以相当准确地预测概率,而在分析希腊退出欧元区这类事件时,我们却没有这样的工具。
But in football, we can get pretty good at predicting probabilities; we don't have those tools when we look at, say, Greece exiting.
更糟的是,即便我们拥有这类工具,也往往做不好。
And worse than that, when we do have those tools, we often can't do it very well.
所以我们试图在流行病学中估算大量饮用咖啡的影响,比如是否会增加患癌概率。
So we try to estimate, say, in epidemiology, the effect of drinking a lot of coffee: whether you're more likely to get cancer.
这个我们无法准确测量。
We can't measure that.
那么这些人是如何吸收所有这些信息的呢?你谈到他们大量阅读、交流、分享观点,并在行动中互相碰撞思想。
So how are these people somehow absorbing all this information, which you talk about how they read a lot and they talk and they share ideas and they bounce ideas off each other when they were doing this.
他们为何能在不使用正式统计模型的情况下精准把握?要知道即便我们使用正式统计模型时也往往难以做好。
How are they able to somehow home in with such accuracy without using a formal statistical model, when even with formal statistical models we can't do very well?
是的。
Yeah.
嗯,他们善于抓住机会,有时确实会在意想不到的地方发现统计模型。
Well, they are opportunistic and sometimes they do find statistical models in unlikely places.
比如,对于你的橄榄球比赛,他们首先会去查看拉斯维加斯的赔率。
And, you know, one of the first places they would go for your football game is Las Vegas, to look at what the odds are.
他们会...你知道...他们实际上会做那些你表面上可能认为是作弊的行为。
You know, they would do what you might superficially consider to be cheating.
他们会说,看,市场上已经有一些非常高效的信息整合者存在了。
They would say, well, there are some very efficient information aggregators already out there.
比如有内特·西尔弗和他的538网站,还有这个那个的预测机构。
Like, there's Nate Silver and FiveThirtyEight, and there's this and there's that.
我会逐一研究这些数据源,取其平均值作为我的初始预估。
And I'm gonna take a look at each of those, and I'm gonna average those, and I'm gonna take those as my initial estimate.
如果我碰巧知道四分卫和他配偶的关系状况,可能也会纳入考量,但权重不会给太高。
And then if I know something about the quarterback's relationship with the spouse, I might factor that in too, but I'm probably not gonna give very much weight to it.
事实证明这确实是相当不错的策略。
And that turns out to be a pretty good strategy.
你提出了一个关于精确度极限的深刻哲学问题,我觉得这简直太棒了。
You're raising a deep philosophical question about the limits of precision, and I think it's just wonderful.
要我说,这大概是我经历过最精彩的访谈之一。
This is one of the best interviews I've had, I think.
是的,先生。
Yes, sir.
这这这是个深刻的问题。
It's a deep question.
我们何不直接承认,我不知道精确度的极限在哪里?
And why don't we just say, I don't know the answer to where the limits of precision are?
你并不知道答案是什么。
You don't know what the answer is.
我们为什么不通过像IARPA竞赛这样的研究来找出极限所在呢?
Why don't we run studies like the IARPA tournament and find out where they are?
而这本质上就是IARPA所做的。
And that's essentially what IARPA did.
采取了非常务实的态度并表示,你看,我们可能会陷入哲学立场的争论中。
They adopted a very pragmatic attitude and said, you know, we could get hunkered down in philosophical positions.
我可以说我是贝叶斯主义者,而你是频率主义者。
I could say, I'm a Bayesian, and you're a frequentist.
我认为我们在这里可以进行概率估算。
And I think we can make probability estimations here.
而你认为,不。
And you think, no.
这...这里噪音太多,机会却太少。
There's just too much noise, and there aren't enough opportunities.
我们可以为此争论不休。
We could argue about that until the cows come home.
但真正该做的是举办预测锦标赛,探索精确度的极限在哪里。
But really the right thing to do here is to run forecasting tournaments and explore what the limits of precision are.
而且IARPA锦标赛中确实存在精确度的实际限制。
And there are real limits of precision in the IARPA tournament.
我是说,最优秀的预测者平均而言,对发生事件的预测概率也不过75%,未发生事件的预测概率25%。
I mean, its very best forecasters, on average, are not doing much better than assigning 75% probabilities to things that happen and 25% probabilities to things that don't.
所以这里仍存在大量残余不确定性。
So there's still a lot of residual uncertainty here.
锦标赛中存在着大量无法消除的不确定性区域。
There are big pockets of irreducible uncertainty in the tournament.
这里存在很大的误差空间。
There's lots of room for error.
他们犯了很多错误。
They make lots of errors.
我们只是证明了,做一件聪明人之前认为几乎不可能的事情是可能的。
What we simply showed is that it's possible to do something that very smart people previously supposed was pretty much impossible.
让我问一个不同的问题。
Let me ask a different question.
我想这与该实证法则的发现有两个相关的问题。
I guess there are two issues related to that empirical finding.
一个是,你能再做一次吗?
One would be, could you do it again?
对吧?
Right?
这将是一个关于重复验证的问题。
It would be a question of replication.
你们能用同一个团队复制这种成功吗?
Could you replicate the success with the same team?
你认为他们会继续超越基准表现吗?
Do you think they would continue to outperform the benchmarks?
这是第一个问题。
That'd be the first question.
第二个问题是你们要怎么利用它?
The second question is what do you do with it?
那么,我先让你回答第一个问题。
So, well, I'll let you answer the first one first.
是否有计划尝试复制这些结果?
Are there any plans to try to replicate these results?
嗯,我们正在做这件事。
Well, we're doing that.
IARPA正在做完全正确的事情。
IARPA is doing exactly the right thing.
他们正在建立一个机制来探索这些结果的可复制性。
They're setting up a mechanism for exploring how replicable these results are.
所以我们将举办更多的预测竞赛。
So we're going to be running more forecasting tournaments.
我邀请你们的读者参与GJ Open的原因之一是他们可以探索自己的技能,谁知道呢,也许现在正在听的就有未来的超级预测者。
One of the reasons I invited your readers to participate in GJ Open is they can explore their skills, and who knows, there might be some super forecasters listening right now.
说得对,说得对。
Hear, hear.
我想,接下来的问题就是,这真的有价值吗?
I guess, and then the next question would be, is this really valuable?
所以你们经常需要制定应急方案。
So you have to have contingencies often.
你们需要为可能发生的意外情况制定应急计划。
You wanna have contingency plans for the contingencies that could happen.
所以你们需要知道希腊是否真的可能退出欧元区,或者中国是否会在军事上采取某些行动。
So you wanna know whether it's really likely that Greece will leave the Eurozone or you wanna know whether China's gonna do such and such militarily.
你想知道这个或那个国家是否会发生政变。
You wanna know if there's gonna be a coup in this or that country.
但知道概率确实是73%而非58%,这会影响我们的行动吗?
But does it affect our actions to know that it's really 73% rather than 58?
那么后果是什么?你相信提高这些概率会带来更好的政策吗?
So what are the consequences? Do you believe that improving those probabilities is gonna lead to better policy?
再说一次,冒着恭维采访者的风险:
Again, at the risk of flattering the interviewer:
这真是个绝妙的问题。
That's just a superb question.
这取决于具体领域。
It depends on the domain.
如果我们讨论的是石油期货期权的定价,我想华尔街的专业人士会说,是的。
If we were talking about pricing futures options on oil, I think Wall Street professionals would say, yeah.
我确实想知道60对40概率和40对60概率之间的区别。
I would really wanna know the difference between a 60/40 probability and a 40/60 probability.
AQR的首席风险官Aaron Brown在我们为这本书采访他时也这么说过,我知道这在对冲基金界人士中是一种普遍态度。
Aaron Brown, the chief risk officer of AQR said as much when we interviewed him for the book, and I know that's a common attitude among people in the hedge fund world.
所以在那个领域,期权定价和金融领域,我认为这没什么可争议的。
So in that world, options pricing and finance, I don't think there's much question about it.
扑克游戏,我认为也没什么可争议的。
Poker, I don't think there's much question about it.
大约一年前,我们有机会与美国情报界的一位高级官员讨论这个项目,我们问:如果你早知道在索契冬奥会期间俄罗斯入侵乌克兰的概率不是1%,而是20%,你会采取不同的行动吗?
Now, we had an opportunity to talk to a senior official in the US intelligence community about the project a year or so ago. And we asked: well, if you had known that the probability of a Russian incursion into Ukraine was not 1% but 20% during the Sochi Olympics, would you have done something different?
我们得到了一个有趣的反应。
And we got an interesting reaction.
对方说:天啊,我从来没听过这样的问题。
It was: boy, I've never even heard a question like that before.
那真有意思。
That's fascinating.
这确实是个引人入胜的问题,它引发了非常深刻的思考。
And it is a fascinating problem, and it raises very deep questions.
我认为简短的答案是:从长远来看,拥有更准确的概率评估总比不准确的更好,这一点大家都会认同。
I think the short answer is everybody would agree you're not worse off with better probability estimates than worse probability estimates in the long run.
我不认为存在任何争议。
I don't think there's any.
问题在于:我们所能实现的精确度提升,是否能在特定领域中转化为足够多的优质决策,从而证明为此付出的成本是合理的?
The question would be: do the incremental improvements in accuracy we're able to achieve translate into enough better decisions in a given domain to justify the cost of achieving those improvements?
我认为这会是情报领域的关键问题。
I think that would be your question in the intelligence context.
而这一点,我认为正是情报部门目前正在明智探索的方向。
And that is I think something that the intelligence community is quite sensibly exploring right now.
但我认为另一个问题是:你可能会误以为自己掌握了比实际更多的确定性。
But I think the other question is you might lead yourself down a path of thinking you've got more certainty than you actually do.
因此即使我们都认为更有条理的方法必然更好,但使用这种方法也存在误判风险。
So there's a downside risk also of using a more organized method, even though we all want to, we all think that's got to be better.
可惜事实未必如此。
It doesn't have to be, unfortunately.
没错。
Right.
但相反的错误也同样可能发生。
But the opposite error is also possible.
我是说,心理学家们确实倾向于强调过度自信的危险,但信心不足同样存在风险。
I mean, psychologists, you're right, tend to emphasize the dangers of overconfidence, but there's also the danger of underconfidence.
是的。
Yep.
就像我们讨论奥巴马总统决定是否追捕本·拉登时的情形。
So we talk about the situation in which president Obama was making the decision about whether to go after Osama bin Laden.
我们认为他当时可能从会议中获得的概率判断中得出了信心不足的结论。
And he probably drew an underconfident conclusion, we think, from the probability judgments that were offered to him in that room.
当拥有不同专业背景和观点的人都在提供概率评估时——几乎所有人都给出了超过50%的可能性。
When you have people with different expertise and different points of view, all offering probabilities, almost all of them offering probabilities on this above 50%.
处理这些概率的正确方式是什么?
What's the right way to process those probabilities?
你是应该简单地取中位数,还是应该采取比中位数更极端的数值?
Should you simply take the median or should you take something more extreme than the median?
这正是我们的统计学家们争论的问题之一。
And that was one of the issues our statisticians wrestled with.
不妨将其视为一个思维实验。
I mean, treat it as a thought experiment.
在这个思维实验中,你是美国总统。
In a thought experiment, you're the president of The United States.
在你周围,围坐着一桌精英顾问,每个人都向你提供了奥萨马·本·拉登正藏身于巴基斯坦阿伯塔巴德某处神秘建筑的概率评估。
Around you, you have a table of elite advisers, each of whom offers you a probability estimate that Osama bin Laden is residing in a mystery compound in Abbottabad, Pakistan.
每个人都对你说:总统先生,我认为有70%的可能性。
And each one says, Mr. President, I think there's a 0.7 probability.
下一位也说70%
And the next one: 0.7.
零点七。
Point seven.
整个会议桌周围,统一的零点七。
All around the table, uniform point seven.
美国总统应该得出什么结论,关于本·拉登是否在那里,以及是否考虑进入下一步发动海豹突击队行动?
What conclusion should the president of The United States draw about whether Osama is there and whether to consider going to the next step of launching a Navy SEAL attack?
简短的答案是,这取决于顾问们是否彼此雷同。
Well, the short answer is it depends on whether the advisers are clones of each other or not.
如果他们彼此雷同,答案就是70%。
If they're clones of each other, the answer is 70%.
他们都基于相同的信息来源。
They're all drawing on the same information.
他们以相同的方式处理信息。
They're processing it in the same ways.
70%。
70%.
每个70%并不能提供增量信息。
There's no incremental information provided by each 70%.
但如果他们依据不同类型的证据,并以不同方式处理——比如有人掌握卫星情报,有人是密码破译者,还有人拥有人力情报等等。
But suppose they're drawing on different types of evidence and processing it in different ways: one guy has satellite information, another is a code breaker, another has human intelligence, and so forth.
如果他们基于不同类型的信息并以不同方式处理,各自仍得出70%的结论,但并不知道其他人在得出70%时所掌握的全部信息。
If they're drawing on different sorts of information and processing it in different ways, each of them still arriving at 70%, but not knowing all the information the other people had when they reached their 70%.
那么正确的概率是多少?
Now what's the correct probability?
正如我刚才提出的问题,答案在数学上是不确定的,但在统计学上是可估算的。
And the answer to the question as I've just posed it is mathematically indeterminate, but it's statistically estimable.
在IARPA竞赛期间,我们反复进行了统计估算。
And we did statistically estimate it over and over again during the course of the IARPA tournament.
这是我们预测成功的重要驱动力之一。
It was one of the big drivers of our forecasting success.
通常在IARPA竞赛中,你会采取极端化处理,从70%调整到85%或90%。
And typically in the IARPA tournament, you would extremize: you'd move from 70% to 85% or 90%.
其实你知道的比你认为的要多。
You know more than you think you do.
解释一下。
Explain.
你刚才说你会调整,我明白了。
When you said you'd move, understand.
你是说这就是你们会发现的结果?
You're saying that's what you would discover?
解释一下。
Explain.
这取决于你的顾问是谁。
It's a question of who your advisers are.
如果你的顾问们都在参考相同信息并得出70%的结论,那么当你平均他们的判断时,结果只会是70%。
If your advisers are all drawing on the same information and reaching 70%, the answer is when you average their judgments, it's just gonna be 70%.
实际上只有一个估算值。
You really only have one estimate.
如果你认为自己有10个不同的信息来源,那是在自欺欺人。
You're fooling yourself if you think you have 10.
没错。
That's right.
但如果顾问们都说70%,而他们依据的是多样化的信息来源,这虽然有点反直觉,但最终结果会比70%要极端得多。
But if the advisers are all saying 70%, but they're drawing on diverse sources of information, it's a little bit counterintuitive, but the answer is going to be quite a bit more extreme than 70%.
具体会极端多少,取决于信息类型的多样化程度以及在场人员的专业水平,这些确实都是难以量化的事情。
Now how much more extreme is going to be a function of how diverse the types of information are and how much expertise is in the room, and these are difficult to quantify things for sure.
我们的统计学家采用了一种极端化算法,当最佳预测者的加权平均值偏向某一可能性时,他们就会将其极端化处理。
What our statisticians did is use an extremizing algorithm: simply, when the weighted average of the best forecasters was tilted to one side of "maybe" or the other, they extremized it.
他们将概率从30%下调至15%,或从70%上调至85%。
They moved from 30% down to 15 or from 70% up to 85.
你是说当他们得出70%的平均估值时,实际上他们假装这个数值更高。
You're saying that when they had that average estimate of 70, they actually pretended it was higher.
他们给予了比单纯10%更高的置信度,因为参考了不同的信息源。
They gave it more confidence than just the 10 because they drew on different information.
我们的统计员每天东部时间上午9点提交预测,在预测锦标赛期间。
Well, our statisticians, we submitted forecasts at 9AM eastern time every day during the forecasting tournament.
所以这里没有回旋余地。
So there's no wiggle room here.
我是说,这是经过严格监控的研究。
I mean, this is very carefully monitored research.
对吧?
Right?
这不像某些研究中人们可能有操作空间的问题。
This doesn't have the problems of some research where, you know, people can have wiggle room.
这里没有回旋余地。
There's no wiggle room here.
这就像银行一样运作,每天都有交易,我们在那些聚合算法上下注。
This is being run like a bank, with transactions every day, and we were betting on those aggregation algorithms.
而我刚才用语言描述的那个极端化算法,基本上就是预测锦标赛的胜出方案。
And that particular extremizing algorithm I'm describing in verbal terms right now was essentially the forecasting tournament winner.
它的准确率超过了99%的个体超级预测者,而算法本身主要就是从这些预测者中衍生出来的。
It was more accurate than 99% of the individual super forecasters from whom the algorithm itself was largely derived.
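One common way to write an extremizing step like the one described above is to raise the odds to a power. The sketch below is an assumption for illustration, not the team's actual algorithm: the function names and the exponent a = 2 are hypothetical, and in practice the exponent would be estimated from data. With a = 2, a weighted average of 70% moves to about 84.5% and 30% moves to about 15.5%, in line with the 70-to-85 and 30-to-15 moves mentioned in the conversation.

```python
def extremize(p, a=2.0):
    """Push a probability away from 0.5 by raising its odds to the power a.

    a = 1 leaves p unchanged; a > 1 makes it more extreme.
    The default a = 2 is an assumed value, not a fitted one.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if a <= 0:
        raise ValueError("a must be positive")
    numerator = p ** a
    return numerator / (numerator + (1.0 - p) ** a)


def aggregate(forecasts, weights=None, a=2.0):
    """Weighted average of individual forecasts, followed by extremizing."""
    if not forecasts:
        raise ValueError("need at least one forecast")
    if weights is None:
        weights = [1.0] * len(forecasts)
    if len(weights) != len(forecasts):
        raise ValueError("weights and forecasts must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * p for w, p in zip(weights, forecasts)) / total
    mean = min(1.0, max(0.0, mean))  # guard against floating-point drift
    return extremize(mean, a)


# Five advisers who each say 0.7, treated as drawing on diverse evidence.
print(round(aggregate([0.7] * 5), 3))

# Setting a = 1 corresponds to the "clones" case: the answer stays at 0.7.
print(round(aggregate([0.7] * 5, a=1.0), 3))
```

The single exponent stands in for "how diverse the information is": the more independent the sources, the larger the exponent one would expect the data to support.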
是的,我想再次强调回到我们之前的讨论,这并不意味着具体数字是0.8还是0.9。
Yeah, I just wanna emphasize again that going back to our earlier discussion is that it didn't really mean that the number was point eight or point nine.
我不确定这个数字是否有意义,只是说你们对本·拉登的确认程度高于0.7这个数字所显示的。
I'm not sure that's a meaningful number, just that you were more confident that it was Osama Bin Laden, say, than the point seven number suggested.
这会让总统在发动海豹突击队行动时更有把握,换句话说,尽管他们都认为可能性超过0.7,但如果信息来自不同渠道,你们可以更有信心地认为可能性确实超过这个数字。
That would give some comfort to the president in launching a SEAL attack. Another way to say it would be that even though they all thought it was more likely than not, point seven, if it came from different sources of information, you could be more confident it was more likely than not.
这甚至接近于确定。
It was even closer to certain.
完全正确。
Exactly right.
是的。
Yeah.
我想稍后再讨论群体智慧和汇总问题,但既然你提到总统做决策,你对于领导者如何平衡谦逊与自信有些有趣的见解。
I wanna come back to the wisdom of crowds in a minute and the aggregation issue, but since you're talking about a president making a decision, you have some interesting thoughts on how a leader balances humility with confidence.
要知道,我们在节目里经常讨论这个,我持怀疑态度,但有时我过于怀疑了。
And I find, you know, we talk about this on the program a lot, I'm skeptical, but sometimes I'm too skeptical.
我需要对自己的怀疑态度更加怀疑,因为我很难接受那些可能为真却与我怀疑信念相悖的事情。
I need to be more skeptical about my skepticism because I have trouble accepting things that might be true that go against my skeptical beliefs.
所以你在书中提到,领导者,大多数领导者并不怎么持怀疑态度。
So you deal with that in the book that a leader, most leaders are not very skeptical.
他们看起来都很果敢。
They seem to be bold.
温斯顿·丘吉尔,就是你提到的典型例子。
Winston Churchill, it'd be a quintessential example you mentioned.
他们不会说'可能是73%或80%'这样的话。
They don't say, well, it could be 73 or 80.
他们会说'我们清楚真相'。
They say, well, we know the truth.
我们必须继续前进。
We gotta move forward.
谈谈如何在谦逊与自信之间找到平衡。
Talk about this issue of balancing humility and confidence.
这个话题如果我要写续集的话,一定会成为书中重点探讨的内容。
This is a topic that, if I were to write a sequel, I would very much want to feature prominently in the book.
我们可以用体育来打个比方。
Well, let's use a sports analogy.
我不是体育迷,但我的合著者丹·加德纳是个狂热的冰球迷。
I'm not a big sports fan, but my coauthor, Dan Gardner, is a big hockey fan.
他是加拿大人,我其实也出生在加拿大,但后来归化了美国籍。
And he's a Canadian, and I was actually born in Canada myself, but I'm a naturalized US citizen.
据说渥太华参议员队某年打进斯坦利杯决赛时,在七局四胜制的系列赛中一度以1比3落后。
But the Ottawa Senators were apparently in a Stanley Cup final one year, and they were down three to one in the series.
七局四胜制。
Best of seven series.
七局四胜制。
Best of seven series.
对吧?
Right?
然后有个记者把麦克风伸到教练面前说,嘿,教练。
And some reporter thrust a microphone in front of the coach's mouth and said, hey, coach.
你觉得你们还有机会吗?
You think you got a chance?
而教练没有像其他教练那样说些鼓舞士气的话,比如我们当然会赢。
And the coach, instead of doing what coaches are supposed to do and say, of course, we're gonna kick butt.
我们能行的。
We can do it.
直接连赢三场。
Just win three in a row.
我们以前做到过。
We've done it before.
我们会再次做到的。
We'll do it again.
他进入了超级预测模式。
He went into super forecaster mode.
然后他说,那么,成功的基础概率是多少?
And he said, well, what's the base rate of success?
落后三比一的球队看起来情况不妙,对吧?
For teams that are down three to one, it doesn't look very good, does it?
希望渺茫。
It's a long shot.
希望渺茫啊。
It's a long shot.
这可不是教练或领导者该做的事。
This is not what coaches or leaders are supposed to do.
这就引出了一个问题:在什么情况下领导者应该撒谎?
And it raises the question about what are the conditions under which leaders are supposed to be liars?
是啊。
Yeah.
那么答案是什么呢?
And what's the answer?
嗯,这就是为什么我说下一本书会探讨这个问题。
Well, that's why I said that the next book would wrestle with this.
不过我们在《超级预测》一书中确实用了相当篇幅讨论这个问题,我认为书中列举了一些有趣的军事案例和其他例子,展示了领导力需求与信心需求在某种程度上存在张力的情况。
But we do talk about it in the super forecasting book at some length, and we have some interesting, I think, military examples and some other examples as well of where, you know, the need for leadership and the need for confidence are in some degree of tension.
审慎的需求与信心的需求彼此之间存在张力。
The need for circumspection and the need for confidence are in tension with each other.
是啊。
Yeah.
我一直很着迷于领导者事后承认错误有多难,或者专家承认错误有多难。
It's always fascinated me how hard it is for a leader to say ex post, I made a mistake or a pundit to say I made a mistake.
他们大多不会承认。
Most of them don't.
他们会说,我当时考虑到了这一点,这里有文字可以证明我早就知道。
They say, well, I had that in mind, and here's the word that suggests that I knew that.
或者我不知道这个信息。
Or I didn't know this piece of information.
如果我早知道,当然就不会那样做了。
If I'd known that, of course, I wouldn't have da da da.
或者我最喜欢的说法:这根本不是个错误。
Or just my favorite, it wasn't a mistake.
其他所有人都认为这是个错误。
Everyone else thinks it's a mistake.
他们错了。
They're wrong.
这是件很棒的事。
It was a great thing.
所以你能看到各种反应。
So you get the whole range.
但我觉得这里面有心理因素...你是心理学家。
But I think there's a psycho you know, you're a psychologist.
我认为承认错误的心理挑战在于,你知道,说‘我知道这希望渺茫,但我不说因为这对团队不利’是一回事。
I think the psychological challenge of admitting a mistake and being you know, it's one thing to say, well, I know it's a long shot, but I won't say it because that would be bad for the team.
我想我害怕承认,我怀疑很多伟大的领导者甚至不认为这是希望渺茫的。
I'm afraid to say, I suspect a lot of great leaders don't even think that it's a long shot.
他们只是说我们会赢,而且他们真心相信这一点。
They just say we're gonna win, and they actually believe it.
没错。
Right.
关于你更倾向于选择哪种领导者的问题:是能够自我欺骗的,还是两面派、有一套私下数字和一套公开数字的。
And there's a question about whether you would prefer to have a leader who is capable of self-deception or a leader who is capable of being two-faced and has one set of private numbers and another set of public ones.
毫无疑问。
For sure.
我们来谈谈群体智慧,你在书中多次提到这个概念。
Let's talk about the wisdom of crowds, which you referred to a few times in the book.
我们刚才其实已经隐晦地讨论过这个问题了。
And we implicitly talked about it a minute ago.
谈谈如何聚合人群,以及你们在预测中如何避免克隆问题或群体思维问题?
Talk about how you aggregated folks and how you avoided the cloning problem or the groupthink problem in your estimates.
我们研究小组早期曾有过一场大争论:预测者作为个体工作还是团队合作会更好。
There was a big argument in our research group early on about whether it would be better for our forecasters to work as individuals or work as teams.
反团队派正确地指出了群体思维的危害以及团队运作中的其他功能障碍。
And the anti-team faction correctly pointed to the dangers of groupthink and all the other dysfunctions of groups.
任何在团队工作过的人都知道团队能有多糟糕。
Anyone who's ever worked in a team knows how bad teams can be.
霸凌。
Bullying.
以上种种。
All of the above.
但另一派认为,在某些条件下,团队可以发挥出超越个体总和的效果。
And then there was another group that said, look, there are conditions under which teams can be more than the sum of their parts.
如果我们给予正确的团队协作指导,他们就能产出出色的成果。
And if we give them the right guidance on how to work as a team, they can deliver some great stuff.
我们通过进行实验解决了这个问题。
And we resolved it by running an experiment.
结果在第一年结束时,团队表现更好,明显更出色。
And it turned out at the end of the first year, the teams were better, significantly better.
好多少呢?
How much better?
大概10%左右。
Maybe 10%.
要知道,我们讨论的是许多小因素和几个大因素共同作用,为超级预测团队带来了非常巨大的优势。
You know, what we're talking about is many small factors and a few big ones that cumulatively produce a really big, big advantage for the super forecaster teams.
超级预测者表现更好,是因为他们具备某些先天和后天的才能优势。
So super forecasters do better because they have certain natural and acquired talent advantages.
他们表现更出色,部分原因是他们在认知丰富的环境中与其他超级预测者一起工作。
They do better partly because they work in a cognitively enriched environment with other super forecasters.
他们表现更好,部分原因是他们接受了大量关于概率估算的培训和指导,然后彼此互相学习。
They do better partly because they've been given a lot of training and guidance on how to do probability estimation, and then they taught each other.
同样地,超级预测者也教会了我们一些东西。
And for that matter, the super forecasters have taught us things.
因此,通过超级预测者的反馈,我们的培训变得更好了。
So our training has become better by virtue of the feedback from the super forecasters.
最后,他们表现更好还因为算法的帮助。
And then finally, they do better because of the algorithms.
是的。
Yeah.
有一点我想澄清,之前我们强调得不够——这些参与预测的人,正如你所说,他们进行了大量预测,而且是持续进行的。
One thing I wanna make clear, which we didn't stress enough, these folks who are doing this, and as you said, there's a lot of forecasts and they're doing it, you know, on an ongoing basis.
这些人并不是全职从事这份工作的。
These are not people doing this as a full time job.
要知道,他们也不是政治学教授之类,专门预测南海局势发展的专业人士。
You know, these aren't professors of political science, say, forecasting what's going to happen in the South China Sea.
他们只是一些非常聪明的普通人,在业余时间做这些预测。
These are just really smart everyday people who are doing this on the side.
对吗?
Correct?
嗯,我真希望我们有更多的政治学教授。
Well, I wish we had more professors of political science.
我们有一些,但没达到我期望的数量。
We have a few, but we don't have as many as I would have hoped.
我的一些挚友
Some of my best friends
就是政治学教授。
are professors of political science.
我应该补充这一点。
I should add that.
不过我也有一些这样的朋友。
Some of mine too.
没错。
Right.
对。
Right.
那么,那些超级预测者是谁呢?
Well, who are the super forecasters?
所以媒体喜欢关注那些最反直觉的超级预测者。
So the media like to focus on the super forecasters who are the most counterintuitive.
比如阿拉斯加的安妮·基尔肯尼,她是一位家庭主妇。
So there's Anne Kilkenny, who is a housewife in Wasilla, Alaska.
还有匹兹堡的一位社工个案工作者,以及马里兰州的一位药剂师。
And there's a social work caseworker in Pittsburgh, and there is a person who works as a pharmacist in Maryland.
其他超级预测者在华尔街担任分析师,或曾是情报界的分析师,或在硅谷工作,或是非常熟练的软件程序员,他们开发了有趣的工具来帮助人们决定关注哪些问题以及如何在媒体资源上获胜等等。
Other super forecasters work as analysts on Wall Street, or were previously analysts in the intelligence community, or work in Silicon Valley, or are really adept software programmers who develop interesting tools for helping people decide which problems to focus on, how to winnow media sources, and so forth.
因此超级预测者的背景非常多样化。
So superforecasters are really quite varied.
其中有些人可能更符合你对超级预测者的刻板印象。
Some of them, you know, would fit more your stereotype of what you'd expect a super forecaster to look like.
有些是硅谷、华尔街那种智商180的类型,而另一些则更像是你在日常生活中遇到的聪明、有思想的普通市民。
Some are the Silicon Valley, Wall Street, IQ-of-180 type, and others look a lot more like intelligent, thoughtful citizens you run across in everyday life.
有趣的是他们相处得非常好。
And it's interesting how well they get along.
在超级团队中观察这种互动实际上是一种美妙的体验。
It's actually a wonderful dynamic to behold in the super teams.
他们是一个多元化的群体。
They are a diverse group.
他们非常擅长找出各自的比较优势所在,并处理问题、分配任务。
And they're very clever at working out what their sources of comparative advantage are, dealing with problems, and allocating labor.
实际上,他们就是一个个微型组织。
They are, in effect, mini organizations.
实际上,他们创建的是微型情报机构,这些机构产生的概率估计比许多情报分析师得出的更准确。
In effect, what they created were mini intelligence agencies that were generating probability estimates more accurate than those that were coming out of many intelligence analysts.
但你说过你给过他们建议。
But you said that you gave them advice.
你并不是简单地把他们扔进团队然后说‘祝你好运’就完事了。
You didn't just throw them into teams and say, good luck.
希望你们能解决问题。
Hope you work it out.
你采取了一些书中描述的非常周到的措施,促使他们作为团队高效协作,而非彼此的克隆或陷入群体思维。
You did some very thoughtful things that the book describes to get them to perform effectively as teams rather than as clones or groupthink.
我们确实这么做了。
We did.
我们确实这么做了。
We did.
我们这么做是因为群体中总是存在一种张力。
We did because there's always a tension in groups.
要在群体中获取真相,往往需要提出可能稍微冒犯到别人的问题。
To get to the truth in groups, you often have to ask questions that might offend people a little bit.
因此掌握'和而不同'的艺术,以及精通加州某些咨询师所称的'精准提问'技巧,我们发现这是非常有用的工具——不仅适用于我们的超级预测团队,也适用于所有常规预测团队。要知道,我们拥有数千名预测员和多种实验条件。
So mastering the art of disagreeing without being disagreeable, and mastering the art of what some consultants in California call precision questioning, we found that to be a very useful tool to transfer to all of our teams, both regular forecasting teams and super forecasting teams, you know, because we have thousands of forecasters and many experimental conditions here.
请用两句话给我们描述一下什么是精准提问。
Give us a two sentence description of precision questioning.
那是什么?
What is that?
嗯,当有人提出诸如‘足球作为世界最受欢迎的消遣活动正在衰落’这样的主张时,你需要明确他们关键术语的具体含义,比如‘消遣活动’包含哪些内容。
Well, when someone makes a claim like soccer is declining in popularity as the world's most popular pastime, you would want to figure out what exactly they mean by key terms, what are the things that are included in pastimes.
他们所说的‘衰落’具体指什么?
What do they mean by decline?
你要促使他们比常人更具体地阐述观点。
And you wanna get them to be more specific than people normally are.
当你开始追问时,人们往往无法给出更具体的解释,而且被追问时他们常会感到烦躁,心想‘别老纠缠这个问题’。
And when you start probing people, they're often unable to become more specific, and when they're being probed, they often feel irritated, and they think, you know, quit bugging me about this.
因此超级预测团队学会了在保持合理礼仪的前提下,将精准度推向极限。
So super forecasting teams have learned to push the limits of precision but maintain reasonable etiquette inside the group.
我认为这对于触及你们对话中一直探讨的核心问题——即‘精准度的边界在哪里’——至关重要。
And I think that's crucial for getting at this, you know, more underlying question you've been raising throughout the conversation, which is what are the limits of precision?
精确何时会变成伪精确?
When does precision become pseudo precision?
当你开始使用
When you start using
你不测试就不知道。
And you don't know until you test it.
简单的答案是当你开始使用小数点,但
Well, the simple answer is when you start using decimal points, but
我想他说73.2%。
I wanna He says 73.2%.
嗯,我认为这是对的。
Well, you know, I think that's right.
我认为当你观察不确定性的程度时——比方说为了讨论方便,我们把概率尺度视为有100个点,而不是无限可分的。
I think when you look at how many degrees of uncertainty, let's just say for the sake of argument that we treat the probability scale as having 100 points along it rather than being infinitely divisible.
我们就说这是个100点概率尺度,这正是我们在锦标赛中实际使用的。
Let's just say it's a 100-point probability scale, which is the one we actually used in the tournament.
超级预测者在做出预测时,能有效区分多少种不确定性程度?
How many degrees of uncertainty were super forecasters collectively usefully distinguishing when they made their forecasts?
你可以通过将他们的预测四舍五入到十分位等方式进行统计估算。
You can estimate that statistically by rounding off their forecasts to tenths and so forth.
对。
Right.
我认为我们最好的估计是,他们能区分概率尺度上大约15到20种不确定性程度。
And I think our best estimate is they can distinguish somewhere between about fifteen and twenty degrees of uncertainty on a probability scale.
大多数人只能区分大约四五种,介于四到五之间。
Most people distinguish about five, four, somewhere between four and five.
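The rounding approach described above can be sketched roughly as follows: round each forecast to a coarser grid of probability levels, rescore it, and see at what point accuracy starts to suffer. This is only an illustrative sketch, not the tournament's actual procedure. The function names, the sample forecasts, and the use of the simple one-sided Brier score (mean squared error between forecast and 0/1 outcome) are all assumptions made for the example.

```python
import math

def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def round_to_levels(p, n_levels):
    """Snap probability p to the nearest of n_levels evenly spaced points on [0, 1]."""
    steps = n_levels - 1
    return math.floor(p * steps + 0.5) / steps

def score_by_granularity(forecasts, outcomes, level_counts):
    """Brier score after rounding the forecasts to each grid size in level_counts."""
    return {
        n: brier_score([round_to_levels(p, n) for p in forecasts], outcomes)
        for n in level_counts
    }

# Hypothetical forecasts and outcomes, invented purely for illustration.
FORECASTS = [0.05, 0.15, 0.35, 0.65, 0.85, 0.95]
OUTCOMES = [0, 0, 0, 1, 1, 1]

# 101 levels is the 0-100 scale; 3 levels is "no / maybe / yes".
SCORES = score_by_granularity(FORECASTS, OUTCOMES, [3, 5, 11, 21, 101])
for n_levels, score in SCORES.items():
    print(f"{n_levels:>3} levels: Brier = {score:.4f}")
```

If a forecaster's score gets worse when forecasts are rounded to a coarser grid, the finer distinctions were carrying real information; if the score does not change, the extra precision was not doing any work.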
我们时间快不够了。
We're almost out of time.
我想讨论你在书中提出的一个经济学问题——这也是我经常思考的——我会用听众们熟悉的方式来表述:我们通过了这项看似庞大的(你可以争论它是否真的庞大,因为事后总会对基础条件是否成立存在争议)
I wanna get to an economics issue that you raise in the book, which is often on my mind, which is I'm gonna couch it the way listeners here would expect, which is we passed this enormous, seemingly enormous, you can debate whether it was enormous or not because there's always a debate after the fact of whether the base conditions even held.
我们通过了一项看似庞大的刺激方案来对抗经济衰退。
We passed a seemingly enormous stimulus package to fight the recession.
当时有人对此举可能产生的效果做出了一些预测。
And there were some predictions that were made, sort of, about what that would achieve.
其中一方的人声称这将在特定时间内终结失业问题。
And there were people on one side of the fence who said it was gonna end unemployment over a certain period of time.
另一方则认为这会让情况变得更糟。
Other people said it's gonna make things worse.
还有很多人只是单纯表示喜欢或不喜欢,连最基本的量化预测都没有。
A lot of people just said, oh, I really like it or I really don't without making any kind of even beginnings of a quantitative prediction.
但当尘埃落定后,辩论双方都宣称自己是对的。
But then the dust settled and afterward everybody said I was right on either side of the debate.
听起来很耳熟。
Sounds familiar.
我发现
I find it
这非常令人不安——尤其是在经济学领域(其他领域也有类似情况)——完全缺乏问责机制。
deeply troubling that in economics in particular, but it's elsewhere, that there's no accountability.
如果没有问责制,我们为何还要关注这些?
And if there's no accountability, why do we even begin to pay attention?
如果没有权威的方法,哪怕是稍微权威的方法来评估预测是否准确、模型是否正确、政策处方是否得到落实,我们如何能取得任何进展?
If there's no authoritative way, even mildly authoritative way to assess whether a prediction is accurate, whether a model is accurate, whether a policy prescription is fulfilled, how can we make any progress?
在我的专业领域里,我看不到这种机制。
And I don't see it in my profession.
你提出了一些可能的方法,让我们可以督促人们负责,至少建立某种问责机制。
You suggest the possibility of some ways we might hold people's feet to the fire and at least have some accountability.
这可能如何运作呢?
How might that work?
我...我想你指的可能是对抗性合作锦标赛的提议,我们以尼尔·弗格森和保罗·克鲁格曼关于量化宽松政策的辩论为例。
I think you might be referring to the proposal of adversarial collaboration tournaments, and we use the example of Niall Ferguson and Paul Krugman and the debate over quantitative easing.
没错。
Yep.
对。
Right.
嗯,我我认为这是个非常棒的模型。
Well, I think it's a great model.
它在科学领域确实发挥过一些作用。
It has had some utility in science.
我认为它在公共政策辩论中也具有实用性,比如现在在GJ Open平台上举办的伊朗核协议竞赛。
I think it has some utility in public policy debates. We're running an Iranian nuclear accord tournament now on GJ Open.
关键的一点是这样的:
Here's one of the key things you would do.
双方都有机会提出五到十个他们认为自己具有相对优势回答的问题。
Each side would have a chance to nominate five or 10 questions that it thinks it has a comparative advantage in answering.
这些问题必须与核心议题相关,并且需要通过'预见性测试'——即事后能严格评估其准确性。
The questions have to be relevant to the underlying issue, and they have to pass the clairvoyance test, which means they have to be rigorously scoreable for accuracy after the fact.
在这种背景下,胜利的含义非常明确。
And victory has a pretty clear meaning in this kind of context.
它意味着你不仅能比我更好地回答你自己的问题。
It means you not only can answer your questions better than I can.
你能比我更好地回答我的问题。
You can answer my questions better than I can.
这让我陷入尴尬境地,因为我不能简单地说你提出了一些愚蠢的问题。
And that leaves me in an awkward situation because I can't simply say, well, you posed some stupid questions.
我不得不承认,我的问题也很愚蠢。
I would have to say, well, my questions were stupid too.
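The scoring logic of such a tournament can be sketched under some loudly stated assumptions: two sides, each forecasting every question, with accuracy measured by a simple Brier score. The `Question` structure, the side labels, and every number below are hypothetical placeholders invented for the example; they are not real forecasts or questions from anyone named in the conversation.

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str
    nominated_by: str   # which side proposed the question: "A" or "B"
    outcome: int        # 1 if the event happened, 0 if it did not
    forecasts: dict     # side label -> probability that side assigned

def mean_brier(questions, side):
    """Average squared error of one side's forecasts (lower is better)."""
    return sum((q.forecasts[side] - q.outcome) ** 2 for q in questions) / len(questions)

def verdict(questions):
    """Return "A" or "B" if that side is more accurate on BOTH question sets, else "split"."""
    own_a = [q for q in questions if q.nominated_by == "A"]
    own_b = [q for q in questions if q.nominated_by == "B"]
    a_wins = [mean_brier(qs, "A") < mean_brier(qs, "B") for qs in (own_a, own_b)]
    b_wins = [mean_brier(qs, "B") < mean_brier(qs, "A") for qs in (own_a, own_b)]
    if all(a_wins):
        return "A"
    if all(b_wins):
        return "B"
    return "split"

# Hypothetical questions and forecasts, for illustration only.
QUESTIONS = [
    Question("Question 1 (nominated by A)", "A", 1, {"A": 0.8, "B": 0.6}),
    Question("Question 2 (nominated by A)", "A", 0, {"A": 0.3, "B": 0.1}),
    Question("Question 3 (nominated by B)", "B", 1, {"A": 0.9, "B": 0.7}),
    Question("Question 4 (nominated by B)", "B", 0, {"A": 0.2, "B": 0.4}),
]

print(verdict(QUESTIONS))
```

Requiring a win on both sets is what makes the result hard to dismiss: the losing side cannot blame the question selection when it was outscored on the questions it chose itself.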
是啊。
Yeah.
这确实更加尴尬。
It's much more awkward.
当然,专家们会非常不愿意参与这样的博弈。
Now, of course, pundits are going to be very reluctant to engage in a game like this.
我是说,为什么要参与一个你...呃...就是这样。
I mean, why would you want to engage in a game in which you're ... yeah.
最好的结果其实并不够吸引人,不足以证明承担这样的风险是值得的。
The best possible outcome is not nearly attractive enough to justify the risk.
唯一能促使高地位评论家同意参与公平竞争预测锦标赛的方式,就是让公众要求他们这样做——在这种比赛中,他们会将自己的未来预测与竞争对手进行较量。
The only way we're ever going to induce high status pundits to agree to participate in level playing field forecasting tournaments in which they pit their predictions about the future against their competitors is if the public demands it.
如果出现强烈需求,如果评论家们因为拒绝提供更精确且可验证的预测(相对于竞争对手)而感到信誉开始受损,那么我认为这将是唯一能迫使他们参与的力量。
And if there is a groundswell of demand for that, if pundits feel that their credibility is beginning to suffer because they're refusing to offer more precise and testable predictions vis-a-vis their competitors, then I think that would be the only force on earth capable of inducing them to do it.
是啊。
Yeah.
我认为这里面存在羞耻感因素。
I think there's a shame factor.
我觉得外部力量——也许这个节目可以激发某些人的羞耻感来参与,但这是个有趣的问题。
I think an external source, maybe this program could shame some people into participating, but it's an interesting question.
我认为经济学中的一个挑战(仔细思考后其他领域也存在类似问题)是:你到底在衡量什么?
One of the challenges I think in economics, and it's somewhat of a question I think when you think about it carefully in other fields as well, what are you measuring?
如果我们真正关心的(比如说并非全貌)是担心最低工资是否会导致失业,有人会说:'我无法参与这个预测,因为除了最低工资上涨外还有太多其他影响因素。'
So if what we really care about, say, it's not the whole picture, but if we're worried about whether the minimum wage, say, causes unemployment, one reason people would say, well, I can't participate in that because there's so many other factors besides that increase in minimum wage.
我不能保证它们不会到位。
I can't guarantee that they're not gonna be in place.
然后我会说
And then I would say
我们只想要一个概率。
We just want a probability.
什么?
What?
我们只想要一个概率。
We just want a probability.
对。
Yeah.
但我想说,如果你不能,那你就该闭嘴,因为你只是在空谈。
But I just want to say, if you can't, then you should shut your mouth because you're just talking.
我是说,这没有——我自己也这样。
I mean, there's no and I do it too.
我不应该随意下结论,不该说问题全出在他们身上。
I shouldn't say it's just them.
我虽然...但我不会像他们有时处理实证数据那样,假装自己的方法是科学的。
But I don't pretend mine's scientific in the way that they sometimes do with empirical data.
我只是试图依靠我的原则,我认为这些原则相当可靠,但恐怕我也存在自欺欺人的风险。
I'm just trying to rely on my principles, which I think are pretty reliable, but I'm probably at risk of fooling myself there too.
噢,我认为最低工资问题会是对抗性协作的绝佳案例,因为各州和地方政府在这方面采取了众多独立行动。
Oh, I think the minimum wage would be a wonderful example of how adversarial collaboration could work because there are so many states and municipalities taking independent actions on that front.
没错,如果策略得当,或许我能促成些事情。
Yeah, and maybe something I can enable if I play my cards right.
今天的嘉宾是菲利普·泰特洛克。
My guest today has been Philip Tetlock.
菲利普,感谢参与EconTalk节目;听众朋友们,感谢你们忍受那些电子噪音。
Philip, thanks for being part of EconTalk and listeners, thanks for putting up with some of that electronic noise.
我们正在努力改进这个问题。
We're trying to make it better.
谢谢。
Thank you.
非常有趣。
It's a lot of fun.
这里是经济谈话,隶属于经济学与自由图书馆。
This is Econ Talk, part of the Library of Economics and Liberty.
想收听更多经济谈话节目,请访问econtalk.org,您还可以对本期播客发表评论,并找到与今天对话相关的链接和阅读材料。
For more Econ Talk, go to econtalk.org, where you can also comment on today's podcast and find links and readings related to today's conversation.
经济谈话的音响工程师是Rich Goyette。
The sound engineer for Econ Talk is Rich Goyette.
我是主持人Russ Roberts。
I'm your host, Russ Roberts.
感谢您的收听。
Thanks for listening.
下周一见。
Talk to you on Monday.