理查德·克雷布 - 通过Numerai实现众包阿尔法（第七季第28集） | Flirting with Models 中文双语解读

本集简介

今天，我与Numerai的首席执行官兼创始人理查德·克雷布进行了对话。如果你以前听说过Numerai，并认为它只是数据科学与加密货币交汇处的一个有趣实验，那么现在是时候更新你的认知了。过去几年里，Numerai的资产管理规模悄然从约6000万美元增长到超过6亿美元。摩根大通已投资并获得了5亿美元的容量，而Numerai最近完成了一轮由顶尖大学捐赠基金领投的C轮融资，估值达5亿美元。这已不再是一个玩具项目，而是一个拥有非常规阿尔法引擎的、真正意义上的机构级市场中性对冲基金。在这次对话中，我们深入探讨了Numerai的实际运作机制。理查德详细阐述了Numerai设计背后的核心洞察：只有当激励机制真正对齐，而不仅仅是鼓励参与时，众包阿尔法才能奏效。仅仅开放数据并对模型排名，会激励人们操纵系统，而非产生持久有效的信号。这一认知促使他们推出了Numeraire代币。通过要求研究人员用真实资本为其预测押注，Numerai从一个以排行榜驱动的实验，转变为一个以资本加权的信号引擎。系统不再奖励活跃度，而是奖励信念、责任感和独特性，从而形成一个自我过滤的模型，自然降低噪音，并抑制导致早期众包平台失败的行为。我们还讨论了投资组合构建与风险管理，包括Numerai如何中和常见因子敞口、2023年回撤中出现了什么问题，以及这些经验如何重塑了他们在分散化与集中化方面的策略。最后，我们展望未来，探讨了众包建模的局限性、Numerai研究生态系统的下一个前沿，以及理查德如何看待AI代理对模型开发的重塑。请欣赏我与理查德·克雷布的对话。

双语字幕


Speaker 0

嘿，大家好。

Hey, everyone.

Speaker 0

我是科里。

Corey here.

Speaker 0

感谢收听《与模型调情》的又一期节目。

Thanks for tuning into another episode of flirting with models.

Speaker 0

如果你喜欢这个节目，我非常希望你能花点时间评分、评论，更重要的是，推荐给朋友。

If you're enjoying the show, I'd greatly appreciate it if you take a moment to rate, review, and most importantly, share with a friend.

Speaker 0

口口相传是这个播客成长的方式。

Word-of-mouth is how this podcast grows.

Speaker 0

如果你想了解更多关于Newfound的收益叠加型共同基金、ETF和模型投资组合平台，请访问returnstacks.com。

And if you'd like to learn more about Newfound's platform of return stacked mutual funds, ETFs, and model portfolios, head over to returnstacks.com.

Speaker 0

现在，继续我们的节目。

Now on with the show.

Speaker 0

好了，里奇。

Alright, Rich.

Speaker 0

你准备好了吗？

Are you ready?

Speaker 1

是的。

Yeah.

Speaker 0

好的。

Alright.

Speaker 0

三、二、一。

Three, two, one.

Speaker 0

让我们开始吧。

Let's jam.

Speaker 0

大家好，欢迎各位。

Hello, and welcome, everyone.

Speaker 0

我是科里·霍夫斯坦，这里是《与模型调情》，这档播客将揭开面纱，探寻量化策略背后的人性因素。

I'm Corey Hoffstein, and this is Flirting with Models, the podcast that pulls back the curtain to discover the human factor behind the quantitative strategy.

Speaker 2

科里·霍夫斯坦是Newfound Research的联合创始人兼首席投资官。

Corey Hoffstein is the cofounder and chief investment officer of Newfound Research.

Speaker 2

由于行业监管规定，他不会在本播客中讨论Newfound Research的任何基金。

Due to industry regulations, he will not discuss any of Newfound Research's funds on this podcast.

Speaker 2

播客参与者表达的所有观点均为其个人意见，不代表Newfound Research的观点。

All opinions expressed by podcast participants are solely their own opinion and do not reflect the opinion of Newfound Research.

Speaker 2

本播客仅用于信息目的，不应作为投资决策的依据。

This podcast is for informational purposes only and should not be relied upon as a basis for investment decisions.

Speaker 2

Newfound Research的客户可能持有本播客中讨论的证券。

Clients of Newfound Research may maintain positions in securities discussed in this podcast.

Speaker 2

如需更多信息，请访问thinknewfound.com。

For more information, visit thinknewfound.com.

Speaker 0

今天，我与Numerai的首席执行官兼创始人理查德·克雷布进行对话。

Today, I'm speaking with Richard Craib, the CEO and founder of Numerai.

Speaker 0

如果你之前听说过Numerai，并认为它只是数据科学与加密货币交汇处的一个有趣实验，那么你有必要更新这一认知。

If you've heard of Numerai before and thought of it as an interesting experiment at the intersection of data science and crypto, it's worth updating that mental model.

Speaker 0

在过去几年中，Numerai的资产规模已悄然从约6000万美元增长至超过6亿美元。

Over the last few years, Numerai has quietly grown from roughly $60,000,000 in assets to over $600,000,000.

Speaker 0

摩根大通已进行投资，并锁定了5亿美元的容量。

JP Morgan has invested and secured $500,000,000 in capacity.

Speaker 0

Numerai最近完成了由顶尖大学捐赠基金领投的C轮融资，估值达5亿美元。

And Numerai recently raised a Series C at a $500,000,000 valuation led by top university endowments.

Speaker 0

这已不再是实验性项目。

This is no longer a toy project.

Speaker 0

它是一个真正具备机构规模的市场中性对冲基金，拥有一个非常规的阿尔法引擎。

It is a real institutional scale market neutral hedge fund with a very unconventional alpha engine.

Speaker 0

在这次对话中，我们将深入探讨Numerai的实际运作方式。

In this conversation, we go deep into how Numerai actually works.

Speaker 0

理查德详细阐述了Numerai设计背后的核心理念。

Richard walks through the core insight behind Numerai's design.

Speaker 0

众包阿尔法只有在激励机制真正对齐时才能奏效，仅仅有参与是不够的。

That crowdsourced alpha only works if incentives are aligned, not just participation.

Speaker 0

仅仅开放数据并对模型进行排名，会激励人们操纵系统，而非生成持久有效的信号。

Simply opening up data and ranking models creates incentives to game the system, not to produce durable signals.

Speaker 0

这一认识促成了Numeraire代币的推出。

That realization led to the introduction of the Numeraire token.

Speaker 0

通过迫使研究人员用真实资本押注他们的预测，Numerai从一个基于排行榜的实验转变为一种资本加权的信号引擎。

By forcing researchers to stake real capital behind their predictions, Numerai shifts from a leaderboard-driven experiment to a capital-weighted signal engine.

Speaker 0

系统不再奖励活跃度，而是奖励信念、责任感和独特性，从而创建一个自我过滤的模型，自然减少噪音，并抑制导致早期众包平台失败的行为。

Instead of rewarding activity, the system rewards conviction, accountability and uniqueness, creating a self-filtering model that naturally reduces noise and discourages the behaviors that caused earlier crowdsourced platforms to fail.

Speaker 0

我们还讨论了投资组合构建和风险管理，包括Numerai如何中和常见因子暴露、2023年回撤期间出了什么问题，以及这些经验如何重塑了他们对分散化和集中化的策略。

We also talk about portfolio construction and risk management, including how Numerai neutralizes common factor exposures, what went wrong during their 2023 drawdown, and how those lessons reshaped their approach to diversification and concentration.

Speaker 0

最后，我们展望了未来：众包建模的局限性、Numerai研究生态系统的下一个前沿，以及理查德如何看待AI代理对模型开发的重塑。

Finally, we look forward, covering the limits of crowdsourced modeling, the next frontier for Numerai's research ecosystem, and how Richard sees AI agents reshaping model development.

Speaker 0

请欣赏我与理查德·克雷布的对话。

Please enjoy my conversation with Richard Craib.

Speaker 0

理查德，欢迎来到节目。

Richard, welcome to the show.

Speaker 0

很高兴你能来。

Excited to have you here.

Speaker 0

我一直在关注你的进展很久了，所以非常高兴能有机会和你聊聊，不是关于Numerai的背景和它的起源，而是关于你们未来的方向。

Been tracking your progress for a very long time and so really excited to get the opportunity to talk to you, not about the background of Numerai and where it all came from, but where you're going.

Speaker 0

我知道未来有很多令人兴奋的事情即将发生。

I know there's lots of exciting stuff on the horizon.

Speaker 0

非常感谢你加入我的节目。

So thank you so much for joining me.

Speaker 1

是的，能来这里真的很棒。

Yeah, it is good to be here.

Speaker 0

在深入探讨Numerai的具体细节之前，也许我们可以先聊聊你的背景，你能跟我们讲讲你的经历吗？是什么让你对数学、机器学习产生兴趣，最终萌生了建立一个由众包模型驱动的对冲基金的想法？

Before we dive into the specifics of Numerai, maybe we can start with a bit of your background and you can tell us a little about your journey, what drew you towards math, machine learning and ultimately this idea of building a hedge fund that is powered by crowdsourced models.

Speaker 1

我想当我八岁的时候，我爸爸给了我一些股票让我跟踪。

I guess when I was eight years old, my dad gave me stocks to track.

Speaker 1

我会观察股票的波动，这非常诱人。

I would watch the stocks move and it's very seductive.

Speaker 1

金融领域有一种很奇怪的现象：你能看到巨大的波动性，但要做得好却异常困难。

There's something very strange about finance where you can see there's all this volatility, but somehow it's very hard to do well.

Speaker 1

所以你看到一只股票在利好财报公布后一天内上涨了20%，但接着你想想，世界上最好的投资者是谁？

So you see your stock goes up 20% in a single day on a good earnings announcement, but then you look, wait, who's the best investor in the world?

Speaker 1

哦，他们的年化回报其实低于20%，比如沃伦·巴菲特之类的人。

And, oh, their return is actually below 20% a year, Warren Buffett or something.

Speaker 1

所以这里面有一种很有趣的现象。

So there's like this interesting thing.

Speaker 1

我从小就开始思考这个问题。

I just started thinking about it as a kid early.

Speaker 1

于是我决定，我应该试着去了解这方面的知识。

I decided, well, I should try to learn about this.

Speaker 1

问题是，关于金融的大量内容其实都像是垃圾食品版的金融。

The trouble is, so much of what's written about finance is actually kind of a junk-diet version of it.

Speaker 1

你整天看CNBC，读报纸之类的东西，但在我看来，这些都偏离了重点。

So you could watch CNBC all day, you could read the newspapers and things like that, but it's all kind of missing the point to me.

Speaker 1

这些都属于媒体层面的东西，而不是真实的情况。

Like, that's all in the media realm instead of what's real.

Speaker 1

后来，当我接触到文艺复兴科技时，我意识到这实际上是一个数学问题。

Later on, when I came across Renaissance, I realized that what's really going on is that this is a mathematical problem.

Speaker 1

这是一个机器学习问题。

This is a machine learning problem.

Speaker 1

这促使我开始学习数学和机器学习。

So that drove me towards studying mathematics and machine learning.

Speaker 0

我知道你早期是通过像Kaggle这样的平台开始接触这些的。

I know you got a lot of your early start looking at things like Kaggle.

Speaker 0

你可以稍后解释一下Kaggle是什么。

You can explain what that is in a minute.

Speaker 0

你受到文艺复兴科技的启发，但你职业生涯早期——比如2010年之前——的机器学习状态与今天大不相同。

And you had this inspiration from RenTech, but the state of machine learning early on in your career, call it pre-2010, was very different than it looks today.

Speaker 0

我想你在我们之前的通话中用过‘惨淡’这个词来形容。

It was, I think the word you used in our pre call was quote grim.

Speaker 0

你能分享一下2000年代机器学习的前沿状态是怎样的吗？

Maybe you can share a little about like what was the state of the art of machine learning back in the 2000s?

Speaker 0

我有自己的看法，当年我主修计算机科学时，我们讨论的所有内容都和今天大不相同。

I have my own view, having done a computer science major back then; all the things we talked about are so different from what we have today.

Speaker 0

当时缺少了什么？

What was missing?

Speaker 0

然后你可以谈谈Kaggle，以及当时的整个数据科学社区是如何塑造和影响你对这个问题的看法的。

And then maybe you can talk about Kaggle and how the broader data science community shaped and influenced how you thought about this problem.

Speaker 1

确实发生了转变。

It did shift.

Speaker 1

2013年，我认为是DeepMind被谷歌收购的时候。

2013, I think was the DeepMind acquisition by Google.

Speaker 1

2011年有一些计算机视觉领域的突破，人们开始谈论自动驾驶汽车之类的事情。

In 2011, there were some of the computer vision breakthroughs where people started talking about maybe self-driving cars and things like that.

Speaker 1

所以所有这些几乎都是在那时发生的，我称之为深度学习革命。

So it all kind of happened there, I would say: the deep learning revolution.

Speaker 1

与此同时，有一个叫Kaggle的网站上线了，聚集了大量爱好者，他们可能拥有工程、数学或统计学背景，一起来解决平台上发布的问题。

During the same time, you had this website called Kaggle, which was launched, and it was all of these hobbyists who maybe had backgrounds in engineering or math or statistics coming together to build models on these problems that were put out.

Speaker 1

竞争性数据科学领域大约在2010年之后诞生，它极大地推动了辅助数据科学的开源项目的发展，scikit-learn、TensorFlow这类工具也大约在2015、2016年前后相继问世。

The field of competitive data science was born after 2010 or so, and it really just spurred a whole bunch of growth in open-source projects to help with data science, like scikit-learn and TensorFlow; all of these types of things came out around 2015, 2016 as well.

Speaker 1

人们逐渐意识到，机器学习在Kaggle上能够解决许多这类问题。

It started to become known that machine learning can solve so many of these problems often on Kaggle.

Speaker 1

比赛主办方常常对Kaggle数据科学家所构建模型的出色表现感到惊讶。

The competition host would be surprised by how good the models that were produced by the Kaggle data scientists were.

Speaker 1

这是一个常常让人不清楚当前最先进水平在哪里、甚至不确定潜力何在的领域。

It's a field often where you don't know what state of the art is or what even is the potential.

Speaker 1

所以你会惊叹：天啊，我们居然能在这样的数据集上达到这么高的准确率。

So you're like, wow, I can't believe we can get this level of accuracy on this dataset.

Speaker 1

我原本以为，我们的性能上限可能只是线性回归模型之类的水平。

I thought, you know, we were limited to the performance of linear regression models or something.

Speaker 1

尽管它变得非常流行，但在我们创办Numerai的2015到2016年前后，人们仍普遍觉得机器学习与金融领域并不真正相关。

It became very popular, but even around the time of starting Numerai, which was 2015, 2016, there was a feeling that machine learning was still not really relevant for finance.

Speaker 1

我记得Two Sigma发布过一篇论文，颇具讽刺意味，因为他们现在大量使用机器学习，但那篇论文却声称：我们认为机器学习在金融领域不会有什么大作为。

I think Two Sigma put out a paper, it's ironic because they do a lot of machine learning now, but they put out a paper something like, we don't think machine learning is going to be a big deal in finance.

Speaker 1

你甚至在量化推特上也经常听到这种说法。

And you hear that a lot even on maybe quant Twitter.

Speaker 1

关键在于拥有干净的数据，然后运行一个普通的模型，并且对交易成本等细节保持高度谨慎。

It's really about having clean data and then running a normal model and just being very careful about transaction costs or something.

Speaker 1

我认为今天人们完全不是这样理解的。

I don't think that's how people understand it today at all.

Speaker 1

我认为它已经进化了。

I think it's evolved.

Speaker 0

对于可能不太了解的听众，你能先给我们做一个关于Numerai是什么、如何运作的高层次概述吗？

For listeners who may be unfamiliar or less familiar, can you start maybe the rest of this conversation by giving us a really high level overview of what Numerai is and how it actually works and operates?

Speaker 1

当时的思路是，如果你能创建一个对全世界所有人开放数据的对冲基金，那么全世界的人都可以在这些数据上构建机器学习模型。

The thinking was, well, if you could make a hedge fund where everybody in the world could access the data, then everybody in the world could build machine learning models on the data.

Speaker 1

因此，Numerai创建了这些数据集，并将其向全世界发布。

So Numerai created these datasets and we released them to the world.

Speaker 1

我们说，全世界的竞赛数据科学家们，来加入Numerai吧！如果我们把你们所有的模型结合起来，这将是一个顶尖的阿尔法生成系统，因为所有可能的技术都会源源不断地涌入Numerai系统。

And we said, all the competitive data scientists in the world, come and join Numerai. And if we combine all of your models together, that's going to be a state-of-the-art alpha generation system, because we just have all of the different techniques that are possible coming in and out of the Numerai system.
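Richard describes combining every participant's model into one signal. The exact combination Numerai uses is not described here, so the following is only a minimal sketch under stated assumptions: each model's predictions are first rank-normalized so that they share a common scale, and they are then averaged with weights proportional to a hypothetical per-model stake. The names `rank_normalize` and `stake_weighted_ensemble` and all of the numbers are illustrative.

```python
import numpy as np

def rank_normalize(preds: np.ndarray) -> np.ndarray:
    """Map raw predictions to evenly spaced ranks in (0, 1); ties broken by position."""
    ranks = preds.argsort(kind="stable").argsort(kind="stable")  # 0 .. n-1
    return (ranks + 0.5) / len(preds)

def stake_weighted_ensemble(predictions: dict, stakes: dict) -> np.ndarray:
    """Average rank-normalized predictions, weighting each model by its (hypothetical) stake."""
    total = sum(stakes[name] for name in predictions)
    n = len(next(iter(predictions.values())))
    combined = np.zeros(n)
    for name, preds in predictions.items():
        combined += (stakes[name] / total) * rank_normalize(preds)
    return combined

# Hypothetical predictions from two models on the same three stocks.
predictions = {"model_a": np.array([0.1, 0.5, 0.3]),
               "model_b": np.array([0.9, 0.2, 0.4])}
stakes = {"model_a": 3.0, "model_b": 1.0}
print(stake_weighted_ensemble(predictions, stakes))
```

Rank-normalizing first prevents a model that happens to output large raw numbers from dominating the blend. The stake weights then control how much each model contributes.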

Speaker 1

这就是它的本质。

So that's what it is.

Speaker 1

它可以说是第一个开放式对冲基金。

It's kind of the first open hedge fund.

Speaker 1

我们希望成为人工智能进入市场的首个平台。

We wanted to be the first platform for AI to reach the market.

Speaker 1

而我们确实做到了。

And that's what we did.

Speaker 1

因此，在早期阶段，我们就吸引了一些顶尖的Kaggle大师、Netflix大奖得主，以及数据科学界的所有这些英雄人物，他们都加入Numerai，试图推动我们的进展。

And so very early on, we got some of the top Kaggle grandmasters, the winner of the Netflix prize, all these different heroes of the data science world, all joining Numerai to try to advance what we've done.

Speaker 0

你们提供给用户的那个数据集是经过主动混淆处理的。

Now that data set that you provide to users is actively obfuscated.

Speaker 0

我想知道，在高层次上，你们是如何得出结论，认为直接向用户分享原始金融数据不可行的？

I'm curious at a high level, how did you come to the conclusion that sharing the raw financial data in and of itself with users was not viable?

Speaker 0

我知道，通常情况下，共享原始数据可能涉及数据提供方的使用条款问题，但这是一个主动的建模决策。

And I know often sharing raw data might be a terms of service issue with the data providers themselves, but this was an active modeling decision.

Speaker 0

我很想知道你是怎么想到这一点的。

And I'm curious how you came to that.

Speaker 0

为什么在为用户构建这些模型的过程中，隐藏数据的真实内容是有价值的？

Like why is hiding what the data is valuable in the actual process of building these models for users?

Speaker 1

这其中有多方面的原因。

There are a number of reasons for that.

Speaker 1

混淆的数据其实很有趣，我记得大约在2012年左右，我上机器学习课时，教授给我们布置了一个问题，所有的变量都只是x1、x2、x3、x4。

Obfuscated data, it's interesting: actually, when I was first taking a class in machine learning, maybe around 2012 or something, there was a problem that the professor had assigned us, and all of the variables were just x1, x2, x3, x4.

Speaker 1

我当时就想，这数据到底是什么？

And I was like, what is the data?

Speaker 1

告诉我啊。

Like, tell us.

Speaker 1

他却说，我想这跟声呐数据有关。

And he's like, well, I think it's something to do with sonar data.

Speaker 1

就这么多了。

And that was it.

Speaker 1

他就告诉我们这些。

That was all he told us.

Speaker 1

所以，你不需要领域知识，这正是学习的意义所在。

And so the idea that you don't need domain knowledge is sort of the point of learning.

Speaker 1

你从数据中学习哪些因素是重要的。

You learn from the data what matters.

Speaker 1

因此，通过提供混淆后的数据，我们可以规避许多潜在的人类偏见。

And so by giving out the data obfuscated, we do get around a lot of potential human biases.

Speaker 1

如果你看到我的模型想做多苹果股票，但你担心苹果会错过人工智能浪潮，或者某种人为的故事，那你可能会偏离数据的科学性来调整你的模型。

If you could see, oh, my model wants to go long Apple, but I'm worried that Apple is gonna miss the AI wave or some kind of human story, then you're gonna maybe alter your model away from the science of the data.

Speaker 1

我们不希望人们这么做，因此混淆数据能带来一个巨大的优势。

We don't want people to do that so that's one big gain we get by obfuscating the data.

Speaker 1

但另一个非常重要的原因是，我们不希望人们拿走我们的数据去建立他们自己的对冲基金。

The other really important one though is we don't want people running off with our data and building their own hedge funds.

Speaker 1

那样会非常低效，你最好把数据提交给我们，这样我们可以构建出最优秀的模型，而你可以通过质押分享收益，这一点我们稍后可以讨论。

That would be very inefficient, you'd much rather submit to us and then we can have the best possible model and you can share in the rewards through staking which we can get to.

Speaker 1

但这就是核心理念。

But that's the idea.

Speaker 1

你本来就想让数据脱敏，而脱敏并不会破坏每个特征的信号和价值。

It's like you want to have the data obfuscated anyway and obfuscation doesn't ruin the signal, the value of each of the features.

Speaker 1

在量化领域，你常常会对变量进行z分数标准化、排名归一化，或者类似的操作。

Often in quant, you might z-score a variable or rank-normalize it or something.

Speaker 1

因此，很多这类转换本来就会被采用。

And so a lot of those transformations might be done anyway.
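The two transformations Richard mentions, z-scoring and rank normalization, are easy to state concretely. The following is a minimal sketch that applies them cross-sectionally to one hypothetical feature for five stocks on a single date. The feature values are made up, and the sketch does not describe Numerai's own preprocessing. Both transforms remove the original units, but they keep the ordering of the stocks, and that ordering is what a ranking-based model relies on.

```python
import numpy as np

def zscore(x: np.ndarray) -> np.ndarray:
    """Cross-sectional z-score: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()

def rank_normalize(x: np.ndarray) -> np.ndarray:
    """Replace raw values by their ranks, rescaled to (0, 1); assumes no ties."""
    ranks = x.argsort().argsort()
    return (ranks + 0.5) / len(x)

# Hypothetical raw feature for five stocks on one date (e.g. a valuation ratio).
raw = np.array([12.0, 85.0, 3.5, 40.0, 19.0])
z = zscore(raw)
r = rank_normalize(raw)
print(z)
print(r)
```

After the transform, the units of the raw feature (a ratio, a price, a count) can no longer be read off the data. The sign and the ordering of each stock relative to the others are still there.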

Speaker 0

对于任何有计算机科学学位的听众来说，他们很可能在本科期间接触过数据匿名化的工作。

For anyone who's got a computer science degree who's listening, they'll probably have likely done some work in their undergrad about anonymizing data sets.

Speaker 0

这是一个相当复杂的问题。

And it's a pretty nontrivial problem.

Speaker 0

这种脱敏概念表面上看似乎显而易见，但在我看来，实际操作中极其具有挑战性。

And this obfuscation concept sounds obvious at face value, but seems incredibly challenging in practice to me.

Speaker 0

你们如何在破坏数据中可识别的经济结构的同时，保留对你们希望用户参与的预测建模至关重要的统计特性？

How do you balance destroying the identifiable economic structure in the data while preserving the statistical properties that ultimately matter to the predictive modeling you're trying to have your users engage in?

Speaker 1

也许这个答案不是很好，因为你得知道PCA是什么，但一个很好的例子就是PCA，它确实完全改变了底层变量。

Maybe this isn't a great answer because you have to know what PCA is, but one good example is something like PCA where that really does change the underlying variables completely.

Speaker 1

PCA的工作原理是寻找每个主成分中方差最大的维度。

The way PCA works is it's looking for the dimensions with the highest variance for each principal component.

Speaker 1

所以你知道，即使数据被高度混淆了，所有结构仍然存在。

So you know that all the structure is still there even though the data is very obfuscated.
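Richard's PCA point can be checked directly. The following is a textbook illustration, assuming synthetic data; it is not Numerai's actual obfuscation pipeline, which he says has changed over the years. When all principal components are kept, PCA is just an orthogonal rotation of the centered features. The new columns look nothing like the originals and are sorted by variance, but a linear model fitted on them explains exactly as much of the target as one fitted on the raw features. The data, coefficients and helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "raw" features for 500 stocks: three correlated signals.
n = 500
base = rng.normal(size=(n, 3))
X = base @ np.array([[1.0, 0.5, 0.0],
                     [0.0, 1.0, 0.3],
                     [0.0, 0.0, 1.0]])
# Hypothetical target that depends linearly on the raw features plus noise.
y = X @ np.array([0.4, -0.2, 0.1]) + rng.normal(scale=0.5, size=n)

def pca_transform(X: np.ndarray) -> np.ndarray:
    """Project centered data onto all principal components (ordered by variance)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by descending singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt.T

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """In-sample R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

Z = pca_transform(X)  # "obfuscated" features: same information, new axes
print(r_squared(X, y), r_squared(Z, y))  # equal up to floating-point error
```

Because the rotation is invertible, no linear structure is lost. This is the sense in which "all the structure is still there" even though the individual columns have no recognizable economic meaning.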

Speaker 1

这就是我们对数据所做的那种转换。

That's the kind of transformation we've done on the data.

Speaker 1

但这些年来我们尝试过许多不同的版本。

But we've done many different versions over the years.

Speaker 0

与其他一些试图实现类似目标的平台相比（我们稍后可能会再回到这一点），Numerai的一项关键创新

One of the key innovations that you have with Numerai, as we think about different platforms that have somewhat tried to achieve the same goals you have (and maybe we'll come back to that a little bit later),

Speaker 0

是引入了Numeraire代币以及与之配套的质押机制。

was the introduction of the Numeraire token and the staking mechanism that went along with it.

Speaker 0

除了奖励表现之外，为什么你觉得质押是必要的呢？

Why did you feel that staking was necessary beyond just rewarding performance, right?

Speaker 0

我认为就像你提到的Netflix奖一样，冠军可以获得一百万美元的奖金。

I think like with the Netflix prize that you mentioned, it was like a million dollar prize to the winner.

Speaker 0

为什么不用奖金机制，而要引入你们这种质押机制呢？

Why not just offer prizes versus this staking mechanism you introduced?

Speaker 0

在回答这个问题时，也许你可以解释一下这个质押机制是什么以及它是如何运作的。

And maybe in answering that you can explain what that staking mechanism is and how it works.

Speaker 1

事实上，Numerai在初期找到了一种方式，吸引了很多优秀的数据科学家。

It really was that Numerai found a way to get a lot of talented data scientists in the beginning.

Speaker 1

当时效果并不好，我们非常失望。

It wasn't really working and we were very disappointed.

Speaker 1

我们心想：天哪，我们这里有三百位顶尖的数据科学家，但公司内部构建的模型却比他们还要好。

It was like, wait, we have like 300 amazing data scientists here, but we can internally build a model that's better than the data scientists.

Speaker 1

这对于一家只有九个月大的公司来说，简直是灾难性的消息。

This is catastrophic news for a company that's nine months old or something.

Speaker 1

我其实很早就投资了以太坊，而当时以太坊刚刚上线，我就想我们是不是该利用一下以太坊。

I had actually invested early on in Ethereum and Ethereum had just launched and I thought we should do something with Ethereum.

Speaker 1

关键在于创建我们自己的加密货币，以便实现质押。

The trick was creating our own cryptocurrency in order to allow for staking.

Speaker 1

质押就是有风险投入。

And what staking is is it's skin in the game.

Speaker 1

这可以说是资本主义最根本的要素。

It's the most, like, fundamental element of capitalism, you might say.

Speaker 1

让人有东西可输，而不仅仅是可赢。

Make someone have something to lose, not just to gain.

Speaker 1

这会产生非常大的影响。

And that has a very big impact on this.

Speaker 1

当时的情况是，一些数据科学家可能会创建一千个账户，并在所有账户上提交结果。

So what was happening is some data scientists might make 1,000 accounts and they would submit on all of them.

Speaker 1

如果他们提交的是随机噪声，其中一个模型可能会表现良好。

And if they submit random noise, one of their models would do well.

Speaker 1

但如果你强制人们进行质押，这种攻击方式就不可能了，因为你必须选择哪一种噪声是最好的，然后才把它用于质押。

But if you force people to stake, there's no possibility of that type of attack because you'd have to choose which noise is the best and then that's the one you'd stake on.
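The attack Richard describes is a multiple-comparisons problem. The following is a minimal simulation under hypothetical sizes (2,000 stocks, 1,000 throwaway accounts), scoring each submission by its Pearson correlation with a noise-like target. This is a simplification; the sketch does not use Numerai's actual scoring metric. If you submit enough random-noise models, one of them will rank near the top by pure chance. A stake has to be placed before results are known, so the attacker must commit to one submission in advance. The expected score of that one submission is the same as any single random draw, roughly zero.

```python
import numpy as np

rng = np.random.default_rng(42)

n_stocks = 2000      # hypothetical number of stocks scored in one round
n_accounts = 1000    # hypothetical number of throwaway accounts

target = rng.normal(size=n_stocks)                        # realised, noise-like target
submissions = rng.normal(size=(n_accounts, n_stocks))     # pure random-noise "models"

def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

scores = np.array([corr(s, target) for s in submissions])

# Without staking: the attacker only needs one lucky account to top the leaderboard.
best_after_the_fact = scores.max()

# With staking: one submission must be chosen before the target is known,
# so its score is just one random draw, centred on zero.
committed = scores[0]

print(f"best of {n_accounts} noise models: {best_after_the_fact:.3f}")
print(f"pre-committed noise model:       {committed:.3f}")
print(f"average noise model:             {scores.mean():.3f}")
```

The maximum of many noise scores grows with the number of accounts. The expected score of a pre-committed choice does not, which is why requiring a stake removes the payoff from spamming accounts.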

Speaker 1

因此，这基本上杜绝了这种操纵Numerai的方式。

So that ended up basically eliminating that kind of way of rigging Numerai.

Speaker 1

我们早在2016年、2017年和2018年就做了这件事。

We did that a long time ago, back in like 2016, 2017, and 2018.

Speaker 1

这极大地提升了性能，因为这意味着你必须对自己的模型有十足的信心。

And that was like a very large lift in the performance because it sort of meant, well, you have to really have a lot of confidence in your model.

Speaker 1

请记住，数据科学中最大的问题就是过拟合。

And remember, the biggest problem in data science is overfitting.

Speaker 1

量化交易等领域最受批评的一点就是，你只是在过度拟合过去的数据。

And the biggest criticism of quant and so on is you're just gonna overfit the past.

Speaker 1

但当你质押时，你押注的是你未来的预期表现，也就是实际运行的表现。

But when you stake, you're staking on your prospective performance, what's in the future, your live performance.

Speaker 1

因此，这让你对过拟合变得极其谨慎，因为你清楚，仅仅在测试集或验证集上表现良好并不会带来回报。

So it makes you extremely cautious about overfitting because you know there's no reward for overfitting, for just doing well on a test set or a validation set.

Speaker 1

你必须在实际数据集上表现优异。

You've gotta do well on the live set.
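The gap between backtest performance and live performance that Richard describes can be shown with a small experiment. The following is a minimal sketch with hypothetical sizes: 150 features, only 200 training rows, and features and target that are pure noise. Ordinary least squares "finds" a strong in-sample relationship, but its predictions on fresh data, standing in for what Richard calls the live set, are uncorrelated with the outcome. It does not reproduce Numerai's actual train/validation/live split.

```python
import numpy as np

rng = np.random.default_rng(7)

n_train, n_live, n_features = 200, 200, 150   # hypothetical sizes: many features, few rows

# Features and target are independent noise: there is no real signal to find.
X_train = rng.normal(size=(n_train, n_features))
y_train = rng.normal(size=n_train)
X_live = rng.normal(size=(n_live, n_features))
y_live = rng.normal(size=n_live)

# Ordinary least squares fitted on the training data only.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

in_sample = corr(X_train @ coef, y_train)   # looks impressive
live = corr(X_live @ coef, y_live)          # what a stake would actually be paid on
print(f"in-sample corr: {in_sample:.2f}, live corr: {live:.2f}")
```

Paying only on the second number takes away the reward for chasing the first one. That is the incentive Richard says pushes stakers toward caution about overfitting.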

Speaker 1

这些举措真正将Numerai推向了一个全新的阶段。

Those things really pushed Numerai into a new place.

Speaker 1

它曾经是个自由发挥的领域，但后来变得非常困难。

It used to be kind of a free for all and then it was like, oh boy, this is very difficult.

Speaker 1

因此，我们打造了自认为最难的数据科学竞赛。

And so we made what we think is the hardest data science competition.

Speaker 0

你能简单介绍一下实际的质押机制是如何运作的吗？特别是关于收益分配的部分？

Can you walk us a little bit through how the actual staking mechanisms work, particularly with respect to the payouts?

Speaker 0

我相信你们使用了某种残差相关性结合中性化约束来确定参与者从其加权模型中获得的最终收益。

And I believe you used sort of residual correlations with neutrality constraints to determine the ultimate payouts that participants receive from their stake weighted model.

Speaker 0

在这一讨论中，我希望你能稍微解释一下，为什么你们选择这种设计而非其他方案？

And with that discussion, I'm hoping maybe you can tie in a little bit of why you chose that particular design over others, right?

Speaker 0

根据模型在系统中的表现，有无数种方式可以将质押与收益挂钩。

There's so many ways in which you could correlate the stake to payout depending on how the model participates with the rest of the system.

Speaker 0

你们是如何想到采用这种残差相关性模型的？

How did you come upon this residual correlation model?

Speaker 1

我们确实设定了若干个我们称之为目标的东西。

We did have a number of what we call targets.

Speaker 1

我们希望Numerai社区建模的是什么？

What do we want the Numerai community to model?

Speaker 1

在早期，我们的想法是：从收益中剔除一些关键因素，因为如果你保留这些因素，人们提交的就只是因子押注，而这些其实是风险押注。

And in the early days, it was, well, let's take some of the key factors out of the returns, because if you leave those factors in, you'll just get factor bets being submitted, which are really risk bets.

Speaker 1

没有理由相信，风险押注在长期来看会是一个好主意或带来显著的超额收益。

There's no reason to believe that a risk bet will be in the long run a good idea or significant alpha.

Speaker 1

所以我们只是想，我们的目标到底是什么？

So we just thought, well, what's our game?

Speaker 1

我们是想追求最纯粹的超额收益吗？如果是，那我们就无所谓把所有东西都中性化。

Are we trying to make the purest alpha you can and so we don't mind neutralizing everything.

Speaker 1

我们会中性化风险模型（比如Barra风险模型）中的所有因子，但我们的做法远不止于此。

We neutralize everything in, say, a risk model, like a Barra risk model, but we go far beyond that too.

Speaker 1

这种做法的好处是，能够聚焦整个Numerai社区。

That has this benefit of focusing the entire Numerai community.

Speaker 1

押注动量并因此获利并不会得到任何奖励。

There's no reward for betting on momentum and then momentum goes up and you make money.

Speaker 1

对此没有任何奖励。

There's zero reward for that.

Speaker 1

唯一的奖励是创造最高质量、持续稳定且与所有风险因子正交的阿尔法收益。

There's only reward for making the most high quality consistent alpha that is orthogonal to all risk factors.
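One common way to make a signal "orthogonal to all risk factors" is linear neutralization. You regress the predictions on a matrix of factor exposures and keep only the residual. The following is a minimal sketch of that idea. The factor names, exposures and signal strengths are hypothetical, and this is not Numerai's actual target construction or risk model. A mostly-momentum prediction loses all of its momentum and value exposure after neutralization, and what remains is driven by the stock-specific component.

```python
import numpy as np

def neutralize(preds: np.ndarray, exposures: np.ndarray) -> np.ndarray:
    """Remove the part of `preds` that is a linear combination of factor exposures.

    preds:     (n_stocks,) raw model predictions for one date
    exposures: (n_stocks, n_factors) factor loadings, e.g. momentum, value
    Returns the least-squares residual, which is orthogonal to every exposure column.
    """
    coef, *_ = np.linalg.lstsq(exposures, preds, rcond=None)
    return preds - exposures @ coef

rng = np.random.default_rng(1)
n = 1000
momentum = rng.normal(size=n)          # hypothetical factor exposures
value = rng.normal(size=n)
exposures = np.column_stack([np.ones(n), momentum, value])  # intercept + two factors
idiosyncratic = rng.normal(size=n)     # hypothetical stock-specific signal

preds = 2.0 * momentum + 0.5 * idiosyncratic   # a prediction that is mostly a factor bet
neutral = neutralize(preds, exposures)

print(np.abs(exposures.T @ neutral).max())        # ~0: no remaining factor exposure
print(np.corrcoef(neutral, idiosyncratic)[0, 1])  # what is left is the stock-specific part
```

Including a column of ones also demeans the result. Once this is done, a pure factor bet such as "momentum keeps going up" has nothing left to be paid on.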

Speaker 1

这一点在我们的长期表现中得到了充分体现。

And that really shows over time in our performance.

Speaker 1

你可以对我们的表现进行归因分析，但不会发现那些常见的因子。

You can do attribution on our performance and you won't find those normal factors.

Speaker 0

你能谈谈你们最初是如何向用户分配Numeraire（NMR）代币的吗？

Can you talk a little bit about how you bootstrapped the initial allocation of the Numeraire token to users?

Speaker 0

在最初的时候，没有人持有NMR。

Day zero, nobody has Numeraire.

Speaker 0

而他们需要NMR才能进行质押。

They need Numeraire to stake.

Speaker 0

最初是如何分发的？或者用户是如何获得这些代币以开始参与平台的？

How was it initially distributed or how was it acquired by those users to start participating in the platform?

Speaker 1

这要追溯到2017年我们当时的做法。

So this is all the way back in 2017 that we did this.

Speaker 1

我们的许多用户甚至不知道以太坊是什么，也不知道加密钱包是什么。

So many of our users didn't even know what Ethereum was or what a crypto wallet was or anything.

Speaker 1

我们告诉他们：‘你的账户里有一些新的NMR代币。’

And we were like, well, you have this in your account, some new NMR coins.

Speaker 1

对他们中的一些人来说，他们并不清楚这些代币是做什么用的，但最终我们决定不采取所谓的ICO方式，即向大量投资者出售代币。

And they didn't really know what it was for some of them, but it ended up being that the decision we made was we didn't want to do what's known as an ICO where you sell the token to a lot of investors.

Speaker 1

我们希望NMR代币由Numerai的用户持有，因为它的用途是参与Numerai的竞赛质押。

We wanted the NMR to be held by the Numerai users because its purpose is to stake on Numerai's tournaments.

Speaker 1

所以我们认为这是最好的做法。

And so we decided that that was the best thing to do.

Speaker 1

于是，我们向社区免费发放了大约一百万枚NMR代币。

So we gave about a million NMRs away to our community.

Speaker 1

随着时间的推移，我认为公司只保留了300万枚NMR。

And over time, I think we only own 3,000,000 NMRs, the company.

Speaker 1

其余的都流通在外了，总共有1100万枚，所以有800万枚在外部流通。

The rest of them are kind of out in the wild and there's 11,000,000 total, so 8,000,000 out in the wild.

Speaker 1

我认为这是一个非常正确的决定，因为我们希望这个代币能够被真正使用。过去十年里，许多加密项目虽然一度价值很高，但最终没能生存下来。

And that was a really good decision, I think, because we wanted to have this token be used, and there are many, many crypto projects that didn't survive very well over the past ten years, even though at one point they were very valuable.

Speaker 1

所以我们更注重长期发展，对此我们很满意。

So we try to be more long term and we're fine with that.

Speaker 0

在这种质押机制中，一部分是让用户根据他们对模型的信念进行质押，从而形成经济反馈循环。

In that staking methodology, part of this is having users stake with their convictions in their models and giving that economic feedback loop.

Speaker 0

但另一部分则在某种程度上起到了对抗恶意用户的作用。

But another part of this is a protection in a way against adversarial users.

Speaker 0

质押机制是如何为整个系统提供这种保护的？

How does staking provide that protection to the system at large?

Speaker 0

你们如何防止有人大量购买NMR，进行质押，然后故意构建一个糟糕的模型来破坏你们的信誉？

How do you prevent someone from just buying a large amount of Numeraire, staking it, building a purposely bad model and destroying your track record?

Speaker 1

你可以这么做。

You could do that.

Speaker 1

但你会损失比我们多得多的钱。

It's just that you would lose a lot more money than we would.

Speaker 1

当你质押并表现不佳时，这种情况发生得非常快。

When you stake and you do badly, it happens very quickly.

Speaker 1

我的意思是，如果你连续六个月表现糟糕，你可能会输掉全部质押金。

I mean, if you have a bad six months of performance, you can lose your whole stake.
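The following sketch shows how a stake can erode over a sustained losing streak, using a purely hypothetical payout rule. Each round pays out a clipped fraction of the current stake: positive scores add to it and negative scores burn part of it. The function names, the `multiplier` of 1.0, the 5% `cap` and the 120-round streak are all illustrative assumptions, not Numerai's published parameters. The point is only that repeated capped losses compound quickly.

```python
import numpy as np

def round_payout(stake: float, score: float,
                 multiplier: float = 1.0, cap: float = 0.05) -> float:
    """Hypothetical payout for one round: a clipped fraction of the stake.

    A positive score earns part of the stake, a negative one burns part of it.
    `multiplier` and `cap` are illustrative values, not Numerai's parameters.
    """
    return stake * float(np.clip(multiplier * score, -cap, cap))

def run_rounds(stake: float, scores) -> list:
    """Apply payouts round by round; the stake compounds up or down."""
    history = [stake]
    for s in scores:
        stake += round_payout(stake, s)
        history.append(stake)
    return history

# A hypothetical losing streak: 120 consecutive rounds, each scoring at the loss cap.
history = run_rounds(1000.0, [-0.05] * 120)
print(f"stake after streak: {history[-1]:.2f}")
```

The cap limits the damage from any single round. A model that is consistently wrong still loses most of its stake, which is the cost an adversarial staker would bear.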

Speaker 1

有一些Numerai用户会经历表现持续下滑的阶段，但这是很自然的现象。

There are some Numerai users that go through these patches of just, you know, their performance erodes, but that's kind of natural.

Speaker 1

你可以把它想象成运营一个多元策略基金，总有一些策略在挣扎。

You could think about it as if you're running a multi strategy fund, there's always some strategies that are struggling.

Speaker 1

我们已经能够应对这种情况。

We're already able to handle that.

Speaker 1

所以，如果有任何敌对用户，当然。

So if there are any adversarial users, sure.

Speaker 1

你可以去购买大量我们的加密货币，然后在Numerai上销毁它们，但笑到最后的是我们。

You can go and buy a whole bunch of our cryptocurrency and get it burned on Numerai, but the joke's on you.

Speaker 1

所以我认为没人会这么做。

And so I don't think anyone's doing that.

Speaker 1

另外，你可能会发现，要构建一个糟糕的模型其实很难，因为这和构建一个优秀的模型是一回事。

And also you might find it interesting to note that it's actually hard to make a model that's bad because that's the same as making a model that's good.

Speaker 1

要创建一个具有持续负阿尔法的模型实际上非常困难。

It's actually quite hard to make a model with extremely consistent negative alpha.

Speaker 0

是的。

Yeah.

Speaker 0

只要在前面加个负号就行了。

Just throw a minus sign in front of it.

Speaker 0

对吧？

Right?

Speaker 0

一瞬间，它就成了一个出色的模型。

And all of a sudden, it's a great model.

Speaker 1

没错。

Exactly.
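The exchange above rests on a simple identity. Negating a prediction vector flips the sign of its correlation with the target but leaves the magnitude unchanged: corr(-p, y) = -corr(p, y). So a model with reliably negative correlation carries exactly as much information as a reliably positive one. The following is a minimal check on hypothetical synthetic data.

```python
import numpy as np

rng = np.random.default_rng(3)
target = rng.normal(size=500)
# Hypothetical "reliably bad" model: negatively correlated with the target.
bad_preds = -0.3 * target + rng.normal(size=500)

def corr(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

print(corr(bad_preds, target))    # clearly negative
print(corr(-bad_preds, target))   # same magnitude, now positive
```

This is why building a consistently bad model is as hard as building a consistently good one.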

Speaker 0

你之前提到，质押机制可以作为一种方式，让参与者真正投入经济利益，从而促使他们做出更理性的决策，尤其是在表达自身信念的坚定程度上。

You were talking about this idea of the staking mechanism being a way to create economic skin in the game and maybe create more rational decision making among participants, particularly around the strength of their convictions.

Speaker 0

但我也觉得，这种方式可能会继承人们把美元换成赌场筹码时所伴随的许多相同行为偏差。

But it also strikes me that in a way you might inherit a lot of the same behavioral biases that go along with when people turn dollars into casino chips.

Speaker 0

突然之间，这些赌场筹码的分量就不如真正的美元了；当手里的筹码变多时，他们会觉得像是在用赌场的钱玩；而当筹码减少时，他们可能会情绪失控，甘愿把最后的筹码全部赌掉，因为他们并不觉得这些筹码和真正的美元等值。

All of a sudden those casino chips don't carry the same weight as having a dollar and when they're up, they start feeling like they're playing with house money And when they're down, they might go on tilt and just be willing to gamble away the last of those casino chips because they're not really seen as being the same as dollars.

Speaker 0

我想知道，当人们使用NMR代币而不是像USDC或USDT这样的美元稳定币进行质押时，你是否也观察到类似的现象？

Curious whether you see that when people start using Numeraire tokens instead of, say, staking with USDC or USDT, like a dollar stablecoin.

Speaker 1

可能会有一些这种情况，但我认为这并不是什么大问题。

There might be some of that, but I don't think it's such a big deal.

Speaker 1

典型的优质用户已经在Numerai上活跃了五到七年。

The average good user has been on Numerai for five or seven years.

Speaker 1

我们讨论的不是那种进来随便猜几只股票的人，完全不是这么回事。

So we're not talking about people who come in to try to guess on a few stocks or something; it's not like that at all.

Speaker 1

整个问题都是以一种非常科学的方式来界定的。

The whole problem is framed in this very scientific way.

Speaker 1

想要开始使用Numerai，甚至面临很多门槛。

There are a lot of barriers to entry to even get started on Numerai.

Speaker 1

这么说吧，这里并不是一个适合赌博的地方。

It's not a fun place to gamble, put it that way.

Speaker 1

如果你仔细想想，通过从目标中剔除所有因子收益，它本质上是一个非常像白噪声的信号。

If you think about it, by taking all of the factor returns out of the target, it's a very white noise kind of signal.

Speaker 1

你想想，过去两年增长因子一路飙升，你会想大量押注它。

You think about something like the growth factor has been on a tear for the last two years and I'm gonna bet a lot on it.

Speaker 1

但在Numerai上，这种类型的押注是不可能实现的。

Well, that type of bet can't be done on Numerai.

Speaker 1

这里完全没有这种行为。

There's none of that type of behavior.

Speaker 1

所以我认为这确实是一个科学家的社区。

So I do think it's really a community of scientists.

Speaker 1

我甚至想反驳你之前提到的众包这个概念。

I would even push back on this idea of crowdsourcing, which you mentioned earlier.

Speaker 1

你会把开源描述为众包吗？

Would you describe open source as crowdsourcing?

Speaker 1

比如你有一个在开源中被广泛使用的关键软件包。

Like if you have a key package that's used in open source.

Speaker 1

理论上每个人都可以贡献，但实际上核心团队只占用户中的1%，而且这些人极其优秀。

You could, because everybody could contribute, but actually there's a core group, the 1% of users who are extremely good.

Speaker 1

在某种程度上，我们并不真正相信群体的智慧。

We don't really believe in the wisdom of crowds in some way.

Speaker 1

如果我走上街头，向人们询问100个股票建议，我并不认为这会产生超额收益。

If I go on the street and ask people for 100 stock tips, I don't actually think that will have alpha.

Speaker 1

所以我们并不这么做。

So we're not doing that.

Speaker 1

我们只是一个科学共同体，以最严格的意义上构建超额收益。

We're just a scientific community that is building alpha in the strictest sense of the word.

Speaker 0

毕竟，人们不能去当地的杂货店用Numeraire购买食品杂货。

Now, at the end of the day, people can't go to the local grocery store and use Numeraire to buy groceries.

Speaker 0

他们必须把它兑换回法币。

They have to convert it back to fiat.

Speaker 0

正如我们提到的，Numeraire并不是稳定币，对吧？

And as we mentioned, Numeraire is not a stablecoin, right?

Speaker 0

它在法币中的价值会上下波动。

Its value in fiat fluctuates up and down.

Speaker 0

我想知道你如何看待驱动Numeraire自身价值的供需动态，以及Numeraire价格随时间的波动（无论是上涨过快还是下跌过快）如何影响新老用户的行为、他们对质押的看法，以及价格波动如何影响他们在任何给定时间的质押意愿。

I'm curious how you think about the supply and demand dynamics that drive the value of Numeraire itself, and how fluctuations in Numeraire's price over time, whether it goes up too quickly or down too quickly, affect the actions of new and old users: how they think about staking, and how price volatility affects their willingness to stake at any given time.

Speaker 0

这些似乎都是构建一个稳定模型市场的重要方面。

Those all seem like very important aspects of making a stable model marketplace.

Speaker 1

是的。

Yes.

Speaker 1

确实如此。

That's true.

Speaker 1

首先，它叫 NMR，Numeraire。

First of all, it's called NMR, Numeraire.

Speaker 1

所以它被称为 Numeraire 的 NMR 代币。

So it's called the NMR token for numeraire.

Speaker 1

但问题是，它波动很大。

But the thing is it's volatile.

Speaker 1

实际上非常波动。

Very volatile, actually.

Speaker 1

它的交易方式有点像比特币。

It sort of trades like Bitcoin.

Speaker 1

对于一些新加入的用户来说，他们会感觉：天啊，我得在 Coinbase 上买这个币，还要承担波动风险。

For some new users joining, there's the feeling of, gosh, I have to buy this coin on Coinbase and take on the volatility.

Speaker 1

即使我的模型很好，也可能因为波动而亏钱。

And even if my model's good, I might lose on just the volatility.

Speaker 1

但对我们的许多用户来说，这并不是什么大问题。

But that hasn't been such a big deal for many of our users.

Speaker 1

多年来，Numeraire的价格一直徘徊在10到20美元之间。

Numeraire has been between like $10 and $20 for years.

Speaker 1

当它上涨时，通常是因为加密货币领域的一些潮流原因。

When it goes up, it's often for like some fashion reason in crypto.

Speaker 1

这并没有让很多人却步。

It hasn't dissuaded that many people.

Speaker 1

此外，当NMR刚推出时，根本没有任何稳定币存在。

Also, when NMR started, there weren't any stablecoins at all.

Speaker 1

当我们销毁NMR时，这是在区块链上执行的一种销毁代币的操作。

When we burn NMR, it's an operation on the blockchain that destroys the token.

Speaker 1

所以你不会想销毁稳定币。

So you wouldn't wanna burn stablecoins.

Speaker 1

但在我们最近的NumerCon大会上，我们首次宣布将允许在Numerai上进行稳定币质押。

But actually at NumerCon, our recent conference, we announced for the first time that we're going to enable stablecoin staking on Numerai.

Speaker 1

这将为新用户提供一种入门方式，让他们能够以一种比Numeraire波动性低得多的代币开始质押。

This will be a way for new users to get onboarded and start staking in a coin that has much lower volatility than Numeraire.

Speaker 1

但我认为从长远来看，这将是一个入门步骤。

But I think in the long run, that will be a sort of onboarding step.

Speaker 1

你通过质押稳定币赚的钱，不会像质押NMR那样多。

You won't really be able to make as much money staking the stable coin as you will staking NMR.

Speaker 0

你能稍微讲一下吗？我们之前略过了这一点，但我希望确保我们能覆盖到。

Can you talk a little bit about something we glossed over, since I want to make sure we cover it:

Speaker 0

质押的奖励机制到底是如何运作的？

how the reward system for staking actually works?

Speaker 0

所以有人构建了一个模型，进入一个新的周期提交，他们质押了一定数量的Numeraire。

So someone builds a model, there's a new epoch of submissions, and they stake a certain amount of Numeraire.

Speaker 0

收益发放机制是如何运作的？

How does the payout system work?

Speaker 1

是的。

Yeah.

Speaker 1

你上传到Numerai的是你的信号，这是一个包含6000个预测的CSV文件。

So what you're uploading to Numerai is your signal, which is a CSV file with 6,000 predictions.

Speaker 1

这6000个预测覆盖了全球股票市场，共6000只股票。

Those 6,000 predictions are across the global equity universe, 6,000 stocks.

Speaker 1

你不知道这些股票具体是哪些，因为信息已被混淆，但这就是你提交的内容。

You don't know what the stocks are because it's obfuscated, but that's what you're submitting.

Speaker 1

因此，你押注的是这个信号会表现良好。

And so what you're staking on is that this is going to be a good signal.

Speaker 1

它将与这些股票未来二十天的残差收益相关联。

It will have correlation with the subsequent twenty day residual returns of those stocks.

Speaker 1

这是第一部分，我们称之为核心部分。

So that's the first part, which is what we call core.

Speaker 1

你的信号是否与未来的股价存在相关性？

Do you have correlation with these future prices?
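
Scoring a submission against the subsequent twenty-day residual returns can be illustrated with a rank correlation. The sketch below assumes the score is the Pearson correlation between percentile-ranked predictions and the residual-return target; `rank_pct`, `corr_score`, and the simulated data are illustrative stand-ins, and Numerai's official scoring may apply further transformations.

```python
import numpy as np

def rank_pct(x: np.ndarray) -> np.ndarray:
    """Percentile ranks in (0, 1]; ties broken by position (fine for a sketch)."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return ranks / len(x)

def corr_score(predictions: np.ndarray, residual_returns: np.ndarray) -> float:
    """Pearson correlation of rank-transformed predictions with realized residual returns."""
    return float(np.corrcoef(rank_pct(predictions), residual_returns)[0, 1])

rng = np.random.default_rng(42)
n_stocks = 6_000
residual_returns = rng.normal(size=n_stocks)                      # stand-in for 20-day residual returns
good = residual_returns + rng.normal(scale=3.0, size=n_stocks)    # weak but real signal
noise = rng.normal(size=n_stocks)                                 # no signal at all
print(corr_score(good, residual_returns), corr_score(noise, residual_returns))
```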

Speaker 1

但另一个更重要的是，你的信号是否在元模型之外还具备正交的阿尔法收益？

But the other one which is actually more important is do you have orthogonal alpha over and above the meta model?

Speaker 1

Numerai已经拥有一个非常出色的元模型。

Numerai already has the meta model, and it's very good.

Speaker 1

它是Numerai上最优秀的用户之一，经常是最顶尖的用户。

It's one of the best users on Numerai, often the best user.

Speaker 1

这只是所有用户表现的平均值。

It's just the average of all the users.

Speaker 1

要对Numerai有帮助，你实际上必须在平均值之上有所贡献。

To be helpful to Numerai, you actually have to be additive to that.

Speaker 1

因此，你必须提供我们已有的alpha之外的正交alpha。

So you have to have orthogonal alpha to the alpha we already have.

Speaker 1

想想这有多难。

So think how hard that is.

Speaker 1

我们已经有了来自Numerai所有用户的大量alpha，而你必须以某种方式在此基础上有所增益。

It's like we already have all this alpha, but it's from all the people on Numerai and you have to somehow be additive to that.

Speaker 1

这就是我们所说的MMC，即元模型贡献。

And that's what we call MMC, Meta Model Contribution.
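
The idea of orthogonal alpha over and above the meta model can be sketched by neutralizing a user's predictions against the meta model and then measuring how much of what remains lines up with the target. This is a simplified, hypothetical version of the concept (`neutralize`, `mmc_like`, and the simulated users are illustrative); Numerai's published MMC definition may differ in details such as how predictions are transformed first.

```python
import numpy as np

def standardize(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / x.std()

def neutralize(v: np.ndarray, against: np.ndarray) -> np.ndarray:
    """Residual of v after regressing it on `against` plus a constant."""
    A = np.column_stack([against, np.ones_like(against)])
    beta, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ beta

def mmc_like(user_preds, meta_preds, target) -> float:
    """Covariance between the target and the part of the user's signal
    that the meta model does not already contain."""
    unique_part = neutralize(standardize(user_preds), standardize(meta_preds))
    return float(np.mean(unique_part * (target - target.mean())))

rng = np.random.default_rng(7)
n = 6_000
shared, novel = rng.normal(size=n), rng.normal(size=n)
target = shared + novel                  # returns driven by two sources of information
meta = shared + rng.normal(size=n)       # meta model only knows about `shared`
copycat = meta.copy()                    # a user who just mimics the meta model
additive = meta + novel                  # a user who also knows about `novel`
print(mmc_like(copycat, meta, target), mmc_like(additive, meta, target))
```

The copycat user scores essentially zero even though its raw predictions correlate with the target, because everything it knows is already in the meta model; only the additive user earns MMC.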

Speaker 1

你对系统有增益吗？

Are you additive to the system?

Speaker 1

MMC是你的质押与你的表现之间最紧密的关联。

MMC is the strongest link between your stake and your performance.

Speaker 1

因此，如果你能预测价格并为Numerai提供正交的阿尔法，你的质押就会增长。

So your stake will grow if you can predict the prices and if you have orthogonal alpha to add to Numerai.

Speaker 1

所以这两点都是必需的。

So both of those things are required.

Speaker 1

是的，你可能会看到，比如一个月后，你的质押上涨了5%，因为你有核心贡献和一些MMC。

And so yeah, you might see, you know, after one month, your stake is up 5% because you had some core and some MMC.

Speaker 1

或者如果你的表现为负，你的质押就会下降5%，部分质押会被销毁。

Or if you had negative, it'll be down 5% and some of your stake will get burned.
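
The stake-and-burn mechanics can be sketched as a per-round payout proportional to stake. The formula below is an assumption for illustration: `round_payout` is a hypothetical helper, and the `corr_weight`, `mmc_weight`, `payout_factor`, and `cap` values are hypothetical parameters (the 5% cap simply mirrors the figures quoted above), not Numerai's confirmed settings.

```python
def round_payout(stake, corr, mmc, payout_factor=1.0,
                 corr_weight=0.5, mmc_weight=2.0, cap=0.05):
    """NMR gained (positive) or burned (negative) for one scored round.

    All weights and the cap are hypothetical illustration values.
    """
    raw = payout_factor * (corr_weight * corr + mmc_weight * mmc)
    clipped = max(-cap, min(cap, raw))   # limit the move to +/- cap of the stake
    return stake * clipped

stake = 1_000.0
print(round_payout(stake, corr=0.02, mmc=0.01))    # positive CORR and MMC: stake grows
print(round_payout(stake, corr=-0.05, mmc=-0.05))  # negative scores: part of the stake is burned
```

Weighting MMC more heavily than CORR in a sketch like this is what makes being additive, not just correct, the main driver of stake growth.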

Speaker 1

所以一旦你理解了，其实挺简单的。

So it's kind of simple once you get into it.

Speaker 0

你觉得这个浮动奖励机制是如何平衡系统中的模型数量、参与者数量以及最终的风险资本的？

How do you think that that floating reward system balances the number of models and participants and ultimately capital at risk within the system?

Speaker 1

关键是确保那些贡献阿尔法的优秀用户能获得丰厚回报，而表现差的用户会主动退出。

Well, the main thing is to make sure great users who are adding alpha are earning a lot, and that bad users are even quitting.

Speaker 1

如果他们无法为Numerai做出贡献，可以不押注提交，纯粹娱乐，但我们不希望他们在对自己的表现没有信心时进行押注。

If they can't add to Numerai, they can submit without staking and play for fun but we don't want them staking if they don't believe in their prospective performance.

Speaker 1

我们也在努力高效地配置人力资本。

We're trying to actually efficiently allocate human capital too.

Speaker 1

如果你在Numerai上表现不佳，我们并不认为全世界每个人都应该去学习量化金融之类的东西。

Like if you are bad at Numerai, we don't believe there's some reason for everybody in the world to learn about quant or something like that.

Speaker 1

我们实际上认为这是人才的浪费。

We actually think it's a waste of talent.

Speaker 1

我个人认为金融行业的人力资本过剩了。

I personally believe there's actually way too much human capital in the finance industry.

Speaker 1

如果我们能给你一个信号：你在数据科学方面还不错，但你并没有为他人带来额外价值，于是你退出了？

If we can give you a signal, yes, you're pretty good at data science, but you're not additive to what other people are doing and you quit?

Speaker 1

这太好了。

That's great.

Speaker 1

这真是让我欣喜若狂。

Music to my ears.

Speaker 1

你决定将你的精力投入到经济的其他领域。

You decided your efforts in the economy should go elsewhere.

Speaker 1

在一个充满投机交易的世界里，仅仅疯狂地盯着该买什么，我认为这是对你生命的极大浪费。

In a world where there's so much degen trading, I think it's a really bad use of your life to just be frenetically looking at what to buy and so on.

Speaker 1

来到Numerai，我能让你感受到，天啊，真正做好这件事实际上极其困难。

Coming to Numerai, I can give you the sense of, gosh, this is actually incredibly difficult to do properly.

Speaker 1

它汇聚了能够做到这一点的优秀人才，同时也节省了其他人的宝贵时间。

That collects the great talent that can do it, but also saves time of other people.

Speaker 0

Numeraire是否需要维持一定的价格水平，或者更准确地说，维持一定的市值，才能让激励机制正常运作？

Does Numeraire have to maintain a certain price level, or, maybe better stated, market cap, for the incentives to work properly?

Speaker 0

这里有一个让我想到的问题：假设有一个人对元模型的独特阿尔法价值贡献巨大，而其他许多人贡献甚微，那些贡献小的人的押注被烧毁，而贡献大的人则押注增长，但他们最终实际上垄断了市场，却无法在不摧毁自己Numeraire价值的情况下套现。

One of the things that comes to mind here: let's say you have someone who is incredibly value-additive to the unique alpha of the meta model, and a whole bunch of participants who are not. The participants who are not see their stakes burned, and the one who is sees their stake grow, but ultimately they effectively corner the market, and there's no way for them to sell out without destroying the value of their own Numeraire.

Speaker 0

这里存在一种有趣的供需关系：你需要不断有新人加入游戏，而赢家要想套现或真正体会到自己Numeraire的真实价值，就需要一个流动性良好的市场来承接他们的卖出。

There's this interesting supply-and-demand requirement: you need people constantly coming into the game, because for the winners to cash out, or actually realize the true value of their Numeraire, they need a nice liquid market that they can sell into.

Speaker 0

因此，这里的平衡至关重要。

So the balance here really matters.

Speaker 0

你如何看待这个问题？

How do you think about that?

Speaker 1

它非常有流动性。

It's very liquid.

Speaker 1

当我向资金配置方介绍这个项目时，他们经常问到Numeraire。

When I'm pitching this to allocators, they often ask about Numeraire.

Speaker 1

它的流动性比大多数股票还要高。

It has more liquidity than most stocks.

Speaker 1

它在很多不同的交易平台上有非常高的交易量，其中一个原因是它是最早在以太坊上发行的代币之一。

It is very, very highly traded and on a lot of different venues and one of the reasons is it's one of the first coins that was made on Ethereum.

Speaker 1

它有数以万计的持有者等等。

There's tens of thousands of holders and so on.

Speaker 1

所以它实际上是一个相当有流动性的市场。

So it actually is quite a liquid market.

Speaker 1

而且，如果你在Numerai中表现优异，即使NMR在你参与期间下跌了20%或30%，你的收益也会远远超过这部分损失：因为我们是根据你把我们的业绩推高了多少来支付报酬的，你每年的收益可能达到百分之几百。

And then I think if you're good at Numerai, your earnings are so high that they outweigh even a 20% or 30% fall in NMR while you were playing: your earnings per year can be hundreds of percent, because we're paying you based on how much you're pushing our track record to the maximum.
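
Earnings measured in NMR and moves in the NMR price compound multiplicatively rather than adding up. The numbers below are hypothetical (a +200% stake growth is used only to illustrate "hundreds of percent"), and `usd_return` and `breakeven_price_drop` are illustrative helpers.

```python
def usd_return(stake_growth_nmr: float, nmr_price_change: float) -> float:
    """Combine growth in the number of NMR staked with the NMR/USD price move.

    Both arguments are simple returns: 2.0 means +200%, -0.3 means -30%.
    """
    return (1 + stake_growth_nmr) * (1 + nmr_price_change) - 1

def breakeven_price_drop(stake_growth_nmr: float) -> float:
    """Largest NMR price fall that still leaves the staker flat in USD terms."""
    return 1 / (1 + stake_growth_nmr) - 1

print(usd_return(2.0, -0.30))       # +200% in NMR terms while NMR falls 30%
print(breakeven_price_drop(2.0))    # about -0.667
```

Under these hypothetical figures the staker still ends up roughly +110% in dollar terms, and NMR would have to fall about two thirds before the gain disappeared.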

Speaker 0

好的。

Alright.

Speaker 0

所以我们花了大量时间讨论这个质押系统，但Numerai并不是第一个也不是唯一的众包对冲基金平台。

So we spent a lot of time talking about this staking system, but Numerai is not the first or only crowdsourced hedge fund platform.

Speaker 0

我知道，我们在这里要用引号来强调‘众包’这个词。

And I know, you know, we're gonna use "crowdsourced" in air quotes here.

Speaker 0

Quantopian是另一个非常知名的平台，但它已经兴起又消失了。

Quantopian was another very well known platform that has come and gone.

Speaker 0

它没有像Numerai那样取得成功。

It did not succeed where Numerai has been successful.

Speaker 0

你认为为什么Numerai成功了，而其他平台却没有？

Why do you think Numerai survived where others didn't?

Speaker 1

嗯，我其实也想说，Millennium也是一个众包对冲基金，而且它确实成功了。

Well, I would also actually say, you know, Millennium is a crowdsourced hedge fund and they did succeed.

Speaker 1

Quantopian有一些非常不同的地方。

There was something quite different about Quantopian.

Speaker 1

我认为当时时机太早了，因为那时还没有区块链和机器学习。

I think it was a little bit too early because it was pre blockchain and pre machine learning.

Speaker 1

所以在Quantopian上构建的模型就像是这些手工制定的规则，比如我会买入，我会在夜间做多科技股，然后在日本这么做之类的，本质上你只是在编写手写算法，而这种算法已经不再符合现在对算法的定义了。

And so the models you would build on Quantopian were these hand-crafted rules, like I'm gonna go long tech overnight and then do this in Japan or whatever. You were basically writing a hand-coded algorithm, which is not even what people mean by "algorithm" anymore.

Speaker 1

现在大家说的算法都是指从数据中学习的机器学习算法。

Everybody means like the learning algorithm where you learn from the data.

Speaker 1

而Numerai的方法一直是这样的：你访问网站，下载数据，然后按照自己的方式开始建模。

That was always Numerai's approach: you go to the website, you download the data, and you start modeling it however you want.

Speaker 1

而Quantopian则是：你能否在看不到数据的情况下，通过他们的Python界面手工编写一些规则，运行一些回测，然后获得信心，但你根本做不到。

Whereas Quantopian was: can you hand-code some rules in their Python interface without even seeing the data, run some backtests, and get confident? And you just can't.

Speaker 1

而且他们没有质押机制。

And then they didn't have staking.

Speaker 1

Quantopian上没有任何利益绑定，他们甚至曾经写过一篇论文，指出99%的社区成员都存在过拟合问题。

There was no skin in the game on Quantopian and they even wrote a paper at one point that 99% of the community was overfitting.

Speaker 1

我认为，如果我们没有用质押机制解决过拟合问题，Numerai的社区也可能有99%的人在过拟合。

And I think Numerai might have had a community that was 99% overfitting if we didn't solve overfitting with staking.

Speaker 1

这两点让它们变得截然不同，我实际上不认为前Quantopian用户和Numerai用户之间有太多重叠，因为Quantopian用户更像是量化交易员。

Those two things make it extremely different and I actually don't think there's that much overlap in former Quantopian users and Numerai users because Quantopian users were kinda quants.

Speaker 1

他们关注的是量化交易员会关注的东西。

They were looking at things that a quant would look at.

Speaker 1

但在Numerai上，你唯一看到的是一个经过混淆的2000维数据对象，完全不知道它是什么。

But on Numerai, the only thing you're looking at is an obfuscated 2,000-dimensional data object, and you have no idea what it is.

Speaker 1

所以，加入Numerai的是一类完全不同的工程师。

So it's just a different class of engineer that would join Numerai.

Speaker 0

所以你有这么多Numerai参与者，他们提交自己的模型，也就是对6000只匿名股票的排序，并附上一定的质押。

So you have all these Numerai players; they submit their models, their rankings of the 6,000 anonymized stocks, along with some sort of stake.

Speaker 0

然后Numerai将这些结果整合成一个按质押加权的元模型。

And then Numerai takes that and builds a stake-weighted meta model.

Speaker 0

在我看来，这是一个非平凡的决策。

And that to me is a non trivial decision.

Speaker 0

选择将所有这些底层的排序、押注或随便你怎么称呼的东西，仅仅根据每个人的质押进行线性加权，这只是众多可能设计中的一种。

That decision to take all those underlying rankings or bets, or whatever you want to call them, and just linearly weight them by everyone's stake is one of many, many, many potential designs.
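
A stake-weighted meta model is, at its simplest, a weighted average of the submissions. This sketch assumes each submission is rank-normalized before averaging so that no model dominates through its scale; that normalization step and the `stake_weighted_meta_model` helper are assumptions, since the conversation only says the weights are the stakes.

```python
import numpy as np

def stake_weighted_meta_model(predictions, stakes):
    """Combine submissions into one signal, weighting each by its stake.

    predictions: (n_models, n_stocks) array of raw submission scores
    stakes:      (n_models,) amount of NMR each model has at risk
    """
    P = np.asarray(predictions, dtype=float)
    w = np.asarray(stakes, dtype=float)
    # Rank-normalize each submission to [0, 1] so scale differences don't matter
    ranks = P.argsort(axis=1).argsort(axis=1) / (P.shape[1] - 1)
    # Stake-weighted average across models, stock by stock
    return w @ ranks / w.sum()

submissions = [[0.1, 0.5, 0.9],   # model A likes stock 3
               [0.9, 0.5, 0.1]]   # model B likes stock 1
meta = stake_weighted_meta_model(submissions, stakes=[3.0, 1.0])
print(meta)  # model A carries three times the weight of model B
```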

Speaker 0

这个方法看似简单，相对于其他可能的做法，比如根据它们的边际贡献来排序等。

And frankly, one that seems simplistic relative to some of the other things you could do, like ranking them based on their marginal contribution or something like that.

Speaker 0

或者稍微事后一些的做法，比如长期跟踪用户的表现。

Or something a little post hoc, like tracking users over time.

Speaker 0

你们是怎么决定采用这种简单的押注加权方法的？

How did you land on a simple stake-weighted methodology?

Speaker 1

是的。

Yeah.

Speaker 1

很多人会问，你们为什么不直接用最好的模型呢？

Many people have said, well, why don't you just use the best models?

Speaker 1

但‘最好’是以什么标准衡量的？

But the best by what measure?

Speaker 1

是过去十二个月表现最好的吗？

Over the last twelve months they were the best?

Speaker 1

那又怎样？

Well, so what?

Speaker 1

你知道，有时候这些并不是最好的。

You know, sometimes those aren't gonna be the best.

Speaker 1

事实上，它们在接下来的十二个月里可能是最差的。

In fact, they might be the worst in the subsequent twelve months.

Speaker 1

质押加权很巧妙，因为它本质上总是前瞻性的。

Stake weighting is clever because it's basically always prospective.

Speaker 1

它总是基于当前这群人认为自己未来会有怎样的表现和MMC。

It's always: this group of people currently thinks they are going to have good live performance and MMC going forward.

Speaker 1

这总是更好的。

That's always better.

Speaker 1

然后你会说，那为什么不用MMC来加权呢？

And then you say, well, why don't you weight by MMC?

Speaker 1

猜猜怎么样？

Well guess what?

Speaker 1

看看质押就知道了。

Look at the stakes.

Speaker 1

因为质押是滚动累积的，人们会持续把赚到的NMR计入自己的质押中。

Because people roll their stakes, they're constantly earning NMR into their stakes.

Speaker 1

所以质押本身就是按MMC加权的。

It is weighted by MMC.

Speaker 1

那些持有百万美元NMR并进行质押的人，赢得了所有的这些NMR。

The people with a million dollars of NMR who are staking won all of that NMR.

Speaker 1

他们最初几乎一无所有，却在Numerai上赢取了价值一百万美元的NMR。

They started with almost nothing and they won a million dollars worth of NMR on Numerai.

Speaker 1

所以他们已经处于成为大户（巨鲸）的位置。

So they're already in a position to be a big whale.

Speaker 1

假设他们决定将质押减半或减少80%，你真的不想跟着他们一起减仓吗？

Suppose that they decide to cut their stake in half or by 80%, do you really wanna not cut along with them?

Speaker 1

不，因为他们正在传达关于未来的某种信号。

No, because they're saying something about their future.

Speaker 1

它非常稳健且具有弹性，并且能够自我纠错。

It is very robust and resilient and it's error correcting.

Speaker 1

让我们假设押注有误，有个人押注的金额是他应该押注的十倍。

Let's imagine the stakes are wrong and one guy is staking 10 times more than he should.

Speaker 1

几个月后，他的质押会被销毁一部分，最终他会恰好质押他应该质押的数量。

Well, in a few months, his stake will burn and he'll be staking the exact right amount he should.

Speaker 1

这就是质押加权背后的思路，而且它非常难以超越。

That's the thinking with stake weighting, and it's very hard to beat.

Speaker 1

我们试过。

We have tried.

Speaker 1

我们原本以为最简单的方案不会成功，但它确实远胜于其他所有方法。

We didn't think the simplest thing would be the one that worked, but it definitely beats everything else.

Speaker 0

一旦你有了这个元模型，你会如何思考投资组合的构建？

Once you have this meta model, how do you think about portfolio construction?

Speaker 0

具体来说，你如何在保持从元模型中获取阿尔法收益的同时，对行业、国家、货币和常见风格因子进行中性化处理？

Specifically, how do you neutralize exposures to sectors and countries and currencies and common style factors while still harvesting the alpha within that meta model.

Speaker 1

因为所有信号都是在这些中性化目标上训练的，所以元模型几乎没有想要承担的因子暴露。

So because all of the signals are trained on these neutralized targets, there's very little factor exposure that the meta model wants to take on.

Speaker 1

但可能会有一些，我的意思是，不能保证完全没有。

But there might be some, I mean there's no guarantee that there'll be zero.

Speaker 1

也许会对动量有10%的押注。

Maybe there's a 10% bet on momentum.

Speaker 1

在Numerai内部，我们有一个风险团队，会将元模型对所有我们关心的因子进行中性化处理，这涉及很多方面。

Well at Numerai internally we have a team for risk and we will neutralize the meta model to every single thing that we care to neutralize, which is loads of things.

Speaker 1

这会将投资组合调整到一个最大化元模型表现的位置，同时避免我们不想承担的任何风险。

And that will put the portfolio in a place that maximizes the expression of the meta model but subject to no risks that we don't want to take.
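
One simple way to neutralize a signal to sectors, countries, and styles is to project it onto the space orthogonal to the exposure matrix. This is a hedged sketch: the exposures are random placeholders, `neutral_weights` is a hypothetical helper, and a production risk system would typically use a constrained optimizer with a risk model, transaction costs, and position limits rather than a single projection.

```python
import numpy as np

def neutral_weights(signal, exposures, gross=1.0):
    """Turn a signal into dollar- and factor-neutral portfolio weights.

    signal:    (n_stocks,) meta-model scores
    exposures: (n_stocks, n_factors) sector/country/style exposures
    gross:     target sum of absolute weights (e.g. 2.0 for 100% long / 100% short)
    """
    n = len(signal)
    # A column of ones enforces dollar neutrality (weights sum to zero)
    X = np.column_stack([np.ones(n), exposures])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    w = signal - X @ beta          # closest vector to the signal with X.T @ w == 0
    return gross * w / np.abs(w).sum()

rng = np.random.default_rng(1)
exposures = rng.normal(size=(1_000, 8))
# Hypothetical signal with an unwanted tilt toward factor 0 (think "momentum")
signal = 0.3 * exposures[:, 0] + rng.normal(size=1_000)
w = neutral_weights(signal, exposures, gross=2.0)
print(w.sum(), (exposures.T @ w).round(12))
```

The least-squares residual is the neutral portfolio closest to the original signal, which is one way to read "maximizes the expression of the meta model but subject to no risks that we don't want to take".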

Speaker 0

你已经做了很多年了。

You've been doing this for many years now.

Speaker 0

2023年是系统的一次严峻考验。

2023 was a standout year as a stress test to the system.

Speaker 0

这可能代表了——如果我理解得没错的话——你们迄今为止最大的回撤。

It probably represented, and correct me if I'm wrong here, but I think it represented your deepest drawdown to date.

Speaker 0

你能回头谈谈那次经历吗？

Can you talk a little bit about that episode in hindsight?

Speaker 0

你认为是什么原因造成的？从中吸取了哪些教训？

What do you think caused it and what were the lessons learned?

Speaker 1

这很有趣，因为2023年，有些人可能记得那是股市相对平静的时期。

It's interesting because 2023, maybe some people remember it as actually being kind of a calm period in stocks.

Speaker 1

市场在上涨，我想是这样。

Things were going up, I think.

Speaker 1

而我们真正经历的重大压力测试其实是2020年，也就是新冠疫情时期，那时我们的表现非常好。

Whereas the bigger stress tests we had were actually 2020, with COVID, where we did very well.

Speaker 1

然后2021年是迷因股狂潮，我们的表现也非常出色。

And then 2021 was a meme stock rally where we did very well.

Speaker 1

接着2022年市场大幅崩盘，我们的表现依然很好。

And then 2022 was a big crash in the market where we did very well.

Speaker 1

但2023年，确实发生了某些事情，导致我们的表现特别糟糕。

But 2023, something did, you know, happen where we did particularly badly.

Speaker 1

最终，2023年全年我们的收益大约下跌了16%。

In the end, in 2023, I think we're down about 16% for the year.

Speaker 1

我认为，对于一只新基金来说，波动性是一个挑战，尤其是当你在做像Numerai这样全新事物的时候。起初，我们使用了较低的杠杆和较低的波动性，可能总共只有两倍杠杆，后来才逐渐提高到六倍，并设定了更高的波动率目标。

One of the challenges with a new fund, especially when you're doing something brand new like Numerai, is volatility. In the beginning, we started with low leverage and low volatility, maybe even two times leverage in total, and then we crept up to six times with a higher volatility target.

Speaker 1

在那一年我们亏损了16%的时候，我们的波动率是15%。

So in that year where we made 16% negative return, our volatility was 15%.

Speaker 1

这并不算太离谱。

So it wasn't that crazy.

Speaker 1

一只基金运营了四年，到那时经历一个标准差级别的亏损年份大概很正常，但这依然令人失望。

When a fund is four years old, it has probably had a one-standard-deviation down year by then, but it was still disappointing.
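
The -16% return on 15% volatility translates into standard-deviation units with a little arithmetic. This sketch assumes independent, normally distributed annual returns with zero expected return, a deliberately pessimistic simplification that is not how the fund's risk is actually modeled.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

annual_return, annual_vol = -0.16, 0.15
z = annual_return / annual_vol               # about -1.07 standard deviations

# Assume i.i.d. normal annual returns with zero mean (a pessimistic simplification)
p_bad_year = normal_cdf(-1.0)                # chance any one year is worse than -1 sigma
p_at_least_one_in_four = 1 - (1 - p_bad_year) ** 4
print(round(z, 2), round(p_bad_year, 3), round(p_at_least_one_in_four, 2))
```

Under those assumptions, the chance of at least one worse-than-one-sigma year in four is roughly one in two; a positive expected return would lower it.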

Speaker 1

当时最大的教训是，我们认为，只要全力以赴确保拥有最好的阿尔法因子就行了。

The biggest lessons there were at the time I think we thought, well, we're just gonna work very very hard on making sure we have the best alphas.

Speaker 1

我们会对阿尔法因子下重注，并保持高度集中。

We're gonna take big bets on the alpha and have high concentration.

Speaker 1

因此，那时我们的投资组合有时只包含大约600只股票，这对于量化基金来说已经非常少了，而且单笔持仓规模也很大。

And so our portfolio at that time, it was sometimes as little as 600 stocks, which is quite small for a quant fund and the position sizes were high.
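
Why a 600-stock book is more fragile can be made concrete with the usual diversification formula. This sketch assumes equal absolute position sizes and independent stock-specific risk with a hypothetical 30% volatility per name; `idiosyncratic_vol` and the 2,400-name comparison book are also hypothetical. It ignores correlation between stock-specific moves, so a real concentrated book can be riskier still.

```python
import math

def idiosyncratic_vol(n_names: int, stock_idio_vol: float = 0.30, gross: float = 1.0) -> float:
    """Idiosyncratic volatility of a book with n_names equal-|weight| positions.

    Assumes each stock's specific risk is independent with the same volatility,
    so variance = n * (gross / n)**2 * stock_idio_vol**2.
    """
    return gross * stock_idio_vol / math.sqrt(n_names)

concentrated = idiosyncratic_vol(600)
diversified = idiosyncratic_vol(2_400)   # hypothetical comparison book
print(concentrated / diversified)        # 4x the names halves the stock-specific risk
```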

Speaker 1

这种策略是可行的。

That can work.

Speaker 1

2022年我们的回报远超15%，那时这个策略奏效了，其他时候也有效，但它让策略在亏损年份变得相当脆弱。

It worked for us in 2022 when we made much more than 15% and it worked in other times but it just basically made the strategy quite vulnerable to down years.

Speaker 1

这个行业的问题在于，即使你每五年才出现一次亏损年份，人们依然在寻找所谓的‘圣杯’投资产品。

And the trouble is with this business is even if you have like one down year every five or something, people are really looking for like a holy grail investment product.

Speaker 1

所以他们一看到你有亏损年份，就认为你之后的每一年都会亏损，而不是说，你四年里只有一年亏损，意味着75%的年份都是盈利的。

And so they think if you have a down year, well then all your subsequent years are gonna be down instead of saying, well you've had one down year in four, so 75% of your years are gonna be up.

Speaker 1

这个行业里普遍存在‘你最近为我做了什么’的心态，这很难理解，因为并不理性，但确实如此。

There's really quite a lot of what have you done for me lately in the industry, which is quite hard to grasp because it's not that rational, but it is a thing.

Speaker 1

因此，这段经历让我们意识到：我们不能只做一个普通的对冲基金。

And so this experience basically pushed us to be like, well, we can't just be a good hedge fund.

Speaker 1

我们必须做到最好。

We have to be the best.

Speaker 1

我们必须在风险管理上变得极其出色。

We have to really, really get incredibly good at risk.

Speaker 1

于是我们在2023年投入大量精力构建了更先进的内部风险系统和目标体系，这最终带来了我们在2024年有史以来最好的一年。

And so we spent a lot of 2023 building much more advanced risk systems internally, and better targets, and that led to our best year ever in 2024.

Speaker 1

这是一次很好的教训，我学到了很多关于各种事情的知识，但最重要的是提升了内部的风险管理能力，明白了它有多么重要。

It was a good lesson and I learned a lot about all kinds of things, but mainly leveling up internally in risk, how important that is.

Speaker 1

顺便说一句，如果你看看多策略平台的世界，你可能会觉得一切都是由各个交易团队（pod）驱动的。

Just an aside: if you look at the multi-strategy platform world, you might say it's all driven by the pods.

Speaker 1

但也许并不是这样。

Maybe it's not.

Speaker 1

也许真正驱动它的，其实是中间的风险团队之类的东西。

You know, maybe it's actually driven by the risk team in the middle or something.

Speaker 1

这在量化领域里是一个你总能学到的教训。

And that's a lesson you can always get in quant.

Speaker 1

是模型更重要吗？

Is it about the models?

Speaker 1

还是数据更重要？

Or is it about the data?

Speaker 1

又或者是风险更重要？

Or is it about the risk?

Speaker 1

而答案永远是：在所有方面都做到世界一流。

And the answer is always: be world class at everything.

Speaker 0

你能谈谈那些风险系统升级具体包括哪些内容吗？

Can you talk a little bit about what some of those risk system level ups were?

Speaker 0

我知道在节目前的沟通中，你特别提到过强制分散化的一些方法和做法：正如你所说，2023年你们的投资组合处于最集中的状态，由于没有这些机制而引发了一些问题；而现在你们对分散化的强制执行要严格得多。

I know in our pre-call you specifically mentioned methods and approaches to diversification enforcement: not having those in place in 2023, when, as you mentioned, you were at your most concentrated, caused some issues, and you now have a lot more enforcement of diversification.

Speaker 0

你能再详细说说这方面的情况，以及你们还实施了哪些其他风险系统改进吗？

Can you talk a little bit about that and maybe some other risk system enhancements that you put in place?

Speaker 1

一个有趣的现象是，一个对所有Barra因子都保持中性的投资组合，表现可能有多差。

One of the interesting things is how badly a portfolio that is neutral to all of the Barra factors can do.

Speaker 1

我相信你们的许多听众都在关注当前量化领域所面临的压力，那些原本非常知名且备受尊敬的量化公司，如今表现得异常糟糕。

I'm sure many of your listeners are following some of the stress in quant at the moment where these otherwise very knowledgeable and prestigious quant firms are doing incredibly badly.

Speaker 1

这其中很大一部分原因在于它们过度依赖普遍使用的风险模型，比如Barra风格的风险模型。

And a lot of that is reliance on the risk model that's commonly used, like a Barra style risk model.

Speaker 1

当Numerai经历我们唯一亏损的那一年时，我们的投资组合对所有Barra因子都保持完全中性。

When Numerai had our only down year, we were running completely neutral to all of the Barra factors.

Speaker 1

系统本身并没有什么问题，我们的一位投资者甚至要求我们发一下投资组合，只是想确认一下它确实是中性的。

There was nothing kind of wrong with the system and one of our investors actually asked us, please send me the portfolio, just wanna check that this is neutral.

Speaker 1

然后他回复说：没错，完全中性。

And then he said back, yep, completely neutral.

Speaker 1

我真不敢相信你们表现得这么差。

I can't believe you're doing so badly.

Speaker 1

那这是怎么回事？

So what is that?

Speaker 1

这是拥挤交易。

Well, it's crowding.

Speaker 1

拥挤交易，就是你持有的东西正是所有人都在持有的。

Crowding, it's like you're holding a bunch of stuff that everybody else holds.

Speaker 1

这使得实际风险比风险模型显示的要大得多。

And that makes things more dangerous than it seems from a risk model.

Speaker 1

你的回撤幅度会比波动率所暗示的要大得多。

Your drawdown will be bigger than your volatility would suggest.

Speaker 1

Numerai 做了什么？

What did Numerai do?

Speaker 1

我之前跟你说过的很多内容，比如MMC：我们决定在MMC上采取非常激进的策略，也就是说，你可能和市场相关，但我们想要的是与我们不相关的MMC。

Well, a lot of the stuff I was telling you about earlier, like MMC: deciding to be really aggressive on MMC and saying, look, you might be correlated with the market, but we want MMC, which is uncorrelated with us.

Speaker 1

如果你激励成千上万的数据科学家去构建一个反拥挤的模型，每个个体都在努力与众不同，那么在市场压力时期，你的表现实际上会好得多。

And if you incentivize thousands of data scientists to make a model that is anti crowding, individual contributors are trying to be different, then you're going to actually do a lot better in times of market stress.

Speaker 1

过去两年里，我们完全见证了这一点。

That's what we've seen totally in the past two years.

Speaker 1

这是最重要的教训之一：提高MMC的权重，不再那么关注扩大用户基数，而是更注重确保每位贡献者的个体阿尔法都非常强劲。

That was one of the biggest lessons: upweighting MMC, being less interested in growing the user base, and more interested in making sure the individual alphas from each of the contributors are very strong.

Speaker 0

实盘运营基金与纸上谈兵的基金运作非常不同，它涉及交易成本、容量、基金治理等各种问题。

Running a fund live is very different than running a fund on paper and it includes all sorts of things like transaction costs and capacity and fund governance and all those sorts of issues.

Speaker 0

在将这些理论上对股票进行排序或提供理论阿尔法的纸面策略，转化为现实世界实施的过程中，你学到了什么？

What have you learned about bridging the gap between these theoretical paper strategies that are ranking stocks or providing these models on theoretical alpha versus taking that and turning it into real world implementation?

Speaker 0

我这个问题最天真的例子就是：有人可能在纸上算出一个阿尔法，但现实是，扣除交易成本后，这个阿尔法根本无法实现。

The most naive example of my question: someone could rank stocks in a way that shows alpha on paper, but in reality that alpha isn't capturable after t-costs.

Speaker 0

你们如何确保他们提供的确实是真正可执行的阿尔法信号？

How do you make sure that they're providing you alpha signals that are really truly implementable alpha signals?

Speaker 1

在Numerai早期，我在这方面做得并不好。

In the beginning of Numerai, I wasn't the best at this.

Speaker 1

我们的一些阿尔法信号是，哦，那些是小市值股票，你根本无法做空，等等。

Some of our alpha was, oh, well, that's in small-cap names that you can't short, and so on.

Speaker 1

但Numerai的竞赛对这些问题变得非常严格。

But Numerai's tournament got very strict on all of that.

Speaker 1

例如，过去我们有8000只股票的池子，后来缩减到了6000只。

So actually in the past, for example, we had an 8,000 stock universe and we took it down to 6,000.

Speaker 1

那为什么这么做呢？

So why is that?

Speaker 1

因为我们获得了更好的做空可用性数据和更好的流动性数据，意识到那后2000只股票根本无法建立有意义的仓位，于是就把它们从池子中移除了。

Well, we got much better short availability data, much better liquidity data and we realized that those bottom 2,000 stocks, we couldn't get a meaningful position in anyway and so we took them out of the universe.
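
Dropping the least liquid, hard-to-short names before users ever see them can be sketched as a simple filter. The `Stock` record, the dollar thresholds, and the ranking by liquidity are hypothetical; the conversation only says Numerai used better short-availability and liquidity data to cut the universe from 8,000 to 6,000 stocks.

```python
from dataclasses import dataclass

@dataclass
class Stock:
    ticker: str              # hypothetical identifier
    adv_usd: float           # average daily traded value, in USD
    borrow_available: bool   # can the name be shorted at a reasonable cost?

def tradable_universe(stocks, min_adv_usd, max_size):
    """Keep liquid, shortable names, then the `max_size` most liquid of those."""
    eligible = [s for s in stocks if s.adv_usd >= min_adv_usd and s.borrow_available]
    eligible.sort(key=lambda s: s.adv_usd, reverse=True)
    return eligible[:max_size]

stocks = [
    Stock("A", 50e6, True),
    Stock("B", 2e6, True),
    Stock("C", 80e6, False),   # liquid but hard to borrow
    Stock("D", 0.3e6, True),   # too illiquid for a meaningful position
]
universe = tradable_universe(stocks, min_adv_usd=1e6, max_size=6_000)
print([s.ticker for s in universe])
```

Filtering at the universe level means no model trained on the data can ever learn to lean on names the fund could not actually trade.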

Speaker 1

所以现在当我们把它们移出池子后，用户就不会再想着去押注这些微小股票来赚钱了。

So now that we've taken them out of the universe, none of the users are gonna pick up on, oh, I can go long this tiny stock and make money.

Speaker 1

模型根本无法学到这一点。

Well, the model never gets to learn that.

Speaker 1

当我们处理好股票池后，模型就会自然而然地运作良好。

When we take care of the universe then the models will kind of take care of themselves.

Speaker 1

这是我们年复一年所做的事：改进我们的目标，使其对更多因素保持中性，并产生更可交易的阿尔法信号。

That's the type of thing we've done year after year improving our targets to make them neutral to more things and more tradable alpha.

Speaker 1

你能在这方面做到多么出色，真是令人惊叹。

It's amazing how good you can get at that.

Speaker 1

细节真的很重要，但你可以让你的阿尔法信号真正有效地转移到投资组合中。

The details really matter but you can make your alpha really transfer to the portfolio.

Speaker 0

你提到你最近举办了NumerCon，并在那里发布了一些令人兴奋的消息。

You mentioned you recently hosted NumerCon and you had some pretty exciting announcements there.

Speaker 0

我注意到其中一项是推出了一个新的MCP，它支持智能体模型管理。

One of which that I caught was the launch of a new MCP that enables agentic model management.

Speaker 0

这将适用于所有正在使用模型智能体的人。

And this will be for anyone who is playing with model agents.

Speaker 0

这是一项非常契合时代、令人兴奋的进展。不过，对于那些可能没有紧跟当前LLM和智能体模型前沿发展的人来说，这可能毫无意义。

This is a very of-the-times, exciting development. For those who are maybe not at the vanguard of playing with the current state of LLMs and agentic models, though, this might not mean anything.

Speaker 0

听起来可能像胡言乱语。

It might sound like gibberish.

Speaker 0

所以也许我们可以退一步来说。

So maybe we can start with a step back.

Speaker 0

你们到底在构建什么？

What are you building?

Speaker 0

你们提供的是什么？

What are you providing?

Speaker 0

Numerai为模型构建者带来了哪些新的工具集突破？

What's the new toolset unlock for model builders at Numerai?

Speaker 1

有人曾批评Numerai，说大家都知道在表格数据集上该用什么模型。

One of the criticisms of Numerai was maybe that, well, everybody knows the right model to use on a tabular dataset.

Speaker 1

所以一些较年长的人可能会说，你只需要用线性回归，就能得到一个确定性模型，清楚地知道系数，还能解释模型，然后就这么用。

So maybe some of the older people would say, well, you just use linear regression, you get a deterministic model and you know the coefficients and you can interpret the model and you use that.

Speaker 1

明白吗？

Okay?

Speaker 1

但其他人会说，你直接用XGBoost或LightGBM模型就好了，因为它们在表格数据上表现非常出色。

But the other people would say, well, you just use an XGBoost or a light GBM model because that's a very good model for tabular data.

Speaker 1

Numerai却问：如果事实并非如此呢？

Numerai said, what if it's not?

Speaker 1

如果最好的模型其实是将所有模型结合在一起，用质押加权的元模型呢？

Like, what if the best model is actually all of the models put together with a stake-weighted meta model?

Speaker 1

总有人觉得，总有更好的方法可以尝试。

The idea that there's always something better to do is always out there.

Speaker 1

所以，我在刚创办Numerai时发现的一个有趣现象是，许多参赛者都来自所谓的AutoML公司。

So one of the things I found interesting when starting Numerai is that many of the participants were working at what was called AutoML companies.

Speaker 1

AutoML的意思是，你提供一个数据集，哪怕它有点混乱，我们也能自动帮你找出最佳方案。

So AutoML is you give us a dataset and it can be kinda messy and stuff and we will figure out automatically what to do.

Speaker 1

针对你的特定数据问题，哪个模型才是最好的？

What's the best model for your particular data problem?

Speaker 1

于是这些AutoML公司加入进来，使用它们的AutoML系统进行各种特征转换，再结合XGBoost和其他各种技巧。

And so you had these AutoML companies joining and actually using their AutoML systems, doing all these feature transformations and then XGBoost and all these different tricks.

Speaker 1

但它们的实际表现反而不如那些个人参赛者。

And they would actually not do as well as the people.

Speaker 1

也许这些个人参赛者是在使用AutoML的同时还结合了其他方法。

Maybe the people were using AutoML plus something else.

Speaker 1

所以这始终是一场竞赛。

So it's always a kind of race.

Speaker 1

但在NumerCon上，我们认为一种新的AutoML范式已经形成。

But what happened at NumerCon is we decided there's kind of a new paradigm of AutoML that has developed.

Speaker 1

我认为从十一月开始，这件事变得越来越有趣了：你能否让一个智能体来执行AutoML，或者真正打造一个AI科学家，它能访问Numerai的数据，理解数据的各种特性，懂得如何进行向前滚动（walk-forward）交叉验证。

And I would say it's in November that this started to get interesting: can you make an agent do AutoML, or really make an AI scientist that has access to Numerai data, understands the different peculiarities of the data, and understands how to do walk-forward cross-validation?

Speaker 1

如果你为这一切搭建好了框架，那么你就可以完全放手，让AI自行运作。

And if you could have the scaffolding set up for that, then you would really just get out of the way of AI.

Speaker 1

我们认为这简直是天赐良机：Anthropic在模型上投入巨资，OpenAI也在模型上大举投入，实际上却以低于成本的价格将这些成果提供给所有Numerai用户。

We see it as a gift from God that Anthropic is investing so much money into their models and OpenAI is investing so much in theirs, basically giving it to all Numerai users below cost.

Speaker 1

现在每个Numerai用户的智商都像是提高了100分，而他们原本就已经非常聪明了。

Every Numerai user is now like 100 IQ points higher, and they were already very high.

Speaker 1

现在他们可以真正构建出能够代替他们完成所有工作层次的智能工作流，并同时并行运行上万次实验。

Now they can actually make agentic workflows that do every layer of their work for them and then run 10,000 experiments in parallel.

Speaker 1

谁知道会发生什么？

Who knows what could happen?

Speaker 1

所以这次发布实际上更侧重于我们发布的这个名为skills.md的文件，而不是MCP服务器——后者只是用来与API交互的。

So the release is actually probably more about this file we released called skills.md than it is about the MCP server, which is just for interacting with the API.

Speaker 1

但关键是这个skills.md文件，它定义了我们Numerai自己会如何开展研究。

But the skills.md file defines how we at Numerai would do research ourselves.

Speaker 1

它包含了我们所有关于如何做好数据科学的知识。

So it's got everything we know about how to do good data science.

Speaker 1

它所解决的问题是：如果你现在让Claude或OpenAI直接处理Numerai的数据，它迟早会在某个环节出错。

The problem it solves is if you point, say, Claude or OpenAI at Numerai data right now, it will mess up somewhere.

Speaker 1

它们不会意识到这些数据实际上是时间序列之类的。

They won't realize that the data is actually time series or something.

Speaker 1

它们会假装所有行都可以互换使用，但实际上数据存在时间维度。

They'll just pretend all the rows can be used interchangeably when actually there's a time dimension.
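The "interchangeable rows" mistake is easy to demonstrate. The sketch below uses a hypothetical function name and toy data (10 eras with 100 stocks each): it shuffles rows the way a generic tabular pipeline might, splits them 80/20, and reports which eras end up on both sides of the split. Any shared era means the model is validated on the same dates it was trained on.

```python
import random


def eras_shared_by_random_split(row_eras: list[int], test_fraction: float = 0.2, seed: int = 0) -> set[int]:
    """Shuffle row indices (ignoring time), split them, and return eras present in both halves."""
    rows = list(range(len(row_eras)))
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    train_eras = {row_eras[i] for i in rows[:cut]}
    test_eras = {row_eras[i] for i in rows[cut:]}
    return train_eras & test_eras


# Toy data: 10 eras, 100 rows (stocks) per era.
toy_eras = [era for era in range(10) for _ in range(100)]
shared = eras_shared_by_random_split(toy_eras)
print(f"{len(shared)} of 10 eras appear in both train and test")
```

Splitting by era instead, for example with a walk-forward scheme, keeps each era entirely on one side of the split.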

Speaker 1

因此，它们会犯下许多这样的小错误，但这并不是因为它们笨。

And so there are all these little mistakes it'll make, but it's not because it's dumb.

Speaker 1

只是因为它们对这个问题了解得不够深入。

It's just because it doesn't know the problem very well.

Speaker 1

我们已经将这个问题的解决方案编码到了skills.md文件中。

And we've encoded the problem into the skills.md file.

Speaker 1

所以现在你只需用三行代码，就能让一个智能体开始解决这个问题，并且不会犯任何错误。

So now you can kind of with three lines of code have an agent start attacking the problem and not make any mistakes.

Speaker 1

我们举的一个例子是，这个智能体大约需要三天时间才能完成工作。

And so the one example we gave, the agent goes for about three days before completing its work.

Speaker 1

因此，你实际上只需什么都不做地等上三天，然后就会产出一个很可能在排行榜上表现非常强劲的模型。

So you're really just doing nothing for three days and then out comes a model that's probably really strong on the leaderboard.

Speaker 1

这真的是未来。

It's really the future.

Speaker 1

这就像Numerai的奇点时刻。

It's like singularity moment for Numerai.

Speaker 1

我们早已为这一刻搭建了Numerai，而我们看到的智能水平令人惊叹。

We've set up Numerai for this moment, and the level of intelligence we're seeing come in is amazing.

Speaker 0

这真是令人兴奋的东西。

That's really exciting stuff.

Speaker 0

你知道，当我退后一步思考Numerai时，我会把它看成五大支柱（如果这样说过于简单了，还请见谅）：数据，然后是模型开发，也就是分布式模型开发，而这种智能体带来的机会正是模型开发上的一大进步。

You know, when I take a big step back and think about Numerai, I sort of think of it as five pillars, and I apologize if this is overly simplistic: the data, then model development, which is distributed model development, where this agentic opportunity is a major step forward.

Speaker 0

然后你有元模型、风险管理，最后是实际执行。

Then you have the meta model, the risk management, and then finally the actual execution.

Speaker 0

更不用说基金管理和所有那些事了，我会把这些搁置一旁。

To say nothing of the fund administration and all that, which I'm going to sweep aside.

Speaker 0

但这些可以说是五大支柱。

But these are sort of five big pillars.

Speaker 0

我们谈到了在模型开发方面，随着AI代理的采用所取得的进展。

And we talked about advancements in the model development side with the adoption of AI agents.

Speaker 0

我们讨论了元模型和质押的构建。

We've talked about the construction of the meta model and staking.

Speaker 0

我们谈到了风险管理方面的变化。

We've talked about changes to risk management.

Speaker 0

我们还没谈到执行，但有一件我们没讨论的重要事情是数据。

We haven't talked about execution yet, but one of the big things we haven't talked about is data.

Speaker 0

我们只是轻描淡写地提到，有一些数据，而且是经过混淆的。

And we sort of glossed over, okay, there is some data and it's obfuscated.

Speaker 0

随着你们不断引入新数据，你如何看待数据方面的进展？多少数据才最终有用，是否存在上限？还是说，随着你们解锁这种智能体流程，更多的数据如今反而成了一种潜力？

How do you think about data advancements as you move forward: incorporating new data, the limits of how much data is ultimately useful, or whether more data is now a potential advantage as you unlock this sort of agentic process?

Speaker 1

数据真的非常重要，我们一直尽可能多地引入数据。

Data is really, really important, and we've always just gone ahead and onboarded as much data as we could.

Speaker 1

但我们的团队规模也比较小。

But we also are a smaller team.

Speaker 1

我们并不想变成三四百人的公司。

We don't really wanna be 300, 400 people.

Speaker 1

所以我们现在只有大约20个人。

So we're like 20 people.

Speaker 1

因此，我们做了很多数据接入自动化的工作。

So a lot of what we did was build data onboarding automation.

Speaker 1

你可以想象，是代理在为我们接入数据。

Think of it as agents onboarding our data.

Speaker 1

当你和数据供应商试用时，

You get a trial with a data vendor and

Speaker 0

然后

then

Speaker 1

AI会判断这些数据是否真的有增益价值。

the AI figures out whether this data is actually additive.

Speaker 1

在Numerai内部，所有这些也都通过智能代理编程在进行。

Internally, all of this is happening at Numerai as well with agentic coding.

Speaker 1

但最令人兴奋的一点是能够获取互联网上的数据。

But one of the most exciting things is getting access to data on the Internet.

Speaker 1

我们有一个项目叫做Numerai预测型大语言模型。

So one of the projects we have is called Numerai Predictive LLM.

Speaker 1

它借用一个大语言模型（一个开源大语言模型）的“大脑”，然后对其进行重新改造，告诉它：你目前能够写诗、做数学运算等等。

And it takes the kind of brain of an LLM, an open-source LLM, and then it re-engineers that brain and says, you know, currently you can do things like write poetry and do math and so on.

Speaker 1

我们只是想用你的思维来根据新闻预测股票走势。

We just wanna use your brain to predict stocks based on news.

Speaker 1

输入是新闻，输出是对股票未来走势的预测。

So input is news and the output is a prediction of where the stock's going in the future.

Speaker 1

借助这个预测型大语言模型，我们可以从基于互联网的文本数据中生成全新且正交的特征。

And with this predictive LLM, we can generate features from internet-based text data that are completely new and orthogonal.
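Whether a new text-derived feature is "orthogonal" and "additive" can be checked in many ways. One simple approach, shown here as an assumption rather than Numerai's actual pipeline, is to regress the new feature on the existing features, keep the residual, and measure how well that residual alone correlates with the target. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def neutralize(signal: np.ndarray, existing: np.ndarray) -> np.ndarray:
    """Return the least-squares residual of `signal` after regressing it on `existing` (plus an intercept)."""
    X = np.column_stack([np.ones(len(signal)), existing])
    beta, *_ = np.linalg.lstsq(X, signal, rcond=None)
    return signal - X @ beta


def additive_information(signal: np.ndarray, existing: np.ndarray, target: np.ndarray) -> float:
    """Correlation between the target and the part of `signal` the existing features cannot explain."""
    residual = neutralize(signal, existing)
    if np.allclose(residual, 0.0):
        return 0.0  # the new feature is a linear combination of what we already have
    return float(np.corrcoef(residual, target)[0, 1])


# Synthetic example: three existing features plus one genuinely new source of information.
rng = np.random.default_rng(0)
existing = rng.normal(size=(500, 3))
fresh = rng.normal(size=500)
target = existing.sum(axis=1) + fresh
redundant = existing @ np.array([1.0, 2.0, 3.0])

print("redundant feature:", additive_information(redundant, existing, target))
print("fresh feature:", round(additive_information(fresh, existing, target), 2))
```

A feature that is just a mix of existing ones scores zero here, however well it correlates with the target on its own; only the orthogonal part counts as new information.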

Speaker 1

这也是非常令人兴奋的一部分，它真正展示了我们的定位。

So that's a very exciting part of it as well and that really shows you where we are.

Speaker 1

我们并不是为对冲基金打造的Kaggle之类的东西。

We are not Kaggle for hedge funds or something like that.

Speaker 1

我们现在正处于公司的一个全新阶段：智能体在编写内部代码，Numerai的用户在使用智能体，而我们正通过大语言模型阅读整个互联网来生成数据。

We are now in a whole new phase of the company where agents are coding the internal code, Numerai users are using agents, and we are creating data from LLMs reading the whole Internet.

Speaker 1

所以，事情已经发展到让我感叹：这正是我一直期待的样子。

So it's getting to the point of like, wow, this is what I always hoped for.

Speaker 1

因此，现在正是Numerai令人兴奋的时刻。

And that's why it's an exciting time at Numerai.

Speaker 0

我们来聊聊产品包装吧。

I wanna talk a little bit about packaging here.

Speaker 0

在之前的通话中，我们讨论过投资者如何获取Numerai服务的交付机制。

In our pre call, we had a conversation where you were talking about the delivery mechanism with which investors can get access to Numerai.

Speaker 0

历史上，它一直是一只多空、市场中性、行业中性、板块中性、因子中性的对冲基金，也就是典型的股票市场中性对冲基金。

Historically, it's been a long-short, market-neutral, sector-neutral, industry-neutral, factor-neutral hedge fund: a classic equity market neutral hedge fund.

Speaker 0

你目前正在考虑的是一个可移植阿尔法的实施方案。

One of the things you're currently thinking about is a portable alpha implementation.

Speaker 0

所以，在不透露太多细节的前提下（我知道这其中显然存在监管方面的限制），你如何设想Numerai接下来的演进？

So without maybe giving too much away, and I know there's obviously regulatory constraints around all this, how are you thinking about the vision of how Numerai evolves from here?

Speaker 1

在NumerCon上，我们推出了一只名为Numerai Singularity的新基金。

At NumerCon, we released a new fund called Numerai Singularity.

Speaker 1

Numerai Singularity同时做多全球股票和Numerai的阿尔法，这也被称为收益叠加。

And Numerai Singularity is just long global equities and long Numerai's alpha at the same time, also known as return stacking.

Speaker 1

你可能对此有所了解。

You might be familiar with it.

Speaker 1

但这确实是一个非常非常匹配的组合。

But it really is a very very good match.

Speaker 1

我总是觉得有点烦，也许这听起来很枯燥，但人们总说：‘你们去年只有9%的阿尔法收益。’

And one of the things I always find annoying and maybe this is dull, but people say, oh, you only had 9% alpha last year.

Speaker 1

你们跑输标普500了。

You lost to the S&P.

Speaker 1

我却说：不不不不不不不。

And I'm like, no no no no no no no.

Speaker 1

这9%的收益全部都是跑赢标普500的。

All of that 9% is beating the S&P.

Speaker 1

如果你想要这样，你就得这么包装它。

You just have to package it that way if you want it that way.
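The packaging point is simple arithmetic. As a rough sketch, assuming the 9% figure is a market-neutral return in excess of cash and ignoring fees, financing spreads and the exact overlay mechanics, a stacked portfolio holding full equity exposure plus the full alpha stream earns approximately:

```latex
r_{\text{stack}} \approx r_{\text{equity}} + \alpha - c
```

Here c stands for an assumed implementation cost of the overlay. With an illustrative (not historical) equity return of 10% and alpha of 9%, the stacked return would be roughly 10% + 9% - c = 19% - c, so as long as c is smaller than the alpha, the stacked portfolio beats the index it sits on.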

Speaker 1

我们决定把它建立在全球股票之上。我们的阿尔法与全球市场正交，并不意味着我们不看好全球、不看好全球股票市场。

We decided to build it on global equities. Just because our alpha is orthogonal to the world doesn't mean we don't believe in the world, in the global equity market.

Speaker 1

该基金几天前刚刚推出，所以目前我是唯一的投资者，先启动起来。

That fund launched just a couple of days ago, so I'm the only investor just to get started.

Speaker 1

但终于推出这个产品真的令人兴奋，因为人们有一种误解，认为如果你在构建一个市场中性基金，那你一定对市场非常悲观。

But it is really exciting to finally have that out because I think there's a perception that if you're building a market neutral fund, you must be really negative on the market.

Speaker 1

但实际上，市场组合非常棒。

But actually the market portfolio is amazing.

Speaker 1

它是我最喜欢的东西之一。

It's like my favorite thing.

Speaker 1

我不喜欢的一点是，人们把买入标普500称为被动投资，这其实掩盖了真相。

One of the things I don't like is that people call buying the S&P passive investing, because it really hides the ball.

Speaker 1

实际上，你是让全球500位最优秀的首席执行官全天候地主动为你管理资本。

In reality, it's actually you've got 500 of the world's best CEOs to actively manage your capital all day, every day.

Speaker 1

顺便说一下，他们手下有数十万世界顶尖人才在为他们工作。

And by the way, they have like hundreds and hundreds of thousands of the top talent in the world working for them.

Speaker 1

那你怎么会觉得市场组合很无聊呢？

So why do you think the market portfolio is boring?

Speaker 1

这几乎达到了科幻级别的惊人程度。

That's almost sci-fi level amazing.

Speaker 1

我喜欢市场组合，它和Numerai完美结合，因为即使你看历史表现，也会发现：Numerai什么时候表现不好？

So I like the market portfolio and it's a perfect thing to blend with Numerai because even if you look at the historical performance, it's like, okay, when did Numerai do badly?

Speaker 1

嗯，2023年是我们表现不佳的一年。

Well, 2023 was our bad year.

Speaker 1

明白了。

Okay.

Speaker 1

猜猜怎么着？

Guess what?

Speaker 1

那一年市场却大幅上涨。

The market went up a lot that year.

Speaker 1

如果将它们叠加起来，会带来更佳、更平稳的回报路径，并且回撤也低得多。

If you stack them, it makes for a much nicer, smoother return path and much lower drawdown.
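The drawdown argument can be illustrated with a toy example. The return series below are invented for illustration (four periods in which equities and the alpha stream move in opposite directions, loosely echoing the 2023 pattern Richard describes); they are not Numerai's or any index's actual returns, and the helper names and zero overlay cost are assumptions. Stacking does not reduce drawdown in general: it adds gross exposure, and the benefit depends on the two streams being weakly or negatively correlated.

```python
def wealth_path(returns: list[float]) -> list[float]:
    """Compound periodic returns into a wealth path that starts from 1.0."""
    wealth, path = 1.0, []
    for r in returns:
        wealth *= 1.0 + r
        path.append(wealth)
    return path


def max_drawdown(returns: list[float]) -> float:
    """Largest peak-to-trough loss of the compounded path, as a positive fraction."""
    peak, worst = 1.0, 0.0
    for w in wealth_path(returns):
        peak = max(peak, w)
        worst = max(worst, 1.0 - w / peak)
    return worst


def stack(equity: list[float], alpha: list[float], cost_per_period: float = 0.0) -> list[float]:
    """Per-period stacked return: full equity exposure plus the full alpha stream, minus an assumed overlay cost."""
    return [e + a - cost_per_period for e, a in zip(equity, alpha)]


# Invented four-period series in which equities and alpha move in opposite directions.
equity = [0.05, 0.04, -0.03, 0.06]
alpha = [-0.02, -0.03, 0.04, 0.01]

for name, series in [("equity", equity), ("alpha", alpha), ("stacked", stack(equity, alpha))]:
    print(f"{name:8s} max drawdown: {max_drawdown(series):.2%}")
```

In this toy case each component alone has a drawdown, while the stacked series never falls below its previous peak.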

Speaker 0

恭喜你成功发布这个产品。

Well, congratulations on that launch.

Speaker 0

我绝对是可移植阿尔法和收益叠加的支持者，所以听到你成功推出这个产品，我感到非常兴奋。

I'm certainly a proponent of portable alpha and return stacking, so it's exciting to hear that you got that out the door.

Speaker 0

我想用一个我一直在这季节目中反复问嘉宾的问题来结束这次对话，这个问题超出了我们之前讨论的所有内容。

I wanna end this conversation with the same question I have asked my guests consistently across this season, and it's outside the realm of all the questions we've had here.

Speaker 0

而且它与工作、与Numerai无关。

And it is outside of work, outside of Numerai.

Speaker 0

今天你对什么事物充满热情，甚至着迷？

What is something that you are incredibly passionate about today or even obsessed with?

Speaker 0

这可能是一个想法。

This could be an idea.

Speaker 0

也可能是一部影视作品。

It could be some sort of media show.

Speaker 0

也可能是你在读的某本书。

It could be something you're reading.

Speaker 0

也可能是来自历史的某件事，或者一个深深吸引你的想法。

It could be something from history or just an idea that's really grabbed a hold of you.

Speaker 0

今天有什么东西是你痴迷并充满热情的吗？

What is something that you are obsessed with and passionate about today?

Speaker 1

我最近一直在思考的一件事是时间本身。

One thing I've been thinking about a lot lately is time itself.

Speaker 1

也许这源于我做投资者的经历：比如有人说自己赚了20%。

Maybe it's from being an investor, but take a return: say someone made 20%.

Speaker 1

我会想：嗯，那是一段时间内的回报。

It's like, well, it's return over time.

Speaker 1

所以时间这个概念，也是你正在做的事情。

And so the time thing is like what you're doing as well.

Speaker 1

我也有些惊讶：我在2015年关于以太坊和人工智能所设想的许多事情，竟然都已成真。

Part of me is also surprised at how many of the things I imagined in 2015, thinking about Ethereum and AI, have come true.

Speaker 1

现在我看到美国总统每天都在谈论以太坊和人工智能。

And then now I look at the president of the United States talking about Ethereum and AI every day or something.

Speaker 1

这很奇特。

It's peculiar.

Speaker 1

这似乎与《随机漫步的傻瓜》之类的观点背道而驰：你觉得自己擅长预测未来，但也许你只是运气好，下次未必还能做到，等等。

It kind of flies in the face of something like Fooled by Randomness, where it's like, yeah, you have this perception that you were good at predicting the future, but maybe you were just lucky and you won't be able to do it again, and so on.

Speaker 1

所以我对时间有这样一种感觉：有时我觉得一切都已经发生了。

And so I have this feeling about time: sometimes I think everything has already happened.

Speaker 1

我们正被拉向一个事先可知的未来。

We are being pulled towards a future that can be known upfront.

Speaker 1

这与一个量化从业者应该说的话恰恰相反，但有时我确实会对此产生怀疑。

This is like the opposite of what a quant should say, but sometimes I do question that.

Speaker 1

我认为人们确实会这样想。

And I think people do think that way.