本集简介
双语字幕
仅展示文本字幕,不包含中文音频;想边听边看,请使用 Bayt 播客 App。
如果我告诉你,有一个网站可以让你在一个界面里和所有主流 AI 模型聊天,你会怎么想?它有点像 ChatGPT,但每一次提问都会被路由到最适合你这条提示的 AI 模型。在今天的节目里,我们请来了 OpenRouter AI 的创始人兼 CEO Alex Atallah。这是增长最快的 AI 模型市场,提供超过 400 个大语言模型,也是唯一真正了解人们如何使用 AI 模型、更重要的是未来可能如何使用它们的地方。它位于每个人写下的每一条提示和他们可能用到的每一个模型的交汇点。
What if I told you there was a single website you could go to where you can chat to any major AI model from one single interface? It's kind of like ChatGPT, but instead every prompt gets routed to the exact AI model that will do the best job for whatever your prompt might be. Well, on today's episode, we're joined by Alex Atallah, the founder and CEO of OpenRouter AI. It's the fastest growing AI model marketplace with access to over 400 LLMs, making it the only place that really knows how people use AI models, and more importantly, how they might use them in the future. It's at the intersection of every single prompt that anyone writes and every model that they might ever use.
Alex Atallah,欢迎做客节目。兄弟,最近怎么样?
Alex Atallah, welcome to the show. How are you, man?
谢谢各位,很好。非常感谢邀请我。
Thanks, guys. Great. Thanks so much for having me on.
今天是周一。作为 OpenRouter 的创始人,你周末怎么过?大概出门闲逛、放松,完全不想公司的事?
So it is a Monday. How does the founder of OpenRouter spend his weekend? Presumably, you know, out and about, chilling, relaxing, not at all focused on the company?
哦,我特别喜欢那种一个会议都没安排的周末,就去咖啡馆,连续好几个小时做那些需要持续投入、积累势头的事。周六和周日我都这么干了,然后又把《银翼杀手》看了一遍。
Oh, I usually I love weekends with no meetings planned, and I just go to a coffee shop and just have tons of hours stacked in a row to do things that require a lot of momentum build up. So I did that at coffee shops on Saturday and Sunday, and then I watched Blade Runner again.
好的好的。Alex,我们在准备这期节目时,我禁不住想,你这十年的创业经历简直疯狂。OpenRouter 算是你做的第二件大事。
Okay. Okay. Well, so when we were preparing for this episode, Alex, I couldn't help but think that you've had a pretty insane decade of startup foundership. Right? So OpenRouter is kind of like your second major thing that you've done.
但在此之前,你是全球最大 NFT 市场 OpenSea 的联合创始人和 CTO。现在你又专注于一家顶尖的 AI 公司。听起来你正好站在过去十年两大关键技术赛道的交汇点。你能给我们讲讲你是怎么走到今天的,更重要的是你从哪里出发?带我们回顾 OpenSea 的历程,以及你怎么最终来到 OpenRouter AI。
But prior to doing that, you were the founder and CTO of OpenSea, the biggest NFT marketplace out there. And now you're focused on one of the biggest AI companies out there. So it sounds like, you know, you're at kind of like the pivot point of two of the most important technology sectors over the last decade. Can you just give us a bit of background as to, you know, how you ended up here and more importantly, where you started? Walk us through the journey of OpenSea and how you ended up at OpenRouter AI.
好的。2017 年底、2018 年初,我和 Devin Finzer 一起创办了 OpenSea,这是第一个 NFT 市场。它和 OpenRouter 有点像,当时 NFT 的元数据和媒体资源非常碎片化,而 OpenSea 把它们整合到了一起。这也是加密世界里第一次出现非同质化、可以点对点交易的单品。
Yeah. So I cofounded OpenSea with with Devin Finzer at the very beginning of 2018, very end of 2017. It was the first NFT marketplace. And it was, yeah, not dissimilar to OpenRouter in that there was a really fragmented ecosystem of NFT metadata and and, like, media that gets attached to these tokens. And it was the first example of, something in crypto that, like, could could be nonfungible, meaning it's a single thing that can be traded from person to person.
现实世界里大多数东西都是非同质化的,比如一把椅子就是非同质化的,而货币是同质化的。回到 2018 年,几乎没人用非同质化商品的视角去思考加密。而且非同质化商品的问题是,当时根本没有成型的标准。
Most things in the world are nonfungible. Like, a chair is nonfungible. A currency is fungible. So back in 2018, no one was really thinking about crypto in terms of nonfungible goods. And the problem with nonfungible goods is that there weren't any real standards set up.
有很多异构的实现方式,想把一个非同质化物品以去中心化的方式表示并可交易。OpenSea 把这些非常分散的库存整合到一个地方,我们制定了元数据标准,做了大量工作,让每个系列的体验都极其顺畅。今天 AI 领域也能看到很多相似之处:同样是非常异构的生态,不同的大模型提供商有各自的 API 和功能支持。
There's a lot of heterogeneous, like, implementations for how to get, like, a non fungible item represented and tradable in a decentralized way. So OpenSea organized this, like, very heterogeneous inventory and and, put it together in one place. We came up with, like, a metadata standard. We did a lot of, like, a a lot of work to really make the experience super good for each collection. And you see a lot of those, a lot of similarities with how AI works today too, where there's also just a very heterogeneous ecosystem, a lot of different APIs and and different, like, features supported by language model providers.
OpenRouter 也做了大量工作来整合这一切。我在 OpenSea 工作到 2022 年,当时有点想尝试新东西。我八月底离职,几个月后 ChatGPT 就发布了。那段时间我最大的疑问是这会不会是一个赢家通吃的市场,因为 OpenAI 遥遥领先。我们当时有 Cohere Command。
And OpenRouter similarly does a lot of work to organize it all. I was at OpenSea until 2022 when I was kinda feeling the itch to do something new. I left at the very end of August, and then ChatGPT came out a few months later. And my biggest question around that time was whether it was gonna be a winner take all market, because OpenAI was very far ahead of everybody else. And, you know, we had Cohere Command.
我们也有一些开源模型,但只有 OpenAI 真正可用。我用 GPT-3 API 做些小项目来实验。接着 Llama 在一月发布,非常激动人心,体积只有十分之一,在几个基准测试上胜出,但还不能聊天。几个月后,斯坦福的一个团队把它蒸馏成一个叫 Alpaca 的新模型。
We had a couple open source models, but OpenAI was the only really usable one. I was doing little projects to experiment with the GPT-3 API. And then Llama came out in January. Really exciting, about a tenth the size, won on a couple benchmarks, but it wasn't really chattable yet. And it wasn't until a few months later that a team at Stanford distilled it into a new model called Alpaca.
蒸馏的意思是,你用一组合成数据对模型进行定制或微调,这些数据是他们用 ChatGPT 做研究项目生成的。那是我所知的第一次成功的大规模蒸馏,而且模型真的能用。我在飞机上跟它聊天,当时想,哇。
And distillation means you take the model and you customize it or fine tune it on a set of synthetic data, which they made using ChatGPT as a research project. And that was the first successful major distillation that I'm aware of. And it was an actually usable model. I was, like, on the airplane talking to it. I was like, wow.
如果做这样一个模型只花 600 美元,那就不需要 1000 万美元。未来可能有成千上万个模型。突然间,这看起来像是新的经济原语、新的构建块,值得在互联网上拥有自己的位置,但当时没有。没有一个地方可以让你发现新的大语言模型,看看谁在用什么、为什么用,OpenRouter 就是这样起步的。
If it only took $600 to make something like this, then you don't need $10,000,000 to make a model. There might be, like, tens of thousands, hundreds of thousands of models in the future. And suddenly, this started to look like a new economic primitive, a new building block that kind of deserved its own place on the Internet, and there wasn't one. There wasn't a place where you could discover new language models and see who uses them and why, and that's how OpenRouter got started.
太棒了。我们频道特别痴迷的一件事就是探索前沿,如何正确观察这些前沿,分析它们,判断它们何时会发生。我回顾你的经历,你一次又一次展现出这种天赋。甚至早在黑客马拉松里破解 Wi-Fi 路由器时,你就非常超前。
That's amazing. So one of the things that we're obsessed with on this channel in particular is exploring frontiers and how to properly see these frontiers and analyze them and understand when they're gonna happen. And when I was going through your history, you have this talent consistently over time. Even as far back as early on, I read you were hacking Wi-Fi routers in a hackathon. You were very early to that.
你很早进入 NFT,也很早就理解 AI 及其影响。我想请你讲讲,在探索这些新前沿时,你的思维过程和观察指标是什么?因为显然你在做某种模式匹配,显然你对什么会重要、为什么重要有感知,然后把自己嵌入到那个叙事里。
You were early to the NFTs. You were early to understanding AI and the impact that it would have. What I'd love for you to explain is the thought process and the indicators you look for when exploring these new frontiers. Because clearly, there's some sort of pattern matching going on. Clearly, you have some sort of awareness of what will be important and why it will be important and then inserting yourself into that narrative.
有没有固定模式?有没有你寻找的特定信号,让你在寻找新机会时做出这些决策?
Are there patterns? Are there certain things that you look for when searching for these new opportunities and that led you to make these decisions that you have?
我觉得很重要的一点是找到发烧友社区,看看你能不能加入。每当出现有生态系统潜力的新事物,就会有发烧友社区冒出来。互联网让这一切自助化,你可以直接加入。
I think there's there's a lot to be said for finding enthusiast communities and and seeing if you're gonna join it. Like, can you be an enthusiast with them? Like, whenever something new comes out that has, like, some kind of ecosystem potential, there's there are gonna be enthusiast communities that pop up. And the Internet has made it self serve. You could just join the communities.
Discord 是个被严重低估的平台,因为这些社区感觉半私密。你不会觉得是在看人蹭 SEO 流量,Discord 里根本没有 SEO 流量。这里只有人们在聊他们真正热爱的事,而且话题会变得非常细分。当你找到一个 Discord 兴趣小组,里面讨论的是某个刚刚出现、还完全不好用的新技术时,你会看到大家就在那儿一起琢磨该怎么用它、怎么把它变好。我觉得这就是最先浮现的那种核心魔力。
Discord, I think, is an incredible and super underrated platform because the communities feel kind of private. You don't feel like you're, you know, seeing somebody trying to advertise something for SEO juice. There's no SEO juice in Discord. It's just people talking about what they're passionate about, and it gets really niche. And when you find, like, an interest group in Discord that has to do with some new piece of technology that's just being developed right now and doesn't really work very well at all, you get people who are just trying to figure out what to do with it and how to make it better. And I think that's, like, the first core piece of magic that jumps to mind.
你得愿意接受“怪”。因为如果你只看表面,这些东西都很蠢。对,就是个游戏,或者是个特别奇怪的游戏。我其实不算收藏类游戏玩家。
There's gotta be, like, a willingness to be weird. Because, like, if you jump into any of these communities at face value, it's stupid. Yeah. And this is, like, just a game, or it's, like, a really weird game. I mean, I'm not really into collectible games.
所以我随时可能退群。而且你不仅得接受,还得有创意。就像,好吧,这只是区块链上的猫,大家来回撸猫。你不能把社区简单看成这样。
So I'm gonna leave right now. And not only do you have to be aware, but you have to be creative. Like, okay, you know, this is just cats on the blockchain, and people are just, like, trading cats back and forth. You can't, like, look at the community as simply that.
得想想你能用它干点什么。它解锁了什么以前做不到的事?有些人就擅长这个,他们会冲进社区,现场头脑风暴,你能实时看到所有人一起想点子。另一个绝佳例子就是Midjourney的Discord。
Like, think about what you could do with it. Like, what does this unlock that wasn't achievable before? And I think there are people who are just good at this, who will do this, and they'll join the communities and brainstorm live. And you can see everybody brainstorming in real time. But, like, another incredible example of this was the Midjourney Discord.
后来它成了Discord里最大的服务器,远超其他。为什么会这样?起初只是些奇怪、傻气、可能没啥用的东西,但你能看到所有爱好者现场混剪、头脑风暴,把它变得又美又实用,然后它就爆了。我觉得这是Discord上出现过的最神奇的小众社区,从毫无用武之地到疯狂有趣。
Now it became the biggest server in Discord by far. And, you know, why did that happen? Well, it started with something weird, silly, maybe not super useful, but you could see all the enthusiasts, like, remixing and brainstorming live how to turn it into something beautiful and how to make it useful. And then, you know, it just exploded. Like, it's the most incredible, like, niche community I think that Discord has ever seen, because of, like, how useless it started and how insanely exciting it became.
我在2021年玩过一个叫Big Sleep的模型,能生成有点像DeviantArt风格的图。
So, like, I mean, I think I saw Big Sleep. I was, like, playing around with this model called Big Sleep in 2021 that let you generate images that look kind of like DeviantArt.
哦。
Okay.
那些图都是动图,其实看不太懂,但能出很酷的东西,说不定能当桌面壁纸。如果你混DeviantArt社区,你会欣赏这种味道。那一刻我就觉得,这儿有颗小火苗。然后又过了一两年,Midjourney才开始真正火起来。
And you could see like, they're all animated images, and none of them really made sense, but you could get some really cool stuff. Like, potentially something you'd wanna make your desktop wallpaper. And if you're really, like, deep in some DeviantArt communities, you know, you kind of appreciate it. And so that was like, oh, this is, like, a kernel of something here. And it took, like, another year or two before Midjourney started to, like, pick up.
但那就像——
But that was like
Alex,你都在哪儿看到这些的?是随便逛论坛还是跟着鼻子走?
Where were you seeing all of this, Alex? Like, where were you scouring? Just random forums or wherever your nose told you to go?
基本上,有一个 Twitter 账号,我在努力回忆它叫什么,它会发布 AI 研究论文,并尝试展示你能用它们做什么。我在 2021 年左右发现了这个账号,我觉得它跟加密一点关系都没有,但它是一种方式——Big Sleep 是我第一次看到用 AI 生成可能变成 NFT 的东西。于是我开始实验,看看能不能引导它做出一个说得通的 NFT 系列,这非常非常难。但那就是我第一次做生成
Basically, there's this Twitter account, I'm trying to remember what it's called, that posts AI research papers and kind of tries to show what you can do with them. And I discovered this Twitter account in, like, 2021, and I think it wasn't at all, like, related to crypto, but it was a way you know, Big Sleep was, like, the first thing I saw that used AI to generate things that could potentially be NFTs. So I started experimenting around, like, how much you could direct it to make an NFT collection that would make any sense. It was very, very difficult. But that was, like, the first generative
那时候你甚至还没想创办 OpenRouter,对吧?
And this was before you were even thinking about starting OpenRouter. Right?
对对,那时候我还在 OpenSea 全职。哦,对,我想起来了,它叫……
Yeah. Yeah. This was back when I was full time at OpenSea. Oh, yes yeah, I've got it, it's...
这个 Twitter 账号——好,强烈推荐。他们基本上会发论文,然后解释、探索这篇论文怎么变得有用。他们会发动画。
This Twitter account Okay. Alright. Highly recommend it. They basically post papers and, like, explain and and explore how this paper gets useful. They post animations.
他们让 AI 研究变得有点好玩,那就是我的第一次体验。
Like, they make they make AI research, like, kind of fun to engage with, and that was that was my, like, first experience.
所以,这算是 X——当时还叫 Twitter——的一次巨大胜利,对吧?它催生了两大技术:加密,也就是 Crypto Twitter,现在又有了 AI 研究内容,正是这些把你引向了 OpenRouter 的道路。
Okay. So, I mean, that's a massive win for X, or Twitter as it was known back then, as a platform. Right? It gave birth to kind of, like, two of the biggest technologies, crypto, also known as Crypto Twitter. And now apparently, like, you know, all the AI research stuff, which kind of put you on the path that led you to OpenRouter.
所以如果我理解得没错,Alex,你在 OpenSea 全职,那是一家市值数十亿美元的公司,有很多重要的事要做,但你还是抽时间去挖掘这项边缘技术——因为在 GPT-2 或 GPT-3 之前,AI 就是边缘的。你摆弄这些生成式 AI 模型,它们会产出这种“神奇小物质”,可能是一张图或一只怪猫。你跳进了这些爱好者的小众论坛,继续深挖。听起来你离开 OpenSea 后还在打磨它。
So if I've got this right, Alex, you know, you were full time at OpenSea, a multibillion dollar company, loads of important stuff to do there, but you still found the time to kind of scour this fringe technology, because that's what AI was at the time, prior to, kind of, GPT-2 or GPT-3. No one really knew about this. And you were playing around with these gen AI models, these generative AI models that would, you know, create this magical little substance, and maybe it came in the form of a picture or a weird little cat. And you kind of, like, jumped into these niche forums of enthusiasts, as you say, and explored that further. And it sounds like you kind of, like, honed that even beyond your journey from OpenSea when you left.
我记得在你离开 OpenSea 到创办 OpenRouter 之间的那段“深渊期”见过你,当时你在头脑风暴一堆点子。我记得在一次联合办公的休息区里,你白板上写满了 AI 的东西,其中就有“推理”这个词。说实话,Alex,那时候我根本不知道“推理”是啥意思。我满脑子都是 NFT 和加密。
I remember actually meeting you in this kind of abyss between you leaving OpenSea and starting OpenRouter, where you were kind of brainstorming a bunch of these ideas. And I remember a snippet from our conversation in, like, one of the WeWorks here, where you just kind of, like, had whiteboarded a bunch of AI stuff. And one of those things was kind of, like, the whole topic of inference. And if I'm being honest with you, Alex, I had no idea what that word even meant back then. I was extremely focused on all the NFT stuff and all the crypto stuff.
我的背景全在那块,但你总是扎进早期社区,我觉得这真的很重要。我想接着你提到的“玩早期 AI 模型”这个话题深挖一下。你说你在 GPT 之前、Claude 还没诞生的时候,就在玩各种随机模型,在论坛、Twitter 或 Reddit 上找到它们,然后做实验,对吧?
My background's in in all of that, but I I just found that fascinating that you always had your nose in some of the early communities, and I think that's a a really important lesson there. I wanna pick up on something that you actually brought up when you said you discovered kind of like your path to open router, Alex, and that is you said you were playing around with these early AI models. So not the GPTs before Claude was even created. You're playing around with these random models that you would find either on forums, on Twitter, or on Reddit. Right?
你当时就在实验。我觉得特别神奇的是,即便后来 GPT 火了,你仍坚信未来会有数十万个 AI 模型?那时候没人这么想。当时大家都觉得得砸几千万甚至几亿美元,这是富人的游戏。
And you would experiment with them. And I find it fascinating that back then, even when GPT became a thing, you were convinced that there would be hundreds of thou or would you say hundreds of thousands of AI models? Back then that wasn't a normal view. Back then, everyone was like, you need hundreds of millions of dollars. Maybe it was tens of millions of dollars back then, and it was gonna be a rich man's game.
是的。基本上正是 Alpaca 项目让我下定决心,相信未来会有非常非常非常多的模型,而不是只有极少数几个。
Yeah. It was basically the Alpaca project that kind of put me over the fence on there being, like, many, many, many models instead of just a very small number.
那你能给观众解释一下 Alpaca 项目到底是什么吗?
And can you explain what the Alpaca project is for the audience?
好的。Alpaca 项目呢,你知道,在 LLaMA 发布之后,它其实不太能好好聊天。它只是一个文本补全模型。虽然它在几个基准测试上打败了 GPT-3,而且体积大约只有大家以为的 GPT-3 的十分之一,所以本身已经很了不起,但用户体验还不到位。
Yeah. So the Alpaca project, like, you know, after Llama came out, you really could not chat with it very well. It was a text completion model. There were, like, a couple benchmarks where it beat GPT-3, and it was about a tenth the size of what most people thought GPT-3 was sized at. So it was a pretty incredible achievement, but the user experience wasn't there.
于是,Alpaca 项目就用 ChatGPT 生成了一大堆合成输出,然后用这些合成数据去微调 LLaMA。这样做给 LLaMA 带来了两点提升:一是风格,二是知识。风格就是教会它怎么聊天,这正好是用户体验最大的缺口;同时它变聪明了——微调既能传递风格也能传递知识,模型在那些合成数据反映的内容上,之后的基准测试表现就提高了。
And the Alpaca project took ChatGPT and generated a bunch of synthetic outputs, and then they fine tuned Llama on those synthetic outputs. And this did two things to Llama. It taught it style, and it taught it knowledge. The style is, like, how to chat, which was the big user experience gap. And it made it smarter. Fine tuning can transfer both style and knowledge, and the content of that synthetic data was reflected in the model's performance on benchmarks after that point.
所以,如果你能在不泄露全部输入数据的前提下做到这一点,现在就出现了一种通过 API 售卖数据的新商业模式,而不用把所有数据一次性公开,然后再也赚不到钱。于是,全新的数据商业模式诞生了。同时,这也让我们能够朝着开放智能迈进,更快地构建新架构、测试并快速微调——基本上可以站在巨人的肩膀上,不必每次都从零开始。
So if you can do that without revealing all the data that goes in, now there's, like, a way you could sell data via API without just dumping all the data out to the world and then never being able to monetize it again. So there's, like, a brand new business model around data that emerges. And with it, the ability to work towards open intelligence and build, like, new architectures, test them more quickly, and fine tune them quickly. Basically, you can build on top of the work of giants. You don't have to start from zero every time.
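The Alpaca-style distillation recipe described above is easiest to see from the data side. Below is a minimal Python sketch, assuming Alpaca's published `{instruction, output}` JSON records and an approximation of its prompt template (the wording paraphrases the public Alpaca repo); the actual fine-tuning step on these pairs is omitted.

```python
# Sketch: turning teacher-generated synthetic records into fine-tuning pairs.
# The record shape and template approximate the Alpaca project's format.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def to_training_pair(record: dict) -> dict:
    """One synthetic (instruction, output) record -> one prompt/completion pair."""
    return {
        "prompt": ALPACA_TEMPLATE.format(instruction=record["instruction"]),
        "completion": record["output"],  # the teacher model's answer
    }

# Hypothetical teacher outputs; the real project generated roughly 52k of these.
synthetic = [
    {"instruction": "Name the capital of France.", "output": "Paris."},
    {"instruction": "Give an antonym of 'hot'.", "output": "Cold."},
]
pairs = [to_training_pair(r) for r in synthetic]
# A trainer would then fine-tune the base model (e.g. Llama) on `pairs`,
# transferring both chat style and knowledge from the teacher.
```

The key point from the transcript is that only the teacher's *outputs* are published, not the teacher's training data, which is what makes "selling data via API" viable.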
很多最大的开发者体验创新,其实就是给开发者一个更高的台阶起步,这样他们就不用每次都从楼梯最底下开始爬。而Llama对社区的最大贡献就在于此。当然,并不是只有它在搞开源模型。几个月后,Mistral 推出了 7B Instruct,那模型也非常棒。是的。
A lot of, like, the biggest developer experience innovations just involve, like, giving developers a higher stair to start walking up so they don't have to start at the bottom of the staircase every single time. And, you know, that was, like, the big, generous give that Llama had for the community. And that wasn't the only company doing open source models. Mistral came out with 7B Instruct a few months later, and it was an incredible model. Yep.
又过了几个月,他们推出了第一个开放权重的混合专家模型。感觉就是真正的智能,却完全开源。这些模型给全球其他开发者提供了越来越高的台阶,让他们能够——对,基本上就是把整个地球的点子众包出来,并在非常扎实的基础上继续创新。所以,当这幅图景逐渐成形时,我就觉得,好吧。
And they came out with the first open weights mixture of experts a few months later. You know, it just it felt like actual intelligence, but completely open. And all of these provide, like, higher and higher stairs for other developers to kind of, like yeah. Basically, to crowdsource new ideas from the whole planet and and let these new ideas build on top of really good foundations. So and and, you know, when that when that, like, whole picture started to form into place, it felt like, okay.
这将会是一个巨大的“库存”局面,就像当年的NFT藏品一样,虽然两者完全不同,市场动力和买家的目标也完全不一样。所以我早期的很多实验——比如我做了一个叫Window AI的Chrome扩展,还搞了些别的东西。
This is gonna be, like, a, you know, huge inventory situation, kind of like NFT collections were a huge inventory situation. Obviously, completely different, really different market dynamics, really different type of goal that buyers have. And so a lot of, like, my early experimentation like, I made a Chrome extension called Window AI. I did, like, a few other things.
都是为了搞懂这个生态系统到底是怎么运作的,它跟别的有什么不同,以及人们——尤其是开发者——真正想要的是什么。
It was just about learning how the ecosystem works and, like, what makes it different and what people really want, what developers really want.
这就引出了OpenRouter本身。对吧?我想请你给还不了解OpenRouter的听众讲讲它到底是干什么的。因为我觉得很多人用AI的方式就是给自己选的模型发一条提示,他们用ChatGPT,或者用Grok App,或者在Gemini上。
So that leads us to OpenRouter itself. Right? So I kind of want you to help explain to the listeners who aren't familiar with OpenRouter what it does. Because I think for a lot of people, the way they interact with an AI is they send a prompt to their model of choice. They use ChatGPT, or they use the Grok app, or they're on Gemini.
他们生活在这些彼此隔绝的世界里。再往上一步,就是那些专业使用它的人,也就是开发者。他们与 API 交互,也许并不直接接触实际的用户界面,但他们会调用单个模型。OpenRouter 就建立在这之上。
They live in these siloed worlds. Then the next step up from the people are those who use it professionally who are developers. They're interacting with APIs. Maybe they're not interfacing with the actual UI, but they're calling a single model. OpenRouter exists on top of this.
你能给我们讲讲它是如何运作的,以及为什么这么多人喜欢用 OpenRouter 吗?
Can you walk us through how it works and why so many people love using OpenRouter?
OpenRouter 是一个大型语言模型的聚合器和交易市场。你可以把它想象成 Stripe 遇上 Cloudflare 的那种感觉。嗯,对。它就像一块统一的面板。
OpenRouter is an aggregator and marketplace for large language models. You can kind of think of it as, like, a Stripe meets Cloudflare, a little bit of both of them. Mhmm. Yeah. It's like a single pane of glass.
你可以在一个地方编排、发现并优化所有智能需求。一个账单提供商就能让你用上所有模型,现在已经有 470 多个了。就像,所有模型都会实现功能,但方式各不相同。而且还会出现很多 Andrej Karpathy 所说的“智能掉线”现象。
You can orchestrate, discover, and optimize all of your intelligence needs in one place. You know, one billing provider gets you all the models. There's, like, 470 plus now. Like, all the models sort of implement features, but they do it differently. And there's also a lot of, like, intelligence brownouts, as Andrej Karpathy calls them
是啊。
Yeah.
模型总是宕机。就连那些顶级模型,比如 Anthropic、Gemini 和 OpenAI 也是如此。所以我们所做的,就是——开发者需要大量选择,CTO 需要高可靠性,CFO 需要可预测的成本。
Where models just go down all the time. Even, you know, even the top models, like Anthropic and Gemini and OpenAI. So what we do is, you know, like, developers need a lot of choice. CTOs need a lot of reliability. CFOs need predictable costs.
CSO 需要复杂的策略控制。所有这些需求都输入到我们的工作中,也就是打造一块统一面板,让模型更可靠、成本更低、选择更多,然后帮你从所有选项中挑选最适合的智能来源。
CSOs need, like, complex policy controls. All of these are inputs to what we do, which is build like a single pane of glass that makes models more reliable, lower cost, gives you more choice, and and then and helps you choose between all the options for where to source your intelligence.
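The "one billing provider gets you all the models" point is concrete in practice: OpenRouter exposes an OpenAI-compatible chat-completions endpoint. The sketch below only builds a request rather than sending one; the endpoint URL and the `provider` preference fields reflect OpenRouter's public docs at the time of writing, but treat the exact field names as assumptions and check the current documentation before relying on them.

```python
def build_request(prompt: str, model: str, api_key: str):
    """Build one OpenRouter chat-completion request (OpenAI-compatible schema)."""
    url = "https://openrouter.ai/api/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        # Models are namespaced by vendor, e.g. "openai/gpt-4o".
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Provider preferences: prioritize throughput, and let OpenRouter
        # fall back to another provider if the preferred one is down.
        "provider": {"sort": "throughput", "allow_fallbacks": True},
    }
    return url, headers, payload

url, headers, payload = build_request("Hello!", "openai/gpt-4o", "sk-or-...")
# Sending it would be: requests.post(url, headers=headers, json=payload)
```

Swapping models or providers is then a one-string change, which is the "single pane of glass" idea from the transcript.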
它是怎么运作的?我想,H. S. 和我在节目里经常谈到基准测试,对吧,某个模型在编程方面得分最高。这就暗示你也许应该把所有编程需求都交给它,因为它最擅长。但看起来,如果你通过多个提供商路由,情况似乎并非如此。
How does it work? I would imagine, H. S. And I on the show, we frequently talk about benchmarks, right, where a certain model is the best at coding, And that infers that maybe you should go to that model to do all of your coding needs because it's the best at it. But it would appear as if it's not true if you're routing through a lot of different providers.
那么,你如何决定什么时候路由到哪个提供商,以及如何为所提问题获得最佳结果?
So how do you consider which provider gets routed to when and how to get the best result for what you're asking?
所以我们目前采取了不同的思路,即我们并没有专注于一个生产级路由器来替你挑选模型,
So we've taken a different approach so far, which is instead of focusing on a production router that picks the model for you,
嗯。
Mhmm.
我们努力帮你选模型。所以我们构建了大量分析,既针对你的账户,也放在我们的排行榜页面上,帮你浏览并发现那些重度用户在某些工作负载上真正用得好、用得成功的模型。因为我们觉得,今天的开发者主要还是想自己挑模型;在模型家族之间来回切换会带来很多不可预测的行为。但一旦选定了模型,我们就尽量让开发者不用再操心提供商。
We try to help you choose the model. So we create lots of analytics, both on your account and on our rankings page, to help you browse and discover the models that, like, the power users are really using successfully on a certain type of workload. Because we think, like, developers today primarily want to choose the model themselves. Switching between model families can result in, like, very unpredictable behavior. But once you've chosen your model, we try to help developers not need to think about the provider.
有时候,一个模型可能有几十家提供商。各种公司都有,包括超大规模云厂商,像 AWS、Google Vertex、Azure,也有成长型初创,比如 Fireworks、Deep Infra,还有一长尾提供非常独特功能、性能卓越的提供商。他们的差异化点五花八门。所以我们把它们全部集中到一个地方。如果你想要某个功能,就给你列出支持它的那些提供商。
There are, like, sometimes dozens of providers for a given model. All kinds of companies, including the hyperscalers, like AWS, Google Vertex, and Azure, and, like, scaling startups, like Fireworks and DeepInfra, and a long tail of providers that provide, like, very unique features, very, like, exceptional performance. There's all kinds of differentiators for them. So what we do is we collect them all in one place. And if you want a feature, you just get the providers that support it.
如果你追求性能,就优先给你高绩效的提供商;如果你对成本特别敏感,就优先给你今天价格最低的提供商。我们基本上开辟了所有这些通道。可以有无数种路由方式,但你对最终想要达成的整体用户体验拥有完全控制权。这正是我们发现整个生态里缺失的东西。
If you want performance, you get prioritized to the providers that have high performance. If you really are cost sensitive, you get prioritized to the providers that are really low cost today. And we basically create all these lanes. There are, like, innumerable ways you could get routed, but you're in full control of the overall user experience that you're aiming for. And that's what we found was missing from the whole ecosystem.
就是这么一种做法。平均下来,我们通过负载均衡,把请求送到当下最可用、最能处理的那家提供商,能比直接找提供商多出 5% 到 10% 的可用性。我们非常注重效率和性能,只给你的请求增加大约 20 到 25 毫秒延迟,而且全部部署在靠近你服务器的边缘。所以总体来说,我们把提供商叠加起来。
It was just a way of doing that. And we get, on average, like, a five to 10% uptime boost over going to providers directly, just by load balancing and sending you to the top provider that's, like, up and able to handle your request. And we really focus hard on efficiency and performance. We only add about, like, twenty to twenty five milliseconds of latency on top of your request, and it all gets deployed very close to your servers, at the edge. So overall, we stack providers.
我们找出别人都在做什么、你能从中获益的点,然后把大数据的力量直接给到作为开发者的你,让你只需选择模型即可。
We figure out, like, what you can benefit from that everybody else is doing and and just give you the power of big data as a as a developer just accessing your model choice.
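The "lanes" described above reduce to a filter-then-sort over the live providers for a model. This is not OpenRouter's actual code; it is a toy sketch with hypothetical provider records and field names, just to show the shape of the decision:

```python
# Hypothetical provider records; field names are illustrative, not OpenRouter's schema.
PROVIDERS = [
    {"name": "hyperscaler-a", "up": True,  "price": 2.0, "throughput_tps": 40, "supports_tools": True},
    {"name": "startup-b",     "up": True,  "price": 1.2, "throughput_tps": 95, "supports_tools": False},
    {"name": "startup-c",     "up": False, "price": 0.9, "throughput_tps": 80, "supports_tools": True},
    {"name": "hyperscaler-d", "up": True,  "price": 2.5, "throughput_tps": 70, "supports_tools": True},
]

def route(providers, need_tools=False, lane="throughput"):
    """Filter to live providers that support the required features,
    then order the lane by the caller's preference (throughput or price)."""
    live = [p for p in providers if p["up"] and (p["supports_tools"] or not need_tools)]
    key = (lambda p: -p["throughput_tps"]) if lane == "throughput" else (lambda p: p["price"])
    # First entry gets the request; the rest are the fallback order.
    return sorted(live, key=key)

fastest = route(PROVIDERS, need_tools=True, lane="throughput")[0]["name"]
cheapest = route(PROVIDERS, need_tools=True, lane="price")[0]["name"]
```

The uptime boost in the transcript comes from the fallback list: when the first provider browns out, the request simply moves to the next entry in the lane instead of failing.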
所以这相当于让你利用集体的智慧,对吧?你拿到所有数据,掌握所有查询,知道哪种效果最好,于是就能为用户交付最优的产品。
So it kind of allows you to harness the collective knowledge of everybody. Right? You get all of the data. You have all of the queries. You know which yields the best result, and you're able to deliver the best product for them.
现在说到实际的 LLM,EJ 刚才已经调出了排行榜。我很想了解你们怎么看哪些 LLM 最好、如何给它们做基准测试,以及怎样把用户路由过去。你们是否认为基准测试是准确的?在路由流量时,你们会参考这些基准吗?
Now, in terms of actual LLMs, EJ has actually pulled this up just before, which is a leaderboard. I'm interested in how you guys think about LLMs, which are the best, how to benchmark them, and how you route people through them. Do you believe that benchmarks are accurate, and do you reflect those in the way that you route traffic through these models?
总体而言,我们的立场是:我们想成为模型的“资本主义式”基准——到底实际发生了什么?我非常相信大数定律和重度用户的热情对其他人极具价值。比如你在欧洲调用 Claude 4,某家提供商的吞吐量可能瞬间差异巨大;只有别的用户先发现,我们才能检测到。
In general, we have taken the stance that, like, we wanna be the, like, capitalist benchmark for models. Like, what is actually happening? And part of this is that I really think the law of large numbers and the enthusiasm of power users are really, really valuable for everybody else. Like, let's say you're routing to Claude 4 and you're based in Europe. All of a sudden, there might be, like, a huge variance in throughput from one of the providers. And we're only able to detect that if, like, some other users have discovered it before you.
于是我们会绕开那家在欧洲跑得慢的提供商,如果你的数据策略允许,就把你送到别处快得多的提供商,让你获得更高性能。这是在提供商层面,数字如何起作用。在模型选择层面,比如在这个排行榜上,每当我们上线一个新模型——今天我们刚上线一家叫 z AI 的新模型实验室的模型——重度用户会立刻发现它。我们有一个 LLM 爱好者社区,他们会深入挖掘,找出模型在一大堆核心用例上到底擅长什么。重度用户弄清楚哪些工作负载有趣,然后大家就能在数据里看到他们的做法,所有人都能受益。
And so we route around the provider that's, like, running kind of slow in Europe and send you, if your data policies allow it, to a much faster provider somewhere else, and that allows you to get faster performance. So that's, like, on the provider level, how numbers help. On the model selection level, like you see on this rankings page here, when we put up a new model, like, we put up one today from a new model lab called z AI, the power users instantly discover it. We have this LLM enthusiast community that dives in and really figures out what a model is good for, along, like, a bunch of core use cases. The power users figure out which workloads are interesting, and then you can just see in the data what they're doing, and everybody can benefit from it.
这就是为什么我们会在排行榜页面上免费开放并共享我们的数据。
That that's why we, like, open up our data and share it for free on the rankings page here.
Alex,我在所有这些排行榜上看到一个一致的单位,就是token。对吧?Josh和我在节目中聊过这个,但我想知道,你们是怎么选定这个特定单位来衡量这些模型的好坏或效果,或者它们被消费或使用的情况的。你能多讲讲为什么选这个单位吗?这对你们OpenRouter平台来说,又能说明用户是如何使用某个模型的?
I'm seeing this one consistent unit across all these rankings, Alex, which is tokens. Right? And Josh and I have spoken about this on the show before, but I'm wondering how you've chosen this specific unit to measure, you know, how good or effective these models are, or how consumed or used they are. Can you tell us a bit more as to why you picked this particular unit and what that tells you, as, like, the OpenRouter platform, about how a user is using a particular model?
嗯,我觉得美元也是一个不错的指标。我们选token主要是因为价格下降得非常快,OpenRouter从2023年初就有了,我不想因为价格大幅下降就让某个模型在排行榜上吃亏。
Yeah. I mean, I think dollars is a good metric too. The reason we chose tokens is primarily because we were seeing prices come down really quickly. Like, OpenRouter's been around since the beginning of 2023, and I didn't want a model to be penalized in the rankings just because, like, the prices are going down really dramatically. Mhmm.
现在有个悖论叫杰文斯悖论,就是说当价格下降10倍时,用户对某种基础设施的使用量会增加超过10倍。也许他们根本不会计较那点小钱。但我觉得用token还有其他好处。token不会因为价格下降而受罚,也不依赖杰文斯悖论,那种效应会有滞后。
Now, like, there's a paradox called Jevons paradox, which is that when prices decrease, like, 10x, users' use of some, like, component infrastructure increases by more than 10x. And so maybe they wouldn't get penny-wise at all. But I thought there were some other advantages to using tokens too. Tokens, like, don't have this penalty and don't rely on Jevons paradox, which can have, like, a lot of lag. They also are a little bit of a proxy for time.
token也算是一种时间代理。如果一个模型在大量用户间长时间生成很多token,说明很多人真的在读这些token并拿来用。输入也一样,如果我一次性塞给模型一大堆文档,而它的提示定价非常低,我觉得这仍然有价值,我们也想看到这种场景——模型处理了海量文档,这种用例就该在排行榜上体现。
You know, a model that is generating a lot of tokens, and doing so for a while across a lot of users, means that a lot of people are reading those tokens and actually doing something with them. And the same goes for input. If I really want to send an enormous number of documents, and the model has a really, really tiny prompt price, I think that's still valuable and something we wanna see: that this model is processing an enormous number of documents. That's a use case.
这种用例就该在排行榜上体现。所以我们决定用token。未来可能会加上美元,但token没有杰文斯悖论那种滞后,而且当时也没人做整体分析。
That should show up in the rankings. And so we decided to go with tokens. We might add dollars in the future, but tokens don't have this Jevons paradox lag. And there wasn't anything else out there; nobody was doing any kind of overall analytics.
直到几个月前谷歌开始公布Gemini处理的总token量,我们才看到别的公司这么做。所以再看看哪些场景真的需要美元指标吧,但token目前表现不错。
We didn't see any other company even do it until Google did a few months ago, when they started publishing the total number of tokens processed by Gemini. So we'll see which use cases really need dollars, but tokens have been holding up pretty well.
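To make the token-versus-dollar point concrete, here is a toy calculation with entirely made-up numbers: after a 10x price cut and an 8x usage jump (the Jevons-style response Alex describes), a dollar-based ranking shows the model shrinking while a token-based ranking shows it growing.

```python
# Made-up numbers: a model priced at $10 per million tokens drops to $1 per
# million, and usage jumps 8x in response (a Jevons-paradox-style effect).
before = {"tokens": 1_000_000, "dollars": 1_000_000 / 1e6 * 10.0}  # $10/M tokens
after  = {"tokens": 8_000_000, "dollars": 8_000_000 / 1e6 * 1.0}   # $1/M tokens

# A token-based ranking sees growth; a dollar-based ranking sees decline.
print(f"token growth:  {after['tokens'] / before['tokens']:.1f}x")   # 8.0x
print(f"dollar growth: {after['dollars'] / before['dollars']:.1f}x") # 0.8x
```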
这仪表盘太棒了,我推荐所有听不到我们屏幕的听众去OpenRouter网站看看。这两周我一直在认真盯,两周前Grok 4发布了,对吧?
Yeah. I mean, this dashboard is awesome, and I recommend anyone listening who can't see our screen to get on OpenRouter's website and check it out. I've been following it pretty rigorously for the last two weeks, Alex. And what I love is you can literally see it: two weeks ago, Grok 4 got released. Right?
Josh和我做了好多视频,几乎什么事都用它。然后几天后中国出了个新模型叫Kimi k2,我当时想,哦,随便吧,就是个中国模型。
And Josh and I were making a ton of videos on this, and we were using it for pretty much everything we could. And then this other model came out of China pretty much a few days after, called Kimi K2. And I was like, oh, yeah. Whatever. This is just some random Chinese model.
懒得关注。结果它一直出现在我时间线上,我就想,行,试试。我直接上OpenRouter看更广泛AI用户的兴趣,发现它飙升得飞快。对吧?
I'm not gonna focus on it. And then I kept seeing it in my feed and I thought, okay, maybe I'll give this a go. And I went straight to OpenRouter to gauge the interest from a wider set of AI users, and I saw that it was skyrocketing. Right?
然后我看到,Qwen 上周发布了他们的模型。我再次来到 OpenRouter,它就像提前预见了趋势。对吧?人们已经开始用了。所以我喜欢你把 OpenRouter 描述成一种预言球,让爱好者和社区本身可以抢先于非常流行的趋势。
And then I saw that Qwen dropped their models last week. And again, I came to OpenRouter and it had preceded the trend. Right? People had already started using it. So I love how you can describe OpenRouter as this kind of prophetic orb, basically, where the enthusiasts and the community itself can front-run very popular trends.
我认为这是一条非常强大的护城河。沿着这条思路,Alex,我注意到,很多主要模型提供商都看到了它的价值。对吧?所以,如果我没记错,OpenAI 好像利用你们的平台,在他们正式发布之前,秘密地推出了他们的前沿模型。对吧?
And I think that's a very powerful moat. And kind of on this path, Alex, I noticed that a lot of these major model providers see the value in this. Right? So, if I'm not mistaken, OpenAI used your platform to secretly launch their frontier model before they officially launched it. Right?
你能给我们讲讲,这是怎么发生的,更重要的是,他们为什么想这么做,以及为什么选择 OpenRouter 来做这件事?
Can you walk us through how that comes about and, more importantly, why they want to do that and why they chose OpenRouter to do it?
OpenAI 有时会提前把模型给一些客户测试。我们问他们是否愿意和我们一起尝试一个隐形模型,这我们以前从没做过。这需要用另一个名字上线,看看用户在没有先入为主偏见的情况下如何反应。这是一种新的测试方式,对我们和他们都是一次实验,他们慷慨地决定冒险一试。我们和他们一起上线了 GPT 4.1,并把它叫做 quasar alpha。
OpenAI will sometimes give early access to their models to some of their customers for testing. And we asked them if they wanted to try a stealth model with us, which we had never done before. It involved launching it under another name and seeing how users respond to it without any bias or inclination for or against the model at the onset. It would be a new way of testing; it was an experiment for both us and them, and they generously decided to take the leap of faith and try it. And we launched GPT-4.1 with them, and we called it Quasar Alpha.
是的。这是一个百万 token 上下文长度的模型,是他们第一个超长上下文模型,也针对编程做了优化。发生了几件令人难以置信的事。首先,我们有一个由基准测试者组成的社区,他们运行开源基准,我们会给他们很多资助,用 OpenRouter 的代币来支持这些基准。
Yep. And it was a million-token context length model, OpenAI's first very, very long context model. And it was also optimized for coding. There were a couple of incredible things that happened. First, we have this community of benchmarkers that run open source benchmarks, and we give a lot of them grants to help fund the benchmarks, grants of OpenRouter credits.
他们会免费对所有模型跑一整套测试,其中一些非常有创意。比如有一个测试生成小说的能力,有一个测试能不能在 Minecraft 里做出 3D 物体,叫 MCbench,还有一些测试不同编程能力,有一个专门测 Ruby,因为很多模型在 Ruby 上表现不好。
They'll just run their suite of tests against all the models, and some of them are very creative. There's one that tests the ability to generate fiction. There's one that tests whether it can make a 3D object in Minecraft, called MC-Bench. There are a few that test different types of coding proficiency. There's one that just focuses on how good it is at Ruby, because it turns out a lot of the models are not great at Ruby.
有很多语言所有模型都做得很差。所以我们有一长串非常小众的基准,所有测试者都免费在 Quasar Alpha 上跑了一遍,大多数结果都非常惊人。于是 OpenAI 实时得到了这些反馈,我们帮他们找到这些结果,他们又做了一个快照,我们以 Optimus Alpha 的名义上线,这样他们就能比较两个快照得到的反馈。
There are a lot of languages that all the models are pretty bad at. So we have this long tail of very niche benchmarks, and all the benchmarkers ran their benchmarks on Quasar Alpha for free and found pretty incredible results for most of them. So OpenAI got this feedback in real time. We kind of helped them find it, and they made another snapshot, which we launched as Optimus Alpha, so they could compare the feedback they got from the two snapshots.
然后两周后,他们正式向所有人发布了 GPT 4.1。这对我们来说是一次实验,我们后来又和另一家模型提供商做了类似的事,他们还在打磨。这是一种很酷的众包基准方式,能获得你意想不到的测试,也能得到无偏见的社区 sentiment。
And then, two weeks later, they launched GPT-4.1 live for everybody. So it was an experiment for us, and we've done it again since with another model provider that's still working on it. It's kind of a cool way of crowdsourcing benchmarks you wouldn't have expected and also getting unbiased community sentiment.
太棒了。所以现在当我们看到新模型出现,想提前测试 GPT-5,我们知道该去哪儿了。我们会盯着的,因为记得它很快就来,我们在你的观察名单上。不过我想问你开源和闭源的问题,这对我们很重要,我们经常讨论。
That's great. So now when we see a new model pop up and we wanna test GPT-5, we know where to come to try it early. We'll see, because remember, it is coming soon, so we're on your watch list. But I do wanna ask you about open source versus closed source, because this has been an important thing for us. We talk about this a lot.
你手里有大量数据。我看排行榜,开源模型表现很好,闭源的也是。你总体怎么看?你如何看待开源与闭源模型,特别是在你向用户提供服务的方式上?
You have a ton of data on this. I'm looking at the leaderboards; there are open source models doing very well, and closed source ones too. What are your takes in general? How do you feel about open source versus closed source models, particularly around how you serve them to users?
两种模型都有供应问题,但供应问题的性质非常不同。通常,我们在闭源模型上看到的情况是供应商极少,往往只有一两个。比如 Grok,就只有 Grok Direct 和 Azure;Anthropic 这边,有 Anthropic Direct,还有 Google Vertex。
Both types of models have supply problems, but the supply problems are very different. Typically, what we see with closed source models is that there are very few suppliers, usually just one or two. With Grok, for example, there's Grok direct and there's Azure. With Anthropic, there's Anthropic direct and there's Google Vertex.
还有 AWS Bedrock。我们还会在不同区域部署,比如为了只让数据留在欧盟的客户,我们在欧盟也做了部署。我们也会为闭源模型做定制化部署,来保证高吞吐和高额度。但难点在于,需求通常让闭源模型在 OpenRouter 上占了大部分 token。
There's AWS Bedrock. And then we also deploy in different regions. We have an EU deployment for customers who only want their data to stay in the EU. And we do custom deployments for the closed source models too, to guarantee good throughput and high rate limits for people. But a tricky part is that, demand-wise, the closed source models are doing most of the tokens on OpenRouter.
它占主导地位,大概今天 70% 到 80% 的 token 来自闭源模型。但开源模型的供应端更碎片化,就像卖方订单簿一样。每个提供商的速率限制平均来说也更不稳定,超大规模云服务商通常要花一段时间才能上线新的开源模型。
It's dominant; it's probably 70 to 80% closed source tokens today. But the open source models have a much more fragmented supply, like a sell-side order book. Yeah. And the rate limits for each provider are less stable on average. It usually takes a while for the hyperscalers to serve a new open source model.
所以我们在开源模型上做的负载均衡工作往往更有价值。闭源模型的负载均衡主要聚焦在缓存和功能感知,确保你能干净地命中缓存,只在缓存过期时才切换到新提供商。开源模型几乎没什么缓存,很少开源模型实现缓存,所以切换提供商更常见。
So the load balancing work we do on open source models tends to be a lot more valuable. The load balancing work we do for closed source models tends to be very focused on caching and feature awareness: making sure you're getting clean cache hits and only transitioning over to new providers when your cache has expired. For open source models, there's way less caching; very few open source models implement caching. And so switching between providers becomes more common.
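A minimal sketch of that cache-aware routing idea, under the invented assumption of a simple TTL on provider-side prompt caches: requests with a warm cache stick to their provider, and once the cache expires the router is free to rebalance. This is an illustration, not OpenRouter's actual logic.

```python
import time

class CacheAwareRouter:
    """Stick to the provider holding a warm prompt cache; rebalance on expiry."""

    def __init__(self, providers, cache_ttl_s=300):
        self.providers = providers   # ordered by current preference
        self.cache_ttl_s = cache_ttl_s
        self.last_hit = {}           # prompt_prefix -> (provider, timestamp)

    def route(self, prompt_prefix, now=None):
        now = time.time() if now is None else now
        hit = self.last_hit.get(prompt_prefix)
        if hit and now - hit[1] < self.cache_ttl_s:
            provider = hit[0]            # warm cache: keep the same provider
        else:
            provider = self.providers[0]  # cache expired: free to rebalance
        self.last_hit[prompt_prefix] = (provider, now)
        return provider

router = CacheAwareRouter(["vertex", "bedrock"])
print(router.route("sys-prefix", now=0.0))    # vertex (first request)
router.providers = ["bedrock", "vertex"]      # preferences change...
print(router.route("sys-prefix", now=100.0))  # vertex (cache still warm)
print(router.route("sys-prefix", now=500.0))  # bedrock (cache expired)
```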
我们还会追踪不同开源提供商之间的质量差异。有些会用更低的量化级别部署,也就是一种压缩模型的方式,通常对输出质量影响不大。但我们仍会从某些开源提供商那里看到一些奇怪的结果,于是内部会跑测试来检测这些输出,很快会建立更强的机制,把它们踢出路由链路,避免影响用户。
And we also track a lot of quality differences between the open source providers. Some of them will deploy at lower quantization levels, which is kind of a way of compressing the model. Generally it doesn't have an impact on the quality of the output, and yet we still see some odd things from some of the open source providers. So we run tests internally to detect those outputs, and we're building up a lot more muscle here soon, so that they get pulled out of the routing lane and don't affect anyone.
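One plausible way to implement the internal quality tests Alex mentions (purely illustrative, not OpenRouter's actual method) is canary prompts with known answers, pulling any provider whose pass rate falls below a threshold:

```python
# Fixed canary prompts with known answers, used to detect degraded
# (e.g. over-quantized) deployments. Prompts and providers are toy examples.
CANARIES = [("2+2=", "4"), ("Capital of France?", "Paris")]

def evaluate(call_model, threshold=0.9):
    """call_model(prompt) -> completion text; True if the deployment passes."""
    passed = sum(expected in call_model(prompt) for prompt, expected in CANARIES)
    return passed / len(CANARIES) >= threshold

def healthy_pool(providers):
    """Keep only providers whose canary pass rate clears the bar."""
    return [name for name, fn in providers if evaluate(fn)]

good = lambda p: {"2+2=": "4", "Capital of France?": "Paris"}[p]
bad  = lambda p: "I'm not sure."  # stands in for a broken quantization
print(healthy_pool([("provider-a", good), ("provider-b", bad)]))  # ['provider-a']
```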
所以闭源占了大约八成,比例非常大。你觉得这会变吗?刚才那篇帖子说,上周增长最快的十个大模型里,有九个是开源的。每次中国发布新模型,比如一两周前的 Kimi K2,都会把开源前沿往前推一大步。开源的加速速度似乎跟闭源一样快,甚至更快,改进非常迅速。
So closed source accounts for 80% or something like that, a very large amount. Do you see that changing? That post we just had said nine out of the 10 fastest growing LLMs last week were open source. Every time China comes out with another model, like Kimi K2 a week or two ago, it really pushes the frontier of open source forward. The rate of acceleration of open source seems to be as fast, if not faster, than closed source; it's making these improvements very quickly.
因为开源,大家都能贡献,所以能复合加速。你觉得这会不会改变你们发出 token 中开源与闭源的比例?还是你认为趋势仍是 Google、OpenAI 这些闭源模型继续占大头?
It has the benefit of being able to compound in speed because it's open source and everyone can contribute. Do you think that starts to change, where the percentage of tokens you're serving comes from open source models versus closed source? Or do you continue to see a trend where it's gonna be Google and OpenAI serving the majority of these tokens to users?
短期内,开源模型很可能继续在 OpenRouter 上占据增长最快模型的类别。原因是很多用户先冲着闭源模型来,后来想优化——要么想省钱,要么想试一个在他们应用场景里稍好的新模型——就会从闭源跳到开源。所以开源往往是一种最后一公里优化手段。
In the short term, we're likely to see open source models continue to dominate the fastest-growing-model category on OpenRouter. And the reason is that a lot of users come for a closed source model but then decide they want to optimize later. Either they wanna save on costs, or try out a new model that's supposed to be a little bit better in some direction their app or use case cares about, and then they leave the closed source model and go to an open source model. So open source tends to be a last-mile optimization thing. Yeah.
当然这是大方向,因为反过来也会发生。正因为它是最后一公里优化,从完全没人用到被几个离开 Claude 4、想试新编程场景的人采用,这个跳跃会比原本基数就高、增长没那么剧烈的闭源模型更显著。你问题的另一部分,是问会不会出现闭源和……
I'm making a big generalization, because the reverse can happen too. And because it's a last-mile optimization thing, the jump from "this model is not being used at all" to "this model is really being used by a couple of people who have left Claude 4 and want to try some new coding use case" will be bigger than for the closed source models, which start at a really high base and don't have growth quite as dramatic. The other part of your question, though, was whether there's gonna be a flipping of, mhmm, closed and...
或者说,某种逐步削弱闭源 token 垄断地位的趋势。
Or some sort of chipping away at that monopoly of closed source tokens.
这些事情很难预测,因为我觉得目前开源模型最大的问题是激励机制不够强。模型实验室和模型提供商已经建立了如何作为公司成长并吸引高质量AI人才的既定激励方式,而公开模型权重会削弱这些激励。我们未来可能会看到中心化提供商在这方面发挥作用。如果能有很好的激励方案,让高质量人才能在一个至少保持开放权重的开源模型上工作,也许就能解决这个问题。
It's hard to predict these things, because I think the biggest problem today with open source models is that the incentives are not as strong. The model labs and the model providers have established incentives for how to grow as a company and attract good, high-quality AI talent, and giving the model weights away impairs those incentives. Now, this is where we might see the centralized providers helping in the future. A really good incentive scheme that allows high-quality talent to work on a model that at least remains open-weights could fix this.
我自己会尽量贴近去中心化提供商,向他们学习。在推理服务的提供端,确实有一些很酷的激励方案正在开发。但在模型本身的开发上,很遗憾我见得不多。我觉得一旦出现一个这样的方案,就会进入视野。在那之前,我个人持怀疑态度。
I try to stay close to the decentralized providers and learn a lot from them. On the provider side, on running inference, I think there are some really cool incentive schemes being worked on. But on actually developing the models themselves, I haven't seen too much, unfortunately. I think if one shows up, it'll appear on the radar. And until we see one, I personally doubt it.
TBD,你个人对开源和闭源怎么看?这是我们一直在激烈讨论的大话题,涉及对齐和闭源模型的伦理担忧,与开源对比。看竞争对手时,中国通常被贴上开源标签,而美国通常被贴上闭源标签。我们看到Meta发布了开源的Llama模型,但现在他们又在筹集大笔资金,用高薪雇佣很多人,很可能去开发闭源模型。
TBD. Do you have personal takes on how you feel about open source versus closed source? Because this has been a huge topic we've been debating too: the ethical concerns around alignment and closed source models versus open source. When you look at the competitors, China, generally speaking, is associated with open source, whereas the United States is generally associated with closed source. And we saw Meta release the open source Llama models, but now they're raising a ton of money to pay a lot of employees a lot of money to probably develop a closed source model.
所以看起来美国和中国之间的趋势有些分裂。我很好奇,除了OpenRouter之外,你个人更看好哪一方,对美国的长远地位,或者对AI安全与对齐的整体讨论更有利?
So it seems like the trends are kind of split between the US and China. And I'm curious if you have any personal takes, outside of OpenRouter, on which you think serves better for the long-term outlook, on the position of the United States, or just the general safety and alignment conversation around AI?
两者一个非常根本的区别是,开源模型的创新可以比闭源模型的创新更快被复制。所以在速度和领先程度上,这是巨大的结构性差异。理论上这意味着闭源模型应该始终领先,直到出现我刚才说的那种有趣的激励方案。
I mean, a very simple, fundamental difference between the two is that an innovation in open source models can be copied more quickly than an innovation in closed source models. Mhmm. So in terms of velocity, and how far ahead one is over the other, that's a massive structural difference. It means closed source models should theoretically always be ahead, yeah, until a really interesting incentive scheme develops, like I mentioned before.
我看不出有证据表明这种情况会改变。至于中国和美国,我觉得很有意思的是中国还没有出现重大闭源模型,我也想不出未来为什么不会出现。我预测中国会出现闭源模型。像DeepSeek、月之暗面以及Qwen可能已经建立了黏性很强的人才池。
And I don't see evidence that that's going to change. In terms of China versus the US, I think it's very interesting that China has not had a major closed source model, and I'm not aware of any reason that won't be the case in the future. My prediction is that there's going to be a closed source model from China. It's possible that DeepSeek, and Moonshot, and Qwen have built up really sticky talent pools.
但一般来说,人才池在几年后总会有人离职去创办新公司,形成新的人才池。所以我们应该能看到这种情况。AI领域不像对冲基金那样有严格的保密协议或竞业限制,未来也许会有,但如果目前的竞业文化继续,中国应该会出现更多新公司,我敢打赌其中一些会是闭源的。
But generally, with talent pools, after enough years have passed, people quit and go create new companies and build new talent pools. So we should see some of that. It's not the case that the AI space has the NDAs or non-competes that the hedge fund space has. That might happen in the future too. But assuming the current non-compete culture continues, there should be more companies that pop up in China over time, and I'm betting that some of them will be closed source.
我猜测两国最终会看起来越来越像。
And my guess is that the two nations will start to look more similar.
是啊,所以扎克伯格才会砸3亿到10亿美元的年薪挖这些人。还有一个关于中美的问题,我基本同意你的看法。
Yeah. I guess that's why you have Zuck dishing out $300 million to billion-dollar salary offers to a bunch of these guys. Right? One more question on China versus the US. I kind of agree with you.
我真没想到中国会在任何领域带头搞开源,更别说我们这个时代最重要的技术了。你觉得他们构建这些模型的秘诀是什么,Alex?我知道这可能超出了OpenRouter的专业范围,但作为一个研究这项技术多年的人,我很难弄清楚他们的优势在哪。他们不断发现新技术,也许答案很简单,就是资源受限。
I didn't really expect China to be the one to lead open source anything, let alone the most important technology of our time. What do you think is their secret sauce for building these models, Alex? And I know this might be outside the forte of OpenRouter specifically, but as someone who has studied this technology for a while now, I'm struggling to figure out what advantage they had. They're discovering all these new techniques. And maybe the simple answer is constraints.
对吧?他们拿不到英伟达的全部芯片,也没有无限的算力。所以也许他们被迫想办法绕开西方公司专注的那些同类问题。但很明显,美国虽然资金充足,也还没能取得这些前沿突破。
Right? They don't have access to all of NVIDIA's chips. They don't have access to infinite compute. So maybe they're forced to figure out other ways around the same kinds of problems that Western companies are focused on. But it's pretty clear that America, with all its funding, hasn't been able to make these frontier breakthroughs.
所以我很好奇,你是否知道中国AI研究者或者那些每天都在OpenRouter上露脸的AI团队,有什么技术护城河是美国没有的?
So I'm curious whether you're aware of some kind of technical moat that Chinese AI researchers, or these AI teams featuring on OpenRouter day in and day out, have over the US.
嗯,我不太确定。确实有一些他们提出的东西,比如DeepSeek在他们的论文里发表了很多很酷的推理创新。但他们原始R1论文里的很多内容,其实是OpenAI自己也独立做过的,我之前提到过。所以在推理侧以及部分模型侧,我觉得DeepSeek——我们在R1发布前就跟他们团队聊了好几年——他们之前出过好多模型,一直是一支在推理方面非常犀利的团队。
Well, I don't know. There are certainly some techniques they've come up with. DeepSeek had a lot of very cool inference innovations that they published in their paper. But a lot of what they published in the original R1 paper were things that OpenAI had done independently themselves, as I mentioned before. So on the inference side, and on some of the model side, I think DeepSeek, we had talked to their team for years before R1 came out. They had many models before that, and they were always a pretty sharp team for doing inference.
比如,他们早就做出了最好的提示缓存用户体验,价格比谁都便宜。在DeepSeek R1出来之前,他们就已经是我们所知的最强的中国团队了。所以我猜,在中国想留下来的人里,他们确实积累了一批人才,这绝对是巨大优势。美国公司显然没在做这种事。
They came up with the best user experience for caching prompts long before DeepSeek R1 came out, and they had very good pricing. They were by far the strongest Chinese team we were aware of well before that happened. So I'm guessing there was some talent accumulation they were working on in China, for people who wanted to stay in China. And yeah, that's a huge advantage. American companies are obviously not doing that.
扎克伯格说得挺对,很多都靠人才。AI领域开放且可组合,就像一棵巨大的知识树。一篇论文出来会引用20篇其他论文,你可以去把这些被引论文全读一遍,然后就能大致理解这篇论文。
Zuck is very on point that a lot of this is just based on talent. A lot of AI is open and out there, and very composable. It's like a big tree of knowledge. A paper comes out and it cites 20 other papers, and you can go and read all of the cited papers, and then you have kind of a basis for understanding the paper.
但你得再往下挖两层,把被引论文的参考文献也全读了,才能真正搞懂发生了什么。能做到这一点的人极少,而且需要多年经验,才能把大量根本没写进论文的知识真正用起来。能在模型各个维度上领衔研究的人,数量非常有限。中美之间的边界其实挺明显。
But you really have to go a level deeper and read all the cited papers two levels down to really understand what's going on. Very few people can do that, and it takes a lot of years of experience to actually apply that knowledge and learn all these things that have not been written in any paper at all. There's just such a small number of people who can really lead research on all the different dimensions that go into making a model. And the border between China and the US is pretty defined.
你得离开中国、搬到美国,并在这里真正扎根。所以我觉得存在国家套利,也有对冲基金背景套利,还有硬件套利——大量硬件只在中国有,反之亦然,这就创造了机会。
You have to leave China and move to the US and really establish yourself here. So I do think there's country arbitrage. There's the hedge fund background arbitrage. There's hardware arbitrage: there's a ton of hardware that's only available in China and not here, and vice versa, and that creates an opportunity.
这种情况会一直持续下去。
This will just continue to happen.
是啊,这种套利真有意思。我读到说全球大概也就不到200到250名研究者,能进这些前沿AI模型实验室。我查了最近中国开源模型Kimi k2的团队背景,这模型排名爆表,好像有万亿参数,非常夸张。
Yeah. I think this arbitrage is fascinating. I read somewhere that there are probably fewer than 200 or 250 researchers in the world who are kind of worthy of working at some of these frontier AI model labs. And I looked into some of the backgrounds of the team behind Kimi K2, which is this recent open source model out of China that kind of broke all these crazy rankings. I think it was like a trillion-parameter model, something crazy like that.
他们很多人曾在顶级美国科技公司工作,又都毕业于中国同一所大学——清华,据说那是中国AI界的哈佛。挺疯狂的。不过Alex,我想把话题转到你前面提到的数据问题上。
And a lot of them worked at some of the top American tech companies, and they all graduated from this one university in China, Tsinghua, which apparently is like the Harvard of AI in China. Right? So pretty crazy. But, Alex, I wanted to shift the focus of the conversation to a point you brought up earlier in this episode, which is around data.
明白了吗?这就是背景,我和Josh已经深入聊过这件事。我们完全被OpenAI的一个功能迷住了,就是记忆功能,对吧?
Okay? So here's the context; Josh and I have spoken about this at length. Right? We are obsessed with this feature on OpenAI, which is memory. Right?
我知道很多其他AI模型也有记忆功能,但我们之所以这么喜欢它,是因为我觉得这个模型了解我,Alex。我觉得它知道我的一切。它能为我量身定制任何提示。
And I know a lot of the other AI models have memory as well. But the reason we love it so much is that I feel like the model knows me, Alex. I feel like it knows everything about me. It can personally curate any of my prompts.
它懂我,知道我想要什么,然后直接给我,我就能立刻上手,做我自己的事。现在OpenRouter就像查询层之上的一层。对吧?所以有很多人写各种奇奇怪怪的提示,然后通过OpenRouter路由到不同的AI模型。
It just gets me. It knows what I want, it just serves it up to me, and off I go, doing my thing. Now, OpenRouter sits on top of kind of the query layer. Right? So you have all these people writing all these weird and wonderful prompts and routing them through to different AI models.
你们掌握了所有这些数据,也许你们可以访问这些数据。我知道你们还有一个叫"私人聊天"的功能,你们不会访问那些数据。跟我聊聊OpenRouter和你们打算怎么处理这些数据。因为在我看来,你们其实拥有最好的护城河,甚至可能比ChatGPT还好,因为你们有来自各种用户、面向各种模型、各种不同类型的提示。理论上,如果你们愿意,你们可以为每个用户打造最个性化的AI模型。
You hold all of that data; maybe you have access to all of that data. And I know you have something called private chat as well, where you don't have access to it. Talk to me about what OpenRouter is thinking about doing with this data. Because presumably, or in my opinion, you guys actually have the best moat, arguably better than ChatGPT, because you have all these different types of prompts coming from all these different types of users for all these different types of models. So theoretically, you could spin up some of the most personal AI models for each individual user if you wanted to.
我说得对吗,还是我在胡说八道?
Do I have that correct, or am I speaking crazy?
不,你说得对。这确实是我们正在考虑的事情。默认情况下,你的提示不会被我们记录,新用户的提示和完成内容我们默认不保存。你需要在设置里手动开启。
No, that's true. It's something we're thinking about. By default, your prompts are not logged at all; we don't store prompts or completions for new users by default. You have to toggle it on in settings.
但确实,很多人选择开启。因此,我觉得我们拥有目前最大的多模态提示数据集。但我们现在几乎没做什么处理。我们只对其中极小一部分进行了分类,就是你在排行榜页面上看到的那部分。但在个人账户层面,其实可以做三件事。
But, yeah, a lot of people do toggle it on. And as a result, I think we have by far the largest multimodal prompt dataset. But to date we've barely done anything with it. We classify a tiny, tiny subset of it, and that's what you see on the rankings page. But what could be done on a per-account level is really three main things.
第一,直接实现记忆功能。你今天就可以通过把OpenRouter和“记忆即服务”结合起来实现。有几家公司在做这个,比如MEM0和SuperMemory。我们可以和其中一家合作,或者自己做类似的东西,提供大量分发渠道。这样基本上就能得到一个“ChatGPT即服务”,感觉模型真的了解你,上下文会自动加到你的提示里。
One, memory right out of the box. You can get this today by combining OpenRouter with a memory-as-a-service; there are a couple of companies that do this, like Mem0 and SuperMemory. We could partner with one of those companies, or do something similar, and provide a lot of distribution. That basically gets you a ChatGPT-as-a-service, where it feels like the model really knows you and the right context automatically gets added to your prompt.
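Layering a memory service over the router, as described, roughly means: retrieve stored user facts relevant to the new prompt and prepend them as context. Real services like Mem0 use embedding search; plain keyword overlap stands in for retrieval in this sketch, and all names and data are invented.

```python
# Toy "memory as a service": keyword-overlap retrieval plus prompt assembly.

def retrieve(memories, prompt, k=2):
    """Return up to k stored memories that share at least one word with the prompt."""
    words = set(prompt.lower().split())
    scored = [(len(words & set(m.lower().split())), m) for m in memories]
    return [m for score, m in sorted(scored, reverse=True)[:k] if score > 0]

def build_prompt(memories, user_prompt):
    """Prepend relevant memories so the model 'knows' the user."""
    context = retrieve(memories, user_prompt)
    header = "".join(f"[memory] {m}\n" for m in context)
    return header + user_prompt

memories = [
    "User prefers Python code examples",
    "User is building a podcast transcription app",
]
print(build_prompt(memories, "Show me a code example for chunking audio"))
```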
另外两件事是,更智能地帮你选模型。有很多模型之间需要做非常明确的迁移决策。我们从数据里能看得很清楚,但现在我们只能通过某种沟通渠道告诉客户,比如:我们发现你大量用了这个模型。
The other thing we can do is help you select the right model more intelligently. There are a lot of models where there's a super clear migration decision that needs to be made. We can see this very clearly in the data, but right now we just have some kind of communication channel open with the customer, and we can tell them: hey, we noticed you're using this model a ton.
它已经弃用了,这个新模型明显更好,你应该把这类任务迁过去。或者,这类任务如果换过去,价格会便宜很多。这就是我们目前唯一做过的、基于数据的指导性路由,其实可以做得更智能、更开箱即用、更深度集成到产品里。
It's been deprecated; this model is significantly better; you should move this kind of workload over to it. Or: for this workload, you'll get way better pricing if you do this. That's the only sort of guidance and opinionated routing we've done so far, and it could be more intelligent, a lot more out of the box, a lot more built into the product.
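Those migration nudges could be generated mechanically from usage data; a toy sketch with an entirely invented model catalog and prices:

```python
# Invented catalog: maps each model to its deprecation status, designated
# successor, and price per million tokens.
CATALOG = {
    "old-model-v1": {"deprecated": True,  "successor": "new-model-v2", "usd_per_mtok": 10.0},
    "new-model-v2": {"deprecated": False, "successor": None,           "usd_per_mtok": 2.0},
}

def migration_advice(usage):
    """usage: {model: tokens_used} -> human-readable migration suggestions."""
    tips = []
    for model, tokens in usage.items():
        info = CATALOG.get(model)
        if info and info["deprecated"] and info["successor"]:
            new = CATALOG[info["successor"]]
            saved = (info["usd_per_mtok"] - new["usd_per_mtok"]) * tokens / 1e6
            tips.append(f"Move {model} -> {info['successor']} (saves ~${saved:.2f})")
    return tips

print(migration_advice({"old-model-v1": 5_000_000, "new-model-v2": 1_000_000}))
# ['Move old-model-v1 -> new-model-v2 (saves ~$40.00)']
```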
最后一件事我们可以做的,我是说,可能还有无数我们根本没想到的事情。但比如,变得非常、非常聪明地观察模型和提供商如何响应提示,并向你展示最酷的数据。就像告诉你哪些提示去了哪些模型,这些模型又是如何回复的,并且用各种有趣的方式来刻画回复。比如,模型是否拒绝回答?拒绝率是多少?
And then the last thing we can do, I mean, there are probably tons of things we're not even thinking about, but it's getting really, really smart about how models and providers are responding to prompts, and showing you the coolest data. Telling you what kinds of prompts are going to which models and how those models are replying, and characterizing the reply in all kinds of interesting ways. Did the model refuse to answer? What's the refusal rate?
模型是否成功调用了工具,还是决定忽略你传进去的所有工具?这一点非常关键。模型有没有注意到它的上下文?在你把它发给模型之前,有没有发生某种截断?所以,有各种各样的小众情况会让开发者的应用变蠢,而这些都是可以被检测到的。
Did the model successfully make a tool call, or did it decide to ignore all the tools you passed in? That's a huge one. Did the model pay attention to its context? Did some kind of truncation happen before you sent it to the model? There are all kinds of edge cases that cause developers' apps to just get dumber, and they're all detectable.
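The response-health signals listed here (refusals, ignored tools, truncation) are straightforward to compute once responses are logged. A hypothetical sketch; the field names and refusal heuristics are invented for illustration:

```python
# Classify each completion and aggregate per-batch rates for refusals,
# ignored tool calls, and truncation.
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable")

def classify(response):
    text = response.get("text", "").lower()
    return {
        "refused": any(m in text for m in REFUSAL_MARKERS),
        "ignored_tools": bool(response.get("tools_offered")) and not response.get("tool_calls"),
        "truncated": response.get("finish_reason") == "length",
    }

def rates(responses):
    flags = [classify(r) for r in responses]
    return {key: sum(f[key] for f in flags) / len(flags) for key in flags[0]}

sample = [
    {"text": "I can't help with that.", "finish_reason": "stop"},
    {"text": "Done.", "tools_offered": True, "tool_calls": [], "finish_reason": "stop"},
    {"text": "Here is the first part of", "finish_reason": "length"},
]
print(rates(sample))  # each rate is 1/3 for this sample
```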
我很高兴你这么说,因为我有一个可能不算太热门的观点:我其实觉得现在所有前沿模型都已经足够强大,可以为每个用户做出最疯狂的事情,只是我们还没能解锁它,因为它缺乏上下文。当然,你可以给它接一堆不同的工具,但如果它不知道什么时候用工具,或者如何处理某个提示,又或者用户自己不知道该怎么读懂AI模型的输出——就像你刚才说的,我们需要某种分析手段——那我们就像无头苍蝇一样乱撞。所以我很高兴你提到这一点。另一件我想听听你看法的数据相关的事情是,我觉得“AI智能体”这个概念正在变成一个巨大的趋势,Alex。
I'm so happy you said that, because I have this kind of hot take, or maybe not-so-hot take, which is that I actually think all the frontier models right now are good enough to do the craziest stuff ever for each user, but we just haven't been able to unlock it because they don't have the context. Sure, you can attach a model to a bunch of different tools, but if it doesn't know when to use a tool, or how to process a certain prompt, or if users themselves don't know how to read the output of the AI model, like you just said, we need some kind of analytics into all of this, then we're just walking around like headless chickens. So I'm really happy you said that. One other thing I wanted to get your take on, on the data side of things: I just think this whole concept of AI agents is becoming such a big trend, Alex.
我注意到很多前沿模型实验室发布的新模型会启动多个实例,各自被赋予特定角色。比如,你去负责研究,你去负责编排。
And I've noticed a lot of frontier model labs release new models that kind of spin up several instances of their AI model, and each is tasked with a specific role. Right? Okay, you're gonna do the research. You're gonna do the orchestrating.
你通过浏览器去网上查资料,等等等等。然后它们在这轮小搜索结束后聚在一起,完善答案,再呈现给用户。对吧?Grok 4 这么做,Claude 也这么做,还有其他一些模型。我觉得,有了你说的这些数据,OpenRouter 完全可以把这也做成一个功能,对吧?
You're gonna look online via a browser, and so on. And then they coalesce together at the end of that little search, refine their answer, and present it to someone. Right? Grok 4 does this, Claude does this, and a few other models. I feel like with this data you're describing, OpenRouter could offer that as a feature, right?
也就是,你现在可以拥有超级直观、上下文丰富的智能体,它们不仅能跟你聊天或回答提示,还能替你完成一大堆其他操作。这个看法合理吗,还是说可能超出了 OpenRouter 的范畴?
Which is essentially: you could now have super intuitive, context-rich agents that can do a lot more than just talk to you or answer your prompts; they could probably do a bunch of other actions for you. Is that a fair take, or is that something that might be out of the realm of OpenRouter?
我们的战略是成为智能体最好的推理层。而且,我觉得开发者想要的是对智能体如何工作的控制权。我们的开发者至少喜欢把我们当作单一窗口来做推理,但他们想看到并控制智能体的样貌。智能体本质上就是在循环里做推理并控制方向的东西。
Our strategy is to be the best inference layer for agents. And what I think developers want is control over how their agents work. Our developers, at least, want to use us as a single pane of glass for doing inference, but they want to see and control the way an agent looks. An agent is basically just something that is doing inference in a loop and controlling the direction it goes.
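That definition, inference in a loop that controls its own direction, can be sketched in a few lines. The `infer` callable and tool registry below are stand-ins for a real chat-completions client; a scripted stub plays the model so the loop is runnable:

```python
# Minimal agent loop: call inference, execute any requested tool, feed the
# result back, and stop when the model produces a final answer.

def run_agent(infer, tools, task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        action = infer(history)                          # one inference step
        history.append({"role": "assistant", "content": str(action)})
        if action["type"] == "final":
            return action["answer"]
        result = tools[action["tool"]](*action["args"])  # control the direction
        history.append({"role": "tool", "content": str(result)})
    raise RuntimeError("agent did not finish")

# Scripted "model": first request a tool call, then return a final answer.
script = iter([
    {"type": "tool_call", "tool": "add", "args": (2, 3)},
    {"type": "final", "answer": "2 + 3 = 5"},
])
print(run_agent(lambda h: next(script), {"add": lambda a, b: a + b}, "What is 2+3?"))
# 2 + 3 = 5
```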
所以我们想做的就是写出超棒的文档,提供非常棒的底层原语,让这件事变得简单,这样你知道,我们的很多开发者其实就是做智能体的人。他们想要的是原语问题被解决,这样他们就能不断迭代新版本、新点子,而不用一遍又一遍地重新实现工具调用。这问题挺难的,因为几乎每天都有新模型或新提供商出现,而且人们真的想要、真的在用。所以把这一切标准化,让这些工具真正可靠,就是我们想聚焦的地方,这样智能体开发者就不用操心了。
So what we wanna do is build incredible docs and really good primitives that make that easy, because I think a lot of our developers are just people building agents. What they want is for the primitives to be solved, so they can keep creating new versions and new ideas without reimplementing tool calling over and over again. And it's a tough problem, given that there's a new model or provider every day, and people actually want them and use them. So standardizing this, making these tools really dependable, is where we wanna focus, so that agent developers don't have to worry about it.
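The "inference in a loop" idea Alex describes can be sketched in a few lines. This is a hypothetical, minimal agent loop: `call_model` is a stub standing in for a real chat-completion request (for example, a POST to OpenRouter's OpenAI-compatible `/api/v1/chat/completions` endpoint), and `search_web` is an invented placeholder tool.

```python
# Minimal sketch of an agent: inference in a loop, with tool dispatch.

def search_web(query: str) -> str:
    # Hypothetical tool; a real agent would call a search API here.
    return f"results for: {query}"

TOOLS = {"search_web": search_web}

def call_model(messages):
    # Stub for a real completion request. A real model decides,
    # per turn, between emitting a tool call and a final answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "search_web", "args": {"query": messages[-1]["content"]}}
    return {"answer": "done: " + messages[-1]["content"]}

def run_agent(prompt: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):              # the loop IS the agent
        out = call_model(messages)
        if "answer" in out:                 # model chose to stop
            return out["answer"]
        result = TOOLS[out["tool"]](**out["args"])   # dispatch the tool call
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"
```

The "primitives" Alex mentions are exactly the parts this sketch glosses over: serializing tool schemas, parsing each provider's tool-call format, and retrying, which is what a routing layer can standardize.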
随着我们越来越接近甚至超越 AGI,我很好奇 OpenRouter 的终局是什么,如果你们有的话。你们希望最终到达的“主计划”是什么?因为假设这些系统变得越来越智能,能够自己做决定、选工具,OpenRouter 在继续路由这些数据方面扮演什么角色?你们有没有一个宏大的愿景,看到这一切最终走向哪里?
As we level up closer and closer to AGI and beyond, I'm curious what OpenRouter's endgame is, if you have one. What is the master plan, where you hope to end up? Because the assumption is that as these systems get more intelligent, as they're able to make their own decisions and choose their own tool sets, what role does OpenRouter play in continuing to route that data through? Do you have a master plan, a grand vision of where you see all of this heading?
你的意思是,随着智能体越来越擅长选择它们使用的工具,当它们真的很擅长这事时,我们的角色会变成什么?
You're saying as agents get better at choosing the tools that they use, what becomes our role when the agents are really good at that?
是的,是的。那么,你觉得 OpenRouter 在这个图景中扮演什么角色?对于 OpenRouter 的未来,最理想的情况是什么?
Yes. Yes. And where do you see OpenRouter fitting into the picture? What would be the best-case scenario for this future of OpenRouter?
目前,OpenRouter 是一个"自带工具"的平台。我们还没有一个 MCP 市场。而且我认为,最常用的工具会是开发者自己配置的那些,智能体只要被赋予访问权限就能工作。我觉得 OpenRouter 的"圣杯"与生态的演进方式有关:我的预测是,所有模型都会加入状态和其他形式的粘性,让你想一直用它们。
Right now, OpenRouter is a bring-your-own-tool platform. We don't have a marketplace of MCPs yet. And I do think most of the most-used tools will be ones that developers configure themselves; agents just work with whatever they're given access to. I think the holy grail for OpenRouter has to do with how the ecosystem evolves: my prediction is that all the models are gonna add state and other kinds of stickiness that just make you wanna stick with them.
所以它们会加入服务端协议。它们会加入有状态的网页搜索或记忆功能,会加入各种东西,试图阻止开发者离开,增加锁定效应。而 OpenRouter 反其道而行之,我们希望开发者不会感到被供应商锁定。
So they're gonna add server-side protocols. They're gonna add web search that's stateful or has memory. They're gonna add all kinds of things that try to prevent developers from leaving and increase lock-in. And OpenRouter is doing the opposite: we want developers to not feel vendor lock-in.
我们希望他们觉得自己有选择权,可以随时使用最智能的模型,即使之前没留意,也永远不会太晚切换到更聪明的模型。这对我们来说就是一个持续的好结果。所以我觉得我们最终会做的是,与其他公司合作,或者必要时自己构建工具,让开发者不会感到被困住。这就是——你知道,生态有很多种演进方式,但简而言之,我就是这样看的。
We want them to feel like they have choice, and that they can use the best intelligence even if they haven't been paying attention; it's never too late to switch to a more intelligent model. That would be a good always-on outcome for us. So what I think we'll end up doing is partnering with other companies, or building the tools ourselves if we have to, so that developers don't feel stuck. There are a lot of ways the ecosystem could evolve, but that's how I'd put it in a nutshell.
好的。还有一个私人问题我很好奇,因为我当时也和你一起经历了加密周期,NFT 爆火的时候,我是 OpenSea 的重度用户,那是一股先涨后跌的潮流。NFT 逐渐熄火,不再那么热门,AI 接过了风头。虽然是完全不同的受众,但情况类似,现在 AI 成了全球最火的东西。
Okay. Now, there's another personal question I was really curious about, because I was right there with you in the crypto cycle when NFTs got absolutely huge; I was a big user of OpenSea. It was this trend that went up and then went down. NFTs kind of fizzled out, weren't as hot anymore, and AI took the wind out of their sails. It's a completely separate audience, but a similar thing, where now it's the hottest thing in the world.
我很好奇你怎么看接下来的趋势。这是像周期一样有起有落,还是一条单行道,每天更多token、更多AI?你觉得它是周期性的,还是一路向上、向右的单向趋势?
And I'm curious how you see the trend continuing. Is this a cyclical thing with ups and downs, or is it a one-way trajectory: more tokens every day, more AI every day? Do you see it being cyclical, or a one-way trend, up and to the right?
NFTs 某种程度上会跟着加密货币走。当加密货币涨跌时,NFTs 通常会滞后一点,但大体上也有类似的起伏。而加密货币本身是一个极其长期的赌注,像是在构建一个全新的金融体系,有太多原因让它不会一夜之间实现,而且都是非常根深蒂固的原因。
NFTs kind of follow crypto indirectly. When crypto has ups and downs, NFTs generally lag a bit, but they have similar ups and downs. And crypto is an extremely long-term play on building a new financial system, and there are so many reasons that's not gonna happen overnight, very entrenched reasons.
相比之下,AI 正在发生一些一夜之间就能改变商业的事情。我觉得 AI 进展更快的原因之一,是它只是让计算机更像人类。如果你所在的公司已经在和一群人合作,那就只需要做一些工程,需要思考如何扩展。但总的来说,在见识到可能性之后,我认为推理会成为所有公司增长最快的运营支出。
Whereas with AI, there are some overnight business transformations going on. One of the reasons AI moves a lot faster, I think, is that it's just about making computers behave more like humans. So if a company already works with a bunch of humans, then there's some engineering that needs to be done, some thinking about how to scale it. But in general, after seeing what's possible, I think inference will be the fastest-growing operating expense for all companies.
就会变成:我们只需点一下按钮,就能招到高绩效员工。它们表现都可预测,全是 AI,我们可以衡量,它们 24 小时工作,弹性扩展。这并不难。
It'll be like: oh, we can just hire high-performing employees at the click of a button. They all perform predictably, they're all AI, we can measure them, they work twenty-four-seven, and they scale elastically. It's not that hard.
这不需要巨大的思维模型转变,只是对如今大多数公司运作方式的一次大升级。所以它和加密货币完全不同。除了都是新的这一点,NFTs 和 AI 从根本上就是非常不同的变革。
It's not a huge mental-model shift; it's just a huge upgrade to the way companies work today, in most cases. So it's completely different from crypto. Other than both being new, NFTs and AI are fundamentally very different changes.
你现在可能是世界上极少数对每一个 AI 模型都有疯狂洞察的人之一,绝对比普通用户多得多,对吧?我现在有三四个订阅,就觉得自己很厉害了。而你能在 OpenRouter 上接触到 400 多个模型,具体是多少来着?457 个模型。
You're probably one of very few people in the world right now with crazy insights into every single AI model, definitely more than the average user. Right? I have three or four subscriptions right now, and I think I'm a hotshot. You get access to, what is it, 457 models right now on OpenRouter.
所以我有一个很明显的问题想问你,我不会说“未来几年”,因为这个领域变化太快了。但在接下来的六个月里,有没有什么是你觉得AI领域应该重点关注的?也许是某些模型的设计方式,或者是应用层上现在还没人谈论的东西。因为我们之前聊到,你总能很早发现这些趋势,我想知道你现在有没有看到什么。
So an obvious question I have for you: I'm not gonna say "in the next couple of years," because everything moves way too quickly in this sector. But over the next six months, is there anything really obvious to you that should be focused on within the AI sector? Maybe it's the way certain models should be designed, or perhaps it's something at the application layer that no one's talking about right now. Because, going on from the earlier part of our conversation, you pick these trends out really early, and I'm wondering if you see anything.
而且这不一定要和 OpenRouter 相关,只要是 AI 相关的都行。
And it doesn't have to be OpenRouter-related. It could just be AI-related.
我看到模型的发展趋势是,它们越来越重视"资源利用能力",而不是"知识储备"。我不知道有多少模型实验室真的深信这一点,但有几家确实在谈这个。不过我觉得这还没真正影响到应用层,因为人们还是会问 ChatGPT 问题,如果知识答错了,他们就会觉得模型很蠢。这其实是一种糟糕的评估方式。
I've seen the models trending towards caring more about how resourceful they are than what knowledge they have in the bank. I don't know how many of the model labs really deeply believe that, but a couple of them talk about it. And I don't think it's really hit the application space yet, because people will ask ChatGPT things, and if the knowledge is wrong, they think the model is stupid. That's just kind of a bad way of evaluating a model.
就像一个人知道什么、能回忆起某件事,并不能代表他有多聪明。模型的智能和实用性,最终会体现在它使用工具的能力上,以及它在超长上下文中的注意力表现上。也就是它的总记忆容量和准确性。所以我觉得这两点应该被更强调。也许模型的所有知识都应该来自在线数据库、实时抓取的网页索引,以及大量实时更新的数据源。
Whatever a person recalls happened at a certain time is not a proxy for how smart they are. The intelligence and usefulness of a model is gonna trend towards how good it is at using tools, and how good it is at paying attention across a long, long context: its total memory capacity and accuracy. So I think those two things need to be emphasized more. It might be that models pull all of their knowledge from online databases, from real-time scraped indices of the web, along with a ton of real-time-updating data sources.
它们会一直依赖某种数据库来获取知识,但依赖自己的推理过程来调用工具。你知道吗?我们每周花最多时间的事情,就是工具调用,研究怎么把它做得特别好。人类和动物最大的区别,就是我们是用工具、造工具的物种。而人类的加速发展和创新,也正是来源于此。
They'd always be relying on some sort of database for knowledge, but relying on their reasoning process for tool calling. You know, we spend probably the plurality of our time every week on tool calling and figuring out how to make it work really well. The big difference between humans and animals is that we're tool users and tool builders, and that's where human acceleration and innovation has happened.
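For concreteness, the tool-calling interface being discussed here generally follows the OpenAI-style function-calling schema, which OpenRouter's chat-completions API is compatible with: each tool is declared as a JSON-Schema description sent alongside the messages. This is a sketch with a hypothetical `get_weather` tool; the model id is just an illustrative example.

```python
# A tool declaration in the OpenAI-style function-calling schema
# (the shape accepted in a chat-completions "tools" array).
# get_weather is a hypothetical example tool.

get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {              # JSON Schema for the arguments
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

# A request body pairs the tools with the conversation, e.g.:
request_body = {
    "model": "anthropic/claude-3.5-sonnet",   # example OpenRouter model id
    "messages": [{"role": "user", "content": "Weather in Lisbon?"}],
    "tools": [get_weather_tool],
}
```

The standardization problem Alex describes is that each provider emits and parses this contract slightly differently, so a routing layer has to normalize both the declarations going in and the tool-call messages coming back.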
所以我们要怎么让模型非常高效地创造工具、使用工具?目前这方面几乎没有基准测试,也很少有前人研究。有一个叫 τ-bench 的,用来衡量模型工具调用能力,也许还有几个别的。比如 SWE-bench,是用来测模型在多轮编程任务中的表现。
So how do we get models creating tools and using tools very, very effectively? There are very few benchmarks, very little prior art. There's τ-bench for measuring how good a model is at tool calling, and maybe a few others. There's SWE-bench for measuring how good a model is at multi-turn programming tasks.
但这个测试非常难跑,成本也高,比如对 Sonnet 来说,跑一次可能要花 1000 美元。现在用来评估模型真正智能的用户体验并不好。所以虽然 OpenRouter 今天还没列出这些基准,但我本人非常喜欢基准测试。我觉得应用生态和开发者生态应该花更多时间,做出更酷、更有趣的基准。
It's very hard to run, though. For Sonnet, it could cost a thousand dollars to run it. The user experience for evaluating the real intelligence of these models is not good. So, as much as we don't have benchmarks listed on OpenRouter today, I love benchmarks, and I think the app ecosystem and developer ecosystem should spend a lot more time making very cool and interesting ones.
另外,我们会给最优秀的基准测试提供信用额度资助。
Also, we will give credit grants for all the best ones.
所以,非常推荐大家试试。
So, highly encourage it.
Alex,感谢你今天的宝贵时间。我们的节目也快结束了。这次对话真的很精彩。从非AI的OpenSea一路走来,到如今的OpenRouter,你的整个旅程充分展现了这些技术的演进方向,更重要的是,我们未来会走向何方。我非常期待看到OpenRouter在提示路由之外的更多发展。
Well, Alex, thank you for your time today. I think we're coming up on a close now. That was a fascinating conversation, man. Your entire journey, from the non-AI stuff at OpenSea all the way to OpenRouter, has been a great indicator of where these technologies are progressing and, more importantly, where we're gonna end up. I'm incredibly excited to see where OpenRouter goes beyond just prompt routing.
我觉得你提到的数据相关功能会非常有趣,甚至可能成为你们的一大亮点。所以我对未来的发布充满期待。正如Josh刚才所说,如果GPT-5要首发在你们平台,请给我们一些额度,我们肯定想用。对于本节目的听众,正如你们所知,我们一直在邀请最有趣的人来聊聊AI和前沿科技。
I think some of the stuff you spoke about on the data side of things is gonna be fascinating, and arguably one of your bigger features, so I'm excited for future releases. And as Josh said earlier, if GPT-5 is releasing through your platform first, please give us some credits; we would love to use it. But for the listeners of this show: as you know, we're trying to bring on the most interesting people to chat about AI and frontier tech.
希望你们喜欢本期节目。一如既往地,请点赞、订阅,并分享给任何可能感兴趣的朋友。下期再见,谢谢大家。
We hope you enjoyed this episode. And as always, please like, subscribe, and share it with any of your friends who would find it interesting, and we'll see you on the next one. Thanks, folks.
关于 Bayt 播客
Bayt 提供中文+原文双语音频和字幕,帮助你打破语言障碍,轻松听懂全球优质播客。