本集简介
Episode Overview
双语字幕
Bilingual Subtitles
你好,欢迎收听数据工程播客,这是一档关于现代数据管理的节目。
Hello, and welcome to the Data Engineering Podcast, the show about modern data management.
如果你领导一个数据团队,你一定深有体会。
If you lead a data team, you know this pain.
每个部门都需要仪表板、报告和自定义视图,而他们都会来找你。
Every department needs dashboards, reports, custom views, and they all come to you.
所以,你要么成为拖慢所有人进度的瓶颈,要么花所有时间去开发一次性工具,而不是做真正有价值的数据工作。
So you're either the bottleneck slowing everyone down, or you're spending all your time building one off tools instead of doing actual data work.
Retool 提供了一种打破这种循环的方法。
Retool gives you a way to break that cycle.
他们的平台允许员工在你公司数据的基础上构建自定义应用,同时确保所有数据的安全性。
Their platform lets people build custom apps on your company data while keeping it all secure.
只需输入一个提示,比如‘帮我构建一个自助式报告工具,让团队能从 Databricks 查询客户指标’,他们就能立即获得一个具备权限控制和治理机制的生产级应用。
Type a prompt like build me a self-service reporting tool that lets teams query customer metrics from Databricks, and they get a production ready app with the permissions and governance built in.
他们可以自助服务,而你则能重新夺回自己的时间。
They can self serve, and you get your time back.
这是没有混乱的数据民主化。
It's data democratization without the chaos.
今天就访问 dataengineeringpodcast.com/retool 了解 Retool,看看其他数据团队是如何扩展自助服务的。
Check out Retool at dataengineeringpodcast.com/retool today, that's R-E-T-O-O-L, and see how other data teams are scaling self-service.
因为说实话,我们都得重新思考如何处理数据请求。
Because let's be honest, we all need to retool how we handle data requests.
你的主持人是托比阿斯·麦肯伊,今天我将采访拉吉·舒克拉,探讨如何构建自我改进的AI系统,以及它们如何在真实生产环境中实现AI的可扩展性。
Your host is Tobias Macey, and today I'm interviewing Raj Shukla about building self improving AI systems and how they enable AI scalability in real production environments.
那么,拉吉,你能先介绍一下自己吗?
So, Raj, can you start by introducing yourself?
嗨,托比阿斯。
Hi, Tobias.
很高兴来到这里。
Very nice to be here.
谢谢你的邀请。
Thanks for having me.
我叫拉吉·舒克拉。
My name is Raj Shukla.
我是SymphonyAI的首席技术官。
I am the CTO at SymphonyAI.
SymphonyAI是一家垂直领域AI公司,这意味着我们一次专注于一个行业,深入运用AI代理和AI模型,实现端到端的业务流程自动化,并帮助构建自主型企业。
SymphonyAI is a vertical AI company, which means we take one domain or one vertical at a time, and we go very deep with AI agents, AI models to achieve end to end business process automations and just generally helping build the autonomous enterprise.
这就是我们向客户销售的产品。
That's what we sell to our customers.
我们会提供代理即服务、模型即服务,以及一些将这一切整合起来的应用程序。
So it'll be agents as a service, models as a service, and some applications that bring it all together.
我已在SymphonyAI工作了三年半。
I've been at SymphonyAI for the last three and a half years.
与真正的AI以及真实行业、工厂车间、杂货店,还有金融机构对抗犯罪一起工作,这段旅程非常精彩。
It's been a great journey working with true real AI and real industries and factory floors and grocery stores and, you know, financial institutions fighting crime.
这就是我在这里的历程。
So that's my journey here.
在这之前,我曾在微软工作,担任过公司内多个职位,主要集中在应用人工智能、机器学习以及工程领导岗位。
And before this, I used to be at Microsoft, kind of had roles across the company, mostly in applied AI, machine learning, engineering leadership positions at Microsoft.
你还记得你是如何开始涉足机器学习和人工智能领域的吗?
And do you remember how you first got started working in the ML and AI space?
是的,我记得。
Yes, I do.
当时正好是我攻读计算机科学专业的时候,快完成本科学业、即将进入硕士阶段,机器学习系统正开始流行起来。
I was a computer science major, and right around when I was finishing my bachelor's and coming into my master's, machine learning systems were becoming popular.
因此,我接触到了机器学习,真正了解了它背后的数学基础,比如优化问题及相关领域。
So I got introduced to machine learning properly, with its mathematical foundations in optimization problems and related areas.
所以我的起点是机器学习的一些理论,之后才转向应用机器学习。
So it started with a bit of theory around machine learning and then got into applied machine learning.
在攻读硕士期间,我研究的是大规模网络系统中的异常检测系统。
And as a part of my master's, I was doing anomaly detection systems on very large scale network systems.
比如电信网络,如何检测异常,如何构建能够快速预测并响应异常的系统等等?
So, telecom networks, for example: how do you detect anomalies, and how can you build systems that predict and react fast to anomalies, and so on?
所以,可以说,在我学习计算机科学的过程中,我也接受了一些机器学习的训练。
So, you know, as a part of my kind of growing up in computer science itself, I was trained a bit in machine learning.
当我进入行业时,我的第一份真正的工作就是在搜索和广告领域。
And when I came into the industry, my first real jobs were in these areas of search and advertising.
我的第一份工作是做点击预测模型,帮助广告商确定在网页上如何投放广告等。
So my first job was in click prediction models that allowed, you know, advertisers to figure out where to place their ads on pages, etcetera.
进入微软后,我最初的工作是搜索排名问题,这是当时业界最早大规模应用的机器学习模型之一。
And once I got into Microsoft, my first work was on search ranking problems, which involved some of the early at-scale machine learning models applied in the industry.
我一直身处这个领域,并且一直很享受它。
So always been in this space, always enjoyed it.
这一切运作起来有种神奇的感觉。
There is an element of magic to how it all works.
这就是让我持续前行的原因。
So that's what keeps me going.
现在深入探讨自我提升型AI这个概念,显然这是非常必要的,因为AI模型本身在重新训练或后训练之前基本上是静态的。
And now digging into this concept of self improving AI, obviously, is a very necessary element of it because the AI models themselves are largely static until you retrain them or post train them.
我只是想知道,你能否先概述一下什么是自我改进的AI系统,以及你如何判断系统是否具备这种进化能力?
And I'm just wondering if you can just start by giving an outline of what constitutes a self improving AI system and what are the signals that you look to for whether the system is capable of that type of evolution.
是的,当然可以。
Yeah, absolutely.
一旦将现代基础模型和基于这些基础模型的智能体系统应用于工业场景,这一点就变得至关重要。
This is very critical once you apply the modern foundation models and agentic system based on these foundation models out in the industry.
所以,你有环境这个概念。
So, you know, you have the concept of the environment.
环境实际上是一个真实世界系统,智能体或你的模型在其中运行并试图完成某项任务,同时受到某些外部变量的影响,因此可能存在触发条件。
An environment is really a real-world system in which an agent or one of your models is operating and trying to do a task, under the influence of certain external variables, so there can be triggers.
例如,人们从超市货架上购买商品,货架上的商品逐渐售罄。
For example, people are buying things off of grocery shelves, and the shelves are going empty.
而有一个监控系统正坐在那里观察这一情况。
And there is a monitoring system sitting there observing it.
因此,这可能会产生一个触发条件。
And so that can create a trigger.
当系统对这些采取行动后,环境会产生相应的反应,对吧?
And once the system takes actions on that, there is a reaction to it from the environment, right?
这种反应可能是正面的,说明这个行动是好的,也可能是负面的,说明这个行动并不好。
And the reaction could be on the positive side that that action was good, or it could be on the negative side that that action wasn't good.
例如,在打击金融犯罪中,模型和代理可以检测某项活动是否属于欺诈或疑似洗钱行为。
So, for example, in financial crime fighting, the models and agents can detect whether something is fraudulent or looks like a money laundering activity.
最终,会有一位人类进行调查,模型会给出建议,而人类可能会说:不,这不对,因为他们知道或发现了某些情况;或者他们认为这是对的,而且基本正确。
And at the end of it, there is a human investigating: the model makes a recommendation, and the human could say, "No, that is not right," because of something they know or found out, or confirm that it is largely right.
因此,环境定义了代理运行时所依赖的全部输入变量和条件。
So the environment defines the conditions, all the input variables, under which the agent is operating.
一旦代理采取行动,就会收到某种反馈,表明这个行动是对还是错。
And once the agent takes an action, there is some feedback coming whether that is right or not.
关键是,无论是在代理内部使用自学习模型,还是通过智能的记忆更新或其他技术,一旦获得反馈,就应当有所调整,对吧?
And the idea is that, whether it is with a self-learning model inside the agent or through intelligent memory updates or other techniques, once you get the feedback, something should change, right?
因此,系统应该说:好吧,我知道我犯了错误,并且收到了相关的反馈。
So the system should say, okay, I know I made a mistake and I got feedback for it.
我正在更新某些东西,无论是内存数据库、核心LLM配置,还是模型本身,我都试图加以改进。
And I'm updating something, whether it's a memory database or whether it's its core LLM configuration or the model itself that I'm trying to improve.
下次这种情况就不会再发生了,当然这是从概率上来说的。
Next time, this will not happen, probabilistically speaking, of course.
这正是我们所追求的目标。
And that's what we try to achieve.
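The observe-act-feedback-update loop Raj describes can be sketched in a few lines. This is a minimal illustration, not SymphonyAI's actual implementation; the class name, the memory-based update, and the keyword-based default policy are all hypothetical stand-ins.

```python
class SelfImprovingAgent:
    """Agent that records corrective feedback and consults it on later inputs."""

    def __init__(self):
        self.memory = {}  # observation -> corrected action

    def act(self, observation: str) -> str:
        # Prefer a previously corrected action if this case was seen before.
        if observation in self.memory:
            return self.memory[observation]
        return self.default_policy(observation)

    def default_policy(self, observation: str) -> str:
        # Placeholder for the underlying model call.
        return "flag" if "unusual" in observation else "pass"

    def receive_feedback(self, observation: str, correct_action: str):
        # The environment (e.g. a human investigator) says what was right;
        # update memory so the mistake is less likely to repeat.
        self.memory[observation] = correct_action


agent = SelfImprovingAgent()
first = agent.act("large transfer, unusual hours")               # initial decision
agent.receive_feedback("large transfer, unusual hours", "pass")  # human overrides
second = agent.act("large transfer, unusual hours")              # corrected next time
```

In a real deployment the "something should change" step could instead update an LLM configuration or retrain a model; the dictionary here just makes the loop concrete.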
我认为这其中既有实际的考量,也有将环境视为强化学习(RL)环境的理论方法。
I think there are practical considerations of it, and then there are theoretical ways of treating the environment as a reinforcement learning or RL environment.
同时,也有将它视为带有触发器和接口的自学习系统的实际做法。
And there are practical aspects of treating it as a self-learning system with triggers and hooks and so on.
所以这个领域非常令人兴奋,我认为它已经准备好在今年和明年大放异彩了,我不会说它是颠覆性的,但肯定会迎来爆发。
So the field is very exciting, and I think it's primed, I would not say for disruption, but to go big this year and next.
在模型所处系统的环境互动这一环节中,构成该环境的组成部分有哪些?比如数字基础设施或物理操控?
And in that element of interacting with the environment of the system that the model is operating within, what are some of the components that comprise that environment, whether that's digital infrastructure, physical manipulation?
特别是当你开始考虑机器人这类应用时,我想知道,为了让AI系统具备这些强化和学习循环,需要哪些关键要素?
So particularly if you're starting to think towards things like robotics, I'm just wondering if you can talk through some of the pieces that are necessary in order for an AI system to have some of those reinforcement and learning loops.
是的
Yeah.
对
Yeah.
这是个好问题。
That's a good question.
所以我认为,环境的核心方面首先是你们如何数字化正在输入的信息,无论是核心数据还是其他输入数据。
So I think the first core aspect of the environment is how you are digitizing the information that is coming in, starting with the core data.
有时候这很明显,有时候却很难,对吧?
And sometimes that's obvious, sometimes that's hard, right?
当你在处理工厂车间或我们运营的杂货店等物理环境时,必须部署基于视觉的图像传感器来采集输入数据。
When you're working with physical environments like the factory floors or grocery stores we operate in, you have to have some vision-based, image-based sensors out there feeding in input.
这些数据会被数字化,形成你们的核心数据层,触发器等机制都基于此建立。
And that is going to be digitized, forms your core data layer on which these triggers are created, etcetera.
而在行动端,有时行动只是向人类发出通知,让他们做出调整。
And on the flip side, on the action side, sometimes the actions are notifications to humans to change something.
例如,去工厂车间或超市货架上补货。
So for example, to go restock the items on the factory floor or on the grocery floor.
有时则是根据代理所观察到的情况,订购下一批货物。
And sometimes it's about ordering your next batch of goods to come in based on what the agent sees.
因此,在这两方面,我认为从根本上来说,你必须思考信息的数字化形式,以及它的物理实现方式。
So on both sides, I think fundamentally you have to look at what the digital form of the information is and what its physical translation is.
这可以复杂到边缘计算或边缘机器学习模型,也可以简单到一个API连接,用于在操作端触发某些动作。
And it can be as complex as edge computing or edge ML models, and it could be as simple as an API connection that actually triggers something on the action side.
其中一个让我想到的、用于引导特定模型并提升其能力的方面,是广义上被称为上下文工程的内容,即确保模型在特定时刻能够访问到特定信息。
One of the pieces that comes to mind, as far as ways of guiding a particular model and improving its capabilities, is what's broadly being termed context engineering: making sure that the model has access to particular information at a particular point in time.
另一个组成部分是‘代理记忆’这一概念,即模型在执行过程中决定某个信息对未来有用。
Another component is this idea of agentic memory where the model, as part of its execution decides that there's a particular piece of information that is relevant for future use.
因此,它会决定将该信息存入某种形式的上下文存储中,无论是专用的记忆层,还是某个地方的文本文件。
And so it will decide to push that into some form of context store, whether that's a dedicated memory layer or just a text file somewhere.
我只是想知道,你能否区分一下,你所理解的自改进系统这一整体概念,与广泛使用的‘代理系统’术语之间有何不同?你如何判断自己拥有的是哪一种?
And I'm just wondering if you can give some differentiation between the way you think about this overall concept of self-improving systems versus the agentic-system terminology that's also broadly used, and how you can decide which one you have?
是的,当然。
Yeah, absolutely.
我认为它已经经历了多次演变。
I think there has been many evolutions of it.
如果我从两年前甚至更早的早期演变开始说起,当时的早期智能体系统中,有一个领域叫做上下文学习,本质上是指当智能体在执行某个任务或步骤时,其决策会受到提示的引导。
So if I were to start from the very early evolution that existed even two years ago, the early agentic systems, there is this area of in context learning, which is basically saying if an agent at a particular task or a step is making a decision, it is guided by a prompt.
那么,这个提示是受什么引导的呢?
Now what is the prompt guided by?
提示通常由一些示例引导,这属于少样本学习的一种形式。
The prompt is guided by some examples usually, which is a few shot kind of exercise.
即使在两年前,人们就已经知道可以将这些少样本示例根据具体问题动态调整。
Now, even two years ago, people knew that you could adjust those few shots to be dynamic to the problem.
因此,根据你收到的输入类型,你可以选择不同的示例集合,以使结果稍好一些。
So depending on what type of input you are getting, you could select a different set of examples to make the result a little better.
这本身极大地提升了模型能够完成的任务能力。
And that itself gave huge improvements in what the models could accomplish.
所以在这种情况下,学习循环会非常简单。
So in this case, the learning loop would be quite simple.
代理在某个特定步骤上做出决策。
The agent, at a particular step, makes a decision.
你会收到反馈,包括它哪里做错了,或者哪里做对了的例子。
You get feedback back in terms of examples of where it got wrong or examples of where it got right.
在下一个输入到来的上下文中,你选择合适的例子。
And just in context of the next input coming in, you choose the right examples.
对吧?
Right?
这是一种非常简化的形式。
And that's a very simplistic form.
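That simple loop, choosing few-shot examples dynamically per input, can be sketched as follows. The lexical-overlap similarity is a toy stand-in for embedding similarity, and the example bank and labels are purely illustrative.

```python
def similarity(a: str, b: str) -> float:
    # Toy lexical overlap (Jaccard); production systems would use embeddings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def select_examples(examples, query, k=2):
    # Rank stored (input, label) pairs by similarity to the new input.
    ranked = sorted(examples, key=lambda ex: similarity(ex[0], query), reverse=True)
    return ranked[:k]

def build_prompt(examples, query):
    # Assemble the dynamically chosen few-shot examples into a prompt.
    shots = "\n".join(f"Input: {i}\nLabel: {o}" for i, o in examples)
    return f"{shots}\nInput: {query}\nLabel:"

bank = [
    ("wire transfer to new offshore account", "suspicious"),
    ("monthly rent payment to known landlord", "normal"),
    ("grocery purchase at local store", "normal"),
]
chosen = select_examples(bank, "wire transfer to offshore shell account")
prompt = build_prompt(chosen, "wire transfer to offshore shell account")
```

The learning loop then reduces to updating the bank with freshly labeled feedback, so the next input draws on it.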
另一方面,极端的做法是真正建立强化学习环境,通过强化学习技术训练语言模型,有时我们会有可验证的反馈。
Now, the extreme on the other side is to really set up a true RL environment and have the language model trained through reinforcement learning with the right kinds of techniques. In some cases, we have verifiable feedback.
因此你可以进行这种RLVR类型的循环。
So you can do these RLVR kind of loops.
在某些情况下,你会采用GRPO类型的策略,根据收到的反馈构建奖励系统。
In some cases, you do GRPO kind of policies where you are creating reward systems out of feedback you are getting.
但从某种意义上说,你确实在进行真正的学习,而不仅仅是更新一些人工制品。
But in a sense, you are truly learning, not just updating artifacts.
因此,这才是真正的学习形式。
So that's the true form of learning.
而这个持续学习的模型,正是指导代理所执行决策的核心。
And then this always-learning model is what guides the decision step that the agent is operating on.
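As a rough illustration of the verifiable-feedback (RLVR) and group-relative (GRPO-style) ideas mentioned here, the sketch below reduces the reward step to exact-match checking and a group-mean baseline. The actual policy-gradient update is omitted, and all names are hypothetical.

```python
def verify(answer: str, ground_truth: str) -> float:
    # Verifiable feedback: reward 1.0 only for an exactly checkable answer.
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

def group_relative_advantages(rewards):
    # GRPO-style idea: score each sampled completion relative to the group mean,
    # so completions are compared against each other for the same prompt.
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]

# Imagine 4 sampled completions for the same prompt, scored against a checkable label.
completions = ["42", "41", "42", "forty-two"]
rewards = [verify(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
```

In a full RL loop these advantages would weight the gradient update on the policy model; here they just show how raw feedback becomes a reward signal.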
我认为,最近像 Claude 这样的智能体和 OpenClaw 这样的代理系统变得非常有趣,因为它们找到了一个中间地带:反馈循环确实存在,但并没有直接进入一个经过强化学习训练的系统。
I think what is getting very interesting lately, with agents like Claude and agentic systems like OpenClaw, is that there is a middle ground: the feedback loop is coming in, but it is not going directly into an RL-trained system.
它也不是以原始提示的形式输入。
It's not going as raw prompt.
而是被作为记忆进行更新,并在合适的时机、合适的上下文中被调用。
It's actually being updated as memory, as an intelligent memory that gets pulled in at the right context at the right time.
而且,这种记忆的更新也不仅仅是简单地将反馈循环直接追加进去。
And the update of that memory is also not just a pure append of the feedback.
这是一种智能的追加方式,意味着通过调用大语言模型,可以根据这次互动来更新用户的口味和偏好。
It's an intelligent append, in the sense that there is an LLM call that says you can update the taste and preferences of this user based on this interaction.
你可以更新,你知道的,有情景式的记忆,还有随机性或基于时间的记忆等等。
And there are episodic sorts of memories, and stochastic or time-based kinds of memories, and so on.
但这些记忆都可以保存在相当简单的、基于文件系统的架构中。
And you can keep all these memories in rather simple file-system-based architectures.
而智能体足够聪明,能在恰当的时候作为后台进程选择并更新正确的记忆。
And the agent harness is intelligent enough to pick the right memory and update the memory at the right time as a background process.
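The "intelligent append" can be sketched as a background write to a per-topic memory file, where `consolidate()` stands in for the LLM call that merges feedback rather than blindly appending it. The file layout and field names are illustrative only.

```python
import json
import tempfile
from pathlib import Path

def consolidate(existing: dict, feedback: dict) -> dict:
    # Stand-in for an LLM call that intelligently merges new feedback
    # into the stored preferences instead of appending it verbatim.
    merged = dict(existing)
    merged.update(feedback)  # later feedback wins
    merged["updates"] = existing.get("updates", 0) + 1
    return merged

def update_memory(root: Path, topic: str, feedback: dict) -> dict:
    # Background-process style read-merge-write on a simple file-system store.
    path = root / f"{topic}.json"
    existing = json.loads(path.read_text()) if path.exists() else {}
    merged = consolidate(existing, feedback)
    path.write_text(json.dumps(merged))
    return merged

root = Path(tempfile.mkdtemp())
update_memory(root, "user_prefs", {"report_format": "pdf"})
state = update_memory(root, "user_prefs", {"report_format": "xlsx"})
```

The agent harness would then pull the relevant topic file into context at the right time, exactly as described above.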
因此,我们实现的更多是一项工程壮举,而非科学突破。
So it's really an engineering feat that we are accomplishing more than a science feat.
我认为这确实正变得比基于提示的上下文学习更好。
I think it's definitely turning out to be better than prompt-based in-context learning.
而在另一方面,基于强化学习的真正学习模型更难实现。
And on the other side, the RL based true learning models are harder to implement.
因此,从实际角度来看,这感觉像是一个合适的中间地带,正在获得广泛认可,我们也在产品中采用它。
So from a practicality of it, it feels like the right middle ground, which is gaining traction, and we are adopting it in our products as well.
除了特定的记忆系统之外,代理在信息的创建、检索和整理中发挥作用的上下文工程,还涉及各种工具调用、MCP服务器,甚至代理间的交互。
And another piece of context engineering, beyond the specific memory system where the agent is involved in the creation, retrieval, and curation of those pieces of information, is the idea of the various tool calls, MCP servers, or even agent-to-agent interactions that are involved.
我想知道,你如何看待这些工具使用能力与这些工具所提供信息的演进,与记忆系统、强化学习或模型微调之间的区别?
And I'm wondering if you can give your sense of whether and how you differentiate between those tool use capabilities and the evolution of information available from those tools versus this idea of the memory system or reinforcement learning or model fine tuning.
是的。
Yeah.
我认为,工具使用领域在过去大约一年里也经历了非常迅速的发展。
I think the tool use area overall has also seen a very rapid evolution in the last, I would say, one year or so.
对吧?
Right?
我记得我们,作为SymphonyAI,运营的领域都是金融犯罪等高度监管的行业。
I remember that we, as SymphonyAI, operate in very regulated industries, financial crime fighting and so on.
因此,我们对代理准确性的要求非常高,需要达到90%以上,甚至99%。
And so our need for our agents to get things right is very high, in the high 90s, even 99%.
所以我们倾向于不把事情交给大语言模型。
So we tended to not leave things to LLMs.
我们开发了数百种非常具体的确定性工具。
And we had made hundreds of tools, which were very specific, deterministic tools.
我们曾依赖代理来正确调用这些工具。
And we used to rely on our agents to do the right tool calling.
在工具内部,你会进行确定性的计算,这样我们就不会给大语言模型留下在执行这类计算时产生幻觉的空间,对吧?
Within the tools, you would do deterministic calculations so that we don't leave a chance for LLMs to hallucinate while doing those sorts of calculations, right?
所以这是一个非常复杂的系统,但它能确保结果准确。
So it was a very complex system, but it used to get things right, etc.
然后,模型在工具使用等方面持续改进。
And then models kept improving at tool usage in general.
但特别是,它们在某些类型的工具及其使用上进步显著。
But particularly, they kept improving in certain kinds of tools and their usage a lot.
比如,搜索作为一种工具变得非常流行。
Like, so search as a tool became very popular.
代码执行或代码编写显然在这方面变得非常出色。
And they obviously became really good at code execution and code writing.
所以在当今时代,你可以合理地认为模型在代码编写方面能正确完成95%的工具任务。
So in today's day and age, you can actually assume that the code writing aspect of the model will get 95% of those tools right.
因此你不需要预先编写这些工具。
And so you don't have to prewrite those tools.
你可以在运行时探索这一领域,如果需要的话,之后再缓存这些工具。
You can actually explore that space at runtime and cache those tools later on if you have to.
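Runtime tool generation with caching, as described here, might look like the following sketch. `generate_tool_source()` stands in for a code-writing model call, and keying the cache on the task description is an illustrative simplification.

```python
tool_cache = {}

def generate_tool_source(task: str) -> str:
    # Placeholder for the model writing code for the task;
    # here it returns a fixed snippet for illustration.
    return "def tool(values):\n    return sum(values) / len(values)\n"

def get_tool(task: str):
    # Explore the tool space at runtime, then reuse the cached tool
    # when the same task comes back.
    if task not in tool_cache:
        namespace = {}
        exec(generate_tool_source(task), namespace)  # compile the generated tool
        tool_cache[task] = namespace["tool"]
    return tool_cache[task]

avg = get_tool("mean of a list")([2, 4, 6])
same_tool = get_tool("mean of a list") is tool_cache["mean of a list"]
```

In production this `exec` would run inside the kind of sandbox discussed later in the conversation, never in the host process.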
但对我来说,最令人震惊的时刻是,当这些类Unix工具和文件系统工具开始变得非常优秀时。
But I think for me, the mind-blowing moment was when these kinds of Unix tools and file system tools started getting really good.
我认为这要归功于Claude团队对这条路径的探索。
And I think a lot of credit goes to the Claude team for exploring that path.
仅仅使用文件系统或基于Unix的工具作为基础工具,将它们组合起来生成大量衍生工具,确实大大简化了整个架构,对吧?
And just using the file system or Unix-based tools as base tools, which can come together to produce a lot of derived tools, really simplified the stack, right?
我认为到去年年底,我们已经弃用了大约80%的工具,仅依靠核心基础工具就获得了相同的效果。
And I think by the end of last year, we had thrown away maybe 80% of our tools and gotten the same kinds of results with just relying on core basic tools.
在某些情况下,我们无法这样做,因为这实际上是在用测试时的计算资源或长时间运行的工具使用和推理模型,来换取执行的延迟和速度。
Now in certain cases for us, we cannot do that because I think you are effectively trading test time compute or long running tool usage and thinking models for latency and speed at which you're executing.
因此,在某些领域我们做不到。
And so certain domains we cannot.
我们仍然保留了旧的架构。
And we've kept the older architectures.
但只要有可能,我们就会用时间换取模型利用其基础工具、业务上下文以及我们从垂直领域提供的知识上下文来发挥其‘魔力’,这极大地简化了我们的架构。
But wherever we can trade off time to let the model do its magic with the base tools it brings, plus the business context and the right knowledge context that we bring from our verticals, that has really simplified our architecture.
当你谈到在运行时动态创建工具这一理念时,正如你所指出的,那些将这些模型用于软件工程场景的人们,实际上一直在这么做,而且也鼓励这样做。
And as you're talking about this idea of creating the tools dynamically at runtime, obviously, as you pointed out, people who are using these models for software engineering use cases, it's doing that all the time, and it's encouraged to do so.
这也引出了一个更广泛的问题:AI系统所谓的‘改进’究竟意味着什么?它应该在哪些维度上改进?适用于哪些使用场景?
And I think that also opens up the overall question of what does it mean for an AI system to improve, improve along what axes and for what use cases?
是的,完全正确。
Yeah, absolutely.
我认为,如果你看看当前领先的基础模型公司是如何思考这个问题的,他们已经不再把AGI视为纯粹的LLM概念了。
I think if you look at how the leading foundation model companies are thinking about it now, they don't think about AGI as a pure LLM concept.
对吧?
Right?
他们说:‘你知道吗?’
They are saying, oh, you know what?
我们中途发现了一些有趣的事情。
We realized in the middle something interesting happened.
模型在进步,但模型在代码编写方面变得非常出色。
Models are improving, but models got really good at code writing.
这使得其他类型的内部循环成为可能:当模型无法解决时,它可以编写代码,并创建测试代码的环境。
And that enabled other kinds of inner loops: when the model cannot figure something out, it can write code and create environments where it tests that code.
所以,它可以创建系统,这些系统并非完全由模型输出构成,而是作为内部循环的子代理,持续改进。
So it can create systems, not entirely as a model output, but as inner-loop sub-agents where it can keep improving.
因此,这被视为通向AGI的更清晰路径,而不是仅仅指望模型本身的整体提升。
So that is seen as a much clearer path towards AGI than just expecting the model itself to improve overall.
我认为我们在自己的垂直AI系统和代理中也采用了类似的方法。
And I think we take a similar approach in our, you know, vertical AI systems and agents that we do.
我认为我们必须确保构建这些垂直领域特定的子代理循环,我们要在这方面做到最好——让模型编写代码,为它提供一个能获得反馈的环境,很多时候,根据反馈,它会重写代码并更新这些系统,使其更高效。
I think we've got to make sure that we build these sub-agentic loops, which are vertical-specific for us, and we have to be the best at it: the models write code, we give them an environment where they get feedback, and many times, based on the feedback, they rewrite the code and update these systems to be more effective.
一个非常简单的例子是,如果你考虑企业中的深度研究代理,可以把企业API作为主要的知识上下文,而不是把搜索当作API。
A very simple example: in enterprises, you can think of deep research agents working on enterprise APIs as the main knowledge context, rather than search as an API.
我们内部意识到,如果你正在使用企业内部API并试图为其构建一个深度研究代理,你可以让模型在中间编写代码。
And, you know, we realized internally that if you're working with internal enterprise APIs and are trying to build a deep research agent on it, you could let the model write code in the middle.
当你给予它反馈,判断它是否找到了正确类型的信息时,它会编写更多代码,以便将来在进行一次API调用时,能获得输出。
And as you give it feedback on whether it was able to find the right kind of information or not, it will write more code so that, in the future, when it does one API call, it gets an output.
它不会将该输出直接传递给下一个LLM调用。
It does not pass that output to the next LLM call.
它实际上编写了代码,对输出进行分析,并只将良好的摘要传递给下一个工具。
It actually has written code where it is doing analytics on that output and passing only a good summary of it to the next tool.
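That pattern, call an API, run generated analytics code on the raw output, and forward only a compact summary to the next step, can be sketched as below. The stand-in API and the statistics chosen are purely illustrative.

```python
def fetch_records():
    # Stand-in for an internal enterprise API returning bulky raw output.
    return [{"account": i, "amount": 100 + i * 10} for i in range(1000)]

def analyze(records):
    # Code the agent wrote "in the middle": reduce the raw output to key
    # statistics instead of forwarding everything to the next LLM call.
    amounts = [r["amount"] for r in records]
    return {
        "count": len(records),
        "total": sum(amounts),
        "max": max(amounts),
    }

summary = analyze(fetch_records())
# Only `summary` (a few numbers), not 1000 raw records, goes to the next tool.
```

The benefit is exactly the one described: the downstream context stays small and relevant, instead of being flooded with raw API output.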
简单来说,我们的子代理从基于RAG的系统演变为高度进化的智能搜索,再进一步发展为这种在中间加入代码编写的智能搜索子代理。
So in simple terms, our own sub-agents evolved from RAG-based systems to highly evolved agentic search, and now into these agentic-search-plus-code-writing-in-the-middle kinds of sub-agents.
这在某种程度上确实具有开创性,帮助我们实现了对这些子系统所期望的自主性水平。
And that's really been groundbreaking in some sense, helping us achieve the level of autonomy we were expecting from these subsystems.
当你拥有这种能够自主创建工具、探索特定关注领域的动态概率系统时,另一个挑战是系统可能与既定目标产生偏离。
One of the other challenges that comes up when you do have this dynamic probabilistic system that is capable of creating its own tools, exploring particular areas of focus is the potential for misalignment with the stated goals of that system.
这就引发了许多关于安全、身份管理、访问控制和防护机制的问题。我想知道,你如何看待生态系统中这些能力的演变?人们开始如何思考约束和引导这些系统,以确保它们不会偏离目标,或者一旦偏离,能迅速被引导回既定目的?
And so that brings a lot of questions around security, identity management, access controls, guardrails, and I'm wondering how you're seeing some of those capabilities evolve in the ecosystem in terms of how people are starting to think about constraining and guiding these systems to ensure that they don't become misaligned, or if they do, that they're quickly redirected to the stated purpose.
是的,我认为这正是代理在消费级应用和企业级应用之间最大的区别。
Yeah, I think that is probably the biggest difference between a consumer application of agents and enterprise application.
因此,我们对此给予了高度关注。
And so we pay a lot of attention to it.
我们在平台上构建了许多代理生命周期管理层面来应对这个问题。
We build a lot of kind of agent lifecycle management layers in our platforms to deal with it.
举个例子,在金融服务或金融犯罪防控领域,代理在做出任何决策之前,必须先通过一个关于政策一致性的代理流程。
So just to give you an example: in financial services or financial crime fighting, before an agent can make any decision, there is an agentic process first, which is about policy alignment.
因此,代理的第一步必须是确认它已通过政策和标准操作流程的验证,并证明自己理解无误。
And so as a first step, the agent has to show that it has gone through the policy and standard operating procedures and prove that it got them right.
它实际上更进一步,指出了政策中的空白,提醒人类应在何种情境下介入并明确该做什么。
It actually goes one step further and highlights the policy gaps it sees where the human should come in and clarify what to do in what scenario.
在实际应用中,当我们向客户介绍时,我们会说:让我们确保这一点做对。
And in a practical application, when we go to our customers, we'd say, let's make sure we get this right.
而且,它生成了一份详细的待办事项清单,并将这份清单的生成过程与银行标准操作程序中的具体段落对应起来,这本身就为每家银行和大型金融机构的治理委员会和模型治理团队提供了有力支持。
And the fact that it creates this big to-do list, and maps that to-do list back to specific snippets in the bank's standard operating procedures, is itself valuable to the governance committees and model governance teams that every bank and big financial institution has.
仅凭这一点,就已经比过去预测性机器学习模型的透明度高得多。
And just that is actually far more transparent than what predictive ML models used to do.
我的意思是,这些系统、这些智能体实际上在明确告知:我读取了这份文档,并生成了这份待办事项清单。
I mean, these systems, these agents, are actually telling them: okay, I took this document, I created this to-do list.
我认为当我执行这个工具时,我会编写出这样的代码。
I think when I execute this tool, I will write this kind of code.
我这么做是因为这种映射关系是存在的。
And I'm doing this because this mapping exists.
它们对此非常接受,因为此前它们从未从机器学习模型中获得过如此高程度的透明度和可解释性。
They are actually being very receptive to it, because they never got this level of transparency and explainability from ML models before.
所以当需要政策指导时,这是我们有意识采取的一步。
So that's a conscious step we take when something has to be policy-guided.
这种政策对齐本身就是代理或子代理的首要目标。
That policy alignment itself is the first goal of an agent or a sub agent.
接下来是执行的过程。
Then comes the process of executing on it.
而在执行过程中,它是否会偏离轨道或其他情况?
And while it's executing on it, does it go off track or not?
是的,你必须保持正确的评估机制。
And yes, you just have to keep the right evals.
你必须为它设置正确的安全边界。
You have to keep the right guardrails around it.
这些安全边界中,很多是在代理运行时生效的,但也有很多是在代理监控阶段的聚合层面进行的。
Many of these guardrails apply while the agent is running, but many also operate at an aggregate level, at an agent-monitoring step.
当这些代理在生产环境中运行时,你会观察它们的表现,以及它们为何做出某些决策。
While these agents are running in production, you are seeing how they are performing, why they are making certain decisions.
我们已经设置了指标,在这些代理运行时持续观察它们的性能表现。
We have metrics in place, which we are observing as these agents are running and seeing how the performance is going.
我再进一步说一点。
And I'll go one step further.
我们已做好准备,这些代理系统不会立即上线。
We are prepared that these agentic systems will not go live right away.
因此,我们为它做准备的方式是:让这些代理在后台运行,观察它们的表现,对吧?
So the way we prep for it is we say, let these agents run in the background and see how they are performing, right?
随着时间推移,你基于这些数据驱动、指标驱动的方法,逐步建立起信心:比如在过去三个月里,这个代理在金融犯罪案件中的表现,已经和人类调查员一样出色。
And then over time, with these data-driven, metrics-driven approaches, you build confidence that, look, over the last three months, the agent has been performing as well as your human investigator, in the case of financial crime, for example.
在这方面,我们还会采取额外措施,因为调查员分为三个级别:一级、二级和三级。
And there, too, we take extra steps, because there are three levels of investigators: level one, level two, level three.
我们会先说:它表现得和一级调查员一样好,这就实现了初步应用。
And we'll first say, look, it is doing as well as a level one investigator, and that gets adoption.
然后你再证明它能和二级调查员一样出色,就像软件工程中,我们会说这些模型的表现已经媲美初级软件工程师、高级软件工程师,乃至资深工程师或架构师。
And then you prove it is doing as well as a level two investigator, much like in software engineering you would say these models perform as well as a junior software engineer, then a senior software engineer, then a staff engineer or architect.
但这些阶段在我们所运营的垂直领域和行业中都有非常明确的定义。
But, I mean, these stages are very clearly defined in the verticals and industries we operate in.
我们正在采取渐进式的步骤,始终牢记透明度和严格的管控措施,在生产系统后台运行并验证其效果,暂不上线,然后分阶段逐步上线。
And we are taking gradual steps, keeping this transparency and strict guardrails in mind, proving it out while running in the background in production systems, not going live, and then stage wise going live.
我认为,要真正建立起对这类系统的信心,必须做到所有这些。
And I think it takes all of that to actually develop confidence in a system like this.
在这一过程中,我们也在不断学习。
And we are also learning along the way as we are doing that.
但我很自豪地说,与我们在这些领域所见到的其他公司相比,我们在这一方面的推进程度是最领先的。
But I'm proud to say we feel we are the furthest along, amongst all the companies we see in these domains, in how we are operating on this.
当谈到代理动态创建工具的能力时,安全性就显得尤为重要,因为它可能在生产环境中执行未经审查的代码,这就需要对可执行的代码、功能或操作施加一定的沙箱隔离或限制。
The other element of security comes in when you're talking about the agent's ability to dynamically create tools, and therefore it's executing unreviewed code in potentially a production environment, which necessitates a certain level of sandboxing or constraints on what code or what features or functions can be executed.
我想知道,你们如何看待人们处理这种动态运行时计算的这一方面?
And I'm curious how you're seeing people manage that aspect of that dynamic runtime compute.
是的。
Yeah.
是的
Yeah.
100%
100%.
我们实际上在平台中提供了代码执行沙箱,这些沙箱被配置为运行 Python 或 TypeScript,但更重要的是,控制其能访问的网络范围、可用的 API 等等。
We actually provide code execution sandboxes as part of our platform, and they are configured to run Python or TypeScript, but more importantly, to control how much of the network the code can access, which APIs it has access to, and so on.
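A sandbox policy along those lines might be configured roughly as follows; the field and permission names are invented for illustration, not taken from any real platform.

```python
from dataclasses import dataclass, field

@dataclass
class SandboxPolicy:
    """Explicit configuration for what generated code may do (illustrative)."""
    runtime: str                    # e.g. "python" or "typescript"
    allow_network: bool = False     # network access is opt-in, not default
    allowed_apis: set = field(default_factory=set)

    def permits(self, api: str) -> bool:
        # Every API call from sandboxed code is checked against the allowlist.
        return api in self.allowed_apis

policy = SandboxPolicy(runtime="python", allowed_apis={"inventory.read"})
ok = policy.permits("inventory.read")
blocked = policy.permits("payments.write")
```

The point of the sketch is that runtime, network scope, and API surface are all explicit, reviewable configuration rather than implicit defaults.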
是的,关于沙箱和保障这些代理的行为安全,这是一个非常关键的方面。
Yeah, so as far as sandboxing and securing what these agents are doing, it's a very critical aspect.
我们在这方面不依赖任何第三方。
And we don't rely on any third party for that.
我们将它作为我们平台的一部分提供。
We ship it as a part of our platforms.
有趣的是,文件系统在这里正变得越来越有意思,因为代理越来越多地以文件系统中的文件形式运作。
What is interesting is that file systems are getting an interesting twist here, because the agents are operating more and more through files in the file system.
在沙箱本地文件系统和某些更持久的云存储之间,必须进行大量的系统工程,以确保代理操作时这两者保持同步。
And between sandbox-local file systems and some more persistent cloud storage, there has to be a lot of system engineering to keep those two things in sync as the agents operate on them.
因此,除了大模型的神奇之处,还需要大量的严谨系统工程来确保这一切顺利进行。
So more than the LLM magic, there is a lot of rigorous system engineering that has to be done to make sure that happens.
但是的,沙箱默认包含在我们所有的部署中,选择使用Python还是TypeScript或其他语言,取决于我们正在进行的用例评估。
But, yeah, sandboxes come by default as part of all our deployments, and the choice of Python versus TypeScript, or whatever they are writing, is based on evals for the use cases we are doing.
我想说的最后一点是,我们努力确保代理的行为像真实人类一样。
And then the final thing I would say is we try to make sure the agents operate as real humans.
因此,每个代理都有自己的身份认证,必须以可被客户审查和验证的身份进行操作。
So every agent gets its own auth, and every agent has to operate under an identity that can be reviewed and verified at the customer we deploy it in.
并且它也会遵循正确的授权机制。
And it follows the right authorization as well.
无论是它访问的MCP服务器,还是API调用或访问的数据,所有这些都由相同的RBAC和其他访问控制策略进行管理。
So whether it's the MCP servers it accesses or whether it's the API calls or data that it accesses, all of that is governed by the same RBAC and other access control policies that they have.
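Per-agent identity governed by the same RBAC policies as human users can be sketched like this; the roles and permission strings are illustrative, not SymphonyAI's actual scheme.

```python
# Hypothetical role-to-permission mapping, mirroring the investigator levels
# mentioned elsewhere in the conversation.
ROLE_PERMISSIONS = {
    "investigator_l1": {"read:alerts"},
    "investigator_l2": {"read:alerts", "read:transactions", "write:case_notes"},
}

class AgentIdentity:
    """An agent operates under a reviewable identity with its own role."""

    def __init__(self, agent_id: str, role: str):
        self.agent_id = agent_id
        self.role = role

    def can(self, permission: str) -> bool:
        # The agent's MCP, API, and data access all pass through the same
        # RBAC check that would apply to a human user in that role.
        return permission in ROLE_PERMISSIONS.get(self.role, set())

agent = AgentIdentity("agent-007", "investigator_l1")
allowed = agent.can("read:alerts")
denied = agent.can("write:case_notes")
```

Promoting an agent from level one to level two then becomes an auditable role change rather than an ad hoc grant, which fits the staged go-live process described later.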
而这一点实际上是最难做对的。
And that's actually the hardest part to get right.
因为一旦上线,我认为概念验证和试点项目都很容易。
I think POCs and proof points are easy.
但当您上线时,这些受控概念对我们而言意味着:每个代理、每个代理系统都是一个项目,包含一组在其中运行的代理、某种主代理,以及项目中受控的一组资源。
But when you're going live, these governed concepts matter. For us, every agent, every agentic system is a project, with a set of agents operating in it, a master agent of sorts, and a set of resources that are being governed in the project.
每个项目都自带RBAC和访问控制策略等。
And the project comes with its own RBAC and access control policies and so on.
在入职过程中,我们有一个为代理进行入职的步骤。
And while onboarding, we have a step of onboarding an agent.
它需要获得正确的身份验证和在公司内运行的正确身份。
It goes through kind of getting the right authentication and the right identity in the company it operates in.
因此,所有这些因素在某种程度上都必须协同作用,才能在企业中建立正确的安全环境,使其高效运行。
So all of that has to play together, in some sense, in getting the right kind of secure setting in an enterprise for it to work efficiently.
自我改进系统这一理念的另一个方面是,模型本身已不再是差异化因素。在生成式AI广泛普及之前,许多企业投入大量资源构建针对其专注领域的定制模型。
Another aspect of this idea of self improving systems is that the models themselves are no longer the differentiator where before generative AI became so widely adopted, there were a number of businesses that were investing a lot of resources into building their custom models that were specific to the problem domain that they were focused on.
深度学习或许扩展了这些模型的能力边界,但它们仍然是为特定用例量身打造的。
Deep learning maybe expanded the bounds of what those models could do, but they were still purpose built for a particular use case.
现在,每个人都能广泛访问相同的一组模型。
Now everyone has access broadly to all of the same set of models.
因此,为了让它成为一项有用且能体现企业差异化的功能,它需要具备围绕它的所有这些系统级能力。
And so in order for it to be something that is useful and a differentiating capability of that business, it needs to have all of these other system level capabilities around it.
我想知道,您如何看待企业对这一层面竞争的思考,以及它们如何从更广泛的组织背景出发,思考这些机器学习和AI模型的用途,而不仅仅将其视为对基础模型提供商的一次API调用?
And I'm wondering how you're seeing organizations think about that level of competition and ways that they think about the purpose of these machine learning and AI models in the broader context of their organization beyond just being an API call to a foundation model provider?
是的,这是个很好的问题。
Yeah, that's a great question.
当我们与真实客户交流时,发现他们不清楚自己的数据和知识产权有多少泄露到了模型中。
And we see it when we talk to our real customers: there is an unknown in how much of their data and IP is leaking into the models.
他们对自身对这些模型的依赖程度也存在不确定性。
There is an unknown in what they are creating as dependencies on these models.
我们对自己的立场非常明确。
I think we are very clear on our stance.
我们了解自己的行业,了解自己的领域。
We know our industries, we know our domains.
我们将这些领域知识或领域知识图谱融入到我们所面对的每一个问题中。
We bring that domain knowledge or the domain knowledge graph into every problem that we bring in.
我们知道,有很多上下文是特定于客户的。
We know that there is a lot of context that is customer specific.
我们提供了正确的语义,帮助将这些特定于客户的上下文融入其中。
We provide the right semantics around how to bring that context in that is customer specific.
当代理处理任务时,我们会提供它所使用的上下文,并根据我们之前讨论的关于记忆的方式更新这些上下文。
And we provide, sort of, as the agent works on it, the context it uses and the context it updates, as we discussed earlier in terms of memories.
我们对其中所包含的知识产权向客户非常明确。
We are very clear to our customers on the IP that lies in there.
大部分知识产权实际上属于客户。
And much of the IP sits with them.
因此,如果我们能在生产环境中真正创建出自学习系统——模型本身保持不变,而通过文件系统、Markdown文件等更新记忆,这才是真正的奥妙所在。
So it is very interesting now that if we can create these self learning systems truly in production, where the model remains the same, the real magic is in updating memory sitting in file systems and markdown files and so on.
这对大多数企业来说都是非常有吸引力的,对吧?
That's a very attractive proposition to most enterprises, right?
感觉他们正在构建这一层问题上下文和流程上下文。
It feels like that they are with that, they are creating that layer of kind of problem context and then process context.
我们还使用一个术语,叫做行动上下文。
And we also use a term called action context.
从某种意义上说,这一点是任何CRM和ERP系统都无法捕捉的。
And in some sense, that is not captured by any CRMs and ERPs.
随着这些系统上线,由代理创建的包含markdown文件的文件夹将逐渐成为你流程、行动、升级等的来源。
And as these systems go live, these folders with these markdown files that get created with the agent start becoming these sources of your processes and your actions and your escalations and all of that.
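To picture those folders of markdown files becoming a record of processes, actions, and escalations, here is a toy sketch; the file layout and function name are invented for illustration.

```python
# Toy sketch: agent memory persisted as markdown files in a project folder.
import pathlib
import tempfile

def append_memory(root: pathlib.Path, topic: str, note: str) -> pathlib.Path:
    """Append a learned note to <root>/<topic>.md, creating the file if needed."""
    path = root / f"{topic}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return path

root = pathlib.Path(tempfile.mkdtemp())
append_memory(root, "escalations", "Route chargebacks over $10k to L2.")
append_memory(root, "escalations", "Retry ERP sync before escalating.")
assert (root / "escalations.md").read_text(encoding="utf-8").count("- ") == 2
```

The point of the sketch is only that the "knowledge layer" ends up as plain, reviewable files rather than weights inside a model.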
我认为我们现在还处于这个旅程的非常早期阶段,但我可以说,事情正在逐步展开。
I think we are very early in this journey right now, but it's playing out, is what I can say.
如果我们在这方面取得成功,我认为每个企业都可以拥有一个持续更新的专属知识层。
And if we are successful with this, I think every enterprise can own their own knowledge layer per se that is updating.
而我们作为供应商,是我们行业知识的专家。
And we as vendors, we are experts in our industry knowledge.
我们提供这种上下文。
We bring that context.
我们帮助每个企业建立属于自己的主权知识和流程上下文。
And we help every enterprise have their kind of sovereign knowledge and process context.
那将是一个很好的落脚点。
That'll be a great place to land in.
但我认为我们还处于非常早期的阶段。
But I think we are very early.
我认为这需要整整一年时间,也许到2026年,才能看到它真正取得成功。
I think it'll take a whole year, maybe into 2026, for that to evolve and to see how it is truly successful.
这些自我改进系统的另一个方面是,除了这些代理能够捕捉的所有上下文和数据知识外,基础模型提供者也在不断演进模型,每一代都会新增能力或聚焦特定用例。
And another piece of these self improving systems is that beyond all of the context and data knowledge capture that these agents are capable of, there is also the underlying evolution of the models by these foundation providers as each generation adds new capabilities or focuses on particular use cases.
但这也带来了一定的平台风险,因为你所依赖的、支撑大量运营能力的特定模型可能会被弃用,转而采用另一个行为略有不同的模型。
But it also brings with it a certain level of platform risk as a particular model that you build a lot of your operational capacity around gets deprecated in favor of a different model that maybe behaves slightly differently.
我想知道,你如何看待这一方面——企业如何把握自己的命运?企业在多大程度上依赖通过API访问的特定模型,而不是自托管这些开源权重模型,甚至现在开始构建自己的大语言模型?尽管仍需要大量知识、能力和特定硬件,但至少这已经是一个可实现的明确选择。
And I'm wondering how you think about that aspect of owning your own destiny and how enterprises are thinking about their level of reliance on a particular model governed via API access versus self hosting one of these open weights models or even building some of their own large language models now that that has become, I'm gonna say, commoditized even though there is still a lot of knowledge and capability, and in particular hardware, required, but it's at least a known quantity that people can do if they so choose.
是的。
Yeah.
对。
Yeah.
不。
No.
这是个很好的问题。
That's a great question.
我认为,人们并没有意识到从一个版本切换到另一个版本,比如从3.5升级到3.7,再到4.5,然后又回退到3.7或3.5并被弃用,会带来多大的混乱。
I think people don't realize, practically, what havoc it creates to move from one version to another, or an update to Claude moving from 3.5 to 3.7 and then 4.5, and then 3.7 or 3.5 being deprecated.
现实是,企业系统需要极高的可靠性,对吧?
The reality is enterprise systems need a lot of reliability, right?
一旦将系统投入生产,就会有严格的标准,比如我们在平台上进行评估时,第一件事就是:如果我运行这一操作100次,有多少次能得到相同的结果。
And when taking it to production, there are these strict requirements, you know. We have an evaluation in our platform, and the first thing it checks is: if I run this a hundred times, how many times does it get the same result?
而每次我们更新模型版本时,这个可靠性指标在实际应用中都会被打破。
And every time we've updated the model version, that reliability metric has broken in practice.
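The reliability check described above, run the same task many times and see how often the result repeats, can be sketched as follows; `consistency` is an invented name, and `call_model` here is a deterministic stand-in for a real model endpoint.

```python
# Sketch of a repeatability eval: fraction of runs matching the modal answer.
from collections import Counter

def consistency(call_model, prompt: str, n: int = 100) -> float:
    outputs = [call_model(prompt) for _ in range(n)]
    # Count how often the most common output appears across n runs.
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / n

# Deterministic stub standing in for an actual model call.
def call_model(prompt: str) -> str:
    return "approved"

assert consistency(call_model, "classify this transaction", n=100) == 1.0
```

A score below 1.0 after a model upgrade is exactly the kind of regression that, per the discussion above, blocks a go-live.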
因此,我们从未能仅仅通过切换API来完成模型升级。
So we've never been able to do just model upgrades just, you know, by switching from one API to the other.
每次都需要调整提示词,还必须进行一些调查和更新。
There is always some prompt changes and there is always some, you know, investigation and updates we have to do.
所以,我认为你的观点非常正确。
So I think your point is very right.
归根结底,基础模型公司拥有一套非常广泛的评估基准。
Just at the end of the day, the foundation model companies have a very wide set of benchmarks that they are operating against.
他们可以看到模型整体在进步,但在某些局部基准上,从一个版本到另一个版本的表现反而会下降。
And they can see that overall the model is improving, but in some local benchmarks, it does go down from one model to the other.
这方面有一些非常著名的例子,比如GPT-5发布后,5.2版本推出时的情况。
There are some very popular examples of this when GPT-5 launched and 5.2 came along and all that.
在一些编程基准测试中,性能实际上反而变差了。
And in some of the coding benchmarks, it was actually going worse.
所以我认为这是一个真实存在的问题。
So I think that's a real problem.
目前的情况是,像我们这样的供应商要为客户的这些问题买单。
Right now, what's happening is that vendors like us take the hit of that for our customers.
我们至少构建了能够率先发现这类问题并阻止其上线的系统。
And we at least build systems where we catch this first and don't let it go live.
然后我们不断迭代,帮助我们的客户解决这些问题。
And then we iterate and help our customers on it.
但整个系统有点脆弱。
But the whole thing is a little brittle.
挑战在于,你该如何改进它,避免它变得脆弱?
The challenge is, how do you improve it and not let it be brittle?
当然,一种方法是使用专门为该任务优化的小型语言模型或小型推理模型,自己部署并让它独立运行。
So of course, one way is to say, you know, have your own small language model or small reasoning model that is very good at that task, host it yourself, and let it live out there.
当然,这需要更多的投入,更加资本密集。
Of course, it comes with more investment; it's more capital intensive.
你需要自己运行GPU等设备。
You have to run your own GPUs, etcetera.
但它能带来更高的可靠性,以及对未来的更大把握。
But it does come with more reliability and more of a bet on the future.
到目前为止,我还没有真正采用这种方式,因为尽管基准测试正在趋于饱和,模型也正变得商品化。
So far, I haven't seen that play out, primarily because, even though benchmarks are getting saturated, models are getting commoditized.
即使现在,每次新的大型模型发布都会在许多方面显著提升整体性能,对吧?
Even now, every new big release that comes out in models does improve overall performance in many things a lot, right?
因此,企业将拥有自己的小型语言模型视为一种维护项目,对此有些望而却步。
So having your own small language model is seen as a bit of a maintenance project by enterprises and they are a little afraid to do it.
在我看来,这正是像我们SymphonyAI这样的供应商的机会——让它们无需操心这些事。
I see it as an opportunity for vendors like us, SymphonyAI to make it easy for them, where they don't have to worry about it.
正如我之前所说,如果我们从模型转向智能体系统中的学习能力,将其视为内存升级,并智能地调用正确的内存更新,如果这种智能体框架能发挥出这么多魔法,一切都会变得简单得多。
And this is where I was saying that if we go from models to the learning capabilities sitting in agentic systems, as memory upgrades and intelligent invocation of the right memory updates to the right memory, if that agentic harness can play a lot of that magic, then everything will get simplified.
因此,在我看来,这是一种更实际的方案。
And so I think that is a more practical approach in my mind.
当然,如果强化学习系统能变得对每个企业都易于操作的话。
Of course, that is if RL systems become very easy for every enterprise to operate with.
我觉得你所说的可能会实现,但目前看来还显得有些遥远。
I think, you know, what you said can play out, but right now, it feels a little far out.
关于你所看到的企业如何实际投资于这些自我改进的AI系统,你观察到哪些常见的模式?听起来强化学习和微调可能还较远,或未被广泛采用,但我很好奇,企业是如何思考它们愿意投入的层级和复杂性的?
Regarding the ways in which you're seeing organizations actually invest in these self improving AI systems, what are some of the common patterns that you're seeing play out where it sounds like reinforcement learning and fine tuning are maybe further out or not as widely adopted, but just wondering how you're seeing people think through the levels of investment and sophistication that they're willing to operationalize.
是的
Yeah.
我认为一个非常明确的模式是,每个人都意识到,要构建任何自学习系统,环境设置必须正确。
I think one very clear pattern is everyone's realizing that to form any self learning system, the environment setup has to be right.
对吧?
Right?
因此,搭建合适的环境需要大量工作。
And so it takes a lot to set up the right environment.
你需要从输入数据的摄入开始,一直做到行动层,然后正确获取人类反馈。
You have to start all the way from the input data ingestion to the action layer, and then get the human feedback right.
我认为许多企业正在努力至少把这一点做好。
And I think I see a lot of enterprises putting efforts in at least getting that right.
比如为这个环境准备好正确的数据。
Like getting the data right for this environment to be ready.
我看到许多供应商正在提出正确的问题。
And I see a lot of vendors asking questions, rightly so.
我看到很多企业都在问,你知道的,大家都用智能代理。
I see a lot of enterprises asking questions of like, you know, everybody has agents.
你的代理比其他的有什么优势?
How is your agent better than the other?
它是怎么学习的?
How is it learning?
当我们深入时,我们会向他们展示它是如何学习的,但我们会主动提出要求。
And when we go in, we say, you know, we show them how it learns, but we ask for it.
我们该怎么把这种反馈收集回来,对吧?
How are we going to get this feedback back, right?
这些反馈有没有被记录下来?
Is this feedback getting captured somewhere?
我们希望在你们的系统中植入合适的机制,以便让这些反馈能够顺利流入。
We would like to implement the right hooks in your systems for that feedback to flow in.
我认为客户对这一点是接受的。
And I think customers are receptive to that.
他们意识到,将整个流程从数据输入捕获、触发捕获到动作数字化,再到反馈循环成为清晰的数字实体——比如执行任务并记录人类反馈等——的重要性。
They realize the importance of digitizing that whole process, from input data capture and trigger capture, to actions being digitized, to feedback loops being very clear digital entities, if you will: a task action being taken and human feedback being written, etcetera.
在那里,我看到企业甚至愿意在流程中引入人类评判者,或者至少引入一个大语言模型评判者。
There, I see enterprises even being ready to put a human judge in the loop or at least an LLM judge in the loop.
他们希望将‘任务是否正确’这一判断转化为大语言模型可用于反馈的输入格式,并将此作为自身拥有的核心特性。
And that is something as a property they want to own to convert whether the task was right or not into an input format that the LLMs can use for feedback.
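One way to picture converting a human verdict on a task into an input format an LLM judge or learning loop can consume is a small structured record; the schema and field names below are assumptions for this sketch, not a standard.

```python
# Hedged sketch: a human verdict serialized as a feedback record.
import json

def feedback_record(task_id: str, agent_output: str,
                    human_verdict: bool, note: str = "") -> str:
    record = {
        "task_id": task_id,
        "agent_output": agent_output,
        "correct": human_verdict,   # the signal the loop learns from
        "note": note,               # free-text reason, usable by an LLM judge
    }
    return json.dumps(record)

rec = json.loads(feedback_record(
    "t-42", "flagged as fraud", False, "false positive: known merchant"))
assert rec["correct"] is False
assert "merchant" in rec["note"]
```

The useful property is that the same record serves a human reviewer, an LLM judge, and any later fine-tuning or memory update.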
因此,我认为企业对这种环境的接受程度是我观察到的一个模式。
So I think that readiness, kind of readying the enterprise for this environment, is one pattern I see.
第二个模式是我们参与的领域,即构建正确的上下文层实现和正确的记忆层实现。
The second pattern is where we play in, which is that building the right implementation of the context layer, building the right implementation of the memory layer.
无论它存在于你的文件系统中,还是作为知识图谱,或其他形式。
And the fact that it sits in your file systems or as knowledge graphs or as something else.
我认为这正成为一种差异化优势。
And I think that is becoming a differentiator.
当我们进入企业销售时,正是这些问题促使人们开始思考正确的架构设计。
And that's where the right architecture questions are getting asked when we go to sell inside enterprises and so on.
总的来说,我认为企业正在为大量代理的运行做好准备。
So overall, I think the enterprises are getting ready for agents operating, a lot of agents operating.
他们知道必须准备好数据、API以及整个反馈循环。
They know that they have to get their data and APIs and just the whole feedback loop ready.
我认为从实际角度来看,目前的努力就集中在这一点上。
And I feel like, practically speaking, that's where the efforts are right now.
当你与这些组织和企业讨论投资于这些自我改进的AI系统,并集中精力推动代理能力发展时,
As you're talking to these organizations and enterprises about investing in these self improving AI systems and really pushing forward in a focused and concerted manner on agentic capabilities,
在他们能够以可重复、可靠的方式将这些系统投入生产之前,有哪些陷阱或隐性成本需要提前考虑?
What are some of the pitfalls or hidden costs that you need to explore before they are able to actually put these things into production in a repeatable and reliable fashion?
是的。
Yeah.
我认为一旦开始部署这些系统,就会充满各种隐患,而不仅仅是简单的陷阱。
I think it's full of landmines and not just pitfalls once you go into putting these systems in place.
我认为你首先会意识到,确实存在数据库、ERP和CRM系统,还有各种政策和标准操作流程。
I think the first thing you realize is yes, there's databases, there is ERPs and CRMs, then there is policies and standard operating procedures.
本质上,你是在观察人类今天是如何运行这些复杂的业务流程的。
Essentially, you're trying to see how do humans run these complex business processes today.
你首先意识到的是,企业认为自己是按照政策或标准操作流程在运行的,但事实并非如此。
The first thing you realize is enterprises think that they are running as per that policy or that as per that standard operating procedure, but they are not.
因此,存在大量的政策缺口。
So there's lots of policy gaps.
实际上,随着时间的推移,人类已经找到了绕过这些政策缺口的方法,并围绕这些缺口形成了隐性知识。
So the reality is humans over time have found a way to work around those policy gaps and have developed tribal knowledge around that.
但代理系统在面对这些政策缺口时会失败。
But agents fail with those policy gaps.
那么,你该如何填补领域知识,让大语言模型的‘大脑’明白:我完全按照政策执行了,但结果却是错的?
And so how do you fill in that domain knowledge for the LLM inside, the brain inside, to say: I did everything right as per what the policy said, and yet my outcome was wrong?
这是因为政策本身存在缺口,而每个企业内部都存在一些隐藏的流程,这些流程被用来填补这些缺口。
Well, it was wrong because there are gaps in the policy, and there are hidden processes in every enterprise where they found ways to fill those gaps.
这些流程可能是以人员为导向的,比如升级路径等。
Those processes could be people oriented like escalation paths, etc.
这些差距可能只是基于你所处场景类型而逐渐形成的隐性知识。
Those gaps could be just tribal knowledge built over the types of scenarios that you operate in, etcetera.
但这些数据根本没有任何地方被记录下来。
But that data just isn't captured anywhere.
因此,也没有办法填补这些空白,因为这类数据在历史上根本就不存在。
And so there is no way to fill that in either; that data just doesn't exist historically.
所以你必须从零开始,构建这个知识图谱,或者说是以一种LLM能够操作和基于其扩展的方式,建立这种隐性知识层。
So you have to kind of start at day zero and build that knowledge graph or build that sort of tribal knowledge layer, if you will, in a way that the LLMs can operate on it and build on it.
我认为这是最难做对的一点。
And I think that's the hardest one to get right.
第二点是,为了让这些内容真正具有可操作性,中间还存在大量行动步骤上的空白。
The second one is that for these things to be truly actionable, there are a lot of gaps in the action steps.
并不是所有内容都存在,也不是每个行动都以可执行的API格式或数字格式呈现。
Not everything exists; not every action is in an executable, API-like format or in a digital format.
它们是一系列步骤,通过自己的流程进行,比如通过电子邮件渠道,或者其他一些执行方式。
It is a set of steps which goes through its own processes of, you know, email channels or some other formats of executing on it.
因此,要捕捉端到端的流程很困难,因为这个流程非常碎片化。
So it is hard to capture the end to end loop because it is very fragmented.
所以,从某种意义上说,正确的方法是先自动化各个子流程,在弄清楚如何数字化或解决子流程之间的集成缺口之前,先把子流程做好。
And so in some sense, the right way to attempt it is to first automate the sub processes and to get that right while you figure out how to digitize or get right the integration gaps across the sub processes.
在你与这些企业共同经历整个采用曲线的过程中,当它们开始运营并依赖这些系统时,你看到过哪些最有趣、最具创新性或出人意料的方式,让它们应用自我改进的AI,或者如何构建支持这种能力的系统?
And as you have been working with these businesses through that overall adoption curve and as they start to operationalize and rely on these systems, what are some of the most interesting or innovative or unexpected ways that you've seen them either apply self improving AI or ways that you've seen them approach the creation of the systems that support that capability?
是的,我们有很多成功案例。
Yeah, I think we we have a lot of success stories.
我认为我们引以为豪的一点是,能在最短的时间内从零实现到生产环境,对吧?
I think one thing we pride ourselves in is to go from zero to production in the least amount of time, right?
我们的做法是,即使在概念验证阶段,也能从签署POC到快速产出实际成果,以最快的速度上线。
And the way we work, even in a POC, is to go from having signed a POC to going live on some kind of results in the fastest sort of time.
所以,真正令人欣喜的是,当我们进入某个用例时,我们能带来什么。
So I think what has been really great to see is what we bring once we go into a use case.
正如我所说,我们是一家垂直领域的AI公司。
We, as I said, we are a vertical AI company.
我们在少数几个垂直领域运营,这些领域的端到端用例非常完善。
We operate in a few verticals where we know the end to end use cases really well.
因此,我们已经为这些领域建立了流程图。
So we have a process graph for it already.
我们还拥有底层的实体图,清楚地知道需要哪些数据以及它们之间的关系等。
We have the underlying entity graphs for what data is needed, what are the relationships between them and so on.
一旦我们将这些映射到客户的数据上,我们就具备了智能代理能力。
And we now have Agentic capabilities once we map it to the customer's data.
我们拥有能够进行映射的代理,比如将客户的实体图映射到我们的实体图,或者更准确地说,是行业实体图。
We have agents that can map it, like do this mapping from their entity graph to our entity graph or rather an industry entity graph.
我们可以非常快速地将他们的数据映射到流程图上。
And we can map it from their data to a process graph very fast.
因此,有一个代理可以完成这项工作。
So there is an agent that can do that.
这极大地加速了这种标准化,也大大缩短了企业实现价值的路径。
That standardization really speeds up the path to value that these enterprises are getting.
否则,就会有一个完整的探索阶段,去弄清楚该做什么。
Otherwise, there is a whole discovery phase of coming up with the what.
现在,你只能在你所熟悉的这些垂直领域和用例中做到这一点。
Now, you can only do it when you're operating in these vertical slices, in use cases you're aware of.
如果你从企业中一个通用问题开始,它可能走向无数条路径,那就有点难了。
If you start with a general problem in an enterprise, which can go down n number of paths, it's a little hard to do.
所以对我们来说,开发这些标准的行业或垂直领域知识图谱和流程图谱,实际上是非常富有成效的。
So for us, that's been, practically speaking, a very fruitful exercise to develop these standard industry or vertical knowledge graphs and process graphs.
当我们进入企业时,我们第一天就能落地。
We land on day one in the enterprises that we go into.
另一方面,在成果方面,这些自动化能帮你达成的效果确实非常惊人。
And on the flip side, on the outcome side, it is actually quite remarkable what some of these automations help you get to.
正如我所说,我们在金融服务领域有一些客户,他们的L1和L2客服以及调查环节已经完全被AI自动化了。
Like I was saying, there are customers we have in financial services where their L1 and L2 agents and an investigation step have been completely automated by AI.
L1客服主要遵循政策和标准操作流程。
And so L1 agent is primarily, it follows a policy and standard operating procedures.
整个过程没有人类参与。
There's no human involved.
它会读取那份文档。
It takes in that document.
它会为我们创建一个代理,完全严格按照文档执行。
It creates an agent for us that would follow it, and it would follow it exactly to the T.
而且它能百分之百准确地完成。
And it would get it like 100% right.
然后进入L2调查阶段,这时会涉及一些人类的直觉。
And so, and then you get to L2 investigator stage where there is some human intuition involved.
我们在这一阶段也达到了非常高的准确率。
And we are getting at very, very high accuracies there as well.
我举个最令人惊讶的例子:通过遵循政策并自主判断调查的多种方式,我们实际上发现了银行政策尚未更新的新型违规行为,这些行为与最新法规不符。
The ultimate surprising example I'll give you is that by following policies, and by coming up with its own judgment of some of the ways these investigations can be done, we are actually finding new detections: banks' policies were not getting updated with the latest in what the regulations say.
但由于这些代理持续改进、不断发现新情况,它们开始检测出政策尚未覆盖的新犯罪行为。
But because these agents are always improving, always discovering new stuff, it started detecting new crime, which the policies were not ready to capture yet.
因此,这些自我改进的循环实际上缩短了从美国监管机构颁布新法规到该法规在实际场景中被发现之间的时间周期。
So these self improving loops actually fix that refresh cycle: the time from when a US regulator passes a new regulation to when it gets caught in real scenarios has really shortened.
这让我们感到惊讶,也让最终客户感到意外。
And it was surprising to us, it's surprising to the end customer.
但如果你仔细想想,这其实是有道理的。
But I mean, if you think about it, it does make sense.
这相当于砍掉了中间许多人为流程的环节,所以你本该预期到这一点。
This is kind of like you're cutting many layers of human processes in the middle, and that's why you'd kind of expect it.
我们只是没料到它会来得这么快。
We just didn't expect it to come so fast.
在你亲身参与这个领域、探索让AI系统通过执行和互动学习与改进的整体能力的过程中,你学到了哪些最有趣、最出人意料或最具挑战性的经验?
And in your own experience of working in this space and exploring this overall capability of allowing these AI systems to learn and improve from their execution and interactions, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
对我来说和我的团队而言,我们正在并行推进两条路径。
I think for me and my teams, we are following kind of two parallel paths.
一条是我们自己的产品开发、系统构建和平台建设过程。
One is our own kind of product building and system building, platform building process.
这是一个软件开发生命周期过程,其中代理正变得越来越自主。
It's a software development lifecycle process and agents are getting more and more autonomous there.
因此,我们将这视为与我们的产品、代理和产品所做工作的平行现象。
And so we see that as a parallel to what our products and our agents are doing.
我们在软件开发流程中能实现多少自主性,这与我们在最终客户流程自动化中通过代理能实现多少自主性是相似的。
How much autonomy we can get in our software development processes kind of parallels how much autonomy we are able to build into our end customer process automation with agents.
我们在内部代码编写、代码审查、测试和DevOps流程中学到的许多经验,都可以应用到最终场景中。
And there are a lot of lessons we learn in our internal code writing, PR, testing, and DevOps processes that we are able to apply to the end scenarios.
我们在实施过程中遇到的许多问题,与在实际应用中遇到的问题也非常相似。
And a lot of problems that we face with implementation here are similar to the ones that we face in practice there as well.
在编码环境中创建一个能够自我改进的环境,与在实际场景中创建这样的环境一样复杂。
Creating that right environment that self improves in a coding environment is just as complex as creating that environment in a practical example.
所以,虽然我没有直接回答你的问题,但对我们来说,这无疑是最有趣的历程。
So without answering your question specifically, I think that's been the most interesting journey for us.
某种程度上,我们在学习如何提升行业自主性的同时,也在提升自己作为开发者的水平。
So in some sense, we are improving our own selves as developers while we are learning how to improve autonomy for the industries we operate in.
当你考虑将人工智能应用于特定问题时,在哪些情况下你会建议不要投资于这些自我强化的循环,而直接使用现成的大型语言模型或现成的预测模型?
When you're thinking about the application of AI for a particular problem, what are situations where you would advise against the investment in those self reinforcing loops where you're fine just using an out of the box LLM or out of the box predictive model?
是的。
Yeah.
这是我们几乎每天、每个使用场景都会问的问题。
That's a question we ask almost every day, in every use case we go into.
我认为这个问题也有不同的层次,对吧?
I think there are levels of this question as well, right?
你应该使用大型语言模型吗?
Should you use an LLM?
应该。
Yes.
你应该使用大型语言模型还是小型语言模型?
Should you use a large language model, small language model?
在许多情况下,你可能应该使用非常小的语言模型,因为对于你所处理的任务来说,不一定需要大型模型。
You should probably use a very small language model in many cases; it doesn't have to be large for the kind of task you are on.
所以我认为,关键在于意识到任务的复杂性,判断是大型语言模型能完成,较小的语言模型能完成,还是需要让LLM在循环中完成?
So I think it's awareness of the task complexity: whether an LLM can do it, a smaller language model can do it, or it needs an LLM in a loop to do it.
这才是关键的分类方式。
That is the key kind of categorization.
智能体本质上是将LLM置于循环中,结合特定上下文,并以巧妙的方式改变上下文等。
Agents are effectively LLM in a loop with certain context and clever ways of changing that context, etc.
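That description, an LLM in a loop with context that changes between iterations, can be caricatured in a few lines; the model call is stubbed here, and all names are illustrative.

```python
# Minimal caricature of "an LLM in a loop with context".
def run_agent(llm, goal: str, max_steps: int = 10) -> list:
    context = [goal]
    for _ in range(max_steps):
        action = llm(context)    # the model decides the next action from context
        context.append(action)   # the context changes between iterations
        if action == "DONE":
            break
    return context

# Stub model: finishes after two working steps.
def llm(context):
    return "DONE" if len(context) >= 3 else f"step-{len(context)}"

trace = run_agent(llm, "reconcile invoices")
assert trace == ["reconcile invoices", "step-1", "step-2", "DONE"]
```

Everything interesting in a real agent lives in how `context` is built and pruned, which is the "clever ways of changing that context" mentioned above.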
因为我们专注于这些行业中的特定用例,任务对我们来说非常明确。
Because we operate in these specific use cases in the industries, the tasks are very, very clear to us.
例如,如果你要做网页研究摘要,我们实际上为此使用了一个小型模型。
So for example, if you were to do a web research summary, we actually have a small model for it.
在这种情况下,你不需要在大型语言模型上浪费令牌。
And it's like you don't need to burn tokens on a very large language model there.
当你在更不确定的领域中操作,智能体需要根据多种上下文和变量做出决策时,它应该进行思考或推理。
When you are operating in a more undefined domain where the agent is making decisions with a lot of different contexts and variables, it should probably be thinking or it should be reasoning.
因此,对我们而言,自动化流程的每一步都相当明确,需要深入到何种程度的推理或思考,或者它属于哪种自然语言理解类任务。
And so for us, every step of the process automation is fairly well understood: how deep it has to go into reasoning or thinking, or what kind of natural language understanding task it is.
这些任务中有一些并不难。
And some of those tasks are not that hard.
有些任务甚至根本不需要大语言模型。
Some of those tasks actually don't even need an LLM.
所以我们已经把这一点梳理得很清楚了。
And so we have that fairly well laid out.
我给的建议是,判断哪个任务需要何种级别的处理,这有点像一门艺术。
The recommendation I would give: it's a bit of an art to figure out which task requires what level of capability.
但总的来说,我会说,如果某个任务反复出现,并且可以用确定性代码完成,你可以用大语言模型来编写这段代码,但之后就保存下来,对吧?
But in general, I would say if it's the same task over and over again, and if it can be done by deterministic code, you can use an LLM to write that code, but then just save it, right?
别一遍又一遍地用大语言模型来执行它。
Just don't use an LLM again and again to do it.
只需用大语言模型写一次,保存那段代码,然后反复执行它即可。
Just use an LLM to write it once, save that code, and execute that code again and again.
如果输入发生变化,并且决策必须不断调整,那时你就需要判断是否真的需要一个智能体。
If the input changes and if the decisions have to change again and again, that's when you have to decide whether you truly need an agent.
在我们的情况中,真正困难的是根因分析这类场景,比如当工厂中的某些系统宕机时,需要回溯之前的上下文,探索类似事件是否发生过,并据此做出判断。
So in our cases, the truly hard, like you look at some of the truly hard scenarios of root cause analysis when some systems are going down in a plant and it has to go into figuring out previous contexts like this, exploring the path of whether similar things have happened, and making judgments based on that.
当面对这种复杂任务时——这类任务即使对人类来说也需要数小时甚至数天才能完成——显然你需要考虑使用智能体系统。
When it's a complex task like that, something which takes humans also hours and days today, quite obvious you have to look into kind of agentic systems.
当你谈到尽量避免不必要的令牌消耗时,这也引出了另一个自我改进和成本优化的维度:也许你的目标之一是最小化某个智能体用例的成本,当然,基础设施层面的成本优化是另一个独立的问题。
And when you were talking about that aspect of trying to avoid burning tokens unnecessarily, that also brings up another axis of self improvement of cost optimization where maybe one of your objectives is to minimize the expense of a particular agent use case or, obviously, cost optimization on the infrastructure side is a separate question.
但就智能体本身或AI系统如何优化自身的效率而言,我想知道你在这一领域看到了哪些进展。
But in terms of the agent itself or the AI system itself optimizing its own efficiency, I'm wondering what you're seeing as far as capabilities on that horizon as well.
是的。
Yeah.
这是个很好的问题。
That's a great question.
我的意思是,我们每天都在面对Claude Code和Cursor使用带来的成本问题。
I mean, we are facing that every day with our Claude Code and Cursor costs.
如果Cursor团队能开发出一个自我改进的成本优化器,那将是我最期待的功能。
That would be my favorite feature from the Cursor team, if they could build a self improving cost optimizer.
但我认为这可能违背了他们的商业模式。
But I think it probably goes against their business model to try to do that.
我认为公平地说,我们在这方面的投入并不多,但你可以看到未来的前景。
To be fair, I think we really haven't invested much in the area, but you can see it on the horizon.
正如我所说,我们在许多任务中使用了多个小型语言模型。
As I said, we do use many small language models in many of our tasks.
我认为真正的挑战在于任务的可重复性有多高?
I think the real challenge is how repeatable of a task is it?
比如你下次执行时看到的模式是否完全相同。
Like, how exactly the same is the pattern you see the next time you do it?
大型语言模型和更大模型的出色之处在于它们在跨任务方面的表现非常优秀,对吧?
And the great thing about LLMs and bigger models is how great they are across tasks, right?
所以如果你的任务只是稍微改变了一些参数,它们就能适应,或者说它们已经在许多其他类似任务中展现了良好的效果。
So if your task just changes parameters a little bit, they will adapt to it, or rather they have already shown good results in many other similar tasks and so on.
因此,任务的特定性与你希望有多通用,通常决定了这一点。
So the task specificity versus how general you want to go typically defines it.
只要我们能在任务特异性达到一定程度,使得模型很难有太大偏差时,我认为我们就应该全力优化,使用更小的模型。
Wherever we can achieve the levels of task specificity where, you know, it's hard for it to deviate too much, I think that's where we should just optimize the hell out of it and go to smaller models.
说实话,在业界,我们还没有构建过尝试大量不同模型的系统。
You know, I'll tell you, practically speaking in the industry, we haven't built systems which try a whole lot of models.
我真的很惊讶谷歌的Gemma系列模型表现得这么好。
Like, I am very surprised by how good the Gemma series of models are from Google.
而且在业界,我几乎看不到有人在很多任务中尝试使用Gemma,对吧?
And like, I see hardly anyone in the industry trying Gemma for many of their tasks, right?
甚至在Gemini和Gemma之间,人们连Gemma都不愿意试一下。
And even within Gemini versus Gemma, like people don't even try Gemma.
所以我觉得,我们应该开发一种工具:一旦你定义了任务和评估标准,就把它交给优化器,让它去尝试所有这些低成本模型,找出哪个能在保证准确率的前提下表现最佳。
And so I think we should build something where, once you've defined a task and an eval, you give it to an optimizer, and it tries all these cheap models and tells you where it gets the most accuracy and whether it's able to achieve it.
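Such an optimizer could be as simple as trying candidates cheapest-first against the eval and returning the first one that clears the accuracy bar; the model names, costs, and scores below are all made up, and `evaluate` stands in for a real benchmark run.

```python
# Sketch of a cheapest-model-that-passes optimizer. All values illustrative.
def pick_model(candidates, evaluate, min_accuracy=0.95):
    # candidates: list of (name, cost_per_1k_tokens) pairs.
    for name, cost in sorted(candidates, key=lambda c: c[1]):
        if evaluate(name) >= min_accuracy:
            return name
    return None  # nothing cheap clears the bar; caller falls back to a big model

# Stubbed eval scores standing in for a real eval harness.
scores = {"tiny-model": 0.80, "small-model": 0.96, "big-model": 0.99}
chosen = pick_model(
    [("tiny-model", 0.1), ("small-model", 0.5), ("big-model", 5.0)],
    evaluate=lambda name: scores[name],
)
assert chosen == "small-model"
```

The open risk noted next in the conversation, input drift, is exactly what this greedy selection does not guard against; a production version would re-run the eval periodically.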
我总觉得总会有人问:如果输入数据有轻微变化,会发生什么?
I think the question always comes: what happens when there is a bit of drift in the input?
你准备好应对这种变化了吗?
Are you ready to absorb that?
我认为使用较小的模型确实存在这种风险。
And I think with smaller models, that has been a risk.
但这种情况正在出现。
But it's coming.
我觉得现在人们在编码代理上的投入已经变得相当疯狂了。
I think the amount people are spending on coding agents, it's getting quite crazy out there.
我认为编码代理会首先推动这种成本优化。
I just think coding agents will drive that cost optimization first.
我已经看到很多我们的开发者尝试使用 Ollama 和本地模型,只在需要思考时使用云服务,而用本地模型来编写实际代码。
I've already seen a lot of our developers trying to use Ollama and local models and trying to use cloud only for thinking and using a local model for writing the actual code.
因此,我认为人们已经开始针对他们面临成本压力的场景构建这些系统。
And so I think people have begun to develop these systems for wherever they are facing these cost pressures.
我认为在成本压力和产品利润率的驱动下,许多这类事情都会发生。
I think the economy under cost pressures and product margins will drive many of these things.
我觉得现在人们只是想先以高可靠性实现他们的用例和自动化。
I think right now people are just trying to get their use cases and automation right with high reliability.
接下来紧随其后的就是成本问题。
Right after this will come the cost aspects.
我认为每个人都觉得他们可以降低成本,所以他们或多或少会推迟一年左右。
I think everyone feels they can reduce the cost, so they are kind of delaying it one year onwards or so on.
但这是必然的。
But it's coming.
这肯定会到来。
It's coming for sure.
当你持续在这个领域工作并监控生态系统的发展时,你预计工具、底层架构和代理框架会如何适应自我改进的概念,并让实际执行和集成支持系统变得更简单?
And as you continue to work in this space and monitor the evolution of the ecosystem, what are some of the ways that you anticipate the tooling and substrates and agentic frameworks adapting to these concepts of self improvement and making the actual execution and integration of the supporting systems easier to do?
我认为这非常难以预测。
I think it's very, very hard to predict.
这是真实而坦诚的答案。
That's the true honest answer.
我认为我们可以看到,像我们SymphonyAI这样的公司会在我们所处的某些行业中构建出性能优异、可靠的代理系统。
I think what we can see is there'll be companies like us at SymphonyAI who will build very good performance, reliable, agentic systems in some industries we are in.
我们的希望是,这能在我们所处的行业中得到广泛应用。
And our hope is that results in wide adoption in the industries we are in.
但企业会意识到,这些技术栈正在趋同。
But the enterprises will realize that the stacks are converging.
我们从客户那里收到了很多请求,问:‘你们能帮帮我们吗?’
We get a lot of requests from our customers asking, can you help us?
除了你们正在做的用例之外,你们能否帮助我们公司标准化看待智能体系统的方式?
Besides the use cases you are in, can you help us in our company in standardizing ways in which we should see our Agentic systems?
我认为在数据层面上,我们已经看到MCP服务器和MCP协议在智能体接口标准化方面取得了一定进展。
I think at the data layers, we already see a fair bit of standardization, with MCP servers and the MCP protocol standardizing interfaces to agents.
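The standardization being pointed at is declarative tool descriptions: an MCP-style tool advertises a name, a description, and a JSON Schema for its inputs, so any agent speaking the protocol can discover and call it. The field shapes below follow the MCP tool-listing format as commonly published; treat the details, and the example tool itself, as an approximation rather than the normative spec.

```python
query_metrics_tool = {
    "name": "query_customer_metrics",
    "description": "Run a read-only metrics query against the warehouse.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "metric": {"type": "string"},
            "since": {"type": "string", "description": "ISO-8601 date"},
        },
        "required": ["metric"],
    },
}

def validate_call(tool, args):
    """Tiny stand-in for what an agent runtime checks before invoking a tool."""
    missing = [f for f in tool["inputSchema"]["required"] if f not in args]
    return not missing

print(validate_call(query_metrics_tool, {"metric": "churn_rate"}))  # → True
```

Because the contract lives in data rather than code, swapping the agent on one side or the tool implementation on the other doesn't break the interface, which is what makes this layer converge first.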
但我还没看到A2A协议在多智能体互操作性方面有太多应用。
I haven't seen that much pickup in A2A like protocols between multi agent interoperability.
不过,随着OpenClaw等系统越来越流行,企业也会看到智能体控制平面出现一些标准化趋势。
But, you know, with systems like OpenClaw and all getting popular, there will be some standardization in the agent control planes as well that enterprises see.
我认为,当有多个智能体在运行时,谁来管理和控制跨这些智能体的策略,这显然正在成为一个新兴领域。
I think with many agents running, who's governing and controlling policies across them? Clearly that is emerging as an area.
我认为,未来的工具链正变得越来越标准化。
I think the tooling is getting very, very standardized going forward.
难以预测的是,Postgres数据库是否会成为代理的首选层,还是文件系统会占据主导,这些情况会如何变化?
What is hard to predict is whether it'll be Postgres databases as the agent preferred layer or whether it'll be file systems and like how do these things change?
这正在迅速演变,很难预测。
That's quite evolving and that's hard to predict.
但将代理视为代理生命周期管理的概念已经相当标准化了,就像模型运维和模型生命周期管理一样。
But the concept of agent lifecycle management has become fairly standardized, just like model ops and model lifecycle management.
因此,这些方面正在变得标准化。
And so those things are getting standardized.
我认为,关键在于代理进入生产环境并创造价值,之后其底层的各个层面都会开始标准化。
I think it's a question of agents going into production, creating value, and then everything under them will start getting standardized in its layers.
好的。
All right.
在我们结束本节目之前,关于自我改进的AI系统或AI应用生态系统,还有没有我们尚未讨论但你希望补充的其他方面?
Are there any other aspects of self-improving AI systems or the ecosystem of AI applications that we didn't discuss yet that you'd like to cover before we close out the show?
对我来说,这非常有趣。
I think for me, it's very interesting.
事实上,大家都清楚,基础模型公司正在大力投资强化学习与真实世界环境,尤其是在可验证系统方面,模型在各个领域正随着越来越多的强化学习环境的创建而不断改进。
I mean, ultimately, it's known to everybody that for the foundation model companies, reinforcement learning with real-world environments, and especially with verifiable systems, is a big area of investment, and models are improving at different domains with more and more RL environments being created and so on.
让我感到特别有趣的是,同样的能力能否迅速被企业掌握并加以利用。
What's very interesting to me is if that same subset of capability can come very quickly to enterprises in a way they can harness it.
就我的业务流程而言,我能否获得同样水平的能力?
And so for just my business process, can I get kind of that same level of capability?
那么,从一个较小的模型开始,我该如何在不雇佣研究科学家的情况下,应用相同的强化学习步骤呢?
So start with a smaller model and how do I apply the same reinforcement learning steps without needing to have a research scientist employed in my company, right?
但我知道流程是动态的,如果我以某种方式执行流程,我知道它是次优的。
But knowing that the process is dynamic, knowing that if I follow the process in one way, it is suboptimal.
如果我换一种方式执行,效果会更好。
If I follow it another way, it is better.
我该如何在不了解奖励机制和强化学习原理的情况下,将这种差异转化为模型的奖励?
How can I turn it into a reward for the model without knowing what rewards mean and RL means and all that?
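One way to hide the RL machinery, sketched here as an assumption about how such a product might work rather than anything SymphonyAI has described: let the business user only say "run A of the process was better than run B," and fold those pairwise preferences into per-variant scores with a tiny Elo-style update. The resulting ordering can later serve as a reward signal. Variant names and the K factor are illustrative.

```python
def elo_update(ratings, winner, loser, k=32):
    """Fold one pairwise preference (winner beat loser) into the scores."""
    ra, rb = ratings[winner], ratings[loser]
    expected_a = 1 / (1 + 10 ** ((rb - ra) / 400))  # win probability implied by current scores
    ratings[winner] = ra + k * (1 - expected_a)
    ratings[loser] = rb - k * (1 - expected_a)

ratings = {"process_v1": 1000.0, "process_v2": 1000.0}
# Three observed preferences: v2 beat v1 twice, v1 beat v2 once.
for winner, loser in [("process_v2", "process_v1"),
                      ("process_v2", "process_v1"),
                      ("process_v1", "process_v2")]:
    elo_update(ratings, winner, loser)

print(ratings)  # process_v2 ends above process_v1, giving a reward ordering
```

The point is that the operator never sees the words "reward" or "policy": they only express which process execution they preferred, and the system derives a trainable signal from that.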
如果我们能简化这一点,我认为很多企业都会对此感兴趣,这也是我们正试图深入探索的领域。
If we can simplify that, I think a lot of businesses would be interested, and that's an area we are trying to go deep into as well.
我认为这将使企业能够拥有自己的推理层,而不必依赖大型模型公司。
I think that will enable companies to own sort of reasoning layers of their own without relying on the big model companies.
因此,我对这一领域的发展前景感到兴奋。
So I'm excited how this area emerges forward.
好的。
All right.
对于任何想联系你并关注你工作的人,我会请你把首选的联系方式添加到节目笔记中。
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes.
最后一个问题,我想听听你对当今AI系统在工具技术或人员培训方面最大缺口的看法。
And as the final question, I'd like to get your perspective on what you see as being the biggest gaps in the tooling technology or human training that's available for AI systems today.
工具和技术方面最大的缺口是什么?
Biggest gap in tooling and technology?
我认为,我可能会反过来讲。
I think it's, I would probably say the other way.
我认为这与Claude Code和其他一些智能体的发展方式有关。
I think it's with the way Claude Code and some of these other agents are evolving.
填补任何存在的空白其实非常容易,对吧?
It is very easy to fill whatever gap exists, right?
我更担心的差距在于,当需要将这些技术应用到真实产业和实际案例时的集成步骤。
I think the gap I'm more worried about is the integration steps when you need to apply this to real practical industries, to real practical examples.
我认为目前没有清晰的集成框架。
I think there is no clear integration outline.
每个实施项目都必须以不同的方式处理。
Like everything has to be done differently for every implementation that you go to.
因此,如果能利用智能体去发现这些流程,并创建出类似模板的东西,我认为我们长期以来一直在讨论行业中的数字孪生概念,也已经有过多次尝试。
And so if there is a way to use agents to go and discover these processes and create templates of them. Effectively, we've been talking about digital twins for a while in the industry, and there have been several attempts at it.
我们在SymphonyAI从产业角度也做过自己的尝试,但真正将数字孪生视为企业中的工具或集成层的意识还不存在。
We have our own attempt at it from an industry perspective at SymphonyAI, but a true recognition of a digital twin in a company, as tooling, as an integration layer, is not there.
如果这种意识存在,智能体就能非常轻松地接入其中。
And if it was there, then agents would onboard onto it very easily.
但我觉得另一方面,如果你把Claude Code部署到公司的一个环境中,赋予它访问各种实时基础设施和资源的权限,它就能开始自行摸索。
But I think on the flip side, you put Claude Code in an environment in a company, you give it access to various live infra and resources, and it starts to figure things out.
我认为我们拥有了人类历史上最棒的工具,尤其是对于开发者而言,这些AI工具简直太强大了。
I think we've got the best tooling ever in the history of mankind and what developers had, especially with these AI tools.
因此,我实际上对如何填补现有差距和改进工具持非常非常乐观的态度。
So I'm actually seeing the picture as very, very optimistic on what we can do to fix the gaps in tooling where they exist.
好的。
All right.
非常感谢您今天抽出时间与我交流,分享您在构建这些AI系统方面的经验和见解,特别是如何让它们持续演进和提升,以及如何确保它们始终与组织目标保持一致的安全考量。
Well, thank you very much for taking the time today to join me and share your experiences and insights into how to build these AI systems in a way that they can continually evolve and improve and some of the safety considerations around how to make sure that they stay well aligned with the organization's objectives.
这是一个迷人且快速发展的领域。
It's a fascinating and fast moving space.
非常感谢您抽出时间分享您在应对各种挑战过程中积累的专业知识,希望您今天剩下的时间愉快。
So I appreciate you taking the time to help share some of the expertise that you've developed through working through the hard bits and hope you enjoy the rest of your day.
谢谢。
Thank you.
谢谢您邀请我。
Thank you for having me.
感谢您的收听,别忘了收听我们的其他节目。
Thank you for listening, and don't forget to check out our other shows.
Podcast.__init__ 涵盖 Python 语言、其社区以及它被用于创新的方式。
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
AI 工程播客是您了解构建 AI 系统这一快速变化世界的指南。
And the AI Engineering podcast is your guide to the fast moving world of building AI systems.
访问网站订阅节目、注册邮件列表并阅读节目笔记。
Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
如果您从节目中有所收获或尝试了某个项目,请告诉我们。
And if you've learned something or tried out a project from the show, then tell us about it.
请将您的故事发送至 hosts@dataengineeringpodcast.com。
Email hosts@dataengineeringpodcast.com with your story.
为了帮助其他人找到这个节目,请在 Apple 播客上留下评价,并告诉您的朋友和同事。
Just to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.