Maths on the Move

Euromaths: Heather Harrington

Episode description

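We all know what data is: bits of information, of which we have lots in this age of big data. You might also know what topology is: the study of shapes that considers two shapes to be the same if one can be deformed into the other without tearing or gluing.

But what is topological data analysis? And how might it help us understand proteins, or diseases such as cancer? We talk to mathematician Heather Harrington, whom we met at the European Congress of Mathematics (ECM) this summer. Heather explains how topological data analysis can produce a so-called "barcode" for a given data set, revealing its underlying structure. Below are some illustrations of barcodes to help you follow what we discuss in the podcast.

Our attendance at the ECM was kindly supported by the London Mathematical Society (LMS). Heather gave the LMS lecture at the ECM.

You can also listen to more episodes of our Euromaths series about the ECM.

Figure: Circles drawn around 20 points in the plane. When the radius r is less than r0, the circles are small enough not to overlap (left). When the radius is larger than r0 but less than r1, the circles overlap and form a ring-like structure (middle). When the radius is larger than r1, the circles join up in the centre of the ring, and what you see is a single blob without a hole.

Figure: The barcode captures this information. For r < r0 there are 20 red lines, indicating 20 connected components without holes. For r0 < r < r1 there is one green line, indicating one connected component with one hole (red and green stand for no hole and one hole respectively). For r > r1 there is one red line, indicating one connected component without holes.

This content was produced with kind support from the London Mathematical Society.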

Bilingual subtitles

Speaker 0

大家好,欢迎收听《移动中的大师》,这是来自plus.maths.org的播客。

Hello and welcome to Maths on the Move, the podcast from plus.maths.org.

Speaker 0

我是玛丽安·弗赖贝格。

I'm Marianne Freiberger.

Speaker 1

我是蕾切尔·托马斯。

And I'm Rachel Thomas.

Speaker 1

今天,我们继续我们的特别系列播客‘欧罗巴数学’,这个系列的所有访谈都是你在夏季前往塞维利亚参加欧洲数学大会时进行的,玛丽安。

Today, we're continuing our special Euromaths series of podcasts, which all feature interviews that you did, Marianne, at the European Congress of Mathematics that you went to in Seville in the summer.

Speaker 1

这个播客系列由伦敦数学学会支持。

And this podcast series is supported by the London Mathematical Society.

Speaker 1

在我们开始录制之前,你说这周我们要讨论拓扑数据分析。

Now before we started the recording, you said this week we're going to talk about topological data analysis.

Speaker 1

我知道数据是什么,是一些信息。

Now I know what data is, bits of information.

Speaker 1

正如我们所知,我们每天都会接触到大量这样的数据,尤其是在大数据时代,数据量还在不断增长。

And as we know, we deal with a lot of this every day, growing more and more in the era of big data.

Speaker 1

我知道拓扑是什么。

And I know what topology is.

Speaker 1

它是一种描述形状的方法,允许形状发生变形。

It's a way of describing shapes that allows for deformation.

Speaker 1

所以,众所周知,一个咖啡杯在拓扑上与一个甜甜圈是相同的,因为你可以通过拉伸、挤压和拉扯将甜甜圈变成咖啡杯,再变回去。

So famously, a coffee cup is topologically the same as a doughnut because you can stretch and squeeze and pull a doughnut to form a coffee cup and back again.

Speaker 1

但是,Marianne,什么是拓扑数据分析?

But Marianne, what is topological data analysis?

Speaker 0

嗯,我是从希瑟·哈灵顿那里了解到拓扑数据分析(简称TDA)的。她在欧洲数学大会期间做了伦敦数学学会的讲座,因为每次欧洲数学大会,伦敦数学学会都会主办一场特别讲座。

Well, I found out about topological data analysis, or TDA for short, from Heather Harrington, who gave the London Mathematical Society lecture at the European Congress of Maths, because at every ECM, the LMS hosts a special lecture.

Speaker 0

现在,希瑟是牛津大学的数学教授,同时也是德国德累斯顿马克斯·普朗克分子细胞生物学与遗传学研究所的所长,以及德累斯顿工业大学的荣誉教授。

Now Heather is Professor of Mathematics at Oxford, and she's also a director at the Max Planck Institute of Molecular and Cell Biology and Genetics in Dresden in Germany, and an honorary professor at the TU in Dresden.

Speaker 0

她还是拓扑数据分析中心的联合主任。

And she's the co-director of the Centre for Topological Data Analysis.

Speaker 0

所以她身兼数职。

So she's a lot of things.

Speaker 1

她真是非常忙碌。

She's a very busy woman.

Speaker 0

确实是。

She is.

Speaker 0

为了了解拓扑数据分析的作用和它为何如此有用,她给我举了蛋白质的例子。

Now, to get an idea of what topological data analysis does and why it's really useful, she gave me the example of proteins.

Speaker 0

如果你把蛋白质看作是三维空间中的物体,它们的结构可以非常复杂。

Now proteins can have very complex structures if you think of them as objects sitting in three-dimensional space.

Speaker 0

人们特别感兴趣的一件事是所谓的蛋白质折叠。

And one thing that people are particularly interested in is this thing called protein folding.

Speaker 1

我听说过蛋白质折叠。

I've heard of protein folding.

Speaker 1

构成蛋白质的氨基酸链会折叠,形成复杂的三维结构存在于空间中。

So the chain of amino acids that makes up the protein, it kind of folds to give some complex 3D structure sitting in space.

Speaker 1

因此,它不再仅仅是一条长长的直线,而是折叠成了一个有趣的结构。

So it's no longer just one long line, one long chain of acids; it's kind of folded into this interesting structure.

Speaker 1

这真的很重要,不是吗?

And it's really important, isn't it?

Speaker 1

因为如果它没有完全正确地折叠,就可能导致疾病、过敏之类的问题,对吧?

Because if it doesn't fold exactly the right way, it can lead to diseases and things like allergies, can't it?

Speaker 0

是的,没错。

Yeah, that's right.

Speaker 0

蛋白质折叠是一个非常活跃的研究领域。

And protein folding is a very, very active research topic.

Speaker 0

我还从希瑟那里听说,谷歌深脑甚至开发了一个名为AlphaFold的AI系统,它利用人工智能根据氨基酸序列预测蛋白质的三维结构。

So I heard from Heather as well that there's even an AI system called AlphaFold, by Google DeepMind, which uses AI to predict a protein's 3D structure from its amino acid sequence.

Speaker 0

要构建这样的AI系统,你需要大量关于蛋白质及其折叠方式的数据,以便AI系统能够从这些数据中学习。

And to make such an AI system, you need lots and lots of data about proteins and the way they fold so that the AI system can learn from that data.

Speaker 0

因此,目前有许多研究正在围绕蛋白质及其折叠展开。

So there's lots of research going into proteins and their folding.

Speaker 1

所以,我想象如果你要对所有关于蛋白质的数据进行统计分析,或者用这些数据训练AI,你就需要一种非常好的方法来描述蛋白质结构的形状及其在空间中的姿态。

So I imagine if you're gonna do sort of a statistical analysis of all the data you might have about proteins or to train an AI on that sort of data, you need a really good way of describing the shape of this structure of a protein and the way it sits in space.

Speaker 0

是的。

Yeah.

Speaker 0

所以这完全正确。

So that's exactly right.

Speaker 0

现在如果你想想,如果我给你看一个蛋白质,你会如何最精确地描述它的外观,用一种计算机能够理解的方式?

Now if you think about it, if I showed you a protein, how would you describe the way it looks in the most precise way, in a way that a kind of computer can understand?

Speaker 1

我首先想到的是记录蛋白质结构中所有点在三维空间中的x、y和z坐标。

Well, my first thought would be something like writing down the x, y and z coordinates in 3D space of all the points in the protein structure.

Speaker 1

我知道计算机确实是用这种方式存储三维形状的。

I know computers do store 3D shapes in that way.

Speaker 1

所以我想,也许你可以为每个氨基酸选择一个点,作为该氨基酸在空间中的x、y、z坐标,然后描述它们是如何连接的。

So I can imagine that maybe you choose a point on each amino acid, write down the x, y and z coordinates of that point in space, and then record how the acids are linked together.

Speaker 1

但这是一种相当原始的描述方式。

But that's quite a brute force way of describing something.

Speaker 0

是的。

Yeah.

Speaker 0

所以你可以这么做。

So you could do that.

Speaker 0

比如,对于任何形状,你都可以通过给出其中每个点的x、y、z坐标来描述它。

Like, for any shape, you could describe it by giving the x, y, z coordinates of every point in it.

Speaker 0

但这种方法的一个问题,除了需要列出一长串坐标外,还在于如果你的坐标测量稍有偏差——因为测量起来非常困难——就可能无意中描述了一个完全不同的蛋白质,甚至是一个根本不存在的结构。

But one problem with that, apart from having a long list of coordinates, is this: what if you got the coordinates slightly wrong, for example because they're very difficult to measure? Then you might accidentally end up describing a completely different protein, or maybe one that doesn't even exist.

Speaker 0

或者你可能遗漏了结构中的某个关键特征。

Or maybe you miss a crucial feature of the structure.

Speaker 1

所以我想,这就是拓扑学可以发挥作用的地方。

So I suppose that's where topology could come in.

Speaker 1

你可能需要一种蛋白质的描述方式,一种能够容忍些许误差、偏差或噪声的描述方法:即使存在微小的错误,它仍能捕捉到形状重要的拓扑结构特征,比如结构是由多个分离的部分组成,还是包含环状或回路结构,或者有很多分支?

You might want some kind of descriptor of the protein, some way of describing it that's tolerant to a bit of deformation, a bit of mistake, a bit of noise, so that although there might be this slight error, it still captures the important structural, topological features of the shape, such as whether the structure is made up of several disconnected bits, or whether it contains a ring or a loop or lots of branches?

Speaker 0

是的,正是如此。

Yeah, exactly.

Speaker 0

因此,拓扑数据分析就是关于为数据开发这样的描述方法。

So topological data analysis is about developing such descriptors for data.

Speaker 0

所以这些数据可以是蛋白质。

So the data could be proteins.

Speaker 0

你想要的是一个对少量信息误差具有容忍性并能捕捉重要特征的描述符。

And what you want is a descriptor of this data that is tolerant to a bit of deformation and captures the important features.

Speaker 0

以下是希瑟用她自己的话来解释。

Here's Heather explaining it in her own words.

Speaker 2

所以想法就是你刚才说的,即你有一些三维空间中的点,比如蛋白质结构中的碳原子,因此你有x、y、z坐标。

So the idea is exactly what you said, which is that you have some points in three-dimensional space, like the carbon atoms in the protein structure, and so you have x, y, z coordinates.

Speaker 2

然后你可能不只想研究几何结构。

And then maybe you don't want to only study the geometry.

Speaker 2

你还想了解一些拓扑结构,我会解释一下。

You also want some of the topology, which I'll explain.

Speaker 2

因此,拓扑数据分析使用拓扑不变性,但也会提取一些几何信息,因为你实际上是在将其视为一个多尺度描述符。

So TDA uses topological invariants, but you extract some geometric information as well because you're really looking at this as a multiscale descriptor.

Speaker 2

所以如果你的输入数据发生轻微扰动,蛋白质仅发生微小变化——我们知道蛋白质并非完全刚性。

And so suppose your input data is perturbed just a little bit, the protein is perturbed just a tiny bit, which happens because we know proteins are not completely rigid.

Speaker 2

对吧?

Right?

Speaker 2

它们是动态的。

They are dynamic.

Speaker 2

所以如果它只是轻微扰动,你希望得到相似的描述符,这就是TDA的优势之一。

So if it perturbs just a little bit, you want to get a similar descriptor out, and that's one of the strengths in TDA.

Speaker 2

所以我们研究了这个,实际上这是和本科生一起做的。

And so we looked at this, and this was actually with undergraduate students.

Speaker 2

这是其中一个本科生研究项目,他们获取了蛋白质数据,然后运行了这个拓扑分析流程,计算不同维度的拓扑特征。

So it was one of these undergraduate research projects where they took protein data and then ran this topological pipeline where you compute different topological features in different dimensions.

Speaker 2

比如一些环的连通性,也就是一维的空洞。

So, for example, how connected the data is, or whether there are some loops, that is, one-dimensional holes.

Speaker 2

通过这些,他们可以得到不同蛋白质的指纹或描述符。

And from that, they could kind of get a fingerprint or a descriptor of different proteins.

Speaker 2

我们发现,相似的蛋白质会聚在一起,在这种拓扑指纹——也就是这个方法的输出结果中——非常相似。

What we found is that proteins that are similar cluster together: they are very similar in their topological fingerprint, the output of this pipeline.

Speaker 1

所以海瑟提到了一个流程。

So Heather mentioned a pipeline.

Speaker 1

她所说的流程是什么意思?

What does she mean by that?

Speaker 0

正如她在演讲中描述的那样,有一整套数学方法可以对原始数据进行处理,最终生成一个描述该数据的特征指纹。

So as she described in her talk, there's a whole pipeline of mathematical things you can do to raw data to at the end come out with a fingerprint describing that data.

Speaker 0

这个流程涉及一种非常有趣的东西,叫做滤波。

And this pipeline involves this very interesting thing that is called a filtration.

Speaker 0

现在,蕾切尔,为了说明或解释一下什么是这个滤波,我们先来想象一个场景。

Now, Rachel, in order to describe or give an idea of what this filtration is, let's start by imagining something.

Speaker 0

对吧?

Right?

Speaker 0

好的。

Okay.

Speaker 0

我想让你想象一下,我们试图描述的对象只是一些点。

I want you to imagine that the object that we're trying to describe is just a bunch of points.

Speaker 0

对吧?

Right?

Speaker 1

比如在三维空间中的点

Like points in 3D space?

Speaker 0

在平面上。

In the plane.

Speaker 0

它们可以在平面上。

They can be in the plane.

Speaker 0

好的。

Okay.

Speaker 0

这就是在一个平面上。

That's in a plane.

Speaker 0

是的。

Yeah.

Speaker 0

让我们想象它们都呈某种圆形排列。

Let's imagine they're all arranged in a sort of circular fashion.

Speaker 0

它们位于一个圆上。

They sit on a circle.

Speaker 0

对吧?

Right?

Speaker 1

所以我想象的是一张空白的纸,上面有一些点构成了一个圆形。

So I'm imagining kind of a blank piece of paper with dots forming into a kind of circle.

Speaker 0

是的。

Yeah.

Speaker 0

想象一下,假设有大约20个点,比如说有20个点。

Let's say there are 20 points.

Speaker 0

并且假设它们之间的距离都相等,比如说一厘米。

And let's say they all have equal distances between them, let's say distance of a centimeter.

Speaker 0

明白吗?

Okay?

Speaker 0

对。

Right.

Speaker 0

现在想象一下,我在每个点周围画一个半径为一毫米的小圆。

Now imagine if I draw a tiny little circle of one millimeter radius around each of these points.

Speaker 0

那你看到了什么?

What do you see then?

Speaker 1

嗯,我现在看到一圈小圆,但由于这些点之间的距离比小圆的半径要大,所以这些小圆彼此是分离的。

Well, I now see a ring of circles, but the circles are going to be distinct from each other because the points are separated by a bigger distance than the radius of the little circles.

Speaker 1

所以这就像一个由许多小圆组成的大圆。

So it's like a bigger circle which is made up of little circles.

Speaker 0

没错。

Exactly.

Speaker 0

没错。

Exactly.

Speaker 0

这就是你看到的。

That's what you see.

Speaker 0

所以,如果我们把圆的内部也算上,那就是二十个小圆盘。

So it's a bunch of 20 little discs, if we take the inside of each circle as well.

Speaker 0

对吧?

Right?

Speaker 0

而且它们不会重叠,正如你所说。

And they don't overlap, as you say.

Speaker 0

那么,如果我逐渐增大这些圆的半径并继续下去,会发生什么?

Now what if I gradually increase the radius of these circles and keep going?

Speaker 1

最终会发生什么?

What will happen eventually?

Speaker 1

最终,当这些圆变得足够大时,它们内部的区域会相互重叠。

Eventually, the discs, the areas inside these circles will overlap each other as they get big enough.

Speaker 1

没错。

Exactly.

Speaker 0

没错。

Exactly.

Speaker 0

一旦它们开始重叠,你看到的是什么?

And what do you see once they've started overlapping?

Speaker 1

然后你就看到一个连续的圆圈?

Then you kind of see a continuous circle?

Speaker 0

是的,没错。

Yeah, exactly.

Speaker 0

当它们开始重叠时,你会看到类似环状的结构。

You see sort of like a ring type structure as they've started to overlap.

Speaker 0

所以你会得到一个中间有洞的环。

So you have a ring with a hole in the middle.

Speaker 0

对吧?

Right?

Speaker 0

对。

Yeah.

Speaker 0

好的。

Okay.

Speaker 0

现在,如果你继续增大这些圆的半径,让它们变得非常、非常大,会发生什么?

Now what if you keep increasing the radius of these circles if you make them just really, really big?

Speaker 1

如果你把这些圆变得非常大,最终它们不仅会覆盖整个环,还会一直延伸到环的中心。

If you make them really, really big, then eventually they're going to cover not just the ring, but they'll end up covering all the way to the center of the ring.

Speaker 1

所以你会得到一个大的圆盘,也就是整个圆形内部被这些单独的圆覆盖住了。

So you'll end up getting kind of like a big disc, a big interior of a circle covered, which is made up by all these individual circles.

Speaker 0

是的。

Yeah.

Speaker 0

没错。

Exactly.

Speaker 0

所以发生的情况是,所有这些环形的圆突然开始在中间合并,形成一个类似团块的形状,完全就是这样。

So what happens is that this ring of circles suddenly starts merging together in the middle and you get a blob made up of all the discs. Exactly.

Speaker 0

这就是你得到的结果。

So that's what you get.

Speaker 0

因此,通过不断增加这些小圆盘的半径——如果我们是在三维空间中,它们就是球体——这告诉我们关于这些点的分布情况,对吧?

So this process of increasing the radius of the little disks, or if we were in 3D space they would be balls, tells us something about the configuration of the bunch of points, right?

Speaker 0

因为它表明,最初这些点是彼此分离的,因为我们看到的是许多独立的圆盘。

Because it tells us that initially they're separated, since we get lots of individual disks.

Speaker 0

它们连接起来形成一个环,这意味着它们是以某种圆形的方式排列的。

The fact that they join up to form a ring means they're arranged in a kind of circular fashion.

Speaker 0

然后最终所有部分都融合成一个团块。

And then eventually it just all merges up into a blob.

Speaker 1

因此,在这些圆盘逐渐增大的过程中所形成的形状特征,能让你对初始数据的结构有所了解。

So the characteristics of the shapes that are created along the way, as these disks increase in size, give you some sense of the structure of the data that you started with.

Speaker 0

是的,没错,正是如此。

Yeah, yeah, exactly.

Speaker 0

这就是滤波的过程。

And this is what the filtration is.

Speaker 0

更广泛地说,滤波涉及增加一个相关参数,在这个例子中就是半径,以捕捉信息,当然,你也可以用比我们刚才所讲的更严谨的数学方式来实现。

And more generally, a filtration involves increasing a relevant parameter, in this case it was the radius, to capture information, and you can do that with a little more mathematical rigor than we have just let on.

Speaker 0

下面是海瑟的解释。

Here is Heather explaining it.

Speaker 2

没错。

Exactly.

Speaker 2

因此,滤过过程是观察这个参数的每一个取值,比如半径,然后查看在每个参数值下产生的形状是什么。

So a filtration is looking at each value of this parameter, which could be the radius, and then looking at what shape results at each parameter value.

Speaker 2

而我们希望对这个形状进行量化,这里有一个可计算的拓扑不变量叫做同调,通过这种球体的并集,我们可以提取出数据的连通性和形状。

And that shape we want to quantify, and there's a topological invariant that's computable, called homology, with which we can extract the connectedness and the shape of that data through, basically, this union of balls.
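
The growing-discs picture can be tried out in a few lines of code. The following is a minimal sketch, not Heather's pipeline: it assumes the 20 points sit at the corners of a regular 20-sided polygon with neighbours 1 cm apart, and it only counts connected pieces (it does not detect holes). The names `circle_points` and `count_components` are made up for this illustration.

```python
import math
from itertools import combinations

def circle_points(n=20, spacing=1.0):
    """n points equally spaced on a circle, with neighbours `spacing` apart."""
    big_r = spacing / (2 * math.sin(math.pi / n))  # circumradius of the regular n-gon
    return [(big_r * math.cos(2 * math.pi * k / n),
             big_r * math.sin(2 * math.pi * k / n)) for k in range(n)]

def count_components(points, r):
    """Number of connected pieces in the union of discs of radius r around the points."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's group (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        # Two discs of radius r overlap when their centres are less than 2r apart.
        if math.dist(points[i], points[j]) < 2 * r:
            parent[find(i)] = find(j)
    return len({find(i) for i in range(len(points))})

points = circle_points()
for r in (0.1, 0.4, 0.6, 2.0):
    print(f"r = {r} cm: {count_components(points, r)} component(s)")
```

Running the count at each radius in turn is a crude version of a filtration: one parameter is increased, and the shape is summarised at every value.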

Speaker 1

我真的很喜欢滤过这个概念。

Well, I really like this idea of filtration.

Speaker 1

这是一个非常巧妙的想法,我喜欢它对微小变化具有鲁棒性。

It's a very neat idea, and I like the fact that it's sort of robust to little changes.

Speaker 1

所以,如果我们例子中的某些点稍微移动了一下,整个过程中发生的事情——比如圆盘先连接成环状,再合并成一团——并不会发生巨大改变。

So if some of the points in our example were moved a little bit, the overall nature of what's happening during the process, the way the disks join up first into a ring and then into a blob, wouldn't massively change.

Speaker 1

是的,

Yeah,

Speaker 0

确实如此。

that's true.

Speaker 0

在拓扑数据分析中,通过这种滤过过程,你可以构建出拓扑数据分析者所说的条形码,它描述了你所关注的结构,而这正是你用来表征结构的特征指纹。

And from such a filtration in topological data analysis, you can build what the topological data analysts call a barcode, which describes the structure you're interested in and that's the kind of fingerprint that you're looking for to describe the structure.

Speaker 1

所以现在我脑海中有了一个画面,拓扑数据分析正在扫描这些东西,就像自助结账一样。

So now I have a picture in my head of topological data analysis scanning these things, like at a self-checkout.

Speaker 0

是的,它实际上生成了条形码,然后任何统计分析都可以扫描所有这些条形码并进行处理。

Yeah, or rather it's actually producing the barcode, and then any statistical analysis can scan all these barcodes and process them.

Speaker 0

对。

Yeah.

Speaker 1

它基本上能识别出你所关注的对象或结构。

It kind of identifies the object that you're looking at, the structure you're interested in.

Speaker 0

没错。

Exactly.

Speaker 0

接下来,海瑟会更好地解释一下条形码这个概念。

So here's Heather describing this idea of a barcode a little better.

Speaker 2

所以,确切地说。

Exactly.

Speaker 2

在任何一个点上,如果我们考虑半径,对吧?在任何一个半径值或参数值下,条形的数量就代表了该维度中的拓扑特征数量。

At any point, if we're thinking of radius, at any radius value, any parameter value, the number of bars tells you the number of topological features in that dimension.

Speaker 2

当你考虑零维时,这指的是连通分量,也就是数据中的聚类。

So when you're thinking of dimension zero, this is connected components, the clusters in the data.

Speaker 2

而当你考虑一维时,这指的是数据中环的数量。

And if you're thinking of degree one, this is the number of kind of loops in your data.

Speaker 2

由于你得到了这个条形码,每个区间的左右端点分别告诉你一个特征何时出现、何时消失;当一个特征消失时,它的长度就给出了持久性的度量。

And because you have this barcode, the left and the right endpoints of each interval tell you when a feature is born and when a feature dies, and the length of that interval gives you a measure of persistence.

Speaker 2

那么这个特征的大小如何呢?

So, the size of that feature.

Speaker 2

你知道吗?你是否有一个会持续很久的大环,还是没有?

You know, do you have a big loop that will last a long time or not?
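
To make "born", "dies" and "persistence" concrete, here is a minimal sketch that computes only the dimension-zero bars (connected components) for growing discs. It assumes the same 20 equally spaced points with neighbours 1 cm apart; the function name `h0_barcode` is made up for this illustration, and loops (degree one) are not computed here because they need more machinery than fits in a short example.

```python
import math
from itertools import combinations

def h0_barcode(points):
    """Dimension-zero bars (birth, death) for growing discs around the points.

    Every point starts as its own component at radius 0. When two components
    first touch, at half the distance between the pair of points that joins
    them, one of the two bars ends. One bar never ends.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process pairs of points from closest to farthest.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                  # this pair merges two separate components
            parent[ri] = rj
            bars.append((0.0, d / 2))
    bars.append((0.0, math.inf))      # the component that survives forever
    return bars

# 20 points on a circle, neighbours 1 cm apart
n = 20
big_r = 1 / (2 * math.sin(math.pi / n))
points = [(big_r * math.cos(2 * math.pi * k / n),
           big_r * math.sin(2 * math.pi * k / n)) for k in range(n)]

bars = h0_barcode(points)
finite = [death for _, death in bars if death != math.inf]
print(len(bars), "bars;", len(finite), "of them end near r =", round(max(finite), 3))
```

The persistence of a bar is simply `death - birth`. In this example 19 bars end at about r = 0.5 and one bar never ends, which is the single component that remains once everything has joined up.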

Speaker 1

哇哦。

Wow.

Speaker 1

那么,比如我们之前想象的那20个数据点排成一个圆圈时,它的条形码会是什么样子?

So what would the barcode look like, for example, for that dataset of 20 points we were imagining before, arranged into a circle?

Speaker 0

好的。

Okay.

Speaker 0

我来试着描述一下。

I'll try to describe it.

Speaker 0

所以,让我看看我能不能做到,而且说得通。

So let's see if I can do this and it makes sense.

Speaker 0

首先,想象在一张纸上放一把尺子,水平放置,因为

So first of all, imagine putting down a ruler on a piece of paper, just horizontally.

Speaker 1

这张纸是那个有点的纸吗?

Is this the piece of paper with the dots on it?

Speaker 0

不是,是另一张。

No, it's a separate one.

Speaker 0

是用来画条形码的那张纸。

It's the one that you're going to draw the barcode on.

Speaker 1

对。

Right.

Speaker 1

好的。

Okay.

Speaker 1

所以现在我有一张单独的纸

So now I've got a separate piece of paper

Speaker 0

你把尺子放在上面。

and you're putting a ruler down on it.

Speaker 0

是的。

Yep.

Speaker 0

水平放置,因为我们平时在商品上看到的条形码是竖线,但这个条形码是由横线组成的。

Horizontally, because the barcodes we're used to from products that we buy are made of vertical lines, but this barcode would be made of horizontal lines.

Speaker 0

现在,在尺子零到半厘米之间的部分,我们画上20条平行线,全部堆叠在一起。

Now, above the bit of the ruler between zero and half a centimeter, we draw 20 parallel lines all stacked on top of each other.

Speaker 1

是因为我们有20个数据点吗?

Is that because we've got 20 points of data?

Speaker 0

因为只要小圆盘的半径小于半厘米,它们就不会重叠,对吧?

It's because as long as the radius of the little discs, as long as that's less than half a centimeter, they don't overlap, right?

Speaker 1

因为当你有半厘米时,它们有20个独立的部分。

Because you've still got 20 separate parts when the radius is less than half a centimeter.

Speaker 0

没错。

Exactly.

Speaker 0

所以你有了这20条水平线叠在一起,表示我们有20个连通分量。

So you've got this stack of 20 horizontal lines sitting on top of each other indicating that we have 20 connected components as they're called.

Speaker 0

我们也会给这20条线加上颜色。

And we'll give these 20 lines a color as well.

Speaker 0

我们把它们设为红色。

Let's make them red.

Speaker 0

这表示从拓扑学上看它们是圆盘,也就是中间没有孔的形状。

And this indicates that topologically they are disks, so they are things that don't have holes in them.

Speaker 1

所以颜色代表了这些线所表示的结构吗?

So the color indicates the structure that the lines represent?

Speaker 0

是的,它代表的是拓扑结构,也就是一个中间没有孔的圆盘。

Yeah, so it's the topological structure which is a disk, something without a hole in the middle.

Speaker 0

现在,在这叠线的右侧,当你沿着尺子往下回到半厘米以上的半径值时,我们会画一条单独的水平线,并将其涂成绿色。

Now to the right of that stack, which, if you go back down to the ruler, corresponds to values of the radius bigger than half a centimeter, we draw a single horizontal line, and we will color that green.

Speaker 0

好的。

Okay.

Speaker 0

这表明,一旦半径超过半厘米(因为点之间的距离是一厘米)。

So this is about what happens as soon as the radius is bigger than half a centimeter, because the points are one centimeter apart.

Speaker 0

如果两个点周围的圆的半径大于半厘米,那么它们就会重叠,对吧?

If around two points there's a circle of a radius bigger than a half, then they will overlap, right?

Speaker 0

是的。

Yeah.

Speaker 0

因此,一旦半径超过半厘米,你看到的就不再是20个互不相连的圆盘,而是一个单一的环。

So as soon as the radius is bigger than a half, what you see is no longer 20 disconnected disks, but a single ring.

Speaker 0

对吧?

Right?

Speaker 0

这就是为什么我们只画一条水平线。

So that's why we only draw one horizontal line.

Speaker 1

而且颜色不同了,因为现在它是一个环,而不是一个圆盘。

And it's a different color because now it's a ring rather than a disk.

Speaker 0

没错。

Exactly.

Speaker 0

所以从拓扑学上看,环和圆盘非常不同,因为一个有洞,另一个没有。

So now topologically a ring and a disk are very, very different because one has a hole and one doesn't.

Speaker 0

所以我们用颜色来表示这一点。

So we indicate this by color.

Speaker 0

当然,随着半径继续增大,在某个时刻环会消失并变成一个实心块。

And now, of course, we know that as the radius increases further, at some point the ring disappears and turns into a blob.

Speaker 0

这在拓扑上与圆盘是相同的。

Which is topologically the same as a disc.

Speaker 0

没错。

Exactly.

Speaker 0

因此,这会在我们的条形码中通过这条绿色线具有有限长度来表示。

So that will be indicated in our barcode by the fact that this green line has also a finite length.

Speaker 0

它会在某一点终止。

It stops at some point.

Speaker 0

当半径超过环消失的那个点时,我们将再次得到一条单一的红线。

And then what we get for values of the radius bigger than that point where the ring disappears, we will again have a single red line.

Speaker 0

这表示现在只有一个没有孔的单连通部分。

And that indicates that there's now a single simply connected bit without holes in it.
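
The two radii at which this barcode changes can be worked out for the example. The sketch below assumes the 20 points sit exactly at the corners of a regular 20-sided polygon with neighbours 1 cm apart. The names `r0` and `r1` follow the figure caption in the episode description; the formula for `r1` is an illustrative calculation and is not stated in the episode.

```python
import math

n = 20          # number of points (from the episode)
spacing = 1.0   # distance between neighbouring points, in cm (from the episode)

# Neighbouring discs start to overlap once the radius exceeds half the spacing.
r0 = spacing / 2

# Assumption: the points form a regular 20-sided polygon. The last uncovered
# spot inside the ring is then its centre, which gets covered once the disc
# radius reaches the polygon's circumradius.
r1 = spacing / (2 * math.sin(math.pi / n))

print(f"r0 = {r0:.2f} cm, r1 = {r1:.2f} cm")
print(f"red   x{n}: 0 < r < {r0:.2f}")
print(f"green x1 : {r0:.2f} < r < {r1:.2f}")
print(f"red   x1 : r > {r1:.2f}")
```

Under these assumptions the green line is much longer than the stack of 20 red lines, which is one way of seeing that the loop is the most persistent feature of this dataset.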

Speaker 1

我们有这张图吗?

Have we got a picture of this?

Speaker 1

如果能在节目中展示就好了。

It'd be great to put it in the show

Speaker 0

是的,我会制作一张图并放在节目笔记里。

Yeah, I will make a picture of it and put it in the show notes.

Speaker 0

我的意思是,这大致描述了条形码的样子以及它是如何编码的,

I mean, that is sort of the rough idea of what the barcode looks like and how it encodes,

Speaker 1

这些不同的拓扑特征如何随着半径增大而出现和消失。

how these different topological features appear and disappear as the radius increases.

Speaker 1

这是一种非常巧妙的方式,用以编码组件的数量、颜色表示组件的类型,以及线条的位置指示我们所观察的尺度。

And it's quite a clever way of encoding both the count of how many components you've got, the color indicating what type of components they are, and then the positioning of the lines indicating what scale you're looking at.

Speaker 1

这真的很有趣。

It's really interesting.

Speaker 0

是的,正如希瑟已经提到的,这种条形码的想法对微小变化具有鲁棒性。

Yes, and as Heather already said, this barcode idea is robust to small changes.

Speaker 0

所以,如果你稍微移动一下这些点,条形码看起来仍然大致相同。

So if you moved the little points a little bit, the barcode would look still roughly the same.

Speaker 0

它只会略有不同。

It would only be slightly different.

Speaker 0

而且,确实有一些数学结果证明了这种用于刻画我们所研究形状的方法的鲁棒性。

And I mean, there are mathematical results which prove the kind of robustness of this approach to characterizing these shapes that we were looking at.
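
This robustness can be checked numerically on the example. The sketch below is an illustration under stated assumptions, not a proof: it uses the same 20 points, nudges each one by 0.05 cm in a made-up direction, and compares the right-hand ends of the dimension-zero bars before and after. The helper `h0_deaths` and the noise pattern are hypothetical choices for this illustration.

```python
import math
from itertools import combinations

def h0_deaths(points):
    """Sorted radii at which components merge (right-hand ends of the dimension-zero bars)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for d, i, j in sorted((math.dist(points[i], points[j]), i, j)
                          for i, j in combinations(range(n), 2)):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d / 2)
    return deaths  # already sorted, because pairs were processed closest first

n, eps = 20, 0.05
big_r = 1 / (2 * math.sin(math.pi / n))
clean = [(big_r * math.cos(2 * math.pi * k / n),
          big_r * math.sin(2 * math.pi * k / n)) for k in range(n)]
# Nudge each point by exactly eps cm, in a direction that varies from point to point.
noisy = [(x + eps * math.cos(7 * k), y + eps * math.sin(7 * k))
         for k, (x, y) in enumerate(clean)]

shift = max(abs(a - b) for a, b in zip(h0_deaths(clean), h0_deaths(noisy)))
print(f"points moved by {eps} cm, bar endpoints moved by at most {shift:.3f} cm")
```

For these bars the endpoints cannot move by more than the points themselves did: moving every point by at most eps changes every pairwise distance by at most 2 eps, and the merge radii are half-distances. Results of this kind are usually called stability theorems.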

Speaker 0

而我们刚才描述的,当然只是一个非常简单的例子,即我们想要刻画的对象只是平面上的20个点。

And what we just described is of course a very simple example, where the object we wanted to characterize is just 20 points sitting in the plane.

Speaker 0

但借助数学的力量,你可以对比点集复杂得多的数据做类似的操作。

But with the power of mathematics, you can do something sort of similar for much more complex data than just bunches of points.

Speaker 0

例如,你可以为存在于更高维空间、我们甚至无法可视化数据创建条形码。

So, for example, you can create barcodes for data that live in higher dimensional spaces that we can't even visualize.

Speaker 1

所以,这再次体现了数学的力量:你可以将那些我们很容易直观理解的数学概念——比如圆盘、环形、二维中的扩展圆——数学地扩展到更高维度的情形中。

So again, that's that illustration of the power of maths: you can take mathematical ideas for things that we can really easily visualize, discs, rings, expanding circles in 2D, and you can just extend those ideas mathematically to cover higher dimensional situations.

Speaker 1

太神奇了。

It's amazing.

Speaker 1

那么,人们通常用拓扑数据分析来做些什么呢?

So what kind of things do people do with topological data analysis?

Speaker 0

我问过海瑟这个问题。

That's what I asked Heather.

Speaker 2

有一个名为Donut的完整数据库,里面收集了所有成功应用拓扑数据分析的案例数据集。

So there's an entire database called DONUT, which collects examples of datasets where TDA has been useful.

Speaker 2

我研究的许多问题都源于生物科学,但在拓扑数据分析领域,神经拓扑和神经科学取得了大量成果,因为大脑太过复杂。

And so many of the problems that I'm motivated to study are from biological sciences, but in the field of TDA, there's been a lot of success looking at neuro topology and neuroscience because the brain is so complicated.

Speaker 2

试图整合这些信息,这种方法能以某种方式将不同尺度或大脑连接的不同权重的信息整合起来进行研究。

Trying to kind of integrate information, this gives you a way to integrate it in some way and study across different scales or different weights of connections in the brain.

Speaker 2

我主要关注疾病领域,结果发现,如果一个人身体不适,他们通常会采集某种样本。

I've been focused a lot on disease, and it turns out that, you know, if someone is unwell, they'll take some type of sample.

Speaker 2

例如,这可能是一个组织样本。

It could be a tissue sample, for example.

Speaker 2

从组织样本中,病理学家或某种机器学习算法会指出这些细胞的位置,并识别出它们的类型。

And from the tissue sample, a pathologist or some type of machine learning algorithm will say these are the cell locations and identify what types of cells they are.

Speaker 2

因此,我们得到了平面上不同细胞的点分布,其中一个问题是:是否存在某些模式是特定疾病或疾病进展的典型特征?

And so we have points in the plane of what different cells are, and then one question is, are there certain patterns that are, you know, kind of characteristic of certain diseases or progression to a disease?

Speaker 2

一个例子就是癌症。

And so one example is cancer.

Speaker 2

癌症通常有不同的阶段,治疗也是如此。

There's different stages often in cancer, and same thing with treatment in cancer.

Speaker 2

因此,在过去几年里,我们成功地开发了新的TDA方法,这些方法正是由这类问题所驱动的。

And so we have, over the last few years, quite successfully built new TDA approaches that have really been motivated by these types of problems.

Speaker 2

那么,当我们获得这些空间信息时,能否描述并量化某些特定细胞的空间模式?

So can we describe and quantify the spatial patterns of certain cells when we are given that spatial information?

Speaker 2

通常,这些是一些免疫细胞及其亚型,比如T细胞、巨噬细胞,或者许多其他类型的细胞。

Usually, these are some type of immune cells, and subsets of them, like T cells or macrophages or, I don't know, many different types of cells.

Speaker 2

因此,这为他们提供了一种量化方法,我们也做出了一些预测。

And so that's given them a way to quantify that and we've made some predictions.

Speaker 2

昨天我讨论了一个项目,涉及小鼠肾脏,与牛津大学的合作者一起进行。

So yesterday I discussed a project I'm very excited about; it was on mouse kidney, with collaborators in Oxford.

Speaker 2

我们通过拓扑数据分析预测存在一个免疫细胞环,后来他们通过实验验证了这一点。

And we made a prediction with the topological data analysis that there's a ring of immune cells, which they then validated experimentally.

Speaker 2

我们能够使用一种相当复杂的最新拓扑数据分析方法——多参数持久同调,这在数学上极具挑战性,但正如我所说,不存在完整的不变量。

And we were able to use actually fairly sophisticated, recent topological data analysis, namely multiparameter persistent homology, which is mathematically really challenging: as I said, there's no complete invariant.

Speaker 2

只有不完整的不变量。

There are only incomplete invariants.

Speaker 2

尽管如此,我们仍能提出这些假设并加以验证,因此发现了一些新的生物学现象。

Still, we can make these hypotheses and actually test them, so we found some new biology.

Speaker 0

所以,这是数学帮助发现新生物学的例子。

So this is maths helping to discover new biology.

Speaker 0

对。

Yeah.

Speaker 0

这太神奇了,不是吗?

It's amazing, isn't it?

Speaker 0

我请海瑟再给我们详细解释一下。

I asked Heather to explain a little more about this to us.

Speaker 2

所以,我认为一种理解数学的方式是,我们通常从底层开始构建基础。

So I think one way to think about mathematics is that we often build from the ground up the foundation.

Speaker 2

而所有东西都是建立在这个基础之上的。

And everything is kind of built on that.

Speaker 2

所以你有一块块拼图,或者说是积木,然后用这些积木搭建起你的建筑。

So you have pieces of a puzzle or, you know, blocks and you build up your building with these blocks.

Speaker 2

但在生物学中,你并没有所有的拼块。

And in biology, you don't have all the pieces.

Speaker 2

因此,你无法看到所有拼块来完成整个构建。

So it's kind of you can't see all the pieces to build that.

Speaker 2

所以我认为,在这种意义上,他们仍在努力发现所有这些拼块是什么,而数学能否帮助识别出这些拼块。

And so I think in that sense, they're still trying to discover what are all the pieces and can math help kind of identify what those pieces are.

Speaker 2

这就是我认为数学真正能够帮助揭示的大量内容。

And that's what I think mathematics can really, yeah, help uncover so much.

Speaker 2

我认为生物学中变化很大的一点是,现在像我提到的组织样本这样的测量数据被收集了,它们拥有来自组织样本的基因组信息、图像,以及关于该样本、也许还有该患者及其他相关信息的大量数据。

I think what's changed a lot in biology is that now measurements are taken, like this example I mentioned of a tissue sample, but they also have the genomic information from the tissue sample, they have the image, and then they have all the information about that, maybe about that patient, and other things, right?

Speaker 2

因此,它们拥有多种不同类型的数据,全部都关于同一个样本、同一个时间点,例如。

So they have many different types of data and all about that one sample, that one point in time, for example.

Speaker 2

而且我认为生物学和医学需要新的方法,是的。

And I think biology and medicine need new approaches

Speaker 2

这些方法能够将所有这些整合起来,具有严谨性、对扰动具有稳定性,而且这仍然是拼图的一部分。

that can bring it all together, that are rigorous, that are stable to perturbations. And it's still pieces of a puzzle.

Speaker 2

对吧?

Right?

Speaker 2

但我认为数学能够统一并简化这些内容。

But I think mathematics can unify and simplify things.

Speaker 2

至少这是我们努力要做的。

At least that's what we try to do.

Speaker 2

然后获得一些洞察,使其具有可解释性。

And then have some insights, so have it be interpretable.

Speaker 2

在拓扑数据分析中,这一点非常重要,因为它具有很高的可解释性和稳定性。

And that's one thing in topological data analysis that there's a lot of interpretability and stability.

Speaker 2

所以,它确实带来了不少帮助。

So, yeah, it brings yeah.

Speaker 2

我认为生物学就像是一个充满各种问题的游乐场,我们正试图拼凑那些连样子都看不到的碎片。

I think biology is kind of a playground of many problems that we're trying to piece together pieces of the puzzle that we can't even see.

Speaker 1

拓扑数据分析听起来是个相当新的领域。

Topological data analysis, it's quite a new thing by the sound of it.

Speaker 1

那么,目前都发生了哪些进展?

So what kind of things have been happening?

Speaker 0

嗯,真的有很多事情在发生。

Well, lots of things, really.

Speaker 0

特别是在英国,有一个名为拓扑数据分析中心的机构,希瑟是它的联合主任之一,她给我讲了一些关于这个中心的情况。

In the UK specifically, there has been this thing called the Centre for Topological Data Analysis that Heather is a co-director of, and she told me a little more about that.

Speaker 2

对。

Right.

Speaker 2

所以我认为英国在这方面非常前沿,英国工程与物理科学研究理事会(EPSRC)在支持跨学科研究方面做得非常好,无论是数学内部还是科学领域的跨学科合作。

So I think the UK is very progressive, and the EPSRC has been very good at supporting this kind of interdisciplinary work, whether it's interdisciplinary within mathematics or within the sciences.

Speaker 2

因此,这是一项针对数据科学新方法的申请项目。

And so this was an application where they had a call for new approaches to data science.

Speaker 2

我们之前举办过几次关于拓扑数据分析的小型研讨会,吸引了大量来自产业界的人士参与。

And so we had had a couple of small workshops on topological data analysis and had an incredible number of people, even from industry.

Speaker 2

我记得有一次产业界活动,有50多位来自不同机构的人前来询问:拓扑数据分析到底在搞什么?

I think at one point we had an industry event with over 50 different people who came saying, oh, what's going on with TDA?

Speaker 2

于是我们申请了中心资助,并于2018年正式启动了该项目。

So we applied for the center grant and then we started it in 2018.

Speaker 2

我的意思是,这真的非常棒。

And, I mean, it's been really wonderful.

Speaker 2

我们目前有50多位成员,大约11位教职人员,以及许多产业合作伙伴。

We have over 50 members, so something like 11 faculty and then many industry partners.

Speaker 2

而且这跨越了三到四个节点,因为有些人换了地方,但这种方式真的非常棒,建立了一个社区,数学上的进展也极为显著。

And it's been across three or four nodes, depending on how you count, because some of the people moved. But it's been this really wonderful way to have a community, and the advances mathematically have been phenomenal.

Speaker 2

这是一个在全球都非常活跃的领域。

It's just such an active field worldwide.

Speaker 2

这确实是一个活跃的领域,但我认为中心的成立为英国带来了关键性的聚集效应,并创造了大量新职位。

It's really an active field, but I think having the center brought a critical mass to the UK and many new positions.

Speaker 2

因此,我们发展了许多新的理论,包括数学理论;但正如我所说,这也是应用驱动的。对我们牛津来说,主要聚焦于生物医学;而在利物浦,他们研究材料;在斯旺西和达勒姆,则更多关注物理问题。

So we developed lots of new theory, mathematical theory, but also, as I said, it's application driven, so there were new insights too. For us in Oxford, we focused mostly on biomedicine, but in Liverpool they were looking at materials, and in Swansea and in Durham, more physics problems.

Speaker 2

所以,每位成员都带来了自己的兴趣和专长,形成了非常棒的组合,我们定期举行会议,以持续交流、探讨该领域的快速进展。

So, I mean, each of the different people brings their interests and their expertise, and it was a really nice mix, and we had regular meetings to continue informing each other and discussing rapid advances in the field.

Speaker 2

这真的非常棒,充满了正能量,我想这么说。

It's been really great and a lot of energy, like positive energy, I would say.

Speaker 1

听到这样一个全新且极其活跃的研究领域,真是令人着迷。

It's so interesting to hear about, you know, such a new and so very active area of research.

Speaker 0

是的。

Yeah.

Speaker 0

这真的非常有趣,因为对我来说,这也是第一次了解到拓扑数据分析。

It's been really fascinating because that was the first time for me as well to find out about topological data analysis.

Speaker 0

所以我们拭目以待吧。

So we'll see.

Speaker 0

我相信在未来几年里,我们会听到更多关于它的内容。

I'm sure we'll hear lots more about it in the years to come.

Speaker 1

那么,今天的播客就到这里了。

So that's it for today's podcast.

Speaker 1

希望你们在了解拓扑数据分析的过程中感到有趣。

We hope you've enjoyed finding out about topological data analysis.

Speaker 1

如果你喜欢这个播客,请推荐给朋友,或者在你的播客应用中给我们评分和评论。

And if you've enjoyed the podcast, please do recommend us to a friend or rate and review us in your podcast app.

Speaker 1

这能帮助其他人找到我们。

It helps other people find us.

Speaker 1

但今天就到这里了。

But that's it for today.

Speaker 1

感谢收听,我们下次再见。

Thanks for listening and bye for now.
