梅森·波特谈社区检测与数据拓扑

本集简介

看待世界的一种方式是将其视为动态的、不断变化的连接所形成的干涉模式——这些关系在多层次网络的嵌套群体中不断形成与断裂。身份可以通过一个关系集群与其他任何关系集群之间的信息交换来定义。在创新一个世纪所赋予我们的海量数据和新分析方法的洪流中，一种类似音乐的秩序开始显现。但正如新音乐流派一样，需要经过训练的耳朵才能适应这种陌生的秩序……我们能从网络科学及相关的一般性、抽象数学方法中学到什么，以在数字的洪流中发现这种秩序？欢迎收听《复杂性》，圣塔菲研究所的官方播客。我是您的主持人迈克尔·加菲尔德，每期节目，我们都将带您与全球范围内严谨的研究者展开广泛对话，他们正在构建新的框架，以解释宇宙最深层的奥秘。本周，我们与圣塔菲研究所外部教授、加州大学洛杉矶分校数学家梅森·波特（UCLA官网、Twitter、Google Scholar、维基百科）对话，探讨他在网络社区检测与数据拓扑方面的研究——深入一系列多样化的工具方法，帮助科学家揭示现代生活所产生海量数据集中的深层结构。如果您重视我们的研究与传播工作，请在Apple Podcasts或Spotify上订阅、评分并评论我们，并考虑通过santafe.edu/engage进行捐赠，或以其他方式参与我们的活动。我知道这可能令人意外，但本期是我们倒数第二集。请继续关注五月的最后一期节目，届时圣塔菲研究所所长大卫·克拉考尔与我将回顾过去三年半的主要主题与亮点，并展望我接下来的计划！以这种方式将复杂系统科学带给您，是我的荣幸与快乐，希望我们保持联系——我并不难找。感谢您的收听。播客主题音乐由米奇·米尼亚诺创作。关注我们的社交媒体： Twitter • YouTube • Facebook • Instagram • LinkedIn 提及及关联媒体：网络上的意见动态有界信任模型梅森·波特的圣塔菲研究所研讨会（实时Twitter报道与YouTube直播录像）网络中的社区作者：梅森·波特、Jukka-Pekka Onnela、Peter Mucha Facebook网络的社会结构作者：Amanda Traud、Peter Mucha、梅森·波特关于幂律的关键真相作者：Michael Stumpf、梅森·波特数据的拓扑结构作者：梅森·波特、Michelle Feng、Eleni Katifori 具有复杂权重的复杂网络作者：Lucas Böttcher、梅森·波特超图上的有界信任意见动态模型作者：Abigail Hicock、Yacoub Kureh、Heather Z. Brooks、Michelle Feng、梅森·波特疾病传播与竞争性意见共演化多层网络模型作者：Kaiyan Peng、Zheng Lu、Vanessa Lin、Michael Lindstrom、Christian Parkinson、Chuntian Wang、Andrea Bertozzi、梅森·波特面向社会神经科学家的社会网络分析作者：Elisa C. Baek、梅森·波特、Carolyn Parkinson 社会与生物网络中的社区结构作者：Michelle Girvan、Mark Newman 个体性的信息论作者：David Krakauer、Nils Bertschinger、Eckehard Olbrich、Jessica C. Flack、Nihat Ay 社会资本 I：测量及其与经济流动性的关联作者：Raj Chetty、Matthew O. Jackson、Theresa Kuchler、Johannes Stroebel、Nathaniel Hendren、Robert B. Fluegge、Sara Gong、Federico Gonzalez、Armelle Grondin、Matthew Jacob、Drew Johnston、Martin Koenen、Eduardo Laguna-Muggenburg、Florian Mudekereza、Tom Rutter、Nicolaj Thor、Wilbur Townsend、Ruby Zhang、Mike Bailey、Pablo Barberá、Monica Bhole、Nils Wernerfelt 网络中层次结构与缺失链接的预测作者：Aaron Clauset、Cristopher Moore、M.E.J. Newman 格雷戈里·贝特森（维基百科）《复杂性》第99期——艾莉森·戈普尼克谈儿童发展、老年、照护与人工智能《我们为何睡觉？》作者：Van Savage、Geoffrey West，《Aeon》杂志《复杂性》第4期——路易斯·贝滕库尔谈城市科学《复杂性》第12期——马修·杰克逊谈社会与经济网络《复杂性》第68期——W. 
布莱恩·阿瑟谈名词与动词中的经济学（第一部分）《复杂性》第100期——丹妮·巴斯特与佩里·祖恩谈好奇大脑的神经科学与哲学

One way of looking at the world reveals it as an interference pattern of dynamic, ever-changing links — relationships that grow and break in nested groups of multilayer networks. Identity can be defined by informational exchange between one cluster of relationships and any other. A kind of music starts to make itself apparent in the avalanche of data and new analytical approaches that a century of innovation has availed us. But just as with new music genres, it requires a trained ear to attune to unfamiliar order…what can we learn from network science and related general, abstract mathematical approaches to discovering this order in a flood of numbers? Welcome to COMPLEXITY, the official podcast of the Santa Fe Institute. I’m your host, Michael Garfield, and in every episode we bring you with us for far-ranging conversations with our worldwide network of rigorous researchers developing new frameworks to explain the deepest mysteries of the universe. This week we speak with SFI External Professor, UCLA mathematician Mason Porter (UCLA Website, Twitter, Google Scholar, Wikipedia), about his research on community detection in networks and the topology of data — going deep into a varied toolkit of approaches that help scientists disclose deep structures in the massive data-sets produced by modern life. If you value our research and communication efforts, please subscribe, rate and review us at Apple Podcasts or Spotify, and consider making a donation — or finding other ways to engage with us — at santafe.edu/engage. I know it comes as a surprise, but this is our penultimate episode. Please stay tuned for one more show in May when SFI President David Krakauer and I will reflect on major themes and highlights from the last three-and-a-half years, and look forward to what I’ll be doing next! It’s been an honor and a pleasure to bring complex systems science to you in this way, and hope we stay in touch. I won’t be hard to find. Thank you for listening. 
Podcast theme music by Mitch Mignano.
Follow us on social media: Twitter • YouTube • Facebook • Instagram • LinkedIn

Mentioned & Related Media:
Bounded Confidence Models of Opinion Dynamics on Networks, SFI Seminar by Mason Porter (live Twitter coverage & YouTube stream recording)
Communities in Networks by Mason Porter, Jukka-Pekka Onnela, & Peter Mucha
Social Structure of Facebook Networks by Amanda Traud, Peter Mucha, & Mason Porter
Critical Truths About Power Laws by Michael Stumpf & Mason Porter
The topology of data by Mason Porter, Michelle Feng, & Eleni Katifori
Complex networks with complex weights by Lucas Böttcher & Mason A. Porter
A Bounded-Confidence Model of Opinion Dynamics on Hypergraphs by Abigail Hickok, Yacoub Kureh, Heather Z. Brooks, Michelle Feng, & Mason Porter
A multilayer network model of the coevolution of the spread of a disease and competing opinions by Kaiyan Peng, Zheng Lu, Vanessa Lin, Michael Lindstrom, Christian Parkinson, Chuntian Wang, Andrea Bertozzi, & Mason Porter
Social network analysis for social neuroscientists by Elisa C. Baek, Mason A. Porter, & Carolyn Parkinson
Community structure in social and biological networks by Michelle Girvan & Mark Newman
The information theory of individuality by David Krakauer, Nils Bertschinger, Eckehard Olbrich, Jessica C. Flack, & Nihat Ay
Social capital I: measurement and associations with economic mobility by Raj Chetty, Matthew O. Jackson, Theresa Kuchler, Johannes Stroebel, Nathaniel Hendren, Robert B. Fluegge, Sara Gong, Federico Gonzalez, Armelle Grondin, Matthew Jacob, Drew Johnston, Martin Koenen, Eduardo Laguna-Muggenburg, Florian Mudekereza, Tom Rutter, Nicolaj Thor, Wilbur Townsend, Ruby Zhang, Mike Bailey, Pablo Barberá, Monica Bhole, & Nils Wernerfelt
Hierarchical structure and the prediction of missing links in networks by Aaron Clauset, Cristopher Moore, & M.E.J. Newman
Gregory Bateson (Wikipedia)
Complexity Ep. 99 - Alison Gopnik on Child Development, Elderhood, Caregiving, and A.I.
“Why Do We Sleep?” by Van Savage & Geoffrey West at Aeon Magazine
Complexity Ep. 4 - Luis Bettencourt on The Science of Cities
Complexity Ep. 12 - Matthew Jackson on Social & Economic Networks
Complexity Ep. 68 - W. Brian Arthur on Economics in Nouns and Verbs (Part 1)
Complexity Ep. 100 - Dani Bassett & Perry Zurn on The Neuroscience & Philosophy of Curious Minds

双语字幕

Speaker 0

无论如何，我们用一个数学对象来表示某种事物，即使我们纳入了大量这类细微差别、非成对交互等，它也比现实简单得多。

In any event, we're representing something in a mathematical object that, even when we include a bunch of these nuances, non-pairwise interactions and so on, is a much simpler thing than reality.

Speaker 0

这是对现实的粗略简化，你必须始终担心：当你面对现实，然后从现实中获取数据——而这些数据本身已经做了一些简化，接着你又将它们转化为一个你研究的数学对象时，你会意识到：好吧，如果我把它变成一个数学对象，我可以说我可能对这个数学对象做出了一个精确的陈述，尽管即使如此也存在近似，但我们就假设我确实这么做了。

It's a gross simplification of reality and you always have to worry that when you get reality and then you get data from reality, which is already making certain simplifications, and that you then turn it into a mathematical object that you study, you know, there's this thing of, okay, if I turn it into a mathematical object, I can say that potentially I am making a precise statement about the mathematical object, although even then there's approximations, but let's suppose that I'm doing that.

Speaker 0

现在我已经对这个数学对象做出了一个精确的陈述，我想把这个陈述推及到现实世界，试图说明一些关于现实世界的东西，尽管这个数学对象只是现实的简化版本。

Now I've made a precise statement about the mathematical object and I want to turn that statement and imply something and say something about the real world even though the mathematical object is a simplification of the real world.

Speaker 0

你必须警惕，因为选择以某种特定方式表示事物时，会产生人为的假象。

And you have to worry because there are artifactual things that occur by choosing to have represented something in a certain way.

Speaker 0

而我们的期望是，你对这个对象所做出的某些结论，或许也能告诉你关于它所代表的更复杂事物的某些信息。

And the hope is that something that you then say about this object hopefully can tell you also something about the more complicated thing it's representing.

Speaker 0

因此，当我们处理这些成对交互时，我们知道如何更有效地从数学上研究它们，所以我们花了很多时间在它们上面。

So when you do something like these pairwise interactions, we know more about how to study them mathematically, so we spend a lot of time on them.

Speaker 0

当我们试图推广时，比如多体交互——我在这里 conventionally 使用‘多体’指三个或以上，有些人称之为高阶，但顺序是单体、成对，然后是多体，即多体指三个或以上，你需要推广不同的概念。

When we make efforts to generalize, so polyadic interactions, and I'm conventionally using polyadic to mean three plus, so some people call it higher order, but it's mono and then dyadic and then polyadic, so polyadic is three plus, you need to generalize different concepts.
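
To make the dyadic versus polyadic distinction concrete, here is a minimal sketch. It assumes a group interaction is stored as a set of participants (a hyperedge); the function name `pairwise_projection` and the toy data are hypothetical and not taken from the episode. It shows that one genuine three-way interaction and three separate two-way interactions collapse to the same ordinary graph, which is one reason pairwise concepts need generalizing.

```python
from itertools import combinations

def pairwise_projection(hyperedges):
    """Replace every group interaction by all of the pairs inside it."""
    pairs = set()
    for group in hyperedges:
        for u, v in combinations(sorted(group), 2):
            pairs.add((u, v))
    return pairs

# One genuine three-way (polyadic) interaction among a, b, and c ...
three_way = [frozenset({"a", "b", "c"})]
# ... versus three separate two-way (dyadic) interactions.
three_pairs = [frozenset({"a", "b"}), frozenset({"b", "c"}), frozenset({"a", "c"})]

# Both systems look identical once only pairs are kept.
same_graph = pairwise_projection(three_way) == pairwise_projection(three_pairs)
print(same_graph)  # True
```

The pairwise picture cannot tell the two situations apart, so any measure defined only on pairs treats them as the same system.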

Speaker 1

一种看待世界的方式，是将其视为动态且不断变化的关联所构成的干涉模式。

One way of looking at the world reveals it as an interference pattern of dynamic ever changing links.

Speaker 1

关系在多层次网络的嵌套群体中不断形成与破裂。

Relationships that grow and break in nested groups of multilayer networks.

Speaker 1

身份可以通过一个关系集群与任何其他关系集群之间的信息交换来定义。

Identity can be defined by informational exchange between one cluster of relationships and any other.

Speaker 1

在世纪创新所赋予我们的海量数据和新分析方法的洪流中，某种音乐风格开始显现出来。

A kind of music starts to make itself apparent in the avalanche of data and new analytical approaches that a century of innovation has availed us.

Speaker 1

但正如新的音乐流派一样，需要经过训练的耳朵才能感知到这种陌生的秩序。

But just as with new music genres, it requires a trained ear to attune to unfamiliar order.

Speaker 1

我们可以从网络科学及相关的一般性抽象数学方法中学到什么，以在数字的洪流中发现这种秩序？

What can we learn from network science and related general abstract mathematical approaches to discovering this order in a flood of numbers?

Speaker 1

欢迎收听《复杂性》，圣塔菲研究所的官方播客。

Welcome to Complexity, the official podcast of the Santa Fe Institute.

Speaker 1

我是您的主持人迈克尔·加菲尔德。

I'm your host, Michael Garfield.

Speaker 1

在每一集中，我们将带您与我们全球范围内的严谨研究者进行深入对话，他们正在开发新的框架，以解释宇宙最深层的奥秘。

And in every episode, we bring you with us for far ranging conversations with our worldwide network of rigorous researchers developing new frameworks to explain the deepest mysteries of the universe.

Speaker 1

本周，我们邀请了圣塔菲研究所的外部教授、加州大学洛杉矶分校的数学家梅森·波特，讨论他在网络社区检测和数据拓扑方面的研究，深入探讨了一系列帮助科学家揭示现代生活所产生的海量数据中深层结构的多元方法。

This week, we speak with SFI external professor, UCLA mathematician Mason Porter, about his research on community detection in networks and the topology of data, going deep into a varied toolkit of approaches that help scientists disclose deep structures in the massive datasets produced by modern life.

Speaker 1

如果您重视我们的研究与传播工作，请在Apple Podcasts或Spotify上订阅、评分并评论我们，并考虑通过santafe.edu/engage进行捐赠或以其他方式参与我们的活动。

If you value our research and communication efforts, please subscribe, rate, and review us at Apple Podcasts or Spotify, and consider making a donation or finding other ways to engage with us at santafe.edu/engage.

Speaker 1

我知道这可能出乎意料，但这是我们的倒数第二期节目。

I know it comes as a surprise, but this is our penultimate episode.

Speaker 1

请继续关注五月的最后一期节目，届时圣塔菲研究所所长大卫·克拉考尔和我将回顾过去三年半的主要主题与亮点，并展望我接下来的计划。

Please stay tuned for one more show in May when SFI president David Krakauer and I will reflect on major themes and highlights from the last three and a half years and look forward to what I'll be doing next.

Speaker 1

以这种方式将复杂系统科学带给你们，我感到无比荣幸与愉快，希望我们能保持联系。

It's been an honor and a pleasure to bring complex systems science to you in this way, and I hope we'll stay in touch.

Speaker 1

我并不难找。

I won't be hard to find.

Speaker 1

感谢您的收听。

Thank you for listening.

Speaker 1

梅森·波特，我非常高兴。

Mason Porter, I am delighted.

Speaker 1

终于，我有时间邀请你来《复杂性播客》了。

Finally, I found the time to get you on Complexity Podcast.

Speaker 1

在深入科学话题之前，我很想听听你作为研究者的背景故事。

Before we dive into the science, I'd love just a bit of backstory about you as a researcher.

Speaker 0

背景故事。

Backstory.

Speaker 0

好吧。

All right.

Speaker 0

那我先说一些事情，你可以告诉我这些是否符合你的经历。

So I guess I'll mention some things, and you can let me know if this addresses or not.

Speaker 0

我在加州理工学院主修应用数学，这确实是我的专业。

I majored in applied mathematics at Caltech, and this was my actual major.

Speaker 0

加州理工有一个独立的应用数学系，这在美国比较少见，但在英国和其他一些国家则更为普遍。

There is a separate applied math department, which is unusual in the US, though much more common, say, in the UK and some other countries.

Speaker 0

在我进入加州理工之前，我只在图片中见过分形，但并不了解其背后的科学含义。

Before I got to Caltech, I had encountered fractals just pictorially, not like the scientific meaning behind any of them.

Speaker 0

我小时候也画过一些东西，特别是类似偏头痛时看到的图案——那些华丽、闪亮、色彩斑斓的图形，实际上它们与某些类型的复杂系统存在一些关联，尽管当时我并不知道。

I also had drawn, like, as a kid, especially kind of stuff I had seen in migraine headaches, like lots of fancy, flashy, colorful patterns that do actually have various connections to complex systems of certain types, although again, I didn't know that at the time.

Speaker 0

所以我特别喜欢鲜艳的色彩和各种复杂的图案等等。

So I really liked lots of bright colors and lots of fancy patterns and so on.

Speaker 0

我后来了解到分形，它包含了一些具体的例子。

I learned about fractals as something that included examples of them.

Speaker 0

我上了加州理工学院，原本打算做计算机图形相关的工作，但发现我更喜欢应用数学。

I went to Caltech, thought I was going to do computer graphics stuff, found that I liked applied math much better.

Speaker 0

我本来以为自己会从事计算机游戏之类的图形工作。

I thought I was going to do computer graphics for, like, computer games or something.

Speaker 0

尽管不知怎的，我内心深处还是觉得我会去读博士来做这件事，所以我很早就知道自己想读博士，但当时以为会走计算机科学这条路。

Although somehow I still had in my head that I was going to get a PhD to do that, so I knew from really early that I wanted to get a PhD, but I thought I was going to be computer science.

Speaker 0

最后我选择了应用数学。

I ended up doing applied math.

Speaker 0

我主修应用数学，甚至没有选修计算机科学专业，但后来却对它越来越感兴趣。

I majored in applied math and never even declared a computer science major, and I ended up becoming more interested.

Speaker 0

我依然对这些精美的图像充满兴趣，这就是为什么我总是不自觉地到处充满五彩斑斓的东西，远超应有的程度。但我对哪些模型能产生或可能产生这样的图案产生了浓厚兴趣，非常享受建立数学模型以及推导这些现象的运动方程和控制方程的过程。

I'm still interested in all these fancy pictures and this is why I have random colorful things going on all the time more than I should, but I became very interested in what models produced or could produce such patterns and really enjoyed the mathematical modeling and coming up with equations of motion, governing equations for these things.

Speaker 0

于是我最终主修了应用数学。应用数学的一个美妙之处在于，你永远不必决定长大后想成为什么样的人，你可以不断研究新事物、学习新知识，从不真正决定自己的专业方向，甚至专精于‘不专精’。

I ended up then majoring in applied math. One of the beauties of applied math is that you never have to decide what you want to be when you grow up, and you can just keep working on new things and learning new things and never actually decide what your specialty is, or specialize in not specializing.

Speaker 0

之后，我去了康奈尔大学，在应用数学中心攻读研究生。

Then I went to grad school at Cornell in the Center for Applied Mathematics.

Speaker 0

这是一个独立的项目，严格来说并非独立的系，但无论是加州理工学院的本科应用数学专业，还是康奈尔大学的应用数学研究生项目，实际上都隶属于工程学院或工程部门。

This was a separate program, so not a separate department per se, but a separate program, and both Caltech's undergrad applied math major and Cornell's applied math grad program are actually within the engineering schools or engineering divisions.

Speaker 0

在每种情况下，数学系实际上都属于另一个部门。

The mathematics department in each case was actually in a different division.

Speaker 0

在康奈尔之后，我做了两个完整的博士后研究，中间还穿插了一个短期的访问研究。

After Cornell, I did, well, two full length postdocs, I also had a small visiting mini one in between, or in the middle of one of them.

Speaker 0

我在佐治亚理工学院做了数学博士后，但同时也隶属于一个非线性物理小组，即物理学中的非线性系统研究组。

So I did a math postdoc at Georgia Tech, but I also was affiliated with a nonlinear physics group, a nonlinear systems group in physics.

Speaker 0

我名义上属于数学系，但与物理系的人互动频繁，经常充当两者之间的桥梁。

So I was officially in the math department, but I also interacted a lot with people in physics and was very regularly a go-between.

Speaker 0

在我就读佐治亚理工学院期间，我曾在数学科学研究所度过一个学期，这个机构现在似乎即将更名，但它物理上位于伯克利，即将更名为西蒙数学研究所。

During my Georgia Tech time, I spent one semester at what's, I think they're just about to change their name now, but the Mathematical Sciences Research Institute, which is physically in Berkeley. They're becoming the Simons something institute for math.

Speaker 0

所以目前它正处于更名过渡期，但我当时参加了他们举办的一个特定项目，度过了一个学期。

So they're in between names at the moment, but I spent a semester there in a specific program that they had.

Speaker 0

我想我们这里涉及到了不同的时间尺度。

I guess I trust we have time scales involved.

Speaker 0

我1998年从加州理工学院毕业，2002年从康奈尔大学毕业，2002年秋季开始我的博士后工作，而我在MSRI的这个学期是在2003年春季，正好处于我第一个博士后期间的中间。

I graduated from Caltech in 1998, graduated from Cornell in 2002, started my postdoc in fall 2002, and my semester at this program at MSRI was spring 2003, so right in the middle of my first postdoc.

Speaker 0

当我完成佐治亚理工学院的博士后工作时，我本打算谋求一个教职职位。

When I finished my Georgia Tech postdoc, I was intending to go get a faculty position.

Speaker 0

我当时确实已经进入了学术招聘市场，并在一些不错的机构进行了面试，但并没有获得我心仪职位的录用通知。

I was actually even on the faculty market and had interviews at some places that were nice, but I didn't get a job offer at places that I wanted.

Speaker 0

但最终我收到了一个来自物理系的博士后offer，让我回到加州理工学院。

And I got a postdoc offer actually from physics this time to go back to Caltech.

Speaker 0

这是我唯一一次主要隶属于物理系的时期，持续了两年四个月，我当时在加州理工学院当时的‘信息物理中心’工作，这是一个综合性研究项目。

That was the one time that I was mainly based in a physics department, for two years and four months, and I was in what's called the Center for the Physics of Information, which was an umbrella thing that Caltech had at the time.

Speaker 0

这与他们的量子信息研究有些关联，但我是那里少数甚至几乎是唯一一个不做量子信息相关工作的人。

It was somewhat related to their quantum information stuff, but I was like the one person or just about or one of the only people in there who was doing something other than quantum information.

Speaker 0

于是在那里做了第二份博士后，那从2005年6月开始，到2007年9月结束，之后我去了牛津大学担任第一个教职。

And so I did a second postdoc there. That started in June 2005, and then I finished there in September 2007 and moved to the University of Oxford for my first faculty position.

Speaker 0

他们称之为数学研究所，但实际上就是一个系。

They call it the Mathematical Institute, but it means a department.

Speaker 0

我在那里待了九年，然后于2016年搬到加州大学洛杉矶分校，之后就一直在这里。

Then I was there for nine years and then moved to UCLA in 2016, and I've been here since.

Speaker 0

我大部分时间都在数学系工作，所以我的所有学位都是工程学院下的应用数学专业。

I've been spending most of my time in math departments, so all my degrees are in applied math within engineering divisions.

Speaker 0

我做过一份物理系的博士后，其余时间都在数学系，研究中也会偶尔涉及其他领域。

I spent one postdoc in physics and otherwise have been in math departments, and I go between those and occasional other subjects in my research.

Speaker 1

是的。

Yeah.

Speaker 1

我认为这有助于理解：当我查阅你的工作、熟悉你的研究时，我记得大卫·克拉科沃曾对我说过，似乎有两种类型的科学家会进入圣塔菲研究所的网络，其中一些人深深扎根于某一学科或某种方法论，而他们确实能从这种无边界的方式中获益。

I think that's helpful as a way of making sense of, like, when I was looking through your work, when I was familiarizing myself, I remember David Krakauer saying to me once that there are two kinds of scientists that seem to filter into the SFI network, and some of them are anchored very squarely within one discipline or one kind of methodology, and they really benefit from this kind of boundaryless approach.

Speaker 1

复杂系统科学更像是一种以广泛而统一的方式看待事物的方法。

It's complex systems science as more, kind of, a way of looking at things in a very general and unifying effort.

Speaker 1

还有一类人则把一切东西都应用到狭窄的范围或特定的研究领域中。

And then there are these folks that just, like, take everything and apply it to a narrow range or, like, a particular area of focus.

Speaker 1

无论如何，我想现在从你的背景开始聊一聊，因为正如我们在邮件中提到的，我追溯了你非常早期的出版物。

Anyway, I wanna start now that you've given me your backstory with a little bit of the backstory because as we noted over email, I went pretty far back in your publication history.

Speaker 1

我读到的最早的一篇文章是你与Onnela和Mucha共同领导的关于网络中社区检测的研究，那是2009年的成果。

And the earliest piece that I read was a piece that you led with Onnela and Mucha on community detection, Communities in Networks, which is a piece from 2009.

Speaker 1

对。

And Right.

Speaker 1

正如你所说，自那以后变化很大。

As you noted, a lot has changed since.

Speaker 0

是的。

Yeah.

Speaker 0

我的意思是，那已经是很久以前的事了？

I mean, that was you know?

Speaker 0

这实际上是一篇综述文章。

That's even actually a survey article.

Speaker 0

它并不是一篇研究论文。

It's not actually a research article.

Speaker 1

对。

Right.

Speaker 1

总是会有一个问题，那就是谁在听，他们的熟悉程度如何，以及这个节目的内容是什么。

It's always a question as to who's listening, what their level of familiarity is, what the content of this show is.

Speaker 1

在圣塔菲研究所，几乎已经成了一个笑话：把一切事物都看作网络的阶段早已成为过去，现在感觉就像一张高中年鉴照片一样。

It's almost become a kind of a joke at SFI that the "think of everything as a network" phase is so far behind us now that it feels like a high school yearbook photo or something.

Speaker 0

但是，是的。

But Yeah.

Speaker 0

我不确定它是否真的已经过去了。

I'm not sure if it's behind us.

Speaker 1

是的。

Yeah.

Speaker 1

但很好。

But good.

Speaker 1

因为这确实是我认为仍然非常重要的内容，还没有被广泛理解，它能帮助人们了解你在这项工作中探索的一些核心概念，以及后续被进一步阐述、深化和研究的内容。

Because this is something that I think is still really vital and not as widely understood as it could be, just as a way of introducing people to some of the core concepts that you're exploring in this work and have elaborated on and deepened and researched since.

Speaker 1

我们来谈谈社区网络检测。

Let's talk about community network detection.

Speaker 0

所以，是的，

So, yeah,

Speaker 1

介绍一下这个概念。

introduce this

Speaker 0

给你。

to you.

Speaker 0

是的。

Yeah.

Speaker 0

社区检测仍然是网络科学及相关领域的一个非常重要的研究方向，它的起源可以追溯到很久以前，涉及多个不同领域的早期工作。

So community detection remains a very prominent area of network science and of related things, and it has various origins in different topics that go back quite a while.

Speaker 0

这是一种聚类类型，人们长期以来一直在对数据进行聚类——虽然严格来说并不准确，但确实如此。

It's a type of clustering, and people have been clustering, well, this is not literally true, but people have been clustering data since time immemorial.

Speaker 0

对吧？

Right?

Speaker 0

就像这几乎是圣经里记载的事情一样。

Like, it's like it was almost biblical.

Speaker 0

对吧？

Right?

Speaker 0

就像在第八天，我们就对数据进行了聚类。

Like, on day eight, we clustered data.

Speaker 0

对吧？

Right?

Speaker 0

这本应该是圣经中的一部分，只是不知为何被遗漏了。

It's part of the bible that somehow is missing.

Speaker 0

实际上，还有更早的。

It's actually there's some earlier than that.

Speaker 0

对吧？

Right?

Speaker 0

有这么一个人，叫赖斯。

There's this guy Rice.

Speaker 0

我认为是斯图尔特·赖斯，他当时用国会数据做了一种聚类，而且主要是手动完成的。

I think it's Stuart Rice who is essentially doing some type of clustering with congressional data, and he was doing it more by hand.

Speaker 0

我确信，在有正式算法之前，人们就已经在尝试对数据进行聚类，以更好地理解它。

And I'm sure the notion of clustering data and just trying to make more sense of it is something that people have done before there were official algorithms.

Speaker 0

人类在试图解释某件事时，总想让它变得有意义，也许一些细节并不重要。

Human beings want to make sense of something if they're trying to explain it, and maybe some of the details don't matter.

Speaker 0

数据聚类的概念，我确信早在任何正式的开端之前就已经存在了。

The notion of data clustering, I'm sure, predates long before any official beginnings.

Speaker 0

我认为我们那项工作中引用的最早论文就是斯图尔特·赖斯的。

I think the earliest paper we cite in that work was by Stuart Rice.

Speaker 0

我可能把他的名字记错了。

I might be screwing up his first name.

Speaker 0

计算机科学领域有很多工作致力于实现谱聚类。

There's a lot of work from computer science on trying to do things that became spectral clustering.

Speaker 0

从技术上讲，‘谱’指的是使用特征值、特征向量等概念，加上一堆术语，但这些至少是可以通过谷歌搜索到的，它们既有丰富的数学历史，也有深厚的计算机科学背景。

Spectral means, just from a technical point of view, using ideas like eigenvalues and eigenvectors and doing a bunch of buzzwords, but anyway, those are at least Googleable and is stuff that has both a rich mathematical history and a rich computer science history.

Speaker 0

你可能有一个想要分割的系统，这些方法最初出现在20世纪70年代初，目的是将网络划分为大致相等的部分。

You might have a system that you want to divide, and those methods, which are from the early 70s, were originally meant to divide a network into maybe roughly equal-sized parts.
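
As an illustration of the spectral idea just described, here is a minimal sketch in plain Python. It assumes a small undirected, connected graph on nodes 0..n-1 and splits it by the sign of the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian (often called the Fiedler vector). The function name `fiedler_bisection`, the shifted power iteration, and the toy graph of two four-person cliques joined by one edge are choices made for this example; they are not the specific historical algorithms mentioned in the conversation.

```python
from itertools import combinations

def fiedler_bisection(n, edges, iterations=200):
    """Split nodes 0..n-1 in two by the sign of an approximate Fiedler vector."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    degree = [len(nb) for nb in neighbors]
    shift = 2 * max(degree)  # upper bound on the largest Laplacian eigenvalue

    x = [float(i) for i in range(n)]  # arbitrary deterministic starting vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the constant eigenvector
        # Apply (shift * I - L), where L = D - A is the graph Laplacian.
        y = [(shift - degree[i]) * x[i] + sum(x[j] for j in neighbors[i])
             for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]

    side_a = {i for i in range(n) if x[i] < 0}
    side_b = set(range(n)) - side_a
    return side_a, side_b

# Two cliques of four nodes each, joined by the single edge (3, 4).
edges = (list(combinations(range(0, 4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
side_a, side_b = fiedler_bisection(8, edges)
parts = {frozenset(side_a), frozenset(side_b)}
print(parts == {frozenset(range(0, 4)), frozenset(range(4, 8))})  # True
```

On this toy graph the two sides have equal size, which matches the "roughly equal-sized parts" goal; on real networks, a practical implementation would use a numerical linear algebra library rather than hand-rolled iteration.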

Speaker 0

与此形成对比的是，想想更社会化的领域，比如朋友圈子或其他人群组合。

Contrast to that, think of more social domains, circles of friends, or other things, and some sets of people.

Speaker 0

直观上什么是社区，以及如何用数学方式准确定义它，需要更多努力，真正严谨地实现一直很有挑战性。

Saying what a community is supposed to be intuitively is one thing; defining it mathematically takes more effort, and it's been challenging to do it really rigorously.

Speaker 0

密集的连接组，原则上与其他密集组之间的连接较为稀疏。

Dense sets of connections that in principle are sparsely connected to other dense sets.

Speaker 0

这就是我为什么说朋友圈子的原因。

That's why I'm saying circles of friends.

Speaker 0

所以你可能会有一群来自宿舍的朋友，但其中一个人是交换生，他们充当了连接另一个密集人群的桥梁，比如他们来自另一个国家。

So you might have a bunch of people who are friends from a dorm, but then somebody is an exchange student and so they serve as a bridge to a dense set of people, they came from another country or something.

Speaker 0

对吧？

Right?

Speaker 0

这就是直觉所在。

Like that's the intuition.

Speaker 0

我知道‘社区结构’这个术语在生态学中也会用到。

I know that the term community structure also gets used in ecology.

Speaker 0

它并不完全等同于我在这里使用的含义，但无论如何，算法的目标是找到密集连接、且与其他密集集合连接稀疏的群体，比如人或其他实体。

Doesn't mean exactly the same thing as the way that I'm using the word here, but somehow to algorithmically find dense sets of say people or whatever that are sparsely connected to other dense sets.

Speaker 0

因此，社区检测广义上指的就是这类问题。

So community detection very broadly refers to that type of problem.

Speaker 0

许多网络科学家和其他研究者为此投入了大量时间，实际上有一篇著名的论文，我认为正是这篇论文首次使用了‘社区检测’这个术语，而不是其他如聚类之类的说法。

A lot of network scientists and others have spent a lot of time on it. There's actually one prominent paper, and I think this is the one where they use the term community detection as opposed to other terms like clustering or whatnot.

Speaker 0

这篇论文由米歇尔·吉尔万和马克·纽曼撰写，他们两人是圣塔菲研究所的知名人物。

It's a paper by Michelle Girvan and Mark Newman who are going to be two familiar names, SFI folks.

Speaker 0

这是一篇2002年的论文，他们使用了一种衡量边重要性的方法，称为介数中心性，旨在找出那些具有关键作用的连接。

This is a 2002 paper, and they were using a measure of importance of edges, called betweenness centrality, to try to find connections that are somehow important.

Speaker 0

特别是，想想那个交换生所涉及的那些边。

In particular, think of the edges that are involved with this exchange student.

Speaker 0

如果我想从网络的一个部分到另一个部分，这些边在信息传递过程中会形成高流量的路径，我可以通过测量谁可能是瓶颈来识别它们。

Those are edges that would have dense traffic if I'm trying to go from one part of the network to another, if I imagine some message passing through and measure who might be the bottleneck.

Speaker 0

因此，他们基于这一思路开发了一种方法。

So they developed a method based on that.
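
The edge-importance idea described above can be sketched as follows. This is a minimal illustration, assuming an unweighted, undirected, connected graph; it counts, for every edge, how many shortest paths between pairs of nodes run through it (edge betweenness). The function name `edge_betweenness` and the toy "two dorms plus one exchange student" graph are hypothetical, and the sketch covers only the scoring step, not the full procedure of repeatedly removing the top-scoring edge.

```python
from collections import deque
from itertools import combinations

def edge_betweenness(nodes, edges):
    """Shortest-path betweenness of every edge in an undirected graph."""
    neighbors = {v: [] for v in nodes}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    score = {frozenset(e): 0.0 for e in edges}
    for source in nodes:
        # Breadth-first search: distances and numbers of shortest paths.
        dist = {source: 0}
        paths = {v: 0 for v in nodes}
        paths[source] = 1
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
        # Walk back from the farthest nodes, splitting credit among predecessors.
        credit = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in neighbors[w]:
                if dist.get(v) == dist[w] - 1:
                    share = paths[v] / paths[w] * (1.0 + credit[w])
                    score[frozenset((v, w))] += share
                    credit[v] += share
    # Every unordered pair of endpoints was counted once from each end.
    return {edge: s / 2.0 for edge, s in score.items()}

# Two tightly knit groups; node 3 plays the exchange student linked to node 4.
nodes = list(range(8))
edges = (list(combinations(range(0, 4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
scores = edge_betweenness(nodes, edges)
busiest = max(scores, key=scores.get)
print(sorted(busiest), scores[busiest])  # [3, 4] 16.0
```

All 16 cross-group pairs must route through the bridge, so it gets the highest score; removing it separates the two groups, which is the intuition behind the betweenness-based method.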

Speaker 0

这种方法存在一些问题，但它让很多人关注到了这个问题。

There are various issues with the method but it was a paper that brought the attention of this problem to a lot of people.

Speaker 0

我甚至可能在这篇你正在看的论文里也提到过这一点。

I even probably say this in this particular paper that you're looking at.

Speaker 0

引起了关注

Brought the attention

Speaker 1

你确实提到了，是的。

You do actually, yeah.

Speaker 0

你确实这么说过吗？

You do actually say that?

Speaker 0

好的，很好。

Okay, good.

Speaker 0

至少我的历史观点并没有改变，只是方法发展了，人们思考这个问题的方式变了，比如数学上如何理解，或者简而言之，变化在于现在出现了大量统计推断方法，这些方法在2009年之前就已存在，但当时由于某些问题未能取得良好效果，它们的理论基础还不够扎实；而后来的一些进展不仅增强了理论基础，也让它们在实践中取得了良好的结果。

At least it's not that my opinions of the history have changed. It's just that the methods have developed, along with how one thinks about the problem, say, mathematically. The short version of what has changed is that there are a lot more statistical inference methods now. They existed before 2009 and had more theoretical grounding, but there were certain issues with them, and they weren't getting good results just yet. Then there were certain advances that allowed them to not just have more theoretical grounding but also get good results in practice.

Speaker 0

还有其他类型的方法。

There's still other types of methods as well.

Speaker 0

我本人更喜欢基于动力系统的方法，因为不同的动力机制可能会产生不同的瓶颈或不同的社区结构。

I actually like methods that are based on dynamical systems where maybe a different type of dynamics might have different bottlenecks or might have different communities.

Speaker 0

这是我的个人观点。

That's my own personal view.

Speaker 0

但这并不是文献中主流的方法。

That's not the predominant one that literature follows.

Speaker 0

当时，也就是2009年做综述时，我想说的是：人们当时都在使用哪些不同的方法？

At the time, again, doing the survey in 2009 trying to say, what are different approaches people use?

Speaker 0

另一种人们很自然会想到的方法，它最初并非来自社区检测，而是普遍存在于人们喜欢使用目标函数来尝试最大化某些东西。

Another approach that people very naturally think about, and it comes not originally from community detection but just in general, people like objective functions to try to, say, maximize.

Speaker 0

人们喜欢目标函数。

People like objective functions.

Speaker 0

你正在研究某样东西，然后也许我可以将它概括为一个目标函数。

There's something that you're studying and, okay, maybe I can encapsulate this in an objective function.

Speaker 0

即使在最佳情况下，这个目标函数也会存在问题，无法完美地概括你想要的东西。

And this objective function, even in the best case scenarios, is going to have issues and not be a perfect encapsulation of what you want.

Speaker 0

但这个模块性目标函数后来被各种人彻底质疑和拆解。

But there is this modularity objective function, which has since been torn to pieces by various people.

Speaker 0

它试图衡量内部的密集程度与之间的差异，并相对于某种随机基线进行比较。

But it's trying to measure what is dense inside versus between, relative to some kind of random baseline.
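
For readers who want to see the "dense inside versus between, relative to a random baseline" idea in numbers, here is a minimal sketch of the standard modularity score for an unweighted, undirected graph. It assumes the usual degree-preserving baseline, so that Q is the sum over communities of (internal edges / m) minus (total degree / 2m) squared, where m is the number of edges. The function name `modularity` and the toy graph are illustrative choices, not something quoted from the episode.

```python
from itertools import combinations

def modularity(edges, community_of):
    """Modularity Q of a partition, using the degree-preserving random baseline."""
    m = len(edges)
    internal = {}    # number of edges with both ends in the community
    degree_sum = {}  # total degree of the nodes in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two cliques of four nodes joined by one bridge edge.
edges = (list(combinations(range(0, 4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
split = {v: ("A" if v < 4 else "B") for v in range(8)}
lumped = {v: "all" for v in range(8)}

q_split = modularity(edges, split)
q_lumped = modularity(edges, lumped)
print(round(q_split, 4), q_lumped)  # 0.4231 0.0
```

Putting everything in one community scores exactly zero, while the two-clique split scores 11/26. Maximizing this number over partitions is the objective-function approach discussed here, with the known weaknesses the speaker goes on to mention.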

Speaker 0

世界并没有那么简单。

The world is not so simple.

Speaker 0

如果你只是试图以这种方式定义并试图最大化它。

And if you just try to define something that way and try to maximize it.

Speaker 0

那时，这是一种主导性的方法。

That was like the predominant kind of approach at the time.

Speaker 0

它现在仍然被广泛使用，我知道有些人可能不认同我的以下观点，但我认为有时这样做仍然有帮助。

It's still used a lot and I know that some people will not agree with me on the following, but I think it's still useful to sometimes do it.

Speaker 0

但它缺乏其他方法那样的理论基础，并且已知在某些情况下会出错。

But it does not have, say, the theoretical grounding of other things and it's known to go wrong in certain ways.

Speaker 0

你可以构造出一些它会出错的例子。

You can construct examples in which it goes wrong.

Speaker 0

在这一综述文章发表之前，相关雏形就已经出现在已发表的论文中了。

The beginnings of that were occurring in published papers before this survey article.

Speaker 0

例如，2007年福尔图纳托（Fortunato）和巴泰勒米（Barthélemy）的一项工作就提出了分辨率极限问题，这是该方法存在的一系列问题之一，而这个问题早在那时就已经被揭示了。

There was a work in 2007, for instance, by Barthélemy and Fortunato about a resolution limit. There's a whole list of issues with that approach, but that was one particular issue with that approach that had been illustrated before.

Speaker 0

如果你现在查看社区检测领域在方法论方面的文献，会发现更多统计导向的方法。

Really, if if you look at the community detection literature now on the methodological end, you'll see a lot more statistically oriented approaches.

Speaker 0

如果你看看实际从业者在做什么，有时他们会采用这些方法，有时则不会。

If you look at what practitioners are doing, sometimes they'll do that and sometimes not.

Speaker 0

正如我提到的，还有这些基于动态过程的方法，而我认为这更值得深入研究——当然，这只是我个人的观点。

Then, as I mentioned, there are these sort of dynamical process based approaches, which is the one that I actually think is a more interesting one to pursue, again, my personal point of view.

Speaker 0

问题是，目前人们熟悉的方法大多基于非常简化的动态过程，比如随机游走，而不是其他类型的动态过程。

The issue is that the ones that people know how to deal with are dynamical processes that are based on very simplistic ones, based on random walks as opposed to a dynamical process based on something else.

Speaker 0

因此，我的观点是：社区结构不应当仅仅是一个结构性的概念。

So in particular, my view is the following, that community structure should not actually just be a structural thing.

Speaker 0

有一些方法，无论是通过目标函数还是某些统计推断方法，都可以与网络上随机游走的瓶颈以某种方式联系起来。

There are certain approaches, both doing it with objective functions and a couple of the statistical inference ones that can be related to bottlenecks of, say, a random walk on a network in some particular way.

Speaker 0

你可以想象一个人在网络上不断跳跃，沿着边移动，然后测量他们在某些区域停留的时间，而瓶颈就位于社区之间，正是在那里他们会被卡住。

So you imagine somebody's bouncing around on a network and they're following the edges and you measure how long they spend in certain places and bottlenecks are between communities so that's where they get stuck.
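
The "walker gets stuck inside a community" picture can be quantified with a small exact calculation. This is a minimal sketch, assuming a simple random walk on an unweighted, undirected graph at stationarity, where the chance of being at a node is proportional to its degree. Under that assumption, the probability that the next step leaves a set of nodes, given that the walker is currently inside it, equals (edges crossing the boundary) / (total degree inside the set). The function name `escape_probability` and the toy graph are hypothetical.

```python
from itertools import combinations

def escape_probability(edges, community):
    """Chance that a stationary random walker inside `community` leaves it next step."""
    cut = 0      # edges with exactly one end inside the community
    volume = 0   # sum of degrees of the nodes inside the community
    for u, v in edges:
        inside_u, inside_v = u in community, v in community
        volume += inside_u + inside_v
        if inside_u != inside_v:
            cut += 1
    return cut / volume

# Two cliques of four nodes joined by one bridge edge.
edges = (list(combinations(range(0, 4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])

true_group = {0, 1, 2, 3}    # one of the dense groups
mixed_group = {0, 1, 4, 5}   # an arbitrary set that cuts across both groups

p_true = escape_probability(edges, true_group)
p_mixed = escape_probability(edges, mixed_group)
print(round(p_true, 3), round(p_mixed, 3))  # 0.077 0.692
```

The walker leaves the dense group on only about one step in thirteen, but leaves the arbitrary set on most steps. This holds for this particular dynamics; a different process on the same network could single out different bottlenecks, which is the speaker's broader point.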

Speaker 0

他们在这些地方来回穿梭。

They bounce around a lot.
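
A minimal sketch of the bottleneck picture described here, under assumptions that are not from the conversation: a toy network of two 10-node cliques joined by a single bridge edge, and an unbiased random walker. The names `two_clique_graph` and `crossing_fraction` are hypothetical helpers. The walker moves freely inside each clique but only rarely crosses the bridge, which is the sense in which the bridge acts as a bottleneck.

```python
import random


def two_clique_graph(n):
    """Build two n-node cliques joined by one bridge edge (toy assumption)."""
    adj = {v: set() for v in range(2 * n)}
    for block in (range(n), range(n, 2 * n)):
        for u in block:
            for v in block:
                if u != v:
                    adj[u].add(v)
    # Single bridge between the two cliques: the bottleneck.
    adj[n - 1].add(n)
    adj[n].add(n - 1)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def crossing_fraction(adj, n, steps, seed=0):
    """Fraction of random-walk steps that move between the two cliques."""
    rng = random.Random(seed)
    node = 0
    crossings = 0
    for _ in range(steps):
        nxt = rng.choice(adj[node])  # follow a uniformly random edge
        if (node < n) != (nxt < n):  # the step changed sides
            crossings += 1
        node = nxt
    return crossings / steps


adj = two_clique_graph(10)
frac = crossing_fraction(adj, 10, 100_000)
print(f"fraction of steps that cross the bridge: {frac:.4f}")
```

On this toy graph only 1 of the 91 edges connects the two groups, so a long walk should cross on roughly 1 step in 91. A different dynamical process on the same graph need not single out the same edge, which is the point made in the conversation.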

Speaker 0

想象一下谣言传播的情景：人们频繁地与同一群人交流，很多人都已经听说了这个消息，但你仍然需要一些零星的耳语，比如交换生之类的人，才能让谣言传到世界的另一端。

Think of if you're talking about a rumor spreading, people are talking a lot to the same people and a lot people there will have heard it before, but you still have the sort of whispers and you need the exchange student or whatnot for it to get to the other side of the world.

Speaker 0

你可以理解为某种意义上的瓶颈。

You can think of some notion of a bottleneck.

Speaker 0

我现在对社区结构的看法是，这应该是从瓶颈中产生的网络。

My perspective on community structure now is that this should be the networks that you get from a bottleneck.

Speaker 0

但问题是，如果我采用一种不同的动态过程，我可能会得到不同的瓶颈，因为不同的结构可能是不同动态过程的瓶颈。

But the thing is, if I have a different type of dynamic, I could get a different bottleneck because maybe different structures are bottlenecks to different dynamics.

Speaker 0

而难点在于，我们真正知道如何应用这种方法的动态过程只有少数几种。

And then the rub is that there's only a couple of dynamical processes that we actually know how to do this with.

Speaker 0

但这种视角我认为很有意思，因为尽管我们能将一切视为网络，并且网络结构确实存在，但通常我们关注的是发生在网络上的某种过程，而我不明白为什么当发生的过程不同时，出现的结构却必须相同。

But that's the sort of perspective that I think is interesting because even though we can do everything as a network and there is a notion of network structure, usually we're thinking about something occurring on that network and I don't see why it should be the same structure that shows up if it's a different thing occurring.

Speaker 0

我认为这并不一定成立。

I don't think that has to be true.

Speaker 0

但这并不是文献中普遍接受的观点。

But that's not the that's not the prevalent view of this in the literature.

Speaker 1

很有意思，因为我觉得在节目进行到这里时，我其实应该提一下我之前和布莱恩·阿瑟的那次对话，也就是第68、69期。他谈到不同的方法论如何揭示或实现不同的本体论，特别是指出，如果你用代数思维研究经济，你对经济的理解会与用算法思维看待经济时完全不同。

Interesting, because I feel like I ought to cite, actually, at this point on the show, the conversation I had with Brian Arthur a while back, episodes 68 and 69, where he was talking about the way that different methodologies disclose or enact different ontologies, and specifically talking about the way that if you're using algebraic thinking to study the economy, you're gonna get a whole different view of what an economy is than if you're... [Sure.]

Speaker 1

如果你用算法的方式去思考它。

If you're thinking of it algorithmically.

Speaker 1

所以这是一个对象或过程的问题。

And so this is object or process thing.

Speaker 0

是的。

Sure.

Speaker 0

我确实认为人们用不同方式研究问题很好。

I definitely think it's good for people to study problems in different ways.

Speaker 0

对吧？

Right?

Speaker 0

比如，我认为人们使用不同方法很重要，而不是简单地说，这种方法最好。

Like, I think it's important for people to use different approaches and to not just say, oh, this approach is the best.

Speaker 0

别人都应该这么做。

Everyone else should do it that way too.

Speaker 0

我觉得这样没问题。

I think that's okay.

Speaker 0

好的。

Okay.

Speaker 0

我不是说，我用了‘危险’这个词。

Not I mean, I use the word dangerous.

Speaker 0

这并不危险。

It's not dangerous.

Speaker 0

不是“世界要爆炸了”那种危险，但我认为这是次优的。

Not "the world's gonna blow up" dangerous, but it's suboptimal, I think.

Speaker 0

所以我完全乐意让其他人以不同的方式来处理问题。

So I'm perfectly happy for other people to approach things in different ways.

Speaker 0

在我写的论文中，以及根据我的合作者不同，我曾以多种方式思考过问题，我觉得这样做对我很有帮助。

And in different papers that I've written and also depending on who my co authors are, I have thought about problems in different ways and I feel like I benefit from doing that.

Speaker 0

所以，至少我对你所说的他的观点的理解——被我自己的看法影响了，人们总是想听他们想听的——这似乎和我的想法一致。

So at least the way I'm interpreting what you're saying of his, colored by my own view and people wanting to hear what they wanna hear, that seems to be consistent with what I think.

Speaker 1

嗯，在阅读这份综述时，我确实注意到，你所识别的每一种方法都有其独特的优势和劣势。

Well, that is certainly something that that popped up for me in reading this survey, was this every one of these approaches you identify as having its own distinct portfolio of benefits and drawbacks.

Speaker 1

是的。

Yeah.

Speaker 0

所以当你在谈论

And and so when you're talking

Speaker 1

关于优化和优化问题时，我们曾邀请克里斯·摩尔做客节目。

about, yeah, when you're talking about, like, optimization and optimization problems, we had Cris Moore on the show.

Speaker 1

我想是第51期。

I think it was episode 51.

Speaker 0

我的意思是，克里斯也是这个领域的专家。

I mean, Cris is an expert on this topic as well.

Speaker 0

因此，很多贡献都来自他和他的团队。

So he'll have so a lot of the contributions have come from he and his crowd.

Speaker 1

我思考这个问题的倾向，源于我本科时的经历——那时我被迫在进化生物学课上站在全班面前，为某种物种概念辩护，而反对其他所有概念，这让我非常沮丧，因为这些不同方法的适用性、情境性都很重要。

So it's my tendency thinking about this stuff just coming out of I remember, like, as an undergraduate being forced to stand up in front of an evolutionary biology class and defend one kind of species concept over every other and just getting super frustrated by this because it's also like, the situatedness, the contextuality of all of these different approaches matters.

Speaker 1

而我喜欢你在这里所倡导的这种思维方式，首先，人们必须展示他们的工作过程。

And then I like the way that the kind of thinking that you're espousing here is one in which, first of all, people are required to show their work.

Speaker 1

然后这就

And then that's

Speaker 0

哦，是的。

Oh, yeah.

Speaker 0

不。

No.

Speaker 0

你必须这么做。

You have to do that.

Speaker 0

不。

No.

Speaker 0

我不喜欢隐藏的假设。

I do not like hidden assumptions.

Speaker 0

我对假设完全没问题。

I'm perfectly happy with assumptions.

Speaker 0

我真的很讨厌有人热情地把某些东西藏起来。

I I really hate it when someone tries to put something under the rug, like, passionately.

Speaker 1

是的。

Yeah.

Speaker 1

但还有另一件事，我认为这体现了圣塔菲研究所（SFI）在实践复杂系统科学时的现状：这里的人似乎喜欢针对同一个问题，运用一套完全不同的技术组合，而不假定其中某一种方法一定会胜出。

But and then but then there's this other thing, which I think as a way of conveying what seems like it makes the status quo of complex system science as practiced by SFI institutionally is that people here seem to enjoy bringing to bear a whole different portfolio of techniques on a single question and not assuming that one is gonna outperform the rest.

Speaker 1

是的。

Yeah.

Speaker 1

总之，说了这么多，我想再回过头来谈谈，因为这对我来说就像在公开场合尽情展现自己作为新手的学习过程，我想再多聊聊这些网络和技术如何揭示社群中的模块化与层级结构，我很想听听你的看法。

So at any rate, that having been said, I wanted to double back because in as much as this is me just getting to indulge my own noob learning in public here, I wanna talk a little bit more about the way that these networks, these techniques reveal modularity and hierarchy in communities, and I'd love to hear you riff on that.

Speaker 1

因为，这只是先打个预防针。

Because, like, there's this is just to call the shot.

Speaker 1

这是一种为后续讨论你主导的那篇关于数据拓扑的论文铺路的方式——它表明，数据的不同粒度或模糊性、不同分辨率层级，似乎会揭示出完全不同的对象，就像那只Jigglypuff一样。

This is a way of setting the table for later in this conversation talking about paper that you led on the topology of data and how different layers of granularity or, like, fuzziness, different levels of resolution of data seem to reveal, again, like, completely different, like, different objects, like the Jigglypuff thing.

Speaker 1

但我不想

But I don't wanna

Speaker 0

对。

Right.

Speaker 0

对。

Right.

Speaker 1

是的。

Yeah.

Speaker 1

但所以它只是，

But so it just,

Speaker 0

就像对。

like Yeah.

Speaker 0

我们确实把一只宝可梦发表在了一个严肃的物理学术期刊上。

We and we published a Pokemon in a serious physics venue.

Speaker 0

这种事可能不会再发生，但我们成功了，这很好。

It may not happen again, but we got away with it, which is good.

Speaker 0

顺便说一下，如果你看过《大侦探皮卡丘》，那个愤愤不平的胖丁就是我的精神图腾，尤其是那种特别愤愤不平的类型。

Jigglypuff is by the way, like, if you've seen the detective Pikachu, indignant Jigglypuff is my spirit animal, but specifically the indignant type.

Speaker 0

好的。

Okay.

Speaker 0

让我想想，我最初的问题是什么来着。

So let me see if I can remember what the original question was.

Speaker 1

哦，好吧，让我想想。

Oh, well, let me yeah.

Speaker 1

对我来说，一方面你有这种方法论上的多元主义，但另一方面，SFI网络中的许多研究者已经探讨了研究现象的粒度或层级如何从根本上改变你对它的理解。

So for me, it's like, on the one hand, you have this sort of methodological pluralism, but then on the other hand, you have all of this work that different people have done in the SFI network on how the granularity or the level at which you are studying a phenomenon is really gonna radically change the way that you understand it.

Speaker 1

当你看到像大卫、杰西、尼哈特这些人，他们在探讨究竟什么构成一个个体时。

And, like, when you got people like David and Jess and Nihat and these others that are working on what exactly constitutes an individual.

Speaker 1

我觉得社区检测这个问题与这种根本性的探究密切相关。

I just find this question about community detection very closely related to that kind of fundamental inquiry.

Speaker 0

对。

Right.

Speaker 1

所以我很想听听你对这类问题的看法。

So I'd love to hear you talk about that kind of stuff.

Speaker 0

好的。

Okay.

Speaker 0

当我以这种抽象的方式来思考社区检测时，我会想到：我已经构建了一个网络，它可能是以成对交互的图形式存在，但也可能更复杂，因为图还有各种推广形式。

I think about community detection when I put it in that language as a sort of abstract problem in the sense of I have already say constructed a network or it could be something whether in the form of literally it's a graph with pairwise interactions, but it could be something more complicated than that because there's generalizations of graphs.

Speaker 0

但我们已经把它变成了一个数学对象，我认为社区检测是我们对这个数学对象进行的操作，它会输出结果。

But we've already turned it into a mathematical object and I view community detection as something that we're doing to look at a mathematical object and it then gives an output.

Speaker 0

我们所关注的这种结构，是我所说的介观尺度结构的一种特定类型。

The type of structure we're looking at is a specific type of what I would call a mesoscale structure.

Speaker 0

希望这开始触及到你所提到的内容，即微观尺度结构可能是单个节点，或者可能是单个边等等。

Hopefully this is starting to get at what you're bringing up, where the microscale structure might be the individual nodes or potentially individual edges and so on.

Speaker 0

而宏观尺度结构可能是某种分布，比如朋友数量的分布之类的。

And a macroscale structure might be distributions of stuff, whether it's distributions of the number of friends or whatnot.

Speaker 0

是不是很多人只有少数朋友，而少数人拥有许多朋友？

Do you have many people with few friends and a few people with many friends?

Speaker 0

所以从这个意义上说，这是一种宏观尺度。

So it's macro scale in that sense.

Speaker 0

介观尺度是介于两者之间的内容，而社区是一种特定的介观尺度结构，其特点是内部连接密集，而不同群体之间的连接稀疏。

Mesoscale is stuff in the middle and communities are a particular type of mesoscale structure, in particular a type in which you have dense connections inside, whatever that means, sparse connections between.
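
One common way to make "dense connections inside, sparse connections between" concrete is the modularity objective function, Q = Σ_c [ e_c/m − (d_c/(2m))² ], where m is the number of edges, e_c the number of edges inside group c, and d_c the total degree of group c. The sketch below is an illustration only, not the method of any paper discussed here; the function name `modularity` and the toy graph (two triangles joined by one edge) are assumptions. As noted earlier in the conversation, maximizing this quantity has known issues such as the resolution limit.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping node -> community label
    """
    m = len(edges)
    internal = {}    # e_c: number of edges with both ends in community c
    degree_sum = {}  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Toy graph (assumption): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
lumped = {v: "all" for v in range(6)}

print(modularity(edges, split))   # the two triangles as separate groups
print(modularity(edges, lumped))  # everything in one group
```

Splitting the graph into its two triangles scores higher than lumping all six nodes together, which matches the intuition of dense groups joined by a sparse connection.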

Speaker 0

还有其他类型的介观尺度结构，比如所谓的核心-边缘结构，其中也可能存在密集连接，但这些连接与其他部分的关联方式并不一定是稀疏的。

There are other types of mesoscale structures, like there's something called a core periphery structure where you still might have dense stuff, but it's connected to other things but not necessarily in a sparse way.

Speaker 0

社会科学家称之为角色和位置，或角色结构，这种结构可能并不要求字面意义上的密集连接，而是通过互动模式来体现：例如，教职人员具有某种特定的互动模式，即局部网络结构，而研究生则具有另一种互动模式、另一种网络结构，想必朋友比教职人员更多；这里识别的是结构的相似性，而非字面意义上的密集性。

There is something that social scientists call roles and positions or role structure where it might not be that you have literal denseness but maybe the interaction patterns say, Oh, a faculty member has a certain type of interaction patterns like a local network structure and a graduate student has a certain other type of interaction pattern, another network structure presumably with more friends than the faculty member has, but you're identifying similarity of the structure rather than literal denseness.

Speaker 0

人们研究的中观结构有多种类型，而社区结构可能是被研究得最深入、也最容易理解的一种。

So there's different types of mesoscale structures that people study. Community structure is somehow the one that people have advanced the most on and, I would say, is the most straightforward to think about.

Speaker 0

其他类型的中观结构也取得了进展。

There's advancements on the others too.

Speaker 0

社区结构更像是一种模块化结构。

Community structure is more of a modular type structure.

Speaker 0

“模块化”这个词有它在英语中的日常含义，我想这个概念甚至可以追溯到一篇相当早的论文，我们在那篇文章中其实也引用了它，即《复杂性的架构》。

The term modularity, I mean, there's the English meaning of the term, which I think even goes back to (this is another old paper that we cite in that article, actually) "The Architecture of Complexity."

Speaker 0

这是赫伯特·西蒙提出的模块化结构概念，其核心思想是：某种形式的鲁棒性可能源于这样的机制——如果你改变了某个模块中的内容，并不意味着模块之外的其他部分也会随之改变。

This is Herbert Simon who was talking about modular structures where the idea is that maybe in terms of maybe a possible explanation of some types of robustness where if you change something in one module, it doesn't mean that anything else outside that module has changed.

Speaker 0

他在那篇论文中主要以词语为例进行阐述。

He was doing this mostly with words in that paper.

Speaker 0

但与我们大多数人不同，赫伯特·西蒙能够在不进行大量计算的情况下洞察事物。

But Herbert Simon, unlike most of us, can see things without actually working out the calculations as much.

Speaker 0

我认为大多数普通人需要实际进行计算和操作，而不一定具备那种洞察力。

Where I think most normal, most humans need to actually like work through the calculations and do this and don't necessarily have that kind of vision.

Speaker 0

我没有那种洞察力。

I don't have that kind of vision.

Speaker 0

这更像是一种模块化结构，对吧？

That's more of a modular, right?

Speaker 0

想象一下横向的结构，比如数学系、物理系、生态学系等等。

Think of it a horizontal, the math department, the physics department, the ecology department or whatnot.

Speaker 0

一种层级结构，人们用‘层级’这个词的方式也不尽相同，但从这个角度看，想象一个个体，对吧？

A hierarchical structure, and people use the word hierarchy in different ways as well, but from this point of view, think of an individual, right?

Speaker 0

你刚才提到了这一点。

You brought that up.

Speaker 0

一个子部门，比如应用数学小组之类的，整个数学系，整个文理学院，整个大学，对吧？

A sub department, so maybe the applied math group or whatnot, the entire math department, the entire College of Arts and Sciences, the entire university, right?

Speaker 0

这种层级结构中蕴含着一种嵌套性，而这种嵌套性可以与层级结构并存……哦，抱歉，是与模块化结构并存。

There's a sort of nestedness that's built into that hierarchy, and that's something that can exist alongside a hierarchical structure oh, sorry, alongside a modular structure.

Speaker 0

事实上，我们举了一个例子，这也是圣塔菲研究所的一位人士。

In fact, we give an example of this, and this is another SFI person.

Speaker 0

这是Aaron Clauset好心允许我们使用的图片，他当时是从生态学角度思考的，但我在那篇论文中开了个玩笑，假装这些寄生性草原物种间的相互作用其实是教职员工之间的互动。

This is a picture that Aaron Clauset kindly let us use where he was thinking ecologically, but I made a snarky remark in that paper pretending that these parasitic interact grassland species interactions were actually between faculty members.

Speaker 0

你可以以任何你想要的方式解读这个评论。

You can unpack that comment any way that you'd like.

Speaker 0

这张图片来自Aaron Clauset，他好心允许我们使用。

The picture comes from Aaron Clauset and he kindly let us use it.

Speaker 0

这张图片明确地结合了层级结构和模块化结构。

That picture very specifically has both a hierarchical and a modular structure combined into it.

Speaker 0

这张图片的原始版本来自一篇论文，我认为是Aaron、Chris Moore和Mark Newman的那篇论文，如果我没记错的话，是他们2008年发表在《自然》上的那篇论文，它使用了一种层级统计模型来寻找模块。

A variant of that picture originally came from a paper, I think it's actually Aaron and Cris Moore and Mark Newman, I think it's that paper, the 2008 Nature paper of theirs, if I remember correctly, that used a hierarchical statistical model to look for modules, right?

Speaker 0

因此，这张图片甚至源自一篇将层级与模块结合在一起的论文，或者至少是它的变体。

So that particular picture even came from a paper where hierarchy and modules were combined or a variant of that picture.

Speaker 0

我使用它只是为了说明这些结构可以同时存在。

And I was using it just to point out that these things exist simultaneously.

Speaker 0

它们不是相同的东西。

They're not the same thing.

Speaker 0

你可能会跟我有互动，比如与其他大学的一位物理学家合作，而我将他们标记为物理学家，这意味着我在标签上添加了一个粒度层次，这个层次既不同于凝聚态研究组的层次，也不同于个体的层次。

You can have an interaction between me and say collaboration with a physicist at another university and the fact that I've identified them as a physicist means I've put a level of granularity on the label that is not the level of the condensed matter group and is not the level of the individual.

Speaker 0

你可以想象耦合关系以不同方式依赖于不同层级的结构。

You can imagine couplings that depend on different levels of hierarchy in different ways.

Speaker 0

但社区结构本身，我最初更倾向于将其视为模块化的问题，但你也可以将层次化的想法和方法融入到分析方法中，甚至融入到你看待问题的方式中。

But community structure by itself, I think of the problem initially more as looking at modular things, but you can have hierarchical ideas and approaches that are also built into the methods and into potentially the way that you're looking at the problem.

Speaker 1

很棒。

Cool.

Speaker 1

从这里开始，我想转向这类方法的一个应用。

From there, I'd like to move on to an application of this kind of stuff.

Speaker 1

你和特劳德、穆恰合著了2012年发表在《Physica A》上的那篇关于Facebook网络社会结构的论文。

You've got this Physica A 2012 piece that you coauthored with Traud and Mucha, "Social Structure of Facebook Networks."

Speaker 1

这让我回想起第12集，那时我们邀请了马特·杰克逊，而那时他还没开始做关于Facebook数据的研究，对吧。

So this is just to call back all the way to I think it was episode 12, and we had Matt Jackson, and that was before he'd done his work on Facebook data and Right.

Speaker 1

虚假信息的传播。

The flow of disinformation.

Speaker 1

对。

Right.

Speaker 1

我希望将来能请他回来讨论这个话题。

And I hope to get him back for that one.

Speaker 1

但这是另一篇非常有趣的文章，它打开了你从Facebook获得的庞大数据集，研究一些我不知道的东西。

But this is another really interesting piece that is cracking open this enormous dataset that you get from Facebook and looking at I don't know.

Speaker 1

每当我想到这些现象的根源可能追溯到像格雷戈里·贝特森和分裂生成论这样的理论时，我就感到不安。

I get paranoid thinking about the roots of some of this stuff going all the way back to, like, Greg Bateson and the schismogenesis and so on.

Speaker 1

一旦你获得了社区结构的信息，你打算怎么用它呢？

Like, what do you do with, like, community structure info once you have it?

Speaker 1

你能帮我以一种更积极的方式思考这个问题吗？

Like, maybe you can help me think about this in a more pleasant way.

Speaker 0

是的。

Yeah.

Speaker 0

我不确定会不会更愉快一些。

I don't know if it's gonna be more pleasant.

Speaker 0

人们和我聊完之后，通常不会对这个世界感觉更好，尤其是在涉及伦理问题、社交媒体和破坏民主的时候……不过我猜Facebook最近比较安静。

People don't usually get through conversations with me thinking better about the world, especially when it comes to ethical issues and social media and destroying democracy and or I guess Facebook's or Facebook's been quiet lately.

Speaker 0

我想我现在该怪Twitter了。

I suppose I'm supposed to blame Twitter now.

Speaker 0

对吧？

Right?

Speaker 0

我不知道。

I don't know.

Speaker 1

但不管怎样，这是一篇很有趣的论文，而且确实如此。

But anyway, this is a cool paper and yeah.

Speaker 0

是的。

Yeah.

Speaker 0

所以我能说确实如此。

So I could tell yeah.

Speaker 0

所以，我可以确定，这篇论文实际上是续篇，它基于2011年的一篇方法论论文，是它的新配套研究。

So this I could tell so this paper was actually a sequel paper, and so there's this 2011 paper that's a methods paper that this is a new companion of.

Speaker 0

我们当时想做的，是我们拥有这些Facebook数据。

And what we were trying to do So we have this Facebook data.

Speaker 0

这是Facebook史前时代的数据。

This is from the pre Cambrian era of Facebook.

Speaker 0

这篇论文所用数据的一个重要特点是，当时你必须拥有一个.edu邮箱账户。

An important part, important aspect of the data that's used in this paper is that you had to have a .edu account at the time.

Speaker 0

明白吗？

Okay?

Speaker 0

所以，那时Facebook的结构和现在完全不同。

So the structure of Facebook back then is completely different from what's there now.

Speaker 0

我还想提一下这个数据的另一个特点：我们只研究Facebook页面之间的连接关系。

The other thing I want to mention about that data is that we're looking at connections between Facebook pages only.

Speaker 0

所以，我们没有使用任何关于用户发布内容、浏览内容或互动行为的信息。

So it's not using anything about what anybody posts or what anybody sees or how they interact.

Speaker 0

它纯粹是结构性的，还带有一些元数据，比如某人专业的数值标识符，或者性别是男是女的数值标识符。

It is purely there is a structural thing that's there and there's certain metadata about it, like maybe a numerical identifier for somebody's major, a technically numerical identifier for whether male or female.

Speaker 0

我想那时候只区分男性和女性，所以此后一些文化规范也发生了一些变化。

I think it only did male or female back then, so also various cultural norms have moved on a bit since then.

Speaker 0

无论如何，那个标识符包含年级信息，但只有少数几种不同的标识。

Anyway, that identifier, had class year, but only like a few different identifiers.

Speaker 0

实际上，这些标识符的集合非常稀疏。

There was actually a very sparse set of identifiers.

Speaker 0

我们有100所不同的大学，并且已经开发了一套方法，试图在非常粗略的层面上进行观察；社区结构在实际应用中确实只适合回答粗粒度的问题。

We had 100 different universities and we had developed methodology to try to look at, okay, at a very coarse level, and community structure when it comes to an application is really for coarse-level things.

Speaker 0

如果你试图对它做太多解读，就会越来越担心某些结果是否只是你所选方法的产物。

If you try to start saying too much about it, then you really have to worry more and more about when something is an artifact of a particular method you chose.

Speaker 0

考虑到这些方法的混乱性，我更倾向于只用它来处理粗粒度的问题。

And given the messiness of these methods, I'm much more comfortable using this for coarse things.

Speaker 0

特别是，原则上我们希望，通过这种计算获得的洞察，能将原本需要仔细研究的一千个事项，缩减到大约二十个。

And in particular, the hope in principle is that an insight that you might get from this type of calculation would then be instead then instead of a thousand things to experiment and look at closely, maybe there's now 20.

Speaker 0

对吧？

Right?

Speaker 0

它的作用是缩小可能性范围，但你不应该依赖它来确定哪一个才是正确的。

It's supposed to narrow down the possibilities, but you shouldn't trust it to tell you which one exactly is right.

Speaker 0

你仍然应该依赖领域专家的意见。

You should still then go with the domain expert.

Speaker 0

至少它能提示你哪些地方值得更详细地查看。

At least it might suggest what to look at in more detail.

Speaker 0

我们正在比较100所不同的大学，试图大致了解它们是如何组织的。

We are comparing 100 different universities and trying to say, how do they, very broadly speaking, organize?

Speaker 0

因此，大部分信息来自年级，但也会涉及宿舍居住情况。

And so predominantly you get a lot of stuff from class year but you also get dormitory residence.

Speaker 0

在某些大学里，宿舍居住情况会与年级挂钩。

Sometimes in some universities it's coupled with class year.

Speaker 0

在100所大学中，有3所主要依赖宿舍居住情况，而97所主要依赖年级，即使在这一层面上也存在差异。

Dormitory residence was actually predominant at three out of the 100 universities as the main one whereas class year was at 97 out of the 100, so even at that level you get some difference.

Speaker 0

我知道加州理工学院会不一样。

I knew that Caltech would be different.

Speaker 0

我从自己在那里求学的经历就知道了，因此在这个意义上我也借助了领域知识，知道我的母校是个异常值。事实上，那个加州理工学院的Facebook数据集，即使只是计算像费德勒值（Fiedler value）这样的量（当然，大多数听众可能不知道这是什么意思），它在结构上也与其他数据集不同，而不仅仅因为它是最小的那一个。

I knew that from my own time there, and so I also used in that sense domain knowledge to know that my own school is an outlier. In fact, that particular Caltech Facebook dataset, including if you just do even things like Fiedler values (and of course most of the listeners won't know what that means), it's actually structurally different from the others, not just because it's the smallest one.

Speaker 0

确实存在结构性差异。

There's actually structural difference.

Speaker 0

加州理工学院在各个方面都很奇特，包括数学层面。

Caltech's weird in every respect including mathematically.

Speaker 0

所以我们正在观察宏观差异，而这个数据集，你知道，这是同一过程的100种不同实现。

So we're looking at broad differences and the dataset, you know, this is a 100 different realizations of the same process.

Speaker 0

通常，只有在谈论生成模型时，比如某种合成规则来构建网络，你才能得到如此干净的数据。

Normally, you only get something that clean if you're talking about generative model, like some synthetic rule for how to create a network.

Speaker 0

因此，这个数据集的另一种用途是，到目前为止，它已经被许多研究人员用于多篇论文。

And so another way that this dataset's been used, because the dataset's been used in quite a few papers by various people at this point.

Speaker 0

我这篇论文被引用得很多，但我觉得这是因为人们在使用这个数据集。

This particular paper of mine is a well cited paper, but I think it's because people use the dataset.

Speaker 0

我认为关于我们对各大学相对关系的特定陈述，知道的人要少得多。

I think it's much less known about the particular statements we made about relative universities.

Speaker 0

人们引用它，最普遍的原因只是因为他们使用了这个数据集。

It's just people citing it because they use a dataset I think is the most common thing.

Speaker 0

并非所有的引用都同等重要。

Not all citations are created equal.

Speaker 0

这是另一个需要记住的重要点。

That's another important thing to remember.

Speaker 0

我并不抱怨人们引用它。

I don't complain about people citing it.

Speaker 0

我想如实说明他们这么做的原因：他们引用是因为他们随后用这个数据集做了别的事情。

Want to keep it real about why they're doing it, it's because they're citing it because they're then using the dataset for something.

Speaker 0

但因为你有同一事物的100种不同实现，而且规模各异：最小的那个在最大连通分量中有762个节点（你能看出我研究过它很多次），最大的那个我想有超过四万个节点，所以你还跨越了大约两个数量级，或者至少接近两个数量级。

But because you have 100 different realizations of the same thing and they vary across sizes, where the smallest one has 762 nodes in the largest connected component (you can tell I've studied this a lot) and the largest one has, I think, over 40,000, you also have a couple orders of magnitude, or at least almost a couple orders of magnitude.

Speaker 0

你可以在其上叠加某种动态过程，并进行比较。

You can put some sort of dynamical process on top of it and compare things.

Speaker 0

或者你可以使用其他你想要研究的方法，包括一些社区检测方法，并进行比较。

Or you can do some other method that you want to study, including some of these community detection methods and compare things.

Speaker 0

最初对这100所学校的全面研究的一个贡献，就是提供了一个被他人用于其他用途的资源，无论是开发新方法还是其他方面。

One of the contributions of the initial study of going through all those 100 schools is to have something that people have used for other things, whether it's developing methods and so on.

Speaker 0

此外，也有一些更偏向社会学的研究跟进并拓展了我们的工作。

And there's also been some more sociologically oriented studies that have followed up on what we did.

Speaker 0

这就是基本的想法。

That's the sort of basic idea.

Speaker 0

所以它的定位是一个应用性研究。

So it's meant to be an application.

Speaker 0

我们内部开玩笑地称它为‘数据dump论文’，因为我们的方法是用大体的框架来比较不同事物的社区结构。

The actual we jokingly internally called it the data dump paper because it's like we developed this methodology and approach to think of, okay, let's in broad brush strokes compare the community structure across a few different things.

Speaker 0

那是2011年发表在《SIAM Review》上的论文。

That was the 2011 SIAM Review paper.

Speaker 0

我们开发了这个宏观的框架。

We developed this broad thing.

Speaker 0

就是说，我们有100所学校。

It's like, okay, but we have a 100 schools.

Speaker 0

那就直接把它们全部比较一下，把我们做过的这些内容整理成数据dump。

Let's just compare all of them and take what we've done and do the data dump.

Speaker 0

事实上，从我的网页上下载的那个文件，名字就真的叫“data dump.pdf”，因为这篇就是所谓的数据dump论文。

In fact, the file that one would download from my HTML file is actually literally called "data dump.pdf", because this was the data dump paper.

Speaker 0

不管怎样，我觉得这是一个不错的应用，我很享受这个过程。

Anyway, I think it was a nice application, I enjoyed it.

Speaker 0

我至今仍在使用这个数据集，因为它是用相同流程生成的100个真实世界实例。

I continue to use this dataset because just having a 100 different instances of a real world thing that has been generated using the same process.

Speaker 0

实际上还有一篇论文，虽然不能说是进一步的证据，但确实让这个观点更具体、更扎实了——它确实是由同一过程生成的100个实例。

There's actually a paper that really also gives, I guess, I wouldn't say further evidence, but strengthens the evidence, makes it more concrete that it really is a 100 instances of the same process.

Speaker 0

所以这篇论文也涉及了我之前提到的一些人，但我本人并没有参与其中。

So this particular paper also involves some people I've mentioned before, so I'm not involved in this.

Speaker 0

亚伦·克莱塞特、乔汉·乌甘德参与了这项研究，我一时想不起那些年轻研究者的名字了，这不太好，但我想其中一两位是亚伦以前的学生。

Aaron Clauset's on it, Johan Ugander, and I'm messing up and forgetting the junior folks, which is not good, but I think one of Aaron's former students or something, one or two of them.

Speaker 0

他们研究的是，比如说，一所更大的学校，如果你把较小的学校按比例放大，它会不会就像那个更大的学校一样？

What they were looking at was whether, say, a larger version a larger school, if you took Is that somehow like the smaller one but having grown to a larger size?

Speaker 0

对。

Right.

Speaker 0

如果我把我的学校——规模为762人——让它增长，直到网络规模达到那个更大的学校，它们看起来真的像是来自同一个过程吗？

So if I took my school of size 762 and let it grow until that network is the size of the larger one, do they actually look like they came from the same process?

Speaker 0

他们的论文中对此有更多细节，但本质上答案是肯定的。

There's more detail on that than that in their paper, but essentially the answer is yes.

Speaker 0

这确实是一个真实世界中的集合体，我特意使用“集合体”这个词，而不是来自合成模型——比如传统上我们所说的随机图模型——的集合体。

It really is an ensemble, and I'm using the word ensemble on purpose, from the real world instead of an ensemble from say a synthetic model, traditionally a what we call a random graph model.

Speaker 0

因此，当你测试方法，或者想写一篇在网络上施加某种过程的论文时，它就非常有用，比如疾病传播；当然，用于信息传播或谣言传播也说得通，因为，好吧，希望你不是在传播谣言，但你确实会和你的Facebook好友交流。

So that makes it very useful when you're testing methods or when you want to do a paper which has some sort of process, disease spread or, of course, it makes sense for information spread or rumor spread because, well, hopefully you would not spread rumors, but you really would communicate with your Facebook friends.

Speaker 0

至少在某种程度上，这种网络具有真实性，是这类现象实际发生的网络类型。

It is at least at some level, there's a verisimilitude where it's the type of network on which this would occur.

Speaker 0

对吧？

Right?

Speaker 0

比如有些论文把疾病传播放在蛋白质相互作用网络上，你会想：好吧，数学上你确实可以这么做，但你有点脱离现实了，因为疾病传播并不是发生在蛋白质相互作用网络上的真实过程，尽管我可以把它数学化为一个网络，也可以数学上把这个过程施加上去。

Like papers where someone puts disease spread on a protein interaction network and you're like, Okay, well mathematically you can do that but you're suspending reality a bit because disease spread is not the type of process that occurs on a protein interaction network even though I can define it mathematically as a network, and I can mathematically put this process on it.

Speaker 0

Facebook网络对于这类情况来说，感觉更真实一些。

The Facebook network feels a little bit more genuine for something like that.
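
As a rough illustration of "putting a dynamical process on top of" a network, here is a minimal discrete-time SIR (susceptible, infected, recovered) sketch. Everything in it is an assumption made for illustration: the toy network (two 10-node cliques joined by one bridge edge), the per-step infection probability `beta`, the per-step recovery probability `gamma`, and the helper names. It is not the model from any paper mentioned in the conversation.

```python
import random


def two_clique_adjacency(n):
    """Toy network (assumption): two n-node cliques joined by one bridge."""
    adj = {v: set() for v in range(2 * n)}
    for block in (range(n), range(n, 2 * n)):
        for u in block:
            adj[u].update(v for v in block if v != u)
    adj[n - 1].add(n)
    adj[n].add(n - 1)
    return adj


def simulate_sir(adj, seed_node, beta, gamma, rng):
    """Run one discrete-time SIR outbreak; return how many nodes were ever infected.

    beta:  probability an infected node infects a given susceptible neighbor per step
    gamma: probability an infected node recovers per step (must be > 0 to terminate)
    """
    status = {v: "S" for v in adj}
    status[seed_node] = "I"
    infected = {seed_node}
    ever_infected = 1
    while infected:
        newly_infected = set()
        recovered = set()
        for u in infected:
            for v in adj[u]:
                if status[v] == "S" and v not in newly_infected and rng.random() < beta:
                    newly_infected.add(v)
            if rng.random() < gamma:
                recovered.add(u)
        for v in newly_infected:
            status[v] = "I"
        for u in recovered:
            status[u] = "R"
        infected = (infected - recovered) | newly_infected
        ever_infected += len(newly_infected)
    return ever_infected


adj = two_clique_adjacency(10)
rng = random.Random(0)
sizes = [simulate_sir(adj, 0, beta=0.3, gamma=0.5, rng=rng) for _ in range(200)]
print("mean outbreak size over 200 runs:", sum(sizes) / len(sizes))
```

Running the same process on each network in an ensemble, real or synthetic, and comparing outbreak sizes is the kind of comparison described above.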

Speaker 1

我想你可能已经触及这一点了：论文按性别、专业、宿舍、年级和高中分别统计了同配性，后面还提到，高中在大型大学的社会组织中扮演的角色，比在小型机构中更重要，因为在小型机构中，来自同一所高中的学生对通常较少。

So I think you may have already hit this point when you're talking about tabulating assortativity based on gender, major, residence, class year, and high school, and how you say later in the paper that high school plays a greater role in the social organization of large universities than it does at smaller institutions, where there are typically fewer pairs of people from the same high school.

Speaker 1

对。

Right.

Speaker 1

这里有一个问题，和你刚才提到的另一个观点相关，我想看看这是否其实是一个问题：你说由于不同机构初期对Facebook的采用率不同，数据所代表的单一时点，可能有效地反映出在线社交网络形成的不同阶段。

There's this question for me that relates to another statement you've made here, and then I'll see if this is really all actually one question, where you say because of the different rates of initial Facebook adoption at different institutions, the single point in time represented by the data might usefully describe different stages in the formation of an online social network.

Speaker 1

我正在思考我刚刚和艾莉森·戈普尼克关于探索与利用权衡的对话，或者更具体地说，关于儿童发展过程中行为模式的变化。

I'm thinking about the conversation I just had with Alison Gopnik about explore exploit trade offs or in, like, childhood development and how patterns of behavior change.

Speaker 1

就像，我刚才想到，天啊。

Like, there was I've oh god.

Speaker 1

我忘了是谁做的了。

I'm forgetting who did it.

Speaker 1

我会查一下放到节目笔记里，不过这项研究，我想是出自杰弗里·韦斯特（Geoffrey West）关于标度律和睡眠作用的工作，讲的是为什么孩子睡得那么多。

I'll look this up for the show notes, but the work that was on it, it came out of, I think, Geoff West on scaling and the role of sleep and, like, why it is that kids sleep so much more.

Speaker 1

我想听听你的看法，一方面你说不要用疾病传播模型来研究蛋白质网络。

Curious what your thoughts are about because on the one hand, you're saying, like, don't use a disease spread model on protein networks.

Speaker 1

嗯，

Well,

Speaker 0

从数学上讲，你当然可以这么做。

math math mathematically, you can do it.

Speaker 0

只是你稍微脱离了现实而已。

It's just a matter of you're suspending reality a little bit.

Speaker 0

但数学家们经常脱离现实。

But mathematicians suspend reality all the time.

Speaker 0

我肯定写过一篇论文，用过疾病传播模型在那种网络上的例子，但我真正想做的，是用数学方式描述疾病在不同结构网络上的传播机制。

I am sure that I have a paper in which I've used an example of a disease spread model on that, but what I'm really trying to do is to make a mathematical statement of how the disease spread works on a network with a different structure.

Speaker 0

从数学上讲，我有权这么做。

Mathematically, I'm allowed to do it.

Speaker 0

但当你开始思考这个例子时，你会觉得，嗯，这有点奇怪。

Just when you start thinking about the example, you're like, well, that's weird.

Speaker 1

所以我想我真正想问的是，你认为这些发现能否推广，或者以某种方式与更广泛的网络增长理论联系起来，无论是社交网络还是生物系统？

So I guess I guess what I'm actually shooting for here is do you think that these findings generalize or, like, hook into in any way kind of a broader statement on the growth of networks, whether they be in social or biological systems?

Speaker 0

是的。

Yeah.

Speaker 0

我不确定。

So I'm not sure.

Speaker 0

我要说的是，我提到的那篇由Aaron、Johan和他们的合作者撰写的论文，实际上直接研究了Facebook网络的增长。

I will say that the paper that I mentioned by Aaron and Johan and company actually directly deals with the growth of Facebook networks.

Speaker 0

所以就你提到的更直接的问题而言，确实有一篇论文专门研究了这一点，并且在这些例子中似乎确实如此。

So in terms of the sort of more kind of immediate question, there is actually a paper that studied that explicitly, and that does appear to be the case for these examples.

Speaker 0

至于更普遍的情况，我其实不太清楚，因为虽然我听说过你提到的一些研究，但我并不熟悉那些具体的论文。

In terms of the more general question, I don't really know because, I mean, I've heard of some of the works that you're mentioning, but I don't think I know those specific papers, for instance.

Speaker 0

我想提一下同配性，因为我觉得这实际上是另一个问题。

There's one thing I wanna mention about assortativity, because I actually think it's a bit of a different question.

Speaker 0

你好像在说你不确定这是否是同一个问题。

I think you were saying you're not sure whether it was the same question or not.

Speaker 0

同配性是一个具体的度量，某种同配性实际上是一种相关性的度量。

Assortativity is a specific measure and a certain measure of assortativity is actually a measure of a type of correlation.
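
The remark that one measure of assortativity is a type of correlation can be illustrated with a minimal sketch. It assumes a scalar node attribute (here a hypothetical class-year code) and computes the Pearson correlation of that attribute across the two ends of every edge; the toy edge lists, the `year` dictionary, and the function name are illustrative assumptions, not data or code from the papers discussed.

```python
from math import sqrt

def attribute_assortativity(edges, value):
    """Pearson correlation of a scalar node attribute across edge endpoints.

    edges: list of (u, v) pairs of an undirected graph (toy data, assumed).
    value: dict mapping node -> number.
    Assumes the attribute is not constant over the edge endpoints.
    """
    # Count each undirected edge in both directions so the measure is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

# Hypothetical attribute: 0 and 1 stand for two class years.
year = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}

# A "small school" with two same-year friendships and a "larger school" with six.
small_school = attribute_assortativity([(0, 1), (4, 5)], year)
large_school = attribute_assortativity(
    [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7)], year
)
print(small_school, large_school)
```

In this toy setup both networks get the same coefficient even though they have different numbers of people and edges, which is the sense in which a single scalar correlation cannot by itself describe mesoscale community structure.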

Speaker 0

即使我有一个很小的学校，我也可能有很高的相关性值，但同配性更多是通过本身拥有足够多的人数来影响结构的。

The idea is that even if I have a small school, I could have a very high value of that correlation, but that assortativity is impacting the structure more just by having enough people in the first place.

Speaker 0

所以这不仅仅是测量那个问题。

So it's not just a matter of measuring that.

Speaker 0

让我告诉你，我可以跟你讲讲这个计算的来源，以及论文中那些评论的由来。

So let me tell you, I can tell you a little bit about where that calculation came from and where those comments in the paper came from.

Speaker 0

我不知道现在是否还这样，但以前感觉确实是这样的。

I don't know if this is still true, but it felt like this is true before.

Speaker 0

你可以去踢一位社会学家。

You could kick a sociologist.

Speaker 0

我不建议这样做。

I don't recommend doing this.

Speaker 0

你可以踢一位社会学家，他们会回应说：这是因为同质性。

You could kick a sociologist and they'll respond with, that's because of homophily.

Speaker 0

这是因为同质性。

It's because of homophily.

Speaker 0

那个评论的实际意思是：不，社区结构并不仅仅是同质性。

The actual point of that comment was that, no, in fact, community structure is not just homophily.

Speaker 0

同配性是一个标量指标，旨在反映同质性的某种特征。

Assortativity is a scalar measure that is supposed to give some signature of homophily.

Speaker 0

它并不是严格意义上的同质性，但它的目的就是如此。

It's not literally homophily, but that's what it's supposed to do.

Speaker 0

关键是，这两种情况下同配性都可能很高，但社区结构及其受到的影响会因人数不同而有所差异；我们想要区分的是，这些介观尺度结构并不仅仅是同质性。

The idea is that assortativity could be high in both cases, but the community structure and the effect on it would be different depending on the numbers. We were trying to distinguish that these mesoscale structures are not just homophily.

Speaker 0

所以，那个评论正是源于此。

So that's actually where that comment came from.

Speaker 0

这仅仅是基于我说我有很高的相关性。

And that's just coming from saying I have a high correlation.

Speaker 0

我有很高的相关性。

I have a high correlation.

Speaker 0

我测量了一个相关性。

I measure a correlation.

Speaker 0

这种相关性的测量及其对社区结构的影响，将取决于这类因素的数量。

And the measurement of that correlation and in terms of the impact on community structure is going to depend on how many of those things are there.

Speaker 0

所以，社区结构的计算是一种不同的计算方式。

So the idea is that the community structure calculation is a different calculation.

Speaker 0

它处于系统的不同尺度上，并不仅仅是同质性。

It is at a different scale of the system and it's not just homophily.

Speaker 0

所以我认为这是另一个问题。

So I think that's a different issue.

Speaker 0

不管怎样，这正是我们的想法，对吧？

Anyway, that's what we were thinking, right?

Speaker 0

所以，我们当时写的就是这类内容。

So that was the type of thing that we were writing there.

Speaker 0

我们是在回应那种‘这只是同质性’的说法，当时我看到太多次了。

We were reacting to the "it's just homophily" refrain that, at the time, I was seeing too many times.

Speaker 0

我认为这和不同规模的网络在增长时发生的情况是不同的问题。

I think that's a different issue from what happens with networks of different sizes as they grow.

Speaker 0

问题更多在于，一个是介观尺度上的东西，另一个则是另一种度量。

It was more that one thing is at a mesoscale versus the other thing being a different measurement.

Speaker 1

是的。

Yeah.

Speaker 1

因为我确实发现自己陷入了你在这里批评的那种“踢一位社会学家”式的思维，读到这些时忍不住想：在我看来，任何有过一点国际旅行经历的人都知道，同质性主导着青年旅舍里的社交网络，因为你总会遇到这些，嗯，非常……

Because I definitely found myself going into the kick-a-sociologist kind of thinking that you're critiquing here, reading this and just wondering. It strikes me that anyone who's done any amount of, like, international travel knows that homophily dominates hostel social networks, because you have these, like, very... Yeah.

Speaker 0

但同质性实际上指的是互动。

But homophily is referring to interactions, though.

Speaker 0

它指的是微观层面的东西。

It's referring to a microscale thing.

Speaker 0

那么问题来了，介观尺度和微观尺度会受到什么影响？

Then there's the question is what is the impact on a mesoscale and a microscale?

Speaker 0

这就是为什么我说它们并不是同一个问题。

That's why I'm saying I don't consider them to be the same question.

Speaker 0

同质性非常重要，但并不是唯一在发生的事情。

So homophily is very important, but it's not the only thing going on.

Speaker 1

我想听你进一步补充一点细微差别，因为我在你多篇论文以及你在圣塔菲研究所的演讲中都听到过，关于网络结构的许多思考往往假设模型中只存在成对互动。

So there's another piece of nuance I'd like to hear you add to this because I've heard you say in numerous papers as well as in the talk that you gave at SFI that, like, a lot of the thinking on network structure tends to assume pairwise interactions in the model.

Speaker 0

因为这是最简单的方法。

Because that's the easiest thing to do.

Speaker 1

对。

Right.

Speaker 1

但如果你仔细观察现实，就会发现，嗯。

The reality is if you look at this, it's like, okay.

Speaker 1

是的。

Yeah.

Speaker 1

作为初步近似，比如某人是否接受你的好友请求之类的，但确实存在一些情况。

As a first-pass approximation, whether somebody accepts your friend request or whatever, but, like, there are... Right.

Speaker 1

这个问题涉及很多不同的方面。

There's so many different pieces to this.

Speaker 0

我们这么做是因为我们知道如何做，而不是因为我们认为这就是终极解决方案。

So we do a lot of these things because we know how to do them, not because we necessarily think it's the be-all and end-all.

Speaker 0

如果你要超越成对互动，或者要处理随时间变化的互动，或者要处理多种类型的互动、多种沟通渠道、多种关系类型，你最终会得到一个更复杂的数学表示。

And if you're gonna go beyond pairwise interactions or if you're gonna go to have interactions that change with time or if you're gonna go to have interactions that have maybe multiple types of interactions, multiple communication channels, multiple types of relationships, you're going to end up having a mathematical representation that's more complicated.

Speaker 0

我花了很多时间研究其中各种情况，其他人也花了大量时间研究各种情况。

I've spent a lot of time studying various ones of those, and other people have spent a lot of time studying various ones of those.

Speaker 0

这是一件非常有趣的事情，而且有充分的理由去做，因为正如我在演讲中常说的是，人们并不会真的拿着别人也拿着的棍子。

It's a very cool thing to do and there's good reason to do it because in the way that I probably phrase it in my talk, because I often phrase it that way, people are not walking around holding sticks that other people are holding.

Speaker 0

即使说两个人是朋友，他们之间也存在一系列潜在的互动——电话、邮件、一起看电影、外出吃饭等等，这些你都看不到，却用一个矩阵中的数字来表示，说这两个人握着一根棍子，我们用数学模型来表示人们握着一根特殊的棍子，但这并不是我们实际看到的情形。

Even with the notion of saying two people are friends, there is this latent set of interactions between them, phone calls and emails and going to the movies and going out to eat and whatnot, that you're not seeing, and you are representing it as a number in a matrix, saying that these two people are holding and attached to a stick. We're using a mathematical representation that says people are holding a special stick, and that's not what we see.

Speaker 0

对吧？

Right?

Speaker 0

我们看到的是两个人一起吃晚饭之类的。

What we see is two people are having dinner together or whatever.

Speaker 0

对吧？

Right?

Speaker 0

好吧，希望你不会看到这些，因为你不应该去跟踪他们，但确实存在这些互动，而我们以某种方式把这些互动表示为一个网络。

Well, hopefully, you don't see that because you shouldn't be stalking them, but there are interactions, and somehow we represent the interactions as a network.

Speaker 0

是的。

Yeah.

Speaker 0

我经常很刻薄地做出这样的评论。

I'm very snarky and make comments like this all the time.

Speaker 0

抱歉。

Apologies.

Speaker 0

但无论如何，我们正在用一个数学对象来表示某些东西，即使我们包含了这些细微的非成对互动等等，它也远比现实简单。

But in any event, we're representing something in a mathematical object that, even when we include a bunch of these nuances, non-pairwise interactions and so on, is a much simpler thing than reality.

Speaker 0

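这是对现实的极大简化，你总是需要担心：你先面对现实，然后从现实中获取数据（这些数据本身已经做了一些简化），再把它们转化为你研究的数学对象。这时你会意识到，好吧，如果我把它转化为一个数学对象，我或许可以说我对这个数学对象做出了精确的陈述；即使那样也存在近似，但我们就假设我确实做到了。

It's a vast simplification of reality, and you always have to worry: you take reality, then you take data from reality (and that data has already made some simplifications), and then you turn it into the mathematical object that you study. You realize, okay, if I turn it into a mathematical object, I can maybe say that I've made a precise statement about that mathematical object. Even then there are approximations, but let's just suppose that I did.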

Speaker 0

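现在我对这个数学对象做出了一个精确的陈述，我想把这个陈述延伸出去，推断并说明一些关于现实世界的东西，尽管这个数学对象只是现实的简化。

Now I've made a precise statement about this mathematical object, and I want to extend that statement and infer something about the real world, even though the mathematical object is only a simplification of reality.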

Speaker 0

你必须警惕，因为选择以某种方式表示事物时，会产生一些人为的假象。

You have to worry because there are artifactual things that occur by choosing to have represented something in a certain way.

Speaker 0

而我们的期望是，你对这个数学对象所做出的任何结论，或许也能告诉你一些关于它所代表的更复杂事物的信息。

And the hope is that something that you then say about this object hopefully can tell you also something about the more complicated thing it's representing.

Speaker 0

因此，当我们处理这种成对互动时，我们知道如何更有效地从数学上研究它们，所以我们花了很多时间在它们上面。

And so when you do something like these pairwise interactions, we know more about how to study them mathematically, so we spend a lot of time on them.

Speaker 0

当我们试图推广到多元互动时（我在这里按惯例用“多元”指三方或更多方的互动，有些人称之为高阶互动），

When we make efforts to generalize to polyadic interactions (and I'm conventionally using polyadic to mean three plus; some people call it higher order),

Speaker 0

但单元、二元，然后是多元——即三个或以上——当你推广这些概念时，需要对不同的概念进行扩展。

But it's monadic, then dyadic, and then polyadic, so polyadic is three plus, and you need to generalize different concepts.

Speaker 0

当你推广这些概念时，比如我们之前讨论过的同配性或同质性的概念。

And when you generalize those concepts, so, we talked about assortativity or ideas of homophily.

Speaker 0

标准类型的同配性是一个成对的概念。

The standard type of assortativity is a pairwise concept.

Speaker 0

如果我要将这样的概念推广到多元交互，我会面临多种不同的推广方式，某些推广可能适用于某些问题，而其他推广则可能更适合其他问题。

If I'm going to take something like this and generalize it to polyadic interactions, I will have many different choices of how to generalize it, and some generalizations might be appropriate for some problems, and other generalizations might be appropriate for other problems.

Speaker 0

我们需要把其中的科学弄清楚。

And we need to work out the science.

Speaker 0

目前很多论文关注的正是这一点，不一定是同配性本身，但理论方面的很多论文都在探讨：如果我们把某个概念以不同方式推广到数学上更复杂的网络中，会有什么后果？

And this is where a lot of papers currently are, not necessarily on assortativity per se, but a lot of papers on the theory end are saying, well, if we take a certain idea and generalize it to networks that are more complicated mathematically in different ways, what are the consequences?

Speaker 0

我们应该选择哪种方式呢？

Which way should we do it?

Speaker 0

因此，目前有很多相关研究正在进行。

So there's a lot of work right now.

Speaker 0

实现多元交互的一种方式是使用超图，并将各种方法应用到超图上。

One way of having polyadic interactions is with things called hypergraphs and taking various methods and putting them on hypergraphs.

Speaker 0

我认为这在你提到的另一篇论文中有所涉及，把

I think that's on another of the papers that you pulled Putting

Speaker 1

那其实是很重要的一篇，超图上的有界信任意见动态模型正是你去年夏天在圣塔菲研究所演讲的主题，我们一定会

That was actually a big one. A bounded-confidence model of opinion dynamics on hypergraphs was the subject of the talk you gave at SFI last summer, and we'll be sure to

Speaker 0

链接到那个。

link to that.

Speaker 0

是的。

Yes.

Speaker 0

没错。

Exactly.

Speaker 0

所以当你允许三个人在房间里互动，而不仅仅是成对互动时，你可以将意见模型应用其中，并以不同方式推广它。

So you can put an opinion model when you allow interactions with three people in a room, not just pairwise or whatnot, and you can generalize it in different ways.
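
As a rough illustration of putting an opinion model on group interactions, here is a minimal sketch of one bounded-confidence-style sweep over a hypergraph. It is one simple choice of generalization, assumed for illustration only: a group averages its opinions when they all lie within a confidence bound of each other. It does not attempt to reproduce the update rule from the paper discussed here, and the node names, opinion values, and `bound` are hypothetical.

```python
def bounded_confidence_sweep(opinions, hyperedges, bound):
    """Visit each hyperedge once, in order; a group whose opinions all lie
    within `bound` of each other compromises on the group mean."""
    opinions = dict(opinions)  # work on a copy, leave the caller's state alone
    for group in hyperedges:
        values = [opinions[node] for node in group]
        if max(values) - min(values) <= bound:
            mean = sum(values) / len(values)
            for node in group:
                opinions[node] = mean
    return opinions

# A hypergraph as a list of node groups: one three-person room and one pair.
hyperedges = [("a", "b", "c"), ("c", "d")]
start = {"a": 0.0, "b": 0.25, "c": 0.5, "d": 1.0}

after = bounded_confidence_sweep(start, hyperedges, bound=0.5)
print(after)
```

With these toy numbers the three-person group is close enough to compromise, while the pair ("c", "d") ends up too far apart to interact. A different generalization (for example, requiring only pairwise closeness inside a group) could give a different outcome, which is the kind of modeling choice being discussed.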

Speaker 0

因此，需要研究如何实现它，如何用数学方法研究这些模型，或者在超图上进行社区检测，而不仅仅是在普通图上，或在具有多体互动的疾病传播中。

And so it's a matter of working out how to do it and how to study those mathematically, or doing community detection on hypergraphs, not just on graphs, or disease spread where you have polyadic interactions.

Speaker 0

你需要弄清楚如何推广某些概念。

And you need to work out how you generalize certain concepts.

Speaker 0

很多理论工作都集中在这一点上。

That's where a lot of the theory goes.

Speaker 0

比如说，好吧，我有一个更复杂的数学对象。

Say, Okay, I have a more complicated mathematical object.

Speaker 0

我想把我认为有用或有趣的研究内容拿出来，现在想在更复杂的数学对象上研究它们。

I want to take the things that I think are useful or interesting to study, and I now want to study them on a more complicated mathematical object.

Speaker 0

我该怎么做呢？

How am I going to do that?

Speaker 0

有多种方法可以做到这一点。

There's different ways of doing it.

Speaker 0

即使在普通图或标准网络的成对情况下可能只有一种或两种方法，但在更复杂的情况下，通常会有更多种方法。

Even in cases where there might only be one or two ways of doing it in a pairwise situation, on an ordinary graph, a standard network, there might be more ways, or usually are more ways.

Speaker 0

我常举的一个例子虽然涉及一些流行术语，但其实所有东西都免不了用流行术语。

The example that I tend to like to give involves buzzwords, but everything involves buzzwords.

Speaker 0

就像《星际迷航》中某一集那样，所有东西都在引用其他东西，因此你必须找到一个基准。

It's like this episode of Star Trek where everything is in reference to something else and so somehow you have to find the baseline.

Speaker 0

不是一路向下都是乌龟，而是一路向下都是社区。

Instead of being turtles all the way down, it's communities all the way down.

Speaker 0

但有一种方法叫做奇异值分解。

But there's something called a singular value decomposition.

Speaker 0

我不会费心解释它是什么，因为那只是另一个旁支话题。

I'm not going to bother explaining what it is because that's just another side point.

Speaker 0

但每个矩阵都有一个奇异值分解。

But every matrix has a singular value decomposition.

Speaker 0

你一定可以把一个矩阵写成具有特定性质的其他矩阵的乘积。

You are guaranteed to be able to take a matrix and write it as a product of other matrices that have certain properties.

Speaker 0

你总是可以这样做。

You are always allowed to do this.

Speaker 0

好的。

Alright.

Speaker 0

我可以将矩阵推广到更复杂的数学对象，特别是所谓的张量。

I can generalize matrices to a more complicated mathematical object, in particular something called a tensor.

Speaker 0

你可以自己去查一下。

Again, you can look it up.

Speaker 0

不是基努·里维斯主演的，至少目前还不是。

Not starring Keanu Reeves, at least not yet.

Speaker 0

那部《矩阵》的续集本来应该叫《张量》。

The sequel to The Matrix should have been called The Tensor.

Speaker 0

但不管怎样，你可以对张量进行操作。

But, anyway, you can do a tensor.

Speaker 0

对于张量，有多种奇异值分解的版本。

There are different versions of singular value decompositions for tensors.

Speaker 0

然而，你无法保留矩阵版本的所有性质。

However, you cannot keep all the properties of the matrix version.

Speaker 0

所以存在一些变体，比如假设你只能保留三个性质中的两个。

So there are variants. Say, suppose you're allowed to keep two out of three properties.

Speaker 0

好吧。

Alright.

Speaker 0

我保留性质A和B，然后推广性质C。

I keep properties a and b and I generalize c.

Speaker 0

不允许，我不能保留全部三个性质。

I'm not allowed to keep all three properties.

Speaker 0

我保留a和b，推广c。

I keep a and b, generalize c.

Speaker 0

有不同的实现方式。

Different ways of doing it.

Speaker 0

保留a和c，推广b，也有不同的实现方式。

Keep a and c, generalize b, different ways of doing it.

Speaker 0

所以，好吧，这里发生的情况并不完全如此，但核心思想是我从图推广到超图。

And so, okay, that's not the precise thing that's going on here, but the idea is that I generalize from graphs to hypergraphs.

Speaker 0

我失去了一些保证。

I lose some guarantees.

Speaker 0

我必须改变某些东西。

I have to change certain things.

Speaker 0

可能发生的状况更多了，它们在数学上具有不同的后果。

There's more possibilities of what can occur, and they have different consequences mathematically.

Speaker 0

有些方法可能更适合某些应用，而有些则更适合其他应用。

Some might be more appropriate for some applications, and some might be more appropriate for others.

Speaker 0

我们在这些理论论文上花费大量精力，主要是为了弄清楚不同的选择。

A lot of the effort that we spend time on in these theoretical papers is trying to figure out different choices.

Speaker 0

我还不知道，我认为大多数人也不知道。

And I don't yet know, and I think most others don't either.

Speaker 0

有些人可能声称他们知道，但我并不认为他们真的知道，哪些方法才是真正最好的。

Some people may claim they know which ways are actually the best ways to do that, but I don't think that they do.

Speaker 1

这实际上让我想到，我可能有点为难你了，但我还是想提一下，因为我们在最初讨论的综述文章中提到的一个例子非常有趣：在现实世界的情境中，社区检测具有切实的社会影响。

So that actually brings me to something. I might be putting you on the spot here, but I wanted to bring this up because it was a really interesting example in the survey piece we first discussed, about real-world settings in which community detection matters because it has palpable social consequences.

Speaker 1

对吧？

Right?

Speaker 1

你关于有界信任模型的研究以及你所做的报告，都强调了理解甚至缓解极化现象。

And the bounded-confidence model piece and the talk that you gave emphasized understanding or perhaps even mitigating polarization.

Speaker 1

论文开头有一张图，讲的是韦恩·扎卡里对空手道俱乐部的研究，展示了俱乐部内部的分裂过程，你可以在网络的可视化中清晰地看到，当中心无法维系、局势崩塌时，派系将如何划分。

There's this figure early on about research by Wayne Zachary on a karate club and the schism that happens in the karate club, and you can see in the visualization of that network exactly how the party lines, if you will, are gonna split when the center cannot hold, when things fall apart.

Speaker 1

我不愿在此过多停留，但我只想强调一点：不仅在数据收集方式上存在伦理考量，你在做出这些具体选择时也同样面临伦理问题。

And not to, like, linger here, but I just wanna stress and then hear you speak to the fact that there are not simply ethical considerations in the way that this data is collected, but also in precisely the kind of choices that you're making.

Speaker 1

因为如果人们试图用这些工具来预测社会结构中的分裂，那感觉就像突然间我正凝视着深渊，现实中决策者真的在运用这些东西。

Because if people are trying to use these tools to predict rifts in social organization, then I feel like suddenly I'm staring into an abyss here where, like, decision makers are actually running with this stuff in the real world.

Speaker 0

是的。

Right.

Speaker 0

情况只会变得更糟。

Well, it only gets worse from there.

Speaker 0

关于空手道俱乐部的论文，或者原始的那篇，当然就是著名的空手道俱乐部数据集。

The Karate Club paper, the original one: the Karate Club, of course, is an infamous dataset.

Speaker 0

可以说，我也算是扎卡里空手道俱乐部的一员，而且这个数据集甚至有自己的维基百科页面。

So I am a member of the Zachary Karate Club, as it were, and the dataset has its own Wikipedia page.

Speaker 0

但即使在所谓的最简单社交网络中，这也不是一个庞大的网络。

But this is even in the sort of, quote, simplest social network, and this is not a large network.

Speaker 0

共有34个节点，即34个人，如果你仔细看原始论文，会注意到一个我称之为‘笔误’的错误——某个边是否存在并不明确，因为这是一个无向网络，但在邻接矩阵中，某个位置是1，而对应的转置位置却缺失了。

There are 34 nodes, 34 individuals, and if you actually look in the original paper, you'll notice that there is, I'll use the word typo, but, like, an error, where it's not clear if a certain edge should be there or not. It's an undirected network, but then there's an adjacency matrix where there's a one in a certain entry and, in the corresponding transpose entry, it's not there.

Speaker 0

问题是：这两个位置都应该是1，还是都应该是0？

The question is should they both be ones or should they both be zeros?
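
The ambiguity described above can be checked mechanically. The following is a minimal sketch, assuming a small hypothetical 0/1 adjacency matrix (not the actual karate club data): it lists the entries that disagree with their transpose entries and shows the two possible repairs, "both ones" and "both zeros".

```python
def asymmetric_entries(adjacency):
    """Index pairs (i, j), i < j, where entry (i, j) differs from entry (j, i)."""
    n = len(adjacency)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if adjacency[i][j] != adjacency[j][i]]

def symmetrize(adjacency, keep_edge):
    """Repair mismatches: keep_edge=True sets both entries to 1, False to 0."""
    fixed = [row[:] for row in adjacency]  # copy so the input is untouched
    for i, j in asymmetric_entries(adjacency):
        fixed[i][j] = fixed[j][i] = 1 if keep_edge else 0
    return fixed

# Hypothetical 4-node matrix with one inconsistent entry: (1, 3) is 1 but (3, 1) is 0.
toy = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

mismatches = asymmetric_entries(toy)
with_edge = symmetrize(toy, keep_edge=True)
without_edge = symmetrize(toy, keep_edge=False)
print(mismatches)
```

Either repair yields a valid undirected network, but the two networks differ by one edge, so any downstream result (including detected communities) depends on a choice the data alone does not settle.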

Speaker 0

而且，这还没考虑那篇论文中某些版本的连接上还带有数值。

And, again, that's ignoring that there are some versions in that paper that have values on the connections as well.

Speaker 0

所以，即使在最简单的情况下，数据里就已经有混乱了，更不用说我们现在研究的那些更复杂的东西了。

So even in the simplest situations, you already have messiness in the data, let alone in the more complicated things that we study now.

Speaker 0

不过，那篇论文的关键在于，这种分裂其实是一种事后的回溯，对吧？

That paper though, the thing is the schism was really a retrodiction, right?

Speaker 0

就像分裂发生时研究正在进行，扎卡里事后才观察到，并说：哦，好吧，也许我本可以根据谁和谁互动来预测到这一点，对吧？

Like it occurred while the study was occurring and so then Zachary looked at it afterwards and said, Oh, okay, maybe I could have anticipated this based on who was interacting with whom, right?

Speaker 0

所以这项研究是在分裂发生之后进行的，而不是想象有人是一家公司的顾问，他们做了一项调查，而此时分裂尚未发生，他们请了一位报酬丰厚的顾问网络科学家来检测社群，现在我真的在幻想了。

So the analysis was after the fact, as opposed to this: imagine that somebody is a consultant for a company and they do a survey, and there has not been a schism yet, and they go to some consultant network scientist who's being paid a lot of money to detect communities. I'm really in fantasy land now.

Speaker 0

对吧？

Right?

Speaker 0

你靠检测社群就能发财。

It's like you're gonna get rich by detecting communities.

Speaker 0

也许有人能做到，但并不是学者们在做这件事。

I suppose somebody could, but it's not the academics who are doing that.

Speaker 0

无论如何，假设你是一名顾问，有人问你：好吧。

In any event, suppose you're some consultant and you're asked, alright.

Speaker 0

我们做了一份关于谁和谁在一起的调查。

Well, we get some survey of who's hanging out with whom.

Speaker 0

我需要担心是否存在分裂吗？

Do I have to worry about whether there is a rift?

Speaker 0

我认为这样说很合理：确实有令人担忧的理由。

I think that's reasonable to say, well, there's cause for nervousness.

Speaker 0

只要你的结果可能影响人类在现实中的行为，就存在令人担忧的理由。

Anytime your result might impact what a human actually does in real life, there's cause for nervousness.

Speaker 0

对吧？

Right?

Speaker 0

我认为还有更多更深层次的担忧。

I think there's much more nervousness than that.

Speaker 0

对吧？

Right?

Speaker 0

这就是为什么我说这仅仅是开始。

That's why I'm saying it's only the beginning.

Speaker 0

比如，如果我开始识别出社群，而我的所有朋友恰好都是已知的恐怖分子，那么人们就会怀疑我也是恐怖分子。

Like, so for instance, if I start detecting communities and all of my friends somehow are known terrorists, then people are gonna suspect that I'm a terrorist.

Speaker 0

这是一个随机的例子。

This is a random example.

Speaker 0

我应该明确说明这一点。

I should make that very clear.

Speaker 0

但人们会怀疑我是恐怖分子，不是因为我本身是已知的恐怖分子，而是因为我的所有朋友都是恐怖分子，他们都出现在同一个社群里。

But people are gonna suspect, not that I'm a known one, but that I'm a terrorist, because all my friends are terrorists who showed up in the same community.

Speaker 0

于是，突然间，有人因为被分配到同一个社群，而被某种算法方法盯上。

So all of a sudden, you have somebody, say, being targeted because they were assigned to the same community by some algorithmic method.

Speaker 0

对吧？

Right?

Speaker 0

这比我仅仅担心公司内部会不会出现分裂更让我担忧，毕竟如果我们不希望公司分裂，也许确实该做点什么来防止分裂。

Like, that worries me a lot more than just purely wondering if there's gonna be a schism in a company, and maybe we should do something to prevent a schism in the company if we don't want a schism.

Speaker 0

对吧？

Right?

Speaker 0

比如，当推断结果开始针对个人时，这比公司内部出现分裂之类的问题更让我担忧。

Like, the types of things where the result of an inference starts targeting individuals concern me a lot more than something at the level of a schism in a company.

Speaker 0

原则上，你可以使用这些方法。

And you can use, in principle, these methods.

Speaker 0

你可以这样想：假设我有一个由20个人组成的群体，他们都自称为红色，因此被标记为红色。

You can literally say, suppose I get a community that has 20 people who all have a label, and they're all colored red because their self-identified label is that these people are red.

Speaker 0

而这个群体中还有一个人没有说自己是红色，但我因为他的所有朋友都是红色，就推断他也是红色。

And there's one other person in the community who did not say that they're red, but now I think that they're red because all their friends are red.

Speaker 0

而且他确实有可能是红色的，对吧？

And there is a reasonable chance that they are, right?

Speaker 0

这些方法原则上确实可以做到这一点。

And that's something that these sorts of methods in principle can do.
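
The inference described here can be sketched in a few lines, which is part of why it is worrying. This is a minimal illustration, assuming communities have already been detected and that an unlabeled member simply receives the majority label of the labeled members; the member names, the "red" label, and the reported fraction are hypothetical, and the fraction is a vote share, not a calibrated probability.

```python
from collections import Counter

def infer_labels(communities, known_labels):
    """Guess a label for each unlabeled node from its community's majority label.

    Returns node -> (guessed label, share of community members carrying it).
    """
    inferred = {}
    for members in communities:
        votes = Counter(known_labels[m] for m in members if m in known_labels)
        if not votes:
            continue  # nothing to go on in this community
        label, count = votes.most_common(1)[0]
        for m in members:
            if m not in known_labels:
                inferred[m] = (label, count / len(members))
    return inferred

# Twenty self-identified "red" members plus one person who stated no label.
community = [f"person_{i}" for i in range(21)]
known = {f"person_{i}": "red" for i in range(20)}

guesses = infer_labels([community], known)
print(guesses)
```

The sketch labels `person_20` as red purely because of who their neighbors are. Nothing in the code knows whether the guess is true, which is the distinction between using such output in a paper and using it to act on an individual.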

Speaker 0

你可以尝试对那些你还不了解的事情做出推断。

You can try to make inferences about things you don't know yet.

Speaker 0

对吧？

Right?

Speaker 0

你有已知的标签。

You have labels that are known.

Speaker 0

你有未知的标签。

You have labels that are unknown.

Speaker 0

但你是如何使用这些标签的呢？尤其是在现实世界中使用，而不仅仅说‘这是一篇论文，颜色是红色’。

But then how are you using those labels, especially if you're using it in the real world and not just saying, well, here's a paper and it's red.

Speaker 0

这是一个很大的问题。

That's a big concern.

Speaker 0

这是一个非常严重的问题。

That's a very big concern.

Speaker 1

所以，谢谢。

So, thanks.

Speaker 1

在我们剩下的时间里，我想特别提到你与米歇尔和埃莱尼·卡蒂福里在2023年1月发表在《今日物理》上的那篇文章，内容是关于数据的拓扑结构。

In the time that we have left, I wanna get to this, because I specifically called out this piece that just came out in January 2023 in Physics Today, which you wrote with Michelle and Eleni Katifori, and this is on the topology of data.

Speaker 1

所以，让我们再次摆脱这些伦理困境。

So, again, let's dig ourselves out of the ethical conundra.

Speaker 0

所以我们现在处于2023年1月。

So we are now in January 2023.

Speaker 0

我还应该说明，这是一篇解释性文章。

So this is an expository article, I should also say.

Speaker 0

我告诉数学家们，《今日物理》就像是美国数学学会公告的物理版。

I tell mathematicians that Physics Today is the physics version of the Notices of the American Mathematical Society.

Speaker 0

我知道这对这个播客帮助不大，但《今日物理》是月刊，实际上是一本杂志，而不是期刊。

I know that doesn't really help this podcast, but Physics Today is a monthly. It's actually a magazine, not a journal.

Speaker 0

你可以把它想象成给物理学家看的《科学美国人》，对吧？

So think of it like Scientific American but for physicists, right?

Speaker 0

它可能比《科学美国人》稍微更专业一些，因为它假设读者具备广泛的物理学背景，但广泛的物理学背景并不意味着是物理教授，对吧？

So it's maybe a little bit more technical than Scientific American because it's assuming kind of a broad physics background, but broad physics background does not mean physics professor, Right?

Speaker 0

所以你可以假设，大多数阅读《今日物理》的人都拥有物理学或相关专业的本科学位，这就是它的目标读者群体。

So you can assume that most of the people who get Physics Today are people who had either a physics or a related undergraduate degree, and that's the sort of audience.

Speaker 0

对吧？

Right?

Speaker 0

所以，任何……

So anybody who...

Speaker 1

怎么，我们这是在向听众推销这本杂志吗？

What, are we selling this magazine to our listening audience here?

Speaker 1

我猜测

I suspect

Speaker 0

哦，抱歉。

Oh, sorry.

Speaker 0

是的。

Yeah.

Speaker 0

其实我不是靠这个赚钱，我只是想说明一下，抱歉。

Well, I actually make no money from that, but I'm just trying to say what I'm trying to do. Sorry.

Speaker 0

我总是忍不住陷入更深层次的细节中。

I have a tendency to go down into rabbit holes more than I should.

Speaker 0

我该做什么

What I'm to do

Speaker 1

可能是你把这说得相当吸引人了，好吧。

is is probably you're probably making it sound rather appealing to Okay.

Speaker 1

嗯，

Well,

Speaker 0

我们的文章就是为了这样的人群而写的，我只是想说明这一点。

I'm just saying that's who our article is written for.

Speaker 0

所以这篇文章的定位应该是通俗易懂的，希望我们做到了这一点。作为对比，不管怎么说，《网络中的社区》那篇文章其实发表在面向数学家的类似刊物上。

So it is meant to be an article that should be accessible, if we did our job. Just as a contrast, "Communities in Networks" actually appeared in an analogous venue for mathematicians, for whatever it's worth.

Speaker 0

总之，这篇文章发表于一月，篇幅必然很短，因为这是要求如此，而我们本来也想说说这篇文章到底讲了什么。

Anyway, so this came out in January and was necessarily short because that's what's required, and I guess I'll say what the article is.

Speaker 0

有一个数学领域叫拓扑学，传统上的例子——我们论文里也用了这个例子来铺垫——就是拓扑学家分不清咖啡杯和甜甜圈的区别。

So there's a mathematical subject called topology, and the traditional example, and we do have this example in the paper to set the stage, is that a topologist cannot tell the difference between a coffee cup and a doughnut.

Speaker 0

这就像一个常见的说法了。

This is like the standard trope.

Speaker 0

我们想象的咖啡杯有一个把手，因此存在一个洞的概念，而甜甜圈也有一个洞的概念。

And we're thinking of a coffee cup that has a handle, so there's this notion of a hole, and a doughnut has a notion of a hole.

Speaker 0

对吧？

Right?

Speaker 0

你正在看一个甜甜圈。

You're viewing a doughnut.

Speaker 0

我们不知道外面发生了什么，但这里有一个洞，而咖啡杯是一个整体，有一个洞。

We don't know what's going on on the outside, but there's a hole, and the coffee cup is one piece and one hole.

Speaker 0

甜甜圈也是一个整体，有一个洞。

And the doughnut is one piece and one hole.

Speaker 0

现实物理世界有更多复杂性，比如甜甜圈的横截面是什么样子等等，但我们忽略这些因素。

Now physical reality has more complications than that because of what a cross section of a doughnut looks like and so on, but we're ignoring stuff like that.

Speaker 0

而数据本质上是离散的，对吧。

And then data is discrete, right, in its very nature.

Speaker 0

对吧？

Right?

Speaker 0

我们把它输入到计算机中。

We put it into a computer.

Speaker 0

我们在对其进行离散化，于是你面前就出现了一堆点。

We're discretizing it, and you have a bunch of dots on a page.

Speaker 0

所以现在你需要想象一堆分布在页面上的点。

So you're supposed to imagine now a bunch of dots on a page.

Speaker 0

这在技术上被称为点云，你会问自己一个问题：我能否用这种视角来判断两个事物是否相同，比如它们是否具有相同数量的孔？

The technical name for this is a point cloud, and you ask yourself the question, can I use this lens of trying to say that two things are the same if they have, like, the same number of holes?

Speaker 0

在数据上，我该如何实现这一点？

How do I do that on data?

Speaker 0

有什么方法可以理解这一点？

What is there to make sense of that?

Speaker 0

总的来说，这篇论文就是关于这个的，然后它特别使用了具有物理性质的示例。

That's broadly speaking what the paper is about, and then it specifically uses examples that have a physics nature.

Speaker 0

我们选择了其中一些具体的例子。

We chose specific ones.

Speaker 0

我们实际上在研究软物质物理等领域的内容，但那些具有物理性质的东西还包括宝可梦，不过宝可梦更多是在引言部分提到，并不属于真正的软物质。

We were really doing things from soft matter physics and so on, so stuff that has a physics nature, and also Pokemon. That was more in the introduction, not really soft matter.

Speaker 0

不过我想，胖丁大概是很柔软的。

Though I guess I imagine Jigglypuff as being fairly squishy.

Speaker 0

我不确定这一点是否已被确认。

I don't know if that's been established.

Speaker 0

但好吧。

But okay.

Speaker 0

所以你有一个点云，我想谈谈如何观察这个点云的形状，也就是它的拓扑结构。

So you have a point cloud, and I wanna talk about how I can look at that shape of that point cloud, that topology of that point cloud.

Speaker 0

因此，你可以想象自己在做这件事，这是一个非常视觉化的话题，如果不展示一些内容，很难讲清楚。

And so what you imagine doing, this is a very visual subject, it's hard to do this without showing you stuff.

Speaker 0

但你可以去看论文，并想象自己在做这件事。

But you can look at the paper and you can imagine doing that.

Speaker 0

我们来玩一个连点成画的游戏。

Let's take a connect the dots type game.

Speaker 0

明白吗？

Okay?

Speaker 0

所以你有一堆点。

So you have a bunch of dots.

Speaker 0

现在，我不用线连接它们，而是想象眯起眼睛，让这些点变得稍微模糊一些。

Now instead of connecting them with lines, I'm going to imagine squinting and making these dots slightly blurrier.

Speaker 0

所以，如果两个点足够接近，当我给每个点想象一个圆盘时，它们最终会重叠。

So two dots that are close enough together, if I imagine that I'm putting a disc around each dot, they eventually overlap.

Speaker 0

于是我眯起眼睛，它们就重叠了。

So I squint and they overlap.

Speaker 0

所以想象一下，我有足够的点来实现这一点。

So imagine that I have enough dots to do that.

Speaker 0

人类非常擅长看着这些点，说：嘿，那是一个宝可梦。

Human beings are really good at looking at this set of dots and saying, hey, that's a Pokemon.

Speaker 0

这不仅仅是一堆随机的点。

It's not just a random collection of dots.

Speaker 0

那里确实存在某种形状。

There is some shape that's there.

Speaker 0

但你希望有算法能做到这一点，让计算机也能做到，并且在你还不知道答案的情况下也能实现。

But you wanna get algorithms that do that, and you want computers to do that, and you wanna do that in cases where you don't already know the answer.

Speaker 0

因为如果你只在已经知道答案的情况下才这么做，那你可以用更简单的方法。

Because if you only did this when you already know the answer, you could just do something simpler.

Speaker 0

对吧？

Right?

Speaker 0

所以你要构建的方法，即使在不知道答案的情况下也能做到这一点。

So you wanna build methods that can do this even when you don't know the answer.

Speaker 0

因此，胖丁的例子展示了我们正在逐渐把这些点变得越来越大。

So the Jigglypuff example is showing that we're gradually making these dots bigger and bigger.

Speaker 0

首先，你会开始看到一些可识别的特征，比如胖丁的眼睛。

And first, you start getting identifiable things like Jigglypuff's eyes.

Speaker 0

所以，这是某种二维生物的眼睛，在这种情况下，因为它画在纸上。

So the eyes of some, well, two dimensional creature in this case because it's on a piece of paper.

Speaker 0

然后你最终会形成这些点。

Then you eventually make the dots.

Speaker 0

首先，特征会出现。

First, features appear.

Speaker 0

你可以想象，我正在改变这些点的大小，特征就会首先出现。

You imagine I am varying the size of these dots and features first appear.

Speaker 0

接着，点变得更大，眼睛逐渐填满，然后又消失了。

Then eventually the dots become larger and the eyes fill in and then they disappear.

Speaker 0

你会根据这些点的大小来追踪特征何时出现（即被‘诞生’），以及特征何时消失。

You track, as a function of the size of these dots, when features appear (they are said to be "born") and when features disappear.

Speaker 0

它们被称为‘消亡’。

They're said to "die."

Speaker 0

我知道你看不到我的空气引号，但无论如何，你可以想象此刻正伴随着口头的空气引号。

I know you can't see my air quotes, but, anyway, you can imagine verbal air quotes are occurring right now.

Speaker 0

因此，这给你提供了一个特征的持续时间。

And so this gives you a length of a feature.

Speaker 0

人们使用的术语是持久性。

The term that people use is persistence.
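
The birth-and-death bookkeeping can be sketched for the simplest kind of feature, connected pieces. This is a minimal illustration under stated assumptions: every point starts as its own piece at disc radius 0, two discs of radius r overlap once r reaches half the distance between their centers, and a piece "dies" when it merges into another. It tracks pieces only; holes such as the eyes in the Jigglypuff example need the fuller machinery of persistent homology, which this sketch does not implement. The point coordinates are hypothetical.

```python
from itertools import combinations
from math import dist

def component_deaths(points):
    """Disc radii at which connected pieces merge as the discs grow.

    All pieces are born at radius 0; with n points there are n - 1 deaths,
    and one piece survives forever.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the representative of i's piece.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Overlap radius for every pair of points, smallest first.
    pairs = sorted((dist(points[i], points[j]) / 2, i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for radius, i, j in pairs:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:       # two separate pieces just touched
            parent[root_i] = root_j
            deaths.append(radius)  # one of them dies at this radius
    return deaths

# Two tight clusters far apart: three points near the origin, two near x = 10.
cloud = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
deaths = component_deaths(cloud)
print(deaths)
```

Short-lived pieces die at small radii, while the gap between the two clusters survives until a much larger radius. That long interval is the "persistent" feature, and it is visible without knowing in advance that the cloud has two clusters.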

Speaker 0

对吧？

Right?

Speaker 0

所以这是在使用一个英语术语。

So it's using an English term.

Speaker 0

这实际上是数学术语成为精确版本的少数情况之一。

It's actually one of the times where the mathematical term is a precise version.

Speaker 0

它提供了正确的直觉。

It gives the right intuition.

Speaker 0

你有这些用于测量持久性的算法工具。

You have these algorithmic tools that measure persistence.

Speaker 0

对于固定的点大小，我可以使用拓扑学思想。

For a fixed dot size, I can use topological ideas.

Speaker 0

我固定点的大小，然后可以使用拓扑学思想。

I fix the dot size and I can use topological ideas.

Speaker 0

在处理数据的形状和拓扑时，你希望关注那些在一系列连续的点尺寸或你可能变化的尺寸范围内持续存在的特征。

What you try to do for the shape of data, for the topology of data, is you want to look at features that are there for a large set of contiguous dot sizes or contiguous sizes that you might be varying.
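
The birth-and-death bookkeeping described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any paper mentioned in this conversation: it builds a Vietoris-Rips filtration (a common stand-in for growing discs), uses the pairwise-distance threshold as the "dot size" knob, and reduces the boundary matrix over Z/2 in the standard way. The function name `rips_persistence` and the sample points are hypothetical, and features that never die are not reported.

```python
from itertools import combinations
from math import dist


def rips_persistence(points, max_dim=1):
    """Return (dimension, birth, death) triples for a Vietoris-Rips filtration.

    The scale is the pairwise-distance threshold: a simplex enters the
    filtration at the length of its longest edge (vertices enter at 0).
    """
    n = len(points)
    simplices = []
    # Simplices with k vertices have dimension k - 1; go one dimension above
    # max_dim so that features of dimension max_dim can be killed.
    for k in range(1, max_dim + 3):
        for s in combinations(range(n), k):
            value = max(
                (dist(points[a], points[b]) for a, b in combinations(s, 2)),
                default=0.0,
            )
            simplices.append((value, k - 1, s))
    simplices.sort()  # faces always come before the simplices that contain them
    index = {s: i for i, (_, _, s) in enumerate(simplices)}

    pivots = {}      # largest boundary index -> column that owns it
    columns = []     # reduced boundary columns, stored as sets (Z/2 coefficients)
    intervals = []
    for j, (value, dim, s) in enumerate(simplices):
        col = {index[f] for f in combinations(s, dim)} if dim > 0 else set()
        while col and max(col) in pivots:
            col ^= columns[pivots[max(col)]]   # add an earlier column mod 2
        columns.append(col)
        if col:
            i = max(col)
            pivots[i] = j
            birth, birth_dim, _ = simplices[i]
            # Keep only features that live for a positive range of scales.
            if birth_dim <= max_dim and value > birth:
                intervals.append((birth_dim, birth, value))
    return intervals


# Hypothetical example: four points on the corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for dim, birth, death in rips_persistence(square):
    print(f"dimension {dim}: born at {birth:.3f}, dies at {death:.3f}")
```

For the unit square, the sketch should report one hole (dimension 1) that is born at scale 1, when the four sides connect, and dies at the length of the diagonal, when the square fills in. The difference between those two values is the feature's persistence.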

Speaker 0

点尺寸是你可能变化的一个因素，但如果我研究的是其他东西，这时这篇说明性论文的更多内容就派上用场了，你变化的可能是别的东西。

A dot size is one thing you could vary, but if I'm studying something else, and this is where more of this expository paper comes in, it might be something else that you're varying.

Speaker 0

理想情况下，这应该受到你系统物理特性的启发，比如本文的情况，但也可能受到某种领域知识的启发。

It might be ideally something that is informed by well, in the case of this article, the physics of your system, but it could be informed by some domain knowledge of some type.

Speaker 0

对吧？

Right?

Speaker 0

因此，你以一种受领域知识启发的方式进行这种数学构建。

So you do this mathematical construction in a way that is informed by domain knowledge.

Speaker 0

我和我的合作者也在这方面发表过若干研究论文。

And there are various research papers that I and my co authors have also done on this.

Speaker 0

这就是我们采取的视角，我们已经做了大量工作，比如处理各种类型的地理空间数据。

That's the perspective that we've taken. We've done a bunch of work with different types of geospatial data, for instance.

Speaker 1

我想在这里稍作停顿，因为今天早上我刚参加了一个关于人工智能与理解的对话，参与者是我们SFI行动网络的几位成员。

I'd like to hairpin out here for just a moment because I just sat in on a conversation with several of our people for the SFI Action Network this morning on AI and understanding.

Speaker 1

我们的一位研究员 Arseny Moskvichev 指出，他认为大型语言模型的未来将是经过精心挑选的定向数据集，而不是这种大规模的抓取方式。

And one of our fellows, Arseny Moskvichev, made the point that he believes that the future of large language models is gonna be carefully curated, targeted datasets rather than this enormous, like, mass capture approach.

Speaker 1

这一点很好。

And that Okay.

Speaker 1

如果我们真的想与这些类似ChatGPT的系统协同工作，那么我们接下来要做的，听起来就像你所说的那样，我想确认一下你的观点。

That if we really want to work symbiotically with these ChatGPT-style systems, what we're going to start doing is... it sounds like this is what you're saying, and I just wanted to check this with you.

Speaker 1

你确实需要引入一种筛选方法。

You do need to bring in a curatorial approach.

Speaker 1

作为人类，你正在为数据的筛选提供领域专业知识，对吧。

You as the human being are lending domain specific knowledge Right.

Speaker 1

这些系统实际上正在处理的数据。

To the curation of the data that these systems are actually

Speaker 0

正在处理。

chewing through.

Speaker 0

对。

Right.

Speaker 0

所以我认为这两者之间是有重叠的。

So I think there is overlap there.

Speaker 0

我不确定它们是否完全相同。

I don't know if they're exactly the same.

Speaker 0

我没看到那场对话，但我认为‘筛选’这个词至少是一个还算恰当的用词。

I didn't see the conversation, but I think curation is at least a reasonably apt word to use.

Speaker 0

本质上来说，你知道，人们在研究中做的各种事情都可能让我吐槽。

Basically, you know, there are various things that people do in research that can make me rant.

Speaker 0

但拓扑工具在数学上具有一种抽象性。

But so the topological tools mathematically have a certain element of abstraction.

Speaker 0

而在过去几年里，人们正在让这些工具的软件变得越来越易用。

And right now, in the past few years, they're in the process of people making more and more accessible software and so on.

Speaker 0

我刚才提到，我们希望特征更长一些，因为那些我们认为更有用，但这取决于你调节的参数。

I was mentioning that, okay, we want features that are longer because those are the ones that we think are useful, but that depends on the knob that you're tuning.
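
The dependence on the knob can be made concrete with a toy calculation. The sketch below is only an illustration: the two features, their birth and death values, and the choice of a logarithmic rescaling are all hypothetical assumptions, and `ranked_by_length` is a made-up helper rather than part of any package. It shows that the feature that looks "longest" in one parameterization need not be the longest in another, equally monotone one.

```python
import math


def ranked_by_length(intervals, knob=lambda x: x):
    """Sort (name, birth, death) features from longest to shortest after
    re-expressing birth and death through the monotone function `knob`."""
    return sorted(intervals, key=lambda f: knob(f[2]) - knob(f[1]), reverse=True)


# Hypothetical features, measured in dot size: (name, birth, death).
features = [
    ("small_scale_loop", 1.0, 2.0),    # length 1 in dot size
    ("large_scale_loop", 10.0, 12.0),  # length 2 in dot size
]

# Ranking with dot size itself as the knob.
linear_winner = ranked_by_length(features)[0][0]
# Ranking with the logarithm of dot size as the knob.
log_winner = ranked_by_length(features, knob=math.log)[0][0]

print("longest in dot size:     ", linear_winner)
print("longest in log(dot size):", log_winner)
```

Which parameterization is appropriate is a modeling decision that the code cannot make. That is where domain knowledge comes in.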

Speaker 0

我们所考虑的这个参数，比如点的大小，必须是一个合适的调节项，因为如果选错了，你只是简单地说‘更长的特征更好’，那就会有大量现成的方法和代码，有人可能会说：‘我就用这个包，它说这个特征更长，那这就是我的算法输出，这些就是正确的特征。’

The knob that we're tuning, which here was the size of the dot, needs to be an appropriate knob. If it's not, and you just say, "Oh, a longer feature is better," then with the bunch of methods and code that are available out there, somebody says, "Well, I'm just going to use this package, and it said that this feature is longer, so here's the output of my algorithm, and I'm going to say these are the features that are right."

Speaker 0

这就像是，好吧，让事情更易用确实很棒，我非常支持这一点，但随之而来的也有相反的风险：人们不会明智地使用它，只是简单地认为‘越长越好’，于是我就把这东西丢进机器里，然后直接拿输出结果当答案。

It's just like, okay, it's great to make things more accessible, and I'm strongly in favor of that, but then there's also the reverse risk that comes with it: people don't use it intelligently and are just like, "Oh, longer is better, so I'm just going to put this in the machine, and here's the output of my thing."

Speaker 0

比如波特写了篇论文说‘越长越好’，因此这就对了。

Well, Porter wrote this paper and said that longer is better and so therefore this is right.

Speaker 0

但其实并不是这样。

And it's like, well, no.

Speaker 0

当然，这种事情在其他领域也发生过，对吧？

And of course, this has happened with other things too, right?

Speaker 0

这在社区检测和其他各种工具上确实都发生过：当一些远离开发过程、缺乏对潜在问题深入思考的人使用这些工具时，他们会说：‘我可以直接信任这个在线发布的、易于使用的工具的输出结果，只要把它套用在我最喜欢的数据集上就行，根本不用去思考我研究的问题本身。’我认为这绝对是个问题。我希望工具能易用，也希望人们使用它们，但我更希望人们能明智地使用，认真思考自己研究的问题，确保得出的结果是合理的。

It certainly has happened with community detection, and it has happened with all these other tools. By the time somebody who is removed from a lot of the development, and from thinking about what can go wrong, is using it, they're saying, "Well, I can just trust the output of this thing that has been put online and is easy for me to use, just applying it to whatever my favorite dataset is, without actually thinking about the problem I'm studying." I think that's definitely the problem. I want tools to be accessible and I want people to use them, but I want people to try to use them intelligently: think about the problem you're studying and make sure that the answer makes sense.

Speaker 0

而解决问题的方法绝不是让工具变得难以使用，或者阻止人们使用，因为我觉得这也是个错误，关键在于领域知识非常重要。

And the answer is not to make tools hard to use or for people not to use them, because I think that's a mistake as well. It's just that domain knowledge matters a lot.

Speaker 0

领域知识真的非常重要，而且常常存在一种危险：认为‘现在有个工具用起来很简单’。

It really matters a lot, and and there's often a danger of saying, well, here's now a tool that's easy to use.

Speaker 0

于是就盲目地把它应用到所有事情上。

Let's apply it to everything indiscriminately.

Speaker 1

这确实是我们在科学传播中每天都会遇到的一个基本问题。

That's certainly a kind of a base issue that we run up against on a daily basis in science communication.

Speaker 1

对吧？

Right?

Speaker 1

数学形式化的精确性与语言类比的易懂性之间存在矛盾。

There's the the specificity of a mathematical formalization versus the accessibility of a verbal analogy.

Speaker 0

而且人们不喜欢，人类就是不喜欢，他们就是想知道，答案是肯定还是否定？

Well, and humans just don't like that. They want to know: is the answer yes or no?

Speaker 0

嗯，也许是吧。

It's like, well, maybe.

Speaker 0

像我这样理解这些问题的人，会欣赏其中的细微差别。

Like, I and someone who understands the issues is going to appreciate the nuance.

Speaker 0

但如果你解释得太多细节，整体图景就会被淹没。

But if you explain too much of the nuance, then the big picture gets lost.

Speaker 0

但如果你忽视了，就会有人问：为什么你们还没治好癌症？

But if you ignore the nuance, it's like, "Oh, why haven't you cured cancer already?"

Speaker 0

嗯，就是这样，抱歉。

Well, it's like, well, sorry.

Speaker 0

哦，你的理论变了。

It's like, oh, your theory changed.

Speaker 0

这些科学家到底在干什么？

What the hell are these scientists doing?

Speaker 0

其实，我们一开始就没打算用绝对的术语来告诉你这些。

It's like, well, we were never trying to tell you things in absolute terms in the first place.

Speaker 0

我理解科学传播有多难。

I appreciate how hard scientific communication is.

Speaker 0

我总是想表达细节，这就是为什么我的论文里有这么多括号注释。

I always want to speak in nuances, and I don't like to this is why my papers have so many parenthetical comments.

Speaker 0

就连对科学家来说，我也不想假装这些细节并不存在。

It's just like, well, even for scientists, I don't want to pretend that the nuances are not there.

Speaker 0

但如果加得太多，就会影响阅读的流畅性。

And then if you put in enough of them, there's a cost to how easy things are to read sometimes.

Speaker 0

因此，即使在与专家交流时，也存在一种平衡，当你与其他人群交流时更是如此。

And so there's a balance even when you're talking to to an audience of experts, there's really a balance when you're talking to others.

Speaker 0

所以，我的意思是，我非常喜欢这些工具，喜欢研究它们，我认为这正是科学的意义所在。

So I mean, it's I love these tools, I love working on them, and I think science means that.

Speaker 0

好的。

Okay.

Speaker 0

我不是科学传播者，但偶尔也参与过一些文章的撰写，我也很喜欢这种工作。

I'm not a science communicator, but I've occasionally dabbled in some of these articles and I love that too.

Speaker 0

但当人们希望我抛开所有细微差别来说话时，我真的——我在一开始就说过了，我讨厌把问题掩盖起来。

But when people want me to say something without nuance, and I said this at the beginning, I hate things being swept under the rug.

Speaker 0

我真的很反感这样做，因为那样的话，别人就会说，哦，你可以做x。

I really I just there's a fundamental dislike that I have of doing that because then someone's like, oh, you can do x.

Speaker 0

然后过一段时间，你发现其中一个细微差别其实非常重要，结果发现并不是x。

And then suddenly, later, you figure out that one of those nuances turned out to really be important, and it's not x.

Speaker 0

而是变成了x'。

It's now x prime.

Speaker 0

他们就会说：你怎么敢？

And they're like, how dare you?

Speaker 0

为什么现在变成了 x'？

How come it's now x prime?

Speaker 0

其实，从绝对意义上讲，它从来就不是 x。

It's like, well, it was never x in absolute terms in the first place.

Speaker 1

所以我认为，这为我们提供了一个机会，可以重新回到数据拓扑的明确讨论，因为是的。

So this, I think, actually gives us an affordance to double back into the explicit discussion of the topology of data, because... (Yeah.)

Speaker 1

还有你的其他工作，我们甚至还没谈到，但我稍后会链接一篇相关文章。

And also your other work, we didn't even get to this, but I will link to a piece.

Speaker 1

你曾与 Dani Bassett 共同参与了一个大团队的研究，合著了，对。

You were on a big team with Dani Bassett, coauthoring... (Right.)

Speaker 1

《学习过程中人脑网络的动态重组》。

The dynamic reconfiguration of human brain networks during learning.

Speaker 1

这里还有另一个关于时空尺度和层次组织的绝佳例子。

And here's, like, another here's another great example of spatiotemporal scales and hierarchical organization.

Speaker 0

这是真实的数据，而且是大脑数据，所以非常复杂。

It's real data, and it's brain data, so it's really complicated.

Speaker 1

对。

Right.

Speaker 1

但我觉得，也许这里并不是适合讨论这个问题的地方。

It's like though we think about maybe this is not the place to live this.

Speaker 1

但从集体学习的角度来看，作为一种过程的输出，需要在专家和非专家之间进行沟通，而这并不是单维的。

But in in terms of, like, collective learning as a an output of process in which things have to be communicated between experts and nonexperts, and that's not a one dimensional thing.

Speaker 1

对吧？

Right?

Speaker 1

我们上一期和 Paul Smaldino 以及 C. Thi Nguyen 讨论过，专家识别是一个高度多维的问题，因为专家遍布于越来越多的领域。

We talked about that in the last episode with Paul Smaldino and C. Thi Nguyen: expert identification is a highly dimensional problem, with experts in this proliferating myriad of domains.

Speaker 1

总之，我想说的是，作为一个用过Adobe Illustrator或其他矢量绘图程序的人，我看到了这些理论问题，比如Jigglypuff每个点周围的圆盘该有多宽。

So at any rate, what I'm getting at here is that, as somebody who's used Adobe Illustrator or some other kind of vector illustration program, I see these theoretical questions, like this question of how wide the disc around each dot of Jigglypuff is going to be.

Speaker 1

我对于这类考量的后果有一种直觉，特别是当你尝试给这个东西填充颜色的时候。

I have an intuition around the consequences of these kinds of considerations, based on what happens if you try to put, like, a color fill into this thing.

Speaker 1

如果它是一个小点，那么颜色就会从你概念上视为一个物体的区域中渗出。

And if it's a tiny dot, then the color bleeds out of this thing that you conceptually consider an object.

Speaker 1

对。

Right.

Speaker 1

所以无论如何，作为应用数学家，我们在这里要说明的是，理论与实践之间存在联系，以及在何种层次上适合作出解析和研究。

So at any rate, I guess what we're laying out here is that, for an applied mathematician, there is this connection between, again, theory and practice, and the levels at which it is appropriate to resolve and investigate something.

Speaker 1

这具有重大影响。

And this has big implications.

Speaker 1

你谈到你和 Michelle Feng 用这些方法来研究城市中的街道网络、雪花和蜘蛛网。

So you talk about how you and Feng use this stuff to study street networks in cities, snowflakes, spider webs.

Speaker 1

特别提到杰弗里·韦斯特和路易斯·贝滕库尔，他们研究过血管网络的拓扑结构，对。

Callout to Geoffrey West and Luís Bettencourt, who've looked at vascular network topologies and... (Right.)

Speaker 0

所以是城市研究。

So urban research.

Speaker 0

是的。

Yeah.

Speaker 0

有很多人已经做过这项工作，实际上我们自己没做，但其他人已经用这些工具研究过不同形式的血管网络。

Actually, we haven't done this, but there are a number of other people who've used these tools for vascular networks of different forms.

Speaker 0

已经有不少相关研究了。

There's been a bunch of work on that.

Speaker 0

所以实际上，是的。

So actually yeah.

Speaker 0

我这篇论文的合著者埃莱尼·卡蒂福里，以及她的一些研究论文——虽然我没有参与这些论文——就是使用这些想法研究血管网络的人之一。

So my coauthor on this paper, Eleni Katifori, is one of the people who's used these ideas for vascular networks, in some of her research papers. I'm not involved in those papers.

Speaker 0

是的。

Yeah.

Speaker 0

所以这与杰夫和路易斯等人所采用的方法不同。

So it's a different approach from what people like Geoff and Luís have done.

Speaker 0

对吧？

Right?

Speaker 0

也就是说，这是不同的工具，研究的问题也不一样。

Like, it's a different set of tools, and they're asking different questions.

Speaker 0

这些系统在某种程度上有相似性，但它们对系统提出的问题却截然不同。

Somehow the systems have similarity, but they're asking really different questions about the system.

Speaker 1

是的。

Yeah.

Speaker 1

你能再稍微澄清一下吗？

So could you clarify that just a bit?

Speaker 1

因为，没错，我只是想把这个问题想清楚。

Because, yeah, I just I wanna get my head straight on this.

Speaker 0

对。

Right.

Speaker 0

所以，从拓扑学的角度来看，人们可能会提出这样的问题：首先，好吧。

So the type of questions that one might do for this from the sort of topological lens is, okay, you first say, okay.

Speaker 0

我有一个数据集，我可以直接问：它里面有没有空洞之类的？

I have a dataset, and I can just ask, do I have holes in it and so on?

Speaker 0

然后你会想：在这个特定的应用背景下，‘空洞’到底意味着什么？

Then you say, well, what does a hole even mean in the context of this particular application?

Speaker 0

当我们研究街道网络时，这是一篇论文中的一个例子，该论文包含了一系列示例。

When we were doing the street networks, this was an example in a paper that was a sort of set of examples.

Speaker 0

这是另一篇论文，我们之前已经发表过一篇理论性论文，但由于应用数学的出版速度比物理学慢，那篇前期论文的实际发表日期反而更晚。

This is another one of those papers where we had a prior theory paper, although because applied math publishing is slower than physics publishing, the prior paper actually has a later publication date.

Speaker 0

但不管怎样，有一篇论文介绍了相关方法，然后我们想，既然我们已经投入精力去实现这些方法，

But anyway, there was a paper that has methods and then we're like, well, we spent effort trying to do these methods.

Speaker 0

不如再写一篇面向不同读者群体的论文，包含其他多种应用，包括街道和蜘蛛网。

Let's also have a paper that we write for a different audience that has a bunch of other applications, including the streets and the spider webs.

Speaker 0

其中，街道应用（这项工作由Michelle Feng和我共同完成）是那篇论文中最严肃的一项。

The street application (and the authors of this are Michelle Feng and me) was the most serious one in that particular paper.

Speaker 0

我们当时关注的并不是一个点逐渐变大，

What we were looking at was we did not have a dot getting bigger.

Speaker 0

而是以一种不同于那种方式来定义我们的参数和调整对象。

We actually defined our parameter, our thing that we tune in a different way than that.

Speaker 0

我还没有解释过我们是如何定义这个参数的具体方式。

There was a particular way that I have not explained in how we defined it.

Speaker 0

这样做的目的是要说明：如果我取城市街道网络的一小部分，我得到的是像洛杉矶那样网格化且单调的结构，还是更有趣的结构？

And what this is meant to do is to say locally, if I take a small part of the street network of a city, am I getting something that looks grid like and boring like LA, or am I getting something more interesting?

Speaker 0

洛杉矶是这个数据集中最单调的例子之一，因为它的街道结构非常规整。

And LA is one of the most boring examples in that dataset because of its street structure.

Speaker 0

还是我得到的是像伦敦或欧洲大陆城市那样的结构，它们有死胡同和其他各种复杂特征？

Or am I getting something like London or a city in Continental Europe that has dead ends and other various things?

Speaker 0

比如，我认为我们在论文中展示过一张图片，是巴塞罗那之类的城市，那里有各种不同的街道布局。

Like, one of the ones that I think we show a picture of in the paper is, like, Barcelona or something that has a bunch of different things go going on.

Speaker 0

因此，我们关注的是这类局部结构。

And so it's getting at a local structure of things along those lines.

Speaker 0

还有其他人尝试用不同的方法研究局部结构。

There are other people who've tried to get at local structures with different methods.

Speaker 0

我之前提到过的 Marc Barthelemy 就是其中一位研究者。

Marc Barthelemy, who I've mentioned before, is one of the people who's done that.

Speaker 0

事实上，在我们的论文中，由于他们研究的是一个相似的科学问题，只是采用了不同的数学方法，我们甚至在一个小节中直接与他们的论文进行了对比。

In fact, because they were really looking at a similar question scientifically, just with a different mathematical approach, our paper even has a sort of subsection with a direct comparison to one of their papers.

Speaker 0

而路易斯和杰夫所做的则更偏向宏观层面。

Whereas what Luís and Geoff are doing is much more at the macro scale.

Speaker 0

他们在探讨某些现象如何随规模变化。

They're asking questions about how certain phenomena might, say, scale with size.

Speaker 0

对吧？

Right?

Speaker 0

因此，即使系统是来自城市的街道网络，我们所提出的科学问题却处于不同的尺度，且是完全不同的问题。

So even though the system may be a street network from a city, the question that we're asking scientifically is at a different scale and a really very different question about it.

Speaker 0

所以，两者之间的重叠之处仅仅在于，好吧。

So where there is overlap is really just, okay.

Speaker 0

我们在探讨一个城市的问题。

We're asking a question about a city.

Speaker 0

但除此之外，我们所提出的问题就完全不同了。

But then beyond that, everything else is just really different, the questions that we're asking.

Speaker 1

是的。

Yeah.

Speaker 1

我想我之所以这么想，是因为特别想到了路易斯关于城市结构中网络干预的研究，比如在巴西贫民窟等地实际增加类似血管的通路之类的举措。

I guess I was thinking specifically of Luís's work on network interventions in city structure, where they're actually adding vascularization into, like, Brazilian favelas and that kind of thing.

Speaker 0

对。

Right.

Speaker 0

对。

Right.

Speaker 0

因此，这正是可以用这种方法来研究的问题，比如你可以问：这样做会带来什么影响？

So that would be something that one could look at with this approach where you could ask, does doing that okay.

Speaker 0

我们还没有讨论过如何实际测量拓扑摘要的各种方法。

So there are various ways, which we have not discussed, of how you would actually, like, measure the topological summary.

Speaker 0

比如，有一些特定的统计指标是你需要关注的。

So, like, there's certain types of summary statistics of certain things that you would look at.

Speaker 0

因此，今天我们把它当作一个黑箱来处理。

And so we're doing this as a black box for the purpose of today.

Speaker 0

你可以探讨这些拓扑摘要统计量如何因干预而发生变化。

You could ask how those summary statistics, those summary topological statistics, change based on intervention.
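
To make "how the summary statistics change based on intervention" a little less of a black box, here is a small hedged sketch. The particular summaries (feature count, total persistence, longest feature) are common choices but are an assumption here, not necessarily the ones used in the papers discussed. The function names and the before/after numbers are hypothetical and are not data from any study.

```python
def persistence_summary(intervals, min_length=0.0):
    """Summarize (birth, death) intervals, ignoring features no longer than min_length."""
    lengths = [death - birth for birth, death in intervals if death - birth > min_length]
    return {
        "count": len(lengths),                  # how many features survive the cutoff
        "total": sum(lengths),                  # total persistence
        "longest": max(lengths, default=0.0),   # most persistent feature
    }


def summary_change(before, after, min_length=0.0):
    """Difference (after minus before) in each summary statistic."""
    s_before = persistence_summary(before, min_length)
    s_after = persistence_summary(after, min_length)
    return {key: s_after[key] - s_before[key] for key in s_before}


# Hypothetical intervals for one neighborhood, before and after adding a street.
before = [(1.0, 3.0), (2.0, 2.5), (1.0, 5.0)]
after = [(1.0, 2.0), (1.0, 5.0)]

print(summary_change(before, after))
```

Whether a change in any of these numbers is good, bad, or meaningless is not something the calculation can say. It depends on what the knob and the features mean for the system being studied.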

Speaker 0

而且即使使用我们自己的代码，你也可以做到这一点。

And actually, even using our code, you could do that.

Speaker 0

对吧？

Right?

Speaker 0

你可以提出这个问题，去研究这个问题。

Like, you you could ask this question and look at this question.

Speaker 0

我们还没有做过。

We have not done that.

Speaker 0

如果你做了，你可能会问，这个拓扑摘要是否能提示某种干预措施？

And then if you did that, you would say, well, you could ask, is the topological summary something that might suggest an intervention?

Speaker 0

对吧？

Right?

Speaker 0

因此，人们可以设想这种做法，即这些测量结果是否有助于实现这一点。

So one could imagine that type of approach asking if these measurements can help with that.

Speaker 0

这不是我们做过的事情。

It's not something we've done.

Speaker 0

或者在另一个血管网络中，对吧？

Or in another vascular network, right?

Speaker 0

街道其实并不是血管网络，但你可以想象一下，你讨论的是营养物质的流动之类的，对吧？

Streets are not really a vascular network, but imagine you're talking about the flow of nutrients or something, right?

Speaker 0

以真菌数据上的实验为例。我知道这些方法，或者至少它们的一些变体，已经被用于真菌数据了。牛津的马克·弗里克实验室的数据随我与马克·弗里克和 Sang Hoon Lee 合写的一篇论文一同发布；而在另一篇论文中，Dani Bassett、埃莱尼·卡蒂福里以及其他一些研究者将这些数据与拓扑方法结合使用。所以这里涉及不同的论文和合作者。

Take experiments on fungal data. I know that these approaches, or at least some variants of them, have been used for fungal data. There is data from Mark Fricker's lab at Oxford that was released along with a paper that I wrote with Mark Fricker and Sang Hoon Lee, and in a different paper, Dani Bassett, Eleni Katifori, and various others actually used that data with topological methods. So there are different papers and collaborators involved.

Speaker 0

因此，你可以想象，这些真菌实验中，你实际上可以控制实验室的设置，可以设想不同的拓扑摘要能反映出良好的状态，或者某种你可以测量的状况。

And so one could imagine, so these fungal ones, you actually have experimental control of how you set up the lab, and you can imagine different topological summaries being indicative of good situations or of something that you would somehow measure.

Speaker 0

我不知道对真菌来说什么是‘良好的状态’，但不管怎样。

I don't know what good situations means for a fungus, but anyway.

Speaker 0

但你可以想象先这么做，然后进行干预，从而达到那种状态。

But you could imagine doing that and then doing an intervention that would get you there.

Speaker 0

这并不是我见过的一篇论文。

This is not a paper I've seen.

Speaker 0

对吧？

Right?

Speaker 0

我没见过有人做过这样的事，但我至少能想象这是一个有人会去探索的项目，这是一种可能性。

Like, I've not seen someone do that, but I can at least imagine this as a project that somebody would explore, and it's a possibility.

Speaker 1

在试图整合所有这些内容时，我想问你一个可能听起来有点疯狂的问题。

In trying to synthesize all of this, I wanna ask you what is honestly probably kind of an insane question.

Speaker 1

好的。

Okay.

Speaker 1

因为这在我看来，并不能仅仅通过拓扑分析来解决。

Because this does not strike me as something that you can get at solely through the application of a kind of topological analysis.

Speaker 1

但在这个过程中，你提到洛杉矶有着非常单调的网格状结构，而伦敦则有着错综复杂、很多死胡同的结构。

But in in so doing, you mentioned LA has this really boring grid like structure, and London has this weird convoluted lots of dead ends kind of thing.

Speaker 1

这篇文章的最后一页，你谈到你发现研究受精神活性物质影响的蜘蛛所织的网的拓扑结构特别有趣。

And last page of this article, you talk about how you found it particularly amusing to examine the topological structure of spider webs built by spiders under the influence of psychotropic substances.

Speaker 0

我应该说明一下这个想法的来源，也许。

I should say where this came from, maybe.

Speaker 0

我不知道。

I don't know.

Speaker 0

所以，这也是另一篇研究论文中的一个例子，我们决定在讨论这篇流行论文时加入了一点相关内容。

So that was also an example in this other research paper, we decided to include a little bit of it in the discussion of this popular paper.

Speaker 0

所以，这个例子原本是为了有趣而设计的。

So it was an example that was meant to be fun.

Speaker 0

所以这个想法背后有一段很长的科学故事。

So there's a long story behind this idea in terms of the science.

Speaker 0

有一个人叫彼得·维特，他多年前就做过这项研究，后来NASA确实发表过一篇一页纸的论文，内容是他们让蜘蛛服用药物，然后观察它们织出的不同网。

There's this guy, Peter Witt, who did this years ago, and subsequently there was a one-page paper by NASA. What NASA did is they had spiders and they gave them some drugs, and they're like, okay, what are the different webs?

Speaker 0

但他们没有说明原因。

And they don't say why.

Speaker 1

这项研究在网上可以找到。

Online for this research.

Speaker 0

是的。

Yeah.

Speaker 0

网上有一段视频。

There's a video online.

Speaker 0

有一段半严肃的视频，开头是真实的研究，然后就跑偏了。

There's a semi-serious video that starts out with a real study and then goes off from there.

Speaker 0

所以这些图片中的一些被传播开来，我记得看到一些图片时就想，这太酷了。

And so there's these some of these pictures got circulated and I remember just seeing some of the pictures and I'm like, that would be cool.

Speaker 0

我们可以从拓扑学的角度来做这个。

We could do this topologically.

Speaker 0

于是我去找我当时的学生米歇尔，对她说：嘿。

And I go to my then PhD student, Michelle, and I'm like, hey.

Speaker 0

这太令人兴奋了。

This is exciting.

Speaker 0

顺便说一句，你有个很棒的博士生。

You have a good PhD student, by the way.

Speaker 0

嘿。

Hey.

Speaker 0

你有没有兴趣把这些拓扑方法用在受药物影响的蜘蛛所织的网上面？

Do you wanna take these topological methods and apply them to the webs produced by spiders on drugs?

Speaker 0

他们说，好的。

And they said, yes.

Speaker 0

所以我觉得，这真是个好迹象，说明你有个很棒的学生，他们真的觉得去尝试这个主意不错。

And so I'm like, that's a sign that you have a really good student: that they actually think it's a good idea to go and do that.

Speaker 0

于是我们决定这么做，顺便说一下，咖啡因蜘蛛和安眠药蜘蛛织的网是最值得关注的两个，所以我来喝口咖啡。

And so we decided to do that, and by the way, the web of the caffeinated spider and the web of the sleeping-pill spider are the two to really worry about, so I'm gonna sip coffee.

Speaker 0

干杯。

Cheers.

Speaker 0

是的。

Yeah.

Speaker 0

不过，我们觉得这很有趣，因为它是个有趣的例子。

So the idea though, we were amused by it because it was a fun example on it.

Speaker 0

我们喜欢在论文中加入一些有趣的例子，而且我们当时做的是方法类论文，只是想通过一个应用来给你个概念。

We like to include some fun examples in the paper, and we were doing a paper that just said: here's application one to give you an idea.

Speaker 0

再举第二个应用，让你有个概念。

Here's application two to give you an idea.

Speaker 0

这是第三个应用。

Here's application three.

Speaker 0

在这篇论文中加入一个有趣的例子是很合适的。

It's a nice paper in which to include a sort of fun application as part of it.

Speaker 0

大家都喜欢咖啡因蜘蛛。

Everyone likes Caffeine Spider.

Speaker 0

人们觉得这个例子很有趣，所以我很高兴我们做了这个。

People get a kick out of the example, so I'm glad we did it.

Speaker 1

这正是我想问你的问题，我觉得这正是复杂系统科学一直在追求的风车——它所测试的桥梁。

Well, so this is the question that I had for you, which is it strikes me that this is the kind of windmill that the questing of complex systems science is always going after the bridge that it's testing.

Speaker 1

我们真的能跨过这座桥吗？

Can we actually walk across this?

Speaker 1

这里的类比真的成立吗？

Is there actually an analogy here that holds?

Speaker 1

LSD蜘蛛让我联想到洛杉矶或威奇托式的拓扑结构，而咖啡因蜘蛛则让我想到一种类似伦敦的拓扑结构。

The LSD spider strikes me as like a Los Angeles or Wichita style topology, and the caffeine spider strikes me as a kind of like London topology.