Eric Berlow: I'm an ecologist, and Sean's a physicist, and we both study complex networks. And we met a couple years ago when we discovered that we had both given a short TED Talk about the ecology of war, and we realized that we were connected by the ideas we shared before we ever met. And then we thought, you know, there are thousands of other talks out there, especially TEDx Talks, that are popping up all over the world. How are they connected, and what does that global conversation look like? So Sean's going to tell you a little bit about how we did that.
Eric Berlow: 我是一个生态学家,而Sean是个物理学家, 我们都在研究一些复杂的网络系统。 几年前我们见了次面 发现我们都曾在TED 做过关于战争生态学的演讲, 然后发现,即便没见过彼此, 我们也因那些共同的想法联系在一起了。 我们觉得,你们也知道,TED有成千上万个演讲, 其中TEDx的演讲特别多, 已经遍布世界各地了。 他们是如何联系在一起的, 这种跨越国界的对话会是怎样的? 下面就由Sean为大家讲一下我们所做的事。
Sean Gourley: Exactly. So we took 24,000 TEDx Talks from around the world, 147 different countries, and we took these talks and we wanted to find the mathematical structures that underlie the ideas behind them. And we wanted to do that so we could see how they connected with each other.
Sean Gourley: 没错,我们从世界各地147个国家 挑选了24,000个TEDx演讲。 在这些演讲中,我们想要找到 一种数学结构 来揭示视频背后的思想。 我们想通过这样做来找出 这些演讲是如何联系在一起的。
And so, of course, if you're going to do this kind of stuff, you need a lot of data. So the data that you've got is a great thing called YouTube, and we can go down and basically pull all the open information from YouTube, all the comments, all the views, who's watching it, where are they watching it, what are they saying in the comments. But we can also pull up, using speech-to-text translation, we can pull the entire transcript, and that works even for people with kind of funny accents like myself. So we can take their transcript and actually do some pretty cool things. We can take natural language processing algorithms to kind of read through with a computer, line by line, extracting key concepts from this. And we take those key concepts and they sort of form this mathematical structure of an idea. And we call that the meme-ome. And the meme-ome, you know, quite simply, is the mathematics that underlies an idea, and we can do some pretty interesting analysis with it, which I want to share with you now.
当然,如果想要完成这个目标 得需要很多数据。 这些数据就是来自伟大的YouTube, 我们能深入到YouTube,把它 所有公开的信息都找出来, 包括评论、点击率、浏览者信息 浏览的地点、评论的具体内容。 但我们也能直接地,通过演讲内容翻译成文本的方式, 能获取整个字幕文本。 这方法对像我一样有搞笑口音的人也是行得通的。 然后我们拿着这些文本 做一些很酷的事情。 我们让电脑用自然语言处理算法 去逐行地阅读文本, 从中提取关键思想。 这些关键思想后会形成 该样的思想数学结构。 我们称之为“文化基因集合”。 所谓文化基因集合,其实很简单, 就是一种解释思想的数学, 在此之上我们能做一些很有趣的分析, 现在就和大家分享一下这些分析。
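The talk does not say which natural language processing algorithm was used to pull key concepts out of each transcript. As a rough illustration only, the sketch below uses TF-IDF weighting as a stand-in: a term scores highly when it is frequent in one transcript but rare across the others. The function names, the toy transcripts, and the `top_k` parameter are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def key_concepts(transcripts, top_k=3):
    """Return {talk_id: {term: weight}} with the top_k TF-IDF terms per talk.

    TF-IDF is an assumed stand-in for the unspecified method in the talk.
    """
    tokens = {tid: tokenize(text) for tid, text in transcripts.items()}
    n_talks = len(tokens)

    # Document frequency: in how many transcripts does each term appear?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    result = {}
    for tid, words in tokens.items():
        tf = Counter(words)
        weights = {
            w: (c / len(words)) * math.log(n_talks / df[w])
            for w, c in tf.items()
        }
        # Sort by weight (descending), breaking ties alphabetically.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        result[tid] = dict(top)
    return result


# Toy transcripts, invented for illustration.
transcripts = {
    "a": "solar energy and solar panels cut greenhouse gases",
    "b": "local food and the food economy feed cities",
    "c": "cities and energy and food shape the economy",
}
concepts = key_concepts(transcripts)
```

Each resulting dictionary of weighted terms plays the role of a very small "meme-ome" for its talk. A word such as "and", which appears in every transcript, gets a weight of zero and drops out.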
So each idea has its own meme-ome, and each idea is unique with that, but of course, ideas, they borrow from each other, they kind of steal sometimes, and they certainly build on each other, and we can go through mathematically and take the meme-ome from one talk and compare it to the meme-ome from every other talk, and if there's a similarity between the two of them, we can create a link and represent that as a graph, just like Eric and I are connected.
每种思想都有它独自的文化基因集合, 每个文化基因集合也不尽相同, 当然,思想嘛,总是大同小异的, 有时也会相互借鉴, 当然也会有所发展, 我们能从数学层面去检查 然后从演讲里提取文化基因集合, 然后和其他视频的文化基因集合做比较, 如果这两者中有相似之处, 我们就能建立一种联系,用图表来表示, 就像Eric和我的联系。
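The comparison step described above can be sketched as follows, assuming each meme-ome is a sparse vector of weighted terms and that similarity is measured with cosine similarity. Both are assumptions, as are the threshold value and the toy data; the talk does not specify the actual measure.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_graph(memeomes, threshold=0.3):
    """Compare every pair of talks and keep an edge when they are similar enough."""
    edges = {}
    for a, b in combinations(sorted(memeomes), 2):
        similarity = cosine(memeomes[a], memeomes[b])
        if similarity >= threshold:
            edges[(a, b)] = similarity
    return edges


# Hypothetical meme-omes, invented for illustration.
memeomes = {
    "ecology_of_war_1": {"war": 0.9, "ecology": 0.8, "network": 0.3},
    "ecology_of_war_2": {"war": 0.7, "ecology": 0.6, "insurgency": 0.5},
    "local_food": {"food": 0.9, "economy": 0.6},
}
edges = build_graph(memeomes)
```

Here the two talks on the ecology of war share enough vocabulary to be linked, while the food talk shares no terms with either and stays unconnected. Comparing all pairs is quadratic in the number of talks, so a collection of tens of thousands of talks would likely need an approximate nearest-neighbor method instead.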
So that's theory, that's great. Let's see how it works in actual practice. So what we've got here now is the global footprint of all the TEDx Talks over the last four years exploding out around the world from New York all the way down to little old New Zealand in the corner. And what we did on this is we analyzed the top 25 percent of these, and we started to see where the connections occurred, where they connected with each other. Cameron Russell talking about image and beauty connected over into Europe. We've got a bigger conversation about Israel and Palestine radiating outwards from the Middle East. And we've got something a little broader like big data with a truly global footprint reminiscent of a conversation that is happening everywhere.
理论上就这样,挺好的。 那让我们看看在实际中它是如何运作的。 我们现在看到的就是一个全球分布图 代表过去四年 全球所有的TEDx演讲出现的轨迹, 从纽约一直到在地球板块角落小小的古老的新西兰。 然后我们分析了前25%的演讲, 开始意识到联系是在哪里产生的了以及 他们是在哪里相互联系的。 Cameron Russell讲了欧洲各地相互联系的图像和美。 关于以色列和巴勒斯坦联系的交流更多, 是从中东开始的。 我们也有一些全球范围内的比较广泛的对话, 就像一种遍布全球的大数据,反映了某特定话题。
So from this, we kind of run up against the limits of what we can actually do with a geographic projection, but luckily, computer technology allows us to go out into multidimensional space. So we can take in our network projection and apply a physics engine to this, and the similar talks kind of smash together, and the different ones fly apart, and what we're left with is something quite beautiful.
所以从这里,我们似乎碰上了瓶颈, 这些地理投影到底能做什么, 幸运的是,电脑技术能让我们跳出常规框架 进入多维空间。 因此,我们能利用我们的网状投影 使用物理引擎 将类似的演讲一起做离心运动, 不同的则会飞离, 剩下的就是很美的东西。
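The "physics engine" step resembles a force-directed graph layout: linked talks attract like springs, and every pair of talks repels. The sketch below is a minimal two-dimensional version under that assumption, using Fruchterman-Reingold-style force laws. The function name, the parameters `k`, `step`, and `iterations`, and the four-node example are hypothetical, and the engine used in the talk may differ.

```python
import math


def force_layout(nodes, edges, k=0.5, step=0.05, iterations=200):
    """Place nodes in 2D so that linked nodes end up close together.

    Every pair repels with force k*k/d; linked pairs also attract with d*d/k.
    """
    n = len(nodes)
    # Deterministic start: nodes evenly spaced on the unit circle.
    pos = {
        v: [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]
        for i, v in enumerate(nodes)
    }
    linked = {frozenset(e) for e in edges}

    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)
                # Negative f pushes u away from v; positive f pulls u toward v.
                f = -k * k / d
                if frozenset((u, v)) in linked:
                    f += d * d / k
                fx, fy = f * dx / d, f * dy / d
                force[u][0] += fx
                force[u][1] += fy
                force[v][0] -= fx
                force[v][1] -= fy
        # Move all nodes at once, after every force has been accumulated.
        for v in nodes:
            pos[v][0] += step * force[v][0]
            pos[v][1] += step * force[v][1]
    return pos


def dist(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Two linked pairs with no links between the pairs.
nodes = ["a", "b", "c", "d"]
pos = force_layout(nodes, [("a", "b"), ("c", "d")])
```

After the iterations, the linked pairs sit close together and the two pairs have drifted apart, which is the "similar talks smash together, different ones fly apart" behavior in miniature.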
EB: So I want to just point out here that every node is a talk, they're linked if they share similar ideas, and that comes from a machine reading of entire talk transcripts, and then all these topics that pop out, they're not from tags and keywords. They come from the network structure of interconnected ideas. Keep going.
EB:我只想说明一下,每个节点就是一个演讲, 如果他们内容类似,就会连在一起 这一些都由一个机器去读取 所有演讲的字幕文本, 所有这些弹出来的标题, 他们不是取自标签或者关键字。 他们是取自相互联系思想的网络结构。你继续
SG: Absolutely. So I got a little quick on that, but he's going to slow me down. We've got education connected to storytelling triangulated next to social media. You've got, of course, the human brain right next to healthcare, which you might expect, but also you've got video games, which is sort of adjacent, as those two spaces interface with each other.
SG:好的。我其实说得有点快了, 他只是想我讲慢一点。 现在我们看到“教育”是和“讲故事” 还有“社交媒体”形成的三角架构。 还有,当然,“人类大脑”紧挨着“医疗保健”, 这估计在预料中, 但同时还有“电子游戏”似乎在和 这两个联系在一起的又有一定的重叠。
But I want to take you into one cluster that's particularly important to me, and that's the environment. And I want to kind of zoom in on that and see if we can get a little more resolution. So as we go in here and apply the physics engine again, what we start to see is that what's one conversation is actually composed of many smaller ones. The structure starts to emerge where we see a kind of fractal behavior of the words and the language that we use to describe the things that are important to us all around this world. So you've got food economy and local food at the top, you've got greenhouse gases, solar and nuclear waste. What you're getting is a range of smaller conversations, each connected to each other through the ideas and the language they share, creating a broader concept of the environment. And of course, from here, we can go and zoom in and see, well, what are young people looking at? And they're looking at energy technology and nuclear fusion. This is their kind of resonance for the conversation around the environment. If we split along gender lines, we can see females resonating heavily with food economy, but also out there in hope and optimism.
但我想让你们看一个群 这对我尤其重要,那就是“环境”。 我们放大一点 看看能不能清晰一点。 放大之后,我们将看到的 再次代入物理引擎, 就可以看到一个话题 其实由很多小话题组成的。 这个结构开始显现出 一些我们所用遣词造句 的分形行为的地方以及在全球人们用来形容重要事物的语言。 所以“食品经济”和“当地食品”在顶端, “温室气体”,“太阳能和核能浪费”也在前列。 你能看到一系列小的话题, 每个都因他们的思想和语言 而联系在一起, 从而创造了一个更大的关于环境的概念。 当然,从这里 我们通过放大能看到年轻人在看什么。 他们聚焦在能源技术和核聚变。 这是他们对环境话题 产生的一种共鸣。 如果我们按性别来分类的话, 能看到女性的共鸣 更与食品经济有关,是充满希望和乐观的。
And so there's a lot of exciting stuff we can do here, and I'll throw to Eric for the next part.
当然还有很多有趣的发现, 我就交给Eric来讲下一部分。
EB: Yeah, I mean, just to point out here, you cannot get this kind of perspective from a simple tag search on YouTube. Let's now zoom back out of the environment cluster to the entire global conversation, and look at all the talks together. Now often, when we're faced with this amount of content, we do a couple of things to simplify it. We might just say, well, what are the most popular talks out there? And a few rise to the surface. There's a talk about gratitude. There's another one about personal health and nutrition. And of course, there's got to be one about porn, right? And so then we might say, well, gratitude, that was last year. What's trending now? What's the popular talk now? And we can see that the new, emerging, top trending topic is about digital privacy.
EB:恩,我就想指出, 你是没法从YouTube那个简单的搜索栏里 得到这种回馈的。 现在我们跳出环境话题,重新回到全球的对话, 来看下所有的这些演讲。 当面对如此数量的内容时,我们经常 通过其他手段去简化它。 我们可能会说, 那里面最受欢迎的演讲有哪些, 然后就有一些浮现出来。 其中一个是关于感恩的。 有个是关于个人健康与营养的。 当然,肯定会有一个是关于色情的,对吧? 所以我们就能说,感恩,是去年的主题。 现在的趋势是什么?现在流行的演讲是什么? 这样我们就能看到数字隐私成为新生的热门话题。
So this is great. It simplifies things. But there's so much creative content that's just buried at the bottom. And I hate that. How do we bubble stuff up to the surface that's maybe really creative and interesting? Well, we can go back to the network structure of ideas to do that. Remember, it's that network structure that is creating these emergent topics, and let's say we could take two of them, like cities and genetics, and say, well, are there any talks that creatively bridge these two really different disciplines? And that's -- Essentially, this kind of creative remix is one of the hallmarks of innovation. Well, here's one by Jessica Green about the microbial ecology of buildings. It's literally defining a new field. And we could go back to those topics and say, well, what talks are central to those conversations? In the cities cluster, one of the most central was one by Mitch Joachim about ecological cities, and in the genetics cluster, we have a talk about synthetic biology by Craig Venter. These are talks that are linking many talks within their discipline. We could go the other direction and say, well, what are talks that are broadly synthesizing a lot of different kinds of fields? We used a measure of ecological diversity to get this. Like, a talk by Steven Pinker on the history of violence, very synthetic.
这很棒,它简化了一切。 但这样就有很多新鲜的内容 被淹没在底部了。 我不喜欢这样。我们要怎样才能让这些 可能真的有新意有趣的话题回到顶层呢? 其实我们可以回到思想的网络结构上 来实现这一点。 记住,是这个网络结构 让这些话题显现出来, 我们不妨拿出其中的两个 像“城市”和“遗传”,然后看看 是否有其他演讲能有创意地把这两个不同的科目联系起来。 而这——基本上,这种创新性的混合 就是革新的标记之一。 这儿有一个Jessica Green 关于建筑的微生物生态学的演讲。 它事实上是在定义一个新的领域。 然后我们再回到这两个话题上, 想象在这里,有哪些是比较核心的? 在“城市”的那堆里,其中一个最核心的 就是Mitch Joachim的生态城市, 而在“遗传”的那堆里 有Craig Venter的一个关于合成生物学的演讲。 还有很多演讲以他们的内容与其他许多演讲联系在一起的。 我们还可以从另一个方向入手,比方说 有哪些演讲,是广泛地 综合了很多不同领域的。 我们利用一种生态多样性的方法得到这个答案。 例如,Steven Pinker的一个演讲是关于暴力的历史, 非常综合。
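Two of the network measures mentioned above can be sketched in a few lines. The talk only says "a measure of ecological diversity", so the Shannon index below is an assumption, applied to the topic clusters of a talk's neighbors. The bridging check simply looks for talks linked into both of two named clusters. All talk IDs, topic labels, and function names are hypothetical.

```python
import math
from collections import Counter


def shannon_diversity(neighbor_topics):
    """Shannon index over the topic labels of a talk's neighbors.

    0.0 means all neighbors share one topic; higher means a broader mix.
    """
    counts = Counter(neighbor_topics)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def bridges(neighbors, topic_of, topic_a, topic_b):
    """Return talks that are linked to at least one talk in each of two topics."""
    found = []
    for talk, linked_talks in neighbors.items():
        topics = {topic_of[t] for t in linked_talks}
        if topic_a in topics and topic_b in topics:
            found.append(talk)
    return sorted(found)


# Hypothetical network, invented for illustration.
topic_of = {
    "talk_1": "cities", "talk_2": "cities",
    "talk_3": "genetics", "talk_4": "genetics",
}
neighbors = {
    "talk_x": ["talk_1", "talk_3"],   # linked into both clusters
    "talk_y": ["talk_1", "talk_2"],   # linked into cities only
}

bridging = bridges(neighbors, topic_of, "cities", "genetics")
broad = shannon_diversity([topic_of[t] for t in neighbors["talk_x"]])
narrow = shannon_diversity([topic_of[t] for t in neighbors["talk_y"]])
```

In this toy network `talk_x` is the only bridge between the two clusters, and its neighbor mix is more diverse than that of `talk_y`, whose neighbors all sit in one cluster.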
And then, of course, there are talks that are so unique they're kind of out in the stratosphere, in their own special place, and we call that the Colleen Flanagan index. And if you don't know Colleen, she's an artist, and I asked her, "Well, what's it like out there in the stratosphere of our idea space?" And apparently it smells like bacon. I wouldn't know. So we're using these network motifs to find talks that are unique, ones that are creatively synthesizing a lot of different fields, ones that are central to their topic, and ones that are really creatively bridging disparate fields. Okay? We never would have found those with our obsession with what's trending now. And all of this comes from the architecture of complexity, or the patterns of how things are connected.
当然,还有一些演讲是非常独特的, 已经到了一定境界,不是常人能理解的, 我们称之为Colleen Flanagan指数。 如果你不知道Colleen没关系,她是个艺术家, 所以我问她,“我们空间概念的最高层 是什么样的?” 显然那地方闻着像培根。 我也不知道。 所以我们用这些网络图形 来寻找那些独特的演讲, 那些创造性地综合了很多领域的演讲, 那些中心明确的演讲, 还有那些创新地把不相干的领域联系起来的演讲。 对吧?一味地看趋势的话 我们是没法找到这些演讲的。 所有的这一切来自复杂性架构 或是事物联系的模式。
SG: So that's exactly right. We've got ourselves in a world that's massively complex, and we've been using algorithms to kind of filter it down so we can navigate through it. And those algorithms, whilst being kind of useful, are also very, very narrow, and we can do better than that, because we can realize that this complexity is not random. It has mathematical structure, and we can use that mathematical structure to go and explore things like the world of ideas to see what's being said, to see what's not being said, and to be a little bit more human and, hopefully, a little smarter.
SG:非常有道理。 我们处在一个非常复杂的世界, 我们一直试图用算法简化它 以便去驾驭它。 而这些算法就算有时有用, 也是非常有限的,而我们能做得更好, 因为我们能意识到这种复杂性不是偶然。 它有数学架构, 而我们能用这个数学架构去深入研究 例如这个世界上的所有思想, 去看看人们都讨论些什么,还有什么没讨论过的, 从而使得这些数据显得更人性化 更富有智慧。
Thank you.
谢谢
(Applause)
(掌声)