For the last 10 years, I've been spending my time trying to figure out how and why human beings assemble themselves into social networks. And the kind of social network I'm talking about is not the recent online variety, but rather, the kind of social networks that human beings have been assembling for hundreds of thousands of years, ever since we emerged from the African savannah. So, I form friendships and co-worker and sibling and relative relationships with other people who in turn have similar relationships with other people. And this spreads on out endlessly into a distance. And you get a network that looks like this. Every dot is a person. Every line between them is a relationship between two people -- different kinds of relationships. And you can get this kind of vast fabric of humanity, in which we're all embedded.
过去十年间,我一直在想 人们是怎样把自己放在社交网络中的 以及他们为什么要这么做。 这里的社交网络 不是最近网上最近流行的那种 而是, 自从人类从非洲大陆出现之后, 人们几十万年来进行的社交活动。 比如说, 我和其他人建立友谊,同事,兄弟,亲戚的关系, 其他人也和另外其他人建立类似的关系。 这样的关系无尽止地延伸出去。 这样你就有了一个像这样的网络。 网络中的每一个点就是一个人。 连接两点的每一条线就是两个人之间的关系 -- 不同类型的关系。 这样就得到了这种巨大的人际关系网, 我们都交织在网中。
And my colleague, James Fowler, and I have been studying for quite some time what are the mathematical, social, biological and psychological rules that govern how these networks are assembled and what are the similar rules that govern how they operate, how they affect our lives. But recently, we've been wondering whether it might be possible to take advantage of this insight, to actually find ways to improve the world, to do something better, to actually fix things, not just understand things. So one of the first things we thought we would tackle would be how we go about predicting epidemics.
我的同事,詹姆斯.福乐,和我一起已经研究了好些时间 什么是支配这些网络的数学,社交, 生物和心理规则 以及 什么是基本的规则 来支配这些网络的运作,和如何影响我们的生活。 最近,我们在研究 是否有可能利用这种认识 来发现改善这个世界的方法, 做一些好事 解决一些问题,而不只是理解而已。 所以我们认为需要解决的一件首要的事情 就是怎样预测传染病。
And the current state of the art in predicting an epidemic -- if you're the CDC or some other national body -- is to sit in the middle where you are and collect data from physicians and laboratories in the field that report the prevalence or the incidence of certain conditions. So, so and so patients have been diagnosed with something, or other patients have been diagnosed, and all these data are fed into a central repository, with some delay. And if everything goes smoothly, one to two weeks from now you'll know where the epidemic was today. And actually, about a year or so ago, there was this promulgation of the idea of Google Flu Trends, with respect to the flu, where by looking at people's searching behavior today, we could know where the flu -- what the status of the epidemic was today, what's the prevalence of the epidemic today.
现在预测传染病的方法 -- 如果是美国疾病控制预防中心或者其他国家级的机构 -- 就是呆在原地 从医生和实验室 收集数据 来分析疾病的流行性和发病率。 所以,(如果)一些病人在一个地方被诊断了得病, 或者其他病人在另一个地方得到诊断, 所有这些数据,一定的延迟之后,都送到一个中心数据库。 如果一切顺利, 一两个星期之后, 你就会知道发生在今天的传染病在何处。 实际上,一年前左右, 曾经有过这样的一个,google流感趋势的想法 关于流感, 通过观察人们今天的搜索行为, 我们能知道流感的发病区... 传染病今天的状况, 以及传播的趋势。
But what I'd like to show you today is a means by which we might get not just rapid warning about an epidemic, but also actually early detection of an epidemic. And, in fact, this idea can be used not just to predict epidemics of germs, but also to predict epidemics of all sorts of kinds. For example, anything that spreads by a form of social contagion could be understood in this way, from abstract ideas on the left like patriotism, or altruism, or religion to practices like dieting behavior, or book purchasing, or drinking, or bicycle-helmet [and] other safety practices, or products that people might buy, purchases of electronic goods, anything in which there's kind of an interpersonal spread. A kind of a diffusion of innovation could be understood and predicted by the mechanism I'm going to show you now.
但我今天想给你展示的 是一种方法 通过这个方法,我们不只是得到关于传染病的警示, 而且 也能够及早发现传染病。 事实上,这个想法不只能 预测病毒的传播, 也能预测很多事情的传播。 比如说,任何以社交形式传播的事情, 都能用这种方法来理解, 从左边这些抽象的事物, 像爱国主义,或利他主义,或者宗教, 到具体的事物, 像饮食行为,或者买书, 或饮酒,或自行车头盔和其他的一些安全措施, 或者人们可能买的产品, 电子书的购买, 任何能在人们之间传播的事情。 一种创新的传播 可以用我即将展示的方法 来理解和预测。
So, as all of you probably know, the classic way of thinking about this is the diffusion-of-innovation, or the adoption curve. So here on the Y-axis, we have the percent of the people affected, and on the X-axis, we have time. And at the very beginning, not too many people are affected, and you get this classic sigmoidal, or S-shaped, curve. And the reason for this shape is that at the very beginning, let's say one or two people are infected, or affected by the thing and then they affect, or infect, two people, who in turn affect four, eight, 16 and so forth, and you get the epidemic growth phase of the curve. And eventually, you saturate the population. There are fewer and fewer people who are still available that you might infect, and then you get the plateau of the curve, and you get this classic sigmoidal curve. And this holds for germs, ideas, product adoption, behaviors, and the like. But things don't just diffuse in human populations at random. They actually diffuse through networks. Because, as I said, we live our lives in networks, and these networks have a particular kind of a structure.
正如你们所有人也许知道的, 考虑这个问题的传统方法 是创新扩散, 或创新采用曲线。 这儿,Y轴上显示的是受到影响的人们的百分比, x轴上显示的是时间。 一开始,受到影响的人不多, 得到的是S状的分布, 或者说S形状的曲线。 形成这种形状的原因是这样的,在一开始, 假设一两个人 受到影响, 然后他们去影响两个人, 然后就是四个,八个,十六个,等等, 这样你就得到了这个曲线的传播增长阶段。 最终,这个群体就饱和了。 可以被影响的人 就越来越少, 这时你就得到了曲线的平顶部分, 这样就形成了经典的S状曲线。 这种方法可以用于病毒,观点, 产品推广,行为, 以及其他类似的情况。 但是这些事物在人群中并非是随机传播的。 他们实际上是通过网络传播。 正如我所说的,因为我们在网络中生存, 这些网络有一种特殊的结构。
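The doubling-then-saturating argument above can be sketched as a discrete logistic model. This is only an illustration under stated assumptions: the population of 1,000, the single seed case, and the parameter name `contacts_per_step` are hypothetical choices, not figures from the talk.

```python
def adoption_curve(population=1000, seed_cases=1, contacts_per_step=1.0, steps=20):
    """Cumulative share of a population affected over time (discrete logistic model)."""
    affected = float(seed_cases)
    curve = [affected / population]
    for _ in range(steps):
        # Each affected person reaches `contacts_per_step` others, but only the
        # share of those contacts who are still unaffected become new cases.
        still_available = 1 - affected / population
        new_cases = contacts_per_step * affected * still_available
        affected = min(population, affected + new_cases)
        curve.append(affected / population)
    return curve


curve = adoption_curve()
for step in (0, 1, 2, 3, 5, 10, 15):
    print(f"step {step:2d}: {curve[step]:6.1%} affected")
```

Early on almost every contact is still unaffected, so the count roughly doubles each step. Later the `still_available` factor shrinks toward zero and the curve flattens into the plateau.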
Now if you look at a network like this -- this is 105 people. And the lines represent -- the dots are the people, and the lines represent friendship relationships. You might see that people occupy different locations within the network. And there are different kinds of relationships between the people. You could have friendship relationships, sibling relationships, spousal relationships, co-worker relationships, neighbor relationships and the like. And different sorts of things spread across different sorts of ties. For instance, sexually transmitted diseases will spread across sexual ties. Or, for instance, people's smoking behavior might be influenced by their friends. Or their altruistic or their charitable giving behavior might be influenced by their coworkers, or by their neighbors. But not all positions in the network are the same.
现在,如果你看像这样的一个网络。。。 有105人。 这些线代表。。。这些点是人, 这些线代表朋友关系。 你也许看到在这个网络中 人们占据了不同的地点。 人们之间也有不同的关系。 这些关系可以是朋友关系,同胞关系, 配偶关系,同事关系, 邻居关系,以及类似的关系。 不同的事物 通过不同的联系来传播。 比如说,性病 通过性关系传播。 或者,比如说,人们的吸烟行为 可能会受到他们朋友的影响。 或者利他主义以及慈善施舍行为 可能会受到同事的影响, 或者邻居的影响。 但并非网络中所有的位置都是一样的。
So if you look at this, you might immediately grasp that different people have different numbers of connections. Some people have one connection, some have two, some have six, some have 10 connections. And this is called the "degree" of a node, or the number of connections that a node has. But in addition, there's something else. So, if you look at nodes A and B, they both have six connections. But if you can see this image [of the network] from a bird's eye view, you can appreciate that there's something very different about nodes A and B. So, let me ask you this -- I can cultivate this intuition by asking a question -- who would you rather be if a deadly germ was spreading through the network, A or B? (Audience: B.) Nicholas Christakis: B, it's obvious. B is located on the edge of the network. Now, who would you rather be if a juicy piece of gossip were spreading through the network? A. And you have an immediate appreciation that A is going to be more likely to get the thing that's spreading and to get it sooner by virtue of their structural location within the network. A, in fact, is more central, and this can be formalized mathematically. So, if we want to track something that was spreading through a network, what we ideally would like to do is to set up sensors on the central individuals within the network, including node A, monitor those people that are right there in the middle of the network, and somehow get an early detection of whatever it is that is spreading through the network.
所以你看这里,就能立即理解 不同的人有不同数量的连接。 有些人有一个连接,有些人有两个, 有些人有六个,有些人有十个连接。 这就叫做结点的度数, 或者一个结点有的连接数目。 但是,还有些别的东西。 如果你看结点A和B, 都有六个连接关系。 但是如果拔高来看, 你就能理解A和B是 非常不一样。 让我来问你 -- 用一个问题来说明这个直觉 --- 如果有个致命的病毒正在网络中传播, 你更愿意是哪一个,A还是B? (观众:B) 尼古拉斯·克里斯塔吉斯: B, 这很明显。 B在网络的边缘。 现在,如果有个非常有料的流言在网络中传播, 你愿意是哪一个? A。 你很快就看到 A更有可能 更快地得到正在传播的事物 因为他们在网络中的位置。 A,实际上,(位置)更加中心, 这个能在数学上来表示。 如果我们要追踪 在网络中传播的事物, 理想情况下我们会想设置感应器 在网络的中心人物上, 包括结点A, 以此来观察网络中心的人们的活动, 从而能够做到及早探测 正在网络中传播的东西。
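A small sketch of the A-versus-B intuition: two nodes with the same degree but different structural positions. The nine-person toy network below is invented for illustration (A and B have three ties each rather than six), and closeness centrality is just one of several ways to formalize "more central"; the talk does not say which measure was used.

```python
from collections import deque


def build_graph(edges):
    """Undirected friendship network stored as {person: set of friends}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def closeness(graph, source):
    """Closeness centrality: (reachable people) / (sum of shortest-path distances)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest hop counts
        node = queue.popleft()
        for friend in graph[node]:
            if friend not in dist:
                dist[friend] = dist[node] + 1
                queue.append(friend)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical toy network: A sits in the middle, B sits near the edge.
edges = [("A", "C"), ("A", "D"), ("A", "E"), ("C", "F"),
         ("D", "G"), ("E", "B"), ("B", "H"), ("B", "I")]
graph = build_graph(edges)
for node in ("A", "B"):
    print(node, "degree:", len(graph[node]), "closeness:", f"{closeness(graph, node):.3f}")
```

Both nodes have degree 3, but A's closeness works out to 8/15 (about 0.533) against 8/19 (about 0.421) for B, which matches the intuition that A would hear the gossip, or catch the germ, sooner.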
So if you saw them contract a germ or a piece of information, you would know that, soon enough, everybody was about to contract this germ or this piece of information. And this would be much better than monitoring six randomly chosen people, without reference to the structure of the population. And in fact, if you could do that, what you would see is something like this. On the left-hand panel, again, we have the S-shaped curve of adoption. In the dotted red line, we show what the adoption would be in the random people, and in the left-hand line, shifted to the left, we show what the adoption would be in the central individuals within the network. On the Y-axis is the cumulative instances of contagion, and on the X-axis is the time. And on the right-hand side, we show the same data, but here with daily incidence. And what we show here is -- like, here -- very few people are affected, more and more and more and up to here, and here's the peak of the epidemic. But shifted to the left is what's occurring in the central individuals. And this difference in time between the two is the early detection, the early warning we can get, about an impending epidemic in the human population.
也就是说,假如你看到网络中心的人们感染病毒或得到了一些信息, 你就能知道,很快 所有人都会被传染这种病毒 或得到这个消息。 这种方法 比不考虑群体的结构,监测六个随机选择的人, 要好的多。 实际上,如果能够这样做, 你就会看到像这样的情况。 在左边,我们有S形状的传播曲线。 这条红色的点线,我们表示的是 在随机人群中的感染率, 左手的线条,向左移动, 表现的是 在网络的中心群体中的感染率。 Y轴上是感染个体的累计总数, X轴上是时间。 右边,我们显示同样的数据, 但是在每天的个体数。 我们在这里要显示的是 -- 比如说,这里 -- 很少的人受到影响,然后逐渐增多到这里, 这里是感染的高峰。 但是移到左边,是在中心群体中的发展趋势。 两者之间在时间上的区别 正是我们能够得到 关于传染病在人群中的 早期预测, 早期示警。
The problem, however, is that mapping human social networks is not always possible. It can be expensive, not feasible, unethical, or, frankly, just not possible to do such a thing. So, how can we figure out who the central people are in a network without actually mapping the network? What we came up with was an idea to exploit an old fact, or a known fact, about social networks, which goes like this: Do you know that your friends have more friends than you do? Your friends have more friends than you do, and this is known as the friendship paradox. Imagine a very popular person in the social network -- like a party host who has hundreds of friends -- and a misanthrope who has just one friend, and you pick someone at random from the population; they are much more likely to know the party host. And if they nominate the party host as their friend, that party host has a hundred friends, therefore, has more friends than they do. And this, in essence, is what's known as the friendship paradox. The friends of randomly chosen people have higher degree, and are more central than the random people themselves.
然而,这个方法的难处在于, 测绘人类的社交关系网 并不总是可能的。 这很昂贵,[很难], 不正当, 或者坦白说,就是没可能做这样的事情。 那么,我们怎样能弄清楚 哪些人在网络中心 而不需要通过测绘整个网络呢? 我们想出来的方法 是采用了一个古老的事实, 或者说关于社交网络的已知事实, 这个事实是这样的: 你知道你的朋友 有比你更多的朋友吗? 你的朋友比你有更多的朋友。 这个称为朋友的悖论。 想像有一个在社交网络中非常受欢迎的人物-- 就像一个聚会的主持有几百个朋友-- 而一个憎恨人类的人只有一个朋友, 然后你随机从人群中选个人; 他们更有可能认识聚会的主持。 如果他们选择晚会主持作为他们的朋友, 那么这个聚会主持就有一百个朋友, 因此,就有比他们更多的朋友。 这个就称为朋友悖论。 随机选择的人群的朋友 比随机人群本身, 有更高的(关系)度数,并且更加中心。
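The friendship paradox can be checked numerically with a short sketch. Everything here is an assumption made for illustration: the synthetic network is grown by preferential attachment (newcomers tend to befriend already popular people), and the sizes, seeds, and function names are hypothetical.

```python
import random
from statistics import mean


def preferential_attachment_graph(n, m, rng):
    """Grow a network in which newcomers tend to befriend well-connected people."""
    graph = {person: set() for person in range(n)}
    ends = []  # each person appears once per friendship, so picks are degree-weighted

    def befriend(a, b):
        graph[a].add(b)
        graph[b].add(a)
        ends.extend((a, b))

    for a in range(m + 1):  # small fully connected starting core
        for b in range(a + 1, m + 1):
            befriend(a, b)
    for newcomer in range(m + 1, n):
        picks = set()
        while len(picks) < m:
            picks.add(rng.choice(ends))
        for friend in sorted(picks):
            befriend(newcomer, friend)
    return graph


def friendship_paradox(graph, sample_size, rng):
    """Mean degree of a random sample versus one nominated friend per sampled person."""
    randoms = rng.sample(sorted(graph), sample_size)
    friends = [rng.choice(sorted(graph[person])) for person in randoms]
    random_degree = mean([len(graph[p]) for p in randoms])
    friend_degree = mean([len(graph[p]) for p in friends])
    return random_degree, friend_degree


graph = preferential_attachment_graph(2000, 2, random.Random(1))
random_degree, friend_degree = friendship_paradox(graph, 500, random.Random(2))
print(f"random people: {random_degree:.1f} friends on average")
print(f"their nominated friends: {friend_degree:.1f} friends on average")
```

No map of the network is needed at sampling time: each sampled person only has to name one friend, and the nominated group ends up with a higher average degree.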
And you can get an intuitive appreciation for this if you imagine just the people at the perimeter of the network. If you pick this person, the only friend they have to nominate is this person, who, by construction, must have at least two and typically more friends. And that happens at every peripheral node. And in fact, it happens throughout the network as you move in, everyone you pick, when they nominate a random -- when a random person nominates a friend of theirs, you move closer to the center of the network. So, we thought we would exploit this idea in order to study whether we could predict phenomena within networks. Because now, with this idea we can take a random sample of people, have them nominate their friends, those friends would be more central, and we could do this without having to map the network.
你能对这个理论有一个本能的理解 如果想像网络周边的人群。 如果你选择(网络周边的)这个人 他们能选择的唯一朋友就是这个人, 而这个,在这个结构中,一定有至少两个朋友, 通常情况下,有更多的朋友。 这种情况发生在每个周边结点上。 实际上,每当你加入一个网络的时候这个情况都会发生, 你选择的每个人,当他们随机选择。。。 当任意一个人选择他们的一个朋友, 你就向网络中心移动。 所以,我们想利用这个概念 来研究是否能预测网络的现象。 因为,有了这个概念, 我们就选择一个随机人群, 让他们提供他们的朋友, 他们的朋友就更加中心, 这样我们就能选择网络的中心,而不用描绘整个网络。
And we tested this idea with an outbreak of H1N1 flu at Harvard College in the fall and winter of 2009, just a few months ago. We took 1,300 randomly selected undergraduates, we had them nominate their friends, and we followed both the random students and their friends daily in time to see whether or not they had the flu. And we did this passively by looking at whether or not they'd gone to university health services. And also, we had them [actively] email us a couple of times a week. Exactly what we predicted happened. So the random group is in the red line. The epidemic in the friends group has shifted to the left, over here. And the difference in the two is 16 days. By monitoring the friends group, we could get 16 days' advance warning of an impending epidemic in this human population.
我们用这个想法来测试H1N1流感的爆发 在哈佛大学 2009年的秋冬,就是几个月前。 我们随机选择了1300本科学生, 让他们推举他们的朋友, 然后我们跟踪随机的学生人群和他们的朋友 每天按时 观察他们是否传染上流感。 我们观察的方法是看他们有没有去过大学健康服务机构。 并且我们要求他们一个星期给我们发几次电子邮件。 我们的预测一点不错的发生了。 这个随机组在这个红线上。 朋友组的传染移到左边,这里 中间相差了16天。 通过检测朋友组, 我们能够得到16天的预先示警 在这个人群的关于这个传染病的传播。
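The random-group versus friends-group comparison can be imitated in simulation. This is a rough sketch, not the study's method: the network is synthetic, the contagion is a simple model with no recovery, and the transmission probability, sample sizes, and all names are hypothetical. It will not reproduce the 16-day figure.

```python
import random
from statistics import mean


def preferential_attachment_graph(n, m, rng):
    """Synthetic network in which newcomers tend to befriend well-connected people."""
    graph = {person: set() for person in range(n)}
    ends = []  # each person appears once per friendship, so picks are degree-weighted
    def befriend(a, b):
        graph[a].add(b)
        graph[b].add(a)
        ends.extend((a, b))
    for a in range(m + 1):
        for b in range(a + 1, m + 1):
            befriend(a, b)
    for newcomer in range(m + 1, n):
        picks = set()
        while len(picks) < m:
            picks.add(rng.choice(ends))
        for friend in sorted(picks):
            befriend(newcomer, friend)
    return graph


def simulate_spread(graph, daily_transmission, rng):
    """Return the day on which each person is first infected (no recovery)."""
    people = sorted(graph)
    infection_day = {rng.choice(people): 0}  # patient zero
    day = 0
    while len(infection_day) < len(people):
        day += 1
        newly_infected = []
        for person in people:
            if person in infection_day:
                continue
            sick_friends = sum(1 for f in graph[person] if f in infection_day)
            # Each sick friend transmits independently with the daily probability.
            if sick_friends and rng.random() < 1 - (1 - daily_transmission) ** sick_friends:
                newly_infected.append(person)
        for person in newly_infected:
            infection_day[person] = day
    return infection_day


def early_warning_days(n=2000, sample_size=500, daily_transmission=0.05, seed=42):
    """Average number of days by which nominated friends fall ill before random people."""
    rng = random.Random(seed)
    graph = preferential_attachment_graph(n, 2, rng)
    infection_day = simulate_spread(graph, daily_transmission, rng)
    randoms = rng.sample(sorted(graph), sample_size)
    friends = [rng.choice(sorted(graph[p])) for p in randoms]
    return mean(infection_day[p] for p in randoms) - mean(infection_day[p] for p in friends)


print(f"friends group leads the random group by {early_warning_days():.1f} days on average")
```

The lead is measured here as a difference in mean infection day, which is a simplification; comparing the two cumulative curves, as on the slide, would be another reasonable choice.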
Now, in addition to that, if you were an analyst who was trying to study an epidemic or to predict the adoption of a product, for example, what you could do is you could pick a random sample of the population, also have them nominate their friends and follow the friends and follow both the randoms and the friends. Among the friends, the first evidence you saw of a blip above zero in adoption of the innovation, for example, would be evidence of an impending epidemic. Or you could see the first time the two curves diverged, as shown on the left. When did the randoms -- when did the friends take off and leave the randoms, and [when did] their curve start shifting? And that, as indicated by the white line, occurred 46 days before the peak of the epidemic. So this would be a technique whereby we could get more than a month-and-a-half warning about a flu epidemic in a particular population.
现在,在这个基础上, 如果你是分析师,要研究一种传染病 或者预测一个产品的推广, 你能做的是选择一个随机的人群, 让他们任命他们的朋友,然后跟踪他们的朋友, 跟踪随机组和朋友组。 在朋友组中,你看到的第一个零上的尖峰信号 关于,比如说,创新科技的采纳, 就是即将来临的流行趋势的信号。 或者你能看到两条曲线第一次分离的地方, 就像左边显示的。 朋友组什么时候开始 与随机组分离, 他们的曲线什么时候开始偏移? 正如白线显示的, 发生在 流行高峰的46天之前。 通过这个技术 我们能得到关于流感在特定人群中传播 一个半月以上的预先示警。
I should say that how much advance notice one might get about something depends on a host of factors. It could depend on the nature of the pathogen -- with different pathogens, using this technique, you'd get different warning -- or other phenomena that are spreading, or frankly, on the structure of the human network. Now in our case, although it wasn't necessary, we could also actually map the network of the students.
我应该说 能多早得到关于一些事情的消息 取决于很多因素。 它也许取决于病原体的本质--- 不同的病原体, 使用这种技术,你可能得到不同的示警--- 或者其他一些传播的现象, 或者,直接的说,在人类网络的结构中。 现在,在我们的例子中,尽管不是很必要, 我们也能够描绘这个学生网络。
So, this is a map of 714 students and their friendship ties. And in a minute now, I'm going to put this map into motion. We're going to take daily cuts through the network for 120 days. The red dots are going to be cases of the flu, and the yellow dots are going to be friends of the people with the flu. And the size of the dots is going to be proportional to how many of their friends have the flu. So bigger dots mean more of your friends have the flu. And if you look at this image -- here we are now in September the 13th -- you're going to see a few cases light up. You're going to see kind of blooming of the flu in the middle. Here we are on October the 19th. The slope of the epidemic curve is approaching now, in November. Bang, bang, bang, bang, bang -- you're going to see lots of blooming in the middle, and then you're going to see a sort of leveling off, fewer and fewer cases towards the end of December. And this type of a visualization can show that epidemics like this take root and affect central individuals first, before they affect others.
这是714个学生的映射图 和他们朋友联系。 很快,我要使这个图动起来。 我们要通过这个网络作每日监控 120天。 红点将会是流感的传染者, 黄点就是流感传染这人的朋友。 这些点的大小 和他们得流感朋友的数目成正比。 越大的点意味着更多的朋友得了流感。 你看这个图 --- 这儿是九月十三号 --- 你看到几个病例出现。 在中间你就会看到流感开始爆发。 这儿是十月十九日。 传播曲线的坡度开始临近,在十一月。 砰,砰,砰,砰,砰,你将看到在中间的很多地方爆发, 然后你会看到情况稳定下来, 到十二月底就越来越少的病例发生。 这样的图形表示 能显示,像这样的传染病先 影响中心个体 在影响别人之前。
Now, as I've been suggesting, this method is not restricted to germs, but actually to anything that spreads in populations. Information spreads in populations, norms can spread in populations, behaviors can spread in populations. And by behaviors, I can mean things like criminal behavior, or voting behavior, or health care behavior, like smoking, or vaccination, or product adoption, or other kinds of behaviors that relate to interpersonal influence. If I'm likely to do something that affects others around me, this technique can get early warning or early detection about the adoption within the population. The key thing is that for it to work, there has to be interpersonal influence. It cannot be because of some broadcast mechanism affecting everyone uniformly.
现在,如我所说, 这个方法并不局限于病毒, 实际上可以用于任何在人群中传播的事物。 信息在人群中传播。 规范在人群中传播。 行为能在人群中传播 我说的行为,就是像犯罪的行为 或选举行为,或者保健行为, 像抽烟,或免疫, 或产品推广,或者其他类型的行为 和人际之间影响相关的性为。 如果我想做些事情来影响我周围的人, 这个技术能得到早期示警,或早期预测, 关于人群的采纳。 要这个技术起作用,关键在于, 人际之间的影响必须存在。 它不能是像一些传播机制 统一地影响每一个人。
Now the same insights can also be exploited -- with respect to networks -- can also be exploited in other ways, for example, in the use of targeting specific people for interventions. So, for example, most of you are probably familiar with the notion of herd immunity. So, if we have a population of a thousand people, and we want to make the population immune to a pathogen, we don't have to immunize every single person. If we immunize 960 of them, it's as if we had immunized a hundred [percent] of them. Because even if one or two of the non-immune people gets infected, there's no one for them to infect. They are surrounded by immunized people. So 96 percent is as good as 100 percent. Well, some other scientists have estimated what would happen if you took a 30 percent random sample of these 1000 people, 300 people and immunized them. Would you get any population-level immunity? And the answer is no. But if you took this 30 percent, these 300 people and had them nominate their friends and took the same number of vaccine doses and vaccinated the friends of the 300 -- the 300 friends -- you can get the same level of herd immunity as if you had vaccinated 96 percent of the population at a much greater efficiency, with a strict budget constraint.
现在同样的观察 可以用上 --- 关于网络 --- 能够用其他的方法来利用, 比如,可以用来追踪 特殊的人群。 比如说,你们大部分人可能听过 群体免疫的概念。 如果我们有一个一千人的群体, 我们想使这个群体对某种病原体免疫, 我们并不需要给每个人打免疫预防针。 如果我们使960人免疫, 效果和使所有人免疫差不多。 因为即使一两个没有免疫的人感染了, 也没有其他人让他们感染。 这两个人周围的人都已经免疫。 所以百分之96和百分之百效果一样好。 一些其他的科学家已经预测了 可能发生的情况,如果你从这一千人中取百分之三十的随机样本 也就是三百个人,并且使他们免疫。 这样能不能使整个群体免疫? 答案是不可能。 但是如果你选择百分之三十的这三百人, 让他们举出他们的朋友 然后使用同样数量的免疫针 使得这三百人的朋友免疫, 这三百个朋友, 你就能达到群体免疫的效果 就好像给百分之九十六的人打预防针的效果一样 同时效率更高,并且花费更少。
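The contrast between vaccinating random people and vaccinating their nominated friends can be sketched in the same spirit. The assumptions are loud ones: the network is synthetic, protection is measured by the largest connected cluster of unvaccinated people (a proxy for how far an outbreak could travel), and nominations continue until the same number of distinct people are vaccinated in both strategies. It does not test the 96 percent figure from the talk.

```python
import random
from collections import deque


def preferential_attachment_graph(n, m, rng):
    """Synthetic network in which newcomers tend to befriend well-connected people."""
    graph = {person: set() for person in range(n)}
    ends = []  # each person appears once per friendship, so picks are degree-weighted
    def befriend(a, b):
        graph[a].add(b)
        graph[b].add(a)
        ends.extend((a, b))
    for a in range(m + 1):
        for b in range(a + 1, m + 1):
            befriend(a, b)
    for newcomer in range(m + 1, n):
        picks = set()
        while len(picks) < m:
            picks.add(rng.choice(ends))
        for friend in sorted(picks):
            befriend(newcomer, friend)
    return graph


def vaccinate_nominated_friends(graph, doses, rng):
    """Ask random people to name a friend until `doses` distinct friends are chosen."""
    people = sorted(graph)
    vaccinated = set()
    while len(vaccinated) < doses:
        person = rng.choice(people)
        vaccinated.add(rng.choice(sorted(graph[person])))
    return vaccinated


def largest_unprotected_cluster(graph, vaccinated):
    """Size of the biggest connected group of unvaccinated people."""
    seen = set(vaccinated)
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for friend in graph[node]:
                if friend not in seen:
                    seen.add(friend)
                    queue.append(friend)
        best = max(best, size)
    return best


def compare_strategies(n=2000, share=0.30, seed=7):
    rng = random.Random(seed)
    graph = preferential_attachment_graph(n, 2, rng)
    doses = int(n * share)
    random_group = set(rng.sample(sorted(graph), doses))
    friend_group = vaccinate_nominated_friends(graph, doses, rng)
    return (largest_unprotected_cluster(graph, random_group),
            largest_unprotected_cluster(graph, friend_group))


random_cluster, friend_cluster = compare_strategies()
print("largest unprotected cluster, random vaccination:", random_cluster)
print("largest unprotected cluster, friend vaccination:", friend_cluster)
```

With the same number of doses, vaccinating nominated friends removes more of the well-connected people, so the unvaccinated population is left in smaller, more isolated clusters.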
And similar ideas can be used, for instance, to target distribution of things like bed nets in the developing world. If we could understand the structure of networks in villages, we could target to whom to give the interventions to foster these kinds of spreads. Or, frankly, for advertising with all kinds of products. If we could understand how to target, it could affect the efficiency of what we're trying to achieve. And in fact, we can use data from all kinds of sources nowadays [to do this].
同样的想法可以用于,比如说, 解决像床罩这种物品在发展中国家 的派发和分布。 如果我们了解村庄的网络结构, 我们就能选择介入的目标 来促进这些过程的进行。 或者,更加直接的说,来促销所有的产品。 如果我们能理解如何选择目标, 就能影响到我们达到目标 的效率。 实际上,我们能使用来源于各种渠道的 数据[来应用这个方法]。
This is a map of eight million phone users in a European country. Every dot is a person, and every line represents a volume of calls between the people. And we can use such data, that's being passively obtained, to map these whole countries and understand who is located where within the network. Without actually having to query them at all, we can get this kind of a structural insight. And other sources of information, as you're no doubt aware, are available about such features, from email interactions, online interactions, online social networks and so forth. And in fact, we are in the era of what I would call "massive-passive" data collection efforts. There are all kinds of ways we can use massively collected data to create sensor networks to follow the population, understand what's happening in the population, and intervene in the population for the better. Because these new technologies tell us not just who is talking to whom, but where everyone is, and what they're thinking based on what they're uploading on the Internet, and what they're buying based on their purchases. And all this administrative data can be pulled together and processed to understand human behavior in a way we never could before.
这里是一个欧洲国家八百万人的 电话用户的网络图。 每一个点就是一个人,每一条线代表 人们之间的通话数量。 我们可以利用这些数据,被动方式得到的数据, 来描绘整个国家 从而了解那些人处在网络的中心。 不用实际上去询问每个人, 我们能得到这种的结构。 其他来源的信息,你肯定也知道, 也能提供这样的信息,例如电子邮件交互, 在线交流, 在线社交网络,等等。 实际上,我们处在一个我称为 “大量被动”数据收集的时代。 有很多种不同的方法我们能使用大量收集的数据 来建立传感网络 跟踪人群, 了解在人群中正在发生的事件, 从而可以更好的介入。 因为这些新技术告诉我们 不仅仅是那些人与那些人交流, 同时也告诉我们每个人处在什么位置, 根据他们上传到互联网上的东西来知道他们的想法, 他们的购物记录告诉我们他们买了什么。 所有这些管理数据能一起使用处理 来了解人类的行为 以一种前所未能的方式。
So, for example, we could use truckers' purchases of fuel. So the truckers are just going about their business, and they're buying fuel. And we see a blip up in the truckers' purchases of fuel, and we know that a recession is about to end. Or we can monitor the velocity with which people are moving with their phones on a highway, and the phone company can see, as the velocity is slowing down, that there's a traffic jam. And they can feed that information back to their subscribers, but only to their subscribers on the same highway located behind the traffic jam! Or we can monitor doctors' prescribing behaviors, passively, and see how the diffusion of innovation with pharmaceuticals occurs within [networks of] doctors. Or again, we can monitor purchasing behavior in people and watch how these types of phenomena can diffuse within human populations.
比如说,我们能用卡车司机的购油记录。 卡车司机处理他们的生意 他们要买汽油作燃料。 我们看到卡车司机购油数据的零上尖峰信号, 我们就知道经济衰退大概要结束了。 或者我们能监测 人们在高速公路上带着电话移动的速度, 电话公司能看到, 如果速度慢下来, 那么就发生了交通堵塞。 他们能把这个信息发给他们的用户, 只发给在同一条高速公路上 处于交通堵塞地点之前的用户! 或者我们监测医生开药的行为,以被动的形式, 看看在医生网络中 制药公司的新药的发行如何。 或者,我们能监测人们的购物行为, 看看这些种类的现象 在人群中是怎样传播的。
And there are three ways, I think, that these massive-passive data can be used. One is fully passive, like I just described -- as in, for instance, the trucker example, where we don't actually intervene in the population in any way. One is quasi-active, like the flu example I gave, where we get some people to nominate their friends and then passively monitor their friends -- do they have the flu, or not? -- and then get warning. Or another example would be, if you're a phone company, you figure out who's central in the network and you ask those people, "Look, will you just text us your fever every day? Just text us your temperature." And collect vast amounts of information about people's temperature, but from centrally located individuals. And be able, on a large scale, to monitor an impending epidemic with very minimal input from people. Or, finally, it can be more fully active -- as I know subsequent speakers will also talk about today -- where people might globally participate in wikis, or photographing, or monitoring elections, and upload information in a way that allows us to pool information in order to understand social processes and social phenomena.
我想,有三种方式, 这些大量被动的数据能被收集。 一个方式是完全被动, 如我刚刚描述的 --- 例如在卡车司机的例子中, 我们不需要以任何方式干涉这个群体的行为。 另一种是类似于主动的方式, 比如说像我说的流感的例子, 我们选一些人来举出他们的朋友 然后被动地监测他们的朋友 --- 他们是不是感染了流感? -- 然后得到示警。 或者另一个例子, 如果你是电话公司,你能弄清楚谁在网络的中心, 然后你可以问这些人,“你们能不能把你们每天的发烧情况给我们发过来? 只要发体温度数。” 然后收集人体体温的大量数据, 但是只是网络中心个体的信息。 这样就能,大面积地, 监测即将来临的传染病 而只需要人们给出非常少量的信息。 最后的一种方式,就更加主动 --- 我知道随后的演讲者今天会说到的 -- 人们在哪儿参与维基, 摄影,看选举, 上载信息,这样可以让我们收集 数据,来了解社交过程 和社会现象。
In fact, the availability of these data, I think, heralds a kind of new era of what I and others would like to call "computational social science." It's sort of like when Galileo invented -- or, didn't invent -- came to use a telescope and could see the heavens in a new way, or Leeuwenhoek became aware of the microscope -- or actually invented -- and could see biology in a new way. But now we have access to these kinds of data that allow us to understand social processes and social phenomena in an entirely new way that was never before possible. And with this science, we can understand how exactly the whole comes to be greater than the sum of its parts. And actually, we can use these insights to improve society and improve human well-being.
实际上,我认为,这些数据的可用性, 预示了一个新的纪元 也就是我们所说的 “计算社会学”。 这有点像伽利略发明的 -- 不是发明 -- 使用望远镜 能用一种新的方式看到天空, 或者莱文胡克开始了解微观世界 --- 发明了显微镜 --- 而能以新的方式审视生物学。 但现在我们能够得到这些数据 这使得我们能了解社交过程 和社会现象 以一种前所未能的新方式。 通过这门科学,我们能 准确了解 整体是怎样优于 局部的总和。 我们能用这些知识 来改善社会和人类的生存。
Thank you.
谢谢。