In ancient Greece, when anyone from slaves to soldiers, poets and politicians, needed to make a big decision on life's most important questions, like, "Should I get married?" or "Should we embark on this voyage?" or "Should our army advance into this territory?" they all consulted the oracle.
在古希腊, 从奴隶到士兵,从诗人到政治家, 都需要对人生中最重要的问题做决定, 比如,我该结婚吗? 这次出海我该不该去? 我们该不该向那片区域进军? 他们纷纷去请教先知。
So this is how it worked: you would bring her a question and you would get on your knees, and then she would go into this trance. It would take a couple of days, and then eventually she would come out of it, giving you her predictions as your answer.
过程是这样的: 你问她一个问题,然后跪在她面前, 之后她会进入一种恍惚的状态。 也许持续几天, 最终她会恢复清醒状态, 给出她的预测,回答你的问题。
From the oracle bones of ancient China to ancient Greece to Mayan calendars, people have craved prophecy in order to find out what's going to happen next. And that's because we all want to make the right decision. We don't want to miss something. The future is scary, so it's much nicer knowing that we can make a decision with some assurance of the outcome.
从古代中国用骨头占卜, 到古希腊,再到玛雅历法, 人们祈求能得到预言, 从而知道未来会发生什么。 因为我们都想做出正确的决定。 我们不想忽略什么。 未来是可怕的, 因此若我们 在做决定时多多少少 能预知结果,会更好。
Well, we have a new oracle, and its name is big data, or we call it "Watson" or "deep learning" or "neural net." And these are the kinds of questions we ask of our oracle now, like, "What's the most efficient way to ship these phones from China to Sweden?" Or, "What are the odds of my child being born with a genetic disorder?" Or, "What is the sales volume we can predict for this product?"
如今我们有了新的先知, 它的名字叫大数据, 或者叫它“沃森”或者 “深度学习”或者“神经网络”。 以下就是我们问这位先知的问题。 “要把这些手机从中国运到瑞典, 怎么做最高效?” 或者“我的孩子出生时 患遗传病的几率是多少?” 或者“这件产品的预计销量是多少?”
I have a dog. Her name is Elle, and she hates the rain. And I have tried everything to untrain her. But because I have failed at this, I also have to consult an oracle, called Dark Sky, every time before we go on a walk, for very accurate weather predictions in the next 10 minutes. She's so sweet. So because of all of this, our oracle is a $122 billion industry.
我养了一只狗,名叫艾尔, 她讨厌下雨。 我想了很多办法来帮她。 但是因为我失败了, 因此每次准备遛狗时, 我都会求助一位先知,叫Dark Sky, 来获得未来10分钟精准的天气预报。 小狗真可爱。 因此,“先知”大数据是 一项价值1220亿美元的产业。
Now, despite the size of this industry, the returns are surprisingly low. Investing in big data is easy, but using it is hard. Over 73 percent of big data projects aren't even profitable, and I have executives coming up to me saying, "We're experiencing the same thing. We invested in some big data system, and our employees aren't making better decisions. And they're certainly not coming up with more breakthrough ideas."
但尽管产业规模大, 投资回报却出奇地低。 投资大数据很简单, 但利用它却很难。 超过73%的大数据项目都不赚钱, 有经理来找我说, “我们的情况也是如此。 我们投资了一些大数据系统, 但雇员们并未因此做出更好的决策。 更别说提出突破性的想法了。”
So this is all really interesting to me, because I'm a technology ethnographer. I study and I advise companies on the patterns of how people use technology, and one of my interest areas is data. So why is having more data not helping us make better decisions, especially for companies who have all these resources to invest in these big data systems? Why isn't it getting any easier for them?
我觉得这个现象很有意思, 因为我是一名技术人类学家。 我研究人们使用技术的模式, 并据此为企业提供建议, 数据是我感兴趣的领域之一。 为什么更多的数据不能 帮我们更好的决策呢? 尤其是那些资源丰富, 能投资大数据系统的公司。 为什么对他们而言, 事情并未变得简单?
So, I've witnessed the struggle firsthand. In 2009, I started a research position with Nokia. And at the time, Nokia was one of the largest cell phone companies in the world, dominating emerging markets like China, Mexico and India -- all places where I had done a lot of research on how low-income people use technology. And I spent a lot of extra time in China getting to know the informal economy. So I did things like working as a street vendor selling dumplings to construction workers. Or I did fieldwork, spending nights and days in internet cafés, hanging out with Chinese youth, so I could understand how they were using games and mobile phones, and how they used them while moving from the rural areas to the cities.
我亲眼见过这种困境。 2009年,我跟诺基亚 开始进行一项研究。 在当时, 诺基亚是全球最大的 手机生产商之一, 在中国、墨西哥和印度等 新兴市场占有巨大份额, 我在上述国家进行了大量的研究, 看低收入人群是如何使用技术的。 我在中国花了大量时间 去了解当地的街头经济。 我当过街边小贩, 卖饺子给建筑工人。 我还泡过网吧, 在那里连续待上几天, 跟中国年轻人 混在一起,来了解 他们如何玩游戏和使用手机, 如何在从农村来到城市时使用。
Through all of this qualitative evidence that I was gathering, I was starting to see so clearly that a big change was about to happen among low-income Chinese people. Even though they were surrounded by advertisements for luxury products like fancy toilets -- who wouldn't want one? -- and apartments and cars, through my conversations with them, I found out that the ads that actually enticed them the most were the ones for iPhones, promising them this entry into this high-tech life. And even when I was living with them in urban slums like this one, I saw people investing over half of their monthly income into buying a phone, and increasingly, they were "shanzhai," which are affordable knock-offs of iPhones and other brands. They're very usable. Does the job.
通过搜集到的这些 高质量的例证, 我开始清晰地看到 在中国低收入人群中 将发生巨大的变革。 尽管奢华产品的广告随处可见, 比如高级马桶——谁不想要? 还有房子和车子, 聊天过程中, 我发现最吸引他们的广告, 是iPhone的广告, 因为感觉可以将他们 带入高科技生活。 跟他们一起住在 这样的城中村里, 我看到有人花掉超过 半个月的收入 去买一部手机, “山寨”越来越多, 就是苹果和其他品牌的 廉价仿冒品。 它们也能用。 基本功能都有。
And after years of living with migrants and working with them and just really doing everything that they were doing, I started piecing all these data points together -- from the things that seem random, like me selling dumplings, to the things that were more obvious, like tracking how much they were spending on their cell phone bills. And I was able to create this much more holistic picture of what was happening. And that's when I started to realize that even the poorest in China would want a smartphone, and that they would do almost anything to get their hands on one.
多年来,我跟这些外地人 一起工作和生活, 跟他们做着同样的事情, 我开始把很多数据联系起来, 从随机事件,比如卖饺子, 到比较直观的东西, 比如看他们会花多少钱买手机。 我更全面地了解了 发生的事。 此时我开始意识到, 即使是中国最穷的人, 也会想拥有一部智能手机, 而为此他们几乎愿意付出一切。
You have to keep in mind, iPhones had just come out, it was 2009, so this was, like, eight years ago, and Androids had just started looking like iPhones. And a lot of very smart and realistic people said, "Those smartphones -- that's just a fad. Who wants to carry around these heavy things where batteries drain quickly and they break every time you drop them?" But I had a lot of data, and I was very confident about my insights, so I was very excited to share them with Nokia.
别忘了, 那是2009年, iPhone才刚刚出现, 差不多是8年前, 而安卓手机刚开始 长得像iPhone。 很多聪明而务实的人断言, “这些智能手机,只会昙花一现。 谁会愿意拿着这么重的手机, 电量掉得那么快,一摔就坏。” 但我有数据, 我对自己的见解很自信, 于是我非常兴奋地告诉诺基亚。
But Nokia was not convinced, because it wasn't big data. They said, "We have millions of data points, and we don't see any indicators of anyone wanting to buy a smartphone, and your data set of 100, as diverse as it is, is too weak for us to even take seriously." And I said, "Nokia, you're right. Of course you wouldn't see this, because you're sending out surveys assuming that people don't know what a smartphone is, so of course you're not going to get any data back about people wanting to buy a smartphone in two years. Your surveys, your methods have been designed to optimize an existing business model, and I'm looking at these emergent human dynamics that haven't happened yet. We're looking outside of market dynamics so that we can get ahead of it." Well, you know what happened to Nokia? Their business fell off a cliff. This -- this is the cost of missing something. It was unfathomable.
但是诺基亚不为所动, 因为我给的不是大数据。 他们说,“我们有几百万的数据, 没有数据显示会 有人愿意买智能手机, 而你的数据量只有几百, 还如此分散,毫无说服力, 根本不值一提。” 我说,“诺基亚,你是对的。 你当然看不到这些, 因为你在调查时就假定 人们不了解智能手机, 因此当然得不到数据来了解 2年之内想买智能手机的人。 因为你们的调查和方法, 目的都是优化现有的商业模式, 而我看到的,是前所未有的 人类新动向。 我们看的是市场动态之外的东西, 因此可以领先一步。” 都知道诺基亚的结局吧? 他们的生意一落千丈。 这就是忽略某些事情的代价。 就是那么难以想象。
But Nokia's not alone. I see organizations throwing out data all the time because it didn't come from a quant model or it doesn't fit in one. But it's not big data's fault. It's the way we use big data; it's our responsibility. Big data's reputation for success comes from quantifying very specific environments, like electricity power grids or delivery logistics or genetic code, when we're quantifying in systems that are more or less contained.
而诺基亚并非个案。 我看到许多组织总是 对数据视而不见, 因为这些数据并非 来自某种数据模型, 或跟模型不符。 大数据本身并没有错。 是我们使用不当,错在我们。 大数据的声名鹊起 是因为它能量化特定环境, 比如电网、物流或者基因编码, 帮我们量化一定程度上 可控的体系。
But not all systems are as neatly contained. When you're quantifying and systems are more dynamic, especially systems that involve human beings, forces are complex and unpredictable, and these are things that we don't know how to model so well. Once you predict something about human behavior, new factors emerge, because conditions are constantly changing. That's why it's a never-ending cycle. You think you know something, and then something unknown enters the picture. And that's why just relying on big data alone increases the chance that we'll miss something, while giving us this illusion that we already know everything.
然而并非所有的体系 都有很好的可控性。 对一个动态的体系进行量化, 尤其是牵涉到人时, 各种因素复杂多变, 有些因素并没有很好的模型。 对人的行为进行预测时, 会出现新的因素, 因为条件是在不断变化的。 因此这是个永远的循环。 你以为已经懂了, 结果新的未知情况又出现了。 因此,仅仅依靠大数据, 反而会使我们更容易 忽略一些事实, 却给了我们已经掌握一切的错觉。
And what makes it really hard to see this paradox and even wrap our brains around it is that we have this thing that I call the quantification bias, which is the unconscious belief of valuing the measurable over the immeasurable. And we often experience this at our work. Maybe we work alongside colleagues who are like this, or even our whole entire company may be like this, where people become so fixated on that number, that they can't see anything outside of it, even when you present them evidence right in front of their face. And this is a very appealing message, because there's nothing wrong with quantifying; it's actually very satisfying. I get a great sense of comfort from looking at an Excel spreadsheet, even very simple ones.
要看清这样一个矛盾, 哪怕仅仅去认真思考它, 也是困难重重, 原因在于我们偏爱量化, 比起不能量化的, 总是不自觉地相信 能够量化的。 这在工作中很常见。 也许我们的同事是这样, 甚至整个公司都是这样, 大家都盯着数字, 而忽略了其他东西, 即便你将证据摆在他们面前。 这一点很有意思, 因为量化本身并没有什么错, 甚至会让人愉悦。 看Excel表格时我就感觉挺好的, 哪怕表格很简单。
(Laughter)
(笑声)
It's just kind of like, "Yes! The formula worked. It's all OK. Everything is under control."
那感觉就是, “好!这个公式能用。 都没问题,一切尽在掌握!”
But the problem is that quantifying is addictive. And when we forget that and when we don't have something to kind of keep that in check, it's very easy to just throw out data because it can't be expressed as a numerical value. It's very easy just to slip into silver-bullet thinking, as if some simple solution existed. Because this is a great moment of danger for any organization, because oftentimes, the future we need to predict -- it isn't in that haystack, but it's that tornado that's bearing down on us outside of the barn. There is no greater risk than being blind to the unknown. It can cause you to make the wrong decisions. It can cause you to miss something big.
但问题在于, 量化会让人上瘾。 一旦忘记这点, 又没有什么纠错的机制, 就很容易舍弃 无法变成数值的信息。 人们很容易执迷于一招鲜, 好像总有简单的解决方法。 对任何组织来说这都很要命, 因为通常我们需要预测的未来, 不是这干草垛, 而是谷仓外向我们袭来的 龙卷风。 最危险的莫过于 忽略未知事物。 这会让你做出错误的决定, 忽略重要的事情。
But we don't have to go down this path. It turns out that the oracle of ancient Greece holds the secret key that shows us the path forward. Now, recent geological research has shown that the Temple of Apollo, where the most famous oracle sat, was actually built over two earthquake faults. And these faults would release these petrochemical fumes from underneath the Earth's crust, and the oracle literally sat right above these faults, inhaling enormous amounts of ethylene gas from these fissures.
但我们并非别无选择。 其实古希腊的先知们 已经掌握了解决问题的关键。 最近的地质研究表明, 最著名的先知 所在的阿波罗神庙 正建在两个地震断层之间。 断层不断从地下释放出 石油化学气体。 先知们恰好坐在这些断层上, 吸入了从断层中 逸出的大量乙烯,
(Laughter)
(笑声)
It's true.
是真的。
(Laughter) It's all true, and that's what made her babble and hallucinate and go into this trance-like state. She was high as a kite!
(笑声) 没骗你们,因此她才会 产生幻觉,开始呢喃, 变得神情恍惚, 她正“飘”着呢!
(Laughter)
(笑声)
So how did anyone -- How did anyone get any useful advice out of her in this state? Well, you see those people surrounding the oracle? You see those people holding her up, because she's, like, a little woozy? And you see that guy on your left-hand side holding the orange notebook? Well, those were the temple guides, and they worked hand in hand with the oracle. When inquisitors would come and get on their knees, that's when the temple guides would get to work, because after the inquisitors asked her questions, the guides would observe their emotional state, and then they would ask them follow-up questions, like, "Why do you want to know this prophecy? Who are you? What are you going to do with this information?" And then the temple guides would take this more ethnographic, this more qualitative information, and interpret the oracle's babblings. So the oracle didn't stand alone, and neither should our big data systems.
所以怎么可能—— 这种情况下,怎么可能 从先知那里得到有用的建议? 看到先知身旁的人了吗? 他们扶着她, 因为她已经有点晕了。 你看左手边那位老兄, 手里拿着橙色的本子。 他们是神庙向导, 跟先知一起合作的。 当求问者跪在先知面前时, 神庙向导就要开始介入了, 求问者提问后, 向导开始观察他们的精神状态, 并且问进一步的问题, 比如,“你为什么想问这个?你是谁? 你要用这个答案来做什么?” 神庙向导利用这些与人更相关的 更有实质意义的信息, 来对先知的呢喃进行解释。 所以先知并不是孤立的, 大数据也不应如此。
Now to be clear, I'm not saying that big data systems are huffing ethylene gas, or that they're even giving invalid predictions. The total opposite. But what I am saying is that in the same way that the oracle needed her temple guides, our big data systems need them, too. They need people like ethnographers and user researchers who can gather what I call thick data. This is precious data from humans, like stories, emotions and interactions that cannot be quantified. It's the kind of data that I collected for Nokia that comes in the form of a very small sample size, but delivers incredible depth of meaning.
别误会, 我不是说大数据吸了乙烯, 或者大数据的预测没有用。 完全不是。 我想说的是, 正如先知需要神庙向导们一样, 大数据系统也需要协助。 需要人类学家和用户研究人员, 搜集所谓的“厚数据”。 这是来源于人类的宝贵信息, 比如故事、情感和交流 等不能被量化的东西。 像我曾为诺基亚搜集的, 它们来自很小的样本量, 却能传达意义重大的信息。
And what makes it so thick and meaty is the experience of understanding the human narrative. And that's what helps to see what's missing in our models. Thick data grounds our business questions in human questions, and that's why integrating big and thick data forms a more complete picture. Big data is able to offer insights at scale and leverage the best of machine intelligence, whereas thick data can help us rescue the context loss that comes from making big data usable, and leverage the best of human intelligence. And when you actually integrate the two, that's when things get really fun, because then you're no longer just working with data you've already collected. You get to also work with data that hasn't been collected. You get to ask questions about why: Why is this happening?
而“厚数据”内涵丰富是因为 其中包含了理解人类生活的过程。 这能帮助我们看清 模型中缺失的东西。 “厚数据”将商业问题 落实到人类生活, 因此将大数据和厚数据相结合 能得到更全面的认识。 大数据能在数量级上提供视角, 最大限度利用机器智能, 而厚数据能补充 在利用大数据时 缺失的情境信息, 充分利用人类智慧。 两者结合起来时就很有意思了, 因为这样你不只是在使用 搜集到的数据。 你还能利用尚未搜集到的数据。 你可能会问: 为什么会这样?
Now, when Netflix did this, they unlocked a whole new way to transform their business. Netflix is known for their really great recommendation algorithm, and they had this $1 million prize for anyone who could improve it. And there were winners. But Netflix discovered the improvements were only incremental. So to really find out what was going on, they hired an ethnographer, Grant McCracken, to gather thick data insights. And what he discovered was something that they hadn't seen initially in the quantitative data. He discovered that people loved to binge-watch. In fact, people didn't even feel guilty about it. They enjoyed it.
Netflix这么做之后, 他们找到了全新的方式 来进行商业转型。 Netflix以出色的 推荐算法而闻名, 他们设立了100万美元的奖金, 寻找可以改进它的人。 有人获奖了。 但Netflix发现改进太慢。 为了彻底弄清原因, 他们雇了一位人类学家: 格兰特·麦克拉肯, 来搜集分析厚数据。 他发现了在一开始的数据分析中 没发现的东西。 他发现人们喜欢连续看片。 事实上人们才不会内疚。 大家乐在其中。
(Laughter)
(笑声)
So Netflix was like, "Oh. This is a new insight." So they went to their data science team, and they were able to scale this thick data insight in with their quantitative data. And once they verified it and validated it, Netflix decided to do something very simple but impactful. They said, instead of offering the same show from different genres or more of the different shows from similar users, we'll just offer more of the same show. We'll make it easier for you to binge-watch. And they didn't stop there. They did all these things to redesign their entire viewer experience, to really encourage binge-watching. It's why people and friends disappear for whole weekends at a time, catching up on shows like "Master of None." By integrating big data and thick data, they not only improved their business, but they transformed how we consume media. And now their stocks are projected to double in the next few years.
于是Netflix觉得, “噢,这是个新见解。” 于是他们找来数据科学团队, 将基于厚数据的观点 跟量化数据进行对比。 这一观点得到验证后, Netflix决定采取 简单却有效的措施。 他们不再把同一节目 做成不同体裁, 也不再给同一类用户 推荐不同节目, 而是提供同一节目, 便于连续观看。 不仅如此, 他们还想尽一切办法 重新规划用户体验, 引导用户连续观看。 于是大家在周末集体消失, 都在追《无为大师》这样的剧。 通过结合大数据和厚数据, 他们不仅发展了业务, 还转变了人们消费媒体的方式。 他们的股价预计会在 未来几年内翻番。
But this isn't just about watching more videos or selling more smartphones. For some, integrating thick data insights into the algorithm could mean life or death, especially for the marginalized. All around the country, police departments are using big data for predictive policing, to set bond amounts and sentencing recommendations in ways that reinforce existing biases. NSA's Skynet machine learning algorithm has possibly aided in the deaths of thousands of civilians in Pakistan from misreading cellular device metadata. As all of our lives become more automated, from automobiles to health insurance or to employment, it is likely that all of us will be impacted by the quantification bias.
但这不只是关于看更多的视频, 或者卖更多的智能手机。 对某些人而言,将厚数据的观点 整合到算法中, 关乎生死, 尤其是被边缘化的人群。 全国各地的警察部门都在将大数据 用于预防性警务, 规划牢房数量, 提供量刑建议, 这种的方法更是强化了已有偏见。 国安局的天网机器学习算法 可能间接导致了几千 巴基斯坦平民丧生, 因为误读了他们的 蜂窝移动设备的元数据。 随着我们的生活变得更加自动化, 从汽车到健康保险到就业, 所有人都可能 会受量化偏见的负面影响。
Now, the good news is that we've come a long way from huffing ethylene gas to make predictions. We have better tools, so let's just use them better. Let's integrate the big data with the thick data. Let's bring our temple guides with the oracles, and whether this work happens in companies or nonprofits or government or even in the software, all of it matters, because that means we're collectively committed to making better data, better algorithms, better outputs and better decisions. This is how we'll avoid missing that something.
不过好消息是,我们已经 有了很大进步, 不再吸入乙烯气体, 而是真正做出预测。 我们有了更好的工具, 那就让我们用好它。 让我们将大数据和 厚数据结合起来, 为先知配上神庙向导, 无论是在公司、非营利性机构, 还是在政府或者软件公司, 都很重要, 因为这意味着我们共同承诺 提供更好的数据, 更好的算法,更好的结果, 并做出更好的决定。 这样我们才不会忽略重要信息。
(Applause)
(掌声)