Tricia Wang: The human insights missing from big data

In ancient Greece, when anyone from slaves to soldiers, poets and politicians, needed to make a big decision on life's most important questions, like, "Should I get married?" or "Should we embark on this voyage?" or "Should our army advance into this territory?" they all consulted the oracle.

古希臘時期，不論是奴隸或士兵，詩人或政治家，當他們人生遇到重大問題時，需要做出重要的決定，像是「我該結婚嗎？」或是「我該開始這次的航行嗎？」或是「我的士兵該進攻這個領地嗎？」他們都會請示先知。

So this is how it worked: you would bring her a question and you would get on your knees, and then she would go into this trance. It would take a couple of days, and then eventually she would come out of it, giving you her predictions as your answer.

運行模式是這樣的：你把問題告訴她，接著屈膝跪下，然後她就會進入出神狀態。這會花上幾天的時間，最終她會回神，答復你她的預知。

From the oracle bones of ancient China to ancient Greece to Mayan calendars, people have craved for prophecy in order to find out what's going to happen next. And that's because we all want to make the right decision. We don't want to miss something. The future is scary, so it's much nicer knowing that we can make a decision with some assurance of the outcome.

從古中國的甲骨文，到古希臘，再到馬雅曆，人們都渴求著預言，為了知道接下來會發生什麼事。而這是因為我們都想做正確的決定，我們不希望漏掉了什麼。未來令人害怕。所以能在某種程度上保障決定的結果，是很棒的事。

Well, we have a new oracle, and it's name is big data, or we call it "Watson" or "deep learning" or "neural net." And these are the kinds of questions we ask of our oracle now, like, "What's the most efficient way to ship these phones from China to Sweden?" Or, "What are the odds of my child being born with a genetic disorder?" Or, "What are the sales volume we can predict for this product?"

我們有了新的先知，名字叫大數據。也可以稱它為「華生」、「深度學習」或「人工神經網路」。如今我們會問先知這樣的問題：「要將這批手機從中國運到瑞典，怎樣最有效率？」或是「我的小孩出生就有遺傳疾病的機率是多少？」或是「預期這產品的銷售量多少？」

I have a dog. Her name is Elle, and she hates the rain. And I have tried everything to untrain her. But because I have failed at this, I also have to consult an oracle, called Dark Sky, every time before we go on a walk, for very accurate weather predictions in the next 10 minutes. She's so sweet. So because of all of this, our oracle is a $122 billion industry.

我養了隻狗，名叫埃萊，最討厭下雨。我用盡方法來訓練她，讓她適應下雨。但因為我失敗了，我還是得諮詢一位叫 Dark Sky（天氣預報公司）的先知，每次散步之前都會諮詢，以獲得接下來十分鐘的準確天氣預報。她真的很貼心。基於這些理由，我們的「先知」是個 1220 億美元的產業。

Now, despite the size of this industry, the returns are surprisingly low. Investing in big data is easy, but using it is hard. Over 73 percent of big data projects aren't even profitable, and I have executives coming up to me saying, "We're experiencing the same thing. We invested in some big data system, and our employees aren't making better decisions. And they're certainly not coming up with more breakthrough ideas."

先不論這個產業的規模，令人驚訝的是它極低的報酬率。投資大數據很簡單，運用大數據卻很難。 73% 以上的大數據計畫根本不賺錢，有些業務主管跑來跟我說，「我們都面臨了同樣的問題。我們投資了幾個大數據系統，但我們的員工卻還是不能做出更優的決定。他們當然也沒有想出更多突破性的點子。」

So this is all really interesting to me, because I'm a technology ethnographer. I study and I advise companies on the patterns of how people use technology, and one of my interest areas is data. So why is having more data not helping us make better decisions, especially for companies who have all these resources to invest in these big data systems? Why isn't it getting any easier for them?

這些對我來說都很有趣，因為我是個科技人類學家。我研究並給予公司建議，告訴他們人們使用科技的形態，我有興趣的領域之一就是數據。為什麼獲得更多數據卻沒有幫我們做更好的決定，特別是那些有資源，可以投資大數據系統的公司？為什麼他們沒有更好地做決定？

So, I've witnessed the struggle firsthand. In 2009, I started a research position with Nokia. And at the time, Nokia was one of the largest cell phone companies in the world, dominating emerging markets like China, Mexico and India -- all places where I had done a lot of research on how low-income people use technology. And I spent a lot of extra time in China getting to know the informal economy. So I did things like working as a street vendor selling dumplings to construction workers. Or I did fieldwork, spending nights and days in internet cafés, hanging out with Chinese youth, so I could understand how they were using games and mobile phones and using it between moving from the rural areas to the cities.

我第一時間就目睹了這項困境。 2009 年，我開始了在諾基亞的研究工作。當時，諾基亞是世界上最大的手機公司之一，在中國、墨西哥、印度等新興市場中佔有主要地位── 我在這些地方都做了很多研究，研究低收入的人怎麼使用科技產品。我在中國花了特別多時間來了解地下經濟。所以我當過街頭攤販，賣水餃給建築工人。我也做過實地調查，在網咖中日日夜夜地待著，和中國年輕人來往，這樣我才知道他們怎麼玩遊戲、使用手機，以及他們從農村地區移居到城市時的使用情形。

Through all of this qualitative evidence that I was gathering, I was starting to see so clearly that a big change was about to happen among low-income Chinese people. Even though they were surrounded by advertisements for luxury products like fancy toilets -- who wouldn't want one? -- and apartments and cars, through my conversations with them, I found out that the ads the actually enticed them the most were the ones for iPhones, promising them this entry into this high-tech life. And even when I was living with them in urban slums like this one, I saw people investing over half of their monthly income into buying a phone, and increasingly, they were "shanzhai," which are affordable knock-offs of iPhones and other brands. They're very usable. Does the job.

透過我收集的定性資料，我開始清楚看見即將發生在低收入中國人身上的巨變。雖然他們身邊圍繞著奢侈品的廣告，像是花俏的馬桶──誰不想要呢── 還有公寓和車，從和他們的對話中，我發現最吸引他們的廣告，是 iPhone 的廣告，那些廣告向他們保證了進入高科技生活的途徑。即使我和他們一起住在這樣的城市貧民窟，我也看到人們將半個月以上的收入拿去買手機，而且越來越多都是「山寨品」，也就是他們買得起的 iPhone 或其他品牌的仿冒品。這些仿冒品很堪使用。原廠有的功能都能用。

And after years of living with migrants and working with them and just really doing everything that they were doing, I started piecing all these data points together -- from the things that seem random, like me selling dumplings, to the things that were more obvious, like tracking how much they were spending on their cell phone bills. And I was able to create this much more holistic picture of what was happening. And that's when I started to realize that even the poorest in China would want a smartphone, and that they would do almost anything to get their hands on one.

我和移民一起住、一起工作了數年，真的是他們做什麼，我就做什麼，我開始將所有數據拼湊在一起── 不論是看似不相關的事，像是我賣水餃的事，或是較明顯相關的事，像是追蹤他們花多少錢付手機費。所以我才有辦法描繪出這麼多整體畫面來說明當時正發生什麼事。這時我才開始理解到連中國最窮的人也想要智慧型手機，且他們幾乎會不擇手段拿到手。

You have to keep in mind, iPhones had just come out, it was 2009, so this was, like, eight years ago, and Androids had just started looking like iPhones. And a lot of very smart and realistic people said, "Those smartphones -- that's just a fad. Who wants to carry around these heavy things where batteries drain quickly and they break every time you drop them?" But I had a lot of data, and I was very confident about my insights, so I was very excited to share them with Nokia.

你們要記得，當時是 2009 年，iPhone 才剛出現，這是八年前的事，安卓手機才剛開始像 iPhone。很多聰明又現實的人說，「智慧型手機只是一時的流行。誰會想帶著這麼重的東西到處走，又很快就沒電，還會一掉地就壞？」但我有很多數據，我對自己的洞察觀點非常有自信，我興奮地把數據告訴諾基亞。

But Nokia was not convinced, because it wasn't big data. They said, "We have millions of data points, and we don't see any indicators of anyone wanting to buy a smartphone, and your data set of 100, as diverse as it is, is too weak for us to even take seriously." And I said, "Nokia, you're right. Of course you wouldn't see this, because you're sending out surveys assuming that people don't know what a smartphone is, so of course you're not going to get any data back about people wanting to buy a smartphone in two years. Your surveys, your methods have been designed to optimize an existing business model, and I'm looking at these emergent human dynamics that haven't happened yet. We're looking outside of market dynamics so that we can get ahead of it." Well, you know what happened to Nokia? Their business fell off a cliff. This -- this is the cost of missing something. It was unfathomable.

但我沒能說服諾基亞，因為那不是大數據。他們說：「我們有幾百萬則數據，而我們沒見到任何數據指出有人想買智慧型手機，你的 100 組數據太缺乏多樣性，我們完全無法重視這項數據。」我說：「諾基亞，你說的沒錯。你當然不會看到有人要買，因為你所發送問卷的假設前提是人們不知道智慧型手機是什麼，所以你的數據當然不會反映兩年內想買智慧型手機的人的想法。你問卷、研究方法的設計理念都是想讓現有的業務型態更好，而我關注的是這些正浮現的人類動態，那些是過去沒有發生的，我們看的是市場動態之外，這樣我們才能先走一步。」你們知道諾基亞怎麼樣了嗎？他們的產業跌落谷底。這就是錯失的代價。那代價是深不可測的。

But Nokia's not alone. I see organizations throwing out data all the time because it didn't come from a quant model or it doesn't fit in one. But it's not big data's fault. It's the way we use big data; it's our responsibility. Big data's reputation for success comes from quantifying very specific environments, like electricity power grids or delivery logistics or genetic code, when we're quantifying in systems that are more or less contained.

但不是只有諾基亞這樣。我看到各機構一天到晚丟棄數據，因為數據並非來自數量大的模型，或對不上數量大的模型數據。但這不是大數據的錯。是我們用錯方法，是我們的責任。但一般認為大數據的成功之處在於量化的對象非常的特定，像是電網、物流運送或遺傳密碼，也就是些基本上可操縱的系統。

But not all systems are as neatly contained. When you're quantifying and systems are more dynamic, especially systems that involve human beings, forces are complex and unpredictable, and these are things that we don't know how to model so well. Once you predict something about human behavior, new factors emerge, because conditions are constantly changing. That's why it's a never-ending cycle. You think you know something, and then something unknown enters the picture. And that's why just relying on big data alone increases the chance that we'll miss something, while giving us this illusion that we already know everything.

但並非所有的系統都能被操縱得好好的。若你在量化的系統是動態的，特別是那些有人參與其中的系統，會產生影響的事物複雜又難以預測，我們不太知道怎樣建立這些模型。即使你一時預測了人的行動，又會出現新的要素，因為情況持續在改變。正因如此，這是個永無止境的迴圈。你以為你瞭解了一件事，另一件未知的事物便進入了你的視野。所以純粹依靠大數據便增加了我們錯失的機率，但同時讓我們以為我們無所不知。

And what makes it really hard to see this paradox and even wrap our brains around it is that we have this thing that I call the quantification bias, which is the unconscious belief of valuing the measurable over the immeasurable. And we often experience this at our work. Maybe we work alongside colleagues who are like this, or even our whole entire company may be like this, where people become so fixated on that number, that they can't see anything outside of it, even when you present them evidence right in front of their face. And this is a very appealing message, because there's nothing wrong with quantifying; it's actually very satisfying. I get a great sense of comfort from looking at an Excel spreadsheet, even very simple ones.

為什麼我們很難發現這個矛盾，甚至也很難去理解它，是因為我們有我所謂的「量化成見」，也就是無意識地認為可量化的比不可量化的更有價值。我們工作時常有這樣的經驗。或許我們和這樣想的同事一起工作，或者整個公司都這樣想，人們過於迷戀數字，以至於看不見除此之外的任何東西，即使你將證據貼到他們臉上，給他們看。這是個十分吸引人的訊息，因為量化並沒有錯；量化事實上很讓人滿意。我看著 Excel 電子表格就覺得安心，即使是很簡單的也一樣。

(Laughter)

（笑聲）

It's just kind of like, "Yes! The formula worked. It's all OK. Everything is under control."

那種感覺就是，「好的！方程式沒問題。一切都很好。都在掌控之中。」

But the problem is that quantifying is addictive. And when we forget that and when we don't have something to kind of keep that in check, it's very easy to just throw out data because it can't be expressed as a numerical value. It's very easy just to slip into silver-bullet thinking, as if some simple solution existed. Because this is a great moment of danger for any organization, because oftentimes, the future we need to predict -- it isn't in that haystack, but it's that tornado that's bearing down on us outside of the barn. There is no greater risk than being blind to the unknown. It can cause you to make the wrong decisions. It can cause you to miss something big.

問題是，量化會使人上癮。我們一旦忘記這件事，若我們沒能做到時時確認是否上癮，我們很容易直接扔掉這樣的資料：僅僅因為它無法用數值量化。很容易認為會有完美解決一切的絶招，就好像有某種簡單的解決方法一樣。因為這對任何一間機構來說，都是危機的重要時刻，時常，我們要預測的未來，並不是在這安穩的草堆裡，而是在它之外，是即將襲擊我們的暴風中心。沒有什麼比對未知一無所知來得有風險，那會使你做出錯誤的決定。那可能使你錯失重要的事物。

But we don't have to go down this path. It turns out that the oracle of ancient Greece holds the secret key that shows us the path forward. Now, recent geological research has shown that the Temple of Apollo, where the most famous oracle sat, was actually built over two earthquake faults. And these faults would release these petrochemical fumes from underneath the Earth's crust, and the oracle literally sat right above these faults, inhaling enormous amounts of ethylene gas, these fissures.

但我們不用這樣做。到頭來，是古希臘的先知握有顯示道路的神秘鑰匙。近年的地質研究顯示，最有名的先知所在的阿波羅神廟，事實上座落在兩個地震斷層上。這些斷層會從地殼下釋出石油煙氣，而那位先知就直接坐在那些斷層上方，從縫隙中吸入數不盡的乙烯氣體。

(Laughter)

（笑聲）

It's true.

那是真的。

(Laughter) It's all true, and that's what made her babble and hallucinate and go into this trance-like state. She was high as a kite!

（笑聲）那都是真的，那就是為什麼她講話含糊不清還看到幻覺，並進入類似出神的狀態。她感覺自己都飛上天了！

(Laughter)

（笑聲）

So how did anyone -- How did anyone get any useful advice out of her in this state? Well, you see those people surrounding the oracle? You see those people holding her up, because she's, like, a little woozy? And you see that guy on your left-hand side holding the orange notebook? Well, those were the temple guides, and they worked hand in hand with the oracle. When inquisitors would come and get on their knees, that's when the temple guides would get to work, because after they asked her questions, they would observe their emotional state, and then they would ask them follow-up questions, like, "Why do you want to know this prophecy? Who are you? What are you going to do with this information?" And then the temple guides would take this more ethnographic, this more qualitative information, and interpret the oracle's babblings. So the oracle didn't stand alone, and neither should our big data systems.

所以大家要怎麼── 大家要怎麼在這個狀態下得到有用的建議？看到那些圍繞先知的人們了嗎？你可以看到那些人支撐著她，因為她好像有點頭昏眼花？有沒有發現她左邊的男子正拿著橘色小冊子？那些是神廟的引導人員，他們與先知密切合作。當有人來下跪詢問時，神廟的引導人員就開始工作了，在來者向先知詢問一些問題後，他們會觀察來者的精神狀態，然後他們會問來者一些後續問題，像是：「為什麼你想知道這個預言？你是誰？你會怎麼運用這個資訊？」接著神廟的引導人員會用人類學的角度來看，用質性資訊的角度來看，然後翻譯先知含糊不清的話。所以先知並非自己承攬一切任務，我們的大數據系統同樣也不該如此。

Now to be clear, I'm not saying that big data systems are huffing ethylene gas, or that they're even giving invalid predictions. The total opposite. But what I am saying is that in the same way that the oracle needed her temple guides, our big data systems need them, too. They need people like ethnographers and user researchers who can gather what I call thick data. This is precious data from humans, like stories, emotions and interactions that cannot be quantified. It's the kind of data that I collected for Nokia that comes in in the form of a very small sample size, but delivers incredible depth of meaning.

我要澄清一下，我並非在說大數據系統在呼吸着乙烯氣體，甚至給予沒用的預測。完全相反。我想說的是，就像先知需要神廟的引導人員那樣，大數據系統同樣也需要。大數據需要人類學家以及用戶研究人員來收集我所謂的「厚數據」── 來自於人們的寶貴數據，像是故事、情緒和互動，這些無法計量的事物。就像我收集給諾基亞的那種數據，數據樣本規模非常小，但傳達的涵義卻極其的深。

And what makes it so thick and meaty is the experience of understanding the human narrative. And that's what helps to see what's missing in our models. Thick data grounds our business questions in human questions, and that's why integrating big and thick data forms a more complete picture. Big data is able to offer insights at scale and leverage the best of machine intelligence, whereas thick data can help us rescue the context loss that comes from making big data usable, and leverage the best of human intelligence. And when you actually integrate the two, that's when things get really fun, because then you're no longer just working with data you've already collected. You get to also work with data that hasn't been collected. You get to ask questions about why: Why is this happening?

它如此厚重、內容豐富的原因是那些從人們的話語中明白更多信息的經驗。這才能幫助我們看到模型裡缺少了什麼東西。厚數據以人類問題為根基來說明經濟問題，這就是為什麼結合大數據和厚數據能讓我們得到的訊息更加完整。大數據能在一定程度上洞悉問題，並最大程度發揮機器智能，而厚數據能幫我們找到那缺失的背景資訊，能讓大數據便於使用，並最大程度發揮人類智能。若你真的把這兩個結合在一起事情就會變得非常有趣，如此一來，運用的就不只是你早就收集的數據。你還可以運用尚未收集的數據。你就可以知道「為什麼」：為什麼會變成這樣？

Now, when Netflix did this, they unlocked a whole new way to transform their business. Netflix is known for their really great recommendation algorithm, and they had this $1 million prize for anyone who could improve it. And there were winners. But Netflix discovered the improvements were only incremental. So to really find out what was going on, they hired an ethnographer, Grant McCracken, to gather thick data insights. And what he discovered was something that they hadn't seen initially in the quantitative data. He discovered that people loved to binge-watch. In fact, people didn't even feel guilty about it. They enjoyed it.

所以說，網飛這樣做就開啟了轉換商業模式的全新方式。網飛以擁有優秀的推薦演算法而聞名，且發給任何能改善系統的人一百萬美元獎金。有人贏了獎金。但網飛發現效能提升還是不夠明顯。為了知道發生了什麼事，他們僱用了人類學家，格蘭特．麥克拉肯，來收集厚數據以準確洞察理解。他發現了網飛最初未能從量化數據中看出來的，他發現人們喜歡刷劇。（註：短時間內狂看電視劇）事實上，人們甚至不覺得有什麼不對。他們非常享受這個過程。

(Laughter)

（笑聲）

So Netflix was like, "Oh. This is a new insight." So they went to their data science team, and they were able to scale this big data insight in with their quantitative data. And once they verified it and validated it, Netflix decided to do something very simple but impactful. They said, instead of offering the same show from different genres or more of the different shows from similar users, we'll just offer more of the same show. We'll make it easier for you to binge-watch. And they didn't stop there. They did all these things to redesign their entire viewer experience, to really encourage binge-watching. It's why people and friends disappear for whole weekends at a time, catching up on shows like "Master of None." By integrating big data and thick data, they not only improved their business, but they transformed how we consume media. And now their stocks are projected to double in the next few years.

網飛覺得：「噢，這是個新洞見。」於是叫他們的數據科學組把這洞察放大到量化數據的規模來衡量。一旦他們再次確認了它的準確性，網飛便決定做一件簡單卻影響很大的事情。他們說：「與其提供不同類型但相似的影集，或是給類似的觀眾欣賞更多不同的影集，只要同一影集提供更多集就好了。我們讓你更容易刷劇。」而他們並沒有止步於此。他們用一樣的方式，重新設計了整個觀眾體驗，來真正地鼓勵大家刷劇。這就是為什麼朋友會消失整個星期，追上「無為大師」等戲劇的進度。結合大數據與厚數據，不只讓產業進步，也轉變了我們使用媒體的型態。預期他們的股票會在接下來幾年內翻倍。

But this isn't just about watching more videos or selling more smartphones. For some, integrating thick data insights into the algorithm could mean life or death, especially for the marginalized. All around the country, police departments are using big data for predictive policing, to set bond amounts and sentencing recommendations in ways that reinforce existing biases. NSA's Skynet machine learning algorithm has possibly aided in the deaths of thousands of civilians in Pakistan from misreading cellular device metadata. As all of our lives become more automated, from automobiles to health insurance or to employment, it is likely that all of us will be impacted by the quantification bias.

這不只是關於看了更多影片，或賣了更多智慧型手機，等等。對於一些公司來說，結合厚數據洞察和演算法，可能讓他們起死回生，特別是那些已被邊緣化的公司。全國的警察局都用大數據來防止犯罪，來設定保證金金額，並用加劇偏見的方式來建議判刑。美國國家安全局的天網學習演算法可能致使幾千名巴基斯坦平民死亡，肇因於錯誤判讀了行動電話的數據。當我們的生活變得更加自動化，從汽車、健康保險或者就業，很可能我們所有人都會受量化偏見的影響。

Now, the good news is that we've come a long way from huffing ethylene gas to make predictions. We have better tools, so let's just use them better. Let's integrate the big data with the thick data. Let's bring our temple guides with the oracles, and whether this work happens in companies or nonprofits or government or even in the software, all of it matters, because that means we're collectively committed to making better data, better algorithms, better outputs and better decisions. This is how we'll avoid missing that something.

好消息是我們從吸入乙烯氣體到做出預測已有長足的進步。我們有了更好的工具，那麽讓我們更好地利用它。讓我們將大數據與厚數據結合。讓我們使神廟的引導人員與先知一起合作，不論做這項工作的是公司、非營利組織、政府，甚至軟體，全部都有其意義，因為這代表我們全體一起努力來得到更好的數據，更好的演算法、更好的產品，以及更好的決定。這就是避免錯失的方法。

(Applause)

（掌聲）

運行模式是這樣的：你把問題告訴她，接著屈膝跪下，然後她就會進入出神狀態。這會花上幾天的時間，最終她會回神，答復你她的預知。

(Laughter)

（笑聲）

It's just kind of like, "Yes! The formula worked. It's all OK. Everything is under control."

那種感覺就是，「好的！方程式沒問題。一切都很好。都在掌控之中。」

(Laughter)

（笑聲）

It's true.

那是真的。

(Laughter) It's all true, and that's what made her babble and hallucinate and go into this trance-like state. She was high as a kite!

（笑聲）那都是真的，那就是為什麼她講話含糊不清還看到幻覺，並進入類似出神的狀態。她感覺自己都飛上天了！

(Laughter)

（笑聲）

(Laughter)

（笑聲）

(Applause)

（掌聲）

Tricia Wang: The human insights missing from big data

Tricia Wang: The human insights missing from big data

Related talks

Mona Chalabi: 3 ways to spot a bad statistic

Sebastian Wernicke: How to use data to make a hit TV show

Mallory Freeman: Your company's data could help end world hunger

Giorgia Lupi: How we can find ourselves in data

Madhumita Murgia: How data brokers sell your identity

Jer Thorp: Make data more human

Related talks

Mona Chalabi: 3 ways to spot a bad statistic

Sebastian Wernicke: How to use data to make a hit TV show

Mallory Freeman: Your company's data could help end world hunger

Giorgia Lupi: How we can find ourselves in data

Madhumita Murgia: How data brokers sell your identity

Jer Thorp: Make data more human