Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

如果你還記得網路出現的頭十年，當時是一個很靜態的環境。你可以上網、瀏覽網頁，這些網站或許是由一些機構製作，這些機構有自己的團隊，或是當時很懂科技的人製作的。隨著社交媒體、社交網路在 21 世紀初期的興起，網路世界完全改變了。現在的網路有很多內容我們互動的內容是由網路用戶放上網的，不管是 YouTube 上的影片或者部落格，抑或是商品評價或者社交媒體的文章。除此之外，網路也多了很多互動。人們在網絡上互動，他們評論、分享，而不僅是看看而已。

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

臉書不是唯一一個能做這些事的網站，但它是最大的，也最能用數字說明問題。臉書每個月的用戶高達 12 億。也就是說全球一半的網民都在使用臉書。這個網站，還有其他的網站，讓網民能創建網路上的個人形象而且無需太多的技術即可操作。用戶反應熱烈，上傳大量的個人訊息到網路上。這樣一來我們就有了數億人的行為、偏好和人口統計數據，這是史無前例的。作為電腦科學家，這就意味著我可以建立很多模型用來推測各種隱藏特性，而你們自己可能都不知道你們分享的訊息透露了這些特性。科學家利用這些數據來改善網民在網路上的互動，但也有一些沒那麼利他的應用，我們面臨一個問題，那就是網路用戶並不真正了解這些技術和它們的運作原理，而且即使他們懂，也沒什麼辦法控制。所以我今天想和你們分享的，是我們現在能做到的一些事情，然後給大家一些想法，看看我們如何才能把部分控制權交回到網路用戶的手上。

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.

這個是 Target 公司。我不是沒事把 Target 的標誌放在這個可憐孕婦的肚子上。你可能讀過一個小故事，刊登在富比士雜誌。故事提到 Target 發了張傳單給一位 15 歲的女孩，上面的廣告和折價券都是嬰兒奶瓶、尿布、嬰兒床的，而這是在她告訴父母自己懷孕之前兩週的事。是的，她的父親非常生氣。他說：「Target 怎麼會在這個高中女生告訴父母之前，就知道她懷孕了？」原來，Target 有數十萬名顧客的購物記錄，而且他們會計算一個叫做「懷孕分數」的數值，這個分數不只推算一位女性是否懷孕，還推算她的預產期。他們計算的依據不是那些很明顯的資訊，比如說購買了嬰兒床、嬰兒服，而是像她買了比平時多的維他命，或者是她買了一個大小足夠放下尿布的包包。單獨來看，這些購物行為似乎透露不了什麼，但這是一種行為模式，當你把它和成千上萬人的資料放在一起看，就能開始看出一些端倪。所以這就是我們在社群媒體上推測與你們相關的事情時所做的。我們要找的是細微的行為模式，當你在上百萬人身上偵測到這些模式，我們就能查出各式各樣的事情。

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

所以我和實驗室的同事們，開發了多種機制，幫助我們較精確地推斷很多事情，像是你的政治傾向、性格測試分數、性別、性取向、宗教信仰、年齡、智力，同時還有像是你對認識的人有多信任、你們的關係有多緊密等。所有這些我們都可以做得很好。而且，這些都不是來自於你會認為是明顯的訊息。

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.

我最喜歡舉的一個例子是一個今年發表的研究刊在《美國國家科學院院刊》上。 Google 一下就能查到。研究只有四頁紙，很容易讀。他們僅是研究了用戶在臉書的點讚，只是你在臉書上點讚的內容，用這些點讚的內容來推斷所有這些特性，以及其他的資訊。在論文中，他們列出了五個最能表明高智商的讚。這其中還包括到炸馬鈴薯圈頁面點讚。（笑聲）炸馬鈴薯圈是好吃，但是到這頁面按讚不表示你就比一般人聰明。到底為什麼，最能體現你智商指數的指標之一是到一個頁面按讚，即使頁面的內容完全無關於要推斷的特性？結論是，我們需要參考很多背後的理論來了解為什麼我們能夠做到這點。其中一個就是社會學理論，叫同質相吸，指的是人們通常和與自己相像的人交朋友。所以如果你聰明，你會和聰明的人交朋友，如果你年輕，你會和年輕人交朋友，這個理論數百年來早已獲得驗證。我們還知道很多關於訊息如何在網絡中傳播的事。我們發現病毒影片、臉書按讚或是其他訊息傳播的方式完全和疾病透過社交網絡傳播的方式一樣。這是我們研究了很長時間的東西，我們有很好的模型。所以如果你們把這些模型都放在一起，就能了解為何這樣的事情會發生了。如果要給各位一個假設，那就是一個聰明的人建立了這個粉絲頁，或者剛開始幾個去按讚的人在智力測試上得了高分，他們給這個頁面點了讚，當他們的朋友看見了，根據同質相吸的原理，我們知道這些人的朋友可能也很聰明，當訊息傳給他們，有些人也會給這個頁面點讚，而他們又有聰明的朋友，訊息接著傳出去，這樣一來，就在網絡中傳開了，傳給一群聰明的人，如此，到最後給炸馬鈴薯圈頁面點讚的行為就成了高智商的指標，並不是因為頁面的內容，而是因為點讚的這一行為反映了做這件事情的人的共同特性。
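
The hypothesis above (homophily plus network spread) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the method of the study mentioned in the talk: the function name `simulate_like_spread`, the random network model, and every parameter value are hypothetical choices made for the example. Each person has a hidden yes/no trait, picks most friends from people who share it, and passes a "like" on to friends with a probability that ignores the trait entirely.

```python
import random


def simulate_like_spread(n_people=2000, n_friends=10, homophily=0.9,
                         like_prob=0.3, rounds=5, seed=0):
    """Toy cascade of a 'like' through a homophilous friendship network.

    Returns (share of likers with the trait, share of everyone with the trait).
    All parameters are illustrative assumptions, not values from the talk.
    """
    rng = random.Random(seed)

    # Hidden binary trait (for example "scores high on some test").
    trait = [rng.random() < 0.5 for _ in range(n_people)]
    by_trait = {
        True: [i for i in range(n_people) if trait[i]],
        False: [i for i in range(n_people) if not trait[i]],
    }

    # Homophily: each friend comes from the same-trait group with
    # probability `homophily`, otherwise from the other group.
    friends = []
    for i in range(n_people):
        picks = []
        for _ in range(n_friends):
            group = trait[i] if rng.random() < homophily else not trait[i]
            picks.append(rng.choice(by_trait[group]))
        friends.append(picks)

    # The first liker is assumed to have the trait, as in the hypothesis.
    liked = {by_trait[True][0]}
    frontier = list(liked)
    for _ in range(rounds):
        next_frontier = []
        for person in frontier:
            for friend in friends[person]:
                # Liking is content-blind: same probability for everyone.
                if friend not in liked and rng.random() < like_prob:
                    liked.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier

    share_among_likers = sum(trait[p] for p in liked) / len(liked)
    share_overall = sum(trait) / n_people
    return share_among_likers, share_overall


if __name__ == "__main__":
    likers, everyone = simulate_like_spread()
    print(f"trait share among likers: {likers:.2f}, overall: {everyone:.2f}")
```

In this sketch the like carries no information about the trait by itself; any gap between the two shares comes only from who is friends with whom. Setting `homophily=0.5` removes the friendship bias, which is a simple way to see how much of the signal depends on it.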

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

所以這還是挺複雜的，是吧？要坐下來跟普通用戶解釋是困難的，而且即使我們分析了，對普通用戶們又有什麼用呢？你們怎麼知道到某個粉絲頁按讚能夠反映出你的特性，而這特性又和你按讚的內容完全無關呢？很多的權力用戶都沒有，他們沒法控制這些數據的使用。我認為這是我們繼續發展所面臨的真正困難。

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

所以我想到了幾條途徑我們可以參考，看能不能給用戶一些控制這些數據的方法。因為這些數據並不總是能替用戶帶來益處。我常舉例說，如果我厭倦當教授，我要開個公司去推斷所有這些用戶特性，像是你的團隊合作、嗑不嗑藥、是不是酒鬼。我們知道如何去推斷這些訊息。接著我就要把這些報告賣給人力資源公司或者大企業就是那些將要雇你的人。我們現在完全可以做到這些。我明天就可以開始做，而且你完全沒法控制我這樣使用數據的行為。這在我看來是一個問題。

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

所以我們能選擇的其中一條途徑就是政策和法律的制定。在某種程度上，我認為這將是最有效的方法，但問題是我們必須得實際執行。透過觀察我們的政治進程，讓我意識到我們很難集合一群代表，讓他們坐下來了解這件事，然後開始進行大規模改變，修改美國的知識產權法律以讓用戶有權控制他們的數據。

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

我們可以走政策道路，讓社群公司表態，「好，你們擁有自己的數據。你們能完全地控制對它們的使用。」問題在於多數社交媒體的收益模式某種程度上仰賴分享或利用用戶的數據。有人說臉書的用戶不是顧客，而是產品。所以你怎麼可能讓一間公司放棄對他們主要收入的控制把控制權還給用戶呢？這是有可能的，但我不認為我們能很快看到這一改變。

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

所以我認為另外一條途徑，一條更有效的途徑，是更多的科學研究。正是透過科學，我們當初才能開發出所有這些計算個人數據的機制。事實上，如果我們要開發一些機制，可以對用戶說「這是你剛才所做的行為要面臨的風險」，需要做的也是非常類似的研究。藉由到那個臉書頁面按讚，或者是分享這項私人資訊，你現在給了我更多能力去推斷你是否嗑藥或者你是否和同事相處融洽。我認為這些會影響人們是否願意分享事情、設為隱私，或者是完全不放上網路。我們還可以研究一些做法，像是讓用戶可以加密他們上傳的數據，這樣對臉書這類網站或存取這些數據的第三方服務來說，數據是隱形而且無用的，但發布者指定的用戶仍然可以看到。從知識的角度去看，這些都是非常令人興奮的研究，所以科學家會願意去做。這讓我們比起法律的途徑更有優勢。

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

當我談到這個的時候，人們常會提出一個疑問，他們說，如果人們開始把這些數據都保密了，你們一直在開發的這些用來推斷他們特性的方法都將失效。我回答說，完全正確，但對我來說，那就是成功。因為身為一名科學家，我的目標不是要推斷用戶的資訊，而是要改進人們在網路互動的方式。有時候這包括推斷關於他們的事情，但如果用戶不想要我使用這些數據，我認為他們有權利這麼做。我希望用戶們是在知情且同意的情況下使用我們開發的工具。

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

所以，我認為推廣這類科學、支持那些希望把部分控制權從社群媒體公司手中交回到用戶手中的研究者，意味著隨著這些工具進化和發展，我們將有一群有知識、有能力的用戶。我相信大家都會認同，這是相當理想的前進方式。謝謝。

Thank you.

（掌聲）

(Applause)

Related talks

Del Harvey: Protecting Twitter users (sometimes from themselves)

Johanna Blakley: Social media and the end of gender

Juan Enriquez: Your online life, permanent as a tattoo

Susan Etlinger: What do we do with all this big data?

Tamas Kocsis: The case for a decentralized internet

Zeynep Tufekci: We're building a dystopia just to make people click on ads
