Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

Om du minns internets första decennium, så var det ett väldigt stelt ställe. Man kunde koppla upp sig, besöka webbsidor, och dessa sattes upp av antingen organisationer som hade folk som gjorde det eller av teknikkunniga individer vid den tiden. Och med sociala mediers ökning och sociala nätverk under tidigt 2000-tal, förändrades internet helt och hållet till en plats där majoriteten av innehållet som vi använder läggs upp av vanliga användare, antingen med Youtube-videor eller blogginlägg eller produktrecensioner eller inlägg på sociala medier. Det har också blivit en mycket mer interaktiv plats, där folk interagerar med varandra, de kommenterar, de delar, de läser inte bara.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

Facebook är inte den enda plats där man gör detta men den är störst och kan illustrera siffrorna. Facebook har 1,2 miljarder användare per månad. Så halva jordens internetbefolkning använder Facebook. De är en webbplats som tillsammans med andra har möjliggjort för folk att skapa sig en onlineidentitet med väldigt begränsade teknikkunskaper, och folk svarade med att lägga upp enormt mycket personlig information online. Så resultatet är att vi har behovsmässig, preferensmässig och demografisk data för hundratals miljoner människor, vilket inte tidigare funnits i historien. För en datorforskare som mig betyder detta att jag har kunnat bygga modeller som kan förutse många gömda attribut om alla er som ni inte ens visste att ni delar mer er information om. Som forskare använder vi det för att förbättra sättet som folk interagerar på nätet, men det finns applikationer som är mindre osjälviska och det är ett problem att användarna inte riktigt förstår dessa tekniker och hur de fungerar, och även om de gjorde det, har de inte så stor kontroll över det. Det jag vill prata med er om idag handlar om de saker vi har en möjlighet att göra, och sen ge oss några idéer om hur vi kan röra oss mot att ge tillbaka kontrollen till användarna.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.

Detta är Target, ett företag. Jag satte inte det emblemet på denna stackars gravida kvinnans mage. Du kanske har sett anekdoten som publicerades i Forbes där Target skickade ett flygblad till en 15-årig flicka med reklam och kuponger för nappflaskor, blöjor och spjälsängar två veckor innan hon berättat för sina föräldrar att hon var gravid. Ja, pappan var väldigt upprörd. Han sa, "Hur listade Target ut att denna gymnasietjejen var gravid innan hon berättade för sina föräldrar?". Det visade sig att de hade köphistoriken för tiotusentals kvinnor och beräknat, som de kallar det, en graviditetspoäng, vilket inte bara räknar ut om kvinnan är gravid, utan också när barnet kommer. De räknar inte ut det bara genom att se på det uppenbara, som att hon köper en spjälsäng eller barnkläder, utan saker som att hon köpt mer vitaminer än vanligt eller att hon en handväska som är stor nog att rymma blöjor. Och för sig själva verkar inte dessa inköp avslöja särskilt mycket men det är ett beteendemönster, så när man sätter det i samma kontext som tusentals andra, börjar det visa ett antal insikter. Det är den sortens saker vi gör när vi förutser saker om dig på sociala medier. Vi letar efter små beteendemönster, som visar sig bland miljontals människor, som låter oss få reda på alla möjliga saker.

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

I mitt labb och med kollegor, har vi utvecklat mekanismer som låter oss förutsäga saker rätt precist som dina politiska preferenser din personlighetstyp, kön, sexuell läggning, religion, ålder, intelligens, tillsammans med saker som hur mycket du litar på dina medmänniskor och hur starka dessa relationer är. Vi kan göra allt detta väldigt väl. Återigen, det kommer inte från vad ni tror är uppenbara saker.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.

Mitt favoritexempel från en studie som publicerades i år i Proceedings of the National Academies. Om du googlar så hittar du den. Den är på fyra sidor och lättläst. De såg bara på folks gillningar på Facebook, så bara de saker man gillar på Facebook, och använde det för att förutse alla attribut tillsammans med andra saker. Och i deras text listade de de fem gillningarna som bäst indikerade hög intelligens. Bland dessa fanns en sida om curly fries. (Skratt) Curly fries är jättegoda, men att gilla dem betyder inte nödvändigtvis att du är smartare än en genomsnittlig person. Hur kommer det sig att en av de starkaste indikationerna på ens intelligens är att gilla denna sida med innehåll som är totalt irrelevant för attributet som förutses? Och det visar sig att vi måste se till en massa bakomliggande teorier för att se hur vi kan göra detta. En av dem är en sociologisk teori som kallas homofili, vilket betyder att folk är vänner med folk som liknar dem själva. Så om du är smart känner du ofta andra smarta människor och om du är ung känner du ofta unga människor och detta är etablerat sen hundratals år tillbaka. Vi vet också mycket om hur information sprids genom nätverk. Det visar sig att virala videos eller Facebookgillningar eller annan information sprider som på exakt samma sätt som sjukdomar sprider sig genom sociala nätverk. Detta har vi studerat länge. Vi har bra modeller för det. Och man kan sätta ihop saker och börja se varför såna här saker händer. Om jag ger er en hypotes, skulle det vara att en smart kille startade den här sidan, eller kanske att en av de första som gillade sidan hade fått bra resultat på testet. Och de gillade det, deras vänner såg det, och genom homofili vet vi att han förmodligen hade smarta vänner, och så spreds det till dem, vissa av dem gillade det, och de hade smarta vänner och det spreds vidare, och så fortplanade det sig genom nätverket till en samling smarta människor så till slut blev gillandet av curly fries-sidan en indikation på hög intelligens, inte på grund av innehållet, utan för att själva gillningen reflekterar vanliga attribut hos andra som har gjort det.

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

Det är rätt komplicerade saker, eller hur? Det är svårt att sätta sig och förklara för en typisk användare, och även om man gör det, vad kan den medelanvändaren göra åt det? Hur vet du att du gillat något som indikerar ett av dina drag som är helt irrelevant för innehållet du gillar? Det finns mycket makt som användaren inte har för att kontrollera hur datan används.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

Och jag ser det som ett stort problem i framtiden. Jag tror det finns ett par vägar som borde kolla på om vi vill ge användarna kontroll över hur datan används, för det kommer inte alltid att användas till deras fördel. Ett exempel jag ofta ger är att om jag någonsin tröttnar på att vara professor kommer jag starta ett företag som förutser alla attribut och saker som hur du jobbar i grupp och om du är drogmissbrukare eller alkoholist. Vi vet hur vi ska förutse allt det. Och jag ska sälja rapporter till försäkringsbolag och storföretag som vill anställa dig. Vi kan göra det nu. Jag kan starta företaget imorgon, och du skulle inte ha någon kontroll över hur jag använder datan. Det verkar för mig vara ett stort problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

En av vägarna vi kan gå är politikens och lagens väg. I vissa avseenden tror jag att det skulle vara det mest effektiva, men problemet är att vi faktiskt skulle behöva göra det. Att se hur vår politiska process fungerar får mig att tro att det är högst osannolikt att vi kommer få en massa folkvalda att sätta sig ner och lära sig om detta, och sedan få igenom avgörande förändringar på immaterialrättens lag i USA så att användarna får kontroll över sin data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

Vi kan gå politikens väg, där företag inom sociala medier säger "Vet du vad? Du äger din data. Du har kontroll över hur den används". Problemet är att intäktsmodellerna för flertalet företag inom sociala medier baserar sig på att dela eller utnyttja användarnas data på något sätt. Det påstås ibland om Facebook att användaren inte är kunden, utan de är varan. Hur får man ett företag att ge tillbaka kontrollen över sin främsta tillgång till användarna? Det är möjligt, men jag tror inte att det kommer ändras särskilt snabbt.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

Jag tror att den andra vägen vi kan gå som är mer effektiv är en väg med mer vetenskap. Det är genom vetenskapen som vi kan utveckla alla dessa mekanismer för beräkningar av personliga data från första början. Undersökningarna är faktiskt väldigt lika de som vi måste göra om vi vill utveckla mekanismer som kan säga till användaren, "Det här är risken med den handling du nyss begick". Genom att gilla den där Facebooksidan eller genom att dela en del personlig information, har du förbättrat min möjlighet att förutse om du använder droger eller inte eller om du kommer bra överens med folk på jobbet eller inte. Det tror jag kan påverka om folk vill dela något, hålla det för sig själv, eller bara vara offline helt och hållet. Vi kan också kika på saker som att låta folk kryptera uppladdade data, så det är osynligt och värdelöst för webbplatser som Facebook eller tredjepartstjänster som använder det, men den utvalda personen som valts av uppladdaren kommer att ha möjlighet att se det. Detta är superspännande forskning från ett intellektuellt perspektiv, och forskare kommer vilja göra det. Det ger oss ett övertag över lagsidan.

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

Ett av problemen som folk tar upp när jag pratar om detta är, "Du vet, om folk börjar hålla all data privat, så kommer alla metoder som du utvecklat som förutser deras karaktärsdrag att misslyckas." Då säger jag "Absolut, och för mig är det en succé", för som forskare är mitt mål inte att få tag på information om användarna, utan att förbättra interaktionen mellan folk online. Och ibland innebär det att dra slutsatser om dem, men om användare inte vill ge mig tillgång till den informationen, tycker jag att de har rätt att göra det. Jag vill att användare ska vara informerade och samtyckande användare av verktygen som vi utvecklar.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Och jag tycker att vi genom att uppmuntra denna sorts vetenskap och att stötta forskare som vill ge tillbaka kontrollen till användarna från företagen inom sociala medier betyder att vi går framåt när dessa verktyg utvecklas och går framåt, och vi kommer få en utbildad och kraftfull användarbas, och jag tror vi är överens om att det är rätt väg att gå framåt.

Thank you.

Tack.

(Applause)

(Applåder)

Thank you.

Tack.

(Applause)

(Applåder)

Jennifer Golbeck: Your social media "likes" expose more than you think

Jennifer Golbeck: Your social media "likes" expose more than you think

Related talks

Del Harvey: Protecting Twitter users (sometimes from themselves)

Johanna Blakley: Social media and the end of gender

Juan Enriquez: Your online life, permanent as a tattoo

Susan Etlinger: What do we do with all this big data?

Tamas Kocsis: The case for a decentralized internet

Zeynep Tufekci: We're building a dystopia just to make people click on ads

Related talks

Del Harvey: Protecting Twitter users (sometimes from themselves)

Johanna Blakley: Social media and the end of gender

Juan Enriquez: Your online life, permanent as a tattoo

Susan Etlinger: What do we do with all this big data?

Tamas Kocsis: The case for a decentralized internet

Zeynep Tufekci: We're building a dystopia just to make people click on ads