Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there are less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, let us find out all kinds of things.

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

So my favorite example is from this study that was published this year in the Proceedings of the National Academy of Sciences. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this has been well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spread in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that they probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
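
The hypothesis above can be sketched as a toy simulation. This is only an illustration under loudly stated assumptions, not the method used in the study: every person gets a made-up trait score equal to their index, homophily is exaggerated so that people are friends only with those of nearly the same score, and the page is started by the highest-scoring person. The function name, parameters, and numbers are all hypothetical.

```python
import random


def simulate_like_cascade(n_people=200, friend_radius=5, rounds=10,
                          like_probability=0.5, seed=0):
    """Toy cascade: return the set of people who end up liking the page."""
    rng = random.Random(seed)

    def friends(person):
        # Exaggerated homophily (an assumption): your friends are exactly
        # the people whose trait score is within friend_radius of yours.
        low = max(0, person - friend_radius)
        high = min(n_people - 1, person + friend_radius)
        return [other for other in range(low, high + 1) if other != person]

    # Assumption: the page is started by the highest-scoring person.
    likers = {n_people - 1}
    for _ in range(rounds):
        # Everyone who has a friend that already likes the page sees it...
        exposed = {f for liker in likers for f in friends(liker)} - likers
        # ...and each exposed person likes it with a fixed probability.
        likers |= {p for p in sorted(exposed)
                   if rng.random() < like_probability}
    return likers


N_PEOPLE = 200
likers = simulate_like_cascade(n_people=N_PEOPLE)
population_mean = sum(range(N_PEOPLE)) / N_PEOPLE
liker_mean = sum(likers) / len(likers)
print(f"mean score, whole population: {population_mean:.1f}")
print(f"mean score, people who liked: {liker_mean:.1f}")
```

In this toy world the like can only travel between people with similar scores, so the people who liked the page score well above the population average even though the page itself has nothing to do with the trait. Real networks are far noisier, so the real effect is statistical rather than guaranteed.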

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but the select users whom the poster wants to see it can still access it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
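
One way to picture the "risk of that action" message is a single Bayes' rule update. The sketch below is a hypothetical illustration, not a tool described in the talk: `risk_report` and all of its numbers are made up, and a real predictor would combine many signals rather than one like.

```python
def posterior_after_like(prior, p_like_given_trait, p_like_given_no_trait):
    """Bayes' rule: probability of the trait after observing one like."""
    evidence = (prior * p_like_given_trait
                + (1 - prior) * p_like_given_no_trait)
    return prior * p_like_given_trait / evidence


def risk_report(prior, p_like_given_trait, p_like_given_no_trait):
    """Hypothetical report: how far one like moves a predictor's estimate."""
    posterior = posterior_after_like(
        prior, p_like_given_trait, p_like_given_no_trait)
    return {"before": prior, "after": posterior, "shift": posterior - prior}


# Made-up numbers: 10% of users have the trait; 30% of users with the
# trait like this page, versus 10% of users without it.
report = risk_report(prior=0.10, p_like_given_trait=0.30,
                     p_like_given_no_trait=0.10)
print(f"estimate before the like: {report['before']:.2f}")
print(f"estimate after the like:  {report['after']:.2f}")
```

With these invented inputs, the estimate moves from 0.10 to 0.25. A warning shown to the user could be built on that shift, so they can decide whether the like is worth it.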

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Thank you.

(Applause)
