Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

Ako se sećate prve decenije mreže, bilo je to prilično statično mesto. Mogli ste da se umrežite, da gledate stranice, njih su postavljale ili organizacije koje su imale timove koji su to radili ili pojedinci koji su bili prilično tehnološki napredni za to vreme. Sa usponom društvenih medija i društvenih mreža početkom 2000-ih, internet mreža se potpuno transformisala u mesto gde većinu sadržaja sa kojim komuniciramo postavlja prosečan korisnik, bilo preko video snimaka na Jutjubu ili blog unosa ili recenzija proizvoda ili poruka na društvenim mrežama. Mreža je postala mesto sa mnogo više komunikacije gde se ljudi povezuju jedni sa drugima, komentarišu i dele, a ne samo čitaju.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

Fejsbuk nije jedino mesto gde je ovo moguće, ali svakako jeste najveće i služi kao ilustracija stvarnih cifara. Fejsbuk broji 1,2 milijarde korisnika mesečno. Dakle, gotovo polovina internet korisnika na planeti koristi Fejsbuk. To je internet stranica, koja je, kao i mnoge druge, omogućila ljudima da stvore virtualne ličnosti sa jako malo tehničkih sposobnosti, i ljudi su odreagovali postavljajući ogromne količine ličnih podataka na mrežu. Rezultat toga je da postoje podaci o ponašanju, izborima, demografiji stotina miliona ljudi, što je neviđeno u istoriji. Za mene, kao informatičara, to znači da sam u mogućnosti da stvorim modele koji mogu prognozirati svakakve vrste skrivenih osobina svih vas, za koje ni ne znate da ih zapravo delite. Kao naučnici, mi to koristimo da pomognemo ljudima da komuniciraju na mreži, ali postoje i manje altruistične koristi, i problem je da korisnici zapravo ne razumeju ove metode i kako one funkcionišu, a sve i da razumeju, ne mogu mnogo da ih kontrolišu. Dakle, danas želim da govorim o nekim od stvari koje smo mi u mogućnosti da uradimo i da dam neke ideje o tome kako da vratimo kontrolu korisnicima.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.

Ovo je kompanija Target. Nisam samo stavila taj logo na stomak ove sirote trudnice. Možda ste pročitali anegdotu u Forbs magazinu gde je Target poslao flajer jednoj petnaestogodišnjakinji sa reklamama i kuponima za pelene, cucle i krevetiće, dve nedelje pre nego što je ona rekla svojim roditeljima da je trudna. O da, otac je bio zaista uznemiren. Rekao je: "Kako je Target znao da je ova srednjoškolka trudna, pre njenih roditelja?" Oni zapravo imaju istoriju kupovine za stotine hiljada korisnika i izračunavaju ono što nazivaju trudničkim rezultatom, što nije samo da li je neka žena trudna, nego i kada treba da se porodi. Oni to izračunaju ne samo gledajući očigledne stvari, kao što je kupovina krevetića, dečije odeće, nego i to da je kupovala vitamine više nego inače, ili da je kupila torbu, dovoljno veliku za pelene. Same po sebi ove kupovine ne izgledaju kao da mnogo otkrivaju, ali predstavljaju obrazac ponašanja koji, kada se stavi u kontekst hiljada drugih ljudi počinje da otkriva neke skrivene činjenice. To je ono što mi radimo kada pokušavamo da predvidimo stvari o vama na društvenim mrežama. Tražimo male obrasce u ponašanju koji, kada ih spazite među milionima ljudi, dozvoljavaju da saznamo svakakve stvari.

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

Moje kolege i ja smo u laboratoriji razradili mehanizme gde smo u mogućnosti da sasvim tačno predvidimo vašu političku opredeljenost, rezultat testa ličnosti, rod, seksualnu orijentaciju, versku opredeljenost, godište, nivo inteligencije, kao i koliko imate poverenja u ljude koje poznajete i koliko su te veze jake. Sve ovo mi radimo veoma dobro. Opet, to ne dolazi iz očiglednih informacija, kao što bi bilo za očekivati.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.

Moj omiljeni primer je iz ove studije koja je izdata ove godine u Zborniku Nacionalnih akademija. Ako pretražite internet moći ćete da nađete. Ima četiri strane, lako je za čitanje. Oni su pregledali samo lajkove ljudi na Fejsbuku, dakle samo stvari koje ste lajkovali na Fejsbuku, i iskoristili su ih da predvide sve ove atribute, zajedno sa nekim drugim. U svom izveštaju su nabrojali 5 lajkova koji predstavljaju nagoveštaje visoke inteligencije. Jedan od lajkova je stranica uvijenih prženih krompirića. (Smeh) Uvijeni prženi krompirići su ukusni, ali to što vam se dopadaju ne znači nužno da ste pametniji od prosečne osobe. Kako je onda jedan od bitnijih indikatora vaše inteligencije lajkovanje ove stranice, kada je sadržaj potpuno nebitan u odnosu na atribut koji se predviđa? Ispostavlja se da moramo da uzmemo u obzir gomilu drugih teorija, da bismo saznali kako dolazimo do ovog rezultata. Jedna od teorija je sociološka, zvana homofilija, koja kaže da prijatelji imaju zajednička interesovanja. Ako ste pametni, često su to i vaši prijatelji, ako ste mladi, sprijateljićete se sa drugim mladim osobama i ovo je već davno ustanovljeno. Mi takođe znamo mnogo o tome kako se informacije prenose kroz mreže. Ispostavlja se da se popularni video ili lajkovi na Fejsbuku i druge informacije, prenose identično kao i bolesti kroz društvene mreže. Ovo izučavamo već duže vreme. Imamo dobre modele za to. Kada sve to saberete uvidećete zašto se tako nešto uopšte dešava. Ako bih vam ponudila hipotezu ona bi glasila da je neki bistar momak napravio ovu stranicu, ili da je jedna od prvih osoba koja je lajkovala stranicu imala visoke rezultate na testu inteligencije. Njihovi prijatelji su to videli i na osnovu homofilije, pretpostavljamo da je imao pametne prijatelje, pa se sve prenelo na njih, pa su i oni lajkovali, a i oni su imali pametne prijatelje pa se sve takođe prenelo na njih, pa se sve proširilo kroz mrežu na veliki broj pametnih ljudi, i do kraja je postupak lajkovanja stranice uvijenih krompirića postao indikativan za visoku inteligenciju, ne zbog svog sadržaja, nego zbog toga što čin lajkovanja odražava zajedničke osobine drugih ljudi koji su učinili isto.
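
The curly fries hypothesis can be tried out on a toy network. What follows is only a minimal sketch under stated assumptions, not the model used in the study: the twelve-person population, the friendship ties, and the rule that someone likes the page once at least two friends have liked it are all hypothetical choices made for illustration.

```python
from itertools import combinations


def build_friendships(groups, cross_links):
    """Everyone is friends within their own group (homophily);
    cross_links add a few ties between groups."""
    friends = {person: set() for group in groups for person in group}
    for group in groups:
        for a, b in combinations(group, 2):
            friends[a].add(b)
            friends[b].add(a)
    for a, b in cross_links:
        friends[a].add(b)
        friends[b].add(a)
    return friends


def spread_like(friends, seeds, threshold=2):
    """Repeat until stable: a person likes the page once at least
    `threshold` of their friends have liked it (an assumed rule)."""
    likers = set(seeds)
    while True:
        newly = {p for p in friends
                 if p not in likers and len(friends[p] & likers) >= threshold}
        if not newly:
            return likers
        likers |= newly


# Hypothetical toy population: people 0-5 score high on the test, 6-11 do not.
high_scorers = set(range(6))
groups = [list(range(6)), list(range(6, 12))]
friends = build_friendships(groups, cross_links=[(0, 6), (1, 6)])

# Two high scorers like the page first, then the like spreads to friends.
likers = spread_like(friends, seeds={0, 1})
share_among_likers = len(likers & high_scorers) / len(likers)
share_overall = len(high_scorers) / len(friends)

print(sorted(likers))
print(round(share_among_likers, 3), share_overall)
```

In this toy run, six of the seven people who end up liking the page are high scorers, against half of the population overall, even though nothing about the page itself entered the calculation.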

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

Ovo je prilično komplikovano, zar ne? Nije lako objasniti prosečnom korisniku, a čak i ako uspete, šta prosečan korisnik može da uradi povodom toga? Kako da znate da ste lajkovali nešto što odaje neku vašu osobinu, a koja nema nikakve veze sa sadržajem koji ste lajkovali? Korisnici nemaju mnogo moći da kontrolišu kako se ovi podaci koriste. Ja to vidim kao pravi problem.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

Mislim da postoje dva puta koja želimo da razmatramo ako želimo da korisnicima damo kontrolu nad korišćenjem podataka, jer oni neće uvek biti korišćeni u njihovu korist. Jedan primer koji često dajem je da, ako mi ikada dosadi da budem profesor, osnovaću kompaniju koja predviđa sve ove osobine, npr. koliko dobro radite u timovima, da li koristite droge, da li ste alkoholičar. Znamo kako sve to da predvidimo. I prodavaću izveštaje HR kompanijama i velikim firmama koje žele da vas zaposle. To apsolutno možemo sada da uradimo. Mogla bih sutra da otvorim tu kompaniju, a vi uopšte ne biste imali kontrolu nad tim kako ja koristim vaše podatke. Po mom mišljenju, ovo je problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

Jedan od puteva kojim možemo da pođemo je put pravila i zakona. Donekle, mislim da bi to bilo i najefikasnije, ali problem je u tome što bi to trebalo i da uradimo. Gledajući kako naš politički proces funkcioniše, izgleda mi malo verovatno da će gomila političara da sedne i nauči nešto o ovome, a onda sprovede korenite promene u zakonu o intelektualnoj svojini u SAD-u, kako bi korisnici kontrolisali svoje podatke.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

Mogli bismo da idemo putem politike, gde kompanije društvenih medija kažu: "Vi posedujete svoje podatke. Imate potpunu kontrolu nad njihovim korišćenjem." Problem je u tome što se modeli poslovanja većine kompanija društvenih medija oslanjaju na deljenje i iskorišćavanje podataka korisnika na neki način. Za Fejsbuk se nekada kaže da korisnici nisu klijenti, nego su proizvod. I kako da navedete neku kompaniju da kontrolu nad svojim glavnim resursom vrati korisnicima? Moguće je, ali mislim da nije nešto što će se brzo promeniti.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

Mislim da je drugi put kojim možemo da krenemo, i koji je efektniji, onaj sa više nauke. Primena nauke nam je omogućila da uopšte razvijemo sve ove mehanizme izračunavanja ovih ličnih podataka. To istraživanje je veoma slično onom koje bismo morali da sprovedemo ako bismo želeli da razvijemo mehanizme koji bi korisniku rekli: "Ovo je rizik akcije koju ste upravo sproveli". Lajkovanjem te Fejsbuk stranice ili deljenjem te lične informacije poboljšali ste moju sposobnost da predvidim da li koristite droge ili da li se dobro slažete sa kolegama na poslu. I mislim da će to uticati na odluku ljudi da podele nešto, da zadrže u privatnosti ili uopšte ne postave na internet. Možemo posmatrati i dozvoljavanje ljudima da šifrom zaštite podatke koje postavljaju, da bi bili nevidljivi i beskorisni sajtovima kao što je Fejsbuk ili servisima trećeg lica koji im pristupaju, ali da mogu da vide samo odabrani korisnici, za koje osoba koja postavlja sadržaj, želi da vide. Ovo je veoma uzbudljivo istraživanje, sa intelektualnog stanovišta, i naučnici će želeti time da se bave. To nam daje prednost u odnosu na zakon.
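
A "here's the risk of that action you just took" message could, in principle, be a before-and-after comparison of a prediction. The sketch below is a rough illustration under loud assumptions: it treats each like as independent evidence (a naive Bayes simplification), and the prior and the likelihood ratios are made-up numbers, not values from any real model or dataset. The function names are hypothetical.

```python
import math


def posterior(prior, likelihood_ratios):
    """Combine a prior probability with independent pieces of evidence,
    each expressed as a likelihood ratio, by adding log-odds."""
    log_odds = math.log(prior / (1 - prior))
    log_odds += sum(math.log(ratio) for ratio in likelihood_ratios)
    return 1 / (1 + math.exp(-log_odds))


def risk_of_action(prior, existing_ratios, new_ratio):
    """Return the predicted probability before and after one more like."""
    before = posterior(prior, list(existing_ratios))
    after = posterior(prior, list(existing_ratios) + [new_ratio])
    return before, after


# Hypothetical numbers: a 20% base rate for some trait, one earlier like
# that doubles the odds, and a new like that triples them.
before, after = risk_of_action(prior=0.2, existing_ratios=[2.0], new_ratio=3.0)
print(f"before this like: {before:.2f}, after this like: {after:.2f}")
```

The gap between the two numbers is what such a warning would show the user before they decide whether to share, keep something private, or keep it offline.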

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

Jedan od problema koji ljudi iznose kada govorim o ovome je, da ako ljudi počnu da podatke drže u tajnosti, sve metode koje sam ja razvijala za predviđanje njihovih osobina će propasti. Ja se apsolutno slažem, za mene je to uspeh, jer kao naučniku, meni nije cilj da nagađam o podacima korisnika, nego da poboljšam način na koji komuniciraju na internetu. Ponekad to uključuje zaključivanje, ali ako korisnici ne žele da koristim njihove podatke, mislim da bi trebalo da imaju pravo na to. Želim da korisnici budu informisani i da pristanu da koriste alate koje razvijamo.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Mislim da podsticanje ovakve nauke i podržavanje istraživača koji žele da deo te kontrole vrate korisnicima i oduzmu od kompanija društvenih medija, znači napredovanje, kako se ti alati razvijaju i napreduju, znači da ćemo imati obrazovanu i osnaženu bazu korisnika, i mislim da se slažemo da je to idealan način za napredovanje.

Thank you.

Hvala vam.

(Applause)

(Aplauz)
