Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there are less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give you some ideas of how we might go forward to move some control back into the hands of users.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally did, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, let us find out all kinds of things.
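
As an illustration only (this is not Target's method, which the talk does not describe in detail), a score of this kind could be sketched as a small logistic model over purchases that rise above a shopper's usual level. The weights, the baseline value, and the item names below are hypothetical, and the sketch ignores the due-date estimate mentioned above.

```python
import math

# Hypothetical values, invented for this sketch; a real system would learn them from data.
SIGNAL_WEIGHTS = {"vitamins": 1.5, "large_handbag": 1.0}
BASELINE_LOG_ODDS = -3.0


def pregnancy_score(usual_basket, recent_basket):
    """Return a value in (0, 1) based on purchases that rose above the shopper's norm."""
    log_odds = BASELINE_LOG_ODDS
    for item, weight in SIGNAL_WEIGHTS.items():
        # A signal counts only when the recent quantity exceeds the usual quantity.
        if recent_basket.get(item, 0) > usual_basket.get(item, 0):
            log_odds += weight
    # The logistic function maps the summed evidence to a score between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


usual = {"vitamins": 1, "large_handbag": 0}
no_change = pregnancy_score(usual, {"vitamins": 1})
one_signal = pregnancy_score(usual, {"vitamins": 3})
both_signals = pregnancy_score(usual, {"vitamins": 3, "large_handbag": 1})
print(no_change, one_signal, both_signals)
```

No single purchase settles anything here; the score only climbs as several weak signals line up, which is the point the anecdote makes.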

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this has been well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spread in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that they probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
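
The hypothesis above can be sketched as a toy simulation. Everything in it is an assumption made for illustration: a 20-person network in two groups, friendships that stay almost entirely inside a group (homophily), and a spread rule where everyone within two friendship steps of the first liker also likes the page. It is not the model used in the study.

```python
from collections import deque

GROUP_SIZE = 10  # people 0..9 have the trait ("smart"); people 10..19 do not


def build_network(group_size=GROUP_SIZE):
    """Two groups; friendships stay inside a group except for one bridge."""
    friends = {person: set() for person in range(2 * group_size)}

    def link(a, b):
        friends[a].add(b)
        friends[b].add(a)

    for start in (0, group_size):
        for i in range(group_size):
            for step in (1, 2):
                # Each person befriends the next two people in their own group.
                link(start + i, start + (i + step) % group_size)
    link(group_size - 1, group_size)  # the single cross-group friendship
    return friends


def spread_like(friends, seed, hops):
    """Breadth-first spread: everyone within `hops` steps of the seed likes the page."""
    likers = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        person, distance = frontier.popleft()
        if distance == hops:
            continue
        for friend in friends[person]:
            if friend not in likers:
                likers.add(friend)
                frontier.append((friend, distance + 1))
    return likers


def has_trait(person):
    return person < GROUP_SIZE


friends = build_network()
likers = spread_like(friends, seed=0, hops=2)
share_among_likers = sum(has_trait(p) for p in likers) / len(likers)
base_rate = sum(has_trait(p) for p in friends) / len(friends)
print(share_among_likers, base_rate)
```

In this toy network 9 of the 10 likers have the trait, against a base rate of one half, even though the page itself never enters the computation. Only who started it and who they know does.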

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third-party services that access it, but so that the select users the person who posted it wants to see it still have access. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
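
One way to picture the "here's the risk of that action" idea is a single Bayes' rule update showing how much one like shifts a predicted probability. This is a sketch under stated assumptions, not a mechanism from the talk: the function names are made up, and the three input probabilities are hypothetical numbers that a real tool would have to estimate from data.

```python
def updated_probability(prior, p_like_if_trait, p_like_if_not):
    """Bayes' rule: probability of the trait after observing the like."""
    evidence = prior * p_like_if_trait + (1 - prior) * p_like_if_not
    return prior * p_like_if_trait / evidence


def risk_warning(trait, prior, p_like_if_trait, p_like_if_not):
    """Build the message a user could see before confirming the like."""
    after = updated_probability(prior, p_like_if_trait, p_like_if_not)
    return (
        f"Liking this page moves the estimate for '{trait}' "
        f"from {prior:.0%} to {after:.0%}."
    )


# Hypothetical inputs: 10% of users have the trait; 30% of them like the
# page, compared with 5% of everyone else.
message = risk_warning("uses drugs", 0.10, 0.30, 0.05)
print(message)
```

With these made-up inputs the estimate rises from 10% to 40%, the kind of before-and-after figure that could help someone decide whether to share, keep something private, or keep it offline.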

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Thank you.

(Applause)

Thank you.

(Applause)
