Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.
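As a rough illustration of how several weak purchase signals can add up to one score, here is a minimal sketch using naive Bayes in log-odds form. It is not Target's actual method, which the talk does not describe; the item names, the likelihood ratios, and the prior are all invented for the example.

```python
import math

# Hypothetical likelihood ratios: how much more common each purchase is
# among the group of interest than among everyone else. Invented numbers.
LIKELIHOOD_RATIOS = {
    "extra_vitamins": 2.0,
    "large_handbag": 1.5,
}


def pattern_score(purchases, prior=0.05):
    """Combine weak signals into one probability (naive Bayes, log-odds form).

    `prior` is an assumed base rate; items without a known ratio are neutral.
    """
    log_odds = math.log(prior / (1 - prior))
    for item in set(purchases):
        log_odds += math.log(LIKELIHOOD_RATIOS.get(item, 1.0))
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    print(pattern_score([]))                                   # the prior alone
    print(pattern_score(["extra_vitamins"]))                   # one weak signal
    print(pattern_score(["extra_vitamins", "large_handbag"]))  # signals combined
```

No single purchase moves the score much, but each one multiplies the odds, which is the sense in which a pattern across many small behaviors becomes revealing.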

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. 
And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
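The hypothesis above can be sketched as a small simulation: build a friendship network where people mostly befriend others who share a trait, start a like at one person who has the trait, and let it spread from friend to friend. This is a toy model under stated assumptions, not the method of the study mentioned in the talk; the function name, the network construction, and every parameter value are hypothetical.

```python
import random


def simulate_like_spread(n=200, friends_per_person=6, homophily=0.9,
                         share_prob=0.5, seed=0):
    """Spread a 'like' through a homophilous network.

    Returns (share of likers who have the trait, number of likers).
    Half of the population has the trait, so 0.5 is the baseline share.
    """
    rng = random.Random(seed)
    # Even-numbered people have the trait, odd-numbered people do not.
    trait = [i % 2 == 0 for i in range(n)]
    by_trait = {
        True: [i for i in range(n) if trait[i]],
        False: [i for i in range(n) if not trait[i]],
    }
    # Homophily: each friendship goes to a same-trait person with
    # probability `homophily`, otherwise to someone from the other group.
    friends = {i: set() for i in range(n)}
    for i in range(n):
        for _ in range(friends_per_person):
            same_group = rng.random() < homophily
            pool = by_trait[trait[i]] if same_group else by_trait[not trait[i]]
            j = rng.choice(pool)
            if j != i:
                friends[i].add(j)
                friends[j].add(i)
    # Person 0 (who has the trait) likes the page first; each friend who
    # sees it likes it too with probability `share_prob`.
    likers = {0}
    frontier = [0]
    while frontier:
        next_frontier = []
        for person in frontier:
            for friend in sorted(friends[person]):
                if friend not in likers and rng.random() < share_prob:
                    likers.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    share = sum(trait[i] for i in likers) / len(likers)
    return share, len(likers)


if __name__ == "__main__":
    share, count = simulate_like_spread()
    print(f"{count} likers, share with the trait: {share:.2f} (baseline 0.50)")
```

When the trait share among likers ends up above the 0.5 baseline, the like carries information about the trait even though the page's content has nothing to do with it.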

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
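One way to picture a "here's the risk of that action" mechanism is to run a predictor twice, once without the new like and once with it, and show the user the difference. The sketch below assumes a simple logistic model; the page names, weights, and bias are hypothetical and do not come from any real system.

```python
import math

# Hypothetical log-odds weights a predictor might have learned per page.
WEIGHTS = {"page_a": 0.8, "page_b": -0.3, "page_c": 1.2}
BIAS = -2.0  # assumed baseline log-odds for the predicted attribute


def predict(likes):
    """Probability of the attribute given a set of liked pages."""
    z = BIAS + sum(WEIGHTS.get(page, 0.0) for page in set(likes))
    return 1 / (1 + math.exp(-z))


def risk_of_action(current_likes, new_like):
    """Return (before, after, change) in the prediction if `new_like` is added."""
    before = predict(current_likes)
    after = predict(set(current_likes) | {new_like})
    return before, after, after - before


if __name__ == "__main__":
    before, after, change = risk_of_action({"page_a"}, "page_c")
    print(f"before: {before:.2f}, after: {after:.2f}, change: {change:+.2f}")
```

A tool built on this idea could show the change before the like is posted, so the user can decide whether to share it, keep it private, or skip it.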

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Thank you.

(Applause)
