Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. 
And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
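
The hypothesis above can be sketched as a small simulation. This is only an illustration, not the method of the study mentioned in the talk: the network size, the homophily level, the probability of liking, and the function names (`build_network`, `spread_like`, `trait_share_among_likers`) are all assumptions made for the example. Half of the simulated people have some trait, friendships mostly form within the same group, and a like starts with one person who has the trait and spreads for a few hops.

```python
import random


def build_network(n, k, homophily, rng):
    """Build an undirected friendship graph with homophily.

    People 0 .. n/2 - 1 have the trait; the rest do not. Each person picks
    k friends, choosing from their own group with probability `homophily`.
    """
    has_trait = [i < n // 2 for i in range(n)]
    groups = {
        True: [i for i in range(n) if has_trait[i]],
        False: [i for i in range(n) if not has_trait[i]],
    }
    friends = [set() for _ in range(n)]
    for person in range(n):
        for _ in range(k):
            same = rng.random() < homophily
            group = has_trait[person] if same else not has_trait[person]
            other = rng.choice(groups[group])
            if other != person:
                friends[person].add(other)
                friends[other].add(person)
    return has_trait, friends


def spread_like(friends, starter, p_like, hops, rng):
    """Spread a like outward from `starter` for a fixed number of hops.

    Each new liker gives every friend who has not liked yet one chance
    to like the page, with probability `p_like`.
    """
    liked = {starter}
    frontier = [starter]
    for _ in range(hops):
        next_frontier = []
        for person in frontier:
            for friend in sorted(friends[person]):
                if friend not in liked and rng.random() < p_like:
                    liked.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return liked


def trait_share_among_likers(n=2000, trials=100, seed=0):
    """Fraction of likers who have the trait, pooled over many cascades."""
    rng = random.Random(seed)
    has_trait, friends = build_network(n, k=5, homophily=0.9, rng=rng)
    with_trait = total = 0
    for _ in range(trials):
        starter = rng.randrange(n // 2)  # the first liker has the trait
        liked = spread_like(friends, starter, p_like=0.3, hops=3, rng=rng)
        with_trait += sum(has_trait[p] for p in liked)
        total += len(liked)
    return with_trait / total


if __name__ == "__main__":
    share = trait_share_among_likers()
    print(f"Share of likers with the trait: {share:.2f} (population share: 0.50)")
```

Under these assumptions the likers are drawn mostly from the starter's group, so the act of liking carries information about the trait even though the page itself has nothing to do with it. Letting the like spread for many more hops would dilute that signal as it crosses into the rest of the network.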

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third-party services that access it, but the select users whom the person who posted it wants to see it can still access it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
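
One way to picture a "here's the risk of that action" message is a Bayes' rule update: how much does a single like move a predictor's estimate that a user has some attribute? The sketch below is a minimal illustration under stated assumptions. The function names and all of the probabilities are hypothetical, chosen only for the example; they are not figures from the talk or from any real system.

```python
def posterior_after_like(prior, p_like_given_trait, p_like_given_no_trait):
    """Bayes' rule: probability of the trait after observing one like."""
    numerator = p_like_given_trait * prior
    evidence = numerator + p_like_given_no_trait * (1 - prior)
    return numerator / evidence


def risk_message(page, trait, prior, p_like_given_trait, p_like_given_no_trait):
    """Build a plain-language warning about what a like would reveal."""
    posterior = posterior_after_like(
        prior, p_like_given_trait, p_like_given_no_trait
    )
    return (
        f"Liking '{page}' would move the estimated chance that you are "
        f"'{trait}' from {prior:.0%} to {posterior:.0%}."
    )


if __name__ == "__main__":
    # Hypothetical numbers: 10% of users have the trait; 30% of users with
    # the trait like the page, versus 5% of users without it.
    print(risk_message("Some Page", "some attribute", 0.10, 0.30, 0.05))
```

With these made-up inputs the estimate rises from 10% to 40%, since 0.03 / (0.03 + 0.045) = 0.4. A real tool would need the conditional probabilities from an actual model, which is the research the talk is calling for.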

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Thank you.

(Applause)
