Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. It's a site that, along with others, has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, and demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to improve the way people interact online, but there are less altruistic applications, and the problem is that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over them. So what I want to talk to you about today is some of these things that we're able to do, and then give some ideas of how we might go forward to move some control back into the hands of users.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, let us find out all kinds of things.
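The talk doesn't say how Target's score is actually computed. As a rough illustration of the idea that weak signals become telling when measured against a shopper's own habits ("more vitamins than she normally had"), here is a minimal Python sketch. The category names, the 1.5x threshold, and the weights are all hypothetical, chosen only to mirror the vitamins and handbag examples above; this is not Target's model.

```python
from statistics import mean

# Hypothetical weights for purchase categories; invented for illustration,
# NOT Target's actual model.
SIGNAL_WEIGHTS = {"vitamins": 1.0, "large_handbags": 0.7}


def deviation_signals(history: dict[str, list[int]],
                      current: dict[str, int],
                      factor: float = 1.5) -> set[str]:
    """Return categories bought noticeably more than the shopper's own baseline.

    history maps a category to past per-period purchase counts;
    current maps a category to this period's count. A category with no
    history has a baseline of 0, so any purchase of it is flagged.
    """
    flagged = set()
    for category in set(history) | set(current):
        past = history.get(category, [])
        baseline = mean(past) if past else 0.0
        if current.get(category, 0) > factor * baseline:
            flagged.add(category)
    return flagged


def pregnancy_score(history: dict[str, list[int]],
                    current: dict[str, int]) -> float:
    """Sum the weights of flagged categories; unweighted categories add nothing."""
    return sum(SIGNAL_WEIGHTS.get(c, 0.0) for c in deviation_signals(history, current))


history = {"vitamins": [2, 2, 2], "large_handbags": [0, 0, 0], "snacks": [5, 5, 5]}
current = {"vitamins": 4, "large_handbags": 1, "snacks": 5}
print(sorted(deviation_signals(history, current)))  # ['large_handbags', 'vitamins']
print(pregnancy_score(history, current))
```

Neither flagged purchase says much on its own; the score only rises when several small departures from a shopper's routine line up, which is the pattern-of-behavior point made above.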

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

So my favorite example is from this study that was published this year in the Proceedings of the National Academy of Sciences. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted?

And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this has been well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spread in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
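The hypothesis above combines two ingredients: a homophilous friendship network and a like that spreads like a contagion. The minimal Python sketch below simulates both under stated assumptions: a random graph where same-attribute pairs link with probability `p_same` and mixed pairs with `p_diff`, and an independent-cascade spread where each new liker gets one chance, with probability `p_like`, to convert each friend. All parameter values and function names are hypothetical and for illustration only; they are not taken from the study.

```python
import random


def build_network(n: int, smart_fraction: float, p_same: float,
                  p_diff: float, rng: random.Random):
    """Random friendship graph with homophily: same-attribute pairs link more often."""
    smart = [rng.random() < smart_fraction for _ in range(n)]
    friends: dict[int, set[int]] = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            p = p_same if smart[i] == smart[j] else p_diff
            if rng.random() < p:
                friends[i].add(j)
                friends[j].add(i)
    return smart, friends


def spread_like(friends: dict[int, set[int]], seed: int,
                p_like: float, rng: random.Random) -> set[int]:
    """Independent-cascade spread: each new liker gets one chance to convert each friend."""
    liked = {seed}
    frontier = [seed]
    while frontier:
        next_frontier = []
        for person in frontier:
            for friend in sorted(friends[person]):  # sorted for reproducible runs
                if friend not in liked and rng.random() < p_like:
                    liked.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return liked


def smart_share(n: int = 200, smart_fraction: float = 0.5, p_same: float = 0.04,
                p_diff: float = 0.002, p_like: float = 0.2, seed: int = 1):
    """Return (share of smart people among likers, share of smart people overall)."""
    rng = random.Random(seed)
    smart, friends = build_network(n, smart_fraction, p_same, p_diff, rng)
    founder = next(i for i in range(n) if smart[i])  # a smart person starts the page
    likers = spread_like(friends, founder, p_like, rng)
    return sum(smart[i] for i in likers) / len(likers), sum(smart) / n


print(smart_share())
```

With cross-group links switched off (`p_diff=0.0`), every liker is necessarily in the founder's group. With the default weak cross-group linking, the share among likers tends to sit above the overall share, so the like ends up carrying information about an attribute it has nothing to do with. Any single run depends on the seed and parameters.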

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? Users have very little power to control how this data is used. And I see that as a real problem going forward.

So I think there are a couple of paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third-party services that access it, while the users the poster has selected can still see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
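In its simplest form, a tool that tells a user "here's the risk of that action you just took" could compare a model's prediction before and after the new like. The sketch below assumes a logistic-regression-style predictor over likes; the page names, weights, and bias are invented placeholders. A real system would learn them from many users' data, and nothing here reflects a specific published model.

```python
import math

# Hypothetical per-page weights (log-odds contributions) for one sensitive trait.
# Invented for illustration; a real model would learn these from data.
TRAIT_WEIGHTS = {"page_a": 0.9, "page_b": 0.4, "page_c": -0.3}
TRAIT_BIAS = -2.0


def predict_trait(likes: set[str]) -> float:
    """Logistic-regression-style probability that the user has the trait."""
    z = TRAIT_BIAS + sum(TRAIT_WEIGHTS.get(page, 0.0) for page in likes)
    return 1.0 / (1.0 + math.exp(-z))


def risk_of_like(current_likes: set[str], new_page: str) -> float:
    """Change in predicted probability if the user also likes new_page.

    Positive values mean the like makes the trait easier to infer;
    pages the model has no weight for change nothing.
    """
    before = predict_trait(current_likes)
    after = predict_trait(current_likes | {new_page})
    return after - before


print(f"{risk_of_like({'page_b'}, 'page_a'):+.3f}")  # prints +0.164
```

A user-facing version might show this difference as a warning before the like is submitted, which is the kind of feedback that could help people decide whether to share something, keep it private, or keep it offline.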

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that, going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Thank you.

(Applause)
