If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.
So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.
So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.
So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.
So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. 
And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
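The hypothesis above can be tried out in a toy simulation. This is only a sketch under made-up assumptions, not the method of the study mentioned in the talk: the test scores are random numbers, "homophily" is modeled by making each person friends with the people closest to them in score rank, and the like probability, number of rounds, and starting person (`start_rank`) are arbitrary illustrative values.

```python
import random
from statistics import mean


def simulate_like_cascade(n_people=1000, friends_each_side=5, rounds=8,
                          like_probability=0.3, start_rank=950, seed=0):
    """Spread a 'like' through a homophilous friendship network.

    All parameters are hypothetical; they only illustrate the mechanism.
    Returns the sorted scores and the set of rank indices that liked the page.
    """
    rng = random.Random(seed)
    # Made-up test scores, sorted so that index == rank by score.
    scores = sorted(rng.gauss(100, 15) for _ in range(n_people))

    def friends(i):
        # Homophily assumption: friends are the nearest neighbors in score rank.
        lo = max(0, i - friends_each_side)
        hi = min(n_people - 1, i + friends_each_side)
        return [j for j in range(lo, hi + 1) if j != i]

    likers = {start_rank}  # assumption: a high scorer likes the page first
    for _ in range(rounds):
        # Friends of current likers see the like...
        exposed = {j for i in likers for j in friends(i)} - likers
        # ...and each of them likes the page with a fixed probability.
        likers |= {j for j in sorted(exposed) if rng.random() < like_probability}
    return scores, likers


scores, likers = simulate_like_cascade()
print(f"population mean score: {mean(scores):.1f}")
print(f"mean score of likers:  {mean(scores[i] for i in likers):.1f}")
```

In this sketch the page's content never enters the model. The likers end up with a higher average score only because the like travels along friendships between similar people, which is the point of the curly fries example.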
So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.
So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.
So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.
We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.
So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third-party services that access it, but that select users who the person who posted it wants to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
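One way to picture a "here's the risk of that action" mechanism is a single application of Bayes' rule. The sketch below is an illustration, not a tool described in the talk: the function names, the page, and every rate in it are hypothetical, and a real system would have to estimate such rates from data.

```python
def updated_probability(prior, like_rate_with_trait, like_rate_without_trait):
    """Bayes' rule: probability of a trait after observing one like.

    prior                    -- assumed share of users who have the trait
    like_rate_with_trait     -- assumed share of users with the trait who like the page
    like_rate_without_trait  -- assumed share of users without the trait who like it
    At least one of the two like rates must be greater than zero.
    """
    with_trait = prior * like_rate_with_trait
    without_trait = (1.0 - prior) * like_rate_without_trait
    return with_trait / (with_trait + without_trait)


def risk_message(page, trait, prior, like_rate_with_trait, like_rate_without_trait):
    """Build the kind of warning a user could be shown before confirming a like."""
    posterior = updated_probability(prior, like_rate_with_trait,
                                    like_rate_without_trait)
    return (f"Liking '{page}' moves an observer's estimate that you {trait} "
            f"from {prior:.0%} to {posterior:.0%}.")


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
print(risk_message("Example Page", "have trait X", 0.10, 0.30, 0.10))
```

With these made-up rates the like is three times as common among people with the trait, so a 10% prior becomes 0.03 / (0.03 + 0.09) = 25%. Showing the user that shift before the like is recorded is the feedback the talk describes.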
One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.
And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.
Thank you.
(Applause)