Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

Як ви, можливо, пам'ятаєте, перші десять років глобальна мережа була доволі статичним місцем. Можна було зайти в Інтернет, переглядати сайти, що їх створювали або організації, де для цього були окремі команди, або люди, справді технічно просунуті за тодішніми мірками. Та з розквітом соціальних медіа й соціальних мереж на початку двотисячних Інтернет враз перетворився на місце, де переважну більшість контенту, з котрим ми маємо справу, створюють пересічні користувачі - це відео на YouTube та дописи в блогах, відгуки споживачів і дописи в соціальних медіа. Заразом Інтернет став значно інтерактивнішим місцем, де люди взаємодіють між собою, коментують і діляться, а не лише читають.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

Так, Facebook - не єдине місце, де це можна робити, але воно найбільше, варто лишень поглянути на статистику. Facebook має 1,2 мільярди користувачів на місяць. Тож половина жителів планети, які мають доступ до Інтернету, користуються Facebook. Це один з сайтів, що дав змогу людям створити собі онлайн-особистість, маючи мінімальні технічні навички, й люди зреагували, виклавши величезні обсяги персональних даних онлайн. Як наслідок, ми маємо інформацію про поведінку, вподобання, демографічні дані сотень мільйонів людей - вперше в історії людства. Для мене, як фахівця у галузі комп'ютерних технологій, це значить, що тепер я можу створювати моделі, що прогнозуватимуть всілякі приховані тенденції на основі інформації, яку ви поширюєте. Науковці використовують ці моделі, щоб удосконалити нашу з вами взаємодію в мережі, але не всі мають такі альтруїстичні наміри, до того ж, є одна проблема - користувачі не зовсім розуміють, як працюють ці технології, а навіть якби розуміли, вони все одно не мають над ними контролю. Отож, сьогодні поговоримо про те, що ми можемо робити з цими даними, а потім поміркуємо, як повернути частковий контроль у руки користувачів.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.

Ось компанія Target. Я розмістила її логотип на животі цієї бідолашної вагітної жінки не просто так. Можливо, ви читали історію, надруковану в журналі Forbes, про те, що Target надіслала 15-річній дівчині рекламу й купони на знижку на пляшечки, підгузники й дитячі ліжечка за два тижні до того, як та зізналася своїм батькам, що вона вагітна. Так, батько дуже засмутився. Він задумався: "А як Target дізнався, що ця школярка вагітна, ще до того, як вона розповіла про це своїм батькам?" Як виявилось, компанія збирає історію покупок сотень тисяч своїх клієнтів і вираховує так званий "показник вагітності" - не тільки ймовірність, вагітна жінка чи ні, а й те, коли саме їй народжувати. Компанія вираховує його не лише на основі очевидних факторів, наприклад, чи купує жінка колиску чи одяг для немовляти, а стежить за тим, чи не купила вона, бува, більше вітамінів, ніж зазвичай, чи, може, придбала велику сумку, куди вмістяться підгузники. Самі по собі ці покупки не надто інформативні, але якщо взяти модель поведінки в масштабі тисяч людей, то вималюється певна картина. Ми займаємось чимось подібним, коли складаємо прогнози на основі вашої поведінки в соціальних мережах. Ми вишуковуємо непомітні на перший погляд моделі поведінки, які розкажуть багато чого, якщо виявиться, що вони властиві мільйонам людей.

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

Ми з колегами з нашої лабораторії розробили механізми, що дають нам змогу доволі точно прогнозувати ваші політичні вподобання, особисті характеристики, стать, сексуальну орієнтацію, релігію, вік, рівень інтелекту, а також те, наскільки ви довіряєте своїм знайомим, і чи міцні ваші взаємини. Нам це вдається досить добре. Знову ж таки, ми не потребуємо так званої очевидної інформації.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.

Мій улюблений приклад - із дослідження, описаного в цьогорічному випуску журналу Національної академії наук. Його можна знайти в Google. Там всього чотири сторінки. Автори дослідження проаналізували вподобання людей у Facebook і на основі цього спрогнозували всі перелічені мною характеристики, і навіть більше. І в своїй статті вони перерахували п'ять вподобань, що найпереконливіше свідчать про високий рівень інтелекту. Серед них було вподобання сторінки спіральної картоплі-фрі. (Сміх) Спіральна картопля-фрі дуже смачна, але якщо ви вподобали її сторінку, це не значить, що ви розумніші за інших. Тож як так може бути, що найпереконливішим показником рівня вашого інтелекту є вподобання сторінки, вміст якої не має аніякісінького стосунку до прогнозованої характеристики? Виявляється, для того, щоб зрозуміти це, треба взяти до уваги цілу низку теорій. Однією з них є соціологічна теорія під назвою гомофілія, згідно з якою, люди приятелюють з тими, хто схожий на них. Тобто якщо ви розумні, скоріш за все, ви приятелюватимете з розумними людьми, якщо молоді - дружитимете з молодими, і так триває вже сотні років. Ми також чимало дізналися про те, як інформація поширюється мережею. Виявилось, що вірусні відео, вподобання на Facebook чи інша інформація поширюються точно так само, як хвороби серед людей. Ми це довгий час аналізували. Розробили кілька моделей. Тому тепер можна скласти всі чинники докупи й зробити висновок, чому так є. Якби ви запитали мене про це, я б відповіла, що ту сторінку з картоплею створив якийсь розумний хлопчина, або той, хто вподобав її одним із перших, має високий рівень інтелекту. Його друзі побачили, що він вподобав ту сторінку - а теорія гомофілії каже нам, що він має розумних приятелів - і дехто з них і собі її вподобав. Ті люди теж мають розумних друзів, які також дізнались про ту сторінку, і так інформація облетіла мережу, поширилась серед розумних людей, і, зрештою, сам факт уподобання сторінки зі спіральною картоплею-фрі став свідчити про високий рівень інтелекту. Але не через зміст цієї сторінки, а через те, що сама дія - вподобання - свідчить про спільні характеристики людей, які цю дію здійснили.

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

Звучить досить складно, правда ж? Не так легко сісти й пояснити це пересічному користувачеві, та й навіть, якщо вам вдасться пояснити, чим типовий користувач зможе тут зарадити? Звідки ви можете знати, що те, що ви вподобали, свідчить про якусь вашу рису, яка не має жодного стосунку до змісту вподобаної вами сторінки? Користувачам не під силу повністю контролювати те, як використовують ці дані. І мені здається, що це дедалі більша проблема.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

Існує кілька способів, які варто розглянути, якщо ми хочемо дати користувачам частковий контроль над тим, як використовують ці дані, бо це не завжди йде їм на користь. Я часто кажу, що коли мені набридне викладати, я засную компанію, що прогнозуватиме різні характеристики, і те, чи ви, наприклад, вмієте працювати в команді, чи ви наркоман, чи, може, зловживаєте алкоголем. Ми знаємо, як це все прогнозувати. І тоді я розсилатиму звіти у кадрові агенції та великі компанії, що хочуть найняти вас на роботу. Тепер ми спокійно можемо це робити. Я можу взятися за це хоч завтра, а ви не матимете жодного контролю над тим, як я використовуватиму ці дані. Як на мене, це проблема.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

Отож, один із способів їй зарадити - ухвалити правила й закони. На мою думку, цей спосіб - найефективніший, але проблема в тому, що тут доведеться добре попрацювати. Коли я спостерігаю за нашим політичним процесом, то щораз менше впевнена в тому, що нам вдасться залучити десяток депутатів, змусити їх сісти й вивчити цю тему, а потім внести кардинальні зміни в закони США про інтелектуальну власність, щоб користувачі могли контролювати особисті дані.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

Можна спробувати ухвалити якісь правила, але соціальні медіа заявлять: Знаєте що? Ваші дані належать вам самим. Те, як їх буде використано, залежить від вас. Проблема в тому, що дохід більшості соціальних медіа залежить від розповсюдження чи використання користувацьких даних у певний спосіб. Про Facebook часом кажуть, що користувачі - це не клієнти, це - продукт. Тож як змусити компанію повернути контроль над своїм найбільшим капіталом назад користувачам? Це можливо, але я не думаю, що ці зміни найближчим часом відбудуться.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

Тому вартує скористатися іншим, ефективнішим, способом - звернутися до науки. Саме наука дала нам змогу створити механізми аналізу особистих даних. І треба провести практично таке саме дослідження, щоб розробити механізми, які попереджатимуть користувача: "Ваші дії - ризиковані". Вподобавши ось цю сторінку на Facebook або поділившись ось цією особистою інформацією, ви підвищили мої шанси успішно спрогнозувати, вживаєте ви наркотики чи ні, і чи ладнаєте ви зі своїми колегами. Це впливатиме на рішення людей - поділяться вони певною інформацією, зроблять її доступною лише для якогось кола людей, чи взагалі не публікуватимуть її в Інтернеті. Можна також дати людям змогу шифрувати дані, які вони завантажують в Інтернет, щоб ті були невидимими й нічого не вартими для сайтів на кшталт Facebook чи третіх сторін. Доступ до цих даних мали б тільки обрані нами користувачі. Це дослідження страшенно цікаве для науковців, тому вони охоче ним займуться. Отож, піти цим шляхом простіше, ніж ухвалювати закони.

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

Коли я говорю на цю тему, то часто чую зауваження: але якщо люди почнуть захищати всі свої дані, то методи, які ви розробляєте, щоб прогнозувати їхню поведінку, перестануть працювати. Так, перестануть, і це чудово, бо як науковець я не хочу вивідувати інформацію про користувачів. Моя мета - поліпшити взаємодію людей в Інтернеті. Для цього часом доводиться вивідувати якусь інформацію про них, але якщо користувачі не хочуть, щоб я скористалась їхніми даними, то вони мусять мати на це право. Я хочу, щоб люди знали, які програми ми розробляємо, і давали на це свою згоду.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

І тому гадаю, що підтримка таких наукових розробок і дослідників, які хочуть повернути контроль користувачам, забравши його в соціальних медіа, означає крок уперед. А розробка й вдосконалення таких інструментів означає, що користувачі будуть обізнані й матимуть права. І думаю, зі мною всі погодяться, що над цим варто працювати.

Thank you.

Дякую.

(Applause)

(Оплески)

Thank you.

Дякую.

(Applause)

(Оплески)

Jennifer Golbeck: Your social media "likes" expose more than you think

Jennifer Golbeck: Your social media "likes" expose more than you think

Related talks

Del Harvey: Protecting Twitter users (sometimes from themselves)

Johanna Blakley: Social media and the end of gender

Juan Enriquez: Your online life, permanent as a tattoo

Susan Etlinger: What do we do with all this big data?

Tamas Kocsis: The case for a decentralized internet

Zeynep Tufekci: We're building a dystopia just to make people click on ads

Related talks

Del Harvey: Protecting Twitter users (sometimes from themselves)

Johanna Blakley: Social media and the end of gender

Juan Enriquez: Your online life, permanent as a tattoo

Susan Etlinger: What do we do with all this big data?

Tamas Kocsis: The case for a decentralized internet

Zeynep Tufekci: We're building a dystopia just to make people click on ads