Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there are less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, let us find out all kinds of things.
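
The idea that individually weak purchases add up to a revealing pattern can be sketched in a few lines of Python. This is only an illustration and not Target's actual method: the signal names, the weights, and the 2% base rate are all invented, and summing log-odds is just one simple way to combine evidence.

```python
import math

# Hypothetical log-odds weights for two weak signals mentioned in the talk.
# The numbers are made up for illustration; a real system would learn them
# from the purchase histories of many customers.
SIGNAL_WEIGHTS = {
    "extra_vitamins": 1.5,
    "large_handbag": 1.2,
}

# Assumed base rate of 2%, expressed as log-odds.
PRIOR_LOG_ODDS = math.log(0.02 / 0.98)


def pattern_score(purchases):
    """Combine weak signals into one probability by summing log-odds."""
    log_odds = PRIOR_LOG_ODDS
    for item in set(purchases):
        # Purchases with no known weight leave the score unchanged.
        log_odds += SIGNAL_WEIGHTS.get(item, 0.0)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    print(pattern_score([]))
    print(pattern_score(["extra_vitamins"]))
    print(pattern_score(["extra_vitamins", "large_handbag"]))
```

Each signal on its own moves the score only a little; the score rises further as more of the pattern is present.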

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted?

And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen.

So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
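
The hypothesis can be tried out in a toy simulation. The sketch below is an illustration under invented assumptions, not the model from the study: each person gets a random hidden trait score, friendships are formed only between people with similar scores (a crude stand-in for homophily), and a like started by the highest-scoring person is passed to each friend with a fixed probability. The function name and all parameter values are hypothetical.

```python
import random


def simulate_like_spread(n=1000, friends_each=6, share_prob=0.4, seed=0):
    """Spread one 'like' through a homophilous network and compare the
    average hidden trait of the likers with the population average."""
    rng = random.Random(seed)

    # Hidden trait score in [0, 1) for every person (uniform, by assumption).
    trait = [rng.random() for _ in range(n)]

    # Homophily stand-in: rank people by trait and connect each person
    # to the people nearest to them in that ranking.
    order = sorted(range(n), key=lambda person: trait[person])
    rank = {person: r for r, person in enumerate(order)}
    friends = {person: set() for person in range(n)}
    for person in range(n):
        for offset in range(1, friends_each // 2 + 1):
            for neighbour_rank in (rank[person] - offset, rank[person] + offset):
                if 0 <= neighbour_rank < n:
                    other = order[neighbour_rank]
                    friends[person].add(other)
                    friends[other].add(person)

    # The like starts with the highest-scoring person and spreads like a
    # simple contagion: each new liker exposes their friends once.
    start = order[-1]
    liked = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for person in frontier:
            for friend in friends[person]:
                if friend not in liked and rng.random() < share_prob:
                    liked.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier

    avg_all = sum(trait) / n
    avg_likers = sum(trait[person] for person in liked) / len(liked)
    return avg_likers, avg_all, len(liked)


if __name__ == "__main__":
    avg_likers, avg_all, count = simulate_like_spread()
    print(f"likers: {count}, their average trait: {avg_likers:.2f}, "
          f"population average: {avg_all:.2f}")
```

In this toy setup the page has no content at all, yet having liked it still signals a high trait score, because the like only ever travelled between similar people.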

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, while the select users that the person who posted it wants to see it still have access to it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
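
One way to picture a "risk of that action" warning is as the change in a predicted probability before and after a single action. The sketch below assumes a plain Bayesian update with made-up numbers; the function names and the probabilities are hypothetical, and a real tool would have to estimate them from data.

```python
def updated_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


def risk_of_action(prior, p_action_given_trait, p_action_given_no_trait):
    """Return how much one action shifts the predicted probability of a trait.

    prior: predicted probability of the trait before the action.
    p_action_given_trait: assumed share of people with the trait who take
        this action (for example, like a given page).
    p_action_given_no_trait: assumed share of people without the trait
        who take the same action.
    """
    likelihood_ratio = p_action_given_trait / p_action_given_no_trait
    return updated_probability(prior, likelihood_ratio) - prior


if __name__ == "__main__":
    # Invented numbers: the action is three times as common among people
    # with the trait as among people without it.
    shift = risk_of_action(prior=0.10, p_action_given_trait=0.30,
                           p_action_given_no_trait=0.10)
    print(f"This action raises the predicted probability by {shift:.2f}")
```

A warning shown to the user could then be driven by the size of that shift, with actions that barely change the prediction producing no warning at all.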

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Thank you.

(Applause)
