Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. It's a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there are less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give you some ideas of how we might go forward to move some control back into the hands of users.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally did, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, let us find out all kinds of things.
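
Target's actual features and weights have not been published, so the following is only a minimal sketch of the general idea: a logistic score that adds up several individually weak purchase signals. The two feature names come from the anecdote above. The weights and the `BIAS` constant are hypothetical, chosen by hand for illustration. In practice they would be fitted on the purchase histories of many customers.

```python
import math

# Hypothetical weights: how much each purchase shifts the log-odds of the
# "pregnant" label. Purely illustrative; Target's real features and
# weights have not been published.
HYPOTHETICAL_WEIGHTS = {
    "extra_vitamins": 1.2,
    "large_handbag": 0.8,
}
BIAS = -3.0  # hypothetical baseline log-odds for a typical shopper


def pregnancy_score(purchases: set[str]) -> float:
    """Combine weak purchase signals into a probability-like score (logistic)."""
    log_odds = BIAS + sum(HYPOTHETICAL_WEIGHTS.get(p, 0.0) for p in purchases)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Usage: no signals, one signal, then both signals together.
for basket in [set(), {"extra_vitamins"}, {"extra_vitamins", "large_handbag"}]:
    print(sorted(basket), round(pregnancy_score(basket), 3))
```

With these made-up numbers the three baskets print scores of about 0.047, 0.142 and 0.269. Each purchase nudges the score a little, but together they move it much further, which is the "pattern of behavior" effect described in the talk.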

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

So my favorite example is from this study that was published this year in the Proceedings of the National Academy of Sciences. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this has been well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spread in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.
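
The idea that a like "reflects back the common attributes of other people who have done it" can be illustrated with a deliberately simple estimator, much simpler than the models in the published study. Each page is scored by the average trait of the people who already liked it. A new user's trait is then predicted from the pages they like, and no page content is ever inspected. All user IDs, page names and scores below are invented toy data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: user -> (test score, set of liked pages).
# Names and numbers are invented for illustration only.
TRAINING_USERS = {
    "u1": (130, {"curly_fries", "science_page"}),
    "u2": (125, {"curly_fries"}),
    "u3": (95, {"sports_page"}),
    "u4": (100, {"sports_page", "curly_fries"}),
    "u5": (90, {"sports_page"}),
}


def like_signals(users):
    """Map each page to the average score of the users who liked it."""
    scores_by_page = defaultdict(list)
    for score, likes in users.values():
        for page in likes:
            scores_by_page[page].append(score)
    return {page: mean(scores) for page, scores in scores_by_page.items()}


def predict_trait(likes, signals, default):
    """Predict a new user's score as the mean signal of the pages they like.

    Falls back to `default` (e.g. the population mean) when none of the
    pages has been seen before.
    """
    known = [signals[p] for p in likes if p in signals]
    return mean(known) if known else default


signals = like_signals(TRAINING_USERS)
population_mean = mean(score for score, _ in TRAINING_USERS.values())
print(predict_trait({"curly_fries"}, signals, population_mean))
```

In this toy data the curly-fries page happens to be liked mostly by high scorers. Liking it therefore predicts a score above the population mean of 108, purely because of who else clicked it.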

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

So I think there are a couple of paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and whether you're a drug user or an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third-party services that access it, but the users the poster chooses can still see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.
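
One way to sketch the "here's the risk of that action you just took" feedback is a naive Bayes update. Each like is treated as independent evidence about a sensitive trait, and the tool reports how much a new like would move the estimated probability. The page names, likelihood ratios and base rate below are hypothetical. A real tool would have to estimate them from many users, and the independence assumption is a simplification.

```python
import math

# Hypothetical per-page likelihood ratios:
# P(like | has trait) / P(like | lacks trait), as might be estimated from
# many other users. All values are invented for illustration.
LIKELIHOOD_RATIOS = {
    "page_a": 3.0,
    "page_b": 1.1,
    "page_c": 0.5,
}
PRIOR = 0.10  # hypothetical base rate of the sensitive trait


def posterior(likes: set[str]) -> float:
    """Naive-Bayes estimate of P(trait | likes), treating likes as independent."""
    log_odds = math.log(PRIOR / (1 - PRIOR))
    for page in likes:
        # Unlisted pages carry no evidence (ratio 1, log ratio 0).
        log_odds += math.log(LIKELIHOOD_RATIOS.get(page, 1.0))
    return 1 / (1 + math.exp(-log_odds))


def risk_of_like(current_likes: set[str], new_page: str) -> float:
    """Change in the estimated probability if the user also liked new_page."""
    before = posterior(current_likes)
    after = posterior(set(current_likes) | {new_page})
    return after - before


delta = risk_of_like({"page_b"}, "page_a")
print(f"Liking page_a would change the estimate by {delta:+.1%}")
```

With these invented numbers, liking `page_a` from a blank profile raises the estimate from 10% to 25%. A warning message could report exactly that shift before the user clicks.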

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that, going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Thank you.

(Applause)
