Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

Nếu bạn còn nhớ thập kỷ đầu tiên của web thì đó thật sự là một nơi rất tĩnh. Bạn có thể lên mạng, xem các trang web, và chúng được dựng lên bởi những tổ chức có cả đội ngũ riêng hoặc bởi các cá nhân thật sự thành thạo công nghệ lúc bấy giờ. Với sự trỗi dậy của truyền thông xã hội và mạng xã hội vào đầu những năm 2000, web đã hoàn toàn thay đổi, trở thành nơi mà phần lớn nội dung chúng ta tương tác là do người dùng bình thường đưa lên, dù là video trên YouTube, bài viết trên blog, bài đánh giá sản phẩm hay bài đăng trên mạng xã hội. Nó cũng trở thành một nơi mang tính tương tác cao hơn nhiều, nơi mọi người tương tác với nhau, bình luận, chia sẻ chứ không chỉ đơn thuần là đọc.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

Facebook không phải là nơi duy nhất bạn có thể làm điều này, nhưng là nơi lớn nhất, và nó giúp minh họa qua các con số. Facebook có 1,2 tỷ người dùng mỗi tháng, nghĩa là một nửa số người dùng Internet trên Trái Đất đang sử dụng Facebook. Trang này, cùng những trang khác, cho phép mọi người tạo ra một hình ảnh cá nhân trên mạng mà không cần nhiều kỹ năng công nghệ, và người ta hưởng ứng bằng cách đưa lên mạng một lượng lớn dữ liệu cá nhân. Kết quả là chúng ta có được dữ liệu về hành vi, sở thích, nhân khẩu học của hàng trăm triệu người, điều chưa từng có tiền lệ trong lịch sử. Là một nhà khoa học máy tính, điều đó có nghĩa là tôi có thể xây dựng các mô hình dự đoán đủ loại đặc tính tiềm ẩn của các bạn, những điều mà bạn thậm chí không biết mình đang chia sẻ thông tin. Là nhà khoa học, chúng tôi dùng điều đó để cải thiện cách mọi người tương tác trên mạng, nhưng cũng có những ứng dụng kém vị tha hơn, và vấn đề là người dùng không thật sự hiểu các kỹ thuật này cũng như cách chúng hoạt động, và kể cả nếu hiểu, họ cũng không có nhiều quyền kiểm soát. Cho nên, hôm nay tôi muốn nói với các bạn về một vài điều chúng tôi có thể làm, rồi đưa ra một số ý tưởng về cách ta có thể trả lại phần nào quyền kiểm soát vào tay người dùng.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.

Đây là công ty Target. Không phải tôi tự ý đặt logo đó lên bụng người phụ nữ mang thai tội nghiệp này. Có thể bạn đã thấy mẩu chuyện này trên tạp chí Forbes, kể về việc Target gửi cho cô bé 15 tuổi này một tờ rơi với quảng cáo và phiếu giảm giá cho bình sữa, tã giấy và nôi hai tuần trước khi cô nói với cha mẹ rằng mình đang mang thai. Đúng vậy, người cha đã thật sự rất tức giận. Ông nói: "Làm cách nào Target biết được cô nữ sinh trung học này đang mang thai trước khi nó nói với cha mẹ cơ chứ?" Hóa ra họ có lịch sử mua sắm của hàng trăm ngàn khách hàng và họ tính toán cái được gọi là "chỉ số mang thai", không chỉ cho biết một phụ nữ có mang thai hay không mà còn cho biết ngày dự sinh. Họ tính toán điều đó không dựa trên những thứ hiển nhiên như việc cô ấy mua nôi hay quần áo trẻ sơ sinh, mà dựa vào những việc như cô ấy mua nhiều vitamin hơn bình thường hay mua một cái túi xách đủ to để đựng tã. Bản thân những món hàng ấy dường như không tiết lộ gì nhiều, nhưng đó là một kiểu hành vi mà khi đặt trong bối cảnh của hàng ngàn người khác thì bắt đầu hé lộ những điều bên trong. Đó là loại công việc chúng tôi làm khi dự đoán về bạn trên mạng xã hội. Chúng tôi tìm kiếm những kiểu hành vi nhỏ mà khi được phát hiện giữa hàng triệu người sẽ cho chúng tôi biết đủ mọi thứ.
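Editor's illustration: the idea that individually weak purchases add up to a strong signal can be sketched with a naive Bayes style score. This is not Target's actual method, which the talk does not describe. The function name, the prior, and the likelihood ratios below are all invented assumptions.

```python
import math

# Hypothetical likelihood ratios: how much more common each purchase is in the
# group being predicted than among all other shoppers. Invented numbers.
LIKELIHOOD_RATIOS = {
    "extra_vitamins": 2.0,
    "large_handbag": 1.5,
}

def combined_score(purchases, prior=0.02):
    """Fold weak signals into one probability by adding log-odds (naive Bayes)."""
    log_odds = math.log(prior / (1 - prior))
    for item in purchases:
        # Unknown items carry no evidence (ratio 1.0, log 0).
        log_odds += math.log(LIKELIHOOD_RATIOS.get(item, 1.0))
    return 1 / (1 + math.exp(-log_odds))

print(combined_score([]))
print(combined_score(["extra_vitamins"]))
print(combined_score(["extra_vitamins", "large_handbag"]))
```

No single item moves the score far, but each additional weak signal nudges it upward, which is the pattern-of-behavior effect the speaker describes.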

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

Vì thế, trong phòng thí nghiệm của tôi, cùng với đồng nghiệp, chúng tôi đã phát triển các cơ chế có thể dự đoán khá chính xác những thứ như thiên hướng chính trị, điểm tính cách, giới tính, khuynh hướng tính dục, tôn giáo, độ tuổi, trí thông minh, cùng với những thứ như bạn tin tưởng người quen đến mức nào và các mối quan hệ đó bền chặt đến đâu. Chúng tôi có thể làm tất cả những điều này rất tốt. Và xin nhắc lại, nó không xuất phát từ những thông tin mà bạn cho là hiển nhiên.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. 
And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.

Ví dụ yêu thích của tôi đến từ nghiên cứu này, được công bố năm nay trong tập san của Viện Hàn lâm Quốc gia. Bạn có thể tìm trên Google. Nghiên cứu dài 4 trang và dễ đọc. Các nhà nghiên cứu chỉ xem xét lượt "Like" của mọi người trên Facebook, tức là những thứ bạn thích trên Facebook, và dùng chúng để dự đoán tất cả những đặc tính trên, cùng một số đặc tính khác. Trong bài báo, họ liệt kê 5 lượt "Like" thể hiện rõ nhất trí thông minh cao. Một trong số đó là "Like" một trang về khoai tây chiên xoắn. (Cười) Khoai tây chiên xoắn rất ngon, nhưng việc bạn thích chúng không nhất thiết có nghĩa là bạn thông minh hơn người bình thường. Vậy làm thế nào mà một trong những chỉ báo mạnh nhất về trí thông minh lại là việc "Like" trang này, khi nội dung của nó hoàn toàn không liên quan đến đặc tính được dự đoán? Hóa ra ta phải xem xét khá nhiều lý thuyết nền tảng để hiểu vì sao ta làm được điều này. Một trong số đó là lý thuyết xã hội học gọi là "sự đồng chất" (homophily), về cơ bản nói rằng người ta kết bạn với những người giống mình. Nếu thông minh, bạn thường kết bạn với người thông minh, nếu trẻ, bạn thường kết bạn với người trẻ, và điều này đã được khẳng định hàng trăm năm nay. Chúng tôi cũng biết rất nhiều về cách thông tin lan truyền qua các mạng lưới. Hóa ra những thứ như video lan truyền, lượt "Like" trên Facebook hay các thông tin khác lan truyền theo đúng cách mà bệnh dịch lan truyền qua các mạng lưới xã hội. Đây là điều chúng tôi đã nghiên cứu từ lâu và đã có những mô hình tốt. Bạn có thể ghép những điều đó lại và bắt đầu thấy vì sao những chuyện như thế này xảy ra. Vậy nên, nếu phải đưa ra một giả thuyết, thì đó là một người thông minh đã lập ra trang này, hoặc có thể một trong những người đầu tiên "Like" nó là người đạt điểm cao trong bài kiểm tra đó. Họ nhấn "Like", bạn bè họ nhìn thấy, và theo thuyết đồng chất, ta biết người đó có lẽ có những người bạn thông minh, nên nó lan đến họ, một số người trong đó nhấn "Like", họ lại có bạn thông minh, và nó lại lan đến những người này.
Điều này được truyền đi qua mạng lưới đến một lượng lớn những người thông minh và bằng cách đó, hành động "Like" trang FB khoai tây xoắn sẽ biểu thị chỉ số thông minh cao không phải vì nội dung, mà vì chính hành động nhấn "Like" phản ánh đặc tính chung của người thực hiện.
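Editor's illustration: the curly fries hypothesis combines homophily with contagion-style spreading, and a small simulation can make that concrete. This is a toy sketch, not the model used in the study. It assumes a random friendship graph in which same-trait pairs are more likely to be friends (`p_in` versus `p_out`) and a simple independent-cascade spread in which everyone who sees a like copies it with the same probability `p_like`. All names and parameter values are invented.

```python
import random

def simulate_like_cascade(n_per_group=200, p_in=0.05, p_out=0.002,
                          p_like=0.3, rounds=4, seed=0):
    """Return (share of likers who have the trait, number of likers)."""
    rng = random.Random(seed)
    n = 2 * n_per_group
    # Nodes 0..n_per_group-1 have the trait; the rest do not.
    has_trait = [i < n_per_group for i in range(n)]
    friends = [[] for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            # Homophily: same-trait pairs befriend each other more often.
            p = p_in if has_trait[a] == has_trait[b] else p_out
            if rng.random() < p:
                friends[a].append(b)
                friends[b].append(a)
    likers = {0}      # the first person to like the page has the trait
    frontier = [0]
    for _ in range(rounds):
        next_frontier = []
        for person in frontier:
            for friend in friends[person]:
                # The page content is irrelevant: the chance of copying a
                # like is the same whether or not the friend has the trait.
                if friend not in likers and rng.random() < p_like:
                    likers.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    with_trait = sum(has_trait[p] for p in likers)
    return with_trait / len(likers), len(likers)

share, count = simulate_like_cascade()
print(f"{count} likers, {share:.0%} of them have the trait (population: 50%)")
```

Nothing in the code ties the page to the trait. Any skew among the likers comes only from who is friends with whom, which is the speaker's point about the action of liking reflecting the people who did it.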

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

Khá phức tạp đúng không? Không dễ để ngồi xuống giải thích chuyện này cho một người dùng bình thường, và kể cả khi giải thích được, người dùng bình thường có thể làm gì? Làm thế nào bạn biết được rằng mình đã thích một thứ biểu hiện một đặc tính của bạn, trong khi đặc tính đó chẳng liên quan gì đến nội dung bạn thích? Người dùng không có nhiều quyền để kiểm soát cách những dữ liệu này được sử dụng. Và tôi xem đó là một vấn đề thực sự trong tương lai.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

Nên tôi cho rằng có một vài hướng đi cần xem xét nếu ta muốn trao cho người dùng phần nào quyền kiểm soát cách dữ liệu này được sử dụng, vì không phải lúc nào nó cũng được dùng để phục vụ lợi ích của họ. Một ví dụ tôi thường đưa ra là nếu có ngày chán làm giáo sư, tôi sẽ mở một công ty chuyên dự đoán tất cả các đặc tính này và những thứ như bạn làm việc nhóm tốt đến đâu, bạn có dùng ma túy hay nghiện rượu hay không. Chúng tôi biết cách dự đoán tất cả những điều đó. Và tôi sẽ bán các báo cáo cho các công ty nhân sự và các doanh nghiệp lớn muốn tuyển bạn. Hiện giờ chúng tôi hoàn toàn có thể làm được điều đó. Tôi có thể mở công ty đó ngay ngày mai, và bạn sẽ hoàn toàn không kiểm soát được việc tôi dùng dữ liệu của bạn như thế. Với tôi, đó có vẻ là một vấn đề.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

Cho nên, một trong những hướng có thể đi là chính sách và luật pháp. Xét trên một số phương diện, tôi nghĩ đó sẽ là hướng hiệu quả nhất, nhưng vấn đề là ta phải thực sự làm được điều đó. Quan sát tiến trình chính trị của chúng ta vận hành, tôi nghĩ rất khó có chuyện một nhóm đại biểu chịu ngồi xuống, tìm hiểu vấn đề này, rồi ban hành những thay đổi sâu rộng về luật sở hữu trí tuệ tại Mỹ để người dùng kiểm soát dữ liệu của mình.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

Ta có thể đi theo hướng chính sách, trong đó các công ty mạng xã hội nói: "Bạn biết không? Bạn sở hữu dữ liệu của mình. Bạn có toàn quyền kiểm soát cách nó được sử dụng." Vấn đề là mô hình doanh thu của hầu hết các công ty mạng xã hội lại phụ thuộc vào việc chia sẻ hay khai thác dữ liệu người dùng theo cách nào đó. Đôi khi người ta nói rằng với Facebook, người dùng không phải là khách hàng mà chính là sản phẩm. Vậy làm thế nào để một công ty chịu nhượng lại quyền kiểm soát tài sản chính của họ cho người dùng? Điều đó có thể xảy ra, nhưng tôi không nghĩ ta sẽ thấy nó thay đổi nhanh chóng.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

Nên tôi nghĩ hướng đi còn lại, hướng sẽ hiệu quả hơn, là làm thêm khoa học. Chính việc làm khoa học đã giúp ta phát triển tất cả những cơ chế tính toán dữ liệu cá nhân này ngay từ đầu. Và thật ra ta sẽ phải làm những nghiên cứu rất tương tự nếu muốn phát triển các cơ chế có thể nói với người dùng rằng: "Đây là rủi ro của hành động bạn vừa thực hiện." Bằng việc "Like" trang Facebook đó, hay chia sẻ mẩu thông tin cá nhân này, bạn vừa giúp tôi nâng cao khả năng dự đoán liệu bạn có đang dùng ma túy hay không, hay bạn có hòa hợp ở nơi làm việc hay không. Tôi nghĩ điều đó có thể ảnh hưởng đến việc mọi người muốn chia sẻ điều gì đó, giữ nó ở chế độ riêng tư, hay hoàn toàn không đưa nó lên mạng. Ta cũng có thể xem xét những việc như cho phép mọi người mã hóa dữ liệu họ tải lên, để nó gần như vô hình và vô giá trị đối với những trang như Facebook hay các dịch vụ bên thứ ba truy cập vào nó, nhưng những người dùng được người đăng chọn thì vẫn xem được. Tất cả đều là những nghiên cứu cực kỳ thú vị xét về mặt trí tuệ, nên các nhà khoa học sẽ sẵn lòng thực hiện. Điều đó cho chúng tôi một lợi thế so với hướng luật pháp.
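Editor's illustration: a mechanism that tells a user "here's the risk of that action you just took" could, in its simplest form, run the same kind of predictor before and after the action and report the difference. The sketch below assumes a logistic model over page likes. The weights, the intercept, and the page names are hypothetical and do not come from the talk or from any real system.

```python
import math

# Hypothetical weights a predictor might have learned for one private trait.
# Page names and numbers are invented for illustration only.
WEIGHTS = {"page_a": 0.8, "page_b": -0.3}
INTERCEPT = -2.0

def predicted_probability(likes):
    """Logistic model: each distinct liked page shifts the log-odds."""
    z = INTERCEPT + sum(WEIGHTS.get(page, 0.0) for page in set(likes))
    return 1 / (1 + math.exp(-z))

def risk_of_action(current_likes, new_like):
    """Return the prediction before and after the like, plus the change."""
    before = predicted_probability(current_likes)
    after = predicted_probability(list(current_likes) + [new_like])
    return before, after, after - before

before, after, change = risk_of_action([], "page_a")
print(f"Liking this page moves the predicted probability "
      f"from {before:.2f} to {after:.2f}")
```

Showing the change rather than the raw score keeps the warning tied to the single action the user is about to take, which is what would let them decide whether to share it, keep it private, or keep it offline.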

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

Một trong những vấn đề mọi người thường nêu ra khi tôi nói về chuyện này là: "Nếu mọi người bắt đầu giữ kín tất cả dữ liệu này, những phương pháp cô đang phát triển để dự đoán đặc tính của họ sẽ thất bại." Và tôi nói: "Chắc chắn rồi, và với tôi đó là thành công." Vì là một nhà khoa học, mục tiêu của tôi không phải là suy đoán thông tin về người dùng, mà là cải thiện cách mọi người tương tác trên mạng. Đôi khi việc đó đòi hỏi phải suy đoán về họ, nhưng nếu người dùng không muốn tôi dùng dữ liệu đó, tôi nghĩ họ nên có quyền làm thế. Tôi muốn người dùng được biết rõ và đồng thuận khi sử dụng các công cụ mà chúng tôi phát triển.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Vậy nên, tôi cho rằng việc khuyến khích loại hình khoa học này và ủng hộ các nhà nghiên cứu muốn trao lại phần nào quyền kiểm soát đó cho người dùng, thay vì để trong tay các công ty mạng xã hội, có nghĩa là trong tương lai, khi các công cụ này phát triển và tiến bộ, chúng ta sẽ có một cộng đồng người dùng có hiểu biết và được trao quyền, và tôi nghĩ tất cả chúng ta đều đồng ý rằng đó là một con đường khá lý tưởng để tiến lên.

Thank you.

Xin cảm ơn.

(Applause)

