Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember the first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations that had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web changed completely into a place where the vast majority of content we interact with is put up by average users, whether in YouTube videos or blog posts or product reviews or social media postings. It's also become a much more interactive place, where people are interacting with others: they're commenting, they're sharing, they're not just reading.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month, so half the Earth's Internet population is using Facebook. It's a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. The result is that we have behavioral, preference, and demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes about all of you, from information you don't even know you're sharing. As scientists, we use that to improve the way people interact online, but there are less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of the things we're able to do, and then offer some ideas of how we might go forward to move some control back into the hands of users.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote, printed in Forbes magazine, where Target sent a 15-year-old girl a flyer with advertisements and coupons for baby bottles, diapers, and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers, and they compute what they call a pregnancy score, which estimates not just whether or not a woman is pregnant, but what her due date is. And they compute it not by looking at the obvious things, like buying a crib or baby clothes, but at things like buying more vitamins than she normally does, or buying a handbag that's big enough to hold diapers. By themselves, those purchases don't seem to reveal much, but they form a pattern of behavior that, when you take it in the context of thousands of other people, starts to reveal real insights. So that's the kind of thing we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, let us find out all kinds of things.
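
The pregnancy-score idea, where several weak purchase signals add up to a stronger prediction, can be sketched as a logistic score. This is a minimal illustration under stated assumptions, not Target's model: the feature names, weights, and bias below are hypothetical, and in practice the weights would be fitted from the purchase histories of many customers rather than set by hand.

```python
import math

# Hypothetical weights: each feature is a purchase signal measured against the
# customer's own baseline (0 = no change). Illustrative values, not Target's.
WEIGHTS = {
    "extra_vitamins": 1.2,  # bought more vitamins than usual
    "large_handbag": 0.8,   # bought a bag big enough to hold diapers
}
BIAS = -3.0  # hypothetical baseline log-odds when no signal is present


def pregnancy_score(signals: dict) -> float:
    """Combine weak signals into a probability with a logistic function."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in signals.items())
    return 1.0 / (1.0 + math.exp(-z))


no_signal = pregnancy_score({})
one_signal = pregnancy_score({"extra_vitamins": 1.0})
both_signals = pregnancy_score({"extra_vitamins": 1.0, "large_handbag": 1.0})
print(f"{no_signal:.3f} {one_signal:.3f} {both_signals:.3f}")
```

Neither purchase alone moves the score very far, but together they raise it noticeably; the information is in the combination, not in any single item.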

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.
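
One crude way to see how likes can carry a signal that isn't obvious from their content is to score each page by the average trait of the people who liked it, then score a new user by averaging over the pages they liked. This is a deliberately naive stand-in, not the method used by the lab or the study discussed here; the page names and trait values are made up for illustration.

```python
from statistics import mean


def page_scores(training: list) -> dict:
    """Map each page to the average trait value of the training users who liked it.

    `training` is a list of (set_of_liked_pages, trait_value) pairs.
    """
    likers = {}
    for liked_pages, trait in training:
        for page in liked_pages:
            likers.setdefault(page, []).append(trait)
    return {page: mean(values) for page, values in likers.items()}


def predict(user_likes: set, scores: dict, default: float) -> float:
    """Average the scores of the pages a user liked; use `default` if none are known."""
    known = [scores[page] for page in user_likes if page in scores]
    return mean(known) if known else default


# Hypothetical training data: (pages liked, measured trait score).
training = [
    ({"page_a", "page_b"}, 120.0),
    ({"page_a"}, 115.0),
    ({"page_b", "page_c"}, 100.0),
    ({"page_c"}, 90.0),
]
scores = page_scores(training)
population_mean = mean(trait for _, trait in training)
print(predict({"page_a"}, scores, population_mean))
print(predict({"page_a", "page_c"}, scores, population_mean))
```

A real system would need far more users, a proper statistical model, and held-out data to check accuracy; this only shows that a like becomes informative through who else liked it.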

So my favorite example is from a study that was published this year in the Proceedings of the National Academy of Sciences. If you Google it, you'll find it. It's four pages, easy to read. The researchers looked at just people's Facebook likes, just the things you like on Facebook, and used them to predict all these attributes, along with some others. In their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page, when the content is totally irrelevant to the attribute being predicted? It turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this has been well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out that things like viral videos or Facebook likes or other information spread in exactly the same way that diseases spread through social networks. This is something we've studied for a long time, and we have good models of it. So you can put those things together and start seeing why things like this happen. If I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on an intelligence test. They liked it, and their friends saw it, and by homophily, we know that they probably had smart friends, so it spread to them, and some of them liked it, and they had smart friends, so it spread to them, and so it propagated through the network to a host of smart people. By the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the act of liking reflects back the common attributes of the other people who have done it.
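
The homophily-plus-spread argument can be illustrated with a small simulation. This is a toy sketch under loud assumptions: friendships are formed purely by closeness in a hypothetical trait score (an extreme form of homophily), and the like spreads by an independent cascade, a standard epidemic-style diffusion model in which each new liker gives each friend one chance to like the page with probability `p`. The page's content plays no role, yet the people who end up liking it score well above the population average.

```python
import random
from statistics import mean


def homophilous_network(traits: list, window: int) -> dict:
    """Link each user to the `window` users ranked just above and below them by trait."""
    order = sorted(range(len(traits)), key=lambda user: traits[user])
    friends = {user: set() for user in range(len(traits))}
    for pos, user in enumerate(order):
        for offset in range(1, window + 1):
            if pos + offset < len(order):
                other = order[pos + offset]
                friends[user].add(other)
                friends[other].add(user)
    return friends


def independent_cascade(friends: dict, seed_user: int, p: float, rng: random.Random) -> set:
    """Spread a like: each new liker gives every friend who hasn't liked it yet one chance (probability p)."""
    liked = {seed_user}
    frontier = [seed_user]
    while frontier:
        next_frontier = []
        for user in frontier:
            for friend in friends[user]:
                if friend not in liked and rng.random() < p:
                    liked.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier
    return liked


# Hypothetical population: 200 users with trait scores 0..199.
traits = [float(score) for score in range(200)]
friends = homophilous_network(traits, window=2)
# The page starts with the highest-scoring user; nothing about its content matters.
likers = independent_cascade(friends, seed_user=199, p=0.2, rng=random.Random(42))
print(f"{len(likers)} likers, mean trait {mean(traits[u] for u in likers):.1f} "
      f"vs population mean {mean(traits):.1f}")
```

With `p` this low the cascade stays in the neighborhood of its starting point, which is exactly the high-trait end of the network; a real social graph is far messier, so the effect there is statistical rather than this clean.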

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How would you know that you've liked something that indicates a trait totally irrelevant to the content of what you liked? Users have very little power to control how this data is used, and I see that as a real problem going forward.

So I think there are a couple of paths we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that if I ever get bored being a professor, I'm going to start a company that predicts all of these attributes, plus things like how well you work in teams, whether you're a drug user, and whether you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We can totally do that now. I could start that business tomorrow, and you would have absolutely no control over my using your data like that. That seems to me to be a problem.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

So I think the other path we can go down, which is going to be more effective, is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt the data they upload, so that it's invisible and worthless to sites like Facebook or to third-party services that access it, while the select users the poster wants to see it can still see it. This is all super exciting research from an intellectual perspective, so scientists are going to be willing to do it. That gives us an advantage over the law side.
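
A "here's the risk of that action" warning could, in its simplest form, compare a model's prediction about a user before and after a proposed like. The sketch below assumes an already-trained logistic model with hypothetical per-page weights for some sensitive attribute; the page names, weights, bias, and 5-percentage-point threshold are illustrative assumptions, not an existing tool.

```python
import math

# Hypothetical weights from some already-trained model of a sensitive attribute.
WEIGHTS = {"page_a": 0.9, "page_b": -0.4, "page_c": 1.5}
BIAS = -2.0


def predicted_probability(likes: set) -> float:
    """Logistic model: probability of the attribute given the set of liked pages."""
    z = BIAS + sum(WEIGHTS.get(page, 0.0) for page in likes)
    return 1.0 / (1.0 + math.exp(-z))


def risk_of_like(current_likes: set, new_page: str) -> float:
    """Change in predicted probability if the user also likes `new_page`."""
    before = predicted_probability(current_likes)
    after = predicted_probability(current_likes | {new_page})
    return after - before


def warning_message(current_likes: set, new_page: str, threshold: float = 0.05):
    """Return a warning string when the like would shift the prediction noticeably, else None."""
    delta = risk_of_like(current_likes, new_page)
    if abs(delta) < threshold:
        return None
    return (f"Liking {new_page} would move a model's estimate of a sensitive "
            f"attribute by {delta * 100:+.0f} percentage points.")


print(warning_message({"page_a"}, "page_c"))
```

A usable warning would also have to say which attribute is at stake and how reliable the model is; those design choices are left open here.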

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

And so I think that encouraging this kind of science, and supporting researchers who want to cede some of that control back to users and away from the social media companies, means that going forward, as these tools evolve and advance, we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Thank you.

(Applause)
