Jennifer Golbeck: Your social media "likes" expose more than you think

If you remember that first decade of the web, it was really a static place. You could go online, you could look at pages, and they were put up either by organizations who had teams to do it or by individuals who were really tech-savvy for the time. And with the rise of social media and social networks in the early 2000s, the web was completely changed to a place where now the vast majority of content we interact with is put up by average users, either in YouTube videos or blog posts or product reviews or social media postings. And it's also become a much more interactive place, where people are interacting with others, they're commenting, they're sharing, they're not just reading.

Çevrimiçi ağın ilk on yılını düşündüğünüzde, çok durağan bir yer olduğunu görürsünüz. Çevrimiçi olurdunuz, ve sayfalara bakardınız, bu sayfalar ya bunu yaptıracak ekipleri olan kuruluşlar ya da o zamana göre gerçekten teknoloji meraklısı bireyler tarafından yapılırdı. 2000li yılların başlarında sosyal medyanın ve sosyal ağın gelişmesi ile birlikte, çevrimiçi ağ tamamen değişti ve etkileşimde olduğumuz içeriğin büyük çoğunluğu ortalama kullanıcılar tarafından Youtube videoları ya da blog yazıları ürün eleştirileri ya da sosyal medya mesajları şeklinde hazırlanır oldu. Ayrıca, insanların yalnızca okumadığı, birbirleriyle etkileştiği, paylaştığı ve yorum yazdığı çok daha etkileşimli bir ortam haline geldi.

So Facebook is not the only place you can do this, but it's the biggest, and it serves to illustrate the numbers. Facebook has 1.2 billion users per month. So half the Earth's Internet population is using Facebook. They are a site, along with others, that has allowed people to create an online persona with very little technical skill, and people responded by putting huge amounts of personal data online. So the result is that we have behavioral, preference, demographic data for hundreds of millions of people, which is unprecedented in history. And as a computer scientist, what this means is that I've been able to build models that can predict all sorts of hidden attributes for all of you that you don't even know you're sharing information about. As scientists, we use that to help the way people interact online, but there's less altruistic applications, and there's a problem in that users don't really understand these techniques and how they work, and even if they did, they don't have a lot of control over it. So what I want to talk to you about today is some of these things that we're able to do, and then give us some ideas of how we might go forward to move some control back into the hands of users.

Facebook bunu yapabileceğiniz tek platform değil ancak hem en büyüğü, hem de rakamlarla konuşmamıza olanak tanıyor. Facebook'un ayda 1,2 milyar kullanıcısı var. Yani dünyanın internet kullanıcı nüfusunun yarısı Facebook kullanıyor. Diğerleri gibi bu site de insanların çok az teknik bilgi ile online bir kişilik oluşturmalarına olanak sağladı ve insanlar da çok miktarda kişisel bilgiyi online olarak yayınladılar. Sonuç olarak tarihte daha önce görülmemiş bir şekilde yüzlerce milyon insanın davranışsal, tercihsel ve demografik bilgilerine ulaştık. Bir bilgisayar bilimcisi olarak bu benim, siz paylaştığınızın farkına bile varmadan sizin her türlü saklı özelliğinizi tahmin eden modeller tasarlayabilmem anlamına gelmektedir. Bilim insanları olarak bizler bu bilgileri insanların online iletişimine destek amaçlı kullanırız ancak bu kadar fedakar olmayan uygulamalar da var ve problem, kullanıcıların bu tekniklerin nasıl işlediğini çok iyi anlamamaları, anlasalar dahi bunları pek kontrol edememeleridir. Bu nedenle bugün sizlerle yapabildiğimiz şeylerden bazılarını paylaşmak ve kullanıcıların eline biraz daha fazla kontrol vermek için neler yapılabileceği hakkında fikirler vermek istiyorum.

So this is Target, the company. I didn't just put that logo on this poor, pregnant woman's belly. You may have seen this anecdote that was printed in Forbes magazine where Target sent a flyer to this 15-year-old girl with advertisements and coupons for baby bottles and diapers and cribs two weeks before she told her parents that she was pregnant. Yeah, the dad was really upset. He said, "How did Target figure out that this high school girl was pregnant before she told her parents?" It turns out that they have the purchase history for hundreds of thousands of customers and they compute what they call a pregnancy score, which is not just whether or not a woman's pregnant, but what her due date is. And they compute that not by looking at the obvious things, like, she's buying a crib or baby clothes, but things like, she bought more vitamins than she normally had, or she bought a handbag that's big enough to hold diapers. And by themselves, those purchases don't seem like they might reveal a lot, but it's a pattern of behavior that, when you take it in the context of thousands of other people, starts to actually reveal some insights. So that's the kind of thing that we do when we're predicting stuff about you on social media. We're looking for little patterns of behavior that, when you detect them among millions of people, lets us find out all kinds of things.

Bu Target, bir şirket. Bu zavallı hamile kadının karnına bu logoyu öylesine koymadım. Daha önce Forbes dergisinde bu anekdotu görmüşsünüzdür. Target, 15 yaşındaki bu kıza, kendisi daha anne ve babasına hamile olduğunu söylemeden iki hafta önce içinde biberon, bebek bezi ve bebek yatağı reklam ve kuponlarının olduğu bir broşür göndermişti. Evet, haliyle baba sinirlenmişti. "Nasıl olur da Target liseli bir kızın hamile olduğunu o anne ve babasına söylemeden önce bilebilir?" dedi. Belli oldu ki, yüz binlerce müşterinin alışveriş geçmişi hakkında bilgileri vardı ve hamilelik puanı adı verilen bir değeri hesaplayarak, bir kadının sadece hamile olup olmadığını değil, ne zaman doğum yapacağını da biliyorlardı. Ve bu hesaplamaları, kızın bebek yatağı ya da bebek kıyafetleri alması gibi zaten bariz olan şeylere bakarak değil, normalden daha fazla vitamin satın alması ya da bebek bezi konulabilecek kadar büyük bir el çantası alması gibi şeylere bakarak yapıyorlar. Bu satın almalar kendi başlarına çok da bir şey ifade eder gibi değiller ancak bu davranış kalıpları binlerce diğer insan bağlamında düşünüldüğünde gerçekten bazı öngörüler sunmaya başlıyor. İşte sosyal medyada sizler hakkında tahminler yürütürken bizim de yaptığımız bu. Milyonlarca insan arasında tespit edildiğinde her tür bilgiyi bize sunan küçük davranış kalıpları ararız.
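The idea of combining many weak purchase signals into one score can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the talk does not describe how Target actually computes its score, so the naive-Bayes style combination, the `score` function, the prior, and every number in `LIKELIHOOD_RATIOS` are hypothetical.

```python
import math

# Hypothetical likelihood ratios: how much more common each purchase is in
# the group being predicted than among everyone else. Made-up numbers.
LIKELIHOOD_RATIOS = {
    "extra_vitamins": 2.0,
    "large_handbag": 1.5,
}


def score(purchases, prior=0.02):
    """Combine weak signals into one probability, naive-Bayes style.

    Assumes the signals are independent, which real purchases are not.
    """
    log_odds = math.log(prior / (1 - prior))
    for item in set(purchases):
        # Unknown items carry no information (likelihood ratio of 1).
        log_odds += math.log(LIKELIHOOD_RATIOS.get(item, 1.0))
    return 1 / (1 + math.exp(-log_odds))


if __name__ == "__main__":
    print(score([]))
    print(score(["extra_vitamins"]))
    print(score(["extra_vitamins", "large_handbag"]))
```

Each purchase on its own moves the estimate only a little; it is the accumulation of several such signals that makes the score informative, which matches the point made above about patterns of behavior.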

So in my lab and with colleagues, we've developed mechanisms where we can quite accurately predict things like your political preference, your personality score, gender, sexual orientation, religion, age, intelligence, along with things like how much you trust the people you know and how strong those relationships are. We can do all of this really well. And again, it doesn't come from what you might think of as obvious information.

Laboratuvarımızda ben ve iş arkadaşlarım sizin siyasi tercihinizi, kişilik puanınızı, cinsiyetinizi, cinsel yöneliminizi, dininizi, yaşınızı, zekanızı ve bunlara ek olarak tanıdığınız insanlara ne kadar güvendiğinizi ve bu ilişkilerinizin ne kadar güçlü olduğunu gayet doğru şekilde tahmin eden mekanizmalar geliştirdik. Bütün bunları çok iyi şekilde yapabiliyoruz. Ve yine, bunlar bariz olduğunu düşünebileceğiniz bilgilerden gelmiyor.

So my favorite example is from this study that was published this year in the Proceedings of the National Academies. If you Google this, you'll find it. It's four pages, easy to read. And they looked at just people's Facebook likes, so just the things you like on Facebook, and used that to predict all these attributes, along with some other ones. And in their paper they listed the five likes that were most indicative of high intelligence. And among those was liking a page for curly fries. (Laughter) Curly fries are delicious, but liking them does not necessarily mean that you're smarter than the average person. So how is it that one of the strongest indicators of your intelligence is liking this page when the content is totally irrelevant to the attribute that's being predicted? And it turns out that we have to look at a whole bunch of underlying theories to see why we're able to do this. One of them is a sociological theory called homophily, which basically says people are friends with people like them. So if you're smart, you tend to be friends with smart people, and if you're young, you tend to be friends with young people, and this is well established for hundreds of years. We also know a lot about how information spreads through networks. It turns out things like viral videos or Facebook likes or other information spreads in exactly the same way that diseases spread through social networks. So this is something we've studied for a long time. We have good models of it. And so you can put those things together and start seeing why things like this happen. So if I were to give you a hypothesis, it would be that a smart guy started this page, or maybe one of the first people who liked it would have scored high on that test. 
And they liked it, and their friends saw it, and by homophily, we know that he probably had smart friends, and so it spread to them, and some of them liked it, and they had smart friends, and so it spread to them, and so it propagated through the network to a host of smart people, so that by the end, the action of liking the curly fries page is indicative of high intelligence, not because of the content, but because the actual action of liking reflects back the common attributes of other people who have done it.

Benim en sevdiğim örnek, bu yıl Proceedings of the National Academies dergisinde yayımlanan bir araştırmadan. Google'da ararsanız bulursunuz. Dört sayfalık, kolay okunan bir metin. Sadece insanların Facebook beğenilerine, yani Facebook'ta beğendikleri şeylere bakıp bu bilgiyle bütün bu özellikleri ve başka bazı özellikleri tahmin etmişler. Makalelerinde yüksek zekanın en iyi göstergesi olan beş beğeniyi listelemişler. Ve bunların arasında kıvrık patates kızartması için açılmış bir sayfayı beğenmek de bulunuyor. (Gülüşmeler) Kıvrık patates kızartması nefistir ama onları seviyor olmanız ortalama bir insandan daha zeki olduğunuz anlamına gelmez. Peki nasıl oluyor da zekanızın en güçlü göstergelerinden biri, içeriğinin tahmin edilen özellikle hiçbir ilgisi olmayan bir sayfayı beğenmek oluyor? Görülüyor ki bunun nasıl yapıldığını anlamak için altta yatan pek çok teoriye bakmak gerekecek. Bunlardan bir tanesi homofili adı verilen sosyolojik bir teoridir ve temelde insanların kendileri gibi olan kişilerle dost olduklarını söyler. Yani eğer zekiyseniz zeki insanlarla dost olma eğilimindesinizdir, gençseniz genç insanlarla dost olma eğiliminiz vardır ve bu yüzyıllardır yerleşik olan bir bilgidir. Bilginin ağlarda nasıl yayıldığı hakkında da çok şey biliyoruz. Görülüyor ki viral videolar, Facebook beğenileri ya da diğer bilgiler sosyal ağlarda hastalıkların yayılması ile tamamen aynı şekilde yayılıyor. Bu uzun zamandır üzerinde çalıştığımız bir konu. Bunun için iyi modellerimiz var. Bütün bu bilgileri bir araya getirerek böyle şeylerin neden olduğunu görmeye başlayabilirsiniz. Size bir hipotez verecek olsam, bu sayfayı zeki bir insanın başlattığını ya da ilk beğenen kişilerden birinin o testte yüksek puan alacak biri olduğunu söylerdim.
Onlar beğendiler, arkadaşları bunu gördü, ve homofili sayesinde onun da zeki arkadaşlarının olduğunu biliyoruz böylece bu onlara da yayıldı ve onların da bazıları beğendi ve onların da zeki arkadaşları vardı ve bu onlara da yayıldı böylece bu ağ boyunca zeki insanlar arasında yayılmış oldu sonunda kıvrık patatesi beğenme sayfası yüksek zekanın göstergesi haline geldi içeriği için değil ama, beğenme eylemini gerçekleştiren kişilerin ortak özelliklerinden dolayı.
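The hypothesis above, that a like can end up signaling a trait purely because of who it spread through, can be explored with a toy simulation. This is a minimal sketch, not the method of the study mentioned in the talk: the `simulate` function, the population size, the friendship probabilities, and the three-round cascade are all assumptions chosen for illustration. Note that the chance of liking the page is the same for everyone; only the friendship structure depends on the trait.

```python
import random


def simulate(seed, n=400, p_same=0.05, p_cross=0.005, p_like=0.3, rounds=3):
    """Return the share of people with the trait among those who liked the page."""
    rng = random.Random(seed)
    # First half of the population has the trait, second half does not.
    has_trait = [i < n // 2 for i in range(n)]

    # Homophily: same-trait pairs are far more likely to be friends.
    friends = [[] for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            p_edge = p_same if has_trait[a] == has_trait[b] else p_cross
            if rng.random() < p_edge:
                friends[a].append(b)
                friends[b].append(a)

    # Person 0, who has the trait, starts the page and likes it.
    liked = {0}
    frontier = [0]
    for _ in range(rounds):
        next_frontier = []
        for person in frontier:
            for friend in friends[person]:
                # Liking does not depend on the trait at all.
                if friend not in liked and rng.random() < p_like:
                    liked.add(friend)
                    next_frontier.append(friend)
        frontier = next_frontier

    return sum(has_trait[person] for person in liked) / len(liked)


if __name__ == "__main__":
    shares = [simulate(seed) for seed in range(10)]
    # Compare against the population share of the trait, which is 0.5.
    print(sum(shares) / len(shares))
```

Under these assumptions the early likers are drawn mostly from the starter's own group, so the like becomes correlated with the trait even though the page's content has nothing to do with it.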

So this is pretty complicated stuff, right? It's a hard thing to sit down and explain to an average user, and even if you do, what can the average user do about it? How do you know that you've liked something that indicates a trait for you that's totally irrelevant to the content of what you've liked? There's a lot of power that users don't have to control how this data is used. And I see that as a real problem going forward.

Epeyce karmaşık bir şey değil mi? Ortalama bir kullanıcıyla oturup bunu ona anlatmak zor olacaktır, zaten anlatılsa bile ortalama kullanıcının yapabileceği ne var ki? Beğendiğiniz bir şeyin, o beğendiğiniz şeyin içeriğiyle hiç alakası olmayan bir özelliğinize işaret ettiğini nasıl bilebilirsiniz? Kullanıcıların, bu verilerin nasıl kullanılacağını kontrol etmek için sahip olmadıkları pek çok güç var. Ve ben bunu ileriye dönük gerçek bir sorun olarak görüyorum.

So I think there's a couple paths that we want to look at if we want to give users some control over how this data is used, because it's not always going to be used for their benefit. An example I often give is that, if I ever get bored being a professor, I'm going to go start a company that predicts all of these attributes and things like how well you work in teams and if you're a drug user, if you're an alcoholic. We know how to predict all that. And I'm going to sell reports to H.R. companies and big businesses that want to hire you. We totally can do that now. I could start that business tomorrow, and you would have absolutely no control over me using your data like that. That seems to me to be a problem.

Eğer kullanıcılara bu verilerin nasıl kullanılacağına dair biraz kontrol vermek istiyorsak bakılması gereken bazı seçenekler var, çünkü bu veriler her zaman onların yararına kullanılmayacaktır. Sık verdiğim bir örnek, eğer bir gün profesörlükten bıkarsam, takım içinde nasıl çalıştığınız uyuşturucu kullanıp kullanmadığınız, alkolik olup olmadığınız gibi özelliklere bakacağım bir şirket kurma isteğimdir. Bunların hepsini nasıl tahmin edeceğimizi biliyoruz. Bu bilgileri, sizi işe almak isteyen insan kaynakları şirketlerine ve büyük firmalara satacağım. Bunların hepsini yapabiliyoruz. Yarın bir şirket açabilirim ve sizin, size ait bu bilgileri nasıl kullanacağım üzerinde hiç bir kontrolünüz olmaz. Bu bana bir sorun gibi görünüyor.

So one of the paths we can go down is the policy and law path. And in some respects, I think that that would be most effective, but the problem is we'd actually have to do it. Observing our political process in action makes me think it's highly unlikely that we're going to get a bunch of representatives to sit down, learn about this, and then enact sweeping changes to intellectual property law in the U.S. so users control their data.

Bakacağımız seçeneklerden biri politika ve hukuk seçeneğidir. Bence bazı açılardan bu en etkilisi olurdu ancak sorun bunu gerçekten yapmamız gerektiğidir. Politik süreçlerimizin işleyişine baktığımda, bir grup vekilin oturup bunları öğrenmesi ve sonra da kullanıcıların verilerini kontrol etmelerini sağlamak için ABD fikri mülkiyet hukukunda kapsamlı değişiklikler yapması bana çok düşük bir ihtimal gibi görünüyor.

We could go the policy route, where social media companies say, you know what? You own your data. You have total control over how it's used. The problem is that the revenue models for most social media companies rely on sharing or exploiting users' data in some way. It's sometimes said of Facebook that the users aren't the customer, they're the product. And so how do you get a company to cede control of their main asset back to the users? It's possible, but I don't think it's something that we're going to see change quickly.

Politika seçeneğine bakabiliriz; burada sosyal medya şirketleri şöyle der: Biliyor musunuz? Verileriniz size aittir. Nasıl kullanıldığının kontrolü tamamen sizdedir. Ancak sorun şu ki çoğu sosyal medya şirketinin gelir modeli kullanıcı verilerinin bir şekilde paylaşılmasına ya da kullanılmasına dayanıyor. Facebook için bazen, kullanıcıların müşteri değil ürünün kendisi olduğu söylenir. O zaman bir firmanın en temel varlığının kontrolünü kullanıcılara geri vermesini nasıl sağlarsınız? Bu mümkündür tabii ancak hızlı değişecek bir şey olduğunu düşünmüyorum.

So I think the other path that we can go down that's going to be more effective is one of more science. It's doing science that allowed us to develop all these mechanisms for computing this personal data in the first place. And it's actually very similar research that we'd have to do if we want to develop mechanisms that can say to a user, "Here's the risk of that action you just took." By liking that Facebook page, or by sharing this piece of personal information, you've now improved my ability to predict whether or not you're using drugs or whether or not you get along well in the workplace. And that, I think, can affect whether or not people want to share something, keep it private, or just keep it offline altogether. We can also look at things like allowing people to encrypt data that they upload, so it's kind of invisible and worthless to sites like Facebook or third party services that access it, but that select users who the person who posted it want to see it have access to see it. This is all super exciting research from an intellectual perspective, and so scientists are going to be willing to do it. So that gives us an advantage over the law side.

O nedenle daha etkili olacağını düşündüğüm diğer seçenek, daha fazla bilim seçeneğidir. Bütün bu kişisel verilerin bu mekanizmalarla hesaplanabilmesine en başta olanak tanıyan şey bilim oldu. Kullanıcıya "Az önce gerçekleştirdiğin eylemin riski şudur" diyebilecek bir mekanizma yaratmak istiyorsak ilk baştakine çok benzer araştırmalar yapmak gerekir. O Facebook sayfasını beğenerek ya da şu kişisel bilgiyi paylaşarak uyuşturucu kullanıp kullanmadığını ya da iş yerinde insanlarla iyi geçinip geçinmediğini tahmin etmemi kolaylaştırdın. Bu durum, insanların bir şeyi paylaşıp paylaşmamalarını, kendilerine saklamalarını ya da tamamen çevrimdışı tutmalarını bence etkileyecektir. İnsanların yükledikleri verileri şifrelemelerine izin vermek de bir seçenektir; böylece Facebook gibi siteler ya da bu verilere erişen üçüncü taraf hizmetler için veriler görünmez ve değersiz olacak, ancak yükleyen kişinin görmesini istediği seçili kullanıcılar bunlara erişebilecektir. Entelektüel açıdan bu süper heyecan verici bir araştırmadır ve bilim insanları bunu yapmak isteyecektir. Bu, hukuk seçeneğine göre bize bir avantaj sağlar.
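A tool that tells a user "here's the risk of that action you just took" could, in its simplest form, compare a prediction before and after the action. The sketch below assumes a single attribute and a known likelihood ratio for the action; the function names `posterior` and `risk_warning`, the threshold, and the example numbers are hypothetical and are not taken from any existing system.

```python
def posterior(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)


def risk_warning(attribute, prior, likelihood_ratio, threshold=0.05):
    """Return a warning if the action shifts the prediction noticeably, else None."""
    after = posterior(prior, likelihood_ratio)
    if abs(after - prior) < threshold:
        return None
    return (
        f"This action changes the estimated probability of "
        f"'{attribute}' from {prior:.0%} to {after:.0%}."
    )


if __name__ == "__main__":
    # Hypothetical example: a like that is three times more common
    # among people who have the attribute than among those who do not.
    print(risk_warning("gets along well at work", prior=0.10, likelihood_ratio=3.0))
```

The design choice here is to stay silent when the shift is small, so that a user would only be interrupted for actions that meaningfully improve someone's ability to predict something about them.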

One of the problems that people bring up when I talk about this is, they say, you know, if people start keeping all this data private, all those methods that you've been developing to predict their traits are going to fail. And I say, absolutely, and for me, that's success, because as a scientist, my goal is not to infer information about users, it's to improve the way people interact online. And sometimes that involves inferring things about them, but if users don't want me to use that data, I think they should have the right to do that. I want users to be informed and consenting users of the tools that we develop.

Bundan bahsettiğimde insanların öne sürdükleri bir sorun, eğer herkes verilerini gizli tutarsa özellikleri tahmin etmek için geliştirdiğiniz tüm metotların başarısız olacağıdır. Ben de evet, kesinlikle diyorum ve bence bu bir başarıdır çünkü bir bilim insanı olarak benim amacım kullanıcılar hakkında çıkarımlarda bulunmak değil, insanların online etkileşimlerini geliştirmek. Bazen bu onlar hakkında çıkarımlar yapılmasını da gerektiriyor ancak eğer kullanıcılar bu verileri kullanmamı istemiyorlarsa buna hakları olması gerektiğini düşünüyorum. Ben kullanıcıların, geliştirdiğimiz araçlar hakkında bilgisi olan ve bunlara rıza gösteren kullanıcılar olmasını istiyorum.

And so I think encouraging this kind of science and supporting researchers who want to cede some of that control back to users and away from the social media companies means that going forward, as these tools evolve and advance, means that we're going to have an educated and empowered user base, and I think all of us can agree that that's a pretty ideal way to go forward.

Bence bu tür bilimi teşvik etmek ve kontrolün bir kısmını sosyal medya şirketlerinden alıp kullanıcılara geri vermek isteyen araştırmacıları desteklemek, bu araçlar gelişip ilerledikçe eğitimli ve güç sahibi bir kullanıcı tabanımız olacağı anlamına geliyor ve sanırım hepimiz bunun ilerlemek için oldukça ideal bir yol olduğu konusunda hemfikirizdir.

Thank you.

Teşekkürler.

(Applause)

(Alkış)
