I would like to tell you a story connecting the notorious privacy incident involving Adam and Eve, and the remarkable shift in the boundaries between public and private which has occurred in the past 10 years.
You know the incident. Adam and Eve one day in the Garden of Eden realize they are naked. They freak out. And the rest is history.
Nowadays, Adam and Eve would probably act differently.
[@Adam Last nite was a blast! loved dat apple LOL]
[@Eve yep.. babe, know what happened to my pants tho?]
We do reveal so much more information about ourselves online than ever before, and so much information about us is being collected by organizations. Now there is much to gain and benefit from this massive analysis of personal information, or big data, but there are also complex tradeoffs that come from giving away our privacy. And my story is about these tradeoffs.
We start with an observation which, in my mind, has become clearer and clearer in the past few years, that any personal information can become sensitive information. Back in the year 2000, about 100 billion photos were shot worldwide, but only a minuscule proportion of them were actually uploaded online. In 2010, only on Facebook, in a single month, 2.5 billion photos were uploaded, most of them identified. In the same span of time, computers' ability to recognize people in photos improved by three orders of magnitude. What happens when you combine these technologies together: increasing availability of facial data; improving facial recognizing ability by computers; but also cloud computing, which gives anyone in this theater the kind of computational power which a few years ago was only the domain of three-letter agencies; and ubiquitous computing, which allows my phone, which is not a supercomputer, to connect to the Internet and do there hundreds of thousands of face metrics in a few seconds? Well, we conjecture that the result of this combination of technologies will be a radical change in our very notions of privacy and anonymity.
To test that, we did an experiment on the Carnegie Mellon University campus. We asked students who were walking by to participate in a study, and we took a shot with a webcam, and we asked them to fill out a survey on a laptop. While they were filling out the survey, we uploaded their shot to a cloud-computing cluster, and we started using a facial recognizer to match that shot to a database of some hundreds of thousands of images which we had downloaded from Facebook profiles. By the time the subject reached the last page on the survey, the page had been dynamically updated with the 10 best matching photos which the recognizer had found, and we asked the subject to indicate whether he or she was in any of those photos.
Do you see the subject? Well, the computer did, and in fact did so for one out of three subjects.
So essentially, we can start from an anonymous face, offline or online, and we can use facial recognition to give a name to that anonymous face thanks to social media data. But a few years back, we did something else. We started from social media data, we combined it statistically with data from U.S. government social security, and we ended up predicting social security numbers, which in the United States are extremely sensitive information.
Do you see where I'm going with this? So if you combine the two studies together, then the question becomes, can you start from a face and, using facial recognition, find a name and publicly available information about that name and that person, and from that publicly available information infer non-publicly available information, much more sensitive ones which you link back to the face? And the answer is, yes, we can, and we did. Of course, the accuracy keeps getting worse. [27% of subjects' first 5 SSN digits identified (with 4 attempts)] But in fact, we even decided to develop an iPhone app which uses the phone's internal camera to take a shot of a subject and then upload it to a cloud and then do what I just described to you in real time: looking for a match, finding public information, trying to infer sensitive information, and then sending back to the phone so that it is overlaid on the face of the subject, an example of augmented reality, probably a creepy example of augmented reality. In fact, we didn't develop the app to make it available, just as a proof of concept.
In fact, take these technologies and push them to their logical extreme. Imagine a future in which strangers around you will look at you through their Google Glasses or, one day, their contact lenses, and use seven or eight data points about you to infer anything else which may be known about you. What will this future without secrets look like? And should we care?
We may like to believe that the future with so much wealth of data would be a future with no more biases, but in fact, having so much information doesn't mean that we will make decisions which are more objective. In another experiment, we presented to our subjects information about a potential job candidate. We included in this information some references to some funny, absolutely legal, but perhaps slightly embarrassing information that the subject had posted online. Now interestingly, among our subjects, some had posted comparable information, and some had not. Which group do you think was more likely to judge harshly our subject? Paradoxically, it was the group who had posted similar information, an example of moral dissonance.
Now you may be thinking, this does not apply to me, because I have nothing to hide. But in fact, privacy is not about having something negative to hide. Imagine that you are the H.R. director of a certain organization, and you receive résumés, and you decide to find more information about the candidates. Therefore, you Google their names and in a certain universe, you find this information. Or in a parallel universe, you find this information. Do you think that you would be equally likely to call either candidate for an interview? If you think so, then you are not like the U.S. employers who are, in fact, part of our experiment, meaning we did exactly that. We created Facebook profiles, manipulating traits, then we started sending out résumés to companies in the U.S., and we detected, we monitored, whether they were searching for our candidates, and whether they were acting on the information they found on social media. And they were. Discrimination was happening through social media for equally skilled candidates.
Now marketers like us to believe that all information about us will always be used in a manner which is in our favor. But think again. Why should that be always the case? In a movie which came out a few years ago, "Minority Report," a famous scene had Tom Cruise walk in a mall and holographic personalized advertising would appear around him. Now, that movie is set in 2054, about 40 years from now, and as exciting as that technology looks, it already vastly underestimates the amount of information that organizations can gather about you, and how they can use it to influence you in a way that you will not even detect.
So as an example, this is another experiment actually we are running, not yet completed. Imagine that an organization has access to your list of Facebook friends, and through some kind of algorithm they can detect the two friends that you like the most. And then they create, in real time, a facial composite of these two friends. Now studies prior to ours have shown that people no longer recognize even themselves in facial composites, but they react to those composites in a positive manner. So next time you are looking for a certain product, and there is an ad suggesting that you buy it, it will not be just a standard spokesperson. It will be one of your friends, and you will not even know that this is happening.
Now the problem is that the current policy mechanisms we have to protect ourselves from the abuses of personal information are like bringing a knife to a gunfight. One of these mechanisms is transparency, telling people what you are going to do with their data. And in principle, that's a very good thing. It's necessary, but it is not sufficient. Transparency can be misdirected. You can tell people what you are going to do, and then you still nudge them to disclose arbitrary amounts of personal information.
So in yet another experiment, this one with students, we asked them to provide information about their campus behavior, including pretty sensitive questions, such as this one. [Have you ever cheated in an exam?] Now to one group of subjects, we told them, "Only other students will see your answers." To another group of subjects, we told them, "Students and faculty will see your answers." Transparency. Notification. And sure enough, this worked, in the sense that the first group of subjects were much more likely to disclose than the second. It makes sense, right? But then we added the misdirection. We repeated the experiment with the same two groups, this time adding a delay between the time we told subjects how we would use their data and the time they actually started answering the questions.
How long a delay do you think we had to add in order to nullify the inhibitory effect of knowing that faculty would see your answers? Ten minutes? Five minutes? One minute? How about 15 seconds? Fifteen seconds were sufficient to have the two groups disclose the same amount of information, as if the second group now no longer cares for faculty reading their answers.
Now I have to admit that this talk so far may sound exceedingly gloomy, but that is not my point. In fact, I want to share with you the fact that there are alternatives. The way we are doing things now is not the only way they can be done, and certainly not the best way they can be done. When someone tells you, "People don't care about privacy," consider whether the game has been designed and rigged so that they cannot care about privacy, and coming to the realization that these manipulations occur is already halfway through the process of being able to protect yourself. When someone tells you that privacy is incompatible with the benefits of big data, consider that in the last 20 years, researchers have created technologies to allow virtually any electronic transactions to take place in a more privacy-preserving manner. We can browse the Internet anonymously. We can send emails that can only be read by the intended recipient, not even the NSA. We can have even privacy-preserving data mining. In other words, we can have the benefits of big data while protecting privacy. Of course, these technologies imply a shifting of cost and revenues between data holders and data subjects, which is why, perhaps, you don't hear more about them.
Which brings me back to the Garden of Eden. There is a second privacy interpretation of the story of the Garden of Eden which doesn't have to do with the issue of Adam and Eve feeling naked and feeling ashamed. You can find echoes of this interpretation in John Milton's "Paradise Lost." In the garden, Adam and Eve are materially content. They're happy. They are satisfied. However, they also lack knowledge and self-awareness. The moment they eat the aptly named fruit of knowledge, that's when they discover themselves. They become aware. They achieve autonomy. The price to pay, however, is leaving the garden. So privacy, in a way, is both the means and the price to pay for freedom.
Again, marketers tell us that big data and social media are not just a paradise of profit for them, but a Garden of Eden for the rest of us. We get free content. We get to play Angry Birds. We get targeted apps. But in fact, in a few years, organizations will know so much about us, they will be able to infer our desires before we even form them, and perhaps buy products on our behalf before we even know we need them.
Now there was one English author who anticipated this kind of future where we would trade away our autonomy and freedom for comfort. Even more so than George Orwell, the author is, of course, Aldous Huxley. In "Brave New World," he imagines a society where technologies that we created originally for freedom end up coercing us. However, in the book, he also offers us a way out of that society, similar to the path that Adam and Eve had to follow to leave the garden. In the words of the Savage, regaining autonomy and freedom is possible, although the price to pay is steep. So I do believe that one of the defining fights of our times will be the fight for the control over personal information, the fight over whether big data will become a force for freedom, rather than a force which will hiddenly manipulate us.
Right now, many of us do not even know that the fight is going on, but it is, whether you like it or not. And at the risk of playing the serpent, I will tell you that the tools for the fight are here, the awareness of what is going on, and in your hands, just a few clicks away.
Thank you.
(Applause)