John Wilbanks: Let's pool our medical data

So I have bad news, I have good news, and I have a task. So the bad news is that we all get sick. I get sick. You get sick. And every one of us gets sick, and the question really is, how sick do we get? Is it something that kills us? Is it something that we survive? Is it something that we can treat?

And we've gotten sick as long as we've been people. And so we've always looked for reasons to explain why we get sick. And for a long time, it was the gods, right? The gods are angry with me, or the gods are testing me, right? Or God, singular, more recently, is punishing me or judging me. And as long as we've looked for explanations, we've wound up with something that gets closer and closer to science, which is hypotheses as to why we get sick, and as long as we've had hypotheses about why we get sick, we've tried to treat it as well.

So this is Avicenna. He wrote a book over a thousand years ago called "The Canon of Medicine," and the rules he laid out for testing medicines are actually really similar to the rules we have today, that the disease and the medicine must be the same strength, the medicine needs to be pure, and in the end we need to test it in people. And so if you put together these themes of a narrative or a hypothesis in human testing, right, you get some beautiful results, even when we didn't have very good technologies.

This is a guy named Carlos Finlay. He had a hypothesis that was way outside the box for his time, in the late 1800s. He thought yellow fever was not transmitted by dirty clothing. He thought it was transmitted by mosquitos. And they laughed at him. For 20 years, they called this guy "the mosquito man." But he ran an experiment in people, right? He had this hypothesis, and he tested it in people. So he got volunteers to go move to Cuba and live in tents and be voluntarily infected with yellow fever. So some of the people in some of the tents had dirty clothes and some of the people were in tents that were full of mosquitos that had been exposed to yellow fever. And it definitively proved that it wasn't this magic dust called fomites in your clothes that caused yellow fever. But it wasn't until we tested it in people that we actually knew. And this is what those people signed up for. This is what it looked like to have yellow fever in Cuba at that time. You suffered in a tent, in the heat, alone, and you probably died. But people volunteered for this.

And it's not just a cool example of a scientific design of experiment in theory. They also did this beautiful thing. They signed this document, and it's called an informed consent document. And informed consent is an idea that we should be very proud of as a society, right? It's something that separates us from the Nazis at Nuremberg, enforced medical experimentation. It's the idea that agreement to join a study without understanding isn't agreement. It's something that protects us from harm, from hucksters, from people that would try to hoodwink us into a clinical study that we don't understand, or that we don't agree to. And so you put together the thread of narrative hypothesis, experimentation in humans, and informed consent, and you get what we call clinical study, and it's how we do the vast majority of medical work. It doesn't really matter if you're in the north, the south, the east, the west. Clinical studies form the basis of how we investigate, so if we're going to look at a new drug, right, we test it in people, we draw blood, we do experiments, and we gain consent for that study, to make sure that we're not screwing people over as part of it.

But the world is changing around the clinical study, which has been fairly well established for tens of years if not 50 to 100 years. So now we're able to gather data about our genomes, but, as we saw earlier, our genomes aren't dispositive. We're able to gather information about our environment. And more importantly, we're able to gather information about our choices, because it turns out that what we think of as our health is more like the interaction of our bodies, our genomes, our choices and our environment. And the clinical methods that we've got aren't very good at studying that because they are based on the idea of person-to-person interaction. You interact with your doctor and you get enrolled in the study.

So this is my grandfather. I actually never met him, but he's holding my mom, and his genes are in me, right? His choices ran through to me. He was a smoker, like most people were. This is my son. So my grandfather's genes go all the way through to him, and my choices are going to affect his health. The technology between these two pictures cannot be more different, but the methodology for clinical studies has not radically changed over that time period. We just have better statistics.

The way we gain informed consent was formed in large part after World War II, around the time that picture was taken. That was 70 years ago, and the way we gain informed consent, this tool that was created to protect us from harm, now creates silos. So the data that we collect for prostate cancer or for Alzheimer's trials goes into silos where it can only be used for prostate cancer or for Alzheimer's research. Right? It can't be networked. It can't be integrated. It cannot be used by people who aren't credentialed. So a physicist can't get access to it without filing paperwork. A computer scientist can't get access to it without filing paperwork. Computer scientists aren't patient. They don't file paperwork.

And this is an accident. These are tools that we created to protect us from harm, but what they're doing is protecting us from innovation now. And that wasn't the goal. It wasn't the point. Right? It's a side effect, if you will, of a power we created to take us for good. And so if you think about it, the depressing thing is that Facebook would never make a change to something as important as an advertising algorithm with a sample size as small as a Phase III clinical trial. We cannot take the information from past trials and put them together to form statistically significant samples.

And that sucks, right? So 45 percent of men develop cancer. Thirty-eight percent of women develop cancer. One in four men dies of cancer. One in five women dies of cancer, at least in the United States. And three out of the four drugs we give you if you get cancer fail. And this is personal to me. My sister is a cancer survivor. My mother-in-law is a cancer survivor. Cancer sucks. And when you have it, you don't have a lot of privacy in the hospital. You're naked the vast majority of the time. People you don't know come in and look at you and poke you and prod you, and when I tell cancer survivors that this tool we created to protect them is actually preventing their data from being used, especially when only three to four percent of people who have cancer ever even sign up for a clinical study, their reaction is not, "Thank you, God, for protecting my privacy." It's outrage that we have this information and we can't use it. And it's an accident. So the cost in blood and treasure of this is enormous. Two hundred and twenty-six billion a year is spent on cancer in the United States. Fifteen hundred people a day die in the United States. And it's getting worse.

So the good news is that some things have changed, and the most important thing that's changed is that we can now measure ourselves in ways that used to be the dominion of the health system. So a lot of people talk about it as digital exhaust. I like to think of it as the dust that runs along behind my kid. We can reach back and grab that dust, and we can learn a lot about health from it, so if our choices are part of our health, what we eat is a really important aspect of our health. So you can do something very simple and basic and take a picture of your food, and if enough people do that, we can learn a lot about how our food affects our health. One interesting thing that came out of this — this is an app for iPhones called The Eatery — is that we think our pizza is significantly healthier than other people's pizza is. Okay? (Laughter) And it seems like a trivial result, but this is the sort of research that used to take the health system years and hundreds of thousands of dollars to accomplish. It was done in five months by a startup company of a couple of people. I don't have any financial interest in it.

But more nontrivially, we can get our genotypes done, and although our genotypes aren't dispositive, they give us clues. So I could show you mine. It's just A's, T's, C's and G's. This is the interpretation of it. As you can see, I carry a 32 percent risk of prostate cancer, 22 percent risk of psoriasis and a 14 percent risk of Alzheimer's disease. So that means, if you're a geneticist, you're freaking out, going, "Oh my God, you told everyone you carry the ApoE E4 allele. What's wrong with you?" Right? When I got these results, I started talking to doctors, and they told me not to tell anyone, and my reaction is, "Is that going to help anyone cure me when I get the disease?" And no one could tell me yes. And I live in a web world where, when you share things, beautiful stuff happens, not bad stuff. So I started putting this in my slide decks, and I got even more obnoxious, and I went to my doctor, and I said, "I'd like to actually get my bloodwork. Please give me back my data." So this is my most recent bloodwork. As you can see, I have high cholesterol. I have particularly high bad cholesterol, and I have some bad liver numbers, but those are because we had a dinner party with a lot of good wine the night before we ran the test. (Laughter) Right. But look at how non-computable this information is. This is like the photograph of my granddad holding my mom from a data perspective, and I had to go into the system and get it out.

So the thing that I'm proposing we do here is that we reach behind us and we grab the dust, that we reach into our bodies and we grab the genotype, and we reach into the medical system and we grab our records, and we use it to build something together, which is a commons. And there's been a lot of talk about commonses, right, here, there, everywhere, right. A commons is nothing more than a public good that we build out of private goods. We do it voluntarily, and we do it through standardized legal tools. We do it through standardized technologies. Right. That's all a commons is. It's something that we build together because we think it's important.

And a commons of data is something that's really unique, because we make it from our own data. And although a lot of people like privacy as their methodology of control around data, and obsess around privacy, at least some of us really like to share as a form of control, and what's remarkable about digital commonses is you don't need a big percentage if your sample size is big enough to generate something massive and beautiful. So not that many programmers write free software, but we have the Apache web server. Not that many people who read Wikipedia edit, but it works. So as long as some people like to share as their form of control, we can build a commons, as long as we can get the information out. And in biology, the numbers are even better. So Vanderbilt ran a study asking people, we'd like to take your biosamples, your blood, and share them in a biobank, and only five percent of the people opted out. I'm from Tennessee. It's not the most science-positive state in the United States of America. (Laughter) But only five percent of the people wanted out. So people like to share, if you give them the opportunity and the choice.

And the reason that I got obsessed with this, besides the obvious family aspects, is that I spend a lot of time around mathematicians, and mathematicians are drawn to places where there's a lot of data because they can use it to tease signals out of noise. And those correlations that they can tease out, they're not necessarily causal agents, but math, in this day and age, is like a giant set of power tools that we're leaving on the floor, not plugged in in health, while we use hand saws. If we have a lot of shared genotypes, and a lot of shared outcomes, and a lot of shared lifestyle choices, and a lot of shared environmental information, we can start to tease out the correlations between subtle variations in people, the choices they make and the health that they create as a result of those choices, and there's open-source infrastructure to do all of this. Sage Bionetworks is a nonprofit that's built a giant math system that's waiting for data, but there isn't any.

So that's what I do. I've actually started what we think is the world's first fully digital, fully self-contributed, unlimited in scope, global in participation, ethically approved clinical research study where you contribute the data. So if you reach behind yourself and you grab the dust, if you reach into your body and grab your genome, if you reach into the medical system and somehow extract your medical record, you can actually go through an online informed consent process -- because the donation to the commons must be voluntary and it must be informed -- and you can actually upload your information and have it syndicated to the mathematicians who will do this sort of big data research, and the goal is to get 100,000 in the first year and a million in the first five years so that we have a statistically significant cohort that you can use to take smaller sample sizes from traditional research and map it against, so that you can use it to tease out those subtle correlations between the variations that make us unique and the kinds of health that we need to move forward as a society.

And I've spent a lot of time around other commons. I've been around the early web. I've been around the early creative commons world, and there's four things that all of these share, which is, they're all really simple. And so if you were to go to the website and enroll in this study, you're not going to see something complicated. But it's not simplistic. These things are weak intentionally, right, because you can always add power and control to a system, but it's very difficult to remove those things if you put them in at the beginning, and so being simple doesn't mean being simplistic, and being weak doesn't mean weakness. Those are strengths in the system.

And open doesn't mean that there's no money. Closed systems, corporations, make a lot of money on the open web, and one of the reasons why the open web lives is that corporations have a vested interest in the openness of the system. And so all of these things are part of the clinical study that we've created, so you can actually come in, all you have to be is 14 years old, willing to sign a contract that says I'm not going to be a jerk, basically, and you're in. You can start analyzing the data. You do have to solve a CAPTCHA as well. (Laughter) And if you'd like to build corporate structures on top of it, that's okay too. That's all in the consent, so if you don't like those terms, you don't come in. It's very much the design principles of a commons that we're trying to bring to health data.

And the other thing about these systems is that it only takes a small number of really unreasonable people working together to create them. It didn't take that many people to make Wikipedia Wikipedia, or to keep it Wikipedia. And we're not supposed to be unreasonable in health, and so I hate this word "patient." I don't like being patient when systems are broken, and health care is broken. I'm not talking about the politics of health care, I'm talking about the way we scientifically approach health care. So I don't want to be patient.

And the task I'm giving to you is to not be patient. So I'd like you to actually try, when you go home, to get your data. You'll be shocked and offended and, I would bet, outraged, at how hard it is to get it. But it's a challenge that I hope you'll take, and maybe you'll share it. Maybe you won't. If you don't have anyone in your family who's sick, maybe you wouldn't be unreasonable. But if you do, or if you've been sick, then maybe you would.

And we're going to be able to do an experiment in the next several months that lets us know exactly how many unreasonable people are out there. So this is the Athena Breast Health Network. It's a study of 150,000 women in California, and they're going to return all the data to the participants of the study in a computable form, with one-clickability to load it into the study that I've put together. So we'll know exactly how many people are willing to be unreasonable.

So what I'd end [with] is, the most beautiful thing I've learned since I quit my job almost a year ago to do this, is that it really doesn't take very many of us to achieve spectacular results. You just have to be willing to be unreasonable, and the risk we're running is not the risk those 14 men who got yellow fever ran. Right? It's to be naked, digitally, in public. So you know more about me and my health than I know about you. It's asymmetric now. And being naked and alone can be terrifying. But to be naked in a group, voluntarily, can be quite beautiful. And so it doesn't take all of us. It just takes all of some of us. Thank you. (Applause)
