John Wilbanks: Let's pool our medical data

So I have bad news, I have good news, and I have a task. So the bad news is that we all get sick. I get sick. You get sick. And every one of us gets sick, and the question really is, how sick do we get? Is it something that kills us? Is it something that we survive? Is it something that we can treat?

좋은 소식과, 나쁜 소식, 그리고 할 일이 하나 있습니다 나쁜 소식은 우리 모두가 병에 걸린다는 겁니다 저도 병에 걸리고, 여러분도 걸리고, 우리 모두가 병에 걸립니다 진짜 질문은 바로 얼마나 아프냐는 거죠 죽을 정도로 아픈 건가요? 살아 남을 수 있을 정도인가요? 아니면 치료할 수 있을 정도로요?

And we've gotten sick as long as we've been people. And so we've always looked for reasons to explain why we get sick. And for a long time, it was the gods, right? The gods are angry with me, or the gods are testing me, right? Or God, singular, more recently, is punishing me or judging me. And as long as we've looked for explanations, we've wound up with something that gets closer and closer to science, which is hypotheses as to why we get sick, and as long as we've had hypotheses about why we get sick, we've tried to treat it as well.

우리가 사람으로 살아가는 이상 아프기 마련입니다 그래서 우리는 왜 아프게 되는 지에 대한 이유를 항상 찾곤 합니다 오랜 기간동안 신 때문이라고 믿었어요 그렇죠? 신이 나에게 노하셨다거나, 신이 나를 시험한다 맞죠? 좀 더 최근에는 주로 신이 나를 벌한다거나 심판한다고 믿게 되었죠 우리가 이유를 찾으려고 할수록 왜 우리가 병에 걸리는 지에 관한 가정과 같은 과학적인 측면에 점점 더 가까워지는 것에 이르렀고, 우리가 그러한 가정을 하는 한, 치료를 하려고 시도해왔습니다

So this is Avicenna. He wrote a book over a thousand years ago called "The Canon of Medicine," and the rules he laid out for testing medicines are actually really similar to the rules we have today, that the disease and the medicine must be the same strength, the medicine needs to be pure, and in the end we need to test it in people. And so if you put together these themes of a narrative or a hypothesis in human testing, right, you get some beautiful results, even when we didn't have very good technologies.

자, 이 사람은 이븐 시나(Avicenna)인데요, 수천 년 전에 "의학정전"이라는 책을 썼습니다 약을 시험하는데 있어서 그가 나열한 방법들은 실제로 병원균과 시약은 같은 강도여야만 하고, 약은 정제되어야하며, 마지막으로 사람에게 시험해야한다는 오늘날의 법칙과 아주 유사합니다 그래서 만약 이러한 서술이나 가정의 주제를 임상시험에 적용한다면, 아주 뛰어난 기술을 가지고 있지 않더라도 좋은 결과를 얻을 것입니다

This is a guy named Carlos Finlay. He had a hypothesis that was way outside the box for his time, in the late 1800s. He thought yellow fever was not transmitted by dirty clothing. He thought it was transmitted by mosquitos. And they laughed at him. For 20 years, they called this guy "the mosquito man." But he ran an experiment in people, right? He had this hypothesis, and he tested it in people. So he got volunteers to go move to Cuba and live in tents and be voluntarily infected with yellow fever. So some of the people in some of the tents had dirty clothes and some of the people were in tents that were full of mosquitos that had been exposed to yellow fever. And it definitively proved that it wasn't this magic dust called fomites in your clothes that caused yellow fever. But it wasn't until we tested it in people that we actually knew. And this is what those people signed up for. This is what it looked like to have yellow fever in Cuba at that time. You suffered in a tent, in the heat, alone, and you probably died. But people volunteered for this.

이 남자는 카를로스 핀라이(Carlos Finlay)입니다 그의 가정은 1800년대 후반이라는 시대와는 어울리지 않는 것이었습니다 그는 황열이 더러운 옷에 의해 전염이 되는 것이 아니라고 생각했습니다 모기에 의해 전염된다고 믿었죠 사람들은 그를 비웃었고, 20년 동안이나 "모기남"이라고 비웃었습니다 하지만 그는 사람들에게 실험을 해봤습니다 이러한 가정을 가지고, 사람들에게 시험을 한 것이죠 그래서 쿠바로 가서 텐트에서 살면서 자발적으로 황열에 감염될 지원자들을 구했습니다 일부는 더러운 옷을 넣어 놓은 텐트에서 지냈고, 일부는 황열에 감염된 모기들로 가득찬 텐트에서 살았습니다 그리고 이 실험은 황열을 일으키는 것은 매개물(fomites)이라고 불리는 마법의 가루가 아니라는 것을 확실히 증명해냈죠 하지만 우리가 아는 사람들에게 직접 실험을 하기 전까지는 몰랐습니다 그리고 사람들이 여기에 자발적으로 동참했죠. 당시 쿠바에서 황열에 걸렸을 때 나타나는 증상으로는 텐트 안에서 혼자서 고열에 시름시름 앓다가 결국엔 죽게 되는 것이었습니다. 하지만 사람들은 기꺼이 자원했습니다

And it's not just a cool example of a scientific design of experiment in theory. They also did this beautiful thing. They signed this document, and it's called an informed consent document. And informed consent is an idea that we should be very proud of as a society, right? It's something that separates us from the Nazis at Nuremberg, enforced medical experimentation. It's the idea that agreement to join a study without understanding isn't agreement. It's something that protects us from harm, from hucksters, from people that would try to hoodwink us into a clinical study that we don't understand, or that we don't agree to. And so you put together the thread of narrative hypothesis, experimentation in humans, and informed consent, and you get what we call clinical study, and it's how we do the vast majority of medical work. It doesn't really matter if you're in the north, the south, the east, the west. Clinical studies form the basis of how we investigate, so if we're going to look at a new drug, right, we test it in people, we draw blood, we do experiments, and we gain consent for that study, to make sure that we're not screwing people over as part of it.

게다가 이건 단지 이론상의 과학적 실험 계획법의 예인 것만은 아니었습니다 다음과 같은 멋진 일도 해냈죠 그들은 이 문서에도 서명을 했었는데, 이는 사전동의서라고도 불립니다 이 사전동의서는 사회 구성원으로서 아주 자랑스러워해야하는 아이디어 아닐까요? 이게 우리와 나치의 뉘른베르크 강제 의학 실험을 구분해주니까요 바로 연구에 대한 이해가 동반되지 않은 동의서는 동의서가 아니라는 걸 시사하고 있습니다. 이것 덕분에 우리가 상해나, 강매상인, 또는 우리가 이해하지 못하거나 동의하지 않는 의학 연구로 우리를 속이려는 사람들로부터 보호받을 수 있는 거죠 그래서 가설과 임상시험, 그리고 사전동의서를 합치면 저희가 의학 연구라고 부르는 것이 되고, 대부분의 의학적인 일은 이런 방식으로 진행 됩니다. 여러분이 동서남북 어디에 있던 간에 문제가 안 됩니다. 의학 연구는 우리가 어떻게 연구하는지에 대한 바탕을 이룹니다. 그래서 만약에 새로운 약을 검사해보고자 한다면 연구를 통해 사람들에게 해를 끼치지 않는다는 것을 확실히 하기 위해서 임상시험을 하고, 채혈을 하고, 실험을 하고, 이 연구에 대한 동의서를 얻습니다

But the world is changing around the clinical study, which has been fairly well established for tens of years if not 50 to 100 years. So now we're able to gather data about our genomes, but, as we saw earlier, our genomes aren't dispositive. We're able to gather information about our environment. And more importantly, we're able to gather information about our choices, because it turns out that what we think of as our health is more like the interaction of our bodies, our genomes, our choices and our environment. And the clinical methods that we've got aren't very good at studying that because they are based on the idea of person-to-person interaction. You interact with your doctor and you get enrolled in the study. So this is my grandfather. I actually never met him, but he's holding my mom, and his genes are in me, right? His choices ran through to me. He was a smoker, like most people were. This is my son. So my grandfather's genes go all the way through to him, and my choices are going to affect his health. The technology between these two pictures cannot be more different, but the methodology for clinical studies has not radically changed over that time period. We just have better statistics. The way we gain informed consent was formed in large part after World War II, around the time that picture was taken. That was 70 years ago, and the way we gain informed consent, this tool that was created to protect us from harm, now creates silos. So the data that we collect for prostate cancer or for Alzheimer's trials goes into silos where it can only be used for prostate cancer or for Alzheimer's research. Right? It can't be networked. It can't be integrated. It cannot be used by people who aren't credentialed. So a physicist can't get access to it without filing paperwork. A computer scientist can't get access to it without filing paperwork. Computer scientists aren't patient. They don't file paperwork.

하지만 수십 년, 혹은 50년에서 100년에 이르는 기간 동안 꽤나 잘 정립되어 있던 의학 연구 분야가 변화하고 있습니다 이제 우리는 유전자에 대한 정보를 수집할 수 있게 되었지만, 앞에서 보셨다시피 유전자 정보는 결정되어있지 않습니다. 그리고 환경에 대한 정보도 얻을 수 있죠. 더 중요한 사실은 우리의 선택에 대한 정보도 수집할 수 있게 되었습니다. 우리가 건강이라고 생각하는 것이 신체와 유전자, 우리의 선택, 그리고 환경의 상호작용으로 밝혀졌기 때문이죠. 또 우리가 의학적인 방법이라고 했던 것들도 사람과 사람 사이의 상호작용에 기반한 것이므로 연구에 그리 좋은 것은 아닙니다. 여러분들의 의사와 이야기를 하고 연구에 들어가는 것이죠 이 사진은 저희 할아버지입니다. 전 만나뵌 적이 없죠. 하지만 할아버지께서는 저희 어머니를 안고 계시고, 제 안에는 그분의 유전자가 있습니다 할아버지의 선택은 제게로도 왔습니다 많은 사람들이 그랬듯이 담배를 피셨죠. 이 사진은 제 아들입니다. 즉, 제 할아버지의 유전자는 이 아이에게까지 닿아있고, 제가 하는 일은 제 아들의 건강에도 영향을 줍니다. 이 두 사진 사이의 기술은 완전히 다르지만, 의학적 연구를 위한 방법론은 여태 급격하게 변화하지 않았습니다. 다만 통계학만 발전했죠. 우리가 사전 동의서를 받았던 방법의 대부분은 첫 번째 사진이 찍혔던 제2차 세계대전 이후에 정립이 되었습니다. 한 70년 정도 된 일이네요. 그리고 피해로부터 우리를 보호해주도록 고안된 사전동의서는 이제 저장소를 만들게 되었습니다. 즉, 전립선암이나 알츠하이머 실험을 위해 모은 자료들은 오직 그 연구만을 위해 쓰일 수 있는 저장소로 들어가게 됩니다. 이 자료들은 서로 공유될 수 없으며, 통합될 수 없고, 자격이 없는 사람들은 접근할 수 없습니다. 물리학자나 컴퓨터 과학자는 서류에 서명하지 않고는 이 자료에 절대 손댈 수 없죠. 컴퓨터 과학자는 인내심이 없습니다. 그래서 서류에 서명을 안 하죠.

And this is an accident. These are tools that we created to protect us from harm, but what they're doing is protecting us from innovation now. And that wasn't the goal. It wasn't the point. Right? It's a side effect, if you will, of a power we created to take us for good. And so if you think about it, the depressing thing is that Facebook would never make a change to something as important as an advertising algorithm with a sample size as small as a Phase III clinical trial. We cannot take the information from past trials and put them together to form statistically significant samples.

이건 재앙입니다. 이 방법들은 피해로부터 우리를 보호하려고 만든 것인데, 그들은 지금 우리를 혁신으로부터 막고 있습니다. 그럴려고 만든 것이 아니었습니다. 그게 목적이 아니죠, 그렇지 않나요? 이건 말하자면, 좋은 목적으로 만든 힘의 부작용이라고 할 수 있겠네요. 조금 더 생각해봤을 때, 우울한 것은 바로 페이스북은 3기 임상시험만큼의 표본을 가진 광고 알고리즘과 같이 중요한 것들에 변화를 주진 못한다는 것입니다. 통계적으로 유의한 표본을 만들기 위해 지난 시험의 정보를 합칠 수는 없는 노릇입니다.

And that sucks, right? So 45 percent of men develop cancer. Thirty-eight percent of women develop cancer. One in four men dies of cancer. One in five women dies of cancer, at least in the United States. And three out of the four drugs we give you if you get cancer fail. And this is personal to me. My sister is a cancer survivor. My mother-in-law is a cancer survivor. Cancer sucks. And when you have it, you don't have a lot of privacy in the hospital. You're naked the vast majority of the time. People you don't know come in and look at you and poke you and prod you, and when I tell cancer survivors that this tool we created to protect them is actually preventing their data from being used, especially when only three to four percent of people who have cancer ever even sign up for a clinical study, their reaction is not, "Thank you, God, for protecting my privacy." It's outrage that we have this information and we can't use it. And it's an accident. So the cost in blood and treasure of this is enormous. Two hundred and twenty-six billion a year is spent on cancer in the United States. Fifteen hundred people a day die in the United States. And it's getting worse.

참 안된 일이지 않나요? 자, 45% 남성에게 암이 발병합니다. 여성의 38%에게서 암이 발병하고요. 남성 4명 중 1명 꼴로 암으로 죽고, 미국에서만해도 여성 5명당 1명이 암으로 죽었습니다. 그리고 암에 걸렸을 때 의사들이 처방하는 약 중 4분의 3은 효과가 없습니다. 이건 제 개인적인 문제이기도 합니다. 제 여동생도 암에서 살아남았고, 저희 장모님께서도 살아남으셨습니다. 암은 참 거지 같죠. 암에 걸리게 되면 병원에서 사생활은 거의 없어지죠. 대부분의 시간 동안 벌거벗고 있습니다. 알지도 못하는 사람들이 와서 쳐다보고, 쑤시고 찔러대죠. 그리고 제가 암에서 회복한 사람들에게 그들을 보호하기 위해 고안한 방법이 실제로는 그들의 데이터가 사용되는 것을 막고 있다는 것을 말해주면, 특히 임상시험에 등록한 3~4%에 불과한 사람들의 반응은 절대로 "하느님, 제 사생활을 보호해주셔서 감사합니다"가 아닙니다. 이 정보가 있는데도 사용할 수 없다는 건 아주 통탄할 일입니다. 그리고 사고이기도 합니다. 즉, 피의 가치와 이에 담긴 보물은 아주 막대합니다. 미국에서는 매년 2,260억 달러가 암에 쓰입니다. 그리고 매일 1,500명이 죽습니다. 그리고 점점 더 나빠지고 있죠.

So the good news is that some things have changed, and the most important thing that's changed is that we can now measure ourselves in ways that used to be the dominion of the health system. So a lot of people talk about it as digital exhaust. I like to think of it as the dust that runs along behind my kid. We can reach back and grab that dust, and we can learn a lot about health from it, so if our choices are part of our health, what we eat is a really important aspect of our health. So you can do something very simple and basic and take a picture of your food, and if enough people do that, we can learn a lot about how our food affects our health. One interesting thing that came out of this — this is an app for iPhones called The Eatery — is that we think our pizza is significantly healthier than other people's pizza is. Okay? (Laughter) And it seems like a trivial result, but this is the sort of research that used to take the health system years and hundreds of thousands of dollars to accomplish. It was done in five months by a startup company of a couple of people. I don't have any financial interest in it.

좋은 소식은 조금씩 바뀌고 있다는 것입니다. 그리고 바뀐 것 중에 가장 중요한 것은 이제 우리 자신을 보건계에서 지배적이었던 방법으로 측정할 수 있게 되었다는 점입니다. 많은 사람들은 이를 인터넷 정보 범람의 범주에서 말하곤 합니다. 전 이걸 아이들 뒤에 따라오는 먼지라고 생각하고 싶습니다. 아이들을 뒤따라가면서 이 먼지를 가지고 건강에 대해서 알 수도 있고, 만약 우리의 선택이 건강의 일부분이라면 무엇을 먹는지도 우리의 건강에서 아주 중요한 부분일테니까요. 즉, 음식을 사진 찍는 것처럼 아주 쉽고 간단한 걸 할 수 있겠죠. 만약 충분히 많은 사람들이 동참한다면 우리가 먹는게 어떤 영향을 주는지도 알 수 있게 될 겁니다. 이건 'The Eatery'라는 아이폰 어플인데요, 여기서 알 수 있는 한 가지 흥미로운 점은 자신이 먹는 피자는 다른 사람들의 피자보다 훨씬 더 건강하다는 겁니다. 아시겠어요? (웃음) 뭐 이게 아주 사소한 결과로 보일지도 모르겠지만, 예전에는 보건계에서 수년 동안 수십만 달러를 퍼붓곤했던 그런 종류의 연구입니다. 그런데 두어명의 벤처회사가 5달 만에 완수했죠. 제가 금전적인 이익에 관심이 있진 않습니다.

But more nontrivially, we can get our genotypes done, and although our genotypes aren't dispositive, they give us clues. So I could show you mine. It's just A's, T's, C's and G's. This is the interpretation of it. As you can see, I carry a 32 percent risk of prostate cancer, 22 percent risk of psoriasis and a 14 percent risk of Alzheimer's disease. So that means, if you're a geneticist, you're freaking out, going, "Oh my God, you told everyone you carry the ApoE E4 allele. What's wrong with you?" Right? When I got these results, I started talking to doctors, and they told me not to tell anyone, and my reaction is, "Is that going to help anyone cure me when I get the disease?" And no one could tell me yes. And I live in a web world where, when you share things, beautiful stuff happens, not bad stuff. So I started putting this in my slide decks, and I got even more obnoxious, and I went to my doctor, and I said, "I'd like to actually get my bloodwork. Please give me back my data." So this is my most recent bloodwork. As you can see, I have high cholesterol. I have particularly high bad cholesterol, and I have some bad liver numbers, but those are because we had a dinner party with a lot of good wine the night before we ran the test. (Laughter) Right. But look at how non-computable this information is. This is like the photograph of my granddad holding my mom from a data perspective, and I had to go into the system and get it out.

근데 더 중대하게는, 유전자 타입도 끝낼 수 있다는 거죠. 비록 유전자 타입이 모든 걸 결정하진 않지만, 약간의 힌트는 얻을 수 있죠. 제걸 보여드리죠. 그냥 여러 개의 A, T, C, G만 있긴 하지만요. 이건 그걸 해석한 것입니다. 여러분께서도 보시다시피, 저는 전립선암에 걸릴 위험이 32%, 건선 위험이 22%, 그리고 알츠하이머에 걸릴 위험이 14% 정도 되네요. 다시 말하자면, 만약 여러분께서 유전학자라면 "맙소사, 모든 사람에게 ApoE E4 대립유전자가 있다고 말하고 다니신거에요? 제 정신이세요?" 라고 하며 환장하는거죠. 그렇죠? 이 결과를 받고 나서 의사들과 이야기를 나누기 시작했고, 그들은 아무에게도 말하지 말라고 하더군요. 저는 "그래서 그렇게 하면 누군가가 제 병을 고치는데 도움이 됩니까?"라고 반응을 했죠. 아무도 제게 그렇다고 말해주진 않았습니다. 저희는 지금 무언가를 공유했을 때 나쁜 일이 아니라 멋진 일이 일어나는 인터넷 세계에 살고 있습니다 그래서 저는 프레젠테이션에 이걸 올리기 시작했고, 더 기분이 나빠져서 주치의에게 찾아가서 "혈액 분석 결과를 받고 싶어요. 제 자료좀 주세요."라고 말했습니다. 이게 가장 최근의 제 혈액분석 결과입니다. 보시다시피 콜레스테롤 수치가 좀 높네요. 특히 나쁜 콜레스테롤 수치가 높고요, 간수치도 안 좋네요. 아, 이건 검사 전날 저녁에 근사한 와인을 곁들인 파티가 있어서 그런 거에요. (웃음) 자, 그런데 이 정보가 얼마나 계산이 불가능한지 한 번 보세요. 데이터의 관점에서 본다면 마치 저희 어머니를 안고 계시던 할아버지의 사진이랑 별반 다르지가 않아요. 제가 직접 그 시스템으로 들어가서 찾아내야만 했었죠

So the thing that I'm proposing we do here is that we reach behind us and we grab the dust, that we reach into our bodies and we grab the genotype, and we reach into the medical system and we grab our records, and we use it to build something together, which is a commons. And there's been a lot of talk about commonses, right, here, there, everywhere, right. A commons is nothing more than a public good that we build out of private goods. We do it voluntarily, and we do it through standardized legal tools. We do it through standardized technologies. Right. That's all a commons is. It's something that we build together because we think it's important.

그래서 제가 여기서 제안드리고 싶은 것은 뒤로 손을 뻗어서 먼지를 잡는 것, 몸 속으로 들어가서 유전자 타입을 찾는 것, 그리고 의학 시스템 안으로 들어가서 기록을 얻는 것입니다. 그리고 이를 가지고 모든 사람에게 통용가능한 것을 같이 만드는 거죠. 여기, 저기, 모든 곳에서 공공재에 대한 말이 많습니다. 공공재라는 것은 개인의 재화에서 만들어진 공공의 물건, 그 이상도 그 이하도 아니죠. 우린 자발적으로, 그리고 표준화된 법적 도구를 통해서 이를 행하고 있습니다. 표준화된 기술을 통해서 하고 있죠. 이게 바로 공공재가 의미하는 바입니다. 우리가 중요하다고 생각하기 때문에 같이 만드는 것들이죠. 같이 만드는 것들이죠.

And a commons of data is something that's really unique, because we make it from our own data. And although a lot of people like privacy as their methodology of control around data, and obsess around privacy, at least some of us really like to share as a form of control, and what's remarkable about digital commonses is you don't need a big percentage if your sample size is big enough to generate something massive and beautiful. So not that many programmers write free software, but we have the Apache web server. Not that many people who read Wikipedia edit, but it works. So as long as some people like to share as their form of control, we can build a commons, as long as we can get the information out. And in biology, the numbers are even better. So Vanderbilt ran a study asking people, we'd like to take your biosamples, your blood, and share them in a biobank, and only five percent of the people opted out. I'm from Tennessee. It's not the most science-positive state in the United States of America. (Laughter) But only five percent of the people wanted out. So people like to share, if you give them the opportunity and the choice.

데이터로써의 공공재라는 것은 우리 자신의 자료로부터 만드는 것이기 때문에 아주 특별하다고할 수 있습니다. 비록 많은 사람들이 자료를 취급하는 방법 중에 하나로 사생활을 들고 있지만, 데이터를 나누고 싶어하는 사람들은 있기 마련이죠. 그리고 디지털 공공재가 놀라운 이유는 표본이 거대하고 아름다운 뭔가를 만들 수 있을만큼 크다면 비율은 문제가 되지 않는다는 겁니다. 그래서 무료 소프트웨어를 만들고 싶어하는 프로그래머가 많지는 않지만, 아파치(Apache) 웹 서버가 만들어졌죠. 위키피디아를 읽는 사람들 중에서 많은 사람들이 편집을 하진 않지만, 잘 돌아가고 있죠. 그러니까 몇몇 사람들이 관리의 형태로 공유를 하는 한, 그리고 거기서 정보를 얻어낼 수 있는 한 우리는 공공재를 만들 수 있습니다. 그리고 생물학에서는 그 숫자가 훨씬 더 많습니다. 반더발트(Vanderbilt)는 사람들에게 조직샘플과 피를 얻어서 생물은행에서 공유하고 싶다고 묻는 조사를 했었는데, 오직 5%의 사람들만이 거절했습니다. 전 테네시에서 왔습니다. 테네시는 미국에서 과학에 가장 긍정적인 주는 아닙니다. (웃음) 그럼에도 불구하고 오직 5%만 거부했죠. 즉, 기회와 선택만 주어진다면 사람들은 나누길 원한다는 뜻입니다.

And the reason that I got obsessed with this, besides the obvious family aspects, is that I spend a lot of time around mathematicians, and mathematicians are drawn to places where there's a lot of data because they can use it to tease signals out of noise. And those correlations that they can tease out, they're not necessarily causal agents, but math, in this day and age, is like a giant set of power tools that we're leaving on the floor, not plugged in in health, while we use hand saws. If we have a lot of shared genotypes, and a lot of shared outcomes, and a lot of shared lifestyle choices, and a lot of shared environmental information, we can start to tease out the correlations between subtle variations in people, the choices they make and the health that they create as a result of those choices, and there's open-source infrastructure to do all of this. Sage Bionetworks is a nonprofit that's built a giant math system that's waiting for data, but there isn't any.

분명한 가족적 관점은 차치하고, 제가 이 일에 집착하는 이유는 제가 수학자들 옆에서 많은 시간을 보내고, 수학자들은 잡음으로부터 신호를 잡아내는데 데이터를 사용할 수 있어서 자료들이 많은 곳으로 이리저리 불려다니기 때문입니다. 그리고 그들이 정리할 수 있다는 것이 인과관계를 의미하진 않습니다. 하지만 오늘날의 수학은 보건에 있어서 콘센트에 꽂히지 않은 채 바닥에 놓인 전기톱과 같다고할 수 있습니다. 손으로 톱질을 하고 있으면서 말이죠. 만약 아주 많은 유전자 타입과 결과, 생활 방식 선택, 그리고 환경적 정보가 공유된다면 사람들의 결정, 그러한 결정들로 인해 야기된 건강과 같은 사람들 사이의 미세한 변동들 사이의 상관관계를 정리할 수 있게 될 것이고, 이를 가능하게 하는 오픈소스 인프라는 이미 준비되어 있습니다. Sage Bionetwooks라는 비영리 단체가 거대한 수학 시스템을 만들었지만 자료가 없습니다.

So that's what I do. I've actually started what we think is the world's first fully digital, fully self-contributed, unlimited in scope, global in participation, ethically approved clinical research study where you contribute the data. So if you reach behind yourself and you grab the dust, if you reach into your body and grab your genome, if you reach into the medical system and somehow extract your medical record, you can actually go through an online informed consent process -- because the donation to the commons must be voluntary and it must be informed -- and you can actually upload your information and have it syndicated to the mathematicians who will do this sort of big data research, and the goal is to get 100,000 in the first year and a million in the first five years so that we have a statistically significant cohort that you can use to take smaller sample sizes from traditional research and map it against, so that you can use it to tease out those subtle correlations between the variations that make us unique and the kinds of health that we need to move forward as a society.

그래서 저는 이런 일을 하고 있습니다. 여러분들께서 데이터를 제공해주실 수 있는 세계 최초로 완전 디지털화되어 있고, 전적으로 자가 공헌이며, 크기에 제한이 없고, 전세계 누구나 참여할 수 있으며, 도덕적으로 인정된 의료적 탐색 연구라고 생각하고 있는 일을 시작했습니다. 즉, 여러분께서 본인의 뒤에 있는 먼지를 잡으실 수 있으시거나, 유전정보를 얻으실 수 있으시거나, 혹은 의료체계에서 의료 기록을 얻으실 수 있으시다면 공공재에 대한 기부는 자발적이고 사전공지가 있어야하기에, 온라인 사전동의 작업을 거치시면, 여러분의 정보를 올리실 수 있고, 이런 종류의 빅데이터 연구를 하는 수학자들에게 보낼 수 있습니다. 저희의 목표는 첫 해에 10만 건을 모으고, 5년 동안 백만 건을 모아서 전통적인 연구에서 더 작은 표본을 뽑아서 비교할 수 있는, 그리고 우리를 특별하게 만들어주는 변동과 사회로써 한 발짝 더 나아가야만 하는 그런 종류의 보건 사이의 작은 상관관계까지 밝혀낼 수 있는 통계적으로 유의한 집단을 만드는 것입니다.

And I've spent a lot of time around other commons. I've been around the early web. I've been around the early creative commons world, and there's four things that all of these share, which is, they're all really simple. And so if you were to go to the website and enroll in this study, you're not going to see something complicated. But it's not simplistic. These things are weak intentionally, right, because you can always add power and control to a system, but it's very difficult to remove those things if you put them in at the beginning, and so being simple doesn't mean being simplistic, and being weak doesn't mean weakness. Those are strengths in the system.

저는 또 다른 커먼즈에도 많은 시간을 보냈습니다 초기의 웹에도 있었고, 크리에이티브 커먼즈 세계의 초반에도 잠깐 참여했었죠. 이 모든 활동들이 공유하고 있는 아주 간단한 네 가지가 있습니다. 그리고 만약 여러분들께서 웹사이트에 가서 연구에 등록한다고 해도 절대 복잡한 걸 보는 일은 없을 겁니다. 하지만 지나치게 단순화 되어있진 않습니다 이것들은 의도적으로 약하게 만들었습니다 왜냐하면 한 체계에 힘과 통제를 더하는 것은 쉽지만 초반에 더하게 되면 제거하긴 어렵기 때문이죠. 그러니까 간단하다는 것이 지나치게 단순화된 것은 아니고, 약한 것이 약점은 아닙니다. 오히려 이 시스템의 강점들입니다.

And open doesn't mean that there's no money. Closed systems, corporations, make a lot of money on the open web, and they're one of the reasons why the open web lives is that corporations have a vested interest in the openness of the system. And so all of these things are part of the clinical study that we've created, so you can actually come in, all you have to be is 14 years old, willing to sign a contract that says I'm not going to be a jerk, basically, and you're in. You can start analyzing the data. You do have to solve a CAPTCHA as well. (Laughter) And if you'd like to build corporate structures on top of it, that's okay too. That's all in the consent, so if you don't like those terms, you don't come in. It's very much the design principles of a commons that we're trying to bring to health data. And the other thing about these systems is that it only takes a small number of really unreasonable people working together to create them. It didn't take that many people to make Wikipedia Wikipedia, or to keep it Wikipedia. And we're not supposed to be unreasonable in health, and so I hate this word "patient." I don't like being patient when systems are broken, and health care is broken. I'm not talking about the politics of health care, I'm talking about the way we scientifically approach health care. So I don't want to be patient. And the task I'm giving to you is to not be patient. So I'd like you to actually try, when you go home, to get your data. You'll be shocked and offended and, I would bet, outraged, at how hard it is to get it. But it's a challenge that I hope you'll take, and maybe you'll share it. Maybe you won't. If you don't have anyone in your family who's sick, maybe you wouldn't be unreasonable. But if you do, or if you've been sick, then maybe you would. And we're going to be able to do an experiment in the next several months that lets us know exactly how many unreasonable people are out there. So this is the Athena Breast Health Network. It's a study of 150,000 women in California, and they're going to return all the data to the participants of the study in a computable form, with one-clickability to load it into the study that I've put together. So we'll know exactly how many people are willing to be unreasonable.

그리고 공개되었다는 것이 무료를 의미하진 않습니다. 폐쇄된 시스템과 기업은 오픈 웹에서 많은 돈을 벌어들이고 있고, 오픈 웹이 살아가는 이유 중에 하나는 그런 기업들이 시스템의 개방성에 기득권을 가지고 있기 때문입니다. 그러니까 이 모든 것들이 저희가 만든 임상시험의 일부이며, 여러분께서는 실제로 참여하실 수 있고, 필요한 것이라곤 14세 이상에, 쉽게 말해서 '또라이짓 안 하겠습니다'라고 적힌 계약서에 싸인만 하시면 됩니다. 그럼 데이터를 분석할 수 있습니다. 아, 그리고 CAPCHA도 풀어야합니다. (웃음) 또 이를 기반으로 사업을 하시는 것도 가능합니다. 모두 동의서에 포함되어 있으니까요. 만약 이게 싫으시다면 안 오시면 됩니다. 이게 바로 저희가 보건 데이터를 얻으려고 하는 커먼즈의 설립 목적입니다. 그리고 또 다른 것은 이를 만들고자 같이 일하는 아주 비이성적인 소수의 사람들만 있으면 된다는 점입니다. 위키피디아를 위키피디아로 만들거나 유지하는데 많은 사람을 필요로하진 않았습니다. 그리고 우리는 건강에 있어서는 비이성적이면 안 됩니다. 전 그래서 "patient"라는 단어를 아주 싫어합니다. 전 시스템이 망가졌을 때 인내심을 갖고 싶진 않아요. 그리고 보건은 지금 엉망입니다. 전 지금 보건의 정책이 아니라, 우리가 보건에 과학적으로 접근하는 방법을 말하고 있습니다. 즉, 전 참기 싫습니다. 제가 여러분께 드리는 임무도 참지 않는 것입니다. 전 여러분께서 집에 가셨을 때 본인의 자료를 얻으려고 해보라고 권해드리고 싶습니다. 분명히 충격을 받고, 기분이 상하실 겁니다. 장담하건데, 얻기가 무척 어렵다는 사실에 분노하실 겁니다. 하지만 이건 여러분께서 수행하셨으면 하는 과제입니다. 공유를 하실 수도 있고, 안 하실 수도 있겠죠. 만약 가족들 중 아픈 사람이 없다면, 부당해하시진 않을 겁니다. 하지만 있거나, 본인이 아팠던 적이 있다면 그러실 수도 있습니다. 저희는 앞으로 몇 달간 정확히 몇 명이나 비이성적인 사람들이 있는지 알아보는 실험을 할 수 있을 것입니다. 자, 이건 Athena Breast Health Network입니다. 캘리포니아에 있는 15만 명의 여성들에 대한 연구죠. 그들은 참가자에게 모든 데이터를 제가 만든 연구에 클릭 한 번으로 불러들일 수 있는 계산가능한 형태로 돌려줄 것입니다. 그러면 얼마나 많은 사람들이 기꺼이 불합리하게 되려고 하는지 알 수 있겠죠.

So what I'd end [with] is, the most beautiful thing I've learned since I quit my job almost a year ago to do this, is that it really doesn't take very many of us to achieve spectacular results. You just have to be willing to be unreasonable, and the risk we're running is not the risk those 14 men who got yellow fever ran. Right? It's to be naked, digitally, in public. So you know more about me and my health than I know about you. It's asymmetric now. And being naked and alone can be terrifying. But to be naked in a group, voluntarily, can be quite beautiful. And so it doesn't take all of us. It just takes all of some of us. Thank you. (Applause)

제가 마지막으로 드리고 싶은 말씀은, 이 일을 하려고 약 1년 전에 제 일을 그만두고 나서 깨달은 가장 아름다운 사실은 놀라운 결과를 얻기 위해서는 그렇게 많은 사람을 필요로하진 않는다는 것입니다. 단지 비이성적이기만 하면 됩니다. 그리고 저희가 감수하고 있는 위험은 황열 연구에 참가한 14명의 것과는 좀 다르죠. 이건 디지털적으로 공공에게 벌거벗겨지는 것입니다. 그러니까 여러분이 저와 제 건강을 제가 여러분의 것을 아는 것 보다 더 많이 안다는 것이죠. 이제 비대칭적이게 되었습니다. 그리고 벌거벗겨지고 홀로 남는 것은 무섭습니다. 하지만 여러 사람들과 함께 자발적으로 하는 것은 꽤나 아름답죠. 즉, 우리 모두가 다 할 필요는 없습니다. 우리들 중 일부의 모든 사람들만 있으면 됩니다. 감사합니다. (박수)

John Wilbanks: Let's pool our medical data

John Wilbanks: Let's pool our medical data

Related talks

Ben Goldacre: What doctors don't know about the drugs they prescribe

Mina Bissell: Experiments that point to a new understanding of cancer

Jay Bradner: Open-source cancer research

Thomas Goetz: It's time to redesign medical data

Danny Hillis: Understanding cancer through proteomics

David Agus: A new strategy in the war on cancer

Related talks

Ben Goldacre: What doctors don't know about the drugs they prescribe

Mina Bissell: Experiments that point to a new understanding of cancer

Jay Bradner: Open-source cancer research

Thomas Goetz: It's time to redesign medical data

Danny Hillis: Understanding cancer through proteomics

David Agus: A new strategy in the war on cancer