So you go to the doctor and get some tests. The doctor determines that you have high cholesterol and you would benefit from medication to treat it. So you get a pillbox. You have some confidence, your physician has some confidence that this is going to work. The company that invented it did a lot of studies, submitted it to the FDA. They studied it very carefully, skeptically, they approved it. They have a rough idea of how it works, they have a rough idea of what the side effects are. It should be OK. You have a little more of a conversation with your physician and the physician is a little worried because you've been blue, haven't felt like yourself, you haven't been able to enjoy things in life quite as much as you usually do. Your physician says, "You know, I think you have some depression. I'm going to have to give you another pill."
你去看醫生,接受了一些檢查。 醫生診斷出你的膽固醇過高, 建議你服藥治療可能有幫助。 所以,你拿到了藥罐子。 你有點信心, 你的醫師也有信心,認為這藥會有效。 發明這個藥的公司做了很多的研究, 然後呈送給食品藥物管理局。 他們很仔細、審慎地研究, 並核准了這藥物上市。 他們大概知道這藥物如何運作, 也大略知道會有什麼副作用, 應該沒問題。 你跟醫師又多聊了一會, 而醫師有點擔心,因為你很憂鬱, 精神欠佳。 無法像平常一樣盡情享受生活點滴。 你的醫師說: 「我認為你有一點精神憂鬱, 我再開個藥給你。」
So now we're talking about two medications. This pill also -- millions of people have taken it, the company did studies, the FDA looked at it -- all good. Think things should go OK. Think things should go OK. Well, wait a minute. How much have we studied these two together?
所以,我們現在有兩種藥了。 這個藥也有好幾百萬人服用過, 公司做了研究,食品藥物管理局 也檢查過,全部都沒問題。 想一下,這東西沒問題,OK的。 想一下,這東西沒問題,OK的。 但,請等一下。 我們對這兩種藥混在一起吃 做了多少研究?
Well, it's very hard to do that. In fact, it's not traditionally done. We totally depend on what we call "post-marketing surveillance," after the drugs hit the market. How can we figure out if bad things are happening between two medications? Three? Five? Seven? Ask your favorite person who has several diagnoses how many medications they're on.
其實,這很難評估。 事實上,傳統上都不會做。 在藥物上市後,我們完全倚賴一種 叫做「上市後監察系統」的機制, 我們要如何確認,兩種藥之間 是否有什麼不好的事會發生? 三種?五種?七種呢? 問你身邊有各種疾病在身的人, 他們正在吃多少藥。
Why do I care about this problem? I care about it deeply. I'm an informatics and data science guy and really, in my opinion, the only hope -- only hope -- to understand these interactions is to leverage lots of different sources of data in order to figure out when drugs can be used together safely and when it's not so safe.
為什麼我在乎這個問題? 我非常在乎。 我是念資訊和數據科學的人, 真的,在我看來, 了解藥物彼此間的交互影響 唯一的希望只有 運用不同來源的龐大資料, 才能找出這些藥 何時可以安全地一起服用, 以及何時不行。
So let me tell you a data science story. And it begins with my student Nick. Let's call him "Nick," because that's his name.
所以,讓我來告訴各位 一個數據科學的故事。 這要從我的學生尼克開始講起。 我們就稱呼他為尼克吧, 因為那就是他的本名。
(Laughter)
(笑聲)
Nick was a young student. I said, "You know, Nick, we have to understand how drugs work and how they work together and how they work separately, and we don't have a great understanding." But the FDA has made available an amazing database. It's a database of adverse events. They literally put on the web -- publicly available, you could all download it right now -- hundreds of thousands of adverse event reports from patients, doctors, companies, pharmacists. And these reports are pretty simple: it has all the diseases that the patient has, all the drugs that they're on, and all the adverse events, or side effects, that they experience. It is not all of the adverse events that are occurring in America today, but it's hundreds and hundreds of thousands of drugs.
尼克很年輕, 我說:「尼克, 我們必須了解藥物如何運作, 以及藥物在一起會如何運作、 分開會如何運作, 而我們並沒有了解很深。」 但食品藥物管理局已經 有一個很驚人的資料庫, 是一個藥物不良反應通報資料庫。 資料真的直接放在網路上 供大眾查詢,你現在就可以全部下載, 從病人、醫生、公司、藥劑師通報上來 數十萬份藥物不良反應通報。 這些報告都相當簡單: 上面有病人所有疾病 及所有藥物的使用狀況, 還有他們經歷過的 所有不良反應事件或副作用。 雖然沒有現今在美國 發生的所有不良反應事件, 但卻有數十萬筆藥物資料。
So I said to Nick, "Let's think about glucose. Glucose is very important, and we know it's involved with diabetes. Let's see if we can understand glucose response." I sent Nick off. Nick came back.
所以,我跟尼克說: 「我們來想一想葡萄糖。 葡萄糖非常重要,而且 大家都知道它與糖尿病有關。 讓我們來看看是否可以 了解葡萄糖的反應。」 我請尼克去找資料,
"Russ," he said, "I've created a classifier that can look at the side effects of a drug based on looking at this database, and can tell you whether that drug is likely to change glucose or not."
他回來後說:「洛斯, 我已經建造了一個分辨器, 可以透過這個資料庫 來檢視一種藥物的副作用, 而且還可以告訴你,這個藥 會否改變病人血糖狀況。」
He did it. It was very simple, in a way. He took all the drugs that were known to change glucose and a bunch of drugs that don't change glucose, and said, "What's the difference in their side effects? Differences in fatigue? In appetite? In urination habits?" All those things conspired to give him a really good predictor. He said, "Russ, I can predict with 93 percent accuracy when a drug will change glucose."
他用一個方法做到了,很簡單。 他把所有已知會改變葡萄糖的藥物 及所有不會改變的藥物拿出來做比較, 「它們之間的副作用有什麼分別? 疲勞狀況上的差異?食慾上的差異? 排尿習慣上的差異?」 所有這些事情都可以協助他 做出一個很棒的預測器。 他說:「洛斯,我能預測 哪種藥可改變血糖, 準確率可以高達93%。」
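The comparison Nick describes, asking how the side effects of known glucose-changing drugs differ from those of drugs that don't change glucose, can be sketched in a few lines. This is only a minimal illustration of the idea, not the classifier from the talk: the frequency-difference scoring, the function names, and the toy profiles are all assumptions made up for the example.

```python
from collections import Counter

def side_effect_weights(changers, non_changers):
    """Weight each side effect by how much more often it is reported
    for glucose-changing drugs than for the comparison drugs."""
    def frequency(profiles):
        counts = Counter(effect for profile in profiles for effect in set(profile))
        return {effect: n / len(profiles) for effect, n in counts.items()}

    freq_pos = frequency(changers)
    freq_neg = frequency(non_changers)
    effects = set(freq_pos) | set(freq_neg)
    return {e: freq_pos.get(e, 0.0) - freq_neg.get(e, 0.0) for e in effects}

def glucose_score(profile, weights):
    """Sum the weights of the side effects reported for one drug.
    A positive score means the profile resembles a glucose changer."""
    return sum(weights.get(effect, 0.0) for effect in set(profile))

# Invented toy profiles, for illustration only (not real FDA data).
changers = [{"fatigue", "increased urination"}, {"fatigue", "appetite change"}]
non_changers = [{"headache"}, {"rash", "headache"}]

weights = side_effect_weights(changers, non_changers)
unknown_drug = {"fatigue", "increased urination", "headache"}
score = glucose_score(unknown_drug, weights)
```

A real version would need many more drugs, a held-out set to estimate accuracy, and a proper statistical model; the sketch only shows how side-effect differences can be turned into a score.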
I said, "Nick, that's great." He's a young student, you have to build his confidence. "But Nick, there's a problem. It's that every physician in the world knows all the drugs that change glucose, because it's core to our practice. So it's great, good job, but not really that interesting, definitely not publishable."
我說:「尼克,這太棒了!」 他是個年輕的學生, 你必須建立他的信心。 「但,尼克,有一個問題。 就是全世界的醫師都知道 這些藥會改變葡萄糖, 因為這是我們實務上的核心。 所以,你很棒,幹得好, 但並沒有人對這有興趣, 絕對還不適合公布你的研究結果。」
(Laughter)
(笑聲)
He said, "I know, Russ. I thought you might say that." Nick is smart. "I thought you might say that, so I did one other experiment. I looked at people in this database who were on two drugs, and I looked for signals similar, glucose-changing signals, for people taking two drugs, where each drug alone did not change glucose, but together I saw a strong signal."
他說:「我知道,洛斯。 我知道你可能會這麼說。」 尼克很聰明。 「我知道你會這麼說, 所以我多做了另一項實驗。 我仔細觀察資料庫裡 同時服用兩種藥的人, 然後尋找他們之間 葡萄糖改變的相似訊號, 但前提是,這些藥單獨服用 不會改變葡萄糖, 一起服用時,會有強烈訊號的藥物。」
And I said, "Oh! You're clever. Good idea. Show me the list." And there's a bunch of drugs, not very exciting. But what caught my eye was, on the list there were two drugs: paroxetine, or Paxil, an antidepressant; and pravastatin, or Pravachol, a cholesterol medication.
我說:「喔!你真聰明, 好主意,讓我看一下清單。」 有一大堆藥,並沒有令人非常興奮。 但引起我注意的是,清單上有兩種藥: 帕羅西汀或稱克憂果, 這是一種治療憂鬱症的藥, 還有普伐他汀或稱美百樂, 一種降膽固醇的藥。
And I said, "Huh. There are millions of Americans on those two drugs." In fact, we learned later, 15 million Americans on paroxetine at the time, 15 million on pravastatin, and a million, we estimated, on both. So that's a million people who might be having some problems with their glucose if this machine-learning mumbo jumbo that he did in the FDA database actually holds up. But I said, "It's still not publishable, because I love what you did with the mumbo jumbo, with the machine learning, but it's not really standard-of-proof evidence that we have." So we have to do something else. Let's go into the Stanford electronic medical record. We have a copy of it that's OK for research, we removed identifying information. And I said, "Let's see if people on these two drugs have problems with their glucose."
然後我說:「哈!有上百萬 美國人正在服用這兩種藥」。 事實上,我們之後才知道, 當時有1500萬美國人正在服用帕羅西汀, 1500萬人正在服用普伐他汀, 而我們預估有100萬人, 同時服用這兩個藥。 所以,有100萬人 可能有葡萄糖上的問題, 如果他用食品藥物管理局的資料庫 做的機器學習判讀器真的有用的話。 但我說:「還是不能發表, 因為我雖然喜歡你做的 機器學習判讀器, 但這還不算是符合我們 證明標準的證據。」 所以,我們必須做些其他事來驗證。 我們去找史丹佛的電子病歷紀錄。 我們有一個副本,可以用來研究, 我們移除了病人個資。 我說:「讓我們來看看, 服用這兩種藥的人 是否有葡萄糖上的問題。」
Now there are thousands and thousands of people in the Stanford medical records that take paroxetine and pravastatin. But we needed special patients. We needed patients who were on one of them and had a glucose measurement, then got the second one and had another glucose measurement, all within a reasonable period of time -- something like two months. And when we did that, we found 10 patients. However, eight out of the 10 had a bump in their glucose when they got the second P -- we call this P and P -- when they got the second P. Either one could be first, the second one comes up, glucose went up 20 milligrams per deciliter. Just as a reminder, you walk around normally, if you're not diabetic, with a glucose of around 90. And if it gets up to 120, 125, your doctor begins to think about a potential diagnosis of diabetes. So a 20 bump -- pretty significant.
在史丹佛病歷紀錄中有成千上萬的人 服用帕羅西汀和普伐他汀。 但我們需要特定病患。 我們需要先服用其中一種藥 且做過葡萄糖檢測的病人, 之後加上第二種藥, 又做了另一次葡萄糖檢測, 全部都在合理期間內, 例如兩個月內。 當我們這樣篩選後, 我們找到十個病人。 然而,十個人裡面 有八個葡萄糖出現上升現象, 在他們服用第二個P時 ─我們稱呼這個叫 P&P─ 當他們服用了第二個 P。 哪一個先服用都行, 當第二個藥服用後, 葡萄糖濃度每分升增加了20毫克。 提醒各位一下, 如果你沒有糖尿病, 平常的葡萄糖濃度約每分升90毫克。 如果上升到120、125, 你的醫生會開始考慮 你是否可能有糖尿病。 所以,一下子增加20是相當明顯的。
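The patient selection described here (a glucose reading on the first drug, then the second drug added, then another reading, all within about two months) can be written as a small filter. This is a hedged sketch, not the study's actual query: the function name, the tuple layout, and the exact 60-day window are assumptions, and the sample readings are invented.

```python
from datetime import date, timedelta

def glucose_change(first_drug_start, second_drug_start, measurements, window_days=60):
    """Return (before, after, delta) for one patient, or None if ineligible.

    measurements: list of (date, glucose_mg_dl) tuples.
    'before' is the last reading taken on the first drug alone;
    'after' is the first reading taken once the second drug was added.
    The two readings must be at most window_days apart.
    """
    before = [(d, g) for d, g in measurements
              if first_drug_start <= d < second_drug_start]
    after = [(d, g) for d, g in measurements if d >= second_drug_start]
    if not before or not after:
        return None
    d_before, g_before = max(before)   # latest reading on one drug
    d_after, g_after = min(after)      # earliest reading on both drugs
    if d_after - d_before > timedelta(days=window_days):
        return None
    return g_before, g_after, g_after - g_before

# Invented example record, for illustration only.
readings = [(date(2010, 1, 10), 92), (date(2010, 2, 20), 113)]
result = glucose_change(date(2010, 1, 1), date(2010, 2, 1), readings)
```

Because either drug can come first, the same function applies whichever "P" the patient started on; strict criteria like these are why thousands of records can shrink to a handful of usable patients.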
I said, "Nick, this is very cool. But, I'm sorry, we still don't have a paper, because this is 10 patients and -- give me a break -- it's not enough patients."
我說:「尼克,這很酷。 但,很抱歉,我們仍然沒辦法寫報告, 因為只有十個病人,饒了我吧, 病人樣本數根本不夠。」
So we said, what can we do? And we said, let's call our friends at Harvard and Vanderbilt, who also -- Harvard in Boston, Vanderbilt in Nashville, who also have electronic medical records similar to ours. Let's see if they can find similar patients with the one P, the other P, the glucose measurements in that range that we need.
所以,那怎麼辦? 我們來打電話給哈佛 及范德堡大學的朋友, 就是波士頓的哈佛 及納許維爾的范德堡, 他們都有跟我們很像的 電子病歷紀錄。 讓我們看看,他們是否 也可以找到相同的病人, 也有我們需要的已經服用這兩種藥, 並做過葡萄糖檢測的病人。
God bless them, Vanderbilt in one week found 40 such patients, same trend. Harvard found 100 patients, same trend. So at the end, we had 150 patients from three diverse medical centers that were telling us that patients getting these two drugs were having their glucose bump somewhat significantly.
上天保佑,范德堡一個星期內找到40個 有同樣趨勢的病人。 哈佛找到100個有同樣趨勢的病人。 所以,最後,我們從三個不同的 醫學中心找到150個病人 服用過這兩種藥, 然後有葡萄糖異常增加現象。
More interestingly, we had left out diabetics, because diabetics already have messed up glucose. When we looked at the glucose of diabetics, it was going up 60 milligrams per deciliter, not just 20. This was a big deal, and we said, "We've got to publish this." We submitted the paper. It was all data evidence, data from the FDA, data from Stanford, data from Vanderbilt, data from Harvard. We had not done a single real experiment.
有趣的是,我們沒有考慮糖尿病患者, 因為糖尿病患者本身的 血糖濃度就已經很混亂。 當我們觀察糖尿病患者的血糖濃度時, 上升了每分升60毫克, 不只20毫克。 這事情很重要,我們說: 「我們必須發佈這件事。」 我們遞交報告, 裡面全部都是資料證明, 有來自食品藥物管理局、史丹佛的資料、 有來自范德堡、哈佛醫學院的資料, 我們完全沒有做任何實驗。
But we were nervous. So Nick, while the paper was in review, went to the lab. We found somebody who knew about lab stuff. I don't do that. I take care of patients, but I don't do pipettes. They taught us how to feed mice drugs. We took mice and we gave them one P, paroxetine. We gave some other mice pravastatin. And we gave a third group of mice both of them. And lo and behold, glucose went up 20 to 60 milligrams per deciliter in the mice.
但我們很緊張。 所以,當報告送去審核時, 尼克就去了實驗室。 我們找到會做實驗的人。 我不做實驗的。 我會看病人,但我不會用移液管。 他們教我們如何餵老鼠吃藥。 我們給第一組老鼠餵食帕羅西汀, 給第二組老鼠餵食普伐他汀。 第三組的老鼠兩種藥都餵食。 驚奇的是,老鼠的葡萄糖 每分升上升了20到60毫克。
So the paper was accepted based on the informatics evidence alone, but we added a little note at the end, saying, oh by the way, if you give these to mice, it goes up.
所以,只有資料證據的報告被接受了, 但我們在最後加了註記說, 如果把藥物給老鼠,葡萄糖也會上升。
That was great, and the story could have ended there. But I still have six and a half minutes.
太棒了,故事其實就到這裡結束。 但,我還有六分半鐘。
(Laughter)
(笑聲)
So we were sitting around thinking about all of this, and I don't remember who thought of it, but somebody said, "I wonder if patients who are taking these two drugs are noticing side effects of hyperglycemia. They could and they should. How would we ever determine that?"
所以,我們坐下來想一下所有的事, 我忘記誰曾經說過,但有人說: 「不曉得同時服用這兩種藥的病人, 是否有注意到高血糖症的副作用。 他們可能知道,也必須知道。 我們要如何確定?」
We said, well, what do you do? You're taking a medication, one new medication or two, and you get a funny feeling. What do you do? You go to Google and type in the two drugs you're taking or the one drug you're taking, and you type in "side effects." What are you experiencing? So we said OK, let's ask Google if they will share their search logs with us, so that we can look at the search logs and see if patients are doing these kinds of searches. Google, I am sorry to say, denied our request. So I was bummed. I was at a dinner with a colleague who works at Microsoft Research and I said, "We wanted to do this study, Google said no, it's kind of a bummer." He said, "Well, we have the Bing searches."
我們說,好吧,你會怎麼做? 你服用了一種藥,一個或兩個新藥, 然後你感覺怪怪的。 你會怎麼做? 你會去問 Google, 然後搜尋你在服用的一或兩個藥名, 然後加上「副作用」。 你會找到什麼? 所以,我們說,好, 我們來問 Google 能否 跟我們分享搜尋紀錄, 讓我們可以觀察搜尋紀錄, 看是否有病人也在做同樣的搜尋。 很抱歉我得這麼說, 但 Google 拒絕了我們的請求。 所以,我很煩惱。 我跟一個在微軟研究室的同事吃晚餐時, 我跟他說:「我們想做這個研究, Google 說不行,我有點煩惱。」 他說:「我們有 Bing 搜尋引擎啊。」
(Laughter)
(笑聲)
Yeah. That's great. Now I felt like I was --
是啊! 太棒了。 現在,我感覺...
(Laughter)
(笑聲)
I felt like I was talking to Nick again. He works for one of the largest companies in the world, and I'm already trying to make him feel better. But he said, "No, Russ -- you might not understand. We not only have Bing searches, but if you use Internet Explorer to do searches at Google, Yahoo, Bing, any ... Then, for 18 months, we keep that data for research purposes only." I said, "Now you're talking!" This was Eric Horvitz, my friend at Microsoft.
我好像又在鼓勵尼克一樣。 他在全世界數一數二的公司上班, 我已經開始要安慰他了。 但他說:「不,洛斯,你可能沒搞懂。 我們不只有 Bing 啊, 如果你用 IE 在 Google、 雅虎、Bing 等任何搜尋引擎, 之後18個月,我們保留這些數據 僅做研究目的使用。」 我說:「這才像話嘛!」 這就是我的微軟朋友艾瑞克.霍維茲。
So we did a study where we defined 50 words that a regular person might type in if they're having hyperglycemia, like "fatigue," "loss of appetite," "urinating a lot," "peeing a lot" -- forgive me, but that's one of the things you might type in. So we had 50 phrases that we called the "diabetes words." And we did first a baseline. And it turns out that about .5 to one percent of all searches on the Internet involve one of those words. So that's our baseline rate. If people type in "paroxetine" or "Paxil" -- those are synonyms -- and one of those words, the rate goes up to about two percent of diabetes-type words, if you already know that there's that "paroxetine" word. If it's "pravastatin," the rate goes up to about three percent from the baseline. If both "paroxetine" and "pravastatin" are present in the query, it goes up to 10 percent, a huge three- to four-fold increase in those searches with the two drugs that we were interested in, and diabetes-type words or hyperglycemia-type words.
我們做了一項研究, 我們定義出了50個 如果一般人有高血糖症時 會鍵入的關鍵字, 像是疲勞、沒食慾、頻尿等。 請原諒我,但這些就是 你可能會鍵入的關鍵字。 所以,我們有了50個短語, 我們稱之為「糖尿病關鍵字」。 我們先設定了一條基準線。 原來,網路上有包含這些關鍵字的搜尋 占了0.5~1%的比例。 所以,這就是我們的基準線率, 如果大家鍵入「帕羅西汀」或「克憂果」 ──這些是同義字── 以及剛剛其中一個關鍵字, 那糖尿病類型的基準線率會上升到2%, 如果你已經知道 「帕羅西汀」這個字的話。 如果是「普伐他汀」, 那比率會從基準線率上升到3%。 如果「帕羅西汀」 和「普伐他汀」同時出現, 那會上升到10%, 有3到4倍的增加, 用這兩種藥搜尋,會出現 我們感興趣的字在裡面, 像是糖尿病類的字 或高血糖症類的字。
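The rates quoted above are conditional fractions: among queries that mention the drug or drugs, what share also contain one of the "diabetes words"? A minimal sketch of that calculation follows. The function names and the four toy queries are invented for illustration; only the drug synonyms and the sample phrases come from the talk, and real search logs would need far more careful text matching.

```python
def mentions(query, synonyms):
    """True if the query contains any name from one synonym group."""
    return any(name in query for name in synonyms)

def symptom_rate(queries, drug_groups, symptom_terms):
    """Among queries mentioning every drug group, return the fraction that
    also contain at least one symptom phrase (None if no query matches)."""
    lowered = [q.lower() for q in queries]
    matching = [q for q in lowered
                if all(mentions(q, group) for group in drug_groups)]
    if not matching:
        return None
    hits = sum(any(s in q for s in symptom_terms) for q in matching)
    return hits / len(matching)

PAROXETINE = ("paroxetine", "paxil")
PRAVASTATIN = ("pravastatin", "pravachol")
SYMPTOMS = ("fatigue", "loss of appetite", "urinating a lot", "peeing a lot")

# Invented toy queries, for illustration only.
queries = [
    "paxil pravachol fatigue",
    "paroxetine and pravastatin dosage",
    "paxil side effects",
    "pravastatin peeing a lot",
]

both_rate = symptom_rate(queries, [PAROXETINE, PRAVASTATIN], SYMPTOMS)
paroxetine_rate = symptom_rate(queries, [PAROXETINE], SYMPTOMS)
```

With the figures reported in the talk, 10 percent for both drugs against about 3 percent for pravastatin alone works out to roughly a 3.3-fold increase, and against about 2 percent for paroxetine alone to a 5-fold increase.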
We published this, and it got some attention. The reason it deserves attention is that patients are telling us their side effects indirectly through their searches. We brought this to the attention of the FDA. They were interested. They have set up social media surveillance programs to collaborate with Microsoft, which had a nice infrastructure for doing this, and others, to look at Twitter feeds, to look at Facebook feeds, to look at search logs, to try to see early signs that drugs, either individually or together, are causing problems.
我們發佈了這個研究, 並得到一些關注。 它值得被關注的原因是, 病人會透過搜尋, 間接告訴我們藥物的副作用。 我們得到了食品藥物管理局的關注。 他們很感興趣。 他們已經成立社會媒體監測計畫, 與微軟展開合作, 他們有良好的設備來做這些事, 可以觀察推特的動態、 觀察臉書的動態、 觀察搜尋日誌、 嘗試觀察引發問題的 無論單一藥物或混合藥物的早期症狀。
What do I take from this? Why tell this story? Well, first of all, we have now the promise of big data and medium-sized data to help us understand drug interactions and really, fundamentally, drug actions. How do drugs work? This will create and has created a new ecosystem for understanding how drugs work and to optimize their use. Nick went on; he's a professor at Columbia now. He did this in his PhD for hundreds of pairs of drugs. He found several very important interactions, and so we replicated this and we showed that this is a way that really works for finding drug-drug interactions.
我從這件事學到什麼? 為什麼要講這個故事? 首先, 我們現在有大數據及中型數據撐腰, 來幫助我們了解藥物的相互作用, 以及真實、基本的藥物作用。 藥物是如何作用? 這個將會創造一個新的生態系統, 來幫助我們了解藥物如何運作 以及有效使用它們。 尼克繼續往前走, 他現在是哥倫比亞的教授。 他用好幾百對藥物做為博士研究。 他找到一些非常重要的藥物交互作用, 所以,我們複製這個模式, 展示出利用這樣做 來尋找藥與藥之間的作用真的有效。
However, there's a couple of things. We don't just use pairs of drugs at a time. As I said before, there are patients on three, five, seven, nine drugs. Have they been studied with respect to their nine-way interaction? Yes, we can do pair-wise, A and B, A and C, A and D, but what about A, B, C, D, E, F, G all together, being taken by the same patient, perhaps interacting with each other in ways that either makes them more effective or less effective or causes side effects that are unexpected? We really have no idea. It's a blue sky, open field for us to use data to try to understand the interaction of drugs.
然而,還有一些事。 我們不會同時一次只服用兩種藥。 就如我之前所說的, 有病人一次是服用三、五、七、九種藥。 他們有認真研究 這九種藥的相互作用嗎? 沒錯,我們可以做成對的藥, A+B、A+C、A+D, 但如果同一個病人 同時服用ABCDEFG, 那可能會互相產生那些作用? 藥效更好或更不好? 或造成那些意想不到的副作用呢? 我們真的不知道。 它是個開放式的藍天領域, 讓我們可以使用數據, 來嘗試了解藥物彼此間的作用。
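To see why going beyond pairs gets hard so quickly, it helps to count. For a patient on n drugs there are n(n-1)/2 pairs, but 2^n - n - 1 possible groups of two or more drugs that could in principle interact. The short sketch below just does that arithmetic; the function names are made up for the example.

```python
from math import comb

def pair_count(n_drugs):
    """Number of distinct two-drug pairs among n_drugs."""
    return comb(n_drugs, 2)

def combination_count(n_drugs):
    """Number of distinct groups of two or more drugs among n_drugs
    (all subsets, minus the empty set and the single-drug sets)."""
    return 2 ** n_drugs - n_drugs - 1

for n in (3, 5, 7, 9):
    print(n, pair_count(n), combination_count(n))
# 3 3 4
# 5 10 26
# 7 21 120
# 9 36 502
```

So a patient on seven drugs has 21 pairs to check but 120 possible multi-drug groups, and nine drugs gives 36 pairs and 502 groups.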
Two more lessons: I want you to think about the power that we were able to generate with the data from people who had volunteered their adverse reactions through their pharmacists, through themselves, through their doctors, the people who allowed the databases at Stanford, Harvard, Vanderbilt, to be used for research. People are worried about data. They're worried about their privacy and security -- they should be. We need secure systems. But we can't have a system that closes that data off, because it is too rich of a source of inspiration, innovation and discovery for new things in medicine.
另外兩件事: 我想要各位去想想 我們所創造出來的力量, 就是我們已經可以透過藥劑師、 病人本身、病人的醫師, 來取得自願者身上 他們的藥物不良反應, 這些人同意他們的資料可以被 史丹佛、哈佛、范德堡醫學院 來做研究使用。 大家都擔心個資問題。 他們擔心自己的隱私及安全 ──他們必須要擔心。 我們需要保全系統。 但我們不能有一個 把資料關起來的系統, 因為它的資源太豐盛了, 它對醫學界的鼓舞、 創新、發現新事物 實在太重要了。
And the final thing I want to say is, in this case we found two drugs and it was a little bit of a sad story. The two drugs actually caused problems. They increased glucose. They could throw somebody into diabetes who would otherwise not be in diabetes, and so you would want to use the two drugs very carefully together, perhaps not together, make different choices when you're prescribing. But there was another possibility. We could have found two drugs or three drugs that were interacting in a beneficial way. We could have found new effects of drugs that neither of them has alone, but together, instead of causing a side effect, they could be a new and novel treatment for diseases that don't have treatments or where the treatments are not effective. If we think about drug treatment today, all the major breakthroughs -- for HIV, for tuberculosis, for depression, for diabetes -- it's always a cocktail of drugs.
最後,我想說的是, 我們發現這兩個藥的案例, 的確是令人難過的故事。 這兩個藥一起服用真的會有問題。 同時服用會增加葡萄糖, 會造成一個原本沒糖尿病的人 發生糖尿病情形, 所以,各位如果想一起使用 這兩種藥,一定要非常小心, 最好不要一起服用, 當你要開處方簽時, 看看有沒有不同的選擇。 但,也有其他的可能。 我們或許能找到兩或三種藥, 一起服用時也許可以更有效。 我們或許也可以找到 藥物本身沒有的作用, 但在一起服用時不但沒有產生副作用, 反而產生新作用,有可能變成最新的 絕症疾病治療方式, 或者原本的治療方式完全是無效的。 如果我們想想現今的藥物治療方式, 所有的重大突破── 愛滋病、肺結核、 憂鬱症,糖尿病── 總像是藥物雞尾酒。
And so the upside here, and the subject for a different TED Talk on a different day, is how can we use the same data sources to find good effects of drugs in combination that will provide us new treatments, new insights into how drugs work and enable us to take care of our patients even better?
這件事的好處是, 也許哪一天不同的TED主題, 我們又會來到這裡分享, 我們要如何用同樣的資料來源 來找到藥物混用時產生的好效果, 它將提供我們新的治療方式, 以及對藥物如何作用提供新的見解, 並且讓我們的病人得到更好的照顧。
Thank you very much.
非常謝謝各位。
(Applause)
(掌聲)