Riccardo Sabatini: How to read the genome and build a human being

For the next 16 minutes, I'm going to take you on a journey that is probably the biggest dream of humanity: to understand the code of life.

接下來的16分鐘，我要帶各位進行一段冒險之旅，這大概是人類最大的夢想：了解生命的密碼。

So for me, everything started many, many years ago when I met the first 3D printer. The concept was fascinating. A 3D printer needs three elements: a bit of information, some raw material, some energy, and it can produce any object that was not there before.

對我而言，這一切的開始，要拉回到好幾好幾年前，當我第一次遇上3D印表機時。它的概念真的很棒。 3D印表機需要三個元素：少量的資訊、一些原物料、再加上點能量，這樣它就可以製造出以前從未存在過的任何東西。

I was doing physics, I was coming back home and I realized that I actually always knew a 3D printer. And everyone does. It was my mom.

我當時研究的是物理學，有天回到家裡時，我突然意識到，我家裡就有一台 3D 印表機。而且每個人家裡都有一台。那就是我媽媽。

(Laughter)

（笑聲）

My mom takes three elements: a bit of information, which is between my father and my mom in this case, raw elements and energy in the same media, that is food, and after several months, produces me. And I was not existent before.

我媽也用上三個元素：少量的資訊，在我這個例子裡是來自我爸和我媽；原物料和能量來自同一個媒介，也就是食物；幾個月後，她生下了我。而我以前是不存在的。

So apart from the shock of my mom discovering that she was a 3D printer, I immediately got mesmerized by that piece, the first one, the information. What amount of information does it take to build and assemble a human? Is it much? Is it little? How many thumb drives can you fill?

所以，除了我媽發現自己是一台3D印表機而大受震驚之外，我立刻被第一項元素迷住了，也就是資訊。要有多少資訊才能建構並組裝出一個人來呢？要很多嗎？還是只要一點點？可以裝滿多少個隨身碟呢？

Well, I was studying physics at the beginning and I took this approximation of a human as a gigantic Lego piece. So, imagine that the building blocks are little atoms and there is a hydrogen here, a carbon here, a nitrogen here. So in the first approximation, if I can list the number of atoms that compose a human being, I can build it. Now, you can run some numbers and that happens to be quite an astonishing number. So the number of atoms, the file that I will save in my thumb drive to assemble a little baby, will actually fill an entire Titanic of thumb drives -- multiplied 2,000 times. This is the miracle of life. Every time you see from now on a pregnant lady, she's assembling the biggest amount of information that you will ever encounter. Forget big data, forget anything you heard of. This is the biggest amount of information that exists.

我一開始是研究物理學的，所以我把人類近似成一個巨大的樂高模型。你可以想像，每一塊積木就是一個小原子，氫原子在這，碳原子在這，氮原子在這。按照最初步的近似，如果我可以列出組成人類的所有原子，我就可以把人建造出來。現在，你可以算一下，結果是個相當驚人的數字。要組裝一個小寶寶，我存進隨身碟的那個原子清單檔案，所用的隨身碟足以裝滿一整艘鐵達尼號，還要再乘以2000倍。這就是生命的奇蹟啊！從現在起，你每次看到懷孕的婦女，她都正在組裝你這輩子所能遇到的最大量資訊。忘了大數據吧！忘了你曾聽過的一切。這就是現存最大量的資訊。

(Applause)

（掌聲）

But nature, fortunately, is much smarter than a young physicist, and in four billion years, managed to pack this information in a small crystal we call DNA. We met it for the first time in 1950 when Rosalind Franklin, an amazing scientist, a woman, took a picture of it. But it took us more than 40 years to finally poke inside a human cell, take out this crystal, unroll it, and read it for the first time. The code comes out to be a fairly simple alphabet, four letters: A, T, C and G. And to build a human, you need three billion of them. Three billion. How many are three billion? It doesn't really make any sense as a number, right?

但好在大自然比一位年輕的物理學家聰明得多，它用了40億年，把這些資訊打包進一個我們稱之為DNA的小晶體裡。我們在1950年第一次認識了它，當時有一位了不起的女科學家羅莎琳．富蘭克林給 DNA 拍了張照。但我們花了超過40年的時間，才終於探進人類細胞、取出這個晶體、把它展開，並首次讀取它。結果這套密碼是相當簡單的字母表，只有四個字母：A、T、C、G。而建造一個人類，你需要30億個字母。30億。30億有多少？我們對這個數字真的很沒有概念，對吧？
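As a rough back-of-the-envelope answer to the thumb-drive question, a four-letter alphabet needs only 2 bits per letter. The sketch below assumes naive 2-bit packing with no compression and no metadata; the constant names are illustrative and do not come from the talk.

```python
import math

LETTERS_IN_GENOME = 3_000_000_000  # "three billion" letters, as quoted in the talk
ALPHABET = "ATCG"                  # the four-letter code

# Bits needed to tell four symbols apart: log2(4) = 2.
bits_per_letter = math.ceil(math.log2(len(ALPHABET)))

# Assumption: letters are packed back to back, four per byte.
genome_bytes = LETTERS_IN_GENOME * bits_per_letter // 8
genome_megabytes = genome_bytes / 1_000_000

print(f"{bits_per_letter} bits per letter, about {genome_megabytes:.0f} MB in total")
```

Under these assumptions the whole sequence fits on a single ordinary thumb drive, in sharp contrast to the atom-by-atom listing described earlier.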

So I was thinking how I could explain myself better about how big and enormous this code is. But there is -- I mean, I'm going to have some help, and the best person to help me introduce the code is actually the first man to sequence it, Dr. Craig Venter. So welcome onstage, Dr. Craig Venter.

所以，我在想，這麼大的數字我要怎麼解釋才讓人比較容易了解。但，我的意思是... 我最好找個人來幫忙，而能幫我介紹基因密碼的最佳人選，想當然就是第一個定序的人，克萊格．凡特博士。所以，讓我們歡迎克萊格．凡特博士上台。

(Applause)

（掌聲）

Not the man in the flesh, but for the first time in history, this is the genome of a specific human, printed page-by-page, letter-by-letter: 262,000 pages of information, 450 kilograms, shipped from the United States to Canada thanks to Bruno Bowden, Lulu.com, a start-up, did everything. It was an amazing feat.

當然不是活生生的人，但這是史上第一次，特定人類的基因組被一頁接著一頁、一個字母接著一個字母地列印出來：262,000頁的資料，450公斤，從美國運到加拿大，感謝新創公司Lulu.com的布魯諾．鮑登，他們幫我做了這一切。這是個了不起的壯舉。

But this is the visual perception of what is the code of life. And now, for the first time, I can do something fun. I can actually poke inside it and read. So let me take an interesting book ... like this one. I have an annotation; it's a fairly big book. So just to let you see what is the code of life. Thousands and thousands and thousands and millions of letters. And they apparently make sense. Let's get to a specific part. Let me read it to you:

但這是生命密碼在視覺上的樣貌。現在，我第一次可以做件有趣的事。我真的可以翻進去讀一讀。所以，讓我拿一本有趣的書……比如這本。我做了個註記；這本書相當厚。就讓各位看一下生命密碼是什麼樣子。成千上萬、再成千上萬、數以百萬計的字母。而它們顯然是有意義的。讓我們翻到特定的一段。我來讀給各位聽：

(Laughter)

（笑聲）

"AAG, AAT, ATA."

To you it sounds like mute letters, but this sequence gives the color of the eyes to Craig. I'll show you another part of the book. This is actually a little more complicated.

你們可能覺得像是在聽天書，但這段序列，決定了克萊格的眼睛顏色。我再展示另一段給各位看。這段實際上稍微複雜些。

Chromosome 14, book 132:

14 號染色體，第132 號書：

(Laughter)

（笑聲）

As you might expect.

如各位所料。

(Laughter)

（笑聲）

"ATT, CTT, GATT."

This human is lucky, because if you miss just two letters in this position -- two letters of our three billion -- he will be condemned to a terrible disease: cystic fibrosis. We have no cure for it, we don't know how to solve it, and it's just two letters of difference from what we are.

這個人很幸運，因為如果在這個位置剛好漏掉兩個字母-- 30億個字母中只漏掉兩個-- 他就會被宣判得到一種可怕的疾病：囊性纖維化。目前我們沒有治療的方式，我們不知道如何解決，而它與我們之間僅僅是兩個字母的差異而已。

A wonderful book, a mighty book, a mighty book that helped me understand and show you something quite remarkable. Every one of you -- what makes me, me and you, you -- is just about five million of these, half a book. For the rest, we are all absolutely identical. Five hundred pages is the miracle of life that you are. The rest, we all share it. So think about that again when we think that we are different. This is the amount that we share.

這本美妙的書，這本偉大的書，幫助我了解，也能讓各位看到一些相當了不起的事情。在場的每一個人，讓我是我、你是你的，只有大約五百萬個這樣的字母，半本書。其餘的部分，我們完全相同。就是這 500 頁，形塑了你這個生命的奇蹟。剩下的，我們全都共有。所以，當我們認為彼此不同的時候，請再想一想。這就是我們共有的份量。
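To put the "five million letters" in proportion, the figures quoted in the talk can be combined directly. This is a minimal sketch that assumes the letters are spread evenly across the printed pages; the variable names are illustrative only.

```python
TOTAL_LETTERS = 3_000_000_000   # letters in the genome, as quoted in the talk
DIFFERING_LETTERS = 5_000_000   # letters that differ between two people, as quoted
TOTAL_PAGES = 262_000           # printed pages, as quoted

# Share of the genome that makes one person different from another.
fraction_that_differs = DIFFERING_LETTERS / TOTAL_LETTERS

# Assumption: every page carries the same number of letters.
letters_per_page = TOTAL_LETTERS / TOTAL_PAGES
pages_that_differ = DIFFERING_LETTERS / letters_per_page

print(f"{fraction_that_differs:.2%} of the letters differ")
print(f"roughly {pages_that_differ:.0f} of {TOTAL_PAGES:,} pages")
```

With these inputs the differing share comes out well under one percent, and the page count lands in the same few-hundred-page range as the "five hundred pages" mentioned on stage.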

So now that I have your attention, the next question is: How do I read it? How do I make sense out of it? Well, for however good you can be at assembling Swedish furniture, this instruction manual is nothing you can crack in your life.

所以，既然我已經引起各位的注意，接下來的問題是：我要怎麼讀它？我要怎麼搞懂它？其實，無論你多麼擅長組裝瑞典家具，這本說明書你一輩子也破解不了。

(Laughter)

（笑聲）

And so, in 2014, two famous TEDsters, Peter Diamandis and Craig Venter himself, decided to assemble a new company. Human Longevity was born, with one mission: trying everything we can try and learning everything we can learn from these books, with one target -- making real the dream of personalized medicine, understanding what things should be done to have better health and what are the secrets in these books.

2014年，兩位知名的 TED 講者，彼得．戴曼迪斯和克萊格．凡特本人，決定創立一家新公司。人類長壽公司（Human Longevity）誕生了，肩負一個使命：從這些書裡嘗試一切能嘗試的、學習一切能學習的，只為了一個目標-- 讓個人化醫療的夢想成真，了解該做哪些事才能更健康，以及這些書裡藏著哪些秘密。

An amazing team, 40 data scientists and many, many more people, a pleasure to work with. The concept is actually very simple. We're going to use a technology called machine learning. On one side, we have genomes -- thousands of them. On the other side, we collected the biggest database of human beings: phenotypes, 3D scan, NMR -- everything you can think of. Inside there, on these two opposite sides, there is the secret of translation. And in the middle, we build a machine. We build a machine and we train a machine -- well, not exactly one machine, many, many machines -- to try to understand and translate the genome in a phenotype. What are those letters, and what do they do? It's an approach that can be used for everything, but using it in genomics is particularly complicated. Little by little we grew and we wanted to build different challenges. We started from the beginning, from common traits. Common traits are comfortable because they are common, everyone has them.

一個令人驚豔的團隊，40 位數據科學家，還有很多、很多其他人，與他們共事是一大樂事。這概念其實很簡單。我們要使用一種叫「機器學習」的技術。一邊，我們有基因組，成千上萬個。另一邊，我們收集了最大的人類資料庫：表現型、3D掃描、核磁共振-- 你能想到的都有。在這兩邊之間，藏著轉譯的秘密。而在中間，我們建造了一台機器。我們建造機器並訓練它-- 嗯，其實不只一台，是很多很多台機器-- 試著去了解基因組，並把它轉譯成表現型。這些字母是什麼？它們有什麼作用？這個方法可以用在任何事情上，但用在基因體學上特別複雜。我們一點一滴地成長，想要設定不同的挑戰。我們從頭開始，從常見的特徵著手。常見的特徵很好處理，因為它們很常見，每個人都有。

So we started to ask our questions: Can we predict height? Can we read the books and predict your height? Well, we actually can, with five centimeters of precision. BMI is fairly connected to your lifestyle, but we still can, we get in the ballpark, eight kilograms of precision. Can we predict eye color? Yeah, we can. Eighty percent accuracy. Can we predict skin color? Yeah we can, 80 percent accuracy. Can we predict age? We can, because apparently, the code changes during your life. It gets shorter, you lose pieces, it gets insertions. We read the signals, and we make a model.

於是我們開始提出問題：我們可以預測身高嗎？我們可以光讀這些書就預測你的身高嗎？沒錯，我們真的可以，誤差在五公分內。身體質量指數與你的生活方式相當有關，但我們仍然可以預測個大概，誤差在 8 公斤內。那我們可以預測眼睛顏色嗎？是的，我們可以，準確率80%。我們可以預測皮膚顏色嗎？是的，可以，80%的準確率。年齡呢？可以，因為密碼顯然會在你的一生中跟著改變。它會變短、會失去片段、會出現插入。我們讀取這些訊號，然後建立模型。

Now, an interesting challenge: Can we predict a human face? It's a little complicated, because a human face is scattered among millions of these letters. And a human face is not a very well-defined object. So, we had to build an entire tier of it to learn and teach a machine what a face is, and embed and compress it. And if you're comfortable with machine learning, you understand what the challenge is here.

現在，有一項有趣的挑戰：我們可以預測一個人的臉嗎？這有點複雜，因為人臉的資訊分散在數百萬個這種字母之中。而且人臉並不是一個定義明確的物件。所以，我們必須為此建立一整層系統，去學習並教會機器什麼是臉，再把它嵌入並壓縮。如果你熟悉機器學習，你就會明白這裡的挑戰是什麼。

Now, after 15 years -- 15 years after we read the first sequence -- this October, we started to see some signals. And it was a very emotional moment. What you see here is a subject coming in our lab. This is a face for us. So we take the real face of a subject, we reduce the complexity, because not everything is in your face -- lots of features and defects and asymmetries come from your life. We symmetrize the face, and we run our algorithm. The results that I show you right now, this is the prediction we have from the blood.

現在，15年後-- 在我們讀出第一個序列的15年後-- 今年10月，我們開始看到一些訊號。那是非常令人激動的時刻。各位現在看到的是一位來到我們實驗室的實驗對象。對我們而言，這就是一張臉。我們取得實驗對象的真實臉孔，降低它的複雜度，因為你臉上的東西並不全都寫在密碼裡-- 許多特徵、缺陷和不對稱來自你的生活。我們把臉對稱化，然後執行我們的演算法。我現在展示給各位看的結果，就是我們從血液得到的預測。

(Applause)

（掌聲）

Wait a second. In these seconds, your eyes are watching, left and right, left and right, and your brain wants those pictures to be identical. So I ask you to do another exercise, to be honest. Please search for the differences, which are many. The biggest amount of signal comes from gender, then there is age, BMI, the ethnicity component of a human. And scaling up over that signal is much more complicated. But what you see here, even in the differences, lets you understand that we are in the right ballpark, that we are getting closer. And it's already giving you some emotions.

稍等一下。在這短短的幾秒鐘，你的眼睛會左看看、右看看做比較，而你的大腦會希望這些照片是一致的。所以，我要求各位做另一項活動，這次要誠實。請找出他們不一樣的地方，有很多喔。最多的訊號來自性別，然後是年齡、身體質量指數、人類種族族群。把這些訊號擴大是相當複雜的。但即使你現在看到有點不同，還是要讓各位知道，我們預測還算不錯，已經很接近了。這已經讓你有點激動了。

This is another subject that comes in place, and this is a prediction. A little smaller face, we didn't get the complete cranial structure, but still, it's in the ballpark. This is a subject that comes in our lab, and this is the prediction. So these people have never been seen in the training of the machine. These are the so-called "held-out" set. But these are people that you will probably never believe. We're publishing everything in a scientific publication, you can read it.

這是另一位實驗對象，這是預測的結果。臉稍微小了一點，我們沒有得到完整的顱骨結構，但還是相去不遠。這是一位來到我們實驗室的實驗對象，這是預測結果。這些人從未出現在機器的訓練資料裡。這就是所謂的「保留」資料集。但這些人，各位大概怎麼樣都不會相信。我們會把一切發表在科學刊物上，各位可以去讀。
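The idea of training a machine on genomes and then checking it on a "held-out" set can be illustrated with a toy example. The sketch below is not the method used by the team in the talk: it is a plain linear model fitted by gradient descent on fully synthetic data, and the number of sites, the effect sizes and the baseline height are all invented for illustration.

```python
import random

random.seed(0)

N_SITES = 5
TRUE_EFFECTS_CM = [2.0, -1.5, 1.0, 0.5, -0.5]  # invented effect sizes, not real biology
BASE_HEIGHT_CM = 170.0                          # invented baseline height


def make_person():
    """Return (genotype, height) for one synthetic person."""
    genotype = [random.choice((0, 1, 2)) for _ in range(N_SITES)]
    height = BASE_HEIGHT_CM + sum(e * g for e, g in zip(TRUE_EFFECTS_CM, genotype))
    return genotype, height + random.gauss(0.0, 1.0)  # add 1 cm of noise


people = [make_person() for _ in range(200)]
train, held_out = people[:150], people[150:]  # held-out rows are never used for fitting


def predict(weights, bias, genotype):
    return bias + sum(w * g for w, g in zip(weights, genotype))


# Fit a linear model by gradient descent on the training rows only.
train_mean = sum(h for _, h in train) / len(train)
weights, bias = [0.0] * N_SITES, train_mean
LEARNING_RATE = 0.1
for _ in range(1000):
    grad_w, grad_b = [0.0] * N_SITES, 0.0
    for genotype, height in train:
        error = predict(weights, bias, genotype) - height
        grad_b += error
        for j, g in enumerate(genotype):
            grad_w[j] += error * g
    bias -= LEARNING_RATE * grad_b / len(train)
    weights = [w - LEARNING_RATE * gw / len(train) for w, gw in zip(weights, grad_w)]

# Score on people the model has never seen, against a "predict the average" baseline.
held_out_mae = sum(abs(predict(weights, bias, g) - h) for g, h in held_out) / len(held_out)
baseline_mae = sum(abs(train_mean - h) for _, h in held_out) / len(held_out)
print(f"held-out error: {held_out_mae:.2f} cm, baseline: {baseline_mae:.2f} cm")
```

The point of the held-out rows is that a low error there cannot come from memorising the training people; real genomic models face millions of sites and far noisier traits than this toy setup.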

But since we are onstage, Chris challenged me. I probably exposed myself and tried to predict someone that you might recognize. So, in this vial of blood -- and believe me, you have no idea what we had to do to have this blood now, here -- in this vial of blood is the amount of biological information that we need to do a full genome sequence. We just need this amount. We ran this sequence, and I'm going to do it with you. And we start to layer up all the understanding we have. In the vial of blood, we predicted he's a male. And the subject is a male. We predict that he's a meter and 76 cm. The subject is a meter and 77 cm. So, we predicted that he's 76; the subject is 82. We predict his age, 38. The subject is 35. We predict his eye color. Too dark. We predict his skin color. We are almost there. That's his face.

但既然我們在台上，克里斯就向我下了挑戰。我大概是豁出去了，試著預測一位各位可能認得的人。所以，在這一小瓶血液裡-- 相信我，你們絕對想不到我們費了多大的工夫，才讓這瓶血此刻出現在這裡-- 這一小瓶血液裡的生物資訊量，就足夠我們做完整的基因組定序。我們只需要這麼多。我們已經完成定序，現在我要和各位一起看結果。我們開始把所有已知的理解一層層疊上去。從這瓶血液，我們預測他是男性。而實驗對象是男性。我們預測他身高176公分。實際上他身高177公分。我們預測他的體重是76公斤；實際上是82公斤。我們預測他的年齡是38歲。實際上是35歲。我們預測他的眼睛顏色。太暗了。我們預測他的皮膚顏色。差不多了。這是他的臉。

Now, the reveal moment: the subject is this person.

現在，真相要大白的時刻了：他長這樣。

(Laughter)

（笑聲）

And I did it intentionally. I am a very particular and peculiar ethnicity. Southern European, Italians -- they never fit in models. And it's particular -- that ethnicity is a complex corner case for our model. But there is another point. So, one of the things that we use a lot to recognize people will never be written in the genome. It's our free will, it's how I look. Not my haircut in this case, but my beard cut. So I'm going to show you, I'm going to, in this case, transfer it -- and this is nothing more than Photoshop, no modeling -- the beard on the subject. And immediately, we get much, much better in the feeling.

我是故意這樣做的。我屬於一個非常特殊、非常奇特的族群。南歐人、義大利人-- 他們從來都套不進模型。這很特別-- 對我們的模型來說，這個族群是個複雜的邊角案例。但還有另一個重點。我們常用來辨認一個人的許多東西，永遠不會寫在基因組裡。那是我們的自由意志，是我打理外表的方式。在這個例子裡不是我的髮型，而是我的鬍型。所以我要展示給各位看，我要把鬍子移到實驗對象的臉上-- 這只不過是Photoshop，沒有任何建模。立刻，感覺就像得多了。

So, why do we do this? We certainly don't do it for predicting height or taking a beautiful picture out of your blood. We do it because the same technology and the same approach, the machine learning of this code, is helping us to understand how we work, how your body works, how your body ages, how disease generates in your body, how your cancer grows and develops, how drugs work and if they work on your body.

所以，我們為什麼要做這個？我們當然不是為了預測身高，或是從你的血液裡產生一張漂亮的照片。我們這樣做，是因為同樣的技術、同樣的方法，也就是對這套密碼的機器學習，正在幫助我們了解我們如何運作、你的身體如何運作、你的身體如何老化、疾病如何在你體內產生、你的癌症如何生長和發展、藥物如何作用，以及它們在你身上是否有效。

This is a huge challenge. This is a challenge that we share with thousands of other researchers around the world. It's called personalized medicine. It's the ability to move from a statistical approach where you're a dot in the ocean, to a personalized approach, where we read all these books and we get an understanding of exactly how you are. But it is a particularly complicated challenge, because of all these books, as of today, we just know probably two percent: four books of more than 175.

這是一個很大的挑戰。這是我們與全世界數千位研究人員共同面對的挑戰。它叫做個人化醫療。它是一種能力，讓我們從把你當成大海中一個小點的統計方法，轉向個人化的方法：我們讀完這些書，確切了解你是什麼樣子。但這是個特別複雜的挑戰，因為在這所有的書裡，到今天為止，我們大概只懂2%：175本多的書裡的四本。

And this is not the topic of my talk, because we will learn more. There are the best minds in the world on this topic. The prediction will get better, the model will get more precise. And the more we learn, the more we will be confronted with decisions that we never had to face before about life, about death, about parenting.

但這不是我演講的主題，因為我們會學到更多。全世界最優秀的頭腦都在研究這個主題。預測會越來越好，模型會越來越精準。而我們學得越多，就越會面臨以前從未需要面對的抉擇：關於生命、關於死亡、關於養育子女。

So, we are touching the very inner detail on how life works. And it's a revolution that cannot be confined in the domain of science or technology. This must be a global conversation. We must start to think of the future we're building as a humanity. We need to interact with creatives, with artists, with philosophers, with politicians. Everyone is involved, because it's the future of our species. Without fear, but with the understanding that the decisions that we make in the next year will change the course of history forever.

所以，我們正在觸及生命如何運作的最深層細節。這是一場不能只侷限在科學或技術領域的革命。這必須是一場全球性的對話。我們必須開始思考，身為人類，我們正在打造什麼樣的未來。我們需要與創意人才、藝術家、哲學家、政治家互動。每個人都參與其中，因為這是我們物種的未來。不必恐懼，但要了解，我們在未來一年所做的決定，將永遠改變歷史的走向。

Thank you.

謝謝各位！

(Applause)

（掌聲）

Related talks

Jennifer Doudna: How CRISPR lets us edit our DNA

Craig Venter: Watch me unveil "synthetic life"

Juan Enriquez: We can reprogram life. How to do it wisely

Christoph Adami: Finding life we can't imagine

Juan Enriquez: The age of genetic wonder

Rob Reid: How synthetic biology could wipe out humanity -- and how we can stop it
