Riccardo Sabatini: How to read the genome and build a human being

For the next 16 minutes, I'm going to take you on a journey that is probably the biggest dream of humanity: to understand the code of life.

So for me, everything started many, many years ago when I met the first 3D printer. The concept was fascinating. A 3D printer needs three elements: a bit of information, some raw material, some energy, and it can produce any object that was not there before.

I was doing physics, I was coming back home and I realized that I actually always knew a 3D printer. And everyone does. It was my mom.

(Laughter)

My mom takes three elements: a bit of information, which is between my father and my mom in this case, raw elements and energy in the same media, that is food, and after several months, produces me. And I was not existent before.

So apart from the shock of my mom discovering that she was a 3D printer, I immediately got mesmerized by that piece, the first one, the information. What amount of information does it take to build and assemble a human? Is it much? Is it little? How many thumb drives can you fill?

Well, I was studying physics at the beginning and I took this approximation of a human as a gigantic Lego piece. So, imagine that the building blocks are little atoms and there is a hydrogen here, a carbon here, a nitrogen here. So in the first approximation, if I can list the number of atoms that compose a human being, I can build it. Now, you can run some numbers and that happens to be quite an astonishing number. So the number of atoms, the file that I will save in my thumb drive to assemble a little baby, will actually fill an entire Titanic of thumb drives -- multiplied 2,000 times. This is the miracle of life. Every time you see from now on a pregnant lady, she's assembling the biggest amount of information that you will ever encounter. Forget big data, forget anything you heard of. This is the biggest amount of information that exists.

(Applause)

But nature, fortunately, is much smarter than a young physicist, and in four billion years, managed to pack this information in a small crystal we call DNA. We met it for the first time in 1950 when Rosalind Franklin, an amazing scientist, a woman, took a picture of it. But it took us more than 40 years to finally poke inside a human cell, take out this crystal, unroll it, and read it for the first time. The code comes out to be a fairly simple alphabet, four letters: A, T, C and G. And to build a human, you need three billion of them. Three billion. How many are three billion? It doesn't really make any sense as a number, right?
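To put the three billion letters in thumb-drive terms, here is a rough back-of-the-envelope sketch. It assumes each of the four letters is stored at the minimum fixed width of 2 bits; that encoding and the function name `genome_size_bytes` are illustrative assumptions, not something stated in the talk.

```python
import math

def genome_size_bytes(n_letters: int, alphabet_size: int = 4) -> float:
    """Bytes needed to store n_letters at the minimum fixed-width encoding."""
    bits_per_letter = math.log2(alphabet_size)  # 2 bits for A, T, C, G
    return n_letters * bits_per_letter / 8

size = genome_size_bytes(3_000_000_000)
print(f"{size / 1e6:.0f} MB")  # prints "750 MB"
```

Under that assumption, the raw sequence fits on a single ordinary thumb drive, in contrast to the atom-by-atom list described earlier.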

So I was thinking how I could explain myself better about how big and enormous this code is. But there is -- I mean, I'm going to have some help, and the best person to help me introduce the code is actually the first man to sequence it, Dr. Craig Venter. So welcome onstage, Dr. Craig Venter.

(Applause)

Not the man in the flesh, but for the first time in history, this is the genome of a specific human, printed page-by-page, letter-by-letter: 262,000 pages of information, 450 kilograms, shipped from the United States to Canada thanks to Bruno Bowden, Lulu.com, a start-up, did everything. It was an amazing feat.

But this is the visual perception of what is the code of life. And now, for the first time, I can do something fun. I can actually poke inside it and read. So let me take an interesting book ... like this one. I have an annotation; it's a fairly big book. So just to let you see what is the code of life. Thousands and thousands and thousands and millions of letters. And they apparently make sense. Let's get to a specific part. Let me read it to you:

(Laughter)

"AAG, AAT, ATA."

To you it sounds like mute letters, but this sequence gives the color of the eyes to Craig. I'll show you another part of the book. This is actually a little more complicated.

Chromosome 14, book 132:

(Laughter)

As you might expect.

(Laughter)

"ATT, CTT, GATT."

This human is lucky, because if you miss just two letters in this position -- two letters of our three billion -- he will be condemned to a terrible disease: cystic fibrosis. We have no cure for it, we don't know how to solve it, and it's just two letters of difference from what we are.

A wonderful book, a mighty book, a mighty book that helped me understand and show you something quite remarkable. Every one of you -- what makes me, me and you, you -- is just about five million of these, half a book. For the rest, we are all absolutely identical. Five hundred pages is the miracle of life that you are. The rest, we all share it. So think about that again when we think that we are different. This is the amount that we share.
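The "five hundred pages" figure can be checked against the numbers quoted in the talk: three billion letters printed on 262,000 pages, and about five million letters that differ between people. The sketch below only redoes that arithmetic; the constant names are illustrative.

```python
TOTAL_LETTERS = 3_000_000_000   # letters in the genome, as quoted in the talk
TOTAL_PAGES = 262_000           # pages in the printed genome, as quoted in the talk
DIFFERING_LETTERS = 5_000_000   # letters that differ between people, as quoted

letters_per_page = TOTAL_LETTERS / TOTAL_PAGES
differing_pages = DIFFERING_LETTERS / letters_per_page
differing_fraction = DIFFERING_LETTERS / TOTAL_LETTERS

print(round(letters_per_page))       # 11450 letters per printed page
print(round(differing_pages))        # 437 pages, i.e. roughly five hundred
print(f"{differing_fraction:.2%}")   # 0.17% of the genome
```

So the quoted figures are consistent with one another: the part that differs comes to a few hundred pages out of 262,000.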

So now that I have your attention, the next question is: How do I read it? How do I make sense out of it? Well, for however good you can be at assembling Swedish furniture, this instruction manual is nothing you can crack in your life.

(Laughter)

And so, in 2014, two famous TEDsters, Peter Diamandis and Craig Venter himself, decided to assemble a new company. Human Longevity was born, with one mission: trying everything we can try and learning everything we can learn from these books, with one target -- making real the dream of personalized medicine, understanding what things should be done to have better health and what are the secrets in these books.

An amazing team, 40 data scientists and many, many more people, a pleasure to work with. The concept is actually very simple. We're going to use a technology called machine learning. On one side, we have genomes -- thousands of them. On the other side, we collected the biggest database of human beings: phenotypes, 3D scan, NMR -- everything you can think of. Inside there, on these two opposite sides, there is the secret of translation. And in the middle, we build a machine. We build a machine and we train a machine -- well, not exactly one machine, many, many machines -- to try to understand and translate the genome in a phenotype. What are those letters, and what do they do? It's an approach that can be used for everything, but using it in genomics is particularly complicated. Little by little we grew and we wanted to build different challenges. We started from the beginning, from common traits. Common traits are comfortable because they are common, everyone has them.
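The idea of training a machine to translate genome into phenotype can be illustrated with a deliberately tiny toy. This is not the method used by Human Longevity, which the talk does not describe in detail. It is a minimal sketch that assumes a linear model, five made-up variants, and synthetic data generated on the spot; all names and numbers in it are invented for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the toy run is repeatable

# Synthetic stand-in data: each "genome" is 5 variant counts (0, 1 or 2),
# and the "phenotype" is a height-like number built from made-up effects.
TRUE_EFFECTS = [2.0, -1.0, 0.5, 3.0, 1.5]  # invented for illustration only

def make_person():
    genome = [rng.choice([0, 1, 2]) for _ in TRUE_EFFECTS]
    height = 170 + sum(e * g for e, g in zip(TRUE_EFFECTS, genome)) + rng.gauss(0, 1)
    return genome, height

people = [make_person() for _ in range(300)]
train, held_out = people[:200], people[200:]  # held-out people are never trained on

mean_height = sum(h for _, h in train) / len(train)

def predict(weights, bias, genome):
    return mean_height + bias + sum(w * g for w, g in zip(weights, genome))

# Fit a linear model by full-batch gradient descent on squared error.
weights, bias, lr = [0.0] * len(TRUE_EFFECTS), 0.0, 0.05
for _ in range(2000):
    grad_w, grad_b = [0.0] * len(weights), 0.0
    for genome, height in train:
        err = predict(weights, bias, genome) - height
        grad_b += err
        for i, g in enumerate(genome):
            grad_w[i] += err * g
    bias -= lr * grad_b / len(train)
    weights = [w - lr * gw / len(train) for w, gw in zip(weights, grad_w)]

def mean_abs_error(predictor):
    return sum(abs(predictor(g) - h) for g, h in held_out) / len(held_out)

model_error = mean_abs_error(lambda g: predict(weights, bias, g))
baseline_error = mean_abs_error(lambda g: mean_height)  # always guess the average
print(f"model: {model_error:.2f}  baseline: {baseline_error:.2f}")
```

The point of the comparison at the end is the one made in the talk: the model is judged on people it never saw during training, and it is only useful if it beats simply guessing the average.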

So we started to ask our questions: Can we predict height? Can we read the books and predict your height? Well, we actually can, with five centimeters of precision. BMI is fairly connected to your lifestyle, but we still can, we get in the ballpark, eight kilograms of precision. Can we predict eye color? Yeah, we can. Eighty percent accuracy. Can we predict skin color? Yeah we can, 80 percent accuracy. Can we predict age? We can, because apparently, the code changes during your life. It gets shorter, you lose pieces, it gets insertions. We read the signals, and we make a model.

Now, an interesting challenge: Can we predict a human face? It's a little complicated, because a human face is scattered among millions of these letters. And a human face is not a very well-defined object. So, we had to build an entire tier of it to learn and teach a machine what a face is, and embed and compress it. And if you're comfortable with machine learning, you understand what the challenge is here.

Now, after 15 years -- 15 years after we read the first sequence -- this October, we started to see some signals. And it was a very emotional moment. What you see here is a subject coming in our lab. This is a face for us. So we take the real face of a subject, we reduce the complexity, because not everything is in your face -- lots of features and defects and asymmetries come from your life. We symmetrize the face, and we run our algorithm. The results that I show you right now, this is the prediction we have from the blood.

(Applause)

Wait a second. In these seconds, your eyes are watching, left and right, left and right, and your brain wants those pictures to be identical. So I ask you to do another exercise, to be honest. Please search for the differences, which are many. The biggest amount of signal comes from gender, then there is age, BMI, the ethnicity component of a human. And scaling up over that signal is much more complicated. But what you see here, even in the differences, lets you understand that we are in the right ballpark, that we are getting closer. And it's already giving you some emotions.

This is another subject that comes in place, and this is a prediction. A little smaller face, we didn't get the complete cranial structure, but still, it's in the ballpark. This is a subject that comes in our lab, and this is the prediction. So these people have never been seen in the training of the machine. These are the so-called "held-out" set. But these are people that you will probably never believe. We're publishing everything in a scientific publication, you can read it.

But since we are onstage, Chris challenged me. I probably exposed myself and tried to predict someone that you might recognize. So, in this vial of blood -- and believe me, you have no idea what we had to do to have this blood now, here -- in this vial of blood is the amount of biological information that we need to do a full genome sequence. We just need this amount. We ran this sequence, and I'm going to do it with you. And we start to layer up all the understanding we have. In the vial of blood, we predicted he's a male. And the subject is a male. We predict that he's a meter and 76 cm. The subject is a meter and 77 cm. So, we predicted that he's 76; the subject is 82. We predict his age, 38. The subject is 35. We predict his eye color. Too dark. We predict his skin color. We are almost there. That's his face.

Now, the reveal moment: the subject is this person.

(Laughter)

And I did it intentionally. I am a very particular and peculiar ethnicity. Southern European, Italians -- they never fit in models. And it's particular -- that ethnicity is a complex corner case for our model. But there is another point. So, one of the things that we use a lot to recognize people will never be written in the genome. It's our free will, it's how I look. Not my haircut in this case, but my beard cut. So I'm going to show you, I'm going to, in this case, transfer it -- and this is nothing more than Photoshop, no modeling -- the beard on the subject. And immediately, we get much, much better in the feeling.

So, why do we do this? We certainly don't do it for predicting height or taking a beautiful picture out of your blood. We do it because the same technology and the same approach, the machine learning of this code, is helping us to understand how we work, how your body works, how your body ages, how disease generates in your body, how your cancer grows and develops, how drugs work and if they work on your body.

This is a huge challenge. This is a challenge that we share with thousands of other researchers around the world. It's called personalized medicine. It's the ability to move from a statistical approach where you're a dot in the ocean, to a personalized approach, where we read all these books and we get an understanding of exactly how you are. But it is a particularly complicated challenge, because of all these books, as of today, we just know probably two percent: four books of more than 175.

And this is not the topic of my talk, because we will learn more. There are the best minds in the world on this topic. The prediction will get better, the model will get more precise. And the more we learn, the more we will be confronted with decisions that we never had to face before about life, about death, about parenting.

So, we are touching the very inner detail on how life works. And it's a revolution that cannot be confined in the domain of science or technology. This must be a global conversation. We must start to think of the future we're building as a humanity. We need to interact with creatives, with artists, with philosophers, with politicians. Everyone is involved, because it's the future of our species. Without fear, but with the understanding that the decisions that we make in the next year will change the course of history forever.

Thank you.

(Applause)
