Sara-Jane Dunn: The next software revolution: programming biological cells

The second half of the last century was completely defined by a technological revolution: the software revolution. The ability to program electrons on a material called silicon made possible technologies, companies and industries that were at one point unimaginable to many of us, but which have now fundamentally changed the way the world works. The first half of this century, though, is going to be transformed by a new software revolution: the living software revolution. And this will be powered by the ability to program biochemistry on a material called biology. And doing so will enable us to harness the properties of biology to generate new kinds of therapies, to repair damaged tissue, to reprogram faulty cells or even build programmable operating systems out of biochemistry. If we can realize this -- and we do need to realize it -- its impact will be so enormous that it will make the first software revolution pale in comparison.

De tweede helft van de vorige eeuw werd volledig gekenmerkt door een technologische revolutie: de softwarerevolutie. De mogelijkheid om elektronen in silicium te programmeren liet technologieën, bedrijven en industrieën ontstaan die toen voor velen van ons nog onvoorstelbaar waren, maar die nu de aard van de wereld fundamenteel hebben veranderd. Maar de eerste helft van deze eeuw zal worden getransformeerd door een nieuwe softwarerevolutie: de revolutie van levende software. Die zal in staat zijn om biochemie te programmeren op een materiaal dat we biologie heten. Daardoor zullen we gebruik kunnen maken van de eigenschappen van de biologie om nieuwe soorten therapieën te ontwerpen, om beschadigd weefsel te herstellen, defecte cellen te herprogrammeren of zelfs biochemisch programmeerbare besturingssystemen te bouwen. Als we dit kunnen realiseren -- en we moeten dit realiseren -- zal de impact ervan zo enorm zijn dat de eerste softwarerevolutie erbij zal verbleken.

And that's because living software would transform the entirety of medicine, agriculture and energy, and these are sectors that dwarf those dominated by IT. Imagine programmable plants that fix nitrogen more effectively or resist emerging fungal pathogens, or even programming crops to be perennial rather than annual so you could double your crop yields each year. That would transform agriculture and how we'll keep our growing and global population fed. Or imagine programmable immunity, designing and harnessing molecular devices that guide your immune system to detect, eradicate or even prevent disease. This would transform medicine and how we'll keep our growing and aging population healthy.

Levende software zou de geneeskunde namelijk helemaal veranderen, en ook de landbouw en energiesector, en deze sectoren overvleugelen veruit de sectoren gedomineerd door de IT. Stel je programmeerbare planten voor die stikstof efficiënter fixeren of nieuwe fungale pathogenen weerstaan, of zelfs eenjarige gewassen tot doorlevende herprogrammeren zodat je je jaarlijkse oogst zou kunnen verdubbelen. Dat zou de landbouw hervormen en onze groeiende wereldbevolking van voedsel voorzien. Of stel je programmeerbare immuniteit voor door het ontwerpen en inschakelen van moleculaire apparaten die je immuunsysteem leren om ziektes op te sporen, uit te roeien of zelfs te voorkomen. Dit zou de geneeskunde hervormen en onze toenemende en vergrijzende bevolking gezond houden.

We already have many of the tools that will make living software a reality. We can precisely edit genes with CRISPR. We can rewrite the genetic code one base at a time. We can even build functioning synthetic circuits out of DNA. But figuring out how and when to wield these tools is still a process of trial and error. It needs deep expertise, years of specialization. And experimental protocols are difficult to discover and all too often, difficult to reproduce. And, you know, we have a tendency in biology to focus a lot on the parts, but we all know that something like flying wouldn't be understood by only studying feathers. So programming biology is not yet as simple as programming your computer. And then to make matters worse, living systems largely bear no resemblance to the engineered systems that you and I program every day. In contrast to engineered systems, living systems self-generate, they self-organize, they operate at molecular scales. And these molecular-level interactions lead generally to robust macro-scale output. They can even self-repair.

We hebben al veel methodes om levende software te realiseren. We kunnen met CRISPR genen precies bewerken. We kunnen de genetische code base per base herschrijven. We kunnen zelfs functionerende synthetische circuits bouwen uit DNA. Maar uitzoeken hoe en wanneer deze tools te hanteren is nog maar in het stadium van gissen en missen. Er is diepgaande expertise en jaren van specialisatie nodig. Experimentele protocollen zijn moeilijk te ontdekken en maar al te vaak moeilijk te reproduceren. In de biologie focussen we ons vaak op de details, maar we weten toch dat iets als vliegen niet begrepen kan worden door alleen maar veren te bestuderen. Het programmeren van biologie gaat nog niet zo eenvoudig als het programmeren van je computer. Tot overmaat van ramp lijken levende systemen grotendeels niet op de ontwikkelde systemen die jullie en ik elke dag programmeren. In tegenstelling tot technische systemen doen levende systemen aan zelfgeneratie, zelforganisatie en werken ze op moleculaire schaal. Interacties op moleculair niveau leiden over het algemeen tot een robuuste output op macroschaal. Ze kunnen zelfs zelfreparatie aan.

Consider, for example, the humble household plant, like that one sat on your mantelpiece at home that you keep forgetting to water. Every day, despite your neglect, that plant has to wake up and figure out how to allocate its resources. Will it grow, photosynthesize, produce seeds, or flower? And that's a decision that has to be made at the level of the whole organism. But a plant doesn't have a brain to figure all of that out. It has to make do with the cells on its leaves. They have to respond to the environment and make the decisions that affect the whole plant. So somehow there must be a program running inside these cells, a program that responds to input signals and cues and shapes what that cell will do. And then those programs must operate in a distributed way across individual cells, so that they can coordinate and that plant can grow and flourish.

Denk maar aan de nederige kamerplant die thuis op je schoorsteenmantel staat en die je vergat water te geven. Ondanks je verwaarlozing moet die plant elke dag wakker worden en uitzoeken hoe ze haar middelen zal benutten. Zal ze groeien, aan fotosynthese doen, zaden produceren of bloeien? Dat is een beslissing op het niveau van het hele organisme. Maar een plant heeft geen hersens om dat allemaal uit te zoeken. Ze moet het doen met de cellen op haar bladeren. Die moeten reageren op de omgeving en beslissingen nemen die de hele plant beïnvloeden. Dus moet er in die cellen een of ander programma lopen, een programma dat reageert op ingangssignalen en bepaalt wat die cel zal doen. Dan moet dat programma op een gedistribueerde manier werken in individuele cellen, zodat ze kunnen coördineren en de plant kan groeien en bloeien.

If we could understand these biological programs, if we could understand biological computation, it would transform our ability to understand how and why cells do what they do. Because, if we understood these programs, we could debug them when things go wrong. Or we could learn from them how to design the kind of synthetic circuits that truly exploit the computational power of biochemistry.

Als we deze biologische programma's konden begrijpen, als we biologisch computeren begrepen, konden we snappen hoe en waarom cellen doen wat ze doen. Want als we deze programma's begrepen, konden we ze debuggen als er iets misgaat. Of konden we van hen leren hoe synthetische circuits te ontwerpen die de rekenkracht van de biochemie echt zouden benutten.

My passion about this idea led me to a career in research at the interface of maths, computer science and biology. And in my work, I focus on the concept of biology as computation. And that means asking what do cells compute, and how can we uncover these biological programs? And I started to ask these questions together with some brilliant collaborators at Microsoft Research and the University of Cambridge, where together we wanted to understand the biological program running inside a unique type of cell: an embryonic stem cell. These cells are unique because they're totally naïve. They can become anything they want: a brain cell, a heart cell, a bone cell, a lung cell, any adult cell type. This naïvety, it sets them apart, but it also ignited the imagination of the scientific community, who realized, if we could tap into that potential, we would have a powerful tool for medicine. If we could figure out how these cells make the decision to become one cell type or another, we might be able to harness them to generate cells that we need to repair diseased or damaged tissue. But realizing that vision is not without its challenges, not least because these particular cells, they emerge just six days after conception. And then within a day or so, they're gone. They have set off down the different paths that form all the structures and organs of your adult body.

Mijn passie over dit idee leidde me naar een carrière in het onderzoek op het raakpunt van wiskunde, informatica en biologie. In mijn werk richt ik me op het concept van biologie als computerwerk. Dat betekent uitzoeken wat cellen berekenen en hoe deze bio-programma's te vinden. Samen met enkele schitterende medewerkers begon ik me dat af te vragen bij Microsoft Research en de Universiteit van Cambridge waar we samen wilden begrijpen hoe biologische programma’s verlopen in een uniek type cel: een embryonale stamcel. Deze cellen zijn uniek omdat ze helemaal naïef zijn. Ze kunnen alles worden wat ze willen: een hersencel, een hartcel, een botcel, een longcel, elk type volwassen cel. Deze naïviteit maakt ze speciaal, maar dat ontstak ook de verbeelding van de wetenschappers, die beseften dat als we dat potentieel konden aanboren, we een krachtige medische tool zouden hebben. Als we kunnen achterhalen hoe deze cellen beslissen om een of ander type cel te zijn, kunnen we dat misschien benutten om cellen te genereren die ziek of beschadigd weefsel herstellen. Maar die visie realiseren loopt niet van een leien dakje, vooral omdat deze specifieke cellen pas zes dagen na de bevruchting ontstaan. Binnen ongeveer een dag zijn ze alweer weg. Ze volgen dan de verschillende paden die alle structuren en organen van je volwassen lichaam gaan uitmaken.

But it turns out that cell fates are a lot more plastic than we might have imagined. About 13 years ago, some scientists showed something truly revolutionary. By inserting just a handful of genes into an adult cell, like one of your skin cells, you can transform that cell back to the naïve state. And it's a process that's actually known as "reprogramming," and it allows us to imagine a kind of stem cell utopia, the ability to take a sample of a patient's own cells, transform them back to the naïve state and use those cells to make whatever that patient might need, whether it's brain cells or heart cells.

Maar nu blijkt het lot van cellen veel plastischer te zijn dan we eerder dachten. Ongeveer dertien jaar geleden vonden enkele wetenschappers iets revolutionairs. Door het inbrengen van enkele genen in een volwassen cel, zoals een huidcel, kan je die cel terug naar de naïeve staat omvormen. Het is een proces dat bekend staat als ‘herprogrammering’ en laat ons dromen van een stamcel-utopie, waar je een staal van de eigen cellen van een patiënt kan nemen, ze terug naar de naïeve staat kan transformeren en ze gebruiken om de cellen te maken die de patiënt nodig heeft, ongeacht het nu hersen- of hartcellen zijn.

But over the last decade or so, figuring out how to change cell fate, it's still a process of trial and error. Even in cases where we've uncovered successful experimental protocols, they're still inefficient, and we lack a fundamental understanding of how and why they work. If you figured out how to change a stem cell into a heart cell, that hasn't got any way of telling you how to change a stem cell into a brain cell. So we wanted to understand the biological program running inside an embryonic stem cell, and understanding the computation performed by a living system starts with asking a devastatingly simple question: What is it that system actually has to do?

Maar in de afgelopen tien jaar bleef uitzoeken hoe het lot van de cel te veranderen toch nog steeds een proces van gissen en missen. Zelfs in gevallen waarin we succesvolle experimentele protocollen hebben ontdekt, zijn ze nog steeds inefficiënt en weten we niet hoe en waarom ze werken. Als je vindt hoe je een stamcel in een hartcel kan veranderen, vertelt je dat nog niets over hoe je een stamcel kan veranderen in een hersencel. Dus wilden we begrijpen hoe het biologische programma in een embryonale stamcel verloopt. Om de berekening te begrijpen die verloopt in een levend systeem moet je beginnen met een uiterst simpele vraag: wat moet dat systeem eigenlijk doen?

Now, computer science actually has a set of strategies for dealing with what it is the software and hardware are meant to do. When you write a program, you code a piece of software, you want that software to run correctly. You want performance, functionality. You want to prevent bugs. They can cost you a lot. So when a developer writes a program, they could write down a set of specifications. These are what your program should do. Maybe it should compare the size of two numbers or order numbers by increasing size. Technology exists that allows us automatically to check whether our specifications are satisfied, whether that program does what it should do. And so our idea was that in the same way, experimental observations, things we measure in the lab, they correspond to specifications of what the biological program should do.

Nu heeft de informatica eigenlijk een set strategieën om om te gaan met wat de software en de hardware moeten doen. Wanneer je een programma of een stukje software schrijft, wil je dat die software goed werkt. Je wil prestaties, functionaliteit. Je wil bugs voorkomen. Die kunnen je duur komen te staan. Wanneer iemand een programma schrijft, kan hij een bestek maken. Dat bepaalt wat je programma hoort te doen. Misschien de grootte van twee getallen vergelijken of ze ordenen naar grootte. Technologie bestaat om automatisch na te gaan of aan onze specificaties is voldaan, of dat programma doet wat het moet doen. Ons idee bestond erin om op dezelfde manier na te gaan of experimentele waarnemingen, dingen die we meten in het lab, beantwoorden aan specificaties van wat het biologische programma moet doen.

So we just needed to figure out a way to encode this new type of specification. So let's say you've been busy in the lab and you've been measuring your genes and you've found that if Gene A is active, then Gene B or Gene C seems to be active. We can write that observation down as a mathematical expression if we can use the language of logic: If A, then B or C. Now, this is a very simple example, OK. It's just to illustrate the point. We can encode truly rich expressions that actually capture the behavior of multiple genes or proteins over time across multiple different experiments. And so by translating our observations into mathematical expression in this way, it becomes possible to test whether or not those observations can emerge from a program of genetic interactions.

We hoefden maar een manier te vinden om deze nieuwe vorm van specificatie te coderen. Stel dat je in het lab uitzocht wat je genen doen en je ontdekte dat als gen A actief is, gen B of gen C ook actief lijken te zijn. We kunnen die observatie opschrijven als een wiskundige uitdrukking. In de taal van de logica: als A, dan B of C. Nu is dit wel een heel eenvoudig voorbeeld, oké. Alleen om het punt te illustreren. Maar we kunnen echt rijke uitdrukkingen coderen die het gedrag van meerdere genen of eiwitten in de tijd vastleggen over meerdere verschillende experimenten. Door onze observaties zo in wiskundige vorm te gieten, wordt het mogelijk om te testen of deze waarnemingen al dan niet kunnen ontstaan uit een programma van genetische interacties.
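To make the "If A, then B or C" example concrete, here is a minimal Python sketch of how such an observation might be written as a checkable logical expression. The measurements in `observations` are invented for illustration, and the function names are hypothetical; nothing here is taken from the tool described in the talk.

```python
from itertools import product


def spec(state):
    """The observation as logic: if gene A is active, then B or C is active."""
    return (not state["A"]) or state["B"] or state["C"]


# Hypothetical lab measurements: which genes were seen active in each sample.
observations = [
    {"A": True, "B": True, "C": False},
    {"A": True, "B": False, "C": True},
    {"A": False, "B": False, "C": False},
]


def violations(states):
    """Return every state that breaks the specification."""
    return [s for s in states if not spec(s)]


# All eight on/off combinations of the three genes.
all_states = [dict(zip("ABC", bits)) for bits in product([False, True], repeat=3)]

print(violations(observations))  # measurements that contradict the rule
print(violations(all_states))    # the only combinations the rule forbids
```

The rule excludes exactly one combination (A on with both B and C off), which is what makes it usable as a constraint on any candidate program.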

And we developed a tool to do just this. We were able to use this tool to encode observations as mathematical expressions, and then that tool would allow us to uncover the genetic program that could explain them all. And we then applied this approach to uncover the genetic program running inside embryonic stem cells to see if we could understand how to induce that naïve state. And this tool was actually built on a solver that's deployed routinely around the world for conventional software verification. So we started with a set of nearly 50 different specifications that we generated from experimental observations of embryonic stem cells. And by encoding these observations in this tool, we were able to uncover the first molecular program that could explain all of them.

We ontwikkelden een tool om net dat te doen. We konden deze tool gebruiken om waarnemingen te coderen als wiskundige uitdrukkingen en daardoor het genetische programma ontdekken dat ze allemaal zou kunnen verklaren. En dan passen we deze aanpak toe om het genetische programma in embryonale stamcellen zichtbaar te maken om te zien of we kunnen begrijpen hoe die naïeve toestand te krijgen. Deze tool was eigenlijk gebouwd op een solver die routinematig wereldwijd wordt ingezet voor conventionele softwareverificatie. Dus begonnen we met een set van bijna 50 verschillende specificaties gegenereerd uit experimentele waarnemingen van embryonale stamcellen. Door het coderen van die waarnemingen in deze tool, konden we het eerste moleculaire programma ontdekken dat ze allemaal zou kunnen verklaren.
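The idea of searching for a genetic program that explains every observation can be sketched on a toy scale. The Python example below is only an illustration under loud assumptions: the three genes, the two specifications, and the threshold update rule are all invented, and plain brute-force enumeration stands in for the verification solver mentioned in the talk, which this sketch does not use or reproduce.

```python
from itertools import product

GENES = ("A", "B", "C")
# Every possible directed interaction between two different genes.
EDGES = [(src, dst) for src in GENES for dst in GENES if src != dst]


def make_state(a, b, c):
    return {"A": bool(a), "B": bool(b), "C": bool(c)}


def step(program, state):
    """Assumed rule: net activation switches a gene on, net repression
    switches it off, and no net signal leaves it unchanged."""
    nxt = {}
    for gene in GENES:
        signal = sum(program[(src, gene)] * state[src]
                     for src in GENES if src != gene)
        nxt[gene] = state[gene] if signal == 0 else signal > 0
    return nxt


def run(program, state, steps):
    for _ in range(steps):
        state = step(program, state)
    return state


# Hypothetical specifications: (starting state, number of steps, expected state).
SPECS = [
    (make_state(1, 0, 0), 2, make_state(1, 1, 1)),
    (make_state(1, 0, 0), 4, make_state(0, 1, 1)),
]


def satisfies(program):
    return all(run(program, start, steps) == expected
               for start, steps, expected in SPECS)


def consistent_programs():
    """Try every assignment of +1 (activates), -1 (represses) or 0 (no
    interaction) to each edge and keep those that meet all specifications."""
    found = []
    for signs in product((-1, 0, 1), repeat=len(EDGES)):
        program = dict(zip(EDGES, signs))
        if satisfies(program):
            found.append(program)
    return found


programs = consistent_programs()
print(len(programs), "of", 3 ** len(EDGES), "candidate programs fit the data")
```

Enumeration only works here because there are 729 candidates; with dozens of genes and specifications the space is far too large to list, which is where a constraint solver would be needed.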

Now, that's kind of a feat in and of itself, right? Being able to reconcile all of these different observations is not the kind of thing you can do on the back of an envelope, even if you have a really big envelope. Because we've got this kind of understanding, we could go one step further. We could use this program to predict what this cell might do in conditions we hadn't yet tested. We could probe the program in silico.

Dat is toch wel een prestatie op zich, niet? Al die verschillende waarnemingen met elkaar verzoenen, is niet iets dat je even doet op de achterkant van een envelop, zelfs niet op een echt grote envelop. Nu we dat begrepen, konden we een stap verder. We kunnen dit gebruiken om te voorspellen wat deze cel zou kunnen doen in nieuwe omstandigheden. We konden het programma in silico uittesten.

And so we did just that: we generated predictions that we tested in the lab, and we found that this program was highly predictive. It told us how we could accelerate progress back to the naïve state quickly and efficiently. It told us which genes to target to do that, which genes might even hinder that process. We even found the program predicted the order in which genes would switch on. So this approach really allowed us to uncover the dynamics of what the cells are doing.

We deden precies dat: we maakten voorspellingen die we testten in het lab en we vonden dat dit programma een hoge voorspellende waarde had. Het vertelde ons hoe we de voortgang konden versnellen om snel en efficiënt naar de naïeve staat terug te keren. Het vertelde ons op welke genen we ons moesten richten en welke genen dit proces zelfs zouden kunnen hinderen. We vonden zelfs dat het programma de volgorde voorspelde waarin de genen inschakelden. Deze benadering liet ons echt de dynamiek ontdekken van wat de cellen doen.
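Probing a program in silico can be sketched in the same toy setting. In the Python example below, two hypothetical candidate programs are assumed to have already passed a consistency check against existing data, and a prediction is reported only when both agree. That agreement criterion, the update rule, the gene names, and the knockout mechanism are all assumptions made for illustration, not a description of the team's method.

```python
GENES = ("A", "B", "C")

# Two hypothetical candidates: (source, target) -> +1 activates, -1 represses.
PROGRAM_1 = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): -1}
PROGRAM_2 = {**PROGRAM_1, ("C", "B"): 1}
CANDIDATES = [PROGRAM_1, PROGRAM_2]


def step(program, state, knocked_out=()):
    """Assumed rule: net activation turns a gene on, net repression turns
    it off, no net signal leaves it unchanged. Knocked-out genes stay off."""
    nxt = {}
    for gene in GENES:
        if gene in knocked_out:
            nxt[gene] = False
            continue
        signal = sum(sign * state[src]
                     for (src, dst), sign in program.items() if dst == gene)
        nxt[gene] = state[gene] if signal == 0 else signal > 0
    return nxt


def simulate(program, start, steps, knocked_out=()):
    state = {g: start[g] and g not in knocked_out for g in GENES}
    for _ in range(steps):
        state = step(program, state, knocked_out)
    return state


def predict(programs, gene, start, steps, knocked_out=()):
    """Return True or False if every candidate agrees, otherwise None."""
    outcomes = {simulate(p, start, steps, knocked_out)[gene] for p in programs}
    return outcomes.pop() if len(outcomes) == 1 else None


only_a = {"A": True, "B": False, "C": False}
only_c = {"A": False, "B": False, "C": True}

# Untested condition 1: does A switch off by itself after four steps?
print(predict(CANDIDATES, "A", only_a, 4))
# Untested condition 2: does knocking out C keep A on?
print(predict(CANDIDATES, "A", only_a, 3, knocked_out=("C",)))
# Untested condition 3: starting from C alone, is B on after one step?
print(predict(CANDIDATES, "B", only_c, 1))
```

In this toy, the first two questions get a definite answer because both candidates behave the same way, while the third returns `None`: the candidates disagree, so that would be the informative experiment to run in the lab.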

What we've developed, it's not a method that's specific to stem cell biology. Rather, it allows us to make sense of the computation being carried out by the cell in the context of genetic interactions. So really, it's just one building block. The field urgently needs to develop new approaches to understand biological computation more broadly and at different levels, from DNA right through to the flow of information between cells. Only this kind of transformative understanding will enable us to harness biology in ways that are predictable and reliable.

Wat we hebben ontwikkeld, is geen methode die specifiek is voor stamcelbiologie. Nee, ze stelt ons in staat te begrijpen wat de cel aan berekeningen uitvoert in de context van genetische wisselwerkingen. Het is slechts één bouwsteen. Het gebied moet dringend nieuwe methodes ontwikkelen voor een breder begrip van biologische berekening en wel op verschillende niveaus, vanaf DNA tot aan de informatiestroom tussen cellen. Alleen dit soort transformatief begrip zal ons toelaten om de biologie op een voorspelbare en betrouwbare manier te benutten.

But to program biology, we will also need to develop the kinds of tools and languages that allow both experimentalists and computational scientists to design biological function and have those designs compile down to the machine code of the cell, its biochemistry, so that we could then build those structures. Now, that's something akin to a living software compiler, and I'm proud to be part of a team at Microsoft that's working to develop one. Though to say it's a grand challenge is kind of an understatement, but if it's realized, it would be the final bridge between software and wetware.

Maar om biologie te programmeren, is ook de ontwikkeling nodig van tools en talen waarmee zowel experimentalisten als computationele wetenschappers biologische functies kunnen ontwerpen en dan die ontwerpen vertalen naar het machinecodeniveau van de cel, haar biochemie, zodat we vervolgens die structuren kunnen bouwen. Dat lijkt wel een samensteller van levende software en ik ben trots deel te zijn van een team van Microsoft dat werkt aan de ontwikkeling daarvan. Dat het een grote uitdaging is, is wel een understatement, maar eens gerealiseerd, zou het de laatste brug zijn tussen software en wetware.

More broadly, though, programming biology is only going to be possible if we can transform the field into being truly interdisciplinary. It needs us to bridge the physical and the life sciences, and scientists from each of these disciplines need to be able to work together with common languages and to have shared scientific questions.

Biologieprogrammering zal echter alleen maar mogelijk worden als we het gebied echt interdisciplinair kunnen maken. De wetenschappen van de fysica en het leven moeten we verbinden en wetenschappers uit elk van deze disciplines moeten kunnen samenwerken met gemeenschappelijke talen en gedeelde wetenschappelijke vragen.

In the long term, it's worth remembering that many of the giant software companies and the technology that you and I work with every day could hardly have been imagined at the time we first started programming on silicon microchips. And if we start now to think about the potential for technology enabled by computational biology, we'll see some of the steps that we need to take along the way to make that a reality. Now, there is the sobering thought that this kind of technology could be open to misuse. If we're willing to talk about the potential for programming immune cells, we should also be thinking about the potential of bacteria engineered to evade them. There might be people willing to do that. Now, one reassuring thought in this is that -- well, less so for the scientists -- is that biology is a fragile thing to work with. So programming biology is not going to be something you'll be doing in your garden shed. But because we're at the outset of this, we can move forward with our eyes wide open. We can ask the difficult questions up front, we can put in place the necessary safeguards and, as part of that, we'll have to think about our ethics. We'll have to think about putting bounds on the implementation of biological function. So as part of this, research in bioethics will have to be a priority. It can't be relegated to second place in the excitement of scientific innovation.

Uiteindelijk moeten we beseffen dat veel van de grote softwarebedrijven en de technologie waarmee wij dagelijks werken, nauwelijks voorstelbaar was toen we voor het eerst programmeerden op silicium microchips. Nu we gaan nadenken over de technologische mogelijkheden van de computationele biologie, zien we een aantal van de stappen die we moeten nemen om dat te realiseren. Nu is er de ontnuchterende gedachte dat van dit soort technologie misbruik gemaakt kan worden. Als we willen praten over de mogelijkheden van immuuncellen programmeren, moeten we ook bedenken dat we bacteriën kunnen ontwerpen om ze te ontwijken. Er kunnen mensen zijn die dat zouden willen doen. Een geruststellende gedachte is dat -- nou ja, minder voor de wetenschappers -- is dat biologie een fragiel ding is om mee te werken. Biologie programmeren is niet iets dat je in je tuinhuisje gaat doen. Omdat we aan het begin hiervan staan, kunnen we verder gaan met onze ogen wijd open. We kunnen vooraf de moeilijke vragen stellen, de nodige waarborgen instellen en gecombineerd hiermee nadenken over onze ethiek. We moeten nadenken over het trekken van grenzen voor het implementeren van biologische functies. Als onderdeel hiervan zal onderzoek in de bio-ethiek prioritair moeten zijn. Ze mag niet op de tweede plaats komen in de opwinding over de wetenschappelijke innovatie.

But the ultimate prize, the ultimate destination on this journey, would be breakthrough applications and breakthrough industries in areas from agriculture and medicine to energy and materials and even computing itself. Imagine, one day we could be powering the planet sustainably on the ultimate green energy if we could mimic something that plants figured out millennia ago: how to harness the sun's energy with an efficiency that is unparalleled by our current solar cells. If we understood that program of quantum interactions that allow plants to absorb sunlight so efficiently, we might be able to translate that into building synthetic DNA circuits that offer the material for better solar cells. There are teams and scientists working on the fundamentals of this right now, so perhaps if it got the right attention and the right investment, it could be realized in 10 or 15 years.

Maar de ultieme prijs, de ultieme bestemming op deze reis, zouden baanbrekende toepassingen en baanbrekende industrieën zijn op het gebied van landbouw, geneeskunde, energie en materialen, en zelfs van het computeren. Stel dat we ooit de planeet duurzaam konden voorzien van ultieme groene energie als we iets zouden kunnen nabootsen wat planten millennia geleden al uitvonden: hoe zonne-energie te benutten met een rendement dat buiten het bereik ligt van onze huidige zonnecellen. Als we het programma begrepen van de kwantuminteracties die planten het zonlicht zo efficiënt laten absorberen, konden we dat vertalen in de bouw van synthetische DNA-circuits die een materiaal vormen voor betere zonnecellen. Er zijn nu teams en wetenschappers die aan de fundamenten hiervan werken. Met de juiste aandacht en de juiste investeringen, zou het in 10 of 15 jaar kunnen worden gerealiseerd.

So we are at the beginning of a technological revolution. Understanding this ancient type of biological computation is the critical first step. And if we can realize this, we would enter the era of an operating system that runs living software.

We staan dus aan het begin van een technologische revolutie. Inzicht in deze oude soort van biologische berekening is de cruciale eerste stap. Als we dit kunnen realiseren, zouden we het tijdperk betreden van een besturingssysteem dat levende software draait.

Thank you very much.

Veel dank.

(Applause)

(Applaus)