Deb Roy: The birth of a word

Imagine if you could record your life -- everything you said, everything you did, available in a perfect memory store at your fingertips, so you could go back and find memorable moments and relive them, or sift through traces of time and discover patterns in your own life that previously had gone undiscovered. Well, that's exactly the journey that my family began five and a half years ago. This is my wife and collaborator, Rupal. And on this day, at this moment, we walked into the house with our first child, our beautiful baby boy. And we walked into a house with a very special home video recording system. (Video) Man: Okay. Deb Roy: This moment and thousands of other moments special for us were captured in our home because in every room in the house, if you looked up, you'd see a camera and a microphone, and if you looked down, you'd get this bird's-eye view of the room. Here's our living room, the baby bedroom, kitchen, dining room and the rest of the house. And all of these fed into a disc array that was designed for continuous capture. So here we are flying through a day in our home as we move from sunlit morning through incandescent evening and, finally, lights out for the day. Over the course of three years, we recorded eight to 10 hours a day, amassing roughly a quarter-million hours of multi-track audio and video. So you're looking at a piece of what is by far the largest home video collection ever made. (Laughter) And for our family, at a personal level, the impact of this data has already been immense, and we're still learning its value. Countless unsolicited, natural, unposed moments are captured there, and we're starting to learn how to discover them and find them. But there's also a scientific reason that drove this project, which was to use this natural longitudinal data to understand the process of how a child learns language -- that child being my son.
And so with many privacy provisions put in place to protect everyone who was recorded in the data, we made elements of the data available to my trusted research team at MIT so we could start teasing apart patterns in this massive data set, trying to understand the influence of social environments on language acquisition. So we're looking here at one of the first things we started to do. This is my wife and me cooking breakfast in the kitchen, and as we move through space and through time, you see a very everyday pattern of life in the kitchen. In order to convert these 90,000 opaque hours of video into something that we could start to see, we use motion analysis to pull out, as we move through space and through time, what we call space-time worms. And this has become part of our toolkit for being able to look and see where the activities are in the data, and with it, trace the pattern of, in particular, where my son moved throughout the home, so that we could focus our transcription efforts on all of the speech environment around my son -- all of the words that he heard from myself, my wife, our nanny, and over time, the words he began to produce. So with that technology and that data and the ability to transcribe speech with machine assistance, we've now transcribed well over seven million words of our home transcripts. And with that, let me take you now for a first tour into the data. So you've all, I'm sure, seen time-lapse videos where a flower will blossom as you accelerate time. I'd like you to now experience the blossoming of a speech form. My son, soon after his first birthday, would say "gaga" to mean water. And over the course of the next half-year, he slowly learned to approximate the proper adult form, "water." So we're going to cruise through half a year in about 40 seconds. No video here, so you can focus on the sound, the acoustics, of a new kind of trajectory: gaga to water.
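The talk does not say how the space-time worms were extracted. As a rough illustration only, here is a minimal Python sketch assuming simple frame differencing: the centroid of the pixels that changed between two consecutive frames becomes one `(t, x, y)` point, and stacking those points over time gives a worm. The `Frame` type, the function names and the `threshold` value are hypothetical, not the project's code.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a grayscale frame as rows of 0-255 intensities.
Frame = List[List[int]]


def motion_centroid(prev: Frame, curr: Frame, threshold: int = 30) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of pixels that changed by more than `threshold`, or None."""
    xs: List[int] = []
    ys: List[int] = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if abs(b - a) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no motion detected between these two frames
    return sum(xs) / len(xs), sum(ys) / len(ys)


def space_time_worm(frames: List[Frame], threshold: int = 30) -> List[Tuple[int, float, float]]:
    """Stack per-frame motion centroids into a (t, x, y) trajectory.

    Assumes a single moving person per camera; a real tracker would have
    to separate several people (e.g. the red and green worms in the talk).
    """
    worm: List[Tuple[int, float, float]] = []
    for t in range(1, len(frames)):
        centroid = motion_centroid(frames[t - 1], frames[t], threshold)
        if centroid is not None:
            worm.append((t, centroid[0], centroid[1]))
    return worm
```

Frame differencing is chosen here only because it is the simplest motion-analysis technique that yields a trajectory; a worm restricted to one room's camera is the kind of signal that could tell transcribers where to look.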
(Audio) Baby: Gagagagagaga Gaga gaga gaga guga guga guga wada gaga gaga guga gaga wader guga guga water water water water water water water water water. DR: He sure nailed it, didn't he? (Applause) So he didn't just learn water. Over the course of the 24 months, the first two years that we really focused on, this is a map of every word he learned in chronological order. And because we have full transcripts, we've identified each of the 503 words that he learned to produce by his second birthday. He was an early talker. And so we started to analyze why. Why were certain words born before others? This is one of the first results that came out of our study a little over a year ago that really surprised us. The way to interpret this apparently simple graph is: on the vertical axis is an indication of how complex caregiver utterances are, based on the length of utterances. And the [horizontal] axis is time. And all of the data, we aligned based on the following idea: Every time my son would learn a word, we would trace back and look at all of the language he heard that contained that word. And we would plot the relative length of the utterances. And what we found was this curious phenomenon, that caregiver speech would systematically dip to a minimum, making language as simple as possible, and then slowly ascend back up in complexity. And the amazing thing was that bounce, that dip, lined up almost precisely with when each word was born -- word after word, systematically. So it appears that all three primary caregivers -- myself, my wife and our nanny -- were systematically and, I would think, subconsciously restructuring our language to meet him at the birth of a word and bring him gently into more complex language. And the implications of this -- there are many, but one I just want to point out -- is that there must be amazing feedback loops. Of course, my son is learning from his linguistic environment, but the environment is learning from him.
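One possible reading of the word-birth alignment described above, sketched in Python under stated assumptions: transcripts are hypothetical `(time_in_days, text)` pairs of caregiver speech, utterance length is counted in whitespace tokens, and each length is divided by that word's mean length to make it "relative." The binning, the pooling across words and all names are illustrative choices, not the study's method.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical transcript format: (time in days, caregiver utterance text).
Utterance = Tuple[float, str]


def aligned_relative_length(
    utterances: List[Utterance],
    word_births: Dict[str, float],
    bin_days: float = 30.0,
) -> Dict[int, float]:
    """Mean relative utterance length per time bin, aligned to each word's birth.

    Bin 0 covers [birth, birth + bin_days); negative bins precede the birth.
    Each utterance's length (in whitespace tokens) is divided by the mean
    length of all utterances containing that word, then pooled across words.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for word, birth in word_births.items():
        # Every caregiver utterance that contains the word (naive tokenization).
        hits = [(t, len(text.split())) for t, text in utterances
                if word in text.lower().split()]
        if not hits:
            continue
        mean_len = sum(n for _, n in hits) / len(hits)
        for t, n in hits:
            b = int((t - birth) // bin_days)  # floor keeps pre-birth bins negative
            sums[b] += n / mean_len
            counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}
```

Under this sketch, the dip the talk describes would show up as the smallest values in the bins just before bin 0, with values rising again afterwards.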
That environment, people, are in these tight feedback loops and creating a kind of scaffolding that has not been noticed until now. But that's looking at the speech context. What about the visual context? We're not looking at -- think of this as a dollhouse cutaway of our house. We've taken those circular fish-eye lens cameras, and we've done some optical correction, and then we can bring it into three-dimensional life. So welcome to my home. This is a moment, one moment captured across multiple cameras. The reason we did this is to create the ultimate memory machine, where you can go back and interactively fly around and then breathe video-life into this system. What I'm going to do is give you an accelerated view of 30 minutes, again, of just life in the living room. That's me and my son on the floor. And there are video analytics tracking our movements. My son is leaving red ink. I am leaving green ink. We're now on the couch, looking out through the window at cars passing by. And finally, my son playing in a walking toy by himself. Now we freeze the action, 30 minutes, we turn time into the vertical axis, and we open up for a view of these interaction traces we've just left behind. And we see these amazing structures -- these little knots of two colors of thread we call "social hot spots." The spiral thread we call a "solo hot spot." And we think that these affect the way language is learned. What we'd like to do is start understanding the interaction between these patterns and the language that my son is exposed to, to see if we can predict how the structure of when words are heard affects when they're learned -- so in other words, the relationship between words and what they're about in the world. So here's how we're approaching this. In this video, again, my son is being traced out. He's leaving red ink behind. And there's our nanny by the door. (Video) Nanny: You want water? (Baby: Aaaa.) Nanny: All right. (Baby: Aaaa.)
DR: She offers water, and off go the two worms over to the kitchen to get water. And what we've done is use the word "water" to tag that moment, that bit of activity. And now we take the power of data and take every time my son ever heard the word water and the context he saw it in, and we use it to penetrate through the video and find every activity trace that co-occurred with an instance of water. And what this data leaves in its wake is a landscape. We call these wordscapes. This is the wordscape for the word water, and you can see most of the action is in the kitchen. That's where those big peaks are over to the left. And just for contrast, we can do this with any word. We can take the word "bye" as in "goodbye." And we're now zoomed in over the entrance to the house. And we look, and we find, as you would expect, a contrast in the landscape where the word "bye" occurs in a much more structured way. So we're using these structures to start predicting the order of language acquisition, and that's ongoing work now. In my lab, which we're peering into now, at MIT -- this is at the Media Lab. This has become my favorite way of videographing just about any space. Three of the key people in this project, Philip DeCamp, Rony Kubat and Brandon Roy, are pictured here. Philip has been a close collaborator on all the visualizations you're seeing. And Michael Fleischman was another Ph.D. student in my lab who worked with me on this home video analysis, and he made the following observation: that "just the way that we're analyzing how language connects to events which provide common ground for language, that same idea we can take out of your home, Deb, and we can apply it to the world of public media." And so our effort took an unexpected turn. Think of mass media as providing common ground and you have the recipe for taking this idea to a whole new place.
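The wordscape idea above can be approximated as a spatial histogram: for every time the word was heard, count the activity-trace samples that fall within a time window, bucketed by floor-grid cell. The sketch below assumes hypothetical inputs (word times in seconds, trace samples as `(t, x, y)` in metres), an arbitrary 5-second window and 1-metre cells; the talk does not specify how co-occurrence was defined.

```python
from collections import Counter
from typing import Iterable, Tuple


def wordscape(
    word_times: Iterable[float],
    traces: Iterable[Tuple[float, float, float]],
    window: float = 5.0,
    cell: float = 1.0,
) -> Counter:
    """Count trace samples within `window` seconds of each hearing, per floor cell.

    Hypothetical inputs: `word_times` are moments the word was heard, and
    `traces` are (t, x, y) activity samples on the floor plan.
    """
    samples = list(traces)  # allow repeated passes over the traces
    grid: Counter = Counter()
    for tw in word_times:
        for t, x, y in samples:
            if abs(t - tw) <= window:
                # Bucket the position into a grid cell; tall counts become "peaks".
                grid[(int(x // cell), int(y // cell))] += 1
    return grid
```

Calling `grid.most_common(3)` on the result would list the cells forming the tallest peaks, the analogue of the kitchen peaks for "water" or the entrance for "bye."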
We've started analyzing television content using the same principles -- analyzing event structure of a TV signal -- episodes of shows, commercials, all of the components that make up the event structure. And we're now, with satellite dishes, pulling and analyzing a good part of all the TV being watched in the United States. And now you don't have to go and instrument living rooms with microphones to get people's conversations; you just tune into publicly available social media feeds. So we're pulling in about three billion comments a month, and then the magic happens. You have the event structure, the common ground that the words are about, coming out of the television feeds; you've got the conversations that are about those topics; and through semantic analysis -- and this is actually real data you're looking at from our data processing -- each yellow line is showing a link being made between a comment in the wild and a piece of event structure coming out of the television signal. And the same idea now can be built up. And we get this wordscape, except now words are not assembled in my living room. Instead, the context, the common ground activities, are the content on television that's driving the conversations. And what we're seeing here, these skyscrapers now, are comments that are linked to content on television. Same concept, but looking at communication dynamics in a very different sphere. And so fundamentally, rather than, for example, measuring content based on how many people are watching, this gives us the basic data for looking at engagement properties of content. And just like we can look at feedback cycles and dynamics in a family, we can now open up the same concepts and look at much larger groups of people. This is a subset of data from our database -- just 50,000 out of several million -- and the social graph that connects them through publicly available sources. And if you put them on one plane, a second plane is where the content lives.
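The talk attributes the links between comments and broadcast events to "semantic analysis" without describing it. As a deliberately simple stand-in, this Python sketch links a comment to the TV segment that aired during or shortly before it and shares the most keywords with it, scored by Jaccard overlap. The record layouts, `lag` and `min_score` are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical records: a TV segment is (segment_id, start, end, keywords);
# times are seconds on a clock shared with comment timestamps.
Segment = Tuple[str, float, float, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_comment(
    timestamp: float,
    text: str,
    segments: List[Segment],
    lag: float = 600.0,
    min_score: float = 0.1,
) -> Optional[str]:
    """Return the id of the best-matching segment, or None if nothing scores >= min_score."""
    words = set(text.lower().split())
    best_id, best_score = None, min_score
    for seg_id, start, end, keywords in segments:
        # Only consider segments airing now or that ended at most `lag` seconds ago.
        if start <= timestamp <= end + lag:
            score = jaccard(words, keywords)
            if score >= best_score:
                best_id, best_score = seg_id, score
    return best_id
```

Keyword overlap is far cruder than real semantic analysis; it is used here only to make the shape of a "comment in the wild to piece of event structure" link concrete.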
So we have the programs and the sporting events and the commercials, and all of the link structures that tie them together make a content graph. And then the important third dimension. Each of the links that you're seeing rendered here is an actual connection made between something someone said and a piece of content. And there are, again, now tens of millions of these links that give us the connective tissue of social graphs and how they relate to content. And we can now start to probe the structure in interesting ways. So if we, for example, trace the path of one piece of content that drives someone to comment on it, and then we follow where that comment goes, and then look at the entire social graph that becomes activated and then trace back to see the relationship between that social graph and content, a very interesting structure becomes visible. We call this a co-viewing clique, a virtual living room if you will. And there are fascinating dynamics at play. It's not one way. A piece of content, an event, causes someone to talk. They talk to other people. That drives tune-in behavior back into mass media, and you have these cycles that drive the overall behavior. Another example -- very different -- another actual person in our database -- and we're finding at least hundreds, if not thousands, of these. We've given this person a name. This is a pro-amateur, or pro-am media critic who has this high fan-out rate. So a lot of people are following this person -- very influential -- and they have a propensity to talk about what's on TV. So this person is a key link in connecting mass media and social media together. One last example from this data: Sometimes it's actually a piece of content that is special. So if we go and look at this piece of content, President Obama's State of the Union address from just a few weeks ago, and look at what we find in this same data set, at the same scale, the engagement properties of this piece of content are truly remarkable. 
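A co-viewing group can be sketched as follows, assuming it means users who commented on the same piece of content and are linked to each other in the social graph. The sketch uses connected components of that induced subgraph as a loose stand-in; a strict clique would require every pair to be linked. Function names and inputs are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def co_viewing_groups(commenters: Set[str], follows: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Connected groups among users who commented on the same piece of content.

    `follows` holds social-graph edges; edges touching non-commenters are
    ignored, so each group is a component of the induced subgraph.
    """
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in follows:
        if a in commenters and b in commenters:
            adj[a].add(b)
            adj[b].add(a)  # treat follow links as undirected for grouping
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in sorted(commenters):  # sorted for deterministic output
        if start in seen:
            continue
        seen.add(start)
        group = {start}
        queue = deque([start])
        while queue:  # breadth-first search over the induced subgraph
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    group.add(v)
                    queue.append(v)
        groups.append(group)
    return groups
```

Large groups that light up around the same broadcast would be candidates for the "virtual living room" the talk describes.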
A nation exploding in conversation in real time in response to what's on the broadcast. And of course, through all of these lines flows unstructured language. We can X-ray it and get a real-time pulse of a nation, a real-time sense of the social reactions in the different circuits in the social graph being activated by content. So, to summarize, the idea is this: As our world becomes increasingly instrumented and we have the capabilities to collect and connect the dots between what people are saying and the context they're saying it in, what's emerging is an ability to see new social structures and dynamics that have previously not been seen. It's like building a microscope or telescope and revealing new structures about our own behavior around communication. And I think the implications here are profound, whether it's for science, for commerce, for government, or perhaps most of all, for us as individuals. And so just to return to my son, when I was preparing this talk, he was looking over my shoulder, and I showed him the clips I was going to show to you today, and I asked him for permission -- granted. And then I went on to reflect, "Isn't it amazing, this entire database, all these recordings, I'm going to hand off to you and to your sister" -- who arrived two years later -- "and you guys are going to be able to go back and re-experience moments that you could never, with your biological memory, possibly remember the way you can now?" And he was quiet for a moment. And I thought, "What am I thinking? He's five years old. He's not going to understand this." And just as I was having that thought, he looked up at me and said, "So that when I grow up, I can show this to my kids?" And I thought, "Wow, this is powerful stuff." So I want to leave you with one last memorable moment from our family. This is the first time our son took more than two steps at once -- captured on film. And I really want you to focus on something as I take you through.
It's a cluttered environment; it's natural life. My mother's in the kitchen, cooking, and, of all places, in the hallway, I realize he's about to do it, about to take more than two steps. And so you hear me encouraging him, realizing what's happening, and then the magic happens. Listen very carefully. About three steps in, he realizes something magic is happening, and the most amazing feedback loop of all kicks in, and he takes a breath in, and he whispers "wow" and instinctively I echo back the same. And so let's fly back in time to that memorable moment. (Video) DR: Hey. Come here. Can you do it? Oh, boy. Can you do it? Baby: Wow. DR: Wow. Ma, he's walking. (Laughter) (Applause) DR: Thank you. (Applause)

Related talks

Patricia Kuhl: The linguistic genius of babies

Steven Pinker: Human nature and the blank slate

Jean-Baptiste Michel + Erez Lieberman Aiden: What we learned from 5 million books

Eric Berlow and Sean Gourley: Mapping ideas worth spreading

David McCandless: The beauty of data visualization

Adam Ostrow: After your final status update
