People use the internet for various reasons. It turns out that one of the most popular categories of website is something that people typically consume in private. It involves curiosity, non-insignificant levels of self-indulgence and is centered around recording the reproductive activities of other people.
(Laughter) Of course, I'm talking about genealogy --
(Laughter) the study of family history.
When it comes to detailing family history, in every family, we have this person that is obsessed with genealogy. Let's call him Uncle Bernie. Uncle Bernie is exactly the last person you want to sit next to at Thanksgiving dinner, because he will bore you to death with peculiar details about some ancient relatives. But as you know, there is a scientific side to everything, and we found that Uncle Bernie's stories have immense potential for biomedical research.
We let Uncle Bernie and his fellow genealogists document their family trees through a genealogy website called geni.com. When users upload their trees to the website, it scans their relatives, and if it finds matches to existing trees, it merges the existing and the new tree together. The result is that large family trees are created, beyond the individual level of each genealogist. Now, by repeating this process with millions of people all over the world, we can crowdsource the construction of a family tree of all humankind.
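The merging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the talk does not say how the website matches relatives, so the `person_key` rule (normalized name plus birth year) and the tree representation (a dict from child to a set of parents) are hypothetical.

```python
def person_key(name, birth_year):
    """Hypothetical matching rule: same normalized name and birth year."""
    return (name.strip().lower(), birth_year)


def merge_trees(*trees):
    """Merge trees given as {child_key: set_of_parent_keys}.

    People with the same key are treated as one person, so their
    parent links from different uploads are combined.
    """
    merged = {}
    for tree in trees:
        for child, parents in tree.items():
            merged.setdefault(child, set()).update(parents)
            for parent in parents:
                merged.setdefault(parent, set())
    return merged


def ancestors(tree, person):
    """Collect every ancestor reachable from `person`."""
    found = set()
    stack = [person]
    while stack:
        for parent in tree.get(stack.pop(), set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Two uploads that overlap on one person ("Bob", born 1870).
alice = person_key("Alice", 1900)
bob = person_key("Bob", 1870)
carol = person_key("Carol", 1840)

tree_a = {alice: {bob}}
tree_b = {person_key(" bob ", 1870): {carol}}

merged = merge_trees(tree_a, tree_b)
alice_ancestors = ancestors(merged, alice)
```

After the merge, Alice's ancestors include Carol, who appears only in the second upload. That is the sense in which the combined tree goes beyond what any single genealogist documented.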
Using this website, we were able to connect 125 million people into a single family tree. I cannot draw the tree on the screens over here because they have fewer pixels than the number of people in this tree. But here is an example of a subset of 6,000 individuals. Each green node is a person. The red nodes represent marriages, and the connections represent parenthood. In the middle of this tree, you see the ancestors. And as we go to the periphery, you see the descendants. This tree has seven generations, approximately.
Now, this is what happens when we increase the number of individuals to 70,000 people -- still a tiny subset of all the data that we have. Despite that, you can already see the formation of gigantic family trees with many very distant relatives.
Thanks to the hard work of our genealogists, we can go back in time hundreds of years. For example, here is Alexander Hamilton, who was born in 1755. Alexander was the first US Secretary of the Treasury, but he is mostly known today due to a popular Broadway musical. We found that Alexander has deeper connections in the showbiz industry. In fact, he's a blood relative of ... Kevin Bacon!
(Laughter)
Both of them are descendants of a lady from Scotland who lived in the 13th century. So you can say that Alexander Hamilton is 35 degrees of Kevin Bacon genealogy.
(Laughter)
And our tree has millions of stories like that.
We invested significant effort to validate the quality of our data. Using DNA, we found that 0.3 percent of the mother-child connections in our data are wrong, which could match the adoption rate in the US pre-Second World War.
For the father's side, the news is not as good: 1.9 percent of the father-child connections in our data are wrong. And I see some people smirk over here. It is what you think -- there are many milkmen out there. (Laughter) However, this 1.9 percent error rate in patrilineal connections is not unique to our data. Previous studies found a similar error rate using clinical-grade pedigrees. So the quality of our data is good, and that should not be a surprise. Our genealogists have a profound, vested interest in correctly documenting their family history.
We can leverage this data to learn quantitative information about humanity, for example, questions about demography. Here is a look at all our profiles on the map of the world. Each pixel is a person that lived at some point. And since we have so much data, you can see the contours of many countries, especially in the Western world. In this clip, we stratified the map that I've shown you based on the year of birth of individuals from 1400 to 1900, and we compared it to known migration events. The clip is going to show you that the deepest lineages in our data go all the way back to the UK, where they had better record keeping, and then they spread along the routes of Western colonialism. Let's watch this. (Music) [Year of birth: ] [1492 - Columbus sails the ocean blue] [1620 - Mayflower lands in Massachusetts] [1652 - Dutch settle in South Africa] [1788 - Great Britain penal transportation to Australia starts] [1836 - First migrants use Oregon Trail] [all activity]
I love this movie.
Now, since these migration events are giving the context of families, we can ask questions such as: What is the typical distance between the birth locations of husbands and wives? This distance plays a pivotal role in demography, because the patterns in which people migrate to form families determine how genes spread in geographical areas. We analyzed this distance using our data, and we found that in the old days, people had it easy. They just married someone in the village nearby. But the Industrial Revolution really complicated our love life. And today, with affordable flights and online social media, people typically migrate more than 100 kilometers from their place of birth to find their soul mate.
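As a rough illustration of the measurement described above, the following Python sketch computes the distance between the birthplaces of spouses and takes the median over couples. The talk does not say which distance measure was used, so the great-circle (haversine) formula, the function names, and the sample coordinates here are assumptions made for the example.

```python
from math import asin, cos, radians, sin, sqrt
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def median_spouse_distance_km(couples):
    """Median birthplace distance over couples.

    Each couple is ((lat, lon), (lat, lon)) for the two spouses.
    """
    return median(haversine_km(*a, *b) for a, b in couples)


# Made-up sample: one couple born in the same place, one born in
# London and Paris, one born in neighboring villages.
couples = [
    ((51.5074, -0.1278), (51.5074, -0.1278)),
    ((51.5074, -0.1278), (48.8566, 2.3522)),
    ((52.2000, 0.1200), (52.2500, 0.1500)),
]
london_paris = haversine_km(51.5074, -0.1278, 48.8566, 2.3522)
typical = median_spouse_distance_km(couples)
```

Grouping couples by year of birth before taking the median would give the kind of trend over time that the talk describes.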
So now you might ask: OK, but who does the hard work of migrating from place to place to form families? Are these the males or the females? We used our data to address this question, and at least in the last 300 years, we found that the ladies do the hard work of migrating from place to place to form families. Now, these results are statistically significant, so you can take it as scientific fact that males are lazy.
(Laughter)
We can move from questions about demography and ask questions about human health. For example, we can ask to what extent genetic variations account for differences in life span between individuals. Previous studies analyzed the correlation of longevity between twins to address this question. They estimated that genetic variations account for about a quarter of the differences in life span between individuals. But twins can be correlated for so many reasons, including various environmental effects or a shared household. Large family trees give us the opportunity to analyze everything from close relatives, such as twins, all the way to distant relatives, even fourth cousins. This way we can build robust models that can tease apart the contribution of genetic variations from environmental factors. We conducted this analysis using our data, and we found that genetic variations explain only 15 percent of the differences in life span between individuals. That is five years, on average. So genes matter less to life span than we thought before. And I find it great news, because it means that our actions can matter more. Smoking, for example, determines 10 years of our life expectancy -- twice as much as what genetics determines.
We can find even more surprising results as we move beyond family trees and let our genealogists document and crowdsource DNA information. And the results can be amazing. It might be hard to imagine, but Uncle Bernie and his friends can create DNA forensic capabilities that even exceed what the FBI currently has. When you place the DNA on a large family tree, you effectively create a beacon that illuminates the hundreds of distant relatives that are all connected to the person that originated the DNA. By placing multiple beacons on a large family tree, you can now triangulate the DNA of an unknown person, the same way that GPS uses multiple satellites to find a location.
The prime example of the power of this technique is capturing the Golden State Killer, one of the most notorious criminals in the history of the US. The FBI had been searching for this person for over 40 years. They had his DNA, but he never showed up in any police database. About a year ago, the FBI consulted a genetic genealogist, and she suggested that they submit his DNA to a genealogy service that can locate distant relatives. They did that, and they found a third cousin of the Golden State Killer. They built a large family tree and scanned the different branches of that tree until they found a profile that exactly matched what they knew about the Golden State Killer. They obtained DNA from this person and found a perfect match to the DNA they had in hand. They arrested him and brought him to justice after all these years. Since then, genetic genealogists have started working with local US law enforcement agencies to use this technique in order to capture criminals. And in the past six months alone, they were able to solve over 20 cold cases with this technique.
Luckily, we have people like Uncle Bernie and his fellow genealogists. These are not amateurs with a self-serving hobby. These are citizen scientists with a deep passion to tell us who we are. And they know that the past can hold a key to the future.
Thank you very much. (Applause)