TEDMED: In your TEDMED 2018 talk, you describe “Uncle Bernie,” the family genealogist who corners family members to get more information. Are you the genealogist in your family? When did your interest in genealogy begin?
Yaniv Erlich: [laughing] I liked genealogy quite a lot, especially as a child. Like many Israeli teenagers, I conducted my own genealogy project while I was in seventh grade. It was so enjoyable that I asked my mother to take me to the Museum of the Jewish People at Beit Hatfutsot; it had one of the only sources for genealogical information available at the time. I loved how history intersects with family stories, and the process of finding ancestors felt like detective work. I did such a good job on this project that it won the title of best genealogy project of the year at my middle school. Now, since genealogy is my work, it is no longer a hobby of mine and the family genealogist is my aunt.
The last time I worked on my family's genealogy was after my father passed away two years ago. In some ways, I felt that tracing my ancestors connected me to my father and his childhood, and reviewing the life cycles of my relatives gave me some serenity and comfort that the sorrow I was experiencing was simply part of the endless river of generations.
TM: What was the catalyst for you to begin professional research on genetics and family trees?
YE: I was invited to join a commercial genealogy and social networking website by my third cousin, who was able to trace me and send an invitation email. At that time, I was about to finish my PhD studies and was becoming more interested in human genetics. When I started documenting my family tree on the website, I was shocked to discover that many of my relatives already existed there! This got me thinking: family trees are one of the most valuable assets in human genetics, yet large family trees are very hard to collect.
A few months later, I started my own independent research group at the Whitehead Institute of MIT. I decided to try to collect all the data from that website as one of the first projects of the lab, so I sent a cold email to the CTO of the website at that time, Amos Elliston. He immediately agreed and instructed me on how to collect the data. Eventually we downloaded 86 million public profiles from the website.
But over time it became a very long project. We actually spent 8 years from inception to publication.
TM: Did you have any hurdles during the project?
YE: First, we had to substantially enhance and validate the dataset. The central question was whether we can trust datasets that were produced by amateur genealogists the same way that we trust family trees built by scientists. So we subjected the data to a massive number of tests that measured the error rates of the family trees, whether the individuals in these datasets represented the general population at the time, and the accuracy of the demographic details entered by the genealogists. Second, we had to find the correct questions. In some ways, this dataset was a blessing and a curse because so many things can be evaluated using such datasets, and we had to think carefully about the focus of our study. Finally, we had to develop the computational infrastructure to answer those questions. Most algorithms in genetics were developed to work with family trees of up to several thousand individuals. We had to develop and improve these algorithms to work on a scale of tens of millions of people.
TM: A lot of your research focuses on the role of genetics in longevity. What was the main thing you wanted to understand about longevity when you began your research?
YE: Longevity is probably the most important trait because the question “When am I going to die?” is pressing to us as individuals and as a society. Surprisingly, not a lot is known about the genetics of longevity. Some studies in the past suggested that 25% of the variance in longevity can be attributed to genetic differences. However, no study has ever pinpointed these differences!
In addition, there is a long-lasting debate in human genetics regarding the manner in which genetic variations affect traits. One camp argues that each genetic variant contributes independently to a trait regardless of the status of other variants. Another camp claims that the contribution of each variant is a complex function that is affected by the status of other variants. It is possible to find which camp is right by inspecting the correlation of the trait in various types of relatives, from, say, fourth cousins to full siblings. However, until our study, nobody was able to collect large family trees with enough relatives to robustly differentiate between the two camps.
Using our data, we inspected the longevity readout of millions of pairs of relatives. Our analysis shows that longevity is much less heritable than previously thought: only ~15% of the variance in the population can be attributed to genetic differences. Moreover, we showed that, at least in the case of longevity, the first camp is the correct one. The contribution of each genetic variant is independent of the other variants. This is actually great news for precision medicine, because if each variant works independently, it should be easier to find those longevity variants in the future.
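To make the relatives-based reasoning concrete, here is a minimal sketch of what the first (independent-variants) camp would predict. It is an illustration under simplifying assumptions, not the method used in the study: it assumes purely additive genetic effects and no shared environment, so the expected trait correlation between two relatives is their genetic relatedness multiplied by the heritability. The function names are hypothetical, and the heritability value of 0.15 is the rough figure quoted above.

```python
from fractions import Fraction

def relatedness(cousin_degree: int) -> Fraction:
    """Expected fraction of DNA shared by two relatives.

    cousin_degree 0 means full siblings, 1 means first cousins, and so on.
    Each extra cousin degree adds two meioses, so sharing drops by a factor of 4.
    """
    return Fraction(1, 2) ** (2 * cousin_degree + 1)

def additive_correlation(h2: float, cousin_degree: int) -> float:
    """Trait correlation predicted by a purely additive model (assumption:
    no shared environment, no interactions between variants)."""
    return float(relatedness(cousin_degree)) * h2

H2 = 0.15  # illustrative heritability, taken from the ~15% figure in the text

labels = ["full siblings", "first cousins", "second cousins",
          "third cousins", "fourth cousins"]
for degree, label in enumerate(labels):
    r = relatedness(degree)
    print(f"{label:15s} relatedness={r}  predicted correlation={additive_correlation(H2, degree):.6f}")
```

Under this simplified model the predicted correlation falls in direct proportion to relatedness. If variants interacted strongly, as the second camp argues, the correlation would be expected to drop off faster with genealogical distance, which is why many pairs of distant relatives are needed to tell the two models apart.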
TM: In your TEDMED talk, you spoke about the immense potential of biomedical research and the many insights we can gain from genealogy research. What’s the future of genealogy research?
YE: DNA! We currently see an ongoing revolution in the field. DNA tests enable genealogists to find relatives beyond what genealogical records allow, and they also serve as a tool to validate these records. In addition, DNA helps to solve cases where records are missing, such as in the case of adoptees, Holocaust survivors, and even victims of child trafficking. Thanks to the genomics revolution, DNA tests are now highly affordable, opening access to growing segments of the population. A recent Technology Review article estimated that more than 26 million people have taken such tests, and the uptake shows an exponential increase. Some estimate that in a decade most people in Western societies will have access to their DNA information, which means that we may be able to create the world’s family tree based on DNA matches as well, and not just on genealogical information and family stories.