Our ancestry as a braided estuary

Ancient DNA figure
Citation: Ancient human genomes suggest three ancestral populations for present-day Europeans

Note: Purple text included by author of this post
Citation: Statistical Methods for Evolutionary Trees.

At some point you have no doubt encountered trees of the sort you see to the left. They are incredibly useful visualizations of historical relationships between lineages, or breeding populations. The metaphor of the tree of life was co-opted almost immediately by evolutionary science in the 19th century. On the order of tens of millions to billions of years the idea of diverging and bifurcating lineages is accurate to a great extent in terms of depicting the dynamics of natural history. But even on this scale the tree masks facts which are not of trivial importance. Horizontal gene transfer means that even very sharply delineated branches of the tree of life may share commonalities across wide regions of the genome. The more recent the last common ancestor of two putative lineages, the muddier the image reflected through the lens of the tree becomes. And yet the tree as a visual metaphor persists when comparing populations which are rather close genetically in an evolutionary sense, because of its plain utility. Trees are thick in L. L. Cavalli-Sforza’s History and Geography of Human Genes, which paints the broad and rich landscape of the populations of our own species, which diverged only over the past few tens of thousands of years.

This is not to ignore the self-evident fact that tips of the branches can eventually converge. Geneticists have long acknowledged, and leveraged, recent admixture between populations long separated by time and space. No one denies that African Americans coalesced out of the relations of black slaves and white settlers. Or that the population genetic landscape of Latin America cannot be understood without taking into account the varied quanta of African, European, and Amerindian ancestry which define particular locales. The reality of admixture in these cases was attested to historically, is visible in a straightforward phenotypic sense, and can be detected using a small number of classical markers.

What has changed over the past 10 years, and in particular the past 5 years, has been the analytic fruit born of high density marker sets. By this, I mean that rather than the hundreds of markers which L. L. Cavalli-Sforza and colleagues had access to, modern statistical geneticists can extract patterns out of hundreds of thousands of markers, and often whole genomes. This allows researchers to detect more subtle or distant events which have been erased slowly by the effects of time. To my mind the seminal paper which heralded a paradigm shift was 2009’s Reconstructing Indian History. In this publication the authors concluded that South Asians, ~20% of the world’s population, are themselves a synthetic population, derived from two primary ancestral groups. One group, “Ancestral North Indians” (ANI), has close affinities with West Eurasians (Europeans, Middle Easterners, etc.). Another group, “Ancestral South Indians” (ASI), has distant affinities with East Eurasians. In fact nearly all Indian subcontinental populations (there are exceptions) can be modeled as a two-way admixture, with various proportions of these two ancestral populations (also see Genetic evidence for recent population mixture in India for an update). The big take home was that the admixture had been thorough and deep enough that standard clustering techniques (e.g., PCA) could not allow one to infer that South Asians were a synthetic group, ~2,000 to ~4,000 years post-dating an amalgamation event. One major stumbling block was that no close proxy existed for ASI, which was totally absorbed into what became South Asians. But the authors made use of the fact that Andaman Islanders were sufficient substitutes for the purposes of inferring the dynamics of the admixture (they diverged from ASI ~20,000 to ~30,000 years before the present). Using these methods the same group also came to similar conclusions about Amerindians in Reconstructing Native American population history.
Another research group concluded the same for populations in the Horn of Africa (see Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool). And even stranger results can be found deeper in Africa (see Ancient west Eurasian ancestry in southern and eastern Africa).
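To make the two-way admixture idea concrete, here is a minimal sketch in Python. It assumes the simplest possible model, in which the allele frequency of the admixed population at each marker is a weighted average of the two source frequencies, and it recovers the weight by least squares. This is not the method used in Reconstructing Indian History, which had to work around the lack of a close ASI proxy. The function name, the simulated frequencies, and the 60% mixing fraction are all hypothetical, chosen only for illustration.

```python
import random


def estimate_mixture_fraction(p_target, p_source_a, p_source_b):
    """Least-squares estimate of alpha in p_target = alpha*p_a + (1 - alpha)*p_b.

    Each argument is a list of allele frequencies, one entry per marker.
    Drift after admixture is ignored, which is a strong simplification.
    """
    numerator = 0.0
    denominator = 0.0
    for pt, pa, pb in zip(p_target, p_source_a, p_source_b):
        diff = pa - pb
        numerator += (pt - pb) * diff
        denominator += diff * diff
    if denominator == 0.0:
        raise ValueError("source populations have identical frequencies")
    return numerator / denominator


# Hypothetical data: two sources with unrelated frequencies at 1,000 markers.
rng = random.Random(0)
n_markers = 1000
p_a = [rng.random() for _ in range(n_markers)]
p_b = [rng.random() for _ in range(n_markers)]

true_alpha = 0.6  # assumed mixing fraction, not a value from any paper
p_mix = [true_alpha * a + (1 - true_alpha) * b for a, b in zip(p_a, p_b)]

alpha_hat = estimate_mixture_fraction(p_mix, p_a, p_b)
print(f"estimated fraction from source A: {alpha_hat:.3f}")
```

With real data the sources are themselves imperfect proxies and every population has drifted since the mixture, so an estimator this naive would be biased. It is only meant to show what "modeled as a two-way admixture" means in terms of allele frequencies.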

More recently there has been the finding that an ancient Siberian boy seems to be representative of a population related to West Eurasians which contributed a substantial proportion of the ancestry of the first settlers of the New World. These results were prefigured by intriguing hints in the genome-wide studies as well as uniparental lineages. The power in this case is that internal nodes in the tree of life which were once only inferred from descendants can now be examined directly with ancient DNA. There are limitations of time and locale. DNA degrades exponentially, and even in the best of cases it seems that the edge of preservation will be on the order of 100,000 to 1 million years. Additionally, cold and dry climates are naturally going to be highly enriched for samples, because tropical wet climates are amenable to rapid degradation of biomolecules of any sort.

To me a major implication is that over the next ten years the natural history of Pleistocene metazoans of some size and numbers across the Palearctic shall be illuminated to a much greater degree than we could have imagined. First in line will be humans and dogs, and later this will expand to assorted other lineages, such as bison and elk. And it is the human part of the jigsaw which is at the heart of a recent preprint posted on bioRxiv, Ancient human genomes suggest three ancestral populations for present-day Europeans. Since it is a preprint I won’t repeat much that you can read for yourself. I want to emphasize though that you really should read the supplements if you want more than spare conclusions. As the title states, the authors conclude that overall you require at minimum three ancestral populations over a post “Out-of-Africa” time scale to model the dynamics of the emergence of Europeans. Though there were hints of this utilizing results from extant populations, the presence of ancient DNA truly pushed the ability to draw conclusions over the edge. That is because it seems that few of these ancient populations exist in “pure” form. One of the major shortcomings of drawing conclusions from distributions of populations in the present about the past is that interactions and admixtures were far more thoroughgoing than researchers had imagined.

The figure (modified) at the top of this post lays out the findings. In the preprint the authors arrive at the simplest model which can explain the most data. They acknowledge freely that there are likely modifications and elaborations on the edges and margins, and that the data might be explained by more complex models, but the key outcome is that they have rejected more parsimonious models which were once ascendant in regards to the ethnogenesis of Europeans. Ten years ago (see Seven Daughters of Eve) some researchers were presenting a cartoon model of hunter vs. farmer, as if these were two distinct options for the origins of all Europeans. But it turns out even the more nuanced and realistic models which posit varied degrees of genetic and cultural assimilation and interaction were false. What seems clear from these data, and the ancient DNA, is that a substantial minority fraction of the ancestry of Europeans derives from a third population of northern Eurasian provenance.

Going back to the lack of parsimony, to the left you see a model of diversification outside of Africa that many had in mind until recently. In this framework a small population of northeast Africans left that continent 50 to 100 thousand years ago, and populated the rest of the world. One group moved east, and gave rise to the populations of eastern Eurasia, as well as Australasia and the New World. Another branch moved north and west, and gave rise to Europeans and Middle Easterners. The rest of population history might be modeled then as admixtures and rearrangements of this original diversification. In this scenario South Asians are an admixture of West Eurasians and an extinct branch of East Eurasians, explaining their affinities to both great branches of humanity. The divergent nature of Australians might simply be an artifact of their long term isolation in Oceania, rather early on in the diversification of East Eurasians. This model was already difficult to square with genetic data, but it could be shoehorned. Or at least I thought it could, because I did so myself.

The simplest form of the new model complexifies the topology considerably. Now there is an early branch off of a Eurasian population prior to the diversification of West and East Eurasians, and within the western clade there is a separation between a North Eurasian group, and West Eurasians proper. Putting the focus on Europeans, they may be thought of as a complex admixture between Basal Eurasians, West Eurasians, and North Eurasians. The Basal Eurasian component is mediated by “Early European Farmers,” EEF, who seem to be a hybrid between this group and West Eurasian hunter-gatherers. The North Eurasian component seems to be both ancient and recent. Ancient because some Swedish hunter-gatherers had it (though the Central European one lacked it), and recent because the EEF populations which evidence ancient Near Eastern ancestry lacked it, suggesting that it was not as widely present across western Eurasia as it is now. In fact, it is present in high fractions across many Middle Eastern populations, especially in the Caucasus. Though the authors studiously avoid speculating, it is clearly intriguing to them that the North Eurasian component is so widespread, and that it is likely that it expanded relatively recently. As with the Denisovans, the presence of North Eurasian DNA from the far north may simply be a function of biased preservation.

How the authors inferred the existence of Basal Eurasians is rather convoluted, and outlined in the supplements. In many ways this is the only simple model which fulfills all the conditions of their data. The key finding is that the European hunter-gatherers, both Central and Northern, were closer genetically to all East Eurasians than EEF were, and to an equal degree. This sort of symmetrical relatedness implies that the pattern is not due to admixture, but instead reflects an ancient bifurcation in the phylogenetic tree, though one more recent than the split from the outgroup. The EEF distance from East Eurasians is a function of the earlier divergence of their Basal Eurasian ancestry. The nature of the Basal Eurasians is left somewhat opaque. One can posit many scenarios of ancient population structure in the Near East, or migrations back and forth between this region and Africa. More data, and especially ancient DNA from the Near East, would clarify the model (unfortunately modern Near Eastern populations are highly admixed).
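The symmetry argument can be illustrated with an f4-style statistic, which is the average over markers of the product of two allele frequency differences, (A - B) * (C - D). The sketch below is a toy simulation and not the authors' pipeline. The population labels, the Gaussian approximation to drift, the drift size, and the 40% Basal Eurasian fraction in the farmers are all hypothetical, and the North Eurasian component is left out entirely to keep the topology small.

```python
import random


def f4(pop_a, pop_b, pop_c, pop_d):
    """Mean over markers of (a - b) * (c - d) for four lists of allele frequencies."""
    products = [(a - b) * (c - d) for a, b, c, d in zip(pop_a, pop_b, pop_c, pop_d)]
    return sum(products) / len(products)


def simulate(n_markers=20000, drift_sd=0.02, basal_fraction=0.4, seed=1):
    """Toy allele frequencies on the topology described in the text.

    Drift on each branch is an independent Gaussian nudge (an assumption),
    and every numeric default here is hypothetical.
    """
    rng = random.Random(seed)

    def drift():
        return rng.gauss(0.0, drift_sd)

    pops = {name: [] for name in ("outgroup", "east", "hg_central", "hg_north", "farmers")}
    for _ in range(n_markers):
        root = rng.uniform(0.3, 0.7)
        non_african = root + drift()
        basal = non_african + drift()          # branches off before West and East split
        main_eurasian = non_african + drift()  # shared by West and East Eurasians only
        west = main_eurasian + drift()
        pops["outgroup"].append(root + drift())
        pops["east"].append(main_eurasian + drift())
        pops["hg_central"].append(west + drift())
        pops["hg_north"].append(west + drift())
        # Early European Farmers: a mix of Basal Eurasian and West Eurasian ancestry.
        farmer = basal_fraction * basal + (1 - basal_fraction) * (west + drift())
        pops["farmers"].append(farmer)
    return pops


pops = simulate()
f4_symmetric = f4(pops["hg_central"], pops["hg_north"], pops["east"], pops["outgroup"])
f4_asymmetric = f4(pops["hg_central"], pops["farmers"], pops["east"], pops["outgroup"])
print(f"f4(HG central, HG north; East, outgroup) = {f4_symmetric:.6f}")
print(f"f4(HG central, farmers;  East, outgroup) = {f4_asymmetric:.6f}")
```

In this toy the first statistic hovers near zero, because the two hunter-gatherer groups differ only by drift that neither shares with East Eurasians. The second comes out positive, because the Basal Eurasian part of the farmers' ancestry never passed through the branch that hunter-gatherers and East Eurasians share.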

Citation:  Mallick et al.

Though I have focused on phylogenetics, the authors had enough marker density to draw some functional conclusions. In particular they found that the Central European hunter-gatherer had some of the distinctive pigmentation mutations common to Europeans (and to a lesser extent other West Eurasians), such as at the OCA2-HERC2 ‘blue eye’ locus, as well as SLC45A2. But what was shocking to me is that the hunter-gatherer was homozygous for the ancestral state at SLC24A5. To most of you that might not mean anything, but SLC24A5 is almost always homozygous in the derived state in modern Europeans. The HapMap data set has 329 alleles at this SNP for Europeans, whites of Northwest European heritage and Tuscans from Italy. There is only one copy of the ancestral allele in the whole data set. Assuming that the result is not a genotyping error of some sort, an ancestral homozygote at this locus implies to me that the evidence for a strong selective event in this region (it has a long haplotype) within the last ~10,000 years is correct. The widespread distribution outside of Europe of the derived variant of SLC24A5 means we may not be looking at an originally ‘European’ allele, even if it is fixed in Europeans today. No doubt there will be much more in terms of our understanding of functional and population genetics through the window which ancient DNA allows us to view the past.
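A back-of-the-envelope calculation shows why the ancestral homozygote is so striking. The sketch below takes the HapMap counts quoted above and assumes Hardy-Weinberg proportions at the modern European allele frequency. That assumption is mine, for illustration only, and is not a claim made in the preprint.

```python
from fractions import Fraction

# Counts quoted in the text for the HapMap European samples at SLC24A5.
total_alleles = 329
ancestral_copies = 1

# Frequency of the ancestral allele in the modern sample.
q = Fraction(ancestral_copies, total_alleles)

# Under Hardy-Weinberg proportions (an assumption), the expected frequency
# of ancestral homozygotes is q squared.
expected_homozygote_freq = q * q

print(f"ancestral allele frequency: {float(q):.4f}")
print(f"expected ancestral homozygotes: 1 in {expected_homozygote_freq.denominator}")
```

At the modern frequency an ancestral homozygote would be expected in roughly one person per hundred thousand, so finding one in a single ancient hunter-gatherer suggests the allele frequency was very different then.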

There are so many details, and so little time. Because it is a preprint you really should read the whole thing (several times). You are part of the revision process in some sense. But I think the general finding that the past is much more complex than we’d imagined will stand the test of time. On some level everyone understood that the trees illustrating genetic relationships for species which exhibit evidence of extensive gene flow were stylized representations which elided a great deal. But in the case of humans, thanks to ancient DNA, we see just how much that representation masked. Admixture events were collapsed back into the tree to such an extent that it may have been grossly simplified, and our understanding of past demographic events was sorely lacking in realism. We know this about humans across Northern Eurasia because they’ve been extensively studied, and we have ancient DNA. Unfortunately due to climate we may never have ancient DNA from the tropics, or from many organisms due to the constraints of preservation (e.g., fish?). But I think that we need to update our null hypotheses. This may mean we give up some cherished models which explain things in a neat fashion, but obscure complexity which is truer to reality is preferable to elegant models which lead us to falsity. Perhaps we should finally end our love affair with the beautiful tree, and admit the virtues of a rambling graph.

Citation: Ancient human genomes suggest three ancestral populations for present-day Europeans.

Addendum: I’ve seen references in internet discussions to affinities in Admixture plots of MA1 (the Siberian boy). Please remember that because we only have one MA1 individual, that individual will be forced to be a combination of populations generated from the groups where we have many individuals. So some of the strange and intriguing results are just nonsense, as the algorithm is trying to find the best fit to confusing conditions.
