Tracing historical genetic leapfrogging

There have been many popular press treatments of Hellenthal et al.’s A Genetic Atlas of Human Admixture History already. If you have not seen their interactive map, which presents many of their results, I highly recommend it. To understand the scientific results it helps to read some of this group’s earlier papers, such as Inference of Population Structure using Dense Haplotype Data and Population Identification Using Genetic Data. As I suggested earlier, the real paper is in the supplements, which have the virtue of being free, but generally the downside of not enforcing concision or accessibility. Obviously the general public is going to focus on the primary results: which populations mixed when. But perhaps more important is that the ingenious methods described in the supplements illustrate the power of looking at linked variants across segments of the genome, rather than just the variants themselves.
These segments are haplotypes: sequences of variation across genetic regions which exhibit some association. This association can be used to illustrate relatedness across populations and individuals, because the more generations (meiotic events) separate them, the more recombination events break apart the haplotypes. To make this clearer, I’ve included several chromosomes “painted” by 23andMe as a function of varied ancestral assignments for one individual. You will notice that in this painting different colors keep alternating. That is because the individual in this case is a friend whose background is from the mestizo population of Central America. In other words, he has well over 10 generations of recombination events breaking apart associations of ancestry along his genome.
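The link between elapsed generations and alternating colors can be made concrete with a toy simulation. This is a sketch under stated assumptions, not 23andMe’s painting algorithm or the paper’s method: a single admixture pulse, a haploid chromosome 1 Morgan long, crossover breakpoints falling as a Poisson process at one per Morgan per generation, and (a common simplification) ancestry redrawn independently at each breakpoint. The function names are hypothetical.

```python
import random


def paint_chromosome(generations, alpha, length_morgans=1.0, seed=None):
    """Return (start, end, ancestry) tracts for one haploid chromosome.

    Toy model (an assumption, not the paper's method): a single admixture pulse
    `generations` ago; breakpoints fall as a Poisson process at rate
    `generations` per Morgan, and at each breakpoint ancestry is redrawn
    independently ("A" with probability `alpha`, otherwise "B").
    """
    rng = random.Random(seed)
    ancestry = "A" if rng.random() < alpha else "B"
    tracts, start, pos = [], 0.0, 0.0
    while generations > 0:
        pos += rng.expovariate(generations)  # distance to the next breakpoint
        if pos >= length_morgans:
            break
        new = "A" if rng.random() < alpha else "B"
        if new != ancestry:  # only a change of ancestry starts a new "color"
            tracts.append((start, pos, ancestry))
            start, ancestry = pos, new
    tracts.append((start, length_morgans, ancestry))
    return tracts


def mean_tract_count(generations, alpha, trials=2000, seed=1):
    """Average number of ancestry tracts per chromosome over many simulations."""
    rng = random.Random(seed)
    total = sum(
        len(paint_chromosome(generations, alpha, seed=rng.randrange(2**32)))
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    for g in (2, 10, 50):
        # Under this model the expectation is about 1 + 2*alpha*(1-alpha)*g,
        # i.e. roughly 1 + g/2 for alpha = 0.5 on a 1-Morgan chromosome.
        print(g, round(mean_tract_count(g, 0.5), 2))
```

With `alpha = 0.5` the average number of tracts should grow roughly as 1 + g/2, so after 10 or more generations a chromosome is expected to show several switches between ancestries.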

The paper above reports the end product of a similar process of analysis, but quite a bit more elaborated in the inferences being made. At this point I will elide the technical details, not because they are unimportant (I’m particularly fascinated by their decomposition of decay curves that conceal multiple admixture events), but because they are difficult, and with one read-through of the supplements I don’t grasp all the subtleties. What is relevant for the reader is that the authors used haplotypic information by phasing their data, and so presumably can squeeze more juice out of it. This is illustrated by the comparison with the ROLLOFF application in ADMIXTOOLS, which uses just genotype data to make similar inferences. The future probably lies in phased analysis of haplotypes, because this sort of structuring of genomic data is more informationally rich. But population-based phasing is computationally intensive, and the marker sets have to be dense enough that you can infer haplotypes. That will happen, but we’re not there yet with all data sets: this is a preview of the future, not the future itself.
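The decay-curve idea can be sketched in a few lines. Under a simple single-pulse model, the association between ancestry at two loci falls off roughly as A * exp(-n * d), where d is the genetic distance in Morgans and n the number of generations since admixture, so fitting the curve’s log-slope recovers n. The sketch below fits synthetic data only; the amplitude, noise level, and function names are assumptions for illustration. Real tools such as ROLLOFF work on weighted linkage disequilibrium statistics and fit additional terms, and GLOBETROTTER’s decomposition of curves that mix several events goes well beyond a single exponential.

```python
import math
import random


def synthetic_decay_curve(generations, amplitude, distances_morgans, noise=0.0, seed=0):
    """Simulated association between loci, decaying as A * exp(-n * d).

    The exponential form follows the single-pulse model; the multiplicative
    Gaussian noise is an arbitrary choice for illustration.
    """
    rng = random.Random(seed)
    return [
        amplitude * math.exp(-generations * d) * (1.0 + rng.gauss(0.0, noise))
        for d in distances_morgans
    ]


def estimate_generations(distances_morgans, association):
    """Fit log(y) = log(A) - n * d by ordinary least squares and return n."""
    pts = [(d, math.log(y)) for d, y in zip(distances_morgans, association) if y > 0]
    mean_d = sum(d for d, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    sxx = sum((d - mean_d) ** 2 for d, _ in pts)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in pts)
    return -sxy / sxx  # decay rate per Morgan = generations since the pulse


distances = [0.005 * i for i in range(1, 61)]  # 0.5 cM to 30 cM, in Morgans
curve = synthetic_decay_curve(25, 0.01, distances, noise=0.05, seed=42)
print(round(estimate_generations(distances, curve), 1))  # close to 25
```

A log-linear least-squares fit is the simplest choice here; when several admixture events are superimposed the log curve stops being a straight line, which is where more elaborate decompositions become necessary.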

Before moving on to a cursory survey of the results, I’d like to quote one of the authors from Nicholas Wade’s piece in The New York Times:

Dr. Myers and Dr. Hellenthal said that they hoped historians would find their work useful, but that they had not collaborated with historians.

“In some sense we don’t want to talk to historians,” Dr. Falush said. “There’s a great virtue in being objective: You put the data in and get the history out. We do think this is a way of reconstructing history by just using DNA.”

You might wonder how they could test whether their methods were on the right track as they went along. They performed many simulations, constructing admixture scenarios to which they were blind and which their method managed to ferret out. Looking at their reported results in the supplements, I’m impressed and rather confident that they’re onto something (additionally, the signals were cross-validated with ROLLOFF). But this attitude of ostensible objectivity through ignorance does concern me somewhat, because in reality the authors are aware, at least in the most superficial sense, of historical events, and did not hesitate to include their judgments within the text. From the paper itself:

Distinct, ancient, and partially shared admixture signals (always dated older than 90 BCE) are seen in six groups (Fig. 4B), including the Kalash (Fig. 2C), whose strongest signal suggests a major admixture event (990 to 210 BCE) from a source related to present-day Western Eurasians, although we cannot identify the geographic origin precisely. This period overlaps that of Alexander the Great (356 to 323 BCE), whose army, local tradition holds, the Kalash are descended from (40), but these ancient events predate recorded history in the region, precluding confident interpretation.

Credit: Flickr

The Kalash are famous because they are pagans who reside in the fastness of Chitral in northwestern Pakistan. It is highly likely that they will be subject to cultural genocide by their Muslim neighbors within the next 10-20 years, as the Taliban has been threatening them with forced conversion for the past few years, and it seems unlikely that the broader Pakistani society would be willing to expend much more in blood & treasure to protect them. Many of the Kalash also evince a very European physiognomy, making the legend of a Macedonian origin plausible on the face of it. The 95% confidence interval for the date of admixture (990 to 210 BCE) does encompass Alexander’s invasion of the Persian east. But what are truly the chances of this? My own hunch is that the admixture into the Kalash is a real phenomenon, but probably due to another migration event of which we are less aware. This conjecture is based on some prior information. Zack Ajmal has been running the Harappa Ancestry Project for many years now, and has assembled a very large database of Indian ethnicities and castes. There are already some suggestive patterns which I think may shed light on what is going on with the Kalash. I’ve taken Zack’s data for a subset of populations and sorted it by one ancestral component, “NE Euro”:

Ethnicity | Dataset | N | S Indian | Baloch | Caucasian | NE Euro
haryana-jatt | harappa | 5 | 27% | 37% | 9% | 18%
rajasthani-jatt | harappa | 2 | 25% | 35% | 11% | 15%
punjabi-jatt-sikh | harappa | 13 | 28% | 40% | 10% | 12%
brahmin-uttar-pradesh | metspalu | 8 | 42% | 36% | 5% | 12%
pathan | hgdp | 23 | 23% | 42% | 16% | 11%
kalash | hgdp | 23 | 22% | 43% | 18% | 11%
burusho | hgdp | 25 | 23% | 41% | 12% | 10%
pushtikar-brahmin | harappa | 1 | 31% | 36% | 12% | 10%
bengali-brahmin | harappa | 7 | 42% | 33% | 5% | 10%
kashmiri-pandit | reich | 5 | 32% | 39% | 12% | 9%
punjabi | harappa | 12 | 33% | 39% | 11% | 8%
up-kshatriya | metspalu | 7 | 45% | 37% | 4% | 8%
punjabi-arain | xing | 25 | 31% | 44% | 10% | 7%
sindhi | hgdp | 24 | 29% | 46% | 10% | 6%
iyer-brahmin | harappa | 11 | 47% | 37% | 5% | 5%

The “NE Euro” cluster is strongly associated with Northern European populations. It peaks in Finns at 80%, with Lithuanians next at 72%. It’s striking to me that the peasant cultivators of eastern Punjab, the Jatts, have a high fraction of this component, which drops off as you go east and south, as you’d expect, but also as you go west into Pakistan. The Jatts have legends of “Scythian” ancestry. This might not be true, but I think something in their history explains their elevated “NE Euro,” which exceeds the fraction in North Indian upper castes. Interestingly, the ratio of “Baloch” to “S Indian” among the Jatts is ~1.4, almost exactly that of the Punjabi Arain of Pakistan, who are also peasant farmers. What this suggests to me is that the “NE Euro” among the Jatts may be an overlay upon the peasant substrate of the Punjab. I also believe that it post-dated any primary Indo-Aryan contribution, as the upper castes of the North Indian plain do not exhibit the same pattern as the Jatts. Such specific stories are likely common, and illustrate the process of demographic “leapfrogging,” where populations translocate themselves rapidly over great distances and admix with the local substrate in a single pulse.
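The ~1.4 figure is plain arithmetic on the table above. The sketch below transcribes the table into a dictionary (values copied from the table; nothing else is assumed) and recomputes the Baloch-to-S Indian ratio for every row so the groups can be compared directly.

```python
# Rows transcribed from the table above: (S Indian, Baloch, Caucasian, NE Euro), in %.
TABLE = {
    "haryana-jatt": (27, 37, 9, 18),
    "rajasthani-jatt": (25, 35, 11, 15),
    "punjabi-jatt-sikh": (28, 40, 10, 12),
    "brahmin-uttar-pradesh": (42, 36, 5, 12),
    "pathan": (23, 42, 16, 11),
    "kalash": (22, 43, 18, 11),
    "burusho": (23, 41, 12, 10),
    "pushtikar-brahmin": (31, 36, 12, 10),
    "bengali-brahmin": (42, 33, 5, 10),
    "kashmiri-pandit": (32, 39, 12, 9),
    "punjabi": (33, 39, 11, 8),
    "up-kshatriya": (45, 37, 4, 8),
    "punjabi-arain": (31, 44, 10, 7),
    "sindhi": (29, 46, 10, 6),
    "iyer-brahmin": (47, 37, 5, 5),
}


def baloch_to_s_indian(row):
    """Ratio of the "Baloch" to the "S Indian" component for one population."""
    s_indian, baloch, _caucasian, _ne_euro = row
    return baloch / s_indian


if __name__ == "__main__":
    # Print every population, lowest ratio first.
    for name, row in sorted(TABLE.items(), key=lambda kv: baloch_to_s_indian(kv[1])):
        print(f"{name:<22} {baloch_to_s_indian(row):.2f}")
```

The three Jatt rows come out at 1.37, 1.40, and 1.43 and the Punjabi Arain at 1.42, while the Uttar Pradesh and Bengali Brahmins fall below 1 (0.86 and 0.79).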

As Dienekes has already observed, this method has the greatest power to detect admixture events between two strongly distinct populations less than 4,000 years ago. So, for example, he observes that the Yayoi-Jomon admixture in Japan, which occurred ~1,500-2,000 years ago, does not show up, likely because the two populations are too genetically similar. In contrast, low fractions of East Asian admixture do show up among European populations because they jump out against the European genetic background. Many of the more ancient admixture events detected by the Reich lab are outside the purview of this method, which relies on haplotype associations that decay exponentially with the time since admixture. Additionally, as the authors note, they have greater ability to narrow in on single-pulse admixture events, as opposed to continuous gene flow. This is why I say that this is really a map of recent leapfrogging: rapid demographic translocations produce the sort of genetic revolutions which leave the marks that these methods can easily detect.
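The exponential decay puts a rough clock on what such methods can see. The sketch below assumes a 28-year generation time (an assumption, not a figure from the post) and the single-pulse model in which recombination breakpoints accumulate at about n per Morgan after n generations, so associations at distance d decay as exp(-n * d). The helper names are hypothetical.

```python
import math

GENERATION_YEARS = 28  # assumed generation time; not taken from the post


def generations_since(years, generation_years=GENERATION_YEARS):
    """Convert elapsed years to generations under the assumed generation time."""
    return years / generation_years


def mean_breakpoint_spacing_cm(generations):
    """Expected spacing between breakpoints: about 1/n Morgans, i.e. 100/n cM."""
    return 100.0 / generations


def residual_association(generations, distance_cm):
    """Fraction of the initial association left at a given distance: exp(-n * d)."""
    return math.exp(-generations * distance_cm / 100.0)


if __name__ == "__main__":
    for years in (500, 1000, 2000, 4000, 10000):
        n = generations_since(years)
        print(
            f"{years:>6} yr  {n:6.1f} gen  "
            f"spacing {mean_breakpoint_spacing_cm(n):5.2f} cM  "
            f"association at 1 cM {residual_association(n, 1.0):.3f}"
        )
```

Under these assumptions an event 4,000 years back leaves breakpoints about 0.7 cM apart on average and roughly a quarter of the original association at 1 cM, while a 10,000-year-old event leaves under 3%. This illustrates age only; the Yayoi-Jomon case fails for a different reason, the genetic similarity of the two sources.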

And that is exactly the case in Central Eurasia. It’s clear on the map at the top of this page that admixture events are dominant in Central Eurasia, but less so on the periphery. H. L. Mackinder’s Heartland is always roiling. And the biggest commotion seems to have been caused by the Mongol Empire, as the authors repeatedly point to it as a cause of dislocation and admixture. Those who were skeptical of the idea of a Genghis Khan Y chromosomal lineage should present a forthright rebuttal, because though these results don’t imply the expansion of the Genghisid lineage as such, they definitely point to the Mongolians as the “source” population for numerous admixture events across Central Asia. Both the peoples and the timing fit. The primary question I have about these results, though, is the relatively weak signal of the Turkic migrations, which preceded the rise of the Mongol Empire by over 500 years. One possibility is that the Turkic signal is weaker because it is closer to a continuous admixture scenario, as the nomadic Turks infiltrating Islamic civilization were slowly emulsified by their neighbors. In contrast, the rise of the Mongols was a hammer blow to the Islamic world, and peoples rose, fell, and amalgamated within a few generations.
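The pulse-versus-continuous point can be illustrated with a toy comparison. It assumes the single-pulse decay exp(-n * d) for each generation of migrants and models continuous gene flow as equal-weight pulses spread over generations 20 to 60; the weighting, the range, and the function names are hypothetical choices for illustration. Under a single pulse the local decay rate is the same at every distance, whereas under continuous flow it drifts downward with distance, so no single date fits the curve cleanly.

```python
import math


def pulse_curve(generations, distances_morgans):
    """Single pulse: association decays as exp(-n * d)."""
    return [math.exp(-generations * d) for d in distances_morgans]


def continuous_curve(first_gen, last_gen, distances_morgans):
    """Toy continuous gene flow: equal-weight pulses every generation from
    `first_gen` to `last_gen` generations ago (equal weighting is an assumption)."""
    gens = range(first_gen, last_gen + 1)
    return [sum(math.exp(-g * d) for g in gens) / len(gens) for d in distances_morgans]


def local_decay_rate(curve, distances_morgans, i):
    """Slope of -log(curve) between points i and i+1; constant for a single pulse."""
    return -(math.log(curve[i + 1]) - math.log(curve[i])) / (
        distances_morgans[i + 1] - distances_morgans[i]
    )


d = [0.01 * k for k in range(31)]  # 0 to 30 cM, in Morgans
pulse = pulse_curve(40, d)
flow = continuous_curve(20, 60, d)
print("pulse:", round(local_decay_rate(pulse, d, 0), 1), round(local_decay_rate(pulse, d, 29), 1))
print("flow: ", round(local_decay_rate(flow, d, 0), 1), round(local_decay_rate(flow, d, 29), 1))
```

Under these toy assumptions, this is one way to see why a diffuse Turkic influx could register less crisply than a Mongol-era pulse.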

These results also resolve some old debates among historians and archaeologists. During the late 6th and 7th centuries the Byzantines focused their energies on Anatolia, and the Balkan hinterland and much of Greece proper fell to barbarians, the Sclaveni, or Slavs. After several centuries the hinterlands of Greece proper were reconquered, and under Basil II the Balkans were brought back into the Byzantine fold. This process of ethnic back and forth, and likely admixture, was already hinted at in Peter Ralph and Graham Coop’s The Geography of Recent Genetic Ancestry across Europe. But the results here are even clearer. Modern Greeks do seem to have significant Slavic ancestry. Similarly, the Slavs, north to south, are the products of an admixture event which produced a cline in ancestry.

Most of the results in this paper are of this form: stepping into tendentious historical debates and pointing the finger in one specific direction. They came. They inferred. They resolved. But not in all instances. Here I quote from the paper:

A different method, which aims to detect but not date admixture, concluded that Cambodians trace ~16% of their DNA to a group equally related to modern-day Europeans and East Asians (29). GLOBETROTTER infers a ~19% contribution from a similar source related to modern-day Central, South, and East Asians and an ~81% contribution from a source related specifically to modern-day Han and Dai, the latter a branch of the Tai people who entered the region in historical times (30) (Fig. 2D, orange box 5). Further, this event dates to 1362 CE (1194 to 1502 CE), a period spanning the end of the Indianized Khmer empire (802 to 1431 CE) (30), one of the most powerful empires in Southeast Asia, whose fall

It is true that Southeast Asia witnessed a massive cultural and demographic upheaval ~1000 A.D. Anyone who wishes to read about this period’s impact on modern Southeast Asia should get a copy of Victor Lieberman’s . A thumbnail sketch of what occurred is that Thai invasions from southern China transformed what are today the nations of Thailand and Laos from zones of Mon-Khmer civilization into a synthetic one, where the Thai absorbed many elements of the Theravada Buddhist culture of their predecessors while maintaining an ethno-linguistic distinctiveness. To the west the Burmese were already in the process of absorbing their Mon predecessors when the Thai arrived and established the Shan polities in the hinterland. Ultimately the incipient Burmese polity survived the Thai assaults, though it retained a Thai minority, albeit one integrated through Theravada Buddhism. Finally, in Vietnam the Kinh, the Vietnamese, were shielded to a great extent from the Thai invasions by geography, and engaged in their own expansion south toward the Mekong delta at the expense of the Khmer.

Why does any of this matter? Because to me the straightforward interpretation of the text above from the paper is that ~80% of the ancestry of modern Cambodians actually comes from the Thai. This is difficult to credit, as the standard model is that Mon and Khmer populations were largely assimilated into a Thai identity rather than replaced. It could be that continuous gene flow resulted in the demographic turnover, but from what I can tell such gene flow should not show up so strongly with the tests used here. The standard model is that Austro-Asiatic rice farmers arrived from southern China ~4,000 years ago and assimilated a Melanesian-like population. The Thai migrations provided an overlay across the highlands of Burma and in Thailand and Laos, but they assimilated a large non-Thai peasant substrate. In eastern Thailand this assimilation of Khmer rice farmers has continued down to the present. But perhaps the history is wrong somewhere. If the result is confirmed by future analyses, historians and archaeologists may need to look at their inferences with fresh eyes.

One of the ways that the press is spinning these results is that inter-ethnic admixture was extremely common in the past. This is too simple a model, and in fact I suspect that isolation-by-distance gene flow between neighboring groups was the default for most of human history. These pulse admixture events show up against the background of conventional and boring variation because they’re atypical, albeit not rare. Associations between geographically disparate groups are fascinating because they illustrate the power of human technology (the horse) and organization (reproductive advantage). The future lies in synthesizing this sort of natural science with history and economics, to construct a fully textured model of the past where the normal is perturbed by bursts of atypicality.

Citation: DOI: 10.1126/science.1243518.

Addendum: Finally, I should add that many of the low frequency admixtures that they see but do not explain with any clarity have reasonable explanations. Perhaps the authors did not elaborate due to constraints of space, or simply because they did not wish to engage in excessive speculation. But to me it seems obvious that West Eurasian admixture in places like Mongolia makes a lot more sense when you remember that the Mongols had to employ Christian priests to serve their Alan mercenaries. As for West Asian admixture on the North China plain, Sogdians were common as a “middleman minority” during the Tang dynasty, while the Mongol Yuan famously brought in many Muslims from Central Asia.
