Graham Coop’s group has been exploring the implications of more complex models of spatially structured genetic variation and admixture for the last few years. I’ve already pointed to Gideon Bradburd’s SpaceMix preprint, which attempts to distinguish genetic relatedness due to geographic proximity, and therefore continuous gene flow, from an admixture event which is not congruous with spatial position (e.g., the Norwegian Sami have more Siberian ancestry than many groups to their east). Alisa Sedghifar now has a paper out in Genetics, The Spatial Mixing of Genomes in Secondary Contact Zones. Here’s the abstract:
Recent genomic studies have highlighted the important role of admixture in shaping genome-wide patterns of diversity. Past admixture leaves a population genomic signature of linkage disequilibrium (LD), reflecting the mixing of parental chromosomes by segregation and recombination. These patterns of LD can be used to infer the timing of admixture, but the results of inference can depend strongly on the assumed demographic model. Here, we introduce a theoretical framework for modeling patterns of LD in a geographic contact zone where two differentiated populations have come into contact and are mixing by diffusive local migration. Assuming that this secondary contact is recent enough that genetic drift can be ignored, we derive expressions for the expected LD and admixture tract lengths across geographic space as a function of the age of the contact zone and the dispersal distance of individuals. We develop an approach to infer age of contact zones using population genomic data from multiple spatially sampled populations by fitting our model to the decay of LD with recombination distance. To demonstrate an application of our model, we use our approach to explore the fit of a geographic contact zone model to three human genomic datasets from populations in Indonesia, Central Asia and India and compare our results to inference under different demographic models. We obtain substantially different results to the commonly used model of panmictic admixture, highlighting the sensitivity of admixture timing results to the choice of demographic model.
In a stylized fashion what’s going on here is that genome-wide data sets have allowed for the inference of admixture events, and the methods usually assume a single pulse of rapid random mating between two very different populations. This works in a controlled laboratory situation, but is less plausible for humans. There are cases which fit, such as the settlement of Pitcairn by the mutineers from the Bounty, but they’re exceptional (another case might be the admixture you see in some areas of Latin America from Amerindians, where the indigenous groups seem to have disappeared after a few generations, but it turns out that native women were assimilated into the European and African populations in a very short period of time). An alternative scenario is one where two populations come into contact, and admixture takes a longer period of time. In a spatial rendering there’d be a “contact zone” where gene flow might occur in a fashion well modeled as a diffusion process. To give a concrete example of the latter case I will offer the Kalmyk people. The Estonian Biocentre has posted some data from this population, and all of them have varying levels of European admixture. As there is variance it is likely that this admixture did not happen all at once. Rather, once the Kalmyks migrated to Russia three hundred years ago there has been continuous gene flow into the community, as opposed to a frenzy of admixture, after which barriers might be thrown up. The latter scenario actually might be likely to occur in a case where only male Kalmyks migrated, but as it was, the migration was a full folk wandering, where the tribes evacuated Dzungaria as a whole (I am aware that there were also back migrations, please don’t leave a comment explaining this to me!).
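To make the variance argument concrete, here is a toy simulation of my own (it is not from the paper). It tracks each individual's fraction of "incoming" ancestry under a single pulse versus a steady trickle of migrants. The assumptions are loud ones: random mating, a child's ancestry is exactly the mean of its two parents (so no segregation variance from a finite genome), and the population size, number of generations, and migration rate are made-up illustrative values.

```python
import random
import statistics

def next_generation(ancestry, migrant_rate, rng):
    """One generation of random mating, with optional migrants of ancestry 1.0."""
    children = []
    for _ in range(len(ancestry)):
        if rng.random() < migrant_rate:
            # A new arrival from the source population replaces a local birth.
            children.append(1.0)
        else:
            # Two parents drawn at random; the child gets their mean ancestry.
            a, b = rng.choice(ancestry), rng.choice(ancestry)
            children.append((a + b) / 2)
    return children

def simulate(n, generations, pulse_fraction, migrant_rate, seed=0):
    """Return individual ancestry fractions after the given number of generations."""
    rng = random.Random(seed)
    # Generation 0: a fraction of individuals carry 100% incoming ancestry.
    ancestry = [1.0 if rng.random() < pulse_fraction else 0.0 for _ in range(n)]
    for _ in range(generations):
        ancestry = next_generation(ancestry, migrant_rate, rng)
    return ancestry

# Hypothetical parameters: 12 generations is roughly three centuries.
pulse = simulate(n=2000, generations=12, pulse_fraction=0.2, migrant_rate=0.0)
continuous = simulate(n=2000, generations=12, pulse_fraction=0.0, migrant_rate=0.02)

print("pulse variance:     ", statistics.pvariance(pulse))
print("continuous variance:", statistics.pvariance(continuous))
```

Under these assumptions the variance after a pulse roughly halves every generation, so individuals quickly converge on the same ancestry fraction, while ongoing gene flow keeps reintroducing variance. That is the logic behind reading variable admixture levels as a hint of continuous gene flow.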
So what happens after an admixture event? As noted in this paper, assuming a simple pulse admixture, the distribution of ancestry tract lengths is exponential, as is the decay of LD with recombination distance. This is a function of the fact that recombination breaks apart ancestral multi-locus allelic associations a little more with each passing generation. As an extreme example, the F1 offspring of two very different populations would have entirely different ancestry on their paternal and maternal chromosomes. Obviously LD would be very high as well. But as the F1 population randomly mates the LD would be broken apart by recombination, as ancestry tracts would begin to alternate along chromosomal segments. You can see it when you perform ancestry deconvolution on groups such as Puerto Ricans. There are short segments due to old Native American ancestry, which entered the population over a narrow period of time and have since been chopped up by recombination. In contrast, the African segments have a wider range of block lengths, in part because there has been more continuous admixture since the settlement of the island by the Spaniards.
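The single-pulse expectation is simple enough to write down. The sketch below is mine, not the paper's formalism: it uses the textbook result that LD between two loci shrinks by a factor of (1 - r) per generation, approximated as exp(-r t), with the genetic distance in Morgans standing in for the recombination fraction r (reasonable only over short distances). The tract-length function uses a commonly cited approximation in which tracts from a source contributing proportion m are exponentially distributed with rate (1 - m) t per Morgan. The function names are hypothetical.

```python
import math

def expected_admixture_ld(d_morgans, generations, ld0=1.0):
    """Expected admixture LD at genetic distance d after a single pulse.

    Uses LD(t) = LD(0) * exp(-d * t), the small-distance approximation
    to LD(0) * (1 - r)**t with r taken to be d.
    """
    return ld0 * math.exp(-d_morgans * generations)

def mean_tract_length_cm(generations, source_proportion):
    """Approximate mean ancestry tract length in centimorgans.

    Assumes tract lengths are exponential with rate
    (1 - source_proportion) * generations per Morgan.
    """
    rate_per_morgan = (1.0 - source_proportion) * generations
    return 100.0 / rate_per_morgan

# Older pulses leave less LD at a given distance and shorter tracts.
for t in (5, 20, 100):
    print(t, expected_admixture_ld(0.01, t), mean_tract_length_cm(t, 0.2))
```

The Puerto Rican pattern above follows directly: an old, narrow pulse gives uniformly short tracts, while ongoing admixture mixes tracts of many ages and therefore many lengths.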
Sedghifar et al. are building an analytical framework to allow one to make inferences which are hopefully truer than the single pulse to the multi-textured manner in which populations actually admix. As the paper is open access I invite readers to peruse the formalism as well as the simulations which were performed to evaluate their framework. It strikes me that this is definitely a first pass, but a necessary one. As noted in the paper, but well known for years, the single pulse admixture models tend to underestimate the dates of mixing (or, more charitably, they pick up the last “pulse”). So, often when I saw a paper giving an admixture estimate, I took that as a floor, and nothing more.
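Here is a toy illustration of why I treat single pulse dates as a floor. It is my own sketch and not the authors' inference procedure: I build an LD decay curve as an equal-weight sum of two exponentials (hypothetical pulses at 10 and 100 generations), then fit a single exponential by ordinary least squares on log(LD). The weights, the dates, and the range of genetic distances are all made-up values.

```python
import math

def ld_curve(distances, pulses):
    """LD at each genetic distance (Morgans) as a weighted sum of exponential decays.

    pulses is a list of (weight, generations_ago) pairs.
    """
    return [sum(w * math.exp(-d * t) for w, t in pulses) for d in distances]

def fit_single_pulse(distances, ld):
    """Least-squares line through log(LD) versus distance.

    Under a single pulse log(LD) = const - t * d, so the negated slope
    is the estimated number of generations since admixture.
    """
    ys = [math.log(v) for v in ld]
    n = len(distances)
    mean_x = sum(distances) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, ys))
    sxx = sum((x - mean_x) ** 2 for x in distances)
    return -sxy / sxx

distances = [i * 0.005 for i in range(1, 41)]   # 0.5 cM to 20 cM
two_pulses = [(0.5, 10), (0.5, 100)]            # hypothetical history
estimate = fit_single_pulse(distances, ld_curve(distances, two_pulses))

print("single-pulse estimate (generations):", round(estimate, 1))
```

In this toy setup the fitted date falls between the two pulses and sits much closer to the recent one, because the signal from the older pulse has already decayed away over most of the distance range.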
In the final section the framework is applied to real data sets. There are two issues that jump out into the foreground. As noted by the authors, the HUGO Pan-Asian data set, which is what you need to use for many maritime Southeast Asian groups, has very few markers. At ~50,000 SNPs it’s really an animal grade set of chip data, not a human grade one (and even for animals they’re going beyond 60K SNP-chips). The second issue is geographical coverage. It strikes me that ideally they’d have transects with denser sampling by position. This obviously isn’t something that can be changed right now, so I assume that in the future the situation will improve on the data side and the methods can be applied more robustly.
They compared admixture in India, Southeast Asia, and Central Asia. It seems that their framework did not yield much in India, probably because the admixture patterns are complex and old, and could not be easily retrieved from the data with the few assumptions they had (though that in itself tells you something about the real dynamics). This is really a situation where hopefully ancient DNA will allow researchers to fix some parameters in the future. There are cases where compound pulse admixtures are actually a better model for reality than contact zones with diffusive gene flow across the borders. India may be an instance. For example, the Tamil Brahmins seem to have some indigenous South Indian admixture, but very little variation in this admixture across individuals. That implies that once the admixture occurred, there was a long period where gene flow did not occur due to strict endogamy, else you’d see more variation. In a world unencumbered by social constraints a contact zone model would work well, but South Asia may not be that world.
As expected, the secondary contact zone model gave an older date of admixture for Southeast Asia, where Austronesians arrived over the last 4,000 years. Perhaps even too old! They note: “Linguistic evidence suggests that the Austronesian expansion through Indonesia dates to ∼ 4000 years ago (Gray et al. 2009)…Our estimate of timing based on fitting a geographic contact zone (5800 years ago) is much older than dates estimated by single pulse models, but is also considerably older than the Austronesian expansion.” The citation for Gray et al. seems to be this paper, but I’m pretty sure it was meant to be Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement. In an interdisciplinary field like this you need to rely on other researchers to complement your own understanding of specific domains. As it happens I am now more skeptical of linguistic phylogenetics than I was, so I don’t put too much stock in the fact that the date inferred using their methods was much older than what the linguists believe. Rather, I’d put more of an emphasis on material remains and archaeology, though dating and provenance can be hard to pin down on some occasions.
The last empirical illustration has to do with Central Asia, and I have a bit to say about this. The authors seem to be concerned that their signal of admixture is much older than the period of the Mongol invasions, ~700 years ago. Some studies based on a pulse admixture model pin this exact date, while others do not. The problem I have with this is that the real demographic history actually aligns well, in my opinion, with the dates that are given in this paper. I don’t think they needed to take the Mongol model nearly as seriously as they did. But I doubt that any Central Asianists peer reviewed this for Genetics, so a lot of weight was probably given to the older papers in genetics, where the Mongol angle is always played up. The reality is that there was a massive continuous movement of Turkic peoples from about 500 A.D. from greater Mongolia down into the Persianate world of Central Asia. While in India a contact zone model may not work well due to a history of endogamy, the situation in Central Asia is more amenable to it. I think further extensions of this framework in Inner Asia will be fruitful and necessary.
Though the authors here focus on human data sets, presumably because there was data and we know something about human demographic history, the secondary contact zone model formalized in this paper may be more useful with populations of animals and plants, where social constraints don’t exist to enforce endogamy (unless you count reinforcement!). Also, it probably will be useful in island situations, such as in Japan, where the migration patterns are probably defined by a single admixture followed by a wave of advance which likely had secondary contact zone dynamics (the Ainu have Yayoi ancestry).
Citation: Sedghifar, Alisa, et al. “The Spatial Mixing of Genomes in Secondary Contact Zones.” Genetics (2015).