Women hate going to India


For some reason women do not seem to migrate much into South Asia. In the late 2000s I, along with others, noticed a strange discrepancy in the Y and mtDNA lineages which trace one’s direct male and female lines: in South Asia the male lineages were likely to cluster with populations to the north and west, while the female lines did not. South Asia’s female lines in fact had a closer relationship to the mtDNA lineages of Southeast and East Asia, albeit distantly.

One solution which presented itself was to contend there was no paradox at all: that the Y chromosomal lineages found in South Asia were basal to those to the west and north. In particular, there were some papers suggesting that perhaps R1a1a originated in South Asia at the end of the Pleistocene. Whole genome sequencing of Y chromosomes does not bear this out though. R1a1a went through rapid expansion recently, and ancient DNA has found it in Russia first. But in 2009 David Reich came out with Reconstructing Indian population history, which offered up something of a possible solution.

What Reich and his coworkers found was that South Asia seems to be characterized by the mixture of two very different types of populations. One set, ANI (Ancestral North Indian), is basically another western or northwestern Eurasian group. The other, ASI (Ancestral South Indian), is indigenous, and exhibits distant affinities to the Andaman Islanders. The India-specific mtDNA then were from ASI, while the Y chromosomes with affinities to people to the north and west were from ANI. In other words, the ANI mixture into South Asia was probably through a mass migration of males.

But this pattern in Y and mtDNA is not limited to this one case. A minority of South Asians speak Austro-Asiatic languages. The most interesting of these populations are the Munda, who tend to occupy uplands in east-central India. Older books on Indian history often suggest that the Munda are the earliest aboriginals of the subcontinent, but that has to confront the fact that most Austro-Asiatic languages are spoken in Southeast Asia. There was no true consensus on where they were present first.

Genetics seems to have solved this question. The evidence is building up that Austro-Asiatic languages arrived with rice farmers from Southeast Asia. Though most of the ancestry of the Munda is of ANI-ASI mix, a small fraction is clearly East Asian. And interestingly, though they carry no East Asian mtDNA, they do carry East Asian Y. Again, gene flow mediated by males.

The same is true of India’s Bene Israel Jewish community.

A new preprint on biorxiv confirms that the Parsis are another instance of the same dynamic: The genetic legacy of Zoroastrianism in Iran and India: Insights into population structure, gene flow and selection:

Zoroastrianism is one of the oldest extant religions in the world, originating in Persia (present-day Iran) during the second millennium BCE. Historical records indicate that migrants from Persia brought Zoroastrianism to India, but there is debate over the timing of these migrations. Here we present novel genome-wide autosomal, Y-chromosome and mitochondrial data from Iranian and Indian Zoroastrians and neighbouring modern-day Indian and Iranian populations to conduct the first genome-wide genetic analysis in these groups. Using powerful haplotype-based techniques, we show that Zoroastrians in Iran and India show increased genetic homogeneity relative to other sampled groups in their respective countries, consistent with their current practices of endogamy. Despite this, we show that Indian Zoroastrians (Parsis) intermixed with local groups sometime after their arrival in India, dating this mixture to 690-1390 CE and providing strong evidence that the migrating group was largely comprised of Zoroastrian males. By exploiting the rich information in DNA from ancient human remains, we also highlight admixture in the ancestors of Iranian Zoroastrians dated to 570 BCE-746 CE, older than admixture seen in any other sampled Iranian group, consistent with a long-standing isolation of Zoroastrians from outside groups. Finally, we report genomic regions showing signatures of positive selection in present-day Zoroastrians that might correlate to the prevalence of particular diseases amongst these communities.

The paper uses lots of fancy ChromoPainter methodologies which look at the distributions of haplotypes across populations. But some of the primary results are obvious using much simpler methods.

1) About 2/3 of the ancestry of Indian Parsis derives from an Iranian population
2) About 1/3 of the ancestry of Indian Parsis derives from an Indian population
3) Almost all the Y chromosomes of Indian Parsis can be accounted for by Iranian ancestry
4) Almost all the mtDNA haplogroups of Indian Parsis can be accounted for by Indian ancestry
5) Iranian Zoroastrians are mostly endogamous
6) Genetic isolation has resulted in drift and selection on Zoroastrians

The fact that the ancestry proportion is clearly more than 50% Iranian for Parsis indicates that there was more than one generation of males who migrated. They did not contribute mtDNA, but they did contribute genome-wide to Iranian ancestry. There are wide intervals on the dating of this admixture event, but they are consonant with the oral history that was later written down by the Parsis.
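The arithmetic behind that inference can be sketched in a few lines of Python. This is a toy model of my own, not anything from the preprint: it assumes that in each "wave" every father is a newly arrived Iranian male and every mother comes from the existing community (entirely Indian women in the first generation). The function name is made up for illustration.

```python
def autosomal_fraction_after_waves(n_waves):
    """Migrant share of autosomal ancestry in children after n_waves
    successive generations of all-migrant fathers (toy assumption)."""
    migrant_share = 0.0  # the community of mothers starts out entirely local
    for _ in range(n_waves):
        # each child gets half its autosomes from a fully migrant father
        # and half from a mother drawn from the current community
        migrant_share = 0.5 * 1.0 + 0.5 * migrant_share
    return migrant_share


for waves in (1, 2, 3):
    print(waves, autosomal_fraction_after_waves(waves))
```

Under these assumptions a single wave yields exactly 50% Iranian ancestry and a second wave yields 75%, while the mtDNA stays local and the Y stays Iranian throughout. An estimate of about 2/3 sits between those two values, which is what you would expect if some but not all later fathers were fresh migrants.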

So there you have it. Another example of a population formed from admixture because women hate going to India.

Citation: The genetic legacy of Zoroastrianism in Iran and India: Insights into population structure, gene flow and selection.
Saioa Lopez, Mark G Thomas, Lucy van Dorp, Naser Ansari-Pour, Sarah Stewart, Abigail L Jones, Erik Jelinek, Lounes Chikhi, Tudor Parfitt, Neil Bradman, Michael E Weale, Garrett Hellenthal
bioRxiv 128272; doi: https://doi.org/10.1101/128272

The revenge of the cavemen

In 2012 I wrote Post-Neolithic revenge of the foragers. There were two proximate rationales for my thoughts at the time. First, I thought Peter Bellwood’s thesis of agriculture-based demographic expansions was being vindicated in the broadest sketch, but there were many countervailing details. Second, there were already suggestions that genetic data was not indicative of a final victory of farmers by pastoralists.

There were several immediate issues that came to mind in the non-genetic domain. Bellwood argued that agriculture shaped the distribution of modern language families, but the spread of Turkic and Finnic peoples seems likely to have been post-agricultural, and not based on farming. Both these groups were arguably nomadic, one pastoralist, and the other engaging in mixed-use lifestyles which were reminiscent of classic hunting and gathering. And there has been anthropological evidence that though pure hunter-gatherers, such as indigenous Australians, do not take to cultivation easily, they quickly transition to pastoralism. In other words, the skills and mores which are common among hunter-gatherers can translate rapidly once domesticate-based nomadism spreads.

The Turks, or the Saami with their reindeer, are evidence of this transition, and its success. It seems plausible that the same was the case with Indo-Europeans, and that is what I thought at the time.

Now we have more data from ancient DNA. It does seem there was a “resurgence” of Mesolithic hunter-gatherer ancestry as time passed, with Neolithic farmers exhibiting a more indigenous genetic profile in Europe. Additionally, the arrival of Indo-European steppe ancestry brought another dollop of “hunter-gatherer” ancestry from beyond the fringes of Europe proper.

So what story can we tell of the transition between the Late Neolithic (LN) and the Early Bronze Age (EBA) in Europe? First, the proto-Indo-Europeans were people from the fringes and boundaries. Their genetics indicate some sort of influence from the Near East, likely via the Maykop people. But their roots were also deep in eastern Europe, from the local hunter-gatherers who had affinities with Siberians to their east and European hunter-gatherers to their west. From this synthesis emerged something special, a warlike group of mobile pastoralists who quickly swept the field.

This reminds me of something from Peter Turchin’s work. Populations on the borders or frontiers of ethno-cultural (and possibly political) zones may exhibit more group cohesion than those from “core” areas. The Indo-Europeans were a border folk. They may also take to cultural innovations more quickly; it is clear that switching to the new religion occurred faster among elites in outlying regions than in the core.

A second issue, which is not proven, but may be possible, is that once the Indo-Europeans moved into the North European plain, they allied with residual hunter-gatherer populations. A classic enemy-of-my-enemy proposition. This would likely result in a higher proportion of Pleistocene ancestry in later generations due to assimilation.

The moral of the story is that often there is no final victory in the war. Human history is full of reversals.

The reality of cultural hitchhiking

The figure to the left is from a paper, The mountains of giants: an anthropometric survey of male youths in Bosnia and Herzegovina, which attempts to explain why the people from the uplands of the western Balkans are so tall. Anyone who has watched high level basketball, or perused old physical anthropology textbooks, knows that average heights in the Dinaric Alps are quite high in comparison to the rest of Europe, matched only in the region around Scandinavia. The Dutch of late have been the world champions in height, and explanations such as recent selection and their high consumption of dairy products have been given. In this paper the authors point out that the people who live in the Dinaric uplands are not a population which consumes an inordinately high protein diet, at least in relation to their neighbors.

Rather, they suggest that the height of the people who reside in the Dinarics is due to a genetic factor. There is now good genomic evidence that selection accounts for at least some of the difference in height between Northern and Southern Europeans. That is, it seems that there have been divergent pressures in these two locales, their genetic differences due to historical demography aside.

The exception to this north-south gradient is obviously in the Dinarics. Another way in which the Dinarics are exceptional is that they have the highest frequency of Y chromosomal haplogroup I. The other mode of haplogroup I is in Scandinavia. I1 is common among people who live in Sweden, while I2 is common among the peoples of the western Balkans. I has an interesting history because the vast majority of Mesolithic hunter-gatherer males in Europe belong to this haplogroup. It is very rare outside of Europe. This is in contrast to the other major European haplogroups, which are found outside of Europe at appreciable frequencies.

It is likely that I is indicative of a lineage with roots in Europe which go back to the late Pleistocene period after the Last Glacial Maximum ~20,000 years ago. As the world warmed ~10,000 years ago small populations of hunter-gatherers rapidly expanded from their refuges and either most of the males were I, or in the drift process on the edge of the wave of advance I became very common. It is plausible that in terms of alleles which account for variation in height these hunter-gatherers were enriched for those conferring larger size. Cold weather populations tend to be larger. Additionally, they probably consumed a relatively diversified but high protein diet, allowing for greater median size than among farmers at the Malthusian carrying capacity.

But, there has been a lot of selection over the past 10,000 years, and I am skeptical that this correlation between I and height in Europe is anything but a coincidence. Rather, the phylogeny which I exhibits brings me to another issue which I think is not often highlighted: I1 in particular may have “hitchhiked” with the exogenous lineages such as R1b and R1a in early Indo-European society.

That is, in the patrilineal descent groups expanding across the landscape and monopolizing access to resources and mates, the non-invasive I somehow integrated themselves into the broader cultural complex, and partook in the plenty. Like R1b and R1a it exhibits a rake-like topology which suggests rapid recent expansion.

This would not be exceptional. The modern Russian state’s origins are in the polities created by the Kievan Rus, who were famously Scandinavian. Rurik was by origin a Swede, and his dynasty eventually came to encompass most of the eastern Slavic peoples, and rule over the Russian people and state until the 17th century. Because there were so many descendants of this dynasty it was possible to adduce its Y chromosomal haplogroup, N1c1. The kicker is that this is clearly a Finnic lineage, with the most recent evidence being that it is a remnant of a recent migration out of Siberia to the west. The implication here is that the direct male lineage of Rurik was assimilated into the Scandinavian culture and power structure, and his forebears were possibly chieftains of Finnic tribes somewhere along the Baltic littoral.

Another example is the House of Wessex. Alfred the Great is arguably the first true king of England. Here are the names of some of the earlier monarchs of the House of Wessex: Ceawlin, Cynric, and Cynegils. Even someone without a background in historical linguistics may be curious about whether these are Anglo-Saxon names, and there is a line of thinking that perhaps the forebears of Alfred were British warlords, who “went Saxon,” in a fashion analogous to Gallo-Roman aristocrats who assimilated to Frankish-Germanic norms and forms in the 6th and 7th centuries in the Merovingian domains.

Overall what you see in the genetic data are many things, but rarely a straightforward story. Just as genes can impact culture (e.g., lactase persistence), so culture impacts the distribution of genes. Just as human polities are coalitions, so genetic lineages themselves in their distribution and evolutionary history exhibit fingerprints of these past socio-political events and ideas.

Why only one migrant per generation keeps divergence at bay

The best thing about population genetics is that because it’s a way of thinking and modeling the world it can be quite versatile. It is a way to analyze the world rationally: thinking like a population geneticist allows you to have the big picture on the past, present, and future of life.

I have some personal knowledge of this as a transformative experience. My own background was in biochemistry before I became interested in population genetics as an outgrowth of my lifelong fascination with evolutionary biology. It’s not exactly useless knowing all the steps of the Krebs cycle, but it lacks in generality. I recall Isaac Asimov stating that one of the main benefits of his background as a biochemist was that he could rattle off the names on medicine bottles with fluency. Unless you are an active researcher in biochemistry your specialized knowledge is quite abstruse. Population genetics tends to be more applicable to general phenomena.

In a post below I made a comment about how one migrant per generation or so is sufficient to prevent divergence between two populations. This is an old heuristic which goes back to Sewall Wright, and is encapsulated in the formalism to the left, Fst ≈ 1/(4Nm + 1). Basically the divergence, as measured by Fst, is approximately the inverse of 4 times the proportion of migrants (m) times the total population (N), plus 1. The Nm term is equivalent to the number of migrants per generation (proportion times the total population). As Nm becomes very large, the Fst converges to zero.
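To get a feel for the numbers, here is a minimal Python sketch of that approximation. It assumes the standard island-model form Fst ≈ 1/(4Nm + 1); the function name is just for illustration.

```python
def fst_equilibrium(migrants_per_generation):
    """Wright's island-model approximation: Fst ~ 1 / (4Nm + 1)."""
    return 1.0 / (4.0 * migrants_per_generation + 1.0)


# Nm is the number of migrants per generation (proportion m times size N)
for nm in (0.1, 1, 10, 100):
    print(f"Nm = {nm:>5}: Fst ~ {fst_equilibrium(nm):.3f}")
```

With one migrant per generation the expected Fst is 0.2, and with ten it falls to roughly 0.02. Note that only the product Nm enters the formula, not N or m separately.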

The intuition is pretty simple. Imagine you have two populations which separate at a specific time. For example, sea level rises, so now you have a mainland and an island population. Since before sea level rise the two populations were one random mating population, their initial allele frequencies are the same at t = 0. But once they are separated random drift should begin to subject them to divergence, so that more and more of their genes exhibit differences in allele frequencies (ergo, Fst, the between population proportion of genetic variation, increases from 0).

Now add to this the parameter of migration. Why is one migrant per generation sufficient to keep divergence low? The two extreme scenarios are like so:

  1. Large populations change allele frequency very slowly due to drift, so only a small proportion of migration is needed to prevent them from diverging
  2. Small populations change allele frequency very fast due to drift, so a larger proportion of migration is needed to prevent them from drifting

Within a large population one migrant is a small proportion, but drift is occurring very slowly. Within a small population drift is occurring fast, but one migrant is a relatively large proportion of a small population.
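The two scenarios can be compared with a short deterministic sketch. It assumes a textbook island-model recursion in which drift raises Fst by an amount tied to 1/(2N) each generation and migration scales it back by (1 - m)^2. The population sizes and migration rates are hypothetical, chosen so that both cases receive one migrant per generation.

```python
def fst_trajectory(pop_size, migration_rate, generations):
    """Iterate an island-model recursion for Fst, starting from Fst = 0."""
    fst = 0.0
    drift = 1.0 / (2.0 * pop_size)  # per-generation push from drift
    for _ in range(generations):
        # drift increases Fst; migration pulls it back down by (1 - m)^2
        fst = (1.0 - migration_rate) ** 2 * (drift + (1.0 - drift) * fst)
    return fst


# Both populations receive N * m = 1 migrant per generation
for gens in (100, 200_000):
    small = fst_trajectory(pop_size=100, migration_rate=0.01, generations=gens)
    large = fst_trajectory(pop_size=10_000, migration_rate=0.0001, generations=gens)
    print(f"after {gens} generations: small = {small:.3f}, large = {large:.3f}")
```

The small population approaches its plateau within a hundred or so generations while the large one is still near zero, but given enough time both settle close to the same value of about 0.2. The speed of divergence differs; the long-run level under one migrant per generation does not.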

Obviously this is a stylized fact with many details which need elaborating. Some conservation geneticists believe that the focus on one migrant is wrongheaded, and the number should be set closer to 10 migrants.

But it still gets at a major intuition: gene flow is extremely powerful and effective at reducing differences between groups. This is why most geneticists are skeptical of sympatric speciation. Though the focus above is on drift, the same intuition applies to selective divergence. Gene flow between populations works at cross-purposes with selection which drives two groups toward different equilibrium frequencies.

This is why it was surprising when results showed that Mesolithic hunter-gatherers and farmers in Europe were extremely genetically distinct in close proximity for on the order of 1,000 years. That being said, strong genetic differentiation persists between Pygmy peoples and their agriculturalist neighbors, despite a long history of living near each other (Pygmies do not have their own indigenous languages, but speak the tongue of their farmer neighbors). In the context of animals physical separation is often necessary for divergence, but for humans cultural differences can enforce surprisingly strong taboos. Culture is as strong a phenomenon as mountains or rivers….

Sex bias in migration from the steppe (revisited)

Last fall I blogged a preprint which eventually came out as a paper in PNAS, Ancient X chromosomes reveal contrasting sex bias in Neolithic and Bronze Age Eurasian migrations. The upshot is that the authors found that there was far less steppe ancestry on the X chromosomes of Bronze Age Central Europeans than across the whole genome. The natural inference here is that you had migrations of males into territory where they had to find local wives.
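The logic of comparing the X chromosome to the whole genome can be sketched as follows. This is a simplified single-pulse expectation, not the authors' actual inference procedure: autosomes come equally from mothers and fathers, while two of every three X chromosomes in a population are carried by females. The input fractions are hypothetical.

```python
def expected_ancestry(male_fraction, female_fraction):
    """Expected steppe ancestry on the autosomes and on the X chromosome,
    given the steppe fraction among contributing males and females."""
    autosomal = (male_fraction + female_fraction) / 2.0
    # females carry two X chromosomes and males one, so the female
    # contribution counts double on the X
    x_chromosome = (male_fraction + 2.0 * female_fraction) / 3.0
    return autosomal, x_chromosome


# Hypothetical case: 80% of fathers but only 20% of mothers are from the steppe
autosomal, x_chromosome = expected_ancestry(0.8, 0.2)
print(f"autosomal: {autosomal:.2f}, X: {x_chromosome:.2f}")
```

In this made-up case the autosomes come out at 50% steppe and the X at about 40%. The X lagging behind the genome-wide figure is the kind of signal that points to male-biased migration, and with no sex bias the two numbers would match.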

But the story does not end there. Iosif Lazaridis and David Reich have put out a short note on biorxiv, Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe. It’s short, so I suggest you read the note yourself, but the major issue seems to be that on X chromosomes ADMIXTURE in supervised mode seems to behave really strangely. Lazaridis and Reich find that there seems to be a downward bias of steppe ancestry. Ergo, the finding was an artifact.

Goldberg et al. almost immediately responded, Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe. Their response seems to be that yes, ADMIXTURE does behave strangely, but the overall finding is still robust.

With these uncertainties I do wonder if it’s hard at this point to evaluate the alternative models. But, we do have archaeology and mtDNA. What do those say? On that basis, from what little I know, I am inclined to suspect a strong male bias of migration.

Citation: Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe, Amy Goldberg, Torsten Gunther, Noah A Rosenberg, Mattias Jakobsson
bioRxiv 122218; doi: https://doi.org/10.1101/122218

Citation: Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe, Iosif Lazaridis, David Reich, bioRxiv 114124; doi: https://doi.org/10.1101/114124

How a Eurasian “band of brothers” shaped the world


When I was eight years old I saw a map which genuinely confused me. I had opened up a deluxe dictionary at my elementary school and saw a map of the world’s language families, and noticed that there was a group of dialects which spanned the Bay of Bengal to the North Sea. In fact, according to this map the language I had first learned to speak, Bengali, was in the same language family as English.

This was hard to wrap my mind around, but there it was in front of me. Further research at the public library confirmed this fact. And, upon further reflection it was obvious to me there were similarities…I had been learning French at school, and English, Bengali, and French, all exhibited similarities in the first ten numbers. English and French I understood in terms of a natural relationship, but Bengali?

My personal and professional interests have never been in domains where I would explore the topic first hand, but the origins of Indo-European languages have always been a hobby. I read books on the topic when I could. When taking in other excellent works, the Indo-European thread was always something I kept in mind.

But those works take a more old-fashioned Eurasian heartland “marauders from the steppe” viewpoint. Starting about 15 years ago I began to look into a different framework: Indo-Europeans as farmers. For me it begins with the 2002 paper, Mapping the Origins and Expansion of the Indo-European Language Family, which finds that “the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago” (this is the last paper I can remember reading in paper format). The model is elaborated by Peter Bellwood in his own work, though he applies it to most language families.

But its origins go back decades, with the archaeologist Colin Renfrew. Rather than dramatic explosions from the steppe, Renfrew and colleagues suggest that the demographic expansion enabled by agriculture as a mode of production allowed for groups like Indo-Europeans to rapidly swamp their neighbors and enter into a process known as a wave of advance. There wasn’t an organized movement. Rather, farming enabled the growth of population to such an extent that it was almost an undirected thermodynamic law that the original farmers would radiate outward, away from zones at the Malthusian carrying capacity and out toward virgin land.

It was a parsimonious theory, and phylogenetic techniques seem to have supported it. But then came ancient DNA to overturn the apple-cart. I won’t rehash what you probably already know, but will point to the two most relevant papers, Massive migration from the steppe was a source for Indo-European languages in Europe and Population genomics of Bronze Age Eurasia. Basically there was massive population turnover during the early Bronze Age. The genetic data aligned well with predictions you’d make from the old “marauders from the steppe” model, not the demic diffusion of farmers who were subject to high endogenous population growth over time.

Of course the Anatolian model proponents have an answer. There is a thesis whereby the steppe pastoralists derive from Anatolians, and so the European population turnover was of one Indo-European group by another. This is possible, but to my knowledge this model was never foregrounded by Anatolianists before. Rather, it strikes me as a way to “save” their framework.

So far much of the battle has been between archaeologists, who tend to favor gradualism, and often even cultural diffusion as opposed to migration, and historical linguists and arriviste geneticists, who tend toward a more classical migration-from-the-steppe perspective.

A new paper in Antiquity takes the sledgehammer to the Anatolian hypothesis with an archaeology-first tack: Re-theorising mobility and the formation of culture and language among the Corded Ware Culture in Europe. They don’t pull punches:

…the Anatolian hypothesis must be considered largely falsified. Those Indo-European languages that later came to dominate in western Eurasia were those originating in the migrations from the Russian steppe during the third millennium BC.

Why would they say this? There is a major paper coming out:

These local processes of social integration between intruding Yamnaya/Corded Ware populations and remnant Neolithic populations can be applied to language dispersal. We should expect that the transformation from Proto-Indo-European to Pre-Proto Germanic would reveal the same kind of hybridisation between an earlier Neolithic language of the Funnel Beaker Culture, and the incoming Proto-Indo-European language. This is precisely what recent linguistic research has been able to demonstrate (Kroonen & Iversen in press). In their study on the formation of Proto-Germanic in Northern Europe, Kroonen and Iversen document a bundle of linguistic terms of non-Indo-European origin linked to agriculture that were adopted by Indo-European-speaking groups who were not fully fledged farmers.

They also contend that the Neolithic language was roughly the same throughout the zone of Indo-European expansion. From what those who would know about these sorts of things have told me this is plausible, because the Neolithic farmers spread so rapidly from a small founder culture, and exhibited broad Europe-wide similarities for a thousand years. Curiously, the chart shows that Germanic languages may have been influenced by a hunter-gatherer language, which the others were not. I suspect this may have to do with the relatively late persistence of hunter-gatherers in some maritime environments facing the Baltic and North Sea.

The paper, which is open access, needs to be read in full. Here are some important points:

  • Burial type seems to be a more robust form of indicator of dominant cultural identity
  • Corded Ware males practiced exogamy
  • Corded Ware males traveled long distances
  • Corded Ware culture was initially exclusively pastoralist
  • There is a great deal of circumstantial, and some genetic, evidence that Corded Ware communities were characterized by having women who were clearly from the Neolithic farming population
  • There was intergroup violence as a function of culture
  • The Corded Ware and Neolithic populations persisted near each other geographically, though the Neolithic groups seem to have retreated to uplands
  • The Corded Ware engaged in a wholesale pattern of landscape sculpting, burning down forests to produce pasture

Neolithic Y lineages, such as G2, are far rarer in Northern Europe today than R1a and R1b (in contrast, the hunter-gatherer I seems to have gone through an expansion just like R1a and R1b). We already have a model for what went on here, the Iberian settlement of the New World. Among mestizo populations there are huge skews of mtDNA and Y, with the former almost all Amerindian (with some African) and the latter almost all European (with some African).

The Corded Ware are the ancestors of the German peoples who we see emerge into the light of history during antiquity. What these data are telling us is that the Germans are the product of a massive period of biological and cultural amalgamation and synthesis between indigenous groups and intrusive populations from the steppe. The archaeological data indicate that the intrusion was male mediated. The “battle axe” culture probably lived up to its name. And they weren’t likely exceptional….

Why are so many of us “star-men”

Seven years ago I wrote 1 in 200 men direct descendants of Genghis Khan. It’s the most popular post I’ve ever written. As of now there have been 630,000 “sessions” (basically visits) on that page alone. I suspect that many more have read my summary of The Genetic Legacy of the Mongols, the original paper on which it was based, than that paper (though it’s a good paper, you should read it).

At the time I wrote that, people often asked me if I was a descendant of Genghis Khan. That seems unlikely on the paternal lineage. My Y chromosome is R1a1ab2-Z93. This is typically found in South Asia, and among Iranian peoples, as well as in the Altai region of western Mongolia. It is not common among Mongols though, even if it is found amongst them, likely due to gene flow from the west. The particular branch of R1a1a that I carry has been found in ancient remains from the Srubna culture of the eastern Pontic steppe. As a friend of mine might say, I am the scion of marauders from the steppe, even though not Genghiside ones. The fact that I have the last name Khan is simply a legacy of the custom whereby South Asian Muslim lineages of a particular status accrued the surname to denote their position within Islamicate civilization.

But though I am no direct descendant of Genghis, it turns out that my Y chromosome shares a similar history. The figure to the left is focused on European Y chromosomes, and at the top you see various “R” lineages. It turns out that R1b and R1a are both basically subject to the same explosive dynamics as the Genghis Khan haplotype: both exploded into star phylogenies relatively recently in time. Trees of the R1 lineages always show them to exhibit a rake-like pattern. This is due to the fact that starting from a small base they expanded so rapidly that they did not develop the intricate node-structure you see in lineages which accrued mutations at a more normal pace.

What could have caused such explosive growth? We know why Genghis Khan and his sons left so many descendants: conquest yielded social status. For many generations having a male Genghiside bloodline was highly effective as a means to gain bonus points when attempting to scale the summits of power and wealth. This was even true in the Muslim regions of Central Asia, despite Genghis Khan’s negative impact on Islamic civilization (Transoxiana arguably never recovered from this period).

We don’t have anything like the “Secret History of the Indo-Aryans” to explain the emergence of these older star phylogenies. David Anthony argues that mobile populations domesticated the horse, and used that as a killer cultural advantage to spread their Indo-European language. In his book from the 2000s Anthony argues for elite transmission of language by the Kurgan people. But more recently he has been persuaded by genetic work which suggests massive population displacements and migrations into Europe during the late Neolithic and early Bronze Age.

Unfortunately the timing doesn’t work from what I can tell. The expansion of groups like the Corded Ware seems to pre-date the emergence of the steppe chariot toolkit by many centuries. It does so happen that the chariot was invented in the region where R1a1a2b-Z93 was also found to exist. So I suspect this “Scythian” R1a lineage did sweep across much of Central-South Eurasia thanks to the horse and the wheel. But a technological explanation is more difficult for the rest.

I will posit another speculative answer, stealing the idea from Snorri Sturluson. He believed that the gods that were remembered by his pagan Norse ancestors were at one point men of great renown and fame. Kings of yore. Over time they had been deified, and legends had grown up around them. Sturluson may have been right. Perhaps the Indo-European gods recollect the forefathers of R1a and R1b. What was their advantage? Perhaps it was a hierarchical stratified social structure which brooked no individualism against the interests of the lineage unit? It may be that asabiyya is worth more than a chariot?

How Indians are a lot like Latin Americans


Pretty much any person of Indian subcontinental origin in the United States of a certain age who isn’t very dark skinned has probably had the experience of being spoken to in Spanish at some point. When I was younger growing up in Oregon I had the experience multiple times of Spanish speakers, probably Mexican, pleading with me to interpret for them because there was no one else who seemed likely. It isn’t a genius insight to conclude I was most likely South Asian…but it wasn’t out of the question I was Mexican. This applies even more to lighter skinned South Asians. In the Central Valley of California, where there are many Sikhs from Punjab and Mexicans, this confusion occurred a lot for some Indian kids.

Of course biogeographically there isn’t that much connection between South Asia and the New World. But it isn’t crazy that Christopher Columbus labelled the peoples of the New World “Indian.” After all, they were a brown-skinned people whose features were not African, East Asian, or West Eurasian. And, it turns out genetically there is a coincidence that connects the New World and South Asia: the mixed peoples of Latin America with Amerindian and European ancestry recapitulate an admixture which resembles what occurred in South Asia thousands of years ago. It looks as if about half the ancestry of South Asians is West Eurasian and half something more like eastern Eurasians.

On principal component analysis that means that South Asian and Mexican and Peruvian samples often overlap. This is somewhat curious because the non-West Eurasian ancestors of South Asians and Amerindians diverged in ancestry on the order of 25 to 45 thousand years before the present. And the Iberian ancestry of the mixed people of the New World is almost as far from the character of South Asian West Eurasian ancestry as you can get (in the parlance of this blog, lots of EEF, less CHG, not too much ANE).

A new paper, A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals, highlights another similarity: massive bias in biogeographic ancestry by sex. More precisely, the rank order of West Eurasian ancestry in South Asia is skewed like so: Y chromosome > whole-genome > mtDNA (as is evident in the above figure).
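
One way to see why this rank order is what a sex-biased dispersal predicts: under a deliberately simple single-pulse model (an assumption for illustration, not the paper's method), the Y chromosome tracks only the male founders, mtDNA only the female founders, and the autosomes average the two. The function name and the 80%/20% figures below are hypothetical.

```python
def expected_west_eurasian_share(male_share, female_share):
    """Expected West Eurasian fraction per marker after one admixture pulse.

    male_share / female_share: fraction of founding males / females who were
    West Eurasian (hypothetical inputs; drift and selection are ignored).
    """
    return {
        "Y chromosome": male_share,                      # passed father to son only
        "whole-genome": (male_share + female_share) / 2, # half from each parent
        "mtDNA": female_share,                           # passed through mothers only
    }


# Illustrative numbers only: a dispersal in which most incoming founders were male.
shares = expected_west_eurasian_share(male_share=0.8, female_share=0.2)
for marker, share in shares.items():
    print(f"{marker}: {share:.0%}")
```

Whenever the male share exceeds the female share, the whole-genome value lands strictly between the two, which is the ordering described above.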

I actually began writing about this in the late 2000s, when the fact that South Asian mtDNA was very different from West Eurasian mtDNA, and South Asian Y chromosome was mostly West Eurasian, was obvious. Then work using genome-wide data sets began to point to massive intra-Eurasian admixture between very diverged lineages. The paper is not revolutionary, but worth reading for its thoroughness and how it brings together all the lines of evidence.

Finally, no ancient DNA. That’s probably for the future, but I don’t expect any surprises.

Citation: A genetic chronology for the Indian Subcontinent points to heavily sex-biased dispersals.

Ancestry inference won’t tell you things you don’t care about (but could)

The figure above is from Noah Rosenberg’s relatively famous paper, Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure. The context of the publication is that it was one of the first prominent attempts to use genome-wide data on a variety of human populations (specifically, from the HGDP data set) and attempt model-based clustering. There are many details of the model, but the one that will jump out at you here is that the parameter K defines the number of putative ancestral populations you are hypothesizing. Individuals then shake out as proportions of each of the K elements. Remember, this is a model in a computer, and you select the parameters and the data. The output is not “wrong,” it’s just the output based on how you set up the program and the data you input yourself.

These sorts of computational frameworks are innocent, and may give strange results if you want to engage in mischief. For example, let’s say that you put in 200 individuals, of whom 95 are Chinese, 95 are Swedish, and 10 are Nigerian. From a variety of disciplines we know that, to a good approximation, non-Africans form a monophyletic clade in relation to Africans. In plain English, all non-Africans descend from a group of people who diverged from Africans more than 50,000 years ago. That means if you imagine two populations, the first division should be between Africans and non-Africans, to reflect this historical demography. But if you skew the sample size, as the program looks for the maximal amount of variation in the data set it may decide that dividing between Chinese and Swedes as the two ancestral populations is the most likely model given the data.

This is not wrong as such. As the number of Africans in the data converges on zero, obviously the dividing line is between Swedes and Chinese. If you overload particular populations within the data, you may marginalize the variation you’re trying to explore, and the history you’re trying to uncover.
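
A minimal sketch of the sample-size effect, swapping in a least-squares criterion for the model-based likelihood that the actual clustering programs use. It treats each population as a single point, uses made-up squared distances (Nigerians twice as far from Chinese and Swedes as those two are from each other), and asks which two-cluster split leaves the least within-cluster variation. All names and numbers are illustrative assumptions.

```python
from itertools import combinations

# Hypothetical squared distances between population centroids.
SQ_DIST = {
    frozenset({"Nigerian", "Chinese"}): 2.0,
    frozenset({"Nigerian", "Swedish"}): 2.0,
    frozenset({"Chinese", "Swedish"}): 1.0,
}


def lumping_cost(counts, a, b):
    """Within-cluster sum of squares created by forcing a and b into one cluster."""
    n_a, n_b = counts[a], counts[b]
    return n_a * n_b / (n_a + n_b) * SQ_DIST[frozenset({a, b})]


def best_two_way_split(counts):
    """Return (population given its own cluster, pair lumped together) at K = 2."""
    best = None
    for a, b in combinations(sorted(counts), 2):
        cost = lumping_cost(counts, a, b)
        alone = next(p for p in counts if p not in (a, b))
        if best is None or cost < best[0]:
            best = (cost, alone, (a, b))
    return best[1], best[2]


balanced = {"Chinese": 67, "Swedish": 67, "Nigerian": 66}
skewed = {"Chinese": 95, "Swedish": 95, "Nigerian": 10}

print("balanced:", best_two_way_split(balanced))
print("skewed:  ", best_two_way_split(skewed))
```

With roughly equal sample sizes the Nigerians get their own cluster, matching the historical demography. With the 95/95/10 sample the cheapest split instead separates Chinese from Swedes and lumps the ten Nigerians in with one of the two large samples, even though the distances themselves have not changed.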

I’ve written all of this before. But I’m writing this in context of the earlier post, Ancestry Inference Is Precise And Accurate(Ish). In that post I showed that consumers drive genomics firms to provide results where the grain of resolution and inference varies a lot as a function of space. That is, there is a demand that Northern Europe be divided very finely, while vast swaths of non-European continents are combined into one broad cluster.

Less than 5% Ancient North Eurasian

Another aspect though is time. These model-based admixture frameworks can implicitly traverse time as one ascends up and down the number of K’s. It is always important to explain to people that the number of K’s may not correspond to real populations which all existed at the same time. Rather, they’re just explanatory instruments which illustrate phylogenetic distance between individuals. In a well-balanced data set for humans K = 2 usually separates Africans from non-Africans, and K = 3 then separates West Eurasians from other populations. Going across K’s it is easy to imagine that one is traversing successive bifurcations.

A racially mixed man, 15% ANE, 30% CHG, 25% WHG, 30% EEF

But today we know it’s more complicated than that. Three years ago Pickrell et al. published Toward a new history and geography of human genes informed by ancient DNA, where they report the result that more powerful methods and data imply most human populations are relatively recent admixtures between extremely diverged lineages. What this means is that the origin of groups like Europeans and South Asians is very much like the origin of the mixed populations of the New World. Since then this insight has become only more powerful, as ancient DNA has shed light on massive population turnovers over the last 5,000 to 10,000 years.

These are to some extent revolutionary ideas, not well known even among the science press (which is too busy doing real journalism, i.e. the art of insinuation rather than illumination). As I indicated earlier direct-to-consumer genomics use national identities in their cluster labels because these are comprehensible to people. Similarly, they can’t very well tell Northern Europeans that they are an outcome of a successive series of admixtures between diverged lineages from the late Pleistocene down to the Bronze Age. Though Northern Europeans, like South Asians, Middle Easterners, Amerindians, and likely Sub-Saharan Africans and East Asians, are complex mixes between disparate branches of humanity, today we view them as indivisible units of understanding, to make sense of the patterns we see around us.

Personal genomics firms therefore give results which allow for historically comprehensible results. As a trivial example, the genomic data makes it rather clear that Ashkenazi Jews emerged in the last few thousand years via a process of admixture between antique Near Eastern Jews, and the peoples of Western Europe. After the initial admixture this group became an endogamous population, so that most Ashkenazi Jews share many common ancestors in the recent past with other Ashkenazi Jews. This is ideal for the clustering programs above, as Ashkenazi Jews almost always fit onto a particular K with ease. Assuming there are enough Ashkenazi Jews in your data set you will always be able to find the “Jewish cluster” as you increase the value of K.

But the selection of a K which satisfies this comprehensibility criterion is a matter of convenience, not necessity. Most people are vaguely aware that Jews emerged as a people at a particular point in history. In the case of Ashkenazi Jews they emerged rather late in history. At certain K’s Ashkenazi Jews exhibit mixed ancestral profiles, placing them between Europeans and Middle Eastern peoples. What this reflects is the earlier history of the ancestors of Ashkenazi Jews. But for most personal genomics companies this earlier history is not something that they want to address, because it doesn’t fit into the narrative that their particular consumers want to hear. People want to know if they are part-Jewish, not that they are part antique Middle Eastern and Southwest European.

Perplexment of course is not just for non-scientists. When Joe Pickrell’s TreeMix paper came out five years ago there was a strange signal of gene flow between Northern Europeans and Native Americans. There was no obvious explanation at the time…but now we know what was going on.

It turns out that Northern Europeans and Native Americans share common ancestry from Pleistocene Siberians. The relationship between Europeans and Native Americans has long been hinted at in results from other methods, but it took ancient DNA for us to conceptualize a model which would explain the patterns we were seeing.

An American with recent Amerindian (and probably African) ancestry

But in the context of the United States shared ancestry between Europeans and Native Americans is not particularly illuminating. Rather, what people want to know is if they exhibit signs of recent gene flow between these groups. In particular, many white Americans are curious if they have Native American heritage. They do not want to hear an explanation which involves the fusion of an East Asian population with Siberians that occurred 15,000 to 20,000 years ago, and then the emergence of Northern Europeans through successive amalgamations between Pleistocene, Neolithic, and Bronze Age Eurasians.

In some of the inference methods Northern Europeans, often those with Finnic ancestry or relationship to Finnic groups, may exhibit signs of ancestry from the “Native American” cluster. But this is almost always a function of circumpolar gene flow, as well as the aforementioned Pleistocene admixtures. One way to avoid this would be to simply not report proportions which are below 0.5%. That way, people with higher “Native American” fractions would receive the results, and the proportions would be high enough that it was almost certainly indicative of recent admixture, which is what people care about.
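
The reporting rule described above amounts to a simple filter. A sketch, assuming the estimated proportions arrive as a mapping from cluster label to fraction; the function name, labels, and example fractions are hypothetical, and 0.005 is the 0.5% cutoff mentioned above.

```python
def reportable(proportions, threshold=0.005):
    """Keep only clusters whose estimated share meets the reporting threshold.

    proportions: dict mapping a cluster label to a fraction between 0 and 1.
    threshold:   smallest share shown to the customer (0.005 = 0.5%).
    """
    return {label: share for label, share in proportions.items() if share >= threshold}


# Hypothetical customers: trace signal from deep shared ancestry versus a
# share large enough to suggest recent admixture.
finnish_customer = {"Northern European": 0.997, "Native American": 0.003}
american_customer = {"Northern European": 0.96, "Native American": 0.04}

print(reportable(finnish_customer))
print(reportable(american_customer))
```

The filter hides the trace "Native American" signal for the first customer and keeps it for the second, which is the distinction people actually care about.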

Why am I telling you this? Because many journalists who report on direct-to-consumer genomics don’t understand the science well enough to grasp what’s being sold to the consumer (frankly, most biologists don’t know this field well either, even if they might use a barplot here and there).

And, the reality is that consumers have very specific parameters of what they want in terms of geographic and temporal information. They don’t want to be told true but trivial facts (e.g., they are Northern European). But neither do they want to know things which are so novel and at far remove from their interpretative frameworks that they simply can’t digest them (e.g., that Northern Europeans are a recent population construction which threads together very distinct strands with divergent deep time histories). In the parlance of cognitive anthropology consumers want their infotainment the way they want their religion, minimally counterintuitive. Consume some surprise. But not too much.