The misrepresentation of genetic science in the Vox piece on race and IQ

I don’t have the time or inclination to do a detailed analysis of this piece in Vox, “Charles Murray is once again peddling junk science about race and IQ.” Most people really don’t care about the details, so what’s the point?

But in a long piece one section jumped out to me in particular because it is false:

Murray talks about advances in population genetics as if they have validated modern racial groups. In reality, the racial groups used in the US — white, black, Hispanic, Asian — are such a poor proxy for underlying genetic ancestry that no self-respecting statistical geneticist would undertake a study based only on self-identified racial category as a proxy for genetic ancestry measured from DNA.

Obviously the Census categories are pretty bad and not optimal (e.g., the “Asian American” category pools South with East & Southeast Asians, and that has caused issues in biomedical research in the past). But the claim is false. In the first half of the 2000s the eminent statistical geneticist Neil Risch specifically addressed this issue. From 2002 in Genome Biology Categorization of humans in biomedical research: genes, race and disease:

A debate has arisen regarding the validity of racial/ethnic categories for biomedical and genetic research. Some claim ‘no biological basis for race’ while others advocate a ‘race-neutral’ approach, using genetic clustering rather than self-identified ethnicity for human genetic categorization. We provide an epidemiologic perspective on the issue of human categorization in biomedical and genetic research that strongly supports the continued use of self-identified race and ethnicity.

A major discussion has arisen recently regarding optimal strategies for categorizing humans, especially in the United States, for the purpose of biomedical research, both etiologic and pharmaceutical. Clearly it is important to know whether particular individuals within the population are more susceptible to particular diseases or most likely to benefit from certain therapeutic interventions. The focus of the dialogue has been the relative merit of the concept of ‘race’ or ‘ethnicity’, especially from the genetic perspective. For example, a recent editorial in the New England Journal of Medicine [1] claimed that “race is biologically meaningless” and warned that “instruction in medical genetics should emphasize the fallacy of race as a scientific concept and the dangers inherent in practicing race-based medicine.” In support of this perspective, a recent article in Nature Genetics [2] purported to find that “commonly used ethnic labels are both insufficient and inaccurate representations of inferred genetic clusters.” Furthermore, a supporting editorial in the same issue [3] concluded that “population clusters identified by genotype analysis seem to be more informative than those identified by skin color or self-declaration of ‘race’.” These conclusions seem consistent with the claim that “there is no biological basis for ‘race'” [3] and that “the myth of major genetic differences across ‘races’ is nonetheless worth dismissing with genetic evidence” [4]. Of course, the use of the term “major” leaves the door open for possible differences but a priori limits any potential significance of such differences.

In our view, much of this discussion does not derive from an objective scientific perspective. This is understandable, given both historic and current inequities based on perceived racial or ethnic identities, both in the US and around the world, and the resulting sensitivities in such debates. Nonetheless, we demonstrate here that from both an objective and scientific (genetic and epidemiologic) perspective there is great validity in racial/ethnic self-categorizations, both from the research and public policy points of view.

From a 2005 interview:

Gitschier: Let’s talk about the former, the genetic basis of race. As you know, I went to a session for the press at the ASHG [American Society for Human Genetics] meeting in Toronto, and the first words out of the mouth of the first speaker were “Genome variation research does not support the existence of human races.”

Risch: What is your definition of races? If you define it a certain way, maybe that’s a valid statement. There is obviously still disagreement.

Gitschier: But how can there still be disagreement?

Risch: Scientists always disagree! A lot of the problem is terminology. I’m not even sure what race means, people use it in many different ways.

In our own studies, to avoid coming up with our own definition of race, we tend to use the definition others have employed, for example, the US census definition of race. There is also the concept of the major geographical structuring that exists in human populations—continental divisions—which has led to genetic differentiation. But if you expect absolute precision in any of these definitions, you can undermine any definitional system. Any category you come up with is going to be imperfect, but that doesn’t preclude you from using it or the fact that it has utility.

We talk about the prejudicial aspect of this. If you demand that kind of accuracy, then one could make the same arguments about sex and age!

You’ll like this. In a recent study, when we looked at the correlation between genetic structure [based on microsatellite markers] versus self-description, we found 99.9% concordance between the two. We actually had a higher discordance rate between self-reported sex and markers on the X chromosome! So you could argue that sex is also a problematic category. And there are differences between sex and gender; self-identification may not be correlated with biology perfectly. And there is sexism. And you can talk about age the same way. A person’s chronological age does not correspond perfectly with his biological age for a variety of reasons, both inherited and non-inherited. Perhaps just using someone’s actual birth year is not a very good way of measuring age. Does that mean we should throw it out? No. Also, there is ageism—prejudice related to age in our society. A lot of these arguments, which have a political or social aspect to them, can be made about all categories, not just the race/ethnicity one.

Risch is not obscure. In the piece the author observes that Risch “was described by one of the field’s founding fathers as ‘the statistical geneticist of our time.’”

2005 is a long way from 2017. Risch may have changed his mind. In fact, it is probably best for him and his reputation if he has changed his mind. I wouldn’t be surprised if Risch comes out and engages in a struggle session where he disavows his copious output from 2005 and earlier defending the utilization of race as a concept in statistical genetics.

Also, genotyping is cheap enough and precise enough that one might actually make an argument for leaving off any self-reported ancestry questions. It’s really not necessary. This isn’t 2005.

But that section in the Vox piece is simply false. The existence of Risch refutes it. Vox is a high profile website which serves to “explain” things to people. The academics who co-wrote that piece are very smart, prominent, and known to me. I don’t plan on asking them why they put that section in there. I think I know why.

There will be no update to that piece I’m sure. It will be cited widely. It will become part of what “we” all know. Who am I to disagree with Vox? This is journalism, from what I have been able to gather and understand. The founders of Vox are rich and famous now. Incentives matter. There are great journalists out there who don’t misrepresent topics which I know well. But the incentive structure is not to reward this. More often storytellers who tell you the story you like to be told are rewarded.

As for science and the academy? I am frankly too depressed to say more.

The population genetic structure of Sicily and Greece

By total coincidence a paper came out yesterday, Ancient and recent admixture layers in Sicily and Southern Italy trace multiple migration routes along the Mediterranean (I blogged about the topic). It’s open access, and it has a lot of statistics and analyses. I’d recommend you read it yourself.

You see the Sicilian and Greek populations and their skew toward the eastern Mediterranean. But in the supplements they displayed some fineSTRUCTURE clustering, and at K = 3 you see that Europe and the Middle East diverge into three populations. What this is showing seems to be: 1) in red, those groups least impacted by post-Neolithic migration 2) in blue, Middle Eastern groups characterized by the fusion between western & eastern Middle Eastern farmers which occurred after the movement west of the ancestors of the “Early European Farmers” (who gave rise to the red cluster), who were related to the western Middle Eastern farmers 3) the groups most impacted by Pontic steppe migration.

The authors confirm what I reported over two years ago on this blog: mainland and island Greeks are genetically distinct, probably because the former have recent admixture from Slavs and Slav-influenced people. And, many Southern Italians resemble island Greeks.

One has to be careful about dates inferred from genetic patterns. For example:

Significant admixture events successfully dated by ALDER reveal that all Southern Italian and Balkan groups received contributions from populations bearing a Continental European ancestry between 3.0 and 1.5 kya

The beginning of the folk wanderings in the Balkans, which reshaped its ethnographic landscape, really dates to the later 6th century, when the proto-Byzantines began to divert all their resources to the eastern front with Persia, and abandoned the hinterlands beyond the Mediterranean coast in Europe to shift their focus toward the Anatolian core of the empire. The Slavic migrations were such that there were tribes resident in the area of Sparta in the early medieval period. Presumably because they were not a seafaring folk they don’t seem to have had much impact on the islands.

Such an early period in the interval, though, cannot be the Slavs. What can it be? I suspect that there are signals of Indo-European migrations in there that are being conflated due to low power to detect them, since they are rather modest in demographic impact. The islands such as Sardinia, Crete and Cyprus had non-Indo-European speakers down to the Classical period.
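To put rough numbers on why the early end of that interval is a problem for a Slavic explanation, here is a minimal Python sketch. It assumes "present" means 2017 and uses 575 AD as a stand-in for "the later 6th century"; both values are my assumptions for illustration, not figures from the paper.

```python
# Assumptions (not from the paper): "present" is taken as 2017, and
# 575 AD stands in for "the later 6th century".
PRESENT_YEAR = 2017
SLAVIC_MIGRATION_YEAR = 575


def kya_to_calendar_year(kya, present=PRESENT_YEAR):
    """Convert thousands of years ago to an approximate calendar year.

    Negative results are years BC; the missing year zero is ignored.
    """
    return present - round(kya * 1000)


# ALDER interval quoted above: 3.0 to 1.5 kya.
oldest = kya_to_calendar_year(3.0)
youngest = kya_to_calendar_year(1.5)

# Distance in years from each edge of the interval to the Slavic migrations.
gap_to_oldest = SLAVIC_MIGRATION_YEAR - oldest
gap_to_youngest = SLAVIC_MIGRATION_YEAR - youngest

print(f"ALDER interval: {oldest} to {youngest} (negative = BC)")
print(f"Years between old edge and Slavic migrations: {gap_to_oldest}")
print(f"Years between young edge and Slavic migrations: {gap_to_youngest}")
```

Under those assumptions the young edge of the interval sits within a few decades of the Slavic migrations, while the old edge predates them by more than 1,500 years.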

Overall it’s an interesting paper. But it needs a deeper dig than I have time for right now.

The Orontes has not mixed much with the Tiber

In a moment of weakness I decided to read some of Mary Beard’s SPQR: A History of Ancient Rome. I say weakness because I want to wean myself off of excessive reading of Roman history, as in terms of inferential utility I’ve long reached diminishing returns. But I quite enjoy the topic, and so here I am.

The author is an excellent writer as well as a scholar, and I quite enjoyed Roman Triumph, so I am not at all surprised that SPQR has me hooked. Some of my correspondents have exhibited some disdain toward it because of Beard’s attempts to draw some connections to present day mores and values from those of Rome, presumably with a progressive bent.

Myself, this does not bother me. I don’t come into reading about Rome as an ignoramus, so I can sort that from the nuggets of fact and positivistic interpretation. In any case, I think of it rather like how Islamic philosophers viewed Aristotle through their own religio-cultural lens. Obviously this was an issue that caused resistance to the transmission of Aristotle to the Christian West, but ultimately it did not stop what was inevitable. At the end of the day it was more about Aristotle than the glosses.

Though I highly recommend SPQR (I’m halfway through), that’s not the point of this post. Going along I kept thinking about the section on the Etruscans. The Rasena. Their origins have a genetic connection that is clouded and uncertain right now. I would like to dig deeper into this issue in the future; no doubt some day it will be cleared up. But that day is not this day.

Modern Italians have more “Indo-European” admixture than they do “Middle Eastern”

Rather, I want to address the idea that modern Italians are genetically a distinct people from ancient Roman Italians. Because on that score we have the answers. Ultimately the idea that this is even a debate goes back to Juvenal:

It is that the city is become Greek, Quirites, that I cannot tolerate; and yet how small the proportion even of the dregs of Greece! Syrian Orontes has long since flowed into the Tiber, and brought with it its language, morals, and the crooked harps with the flute-player, and its national tambourines, and girls made to stand for hire at the Circus. Go thither, you who fancy a barbarian harlot with embroidered turban….

These comments are rooted in the reality that Rome during Juvenal’s period was quite a cosmopolitan city, with large numbers of Greeks and people from the Eastern Mediterranean who were Hellenized to various degrees (in the early 3rd century Rome was ruled by a family of Hellenized Syrians). We know this because we have plenty of observations and complaints, and there are a plethora of inscriptions and graffiti in the new languages.

In the 19th and early 20th century the ascendency of Nordic racial theories about the origins of white supremacy across the world presented a problem. The Mediterranean peoples had been in decline for centuries, and were perceived to be Orientalized and inferior. Yet in the past they had achieved greatness which Northern Europeans were attempting to emulate. How could a racially inferior people have created such excellence?

A simple explanation for this condition for Victorians and their Continental fellow travelers was one of racial degradation. The ancient Romans were in this telling fundamentally a different people than modern Romans, with the latter being derived from migrants from the eastern Mediterranean who had arrived during the period of the Empire.

Though most of the racially derogatory elements are gone from this narrative, it is still strongly persistent in public consciousness. Being a Cavalli-Sforza nerd (there is such a thing), I have a copy of Consanguinity, Inbreeding, and Genetic Drift in Italy, and there was data in it which made me skeptical of wholesale replacement in the middle 2000s. Then there was Peter Ralph and Graham Coop’s 2013 paper, The Geography of Recent Genetic Ancestry across Europe, which reported lots of deep regional structure across Italy.

This is important because it suggests a local stability to the demographic character of the regions for a long time. Probably earlier than the period of the Roman Empire. Though one can imagine scenarios of demographic replacement which would produce this result, they’re generally less parsimonious than the model whereby modern Italian population structure maintains the general outline it had at the beginning of the Iron age.

Finally, over the past seven years I have done a lot of analysis and manipulation of tens of thousands of Europeans and Middle Easterners in relation to their genetic data for personal and professional reasons. Some patterns jump out at you, and some subtle tendencies come into the foreground. It is pretty clear that Italians are not a transplanted Middle Eastern population (though there is some recent non-Italian ancestry; Sicilians often have minor components of clear North African ancestry as well as small percentages of Sub-Saharan heritage, which I think is almost certainly due not to Greek and Roman cosmopolitanism, but the legacy of the Arab emirate which existed on the island for a few centuries).

But now I have realized probably the best illustration of this. The Reich lab has been generating a massive genotype dataset over the past five years on the Human Origins Array. And not only do you have modern populations, but you have ancient ones (from ancient DNA). The PCA plots in their papers make what I’m saying above pretty clear.

I’ve modified the PCA plot from Genomic insights into the origin of farming in the ancient Near East. Notice where various Italian groups and Greeks are. I’ve also labeled the Druze; they are almost certainly an excellent representation of Near Eastern Syrians from 2,000 years ago. They have been endogamous for nearly 1,000 years in the Lebanese highlands, and don’t have admixture that is more common in Syrian Muslims from the lowlands.

Notice that most of the Greeks are shifted further toward Northwestern Europeans than Southern Italians. I say most, because I’ve had access to a larger data set of Greeks, and it becomes clear that a minority of Greeks cluster more with Southern Italians, and the majority have a minority admixture element from a Northern European population. This is Slavic ancestry that arrived after the middle of the 6th century, when the East Roman state basically abandoned most of the Balkans to focus on maintaining control over Constantinople, Salonika, and the Peloponnese.

Northern Italians are shifted toward Sardinians and Spaniards. The Sardinians are important, because we now know that they are the closest modern Europeans to the agriculturalists who arrived from the eastern Mediterranean during the early Neolithic. This population, “Early European Farmers” (EEF), once dominated most of the continent. But ~5,000 years ago migrations from the steppe brought a new element which replaced and assimilated them in Northern Europe.

But in Southern Europe their genetic legacy remains strong and to a great extent dominant. Iberia and the Italian peninsula have been impacted by the migrations out of the steppe, with Sardinia the least so. In the smaller plot above you can see that the early Neolithic individuals are close to the Sardinians, with mainland Italians being shifted toward other populations.

The Northern Italians in particular show some influence from Northern European populations. Some of this may be gene flow through diffusion due to proximity, but the Alps are a rather formidable barrier. Rather, I suspect it reflects episodic migration. I generally do not weight the Lombards too highly as a major influence. Rather, I suspect that it is a combination of Gaulish settlement in the Po river valley, and early impacts from the Indo-Europeans who arrived in the Italian peninsula.

The Southern Italian shift toward the Middle East probably does indicate some gene flow, but it is important to remember that this was also Magna Graecia, so there is probably a Greek element here similar to what occurs among those Greeks without Slavic admixture (please note that Byzantine Greek rule also persisted in Southern Italy up until the Norman conquest). And if you look at how they relate to the Neolithic samples, they exhibit a lot of shift on the plane toward the steppe populations, parallel to the Levantines. In other words, a lot of the change since the Neolithic in Southern Italy is attributable to the influence of the steppe migration, not Roman era gene flow from Syrians.

I will probably do some formal analysis at some point so that the numbers can get out there now that there are so many ancient genotypes available too. But really this shouldn’t be a discussion anymore.

Addendum: You may be asking, if there are so many literary comments about non-Italians during the Roman Empire in Italy, where did they go? I think the big thing to remember is that there is an ascertainment bias toward what we know in urban areas. There is a high likelihood that urban areas were population sinks, which could not maintain themselves without constant migration.

Beyond “Out of Africa” and multiregionalism: a new synthesis?

For several decades before the present era there have been debates between proponents of the recent African origin of modern humans, and the multiregionalist model. Though molecular methods in a genetic framework have come to the fore of late, these were originally paleontological theories, with Chris Stringer and Milford Wolpoff being the two most prominent public exponents of the respective paradigms.

Oftentimes the debate got quite heated. If you read books from the 1990s, when multiregionalism in particular was on the defensive, there were arguments that the recent out of Africa model was more inspirational in regards to our common humanity. As a riposte the multiregionalists asserted that those suggesting recent African origins with total replacement were saying that our species came into being through genocide.

Though some had long warned against this, the dominant perception outside of population genetics was that results such as the “mitochondrial Eve” had given strong support to the recent African origin of modern humans, to the exclusion of other ancestry. 2002’s Dawn of Human Culture took it for granted that the recent African origin of modern humans to the total exclusion of other hominin lineages was established fact.

In 2008 I went to a talk where Svante Paabo presented some recent Neanderthal ancient mtDNA work. It was rather ho-hum, as Paabo showed that the Neanderthal lineages were highly diverged from modern ones, and did not leave any descendants. Though of course most modern human lineages did not leave any descendants from that period, Paabo took this as evidence supporting the proposition that Neanderthals did not contribute to the modern human gene pool.

When his lab reported autosomal Neanderthal admixture in 2010, it was after initial skepticism and shock internally. I know Milford Wolpoff felt vindicated, while Chris Stringer began to emphasize that the recent African origin of modern humanity also was defined by regional assimilation of other lineages. The data have ultimately converged to a position somewhere between the extreme models of total replacement or balanced and symmetrical gene flow.

This is not surprising. Extreme positions are often rhetorically useful and popular when there’s no data. But reality does not usually conform to our prejudices, so ultimately one has to come down at some point.

The data for non-Africans is rather unequivocal. The vast majority (>90%) of the ancestry of non-Africans seems to go back to a small number of common ancestors ~60,000 years ago. Perhaps in the range of ~1,000 individuals. These individuals seem to be a node within a phylogenetic tree where all the other branches are occupied by African populations. Between this period and ~15,000 years ago these non-Africans underwent a massive range expansion, until modern humans were present on all continents except Antarctica. Additionally, after the Holocene some of these non-African groups also experienced huge population growth due to intensive agricultural practice.

To give a sense of what I’m getting at, the bottleneck and common ancestry of non-Africans goes back ~60,000 years, but the shared ancestry of Khoisan peoples and non-Khoisan peoples goes back ~150,000-200,000 years. A major lacuna of the current discussion is that often the dynamics which characterize non-Africans are assumed to be applicable to Africans. But they are not.

A 2014 paper illustrates one major difference by inferring effective population from whole genomes: African populations have not gone through the major bottleneck which is imprinted on the genomes of all non-African populations. The Khoisan peoples, the most famous of which are the Bushmen of the Kalahari, have the largest long term effective populations of any human group. The Yoruba people of Nigeria have a history where they were subject to some population decline, but not to the same extent as non-Africans.
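One way to see why a bottleneck leaves such a lasting imprint: long-term effective population size behaves roughly like the harmonic mean of the per-generation sizes, so a handful of small generations drags the long-term value down sharply. The Python sketch below uses made-up population sizes purely to illustrate that effect; none of the numbers are estimates from the 2014 paper or from any other study.

```python
from statistics import harmonic_mean, mean

# Hypothetical per-generation population sizes, chosen only to illustrate
# the effect; they are not estimates from any paper.
stable = [10_000] * 100                       # no bottleneck
bottlenecked = [10_000] * 90 + [1_000] * 10   # 10 generations at 1,000

# Long-term effective size is approximated here by the harmonic mean.
ne_stable = harmonic_mean(stable)
ne_bottlenecked = harmonic_mean(bottlenecked)

# The ordinary (arithmetic) mean barely notices the bottleneck.
arithmetic_bottlenecked = mean(bottlenecked)

print(f"Stable population, harmonic mean:   {ne_stable:,.0f}")
print(f"Bottlenecked, arithmetic mean:      {arithmetic_bottlenecked:,.0f}")
print(f"Bottlenecked, harmonic mean:        {ne_bottlenecked:,.0f}")
```

In this toy case ten generations at a tenth of the usual size pull the harmonic mean down to roughly 5,263, even though the arithmetic mean is still 9,100.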

What do we take away from this?

One thing is that we have to consider that the assimilationist model which seems to be necessary for non-Africans, also applies to Africans. For years some geneticists have been arguing that some proportion of African ancestry as well is derived from lineages outside of the main line leading up to anatomically modern humans. Without the smoking gun of ancient genomes this will probably remain a speculative hypothesis. I hope that Lee Berger’s recent assertion that they’ve now dated Homo naledi to ~250,000 years before the present may offer up the possibility that ancient DNA will help resolve the question of African archaic admixture (i.e., if naledi is related to the “ghost population”?).

The second dynamic is that the bottleneck-then-range-expansion which is so important in defining the recent prehistory of non-Africans is not as relevant to Africans during the Pleistocene. The very deep split dates being inferred from whole genome analysis of African populations makes me wonder if multiregional evolution is actually much more important within Africa in the development of modern humans in the last few hundred thousand years. Basically, the deep split dates may highlight that there was recurrent gene flow over hundreds of thousands of years between different closely related hominin populations in Africa.

Ultimately, it doesn’t seem entirely surprising that the “Out of Africa” model does not quite apply within Africa.

Addendum: Over the past ~5,000 years we have seen the massive expansion of agricultural populations within the continent. The “deep structure” therefore may have been erased to a great extent, with Pygmies, Khoisan, and Hadza being the tip of the iceberg in terms of the genetic variation which had characterized Africa during the Pleistocene.

“Out of Africa” bottleneck is what really matters for mutations

At least in relation to mutational load, if you read a new preprint in biorxiv, The demographic history and mutational load of African hunter-gatherers and farmers:

The distribution of deleterious genetic variation across human populations is a key issue in evolutionary biology and medical genetics. However, the impact of different modes of subsistence on recent changes in population size, patterns of gene flow, and deleterious mutational load remains to be fully characterized. We addressed this question, by generating 300 high-coverage exome sequences from various populations of rainforest hunter-gatherers and neighboring farmers from the western and eastern parts of the central African equatorial rainforest. We show here, by model-based demographic inference, that the effective population size of African populations remained fairly constant until recent millennia, during which the populations of rainforest hunter-gatherers have experienced a ~75% collapse and those of farmers a mild expansion, accompanied by a marked increase in gene flow between them. Despite these contrasting demographic patterns, African populations display limited differences in the estimated distribution of fitness effects of new nonsynonymous mutations, consistent with purifying selection against deleterious alleles of similar efficiency in the different populations. This situation contrasts with that we detect in Europeans, which are subject to weaker purifying selection than African populations. Furthermore, the per-individual mutation load of rainforest hunter-gatherers was found to be similar to that of farmers, under both additive and recessive modes of inheritance. Together, our results indicate that differences in the subsistence patterns and demographic regimes of African populations have not resulted in large differences in mutational burden, and highlight the role of gene flow in reshaping the distribution of deleterious genetic variation across human populations.

There are two major moving parts in this preprint. First, they used phylogenomic methods to explicitly model population history. Second, they integrated their demographic results in generating and interpreting the distribution of mutations within the exomes of these populations. That is, they used phylogenomics to gain insight into population genomics, as the latter focuses more on the parameters which define variation within a population.

The data they worked with was from the exome, the regions of the genome which translate into genes. That’s ~30 million bases. They get really good precision due to high coverage, hitting each site about 70 times. Their sample was about 300 Africans and 100 Europeans, and they got ~500,000 polymorphisms or variants for their trouble.
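For a sense of scale, here is a back-of-the-envelope Python calculation using only the approximate figures quoted in the paragraph above. It assumes coverage is uniform across sites and samples, which real sequencing data never quite is, so treat the outputs as orders of magnitude rather than exact values.

```python
# Approximate figures quoted in the post.
EXOME_BASES = 30_000_000   # ~30 million bases of exome
COVERAGE = 70              # each site read about 70 times
N_SAMPLES = 300 + 100      # ~300 Africans plus ~100 Europeans
N_VARIANTS = 500_000       # ~500,000 polymorphisms

# Assumption: coverage is uniform across all sites and all samples.
bases_read_per_sample = EXOME_BASES * COVERAGE
bases_read_total = bases_read_per_sample * N_SAMPLES

# Average spacing between variable sites in the exome.
bases_per_variant = EXOME_BASES / N_VARIANTS

print(f"Bases read per sample: {bases_read_per_sample:,}")
print(f"Bases read in total:   {bases_read_total:,}")
print(f"One variant every ~{bases_per_variant:.0f} bases")
```

So that is on the order of two billion bases read per individual, and roughly one variable site for every 60 bases of exome across the whole sample.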

The populations were labeled by subsistence and provenance. The Europeans were Belgians. For the Africans they had two groups of hunter-gatherer Pygmies, and two groups of Bantu agriculturalists, sampled from western and eastern locations as you see on the map above.

The admixture plots, which separate out individuals into K numbers of populations, break out in a way that makes sense. First, Europeans separate, and the eastern agriculturalist populations have a little bit of evidence of European-like ancestry. This is almost certainly Middle Eastern farmer, which has been found in many East African populations, and those populations which have mixed with them. Then the hunter-gatherers separate from the agriculturalists. This is in line with expectation and earlier research; the hunter-gatherers of Africa seem very different from the agriculturalists, and are actually more closely related to each other than the agriculturalists in their neighboring regions.

The exception to this pattern is caused by recent gene flow, which is clearly evident above. Due to population size differences it looks like there is more agricultural ancestry in the Pygmies than vice versa. I wish that they had sampled Mbuti Pygmies. I’m told that this group has the least agricultural admixture.

But then they decided to get fancy and explicitly model demographic histories with fastsimcoal2. What does this do? From the website for the software:

While preserving all the simulation flexibility of simcoal2, fastsimcoal is now implemented under a faster continous-time sequential Markovian coalescent approximation, allowing it to efficiently generate genetic diversity for different types of markers along large genomic regions, for both present or ancient samples. It includes a parameter sampler allowing its integration into Bayesian or likelihood parameter estimation procedure.

fastsimcoal can handle very complex evolutionary scenarios including an arbitrary migration matrix between samples, historical events allowing for population resize, population fusion and fission, admixture events, changes in migration matrix, or changes in population growth rates. The time of sampling can be specified independently for each sample, allowing for serial sampling in the same or in different populations.

The models you see that were tested are pretty simple, and they all seem plausible I suppose. Their simulations suggested that the three above scenarios, with alternative branching patterns and various gene flows, were all of equal likelihood. That is, given the models and the data that they had (4-fold synonymous sites which are likely to be neutral) you can’t distinguish which is right.

In all the models hunter-gatherers diverged relatively recently and so did the agriculturalists. Europeans, who are stand-ins for all non-Africans in this scenario, diverged pretty early from the Africans. But how the Africans relate to each other and Europeans is not totally clear. Why? Because of ancient population structure. It is becoming rather obvious now that ~100,000 years ago, and earlier, there were many different modern human lineages which had already diversified. The Khoisan seem to have diverged from other human lineages closer to 200,000 than 100,000 years ago. What this means is that for most of the history of anatomically modern humans population structure existed between distinct lineages. And some of that persists down to today within Africa.

I’ll bullet point some of their inferences from these models (verbatim quotes below):

  1. Our results suggest that the ancestors of the contemporary RHG, AGR and EUR populations diverged between 85 and 140 thousand years ago (kya), from an ancestral population that underwent demographic expansion between 173 and 191 kya
  2. After the initial population splits, the Ne of AGR and RHG (NaAGR and NaRHG) remained within a range extending from 0.55 to 2.2 times the ancestral African Ne (NHUM), whereas EUR (NaEUR) experienced a decrease in Ne by a factor of three to seven.
  3. The ancestors of the wRHG and eRHG populations diverged 18 to 20 kya (TRHG), and underwent a decreased in Ne by a factor of 3.8 to 5.7 for the wRHG (NwRHG) and 7.1 to 11 for the eRHG (NeRHG), regardless of the branching model considered.
  4. The ancestors of the AGR (NaAGR) split into western and eastern populations 6.7 to 11 kya (TAGR), and underwent a mild expansion, by a factor of 2.3 to 3.1 for the wAGR (NwAGR) and 1.2 to 2.2 for the eAGR (NeAGR).
  5. The EUR population experienced a 7.1- to 8.3-fold expansion (NEUR) 12 to 22 kya (TEUR).

No results are perfect. But some of these dates do not make sense. There’s a lot of circumstantial evidence that the ancestors of European populations began to expand over the last 10,000 years. The dates above suggest there was a Pleistocene expansion. Basically, if you halve those values you get a reasonable range.

Second, both the agriculturalists sampled here are Bantu speaking, and there’s a good amount of cultural and genetic data for recent shared ancestry of the Bantu over the last 3,000 years. I understand that admixture with a very diverged lineage (e.g., eastern Bantu agriculturalist samples mixing with Nilotic populations, which is how they got some non-African ancestry, as well as local Pygmy groups) can inflate these divergence dates. If that’s the case, they should note that in the text.

We don’t have much historical or archaeological clarity from what I know about divergences between Pygmy groups. This particular group has studied the topic and published on it before, so I’m inclined to trust them more than anyone else. But, the above dates for groups we do know make me a bit more skeptical of a simple divergence around the Last Glacial Maximum.

Then there are the earliest divergences. An 85,000 to 140,000 year interval is huge for when non-Africans split off from Africans. If closer to 140,000 than 85,000, then that means that non-African divergence from Africans preserves ancient African diversity. That is, non-Africans descend from an African group that no longer exists (or has not been sampled in this study at least!). I’ve poked around this question, and when you take into account recent gene flow, it is hard to find the specific African group that non-Africans descend from, though there is some consensus that they branched off from the non-Khoisan Africans later than from the Khoisan.

But there is also a lot of archaeological and some ancient DNA evidence now that indicates that the vast majority of non-African ancestry began to expand rapidly around 50-60,000 years ago. This is tens of thousands of years after the lowest value given above. Therefore, again we have to make recourse to a long period of separation before the expansion. This is not implausible on the face of it, but we could do something else: just assume there’s an artifact with their methods and the inferred date of divergence is too old. That would solve many of the issues.

I really don’t know if the above quibbles have any ramifications for the site frequency spectrum of deleterious mutations. My own hunch is that no, they don’t impact the qualitative results at all.

Figure 3 clearly shows that Europeans are enriched for weakly and moderately deleterious mutations (the last category produces weird results, and I wish they’d talked about this more, but they observe that strongly deleterious mutations have issues getting detected). Ne is just the effective population size and s is the selection coefficient (bigger number, stronger selection).

Why are the middle two values enriched? Presumably it’s the non-African bottleneck. This is where another non-African population would have been a nice check to make sure that it was the “Out of Africa” bottleneck…but it’s probably asking a bit much to sequence more individuals to 70x coverage.
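
To make the interplay of Ne and s concrete, here is a minimal Python sketch using Kimura's diffusion approximation for the fixation probability of a new additive mutation. This is not the method used in the paper. The population sizes and the selection coefficient are illustrative values I made up, the function names are hypothetical, and I assume census size equals Ne.

```python
import math

def fixation_probability(ne, s):
    """Kimura's diffusion approximation for a new additive mutation.

    Assumes a diploid population whose census size equals ne, so the
    mutation starts at frequency 1 / (2 * ne). s is the heterozygous
    selection coefficient (negative means deleterious).
    """
    if s == 0:
        return 1.0 / (2 * ne)  # neutral case
    # (1 - exp(-2s)) / (1 - exp(-4 * ne * s)), written with expm1
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * ne * s)

def relative_to_neutral(ne, s):
    """Fixation probability divided by that of a neutral mutation."""
    return fixation_probability(ne, s) * 2 * ne

# Illustrative (made-up) values: the same weakly deleterious mutation
# in a bottlenecked population and in a larger one.
for ne in (1_000, 10_000):
    print(ne, round(relative_to_neutral(ne, -1e-4), 3))
```

In the smaller population the weakly deleterious mutation fixes at close to the neutral rate, while in the larger one selection removes it far more efficiently. That is the intuition behind a bottleneck enriching for weakly and moderately deleterious variants.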

The lack of difference between the African populations is an indication that recent demography is not shaping the distribution much. Additionally, they note that gene flow between the African groups probably increased diversity in some ways, so that as long as a group is connected with other populations it will probably be rescued (note that none of these in their data were particularly inbred, judging by runs of homozygosity).

Finally, they found that the number of homozygote mutations that were deleterious is higher in their model results for Europeans than the African groups. This is not surprising, and what one expects. But, they found that this is a function likely of continuous gene flow between the African groups. Without gene flow homozygosity would have been much higher. This gets back to the fact that gene flow is a powerful homogenizing tool, and the lack of gene flow has to be pretty extreme for divergence to occur.

Which brings us back to the “Out of Africa” event. The next ten years are going to see a lot of investigation of African phylogenomics and population genomics. Basically, the relationships, and selection pressures. It is totally implausible that Bantu groups in Kenya and Tanzania did not absorb local non-Nilotic populations. We’ll figure that out. Additionally, selection pressures are probably different between different groups. We’ll know more about that. But, ancient DNA will probably give us some understanding of why non-Africans went through such a massive demographic sieve. We know in broad sketches. But most people want to fill in the details.

Citation: The demographic history and mutational load of African hunter-gatherers and farmers, Marie Lopez, Athanasios Kousathanas, Helene Quach, Christine Harmant, Patrick Mouguiama-Daouda, Jean-Marie Hombert, Alain Froment, George H Perry, Luis B Barreiro, Paul Verdu, Etienne Patin, Lluis Quintana-Murci, doi:

The logic of human destiny was inevitable 1 million years ago

Robert Wright’s best book, Nonzero: The Logic of Human Destiny, was published nearly 20 years ago. At the time I was moderately skeptical of his thesis. It was too teleological for my tastes. And, it does pander to a bias in human psychology whereby we look to find meaning in the universe.

But this is 2017, and I have somewhat different views.

In the year 2000 I broadly accepted the thesis outlined a few years later in The Dawn of Human Culture. That our species, our humanity, evolved and emerged in rapid sequence, likely due to biological changes of a radical kind, ~50,000 years ago. This is the thesis of the “great leap forward” of behavioral modernity.

Today I have come closer to models proposed by Michael Tomasello in The Cultural Origins of Human Cognition and Terrence Deacon in The Symbolic Species: The Co-evolution of Language and the Brain. Rather than a punctuated event, an instance in geological time, humanity as we understand it was a gradual process, driven by general dynamics and evolutionary feedback loops.

The conceit at the heart of Robert J. Sawyer’s often overly preachy Neanderthal Parallax series, that if our own lineage went extinct but theirs did not they would have created a technological civilization, is I think in the main correct. It may not be entirely coincidental that hyper-drive cultural flexibility evolved in African modern humans first. There may have been sufficient biological differences to enable this to be likely. But I believe that if African modern humans were removed from the picture Neanderthals would have “caught up” and been positioned to begin the trajectory we find ourselves in during the current Holocene inter-glacial.

Luke Jostins’ figure showing across-the-board encephalization

The data indicate that all human lineages were subject to increased encephalization. That process trailed off ~200,000 years ago, but it illustrates the general evolutionary pressures, ratchets, or evolutionary “logic”, that applied to all of them. Overall there were some general trends in the hominin lineage that began to characterize us about a million years ago. We pushed into new territory. Our rate of cultural change seems to have gradually increased across our whole range.

One of the major holy grails I see now and then in human evolutionary genetics is to find “the gene that made us human.” The scramble is definitely on now that more and more whole genome sequences from ancient hominins are coming online. But I don’t think there will be such a gene ever found. There isn’t “a gene,” but a broad set of genes which were gradually selected upon in the process of making us human.

In the lingo, it wasn’t just a hard sweep from a de novo mutation. It was as much, or even more, soft sweeps from standing variation.

Aryan marauders from the steppe came to India, yes they did!

It seems every post on Indian genetics elicits dissents from loquacious commenters who are woolly on the details of the science, but convinced in their opinions (yes, they operate through uncertainty and obfuscation in their rhetoric, but you know where the axe is lodged). This post is an attempt to answer some questions so I don’t have to address this in the near future, as ancient DNA papers will finally start to come out soon, I hope (at least earlier than Winds of Winter).

In 2001’s The Eurasian Heartland: A continental perspective on Y-chromosome diversity Wells et al. wrote:

The current distribution of the M17 haplotype is likely to represent traces of an ancient population migration originating in southern Russia/Ukraine, where M17 is found at high frequency (>50%). It is possible that the domestication of the horse in this region around 3,000 B.C. may have driven the migration (27). The distribution and age of M17 in Europe (17) and Central/Southern Asia is consistent with the inferred movements of these people, who left a clear pattern of archaeological remains known as the Kurgan culture, and are thought to have spoken an early Indo-European language (27, 28, 29). The decrease in frequency eastward across Siberia to the Altai-Sayan mountains (represented by the Tuvinian population) and Mongolia, and southward into India, overlaps exactly with the inferred migrations of the Indo-Iranians during the period 3,000 to 1,000 B.C. (27). It is worth noting that the Indo-European-speaking Sourashtrans, a population from Tamil Nadu in southern India, have a much higher frequency of M17 than their Dravidian-speaking neighbors, the Yadhavas and Kallars (39% vs. 13% and 4%, respectively), adding to the evidence that M17 is a diagnostic Indo-Iranian marker. The exceptionally high frequencies of this marker in the Kyrgyz, Tajik/Khojant, and Ishkashim populations are likely to be due to drift, as these populations are less diverse, and are characterized by relatively small numbers of individuals living in isolated mountain valleys.

In a 2002 interview with the Indian site Rediff, the first author was more explicit:

Some people say Aryans are the original inhabitants of India. What is your view on this theory?

The Aryans came from outside India. We actually have genetic evidence for that. Very clear genetic evidence from a marker that arose on the southern steppes of Russia and the Ukraine around 5,000 to 10,000 years ago. And it subsequently spread to the east and south through Central Asia reaching India. It is on the higher frequency in the Indo-European speakers, the people who claim they are descendants of the Aryans, the Hindi speakers, the Bengalis, the other groups. Then it is at a lower frequency in the Dravidians. But there is clear evidence that there was a heavy migration from the steppes down towards India.

But some people claim that the Aryans were the original inhabitants of India. What do you have to say about this?

I don’t agree with them. The Aryans came later, after the Dravidians.

Over the past few years I’ve gotten to know the above first author Spencer Wells as a personal friend, and I think he would be OK with me relaying that to some extent he was under strong pressure to downplay these conclusions. Not only were, and are, these views not popular in India, but the idea of mass migration was in bad odor in much of the academy during this period. Additionally, there was later work which was less clear, and perhaps supported an Indian origin for R1a1a. Spencer himself told me that it was not impossible for R1a to have originated in India, but a branch eventually back-migrated to southern Asia.

But even researchers from the group at Stanford where he had done his postdoc did not support this model by the mid-2000s: Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists. In 2009 a paper out of an Indian group was even stronger in its conclusion for a South Asian origin of R1a1a: The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system.

By 2009 one might have admitted that perhaps Spencer was wrong. I was certainly open to that possibility. There was very persuasive evidence that the mtDNA lineages of South Asia had little to do with Europe or the Middle East.

Yet a closer look at the above papers reveals two major systematic problems.

First, ancient DNA has made it clear that there has been major population turnover during the Holocene, but this was not the null hypothesis in the 2000s. Looking at extant distributions of lineages can give one a distorted view of the past. Frankly, the 2009 Indian paper was egregious in this way because they included Turkic groups in their Central Asian data set. Even in 2009 there was a whole lot of evidence that Central Asian Turkic groups were likely very different from Indo-European Turanian populations which would have been the putative ancestors of Indo-Aryans. Honestly the authors either consciously loaded the dice to reduce the evidence for gene flow from Central Asia, or they were ignorant (the nature of the samples is much clearer in the supplements than the primary text for what it’s worth).

Second, Y chromosomal marker sets in the 2000s were constrained to fast mutating microsatellite regions or fewer than 100 variant SNPs on the Y. Because it is so repetitive the Y chromosome is hard to sequence, and it really took the technologies of the last ten years to get it done. Both the above papers estimate the coalescence of extant R1a1a lineages to be 10-15,000 years before the present. In particular, they suggest that European and South Asian lineages date back to this period, pushing back any possible connection between the groups, and making it possible that European R1a1a descended from a South Asian founder group which was expanding after the retreat of the ice sheets. The conclusions were not unreasonable based on the methods they had. But now we have better methods.*

Whole genome sequencing of the Y, as well as ancient DNA, seems to falsify the above dates. Though microsatellites are good for very coarse-grained phylogenetic inferences, one has to be very careful about them when looking at more fine-grained population relationships (they are still useful in forensics to cheaply differentiate between individuals, since they accumulate variation very quickly). They mutate fast, and their clock may be erratic.

Additionally, diversity estimates were based on a subset of SNPs that were clearly not robust. R1a1a is not diverse anywhere, though basal lineages seem to be present in ancient DNA on the Pontic steppe in some cases.

To show how lacking in diversity R1a1a is, here are the results of a 2016 paper which performed whole genome sequencing on the Y. Instead of relying on the order of 10 to 100 SNPs, this paper discovered over 65,000 Y variants worldwide. Notice how little difference there is between different South Asian groups below, indicative of a massive population expansion relatively recently in time which didn’t even have time to exhibit regional population variation. They note that “The most striking are expansions within R1a-Z93 [the South Asian clade], ~4.0–4.5 kya. This time predates by a few centuries the collapse of the Indus Valley Civilization, associated by some with the historical migration of Indo-European speakers from the western steppes into the Indian sub-continent.”

Mouse fidelity comes down to the genes

While birds tend to be at least nominally monogamous, this is not the case with mammals. This strikes some people as strange because humans seem to be monogamous, at least socially, and often we take ourselves to be typically mammalian. But of course we’re not. Like many primates we’re visual creatures, rather than relying on smell and hearing. Obviously we’re also bipedal, which is not typical for mammals. And, our sociality scales up to massive agglomerations of individuals.

How monogamous we are is up for debate. Desmond Morris, who is well known to many from his roles in television documentaries, has been a major promoter of the idea that humans are monogamous, with a focus on pair-bonds. In contrast, other researchers have highlighted our polygamous tendencies. In The Mating Mind Geoffrey Miller argues for polygamy, and suggests that pair-bonds in a pre-modern environment were often temporary, rather than lifetime (Miller is now writing a book on polyamory).

The fact that in many societies high status males seem to engage in polygamy, despite monogamy being more common, is one phenomenon which confounds attempts to quickly generalize about the disposition of our species. What is preferred may not always be what is practiced, and outward adherence to social norms may often be violated in private.

Adducing behavior is simpler in many other organisms, because their range of behavior is more delimited. When it comes to studying mating patterns in mammals voles have long been of interest as a model. There are vole species which are monogamous, and others which are not. Comparing the diverged lineages could presumably give insight as to the evolutionary genetic pathways relevant to the differences.

But North American deer mice, Peromyscus, may turn out to be an even better bet: there are two lineages which exhibit different mating patterns but which are phylogenetically close enough that they can interbreed. That is crucial, because it allows one to generate crosses and see how the characteristics distribute themselves across subsequent generations. Basically, it allows for genetic analysis.

And that’s what a new paper in Nature does, The genetic basis of parental care evolution in monogamous mice. In figure 3 you can see the distribution of behaviors in parental generations, F1 hybrids, and the F2, which is a cross of F1 individuals. The widespread distribution of F2 individuals is likely indicative of a polygenic architecture of the traits. Additionally, they found that some traits are correlated with each other in the F2 generation (probably due to pleiotropy, the same gene having multiple effects), while others were independent.
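
Why does a wide spread in the F2 point to many genes? A toy simulation helps. The sketch below is my own illustration, not the authors' analysis: it assumes purely additive, unlinked loci with equal effects, two parental lines fixed for opposite alleles, and no environmental noise. The function name and the numbers of loci and individuals are made up.

```python
import random

def simulate_f2(n_loci, n_individuals, rng):
    """Additive trait values for F2 offspring of an F1 x F1 cross.

    Parental line A is fixed for the '+' allele at every locus and
    line B for the '-' allele, so every F1 is heterozygous everywhere.
    Each '+' allele adds 1 / (2 * n_loci), so line A scores 1.0,
    line B scores 0.0 and every F1 scores 0.5.
    """
    effect = 1.0 / (2 * n_loci)
    values = []
    for _ in range(n_individuals):
        # Each F1 parent transmits '+' (1) or '-' (0) with probability 1/2
        # at every locus, independently across loci (no linkage).
        plus_alleles = sum(
            rng.randint(0, 1) + rng.randint(0, 1) for _ in range(n_loci)
        )
        values.append(plus_alleles * effect)
    return values

rng = random.Random(0)  # fixed seed so the run is repeatable
one_locus = simulate_f2(1, 500, rng)
ten_loci = simulate_f2(10, 500, rng)

print(sorted(set(one_locus)))  # a single locus allows at most three classes
print(len(set(ten_loci)))      # ten loci give many intermediate classes
```

With one locus the F2 falls into at most three discrete classes. With ten loci the F2 fills in the range between the parental values, which is the kind of broad distribution seen in the figure. Real traits also carry environmental noise, which would blur even the single-locus classes, so this is only the cleanest version of the argument.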

With the F2 generation they ran a genetic analysis which looked for associations between traits and regions of the genome. They found 12 quantitative trait loci (QTLs), basically zones of the genome associated with variation on one or more of the six traits. From this analysis they immediately realized there was sexual dimorphism in terms of the genetic architecture; the same locus might have a different effect in the opposite sex. This is evolutionarily interesting.

Because the QTLs are rather large in terms of physical genomic units the authors looked to see which were plausible candidates in terms of function. One of their hits was vasopressin, which should be familiar to many from vole work, as well as some human studies. Though the QTL work as well as their pup-switching experiment (which I did not describe) is persuasive, the fact that a gene you’d expect shows up as a candidate really makes it an open and shut case.

The extent of the variation explained by any given QTL seems modest. In the extended figures you can see it’s mostly in the 1 to 5 percent range. In Carl Zimmer’s excellent write up he ends:

But Dr. Bendesky cautioned that the vasopressin gene would probably turn out to be just one of many that influence oldfield mice. Though it is strongly linked to parental behavior, the vasopressin gene accounts for 6.7 percent of the variation in nest building among males, and only 2.9 percent among females.

The genetic landscape of human parenting will turn out to be even more rugged, Dr. Bendesky predicted.

“You cannot do a 23andMe test and find out if your partner is going to be a good father,” he said.

Sort of. The genetic architecture above is polygenic…but not incredibly diffuse. The proportion of variation explained by the largest effect allele is more than for height, and far more than for education. If human research follows up on this, I wouldn’t be surprised if you could develop a polygenic risk score.

But I don’t have a good intuition on how much variation in humans there really is for these sorts of traits that are heritable. I assume some. But I don’t know how much. And how much of the variance in behavior might be explained by human QTLs? Humans don’t lick or build nests, or retrieve pups. Also, as one knows from Genetics and Analysis of Quantitative Traits sexually dimorphic traits take a long time to evolve. These are two deer mice species. Within humans there may not have been enough time for this sort of heritable complexity of behavior to evolve.

There are a lot of philosophical issues here about translating to a human context.

Nevertheless, this research shows that ingenious animal models can powerfully elucidate the biological basis of behavior.

Citation: The genetic basis of parental care evolution in monogamous mice. Nature (2017) doi:10.1038/nature22074

Women hate going to India

For some reason women do not seem to migrate much into South Asia. In the late 2000s I, along with others, noticed a strange discrepancy in the Y and mtDNA lineages which trace one’s direct male and female lines: in South Asia the male lineages were likely to cluster with populations to the north and west, while the female lines did not. South Asia’s female lines in fact had a closer relationship to the mtDNA lineages of Southeast and East Asia, albeit distantly.

One solution which presented itself was to contend there was no paradox at all. That the Y chromosomal lineages found in South Asia were basal to those to the west and north. In particular, there were some papers suggesting that perhaps R1a1a originated in South Asia at the end of the Pleistocene. Whole genome sequencing of Y chromosomes does not bear this out though. R1a1a went through rapid expansion recently, and ancient DNA has found it in Russia first. But in 2009 David Reich came out with Reconstructing Indian population history, which offered up something of a possible solution.

What Reich and his coworkers found is that South Asia seems to be characterized by the mixture of two very different types of populations. One set, ANI (Ancestral North Indian), are basically another western or northwestern Eurasian group. The other, ASI (Ancestral South Indian), are indigenous, and exhibit distant affinities to the Andaman Islanders. The India-specific mtDNA then were from ASI, while the Y chromosomes with affinities to people to the north and west were from ANI. In other words, the ANI mixture into South Asia was probably through a mass migration of males.

But the Y and mtDNA discrepancy is not limited to this one case. A minority of South Asians speak Austro-Asiatic languages. The most interesting of these populations are the Munda, who tend to occupy uplands in east-central India. Older books on Indian history often suggest that the Munda are the earliest aboriginals of the subcontinent, but that has to confront the fact that most Austro-Asiatic languages are spoken in Southeast Asia. There was no true consensus on where they were present first.

Genetics seems to have solved this question. The evidence is building up that Austro-Asiatic languages arrived with rice farmers from Southeast Asia. Though most of the ancestry of the Munda is of ANI-ASI mix, a small fraction is clearly East Asian. And interestingly, though they carry no East Asian mtDNA, they do carry East Asian Y. Again, gene flow mediated by males.

The same is true of India’s Bene Israel Jewish community.

A new preprint on biorxiv confirms that the Parsis are another instance of the same dynamic: The genetic legacy of Zoroastrianism in Iran and India: Insights into population structure, gene flow and selection:

Zoroastrianism is one of the oldest extant religions in the world, originating in Persia (present-day Iran) during the second millennium BCE. Historical records indicate that migrants from Persia brought Zoroastrianism to India, but there is debate over the timing of these migrations. Here we present novel genome-wide autosomal, Y-chromosome and mitochondrial data from Iranian and Indian Zoroastrians and neighbouring modern-day Indian and Iranian populations to conduct the first genome-wide genetic analysis in these groups. Using powerful haplotype-based techniques, we show that Zoroastrians in Iran and India show increased genetic homogeneity relative to other sampled groups in their respective countries, consistent with their current practices of endogamy. Despite this, we show that Indian Zoroastrians (Parsis) intermixed with local groups sometime after their arrival in India, dating this mixture to 690-1390 CE and providing strong evidence that the migrating group was largely comprised of Zoroastrian males. By exploiting the rich information in DNA from ancient human remains, we also highlight admixture in the ancestors of Iranian Zoroastrians dated to 570 BCE-746 CE, older than admixture seen in any other sampled Iranian group, consistent with a long-standing isolation of Zoroastrians from outside groups. Finally, we report genomic regions showing signatures of positive selection in present-day Zoroastrians that might correlate to the prevalence of particular diseases amongst these communities.

The paper uses lots of fancy ChromoPainter methodologies which look at the distributions of haplotypes across populations. But some of the primary results are obvious using much simpler methods.

1) About 2/3 of the ancestry of Indian Parsis derives from an Iranian population
2) About 1/3 of the ancestry of Indian Parsis derives from an Indian population
3) Almost all the Y chromosomes of Indian Parsis can be accounted for by Iranian ancestry
4) Almost all the mtDNA haplogroups of Indian Parsis can be accounted for by Indian ancestry
5) Iranian Zoroastrians are mostly endogamous
6) Genetic isolation has resulted in drift and selection on Zoroastrians

The fact that the ancestry proportion is clearly more than 50% Iranian for Parsis indicates that there was more than one generation of males who migrated. They did not contribute mtDNA, but they did contribute genome-wide to Iranian ancestry. There are wide intervals on the dating of this admixture event, but they are consonant with the oral history that was later written down by the Parsis.
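
The logic behind "more than one generation of males" can be sketched with a deliberately extreme toy model. This is my own illustration, not the preprint's method: it assumes every father in each founding generation is a newly arrived Iranian man, every mother comes from the community as it stood one generation earlier, and the very first mothers are local Indian women. The function name is hypothetical.

```python
def autosomal_iranian_fraction(generations_of_male_migrants):
    """Autosomal Iranian ancestry of the community in a toy model.

    Each generation, fathers are 100% Iranian newcomers and mothers
    are daughters of the community from the previous generation.
    The first generation of mothers has 0% Iranian ancestry.
    """
    mothers = 0.0
    for _ in range(generations_of_male_migrants):
        # A child inherits half of its autosomes from each parent.
        mothers = (1.0 + mothers) / 2.0
    return mothers

for waves in (1, 2, 3):
    print(waves, autosomal_iranian_fraction(waves))
```

A single generation of migrant men gives exactly half Iranian ancestry, and a second gives three quarters, while the mtDNA stays entirely local and the Y entirely Iranian throughout. An estimate of about two thirds sits between those two values, which is why a single pulse of men is not enough to explain it. The real history was surely messier than this.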

So there you have it. Another example of a population formed from admixture because women hate going to India.

Citation: The genetic legacy of Zoroastrianism in Iran and India: Insights into population structure, gene flow and selection.
Saioa Lopez, Mark G Thomas, Lucy van Dorp, Naser Ansari-Pour, Sarah Stewart, Abigail L Jones, Erik Jelinek, Lounes Chikhi, Tudor Parfitt, Neil Bradman, Michael E Weale, Garrett Hellenthal
bioRxiv 128272; doi:

Sex bias in migration from the steppe (revisited)

Last fall I blogged a preprint which eventually came out as a paper in PNAS, Ancient X chromosomes reveal contrasting sex bias in Neolithic and Bronze Age Eurasian migrations. The upshot is that the authors found that there was far less steppe ancestry on the X chromosomes of Bronze Age Central Europeans than across the whole genome. The natural inference here is that you had migrations of males into territory where they had to find local wives.
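
The reason the X chromosome is informative here comes down to simple bookkeeping: females carry two X chromosomes and males one, so two thirds of the X chromosomes in a population trace to mothers, whereas autosomes trace equally to both parents. A minimal sketch of that expectation follows. It is not the ADMIXTURE-based analysis from the paper; it assumes a single admixture event, an equal sex ratio, and random mating afterward, and the function and parameter names are hypothetical.

```python
def expected_ancestry(male_fraction, female_fraction):
    """Expected source-population ancestry on autosomes and on the X.

    male_fraction / female_fraction: share of the founding fathers /
    mothers of the admixed population who came from the source
    population (here, the steppe).
    """
    # Autosomes: half from fathers, half from mothers.
    autosomes = (male_fraction + female_fraction) / 2.0
    # X chromosomes: one third via fathers, two thirds via mothers.
    x_chromosome = (male_fraction + 2.0 * female_fraction) / 3.0
    return autosomes, x_chromosome

# Illustrative extremes: all-male migration versus sex-balanced migration.
print(expected_ancestry(1.0, 0.0))
print(expected_ancestry(0.5, 0.5))
```

With exclusively male migrants the sketch gives half steppe ancestry on the autosomes but only a third on the X, while a sex-balanced migration gives the same value on both. A deficit of steppe ancestry on the X relative to the rest of the genome is therefore the signature of male-biased migration.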

But the story does not end there. Iosif Lazaridis and David Reich have put out a short note on biorxiv, Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe. It’s short, so I suggest you read the note yourself, but the major issue seems to be that on X chromosomes ADMIXTURE in supervised mode seems to behave really strangely. Lazaridis and Reich find that there seems to be a downward bias of steppe ancestry. Ergo, the finding was an artifact.

Goldberg et al. almost immediately responded, Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe. Their response seems to be that yes, ADMIXTURE does behave strangely, but the overall finding is still robust.

With these uncertainties I do wonder if it’s hard at this point to evaluate the alternative models. But, we do have archaeology and mtDNA. What do those say? On that basis, from what little I know, I am inclined to suspect a strong male bias of migration.

Citation: Reply To Lazaridis And Reich: Robust Model-Based Inference Of Male-Biased Admixture During Bronze Age Migration From The Pontic-Caspian Steppe, Amy Goldberg, Torsten Gunther, Noah A Rosenberg, Mattias Jakobsson
bioRxiv 122218; doi:

Citation: Failure to Replicate a Genetic Signal for Sex Bias in the Steppe Migration into Central Europe, Iosif Lazaridis, David Reich, bioRxiv 114124; doi: