Africa, the churning continent

Martin Meredith’s The Fortunes of Africa glosses very quickly over one of the major reasons that the “great scramble” for the continent occurred in the late 19th century: the discovery of the usefulness of quinine as an anti-malarial agent. Perhaps because I’ve read Plagues and Peoples and The Retreat of the Elephants: An Environmental History of China, I have always been conscious of the role of disease in discouraging conquest and migration (malaria in Italy was also a way to limit the extent of long-term occupation).

The coastal regions of Africa had been subject to the trade and depredations of European actors for nearly 400 years when the Berlin Conference partitioned the continent amongst European powers. Despite the fact that much of the interior was not charted, there had long been a colonial presence. Accra, the modern capital of Ghana, was originally a 16th-century Portuguese fort, but for several centuries between the 17th and 19th centuries it was actually a possession of Scandinavian powers, Sweden and Denmark (before passing on to the British)!

For all these centuries the heart of Africa was unknown to Europeans, in part because there were native powers blocking their way, but also because the mortality rates were so high for outsiders, as indicated above. It is no surprise that the main European settlement in Africa which was more than a simple trading fort was at the southern tip of the continent, where the climate was Mediterranean and so the disease burden low.

But once quinine, and machine guns, came into the equation the interior was accessible. It all happened rather quickly, within a few decades, though in some cases European ‘colonialism’ involved little more than nominal allegiance of tribal chieftains.

Now, a new paper in Cell may herald the beginning of a great genomic scramble to understand the history of Africa. Carl Zimmer in The New York Times has a piece up, Clues to Africa’s Mysterious Past Found in Ancient Skeletons. It begins:

It was only two years ago that researchers found the first ancient human genome in Africa: a skeleton in a cave in Ethiopia yielded DNA that turned out to be 4,500 years old.

On Thursday, an international team of scientists reported that they had recovered far older genes from bone fragments in Malawi dating back 8,100 years. The researchers also retrieved DNA from 15 other ancient people in eastern and southern Africa, and compared the genes to those of living Africans.

The general results of the paper, Skoglund’s Reconstructing Prehistoric African Population Structure, were presented at the SMBE meeting this summer. So in broad sketches I was not surprised, though the details require some digging into.

The Bantu Expansion repatterned the population structure of Africa

Between 1000 BC and 500 AD the expansion of iron wielding agriculturalists from the environs of modern day southern Cameroon reshaped the cultural and genetic landscape of Sub-Saharan Africa. The relatively late date of this expansion should give us a general sense of how careful we need to be about making assertions about “prehistoric Africa.” When Egypt’s New Kingdom was expanding southward along the Nile and into the Levant, Sub-Saharan Africa was qualitatively very different from what we see today in both culture and genetic structure. The continent’s contemporary human geography does not have a deep time depth.

In any case, anyone who has worked with genetic data from Africa is struck by how similar Bantu-speaking populations are genetically. So these results are not world-shaking. South African Zulus occupy positions far closer to Kenyans and Congolese than they do to Khoisan peoples to the west of them facing the Kalahari. The Xhosa people on the cultural frontier of the Bantus in South Africa exhibit substantial admixture from Khoisan (to the point where they have even integrated clicks into their language!), but even they are preponderantly non-Khoisan.

By sampling ancient genomes across a geographical transect which runs up the Rift Valley to Ethiopia, Skoglund et al. show that before the Bantu Expansion there was a north-south genetic relatedness cline. When this result was presented at SMBE a few friends were quite excited that they were being presented a cline, as some researchers have felt that this particular lab group has a tendency to model everything as pulse admixtures between distinct ancestral populations. But the reasonably deep time transect in Malawi exhibited no variance in admixture fractions, which is indicative of the likelihood that its “mixed” status at a particular K cluster is simply an artifact (see this post for what’s going on).

One particular aspect of the results from Malawi is that they found no continuity between contemporary populations, Bantu agriculturalists, and these ancient hunter-gatherers. That is, hunter-gatherers were replaced in toto. This is not entirely surprising, as many researchers who have worked with European ancient DNA believe that hunter-gatherers in many areas also left no descendants at all (the “hunter-gatherer” fractions in modern groups in a particular region are believed to be due to migration of mixed populations who obtained “hunter-gatherer” ancestry at another locale).

But the Bantus were not the first “intrusive” population

These results also have some moderate surprises. A Tanzanian sample from 1100 BC from a pastoralist context exhibits an ancestral mix which is Sub-Saharan African and West Eurasian/North African. More precisely, about 38 percent of this individual’s ancestry resembles that of the Pre-Pottery Neolithic culture of the Levant, and the rest of the genome most resembles a 4500 year old sample from Ethiopia.

This date is before the initiation of the Bantu Expansion. The genetic results in this work, and in earlier publications, strongly point to the likelihood that this population (or populations) mediated the spread of pastoralism to the south and west. In particular, all Khoisan groups of southern Africa seem to have admixture from this group, more (Khoi) or less (San).

But a curious aspect of this result is that these early pastoralists do not carry any evidence of admixture from ancient eastern farmers from the Zagros region. That is, the West Eurasian gene flow into the Tanzanian pastoralists predates the great exchange/admixture in the Middle East between western and eastern lineages. Since that reciprocal gene flow seems to have occurred at least 2,000 years before the Tanzanian pastoralist’s time, it suggests that this West Eurasian element was in Africa for thousands of years.

The second important point to emphasize is that the Iranian-like component is found among Cushitic speaking Somali and Afar samples, at 15-20% clips. Looking at the supporting tables a wide range of East African populations have the Tanzanian pastoralist ancestry but do not show evidence of the Iranian-like ancestry, which is now ubiquitous in the Middle East, and presumably in the highlands of Ethiopia as well (which usually show somewhat higher levels of Eurasian ancestry than is the case on the coast, especially among Semitic language speakers).

This fact is important because many of the Nilotic peoples are reputed to have absorbed Cushitic groups relatively recently in the past. This is also true for Bantu speaking groups according to these and other data. Finally, the Sandawe, who speak a language with clicks, and so may have some affinity to Khoisan, are often stated to have Cushitic affinities (looking at the data they clearly have West Eurasian ancestry). But their Eurasian ancestry seems to lack the Iranian-like component as well.

None of the populations with putative Cushitic ancestry, but who lack Iranian-like ancestry, speak a Cushitic language (most speak Nilotic languages, but East African Bantus have mixed with these Nilotic groups, so they have the same ancestry). Therefore I wonder if these pastoralists spoke an Afro-Asiatic language in the first place.

A patchy landscape

The phylogenetic tree illustrates the relationships of various African populations without much recent Eurasian ancestry. In The New York Times article David Reich indicates that the Hadza people of Tanzania are the closest Sub-Saharan Africans to the lineage ancestral to non-Africans. This is actually a simplification of what you see in the paper, and is illustrated in the tree to the left. The 4500 year old Ethiopian sample, which does not have Eurasian ancestry, nevertheless is the closest of all Sub-Saharan groups to Eurasians. The Hadza have the highest fraction of this ancestral component of all Sub-Saharan Africans in their data set, but many other populations also carry this ancestry (the Tanzanian pastoralist combined the PPN ancestry with this element).

This was a patchy landscape of inhabitation, because though the Tanzanian pastoralist ancestry, a combination of PPN and proto-Ethiopian, spread all the way to the Cape, there were populations, such as the Hadza and a 400 year old individual sampled from the Kenya island of Pemba, which lacked this genetic variation. Indeed, they are also not on the north-south (proto-Ethiopian to Khoisan) cline that featured so prominently above.

The sampling of ancient individuals is not very dense yet, so we can’t say much. But I think it does indicate we need to be cautious about assuming that gene flow dynamics run as-the-crow-flies, simply as a function of distance. Ecological suitability no doubt plays a strong role in how populations expand. The Bantus, for example, were stopped in South Africa by the fact that their agricultural toolkit was not suitable for the western half of the country. So when Europeans arrived in the 16th century the residents of the Cape were Khoi pastoralists.

The presence of the Hadza in Tanzania, or an individual of unmixed proto-Ethiopian ancestry on Pemba 400 years ago, indicates that the ethnic geography of East Africa has long been fluid and dynamic. There is no reason to suppose that the Hadza are not themselves migrants from further north, perhaps easily explaining why they are not on the north-south cline so evident from the ancient DNA.

The rise of Basal Humans

Several years ago researchers discovered that the first farmers of Europe, who descended from an Anatolian population, were in part derived from a group which split off very early from other Eurasian populations. This group was termed “Basal Eurasian” (BEu) because it was an outgroup to all other Eurasians, including European hunter-gatherers, East Asians, Oceanians, and the natives of the New World. Subsequent work has shown that the early Neolithic farmers of the Near East, whether they’re from the Levant or the Zagros, had about half their ancestry from this population.

No ancient genomes which are predominantly BEu have been discovered yet. The fact that populations on the cusp of the Holocene seem to have Basal Eurasian ancestry across the Middle East suggests that the admixture with hunter-gatherers related to those of Europe must have occurred during the Pleistocene. But Basal Eurasian is arguably the most parsimonious explanation of the shared drift patterns that we see.

Skoglund et al. suggest that there may be the necessity of a similar construct in Africa. They are not the first; Schlebusch et al. also suggested the necessity of this lineage in the supplements of their preprint on ancient South Africans. Within Skoglund et al. the authors see variation between the far West African Mende and the eastern West African Yoruba, where the latter exhibits closer affinity to East African populations than the former (this includes those such as the proto-Ethiopian with no Eurasian admixture). Additionally, the authors found that Khoisan groups share more alleles with populations in East Africa than they do with those in West Africa even when you account for admixture.

One model that can explain this variation is long range gene flow, so that there would be connections between various regions as a function of their distance. Another explanation is that West African populations are the product of a Basal Human (BHu) population which separated first, before the bifurcation of Khoisan from other human populations. This would reorder our understanding of who the most basal humans are. Additionally, it would align with long-standing work on deep lineages within Africa contributing a minor component of the continent’s ancestry.

As should be clear due to the tree above, BHu postdates the separation of African humans from Neanderthals. One does wonder about the relevance of the Moroccan “modern” human to these models.

Understanding culture from genetics and genetics from culture

The spread of the Bantus over 1500 years from one end of the continent to the other is perhaps one of the most important dynamics we can use to understand the spread of farming more generally. The linguistic unity of the Bantus, or at least their affinity, suggests to us that the first farmers of Europe, who spread across much of the continent in 2500 years, probably exhibited the same pattern. The low levels of gene flow between hunter-gatherers and farmers, despite living in the same regions for thousands of years, can be illustrated with African examples (e.g., the Hadza vs. their Bantu neighbors).

We are rather in the early phase of understanding these dynamics. There are more remains to be found, perhaps in the dry fastness of the Sahara or Sahel (though unfortunately political considerations may prevent excavation due to danger to archaeologists). The genetics will give us a general idea about the nature of genetic variation and how it arose, but robust cultural models also need to be developed which illustrate how these genetic patterns arose.

Citation: Reconstructing Prehistoric African Population Structure, Skoglund, Pontus et al. Cell, Volume 171, Issue 1, 59–71.e21

Population structure in Neanderthals leads to genetic homogeneity

The above tweet is in response to an article which reports on a finding published this past month in PNAS, Early history of Neanderthals and Denisovans. It’s open access; you should read it. I don’t think I’ve reviewed it because I haven’t dug through the supplements. To be frank, this is a paper where you pretty much have to read the supplements because they’re introducing a somewhat different model here than is the norm.

I talked to Alan Rogers at SMBE about this paper. Broadly, I think there might be something to it, and it’s because of what David says above. It is simply hard to imagine that Neanderthals could be extremely successful with such low genetic diversity as we see, and spread so thin. Now, the Quanta Magazine piece tries to emphasize that the effective population is not the true census population, but I wish it would have explained it more clearly. Basically, the size that is relevant for breeding is obviously not going to be the same as a head count. And, because effective populations are highly sensitive to bottlenecks, you can get really small numbers even when the extant population at any given time may be large.
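To make the bottleneck point concrete, here is a minimal sketch using the standard textbook approximation that long-term effective size under fluctuating population size is the harmonic mean of the per-generation sizes. This is not the model in the PNAS paper, and the population history below is entirely hypothetical; the function name is my own.

```python
def harmonic_mean_ne(census_sizes):
    """Long-term effective size as the harmonic mean of per-generation sizes.

    Textbook approximation for a population whose size fluctuates over time.
    """
    if not census_sizes or any(n <= 0 for n in census_sizes):
        raise ValueError("need at least one generation, all sizes positive")
    return len(census_sizes) / sum(1.0 / n for n in census_sizes)


# Hypothetical history: nine generations at 10,000 and one bottleneck at 100.
history = [10_000] * 9 + [100]

arithmetic_mean = sum(history) / len(history)  # what a naive head count suggests
ne = harmonic_mean_ne(history)                 # dominated by the smallest sizes

print(f"arithmetic mean size: {arithmetic_mean:.0f}")
print(f"harmonic mean (Ne):   {ne:.0f}")
```

In this toy history a single bad generation drags the effective size down to roughly 917, against a head-count average of 9,010. That is the sense in which a small effective population is compatible with a much larger number of living individuals.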

The PNAS paper makes some novel inferences, and I’ll set that to the side until I read the supplements. But I don’t think it’s crazy that population structure within Neanderthals could be leading to lower total genetic diversity.

Release the UK Biobank! (the prediction of height edition)

There’s so much science coming out of the UK Biobank it’s not even funny. It’s like getting the palantír or something.

Anyway, a preprint, submitted for your approval. A vision of things to come? Accurate Genomic Prediction Of Human Height:

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ~20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
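The two headline numbers for height in the abstract are the same result stated two ways: the squared correlation between predicted and actual values is the fraction of variance captured. A quick check of that arithmetic, using only the rounded figure quoted above:

```python
# Correlation between predicted and actual height, as quoted in the abstract.
correlation = 0.65

# For a predictor evaluated this way, variance captured is the squared correlation.
variance_explained = correlation ** 2

print(f"variance captured: {variance_explained:.2f}")
```

That comes to about 0.42, in line with the ~40 percent the authors report.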

A scatter-plot is worth a thousand derivations.

You know what’s better than 500,000 samples? One billion samples! A nerd can dream….

Massive genomic sample sizes = detecting evolution in real time

The recent PLOS BIOLOGY paper, Identifying genetic variants that affect viability in large cohorts, seems to have triggered a feeding frenzy in the media. For example, Big Think has put up Researchers Find Evidence That Human Evolution Is Still Actively Happening.

I wasn’t paying close attention because of course human evolution is still happening actively. From a genetic perspective, evolution is just change in allele frequencies. Populations aren’t infinite, so even if there weren’t any selection, stochastic forces would shift allele frequencies. But of course selection is probably happening. For adaptation by natural selection to occur you need heritable variation on a trait where there are fitness differences as a function of variation within the population. It seems implausible that these conditions don’t still apply. There’s plenty of fitness variation in the population, and it’s unlikely to be random as a function of heritable variation.
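The point about finite populations can be illustrated with a minimal sketch of the Wright-Fisher model, the standard idealized model of drift: each generation, every chromosome copy is drawn at random from the previous generation's allele frequency, with no selection at all. The population size, starting frequency, and seed here are arbitrary assumptions chosen for illustration.

```python
import random


def wright_fisher(p0, n_diploid, generations, seed=0):
    """Simulate neutral drift of one allele; returns the frequency per generation."""
    rng = random.Random(seed)      # seeded so a run can be repeated
    n_chrom = 2 * n_diploid        # diploids carry two copies each
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Binomial sampling: each copy in the next generation carries the
        # allele with probability equal to the current frequency.
        count = sum(1 for _ in range(n_chrom) if rng.random() < p)
        p = count / n_chrom
        freqs.append(p)
    return freqs


# Hypothetical population of 50 diploid individuals followed for 100 generations.
trajectory = wright_fisher(p0=0.5, n_diploid=50, generations=100, seed=1)
print(f"start: {trajectory[0]:.2f}, end: {trajectory[-1]:.2f}")
```

Nothing in this simulation favors either allele, yet the frequency wanders from generation to generation. Larger values of `n_diploid` make the wandering slower, but never stop it.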

But the devil is in the details. And last year Field et al. used the modern genomic tools available to detect selection occurring over the past 2,000 years. It is not credible that it would have magically stopped a few centuries ago.

So why is this new paper such a big deal? (note that it’s in PLOS BIOLOGY, not PLOS GENETICS) Because the method they use is ingenious and simple. Basically, they’re looking at changes in allele frequencies as a function of age in huge populations. It’s a little more complicated than that: they used a logistic regression to control for some of the other variables. But they found some biologically plausible hits with their data set of 50,000-150,000. And they replicated their hits from a European sample to a non-European one.
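The core intuition can be sketched in a few lines: bin a cohort by age and compute the allele frequency in each bin. A variant that reduces survival should become rarer among the oldest people. This is only the naive version of the idea, not the authors' method (which, as noted, uses logistic regression to control for other variables), and the cohort below is made-up toy data.

```python
from collections import defaultdict


def allele_freq_by_age_bin(records, bin_width=10):
    """Allele frequency per age bin.

    records: iterable of (age_in_years, copies_of_allele), copies in {0, 1, 2}.
    Returns {bin_start_age: frequency}, ordered by age.
    """
    copies = defaultdict(int)
    chromosomes = defaultdict(int)
    for age, n_copies in records:
        start = (age // bin_width) * bin_width
        copies[start] += n_copies
        chromosomes[start] += 2    # each diploid person contributes two chromosomes
    return {start: copies[start] / chromosomes[start] for start in sorted(chromosomes)}


# Toy cohort (hypothetical): (age, number of copies of the allele of interest).
toy_cohort = [
    (42, 1), (45, 0), (48, 2), (47, 1),
    (63, 1), (66, 0), (61, 0), (68, 1),
    (81, 0), (84, 0), (86, 1), (89, 0),
]

freqs = allele_freq_by_age_bin(toy_cohort, bin_width=20)
for start, freq in freqs.items():
    print(f"ages {start}-{start + 19}: allele frequency {freq:.3f}")
```

With real data a declining frequency could also reflect birth-cohort effects or ancestry differences between age groups, which is presumably why the regression controls matter.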

This does bring me back to a discussion I observed a while back. An evolutionary geneticist who works with Drosophila mentioned offhand that in his field there really wasn’t that much of a need for more data. They could spend all their time doing analysis. A prominent human geneticist whose work focused on biomedicine piped up that that wasn’t true at all for their field. There are some differences in the scientific questions, but there are also differences in terms of what you can do with humans as a model organism.

In the paper they look forward to the day of increasing sample sizes an order of magnitude beyond where it is now. At some point in the near future, large fractions of entire nations will be sequenced at medical grade level (30x coverage).

Anyway, you should read Identifying genetic variants that affect viability in large cohorts. It’s pretty straightforward.

After agriculture, before bronze


The above plot shows genetic distance/variation between highland and lowland populations in Papua New Guinea (PNG). It is from a paper in Science that I have been anticipating for a few months (I talked to the first author at SMBE), A Neolithic expansion, but strong genetic structure, in the independent history of New Guinea.

What does “strong genetic structure” mean? Basically Fst is showing the proportion of genetic variation which is partitioned between groups. Intuitively it is easy to understand, in that if ~1% of the genetic variation is partitioned between groups in one case, and ~10% in another, then it is reasonable to suppose that the genetic distance between groups in the second case is larger than in the first case. On a continental scale Fst between populations is often on the order of ~0.10. That is the value for example when you pool the variation amongst Northern Europeans and Chinese, and assess how much of it can be apportioned in a manner which differentiates populations (so it’s about ~10% of the variation).
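For readers who want to see the partitioning idea as arithmetic, here is a minimal sketch for a single biallelic locus in two equally weighted populations, using the classic definition Fst = (H_T - H_S) / H_T, where H_S is the average expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled population. Real analyses use genome-wide estimators with sample-size corrections, so this is an illustration and not what the paper computed; the allele frequencies are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_populations(p1, p2):
    """Fst = (H_T - H_S) / H_T for two equally weighted populations."""
    h_within = (heterozygosity(p1) + heterozygosity(p2)) / 2.0  # H_S
    p_pooled = (p1 + p2) / 2.0
    h_total = heterozygosity(p_pooled)                          # H_T
    if h_total == 0.0:
        return 0.0  # locus is fixed everywhere, so there is nothing to partition
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in population 1 and population 2.
examples = {
    "identical frequencies": (0.50, 0.50),
    "modest difference": (0.45, 0.55),
    "larger difference": (0.34, 0.66),
}

for label, (p1, p2) in examples.items():
    print(f"{label}: Fst = {fst_two_populations(p1, p2):.4f}")
```

A frequency gap of 0.45 versus 0.55 gives an Fst of about 0.01, while 0.34 versus 0.66 gives about 0.10, which are the ~1% and ~10% cases described above.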

This is why ancient DNA results which reported that Mesolithic hunter-gatherers and Neolithic farmers in Central Europe who coexisted in rough proximity for thousands of years exhibited differences on the order of ~0.10 elicited surprise. These are values we are now expecting from continental-scale comparisons. Perhaps an appropriate analogy might be the coexistence of Pygmy groups and Bantu agriculturalists? Though there is some gene flow, the two populations exist in symbiosis and exhibit local ecological segregation.

In PNG continental scale Fst values are also seen among indigenous people. The differences between the peoples who live in the highlands and lowlands of PNG are equivalent to those between huge regions of Eurasia. This is not entirely surprising because there has been non-trivial gene flow into lowland populations from Austronesian groups, such as the Lapita culture. Many lowland groups even speak Austronesian languages today.

Using standard ADMIXTURE analysis the paper shows that many lowland groups have significant East Asian ancestry (red), while none of the highland groups do (some individuals with East Asian admixture seem to be due to very recent gene flow). But even within the highlands the genetic differences are striking. The Fst values between Finns and Southern European groups such as Spaniards are very high in a European context (due to Finnish Siberian ancestry as well as drift through a bottleneck), but most comparisons within the highland groups in PNG still exceed this.

The paper also argues that genetic differences between Papuans and the natives of Australia pre-date the rising sea levels at the beginning of the Holocene, when Sahul divided between its various constituents. This is not entirely surprising considering that the ecology of the highlands during the Pleistocene would have been considerably different from Australia to the south, resulting in sharp differences in the hunter-gatherer lifestyles. Additionally, there does not seem to have been a genetic cline. Papuans are symmetrically related to all Australian groups they had samples from.

Using coalescence-based genomic methods they inferred that separation between highlands and some lowland groups occurred ~10,000-20,000 years ago. That is, after the Last Glacial Maximum. For the highlands, the differences seem to date to within the last 10,000 years. The Holocene. Additionally, they see population increases in the highlands, correlating with the shift to agriculture (cultivation of taro).

None of the above is entirely surprising, though I would take the date inferences with a grain of salt. The key is to observe that large genetic differences, as well as cultural differences, accrued in the highlands of PNG during the Holocene. In the paper they have a social and cultural explanation for what’s going on:

  Fst values in PNG fall between those of hunter-gatherers and present-day populations of west Eurasia, suggesting that a transition to cultivation alone does not necessarily lead to genetic homogenization.

A key difference might be that PNG had no Bronze Age, which in west Eurasia was driven by an expansion of herders and led to massive population replacement, admixture, and cultural and linguistic change (7, 8), or Iron Age such as that linked to the expansion of Bantu-speaking farmers in Africa (24). Such cultural events have resulted in rapid Y-chromosome lineage expansions due to increased male reproductive variance (25), but we consistently find no evidence for this in PNG (fig. S13). Thus, in PNG, we may be seeing the genetic, linguistic, and cultural diversity that sedentary human societies can achieve in the absence of massive technology-driven expansions.

Peter Turchin in books like Ultrasociety has argued that one of the theses in Steven Pinker’s The Better Angels of Our Nature is incorrect: violence has not decreased monotonically, but peaked in less complex agricultural societies. PNG is clearly a case of this, as endemic warfare was a feature of highland societies when they encountered Europeans. Lawrence Keeley’s War Before Civilization: The Myth of the Peaceful Savage gives so much attention to highland PNG because it is a contemporary illustration of a Neolithic society which until recently had not developed state-level institutions.

What papers like these are showing is that cultural and anthropological dynamics strongly shape the nature of genetic variation among humans. Simple models which assume as a null hypothesis that gene flow occurs through diffusion processes across a landscape where only geographic obstacles are relevant simply do not capture enough of the dynamic. Human cultures strongly shape the nature of interactions, and therefore the genetic variation we see around us.

Inbreeding causing issues in Osama bin Laden’s family

I didn’t figure I would have much to say about 9/11 that others could not say (aside from perhaps that you should read Marc Sageman’s Understanding Terror Networks if you want an ethnography of the Salafi jihadist movement which led to al-Qaeda). But The Daily Beast has a profile of one of Osama bin Laden’s sons:

Moreover, by this time, bin Laden already had two wives. But Najwa, the first of them, encouraged him to pursue Khairia, believing that having someone with her training permanently on hand would help her son Saad and his brothers and sisters, some of whom also suffered from developmental disorders.

Osama bin Laden had some two dozen children. But it was strange to me to see mention of several children with developmental disorders. Inbreeding is a major burden for Arab Muslim societies. And sure enough, Osama bin Laden’s first wife was his first cousin. She gave birth to around 10 children. Her father was Osama bin Laden’s mother’s brother. With the possibility of several generations of cousin marriage their relatedness may have been closer than that of normal half-siblings.

Note: Osama bin Laden’s father was from Yemen and his mother from Syria. So he was most certainly not inbred.
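To put numbers on the cousin versus half-sibling comparison, here is a minimal sketch of the standard recursive kinship coefficient computed on small pedigrees. The pedigrees are generic and hypothetical (they are not reconstructions of any real family), and the function and individual names are my own. For non-inbred individuals the familiar coefficient of relatedness is twice the kinship coefficient.

```python
def kinship(pedigree, a, b):
    """Kinship coefficient between individuals a and b.

    pedigree: dict mapping name -> (father, mother), or None for a founder.
    Entries must be listed with parents before their children.
    """
    order = {name: i for i, name in enumerate(pedigree)}

    def phi(x, y):
        if x == y:
            parents = pedigree[x]
            if parents is None:
                return 0.5
            return 0.5 * (1.0 + phi(*parents))
        if order[x] < order[y]:
            x, y = y, x            # recurse through the later-listed individual
        if pedigree[x] is None:
            if pedigree[y] is None:
                return 0.0         # two distinct founders are taken as unrelated
            x, y = y, x            # a founder cannot descend from y, so expand y
        father, mother = pedigree[x]
        return 0.5 * (phi(father, y) + phi(mother, y))

    return phi(a, b)


# Hypothetical pedigrees; G* are grandparents, S* are unrelated spouses.
first_cousins = {
    "G1": None, "G2": None, "SA": None, "SB": None,
    "A": ("G1", "G2"), "B": ("G1", "G2"),    # A and B are full siblings
    "C": ("A", "SA"), "D": ("B", "SB"),      # C and D are first cousins
}
double_first_cousins = {
    "G1": None, "G2": None, "G3": None, "G4": None,
    "A1": ("G1", "G2"), "A2": ("G1", "G2"),  # two siblings...
    "B1": ("G3", "G4"), "B2": ("G3", "G4"),  # ...marry two siblings
    "C": ("A1", "B1"), "D": ("A2", "B2"),
}
half_siblings = {
    "P": None, "M1": None, "M2": None,
    "H1": ("P", "M1"), "H2": ("P", "M2"),    # one shared parent
}

r_first_cousins = 2 * kinship(first_cousins, "C", "D")
r_double_cousins = 2 * kinship(double_first_cousins, "C", "D")
r_half_siblings = 2 * kinship(half_siblings, "H1", "H2")
print(r_first_cousins, r_double_cousins, r_half_siblings)
```

Ordinary first cousins come out at a relatedness of 1/8, half that of half-siblings at 1/4. Extra shared ancestry closes the gap quickly: double first cousins already match half-siblings, so additional loops from earlier cousin marriages in a pedigree would push relatedness higher still.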

Ancient Europeans: isolated, always on the edge of extinction

A few years ago I suggested to the paleoanthropologist Chris Stringer that the first modern humans who arrived in Europe did not contribute appreciable ancestry to modern populations in the continent (appreciable as in 1% or more of the genome).* It seems I may have been right according to results from a 2016 paper, The genetic history of Ice Age Europe. The very oldest European ancient genome samples “failed to contribute appreciably to the current European gene pool.”

Why did I make this claim? Two reasons:

1) 40,000 years is a long time, and there was already substantial evidence of major population turnovers across northern Eurasia by this point. You go far enough into the future and it’s not likely that a local population leaves any descendants. So just work that logic backward.

2) There was already evidence of low population sizes and high isolation levels between groups in Pleistocene and Mesolithic/Neolithic Europe. This would again argue in favor of a high likelihood of local extinctions given enough time.

This does not apply only to modern humans, descendants of southern, likely African, populations. Neanderthals themselves show evidence of high homogeneity, and expansions through bottlenecks over the ~600,000 years of their flourishing.

The reason that these dynamics characterized modern humans and earlier hominins in northern Eurasia is what ecologists would term an abiotic factor: the Ice Age. Obviously humans could make a go of it on the margins of the tundra (the Neanderthals seem less adept at penetrating the very coldest of terrain in comparison to their modern human successors; they likely frequented the wooded fringes, see The Humans Who Went Extinct). We have the evidence of several million years of continuous habitation by our lineage. But many of the ancient genomes from these areas, whether they be Denisovan, Neanderthal, or Mesolithic European hunter-gatherer, show indications of being characterized by very low effective population sizes. Things only change with the arrival of farming and agro-pastoralism.

For two obvious reasons we happen to have many ancient European genomes. First, many of the researchers are located in Europe, and the continent has a well developed archaeological profession which can provide well preserved samples with provenance and dates. And second, Europe is cool enough that degradation rates are going to be lower than if the climate was warmer. But if Europe, as part of northern Eurasia, is subject to peculiar exceptional demographic dynamics we need to be cautious about generalizing in terms of the inferences we make about human population genetic history. Remember that ancient Middle Eastern farmers already show evidence of having notably larger effective population sizes than European hunter-gatherers.

Two new preprints confirm the long term population dynamics typical of European hunter-gatherers, Assessing the relationship of ancient and modern populations and Genomics of Mesolithic Scandinavia reveal colonization routes and high-latitude adaptation. The first preprint is rather methods heavy, and seems more of a pathfinder toward new ways to extract more analytic juice from ancient DNA results. Those who have worked with population genomic data are probably not surprised at the emphasis on collecting numbers of individuals as opposed to single genome quality. That is, for the questions population geneticists are interested in “two samples sequenced to 0.5x coverage provide better resolution than a single sample sequenced to 2x coverage.”

I encourage readers (and “peer reviewers”) to dig into the appendix of Assessing the relationship of ancient and modern populations. I won’t pretend I have (yet). Rather, I want to highlight an interesting empirical finding when the method was applied to extant ancient genomic samples: “we found that no ancient samples represent direct ancestors of modern Europeans.”

This is not surprising. The ‘hunter-gatherer’ resurgence of the Middle Neolithic notwithstanding, Northern Europe was subject to two major population replacements, while Southern Europe was subject to one, but of a substantial nature. Recall that the Bell Beaker paper found that “spread of the Beaker Complex to Britain was mediated by migration from the continent that replaced >90% of Britain’s Neolithic gene pool within a few hundred years.” This means that less than 10% of modern Britons’ ancestry is a combination of hunter-gatherers and Neolithic farmers.

And yet if you look at various forms of model-based admixture analyses it seems as if modern Europeans have substantial dollops of hunter-gatherer ancestry (and hunter-gatherer U5 mtDNA and Y chromosomal lineages I1 and I2, associated with Pleistocene Europeans, are found at ~10% frequency in modern Europe in the aggregate; though I suspect this is a floor). What gives? Let’s look at the second preprint, which is more focused on new empirical results from ancient Scandinavian genomes, Genomics of Mesolithic Scandinavia reveal colonization routes and high-latitude adaptation. From early on in the preprint:

Based on SF12’s high-coverage and high-quality genome, we estimate the number of single nucleotide polymorphisms (SNPs) hitherto unknown (that are not recorded in dbSNP (v142)) to be c. 10,600. This is almost twice the number of unique variants (c. 6,000) per Finnish individual (Supplementary Information 3) and close to the median per European individual in the 1000 Genomes Project (23) (c. 11,400, Supplementary Information 3). At least 17% of these SNPs that are not found in modern-day individuals, were in fact common among the Mesolithic Scandinavians (seen in the low coverage data conditional on the observation in SF12), suggesting that a substantial fraction of human variation has been lost in the past 9,000 years (Supplementary Information 3). In other words, the SHGs (as well as WHGs and EHGs) have no direct descendants, or a population that show direct continuity with the Mesolithic populations (Supplementary Information 6) (13–17). Thus, many genetic variants found in Mesolithic individuals have not been carried over to modern-day groups.

The gist of the paper in terms of archaeology and demographic history is that Scandinavian hunter-gatherers were a compound population. One component of their ancestry is what we term “Western hunter-gatherers” (WHG), who descended from the late Pleistocene Villabruna cluster (see paper mentioned earlier). Samples from Belgium, Switzerland, and Spain all belong to this cluster. The second element is “Eastern hunter-gatherers” (EHG). These samples derive from the Karelia region, to the east of modern Finland, bounded by the White Sea to the north. EHG populations exhibit affinities to both WHG as well as Siberian populations who contributed ancestry to Amerindians, the “Ancestral North Eurasians” (ANE). There is a question at this point whether EHG are the product of a pulse admixture between an ANE and WHG population, or whether there was a long existent ANE-WHG east-west cline which the EHG were situated upon. That is neither here nor there (the Tartu group has a paper addressing this leaning toward isolation-by-distance from what I recall).

Explicitly testing models against the genetic data, the authors conclude that there was a migration of EHG populations with a specific archaeological culture around the north fringe of Scandinavia, down the Norwegian coast. Conversely, a WHG population presumably migrated up from the south and somewhat to the east (from the Norwegian perspective).

And yet the distinctiveness of the very high quality genome, as inferred from the unique SNPs it carries, suggests to them that very little of the ancestry of modern Scandinavians (and Finns to be sure) derives from these ancient populations. Very little does not mean none. There is a lot of functional analysis in the paper and supplements which I will not discuss in this post, and one aspect is that it seems some adaptive alleles for high latitudes might persist down to the present in Nordic populations as a gift from these ancient forebears. This is no surprise; not all regions of the genome are created equal (a more extreme case is the Denisovan derived high altitude adaptation haplotype in modern Tibetans).

Nevertheless, there was a great disruption. First, the arrival of farmers whose ultimate origins were in Anatolia ~6,000 years ago to the southern third of Scandinavia introduced a new element which came in force (agriculture spread over the south in a few centuries). A bit over a thousand years later the Corded Ware people, who were likely Indo-European speakers, arrived. These Indo-European speakers brought with them a substantial proportion of ancestry related to the hunter-gatherers because they descended in major fraction from the EHG (and later accrued more European hunter-gatherer ancestry from both the early farmers and likely some residual hunter-gatherer populations who switched to agro-pastoralism**).

For several years I’ve had discussions with researchers whose daily bread & butter are the ancient DNA data sets of Europe. I’ve gotten some impressions implicitly, and also from things they’ve said directly. It strikes me that the Bantu expansion may not be a bad analogy in regards to the expansion of farming in Europe (and later agro-pastoralism). Though the expanding farmers initially mixed with hunter-gatherers on the frontier, once they got a head of steam they likely replaced small hunter-gatherer groups in totality, except in areas like Scandinavia and along the maritime fringe where ecological conditions were such that hunter-gatherers were at an advantage (War Before Civilization seems to describe a massive farmer vs. coastal forager war on the North Sea).

But this is not the end of the story for Norden. At SMBE I saw some ancient genome analysis from Finland on a poster. Combined with ancient genomic analysis from the Baltic, along with deeper analysis of modern Finnish mtDNA, it seems likely that the expansion of Finno-Samic languages occurred on the order of ~2,000 years ago, after the initial expansion of Corded Ware agro-pastoralists.

The Sami in particular seem to have followed the same path along the northern fringe of Scandinavia that the EHG blazed. Though they herd reindeer, they were also Europe’s last indigenous hunter-gatherers. Genetically they exhibit the same minority eastern affinities in their ancestry that the Finns do, though to a greater extent. But their mtDNA harbors some distinctive lineages, which might be evidence of absorption of an ancient Scandinavian substrate.

I’ll leave it to someone else to explain how and why the Finns and Sami came to occupy the areas where they currently dominate (note that historically Sami were present much further south in Norway and Sweden than they are today). But note that in Latvia and Lithuania the N1c Y chromosomal lineage is very common, despite no language shift, indicating that there was a great deal of reciprocal mixing on the Baltic.

Overall the story is of both population and cultural turnover. This should not surprise when one considers that northern Eurasia is on the frontier of the human range. And perhaps it should temper the inferences we make about other areas of the world.

* You may notice that this threshold is lower than the Neanderthal admixture proportions in the non-African genome. Why is this old admixture still detectable while modern human lineages go extinct? Because it seems to have occurred when non-African humans had a very small effective population, and was mixed thoroughly. Because of the even genomic distribution this ancestry has not been lost in any of the daughter populations.

** Haplogroup I1, which descends from European late Pleistocene populations, exhibits a star phylogeny of similar time depth as R1b and R1a.

Castes are not just of mind

Before Nicholas Dirks was a controversial chancellor of UC Berkeley, he was a well regarded historian of South Asia. He wrote Castes of Mind: Colonialism and the Making of Modern India. I read it, along with other books on the topic in the middle 2000s.

Here is the Amazon summary from Library Journal:

Is India’s caste system the remnant of ancient India’s social practices or the result of the historical relationship between India and British colonial rule? Dirks (history and anthropology, Columbia Univ.) elects to support the latter view. Adhering to the school of Orientalist thought promulgated by Edward Said and Bernard Cohn, Dirks argues that British colonial control of India for 200 years pivoted on its manipulation of the caste system. He hypothesizes that caste was used to organize India’s diverse social groups for the benefit of British control. His thesis embraces substantial and powerfully argued evidence. It suffers, however, from its restricted focus to mainly southern India and its near polemic and obsessive assertions. Authors with differing views on India’s ethnology suffer near-peremptory dismissal. Nevertheless, this groundbreaking work of interpretation demands a careful scholarly reading and response.

The condensation is too reductive. Dirks does not assert that caste structures (and jati) date to the British period, but the thrust of the book clearly leaves the impression that this particular identity’s formative shape on the modern landscape derives from the colonial experience. The British did not invent caste, but the modern relevance seems to date to the British period.

This is in keeping with a mode of thought flourishing today under the rubric of postcolonialism, with roots back to Edward Said’s Orientalism. As a scholar of literature Said’s historical analysis suffered from the lack of deep knowledge. A cursory reading of Orientalism picks up all sorts of errors of fact. But compared to his heirs Said was actually a paragon of analytical rigor. I say this after reading some contemporary postcolonial works, and going back and re-reading Orientalism.

Not to put too fine a point on it, postcolonialism is more about a rhetorical posture which aims to destroy what it perceives as Western hegemonic culture. In the process it transforms the modern West into the causal root of almost all social and cultural phenomena, especially those that are not egalitarian. Anyone with a casual grasp of world history can see this, which basically means very few can, since so few actually care about details of fact.

Castes of Mind is an interesting book, and a denser piece of scholarship than Orientalism. Its perspective is clear, and though it is not without qualification, many people read it to mean that caste was socially constructed by the British.

This seems false. It has become quite evident that even the classical varna categories seem to correlate with genome-wide patterns of relatedness. And the Indian jatis have been endogamous for on the order of two thousand years. From The New York Times, In South Asian Social Castes, a Living Lab for Genetic Disease:

The Vysya may have other medical predispositions that have yet to be characterized — as may hundreds of other subpopulations across South Asia, according to a study published in Nature Genetics on Monday. The researchers suspect that many such medical conditions are related to how these groups have stayed genetically separate while living side by side for thousands of years.

This is not really a new finding. It was clear in 2009’s Reconstructing Indian Population History. It’s more clear now in The promise of disease gene discovery in South Asia.

Unfortunately, though, the science is not well known in any depth among the general public. The ascendency of social constructionism is such that a garbled and debased view that “caste was invented by the British” will continue to be the “smart” and fashionable view among many intellectual elites.

SLC24A5 is very important, but we don’t know why

The golden age of pigmentation genetics started in 2005 with SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Prior to that pigmentation genetics was really to a great extent coat color genetics, done in mice and other organisms which have a lot of pelage variation.

Of course there was work on humans, mostly related to the melanocortin 1 receptor. But more interesting were classical pedigree studies which indicated that the number of loci controlling variation in pigmentation was not that high. Thus, it was a mildly polygenic trait insofar as some large effect quantitative trait loci could be discerned in the inheritance patterns.

From The Genetics of Human Populations, written in the 1960s, but still useful today because of its comprehensive survey of the classical period:

Depending on what study samples you use, variation at the SLC24A5 locus explains less than 10% or more than 30% of the total variance. But it is probably the biggest effect locus on the whole in human populations when you pool them all together (obviously it explains little variance in Africans or eastern non-Africans since it is homozygous ancestral by and large in both groups).

One aspect of the derived SNP in this locus is that it seems to be under strong selection. In a European 1000 Genomes sample there are 1003 copies of the derived allele, and 3 of the ancestral. Curiously this allele was absent in Western European Mesolithic hunter-gatherers, though it was present in hunter-gatherers on the northern and eastern fringes of the continent. It was also present in Caucasian hunter-gatherers and farmers from the Middle East who migrated to Europe. It seems very likely that these sorts of high frequencies are due to selection in Europe.
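
To put a number on how close to fixation that is, here is a minimal sketch of the arithmetic. The function name is my own invention for illustration; the only inputs taken from the text are the two allele counts.

```python
def allele_frequency(derived_copies, ancestral_copies):
    """Frequency of the derived allele among all sampled chromosomes."""
    total = derived_copies + ancestral_copies
    if total == 0:
        raise ValueError("no chromosomes sampled")
    return derived_copies / total


# Counts quoted above for the European 1000 Genomes sample.
derived, ancestral = 1003, 3
freq = allele_frequency(derived, ancestral)
print(f"derived allele frequency: {freq:.4f}")  # prints 0.9970
```

With only 3 ancestral copies out of 1006 chromosomes, the derived allele is within a fraction of a percent of fixation in this sample.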

The variant is also present at appreciable frequencies in many South Asian populations, and there seems to have been in situ selection there too, as well as in the Near East. In Ethiopia it also seems to be under selection.

It could be something due to radiation…but the Near East and South Asia are quite high intensity in that regard. As are the highlands of Ethiopia. About seven years ago I suggested that rather than UV radiation as such the depigmentation that has occurred across the Holocene might be due to agriculture and changes in diet.

But a new result from southern Africa presented at the SMBE meeting this year suggests that this cannot be a comprehensive answer. Meng Lin in Brenna Henn’s lab uses a broad panel of KhoeSan populations to find that the derived allele on SLC24A5 reaches ~40% frequency. Being generous, a high estimate of West Eurasian admixture in these groups is probably ~10%. Where did this allele come from? The results from Joe Pickrell a few years back are sufficient to explain: there was a movement of pastoralists with distant West Eurasian ancestry who brought cattle to southern Africa, and so resulted in the ethnogenesis of groups such as the Nama people (there is also Y chromosomal work by Henn on this).
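
The gap between ~10% admixture and ~40% allele frequency is the whole point, so here is a rough sketch of what admixture alone would predict. The function name is hypothetical, and two of the inputs are my assumptions rather than anything reported by Lin: that the derived allele was fixed in the incoming source population (a generous upper bound) and absent locally before contact.

```python
def expected_frequency_from_admixture(admixture_fraction, freq_in_source, freq_in_local):
    """Allele frequency expected in a mixed population if nothing but
    admixture (no selection, no drift) were acting."""
    return (admixture_fraction * freq_in_source
            + (1.0 - admixture_fraction) * freq_in_local)


admixture = 0.10   # generous West Eurasian ancestry fraction, from the text
source_freq = 1.0  # ASSUMPTION: derived allele fixed in the source population
local_freq = 0.0   # ASSUMPTION: derived allele absent before contact
observed = 0.40    # approximate frequency reported for the KhoeSan panel

expected = expected_frequency_from_admixture(admixture, source_freq, local_freq)
print(f"expected from admixture alone: {expected:.2f}")
print(f"observed minus expected:       {observed - expected:.2f}")
```

Even with the most generous inputs, admixture alone gets you to roughly a quarter of the observed frequency. Something else has to account for the rest.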

Sad human with two derived alleles of SNP of interest

Lin reports that the haplotype around SLC24A5 is the same one as in Western Eurasia. Iain Mathieson (who is now at Penn if anyone is looking for something to do in grad school or a post-doc) has told me that the haplotypes in the Motala Mesolithic hunter-gatherers and in the hunter-gatherers from the Caucasus are the same. It seems that this haplotype was widespread early in the Holocene. Curiously, the Motala hunter-gatherers also carry the East Asian haplotype around their derived EDAR variant.

I don’t know what to make of this. My intuition is that if a haplotype like this was so widespread ~10,000 years ago, recombination would have broken it apart into smaller pieces so that haplotype structure would be easier to discern. As it is that doesn’t seem to be the case.
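
For a sense of scale behind that intuition, here is a back-of-the-envelope sketch. It uses a textbook simplification, not anything from Lin's or Mathieson's analyses: along a single lineage, the unrecombined segment on each side of a focal site is treated as exponentially distributed with mean 1/g Morgans after g generations, so the expected total is about 2/g Morgans. The generation time is also my assumption.

```python
def expected_tract_length_cm(generations):
    """Expected length, in centimorgans, of the unrecombined segment
    surrounding a focal site along one lineage after `generations` meioses.

    Simplified model: each flank ~ Exponential with mean 1/generations
    Morgans, so the two flanks together average 2/generations Morgans.
    """
    if generations <= 0:
        raise ValueError("generations must be positive")
    return 2.0 / generations * 100.0  # Morgans -> centimorgans


years = 10_000
years_per_generation = 28  # ASSUMPTION: a commonly used human generation time
generations = years / years_per_generation

print(f"generations elapsed: {generations:.0f}")
print(f"expected intact segment: {expected_tract_length_cm(generations):.2f} cM")
```

Under these assumptions the intact segment is well under a centimorgan, which is why a long shared haplotype across such distant populations is unexpected. Selection and demography can both stretch this, so treat it as an order-of-magnitude guide only.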

And we also don’t know what’s going on with SLC24A5. Obviously it impacts skin color. It has been shown to do so in admixed populations. But it is hard to believe that that is the sole target of natural selection here.

The fad for dietary adaptations is not going away

Food is a big deal for humans. Without it we die. Unlike some animals (here’s looking at you pandas) we’re omnivorous. We eat fruit, nuts, greens, meat, fish, and even fungus. Some of us even eat things which give off signals of being dangerous or unpalatable, whether it be hot sauce or lutefisk.

This ability to eat a wide variety of items is a human talent. Those who have put their cats on vegetarian diets know this. After a million or so years of being hunters and gatherers with a presumably varied diet, for thousands and thousands of years most humans at any given time ate some form of grain-based gruel. Though I am sympathetic to the argument that in terms of quality of life this was a detriment to median human well being, agriculture allowed our species to extract orders of magnitude more calories from a unit of land, though there were exceptions, such as in marine environments (more on this later).

Ergo, some scholars, most prominently Peter Bellwood, have argued that farming did not spread through cultural diffusion. Rather, farmers simply reproduced at much higher rates because of the efficiency of their lifestyle in comparison to that of hunter-gatherers. The latest research, using ancient DNA, broadly confirms this hypothesis. More precisely, it seems that cultural revolutions in the Holocene have shaped most of the genetic variation we see around us.

But genetic variation is not just a matter of genealogy, that is, the pattern of relationships, ancestor to descendant, and the extent of admixture across lineages. Selection is another parameter in evolutionary genetics. This can even have genome-wide impacts. It seems quite possible that current levels of Neanderthal ancestry are lower than might otherwise have been the case due to selection against functional variants derived from Neanderthals, which are less fit against a modern human genetic background.

The importance of selection has long been known and explored. Sickle-cell anemia only exists because of balancing selection. Ancient DNA has revealed that many of the salient traits we associate with a given population, e.g., lactose tolerance or blue eyes, have undergone massive changes in population wide frequency over the last 10,000 years. Some of this is due to population replacement or admixture. But some of it is due to selection after the demographic events. To give a concrete example, the frequency of variants associated with blue eyes in modern Europeans dropped rapidly with the expansion of farmers from the Near East ~10,000 years ago, but has gradually increased over time until it is the modal allele in much of Northern Europe. Lactase persistence in contrast is not an ancient characteristic which has had its ups and downs, but something new that evolved due to the cultural shock of the adoption of dairy consumption by humans as adults. The region around lactase is one of the strongest signals of natural selection in the European genome, and ancient DNA confirms that the ubiquity of the lactase persistent allele is a very recent phenomenon.

But obviously lactase is not going to be the only target of selection in the human genome. Not only can humans eat many different things, but we change our portfolio of proportions rather quickly. In A Farewell to Alms the economic historian Gregory Clark observed that English peasants ate very differently before and after the Black Death. As any ecologist knows, populations are resource constrained when they are near the carrying capacity, and in England during the High Medieval period there was massive population growth due to gains in productivity (e.g., the moldboard plough) as well as intensification of farming and utilization of all the marginal land.

After the Black Death (which came in waves repeatedly) there was a massive population decline across much of Europe. Because institutions and practices were optimized toward maintaining a much higher population, European peasants lived a much better lifestyle after the population crash because the pie was being cut into far fewer pieces. In other words, centuries of life on the margins just scraping by did not mean that English peasants couldn’t live large when the times allowed for it. We were somewhat pre-adapted.

Our ability to eat a variety of items, and the constant varying of the proportions and kind of elements which go into our diet, mean that sciences like nutrition are very difficult. And, it also means that attempts to construct simple stories of adaptation and functional patterns from regions of the genome implicated in diet often fail. But with better analytic technologies (whole genome sequencing, large sample sizes) and some elbow grease some scientists are starting to get a better understanding.

A group of researchers at Cornell has been taking a closer look at the FADS genes over the past few years (as well as others at CTEG). These are three nearby genes, FADS1, FADS2, and FADS3 (they probably underwent duplication). These genes are involved in the metabolization of fatty acids, and dietary regime turns out to have a major impact on variation around these loci.

The most recent paper out of the Cornell group, Dietary adaptation of FADS genes in Europe varied across time and geography:

Fatty acid desaturase (FADS) genes encode rate-limiting enzymes for the biosynthesis of omega-6 and omega-3 long-chain polyunsaturated fatty acids (LCPUFAs). This biosynthesis is essential for individuals subsisting on LCPUFA-poor diets (for example, plant-based). Positive selection on FADS genes has been reported in multiple populations, but its cause and pattern in Europeans remain unknown. Here we demonstrate, using ancient and modern DNA, that positive selection acted on the same FADS variants both before and after the advent of farming in Europe, but on opposite (that is, alternative) alleles. Recent selection in farmers also varied geographically, with the strongest signal in southern Europe. These varying selection patterns concur with anthropological evidence of varying diets, and with the association of farming-adaptive alleles with higher FADS1 expression and thus enhanced LCPUFA biosynthesis. Genome-wide association studies reveal that farming-adaptive alleles not only increase LCPUFAs, but also affect other lipid levels and protect against several inflammatory diseases.

The paper itself can be difficult to follow because they’re juggling many things in the air. First, they’re not just looking at variants (e.g., SNPs, indels, etc.), but also the haplotypes that the variants are embedded in. That is, the sequence of markers which define an association of variants which indicate descent from common genealogical ancestors. Because recombination can break apart associations one has to engage with care in historical reconstruction of the arc of selection due to a causal variant embedded in different haplotypes.

But the great thing about this paper is that in the case of Europe they can access ancient DNA. So they perform inferences utilizing whole genomes from many extant human populations, but also inspect change in allele frequency trajectories over time because of the density of the temporal transect. The figure to the left shows variants in both an empirical and modeling framework, and how they change in frequency over time.

In short, variants associated with higher LCPUFA synthesis actually decreased over time in Pleistocene Europe. This is similar to the dynamic you see in the Greenland Inuit. With the arrival of farmers the dynamic changes. Some of this is due to admixture/replacement, but some of it cannot be accounted for by admixture and replacement. In other words, there was selection for the variants which synthesize more LCPUFA.

This is not just limited to Europe. The authors refer to other publications which show that the frequency of alleles associated with LCPUFA production is high in places like South Asia, notable for a cultural preference for plant-based diets, reinforced by the reality that animal protein was in very short supply. In Europe they can look at ancient DNA because we have it, but the lesson here is probably general: alternative allelic variants are being whipsawed in frequency by protean shifts in human cultural modes of production.

In War Before Civilization Lawrence Keeley observed that after the arrival of agriculture in Northern Europe, in a broad zone to the northwest of the continent, facing the Atlantic and North Sea, farming halted rather abruptly for centuries. Keeley then recounts evidence of organized conflict between two populations across a “no man’s land.”

But why didn’t the farmers just roll over the old populations as they had elsewhere? Probably because they couldn’t. It is well known that marine regions can often support very high densities of humans engaged in a gathering lifestyle. Though not farmers, these peoples are often also not nomadic, and occupy areas at high density. The tribes of the Pacific Northwest, dependent upon salmon fisheries, are classic examples. Even today much of the Northern European maritime fringe relies on the sea. High density means they had enough numbers to resist the human wave of advance of farmers. At least for a time.

Just as cultural forms wane and wax, so do some of the underlying genetic variants. If you dig into the guts of this paper you see much of the variation dates to the out of Africa period. There were no great sweeps which expunged all variation (at least in general). Rather, just as our omnivorous tastes are protean and changeable, so the genetic variation changes over time and space in a difficult to reduce manner. The flux of lifestyle change is probably usually faster than biological evolution can respond, so variation reducing optimization can never complete its work.

The modern age of the study of natural selection in the human genome began around when A Map of Recent Positive Selection In the Human Genome was published. And it continues with methods like SDS, which indicate that selection operates to this day. Not a great surprise, but solidifying our intuitions. In the supplements to the above paper the authors indicate that the focal alleles that they are interrogating exhibit coefficients of selection around ~0.5% or so. This is rather appreciable. The fact that fixation has not occurred indicates in part that selection has reversed or halted, as they noted. But another aspect is that there are correlated responses; the FADS genes are implicated in many things, as the authors note in relation to inflammatory diseases. But I’m not sure that the selection effects of these are really large in any case. I bet there are more important things going on that we haven’t discovered or understood.
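
To get a rough feel for what a selection coefficient of ~0.5% means in terms of time, here is a deterministic sketch. It uses the simplest genic-selection model, where the odds of the favored allele multiply by (1 + s) each generation. That model, the start and end frequencies, and the function name are my assumptions for illustration, not the authors' method; drift, dominance, and changing selection are all ignored.

```python
import math


def generations_to_reach(p_start, p_end, s):
    """Generations for a favored allele to go from p_start to p_end when
    its odds p/(1-p) grow by a factor of (1 + s) every generation."""
    if not (0 < p_start < 1 and 0 < p_end < 1):
        raise ValueError("frequencies must be strictly between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    return math.log(odds_end / odds_start) / math.log(1.0 + s)


s = 0.005  # ~0.5%, the order of magnitude quoted from the supplements
for p0, p1 in [(0.01, 0.99), (0.10, 0.90)]:  # ASSUMED frequency ranges
    g = generations_to_reach(p0, p1, s)
    print(f"{p0:.2f} -> {p1:.2f}: about {g:.0f} generations")
```

The takeaway is only about orders of magnitude: at this strength, a sweep is measured in many hundreds of generations, which is worth keeping in mind when asking why fixation has not occurred.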

Obviously genome-wide analyses are going to continue for the foreseeable future. Ten years ago my late friend Mike McKweon predicted that at some point genomics was going to have to be complemented by detailed follow up through bench-work. I’m not sure if we’re there yet, but there are only so many populations you can sequence, and only to a particular coverage, before you stop obtaining more information. Some selection sweeps will be simple stories with simple insights. But I suspect many more like FADS will be more complex, with the threads of the broader explanatory tapestry assembled publication by publication over time.

Citation: Ye, K., Gao, F., Wang, D., Bar-Yosef, O. & Keinan, A. Dietary adaptation of FADS genes in Europe varied across time and geography. Nat. Ecol. Evol. 1, 0167 (2017).