Genetic variation and disease in Africa

Very readable review, Gene Discovery for Complex Traits: Lessons from Africa. It’s open access, so I recommend it. The summary:

The genetics of African populations reveals an otherwise “missing layer” of human variation that arose between 100,000 and 5 million years ago. Both the vast number of these ancient variants and the selective pressures they survived yield insights into genes responsible for complex traits in all populations.

The main issue I might have is I’m not sure that focusing on 5 million year time spans is particularly useful. Rather, looking at the last major bottleneck for modern humans before the “Out of Africa” event would be key, since that’s when a lot of the common variation would disappear, and very rare variants probably don’t have deep time depth in any case. With all that being said, the qualitative analysis is on point.

One of the major issues in the “SNP-chip” era has been that ascertainment of variation has been skewed toward Europeans. Though more recent techniques have tried to fix this, this review points out that if you by necessity constrain the SNPs of interest to those that vary outside of Africa (most of the world’s population), you are taking many alleles private to Africa off the table. This is relevant because the “Out of Africa” bottleneck ~50,000 years ago means that African populations harbor a lot more genetic variation than non-African populations do.
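To put a rough number on why a bottleneck strips out variation: the textbook drift expectation is that heterozygosity shrinks by a factor of (1 - 1/(2N)) per generation in an idealized population of effective size N. Here is a minimal sketch of that formula. The starting heterozygosity, population sizes, and durations are made-up values for illustration, not estimates from the review.

```python
def heterozygosity_after(h0, effective_size, generations):
    """Expected heterozygosity after drift in an idealized Wright-Fisher population."""
    return h0 * (1.0 - 1.0 / (2.0 * effective_size)) ** generations


# Hypothetical numbers chosen only for illustration.
h0 = 0.30
large = heterozygosity_after(h0, effective_size=10_000, generations=1_000)
bottlenecked = heterozygosity_after(h0, effective_size=1_000, generations=1_000)
print(f"large population: {large:.3f}, bottlenecked population: {bottlenecked:.3f}")
```

The smaller population loses far more of its starting diversity over the same number of generations, which is the qualitative point about non-African populations.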

The move to high-quality whole genome sequencing obviates these concerns. As a matter of course African variation will be “picked up” since the marker set is not constrained ahead of time.

Importantly, the authors focus on South Africa and the Xhosa population. This group has ~20% Khoisan genetic ancestry, which is very diverse and very distinct from the remaining ~80% of its ancestry. With its large African immigrant population and highly diverse native groups, some of them quite admixed, South Africa could actually provide some hard-to-substitute value in biomedical genetics.

Africa, the churning continent

Martin Meredith’s The Fortunes of Africa glosses very quickly over one of the major reasons that the “great scramble” for the continent occurred in the late 19th century, the discovery of the usefulness of quinine as an anti-malarial agent. Perhaps because I’ve read Plagues and Peoples and The Retreat of the Elephants: An Environmental History of China, I have always been conscious of the role of disease in discouraging conquest and migration (malaria in Italy was also a way to limit the extent of long-term occupation).

The coastal regions of Africa had been subject to the trade and depredations of European actors for nearly 400 years when the Berlin Conference partitioned the continent amongst European powers. Despite the fact that much of the interior was not charted, there had long been a colonial presence. Accra, the modern capital of Ghana, was originally a 16th-century Portuguese fort, but for several centuries between the 17th and 19th centuries, it was actually a possession of Scandinavian powers, Sweden and Denmark! (before passing on to the British)

For all these centuries the heart of Africa was unknown to Europeans, in part because there were native powers blocking their way, but also because the mortality rates were so high for outsiders, as indicated above. It is no surprise that the main European settlement in Africa which was more than a simple trading fort was at the southern tip of the continent, where the climate was Mediterranean and so the disease burden low.

But once quinine, and machine guns, came into the equation the interior was accessible. It all happened rather quickly in a few decades, though in some cases European ‘colonialism’ involved little more than nominal allegiance of tribal chieftains.

Now, a new paper in Cell may herald the beginning of a great genomic scramble to understand the history of Africa. Carl Zimmer in The New York Times has a piece up, Clues to Africa’s Mysterious Past Found in Ancient Skeletons. It begins:

It was only two years ago that researchers found the first ancient human genome in Africa: a skeleton in a cave in Ethiopia yielded DNA that turned out to be 4,500 years old.

On Thursday, an international team of scientists reported that they had recovered far older genes from bone fragments in Malawi dating back 8,100 years. The researchers also retrieved DNA from 15 other ancient people in eastern and southern Africa, and compared the genes to those of living Africans.

The general results of the paper, Skoglund’s Reconstructing Prehistoric African Population Structure, were presented at the SMBE meeting this summer. So in broad sketches I was not surprised, though the details require some digging into.

The Bantu Expansion repatterned the population structure of Africa

Between 1000 BC and 500 AD the expansion of iron wielding agriculturalists from the environs of modern day southern Cameroon reshaped the cultural and genetic landscape of Sub-Saharan Africa. The relatively late date of this expansion should give us a general sense of how careful we need to be about making assertions about “prehistoric Africa.” When Egypt’s New Kingdom was expanding southward along the Nile and into the Levant, Sub-Saharan Africa was qualitatively very different from what we see today in both culture and genetic structure. The continent’s contemporary human geography does not have a deep time depth.

In any case, anyone who has worked with genetic data from Africa is struck by how similar Bantu-speaking populations are genetically. So these results are not world-shaking. South African Zulus occupy positions far closer to Kenyans and Congolese than they do to Khoisan peoples to the west of them facing the Kalahari. The Xhosa people on the cultural frontier of the Bantus in South Africa exhibit substantial admixture from Khoisan (to the point where they have even integrated clicks into their language!), but even they are preponderantly non-Khoisan.

By sampling ancient genomes across a geographical transect which runs up the Rift Valley to Ethiopia, Skoglund et al. show that before the Bantu Expansion there was a north-south genetic relatedness cline. When this result was presented at SMBE a few friends were quite excited that they were being presented a cline, as some researchers have felt that this particular lab group has a tendency to model everything as pulse admixtures between distinct ancestral populations. But the reasonably deep time transect in Malawi exhibited no variance in admixture fractions, which is indicative of the likelihood that its “mixed” status at a particular K cluster is simply an artifact (see this post for what’s going on).

One particular aspect of the results from Malawi is that they found no continuity between contemporary populations, Bantu agriculturalists, and these ancient hunter-gatherers. That is, hunter-gatherers were replaced in toto. This is not entirely surprising, as many researchers who have worked with European ancient DNA believe that hunter-gatherers in many areas likewise left no descendants at all (the “hunter-gatherer” fractions in modern groups in a particular region are believed to be due to migration of mixed populations who obtained “hunter-gatherer” ancestry at another locale).

But the Bantus were not the first “intrusive” population

These results also have some moderate surprises. A Tanzanian sample from 1100 BC from a pastoralist context exhibits an ancestral mix which is Sub-Saharan African and West Eurasian/North African. More precisely, about 38 percent of this individual’s ancestry resembles that of the Pre-Pottery Neolithic culture of the Levant, and the rest of the genome most resembles a 4500 year old sample from Ethiopia.
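To make concrete what a statement like “38 percent of this individual’s ancestry” means, here is a toy two-source mixture model: the admixed population’s allele frequency at each SNP is a weighted average of the two source frequencies, and the weight can be recovered by least squares. This is only a sketch under strong assumptions (no drift, no sampling noise, perfectly known sources). Real analyses use more elaborate methods that account for those, and the frequencies below are invented, not data from the paper.

```python
def admixed_frequency(alpha, p_source1, p_source2):
    """Allele frequency in a population drawing fraction alpha from source 1."""
    return alpha * p_source1 + (1 - alpha) * p_source2


def estimate_alpha(p_target, p_source1, p_source2):
    """Least-squares alpha for target = alpha * s1 + (1 - alpha) * s2 across SNPs."""
    num = sum((t - b) * (a - b) for t, a, b in zip(p_target, p_source1, p_source2))
    den = sum((a - b) ** 2 for a, b in zip(p_source1, p_source2))
    return num / den


# Hypothetical allele frequencies at four SNPs (not real data).
levant_like = [0.10, 0.80, 0.45, 0.60]
ethiopia_like = [0.70, 0.20, 0.05, 0.90]
pastoralist = [admixed_frequency(0.38, a, b) for a, b in zip(levant_like, ethiopia_like)]

alpha_hat = estimate_alpha(pastoralist, levant_like, ethiopia_like)
print(f"estimated Levant-like fraction: {alpha_hat:.2f}")
```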

This date is before the initiation of the Bantu Expansion. The genetic results in this work, and earlier publications, strongly point to the likelihood that this population (or populations) mediated the spread of pastoralism to the south and west. In particular, all Khoisan groups of southern Africa seem to have admixture from this group, more (Khoi) or less (San).

But a curious aspect of this result is that these early pastoralists do not carry any evidence of admixture from ancient eastern farmers from the Zagros region. That is, the West Eurasian gene flow into the Tanzanian pastoralists predates the great exchange/admixture in the Middle East between western and eastern lineages. Since that reciprocal gene flow seems to have occurred at least 2,000 years before the Tanzanian pastoralist’s time, it suggests that this West Eurasian element was in Africa for thousands of years.

The second important point to emphasize is that the Iranian-like component is found among Cushitic speaking Somali and Afar samples, at 15-20% clips. Looking at the supporting tables a wide range of East African populations have the Tanzanian pastoralist ancestry but do not show evidence of the Iranian-like ancestry, which is now ubiquitous in the Middle East, and presumably in the highlands of Ethiopia as well (which usually show somewhat higher levels of Eurasian ancestry than is the case on the coast, especially among Semitic language speakers).

This fact is important because many of the Nilotic peoples are reputed to have absorbed Cushitic groups relatively recently in the past. This is also true for Bantu speaking groups according to these and other data. Finally, the Sandawe, who speak a language with clicks, and so may have some affinity to Khoisan, are often stated to have Cushitic affinities (looking at the data they clearly have West Eurasian ancestry). But their Eurasian ancestry seems to lack the Iranian-like component as well.

None of the populations with putative Cushitic ancestry, but who lack Iranian-like ancestry, speak a Cushitic language (most speak Nilotic languages, but East African Bantus have mixed with these Nilotic groups, so they have the same ancestry). Therefore I wonder if these pastoralists spoke an Afro-Asiatic language in the first place.

A patchy landscape

The phylogenetic tree illustrates the relationships of various African populations without much recent Eurasian ancestry. In The New York Times article David Reich indicates that the Hadza people of Tanzania are the closest Sub-Saharan Africans to the lineage ancestral to non-Africans. This is actually a simplification of what you see in the paper, and is illustrated in the tree to the left. The 4500 year old Ethiopian sample, which does not have Eurasian ancestry, nevertheless is the closest of all Sub-Saharan groups to Eurasians. The Hadza have the highest fraction of this ancestral component of all Sub-Saharan Africans in their data set, but many other populations also carry this ancestry (the Tanzanian pastoralist combined the PPN ancestry with this element).

This was a patchy landscape of inhabitation, because though the Tanzanian pastoralist ancestry, a combination of PPN and proto-Ethiopian, spread all the way to the Cape, there were populations, such as the Hadza and a 400 year old individual sampled from the island of Pemba, which lacked this genetic variation. Indeed, they are also not on the north-south (proto-Ethiopian to Khoisan) cline that featured so prominently above.

The sampling of ancient individuals is not very dense yet, so we can’t say much. But I think it does indicate we need to be cautious about assuming that gene flow dynamics run as-the-crow-flies, simply as a function of distance. Ecological suitability no doubt plays a strong role in how populations expand. The Bantus, for example, were stopped in South Africa by the fact that their agricultural toolkit was not suitable for the western half of the country. So when Europeans arrived in the 16th century the residents of the Cape were Khoi pastoralists.

The presence of the Hadza in Tanzania, or an individual of unmixed proto-Ethiopian ancestry on Pemba 400 years ago, indicates that the ethnic geography of East Africa has long been fluid and dynamic. There is no reason to suppose that the Hadza are not themselves migrants from further north, perhaps easily explaining why they are not on the north-south cline so evident from the ancient DNA.

The rise of Basal Humans

Several years ago researchers discovered that the first farmers of Europe, who descended from an Anatolian population, were in part derived from a group which split off very early from other Eurasian populations. This group was termed “Basal Eurasian” (BEu) because it was an outgroup to all other Eurasians, including European hunter-gatherers, East Asians, Oceanians, and the natives of the New World. Subsequent work has shown that the early Neolithic farmers of the Near East, whether they’re from the Levant or the Zagros, had about half their ancestry from this population.

No ancient genomes which are predominantly BEu have been discovered yet. The fact that populations on the cusp of the Holocene seem to have Basal Eurasian ancestry across the Middle East suggests that the admixture with hunter-gatherers related to those of Europe must have occurred during the Pleistocene. But Basal Eurasian is arguably the most parsimonious explanation of the shared drift patterns that we see.

Skoglund et al. suggest that there may be the necessity of a similar construct in Africa. They are not the first; Schlebusch et al. also suggested the necessity of this lineage in the supplements of their preprint on ancient South Africans. Within Skoglund et al. the authors see variation between the far West African Mende and the eastern West African Yoruba, where the latter exhibits closer affinity to East African populations than the former (this includes those such as the proto-Ethiopian with no Eurasian admixture). Additionally, the authors found that Khoisan groups share more alleles with populations in East Africa than they do with those in West Africa even when you account for admixture.
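Claims of the form “population B shares more alleles with C than population A does” are usually tested with statistics from the f4 family, where f4(A, B; C, D) is the average over SNPs of (pA - pB)(pC - pD). A value near zero says A and B are symmetrically related to C, while a consistent departure from zero says one of them shares extra drift with C. Below is a minimal sketch of that calculation. The population labels and frequencies are hypothetical placeholders, and a real test would also need standard errors (typically from a block jackknife), which I omit here.

```python
def f4(pa, pb, pc, pd):
    """Mean over SNPs of (pA - pB) * (pC - pD)."""
    terms = [(a - b) * (c - d) for a, b, c, d in zip(pa, pb, pc, pd)]
    return sum(terms) / len(terms)


# Hypothetical allele frequencies at four SNPs (not real data).
outgroup = [0.50, 0.50, 0.50, 0.50]
east_african = [0.90, 0.10, 0.80, 0.20]
far_west = [0.50, 0.50, 0.50, 0.50]      # stand-in for a Mende-like population
eastern_west = [0.60, 0.40, 0.55, 0.45]  # stand-in for a Yoruba-like population

value = f4(far_west, eastern_west, east_african, outgroup)
print(f"f4(far_west, eastern_west; east_african, outgroup) = {value:.4f}")
```

With this ordering, a negative value means the second population (the Yoruba-like stand-in) shares more alleles with the East African population than the first does.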

One model that can explain this variation is long range gene flow, so that there would be connections between various regions as a function of their distance. Another explanation is that West African populations are the product of a Basal Human (BHu) population which separated first, before the bifurcation of Khoisan from other human populations. This would reorder our understanding of who the most basal humans are. Additionally, it would align with long-standing work on deep lineages within Africa contributing a minor component of the continent’s ancestry.

As should be clear due to the tree above, BHu postdates the separation of African humans from Neanderthals. One does wonder about the relevance of the Moroccan “modern” human to these models.

Understanding culture from genetics and genetics from culture

The spread of the Bantus over 1500 years from one end of the continent to the other is perhaps one of the most important dynamics we can use to understand the spread of farming more generally. The linguistic unity of the Bantus, or at least their affinity, suggests to us that the first farmers of Europe, who spread across much of the continent in 2500 years, probably exhibited the same pattern. The low levels of gene flow between hunter-gatherers and farmers, despite living in the same regions for thousands of years, can be illustrated with African examples (e.g., the Hadza vs. their Bantu neighbors).

We are rather in the early phase of understanding these dynamics. There are more remains to be found, perhaps in the dry fastness of the Sahara or Sahel (though unfortunately political considerations may prevent excavation due to danger to archaeologists). The genetics will give us a general idea about the nature of genetic variation and how it arose, but robust cultural models also need to be developed which illustrate how these genetic patterns arose.

Citation: Reconstructing Prehistoric African Population Structure, Skoglund, Pontus et al. Cell, Volume 171, Issue 1, 59–71.e21

The population genetic structure of Northeast Africa

The major frontier in the understanding of human population genetic structure in the next five years is going to be Africa. There are several reasons for this.

First, the ‘standard model’ of late has been that a group of humans left eastern Africa ~50,000 years ago, and swept across the world in one go. Though Africa itself was often an afterthought in that discussion, it now seems that there was a lot of demographic history that occurred after the “Out of Africa” event within Africa. Second, the whole picture outside of Africa and within Africa has been greatly complexified over the last decade.

The idea that modern humans, defined as the descendants of anatomically modern populations present in Africa ~100,000 years ago, only ventured out of the continent a bit before 50,000 years ago, is now rather shaky. There are several fossils from eastern Asia which seem older. But just as interestingly, there are Neanderthal genomic results from Altai populations which indicate an early admixture with an anatomically modern human population basal or almost basal to all extant groups. This means that this lineage is an outgroup to modern Africans and non-Africans, or, it was part of the original diversification of African lineages (one of which was the primary ancestor for non-Africans) ~200,000 years ago, give or take.

And that is the second major issue of complexification. Population structure within Africa, of both archaic and modern lineages, is going to be a major topic of interest. Just as non-Africans have admixture from highly diverged ‘archaic’ lineages (Neanderthals and Denisovans), some scholars have been arguing for years that Sub-Saharan Africans, especially those from “hunter-gatherer” lineages, have similar admixture. The most recent work seems to support the contention of very deep structure within Africa.

But much of this structure has been elided by the reality of the Bantu expansion. Starting 3,000 years ago a wave of agriculturalists from the environs of modern Cameroon pushed eastward and southward. Today the Bantu languages are dominant from the Gulf of Guinea to South Africa. This is a major problem in trying to understand the genetic variation which served as the context for the origin of modern humans, because the older structure has been overlain or replaced across much of its geographic extent. Though it is correct that modern Africans have the most genetic diversity on the whole, the between-group diversity for Bantus is quite low, because they descend from a common population in the recent past.

But there are peoples within Africa who may preserve some of the ancestry of groups before the arrival of the Bantus. A new paper in PLOS GENETICS uses a very dense marker set to analyze Sudanese populations in particular. These groups are of interest because they seem reasonably distant from West Africans, but some of them do not have much Eurasian ancestry either. Like Mbuti Pygmies, or Kalahari Bushmen, they may therefore be one of the descendant groups from those which flourished within Africa when the ancestors of non-Africans left.

It’s called Northeast African genomic variation shaped by the continuity of indigenous groups and Eurasian migrations. It’s an uncorrected proof. Kind of like a preprint. So it may change. But here is the abstract:

…We investigate the population history of northeast Africa by genotyping ~3.9 million SNPs in 221 individuals from 18 populations sampled in Sudan and South Sudan and combine this data with published genome-wide data from surrounding areas. We find a strong genetic divide between the populations from the northeastern parts of the region (Nubians, central Arab populations, and the Beja) and populations towards the west and south (Nilotes, Darfur and Kordofan populations). This differentiation is mainly caused by a large Eurasian ancestry component of the northeast populations likely driven by migration of Middle Eastern groups followed by admixture that affected the local populations in a north-to-south succession of events. Genetic evidence points to an early admixture event in the Nubians, concurrent with historical contact between North Sudanese and Arab groups. We estimate the admixture in current-day Sudanese Arab populations to about 700 years ago, coinciding with the fall of Dongola in 1315/1316 AD, a wave of admixture that reached the Darfurian/Kordofanian populations some 400–200 years ago. In contrast to the northeastern populations, the current-day Nilotic populations from the south of the region display little or no admixture from Eurasian groups indicating long-term isolation and population continuity in these areas of northeast Africa.

The Eurasian admixture is well known. So not a big surprise. But I do think that this paper, like most, is somewhat biased toward detection of the most recent admixture event.

There are several Coptic samples in this data set. These individuals are descendants of recent migrants to the Sudan from Egypt. Because they are Christian, and resident in northern Sudan (I believe sampled in Khartoum), they are by necessity endogamous (marriage to a Muslim would have resulted in the children being raised as Muslim). It is no surprise that they are genetically similar to the Egyptian Muslim sample. But interestingly, like the Egyptian Muslims, they have a substantial minority of Nilotic Sub-Saharan African ancestry.

In much of the paper the admixture between Eurasian and Sudanic peoples is dated to after the rise of Islam. This is reasonable. For various reasons I am not totally clear on, the emergence of Islam resulted in a far greater interconnectedness between Sub-Saharan Africa and North Africa & West Asia. But as non-Muslims, Egyptian Christians (Copts) would not be part of the genetic admixture which slavery produced across the Middle East. Non-Muslim minorities tend to be rather less cosmopolitan than their Muslim neighbors. Perhaps the situation was different in Egypt, with Copts being a majority up until 1000 A.D. But another factor may be that there were older pulses of admixture dating back to antiquity which the LD decay methods are missing (notice that Egyptians have West African ancestry which Copts lack).
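For readers unfamiliar with how LD decay dating works, the core idea is that admixture-induced linkage disequilibrium between two markers falls off roughly as exp(-n × d), where d is the genetic distance in Morgans and n is the number of generations since the mixing. Fitting the decay rate gives n, and multiplying by a generation time gives a date. The sketch below fits a single exponential to a synthetic, noise-free curve. The 29-year generation time, the amplitude, and the distance grid are my assumptions for illustration, not values taken from the paper.

```python
import math


def fit_generations(distances_morgans, ld_values):
    """Log-linear least-squares fit of ld = A * exp(-n * d); returns n."""
    xs = distances_morgans
    ys = [math.log(v) for v in ld_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


GENERATION_YEARS = 29  # assumed generation time; not taken from the paper

# Synthetic curve for a single pulse 700 years ago (hypothetical amplitude 0.02).
true_generations = 700 / GENERATION_YEARS
distances = [0.005 * i for i in range(1, 21)]  # 0.5 cM to 10 cM, in Morgans
curve = [0.02 * math.exp(-true_generations * d) for d in distances]

estimated_generations = fit_generations(distances, curve)
estimated_years = estimated_generations * GENERATION_YEARS
print(f"estimated admixture date: {estimated_years:.0f} years ago")
```

The same formula shows why older pulses are easy to miss: a larger n makes the curve collapse toward zero at very short distances, so with real, noisy data a single-exponential fit tends to be dominated by the most recent event.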

The second major issue in this paper is that some groups, such as the Nuer, show no evidence of Eurasian admixture. This is not true of all Nilotic peoples. The Masai of Kenya for example have clear Eurasian admixture. But if indisputably Nilotic groups in southern Sudan lack it, it suggests that this occurred in East Africa due to mixing with Cushitic groups, some of whom, such as the Somalis, are also pastoralists.

Remember that Khoisan in southern Africa have Eurasian ancestry through the migration of Nilotic pastoralists. And yet somehow the Nilotic peoples of the Sudan, who have lived near to Cushitic and Semitic peoples with copious Eurasian admixture, lack that element. Similarly, the Bantus swept from Cameroon to the highlands of South Africa in 1,300 years, but were totally ineffectual at penetrating the Sudd. What this illustrates is that when it comes to human gene flow, simple distance “as the crow flies” is not so important in many cases. Rather, cultures occupy territory in a geographically patchy manner, constrained by ecology and local human geography.

This reiterates the likely importance of ancient DNA in understanding African prehistory, and therefore, the prehistory of humankind as a whole.

Citation: Hollfelder N, Schlebusch CM, Günther T, Babiker H, Hassan HY, Jakobsson M (2017) Northeast African genomic variation shaped by the continuity of indigenous groups and Eurasian migrations. PLoS Genet 13(8): e1006976.