The human phylogenetic graph gets curiouser and curiouser


While most of my readers were sleeping, Lee Berger in South Africa was giving a press conference on new Homo naledi related results. Three papers are in eLife. It’s open access, so read them yourself. The major result is that the fossils have been dated to 236,000 to 335,000 years ago.

If you aren’t a paleontologist, the one to read is Homo naledi and Pleistocene hominin evolution in subequatorial Africa:

New discoveries and dating of fossil remains from the Rising Star cave system, Cradle of Humankind, South Africa, have strong implications for our understanding of Pleistocene human evolution in Africa. Direct dating of Homo naledi fossils from the Dinaledi Chamber (Berger et al., 2015) shows that they were deposited between about 236 ka and 335 ka (Dirks et al., 2017), placing H. naledi in the later Middle Pleistocene. Hawks and colleagues (Hawks et al., 2017) report the discovery of a second chamber within the Rising Star system (Dirks et al., 2015) that contains H. naledi remains. Previously, only large-brained modern humans or their close relatives had been demonstrated to exist at this late time in Africa, but the fossil evidence for any hominins in subequatorial Africa was very sparse. It is now evident that a diversity of hominin lineages existed in this region, with some divergent lineages contributing DNA to living humans and at least H. naledi representing a survivor from the earliest stages of diversification within Homo. The existence of a diverse array of hominins in subequatorial comports with our present knowledge of diversity across other savanna-adapted species, as well as with palaeoclimate and paleoenvironmental data. H. naledi casts the fossil and archaeological records into a new light, as we cannot exclude that this lineage was responsible for the production of Acheulean or Middle Stone Age tool industries.

In relation to the DNA part, we don’t have ancient genomes except for the Ethiopian Holocene one. They couldn’t get DNA out of naledi. But we do have inferences made from modern populations. Here is the most recent paper cited, Model-based analyses of whole-genome data reveal a complex evolutionary history involving archaic introgression in Central African Pygmies.

“Out of Africa” bottleneck is what really matters for mutations


At least in relation to mutational load, if you read a new preprint on bioRxiv, The demographic history and mutational load of African hunter-gatherers and farmers:

The distribution of deleterious genetic variation across human populations is a key issue in evolutionary biology and medical genetics. However, the impact of different modes of subsistence on recent changes in population size, patterns of gene flow, and deleterious mutational load remains to be fully characterized. We addressed this question, by generating 300 high-coverage exome sequences from various populations of rainforest hunter-gatherers and neighboring farmers from the western and eastern parts of the central African equatorial rainforest. We show here, by model-based demographic inference, that the effective population size of African populations remained fairly constant until recent millennia, during which the populations of rainforest hunter-gatherers have experienced a ~75% collapse and those of farmers a mild expansion, accompanied by a marked increase in gene flow between them. Despite these contrasting demographic patterns, African populations display limited differences in the estimated distribution of fitness effects of new nonsynonymous mutations, consistent with purifying selection against deleterious alleles of similar efficiency in the different populations. This situation contrasts with that we detect in Europeans, which are subject to weaker purifying selection than African populations. Furthermore, the per-individual mutation load of rainforest hunter-gatherers was found to be similar to that of farmers, under both additive and recessive modes of inheritance. Together, our results indicate that differences in the subsistence patterns and demographic regimes of African populations have not resulted in large differences in mutational burden, and highlight the role of gene flow in reshaping the distribution of deleterious genetic variation across human populations.

There are two major moving parts in this preprint. First, they use phylogenomic methods to explicitly model population history. Second, they integrate their demographic results in generating and interpreting the distribution of mutations within the exomes of these populations. That is, they use phylogenomics to gain insight into population genomics, as the latter focuses more on the parameters which define variation within a population.

The data they worked with were from the exome, the protein-coding regions of the genome. That’s ~30 million bases. They get really good precision due to high coverage, hitting each site about 70 times. Their sample was about 300 Africans and 100 Europeans, and they got ~500,000 polymorphisms or variants for their trouble.

The populations were labeled by subsistence and provenance. The Europeans were Belgians. For the Africans they had two groups of hunter-gatherer Pygmies, and two groups of Bantu agriculturalists, sampled from western and eastern locations as you see on the map above.

The admixture plots, which separate individuals into K populations, break out in a way that makes sense. First, Europeans separate, and the eastern agriculturalist populations show a little evidence of European-like ancestry. This is almost certainly Middle Eastern farmer ancestry, which has been found in many East African populations and in those populations which have mixed with them. Then the hunter-gatherers separate from the agriculturalists. This is in line with expectation and earlier research; the hunter-gatherers of Africa seem very different from the agriculturalists, and are actually more closely related to each other than to the agriculturalists in their neighboring regions.

The exception to this pattern is caused by recent gene flow, which is clearly evident above. Due to population size differences it looks like there is more agricultural ancestry in the Pygmies than vice versa. I wish that they had sampled Mbuti Pygmies. I’m told that this group has the least agricultural admixture.

But then they decided to get fancy and explicitly model demographic histories with fastsimcoal2. What does this do? From the website for the software:

While preserving all the simulation flexibility of simcoal2, fastsimcoal is now implemented under a faster continous-time sequential Markovian coalescent approximation, allowing it to efficiently generate genetic diversity for different types of markers along large genomic regions, for both present or ancient samples. It includes a parameter sampler allowing its integration into Bayesian or likelihood parameter estimation procedure.

fastsimcoal can handle very complex evolutionary scenarios including an arbitrary migration matrix between samples, historical events allowing for population resize, population fusion and fission, admixture events, changes in migration matrix, or changes in population growth rates. The time of sampling can be specified independently for each sample, allowing for serial sampling in the same or in different populations.

The models you see that were tested are pretty simple, and they all seem plausible I suppose. Their simulations suggested that the three above scenarios, with alternative branching patterns and various gene flows, were all of equal likelihood. That is, given the models and the data that they had (4-fold synonymous sites which are likely to be neutral) you can’t distinguish which is right.

In all the models hunter-gatherers diverged relatively recently and so did the agriculturalists. Europeans, who are stand-ins for all non-Africans in this scenario, diverged pretty early from the Africans. But how the Africans relate to each other and Europeans is not totally clear. Why? Because of ancient population structure. It is becoming rather obvious now that ~100,000 years ago, and earlier, there were many different modern human lineages which had already diversified. The Khoisan seem to have diverged from other human lineages closer to 200,000 than 100,000 years ago. What this means is that for most of the history of anatomically modern humans population structure existed between distinct lineages. And some of that persists down to today within Africa.

I’ll bullet point some of their inferences from these models (verbatim quotes below):

  1. Our results suggest that the ancestors of the contemporary RHG, AGR and EUR populations diverged between 85 and 140 thousand years ago (kya), from an ancestral population that underwent demographic expansion between 173 and 191 kya
  2. After the initial population splits, the Ne of AGR and RHG (NaAGR and NaRHG) remained within a range extending from 0.55 to 2.2 times the ancestral African Ne (NHUM), whereas EUR (NaEUR) experienced a decrease in Ne by a factor of three to seven.
  3. The ancestors of the wRHG and eRHG populations diverged 18 to 20 kya (TRHG), and underwent a decreased in Ne by a factor of 3.8 to 5.7 for the wRHG (NwRHG) and 7.1 to 11 for the eRHG (NeRHG), regardless of the branching model considered.
  4. The ancestors of the AGR (NaAGR) split into western and eastern populations 6.7 to 11 kya (TAGR), and underwent a mild expansion, by a factor of 2.3 to 3.1 for the wAGR (NwAGR) and 1.2 to 2.2 for the eAGR (NeAGR).
  5. The EUR population experienced a 7.1- to 8.3-fold expansion (NEUR) 12 to 22 kya (TEUR).

No results are perfect. But some of these dates do not make sense. There’s a lot of circumstantial evidence that the ancestors of European populations began to expand over the last 10,000 years. The dates above suggest there was a Pleistocene expansion. Basically you can cut those values in half, and then you get a reasonable range.

Second, both the agriculturalists sampled here are Bantu speaking, and there’s a good amount of cultural and genetic data for recent shared ancestry of the Bantu over the last 3,000 years. I understand that admixture with a very diverged lineage (e.g., eastern Bantu agriculturalist samples mixing with Nilotic populations, which is how they got some non-African ancestry, as well as local Pygmy groups) can inflate these divergence dates. If that’s the case, they should note that in the text.

We don’t have much historical or archaeological clarity from what I know about divergences between Pygmy groups. This particular group has studied the topic and published on it before, so I’m inclined to trust them more than anyone else. But, the above dates for groups we do know make me a bit more skeptical of a simple divergence around the Last Glacial Maximum.

Then there are the earliest divergences. An 85,000 to 140,000 year interval is huge for when non-Africans split off from Africans. If it is closer to 140,000 than 85,000, then that means that non-African divergence from Africans preserves ancient African diversity. That is, non-Africans descend from an African group that no longer exists (or has not been sampled in this study at least!). I’ve poked around this question, and when you take into account recent gene flow, it is hard to find the specific African group that non-Africans descend from, though there is some consensus that they branched off from the non-Khoisan Africans later than from the Khoisan.

But there is also a lot of archaeological evidence, and now some ancient DNA, indicating that the vast majority of non-African ancestry began to expand rapidly around 50-60,000 years ago. This is tens of thousands of years after the lowest value given above. Therefore, again we have to make recourse to a long period of separation before the expansion. This is not implausible on the face of it, but we could do something else: just assume there’s an artifact with their methods and the inferred date of divergence is too old. That would solve many of the issues.

I really don’t know if the above quibbles have any ramifications for the site frequency spectrum of deleterious mutations. My own hunch is that no, they don’t impact the qualitative results at all.

Figure 3 clearly shows that Europeans are enriched for weakly and moderately deleterious mutations (the last category produces weird results, and I wish they’d talked about this more, but they observe that strongly deleterious mutations have issues getting detected). Ne is just the effective population size and s is the selection coefficient (bigger number, stronger selection).

Why are the middle two values enriched? Presumably it’s the non-African bottleneck. This is where another non-African population would have been a nice check to make sure that it was the “Out of Africa” bottleneck…but it’s probably asking a bit much to sequence more individuals to 70x coverage.
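To make the bottleneck logic concrete, here is a minimal sketch using Kimura’s diffusion approximation for the fixation probability of a new additive mutation. The Ne values and the selection coefficient below are assumptions picked purely for illustration; they are not estimates from the preprint, and this is not the method the authors used to infer the distribution of fitness effects.

```python
import math


def fixation_probability(ne: float, s: float) -> float:
    """Kimura's approximation for a new mutation at frequency 1/(2*Ne).

    s is the additive selection coefficient (negative = deleterious).
    Assumes census size equals effective size, a simplifying assumption.
    """
    if s == 0:
        return 1.0 / (2.0 * ne)
    return (1.0 - math.exp(-2.0 * s)) / (1.0 - math.exp(-4.0 * ne * s))


def relative_to_neutral(ne: float, s: float) -> float:
    """Fixation probability divided by the neutral expectation 1/(2*Ne)."""
    return fixation_probability(ne, s) * 2.0 * ne


if __name__ == "__main__":
    s = -1e-4  # hypothetical weakly deleterious mutation
    for label, ne in [("large Ne (illustrative)", 10_000),
                      ("bottlenecked Ne (illustrative)", 2_000)]:
        print(f"{label}: Ne*s = {ne * s:.1f}, "
              f"relative fixation = {relative_to_neutral(ne, s):.3f}")
```

With these made-up numbers the same mutation is only about 7% as likely as a neutral one to fix in the larger population, but about 65% as likely in the smaller one. That is the sense in which a bottleneck weakens purifying selection on the weak and moderate categories.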

The lack of difference between the African populations is an indication that recent demography is not shaping the distribution much. Additionally, they note that gene flow between the African groups probably increased diversity in some ways, so that as long as a group is connected with other populations it will probably be rescued (note that none of these in their data were particularly inbred, judging by runs of homozygosity).

Finally, they found that the number of homozygous deleterious mutations is higher in their model results for Europeans than for the African groups. This is not surprising, and is what one expects. But they found that this is likely a function of continuous gene flow between the African groups. Without gene flow homozygosity would have been much higher. This gets back to the fact that gene flow is a powerful homogenizing tool, and the lack of gene flow has to be pretty extreme for divergence to occur.
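The difference between additive and recessive views of load can be sketched with a simple counting proxy. This is one common way to tally load per individual, assumed here for illustration; the preprint’s actual statistic may differ, and the genotype vectors are invented.

```python
from typing import List


def additive_load(genotypes: List[int]) -> int:
    """Count derived deleterious allele copies (each site coded 0, 1 or 2)."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    return sum(genotypes)


def recessive_load(genotypes: List[int]) -> int:
    """Count sites that are homozygous for the derived deleterious allele."""
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1 or 2")
    return sum(1 for g in genotypes if g == 2)


if __name__ == "__main__":
    # Two hypothetical individuals carrying the same number of deleterious copies.
    outbred = [1, 1, 1, 1, 0, 0]  # copies spread across heterozygous sites
    inbred = [2, 2, 0, 0, 0, 0]   # same copies concentrated in homozygotes
    for name, g in [("outbred", outbred), ("inbred", inbred)]:
        print(name, "additive:", additive_load(g), "recessive:", recessive_load(g))
```

Both toy individuals have the same additive count, but only the second carries any recessive load. Gene flow keeps deleterious alleles in heterozygotes, which is why it matters so much for the homozygote counts.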

Which brings us back to the “Out of Africa” event. The next ten years are going to see a lot of investigation of African phylogenomics and population genomics. Basically, the relationships, and selection pressures. It is totally implausible that Bantu groups in Kenya and Tanzania did not absorb local non-Nilotic populations. We’ll figure that out. Additionally, selection pressures are probably different between different groups. We’ll know more about that. But, ancient DNA will probably give us some understanding of why non-Africans went through such a massive demographic sieve. We know in broad sketches. But most people want to fill in the details.

Citation: The demographic history and mutational load of African hunter-gatherers and farmers, Marie Lopez, Athanasios Kousathanas, Helene Quach, Christine Harmant, Patrick Mouguiama-Daouda, Jean-Marie Hombert, Alain Froment, George H Perry, Luis B Barreiro, Paul Verdu, Etienne Patin, Lluis Quintana-Murci, doi: https://doi.org/10.1101/131219

The logic of human destiny was inevitable 1 million years ago

Robert Wright’s best book, Nonzero: The Logic of Human Destiny, was published nearly 20 years ago. At the time I was moderately skeptical of his thesis. It was too teleological for my tastes. And, it does pander to a bias in human psychology whereby we look to find meaning in the universe.

But this is 2017, and I have somewhat different views.

In the year 2000 I broadly accepted the thesis outlined a few years later in The Dawn of Human Culture. That our species, our humanity, evolved and emerged in rapid sequence, likely due to biological changes of a radical kind, ~50,000 years ago. This is the thesis of the “great leap forward” of behavioral modernity.

Today I have come closer to models proposed by Michael Tomasello in The Cultural Origins of Human Cognition and Terrence Deacon in The Symbolic Species: The Co-evolution of Language and the Brain. Rather than a punctuated event, an instance in geological time, humanity as we understand it was a gradual process, driven by general dynamics and evolutionary feedback loops.

The conceit at the heart of Robert J. Sawyer’s often overly preachy Neanderthal Parallax series, that if our own lineage went extinct but theirs did not they would have created a technological civilization, is I think in the main correct. It may not be entirely coincidental that hyper-drive cultural flexibility evolved in African modern humans first. There may have been sufficient biological differences to make this likely. But I believe that if African modern humans were removed from the picture Neanderthals would have “caught up” and been positioned to begin the trajectory we find ourselves in during the current Holocene inter-glacial.

Luke Jostins’ figure showing across-the-board encephalization

The data indicate that all human lineages were subject to increased encephalization. That process trailed off ~200,000 years ago, but it illustrates the general evolutionary pressures, ratchets, or evolutionary “logic”, that applied to all of them. Overall there were some general trends in the hominin lineage that began to characterize us about a million years ago. We pushed into new territory. Our rate of cultural change seems to have gradually increased across our whole range.

One of the major holy grails I see now and then in human evolutionary genetics is to find “the gene that made us human.” The scramble is definitely on now that more and more whole genome sequences from ancient hominins are coming online. But I don’t think any such gene will ever be found. There isn’t “a gene,” but a broad set of genes which were gradually selected upon in the process of making us human.

In the lingo, it wasn’t just a hard sweep from a de novo mutation. It was as much, or even more, soft sweeps from standing variation.

How Tibetans can function at high altitudes


About seven years ago I wrote two posts about how Tibetans manage to function at very high altitudes. And it’s not just physiological functioning, that is, fitness straightforwardly understood. High altitudes can cause a sharp reduction in reproductive fitness because women cannot carry pregnancies to term. In other words, high altitude is a very strong selection pressure. You adapt, or you die off.

For me there have been two things of note since those original papers came out. First, one of those loci seems to have been introgressed from a Denisovan genetic background. I want to be careful here, because the initial admixture event may not have been into the Tibetans proper, but into earlier hunter-gatherers who descend from Out of Africa groups, and who were assimilated into the Tibetans as they expanded 5-10,000 years ago. Second, it turns out that dogs have also been targeted for selection on EPAS1 (the “Denisovan” introgression) for altitude adaptation.

This shows that in mammals at least there are a few genes which show up again and again. The fact that EPAS1 and EGLN1 were hits on relatively small sample sizes also reinforces their powerful effect. When the EPAS1 results initially came out they were highlighted as the strongest and fastest instance of natural selection in human evolutionary history. One can quibble over whether this was literally true, but that it was a powerful selective event no one could deny.

A new paper in PNAS, Genetic signatures of high-altitude adaptation in Tibetans, revisits the earlier results with a much larger sample size (the research group is in China) comparing Han Chinese and Tibetans. They confirm the earlier results, but, they also find other loci which seem likely targets of selection in Tibetans. Below is the list:

SNP          A1  A2  Frequency of A1 (Tibetan)  Frequency of A1 (EAS, Han)  P value   FST    Nearest gene
rs1801133    A   G   0.238                      0.333                       6.30E-09  0.021  MTHFR
rs71673426   C   T   0.102                      0.013                       1.50E-08  0.1    RAP1A
rs78720557   A   T   0.498                      0.201                       4.70E-08  0.191  NEK7
rs78561501   A   G   0.599                      0.135                       6.10E-15  0.414  EGLN1
rs116611511  G   A   0.447                      0.003                       3.60E-19  0.57   EPAS1
rs2584462    G   A   0.211                      0.549                       3.90E-09  0.203  ADH7
rs4498258    T   A   0.586                      0.287                       1.70E-08  0.171  FGF10
rs9275281    G   A   0.095                      0.365                       1.10E-10  0.162  HLA-DQB1
rs139129572  GA  G   0.316                      0.449                       5.80E-09  0.036  HCAR2

P value indicates the P value from the MLMA-LOCO analysis. FST is the FST value between Tibetans and EASs. Nearest gene indicates the nearest annotated gene to the top differentiated SNP at each locus except EGLN1, which is known to be associated with high-altitude adaptation; rs139129572 is an insertion SNP with two alleles: GA and G. A1, allele 1; A2, allele 2.

Many of these genes are familiar. Observe the allele frequency differences between the Tibetans and other East Asians (mostly Han). The sample sizes are on the order of thousands, and the SNP-chip had nearly 300,000 markers. What they found was that the between population Fst of Han to Tibetan was ~0.01. So only 1% of the SNP variance in their data was partitioned between the two groups. These alleles are huge outliers.
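To get a feel for how far out these loci sit, here is a rough sketch that recomputes a per-SNP Fst from the two allele frequencies in the table and compares it with the ~0.01 genome-wide value. It uses a Hudson-style ratio without the finite-sample correction, which is an assumption on my part and not necessarily the estimator used in the paper, so the numbers will not exactly reproduce the FST column.

```python
from typing import List, Tuple

GENOME_WIDE_FST = 0.01  # approximate Han-Tibetan value quoted in the text

# (nearest gene, frequency of A1 in Tibetans, frequency of A1 in EAS/Han)
LOCI: List[Tuple[str, float, float]] = [
    ("MTHFR", 0.238, 0.333),
    ("RAP1A", 0.102, 0.013),
    ("NEK7", 0.498, 0.201),
    ("EGLN1", 0.599, 0.135),
    ("EPAS1", 0.447, 0.003),
    ("ADH7", 0.211, 0.549),
    ("FGF10", 0.586, 0.287),
    ("HLA-DQB1", 0.095, 0.365),
    ("HCAR2", 0.316, 0.449),
]


def hudson_fst(p1: float, p2: float) -> float:
    """Hudson-style Fst from two allele frequencies (large-sample form)."""
    numerator = (p1 - p2) ** 2
    denominator = p1 * (1.0 - p2) + p2 * (1.0 - p1)
    return numerator / denominator if denominator > 0 else 0.0


def rank_outliers(loci: List[Tuple[str, float, float]]) -> List[Tuple[str, float, float]]:
    """Return (gene, fst, fold over genome-wide), most differentiated first."""
    rows = [(gene, hudson_fst(p_tib, p_han)) for gene, p_tib, p_han in loci]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(gene, fst, fst / GENOME_WIDE_FST) for gene, fst in rows]


if __name__ == "__main__":
    for gene, fst, fold in rank_outliers(LOCI):
        print(f"{gene:9s} Fst={fst:.3f}  ({fold:.0f}x genome-wide)")
```

Even with this crude estimator EPAS1 and EGLN1 come out on top, at dozens of times the genome-wide level, and every locus in the table sits above the background.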

The authors used some sophisticated statistical methods to correct for exigencies of population structure, drift, admixture, etc., to converge upon these hits, but even through inspection the deviation on these alleles is clear. And as they note in the paper it isn’t clear all of these genes are selected simply for hypoxia adaptation. MTHFR, which is quite often a signal of selection, may have something to do with folate metabolism (higher altitudes have more UV). ADH7 is part of a set of genes which always seem to be under selection, and HLA is never a surprise.

Rather than get caught up in the details it is important to note here that expansion into novel habitats results in lots of changes in populations, so that two groups can diverge quite fast on functional characteristics. The PCA makes it clear that Tibetans and Han have very little West Eurasian admixture, and the Fst based analysis puts their divergence on the order of 5,000 years before the present. The authors admit honestly that this is probably a lower bound value, but I also think it is quite likely that Tibetans, and probably Han too, are compound populations, and a simple bifurcation model from a common ancestral population is probably shaving away too many realistic edges. In plainer language, there has been gene flow between Han and Tibetans probably <5,000 years ago, and Tibetans themselves probably assimilated more deeply diverged populations in the highlands as they expanded as agriculturalists. Quite often an estimate of a single divergence fits a complex history to too simple a model.
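For a sense of why an Fst-based date behaves like a lower bound, here is a back-of-the-envelope sketch using the standard drift-only approximation Fst ≈ 1 - exp(-t / (2 Ne)), solved for t. This is not necessarily the calculation in the paper, and the Ne of 10,000 and the 25-year generation time are assumptions picked for illustration.

```python
import math


def divergence_generations(fst: float, ne: float) -> float:
    """Generations since a clean split, assuming pure drift and no gene flow."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("fst must be in [0, 1)")
    return -2.0 * ne * math.log(1.0 - fst)


def divergence_years(fst: float, ne: float, years_per_generation: float) -> float:
    """Convert the drift-only estimate from generations to years."""
    return divergence_generations(fst, ne) * years_per_generation


if __name__ == "__main__":
    fst = 0.01           # approximate Han-Tibetan value from the text
    ne = 10_000          # hypothetical effective population size
    generation = 25.0    # hypothetical years per generation
    years = divergence_years(fst, ne, generation)
    print(f"drift-only estimate: {years:.0f} years")
```

Under this model any gene flow after the split pulls Fst down, so the formula returns a date that is too recent. A compound history with later admixture cannot be recovered from a single Fst value at all.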

The take home: understanding population history is probably important to get a better sense of the dynamics of adaptation.

Citation: Jian Yang, Zi-Bing Jin, Jie Chen, Xiu-Feng Huang, Xiao-Man Li, Yuan-Bo Liang, Jian-Yang Mao, Xin Chen, Zhili Zheng, Andrew Bakshi, Dong-Dong Zheng, Mei-Qin Zheng, Naomi R. Wray, Peter M. Visscher, Fan Lu, and Jia Qu, Genetic signatures of high-altitude adaptation in Tibetans, PNAS 2017 ; published ahead of print April 3, 2017, doi:10.1073/pnas.1617042114