Population structure in Neanderthals leads to genetic homogeneity

The above tweet is in response to an article which reports on a finding published this past month in PNAS, Early history of Neanderthals and Denisovans. It’s open access, you should read it. I haven’t reviewed it because I haven’t dug through the supplements yet. To be frank this is a paper where you pretty much have to read the supplements because they’re introducing a somewhat different model here than is the norm.

I talked to Alan Rogers at SMBE about this paper. Broadly, I think there might be something to it, and it’s because of what David says above. It is simply hard to imagine that Neanderthals could be extremely successful with such low genetic diversity as we see, and spread so thin. Now, the Quanta Magazine piece tries to emphasize that the effective population size is not the true census population, but I wish it had explained it more clearly. Basically, the size that is relevant for breeding is obviously not going to be the same as a head count. And, because effective population sizes are highly sensitive to bottlenecks, you can get really small numbers even when the extant population at any given time may be large.
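To make the bottleneck point concrete, here is a minimal sketch using the textbook approximation that long-term effective size is the harmonic mean of the per-generation sizes. The census numbers are made up for illustration, and this only covers fluctuation in size over time, not the population-structure model in the PNAS paper.

```python
def effective_population_size(sizes_per_generation):
    """Approximate long-term Ne as the harmonic mean of per-generation sizes."""
    if not sizes_per_generation or any(n <= 0 for n in sizes_per_generation):
        raise ValueError("need at least one generation, all sizes positive")
    return len(sizes_per_generation) / sum(1.0 / n for n in sizes_per_generation)


# Hypothetical history: 99 generations at 100,000 and one crash to 100.
sizes = [100_000] * 99 + [100]
arithmetic_mean = sum(sizes) / len(sizes)
harmonic_mean = effective_population_size(sizes)

print(f"average head count: {arithmetic_mean:,.0f}")
print(f"effective size:     {harmonic_mean:,.0f}")
```

With these invented numbers the average head count stays near 99,000, while the effective size drops to roughly 9,100. A single bad generation dominates the harmonic mean.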

The PNAS paper makes some novel inferences, and I’ll set that to the side until I read the supplements. But I don’t think it’s crazy that population structure within Neanderthals could be leading to lower total genetic diversity.

Release the UK Biobank! (the prediction of height edition)

There’s so much science coming out of the UK Biobank it’s not even funny. It’s like getting the palantír or something.

Anyway, a preprint, submitted for your approval. A vision of things to come? Accurate Genomic Prediction Of Human Height:

We construct genomic predictors for heritable and extremely complex human quantitative traits (height, heel bone density, and educational attainment) using modern methods in high dimensional statistics (i.e., machine learning). Replication tests show that these predictors capture, respectively, ~40, 20, and 9 percent of total variance for the three traits. For example, predicted heights correlate ~0.65 with actual height; actual heights of most individuals in validation samples are within a few cm of the prediction. The variance captured for height is comparable to the estimated SNP heritability from GCTA (GREML) analysis, and seems to be close to its asymptotic value (i.e., as sample size goes to infinity), suggesting that we have captured most of the heritability for the SNPs used. Thus, our results resolve the common SNP portion of the “missing heritability” problem – i.e., the gap between prediction R-squared and SNP heritability. The ~20k activated SNPs in our height predictor reveal the genetic architecture of human height, at least for common SNPs. Our primary dataset is the UK Biobank cohort, comprised of almost 500k individual genotypes with multiple phenotypes. We also use other datasets and SNPs found in earlier GWAS for out-of-sample validation of our results.
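The ~0.65 correlation and the ~40 percent of variance in the abstract are the same number seen two ways, since variance explained is the squared correlation. A quick sketch of that arithmetic follows; the function names are mine, and the residual spread assumes a simple linear relationship between predicted and actual height.

```python
import math


def variance_explained(correlation):
    """Share of trait variance captured by a predictor with this correlation."""
    return correlation ** 2


def residual_sd_fraction(correlation):
    """Spread of prediction errors as a fraction of the trait's own SD."""
    return math.sqrt(1.0 - correlation ** 2)


r = 0.65  # correlation between predicted and actual height, from the abstract
print(f"variance explained: {variance_explained(r):.2f}")
print(f"residual SD / trait SD: {residual_sd_fraction(r):.2f}")
```

So the errors are still about three quarters as wide as the raw spread of heights, which is worth keeping in mind when reading "within a few cm."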

A scatter-plot is worth a thousand derivations.

You know what’s better than 500,000 samples? One billion samples! A nerd can dream….

Black ancestry in white Americans of colonial background

I stumbled upon striking photographs of “white slaves” while reading The United States of the United Races: A Utopian History of Racial Mixing. The backstory here is that in the 19th century abolitionists realized that Northerners might be more horrified as to the nature of slavery if they could find children of mostly white ancestry, who nevertheless were born to slave mothers (and therefore were slaves themselves). So they found some children who had either been freed, or been emancipated, and dressed them up in more formal attire (a few more visibly black children were presented for contrast).

This illustrates that the media and elites have been using this ploy for a long time. I am talking about the Afghan girl photograph, or the foregrounding of blonde and blue-eyed Yezidi children. Recently I expressed some irritation on Twitter when there was a prominent photograph of a hazel-eyed Rohingya child refugee being passed around. Something like 1 in 500 people in that region of the world has hazel eyes! That couldn’t be a coincidence. Race matters when it comes to compassion.

But this post isn’t about that particular issue…rather, the images of enslaved white children brought me back to a tendency I’ve seen and wondered about: the old stock white Americans whose DNA results suggest ~1% or less Sub-Saharan ancestry. These are not uncommon, and I’ve looked at several of them (raw data). I’m pretty sure the vast majority at the 0.5% or more threshold are true positives, and probably many a bit below this (in my experience people from England and Ireland don’t get 0.3% African “noise” estimates with the modern high-density marker sets).

According to 23andMe’s database about 1 out of 10 white Southerners has African ancestry at the 1% threshold. It would be even more if you dropped to closer to 0.5%. And the DNA ancestry here understates the extent of what was going on: at about 10 generations back you are about 50% likely to inherit zero blocks of genomic ancestry from a given ancestor (assuming no inbreeding in the pedigree obviously). And this is exactly when a lot of the ancestry that is being detected seems to have “entered” the white population. In other words, for every person who is 1% African and 99% white American, they have a sibling who is 0% African and 100% white American, even though genealogically they share the same ancestors. Dropping the threshold to closer to 0.3%, and considering that even in the South there was migration from the North, and to a lesser extent Europe, after the Civil War, I wouldn’t be surprised if models of admixture inferred from the distributions we see indicate that over half the lowland Southern white population likely had genealogical descent from a black slave.
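The "about 50% at 10 generations" figure can be reproduced with a back-of-the-envelope approximation. The sketch below assumes 22 autosomes, roughly 33 crossovers per meiosis, and a Poisson-distributed number of inherited blocks. Those constants and the Poisson step are simplifying assumptions of mine, not something taken from the 23andMe data.

```python
import math

AUTOSOMES = 22
CROSSOVERS_PER_MEIOSIS = 33  # assumed rough genome-wide average


def expected_blocks(generations_back):
    """Expected number of autosomal blocks inherited from one ancestor.

    generations_back = 1 means a parent, 2 a grandparent, and so on.
    """
    k = generations_back
    if k < 1:
        raise ValueError("generations_back must be at least 1")
    return (AUTOSOMES + CROSSOVERS_PER_MEIOSIS * (k - 1)) / 2 ** (k - 1)


def prob_no_dna(generations_back):
    """Poisson approximation to the chance of inheriting zero blocks."""
    return math.exp(-expected_blocks(generations_back))


for k in (6, 8, 10, 12):
    print(f"{k} generations back: P(no DNA) ~ {prob_no_dna(k):.2f}")
```

Under these assumptions the chance of carrying nothing from a given ancestor 10 generations back comes out a bit over one half, and it climbs quickly after that.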

This all comes to mind because there aren’t too many records of people “passing” during this period. Those who deal in genealogy and encounter these cases of low fractions, which are nevertheless likely not false positives, almost never find a “paper trail” when they go look. And they look really hard.

The reason is obvious in the context of American history. Thomas Jefferson’s slave Sally Hemings had three white grandparents and one African slave grandparent. Several of her children are recorded to have been totally European in appearance, and all except one passed into the white population (the two eldest married well into affluent white families in Washington D.C.). Passing as white was a way to escape the debilities of black status in the United States.

That being said, I think our Whig conception of the progressive nature of history sometimes misleads us into forgetting that the dynamics of race relations have had their ups and downs several times in the last few centuries in North America. If you read Daniel Walker Howe’s excellent What Hath God Wrought you observe that racial beliefs about the necessity and institutionalization of white supremacy in the early American republic evolved over time. Though the early republic would never be judged racially enlightened by modern lights, it was certainly far less explicitly racially conscious than what was the norm in the decades before the Civil War.

In particular, the rise of democratic populism during the tenure of Andrew Jackson was connected with much more muscular racial nationalism. To utilize a framework emphasized by David Cannadine in Ornamentalism, colonialism and Western civilization during the 19th and early 20th centuries can be viewed through the lens of race and class. Though the economic inequalities of American society persisted through the 19th century, men such as Andrew Jackson affected a more populist and rough-hewn persona than the aristocratic presidents of the early 19th century.* The white man’s republic had a leveling effect on the nature of elite culture.

But the attitudes toward racial segregation and mixing took decades to harden. Martin Van Buren’s vice president, Richard Mentor Johnson, was well known to have had a common-law wife, Julia Chinn, who was a slave. He recognized his two daughters by her. He was vice president from 1837 to 1841 in the more racist of the two American political parties of the time. It is hard to imagine this being a viable “lifestyle” choice for someone of this prominence in later decades (after Julia Chinn’s death Johnson continued to enter into relationships with slaves).

Walter F. White, a black leader of the NAACP

Which brings us back to what was happening in the decades around 1800. Racism was a fact of life, creating the need for passing. But beliefs about racial purity and the one-drop rule had not hardened, so it would not be surprising to me if it was much easier for slaves or ex-slaves with mostly European ancestry to change their identity. Perhaps white Americans of that period were simply less vigilant about someone’s background because they were genuinely less concerned about the possibility that their partner may have had some black ancestry, so long as they looked white.

As the databases grow larger we’ll get a better sense of the demographic and genealogical dynamics. My suspicion is that we’ll see that gene flow into the black-identified community did not diminish much over the past 200 years. Rather, hypo-descent, the one-drop rule, became so powerful between 1850 and 1950 that we will be able to confirm that passing declined, before rising again in the 1960s as whites became less vigilant due to decreased racism.

* As a middle class New Englander John Adams obviously was no aristocrat, but he was no populist either.

Open Thread, 9/17/2017

Reading Vietnam: A New History. The author has an apologia/explanation for why he is focusing not just on European colonialism, but on the history of what became Vietnam back to the first contacts with Han China (with some perfunctory archaeological passages). This is great in theory, but from what I have read so far we’re going to have a tryst with the French sooner rather than later. So I don’t think he really delivered here (though perhaps “normal” people want to read about evil European colonialism immediately?).

By coincidence, there is a Ken Burns documentary on the Vietnam War now. One of my friends from when I was a kid had (has) a dad who was a Vietnam vet. He’d have night terrors. Only now do I realize how recently in the past it was for him back in the 1980s.

I did enjoy The Best and the Brightest.

Sent out my second newsletter. Here’s a stat that I divulged: more than 50% of traffic to this site is directly due to Google+Twitter+Facebook. In 2011 it was 35%. Much of the difference is due to the decline in RSS feeds, and the rise of mobile.

Why is Twitter not what it was in the early 2010s? I think part of it is that there are too many people on Twitter, and the average user is less intelligent overall. Unlike on Facebook, the “genius” of Twitter is that anyone can talk to you. This is a problem.

The grandmaster of Mormon dweeb fantasy (I say this affectionately) Brandon Sanderson is coming out with the third book in his projected ten book Stormlight Archive series.

I’m at peace with the likelihood that I won’t finish this series. Sanderson is a great world-builder, so I’m looking at these books more as fictional ethnographies. Just along for a short ride.

Finally in the homestretch of A New History of Western Philosophy. After the classical period I haven’t really enjoyed this book; it was a slog. I began to read it at the same time as I read Consciousness and the Brain, which I finished in a week. Two years on I’m finally finishing the other book I started then.

Finally, again I highly recommend The Fortunes of Africa. Great read. I do have to say that it was hard not to be particularly appalled by Arab slave traders. It’s not like the European trade isn’t appalling, but that’s widely known. In contrast the driving of black Africans across the Sahara is less in the Western consciousness.

Massive genomic sample sizes = detecting evolution in real time

The recent PLOS BIOLOGY paper, Identifying genetic variants that affect viability in large cohorts, seems to have triggered a feeding frenzy in the media. For example, Big Think has put up Researchers Find Evidence That Human Evolution Is Still Actively Happening.

I wasn’t paying close attention because of course human evolution is still happening actively. From a genetic perspective, evolution is just change in allele frequencies. Populations aren’t infinite, so even if there weren’t any selection, stochastic forces would still shift allele frequencies. But of course selection is probably happening. For adaptation by natural selection to occur you need heritable variation on a trait where there are fitness differences as a function of variation within the population. It seems implausible that these conditions don’t still apply. There’s plenty of fitness variation in the population, and it’s unlikely to be random as a function of heritable variation.
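The point about finite populations is easy to see in a toy simulation. This is a minimal Wright-Fisher sketch with no selection at all. The population sizes, starting frequency, and number of generations are arbitrary choices for illustration.

```python
import random


def wright_fisher(pop_size, start_freq, generations, seed=None):
    """Track one allele's frequency under pure drift in a diploid population."""
    rng = random.Random(seed)
    copies = 2 * pop_size  # diploid: two allele copies per individual
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each copy in the next generation is drawn at random from the
        # current gene pool; nothing here favors either allele.
        count = sum(1 for _ in range(copies) if rng.random() < freq)
        freq = count / copies
        trajectory.append(freq)
    return trajectory


for n in (50, 5000):
    final = wright_fisher(n, 0.5, 100, seed=1)[-1]
    print(f"N={n}: frequency after 100 generations = {final:.3f}")
```

The frequency wanders even though no genotype has an advantage, and it generally wanders further in the smaller population.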

But the devil is in the details. And last year Field et al. used the modern genomic tools available to detect selection occurring over the past 2,000 years. It is not credible that it would have magically stopped a few centuries ago.

So why is this new paper such a big deal? (note that it’s in PLOS BIOLOGY, not PLOS GENETICS) Because the method they use is ingenious and simple. Basically, they’re looking at changes in allele frequencies as a function of age in huge populations. It’s a little more complicated than that; they used a logistic regression to control for some of the other variables. But they found some biologically plausible hits with their data sets of 50,000-150,000 individuals. And they replicated their hits from a European sample in a non-European one.
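The core intuition can be shown without any regression: if a variant shortens lives, its carriers should be thinner on the ground among the oldest people in the cohort. Here is a crude sketch that simply tabulates allele frequency by age bin. The paper itself used logistic regression with covariates, and the tiny cohort below is invented.

```python
def allele_frequency_by_age(samples, bin_width=10):
    """Allele frequency per age bin.

    samples: iterable of (age, allele_count), with allele_count in {0, 1, 2}
    giving how many copies of the variant a diploid individual carries.
    Returns {bin_start_age: frequency}, ordered by age.
    """
    totals = {}
    for age, allele_count in samples:
        if allele_count not in (0, 1, 2):
            raise ValueError("allele_count must be 0, 1 or 2")
        start = (age // bin_width) * bin_width
        copies, chromosomes = totals.get(start, (0, 0))
        totals[start] = (copies + allele_count, chromosomes + 2)
    return {start: copies / chromosomes
            for start, (copies, chromosomes) in sorted(totals.items())}


# Hypothetical mini-cohort: (age, copies of the variant)
cohort = [(42, 1), (47, 2), (55, 1), (58, 1), (63, 1), (66, 0), (81, 0), (84, 1)]
for start, freq in allele_frequency_by_age(cohort).items():
    print(f"ages {start}-{start + 9}: frequency {freq:.2f}")
```

A real analysis needs tens of thousands of people per bin and has to worry about ancestry, sex, and birth-cohort effects, which is where the regression comes in.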

This does bring me back to a discussion I observed a while back. An evolutionary geneticist who works with Drosophila mentioned offhand that in his field there really wasn’t that much of a need for more data. They could spend all their time doing analysis. A prominent human geneticist whose work focused on biomedicine piped up that that wasn’t true at all for their field. There are some differences in the scientific questions, but there are also differences in terms of what you can do with humans as a model organism.

In the paper they look forward to the day when sample sizes are an order of magnitude beyond where they are now. At some point in the near future, large fractions of entire nations will be sequenced at medical grade level (30x coverage).

Anyway, you should read Identifying genetic variants that affect viability in large cohorts. It’s pretty straightforward.

George R. R. Martin’s typical fantasy trope

George R. R. Martin has done something new in fantasy. He has created a world in shades of gray. This is in contrast to the modern template of J. R. R. Tolkien’s The Lord of the Rings, where what is good and what is evil were as clear and distinct as black and white. In addition, A Song of Ice and Fire transcended fantasy’s traditional appeal to adolescent males. This latter tendency is pretty evident in Robert Jordan’s The Wheel of Time, which was simply not moving beyond its juvenile origins by book seven or eight, when I gave up (I was moving beyond my juvenile origins by that point). It isn’t as if the Jordan style, geared toward adolescent male virgins, can’t be done well. I’d argue that Brandon Sanderson pulls this off very competently.

One notable aspect of A Song of Ice and Fire in the books is that there isn’t a Dark Lord who is the literal personification of evil. No Sauron. Even the primary antagonists become less dark with deeper exploration, and their motivations are often complex, and comprehensible in their own way.

But there is a major exception to this: Ramsay Snow and House Bolton. The Boltons are the great rivals of the Starks in the North, and before they were vassals they were kings. And they are evil in a straightforward sense without nuance. One stupid “fan theory” (these are the television show watchers) even posited that Roose Bolton was an immortal vampire.

Though Martin is careful to suggest that people should not take the antiquity of the dynasties in A Song of Ice and Fire literally, it is clear that the Boltons are not parvenus. Their lineage is old, and it has persisted. And yet the Boltons who are highlighted are basically without any redeeming qualities over their history. Ramsay Snow is basically the protagonist of a snuff film come alive.

When it comes to mining history, perhaps the best analog to the Boltons is the Assyrians. Like the Boltons they flayed their enemies alive. The Assyrians also famously were totally destroyed by their enemies because of the ill-will their cruelty in conquest generated. Martin is a student of history, and there is no way that a lineage of such unmitigated evil could persist down the centuries. The Boltons exist as witness to the long tradition of fantasy antagonists which readers love to hate.

Reason with Prose, Not Poetry

One of the insights of the excellent book The Enigma of Reason is that “reason” isn’t some disembodied analytic faculty, but part of a broader cognitive toolkit. And, it doesn’t really have the catbird seat we like to think. This is pretty obvious to many people; at least when it comes to the “reasons” of those with whom they disagree. And some of the basic propositions were explicated rather well by David Hume over 200 years ago.

But if you conceive of reason as a form of argumentation aimed at those who don’t agree with you, then in many cases dense and stolid may be superior to poetic and stirring. If you are looking for reasons to entertain or consider views with which you disagree you need a good argument to chew on. Reasons to align with countervailing intuitions.

To give a concrete example, most people seem to admit that Adolf Hitler was a stirring orator. But I’m pretty sure that few modern Neo-Nazis were immediately converted by watching his speeches. If you don’t already believe in his propositions Hitler’s speeches just seem sinister.

That’s an extreme example of course. But it gets at the point. The conservative thinker William F. Buckley was often praised for his command of the English language, but I know that many liberals find his prose pretentious and tedious. Ta-Nehisi Coates’ pieces elicit almost orgasmic praise from liberal public intellectuals, but non-liberals often judge that he’s simply indulged.

My point here is that reasoned and dense discourse, which nonetheless maintains clarity, may not persuade in one sitting through force of argumentation. But it is far more likely to push the needle with someone who begins at sharp contradiction with the core propositions. In contrast, sermons convince those who are already primed to be carried up the heights. Sermons don’t really make cogent points, because they already take for granted you agree on the points.

Addendum: Please note that the above applies to the small proportion of the population fixated on the necessity of reasoned arguments. Most people are convinced by social cues of what is, and isn’t, acceptable for their ingroup. Basically, the audience that I’m talking about here is the sort who read political magazines rather than listen to talk radio or watch Samantha Bee. If they are religious they are the sorts who actually read the Nicene creed and attempt to understand the Athanasian formula.

Carving nature at its joints more realistically

If you are working on phylogenetic questions on a coarse evolutionary scale (that is, “macroevolutionary,” though I know some evolutionary geneticists will shoot me the evil eye for using that word) generating a tree of relationships is quite informative and relatively straightforward, since it has a comprehensible mapping onto what really occurred in nature. When your samples are different enough that the biological species concept works well and gene flow doesn’t occur between nodes, then a tree is a tree (one reason Y and mtDNA results are so easy to communicate to the general public in personal genomics).

Everything becomes more problematic when you are working on a finer phylogenetic scale (or in taxa where inter-species gene flow is common, as is often the case with plants). And I’m using problematic here in the way that denotes a genuine substantive analytic issue, as opposed to connoting something that one has moral or ethical objections to.

It is intuitively clear that there is often genetic population structure within species, but how to summarize and represent that variation is not a straightforward task.

In 2000 the paper Inference of Population Structure Using Multilocus Genotype Data in Genetics introduced the sort of model-based clustering most famously implemented with Structure. The paper illustrates limitations with the neighbor-joining tree methods which were in vogue at the time, and contrasts them with a method which defines a finite set of populations and assigns proportions of each putative group to various individuals.
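The "assigns proportions" idea boils down to treating each individual as a weighted mix of source populations. Below is a simplified sketch of just that mixing step at a single locus. The two populations and their allele frequencies are hypothetical, and this is not how Structure actually does its inference, which is a Bayesian MCMC procedure over many loci at once.

```python
def expected_allele_frequency(proportions, population_freqs):
    """Expected allele frequency for an individual under a simple admixture model.

    proportions: the individual's ancestry fraction from each population
    population_freqs: the allele's frequency in each of those populations
    """
    if len(proportions) != len(population_freqs):
        raise ValueError("need one frequency per ancestry proportion")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(q * p for q, p in zip(proportions, population_freqs))


# Hypothetical: allele at 80% in population A and 20% in population B,
# individual with 25% ancestry from A and 75% from B.
print(expected_allele_frequency([0.25, 0.75], [0.8, 0.2]))
```

The clustering programs run this logic in reverse: given the genotypes, they search for the proportions and population frequencies that best explain them.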

The model-based methods were implemented in numerous packages over the 2000s, and today they’re pretty standard parts of the phylogenetic and population genetic toolkits. The reason for their popularity is obvious: they are quite often clear and unambiguous in their results. This may be one reason that they emerged to complement visualization methods like PCA and MDS, which make fewer a priori assumptions.

But of course, crisp clarity is not always reality. Sometimes nature is fuzzy and messy. The model-based methods take inputs and will produce crisp results, even if those results are not biologically realistic. They can’t be utilized in a robotic manner without attention to the assumptions and limitations (see A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots).

This is why it is exciting to see a new preprint which addresses many of these issues, Inferring Continuous and Discrete Population Genetic Structure Across Space*:

A classic problem in population genetics is the characterization of discrete population structure in the presence of continuous patterns of genetic differentiation. Especially when sampling is discontinuous, the use of clustering or assignment methods may incorrectly ascribe differentiation due to continuous processes (e.g., geographic isolation by distance) to discrete processes, such as geographic, ecological, or reproductive barriers between populations. This reflects a shortcoming of current methods for inferring and visualizing population structure when applied to genetic data deriving from geographically distributed populations. Here, we present a statistical framework for the simultaneous inference of continuous and discrete patterns of population structure….

The whole preprint should be read by anyone interested in phylogenomic inference, as there is extensive discussion and attention to many problems and missteps that occur when researchers attempt to analyze variation and relationships across a species’ range. Basically, the sort of thing that might be mentioned in peer review feedback, but isn’t likely to be included in any final write-ups.

As noted in the abstract the major issue being addressed here is the problem that many clustering methods do not include within their model the reality that genetic variation within a species may be present due to continuous gene flow defined by isolation by distance dynamics. This goes back to the old “clines vs. clusters” debates. Many of the model-based methods assume pulse admixtures between population clusters which are randomly mating. This is not a terrible assumption when you consider perhaps what occurred in the New World when Europeans came in contact with the native populations and introduced Africans. But it is not so realistic when it comes to the North European plain, which seems to have become genetically differentiated only within the last ~5,000 years, and has likely seen extensive gene flow.
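Here is a toy illustration of why discontinuous sampling of a cline can masquerade as discrete clusters. It assumes a purely linear change in allele frequency along a 1,000 km transect. The transect length, the end-point frequencies, and the function name are all invented for the example, and this is not the spatial model used by conStruct.

```python
def cline_frequency(position_km, transect_km=1000.0, low=0.1, high=0.9):
    """Allele frequency at a position along a smooth, linear cline."""
    fraction = min(max(position_km / transect_km, 0.0), 1.0)
    return low + (high - low) * fraction


# Dense sampling every 100 km: neighbors differ only slightly.
dense = [cline_frequency(x) for x in range(0, 1001, 100)]
steps = [round(b - a, 3) for a, b in zip(dense, dense[1:])]
print("step between neighboring sites:", steps)

# Sparse sampling at the two ends only: one big jump, no intermediates.
gap = cline_frequency(1000) - cline_frequency(0)
print(f"difference between the two end samples: {gap:.2f}")
```

Nothing about the underlying population changes between the two sampling schemes. Only the second one hands a clustering program two very different groups with nothing in between.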

The figure below shows the results from the conStruct method (left), and the more traditional fastStructure (right):

There are limitations to the spatial model they use (e.g., ring species), but that’s true of any model. The key is that it’s a good first step to account for continuous gene flow, and not shoehorning all variation into pulse admixtures.

Though in beta, the R package is already available on GitHub (easy enough to download and install). I’ll probably have more comment when I test drive it myself….

* I am friendly with the authors of this paper, so I am also aware of their long-held concerns about the limitations and/or abuses of some phylogenetic methods. These concerns are broadly shared within the field.