Jerry Coyne on race: a reflection of evolution

After my post on the ‘race question’ I thought it would be useful to point to Jerry Coyne’s ‘Are there human races?’. The utility is that Coyne’s book Speciation strongly shaped my own perceptions. I knew the empirical reality of clustering before I read that book, but the analogy with “species concept” debates only struck me after I became more familiar with that literature. Coyne’s post was triggered by a review of Race?: Debunking a Scientific Myth and Race and the Genetic Revolution: Science, Myth, and Culture. He terms the review tendentious, and I generally agree.

In the early 20th century Western intellectuals of all political stripes understood what biology told us about human taxonomy. In short, human races were different, and the white European race was superior on the metrics which mattered (this was even true of Left-Socialist intellectuals such as H. G. Wells and Jack London). In the early 21st century Western intellectuals of all political stripes understand what biology teaches us about human taxonomy. Human races are basically the same, and for all practical purposes identical, and equal on measures which matter (again, to Western intellectuals). As Coyne alludes to in his post these are both ideologically driven positions. One of the main reasons that I shy away from modern liberalism is a strong commitment to interchangeability and identity across all individuals and populations as a matter of fact, rather than equality as a matter of legal commitment. In a minimal government scenario the details of human variation are not of particular relevance, but if you accept the feasibility of social engineering (a term I am not using in an insulting sense, but in a descriptive one) you have to start out with a model of human nature. So this is not just an abstract issue. For whatever reason many moderns, both liberals and economic conservatives, start out with one of near identity (e.g., H. economicus in economics).

I want to highlight a few sections of Coyne’s post:

What are races?

In my own field of evolutionary biology, races of animals (also called “subspecies” or “ecotypes”) are morphologically distinguishable populations that live in allopatry (i.e. are geographically separated).  There is no firm criterion on how much morphological difference it takes to delimit a race.  Races of mice, for example, are described solely on the basis of difference in coat color, which could involve only one or two genes.

Under that criterion, are there human races?

Yes.  As we all know, there are morphologically different groups of people who live in different areas, though those differences are blurring due to recent innovations in transportation that have led to more admixture between human groups.

Why do these differences exist?

The short answer is, of course, evolution.  The groups exist because human populations have an evolutionary history, and, like different species themselves, that ancestry leads to clustering and branching, though humans have a lot of genetic interchange between the branches!

But what evolutionary forces caused the differentiation?  It’s undoubtedly a combination of natural selection (especially for the morphological traits) and genetic drift, which will both lead to the accumulation of genetic differences between isolated populations.  What I want to emphasize is that even for the morphological differences between human “races,” we have virtually no understanding of how evolution produced them.  It’s pretty likely that skin pigmentation resulted from natural selection operating differently in different places, but even there we’re not sure why (the classic story involved selection for protection against melanoma-inducing sunlight in lower latitudes, and selection for lighter pigmentation at higher latitudes to allow production of vitamin D in the skin; but this has been called into question by some workers).

As for things like differences in hair texture, eye shape, and nose shape, we have no idea….

I have no idea if reading Coyne’s earlier work influenced me, but observe that he too emphasizes that human races are a reflection of evolutionary history. Some of my interlocutors believe it is essential to have a tree-like phylogeny with no reticulation (gene flow across branches) to have a reasonable model for race, but I do not. That’s because the focus for me is evolutionary history. I want to understand evolutionary history. Taxonomy is a means to that end. It is not the end.

Coyne has a follow-up post which will be of no surprise to readers of this weblog. But I do want to add a few things. 1) For pigmentation we do now understand its genomics relatively well. It seems that light skin emerged at least twice at the two ends of Eurasia, and that it was a recent emergence (as evidenced by markers of selective sweeps). 2) As for hair texture, there is some work which has shed light on this. East Asians in particular carry a variant of EDAR which gives them their distinctive thick straight hair. There has been less work on “woolly hair,” but I suspect that it will be elucidated soon (there are some candidate genes, from linkage studies and animal models). Additionally, I think it is important to note that the dark-skin-as-protection-against-skin-cancer hypothesis does not make much evolutionary sense. Melanoma strikes later in one’s reproductive years. Rather, I accept that Nina Jablonski has the right of it when she argues that dark skin protects against neural tube defects, which arise because of various chemical changes in one’s biochemistry due to exposure to sun. Finally, I think Coyne underestimates the power of genomics, even right now, to narrow down very specific geographical and population origins for segments of your DNA using haplotype-based techniques. The key is not where you come from, it is how segments of your DNA relate to the full range of segments of other peoples’ DNA.

Are Sardinians like Iberians?

Dienekes asks:

In terms of autosomal DNA, the Iceman clearly clusters with modern Sardinians, and also appears slightly more removed than them compared to continental Europeans. Interestingly, at least as far as the PC analysis shows, Sardinians appear to be intermediate between the Iceman and SW Europeans, rather than Italians. Perhaps, this makes sense if the Paleo-Sardinian language is indeed related to languages of Iberia.

This trend aroused a little curiosity in me too. I’m sure Dienekes & company will be probing these issues a lot in the near future, but I couldn’t wait. I took the IBS data set, which includes a lot of individuals from various areas of Spain, the Sardinians, French and French Basque from the HGDP, and the Tuscans from the HapMap, and threw them together into a pot. I added HGDP Russians & Orcadians (the latter a British group) to make sure there was a North European “outgroup.” In terms of technical details the combined data set had ~220,000 SNPs, not too shabby. Additionally, I decided to run a PCA, where this number of SNPs is more than sufficient.

On a technical note, the Sardinians were swamped in raw numbers by Iberians and Tuscans (over 100 and around 80 respectively). This means that the peculiarities of the Sardinian genetic heritage didn’t show up, rather, what you see are the Sardinians as they arrange themselves in relation to the genetic variation of these more numerous groups. I used SmartPCA to generate the 10 largest independent dimensions of variation. To make a long story short there really wasn’t much variation added from the second dimension on in this relatively homogeneous sample. So below is PC 1 and 2 (E1 and E2).


I’d be curious if someone could replicate this. I’m rather surprised that the Tuscans form such a tight cluster, but then again the IBS sample is very geographically distributed across Spain. The analogy to the HapMap Tuscans might be if Spain was represented by just Galicians. So what you’re really seeing is a lot of Spanish variation, and of course the north-south range in Europe (which is really a southwest to northeast cline). I don’t see a very strong affinity between Basques and Sardinians, but repeated trials indicated that the Sardinians do not cluster with Tuscans when it comes to their position within the Iberian genetic spectrum.
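For anyone who wants to see what the PCA step is doing before attempting a replication, here is a minimal sketch in plain Python. It is not SmartPCA and does not use the real data: the genotypes are simulated, the two allele-frequency profiles (`FREQ_A`, `FREQ_B`) are hypothetical and deliberately exaggerated, and the 40-versus-10 sample sizes are only meant to echo the imbalance between the large and small groups described above. It centers each SNP column and finds the first principal component by power iteration; SmartPCA additionally normalizes each SNP, which this sketch skips.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

N_SNPS = 50
# Hypothetical allele frequencies for two groups (assumption, exaggerated
# so that the separation is easy to see in a toy example).
FREQ_A = [0.2] * (N_SNPS // 2) + [0.8] * (N_SNPS // 2)
FREQ_B = [0.8] * (N_SNPS // 2) + [0.2] * (N_SNPS // 2)


def simulate(n, freqs):
    """Draw n diploid genotypes: 0, 1 or 2 copies of an allele per SNP."""
    return [[(rng.random() < p) + (rng.random() < p) for p in freqs]
            for _ in range(n)]


def center(rows):
    """Subtract each SNP's mean so the components describe variation only."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def project(rows, axis):
    """Score of every individual along a given axis."""
    return [sum(x * w for x, w in zip(row, axis)) for row in rows]


def top_component(rows, iterations=100):
    """First principal axis via power iteration on X^T X."""
    axis = [rng.gauss(0, 1) for _ in rows[0]]
    for _ in range(iterations):
        scores = project(rows, axis)                      # X v
        axis = [sum(s * x for s, x in zip(scores, col))   # X^T (X v)
                for col in zip(*rows)]
        norm = sum(w * w for w in axis) ** 0.5
        axis = [w / norm for w in axis]
    return axis


# 40 individuals from group A, 10 from group B (unbalanced on purpose).
genotypes = center(simulate(40, FREQ_A) + simulate(10, FREQ_B))
pc1 = top_component(genotypes)
scores = project(genotypes, pc1)
a_scores, b_scores = scores[:40], scores[40:]

print("group A PC1 range:", min(a_scores), max(a_scores))
print("group B PC1 range:", min(b_scores), max(b_scores))
```

Because the data are centered on the pooled sample, the larger group sits close to the origin and the smaller one is pushed out along PC 1. That is one way to see why the arrangement of a small sample depends so much on who else is in the pot.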

 

Ötzi the Iceman and the Sardinians

Well, the paper is finally out, New insights into the Tyrolean Iceman’s origin and phenotype as inferred by whole-genome sequencing. In case you don’t know, Ötzi the Iceman died 5,300 years ago in the alpine region bordering Austria and Italy. He seems to have been killed. And due to various coincidences his body was also very well preserved. This means that enough tissue remained that researchers have been able to amplify his DNA. And now they’ve sequenced it to the point where they can make some inferences about his phenotypic characteristics, and his phylogenetic relationships to modern populations.

The guts of this paper will not be particularly surprising to close readers of this weblog. The guesses of some readers based on what the researchers hinted were correct: Ötzi seems to resemble most closely the people of Sardinia. This is rather interesting. One reason is prosaic. The HGDP sample used in the paper has many Northern Italians (from Bergamo). Why is it that Ötzi does not resemble the people from the region that he was indigenous to? (We know that he was indigenous because of the ratio of isotopes in his body.) A more abstruse issue is that it is interesting that Sardinians have remained moored to their genetic past, enough so that a 5,300 year old individual clearly can exhibit affinities with them. The distinctiveness of Sardinians jumps out at you when you analyze genetic data sets. They were clearly set apart in L. L. Cavalli-Sforza’s The History and Geography of Human Genes, 20 years ago. One reason that Sardinians may be distinctive is that Sardinia is an isolated island. Islands experience reduced gene flow because they’re surrounded by water. And sure enough, Sardinians are especially similar to each other in relation to other European populations.

 

But Ötzi’s affinities reduce the strength of this particular dynamic as an explanation for Sardinian distinctiveness. The plot to the left is a PCA. It takes the genetic variation in the data set, and extracts out the largest independent components. PC 1 is the largest component, and PC 2 the second largest. The primary cline of genetic variation in Europe is North-South, with a secondary one going from West-East. This is evident in the plot, with PC 1 being North-South, and PC 2 being West-East. The “Europe S” cluster includes northern, southern, and Sicilian Italians. Now notice the position of Ötzi: he is closest to a large cluster of Sardinians. Interestingly there are also a few others. Who are they? I do not know because I do not have access to the supplements right now. The fact that the Sardinians are shifted closer to the continental populations than Ötzi is also striking. But totally intelligible: Sardinia has had some gene flow with other Mediterranean populations. This obviously post-dates Ötzi; Roman adventurers and Genoese magnates could not be in his genealogy because Rome and Genoa did not exist 5,300 years ago. These data strongly point to the possibility of rather major genetic changes in continental Europe, and in particular Italy, since the Copper Age. Juvenal complained that the “River Orontes has long flowed into the Tiber,” a reference to the prominence of easterners, Greek and non-Greek, in the city of Rome. The impact of this is not to be dismissed, but I do not think that it gets to the heart of this matter.

The second panel makes clear what I’m hinting at: Ötzi is actually closer to the “Middle Eastern” cluster than many Italians! In fact, more than most. Why? I suspect that rather than the Orontes, the Rhine and the Elbe have had more of an impact on the genetic character of Italians over the past ~5,000 years. Before Lombardy was Lombardy, named for a German tribe, it was Cisalpine Gaul, after the Celts who had settled it. And before that? For that you have to ask where Indo-Europeans came from. I suspect the answer is that they came from the north, and therefore brought northern genes.


A Sardinian

And what of the Sardinians? I believe that the “islanders” of the Mediterranean are a relatively “pristine” snapshot of a particular moment in the history of the region. This is evident in Dienekes’ Dodecad Ancestry Project. Unlike their mainland cousins both the Sardinians and Cypriots tend to lack a “Northern European” component. Are the islanders in part descendants of the Paleolithic populations? In part. Sardinians carry a relatively high fraction of the U5 haplogroup, which has been associated with ancient hunter-gatherer remains. But it is also possible that the preponderant aspect of Sardinian ancestry derives from the first farmers to settle the Western Mediterranean.  I say this because the Iceman carried the G2a Y haplogroup, which has of late been strongly associated with very early Neolithic populations in Western Europe. And interestingly some scholars have discerned a pre-Indo-European substrate in Sardinian which suggests a connection to the Basque. I wouldn’t read too much into that, but these questions need to be explored, as Ötzi’s genetic nature makes Sardiniaology more critical to understanding the European past.

Image credit: Wikipedia

Neandertal population structure

There’s a new paper out, Partial genetic turnover in neandertals: continuity in the east and population replacement in the west. The primary results are above. Basically, using 13 mtDNA samples the authors conclude that it looks as if there was a founder effect for Neanderthals in Western Europe ~50 K years ago, generating a very homogenized genetic background for this particular population before the arrival of modern humans. Perhaps it’s just me, but press releases with headlines such as “European Neanderthals Were On the Verge of Extinction Even Before the Arrival of Modern Humans” strike me as hyperbolic. I’m also confused by quotes like the one below:

“The fact that Neanderthals in Europe were nearly extinct, but then recovered, and that all this took place long before they came into contact with modern humans came as a complete surprise to us. This indicates that the Neanderthals may have been more sensitive to the dramatic climate changes that took place in the last Ice Age than was previously thought”, says Love Dalén, associate professor at the Swedish Museum of Natural History in Stockholm.

There are several points that come to mind, from the specific to the general. First, from what I gather Neandertals were actually less expansive in pushing the northern limits of the hominin range than the modern humans who succeeded them. From this many suppose that despite the biological cold-adapted nature of the Neandertal physique they lacked the cultural plasticity to push the range envelope (e.g., modern humans pushed into Siberia, allowing them to traverse Beringia). One might infer from this that Neandertals were more, not less, sensitive to climate changes than later human populations. Second, there is the fact that as the northern hominin lineage one would expect that Neandertals would be subject to more population size variations than their cousins to the south during the Pleistocene due to cyclical climate change. This is not an issue just for Neandertals, but for slow-breeding or slow-moving organisms generally. The modern human bottleneck is in some ways more surprising, because modern humans derive from a warmer climate. Finally, there is the “big picture” issue that though we throw these northern adapted hominins into the pot as “Neandertals,” one shouldn’t be surprised if they exhibit structure and variation. Non-African humans have diversified over less than 100,000 years, while at a minimum the lineages which we label Neandertals were resident from Western Europe to Central Asia for ~200,000 years. Wouldn’t one expect a lot of natural history over this time?

Presumably the authors focused on mtDNA because this is copious relative to autosomal DNA, making ancient DNA extraction easier. I’m a bit curious how it aligns with the inference from the Denisovan paper that Vindija and Mezmaiskaya Neandertals both went through a population bottleneck using autosomal markers. The dates from the paper’s supplements are not clear to me, though it seems possible that they may have sampled individuals where the Vindija population may have been post-resettlement. At some point presumably we may be able to get a better sense of the source population of the Neandertal admixture into our own genomes if the genomic history of this population is well characterized.

What the World Is Made Of

I know you’re all following the Minute Physics videos (that we talked about here), but just in case my knowledge is somehow fallible you really should start following them. After taking care of why stones are round, and why there is no pink light, Henry Reich is now explaining the fundamental nature of our everyday world: quantum field theory and the Standard Model. It’s a multi-part series, since some things deserve more than a minute, dammit.

Two parts have been posted so far. The first is just an intro, pointing out something we’ve already heard: the Standard Model of Particle physics describes all the world we experience in our everyday lives.

The second one, just up, tackles quantum field theory and the Pauli exclusion principle, of which we’ve been recently speaking. (Admittedly it’s two minutes long, but these are big topics!)

The world is made of fields, which appear to us as particles when we look at them. Something everyone should know.


Men on the move and women in place?

After posting on Basque mtDNA I wanted to make something more explicit that I alluded to below, that uniparental lineages are highly informative, but they may not be representative of total genome content. This is plainly true in the case of mestizos from Latin America, but we don’t need genetics to point us in the right direction on this score, we have plenty of textual evidence for asymmetry in sexes when it came to admixture events in the post-Columbian era. Rather, I want to note again the issue of South Asia. When it comes to mtDNA the good majority of South Asian lineages are closer to those of East Asia than Western Eurasia. By this, I do not mean to say that they’re particularly close to East Asian lineages, only that if you go back in the phylogeny the South Asian lineages (I’m thinking here of haplogroup M) tend to coalesce first with East Asian lineages before they do so with West Eurasian lineages.

Here is a quote from one of the definitive papers on this topic:

Broadly, the average proportion of mtDNAs from West Eurasia among Indian caste populations is 17% (Table 2). In the western States of India and in Pakistan their share is greater, reaching over 30% in Kashmir and Gujarat, nearly 40% in Indian Punjab, and peaking, expectedly, at approximately 50% in Pakistan (Table 11, see Additional file 6, Figure 11, panel A). These frequencies demonstrate a general decline (SAA p < 0.05 Figure 4) towards the south (23%, 11% and 15% in Maharashtra, Kerala and Sri Lanka, respectively) and even more so towards the east of India (13% in Uttar Pradesh and around 7% in West Bengal and Bangladesh).

In Iran, over 90 percent of the mtDNA lineages seem West Eurasian. Though I accepted these findings, I was always a bit concerned that the 40-point chasm between Iran and Pakistan was so large. Additionally, the autosomal studies seem to show that Pakistani populations exhibited affinities to West Eurasians greater than would be predicted by being ~50 percent West Eurasian. And, as many of you no doubt know, the mtDNA does not align well with the Y chromosomal lineages, which seem to indicate a stronger affinity to West Eurasia.

The 2009 paper Reconstructing Indian History resolved some of these confusions. In it the authors inferred that South Asians were a compound population, ~50 percent West Eurasian and ~50 percent South Eurasian, with this latter component having distant, but still closer, affinities to East Asians. In other words, the latter component could be easily aligned with the mtDNA, while the former made sense of the Y chromosomal lineages. According to the above paper the West Eurasian component was present at 70-80 percent fractions in Pakistan at the total genome level. This is considerably above the 50 percent for mtDNA, and made more sense of the visible affinities of Pakistanis to West Eurasians on the phenotypic dimension. But look at the rapid drop-off in the mtDNA fraction.

Here’s a table I generated combining the drop off in ANI and mtDNA across the two papers:

If you don’t know the geography of India, the West Eurasian mtDNA fraction falls off a cliff very quickly in Northwest India. In contrast, the autosomal ANI fraction drops, but not nearly as precipitously. The ratio of mtDNA fraction to ANI fraction is 2:3 in Pakistan. In Bengal it is 1:5, but it is already 1:4 in Uttar Pradesh, which is closer geographically to Pakistan than Bengal (though arguably more ecologically distinct from Pakistan, the dialects of Uttar Pradesh are far closer to those of Pakistan than of Bengal). I will let you develop your own story in this case, as there’s obviously a lot that could be said speculatively. Rather, I simply wanted to illustrate the reality that the differences between patterns in uniparental lineages and autosomal DNA can tell you a great deal, despite their disagreements on occasion.
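The arithmetic behind those ratios can be checked with a short sketch. The mtDNA percentages are the ones quoted above (about 50% in Pakistan, 13% in Uttar Pradesh, around 7% in West Bengal and Bangladesh). Taking 75% as the midpoint of the 70-80 percent ANI range for Pakistan is an assumption made here for illustration. The ANI values for Uttar Pradesh and Bengal are back-calculated from the 1:4 and 1:5 ratios in the text, not read from either paper, so treat them as implied rather than reported figures.

```python
from fractions import Fraction

# West Eurasian mtDNA fractions (percent), as quoted in the text.
mtdna_pct = {
    "Pakistan": 50,
    "Uttar Pradesh": 13,
    "Bengal": 7,
}

# Assumption: midpoint of the 70-80 percent ANI range given for Pakistan.
ani_pakistan_pct = 75

# mtDNA : ANI ratio in Pakistan reduces to 2:3.
pakistan_ratio = Fraction(mtdna_pct["Pakistan"], ani_pakistan_pct)

# mtDNA : ANI ratios stated in the text for the other two regions.
stated_ratio = {
    "Uttar Pradesh": Fraction(1, 4),
    "Bengal": Fraction(1, 5),
}

# ANI fraction implied by each stated ratio (back-calculated, not reported).
implied_ani_pct = {
    region: mtdna_pct[region] / ratio
    for region, ratio in stated_ratio.items()
}

print("Pakistan mtDNA:ANI =", pakistan_ratio)
for region, ani in implied_ani_pct.items():
    print(region, "implied ANI percent =", ani)
```

On these numbers the mtDNA fraction falls from 50 to 7 between Pakistan and Bengal, while the implied ANI fraction falls only from roughly 75 to 35, which is the asymmetry the paragraph describes.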

Finally, I want to end on a somewhat different note:

Elevated frequencies of haplogroups common in eastern Eurasia are observed in Bangladesh (17%) and Indian Kashmir (21%) and may be explained by admixture with the adjacent populations of Tibet and Myanmar (and possibly further east: from China and perhaps Thailand).

These proportions are both higher than anything in the autosomal DNA. My parents are both 10-15 percent Southeast Asian in ancestry. But I am willing to bet that they’re slightly on the high side even for Bangladeshis (going by geography). And as for Kashmiris, these populations do often show some East Asian admixture, but generally not so high as 20%. What explains this? I have posited that rather than being intrusive to Bengal, the East Asian populations (Munda?) may have been already present when Indo-Aryan speaking agriculturalists arrived. This could explain a sex bias in assimilation of these populations toward females. In general my rule of thumb is that later population arrivals are correlated with a male bias in ancestry.

Parkinson’s disease patients & free 23andMe

Michelle tipped me off to 23andMe’s new initiative to get Parkinson’s disease sufferers genotyped. Basically, if you are a sufferer, you get the service for free. The goal is presumably to increase the sample size so as to pick up new possible associations. But a question: can you think of a downside for Parkinson’s disease sufferers? A lot of people have genetic privacy concerns, but if you manifest a disease like Parkinson’s I suspect that’s the least of your worries.

Basque maternal heritage & continuity

There’s a new paper in AJHG which caught my eye, The Basque Paradigm: Genetic Evidence of a Maternal Continuity in the Franco-Cantabrian Region since Pre-Neolithic Times (ungated). The first thing you need to know about this paper is that it focuses on only the direct maternal lineage of Basques via the mtDNA. In some ways this is weak tea, since it doesn’t give us a total genome estimate. But there are major upsides to mtDNA and Y. First, because of the lack of recombination it is relatively easy to generate a nice phylogenetic tree using a coalescent model. And second, for mtDNA the molecular clock is considered relatively reliable.

In this specific paper they also expanded the scope of their analysis to the whole mtDNA sequence, instead of just the hypervariable region. Not only did they look at whole sequences, but they also had an enormous sample size. They sequenced over 400 mtDNA genomes from the Basque country and neighboring regions. Haplogroup H peaks in frequency among Basques, and drops off among their neighbors (Gascons, Spaniards, etc.). Because the Basque speak a non-Indo-European language they are usually presumed to be indigenous in relation to their neighbors (or at least more indigenous). Until recently there was a strong presupposition that the Basque were ideal representatives of the pre-Neolithic populations of Western Europe. One common method of analysis would be to use the Basque as a pre-Neolithic “reference,” and simply estimate the impact of a Neolithic demographic wave of advance by using an eastern Mediterranean population as a second “reference” within an admixture framework. But more recent work has muddled the idea that the Basque are the descendants of Paleolithic Europeans. Finally, I suspect we’ll also have to acknowledge complexity in demographic histories. To say that the Basque exhibit continuity with Mesolithic Iberians may not contradict a substantial Neolithic contribution. South Asians for example are one numerous modern group which exhibits sharply divergent affinities if you use Y chromosomes (West Eurasian) or mtDNA (not West Eurasian). Why? The details are prehistorical.


The major takeaway from this paper is that the Basque mtDNA exhibit a pattern of demographic expansion ~4,000 years BP, and ~8,000 years BP. But I think it is important to look at the range of outcomes over their confidence intervals, so I’ve reproduced their second table below:


Table 2. Time Estimates of the Six Autochthonous Haplogroups

Haplogroup   N    Percentage   Rho    Standard Error   Age (in Years)   95% Confidence Interval

Coalescence Age
H1j1         52   12.4%        1.86   0.49             4845             2324 – 7408
H1t1         34   8.1%         1.94   0.97             5057             99 – 10176
H2a5a1       22   5.2%         1.33   0.65             3422             118 – 6800
H1av1        17   4.0%         1.24   0.52             3213             567 – 5906
H3c2a        14   3.3%         1.27   0.37             3291             1403 – 5204
H1e1a1       12   2.9%         1.23   0.72             3187             −464 – 6927

Splitting Age
H1j1         52   12.4%        2.86   1.11             7514             1764 – 13470
H1t1         34   8.1%         2.94   1.39             7730             554 – 15227
H2a5a1       22   5.2%         2.33   1.19             6094             −6 – 12434
H1av1        17   4.0%         2.24   1.13             5854             65 – 11860
H3c2a        14   3.3%         2.27   1.07             5934             443 – 11619
H1e1a1       12   2.9%         5.23   2.13             14011            2729 – 26000
 

For our purposes the splitting age is important, because it shows when the Basque specific H lineages diverged from other European H lineages. Some of the intervals are huge (look at H1e1a1), so I don’t know what to make of it. I’ll leave further comments to those more well versed in the mtDNA literature, but I would like to say that it is important to remember that we don’t know where the demographic events inferred occurred. It may not have been in the trans-Pyrenees region at all.
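To put a number on how wide those splitting-age intervals are, here is a small sketch that uses only the point estimates and 95% bounds from Table 2. The variable names are mine, and the two summaries it computes (interval width, and whether the lower bound reaches zero or below) are just one convenient way to look at the uncertainty, not anything the authors report.

```python
# Splitting-age rows from Table 2: (haplogroup, age, lower 95%, upper 95%).
splitting = [
    ("H1j1", 7514, 1764, 13470),
    ("H1t1", 7730, 554, 15227),
    ("H2a5a1", 6094, -6, 12434),
    ("H1av1", 5854, 65, 11860),
    ("H3c2a", 5934, 443, 11619),
    ("H1e1a1", 14011, 2729, 26000),
]

# Width of each 95% interval, in years.
widths = {name: upper - lower for name, _, lower, upper in splitting}

# Haplogroup with the widest interval.
widest = max(widths.items(), key=lambda item: item[1])

# Intervals whose lower bound is zero or negative, i.e. the data cannot
# exclude a split in the very recent past.
nonpositive = [name for name, _, lower, _ in splitting if lower <= 0]

for name, age, lower, upper in splitting:
    # Width relative to the point estimate: values near 2 mean the interval
    # spans roughly zero to twice the estimated age.
    print(name, "width:", widths[name], "relative width:",
          round(widths[name] / age, 2))

print("widest interval:", widest)
print("lower bound at or below zero:", nonpositive)
```

Every interval here is wider than its own point estimate, which is why I hesitate to lean on any single date.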

More later.

The data sets in the dark

Recently I was tipped off to the appearance of a new paper, Genome-Wide Association Study Identifies Chromosome 10q24.32 Variants Associated with Arsenic Metabolism and Toxicity Phenotypes in Bangladesh. This is the section which caught my eye: “Using data on urinary arsenic metabolite concentrations and approximately 300,000 genome-wide single nucleotide polymorphisms (SNPs) for 1,313 arsenic-exposed Bangladeshi individuals.” 300 K SNPs with 1,313 Bangladeshi individuals is a lot! I’m interested in this data set because of the 200+ participants in the Harappa Ancestry Project my parents remain the “unadmixed” South Asians with the highest fraction of East Asian ancestry (10-15 percent). Within South Asia aside from those groups with clear East Asian affinities only peoples of Munda background have the same levels. This data set could answer a lot of questions as to the typicality of my parents (literally within a few hours in terms of data exploration). But this is all you get in the supplements:

 


Zack Ajmal has already sent off an email asking about this data set, so hopefully the results will be positive.

This is a medical genetics study, so all they wanted to confirm is that there wasn’t population stratification due to inbreeding. They confirmed that. It is fine if they don’t want to explore further questions in relation to ancestry, but it would be really depressing if the data set can never see the light of day for those who are interested in asking other questions.

Friday Piano Solo

Keith Emerson has been doing some interesting work on wave mechanics, Fourier transforms, and temporal structure. Here are some of his findings.

Not exactly what you see at the Grammys these days. (Not that it was back in 1974, either.)