Recollections of Mel Green


Mel Green co-taught a “history of genetics” course that I took as a first-year grad student at UC Davis. It was fitting because Mel Green was a living embodiment of the history of genetics. Mine was one of the last years that Mel co-taught that class, so I feel quite privileged.

Unlike some of my friends who have gone through Davis I only had a few conversations with Mel. But he gave us the wisdom of a life of learning and seeing genetics evolve as a discipline over the 20th century. It isn’t often that you talk to someone who could dismiss Charles Davenport because he had talked to the man and judged that he had a poor grasp of Mendelian theory!

Most everyone has a “Mel Green story.” So let me recount mine, though it doesn’t have to do with me as such. Mel lived 101 years, and was active in science by the 1940s. In our history of genetics course we had to give a presentation on a particular topic (mine was on polytene chromosomes). The student who was giving the presentation on Drosophila research was not a genetics student. I had assumed she would be a bit nervous because Mel was a renowned Drosophilist, and he was sitting right there listening to everything.

At some point she began to refer to a researcher, “M Green.” She went on about “M Green” and his work for about five minutes, at one point pausing to note that “M Green” even worked at Davis! At this point the co-instructor had to stop her and tell her that “M Green” was sitting in the room, right next to her. Because the research was published in the 1940s the student had assumed that this was from someone who could never have been alive in the present. But there it was, Mel Green was still with us, a witness to all that history that had come and gone.

The non-European ancestry of Afrikaners


A few years ago I got some South African genotypes. Some of the individuals were clearly African. A few mapped perfectly upon Northern Europeans. But many of the samples consistently were European but shifted toward non-European populations.

Based on history of the assimilation of slaves into the European population of Cape Colony in the 18th century, my assumption is that these individuals are Afrikaners.

Recently I realized that Brenna Henn had released some more Khoisan samples, so I decided to look at this question of admixture again. The two Khoisan populations are the Nama and the Khomani. I removed those with lots of Bantu and European admixture and combined them together into one population.

Running unsupervised Admixture shows how distinct the South African whites are.

The average Utah white in this sample (this population is a mix of British, German, and Scandinavian in ancestry) is 99% European modal cluster, and 1% South Asian. The average for the white South Africans in this data set is 94% European modal cluster. The residual is 1% East Asian (Dai modal), 1% Khoisan, 1% non-Khoisan African, and 2% South Asian.

I ran Treemix a bunch of times, and every single plot came out like this when I ran it for three migrations:

 

The gene flow from the Utah whites to the Gujaratis is simply an artifact of the fact that the Gujarati sample is mixed caste, and some of the Brahmins or Lohanas have more “Ancestral North Indian.” The gene flow from the Europeans to the Khoisan is probably real, or might be due to pastoralist admixture via East Africans. The last migration arrow goes from the African populations to the South African whites, with a shift toward the Khoisan.

I also ran a three population test where A is the outgroup, and B and C are a clade. A significantly negative f3-statistic indicates admixture in population A. The negative values are listed below:

A             B             C           f3             f3-error      Z-score
Gujrati       Dai           UtahWhite   -0.00121718    0.000140141   -8.68539
South_Africa  EsanNigeria   UtahWhite   -0.00127718    0.000147982   -8.63059
South_Africa  Khoisan_SA    UtahWhite   -0.0012928     0.000151416   -8.53802
Gujrati       South_Africa  Dai         -0.000778791   0.000155656   -5.00329
South_Africa  Dai           UtahWhite   -0.000541974   0.000133262   -4.06699
South_Africa  UtahWhite     Gujrati     -0.000103581   8.46193e-05   -1.22408

This aligns well with the Admixture results. Afrikaners have both African and Asian ancestry.
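
For readers who want to see why a negative value in the three population test points to admixture, here is a minimal sketch in Python. It is not what Treemix runs internally: it uses the naive form of the statistic (no finite-sample correction), the allele frequencies are made up for illustration, and the function names are my own. Real tools also estimate the standard error from the data, typically with a block jackknife, whereas here the error is simply passed in.

```python
def f3(freq_a, freq_b, freq_c):
    """Naive f3(A; B, C): the mean over SNPs of (a - b) * (a - c)."""
    products = [(a - b) * (a - c) for a, b, c in zip(freq_a, freq_b, freq_c)]
    return sum(products) / len(products)


def z_score(estimate, standard_error):
    """Z-score as reported in the table: the estimate divided by its error."""
    return estimate / standard_error


# Made-up allele frequencies at four SNPs for two source populations.
pop_b = [0.1, 0.5, 0.9, 0.3]
pop_c = [0.6, 0.2, 0.4, 0.8]

# A population that is an even mix of B and C sits between them at every SNP.
admixed = [(b + c) / 2 for b, c in zip(pop_b, pop_c)]

# A population that drifted away from B, with no contribution from C.
drifted = [0.15, 0.45, 0.95, 0.25]

print(f3(admixed, pop_b, pop_c))  # negative: A lies between B and C
print(f3(drifted, pop_b, pop_c))  # positive: no signal of admixture

# Reproducing the Z-score of the first row of the table from its f3 and error.
print(z_score(-0.00121718, 0.000140141))
```

When A is a mix of B and C, its frequencies fall between the two sources, so (a - b) and (a - c) have opposite signs and the product is negative. Drift in A alone pushes the statistic in the positive direction, which is why a significantly negative value is hard to get without admixture.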

In James Michener’s The Covenant one of the plot lines alludes to mixed ancestry in one of the Afrikaner families. The results above suggest that mixed ancestry is very common, and perhaps ubiquitous, in this population. True, there are some Afrikaners such as Hendrik Verwoerd who migrated to South Africa from the Netherlands in the past century or so, but these are uncommon to my knowledge.

Genetics books for the masses!

Since I’ve become professionally immersed in genetics I haven’t read many books on the topic. I read papers. And I do genetics. But back in the day I did enjoy a good book. The standard recommendation would be to read Matt Ridley’s Genome. It’s a bit dated now (it was published around when the Human Genome Project was being completed), but I’d still recommend it.

But when in the mid-2000s I dabbled a little bit in the world of worm (C. elegans) genetics I read Andrew Brown’s In the Beginning Was the Worm: Finding the Secrets of Life in a Tiny Hermaphrodite. It’s pretty far from my current concerns and fixations, with more of a focus on developmental processes, but it is pretty cool to read about the race to “map” every cell in C. elegans.

The second book I’d recommend to readers of this blog is the late Will Provine’s The Origins of Theoretical Population Genetics. Modern population genomics is a massive edifice built atop the foundations of the early 20th century fusion of Mendelism and the biometrical heirs of Darwin. Provine outlines how primitive genetics eventually seeded the birth of the Neo-Darwinian Synthesis.

Why do percentage estimates of “ancestry” vary so much?

When looking at the results in Ancestry DNA, 23andMe, and Family Tree DNA my “East Asian” percentage is:

– 19%
– 13%
– 6%

What’s going on here? In science we often make a distinction between precision and accuracy. Precision is how much your results vary when you re-run an experiment or measurement. Basically, can you reproduce your result? Accuracy refers to how close your measurement is to the true value. A measurement can be quite precise, but consistently off. Similarly, a measurement may be imprecise, but it bounces around the true value…so it is reasonably accurate if you take enough measurements to cancel out the errors (which are random).
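
The distinction is easy to see in a toy simulation. The sketch below is purely illustrative: the true value, the bias, and the noise levels are numbers I picked, and `simulate` is a hypothetical helper, not anything the testing companies use.

```python
import random
import statistics


def simulate(true_value, bias, noise_sd, n, rng):
    """Return n measurements: the true value plus a fixed bias plus random noise."""
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the run is repeatable
TRUE_VALUE = 10.0

# Precise but inaccurate: tiny scatter, but every measurement is shifted by 2.
precise_but_biased = simulate(TRUE_VALUE, bias=2.0, noise_sd=0.1, n=1000, rng=rng)

# Imprecise but accurate: large scatter, centered on the true value.
noisy_but_unbiased = simulate(TRUE_VALUE, bias=0.0, noise_sd=2.0, n=1000, rng=rng)

for label, values in [("precise but biased", precise_but_biased),
                      ("noisy but unbiased", noisy_but_unbiased)]:
    print(label,
          "mean:", round(statistics.mean(values), 2),
          "spread:", round(statistics.stdev(values), 2))
```

Averaging many measurements rescues the noisy instrument, because random errors cancel. It does nothing for the biased one, because the offset is the same every time.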

The values above are precise. That is, if you got re-tested on a different chip, the results aren’t going to be much different. The tests are using as input variation on 100,000 to 1 million markers, so a small proportion will give different calls than in the earlier test. But that’s not going to change the end result in most instances, even though these methods often have a stochastic element.

But what about accuracy? I am not sure that old chestnuts about accuracy apply in this case, because the percentages that these services provide are summaries and distillations of the underlying variation. The model of precision and accuracy that I learned would be more applicable to the DNA SNP array which returns calls on the variants; that is, how close are the calls of the variant to the true value (last I checked these arrays are around 99.5% accurate in terms of matching the true state).

What you see when these services pop out a percentage for a given ancestry is the outcome of a series of conscious choices that designers of these tests made keeping in mind what they wanted to get out of these tests. At a high level here’s what’s going on:

  1. You have a model of human population history and dynamics with various parameters
  2. You have data that varies that you put into that model
  3. You have results which come back with values which are the best fit of that data to the model you specified

Basically you are asking the computational framework a question, and it is returning its best answer to the question posed. To ask whether the answer is accurate or not is almost not even wrong. The frameworks vary because they are constructed by humans with different preferences and goals.

Almost, but not totally wrong. You can for example simulate populations whose histories you know, and then test the models on the data you generated. Since you already know the “truth” about the simulated data’s population structure and history, you can see how well your framework can infer what you already know from the patterns of variation in the generated data.
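
Here is a minimal sketch of that kind of check in Python. Everything in it is an assumption made for illustration: the two source populations have independent, made-up allele frequencies, the individual’s genotypes are drawn from a known mixture, and the estimator is a simple least-squares fit rather than the haplotype-based or model-based methods the companies actually use.

```python
import random


def simulate_frequencies(n_snps, rng):
    """Made-up allele frequencies for one source population."""
    return [rng.uniform(0.05, 0.95) for _ in range(n_snps)]


def simulate_genotypes(freq_a, freq_b, alpha, rng):
    """Draw 0, 1 or 2 allele copies per SNP for an individual whose
    ancestry is a fraction alpha from B and (1 - alpha) from A."""
    genotypes = []
    for pa, pb in zip(freq_a, freq_b):
        q = (1 - alpha) * pa + alpha * pb
        genotypes.append((rng.random() < q) + (rng.random() < q))
    return genotypes


def estimate_alpha(genotypes, freq_a, freq_b):
    """Least-squares estimate of the fraction of ancestry from B."""
    num = sum((g / 2 - pa) * (pb - pa)
              for g, pa, pb in zip(genotypes, freq_a, freq_b))
    den = sum((pb - pa) ** 2 for pa, pb in zip(freq_a, freq_b))
    return num / den


rng = random.Random(1)  # fixed seed so the run is repeatable
TRUE_ALPHA = 0.15       # the "truth" we build into the simulation
N_SNPS = 20000

freq_a = simulate_frequencies(N_SNPS, rng)
freq_b = simulate_frequencies(N_SNPS, rng)
genotypes = simulate_genotypes(freq_a, freq_b, TRUE_ALPHA, rng)
estimated_alpha = estimate_alpha(genotypes, freq_a, freq_b)

print(f"true = {TRUE_ALPHA}, estimated = {estimated_alpha:.3f}")
```

Because the truth is built into the simulation, the gap between the two printed numbers is a direct measure of how well the framework recovers what you already know.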

Going back to my results, why do my East Asian percentages vary so much? The short answer is that one of the major variables in the model alluded to above is the nature of the reference population set and the labels you give them.

Looking at Bengalis, the ethnic group I’m from, it is clear that in comparison to other South Asian populations they are East Asian shifted. That is, it seems clear I do have some East Asian ancestry. But how much?

The “simple” answer is to model my ancestry as a mix of two populations, an Indian one and an East Asian one, and then see what the values are for my ancestry across the two components. But here is where semantics becomes important: what is Indian and East Asian? Remember, these are just labels we give to groups of people who share genetic affinities. The labels aren’t “real”, the reality is in the raw read of the sequence. But humans are not capable of really getting anything from millions of raw SNPs assigned to individuals. We have to summarize and re-digest the data.

The simplest explanation for what’s going on here is that the different companies have different populations put into the boxes which are “Indian/South Asian” and “East Asian.” If you are using fundamentally different measuring sticks, then there are going to be problems with doing apples to apples comparisons.

My personal experience is that 23andMe tends to give very high percentages of South Asian ancestry for all South Asians. Because “South Asian” is a very diverse category when tests come back that someone is 95-99% South Asian…it’s not really telling you much. In contrast, some of the other services may be using a small subset of South Asians, who they define as “more typical”, and so giving lower percentages to people from Pakistan and Bengal, who have admixture from neighboring regions to the west and east respectively.*

Something similar can occur with East Asian ancestry. If the “donor” ancestral groups are South Asian and East Asian for me, then the proportion of each is going to vary by how close the donor groups selected by the company are to the true ancestral groups. If, for example, Family Tree DNA chose a more Northeastern Asian population than Ancestry DNA, then my East Asian percentage would vary between the two services because I know my East Asian ancestry is more Southeast Asian.
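
A toy calculation shows how the choice of donor reference moves the number. The sketch below is an illustration under assumptions I made up, not a description of any company’s pipeline: the allele frequencies are simulated, the population names are just labels on those simulated frequencies, the drift amounts are arbitrary, and the estimator is a simple least-squares fit.

```python
import random


def clip(x):
    """Keep a simulated allele frequency inside (0, 1)."""
    return min(0.99, max(0.01, x))


def drift(freqs, sd, rng):
    """Derive a daughter population by adding random drift to each frequency."""
    return [clip(p + rng.gauss(0, sd)) for p in freqs]


def estimate_fraction(individual, ref_home, ref_donor):
    """Least-squares fraction of ancestry assigned to the donor reference."""
    num = sum((x - h) * (d - h)
              for x, h, d in zip(individual, ref_home, ref_donor))
    den = sum((d - h) ** 2 for h, d in zip(ref_home, ref_donor))
    return num / den


rng = random.Random(7)  # fixed seed so the run is repeatable
N_SNPS = 20000

# Simulated frequencies; the names are labels, not real data.
ancestral = [rng.uniform(0.1, 0.9) for _ in range(N_SNPS)]
south_asian = drift(ancestral, 0.05, rng)
east_asian_root = drift(ancestral, 0.10, rng)
southeast_asian = drift(east_asian_root, 0.03, rng)
northeast_asian = drift(east_asian_root, 0.06, rng)

# The individual's expected allele frequencies: 15% from the Southeast
# Asian source, the rest from the South Asian source.
TRUE_FRACTION = 0.15
individual = [(1 - TRUE_FRACTION) * s + TRUE_FRACTION * e
              for s, e in zip(south_asian, southeast_asian)]

matched = estimate_fraction(individual, south_asian, southeast_asian)
mismatched = estimate_fraction(individual, south_asian, northeast_asian)

print(f"donor reference = true source:    {matched:.3f}")
print(f"donor reference = related source: {mismatched:.3f}")
```

In this setup the matched reference recovers the fraction that was put in, while the related but different reference returns a lower one, even though the individual and the method are identical. Only the measuring stick changed.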

The moral of the story is that the values you obtain are conditional on the choices you make, and those choices emerge from the process of reducing and distilling the raw genetic variation into a manner which is human interpretable. If the companies decided to use the same model, they would come out with the same results.

* I helped develop an earlier version of MyOrigins, and so can attest to this firsthand.

When journalists get out of their depth on genetic genealogy

For some reason The New York Times tasked Gina Kolata to cover genetic genealogy and its societal ramifications, With a Simple DNA Test, Family Histories Are Rewritten. The problem here is that to my knowledge Kolata doesn’t cover this as part of her beat, and so isn’t well equipped to write an accurate and in depth piece on the topic in relation to the science.

This is a general problem in journalism. I notice it most often when it comes to genetics (a topic I know a lot about for professional reasons) and the Middle East and Islam (topics I know a lot about because I’m interested in them). It’s unfortunate, but it has also made me a lot more skeptical of journalists whose track record I’m unfamiliar with.* To give a contrasting example, Christine Kenneally is a journalist without a background in genetics who nevertheless is immersed in genetic genealogy, so that she could have written this sort of piece without objection from the likes of me (she did write a book on the topic, The Invisible History of the Human Race: How DNA and History Shape Our Identities and Our Futures, which I had a small role in fact-checking).

What are the problems with the Kolata piece? I think the biggest issue is that she didn’t go in to test any particular proposition, and leaned on the wrong person for the science. She quotes Joe Pickrell, who knows this stuff like the back of his hand. But more space is given to Jonathan Marks, an anthropologist who is quite opinionated and voluble, and so probably a “good source” for any journalist.

Marks seems well respected in anthropology from what I can tell, but he’s also the person who put up a picture of L. L. Cavalli-Sforza juxtaposed with a photo of Josef Mengele in the late 1990s during a presentation at Stanford. Perhaps this is why anthropologists respect him, I don’t know, but I do not like him because of his nasty tactics (I wouldn’t be surprised if Marks had power he would make sure people like me were put in political prison camps, his rhetoric is often so unhinged).

Marks’ quotes wouldn’t be much of an issue if Kolata could figure out when he’s making sense, and when he’s just bullshitting. But she can’t. For example:

…“tells me I’m 95 percent Ashkenazi Jewish and 5 percent Korean, is that really different from 100 percent Ashkenazi Jewish and zero percent Korean?”

The precise numbers offered by some testing services raise eyebrows among genetics researchers. “It’s all privatized science, and the algorithms are not generally available for peer review,” Dr. Marks said.

The part about precise numbers is an issue, though a lot less of an issue with high density SNP-chips (the real issue is sensitivity to reference population and other such parameters). But if a modern test says you are 95 percent Ashkenazi Jewish and 5 percent Korean it really is different from 100% Ashkenazi. Someone who comes up as 5% Korean against an Ashkenazi Jewish background is most definitely of some East Asian heritage. In the early 2000s with ancestrally informative markers and microsatellite based tests you’d get somewhat weird results like this, but with the methods used by the major DTC companies (and in academia) today these sorts of proportions are just not reported as false positives. Marks may not know because this isn’t his area, but Pickrell would have. Kolata probably did not think to double-check with him, but that’s because she isn’t able to smell out tendentious assertions. She has no feel for the science, and is flying blind.

Second, Marks notes that the science is privatized, and it isn’t totally open. But it’s just false that the algorithms are not generally available for peer review. All the details of the pipeline are not downloadable on GitHub, but the core ancestry estimation methods are well known. Eric Durand, who wrote the original 23andMe Ancestry Composition methodology, presented on it at ASHG 2013. I know because I was there during his session.

You can find a white paper for 23andMe’s method and Ancestry’s. Not everything is as transparent as open science would dictate (though there are scientific papers and publications which also mask or hide elements which make reproducibility difficult), but most geneticists with domain experience can figure out what’s going on and if it is legitimate. It is. The people who work at the major DTC companies often come out of academia, and are known to academic scientists. This isn’t blackbox voodoo science like “soccer genomics.”

Then Marks says this really weird thing:

“That’s why their ads always specify that this is for recreational purposes only: lawyer-speak for, ‘These results have no scientific standing.’”

Actually, it’s lawyer-speak for “do not sue us, as we aren’t providing you actionable information.” Perhaps I’m ignorant, but lawyers don’t get to define “scientific standing”.

The problem, which is real, is that the public is sometimes not entirely clear on what the science is saying. This is a problem of communication from the companies to the public. I’ve even been in scientific sessions where geneticists who don’t work in population genomics have weak intuition on what the results mean!

Earlier Kolata states:

Scientists simply do not have good data on the genetic characteristics of particular countries in, say, East Africa or East Asia. Even in more developed regions, distinguishing between Polish and, for instance, Russian heritage is inexact at best.

This is not totally true. We have good data now on China and Japan. Korea also has some data. Using haplotype-based methods you can do a lot of interesting things, including distinguish someone who is Polish from Russian. But these methods are computationally expensive and require lots of information on the reference samples (Living DNA does this for British people). The point is that the science is there. Reading this sort of article is just going to confuse people.

On the other hand a lot of Kolata’s piece is more human interest. The standard stuff about finding long lost relatives, or discovering your father isn’t your father. These are fine and not objectionable factually, though they’ve been done extensively before and elsewhere. I actually enjoyed the material in the second half of the piece, which had only a tenuous connection to scientific detail. I just wish these sorts of articles represented the science correctly.

Addendum: Just so you know, three journalists who regularly cover topics I can make strong judgments on, and are always pretty accurate: Carl Zimmer, Antonio Regalado, and Ewen Callaway.

* I don’t follow Kolata very closely, but to be frank I’ve heard from scientist friends long ago that she parachutes into topics, and gets a lot of things wrong. Though I can only speak on this particular piece.

The future will be genetically engineered


If the film Rise of the Planet of the Apes had come out a few years later I believe there would have been mention of CRISPR. Sometimes science leads to technology, and other times technology aids in science. On occasion the two are one and the same.

The plot I made above shows that in the first five years of the second decade of the 21st century CRISPR went from being an obscure aspect of bacterial genetics to ubiquitous. Friends who had been utilizing “advanced” genetic engineering methods such as TALENs and zinc fingers switched overnight to a CRISPR/Cas9 framework.

As I’ve said before the 2010s are the decade when “reading” the genome becomes normal. We really don’t know what the CRISPR/Cas9 technology is capable of. It’s early years yet. With that, First Human Embryos Edited in U.S. Technically they’re single-celled zygotes. The science itself is not astounding. Rather, it is that the human Rubicon has been passed in the United States. As indicated in the article there has been some jealousy about what the Chinese have been able to do because of a different cultural and regulatory framework.

There are those calling for a moratorium on this work (on humans). I’m not in favor or opposed. Rather, my question is simple: if CRISPR/Cas9 makes genetic engineering cheap, easy, and effective, how exactly are we going to enforce a world-wide moratorium? A Butlerian Jihad?

Note: I know that people are freaking out about humans + genetic engineering. But most geneticists I know are more excited about the prospects of non-human work, since human clinical trials are going to be way in the future. Over 20 years since Dolly it’s notable to me that no human has been cloned from adult somatic cells yet.

Origin of modern humanity pushed back to 260,000 years BP (?)


The above figure is from a preprint, Ancient genomes from southern Africa pushes modern human divergence beyond 260,000 years ago. The title and abstract are pretty clear:

Southern Africa is consistently placed as one of the potential regions for the evolution of Homo sapiens. To examine the region’s human prehistory prior to the arrival of migrants from East and West Africa or Eurasia in the last 1,700 years, we generated and analyzed genome sequence data from seven ancient individuals from KwaZulu-Natal, South Africa. Three Stone Age hunter-gatherers date to ~2,000 years ago, and we show that they were related to current-day southern San groups such as the Karretjie People. Four Iron Age farmers (300-500 years old) have genetic signatures similar to present day Bantu-speakers. The genome sequence (13x coverage) of a juvenile boy from Ballito Bay, who lived ~2,000 years ago, demonstrates that southern African Stone Age hunter-gatherers were not impacted by recent admixture; however, we estimate that all modern-day Khoekhoe and San groups have been influenced by 9-22% genetic admixture from East African/Eurasian pastoralist groups arriving >1,000 years ago, including the Ju|’hoansi San, previously thought to have very low levels of admixture. Using traditional and new approaches, we estimate the population divergence time between the Ballito Bay boy and other groups to beyond 260,000 years ago. These estimates dramatically increases the deepest divergence amongst modern humans, coincide with the onset of the Middle Stone Age in sub-Saharan Africa, and coincide with anatomical developments of archaic humans into modern humans as represented in the local fossil record. Cumulatively, cross-disciplinary records increasingly point to southern Africa as a potential (not necessarily exclusive) ‘hot spot’ for the evolution of our species.

These results were actually presented in outline at a conference. I saw it on Twitter and don’t remember which conference anymore. But this is not entirely surprising.

First, much respect to Mattias Jakobsson’s group for breaking through the Reich-Willerslev duopoly. Hopefully this presages some democratization of the ancient DNA field as expenses are going down.

Second, notice how in most cases ancient DNA shows that modern reference populations turn out to be admixed. This was the problem with much of Eurasia, and why using modern genetic variation to make inferences about the past totally failed.

I am entirely convinced that the genome from Ballito Bay dating to ~2,000 years does not carry the Eurasian inflected East African admixture. The Mota genome implies that Eurasian admixture did not come to eastern Africa much before 4,500 years ago. There needs to be a much deeper big picture analysis of the archaeology of Africa and the genetic information we have to get a sense of what happened back then…but, it seems likely that the Bantu migration has over-written much of the earlier genetic variation.

The fact that ancient genomes always show that our current populations are admixed makes me wonder if the Ballito Bay sample itself is admixed from more ancient populations. That is, if we found a genome from 20,000 years ago, would it be very different from the Ballito Bay samples? The relatively thick time transect from Europe indicates that turnover happens every 10,000 years or so. Australian Aborigines seem to have been resident in their current locations for ~50,000 years, but this seems the exception, not the rule. Do we really think that the ancestors of the Bushmen were living in southern Africa for five times as long as Australian Aborigines?

Another curious aspect of this paper is that it suggests the effective population size of Bushmen is smaller than we might have thought, and they’re somewhat less diverse than we’d thought. That’s because East African (with Eurasian ancestry) gene flow increased heterozygosity, as well as inferred effective population sizes. I’ve mentioned this effect on statistics before. Unless you have a true model of population history (or close to it) your assumptions might distort the numbers you get.

There is another aspect to this preprint mentioned glancingly in the text, and a bit more in the supplements: they seem to only be able to model Yoruba well if you assume that they themselves are a mix of “Basal Humans” (BH) and another African population which gave rise to East Africans and “Out of Africa” populations. Note that the BH seem to diverge from other human populations before the ancestors of Southern Africans like the Ballito Bay sample. That is, BH could push the diversification of the ancestors of modern humans considerably before 260,000 years before the present.

The possibility of deep structure in the Yoruba is pretty notable because they’ve been the gold standard in many human population genetic data sets as a reference population. But this result of deep structure is not entirely surprising. For years researchers have been hinting at confusing results in relation to the possibility of Eurasian back-migration. Perhaps the deep structure was confounding inferences?

The authors themselves are quite cautious about their dating of the divergence. It’s sensitive to many assumptions, and in particular the mutation rate being known and constant over time. But I think it’s hard to deny that this is pushing back the emergence of modern humans beyond what we know today. The earliest anatomically modern humans are found in Ethiopia 195,000 years ago from what I know. As I said, I’m convinced that the ancient genome has shown that modern “pristine” populations have some serious admixture. But I’m not as convinced about any specific point estimate, because that’s sensitive to a lot of assumptions which might not hold.

Finally, a quick shout-out to the blogger Dienekes. As early as ten years ago he anticipated the basic outlines of these sorts of results in the generality, if not the details. We really have come a long way from popular science declaring that all humans descend from a small group of East Africans who lived 50,000 to 100,000 years ago. The real picture was much more complex.

Also, I have to admit I considered titling this blog post “Wolpoff’s revenge.” As in Milford Wolpoff. The reason being that we’re getting quite close to territory familiar to the much maligned multi-regionalist model of modern human origins.

Note: These findings should make us less surprised perhaps by a “modern” human migration before the primary one out of Africa.

The nadir of genetics in the Soviet Union

A fascinating excerpt in Slate from How to Tame a Fox (and Build a Dog):

This skepticism of genetics all started when, in the mid-1920s, the Communist Party leadership elevated a number of uneducated men from the proletariat into positions of authority in the scientific community, as part of a program to glorify the average citizen after centuries of monarchy had perpetuated wide class divisions between the wealthy and the workers and peasants. Lysenko fit the bill perfectly, having been raised by peasant farmer parents in the Ukraine. He hadn’t learned to read until he was 13, and he had no university degree, having studied at what amounted to a gardening school, which awarded him a correspondence degree. The only training he had in crop-breeding was a brief course in cultivating sugar beets. In 1925, he landed a middle-level job at the Gandzha Plant Breeding Laboratory in Azerbaijan, where he worked on sowing peas. Lysenko convinced a Pravda reporter who was writing a puff piece about the wonders of peasant scientists that the yield from his pea crop was far above average and that his technique could help feed his starving country. In the glowing article the reporter claimed, “the barefoot professor Lysenko has followers … and the luminaries of agronomy visit … and gratefully shake his hand.” The article was pure fiction. But it propelled Lysenko to national attention, including that of Josef Stalin.

Sometimes it is easy to believe that the periods in the Soviet Union under Stalin, in China under Mao, or in Germany under Hitler, to name a few, were aberrations. But I think that’s the wrong way to look at it. The story of how Lysenko became influential hooks into so many historical tropes and psychological instincts of our species that we should be wary of it.

There have been great scholars without requisite qualifications. Ramanujan and Faraday come to mind. But great scholars are exceptional people. They are not average.

To be a scientific intellectual today

George Busby has put up a note about changes in his career path, Meditation on the Caltrain. I took offense to this section:

On top of this, there was the burgeoning realisation that no one actually reads the academic papers that I write. This is no moot point: writing papers is the main purview of a research scientist, and the central way we both communicate our results and measure success. However, compared to the proportion of the world’s population who can read, the number of people that had sat down to ingest my latest, dense, and fascinating (to me at least) treaty on the population genetics of Africa, three years in the making, was minuscule. The words of a colleague rang in my head: “99.9% of scientific papers just don’t get read”.

His most recent paper, Admixture into and within sub-Saharan Africa, was great. I meant to blog it, but got busy with other things. To be frank the fact that someone like George Busby is having trouble in the academic market is sobering. He has produced good and prominent work, and has been attached to groups which have some prominence. Of course grant approvals and job prospects have a stochastic element. But his experience shows that talent and good work are just a necessary, not sufficient, condition.

It looks like Busby will land in Silicon Valley with one of the two companies that do a lot of work on ancestry. Good for him. I think it does behoove those of us with intellectual pretensions to wonder what we’re doing out in the world. And, it also behooves academics to wonder what they’re doing with their job security. Sometimes it is important to tell the truth and explore topics even if people don’t care, or don’t want to listen. Otherwise, why fund anything that’s not practical with the public fisc?

The misrepresentation of genetic science in the Vox piece on race and IQ

I don’t have time or inclination to do a detailed analysis of this piece in Vox, Charles Murray is once again peddling junk science about race and IQ. Most people really don’t care about the details, so what’s the point?

But in a long piece one section jumped out at me in particular because it is false:

Murray talks about advances in population genetics as if they have validated modern racial groups. In reality, the racial groups used in the US — white, black, Hispanic, Asian — are such a poor proxy for underlying genetic ancestry that no self-respecting statistical geneticist would undertake a study based only on self-identified racial category as a proxy for genetic ancestry measured from DNA.

Obviously the Census categories are pretty bad and not optimal (e.g., the “Asian American” category pools South with East & Southeast Asians, and that has caused issues in biomedical research in the past). But the claim is false. In the first half of the 2000s the eminent statistical geneticist Neil Risch specifically addressed this issue. From 2002 in Genome Biology, Categorization of humans in biomedical research: genes, race and disease:

A debate has arisen regarding the validity of racial/ethnic categories for biomedical and genetic research. Some claim ‘no biological basis for race’ while others advocate a ‘race-neutral’ approach, using genetic clustering rather than self-identified ethnicity for human genetic categorization. We provide an epidemiologic perspective on the issue of human categorization in biomedical and genetic research that strongly supports the continued use of self-identified race and ethnicity.

A major discussion has arisen recently regarding optimal strategies for categorizing humans, especially in the United States, for the purpose of biomedical research, both etiologic and pharmaceutical. Clearly it is important to know whether particular individuals within the population are more susceptible to particular diseases or most likely to benefit from certain therapeutic interventions. The focus of the dialogue has been the relative merit of the concept of ‘race’ or ‘ethnicity’, especially from the genetic perspective. For example, a recent editorial in the New England Journal of Medicine [1] claimed that “race is biologically meaningless” and warned that “instruction in medical genetics should emphasize the fallacy of race as a scientific concept and the dangers inherent in practicing race-based medicine.” In support of this perspective, a recent article in Nature Genetics [2] purported to find that “commonly used ethnic labels are both insufficient and inaccurate representations of inferred genetic clusters.” Furthermore, a supporting editorial in the same issue [3] concluded that “population clusters identified by genotype analysis seem to be more informative than those identified by skin color or self-declaration of ‘race’.” These conclusions seem consistent with the claim that “there is no biological basis for ‘race’” [3] and that “the myth of major genetic differences across ‘races’ is nonetheless worth dismissing with genetic evidence” [4]. Of course, the use of the term “major” leaves the door open for possible differences but a priori limits any potential significance of such differences.

In our view, much of this discussion does not derive from an objective scientific perspective. This is understandable, given both historic and current inequities based on perceived racial or ethnic identities, both in the US and around the world, and the resulting sensitivities in such debates. Nonetheless, we demonstrate here that from both an objective and scientific (genetic and epidemiologic) perspective there is great validity in racial/ethnic self-categorizations, both from the research and public policy points of view.

From a 2005 interview:

Gitschier: Let’s talk about the former, the genetic basis of race. As you know, I went to a session for the press at the ASHG [American Society for Human Genetics] meeting in Toronto, and the first words out of the mouth of the first speaker were “Genome variation research does not support the existence of human races.”

Risch: What is your definition of races? If you define it a certain way, maybe that’s a valid statement. There is obviously still disagreement.

Gitschier: But how can there still be disagreement?

Risch: Scientists always disagree! A lot of the problem is terminology. I’m not even sure what race means, people use it in many different ways.

In our own studies, to avoid coming up with our own definition of race, we tend to use the definition others have employed, for example, the US census definition of race. There is also the concept of the major geographical structuring that exists in human populations—continental divisions—which has led to genetic differentiation. But if you expect absolute precision in any of these definitions, you can undermine any definitional system. Any category you come up with is going to be imperfect, but that doesn’t preclude you from using it or the fact that it has utility.

We talk about the prejudicial aspect of this. If you demand that kind of accuracy, then one could make the same arguments about sex and age!

You’ll like this. In a recent study, when we looked at the correlation between genetic structure [based on microsatellite markers] versus self-description, we found 99.9% concordance between the two. We actually had a higher discordance rate between self-reported sex and markers on the X chromosome! So you could argue that sex is also a problematic category. And there are differences between sex and gender; self-identification may not be correlated with biology perfectly. And there is sexism. And you can talk about age the same way. A person’s chronological age does not correspond perfectly with his biological age for a variety of reasons, both inherited and non-inherited. Perhaps just using someone’s actual birth year is not a very good way of measuring age. Does that mean we should throw it out? No. Also, there is ageism—prejudice related to age in our society. A lot of these arguments, which have a political or social aspect to them, can be made about all categories, not just the race/ethnicity one.

Risch is not obscure. In the piece the author observes that Risch “was described by one of the field’s founding fathers as ‘the statistical geneticist of our time.’”

2005 is a long way from 2017. Risch may have changed his mind. In fact, it is probably best for him and his reputation if he has changed his mind. I wouldn’t be surprised if Risch comes out and engages in a struggle session where he disavows his copious output from 2005 and earlier defending the utilization of race as a concept in statistical genetics.

Also, genotyping is cheap enough and precise enough that one might actually make an argument for leaving off any self-reported ancestry questions. It’s really not necessary. This isn’t 2005.

But that section in the Vox piece is simply false. The existence of Risch refutes it. Vox is a high-profile website which serves to “explain” things to people. The academics who co-wrote that piece are very smart, prominent, and known to me. I don’t plan on asking them why they put that section in there. I think I know why.

There will be no update to that piece, I’m sure. It will be cited widely. It will become part of what “we” all know. Who am I to disagree with Vox? This is journalism, from what I have been able to gather and understand. The founders of Vox are rich and famous now. Incentives matter. There are great journalists out there who don’t misrepresent topics which I know well. But the incentive structure does not reward this. More often, storytellers who tell you the story you like to be told are rewarded.

As for science and the academy? I am frankly too depressed to say more.