Genotype them all, and let GWAS sort it out

About thirteen years ago I expressed the opinion that an understanding of population structure would become a matter of mere intellectual curiosity once we had a better understanding of the genetic basis of characteristics. A friend, who was a statistical geneticist, told me that this was unlikely. We were unlikely to be able to predict outcomes well enough, even for highly heritable complex traits, to simply discard population structure information. Some of this is not due to genetics; different populations may expose themselves to different environmental conditions. For example, it would be useful to know which individuals in the CEU white European American data set are practicing Mormons, and which are not, because Mormonism tends to result in a lot of behavior modification.

But some of the concern about population structure has to do with the fact that genetic background matters, and we are unlikely to ever have total omniscience as to the nature of genetic interactions and dependencies. By this, I mean that even if we have a strong causal signal which associates disease risk with a genetic variant, that risk is still conditional on other genetic variants across the genome. Those variants are the outcome of demographic histories, which one can “control” for to some extent by accounting for population structure. In plainer language, a signal that predicts an outcome in Norwegians may not predict the same outcome in Nigerians. This may be due to different frequencies of other variants which are not directly causal, but which interact with the causal signals and vary between populations.

More recently I’ve been a bit sanguine. I don’t follow the literature closely, but papers like High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants make me wonder if the genetic background concerns weren’t overwrought.

A new preprint, Population genetic history and polygenic risk biases in 1000 Genomes populations, suggests we should be worried. Or, more precisely, we should be cognizant of the limitations genetic background imposes upon us for certain classes of variants and disease. In particular, rare variants are going to be less portable across populations because they tend to be younger, often having emerged after populations diverged. So, if you have a low frequency, major effect causal variant in Europeans, there is a much lower likelihood that it is present in other populations.

The histogram above illustrates an excellent case study from the preprint. The genetic architecture of height and its genomic basis has been best elucidated for Europeans. We know, for example, many of the loci which distinguish Northern and Southern Europeans, and we know that selection has resulted in divergence between the two populations over the past 5,000 years. But as you can see, the predicted heights seem to simply follow genetic distance from Europeans. SAS = South Asians, while AMR = a mixed cohort of populations from the Americas. EAS and AFR are East Asians and Africans. In reality, Africans are nearly as tall as Europeans (taller or shorter depending upon the reference European population), and taller than East Asians. The predictions here are off because the causal variants inferred from the studies of European cohorts are portable in direct proportion to shared demographic history. South Asians share a relatively ancient demographic history with Europeans, while many mixed groups from the Americas have Europeans as one of their recent founding populations. But in both cases the causal variants were likely segregating in the ancestral populations before divergence, so there is no major difference in the consequence.
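To make the portability problem concrete, here is a toy sketch in Python. It is not the preprint's analysis: the effect sizes, allele frequencies, and function names are all invented for illustration. It assumes the usual additive form of a polygenic score, where each variant contributes its estimated effect size times the number of effect alleles carried, and shows how the predicted population mean shifts purely because the variants ascertained in one population sit at different frequencies (or are absent) in another.

```python
# Toy illustration only: every number below is invented, not taken from any study.

def polygenic_score(betas, dosages):
    """Additive score: sum of effect size times allele dosage (0, 1, or 2)."""
    if len(betas) != len(dosages):
        raise ValueError("betas and dosages must have the same length")
    return sum(b * d for b, d in zip(betas, dosages))


def expected_mean_score(betas, freqs):
    """Expected population mean score.

    A diploid individual carries on average 2p copies of an allele whose
    frequency is p, so each variant contributes beta * 2p to the mean.
    """
    if len(betas) != len(freqs):
        raise ValueError("betas and freqs must have the same length")
    return sum(b * 2.0 * p for b, p in zip(betas, freqs))


# Hypothetical effect sizes estimated in a discovery cohort (arbitrary units).
betas = [0.5, 0.3, -0.2, 0.4]

# Hypothetical effect-allele frequencies in the discovery population
# and in a more distantly related population.
freqs_discovery = [0.40, 0.30, 0.20, 0.10]
# The last variant is assumed to have arisen after divergence, so it is absent here.
freqs_distant = [0.35, 0.20, 0.30, 0.00]

mean_discovery = expected_mean_score(betas, freqs_discovery)
mean_distant = expected_mean_score(betas, freqs_distant)

# Score for one hypothetical individual carrying 2, 1, 0, and 1 effect alleles.
individual_score = polygenic_score(betas, [2, 1, 0, 1])

print("predicted mean, discovery population:", mean_discovery)
print("predicted mean, distant population:  ", mean_distant)
print("score for one hypothetical individual:", individual_score)
```

The lower predicted mean in the second population says nothing about its true phenotype. It only reflects which variants happened to be discovered, and at what frequencies they segregate, in the population where the study was done.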

The preprint has a lot more than just a reanalysis of GWAS. Using local ancestry deconvolution methods, the authors show how one can infer history from patterns of genetic variation (though as always, this should not be taken as gospel, as there are biases in the methods currently used). The major take-home is simple: population structure is real, and it has real functional consequences.
