Why the rate of evolution may only depend on mutation

Sometimes people think evolution is about dinosaurs.

It is true that natural history plays an important role in inspiring and directing our understanding of evolutionary process. Charles Darwin was a natural historian, and evolutionary biologists often have strong affinities with the natural world and its history. Though many people exhibit a fascination with the flora and fauna around us during childhood, often the greatest biologists retain this wonderment well into adulthood (if you read W. D. Hamilton’s collections of papers, Narrow Roads of Gene Land, which have autobiographical sketches, this is very evidently true of him).

But another aspect of evolutionary biology, which began in the early 20th century, is the emergence of formal mathematical systems of analysis. So you have fields such as phylogenetics, which have gone from intuitive and aesthetic trees of life, to inferences made using the most new-fangled Bayesian techniques. And, as told in The Origins of Theoretical Population Genetics, in the 1920s and 1930s a few mathematically oriented biologists constructed much of the formal scaffold upon which the Neo-Darwinian Synthesis was constructed.

The product of evolution

At the highest level of analysis evolutionary process can be described beautifully. Evolution is beautiful, in that its end product generates the diversity of life around us. But a formal mathematical framework is often needed to clearly and precisely model evolution, and so allow us to make predictions. R. A. Fisher’s aim when he wrote The Genetical Theory of Natural Selection was to create for evolutionary biology something equivalent to the laws of thermodynamics. I don’t really think he succeeded in that, though there are plenty of debates around something like Fisher’s fundamental theorem of natural selection.

But the revolution of thought that Fisher, Sewall Wright, and J. B. S. Haldane unleashed has had real yields. As geneticists they helped us reconceptualize evolutionary process as more than simply heritable morphological change, but an analysis of the units of heritability themselves, genetic variation. That is, evolution can be imagined as the study of the forces which shape changes in allele frequencies over time. This reduces a big domain down to a much simpler one.

Genetic variation is a concrete currency with which one can track evolutionary process. Initially this was done via inferred correlations between marker traits and particular genes in breeding experiments. Ergo, the origins of “the fly room”.

But with the discovery of DNA as the physical substrate of genetic inheritance in the 1950s the scene was set for the revolution in molecular biology, which also touched evolutionary studies with the explosion of more powerful assays. Lewontin & Hubby’s 1966 paper triggered an order of magnitude increase in our understanding of molecular evolution through both theory and results.

The theoretical side occurred in the form of the development of the neutral theory of molecular evolution, which also gave birth to the nearly neutral theory. Both of these theories hold that most of the variation within and between species at polymorphic sites is due to random processes. In particular, genetic drift. As a null hypothesis neutrality was very dominant for the past generation, though in recent years some researchers are suggesting that selection has been undervalued as a parameter for various reasons.

Setting aside the live scientific debate, which continues to this day, one of the predictions of neutral theory is that the rate of evolution will depend only on the rate of mutation. More precisely, the rate of substitution of new mutations (where the allele goes from a single copy to fixation of ~100%) is equal to the rate of mutation of new alleles. Population size doesn’t matter.

The algebra behind this is straightforward.

First, remember that the frequency of a new mutation within a population is \frac{1}{2N}, where N is the population size (the 2 is because we’re assuming diploid organisms with two gene copies). This is also the probability of fixation of a new mutation in a neutral scenario; its probability of fixation is just equal to its initial frequency (it’s a random walk process between 0 and 1.0 proportions). The rate of mutations is defined by \mu, the number of expected mutations at a given site per generation (this is a pretty small value, for humans it’s on the order of 10^{-8}). Again, there are 2N gene copies, so you have 2N\mu to count the number of new mutations.

The probability of fixation of a new mutation multiplied by the number of new mutations is:

    \[ \left( \frac{1}{2N} \right) \times 2N\mu = \mu \]

So there you have it. The rate of fixation of these new mutations is just a function of the rate of mutation.
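If you would rather check this numerically than take the algebra on faith, here is a minimal sketch. It assumes an idealized Wright-Fisher population (constant size, random mating, non-overlapping generations), which is my choice of model for illustration, and the function names are made up for this example. It follows a single new neutral mutation until it is lost or fixed, and the fraction of runs that fix should land near 1/(2N).

```python
import random


def fixes(n_diploid, rng):
    """Follow one new neutral mutation (1 copy among 2N) until loss or fixation.

    Assumes a Wright-Fisher population: every generation, each of the 2N
    gene copies is drawn independently from the previous generation's pool.
    Returns True if the mutation reaches fixation.
    """
    total = 2 * n_diploid
    copies = 1
    while 0 < copies < total:
        p = copies / total
        copies = sum(1 for _ in range(total) if rng.random() < p)
    return copies == total


def fixation_fraction(n_diploid, trials, seed=0):
    """Fraction of independent new mutations that end up fixed."""
    rng = random.Random(seed)
    return sum(fixes(n_diploid, rng) for _ in range(trials)) / trials


def substitution_rate(n_diploid, mu):
    """New mutations per generation (2N * mu) times P(fixation) = 1 / (2N)."""
    new_mutations = 2 * n_diploid * mu
    p_fix = 1 / (2 * n_diploid)
    return new_mutations * p_fix


if __name__ == "__main__":
    # Hypothetical small population so the simulation runs quickly.
    print(fixation_fraction(10, 20000), "vs expected", 1 / 20)
    for n in (10, 1000, 10**6):
        print(n, substitution_rate(n, 1e-8))
```

The simulated fraction is a noisy estimate, so expect it to wobble around 1/(2N) rather than hit it exactly. The second function is just the algebra above written out: the 2N cancels, so the result stays at \mu whatever population size you pass in.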

Simple formalisms like this have a lot more gnarly math that extends them and from which they derive. But they’re often pretty useful to gain a general intuition of evolutionary processes. If you are genuinely curious, I would recommend Elements of Evolutionary Genetics. It’s not quite a core dump, but it is a way you can borrow the brains of two of the best evolutionary geneticists of their generation.

Also, you will be able to answer the questions on my survey better the next time!

Aryan marauders from the steppe came to India, yes they did!

It seems every post on Indian genetics elicits dissents from loquacious commenters who are woolly on the details of the science, but convinced in their opinions (yes, they operate through uncertainty and obfuscation in their rhetoric, but you know where the axe is lodged). This post is an attempt to answer some questions so I don’t have to address this in the near future, as ancient DNA papers will finally start to come out soon, I hope (at least earlier than Winds of Winter).

In 2001’s The Eurasian Heartland: A continental perspective on Y-chromosome diversity Wells et al. wrote:

The current distribution of the M17 haplotype is likely to represent traces of an ancient population migration originating in southern Russia/Ukraine, where M17 is found at high frequency (>50%). It is possible that the domestication of the horse in this region around 3,000 B.C. may have driven the migration (27). The distribution and age of M17 in Europe (17) and Central/Southern Asia is consistent with the inferred movements of these people, who left a clear pattern of archaeological remains known as the Kurgan culture, and are thought to have spoken an early Indo-European language (27, 28, 29). The decrease in frequency eastward across Siberia to the Altai-Sayan mountains (represented by the Tuvinian population) and Mongolia, and southward into India, overlaps exactly with the inferred migrations of the Indo-Iranians during the period 3,000 to 1,000 B.C. (27). It is worth noting that the Indo-European-speaking Sourashtrans, a population from Tamil Nadu in southern India, have a much higher frequency of M17 than their Dravidian-speaking neighbors, the Yadhavas and Kallars (39% vs. 13% and 4%, respectively), adding to the evidence that M17 is a diagnostic Indo-Iranian marker. The exceptionally high frequencies of this marker in the Kyrgyz, Tajik/Khojant, and Ishkashim populations are likely to be due to drift, as these populations are less diverse, and are characterized by relatively small numbers of individuals living in isolated mountain valleys.

In a 2002 interview with the India site Rediff, the first author was more explicit:

Some people say Aryans are the original inhabitants of India. What is your view on this theory?

The Aryans came from outside India. We actually have genetic evidence for that. Very clear genetic evidence from a marker that arose on the southern steppes of Russia and the Ukraine around 5,000 to 10,000 years ago. And it subsequently spread to the east and south through Central Asia reaching India. It is on the higher frequency in the Indo-European speakers, the people who claim they are descendants of the Aryans, the Hindi speakers, the Bengalis, the other groups. Then it is at a lower frequency in the Dravidians. But there is clear evidence that there was a heavy migration from the steppes down towards India.

But some people claim that the Aryans were the original inhabitants of India. What do you have to say about this?

I don’t agree with them. The Aryans came later, after the Dravidians.

Over the past few years I’ve gotten to know the above first author Spencer Wells as a personal friend, and I think he would be OK with me relaying that to some extent he was under strong pressure to downplay these conclusions. Not only were, and are, these views not popular in India, but the idea of mass migration was in bad odor in much of the academy during this period. Additionally, there was later work which was less clear, and perhaps supported an Indian origin for R1a1a. Spencer himself told me that it was not impossible for R1a to have originated in India, but a branch eventually back-migrated to southern Asia.

But even researchers from the group at Stanford where he had done his postdoc did not support this model by the middle 2000s, Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists. In 2009 a paper out of an Indian group was even stronger in its conclusion for a South Asian origin of R1a1a, The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system.

By 2009 one might have admitted that perhaps Spencer was wrong. I was certainly open to that possibility. There was very persuasive evidence that the mtDNA lineages of South Asia had little to do with Europe or the Middle East.

Yet a closer look at the above papers reveals two major systematic problems.

First, ancient DNA has made it clear that there has been major population turnover during the Holocene, but this was not the null hypothesis in the 2000s. Looking at extant distributions of lineages can give one a distorted view of the past. Frankly, the 2009 Indian paper was egregious in this way because they included Turkic groups in their Central Asian data set. Even in 2009 there was a whole lot of evidence that Central Asian Turkic groups were likely very different from Indo-European Turanian populations which would have been the putative ancestors of Indo-Aryans. Honestly the authors either consciously loaded the dice to reduce the evidence for gene flow from Central Asia, or they were ignorant (the nature of the samples is much clearer in the supplements than the primary text for what it’s worth).

Second, Y chromosomal marker sets in the 2000s were constrained to fast mutating microsatellite regions or less than 100 variant SNPs on the Y. Because it is so repetitive the Y chromosome is hard to sequence, and it really took the technologies of the last ten years to get it done. Both the above papers estimate the coalescence of extant R1a1a lineages to be 10-15,000 years before the present. In particular, they suggest that European and South Asian lineages date back to this period, pushing back any possible connection between the groups, and making it possible that European R1a1a descended from a South Asian founder group which was expanding after the retreat of the ice sheets. The conclusions were not unreasonable based on the methods they had.  But now we have better methods.*

Whole genome sequencing of the Y, as well as ancient DNA, seems to falsify the above dates. Though microsatellites are good for very coarse grain phylogenetic inferences, one has to be very careful about them when looking at more fine grain population relationships (they are still useful in forensics to cheaply differentiate between individuals, since they accumulate variation very quickly). They mutate fast, and their clock may be erratic.

Additionally, diversity estimates were based on a subset of SNPs that were clearly not robust. R1a1a is not diverse anywhere, though basal lineages seem to be present in ancient DNA on the Pontic steppe in some cases.

To show how lacking in diversity R1a1a is, here are the results of a 2016 paper which performed whole genome sequencing on the Y. Instead of relying on the order of 10 to 100 SNPs, this paper discovered over 65,000 Y variants worldwide. Notice how little difference there is between different South Asian groups below, indicative of a massive population expansion relatively recently in time which didn’t even have time to exhibit regional population variation. They note that “The most striking are expansions within R1a-Z93 [the South Asian clade], ~4.0–4.5 kya. This time predates by a few centuries the collapse of the Indus Valley Civilization, associated by some with the historical migration of Indo-European speakers from the western steppes into the Indian sub-continent.”

Mouse fidelity comes down to the genes

While birds tend to be at least nominally monogamous, this is not the case with mammals. This strikes some people as strange because humans seem to be monogamous, at least socially, and often we take ourselves to be typically mammalian. But of course we’re not. Like many primates we’re visual creatures, rather than relying on smell and hearing. Obviously we’re also bipedal, which is not typical for mammals. And, our sociality scales up to massive agglomerations of individuals.

How monogamous we are is up for debate. Desmond Morris, who is well known to many from his roles in television documentaries, has been a major promoter of the idea that humans are monogamous, with a focus on pair-bonds. In contrast, other researchers have highlighted our polygamous tendencies. In The Mating Mind Geoffrey Miller argues for polygamy, and suggests that pair-bonds in a pre-modern environment were often temporary, rather than lifetime (Miller is now writing a book on polyamory).

The fact that in many societies high status males seem to engage in polygamy, despite monogamy being more common, is one phenomenon which confounds attempts to quickly generalize about the disposition of our species. What is preferred may not always be what is practiced, and the external social adherence to norms may be quite violated in private.

Adducing behavior is simpler in many other organisms, because their range of behavior is more delimited. When it comes to studying mating patterns in mammals voles have long been of interest as a model. There are vole species which are monogamous, and others which are not. Comparing the diverged lineages could presumably give insight as to the evolutionary genetic pathways relevant to the differences.

But North American deer mice, Peromyscus, may turn out to be an even better bet: there are two lineages which exhibit different mating patterns and which are phylogenetically close enough that they can interbreed. That is crucial, because it allows one to generate crosses and see how the characteristics distribute themselves across subsequent generations. Basically, it allows for genetic analysis.

And that’s what a new paper in Nature does, The genetic basis of parental care evolution in monogamous mice. In figure 3 you can see the distribution of behaviors in parental generations, F1 hybrids, and the F2, which is a cross of F1 individuals. The widespread distribution of F2 individuals is likely indicative of a polygenic architecture of the traits. Additionally, they found that some traits are correlated with each other in the F2 generation (probably due to pleiotropy, the same gene having multiple effects), while others were independent.

With the F2 generation they ran a genetic analysis which looked for associations between traits and regions of the genome. They found 12 quantitative trait loci (QTLs), basically zones of the genome associated with variation on one or more of the six traits. From this analysis they immediately realized there was sexual dimorphism in terms of the genetic architecture; the same locus might have a different effect in the opposite sex. This is evolutionarily interesting.

Because the QTLs are rather large in terms of physical genomic units the authors looked to see which were plausible candidates in terms of function. One of their hits was vasopressin, which should be familiar to many from vole work, as well as some human studies. Though the QTL work as well as their pup-switching experiment (which I did not describe) is persuasive, the fact that a gene you’d expect shows up as a candidate really makes it an open and shut case.

The extent of the variation explained by any given QTL seems modest. In the extended figures you can see it’s mostly in the 1 to 5 percent range. In Carl Zimmer’s excellent write up he ends:

But Dr. Bendesky cautioned that the vasopressin gene would probably turn out to be just one of many that influence oldfield mice. Though it is strongly linked to parental behavior, the vasopressin gene accounts for 6.7 percent of the variation in nest building among males, and only 2.9 percent among females.

The genetic landscape of human parenting will turn out to be even more rugged, Dr. Bendesky predicted.

“You cannot do a 23andMe test and find out if your partner is going to be a good father,” he said.

Sort of. The genetic architecture above is polygenic…but not incredibly diffuse. The proportion of variation explained by the largest effect allele is more than for height, and far more than for education. If human research follows up on this, I wouldn’t be surprised if you could develop a polygenic risk score.

But I don’t have a good intuition on how much variation in humans there really is for these sorts of traits that are heritable. I assume some. But I don’t know how much. And how much of the variance in behavior might be explained by human QTLs? Humans don’t lick or build nests, or retrieve pups. Also, as one knows from Genetics and Analysis of Quantitative Traits sexually dimorphic traits take a long time to evolve. These are two deer mice species. Within humans there may not have been enough time for this sort of heritable complexity of behavior to evolve.

There are a lot of philosophical issues here about translating to a human context.

Nevertheless, this research shows that ingenious animal models can powerfully elucidate the biological basis of behavior.

Citation: The genetic basis of parental care evolution in monogamous mice. Nature (2017) doi:10.1038/nature22074

Genetic variation in human populations and individuals


I’m old enough to remember when we didn’t have a good sense of how many genes humans had. I vaguely recall numbers around 100,000 at first, which in hindsight seems rather like a round and large number. A guess. Then it went to 40,000 in the early 2000s and then further until it converged to some number just below 20,000.

But perhaps more fascinating is that we have a much better catalog of the variation across the whole human genome now. Often friends ask me questions of the form: “so DTC genomic company X has about 800,000 SNPs, is that enough to do much?” To answer such a question you need some basic numbers in your head, as well as what you want to “do.”

First, the human genome has about 3 billion base pairs (3 Gb). That’s a lot. But most of the genome famously doesn’t code for proteins. The exome, the proportion of the genome where bases directly translate into a protein, accounts for 1% of the whole genome. That’s 30 million bases (30 Mb). But this small region of the genome is very important, as the vast majority of major disease mutations are found in the exome.

When it comes to a standard 800K SNP chip, which samples 800,000 positions across the 3 Gb genome, it is likely that the designers enriched the marker set for functional positions relevant to diseases. Not all marker positions are created equal. Though even outside of those functional positions there are often nearby SNPs that can “tag” them, so you can infer one from the state of the other.

But are 800,000 positions enough to make good ancestry inference? (to give one example) Yes. 800,000 is actually a substantial proportion of the polymorphism in any given genome. There have been some papers which improved on the numbers in 2015’s A global reference for human genetic variation, but it’s still a good comprehensive review to get an order-of-magnitude sense. The table below gives you a sense of individual variation:

[Table: median autosomal variant sites per genome]

When it comes to single nucleotide polymorphisms (SNPs), what SNP chips are getting at, an 800K array should get a substantial proportion of your genome-wide variation. More than enough for ancestry inference or forensics. The singleton column shows mutations specific to the individual. When focusing on new mutations specific to an individual that might cause disease, singleton large deletions and nonsynonymous SNPs are really where I’d look.

But what about whole populations? The plot to the left shows the count of variants as a function of alternative allele frequency. When we say “SNP”, we really mean variants which exhibit polymorphism at a particular cut-off frequency for the minor allele (often 1%). It is clear that as the minor allele frequency increases in relation to the human reference genome the number of variants decreases.

From the paper:

The majority of variants in the data set are rare: ~64 million autosomal variants have a frequency <0.5%, ~12 million have a frequency between 0.5% and 5%, and only ~8 million have a frequency >5% (Extended Data Fig. 3a). Nevertheless, the majority of variants observed in a single genome are common: just 40,000 to 200,000 of the variants in a typical genome (1–4%) have a frequency <0.5% (Fig. 1c and Extended Data Fig. 3b). As such, we estimate that improved rare variant discovery by deep sequencing our entire sample would at least double the total number of variants in our sample but increase the number of variants in a typical genome by only ~20,000 to 60,000.

An 800K SNP chip will be biased toward the 8 million or so variants with a frequency >5%. This number gives you a sense of the limited scope of variation in the human genome. 0.27% of the genome captures a lot of the polymorphism.
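To keep the basic numbers in your head straight, here is a small sketch of the arithmetic behind this post. All the inputs are the round figures quoted above; the variable names are mine, and the last ratio assumes, purely for illustration, that every position on the chip is one of the common variants, which a real array will not quite satisfy.

```python
# Round figures quoted in the post (not exact counts).
GENOME_BP = 3_000_000_000      # ~3 Gb human genome
CHIP_SNPS = 800_000            # a standard 800K SNP chip
COMMON_VARIANTS = 8_000_000    # variants with frequency >5% in the paper

# The exome is ~1% of the genome: 30 Mb.
exome_bp = GENOME_BP // 100

# Share of all genomic positions that the chip samples directly.
chip_share_of_genome = CHIP_SNPS / GENOME_BP

# Share of the genome occupied by common variants (~0.27%).
common_share_of_genome = COMMON_VARIANTS / GENOME_BP

# Upper bound on the share of common variants a chip could type directly,
# ASSUMING every chip position is a common variant.
chip_share_of_common = CHIP_SNPS / COMMON_VARIANTS

if __name__ == "__main__":
    print(f"exome: {exome_bp:,} bp")
    print(f"chip positions / genome: {chip_share_of_genome:.4%}")
    print(f"common variants / genome: {common_share_of_genome:.2%}")
    print(f"chip positions / common variants: {chip_share_of_common:.0%}")
```

The point of the last ratio is that the chip only has to cope with millions of common variants, not billions of bases, and tagging of nearby SNPs stretches its reach further than the raw count suggests.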

Citation: 1000 Genomes Project Consortium. “A global reference for human genetic variation.” Nature 526.7571 (2015): 68-74.

Why overdominance probably isn’t responsible for much polymorphism

Hybrid vigor is a concept that many people have heard of, because it is very useful in agricultural genetics, and makes some intuitive sense. Unfortunately it often gets deployed in a variety of contexts, and its applicability is often overestimated. For example, many people seem to think (from personal communication) that it may somehow be responsible for the genetic variation around us.

This is just not so. As you may know each human carries millions of genetic variants within their genome. Populations have various levels of polymorphism at particular positions in the genome. How’d they get there? In the early days of population genetics there were two broad schools, the “balance” and “classical.” The former made the case for the importance of balancing selection in maintaining variation. The latter suggested that the variation we see around us is simply a transient between fixation of a favored mutation from a low frequency or extinction of a disfavored variant (perhaps environmental conditions changed and a high frequency variant is now disfavored). Arguably the rise of neutral theory and empirical results from molecular evolution supported the classical model more than the balance framework (at least this was Richard Lewontin’s argument, and I follow his logic here).

But even in relation to alleles which are maintained at polymorphism through balancing selection, overdominance isn’t going to be the major player.

Sickle cell disease is a classic consequence of overdominance; the heterozygote is more fit than either the wild-type homozygote or the mutant homozygote, which suffers the recessive disease. Obviously polymorphism is maintained despite the decreased fitness of the mutant homozygote because the heterozygote is so much more fit than the wild type. The final proportion of the alleles segregating in the population will be conditional on the fitness drag of the mutant homozygote, because as per Hardy-Weinberg equilibrium (HWE) it will be present in the population at a frequency of ~q^2.

The problem is that this is clearly not going to scale across loci. That is, even if the fitness drag is more minimal than is the case with the sickle cell locus, one can imagine a cumulative situation. The segregation load is just going to be too high. Overdominance is probably a transient strategy which fades away as populations evolve more efficient ways to adapt that don’t have such a fitness load.
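To see why the load piles up, here is a minimal sketch using the textbook one-locus overdominance model. The assumptions are mine for illustration: genotype fitnesses of 1 - s for the wild-type homozygote, 1 for the heterozygote and 1 - t for the mutant homozygote, Hardy-Weinberg proportions, and fitness multiplying across independent loci. The selection coefficients in the example are hypothetical, not estimates for any real locus.

```python
def equilibrium_q(s, t):
    """Equilibrium frequency of the mutant allele under overdominance.

    Fitnesses: wild-type homozygote 1 - s, heterozygote 1,
    mutant homozygote 1 - t. Standard result: q = s / (s + t).
    """
    return s / (s + t)


def segregation_load(s, t):
    """Drop in mean fitness at equilibrium, relative to the heterozygote."""
    q = equilibrium_q(s, t)
    p = 1 - q
    # Under HWE the two homozygotes occur at p^2 and q^2.
    return s * p**2 + t * q**2


def mean_fitness(s, t, n_loci):
    """Mean fitness across n independent overdominant loci (multiplicative)."""
    return (1 - segregation_load(s, t)) ** n_loci


if __name__ == "__main__":
    # Hypothetical weak selection: 1% disadvantage for each homozygote.
    for n in (1, 10, 100, 1000):
        print(n, mean_fitness(0.01, 0.01, n))
```

Even with a tiny per-locus load, the product shrinks geometrically with the number of loci, which is the sense in which overdominance cannot be the general explanation for polymorphism at many positions in the genome.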

So how does balancing selection still lead to variation without heterozygote advantage? W. D. Hamilton argued that much of it was due to negative frequency dependent selection. Co-evolution with pathogens is the best case of this. As strategies get common pathogens adapt to them, so rare strategies encoded by rare alleles gain in fitness. As these alleles increase in frequency their fitness decreases because pathogens adapt to them in turn. Their frequency declines, and eventually the pathogens lose the ability to overcome them, and their frequency increases again.
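The core of the idea, that being rare is an advantage in itself, can be sketched with a toy model. This is not Hamilton's host-pathogen model: I am assuming a haploid population with two alleles where each allele's fitness simply falls linearly with its own frequency, and the strength parameter s is hypothetical. There are no explicit pathogens and no time lag here, so the frequencies settle at a balance instead of cycling.

```python
def next_frequency(p, s):
    """One generation of selection where each allele is penalized by its own frequency.

    Toy assumption: fitness of allele A is 1 - s * p and fitness of
    allele a is 1 - s * (1 - p), so whichever allele is rarer is fitter.
    """
    w_a_upper = 1 - s * p          # fitness of allele A
    w_a_lower = 1 - s * (1 - p)    # fitness of allele a
    mean_w = p * w_a_upper + (1 - p) * w_a_lower
    return p * w_a_upper / mean_w


def trajectory(p0, s, generations):
    """Frequency of allele A over time, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    # A rare allele rises and a common allele falls; both are retained.
    print(trajectory(0.01, 0.1, 200)[-1])
    print(trajectory(0.99, 0.1, 200)[-1])
```

Neither allele is lost from either starting point, with no heterozygote advantage anywhere in the model. Adding a lagged pathogen response on top of this is what would produce the rise-and-fall cycles described above.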

What if you call for a revolution and no one revolts?

When I was in 8th grade my earth science teacher explained he did not believe in Darwinism. He seemed a reasonable fellow so my first reaction was shock. My best friend at the time, who sat next to me, laughed, “Yeah, some people believe we’re descended from monkeys! Crazy, huh?” I didn’t really know what to say. But what followed was even more confusing to me: my teacher explained that he accepted punctuated equilibrium, not Darwinism. He did not elaborate much beyond this, though I tried to get at what he believed after class in the few minutes I had.

Later on I realized that he had drunk deeply at the well of Stephen Jay Gould, paleontologist and polymath. I will quote Richard Lewontin, Gould’s longtime collaborator and friend:

Now I should warn you about my prejudices. Steve and I taught evolution together for years and in a sense we struggled in class constantly because Steve, in my view, was preoccupied with the desire to be considered a very original and great evolutionary theorist. So he would exaggerate and even caricature certain features, which are true but not the way you want to present them. For example, punctuated equilibrium, one of his favorites. He would go to the blackboard and show a trait rising gradually and then becoming completely flat for a while with no change at all, and then rising quickly and then completely flat, etc. which is a kind of caricature of the fact that there is variability in the evolution of traits, sometimes faster and sometimes slower, but which he made into punctuated equilibrium literally. Then I would have to get up in class and say “Don’t take this caricature too seriously. It really looks like this…” and I would make some more gradual variable rates. Steve and I had that kind of struggle constantly. He would fasten on a particular interesting aspect of the evolutionary process and then make it into a kind of rigid, almost vacuous rule, because—now I have to say that this is my view—I have no demonstration of it—that Steve was really preoccupied by becoming a famous evolutionist.

Gould succeeded, after a fashion. His reputation within evolutionary biology is mixed, at best. Just look at what someone who thinks he made genuine original contributions to science admits above. But in the mind of the public Stephen Jay Gould was an oracle of sorts.

A revolution is sexy. A revolution sells. Having read both of them, I would say that Richard Dawkins is the better stylist when compared to Gould. Additionally, though some might disagree with this Dawkins is closer to the mainline of the modern evolutionary biological tradition than Gould. But in the United States Gould far overshadowed Dawkins…until the latter began to make a name for himself as an anti-religion polemicist in the 2000s. Revolution. Controversy. They’re salient. The press eats it up, and the public trusts the press.

And some things never change. Every few years there is an impending “revolution” in evolutionary biology or genetics. But the revolution is mostly in the minds of a few journalists, and a public that reads a little too much into a puff piece here and there. The sort of well educated public woolly on what the “central dogma” is, but clear that it has been overthrown.

Sometimes this gets out of control. Suzan Mazur’s The Altenberg 16: An Exposé of the Evolution Industry is probably the weirdest instance of this genre of “the sky is falling in evolutionary theory!” But of late some scholars have been coming out with more sober critiques, arguing that the Neo-Darwinian Synthesis needs to be extended or modified significantly. Kevin Laland’s Darwin’s Unfinished Symphony: How Culture Made the Human Mind is the latest instance of this, but this was preceded by Evolution in Four Dimensions: Genetic, Epigenetic, Behavioral, and Symbolic Variation in the History of Life. You can also read David Dobbs’ sympathetic treatment from a few years back around this issue.

I can communicate to you what seems to be the majority view among the evolutionary biologists I know: there isn’t a need for a revolution in conceptual thought, just a working out of details and reallocation of resources. Many who are sympathetic to Kevin Laland’s argument still believe that it’s about emphases and semantics. There’s no reason to put out a clarion call that evolution needs to be rethought in its conceptual foundations.

Honestly I don’t know if there’s been much that is revolutionary conceptually since the original period of the synthesis. Perhaps the rise of molecular evolution and neutrality as a null hypothesis? But even I’m not sure about that.

Erik I. Svensson has put up a preprint which speaks for many people, On reciprocal causation in the evolutionary process. Read the whole thing, it’s thorough, and accessible to a lay audience. The main aspect a bit surprising to me is the good word put in for The Dialectical Biologist, which I have heard is an interesting book:

Recent calls for a revision the standard evolutionary theory (ST) are based on arguments about the reciprocal causation of evolutionary phenomena. Reciprocal causation means that cause-effect relationships are obscured, as a cause could later become an effect and vice versa. Such dynamic cause-effect relationships raises questions about the distinction between proximate and ultimate causes, as originally formulated by Ernst Mayr. They have also motivated some biologists and philosophers to argue for an Extended Evolutionary Synthesis (EES). Such an EES will supposedly replace the Modern Synthesis (MS), with its claimed focus on unidirectional causation. I critically examine this conjecture by the proponents of the EES, and conclude, on the contrary, that reciprocal causation has long been recognized as important in ST and in the MS tradition. Numerous empirical examples of reciprocal causation in the form of positive and negative feedbacks now exists from both natural and laboratory systems. Reciprocal causation has been explicitly incorporated in mathematical models of coevolutionary arms races, frequency-dependent selection and sexual selection. Such feedbacks were already recognized by Richard Levins and Richard Lewontin, long before the call for an EES and the associated concept of niche construction. Reciprocal causation and feedbacks is therefore one of the few contributions of dialectical thinking and Marxist philosophy in evolutionary theory, and should be recognized as such. While reciprocal causation have helped us to understand many evolutionary processes, I caution against its extension to heredity and directed development if such an extension involves futile attempts to restore Lamarckian or soft inheritance.

The reality of cultural hitchhiking

The figure to the left is from a paper, The mountains of giants: an anthropometric survey of male youths in Bosnia and Herzegovina, which attempts to explain why the people from the uplands of the western Balkans are so tall. Anyone who has watched high level basketball, or perused old physical anthropology textbooks, knows that average heights in the Dinaric Alps are quite high in comparison to the rest of Europe, matched only in the region around Scandinavia. The Dutch of late have been the world champions in height, and explanations such as recent selection and their high consumption of dairy products have been given. In this paper the authors point out that the people who live in the Dinaric uplands are not a population which consumes an inordinately high protein diet, at least in relation to their neighbors.

Rather, they suggest that the height of the people who reside in the Dinarics is due to a genetic factor. There is now good genomic evidence that selection accounts for at least some of the difference in height between Northern and Southern Europeans. That is, it seems that there have been divergent pressures in these two locales, their genetic differences due to historical demography aside.

The exception to this north-south gradient is obviously in the Dinarics. Another way in which the Dinarics are exceptional is that they have the highest frequency of Y chromosomal haplogroup I. The other mode of haplogroup I is in Scandinavia. I1 is common among people who live in Sweden, while I2 is common among the peoples of the western Balkans. I has an interesting history because the vast majority of Mesolithic hunter-gatherer males in Europe belong to this haplogroup. It is very rare outside of Europe. This is in contrast to the other major European haplogroups, which are found outside of Europe at appreciable frequencies.

It is likely that I is indicative of a lineage with roots in Europe which go back to the late Pleistocene period after the Last Glacial Maximum ~20,000 years ago. As the world warmed ~10,000 years ago small populations of hunter-gatherers rapidly expanded from their refuges and either most of the males were I, or in the drift process on the edge of the wave of advance I became very common. It is plausible that in terms of alleles which account for variation in height these hunter-gatherers were enriched for those conferring larger size. Cold weather populations tend to be larger. Additionally, they probably consumed a relatively diversified but high protein diet, allowing for greater median size than among farmers at the Malthusian carrying capacity.

But, there has been a lot of selection over the past 10,000 years, and I am skeptical that this correlation between I and height in Europe is anything but a coincidence. Rather, the phylogeny which I exhibits brings me to another issue which I think is not often highlighted: I1 in particular may have “hitchhiked” with the exogenous lineages such as R1b and R1a in early Indo-European society.

That is, in the patrilineal descent groups expanding across the landscape and monopolizing access to resources and mates, the non-invasive I somehow integrated themselves into the broader cultural complex, and partook in the plenty. Like R1b and R1a it exhibits a rake-like topology which suggests rapid recent expansion.

This would not be exceptional. The modern Russian state’s origins are in the polities created by Kievan Rus, who were famously Scandinavian. Rurik was by origin a Swede, and his dynasty eventually came to encompass most of the eastern Slavic peoples, and rule over the Russian people and state until the 17th century. Because there were so many descendants of this dynasty it was possible to adduce its Y chromosomal haplogroup, N1c1. The kicker is that this is clearly a Finnic lineage, with the most recent evidence being that it is a remnant of a recent migration out of Siberia to the west. The implication here is that the direct male lineage of Rurik were assimilated into the Scandinavian culture and power structure, and were possibly chieftains of Finnic tribes somewhere along the Baltic littoral.

Another example is the House of Wessex. Alfred the Great is arguably the first true king of England. Here are the names of some of the earlier monarchs of the House of Wessex: Ceawlin, Cynric, and Cynegils. Even someone without a background in historical linguistics may be curious about whether these are Anglo-Saxons, and there is a line of thinking that perhaps the forebears of Alfred were British warlords, who “went Saxon,” in a fashion analogous to Gallo-Roman aristocrats who assimilated to Frankish-Germanic norms and forms in the 6th and 7th centuries in the Merovingian domains.

Overall what you see in the genetic data are many things, but rarely a straightforward story. Just as genes can impact culture (e.g., lactase persistence), so culture impacts the distribution of genes. Just as human polities are coalitions, so genetic lineages themselves in their distribution and evolutionary history exhibit fingerprints of these past socio-political events and ideas.

Fisherianism in the genomic era

There are many things about R. A. Fisher that one could say. Professionally he was one of the founders of evolutionary genetics and statistics, and arguably the second greatest evolutionary biologist after Charles Darwin. With his work in the first few decades of the 20th century he reconciled the quantitative evolutionary framework of the school of biometry with mechanistic genetics, and formalized evolutionary theory in The Genetical Theory of Natural Selection.

He was also an asshole. This is clear in the major biography of him, R.A. Fisher: The Life of a Scientist. It was written by his daughter. But The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century also seems to indicate he was a dick. And W. D. Hamilton’s Narrow Roads of Gene Land portrays Fisher as rather cold and distant, despite the fact that Hamilton idolized him.

Notwithstanding his unpleasant personality, R. A. Fisher seems to have been a veritable mentat in his early years. Much of his thinking crystallized in the first few decades of the 20th century, when genetics was a new science and mathematical methods were being brought to bear on a host of topics. It would be decades until DNA was understood to be the substrate of heredity. Instead of deriving from molecular first principles which were simply not known in that day, Fisher and his colleagues constructed a theoretical formal edifice which drew upon patterns of inheritance that were evident in lineages of organisms that they could observe around them (Fisher had a mouse colony which he utilized now and then to vent his anger by crushing mice with his bare hands). Upon that observational scaffold they placed a sturdy superstructure of mathematical formality. That edifice has been surprisingly robust down to the present day.

One of Fisher’s frameworks which still gives insight is the geometric model of the distribution of fitness of mutations. If an organism is near its optimum of fitness, then large jumps in any direction will reduce its fitness. In contrast, small jumps have some probability of getting closer to the optimum of fitness. In plainer language, mutations of large effect are bad, and mutations of small effect are not as bad.
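To make the geometry concrete, here is a minimal Monte Carlo sketch of the idea. It is my own illustration, not anything from Fisher or from the paper discussed below: the phenotype sits at distance z from an optimum in an n-dimensional trait space, a mutation is a step of length r in a uniformly random direction, and a mutation counts as beneficial if it lands closer to the optimum. The function name and the values of r, z, n, and the seed are all assumptions chosen for illustration.

```python
import math
import random

def fraction_beneficial(r, z=1.0, n=10, trials=20000, seed=1):
    """Monte Carlo estimate of the share of random mutations of size r
    that move a phenotype closer to the optimum in n dimensions."""
    rng = random.Random(seed)
    closer = 0
    for _ in range(trials):
        # Random direction: normalize a vector of independent Gaussians.
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in g))
        # The optimum is the origin; the current phenotype sits at
        # distance z from it along the first axis.
        new_point = [r * x / norm for x in g]
        new_point[0] += z
        new_dist = math.sqrt(sum(x * x for x in new_point))
        if new_dist < z:
            closer += 1
    return closer / trials

# Hypothetical step sizes: one tiny, one larger than twice the
# distance to the optimum.
small = fraction_beneficial(r=0.01)
large = fraction_beneficial(r=2.5)
print(small, large)
```

Under these assumptions the tiny step improves fitness close to half the time, while a step longer than 2z can never land closer to the optimum, which is the plain-language summary above in numerical form.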

A new paper in PNAS loops back to this framework, Determining the factors driving selective effects of new nonsynonymous mutations:

Our study addresses two fundamental questions regarding the effect of random mutations on fitness: First, do fitness effects differ between species when controlling for demographic effects? Second, what are the responsible biological factors? We show that amino acid-changing mutations in humans are, on average, more deleterious than mutations in Drosophila. We demonstrate that the only theoretical model that is fully consistent with our results is Fisher’s geometrical model. This result indicates that species complexity, as well as distance of the population to the fitness optimum, modulated by long-term population size, are the key drivers of the fitness effects of new amino acid mutations. Other factors, like protein stability and mutational robustness, do not play a dominant role.

In the title of the paper itself is something that would have been alien to Fisher’s understanding when he formulated his geometric model: the term “nonsynonymous” to refer to mutations which change the amino acid corresponding to the triplet codon. The paper is understandably larded with terminology from the post-DNA and post-genomic era, and yet comes to the conclusion that a nearly blind statistical geneticist from about a century ago correctly adduced the nature of mutation’s effects on fitness in organisms.

The authors focused on two primary species with different histories, but both well characterized in the evolutionary genomic literature: humans and Drosophila. The models they tested are as follows:

 

Basically they checked the empirical distribution of the site frequency spectra (SFS) of the nonsynonymous variants against expected outcomes based on particular details of demographics, which were inferred from synonymous variation. Drosophila have effective population sizes orders of magnitude larger than humans, so if that is not taken into account, then the results will be off. There are also a bunch of simulations in the paper to check for robustness of their results, and they also caveat the conclusion with admissions that other models besides the Fisherian one may play some role in their focal species, and more in other taxa. A lot of this strikes me as accruing through the review process, and I don’t have the time to replicate all the details to confirm their results, though I hope some of the reviewers did so (again, I suspect that the reviewers were demanding some of these checks, so they definitely should have in my opinion).
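For readers who haven’t run into a site frequency spectrum before, here is a toy sketch of what one is. This is only an illustration of the summary statistic, not the authors’ inference pipeline; the function name and the tiny 0/1 haplotype matrix are made up for the example. Each row is a sampled chromosome, each column a site, and 1 marks the derived allele.

```python
from collections import Counter

def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS as a list where entry i-1 counts the sites
    whose derived allele appears on exactly i of the n chromosomes."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    counts = Counter()
    for site in range(n_sites):
        derived = sum(h[site] for h in haplotypes)
        # Skip sites that are not variable in the sample.
        if 0 < derived < n:
            counts[derived] += 1
    return [counts[i] for i in range(1, n)]

# Hypothetical sample: 4 chromosomes (rows) by 5 sites (columns).
toy_haplotypes = [
    [1, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
toy_sfs = site_frequency_spectrum(toy_haplotypes)
print(toy_sfs)
```

The general logic is that purifying selection keeps deleterious variants rare, so a nonsynonymous spectrum skewed toward low-frequency classes relative to the synonymous one carries information about the fitness effects of new mutations, once demography has been accounted for.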

In the Fisherian model more complex organisms are more fine-tuned due to pleiotropy and other such dynamics. So new mutations are more likely to deviate away from the optimum. This is the major finding that they confirmed. What does “complex” mean? The Drosophila genome is less than 10% of the human genome’s size, but the migratory locust has twice as large a genome as humans, while wheat has a sequence more than five times as large. But organism to organism, it does seem that Drosophila has less complexity than humans. And they checked with other organisms besides their two focal ones…though the genomes there are not as complete presumably.
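The role of complexity can be seen in Fisher’s classical large-n approximation, where a mutation of size r, at distance z from the optimum in n trait dimensions, is beneficial with probability roughly 1 − Φ(x), with x = r√n / (2z) and Φ the standard normal cumulative distribution. The sketch below just evaluates that textbook formula; it is not the model-fitting done in the paper, and the dimension counts standing in for a “simple” and a “complex” organism are purely hypothetical.

```python
import math

def p_beneficial(r, z, n):
    """Fisher's large-n approximation to the probability that a random
    mutation of size r is beneficial, at distance z from the optimum
    in n dimensions: 1 - Phi(r * sqrt(n) / (2 * z))."""
    x = r * math.sqrt(n) / (2.0 * z)
    # 1 - Phi(x) written with the complementary error function.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Same mutation size and distance to the optimum, different numbers
# of trait dimensions (hypothetical values).
simple_organism = p_beneficial(r=0.1, z=1.0, n=10)
complex_organism = p_beneficial(r=0.1, z=1.0, n=1000)
print(simple_organism, complex_organism)
```

Holding r and z fixed, raising n pushes x up and the probability of a beneficial change down, which is the sense in which a more complex organism is more fine-tuned.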

As I indicated above, the authors believe they’ve checked for factors such as background selection, which may confound selection coefficients on specific mutations. The paper is interesting as much for the fact that it illustrates how powerful analytic techniques developed in a pre-DNA era were. Some of the models above are mechanistic, and require a certain understanding of the nature of molecular processes. And yet they don’t seem as predictive as a more abstract framework!

Citation: Christian D. Huber, Bernard Y. Kim, Clare D. Marsden, and Kirk E. Lohmueller, Determining the factors driving selective effects of new nonsynonymous mutations, PNAS 2017; published ahead of print April 11, 2017, doi:10.1073/pnas.1619508114

Sexual selection decreasing difference

Sexual selection is often considered a driver of diversification of a lineage. I was introduced to the concept in Jared Diamond’s The Third Chimpanzee, where he suggested that racial differences in appearance might be due to sexual preference, following a suggestion originally made by Charles Darwin. Though sexual selection emerges now and then as a deus ex machina in discussion sections of papers, in general it hasn’t panned out as an explanation for these differences.

But a new paper using shorebirds offers results which oppose this sort of inference, in that sexual selection may be a homogenizing force. Basically the authors used the fact that shorebird lineages have related monogamous and polygamous species. They looked at species richness and genetic diversity using STRUCTURE and microsatellites.

Polygamy slows down population divergence in shorebirds:

Examining microsatellite data from 79 populations in 10 plover species (Genus: Charadrius) we found that polygamous species display significantly less genetic structure and weaker isolation-by-distance effects than monogamous species. Consistent with this result, a comparative analysis including 136 shorebird species showed significantly fewer subspecies for polygamous than for monogamous species. By contrast, migratory behavior neither predicted genetic differentiation nor subspecies richness. Taken together, our results suggest that dispersal associated with polygamy may facilitate gene flow and limit population divergence. Therefore, intense sexual selection, as occurs in polygamous species, may act as a brake rather than an engine of speciation in shorebirds.

A reminder that lots of theorizing may lead you nowhere fast, but a quick empirical check can be very humbling. I’m not sure as to the generality of this result, and ultimately it probably has to do with reproductive variance. But it is a starting point.

Addendum: Overall Geoffrey Miller’s The Mating Mind is probably wrong in most of the details, though perhaps on the most general level there may be something there (I’m wondering particularly in regards to mutational load). But it’s a decent introduction to sexual selection theory in a human context, and has a lot of interesting ideas. And Miller is actually a good writer as far as scientists go.

The human extended phenotype


I think there is something to the hypothesis that we as a species are self-domesticated, but a new preprint really doesn’t change my probability up or down, Comparative Genomic Evidence for Self-Domestication in Homo sapiens. Notwithstanding my own participation in some comparative genomic work, a lot of the conclusions from this field are as clear and obvious to me as the above figure, not very.

To be fair at least the authors of the preprint have a hypothesis they’re testing, the “domestication syndrome” as caused by neural crest gene modification. Two major issues I’d bring up: it’s comparative genomic because of a paucity of samples, and tidy explanations often don’t pan out.

Genomic analysis of ancient genomes is very preliminary. Phylogenomic work, which establishes relationships between lineages, can accept a noisy and poor marker set with only a few representative samples. But when looking at population genomics one should at least have either really good data on a small number of individuals, or, preferably, good-enough data on lots of individuals. The ancient genomic data set for hominins is not rich enough that I’m confident about any but the most obvious and clear differences between our closest relations and ourselves. The reality of gene flow across populations also adds a confounding element, because it might not be implausible that “modern” alleles actually derive from another ancient lineage, and our modern forebears exhibited the ancestral state.

Second, the neural crest hypothesis and a general model of domestication are rather attractive. I myself find it intriguing, and am curious from a professional scientific perspective. But, attractive hypotheses often do not pan out, and gain early attention because scientists are human, and exhibit some bias and hope. A case in point: mirror neurons have stalled as a silver bullet to explain all sorts of unique aspects of human cognition. Neural crest models are part of the long quest to establish the genes which make us unique and human, even though I’m not even sure this is a wrong question.

The preprint did remind me of an excellent book I read over 10 years ago, The Cultural Origins of Human Cognition. I am much more well disposed toward the thesis now than I was then, in large part because I no longer hold to a “big bang” theory of the origin of modern humanity due to a behavioral revolution triggered by a rapid suite of genetic changes. Rather, I suspect a cultural model where there is reciprocal feedback with genetic changes in a sort of ratchet has a lot more utility, in part because the gap between “archaic” H. sapiens and our own ancestors was I believe much smaller in many ways in relation to behavior than we’ve assumed until lately. Finally, the genetic evidence of lots of lateral gene flow across these distinct branches is indicative of more complexity in the origin of humanity than we had previously understood.

There is also the whole idea of “self-domestication.” I think perhaps it needs to be more explicitly formulated in an ecological sense. Rather than self-domestication, what occurred is that a host of species began to inhabit an evolving “extended phenotype” which humans were a motive engine within. But we need to be cautious about overemphasizing our agency. Once human societies became agricultural beyond a certain point it is not possible to revert back to hunter-gathering lifestyles without migration or mass die off. In some ways we are as much pawns in the forces unleashed by our original choices and actions as the domestic animals and plants and parasites which have come along for the ride.

Citation: Comparative Genomic Evidence for Self-Domestication in Homo sapiens, Constantina Theofanopoulou, Simone Gastaldon, Thomas O’Rourke, Bridget D Samuels, Angela Messner, Pedro Tiago Martins, Francesco Delogu, Saleh Alamri, Cedric Boeckx, doi: https://doi.org/10.1101/125799