Maybe beauty is just “cognitive cheesecake”?

Beauty matters a lot in our world. The entertainment and fashion industries are based on beauty. Obviously some aspect of beauty is socially constructed and contextual. Beauty standards can change. There was a time when many aspects of European physical appearance, from light hair and eyes, down to the lack of an epicanthic fold, were excluded from idealized canons in East Asia. Obviously that is not the case today, and one can give a very plausible explanation through recourse to recent history as to why those norms shifted. Similarly, there is even a possibility that something as central to evolutionary psychology as preference for a particular waist-to-hip ratio may vary as a function of material conditions. It is clear from the social historical scholarship that the ideal characteristics of a female mate are strongly conditioned on the resources of the male; lower status males put greater emphasis on the direct economic benefits which their partner may bring because they are more on the margin of survival. For much of history lower status males meant most males. That is, peasants.


And yet cross-culturally there does seem to be a certain set of preferences which one might argue are “cultural universals.” People from “small-scale” societies are still able to consistently rank photographs of people from WEIRD societies in facial attractiveness, in ways which correlate with results from participants in developed nations. This indicates that there is a strong innate basis. An element of taste deep in our bones, even if we may inflect it on the margins, or increase or decrease its weight in our calculations of what makes an optimal mate. There may be societies where Lena Dunham’s “thick” physique may be preferred to Bar Refaeli’s svelte profile, but I am skeptical that there would be societies where the former’s facial features would strike individuals as preferable to those of the latter (one might have to correct for Refaeli’s species-atypical hair and eye color, but European norms are pretty widespread outside of small-scale societies now, so that shouldn’t be a major issue). So the question then becomes: is this adaptive?

Evolutionary psychologists have a panoply of ready explanations. They are often grounded in correlations, and then adaptationist logic. For example, women with lower waist to hip ratios (0.7 being the target) have more estrogen, and are more likely to be nubile, and so are more fertile, all things equal. Since being more fertile is going to be a target of selection, a lower waist to hip ratio is going to be a target of selection, because implicitly there is a genetic correlation between estrogen and waist to hip ratio. The problem is that genetic correlations have to be proved, not assumed. Correlations are not necessarily transitive. Just because A has a positive correlation with B and B has a positive correlation with C, does not entail (necessarily) that A has a positive correlation with C.
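To make the non-transitivity point concrete, here is a minimal simulated sketch. The variables A, B and C are purely hypothetical (they are not estrogen, fertility or waist-to-hip ratio), and the coefficients are chosen only so that A and B correlate positively, B and C correlate positively, and yet A and C correlate negatively.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate(n=20000, seed=1):
    """Build three hypothetical traits where correlation is not transitive."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n)]
    noise = [rng.gauss(0, 1) for _ in range(n)]
    # C is weakly *negatively* tied to A, plus independent noise.
    c = [-0.3 * ai + ei for ai, ei in zip(a, noise)]
    # B is the sum of A and C, so it correlates positively with both.
    b = [ai + ci for ai, ci in zip(a, c)]
    return pearson(a, b), pearson(b, c), pearson(a, c)

r_ab, r_bc, r_ac = simulate()
print(f"corr(A,B) = {r_ab:.2f}, corr(B,C) = {r_bc:.2f}, corr(A,C) = {r_ac:.2f}")
```

The same logic applies to genetic correlations: each link in the chain has to be measured, which is what the twin design discussed below attempts to do.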

With that in mind, a new paper looks at facial attractiveness, averageness of facial features, and heritability of both these traits. They use a twin design, with an N of ~1,800. And, they relate it to a comprehensible causal mechanism: mutational load resulting in increased developmental instability. Basically, the more mutations you have, the more likely you are to exhibit facial asymmetry, and therefore facial averageness is a good proxy for genetic quality. It is well known that average faces tend to be rated better looking than non-average faces. This is part of an argument that Geoffrey Miller put forth in , a very fertile work. There is an elegance to it. Unfortunately follow up work over the past ten years is suggesting that this simple model is either wrong, or, everything is a whole lot more complicated.

First, the paper, Facial averageness and genetic quality: Testing heritability, genetic correlation with attractiveness, and the paternal age effect. The abstract gives away the game:

Popular theory suggests that facial averageness is preferred in a partner for genetic benefits to offspring. However, whether facial averageness is associated with genetic quality is yet to be established. Here, we computed an objective measure of facial averageness for a large sample (N = 1,823) of identical and nonidentical twins and their siblings to test two predictions from the theory that facial averageness reflects genetic quality. First, we use biometrical modelling to estimate the heritability of facial averageness, which is necessary if it reflects genetic quality. We also test for a genetic association between facial averageness and facial attractiveness. Second, we assess whether paternal age at conception (a proxy of mutation load) is associated with facial averageness and facial attractiveness. Our findings are mixed with respect to our hypotheses. While we found that facial averageness does have a genetic component, and a significant phenotypic correlation exists between facial averageness and attractiveness, we did not find a genetic correlation between facial averageness and attractiveness (therefore, we cannot say that the genes that affect facial averageness also affect facial attractiveness) and paternal age at conception was not negatively associated with facial averageness. These findings support some of the previously untested assumptions of the ‘genetic benefits’ account of facial averageness, but cast doubt on others.

I’m going to reproduce some of the results from Table 4 below.

          Averageness                   Attractiveness
          Heritability  Non-heritable   Heritability  Non-heritable   Genetic correl  Env correl
Female    0.21          0.79            0.6           0.38            0.11            0.21
Male      0.22          0.78            0.62          0.39            0.11            0.08
Overall   0.21          0.79            0.6           0.4             0.11            0.16

What you see is very modest heritability for averageness, and a decent one for attractiveness. But, there’s no statistically significant evidence that the genetic correlation is there (the confidence intervals are huge around 0.11, from 0 to 0.35). They do state that the environmental correlation passes statistical muster (so common environmental variables might be producing both attractiveness and facial averageness). Please note that a heritability of 0.6 does not mean a correlation of 0.6. The heritability of height is 0.8 to 0.9, but correlation of the trait across siblings is ~0.5. Heritability is the proportion of variation of the trait explained by variation in genes, in the population.
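The gap between heritability and relative-to-relative correlation can be sketched with the textbook additive model. This is an assumption-laden illustration, not the biometrical modelling the paper used: it assumes purely additive genetic effects, random mating, and an optional shared-environment term (here called c2, a hypothetical parameter name). Under those assumptions full siblings share half of the additive variance and identical twins share all of it.

```python
def expected_full_sib_correlation(h2, c2=0.0):
    """Expected full-sibling correlation: half the additive variance plus shared environment."""
    return 0.5 * h2 + c2

def expected_mz_twin_correlation(h2, c2=0.0):
    """Expected identical-twin correlation: all additive variance plus shared environment."""
    return h2 + c2

def falconer_h2(r_mz, r_dz):
    """Falconer's rough heritability estimate from twin correlations."""
    return 2.0 * (r_mz - r_dz)

# A trait with heritability 0.8 and no shared environment:
print(expected_full_sib_correlation(0.8))   # siblings correlate well below 0.8
print(expected_mz_twin_correlation(0.8))
# Going the other way, from hypothetical twin correlations to heritability:
print(falconer_h2(r_mz=0.6, r_dz=0.3))
```

With a heritability of 0.8 the purely additive expectation for siblings is 0.4, so an observed sibling correlation nearer 0.5 would presumably reflect something extra, such as shared environment or assortative mating.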

If you just look at heritabilities, averageness seems to have been under stronger selection than attractiveness all things equal. Usually strong directional selection removes the heritable variation on a trait. The high heritability gives us a clue that there are a lot of ugly people around still, and some of that is just the way they are born. In contrast, there are fewer people with lop-sided faces. These are subjects from a Western society, so I bet the results are going to be different in a high pathogen load environment (my expectation is that heritability will decrease, but perhaps it will actually increase because genetic factors which allow for one to be robust to disease will become more important in explaining variation in the trait).

Finally, in the near future there will be high coverage genomic sequences from many people. If you hit the same marker more than 30 times you can conclude with decent confidence whether it’s a de novo mutation unique to the individual. You can actually check how well mutational load tracks with averageness and attractiveness (each human has <100 de novo mutations, so there’s a lot of inter-sibling variance presumably). At this point I’m moderately skeptical of a lot of the selectionist models, whereas five years ago I’d have thought there would be something there, and it would be easy to discover. And it may be that beauty, like many aspects of culture, is not about adaptation and function in any direct sense, but simply a cognitive side effect. Like what Steve Pinker has stated about music. I don’t really believe that, but we can’t dismiss that position out of hand anymore.
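To get a feel for why sequencing depth matters here, consider a toy sketch. Everything in it is an assumption for illustration: reads are modeled as binomial draws, the calling rule (at least 25% of reads carrying the alternate allele) and the 1% per-read error rate are hypothetical, and real de novo calling works from parent-offspring trios with genotype likelihoods rather than a single threshold.

```python
from math import ceil, comb

def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def het_call_rates(depth, min_alt_frac=0.25, error_rate=0.01):
    """Return (sensitivity, false-positive rate) for a naive threshold caller.

    sensitivity: chance a true heterozygous site passes the threshold.
    false-positive rate: chance a homozygous-reference site passes it
    through sequencing errors alone.
    """
    k = max(1, ceil(min_alt_frac * depth))
    sensitivity = binom_tail(depth, k, 0.5)
    false_positive = binom_tail(depth, k, error_rate)
    return sensitivity, false_positive

sens30, fp30 = het_call_rates(30)
sens5, fp5 = het_call_rates(5)
print(f"depth 30: sensitivity {sens30:.4f}, false positives {fp30:.2e}")
print(f"depth  5: sensitivity {sens5:.4f}, false positives {fp5:.2e}")
```

Under these assumptions the two read-count distributions barely overlap at 30x, while at low depth they do, which is the intuition behind wanting many hits on the same marker.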

All non-Basque Spaniards do have Moorish admixture


I just realized in the post below that I casually stated that pretty much all non-Basque people in Spain have significant ancestry from people who were Muslim at some point since the fall of the Visigothic kingdom. By significant I mean more than ~1%. So not just a genealogical line of descent, but genomic ancestry attributable to a specific historical event. But perhaps I should justify this a bit. The reason is two-fold. First, many people are not totally aware of what’s going on in genetics over the past five years or so. That’s important, because a lot of data has come online. Second, fleshing out the details matters. After all, one might contend that North African signals date to the Roman era, rather than the Moorish period.

This paper in PNAS, Gene flow from North Africa contributes to differential human genetic diversity in southern Europe, is the best survey I know. It establishes a few points. For example, the Basque differ from people from other regions of Spain in that they lack much evidence of North African admixture. The historical and social separation of the Basque country during the Moorish period, and also after the Reconquista, is a pretty good rationale for why this might be. Second, the paper establishes some regional variation in the admixture. There’s more in Andalusia. Roughly, it seems that areas closer to North Africa, but probably more significantly under Muslim rule longer, have more admixture from the Maghreb. Third, a lot of it is too recent to be Roman. Looking at segment length the authors estimate a lower bound of ~300 years. After reading the post from Saturday you should understand why this statistic needs to be handled with caution. Additionally, it is important to note that the gene-sharing between Spaniards and North Africans occurs in cases where the North African population of interest has no European ancestry at all. That is strongly indicative of gene flow from North Africa to Europe, rather than bidirectional dynamics. Finally, it looks to be that pretty much all the very low level Sub-Saharan admixture you can find in Spain is attributable to the Moorish period, because the Berbers and Arabs who arrived had that element in their ancestry due to the ubiquity of the trans-Saharan slave trade.

Of course we need to be careful about over-interpreting this. It looks to be that on the order of ~10 percent of the ancestry is due to migration from North Africa. In my judgement this isn’t really that much, considering that most of Spain was ruled by Muslims for 400 years (Muslim power was in sharp recession by 1200, and the conquest of Granada nearly 300 years later was really just a mopping up expedition). This is likely due to two factors. First, Spain was one of the more populous regions of the Western Roman Empire, and the arrival of the Visigoths did not result in nearly the disruption that occurred elsewhere. Despite the Germanic character of the Visigoths, like southern France to my knowledge Roman culture exhibited some continuity in the peninsula. Second, the vast increase in the number of Muslims in the peninsula occurred as it did elsewhere, through conversion and intermarriage. Three of Caliph Abd-ar-Rahman III’s grandparents were born Christian (his paternal grandmother also gave rise to a prominent line of Christian princes through her second marriage). One source suggests that ~80 percent of the population of the Iberian peninsula in 1100 was Muslim, after a massive wave of conversion over the previous two centuries (this sort of latency, where Islam is an elite religion for the first century or two, is actually typical). Combined with the genetic data, which suggests widespread admixture with a North African element throughout the population, it is highly likely that Spain is one area of the world where the vast majority of the population have many lineages which went from Christian to Muslim and then Christian again (as well as Jews who became Muslim and whose descendants became Christian!).

The point in rehashing all this is that in Michael Cook’s , he repeats a common belief that Islam in particular exhibits a tendency where cultures tend to have an irreversible transition. Once Muslim, always Muslim. He gives Morisco recalcitrance in Spain after the conquest of Granada as evidence, but that is misleading because this was a rump community, and even among Moriscos many converted in the century after the fall of the Muslim kingdom. Spain is one of the best examples that Islam is like any other religion: under concerted pressure and inducement individuals, and more importantly whole communities, switch identities.

Cook does grant that a substantial number of Muslims in China assimilated to a Han identity. This is well attested for elite lineages of Muslims and Jews (from Kaifeng), whose entrance into the mandarin class of scholar-bureaucrats almost always presaged total assimilation. But it is probably at least as true for the vast majority of non-elite Hui, many of whom also shifted toward the Han identity. In the 20th century it was a truism among the Hui, reported without much skepticism, that Han can become Hui (through conversion), but the converse is not true. The problem with this is that the social norms and mores are such that movement from Hui to Han is never going to be widely documented, while a shift in the other direction will be. As it happens, there is now ethnographic evidence from southern China and Taiwan of whole communities which shifted from Hui identity to a Han one, with their Muslim origins being preserved in oral memory, as well as the persistence of customs such as not offering pork on ancestral graves in deference to the religion of these forebears. Over the past few centuries Muslim communities in China proper have become reintegrated into the world-wide Ummah, and undergone several waves of reform which have resulted in conformity with world normative Islam. But before this it seems likely that there was a continuous flow of Hui into the Han population through assimilation, in particular because there are many documented beliefs of Hui in the 18th century which seem to suggest a convergence with Daoism and Pure Land Buddhism.

All this is not to say that Cook’s thesis, and the public perception, does not have an element of truth to it. It is simply that the reality is a little more complex and less supportive after you scratch below the surface.

Genetic addendum: I have my own data sets, which include the 1000 Genomes IBS (Spain) samples, and decided to double check the results above. You can see from the PCA and TreeMix (all of them exhibited the same topology) that something is going on with non-Basque Spaniards. The Italian position is explained by the fact that these are southern Italian samples, and those tend to exhibit affinities to Eastern Mediterranean groups. Please note that I removed all markers with missing calls and outliers as well. There were 130,000 SNPs in the final data.


Open Thread, 8/30/2015

A week ago a friend was asking about my opinion on a long article that was shared on Facebook about ISIS and the nature of Islam. It came up that I don’t talk about religion too much on this blog in the same way I did before 2010 or so. The primary reason is that I don’t have too much to say. But, people who are not familiar with my oeuvre to that great an extent might not be conscious that I’ve written/thought/read a good amount on the topic. For example, someone on Twitter attempted to “educate” me on Christian theology, but I’ve read stuff as diverse as and . Just because I don’t talk about it all the time doesn’t mean I’m not familiar; I just happen to find theology as interesting and insightful as a Christian finds reading .

More critical for my understanding of religion is the of the topic.* In particular, Scott Atran’s is probably the best introduction to the axioms which inform the way I look at religious phenomena. This work is useful because it is encyclopedic in the nature of its disciplinary synthesis (e.g., it engages more deeply with evolutionary explanations than most of the cognitive anthropology literature), and, Atran directly engages alternative and complementary viewpoints such as the neo-functionalism David Sloan Wilson espouses in .

All this means that I am highly skeptical of a central assumption of Michael Cook’s , that religions are concrete things which are constrained and defined by foundational premises and institutional structures. Cook’s work is detailed in fleshing out the thesis. But ultimately the acceptance of the chain of causality rests on prior plausibility of the assumption in your own mind. I have read and accepted a wide range of individual level cognitive results which show that there is little logical constraint on interpretation of religious texts (see ). I do think Cook’s work is an impressive corrective to those who dismiss the importance of religious foundations and text. His scholarship makes a much better case than others, because he brings a wide range of material into contrasting relief.

And when you drill down to the specifics it is easy to problematize. One of Cook’s contentions seems to be that Muslims in particular have very strong asabiyyah in relation to non-Muslims. An illustration of this idea is that once a society becomes Muslim it never switches to another state. Cook admits there are a few cases where this did occur, and in particular highlights the instance of the Moriscos of Spain. But, this actually proves the point of Muslim asabiyyah in the end, because the Moriscos were expelled in the early 17th century due to their indigestibility into the polity of Christian Castile.

There are two objections to this outline of Morisco indigestibility in relation to the Christian polity which became Spain. First, a fair number of Moriscos in fact did assimilate. Some of the most enthusiastic proponents of expulsion were descendants of Moriscos whose own loyalty and standing was threatened by the existence of a large crypto-Muslim population. There is an element of irony here, because the Moriscos were expelled as a people, and some believing Christians were exiled because of their ethnic affiliation as Muslims. Obviously deciding who was, and wasn’t, a Morisco was not cut & dried, though presumably interaction and integration into the community was key. Second, the focus on Moriscos misses the fact that the vast majority of the Muslims who lived in the Iberian peninsula became Christians over the centuries of the Reconquista.

In the author outlines the process of initial toleration, and then assimilation, of the Muslims of northern Spain during the earlier phase of the Reconquista. Before the joint monarchy of Castile and Aragon finally conquered Granada, there had been a centuries long process of conversion, and over the longest time scale, re-conversion, of Muslim populations to Christianity. Focusing on the Morisco populations descended from groups which had had an Islamic identity the longest and most totally probably is not representative. The genetic data make it clear that outside of the far north of Spain there are low levels of admixture likely from people whose ancestry traces to North Africa all across the peninsula.

I would recommend . But with major caveats and cautions. Though that should be true of any book…..

If there is a paper I read this week which blew my mind, it is The Dynamics of Incomplete Lineage Sorting across the Ancient Adaptive Radiation of Neoavian Birds. Basically, the authors seem to argue that there are real difficulties in resolving the phylogenetic origin of most birds right after the extinction of the dinosaurs. The issue seems to be that speciation occurred so rapidly in all directions from a large founding population that it is hard to resolve the genetic signals well into clean bifurcating speciation trees (ergo, the network). As someone who is more personally focused on a microevolutionary scale I wonder about the ubiquity of admixture, introgression, and hybridization, though that is not focused on too much in the paper.

The latest Planet Money podcast is interesting, as it looks closely at Netflix’s human resources policy. Basically, they don’t have any truck with the cant that the firm is a “family.” To a great extent I assume unless you’re a total rube you understand that on a deep level this is propaganda. Most firms will fire you if you are no longer necessary and that is clear. In contrast, with family usually you can’t just get rid of them (giving out a child for adoption and such are exceptions). Netflix takes this to the logical conclusion…but I think the story shows why you need to be careful about taking things to logical conclusions with humans. Netflix can only exist in a broader labor ecosystem where most firms don’t engage in the same practices or promote the ethos so nakedly. Additionally, Netflix’s analogy to a professional sports team, rather than a family, may be telling. There are cases where teams with a lot of talent under-perform because of lack of cohesion. One might predict that in the long term Netflix is going to face a problem because there’s no capital in the bank of goodwill from its employees. At the first moment Netflix looks like it might be headed in the wrong direction or become a marginalized player I predict its employees, its “team members,” will opt for free agency en masse. But does that matter? Is the institutional persistence of a particular firm even a good we should aim for?

* If you are interested in this topic, books of interest/note, in no particular order: , , , , , , , , , , and . These works disagree with each other, and address different, if often overlapping, phenomena. But you kind of need to throw the kitchen sink at this sort of issue.

Admixture beyond the single pulse

Graham Coop’s group has been exploring the implications of more complex models of spatially structured genetic variation and admixture for the last few years. I’ve already pointed to Gideon Bradburd’s SpaceMix preprint, which attempts to differentiate genetic relatedness due to geographic proximity and therefore continuous gene flow, as opposed to an admixture event which is not congruous with spatial position (e.g., the Norwegian Sami have more Siberian ancestry than many groups to their east). Alisa Sedghifar now has a paper out in Genetics, The Spatial Mixing of Genomes in Secondary Contact Zones. Here’s the abstract:

Recent genomic studies have highlighted the important role of admixture in shaping genome-wide patterns of diversity. Past admixture leaves a population genomic signature of linkage disequilibrium (LD), reflecting the mixing of parental chromosomes by segregation and recombination. These patterns of LD can be used to infer the timing of admixture, but the results of inference can depend strongly on the assumed demographic model. Here, we introduce a theoretical framework for modeling patterns of LD in a geographic contact zone where two differentiated populations have come into contact and are mixing by diffusive local migration. Assuming that this secondary contact is recent enough that genetic drift can be ignored, we derive expressions for the expected LD and admixture tract lengths across geographic space as a function of the age of the contact zone and the dispersal distance of individuals. We develop an approach to infer age of contact zones using population genomic data from multiple spatially sampled populations by fitting our model to the decay of LD with recombination distance. To demonstrate an application of our model, we use our approach to explore the fit of a geographic contact zone model to three human genomic datasets from populations in Indonesia, Central Asia and India and compare our results to inference under different demographic models. We obtain substantially different results to the commonly used model of panmictic admixture, highlighting the sensitivity of admixture timing results to the choice of demographic model.

In a stylized fashion what’s going on here is that genome-wide data sets have allowed for the inference of admixture events which usually assume a single pulse of rapid random mating between two extremely diverse populations. This works in a controlled laboratory situation, but is less plausible for humans. There are cases which fit, such as the settlement of Pitcairn by the mutineers from the Bounty, but they’re exceptional (another case might be the admixture you see in some areas of Latin America from Amerindians, where the indigenous groups seem to have disappeared after a few generations, but it turns out that native women were assimilated into the European and African populations in a very short period of time). An alternative scenario is one where two populations come into contact, and admixture takes a longer period of time. In a spatial rendering there’d be a “contact zone” where gene flow might occur in a fashion well modeled as a diffusion process. To give a concrete example of the latter case I will offer the Kalmyk people. The Estonian Biocentre has posted some data from this population, and all of them have varying levels of European admixture. As there is variance it is likely that this admixture did not happen all at once. Rather, once the Kalmyks migrated to Russia three hundred years ago there has been continuous gene flow into the community, as opposed to a frenzy of admixture, after which barriers might be thrown up. The latter scenario actually might be likely to occur in a case where only male Kalmyks migrated, but as it was, this was a full folk wandering, where the tribes evacuated Dzungaria as a whole (I am aware that there were also back migrations, please don’t leave a comment explaining this to me!).

Citation: Moreno-Estrada, Andrés, et al. “Reconstructing the population genetic history of the Caribbean.” (2013): e1003925.

So what happens after an admixture event? As noted in this paper assuming a simple pulse admixture the distribution of ancestry tract lengths and LD decay is exponential. This is a function of the fact that recombination is going to break apart ancestral multi-locus allelic associations as a function of generation time. As an extreme example, the F1 offspring of two very different populations would have alternative ancestry tracts on their paternal and maternal chromosomes. Obviously LD would be very high as well. But as the F1 population randomly mates the LD would be broken apart by recombination, as ancestry tracts would begin to alternate on chromosomal segments. You can see it when you perform ancestry deconvolution on groups such as Puerto Ricans. There are short segments due to old Native American ancestry which entered the population over a narrow period of time which has been chopped up by recombination. In contrast, the African segments have a wider range of block lengths in part because there has been more continuous admixture since the settlement of the island by the Spaniards.

Sedghifar et al. are building an analytical framework to allow one to make inferences which are hopefully true to the more multi-textured manner in which populations actually admix than the single pulse. As the paper is open access I invite readers to peruse the formalism as well as the simulations which were performed to evaluate their framework. It strikes me that this is a definite first-pass, but a necessary one. As noted in the paper, but well known for years, the single pulse admixture models tend to underestimate the dates of mixing (or, more charitably, they pick up the last “pulse”). So, often when I saw a paper giving an admixture estimate, I took that as a floor, and nothing more.
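The underestimation can be illustrated with a small sketch. It is not the method of Sedghifar et al.; it uses the standard single-pulse approximation that admixture LD at genetic distance d (in Morgans) decays roughly as exp(-T d) after T generations, and fits that curve by a log-linear regression. The pulse ages (12, 5 and 50 generations) and the equal weighting of the two pulses are made-up values for illustration.

```python
import math

def pulse_ld(d_morgans, generations):
    """Relative admixture LD at distance d after one pulse (exp(-T*d) approximation)."""
    return math.exp(-generations * d_morgans)

def fit_single_pulse_age(distances, ld_values):
    """Fit log(LD) = a - T*d by least squares and return the implied age T."""
    logs = [math.log(v) for v in ld_values]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_log = sum(logs) / n
    sxy = sum((d - mean_d) * (y - mean_log) for d, y in zip(distances, logs))
    sxx = sum((d - mean_d) ** 2 for d in distances)
    return -sxy / sxx

# Genetic distances from 0.5 cM to 20 cM.
distances = [0.005 * i for i in range(1, 41)]

# Case 1: the data really come from one pulse 12 generations ago.
one_pulse = [pulse_ld(d, 12) for d in distances]
age_one = fit_single_pulse_age(distances, one_pulse)

# Case 2: two pulses, 5 and 50 generations ago, contributing equally.
two_pulses = [0.5 * pulse_ld(d, 5) + 0.5 * pulse_ld(d, 50) for d in distances]
age_two = fit_single_pulse_age(distances, two_pulses)

print(f"one pulse at 12 generations -> fitted age {age_one:.1f}")
print(f"pulses at 5 and 50 generations -> fitted single-pulse age {age_two:.1f}")
```

When the single-pulse model is true the fit recovers the age, but when admixture was spread out the fitted age falls below the average of the real pulse ages, because the old pulse's LD has mostly decayed and the recent pulse dominates the curve.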

In the final section the framework is applied to real data sets. There are two issues that jump out into the foreground. As noted by the authors, the HUGO Pan-Asian data set, which is what you need to use for many maritime Southeast Asian groups, has very few markers. At ~50,000 SNPs it’s really an animal grade set of chip data, not a human grade one (and even for animals they’re going beyond 60K SNP-chips). The second issue is geographical coverage. It strikes me that ideally they’d have transects with more sampling by position. This obviously isn’t something that can be changed right now, so I assume that in the future the situation will improve on the data side and the methods can more robustly be applied.

They compared admixture in India, Southeast Asia, and Central Asia. It seems that their framework did not yield much in India, probably because the admixture patterns are complex and old, and could not be easily retrieved from the data with the few assumptions they had (though that in itself tells you something about the real dynamics). This is really a situation where hopefully ancient DNA will allow researchers to fix some parameters in the future. There are cases where compound pulse admixtures are actually a better model for reality than contact zones and diffusion gene flow across the borders. India may be an instance. For example, the Tamil Brahmins seem to have some indigenous South Indian admixture, but very little variation of this admixture across individuals. That implies that once the admixture occurred, there was a long period where gene flow did not occur due to strict endogamy, else you’d see more variation. In a world unencumbered by social constraints a contact zone model would work well, but South Asia may not be that world.

As expected the secondary contact zone model gave an older date of admixture for Southeast Asia, where Austronesians arrived over the last 4,000 years. Perhaps even too old! They note: “Linguistic evidence suggests that the Austronesian expansion through Indonesia dates to ∼ 4000 years ago (Gray et al. 2009)…Our estimate of timing based on fitting a geographic contact zone (5800 years ago) is much older than dates estimated by single pulse models, but is also considerably older than the Austronesian expansion.” The citation for Gray et al. seems to be this paper, but I’m pretty sure it was meant to be Language Phylogenies Reveal Expansion Pulses and Pauses in Pacific Settlement. In an inter-disciplinary field like this you need to rely on other researchers to complement your own understanding of specific domains. As it happens I am now more skeptical of linguistic phylogenetics than I was, so I don’t put too much stock in the fact that the date inferred using their methods was much older than what the linguists believe. Rather, I’d put more of an emphasis on material remains and archaeology, though dating and provenance can be hard to pin down on some occasions.

The last empirical illustration has to do with Central Asia, and I have a bit to say about this. The authors seem to be concerned that their signal of admixture is much older than the period of the Mongol invasions, ~700 years ago. Other studies, based on a pulse admixture model, pin this exact date, and others do not. The problem I have with this is that the real demographic history actually aligns well in my opinion with the dates that are given in this paper. I don’t think they needed to take the Mongol model nearly as seriously as they did. But, I doubt that any Central Asianists peer reviewed this for Genetics, so a lot of weight was given probably to the older papers in genetics, where the Mongol angle is always played up. The reality is that there was a massive continuous movement of Turkic peoples from about 500 A.D. from greater Mongolia down into the Persianate world of Central Asia. While in India a contact zone model may not work well due to a history of endogamy, the situation is more amenable to that in Central Asia. I think further extensions of this framework in Inner Asia will be fruitful and necessary.

Though the authors here focus on human data sets, presumably because there was data and we know something about human demographic history, the secondary contact zone model formalized in this paper may be more useful with populations of animals and plants, where social constraints don’t exist to enforce endogamy (unless you count reinforcement!). Also, it probably will be useful in island situations, such as in Japan, where the migration patterns are probably defined by a single admixture followed by a wave of advance which likely had secondary contact zone dynamics (the Ainu have Yayoi ancestry).

Citation: Sedghifar, Alisa, et al. “The Spatial Mixing of Genomes in Secondary Contact Zones.” Genetics (2015).

Mind the drift lest your inference go off path


The bar plot above shows the Kalash people in yellow as a very distinctive group among a panoply of Eurasian populations. The figure is from a Rosenberg lab paper. There’s nothing aberrant about this result; you can generate this plot pretty easily by using any motley set of markers. The Kalash are distinctive. But it is important to keep the distinction in perspective. They’re not a relic population, remnants of an ancient race lost to time and memory. Rather, they happen to be a highly diverged northwest South Asian group. Their divergence is due to a small isolated breeding population which has been highly endogamous.

What this means is that the Kalash have a low long term effective population size and have been more strongly impacted by drift in their allele frequency spectra. Small populations are subject to great allele frequency volatility from generation to generation, and tend to lose a lot of their genetic diversity, and also fix many alleles. One consequence of this is genetic inbreeding and a higher recessive disease load. In populations with a lot of drift, selection is less effective at removing deleterious alleles, and if a recessively expressed variant is fixed, then that’s that.
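To make the point about volatility concrete, here is a toy Wright-Fisher simulation. It is only a sketch under textbook assumptions (a single neutral biallelic locus, random mating, constant size), and the population sizes, generation count, and function names are made up for illustration rather than taken from any Kalash data.

```python
import random

def wright_fisher(pop_size, generations, start_freq=0.5, rng=None):
    """Allele frequency trajectory for one neutral biallelic locus.

    pop_size is the number of diploid individuals (hypothetical values below).
    """
    rng = rng or random.Random()
    n_copies = 2 * pop_size  # diploid population: 2N gene copies
    freq = start_freq
    trajectory = [freq]
    for _ in range(generations):
        # Each of the 2N copies in the next generation is drawn
        # at random from the current allele frequency (binomial sampling).
        count = sum(1 for _ in range(n_copies) if rng.random() < freq)
        freq = count / n_copies
        trajectory.append(freq)
    return trajectory

def fraction_fixed_or_lost(pop_size, generations, n_loci, seed=0):
    """Share of independent loci that ended at frequency 0 or 1."""
    rng = random.Random(seed)
    absorbed = 0
    for _ in range(n_loci):
        final = wright_fisher(pop_size, generations, rng=rng)[-1]
        if final in (0.0, 1.0):
            absorbed += 1
    return absorbed / n_loci

# Illustrative sizes only: a tiny isolate versus a larger population.
small = fraction_fixed_or_lost(pop_size=10, generations=50, n_loci=100)
large = fraction_fixed_or_lost(pop_size=500, generations=50, n_loci=100)
print("fixed or lost, N=10: ", small)
print("fixed or lost, N=500:", large)
```

Starting every locus at 50%, the small population should lose or fix most of its variants over the same span in which the large one has barely moved, which is the loss of diversity described above.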

But another major consequence of strong drift, which leaves everyone in a population quasi-related for all practical purposes, is that when you attempt some sort of clustering they fall out as a very natural grouping. They’re low hanging fruit. When you plot populations on a PCA you normally remove closely related individuals, because they will naturally form a tight cluster, and overwhelm the between population variation you’re looking for, hogging the top dimensions and making them distinct from non-relatives. Inbred groups like the Kalash do the same thing, if less boldly so. If you can keep this in mind it will allow for proper inferences about the natural history of a population. If you can’t, then you will be confused.

This is a preface to a nice paper in PLOS GENETICS, Evidence for a Common Origin of Blacksmiths and Cultivators in the Ethiopian Ari within the Last 4500 Years: Lessons for Clustering-Based Inference, which reports that an earlier publication, Ethiopian Genetic Diversity Reveals Linguistic Stratification and Complex Influences on the Ethiopian Gene Pool, did not control for the effect of drift due to endogamy and so came to the wrong conclusion.* I won’t repeat the methods they used, as the paper is open access. But, they account for drift much better, and show that the divergence of a presumably genetically distinct caste had much more to do with increased drift due to endogamy than it did with the separation of the two lineages at some time in the distant past. Remember, drift builds up between any pair of lineages which separate. But if the population size in one of the daughter lineages is very low, then drift will shift it away from the ancestral frequency spectra much faster, producing an artificially “long branch.”
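The “long branch” effect can be put in rough numbers with the textbook expectation for drift on a single branch, F = 1 - (1 - 1/(2Ne))^t. This is not the method used in the Ari paper, just a back-of-the-envelope sketch; the effective sizes, the 150 generations, and the function names are all hypothetical.

```python
import math

def branch_drift(ne, generations):
    """Expected drift on one branch: F = 1 - (1 - 1/(2*Ne))**t."""
    return 1.0 - (1.0 - 1.0 / (2.0 * ne)) ** generations

def apparent_generations(drift, assumed_ne):
    """Generations needed to build up `drift` if the size had been `assumed_ne`."""
    return math.log(1.0 - drift) / math.log(1.0 - 1.0 / (2.0 * assumed_ne))

t = 150  # hypothetical number of generations since the two lineages split

# Hypothetical effective sizes: one large daughter lineage, one small endogamous one.
drift_large = branch_drift(ne=5000, generations=t)
drift_small = branch_drift(ne=250, generations=t)

# If you wrongly assume the endogamous group was as large as its neighbours,
# its excess drift gets read as extra time since the split.
naive_t = apparent_generations(drift_small, assumed_ne=5000)

print(f"drift on large branch: {drift_large:.4f}")
print(f"drift on small branch: {drift_small:.4f}")
print(f"true split: {t} generations, naive reading: {naive_t:.0f} generations")
```

With these made-up numbers the small lineage looks as if it split off roughly twenty times earlier than it did, which is the kind of error you invite when you don’t control for endogamy.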

The Kalash and the Ari are extreme cases of this. But they illustrate the general principle that we should be cautious about making inferences when we don’t control for the vicissitudes of demographic history, which may skew the power of our methods to see in a fair and balanced manner.

* There’s an overlap of authors across the two publications, showing that scientists do and can overturn their own conclusions if new data or analysis can persuade them.

As man is, the dog shall be

When is a jackal a wolf? All the time apparently. At least according to a persuasive new paper, Genome-wide Evidence Reveals that African and Eurasian Golden Jackals Are Distinct Species.

First, let’s put this in context. Canids are a big deal. They’re big social mammals whose distribution and speciose character have undergone big changes across the Pleistocene. Sound familiar? Is it any surprise that one of their kind is our “best friend”? And, according to the anthropologist Pat Shipman the symbiotic relationship between dog and man is responsible for the success of our lineage in the evolutionary war of all against all. About six months ago that thesis would have seemed a stretch, as the origin of dogs does not date until almost the Holocene according to most genetic scholarship (though the paleontologists have found rather old suggestive skulls). That is, tens of thousands of years after modern humans replaced other lineages. But ancient DNA suggests problems with the calibration of earlier work, which may have dated their divergence from wolves too recently. That and the fact that the emergence of dogs as a distinct group of canids might be concurrent with the arrival of modern humans in Eurasia make Shipman’s thesis at least feasible, if not probable. And note that I stated divergence from wolves, not derivation. It turns out that dogs are a sister lineage to Palearctic wolves, not derived from them. As observed in this paper extant lineages of wolves are genetically rather homogeneous, and seem to have diversified relatively recently, within the last 20,000 years, on the order of 10 to 20 thousand years after the last common ancestor of extant wolves and dogs.

Where do jackals play into this? The golden jackal has a distribution which covers both Eurasia and Africa. The species was determined morphologically. In other words, they look similar across their range. But sometimes you can’t judge a book by its cover. As an obvious example, most people would think on superficial inspection that a hyrax was a rodent. But a close examination of anatomical details indicated a relationship to elephants to classical taxonomists, which has been validated by DNA. But, as the paper above states plainly in the title, the DNA here contradicts inferences made from morphology. Wolves and dogs, and African golden jackals, form a monophyletic lineage, to which Eurasian golden jackals are an outgroup! This determination was achieved through mtDNA analyses, as well as phylogenetic reconstruction from specific genetic regions, and genome-wide comparisons on millions of polymorphisms.

But wait there’s more! One major difference between the example above of the hyrax vs. elephant and jackal vs. wolf is that the phylogenetic distance in the latter case is far smaller across the tips of the branch. That probably explains why morphological characters were not sufficient to discern the shared ancestry and derived characteristics of the wolf and the African jackal, as opposed to the Eurasian jackal. And, a corollary to this is that hybridization between these lineages is possible. In other words, this isn’t a phylogenetic tree, it’s a phylogenetic graph! Using D-statistics the authors show that there has been a fair amount of gene flow between Eurasian wolves and Eurasian jackals. And, in particular a lot of admixture from the Eurasian jackal to the dingo and basenji breeds.
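For those who haven’t run into D-statistics (the ABBA-BABA test), here is a minimal sketch of the idea using the standard frequency-based form. It is not the authors’ pipeline: the function name and toy site data are invented for illustration, and real analyses assess significance with a block jackknife, which is omitted here.

```python
def d_statistic(sites):
    """Frequency-based ABBA-BABA D for populations (P1, P2, P3, outgroup).

    `sites` is an iterable of (p1, p2, p3, p_out) derived-allele frequencies.
    D near 0 is what a clean tree ((P1, P2), P3) predicts; D > 0 suggests
    excess allele sharing between P2 and P3, D < 0 between P1 and P3.
    """
    abba = 0.0
    baba = 0.0
    for p1, p2, p3, p_out in sites:
        # ABBA: derived allele shared by P2 and P3 but not P1
        abba += (1 - p1) * p2 * p3 * (1 - p_out)
        # BABA: derived allele shared by P1 and P3 but not P2
        baba += p1 * (1 - p2) * p3 * (1 - p_out)
    if abba + baba == 0:
        raise ValueError("no informative ABBA/BABA sites")
    return (abba - baba) / (abba + baba)

# Toy data (made up): one BABA site and one ABBA site cancel out...
tree_like = [(1, 0, 1, 0), (0, 1, 1, 0)]
# ...while an extra ABBA site tilts sharing toward P2 and P3.
with_gene_flow = tree_like + [(0, 1, 1, 0)]

print(d_statistic(tree_like))
print(d_statistic(with_gene_flow))
```

Under a pure tree, incomplete lineage sorting should produce the two discordant patterns equally often, so a consistent excess of one over the other is the signature of gene flow.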

Is this starting to sound a bit familiar? As population genomics has increased coverage of human populations, modern and ancient, as well as increasing marker density and accuracy, first approximation coarse phylogenetic trees have given way to threads of gene flow edges tracing their way across the thick branches. The trees have given way to myriad graphs which force us to make more subtle our understanding of the genetic background of our own lineage. I see no reason why the same will not be true for large mammals, or, frankly, an innumerable number of clades.

In the near future sequencing will be ubiquitous in ecological and systematic studies. At the coarsest big picture scale we’ll still see a confirmation of the tree of life as it’s classically envisioned, exploding outward from node to node, in subdivisions of clean monophyletic lineages, pruned by extinction, diversified by drift and selection. But as you focus in closely the bifurcations will turn in on themselves or thread together in a tangle, as the branches begin to be stitched together by gene flow. Look even closer and you’ll see that even within a young species, like humans, our local geographic pedigrees also collapse in on themselves, and tangle and coalesce down to a finite set of individuals, rather than the infinite space of genealogical possibilities.

Communism was an absolute disaster, and its shadow haunts us today

Japan orange, Taiwan navy, South Korea green, China light blue

Update: On Twitter it came to my attention that some think that this post is about growth. Actually, my point is that the Communist period, and Mao’s period of domination, with the Great Leap Forward and the Cultural Revolution, probably are huge decrements to utility over the 20th century which the Chinese are now just compensating for. I think a KMT China, even if it unified less quickly and thoroughly than Communist China did, would probably have resulted in a far more prosperous China far earlier than in our “timeline.” Perhaps not as prosperous as South Korea, and definitely not Japan, but still quite prosperous over the past three generations in comparison to Communist China when state socialism was the dominant motor of the economy. Ergo, look not at the growth itself but at the “area under the curve” from 1950 on.

Organized international Communism was responsible for on the order of tens of millions of deaths in a direct and concerted fashion, conservatively estimated. It also resulted in decades of repression for those who lived under it, but did not die under it. It fell with the Soviet Union, and today post-Communist (e.g., Russia) and quasi-Communist (e.g., China) nations are trying to move on beyond what was by and large a failed experiment in social engineering, with the failure resulting in massive levels of mortality and reduced life satisfaction on the part of those who lived under Communist regimes.

But can we move on? I have noted before that over the past generation in the aggregate Chinese economic development has resulted in the greatest reduction in poverty in the history of the world. With the economic crisis which is starting to afflict China, in all likelihood a deceleration from the very rapid growth phase induced by increased labor and capital inputs is upon us, and people are wondering about the long term trajectory of the nation. The problem is that China may grow old before it grows rich. The Chinese total labor force already peaked a few years ago. Over the next few decades its dependency ratio will shift in a direction similar to Japan’s. I am hopeful that the Chinese can meet their demographic challenges, and there are those who are optimistic. But we really don’t know.

And yet it has been brought to my attention that one could argue the Communist period in China is the cause of our current predicament. Compare the wealth trajectories of South Korea and Taiwan to the People’s Republic of China. It may be that for various reasons (e.g., Japanese investments in Korea and Taiwan, as well as differences between China’s Han population and the Fujianese preponderant in Taiwan) China under a non-communist regime would never have been as wealthy as South Korea or Taiwan are today. But does anyone doubt that China would be wealthier far earlier without the convulsions of the Great Leap Forward, Cultural Revolution, and grinding poverty of the 1970s? A billion people experienced deprivation due to the miscalculations of elite intellectuals in the mid-20th century, when Communism fused with nationalism was on the march. That’s behind us. But the late economic start for China is something we continue to live with today. We might have avoided this problem of China growing old before it grows rich, if it had a 30 year head start toward entrance into the modern economy. The world might have been a very different place…. (in fact, a best case scenario is that a dynamic China would have prodded India’s Permit Raj to liberalize earlier than the 1990s).

Desktop Linux will never happen for the masses

Update: I think Richard Stallman left a comment on my blog!!! OMG.

I remember very precisely that it was in the spring of 2008 that I finally transitioned toward being a total desktop Linux user. Basically I’d been in Linux for a few days…forgotten, and tried to watch something on Netflix streaming. I then realized I wasn’t in Windows! Now that Netflix works on Ubuntu I don’t really use Windows at all. I still have a dual-boot notebook, but I have two desktop computers that are Linux-only machines.

Well, it looks like I’m somewhat of an outlier. I think the rise of Mac utilization among nerds over the past 10 years has really had an effect. Since you can go into the terminal on a Mac it removes a lot of the advantage of Ubuntu, which after all is still somewhat less “turn-key” than Windows or Mac OS.

Then of course there’s Android. So in a way Linux has won. Just not in the way people were imagining in the mid-2000s.