Why joint families will die

In response to the post below on joint families. First, a personal question: is it my cultural bias that joint families are fertile incubators of sexual abuse of young females by male relatives? Mind you, there are disgusting as fuck fathers who rape their daughters, but it seems that within the nuclear family the probability of intra-familial sexual abuse is minimized. I don’t have much personal experience with joint families, but when I dig beneath the vibrant and traditional surface there seems to be an undercurrent of perverted uncles and lecherous cousins.

Second, joint families are conditional on particular demographic structural priors. In a world where average sibling cohorts are 3 or lower, many people won’t have siblings with whom they can live. The “circle of cousins” scenario is going to be less common.

Addendum: I am not one who encourages laissez faire in the comments. If you expose me to idiocy I will ban, even if you have posting privileges on this weblog.



Using your 23andMe data: exploring with MDS

Note: please read the earlier post on this topic if you haven’t.

The above image is from 23andMe. It’s from a feature which seems to have been marginalized a bit with their ancestry composition. Basically it is projecting 23andMe customers on a visualization of genetic variation from the HGDP data set. This is actually a rather informative sort of representation of variation. But there has always been an issue with the 23andMe representation: you are projected onto their invariant data set. In other words, you can’t mix & match the populations so as to explore different relationships. The nature of the algorithm and representation produces strange results, so varying the population sets is often useful in smoking out the true shape of things.

With the MDS feature I wrote about yesterday you can now compute positions with different weights of populations and mixes. This post will focus on how to manipulate the overall data set. You should have PHYLO from the earlier post. Open up the .fam file. It should look like this:

Malayan A382 0 0 1 -9
Paniya D36 0 0 1 -9
BiakaPygmies HGDP00479 0 0 1 -9
BiakaPygmies HGDP00985 0 0 1 -9
BiakaPygmies HGDP01094 0 0 1 -9
MbutiPygmies HGDP00982 0 0 1 -9
Mandenkas HGDP00911 0 0 1 -9
Mandenkas HGDP01202 0 0 1 -9
Yorubas HGDP00927 0 0 1 -9
BiakaPygmies HGDP00461 0 0 1 -9
BiakaPygmies HGDP00986 0 0 1 -9
MbutiPygmies HGDP00449 0 0 1 -9
Mandenkas HGDP00912 0 0 1 -9
Mandenkas HGDP01283 0 0 1 -9
Yorubas HGDP00928 0 0 2 -9

And so forth. PHYLO has 1,500+ individuals. This is a bit much, which is why the --genome command took so long. To ask particular questions it is often useful to prune the population down. I have a friend who is 1/4 Filipino who is curious as to whether his ancestry was more Chinese or native Filipino. How to answer this?

– You want a range of East Asian populations, north to south.

– You want a good out group. I’ll use the Utah whites.

All you need to do is go through the .fam file and keep only those lines you want, and put them into a new file, keep.txt. Then you run this command:

plink --noweb --bfile PHYLO --keep keep.txt --make-bed --out PHYLONARROW
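Picking lines out of a 1,500-line .fam file by hand gets tedious. Here is a minimal Python sketch of the same filtering step, assuming the family id column holds the population label as in the listing above. Plink's --keep only needs the family id and individual id on each line, so that is all the sketch writes. The function names are my own, and the population label in the example is taken from the sample listing; check the labels actually present in your .fam file before relying on any of them.

```python
def select_fam_lines(fam_lines, populations):
    """Return 'FID IID' strings for individuals whose family id is wanted."""
    wanted = set(populations)
    keep = []
    for line in fam_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        fid, iid = fields[0], fields[1]
        if fid in wanted:
            keep.append(f"{fid} {iid}")
    return keep


def write_keep_file(fam_path, populations, out_path="keep.txt"):
    """Read a .fam file and write the matching individuals to out_path."""
    with open(fam_path) as fam:
        keep = select_fam_lines(fam, populations)
    with open(out_path, "w") as out:
        out.write("\n".join(keep) + "\n")
    return len(keep)


# A few lines copied from the .fam listing above, used as a toy input.
sample = [
    "Malayan A382 0 0 1 -9",
    "Paniya D36 0 0 1 -9",
    "Yorubas HGDP00927 0 0 1 -9",
    "Yorubas HGDP00928 0 0 2 -9",
]
print(select_fam_lines(sample, ["Yorubas"]))
```

On the real data you would call something like write_keep_file("PHYLO.fam", [...]) with the list of East Asian populations plus your out group, and then run the plink command above.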

So I’ve now made a new pedigree data set which is a subset of the original. Next I merged my friend’s and my daughter’s genotypes into this data set. What if I wanted to remove some individuals, for example, the ones in keep.txt? You do it like so:

plink --noweb --bfile PHYLO --remove keep.txt --make-bed --out PHYLOAFEWGONE

With --keep and --remove, and making files drawn from the .fam file(s), you can customize your own data set for your own purposes. Again you want to produce an MDS, so run:

plink --noweb --bfile PHYLONARROW --genome

plink --noweb --bfile PHYLONARROW --read-genome plink.genome --mds-plot 6

This time --genome will run very fast, because there are far fewer individuals. Here is my plot of the result (my friend is “RF,” my daughter is “RD”):

Note that RF is aligned straight toward the “Dai” population, an ethnic group from South China, but not Han (they are related to the Thai). It seems plausible that my friend is of mixed Chinese and Filipino background. My daughter’s minimal East Asian ancestry is indeed Southeast Asian, and this is clear from this plot, as she is shifted further toward the Cambodians (this may be due to South Asian affinities as well).

The point is not to rely on one plot, but to generate many so as to explore the possibilities, and develop an intuition.
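If you want something a little less impressionistic than eyeballing which cluster a point is “aligned toward,” you can rank populations by how far their average position sits from a given individual. The sketch below does that in Python, assuming plink.mds has a header row followed by FID, IID, SOL and then the coordinate columns. The function names and the toy rows are made up for illustration, and distance in the first couple of dimensions is only a crude summary, not an ancestry estimate.

```python
import math
from collections import defaultdict


def parse_mds(lines):
    """Parse plink.mds text into (fid, iid, [c1, c2, ...]) tuples."""
    rows = []
    header_seen = False
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if not header_seen:
            header_seen = True  # first non-empty line is the header
            continue
        coords = [float(x) for x in fields[3:]]  # skip FID, IID, SOL
        rows.append((fields[0], fields[1], coords))
    return rows


def nearest_populations(rows, sample_iid, dims=2):
    """Rank population centroids by Euclidean distance from one individual."""
    target = None
    sums = defaultdict(lambda: [0.0] * dims)
    counts = defaultdict(int)
    for fid, iid, coords in rows:
        if iid == sample_iid:
            target = coords[:dims]
            continue  # do not let the sample pull its own centroid
        for i in range(dims):
            sums[fid][i] += coords[i]
        counts[fid] += 1
    if target is None:
        raise ValueError(f"individual {sample_iid!r} not found")
    ranked = []
    for fid, total in sums.items():
        centroid = [s / counts[fid] for s in total]
        ranked.append((math.dist(target, centroid), fid))
    return sorted(ranked)


# Toy input with invented coordinates, only to show the call.
toy = [
    "FID IID SOL C1 C2",
    "PopA a1 0 0.0 0.0",
    "PopA a2 0 0.2 0.0",
    "PopB b1 0 1.0 1.0",
    "me RF 0 0.1 0.1",
]
for distance, population in nearest_populations(parse_mds(toy), "RF"):
    print(population, round(distance, 3))
```

Rerunning this on each pruned data set you generate is a quick way to see whether the ordering of nearby populations is stable or an artifact of one particular mix.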

Using your 23andMe data in Plink

With the recent $99 price point for 23andMe many of my friends have purchased kits (finally!). 23andMe’s interpretive results are pretty rich now, but there are still things missing. There are plenty of third party tools you can use, but I know some people might want to do their own data analysis. There are many ways you could go about this, but I want to put up some posts on DIY genomic data analysis to make the learning curve a little less steep, and get people started. Motivation to actually begin going down this road is a big issue, but I think once you get over the hump it gets a lot easier.

First, you need Plink. It is really preferable that you work on a Mac or in Linux to engage in heavy duty analysis, but in this post I’ll assume you are working on the Windows platform. Again, the point here is to make this accessible. Download Plink if you don’t have it, and extract it wherever you like.

Plink is a command line tool, which means that you need to get into its folder using the old MS-DOS interface. So use the cd command to get into that folder. Here is a screenshot of my shell:

The selection “plink --noweb --bfile PhyloF --genome” is a command that I entered. It is not part of the directory structure. If you don’t know about the cd command, please see the Wikipedia entry. It’s really just a simple way to step through the directory structure of your files and folders.

Now you have Plink. We need to put your 23andMe data into pedigree format. Additionally it would be convenient to have other reference data sets. Go here. You now need to click the ZIP option. That will download a 74 MB zip containing all the files you see listed to the left. Most of that is in the two zip files, which are pedigree file data sets that I have provided for your future use. More on that later. First you need to use “CONVERT_23AME_PED.pl”. This is a Perl script which takes the 23andMe text file, and converts it to pedigree format which Plink can use. You need to have Perl to use this script.

If you are on Windows you need to get ActivePerl. Download it. Again you have to open the command prompt and go into the appropriate folder. On my computer (this is the first time I’m using Perl on Windows in 10 years, the sacrifices I make for the readership of this weblog!) it is in the C:\ directory, so you probably have to move “up” the directory tree twice by typing “cd ..” (if you do this you’ll see what I mean). Once you are in the Perl directory you need to go into the bin directory. Remember to move the Perl script into the Perl directory. Here is a screen of what I get when I try and run the Perl script without any parameters:

Basically there needs to be a file for the script to process. You should have a 23andMe text file, your raw data. It will start like so: “genome_”. If you don’t have it, go into your account, and click “Browse Raw Data.”  On this page there will be options to download various peoples’ data if you have multiple accounts. It will download whoever is selected in your profile (for most it will be just one person of course).

Now you need to just select the button and enter your password. An 8 MB zip file will come down from the server. Put it into your Perl/bin folder by extracting it. Do not try and process the zip file! Once in there you now add it as your first parameter. I’d rename it something short and sweet since you’ll be typing it in. You don’t need to put a unique id parameter in, but I would if I were you. Try “me.” And “me” for the family id. At some point you’ll do more sophisticated things and need less silly ids, but not right now.

Here’s a screenshot of me running the Perl script with my own data (I renamed the text file). If the file name isn’t recognized make sure that you didn’t add the file extension within Windows, that might confuse it (e.g., for razibdata.txt if that’s what you see in the directory, you’d have to enter in razibdata.txt.txt in the parameter value since the extension is hidden):

There are two output files. In my case they are razibdata.ped and razibdata.map. As you can see they are named from your original file. You need to move both into the Plink directory. The .ped file has your individual data, the first half a dozen columns being the same as the parameters you may, or may not, have entered above. But it is very large because the whole line is filled out with your 23andMe genotype. The .map file basically has the information about the SNPs. These are both text files, and unwieldy.  You need to make it into a binary file. At the end of this there are three new files of the same name with extensions .bed, .bim, .fam:

You can see a lot of information. Most of it is not relevant to you, but note the number of SNPs. So now you have a pedigree file! Great. What do you do with it? Lots of stuff. You can look at the Plink documentation. Because the .bed file is a binary, never open it. The .bim has SNP info. You shouldn’t open this. On the other hand when you merge data sets .fam is useful. It’s a text file with all your individual and family id information. In this case with one file it isn’t informative, though you could change the id by editing the .fam file.

One thing you can do with just one individual is look for runs of homozygosity. The command is:

plink --bfile mydata --homozyg

You enter your binary pedigree file name, without the extension. Observe that now we are using --bfile instead of --file. Many commands will be bCommand instead of just Command if you are using binary files instead of the conventional ones. Binary files are smaller and the commands finish much faster, so use them! The output files, unless you use the --out option at the end to define them, usually begin with plink. So above you have plink.hom. It has some interesting information about the runs of homozygosity, but it is probably not too illuminating unless you suspect you are inbred!
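If you would rather have a couple of summary numbers than scroll through plink.hom, a short Python sketch like the one below will do. It assumes the file has a header row that includes a column named KB holding each run's length in kilobases; the column is looked up by name rather than position, so check the header of your own output. The function name and the toy rows are invented for illustration.

```python
def summarize_roh(lines):
    """Return (number of runs, total length in kb, longest run in kb)."""
    kb_col = None
    lengths = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if kb_col is None:
            kb_col = fields.index("KB")  # header row: find the length column
            continue
        lengths.append(float(fields[kb_col]))
    if not lengths:
        return 0, 0.0, 0.0
    return len(lengths), sum(lengths), max(lengths)


# Toy rows with invented values, laid out like a plink.hom file.
toy_hom = [
    "FID IID PHE CHR SNP1 SNP2 POS1 POS2 KB NSNP DENSITY PHOM PHET",
    "me me -9 1 rs1 rs2 1000000 2500000 1500.0 120 12.5 1.0 0.0",
    "me me -9 4 rs3 rs4 5000000 7500500 2500.5 200 12.5 0.99 0.01",
]
print(summarize_roh(toy_hom))
```

With real output you would pass open("plink.hom") in place of the toy list.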

Ultimately what I want you to do by the end of this is compute an MDS with your own data against a reference set. That’s PHYLO in the data I’ve provided. It has 99,000 SNPs that overlap with 23andMe, and 1,500 individuals. I’ve altered the .fam file so that all the family ids are recognizable as populations. This will make analysis of the output easier for you. First you need to merge the files. It will be useful for you to prune your data set down, since you have a lot of extra SNPs.

Assuming you’ve extracted PHYLO out of the zip you downloaded, here is my command writing out the list of SNPs within PHYLO:

You can see from reading this that this data set has ~99,000 SNPs. I pruned it so that it ran quicker for phylogenetic analysis. This is more than a sufficient number for most analyses. What you want to do next is create a copy of your own data which doesn’t have so many SNPs, so you can merge them well. Because I created this data set I can tell you that all of the above SNPs are probably in your 23andMe file. With the commands above there is a file, plink.snplist, which you will use to filter your data set.

Here’s how to do it:

Now we’ve got it ready to merge. I will warn you that this takes forever on Windows! No idea why. Also, Windows tends to do strange things with the file extensions. If Plink tells you that a .fam does not exist, look to the file extension. If you label something as something.fam, it might actually be something.fam.fam. In any case, here’s how you merge:
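For anyone following along without the screenshots, the three steps described so far (write out PHYLO's SNP list, filter your own data down to it, merge) can be sketched as plain command lines. The Python below only assembles and prints them; it does not run Plink. This is a reconstruction from the description rather than the exact commands in the screenshots: the prefix razibdata is the example name used earlier, while the output names razibdata_pruned and merged are placeholders I made up.

```python
def plink(*args):
    """Build a Plink command line; --noweb skips the online version check."""
    return ["plink", "--noweb", *args]


def merge_steps(mine="razibdata", ref="PHYLO"):
    """Return the three command lines as lists of arguments.

    mine and ref are binary file set prefixes (.bed/.bim/.fam).
    The '_pruned' and 'merged' output names are arbitrary placeholders.
    """
    pruned = mine + "_pruned"
    return [
        # 1. write the reference SNP ids to plink.snplist
        plink("--bfile", ref, "--write-snplist"),
        # 2. keep only those SNPs in your own data
        plink("--bfile", mine, "--extract", "plink.snplist",
              "--make-bed", "--out", pruned),
        # 3. merge the pruned file set into the reference
        plink("--bfile", ref,
              "--bmerge", pruned + ".bed", pruned + ".bim", pruned + ".fam",
              "--make-bed", "--out", "merged"),
    ]


for command in merge_steps():
    print(" ".join(command))
```

Printing first and pasting each line into the prompt yourself keeps you close to what Plink is actually doing, which helps when a step fails.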

This is going to give you lots of warnings. Often this won’t matter, but sometimes it will tell you that you might need to “flip” one of the files. Try flipping it. If it still doesn’t work I would remove the SNPs causing problems. Something like this:

Honestly you might have to do a lot of things to get data sets to merge. But this particular combination of 23andMe genotype and PHYLO shouldn’t be too bad. Let’s assume that your merge worked. What do you want to do? One thing that might be interesting is an MDS plot (it’s like a PCA plot).

First you run the genome command, which takes forever to finish. It might be best if you did this before you go to sleep, and just check in on it in the morning. The genome command will produce an output that you’ll use next.

Notice the input file. That was generated in the previous step. The value 6 is a parameter that defines how many dimensions you want to output. My experience with this is that it doesn’t take too long, so I go for 6 at least. The final result of this is that you have a plink.mds file with an ordered list of family and individual ids, along with positions for 6 dimensions. It should be straightforward to import this into Excel, and then plot your MDS, emphasizing your own position. Since I can no longer use Excel I couldn’t be bothered to figure out how to plot my own position, but the distribution should be familiar.
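Since plink.mds is padded with spaces rather than split by commas or tabs, spreadsheet imports can be fiddly. A small Python sketch that rewrites it as CSV, and adds a column flagging your own row so you can plot it as a separate series, might look like this. The function name, the HIGHLIGHT column and the toy rows are my own inventions; the only assumption about the file is a header row followed by whitespace-separated values with the individual id in the second column.

```python
import csv
import io


def mds_to_csv(mds_lines, out, highlight_iid=None):
    """Write plink.mds rows to out as CSV; return the number of individuals."""
    writer = csv.writer(out, lineterminator="\n")
    header_seen = False
    count = 0
    for line in mds_lines:
        fields = line.split()
        if not fields:
            continue
        if not header_seen:
            writer.writerow(fields + ["HIGHLIGHT"])
            header_seen = True
            continue
        # fields[1] is the individual id (IID)
        flag = "yes" if fields[1] == highlight_iid else "no"
        writer.writerow(fields + [flag])
        count += 1
    return count


# Toy input with invented coordinates, written to an in-memory buffer.
toy_mds = [
    "    FID       IID    SOL           C1           C2",
    "    Dai HGDP01307      0    0.0100000   -0.0200000",
    "     me        me      0    0.0300000    0.0400000",
]
buffer = io.StringIO()
mds_to_csv(toy_mds, buffer, highlight_iid="me")
print(buffer.getvalue())
```

For the real file you would open plink.mds for reading and a new .csv file for writing, then sort or filter on the HIGHLIGHT column in the spreadsheet.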

That’s about it for now. I’ll put up another post focusing less on phylogenetics, using the HapMap data set that I provided. I don’t know if I can continue to do this in Windows, but hopefully this illustrated how easy (if tedious) most of this is.

Rome: who we were and who we are

Bryan Ward-Perkins in The Fall of Rome: And the End of Civilization spends a great deal of time on the archaeology of the Classical and post-Classical world. But, he also devotes only somewhat less space to the historiography of the study of the Roman Empire, and Late Antiquity. That is because the study of the past is not just the study of the past, but it is the study of the concerns and values of the present. We look through the dark mirror to the past, and in it we see our own outlines. Similarly, science fiction which purports to be a projection of the future is often nothing much more than a retelling of the present in shinier garb. This reality of history, its reflection of the prescription of the present despite the conceit that it is a description of the past, needs to be kept in mind. It is not a failing which pollutes the whole enterprise, it is a reality which must inform our interpretation of it. The study of Rome is a study of what humanity was, but it can not help but also reflect and define what we wish to be, by comparison and contrast.

But before I go on, a minor mea culpa. After further research and correspondence I believe that I extrapolated too far from lead concentrations in Greenland ice caps in a previous post. Though I still believe that it is a good reflection of the decline in proto-industrial vigor in the Western world, I do think that distance from China means that we do not have a good gauge on any comparisons between 0 AD and 750 AD (the later date being the apogee of the Tang). Though I do note that world population estimates seem to be somewhat lower for 700 than 0. But these are not precise estimates, so they need to be taken with a grain of salt. Chinese census records indicate that Tang population was higher than that of the Han, while it seems plausible that the Arab Imperium of the 8th century resulted in a higher population for the regions under its purview than during antiquity. Therefore to make the “math work” one can reasonably assume that the population in Europe was far lower.

Additionally, to clarify my points from the previous post, I was presenting a description of the decline, not any allusion to the causes of the fall. And, my argument was that the rate of decline was far greater after the fall of Rome, not that there was no decline from the Principate to the Dominate. Though my opinions are not particularly well informed or strong, I would hazard a bet that per person production was greater during the Antonine period of the mid-2nd century than during the relatively quiescent epochs of the 4th century. My argument is simply that in terms of economic production the 4th century resembles the 2nd far more than it does the 6th in the Roman West. The world of Procopius was further from that of Constantine than that of Constantine was from that of the Antonines, despite the fact that Justinian’s Byzantium perceived itself to be (and rightly) simply a continuation of antique Rome.

The argument of The Fall of Rome which is powerful and persuasive as a description is fundamentally a material one. Political unity within the Roman Empire decreased the fixed costs of production (e.g., no need for city walls, no political boundaries imposing extortionate levels of duty, etc.). Rome was the Western world’s first free trade zone where military conflict was ended by the imperial monopoly on force. This resulted in gains in wealth due to the economies of scale and classic Smithian productivity increases through specialization. Ancient Rome was not a consumer society characterized by eternal expectations of growth, but neither was there an expectation of collapse and regression (e.g., “The Eternal City”). I find the Malthusian logic of biology powerful in that greater  productivity should be swallowed up by population increase, so that over the long term average well being for most humans has been approximately the same (increased aggregate wealth coexisting with the same per capita wealth). But there are many qualifications within that statement; the long term may actually have been longer than the course of the empire, suggesting that Rome never attained Malthusian equilibrium.

To support his proposition about material prosperity Bryan Ward-Perkins recounts the quality and number of pots and amphorae, as well as extensive archaeological evidence of a dense network of cities all across the Roman Empire. And though the average Roman peasant did not avail themselves of the consumption of the broad upper orders, they were at least free of the fear of marauders and enslavement by foreign peoples. One might respond here that they were subject to grueling taxation, but this is fundamentally a different argument. Modern examples seem to imply that high taxation is preferable to an anarchic order of no taxation and no services (e.g., Afghanistan). Bryan Ward-Perkins makes a compelling case in The Fall of Rome that the economic and political order of the Western Mediterranean and Northwest Europe regressed back to a pre-Iron Age level of complexity in the two centuries after the fall of Rome! In other words, the “Dark Ages” saw the unraveling of a set of organically developed norms and connections which had matured from the 5th century B.C. onward. A 1,000 year old civilization expired, as defined by the innumerable threads which had bound together the Western Mediterranean, Gaul, and Britain.

But there loom over this argument aspects which are not just material, but also normative. When alluding to literacy we obtain here a case where the material and normative intersect. By various means Ward-Perkins argues that rates of literacy were far higher during the period of the Roman Empire than after. It is famously well known that across the centuries of Rome it was not until 518 that an Emperor donned the purple who may have been illiterate (Procopius is not entirely reliable on matters factual when it comes to the family of Justinian!). In contrast, Dark Age princes such as Charlemagne were often illiterate. The most evocative and persuasive component of the argument in regards to penetration of literacy is that graffiti and casual scribbling are legion from the Roman period, but far rarer afterward. There is even documentary evidence that most priests during the Dark Ages were functionally illiterate, with a substantial minority being totally illiterate (i.e., they could not sign their own names to documents).

Marcus Aurelius

So what that 10% of Western Europeans in 400 may have been functionally literate, while only 1% in 700 were? The “what” is that the penetration of literacy allows for a critical mass to develop for a particular form of cultural discourse. A fundamental aspect of the Classical World is that it developed out of a world of citizens. This does not mean that it was a democratic world. Rather, it means that a broad expanse of the populace was vested in the political order, whether in an Athenian democracy, Spartan oligarchy, or Roman republic. Only 100 years into the Empire were emperors actually called Emperor. Rather, they were Princeps, “First Citizens.” The false conceit of the Roman Empire was that it was a restoration of the republic. Rome never restored its kings, and the slide toward explicit and formalized autocracy and despotism was gradual. The early republican army was one of freeborn citizens, and even after the Marian reforms which opened up the army to the proletariat the Roman legions were drawn from the citizenry. Between the era of Augustus and the 3rd century the military became progressively less Italian, but it remained Roman, with only auxiliaries derived from barbarian people who were not citizens (after Caracalla granted citizenship to all free inhabitants of the empire, barbarian would mean someone from outside the imperial frontiers).

This alien world of emperors, slaves, and gladiatorial battles was nevertheless populated by familiar figures. There were philosophers and civil servants, city councils and a professional army paid in coin or salt. What was fundamentally alien is that it was ‘pagan.’ This is to some extent a catchall term which developed relatively late in history (in the Greek-speaking world pagans were termed ‘Hellenes’), but it reflects the view that the old religion of the West was alien in a deep manner from what came after. Until recently faiths like Christianity, Judaism, Islam, Buddhism, and Hinduism were termed ‘higher religions.’ Today such judgmental terminology is less common, and they might be classed together as ‘institutional religions,’ or more colloquially ‘organized religions.’ Often the term pagan is used to encompass the faiths outside of the Abrahamic religions, but because of the pejorative connotations of pagan this is probably not advisable.

The critical aspect which I think needs elaborating here is that in regards to the relatively seamless coherency and integration of all aspects of religiosity, high and low, and ritualistic, mystical, and philosophical, there is a qualitative difference between the institutional religions, and the older traditions. The ancients had ethical philosophy, they had rites, and they had mystical and ecstatic communal worship. What they often did not have was an organized system which packaged all these aspects together. If Constantine had fallen at the battle of Milvian Bridge then the West may not have become Christian, but it would not have remained pagan in a way which we understand it. The exact nature of the organic development of an institutional religion varies across civilizations, but it seems that a complex culture invariably demands a religious system which binds together various sentiments across elements of society. This is clear in the rise of solar religion in the late 3rd century, and points to the fact that in the 1st millennium all civilizations were moving toward a system where an institutional religion with metaphysical grounding anchored and gave legitimacy to the body politic.

But is this enough to differentiate us from the Romans and align ourselves with what became Christendom? Bryan Ward-Perkins observes that the study of Late Antiquity is rife with a fixation on cultural change in the domain of religion, with a neglect of the material and  political dimensions of life. In particular there is much attention paid to ascetic religious figures who were instrumental in transforming Roman Christianity into Medieval Christianity. Granting the materialist argument (which even the doyen of Late Antique studies, Peter Brown, seems to do in Through the Eye of a Needle), are the cultural changes of Late Antiquity which make the world of Europe more familiar stark enough to make that age truly the seedbed of recognizable modernity?

Charlemagne receiving homage from Saxons

It is critical to note that Late Antiquity lives on genealogically. The modern British royal family can trace descent back to at least the 8th century kings of Wessex. The dispossession of the Roman elites was such that despite the persistence of eminent families with roots in the late Classical period across Europe at the local level it is difficult to validate any European royal genealogies before the early Dark Ages. The modern nation-states of Europe also date back in their embryonic sense to this period. And critically one must distinguish the Dark Ages from the High Medieval period. The latter phase saw a reemergence of social complexity, in particular in northern France, southern England, and the Low Countries. The Aristotelian Renaissance illustrates that after 1000 A.D. Western Europe began to rouse itself from its intellectual slumber.

But all that notwithstanding for me the critical aspect to emphasize is that the Germanic elites of the post-Roman Western European world were fundamentally military gangs; warlords and their underlings. This is not atypical. Rather, this is the normal state of pre-modern societies. And in this way it is fundamentally alien to the modern sensibility. The lords of the post-Roman world were of the same category as the anax of Mycenaean Greece. Their profession was war, and their cultivation was of the sword. Though the Roman world was militarized (in fact most of the Imperial expenditure was on the legions), its elite was fundamentally civilian in orientation. Roman aristocrats were often military leaders, but more universally they defined themselves as being cultured and refined in relation to the common. No Roman could rise to a status of prominence without being literate, and the established nobility was educated in classical literature and rhetoric. This is not entirely surprising, as for several centuries the Roman world was characterized by peace, and high status was not likely to be won through feats of martial prowess. A similar process seems to have occurred in early modern Europe, as the military elites who were often the descendants in spirit if not genealogy from the post-Roman Germanic warlords began to cultivate their manners and fashions to signal their gentility. Part of this is due to the same decline in violence which characterized the Roman world. But it is also perhaps a response to the rise of firearms, which made aristocratic cavalry vulnerable on the field of battle. The civilian orientation of the Roman aristocracy is similar in many ways to that of China, where dynasties founded by generals nevertheless marginalized the military over time.

All of the above is why I state that my agreement with Bryan Ward-Perkins’ contentions is to some extent normative. I feel that a brutal Classical Roman autocrat such as Marcus Aurelius is more modern than a brutal Dark Age autocrat such as Charlemagne. Then again, some of Aurelius’ self-serving thoughts are preserved for us in his Meditations, while Charlemagne was an illiterate whose character is filtered through chroniclers. Charlemagne may have been the defender of Roman Christianity, but he was an exemplar of Romanitas in the same manner that the Democratic People’s Republic of Korea (North Korea) is democratic (for example, he was an open de facto polygamist). The American republic was founded as a republic. Granted many of its institutions evolved organically from English common law tradition, but it is clear that the injection of ancient political theory revitalized the organization of Western nation-states. The Napoleonic Code draws inspiration from the rediscovery of Roman law. Ultimately it strikes me that the modern world manifests many of the structural features of the Roman world, with the Dark Ages being an unwinding from which the West only slowly recovered. Your mileage may vary, but that says more about differences in values among contemporaries, than it truly does about the factual assessment about the shape of the past.

Why so few Asians in ecology? Not all groups have similar preferences

A week ago Keith Kloor had a post up, What Science, Environmentalism and the GOP Have in Common, where he bemoaned the lack of representation of non-whites in these categories. As a matter of fact I think Keith is wrong about science. Even constraining the data set to American citizens and permanent residents people of Asian ancestry are well represented in many areas of science. But not all sciences are created equal. In 2011 there were 158 doctorates which were awarded within the category of ‘evolutionary biology’ for American citizens or permanent residents. Of these 135 were non-Hispanic white, and 5 were Asian. In ‘neuroscience’ the respective figures were 742, 535, and 96. In ‘zoology’ 55, 49, and 0. In ‘bioinformatics’ they were 80, 51, and 17. Finally, in ‘ecology’ the breakdown was 330, 300, and 11. If you are involved in academic biology I’m rather sure that these numbers won’t surprise you too much, even if you’d never thought about it. You can even infer these by walking through the posters at ASHG 2012, and seeing how the demographics of the crowds shift.

We can look at this issue another way. In 2010 US News & World Report listed the top 10 ecology & evolution graduate programs. I went to the faculty websites after typing the university and ‘ecology,’ and then ‘neuroscience.’ Looking at names, and sometimes head shots, I classified everyone as ‘Asian’ (as defined by the US Census) and ‘Not Asian.’ You can find the data here. Please note that the left columns are ecology faculty, and the right are neuroscience.

The raw results are:

University & Department   Asian   Not Asian   % Asian
Berkeley – Ecology            0          46      0.0%
Berkeley – Neuroscience       4          40     10.0%
Harvard – Ecology             3          48      6.3%
Harvard – Neuroscience       21         127     16.5%
Davis – Ecology               8         117      6.8%
Davis – Neuroscience         12          73     16.4%
Chicago – Ecology             3          22     13.6%
Chicago – Neuroscience       11          65     16.9%
Stanford – Ecology            2          17     11.8%
Stanford – Neuroscience      19          74     25.7%
Cornell – Ecology             1          31      3.2%
Cornell – Neuroscience        3          39      7.7%
UTexas – Ecology              3          43      7.0%
UTexas – Neuroscience         7          63     11.1%
Yale – Ecology                0          23      0.0%
Yale – Neuroscience          13          83     15.7%
Princeton – Ecology           0          15      0.0%
Princeton – Neuroscience      2          17     11.8%
Arizona – Ecology             0          54      0.0%
Arizona – Neuroscience        0          20      0.0%
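Per-department percentages bounce around because some departments are small, so it can help to pool the counts by field. The Python sketch below does that with the numbers from the table. One caveat about the denominator: the listed percentages match the Asian count divided by the second column, so the sketch prints that ratio alongside the share of the combined count, which is what you would want if the second column really excludes Asian faculty. The function name is my own.

```python
# (university, field, asian, second_column) copied from the table above.
rows = [
    ("Berkeley", "Ecology", 0, 46), ("Berkeley", "Neuroscience", 4, 40),
    ("Harvard", "Ecology", 3, 48), ("Harvard", "Neuroscience", 21, 127),
    ("Davis", "Ecology", 8, 117), ("Davis", "Neuroscience", 12, 73),
    ("Chicago", "Ecology", 3, 22), ("Chicago", "Neuroscience", 11, 65),
    ("Stanford", "Ecology", 2, 17), ("Stanford", "Neuroscience", 19, 74),
    ("Cornell", "Ecology", 1, 31), ("Cornell", "Neuroscience", 3, 39),
    ("UTexas", "Ecology", 3, 43), ("UTexas", "Neuroscience", 7, 63),
    ("Yale", "Ecology", 0, 23), ("Yale", "Neuroscience", 13, 83),
    ("Princeton", "Ecology", 0, 15), ("Princeton", "Neuroscience", 2, 17),
    ("Arizona", "Ecology", 0, 54), ("Arizona", "Neuroscience", 0, 20),
]


def pooled(rows, field):
    """Sum the Asian and second-column counts across universities for a field."""
    asian = sum(a for _, f, a, _ in rows if f == field)
    other = sum(n for _, f, _, n in rows if f == field)
    return asian, other


for field in ("Ecology", "Neuroscience"):
    asian, other = pooled(rows, field)
    as_listed = 100 * asian / other              # same ratio the table uses
    of_combined = 100 * asian / (asian + other)  # if column two excludes Asians
    print(f"{field}: {asian} vs {other}, "
          f"{as_listed:.1f}% as listed, {of_combined:.1f}% of combined count")
```

Whichever denominator is right, the gap between the two fields shows up in both columns of the output.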


And here are charts of % and counts:

Does this matter? In American society, especially from the center to the left of the social-cultural spectrum, there is a premium on diversity. Usually this means specifically cases of racial and gender diversity (again, as I have contended before the nod to class diversity is almost always perfunctory, and there is only marginal concern about ideological diversity). As a rule within these parameters the question about diversity is usually ‘why not,’ in that proportions out of sync with the population immediately prompt questions as to why this might be. My own personal position is at variance with this. Rather, my attitude is more ‘so what?’ I generally don’t care about these things personally. Unlike most my default assumption isn’t that all groups will have the same aptitudes and preferences, and so it is difficult to assess the scope and nature of the idealized demographic mix sans discrimination. In the sciences what is of importance to me is not ‘who,’ but ‘what’? That is, what is being discovered.

The question in regards to Asian Americans with American biological science is of personal interest to me. My own passions lean strongly to evolutionary biology. Any curiosity about genomics and bioinformatics is prompted by population and evolutionary genetic questions. Frankly, this means that I spend a great deal of time around white people, because for whatever reason evolutionary biology is far more white than many other areas of life science. In contrast, if I stumble into a molecular biology or neuroscience seminar the audiences are by nature far more diverse, with diversity being due to the large contingent of people of Asian ancestral background.

I don’t know if this matters in any deep way. I suspect if Asian Americans were as well represented in human evolutionary genomics as they are in cancer research there might be some stronger and earlier focus on questions of ascertainment bias due to early Eurocentric data sets. But this would be only a shift on the margins; it isn’t as if evolutionary biologists aren’t aware of the issue at all. More importantly I wanted to highlight this difference across fields because I think it illustrates the proximate power of preferences and expectations, rather than discrimination or lack of outreach. To give an example of what I mean, my father, who has a doctorate in physical chemistry, once quipped to me that ‘it would be nice if you studied neuroscience, then I could just tell people you study the brain.’ Though conveniently for him, since my major area of concern is genetics, that is something that he can tell his friends which is intelligible, though questions always get back to me about ‘genetic engineering’ and ‘gene therapy,’ suggesting that people assume my topics must be biomedical. For whatever reason most of the young Asian Americans who enter university and study biology of some sort do not tend to gravitate into areas like ecology or evolution. An Asian American acquaintance who is an ecologist has even joked to me that sometimes his friends refer to him as a ‘twinkie’ on account of his disciplinary focus. I do not believe that the lack of representation of Asian Americans within ecology or evolution has to do with discrimination, nor do I think that biomedical science has less implicit bias against people of Asian heritage. To be succinct, many Asian American youth who pursue graduate school in science may already elicit raised eyebrows because they did not pursue medical school. Going off to study the phylogeny of starfish, or some such thing, would frankly result in even more bewilderment and disappointment.

In this case it seems clear that the problem is not discrimination or bias (though that exists, I don’t think it varies that much across fields), but a cultural preconception as to what science merits one’s professional energies. Evolutionary biologists could go into Korean American churches to argue for the value of their discipline, but even assuming individuals in their audience did not hold Creationist beliefs (many would), it would be a hard sell to convince them that abstract and theoretical evolutionary questions are more worthy of attention than projects with a more practical biomedical focus. This isn’t going to convince people who start out with the null hypothesis that variation in discriminatory atmosphere explains variation in representation in fields by race and ethnicity, but I hope it makes people reconsider different hypotheses.

Addendum: Also, bemoaning the lack of ‘minorities’ in science often seems a case of the ‘How Asians became white’ phenomenon.


India Rape Victim Had Many Onlookers, No Savior:

Police records say the underage suspect raped the 23-year-old physiotherapy student twice after she was hit with iron rods and fell unconscious. He extracted her intestine with his bare hands.


Why the future won’t be genetically homogeneous

While reading The Founders of Evolutionary Genetics I encountered a chapter where the late James F. Crow admitted that he had a new insight every time he reread R. A. Fisher’s The Genetical Theory of Natural Selection. This prompted me to put down The Founders of Evolutionary Genetics after finishing Crow’s chapter and pick up my copy of The Genetical Theory of Natural Selection. I’ve read it before, but this is as good a time as any to give it another crack.

Almost immediately Fisher aims at one of the major conundrums of 19th century theory of Darwinian evolution: how was variation maintained? The logic and conclusions strike you like a hammer. Charles Darwin and most of his contemporaries held to a blending model of inheritance, where offspring reflect a synthesis of their parental values. As it happens this aligns well with human intuition. Across their traits offspring are a synthesis of their parents. But blending presents a major problem for Darwin’s theory of adaptation via natural selection, because it erodes the variation which is the raw material upon which selection must act. It is a famously peculiar fact that the abstraction of the gene was formulated over 50 years before the concrete physical embodiment of the gene, DNA, was ascertained with any confidence. In the first chapter of The Genetical Theory R. A. Fisher suggests that the logical reality of persistent copious heritable variation all around us should have forced scholars to the inference that inheritance proceeded via particulate and discrete means, as these processes do not diminish variation indefinitely in the manner which is entailed by blending.

More formally the genetic variance decreases by a factor of 1/2 every generation in a blending model. This is easy enough to understand. But I wanted to illustrate it myself, so I slapped together a short simulation script. The specifications are as follows:

1) Fixed population size, in this case 100 individuals

2) 100 generations

3) All individuals have 2 offspring, and mating is random (no consideration of sex)

4) The offspring trait value is the mid-parent value of the parents, though I also included a “noise” parameter in some of the runs, so that the outcome deviates somewhat in a random fashion from the expected mid-parent value

In terms of the data structure the ultimate outcome is a 100 ✕ 100 matrix, with rows corresponding to generations, and each cell an individual in that generation. The values in each cell span the range from 0 to 1. In the first generation I imagine the combining of two populations with totally different phenotypic values: 50 individuals coded 1 and 50 individuals coded 0. If a 1 and a 1 mate, they produce only 1′s. Likewise with 0′s. On the other hand a 0 and a 1 produce a 0.5. And so forth. The mating is random in each generation.
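For readers who want to try this themselves, here is a minimal sketch of a simulation along the lines of the specifications above. It is not the script I used. The function name `simulate_blending`, the Gaussian form of the noise, the `noise_sd` parameter, and the clipping of trait values to the 0 to 1 range are all assumptions made for illustration.

```python
import random
import statistics


def simulate_blending(pop_size=100, generations=100, noise_sd=0.0, seed=0):
    """Return a list of generations, each a list of trait values in [0, 1].

    pop_size is assumed to be even so that everyone can be paired off.
    """
    rng = random.Random(seed)
    # Generation 0: half the individuals coded 0, half coded 1.
    population = [0.0] * (pop_size // 2) + [1.0] * (pop_size // 2)
    history = [population]
    for _ in range(generations - 1):
        # Random mating with no consideration of sex: shuffle, pair neighbours.
        parents = population[:]
        rng.shuffle(parents)
        offspring = []
        for i in range(0, pop_size, 2):
            midparent = (parents[i] + parents[i + 1]) / 2
            # Each pair leaves two offspring, so the population size is fixed.
            for _ in range(2):
                # Assumed noise model: Gaussian deviation around the mid-parent.
                value = midparent + rng.gauss(0.0, noise_sd)
                # Assumption: keep trait values inside the 0 to 1 range.
                offspring.append(min(1.0, max(0.0, value)))
        population = offspring
        history.append(population)
    return history


# Print the trait variance at a few generations for a run without noise.
history = simulate_blending()
for generation in (0, 1, 2, 5, 10):
    print(generation, statistics.pvariance(history[generation]))
```

Passing something like `noise_sd=0.05` gives the noisy variant. Each row of `history` is one generation, which matches the matrix layout described above.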

The figure to the left illustrates the decay in the variance of the trait value over generation time in different models. The red line is the idealized decay: the variance is cut in half each generation. The blue line is one simulation. It roughly follows the decay pattern, though it deviates somewhat, seemingly because some assortative mating occurred by chance (presumably if I used many more individuals it would converge upon the analytic curve). Finally you see one line which follows the trajectory of a simulation with noise. Though this population follows the theoretical decay more closely initially, it converges upon a different equilibrium value, one where some variance remains. That’s because the noise parameter continues to inject variance every generation. The relevant point is that most of the variation disappears in fewer than 5 generations, and it is basically gone by the 10th generation. To maintain variation in a blending inheritance model requires a great deal of mutation, the extent of which is just not plausible.
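The halving, and the nonzero equilibrium under noise, can be written out in a few lines. This is a sketch under idealized assumptions that the simulation only approximates: the two parents are independent draws from a very large population with trait variance $V_t$, and the noise $\varepsilon$ is additive, independent of the parents, with variance $\sigma^2$ (a symbol introduced here for illustration).

```latex
\begin{align*}
z' &= \tfrac{1}{2}(x + y) + \varepsilon, \qquad \operatorname{Var}(\varepsilon) = \sigma^2 \\
V_{t+1} &= \tfrac{1}{4}V_t + \tfrac{1}{4}V_t + \sigma^2 = \tfrac{1}{2}V_t + \sigma^2 \\
\sigma^2 = 0: \quad V_t &= \frac{V_0}{2^t}, \qquad V_0 = 0.25 \;\Rightarrow\; V_5 \approx 0.0078,\; V_{10} \approx 0.00024 \\
\sigma^2 > 0: \quad V^* &= \tfrac{1}{2}V^* + \sigma^2 \;\Rightarrow\; V^* = 2\sigma^2
\end{align*}
```

So with a starting variance of 0.25 (half the population at 0, half at 1) about 97% of the variance is gone by generation 5, and any variance that persists in the long run is set entirely by the noise, not by the original difference between the two populations.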

To get a different sense of what occurred in these two particular simulations, here are heat maps. Values in the interval from 0 to 1 are now represented by the shading of each cell. I am displaying only 50 generations here. The top panel is one without noise, while the bottom panel has the noise parameter.

The contrast with a Mendelian model is striking. Imagine that 0 and 1 are now coded by two homozygote genotypes, with heterozygotes exhibiting a value of 0.5. If all the variation is controlled by the genotypes, then you have three genotypes, and three trait values. If I change the scenario above to a Mendelian one then variance will initially decrease, but the equilibrium will be maintained at a much higher level (half the starting variance in this setup), as 50% of the population will be heterozygotes (0.5), and 50% homozygotes, split evenly between the two varieties (0 and 1). With the persistence of heritable variation natural selection can operate to change the allele frequencies over time without the worry that the trait values within a breeding population will converge upon each other too rapidly. This is true even in cases of polygenic traits. Height and I.Q. remain variant, because they are fundamentally heritable through discrete and digital processes.
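The Mendelian contrast can be sketched the same way. This is an illustrative toy, not the script behind the figures. It assumes a single locus with two alleles, a trait value equal to half the number of “1” alleles, and a much larger population than the 100 individuals above (10,000 by default) so that genetic drift does not blur the equilibrium. The names `transmit`, `trait_variance` and `simulate_mendelian` are made up for this example.

```python
import random


def transmit(genotype, rng):
    """Return the allele (0 or 1) a parent passes on.

    genotype is the number of '1' alleles the parent carries: 0, 1 or 2.
    Homozygotes always pass on the same allele; heterozygotes pass on
    either allele with equal probability.
    """
    if genotype == 1:
        return rng.randint(0, 1)
    return genotype // 2


def trait_variance(population):
    """Population variance of the trait, where trait = genotype / 2."""
    values = [g / 2 for g in population]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate_mendelian(pop_size=10_000, generations=20, seed=0):
    """Return the trait variance in each generation (pop_size assumed even)."""
    rng = random.Random(seed)
    # Generation 0: half the population homozygous 0, half homozygous 1.
    population = [0] * (pop_size // 2) + [2] * (pop_size // 2)
    variances = []
    for _ in range(generations):
        variances.append(trait_variance(population))
        parents = population[:]
        rng.shuffle(parents)
        offspring = []
        for i in range(0, pop_size, 2):
            # Each pair leaves two offspring; each child gets one allele
            # from each parent.
            for _ in range(2):
                child = transmit(parents[i], rng) + transmit(parents[i + 1], rng)
                offspring.append(child)
        population = offspring
    return variances


for generation, variance in enumerate(simulate_mendelian()):
    print(generation, round(variance, 4))
```

The variance should drop once, from 0.25 to roughly 0.125, and then hover there instead of continuing to halve. With only 100 individuals the same code would show a slow erosion over many generations, but that is drift in a small population, not blending.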

All this is of course why the “blond gene” won’t disappear, redheads won’t go extinct, nor will humans converge upon a uniform olive shade in a panmictic future. A child is a genetic cross between parents, but only between 50% of each parent’s genetic makeup. And that is one reason they are not simply an “averaging” of parental trait values.

The future is e-books!

Nicholas G. Carr, purveyor of high-brow neo-ludditism and archeo-utopianism, has a piece out in The Wall Street Journal, Don’t Burn Your Books—Print Is Here to Stay. The subtitle is “The e-book had its moment, but sales are slowing. Readers still want to turn those crisp, bound pages.” Here are some of his rancid chestnuts of un-wisdom:

… Hardcover books are displaying surprising resiliency. The growth in e-book sales is slowing markedly. And purchases of e-readers are actually shrinking, as consumers opt instead for multipurpose tablets. It may be that e-books, rather than replacing printed books, will ultimately serve a role more like that of audio books—a complement to traditional reading, not a substitute.

What’s more, the Association of American Publishers reported that the annual growth rate for e-book sales fell abruptly during 2012, to about 34%. That’s still a healthy clip, but it is a sharp decline from the triple-digit growth rates of the preceding four years.

The initial e-book explosion is starting to look like an aberration… 2012 survey by Bowker Market Research revealed that just 16% of Americans have actually purchased an e-book and that a whopping 59% say they have “no interest” in buying one.

From the start, e-book purchases have skewed disproportionately toward fiction, with novels representing close to two-thirds of sales…Screen reading seems particularly well-suited to the kind of light entertainments that have traditionally been sold in supermarkets and airports as mass-market paperbacks.

Readers of weightier fare, including literary fiction and narrative nonfiction, have been less inclined to go digital. They seem to prefer the heft and durability, the tactile pleasures, of what we still call “real books”—the kind you can set on a shelf.

…In fact, according to Pew, nearly 90% of e-book readers continue to read physical volumes. The two forms seem to serve different purposes.

Having survived 500 years of technological upheaval, Gutenberg’s invention may withstand the digital onslaught as well. There’s something about a crisply printed, tightly bound book that we don’t seem eager to let go of.

An immediate issue with this op-ed is that it engages in shell games with quantities. Starting from a baseline of zero a new technology will undergo incredible rates of initial growth in adoption. But this will level off rather quickly. A 34% rate is still indeed healthy, and a sign I think that the explosive phase is giving way to robust and expansionary growth as the market slouches toward maturation. Other data in the piece seem to me to be irrelevant red-herrings. People who read e-books tend to be readers, so naturally one would expect that they read physical books. Most people with e-books have extensive personal libraries, and many works which they already own are not in e-book formats, or, are expensive in e-book formats (e.g., I have textbooks which I purchased for more than $100, which are discounted 50% for e-books, so they still come in at $60!). Additionally, asking all Americans about reading is rather misleading. A small proportion of the public are intense readers, with most being casual at best, if they read at all.

To the left is a figure I generated from an AP/IPSOS survey on American book reading habits in 2006. As it is a self-report this probably overestimates the reading habits of the general public, as well as the nature of what they read. 25% of Americans admitted reading no books in a year, while the median number of books read was 6.5. This I think gets at the heart of why e-books aren’t as popular as you might expect: books aren’t that popular! The typical entry-level e-reader runs in the $50 to $100 range. This initial fixed cost is heavily subsidized because the makers of these devices want you to purchase content from them. But consider that the average American reads on the order of 5 books a year. And Daniel McCarthy brings up the important issue that you need to analyze the trends across age cohorts; most readers are older, but most future readers are not going to be from the older cohorts. Some of these books that people read are likely to be relatively cheap mass market paperbacks or library books, but assuming on average $20 per book, the expenditure of Americans on new books per year is going to be about the same as the price of an e-reader. These devices are not without hassle or risk: they break or malfunction, and there are the notorious issues with digital rights.

So why e-books? Interestingly Carr asserts those who read more “serious” books prefer the physical medium. I’d like to see more analysis of this. Certainly I am of the opposite opinion. Though I don’t read mass market science fiction or fantasy paperbacks anymore, these $8 purchases are the sort which I would run through once, never to revisit. I don’t need to have something in my digital library if I never revisit it. This is in contrast to meatier references and classics. But for someone who reads a lot one of the biggest hassles of physical books is storage and retrieval. I’m an avid user of libraries, and am assiduous about making a trip to the used book store every few years, but even I nevertheless have a relatively cumbersome collection of texts which I have to transport on every occasion that I move. In addition, any travel plans would often result in my deciding how many books I could stow before it became more of a nuisance than a boon.

Because I do much of my reading on a Kindle I’ve accrued a massive portable library of classics, most of which I purchased for a few dollars at most. I’d wager that the number of people who would actually read War and Peace all the way through (as opposed to being seen reading it, or mentioning offhand that they’re reading it) would be increased by its packaging in a less cumbersome format. Contrary to the waxing of someone like Nicholas Carr about the tactile physical experience of a book, I’ve never enjoyed the fact that works of more than 500 pages tend to be unwieldy. This is not an abstract concern for me; I’m an intellectual generalist who has a taste for very expansive surveys on a variety of academic topics. Both A History of the Byzantine State and Society and The Structure of Evolutionary Theory would benefit from not being in a physical format (the latter is heavier than my laptop in hardcover!). Not only is the reading experience made difficult by the mass of the book, but the long term physical integrity of the work is often endangered by the reality that the number of pages tends to exceed the capacity of the binding of the spine.

What of the musty pleasures of the scroll?

Finally, there’s the issue of what e-books are in relation to various other forms of books, printed or audio. I think the analogy to audio books is totally ridiculous; e-books and printed books are fundamentally the same thing, only in somewhat different physical formats. Additionally, the printing press was a quantitative, not qualitative, change. It took the codex format, which attained popularity in late antiquity, and elevated it to the level of mass industrial production. The big change in qualitative formatting was the move from the scroll to the book over 1,000 years earlier. Prior to this there was the shift from the antique Near Eastern forms of writing, such as cuneiform or hieroglyph on heavy non-portable medium, to alphabetic script on papyrus. The alphabets packaged in a light scroll allowed for literacy to be more broadly accessible to the higher orders of society, rather than just the specialized vocation of a scribal class. Reading has always been subject to periodic revolution. I am dismayed by the fixation of some on the physical medium of the book, as opposed to the information content of the book. If the smell of paper and the tactile experience of a hardcover jacket is so critical, then I think consumers of text are missing the point somewhat. Frankly, it makes me think that the term “book slut” is more than metaphorical. Many of the lovers of the physical porn linger longingly upon vivid descriptions of smell and texture of the page in a manner which is reminiscent of what “food porn” factories such as the Food Network indulge in.

All that being said there are genuine concerns with the transition to e-books, in particular the scope of intellectual property, and the possibility of monopolistic domination of the sector by a firm such as Amazon. The struggles of the Nook should worry those who appreciate the spur and pressure which competition forces upon companies, though one must remember that e-book consumption occurs across a variety of platforms (e.g., I can read my Kindle books on the phone, computer, and Kindle, as well as tablets). A more substantive concern is the control which we cede to Amazon when we purchase e-books in their specific format. These are real difficulties which we need to address over the next decade, but I think they’re surmountable, and will be resolved. Information is too important to simply abdicate all control of the means of production to a few firms.

If Nicholas Carr truly believes what he’s saying, I’m curious if he’d be willing to make a bet on the market penetration of e-books in 2017. I suspect the reality is that op-eds such as this are expressions of his sentiment and preference, not a genuine prediction rooted in an understanding of how the world is, as opposed to how an individual might want the world to be.

Addendum: Unlike CDs I believe that physical printed books will persist for the indefinite future. There are some works which are important references, which I think many people will want to have in a physical format that is not tied to technology and stored in a cloud. But the number of these works will be small, and most people will not have any physical books aside from the Bible or another religious text, which has sacred value. Interestingly this will result in a physical reversion to the state of affairs of a few hundred years ago, when for most households the only book might have been of a religious nature.

Mitochondrial Eve: a de facto deception?

The above image, and the one to the left, are screenshots from my father’s 23andMe profile. Interestingly, his mtDNA haplogroup is not particularly common among ethnic Bengalis, of whom upwards of 80% are on a branch of M. This reality is clear in the map above, which illustrates the Central Asian distribution of my father’s mtDNA lineage. In contrast, his whole genome is predominantly South Asian, as is evident in the estimate that 23andMe provided via their ancestry composition feature, which utilizes the broader genome. The key takeaway here is that the mtDNA is informative, but it should not be considered to be representative, or anything like the last word on one’s ancestry in this day and age.

As a matter of historical record mtDNA looms large in human population genetics and phylogeography for understandable reasons. Mitochondrial DNA is present in many copies per cell, far more than the nuclear genome, and so was the lowest hanging fruit in the pre-PCR era. Additionally, because mtDNA lineages do not recombine they are well suited to a coalescent framework, where an idealized inverted treelike phylogeny converges upon a common ancestor. Finally, mtDNA was presumed to be neutral, so reflective of demographic events unperturbed by adaptation, and characterized by a high mutation rate, yielding a great amount of variation with which to differentiate the branches of the human family tree.

Many of these assumptions are now disputable. But that’s not the point of this post. In the age of dense 1 million marker SNP-chips why are we still focusing on the history of one particular genetic region? In a word: myth. Eve, the primal woman. The “mother of us all,” who even makes cameos in science fiction finales!

In 1987 a paper was published which found that Africans harbored the greatest proportion of mtDNA variation among human populations. Additionally, these lineages coalesced back to a common ancestor on the order of 150,000 years ago. Since mtDNA is present in humans, there was a human alive 150,000 years ago who carried this ancestral lineage, from which all modern lineages derive. Mitochondrial DNA is passed from mothers to their offspring, so this individual must have been a woman. In the press she was labeled Eve, for obvious reasons. The scientific publicity resulted in a rather strange popular reaction, culminating in a Newsweek cover where Adam and Eve are depicted as naked extras from Eddie Murphy’s Coming to America film.

The problem is that people routinely believe that mtDNA Eve was the only ancestress of all modern humans from the period in which she lived. Why they believe this is common sense, and requires no great consideration. The reality is that the story being told by science is the story of mtDNA, with inferences about the populations which serve as hosts for mtDNA being incidental. These inferences need to be made cautiously and with care. It is basic logic that a phylogeny will coalesce back to a common ancestor at some point. Genetic lineages over time go extinct, and so most mtDNA lineages from the time of Eve went extinct. There were many women who were alive during the same time as Eve, who contributed at least as much, perhaps more, to the genetic character of modern humans today. All we can say definitively is that their mtDNA lineage is no longer present. As mtDNA is passed from mother to daughter (males obviously have mtDNA, but we are dead ends, and pass it to no one), all one needs for a woman’s mtDNA lineage to go extinct is for her to have only sons. Though she leaves no imprint on the mtDNA phylogeny, obviously her sons may contribute genes to future generations.
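To make the point about lineage extinction concrete, here is a toy sketch of how mtDNA lineages are lost over time. It assumes a constant number of women per generation, with each woman in the next generation inheriting her mtDNA from a mother chosen at random from the previous one (a Wright-Fisher style model). The population size of 200 and the function name `surviving_lineage_counts` are hypothetical choices for illustration, not estimates of anything about real human history.

```python
import random


def surviving_lineage_counts(n_women=200, seed=0, max_generations=100_000):
    """Return the number of distinct founder mtDNA lineages per generation.

    The simulation stops once a single lineage remains, or after
    max_generations as a safety guard.
    """
    rng = random.Random(seed)
    # Each founding woman starts her own labelled mtDNA lineage.
    lineages = list(range(n_women))
    counts = [len(set(lineages))]
    while counts[-1] > 1 and len(counts) <= max_generations:
        # Each woman in the next generation copies the lineage label of a
        # randomly chosen mother; women who are never chosen leave no
        # daughters, so their mtDNA lineage ends with them.
        lineages = [rng.choice(lineages) for _ in range(n_women)]
        counts.append(len(set(lineages)))
    return counts


counts = surviving_lineage_counts()
print("founding lineages:", counts[0])
print("generations until one lineage remains:", len(counts) - 1)
```

Every run ends with a single surviving founder lineage, an “Eve” for that toy population, even though nothing distinguishes her from the other founders at the start. The model tracks only the maternal line, so it says nothing about how much the other founding women contributed to later generations through their sons and through the rest of the genome.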

Prior to ancient DNA and the proliferation of dense SNP data sets scholars were a bit too ambitious about what they believed they could infer from mtDNA and Y lineages (e.g., The Real Eve: Modern Man’s Journey Out of Africa). We are in a different time now, inferences made about the past rest on more than one leg. But the legend of Eve of the mtDNA persists, not because of its compelling scientific nature, but because this is a case where science piggy-backs upon prior conceptual furniture. This yields storytelling power, but a story which is based on a thin basis of fact becomes just another tall tale.

All this is on my mind because one of the scientists involved with Britain’s DNA, Jim Wilson, has penned a response to Vincent Plagnol’s Exaggerations and errors in the promotion of genetic ancestry testing (see here for more on this controversy). Overall I don’t find Wilson’s rebuttal too persuasive. It is well written, but it has the air of sophistry and lawyerly precision. I have appreciated Wilson’s science before, so I am not casting aspersions at his professional competence. Rather, some of the more enthusiastic and uninformed spokespersons for his firm have placed him in a delicate and indefensible situation, and he is gamely attempting to salvage the best of a bad hand. Importantly, he does not reassure me in the least that his firm did not use Britain’s atrocious libel laws as a threat to mute forceful criticism of their business model on scientific grounds. A more general issue here is that Wilson is in a situation where he must not damage the prospects of his firm, all the while maintaining his integrity as a scientist. From what I have seen once science becomes a business one must abandon the pretense of being a scientist first and foremost, no matter how profitable that aura of objectivity may be. The nature of marketing is such that the necessary caution and qualification essential for science becomes a major liability in the process of communicating. It’s about selling, not convincing.

Going back to Eve, Wilson marshals a very strange argument:

“The claim that Adam and Eve really existed, as you suggest, refers to the most recent common ancestors of the mtDNA and non-recombining part of the Y chromosome. I don’t agree that there is nothing special about these individuals: there must have been a reason why mitochondrial Eve was on the front cover of Time magazine in the late 80s!….

A minor quibble, but I suspect he means the Newsweek cover. More seriously, this line of argumentation is bizarre on scientific grounds. Rather, it is a tack which is more rational when aiming toward a general audience which might purchase a kit which they believe might tell them of their relationship to “Eve.”

In the wake of the discussion at Genomes Unzipped I participated in further exchanges with Graham Coop and Aylwyn Scally on Twitter, and decided to spend 20 minutes this afternoon asking people what they thought about mitochondrial Eve. By “people,” I mean individuals who are pursuing graduate educations in fields such as genetics and forensics. My cursory “field research” left me very alarmed. Naturally these were individuals who did not make elementary mistakes in regards to the concept, but there was great confusion. I can only wonder what’s going through the minds of the public.

Analogies, allusions, and equivalences are useful when they leverage categories and concepts which we are solidly rooted in, and transpose them upon a foreign cognitive landscape. By pointing to similarities of structure and relation one can understand more fully the novel ground which one is exploring. Saying that the president of India is analogous to the queen of England is an informative analogy. These are both positions where the individual is a largely ceremonial head of state. In contrast, the president of the United States and the queen of England are very different figures, because the American executive is not ceremonial at all. This is not a useful analogy, even though superficially it sees no lexical shift.

Who was Eve? A plain reading is that she is the ancestor of all humans, and more importantly, the singular ancestress of all humans back to the dawn of time. This is a concept which the public grasps intuitively. Who is mtDNA Eve? A woman who flourished 150,000 years ago, who happened to carry the mtDNA lineage which would drift to fixation in the ancestors of modern humans. I think this is a very different thing indeed. For purposes of poetry and marketing the utilization of the name Eve is justifiable. But on scientific grounds all it does is confuse, obfuscate, and mislead.

The fiasco that Vincent Plagnol stumbled upon is just a symptom of a broader problem. Scientists need to engage in massive conceptual clean up, as catchy phrases such as “mitochondrial Eve” and “Y Adam” permeated the culture over the past generation, and misled many sincere and engaged seekers of truth. This is of the essence because personal genomics, and the scientific understanding of genealogy, are now moving out of the ghetto of hobbyists, enthusiasts, and researchers. Though I doubt this industry will be massive, it will be ubiquitous, and a seamless part of our information portfolio. If people still have ideas like mitochondrial Eve in their head it is likely to cloud their perception of the utility of the tools at hand, and their broader significance.

The end of the blogroll

One of the major annoyances with the redesign of this weblog was that its precipitous nature was such that many of the sidebar links, etc., were removed. But it did make me admit a major point: blogrolls are pretty much dead. In the early years of the blogosphere they served as a way to share traffic and endorse sites of interest. But with the rise of RSS, and later Twitter and its confederates, they went into decline. By the end I barely recalled which sites I had on my blogroll; most of them I followed via RSS. So I’m not going to recreate one at this point. Rather, if you want to get a sampling of what I read and such, please see my Pinboard page (to which you can subscribe via RSS if it suits you). And of course you can follow me on Twitter, though that will include my banter with other people and such. A more likely avenue is to note which websites I link to in my posts…though I’m not a copious linker to other blogs at this point….