The residual of the genes & geography correlation

David of the Eurogenes Genetic Ancestry Project has a cautionary post up, When is a genetic map also a geographic map? Always and never. In it, he uses one peculiar pattern as a launching point into a broader exploration of the relationship between visualizations of genetic variation and geography. That pattern is that Russians, the easternmost of European peoples, are closer to the Slavs of Central Europe than the Balts are when plotted on the two largest dimensions of variation. I’ve highlighted this pattern from a PCA David extracted from a paper on northeast European genetics. This disjunction between geography and genetics has a pretty straightforward possible explanation: the current distribution of Russian-speaking peoples is a function of a massive demographic expansion to the east by Slavic farmers within the last 2,000 years. We already know that the borderlands between the steppe and the forest were long dominated by North Iranian peoples, from the Scythians to the Sarmatians, while further north the Great Russians absorbed a Finnic substrate (clear because some of the absorption is attested down to the early modern period).

Around the Web – February 28th, 2011

February always goes by so fast….

Should you go to an Ivy League School, Part II. I think an Ivy League degree will be more, not less, valuable in the future. It seems possible that we’re nearing the end of the age when the wage gap between unskilled and skilled workers is relatively modest (roughly, the wage gap decreased from 1800 to 1970, and has been increasing over the past 40 years). Credentialing and finding juicy rents and sinecures is probably the way to go in the future. As the past was, the future shall be?

Anthropologists Trace Human Origins Back To One Large Goat. “Read the whole thing.”

Advanced Degrees Add Up to Lower Blood Pressure. I’m sure that the paper itself is less irritating in terms of conflating correlation and causation. The problem is that it is the least intelligent people who will think that extra years of education = extra years of life in a magical manner. That being said, peer group effects probably matter, so I suspect that that’s part of what’s going on here after you correct for background variables.

Election Defeat Predicted for Ireland’s Ruling Party. It is rather strange that the more right-wing party generally enters into coalitions with the left-wing party, against the centrist party.

At last – an explanation for ‘bunga bunga’.


My parents, looking east and west

Yesterday Michelle decided to put up a post with her own analysis of her ADMIXTURE results. With that in mind, I thought I’d revisit some results from my parents. After many runs of ADMIXTURE, both by myself and Zack, some consistent differences seem to crop up. To review, one of the big surprises from genotyping my parents is that both of them have about the same “East Asian” element of ancestry, which is quite distinct from the conventional South Asian mix. Because both of my parents lack any oral history of recent admixture, I posited that this element may be a uniform substrate common among eastern Bengalis, and that it was absorbed during the initial period of settlement and demographic expansion on the frontier between 1000 and 1500 A.D. By analogy, low levels of Amerindian admixture persist across Brazilians, and African admixture among Mexicans, but because the admixture dates back several hundred years it does not seem to have percolated down to the present in oral history (though some old stock Brazilians of predominantly Portuguese origin have been able to infer Amerindian ancestry by looking at the church marriage records of their ancestors, and deducing that some women were natives from the common baptismal names given to such converts).

Since the patterns ADMIXTURE extracts are sensitive to the genetic variance you throw into it, I created two pools with my parents in them. One was predominantly West Eurasian, and the other was predominantly East Eurasian. In both samples my parents were Bengali A (father) and Bengali B (mother), and I included the Gujarati_B and Pathan samples as South Asian populations. I chose Gujarati_B because it seems particularly South Asian, and therefore informative. The Pathan sample has less African admixture than the Sindhis or Makranis, and is not so isolated as the Kalash, Brahui, Burusho, and Baloch. For the East Eurasian sample I included Sardinians as the West Eurasian outgroup, while for the West Eurasians I included Japanese as the outgroup. Finally, I pruned the markers down to 65,000 SNPs. Below I report K = 6, as cross-validation determined that to be the optimal value for the number of populations.
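For readers who want to see what "cross-validation determined the optimal K" amounts to in practice, here is a minimal sketch. It assumes ADMIXTURE was run with its cross-validation option for several values of K and that each log contains a summary line of the form `CV error (K=6): 0.5562`. The log text and error values below are invented for illustration and are not the values from the runs described here; the helper names `cv_errors` and `best_k` are hypothetical.

```python
import re

# Pattern for the cross-validation summary line (assumed log format).
CV_LINE = re.compile(r"CV error \(K=(\d+)\): ([0-9.]+)")


def cv_errors(log_text):
    """Return {K: cv_error} for every CV summary line found in the text."""
    return {int(k): float(err) for k, err in CV_LINE.findall(log_text)}


def best_k(errors):
    """Pick the K with the lowest cross-validation error."""
    return min(errors, key=errors.get)


# Made-up log lines, for illustration only.
example_log = """\
CV error (K=4): 0.5630
CV error (K=5): 0.5581
CV error (K=6): 0.5562
CV error (K=7): 0.5570
"""

errors = cv_errors(example_log)
chosen_k = best_k(errors)
print(chosen_k)
```

The lowest error is only a guide. When neighboring values of K have nearly identical errors, as in the made-up numbers above, it is reasonable to look at the results for each of them.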


"Content farms" and the media Precambrian

I’ve only become aware of “content farms” in any significant way over the past few days. Yes, I’m aware of Associated Content and eHow. I use Google! But I’ve always ignored them. But with Google’s turn against these websites I’ve become curious. This Wired piece from October 2009 is a gem. Here’s the part that caught my attention:

Plenty of other companies —, Mahalo, — have tried to corner the market in arcane online advice. But none has gone about it as aggressively, scientifically, and single-mindedly as Demand. Pieces are not dreamed up by trained editors nor commissioned based on submitted questions. Instead they are assigned by an algorithm, which mines nearly a terabyte of search data, Internet traffic patterns, and keyword rates to determine what users want to know and how much advertisers will pay to appear next to the answers.

In some ways “mainstream” websites also do this a bit; Nick Denton relies on fine-grained metrics for his Gawker Media properties. But obviously the sort of thing that content farms do, responding so specifically to the interests of the audience, takes it to the next level. I started browsing some of the “articles” produced by the contributors, and I think Farhad Manjoo has it right:


The changing face of fame

Long time reader Dragon Horse has been generating and collecting (top row images are from Dienekes) composite images of various classes of individuals for a while now. It’s really fun to just skim through and make your own assessments (the “global face” resembles darker skinned versions of Amerasians, whose fathers were white Americans and mothers Southeast Asian, to me).

The most well known composites are of nationalities, but he’s also generated and reposted composites of other classes. For example, the average Bollywood actress is Aishwarya Rai. Not literally, but the resemblance is jaw-dropping (compare to the average Indian woman). But most interesting to me were the comparisons of American film actors, male and female, then and now (“Golden Age” vs. contemporary). I’m pretty sure you can pick out which one is which if you’re American. There seem to be two correlated trends here: 1) more feminine features for both males and females, and 2) more youthful features for both males and females. Correlated, because neoteny and masculinization seem generally to push trait values in opposite directions. Projecting into the future, I assume that the Global Human Celebrity will converge upon a 14 year old girl?

Addendum: One difference between the “Golden Age” and modern celebrities is the attention to a rather buff physique. So though the actors of yore had more rugged faces, their physiques were often rather flabby in comparison to today’s leading men. So I might correct and assert that the future global celebrity will be a baby-faced 14 year old girl with abs to die for!

Brazilians, more European than not?

Credit: Dragon Horse

The Pith: Brazil is often portrayed as the second largest black nation in the world, after Nigeria. But it turns out that the majority of the ancestors for non-white Brazilians are European.

One of the more popular sources of search engine traffic to this website has to do with the population genomics of Latin America. For example, my post showing that Argentina is not quite as European a country as it likes to consider itself is regularly cited in online arguments (people of various “persuasions” are invested in the racial status of the Argentine people). But last week in PLoS ONE a paper looking at the patterns of ancestry in the Brazilian population came to a somewhat inverse conclusion as to the self-conception or perception of the preponderant racial identity of that nation. Let me quote from the conclusion of the paper:

Among the actions of the State in the sphere of race relations are initiatives aimed at strengthening racial identity, especially “Black identity” encompassing the sum of those self-categorized as Brown or Black in the censuses and government surveys. The argument that non-Whites constitute more than half of the population of the country has been routinely used in arguing for the introduction of public policies favoring the no-White population, especially in the areas of education (racial quotas for entrance to the universities), the labor market, access to land, and so on [36]. Nevertheless, our data presented here do not support such contention, since they show that, for instance, non-White individuals in the North, Northeast and Southeast have predominantly European ancestry and differing proportions of African and Amerindian ancestry.

The idea that Brazil is majority non-white, that is black, is one I’ve seen elsewhere. Using the American model of hypodescent, where children inherit the racial status of their most stigmatized ancestral component, no matter its magnitude, well over half of Brazilians are “black.” On the other hand, there’s the persistent trend in recent analyses which shows that black Brazilians have a much higher load of European ancestry than black Americans, while white Brazilians have a much higher load of Amerindian and African ancestry than white Americans.

Let’s jump to the paper first. The Genomic Ancestry of Individuals from Different Geographical Regions of Brazil Is More Uniform Than Expected:


More on Colleges and Income

Dale and Krueger have responded to Robin Hanson at his blog, where he commented on their most recent paper. I’ve also commented on this paper, here.

Most of Dale and Krueger’s comments relate to the stability of estimates that suggest that women earn less after attending high-SAT colleges. I don’t see particularly compelling evidence here either way, though Hanson is right to note that many of the estimates are consistent in nature. I was surprised by their comment, “The paper is not about gender differences from college selectivity, and we have little reason to suspect that there are such differences.” Well, all three drafts of this paper that are online emphasize the results of attending college for various subgroups (for instance, by race, parental education, and parental income). Surely gender is an equally interesting subgroup.

They do also address the selectivity question: that is, why the Barron’s selectivity measure was large and statistically significant in the working paper, but not used in the published paper. They argue that the precise manner in which the Barron’s selectivity measures were coded made a huge difference, and the result was important only for one specification. I’m happy to accept this answer. But as far as the “grand conspiracy” is concerned, I’ll note that even the published paper did make a compelling case that both the identity of the school and tuition paid were hugely important in determining future income. This result, for various reasons, may still have been incomplete. Yet it was the basic message of the published paper, and it’s simply the case that the popular press did not emphasize that result. For the record, I don’t think there was any conspiracy here. But it is awfully easy to trumpet the counter-intuitive but pleasing result: the college you went to doesn’t matter!

Also on the Barron’s measure, Dale and Krueger argue:

“While we did report a 23% return associated with attending the most selective colleges (according to the 1982 Barron’s ranking) in our earliest working paper, these results were from our basic model–which does NOT adjust for student unobserved characteristics.”

Here is the relevant section from Table 7 of their working paper:

If you haven’t seen a regression table, this will be confusing. The dependent variable (what they’re testing the effects on) is a logarithmic transformation of wage. They’re testing which of the variables listed on the left matter for that, and each column represents a different specification.
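A quick sketch of how to read a coefficient when the dependent variable is log wage may help here. A coefficient of beta means wages are multiplied by exp(beta), so the exact percent difference is (exp(beta) - 1) × 100, and for small coefficients beta × 100 is a fair approximation. The coefficient values below are illustrative only and are not copied from Table 7; the function name `log_points_to_percent` is hypothetical.

```python
import math


def log_points_to_percent(beta):
    """Exact percent change in wage implied by a coefficient on log(wage)."""
    return (math.exp(beta) - 1.0) * 100.0


# Illustrative coefficients, not values from the paper.
for beta in (0.05, 0.13, 0.23):
    approx = beta * 100.0                  # the usual rule-of-thumb reading
    exact = log_points_to_percent(beta)    # the exact conversion
    print(f"beta={beta:.2f}  approx={approx:.1f}%  exact={exact:.1f}%")
```

The gap between the two readings grows with the size of the coefficient, which is why large coefficients are often quoted as "log points" rather than percentages.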

The first three columns select on men. The first one tests to see how these variables impact future wages, without taking into consideration other colleges you applied to, or where you got in. This is the “basic model,” and the .0234 here next to “Most Competitive” corresponds to the 23% return they mention above (relative to the lowest category of selectivity). But skip over to column 3. This “self-revelation” model is designed to get at student unobserved characteristics. As the authors write:

“The effect of the Barron’s rating is more robust to our attempts to adjust for unobserved school selectivity than the average-school SAT score. Based on the straightforward regression results in column 1, men who attend the most competitive schools earn 23% more than men who attend very competitive colleges, other variables in the equation being equal. In the self-revelation model, the gap is 13 percent… [An] F-test of the null hypothesis that the Barron’s ratings jointly have no effect on earnings is rejected at the .05 level in the matched applicant model for men.”

Now, this was in response to Hanson’s point. Hanson picked up on the 23% number, and Dale and Krueger are right to note that’s a little high (and Hanson is right to concede). But note that the very next sentence reports results from a specification which does adjust for student unobserved characteristics; and it is also quite high.

Finally, I’ll note that while the authors emphasize the significance (or lack of significance) for individual estimates in individual years, my simple calculations suggest that the aggregate, pooled effect of their variables might be quite large in economic importance.

Run as fast as you can!

Since his move to Wired I swear that Dr. Daniel MacArthur has gotten a bit more pugnacious. In any case, today he has a post up which smacks down the A.M.A.’s attempt to expand the long arm of its regulatory capture:

The American Medical Association has written a letter to the US Food and Drug Administration as part of the lead-up to the FDA’s meeting on direct-to-consumer (DTC) genetic testing next month. The tone is predictable: the medical establishment is outraged by the idea of people having access to their own genetic information without the supervision of its members, and they want the FDA to stop it….

Over the past six months I’ve gotten really into analyzing genotypes of friends & family. Sometimes I talk about this excitedly, and people worry about the “risks.” When I ask what risks they’re worried about, usually people offer the vague and content-free fear of “what you could find out.” First, if you have family information, that’s usually much more powerful than the “disease risk” estimates that these firms are giving you. In 99% of cases, if that’s your primary concern it’s not worth the money. Second, if you’re terrified about what ancestry inference might tell you, probably you should see a shrink. You are what you are, and you’ve always been what you are. As a matter of common sense psychology, on the margin a change in self knowledge can have a big effect, but usually it is just informational icing on the cake.

I wouldn’t bet on any regulatory agency being able to clamp down on direct-to-consumer personal genomics for those who want to get it done at this point, though it is probably still possible if campaigners for F.U.D. get clever. If it’s banned in the United States no doubt the firms will move offshore (or new firms will crop up to fill the demand). Rather, it might have a dampening impact on the pace of innovation since there will be new impediments to profitability. But here’s the important point: I’ve got the markers on several computers and in Gmail. Once the information is out, it’s out. There’s no way that the government can put the genie back in the bottle for those of us who have raced ahead of feared regulation. So run, just in case. Once you cross the threshold they can’t drag you back, no matter how powerful their lobbyists and marketers are.

Note: If you read this blog you know that I’m generally skeptical of the average person’s ability to interpret a mass of information. So in some ways F.U.D. pushers have a point. But we live in a world of fad diets and all sorts of crazy movements. That’s a much bigger issue, and no one is pushing for regulation of that sort of thing.

A mental map of the world

One of the major issues in our world today is that we’re a people of specialties. This means that we don’t have basic interpretative frameworks in which to place novel facts. Because of the abstruse and formal nature of the discipline, this is probably starkest in the domain of science, but it is not restricted to only science. Consider geography. In many ways this is “low hanging” cognitive fruit in the shallow part of the learning curve which mostly consists of assembly of facts, but because of the shifts in emphases in American education geography has tended to get short shrift. This means that whenever there’s a foreign policy crisis middle-brow journals of record such as The New York Times have to commission pieces about nations such as Libya which read like a “first book” for six year olds on that nation (and on political weblogs commenters proudly brandish their “first book” level of knowledge).

But a bigger general issue seems to be in relation to climate. “Climate Change” is in the news constantly, but the average person on the street seems to have zero historical perspective on events such as the Medieval Warm Period or the Little Ice Age, let alone more obscure epochs such as the Younger Dryas. Fair enough, it isn’t as if Deep Time is ever going to be broadly interesting. But more disturbing to me is the total lack of perspective when it comes to current spatial patterns.

For example, a friend who has college degrees in history and philosophy, has traveled to Europe and Canada, and is planning a trip to Thailand and the Philippines, thought China was further to the north than Europe. Take a look at this map:

New York City, Madrid, and Beijing are all at roughly the same latitude. The average low in Beijing in January is -8.4 °C. For New York City it is -3.22 °C. And finally, for Madrid it is 2.6 °C. Why the difference? Barcelona, to the north and east of Madrid, on the coast, has a mean low of 4.4 °C. This tells us what’s going on in the most general sense: continentality. My friend’s ignorance was understandable; Beijing has a much more frigid clime than southern Europe. China as a whole is much further south than climate without context would suggest, while Europe is much further north than most expect. All that has to do with the rough shape of the continents (and possibly the Gulf Stream for Europe, though this might be overstated, given the generally mild character of the western upper temperate regions of continents). But first, let’s look at another example.