PCA remains the swiss-army-knife to explore population structure


I put up a poll without context yesterday to gauge which methods people prefer when it comes to population genetic structure.* PCA came out on top with a plurality. More explicitly model-based methods, such as Structure/Admixture, came in right behind it. Curiously, the oldest method, pairwise Fst comparisons (greater Fst means more variance partitioned between the groups), and Treemix, the newest method, have lower proportions of adherence.

Why is PCA so popular? Unlike Treemix or pairwise Fst you don’t have to label populations ahead of time. You just put the variation in there, and the individuals shake out by themselves. Pairwise Fst and Treemix both require you to stipulate which population individuals belong to a priori. This means you often end up using PCA or some other method to do a pre-analysis stage. Structure/Admixture model-based methods make you select the number of distinct populations you want to explore, and often assume an underlying model of pulse admixture between populations (Treemix does this too when you have an admixture edge).
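To make the labeling requirement concrete, here is a minimal sketch of a pairwise Fst calculation. It assumes a simple Nei-style ratio-of-averages estimator with equal weighting of the two populations and no sample-size correction; the function name `fst_two_pops` and the toy frequencies are made up for illustration, and this is not what any particular package implements. Note that the inputs are already per-population allele frequencies, which is exactly the a priori grouping described above.

```python
def fst_two_pops(freqs_a, freqs_b):
    """Rough pairwise Fst from per-marker allele frequencies in two labeled populations.

    Fst = (H_T - H_S) / H_T, summed over markers (ratio of averages).
    Assumes equal population weights and ignores sampling error.
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2                      # pooled allele frequency
        h_total = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
        h_within = (2 * pa * (1 - pa) + 2 * pb * (1 - pb)) / 2  # mean within-population
        numerator += h_total - h_within
        denominator += h_total
    # If every marker is monomorphic in the pooled sample, Fst is undefined; return 0.0
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical allele frequencies at three markers in two pre-labeled populations
pop_a = [0.75, 0.50, 0.10]
pop_b = [0.25, 0.50, 0.30]
print(round(fst_two_pops(pop_a, pop_b), 3))
```

The more the frequencies diverge between the two labeled groups, the larger the share of variance that sits between them rather than within them.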

PCA is also better at smoking out structure than Structure/Admixture for the same number of markers, and it’s pretty fast as well. This is why the first thing I do when I get population genetic data where I want to explore structure is run a PCA and look for clusters and outliers. After this pre-analysis stage, I can move on to other methods.
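As a rough sketch of what that first-pass PCA might look like, here is a toy example with NumPy. Everything in it is an assumption for illustration: the data are simulated, the function name `genotype_pca` is hypothetical, and real analyses usually go through dedicated tools and add steps like LD pruning. The point is only that no population labels go into the decomposition; the individuals separate on their own.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project individuals onto the top principal components.

    genotypes: (n_individuals, n_markers) array of 0/1/2 allele counts.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g[:, g.std(axis=0) > 0]                    # drop markers with no variation
    g = (g - g.mean(axis=0)) / g.std(axis=0)       # center and scale each marker
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # per-individual PC scores


# Simulate two populations whose allele frequencies have drifted apart (toy values)
rng = np.random.default_rng(0)
n_markers = 1000
freq_a = rng.uniform(0.1, 0.9, n_markers)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_markers), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(20, n_markers))
pop_b = rng.binomial(2, freq_b, size=(20, n_markers))

# The labels are never passed in: PCA only sees the stacked genotype matrix
pcs = genotype_pca(np.vstack([pop_a, pop_b]))
print(pcs[:20, 0].mean(), pcs[20:, 0].mean())
```

Plotting the first column of `pcs` against the second is the usual next step, with clusters and outliers picked out by eye.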

* I stipulated “genotyped-based” methods to set aside some of the new-fangled techniques, which often assume phasing and analysis of haplotypes, such as Chromopainter or explicit local ancestry deconvolution (some local ancestry deconvolution does not require phased haplotypes, but the most popular do).

Domestication of Rice in the Amazon

A new paper, Evidence for mid-Holocene rice domestication in the Americas, suggests that the Amazon basin was very culturally productive in the pre-Columbian period. What happened? From the conclusion:

The arrival of Europeans to the American continent in AD 1492, with the consequent population decimation and impact on cultural practices, caused the domesticated traits to gradually disappear. The loss of domesticated varieties is a phenomena that has also occurred for other indigenously domesticated species in both South….

One of the novel arguments in Charles C. Mann’s 1491 is that our idea that the Amazon basin has always been a pristine wilderness could be incorrect. Mann relays the theories of revisionist scholars who argue that at one point in history much of the basin was subject to human landscape manipulation, with concentrated burnings allowing for increased productivity in the normally poor soil of the region. Of course, this triggered a counterattack from classical scholars.

If these results about rice domestication are confirmed and become solid I think this would lean toward supporting the arguments of the revisionists, whose side Mann seems to favor in any case.

A major general theme in 1491 is that the Columbian Exchange was a disaster for New World peoples, though relatively positive for the Old World. European access to land surplus in the New World has been given as one reason for Europe’s economic takeoff (“ghost acres”), while maize introduced into China was responsible for its great population expansion in the centuries leading up to 1800.

In contrast, the consensus seems to be that New World populations suffered massive population declines (some of this has been confirmed by genetic evidence) driven in large part, though not exclusively, by introduced Old World diseases. Mann argues that early fantastical reports of a dense network of villages along the Amazon (which may have fueled legends of El Dorado) actually reflect the reality that in the 16th century the riverine civilization had not collapsed due to disease. At least not yet.

Let’s stipulate that rice domestication in the Amazon was occurring before 1492. This adds another independent domestication event during the Holocene. Basically, agriculture seems to be something that pops up over and over again after the end of the last Ice Age. Why? As I have suggested before a lot had changed since the previous interglacial over 100,000 years before the present. Our cognitive orientation and our cultural toolkit seem to evoke agriculture relatively quickly and independently.

Second, the indigenous peoples of the Amazon today are predominantly hunters and gatherers or slash & burn agriculturalists. Relatively simple societies. In 1491 the author outlines that mass death often resulted not directly from disease, but from the fact that the debilitation of large proportions of the population then led to famine, which led to social disruption and institutional collapse, which then fed into more death and destruction. Today we perceive the Amazonians as “ancient” and “primal” nomads of the forest, just as their tropical homeland is seen to be eternal and everlasting. This, despite the fact that many of them even today are agriculturalists, albeit of a low-intensity sort. But as they are, perhaps so we could be. Complex societies seem to unravel awfully quickly when subject to exogenous “shocks.” Perhaps we should be grateful for our “Pleistocene minds.” You never know when a swiss-army-knife mind is going to come in handy….

Note: the natives of the Amazon are unique in the Americas in having a very basal Asian ancestry in their heritage.

(via Dispatches from Turtle Island)

The world of Tolkien coming to the small screen

Unless you have been hiding under a rock you have probably heard that Amazon seems to have purchased the rights for the world of The Lord of the Rings. My understanding is that this deal does not cover The Silmarillion (unfortunate, but perhaps for the best as I’m not sure I’d want to see a dramatization of The Children of Hurin). So perhaps one can imagine a series about Aragorn’s earlier adventures in Gondor? If I had my pick though I’d set something during the time of Gil-galad. The Second Age hasn’t been explored in narrative, so it’s a relatively blank canvas, and like The Lord of the Rings it ends in an existential climax.

Why is this happening? Read the story I linked to above. But clearly it’s because of Game of Thrones. As some of you might know George R. R. Martin attempted to develop his works for film in the wake of Peter Jackson’s success. But A Song of Ice and Fire was too sprawling, or more concretely its budget would have been outlandish if one wanted to depict it accurately.

In one volume the three books of The Lord of the Rings come in at a little over 1,000 pages. In contrast, the completed books of A Song of Ice and Fire are already more than 4,000 pages.

But this is in some ways the weakness of an attempt to turn The Lord of the Rings into something equivalent to Game of Thrones: the characters are not nearly as well fleshed out in their humanity as those of A Song of Ice and Fire. Tolkien and Martin share similarities in world-building, with a punctilious attention to detail, and a de-emphasis on magic as a deus ex machina.

But when it comes to good and evil Martin’s distribution is more uniform while Tolkien’s is bimodal. The shades of grey found in A Song of Ice and Fire are great raw material for character arcs in episodic television which sprawls over a decade. In contrast, The Lord of the Rings was compressed into three films, so the relatively simple and stark characterizations were good fits in the context of the world-building and plot. I don’t envy the actor who has to play Viggo Mortensen’s role, nor do I want to imagine the abuse writers or show-runners who want to add moral complexity and ambiguity to Aragorn’s character are going to experience from the hardcore fans.

In other news, you can now get a copy of Brandon Sanderson’s Oathbringer. One of the greatest fantasists of our time, albeit he produces works which are Heavenly Father approved! (I don’t state this as a criticism, it’s just that the God of Sanderson’s universe couldn’t even conceive of a creature like Cersei Lannister, let alone create her)

Addendum: The Hobbit films that Peter Jackson produced in this decade are correctly described as bloated affairs. The book didn’t have enough source material to create a plot that extends across three films. But, also note that there isn’t much character development or difference in many of the characters who spanned both groups of films. Part of it is that Gandalf is an immortal demigod, while elves such as Elrond and Galadriel are thousands of years old (Galadriel is one of the oldest beings in Middle Earth, she was born ~7,500 years before the events of Jackson’s films). It’s hard to imagine a lot of character development over a few decades for such individuals, but one could imagine implications of having lived thousands of years and how it might drive you somewhat crazy (R. Scott Bakker explores this in detail in The Great Ordeal).

My son in the genetics history books

Just saw today that my son’s prenatal sequencing was mentioned in DNA: The Story of the Genetic Revolution:

The ethics of sequencing a presumably healthy fetus will be debated for years to come. But the day of doing is already here. Razib Khan, a thirty-something graduate student and blogger, decided to sequence his first child’s genome while his wife was still pregnant. Although one instance of whole-genome sequencing in utero was reported in the New England Journal of Medicine in 2012, that had been done to supplement a positive cytogenetic result….

I want to correct the record for future printings: my first son was my second child. And, it was not my decision, it was our decision. My wife was an equal partner, and did as much behind the scenes in making the sequencing happen.

Disruptive ages happen…and they happen fast

A friend of mine was pointing out that there is something of an anti-civilizational polemic in Against the Grain: A Deep History of the Earliest States. It’s the same sort of impulse which also asserts that “Rome never fell it evolved” and that the “Dark Ages” is a myth. I pretty much agree with Scott Alexander’s take. The datum that pollution due to lead did not match that of Classical Antiquity until the early modern period is one I remember as a searing one from The Fall of Rome: And the End of Civilization. You can’t really argue with that.

After reading The Fall of Rome I had a period when I read a lot of stuff on late antiquity. For example Peter Brown’s The Rise of Western Christendom: Triumph and Diversity, A.D. 200-1000. Brown is a serious scholar, and I’ve read several more of his books. But, I do think it shares something with earlier scholarship, and with some of the more polemical recent screeds of Rodney Stark (see How the West Won), and that is that Christianity is viewed as a good in and of itself.

That is, if there is one thing that can be said for the period after the fall of Rome, it is that Christianity transcended its Mediterranean focus, and became a truly international religion, and a light unto the nations. If you believe that Christianity is true, then details about population collapse and a recession of cultural productivity matter a lot less than otherwise.

I think the economic historical evidence on balance does lead to the conclusion that the Roman Empire achieved an optimum of economic development during the Antonine period of the 2nd century A.D. through classical efficiencies on the margin (e.g., specialization through trade, bringing all of the land into production, etc.). These levels were not again reached until after 1000 A.D. in Europe, though comparisons are not entirely apt because innovations such as the moldboard plow and windmills allowed for increases in genuine economic productivity.

The bigger question that looms in the background though is would it have been better to be a median Roman citizen or a median subject of a Dark Age warlord? I don’t have a strong opinion on this, especially when it comes to the ability to consume above subsistence. It seems likely that the far worse treatment of slaves in places like Sicily than anything serfs were subject to (though serfdom only truly came into its own during the end of the Dark Ages) should be weighed in the calculus, but the Roman peace was also a genuine peace. The petty conflicts persistent at a local level in the Dark Ages may have made the life of a typical peasant less secure than for Roman citizens.

Rather constant reports of subjects and citizens fleeing from strong political units, or more “advanced” nations (e.g., the early American frontier), tell us something real. People valued freedom. But not everyone fled, so we’re probably seeing a bias in terms of who attempted to escape the shackles of civilization (e.g., young able-bodied single men, in particular, loom large in these reports, and I think there’s a reason for that).

Near Prehistory in Northern Europe was an Indo-European world

The Picts were the topic of discussion this week on In Our Time. They are a mysterious yet intriguing people because we don’t know much about them in their own words, but they are one of the roots of modern Scottish identity. When I first encountered the Picts decades ago there was some debate as to whether they were a pre-Indo-European people or not. Today that seems to not be a hypothesis people entertain. Rather, the Picts were simply the least Romanized of the Brythonic Celtic people of Britain.

Today because of the genetic data I think we can be rather confident that by the time of the Roman Empire there were no non-Indo-Europeans left in Northern Europe. The Beaker people in Britain and Ireland seem to have overwhelmingly replaced the native population of farmers, whose ancestors had predominantly arrived from the eastern Mediterranean thousands of years ago (via the Atlantic littoral or Central Europe). Across Northern Europe, in general, the replacement of the previous populations was substantial, though not total.

In Southern Europe, the arrival of Indo-Europeans was more fitful, and the persistence of Basque attests to the fact that non-Indo-European languages were spoken down to historical times (if Etruscan is considered native to the Italian peninsula, that’s another example, though this is hotly debated and I lean toward the exogenous model). The pre-Latin language of Sardinia was almost certainly not Indo-European, while Greek has a high proportion of non-Indo-European words in its lexicon.

 

The sons of the wolf

When I am not feeling well I often watch Netflix, since my brain really operates at a lower level (passive, consumptive). Curiously I was recommended a Turkish series about the father of Osman (the founder of the Ottoman dynasty), Ertuğrul. I only watched a little bit of it, but it reminded me of the mini-series from the early 2000s about Attila. There are so many commonalities across the nearly one thousand years that separate the Ottomans and Attila, but it shouldn’t be surprising, as it is highly likely that some element of the Hunnic horde was Turkic in origin.

Though I spend a lot of time on this blog talking about Indo-Europeans, because they are a rather big deal, and they are prehistoric, I think it is important to remember the Turks as well. The similarities are clear. Both groups began at one end of the great Eurasian steppe but swept repeatedly to the other end. Both were at least in part nomadic, and both integrated with other ethnic groups in the course of their expansions. But the Turks operated on the edges of, and within, history. We know, for example, the importance of Sogdians in playing the role of Greeks to their Romans.

There is a curious tendency, perhaps somewhat justified, of focusing on the Turks after their predominant conversion to Islam in the centuries around 1000 A.D. But Turkic customs and folkways persisted for many centuries, and continue down to the present in a relatively unadulterated form in places like Kyrgyzstan. In The Turks in World History the author recounts how a Cuman chief leading his host into battle against the Byzantines gave a cry that mimicked a wolf, and how his horde repeated it en masse. This is a callback to the earliest legends of the origins of the Turks, which assert that they were birthed from a she-wolf, and lived as smiths among other peoples.

Probably the best treatment of their common ancestry is in The Genetic Legacy of the Expansion of Turkic-Speaking Nomads across Eurasia. Though genome-wide the predominant northeast Eurasian character of the original Turks is swamped out by the time one reaches Anatolia, there is still an enrichment of i.b.d. tracts even that far, indicating a lineal connection which stretched from Siberia down to the Middle East.

Anyone who first sees a map of Indo-European languages is often amazed and surprised by their expanse. How could premodern people be so expansive and widespread? And yet the Turks show exactly how such a thing could happen, and they expanded into a much more densely populated and civilized world than the Indo-Europeans.

Open Thread, 11/12/2017

One of the major insights of contemporary cognitive psychology is that a lot of human mental processes emerge from the intersection of lower level intuitions/models/instincts. The key is to remember that a lot of mental operations occur implicitly and rapidly, and we often construct ad hoc rationalizations after the fact (see The Enigma of Reason).

Because rationality is such a good talker many of us have deluded ourselves into thinking that instead of being a mouthpiece and a lawyer that gets us out of sticky situations, it’s actually calling the shots. No.

Anyone interested in these topics should check out Paul Bloom’s Descartes’ Baby: How the Science of Child Development Explains What Makes Us Human (or his other books).

This comes to mind when thinking about issues that have been bubbling up in our society. A friend on Facebook who is an evolutionary anthropologist wondered about the context of Harvey Weinstein’s serial rapes. I think A Natural History of Rape: Biological Bases of Sexual Coercion gets a bad rap because of the incendiary topic, but in this case, I think cognitive psychology yields a quicker and clearer answer. Weinstein is a very wealthy man, so if it was sex with nubile women he wanted, he could have paid for high-priced escorts (and it seems he did on occasion). But cognitive psychology suggests that people crave “authenticity.” Weinstein’s targeting and abuse of women he knew professionally and personally clearly provided for him an addictive frisson that paying for sex wouldn’t have given him.

Today people are passing around this “shock poll,” Poll: 37 percent of Alabama evangelicals more likely to vote for Moore after allegations. Probably most of these people think this is a politically motivated hit. That being said, it brought to mind a passage from In Gods We Trust where respondents asserted that disconfirming evidence in regards to their beliefs actually made them stronger in their beliefs.

In other words, when it comes to deeply held beliefs people aren’t going to react in a straightforward manner to reason and logic. Don’t be surprised if they behave irrationally. If the irrationality is consistent across individuals there’s probably some deeper psychology you aren’t accounting for.

The problem of doctors’ salaries. The AMA licensing cartel is keeping the supply of medical services constrained. Yes, we need more doctors. But we need more non-doctors to be able to do things that only doctors can do right now.

On the other hand, medical doctors have on average $200,000 of educational debt when they graduate. The high debt load is probably in part because there is the assumption that they will be making between $200,000 and $400,000 per year (though with income tax rates, as well as malpractice insurance, remember their net take home is considerably less).

These sorts of structural features are why we can’t have nice things. I suspect most people agree that the American tax code should be reformed…but people’s choices have been made with deductions in mind!

We’re rolling out more shirts for DNAGeeks. Eight people have bought GNXP t-shirts. Would be curious to post a picture of someone wearing one of those. A little surprised, but the Evo-Devo t-shirts are selling well. Anyone have any ideas for something more pop-gen related?

I love maps [THE MAP IS FAKE!] which have more granularity than country vs. country comparisons. I really hate when people compare the USA to European countries. California alone is nearly as populous as Spain, which isn’t even a small European country.

The map to the left shows the areas of high GDP in South Asia, though resizing each region by the size of its population would help give a better sense. The distinction between urban and rural is very stark in Bangladesh.

I predict Twitter will be clearly in a death spiral in a year. The proportion of highly polarized political chatter on my timeline keeps increasing, even though I’m not following anyone different. The vibrant years of “genomics twitter” seem to be a thing of the past.

The above tweet has gone somewhat viral. What did I mean above? The sort of thing in The End of History and the Last Man, that the terminal stable state of humanity would be post-materialist secular individualist liberalism. Though secularism seems to remain ascendant in the West, for now, the post-materialist individualist liberal project seems to be fraying. Instead of Western culture being a stand-in for global culture, it may be that in the near future it will again be just another culture among cultures.

Before the Indo-Europeans in Ukraine

It’s been ten years since I read The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. It’s a great book, but some of the material was very wrong. The author, David Anthony, helped provide samples which undercut his thesis that Indo-Europeanization in Europe was mostly a matter of elite cultural diffusion. Rather, it looks as if there was a massive migration from the steppes.

The Horse, the Wheel, and Language was heavy on archaeology which I found hard to follow. The Cucuteni–Trypillia culture plays a major role in the narrative since it seems to have been a source of cultural influence on the Yamnaya steppe culture which eventually overran it. A new preprint seems to confirm that there was a genetic discontinuity. Analysis of ancient human mitochondrial DNA from Verteba Cave, Ukraine: insights into the origins and expansions of the Late Neolithic-Chalcolithic Cututeni-Tripolye Culture

…Burials at Verteba Cave are largely commingled and secondary in nature. A total of 68 individual bone specimens were analyzed. Most of these specimens were found in association with well-defined Tripolye artifacts. We determined 28 mtDNA D-Loop (368 bp) sequences and defined 8 sequence types, belonging to haplogroups H, HV, W, K, and T. These results do not suggest continuity with local pre-Eneolithic peoples, but rather complete population replacement. We constructed maximum parsimonious networks from the data and generated population genetic statistics…We find different signatures of demographic expansion for the Tripolye people that may be caused by existing population structure or the spatiotemporal nature of ancient data. Regardless, peoples of the Tripolye Culture are more closely related to early European farmers and lack genetic continuity with Mesolithic hunter-gatherers or pre-Eneolithic groups in Ukraine.

There is stuff in the preprint about population expansion. My personal opinion is that in most cases genetics doesn’t add much beyond what archaeology does for humans in reconstructing population history. Rather, these results in concert with others are strongly indicative of population turnover. Uniparental lineages are still useful, but only in the context of other data.

The great thing about genetics when it is so clear and striking is that it clears up confusions about relationships in the past that otherwise would be unclear. It’s like having a time machine. So we now know that early European farmers (EEF) were ancestors of this particular culture. Over the next decade or so we’ll get a really granular understanding of the ebb and flow of populations across prehistoric and historic Europe. This won’t abolish all controversy, but it will reduce the space of the unknown….

Bank your exome with Helix for free ($0.00) [update, SALE ENDED!]

Update: Sale over!

I wasn’t going to do this again, but I’ve decided to promote Helix’s special discount. It ends at 2:59 AM EDT November 10th. Eight hours from when I push this post.

Obviously, there is a conflict of interest as I work for one of Helix’s partners. What does that mean?

  • Helix does an exome+ sequence and stores your data.
  • Then, you buy applications which use that data.
  • The company I work for is one of the application providers.
  • “Exome” means that Helix does a very accurate medical grade sequence of all your genes. The “+” points to the fact that they include a substantial number of positions which are not within genes (in the “junk DNA”). That totals up to 30,000,000+ markers (the exome is 1% of your whole genome). This is not trivial. Current direct-to-consumer genomics companies are looking at 500,000 to 1,000,000 markers with SNP arrays.
  • Helix keeps this data. Within a few months, you can buy the data at cost (it won’t be cheap!). But the model is that you buy à la carte apps, which will be affordable (our products are affordable).

I’m laying this all out very plainly because many people are asking me about these details right now as the sale winds down, and this includes people who are pretty savvy about personal genomics. Here is why I think you should get the kits now:

  1. It gets my company more customers. That’s the self-interested part, and less important for the target audience.
  2. For you, it gets you an exome that you can buy later without any upfront cost. For the next eight hours, Helix is basically waiving the kit costs by dropping the price $100.

Our Neanderthal product is now $9.99. Our Metabolism product is $19.99. These products are great, as they give you functional information in a very user-friendly manner. But a lot of my readers can analyze their own data, so what’s the incentive then? Again, the incentive is that you get an exome for free, and can later buy it if you want, or, perhaps even a savvy personal genomics consumer will find an app they’ll want to purchase. Normally the kit is $80, so buying it now means you’ll never have to pay this cost. If you are the type of person who has qualms about a private company keeping your data, this may not be for you.

Of course, there are other app developers in the Helix store, so just buy whatever you want. This is a way to get your exome sequenced for free now. I will tell you that the Insitome apps are among the cheapest.

Finally, a lot of people are buying “family-pack” quantities. I got four kits for example for my immediate family. Unfortunately, there are some issues with the Helix site and the extra purchases. You can buy more than one easily at Amazon right now. Our Neanderthal product is not in low stock. The Metabolism product has only a few left, though I don’t know what that means.

Note: The discount is client-side, so you may need to switch browsers if you are going to the Helix site to buy (or turn off ad-block). From what I can see Amazon does not have these issues.