LaTeX notes

I recently installed a plugin that will allow me to render LaTeX. Mostly this is because I’ve long avoided writing out equations because it’s awkward in HTML, and it gets unintelligible quickly. This will allow me to explore population genetics in its “natural language” a little more easily.

But I just noticed on my RSS feed view the LaTeX is not rendering for whatever reason. Where there should be equations or LaTeX rendered text there is a blank space. This is unfortunate, but I don’t know what to do about it. So just click through if you want the equations. If you are willing to “hum through” those portions it shouldn’t matter. The equations aren’t going to take up much of any post.

Also, Donald Knuth is a god. Probably he’ll think that’s blasphemous because he’s a Lutheran and all, but The Art of Computer Programming beats the Bible in my book!

Why the rate of evolution may only depend on mutation

Sometimes people think evolution is about dinosaurs.

It is true that natural history plays an important role in inspiring and directing our understanding of evolutionary process. Charles Darwin was a natural historian, and evolutionary biologists often have strong affinities with the natural world and its history. Though many people exhibit a fascination with the flora and fauna around us during childhood, often the greatest biologists retain this wonderment well into adulthood (if you read W. D. Hamilton’s collections of papers, Narrow Roads of Gene Land, which have autobiographical sketches, this is very evidently true of him).

But another aspect of evolutionary biology, which began in the early 20th century, is the emergence of formal mathematical systems of analysis. So you have fields such as phylogenetics, which have gone from intuitive and aesthetic trees of life, to inferences made using the most new-fangled Bayesian techniques. And, as told in The Origins of Theoretical Population Genetics, in the 1920s and 1930s a few mathematically oriented biologists constructed much of the formal scaffold upon which the Neo-Darwinian Synthesis was constructed.

The product of evolution

At the highest level of analysis evolutionary process can be described beautifully. Evolution is beautiful, in that its end product generates the diversity of life around us. But a formal mathematical framework is often needed to clearly and precisely model evolution, and so allow us to make predictions. R. A. Fisher’s aim when he wrote The Genetical Theory of Natural Selection was to create for evolutionary biology something equivalent to the laws of thermodynamics. I don’t really think he succeeded in that, though there are plenty of debates around something like Fisher’s fundamental theorem of natural selection.

But the revolution of thought that Fisher, Sewall Wright, and J. B. S. Haldane unleashed has had real yields. As geneticists they helped us reconceptualize evolutionary process as more than simply heritable morphological change, but an analysis of the units of heritability themselves, genetic variation. That is, evolution can be imagined as the study of the forces which shape changes in allele frequencies over time. This reduces a big domain down to a much simpler one.

Genetic variation is a concrete currency with which one can track evolutionary process. Initially this was done via inferred correlations between marker traits and particular genes in breeding experiments. Ergo, the origins of “the fly room”.

But with the discovery of DNA as the physical substrate of genetic inheritance in the 1950s the scene was set for the revolution in molecular biology, which also touched evolutionary studies with the explosion of more powerful assays. Lewontin & Hubby’s 1966 paper triggered an order of magnitude increase in our understanding of molecular evolution through both theory and results.

The theoretical side occurred in the form of the development of the neutral theory of molecular evolution, which also gave birth to the nearly neutral theory. Both of these theories hold that most of the polymorphic variation within and between species is due to random processes. In particular, genetic drift. As a null hypothesis neutrality was very dominant for the past generation, though in recent years some researchers have suggested that selection has been undervalued as a parameter for various reasons.

Setting aside the live scientific debate, which continues to this day, one of the predictions of neutral theory is that the rate of evolution will depend only on the rate of mutation. More precisely, the rate of substitution of new mutations (where the allele goes from a single copy to fixation of ~100%) is proportional to the rate of mutation of new alleles. Population size doesn’t matter.

The algebra behind this is straightforward.

First, remember that the frequency of a new mutation within a population is \frac{1}{2N}, where N is the population size (the 2 is because we’re assuming diploid organisms with two gene copies). This is also the probability of fixation of a new mutation in a neutral scenario; its probability is just proportional to its initial frequency (it’s a random walk process between 0 and 1.0 proportions). The rate of mutation is defined by \mu, the number of expected mutations at a given site per generation (this is a pretty small value, for humans it’s on the order of 10^{-8}). Again, there are 2N gene copies, so you have 2N\mu to count the number of new mutations.

The probability of fixation of a new mutation multiplied by the number of new mutations is:

    \[ \left( \frac{1}{2N} \right) \times 2N\mu = \mu \]

So there you have it. The rate of fixation of these new mutations is just a function of the rate of mutation.
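
To make the cancellation concrete, here is a minimal Python sketch. It assumes a Wright-Fisher model (the 2N gene copies are binomially resampled each generation), which is one standard way to formalize the random walk described above but is not named in this post. The function names and parameter values are purely illustrative.

```python
import random


def substitution_rate(pop_size, mu):
    """Neutral substitution rate: (new mutations per generation) x (fixation probability)."""
    new_mutations = 2 * pop_size * mu      # 2N gene copies, each mutating at rate mu
    fixation_prob = 1 / (2 * pop_size)     # a new neutral mutation starts at frequency 1/(2N)
    return new_mutations * fixation_prob   # the 2N terms cancel, leaving mu


def simulate_fixation_prob(pop_size, trials, seed=0):
    """Estimate the fixation probability of a single new neutral copy.

    Assumption: Wright-Fisher drift, i.e. each generation the 2N copies are
    redrawn binomially from the current allele frequency.
    """
    rng = random.Random(seed)
    copies = 2 * pop_size
    fixed = 0
    for _ in range(trials):
        count = 1  # one new mutant copy
        while 0 < count < copies:
            freq = count / copies
            count = sum(rng.random() < freq for _ in range(copies))
        if count == copies:
            fixed += 1
    return fixed / trials


if __name__ == "__main__":
    mu = 1e-8
    for n in (100, 10_000, 1_000_000):
        print(n, substitution_rate(n, mu))  # about 1e-8 whatever the population size
    # For N = 10 the estimate should land near 1/(2N) = 0.05
    print(simulate_fixation_prob(pop_size=10, trials=5000))
```

The first function is just the algebra above; the simulation is there only to show that 1/(2N) is a reasonable fixation probability under drift alone, using a small N so it runs quickly.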

Simple formalisms like this have a lot more gnarly math that extends them and from which they derive. But they’re often pretty useful to gain a general intuition of evolutionary processes. If you are genuinely curious, I would recommend Elements of Evolutionary Genetics. It’s not quite a core dump, but it is a way you can borrow the brains of two of the best evolutionary geneticists of their generation.

Also, you will be able to answer the questions on my survey better the next time!

The logic of human destiny was inevitable 1 million years ago

Robert Wright’s best book, Nonzero: The Logic of Human Destiny, was published nearly 20 years ago. At the time I was moderately skeptical of his thesis. It was too teleological for my tastes. And, it does pander to a bias in human psychology whereby we look to find meaning in the universe.

But this is 2017, and I have somewhat different views.

In the year 2000 I broadly accepted the thesis outlined a few years later in The Dawn of Human Culture. That our species, our humanity, evolved and emerged in rapid sequence, likely due to biological changes of a radical kind, ~50,000 years ago. This is the thesis of the “great leap forward” of behavioral modernity.

Today I have come closer to models proposed by Michael Tomasello in The Cultural Origins of Human Cognition and Terrence Deacon in The Symbolic Species: The Co-evolution of Language and the Brain. Rather than a punctuated event, an instance in geological time, humanity as we understand it was a gradual process, driven by general dynamics and evolutionary feedback loops.

The conceit at the heart of Robert J. Sawyer’s often overly preachy Neanderthal Parallax series, that if our own lineage went extinct but theirs did not they would have created a technological civilization, is I think in the main correct. It may not be entirely coincidental that hyper-drive cultural flexibility evolved in African modern humans first. There may have been sufficient biological differences to enable this to be likely. But I believe that if African modern humans were removed from the picture Neanderthals would have “caught up” and been positioned to begin the trajectory we find ourselves in during the current Holocene inter-glacial.

Luke Jostins’ figure showing across-the-board encephalization

The data indicate that all human lineages were subject to increased encephalization. That process trailed off ~200,000 years ago, but it illustrates the general evolutionary pressures, ratchets, or evolutionary “logic”, that applied to all of them. Overall there were some general trends in the hominin lineage that began to characterize us about a million years ago. We pushed into new territory. Our rate of cultural change seems to have gradually increased across our whole range.

One of the major holy grails I see now and then in human evolutionary genetics is to find “the gene that made us human.” The scramble is definitely on now that more and more whole genome sequences from ancient hominins are coming online. But I don’t think any such gene will ever be found. There isn’t “a gene,” but a broad set of genes which were gradually selected upon in the process of making us human.

In the lingo, it wasn’t just a hard sweep from a de novo mutation. It was as much, or even more, soft sweeps from standing variation.

The case against nutrition “science”

My attitude toward nutrition science is to be skeptical of everything. I am of the generation that lived through the SnackWells fat-free cookie craze (demand was so high at one point that there was a problem with continuous understocking). A friend who is a professor of biology once admitted to me that part of him feels somewhat bad for anti-vaccination believers, because when it comes to nutrition he and many of his colleagues take a very jaundiced view of any orthodoxy. The surfeit of observational studies combined with the huge revenues at stake mean that skepticism is warranted.

This puts the public, and those who serve them, in a peculiar position. Last year I recall going to a restaurant where some of the menu items were labeled as “low cholesterol, heart healthy.” I told our server that there is no evidence that dietary cholesterol has any effect on serum levels in your body. But the overhang of nutritional orthodoxy persists, and the American Heart Association’s prominence and tendency to be a lagging indicator of the science is going to cast a pall over “public awareness” for decades.

Now researchers are going back to the original studies which supported modern orthodoxy, and finding results that are surprising. Re-evaluation of the traditional diet-heart hypothesis: analysis of recovered data from Minnesota Coronary Experiment (1968-73). Here is the conclusion:

Available evidence from randomized controlled trials shows that replacement of saturated fat in the diet with linoleic acid effectively lowers serum cholesterol but does not support the hypothesis that this translates to a lower risk of death from coronary heart disease or all causes. Findings from the Minnesota Coronary Experiment add to growing evidence that incomplete publication has contributed to overestimation of the benefits of replacing saturated fat with vegetable oils rich in linoleic acid.

The whole story is told over at Scientific American.

Open Thread, 3/23/2017

The reader survey is now at N > 300. I assume it will stabilize in the next few weeks in the 400s.

So far the biggest surprise that I’ve noticed is the ratio of married to divorced: 140 to 9. But this aligns with research that college educated people do not get divorced at a high rate, and more than 50% of my readership has completed graduate educations, so the sample is probably even more biased.

In France it is Macron vs. Le Pen for the second round it seems. It seems likely Macron will win the second round…but I do wonder if some far Left voters will refuse to vote for a candidate who is a pretty transparent avatar of the globalist elite.

I love California, but, In costly Bay Area, even six-figure salaries are considered ‘low income’:

San Francisco and San Mateo counties have the highest limits in the Bay Area — and among the highest such numbers in the country. A family of four with an income of $105,350 per year is considered “low income.” A $65,800 annual income is considered “very low” for a family the same size, and $39,500 is “extremely low.” The median income for those areas is $115,300.

The problem many, but not all, Lefties in this part of the country have is that their rhetoric is always about making housing affordable, not making more housing (which would naturally lead to more affordability).

Stanford CS department updates introductory courses: Java is Gone.

I was a bit surprised how few readers had read Matt Ridley’s Genome: The Autobiography of a Species in 23 Chapters. I’d highly recommend it.

A new wave of GSS data is out. Might start some GSS blogging again.

Maybe moderate drinking isn’t so good for you after all:

But our latest research challenges this view. We found while moderate drinkers are healthier than relatively heavy drinkers or non-drinkers, they are also wealthier. When we control for the influence of wealth, then alcohol’s apparent health benefit is much reduced in women aged 50 years or older, and disappears completely in men of similar age.

People I know had long warned these were observational studies. But perhaps I run with a strange crowd….

Why the Menace of Mosquitoes Will Only Get Worse: Climate change is altering the environment in ways that increase the potential for viruses like Zika.

America’s great Saudi foreign policy sin

The future past

Periodically on my Twitter feed there is mention of the new series, The Handmaid’s Tale. The New York Times has a typical positive review. The author attempts to assert its contemporary relevance, ending with ‘the new “Handmaid’s Tale” enters the culture as its own kind of Offred-like resistance, pushing back against a reality that somehow got ahead of the show’s own imagination.’

This is not the 1980s. Or the early 2000s. The President of the United States is a nominal Christian at best. Maggie Haberman, who covers Trump for The New York Times, had this to say about his relationship to Mike Pence:

…When Trump and Pence were first getting to know each other, the one thing that Trump had relayed to people, according to several advisers I spoke to at the time, was that he was a little uncomfortable with how frequently Pence prayed. And Pence is fairly devout about his praying. Trump is not a serious churchgoer and in an anomaly for a presidential candidate, very rarely went to church services when he was running….

We live in an age of massive secularization, even on the conservative Right. Ergo, the rise of a post-religious Right predicated on ethnic identity, whether implicitly or explicitly. Though Donald Trump and the Republicans in Congress are going to roll back a few of the victories of the cultural Left, there is no likelihood of turning back the clock on the biggest win of the last generation for that camp, gay marriage.

Also, don’t watch the series, read the book. Books are usually better. While I’m recommending reading: Atwood’s work gets a lot of attention (it was already made into a film back in 1990), but I want to suggest Pamela Sargent’s The Shore of Women for those curious about a different take on broadly similar themes. Flipping the framework of The Handmaid’s Tale on its head, Sargent depicts a far future gynocracy, as opposed to a near future patriarchy. Additionally, The Shore of Women has echoes of the bizarre 1970s film Zardoz.

I’ve always felt that Sargent is an underrated writer (also see Ruler of the Sky, a novelization of the life of Genghis Khan). Her output is not high volume, but it is high quality.

But this post is not about The Handmaid’s Tale, and the specter of an anti-feminist dystopia. Rather, it will be on the reality of an anti-feminist dystopia which exists in our world, which also happens to be religiously totalitarian and oligarchic. I am talking about the great ally of the United States of America in the Middle East, the kingdom of Saudi Arabia.

2017 Gene Expression reader survey

Since I’m finally getting settled in here, I thought it was a good time to do a reader survey:

So it’s open. You can only take it once, but it shouldn’t take more than a few minutes. There are 30 questions but the first 20 are mostly demographic and should go very quickly (e.g., your age, your sex, your race), and the last 10 are not difficult either (if you don’t know if you are a deontologist or consequentialist on ethics, don’t answer). Many are now of the form where you can answer more than one option.

I basically took the template of last year’s survey, made several changes, removing some questions and adding some. Also, I stole a few from Slate Star Codex.

You can read the non-text answers of the 2016 survey here.

In the middle of May I will take the raw data (no-IP) and post it here so others can analyze it if they want.

Addendum 1: Since I don’t know where else to put this, I have noticed an increase in referrals through my Amazon links. So that’s much appreciated. Obviously I’m not really getting paid much for blogging or doing the sysadmin activities, but it’s definitely going toward covering overages from VPS traffic or anything like that. Remember, even if you don’t buy directly through the link I still get a referral if you are on Amazon during a session and buy something different.

Addendum 2: Forgot to mention. I’ve been doing reader surveys since 2004. The final tally of the number of people who fill out the survey is always between 300 and 500, regardless of how much traffic I received (my traffic has varied about an order of magnitude over the years). It is curious to me that this “core readership” (as I perceive it) is about the same size as a Roman cohort.

Aryan marauders from the steppe came to India, yes they did!

It seems every post on Indian genetics elicits dissents from loquacious commenters who are woolly on the details of the science, but convinced in their opinions (yes, they operate through uncertainty and obfuscation in their rhetoric, but you know where the axe is lodged). This post is an attempt to answer some questions so I don’t have to address this in the near future, as ancient DNA papers will finally start to come out soon, I hope (at least earlier than Winds of Winter).

In 2001’s The Eurasian Heartland: A continental perspective on Y-chromosome diversity Wells et al. wrote:

The current distribution of the M17 haplotype is likely to represent traces of an ancient population migration originating in southern Russia/Ukraine, where M17 is found at high frequency (>50%). It is possible that the domestication of the horse in this region around 3,000 B.C. may have driven the migration (27). The distribution and age of M17 in Europe (17) and Central/Southern Asia is consistent with the inferred movements of these people, who left a clear pattern of archaeological remains known as the Kurgan culture, and are thought to have spoken an early Indo-European language (27, 28, 29). The decrease in frequency eastward across Siberia to the Altai-Sayan mountains (represented by the Tuvinian population) and Mongolia, and southward into India, overlaps exactly with the inferred migrations of the Indo-Iranians during the period 3,000 to 1,000 B.C. (27). It is worth noting that the Indo-European-speaking Sourashtrans, a population from Tamil Nadu in southern India, have a much higher frequency of M17 than their Dravidian-speaking neighbors, the Yadhavas and Kallars (39% vs. 13% and 4%, respectively), adding to the evidence that M17 is a diagnostic Indo-Iranian marker. The exceptionally high frequencies of this marker in the Kyrgyz, Tajik/Khojant, and Ishkashim populations are likely to be due to drift, as these populations are less diverse, and are characterized by relatively small numbers of individuals living in isolated mountain valleys.

In a 2002 interview with the Indian site Rediff, the first author was more explicit:

Some people say Aryans are the original inhabitants of India. What is your view on this theory?

The Aryans came from outside India. We actually have genetic evidence for that. Very clear genetic evidence from a marker that arose on the southern steppes of Russia and the Ukraine around 5,000 to 10,000 years ago. And it subsequently spread to the east and south through Central Asia reaching India. It is on the higher frequency in the Indo-European speakers, the people who claim they are descendants of the Aryans, the Hindi speakers, the Bengalis, the other groups. Then it is at a lower frequency in the Dravidians. But there is clear evidence that there was a heavy migration from the steppes down towards India.

But some people claim that the Aryans were the original inhabitants of India. What do you have to say about this?

I don’t agree with them. The Aryans came later, after the Dravidians.

Over the past few years I’ve gotten to know the above first author Spencer Wells as a personal friend, and I think he would be OK with me relaying that to some extent he was under strong pressure to downplay these conclusions. Not only were, and are, these views not popular in India, but the idea of mass migration was in bad odor in much of the academy during this period. Additionally, there was later work which was less clear, and perhaps supported an Indian origin for R1a1a. Spencer himself told me that it was not impossible for R1a to have originated in India, but a branch eventually back-migrated to southern Asia.

But even researchers from the group at Stanford where he had done his postdoc did not support this model by the mid-2000s: Polarity and Temporality of High-Resolution Y-Chromosome Distributions in India Identify Both Indigenous and Exogenous Expansions and Reveal Minor Genetic Influence of Central Asian Pastoralists. In 2009 a paper out of an Indian group was even stronger in its conclusion for a South Asian origin of R1a1a: The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system.

By 2009 one might have admitted that perhaps Spencer was wrong. I was certainly open to that possibility. There was very persuasive evidence that the mtDNA lineages of South Asia had little to do with Europe or the Middle East.

Yet a closer look at the above papers reveals two major systematic problems.

First, ancient DNA has made it clear that there has been major population turnover during the Holocene, but this was not the null hypothesis in the 2000s. Looking at extant distributions of lineages can give one a distorted view of the past. Frankly, the 2009 Indian paper was egregious in this way because they included Turkic groups in their Central Asian data set. Even in 2009 there was a whole lot of evidence that Central Asian Turkic groups were likely very different from Indo-European Turanian populations which would have been the putative ancestors of Indo-Aryans. Honestly the authors either consciously loaded the dice to reduce the evidence for gene flow from Central Asia, or they were ignorant (the nature of the samples is much clearer in the supplements than the primary text for what it’s worth).

Second, Y chromosomal marker sets in the 2000s were constrained to fast mutating microsatellite regions or fewer than 100 variant SNPs on the Y. Because it is so repetitive the Y chromosome is hard to sequence, and it really took the technologies of the last ten years to get it done. Both the above papers estimate the coalescence of extant R1a1a lineages to be 10-15,000 years before the present. In particular, they suggest that European and South Asian lineages date back to this period, pushing back any possible connection between the groups, and making it possible that European R1a1a descended from a South Asian founder group which was expanding after the retreat of the ice sheets. The conclusions were not unreasonable based on the methods they had. But now we have better methods.*

Whole genome sequencing of the Y, as well as ancient DNA, seems to falsify the above dates. Though microsatellites are good for very coarse-grained phylogenetic inferences, one has to be very careful about them when looking at more fine-grained population relationships (they are still useful in forensics to cheaply differentiate between individuals, since they accumulate variation very quickly). They mutate fast, and their clock may be erratic.

Additionally, diversity estimates were based on a subset of SNPs that were clearly not robust. R1a1a is not diverse anywhere, though basal lineages seem to be present in ancient DNA on the Pontic steppe in some cases.

To show how lacking in diversity R1a1a is, here are the results of a 2016 paper which performed whole genome sequencing on the Y. Instead of relying on the order of 10 to 100 SNPs, this paper discovered over 65,000 Y variants worldwide. Notice how little difference there is between different South Asian groups below, indicative of a massive population expansion relatively recently in time, so recent that regional population variation has not had time to emerge. They note that “The most striking are expansions within R1a-Z93 [the South Asian clade], ~4.0–4.5 kya. This time predates by a few centuries the collapse of the Indus Valley Civilization, associated by some with the historical migration of Indo-European speakers from the western steppes into the Indian sub-continent.”

Oxford Nanopore finally giving hope to biologist’s dreams

I don’t talk too much about genomic technology because it changes so fast. Being up-to-date on the latest machines and tools often requires regular deep-dives right now, though I believe at some point technological improvements will plateau as the data returned will be cheap and high quality enough that there won’t be much to gain on the margin.

Of course we’ve already come a long way. Fifteen years ago a “whole human genome” cost on the order of billions of dollars. Today a high quality whole human genome will run you on the order of $1,000. This is fundamentally a technology driven change, with big metal machines automatically generating reads and powerful computers to process them. One couldn’t imagine such a scenario 30 years ago because the technology wasn’t there.

I’ve stated before that I don’t think genomics fundamentally alters what we know and understand about evolution. At least so far. But it is a huge change in the domain of medicine. Clearly the human genomicists, especially Francis Collins, overhyped the yield of the technology in relation to healthcare in the 2000s. But with cheap and ubiquitous sequencing we may see the end of Mendelian diseases in our lifetime (through screening and possibly at some point CRISPR therapy).

This has been driven by technological innovation in the private sector around a few firms. The famous chart showing the massive decline in the cost of genomic sequencing over the past 15 years is due in large part to the successes of Illumina. But, Illumina has also had a quasi-monopoly on the field over the past five years (or more), and that shows with the leveling off of the decline in cost. Until the past year….

What gives? Many people believe that Illumina is moving again in part because a genuine challenger is emerging, or at least the flicker of a challenge, in the form of Oxford Nanopore. Oxford Nanopore has been around since 2005, but it really came into the public eye around 2010 or so. But like many tech companies it overpromised in the early years. I remember skeptically listening to a friend in the fall of 2011 talk about how quickly Nanopore was going to change the game…. I didn’t put too much stock into these sorts of presentations to hopeful researchers because I remember Pacific Biosciences making the same sort of pitch to amazed biologists in 2008. Pac Bio is still around, but has turned out to be a bit player, rather than a challenger to Illumina.

But I have to admit that Nanopore has really started to step up its game of late. Probably one of the major things it has accomplished is that it’s made us reimagine what sequencing technology should look like. Rather than refrigerators of various sizes, Oxford Nanopore allows us to imagine sequencing technology which exhibits a form factor more analogous to a USB thumb drive. The first time I saw a Nanopore machine in the flesh I knew intellectually what I was going to see…but because of my deep intuitions I still overlooked the two Nanopore machines lying on the workbench in front of me.

Despite their amazing form factor, these early Nanopore machines had limited application. They didn’t generate much data, and so were utilized by researchers who worked with smaller genomes. Scientists who worked with bacteria seem to have been using them a lot, for example. Additionally the machines were error prone and people were working out their kinks in real time in laboratories (one tech told me early on they were so small that he swore they were affected by ambient vibrations so he found ways to dampen that source of error).

A new preprint suggests we may be turning the corner though, Nanopore sequencing and assembly of a human genome with ultra-long reads:

Nanopore sequencing is a promising technique for genome sequencing due to its portability, ability to sequence long reads from single molecules, and to simultaneously assay DNA methylation. However until recently nanopore sequencing has been mainly applied to small genomes, due to the limited output attainable. We present nanopore sequencing and assembly of the GM12878 Utah/Ceph human reference genome generated using the Oxford Nanopore MinION and R9.4 version chemistry. We generated 91.2 Gb of sequence data (~30x theoretical coverage) from 39 flowcells. De novo assembly yielded a highly complete and contiguous assembly (NG50 ~3Mb). We observed considerable variability in homopolymeric tract resolution between different basecallers. The data permitted sensitive detection of both large structural variants and epigenetic modifications. Further we developed a new approach exploiting the long-read capability of this system and found that adding an additional 5x-coverage of “ultra-long” reads (read N50 of 99.7kb) more than doubled the assembly contiguity. Modelling the repeat structure of the human genome predicts extraordinarily contiguous assemblies may be possible using nanopore reads alone. Portable de novo sequencing of human genomes may be important for rapid point-of-care diagnosis of rare genetic diseases and cancer, and monitoring of cancer progression. The complete dataset including raw signal is available as an Amazon Web Services Open Dataset at:

30x just means that each base is sampled about 30 times on average, so that you have a very accurate and precise read on its state. 30x has become the default standard in medical genomics. If Nanopore can do 30x on human genomes at reasonable cost it won’t be a niche player much longer.
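
As a rough check on the “~30x” figure, coverage is just total sequenced bases divided by genome size. The sketch below uses the 91.2 Gb from the abstract; the ~3.1 Gb haploid human genome size is an assumption made here, not a number stated in the preprint, and the function name is illustrative.

```python
def mean_coverage(total_bases, genome_size):
    """Average number of times each base in the genome is sampled."""
    return total_bases / genome_size


if __name__ == "__main__":
    sequenced = 91.2e9     # 91.2 Gb of sequence data, from the abstract
    human_genome = 3.1e9   # assumed haploid genome size (~3.1 Gb); not from the preprint
    print(round(mean_coverage(sequenced, human_genome), 1))  # 29.4
```

This is an average: individual positions will be covered more or less often than that, which is part of why a cushion like 30x is used.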

The read length is important because last I checked the human genome still had large holes in it. The typical Illumina machine produces average read lengths in the low hundreds of base pairs. If you have large repetitive regions of the human genome (and you do have these), you’re never going to span them with such short yardsticks. Additionally, these short reads have to be tiled together when you assemble a genome from raw results, and this is a computationally really intensive task. It’s good when you have a reference genome you can align to as a scaffold. But researchers who don’t work on humans or model organisms may not have a good reference genome, or in many cases a reference genome at all.
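
The “yardstick” point can be put as a simple inequality: a read only resolves a repeat if it covers the whole repeat plus some unique sequence on both sides. A toy sketch, in which the 50 bp anchor, the 10 kb repeat, and the 150 bp short read are illustrative assumptions rather than figures from any platform’s specifications:

```python
def can_span(read_length, repeat_length, anchor=50):
    """True if a read can cover the repeat plus `anchor` unique bases on each flank."""
    return read_length >= repeat_length + 2 * anchor


if __name__ == "__main__":
    repeat = 10_000  # hypothetical 10 kb repetitive region
    print(can_span(150, repeat))     # False: a short read falls entirely inside the repeat
    print(can_span(99_700, repeat))  # True: a read at the ultra-long N50 quoted in the abstract
```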

Pac Bio occupies a space where it provides really long reads for a high price point. Most of the time this isn’t necessary, but imagine you work on a disease which is caused by large repetitive regions. You are likely willing to pay the price that is asked. And because Pac Bio generates very long reads it makes de novo assembly much easier, as your algorithm has to tile together far fewer contiguous sequences, and long sequences are less likely to have lots of repetitive matches in the genome.

But Pac Bio machines are expensive and huge. The abstract above alludes to “Portable de novo sequencing of human genomes.” This is a huge deal. The dream, as whispered by some genomicists I have known, is that at a point in the future biologists would carry portable sequencers which would produce very long reads so that they could de novo assemble sequences on the spot. A concrete example might be a health inspector checking on the sorts of microbes found on the counter of a restaurant, or a field ecologist who might sample various fungi to discover cryptic species.

Obviously this is still a dream. The preprint above makes it clear that to do what they did required a lot of novel techniques and development of new tools. This isn’t beta technology, it’s early alpha. But because it’s 2017 the outlines of the dream are coming into public view.

Citation: Nanopore sequencing and assembly of a human genome with ultra-long reads
Miten Jain, Sergey Koren, Josh Quick, Arthur C Rand, Thomas A Sasani, John R Tyson, Andrew D Beggs, Alexander T Dilthey, Ian T Fiddes, Sunir Malla, Hannah Marriott, Karen H Miga, Tom Nieto, Justin O’Grady, Hugh E Olsen, Brent S Pedersen, Arang Rhie, Hollian Richardson, Aaron Quinlan, Terrance P Snutch, Louise Tee, Benedict Paten, Adam M. Phillippy, Jared T Simpson, Nicholas James Loman, Matthew Loose
bioRxiv 128835; doi: