Razib Khan’s raw genotype data on 23andMe, Family Tree DNA, Geno 2.0 and Ancestry

It has been a while since I posted an update on my genotype. Since then I’ve been tested on most of the major platforms. I don’t see any harm in releasing this to the public or researchers who want to look at it (though I don’t know why anyone would).

You can download all the files here.

Having my genotypes public is pretty useful for me. If I inquire about someone’s genetics, oftentimes people get weirdly defensive and ask “what about you?” I just invite them to look at my raw data and analyze it for themselves! I’m not a hypocrite about this.

Over the years I’ve had researchers inquire about my ethnicity when they stumble upon my genotype on platforms such as openSNP. So in full disclosure, most of my ancestry is pretty standard eastern Bengali. I’m more East Asian shifted than most Bangladeshi samples in the 1000 Genomes project, but then my family is from Comilla, in the far east of eastern Bengal (for anyone who cares, my Y is of course R1a1a-Z93 and my mtDNA is U2b).

As before I’ll put the genotype under a Creative Commons license.

Bank your exome with Helix for free ($0.00) [update, SALE ENDED!]

Update: Sale over!

I wasn’t going to do this again, but I’ve decided to promote Helix’s special discount. It ends at 2:59 AM EDT November 10th. Eight hours from when I push this post.

Obviously, there is a conflict of interest as I work for one of Helix’s partners. What does that mean?

  • Helix does an exome+ sequence and stores your data.
  • Then, you buy applications which use that data.
  • The company I work for is one of the application providers.
  • “Exome” means that Helix does a very accurate medical grade sequence of all your genes. The “+” points to the fact that they include a substantial number of positions which are not within genes (in the “junk DNA”). That totals up to 30,000,000+ markers (the exome is 1% of your whole genome). This is not trivial. Current direct-to-consumer genomics companies are looking at 500,000 to 1,000,000 markers with SNP arrays.
  • Helix keeps this data. Within a few months, you can buy the data at cost (it won’t be cheap!). But the model is that you buy à la carte apps, which will be affordable (our products are affordable).
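For anyone who wants to check the marker arithmetic in the list above, here is a rough back-of-the-envelope sketch. The genome size of about 3.1 billion base pairs is my assumption (the post only gives the 1% figure), the function names are made up for illustration, and the "+" positions outside genes are not modeled separately, so treat the output as an order-of-magnitude check rather than Helix's actual marker count.

```python
# Rough sanity check of the marker counts quoted in the post.
GENOME_BP = 3_100_000_000             # assumption: approximate haploid human genome size
EXOME_PERCENT = 1                     # from the post: the exome is about 1% of the genome
ARRAY_MARKERS = (500_000, 1_000_000)  # from the post: typical SNP array range


def exome_positions(genome_bp=GENOME_BP, percent=EXOME_PERCENT):
    """Approximate number of positions covered by an exome (integer arithmetic)."""
    return genome_bp * percent // 100


def fold_over_arrays(positions, arrays=ARRAY_MARKERS):
    """How many times more positions than each SNP array size."""
    return tuple(positions / n for n in arrays)


positions = exome_positions()
print(f"approximate exome positions: {positions:,}")
for size, fold in zip(ARRAY_MARKERS, fold_over_arrays(positions)):
    print(f"vs. a {size:,}-marker array: about {fold:.0f}x more positions")
```

Under these assumptions the exome lands at roughly 31 million positions, which is consistent with the "30,000,000+" figure, and works out to something like 30 to 60 times what a SNP array reads.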

I’m laying this all out very plainly because many people are asking me about these details right now as the sale winds down, and this includes people who are pretty savvy about personal genomics. Here is why I think you should get the kits now:

  1. It gets my company more customers. That’s the self-interested part, and less important for the target audience.
  2. For you, it gets you an exome that you can buy later without any upfront cost. For the next eight hours, Helix is basically waiving the kit costs by dropping the price by $100.

Our Neanderthal product is now $9.99. Our Metabolism product is $19.99. These products are great, as they give you functional information in a very user-friendly manner. But a lot of my readers can analyze their own data, so what’s the incentive then? Again, the incentive is that you get an exome for free, and can later buy it if you want, or, perhaps even a savvy personal genomics consumer will find an app they’ll want to purchase. Normally the kit is $80, so buying it now means you’ll never have to pay this cost. If you are the type of person who has qualms about a private company keeping your data, this may not be for you.

Of course, there are other app developers in the Helix store, so just buy whatever you want. This is a way to get your exome sequenced for free now. I will tell you that the Insitome apps are among the cheapest.

Finally, a lot of people are buying “family-pack” quantities. I got four kits for example for my immediate family. Unfortunately, there are some issues with the Helix site and the extra purchases. You can buy more than one easily at Amazon right now. Our Neanderthal product is not in low stock. The Metabolism product has only a few left, though I don’t know what that means.

Note: The discount is client-side, so you may need to switch browsers if you are going to the Helix site to buy (or turn off ad-block). From what I can see Amazon does not have these issues.

10 million DTC dense marker genotypes by end of 2017?


Today I got an email from 23andMe that they’d hit the 2 million customer mark. Since they reached their goal of 1 million kits purchased, the company seems to have taken its foot off the pedal of customer base growth to focus on other things (in particular, how to get phenotypic data from those who have been genotyped). In contrast, Ancestry has been growing at a faster rate of late. After talking to Spencer Wells (who was there at the beginning of the birth of this sector) we estimated that the direct-to-consumer genotyping kit business is now north of 5 million individuals served. Probably closer to 6 or 7 million, depending on the numbers you assume for the various companies (I’m counting autosomal only).

This is pretty awesome. Each of these firms genotypes in the range of 100,000 to 1 million variant markers, or single nucleotide base pairs. 20 years ago this would have been an incredible achievement, but today we’re all excited about long-read sequencing from Oxford Nanopore. SNP-chips are almost ho-hum.

But though sequencing is the cutting edge, the final frontier and terminal technology of reading your DNA code, genotyping in humans will be around for a while because of cost. At ASHG last year a medical geneticist was claiming price points in bulk for high density SNP-chips are in the range of the low tens of dollars per unit. A good high coverage genome sequence is still many times more expensive (perhaps an order of magnitude or more depending on who you believe). It also can impose more data processing costs than a SNP-chip in my experience.

Here’s a slide from Spencer:

I suspect genotyping growth will go S-shaped before 2025, after this period of explosive growth. Some people will opt out. A minority of the population, but a substantial proportion. At the other extreme of the preference distribution you will have those who will start getting sequenced. Researchers will begin to talk about genotyping platforms like they talk about microarrays (yes, I know at places like the Broad they already talk about genotyping like that, but we can’t all be like the Broad!).

Here’s an article from 2007 on 23andMe in Wired. They’re excited about paying $1,000 for genotyping services…the cost now of the cheapest high quality (30x) whole genome sequences. Though 23andMe has a higher price point for its medical services, many of the companies are pushing their genotyping+ancestry below $100, a value it had stabilized at for a few years. Family Tree DNA has a Father’s Day sale for $69 right now. Ancestry looks to be $79. The Israeli company MyHeritage is also pushing a $69 sale price (the CSO there is advertising that he’s hiring human geneticists, just so you know). It seems very likely that a $50 price point is within sight in the next few years as SNP-chip costs become trivial and all the expenses are on the data storage/processing and visualization costs. I think psychologically for many people paying $50 is not cheap, but it is definitely not expensive. $100 feels expensive.

Ultimately I do wonder if I was a bit too optimistic that 50% of the US population will be sequenced at 30x by 2025. But the dynamic is quite likely to change rapidly because of a technological shift as the sector goes through a productivity uptick. We’re talking about exponential growth, which humans have weak intuition about….
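To make the exponential-growth point a bit more concrete, here is a small illustrative sketch of what the headline question implies. It assumes clean exponential growth with a fixed doubling time, which real adoption will not follow exactly. The 6 million starting count is the rough estimate from this post, and the six months remaining in 2017 is my assumption based on the Father's Day sales mentioned above. The function names are hypothetical.

```python
import math


def required_doubling_time(start, target, months):
    """Doubling time (in months) needed to grow from start to target
    within the given number of months, assuming pure exponential growth."""
    doublings = math.log2(target / start)
    return months / doublings


def project(start, doubling_months, months):
    """Projected count after `months` at a fixed doubling time."""
    return start * 2 ** (months / doubling_months)


start = 6_000_000    # rough current estimate from the post
target = 10_000_000  # the headline figure
months_left = 6      # assumption: roughly mid-2017 to end of 2017

needed = required_doubling_time(start, target, months_left)
print(f"doubling time needed: about {needed:.1f} months")
print(f"count after 24 months at that pace: {project(start, needed, 24):,.0f}")
```

The useful part is less the specific numbers than how quickly the projection runs away once a doubling time is held fixed for a couple of years.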

Addendum: Go into the archives of Genomes Unzipped and read the older posts. Those guys knew where we were heading…and we’re pretty much there.