castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.

12/25/2008

 

1000 Genomes project, or more?

It turns out that the Solexa machines are improving at such a pace that the calculations the 1000 Genomes Project originally made no longer hold. On the assumption that the read length and throughput of these NextGen machines would increase in 2008 and 2009, the project was given enough funding to fully sequence 1000 genomes from a panel of diverse ethnicities. Production capacity is currently led by the Sanger Institute, the Beijing Genomics Institute and the Broad Institute, followed by Baylor and WashU. The Max Planck offered to do some production in mid 2008, and Illumina, Roche and ABI are also contributing. The machines are now better, which means that the project is going to aim at even more than 1000 genomes.

Where does the money come from? Well, it comes from biomedical research funding, as the aim of the project is to create a deep catalog of human genetic variation that will represent all rare shared variants in our species. This catalog will facilitate biomedical research by making it possible to survey phenotypes across all the sampled genotypes, and to link the two in order to identify the causes of human diseases and traits.

Beyond this obvious goal, such deep sampling of population genomics data will give us great clues about the evolutionary processes that have taken place in our genome over the last hundreds of thousands of years. In particular, one will be able to see what the polymorphism patterns along the chromosomes are, and how they correlate with all the genetic features we are getting from another big project, the scale-up ENCODE project. Add to that the comparative genomics information from closely related primates, to compare divergence vs polymorphism levels, and you have a winner!

Now that we even have a browser for the 1000 Genomes Project, you can get a glimpse of the kind of data the project will produce:

http://browser.1000genomes.org/Homo_sapiens/genesnpview?db=core;gene=ENSG00000128573;context=200
http://browser.1000genomes.org/Homo_sapiens/transcriptsnpview?db=core;transcript=ENST00000393489;context=200

Notice the "context=200" argument in the GeneSNPView and TranscriptSNPView URLs. Some people may mistakenly think that intronic sequences are depleted of variation, when one would expect most of the SNPs to be there. Well, they are there, but by default the context setting in the Gene and Transcript SNP Views restricts intronic SNPs to those within 100bp on either side of each exon. This gives a more coding-centric view of variation which, according to the Ensembl HelpDesk tickets, is what people working in hospitals around the world really like about this view.
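If you want to play with that argument, here is a minimal Python sketch that rewrites the "context" value in one of the URLs above. The helper name `with_context` is made up for illustration, and the value 500 is just an example: I am assuming the browser accepts other flank sizes in the same way it accepts 200, which you should check for yourself.

```python
from urllib.parse import urlsplit, urlunsplit


def with_context(url: str, context: int) -> str:
    """Return `url` with its 'context' argument set to `context` (flank size in bp).

    Hypothetical helper: it only edits the URL string and makes no request,
    so it cannot tell whether the browser accepts the new value.
    """
    parts = urlsplit(url)
    # These URLs separate their arguments with ';' rather than '&'.
    params = [
        p for p in parts.query.split(";")
        if p and not p.startswith("context=")
    ]
    params.append(f"context={context}")
    return urlunsplit(parts._replace(query=";".join(params)))


gene_view = (
    "http://browser.1000genomes.org/Homo_sapiens/genesnpview"
    "?db=core;gene=ENSG00000128573;context=200"
)
print(with_context(gene_view, 500))
```

The arguments are split by hand because the standard query-string parsers in recent Python versions only treat "&" as a separator by default, and these URLs use ";".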

I remember that when I joined the Ensembl project three years ago, these new machines were only a rumour, something that was secretly happening in a small science park in Great Chesterford, something that people at the time were dismissing as an undelivered promise: "Oh, but I've heard that they can only sequence 25bp...", etc., etc. It's been like that for a lot of other scientific and technological promises:

- like production plug-in electric/hybrid cars -- "Oh, but I've heard that they only have a range of a few miles..."
- inexpensive solar energy on the roof of your house -- "Oh, but I've heard that they only pay for themselves after 25 years..."
- your very own robotic butler -- "Oh, but I've heard that it doesn't even know how to make a good latte..."
