Every week the project deposits as much data as the public databases had accumulated in total beforehand (over the past ~20-30 years).
Data coordination (EBI: Paul Flicek, Laura + Zam): keep the submissions, run QC and recalibration, and present the data in a
mini Ensembl browser. Working on the public release of Resembl (Solexa); it needs to be implemented with MySQL 5.1 partitioning instead of a commercial database.
Pipeline -- first align to the genome; will move to the new assembly soon. 454 reads are aligned with ssaha, Solexa reads (here) with MAQ.
Trio data is now being processed at dbSNP -- it is causing dbSNP more churn than its usual releases, and they are catching up. Low-coverage data has also been submitted, but will probably appear in dbSNP 131.
Data formats: FASTQ / BAM (the binary form of SAM, the Sequence Alignment/Map format) / GLF (genotype likelihood format). BAMs/GLFs will be updated as more data comes in, and the old ones will disappear.
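As a rough illustration of the first of these formats, here is a minimal Python sketch of reading FASTQ records and decoding their quality strings. It assumes the simple four-lines-per-record layout and the Sanger-style Phred+33 quality encoding; neither is stated in the talk, and `read_fastq`, `FastqRecord` and `error_probability` are names made up for this example, not part of any project pipeline.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def read_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream.

    Assumes four lines per record (no wrapped sequences) and, by default,
    a Phred+33 quality encoding. Both are assumptions of this sketch.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near %r" % header)
        if len(sequence) != len(quality):
            raise ValueError("sequence/quality length mismatch in %r" % header)
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


def error_probability(phred: int) -> float:
    """Convert a Phred score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-phred / 10)


if __name__ == "__main__":
    # A made-up record, purely for demonstration.
    example = io.StringIO("@read1\nACGT\n+\nII5!\n")
    for record in read_fastq(example):
        print(record.name, record.sequence, record.qualities)
```

Reading from a real file would look the same, with `open(path)` in place of the `io.StringIO` object. BAM and GLF are binary formats and are better handled with dedicated tools than with hand-written parsers.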
The hope is that all the sequencing will be done by the end of 2009. A paper about the pilot projects is due soon. Targeted sequencing (pilot 3) took more time: the pull-down methods took a bit longer to nail down, but are now working.
DCC: more automated data delivery systems, a standard QC/recalibration pipeline, other high-throughput analyses, new staff. It may take over the alignment process once there is consensus on the alignment algorithm.
Jim Stalker and Thomas Keane doing a lot of work at the Sanger. Eugene Kulesha and Stephen Keenan on the website work. Fiona and Yuan on calling/storing/presenting SNPs in Ensembl.
Labels: ensembl, nextgen sequencing