4/30/2009

RNAseq in the worm -- Gary Williams

Illumina short-read transcriptome data has the potential to help solve many problems in curating C. elegans gene models and the genomic sequence. This is an initial look at the data and some examples of how it can be used.

Current status of C. elegans gene predictions:

36% - fully confirmed by ESTs
48% - partially confirmed
14% - no transcript confirmation
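
As a rough illustration of how a breakdown like this could be computed, the sketch below labels each gene model by how many of its introns exactly match a transcript-supported intron. The coordinate representation, the `classify_gene` and `summarize` names, and the exact-match criterion are assumptions made for illustration; this is not the actual WormBase confirmation pipeline.

```python
# Hypothetical representation: an intron is a (start, end) pair of genomic coordinates.
Intron = tuple[int, int]

def classify_gene(introns: list[Intron], supported: set[Intron]) -> str:
    """Label a gene model by how many of its introns are supported by transcripts."""
    if not introns:
        return "no introns"  # single-exon models would need a different test
    hits = sum(1 for intron in introns if intron in supported)
    if hits == len(introns):
        return "full"
    if hits > 0:
        return "partial"
    return "none"

def summarize(genes: dict[str, list[Intron]], supported: set[Intron]) -> dict[str, float]:
    """Percentage of gene models falling into each confirmation category."""
    counts: dict[str, int] = {}
    for introns in genes.values():
        label = classify_gene(introns, supported)
        counts[label] = counts.get(label, 0) + 1
    return {label: 100.0 * n / len(genes) for label, n in counts.items()}
```

Exact coordinate matching is the simplest possible criterion; a real curation pipeline would likely tolerate small boundary differences and treat single-exon genes separately.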

RNAseq data: 200bp inserts, 36bp paired-end reads. The RNA came from different worms than those used for the reference genome, so some polymorphisms are expected.

Reads are aligned with MAQ or cross_match to genomic or transcript sequences.
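
One way the 200bp insert size can be used after alignment is to flag mate pairs whose implied fragment length is far from the expected value. The sketch below is a minimal, hypothetical example: the `Mate` record, the forward/reverse orientation rule, and the ±50bp tolerance are all assumptions, and it does not reproduce MAQ's own pairing logic.

```python
from typing import NamedTuple, Optional

class Mate(NamedTuple):
    chrom: str
    start: int   # 0-based leftmost aligned position
    length: int  # aligned length, e.g. 36 for these reads
    strand: str  # "+" or "-"

def fragment_length(m1: Mate, m2: Mate) -> Optional[int]:
    """Fragment length spanned by a pair, or None if it is not a forward/reverse pair on one chromosome."""
    if m1.chrom != m2.chrom or {m1.strand, m2.strand} != {"+", "-"}:
        return None
    fwd, rev = (m1, m2) if m1.strand == "+" else (m2, m1)
    # Assumed library orientation: the forward mate lies upstream of the reverse mate.
    if fwd.start > rev.start:
        return None
    return rev.start + rev.length - fwd.start

def is_concordant(m1: Mate, m2: Mate, expected: int = 200, tolerance: int = 50) -> bool:
    """True if the fragment length is within `tolerance` of `expected` (hypothetical defaults)."""
    frag = fragment_length(m1, m2)
    return frag is not None and abs(frag - expected) <= tolerance
```

For example, mates at positions 1000 (+) and 1164 (-) on the same chromosome span a 200bp fragment and pass; pairs that fail could point to misalignments, structural differences, or, in RNA data, mates separated by an intron.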

6,137 new splice junctions (a 6% increase)

Splice junctions jumped from 70,000 to 98,000.
3x as many polyA sites
80 possible new coding genes
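
Counting new splice junctions comes down to collecting distinct introns from spliced read alignments and comparing them with the known set. A minimal sketch, assuming each alignment is given as a sorted list of half-open (start, end) aligned blocks on one chromosome; the block representation, the function names, and the two-read support threshold are hypothetical.

```python
from collections import Counter

Block = tuple[int, int]     # half-open aligned segment [start, end)
Junction = tuple[int, int]  # intron as half-open [start, end) in genomic coordinates

def junctions_from_blocks(blocks: list[Block]) -> list[Junction]:
    """Each gap between consecutive aligned blocks of one read is taken as an intron."""
    return [(blocks[i][1], blocks[i + 1][0]) for i in range(len(blocks) - 1)]

def observed_junctions(alignments: list[list[Block]], min_reads: int = 2) -> set[Junction]:
    """Distinct junctions supported by at least `min_reads` reads (threshold is hypothetical)."""
    counts: Counter = Counter()
    for blocks in alignments:
        counts.update(junctions_from_blocks(blocks))
    return {j for j, n in counts.items() if n >= min_reads}

def novelty(observed: set[Junction], known: set[Junction]) -> tuple[int, float]:
    """Number of junctions not already known, and that number as a % of the known set."""
    new = observed - known
    pct = 100.0 * len(new) / len(known) if known else float("inf")
    return len(new), pct
```

The read-support threshold trades sensitivity for protection against spurious gaps introduced by misaligned short reads.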

V-shaped dips in coverage: after validation against the sequencing traces, these revealed:
  • A sequencing error in the reference, which needs correcting
  • An alternative haplotype
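
One plausible reading of the V-shaped pattern: reads overlapping a reference error or a polymorphic site align poorly and are dropped, so coverage falls to a minimum at the site and recovers over roughly one read length on either side. The sketch below, assuming reads given as half-open (start, end) intervals on a single sequence, builds per-base coverage with a difference array and reports positions that are the minimum of their window and far below both flanks. The `flank`, `min_flank_depth`, and `max_ratio` values are illustrative guesses, and candidates would still need checking against the traces.

```python
def coverage(reads: list[tuple[int, int]], seq_length: int) -> list[int]:
    """Per-base read depth from half-open (start, end) intervals via a difference array."""
    diff = [0] * (seq_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    cov, running = [], 0
    for pos in range(seq_length):
        running += diff[pos]
        cov.append(running)
    return cov

def v_shaped_dips(cov: list[int], flank: int = 30, min_flank_depth: int = 10,
                  max_ratio: float = 0.2) -> list[tuple[int, int]]:
    """Half-open intervals where depth drops sharply relative to both flanks (hypothetical thresholds)."""
    hits = []
    for pos in range(flank, len(cov) - flank):
        shoulder = min(cov[pos - flank], cov[pos + flank])
        if shoulder < min_flank_depth:
            continue  # too little coverage on one side to call a dip
        window_min = min(cov[pos - flank:pos + flank + 1])
        if cov[pos] <= max_ratio * shoulder and cov[pos] == window_min:
            hits.append(pos)
    # Collapse runs of adjacent positions into single intervals.
    intervals: list[tuple[int, int]] = []
    for pos in hits:
        if intervals and pos == intervals[-1][1]:
            intervals[-1] = (intervals[-1][0], pos + 1)
        else:
            intervals.append((pos, pos + 1))
    return intervals
```

For example, tiling 36bp reads across a 200bp sequence but omitting every read that overlaps position 100 produces a V with depth 0 at that base, which is reported as the interval (100, 101).
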

Moving towards single-cell sequencing: not sequencing inside tiny cells, but sequencing each cell in each developmental state of the worm. Also moving towards RNA sequencing of C. briggsae and C. remanei.

Updated gene builds will be given to other projects. The next Ensembl Metazoa comparative genomics build may already include the modENCODE-updated C. elegans and D. melanogaster builds.
