Illumina short-read transcriptome data could help solve many problems in curating gene models and the genomic sequence of C. elegans. This is an initial look at the data, with some examples of how it can be used.
So far, C. elegans gene predictions:
36% - fully confirmed by ESTs
48% - partially confirmed
14% - no transcript confirmation
RNA-seq data: from different worms than the reference genome, so some polymorphisms are expected. 200 bp inserts, 36 bp paired-end reads.
Reads aligned with MAQ or cross-match to genomic or transcript sequences.
6137 new splice junctions (6% increase)
Jumped from 70000 to 98000 splice junctions.
3x as many polyA sites
80 possible new coding genes
V-shaped dips in read coverage -- validated against traces, then classified as either:
- Sequencing error detected; the reference needs correcting
- Alternative haplotype detected
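
To make the V-shaped coverage idea concrete, here is a minimal Python sketch of how such dips might be flagged for follow-up. It is not the method used in the talk, just one plausible reading. Presumably a read that overlaps a reference error or a polymorphism carries a mismatch and aligns less readily, so coverage falls towards the site and recovers about one read length away on either side. The function name `find_v_dips`, the helper `synthetic_v`, and the `min_flank` and `max_ratio` thresholds are all hypothetical; only the 36 bp read length comes from the notes.

```python
from typing import List, Sequence


def find_v_dips(
    coverage: Sequence[int],
    read_len: int = 36,      # read length taken from the notes
    min_flank: int = 20,     # hypothetical: skip regions with too little coverage
    max_ratio: float = 0.3,  # hypothetical: dip must fall to <= 30% of the flanks
) -> List[int]:
    """Return positions that look like the bottom of a V-shaped coverage dip."""
    dips = []
    for i in range(read_len, len(coverage) - read_len):
        # Coverage one read length away on each side should be unaffected.
        flank = min(coverage[i - read_len], coverage[i + read_len])
        if flank < min_flank:
            continue
        window = coverage[i - read_len : i + read_len + 1]
        # Keep positions that are the lowest point of the window and
        # clearly below the flanking coverage.
        if coverage[i] == min(window) and coverage[i] <= max_ratio * flank:
            dips.append(i)
    return dips


def synthetic_v(length=300, centre=100, depth=100, read_len=36):
    """Build a made-up per-base coverage track with a single V centred on `centre`."""
    return [min(depth, abs(i - centre) * depth // read_len) for i in range(length)]


candidates = find_v_dips(synthetic_v())
print(candidates)
```

On the synthetic track this reports the single position 100. Real data would need per-base coverage from the aligner, and each candidate would still have to be checked against the traces to decide between a reference error and an alternative haplotype.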
Updated gene builds will be passed on to other projects. The next Ensembl Metazoa comparative genomics build may already include the modENCODE-updated C. elegans and D. melanogaster builds.
