Illumina short-read transcriptome data has the potential to help solve many problems with curating gene models and the genomic sequence in C. elegans. This is an initial look at the data and some examples of how it can be used.
So far, C. elegans gene predictions:
36% - fully confirmed by ESTs
48% - partially confirmed
14% - no transcript confirmation
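The talk did not say how the three categories were computed. As a rough sketch only, one plausible definition is to compare the introns of a gene model with the introns supported by transcript alignments. The `Intron` tuple, the function name and the intron-only criterion are all assumptions for illustration.

```python
from typing import Set, Tuple

# Hypothetical convention: an intron is a (start, end) pair of genomic coordinates.
Intron = Tuple[int, int]


def confirmation_status(model_introns: Set[Intron],
                        supported_introns: Set[Intron]) -> str:
    """Classify a gene model by how many of its introns have transcript support.

    Assumption: support is judged on introns alone; a real pipeline would
    probably also look at exon coverage.
    """
    if not model_introns:
        # Single-exon models need a different test (e.g. base coverage).
        return "not applicable"
    confirmed = model_introns & supported_introns
    if len(confirmed) == len(model_introns):
        return "fully confirmed"
    if confirmed:
        return "partially confirmed"
    return "no transcript confirmation"


# Example: a three-intron model where ESTs support two of the introns.
model = {(100, 200), (300, 400), (500, 600)}
est_support = {(100, 200), (300, 400)}
status = confirmation_status(model, est_support)
```

Here `status` is "partially confirmed", because one intron in the model has no matching transcript evidence.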
RNA-seq data -- from different worms than the reference genome, so some polymorphisms are expected -- 200 bp inserts, 36 bp paired-end reads
Reads aligned with MAQ or cross-match to genomic or transcript sequences
6137 new splice junctions (6% increase)
Jumped from 70000 to 98000 splice junctions.
3x as many polyA sites
80 possible new coding genes
V-shaped coverages -- validation against traces, then:
- Detected sequencing error, correction needed for the reference
- Detected alternative haplotype
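A V-shaped dip in coverage is what you would expect if reads overlapping a position that differs from the reference fail to align: depth falls towards that position and recovers after it. The sketch below flags such dips in a per-base coverage list. It is not the method from the talk; the function name and the thresholds (`min_flank`, `max_ratio`) are assumptions chosen for illustration, and only the 36 bp read length comes from the notes above.

```python
from typing import List


def find_v_dips(coverage: List[int], read_len: int = 36,
                min_flank: int = 10, max_ratio: float = 0.2) -> List[int]:
    """Return positions where coverage dips sharply relative to both flanks.

    A position is reported when it is the minimum within one read length on
    either side, both flanks are well covered, and its depth is at most
    max_ratio times the lower flank depth.
    """
    dips: List[int] = []
    n = len(coverage)
    for i in range(read_len, n - read_len):
        window = coverage[i - read_len:i + read_len + 1]
        flank = min(coverage[i - read_len], coverage[i + read_len])
        is_window_min = coverage[i] == min(window)
        if flank >= min_flank and coverage[i] <= max_ratio * flank and is_window_min:
            # Report a flat-bottomed dip only once.
            if not dips or i - dips[-1] > read_len:
                dips.append(i)
    return dips


# Synthetic example: uniform depth 40 with a V centred on position 60.
cov = [40] * 120
for d in range(-36, 37):
    cov[60 + d] = round(40 * abs(d) / 36)
candidates = find_v_dips(cov)
```

Each candidate would still need checking against the original traces to decide whether it is a reference error or an alternative haplotype.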
Moving towards single-cell sequencing -- not sequencing within tiny cells, but sequencing each cell in each developmental state in the worm. Also moving towards RNA sequencing of C. briggsae and C. remanei.
Updated gene builds will be given to other projects. The next Ensembl Metazoa comparative genomics build may already include the modENCODE-updated C. elegans and D. melanogaster builds.
Labels: ensembl, genomics, nextgen sequencing, scientific talk