All Ensembl gene predictions for all vertebrate species are based on experimental evidence:
- NCBI RefSeq proteins and mRNAs
- EMBL Nucleotide Sequence Archive
The evidence is aligned back to the gene prediction with Exonerate. Types of alignment results:
- added start
- longer region
- missing start
- non-matching start
- non-matching region
- shorter region
Exonerate has an exhaustive mode that takes a lot more time but fixes some of the mini-intron and
mini-exon issues that sometimes occur. Exonerate cdna2genome is very useful for quality checking.
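
As a rough illustration of the two options mentioned above, here is a minimal sketch that assembles an Exonerate command line in Python. The helper name `build_exonerate_command` and the FASTA file names are hypothetical; the flags `--model`, `--query`, `--target` and `--exhaustive` are standard Exonerate options. The cdna2genome model may also need an annotation file for the coding region, which is omitted here, and whether exhaustive mode is supported for a given model should be checked against the Exonerate documentation.

```python
import shlex


def build_exonerate_command(query_fasta, target_fasta, model, exhaustive=False):
    """Return an argument list for an Exonerate run (hypothetical helper)."""
    cmd = [
        "exonerate",
        "--model", model,
        "--query", query_fasta,
        "--target", target_fasta,
    ]
    if exhaustive:
        # Exhaustive mode is much slower than the default heuristics.
        cmd += ["--exhaustive", "yes"]
    return cmd


# Quality-check style run with the cdna2genome model (file names are placeholders).
qc_cmd = build_exonerate_command("cdna.fa", "genomic_region.fa", "cdna2genome")
print(shlex.join(qc_cmd))
```

The list can be handed to `subprocess.run` on a machine where Exonerate is installed; building it as a list rather than a string avoids shell quoting problems with file names.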
Genebuild now uses head-to-head alignments of GeneWise and Exonerate, and takes the best in each case.
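
The head-to-head idea can be sketched as follows. This is only an illustration: the talk did not say how "best" is decided, so the `score` field here is an assumed, already comparable quality measure (for example coverage or identity), and the `Alignment` class and `pick_best` function are hypothetical names, not part of the Ensembl pipeline.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    gene_id: str   # the gene prediction the evidence was aligned to
    aligner: str   # "genewise" or "exonerate"
    score: float   # assumed comparable across aligners (an assumption)


def pick_best(alignments):
    """Keep the highest-scoring alignment for each gene, whichever aligner made it."""
    best = {}
    for aln in alignments:
        current = best.get(aln.gene_id)
        if current is None or aln.score > current.score:
            best[aln.gene_id] = aln
    return best


# Made-up example values, for illustration only.
candidates = [
    Alignment("geneA", "genewise", 0.91),
    Alignment("geneA", "exonerate", 0.97),
    Alignment("geneB", "genewise", 0.88),
    Alignment("geneB", "exonerate", 0.80),
]
winners = pick_best(candidates)
for gene_id, aln in winners.items():
    print(gene_id, aln.aligner)
```

Ties go to whichever alignment was seen first, which is an arbitrary choice in this sketch.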
Some cases are still difficult to get right with algorithmic solutions: this is where the curators are needed.