4/23/2009

Ensembl Quality Checking -- Michael Schuster

All Ensembl gene predictions for all vertebrate species are based on experimental evidence, drawn from:

  1. UniProtKB/Swiss-Prot
  2. NCBI RefSeq proteins and mRNAs
  3. UniProtKB/TrEMBL
  4. EMBL Nucleotide Sequence Archive

For quality checking, the evidence is aligned back to the gene prediction with Exonerate. Types of alignment results:

  • perfect
  • added start
  • longer region
  • missing start
  • non-matching start
  • non-matching region
  • shorter region
  • ...

Exonerate has an exhaustive mode that takes a lot more time but fixes some of the mini-intron and
mini-exon issues that sometimes occur. Exonerate cdna2genome is very useful for quality checking.
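As a rough illustration of the two modes mentioned above, here is a minimal sketch that assembles an Exonerate command line for the cdna2genome model, with exhaustive alignment as an opt-in switch. The file names and the helper function are hypothetical, and the option set is only an assumption about a reasonable quality-checking run; the talk did not give the actual Ensembl settings.

```python
import shlex


def build_exonerate_command(query_fasta, target_fasta, exhaustive=False):
    """Return an argv list for an Exonerate cdna2genome run.

    query_fasta:  cDNA evidence sequence(s) (hypothetical file name)
    target_fasta: genomic region of the gene prediction (hypothetical file name)
    exhaustive:   switch on the slower exhaustive alignment mode
    """
    cmd = [
        "exonerate",
        "--model", "cdna2genome",
        "--query", query_fasta,
        "--target", target_fasta,
        "--bestn", "1",  # assumption: report only the top alignment per query
    ]
    if exhaustive:
        # Exhaustive mode is much slower, so it is off unless requested.
        cmd += ["--exhaustive", "yes"]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # without Exonerate being installed.
    print(shlex.join(build_exonerate_command("cdna.fa", "region.fa", exhaustive=True)))
```

Keeping the exhaustive switch off by default reflects the trade-off described above: it could be reserved for predictions where the fast heuristic run shows a suspicious mini-intron or mini-exon.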

The genebuild now compares GeneWise and Exonerate alignments head-to-head and takes the better one in each case.
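The talk did not say how "best" is decided, so the following is only a sketch of the head-to-head idea under an assumed criterion: each candidate model is summarised by how much of the supporting evidence it covers and how identical the aligned part is, and the candidate with the higher coverage wins, with identity as the tie-breaker. The `CandidateModel` class, its fields, and `pick_best` are hypothetical names, not part of the Ensembl code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateModel:
    program: str     # aligner that produced the model, e.g. "genewise"
    coverage: float  # assumed metric: fraction of the evidence aligned (0..1)
    identity: float  # assumed metric: fraction of identical residues (0..1)


def pick_best(candidates):
    """Return the candidate with the highest (coverage, identity) pair.

    On an exact tie the first candidate in the input order is kept.
    """
    if not candidates:
        raise ValueError("need at least one candidate model")
    return max(candidates, key=lambda c: (c.coverage, c.identity))


if __name__ == "__main__":
    best = pick_best([
        CandidateModel("genewise", coverage=0.95, identity=0.98),
        CandidateModel("exonerate", coverage=0.99, identity=0.97),
    ])
    print(best.program)
```

Ranking by a tuple keeps the comparison explicit and easy to change if a different scoring scheme turns out to be closer to what the genebuild does.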

Some cases are still difficult to get right with algorithmic solutions: this is where the curators are needed.
