castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.

1/02/2009

 

Interesting proteomics dataset for Arabidopsis

Discovery and revision of Arabidopsis genes by proteogenomics — PNAS
Gene annotation underpins genome science. Most often protein coding sequence is inferred from the genome based on transcript evidence and computational predictions. While generally correct, gene models suffer from errors in reading frame, exon border definition, and exon identification. To ascertain the error rate of Arabidopsis thaliana gene models, we isolated proteins from a sample of Arabidopsis tissues and determined the amino acid sequences of 144,079 distinct peptides by tandem mass spectrometry. The peptides corresponded to 1 or more of 3 different translations of the genome: a 6-frame translation, an exon splice-graph, and the currently annotated proteome. The majority of the peptides (126,055) resided in existing gene models (12,769 confirmed proteins), comprising 40% of annotated genes. Surprisingly, 18,024 novel peptides were found that do not correspond to annotated genes. Using the gene finding program AUGUSTUS and 5,426 novel peptides that occurred in clusters, we discovered 778 new protein-coding genes and refined the annotation of an additional 695 gene models. The remaining 13,449 novel peptides provide high quality annotation (>99% correct) for thousands of additional genes. Our observation that 18,024 of 144,079 peptides did not match current gene models suggests that 13% of the Arabidopsis proteome was incomplete due to approximately equal numbers of missing and incorrect gene models.


Plant genomics has always been one step behind human genomics in terms of the amount of available evidences accumulating to build gene models. Even in the most well-known plant model, Arabidopsis, there is a lot of room for improvement. This paper shows how the latest genome-wide tools for gathering protein coding evidences, like tandem MS (or RNA-seq) can improve the current protein coding gene set.

Comments: Post a Comment

Subscribe to Post Comments [Atom]





<< Home

Archives

200409   200412   200501   200502   200503   200504   200505   200506   200507   200508   200509   200510   200511   200512   200601   200602   200603   200604   200605   200606   200607   200608   200609   200610   200611   200612   200701   200702   200703   200704   200705   200707   200708   200709   200710   200711   200712   200801   200802   200803   200804   200805   200806   200807   200808   200809   200810   200811   200812   200901   200902   200903   200904   200905   200906   200907   200908   200909   200912   201001   201002   201003   201004   201007   201009   201011   201102  

This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]