castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.



How much should you stretch HMM classification?

There are ongoing discussions on our Campus about whether to use de novo clustering or HMM classification to update the family models in an Orthology database from one release to the next.

One trend, which I don't favour, is to take the HMMs from the previous build and classify the current protein sets against them, then re-run the alignment and tree-building steps.

The other trend, which I favour, is to run new BLAST/phmmer searches for the new proteins, re-cluster them together with the existing hits in the updated graph, and then re-run the alignment and tree-building steps on the new set of family models.

People who argue in favour of the HMM classification procedure and want to convince me of its feasibility should give convincing answers to these questions:

Let's say you only have a few complete genome sequences with provisional gene predictions from your clade, but expect to have 20% more finished genomes with better gene predictions every two months. Over the course of a year you will have more than doubled the number of genomes. Do you trust the HMMs you are building today to represent the family models in two, four, six, eight, ten or twelve months' time?
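To put rough numbers on that growth, here is a minimal Python sketch. The starting count of 10 genomes and the function name genome_counts are made up for illustration, and since "20% more every two months" can be read in two ways, it computes both: 20% of the starting set added each time (linear), or 20% of the current set (compound).

```python
def genome_counts(initial, rate=0.20, steps=6, compound=True):
    """Return genome counts at the start and after each two-month step.

    initial  -- hypothetical number of genomes today
    rate     -- fraction added every two months (0.20 = 20%)
    steps    -- number of two-month steps (6 steps = one year)
    compound -- if True, add 20% of the current set each step;
                if False, add 20% of the starting set each step
    """
    counts = [initial]
    for _ in range(steps):
        base = counts[-1] if compound else initial
        counts.append(counts[-1] + rate * base)
    return counts


# Hypothetical starting point: 10 genomes in the clade.
linear = genome_counts(10, compound=False)
compound = genome_counts(10, compound=True)

# Fold increase after one year under each reading.
print("linear fold increase:  ", linear[-1] / linear[0])
print("compound fold increase:", compound[-1] / compound[0])
```

Under the linear reading the set grows about 2.2-fold in a year, and under the compound reading almost 3-fold. Either way, the HMMs built today would have been trained on less than half of the genomes they are later asked to classify.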

Let's say that you have answered the previous question with a 'yes'. Then where do you draw the line and rebuild your HMMs from updated genome sets? Why don't you take Human, S.cerevisiae, A.thaliana, E.coli and P.furiosus and call that the ultimate representation of all family models on Earth? Do you think those families would be as good as the ones obtained using 80 genomes instead?

If so, then it's very easy for you: just use those 5 genomes as your family model set. But I don't think that is the way to go.
