castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.


Revolving ideas around dN and dS analysis

After reading this paper:

Comparisons of dN/dS are time dependent for closely related bacterial


Eduardo P.C. Rocha, John Maynard Smith, Laurence D. Hurst, Matthew

T.G. Holden, Jessica E. Cooper, Noel H. Smith, Edward J. Feil

I have started to have revolving ideas about codeml, evolver, HyPhy

and a program along the lines of "seqgen", sprinkled with some of the

features in "rose", some of the features in "simcoal2", and some of

the features in the recently published "cosi":

The thing is that with either PAML or HyPhy there will always be

reasonable uncertainty about how accurate the model of dN/dS across

branches given by the MLE is for the data. It is like the problem with

multiple sequence alignments: one will never* know if the MSA

determined by probcons, muscle, t-coffee or clustal is _the_ MSA that

depicts the true relationship of each and every amino acid or

nucleotide of a group of individuals or species.

*well, at least until the technology in "Jurassic Park" is

achieved. Actually, better technology than in J.P., as the frog

tinkering was really bad in that case.

Some weeks ago I found out that Aaron Darling, of Mauve fame,

created a whole framework of what one could call "evolving-MSAs", to

recreate realistic cases for which we know the _true_ MSA. We can then

use these _true_ MSAs to check if our alignment program is good or not.
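
One common way to score an alignment program against a known true MSA is a sum-of-pairs comparison: count how many residue pairs that share a column in the true alignment also share a column in the program's alignment. The sketch below is only an illustration of that idea, not the scoring used by any of the frameworks mentioned here; the function names and the dictionary input format (sequence name mapped to a gapped string, with "-" as the gap character) are assumptions made for the example.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Return the set of residue pairs that an alignment puts in the same column.

    `msa` maps a sequence name to its gapped row ("-" is the gap character).
    A residue is identified by (sequence name, index in the ungapped sequence).
    This input format is an assumption of the sketch, not a standard.
    """
    names = sorted(msa)
    length = len(msa[names[0]])
    if any(len(msa[name]) != length for name in names):
        raise ValueError("all alignment rows must have the same length")

    next_index = {name: 0 for name in names}  # ungapped position per sequence
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if msa[name][col] != "-":
                present.append((name, next_index[name]))
                next_index[name] += 1
        # Every pair of residues in this column counts as "aligned".
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(true_msa, test_msa):
    """Fraction of truly aligned residue pairs that the test alignment recovers."""
    true_pairs = aligned_pairs(true_msa)
    if not true_pairs:
        return 0.0
    return len(true_pairs & aligned_pairs(test_msa)) / len(true_pairs)


# Toy data, invented for illustration only.
true_msa = {"a": "AC-GT", "b": "ACCGT"}
test_msa = {"a": "ACG-T", "b": "ACCGT"}
score = sum_of_pairs_score(true_msa, test_msa)
```

In the toy case the test alignment misplaces one gap, so it recovers three of the four truly aligned pairs.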

Cosi is more or less a similar framework, but for different goals.

Cosi, sgevolver, seqgen, rose, simcoal2 and similar programs are great

tools to play with on a long flight or in any other situation where one is

stuck in a confined place for several hours without much to do. Like

sleepless nights.

Anyway, CASP is another example: it helps improve the ab initio

prediction programs for protein structures by giving the authors of

those programs truly known structures (crystallographic? - beats me).

Roderic Guigo and some other colleagues in the ab initio gene

prediction world were trying to promote a CASP-like annual event for

the gene prediction use case.

So, back to the dN/dS stuff. One thing that is always difficult to

take for granted is the reconstructed ancestral sequence given by

either PAML, HyPhy (haven't tried it much) or any MSA-analysis-like

program. It's a place where the problems with dN/dS analysis and those

of MSA collide.

But one thing we can do is to see how good the reconstructed

sequences are for a simulated data set (using something along the lines of

seqgen), and compare the true ancestral sequences given by the seqgen

run to the PAML/HyPhy reconstructed sequences under a specific model.
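
The comparison itself can be very simple once both sets of sequences are in hand. The sketch below assumes the true and reconstructed ancestral sequences have already been read into dictionaries keyed by internal-node label and that the labels have been matched up by hand; parsing the simulator or PAML/HyPhy output is deliberately left out, and the function names and toy sequences are hypothetical.

```python
def reconstruction_accuracy(true_seq, inferred_seq):
    """Fraction of sites at which the reconstructed sequence matches the true one."""
    if len(true_seq) != len(inferred_seq):
        raise ValueError("sequences must have the same length")
    if not true_seq:
        raise ValueError("sequences must not be empty")
    matches = sum(t == r for t, r in zip(true_seq.upper(), inferred_seq.upper()))
    return matches / len(true_seq)


def accuracy_per_node(true_nodes, inferred_nodes):
    """Per-node accuracy for every internal node present in both dictionaries.

    Both arguments map a node label to its sequence. Matching the labels used by
    the simulator to those used by the reconstruction program is assumed to have
    been done beforehand.
    """
    shared = true_nodes.keys() & inferred_nodes.keys()
    return {
        node: reconstruction_accuracy(true_nodes[node], inferred_nodes[node])
        for node in sorted(shared)
    }


# Toy data, invented for illustration only.
true_nodes = {"n1": "ATGGCCAAA", "n2": "ATGGCTAAA"}
inferred_nodes = {"n1": "ATGGCCAAG", "n2": "ATGGCTAAA"}
accuracies = accuracy_per_node(true_nodes, inferred_nodes)
```

Repeating this over many simulated data sets under the same model gives a distribution of accuracies rather than a single number, which is closer to what one would want to know.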

Obviously, simulating sequences and checking the quality of the

reconstruction is absolutely idiotic if one is analysing real-world

sequences (this is more or less what my PhD advisor told me). But the

thing is that this is for assessing more or less how good these

reconstructions are. Another analogy comes to my mind: it would be

like seeing how good a basketball player is at shooting 100 free

throws. Let's say that the player scores 90 of 100. We can't say this

specific player (or phylogenetic program) will be able to score 90, or

say, 70, of every 100 free throws in play-off games (or real

sequences). But it does say that under certain conditions (as

realistic as possible given the controlled variables) it _does_ score

90 of 100, so for realistic cases (the rest of the uncontrolled

conditions) it will score close to 90, with a certain confidence of
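
To put a rough number on the free-throw analogy, one option is a binomial confidence interval for the observed success rate. The sketch below uses the Wilson score interval, which is one standard choice and not something prescribed by PAML, HyPhy or any of the simulators; the function name and the 95% level (z = 1.96) are assumptions made for the example.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence; this default is a choice
    made for this sketch.
    """
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    ) / denom
    return centre - half_width, centre + half_width


# 90 successes out of 100 attempts, as in the analogy.
low, high = wilson_interval(90, 100)
```

For 90 of 100 this gives an interval of about 0.83 to 0.94. It only describes the sampling uncertainty under the simulated (controlled) conditions; how far real sequences depart from those conditions is a separate question that the interval does not answer.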


These reconstructed sequences, by the way, are very important when

doing preferred/unpreferred codon analysis.

I guess these ideas are more "revolving" than "evolving" right now...
