castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.

12/23/2005

Revolving ideas around dN and dS analysis

After reading this paper:

Comparisons of dN/dS are time dependent for closely related bacterial genomes

Eduardo P.C. Rocha, John Maynard Smith, Laurence D. Hurst, Matthew T.G. Holden, Jessica E. Cooper, Noel H. Smith, Edward J. Feil

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?cmd=prlinks&dbfrom=pubmed&retmode=ref&id=16239014

I have started to have revolving ideas about codeml, evolver, hyphy and a program along the lines of "seqgen", sprinkled with some of the features in "rose", some of the features in "simcoal2", and some of the features in the recently published "cosi":

http://www.broad.mit.edu/personal/sfs/cosi/cosi_package.tar

The thing is that with either PAML or HyPhy there will always be reasonable uncertainty about how accurate the model of dN/dS branches given by the MLE for the data really is. It is like the problem with multiple sequence alignments: one will never* know if the MSA determined by probcons, muscle, t-coffee or clustal is _the_ MSA that depicts the true relationship of each and every amino acid or nucleotide of a group of individuals or species.

*well, at least until the technology in "Park Jurassic" is achieved. Actually, better technology than in P.J., as the frog tinkerings were really bad in that case.

Some weeks ago I found out that Aaron Darling, of Mauve fame, created a whole framework of what one could call "evolving-MSAs", to recreate realistic cases for which we know the _true_ MSA. We can then use these _true_ MSAs to check if our alignment program is good or not.

Cosi is more or less a similar framework, but for different goals.

Cosi, sgevolver, seqgen, rose, simcoal2 and similar programs are great tools to play with on a long flight or in any other situation where one is stuck in a confined place for several hours without much to do. Like insomnia nights.

Anyway, CASP is another example, one in which the ab initio prediction programs for protein structures can be improved by giving the authors of those programs truly known structures (crystallographic? - beats me). Roderic Guigo and some other colleagues in the ab initio gene prediction world were trying to promote a CASP-like annual event for the gene prediction use case.

So, back to the dN/dS stuff. One thing that is always difficult to take for granted is the reconstructed ancestral sequence given by either PAML, HyPhy (haven't tried much) or any MSA-analysis-like program. It's a place where the problems with dN/dS analysis and those of MSA collide.

But one thing we can do is to see how good the reconstructed sequences are for a given sequence (using something along the lines of seqgen), and compare the true ancestral sequences given by a seqgen run to the PAML/HyPhy reconstructed sequences under a specific model.
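To make the comparison concrete, here is a minimal sketch in Python of the scoring step only. It assumes the true ancestral sequences (from the simulator) and the reconstructed ones (from PAML or HyPhy) have already been parsed into dictionaries keyed by the same internal node labels; the parsing itself, the node labels, the function names and the toy sequences below are all hypothetical and not taken from any of those programs.

```python
def site_accuracy(true_seq, reconstructed_seq):
    """Fraction of aligned positions where the reconstruction matches the truth."""
    if len(true_seq) != len(reconstructed_seq):
        raise ValueError("sequences must have the same length")
    if not true_seq:
        raise ValueError("sequences must be non-empty")
    matches = sum(
        t == r for t, r in zip(true_seq.upper(), reconstructed_seq.upper())
    )
    return matches / len(true_seq)


def accuracy_per_node(true_ancestors, reconstructed_ancestors):
    """Map each internal node present in both inputs to its per-site accuracy."""
    return {
        node: site_accuracy(true_ancestors[node], reconstructed_ancestors[node])
        for node in true_ancestors
        if node in reconstructed_ancestors
    }


# Toy, made-up data: node labels and sequences are purely illustrative.
true_ancestors = {"node1": "ATGGCCAAA", "node2": "ATGGCTAAG"}
reconstructed = {"node1": "ATGGCCAAA", "node2": "ATGGCCAAG"}

scores = accuracy_per_node(true_ancestors, reconstructed)
for node, score in sorted(scores.items()):
    print(f"{node}: {score:.3f}")
```

In practice the hard part is matching the internal nodes of the simulated tree to those of the reconstruction, which this sketch simply assumes has been done.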

Obviously, simulating sequences and checking how good the reconstruction is would be absolutely idiotic if one is analysing real-world sequences (this is more or less what my PhD advisor told me). But the thing is that this is for assessing more or less how good these reconstructions are. Another analogy comes to my mind: it would be like seeing how good a basketball player is at shooting 100 free throws. Let's say that the player scores 90 of 100. We can't say this specific player (or phylogenetic program) will be able to score 90, or say, 70, of every 100 free throws in play-off games (or real sequences). But it does say that under certain conditions (as realistic as possible given the controlled variables) it _does_ score 90 of 100, so for realistic cases (the rest of the uncontrolled conditions) it will score close to 90, within a certain confidence interval.
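As a rough illustration of the "confidence interval" part of the analogy, the sketch below computes a Wilson score interval for a binomial proportion, here 90 successes out of 100 trials. The choice of the Wilson interval and of z = 1.96 (roughly 95% coverage) is my own assumption for the example; it only quantifies the sampling noise under the tested conditions and says nothing about how well the result carries over to play-off games or real sequences.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z is the standard normal quantile; 1.96 gives roughly 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    )
    return centre - half_width, centre + half_width


# 90 free throws scored out of 100 (or 90 correctly reconstructed sites of 100).
low, high = wilson_interval(90, 100)
print(f"observed 0.90, interval roughly {low:.2f} to {high:.2f}")
```

The same function applies unchanged to counts of correctly reconstructed sites from a simulation run, as long as the sites can be treated as independent trials, which is itself a simplifying assumption.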

These reconstructed sequences, by the way, are very important when doing preferred/unpreferred codon analysis.

I guess these ideas are more "revolving" than "evolving" right now...

posted by avilella # 12:45 AM
