After reading this paper:
Comparisons of dN/dS are time dependent for closely related bacterial
genomes
Eduardo P.C. Rocha, John Maynard Smith, Laurence D. Hurst, Matthew
T.G. Holden, Jessica E. Cooper, Noel H. Smith, Edward J. Feil
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?cmd=prlinks&dbfrom=pubmed&retmode=ref&id=16239014
I have started to have revolving ideas about codeml, evolver, hyphy
and a program along the lines of "seqgen", sprinkled with some of the
features in "rose", some of the features in "simcoal2", and some of
the features in the recently published "cosi":
http://www.broad.mit.edu/personal/sfs/cosi/cosi_package.tar
The thing is that with either PAML or HyPhy there will always be
reasonable uncertainty about how accurately the MLE fitted to the data
models dN/dS along the branches. It is like the problem with
multiple sequence alignments: one will never* know if the MSA
determined by probcons, muscle, t-coffee or clustal is _the_ MSA that
depicts the true relationship of each and every amino acid or
nucleotide of a group of individuals or species.
*well, at least until the technology in "Jurassic Park" is
achieved. Actually, better technology than in J.P., as the frog
tinkerings were really bad in that case.
Some weeks ago I found out that Aaron Darling, of Mauve fame,
created a whole framework of what one could call "evolving-MSAs", to
recreate realistic cases for which we know the _true_ MSA. We can then
use these _true_ MSAs to check if our alignment program is good or not.
Cosi is more or less a similar framework, but for different goals.
Cosi, sgevolver, seqgen, rose, simcoal2 and similar programs are great
tools to play with on a long flight or in any other situation where one
is stuck in a confined place for several hours without much to do. Like
sleepless nights.
Anyway, CASP is another example: ab initio protein structure
prediction programs can be improved by checking their predictions
against truly known structures (crystallographic? - beats me).
Roderic Guigo and some other colleagues in the ab initio gene
prediction world were trying to promote a CASP-like annual event for
the gene prediction use case.
So, back to the dN/dS stuff. One thing that is always difficult to
take for granted is the reconstructed ancestral sequence given by
either PAML, HyPhy (haven't tried it much) or any MSA-analysis-like
program. It's a place where the problems of dN/dS analysis and those
of MSA collide.
But one thing we can do is see how good the reconstructed sequences
are for a simulated data set (using something like seqgen): compare
the true ancestral sequences produced by a seqgen run to the
sequences reconstructed by PAML/HyPhy under a specific model.
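A minimal sketch of that comparison, assuming the true and the
reconstructed ancestral sequences have already been pulled out of the
simulator and PAML/HyPhy output and are aligned site by site. The
function names (site_accuracy, accuracy_per_node) and the node labels
are hypothetical; none of this is part of seqgen, PAML or HyPhy.

```python
def site_accuracy(true_seq, reconstructed_seq, skip_chars="-?"):
    """Fraction of compared sites where the reconstruction equals the truth.

    Sites where either sequence holds a gap or unknown character are
    skipped (an assumption of this sketch, not a standard convention).
    """
    if len(true_seq) != len(reconstructed_seq):
        raise ValueError("sequences must have the same aligned length")
    compared = 0
    matches = 0
    for t, r in zip(true_seq.upper(), reconstructed_seq.upper()):
        if t in skip_chars or r in skip_chars:
            continue
        compared += 1
        if t == r:
            matches += 1
    return matches / compared if compared else float("nan")


def accuracy_per_node(true_nodes, reconstructed_nodes):
    """Accuracy for every internal node present in both dictionaries.

    Both arguments map a node label (hypothetical, e.g. "node7") to its
    sequence string.
    """
    return {
        node: site_accuracy(seq, reconstructed_nodes[node])
        for node, seq in true_nodes.items()
        if node in reconstructed_nodes
    }


if __name__ == "__main__":
    # Toy sequences, made up for illustration only.
    truth = {"node7": "ATGAAACCC", "node8": "ATGAAGCCC"}
    guess = {"node7": "ATGAAACCA", "node8": "ATGAAGCCC"}
    for node, acc in accuracy_per_node(truth, guess).items():
        print(node, round(acc, 3))
```

Repeating this over many simulated data sets gives a distribution of
accuracies per node instead of a single number, which is closer to
what one actually wants to know.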
Obviously, simulating sequences and checking how good the
reconstruction is would be absolutely idiotic if one is analysing
real-world sequences (this is more or less what my PhD advisor told
me). But the thing is that this is for assessing, more or less, how
good these reconstructions are. Another analogy comes to my mind: it
would be like seeing how good a basketball player is by having him
shoot 100 free throws. Let's say that the player scores 90 of 100. We
can't say this specific player (or phylogenetic program) will be able
to score 90, or say, 70, of every 100 free throws in play-off games (or
real sequences). But it does say that under certain conditions (as
realistic as possible given the controlled variables) it _does_ score
90 of 100, so for realistic cases (the rest of the uncontrolled
conditions) it should score close to 90, within a certain confidence
interval.
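To put a number on that interval, here is a small sketch using the
Wilson score interval for a binomial proportion. Picking the Wilson
interval is my assumption (any binomial interval would do), and the
function name wilson_interval is made up for this post. Keep in mind
that the interval only describes the controlled, simulated conditions,
not the play-off games.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z2 / (4 * trials * trials)
    ) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # 90 free throws scored out of 100 (or 90 of 100 sites reconstructed
    # correctly in a simulated data set).
    low, high = wilson_interval(90, 100)
    print(f"{low:.3f} - {high:.3f}")
```

For 90 of 100 this gives roughly 0.83 to 0.94, so even under the
controlled conditions "90" really means "somewhere in the high 80s to
low 90s".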
These reconstructed sequences, by the way, are very important when
doing preferred/unpreferred codon analysis.
I guess these ideas are more "revolving" than "evolving" right now...