castanyes blaves

Random ramblings about some random stuff, and things; but more stuff than things -- all in a mesmerizing and kaleidoscopic soapbox-like flow of words.



The blurring line between protein alignments and genomic alignments

As more and more data is poured into the public sequence databases, an increasingly detailed map is being drawn that relates sequences from different individuals or different species, mainly in the form of what the field calls protein or genomic (DNA) alignments. One could call this twenty-first century molecular cartography.

All references to molecular evolution this year should be accompanied by an analogy to Darwin's work, so here is how it works in this case: Darwin's next generation machine, the Beagle, went on a journey to accumulate an enormous variety of specimens that, when compared all together, allowed Darwin to draw the first phylogenetic tree.

Contrary to what one would think, alignments with more sequences are easier to resolve than ones with fewer sequences, at least when the added sequences increase the detail of the phylogenetic tree relating them, which is almost always the case. This is allowing researchers to generate genomic alignments for phylogenetically dense groups of genomes while, in parallel, the protein alignments for the corresponding protein coding genes in these genomes are combined with those of more distantly related species. This dense taxon sampling is making the distinction between protein alignments and genomic alignments less and less obvious.

As an example, one can use the highly conserved protein coding exons to anchor the points in the different chromosomes that define stretches of conserved synteny among the genomes, and then align these DNA stretches all together with a genomic aligner. At the same time, one can use the exon boundaries defined in the DNA sequences of the coding genes to help infer the right protein alignment at the amino acid level.
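To make the anchoring idea a bit more concrete, here is a minimal sketch in Python of how exon anchors between two genomes could be grouped into candidate syntenic stretches. Everything in it is an assumption made for illustration: the `Anchor` record, the `synteny_blocks` function, the greedy grouping rule and the `max_gap` threshold are hypothetical and not taken from any particular aligner, which would use far more careful chaining.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """A conserved exon located in two genomes (hypothetical record)."""
    chrom_a: str
    pos_a: int
    chrom_b: str
    pos_b: int


def _keeps_orientation(block: List[Anchor], anchor: Anchor) -> bool:
    # With fewer than two anchors the block has no orientation yet.
    if len(block) < 2:
        return True
    forward = block[1].pos_b > block[0].pos_b
    return (anchor.pos_b > block[-1].pos_b) == forward


def synteny_blocks(anchors: List[Anchor], max_gap: int = 100_000) -> List[List[Anchor]]:
    """Greedily group anchors into candidate blocks of conserved synteny.

    An anchor extends the current block when it lies on the same pair of
    chromosomes, within max_gap of the previous anchor in both genomes,
    and keeps the block's orientation in genome B.
    """
    blocks: List[List[Anchor]] = []
    for anchor in sorted(anchors, key=lambda a: (a.chrom_a, a.pos_a)):
        if blocks:
            last = blocks[-1][-1]
            same_chroms = (anchor.chrom_a, anchor.chrom_b) == (last.chrom_a, last.chrom_b)
            close = (anchor.pos_a - last.pos_a <= max_gap
                     and abs(anchor.pos_b - last.pos_b) <= max_gap)
            if same_chroms and close and _keeps_orientation(blocks[-1], anchor):
                blocks[-1].append(anchor)
                continue
        # Otherwise this anchor starts a new candidate block.
        blocks.append([anchor])
    return blocks
```

Each resulting block spans a stretch of DNA in both genomes that could then be handed to a genomic aligner, with the exons acting as fixed points.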

A new opportunity is now arising to exploit the information that is contained separately in the genomic and protein alignments by combining them into a single object representing both. New methods are being developed that will use the landmarks that both genomic and protein alignments have correctly placed to converge into a single intertwined alignment object. This new type of alignment has, in a way, already been represented in closely related prokaryotic genomes. But prokaryotic genomes are less interesting for some topics, like alternative splicing, repetitive elements or recombination hotspots. Combined genomic and protein alignments will bring together elements of detail that have so far been scattered, and hopefully some new and brilliant mechanistic explanations of the innards of molecular evolution will arise from them, much as Darwin's insights arose from his collected specimens two centuries ago.
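As a toy illustration of what tying the two views together can mean, the sketch below threads the coding DNA of a gene through one row of a protein alignment, so that every aligned residue carries its codon. It is only the simplest case, not the new methods anticipated above: the function name `thread_codons` is made up for this example, and it assumes an intron-free coding sequence, without a stop codon, that matches the protein exactly.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Project one row of a protein alignment onto its coding DNA.

    Each residue is replaced by its codon and each gap by three gap
    characters, so all rows of the alignment stay in register.
    """
    residues = len(aligned_protein) - aligned_protein.count(gap)
    if len(cds) != 3 * residues:
        raise ValueError("CDS length must be three times the number of residues")
    # Consume codons in order as residues are encountered.
    codons = (cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(gap * 3 if aa == gap else next(codons) for aa in aligned_protein)
```

For instance, the aligned row `M-KV` with coding sequence `ATGAAAGTT` becomes `ATG---AAAGTT`. Applying this to every row gives a DNA alignment that inherits its columns from the protein alignment.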

So a deluge of sequencing data is not really a problem but an opportunity.
