As more and more data pours into the public sequence databases, an increasingly detailed map is being drawn that relates sequences from different individuals or different species, mainly in the form of what the field calls protein or genomic (DNA) alignments. One could call this twenty-first century molecular cartography.
Every reference to molecular evolution this year should be accompanied by an analogy to Darwin's work, so here is how it goes in this case: Darwin's next-generation machine, the Beagle, went on a journey to accumulate an enormous variety of specimens that, when compared all together, allowed Darwin to draw the first phylogenetic tree.
Contrary to what one might think, alignments with more sequences are easier to resolve than those with fewer sequences, at least when the added sequences increase the detail of the phylogenetic tree relating them, which is almost always the case. This is allowing researchers to generate genomic alignments for phylogenetically dense groups of genomes while, in parallel, the protein alignments for the corresponding protein-coding genes in these genomes are combined with those of more distantly related species. This dense taxon sampling is making the distinction between protein alignments and genomic alignments less and less obvious.
As an example, one can use the highly conserved protein-coding exons as anchor points on the different chromosomes to define stretches of conserved synteny among the genomes, and then align these DNA stretches all together with a genomic aligner. At the same time, one can use the exon boundaries defined in the DNA sequences of the coding genes to help infer the right protein alignment at the amino acid level.
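To make the link between the two levels concrete, here is a minimal Python sketch of the simplest way a protein alignment and its DNA can be tied together: threading each coding sequence onto its gapped protein row to obtain a codon-level alignment. This is not the anchoring method described above, only a toy illustration of the same idea of one alignment guiding the other. The function name `thread_codons`, the species names, and the sequences are all made up for the example, and it assumes each coding sequence is ungapped, in frame, and has no stop codon.

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Project an ungapped coding sequence onto a gapped protein alignment row.

    Hypothetical helper: every residue becomes its codon, every protein gap
    becomes three DNA gap characters.
    """
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) != 3 * residues:
        raise ValueError("CDS length must be three times the number of residues")

    codons = []
    pos = 0  # current offset into the coding sequence
    for aa in aligned_protein:
        if aa == gap:
            codons.append(gap * 3)
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Toy data (assumed, not taken from any real genome).
protein_alignment = {
    "species_a": "MK-LV",
    "species_b": "MKTLV",
}
coding_sequences = {
    "species_a": "ATGAAACTGGTT",
    "species_b": "ATGAAGACCCTGGTG",
}

codon_alignment = {
    name: thread_codons(row, coding_sequences[name])
    for name, row in protein_alignment.items()
}

for name, row in codon_alignment.items():
    print(name, row)
```

In this toy case the gap in the protein row for `species_a` becomes a three-column gap in the DNA row, so both rows end up with the same length and every protein column maps to exactly one codon column.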
A new opportunity is now arising: exploiting the information that is contained separately in the genomic and protein alignments by combining them into a single object that represents both. New methods are being developed that will use the landmarks that both genomic and protein alignments have correctly placed to converge on a single intertwined alignment object. In a way, this new type of alignment has already been built for closely related prokaryotic genomes. But prokaryotic genomes are less interesting for some topics, like alternative splicing, repetitive elements or recombination hotspots. Combined genomic and protein alignments will bring together, for researchers to study, elements of detail that have so far been scattered, and hopefully some new and brilliant mechanistic explanations of the innards of molecular evolution will arise from them, much as Darwin's explanations arose from his specimens nearly two centuries ago.
So a deluge of sequencing data is not really a problem but an opportunity.
Labels: genomics, nextgen sequencing