PLoS Genetics: Evolution of Regulatory Sequences in 12 Drosophila Species
Multiple Sequence Alignment and Insertion/Deletion Annotations

For the analysis of TFBS evolution, we developed a new multiple alignment program, “ProbconsMorph”, by integrating Probcons [41], a consistency-based multiple sequence alignment program, with Morph [38], a pair-wise sequence alignment program specifically designed to align regulatory modules. Morph uses a pair-HMM as a generative model for the alignment of two orthologous CRMs; it is parameterized by the given motifs and by various evolutionary rate parameters that it fits to the data. It uses maximum likelihood inference to perform TFBS annotation and alignment simultaneously, and it reports, for every pair of positions in the two sequences, the posterior probability that they are aligned. Morph was run to produce such a probabilistic alignment for every pair of species. Probcons takes these pair-wise alignment probabilities and builds a multiple sequence alignment progressively, using the “consistency transformation”: the probability of alignment of two nucleotides x and y is updated based on the alignment probabilities of x and z and of y and z, where z is a nucleotide from a third species. We have shown previously that Morph provides practical benefits for inference of evolutionary events and rates by computing a better alignment; ProbconsMorph is an effective and efficient extension of this program to more than two species. We made two simple modifications to Probcons to integrate it with Morph: first, Probcons was made to work on DNA sequences (the current implementation handles protein sequences only); second, it was made to accept a phylogenetic tree as input, rather than estimating the tree at run-time. The ProbconsMorph software is publicly available at our site
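The consistency transformation can be illustrated with a minimal sketch. It assumes the unweighted form given in the original Probcons publication, in which the updated probability that position i of species x aligns with position j of species y is the average, over all species z, of the sum over k of P(x_i ~ z_k) P(z_k ~ y_j), with a sequence aligned to itself by the identity matrix. The function names, the dictionary layout, and the toy probabilities below are hypothetical and are not taken from the ProbconsMorph source; in ProbconsMorph the pair-wise matrices would come from Morph.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def consistency_transform(pairwise, species):
    """Apply one round of the (unweighted) consistency transformation.

    pairwise[(x, y)] is a len(x) by len(y) matrix whose entry [i][j] is the
    posterior probability that position i of x aligns with position j of y.
    Each unordered species pair is stored once (hypothetical layout).
    """
    def get(x, y):
        # Look the pair up in either orientation.
        if (x, y) in pairwise:
            return pairwise[(x, y)]
        return transpose(pairwise[(y, x)])

    n = len(species)
    updated = {}
    for (x, y), p_xy in pairwise.items():
        # The terms z = x and z = y each contribute p_xy itself,
        # because a sequence aligns to itself by the identity matrix.
        acc = [[2.0 * v for v in row] for row in p_xy]
        for z in species:
            if z in (x, y):
                continue
            # Support for x_i ~ y_j routed through every position of z.
            via_z = matmul(get(x, z), get(z, y))
            for i, row in enumerate(via_z):
                for j, v in enumerate(row):
                    acc[i][j] += v
        updated[(x, y)] = [[v / n for v in row] for row in acc]
    return updated


# Toy input: three species, two positions each (made-up probabilities).
species = ["a", "b", "c"]
example = {
    ("a", "b"): [[0.9, 0.0], [0.0, 0.8]],
    ("a", "c"): [[1.0, 0.0], [0.0, 1.0]],
    ("b", "c"): [[0.5, 0.0], [0.0, 0.5]],
}
updated = consistency_transform(example, species)
```

In this toy case the weak b/c evidence pulls the a/b probability at the first position below its original value of 0.9, which is the intended effect: pair-wise probabilities are adjusted toward agreement with the third species before the progressive alignment step.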

