Mixture models of nucleotide sequence evolution, and the evolution of yeast genomes (#92)
Molecular phylogenetic studies of homologous
sequences of nucleotides often assume that the evolutionary process was
globally stationary, reversible, and homogeneous (SRH), and that the data can
be modeled accurately using one or several site-specific, time-reversible rate
matrices. However, a growing body of data suggests that evolution under
globally SRH conditions is the exception rather than the norm. To address this
issue, we introduce a family of mixture models that considers heterogeneity in
the substitution process across lineages (HAL) and across sites (HAS). We also
introduce an algorithm for searching model space and identifying a model of
evolution that is less likely to be over- or under-parameterized. The
merits of our algorithms are illustrated with an analysis of 42,337 second codon
sites extracted from a concatenation of 106 alignments of orthologs encoded by
the nuclear genomes of eight species of yeast. The best HAL-HAS model provides
a better fit between the tree and data than other models do, and the parameter
estimates for this model indicate not only a complex ancestral sequence but
also a complex evolutionary process.
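
As a rough illustration of the mixture idea behind heterogeneity across sites (HAS), the likelihood of each site can be written as a weighted sum over site classes. The sketch below is not the authors' implementation. It assumes the per-class site likelihoods have already been computed on a tree, and the function name, variable names, and toy numbers are all hypothetical.

```python
import math


def mixture_log_likelihood(site_class_likelihoods, class_weights):
    """Log-likelihood of an alignment under a finite mixture of site classes.

    site_class_likelihoods[s][k] is the likelihood of site s under class k
    (assumed to be precomputed elsewhere, e.g. by a pruning algorithm).
    class_weights[k] is the mixture proportion of class k; weights sum to 1.
    """
    if not math.isclose(sum(class_weights), 1.0, abs_tol=1e-9):
        raise ValueError("class weights must sum to 1")

    total = 0.0
    for per_class in site_class_likelihoods:
        if len(per_class) != len(class_weights):
            raise ValueError("each site needs one likelihood per class")
        # Site likelihood is the weighted average over the site classes.
        site_likelihood = sum(w * l for w, l in zip(class_weights, per_class))
        # Sites are treated as independent, so log-likelihoods add up.
        total += math.log(site_likelihood)
    return total


# Toy input (hypothetical values): 3 sites, 2 site classes.
toy_likelihoods = [[0.02, 0.10], [0.05, 0.01], [0.03, 0.03]]
toy_weights = [0.7, 0.3]
toy_loglik = mixture_log_likelihood(toy_likelihoods, toy_weights)
```

In a model that also allows heterogeneity across lineages (HAL), the per-class likelihoods would themselves come from rate matrices that may differ from edge to edge of the tree; that part is outside the scope of this sketch.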