Improving estimation of evolutionary timescales from multi-gene data sets using ClockstaR — ASN Events

Improving estimation of evolutionary timescales from multi-gene data sets using ClockstaR (#235)

Sebastian Duchene 1, Simon Y.W. Ho 1
  1. The University of Sydney, Chippendale, NSW, Australia

Molecular data can be used to estimate the evolutionary timescale of a group of taxa. This can be done using phylogenetic methods based on the molecular clock, a model that describes variation in evolutionary rates. The simplest model is the strict molecular clock, which posits that the rate is constant throughout the tree. This model has been rejected for many data sets, leading to the development of ‘relaxed-clock’ models that can account for rate variation among lineages.
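The difference between the two kinds of clock model can be illustrated with a minimal sketch. It assumes that a branch length in expected substitutions per site is the product of a rate and a time duration, and it uses an uncorrelated lognormal distribution of branch rates as one common example of a relaxed clock. The function names, parameter values, and the choice of distribution are illustrative assumptions, not details taken from the abstract.

```python
import math
import random


def strict_clock_lengths(durations, rate):
    """Branch lengths (substitutions/site) when every branch shares one rate."""
    return [rate * t for t in durations]


def relaxed_clock_lengths(durations, mean_rate, sigma, rng):
    """Branch lengths when each branch draws its own rate.

    Rates are drawn independently from a lognormal distribution whose
    mean is `mean_rate` (an uncorrelated lognormal relaxed clock, used
    here purely as an illustrative assumption).
    """
    # Choose mu so that the lognormal mean equals mean_rate.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) * t for t in durations]


if __name__ == "__main__":
    # Hypothetical branch durations in millions of years.
    durations = [10.0, 5.0, 5.0, 2.5]
    rng = random.Random(1)
    print(strict_clock_lengths(durations, rate=0.01))
    print(relaxed_clock_lengths(durations, mean_rate=0.01, sigma=0.5, rng=rng))
```

Under the strict clock the branch lengths are exactly proportional to time, whereas under the relaxed clock they scatter around that expectation, which is the among-lineage rate variation the relaxed-clock models are meant to capture.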

The analysis of multi-gene data sets requires special consideration because the patterns of among-lineage rate variation can differ among genes. If this is the case, it is more appropriate to partition the genes according to their evolutionary patterns and to use a separate clock model for each group of genes. However, even in moderately sized data sets, the number of possible partitioning schemes can be very large, so rigorous comparison of their statistical fit is computationally prohibitive.
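To give a sense of scale: if a partitioning scheme is taken to be any way of dividing n genes into non-empty groups, with one clock model per group, then the number of schemes is the Bell number B(n). The sketch below computes it with the Bell triangle. The function name is an illustrative assumption and this calculation is not part of ClockstaR itself.

```python
def bell_number(n):
    """Number of ways to partition n items into non-empty groups."""
    # Build successive rows of the Bell triangle; each row starts with the
    # last entry of the previous row, and the last entry of row n is B(n).
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


if __name__ == "__main__":
    print(bell_number(3))   # 5
    print(bell_number(10))  # 115975
```

With only 10 genes there are already 115,975 candidate schemes, which shows why comparing the statistical fit of every scheme quickly becomes impractical.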

We present ClockstaR, a method to select the optimal clock-model partitioning scheme. We show that using arbitrary partitioning schemes can result in misleading estimates of evolutionary timescales, a problem that can be avoided by carefully selecting the partitioning scheme with automated methods, such as ClockstaR.