Topological heterogeneity among gene trees is widely observed in phylogenomic analyses, and some of this variation is likely caused by systematic error in gene tree estimation. Systematic error can be mitigated either by improving models of sequence evolution to account for all evolutionary processes relevant to each gene or by identifying those genes whose evolution best conforms to existing models. However, the best method for identifying such genes is not well established. Here, we ask whether filtering genes according to their clock-likeness or posterior predictive effect size (PPES, an inference-based measure of model violation) improves phylogenetic reliability and congruence. We compared these approaches to each other, and to the common practice of filtering based on rate of evolution, using two different metrics. First, we compared gene-tree topologies to accepted reference topologies. Second, we examined topological similarity among gene trees in filtered sets. Our results suggest that filtering genes based on clock-likeness and PPES can yield a collection of genes with more reliable phylogenetic signal. For the two exemplar data sets we explored, from yeast and amniotes, clock-likeness and PPES outperformed rate-based filtering in both congruence and reliability.
Publication Source (Journal or Book title)
Doyle, V. P., Young, R. E., Naylor, G. J., & Brown, J. M. (2015). Can We Identify Genes With Increased Phylogenetic Reliability? Systematic Biology, 64(5), 824-837. https://doi.org/10.1093/sysbio/syv041