Document Type


Publication Date



© 2017 The Author(s). The use of genetic data for identifying species-level lineages across the tree of life has received increasing attention in the field of systematics over the past decade. The multispecies coalescent model provides a framework for understanding the process of lineage divergence and has become widely adopted for delimiting species. However, because these studies lack an explicit assessment of model fit, in many cases, the accuracy of the inferred species boundaries are unknown. This is concerning given the large amount of empirical data and theory that highlight the complexity of the speciation process. Here, we seek to fill this gap by using simulation to characterize the sensitivity of inference under the multispecies coalescent (MSC) to several violations of model assumptions thought to be common in empirical data. We also assess the fit of the MSC model to empirical data in the context of species delimitation. Our results show substantial variation in model fit across data sets. Posterior predictive tests find the poorest model performance in data sets that were hypothesized to be impacted by model violations. We also show that while the inferences assuming the MSC are robust to minor model violations, such inferences can be biased under some biologically plausible scenarios. Taken together, these results suggest that researchers can identify individual data sets in which species delimitation under the MSC is likely to be problematic, thereby highlighting the cases where additional lines of evidence to identify species boundaries are particularly important to collect. Our study supports a growing body of work highlighting the importance of model checking in phylogenetics, and the usefulness of tailoring tests of model fit to assess the reliability of particular inferences.

Publication Source (Journal or Book title)

Systematic Biology

First Page


Last Page