Document Type
Article
Publication Date
6-1-2021
Abstract
Usually, genomic prediction models are compared using validation schemes such as Repeated Random Subsampling (RRS) or K-fold cross-validation. However, the way training and validation sets are designed strongly affects how, and how subjectively, models are compared. In these procedures, training and validation sets overlap across replicates; this resampling overlap may inflate the estimates, violate the independence of residuals, and yield less accurate results. Furthermore, ANOVA and multiple-comparison tests such as Tukey's test are not recommended for these schemes because the assumption of independent residuals is not met. Thus, we propose a new way to sample observations into training and validation sets, based on a cross-validation alpha-based design (CV-α). CV-α is designed to create several validation scenarios (replicates × folds) regardless of the number of genotypes. With CV-α, the number of genotypes falling in the same fold across replicates was much lower than with K-fold cross-validation, indicating greater residual independence. Therefore, as a proof of concept, we used ANOVA on the CV-α results to compare the proposed methodology with RRS and K-fold cross-validation, applying four genomic prediction models to a simulated and a real dataset. In terms of predictive ability and bias, all validation methods performed similarly. However, in terms of mean squared error and coefficient of variation, CV-α performed best under the evaluated scenarios. Moreover, CV-α adds no cost or complexity, is more reliable, and allows models and factors to be compared with non-subjective methods. Therefore, CV-α can be considered a more precise validation methodology for model selection.
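The key claim above is that a design-based assignment keeps genotypes from repeatedly sharing a fold across replicates, whereas independent K-fold reshuffles do not. The sketch below illustrates that idea under stated assumptions; it is not the authors' CV-α generator. It uses a square lattice (a simple resolvable incomplete-block design related to alpha designs) as a stand-in, which only works for v = k² genotypes with k prime, while CV-α itself is described as working for any number of genotypes. The function names and the "maximum pair co-occurrence" measure are hypothetical choices for illustration.

```python
import itertools
import random
from collections import Counter


def lattice_folds(k, n_reps):
    """Square-lattice fold assignment for v = k*k genotypes (k prime, n_reps <= k + 1).

    Genotype g sits at grid cell (i, j) = divmod(g, k). Replicate 0 groups
    genotypes by row i; replicate r >= 1 groups them by (j - (r - 1) * i) mod k.
    With k prime, any two genotypes share a fold in exactly one of the k + 1
    possible replicates.
    """
    if n_reps > k + 1:
        raise ValueError("a square lattice of order k has at most k + 1 replicates")
    reps = []
    for r in range(n_reps):
        folds = [[] for _ in range(k)]
        for g in range(k * k):
            i, j = divmod(g, k)
            fold = i if r == 0 else (j - (r - 1) * i) % k
            folds[fold].append(g)
        reps.append(folds)
    return reps


def random_kfold(n_genotypes, k, n_reps, seed=0):
    """Ordinary K-fold: an independent shuffle and split for every replicate."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_reps):
        ids = list(range(n_genotypes))
        rng.shuffle(ids)
        reps.append([ids[f::k] for f in range(k)])
    return reps


def max_pair_cooccurrence(reps):
    """Largest number of replicates in which any pair of genotypes shares a fold."""
    counts = Counter()
    for folds in reps:
        for fold in folds:
            counts.update(itertools.combinations(sorted(fold), 2))
    return max(counts.values())


k, n_reps = 7, 4  # hypothetical setup: 49 genotypes, 7 folds, 4 replicates
print("lattice:", max_pair_cooccurrence(lattice_folds(k, n_reps)))
print("k-fold :", max_pair_cooccurrence(random_kfold(k * k, k, n_reps)))
```

In the lattice assignment no pair of genotypes is ever validated together more than once, so fold-level results across replicates share less structure; with random K-fold, some pairs typically land in the same fold in several replicates.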
Publication Source (Journal or Book title)
Euphytica
Recommended Citation
Yassue, R., Sabadin, F., Galli, G., Alves, F., & Fritsche-Neto, R. (2021). CV-α: designing validations sets to increase the precision and enable multiple comparison tests in genomic prediction. Euphytica, 217(6). https://doi.org/10.1007/s10681-021-02831-x