Data Integration Using Model-Based Boosting
Document Type
Article
Publication Date
9-1-2021
Abstract
The need for data integration is becoming ubiquitous and spans many disciplines, owing to technological developments in instrumentation. Combining information from distinct data sources in a single model, so as to improve prediction accuracy and gain a holistic view of the problem, is a challenge for statisticians. In this paper, we present a flexible statistical framework for integrating various types of data from distinct sources through model-based boosting (IMBoost) with two types of base models: regression trees and penalized splines. The performance of IMBoost is illustrated through two recent studies in environmental soil science, where multiple sensors were used to quantify several soil parameters. Empirical results are promising and show that the proposed algorithms substantially improve prediction performance by combining the strengths of distinct data sources. We also propose a surrogate model approach, which allows IMBoost to handle situations in which some samples are missing from some of the sources.
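To make the idea of boosting over several data sources concrete, the following is a minimal sketch of component-wise L2 boosting in which each source contributes its own regression-tree base learner (here a depth-1 stump). It is an illustration under stated assumptions, not the authors' IMBoost algorithm: the abstract does not specify how base learners are selected or combined, so the per-iteration "pick the best-fitting source" rule, the function names, the source names `sensor_a` and `sensor_b`, and the synthetic data are all hypothetical.

```python
import random

def fit_stump(X, r):
    """Least-squares depth-1 regression tree fitted to residuals r.
    Returns (sse, feature, threshold, left_mean, right_mean), or None if no split exists."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:  # guard against degenerate splits
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    return best

def stump_value(stump, row):
    _, j, t, ml, mr = stump
    return ml if row[j] <= t else mr

def boost(sources, y, n_iter=60, nu=0.1):
    """Component-wise L2 boosting: at each step, fit one stump per source to the
    current residuals and keep only the source with the smallest squared error."""
    intercept = sum(y) / len(y)
    pred = [intercept] * len(y)
    ensemble = []  # list of (source_name, stump)
    for _ in range(n_iter):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        fits = [(name, fit_stump(X, resid)) for name, X in sources.items()]
        fits = [(name, s) for name, s in fits if s is not None]
        if not fits:
            break
        name, stump = min(fits, key=lambda f: f[1][0])
        ensemble.append((name, stump))
        # Shrunken update with learning rate nu
        pred = [p + nu * stump_value(stump, row) for p, row in zip(pred, sources[name])]
    return {"intercept": intercept, "nu": nu, "ensemble": ensemble}

def predict(model, sources):
    n = len(next(iter(sources.values())))
    pred = [model["intercept"]] * n
    for name, stump in model["ensemble"]:
        pred = [p + model["nu"] * stump_value(stump, row)
                for p, row in zip(pred, sources[name])]
    return pred

def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

# Synthetic data (hypothetical): the response depends on both sources.
random.seed(0)
n = 60
sensor_a = [[random.random(), random.random()] for _ in range(n)]
sensor_b = [[random.random()] for _ in range(n)]
y = [2.0 * a[0] + (1.0 if b[0] > 0.5 else 0.0) + random.gauss(0.0, 0.05)
     for a, b in zip(sensor_a, sensor_b)]

both = {"sensor_a": sensor_a, "sensor_b": sensor_b}
only_a = {"sensor_a": sensor_a}
mse_baseline = mse(y, [sum(y) / n] * n)
mse_only_a = mse(y, predict(boost(only_a, y), only_a))
mse_both = mse(y, predict(boost(both, y), both))
print(mse_baseline, mse_only_a, mse_both)
```

The comparison printed at the end is in-sample only and uses simulated data, so it says nothing about the results reported in the paper. It merely shows the mechanism: when the response depends on more than one source, letting the booster draw base learners from every source can fit structure that a single-source model cannot reach.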
Publication Source (Journal or Book title)
SN Computer Science
Recommended Citation
Li, B., Chakraborty, S., Weindorf, D., & Yu, Q. (2021). Data Integration Using Model-Based Boosting. SN Computer Science, 2(5). https://doi.org/10.1007/s42979-021-00797-0