Soft error resilience in Big Data kernels through modular analysis
Document Type
Article
Publication Date
4-1-2016
Abstract
Shrinking feature sizes and operating voltages are making processor circuits increasingly vulnerable to soft faults, which calls for fault resilience techniques at both the software and hardware levels in the big data context. To assist software developers in writing fault-resilient big data applications, we propose the tool ErrorSight, which helps them focus their efforts on the code regions and data structures that are most vulnerable to soft errors, understand how numerical errors propagate through the program, and apply fault resilience techniques effectively. ErrorSight achieves this by efficiently generating error profiles, leveraging the predictive power of the Boosted Regression Tree model. We use four big data kernels to illustrate the modular analysis mechanism of ErrorSight and show its usefulness in developing numerically fault-resilient big data applications.
Publication Source (Journal or Book title)
Journal of Supercomputing
First Page
1570
Last Page
1596
Recommended Citation
Chen, S., Bronevetsky, G., Peng, L., Li, B., & Fu, X. (2016). Soft error resilience in Big Data kernels through modular analysis. Journal of Supercomputing, 72(4), 1570-1596. https://doi.org/10.1007/s11227-016-1682-2