Document Type
Conference Proceeding
Publication Date
7-18-2016
Abstract
We present OptEx, a closed-form model of job execution on Apache Spark, a popular parallel processing engine. To the best of our knowledge, OptEx is the first work that analytically models job completion time on Spark. The model can be used to estimate the completion time of a given Spark job on a cloud, with respect to the size of the input dataset, the number of iterations, and the number of nodes comprising the underlying cluster. Experimental results demonstrate that OptEx yields a mean relative error of 6% in estimating the job completion time. Furthermore, the model can be applied for estimating the cost-optimal cluster composition for running a given Spark job on a cloud under a completion deadline specified in the SLO (i.e., Service Level Objective). We show experimentally that OptEx is able to correctly estimate the cost-optimal cluster composition for running a given Spark job under an SLO deadline with an accuracy of 98%.
Publication Source (Journal or Book title)
Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016
First Page
193
Last Page
202
Recommended Citation
Sidhanta, S., Golab, W., & Mukhopadhyay, S. (2016). OptEx: A Deadline-Aware Cost Optimization Model for Spark. Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016, 193-202. https://doi.org/10.1109/CCGrid.2016.10