Document Type

Conference Proceeding

Publication Date

7-18-2016

Abstract

We present OptEx, a closed-form model of job execution on Apache Spark, a popular parallel processing engine. To the best of our knowledge, OptEx is the first work that analytically models job completion time on Spark. The model can be used to estimate the completion time of a given Spark job on a cloud, with respect to the size of the input dataset, the number of iterations, and the number of nodes comprising the underlying cluster. Experimental results demonstrate that OptEx yields a mean relative error of 6% in estimating the job completion time. Furthermore, the model can be applied to estimate the cost-optimal cluster composition for running a given Spark job on a cloud under a completion deadline specified in the SLO (i.e., Service Level Objective). We show experimentally that OptEx is able to correctly estimate the cost-optimal cluster composition for running a given Spark job under an SLO deadline with an accuracy of 98%.

Publication Source (Journal or Book title)

Proceedings - 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2016

First Page

193

Last Page

202
