Automated Cloud Datastore and Infrastructure Management Under SLA
Doctor of Philosophy (PhD)
Modern web-based client applications, such as Netflix, YouTube, Facebook, Amazon, and BitTorrent, use quorum-replicated datastores, such as Cassandra, MongoDB, and HBase, to process huge volumes of data on clusters of commodity machines. Our research explores novel techniques for adapting the client-centric performance (i.e., the performance observed from the client application) of such applications to a given use case by automatically tuning the configuration of the underlying datastores. Further, to reduce the capital expenditure (capex) of maintaining the hosting infrastructure, client applications are often hosted on infrastructure provided by third-party cloud service providers, such as Amazon, Rackspace, and Microsoft. Users pay these providers a cloud usage cost based on the hourly usage of the virtual machine instances that make up the hosting infrastructure. To minimize cloud usage cost and optimize the client-centric performance of such applications, the composition of the hosting infrastructure or the configuration of the underlying datastores needs to be managed according to the given use case (or the Service Level Agreement (SLA) corresponding to that use case). Manually managing a cloud infrastructure or configuring an underlying distributed datastore while accounting for the trade-offs in client-centric performance is difficult because of the large number of possible use cases and the dynamic workload changes that affect client-centric performance. Moreover, state-of-the-art cloud management tools do not consider client-centric performance metrics, such as latency, in the SLA. Further, workloads in real-world cloud-based web applications vary widely over time. For example, Netflix observes that the network traffic for its applications reaches almost 37% of Internet traffic during peak workload hours.
State-of-the-art cloud management tools cannot adapt configurations to dynamically changing workload characteristics, such as variations in throughput, the proportion of read operations, and the number of concurrent threads. This dissertation presents a group of adaptive cloud management tools that provide an optimal performance trade-off, under given SLA deadlines, for cloud-based applications that use distributed datastores to process data or that run on hosting infrastructure provided by cloud service providers. Our tools allow such applications to execute under dynamically changing workloads while respecting the given SLA. First, we present OptCon, a novel framework that automatically tunes client-centric consistency settings in quorum-replicated stores on a per-operation basis, based on the staleness (i.e., how old the observed value is with respect to the latest update on the data item) and latency thresholds specified in the given SLA. Next, we present Consistify, a novel decentralized framework that automatically tunes the consistency settings of the underlying quorum-replicated datastores so that client applications can simultaneously respect a given SLA deadline and given correctness conditions (specified as simple logical predicates) that impose constraints on the values returned by client applications. Next, we present YCSB-D, a tool that builds upon the YCSB (Yahoo! Cloud Serving Benchmark) suite to help users simulate dynamic variations in workloads; YCSB-D can evaluate adaptive frameworks like OptCon against such variations. Finally, we present OptEx, an analytical model of the execution of Spark jobs, together with a technique that uses this model to estimate the cost-optimal cluster composition for running a given Spark job under an SLA deadline.
Document Availability at the Time of Submission
Secure the entire work for patent and/or proprietary purposes for a period of one year. Student has submitted appropriate documentation which states: During this period the copyright owner also agrees not to exercise her/his ownership rights, including public use in works, without prior authorization from LSU. At the end of the one year period, either we or LSU may request an automatic extension for one additional year. At the end of the one year secure period (or its extension, if such is requested), the work will be released for access worldwide.
Sidhanta, Subhajit, "Automated Cloud Datastore and Infrastructure Management Under SLA" (2016). LSU Doctoral Dissertations. 707.