CRESP: Towards Optimal Resource Provisioning for MapReduce Computing in Public Clouds
Running MapReduce programs in the cloud introduces a unique problem: how to optimize resource provisioning so as to minimize the monetary cost or the job finish time for a specific job. We study the whole MapReduce processing workflow and build a cost function that explicitly models the relationship among the time cost, the amount of input data, the available system resources (Map and Reduce slots), and the complexity of the Reduce function for the target MapReduce job. The model parameters can be learned from test runs. Based on this cost function, we can solve a number of decision problems, such as finding the amount of resources that minimizes monetary cost within a job finish deadline, minimizing time cost under a given monetary budget, or finding the optimal tradeoffs between time and monetary costs. Experimental results show that the proposed approach performs well on a number of sample MapReduce programs in both an in-house cluster and Amazon EC2. We also conducted a variance analysis of the different components of the MapReduce workflow to identify possible sources of modeling error. Our optimization results show that the proposed approach can save a significant amount of time and money compared to randomly selected settings.
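The abstract does not state the cost function itself, so the following Python sketch only illustrates the workflow it describes: learn model parameters from test runs, then search resource settings for the cheapest one that meets a deadline. Everything specific here is an assumption, not the paper's method: the model form T = b0 + b1·M/m + b2·M/r + b3·g(M/r) + b4·r (M = input size, m and r = Map and Reduce slots), the choice g(x) = x·ln(x) for an O(n log n) Reduce function, the billing rule (every slot billed for the whole job at a per-second rate), and all function names. The fit is plain ordinary least squares via the normal equations.

```python
import math
from itertools import product


def features(data_gb, map_slots, reduce_slots):
    """Regressors for an ASSUMED running-time model (seconds).

    Hypothetical form, not the paper's cost function:
        T = b0 + b1*M/m + b2*M/r + b3*g(M/r) + b4*r
    M = input size (GB), m = Map slots, r = Reduce slots, and
    g(x) = x*ln(x) stands in for an O(n log n) Reduce function.
    """
    per_reduce = data_gb / reduce_slots
    return [1.0,
            data_gb / map_slots,
            per_reduce,
            per_reduce * math.log(per_reduce),
            float(reduce_slots)]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit(runs):
    """Least-squares coefficients from test runs given as (M, m, r, seconds)."""
    rows = [features(run[0], run[1], run[2]) for run in runs]
    ys = [run[3] for run in runs]
    k = len(rows[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, data_gb, map_slots, reduce_slots):
    """Predicted running time (seconds) under the assumed model."""
    return sum(c * f for c, f in zip(coef, features(data_gb, map_slots, reduce_slots)))


def cheapest_within_deadline(coef, data_gb, deadline_s, price_per_slot_hour, max_slots=64):
    """Exhaustive search over (m, r) for minimum cost with predicted time <= deadline.

    Assumed billing: cost = (m + r) * price_per_slot_hour * T / 3600.
    Returns (cost, m, r, T), or None if no setting meets the deadline.
    """
    best = None
    for m, r in product(range(1, max_slots + 1), repeat=2):
        t = predict(coef, data_gb, m, r)
        if t > deadline_s:
            continue
        cost = (m + r) * price_per_slot_hour * t / 3600
        if best is None or cost < best[0]:
            best = (cost, m, r, t)
    return best


if __name__ == "__main__":
    # Synthetic "test runs" generated from made-up coefficients (illustration only).
    true_coef = [30.0, 12.0, 8.0, 3.0, 2.0]
    runs = [(M, m, r, predict(true_coef, M, m, r))
            for M in (10, 20, 40) for m in (4, 8, 16) for r in (2, 4, 8)]
    coef = fit(runs)
    print("fitted coefficients:", [round(c, 3) for c in coef])
    print("cheapest setting for 100 GB within 600 s:",
          cheapest_within_deadline(coef, 100, 600, 0.05))
```

Because the assumed model is linear in its coefficients, a few dozen test runs and ordinary least squares are enough to fit it, and an exhaustive grid over (m, r) is cheap at this scale. The other decision problems listed above (minimizing time under a budget, tracing the time/cost tradeoff) could reuse the same predictor with a different objective and constraint.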
Powers, J. L., & Tian, F. (2014). CRESP: Towards Optimal Resource Provisioning for MapReduce Computing in Public Clouds. IEEE Transactions on Parallel and Distributed Systems, 25(6), 1403-1412.