Characterizing and modeling cloud applications/jobs on a Google data center

Sheng Di; Derrick Kondo; Franck Cappello

doi:10.1007/s11227-014-1131-z

Source

The Journal of Supercomputing, 2014, Volume 69, Issue 1, pp. 139-160

Abstract

In this paper, we characterize and model Google applications and jobs based on a one-month trace from a large-scale Google data center. We make four contributions: (1) we compute statistics on task events and resource utilization for Google applications, across various resource types and execution types; (2) we classify applications with a K-means clustering algorithm using an optimized number of sets, based on task events and resource usage; (3) we study the correlation between Google application properties and running features (e.g., job priority and scheduling class); (4) we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the Google trace. Experiments show that the tasks simulated with our model exhibit features closely analogous to those in the Google trace. Over 95% of tasks have simulation errors below 20%, confirming the high accuracy of our simulation model.
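As an illustration of contribution (2), the following is a minimal sketch of K-means clustering in which the number of sets is chosen automatically. The abstract does not state how the number of sets was optimized, so the selection criterion used here (mean silhouette score), the farthest-first initialization, the function names, and the synthetic two-feature "usage" points are all assumptions made for the example, not the authors' method or data.

```python
from math import dist
import random


def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest existing centroid is farthest away.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def silhouette(points, labels):
    """Mean silhouette score; higher means better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for i, p in enumerate(points):
        own = [dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points, k_values):
    """Return the candidate k with the highest mean silhouette score."""
    return max(k_values, key=lambda k: silhouette(points, kmeans(points, k)[1]))


def make_demo_points():
    """Synthetic (cpu, memory) usage points in three groups; not trace data."""
    rng = random.Random(42)
    centers = [(0.1, 0.1), (0.5, 0.9), (0.9, 0.3)]
    return [
        (cx + rng.uniform(-0.03, 0.03), cy + rng.uniform(-0.03, 0.03))
        for cx, cy in centers
        for _ in range(20)
    ]


if __name__ == "__main__":
    pts = make_demo_points()
    print("selected number of sets:", choose_k(pts, range(2, 7)))
```

On real trace data, the feature vectors would be built from per-application task events and resource usage, and a different selection criterion could be substituted in `choose_k` without changing the rest of the sketch.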