Scheduling tasks in the cloud computing environment, particularly for data intensive applications is of great importance and interest. In this paper, we propose a new workflow model presented in a rigorous graph-theoretic setting. In this new model, we would like to incorporate possible similarities between requisite files which are needed to complete the given set of tasks. We show that it is NP-Complete to compute the make span in this model even with oracle access to the cost of retrieving a file.