The volume of data generated by networks is growing every day, and MapReduce is a promising parallel programming model for processing such large datasets. In this paper we survey several distributed storage and computation systems. We study parameters such as fault tolerance, replication, checkpointing, security, and optimizing small-file access with MapReduce, and we review the open-source distributed file systems GlusterFS, Lustre, Ceph, and HDFS. Cloud computing plays an important role in protecting applications' data and the related infrastructure with the help of policies, technologies, controls, and big data tools. Based on our study, we propose that MapReduce is an efficient and scalable programming platform for data processing, providing computational capabilities and distributed storage on clusters of commodity hardware.
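The MapReduce model structures a computation as a map phase that emits key/value pairs, a shuffle that groups values by key, and a reduce phase that aggregates each group. A minimal single-machine sketch of the classic word-count example (illustrative only; function names are our own, and a real framework such as Hadoop distributes these phases across a cluster):

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit (word, 1) pairs for each word in an input split."""
    for word in document.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group intermediate values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

documents = ["big data on commodity hardware", "big data processing"]
pairs = [p for doc in documents for p in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts["big"])  # 2
```

In a distributed setting, each map and reduce task runs on a separate node, and the shuffle moves intermediate data over the network, which is where fault tolerance and replication become important.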