This paper proposes a fully distributed scheduling algorithm to process MapReduce data-intensive applications across geo-distributed clusters in federated clouds. The proposed algorithm, called FedSCD, takes advantage of data locality while reducing both VM cost and inter-cluster data transfer cost, subject to a deadline constraint. This work is compared to conventional partially distributed scheduling...
Hadoop is a commonly used framework for applications that deal with large volumes of data. Most current-day applications require large amounts of storage and computation. Hadoop jobs are executed in the cloud because the cloud environment provides flexible provisioning, maintenance, and scalability of resources. The Hadoop framework can be improved in terms of parameter automation and MapReduce tasks...
Network traffic measurement is important for network security and network management. As network bandwidth increases and Internet applications diversify, network big data brings new challenges for network traffic measurement. Because existing network traffic measurement mainly processes traffic data with centralized methods, it is very difficult to meet the application needs of massive...
Cloud computing is a paradigm for processing big data that provides convenient, on-demand network access to a shared pool of configurable computing resources. The cost of cloud data centers has become a hot topic in recent years. This paper studies how to minimize the bandwidth cost of uploading deferrable big data to a cloud computing platform, based on the MapReduce framework. We study the deficiency...
The Hadoop framework has been developed to effectively process data-intensive MapReduce applications. Hadoop users specify the application's computation logic in terms of a map and a reduce function; such programs are often termed MapReduce applications. The Hadoop Distributed File System stores the MapReduce application data on Hadoop cluster nodes called Data nodes, whereas the Name node is a control...
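The programming model this abstract describes, where the user supplies only a map and a reduce function and the framework handles grouping, can be sketched conceptually in plain Python. This is a hedged illustration of the model, not the actual Hadoop Java API; the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical:

```python
from collections import defaultdict

# A user supplies only these two functions; the framework does the rest.
def map_fn(line):
    # Emit (word, 1) pairs for each word in one input line.
    for word in line.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Sum the counts collected for one word.
    return (key, sum(values))

def run_mapreduce(lines):
    # Shuffle phase: group all map outputs by key before reducing.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(run_mapreduce(["a b a", "b c"]))  # {'a': 2, 'b': 2, 'c': 1}
```

In real Hadoop deployments the map and reduce phases run as parallel tasks on Data nodes, and intermediate results are persisted and shuffled over the network rather than held in one in-memory dictionary.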
Cloud computing is based on agreements between the service provider and the service consumer, and a cloud data center is a cloud computing environment consisting of hardware and software components. This paper studies how to minimize the bandwidth cost of uploading deferrable big data to a cloud computing platform, based on the MapReduce framework. We first analyze the shortcomings of the bandwidth of data...
In recent years, cloud computing systems have matured considerably and their applications have become more widespread. Microsoft, Google, IBM, and Amazon have all developed applications for the cloud computing environment. The cloud computing environment resembles a large pool of resources, and MapReduce distributes work across this resource pool to realize cloud computing. Hadoop MapReduce is a...
MapReduce is by far one of the most successful realizations of large-scale data-intensive cloud computing platforms. MapReduce automatically parallelizes computation by running multiple map and/or reduce tasks over distributed data across multiple machines. Hadoop is an open source implementation of MapReduce. When Hadoop schedules reduce tasks, it neither exploits data locality nor addresses partitioning...
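The partitioning weakness this abstract points at can be made concrete with a small sketch (plain Python standing in for Hadoop's behavior, not Hadoop code): the default scheme hashes each key to a fixed reducer, so a skewed key distribution overloads one reducer, and data locality plays no role in the assignment. The key set below is invented for illustration:

```python
from collections import Counter
from zlib import crc32

def hash_partition(key, num_reducers):
    # Mimics the idea of Hadoop's default HashPartitioner: key hash
    # modulo reducer count, with no regard for data locality or for
    # how many records carry each key.
    return crc32(key.encode()) % num_reducers

# Skewed key frequencies: one "hot" key dominates the map output.
map_output = ["hot"] * 90 + ["k1", "k2", "k3"] * 3 + ["k4"]

# Count how many intermediate records each of 4 reducers would receive.
load = Counter(hash_partition(k, 4) for k in map_output)

# All 90 "hot" records land on one reducer, so that reducer processes
# ~90% of the data no matter how the remaining keys are spread.
print(load)
```

A locality- and skew-aware scheduler, like the one the paper argues for, would instead consider where each key's intermediate data resides and how much of it there is before assigning reduce tasks.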
Placing data as close as possible to computation is a common practice of data intensive systems, commonly referred to as the data locality problem. By analyzing existing production systems, we confirm the benefit of data locality and find that data have different popularity and varying correlation of accesses. We propose DARE, a distributed adaptive data replication algorithm that aids the scheduler...
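As a rough, hypothetical sketch of the popularity-driven replication idea mentioned above (not the DARE algorithm itself; the function name, scaling rule, and sample access counts are all invented for illustration), more replicas can be budgeted for blocks that are accessed more often, which gives the scheduler more nodes where a popular block is local:

```python
def replica_counts(access_counts, min_replicas=1, max_replicas=5):
    # Scale each block's replica count with its share of total accesses,
    # clamped to the [min_replicas, max_replicas] budget.
    total = sum(access_counts.values())
    plan = {}
    for block, hits in access_counts.items():
        share = hits / total if total else 0.0
        plan[block] = max(min_replicas,
                          min(max_replicas, round(share * max_replicas * 2)))
    return plan

# A popular block earns more replicas than a rarely accessed one.
print(replica_counts({"blk_hot": 80, "blk_warm": 15, "blk_cold": 5}))
```

A production system would additionally adapt these counts over time as access patterns shift, which is the "adaptive" and "distributed" part of the approach the abstract describes.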