USB mass storage is commonly used in day-to-day computing and other storage applications, allowing users to carry personal data anywhere at any time. Portable storage devices, such as flash memory and USB disks, let users transport several gigabytes to terabytes of data in their pockets. For these applications, mass storage is secured and treated much like a hard disk, so the same file management system...
Building a big data processing cluster requires care in selecting the right storage devices, operating system (OS), and their configuration. A wrong strategy may lead to a very slow cluster that cannot process data in a reasonable amount of time. In this work we show how performance varies across different setups using the Hadoop Distributed File System (HDFS) and related tools.
There has been great interest in computing experiments on shared-nothing computers and commodity machines. Such work needs multiple systems running in parallel, working closely together towards the same goal. It has frequently been observed that the distributed execution engine MapReduce handles the primary input-output workload for such...
This paper compares the I/O performance, flexibility and ease-of-use features of the Linux file systems Ext4, XFS and BtrFS running on the storage stacks LVM and ZFS, with RADOS Block Devices (RBD) as the underlying block devices in place of physical disks. The experiment sets conducted to evaluate the performance of the selected file systems (Ext4, XFS, BtrFS and ZFS) are presented and...
With enterprises collecting feedback down to every possible detail, data repositories are being flooded with information. In order to extract valuable information, these data must be processed using sophisticated statistical analysis. Traditional analytical tools, existing statistical software and data management systems find it challenging to perform deep analysis on large data libraries...
A file system is a data structure, together with the logic (methods), for efficiently handling a collection of information on disk. File systems are classified according to their structure, method of operation, speed of operation, scalability, flexibility, size and security. A file system provides a means to organize data that must be retained after a program terminates by providing procedures to store,...
Data processing systems impose multiple views on data as it moves through the system. These views include spreadsheets, databases, matrices, and graphs. A wide variety of technologies can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest...
Distributed file systems based on MapReduce programming are a key building block of cloud computing applications. Nodes provide both computing and storage. A file is divided into chunks that can be allocated across the nodes. As files are created, moved, updated and deleted, load imbalance arises in the file system. It is therefore necessary to distribute the file chunks uniformly across the...
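The chunk allocation described above can be sketched minimally as a greedy least-loaded placement; the function and node names here are illustrative assumptions, not the paper's actual algorithm:

```python
# Hypothetical sketch of uniform chunk placement across storage nodes
# (illustrative only; not the allocation scheme from the paper above).

def place_chunks(num_chunks, nodes):
    """Assign each chunk to the node currently holding the fewest chunks."""
    load = {n: 0 for n in nodes}
    placement = {}
    for chunk_id in range(num_chunks):
        target = min(load, key=load.get)   # least-loaded node so far
        placement[chunk_id] = target
        load[target] += 1
    return placement, load

placement, load = place_chunks(10, ["node-a", "node-b", "node-c"])
print(load)  # chunk counts differ by at most one across nodes
```

A real system would also have to rebalance as files are deleted or nodes join and leave, which is the harder part the abstract alludes to.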
MapReduce is one of the most popular programming models for big data analysis in distributed and parallel computing environments, and is used for implementing parallel applications. With the growing development of the mobile Internet and cloud computing, issues related to big data have become a matter of concern in both industry and academia. There are several platforms on which users can develop their applications...
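As a toy illustration of the MapReduce programming model mentioned above (a local, single-process word-count sketch, not Hadoop's distributed implementation):

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Shuffle + reduce: sum the emitted counts per word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["big data big analysis", "data analysis"]
print(reduce_phase(map_phase(docs)))  # {'big': 2, 'data': 2, 'analysis': 2}
```

In a real MapReduce platform the map and reduce functions run on many nodes, with the framework handling the shuffle, fault tolerance and data locality.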
Hadoop HDFS is an open-source project from the Apache Software Foundation for scalable, distributed computing and data storage. HDFS has become a critical component of today's cloud computing environment, with a wide range of applications built on top of it. However, the initial design of HDFS introduced a single point of failure: HDFS contains only one active name node, and if this name node experiences...
Information being a key asset in today's world, its growth rate and volume call for big data analysis, which is a key challenge. Collecting and retaining such data results in massive growth, which creates the need for infrastructure expansion, replacement and proper disposition of existing data. This important data should not be scrapped or forgotten; instead it should be massaged and...
The computing paradigm of "HPC in the Cloud" has gained surging interest in recent years due to its merits of cost-efficiency, flexibility, and scalability. Clouds are designed on top of distributed file systems such as the Google File System (GFS). The capability to run HPC applications on top of data-intensive file systems is a critical catalyst for promoting clouds for HPC. However, the...
Load balancing is necessary in distributed file systems, especially in high-concurrency scenarios. Traditional load balancing algorithms mainly focus on the static configuration of servers. In this paper, we present an improved dynamic load balancing algorithm for distributed file systems. It first collects the real-time performance and hardware configurations of the slave servers to achieve dynamic load balancing...
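The general idea of choosing a slave server by a weighted score of reported real-time metrics can be sketched as follows; the weights and metric names are illustrative assumptions, not the paper's formula:

```python
def load_score(metrics, w_cpu=0.5, w_mem=0.3, w_io=0.2):
    """Lower score = less loaded. Weights are illustrative assumptions."""
    return (w_cpu * metrics["cpu"] +
            w_mem * metrics["mem"] +
            w_io * metrics["io"])

def pick_server(servers):
    """Return the slave server with the lowest weighted load score."""
    return min(servers, key=lambda s: load_score(s["metrics"]))

servers = [
    {"name": "slave-1", "metrics": {"cpu": 0.9, "mem": 0.5, "io": 0.4}},
    {"name": "slave-2", "metrics": {"cpu": 0.2, "mem": 0.3, "io": 0.6}},
]
print(pick_server(servers)["name"])  # slave-2
```

A dynamic algorithm would refresh these metrics periodically, which is what distinguishes it from balancing on static server configuration alone.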
The I/O path model (IOPm) is a graphical representation of the architecture of parallel file systems and the machines they are deployed on. With the help of IOPm, file system and machine configurations can be quickly analyzed and distinguished from each other. Contrary to typical representations of machine and file system architecture, the model visualizes the data or metadata path of client access...
As PC clusters increase in popularity and size, message passing between nodes has become an important issue due to the high failure rate of the network. A file access in a cluster file system often consists of several sub-operations, each of which includes one or more network transmissions. Any network failure will render the file system service unavailable. In this paper, we describe a highly reliable message-passing...
Improving energy efficiency has increasingly become a major consideration in server and data center design, especially for power-hungry ones. Numerous studies have provided new methods or proposals for building "green" servers and data centers, but this paper concentrates on how different configuration schemes in a server affect practical performance and...
Parallelisation, serial optimisation, compiler tuning, and many other techniques are used to optimise and improve the performance scaling of parallel programs. One area that is frequently not optimised is file I/O, because it is often not considered key to the performance of a program, and because it is traditionally difficult to optimise and very machine-specific. However, in the...
At a recent meeting of monitoring experts from nine large supercomputing centers, there was broad divergence of opinion on what monitoring in our environment actually is, what ought to be monitored, what technology should be used, etc. Broad consensus can be summarized in a couple of key points: • Data management is increasingly a problem. As a result, historical information is rarely kept, or,...
Efficient metadata management is critical for distributed file systems in cloud computing. In this paper we propose a new metadata management scheme that employs a master metadata server (MMDS) and a metadata look-up table server between the metadata servers (MDSs) and clients. The MMDS checks the state of the MDSs for load balancing, thereby avoiding hot spots. The proposed scheme significantly reduces the network...
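The look-up-table idea can be sketched as a client routing metadata requests through a table mapping path prefixes to metadata servers, rather than querying every MDS; the table contents and server names here are hypothetical:

```python
# Hypothetical look-up table mapping directory prefixes to metadata servers.
LOOKUP_TABLE = {
    "/home": "mds-1",
    "/data": "mds-2",
    "/tmp":  "mds-3",
}

def route_metadata_request(path, table=LOOKUP_TABLE, default="mds-1"):
    """Route a client's metadata request via the longest matching prefix."""
    best = max((p for p in table if path.startswith(p)),
               key=len, default=None)
    return table[best] if best is not None else default

print(route_metadata_request("/data/logs/app.log"))  # mds-2
```

In the proposed scheme the master server would additionally update such a table based on observed MDS load, which is how hot spots are avoided.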
During the past few years, large, reliable and efficient storage systems have become increasingly important in enterprise environments. Additional requirements for these environments include low installation, maintenance and administration costs. In this paper we propose a hash-based storage approach combined with block-level operating system semantics. The experimental evaluation confirms that the...
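One common hash-based storage technique is content addressing, where a block is keyed by the hash of its contents so identical blocks are stored only once; this is a sketch of that general approach, not necessarily the paper's design:

```python
import hashlib

class HashStore:
    """Content-addressed block store: blocks are keyed by their SHA-256,
    so identical blocks are stored only once (deduplication)."""

    def __init__(self):
        self.blocks = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data       # idempotent: same data, same key
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]

store = HashStore()
k1 = store.put(b"hello world")
k2 = store.put(b"hello world")        # duplicate block, stored once
print(k1 == k2, len(store.blocks))    # True 1
```

Hashing also spreads keys uniformly, which helps distribute blocks evenly across storage targets without central bookkeeping.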