Data replication is a key technique to achieve data availability, reliability, and optimized performance in distributed storage systems and data centers. In recent years, with the emergence of new storage devices, heterogeneous object-based storage systems, such as a storage system with the co-existence of hard disk drives and solid state drives, have become increasingly attractive as they combine merits...
Elastic distributed storage systems have been increasingly studied in recent years because power consumption has become a major problem in data centers. Much progress has been made in improving the agility of resizing small- and large-scale distributed storage systems. However, most of these studies focus on metadata based distributed storage systems. On the other hand, emerging consistent hashing...
High-performance computing (HPC) systems face increasingly critical metadata management challenges, especially in the approaching exascale era. These challenges arise not only from exploding metadata volumes but also from increasingly diverse metadata, which contains data provenance and user-defined attributes in addition to traditional POSIX metadata. This "rich" metadata is critical to...
Distributed storage systems play an increasingly critical role in data centers to meet the ever-increasing data growth demand. Heterogeneous storage systems, with the coexistence of hard disk drives (HDDs) and solid state drives (SSDs), can be an attractive distributed storage solution due to their balanced performance, large capacity, and economic cost. The consistent hashing distribution algorithm that...
The data scale in many data centers is growing explosively with emerging applications and usages of big data technologies. Data distribution is a key issue in large-scale distributed storage systems to place petabytes of data or even beyond, among tens or hundreds of thousands of storage devices. In the meantime, heterogeneous storage systems, such as those having devices with hard disk drives (HDDs)...
Fast growing "Big Data" demands present new challenges to the traditional distributed storage system solutions. In order to support cloud-scale data centers, new types of distributed storage systems are emerging. They are designed to scale to thousands of nodes, maintain petabytes of data and be highly reliable. The support for virtual machines is also becoming essential as it is one of...
HPC platforms are capable of generating huge amounts of metadata about different entities including jobs, users, and files. Simple metadata, which describes the attributes of these entities (e.g., file size, name, and permissions mode), has been well recorded and used in current systems. However, only a limited amount of rich metadata, which records not only the attributes of entities but also relationships...
Performance of reading scientific data from a parallel file system depends on the organization of data on physical storage devices. Data is often immutable after producers of data, such as large-scale simulations, experiments, and observations, write the data to the parallel file system. As a result, read performance of data analysis tasks is often slow when the read pattern does not conform with...
Parallel and distributed file systems are widely used to provide high throughput in high-performance computing and Cloud computing systems. To increase the parallelism, I/O requests are partitioned into multiple sub-requests (or 'flows') and distributed across different data nodes. Therefore the completion time of an I/O request depends on the slowest sub-request and the performance of file systems...
The computing paradigm of "HPC in the Cloud" has attracted surging interest in recent years, due to its merits of cost-efficiency, flexibility, and scalability. The Cloud is designed on top of distributed file systems such as the Google File System (GFS). The capability of running HPC applications on top of data-intensive file systems is a critical catalyst in promoting Clouds for HPC. However, the...
Many high-end computing applications in critical areas of science and technology are becoming more and more data intensive. These applications transfer large amounts of data from storage nodes to compute nodes for processing, which is costly and bandwidth consuming. The data movement often dominates the applications' run time. Active storage provides a promising solution for these applications by...
Parallel applications can benefit greatly from massive computational capability, but their performance suffers from large latency of I/O accesses. The poor I/O performance has been attributed as a critical cause of the low sustained performance of parallel computing systems. In this study, we propose a data layout-aware optimization strategy to promote a better integration of the parallel I/O middleware...