Elastic distributed storage systems have been increasingly studied in recent years because power consumption has become a major problem in data centers. Much progress has been made in improving the agility of resizing small- and large-scale distributed storage systems. However, most of these studies focus on metadata-based distributed storage systems. On the other hand, emerging consistent hashing...
Large-scale parallel file systems are of prime importance today. However, despite their importance, their failure-recovery capability is much less studied than that of local storage systems. Recent studies on local storage systems have exposed various vulnerabilities that could lead to data loss under failure events, which raises concern for parallel file systems built on top of them. This paper...
As the largest country code Top Level Domain (ccTLD) name service, .CN receives billions of queries every day. Under the threat of Distributed Denial-of-Service (DDoS) attacks, an effective client classification mechanism is especially important for such a busy ccTLD service. In this paper, by analyzing the query log of the .CN name service, we propose a novel client classification method based on client...
Distributed storage systems play an increasingly critical role in data centers to meet ever-increasing data growth. Heterogeneous storage systems, in which hard disk drives (HDDs) and solid-state drives (SSDs) coexist, can be an attractive distributed storage solution due to their balanced performance, large capacity, and low cost. The consistent hashing distribution algorithm that...
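The consistent hashing mentioned above can weight nodes so that a faster or larger device takes a proportionally bigger share of the key space. A minimal sketch (not the paper's actual algorithm; node names, point count, and the MD5 choice are illustrative assumptions):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent hashing ring with weighted virtual nodes,
    so a faster node (e.g. an SSD) can own more of the key space."""

    def __init__(self):
        self._ring = []  # sorted list of (hash, node) points on the ring

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node, weight=1):
        # More virtual points -> proportionally more keys map here.
        for i in range(100 * weight):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def lookup(self, key):
        # A key belongs to the first virtual point at or after its hash.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing()
ring.add_node("hdd-1", weight=1)
ring.add_node("ssd-1", weight=4)  # SSD gets roughly 4x the key space
owner = ring.lookup("object-42")
```

Because only the virtual points of an added or removed node change owners, resizing moves a minimal fraction of the data, which is why such systems need no central metadata directory.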
Fast growing "Big Data" demands present new challenges to the traditional distributed storage system solutions. In order to support cloud-scale data centers, new types of distributed storage systems are emerging. They are designed to scale to thousands of nodes, maintain petabytes of data and be highly reliable. The support for virtual machines is also becoming essential as it is one of...
Property graphs are a promising data model for rich metadata management in high-performance computing (HPC) systems because of their ability to represent not only metadata attributes but also the relationships between them. A property graph can be used to record the relationships between users, jobs, and data, for example, with unique annotations for each entity. This high-volume, power-law distributed...
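The user/job/data relationships described above can be pictured with a tiny in-memory property graph; the entity names and properties below are made-up examples, not the paper's schema:

```python
class PropertyGraph:
    """Toy property graph: nodes and edges both carry arbitrary
    key/value properties (illustrative sketch only)."""

    def __init__(self):
        self.nodes = {}   # node id -> properties dict
        self.edges = []   # (src, label, dst, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst, **props):
        self.edges.append((src, label, dst, props))

    def neighbors(self, node_id, label=None):
        # Follow outgoing edges, optionally filtered by edge label.
        return [d for s, l, d, _ in self.edges
                if s == node_id and (label is None or l == label)]

g = PropertyGraph()
g.add_node("alice", kind="user")
g.add_node("job-7", kind="job", walltime="2h")
g.add_node("out.h5", kind="file", size=123456)
g.add_edge("alice", "submitted", "job-7")
g.add_edge("job-7", "wrote", "out.h5")
files = g.neighbors("job-7", label="wrote")
```

Provenance queries ("which files did Alice's jobs write?") then become short edge traversals instead of joins over flat metadata tables.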
Most mass data processing applications today need long-running, continuous, and uninterrupted data access. Parallel/distributed file systems often use multiple metadata servers to manage the global namespace and provide a reliability guarantee. With the rapid growth of data volume and system scale, the probability of hardware or software failure keeps increasing, which easily leads to multiple...
Hadoop is a popular open-source framework that allows distributed analysis of large datasets using the MapReduce programming model. A distributed file system, HDFS, is implemented to provide high-throughput access to datasets. HDFS can provide a high-performance metadata service but has two disadvantages. First, when the metadata server stores metadata on persistent devices, it is restricted to read and...
Scientific applications from many problem domains produce and/or access large volumes of data. To support these applications, designers of high-end computing (HEC) systems have greatly increased the capacity of storage systems in recent years. However, because hard disk drives (HDDs) are still the dominant storage device used in HEC storage systems, and because HDD performance has not improved as...
MPI collective I/O is a widely used I/O method that helps data-intensive scientific applications gain better I/O performance. However, it has been observed that existing collective I/O strategies do not perform well due to the access contention problem. Existing collective I/O optimization strategies mainly focus on the I/O phase efficiency and ignore the shuffle cost that may limit the potential...
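Collective I/O implementations typically reassign each process's scattered pieces to aggregator processes that each own a contiguous "file domain". A sketch of that reassignment step, under assumed equal-sized domains (the function name and numbers are illustrative, not the paper's algorithm):

```python
def assign_to_aggregators(pieces, num_aggregators, file_size):
    """Two-phase collective I/O sketch: split the file into equal
    file domains, one per aggregator, and ship each (offset, length)
    piece to the aggregator owning its byte range, so each aggregator
    issues one large contiguous access instead of many small ones."""
    domain = file_size // num_aggregators
    buckets = {a: [] for a in range(num_aggregators)}
    for offset, length in pieces:
        a = min(offset // domain, num_aggregators - 1)
        buckets[a].append((offset, length))
    return buckets

# Interleaved pieces from several processes, reassigned to 2 aggregators.
buckets = assign_to_aggregators([(10, 5), (60, 5), (30, 5), (80, 5)],
                                num_aggregators=2, file_size=100)
```

The data movement between processes and aggregators implied by this reassignment is the shuffle cost that, as the abstract notes, existing optimizations tend to ignore.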
Parallel and distributed file systems are widely used to provide high throughput in high-performance computing and Cloud computing systems. To increase the parallelism, I/O requests are partitioned into multiple sub-requests (or 'flows') and distributed across different data nodes. Therefore the completion time of an I/O request depends on the slowest sub-request and the performance of file systems...
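The dependence on the slowest sub-request can be stated in a couple of lines; the flow times below are illustrative numbers, not measurements:

```python
def request_completion_time(flow_times):
    """A request partitioned into parallel sub-requests ('flows')
    completes only when the slowest flow does."""
    return max(flow_times)

# Four data nodes serve one striped request; the straggler dominates:
flows = [0.12, 0.10, 0.11, 0.45]   # per-flow service times in seconds
t = request_completion_time(flows)  # 0.45 s, not the ~0.20 s average
```

This max-of-flows behavior is why a single slow or contended data node degrades the whole request, motivating straggler-aware scheduling.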
Server evaluation in an enterprise environment is important but technically demanding. An application benchmark based on a tailored enterprise application system is described in this paper. The real workload set gives better production relevance and a stronger focus on customer requirements, leading to a more practical evaluation of enterprise servers. A detailed analysis is made of the design and implementation...
This paper examines the trend of replacing PCs with thin clients. The features and application demands of thin clients are investigated. A comprehensive evaluation system is established, which takes into account factors including hardware, remote centralized management, peripheral compatibility, performance, reliability, virtualization platform compatibility, and energy consumption. Furthermore, we...
The computing paradigm of "HPC in the Cloud" has gained surging interest in recent years, due to its merits of cost-efficiency, flexibility, and scalability. Clouds are built on top of distributed file systems such as the Google File System (GFS). The capability of running HPC applications on top of data-intensive file systems is a critical catalyst in promoting Clouds for HPC. However, the...
Many high-end computing applications in critical areas of science and technology are becoming more and more data intensive. These applications transfer large amounts of data from storage nodes to compute nodes for processing, which is costly and bandwidth-consuming. The data movement often dominates the applications' run time. Active storage provides a promising solution for these applications by...
Parallel applications benefit considerably from the rapid advance of processor architectures and the available massive computational capability, but their performance suffers from the large latency of I/O accesses. Poor I/O performance has been identified as a critical cause of the low sustained performance of parallel systems. Collective I/O is widely considered a critical solution that exploits...
Shared-nothing and shared-disk have been the two most common storage architectures of parallel databases over the past two decades. Both types of systems have their own merits for different applications. However, little effort has been made to investigate the integration of these two architectures and to exploit their merits together. In this paper, we propose a novel hybrid storage architecture for large-scale...
Parallel applications can benefit greatly from massive computational capability, but their performance suffers from the large latency of I/O accesses. Poor I/O performance has been identified as a critical cause of the low sustained performance of parallel computing systems. In this study, we propose a data layout-aware optimization strategy to promote a better integration of the parallel I/O middleware...