Recent server architectures embrace a common technology feature: on-chip parallelism via multi-core and CMT (Chip Multi Threading) technologies. However, they also significantly differ in a number of key aspects including clock speed, micro-architecture, cache hierarchy, and memory sub-system. Such differences may lead to different levels of application performance. This paper presents a performance...
Recent trends in high performance computing have renewed interest in the ability of platforms to sustain high message throughput rates. The continued growth in platform scale, combined with emerging application areas, is pushing platforms to support increasing message rates. Best-case message throughput has grown in previous hardware generations due to growing clock rates and software optimization...
Runtime irreproducibility complicates application performance evaluation on today's high performance computers. Performance can vary significantly between seemingly identical runs; this presents a challenge for benchmarking as well as for users trying to determine whether a change they made to their code is an actual improvement. In order to gain a better understanding of this phenomenon, we...
Recent years have seen the development of many new parallel extensions to high level languages. However, there does not yet seem to have been a concentrated effort to quantify their performance or qualify their usability. Toward this end, we have used several parallel extensions to implement four of the high performance computing (HPC) Challenge benchmarks (FFT, HPL, RandomAccess, and STREAM) according...
Performance projections of high performance computing (HPC) applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid hardware vendors in the design of future systems, enable them to compare the application performance across different existing and future systems, and help HPC users with system procurement and application refinements. In this...
The Petascale Cray XT5 system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) shares a number of system and software features with its predecessor, the Cray XT4 system, including the quad-core AMD processor and a multi-core aware MPI library. We analyze performance of scalable scientific applications on the quad-core Cray XT4 system as part of the early system access...
The architectures which support modern supercomputing machinery are as diverse today as at any point during the last twenty years. The variety of processor core arrangements, threading strategies and the arrival of heterogeneous computation nodes are driving modern-day solutions to petaflop speeds. The increasing complexity of such systems, as well as codes written to take advantage of the new computational...
Java is a valuable and emerging alternative for the development of parallel applications, thanks to the availability of several Java message-passing libraries and its full multithreading support. The combination of both shared and distributed memory programming is an interesting option for parallel programming multi-core systems. However, the concerns about Java performance are hindering its adoption...
Job scheduling research for parallel systems has been widely exploited in recent years, especially in centers with high performance computing facilities. In the recent past we presented the eNANOS execution environment which is based on a coordinated architecture, from the CPU allocation to the grid scheduling, providing a good low level support to perform an efficient high level scheduling. In this...
The unprecedented parallelism of new supercomputing platforms poses tremendous challenges to achieving scalable performance for I/O intensive applications. Performance assessments using traditional I/O system and component benchmarks are difficult to relate back to application I/O requirements. However, the complexity of full applications motivates development of simpler synthetic I/O benchmarks as...
Due to its high performance, optical fiber is more and more popular in the field of high performance computing. In this field, Message-Passing Interface (MPI) has been the most popular standard for parallel applications. MPICH2 is the most famous implementation of MPI. Optical fiber is a circuit-switched network, which needs to reserve an independent path before communication. Although the huge bandwidth...
In high performance computing environments, the metadata servers of a distributed file system become critical to overall system performance. A memory-based metadata server approach is proposed instead of the disk-based approach. We present a metadata management system with matrix organization, non-overhead reliable mechanism and static scalability method, which is designed to efficiently utilize...
In this paper we assess the use of high performance computing in design space exploration of a complex highly parameterized very long instruction word based system-on-a-chip platform. Experiments show that the conventional belief of linear decrease in exploration time as the number of available processors increases is discredited starting from a relatively low number of processors mainly due to communication...
Efficient intra-node shared memory communication is important for high performance computing (HPC), especially with the emergence of multi-core architectures. As clusters continue to grow in size and complexity, the use of virtual machine (VM) technologies has been suggested to ease the increasing number of management issues. As demonstrated by earlier research, shared memory communication must be...
IBM's Blue Gene/P, the second generation of the Blue Gene supercomputer, is designed with a Universal Performance Counter (UPC) Unit at each node capable of monitoring 256 events concurrently, unlike many microprocessors that provide only a few performance counters. In this paper we demonstrate the efficacy of the interface library that we have developed, taking advantage of the UPC unit, enabling users...
The emergence of multi-core processors has made MPI intra-node communication a critical component in high performance computing. In this paper, we use a three-step methodology to design an efficient MPI intra-node communication scheme from two popular approaches: shared memory and OS kernel-assisted direct copy. We use an Intel quad-core cluster for our study. We first run micro-benchmarks to analyze...
The quantum chemistry package General Atomic and Molecular Electronic Structure System (GAMESS) is employed in the first-principles modeling of complex molecular systems by using the density functional theory (DFT) as well as a number of other post-Hartree-Fock (HF) methods. Both DFT and time-dependent DFT (TDDFT) are of particular interest to the Department of Defense (DoD) Computational Biology,...
Multi-core technology produces a new scenario for communicating processes in an MPI cluster environment and consequently the involved trade-offs need to be uncovered. This motivation guided our research and led to a new approach for setting up more efficient clusters built with commodities. Thus, alternatively to the utilization of non-commodity interconnects such as Myrinet and Infiniband, we present...
The Java language has emerged as a dominant language that could eventually replace C++, due to it being object-oriented, architecture neutral, multi-threaded etc. and its support for applets. But Java is believed to be "too slow" for scientific computing. Many high-performance PCs such as those based on the Pentium II have been introduced, and these have great potential for high-performance...