Search results

chapter

Performance Comparison of Four-Socket Server Architecture on HPC Workload

H. Kasim, V. March, S. See

2009 International Conference on Computational Science and Engineering > 1 > 306 - 311

2009 International Conference on Computational Science and Engineering (CSE)

Recent server architectures embrace a common technology feature: on-chip parallelism via multi-core and CMT (Chip Multi Threading) technologies. However, they also differ significantly in a number of key aspects, including clock speed, micro-architecture, cache hierarchy, and memory sub-system. Such differences may lead to different levels of application performance. This paper presents a performance...

chapter

An application based MPI message throughput benchmark

B.W. Barrett, K.S. Hemmert

2009 IEEE International Conference on Cluster Computing and Workshops > 1 - 8

2009 IEEE International Conference on Cluster Computing and Workshops (CLUSTER)

Recent trends in high performance computing have renewed interest in the ability of platforms to sustain high message throughput rates. The continued growth in platform scale, combined with emerging application areas, is pushing platforms to support increasing message rates. Best-case message throughput has grown in previous hardware generations due to growing clock rates and software optimization...

chapter

Measuring and Understanding Variation in Benchmark Performance

N J Wright, S Smallen, C M Olschanowsky, J Hayes, et al.

2009 DoD High Performance Computing Modernization Program Users Group Conference > 438 - 443

DoD High Performance Computing Modernization Program Users Group Conference (HPCMP-UGC 2009)

Runtime irreproducibility complicates application performance evaluation on today's high performance computers. Performance can vary significantly between seemingly identical runs; this presents a challenge both to benchmarking and to users trying to determine whether a change they made to their code is an actual improvement. In order to gain a better understanding of this phenomenon, we...

chapter

Evaluating Parallel Extensions to High Level Languages Using the HPC Challenge Benchmarks

Laura Humphrey, Brian Guilfoos, Harrison Smith, Andrew Warnock, et al.

2009 DoD High Performance Computing Modernization Program Users Group Conference > 410 - 415

DoD High Performance Computing Modernization Program Users Group Conference (HPCMP-UGC 2009)

Recent years have seen the development of many new parallel extensions to high level languages. However, there does not yet seem to have been a concentrated effort to quantify their performance or qualify their usability. Toward this end, we have used several parallel extensions to implement four of the high performance computing (HPC) Challenge benchmarks (FFT, HPL, RandomAccess, and STREAM) according...

chapter

Performance projection of HPC applications using SPEC CFP2006 benchmarks

S. Sharkawi, D. DeSota, R. Panda, R. Indukuru, et al.

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 12

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Performance projections of high performance computing (HPC) applications onto various hardware platforms are important for hardware vendors and HPC users. The projections aid hardware vendors in the design of future systems, enable them to compare the application performance across different existing and future systems, and help HPC users with system procurement and application refinements. In this...

chapter

Performance analysis and projections for Petascale applications on Cray XT series systems

S.R. Alam, R.F. Barrett, J.A. Kuehn, S.W. Poole

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 8

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The Petascale Cray XT5 system at the Oak Ridge National Laboratory (ORNL) Leadership Computing Facility (LCF) shares a number of system and software features with its predecessor, the Cray XT4 system including the quad-core AMD processor and a multi-core aware MPI library. We analyze performance of scalable scientific applications on the quad-core Cray XT4 system as part of the early system access...

chapter

Predictive Simulation of HPC Applications

S.D. Hammond, J.A. Smith, G.R. Mudalige, S.A. Jarvis

2009 International Conference on Advanced Information Networking and Applications > 33 - 40

2009 International Conference on Advanced Information Networking and Applications (AINA 2009)

The architectures which support modern supercomputing machinery are as diverse today as at any point during the last twenty years. The variety of processor core arrangements, threading strategies and the arrival of heterogeneous computation nodes are driving modern-day solutions to petaflop speeds. The increasing complexity of such systems, as well as codes written to take advantage of the new computational...

chapter

NPB-MPJ: NAS Parallel Benchmarks Implementation for Message-Passing in Java

D.A. Mallon, G.L. Taboada, J. Touriño, R. Doallo

2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing > 181 - 190

2009 17th Euromicro International Conference on Parallel, Distributed and Network-based Processing

Java is a valuable and emerging alternative for the development of parallel applications, thanks to the availability of several Java message-passing libraries and its full multithreading support. The combination of shared and distributed memory programming is an interesting option for parallel programming on multi-core systems. However, concerns about Java performance are hindering its adoption...

chapter

Coordinated Co-allocation Scheduling on Heterogeneous Clusters of SMPs

I. Rodero, J. Corbalan

2008 IEEE Fourth International Conference on eScience > 703 - 710

2008 IEEE Fourth International Conference on eScience

Job scheduling research for parallel systems has been widely explored in recent years, especially in centers with high performance computing facilities. In the recent past we presented the eNANOS execution environment, which is based on a coordinated architecture spanning everything from CPU allocation to grid scheduling and provides good low-level support for efficient high-level scheduling. In this...

chapter

Characterizing and predicting the I/O performance of HPC applications using a parameterized synthetic benchmark

Hongzhang Shan, K. Antypas, J. Shalf

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

The unprecedented parallelism of new supercomputing platforms poses tremendous challenges to achieving scalable performance for I/O intensive applications. Performance assessments using traditional I/O system and component benchmarks are difficult to relate back to application I/O requirements. However, the complexity of full applications motivates development of simpler synthetic I/O benchmarks as...

chapter

Implementation and Optimization of MPICH2 Multicast on Optical Fiber Network

Xiaokun Liu, Minglu Li, Xinhua Lin, Xiaozheng Cheng

2008 IFIP International Conference on Network and Parallel Computing > 577 - 582

2008 IFIP International Conference on Network and Parallel Computing

Due to its high performance, optical fiber is increasingly popular in the field of high performance computing. In this field, the Message-Passing Interface (MPI) has been the most popular standard for parallel applications, and MPICH2 is its best-known implementation. Optical fiber is a circuit-switched network, which must reserve an independent path before communication. Although the huge bandwidth...

chapter

Memory Based Metadata Server for Cluster File Systems

Jing Xing, Jin Xiong, Jie Ma, Ninghui Sun

2008 Seventh International Conference on Grid and Cooperative Computing > 287 - 291

2008 Seventh International Conference on Grid and Cooperative Computing

In high performance computing environments, the metadata servers of distributed file systems are critical to overall system performance. A memory-based metadata server is proposed instead of the disk-based approach. We present a metadata management system with a matrix organization, an overhead-free reliability mechanism and a static scalability method, which is designed to efficiently utilize...

chapter

High Performance Computing for Embedded System Design: A Case Study

V. Catania, G. De Francisci Morales, A.G. Di Nuovo, M. Palesi, et al.

2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools > 656 - 659

2008 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools (DSD)

In this paper we assess the use of high performance computing in design space exploration of a complex, highly parameterized, very long instruction word (VLIW) based system-on-a-chip platform. Experiments show that the conventional belief that exploration time decreases linearly as the number of available processors increases no longer holds beyond a relatively low number of processors, mainly due to communication...

chapter

Efficient one-copy MPI shared memory communication in Virtual Machines

Wei Huang, M.J. Koop, D.K. Panda

2008 IEEE International Conference on Cluster Computing > 107 - 115

2008 IEEE International Conference on Cluster Computing (CLUSTER)

Efficient intra-node shared memory communication is important for high performance computing (HPC), especially with the emergence of multi-core architectures. As clusters continue to grow in size and complexity, the use of virtual machine (VM) technologies has been suggested to ease the increasing number of management issues. As demonstrated by earlier research, shared memory communication must be...

chapter

A Performance Counter Based Workload Characterization on Blue Gene/P

K. Ganesan, L. John, V. Salapura, J. Sexton

2008 37th International Conference on Parallel Processing > 330 - 337

2008 37th International Conference on Parallel Processing (ICPP)

IBM's Blue Gene/P, the second generation of the Blue Gene supercomputer, is designed with a Universal Performance Counter (UPC) unit at each node capable of monitoring 256 events concurrently, unlike many microprocessors, which provide only a few performance counters. In this paper we demonstrate the efficacy of the interface library that we have developed, taking advantage of the UPC unit, enabling users...

chapter

Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems

Lei Chai, Ping Lai, Hyun-Wook Jin, D.K. Panda

2008 37th International Conference on Parallel Processing > 222 - 229

2008 37th International Conference on Parallel Processing (ICPP)

The emergence of multi-core processors has made MPI intra-node communication a critical component in high performance computing. In this paper, we use a three-step methodology to design an efficient MPI intra-node communication scheme from two popular approaches: shared memory and OS kernel-assisted direct copy. We use an Intel quad-core cluster for our study. We first run micro-benchmarks to analyze...

chapter

Optimization and Parallelization of DFT and TDDFT in GAMESS on DoD HPC Machines

M.E. Lasinski, N.A. Romero, A.D. Yau, G. Kedziora, et al.

2008 DoD HPCMP Users Group Conference > 437 - 441

2008 DoD HPCMP Users Group Conference (DoD HPCMP UGC)

The quantum chemistry package General Atomic and Molecular Electronic Structure System (GAMESS) is employed in the first-principles modeling of complex molecular systems by using the density functional theory (DFT) as well as a number of other post-Hartree-Fock (HF) methods. Both DFT and time-dependent DFT (TDDFT) are of particular interest to the Department of Defense (DoD) Computational Biology,...

chapter

Building efficient multi-core clusters for high performance computing

L.C. Pinto, L. Tomazella, M. Dantas

2008 IEEE Symposium on Computers and Communications > 474 - 479

2008 IEEE Symposium on Computers and Communications (ISCC)

Multi-core technology produces a new scenario for communicating processes in an MPI cluster environment, and consequently the trade-offs involved need to be uncovered. This motivation guided our research and led to a new approach for setting up more efficient clusters built from commodity components. Thus, as an alternative to non-commodity interconnects such as Myrinet and InfiniBand, we present...

chapter

Parallel PIC code using Java on PC cluster

DongSheng Cai, Quanming Lu

Proceedings Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region > 1 > 495 - 500 vol.1

Proceedings Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region

The Java language has emerged as a dominant language that could eventually replace C++, owing to its being object-oriented, architecture-neutral and multi-threaded, and to its support for applets. But Java is believed to be "too slow" for scientific computing. Many high-performance PCs, such as those based on the Pentium II, have been introduced, and these have great potential for high-performance...

INFONA - science communication portal
