Supercomputing, 2007. SC '07. Proceedings of the 2007 ACM/IEEE Conference on

chapter

Programming bits and atoms

Neil Gershenfeld

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

No abstract available

chapter

First-principles calculations of large-scale semiconductor systems on the Earth Simulator

Takahisa Ohno, Takenori Yamamoto, Tatsunobu Kokubo, Akira Azami, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 6

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

First-principles simulations of large-scale semiconductor systems using the PHASE code on the Earth Simulator (ES) demonstrate high performance with respect to the theoretical peak performance. PHASE, designed for vector-parallel systems like the ES, demonstrates excellent parallel efficiency. We simulated an arsenic donor in silicon using unit cells of up to 8,000 atoms. A sustained peak performance of...

chapter

WRF nature run

John Michalakes, Josh Hacker, Richard Loft, Michael O. McCracken, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 6

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

The Weather Research and Forecast (WRF) model is a limited-area model of the atmosphere for mesoscale research and operational numerical weather prediction (NWP). A petascale problem is a WRF nature run that provides very high-resolution "truth" against which more coarse simulations or perturbation runs may be compared for purposes of studying predictability, stochastic parameterization,...

chapter

A preliminary investigation of a neocortex model implementation on the Cray XD1

Kenneth L. Rice, Christopher N. Vutsinas, Tarek M. Taha

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 8

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

In this paper we study the acceleration of a new class of cognitive processing applications based on the structure of the neocortex. Specifically we examine the speedup of a visual cortex model for image recognition. We propose techniques to accelerate the application on general purpose processors and on reconfigurable logic. We present implementations of our approach on a Cray XD1 and compare the...

chapter

Bounding energy consumption in large-scale MPI programs

Barry Rountree, David K. Lowenthal, Shelby Funk, Vincent W. Freeh, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 9

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Power is now a first-order design constraint in large-scale parallel computing. Used carefully, dynamic voltage scaling can execute parts of a program at a slower CPU speed to achieve energy savings with a relatively small (possibly zero) time delay. However, the problem of when to change frequencies in order to optimize energy savings is NP-complete, which has led to many heuristic energy-saving...

chapter

Anomaly detection and diagnosis in grid environments

Lingyun Yang, Chuang Liu, Jennifer M. Schopf, Ian Foster

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 9

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Identifying and diagnosing anomalies in application behavior is critical to delivering reliable application-level performance. In this paper we introduce a strategy to detect anomalies and diagnose the possible reasons behind them. Our approach extends the traditional window-based strategy by using signal-processing techniques to filter out recurring, background fluctuations in resource behavior....

chapter

Integrating parallel file systems with object-based storage devices

Ananth Devulapalli, Dennis Dalessandro, Pete Wyckoff, Nawab Ali, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

As storage systems evolve, the block-based design of today's disks is becoming inadequate. As an alternative, object-based storage devices (OSDs) offer a view where the disk manages data layout and keeps track of various attributes about data objects. By moving functionality that is traditionally the responsibility of the host OS to the disk, it is possible to improve overall performance and simplify...

chapter

Multi-threading and one-sided communication in parallel LU factorization

Parry Husbands, Katherine Yelick

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Dense LU factorization has a high ratio of computation to communication and, as evidenced by the High Performance Linpack (HPL) benchmark, this property makes it scale well on most parallel machines. Nevertheless, the standard algorithm for this problem has non-trivial dependence patterns which limit parallelism, and local computations require large matrices in order to achieve good single processor...

chapter

Variable latency caches for nanoscale processor

Serkan Ozdemir, Arindam Mallik, Ja Chun Ku, Gokhan Memik, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Variability is one of the important issues in nanoscale processors. Due to the increasing importance of interconnect structures in submicron technologies, the physical location and phenomena such as coupling have an increasing impact on the latency of operations. Therefore, the traditional view of rigid access latencies to components will result in suboptimal architectures. In this paper, we devise a cache...

chapter

Evaluating NIC hardware requirements to achieve high message rate PGAS support on multi-core processors

Keith D. Underwood, Michael J. Levenhagen, Ron Brightwell

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Partitioned global address space (PGAS) programming models have been identified as one of the few viable approaches for dealing with emerging many-core systems. These models tend to generate many small messages, which requires specific support from the network interface hardware to enable efficient execution. In the past, Cray included E-registers on the Cray T3E to support the SHMEM API; however,...

chapter

Application development on hybrid systems

Roger D. Chamberlain, Mark A. Franklin, Eric J. Tyson, Jeremy Buhler, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Hybrid systems consisting of a multitude of different computing device types are interesting targets for high-performance applications. Chip multiprocessors, FPGAs, DSPs, and GPUs can be readily put together into a hybrid system; however, it is not at all clear that one can effectively deploy applications on such a system. Coordinating multiple languages, especially very different languages like hardware...

chapter

A job scheduling framework for large computing farms

Gabriele Capannini, Ranieri Baraglia, Diego Puppin, Laura Ricci, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

In this paper, we propose a new method, called Convergent Scheduling, for scheduling a continuous stream of batch jobs on the machines of large-scale computing farms. This method exploits a set of heuristics that guide the scheduler in making decisions. Each heuristic manages a specific problem constraint and contributes to a value that measures the degree of matching between a job and...

chapter

Implementation and performance analysis of non-blocking collective operations for MPI

Torsten Hoefler, Andrew Lumsdaine, Wolfgang Rehm

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Collective operations and non-blocking point-to-point operations have always been part of MPI. Although non-blocking collective operations are an obvious extension to MPI, there have been no comprehensive studies of this functionality. In this paper we present LibNBC, a portable high-performance library for implementing non-blocking collective MPI communication operations. LibNBC provides non-blocking...

chapter

A 281 Tflops calculation for X-ray protein structure analysis with special-purpose computers MDGRAPE-3

Yousuke Ohno, Eiji Nishibori, Tetsu Narumi, Takahiro Koishi, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

We have achieved a sustained calculation speed of 281 Tflops for the optimization of the 3-D structures of proteins from X-ray experimental data by the Genetic Algorithm - Direct Space (GA-DS) method. In this calculation we used MDGRAPE-3, a special-purpose computer for molecular simulations, with a peak performance of 752 Tflops. In the GA-DS method, a set of selected parameters which define...

chapter

Evaluation of active storage strategies for the Lustre parallel file system

Juan Piernas, Jarek Nieplocha, Evan J. Felix

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Active Storage provides an opportunity for reducing the amount of data movement between storage and compute nodes of a parallel filesystem such as Lustre and PVFS. It allows certain types of data processing operations to be performed directly on the storage nodes of modern parallel filesystems, near the data they manage. This is possible by exploiting the underutilized processor and memory resources...

chapter

PNMPI tools: a whole lot greater than the sum of their parts

Martin Schulz, Bronis R. de Supinski

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 10

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

PNMPI extends the PMPI profiling interface to support multiple concurrent PMPI-based tools by enabling users to assemble tool stacks. We extend this basic concept to include new services for tool interoperability and to switch between tool stacks dynamically. This allows PNMPI to support modules that virtualize MPI execution environments within an MPI job or that restrict the application of existing,...

chapter

A user-level secure grid file system

Ming Zhao, Renato J. Figueiredo

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 11

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

A grid-wide distributed file system provides convenient data access interfaces that facilitate fine-grained cross-domain data sharing and collaboration. However, existing widely-adopted distributed file systems do not meet the security requirements for grid systems. This paper presents a Secure Grid File System (SGFS) which supports GSI-based authentication and access control, end-to-end message privacy,...

chapter

Using MPI file caching to improve parallel write performance for large-scale scientific applications

Wei-keng Liao, Avery Ching, Kenin Coloma, Arifa Nisar, more

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 11

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Typical large-scale scientific applications periodically write checkpoint files to save the computational state throughout execution. Existing parallel file systems improve such write-only I/O patterns through the use of client-side file caching and write-behind strategies. In distributed environments where files are rarely accessed by more than one client concurrently, file caching has achieved significant...

chapter

GRAPE-DR: 2-Pflops massively-parallel computer with 512-core, 512-Gflops processor chips for scientific computing

Junichiro Makino, Kei Hiraki, Mary Inaba

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 11

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

We describe the GRAPE-DR (Greatly Reduced Array of Processor Elements with Data Reduction) system, which will consist of 4096 processor chips, each with 512 cores operating at a clock frequency of 500 MHz. The peak speed of a processor chip is 512 Gflops (single precision) or 256 Gflops (double precision). The GRAPE-DR chip works as an attached processor to standard PCs. Currently, a PCI-X board with...

chapter

RobuSTore: a distributed storage architecture with robust and high performance

Huaxia Xia, Andrew A. Chien

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07) > 1 - 11

2007 SC - International conference for High Performance Computing, Networking, Storage and Analysis

Emerging large-scale scientific applications require access to large data objects with high and robust performance. We propose RobuSTore, a storage architecture that combines erasure codes and speculative access mechanisms for parallel write and read in distributed environments. The mechanisms can effectively aggregate the bandwidth from a large number of distributed disks and statistically tolerate...
