Search results

Items from 1 to 16 out of 16 results

chapter

Servet: A benchmark suite for autotuning on multicore clusters

Jorge Gonzalez-Dominguez, Guillermo L Taboada, Basilio B Fraguela, Maria J Martin, more

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) > 1 - 9

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The growing complexity in computer system hierarchies due to the increase in the number of cores per processor, levels of cache (some of them shared) and the number of processors per node, as well as the high-speed interconnects, demands the use of new optimization techniques and libraries that take advantage of their features. In this paper Servet, a suite of benchmarks focused on detecting a set...

chapter

An introductory exascale feasibility study for FFTs and multigrid

Hormozd Gahvari, William Gropp

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) > 1 - 9

2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

The coming decade is going to see a push towards exascale computing. Assuming gigahertz cores, this means exascale systems will have between 100 million and 1 billion of them to achieve this level of performance. At this scale, some important questions need to be answered on the applications end. What applications are feasible at this scale? What needs to be done to make them scalable? How does the...

chapter

Comparing Three Algorithms of Implementing ON/OFF Aggregation Model

Shuai Yuan, Gang Zhou, Yi Jin

2009 International Conference on Computational Intelligence and Software Engineering > 1 - 4

2009 International Conference on Computational Intelligence and Software Engineering

ON/OFF aggregation model is one of the efficient and accurate models for self-similar network traffic generation. In this paper we propose and compare three algorithms of implementing ON/OFF aggregation model, based on Cavium OCTEON CN3860 network processor, aimed to achieve high-bandwidth and real-time network traffic generation. The model is implemented in a multi-thread approach, in a token-bucket...

chapter

System-level Performance Verification of Multicore Systems-on-Chip

Jim Holt, Jaideep Dastidar, David Lindberg, John Pape, more

2009 10th International Workshop on Microprocessor Test and Verification > 83 - 87

2009 10th International Workshop on Microprocessor Test and Verification (MTV)

MCSoC are comprised of a rich set of processor cores, specialized hardware accelerators, and I/O interfaces. Focusing only on functional verification is risky because the motivation for building such systems in the first place is to achieve high levels of system throughput: a functionally correct MCSoC that does not exhibit sufficient performance will fail in the market. Furthermore, focusing performance...

chapter

Understanding PARSEC performance on contemporary CMPs

M. Bhadauria, V.M. Weaver, S.A. McKee

2009 IEEE International Symposium on Workload Characterization (IISWC) > 98 - 107

2009 IEEE International Symposium on Workload Characterization (IISWC)

PARSEC is a reference application suite used in industry and academia to assess new chip multiprocessor (CMP) designs. No investigation to date has profiled PARSEC on real hardware to better understand scaling properties and bottlenecks. This understanding is crucial in guiding future CMP designs for these kinds of emerging workloads. We use hardware performance counters, taking a systems-level approach...

chapter

GPUs for fast triggering and pattern matching at the CERN experiment NA62

G. Lamanna, G. Collazuol, M. Sozzi

2009 IEEE Nuclear Science Symposium Conference Record (NSS/MIC) > 195 - 198

2009 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC 2009)

In high energy physics experiment the trigger system is crucial to reduce the quantity of data recorded on tape and the acquisition bandwidth requirements. This is particularly true in rare decays experiments. The NA62 experiment aims at measuring the branching ratio of K⁺ → π⁺νν̄, predicted in the standard model (SM) at level of ~10⁻¹⁰. In this paper we describe the idea to use the commercial...

chapter

Improving Resource Availability by Relaxing Network Allocation Constraints on Blue Gene/P

N. Desai, D. Buntinas, D. Buettner, P. Balaji, more

2009 International Conference on Parallel Processing > 333 - 339

2009 International Conference on Parallel Processing (ICPP 2009)

High-end computing (HEC) systems have passed the petaflop barrier and continue to move toward the next frontier of exascale computing. As companies and research institutes continue to work toward architecting these enormous systems, it is becoming increasingly clear that these systems will utilize a significant amount of shared hardware between processing units, including shared caches, memory management...

chapter

A new mechanism to deal with process variability in NoC links

C. Hernandez, F. Silla, V. Santonja, J. Duato

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 11

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Associated with the ever growing integration scale of VLSI technologies is the increase in process variability, which makes silicon devices to become less predictable. In the context of network-on-chip (NoC), this variability affects the maximum frequency that could be sustained by each wire of the link that interconnects two cores in a CMP system. Reducing the clock frequency so that all wires can...

chapter

Scaling communication-intensive applications on BlueGene/P using one-sided communication and overlap

R. Nishtala, P.H. Hargrove, D.O. Bonachea, K.A. Yelick

2009 IEEE International Symposium on Parallel & Distributed Processing > 1 - 12

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

In earlier work, we showed that the one-sided communication model found in PGAS languages (such as UPC) offers significant advantages in communication efficiency by decoupling data transfer from processor synchronization. We explore the use of the PGAS model on IBM BlueGene/P, an architecture that combines low-power, quad-core processors with extreme scalability. We demonstrate that the PGAS model,...

chapter

AIREN: A Novel Integration of On-Chip and Off-Chip FPGA Networks

A.G. Schmidt, W.V. Kritikos, R.R. Sharma, R. Sass

2009 17th IEEE Symposium on Field Programmable Custom Computing Machines > 271 - 274

2009 17th IEEE Symposium on Field Programmable Custom Computing Machines (FCCM 2009)

The Reconfigurable Computing Cluster Project at the University of North Carolina at Charlotte is investigating the feasibility of using FPGAs as compute nodes to scale to PetaFLOP computing. To date the Spirit cluster, consisting of 64 FPGAs, has been assembled for the initial analysis. One important question is how to efficiently communicate among compute cores on-chip as well as between nodes. Tight...

chapter

Custom Assignment of MPI Ranks for Parallel Multi-dimensional FFTs: Evaluation of BG/P versus BG/L

H. Jagode, J. Hein

2008 IEEE International Symposium on Parallel and Distributed Processing with Applications > 271 - 283

2008 IEEE International Symposium on Parallel and Distributed Processing with Applications

For many scientific applications, the fast Fourier transformation (FFT) of multi-dimensional data is the kernel that limits scalability on a large number of processors. This paper investigates the extent of performance improvements for a parallel three-dimensional FFT (3D-FFT) implementation when using customized MPI task mappings. The MPI tasks are mapped in a customized fashion from the two-dimensional...

chapter

A Hardware Filesystem Implementation for High-Speed Secondary Storage

A.A. Mendon, R. Sass

2008 International Conference on Reconfigurable Computing and FPGAs > 283 - 288

2008 International Conference on Reconfigurable Computing and FPGAs (ReConFig)

Platform FPGAs are capable of hosting entire Linux-based systems including standard peripherals, integrated network interface cards and even disk controllers on a single chip. Filesystems, however, are typically implemented in software as part of the operating system. This presents a challenge as some applications are very sensitive to file I/O latency and Platform FPGA processor cores are clocked...

chapter

Early evaluation of IBM BlueGene/P

S. Alam, R. Barrett, M. Bast, M.R. Fahey, more

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

BlueGene/P (BG/P) is the second generation BlueGene architecture from IBM, succeeding BlueGene/L (BG/L). BG/P is a system-on-a-chip (SoC) design that uses four PowerPC 450 cores operating at 850 MHz with a double precision, dual pipe floating point unit per core. These chips are connected with multiple interconnection networks including a 3-D torus, a global collective network, and a global barrier...

chapter

Active CoordinaTion (ACT) - toward effectively managing virtualized multicore clouds

M. Kesavan, A. Ranadive, A. Gavrilovska, K. Schwan

2008 IEEE International Conference on Cluster Computing > 23 - 32

2008 IEEE International Conference on Cluster Computing (CLUSTER)

A key benefit of utility data centers and cloud computing infrastructure is the level of consolidation they can offer to arbitrary guest applications, and the substantial saving in operational costs and resources that can be derived in the process. However, significant challenges remain before it becomes possible to effectively and at low cost manage virtualized systems, particularly in the face of...

chapter

Fundamental performance constraints in horizontal fusion of in-order cores

P. Salverda, C. Zilles

2008 IEEE 14th International Symposium on High Performance Computer Architecture > 252 - 263

2008 IEEE 14th International Symposium on High Performance Computer Architecture

A conceptually appealing approach to supporting a broad range of workloads is a system comprising many small cores that can be fused, on demand, into larger cores. We demonstrate that using in-order cores for this purpose, even under idealized assumptions about fusion-related overheads, would introduce fundamental obstacles to achieving good performance - obstacles that are not present when out-of-order...

chapter

Latency and bandwidth efficient communication through system customization for embedded multiprocessors

Chenjie Yu, P. Petrov

2008 45th ACM/IEEE Design Automation Conference > 766 - 771

2008 45th ACM/IEEE Design Automation Conference

We present a cross-layer customization methodology for latency and bandwidth efficient inter-core communication in embedded multiprocessors. The methodology integrates compiler, operating system, and hardware support to achieve a bandwidth efficient, snoop-free, and coherence cache miss-free shared memory communication between synchronized producer and consumers cores. A compiler-driven code transformation...

Filter options

Data set:
ieee
Keywords:
HARDWARE
BANDWIDTH
MAGNETIC CORES
Publication type:
book
