Search results

chapter

A 32nm Westmere-EX Xeon® enterprise processor

S Sawant, U Desai, G Shamanna, L Sharma, more

2011 IEEE International Solid-State Circuits Conference > 74 - 75

2011 IEEE International Solid-State Circuits Conference (ISSCC 2011)

The next-generation enterprise Xeon® processor consists of 10 Westmere 32nm cores and a shared inclusive L3 cache (LLC) integrated on a monolithic die, with link-based I/Os. This paper focuses on the innovations and circuit optimizations over the predecessor targeting idle power reduction, robust high-speed I/O links, and performance per watt improvements. The processor is implemented in 32nm CMOS...

chapter

Improving the efficiency of a hardware transactional memory on an NoC-based MPSoC

L Kunz, G Girao, F R Wagner

2011 Design, Automation & Test in Europe > 1 - 4

2011 Design, Automation & Test in Europe

Transactional Memories (TM) have attracted much interest as an alternative to lock-based synchronization in shared-memory multiprocessors. Considering the use of TM on an embedded, NoC-based MPSoC, this work evaluates a LogTM implementation. It is shown that the time an aborted transaction waits before restarting its execution (the backoff delay) can seriously affect the overall performance and energy...
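The abstract does not say which backoff policy the evaluated LogTM implementation uses. For orientation only, the sketch below shows one common generic policy, randomized exponential backoff, to make the trade-off concrete; the function name and the `base_cycles` / `max_cycles` parameters are hypothetical and not taken from the paper.

```python
import random

def backoff_delay(retries, base_cycles=32, max_cycles=4096, rng=random):
    """Hypothetical randomized exponential backoff for an aborted transaction.

    The wait window doubles after every abort (capped at max_cycles), and the
    actual delay is drawn uniformly from [0, window).
    """
    window = min(base_cycles * (2 ** retries), max_cycles)
    return rng.randrange(window)

# Delays drawn for the first five retries of one transaction (seeded for repeatability).
rng = random.Random(0)
print([backoff_delay(r, rng=rng) for r in range(5)])
```

A larger window lowers the chance that the same transactions collide again, but every cycle spent waiting is idle time on that core; tuning this balance is the kind of effect the abstract reports on performance and energy.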

article

Codesign for InfiniBand Clusters

Sayantan Sur, Sreeram Potluri, Krishna Chaitanya Kandalla, Hari Subramoni, more

Computer > 2011 > 44 > 11 > 31 - 36

Codesigning applications and communication libraries to leverage underlying network features is imperative for achieving optimal performance on modern computing clusters.

article

Data-Flow Microarchitecture for Wide Datapath RSFQ Processors: Design Study

M Dorojevets, C L Ayala, A K Kasperek

IEEE Transactions on Applied Superconductivity > 2011 > 21 > 3-1 > 787 - 791

Development of an efficient processor architecture with appropriate clocking mechanisms and datapath organization is one of the most challenging design issues for 32-/64-bit RSFQ processors. The cell-level design of a 32-bit RSFQ dual-lane integer processor has been developed at Stony Brook University in an effort to identify and study techniques capable of tolerating significant delay variations...

chapter

Simulation environment configuration for parallel simulation of multicore embedded systems

Dukyoung Yun, Jinwoo Kim, Sungchan Kim, Soonhoi Ha

2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC) > 345 - 350

2011 48th ACM/EDAC/IEEE Design Automation Conference (DAC)

The increasing complexity of multicore embedded systems makes careful construction of the virtual prototyping system crucial for shortening design turnaround time, given the growing demand for simulation time. Parallel simulation aims to accelerate the simulation speed by running component simulators concurrently. But extra overhead of communication and synchronization between simulators may overshadow the benefits...

chapter

Fast barrier synchronization with AWGR-based optical switch in high-performance and parallel computing

Xiaohui Ye, A Potter, Yawei Yin, R Proietti, more

2011 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference > 1 - 3

2011 Conference on Optical Fiber Communication - OFC 2011 Collocated National Fiber Optic Engineers Conference OFC/NFOEC 2011

We demonstrate speedup of barrier synchronization for parallel computing via wavelength parallelism of the optical switch using a k-ary tree to collect updates without incurring contention, and optical broadcast to distribute the notifications.
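The k-ary tree determines how many combining stages arrival updates need before the root knows that every node has reached the barrier. The sketch below assumes the usual array layout of a k-ary tree (node 0 is the root, node i's parent is (i - 1) // k) and only counts gather stages; it does not model the AWGR switch, its wavelength assignment, or the optical broadcast. All names are illustrative.

```python
def parent(i, k):
    """Parent of node i in a k-ary tree stored in array order (root = 0)."""
    return None if i == 0 else (i - 1) // k

def children(i, k, n):
    """Children of node i that exist among nodes 0..n-1."""
    return [c for c in range(k * i + 1, k * i + k + 1) if c < n]

def gather_stages(n, k):
    """Stages needed for updates from all n nodes to reach the root.

    In array order the last node sits on the deepest level, so its depth
    equals the number of combining stages.
    """
    depth, i = 0, n - 1
    while i != 0:
        i = parent(i, k)
        depth += 1
    return depth

# Stages for 64 nodes with different tree arities.
for k in (2, 4, 8):
    print(k, gather_stages(64, k))
```

A wider tree needs fewer stages but asks each parent to absorb more simultaneous updates; the abstract credits the switch's wavelength parallelism with collecting those updates without contention.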

chapter

Estimating overheads of OpenMP directives

Sareh Doroodian, Nima Ghaemian, Mohsen Sharifi

2011 19th Iranian Conference on Electrical Engineering > 1 - 5

2011 19th Iranian Conference on Electrical Engineering (ICEE)

Estimating the execution time of programs has always been a concern in computer science. With the emergence of multi-core processors, this concern has gained new perspectives, as new parameters affect the runtime performance of parallel applications. To estimate the execution time of parallel applications, we investigate the overheads caused by parallelizing an application by identifying the overheads...

chapter

Performance Models for Matrix Computations on Multicore Processors Using OpenMP

P D Michailidis, K G Margaritis

2010 International Conference on Parallel and Distributed Computing, Applications and Technologies > 375 - 380

2010 11th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2010)

Matrix computations such as matrix-vector and matrix multiplication are very challenging computational kernels arising in scientific computing. In this paper, we study and evaluate a number of different data decomposition schemes for matrix computations on multicore architectures using the OpenMP programming model. Further, in this work we propose a simple and fast analytical model to predict the...
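One of the simplest data decomposition schemes for a matrix-vector product y = Ax is a row-block split, where each thread owns a contiguous band of rows. The Python sketch below only illustrates that partitioning idea, run sequentially; it assumes the first `n_rows % n_threads` blocks receive one extra row, which is one common choice (OpenMP's `schedule(static)` leaves the exact split to the implementation). The function names are hypothetical and this is not the paper's model.

```python
def row_blocks(n_rows, n_threads):
    """Split rows 0..n_rows-1 into n_threads contiguous, nearly equal blocks.

    The first n_rows % n_threads blocks get one extra row.
    """
    q, r = divmod(n_rows, n_threads)
    blocks, start = [], 0
    for t in range(n_threads):
        size = q + (1 if t < r else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks

def matvec_rowwise(A, x, n_threads):
    """Compute y = A x block by block.

    Blocks touch disjoint parts of y, so in C/OpenMP each block could be
    handed to a different thread without synchronization.
    """
    y = [0.0] * len(A)
    for block in row_blocks(len(A), n_threads):
        for i in block:
            y[i] = sum(a * b for a, b in zip(A[i], x))
    return y

print(matvec_rowwise([[1, 2], [3, 4], [5, 6]], [1, 1], 2))
```

Because every block writes a disjoint slice of `y`, the per-thread work is proportional to the block size, which is what makes simple analytical runtime models of such kernels feasible.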

chapter

Using Partial Reconfiguration in an Embedded Message-Passing System

M Saldaña, A Patel, Hao Jun Liu, Paul Chow

2010 International Conference on Reconfigurable Computing and FPGAs > 418 - 423

2010 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2010)

Partial Reconfiguration (PR) is an FPGA feature that allows the modification of certain parts of an FPGA while the rest of it continues to operate without disruption. This distinctive characteristic of FPGAs has many potential benefits but also challenges. The lack of good CAD tools and the deep hardware knowledge required make it a hard-to-use feature. In this paper, the new Partition-based...

chapter

Network Processing in Multi-core FPGAs with Integrated Cache-Network Interface

C Kachris, G Nikiforos, S Kavadias, V Papaefstathiou, more

2010 International Conference on Reconfigurable Computing and FPGAs > 328 - 333

2010 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2010)

Per-core local (scratchpad) memories allow direct inter-core communication, with latency and energy advantages over coherent cache-based communication, especially as CMP architectures become more distributed. A multicore FPGA platform with cache-integrated network interfaces (NIs) is presented, appropriate for scalable multicores, that combines the best of two worlds: the flexibility of caches (using...

chapter

A fully programmable frame synchronization architecture of OFDM systems implemented on a multi-core processor platform

Wenhua Fan, Bei Huang, Jialin Cao, Yun Chen, more

2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology > 278 - 280

2010 10th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT)

This paper presents a fully programmable frame synchronization architecture of OFDM systems implemented on a multi-core processor platform. By utilizing the guard interval in OFDM signals, the coarse symbol synchronization (CSS) and the fractional carrier frequency offset estimation (CFO) are considered simultaneously. The multi-core processor platform is a two-dimensional mesh array of SIMD (Single Instruction...
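Using the guard interval for both tasks at once can be illustrated with the textbook cyclic-prefix correlator: the guard interval is a copy of the symbol's last `n_cp` samples, so correlating the received signal with itself `n_fft` samples later peaks at the symbol start, and the phase of that peak equals 2π times the fractional CFO. This is a generic sketch, not the paper's programmable architecture. `cp_sync` is a hypothetical name, the estimate is unambiguous only for |CFO| < 0.5 subcarrier spacing, and the test signal uses noise-free unit-magnitude random samples as a stand-in for a real OFDM symbol.

```python
import cmath
import random

def cp_sync(r, n_fft, n_cp):
    """Hypothetical cyclic-prefix correlator for joint timing and fractional CFO.

    P(d) = sum over m < n_cp of conj(r[d + m]) * r[d + m + n_fft].
    Timing is the d maximizing |P(d)|; the CFO, in units of the subcarrier
    spacing, is angle(P(d)) / (2*pi).
    """
    best_d, best_p = 0, 0j
    for d in range(len(r) - n_fft - n_cp + 1):
        p = sum(r[d + m].conjugate() * r[d + m + n_fft] for m in range(n_cp))
        if abs(p) > abs(best_p):
            best_d, best_p = d, p
    return best_d, cmath.phase(best_p) / (2 * cmath.pi)

def random_samples(rng, k):
    """Unit-magnitude samples with uniformly random phase (synthetic data)."""
    return [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(k)]

# Synthetic frame: the guard interval starts at sample 23, true CFO = 0.2.
rng = random.Random(7)
N, L, start, eps = 64, 16, 23, 0.2
body = random_samples(rng, N)
tx = random_samples(rng, start) + body[-L:] + body + random_samples(rng, 10)
rx = [s * cmath.exp(2j * cmath.pi * eps * n / N) for n, s in enumerate(tx)]
timing, cfo = cp_sync(rx, N, L)
print(timing, cfo)
```

The brute-force loop recomputes each window; a hardware or SIMD implementation would normally keep a running sum, adding the newest product and subtracting the oldest, so each step costs O(1).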

chapter

Execution models for processors and instructions

F Brandner, V Pavlu, A Krall

NORCHIP 2010 > 1 - 4

2010 28th Norchip Conference (NORCHIP 2010)

Modeling the execution of a processor and its instructions is a challenging problem, in particular in the presence of long pipelines, parallelism, and out-of-order execution. A naive approach based on finite state automata inevitably leads to an explosion in the number of states and is thus only applicable to simple minimalistic processors. During their execution, instructions may only proceed forward...

chapter

A Parallel Simulator for Large-Scale Parallel Computers

Yuzhe Zhi, Yi Liu, Lin Jiao, Peng Zhang

2010 Ninth International Conference on Grid and Cloud Computing > 196 - 200

2010 9th International Conference on Grid and Cloud Computing (GCC 2010)

This paper describes the design and application of an execution-driven parallel simulator for predicting performance of Large-Scale Parallel Computers. The simulator can be used in hardware validation and software development for large-scale parallel computers. It simulates processors of each node, network components and disk I/O components. To illustrate the capabilities of our simulator, we describe...

chapter

Building a Personal High Performance Computer with Heterogeneous Processors

Qiang Li, Zhigang Huo, Ninghui Sun

2010 Ninth International Conference on Grid and Cloud Computing > 223 - 228

2010 9th International Conference on Grid and Cloud Computing (GCC 2010)

A personal high performance computer (PHPC) requires low cost and high performance. Teraflops PHPC systems with special accelerator units like GPGPUs have been presented, but they have difficulties in programming, compatibility and applicability. In this paper, we present HPP-PHPC, a hybrid architecture of heterogeneous processors connected by non-coherent off-chip system bus. The performance of...

chapter

Combining process splitting and merging transformations for Polyhedral Process Networks

S Meijer, H Nikolov, T Stefanov

2010 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia > 97 - 106

2010 8th IEEE Workshop on Embedded Systems for Real-Time Multimedia (ESTIMedia 2010)

We use the polyhedral process network (PPN) model of computation to program and map streaming media applications onto embedded Multi-Processor Systems on Chip (MPSoCs) platforms. In previous works, it has been shown how to apply different process network transformations in isolation. In this work, we present a holistic approach combining the process splitting and merging transformations and show that...

chapter

Global Lookahead Management (GLM) Protocol for Conservative DEVS Simulation

S Jafer, G Wainer

2010 IEEE/ACM 14th International Symposium on Distributed Simulation and Real Time Applications > 141 - 148

2010 IEEE/ACM 14th International Symposium on Distributed Simulation and Real Time Applications (DS-RT 2010)

An approach to carrying out asynchronous distributed simulation of multiprocessor message passing architectures is presented. Aiming at achieving better performance on Conservative DEVS-based simulations, we introduce the GLM protocol which borrows the idea of safe processing intervals from the conservative time window algorithm and maintains global synchronization in a fashion similar to the distributed...

chapter

Ring Pipelined Algorithm for the Algebraic Path Problem on the CELL Broadband Engine

Claude Tadonki

2010 22nd International Symposium on Computer Architecture and High Performance Computing Workshops > 1 - 6

2010 22nd International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW 2010)

The algebraic path problem (APP) unifies a number of related combinatorial or numerical problems into one that can be resolved by a generic algorithmic schema. In this paper, we propose a linear SPMD model based on the Warshall-Floyd procedure coupled with a systematic toroidal shift. Our scheduling requires a number of processors that equals the size of the input matrix. With fewer processors,...
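The "generic algorithmic schema" behind the APP is the Warshall-Floyd triple loop parameterized by a semiring (a "plus" and a "times" operation). The sketch below is the sequential procedure that such parallel schemes start from, not the ring-pipelined SPMD schedule the paper proposes. It assumes the simplified form in which the closure (star) of each diagonal element is the multiplicative identity, which holds for (min, +) with non-negative weights and for Boolean (or, and) reachability; the function name is illustrative.

```python
import math

def warshall_floyd(a, plus, times):
    """In-place Warshall-Floyd sweep over the semiring (plus, times).

    Simplified form: assumes star(a[k][k]) is the multiplicative identity,
    e.g. (min, +) with non-negative weights or Boolean (or, and).
    """
    n = len(a)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                a[i][j] = plus(a[i][j], times(a[i][k], a[k][j]))
    return a

INF = math.inf

# Shortest paths: the (min, +) instance of the APP.
dist = warshall_floyd([[0, 3, INF], [INF, 0, 1], [2, INF, 0]],
                      min, lambda x, y: x + y)

# Transitive closure: the Boolean (or, and) instance of the same schema.
reach = warshall_floyd([[True, True, False], [False, True, True], [False, False, True]],
                       lambda x, y: x or y, lambda x, y: x and y)
print(dist, reach)
```

Step k of the outer loop reads only row k and column k, which is why the pivot data can be circulated among processors in a pipelined fashion.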

chapter

Deriving concurrent control software from behavioral specifications

G Ramanathan, B Morandi, S West, S Nanz, more

2010 IEEE/RSJ International Conference on Intelligent Robots and Systems > 1994 - 1999

2010 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2010)

Concurrency is an integral part of many robotics applications, due to the need for handling inherently parallel tasks such as motion control and sensor monitoring. Writing programs for this complex domain can be hard, in particular because of the difficulties of retaining a robust modular design. We propose to use SCOOP, an object-oriented programming model for concurrency which by construction is...

chapter

Parallel MLFMA Performance Analysis Using Performance Analysis Toolsets

Cai Liang Liang, Tong Weiqin, Hu Yue, Cui Yanbao

2010 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery > 384 - 389

2010 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)

The Fast Multipole Method (FMM) and Multi-Level Fast Multipole Algorithm (MLFMA) have been used to solve electromagnetic scattering problems for many years. Parallel implementations of MLFMA are currently a hot topic because they are capable of solving scattering problems with tens of millions of unknowns, with complexity O(N log N), where N is the number of unknowns. In this paper, we discuss a new perfectly...

chapter

Accelerated on-line calibration of Dynamic Traffic Assignment using distributed Stochastic Gradient approximation

Enyang Huang, Constantinos Antoniou, Jorge Lopes, Yang Wen, more

13th International IEEE Conference on Intelligent Transportation Systems > 1166 - 1171

2010 13th International IEEE Conference on Intelligent Transportation Systems (ITSC 2010)

Dynamic Traffic Assignment (DTA) systems [Ben-Akiva et al., 1991] [Mahmassani, 2001] benefit travelers by providing accurate estimates of current traffic conditions, consistent anticipatory network information as well as reliable route guidance. Over the years, two types of model adjustment schemes have been studied - DTA off-line calibration [Balakrishna, 2006] [Toledo et al., 2003] [van der Zijpp,...

INFONA - science communication portal
