Power consumption and compute density are the key factors to consider when building a compute node for the upcoming Exascale revolution. Current architectural designs and manufacturing technologies cannot provide the level of density and power efficiency required to realise an operational Exascale machine. A disruptive change in the hardware design and integration process is needed...
Heterogeneous computing platforms containing a wide range of computing resources, from CPUs to specialized hardware accelerators, are the trend today, driven by the physical limits on processor speed and the increasing demand for computing performance. Hence, many optimization strategies are being studied to obtain better throughput and lower energy consumption in heterogeneous systems. Various memory...
For modern parallel applications, modeling general execution characteristics such as power and time is difficult because of the many factors affecting software-hardware interactions, a difficulty exacerbated by the dearth of measurement and monitoring tools for novel architectures such as Intel Xeon Phi processors. To address this modeling challenge, the present work proposes to employ the...
It is commonly the case that a small number of widely used applications make up a large fraction of the workload of HPC centers. Predicting the performance of important applications running on specific processors enables HPC centers to design the best-performing system configurations and to ensure good performance for the most popular applications on new systems. In the analyses presented in this paper...
Cost models play an important role in the efficient implementation of software systems. These models can be embedded in operating systems and execution environments to optimize execution at run time. Even though non-uniform memory access (NUMA) architectures dominate today's server landscape, there is still a lack of parallel cost models that represent NUMA systems sufficiently. Therefore, the...
Today's applications for HW/SW systems, such as the Internet of Things, often demand SoC architectures in which sophisticated firmware runs on fairly simple processors. Designers face the challenge of meeting high efficiency and dependability requirements for these systems under severe cost constraints. Targeting such applications, this paper presents a new technique to generate...
Exascale computing faces a gap between the ever-increasing demand for application performance and the underlying chip technology, which no longer delivers the expected exponential increases in CPU performance. The industry is now progressively moving towards dedicated accelerators to deliver high performance and better energy efficiency. However, the question of programmability still remains...
By decoupling software and hardware, software radar and cognitive radar (SRadar&CRadar) can alter their functions by remapping different software onto a uniform platform. This feature makes radars more flexible and has become a significant trend in radar design. Efficient mapping requires a framework comprising a resource model and a mapping algorithm, which has been...
The software and hardware architectures of computer systems have become increasingly complex. Meanwhile, cyberattacks are becoming more and more sophisticated and can target any software or hardware component of these systems. Several isolation mechanisms, at both the software and hardware layers, are now available to protect against these widespread attacks. This paper is aimed at...
Organizing unlabeled data into its natural groupings based on intrinsic characteristics is the most sensible way to handle it. Mean shift is a non-parametric mode-seeking algorithm widely used for data clustering, image segmentation and object tracking, but its use in real-time applications is limited by its high computational cost. In this paper, we propose a hybrid, sequentially unfolded,...
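For readers unfamiliar with the algorithm, the sketch below illustrates the basic mean-shift iteration on one-dimensional data with a flat kernel. It is a minimal illustration, not the hybrid scheme proposed in the paper; the function names, bandwidth and tolerance are illustrative assumptions.

    /* Minimal mean-shift sketch (flat kernel, 1-D data) for illustration only;
     * names and parameters (bandwidth, tolerance) are illustrative assumptions. */
    #include <stdio.h>
    #include <math.h>

    /* Shift a query point toward the mean of all samples within `bandwidth`
     * of it, repeating until the shift becomes negligible; the point it
     * converges to is a mode (cluster centre) of the underlying density. */
    static double mean_shift_point(double x, const double *data, int n,
                                   double bandwidth, int max_iter)
    {
        for (int it = 0; it < max_iter; ++it) {
            double sum = 0.0;
            int count = 0;
            for (int i = 0; i < n; ++i) {
                if (fabs(data[i] - x) <= bandwidth) {  /* flat (uniform) kernel */
                    sum += data[i];
                    ++count;
                }
            }
            if (count == 0)
                break;                                 /* no neighbours: stop */
            double next = sum / count;                 /* window mean */
            if (fabs(next - x) < 1e-6)                 /* converged to a mode */
                return next;
            x = next;
        }
        return x;
    }

    int main(void)
    {
        /* Two obvious groups, around 1.0 and 10.0. */
        double data[] = { 0.8, 1.0, 1.2, 9.8, 10.0, 10.2 };
        int n = (int)(sizeof data / sizeof data[0]);
        for (int i = 0; i < n; ++i)
            printf("point %.1f -> mode %.3f\n", data[i],
                   mean_shift_point(data[i], data, n, 2.0, 100));
        return 0;
    }

The inner loop over all samples at every iteration is exactly the quadratic cost that limits real-time use and motivates accelerated variants.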
Energy-efficient computing is crucial to achieving exascale performance. Power capping and dynamic voltage/frequency scaling (DVFS) can be used to achieve energy savings. The Intel Xeon Phi implements a power-capping strategy in which power thresholds are used to set voltage/frequency dynamically at runtime. By default, these power limits are much higher than most applications would ever reach...
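As general background, power capping on Linux systems with Intel RAPL support is typically exposed through the powercap sysfs interface; the sketch below shows how a cap could be written there. This is a generic illustration only, not the Xeon Phi-specific mechanism discussed in the paper, and the sysfs path and wattage are assumptions that vary across systems.

    /* Hedged illustration: write a package power limit (in microwatts) through
     * the Linux powercap/RAPL sysfs interface. The path below is an assumption
     * and differs between systems; the Xeon Phi's own capping mechanism is not
     * shown here. Requires root privileges. */
    #include <stdio.h>

    int main(void)
    {
        const char *limit_file =
            "/sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw";
        long cap_uw = 100L * 1000 * 1000;        /* illustrative 100 W cap */

        FILE *f = fopen(limit_file, "w");
        if (!f) {
            perror("fopen");
            return 1;
        }
        fprintf(f, "%ld\n", cap_uw);             /* firmware/OS enforces the cap */
        fclose(f);
        return 0;
    }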
PWCS (Probabilistic Write / Copy-Select) is a lock-free synchronization mechanism with wait-free characteristics, proposed by Nicholas Mc Guire at the 13th Real-Time Linux Workshop, which exploits the inherent randomness of modern computer systems. It addresses the multi-reader, single-writer problem in Linux. Based on the original label-based PWCS, we propose a hash-based...
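For context, the conventional blocking solution to the multi-reader, single-writer problem uses a reader-writer lock; the minimal POSIX-threads sketch below shows only that baseline, not the lock-free PWCS mechanism itself.

    /* Lock-based baseline for the multi-reader / single-writer problem using a
     * POSIX reader-writer lock; compile with -pthread. PWCS replaces this kind
     * of blocking synchronization with a lock-free scheme. */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;
    static int shared_value = 0;

    static void *reader(void *arg)
    {
        (void)arg;
        pthread_rwlock_rdlock(&lock);   /* many readers may hold this at once */
        printf("read %d\n", shared_value);
        pthread_rwlock_unlock(&lock);
        return NULL;
    }

    static void *writer(void *arg)
    {
        (void)arg;
        pthread_rwlock_wrlock(&lock);   /* the single writer gets exclusive access */
        shared_value++;
        pthread_rwlock_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t r[4], w;
        pthread_create(&w, NULL, writer, NULL);
        for (int i = 0; i < 4; ++i)
            pthread_create(&r[i], NULL, reader, NULL);
        pthread_join(w, NULL);
        for (int i = 0; i < 4; ++i)
            pthread_join(r[i], NULL);
        return 0;
    }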
Programming heterogeneous multiprocessor architectures is a real challenge, involving a huge design space. Computer-aided design and development tools try to circumvent this issue by simplifying instantiation mechanisms. However, energy consumption is not well supported in most of these tools because of the difficulty of obtaining fast and accurate power estimates. To this end, this paper proposes and...
This paper investigates the optimal number of processors for executing a parallel job, whose speedup profile obeys Amdahl's law, on a large-scale platform subject to fail-stop and silent errors. We combine traditional checkpointing and rollback recovery strategies with verification mechanisms to cope with both error sources. We provide an exact formula for the execution overhead incurred by...
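As background (the paper's exact overhead formula is not reproduced here), Amdahl's law and the classical first-order checkpointing period can be written as:

    % Amdahl's law: with sequential fraction \alpha, the speedup on p processors
    % and the corresponding parallel execution time are
    S(p) = \frac{1}{\alpha + \frac{1 - \alpha}{p}},
    \qquad
    T(p) = T(1)\left(\alpha + \frac{1 - \alpha}{p}\right).

    % Classical first-order (Young) approximation of the optimal checkpointing
    % period, with checkpoint cost C and platform MTBF \mu:
    T_{\mathrm{opt}} \approx \sqrt{2\,C\,\mu}.

With an individual-node MTBF of \mu_{ind}, the platform MTBF is roughly \mu_{ind}/p, so adding processors shortens T(p) while increasing the failure frequency; balancing these two effects is what yields an optimal processor count.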
Shared memory is a critical issue for large distributed systems. Although several data coherency protocols have been proposed, selecting the protocol that best suits the application requirements and system constraints remains a challenge. The development of multi-coherency systems, in which different protocols can be deployed at runtime, appears to be an interesting alternative. In order...
High performance on modern computing platforms requires programs to be parallel, distributed, and run on heterogeneous hardware. However, programming such architectures is extremely difficult because the application must be implemented using multiple programming models combined in ad-hoc ways. To optimize distributed applications both for modern hardware and for modern programmers...
We present SpExSim, a software tool for quickly surveying legacy code bases for kernels that could be accelerated by FPGA-based compute units. We specifically aim for low development effort by considering the use of C-based high-level hardware synthesis, instead of complex manual hardware designs. SpExSim not only exploits the spatially distributed model of computation commonly used on FPGAs, but...
High-level simulation and modeling techniques have matured significantly over the last years and have become more and more important in practice, e.g., in industrial hardware development and especially the automotive domain. Complex and detailed modeling requires a lot of time for preparation and execution and is quite error prone, thereby increasing the average time-to-market significantly. One...
Split-execution computing leverages the capabilities of multiple computational models to solve problems, but splitting program execution across different computational models incurs costs associated with the translation between domains. We analyze the performance of a split-execution computing system developed from conventional and quantum processing units (QPUs) by using behavioral models that track...
The relentless push in technology scaling driven by Moore's law has yielded fantastic gains in the number of transistors available on a chip. Computer architects have exploited these extra transistors by incorporating several computing cores within a single processor. Heterogeneous processing in particular has become a useful technique for dealing with ever-present power and memory restrictions...