Search results

chapter

Making a case for an ARM Cortex-A9 CPU interlay replacing the NEON SIMD unit

Jose Raul Garcia Ordaz, Dirk Koch

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

As an alternative to adding more and more instructions to CPU cores in order to address a wide range of applications, this paper examines the use of a mixed-grained CPU interlay fabric to provide reconfigurable instruction set extensions. In detail, we examine replacing the hardened NEON SIMD unit of an ARM Cortex-A9 with an identically sized FPGA fabric. We show that by applying a set of optimizations,...

chapter

Bridging high-level synthesis and application-specific arithmetic: The case study of floating-point summations

Yohann Uguen, Florent de Dinechin, Steven Derrien

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

FPGAs are well known for their ability to perform non-standard computations not supported by classical microprocessors. Many libraries of highly customizable application-specific IPs have exploited this capability. However, using such IPs usually requires handcrafted HDL, hence significant design effort. High Level Synthesis (HLS) lowers the design effort thanks to the use of C/C++ dialects for programming...

chapter

Utility-Based Hybrid Memory Management

Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, more

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 152 - 165

While the memory footprints of cloud and HPC applications continue to increase, fundamental issues with DRAM scaling are likely to prevent traditional main memory systems, composed of monolithic DRAM, from greatly growing in capacity. Hybrid memory systems can mitigate the scaling limitations of monolithic DRAM by pairing together multiple memory technologies (e.g., different types of DRAM, or DRAM...

chapter

OmniGraph: A Scalable Hardware Accelerator for Graph Processing

Chongchong Xu, Chao Wang, Lei Gong, Yuntao Lu, more

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 623 - 624

Large-scale graph processing attracts more and more attention and has been widely applied in many application domains. FPGAs are a promising platform for implementing graph processing algorithms with high power efficiency and parallelism. In this paper, we propose OmniGraph, a scalable hardware accelerator for graph processing. OmniGraph can process graphs with different sizes adaptively and is adaptable...

chapter

Hardware diversity and modified NUREG/CR-7007 based assessment of NPP I&C safety

Oleg Illiashenko, Vyacheslav Kharchenko, Ah-Lian Kor, Artem Panarin, more

2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) > 2 > 907 - 911

Diversity and subdiversity-oriented systems applied in safety-critical industry systems are analyzed using the classification scheme described in the standard NUREG/CR-7007. This classification is specified considering diversity of hardware and FPGA designs. In particular, diversity of hard logic and soft processors, interfaces and buses, self-diagnostic means, etc. is described. Impact of...

chapter

Application Clustering Policies to Address System Fairness with Intel’s Cache Allocation Technology

Vicent Selfa, Julio Sahuquillo, Lieven Eeckhout, Salvador Petit, more

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 194 - 205

Achieving system fairness is a major design concern in current multicore processors. Unfairness arises due to contention in the shared resources of the system, such as the LLC and main memory. To address this problem, many research works have proposed novel cache partitioning policies aimed at addressing system fairness without harming performance. Unfortunately, existing proposals targeting fairness...

chapter

Design and implementation of an OpenRISC system-on-chip with an encryption peripheral

Latif Akcay, Mehmet Tukel, Berna Ors

2017 European Conference on Circuit Theory and Design (ECCTD) > 1 - 4

Open source hardware projects are becoming more and more common. OpenRISC SoC, one of the most prominent of these projects, has become quite popular with the support of volunteer developers. In this work, we demonstrate the design of a DES (Data Encryption Standard) based system, which can be used in security applications, on ORPSoC-v2 (OpenRISC Reference Platform System-on-Chip). Additionally, we...

chapter

An integrated design environment of fault tolerant processors with flexible HW/SW solutions for versatile performance/cost/coverage tradeoffs

Yi-Ju Ke, Yi-Chieh Ghen, Jng-Jer Huang

2017 International Test Conference in Asia (ITC-Asia) > 162 - 167

This paper presents an integrated design environment (IDE) for embedded fault-tolerant processor systems. It takes in a processor core IP and the embedded software to be executed on the given processor, and turns them into a fault-tolerant system with various hardware and software mechanisms, subject to the designer's selection. The hardware options include dual redundancy for processor core,...

chapter

A Linux-based support for developing real-time applications on heterogeneous platforms with dynamic FPGA reconfiguration

Marco Pagani, Alessio Balsini, Alessandro Biondi, Mauro Marinoni, more

2017 30th IEEE International System-on-Chip Conference (SOCC) > 96 - 101

Heterogeneous computing platforms including both processors and field programmable gate arrays (FPGAs) represent an attractive solution for balancing software flexibility with high performance and energy efficiency of custom hardware modules. Furthermore, the dynamic partial reconfiguration (DPR) capabilities of modern FPGAs allow virtualizing the available area to support several hardware modules...

chapter

LibHSA: One step towards mastering the era of heterogeneous hardware accelerators using FPGAs

Marc Reichenbach, Philipp Holzinger, Konrad Haublein, Tobias Lieske, more

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 1 - 6

Various signal and image processing applications require vast acceleration in order to enable real-time processing and meet constraints in power consumption. On FPGAs these applications can be implemented as application-specific circuits. Although IP cores for various applications exist, even interfacing these usually requires experienced knowledge in hardware design. Using FPGAs or other accelerators...

chapter

An Analytical Model of Hardware Transactional Memory

Daniel Castro, Paolo Romano, Diego Didona, Willy Zwaenepoel

2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS) > 221 - 231

This paper investigates the problem of deriving a white-box performance model of Hardware Transactional Memory (HTM) systems. The proposed model targets TSX, a popular implementation of HTM integrated in Intel processors starting with the Haswell family in 2013. An inherent difficulty with building white-box models of commercially available HTM systems is that their internals are either vaguely documented...

chapter

A resource-efficient monitoring architecture for hardware accelerated self-adaptive online data stream compression

Seyyed Mahdi Najmabadi, Prajwala Pandit, Trung-Hieu Tran, Sven Simon

2017 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA) > 222 - 227

In this paper, a novel scalable and resource-efficient architecture capable of monitoring the compressibility of a data stream with various entropy encoding algorithms is proposed. The self-adaptive architecture determines the best compression technique among many techniques which may be selected to encode an online data stream. This information can be used to reconfigure an adaptive encoding architecture...

article

Gaussian Pyramid: Comparative Analysis of Hardware Architectures

Fernanda D. V. R. Oliveira, Jose Gabriel R. C. Gomes, Jorge Fernandez-Berni, Ricardo Carmona-Galan, more

IEEE Transactions on Circuits and Systems I: Regular Papers > 2017 > 64 > 9 > 2308 - 2321

This paper addresses a comparison of architectures for the hardware implementation of Gaussian image pyramids. Main differences between architectural choices are in the sensor front-end. One side is for architectures consisting of a conventional sensor that delivers digital images and which is followed by digital processors. The other side is for architectures employing a non-conventional sensor with...

chapter

Lightweight Software Encryption for Embedded Processors

Thomas Hiscock, Olivier Savry, Louis Goubin

2017 Euromicro Conference on Digital System Design (DSD) > 213 - 220

Over the last 30 years, a number of secure processor architectures have been proposed to protect software integrity and confidentiality during its distribution and execution. In such architectures, encryption (together with integrity checking) is used extensively, on any data leaving a defined secure boundary. In this paper, we show how encryption can be achieved at the instruction level using a stream...

chapter

Optimizing Memory Access Performance Using Hardware Assisted Virtualization in Retargetable Dynamic Binary Translation

Antoine Faravelon, Olivier Gruber, Frederic Petrot

2017 Euromicro Conference on Digital System Design (DSD) > 40 - 46

Dynamic Binary Translation is one of the most efficient strategies for the simulation of Systems-on-Chip, with recent studies showing that a large part of the simulation time is spent in realizing memory accesses. Indeed, the simulation of each load and store instruction requires a software emulation of the hardware Memory Management Unit (MMU). In this work, we propose to realize memory accesses...

chapter

Loop Overhead Reduction Techniques for Coarse Grained Reconfigurable Architectures

Kanishkan Vadivel, Mark Wijtvliet, Roel Jordans, Henk Corporaal

2017 Euromicro Conference on Digital System Design (DSD) > 14 - 21

Due to their flexibility and high performance, Coarse Grained Reconfigurable Arrays (CGRAs) are a topic of increasing research interest. However, CGRAs also have the potential to achieve very high energy efficiency in comparison to other reconfigurable architectures when hardware optimizations are applied. Some of these optimizations are common for more traditional processors but can also lead to large...

chapter

Automatic Control Flow Generation for OpenVX Graphs

Merten Popp, Stef van Son, Orlando Moreira

2017 Euromicro Conference on Digital System Design (DSD) > 198 - 204

Heterogeneous platforms with large numbers of processing elements (PEs) have been proposed to satisfy the computational requirements of computer vision applications. Limiting the incurred communication cost here is key to meeting the power constraints of embedded devices. We present a new heuristic to reduce communication among PEs and to external memory by aggregating inter-process communication and...

chapter

Paving the Way Towards a Highly Energy-Efficient and Highly Integrated Compute Node for the Exascale Revolution: The ExaNoDe Approach

Alvise Rigo, Christian Pinto, Kevin Pouget, Daniel Raho, more

2017 Euromicro Conference on Digital System Design (DSD) > 486 - 493

Power consumption and high compute density are the key factors to be considered when building a compute node for the upcoming Exascale revolution. Current architectural design and manufacturing technologies are not able to provide the requested level of density and power efficiency to realise an operational Exascale machine. A disruptive change in the hardware design and integration process is needed...

chapter

Thermal-Aware Job Scheduling of MapReduce Applications on High Performance Clusters

Shubbhi Taneja, Yi Zhou, Mohammed Ibrahim Alghamdi, Xiao Qin

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 261 - 270

In this study, we develop a thermal-aware job scheduling strategy called tDispatch tailored for MapReduce applications running on Hadoop clusters. The scheduling idea of tDispatch is motivated by a profiling study of CPU-intensive and I/O-intensive jobs from the perspective of thermal efficiency. More specifically, we investigate the thermal behaviors of these two types of jobs running on a Hadoop...

chapter

Multiple Pattern Matching for Network Security Applications: Acceleration through Vectorization

Charalampos Stylianopoulos, Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou

2017 46th International Conference on Parallel Processing (ICPP) > 472 - 482

Pattern matching is a key building block of Intrusion Detection Systems and firewalls, which are deployed nowadays on commodity systems from laptops to massive web servers in the cloud. In fact, pattern matching is one of their most computationally intensive parts and a bottleneck to their performance. In Network Intrusion Detection, for example, pattern matching algorithms handle thousands of patterns...

INFONA - science communication portal
