Search results

chapter

Making a case for an ARM Cortex-A9 CPU interlay replacing the NEON SIMD unit

Jose Raul Garcia Ordaz, Dirk Koch

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

As an alternative to adding more and more instructions to CPU cores in order to address a wide range of applications, this paper examines using a mixed-grained CPU interlay fabric to provide reconfigurable instruction set extensions. In detail, we examine replacing the hardened NEON SIMD unit of an ARM Cortex-A9 with an identically sized FPGA fabric. We show that by applying a set of optimizations,...

chapter

Toward a pixel-parallel architecture for graph cuts inference on FPGA

Tianqi Gao, Jungwook Choi, Shang-nien Tsai, Rob A. Rutenbar

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

The method of Graph Cuts converts a Maximum a Posteriori (MAP) inference problem on a Markov Random Field (MRF) into a network flow, which can be solved efficiently. Many computer vision problems can be conveniently cast as an inference task to find most likely labels for pixels. The method is widely used, but computationally burdensome. Prior accelerator attempts have failed to exploit the problem's...

article

BRAIN: A Low-Power Deep Search Engine for Autonomous Robots

Youchang Kim, Dongjoo Shin, Jinsu Lee, Hoi-Jun Yoo

IEEE Micro > 2017 > 37 > 5 > 11 - 19

Autonomous robots are actively studied for many unmanned applications; however, heavy computational costs and limited battery capacity make it difficult to implement intelligent decision making in robots. In this article, the authors propose a low-power deep search engine (code-named “BRAIN”) for real-time path planning of intelligent autonomous robots. To achieve low power consumption while maintaining...

chapter

Parallel FPGA routing: Survey and challenges

Mirjana Stojilovic

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

As transistor scaling is slowing down [1], other opportunities for ensuring continuous performance increase have to be explored. Field programmable gate arrays (FPGAs) are in the spotlight these days: not only due to their malleability and energy efficiency, but also because FPGAs have recently been integrated into the cloud [2]. The latter makes them available to everyone in need of the immense computing...

chapter

Exploring On-Node Parallelism with Neutral, a Monte Carlo Neutral Particle Transport Mini-App

Matt Martineau, Simon McIntosh-Smith

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 498 - 508

In this research we describe the development and optimisation of a new Monte Carlo neutral particle transport mini-app, neutral. In spite of the success of previous research efforts to load balance the algorithm at scale, it is not clear how to take advantage of the diverse architectures being installed in the newest supercomputers. We explore different algorithmic approaches, and perform extensive...

chapter

Pure Functions in C: A Small Keyword for Automatic Parallelization

Tim Süß, Lars Nagel, Marc-Andre Vef, Andre Brinkmann, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 552 - 556

The need for parallel task execution has been steadily growing in recent years since manufacturers mainly improve processor performance by scaling the number of installed cores instead of the frequency of processors. To make use of this potential, an essential technique to increase the parallelism of a program is to parallelize loops. However, a main restriction of available tools for automatic loop...

chapter

Bridging high-level synthesis and application-specific arithmetic: The case study of floating-point summations

Yohann Uguen, Florent de Dinechin, Steven Derrien

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 8

FPGAs are well known for their ability to perform non-standard computations not supported by classical microprocessors. Many libraries of highly customizable application-specific IPs have exploited this capability. However, using such IPs usually requires handcrafted HDL, hence significant design effort. High Level Synthesis (HLS) lowers the design effort thanks to the use of C/C++ dialects for programming...

chapter

TAIGA: A new RISC-V soft-processor framework enabling high performance CPU architectural features

Eric Matthews, Lesley Shannon

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Recently, there has been an increased focus on integration of reconfigurable fabric with modern processors. However, existing soft-processors are optimized to leverage older FPGA fabrics, focus primarily on resource minimization and have fixed-pipeline designs that limit the scope for tightly integrated hardware accelerators. In this work, we present Taiga: a RISC-V, 32-bit, soft-processor architecture...

chapter

Estimation of Worst Case Response Time boundaries in multi-core real-time systems

Matthias Mucha, Jurgen Mottok, Stefan Kramer

2017 International Conference on Applied Electronics (AE) > 1 - 6

We address a novel probabilistic approach to estimate the Worst Case Response Time boundaries of tasks. Multi-core real-time systems process tasks in parallel on two or more cores. Tasks in our contribution may preempt other tasks, block tasks with semaphores to access global shared resources, or migrate to another core. The depicted task behavior is random. The shape of collected response times of...

chapter

Performance Evaluation of Quantum ESPRESSO on NEC SX-ACE

Osamu Watanabe, Akihiro Musa, Hiroaki Hokari, Shivanshu Singh, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 701 - 708

In recent years, many computer simulation codes have been developed as open-source software. Meanwhile, major processors have adopted vector processing for high-performance computing. Hence, simulation codes need to be written in a vector-friendly manner to benefit from the computational potential of vector processing. Our study is evaluation and analysis of performance of...

chapter

Checkpointing Workflows for Fail-Stop Errors

Li Han, Louis-Claude Canon, Henri Casanova, Yves Robert, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 487 - 497

We consider the problem of orchestrating the execution of workflow applications structured as Directed Acyclic Graphs (DAGs) on parallel computing platforms that are subject to fail-stop failures. The objective is to minimize expected overall execution time, or makespan. A solution to this problem consists of a schedule of the workflow tasks on the available processors and of a decision of which application...

chapter

Fast Failure Erasure Encoding Using Just in Time Compilation for CPUs, GPUs, and FPGAs

David Rohr, Volker Lindenstruth

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 451 - 463

Failure tolerant data encoding and storage is of paramount importance for data centers, supercomputers, data transfers, and many aspects of information technology. Reed-Solomon failure erasure codes and their variants are the basis for many applications in this field. Efficient implementation of these codes is challenging because they require computations in Galois fields, which are not supported...

chapter

Utility-Based Hybrid Memory Management

Yang Li, Saugata Ghose, Jongmoo Choi, Jin Sun, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 152 - 165

While the memory footprints of cloud and HPC applications continue to increase, fundamental issues with DRAM scaling are likely to prevent traditional main memory systems, composed of monolithic DRAM, from greatly growing in capacity. Hybrid memory systems can mitigate the scaling limitations of monolithic DRAM by pairing together multiple memory technologies (e.g., different types of DRAM, or DRAM...

chapter

Vectorization-Aware Loop Optimization with User-Defined Code Transformations

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 685 - 692

The cost of maintaining an application code would significantly increase if the application code were branched into multiple versions, each of which is optimized for a different architecture. In this work, default and vector versions of a real-world application code are refactored to be a single version, and the differences between the versions are expressed as user-defined code transformations. As a...

chapter

OmniGraph: A Scalable Hardware Accelerator for Graph Processing

Chongchong Xu, Chao Wang, Lei Gong, Yuntao Lu, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 623 - 624

Large-scale graph processing is attracting more and more attention and has been widely applied in many application domains. FPGAs are a promising platform for implementing graph processing algorithms with high power efficiency and parallelism. In this paper, we propose OmniGraph, a scalable hardware accelerator for graph processing. OmniGraph can process graphs with different sizes adaptively and is adaptable...

chapter

Hardware diversity and modified NUREG/CR-7007 based assessment of NPP I&C safety

Oleg Illiashenko, Vyacheslav Kharchenko, Ah-Lian Kor, Artem Panarin, et al.

2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) > 2 > 907 - 911

Diversity and subdiversity-oriented systems applied in safety-critical industry systems are analyzed through the use of the classification scheme described in standard NUREG/CR-7007. This classification is specified considering diversity of hardware and FPGA designs. In particular, diversity of hard logic and soft processors, interfaces and buses, self-diagnostics means, etc. are described. Impact of...

chapter

Language Design with Intent

Vadim Zaytsev

2017 ACM/IEEE 20th International Conference on Model Driven Engineering Languages and Systems (MODELS) > 45 - 52

Software languages have always been an essential component of model-driven engineering. Their importance and popularity has been on the rise thanks to language workbenches, language-oriented development and other methodologies that enable us to quickly and easily create new languages specific for each domain. Unfortunately, language design is largely a form of art and has resisted most attempts to...

chapter

Application Clustering Policies to Address System Fairness with Intel’s Cache Allocation Technology

Vicent Selfa, Julio Sahuquillo, Lieven Eeckhout, Salvador Petit, et al.

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 194 - 205

Achieving system fairness is a major design concern in current multicore processors. Unfairness arises due to contention in the shared resources of the system, such as the LLC and main memory. To address this problem, many research works have proposed novel cache partitioning policies aimed at addressing system fairness without harming performance. Unfortunately, existing proposals targeting fairness...

chapter

OSCAR: Optimizing SCrAtchpad reuse for graph processing

Shreyas G. Singapura, Ajitesh Srivastava, Rajgopal Kannan, Viktor K. Prasanna

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

Recently, architectures with scratchpad memory have been gaining popularity. These architectures consist of low-bandwidth, large-capacity DRAM and a high-bandwidth, user-addressable, small-capacity scratchpad. Existing algorithms must be redesigned to take advantage of the high bandwidth while overcoming the constraint on the capacity of the scratchpad. In this paper, we propose an optimized edge-centric graph processing...

chapter

The cashless payment device for vending machines — Import substitution in the sphere of vending

Viktor P. Semenov, Vladimir V. Chernokulsky, Natalya V. Razmochaeva

2017 International Conference "Quality Management, Transport and Information Security, Information Technologies" (IT&QM&IS) > 798 - 801

In this paper the notion of vending and the main tasks of a modern vending company are disclosed, one of the main goals of which is to provide as many payment methods as possible. Vending machines are most often located in large companies, large offices, institutions, and enterprises. The aim of the vending company is to increase the speed of customer service. This paper describes own development...

INFONA - science communication portal
