Search results

chapter

Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems

Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda

2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 13-24

Multi-/many-core CPU based architectures are seeing widespread adoption due to their unprecedented compute performance in a small power envelope. With the increasingly large number of cores on each node, applications spend a significant portion of their execution time in intra-node communication. While shared memory is commonly used for intra-node communication, it needs to copy each message once...

chapter

Evaluating high-level design strategies on FPGAs for high-performance computing

Artur Podobas, Hamid Reza Zohouri, Naoya Maruyama, Satoshi Matsuoka

2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-4

Field-Programmable Gate Arrays (FPGAs) have gained considerable momentum in mainstream high-performance systems in recent years due to their flexibility and low power consumption. Still, FPGAs remain largely inaccessible to software programmers because of the programming and debugging difficulties inherent to standard Hardware Description Languages. The performance that hardware-oblivious software...

chapter

PolyPC: Polymorphic parallel computing framework on embedded reconfigurable system

Hongyuan Ding, Miaoqing Huang

2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-8

With the help of the parallelism provided by the fine-grained architecture, hardware accelerators on Field Programmable Gate Arrays (FPGAs) can significantly improve the performance of many applications. However, designers are typically required to have excellent hardware programming skills and knowledge of specialized optimization techniques to fully exploit the potential of FPGA resources. In this work, we propose the...

chapter

A fully connected layer elimination for a binarized convolutional neural network on an FPGA

Hiroki Nakahara, Tomoya Fujii, Shimpei Sato

2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-4

A pre-trained convolutional deep neural network (CNN) is widely used in embedded systems, which require high power and area efficiency. In such systems, the CPU is too slow, the embedded GPU dissipates too much power, and an ASIC cannot keep up with the rapid progress of CNN variants. This paper uses a binarized CNN, which restricts both the inputs and the weights to binary values. Since the...

chapter

Vectorization-Aware Loop Optimization with User-Defined Code Transformations

Hiroyuki Takizawa, Thorsten Reimann, Kazuhiko Komatsu, Takashi Soga, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 685-692

The cost of maintaining an application code would increase significantly if the code were branched into multiple versions, each optimized for a different architecture. In this work, the default and vector versions of a real-world application code are refactored into a single version, and the differences between the versions are expressed as user-defined code transformations. As a...

chapter

Preliminary Performance Evaluation of Application Kernels Using ARM SVE with Multiple Vector Lengths

Yuetsu Kodama, Tetsuya Odajima, Motohiko Matsuda, Miwako Tsuji, et al.

2017 IEEE International Conference on Cluster Computing (CLUSTER), pp. 677-684

Modern high-performance processors are equipped with very wide SIMD instruction sets. SVE (Scalable Vector Extension) is an ARM® SIMD technology that supports vector lengths from 128 bits to 2048 bits. One of its promising features is "vector-length agnostic" programming, which allows the same SVE code to run on hardware of any vector length without modification. This...

chapter

Fast linear algebra-based triangle counting with KokkosKernels

Michael M. Wolf, Mehmet Deveci, Jonathan W. Berry, Simon D. Hammond, et al.

2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7

Triangle counting serves as a key building block for a set of important graph algorithms in network science. In this paper, we address the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. Our implementation uses a linear algebra-based approach to triangle counting that has grown out of work related to our...

chapter

WCET analysis of the shared data cache in integrated CPU-GPU architectures

Yijie Huangfu, Wei Zhang

2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7

By taking advantage of both the CPU and the GPU, as well as the shared DRAM and cache, the integrated CPU-GPU architecture has the potential to boost performance for a variety of applications, including real-time applications. However, before it can be applied to hard real-time and safety-critical applications, the time predictability of the integrated CPU-GPU architecture needs to be studied...

chapter

Non-von-Neumann heap for better streaming, capturing and storing of raw 8K video data

Mohamed Shaafiee, Rajasvaran Logeswaran

2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pp. 469-473

The advent of 8K and higher video resolutions poses problems for capturing and storing data at these standards. The contemporary alternative is to compromise on quality and use various (often lossy) compression techniques to reduce the bandwidth required to move this data. This paper proposes a novel method for handling large volumes of video data without compromising its quality through space...

chapter

3D tomography back-projection parallelization on FPGAs using OpenCL

Maxime Martelli, Nicolas Gac, Alain Merigot, Cyrille Enderli

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP), pp. 1-6

This paper evaluates the resurgence of FPGAs for hardware acceleration in computed tomography, focusing on the back-projection operator used in iterative reconstruction algorithms. We concentrate on the tools developed by FPGA manufacturers, in particular the Intel FPGA SDK for OpenCL, which promises a new level of hardware abstraction from the developer's perspective, allowing a...

chapter

Reconfigurable logic embedded architecture of support vector machine linear kernel

Jeevan Sirkunan, N. Shaikh-Husin, Trias Andromeda, M. N. Marsono

2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), pp. 1-5

Support Vector Machine (SVM) is a linear binary classifier that requires a kernel function to handle non-linear problems. Most previous SVM implementations for embedded systems in the literature were built for a specific application and were evaluated only against software implementations. The impact of different application datasets on SVM hardware performance was not...

chapter

Loop Overhead Reduction Techniques for Coarse Grained Reconfigurable Architectures

Kanishkan Vadivel, Mark Wijtvliet, Roel Jordans, Henk Corporaal

2017 Euromicro Conference on Digital System Design (DSD), pp. 14-21

Due to their flexibility and high performance, Coarse-Grained Reconfigurable Arrays (CGRAs) are a topic of increasing research interest. However, CGRAs also have the potential to achieve very high energy efficiency compared with other reconfigurable architectures when hardware optimizations are applied. Some of these optimizations are common in more traditional processors but can also lead to large...

chapter

Extracting Drug-Drug Interactions with Word and Character-Level Recurrent Neural Networks

Ramakanth Kavuluru, Anthony Rios, Tung Tran

2017 IEEE International Conference on Healthcare Informatics (ICHI), pp. 5-12

Drug-drug interactions (DDIs) are known to be responsible for nearly a third of all adverse drug reactions. Hence, several current efforts focus on extracting signal from EMRs to prioritize DDIs that need further exploration. To this end, being able to extract explicit mentions of DDIs in free-text narratives is an important task. In this paper, we explore recurrent neural network (RNN) architectures...

chapter

Developing CPU-GPU Embedded Systems Using Platform-Agnostic Components

Gabriel Campeanu, Jan Carlson, Severine Sentilles

2017 43rd Euromicro Conference on Software Engineering and Advanced Applications (SEAA), pp. 176-180

Nowadays, there are many embedded systems with different architectures that have incorporated GPUs. However, it is difficult to develop CPU-GPU embedded systems using component-based development (CBD), since existing CBD approaches have no support for GPU development. In this context, when targeting a particular CPU-GPU platform, the component developer is forced to construct hardware-specific components,...

chapter

Deep structured features for semantic segmentation

Michael Tschannen, Lukas Cavigelli, Fabian Mentzer, Thomas Wiatowski, et al.

2017 25th European Signal Processing Conference (EUSIPCO), pp. 61-65

We propose a highly structured neural network architecture for semantic segmentation with an extremely small model size, suitable for low-power embedded and mobile platforms. Specifically, our architecture combines i) a Haar wavelet-based tree-like convolutional neural network (CNN), ii) a random layer realizing a radial basis function kernel approximation, and iii) a linear classifier. While stages...

chapter

Parallel Desolvation Energy Term Calculation for Blind Docking on GPU Architectures

Hocine Saadi, Nadia Nouali-Taboudjemat, Abdellatif Rahmoun, Baldomero Imbernon, et al.

2017 46th International Conference on Parallel Processing Workshops (ICPPW), pp. 16-22

In the recent literature, drug design relying on molecular docking (MD) techniques has become a very promising field. Most of these techniques model how ligands interact with the protein target using only one binding site; in addition, they ignore the fact that different ligands interact with unconnected parts of the target. However, by taking the latter fact into consideration, the computational...

chapter

In-Place Irregular Computation for Message-Passing Chip-Multiprocessors

Zhang Youhui, Zhang Youyang, Li Yanhua, Fei Xiang, et al.

2017 46th International Conference on Parallel Processing Workshops (ICPPW), pp. 69-76

With the increase in CMP (Chip-Multiprocessor) scale, moving data to computation on chip becomes more expensive. Accordingly, moving computation to data has the potential to improve efficiency. We propose an in-place computation co-design of many-simple-core CMP for irregular applications. The computing paradigm is that an application's critical irregular data (or part of it) is partitioned into on-chip...

chapter

Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture

Michael Wagner, Victor Lopez, Julian Morillo, Carlo Cavazzoni, et al.

2017 46th International Conference on Parallel Processing Workshops (ICPPW), pp. 243-250

In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transform (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. Improving the performance of the FFTXlib will likewise improve the performance of the entire Quantum ESPRESSO code, one of the most widely used plane-wave DFT codes in the materials science community. Our approach focuses on, first, overlapping...

chapter

Autotuning GPU Kernels via Static and Predictive Analysis

Robert Lim, Boyana Norris, Allen Malony

2017 46th International Conference on Parallel Processing (ICPP), pp. 523-532

Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models or performing code transformations. Although empirical...

chapter

An FPGA optimization of a multiple resolution architecture for LDR to HDR image conversion

Carmine Cappetta, Gian Domenico Licciardo, Luigi Di Benedetto

2017 International Symposium on Signals, Circuits and Systems (ISSCS), pp. 1-4

An architecture capable of performing inverse tone mapping to convert a Low Dynamic Range image into a High Dynamic Range one is proposed. The proposed image processor is specifically designed for a Field Programmable Gate Array implementation. The design exploits the presence of specific blocks in the Field Programmable Logic board, dedicated to the implementation of memories, in order to develop...

INFONA - science communication portal
