Search results

chapter

An efficient runtime adaptable floating-point Gaussian filtering core

Cuong Pham-Quoc, Tran Ngoc Thinh

2017 4th NAFOSTED Conference on Information and Computer Science > 183 - 188

With the rapidly increasing use of image and video processing in many domains, the requirements for high-performance, high-quality systems have led to the use of reconfigurable computing to accelerate traditional image processing platforms. In this work, an efficient runtime adaptable floating-point Gaussian filtering core is proposed to achieve not only high performance and quality but also kernel...

chapter

Performance and Energy Analysis of OpenMP Runtime Systems with Dense Linear Algebra Algorithms

Joao V.F. Lima, Issam Rais, Laurent Lefevre, Thierry Gautier

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 7 - 12

In this paper, we analyse performance and energy consumption of four OpenMP runtime systems over a NUMA platform. We present an experimental study to characterize OpenMP runtime systems on the three main kernels in dense linear algebra algorithms (Cholesky, LU and QR) in terms of performance and energy consumption. Our experimental results suggest that OpenMP runtime systems can be considered as a...

chapter

Sublinear Time Low-Rank Approximation of Positive Semidefinite Matrices

Cameron Musco, David P. Woodruff

2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS) > 672 - 683

We show how to compute a relative-error low-rank approximation to any positive semidefinite (PSD) matrix in sublinear time, i.e., for any $n \times n$ PSD matrix $A$, in $\tilde{O}(n \cdot \mathrm{poly}(k/\epsilon))$ time we output a rank-$k$ matrix $B$, in factored form, for which $\|A - B\|_F^2 \le (1 + \epsilon)\|A - A_k\|_F^2$, where $A_k$ is the best...

chapter

A programming model and runtime system for approximation-aware heterogeneous computing

Ioannis Parnassos, Nikolaos Bellas, Nikolaos Katsaros, Nikolaos Patsiatzis, et al.

2017 27th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Heterogeneous platforms that include diverse architectures such as multicore CPUs, FPGAs and GPUs are becoming very popular due to their superior performance and energy efficiency. Besides heterogeneity, a promising approach for minimizing energy consumption is through approximate computing which relaxes the requirement that all parts of a program are considered equally important to the output quality,...

chapter

Triangle counting via vectorized set intersection

Shahir Mowlaei

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 5

In this paper we propose a vectorized sorted set intersection approach for the task of counting the exact number of triangles of a graph on CPU cores. The computation is factorized into reordering and counting kernels where the reordering kernel builds upon the Reverse Cuthill-McKee heuristic.
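
The counting step described in this abstract can be illustrated with a minimal sketch. It assumes a plain scalar two-pointer intersection of sorted neighbor lists; it is not the paper's vectorized kernel and it omits the Reverse Cuthill-McKee reordering. The names `intersect_count` and `count_triangles` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def intersect_count(a: List[int], b: List[int]) -> int:
    """Count elements common to two sorted lists using a two-pointer merge."""
    i = j = count = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            count += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return count


def count_triangles(edges: List[Tuple[int, int]]) -> int:
    """Count the exact number of triangles in an undirected graph.

    Assumption: each edge is oriented from the lower to the higher vertex id,
    so every triangle u < v < w is found exactly once, at edge (u, v).
    """
    forward: Dict[int, Set[int]] = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        lo, hi = min(u, v), max(u, v)
        forward.setdefault(lo, set()).add(hi)  # a set also drops duplicate edges
        forward.setdefault(hi, set())

    # Sorted neighbor lists are what make the merge-style intersection valid.
    adj = {u: sorted(vs) for u, vs in forward.items()}

    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            # Common higher-id neighbors of u and v each close one triangle.
            total += intersect_count(nbrs, adj[v])
    return total
```

For example, `count_triangles([(0, 1), (1, 2), (0, 2)])` counts the single triangle in a 3-cycle. A vectorized variant would replace the scalar loop in `intersect_count` with SIMD comparisons over blocks of the two lists.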

chapter

Experimentation of vision algorithm performance using custom OpenCL™ vector language extensions for a graphical accelerator with vector architecture

Bogdan Ditu, Fred Peterson, Ciprian Arbone

2017 13th IEEE International Conference on Intelligent Computer Communication and Processing (ICCP) > 339 - 346

OpenCL is a standard that supports a parallel programming paradigm which enables heterogeneous multi-core systems and also offers a high level of portability for the application. Some of the systems that are used with OpenCL might have vector capabilities at the device compute-unit level. There are several ways the vector capabilities could be exploited by the OpenCL device application, the most common...

chapter

Comprehensive comparison of gradient-based cross-spectral stereo matching generated disparity maps

Christopher B. Picardo, Justin G. R. Delva, R. Iris Bahar

2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS) > 200 - 204

In Gradient-Based Cross-Spectral Stereo Matching (GB-CSSM), output disparity maps tend to be coarse yet, for the most part, reliable. However, general methods of improving the performance of disparity maps generated from the Cross-Spectral comparison of visual and full infrared input images are non-existent. In particular, previous works fail to address the role and interaction of...

chapter

MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling

Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rosetti, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 151 - 160

While GPUs are becoming common in HPC systems, the CPU is still responsible for managing both GPU-side and CPU-side compute, communication, and synchronization operations. For instance, if a result from a GPU-side computation is to be transferred to a remote destination, then the CPU must synchronize on GPU compute completion before issuing a communication operation. Both CPU cycles and energy are consumed...

chapter

Performance Analysis and Optimization of the FFTXlib on the Intel Knights Landing Architecture

Michael Wagner, Victor Lopez, Julian Morillo, Carlo Cavazzoni, et al.

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 243 - 250

In this paper, we address the decreasing performance of the FFTXlib, the Fast Fourier Transform (FFT) kernel of Quantum ESPRESSO, when scaling to a full KNL node. Increased performance in the FFTXlib will likewise increase the performance of the entire Quantum ESPRESSO code, one of the most used plane-wave DFT codes in the materials science community. Our approach focuses on, first, overlapping...

chapter

Optimizing Memory Management in Deeply Heterogeneous HPC Accelerators

Anna Pupykina, Giovanni Agosta

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 291 - 300

We address the problem of optimizing global shared memory usage in deeply heterogeneous accelerators in the context of HPC systems running multiple applications with different quality of service levels. We explore predictive memory allocation algorithms, which allow up to 28% more high-priority requests to be served when using a moving-average-based prediction in a low-workload scenario.

chapter

RealDroid: Large-Scale Evasive Malware Detection on "Real Devices"

Lang Liu, Yacong Gu, Qi Li, Purui Su

2017 26th International Conference on Computer Communication and Networks (ICCCN) > 1 - 8

In order to effectively detect malware in Android, dynamic analysis techniques with Android emulators are widely adopted. Emulators can be deployed for large-scale malware detection and restored to an ensured clean state in a short period after each app analysis process such that dynamic analysis upon emulators can effectively detect malware. Moreover, emulators significantly reduce the detection...

chapter

Multiverse: Easy Conversion of Runtime Systems into OS Kernels via Automatic Hybridization

Kyle C. Hale, Conor Hetland, Peter Dinda

2017 IEEE International Conference on Autonomic Computing (ICAC) > 177 - 186

The hybrid runtime (HRT) model offers a path towards high performance and efficiency. By integrating the OS kernel, runtime, and application, an HRT allows the runtime developer to leverage the full feature set of the hardware and specialize OS services to the runtime's needs. However, conforming to the HRT model currently requires a port of the runtime to the kernel level, for example to the Nautilus...

chapter

OpenMP device offloading to FPGA accelerators

Lukas Sommer, Jens Korinth, Andreas Koch

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 201 - 205

Future high-performance computing systems will need to include multiple specialized accelerators in a single heterogeneous system to overcome power-density limitations of CPU performance.

chapter

Hardwiring the OS kernel into a Java application processor

Chun-Jen Tsai, Cheng-Ju Lin, Cheng-Yang Chen, Yan-Hung Lin, et al.

2017 IEEE 28th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 53 - 60

This paper presents the design and implementation of a hardwired OS kernel circuitry inside a Java application processor to provide the system services that are traditionally implemented in software. The hardwired system functions in the proposed SoC include the thread manager, the memory manager, and the I/O subsystem interface. There are many advantages in making the OS kernel a hardware component,...

chapter

Fast, accurate spectral clustering using locally linear landmarks

Max Vladymyrov, Miguel A. Carreira-Perpinan

2017 International Joint Conference on Neural Networks (IJCNN) > 3870 - 3879

For problems of image or video segmentation, where clusters have a complex structure, a leading method is spectral clustering. It works by encoding the similarity between pairs of points into an affinity matrix and applying k-means in its low-order eigenspace, where the clustering structure is enhanced. When the number of points is large, an approximation is necessary to limit the runtime even if...

chapter

Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code

David Beckingsale, Olga Pearce, Ignacio Laguna, Todd Gamblin

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 307 - 316

Increasing architectural diversity makes performance portability extremely important for parallel simulation codes. Emerging on-node parallelization frameworks such as Kokkos and RAJA decouple the work done in kernels from the parallelization mechanism, allowing for a single source kernel to be tuned for different architectures at compile time. However, computational demands in production applications...

chapter

Exploiting Decoupled OpenCL Work-Items with Data Dependencies on FPGAs: A Case Study

Javier Alejandro Varela, Norbert Wehn, Qian Liang, Songyin Tang

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 124 - 131

In the field of high performance heterogeneous computing systems, field programmable gate arrays (FPGAs) have shown great advantages in terms of acceleration and energy efficiency. And with the inclusion of the OpenCL framework for parallel programming, the design complexity has been greatly reduced. However, the parallel implementation of applications containing data-dependent branches usually experiences...

chapter

Quality Attribute Trade-Offs in Industrial Software Systems

Michael Wahler, Raphael Eidenbenz, Aurelien Monot, Manuel Oriol, et al.

2017 IEEE International Conference on Software Architecture Workshops (ICSAW) > 251 - 254

The main challenge of architecting modern industrial control and automation systems (ICASs) is that they need to fulfill quality attributes (QAs) traditional to real-time systems, such as timeliness and predictability, and to modern software engineering, such as modularity or reusability. QAs are often conflicting, which entails difficult trade-offs. As a consequence, even the architecture of closely...

chapter

Preserving Energy Resources Using an Android Kernel Extension: A Case Study

Luis Corral, Ilenia Fronza, Nabil El Ioini, Andrea Janes, et al.

2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems (MOBILESoft) > 23 - 24

In this paper, we present our experience designing and testing an energy saving strategy for mobile phones, implemented at operating system level, using Android OS. Our approach was to deploy kernel extensions that assess the status of the device, and enable economic profiles without user intervention. Our experiments showed that the power management kernel extension was able to extend the battery runtime...

chapter

Fast-extract with cube hashing

Bruno de O. Schmitt, Alan Mishchenko, Victor N. Kravets, Robert K. Brayton, et al.

2017 22nd Asia and South Pacific Design Automation Conference (ASP-DAC) > 145 - 150

The fast-extract algorithm is a well-known algebraic method for factoring and decomposing Boolean expressions. Since it uses pairwise comparisons between cubes to find factors, the runtime is degraded for networks whose primary outputs are expressed in terms of primary inputs and have Boolean functions with thousands of cubes. This paper describes a new implementation of the fast-extract algorithm,...

INFONA - science communication portal
