Search results

chapter

A Linear Performance-Breakdown Model for GPU Programming Optimization Guidance

Mario A. Chapa M., Sato Hiroyuki

2014 IEEE International Parallel & Distributed Processing Symposium Workshops > 596 - 603

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW)

Graphics Processing Units (GPUs) have been used as computing accelerators. Nevertheless, writing efficient GPU programs is a difficult and time-consuming task. In this paper we present the Linear Performance Breakdown Model (LPBM), an analytic model that is used to break down the execution time of GPU kernel programs into the three major components that affect their running time. The model can...

chapter

The Peril of Fragmentation: Security Hazards in Android Device Driver Customizations

Xiaoyong Zhou, Yeonjoon Lee, Nan Zhang, Muhammad Naveed, more

2014 IEEE Symposium on Security and Privacy > 409 - 423

2014 IEEE Symposium on Security and Privacy (SP)

Android phone manufacturers are under the perpetual pressure to move quickly on their new models, continuously customizing Android to fit their hardware. However, the security implications of this practice are less known, particularly when it comes to the changes made to Android's Linux device drivers, e.g., those for camera, GPS, NFC etc. In this paper, we report the first study aimed at a better...

chapter

FPGA implementation and evaluation of a simple processor for multi-scalar/vector/matrix instructions

Mostafa I. Soliman, Elsayed A. Elsayed

2014 International Conference on Engineering and Technology (ICET) > 1 - 7

2014 International Conference on Engineering and Technology (ICET)

This paper presents the FPGA implementation of a simple processor architecture for accelerating data-parallel applications. Our proposed processor, called SuperSMP, can execute multi-scalar, vector, and matrix instructions on parallel execution datapaths. 4×32-bit instructions are fetched from the instruction cache. The fetched instructions are decoded and their dependencies are checked. Up to...

chapter

Wavelet based multilevel un-sharp masking using OpenCL

Jai Prakash Bhagat, Kumar Ashish, Sibsambhu Kar, Suresh Kumar Gara

2014 First International Conference on Automation, Control, Energy and Systems (ACES) > 1 - 5

2014 First International Conference on Automation, Control, Energy and Systems (ACES)

A wavelet-based multi-level adaptive unsharp masking technique for image sharpening is proposed. It sharpens at multiple levels of the DWT with a small, fixed-size Gaussian kernel and automatically adjusts for different amounts of blurring in different directions. The proposed method is free from kernel estimation. The algorithm is designed to process in a heterogeneous environment consisting of...

chapter

Implementation of an improved parallel metaheuristic on GPU applied to humanoid robot simulation

Nour EL-Houda Benalia, Nesrine Ouannes, Noureddine Djedi

2014 International Conference on Multimedia Computing and Systems (ICMCS) > 42 - 47

2014 International Conference on Multimedia Computing and Systems (ICMCS)

Generally, bio-inspired techniques require significant computational resources. Due to their complexity and the computing power required for their execution, they have long been neglected. Recently, however, parallel resolution techniques exploiting graphics processing units (GPUs) have been increasingly used. These specialized processors are being widely adopted for the purpose of...

chapter

A Linux-governor based Dynamic Reliability Manager for android mobile devices

Pietro Mercati, Andrea Bartolini, Francesco Paterna, Tajana Simunic Rosing, more

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1 - 4

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Reliability is a major concern in multiprocessors. Dynamic Reliability Management (DRM) aims at trading off processor performance against lifetime. State-of-the-art publications study only the theory, supported by simulation. This paper presents the first complete software implementation, working on real hardware, of a low-overhead, Android-compatible, workload-aware DRM Governor for mobile multiprocessors...

chapter

Automatic optimization of thread-coarsening for graphics processors

Alberto Magni, Christophe Dubach, Michael O'Boyle

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) > 455 - 466

2014 23rd International Conference on Parallel Architecture and Compilation (PACT)

OpenCL has been designed to achieve functional portability across multi-core devices from different vendors. However, the lack of a single cross-target optimizing compiler severely limits performance portability of OpenCL programs. Programmers need to manually tune applications for each specific device, preventing effective portability. We target a compiler transformation specific for data-parallel...

chapter

Performance comparison of wireless networks over IPv6 and IPv4 under several operating systems

Hossam M. A. Fahmy, Salma A. Ghoneim

2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS) > 670 - 673

2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS)

IPv6 has been introduced, yet it is not widely used. Research has been directed at many questions, specifically: how to migrate from IPv4 to IPv6, how to adapt hardware devices to support a transitory period from coexistence between IPv4 and IPv6 to established use of IPv6, and how operating systems perform when using IPv6 as compared to IPv4. This work provides a comparative performance...

chapter

Online Performance Projection for Clusters with Heterogeneous GPUs

Lokendra S. Panwar, Ashwin M. Aji, Jiayuan Meng, Pavan Balaji, more

2013 International Conference on Parallel and Distributed Systems > 283 - 290

2013 International Conference on Parallel and Distributed Systems (ICPADS)

We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which GPU would run faster for a given kernel. Usage cases...

chapter

An Embedded NIDS with Multi-core Aware Packet Capture

Chia-Hao Hsu, Sheng-De Wang

2013 IEEE 16th International Conference on Computational Science and Engineering > 778 - 785

2013 IEEE 16th International Conference on Computational Science and Engineering (CSE)

Network security has been a serious problem on the Internet. To face this issue, network intrusion detection tools have become indispensable for computer systems and network gateways. In this paper we propose an embedded, multi-core aware network intrusion detection system (NIDS), which has the following features: 1) It integrates a novel multi-core aware packet capture module, called the MCA ring,...

chapter

Implementation of Parallel 1-D FFT on GPU Clusters

Daisuke Takahashi

2013 IEEE 16th International Conference on Computational Science and Engineering > 174 - 180

2013 IEEE 16th International Conference on Computational Science and Engineering (CSE)

In this paper, we propose an implementation of a parallel one-dimensional fast Fourier transform (FFT) on GPU clusters. This implementation is based on the six-step FFT algorithm. Because the parallel one-dimensional FFT requires three all-to-all communications, one goal for parallel FFTs on GPU clusters is to minimize the PCI Express transfer time and the MPI communication time. We demonstrate that...

chapter

Future Non-volatile Memory Storage Architecture and File System Interface

Shuichi Oikawa, Satoshi Miki

2013 First International Symposium on Computing and Networking > 389 - 392

2013 First International Symposium on Computing and Networking (CANDAR)

Non-volatile memory (NVM) storage is becoming more popular as its performance and cost efficiency improve. Since the performance and characteristics of NVM storage are significantly different from those of HDDs, there is ongoing research into utilizing SSDs more efficiently and effectively. There is a claim that the further improvement of NVM storage performance makes it better to poll a storage device...

chapter

Achieving Natural Mathematical Expression Programming on GPUs via Expression Templates

Alfonso Breglia, Amedeo Capozzoli, Claudio Curcio, Angelo Liseno

2013 European Modelling Symposium > 500 - 505

2013 European Modelling Symposium (EMS)

We present the development of one of the first libraries based on the so-called expression templates technique to simplify the implementation of CPU and parallel GPU codes. Expression templates allow matrix algebra operations, executed either on the CPU or on the GPU, to be expressed with a syntax very close to the natural mathematical one. The developed library has been deeply optimized so that the...

chapter

High Performance Code Generation for Stencil Computation on Heterogeneous Multi-device Architectures

Pei Li, Elisabeth Brunet, Raymond Namyst

2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing > 1512 - 1518

2013 IEEE International Conference on High Performance Computing and Communications (HPCC) & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

Heterogeneous architectures have been widely used in the domain of high performance computing. On the one hand, they allow a designer to use multiple types of computing units, each executing the tasks it is best suited for, to increase performance; on the other hand, they bring many programming challenges for novice users, especially on heterogeneous systems with multiple devices. In this...

chapter

Automatic Mapping Single-Device OpenCL Program to Heterogeneous Multi-device Platform

Dong Chen, Changqing Xun, Dafei Huang, Mei Wen, more

2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing > 135 - 142

2013 IEEE International Conference on High Performance Computing and Communications (HPCC) & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

In this paper, we propose a framework to automatically map single-device OpenCL programs to heterogeneous multi-device platforms with performance concerns. Our framework is based on the independence of work groups, which is built into the OpenCL programming model, and relies heavily on knowledge of the global memory access regions of work groups. So global memory access patterns of work groups are analyzed...

chapter

PIN-Cache: An Effective Cache Scheme Designed for Application Performance Insulation

Chengxiang Si, Bo Sun, Xiaoxuan Meng

2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing > 964 - 970

2013 IEEE International Conference on High Performance Computing and Communications (HPCC) & 2013 IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

In a shared storage environment, different types of applications share cache resources. Traditional cache management has two disadvantages. First, interference exists between applications that share one cache space, so applications cannot share cache resources fairly. Second, overall resource utilization is very low. To solve these problems, we design a cache management system - PIN-Cache,...

chapter

A Low Overhead and Reliable Nested Virtualization VMM for Cloud Computing

Xuan Yu, Qi Yong, Yuehua Dai, Jianbao Ren, more

2013 10th Web Information System and Application Conference > 400 - 405

2013 10th Web Information System and Application Conference (WISA)

Commodity operating systems have already gained the functionality of a virtual machine monitor. Nested virtualization is needed to run these commodity operating systems as virtual machines. Furthermore, with nested virtualization technology, users can run a self-configured virtual machine monitor (VMM) in the Infrastructure as a Service (IaaS) cloud computing model, and live migration of the VMM can be realized...

chapter

Optimizing OLAP heterogeneous computing based on Rabin-Karp Algorithm

Haytham I. M. Alzeini, Shihab A. Hameed, Mohamed. H. Habaebi

2013 IEEE International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA) > 1 - 6

2013 IEEE International Conference on Smart Instrumentation, Measurement and Applications (ICSIMA)

The amount of data has been growing boundlessly over the last few decades and is expected to keep growing exponentially in the future. However, a substantial portion of this accumulated data is discarded anyway. Processing capabilities have been considered one of the major barriers to exploiting this priceless mine. Therefore, the issue has absorbed a considerable part of researchers'...

chapter

A high-performance and energy-efficient CT reconstruction algorithm for multi-terabyte datasets

Edward S. Jimenez, Laurel J. Orr, Kyle R. Thompson, Ryeojin Park

2013 IEEE Nuclear Science Symposium and Medical Imaging Conference (2013 NSS/MIC) > 1 - 7

2013 IEEE Nuclear Science Symposium and Medical Imaging Conference (2013 NSS/MIC)

There has been much work done in implementing various GPU-based Computed Tomography reconstruction algorithms for medical applications showing tremendous improvement in computational performance. While many of these reconstruction algorithms could also be applied to industrial-scale datasets, the performance gains may be modest to non-existent due to a combination of algorithmic, hardware, or scalability...

chapter

One OpenCL to rule them all?

Romain Dolbeau, Francois Bodin, Guillaume Colin de Verdiere

2013 IEEE 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS) > 1 - 6

2013 IEEE 6th International Workshop on Multi-/Many-core Computing Systems (MuCoCoS)

OpenCL is now available on a very large set of processors. This makes the language an attractive layer for addressing multiple targets with a single code base. The question of how sensitive OpenCL code is to the underlying hardware in practice remains to be better understood.

INFONA - science communication portal
