Search results

chapter

GPU-Accelerated Texture Analysis Using Steerable Riesz Wavelets

Anamaria Vizitiu, Lucian Mihai Itu, Ranveer Joyseeree, Adrien Depeursinge, more

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 431 - 434

Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but it requires pixel-wise operations, resulting in run times on the order of days when large volumes of data are processed. To overcome this limitation, we propose a Graphics Processing Unit (GPU) based solution. A standard...

chapter

Transforming VHDL descriptions into formal component-based models

Ayoub Nouri, Rahma Ben Atitallah, Anca Molnos, Christian Fabre, more

2016 International Symposium on Rapid System Prototyping (RSP) > 1 - 8

In this work, we investigate a transformation of VHDL descriptions into equivalent formal models. The targeted equivalence is at the level of the functional behavior. That is, we aim at producing formal models that have the same functional simulation behavior as the original VHDL implementation. We rely on the BIP component-based modeling language as the underlying formalism for this transformation...

chapter

Strategy without tactics: Policy-agnostic hardware-enhanced control-flow integrity

Dean Sullivan, Orlando Arias, Lucas Davi, Per Larsen, more

2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

Control-flow integrity (CFI) is a general defense against code-reuse exploits, which currently constitute a severe threat to diverse computing platforms. Existing CFI solutions (in both software and hardware) suffer from shortcomings such as (i) inefficiency, (ii) security weaknesses, or (iii) poor scalability. In this paper, we present a generic hardware-enhanced CFI scheme that tackles these problems...

chapter

Performance-centric register file design for GPUs using racetrack memory

Shuo Wang, Yun Liang, Chao Zhang, Xiaolong Xie, more

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC) > 25 - 30

The key to high performance in GPU architectures lies in massive threading, which drives the large number of cores and enables overlapping of thread execution. In practice, however, the number of threads that can execute simultaneously is often limited by the size of the GPU register file. The traditional SRAM-based register file occupies so much chip area that it cannot scale to meet the...

chapter

Design and synthesis of reconfigurable control-flow structures for CGRA

Zoltan Endre Rakossy, Axel Acosta-Aponte, Tobias G. Noll, Gerd Ascheid, more

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 8

Coarse-Grained Reconfigurable Architectures (CGRAs) promise low power and high performance coupled with flexibility; however, automatic mapping of applications to such platforms remains a major research challenge. Efficient manual mapping of the data-centric kernels of applications yields good results; however, these kernels internally contain control-flow-specific tasks, which introduce mapping irregularities...

chapter

High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR

Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Jie Zhang, more

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 244 - 253

Exchanging data on noncontiguous user buffers has been a dominant communication pattern in many scientific applications. The OpenSHMEM specification introduces a new set of communication routines to support strided data communication. Most high-performance implementations of the OpenSHMEM specification support strided data communication using either a packing/unpacking scheme or a scheme based on multiple reads/writes,...
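To illustrate the packing/unpacking baseline the abstract mentions, here is a minimal single-process Python sketch: strided elements are gathered into a contiguous buffer, "sent", and scattered back at the same strides on the receiving side. The helpers `pack_strided` and `unpack_strided` are hypothetical names, not OpenSHMEM routines, and this is not the InfiniBand UMR design the paper proposes.

```python
from typing import List


def pack_strided(buf: List[float], offset: int, stride: int, count: int) -> List[float]:
    """Gather `count` elements starting at `offset`, `stride` apart, into a contiguous list.

    Hypothetical helper: stands in for the packing step of a strided transfer.
    """
    return [buf[offset + i * stride] for i in range(count)]


def unpack_strided(packed: List[float], dest: List[float], offset: int, stride: int) -> None:
    """Scatter a contiguous list into `dest` at the same strided positions (in place).

    Hypothetical helper: stands in for the unpacking step on the receiving side.
    """
    for i, value in enumerate(packed):
        dest[offset + i * stride] = value


# Example: transfer column 1 of a row-major 3x4 matrix (stride = number of columns).
matrix = [float(v) for v in range(12)]  # rows: 0..3, 4..7, 8..11
column = pack_strided(matrix, offset=1, stride=4, count=3)  # contiguous "message"

remote = [0.0] * 12  # receiver's buffer, same layout
unpack_strided(column, remote, offset=1, stride=4)
```

The extra copies in and out of the contiguous buffer are the overhead that motivates offloading strided layouts to the network hardware instead.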

chapter

Designing customized ISA processors using high level synthesis

Sam Skalicky, Tejaswini Ananthanarayana, Sonia Lopez, Marcin Lukowiak

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 6

In this paper, we propose a new degree of flexibility for soft processor design in which only the instructions relevant to the task at hand are implemented as a subset of the Instruction Set Architecture (ISA). These customized processors execute software kernels in the usual way, yet can be implemented with a fraction of the hardware resources used by other full-ISA soft processor cores. We present...

chapter

FPGA-accelerated simulation engine for non-viral gene delivery

David Perlaza, Adam Postula, Lech Jozwiak, Tadeusz Wysocki

2015 9th International Conference on Signal Processing and Communication Systems (ICSPCS) > 1 - 5

Computational methods have become an important part of gene delivery research, as they allow researchers to experiment with different models of cellular processes. Models of the gene delivery process based on telecommunication theory make this experimentation especially efficient. Therefore, this paper presents a specialised FPGA-accelerated heterogeneous architecture for simulating the gene delivery...

chapter

HSA-enabled DSPs and accelerators

John Glossner, Paul Blinzer, Jarmo Takala

2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP) > 1407 - 1411

In this paper, we describe the application of the Heterogeneous System Architecture (HSA) Foundation's specifications to digital signal processors (DSPs) and hardware accelerators. We provide an overview of the HSA runtime, system architecture, and programmer's model, identify characteristics of DSPs, and compare their algorithmic differences with GPUs. We show an example mapping of HSA agents to a modern DSP using the HSA intermediate...

chapter

Design Optimization and Implementation of Bootloader in Embedded System Development

Chen Sha, Zhu-Ying Lin

2015 International Conference on Computer Science and Applications (CSA) > 151 - 156

The bootloader is an important part of an ARM embedded system, and its implementation is closely tied to the hardware. This paper briefly introduces the S3C2440 development board and its boot modes. It mainly describes the initialization of each functional module of the S3C2440 during system startup and outlines a simplified bootloader for the S3C2440. After testing, this bootloader has...

chapter

A comparative analysis of resource requirements for parallel applications in GPGPU

Winnie Thomas, Rohin D. Daruwala

TENCON 2015 - 2015 IEEE Region 10 Conference > 1 - 6

Graphics Processing Units (GPUs) based on the Single Instruction Multiple Thread (SIMT) architecture are emerging as platforms that exploit parallelism more efficiently than Multiple Instruction Multiple Data (MIMD) architectures. A GPU has numerous shader cores and thousands of simultaneously active fine-grained threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within...

chapter

Acceleration of nested conditionals on CGRAs via trigger scheme

Shouyi Yin, Pengcheng Zhou, Leibo Liu, Shaojun Wei

2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) > 597 - 604

Coarse-Grained Reconfigurable Architecture (CGRA) is a promising accelerator in terms of both high performance and high power efficiency. One challenge CGRAs face is accelerating loops with control flow (if-then-else structures). Existing techniques employ predication to accelerate conditionals but cannot accelerate nested conditionals efficiently. The state-of-the-art...

chapter

GPU based parallel image processing library for embedded systems

Mustafa Cavus, Hakki Doganer Sumerkan, Osman Seckin Simsek, Hasan Hassan, more

2014 International Conference on Computer Vision Theory and Applications (VISAPP) > 1 > 234 - 241

Embedded image processing systems face many challenges due to large computational requirements and other physical, power, and environmental constraints. However, contemporary mobile devices include a graphics processing unit (GPU) to offer a better user interface in terms of graphics. Some of these embedded GPUs also support OpenCL, which allows the use of the computation capacity of embedded...

chapter

Restructuring and implementations of 2D matrix transpose algorithm using SSE4 vector instructions

Ahmed S. Zekri

2015 International Conference on Applied Research in Computer Science and Engineering (ICAR) > 1 - 7

Current general-purpose processors are augmented with vector instructions that can process many elements of matrices and vectors in parallel. Transposing a matrix in-place is a main kernel operation required by many scientific and engineering applications to shuttle data before, during, or after processing. This operation increases the traffic on the memory bus and hence clever techniques such as...
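For reference, here is the plain scalar in-place transpose that vectorized versions restructure, as a minimal Python sketch assuming a square matrix stored as a list of rows. It is not the paper's SSE4 algorithm; SIMD variants typically operate on small tiles (for example, a 4x4 block of 32-bit floats fits in four 128-bit SSE registers) instead of swapping one element pair at a time.

```python
from typing import List


def transpose_in_place(a: List[List[int]]) -> None:
    """Transpose a square n x n matrix in place.

    Each element above the diagonal is swapped with its mirror below it,
    so no second matrix is allocated. Assumes a square matrix.
    """
    n = len(a)
    for row in a:
        if len(row) != n:
            raise ValueError("this sketch only handles square matrices")
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; the diagonal stays put
            a[i][j], a[j][i] = a[j][i], a[i][j]


m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
transpose_in_place(m)
```

Each swap touches two memory locations that are far apart for large `n`, which is why the access pattern, not the arithmetic, dominates the cost.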

chapter

On-the-Fly Process Resource Quarantine for System Stabilization

Gaku Nakagawa, Hirotaka Kawata, Shuichi Oikawa

2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing (CIT/IUCC/DASC/PICOM) > 517 - 524

Some programs demand large memory allocations to execute their tasks. Normally, these demands reflect the intentions of program designers, users, and system administrators. Sometimes, however, faulty or malicious programs demand large amounts of memory contrary to those intentions. These unexpected large memory demands may cause system instability. Generally, operating systems have resource limitation mechanisms...

chapter

The KPLT: The Kernel as a shared object

Scott Brookes, Martin Osterloh, Robert Denz, Stephen Taylor

MILCOM 2015 - 2015 IEEE Military Communications Conference > 954 - 959

Vulnerability amplification is an ever-increasing problem in homogeneous large-scale networks that run many instances of the same operating system. Diversification of a process image through techniques such as Address Space Layout Randomization (ASLR) is a commonly used defense against vulnerability amplification. One approach to diversification of a process image is load-time diversity. This...

chapter

OpenCL Kernel Fusion for GPU, Xeon Phi and CPU

Jiri Filipovic, Siegfried Benkner

2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) > 98 - 105

Kernel fusion is an optimization method in which the code from several kernels is composed to create a new, fused kernel. It can push the performance of kernels beyond the limits of their isolated, unfused form. In this paper, we introduce a classification of different types of kernel fusion for both data-dependent and data-independent kernels. We study kernel fusion on three types of OpenCL devices:...
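As a minimal illustration of data-dependent fusion, the Python sketch below composes two hypothetical element-wise "kernels" into one. Python lists stand in for device buffers; on a GPU, the unfused version would write the intermediate array to global memory and read it back in a second launch, while the fused version keeps the intermediate value in a local variable (a register). This is not the paper's OpenCL code or its classification.

```python
from typing import List


# Hypothetical kernel 1: scale every element (writes an intermediate array).
def scale_kernel(x: List[float], alpha: float) -> List[float]:
    return [alpha * v for v in x]


# Hypothetical kernel 2: add an offset (reads the intermediate array back).
def offset_kernel(y: List[float], beta: float) -> List[float]:
    return [v + beta for v in y]


# Fused kernel: one pass over the data; the intermediate never leaves the loop body.
def fused_kernel(x: List[float], alpha: float, beta: float) -> List[float]:
    return [alpha * v + beta for v in x]


data = [1.0, 2.0, 3.0]
unfused = offset_kernel(scale_kernel(data, 2.0), 0.5)  # two passes, one temporary list
fused = fused_kernel(data, 2.0, 0.5)                   # one pass, no temporary list
```

Both variants compute the same result; the fused one halves the passes over memory, which is the kind of saving fusion targets for memory-bound kernels.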

chapter

Progressive Codesign of an Architecture and Compiler Using a Proxy Application

Arpith Jacob, Ravi Nair, Tong Chen, Zehra Sura, more

2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) > 57 - 64

The Active Memory Cube (AMC) is a novel near-memory processor that exploits the high bandwidth and low latency available close to DRAM to execute scientific applications energy-efficiently. Its energy efficiency derives from its novel scalar-vector data-flow path combined with a simple control-flow path, which required the development of a sophisticated compiler, co-designed...

chapter

VecMeter: Measuring Vectorization on the Xeon Phi

Joshua Peraza, Ananta Tiwari, William A. Ward, Roy Campbell, more

2015 IEEE International Conference on Cluster Computing (CLUSTER) > 478 - 481

Wide vector units in Intel's Xeon Phi accelerator cards can significantly boost application performance when used effectively. However, there is a lack of performance tools that provide programmers with accurate information about the level of vectorization in their code. This paper presents VecMeter, an easy-to-use tool to measure vectorization on the Xeon Phi. VecMeter utilizes binary instrumentation...

chapter

Kernel Malware Core Implementation: A Survey

XiangYu Li, Yi Zhang, Yong Tang

2015 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC) > 9 - 15

Kernel malware resides in, and performs malicious functions from, the operating system kernel space. Because of its higher privileges, it is more difficult to detect and remove than malware implemented in user space. It is also more flexible than firmware-based malware. As a result, kernel malware is a challenging threat to information security. This paper...

INFONA - science communication portal