The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but requires computing pixel -- wise operations resulting in a run time in the order of days when large volumes of data are processed. To overcome this limitation we propose a Graphics Processing Unit (GPU) based solution. A standard...
In this work, we investigate a transformation of VHDL descriptions into equivalent formal models. The targeted equivalence is at the level of the functional behavior. That is, we aim at producing formal models that have the same functional simulation behavior as the original VHDL implementation. We rely on the BIP component-based modeling language as the underlying formalism for this transformation...
Control-flow integrity (CFI) is a general defense against codereuse exploits that currently constitute a severe threat against diverse computing platforms. Existing CFI solutions (both in software and hardware) suffer from shortcomings such as (i) inefficiency, (ii) security weaknesses, or (iii) are not scalable. In this paper, we present a generic hardware-enhanced CFI scheme that tackles these problems...
The key to high performance for GPU architecture lies in massive threading to drive the large number of cores and enable overlapping of threading execution. However, in reality, the number of threads that can simultaneously execute is often limited by the size of the register file on GPUs. The traditional SRAM-based register file costs so large amount of chip area that it cannot scale to meet the...
Coarse-Grained Reconfigurable Architectures (CGRA) promise both low power and high performance coupled with flexibility, however automatic mapping of applications to such platforms remains a great research challenge. Efficient manual mapping of the data-centric kernels of applications yields great results, however these contain internally control-flow specific tasks, which introduce mapping irregularities...
Exchanging data on noncontiguous user buffers has been a dominant communication pattern in many scientific applications. The OpenSHMEM specification introduces a new set of communication routines to support strided data communication. Most high performance implementations of the OpenSHMEM specification support strided data communication by either packing/unpacking or multiple reads/writes based scheme,...
In this paper we propose a new degree of flexibility for soft processor design in which only the instructions relevant to the task at hand are implemented as a subset of the Instruction Set Architecture (ISA). These customized processors execute software kernels in the usual way, yet can be implemented with a fraction of the hardware resources used by other full- ISA soft processor cores. We present...
Computational methods have become an important part of gene delivery research, as they allow researchers to experiment with different models of cellular processes. Models of the gene delivery process based on telecommunication theory make this experimentation especially efficient. Therefore, this paper presents a specialised FPGA-accelerated heterogeneous architecture for simulating the gene delivery...
In this paper, we describe the Heterogeneous System Architecture Foundation's application to digital signal processors (DSP) and hardware accelerators. We provide an overview of the HSA runtime, system architecture and programmer's model, identify characteristics of DSPs and compare differences in algorithms to GPUs. We show an example mapping of HSA agents to a modern DSP using the HSA intermediate...
Bootloader is an important part of ARM embedded system. The realization of Bootloader is closely related with hardware. This paper gives a brief introduction to S3C2440 development board and its starting mode. Mainly introduces the initialization of the each function module in S3C2440 during the system startup, and scheme out the Simplified Bootloader for S3C2440. After testing, This Bootloader has...
The Single Instruction Multiple Thread (SIMT) architecture based Graphic Processing Units (GPUs) are emerging as more efficient platforms than Multiple Instruction Multiple Data (MIMD) architectures in exploiting parallelism. A GPU has numerous shader cores and thousands of simultaneous fine-grained active threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within...
Coarse-Grained Reconfigurable Architecture (CGRA) is a promising accelerator when considering both high performance and high power-efficiency. One of the challenges that CGRAs are confronting is to accelerate loops with control flow (if-then-else structures). Existing techniques employ predication to accelerate the conditionals but cannot accelerate nested conditionals efficiently. The state-of-the-art...
Embedded image processing systems have many challenges, due to large computational requirements and other physical, power, and environmental constraints. However recent contemporary mobile devices include a graphical processing unit (GPU) in order to offer better use interface in terms of graphics. Some of these embedded GPUs also support OpenCL which allows the use of computation capacity of embedded...
Current general-purpose processors are augmented with vector instructions that can process many elements of matrices and vectors in parallel. Transposing a matrix in-place is a main kernel operation required by many scientific and engineering applications to shuttle data before, during, or after processing. This operation increases the traffic on the memory bus and hence clever techniques such as...
Several programs demand large memory allocation to execute their tasks. Normally, the demands are based on intentions of program designers, users, and system administrators. Sometimes, however, faulty programs or malicious programs demand large memory without the intentions. These unexpected large memory demands may cause system instability. Generally, operating systems have resource limitation mechanisms...
Vulnerability amplification is an ever increasing problem in homogeneous large scale networks that operate many instances of the same operating system. Diversification of a process image through techniques such as Address Space Layout Randomization (ASLR) is a commonly used defense against vulnerability amplification. One approach to diversification of a process image is load-time diversity. This...
Kernel fusion is an optimization method, in which the code from several kernels is composed to create a new, fused kernel. It can push the performance of kernels beyond limits given for their isolated, unfused form. In this paper, we introduce a classification of different types of kernel fusion for both data dependent and data independent kernels. We study kernel fusion on three types of OpenCL devices:...
The Active Memory Cube (AMC) is a novel near-memory processor that exploits high memory bandwidth and low latency close to DRAM to execute scientific applications in an energy-efficient manner. Its energy efficiency is derived from a combination of its novel scalar-vector data-flow path combined with its simple control-flow path that required the development of a sophisticated compiler, co-designed...
Wide vector units in Intel's Xeon Phi accelerator cards can significantly boost application performance when used effectively. However, there is a lack of performance tools that provide programmers accurate information about the level of vectorization in their codes. This paper presents VecMeter, an easy-to-use tool to measure vectorization on the Xeon Phi. VecMeter utilizes binary instrumentation...
Kernel Malware resides and performs malicious functions in the operating system kernel space. It is more difficult to be detected and cleared than the malwares implemented in the user space because of its higher authority. It also has better flexibility compared with the malware based on the firmware. As a result, the kernel malware is one of the challenging threats in information security. This paper...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.