The use of Graphics Processing Units (GPUs) as computing accelerators has become widespread. Nevertheless, writing efficient GPU programs is a difficult and time-consuming task. In this paper we present the Linear Performance Breakdown Model (LPBM), an analytic model used to break down the execution time of a GPU kernel program into the three major components that affect its running time. The model can...
Android phone manufacturers are under perpetual pressure to move quickly on their new models, continuously customizing Android to fit their hardware. However, the security implications of this practice are not well understood, particularly when it comes to the changes made to Android's Linux device drivers, e.g., those for camera, GPS, and NFC. In this paper, we report the first study aimed at a better...
This paper presents an FPGA implementation of a simple processor architecture for accelerating data-parallel applications. Our proposed processor, called SuperSMP, can execute multi-scalar, vector, and matrix instructions on parallel execution datapaths. 4×32-bit instructions are fetched from the instruction cache. The fetched instructions are decoded and their dependencies are checked. Up to...
A wavelet-based multi-level adaptive unsharp masking technique for image sharpening is proposed. It sharpens at multiple levels of the DWT with a small, fixed-size Gaussian kernel and automatically adjusts for different amounts of blurring in different directions. The proposed method is free from kernel estimation. The algorithm is designed to run in a heterogeneous environment consisting of...
Bio-inspired techniques generally require significant computational resources. Because of their complexity and the computing power required for their execution, they have long been neglected. Recently, however, parallel resolution techniques that exploit graphics processing units (GPUs) have been increasingly used. These specialized processors are being widely adopted for the purpose of...
Reliability is a major concern in multiprocessors. Dynamic Reliability Management (DRM) aims at trading off processor performance against lifetime. State-of-the-art publications study only theory supported by simulation. This paper presents the first complete software implementation, running on real hardware, of a low-overhead, Android-compatible, workload-aware DRM Governor for mobile multiprocessors...
OpenCL has been designed to achieve functional portability across multi-core devices from different vendors. However, the lack of a single cross-target optimizing compiler severely limits performance portability of OpenCL programs. Programmers need to manually tune applications for each specific device, preventing effective portability. We target a compiler transformation specific for data-parallel...
IPv6 has been introduced but is not yet widely used. Research has gone in many directions: how to migrate from IPv4 to IPv6, how to adapt hardware devices to support a transitional period from coexistence of IPv4 and IPv6 to established use of IPv6, and how operating systems perform when using IPv6 as compared to IPv4. This work provides a comparative performance...
We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which GPU would run faster for a given kernel. Usage cases...
Network security has been a serious problem on the Internet. To address this issue, network intrusion detection tools have become indispensable for computer systems and network gateways. In this paper we propose an embedded, multi-core aware network intrusion detection system (NIDS), which has the following features: 1) It integrates a novel multi-core aware packet capture module, called the MCA ring,...
In this paper, we propose an implementation of a parallel one-dimensional fast Fourier transform (FFT) on GPU clusters. This implementation is based on the six-step FFT algorithm. Because the parallel one-dimensional FFT requires three all-to-all communications, one goal for parallel FFTs on GPU clusters is to minimize the PCI Express transfer time and the MPI communication time. We demonstrate that...
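For readers unfamiliar with the six-step FFT named in the abstract above, the following is a minimal single-process sketch in Python. It is not the authors' GPU-cluster implementation: the function names (`dft`, `transpose`, `six_step_fft`) and the factorization arguments `n1` and `n2` are hypothetical, and a naive DFT stands in for a tuned local FFT. The three transposes are the steps that would typically require all-to-all communication once the matrix is distributed across nodes.

```python
import cmath


def dft(v):
    """Naive O(n^2) DFT, standing in for an optimized local FFT routine."""
    n = len(v)
    return [
        sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def transpose(m):
    """Transpose a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]


def six_step_fft(x, n1, n2):
    """Length n1*n2 DFT of x via the six-step decomposition (illustrative)."""
    n = n1 * n2
    assert len(x) == n
    # View x as an n2 x n1 row-major matrix: a[j2][j1] = x[j1 + j2*n1].
    a = [list(x[r * n1:(r + 1) * n1]) for r in range(n2)]
    b = transpose(a)                 # step 1: transpose to n1 x n2
    c = [dft(row) for row in b]      # step 2: n1 transforms of length n2
    c = [                            # step 3: twiddle factors w_N^(j1*k2)
        [c[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
         for k2 in range(n2)]
        for j1 in range(n1)
    ]
    d = transpose(c)                 # step 4: transpose to n2 x n1
    e = [dft(row) for row in d]      # step 5: n2 transforms of length n1
    f = transpose(e)                 # step 6: transpose to n1 x n2
    # Row-major flattening yields X[k2 + k1*n2] in natural order.
    return [value for row in f for value in row]
```

Calling `six_step_fft(x, 3, 4)` on a 12-element list should agree with `dft(x)` up to floating-point rounding.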
Non-volatile memory (NVM) storage is becoming more popular as its performance and cost efficiency improve. Since the performance and characteristics of NVM storage differ significantly from those of HDDs, there is ongoing research on using SSDs more efficiently and effectively. It has been claimed that further improvements in NVM storage performance make it better to poll a storage device...
We present the development of one of the first libraries based on the so-called expression templates technique to simplify the implementation of CPU and parallel GPU codes. Expression templates allow matrix algebra operations, executed either on the CPU or on the GPU, to be expressed with a syntax very close to the natural mathematical one. The developed library has been heavily optimized so that the...
Heterogeneous architectures have been widely used in the domain of high-performance computing. On the one hand, they allow a designer to use multiple types of computing units, each executing the tasks it is best suited for, to increase performance. On the other hand, they bring many programming challenges for novice users, especially on heterogeneous systems with multiple devices. In this...
In this paper, we propose a framework to automatically map single-device OpenCL programs to heterogeneous multi-device platforms with performance in mind. Our framework is based on the independence of work groups, which is built into the OpenCL programming model, and relies heavily on knowledge of the global memory access regions of work groups. The global memory access patterns of work groups are therefore analyzed...
In a shared storage environment, different types of applications share cache resources. Traditional cache management has two disadvantages. First, applications that share one cache space interfere with each other, so they cannot share cache resources fairly. Second, overall resource utilization is very low. To solve these problems, we design a cache management system, PIN-Cache,...
Commodity operating systems have already gained the functionality of a virtual machine monitor. Nested virtualization is needed to run these commodity operating systems as virtual machines. Furthermore, with nested virtualization technology, users can run a self-configured virtual machine monitor (VMM) in the Infrastructure as a Service (IaaS) cloud computing model, and live migration of the VMM can be realized...
The amount of data has grown enormously over the last few decades and is expected to keep growing exponentially. However, a substantial part of this accumulated data is discarded. Processing capability has been considered one of the major barriers to exploiting this priceless mine. The issue has therefore absorbed a considerable part of researchers'...
There has been much work done in implementing various GPU-based Computed Tomography reconstruction algorithms for medical applications showing tremendous improvement in computational performance. While many of these reconstruction algorithms could also be applied to industrial-scale datasets, the performance gains may be modest to non-existent due to a combination of algorithmic, hardware, or scalability...
OpenCL is now available on a very large set of processors. This makes the language an attractive layer for addressing multiple targets with a single code base. How sensitive OpenCL code is to the underlying hardware in practice remains to be better understood.