ADAS (Advanced Driver Assistance Systems) algorithms increasingly use heavy image processing operations. To embed this type of algorithm, semiconductor companies offer many heterogeneous architectures. These SoCs (Systems on Chip) are composed of different processing units, with different capabilities, and often with a massively parallel computing unit. Due to the complexity of these SoCs, predicting...
Stream processing is a compute paradigm that promises safe and efficient parallelism. Its realization requires optimization of multiple parameters such as kernel placement and communications. Most techniques to optimize streaming systems use queueing network models or network flow models, which often require estimates of the execution rate of each compute kernel. This is known as the non-blocking...
This paper presents an SSD-based Block I/O Scheduler, SBIOS for short. SBIOS fully exploits the SSD's internal parallelism to improve system performance. It dispatches read requests to different blocks to make full use of that internal parallelism. For write requests, it tries to dispatch them to the same block to alleviate the block cross penalty and garbage collection overhead. The evaluation...
Enhancement algorithms can give low-light-level images a clear visual effect similar to that of images captured in daytime. However, because of their high complexity and heavy computational cost, low-light-level image enhancement algorithms usually struggle to meet real-time requirements, which makes them difficult to use widely in practical applications. To address this, a parallel optimization algorithm...
Graphics Processing Units (GPUs), based on the Single Instruction Multiple Thread (SIMT) architecture, are emerging as more efficient than Multiple Instruction Multiple Data (MIMD) architectures in exploiting parallelism. A GPU has numerous shader cores and thousands of simultaneously active fine-grained threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within a CTA...
Polynomial systems occur in many areas of science and engineering. Unlike general nonlinear systems, the algebraic structure makes it possible to compute all solutions of a polynomial system. We describe our massively parallel predictor-corrector algorithms to track many solution paths of a polynomial homotopy. The data parallelism that provides the speedups stems from the evaluation and differentiation of the...
Tasking is a prominent parallel programming model. In this paper we conduct a first study into the feasibility of task-parallel execution at the CUDA grid, rather than the stream/kernel level, for regular, fixed in-out dependency task graphs, similar to those found in wavefront computational patterns, making the findings broadly applicable. We propose and evaluate three CUDA task progression algorithms,...
Interferometric Synthetic Aperture Radar (InSAR) is a remote sensing technology used for estimating displacement of the earth's surface. Phase unwrapping is the most important step in InSAR processing and relies on successful selection of points that appear stable across a set of satellite images taken over time. This paper presents a new algorithm for selecting these points, a problem known as persistent...
Current tools for High-Level Synthesis (HLS) excel at exploiting Instruction-Level Parallelism (ILP), but their support for Data-Level Parallelism (DLP), one of the key advantages of Field Programmable Gate Arrays (FPGAs), is in contrast very limited. This work examines the exploitation of DLP on FPGAs using code generation for C-based HLS of image filters and streaming pipelines, consisting of point and...
Audio Fingerprinting (AFP) is a technology that demands huge computing power for responsiveness, accuracy, and robustness to noise. In this study, we aim to improve the computing speed of fingerprint extraction in an AFP system using the parallel programming language OpenCL. In particular, we also explore functional and speed portability across different platforms. The experimental results show that the portability...
Searching for the evolutionary relationships between groups of organisms has become a routine procedure in molecular biology. MrBayes is a popular model-based phylogenetic inference tool using Bayesian statistics. Unfortunately, the computational cost is very high, resulting in undesirably long execution times. In this paper, we present what we believe to be the fastest solution of the MrBayes MC3 algorithm...
Software Fault Injection (SFI) is an established technique for assessing the robustness of a software under test by exposing it to faults in its operational environment. Depending on the complexity of this operational environment, the complexity of the software under test, and the number and type of faults, a thorough SFI assessment can entail (a) numerous experiments and (b) long experiment run times,...
An R-tree is a data structure for organizing and querying multi-dimensional, non-uniform, and overlapping data. Efficient parallelization of R-trees is an important problem due to societal applications such as geographic information systems (GIS), spatial database management systems, and VLSI layout, which employ R-trees for spatial analysis tasks such as map overlay. As graphics processing units (GPUs)...
K-means is a vector quantization method that is now widely used for cluster analysis in massive data mining. Because it iteratively recomputes and sorts distances, k-means is computationally intensive and takes a huge amount of time to execute, especially when processing large graph data such as real-world social networks. This paper studies an alternative method...
GPU devices are becoming critical building blocks of high-performance platforms for performance and energy efficiency reasons. As a consequence, parallel programming environments such as OpenMP were extended to support offloading code to such devices. OpenMP compilers are faced with offering an efficient implementation of device-targeting constructs. One main issue in implementing OpenMP on a GPU is...
An implementation of the background subtraction algorithm using the OpenCL platform is presented. The algorithm processes a live stream of video frames from a surveillance camera in online mode. Processing is performed using a host machine and a parallel computing device. The work focuses on optimizing an OpenCL algorithm implementation for GPU devices by taking into account specific features of the GPU...
In recent years, many intelligent autonomous systems have emerged thanks to tremendous advances in technologies such as computer vision, automation and control engineering, and sensor technology. One such intelligent system is the autonomous underwater vehicle (AUV) for ocean floor mapping using SONAR technology. The success of this autonomous, smart, and precise intelligent system depends...
Stencil computations are at the heart of many physical simulations used in scientific codes. Thus, there exists a plethora of optimization efforts for this family of computations. Among these, tiling techniques that allow concurrent start have proven to be very efficient in providing better performance for these critical kernels. Nevertheless, with many-core designs being the norm, these...
One of the necessary conditions for gaining performance improvements on heterogeneous multi-core systems is to exploit the parallelism in the program. The compiler applies various transformations to the code to achieve execution efficiency. Code optimization is one of the important tasks performed by the compiler before generating the target code. With the availability of various parallel programming models in...
Supporting dynamic parallelism is important for GPUs to benefit a broad range of applications. There are currently two fundamental ways for programs to exploit dynamic parallelism on GPUs: a software-based approach with software-managed worklists, and a hardware-based approach through dynamic subkernel launches. Neither is satisfactory. The former is complicated to program and is often subject to some...