Heterogeneous computing systems increase the performance of parallel computing in many domains of general-purpose computing by combining CPUs, GPUs, and other accelerators. Open Computing Language (OpenCL) is the first open, royalty-free standard for heterogeneous computing across multiple hardware platforms. In this paper, we propose a parallel Motion Estimation (ME) algorithm implemented using OpenCL and present several...
A task-parallel approach to programming commodity graphics hardware is useful for implementing irregular parallel workloads with dependencies, particularly for applications such as video encoding and backtracking algorithms. The featured Web extra is a video that demonstrates how to use a GPU task-parallel model for H.264 intra prediction. The authors first describe the dependency structure and then...
This paper presents an implementation of the Jacobi power flow algorithm to be run on a single instruction multiple data (SIMD) unit processor. The purpose is to be able to solve a large number of power flows in parallel as quickly as possible. This well-known algorithm was modified taking into account the characteristics of the SIMD architecture. The results show a significant speed-up of the algorithm...
Parallel data processing is now one of the basic approaches to computation. It can be realized with multi-core processors or, following a newer trend, with the graphics accelerators found on recent graphics cards. However, this process is not straightforward: it requires an adequate model structure and parallelization of the application program. An adequate model structure includes not only parallelization...
This paper presents the implementation of the discrete cosine transform (DCT) on the GPU. The study indicates a clear superiority of the GPU over the CPU as a parallel processor for image compression using the DCT. It also indicates that increasing the image size considerably slowed the CPU but did not affect the GPU.
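For readers unfamiliar with the transform, the following is a minimal CPU-only sketch of a naive 2D DCT-II in Python. It is not the implementation from the paper above: the function name `dct_2d`, the orthonormal scaling, and the 8x8 block size in the usage note are assumptions chosen for illustration. It does show why the DCT may suit a GPU: each output coefficient depends only on the input block, so all coefficients could in principle be computed independently.

```python
import math


def dct_2d(block):
    """Naive orthonormal 2D DCT-II of a square block (list of lists).

    Hypothetical helper for illustration only; O(n^4) and not optimized.
    """
    n = len(block)

    def alpha(k):
        # Orthonormal scaling factor for frequency index k.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    # Each (u, v) coefficient is independent of the others, which is the
    # data parallelism a GPU version could map to one thread per coefficient.
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (
                        block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                        * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    )
            out[u][v] = alpha(u) * alpha(v) * total
    return out
```

As a usage check under these assumptions, a constant 8x8 block should put all of its energy into the single DC coefficient `out[0][0]`, with every other coefficient numerically zero.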
A study of the fundamental obstacles to accelerating the preconditioned conjugate gradient (PCG) method on modern graphics processing units (GPUs) is presented, and several techniques are proposed to enhance its performance over previous work, independent of the GPU generation and the matrix sparsity pattern. The proposed enhancements increase the performance of PCG up to 23 times compared to vector optimized...
Much research exists for the efficient processing of spatio-temporal data streams. However, all methods ultimately rely on an ill-equipped processor, namely a CPU, to evaluate concurrent, continuous spatio-temporal queries over these data streams. This paper presents GEDS, a scalable, Graphics Processing Unit (GPU)-based framework for the evaluation of continuous spatio-temporal queries over spatio-temporal...
This paper describes our novel work on using GPUs to improve the performance of a homography-based visual servo system. We present our novel implementations of a GPU-based Efficient Second-order Minimization (GPU-ESM) algorithm. By utilizing the tremendous parallel processing capability of a GPU, we have obtained significant acceleration over its CPU counterpart. Currently our GPU-ESM algorithm can...
This paper proposes a triangular solve algorithm with variable block size for graphics processing units (GPUs). By using recursive inversion of diagonal blocks, this algorithm works with a tunable block size to achieve the best performance. Various methods are shown for using existing profiling tools to measure and analyze the performance of this algorithm. We use some of the most...
In this paper, we present a novel implementation of a Cellular Genetic Algorithm (cGA) model for a multi-GPU platform using NVIDIA's CUDA technology. This multi-GPU cGA model is compared first against a serial version on the CPU and then against an implementation on a single GPU. We divide the different operations of the cGA into distinct sets of instructions called kernels. Using the multi-GPU platform...
Among the top-performing stereo algorithms on the Middlebury Stereo Database, Semi-Global Matching (SGM) is commonly regarded as the most efficient algorithm. Consequently, real-time implementations of the algorithm for graphics hardware (GPU) and reconfigurable hardware (FPGA) exist. However, the computation time on general purpose PCs is still more than a second. In this paper, a real-time SGM implementation...
Recently, sparse-representation-based methods with large dictionaries have shown state-of-the-art results on computer vision problems such as face recognition and super-resolution. However, such methods are computationally prohibitive for typical CPUs, especially for a large dictionary size. We present a fast implementation of these methods by exploiting the massively parallel processing capabilities...
The purpose of this paper is to implement an association rule mining algorithm using the Nvidia CUDA framework for general-purpose computing on GPUs. The major objective is to compare the performance of a C-based implementation of the association rule mining algorithm on Intel Quad Core/Core2 Duo CPUs with a CUDA-based implementation on Nvidia G80 and GTX 200 series GPUs. The final outcome of this research...
This work presents an implementation of the neocognitron neural network using a high-performance computing architecture based on the GPU (graphics processing unit). The neocognitron is an artificial neural network, proposed by Fukushima and collaborators, consisting of several hierarchical stages of neuron layers organized in two-dimensional matrices called cellular planes. For the high performance computation...
The increase in the computational power of programmable GPUs (graphics processing units) brings new concepts for using these devices for generic processing. Hence, with the use of the CPU and the GPU for data processing come new ideas that deal with the distribution of tasks between the CPU and the GPU, such as automatic distribution. The importance of the automatic distribution of tasks between CPU and GPU lies in...