This paper describes an original joint obstacle detection and tracking method based on a Mean Shift algorithm and semi-dense disparity maps. The semi-dense disparity maps are computed with a local 1D fuzzy scanline stereo matching approach. Each map is associated with a confidence map that is used to remove bad matches. The Mean Shift algorithm is applied to simultaneously extract each vehicle and track...
In this paper, we present a novel approach for using a GPU-based Cloud computing infrastructure to efficiently perform a structural comparison of protein binding sites. The original CPU-based Java version of a recent graph-based algorithm called SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been...
Optimization problems that contain discontinuities, non-linearity, or high dimensionality are difficult and time-consuming to solve using conventional computational methods. This paper introduces a tool that solves these kinds of optimization problems using a patent-pending Gaming Particle Swarm Optimization (GPSO) algorithm implemented on Graphics Processing Unit (GPU) hardware. Our study applied...
The liquid chromatography-based tandem mass spectrometry (LC-MS) technique allows for identification and quantification of thousands of proteins in parallel. Coupled with a feed-forward artificial neural network, it provides a way to analyze and select protein panels for use in multi-biomarker panel discovery applications. In this study, we enhance this technique by utilizing massively...
A brute-force algorithm to solve small instances of the Dominating Set Problem on GPUs is presented. Two implementations of the algorithm are discussed, one that uses atomic operations and one that uses reductions. Experimental results are reported.
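As a rough illustration of the brute-force idea in the abstract above, the following is a minimal sequential sketch in Python, not the authors' GPU implementation. It assumes the graph is given as a vertex count and an edge list, and the function names (`closed_neighborhood_masks`, `is_dominating`, `min_dominating_set`) are hypothetical. Each candidate subset is checked independently, which is the property a GPU version could exploit by assigning subsets to threads and combining the best result with either an atomic operation or a reduction.

```python
from itertools import combinations


def closed_neighborhood_masks(n, edges):
    """Return one bitmask per vertex covering the vertex and its neighbors."""
    masks = [1 << v for v in range(n)]
    for u, v in edges:
        masks[u] |= 1 << v
        masks[v] |= 1 << u
    return masks


def is_dominating(subset, masks, n):
    """Check whether every vertex is in the subset or adjacent to it."""
    covered = 0
    for v in subset:
        covered |= masks[v]
    return covered == (1 << n) - 1


def min_dominating_set(n, edges):
    """Try subsets in order of increasing size; return the first that dominates."""
    masks = closed_neighborhood_masks(n, edges)
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            if is_dominating(subset, masks, n):
                return subset
    return tuple(range(n))  # not reached: the full vertex set always dominates


if __name__ == "__main__":
    # Path graph 0-1-2-3-4 (an illustrative input, not from the paper).
    path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(min_dominating_set(5, path_edges))
```

The search is exponential in the number of vertices, which is why the approach is limited to small instances.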
Cross equalization is the core step of time-lapse seismic data processing; it can effectively eliminate the influence of inconsistencies in acquisition, data processing, and tube processing parameters. As the volume of time-lapse seismic data and the amount of processing it requires increase, running seismic data processing on massively parallel processors becomes an inevitable trend. It deals with the time-lapse...
A GPU accelerated implementation of a reduced-order model of the human arterial circulation is introduced. The computationally intensive tasks of the algorithm (namely, the computation of the flow rate and area values at the interior grid points of the domain) have been migrated to the GPU. The CPU not only coordinates the actions performed by the GPU, but it also computes the inflow, bifurcation...
Synthetic aperture radar (SAR) image formation via backprojection offers a robust mechanism by which to form images on general, non-planar surfaces, without often restrictive assumptions regarding the planarity of the wavefront at the locations being imaged. However, backprojection presents a substantially increased computational load relative to other image formation algorithms that typically depend...
We illustrate how employing Graphics Processing Units (GPUs) can speed up intensive image processing operations. In particular, we demonstrate the use of the NVIDIA CUDA architecture to implement a color digital binary halftoning algorithm based on Direct Binary Search (DBS). Halftoning a color image is more computationally expensive than the single color case as there is a need to minimize dot interaction...
Graphics Processing Units (GPUs) have been widely used as accelerators in large-scale heterogeneous computing systems. However, current programming models can only support the utilization of local GPUs. When using non-local GPUs, programmers need to explicitly call API functions for data communication across computing nodes. As such, programming GPUs in large-scale computing systems is more challenging...
In this paper, we propose an efficient implementation of the branch and bound method for knapsack problems on a CPU-GPU system via CUDA. Branch and bound computations can be carried out either on the CPU or on a GPU according to the size of the branch and bound list. Better management of GPU memory, fewer GPU-CPU communications, and better synchronization between GPU threads are proposed in this...
In the last few years, Graphics Processing Units (GPUs) have become a great tool for massively parallel computing. GPUs are specifically designed for throughput and face several design challenges, especially what are known as the Power and Memory Walls. In these devices, available resources should be used to enhance performance and throughput, as the performance per watt is really high. For massively...
Unprecedented production of short reads from the new high-throughput sequencers has posed challenges to align short reads to reference genomes with high sensitivity and high speed. Many CPU-based short read aligners have been developed to address this challenge. Among them, one popular approach is the seed-and-extend heuristic. For this heuristic, the first and foremost step is to generate seeds between...
We present an automated symbolic verifier for checking the functional correctness of GPGPU kernels parametrically, for an arbitrary number of threads. Our tool checks the functional equivalence of a kernel and its optimized versions, helping debug errors introduced during memory coalescing and bank conflict elimination related optimizations. Key features of our work include: (1) a symbolic method...
Genetic Algorithms (GAs) are powerful search techniques. However, when they are applied to complex problems, they consume a large amount of computational power. One way to make them faster is to use a parallel implementation. This paper presents a parallel implementation of Combinatorial Optimisation with Coincidence Algorithm (COIN) on Graphics Processing Units. COIN is a modern GA. It has a wide range...
This paper elaborates on a new parallel optimization algorithm specially engineered to run on Graphics Processing Units (GPUs). The underlying operation relates to Systolic Computation. The algorithm, called Systolic Genetic Search (SGS), is based on the synchronous circulation of solutions through a grid of processing units and tries to profit from the parallel architecture of GPUs. The proposed...
A direct communication facility, called DCFA, for a many-core based cluster, whose compute node consists of many-core units connected to the host via PCI Express with Infiniband, is designed and evaluated. Because a many-core unit is a device of the PCI Express bus, it is not capable of configuring and initializing the Infiniband HCA, according to the PCI Express specification. This means that the...
Modern computer systems are becoming increasingly heterogeneous by comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to use a combination of several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (Distributed...
Intensity models with blur effects are widely employed to accurately simulate the imaging process of a star simulator used for attitude determination and guiding feedback. The model is computationally intensive and the time requirements are proportional to the number of stars in the simulation, imposing great demands on computing power for realistic uses. This paper presents two star simulators using...
We detail the design and experiences in delivering a specialty multicore computing course whose materials are openly available. The course ambitiously covers three multicore programming paradigms: shared memory (OpenMP), device (CUDA) and message passing (RCCE), and involves significant practical work on their respective platforms: an UltraSPARC T2, Fermi GPU and the Intel Single-Chip Cloud Computer...