Finite element analysis involves the solution of linear systems described by large sparse matrices. Iterative Krylov methods are well suited to this type of problem. These methods require linear algebra operations, including sparse matrix-vector multiplication, which can be computationally expensive for large matrices. In this paper, we present the best way to perform these operations,...
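As a point of reference for the sparse matrix-vector multiplication mentioned above, here is a minimal sequential sketch in Python. It assumes the common CSR (compressed sparse row) layout; the abstract does not say which storage format or GPU kernel the paper uses, and the function name `csr_spmv` and the sample matrix are hypothetical.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits the entries of row i
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Only the stored nonzeros of this row contribute to y[row].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Hypothetical 3x3 example: [[4, 0, 1], [0, 3, 0], [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = csr_spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(y)  # [7.0, 6.0, 17.0]
```

Each row's dot product is independent of the others, which is what makes the operation a natural candidate for parallel hardware.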
GPUs and other accelerators are available on many different devices, while GPGPU has been massively adopted by the HPC research community. Although a plethora of libraries and applications providing GPU support are available, the need to implement new algorithms from scratch, or to adapt sequential programs to accelerators, will always exist. Writing CUDA or OpenCL codes, although an easier task...
When converting a serial program to a parallel program that can run on a Graphics Processing Unit (GPU), the developer must choose which functions will run on the GPU. For each function the developer chooses, he or she needs to manually write code to: 1) serialize state to GPU memory, 2) define the kernel code that the GPU will execute, 3) control the kernel launch and 4) deserialize state back to CPU...
Portfolio risk is commonly defined as the standard deviation of the portfolio's return. The empirical correlation matrix of asset returns in a portfolio has an intrinsic noise component. This noise is filtered for more robust performance. Eigendecomposition is a widely used method for noise filtering. The Jacobi algorithm has been a popular eigensolver technique due to its stability. We present an efficient GPU...
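To illustrate the eigensolver named above, the following is a minimal sequential sketch of the classical cyclic Jacobi eigenvalue method for a symmetric matrix. It is not the paper's GPU implementation, which the abstract does not describe; the function name `jacobi_eigenvalues`, its tolerance and sweep limit, and the sample correlation matrix are all assumptions made for illustration.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=50):
    """Return the eigenvalues of a symmetric matrix in ascending order."""
    n = len(matrix)
    a = [row[:] for row in matrix]  # work on a copy
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (
                    abs(theta) + math.sqrt(theta * theta + 1.0)
                )
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


# Hypothetical 3-asset correlation matrix.
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
eigs = jacobi_eigenvalues(corr)
print(eigs)
```

For this sample matrix the eigenvalues are 0.5, 1 and 1.5 up to rounding. How the eigenvalues are then used to filter noise is not specified in the abstract and is left out here.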
Efficient memory bandwidth utilization on GPU accelerators is crucial for memory-bound applications. In medical imaging, the performance of many kernels is limited by the available memory bandwidth since only a few operations are performed per pixel. For such kernels only a fraction of the compute power provided by GPU accelerators can be exploited and performance is predetermined by memory bandwidth...
We present a software package that supports teaching different parallel programming models in a computational science and engineering context. It implements a Finite Volume solver for the shallow water equations, with application to tsunami simulation in mind. The numerical model is kept simple, using patches of Cartesian grids as computational domain, which can be connected via ghost layers. The...
The construction of phylogenetic trees is important in computational biology, especially for the development of biological taxonomies. UPGMA is one of the most popular heuristic algorithms for constructing ultrametric trees (UT). Although the UT constructed by UPGMA is often not a true tree unless the molecular clock assumption holds, the UT is still useful for clocklike data. However,...
Solving Combinatorial Optimization Problems (COPs) exactly using a Branch-and-Bound (B&B) algorithm requires a huge amount of computational resources. Therefore, we recently investigated designing B&B algorithms on top of graphics processing units (GPUs) using a parallel bounding model. The proposed model assumes parallelizing the evaluation of the lower bounds on pools of sub-problems...
GPUs have been successfully used to accelerate many mathematical functions and libraries. A common limitation of those libraries is that the primitives being handled must reach a minimum size before significant speedups over their CPU versions are achieved. This minimum size requirement can prove prohibitive for many applications. It can be loosened by batching operations to have a sufficient amount of data...
Sparse matrix-vector multiplication (SpMV) is often a performance bottleneck in iterative solvers. Recently, Graphics Processing Units (GPUs) have been deployed to enhance the performance of this operation. We present BTJAD, a blocked version of the Transposed Jagged Diagonal storage format that is tailored for GPUs. We develop a highly optimized SpMV kernel that takes advantage of the properties...
Along with the inclusion of GPU cores within the same CPU die, the performance of Intel's processor-graphics has been significantly improved over earlier generations of integrated graphics. This paper presents a highly optimized SURF cascade based face detector which efficiently exploits both CPU and GPU computing power on the latest Sandy Bridge processor. The SURF cascade classifier procedure is...
This paper describes an original joint obstacle detection and tracking method based on a Mean Shift algorithm and semi-dense disparity maps. The semi-dense disparity maps are computed with a local 1D fuzzy scanline stereo matching approach. Each map is associated with a confidence map that is used to remove bad matches. The Mean Shift algorithm is applied to simultaneously extract each vehicle and track...
In this paper, we present a novel approach for using a GPU-based Cloud computing infrastructure to efficiently perform a structural comparison of protein binding sites. The original CPU-based Java version of a recent graph-based algorithm called SEGA has been rewritten in OpenCL to run on NVIDIA GPUs in parallel on a set of Amazon EC2 Cluster GPU Instances. This new implementation of SEGA has been...
Optimization problems that contain discontinuities, non-linearity, or high dimensionality are difficult and time-consuming to solve using conventional computational methods. This paper introduces a tool that solves these kinds of optimization problems using a patent-pending Gaming Particle Swarm Optimization (GPSO) algorithm implemented on Graphics Processing Unit (GPU) hardware. Our study applied...
The liquid chromatography-based tandem mass spectrometry (LC-MS) technique allows for identification and quantification of thousands of proteins in parallel. Coupled with a feed-forward artificial neural network, it provides a way to analyze and select protein panels for use in multi-biomarker panel discovery applications. In this study, we enhance this technique by utilizing massively...
A brute-force algorithm to solve small instances of the Dominating Set Problem on GPUs is presented. Two implementations of the algorithm are discussed, one that uses atomic operations and one that uses reductions. Experimental results are reported.
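For orientation, a brute-force search for a minimum dominating set can be sketched sequentially as below. This is a plain CPU illustration under stated assumptions, not either of the GPU implementations the abstract mentions; the function name `min_dominating_set`, the bitmask encoding and the sample graph are hypothetical.

```python
from itertools import combinations


def min_dominating_set(n, edges):
    """Return a smallest dominating set of an undirected graph on vertices 0..n-1."""
    # closed[v] is a bitmask of v together with its neighbours.
    closed = [1 << v for v in range(n)]
    for u, v in edges:
        closed[u] |= 1 << v
        closed[v] |= 1 << u
    full = (1 << n) - 1
    # Try candidate sets in order of increasing size, so the first hit is minimum.
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            covered = 0
            for v in subset:
                covered |= closed[v]
            if covered == full:  # every vertex is in the set or adjacent to it
                return list(subset)
    return []


# Hypothetical example: the path 0 - 1 - 2 - 3 - 4.
result = min_dominating_set(5, [(0, 1), (1, 2), (2, 3), (3, 4)])
print(result)  # [0, 3]
```

The number of candidate subsets grows exponentially with the number of vertices, which is consistent with the abstract restricting the approach to small instances. Each candidate can be checked independently of the others.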
Cross equalization is the core step of time-lapse seismic data processing; it can effectively eliminate the influence of inconsistencies in acquisition, data processing and tube processing parameters. As the amount of time-lapse seismic data and the processing it requires increase, moving seismic data processing to massively parallel processing becomes an inevitable trend. It deals with the time-lapse...
A GPU accelerated implementation of a reduced-order model of the human arterial circulation is introduced. The computationally intensive tasks of the algorithm (namely, the computation of the flow rate and area values at the interior grid points of the domain) have been migrated to the GPU. The CPU not only coordinates the actions performed by the GPU, but it also computes the inflow, bifurcation...
Synthetic aperture radar (SAR) image formation via backprojection offers a robust mechanism by which to form images on general, non-planar surfaces, without often restrictive assumptions regarding the planarity of the wavefront at the locations being imaged. However, backprojection presents a substantially increased computational load relative to other image formation algorithms that typically depend...
We illustrate how employing Graphics Processing Units (GPUs) can speed up intensive image processing operations. In particular, we demonstrate the use of the NVIDIA CUDA architecture to implement a color digital binary halftoning algorithm based on Direct Binary Search (DBS). Halftoning a color image is more computationally expensive than the single color case as there is a need to minimize dot interaction...