Automatically parallelizing loop nests into CUDA kernels must exploit the full potential of GPUs to obtain high performance. One state-of-the-art approach uses the polyhedral model to extract parallelism from a loop nest by applying a sequence of affine transformations to it. However, how to automate this process to exploit both intra- and inter-SM parallelism for GPUs remains a...
Portfolio risk is commonly defined as the standard deviation of the portfolio's return. The empirical correlation matrix of asset returns in a portfolio has an intrinsic noise component. This noise is filtered for more robust performance. Eigendecomposition is a widely used method for noise filtering. The Jacobi algorithm has been a popular eigensolver technique due to its stability. We present an efficient GPU...
Overlapping computations and communication is a key to accelerating stencil applications on parallel computers, especially for GPU clusters. However, such programming is a time-consuming part of the stencil application development. To address this problem, we developed an automatic code generation tool to produce a parallel stencil application with latency hiding automatically from its dataflow model...
To obtain more accurate solutions of polynomial systems with numerical continuation methods, we use multiprecision arithmetic. Our goal is to offset the overhead of double-double arithmetic by accelerating the path trackers, and in particular Newton's method, with a general-purpose graphics processing unit. In this paper we describe algorithms for the massively parallel evaluation and differentiation...
Image registration is the process of matching different images, whether 2D or 3D, that share similar or common properties. This work addresses the problem using a Gauss-Newton optimization approach. The problem is formulated as the minimization of a cost function, which is then solved with a backtracking line search. Since this is considered a demanding problem, especially for larger...
Linear equations with large sparse coefficient matrices arise in many practical scientific and engineering problems. Previous sparse matrix algorithms for solving linear equations on single-core CPUs are highly complex and time-consuming. To address this, focusing on the Jacobi iteration algorithm, in this paper we first implement a sparse matrix parallel iteration algorithm on a hybrid multi-core...
This paper presents an implementation of the Jacobi power flow algorithm to be run on a single instruction multiple data (SIMD) unit processor. The purpose is to be able to solve a large number of power flows in parallel as quickly as possible. This well-known algorithm was modified taking into account the characteristics of the SIMD architecture. The results show a significant speed-up of the algorithm...
We develop highly efficient parallel pricing methods on Graphics Processing Units (GPUs) for multi-asset American options via a Partial Differential Equation (PDE) approach. The linear complementarity problem arising due to the free boundary is handled by a penalty method. Finite difference methods on uniform grids are considered for the space discretization of the PDE, while classical finite differences,...
With the development of GPUs, their floating-point computing capacity has improved rapidly. How to apply the floating-point capability of GPUs to non-graphics computing has become a focus of research in high-performance computing. The Jacobi method is a typical application in scientific computing. This paper designs and implements the Jacobi algorithm on Nvidia's CUDA platform and gets a good speedup compared...
A new approach to solving the power flow problem based on graphics processing units is presented in this paper. A Newton method is implemented to solve the set of nonlinear equations of the power flow formulation. A parallel kernel for the biconjugate gradient method allows solving for the voltage corrections on a graphics processing card. While the evaluation of the Jacobian matrix is carried out on the...
This paper presents results obtained when porting 2D FEM linear elastostatic local stiffness matrix calculations to the Tesla architecture with the OpenCL framework. A comparison with native NVIDIA CUDA implementations is provided.