In cutting-edge CPU/GPU hybrid clusters, such as Tianhe-1A, the aggregate CPU computing capability can amount to as much as 1/3 of the aggregate GPU computing capability. It is therefore natural for the CPUs and GPUs to carry out the computational work jointly. However, using both hardware components effectively and simultaneously requires great care when developing parallel implementations...
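The 1/3 ratio translates directly into a static work split. If the GPUs deliver throughput G and the CPUs deliver G/3, then giving each side work in proportion to its throughput means the CPUs take (1/3)/(1/3 + 1) = 1/4 of the work, so both sides are expected to finish at the same time. The Python sketch below shows that arithmetic under the assumption that throughputs are known and constant. The function name `split_work` is hypothetical, and this is not the paper's scheme.

```python
from fractions import Fraction

def split_work(n_items: int, cpu_rate: Fraction, gpu_rate: Fraction) -> tuple[int, int]:
    """Static load balancing: give each device work in proportion to its throughput,
    so that both are expected to finish at the same time."""
    cpu_share = cpu_rate / (cpu_rate + gpu_rate)  # e.g. (1/3) / (1/3 + 1) = 1/4
    cpu_items = round(n_items * cpu_share)        # round() of a Fraction returns an int
    return cpu_items, n_items - cpu_items

# Aggregate CPU capability = 1/3 of aggregate GPU capability (relative units).
cpu_items, gpu_items = split_work(1_000_000, Fraction(1, 3), Fraction(1))
print(cpu_items, gpu_items)  # 250000 750000
```

Both shares take the same time: 250000 / (1/3) = 750000 / 1. In practice, throughput varies from kernel to kernel, which is one reason hybrid codes often fall back on dynamic scheduling rather than a fixed split.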
This paper transforms the sequential power flow problem into a parallel problem and solves it on the GPU. In particular, we implement parallel Gauss-Seidel, Newton-Raphson, and P-Q decoupled solvers using CUDA (Compute Unified Device Architecture). The aim is to investigate the performance of the three parallel power flow solvers. We use four IEEE standard power systems and one...
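The Gauss-Seidel method mentioned above updates each PQ bus voltage from the latest values at all other buses: V_i ← (1/Y_ii)[(P_i − jQ_i)/conj(V_i) − Σ_{k≠i} Y_ik V_k]. The Python code below is a minimal sequential reference sketch, not the paper's CUDA solver. It assumes one slack bus and only PQ buses (PV buses are omitted), and the function name and two-bus test network are hypothetical. The inner loop over buses is the part a parallel version has to reorganize, because each update reads values written earlier in the same sweep.

```python
def gauss_seidel_power_flow(Y, S, V0, slack=0, tol=1e-10, max_iter=200):
    """Gauss-Seidel power flow for one slack bus plus PQ buses (per-unit values).

    Y  : bus admittance matrix (list of lists of complex)
    S  : specified complex power injections P + jQ per bus (loads are negative)
    V0 : initial voltages; V0[slack] is held fixed
    """
    n = len(V0)
    V = list(V0)
    for it in range(max_iter):
        max_change = 0.0
        for i in range(n):
            if i == slack:
                continue
            # Current contributed by all other buses, using their latest voltages.
            sum_other = sum(Y[i][k] * V[k] for k in range(n) if k != i)
            # From S_i = V_i * conj(I_i) it follows that I_i = conj(S_i) / conj(V_i).
            v_new = (S[i].conjugate() / V[i].conjugate() - sum_other) / Y[i][i]
            max_change = max(max_change, abs(v_new - V[i]))
            V[i] = v_new  # Gauss-Seidel: the new value is used immediately
        if max_change < tol:
            return V, it + 1
    raise RuntimeError("Gauss-Seidel power flow did not converge")

y = 1 / (0.02 + 0.06j)   # admittance of one line with impedance 0.02 + j0.06 p.u.
Y = [[y, -y], [-y, y]]   # 2-bus admittance matrix
S = [0j, -(0.5 + 0.2j)]  # bus 1 is a load drawing 0.5 + j0.2 p.u.
V, iters = gauss_seidel_power_flow(Y, S, [1 + 0j, 1 + 0j])
print(f"|V1| = {abs(V[1]):.4f} p.u. after {iters} iterations")
```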
Recently, the Graphics Processing Unit (GPU) has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational horsepower and very high memory bandwidth. To improve the efficiency of simulating complex flow phenomena in computational fluid dynamics, a CUDA-based large eddy simulation algorithm using multiple GPUs is proposed. Our implementation...
Generating a digital hologram requires many time-consuming computations. Reducing the computation time, or the number of computations, is therefore essential for real-time digital holographic video generation. In this paper, we propose a method for parallelizing the computations across multiple GPUs using CUDA and OpenMP, and an optimization method for reducing the computation...
We present a software package that supports teaching different parallel programming models in a computational science and engineering context. It implements a Finite Volume solver for the shallow water equations, with application to tsunami simulation in mind. The numerical model is kept simple, using patches of Cartesian grids as the computational domain, which can be connected via ghost layers. The...
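The patch-plus-ghost-layer idea can be shown in one dimension. The Python code below is an illustrative sketch, not the package's solver. Each patch stores one ghost cell at each end. Before every time step, neighbouring patches copy interior values into each other's ghost cells, so a domain split into two patches evolves exactly like the unsplit domain. The Rusanov (local Lax-Friedrichs) flux, the reflective walls, and all names are assumptions chosen for brevity.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def physical_flux(h, hu):
    """Flux of the 1D shallow water equations for the state (h, hu)."""
    u = hu / h
    return hu, hu * u + 0.5 * G * h * h

def rusanov_flux(hl, hul, hr, hur):
    """Rusanov (local Lax-Friedrichs) flux across the interface between two cells."""
    fl, fr = physical_flux(hl, hul), physical_flux(hr, hur)
    a = max(abs(hul / hl) + math.sqrt(G * hl), abs(hur / hr) + math.sqrt(G * hr))
    return (0.5 * (fl[0] + fr[0]) - 0.5 * a * (hr - hl),
            0.5 * (fl[1] + fr[1]) - 0.5 * a * (hur - hul))

class Patch:
    """1D Cartesian patch: interior cells 1..n plus one ghost cell at each end."""
    def __init__(self, h, hu):
        self.h = [0.0] + list(h) + [0.0]
        self.hu = [0.0] + list(hu) + [0.0]

    def step(self, dt, dx):
        n = len(self.h) - 2
        # Interface j lies between cells j and j+1 (j = 0..n), ghost cells included.
        F = [rusanov_flux(self.h[j], self.hu[j], self.h[j + 1], self.hu[j + 1])
             for j in range(n + 1)]
        for i in range(1, n + 1):
            self.h[i] -= dt / dx * (F[i][0] - F[i - 1][0])
            self.hu[i] -= dt / dx * (F[i][1] - F[i - 1][1])

def fill_ghost_layers(patches):
    """Copy neighbouring interior cells into ghost cells; reflective walls at the ends."""
    for left, right in zip(patches, patches[1:]):
        left.h[-1], left.hu[-1] = right.h[1], right.hu[1]
        right.h[0], right.hu[0] = left.h[-2], left.hu[-2]
    first, last = patches[0], patches[-1]
    first.h[0], first.hu[0] = first.h[1], -first.hu[1]
    last.h[-1], last.hu[-1] = last.h[-2], -last.hu[-2]

def simulate(patches, dt, dx, steps):
    for _ in range(steps):
        fill_ghost_layers(patches)  # exchange first, so every patch sees old data
        for p in patches:
            p.step(dt, dx)

# Dam break: the same 40-cell domain as one patch and as two connected patches.
h0, hu0 = [2.0] * 20 + [1.0] * 20, [0.0] * 40
single = [Patch(h0, hu0)]
split = [Patch(h0[:20], hu0[:20]), Patch(h0[20:], hu0[20:])]
for patches in (single, split):
    simulate(patches, dt=0.01, dx=0.25, steps=50)
```

The separate exchange and update phases map onto the usual structure of parallel codes: in a distributed or multi-GPU setting, `fill_ghost_layers` would become the communication step between patches.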
A GPU accelerated implementation of a reduced-order model of the human arterial circulation is introduced. The computationally intensive tasks of the algorithm (namely, the computation of the flow rate and area values at the interior grid points of the domain) have been migrated to the GPU. The CPU not only coordinates the actions performed by the GPU, but it also computes the inflow, bifurcation...
Molecular Dynamics (MD) simulations have been widely used in the study of macromolecules. Ensuring an acceptable level of statistical accuracy requires a relatively large number of particles, which calls for high-performance implementations of MD. Today, heterogeneous systems, with their high performance potential, low power consumption, and high price-performance ratio, offer a viable alternative...
Many geophysical problems are computationally expensive, either because of their iterative nature or because of the large datasets they process. Such problems must be approached with care, since a poor choice of parameters not only produces wrong results but also wastes a great deal of time. The Compute Unified Device Architecture (CUDA) introduced by NVIDIA has enabled...
Diffusion Weighted Magnetic Resonance Imaging (DW-MRI) and tractography approaches are the only tools available for estimating structural connections between different brain areas non-invasively and in vivo. A common first step in these techniques is the estimation of the underlying fibre orientations and their uncertainty in each voxel of the image. A popular method...
Pharmaceutical manufacturers that package tablets in blister strips must ensure the tablets are free from defects before they go into the packing box. The purpose of this project is to speed up the inspection system by implementing the image processing algorithm on the GPU. Morphological and mathematical operations have been implemented on both GPU...
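Morphological operations suit GPUs well because each output pixel depends only on a small neighbourhood of the input. The Python code below is a minimal CPU reference sketch of binary erosion, dilation, and opening, and it does not reproduce the project's own code. The square structuring element, the treatment of out-of-image pixels as 0, and all names are assumptions. In a CUDA port, each (y, x) would typically be handled by its own thread.

```python
def erode(img, radius=1):
    """Binary erosion with a (2r+1) x (2r+1) square structuring element.

    A pixel stays set only if every pixel in its neighbourhood is set;
    pixels outside the image count as 0.
    """
    rows, cols = len(img), len(img[0])
    return [[int(all(0 <= y + dy < rows and 0 <= x + dx < cols and img[y + dy][x + dx]
                     for dy in range(-radius, radius + 1)
                     for dx in range(-radius, radius + 1)))
             for x in range(cols)]
            for y in range(rows)]

def dilate(img, radius=1):
    """Binary dilation: a pixel is set if any pixel in its neighbourhood is set."""
    rows, cols = len(img), len(img[0])
    return [[int(any(0 <= y + dy < rows and 0 <= x + dx < cols and img[y + dy][x + dx]
                     for dy in range(-radius, radius + 1)
                     for dx in range(-radius, radius + 1)))
             for x in range(cols)]
            for y in range(rows)]

def opening(img, radius=1):
    """Erosion followed by dilation: removes specks smaller than the structuring element."""
    return dilate(erode(img, radius), radius)

# A 5x5 "tablet" region in a 7x7 binary image, plus one isolated noise pixel.
tablet = [[1 if 1 <= y <= 5 and 1 <= x <= 5 else 0 for x in range(7)] for y in range(7)]
noisy = [row[:] for row in tablet]
noisy[0][6] = 1
```

Opening `noisy` restores the clean tablet mask. This is the kind of clean-up a defect-inspection pipeline might apply before measuring the shape of each tablet.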
A trend that has materialized, and has drawn much attention, is that of increasingly heterogeneous computing platforms. It has recently become very common for a desktop or notebook computer to be equipped with both a multi-core CPU and a GPU. Developing applications that exploit the aggregate computing power of such an environment is a major challenge today. Particularly, we need dynamic...
Graphics Processing Units (GPUs) are frequently used for simulations of physical and biological systems. The simulated systems are often composed of simple elements that communicate only with their neighbors. But in some systems, such as large-scale neuronal networks, each element can communicate with any other element in the simulation. In this work, we present an efficient CUDA algorithm that...
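The truncated abstract does not describe the algorithm itself. The Python code below illustrates one common way to structure all-to-all coupling, and it should not be read as the paper's method. First, compact the list of neurons that fired in the current step. Then let each target neuron sum its weights over that short list only. The dense weight matrix layout and the function name are assumptions.

```python
def deliver_spikes(weights, spiked):
    """Synaptic input of every neuron from all neurons that fired this step.

    weights[j][i] is the weight from source neuron i to target neuron j, so any
    neuron can influence any other (all-to-all coupling).
    """
    # Step 1: stream compaction, i.e. list only the neurons that fired.
    fired = [i for i, s in enumerate(spiked) if s]
    # Step 2: every target gathers from the compacted list; on a GPU each
    # target row would typically be handled by its own thread.
    return [sum(row[i] for i in fired) for row in weights]

weights = [[0, 1, 2],
           [3, 0, 5],
           [6, 7, 0]]
print(deliver_spikes(weights, [True, False, True]))  # [2, 8, 6]
```

Because only a small fraction of neurons typically fire in any step, the gather touches far fewer weights than a full matrix-vector product, while producing the same result.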
We report on recent developments aimed at improving the accuracy and performance of a discontinuous Galerkin time-domain (DGTD) method for simulating time-domain electromagnetic wave propagation problems involving general domains and heterogeneous media. The common objective of the associated studies is to bring the method to a level of computational efficiency and flexibility that allows...
Registrations in medical imaging and computational anatomy can be obtained using the Large Deformation Diffeomorphic Kernel Bundle Mapping (LDDKBM) framework. This provides a registration algorithm with a solid mathematical foundation while incorporating regularization of deformation at multiple scales. Because the variational formulation of LDDKBM implies a heavy computational burden in the search...
With the emergence of a new generation of digital cameras that offer 3D reconstruction of the viewed scene, Depth from Defocus (DFD) presents an attractive option. In this approach, the depth profile of the scene is recovered from two views captured with different focus settings. DFD is well known to be computationally intensive because of the shift-variant filtering involved in its estimation. In this...
Simulating nonlinear ultrasound wave propagation in biological tissue is very time-consuming. Various simulators based on finite difference or angular spectrum methods have been reported in the literature; the latter provide faster simulations by treating the different harmonics separately. In this paper, we propose to use a generalized angular spectrum method (GASM)...
A parallel distributed CUDA implementation of a Lattice Boltzmann Method for multiphase flows with large density ratios is described in this paper. Validation runs studying the terminal velocity of a rising bubble under the effect of gravity show good agreement with the expected theoretical values. The code is benchmarked against the performance of a typical CPU implementation of the same algorithm...
This paper presents a parallel grid-based method and belief fusion for real-time cooperative Bayesian estimation. The grid-based recursive Bayesian estimation (RBE) method effectively maintains the belief of objects even when there is no detection event, but it requires heavy computation for its prediction and correction processes, as well as for the fusion process in cooperative estimation. To achieve real-time estimation,...
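The three costly operations named above (prediction, correction, and fusion) are all per-cell operations on a grid of probabilities, which is why they parallelize well. The Python code below is a one-dimensional sketch of them. The circular grid, the dictionary motion model, and fusion as a normalised product of beliefs are assumptions made for brevity, not the paper's formulation. In particular, the product rule treats the agents' beliefs as independent, which is a simplification.

```python
def normalize(p):
    total = sum(p)
    return [x / total for x in p]

def predict(belief, motion):
    """Prediction step: move probability mass according to a motion model.

    `motion` maps a cell offset to its probability, e.g. {0: 0.2, 1: 0.8}.
    The grid wraps around at the edges only to keep the sketch short.
    """
    n = len(belief)
    out = [0.0] * n
    for i, b in enumerate(belief):
        for offset, p in motion.items():
            out[(i + offset) % n] += b * p
    return out

def correct(belief, likelihood):
    """Correction step: cell-wise Bayes' rule, then renormalisation.

    With no detection event, `likelihood` would be the sensor's no-detection
    model, so the belief is still maintained rather than discarded.
    """
    return normalize([b * l for b, l in zip(belief, likelihood)])

def fuse(*beliefs):
    """Cooperative fusion as a normalised cell-wise product of the agents' beliefs
    (a simplification that treats their information as independent)."""
    fused = [1.0] * len(beliefs[0])
    for b in beliefs:
        fused = [f * x for f, x in zip(fused, b)]
    return normalize(fused)

belief = [0.0] * 10
belief[2] = 1.0                             # object known to be in cell 2
belief = predict(belief, {0: 0.2, 1: 0.8})  # mass moves to cells 2 and 3
likelihood = [0.1] * 10
likelihood[3] = 0.9                         # sensor reports cell 3
belief = correct(belief, likelihood)
```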
Many basic image processing tasks suffer from processing overhead because they operate over the whole image. In real-time applications, processing time is a major obstacle to implementation. A High Performance Computing (HPC) platform is needed to solve this problem, and hardware accelerators reduce processing time. In recent developments, the Graphics Processing...
The design of data structures for high performance computing (HPC) is one of the principal challenges facing researchers who want to use heterogeneous computing machinery. Heterogeneous systems achieve cost, power, and speed efficiency by combining the appropriate hardware for each task. Yet each type of processor requires a specific organization of the application state in order to achieve...
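The truncated abstract does not name its data structures. A standard example of a processor requiring "a specific organization of the application state" is the choice between array-of-structures (AoS) and structure-of-arrays (SoA). The Python code below is a hedged sketch of converting between the two layouts. The `Particle` record and the function names are hypothetical, and the code uses `array("d", ...)` so that each SoA field really is a contiguous buffer of doubles.

```python
from array import array
from dataclasses import dataclass

@dataclass
class Particle:
    """One record of an array-of-structures (AoS) layout."""
    x: float
    y: float
    mass: float

def aos_to_soa(particles):
    """Structure-of-arrays (SoA): one contiguous array of doubles per field.

    When consecutive GPU threads each read field `x` of consecutive particles,
    SoA places those values next to each other in memory (coalesced access);
    AoS keeps each record together, which often suits CPU code that works on
    one whole particle at a time.
    """
    return {
        "x": array("d", (p.x for p in particles)),
        "y": array("d", (p.y for p in particles)),
        "mass": array("d", (p.mass for p in particles)),
    }

def soa_to_aos(soa):
    """Rebuild the AoS view, e.g. before handing state back to CPU code."""
    return [Particle(x, y, m) for x, y, m in zip(soa["x"], soa["y"], soa["mass"])]

particles = [Particle(0.0, 1.0, 2.0), Particle(3.0, 4.0, 5.0)]
soa = aos_to_soa(particles)
```

Keeping both views, or converting at the CPU/GPU boundary, is one way to give each processor the layout it prefers, at the cost of the conversion itself.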