The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this paper finite element method for 3D DC resistivity modeling accelerated using multi-GPU (Graphics Processing Unit). Solution of the large system of linear equations is the most expensive computation in finite element method performed in GPUs to reduce the computational time. Conjugate gradient solver used to solve large system of linear equations. We developed kernel for conjugate gradient...
We address the computationally demanding task of real time optimal detection of a Gaussian Signal in Gaussian Noise. The mathematical principles of such a detector were formulated in 1965, but a full real-time implementation of these principles was not possible for decades mainly due to technological barriers. We present a CUDA based implementation of such an optimal detector and study its decision...
An algorithm based on particle filters is employed to track moving objects in video streams from fixed and non-fixed cameras. Particle weighting is based on color histograms computed in the iHLS color space. Particle computations are parallelized with CUDA framework. The algorithm was tested on various GPU devices: a desktop GPU card, a mobile chipset and two embedded GPU platforms. The processing...
The new generation architecture of NVIDIA launched Multi-Process Services (MPS), which provides a context manager in the software layer to handle tasks with different processes. MPS can only be used on the Linux platform, and requires a computing capability of 5.0 or higher NVIDIA GPU card [1]. Although these constraints limit the applicability, but it is a relatively inexpensive way to make multiple...
The performance of a CUDA kernel often depends on the number of threads per thread-block (thread-block size), and the optimal configuration differs according to the graphics processing unit (GPU) hardware and the given data size to the kernel. In particular, in linear algebra libraries such as Basic Linear Algebra Subprograms (BLAS), most routines support a wide range of problem sizes and various...
Non-equispaced fast Fourier transform (NFFT) has attracted significant interest for its applications in tomography and remote sensing where visualization and image reconstruction require non-equispaced data. Here we present an efficient implementation of high accuracy NFFT on an NVidia GPU (Graphic Processing Unit). We focused on the convolution step in the computation of NFFT, since it is the most...
Feature calculation of large amount of images is time consuming. The GPU based CUDA framework offers an affordable solution for calculating image features in parallel. The research focused on an empirical study of different implementations of a general-purpose GPU-based solution for calculating Gray-Level Co-occurrence Matrices (GLCM) and associated features of diffraction images of biological cells...
This paper proposes the parallel implementation of finite volume method based on weighted average flux (WAF) to solve the shallow water equations on a graphic processing unit. We develop two parallel programs which are 1-dimension thread block and 2-dimension thread block, respectively. We compare the performance of these two versions with a sequential program. The numerical experiment is performed...
In modern networks, there exist different applications which generate various different types of network traffic. In order to improve the performance of network management, it is important to identify and classify the internet traffic. The machine learning (ML) technique based on per-flow statistics has been widely used in traffic classification. Different from traditional classification methods,...
The maximum common subgraph of two graphs, G1 and G2, is the largest subgraph in G1 that is isomorphic to a subgraph in G2. Finding the maximum common subgraph of two given graphs is known to be a NP-complete problem. An exact solution for the maximum common subgraph problem can be found by an algorithm that transforms the maximum common subgraph problem into a maximal clique enumeration problem....
Porting applications to new hardware or programming models is a tedious and error prone process. Every help that eases these burdens is saving developer time that can then be invested into the advancement of the application itself instead of preserving the status-quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits...
The clustering coefficient and the transitivity ratio are concepts often used in network analysis, which creates a need for fast practical algorithms for counting triangles in large graphs. Previous research in this area focused on sequential algorithms, MapReduce parallelization, and fast approximations. In this paper we propose a parallel triangle counting algorithm for CUDA GPU. We describe the...
Numerical approach to frequency response problems usually requires that the system governing equation is solved repeatedly at many frequencies. The computational efficiency of the overall process can be increased by departing from traditional sequential computing model in favor of utilizing the parallel processing capability commonly offered by modern hardware. In this paper, we consider a hybrid...
Image filtering is a process of reducing noise which degrades the performance of image processing. In some applications such as segmentation or classification, denoising has been designed to smooth the homogeneous areas while keeping and enhancing the edges. In several applications such as video analysis, image-guided surgical interventions or visual servoing, real-time denoising is needed. The devoted...
Edge detection is one of the most important paradigm of Image processing. Images contain millions of pixel and each pixel information is independent of its neighbouring pixel. Hence this paper puts to test the capability of Graphics Processing Unit (GPU) to compute in parallel against the millions of pixel calculations involved in image processing. Each pixel operation is independent from other thus...
Models are useful to represent abstractions of software and hardware processes. The Bulk Synchronous Parallel (BSP) is a bridging model for parallel computation that allows algorithmic analysis of programs on parallel computers using performance modeling. The main idea of BSP model is the treatment of communication and computation as abstractions of a parallel system. Meanwhile, the use of GPU devices...
Graphics Processing Units (GPUs) are used today as affordable energy-efficient method of acceleration for computationally exhaustive algorithms to decrease execution time exploiting the power of parallel programing techniques. In the field of medical imaging, GPUs became crucial acceleration method for computationally exhaustive algorithms. This paper presented the effect of memory optimization on...
ALMA is a revolutionary instrument in its scientific concept, its engineering design and its organisation as a global effort. ALMA and new incoming radio-telescopes delivery big amounts of data that are useful to the sky image reconstruction. In this context, MEM is one of the most recognized reconstruction algorithms in radio-interferometry and is based on a Bayesian approach. Our results show that...
Graphics processors Unit (GPU) architectures are becoming increasingly programmable, offering the potential for dramatic speedups for a variety of general purpose applications compared to contemporary general-purpose processors (CPUs). However, GPU architecture depends on multithreading that needs to share data and resources that face memory concurrency issues. Data races and deadlocks are the most...
In this paper we present a heavily exploration oriented implementation of genetic algorithms to be executed on graphic processor units (GPUs) that is optimized with our novel mechanism for scheduling GPU-side synchronized jobs that takes inspiration from the concept of persistent threads. Persistent Threads allow an efficient distribution of work loads throughout the GPU so to fully exploit the CUDA...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.