The use of Graphics Processing Units (GPUs) has become a very popular way to accelerate the execution of many applications. However, GPUs are not exempt from side effects. For instance, GPUs are expensive devices which additionally consume a non-negligible amount of energy even when they are not performing any computation. Furthermore, most applications present low GPU utilization. To address these...
String matching problems such as sequence alignment are among the fundamental problems in many computer science fields such as natural language processing (NLP) and bioinformatics. Many algorithms have been proposed in the literature to address this problem. Some of these algorithms compute the edit distance between the two strings to perform the matching. However, these algorithms usually require long...
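As a point of reference for the edit distance mentioned in the abstract above, here is a minimal sequential sketch of the classic Levenshtein dynamic program. It is not the algorithm proposed in the paper, which the truncated abstract does not describe; the function name `edit_distance` and the unit costs for insertion, deletion and substitution are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (unit costs assumed)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Each cell depends on its left, upper and upper-left neighbours, so the table is filled in O(len(a) * len(b)) time, which is the cost that motivates acceleration for long sequences.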
Source code is a frequent target for plagiarism in massive computing courses. Plagiarism detection requires a significant effort from the teaching staff, thus software tools have been used to detect similar source codes. This paper examines parallelization of source code similarity detection based on Greedy-String-Tiling and Karp-Rabin algorithms. CPU implementation is parallelized using Pthreads,...
In this work the latest multi-graphical-processing unit (multi-GPU) libraries included with the NVIDIA® CUDA® toolkit are used to accelerate the simulation of the radar cross section (RCS) of a target discretized to 125,913 unknowns. On a system with four NVIDIA Tesla K40 GPU cards, the total runtime of a full-wave out-of-core method of moments (MoM) solver was reduced from about 4 days to about 5...
Typhoons and cyclones are an important part of the risk assessment of natural catastrophes. To estimate probabilistic cyclone damage over a very long period (e.g. 1,000 years, 10,000 years), the first step of a catastrophe risk assessment model is to generate a stochastic set of routes derived from historical cyclone routes, which takes a long time. In order to accelerate the generation, we try to use GPU programming,...
The present paper discusses radio monitoring tasks and their solution using DFT-modulated filter banks. Filter bank software-hardware implementations are studied on the basis of Central Processing Unit (CPU) and Compute Unified Device Architecture (CUDA) with the use of Graphics Processing Unit (GPU). It is shown that CUDA technology is efficient for processing large datasets and outperforms computational...
The use of emulators in the WSN design process offers the advantage of precise timing and cross-layer simulation. The former is due to the fact that each instruction executes in specific machine cycles, and the latter because the emulated machine code contains both application and protocol stack code. Emulators used in sensor network simulation are bound to specific hardware. Adding different modules...
We present a localization system for an autonomous mobile robot that operates in a well-known environment. In our work we use particle-based Monte Carlo localization. This algorithm has many applications in mobile robotics, but it is computationally expensive. Due to the high level of parallelism in this algorithm, we had an opportunity to accelerate its execution on graphical processing...
The Hierarchical Memory Machine (HMM) is a theoretical parallel computing model that captures the essence of computing on CUDA-enabled GPUs. The summed area table (SAT) of a matrix is a data structure frequently used in the area of computer vision which can be obtained by computing the column-wise prefix-sums and then the row-wise prefix-sums. The main contribution of this paper is to introduce the...
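The two-pass construction of the summed area table described in the abstract above (column-wise prefix sums, then row-wise prefix sums) can be sketched sequentially as follows. This is only an illustration of the definition, not the HMM or GPU algorithm the paper introduces; the function name `summed_area_table` and the list-of-lists matrix representation are assumptions.

```python
def summed_area_table(matrix):
    """Return the SAT of a rectangular matrix given as a list of rows."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    sat = [row[:] for row in matrix]  # work on a copy of the input

    # Pass 1: column-wise prefix sums (accumulate down each column).
    for j in range(cols):
        for i in range(1, rows):
            sat[i][j] += sat[i - 1][j]

    # Pass 2: row-wise prefix sums (accumulate along each row).
    for i in range(rows):
        for j in range(1, cols):
            sat[i][j] += sat[i][j - 1]

    return sat


if __name__ == "__main__":
    print(summed_area_table([[1, 2], [3, 4]]))
```

After both passes, entry (i, j) holds the sum of all input elements in the rectangle from (0, 0) to (i, j). Within each pass the columns (or rows) are independent of one another, which is the source of the parallelism a GPU implementation can exploit.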
In this paper, several versions of a signal extraction algorithm, pertaining to the entry stage of the Cherenkov Telescope Array's Real Time Analysis pipeline, were implemented and optimised using SSE2, POSIX threads and CUDA. Results of this proof of concept let us gain an insight into the suitability of each platform, and the performance each one can deliver, to carry out this particular task.
Commodity graphics processing units (GPUs) have rapidly evolved to become high performance accelerators for data-parallel computing through a large array of processing cores and the CUDA programming model with a C-like interface. However, optimizing an application for maximum performance based on the GPU architecture is not a trivial task because of the tremendous change from conventional multi-core to the...
Usage of GPU-based architectures for scientific computing has been steadily increasing in recent years. This new paradigm for both programming and execution has been applied to solve several classic problems much faster than using the conventional multiprocessor and/or multicomputer approach. These architectures allow an increase in performance, compared to conventional CPUs, for specific...
This paper presents a high performance algorithm for modular multiplication on a graphics processing unit (GPU) implemented in assembler. The proposed algorithm carries out finite field multiplication over the NIST prime fields of size 192, 224, 256 and 384 bits. Included is a detailed explanation of our algorithm, an instruction count analysis, and a comparison to recently published work; compared...
We present acceleration of the numerical solution of electromagnetic (EM) problems using the method of moments (MoM) and NVIDIA graphics processing units (GPUs). Three stages of MoM are accelerated: matrix fill, solution of complex linear equations and post-processing. The results show that GPUs can be efficiently used for EM simulations.
With the introduction of APIs like CUDA, Stream+ or OpenCL, modern Graphics Processing Units (GPUs) can be easily employed for general purpose computing. In addition, their comparatively low price per GFLOP makes them interesting candidates for coprocessors in future embedded Electronic Control Units (ECUs). Yet, as car manufacturers strive to reduce the Thermal Design Power (TDP) of each and every ECU...
The Graphics Processing Unit (GPU) has evolved into a parallel computation platform owing to its massively multithreaded architecture. Due to its high computational power, the GPU has been used to deal with many problems that can be easily parallelized. This paper presents a GPU-based spot noise parallel algorithm for 2D vector field visualization. It uses the spot noise method with GPU resources and compute unified...
General-purpose computing on graphics processing units (GPGPU) is a popular computing technology used in various fields. In this paper, we parallelize the cryptographic hash processing of a password cracking tool, John the Ripper, using CUDA on a GPGPU. We also evaluate our work by comparing the processing time of the GPU-parallelized hash processing with that of John the Ripper on a dual-core...
We present and compare implementations of an affine interior-point algorithm for real-time collision detection on a GPGPU and an FPGA. This particular interior-point algorithm is distinguished from other collision detection methods by its ability to perform detection between pairs of objects undergoing fast rotational and translational movement. This enables inter-frame collision detection, i.e. collision...
Spatial matching for object retrieval is often time-consuming and susceptible to viewpoint changes. To address this problem, we propose a novel spatial matching method and implement it in parallel on a modern GPU. Unlike previous spatial matching methods, in which the affine transformation estimation is based on the gravity vector assumption, our method abandons this strong assumption by matching the...
Fluid simulation has been an active research field in computer graphics for the last 30 years. Stam's stable fluids method, among others, is used for solving the equations that govern fluids (i.e. the Navier-Stokes equations). An implementation of stable fluids in 3D using NVIDIA Compute Unified Device Architecture (CUDA) is provided in this paper. This CUDA-based implementation also features the accurate...