GPU-based clusters are widely chosen for accelerating a variety of scientific applications in high-end cloud environments. With their growing popularity, there is a necessity for improving the system throughput and decreasing the turnaround time for co-executing applications on the same GPU device. However, resource contention among multiple applications on a multi-tasked GPU leads to the performance...
This paper explains a general-purpose approach to parallel pixel processing on the GPU. It presents the essential dataset structuring, correct type assignment, and kernel configuration for the CUDA application interface. The paper also explains data movement and optimal computation saturation. Transfers are also analyzed in correlation with the computation, especially for the embarrassingly parallel problem...
Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models or performing code transformations. Although empirical...
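As a rough illustration of the thread and block parameters mentioned above, the following sketch computes a one-dimensional launch configuration and enumerates candidate block sizes that an empirical tuner might try. The function names are hypothetical, and the defaults (a warp size of 32 and at most 1024 threads per block) are assumptions that hold for current NVIDIA GPUs but are not taken from the abstract.

```python
def launch_config(n_elements, threads_per_block=256):
    """Return (blocks, threads_per_block) so that one thread covers each element.

    Uses ceiling division, the usual way to size a 1-D grid.
    """
    if n_elements < 0 or threads_per_block <= 0:
        raise ValueError("n_elements must be >= 0 and threads_per_block > 0")
    blocks = (n_elements + threads_per_block - 1) // threads_per_block
    return blocks, threads_per_block


def candidate_configs(n_elements, max_threads=1024, warp=32):
    """Enumerate warp-multiple block sizes (assumed limits) for a tuner to time."""
    return [launch_config(n_elements, t) for t in range(warp, max_threads + 1, warp)]


if __name__ == "__main__":
    print(launch_config(1000, 256))
    print(len(candidate_configs(1000)))
```

A tuner would time the kernel once per candidate and keep the fastest; the sketch only produces the search space and does not measure anything.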
GPUs are employed to accelerate scientific applications; however, they require much more programming effort from programmers, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive-based programming solutions to alleviate the programming burden; however, synchronous data movement can create a performance bottleneck in fully taking...
Adaptive Dynamic Programming (ADP) with critic-actor architecture is a useful way to achieve online learning control. The algorithm Gaussian-Kernel Adaptive Dynamic Programming (GK-ADP) that has been developed before has a kind of two-phase iteration, which not only approximates value function, but also optimizes hyper-parameters simultaneously. However, just like most iteration algorithms are applied...
The coevolutionary particle swarm optimization (CPSO) algorithm has been widely investigated and applied in the real world. When tackling large-scale and complex real-time optimization problems, the running time of the CPSO algorithm is a barrier. In this paper, a Graphics Processing Unit (GPU) is introduced to provide speedup in order to meet the real-time requirements. The CPSO algorithm has been implemented...
Background subtraction is a major step in many image processing applications and is used in much of video surveillance. The main measures of this method are accuracy and processing time, so we focused on these two challenges. We parallelized the Two Layered CodeBook Model on a Graphical Processing Unit (GPU) to increase the processing speed and the accuracy of the...
In a CPU-GPU based heterogeneous computing system, the input data to be processed by the kernel resides in the host memory. The host and the device memory address spaces are different. Therefore, the device cannot directly access the host memory. In the CUDA programming model, the data is moved between the host memory and the device memory. This data transfer is a time-consuming task. The communication...
In various applications where the problem domain can be modeled as a graph, shortest path computation is an indispensable challenge. In applications like online social networks and shortest route computation problems, the graph is very large: the number of nodes approaches hundreds of billions. Shortest path graph algorithms like SSSP (Single Source Shortest Path)...
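For reference, a minimal sequential sketch of single-source shortest paths is given below, using Dijkstra's algorithm with a binary heap. This is a common CPU baseline and is not claimed to be the method of the paper above; the adjacency-list format and the function name `sssp` are assumptions made for illustration, and edge weights are assumed to be non-negative.

```python
import heapq


def sssp(graph, source):
    """Dijkstra's algorithm.

    graph: dict mapping node -> list of (neighbor, weight), weights >= 0.
    Returns a dict of shortest distances for every node reachable from source.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    g = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
    print(sssp(g, "a"))
```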
The General Purpose Graphic Processing Unit (GPGPU) is widely used to achieve high performance or high throughput in parallel programming. This capability of GPGPUs is well known today and is mostly used for scientific computing, which requires more processing power than ordinary personal computers provide. Therefore, most programmers, researchers, and industry practitioners use this new concept for their work...
Simulation of an activated sludge model (ASM) that includes a detailed biokinetic reaction network often requires the solution of a large system of ordinary differential equations (ODEs) at each time frame, which requires long computing times. In this study, an adaptive time step backward differentiation formula (BDF) is proposed to solve the ASM's system of ODEs that mainly contains a high degree of stiffness...
A histogram is a popular analytic graphical representation of the data distribution resulting from processing given numerical input data. Although sequential histogram computation may be simple, it is no longer suitable for processing high volumes of data. With the recent advancement of high performance computing (HPC), aided by the accelerating growth of the General Purpose Graphic Processing Unit (GPGPU),...
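To make the contrast between sequential and parallel histogram computation concrete, the sketch below shows a sequential baseline and a chunked variant that builds one partial histogram per chunk and then merges them. Per-worker partial histograms followed by a merge are a common pattern in parallel implementations; whether the paper above uses it is not stated in the abstract. The function names, the equal-width binning, and the chunk count are assumptions, and the chunks are processed one after another here rather than in parallel.

```python
def histogram(values, n_bins, lo, hi):
    """Sequential equal-width histogram over [lo, hi]; out-of-range values are skipped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi falls into the last bin.
            b = min(int((v - lo) / width), n_bins - 1)
            counts[b] += 1
    return counts


def chunked_histogram(values, n_bins, lo, hi, n_chunks=4):
    """Build one partial histogram per chunk, then merge by element-wise addition."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    partials = [
        histogram(values[i:i + size], n_bins, lo, hi)
        for i in range(0, len(values), size)
    ]
    total = [0] * n_bins
    for partial in partials:
        for b, c in enumerate(partial):
            total[b] += c
    return total


if __name__ == "__main__":
    data = list(range(11))
    print(histogram(data, 5, 0, 10))
    print(chunked_histogram(data, 5, 0, 10))
```

Because each chunk writes only to its own partial histogram, the chunks are independent and could be assigned to separate workers, with the merge as the only step that combines results.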
Multi-scale Retinex algorithm is an image enhancement algorithm that aims at image reconstruction. The algorithm maintains the high fidelity and the dynamic range compression of the image, so the enhancement effect is obvious. The algorithm exploits a large number of convolution operations to achieve dynamic range compression and color/brightness rendition, and the calculation time increased significantly...
To attain scalable performance efficiently, the HPC community expects future exascale systems to consist of multiple nodes, each with different types of hardware accelerators. In addition to GPUs and Intel MICs, additional candidate accelerators include embedded multiprocessors and FPGAs. End users need appropriate tools to efficiently use the available compute resources in such systems, both within...
GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model which can have a significant effect on error propagation...
Image classification is one of the important processing tasks performed on satellite images. Many algorithms have been proposed for such classification, of which the Support Vector Machine (SVM) is the most widely used. Many variants and approaches of SVM have been proposed, of which GA-based classifiers show better prospects. But the increasing size, spectrum, and dimensionality of remote sensing data have made the image processing problem more...
Source code is a frequent target for plagiarism in massive computing courses. Plagiarism detection requires a significant effort from the teaching staff, thus software tools have been used to detect similar source codes. This paper examines parallelization of source code similarity detection based on Greedy-String-Tiling and Karp-Rabin algorithms. CPU implementation is parallelized using Pthreads,...
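The Karp-Rabin part of such a similarity detector can be sketched as a rolling hash over fixed-length substrings (k-grams), with candidate matches verified by direct comparison. The sketch below is a sequential illustration only, not the paper's Pthreads or GPU implementation, and it omits the Greedy-String-Tiling stage; the function names, the base, and the modulus are assumptions.

```python
def rolling_hashes(text, k, base=256, mod=1_000_000_007):
    """Return the Karp-Rabin hash of every length-k substring of text, in order."""
    if k <= 0 or len(text) < k:
        return []
    high = pow(base, k - 1, mod)  # weight of the character leaving the window
    h = 0
    for ch in text[:k]:
        h = (h * base + ord(ch)) % mod
    hashes = [h]
    for i in range(k, len(text)):
        # Drop text[i - k], shift the window, and append text[i].
        h = ((h - ord(text[i - k]) * high) * base + ord(text[i])) % mod
        hashes.append(h)
    return hashes


def common_kgrams(a, b, k):
    """Return (i, j) pairs where a[i:i+k] == b[j:j+k], using hashes as a filter."""
    index = {}
    for i, h in enumerate(rolling_hashes(a, k)):
        index.setdefault(h, []).append(i)
    matches = []
    for j, h in enumerate(rolling_hashes(b, k)):
        for i in index.get(h, ()):
            if a[i:i + k] == b[j:j + k]:  # verify to rule out hash collisions
                matches.append((i, j))
    return matches


if __name__ == "__main__":
    print(common_kgrams("abcde", "xxbcdx", 3))
```

In practice the inputs would be token streams rather than raw characters, so that renaming identifiers does not hide a match; that preprocessing is outside this sketch.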
The Propositional Satisfiability Problem (SAT) is one of the most fundamental NP-complete problems, and is central to many domains of computer science. Utilizing a massively parallel architecture on a Graphics Processing Unit (GPU) together with a conventional CPU on NVIDIA's Compute Unified Device Architecture (CUDA) platform, this work proposes an efficient scheme to implement one parallel Stochastic...
In this paper, we propose a parallel block-based Viterbi decoder (PBVD) on the graphic processing unit (GPU) platform for the decoding of convolutional codes. The decoding procedure is simplified and parallelized, and the characteristic of the trellis is exploited to reduce the metric computation. Based on the compute unified device architecture (CUDA), two kernels with different parallelism are designed...
Haze removal is performed in real time with CUDA, based on the atmospheric scattering model and a temporal coherence algorithm. First, a hierarchical search method based on quadtree subdivision replaces the original algorithm for obtaining the atmospheric light, and the number of pixels is used as the number of parallel threads, which process the required calculation of pixels, the intermediate results...