In this paper, an implementation of the adaptive cross approximation (ACA) on a GPU platform is presented, comprising two parts: matrix compression using the ACA, and batched matrix-vector products in H-matrix form. Numerical examples are provided to demonstrate the overall performance of the proposed GPU implementation of the ACA algorithm through comparison with a 4-threaded CPU algorithm. In these...
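As background for the abstract above: cross approximation builds a low-rank factorization from a few matrix rows and columns chosen greedily by pivot magnitude. A minimal CPU sketch in Python/NumPy, fully pivoted for clarity (production ACA uses partial pivoting so the full matrix and residual never need to be formed; the function name and tolerances here are our own):

```python
import numpy as np

def aca_compress(A, tol=1e-8, max_rank=50):
    """Greedy cross approximation: returns U, V with A ~= U @ V."""
    m, n = A.shape
    R = A.astype(float).copy()   # explicit residual, for clarity only
    U_cols, V_rows = [], []
    for _ in range(max_rank):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        pivot = R[i, j]
        if abs(pivot) < tol:     # residual small enough: stop
            break
        u = R[:, j].copy()
        v = R[i, :] / pivot
        U_cols.append(u)
        V_rows.append(v)
        R = R - np.outer(u, v)   # rank-1 update of the residual
    return np.array(U_cols).T, np.array(V_rows)
```

On a smooth kernel block (the typical H-matrix setting), the rank found this way is far below min(m, n), which is what makes the compressed matrix-vector products cheap.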
In this paper, we propose a new fine-grained clustering bias-field estimation and segmentation algorithm on a Single Instruction Multiple Data (SIMD) architecture (GPU). The goal is to accelerate the compute-intensive portions of the sequential version. We have implemented this parallel algorithm using the Compute Unified Device Architecture (CUDA) on different NVIDIA GPU cards. The numerical results in terms...
During the past decade, Graphics Processing Units (GPUs) have been increasingly employed to speed up compute-intensive scientific applications. In this field, the geometric multigrid method (GMG) is one of the most efficient algorithms for solving large sparse linear systems of equations. Herein we analyze the performance of an optimized GPU-based implementation of the GMG method on different state-of-the-art...
In many-core parallel computing, how to optimally allocate and schedule computing-core resources according to the characteristics of parallel applications is a typical and fundamental problem that directly affects computing performance. After analyzing the features and mechanisms of the Kepler CUDA architecture, three heterogeneous streaming parallel computing modes and their corresponding constraints,...
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU...
This paper presents a CUDA-based real-time solution for true-color synthesis of remote-sensing images. The solution reduces total execution time in three ways: (1) use pinned memory to reduce data-transfer time between host memory and GPU global memory; (2) use a look-up table to reduce computing time; (3) overlap kernel executions and data transfers, besides registering pages and executing kernels in parallel...
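The look-up-table idea in item (2) is easy to illustrate on the CPU: evaluate an expensive per-pixel mapping once for each of the 256 possible 8-bit values, then replace the arithmetic with an index. A hedged Python/NumPy sketch (the gamma stretch below is a made-up stand-in for the paper's actual color mapping):

```python
import numpy as np

def build_lut(gamma=0.8):
    # Evaluate the (hypothetical) radiometric stretch once for every
    # possible 8-bit input value: 256 pow() calls instead of one per pixel.
    x = np.arange(256, dtype=np.float64) / 255.0
    return np.clip(np.rint(255.0 * x ** gamma), 0, 255).astype(np.uint8)

def apply_lut(band, lut):
    # Per-pixel math becomes a table lookup (fancy indexing).
    return lut[band]
```

The same trick maps directly to a GPU kernel: the table fits in fast constant or shared memory, so each thread does one fetch instead of a transcendental evaluation.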
The Indonesia Colorectal Cancer Consortium (IC3), the first cancer biobank repository in Indonesia, is faced with computational challenges in analyzing large quantities of genetic and phenotypic data. To overcome this challenge, we explore and compare the performance of two parallel computing platforms that use central and graphics processing units. We present the design and implementation of a genome-wide...
We propose a new method derived from DACCER (Distributed Assessment of the Closeness CEntrality Ranking): the modified DACCER (MDACCER), for assessing the traditional closeness centrality ranking. MDACCER introduces a relaxation that allows it to take advantage of massively parallel environments such as General-Purpose Graphics Processing Units (GPGPUs). The original DACCER proposal assesses closeness centrality...
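The key idea behind DACCER-style methods is that each vertex can score itself using only its h-hop neighborhood, which is what makes the computation embarrassingly parallel (one GPU thread per vertex). A rough Python sketch of one such local measure, the degree sum of the h-hop ball; the exact metric and relaxation in the paper may differ, and all names here are our own:

```python
from collections import deque

def hop_volume(adj, v, h):
    # Sum of the degrees of all vertices within distance h of v,
    # found by a depth-limited BFS. Ranking vertices by this local
    # volume approximates the global closeness-centrality ranking.
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] < h:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
    return sum(len(adj[u]) for u in dist)
```

On a path graph 0-1-2-3-4 with h = 2, the middle vertex gets the largest volume, matching its top closeness rank, even though no vertex ever sees the whole graph.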
The OpenACC standard has been developed to simplify parallel programming of heterogeneous systems. Based on a set of high-level compiler directives, it allows application developers to offload code regions from a host CPU to an accelerator without the need for low-level programming with CUDA or OpenCL. Details are implicit in the programming model and managed by OpenACC API-enabled compilers and...
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is widely available to achieve high performance in desktop, notebook, and even mobile computer systems. While multicore technology has become the norm in modern computers, programming such systems requires an understanding of the underlying hardware architecture and hence poses a great challenge for...
How can GPU acceleration be obtained as a service in a cluster? This question has become increasingly significant due to the inefficiency of installing GPUs on all nodes of a cluster. The research reported in this paper is motivated to address the above question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS), such that the nodes of a cluster can request...
Enhancement algorithms can give low-light images a clear visual appearance similar to images captured during the daytime, but because of their high complexity and heavy computational cost, low-light image enhancement algorithms usually struggle to meet real-time requirements, which limits their use in practical applications. For this situation, a parallel optimization algorithm...
We present a new parallel approach to the Needleman-Wunsch algorithm for global sequence alignment. This approach uses a skewing transformation for traversal and calculation of the dynamic-programming matrix. We compare the execution time of a sequential CPU-based implementation with two parallel GPU-based implementations: single-kernel invocation with lock-free block synchronization, and multi-kernel invocation...
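The skewing transformation mentioned above reindexes the DP matrix by anti-diagonals: every cell with the same i + j depends only on the two previous anti-diagonals, so all cells on one diagonal are independent and could each be handled by one GPU thread. A sequential Python sketch of that traversal order (the scoring parameters are arbitrary, not the paper's):

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    # Needleman-Wunsch filled along anti-diagonals i + j = d.
    # Within one d, every (i, j) reads only diagonals d-1 and d-2,
    # so the inner loop is the parallelizable wavefront.
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = i * gap
    for j in range(1, n + 1):
        H[0][j] = j * gap
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
    return H[m][n]
```

The result is identical to the usual row-by-row fill; only the visiting order changes, which is exactly what the skewing transformation buys.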
Intelligent GPU cache bypassing can improve the efficiency of using GPU memory bandwidth, which can benefit GPU performance. In this paper, we study a pure hardware-based GPU cache bypassing method that can be applied to GPU applications without having to recompile the programs. Moreover, we introduce a hybrid method that can exploit profiling information to further enhance the hardware-based bypassing...
In this paper we investigate static memory access predictability in GPGPU workloads, at the thread block granularity. We first show that a significant share of accessed memory addresses can be predicted using thread block identifiers. We build on this observation and introduce a hardware-software prefetching scheme to reduce average memory access time. Our proposed scheme issues the memory requests...
Intuitionistic fuzzy edge detection has been used for the signification or characterization of images. The algorithm was designed by experts and aims to minimize errors; however, it uses a fixed threshold value. In this paper, a hybrid algorithm is developed using the Otsu method, which calculates a threshold value depending on the image. To be applicable...
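For context, Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram, which is why it adapts to each image where a fixed threshold cannot. A plain Python/NumPy sketch of the criterion (a straightforward textbook version, not the paper's GPU code):

```python
import numpy as np

def otsu_threshold(gray):
    # Score every candidate threshold t by the between-class variance
    # w0*w1*(mu0 - mu1)^2 of the histogram split, and keep the best.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

Because the search runs over only 256 candidates of a shared histogram, the expensive part on large images is the histogram itself, which is also the part that parallelizes well.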
There are many scientific applications, ranging from weather prediction to oil and gas exploration, that require high-performance computing, which helps industries and researchers to further their advancements. With the advent of general-purpose computing on GPUs, many such applications are shifting towards High-Performance Computing (HPC). Agent-based crowd simulation is one of the candidates...
Compute Unified Device Architecture (CUDA) is an architecture and programming model that allows leveraging the compute-intensive processing power of Graphics Processing Units (GPUs) to perform general, non-graphical tasks in a massively parallel manner. Hadoop is an open-source software framework that has its own file system, the Hadoop Distributed File System (HDFS), and its own programming...
Massively parallel computing is applied extensively in various scientific and engineering domains. With the growing interest in many-core architectures, and given the lack of explicit support for inter-block synchronization on GPUs in particular, efficient synchronization techniques are needed to minimize inter-block communication time. In this paper, we propose two new inter-block synchronization techniques:...
Support vector machine (SVM) is a popular classifier for small-scale datasets, with outstanding performance compared to other classifiers. However, the execution time is extremely long when training on big data. The Graphics Processing Unit (GPU) is a massively parallel device that performs very well as a co-processor. NVIDIA proposed a programming platform, CUDA, in 2006, which makes it much...