Stochastic Rotation Dynamics (SRD) is a novel particle-based simulation method that can be used to model complex fluids [1], [2], such as binary and ternary mixtures [3], and polymer solutions [4]-[6], in either two or three dimensions. Although SRD is efficient compared to traditional methods, it is still computationally expensive for large system sizes, e.g. when using a large array of particles...
This article describes a parallel algorithm for face detection in images on GPU architectures. The algorithm extends an algorithm from the OpenCV library. A computational structure is presented for the developed algorithm. A scheduling algorithm was also developed to balance the workload among the GPU's threads.
Checkpoint/restart has been an effective mechanism to achieve fault tolerance for many scientific applications. However, as GPUs take on a much bigger role in high performance computing, there is still no effective checkpoint/restart scheme, owing to the GPU's batch-mode execution. The paper proposes an application-level checkpoint/restart scheme to save and restore GPU computation states. A precompiler...
Scaling up the sparse matrix-vector multiplication has been at the heart of numerous studies in both academia and industry. The massive parallelism of graphics processing units offers tremendous performance in many high-performance computing applications. In this work, we discuss performance analysis for parallel implementation of sparse matrix-vector multiplication using the conjugate gradient algorithm...
Cloud detection and removal are an important requirement for change detection studies. Even a small amount of cloud cover may cause crucial information to be misinterpreted in disaster management applications. Although several cloud detection techniques exist, there is a critical need to apply these techniques in real time and obtain cloud-free images quickly to support real-time decisions.
We develop cache efficient, multicore, and GPU algorithms for RNA folding using Nussinov's equations. Our cache efficient algorithm provides a speedup between 1.6 and 3.0 relative to a naive straightforward single core code. The multicore version of the cache efficient single core algorithm provides a speedup, relative to the naive single core algorithm, between 7.5 and 14.0 on a 6 core hyperthreaded...
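For orientation, the textbook form of the Nussinov recurrence can be sketched as below. This is a naive single-core illustration, not the cache-efficient, multicore, or GPU code from the paper. The function name `nussinov_max_pairs`, the inclusion of G-U wobble pairs, and the absence of a minimum hairpin-loop length are all assumptions made for this example.

```python
# Hypothetical sketch of the textbook Nussinov dynamic program.
# Assumptions: G-U wobble pairs allowed, no minimum loop length.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq):
    """Return the maximum number of non-crossing base pairs in seq."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1: leave base i or base j unpaired.
            score = max(best[i + 1][j], best[i][j - 1])
            # Case 2: pair base i with base j.
            if (seq[i], seq[j]) in PAIRS:
                inner = best[i + 1][j - 1] if i + 1 <= j - 1 else 0
                score = max(score, inner + 1)
            # Case 3: split the interval into two independent parts.
            for k in range(i + 1, j):
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]
```

The triple loop makes the running time cubic in the sequence length, and the table is filled diagonal by diagonal, which is the access pattern that cache-aware and parallel variants typically reorganize.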
Computation state construction is an indispensable step to achieve fault tolerance and computation mobility for scientific applications by saving and restoring the state during program execution. However, as the GPU takes on a larger role in high performance computing, there is still no effective state construction scheme, owing to the GPU's batch-mode execution. The GPU's complex memory hierarchy...
The analysis of climatic parameters, vegetation, humidity and pollution in the domain of time and space is done by processing a series of images of a geographic area taken by the satellite at certain times [1]. These images are subject to several computing schemes, with the aim of evaluating spatial and temporal variations of the mentioned parameters. One of the programs used to manipulate the images...
Modern general-purpose graphics processing units (GPGPUs) exploit parallelism in applications by building massively parallel architectures and applying multithreading technology to hide instruction and memory latencies. Such architectures have become increasingly popular for parallel applications using the CUDA/OpenCL programming languages. In this paper, we investigate thread scheduling algorithms...
Frequent pattern mining is a field with many practical applications, where large computational power and speed are needed. Many state-of-the-art frequent pattern mining applications are inefficient solutions for both shared memory and multiprocessor systems due to problems with parallelism and memory. One possible solution to the problem is the use of a Graphics Processing Unit (GPU) in the system...
Graphics Processing Units (GPUs) are becoming an integral part of modern supercomputer architectures due to their high compute density and performance per watt. In order to maximize utilization, it is imperative that applications running on these clusters have low synchronization and communication overheads. Partitioned Global Address Space (PGAS) models provide an attractive approach for developing...
We develop an optimized FFT-based Poisson solver on a CPU-GPU heterogeneous platform for the case when the input is too large to fit in the GPU's global memory. The solver involves memory-bound computations such as 3D FFT in which the large 3D data may have to be transferred over the PCIe bus several times during the computation. We develop a new strategy to decompose and allocate the computation between...
Vertex coloring is a special case of the graph coloring problem. It is of great importance in many applications. Vertex coloring means coloring the vertices of a graph with a minimal number of colors (k), so that adjacent vertices have different colors. The paper presents a hybrid implementation of the Simulated Annealing algorithm for k-coloring of the vertices of the graph. The programming has been...
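The coloring constraint described above can be sketched as a small check. The edge-list representation and the names `count_conflicts` and `is_proper_coloring` are assumptions for illustration only. Counting conflicting edges is a common choice of cost for annealing-style searches, but the abstract does not say which cost function the paper uses.

```python
# Hypothetical helpers: the graph is an edge list, the coloring a dict
# mapping each vertex to a color index in range(k).


def count_conflicts(edges, coloring):
    """Count edges whose two endpoints share the same color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def is_proper_coloring(edges, coloring, k):
    """True if at most k colors are used and no edge is in conflict."""
    within_k = all(0 <= color < k for color in coloring.values())
    return within_k and count_conflicts(edges, coloring) == 0


# A 4-cycle is 2-colorable by alternating colors around the cycle.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
alternating = {0: 0, 1: 1, 2: 0, 3: 1}
clashing = {0: 0, 1: 0, 2: 1, 3: 1}
```

Here `alternating` passes the check with k = 2, while `clashing` leaves two edges in conflict.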
An exploit involving the greatest common divisor (GCD) of RSA moduli was recently discovered [1]. This paper presents a tool that can efficiently and completely compare a large number of 1024-bit RSA public keys, and identify any keys that are susceptible to this weakness. NVIDIA's graphics processing units (GPUs) and the CUDA massively-parallel programming model are powerful tools that can be used...
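The weakness itself is easy to show on toy numbers: if two moduli share a prime factor, their GCD reveals it. The sketch below is a naive all-pairs comparison on the CPU with tiny primes, not the paper's CUDA tool. The function name `shared_factors` and the sample moduli are made up for illustration.

```python
from math import gcd


def shared_factors(moduli):
    """Return (i, j, g) for every pair of moduli with a common factor g > 1."""
    hits = []
    for i in range(len(moduli)):
        for j in range(i + 1, len(moduli)):
            g = gcd(moduli[i], moduli[j])
            if g > 1:
                hits.append((i, j, g))
    return hits


# Toy moduli (real keys are 1024-bit): the first two reuse the prime 101.
toy_moduli = [101 * 103, 101 * 107, 109 * 113]
weak_pairs = shared_factors(toy_moduli)
```

With these inputs, `weak_pairs` contains the single entry `(0, 1, 101)`. Dividing either weak modulus by 101 yields its other prime factor. The all-pairs loop grows quadratically with the number of keys, which is why large key sets motivate a massively parallel implementation.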
Virtualization, as a technology that enables easy and effective resource sharing with a low cost and energy footprint, is becoming increasingly popular not only in enterprises but also in high performance computing. Applications with stringent performance needs often make use of graphics processors for accelerating their computations. Hence virtualization solutions that support GPU acceleration are...
Frequency domain analysis is one of the most common analysis techniques in signal and image processing. The Fast Fourier Transform (FFT) is a well-known tool used to perform such analysis by obtaining the frequency spectrum for time- or spatial-domain signals and vice versa. FFT-Shift is a subsequent operation used to handle the resulting arrays from this stage as it centers the DC component of the resulting...
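As a minimal illustration of what the shift does, the sketch below moves the DC term (index 0) of a 1D sequence to the center by swapping its two halves. It is a plain CPU version with assumed names `fft_shift` and `inverse_fft_shift`, following the same convention as `numpy.fft.fftshift`. It is not the GPU implementation the abstract refers to.

```python
def fft_shift(values):
    """Swap halves so that element 0 (the DC term) lands at index len // 2."""
    mid = (len(values) + 1) // 2
    return values[mid:] + values[:mid]


def inverse_fft_shift(values):
    """Undo fft_shift, which matters for odd lengths where halves differ."""
    mid = len(values) // 2
    return values[mid:] + values[:mid]


even_bins = [0, 1, 2, 3]
odd_bins = [0, 1, 2, 3, 4]
shifted_even = fft_shift(even_bins)
shifted_odd = fft_shift(odd_bins)
```

For even lengths the shift is its own inverse. For odd lengths the two halves have different sizes, so a separate inverse is needed to restore the original order.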
In this paper, four beamforming algorithms (i.e., interpolation and phase rotation with pre- and post-filtering, IBF-PRE, IBF-POST, PRBF-PRE and PRBF-POST, respectively) implemented on a high-performance graphics-processing unit (GPU) are presented. Each beamforming method was divided into two kernels consisting of various beamforming and mid-processing blocks and efficiently implemented on a NVIDIA's...
LDA (Latent Dirichlet Allocation) is a text modeling algorithm based on a generative probabilistic model. It is widely used to discover latent topics among a set of documents. Mahout implements the LDA algorithm; however, the execution time of the LDA program is very long when processing a large number of documents, because the documents are processed sequentially. This paper introduces a method to...
A common neural network used for complex data clustering is the Self-Organizing Map (SOM). This algorithm has an expensive training step, which arises mainly in high-dimensional applications like image clustering. This makes it impossible for some of these applications to run in real time or even in a feasible time. In this paper we explore the use of GPUs with the NVIDIA CUDA language to decrease computational...
More and more computationally intensive scientific applications make use of hardware accelerators like general purpose graphics processing units (GPGPUs). Compared to software development for typical multi-core processors, their programming is fairly complex and needs hardware-specific optimizations to utilize the full computing power. To achieve high performance, critical parts of a program have to...