Outlier detection is a data mining task that consists of discovering observations which deviate substantially from the rest of the data, and it has many important practical applications. Outlier detection in very large data sets is, however, computationally very demanding, and the size limit of the data that can be processed is pushed forward considerably by mixing three ingredients: efficient algorithms,...
Modern System-on-Chip (SOC) architectures offer much for a relatively small price, but industrial machine builders often use only a fraction of the functionality. Their main interest is the performance boost from using multiple cores. For safety devices, on-chip redundancy is beneficial for achieving higher reliability, but since most platforms are homogeneous, there is a need to get systematic and...
By analyzing the problem of mood diffusion and drawing on agent theory, an agent-based mood diffusion model was established. The parts of the model suited to parallel computing were designed and implemented with the CUDA programming tool, demonstrating that GPU computing can improve the efficiency of the model calculation.
This paper addresses the issue of efficient sorting of strings on multi- and many-core processors. We propose CPU and GPU implementations of the most-significant-digit radix sort algorithm using different parallelization strategies at various stages of the execution to achieve good workload balance and optimal use of system resources. We evaluate the performance of our solution on both architectures...
In this paper, we focus on the impact of a memory bandwidth limitation by analyzing the bandwidth consumption for a ray tracing system and present an energy efficient data transmission method using a dedicated interface between the processor and ray tracing hardware engine. To achieve real-time ray tracing, we propose a full-stream architecture through the use of this dedicated interface. For an evaluation...
In this article we present a parallel implementation of the Durand-Kerner algorithm to find roots of polynomials of high degree on a GPU architecture (Graphics Processing Unit). We have implemented both a CPU version in and a GPU compatible version with CUDA. The main result of our work is a parallel implementation that is 10 times as fast as its sequential counterpart on a single CPU for high degree...
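For readers unfamiliar with the method named above, the following is a minimal sequential sketch of the textbook Durand-Kerner iteration in Python. It is not the CUDA implementation from the article; the function name `durand_kerner`, the starting points, the iteration cap and the tolerance are all assumptions chosen for illustration. Every root estimate is updated from the previous iterate only, which is the property that lets the updates run independently in parallel.

```python
def durand_kerner(coeffs, max_iter=500, tol=1e-12):
    """Approximate all complex roots of a polynomial.

    coeffs lists the coefficients from the highest degree down.
    Hypothetical helper written for illustration only.
    """
    n = len(coeffs) - 1
    if n < 1:
        return []
    # Normalise so the leading coefficient is 1.
    monic = [c / coeffs[0] for c in coeffs]
    # Assumed starting points: powers of a complex number that is
    # neither real nor a root of unity.
    roots = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(max_iter):
        updated = []
        for i, z in enumerate(roots):
            # Evaluate the polynomial at z with Horner's rule.
            value = 0j
            for c in monic:
                value = value * z + c
            # Product of the differences to all other current estimates.
            denom = 1 + 0j
            for j, w in enumerate(roots):
                if j != i:
                    denom *= z - w
            updated.append(z - value / denom)
        change = max(abs(u - z) for u, z in zip(updated, roots))
        roots = updated
        if change < tol:
            break
    return roots


# Example: x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
approx = sorted(durand_kerner([1, -6, 11, -6]), key=lambda z: z.real)
```

Each pass costs on the order of n squared operations for a degree-n polynomial, which is why high-degree inputs benefit from evaluating the per-root updates concurrently.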
This paper proposes and evaluates a parallel strategy to execute the exact Smith-Waterman (SW) biological sequence comparison algorithm for huge DNA sequences in multi-GPU platforms. In our strategy, the computation of a single huge SW matrix is spread over multiple GPUs, which communicate border elements to the neighbour, using a circular buffer mechanism. We also provide a method to predict the...
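The idea of splitting one Smith-Waterman matrix into column blocks that pass border elements to a neighbour can be sketched sequentially in Python. This is only an illustration under stated assumptions: the blocks run one after another rather than on separate GPUs, a plain list stands in for the circular buffer, and the linear gap penalty, the scoring values and the names `sw_block` and `smith_waterman_blocked` are hypothetical choices, not taken from the paper.

```python
def sw_block(a, b_block, left_border, match, mismatch, gap):
    """Fill one column block of the score matrix.

    left_border[i] holds the score in row i of the column just left of
    this block. Returns the best score found in the block and the
    block's last column, which is the border for the next block.
    """
    rows = len(a) + 1
    right_border = [0] * rows
    best = 0
    prev = [0] * (len(b_block) + 1)  # row 0 of the matrix is all zeros
    for i in range(1, rows):
        curr = [0] * (len(b_block) + 1)
        curr[0] = left_border[i]  # value received from the neighbour
        for j in range(1, len(b_block) + 1):
            score = match if a[i - 1] == b_block[j - 1] else mismatch
            curr[j] = max(0, prev[j - 1] + score, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        right_border[i] = curr[-1]
        prev = curr
    return best, right_border


def smith_waterman_blocked(a, b, blocks=2, match=2, mismatch=-1, gap=-1):
    """Best local alignment score, computed block by block over columns."""
    width = max(1, -(-len(b) // blocks))  # ceiling division
    border = [0] * (len(a) + 1)           # column 0 is all zeros
    best = 0
    for start in range(0, len(b), width):
        block_best, border = sw_block(
            a, b[start:start + width], border, match, mismatch, gap
        )
        best = max(best, block_best)
    return best
```

Because each block needs only the single border column of its left neighbour, the data exchanged between blocks grows with the length of one sequence rather than with the size of the full matrix.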
Texture features introduced by Haralick in 1973, which rely on computing the so-called Gray Level Co-occurrence Matrix (GLCM), are used extensively by many applications to understand and enhance images acquired from various scientific contexts. The main limitations of these features are their high computational costs in memory usage and processing time. In this paper a Graphics Processing...
A GPU-based timing-aware ATPG is proposed to generate a compact high-quality test set. The test generation algorithm backtraces and propagates along multiple long paths so that many test patterns are generated at the same time. Generated test patterns are then fault simulated and selected. Compared with an 8-core CPU-based timing-aware commercial ATPG, the proposed GPU-based technique achieved 36%...
Numerous applications in science and engineering rely on sparse linear algebra. The efficiency of a fundamental kernel such as the Sparse Matrix-Vector multiplication (SpMV) is crucial for solving increasingly complex computational problems. However, the SpMV is notorious for its extremely low arithmetic intensity and irregular memory patterns, posing a challenge for optimization. Over the last few...
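To make the low arithmetic intensity and irregular memory access concrete, here is a minimal Python sketch of SpMV. It assumes the common compressed sparse row (CSR) layout, which the abstract does not name; the function name `spmv_csr` and the array names are likewise illustrative.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries, row by row
    col_idx : column index of each nonzero
    row_ptr : row_ptr[r]..row_ptr[r + 1] delimits the nonzeros of row r
    """
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # One multiply and one add per nonzero, against three loads
            # (value, column index, x entry). The x access is indirect,
            # so its address depends on the sparsity pattern.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
y = spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0])
```

The indirect read `x[col_idx[k]]` is the source of the irregular pattern: neighbouring nonzeros may touch distant entries of `x`, which limits caching and coalesced access.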
Modern graphics processing units (GPUs) have become powerful and cost-effective computing platforms. Parallel programming standards (e.g. CUDA) and directive-based programming standards (like OpenHMPP and OpenACC) are available to harness this tremendous computing power to tackle large-scale modelling and simulation in scientific areas. ANUGA is a tsunami modelling application which is based on unstructured...
Graphics Processing Units (GPUs) have been increasingly adopted by the High-Performance Computing community. Their unique hardware architecture supports hundreds or thousands of lightweight threads in a more power-efficient manner than traditional CPUs, and with higher overall performance. This motivates highly parallel applications to be ported to GPUs. Programming GPUs is not a trivial task...
In this work, we present a back-end for the Python library NumPy that utilizes the GPU seamlessly. We use dynamic code generation to generate kernels, and data is moved transparently to and from the GPU. For the integration into NumPy, we use the Bohrium runtime system. Bohrium hooks into NumPy through the implicit data parallelization of array operations; this approach requires no annotations or...
GPUs are becoming pervasive in scientific computing. Originally serving as peripheral accelerators, they are now gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code, such as OpenACC and HMPP, simply off-load "hot" CPU code onto GPUs, which entails many limitations such as unsupported external calls and coarse-grained...
Modern parallel programming requires a combination of different paradigms, expertise and tuning that correspond to the different levels in today's hierarchical architectures. To cope with the inherent difficulty, ORWL (ordered read-write locks) presents a new paradigm and toolbox centered around local or remote resources, such as data, processors or accelerators. ORWL programmers describe their computation...
Today, the industry's old adage of sequential processing is certainly no longer sufficient. The need for high performance computation is ever growing, even though certain problem sets remain within the realm of super high performance computing, with applications such as weather forecasting, quantum physics and climate research, to name a few. Within the commercial realm of computation, NVIDIA has proposed...
In this paper we introduce a parallel implementation of the locally- and feature-adaptive diffusion-based (LFAD) method for image denoising using the NVIDIA CUDA framework and graphics processing units (GPUs). LFAD is a novel method for removing additive white Gaussian (AWG) noise in images, reported to yield high quality denoised images [1]. It approaches each image region separately and uses different number...
The purpose of this study is to evaluate the performance of a two-dimensional multi-threaded linear filtering process on GPU and FPGA platforms. The OpenCL API is used to obtain the implementation on the different platforms, since OpenCL offers the advantage of platform-independent programming. The results on three different platforms are compared to each other within this scope. These platforms are CPU, GPU, and FPGA...
In order to improve the layout quality of a VLSI design, many placement tools employ clustering algorithms to prune the optimization space and produce a design that can be enhanced while considering multiple design constraints. An intelligent clustering algorithm can guide a placement tool to reduce wire length, reduce cycle time, consider additional metrics or optimize a design based on a combination...
GPUs (Graphics Processing Units) are designed to solve large data-parallel problems encountered in the fields of image processing, scene rendering, video playback, and gaming. GPUs are therefore designed to handle a higher degree of parallelism as compared to conventional CPUs. GPGPU (General Purpose computing on Graphics Processing Units) enables users to do parallel computing on the graphics hardware...