The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper proposes a parallelization scheme for parameter sweep (PS) applications using the compute unified device architecture (CUDA). Our scheme focuses on PS applications with irregular access patterns, which usually result in lower performance on the GPU. The key idea to resolve this irregularity is to exploit the similarity of data accesses between different parameters. That is, the scheme simultaneously...
Over the last few years, we have witnessed the proliferation of GPU devices on HPC environments. Manufacturers produce new versions of their devices every few years, though, posing a new problem for scientists and engineers using their technology: is it worth the time and effort spent optimizing the codes for the current version? Or it is better to wait until a new architecture appears? In this paper,...
This paper focuses on solving large size optimization problems using GPGPU. Evolutionary Algorithms for solving these optimization problems suffer from the curse of dimensionality, which implies that their performance deteriorates as quickly as the dimensionality of the search space increases. This difficulty makes very challenging the performance studies for very high dimensional problems. Furthermore,...
As many-core graphics processors gain an increasingly important position concerning the advancements on modern highly concurrent processors, we are experiencing the deployment of the first heterogeneous clusters that are based on GPUs. The attempts to match future expectations in computational power and energy saving with hybrid - GPU-based - clusters are expected to grow in the next years, and much...
Integrated circuit (IC) design automation has traditionally followed a hierarchical approach. Modern IC design flow is divided into sequentially-addressed design and optimization layers; each successively finer in design detail and data granularity while increasing in computational complexity. Eventual agreement across the design layers signals design closure. Obtaining design closure is a continual...
Significant research has demonstrated the performance and power benefits of runtime dynamic reconfiguration of FPGAs and microprocessor/FPGA devices. For dynamically reconfigurable systems, in which the selection of hardware coprocessors to implement within the FPGA is determined at runtime, online estimation methods are needed to evaluate the performance and power consumption impact of the hardware...
Logic simulation of a VLSI chip is a computationally intensive process. There exists an urgent need to map functional validation algorithms onto parallel architectures to aid hardware designers in meeting time-to-market constraints. In this paper, we propose three novel methods for logic simulation of combinational circuits on GPGPUs. Initial experiments run on two methods using benchmark circuits...
Although general purpose GPUs have relatively high computing capacity, they also introduce high power consumption compared with general purpose CPUs. Therefore low-power techniques targeted for GPUs will be one of the most hot topics in the future. On the other hand, in several application domains, users are unwilling to sacrifice performance to save power. In this paper, we propose an effective kernel...
A flexible and scalable approach for LDPC decoding on CUDA based Graphics Processing Unit (GPU) is presented in this paper. Layered decoding is a popular method for LDPC decoding and is known for its fast convergence. However, efficient implementation of the layered decoding algorithm on GPU is challenging due to the limited amount of data-parallelism available in this algorithm. To overcome this...
In this paper, we present a GPU-based algorithm which accelerates the MoM impedance matrix computation. Based on an efficient quasi-one-dimensional approximation of the reaction integrals, the MPIE formulation for the analysis of microstrip circuits is considered. We use NVIDIA CUDA as GPU development tool and choose an edge-connected line-fed patch antenna as reference problem. In order to demonstrate...
Nowadays, Graphics Processing Unit (GPU), as a kind of massive parallel processor, has been widely used in general purposed computing tasks. Although there have been mature development tools, it is not a trivial task for programmers to write GPU programs. Based on this consideration, we propose a novel parallel computing architecture. The architecture includes a parallel programming model, named Gemma,...
Today's high performance embedded computing applications are posing significant challenges for processing throughout. Traditionally, such applications have been realized on application specific integrated circuits (ASICs) and/or digital signal processors (DSP). However, ASICs' advantage in performance and power often could not justify the fast increasing fabrication cost, while current DSP offers...
This paper provides the first comparison of performance and energy efficiency of high productivity computing systems based on FPGA (Field-Programmable Gate Array) and GPU (Graphics Processing Unit) technologies. The search for higher performance compute solutions has recently led to great interest in heterogeneous systems containing FPGA and GPU accelerators. While these accelerators can provide significant...
Graphics Processing Unit (GPU) has become an attractive coprocessor for scientific computing due to its massive processing capability. The sparse matrix-vector multiplication (SpMV) is a critical operation in a wide variety of scientific and engineering applications, such as sparse linear algebra and image processing. This paper presents an auto-tuning framework that can automatically compute and...
Emerging general-purpose Graphics Processing Unit (GPU) provides a multi-core platform for wide applications, including machine learning algorithms. In this paper, we proposed several techniques to accelerate Support Vector Machines (SVM) on GPUs. Sparse matrix format is introduced into parallel SVM to achieve better performance. Experimental results show that the speedup of 55x-133.8x over LIBSVM...
CUDA (Compute Unified Device Architecture) acceleration of very large scale matrix-vector and matrix-matrix multiplication is presented in this paper. The intrinsic parallelism in the matrix computations are exploited thoroughly. By dividing the entire matrix computation to multiple sub-groups, scalable performance improvement can be achieved using multiple GPUs. The key operations are accelerated...
The host-multi-SIMD chip multiprocessor (CMP) architecture has been proved to be an efficient architecture for high performance signal processing which explores both task level parallelism by multi-core processing and data level parallelism by SIMD processors. Different from the cache-based memory subsystem in most general purpose processors, this architecture uses on-chip scratchpad memory (SPM)...
As one of the most popular accelerators, Graphics Processing Unit (GPU) has demonstrated high computing power in several application fields. On the other hand, GPU also produces high power consumption and has been one of the most largest power consumers in desktop and supercomputer systems. However, software power optimization method targeted for GPU has not been well studied. In this work, we propose...
GPU has recently gained considerable attention in getting significant performance, for application raging from scientific computing to database sorting and search. General-purpose computing on GPU can easily reduce the execution time but results in an associated increase in the energy consumption. This paper analyzes energy consumption of parallel algorithms executing on GPU and provide a methodology...
The Graphical Processing Unit (GPU) has become a competitive general purpose computational hardware platform in the last few years. Recent improvements in GPUs highly parallel programming capabilities such as Compute Unified Device Architecture(CUDA) has lead to a variety of complex applications with tremendous performance improvements. Genetic Sequence alignment is considered to be one of the application...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.