The Conjugate Gradient method is a very efficient iterative method for solving the large systems of equations that arise in real-life scientific computing applications. In this paper we briefly present the Conjugate Gradient method and its variants. We also present a comparative analysis of implementations of this method on platforms suitable for high-performance computing, such as FPGAs and GPUs.
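As a rough illustration of the basic method named in this abstract, the sketch below implements the textbook Conjugate Gradient iteration in plain Python. It assumes a small, dense, symmetric positive-definite matrix stored as a list of lists; the function names (`dot`, `matvec`, `conjugate_gradient`) and the default tolerance are illustrative choices, not taken from the paper, and none of the FPGA or GPU variants it compares are reproduced here.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product; A is a list of rows."""
    return [dot(row, v) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Approximately solve A x = b, assuming A is symmetric positive-definite."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the starting guess x = 0
    p = list(r)          # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break        # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # keeps directions A-conjugate
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

For the 2x2 system with `A = [[4.0, 1.0], [1.0, 3.0]]` and `b = [1.0, 2.0]`, the exact solution is `[1/11, 7/11]`. The matrix-vector product is the dominant cost per iteration, which is why it is the usual target when such a solver is moved to parallel hardware.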
This paper presents a different approach to parallelizing the Doolittle Algorithm with the help of Intel Threading Building Blocks (TBB), allowing users to utilize the power of the multiple cores present in modern CPUs. The Parallel Doolittle Algorithm (PDA) is divided into three parts: decomposing the data, processing the data in parallel, and finally composing the data. Using the PDA we can solve the...
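For orientation, here is a minimal sequential sketch of the standard Doolittle LU decomposition (unit diagonal in L) in plain Python. It is not the paper's PDA and does not use TBB; the function name `doolittle_lu` is an illustrative choice, and the sketch assumes a square matrix that needs no pivoting.

```python
def doolittle_lu(A):
    """Factor a square matrix A into L (unit lower) and U (upper) with A = L U.

    Assumes no pivoting is needed; raises ZeroDivisionError on a zero pivot.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Row i of U: each entry depends only on earlier rows of U.
        for j in range(i, n):
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        # Column i of L: each entry depends only on earlier columns of L.
        for j in range(i + 1, n):
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U
```

Within a fixed step `i`, the entries of row `i` of U are independent of one another, as are the entries of column `i` of L, so those two inner loops are the natural place to distribute work across cores.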
The Branch-and-Bound (B&B) method is a well-known optimization algorithm for solving integer linear programming (ILP) models in the field of operations research. It is part of software often employed by businesses for finding solutions to problems such as airline scheduling problems. It operates according to a divide-and-conquer principle by building a tree-like structure with nodes that represent...
This paper presents a DDC (digital down converter) implemented on an NVIDIA GTX 580. It consists of a DDS (direct digital synthesizer), a CIC (cascaded integrator-comb) decimation filter and an FIR (finite impulse response) filter. The decimating factor of the CIC decimation filter can be an arbitrary positive integer, and the main concern is how to make it work well while the decimating factor...
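To make the CIC stage concrete, the following is a minimal reference sketch of a CIC decimator in plain Python: N integrators at the input rate, downsampling by R, then N combs at the output rate. It assumes integer samples and a differential delay of 1; the function name `cic_decimate` and its parameters are hypothetical, and the sketch does not reflect the paper's GPU implementation.

```python
def cic_decimate(samples, R, N=1):
    """Decimate `samples` by the integer factor R with an N-stage CIC filter.

    Assumes a differential delay of 1, so the DC gain is R**N.
    """
    if R < 1 or N < 1:
        raise ValueError("R and N must be positive integers")
    data = list(samples)
    # N integrator stages (running sums) at the input sample rate.
    for _ in range(N):
        acc = 0
        integrated = []
        for s in data:
            acc += s
            integrated.append(acc)
        data = integrated
    # Keep every R-th sample, starting with the R-th one.
    data = data[R - 1::R]
    # N comb stages (first differences) at the reduced sample rate.
    for _ in range(N):
        prev = 0
        combed = []
        for s in data:
            combed.append(s - prev)
            prev = s
        data = combed
    return data
```

With a constant input of ones, `cic_decimate([1] * 8, 4)` returns `[4, 4]`, matching the DC gain of `R**N`. Any positive integer R works because the decimation factor only sets the downsampling stride.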
In this paper, we solve the gravity equations on hybrid multi-CPU/GPU architectures using high-order finite elements. Domain decomposition methods are inherently parallel algorithms, making them excellent candidates for implementation on hybrid architectures. Here, we propose a new stochastic-based optimization procedure for the optimized Schwarz domain decomposition method, which is implemented and tuned to graphics...
Rendering 3D workloads using the least power possible is an increasingly important quality of computing platforms. Current platforms do not achieve this goal because they power the Central Processing Units (CPUs) at frequencies above the minimum required for these workloads to operate without performance loss. Higher than necessary frequencies yield greater than necessary power consumption. This paper...
A comparison of pseudospectral methods implemented with PGI OpenACC, CUDA Fortran, and Nvidia CUDA on a single GPU, and with GCC Fortran on single and multiple CPU cores, is reported. The GPU implementations use cuFFT and the CPU implementations use FFTW. Porting pre-existing Fortran codes to GPUs is efficient and easy with OpenACC and CUDA Fortran. Example programs are provided.
Control of safety-critical applications requires the use of control systems with a defined safety level. Requirements must be met not only for safety but, in some cases, also for the reliability of the control system. Achieving these properties depends on choosing an appropriate structure for the control system. Safety programmable logic controllers (PLCs) are modular systems and allow...
Many geophysical problems are computationally expensive owing to their iterative nature or to the processing of large datasets. Such problems are challenging and have to be approached with extreme caution, because a wrong parameter selection not only leads to wrong results but also wastes a lot of time. The Compute Unified Device Architecture (CUDA) introduced by NVIDIA has enabled...
General Purpose GPU (GPGPU) computation relies heavily on intrinsic high data-parallelism to achieve significant speedups. However, application programs may not be able to fully utilize these parallel computing resources due to intrinsic data dependencies or complex data pointer operations. In this paper, we use aggressive software-based value prediction techniques on GPUs to accelerate programs that...
In parallel computing, memory requirements are an important problem, and in parallel software development it is vital to optimize the memory management strategy. Programmers need to know the degree of memory optimization. However, speedup, the performance evaluation metric for parallel programs, refers only to computing time and does not consider the memory cost of executing programs. In this paper, the relationship...
We describe computational experiments exploring the performance improvements from overlapping computation and communication on hybrid parallel computers. Our test case is explicit time integration of linear advection with constant uniform velocity in a three-dimensional periodic domain. The test systems include a Cray XT5, a Cray XE6, and two multicore InfiniBand clusters with different generations...
Due to large power grid sizes, IR-drop analysis is a computationally challenging design flow step that is commonly used in integrated circuit design. Variability in silicon and circuit operating conditions makes IR-drop analysis even more challenging. We introduce a flow that takes advantage of a graphics processing unit (GPU). We introduce variability for the power grid elements through Monte Carlo runs...
State-of-the-art computer architecture is based on multicore processor technology. Nowadays processors contain even more than ten cores. On the other hand, new technologies have emerged that enable the use of GPUs in general-purpose computing. Moreover, GPUs have become easier to program, which allows developers to effectively exploit their computational power. Currently, major chip manufacturers are developing...
In the endpoint mixing scheme, the call control and media data between the terminal nodes are exchanged via the neighboring terminal nodes in a hierarchical structure. In this paper, we present a formal method to calculate the maximum allowable number of neighboring terminal nodes in the hierarchical conference. This is derived by considering the computing resources and remaining power. We also define...
Power gating (PG) and body biasing (BB) are popular leakage control techniques at the microarchitectural level. However, their large overhead prevents them from being applied to active leakage reduction. The overhead problem is further magnified by temperature and process variation, leading to the “corner case leakage control” problem. This paper presents an Adaptive Light-Weight Vth Hopping technique...
In this paper, we propose an approach for significantly improving the performance of parallel matrix-matrix multiplication using a GPU-accelerated cluster. For one node, we implement a CPUs-GPU parallel double-precision general matrix-matrix multiplication (dgemm) operation and achieve a performance improvement of 32% as compared to the GPU-only case and 56% as compared to the CPUs-only case. For...
The main purpose of this paper is to demonstrate how we use a powerful graphics processor, the NVIDIA GTX280, in numerical simulation with support for double-precision floating-point numbers. Applying the finite volume method to simulate the Euler equations, we examined two well-known examples of travelling shock waves at high resolution. At best, we achieved performance 878 times faster than a Core 2 Duo...
This paper proposes an efficient VLSI extraction algorithm that converts a transistor-level netlist into a gate-level netlist for functional verification and diagnosis. Compared with other reported circuit extraction algorithms, our proposed technique does not require a cell library and is able to generate Boolean equations without prior knowledge of the transistor type or drain/source orientation of the...
The Variable Preconditioned GCR (VPGCR) method with mixed precision on a Graphics Processing Unit (GPU) using the Compute Unified Device Architecture (CUDA) is numerically investigated. The convergence theorem of VPGCR guarantees that the residual equation for the preconditioning procedure can be solved within the range of single-precision operations. The results of computations show that VPGCR with mixed precision...