Search results

Items from 1 to 20 out of 28 results

chapter

A review on accelerating scientific computations using the Conjugate Gradient method

Shreyasee Debnath, Manashwi Tamuli, Ashok Ray, Gaurav Trivedi

2015 International Conference on Electronic Design, Computer Networks & Automated Verification (EDCAV) > 150 - 153

2015 International Conference on Electronic Design, Computer Networks & Automated Verification (EDCAV)

Conjugate Gradient method is a very efficient iterative method for solving large systems of equations arising from real life scientific computing applications. In this paper we present the Conjugate Gradient method and its variants in brief. We also present a comparative analysis of implementations of this method on various platforms like FPGAs, GPUs etc which are suitable for High Performance Computing.

chapter

Parallelizing doolittle algorithm using TBB

Sushil Kumar Sah, Dinesh Naik

2014 International Conference on Parallel, Distributed and Grid Computing > 13 - 15

2014 International Conference on Parallel, Distributed and Grid Computing (PDGC)

This paper presents a different approach for parallelizing the Doolittle Algorithm with the help of Intel Threading Building Blocks (TBB), allowing users to utilize the power of the multiple cores present in modern CPUs. The Parallel Doolittle Algorithm (PDA) is divided into 3 parts: decomposing the data, processing the data in parallel, and finally composing the data. Using the PDA we can solve the...

chapter

A Branch-and-Bound algorithm using multiple GPU-based LP solvers

Xavier Meyer, Bastien Chopard, Paul Albuquerque

20th Annual International Conference on High Performance Computing > 129 - 138

2013 20th International Conference on High Performance Computing (HiPC)

The Branch-and-Bound (B&B) method is a well-known optimization algorithm for solving integer linear programming (ILP) models in the field of operations research. It is part of software often employed by businesses for finding solutions to problems such as airline scheduling problems. It operates according to a divide-and-conquer principle by building a tree-like structure with nodes that represent...

chapter

Implementation of a digital down converter using graphics processing unit

Xiao Ma, Lixia Deng, Yuping Zhao

2013 15th IEEE International Conference on Communication Technology > 655 - 660

2013 15th IEEE International Conference on Communication Technology (ICCT)

This paper presents a DDC (digital down converter) on an NVIDIA 580 GTX, which consists of a DDS (direct digital synthesizer), a CIC (cascade integrator comb) decimation filter and a FIR (finite impulse response) filter. The decimating factor of the CIC decimation filter can be an arbitrary positive integer and the major concern is concentrated on how to drive it to work well while the decimating factor...

chapter

Schwarz Method with Two-Sided Transmission Conditions for the Gravity Equations on Graphics Processing Unit

Abal-Kassim Cheik Ahamed, Frederic Magoules

2013 12th International Symposium on Distributed Computing and Applications to Business, Engineering & Science > 105 - 109

2013 12th International Symposium on Distributed Computing and Applications to Business, Engineering & Science (DCABES)

In this paper, we solve the gravity equations on hybrid multi-CPU/GPU using high order finite elements. Domain decomposition methods are inherently parallel algorithms making them excellent candidates for implementation on hybrid architectures. Here, we propose a new stochastic-based optimization procedure for the optimized Schwarz domain decomposition method, which is implemented and tuned to graphics...

chapter

Limiting CPU power consumption for efficient computation of 3D workloads

Travis Schluessler, Jacky Romano, Stas Gurtovoy, Guy Zadicario, et al.

2012 International Conference on Energy Aware Computing > 1 - 6

2012 International Conference on Energy Aware Computing (ICEAC)

Rendering 3D workloads using the least power possible is an increasingly important quality of computing platforms. Current platforms do not achieve this goal because they power the Central Processing Units (CPUs) at frequencies above the minimum required for these workloads to operate without performance loss. Higher than necessary frequencies yield greater than necessary power consumption. This paper...

chapter

Performance of FORTRAN and C GPU Extensions for a Benchmark Suite of Fourier Pseudospectral Algorithms

B. Cloutier, B.K. Muite, P. Rigge

2012 Symposium on Application Accelerators in High Performance Computing > 145 - 148

2012 Symposium on Application Accelerators in High Performance Computing (SAAHPC)

A comparison of PGI OpenACC, FORTRAN CUDA, and Nvidia CUDA pseudospectral methods on a single GPU and GCC FORTRAN on single and multiple CPU cores is reported. The GPU implementations use cuFFT and the CPU implementations use FFTW. Porting pre-existing FORTRAN codes to utilize GPUs is efficient and easy to implement with OpenACC and CUDA FORTRAN. Example programs are provided.

chapter

Influence of the control system structure with safety PLC on its reliability and safety

Juraj Zdansky, Peter Nagy

2012 ELEKTRO > 395 - 399

2012 ELEKTRO

Control of safety-critical applications requires the use of control systems with a defined safety level. It is necessary to fulfil requirements not only on safety but in some cases also on the reliability of the control system. Achievement of these properties depends on the choice of an appropriate structure of the control system. Safety programmable logic controllers (PLC) are modular systems and allow...

chapter

CUDA based Particle Swarm Optimization for geophysical inversion

Debanjan Datta, Suman Mehta, Shalivahan, Ravi Srivastava

2012 1st International Conference on Recent Advances in Information Technology (RAIT) > 416 - 420

2012 1st International Conference on Recent Advances in Information Technology (RAIT)

Many geophysical problems are computationally expensive owing to their iterative nature or to the programs processing large datasets. Such problems are challenging and have to be approached with extreme caution because a wrong parameter selection will not only lead to wrong results but will also take up a lot of time. The Compute Unified Device Architecture (CUDA) introduced by NVIDIA has enabled...

chapter

Aggressive Value Prediction on a GPU

Enqiang Sun, David Kaeli

2011 23rd International Symposium on Computer Architecture and High Performance Computing > 9 - 16

2011 23rd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

General Purpose GPU (GPGPU) computation relies heavily on intrinsic high data-parallelism to achieve significant speedups. However, application programs may not be able to fully utilize these parallel computing resources due to intrinsic data dependencies or complex data pointer operations. In this paper, we use aggressive software-based value prediction techniques on GPUs to accelerate programs that...

chapter

Scalable time definite integration in parallel computing

Yue Hu, Wei-qin Tong, Xiao-li Zhi, Huai-liang Xuan

2011 IEEE 2nd International Conference on Computing, Control and Industrial Engineering > 2 > 13 - 16

2011 IEEE 2nd International Conference on Computing, Control and Industrial Engineering (CCIE 2011)

In parallel computing, the memory requirement is an important problem, and in parallel software development, it is vital to optimize the memory management strategy. Programmers need to know the memory optimizing degree. But, the parallel programs' performance evaluation metric speedup only refers to computing time, without considering the memory cost when executing programs. In this paper, the relationship...

chapter

Overlapping Computation and Communication for Advection on Hybrid Parallel Computers

J.B. White III, J.J. Dongarra

2011 IEEE International Parallel & Distributed Processing Symposium > 59 - 67

2011 IEEE International Parallel & Distributed Processing Symposium (IPDPS)

We describe computational experiments exploring the performance improvements from overlapping computation and communication on hybrid parallel computers. Our test case is explicit time integration of linear advection with constant uniform velocity in a three-dimensional periodic domain. The test systems include a Cray XT5, a Cray XE6, and two multicore Infiniband clusters with different generations...

chapter

Fast variational static IR-drop analysis on the graphical processing unit

R O Topaloglu

2011 12th International Symposium on Quality Electronic Design > 1 - 6

2011 12th International Symposium on Quality Electronic Design (ISQED 2011)

Due to large power grid sizes, IR-drop analysis is a computationally challenging design flow step that is commonly used in integrated circuit design. Variability in silicon and circuit operating conditions makes IR-drop analysis even more challenging. We introduce a flow to take benefit of a graphical processing unit (GPU). We introduce variability for the power grid elements through Monte Carlo runs...

chapter

Using hybrid GPU algorithm for solving of EMC problems

R. Jobava, P. Tsereteli, K. Odisharia

2011 XVIth International Seminar/Workshop on Direct and Inverse Problems of Electromagnetic and Acoustic Wave Theory (DIPED) > 156 - 160

2011 XVth International Seminar/Workshop on Direct and Inverse Problems of Electromagnetic and Acoustic Wave Theory - (DIPED)

The state-of-the-art computer architecture is based on multi-core processor technology. Nowadays processors contain even more than ten cores. On the other hand, new technologies have emerged that enable using GPUs in general purpose computing. Moreover, GPUs have become easier to program, which allows developers to effectively exploit their computational power. Currently, major chip manufacturers are developing...

chapter

Connection margin value of the terminal in a hierarchical conference

Dongsu Seong, Keonbae Lee

The 5th International Conference on New Trends in Information Science and Service Science > 2 > 277 - 279

2011 5th International Conference on New Trends in Information Science and Service Science (NISS)

In the endpoint mixing scheme, the call control and media data between the terminal nodes are exchanged via the neighboring terminal nodes with hierarchical structure. In this paper, we show the formal method to calculate the maximum allowable number of neighboring terminal nodes in the hierarchical conference. This is derived by considering the computing resources and remaining power. We also define...

chapter

Stretching the limit of microarchitectural level leakage control with Adaptive Light-Weight Vth Hopping

Hao Xu, Wen-Ben Jone, R Vemuri

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) > 632 - 636

2010 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2010)

Power gating (PG) and body biasing (BB) are popular leakage control techniques at microarchitectural level. However, their large overhead prevents them from being applied for active leakage reduction. The overhead problem is further magnified by temperature and process variation, leading to the “corner case leakage control” problem. This paper presents an Adaptive Light-Weight Vth Hopping technique...

chapter

Parallel Matrix-Matrix Multiplication Based on HPL with a GPU-Accelerated PC Cluster

Qin Wang, J Ohmura, S Axida, T Miyoshi, et al.

2010 First International Conference on Networking and Computing > 243 - 248

2010 First International Conference on Networking and Computing (ICNC 2010)

In this paper, we propose an approach for significantly improving the performance of parallel matrix-matrix multiplication using a GPU-accelerated cluster. For one node, we implement a CPUs-GPU parallel double-precision general matrix-matrix multiplication (dgemm) operation and achieve a performance improvement of 32% as compared to the GPU-only case and 56% as compared to the CPUs-only case. For...

chapter

The Possibility of Fast Large-Scale Numerical Simulation Implemented with Graphics Processing Units

Chi-Jer Yu, Chii-Tung Liu

International Symposium on Parallel and Distributed Processing with Applications > 550 - 556

2010 International Symposium on Parallel and Distributed Processing with Applications (ISPA 2010)

The main purpose of this paper is to demonstrate how we make use of the powerful graphics processor, NVIDIA GTX280, in numerical simulation with the support of double precision floating point numbers. Applying the finite volume method to simulate the Euler equation, two well-known examples of travelling shock waves were examined in high resolution. We achieved at best 878 times faster than a Core 2 Duo...

chapter

An efficient VLSI circuit extraction algorithm for transistor-level to gate-level abstraction

Ye Ren, Yiqiong Shi, Bah-Hwee Gwee, Chan Wai Ting

2010 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia) > 49 - 52

2010 Second Asia Pacific Conference on Postgraduate Research in Microelectronics & Electronics (PrimeAsia 2010)

This paper proposes an efficient VLSI extraction algorithm to extract a transistor level netlist to a gate level netlist for functional verification and diagnosis. Compared with other reported circuit extraction algorithms, our proposed technique does not require a cell library and is able to generate Boolean equations without the prior knowledge of transistor type or drain/source orientation of the...

chapter

Implementation of Variable Preconditioned GCR with mixed precision on GPU using CUDA

Soichiro Ikuno, Norihisa Fujita, Susumu Yamamoto, Susumu Nakata

Digests of the 2010 14th Biennial IEEE Conference on Electromagnetic Field Computation > 1

2010 14th Biennial IEEE Conference on Electromagnetic Field Computation (CEFC 2010)

The Variable Preconditioned GCR (VPGCR) with mixed precision on Graphics Processing Unit (GPU) using Compute Unified Device Architecture (CUDA) is numerically investigated. The convergence theorem of VPGCR guarantees that the residual equation for the preconditioned procedure can be solved in the range of single precision operation. The results of computations show that VPGCR with mixed precision...

Keywords:
EQUATIONS

INFONA - science communication portal
