Search results

chapter

A Checkpoint/Restart Scheme for CUDA Applications with Complex Memory Hierarchy

Xinyuan Guo, Hai Jiang, Kuan-Ching Li

2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing > 247 - 252

2013 14th ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)

Checkpoint/restart has been an effective mechanism to achieve fault tolerance for many scientific applications. However, as GPUs play a much bigger role in high-performance computing, there is not yet an effective checkpoint/restart scheme because of the GPU's batch-mode execution. The paper proposes an application-level checkpoint/restart scheme to save and restore GPU computation states. A precompiler...

chapter

DynaM: Dynamic Multiresolution Data Representation for Large-Scale Scientific Analysis

Yuan Tian, Scott Klasky, Weikuan Yu, Bin Wang, more

2013 IEEE Eighth International Conference on Networking, Architecture and Storage > 115 - 124

2013 IEEE 8th International Conference on Networking, Architecture, and Storage (NAS)

Fast growing large-scale systems enable scientific applications to run at a much larger scale and accordingly produce gigantic volumes of simulation output. Such data imposes a grand challenge to post-processing tasks such as visualization and data analysis, because these tasks are often performed at a host machine that is remotely located and equipped with much less memory and storage resources....

chapter

Sparse matrix-vector multiply on the Texas Instruments C6678 Digital Signal Processor

Yang Gao, Jason D. Bakos

2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors > 168 - 174

2013 IEEE 24th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

The Texas Instruments (TI) C6678 “Shannon” is TI's most recently-released Digital Signal Processor (DSP). Although its original purpose was voice and video encoding and decoding, it may have the potential to become a practical coprocessor for scientific computing. In this paper, we evaluate the C6678 in terms of its programming methodology, performance, and power efficiency. As a case study, we implemented...

chapter

Quantitative measurement of gas component using multisensor array and NPSO-based LS-SVR

Kai Song, Qi Wang, Jianfeng Li, Hongquan Zhang

2013 IEEE International Instrumentation and Measurement Technology Conference (I2MTC) > 1740 - 1743

2013 IEEE International Instrumentation and Measurement Technology Conference (I2MTC)

To address the nonlinear response of semiconductor gas sensors and their cross-sensitivity to non-target gases, this paper studies a gas concentration measurement method based on a gas sensor array and least squares support vector regression (LS-SVR). Methane (CH4), hydrogen (H2) and their mixtures are selected as the target gases. A multi-sensor array is composed of four metal oxide semiconductor (MOS) gas sensors...

chapter

Virtual Systolic Array for QR Decomposition

Jakub Kurzak, Piotr Luszczek, Mark Gates, Ichitaro Yamazaki, more

2013 IEEE 27th International Symposium on Parallel and Distributed Processing > 251 - 260

2013 IEEE International Symposium on Parallel & Distributed Processing (IPDPS)

Systolic arrays offer a very attractive, data centric, execution model as an alternative to the von Neumann architecture. Hardware implementations of systolic arrays turned out not to be viable solutions in the past. This article shows how the systolic design principles can be applied to a software solution to deliver an algorithm with unprecedented strong scaling capabilities. Systolic array for...

chapter

Experimental framework for searching large RDF on GPUs based on key-value storage

Chidchanok Choksuchat, Chantana Chantrapornchai

The 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE) > 171 - 176

2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE)

Resource Description Framework (RDF) is commonly used for semantic web queries. During this decade, due to big data processing, large numbers of RDF triples have been crawled. The triples are usually stored in a distributed fashion on cloud storage or large clusters. When searching for a query answer, it is usually difficult to handle the search across platforms. Also, the search takes a long time to execute....

chapter

Multi-object tracking based on improved Mean Shift

Meifeng Gao, Di Liu

2013 IEEE Third International Conference on Information Science and Technology (ICIST) > 1588 - 1592

2013 IEEE Third International Conference on Information Science and Technology (ICIST)

Since the Mean Shift algorithm cannot track multiple objects, a fully automatic multi-object tracking algorithm based on improved Mean Shift is proposed. The background subtraction image kernel density estimation algorithm is used to detect the foreground. The extracted moving objects are used as candidate templates to eliminate the influence of the background. By adopting object matching based on distance...

chapter

Time-harmonic interaction effects for a periodic system of coplanar cracks in 3D elastic solids

V. Mykhas'kiv, I. Zhbadynskyi, Ch. Zhang

2013 XVIIIth International Seminar/Workshop on Direct and Inverse Problems of Electromagnetic and Acoustic Wave Theory (DIPED) > 246 - 249

2013 XVIIIth International Seminar/Workshop on Direct and Inverse Problems of Electromagnetic and Acoustic Wave Theory (DIPED)

The symmetric problem of time-harmonic elastic wave interaction with a periodic array of coplanar penny-shaped cracks embedded in an infinite elastic solid is numerically investigated. The problem is reduced to a boundary integral equation (BIE) for the crack-opening-displacement (COD) by means of a 3D periodic Green's function obtained in the form of exponentially-convergent Fourier integrals. A...

chapter

High performance multi-standard architecture for DCT computation in H.264/AVC High Profile and HEVC codecs

Tiago Dias, Nuno Roma, Leonel Sousa

2013 Conference on Design and Architectures for Signal and Image Processing > 14 - 21

2013 Conference on Design and Architectures for Signal and Image Processing (DASIP)

A new high-performance architecture for the computation of all the DCT operations adopted in the H.264/AVC and HEVC standards is proposed in this paper. In contrast to other dedicated transform cores, the presented multi-standard transform architecture is supported on a completely configurable, scalable and unified structure that is able to compute not only the forward and the inverse 8×8 and 4×4...

chapter

Runtime dependency analysis for loop pipelining in High-Level Synthesis

Mythri Alle, Antoine Morvan, Steven Derrien

2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 10

2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC)

Research on High-Level Synthesis has mainly focused on applications with statically determinable characteristics and current tools often perform poorly in presence of data-dependent memory accesses. The reason is that they rely on conservative static scheduling strategies, which lead to inefficient implementations. In this work, we propose to address this issue by leveraging well-known techniques...

chapter

Study of kernel instrumentation based on process switch in program performance testing

Wang Hui, Zhu Xiaodong, Wang Yigang, Chen Ming

IEEE Conference Anthology > 1 - 4

2013 IEEE Conference Anthology

In software testing, source code instrumentation can be used to test code coverage and perform memory detection, collecting test data while the program runs; but this approach cannot be used to obtain process run time. This paper proposes kernel task hook instrumentation based on process switch to obtain timing-related indices of a process during its lifetime, then analyzes the kernel...

chapter

Breaking Weak 1024-bit RSA Keys with CUDA

Kerry Scharfglass, Darrin Weng, Joseph White, Christopher Lupo

2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies > 207 - 212

2012 13th International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT)

An exploit involving the greatest common divisor (GCD) of RSA moduli was recently discovered [1]. This paper presents a tool that can efficiently and completely compare a large number of 1024-bit RSA public keys, and identify any keys that are susceptible to this weakness. NVIDIA's graphics processing units (GPU) and the CUDA massively-parallel programming model are powerful tools that can be used...

chapter

Software-managed automatic data sharing for Coarse-Grained Reconfigurable coprocessors

Toan X. Mai, Jongeun Lee

2012 International Conference on Field-Programmable Technology > 277 - 284

2012 International Conference on Field-Programmable Technology (FPT)

Coarse-Grained Reconfigurable Architecture (CGRA) in a hybrid system can significantly accelerate the execution of compute-intensive kernels of applications. However, the data communication overhead between the main processor (MP) and the CGRA may be huge and can negate the speed-up of the CGRA. In this paper we address the problem of reducing the data communication overhead in a hybrid system by...

chapter

An OpenCL Approach of Prestack Kirchhoff Time Migration Algorithm on General Purpose GPU

Peiyuan Sun, Xiaohua Shi

2012 13th International Conference on Parallel and Distributed Computing, Applications and Technologies > 179 - 183

2012 13th International Conference on Parallel and Distributed Computing Applications and Technologies (PDCAT)

OpenCL is an open standard for portable, parallel programming across heterogeneous platforms. In this paper, we presented how to implement and optimize Prestack Kirchhoff Time Migration algorithm, which is one of the most widely adopted imaging methods for seismic data processing, on OpenCL and GPGPU. We introduced how to port the original CUDA program to OpenCL, and how to optimize the OpenCL program...

chapter

High performance multi-dimensional (2D/3D) FFT-Shift implementation on Graphics Processing Units (GPUs)

Marwan Abdellah, Salah Saleh, Ayman Eldeib, Amr Shaarawi

2012 Cairo International Biomedical Engineering Conference (CIBEC) > 171 - 174

2012 Cairo International Biomedical Engineering Conference (CIBEC)

Frequency domain analysis is one of the most common analysis techniques in signal and image processing. Fast Fourier Transform (FFT) is a well-known tool used to perform such analysis by obtaining the frequency spectrum for time- or spatial-domain signals and vice versa. FFT-Shift is a subsequent operation used to handle the resulting arrays from this stage as it centers the DC component of the resulting...

chapter

Dataflow-driven GPU performance projection for multi-kernel transformations

Jiayuan Meng, Vitali A. Morozov, Venkatram Vishwanath, Kalyan Kumaran

2012 International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

2012 SC - International Conference for High Performance Computing, Networking, Storage and Analysis

Applications often have a sequence of parallel operations to be offloaded to graphics processors; each operation can become an individual GPU kernel. Developers typically explore a variety of transformations for each kernel. Furthermore, it is well known that efficient data management is critical in achieving high GPU performance and that "fusing" multiple kernels into one may greatly improve...

chapter

Robust Line Detection in Images of Building Facades using Region-based Weighted Hough Transform

Theocharis Tsenoglou, Nikolaos Vassilas, Djamchid Ghazanfarpour

2012 16th Panhellenic Conference on Informatics > 333 - 338

2012 16th Panhellenic Conference on Informatics (PCI)

A robust region-based weighted Hough Transform method for the detection of straight lines in poor quality images of building facades is presented in this work. Following a typical preprocessing stage that includes color to grayscale transformation, binarization using Otsu's automatic threshold selection method, morphological opening and decomposition into connected regions a minimum bounding rectangle...

chapter

Hybrid-Priority Configuration Cache Supervision Method for Coarse Grained Reconfigurable Architecture

Bo Liu, Peng Cao, Jinjiang Yang

2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery > 408 - 414

2012 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC)

Coarse-grained reconfigurable architecture (CGRA) aims to provide satisfying solutions in terms of both efficiency and flexibility. However, to meet the ever-increasing performance demand for multimedia applications, the scale of CGRAs should be large enough to contain more computation resources for higher processing performance. In this paper, we present a hybrid-priority configuration cache supervision...

chapter

Enabling Fast, Noncontiguous GPU Data Movement in Hybrid MPI+GPU Environments

John Jenkins, James Dinan, Pavan Balaji, Nagiza F. Samatova, more

2012 IEEE International Conference on Cluster Computing > 468 - 476

2012 IEEE International Conference on Cluster Computing (CLUSTER)

Lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments challenges GPU acceleration of large-scale scientific computations. A particular challenge is the transfer of noncontiguous data to and from GPU memory. MPI implementations currently do not provide an efficient means of utilizing data types for noncontiguous communication of data in GPU memory. To address this...

chapter

Cross-Platform OpenCL Code and Performance Portability Investigated with a Climate and Weather Physics Model

Han Dong, Dibyajyoti Ghosh, Fahad Zafar, Shujia Zhou

2012 41st International Conference on Parallel Processing Workshops > 126 - 134

2012 41st International Conference on Parallel Processing Workshops (ICPPW)

Current-generation multicore computing platforms are vastly different. Sustaining many-core applications across heterogeneous platforms is a daunting task, more so when the dynamic nature of the application is factored in. Open Computing Language (OpenCL) was created to address this issue. Designed to run on CPUs, GPUs, FPGAs and other platforms, OpenCL is becoming a standard for cross-platform parallel...

INFONA - science communication portal
