Checkpoint/restart has been an effective mechanism to achieve fault tolerance for many scientific applications. However, as GPUs take on a much bigger role in high-performance computing, there is still no effective checkpoint/restart scheme for them, owing to the GPU's batch-mode execution. The paper proposes an application-level checkpoint/restart scheme to save and restore GPU computation states. A precompiler...
Fast growing large-scale systems enable scientific applications to run at a much larger scale and accordingly produce gigantic volumes of simulation output. Such data imposes a grand challenge to post-processing tasks such as visualization and data analysis, because these tasks are often performed at a host machine that is remotely located and equipped with much less memory and storage resources....
The Texas Instruments (TI) C6678 “Shannon” is TI's most recently-released Digital Signal Processor (DSP). Although its original purpose was voice and video encoding and decoding, it may have the potential to become a practical coprocessor for scientific computing. In this paper, we evaluate the C6678 in terms of its programming methodology, performance, and power efficiency. As a case study, we implemented...
To address the nonlinear response of semiconductor gas sensors and their cross-sensitivity to non-target gases, this paper studies a gas concentration measurement method based on a gas sensor array and least squares support vector regression (LS-SVR). Methane (CH4), hydrogen (H2) and their mixtures are selected as the target gases. A multi-sensor array is composed of four metal oxide semiconductor (MOS) gas sensors...
Systolic arrays offer a very attractive, data centric, execution model as an alternative to the von Neumann architecture. Hardware implementations of systolic arrays turned out not to be viable solutions in the past. This article shows how the systolic design principles can be applied to a software solution to deliver an algorithm with unprecedented strong scaling capabilities. Systolic array for...
Resource Description Framework (RDF) is commonly used for semantic web queries. Over the past decade, driven by big data processing, large numbers of RDF triples have been crawled. The triples are usually stored in a distributed fashion on cloud storage or large clusters. Searching for a query answer across platforms is usually difficult. The search also takes a long time to execute....
Since the Mean Shift algorithm cannot track multiple objects, a fully automatic multi-object tracking algorithm based on an improved Mean Shift is proposed. A kernel density estimation algorithm applied to the background subtraction image is used to detect the foreground. The extracted moving objects are used as candidate templates to eliminate the influence of the background. By adopting object matching based on distance...
The symmetric problem of time-harmonic elastic wave interaction with a periodic array of coplanar penny-shaped cracks embedded in an infinite elastic solid is numerically investigated. The problem is reduced to a boundary integral equation (BIE) for the crack-opening-displacement (COD) by means of a 3D periodic Green's function obtained in the form of exponentially-convergent Fourier integrals. A...
A new high-performance architecture for the computation of all the DCT operations adopted in the H.264/AVC and HEVC standards is proposed in this paper. In contrast to other dedicated transform cores, the presented multi-standard transform architecture is built on a completely configurable, scalable and unified structure that is able to compute not only the forward and the inverse 8×8 and 4×4...
Research on High-Level Synthesis has mainly focused on applications with statically determinable characteristics, and current tools often perform poorly in the presence of data-dependent memory accesses. The reason is that they rely on conservative static scheduling strategies, which lead to inefficient implementations. In this work, we propose to address this issue by leveraging well-known techniques...
In software testing, source code instrumentation can be used for code coverage testing and memory detection, collecting test data while the program runs; however, this approach cannot be used to obtain process run time. This paper proposes kernel task hook instrumentation based on process switching to obtain timing-related metrics of a process over its lifetime, then analyzed the kernel...
An exploit involving the greatest common divisor (GCD) of RSA moduli was recently discovered [1]. This paper presents a tool that can efficiently and completely compare a large number of 1024-bit RSA public keys, and identify any keys that are susceptible to this weakness. NVIDIA's graphics processing units (GPUs) and the CUDA massively-parallel programming model are powerful tools that can be used...
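The weakness mentioned in this abstract can be illustrated with a minimal CPU-side sketch: if two RSA moduli share a prime factor, their GCD reveals it. This is not the paper's CUDA tool; the function name `find_shared_factors` and the toy moduli are assumptions chosen for illustration, and real 1024-bit moduli would simply be larger Python integers.

```python
from itertools import combinations
from math import gcd


def find_shared_factors(moduli):
    """Compare every pair of moduli (hypothetical helper, not from the paper).

    Returns a dict mapping index pairs (i, j) to the common divisor found.
    A divisor greater than 1 means both keys are compromised; if the two
    moduli are identical, the divisor is the modulus itself.
    """
    hits = {}
    for (i, n_i), (j, n_j) in combinations(enumerate(moduli), 2):
        g = gcd(n_i, n_j)
        if g > 1:
            hits[(i, j)] = g
    return hits


# Toy moduli built from small primes; the first two share the prime 11.
toy_moduli = [11 * 13, 11 * 17, 19 * 23]
shared = find_shared_factors(toy_moduli)
for (i, j), p in shared.items():
    print(f"moduli {i} and {j} share factor {p}")
```

The all-pairs loop grows quadratically with the number of keys, which is why a massively parallel platform is attractive for a complete comparison of a large key set.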
Coarse-Grained Reconfigurable Architecture (CGRA) in a hybrid system can significantly accelerate the execution of compute-intensive kernels of applications. However, the data communication overhead between the main processor (MP) and the CGRA may be huge and can negate the speed-up of the CGRA. In this paper we address the problem of reducing the data communication overhead in a hybrid system by...
OpenCL is an open standard for portable, parallel programming across heterogeneous platforms. In this paper, we presented how to implement and optimize the Prestack Kirchhoff Time Migration algorithm, which is one of the most widely adopted imaging methods for seismic data processing, on OpenCL and GPGPU. We introduced how to port the original CUDA program to OpenCL, and how to optimize the OpenCL program...
Frequency domain analysis is one of the most common analysis techniques in signal and image processing. The Fast Fourier Transform (FFT) is a well-known tool used to perform such analysis by obtaining the frequency spectrum for time- or spatial-domain signals and vice versa. FFT-Shift is a subsequent operation used to handle the resulting arrays from this stage as it centers the DC component of the resulting...
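The FFT-Shift operation described in this abstract can be sketched for the one-dimensional case as a circular rotation that moves the zero-frequency (DC) bin to the middle of the array. This is a plain-Python illustration, not the paper's implementation; the function names `fftshift` and `ifftshift` are chosen here to mirror the common convention and are assumptions.

```python
def fftshift(x):
    """Rotate a 1D sequence so the DC bin (index 0) lands at index len(x) // 2."""
    k = (len(x) + 1) // 2  # first index of the negative-frequency half
    return x[k:] + x[:k]


def ifftshift(x):
    """Undo fftshift; the split point differs from fftshift for odd lengths."""
    k = len(x) // 2
    return x[k:] + x[:k]


# FFT output order for 5 bins is [0, 1, 2, -2, -1] in frequency units.
# Using bin indices as stand-in values makes the rotation easy to follow.
bins = [0, 1, 2, 3, 4]
centered = fftshift(bins)
print(centered)             # DC bin (value 0) is now in the middle
print(ifftshift(centered))  # original order restored
```

For even lengths the two functions coincide; for odd lengths they differ by one position, which is why a separate inverse is needed.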
Applications often have a sequence of parallel operations to be offloaded to graphics processors; each operation can become an individual GPU kernel. Developers typically explore a variety of transformations for each kernel. Furthermore, it is well known that efficient data management is critical in achieving high GPU performance and that "fusing" multiple kernels into one may greatly improve...
A robust region-based weighted Hough Transform method for the detection of straight lines in poor quality images of building facades is presented in this work. Following a typical preprocessing stage that includes color to grayscale transformation, binarization using Otsu's automatic threshold selection method, morphological opening and decomposition into connected regions, a minimum bounding rectangle...
Coarse-grained reconfigurable architecture (CGRA) aims to provide satisfying solutions in terms of both efficiency and flexibility. However, to meet the ever increasing performance demand for multimedia applications, the scale of CGRAs should be large enough to contain more computation resources for higher processing performance. In this paper, we present a hybrid-priority configuration cache supervision...
Lack of efficient and transparent interaction with GPU data in hybrid MPI+GPU environments challenges GPU acceleration of large-scale scientific computations. A particular challenge is the transfer of noncontiguous data to and from GPU memory. MPI implementations currently do not provide an efficient means of utilizing data types for noncontiguous communication of data in GPU memory. To address this...
Current-generation multicore computing platforms differ vastly. Sustaining many-core applications across heterogeneous platforms is a daunting task, more so when the dynamic nature of the application is factored in. Open Computing Language (OpenCL) was created to address this issue. Designed to run on CPUs, GPUs, FPGAs and other platforms, OpenCL is becoming a standard for cross-platform parallel...