In this paper, we propose a memory access method for the Parallel Failureless Aho-Corasick (PFAC) algorithm that takes the Graphics Processing Unit (GPU) memory architecture into account to improve throughput. Compared with the Aho-Corasick (AC) algorithm on a Central Processing Unit (CPU) and Data-Parallel Aho-Corasick (DPAC) using Open Multi-Processing (OpenMP), PFAC on a GPU achieves a substantial performance improvement...
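The failureless idea behind PFAC can be illustrated with a small sketch. It assumes the usual formulation: the Aho-Corasick goto trie is kept, its failure links are dropped, and one GPU thread is assigned to each starting position of the input, walking the trie from its own byte and stopping at the first missing transition. Here the threads are simulated sequentially in Python, the helper names `build_pfac_trie`, `pfac_thread` and `pfac_match` are hypothetical, and the paper's GPU memory-access optimization is not modelled.

```python
def build_pfac_trie(patterns):
    """Build the goto trie used by PFAC: Aho-Corasick transitions without failure links."""
    goto = [{}]      # goto[state][char] -> next state; state 0 is the root
    output = {}      # final state -> index of the pattern that ends there
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state] = idx
    return goto, output


def pfac_thread(text, start, goto, output):
    """Work done by one (simulated) GPU thread: report patterns that begin at `start`."""
    matches = []
    state = 0
    for pos in range(start, len(text)):
        state = goto[state].get(text[pos])
        if state is None:    # no valid transition: the thread just stops (no failure link)
            break
        if state in output:
            matches.append((start, output[state]))
    return matches


def pfac_match(text, patterns):
    goto, output = build_pfac_trie(patterns)
    matches = []
    # On a GPU each start position would be handled by its own thread in parallel;
    # here the "threads" run one after another.
    for start in range(len(text)):
        matches.extend(pfac_thread(text, start, goto, output))
    return matches


print(pfac_match("ushers", ["he", "she", "his", "hers"]))  # [(1, 1), (2, 0), (2, 3)]
```

Each match is reported as `(start position, pattern index)`. Because no thread ever follows a failure link, every thread's work is short and independent, which is what makes the scheme map well onto many GPU threads.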
Inband full-duplex radio transceivers offer enhanced spectral efficiency by transmitting and receiving simultaneously at the same frequency. However, deployment of such systems is challenging due to the inherent self-interference stemming from coupling of the transmit signal to the receiver. Furthermore, to track changes in the time-varying self-interference channel, the process needs to be self-adaptive...
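Adaptive digital self-interference cancellation is often illustrated with a least-mean-squares (LMS) filter, which keeps re-estimating the coupling channel from the known transmit samples. The sketch below is one such canceller under simplifying assumptions: real-valued baseband samples, a linear three-tap channel (`h_true`, made up for the example), no desired signal from a remote node, and an arbitrary step size `mu`. The abstract does not say which adaptive algorithm the authors use, so LMS here is a stand-in rather than their method.

```python
import random


def lms_canceller(tx, rx, num_taps, mu):
    """Adapt FIR weights w so that sum_k w[k] * tx[n - k] tracks the self-interference in rx[n]."""
    w = [0.0] * num_taps
    residual = []
    for n in range(len(rx)):
        # The num_taps most recent transmit samples, newest first (zeros before the start).
        x_vec = [tx[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        estimate = sum(wk * xk for wk, xk in zip(w, x_vec))
        e = rx[n] - estimate                                  # signal left after cancellation
        w = [wk + mu * e * xk for wk, xk in zip(w, x_vec)]    # LMS weight update
        residual.append(e)
    return w, residual


rng = random.Random(0)
h_true = [0.8, -0.3, 0.1]          # hypothetical self-interference channel
tx = [rng.gauss(0.0, 1.0) for _ in range(5000)]
rx = [
    sum(h_true[k] * tx[n - k] for k in range(len(h_true)) if n - k >= 0)
    + rng.gauss(0.0, 0.01)         # small receiver noise; no desired signal in this toy setup
    for n in range(len(tx))
]
w, residual = lms_canceller(tx, rx, num_taps=3, mu=0.05)
print([round(x, 3) for x in w])
```

Because the weights are updated on every sample, the same loop keeps following the channel if `h_true` drifts over time, which is the self-adaptive behaviour the abstract calls for.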
Convolutional Neural Networks (CNNs) have gained significant traction in the field of machine learning, particularly due to their high accuracy in visual recognition. Recent works have pushed the performance of GPU implementations of CNNs showing significant improvements in their classification and training times. With these improvements, many frameworks have become available for implementing CNNs...
With recent advances in deep convolutional neural networks (CNNs), deep learning has brought significant quality improvement and flexibility to single-image super-resolution (SR). In this paper, we describe how CNN-based SR can be accelerated on integrated GPUs. To this end, we employ a CNN model from an existing single-image SR approach, and develop the model within a well-known deep learning framework...
Heterogeneous computing is a growing trend in recent computer architecture design and is often used to improve the performance and power efficiency of computing applications by utilizing special-purpose processors or accelerators, such as the Graphics Processing Unit (GPU), Field Programmable Gate Array (FPGA) and Digital Signal Processor (DSP). With the increase of complexity, the interaction...
The GPU has become an important component of high-performance computing systems, where its principal role is parallel computing rather than graphical display. Determining power and energy consumption is necessary for scaling GPUs. This paper presents a statistical model to evaluate the power and energy consumption of AMD's integrated GPU (iGPU). By collecting the data of performance counters from...
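A common way to build a counter-based power model is linear regression: measured power is fitted as a weighted sum of performance-counter rates, and energy then follows as average power times execution time. The sketch assumes exactly that form and solves ordinary least squares through the normal equations; the two counter columns, their values and the coefficients of the synthetic data are hypothetical, and the abstract does not state that the iGPU model is linear.

```python
def fit_linear_model(features, targets):
    """Ordinary least squares: targets ~ b[0] + sum_i b[i + 1] * features[i]."""
    rows = [[1.0] + [float(v) for v in f] for f in features]   # prepend intercept column
    p = len(rows[0])
    # Normal equations (X^T X) b = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    b = [0.0] * p
    for i in reversed(range(p)):
        b[i] = (rhs[i] - sum(a[i][j] * b[j] for j in range(i + 1, p))) / a[i][i]
    return b


def predict_power(b, counters):
    """Predicted power (W) for one sample of counter rates."""
    return b[0] + sum(bi * c for bi, c in zip(b[1:], counters))


counters = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]          # hypothetical counter rates
power_w = [5 + 2 * c1 + 0.5 * c2 for c1, c2 in counters]     # synthetic "measured" power (W)
coeffs = fit_linear_model(counters, power_w)
runtime_s = 2.0                                              # hypothetical kernel run time
energy_j = predict_power(coeffs, (3, 5)) * runtime_s         # energy = power x time
print(coeffs, energy_j)
```

With real measurements the fit would not be exact, and the residual error is what indicates whether the chosen counters explain the power draw well.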
Deep learning frameworks have recently gained widespread popularity due to their highly accurate prediction capabilities and the availability of low-cost processors that can quickly perform training over large datasets. Given the high core count of modern high-performance computing systems, training deep networks on large data has now become practical. In this work, while targeting the Computational...
Target detection is a hard real-time task in video and image processing. It has recently been accomplished through the feedforward pass of convolutional neural networks (CNNs), which is usually accelerated by general-purpose graphics processing units (GPUs). However, the running speed of this task still needs to be improved. In this paper, we present an efficient image combination...
Recently, researchers have found that GPUs offer advantages for non-graphics computing. A CPU-GPU heterogeneous architecture integrates the CPU and GPU on a single chip, making it easier for the GPU to run non-graphics programs. Researchers have also proposed using the LLC (last-level cache) to store and exchange data between the CPU and GPU. We find that the LLC hit rate has a great influence on memory access performance and overall system performance...
Generally, a cache acts as a bridge between the CPU and main memory to narrow the performance gap between them. As a throughput-oriented device, the Graphics Processing Unit (GPU) has also integrated caches, similar to those of CPU cores, to exploit the locality of memory accesses. However, applications in GPGPU computing exhibit memory access patterns distinct from those of their multi-core counterparts...
This paper presents a scalable multi-GPU architecture for super multi-view (SMV) synthesis using multi-view video plus depth (MVD) data. SMV synthesis is essential for generating 3D content for SMV 3D displays with a hundred views. The recently released SMV 3D display supports 108 viewpoints and shows a multiplexed result with a small viewing interval between views. Hence, we should synthesize the intermediate...
Convolution operations dominate the total execution time of deep convolutional neural networks (CNNs). In this paper, we aim at enhancing the performance of the state-of-the-art convolution algorithm (called Winograd convolution) on the GPU. Our work is based on two observations: (1) CNNs often have abundant zero weights and (2) the performance benefit of Winograd convolution is limited mainly due...
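For reference, the smallest 1D Winograd case, F(2,3), produces two outputs of a 3-tap filter with 4 multiplications instead of 6, after a one-off transform of the filter. The sketch shows only that standard transform, applied as CNN-style cross-correlation on a 1D signal; it does not reproduce the zero-weight optimization proposed in the paper, and the helper names are hypothetical.

```python
def winograd_f23_filter(g):
    """Transform a 3-tap filter once; the result is reused for every tile."""
    g0, g1, g2 = g
    return (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)


def winograd_f23_tile(d, u):
    """Two outputs from a 4-sample input tile using 4 multiplications."""
    d0, d1, d2, d3 = d
    m1 = (d0 - d2) * u[0]
    m2 = (d1 + d2) * u[1]
    m3 = (d2 - d1) * u[2]
    m4 = (d1 - d3) * u[3]
    return m1 + m2 + m3, m2 - m3 - m4


def winograd_conv1d(signal, g):
    """Valid cross-correlation of `signal` with 3-tap `g`, tiled with stride 2."""
    out_len = len(signal) - 2
    if out_len < 0 or out_len % 2:
        raise ValueError("this sketch assumes an even, non-negative number of outputs")
    u = winograd_f23_filter(g)
    out = []
    for i in range(0, out_len, 2):
        out.extend(winograd_f23_tile(signal[i:i + 4], u))
    return out


def direct_conv1d(signal, g):
    """Reference: 3 multiplications per output."""
    return [sum(g[k] * signal[i + k] for k in range(3)) for i in range(len(signal) - 2)]


print(winograd_conv1d([1, 2, 3, 4, 5, 6], [1, 2, 3]))  # [14.0, 20.0, 26.0, 32.0]
```

In a CNN the filter transform is computed once per weight tensor, so the saving in element-wise multiplications is what the GPU implementation benefits from.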
Heterogeneous processors with architecturally different devices (CPU and GPU) integrated on the same die provide good performance and energy efficiency for a wide range of workloads. However, they also create challenges and opportunities in terms of scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping...
In this paper, the problem of segmenting volumetric medical images is considered. Fast and effective segmentation is obtained with the proposed approach, which combines the idea of supervoxels with the Fuzzy C-Means algorithm. In particular, Fuzzy C-Means is used to cluster supervoxels produced by fast 3D region growing. Additional acceleration of the method is achieved with the...
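The Fuzzy C-Means step can be sketched with the standard update rules: each membership is u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), and each center is the u^m-weighted mean of the points. Here each supervoxel is assumed to be summarised by a single scalar feature (for instance its mean intensity); the feature values, the fuzzifier `m = 2`, the initial centers and the iteration count are illustrative assumptions, not values from the paper.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=100):
    """Standard FCM on scalar features (e.g. one mean intensity per supervoxel, assumed)."""
    centers = list(centers)
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        memberships = []
        for x in points:
            dists = [abs(x - c) for c in centers]
            zeros = dists.count(0.0)
            if zeros:   # point lies exactly on a center: assign it there (shared if tied)
                row = [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]
            memberships.append(row)
        # Center update: c_j = sum_i u_ij^m * x_i / sum_i u_ij^m.
        centers = [
            sum(row[j] ** m * x for row, x in zip(memberships, points))
            / sum(row[j] ** m for row in memberships)
            for j in range(len(centers))
        ]
    return centers, memberships


features = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1]   # hypothetical supervoxel features
centers, u = fuzzy_c_means(features, [0.0, 2.0])
print([round(c, 3) for c in centers])
```

Clustering supervoxels instead of individual voxels shrinks the number of points FCM has to process, which is where the speed of such an approach comes from.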
A histogram is a popular graphical representation of the distribution of a given set of numerical input data. Although sequential histogram computation is simple, it is no longer suitable for processing high volumes of data. With recent advances in high-performance computing (HPC), aided by the accelerating growth of General-Purpose computing on Graphics Processing Units (GPGPU),...
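A standard GPU strategy for histograms is privatization: each thread block accumulates a private sub-histogram (typically in shared memory), and the sub-histograms are then merged. The Python sketch only simulates that structure sequentially; the block count, bin layout and the clamping of out-of-range values into the edge bins are assumptions made for the example, not details from the abstract.

```python
import random


def bin_index(x, num_bins, lo, hi):
    """Map x to a bin over [lo, hi); out-of-range values are clamped into the edge bins."""
    i = int((x - lo) / (hi - lo) * num_bins)
    return min(max(i, 0), num_bins - 1)


def sequential_histogram(data, num_bins, lo, hi):
    hist = [0] * num_bins
    for x in data:
        hist[bin_index(x, num_bins, lo, hi)] += 1
    return hist


def privatized_histogram(data, num_bins, lo, hi, num_blocks):
    """Each block fills a private sub-histogram; the sub-histograms are then merged."""
    chunk = (len(data) + num_blocks - 1) // num_blocks
    partials = []
    for b in range(num_blocks):          # on a GPU these blocks would run concurrently
        local = [0] * num_bins           # stands in for a per-block shared-memory histogram
        for x in data[b * chunk:(b + 1) * chunk]:
            local[bin_index(x, num_bins, lo, hi)] += 1
        partials.append(local)
    # Merge step: one sum per bin across all private copies.
    return [sum(p[i] for p in partials) for i in range(num_bins)]


rng = random.Random(1)
data = [rng.uniform(0.0, 10.0) for _ in range(10_000)]
hist = privatized_histogram(data, num_bins=16, lo=0.0, hi=10.0, num_blocks=8)
print(hist)
```

Private copies keep most increments away from a single shared array, so on a GPU far fewer atomic updates contend for the same global-memory bins; only the final merge touches the global result.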
Multi-scale Retinex is an image enhancement algorithm aimed at image reconstruction. It maintains high fidelity while compressing the dynamic range of the image, so its enhancement effect is pronounced. The algorithm relies on a large number of convolution operations to achieve dynamic range compression and color/brightness rendition, and its computation time increases significantly...
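Multi-scale Retinex is usually written as R(x) = Σ_n w_n [log I(x) - log((F_n * I)(x))], where F_n is a Gaussian surround of scale σ_n; these Gaussian blurs are the convolutions that dominate the run time. The sketch works on a 1D grayscale signal for brevity and assumes equal weights, the scales (15, 80, 250) often quoted for MSR, an offset `eps` to avoid log(0), and clamping at the borders; none of these choices come from the abstract.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]


def blur(signal, kernel):
    """Direct convolution with border clamping (the expensive part of MSR)."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = min(max(i + j - r, 0), n - 1)   # clamp indices at the borders
            acc += kv * signal[idx]
        out.append(acc)
    return out


def multi_scale_retinex(signal, sigmas=(15, 80, 250), eps=1.0):
    """Equal-weight sum of single-scale Retinex outputs: log(I) - log(Gaussian surround)."""
    weight = 1.0 / len(sigmas)
    out = [0.0] * len(signal)
    for sigma in sigmas:
        surround = blur(signal, gaussian_kernel(sigma))
        for i, (v, s) in enumerate(zip(signal, surround)):
            out[i] += weight * (math.log(v + eps) - math.log(s + eps))
    return out


step = [10.0] * 50 + [100.0] * 50     # dark half followed by a bright half
out = multi_scale_retinex(step)
print(round(out[0], 3), round(out[99], 3))
```

Pixels darker than their surround come out negative and brighter ones positive, while uniform regions map to zero; each output needs a full Gaussian convolution per scale, which is why the cost grows quickly with image size and scale.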
Molecular dynamics simulations, an indispensable research tool in computational chemistry and materials science, consume a significant portion of the supercomputing cycles around the world. We focus on multi-body potentials and aim at achieving performance portability. Compared with well-studied pair potentials, multibody potentials deliver increased simulation accuracy but are too complex for effective...
Accelerators have emerged as an important component of modern cloud, datacenter, and HPC computing environments. However, launching tasks on remote accelerators across a network remains unwieldy, forcing programmers to send data in large chunks to amortize the transfer and launch overhead. By combining advances in intra-node accelerator unification with one-sided Remote Direct Memory Access (RDMA)...
Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation and object detection and localization. Here we consider the parallelization of inference, i.e., the application of a previously trained ConvNet, with emphasis on 3D images. Our goal is to maximize throughput, defined as the number of output voxels computed per unit...
GPUs have emerged as general-purpose accelerators in high-performance computing (HPC) and scientific applications. However, the reliability characteristics of GPU applications have not been investigated in depth. While error propagation has been extensively investigated for non-GPU applications, GPU applications have a very different programming model which can have a significant effect on error propagation...