This paper proposes a parallel implementation of the finite volume method based on weighted average flux (WAF) to solve the shallow water equations on a graphics processing unit. We develop two parallel programs, one using 1-dimensional thread blocks and the other using 2-dimensional thread blocks. We compare the performance of these two versions with a sequential program. The numerical experiment is performed...
Finding the roots of polynomials is a very important part of solving real-life problems, but the higher the degree of the polynomial, the harder it becomes. In this paper, we present two different parallel algorithms of the Ehrlich-Aberth method to find roots of sparse and fully defined polynomials of high degrees. Both algorithms are based on CUDA technology to be implemented on multi-GPU computing...
As is well known, in any operating system, processes do not share resources well, and there is a high context-switching overhead. A thread (or lightweight process), by contrast, is a basic unit of CPU utilization and comprises a thread identifier (ID), program counter, register set, and stack space. A thread within the process shares its code section, data section, and other operating-system resources, such...
Various architecture-based techniques have been proposed to reduce power consumption in GPGPUs. However, these techniques mostly ignore the temperature of GPGPUs. In this paper, we focus on the register file and propose a new technique to reduce its peak temperature. The register file in GPGPUs is very large, even larger than the caches, to support thousands of simultaneously executing threads. This makes register...
General-Purpose Graphics Processing Units (GPGPUs) exploit several levels of caches to hide memory latency and provide data for thousands of simultaneously executing threads. The L1 data cache and L2 cache are critical to the performance of GPGPUs, as an L1 data cache should provide data for all threads within the corresponding Streaming Multiprocessor (SM) and the L2 cache should service memory requests...
GPGPUs have been widely adopted as throughput processing platforms for modern big-data and cloud computing. Attaining a high-performance design on a GPGPU requires careful tradeoffs among various design concerns. Data reuse, cache contention, and thread-level parallelism have been demonstrated as three imperative performance factors for a GPGPU. The correlated performance impacts of these factors...
Nowadays, hydraulic sources are responsible for most of Brazil's energy production. Hydroelectric power plant (HPP) operators in Brazil usually distribute the total required power equally among the generator units available in the plant. However, studies show that this configuration does not guarantee that each generator unit operates close to its optimal operation point. The energy dispatch optimization...
In the past few years nonlocal filters have emerged as a serious contender for denoising synthetic aperture radar (SAR) images, offering superior noise reduction and detail preservation compared to many other filters. In this manuscript we analyze how nonlocal filters, whose computational costs were so far prohibitive for large scale processing, can be implemented efficiently on graphics processing...
Sparsity-constrained nonnegative matrix factorization (NMF) has been proved to be an effective method for hyperspectral unmixing. However, the optimization procedure of sparsity-constrained NMF is computationally demanding, which may limit its application in time-constrained conditions. In this paper, a parallel L1/2 sparsity-constrained NMF unmixing method on Graphics Processing Units (GPUs) is proposed,...
The use of spatial information prior to spectral unmixing of hyperspectral data has been a very active research line in recent years. There are many approximations that consider spatial characteristics of the data in order to guide the endmember identification/extraction procedure. In particular, the spatial preprocessing (SPP) algorithm can be used prior to most existing spectral-based endmember identification...
Compression is a promising technique to increase the effective capacity of caches. Due to the latency overhead of decompression, most previous studies applied compression to lower-level caches. General-Purpose Graphics Processing Units (GPGPUs) are throughput-oriented computing platforms which execute hundreds to thousands of threads simultaneously. The massive number of threads makes GPGPUs less sensitive...
Many-core architectures trade single-thread performance for a larger number of cores. Scalable throughput can be attained only by a high degree of parallelism and minimized synchronization. Whilst this is achievable for many applications, the operating system still introduces bottlenecks through non-local sharing, synchronization, and message passing. A particular challenge for highly dynamic applications,...
Ceph is a well-known and widely deployed open source distributed storage system. Specifically, it is the most widely used storage backend for the popular OpenStack cloud computing platform. In the traditional usage of Ceph in cloud computing, the Ceph block device implemented in the VMM (virtual machine monitor), qemu-rbd, is used to provide disks for the VMs (virtual machines). Recently, container technology has become...
The prevalence of real-time multimedia delivery appliances has led to the development of a variety of efficient architectures and supporting software technologies. In particular, ray tracing, a well-known physically-based rendering algorithm, has been receiving great attention in research and development. Unfortunately, the ray-tracing algorithm, being one of the irregular applications, suffers from the...
This paper presents several novel GPU optimization techniques to accelerate the SRCNN (Super-Resolution Convolutional Neural Network), one of the best super-resolution algorithms. We first directly parallelize and implement the SRCNN, then accelerate the convolution by making use of the hierarchical structure of GPU memory. We explore different optimization methods on each convolution and select the...
We focus on the Overcomplete Local Principal Component Analysis (OLPCA) method, which is widely adopted as a denoising filter. We propose a programming approach resorting to Graphics Processing Units (GPUs), in order to massively parallelize some computationally heavy tasks of the method. In our approach, we design and implement a parallel version of the OLPCA, by using a suitable mapping of the tasks on...
Software-based network packet processing on standard high-volume servers promises better flexibility, manageability, and scalability, and has thus gained tremendous momentum in recent years. Numerous research efforts have focused on boosting packet processing performance by offloading to discrete Graphics Processing Units (GPUs). While integrated GPUs, residing on the same die as the CPU, offer many advanced...
This paper describes the retargeting and further enhancement of a compact multitasking kernel for the 32-bit Altera Nios II processor. The kernel, called QUERK for Queen's University Educational Real-time Kernel, was originally written in assembly language and then the C language for the Motorola (and then Freescale) 68HC11 processor. Consisting of fewer than 200 lines of assembly-language instructions,...
The maximum common subgraph of two graphs, G1 and G2, is the largest subgraph of G1 that is isomorphic to a subgraph of G2. Finding the maximum common subgraph of two given graphs is known to be an NP-complete problem. An exact solution for the maximum common subgraph problem can be found by an algorithm that transforms the maximum common subgraph problem into a maximal clique enumeration problem....
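The reduction to clique enumeration mentioned in the abstract above can be illustrated with a small sequential sketch. It is not the paper's algorithm: it assumes the induced variant of the problem (common subgraphs need not be connected), builds the standard modular product graph, and enumerates maximal cliques with a plain Bron-Kerbosch recursion. All function names are hypothetical.

```python
from itertools import combinations


def modular_product(n1, edges1, n2, edges2):
    """Adjacency sets of the modular product of two undirected graphs.

    Nodes are pairs (u, v) with u in G1 and v in G2. Two pairs are adjacent
    when they use distinct vertices on both sides and the edge relation
    agrees: both an edge, or both a non-edge.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    nodes = [(u, v) for u in range(n1) for v in range(n2)]
    adj = {p: set() for p in nodes}
    for (u1, v1), (u2, v2) in combinations(nodes, 2):
        if u1 == u2 or v1 == v2:
            continue
        if (frozenset((u1, u2)) in e1) == (frozenset((v1, v2)) in e2):
            adj[(u1, v1)].add((u2, v2))
            adj[(u2, v2)].add((u1, v1))
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def max_common_induced_subgraph(n1, edges1, n2, edges2):
    """Return a largest vertex mapping {(u, v), ...} between G1 and G2."""
    adj = modular_product(n1, edges1, n2, edges2)
    return max(maximal_cliques(adj), key=len)


# Triangle versus 3-vertex path: the best common induced subgraph is one edge.
triangle = [(0, 1), (1, 2), (0, 2)]
path = [(0, 1), (1, 2)]
mapping = max_common_induced_subgraph(3, triangle, 3, path)
print(len(mapping))
```

Each clique in the product graph is a set of vertex pairs that preserves adjacency and non-adjacency, so the largest clique gives a largest common induced subgraph. The enumeration is exponential in the worst case, which is why parallel approaches are of interest.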
We examine the implementation of block compressed row storage (BCSR) sparse matrix-vector multiplication (SpMV) for sparse matrices with dense block substructure, optimized for blocks with sizes from 2x2 to 32x32, on CPU, Intel Many Integrated Core (MIC), and GPU architectures. Previous research on SpMV for matrices with dense block substructure has largely focused on the design of novel data structures...
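For readers unfamiliar with the BCSR format named in the abstract above, a minimal sequential sketch of SpMV over it follows. It is not the paper's optimized kernel. It assumes one common layout: `row_ptr` indexes block rows, `col_idx` gives the block column of each stored block, and each block is a dense r-by-c tile stored row-major. The function and parameter names are hypothetical.

```python
def bcsr_spmv(row_ptr, col_idx, blocks, x, r, c):
    """Compute y = A @ x for a matrix A stored in BCSR with r x c blocks.

    row_ptr[bi]..row_ptr[bi + 1] is the range of stored blocks in block row bi.
    col_idx[k] is the block-column index of stored block k.
    blocks[k] is a flat, row-major list of the r * c entries of block k.
    """
    n_block_rows = len(row_ptr) - 1
    y = [0.0] * (n_block_rows * r)
    for bi in range(n_block_rows):
        for k in range(row_ptr[bi], row_ptr[bi + 1]):
            bj = col_idx[k]
            block = blocks[k]
            # Dense r x c tile times the matching slice of x.
            for i in range(r):
                acc = 0.0
                for j in range(c):
                    acc += block[i * c + j] * x[bj * c + j]
                y[bi * r + i] += acc
    return y


# 4x4 matrix with 2x2 blocks:
#   [1 2 9 0]
#   [3 4 0 9]
#   [0 0 5 6]
#   [0 0 7 8]
row_ptr = [0, 2, 3]
col_idx = [0, 1, 1]
blocks = [[1, 2, 3, 4], [9, 0, 0, 9], [5, 6, 7, 8]]
print(bcsr_spmv(row_ptr, col_idx, blocks, [1, 1, 1, 1], 2, 2))
# prints [12.0, 16.0, 11.0, 15.0]
```

Storing one column index per block rather than per nonzero is what reduces index overhead for matrices with dense block substructure. The inner two loops are a small dense multiply, and a tuned kernel would unroll or vectorize them.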