Search results

Items from 101 to 120 out of 843 results

chapter

Optimization of parallel WAF for two-dimensional shallow water model with CUDA

Nugool Sataporn, Worasait Suwannik, Montri Maleewong

2016 11th International Conference on Computer Science & Education (ICCSE) > 155 - 159

This paper proposes a parallel implementation of the finite volume method based on weighted average flux (WAF) to solve the shallow water equations on a graphics processing unit. We develop two parallel programs, which use 1-dimensional and 2-dimensional thread blocks, respectively. We compare the performance of these two versions with a sequential program. The numerical experiment is performed...

chapter

Two Parallel Implementations of Ehrlich-Aberth Algorithm for Root-Finding of Polynomials on Multiple GPUs with OpenMP and MPI

Kahina Ghidouche, Abderrahmane Sider, Lilia Ziane Khodja, Raphael Couturier

2016 IEEE Intl Conference on Computational Science and Engineering (CSE) and IEEE Intl Conference on Embedded and Ubiquitous Computing (EUC) and 15th Intl Symposium on Distributed Computing and Applications for Business Engineering (DCABES) > 270 - 277

Finding the roots of polynomials is a very important part of solving real-life problems but the higher the degree of the polynomials is, the less easy it becomes. In this paper, we present two different parallel algorithms of the Ehrlich-Aberth method to find roots of sparse and fully defined polynomials of high degrees. Both algorithms are based on CUDA technology to be implemented on multi-GPU computing...

chapter

Thread execution on embedded processor - ARM9 core in Embedded Linux environment

Bhairavi N. Savant, Shubhangi M. Deshmukh, Surekha K S Hegde

2016 International Conference on Computing Communication Control and automation (ICCUBEA) > 1 - 5

As we know, in any operating system, processes do not share resources well, and there is a high context-switching overhead. A thread (or lightweight process), by contrast, is a basic unit of CPU utilization and comprises a thread identifier (ID), program counter, register set and stack space. A thread within the process shares its code section, data section, and other operating-system resources, such...

chapter

Temperature-Aware Register Mapping in GPGPUs

Ehsan Atoofian

2016 IEEE Trustcom/BigDataSE/ISPA > 1636 - 1643

Various architecture-based techniques have been proposed to reduce power consumption in GPGPUs. However, these techniques mostly ignore the temperature of GPGPUs. In this paper, we focus on the register file and propose a new technique to reduce its peak temperature. The register file in GPGPUs is very large, even larger than caches, to support thousands of simultaneously executing threads. This makes register...

chapter

Compressed L1 data cache and L2 cache in GPGPUs

Ehsan Atoofian

2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 1 - 8

General-Purpose Graphics Processing Units (GPGPUs) exploit several levels of caches to hide latency of memory and provide data for thousands of simultaneously executing threads. L1 data cache and L2 cache are critical to performance of GPGPUs as an L1 data cache should provide data for all threads within the corresponding Streaming Multiprocessor (SM) and the L2 cache should service memory requests...

chapter

Enhancing Data Reuse in Cache Contention Aware Thread Scheduling on GPGPU

Chin-Fu Lu, Hsien-Kai Kuo, Bo-Cheng Charles Lai

2016 10th International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS) > 351 - 356

GPGPUs have been widely adopted as throughput processing platforms for modern big-data and cloud computing. Attaining a high performance design on a GPGPU requires careful tradeoffs among various design concerns. Data reuse, cache contention, and thread level parallelism, have been demonstrated as three imperative performance factors for a GPGPU. The correlated performance impacts of these factors...

chapter

A successful parallel implementation of NSGA-II on GPU for the energy dispatch problem on hydroelectric power plants

Lucas Braga de Oliveira, Carolina G. Marcelino, Anolan Milanes, Paulo E. M. Almeida, et al.

2016 IEEE Congress on Evolutionary Computation (CEC) > 4305 - 4312

Nowadays, hydraulic sources are responsible for most of Brazil's energy production. Hydroelectric power plant (HPP) operators in Brazil usually distribute the total power required equally among the generator units available in the plant. However, studies show that this configuration does not guarantee that each generator unit operates close to its optimal operation point. The energy dispatch optimization...

chapter

GPU-based nonlocal filtering for large scale SAR processing

Gerald Baier, Xiao Xiang Zhu

2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 7608 - 7611

In the past few years nonlocal filters have emerged as a serious contender for denoising synthetic aperture radar (SAR) images, offering superior noise reduction and detail preservation compared to many other filters. In this manuscript we analyze how nonlocal filters, whose computational costs were so far prohibitive for large scale processing, can be implemented efficiently on graphics processing...

chapter

Parallel adaptive sparsity-constrained NMF algorithm for hyperspectral unmixing

Wenhong Wang, Yuntao Qian

2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 6137 - 6140

Sparsity-constrained nonnegative matrix factorization (NMF) has been proved to be an effective method for hyperspectral unmixing. However, the optimization procedure of sparsity-constrained NMF is computationally demanding, which may limit its application in time-constrained conditions. In this paper, a parallel L_1/2 sparsity-constrained NMF unmixing method on Graphics Processing Units (GPUs) is proposed,...

chapter

On the optimization of memory access to increase the performance of spatial preprocessing techniques on graphics processing units

J. Delgado, G. Martin, J. Plaza, L. I. Jimenez, et al.

2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 6541 - 6544

The use of spatial information prior to spectral unmixing of hyperspectral data is a very active research line in recent years. There are many approximations that consider spatial characteristics of the data in order to guide the endmember identification/extraction procedure. In particular, the spatial preprocessing (SPP) algorithm can be used prior to most existing spectral-based endmember identification...

chapter

Many-Thread Aware Compression in GPGPUs

Ehsan Atoofian

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld) > 628 - 635

Compression is a promising technique to increase effective capacity of caches. Due to latency overhead of decompression, most of previous studies applied compression to lower level caches. General-Purpose Graphics Processing Units (GPGPUs) are throughput oriented computing platforms which execute hundreds to thousands of threads, simultaneously. The massive number of threads makes GPGPUs less sensitive...

chapter

MyThOS — Scalable OS Design for Extremely Parallel Applications

Randolf Rotta, Jorg Nolte, Vladimir Nikolov, Lutz Schubert, et al.

Many-core architectures trade single-thread performance for a larger number of cores. Scalable throughput can be attained only by a high degree of parallelism and minimized synchronization. Whilst this is achievable for many applications, the operating system still introduces bottlenecks through non-local sharing, synchronization, and message passing. A particular challenge for highly dynamic applications,...

chapter

Design and Implementation of Ceph Block Device in Userspace for Container Scenarios

Li Wang, Yunchuan Wen

2016 International Symposium on Computer, Consumer and Control (IS3C) > 383 - 386

Ceph is a well-known and widely deployed open source distributed storage system. Specifically, it is the most used storage backend for the popular OpenStack cloud computing platform. For the traditional usage of Ceph in cloud computing, the Ceph block device implemented in the VMM (virtual machine monitor), qem-rbd, is used to provide disks for the VMs (virtual machines). Recently, the container technology becomes...

chapter

A pipeline-based runtime technique for improving Ray-Tracing on HSA-compliant systems

Chih-Chen Kao, Yu-Tsung Miao, Wei-Chung Hsu

2016 IEEE International Conference on Multimedia and Expo (ICME) > 1 - 6

The prevalence of real time multimedia delivery appliances has led to the developments of a variety of efficient architectures and supporting software technologies. Especially, Ray-Tracing, a well-known physically-based rendering algorithm, has been receiving great attention in research and development. Unfortunately, Ray-Tracing algorithm, being one of the irregular applications, suffers from the...

chapter

GPU accelerated high-quality video/image super-resolution

Zhangzong Zhao, Li Song, Rong Xie, Xiaokang Yang

2016 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB) > 1 - 4

This paper presents several novel GPU optimization technologies to accelerate the SRCNN (Super-Resolution Convolutional Neural Network), one of the best super-resolution algorithms. We first directly parallelize and implement the SRCNN, then accelerate the convolution by making use of the hierarchical feature of GPU memory. We explore different optimization methods on each convolution and select the...

chapter

A GPU parallel implementation of the Local Principal Component Analysis overcomplete method for DW image denoising

Salvatore Cuomo, Pasquale De Michele, Ardelio Galletti, Livia Marcellino

2016 IEEE Symposium on Computers and Communication (ISCC) > 26 - 31

We focus on the Overcomplete Local Principal Component Analysis (OLPCA) method, which is widely adopted as denoising filter. We propose a programming approach resorting to Graphic Processor Units (GPUs), in order to massively parallelize some heavy computational tasks of the method. In our approach, we design and implement a parallel version of the OLPCA, by using a suitable mapping of the tasks on...

chapter

Exploiting integrated GPUs for network packet processing workloads

Janet Tseng, Ren Wang, James Tsai, Saikrishna Edupuganti, et al.

2016 IEEE NetSoft Conference and Workshops (NetSoft) > 161 - 165

Software-based network packet processing on standard high volume servers promises better flexibility, manageability and scalability, thus gaining tremendous momentum in recent years. Numerous research efforts have focused on boosting packet processing performance by offloading to discrete Graphics Processing Units (GPUs). While integrated GPUs, residing on the same die with the CPU, offer many advanced...

chapter

Retargeting and enhancing a compact multitasking kernel for the Altera Nios II processor

Naraig Manjikian

2016 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE) > 1 - 5

This paper describes the retargeting and further enhancement of a compact multitasking kernel for the 32-bit Altera Nios II processor. The kernel, called QUERK for Queen's University Educational Real-time Kernel, was originally written in assembly language and then the C language for the Motorola (and then Freescale) 68HC11 processor. Consisting of less than 200 lines of assembly-language instructions,...

chapter

A GPU Based Maximum Common Subgraph Algorithm for Drug Discovery Applications

P. B. Jayaraj, K. Rahamathulla, G. Gopakumar

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 580 - 588

The maximum common subgraph of two graphs, G1 and G2, is the largest subgraph in G1 that is isomorphic to a subgraph in G2. Finding the maximum common subgraph of two given graphs is known to be an NP-complete problem. An exact solution for the maximum common subgraph problem can be found by an algorithm that transforms the maximum common subgraph problem into a maximal clique enumeration problem....

chapter

Optimization of Block Sparse Matrix-Vector Multiplication on Shared-Memory Parallel Architectures

Ryan Eberhardt, Mark Hoemmen

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 663 - 672

We examine the implementation of block compressed row storage (BCSR) sparse matrix-vector multiplication (SpMV) for sparse matrices with dense block substructure, optimized for blocks with sizes from 2x2 to 32x32, on CPU, Intel many-integrated-core, and GPU architectures. Previous research on SpMV for matrices with dense block substructure has largely focused on the design of novel data structures...

Keywords:
KERNEL
INSTRUCTION SETS


INFONA - science communication portal
