Search results

Items from 81 to 100 out of 843 results

chapter

A study on the method of the remote IPC based on xeon-phi hardware platform

Jeong-Hwan Lee, Seung-Jun Cha, Seung-Hyub Jeon, Sungin Jung

2016 International Conference on Information and Communication Technology Convergence (ICTC) > 601 - 603

We designed and implemented a Remote Inter-Processor Communication architecture software on Xeon Phi coprocessors and made a testbed to verify it. Also, we implemented a lightweight kernel and RIPC transmission/receiver application threads on the lightweight kernel running on Xeon Phi coprocessors. This paper proposes RIPC methods to communicate between user threads in separate Xeon Phi nodes using...

chapter

A Gb/s parallel block-based Viterbi decoder for convolutional codes on GPU

Hao Peng, Rongke Liu, Yi Hou, Ling Zhao

2016 8th International Conference on Wireless Communications & Signal Processing (WCSP) > 1 - 6

In this paper, we propose a parallel block-based Viterbi decoder (PBVD) on the graphic processing unit (GPU) platform for the decoding of convolutional codes. The decoding procedure is simplified and parallelized, and the characteristic of the trellis is exploited to reduce the metric computation. Based on the compute unified device architecture (CUDA), two kernels with different parallelism are designed...

chapter

OpenSwarm: An event-driven embedded operating system for miniature robots

Stefan M. Trenkwalder, Yuri Kaszubowski Lopes, Andreas Kolling, Anders Lyhne Christensen, more

2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) > 4483 - 4490

This paper presents OpenSwarm, a lightweight easy-to-use open-source operating system. To our knowledge, it is the first operating system designed for and deployed on miniature robots. OpenSwarm operates directly on a robot's microcontroller. It has a memory footprint of 1 kB RAM and 12 kB ROM. OpenSwarm enables a robot to execute multiple processes simultaneously. It provides a hybrid kernel that...

chapter

Hardware thread reordering to boost OpenCL throughput on FPGAs

Amir Momeni, Hamed Tabkhi, Gunar Schirner, David Kaeli

2016 IEEE 34th International Conference on Computer Design (ICCD) > 257 - 264

Availability of OpenCL for FPGAs has raised new questions about the efficiency of massive thread-level parallelism on FPGAs. The general trend is toward creating deep pipelining and in-order execution of many OpenCL threads across a shared data-path. While this can be a very effective approach for regular kernels, its efficiency significantly diminishes for irregular kernels with runtime-dependent...

chapter

ONAC: Optimal number of active cores detector for energy efficient GPU computing

Xian Zhu, Mihir Awatramani, Diane Rover, Joseph Zambreno

2016 IEEE 34th International Conference on Computer Design (ICCD) > 512 - 519

Graphics Processing Units (GPUs) have become a prevalent platform for high throughput general purpose computing. The peak computational throughput of GPUs has been steadily increasing with each technology node by scaling the number of cores on the chip. Although this vastly improves the performance of several compute-intensive applications, our experiments show that some applications can achieve peak...

chapter

Performance Optimization for SpMV on Multi-GPU Systems Using Threads and Multiple Streams

Ping Guo, Changjiang Zhang

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 67 - 72

Sparse matrix-vector multiplication (SpMV) is a key operation in scientific computing and engineering applications. This paper presents an optimization strategy to improve SpMV performance on multi-GPU systems by adopting OpenMP threads and multiple CUDA streams. We propose an efficient scheme to control multiple GPUs to jointly complete SpMV computations by making use of OpenMP threads. Moreover,...

chapter

Research of parallel dehazing using temporal coherence algorithm based on CUDA

Yanwen Gu, Xiaogang Zhang

2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC) > 56 - 61

Real-time haze removal is achieved with CUDA, based on the atmospheric scattering model and a temporal coherence algorithm. First, a hierarchical search method based on quadtree subdivision replaces the original algorithm for obtaining the atmospheric light, and the number of parallel threads is set to the number of pixels, which processes the required calculation of pixels, the intermediate results...

chapter

A Benchmark on Multi Improvement Neighborhood Search Strategies in CPU/GPU Systems

Eyder Rios, Igor M. Coelho, Luiz Satoru Ochi, Cristina Boeres, more

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 49 - 54

In combinatorial optimization problems, the neighborhood search (NS) is a fundamental component for local search based heuristics. It consists of selecting a solution from a high cardinality set of neighbor solutions, by means of operations called moves. To perform this search, NS algorithms usually adopt two main approaches: selecting the first or best improving move. The Multi Improvement (MI) strategy...

chapter

Profiling-based task graph extraction on multiprocessor system-on-chip

Sodam Han, Yonghee Yun, Young Hwan Kim

2016 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS) > 510 - 513

This paper proposes a profiling-based method to extract a task graph, which describes the system behavior of a multiprocessor system-on-chip with Android OS. The proposed method computes the resource usage of each task and extracts dependency among tasks using the run-time system profiling results. The proposed method calculates CPU resource usage and I/O waiting time of each task by analyzing CPU...

chapter

Characterizing Performance and Power towards Efficient Synchronization of GPU Kernels

Islam Harb, Wu-Chun Feng

2016 IEEE 24th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS) > 451 - 456

The lack of support for explicit synchronization between the streaming multiprocessors (SMs) in GPUs adversely impacts the ability of GPUs to perform inter-block communication efficiently. In this paper, we present several approaches to inter-block synchronization using explicit/implicit CPU-based and dynamic parallelism (DP) mechanisms. Although this topic has been addressed in previous...

chapter

A Performance Model and Efficiency-Based Assignment of Buffering Strategies for Automatic GPU Stencil Code Generation

Yue Hu, David M. Koppelman, Steven R. Brandt

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 361 - 368

Stencil computations form the basis for computer simulations across almost every field of science, such as computational fluid dynamics, data mining, and image processing. Their mostly regular data access patterns potentially enable them to take advantage of the high computation and data bandwidth of GPUs, but only if data buffering and other issues are handled properly. Finding a good code generation...

chapter

Parallel motion estimation and GPU-based fast coding unit splitting mechanism for HEVC

Yih-Chuan Lin, Shang-Che Wu

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

This paper presents a parallel motion estimation algorithm on Graphics Processing Units (GPU) with a GPU-based fast Coding Unit (CU) splitting mechanism for speeding up the execution speed of High Efficiency Video Coding (HEVC). Parallel motion estimation algorithms only offer motion vectors to HEVC encoder, but CU splitting decision in HEVC still needs more information to speed up the encoder. Therefore,...

chapter

Dual-Engine Cross-ISA DBTO Technique Utilising MultiThreaded Support for Multicore Processor System

Joo On Ooi, Fawnizu Azmadi B. Hussin, Nordin Zakaria

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 257 - 264

The emergence of the new era of the Internet of Things (IoT) has encouraged intensive, if not extensive, usage of modern mobile apps; thus multicore processors equipped with multiple ISAs have great potential to be used for more efficient instruction binary processing in the near future. In order to support this ISA diversity of computing platforms, mix modes of statically and dynamically Binary Translation and Optimization...

chapter

Automatic Thread-Block Size Adjustment for Memory-Bound BLAS Kernels on GPUs

Daichi Mukunoki, Toshiyuki Imamura, Daisuke Takahashi

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 377 - 384

The performance of a CUDA kernel often depends on the number of threads per thread-block (thread-block size), and the optimal configuration differs according to the graphics processing unit (GPU) hardware and the given data size to the kernel. In particular, in linear algebra libraries such as Basic Linear Algebra Subprograms (BLAS), most routines support a wide range of problem sizes and various...

chapter

Non-Equispaced FFT Computation with CUDA and GPU

Xiangwen Lyu, Jian-Min Zuo, Haiyong Xie

2016 International Conference on Virtual Reality and Visualization (ICVRV) > 227 - 234

Non-equispaced fast Fourier transform (NFFT) has attracted significant interest for its applications in tomography and remote sensing where visualization and image reconstruction require non-equispaced data. Here we present an efficient implementation of high accuracy NFFT on an NVidia GPU (Graphic Processing Unit). We focused on the convolution step in the computation of NFFT, since it is the most...

chapter

Efficient HEVC decoder for heterogeneous CPU with GPU systems

Biao Wang, Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink, more

2016 IEEE 18th International Workshop on Multimedia Signal Processing (MMSP) > 1 - 6

The High Efficiency Video Coding (HEVC) standard provides higher compression efficiency than other video coding standards but at the cost of increased computational load, which makes it hard to achieve real-time encoding/decoding of high-resolution, high-quality video sequences. In this paper, we investigate how Graphics Processing Units (GPUs) can be employed to accelerate HEVC decoding. GPUs are...

chapter

An improved GPGPU-Accelerated parallelization for rotation invariant thinning algorithm

Weiguang Yang, Qi Jia, Hui Liu, Yihao Wu, more

2016 IEEE International Conference on Image Processing (ICIP) > 1784 - 1788

Document is unavailable: This DOI was registered to an article that was not presented by the author(s) at this conference. As per section 8.2.1.B.13 of IEEE's "Publication Services and Products Board Operations Manual," IEEE has chosen to exclude this article from distribution. We regret any inconvenience.

chapter

Communication-aware mapping of stream graphs for multi-GPU platforms

Dong Nguyen, Jongeun Lee

2016 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 94 - 104

Stream graphs can provide a natural way to represent many applications in multimedia and DSP domains. Though the exposed parallelism of stream graphs makes it relatively easy to map them to GP (General Purpose)-GPUs, very large stream graphs as well as how to best exploit multi-GPU platforms to achieve scalable performance poses great challenges for stream graph mapping. Previous work considers either...

chapter

GPGPU vs multiprocessor SPSO implementations to solve electromagnetic optimization problems

Anton Duca, Laurentiu Duca, Gabriela Ciuprina, Daniel Ioan

2015 7th International Joint Conference on Computational Intelligence (IJCCI) > 1 > 64 - 73

This paper studies two parallelization techniques for the implementation of a SPSO algorithm applied to optimize electromagnetic field devices, GPGPU and Pthreads for multiprocessor architectures. The GPGPU and Pthreads implementations are compared in terms of solution quality and speed up. The electromagnetic optimization problems chosen for testing the efficiency of the parallelization techniques...

chapter

An empirical study of parallel solutions for GLCM calculation of diffraction images

John Dixon, Junhua Ding

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 3969 - 3972

Feature calculation for a large number of images is time-consuming. The GPU-based CUDA framework offers an affordable solution for calculating image features in parallel. The research focused on an empirical study of different implementations of a general-purpose GPU-based solution for calculating Gray-Level Co-occurrence Matrices (GLCM) and associated features of diffraction images of biological cells...

Keywords:
KERNEL
INSTRUCTION SETS

