Search results

Items from 61 to 80 out of 473 results

chapter

Detecting Kernels Suitable for C-Based High-Level Hardware Synthesis

Julian Oppermann, Andreas Koch

2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld) > 1157 - 1164

We present SpExSim, a software tool for quickly surveying legacy code bases for kernels that could be accelerated by FPGA-based compute units. We specifically aim for low development effort by considering the use of C-based high-level hardware synthesis, instead of complex manual hardware designs. SpExSim not only exploits the spatially distributed model of computation commonly used on FPGAs, but...

chapter

Accelerating Computation on an Android Phone with OpenCL Parallelism and Optimizing Workload Distribution between a Phone and a Cloud Service

Kui Wang, Jari Nurmi, Tapani Ahonen

We evaluate workload distribution optimization between an Android phone and a cloud service by considering the overall impact of both computation and data transfer. We use OpenCL parallelism on Android to obtain high computation performance. We implement an escape time algorithm to compute the Mandelbrot set with OpenCL, with Java as a reference for comparison. In an experiment of setting the escape boundary...

chapter

Workload-Aware Power Gating Design and Run-Time Management for Massively Parallel GPGPUs

Kapil Dev, Sherief Reda, Indrani Paul, Wei Huang, more

2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) > 242 - 247

Power gating (PG) is an effective power efficiency improvement technique. Future general-purpose graphics processing units (GPGPUs) will likely feature hundreds of compute units (CUs) and be power constrained, which leads to serious challenges to existing PG methodologies. In this paper, we propose novel design-time and run-time techniques to effectively implement power gating in future GPGPUs. Based...

chapter

A new method to parallel implementation for batching vast small-scale computing tasks based on GPU

Jun Zhu, Haifeng Yao, Tao Yang, Qiaomei Zhou, more

2016 IEEE 11th Conference on Industrial Electronics and Applications (ICIEA) > 2092 - 2095

Small-scale computations are common in scientific computing and application domains, and an efficient method for handling them can unlock the potential of many computations and applications. In this paper, a novel self-adaptive parallel computing method based on the graphics processing unit (GPU) architecture is proposed for batches of small-scale computing tasks...

chapter

A programmable and reconfigurable core for binary image processing

Ayad Dalloo, Alberto Garcia-Ortiz

2016 11th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC) > 1 - 6

Binary-image processing cores are extremely useful in many image and video applications such as object recognition, tracking, motion detection, and identification. To address the variety of applications and binary-image kernels, we propose an FPGA-based intellectual property core with enhanced flexibility: it is programmable, reconfigurable, and parameterizable. The core performs single binary image...

chapter

Exploiting integrated GPUs for network packet processing workloads

Janet Tseng, Ren Wang, James Tsai, Saikrishna Edupuganti, more

2016 IEEE NetSoft Conference and Workshops (NetSoft) > 161 - 165

Software-based network packet processing on standard high volume servers promises better flexibility, manageability and scalability, thus gaining tremendous momentum in recent years. Numerous research efforts have focused on boosting packet processing performance by offloading to discrete Graphics Processing Units (GPUs). While integrated GPUs, residing on the same die with the CPU, offer many advanced...

chapter

Employing Compression Solutions under OpenACC

Ebad Salehi, Ahmad Lashgar, Amirali Baniasadi

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 348 - 356

For GPUs to achieve their peak performance, effective and efficient usage of memory bandwidth is necessary. To this end, programmers invest extensive development effort to optimize a GPU program, especially its memory bandwidth usage. The OpenACC programming model has been introduced to tackle the complexity of programming accelerators. However, this model's coarse-grained control on a program can make...

chapter

A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications

Syed Waqar Nabi, Wim Vanderbauwhede

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 114 - 123

Heterogeneous High-Performance Computing (HPC) platforms present a significant programming challenge, especially because the key users of HPC resources are scientists, not parallel programmers. We contend that compiler technology has to evolve to automatically create the best program variant by transforming a given original program. We have developed a novel methodology based on type transformations...

chapter

Alpaka -- An Abstraction Library for Parallel Kernel Acceleration

Erik Zenker, Benjamin Worpitz, Rene Widera, Axel Huebl, more

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 631 - 640

Porting applications to new hardware or programming models is a tedious and error-prone process. Any help that eases these burdens saves developer time that can then be invested in advancing the application itself instead of preserving the status quo on a new platform. The Alpaka library defines and implements an abstract hierarchical redundant parallelism model. The model exploits...

chapter

Refactoring Conventional Task Schedulers to Exploit Asymmetric ARM big.LITTLE Architectures in Dense Linear Algebra

Luis Costero, Francisco D. Igual, Katzalin Olcoz, Sandra Catalan, more

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 692 - 701

Dealing with asymmetry in the architecture opens a plethora of questions from the perspective of scheduling task-parallel applications for which there exist early ad-hoc strategies embedded into asymmetry-conscious runtimes. In this paper we take a different path that addresses the complexity of the problem at the library level, via a few asymmetry-aware fundamental kernels, hiding the architecture heterogeneity...

chapter

Testing Fine-Grained Parallelism for the ADMM on a Factor-Graph

Ning Hao, Amirreza Oghbaee, Mohammad Rostami, Nate Derbinsky, more

2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 835 - 844

There is an ongoing effort to develop tools that apply distributed computational resources to tackle large problems or reduce the time to solve them. In this context, the Alternating Direction Method of Multipliers (ADMM) arises as a method that can exploit distributed resources like the dual ascent method and has the robustness and improved convergence of the augmented Lagrangian method. Traditional...

chapter

Dynamic SIMD re-convergence with paired-path comparison

Yun-Chi Huang, Kuan-Chieh Hsu, Wan-shan Hsieh, Chen-Chieh Wang, more

2016 IEEE International Symposium on Circuits and Systems (ISCAS) > 233 - 236

SIMD divergence is one of the critical factors that decrease hardware utilization in contemporary GPGPUs (General-Purpose Graphics Processing Units). Both the reconvergence scheme and control flow detection have to be well considered. In the emerging HSA (Heterogeneous System Architecture) platform, we develop an effective dynamic stack-based re-convergence scheme that can be implemented without...

chapter

Grater: An approximation workflow for exploiting data-level parallelism in FPGA acceleration

Atieh Lotfi, Abbas Rahimi, Amir Yazdanbakhsh, Hadi Esmaeilzadeh, more

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1279 - 1284

Modern applications including graphics, multimedia, web search, and data analytics not only can benefit from acceleration, but also exhibit significant degrees of tolerance to imprecise computation. This amenability to approximation provides an opportunity to trade quality of the results for higher performance and better resource utilization. Exploiting this opportunity is particularly important for...

chapter

A new parallel SystemC kernel leveraging manycore architectures

Nicolas Ventroux, Tanguy Sassolas

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 487 - 492

The complexity of system-level modeling is continuously increasing. Electronic System Level (ESL) design requires fast simulation techniques to control future SoC development cost and time-to-market. However, SystemC simulations are sequential and thus limited by single-thread performance. In this paper, we present a new parallel SystemC kernel that efficiently leverages the multiple cores of a host...

chapter

GSI: A GPU Stall Inspector to characterize the sources of memory stalls for tightly coupled GPUs

Johnathan Alsop, Matthew D. Sinclair, Rakesh Komuravelli, Sarita V. Adve

2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 172 - 182

In recent years the power wall has prevented the continued scaling of single core performance. This has led to the rise of dark silicon and motivated a move toward parallelism and specialization. As a result, energy-efficient high-throughput GPU cores are increasingly favored for accelerating data-parallel applications. However, the best way to efficiently communicate and synchronize across heterogeneous...

chapter

A Component Based Graphical Parallel Programming Approach for Numerical Simulation Development

Liao Li, Mo Zeyao, Zhang Aiqing

2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS) > 298 - 303

Building massively parallel numerical simulations is not easy due to the ongoing changes in parallel programming models and the various software technologies needed. We develop a component-based graphical parallel programming approach to lower the difficulty of coding applications in scientific and engineering computing and to support rapid development of large-scale simulations based on a domain-specific...

chapter

Dependencies data flow graph based approach for speeding-up application

Aimad Eddine Debbi

2016 International Conference on Industrial Informatics and Computer Systems (CIICS) > 1 - 6

This paper describes ‘HSCoT’, an efficient high-level synthesis tool that generates register transfer level (RTL) specifications for applications written entirely in C, together with an associated reliable approach for speeding up application execution. It is based on dependency data flow graph construction and aims to maximally explore the application's inherent parallelism. Application...

chapter

Parallel edge detection by SOBEL algorithm using CUDA C

Adhir Jain, Anand Namdev, Meenu Chawla

2016 IEEE Students' Conference on Electrical, Electronics and Computer Science (SCEECS) > 1 - 6

Edge detection is one of the most important paradigms in image processing. Images contain millions of pixels, and each pixel's information is independent of its neighbouring pixels. Hence this paper tests the capability of the Graphics Processing Unit (GPU) to compute in parallel the millions of pixel calculations involved in image processing. Each pixel operation is independent from other thus...

chapter

GPU-Accelerated Texture Analysis Using Steerable Riesz Wavelets

Anamaria Vizitiu, Lucian Mihai Itu, Ranveer Joyseeree, Adrien Depeursinge, more

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 431 - 434

Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but requires computing pixel-wise operations, resulting in a run time on the order of days when large volumes of data are processed. To overcome this limitation we propose a Graphics Processing Unit (GPU) based solution. A standard...

chapter

Using an Intermediate Representation to Map Workloads on Heterogeneous Parallel Systems

Nicolas Benoit, Stephane Louise

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 811 - 819

Current trends in computer architecture show that we are heading toward more cores and even more heterogeneity. As extensive knowledge of processors' internals cannot be a prerequisite to their programming, and for the sake of portability, these systems require the compilation flow to evolve and cope with heterogeneity issues. This is even more true for embedded systems. In this paper, we...

Keywords:
KERNEL
PARALLEL PROCESSING
