Search results

Items 81 to 100 of 473 results

chapter

Introducing Parallelism by Using REPARA C++11 Attributes

M. Danelutto, J. Daniel Garcia, Luis Miguel Sanchez, Rafael Sotomayor, et al.

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 354 - 358

Patterns provide a mechanism to express parallelism at a high level of abstraction and to ease the transformation of existing legacy applications to target parallel frameworks. That also opens a path for writing new parallel applications. In this paper we introduce the REPARA approach for expressing parallel patterns and transforming the source code to parallelism frameworks. We take advantage...

chapter

Using an Intermediate Representation to Map Workloads on Heterogeneous Parallel Systems

Nicolas Benoit, Stephane Louise

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 811 - 819

Current trends in computer architecture show that we are heading toward more cores and even more heterogeneity. Because extensive knowledge of processor internals cannot be a prerequisite for programming them, and for the sake of portability, these systems require the compilation flow to evolve and cope with heterogeneity issues. This is even more true for embedded systems. In this paper, we...

chapter

GPU-Accelerated Texture Analysis Using Steerable Riesz Wavelets

Anamaria Vizitiu, Lucian Mihai Itu, Ranveer Joyseeree, Adrien Depeursinge, et al.

2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP) > 431 - 434

Visual pattern recognition is a key research topic in the field of image processing and computer vision. Texture analysis based on steerable Riesz wavelets is powerful, but requires computing pixel-wise operations, resulting in a run time on the order of days when large volumes of data are processed. To overcome this limitation we propose a Graphics Processing Unit (GPU) based solution. A standard...

chapter

POSTER: Pagoda: A runtime system to maximize GPU utilization in data parallel tasks with limited parallelism

Tsung Tai Yeh, Amit Sabne, Putt Sakdhnagool, Rudolf Eigenmann, et al.

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) > 449 - 450

Massively multithreaded GPUs achieve high throughput by running thousands of threads in parallel. To fully utilize the hardware, contemporary workloads spawn work to the GPU in bulk by launching large tasks, where each task is a kernel that contains thousands of threads that occupy the entire GPU.

chapter

Automatically exploiting implicit Pipeline Parallelism from multiple dependent kernels for GPUs

Gwangsun Kim, Jiyun Jeong, John Kim, Mark Stephenson

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) > 339 - 350

Execution of GPGPU workloads consists of different stages including data I/O on the CPU, memory copy between the CPU and GPU, and kernel execution. While the GPU can remain idle during I/O and memory copy, prior work has shown that overlapping data movement (I/O and memory copies) with kernel execution can improve performance. However, when there are multiple dependent kernels, the execution of the kernels...

chapter

POSTER - collective dynamic parallelism for directive based GPU programming languages and compilers

Guray Ozen, Eduard Ayguade, Jesus Labarta

2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) > 423 - 424

Early programs for GPU (Graphics Processing Unit) acceleration were based on a flat, bulk parallel programming model, in which programs had to perform a sequence of kernel launches from the host CPU. In the latest releases of these devices, dynamic (or nested) parallelism is supported, making it possible to launch kernels from threads running on the device, without host intervention. Unfortunately,...

chapter

SWIFT-A Performance Accelerated Optimized String Matching Algorithm for Nvidia GPUs

Sourabh S. Shenoy, U. Supriya Nayak, Bayyapu Neelima

2016 15th International Symposium on Parallel and Distributed Computing (ISPDC) > 80 - 87

This paper presents a study of exact string matching algorithms and their performance behavior when executed on dynamic parallelism enabled Kepler Graphics Processing Unit (GPU) by Nvidia. The algorithms considered in this paper are Quick search (QS), Horspool (HP), and Brute force (BF) string matching. Their efficient implementation on Kepler gives a remarkable improvement over their respective multi-core...

chapter

Efficient synthesis of graph methods: A dynamically scheduled architecture

Marco Minutoli, Vito Giovanni Castellana, Antonino Tumeo, Marco Lattuada, et al.

2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) > 1 - 8

RDF databases naturally map to a graph representation and employ languages, such as SPARQL, that implement queries as graph pattern matching routines. Graph methods exhibit irregular behavior: they present unpredictable, fine-grained data accesses, and are synchronization intensive. Graph data structures expose large amounts of dynamic parallelism, but are difficult to partition without generating...

chapter

TEMP: Thread batch enabled memory partitioning for GPU

Mengjie Mao, Wujie Wen, Xiaoxiao Liu, Jingtong Hu, et al.

2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

As massive multi-threading in GPU imposes tremendous pressure on memory subsystems, efficient bandwidth utilization becomes a key factor affecting the GPU throughput. In this work, we propose thread batch enabled memory partitioning (TEMP), to improve GPU performance through the improvement of memory bandwidth utilization. In particular, TEMP clusters multiple thread blocks sharing the same set of...

chapter

C-Brain: A deep learning accelerator that tames the diversity of CNNs through adaptive data-level parallelization

Lili Song, Ying Wang, Yinhe Han, Xin Zhao, et al.

2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC) > 1 - 6

Convolutional neural network (CNN) accelerators have been proposed as an efficient hardware solution for deep learning based applications, which are known to be both compute- and memory-intensive. Although the most advanced CNN accelerators can deliver high computational throughput, the performance is highly unstable. Once changed to accommodate a new network with different parameters like layers...

chapter

Design space exploration of FPGA-based Deep Convolutional Neural Networks

Mohammad Motamedi, Philipp Gysel, Venkatesh Akella, Soheil Ghiasi

2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC) > 575 - 580

Deep Convolutional Neural Networks (DCNN) have proven to be very effective in many pattern recognition applications, such as image classification and speech recognition. Due to their computational complexity, DCNNs demand implementations that utilize custom hardware accelerators to meet performance and energy-efficiency constraints. In this paper we propose an FPGA-based accelerator architecture which...

chapter

V-PFORDelta: Data Compression for Energy Efficient Computation of Time Series

Abdullah Al Hasib, Juan M. Cebrian, Lasse Natvig

2015 IEEE 22nd International Conference on High Performance Computing (HiPC) > 416 - 425

Chip multiprocessors (CMPs) and heterogeneous architectures have become predominant in all market segments, from embedded to high performance computing. These architectures exacerbate on-chip data requirements, creating additional pressure on the memory subsystem. Consequently, efficient utilization of on-chip memory space becomes critical for data intensive applications. A promising means of addressing...

chapter

Using type transformations to generate program variants for FPGA design space exploration

Syed Waqar Nabi, Wim Vanderbauwhede

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 6

We present preliminary results with the TyTra design flow. Our aim is to create a parallelising compiler for high-performance scientific code on heterogeneous platforms, with a focus on Field-Programmable Gate Arrays (FPGAs). Using the functional language Idris, we show how this programming paradigm facilitates generation of different correct-by-construction program variants through type transformations...

chapter

Design of OpenCL-compatible multithreaded hardware accelerators with dynamic support for embedded FPGAs

Alfonso Rodríguez, Juan Valverde, Eduardo de la Torre

2015 International Conference on ReConFigurable Computing and FPGAs (ReConFig) > 1 - 7

ARTICo³ is an architecture that makes it possible to dynamically set an arbitrary number of reconfigurable hardware accelerators, each containing a given number of threads fixed at design time according to High Level Synthesis constraints. However, the replication of these modules can be decided at runtime to accelerate kernels by increasing the overall number of threads, add modular redundancy to increase...

chapter

HSA-enabled DSPs and accelerators

John Glossner, Paul Blinzer, Jarmo Takala

2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP) > 1407 - 1411

In this paper, we describe the Heterogeneous System Architecture Foundation's application to digital signal processors (DSP) and hardware accelerators. We provide an overview of the HSA runtime, system architecture and programmer's model, identify characteristics of DSPs and compare differences in algorithms to GPUs. We show an example mapping of HSA agents to a modern DSP using the HSA intermediate...

chapter

Determining a device crossover point in CPU/GPU systems for streaming applications

Sudeep Kanur, Wictor Lund, Leonidas Tsiopoulos, Johan Lilius

2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP) > 1417 - 1421

In streaming dataflow applications such as video conferencing systems, the applications are often subjected to traffic occurring in bursts. As systems consisting of a CPU and a GPU are becoming ubiquitous, efficient utilization of such platforms for handling bursts of data becomes an interesting problem. For GPUs to be efficient, the chunk size of data to process must be large. The bursty nature of...

chapter

Operational cloud screening service for Sentinel-2 image time series

Luis Gomez-Chova, Julia Amoros-Lopez, Antonio Ruiz-Verdu, Jordi Munoz-Mari, et al.

2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 334 - 337

This paper deals with the development and implementation of a cloud screening algorithm for image time series, with the focus on the forthcoming Sentinel-2 satellites to be launched under the ESA Copernicus Programme. The proposed methodology is based on kernel ridge regression and exploits the temporal information to detect anomalous changes that correspond to cloud covers. The huge data volumes...

chapter

Multi-threaded Simics SystemC Virtual Platform

Asad Khan, Weiqiang Ma, Bengt Werner, Chris Wolf

2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD) > 373 - 379

The functional simulator Simics provides a co-simulation integration path with a SystemC simulation environment to create Virtual Platforms. With the increasing complexity of the SystemC models, this platform suffers from performance degradation due to the single-threaded nature of the integrated Virtual Platform. In this paper, we present a multi-threaded Simics SystemC platform solution that significantly...

chapter

An adaptation of the MWP-CWP model for a GPU architecture applied to 3-D stencil kernels

Dorfell Parra, William Salamanca, Ana B. Ramirez

2015 IEEE Thirty Fifth Central American and Panama Convention (CONCAPAN XXXV) > 1 - 6

Finding the bottlenecks in the execution of a kernel on a GPU is essential to improving the performance of the implementation. Although there are several expert techniques such as Assess, Parallelize, Optimize, Deploy (APOD), proposed by NVIDIA, the use of those techniques in computationally expensive algorithms such as Reverse Time Migration (RTM) is not an option. To solve this problem several...

Keywords:
KERNEL
PARALLEL PROCESSING