Search results

Items 81 to 100 of 433 results

chapter

MDACCER: Modified Distributed Assessment of the Closeness CEntrality Ranking in Complex Networks for Massively Parallel Environments

Frederico Luis Cabral, Carla Osthoff, Daniel Ramos, Rafael Nardes

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW), pp. 43-48

We propose a new method derived from DACCER (Distributed Assessment of the Closeness CEntrality Ranking): the modified DACCER (MDACCER), for assessing the traditional closeness centrality ranking. MDACCER introduces a relaxation that allows it to take advantage of massively parallel environments such as General-Purpose Graphics Processing Units (GPGPUs). The traditional DACCER proposal assesses closeness centrality...
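
For reference, the traditional closeness centrality ranking that DACCER and MDACCER assess can be computed exactly with one breadth-first search per node, at O(n(n + m)) total cost for n nodes and m edges. The sketch below illustrates only that exact baseline, not DACCER or MDACCER themselves; it assumes an unweighted, undirected graph stored as a Python adjacency dict, and the function names are hypothetical.

```python
from collections import deque


def closeness_centrality(adj):
    """Exact closeness: (reachable nodes - 1) / sum of BFS hop distances."""
    scores = {}
    for source in adj:
        # Single-source BFS gives shortest hop counts in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # Isolated nodes (total == 0) get a score of zero.
        scores[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def closeness_ranking(adj):
    """Nodes ordered from most to least central; ties broken by node label."""
    scores = closeness_centrality(adj)
    return sorted(scores, key=lambda node: (-scores[node], node))
```

On the five-node path a-b-c-d-e, closeness_ranking returns ['c', 'b', 'd', 'a', 'e']: the center ranks first and the endpoints last. Nodes in disconnected graphs are scored only over the nodes they can reach, which is one of several conventions in use.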

chapter

Scalable Relativistic High-Resolution Shock-Capturing for Heterogeneous Computing

Forrest Wolfgang Glines, Matthew Anderson, David Neilsen

2015 IEEE International Conference on Cluster Computing (CLUSTER), pp. 611-618

A shift is underway in high performance computing (HPC) towards heterogeneous parallel architectures that emphasize medium and fine grain thread parallelism. Many scientific computing algorithms, including simple finite-differencing methods, have already been mapped to heterogeneous architectures with order-of-magnitude gains in performance as a result. Recent case studies examining high-resolution...

chapter

Aparapi-UCores: A high level programming framework for unconventional cores

Oren Segal, Philip Colangelo, Nasibeh Nasiri, Zhuo Qian, et al.

2015 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6

Combining several types of devices and architectures is at the heart of heterogeneous computing's power-efficiency advantage, but the strength of heterogeneous systems is also their Achilles' heel, i.e., the diversity of the devices and of the ecosystems needed to maintain them presents major technological challenges. Some of the biggest challenges are in the realm of system programming. We believe that for...

chapter

Optimizing Image Sharpening Algorithm on GPU

Mengran Fan, Haipeng Jia, Yunquan Zhang, Xiaojing An, et al.

2015 44th International Conference on Parallel Processing (ICPP), pp. 230-239

Sharpness is an algorithm used to sharpen images. As image size, resolution, and real-time processing requirements increase, the performance of sharpness needs to improve greatly. Because sharpness computes each pixel independently, it is a good candidate for large GPU speedups. However, one challenge in porting it to the GPU is that sharpness involves several...

chapter

GPU Solver for Systems of Linear Equations with Infinite Precision

J. Khun, I. Šimeček, R. Lorencz

2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 121-124

In this paper, we introduce a GPU-accelerated solver for systems of linear equations with infinite precision. Infinite precision means that the system provides an exact solution without any rounding error. Such errors usually stem from the limited precision of floating-point values in their native computer representation. In a simplified description, the system is using...
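
The abstract is cut off before it explains how the authors avoid rounding error, so the following is only a generic illustration of "infinite precision": Gauss-Jordan elimination over exact rationals using Python's standard fractions.Fraction, running on the CPU. The function name solve_exact is hypothetical, and this is not the paper's GPU method.

```python
from fractions import Fraction


def solve_exact(a, b):
    """Solve a x = b exactly over the rationals with Gauss-Jordan elimination."""
    n = len(a)
    # Augmented matrix of Fractions: every arithmetic step below is exact.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Exact arithmetic has no round-off, so any nonzero pivot is acceptable.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Normalize the pivot row so the pivot entry becomes 1.
        pivot_val = m[col][col]
        m[col] = [v / pivot_val for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    # The last column now holds the solution vector.
    return [row[n] for row in m]
```

For 2x + y = 1 and x + 3y = 2, solve_exact([[2, 1], [1, 3]], [1, 2]) returns exactly [Fraction(1, 5), Fraction(3, 5)], whereas binary floating point cannot represent 1/5 exactly. Numerators and denominators can grow during elimination, which makes naive exact arithmetic expensive on larger systems.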

chapter

Scaling number of cores in GPGPU: A comparative performance analysis

Winnie Thomas, Rohin D. Daruwala

2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 501-507

Graphics Processing Units (GPUs) based on the Single Instruction Multiple Thread (SIMT) architecture are emerging as more efficient than Multiple Instruction Multiple Data (MIMD) architectures at exploiting parallelism. A GPU has numerous shader cores and thousands of simultaneously active fine-grained threads. These threads are grouped into Cooperative Thread Arrays (CTAs). All the threads within a CTA...

chapter

Hardware-Based and Hybrid L1 Data Cache Bypassing to Improve GPU Performance

Yijie Huangfu, Wei Zhang

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 972-976

Intelligent GPU cache bypassing can improve the efficiency of using GPU memory bandwidth, which can benefit GPU performance. In this paper, we study a pure hardware-based GPU cache bypassing method that can be applied to GPU applications without having to recompile the programs. Moreover, we introduce a hybrid method that can exploit profiling information to further enhance the hardware-based bypassing...

chapter

CUDA-based hybrid intuitionistic fuzzy edge detection algorithm

Eyup Yalcin, Hasan Badem, Mahit Gunes

2015 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), pp. 1-6

The intuitionistic fuzzy edge detection algorithm has been used to characterize images. It was designed by experts and aims to minimize errors. However, it uses a fixed threshold value. In this paper, a hybrid algorithm is developed using the Otsu method, which calculates a threshold value that depends on the image. To be applicable...
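
Otsu's method picks the gray level that maximizes the between-class variance of the image histogram, which is what makes the threshold depend on the image instead of being fixed. A minimal CPU sketch, assuming a grayscale histogram given as a list of pixel counts indexed by intensity; the function name is hypothetical and this is not the authors' CUDA implementation.

```python
def otsu_threshold(histogram):
    """Return t maximizing between-class variance; class 0 holds levels <= t."""
    total = sum(histogram)
    if total == 0:
        raise ValueError("empty histogram")
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_t, best_var = 0, -1.0
    weight_bg = 0  # pixel count of levels <= t
    sum_bg = 0     # intensity sum of levels <= t
    for t, count in enumerate(histogram):
        weight_bg += count
        sum_bg += t * count
        weight_fg = total - weight_bg
        if weight_bg == 0 or weight_fg == 0:
            continue  # one class is empty, so the split is meaningless
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Unnormalized weights: proportional to the between-class variance,
        # so the maximizing t is the same.
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t
```

For the bimodal histogram [5, 5, 0, 0, 0, 0, 5, 5] it returns 1, separating levels 0-1 from levels 6-7. Because only a strictly larger variance replaces the current best, ties keep the lowest threshold.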

chapter

JolokiaC++: Optimizing Irregular Accesses for GPGPU

Vibha Patel, Sanjeev Aggarwal, Amey Karkare

We present JolokiaC++, a compiler framework that eases the coding of irregular data applications on GPUs. The effectiveness of the JolokiaC++ compiler and runtime systems is tested using three kernels, IRREG, MOLDYN, and NBF, executed on NVIDIA GPUs. We developed extensions to the generic parallel constructs that allow portable and efficient programming of codes with irregular accesses on the GPU. We present...

chapter

A Novel Fast Approach for Convolutional Networks with Small Filters Based on GPU

Wenbin Jiang, Yiming Chen, Hai Jin, Bin Luo, et al.

Recently, convolutional networks have achieved great success in the field of computer vision. To improve the efficiency of convolutional networks, a large number of solutions focusing on training algorithms and parallelism strategies have been proposed. In this paper, a novel algorithm based on a look-up table is proposed to speed up convolutional networks with small filters by applying GPU...

chapter

Towards accelerated agent-based crowd simulation for Hajj and Umrah

Abdur Rahman, Nor Asilah Wati Abdul Hamid, Amir Rizaan Rahiman, Basim Zafar

2015 International Symposium on Agents, Multi-Agent Systems and Robotics (ISAMSR), pp. 65-70

There are many scientific applications, ranging from weather prediction to oil and gas exploration, that require high-performance computing. It helps industries and researchers further their advancements. With the advent of general-purpose computing on GPUs, most of the applications above are shifting towards High-Performance Computing (HPC). Agent-based crowd simulation is one of the candidates...

chapter

GPU implementation of spatial preprocessing for spectral unmixing of hyperspectral data

Jaime Delgado, Gabriel Martin, Javier Plaza, Luis Ignacio Jimenez, et al.

2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 5043-5046

The integration of spatial information into the spectral unmixing process has attracted much attention in recent years. Several approaches have been developed to incorporate spatial considerations into the endmember extraction/estimation procedure. Spatial preprocessing algorithms are among the most commonly adopted techniques to guide endmember identification algorithms in terms of the spatial characteristics...

chapter

Automatic Parallelization of GPU Applications Using OpenCL

Lizandro D. Solano-Quinde, Brett M. Bode, Arun K. Somani

2015 Asia-Pacific Conference on Computer Aided System Engineering (APCASE), pp. 276-283

Graphics Processing Units (GPUs) have been successfully used to accelerate scientific applications due to their computational power and the availability of programming languages that make writing scientific applications for GPUs more approachable. However, since the programming model of GPUs requires offloading all the data to the GPU memory, the memory footprint of the application is limited to the...

chapter

Evaluation of global synchronization for iterative algebra algorithms on many-core

Ayaz ul Hasan Khan, Mayez Al-Mouhamed, Lutfi A. Firdaus

2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), pp. 1-6

Massively parallel computing is applied extensively in various scientific and engineering domains. With the growing interest in many-core architectures, and given the lack of explicit support for inter-block synchronization in GPUs specifically, synchronization becomes necessary to minimize inter-block communication time. In this paper, we propose two new inter-block synchronization techniques:...

chapter

Understanding Performance Portability of OpenACC for Supercomputers

Suttinee Sawadsitang, James Lin, Simon See, Francois Bodin, et al.

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), pp. 699-707

Scientific applications need to be moved among supercomputers, such as Tianhe-2 and TSUBAME 2.5. OpenACC provides a directive-based approach for a single source code base with function portability across different accelerators used in the supercomputers. However, the performance portability is not guaranteed by the OpenACC standard. Therefore, we propose a systematic optimization method, instead of...

chapter

Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU

Carl Yang, Yangzihao Wang, John D. Owens

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), pp. 841-847

We implement a promising algorithm for sparse-matrix sparse-vector multiplication (SpMSpV) on the GPU. An efficient k-way merge lies at the heart of finding a fast parallel SpMSpV algorithm. We examine the scalability of three approaches -- no sorting, merge sorting, and radix sorting -- in solving this problem. For breadth-first search (BFS), we achieve a 1.26x speedup over state-of-the-art sparse-matrix...
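
The k-way merge mentioned above arises because y = A x only touches the columns of A selected by the nonzeros of x. Each selected column, scaled by its x entry, is a row-sorted list, and merging k such lists while summing entries with equal row indices yields y. The sequential sketch below uses Python's heapq.merge for the k-way merge. The input layout (a dict of row-sorted column lists, CSC-style, and x as (index, value) pairs) and the function name are assumptions, and it does not reproduce the paper's GPU kernels or its no-sort, merge-sort, and radix-sort variants.

```python
import heapq


def spmspv(csc_columns, x):
    """Compute y = A x for sparse A (dict col -> row-sorted (row, value) list)
    and sparse x (list of (col, value) pairs); returns row-sorted (row, value)."""
    # One row-sorted stream per nonzero x_j: the column A[:, j] scaled by x_j.
    streams = [
        [(row, a * xj) for row, a in csc_columns.get(j, [])]
        for j, xj in x
    ]
    y = []
    # heapq.merge performs the k-way merge of the k streams, keyed on row index.
    for row, value in heapq.merge(*streams, key=lambda entry: entry[0]):
        if y and y[-1][0] == row:
            # Same output row as the previous entry: accumulate the partial product.
            y[-1] = (row, y[-1][1] + value)
        else:
            y.append((row, value))
    return y
```

With columns {0: [(0, 1), (2, 2)], 1: [(1, 3)], 2: [(0, 4), (2, 5)]} and x = [(0, 1), (2, 2)], only columns 0 and 2 are merged, giving [(0, 9), (2, 12)]. Column 1 is never read, which is what makes SpMSpV cheaper than a dense-vector SpMV when x is very sparse.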

chapter

Energy Modeling and Optimization for Tiled Nested-Loop Codes

Nirmal Prajapati, Waruna Ranasinghe, Vamshi Tandrapati, Rumen Andonov, et al.

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW), pp. 888-895

We develop a methodology for modeling the energy efficiency of tiled nested-loop codes running on a graphics processing unit (GPU) and use it for energy efficiency optimization. We assume that a highly optimized and parametrized version of a tiled nested-loop code, either written by an expert programmer or automatically produced by a polyhedral compilation tool...

chapter

GPU Based Sound Simulation and Visualization

Torbjorn Loken, Sergiu M. Dascalu, Frederick C. Harris

2015 12th International Conference on Information Technology - New Generations (ITNG), pp. 692-697

As the era of Moore's Law and increasing CPU clock rates nears its end, the focus of chip and hardware design has shifted to increasing the number of computation cores on the chip. This increase is most clearly seen in the rise of Graphics Processing Units (GPUs), where hundreds or thousands of slower cores work in parallel to accomplish tasks. Programming for these chips represents...

chapter

Characterizing and enhancing global memory data coalescing on GPUs

Naznin Fauzia, Louis-Noel Pouchet, P. Sadayappan

2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 12-22

Effective parallel programming for GPUs requires careful attention to several factors, including ensuring coalesced access of data from global memory. There is a need for tools that can provide feedback to users about statements in a GPU kernel where non-coalesced data access occurs, and assistance in fixing the problem. In this paper, we address both these needs. We develop a two-stage framework...
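
Whether a warp's global-memory access is coalesced can be estimated by counting how many aligned memory segments its threads' addresses fall into. The following is a simplified model, assuming 32-thread warps, 4-byte elements, and 128-byte aligned segments (real transaction granularity varies across GPU architectures). The helper names are hypothetical and unrelated to the authors' framework.

```python
def warp_addresses(base, stride_elems, elem_bytes=4, warp_size=32):
    """Byte addresses issued when thread t reads element base + t * stride_elems."""
    return [(base + t * stride_elems) * elem_bytes for t in range(warp_size)]


def memory_transactions(addresses, segment_bytes=128):
    """Number of distinct aligned segments (i.e., transactions) the warp touches."""
    return len({addr // segment_bytes for addr in addresses})
```

Under this model, unit stride from an aligned base touches one segment (fully coalesced). Stride 2 touches two, and a base offset by one element also spills into a second segment. Stride 32 touches 32 segments, one per thread, which is the kind of non-coalesced access such tools aim to flag.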

chapter

Efficient warp execution in presence of divergence with collaborative context collection

Farzad Khorasani, Rajiv Gupta, Laxmi N. Bhuyan

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 204-215

The GPU's SIMD architecture is a double-edged sword for parallel tasks with control-flow divergence. On the one hand, it provides a high-performance yet power-efficient platform for accelerating applications through massive parallelism; on the other hand, irregularities induce inefficiencies because the warp traverses all diverging execution paths in lockstep. In this work, we present a software...
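
The inefficiency described above, a warp stepping through every diverging path in lockstep, can be captured in a toy cost model: an if/else costs the then-path if any thread takes it, plus the else-path if any thread takes it. This is a simplified illustration with a hypothetical function name and abstract cycle counts, not the collaborative context collection technique itself.

```python
def simt_cost(predicates, then_cost, else_cost):
    """Issue cycles for one if/else in a warp under a lockstep SIMT model.

    predicates: per-thread branch outcomes (True = then-path) for one warp.
    """
    takes_then = any(predicates)
    takes_else = not all(predicates)
    # The warp serializes every path taken by at least one thread;
    # threads on the other path sit masked off meanwhile.
    return (then_cost if takes_then else 0) + (else_cost if takes_else else 0)
```

A uniform warp pays for one path only: simt_cost([True] * 32, 10, 20) is 10. With alternating predicates it pays for both (30), even though each thread needs just one of them.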

Filter options

Keywords:
KERNEL
GPU

Content availability

Available (431)
Not available (2)

Keywords

GRAPHICS PROCESSING UNITS (213)
INSTRUCTION SETS (204)
GRAPHICS PROCESSING UNIT (180)
CUDA (142)
COPROCESSORS (86)
COMPUTER ARCHITECTURE (83)
PARALLEL PROCESSING (82)
COMPUTER GRAPHIC EQUIPMENT (70)
COMPUTATIONAL MODELING (69)
HARDWARE (57)
OPTIMIZATION (56)
OPENCL (55)
PROGRAMMING (51)
ARRAYS (50)
ALGORITHM DESIGN AND ANALYSIS (49)
MEMORY MANAGEMENT (42)
ACCELERATION (41)
REGISTERS (31)
PERFORMANCE EVALUATION (30)
SPARSE MATRICES (27)
YARN (26)
PARALLEL COMPUTING (25)
PIXEL (25)
VECTORS (25)
GPGPU (24)
MATHEMATICAL MODEL (24)
BANDWIDTH (23)
COMPUTER GRAPHICS (22)
LIBRARIES (22)
THROUGHPUT (21)
COMPUTE UNIFIED DEVICE ARCHITECTURE (20)
BENCHMARK TESTING (19)
RUNTIME (19)
GRAPHICS (18)
PARALLEL ALGORITHMS (18)
CPU (17)
CENTRAL PROCESSING UNIT (16)
FIELD PROGRAMMABLE GATE ARRAYS (16)
PARALLEL (16)
EQUATIONS (15)
FPGA (15)
IMAGE PROCESSING (15)
INDEXES (15)
FEATURE EXTRACTION (13)
PARALLEL PROGRAMMING (13)
PERFORMANCE (13)
TRAINING (13)
OPENMP (12)
PARALLEL ARCHITECTURES (12)
CONVOLUTION (11)
HIGH PERFORMANCE COMPUTING (11)
SUPPORT VECTOR MACHINES (11)
CONTEXT (10)
GRAPHIC PROCESSING UNIT (10)
MULTICORE PROCESSING (10)
RANDOM ACCESS MEMORY (10)
RENDERING (COMPUTER GRAPHICS) (10)
IMAGE RECONSTRUCTION (9)
JACOBIAN MATRICES (9)
MATRIX MULTIPLICATION (9)
REAL-TIME SYSTEMS (9)
RESOURCE MANAGEMENT (9)
THREE DIMENSIONAL DISPLAYS (9)
VIDEO CODING (9)
ANALYTICAL MODELS (8)
CONFERENCES (8)
DATA MINING (8)
DATA STRUCTURES (8)
DATABASES (8)
ENCODING (8)
ENERGY EFFICIENCY (8)
LINEAR ALGEBRA (8)
MOTION ESTIMATION (8)
MULTIPROCESSING SYSTEMS (8)
NVIDIA (8)
PARALLEL ALGORITHM (8)
PROGRAM PROCESSORS (8)
SPMV (8)
SYNCHRONIZATION (8)
TILES (8)
TUNING (8)
ACCURACY (7)
APPROXIMATION ALGORITHMS (7)
COMPUTER VISION (7)
DECODING (7)
EDUCATIONAL INSTITUTIONS (7)
HIGH DEFINITION VIDEO (7)
HISTOGRAMS (7)
IMAGE COLOR ANALYSIS (7)
IMAGE SEGMENTATION (7)
ITERATIVE METHODS (7)
MPI (7)
OPTIMISATION (7)
PARTITIONING ALGORITHMS (7)
PIPELINES (7)
RADIATION DETECTORS (7)
SHAPE (7)
SIMD (7)

INFONA - scientific communication portal