In this paper, an implementation of the adaptive cross approximation (ACA) on a GPU platform is presented, comprising two parts: matrix compression using the ACA, and batched matrix-vector products in H-matrix form. Numerical examples are provided to demonstrate the overall performance of the proposed GPU implementation of the ACA algorithm through comparison with a 4-threaded CPU algorithm. In these...
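As background for the abstract above: cross approximation builds a low-rank factorization from a few matrix rows and columns chosen greedily by pivot magnitude. A minimal CPU sketch in Python/NumPy, fully pivoted for clarity (production ACA uses partial pivoting so the full matrix and residual never need to be formed; the function name and tolerances here are our own):

```python
import numpy as np

def aca_compress(A, tol=1e-8, max_rank=50):
    """Greedy cross approximation: returns U, V with A ~= U @ V."""
    m, n = A.shape
    R = A.astype(float).copy()   # explicit residual, for clarity only
    U_cols, V_rows = [], []
    for _ in range(max_rank):
        i, j = np.unravel_index(np.argmax(np.abs(R)), R.shape)
        pivot = R[i, j]
        if abs(pivot) < tol:     # residual small enough: stop
            break
        u = R[:, j].copy()
        v = R[i, :] / pivot
        U_cols.append(u)
        V_rows.append(v)
        R = R - np.outer(u, v)   # rank-1 update of the residual
    return np.array(U_cols).T, np.array(V_rows)
```

On a smooth kernel block (the typical H-matrix setting), the rank found this way is far below min(m, n), which is what makes the compressed matrix-vector products cheap.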
In this paper, we propose a new fine-grained clustering bias-field estimation and segmentation algorithm on a Single Instruction Multiple Data (SIMD) architecture (GPU). The goal is to accelerate the compute-intensive portions of the sequential version. We have implemented this parallel algorithm using the Compute Unified Device Architecture (CUDA) on different NVIDIA GPU cards. The numerical results in terms...
During the past decade, Graphics Processing Units (GPUs) have been increasingly employed to speed up compute-intensive scientific applications. In this field, the geometric multigrid method (GMG) is one of the most efficient algorithms for solving large sparse linear systems of equations. Herein we analyze the performance of an optimized GPU-based implementation of the GMG method on different state-of-the-art...
In many-core parallel computing, how to optimally allocate and schedule computing-core resources according to the characteristics of parallel applications is a typical and fundamental problem that directly affects computing performance. After analyzing the features and mechanisms of the Kepler CUDA architecture, three heterogeneous streaming parallel computing modes and their corresponding constraints,...
On modern GPU clusters, the role of the CPUs is often restricted to controlling the GPUs and handling MPI communication. The unused computing power of the CPUs, however, can be considerable for computations whose performance is bounded by memory traffic. This paper investigates the challenges of simultaneous usage of CPUs and GPUs for computation. Our emphasis is on deriving a heterogeneous CPU+GPU...
This paper presents a CUDA-based real-time solution for true-color synthesis of remote-sensing images. The solution reduces total execution time in three ways: (1) use pinned memory to reduce data-transfer time between host memory and GPU global memory; (2) use a look-up table to reduce computing time; (3) overlap kernel executions and data transfers, besides registering pages and executing kernels in parallel...
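The look-up-table idea in item (2) is easy to illustrate on the CPU: evaluate an expensive per-pixel mapping once for each of the 256 possible 8-bit values, then replace the arithmetic with an index. A hedged Python/NumPy sketch (the gamma stretch below is a made-up stand-in for the paper's actual color mapping):

```python
import numpy as np

def build_lut(gamma=0.8):
    # Evaluate the (hypothetical) radiometric stretch once for every
    # possible 8-bit input value: 256 pow() calls instead of one per pixel.
    x = np.arange(256, dtype=np.float64) / 255.0
    return np.clip(np.rint(255.0 * x ** gamma), 0, 255).astype(np.uint8)

def apply_lut(band, lut):
    # Per-pixel math becomes a table lookup (fancy indexing).
    return lut[band]
```

The same trick maps directly to a GPU kernel: the table fits in fast constant or shared memory, so each thread does one fetch instead of a transcendental evaluation.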
The Indonesia Colorectal Cancer Consortium (IC3), the first cancer biobank repository in Indonesia, is faced with computational challenges in analyzing large quantities of genetic and phenotypic data. To overcome this challenge, we explore and compare the performance of two parallel computing platforms that use central and graphics processing units. We present the design and implementation of a genome-wide...
We propose a new method derived from DACCER (Distributed Assessment of the Closeness CEntrality Ranking): the modified DACCER (MDACCER), for assessing the traditional closeness centrality ranking. MDACCER introduces a relaxation that allows it to take advantage of massively parallel environments such as General-Purpose Graphics Processing Units (GPGPUs). The original DACCER proposal assesses closeness centrality...
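The key idea behind DACCER-style methods is that each vertex can score itself using only its h-hop neighborhood, which is what makes the computation embarrassingly parallel (one GPU thread per vertex). A rough Python sketch of one such local measure, the degree sum of the h-hop ball; the exact metric and relaxation in the paper may differ, and all names here are our own:

```python
from collections import deque

def hop_volume(adj, v, h):
    # Sum of the degrees of all vertices within distance h of v,
    # found by a depth-limited BFS. Ranking vertices by this local
    # volume approximates the global closeness-centrality ranking.
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        if dist[u] < h:
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
    return sum(len(adj[u]) for u in dist)
```

On a path graph 0-1-2-3-4 with h = 2, the middle vertex gets the largest volume, matching its top closeness rank, even though no vertex ever sees the whole graph.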
The OpenACC standard has been developed to simplify parallel programming of heterogeneous systems. Based on a set of high-level compiler directives, it allows application developers to offload code regions from a host CPU to an accelerator without the need for low-level programming with CUDA or OpenCL. Details are implicit in the programming model and managed by OpenACC API-enabled compilers and...
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is widely available to achieve high performance in desktop, notebook, and even mobile computer systems. While multicore technology has become the norm in modern computers, programming such systems requires an understanding of the underlying hardware architecture and hence poses a great challenge for...
How can GPU acceleration be obtained as a service in a cluster? This question has become increasingly significant due to the inefficiency of installing GPUs on all nodes of a cluster. The research reported in this paper is motivated to address the above question by employing rCUDA (remote CUDA), a framework that facilitates Acceleration-as-a-Service (AaaS), such that the nodes of a cluster can request...
Enhancement algorithms can give low-light images a clear visual appearance similar to images captured during the daytime, but because of their high complexity and heavy computational cost, low-light image enhancement algorithms usually struggle to meet real-time requirements, which limits their use in practical applications. For this situation, a parallel optimization algorithm...
We present a new parallel approach to the Needleman-Wunsch algorithm for global sequence alignment. This approach uses a skewing transformation for traversal and calculation of the dynamic-programming matrix. We compare the execution time of a sequential CPU-based implementation with two parallel GPU-based implementations: single-kernel invocation with lock-free block synchronization, and multi-kernel invocation...
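The skewing transformation mentioned above reindexes the DP matrix by anti-diagonals: every cell with the same i + j depends only on the two previous anti-diagonals, so all cells on one diagonal are independent and could each be handled by one GPU thread. A sequential Python sketch of that traversal order (the scoring parameters are arbitrary, not the paper's):

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    # Needleman-Wunsch filled along anti-diagonals i + j = d.
    # Within one d, every (i, j) reads only diagonals d-1 and d-2,
    # so the inner loop is the parallelizable wavefront.
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        H[i][0] = i * gap
    for j in range(1, n + 1):
        H[0][j] = j * gap
    for d in range(2, m + n + 1):
        for i in range(max(1, d - n), min(m, d - 1) + 1):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
    return H[m][n]
```

The result is identical to the usual row-by-row fill; only the visiting order changes, which is exactly what the skewing transformation buys.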
Intelligent GPU cache bypassing can improve the efficiency of using GPU memory bandwidth, which can benefit GPU performance. In this paper, we study a pure hardware-based GPU cache bypassing method that can be applied to GPU applications without having to recompile the programs. Moreover, we introduce a hybrid method that can exploit profiling information to further enhance the hardware-based bypassing...
In this paper we investigate static memory access predictability in GPGPU workloads, at the thread block granularity. We first show that a significant share of accessed memory addresses can be predicted using thread block identifiers. We build on this observation and introduce a hardware-software prefetching scheme to reduce average memory access time. Our proposed scheme issues the memory requests...
Intuitionistic fuzzy edge detection has been used for the signification or characterization of images. The algorithm was designed by experts and aims to minimize errors; however, it uses a fixed threshold value. In this paper, a hybrid algorithm is developed using the Otsu method, which calculates a threshold value depending on the image. To be applicable...
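For context, Otsu's method picks the threshold that maximizes the between-class variance of the gray-level histogram, which is why it adapts to each image where a fixed threshold cannot. A plain Python/NumPy sketch of the criterion (a straightforward textbook version, not the paper's GPU code):

```python
import numpy as np

def otsu_threshold(gray):
    # Score every candidate threshold t by the between-class variance
    # w0*w1*(mu0 - mu1)^2 of the histogram split, and keep the best.
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()
        if w0 == 0.0 or w1 == 0.0:
            continue
        mu0 = (levels[:t] * p[:t]).sum() / w0
        mu1 = (levels[t:] * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

Because the search runs over only 256 candidates of a shared histogram, the expensive part on large images is the histogram itself, which is also the part that parallelizes well.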
There are many scientific applications, ranging from weather prediction to oil and gas exploration, that require high-performance computing, which helps industries and researchers to further their advancements. With the advent of general-purpose computing on GPUs, many such applications are shifting towards High-Performance Computing (HPC). Agent-based crowd simulation is one of the candidates...
Compute Unified Device Architecture (CUDA) is an architecture and programming model that allows leveraging the compute-intensive processing power of Graphics Processing Units (GPUs) to perform general, non-graphical tasks in a massively parallel manner. Hadoop is an open-source software framework that has its own file system, the Hadoop Distributed File System (HDFS), and its own programming...
Massively parallel computing is applied extensively in various scientific and engineering domains. With the growing interest in many-core architectures, and given the lack of explicit support for inter-block synchronization on GPUs in particular, efficient synchronization techniques are needed to minimize inter-block communication time. In this paper, we propose two new inter-block synchronization techniques:...
Support vector machine (SVM) is a popular classifier for small-scale datasets, with outstanding performance compared to other classifiers. However, the execution time is extremely long when training on big data. The Graphics Processing Unit (GPU) is a massively parallel device that performs very well as a co-processor. NVIDIA proposed a programming platform, CUDA, in 2006, which makes it much...