Search results

Items from 1 to 20 out of 24 results

chapter

A GPU-Friendly Skiplist Algorithm

Nurit Moscovici, Nachshon Cohen, Erez Petrank

2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT) > 246 - 259

We propose a design for a fine-grained lock-based skiplist optimized for Graphics Processing Units (GPUs). While GPUs are often used to accelerate streaming parallel computations, it remains a significant challenge to efficiently offload concurrent computations with more complicated data-irregular access and fine-grained synchronization. Natural building blocks for such computations would be concurrent...

chapter

Optimizations of Two Compute-Bound Scientific Kernels on the SW26010 Many-Core Processor

James Lin, Zhigeng Xu, Akira Nukada, Naoya Maruyama, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 432 - 441

The home-grown SW26010 many-core processor enabled the production of China’s first independently developed number-one ranked supercomputer, the Sunway TaihuLight. Its limited off-chip memory bandwidth, however, renders the SW26010 a highly memory-bound processor. To compensate for this limitation, the processor was designed with a unique hardware feature, "Register Level Communication"...

chapter

Hierarchical Read/Write Analysis for Pointer-Based OpenCL Programs on RRAM

Lin-Ya Yu, Shao-Chung Wang, Jenq-Kuen Lee

2017 46th International Conference on Parallel Processing Workshops (ICPPW) > 45 - 52

Heterogeneous computing platforms, containing a wide range of computing resources from CPUs to specialized hardware accelerators, are the trend today, driven by the physical limits on processor speed and the increasing demand for computing performance. Hence many optimization strategies have been studied to achieve higher throughput and lower energy consumption in heterogeneous systems. Various memory...

chapter

Fast kernel fuzzy c-means algorithms based on difference of convex programming

Li Chen, Shuisheng Zhou, Xintao Gao

2016 12th International Conference on Natural Computation and 13th Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) > 1090 - 1095

In this study, we propose three new algorithms based on difference of convex (DC) programming and the DC algorithm (DCA) for the kernel fuzzy c-means (KFCM) clustering model. First, the KFCM model is reformulated into two equivalent DC programs, for which different KFCM algorithms are designed. Then, to further accelerate the second DCA-based KFCM algorithm, we adopt an approximate strategy which...

chapter

GraVF: A vertex-centric distributed graph processing framework on FPGAs

Nina Engelhardt, Hayden Kwok-Hay So

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

FPGAs are promising platforms to efficiently execute distributed graph algorithms. Unfortunately, they are notoriously hard to program, especially when the problem size and system complexity increase. In this paper, we propose GraVF, a high-level design framework for distributed graph processing on FPGAs. It leverages the vertex-centric paradigm, which is naturally distributed and requires the user...

chapter

A pipeline-based runtime technique for improving Ray-Tracing on HSA-compliant systems

Chih-Chen Kao, Yu-Tsung Miao, Wei-Chung Hsu

2016 IEEE International Conference on Multimedia and Expo (ICME) > 1 - 6

The prevalence of real-time multimedia delivery appliances has led to the development of a variety of efficient architectures and supporting software technologies. In particular, ray tracing, a well-known physically based rendering algorithm, has received great attention in research and development. Unfortunately, the ray-tracing algorithm, being an irregular application, suffers from the...

chapter

JolokiaC++: Optimizing Irregular Accesses for GPGPU

Vibha Patel, Sanjeev Aggarwal, Amey Karkare

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS) > 583 - 590

We present JolokiaC++, a compiler framework to ease the coding of irregular data applications on GPUs. The effectiveness of the JolokiaC++ compiler and runtime system is tested using three kernels, IRREG, MOLDYN, and NBF, executed on NVIDIA GPUs. We developed extensions for the generic parallel constructs that allow portable and efficient programming of codes with irregular accesses on the GPU. We present...

chapter

PGX.D: a fast distributed graph processing engine

Sungpack Hong, Siegfried Depner, Thomas Manhardt, Jan Van Der Lugt, et al.

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

Graph analysis is a powerful method in data analysis. Although several frameworks have been proposed for processing large graph instances in distributed environments, their performance is much lower than that of efficient single-machine implementations given enough memory. In this paper, we present a fast distributed graph processing system, namely PGX.D. We show that PGX.D outperforms other...

chapter

High level programming framework for FPGAs in the data center

Oren Segal, Martin Margala, Sai Rahul Chalamalasetti, Mitch Wright

2014 24th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Heterogeneous computing offers a promising solution for energy-efficient computing in the data center. FPGA-based heterogeneous computing is an especially promising direction since it allows for the creation of custom hardware solutions for data-centric parallel applications. One of the main issues delaying widespread adoption of FPGAs as mainstream high-performance computing devices is the difficulty...

chapter

Optimizing Collective Communication in UPC

Jithin Jose, Khaled Hamidouche, Jie Zhang, Akshay Venkatesh, et al.

2014 IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW) > 361 - 370

Message Passing Interface (MPI) has been the de facto programming model for scientific parallel applications. However, data-driven applications with irregular communication patterns are harder to implement using MPI. Partitioned Global Address Space (PGAS) programming models present an alternative approach to improve programmability. PGAS languages like UPC are growing in popularity because of...

chapter

Breaking through memory limitation in GPU parallel processing using Strassen Algorithm

Pujianto Yugopuspito, Sutrisno, Robertus Hudi

2013 International Conference on Computer, Control, Informatics and Its Applications (IC3INA) > 201 - 205

Matrix multiplication is one of the basic linear-algebra operations most widely used in computer science. For ages it has been computed with the naive algorithm, which has a standard complexity of O(n³). Much research has sought a more efficient and effective algorithm for this operation, and Strassen eventually found one that overcomes the naive algorithm's complexity with only...

chapter

Interior-point method for second-order cone programming based on a simple kernel function

Li Dong, Jingyong Tang

2010 Second International Conference on Computational Intelligence and Natural Computing (CINC) > Vol. 1 > 85 - 88

Interior-point methods are not only the most effective methods in practice but also have polynomial-time complexity. In this paper we present a primal-dual interior-point algorithm for second-order cone programming problems based on a simple kernel function. We derive the iteration bounds O(n log(n/ε)) and O(√n log(n/ε)) for large- and small-update methods, respectively, which are as good as those in the...

chapter

A simplification on SMO algorithm and its application in solving ε-SVR with non-positive Kernels

XiaoJian Zhou, YiZhong Ma, ZiQiang Cheng, LiPing Liu, et al.

The 2010 IEEE International Conference on Information and Automation (ICIA 2010) > 878 - 883

The Sequential Minimal Optimization (SMO) algorithm is very effective for solving large-scale support vector machine (SVM) problems. The existing algorithms need to determine which quadrant the four Lagrange multipliers lie in, which complicates their implementation. In addition, the existing algorithms all assume that the kernel functions are positive definite or positive semidefinite, limiting their applications. Having...

chapter

AES Encryption Algorithm Based on the High Performance Computing of GPU

Fei Shao, Zinan Chang, Yi Zhang

2010 Second International Conference on Communication Software and Networks (ICCSN 2010) > 588 - 590

The encryption time of the traditional AES algorithm is too long to meet the need for fast encryption. For this reason, the high-performance computing capability of the Graphics Processing Unit (GPU) has become a hot research topic. This paper proposes improving the AES algorithm by exploiting the GPU's high-performance computing capability and compares the result with a CPU implementation. The AES encryption algorithm based on high...

chapter

High throughput multiple-precision GCD on the CUDA architecture

N. Fujimoto

2009 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2009) > 507 - 512

Investigation of the cryptanalytic strength of RSA cryptography requires computing many GCDs of two long integers (e.g., of length 1024 bits). This paper presents a high throughput parallel algorithm to perform many GCD computations concurrently on a GPU based on the CUDA architecture. The experiments with an NVIDIA GeForce GTX285 GPU and a single core of 3.0 GHz Intel Core2 Duo E6850 CPU show that...

chapter

High Throughput Implementation of MD5 Algorithm on GPU

Guang Hu, Jianhua Ma, Benxiong Huang

2009 4th International Conference on Ubiquitous Information Technologies & Applications (ICUT 2009) > 1 - 5

The graphics processing unit (GPU) has evolved into a highly parallel, multithreaded, many-core processor with tremendous computational capability. The introduction of the Compute Unified Device Architecture (CUDA) simplifies software development on the GPU and allows direct access to GPU resources. It is an effective way to improve hashing performance in high-speed network and storage systems by using...

chapter

Fast Disk Encryption through GPGPU Acceleration

G. Agosta, A. Barenghi, F. De Santis, A. Di Biagio, et al.

2009 International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2009) > 102 - 109

We present the design and performance analysis of a GPU-optimized implementation of a disk encryption application employing the XTS mode of operation applied together with the Twofish algorithm within the well-known TrueCrypt suite. We show how to correctly tune the design parameters, including data allocation, thread packing, and parallelization strategy. Overall, our implementation of TrueCrypt...

chapter

The accelerating implementation of BLAST with stream processor

Gang Wei, Chao Ma, Songwen Pei, Baifeng Wu

2009 IEEE 10th International Conference on Computer-Aided Industrial Design & Conceptual Design: E-Business, Creative Design, Manufacturing (CAID&CD 2009) > 2245 - 2250

Sequence alignment is one of the most fundamental and important operations in bioinformatics. Through sequence alignment, we can find information about a sequence's function, structure, and evolution. BLAST is one of the most popular algorithms in the field of sequence alignment. In this paper, we have designed a GPU-based parallel BLAST algorithm and implemented it on the Brook+ platform. The main task...

chapter

An efficient implementation of Smith Waterman algorithm on GPU using CUDA, for massively parallel scanning of sequence databases

L. Ligowski, W. Rudnicki

2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS) > 1 - 8

The Smith-Waterman algorithm for sequence alignment is one of the main tools of bioinformatics. It is used for sequence similarity searches and alignment of similar sequences. High-end graphics processing units (GPUs), used for processing graphics on desktop computers, deliver computational capabilities exceeding those of CPUs by an order of magnitude. Recently these capabilities became accessible...

chapter

GPU-based parallel particle swarm optimization

You Zhou, Ying Tan

2009 IEEE Congress on Evolutionary Computation (CEC 2009) > 1493 - 1500

A novel parallel approach to running standard particle swarm optimization (SPSO) on a Graphics Processing Unit (GPU) is presented in this paper. By using the general-purpose computing ability of the GPU and NVIDIA's Compute Unified Device Architecture (CUDA) software platform, SPSO can be executed in parallel on the GPU. Experiments are conducted by running SPSO both on GPU and CPU, respectively,...

Data set: ieee
Keywords: ALGORITHM DESIGN AND ANALYSIS, KERNEL, PROGRAMMING

Keywords

GPU (7)
COMPUTATIONAL MODELING (6)
OPTIMIZATION (6)
COMPUTER ARCHITECTURE (5)
COMPUTER GRAPHIC EQUIPMENT (5)
COMPUTER GRAPHICS (4)
CRYPTOGRAPHY (4)
GPGPU (4)
GRAPHICS PROCESSING UNITS (4)
HARDWARE (4)
YARN (4)
ACCELERATION (3)
ARRAYS (3)
COMPUTE UNIFIED DEVICE ARCHITECTURE (3)
GRAPHIC PROCESSING UNIT (3)
HIGH PERFORMANCE COMPUTING (3)
OPTIMISATION (3)
PARALLEL PROCESSING (3)
RUNTIME (3)
BIOINFORMATICS (2)
CLUSTERING ALGORITHMS (2)
COMPLEXITY THEORY (2)
CONVERGENCE (2)
COPROCESSORS (2)
CUDA (2)
DATABASES (2)
EQUATIONS (2)
FIELD PROGRAMMABLE GATE ARRAYS (2)
GRAPHICS PROCESSING UNIT (2)
INSTRUCTION SETS (2)
LINEAR PROGRAMMING (2)
MICROPROCESSOR CHIPS (2)
NVIDIA GPU (2)
OPENCL (2)
PROGRAM PROCESSORS (2)
REGISTERS (2)
SEQUENCE ALIGNMENT (2)
SUPPORT VECTOR MACHINES (2)
THROUGHPUT (2)
TRAINING (2)
ε-SVR (1)
ACCURACY (1)
ADVANCED ENCRYPTION STANDARD ENCRYPTION ALGORITHM (1)
AES ALGORITHM (1)
AMD HD4850 (1)
APARAPI (1)
APPROXIMATION ALGORITHMS (1)
ATI STREAM COMPUTING ENVIRONMENT (1)
AVAILABILITY (1)
BANDWIDTH (1)
BENCHMARK TEST FUNCTIONS (1)
BENCHMARK TESTING (1)
CAD (1)
CAD ALGORITHMS (1)
CAPACITANCE (1)
CLASSIFICATION ALGORITHMS (1)
COMPILER (1)
COMPUTATIONAL COMPLEXITY (1)
COMPUTATIONAL ELECTROMAGNETICS (1)
CONDUCTORS (1)
CONSTRUCTION INDUSTRY (1)
CRYPTANALYTIC STRENGTH (1)
CUDA ARCHITECTURE (1)
CUDA PROGRAMMING (1)
DATA ALLOCATION (1)
DATA MINING (1)
DATA MODELS (1)
DATA STRUCTURES (1)
DESIGN AUTOMATION (1)
DESIGN OPTIMIZATION (1)
DESKTOP COMPUTERS (1)
DEVICE MULTITHREAD PROGRAMMING MODELS (1)
DGEMM (1)
DISC STORAGE (1)
ELECTROMAGNETICS (1)
ELECTRONICS PACKAGING (1)
ENCRYPTION (1)
FAST DISK ENCRYPTION (1)
FEATURE EXTRACTION (1)
FLOATING POINT PERFORMANCE (1)
FOUR CORE CPU (1)
FPGA (1)
FRAMEWORK (1)
GENERALIZED MINIMAL RESIDUAL ALGORITHM (1)
GPGPU ACCELERATION (1)
GPU ALGORITHM (1)
GPU OPTIMIZED IMPLEMENTATION (1)
GPU PROCESSING (1)
GPU-BASED PARALLEL BLAST ALGORITHM (1)
GRAPHICAL PROCESSING UNIT (1)
GRAPHICS (1)
GREATEST COMMON DIVISOR (1)
HAAR TRANSFORMS (1)
HAAR-LIKE FEATURE BASED SYSTEMS (1)
HASH ENCRYPTION ALGORITHM (1)
HETEROGENEOUS SYSTEMS (1)
HIERARCHICAL READ/WRITE ANALYSIS (1)

INFONA - science communication portal
