In the recent literature, drug design relying on molecular docking (MD) techniques has become a very promising field. Most of these techniques model the way ligands interact with a protein target using only one binding site; moreover, they ignore the fact that different ligands can interact with unconnected parts of the target. However, by taking the latter fact into consideration, the computational...
String pattern matching with finite automata (FAs) is a well-established method across many areas of computer science. Until now, data dependencies inherent in the pattern matching algorithm have hampered effective parallelization. To overcome the dependency constraint between subsequent matching steps, simultaneous deterministic finite automata (SFAs) have recently been introduced. Although an SFA...
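The "simultaneous" idea can be illustrated with a minimal Python sketch (not the paper's implementation — all names here are illustrative): each text chunk is matched from *every* possible automaton state, which removes the dependency between chunks; the resulting per-chunk state tables are then composed in one cheap sequential pass.

```python
def build_dfa(pattern, alphabet):
    # KMP-style DFA: state s = length of the longest pattern prefix that is a
    # suffix of the text read so far; reaching state len(pattern) is a match
    m = len(pattern)
    dfa = []
    for s in range(m + 1):
        row = {}
        for ch in alphabet:
            cur = pattern[:s] + ch
            k = min(len(cur), m)
            while k > 0 and cur[-k:] != pattern[:k]:
                k -= 1
            row[ch] = k
        dfa.append(row)
    return dfa

def match_count(dfa, text, m, start_state=0):
    # sequential matching: inherently dependent from step to step
    s, cnt = start_state, 0
    for ch in text:
        s = dfa[s][ch]
        if s == m:
            cnt += 1
    return s, cnt

def parallel_match(dfa, text, m, chunks=4):
    n = len(text)
    bounds = [(i * n // chunks, (i + 1) * n // chunks) for i in range(chunks)]
    # per-chunk tables are mutually independent -> could run on separate cores
    tables = [[match_count(dfa, text[lo:hi], m, s) for s in range(m + 1)]
              for lo, hi in bounds]
    # cheap sequential composition of the chunk results
    s, total = 0, 0
    for tab in tables:
        s, cnt = tab[s]
        total += cnt
    return total
```

The parallel phase does (m + 1) times the work of the sequential scan, which is the usual price an SFA-style scheme pays for breaking the dependency chain.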
Sparse general matrix-matrix multiplication (SpGEMM) is one of the key kernels of preconditioners such as the algebraic multigrid method, and of graph algorithms. However, the performance of SpGEMM is quite low on modern processors due to random memory accesses to both the input and output matrices. In addition to the number and pattern of non-zero elements in the output matrix, which are important for achieving locality,...
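For reference, a minimal row-wise SpGEMM in the style of Gustavson's algorithm (a common baseline formulation, not necessarily the one used in this work) makes the random-access problem visible: the accumulator for each output row is scattered into by column index.

```python
def spgemm(A, B):
    """Row-wise SpGEMM: A and B are sparse matrices stored as
    dict-of-dicts, row -> {col: value}. Returns C = A @ B in the same format."""
    C = {}
    for i, a_row in A.items():
        acc = {}  # accumulator for row i of C; scattered (random) writes
        for k, a_ik in a_row.items():
            for j, b_kj in B.get(k, {}).items():  # random reads into B's rows
                acc[j] = acc.get(j, 0.0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C
```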
Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting specific GPU models or performing code transformations. Although empirical...
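The empirical approach hinted at here can be sketched generically: time each candidate launch configuration and keep the fastest. The Python below is an illustrative stand-in (the function and parameter names are assumptions, not this paper's API); in a real CUDA setting `run_kernel` would launch the kernel with a given block size.

```python
import time

def _timed(fn, cfg):
    t0 = time.perf_counter()
    fn(cfg)
    return time.perf_counter() - t0

def autotune(run_kernel, candidates, repeats=3):
    """Empirically pick the best-performing configuration.
    run_kernel(cfg) executes the kernel once with configuration cfg
    (e.g. a block size); the fastest of `repeats` runs is kept per
    candidate to reduce timing noise."""
    best_cfg, best_t = None, float("inf")
    for cfg in candidates:
        t = min(_timed(run_kernel, cfg) for _ in range(repeats))
        if t < best_t:
            best_cfg, best_t = cfg, t
    return best_cfg, best_t
```

Taking the minimum over repeats, rather than the mean, is the usual choice for wall-clock tuning because external noise only ever adds time.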
The conventional OpenCL 1.x style CPU-GPU heterogeneous computing paradigm treats the CPU and GPU as loosely connected, separate entities. At best, each executes independent tasks; more commonly, the CPU idles while waiting for results from the GPU. No data sharing or communication is allowed during kernel execution. This model limits the number of applications that can harness the...
Modern GPUs embrace on-chip cache memory to exploit the locality present in applications. However, the behavior and effect of the cache on GPUs are different from those on conventional processors due to the Single Instruction Multiple Thread (SIMT) thread execution model and resulting memory access patterns. Previous studies report that caching data can hurt the performance due to increased memory...
We describe a strategy for code modernisation of Gadget, a widely used community code for computational astrophysics. The focus of this work is on node-level performance optimisation, targeting current multi/many-core Intel® architectures. We identify and isolate a sample code kernel, which is representative of a typical Smoothed Particle Hydrodynamics (SPH) algorithm. The code modifications include...
The coevolutionary particle swarm optimization (CPSO) algorithm has been widely investigated and applied in the real world. When tackling large-scale and complex real-time optimization problems, the running time of the CPSO algorithm is a barrier. In this paper, the Graphics Processing Unit (GPU) is introduced to provide speedup in order to meet real-time requirements. The CPSO algorithm has been implemented...
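To see why PSO maps well onto a GPU, consider the core update loop of a plain (non-coevolutionary) PSO — a simplified sketch, not this paper's implementation: every particle's velocity/position update is independent of the others within an iteration, so each can be assigned to its own GPU thread.

```python
import random

def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over R^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        # this loop over particles is the embarrassingly parallel part:
        # on a GPU, one thread per particle
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
                if v < gbest_val:
                    gbest, gbest_val = pos[i][:], v
    return gbest, gbest_val
```

The coevolutionary variant adds interacting sub-swarms on top of this loop; the per-particle parallelism it exposes is the same.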
Future high-performance computing systems will need to include multiple specialized accelerators in a single heterogeneous system to overcome power-density limitations of CPU performance.
This paper presents the design and implementation of a hardwired OS kernel circuitry inside a Java application processor to provide the system services that are traditionally implemented in software. The hardwired system functions in the proposed SoC include the thread manager, the memory manager, and the I/O subsystem interface. There are many advantages to making the OS kernel a hardware component,...
Data analytics is undergoing a revolution in many scientific domains, and demands cost-effective parallel data analysis techniques. Traditional Java-based Big Data processing tools like Hadoop MapReduce are designed for commodity CPUs. In contrast, emerging manycore processors like the Xeon Phi have an order of magnitude greater computation power and memory bandwidth. To harness their computing capabilities,...
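The MapReduce model mentioned here reduces to three phases — map, shuffle, reduce — of which map and reduce are data-parallel and so are the phases a manycore processor like the Xeon Phi can spread across its hardware threads. A minimal word-count sketch (illustrative only, not Hadoop's API):

```python
from collections import defaultdict

def map_phase(chunks):
    # each mapper emits (word, 1) pairs; mappers are independent and
    # could run one per core / hardware thread
    return [[(w, 1) for w in chunk.split()] for chunk in chunks]

def shuffle(mapped):
    # group all values by key across mappers
    groups = defaultdict(list)
    for pairs in mapped:
        for k, v in pairs:
            groups[k].append(v)
    return groups

def reduce_phase(groups):
    # reducers for different keys are likewise independent
    return {k: sum(vs) for k, vs in groups.items()}
```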
With the continuously increasing power of computation, especially in the area of parallel computing, computer-based texture analysis, computer-assisted classification methods, automated pathology detection, etc. are more and more commonly performed on medical images, such as X-ray and Magnetic Resonance (MR) images, for clinical or scientific purposes. These procedures almost always include a stage of...
A recent research trend aims to increase the monitoring and control capabilities of low voltage (LV) networks. This paper describes a probabilistic forecasting methodology based on kernel density estimation that makes use of distributed computing techniques to create a highly scalable forecasting system for LV networks. The results show that the proposed algorithm outperforms three benchmark models...
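Kernel density estimation, the core of the methodology, turns a set of observed values into a full predictive density rather than a point forecast. A minimal Gaussian-kernel version (a textbook sketch with an assumed fixed bandwidth, not the paper's tuned estimator):

```python
import math

def kde_pdf(samples, x, bandwidth):
    """Gaussian kernel density estimate of the pdf at point x:
    the average of Gaussian bumps of width `bandwidth` centred on the samples."""
    n = len(samples)
    norm = n * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
               for s in samples) / norm
```

Evaluating `kde_pdf` over a grid of load values yields the probabilistic forecast; since each evaluation point (and each network node) is independent, the workload distributes naturally, which is what makes the distributed-computing angle attractive.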
Brain storm optimization (BSO) is a newly emerging family of swarm intelligence techniques inspired by the human creative problem-solving process, and it has achieved success in many applications. BSO is characterized by its unique process of grouping a population of ideas and carrying out brainstorming based on the grouped ideas to search for optima generation by generation. Although the original...
The Dirty Copy-On-Write (COW) vulnerability, discovered by Phil Oester in October 2016, is a serious vulnerability that allows an unprivileged user to escalate privileges and gain full control of devices (computers, mobile smartphones, and gaming devices that run Linux-based operating systems). This means that any user who exploits this bug can escalate his/her privileges and do anything, either locally or remotely...
Signal processing is of central importance in biomedical systems, in which pre-processing steps are unavoidable in order to reduce noise, remove unwanted artefacts, segment time series into smaller epochs, or extract statistical and other descriptive features that can be used in consecutive classification stages. The high sampling rates and electrode counts used e.g. in advanced EEG or body-surface...
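The pre-processing steps named in the abstract — segmenting a time series into epochs and extracting descriptive features per epoch — can be sketched compactly (illustrative helper names, not a specific toolkit's API); note that each epoch is processed independently, which is exactly the parallelism high-channel-count EEG pipelines exploit.

```python
def segment(signal, epoch_len, step):
    """Split a 1-D signal into (possibly overlapping) epochs of epoch_len
    samples, advancing by `step` samples between epochs."""
    return [signal[i:i + epoch_len]
            for i in range(0, len(signal) - epoch_len + 1, step)]

def features(epoch):
    """A few simple descriptive features of one epoch."""
    n = len(epoch)
    mean = sum(epoch) / n
    var = sum((x - mean) ** 2 for x in epoch) / n
    return {"mean": mean, "var": var, "ptp": max(epoch) - min(epoch)}
```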
The power consumed by the memory system in GPUs is a significant fraction of the total chip power. As thread level parallelism increases, GPUs are likely to stress cache and memory bandwidth even more, thereby exacerbating power consumption. We observe that neighboring concurrent thread arrays (CTAs) within GPU applications share a considerable amount of data. However, the default GPU scheduling policy...
Exascale systems are expected to feature hundreds of thousands of compute nodes with hundreds of hardware threads and complex memory hierarchies with a mix of on-package and persistent memory modules. In this context, the Argo project is developing a new operating system for exascale machines. Targeting production workloads using workflows or coupled codes, we improve the Linux kernel on several fronts...
CNNs (Convolutional Neural Networks) have demonstrated superior results in a wide range of applications. However, the time-consuming convolution operations required by CNNs pose great challenges to designers. GPGPUs (General Purpose Graphics Processing Units) have been widely used to exploit the massive parallelism of convolution operations. This paper proposes a software-based loop-unrolling technique...
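To make the loop structure concrete: a direct 2-D convolution has four nested loops, and a common way to expose parallelism is to unroll the two inner (kernel) loops into a flat dot product per output element — the idea behind im2col-style GPU formulations. The sketch below contrasts the two forms (a generic illustration, not this paper's specific technique):

```python
def conv2d_naive(img, ker):
    """Direct 2-D valid convolution (cross-correlation) with 4 nested loops."""
    H, W, kh, kw = len(img), len(img[0]), len(ker), len(ker[0])
    out = [[0.0] * (W - kw + 1) for _ in range(H - kh + 1)]
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            s = 0.0
            for u in range(kh):          # inner kernel loops:
                for v in range(kw):      # candidates for unrolling
                    s += img[i + u][j + v] * ker[u][v]
            out[i][j] = s
    return out

def conv2d_unrolled(img, ker):
    """Same result; the kernel loops are flattened into one dot product,
    so each output element is independent parallel work."""
    H, W, kh, kw = len(img), len(img[0]), len(ker), len(ker[0])
    kvec = [ker[u][v] for u in range(kh) for v in range(kw)]  # unrolled kernel
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            patch = [img[i + u][j + v] for u in range(kh) for v in range(kw)]
            row.append(sum(p * k for p, k in zip(patch, kvec)))
        out.append(row)
    return out
```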
Data movement is increasingly becoming the bottleneck for both performance and energy efficiency in modern computation. Until recently, there was limited freedom for communication optimization on GPUs, as conventional GPUs only provide two methods for inter-thread communication: shared memory or global memory. However, a new warp shuffle instruction has been introduced...
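Warp shuffle lets threads in a warp exchange register values directly, without a round trip through shared or global memory. The canonical use is a warp-wide reduction; below is a Python simulation of that pattern (modelled on CUDA's `__shfl_down_sync` semantics for a 32-lane warp — an illustration, not CUDA code):

```python
WARP = 32  # lanes per warp

def shfl_down(vals, delta):
    # models __shfl_down_sync: lane i reads the value held by lane i + delta;
    # lanes that would read past the end of the warp keep their own value
    return [vals[i + delta] if i + delta < WARP else vals[i]
            for i in range(WARP)]

def warp_reduce_sum(vals):
    """Butterfly reduction: halve the active width each step;
    after log2(32) = 5 steps, lane 0 holds the warp-wide sum."""
    assert len(vals) == WARP
    offset = WARP // 2
    while offset > 0:
        shifted = shfl_down(vals, offset)
        vals = [a + b for a, b in zip(vals, shifted)]
        offset //= 2
    return vals[0]
```

Each step is one register-to-register exchange, which is why shuffle-based reductions move less data than shared-memory ones.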