Search results

chapter

A Parallel Algorithm for Minimum Spanning Tree on GPU

Jucele Franca de Alencar Vasconcellos, Edson Norberto Caceres, Henrique Mongelli, Siang Wun Song

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 67 - 72

Computing a minimum spanning tree (MST) of a graph is a fundamental problem in Graph Theory and arises as a subproblem in many applications. In this paper, we propose a parallel MST algorithm and implement it on a GPU (Graphics Processing Unit). One of the steps of previous parallel MST algorithms is a heavy use of parallel list ranking. Besides the fact that list ranking is present in several parallel...

chapter

Parallel Algorithm for Dynamic Community Detection

Hugo Resende, Alvaro Luiz Fazenda, Marcos Goncalves Quiles

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 55 - 60

Many real systems can be naturally modeled by complex networks. A complex network represents an abstraction of the system regarding its components and their respective interactions. Thus, by scrutinizing the network, interesting properties of the system can be revealed. Among them, the presence of communities, which consists of groups of densely connected nodes, is a significant one. For instance,...

chapter

A Unified Optimization Approach for Sparse Tensor Operations on GPUs

Bangtian Liu, Chengyao Wen, Anand D. Sarwate, Maryam Mehri Dehnavi

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 47 - 57

Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based implementations of sparse tensor operations are rare. The irregular computation patterns and sparsity structures as well as the large memory footprints of sparse tensor operations...

chapter

Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture: Static graph challenge: Subgraph isomorphism

Ketan Date, Keven Feng, Rakesh Nagi, Jinjun Xiong, et al.

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

In this paper, we present collaborative CPU + GPU algorithms for triangle counting and truss decomposition, the two fundamental problems in graph analytics. We describe the implementation details and present experimental evaluation on the IBM Minsky platform. The main contribution of this paper is a thorough benchmarking and comparison of the different memory management schemes offered by CUDA 8 and...

chapter

Simple and Fast Parallel Algorithms for the Voronoi Map and the Euclidean Distance Map, with GPU Implementations

Takumi Honda, Shinnosuke Yamamoto, Hiroaki Honda, Koji Nakano, et al.

2017 46th International Conference on Parallel Processing (ICPP) > 362 - 371

The complete Voronoi map of a binary image with black and white pixels is a matrix of the same size such that each element is the closest black pixel of the corresponding pixel. The complete Voronoi map visualizes the influence region of each black pixel. However, each region may not be connected due to exclave pixels. The connected Voronoi map is a modification of the complete Voronoi map so that...

chapter

Community Detection on the GPU

Md. Naim, Fredrik Manne, Mahantesh Halappanavar, Antonino Tumeo

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 625 - 634

We present and evaluate a new GPU algorithm based on the Louvain method for community detection. Our algorithm is the first for this problem that parallelizes the access to individual edges. In this way we can fine tune the load balance when processing networks with nodes of highly varying degrees. This is achieved by scaling the number of threads assigned to each node according to its degree. Extensive...

chapter

Properties of mathematical number model provided exact computing

Valentin Golodov

2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) > 225 - 228

The paper describes the author's experience in identification of computer arithmetic's implementation principles that allow extending mathematical properties of a number on its computer representation. The range of the representable numbers, bitwise identity of the result on the different computer architectures and in parallel computation are very important in the current cloud and parallel computing...

chapter

A Laboratory Based Course on GPU Programming: Methods, Practices, and Lessons

Jawwad Ahmed Shamsi

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 367 - 374

Technological advancements have necessitated the need for effectively teaching GPU computing. This need has been inspired by the increasing pattern of utilizing parallel computing and by the growing utilization of GPUs for computationally intensive tasks. This paper is motivated to address the above mentioned need. The paper describes a semester-long course on CUDA programming. The course has significant...

chapter

Accelerating the Smith-Waterman Algorithm Using Bitwise Parallel Bulk Computation Technique on GPU

Takahiro Nishimura, Jacir L. Bordim, Yasuaki Ito, Koji Nakano

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 932 - 941

The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. The bulk execution supports fine grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution...

chapter

Parallel acceleration of HEVC decoder based on CPU+GPU heterogeneous platform

Aidi Ma, Chengan Guo

2017 Seventh International Conference on Information Science and Technology (ICIST) > 323 - 330

The High Efficiency Video Coding (HEVC) standard, as the newest generation video coding standard issued in 2013, significantly improves compression performance relative to existing standards in about 50% bit-rate reduction for equal perceptual video quality with the cost of greatly increasing the computation complexity of the encoder/decoder. In order to improve the decoding efficiency, we design...

chapter

Developing Parallel Cracks and Spots Ceramic Defect Detection and Classification Algorithm Using CUDA

Khaled Ragab, Nahed Alsharay

2017 IEEE 13th International Symposium on Autonomous Decentralized System (ISADS) > 255 - 261

Automatic defects inspection and classification in ceramic tile plays a crucial role in ceramic tiles industry. In order to improve the required computing time to detect and classify spot and crack defects in ceramic tiles, this paper proposed parallel algorithms based on the graphical processing unit. The proposed algorithm divides ceramic tile images into non-overlapped partitions, identifies the defected...

chapter

GPU-based parallel algorithm for VPL-approximated EM wave propagation

Saki Matsuo, Masato Gocho, Takahiro Hashimoto, Atsuo Ozaki

2017 11th European Conference on Antennas and Propagation (EUCAP) > 3282 - 3285

The simulation of EM (electromagnetic) wave propagation requires considerable computation time, as it analyzes a large number of propagation paths. To overcome this problem, we propose a GPU (graphics processing unit)-based parallel algorithm for VPL (vertical plane launch)-approximated EM wave propagation. The conventional algorithm computes the gain along propagation paths with irregular memory...

article

A Parallel Nonlocal Means Algorithm for Remote Sensing Image Denoising on an Intel Xeon Phi Platform

Fang Huang, Bo Lan, Jian Tao, Yinjie Chen, et al.

IEEE Access > 2017 > 5 > 8559 - 8567

The nonlocal means (NLM) algorithm is one of the best image denoising algorithms because of its superior capability to retain the texture details of an image and is widely used in remote sensing (RS) image preprocessing. However, the time complexity of the algorithm is very high due to its nonlocality when searching for similar pixels. As a result, the NLM algorithm cannot satisfy the near real-time...

chapter

Optimized GPU implementation for dynamic programming in image data processing

Jing Ke, Tomasz Bednarz, Arcot Sowmya

2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC) > 1 - 7

It is a trend now that computing power through parallelism is provided by multi-core systems or heterogeneous architectures for High Performance Computing (HPC) and scientific computing. Although many algorithms have been proposed and implemented using sequential computing, alternative parallel solutions provide more suitable and high performance solutions to the same problems. In this paper, three...

chapter

PALLADIO: A Parallel Framework for Robust Variable Selection in High-Dimensional Data

Matteo Barbieri, Samuele Fiorini, Federico Tomasi, Annalisa Barla

2016 6th Workshop on Python for High-Performance and Scientific Computing (PyHPC) > 19 - 26

The main goal of supervised data analytics is to model a target phenomenon given a limited amount of samples, each represented by an arbitrarily large number of variables. Especially when the number of variables is much larger than the number of available samples, variable selection is a key step as it allows to identify a possibly reduced subset of relevant variables describing the observed phenomenon...

chapter

Scalable and Modular Online Data Processing for Ultrafast Computed Tomography Using CUDA Pipelines

Tobias Frust, Guido Juckeland, Andre Bieberle

2016 Second Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV) > 7 - 11

For investigations of rapidly moving structures in opaque technical devices ultrafast electron beam X-ray computed tomography (CT) scanners are available at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR). Currently, measurement data must be initially downloaded after each CT scan from the scanner to a data processing machine. Afterwards, cross-sectional images are reconstructed. This limits the application...

article

Parallel Algorithms for Generating Harmonised State Identifiers and Characterising Sets

Robert M. Hierons, Uraz Cengiz Turker

IEEE Transactions on Computers > 2016 > 65 > 11 > 3370 - 3383

Many automated finite state machine (FSM) based test generation algorithms require that a characterising set or a set of harmonised state identifiers is first produced. The only previously published algorithms for partial FSMs were brute-force algorithms with exponential worst case time complexity. This paper presents polynomial time algorithms and also massively parallel implementations of both the...

chapter

A parallel algorithm for determining the communication radius of an automatic light trap based on balltree structure

Giang Nguyen Thi Phuong, Huong Hoang Luong, Tai Huu Pham, Hiep Xuan Huynh

2016 Eighth International Conference on Knowledge and Systems Engineering (KSE) > 139 - 143

Communicating radius of automatic light trap surveillance network characterizes how well an area is monitored or tracked by automatic light traps. Connectivity is an important requirement that shows how nodes in an automatic BPH light trap surveillance network can effectively communicate. In this paper, we propose a new approach to determine the communication radius of an automatic light trap based on...

chapter

Out-of-core GPU accelerated surface reconstruction for large industrial environment monitoring

Francois Miralles, Chen Xu, Denis Laurendeau

2016 4th International Conference on Applied Robotics for the Power Industry (CARPI) > 1 - 6

A parallel implementation of a surface reconstruction algorithm is presented. This algorithm uses the vector field surface representation and was adapted in a previous work by the authors to handle large scale environment reconstruction. Two parallel implementations with different memory requirements and processing speeds are described and compared. These parallel implementations increase the vector...

chapter

Parallelizing shortest path algorithm for time dependent graphs with flow speed model

Mehmet Akif Ersoy, Can Ozturan

2016 IEEE 10th International Conference on Application of Information and Communication Technologies (AICT) > 1 - 7

Various sequential algorithms for the shortest path problem on time dependent graphs are appearing in the literature. However, these algorithms mostly suffer from long running times and huge memory requirements. These problems are making them unsuitable for navigation applications which need to run on real time data with fast response times. For the shortest path problem with time dependent flow speed...

INFONA - science communication portal
