Computing a minimum spanning tree (MST) of a graph is a fundamental problem in Graph Theory and arises as a subproblem in many applications. In this paper, we propose a parallel MST algorithm and implement it on a GPU (Graphics Processing Unit). One of the steps of previous parallel MST algorithms is a heavy use of parallel list ranking. Besides the fact that list ranking is present in several parallel...
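The abstract above is truncated and does not show how list ranking is used or replaced, so the following is only a minimal sketch of the classic pointer-jumping technique for list ranking, simulated sequentially in Python. The function name `list_rank` and the convention that the tail points to itself are assumptions made for illustration; they are not taken from the paper.

```python
def list_rank(succ):
    """Return, for every node, its distance to the tail of a linked list.

    succ[i] is the successor of node i; the tail is assumed to point to itself.
    Each round doubles the span of every pointer, so about log2(n) rounds suffice.
    On a GPU, each list comprehension below would be one data-parallel step.
    """
    n = len(succ)
    nxt = list(succ)
    # The tail has rank 0; every other node starts one hop from its successor.
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = max(1, (n - 1).bit_length())
    for _ in range(rounds):
        # Read from the old arrays, write to new ones (synchronous update).
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
    return rank


# List order 3 -> 1 -> 0 -> 2, where node 2 is the tail.
ranks = list_rank([2, 0, 2, 1])
```

Here node 3 is three hops from the tail, node 1 two hops, node 0 one hop, and the tail itself has rank 0.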
Many real systems can be naturally modeled by complex networks. A complex network represents an abstraction of the system regarding its components and their respective interactions. Thus, by scrutinizing the network, interesting properties of the system can be revealed. Among them, the presence of communities, which consists of groups of densely connected nodes, is a significant one. For instance,...
Sparse tensors appear in many large-scale applications with multidimensional and sparse data. While multidimensional sparse data often need to be processed on manycore processors, attempts to develop highly-optimized GPU-based implementations of sparse tensor operations are rare. The irregular computation patterns and sparsity structures as well as the large memory footprints of sparse tensor operations...
In this paper, we present collaborative CPU + GPU algorithms for triangle counting and truss decomposition, the two fundamental problems in graph analytics. We describe the implementation details and present experimental evaluation on the IBM Minsky platform. The main contribution of this paper is a thorough benchmarking and comparison of the different memory management schemes offered by CUDA 8 and...
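For readers unfamiliar with the first of the two problems, triangle counting can be sketched in a few lines. This is a simple sequential illustration, not the collaborative CPU + GPU algorithm of the paper; the function name `count_triangles` and the edge-list input format are assumptions.

```python
def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Orient every edge from the smaller to the larger endpoint so that
    # each triangle a < b < c is found exactly once, at the edge (a, b).
    fwd = {u: {v for v in nbrs if v > u} for u, nbrs in adj.items()}
    total = 0
    for u, outs in fwd.items():
        for v in outs:
            total += len(outs & fwd[v])
    return total


# The complete graph on 4 nodes contains 4 triangles.
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
triangles = count_triangles(k4)
```

The set intersection per edge is the part that parallel implementations typically distribute across threads, since the edges can be processed independently.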
The complete Voronoi map of a binary image with black and white pixels is a matrix of the same size such that each element is the closest black pixel of the corresponding pixel. The complete Voronoi map visualizes the influence region of each black pixel. However, each region may not be connected due to exclave pixels. The connected Voronoi map is a modification of the complete Voronoi map so that...
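The definition of the complete Voronoi map can be made concrete with a brute-force sketch. The abstract does not state the distance metric or how ties are broken, so Euclidean distance and "first black pixel in row-major order wins" are assumptions here, as is the function name `complete_voronoi_map`. The connected variant is not shown because its definition is cut off above.

```python
def complete_voronoi_map(image):
    """Map every pixel to the coordinates of its closest black pixel.

    image is a list of rows; a truthy value marks a black pixel.
    Assumes squared Euclidean distance; ties go to the first black
    pixel in row-major order.
    """
    blacks = [(r, c)
              for r, row in enumerate(image)
              for c, value in enumerate(row) if value]
    if not blacks:
        raise ValueError("image contains no black pixel")
    return [
        [min(blacks, key=lambda b: (b[0] - r) ** 2 + (b[1] - c) ** 2)
         for c in range(len(row))]
        for r, row in enumerate(image)
    ]


image = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
vmap = complete_voronoi_map(image)
```

This costs time proportional to the number of pixels times the number of black pixels, which is exactly the kind of workload a GPU implementation would aim to speed up, since every output element can be computed independently.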
We present and evaluate a new GPU algorithm based on the Louvain method for community detection. Our algorithm is the first for this problem that parallelizes the access to individual edges. In this way we can fine tune the load balance when processing networks with nodes of highly varying degrees. This is achieved by scaling the number of threads assigned to each node according to its degree. Extensive...
The paper describes the author's experience in identifying implementation principles of computer arithmetic that allow the mathematical properties of a number to be extended to its computer representation. The range of representable numbers and the bitwise identity of results across different computer architectures and in parallel computation are very important in current cloud and parallel computing...
Technological advancements have created a need for effective teaching of GPU computing. This need has been driven by the increasing use of parallel computing and by the growing use of GPUs for computationally intensive tasks. This paper addresses that need. The paper describes a semester-long course on CUDA programming. The course has significant...
The bulk execution of a sequential algorithm is to execute it for many different inputs in turn or at the same time. It is known that the bulk execution of an oblivious sequential algorithm can be implemented to run efficiently on a GPU. The bulk execution supports fine grained bitwise parallelism, allowing it to achieve high acceleration over a straightforward sequential computation. The main contribution...
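One way to picture bulk execution with bitwise parallelism is bit slicing: bit j of every input is packed into one machine word, and a single pass of an oblivious algorithm over the words processes all inputs at once. The sketch below uses a ripple-carry adder as the oblivious algorithm. That choice, and the helper names `pack`, `unpack` and `bulk_add`, are assumptions for illustration only; the truncated abstract does not say which algorithm the paper targets.

```python
def pack(values, width):
    """Bit-slice the inputs: slices[j] holds bit j of every value (lane i = values[i])."""
    return [sum(((v >> j) & 1) << i for i, v in enumerate(values))
            for j in range(width)]


def unpack(slices, count):
    """Inverse of pack: rebuild the value stored in each of the count lanes."""
    return [sum(((s >> i) & 1) << j for j, s in enumerate(slices))
            for i in range(count)]


def bulk_add(a_slices, b_slices):
    """Ripple-carry addition executed once for all lanes.

    The sequence of operations does not depend on the input values
    (the algorithm is oblivious), so every bitwise operation below
    acts on all packed inputs simultaneously.
    """
    carry = 0
    out = []
    for a, b in zip(a_slices, b_slices):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    out.append(carry)  # final carry-out becomes the top bit
    return out


xs = [3, 5, 7, 0]
ys = [1, 2, 7, 0]
sums = unpack(bulk_add(pack(xs, 3), pack(ys, 3)), len(xs))
```

Python integers are unbounded, so the number of lanes is not limited here; on real hardware each word would hold a fixed number of lanes, such as 32 or 64.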
The High Efficiency Video Coding (HEVC) standard, the newest-generation video coding standard issued in 2013, significantly improves compression performance relative to existing standards, achieving about 50% bit-rate reduction for equal perceptual video quality at the cost of greatly increased computational complexity in the encoder/decoder. In order to improve the decoding efficiency, we design...
Automatic defect inspection and classification in ceramic tiles plays a crucial role in the ceramic tile industry. In order to reduce the computing time required to detect and classify spot and crack defects in ceramic tiles, this paper proposes parallel algorithms based on the graphics processing unit. The proposed algorithm divides ceramic tile images into non-overlapped partitions, identifies the defected...
The simulation of EM (electromagnetic) wave propagation requires considerable computation time, as it analyzes a large number of propagation paths. To overcome this problem, we propose a GPU (graphics processing unit)-based parallel algorithm for VPL (vertical plane launch)-approximated EM wave propagation. The conventional algorithm computes the gain along propagation paths with irregular memory...
The nonlocal means (NLM) algorithm is one of the best image denoising algorithms because of its superior capability to retain the texture details of an image and is widely used in remote sensing (RS) image preprocessing. However, the time complexity of the algorithm is very high due to its nonlocality when searching for similar pixels. As a result, the NLM algorithm cannot satisfy the near real-time...
It is a trend now that computing power through parallelism is provided by multi-core systems or heterogeneous architectures for High Performance Computing (HPC) and scientific computing. Although many algorithms have been proposed and implemented using sequential computing, alternative parallel solutions provide more suitable and high performance solutions to the same problems. In this paper, three...
The main goal of supervised data analytics is to model a target phenomenon given a limited number of samples, each represented by an arbitrarily large number of variables. Especially when the number of variables is much larger than the number of available samples, variable selection is a key step, as it makes it possible to identify a possibly reduced subset of relevant variables describing the observed phenomenon...
For investigations of rapidly moving structures in opaque technical devices ultrafast electron beam X-ray computed tomography (CT) scanners are available at the Helmholtz-Zentrum Dresden-Rossendorf (HZDR). Currently, measurement data must be initially downloaded after each CT scan from the scanner to a data processing machine. Afterwards, cross-sectional images are reconstructed. This limits the application...
Many automated finite state machine (FSM) based test generation algorithms require that a characterising set or a set of harmonised state identifiers is first produced. The only previously published algorithms for partial FSMs were brute-force algorithms with exponential worst case time complexity. This paper presents polynomial time algorithms and also massively parallel implementations of both the...
The communication radius of an automatic light trap surveillance network characterizes how well an area is monitored or tracked by automatic light traps. Connectivity is an important requirement that shows how nodes in an automatic BPH light trap surveillance network can effectively communicate. In this paper, we propose a new approach to determine the communication radius of an automatic light trap based on...
A parallel implementation of a surface reconstruction algorithm is presented. This algorithm uses the vector field surface representation and was adapted in a previous work by the authors to handle large scale environment reconstruction. Two parallel implementations with different memory requirements and processing speeds are described and compared. These parallel implementations increase the vector...
Various sequential algorithms for the shortest path problem on time-dependent graphs have appeared in the literature. However, these algorithms mostly suffer from long running times and huge memory requirements, which make them unsuitable for navigation applications that need to run on real-time data with fast response times. For the shortest path problem with time dependent flow speed...