High performance computing applications are far more difficult to write, so practitioners expect well-tuned software to last long and to provide optimized performance even when the hardware is upgraded. It may also be necessary to write software with sufficient abstraction over the hardware so that it can run on heterogeneous architectures. Therefore, it is required to have a...
This paper presents an FDTD CUDA-based implementation designed for microstrip antenna simulation. Aspects of geometry and also memory transactions are considered in the formulation of the parallel algorithm. As a result, an improvement in computational cost is achieved using the proposed implementation. Two microstrip antennas, a narrow band patch antenna and a UWB antenna, are simulated to validate...
The Ising model was originally designed to address the interactions among atoms inside a magnetic field. As it can fit many biological problems where adjacent entities interact with each other, the Ising model is becoming more and more popular. With its help, people may gain a deeper and better understanding of associations between two related entities like genes and their products. However, it may...
The complete Voronoi map of a binary image with black and white pixels is a matrix of the same size such that each element is the closest black pixel of the corresponding pixel. The complete Voronoi map visualizes the influence region of each black pixel. However, each region may not be connected due to exclave pixels. The connected Voronoi map is a modification of the complete Voronoi map so that...
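The definition of the complete Voronoi map above can be illustrated with a minimal brute-force sketch. It is not the method of the paper: the function name `complete_voronoi_map`, the choice of Euclidean distance, and the tie-breaking rule (smallest coordinates win) are all assumptions made only for this illustration.

```python
def complete_voronoi_map(image):
    """Return, for every pixel, the coordinates of its closest black pixel.

    `image` is a list of rows; 1 marks a black pixel, 0 a white pixel.
    Assumptions (not from the source): squared Euclidean distance is used,
    and ties are broken by the smaller (row, col) coordinate.
    """
    black = [(r, c)
             for r, row in enumerate(image)
             for c, value in enumerate(row)
             if value == 1]
    if not black:
        raise ValueError("image contains no black pixel")

    voronoi = []
    for r, row in enumerate(image):
        out_row = []
        for c in range(len(row)):
            # Brute force: compare this pixel against every black pixel.
            nearest = min(
                black,
                key=lambda p: ((p[0] - r) ** 2 + (p[1] - c) ** 2, p),
            )
            out_row.append(nearest)
        voronoi.append(out_row)
    return voronoi


image = [
    [1, 0, 0],
    [0, 0, 0],
    [0, 0, 1],
]
vm = complete_voronoi_map(image)
```

Here every entry of `vm` names the black pixel whose influence region contains that position. The cost grows with the number of pixels times the number of black pixels, which is why parallel implementations are of interest.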
The use of Graphics Processing Units (GPUs) has become a very popular way to accelerate the execution of many applications. However, GPUs are not exempt from side effects. For instance, GPUs are expensive devices which additionally consume a non-negligible amount of energy even when they are not performing any computation. Furthermore, most applications present low GPU utilization. To address these...
Optimizing the performance of GPU kernels is challenging for both human programmers and code generators. For example, CUDA programmers must set thread and block parameters for a kernel, but might not have the intuition to make a good choice. Similarly, compilers can generate working code, but may miss tuning opportunities by not targeting GPU models or performing code transformations. Although empirical...
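To make the thread and block parameter choice concrete, the following sketch enumerates candidate launch configurations for a one-dimensional kernel. It only illustrates the search space, not the tuning approach of the paper. The helper name `launch_configs` is hypothetical, and the warp size of 32 and the limit of 1024 threads per block are assumed values that match current NVIDIA GPUs.

```python
WARP_SIZE = 32                 # assumed warp size
MAX_THREADS_PER_BLOCK = 1024   # assumed per-block thread limit


def launch_configs(n_elements):
    """List (block_size, grid_size, idle_threads) for each candidate block size.

    Block sizes are multiples of the warp size. The grid size is the
    ceiling of n_elements / block_size, so every element gets a thread.
    """
    configs = []
    for block in range(WARP_SIZE, MAX_THREADS_PER_BLOCK + 1, WARP_SIZE):
        grid = (n_elements + block - 1) // block   # ceiling division
        idle = grid * block - n_elements           # threads with no work
        configs.append((block, grid, idle))
    return configs


configs = launch_configs(1000)
by_block = {block: (grid, idle) for block, grid, idle in configs}
```

Even this simple case yields 32 candidates. Which one runs fastest depends on the kernel and the device, so it has to be measured rather than computed from this table.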
Network analysis software relies on graph layout algorithms to enable users to visually explore network data. Nowadays, networks easily consist of millions of nodes and edges, resulting in hours of computation time to obtain a readable graph layout on a typical workstation. Although these machines usually do not have a very large number of CPU cores, they can easily be equipped with Graphics Processing...
GPUs are employed to accelerate scientific applications; however, they require much more programming effort, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive-based programming solutions to alleviate the programming burden; however, synchronous data movement can create a performance bottleneck in fully taking...
Associative memories are models capable of storing and retrieving messages given only a part of their content. These systems have been used in several applications such as database engines, network routers, natural language processing and image recognition due to their error correction capability in pattern retrieval. Recently, Gripon and Berrou introduced a sparse associative memory based on cliques...
Modern Graphics Processing Units (GPUs), with a massive number of threads and a many-core architecture, support both graphics and general-purpose computing. NVIDIA's Compute Unified Device Architecture (CUDA) takes advantage of parallel computing and utilizes the tremendous power of GPUs. The present study demonstrates a high performance computing (HPC) framework for a Monte-Carlo simulation to determine...
The OpenACC programming model simplifies the programming for accelerator devices such as GPUs. Its abstract accelerator model defines a least common denominator for accelerator devices, thus it cannot represent architectural specifics of these devices without losing portability. Therefore, this general-purpose approach delivers good performance on average, but it misses optimization opportunities...
In image processing, efficient algorithms are always pursued for applications that use the most advanced hardware architectures. The Distance Transform is a classic operation for blurring effects, skeletonizing, segmentation and various other purposes. This article presents two implementations of the Euclidean Distance Transform using CUDA (Compute Unified Device Architecture) in GPU (Graphics Process...
The numerical solution of the Eikonal equation follows the fast iterative method with its application for tetrahedral meshes. Therein the main operations in each discretization element τ contain various inner products in the M-metric as $\langle \vec{e}_{k,s}, \vec{e}_{s,\ell} \rangle_{M_\tau} = \vec{e}_{k,s}^{\,T} \cdot M_\tau \cdot \vec{e}_{s,\ell}$ with $\vec{e}_{s,\ell}$ as connecting edge between vertices s and ℓ in element τ. Instead of passing...
Adaptive Dynamic Programming (ADP) with critic-actor architecture is a useful way to achieve online learning control. The algorithm Gaussian-Kernel Adaptive Dynamic Programming (GK-ADP) that has been developed before has a kind of two-phase iteration, which not only approximates value function, but also optimizes hyper-parameters simultaneously. However, just like most iteration algorithms are applied...
This paper outlines an efficient technique for displaying 3D vector fields during conformal FDTD field updates on CUDA GPUs, while incurring only a small computational overhead and using a small configurable memory allocation. A 10 GHz OAM phased array is presented as an example where 3D vector visualization shows the development of the OAM mode.
With the explosive growth of user load data in power consumption information collection and load control systems, traditional computing frameworks and methods are faced with tremendous computational pressure when dealing with massive user load clustering and carrying out load characteristic analysis. In this paper, with a view to increasing accuracy and computational power of graphic process unit...
General Purpose GPUs (GPGPUs) are ideal platforms for parallel execution of applications with regular shared memory access patterns. However, the majority of real-world multithreaded applications require access to shared memory with irregular patterns. The Minimum Spanning Forest (MSF) calculation arises in many real-world applications. Boruvka's algorithm for calculating MSF has the most expressed...
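For readers unfamiliar with Boruvka's algorithm, a minimal sequential sketch is given here. It is not the GPU implementation the abstract refers to: the function name `boruvka_msf`, the edge format `(weight, u, v)`, and the union-find bookkeeping are assumptions chosen for illustration. Ties between equal weights are broken by comparing whole edge tuples, which keeps the selected edges free of cycles.

```python
def boruvka_msf(num_vertices, edges):
    """Return the edges of a minimum spanning forest.

    `edges` is a list of (weight, u, v) tuples over vertices
    0 .. num_vertices - 1 (hypothetical format for this sketch).
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    while True:
        # Step 1: each component records its cheapest outgoing edge.
        cheapest = {}
        for edge in edges:
            _, u, v = edge
            ru, rv = find(u), find(v)
            if ru == rv:
                continue  # edge lies inside one component
            for root in (ru, rv):
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            break  # no edge joins two components any more

        # Step 2: merge components along the selected edges.
        for w, u, v in set(cheapest.values()):
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((w, u, v))
    return forest


# Two connected components: {0, 1, 2} and {3, 4}.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 3, 4)]
msf = sorted(boruvka_msf(5, edges))
```

Step 1 treats every component independently, which is the source of the algorithm's parallelism. The lookups through `parent` follow data-dependent paths, which is one example of the irregular memory access the abstract mentions.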
The need for end-to-end secure voice communication under cyber security threats is increasing day by day. This paper describes a method of establishing a secure VoIP system in which the voice is encoded with the SYMPES [1] coding technique and encrypted with an open standard encryption algorithm. Voice can be transmitted from point to point within a secure IP network. A Graphic Processing Unit (GPU)...
The online tracking algorithm CMT (Clustering of the Static-Adaptive Correspondences for Deformable Object Tracking), which is based on keypoint detection, matching and tracking, is robust and accurate for deformable object tracking. However, its optical flow tracker is error-prone when active points run outside the scope of the target. Worse still, the computational complexity of CMT greatly...
The coevolutionary particle swarm optimization (CPSO) algorithm has been widely investigated and applied in the real world. When tackling large-scale and complex real-time optimization problems, the running time of the CPSO algorithm is a barrier. In this paper, a Graphics Processing Unit (GPU) is introduced to provide speedup in order to meet the real-time requirements. The CPSO algorithm has been implemented...