The arch project is a suite of mini-apps that have been developed with consistent coding practices, under a common infrastructural layer. Great emphasis has been placed on making the applications concise and easy to manipulate, while capturing the key performance characteristics of their proxied algorithmic classes. The suite is intended for traditional exploration of performance, portability and...
The development of a deep (stacked) convolutional auto-encoder in the Caffe deep learning framework is presented in this paper. We describe the simple principles we used to create this model in Caffe. The proposed model of convolutional auto-encoder does not yet have pooling/unpooling layers. The results of our experimental research show comparable accuracy of dimensionality reduction in comparison...
In the recent literature, drug design relying on molecular docking (MD) techniques is becoming a very promising field. Most of these techniques rely on the way ligands interact with a protein target through only one binding site; in addition, they ignore the fact that assorted ligands interact with unconnected parts of the target. However, by taking the latter fact into consideration, the computational...
The capability of GPUs to accelerate general-purpose applications that can be parallelized into a massive number of threads makes it promising to apply GPUs to real-time applications as well, where high throughput and intensive computation are also needed. However, due to the different architecture and programming model of GPUs, the worst-case execution time (WCET) analysis methods and techniques designed...
This paper presents GPU parallelization for a computational fluid dynamics solver which works on a mesh consisting of polyhedral cells, where each cell has an arbitrary number of faces and each face has an arbitrary number of vertices. The parallelization is achieved using NVIDIA's compute unified device architecture (CUDA). The developed code specifically targets performance improvement on NVIDIA...
Today, most high-performance computing (HPC) platforms have heterogeneous hardware resources (CPUs, GPUs, storage, etc.). A Graphics Processing Unit (GPU) is a parallel computing coprocessor specialized in accelerating vector operations. The prediction of application execution times over these devices is a great challenge and is essential for efficient job scheduling. There are different approaches...
This paper studies the implementation and optimization of a high-order weighted essentially non-oscillatory (WENO) solver for the Euler equations on multi-core and many-core architectures (Intel Ivy Bridge CPU, Intel Xeon Phi 7110P coprocessor and NVIDIA Kepler K20c GPU). The implementation of up to ninth-order accurate WENO schemes is used in the solver. For the GPU platform, both...
As a traditional application on various supercomputers, atmospheric modeling has long suffered from low performance efficiency. In this paper, we pick the 3D Euler equation solver (the most essential dynamic component for a non-hydrostatic atmospheric model) as the target application, and explore the maximum performance efficiency that can be achieved on CPU-GPU hybrid architectures. Besides...
Simulating complex physical phenomena involves manipulating a large amount of data. In order to simulate very large domains on limited computing architectures, such as industrial infrastructures, new solutions have to be proposed. In this paper, a new out-of-core method is introduced in order to perform fast physical simulations using a complex Lattice Boltzmann model (LBM) on a...
A finite-difference micromagnetic solver called Grace uses C++ Accelerated Massive Parallelism (C++ AMP). The performance of a single GPU is compared against a typical CPU-based solver. The GPU-to-CPU speedup is shown to be two orders of magnitude for larger problem sizes. This solver can run on GPUs from various hardware vendors, such as Nvidia, AMD, and Intel, regardless of whether...
The aim of this paper is to develop an integrated electronic system that allows dynamic management of congestion and provides fast evaluation of dynamic conditions. Thus, a cellular-automata-based model is proposed that estimates the movement of individuals. The presented system incorporates a process that allows efficient camera-based initialization of the model, without any special...
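To make the cellular-automata idea in this abstract concrete, here is a minimal sketch of one update step, assuming the simplest possible rule (each pedestrian tries to move one cell toward an exit per tick, blocked cells force a wait). The function name `step` and the rule itself are illustrative assumptions, not the paper's actual model.

```python
# Hypothetical minimal cellular-automata crowd-movement step (not the
# paper's model): each pedestrian moves one cell toward the exit per
# update if the target cell is unoccupied, otherwise stays in place.

def step(peds, exit_cell, occupied):
    """Advance each pedestrian one cell toward exit_cell when possible."""
    er, ec = exit_cell
    new_peds = []
    for (r, c) in peds:
        # Move one cell along each axis toward the exit.
        nr = r + (er > r) - (er < r)
        nc = c + (ec > c) - (ec < c)
        target = (nr, nc)
        if target == (r, c) or target not in occupied:
            occupied.discard((r, c))
            occupied.add(target)
            new_peds.append(target)
        else:
            new_peds.append((r, c))  # blocked: wait this tick
    return new_peds
```

Even this toy rule shows why a camera-based initialization matters: the quality of the simulation depends entirely on the initial pedestrian positions fed into `peds` and `occupied`.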
Modeling thermal radiation in parallel is computationally challenging due to its all-to-all physical and resulting computational connectivity; it is also the dominant mode of heat transfer in practical applications such as next-generation clean coal boilers, which are modeled by the Uintah framework. However, a direct all-to-all treatment of radiation is prohibitively expensive on large computer systems...
In this paper, we accelerate a double-precision alternating direction implicit (ADI) solver for three-dimensional compressible Navier-Stokes equations from our in-house computational fluid dynamics (CFD) software on the latest multi-core and many-core architectures (Intel Ivy Bridge CPU, Intel Xeon Phi 7110P coprocessor and NVIDIA Kepler K20c GPU). For the GPU platform, both the OpenACC-based and...
This paper presents an approach for parallel implementation of cross-correlation using the graphics processing unit (GPU). Cross-correlation is a central digital signal processing (DSP) algorithm with applications in many areas. In many real-time systems, a sequential implementation of cross-correlation creates a performance bottleneck and prevents the system from reaching the real-time...
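For reference, a sequential baseline of the cross-correlation this abstract discusses can be sketched as below (a textbook "full-mode" correlation, not the paper's code). The key property for GPU parallelization is visible in the loop structure: each output lag is computed independently of the others, so a parallel version can assign one GPU thread per lag.

```python
# Sequential cross-correlation over all lags ("full" mode) -- the kind
# of baseline a GPU implementation would parallelize. Every iteration of
# the outer loop (one lag) is independent of the others.

def xcorr(x, y):
    """Cross-correlation of two real sequences over all lags."""
    n, m = len(x), len(y)
    out = []
    for lag in range(-(m - 1), n):   # every possible alignment of y over x
        s = 0.0
        for j in range(m):           # inner dot product at this lag
            i = lag + j
            if 0 <= i < n:
                s += x[i] * y[j]
        out.append(s)
    return out
```

For inputs of length n and m this costs O(n*m) operations, which is exactly the sequential bottleneck the abstract refers to when signals are long.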
Biological sequence comparison is a very common task in Bioinformatics applications. Many parallel solutions have been proposed for this problem, using different HPC platforms, usually programmed with platform-specific languages and frameworks. With this approach, it is difficult to port solutions among different platforms such as CPUs and GPUs, for instance. To tackle this problem, this paper proposes...
The paper considers problems of developing a parallel hybrid fluid-based model and methods to solve them. The main causes of the drop in GPU performance that arose during development, and ways to address them, are described. A method for describing network structures using a route adjacency matrix is provided. Also, several methods to evaluate matrix row summation are considered and...
During the past decade Graphics Processing Units (GPU) have been increasingly employed for speeding up compute intensive scientific applications. In this field, the geometric multigrid method (GMG) is one of the most efficient algorithms for solving large sparse linear systems of equations. Herein we analyze the performance of an optimized GPU based implementation of the GMG method on different state-of-the-art...
Understanding three-dimensional seismic wave propagation in complex media is still one of the main challenges of quantitative seismology. Because of its simplicity and numerical efficiency, the finite-difference method is one of the standard techniques used to solve the elastodynamics equation. Additionally, this class of modeling heavily relies on parallel architectures in order to tackle...
Many high-performance computing applications solving partial differential equations (PDEs) can be attributed to the class of kernels using stencils on structured grids. Due to the disparity between floating-point operation throughput and main memory bandwidth, these codes typically achieve only a low fraction of peak performance. Unfortunately, stencil computation optimization techniques are often...
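A minimal example of the kernel class this abstract describes, assuming a standard 5-point Jacobi stencil on a 2-D structured grid (an illustrative choice, not taken from the paper). It shows why such codes are bandwidth-bound: each grid point does only four additions and one multiplication, but touches five memory locations for reads and one for writes.

```python
# 5-point Jacobi stencil sweep on a structured 2-D grid. Per point:
# ~5 reads, 1 write, 5 flops -- so throughput is limited by memory
# bandwidth rather than floating-point capability.

def jacobi_sweep(u):
    """One out-of-place Jacobi relaxation sweep; boundaries stay fixed."""
    rows, cols = len(u), len(u[0])
    v = [row[:] for row in u]            # output grid (out-of-place update)
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            v[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                              + u[i][j - 1] + u[i][j + 1])
    return v
```

Typical optimizations for this pattern (loop tiling, temporal blocking across sweeps) aim to reuse loaded values from cache before they are evicted, which is exactly what raises the achieved fraction of peak performance.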
Computational biology contributes important solutions for major biological challenges. Unfortunately, most applications in computational biology are highly compute-intensive and associated with extensive computing times. Biological problems of interest are often not treatable with traditional simulation models on conventional multi-core CPU systems. This interdisciplinary work introduces a new multi-timescale...