In the last decade, Graphics Processing Units (GPUs) have gained increasing popularity as accelerators for High Performance Computing (HPC) applications. Recent GPUs are not only powerful graphics engines but also highly threaded parallel computing processors that can achieve sustained speedups compared with CPUs. In this context, researchers try to exploit the capability of this architecture...
This paper presents a parallelization of the Constraint Programming solver OR-Tools using the parallel framework Bobpp. The principle of the parallelization is to assign one sequential OR-Tools solver per core and to communicate work, or nodes, using the Bobpp global priority queue. As usual, the communication or migration of work between solvers is a kind of Work Stealing. But here we have to deal...
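The scheme described above — idle per-core solvers stealing search nodes from one shared priority queue — can be sketched as follows. This is a minimal illustration only, not the Bobpp API; the class and method names (`GlobalQueue`, `steal`) are hypothetical.

```python
import heapq

# Hypothetical sketch of the work-stealing scheme (NOT the Bobpp API):
# a single global priority queue of search nodes, from which each
# per-core sequential solver takes work when it runs dry.
class GlobalQueue:
    def __init__(self):
        self._heap = []
        self._count = 0  # insertion counter: tie-breaker so nodes never compare

    def push(self, priority, node):
        # A solver deposits a subtree it cannot explore right now.
        heapq.heappush(self._heap, (priority, self._count, node))
        self._count += 1

    def steal(self):
        # An idle solver steals the best-priority node, if any.
        if self._heap:
            return heapq.heappop(self._heap)[2]
        return None

q = GlobalQueue()
q.push(2, "subtree-B")
q.push(1, "subtree-A")   # lower value = higher priority in this sketch
stolen = q.steal()       # "subtree-A" is stolen first
```

In the real setting each solver would run in its own thread or process and the queue would need locking; the sketch only shows the priority-ordered migration of nodes.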
To lower the technical barrier for different scientific communities joining Earth System Modeling, a coupler is widely used to link two or more climate simulation applications, called models. With the advent of advanced models, the larger data volumes to be transferred and transformed will incur significant performance overhead in the coupler. However, the current independent modular design cannot...
Modern GPUs are increasingly adopted by cluster computing systems as high-performance computing units owing to their outstanding computational power, but they bring system-level (inter-node) architectural heterogeneity to the cluster. In this paper, based on the MPI and CUDA programming models, we aim to investigate task scheduling for GPU-heterogeneous clusters by taking into account...
Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of the dynamic programming tables and excessive runtimes of sequential implementations. This work addresses both challenges by proposing a set of new parallel...
Optimization methods are generally not among the algorithms best suited for parallelization on a GPU. However, relatively good efficiency can still be obtained if the method is properly adapted to the GPU programming model, which is the case for dynamic programming. In this article, we propose a thread-grouping parallelization strategy for dynamic programming in CUDA. We show that parametrizing...
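Why dynamic programming adapts well to the GPU model can be seen in a 2-D table DP such as edit distance: all cells on the same anti-diagonal are independent, so each anti-diagonal can be mapped to a group of threads. The sketch below (plain Python, not the paper's CUDA code; the function name and the diagonal-walk structure are illustrative assumptions) walks the waves sequentially to show the dependency pattern:

```python
# Illustrative sketch only: in a 2-D dynamic program such as edit
# distance, every cell on a given anti-diagonal depends only on the two
# previous anti-diagonals, so all of its cells can be computed in
# parallel. On a GPU each wave would be one kernel launch (or one
# synchronized thread group); here we just walk the waves in order.
def edit_distance_by_antidiagonals(a, b):
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i            # deleting i characters
    for j in range(m + 1):
        D[0][j] = j            # inserting j characters
    for d in range(2, n + m + 1):          # wave index: cells with i + j == d
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i                      # independent of other cells in wave d
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,      # deletion
                          D[i][j - 1] + 1,      # insertion
                          D[i - 1][j - 1] + cost)  # substitution
    return D[n][m]
```

The inner loop over `i` is the part a CUDA kernel would distribute across threads; how many cells each thread group handles is exactly the kind of parameter the abstract refers to.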
We present solutions to two problems commonly encountered in fine-grain parallelization on multi-core architectures: expressing algorithms with a task grain size suitable for the hardware, and minimizing the time penalty due to Non-Uniform Memory Accesses. To evaluate the benefit of our work, we present experiments on the fine-grain parallelization of an iterative solver...
This paper describes initial steps to leverage accelerators, such as GPUs, in ab initio nuclear physics calculations. Specifically, parallel nuclear structure calculations performed by the MFDn package are considered, with selected stages adapted for GPUs. This paper outlines the steps necessary to make MFDn utilize GPUs in its matrix construction stage. Experiments are presented to compare the...
Energy efficiency is especially important for broadcasting in wireless sensor networks. Energy consumption can be reduced by minimizing the number of relay nodes used during the broadcast, in the case where the transmission range is identical for all nodes in the network. In this paper, we introduce an efficient heuristic algorithm, EMCDS, to build a Minimum Connected Dominating...
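The goal stated above — a small, connected set of relay nodes that dominates every vertex — can be illustrated with a generic greedy heuristic. This is a hedged stand-in, not EMCDS itself (the snippet does not give EMCDS's rules); the function name and the degree-based greedy choice are assumptions.

```python
# Generic greedy connected-dominating-set heuristic (NOT the paper's
# EMCDS algorithm): grow a connected set of relays, at each step adding
# the already-reached node whose neighbors cover the most uncovered
# vertices. adj: dict node -> set of neighbors, assumed connected.
def greedy_cds(adj):
    start = max(adj, key=lambda v: len(adj[v]))   # seed with a max-degree node
    cds = {start}
    covered = {start} | adj[start]
    while covered != set(adj):
        # Candidates are covered-but-not-relay nodes, so the relay set
        # stays connected; pick the one adding the most uncovered nodes.
        cand = max((v for v in covered - cds),
                   key=lambda v: len(adj[v] - covered))
        cds.add(cand)
        covered |= adj[cand]
    return cds

# Example: a 5-node ring; three relays suffice to dominate it.
ring = {0: {1, 4}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 0}}
relays = greedy_cds(ring)
```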
Luna and Typhoon are two Top500 supercomputers running production scientific workloads at LANL. Interestingly, users have reported substantially better-than-expected performance gains from Luna, the newer of the two. In this paper we present our methodology for investigating the source of this improvement and determining what architectural changes for future supercomputers would most benefit LANL...
Our problem is to accurately solve linear systems on a general-purpose graphics processing unit with double-double and quad-double arithmetic. The linear systems originate from the application of Newton's method to polynomial systems. Newton's method is applied as a corrector in a path-tracking method, so the linear systems are solved in sequence, not simultaneously. One solution path may require...
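The corrector role of Newton's method can be sketched in one dimension, where the "linear system" is the 1x1 equation p'(x)·h = -p(x). This is an illustration only: Python's `decimal` module stands in for the double-double/quad-double arithmetic used on the GPU, and the function names are assumptions.

```python
from decimal import Decimal, getcontext

# Sketch: decimal at 32 significant digits emulates the extended
# precision (double-double is roughly 32 digits) the abstract refers to.
getcontext().prec = 32

def newton_corrector(p, dp, x, steps=8):
    # Each iteration solves the 1x1 linear system dp(x) * h = -p(x)
    # and applies the update x <- x + h; in the paper's setting p is a
    # polynomial system and this solve is a full linear system solve.
    for _ in range(steps):
        x = x - p(x) / dp(x)
    return x

p  = lambda x: x * x - Decimal(2)   # toy polynomial: x^2 - 2
dp = lambda x: 2 * x                # its derivative
root = newton_corrector(p, dp, Decimal(1))   # converges to sqrt(2)
```

Because the corrector is run at each step along a solution path, these small high-precision solves happen in a long sequence, which is why the abstract stresses solving them one after another rather than simultaneously.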
For decades, computer architects pursued one primary goal: performance. The ever-faster transistors provided by Moore's law were translated into remarkable gains in operating frequency, along with rising power consumption. However, shrinking device sizes and growing architectural complexity impose several new challenges, including a decrease in dependability due to physical failures. This makes crucial the use of...
The design of high-performance Multiprocessor Systems-on-Chip (MPSoCs) has proven to be an attractive challenge in embedded systems design automation. However, the complexity of such designs, combined with short time-to-market constraints, imposes serious limitations on the exploration of different configurations and scenarios during design space exploration. The use of virtual platforms may decrease...
In the embedded systems domain there is a continuous trend towards providing higher flexibility for application development. As a consequence, developing distinct components in isolation can no longer be considered affordable for System-on-Chip platforms; rather, a more holistic approach is necessary for deriving optimal solutions. At the same time, the requirement to integrate more functionality in a smaller form...
Virtual prototyping of parallel and embedded systems increases insight into existing computer systems. It further allows the properties of new systems to be explored already during their specification phase. Virtual prototypes of such systems benefit from parallel simulation techniques due to the increased simulation speed. One common problem that full-system simulator implementers face is the revision and integration...
Modern systems-on-chip need sophisticated power-management policies to control their power consumption and temperature. These policies are usually implemented partly in software, with hardware support, and they need to be validated early; hence power- and temperature-aware simulation techniques at the system level need to be developed. Existing approaches for system-level power and thermal...
A new challenge has arisen in big data management from the constant change of associated data, leading to the issue of data evolution. In this paper, a data evolution model for a Virtual Data Space (VDS) is proposed for managing the big data lifecycle. Firstly, the concept of a data evolution cycle is defined, and the lifecycle process of big data management is described. Based on these, the data...
Molecular dynamics simulation is a powerful tool for simulating and analyzing complex physical processes and phenomena at the atomic level, predicting the natural time evolution of a system of atoms. Precise simulation of processes such as liquid-metal solidification places strong demands on both simulation size and computing timescale. Therefore, finding available computing...
Computational science is generating increasingly unwieldy datasets created by complex and high-resolution simulations of physical, social, and economic systems. Traditional post-processing of such large datasets requires high bandwidth to large storage resources. In situ processing approaches can reduce I/O requirements but steal processing cycles from the simulation and forsake interactive data exploration...
Nearest Neighbor search is one of the simplest and most intuitive ideas in data mining. Due to its simplicity and diverse utility, Nearest Neighbor search is often found to be the workhorse of a variety of data mining, machine learning, and computer vision algorithms. For very high-dimensional data, the naive linear search tends to be optimal. This is due to the so-called curse of dimensionality...
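The naive linear search the abstract refers to is just a full scan comparing the query against every point. A minimal sketch (function names and the toy data are illustrative):

```python
# Minimal sketch of the naive linear scan: compare the query against
# every point and keep the closest. For high-dimensional data this
# brute force is often hard to beat, because space-partitioning indexes
# degrade under the curse of dimensionality.
def nearest_neighbor(query, points):
    def sq_dist(p, q):
        # Squared Euclidean distance (no sqrt needed for argmin).
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return min(points, key=lambda p: sq_dist(p, query))

data = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
nn = nearest_neighbor((0.9, 1.2), data)   # closest point to the query
```

The scan is O(n·d) per query, which is exactly why it is a natural target for the parallel and GPU treatments discussed elsewhere in this listing.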