Auto-Tuning techniques have been used in the design of routines in recent years. The goal is to develop routines which automatically adapt to the conditions of the computational system, in such a way that efficient executions are obtained independently of the user's experience. This paper aims to explore programming routines that can be automatically adapted to the computational system conditions,...
Understanding power usage in parallel workloads is crucial to developing the energy-aware software that will run on future Exascale systems. In this paper, we contribute towards this goal by introducing an integrated framework to profile, monitor, model and analyze power dissipation in parallel MPI and multi-threaded scientific applications. The framework includes a custom-designed device to measure internal...
We investigate the balance between the time-to-solution and the energy consumption of a task-parallel execution of the Cholesky and LU factorizations on a hybrid platform, equipped with a multi-core processor and several GPUs. To improve energy efficiency, we incorporate two energy-saving techniques in the runtime in charge of scheduling the computations, to block idle threads and enable the transition...
In this paper we analyze the trade-off between energy and performance for a data-parallel execution of the LU factorization with partial pivoting on a multi-core processor. To improve energy efficiency, we adapt the runtime in charge of controlling the concurrent execution of the algorithm to leverage DVFS and block idle threads. For a CPU-bound operation like the LU factorization, experiments on...
This paper addresses the efficient exploitation of task-level parallelism, present in many dense linear algebra operations, from the point of view of both computational performance and energy consumption. In particular, we consider a procedure, the Slack Reduction Algorithm (SRA), to optimize the execution frequency of a collection of tasks (into which many dense linear algebra algorithms can be decomposed)...
Two strategies of distribution of computations can be used to implement parallel solvers for dense linear algebra problems for Heterogeneous Computational Clusters of Multicore Processors (HCoMs). These strategies are called Heterogeneous Process Distribution Strategy (HPS) and Heterogeneous Data Distribution Strategy (HDS). They are not novel and have been researched thoroughly. However, the advent...
In this work we present a parallel algorithm for the solution of a least squares problem with structured matrices. This problem arises in many applications mainly related to digital signal processing. The parallel algorithm is designed to speed up the sequential one on heterogeneous networks of computers. The parallel algorithm follows the HeHo strategy (Heterogeneous distribution of processes over...