3D FFT is a highly data- and compute-intensive kernel encountered in many applications. We report a high-performance design and implementation of 3D FFT on a CGRA that supports partial reconfiguration. The hardware/software multi-clock design uses dynamic reconfiguration to reduce the required communication bandwidth and achieves a sustained throughput of 40 GOPS at a word size of 48 bits. Performance...
This paper links a well-investigated formalism for describing dynamic structured discrete event systems and a modelling methodology for runtime reconfigurable systems. The theory behind dynamic structured discrete event systems is used to back a generic SystemC simulation model derived from and developed for runtime reconfigurable systems. The coupling of formalism and model effects in particular...
Performance portability is a major challenge faced today by developers on heterogeneous high performance computers, consisting of an interconnect, memory with non-uniform access, many-cores and accelerators like GPUs. Recent studies have successfully demonstrated that dense linear algebra operations can be efficiently handled by runtime systems using a DAG representation. In this work, we present...
One of the major trends in the design of exascale architectures is the use of multicore nodes enhanced with GPU accelerators. Exploiting all resources of a hybrid accelerator-based node to their maximum potential is thus a fundamental step towards exascale computing. In this article, we present the design of a highly efficient QR factorization for such a node. Our method has three steps. The first...
This paper introduces a novel implementation for reducing a symmetric dense matrix to tridiagonal form, which is the preprocessing step toward solving symmetric eigenvalue problems. Based on tile algorithms, the reduction follows a two-stage approach, where the tile matrix is first reduced to symmetric band form prior to the final condensed structure. The challenging trade-off between algorithmic performance...
As tile linear algebra algorithms continue achieving high performance on shared-memory multicore architectures, it is a challenging task to make them scalable on distributed-memory multicore cluster machines. The main contribution of this paper is the extension to the distributed-memory environment of the previous work done by Hadri et al. on Communication-Avoiding QR (CA-QR) factorizations for tall...
Because of the difficulty of increasing single-threaded processor performance, multi-core systems are becoming increasingly popular. These systems bring new challenges to the design of a reconfigurable computing system, with reconfigurable hardware potentially shared between multiple simultaneously-executing applications. In this paper, we examine how to best use reconfigurable hardware in a multiprocessor...
The use of reconfigurable hardware to accelerate the compute-intensive parts of applications has long been shown to provide large execution speedups. However, the long configuration latency and the limited amount of reconfigurable hardware resources demand careful arbitration of those resources among the applications in the system. In past efforts, we demonstrated that effective allocation of reconfigurable...
A key step in program performance optimization is to determine optimal values for certain parameters. Static approaches determine these values based on analytical models. However, complex computer architectures and complex code structures limit the effectiveness of such models. Execution-driven approaches like iterative compilation determine these parameter values by executing the program with different parameter...