The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Since main memory system contributes to a large and increasing fraction of server/datacenter energy consumption, there have been several efforts to reduce its power and energy consumption. DVFS schemes have been used to reduce the memory power, but they come with a performance penalty. In this work, we propose DEMM, an OS-based, high performance DVFS mechanism that reduces memory power by dynamically...
Growing performance gap between processors and main memory has made it worthwhile to consider off-chip data accesses in multi-query processing [2], [1], [3]. Exploiting data-sharing opportunities among concurrent queries can be critical for effective utilization of the underlying shared memory hierarchy. Given a set of queries, there may be a common retrieval operation for several cases to the same...
In both architecture and software, the main goal of data locality-oriented optimizations has always been "minimizing the number of cache misses" (especially, costly last-level cache misses). However, this paper shows that other metrics such as the distance between the last-level cache and memory controller as well as the memory queuing latency can play an equally important role, as far as...
Most of the prior compiler based data locality optimization works target exclusively cache locality optimization, and row-buffer locality in DRAM banks received much less attention. In particular, to the best of our knowledge, there is no single compiler based approach that can improve row-buffer locality in executing irregular applications. This presents a critical problem considering the fact that...
After hitting the power wall, the dramatic change in computer architecture from single core to multicore/manycore brings us new challenges on high performance computing, especially for the data intensive applications. Sparse matrix-vector multiplication (SpMV) is one of the most important computations in this area, and has therefore received a lot of attention in recent decades. In contrast to the...
This paper proposes a cache management scheme for mul-tiprogrammed, multithreaded applications, with the objective of obtaining maximum performance for both individual applications and the multithreaded workload mix. In this scheme, each individual application's performance is improved by increasing the priority of its slowest thread, while the overall system performance is improved by ensuring that...
Modern architectures become more susceptible to transient errors with the scale down of circuits. This makes reliability an increasingly critical concern in computer systems. In general, there is a tradeoff between system reliability and performance of multithreaded applications running on multicore architectures. In this paper, we conduct a performance-reliability analysis for different parallel...
Motivated by the observation that most existing data locality optimizations do not specifically target shared last-level caches of emerging multicores and that even multicore-specific locality-oriented techniques employ either loop or data layout optimizations but not both, in this paper we present an integrated loop and data layout optimization strategy, with the goal of improving the last-level...
This paper presents and evaluates a cache hierarchy-aware code parallelization/mapping and scheduling strategy for multicore architectures. Our proposed parallelization/mapping strategy determines a loop iteration-to-core mapping by taking into account the data access pattern of an application and the on-chip cache hierarchy of a target architecture. The goal of this step is to maximize data locality...
The emergence of multicore platforms offers several opportunities for boosting application performance. These opportunities, which include parallelism and data locality benefits, require strong support from compilers as well as operating systems. Current compiler research targeting multicores mostly focuses on code restructuring and mapping. In this work, we explore automatic data layout transformation...
Energy-Delay-Product-aware DVFS is a widely-used technique that improves energy efficiency by dynamically adjusting the frequencies of cores. Further, for multithreaded applications, barrier-aware DVFS is a method that can dynamically tune the frequencies of cores to reduce barrier stall times and achieve higher energy efficiency. In both forms of DVFS, frequencies of cores are reduced from the maximum...
Focusing on the problem of how to partition the cache space given to a multithreaded application across its threads, we show that different threads of a multithreaded application can have different cache space requirements, propose a fully automated, dynamic, intra-application cache partitioning scheme targeting emerging multicores with multilayer cache hierarchies, present a comprehensive experimental...
We propose a variation-aware source routing algorithm for a heterogenous NoC where each router has a different operating latency, as a result of process variations. Our proposed scheme computes the best path for each communication, based on the inherent speed of the routers (dictated by process variations) and the current traffic pattern. Our results indicate that employing our proposed routing scheme...
Efficient management of shared on-chip resources such as the shared level 2 (L2) cache has become an important problem with the emergence of chip multiprocessors (CMPs). Partitioning the shared cache in chip multiprocessors (CMPs) among concurrently executing applications can provide important benefits such as throughput improvement, fairness guarantees, and quality of service (QoS) enhancements....
In this paper, we employ formal feedback control theory to achieve desired communication throughput across a network-on-chip (NoC) based multicore. When the output of the system needs to follow a certain reference input over time, our controller regulates the system to obtain the desired effect on the output. In this work, targeting a multicore that executes multiple applications simultaneously, we...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.