The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This work addresses the modeling of shared cache contention in multicore systems and its impact on throughput and bandwidth. We develop two simple and fast cache sharing models for accurately predicting shared cache allocations for random and LRU caches.
Simulation is an important part of the evaluation of next-generation computing systems. Detailed, cycle-accurate simulation, however, can be very slow when evaluating realistic workloads on modern microarchitectures. Sampled simulation (e.g., SMARTS and SimPoint) improves simulation performance by an order of magnitude or more through the reduction of large workloads into a small but representative...
Since the advent of the smartphone, all high-end mobile devices have required graphics acceleration in the form of a GPU. Today, even low-power devices such as smartwatches use GPUs for rendering and composition. However, the computer architecture community has largely ignored these developments when evaluating new architecture proposals.
Sampling (e.g., SMARTS and SimPoint) improves simulation performance by an order of magnitude or more through the reduction of large workloads into a small but representative sample. Virtualized fast-forwarding (e.g., FSA) speeds up simulation further by advancing execution at near-native speed between simulation points, making cache warming the critical limiting factor for simulation performance...
Cycle-level micro architectural simulation is the de-facto standard to estimate performance of next-generation platforms. Unfortunately, the level of detail needed for accurate simulation requires complex, and therefore slow, simulation models that run at speeds that are thousands of times slower than native execution. With the introduction of sampled simulation, it has become possible to simulate...
Modern processors typically employ sophisticated prefetching techniques for hiding memory latency. Hardware prefetching has proven very effective and can speed up some SPEC CPU 2006 benchmarks by more than 40% when running in isolation. However, this speedup often comes at the cost of prefetching a significant volume of useless data (sometimes more than twice the data required) which wastes shared...
Hardware prefetching has proven very effective for hiding memory latency and can speed up some applications by more than 40%. However, this speedup comes at the cost of often prefetching a significant volume of useless data which wastes shared last level cache space and off-chip bandwidth. This directly impacts the performance of co-scheduled applications which compete for shared resources in multicores...
Shared cache contention can cause significant variability in the performance of co-running applications from run to run. This variability arises from different overlappings of the applications' phases, which can be the result of offsets in application start times or other delays in the system. Understanding this variability is important for generating an accurate view of the expected impact of cache...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.