Many future heterogeneous systems will integrate CPUs and GPUs physically on a single chip and logically connect them via shared memory to avoid explicit data copying. Making this shared memory coherent facilitates programming and fine-grained sharing, but throughput-oriented GPUs can overwhelm CPUs with coherence requests not well-filtered by caches. Meanwhile, region coherence has been proposed...
Performance, power, and energy (PPE) are critical aspects of modern computing. It is challenging to accurately predict, in real time, the effect of dynamic voltage and frequency scaling (DVFS) on PPE across a wide range of voltages and frequencies. This results in the use of reactive, iterative, and inefficient algorithms for dynamically finding good DVFS states. We propose PPEP, an online PPE prediction...
Modern heterogeneous multiprocessors integrate the CPU and GPU to boost computational performance. Data sharing and communication between the CPU and GPU are critical to the final speedup. Tighter integration of the CPU and GPU allows data to be shared and moved more efficiently, making better use of the computational power that a GPU can provide. Initially,...
Implementing video applications on emerging multi-core processors is a promising technique for personal, real-time multi-media applications. However, when porting the legacy parallel video encoders developed for clusters to shared-memory multi-cores, the existing parallel algorithms result in workload imbalances on different cores and communication inefficiencies. This paper describes a strip-wise...
Future CMPs will combine many simple cores with deep cache hierarchies. With more cores, cache resources per core are fewer, and must be shared carefully to avoid poor utilization due to conflicts and pollution. Explicit motion of data in these architectures, such as message passing, can provide hints about program behavior that can be used to hide latency and improve cache behavior. However, to make...
The Message Orchestration and Performance Enhancement Device (MOPED) provides an explicit hardware communication mechanism that offloads synchronization and data communication from CPUs to enable overlap between computation and communication, while also transferring data efficiently. The device achieves significant improvement in performance of real applications and reduction of on-chip cache misses,...