The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Programming hybrid CPU-GPU clusters is hard. This paper addresses this difficulty and presents the design and runtime implementation of <bold/><bold>Unicorn</bold><bold/>—a parallel programming model for hybrid CPU-GPU clusters. In particular, this paper proves that efficient distributed shared memory style programing is possible and its simplicity can be retained across CPUs...
Heterogeneous processors with architecturally different devices (CPU and GPU) integrated on the same die provide good performance and energy efficiency for wide range of workloads. However, they also create challenges and opportunities in terms of scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping...
Despite many efforts to better utilize the potential of GPUs and CPUs, it is far from being fully exploited. Although many tasks can be easily sped up by using accelerators, most of the existing schedulers are not flexible enough to really optimize the resource usage of the complete system. The main reasons are (i) that each processing unit requires a specific program code and that this code is often...
Many-Core systems and heterogeneous systems are getting more and more common and may soon enter the mainstream market. To harvest their capabilities to their full potential, the runtime system's scheduling policies have to be adapted and, in many cases, tailored to the specific system. The runtime system can be both an operating system or management infrastructure of an infrastructure as a service...
OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue," to a specific device for the entire program. For best performance, the user has to find the ideal queue -- device mapping at command queue creation time, an effort that requires a thorough understanding...
We consider the problem of allocating and scheduling dense linear application on fully heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the Cholesky factorization since it exhibits the main features of such problems. Indeed, the relative performance of CPU and GPU highly depends on the sub-routine: GPUs are for instance much more efficient to process regular kernels such...
To circumvent the limitation from the hardware scheduler on GPU, we create an SM-centric transformation technique. This technique enables complete control of the mapping between tasks and streaming multi-processors (SMs), and enables controlling the number of active thread blocks on each SM. Results show that our approach achieves better speedup than previous ones with kernel co-run cases.
Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) currently uses the FIFO policy to schedule thread blocks of concurrent kernels. We show that the FIFO policy leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the preemptive...
Traditional procedural operating system (OS) kernels sacrifice maintainability and understandability for optimum performance. Though object oriented (OO) kernels can address these problems up to a certain extent, they lack the layered approach of services and service compositions. We present a new kernel design model Dhara, that raises the level of abstraction from objects and procedures to services...
Multi-core processors with explicitly-managed local memories provide advanced capabilities to optimize data caching and prefetching in software. Unfortunately, these capabilities are neither easily accessible to programmers, nor exploited to their maximum potential by current language, compiler, or runtime frameworks. We present Strider, a runtime framework for optimizing compilers on multi-core processors...
Current in-kernel disk schedulers provide efficient means to optimize the order (and minimize disk seeks) of issued, in-queue I/O requests. However, they fail to optimize sequential multi-file operations, like traversing a large file tree, because only requests from one file are available in the scheduling queue at a time. We have therefore investigated a user-level, I/O request sorting approach to...
Mobile operating systems should adapt to different applications requirement such as multimedia, games, video and audio applications, and mobile calls, etc. Process scheduling is considered as the most important part of the mobile operating system, which has the responsibility for adapting the operating systems to these applications requirements. In this work, the architecture for a runtime CPU scheduler...
Multitasking reconfigurable computers with one or more reconfigurable processors are being used increasingly during the past few years. One of the major challenges in such systems is the scheduling and allocation of the tasks on the reconfigurable fabric. In this paper we present a two level scheduling mechanism for tightly coupled reconfigurable architecture machines. To overcome the complexity of...
Efficiently using the computational power made available through desktop grids based distributed systems is a complicated and many-sided problem, caused by the intermittent resource availability. In this paper a novel solution is presented for predicting the runtimes of parameter sweep jobs. These jobs are characterized by their lack of inter-dependence and suitability for runtime prediction by modeling...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.