Search results

Items from 1 to 14 out of 14 results

article

The Unicorn Runtime: Efficient Distributed Shared Memory Programming for Hybrid CPU-GPU Clusters

Tarun Beri, Sorav Bansal, Subodh Kumar

IEEE Transactions on Parallel and Distributed Systems > 2017 > 28 > 5 > 1518 - 1534

Programming hybrid CPU-GPU clusters is hard. This paper addresses this difficulty and presents the design and runtime implementation of Unicorn, a parallel programming model for hybrid CPU-GPU clusters. In particular, this paper proves that efficient distributed shared memory style programming is possible and its simplicity can be retained across CPUs...

chapter

Scheduling challenges and opportunities in integrated CPU+GPU processors

Kapil Dev, Sherief Reda

2016 14th ACM/IEEE Symposium on Embedded Systems For Real-time Multimedia (ESTIMedia) > 1 - 6

Heterogeneous processors with architecturally different devices (CPU and GPU) integrated on the same die provide good performance and energy efficiency for a wide range of workloads. However, they also create challenges and opportunities in terms of scheduling workloads on the appropriate device. Current scheduling practices mainly use the characteristics of kernel workloads to decide the CPU/GPU mapping...

chapter

VarySched: A Framework for Variable Scheduling in Heterogeneous Environments

Tim Süß, Nils Döring, Ramy Gad, Lars Nagel, more

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 489 - 492

Despite many efforts to better utilize the potential of GPUs and CPUs, this potential is far from being fully exploited. Although many tasks can be easily sped up by using accelerators, most of the existing schedulers are not flexible enough to really optimize the resource usage of the complete system. The main reasons are (i) that each processing unit requires specific program code and that this code is often...

chapter

CoBaS: Introducing a Component Based Scheduling Framework

Anselm Busse, Reinhardt Karnapke, Hans-Ulrich Heiss

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW) > 79 - 84

Many-core systems and heterogeneous systems are becoming more and more common and may soon enter the mainstream market. To harness their capabilities to their full potential, the runtime system's scheduling policies have to be adapted and, in many cases, tailored to the specific system. The runtime system can be either an operating system or the management infrastructure of an infrastructure as a service...

chapter

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

Ashwin Mandayam Aji, Antonio J. Pena, Pavan Balaji, Wu-chun Feng

2015 IEEE International Conference on Cluster Computing (CLUSTER) > 42 - 51

OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue," to a specific device for the entire program. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort that requires a thorough understanding...
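A minimal sketch of what automating the queue-device mapping could look like, assuming per-device runtime estimates for each queue's kernels are already available (for example from a profiling run). The function name `map_queues`, the estimate table, and the longest-processing-time greedy heuristic are illustrative assumptions, not the scheduler described in the paper and not part of any OpenCL API.

```python
def map_queues(est, devices):
    """Assign each command queue to one device.

    est[q][d] is an assumed estimate of how long queue q's kernels take on device d.
    Returns (mapping, load): queue -> device, and the summed estimated busy time
    per device (queues sharing a device are assumed to run one after another).
    """
    load = {d: 0.0 for d in devices}
    mapping = {}
    # Place the most expensive queues first (longest-processing-time heuristic),
    # each on the device where it would finish earliest given the load so far.
    for q in sorted(est, key=lambda q: min(est[q].values()), reverse=True):
        best = min(devices, key=lambda d: load[d] + est[q][d])
        mapping[q] = best
        load[best] += est[q][best]
    return mapping, load


# Hypothetical estimates: q0 and q1 suit the GPU, q2 runs better on the CPU.
est = {
    "q0": {"cpu": 4.0, "gpu": 1.0},
    "q1": {"cpu": 4.0, "gpu": 1.0},
    "q2": {"cpu": 2.0, "gpu": 3.0},
}
mapping, load = map_queues(est, ["cpu", "gpu"])
print(mapping)  # {'q2': 'cpu', 'q0': 'gpu', 'q1': 'gpu'}
```

The mapping here is decided once, up front; a runtime could instead revisit it as measured kernel times replace the estimates.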

chapter

Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms

Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, more

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW) > 34 - 45

We consider the problem of allocating and scheduling dense linear algebra applications on fully heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the Cholesky factorization since it exhibits the main features of such problems. Indeed, the relative performance of CPU and GPU highly depends on the subroutine: GPUs are, for instance, much more efficient at processing regular kernels such...
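To make the point about kernel-dependent CPU/GPU performance concrete, here is a small sketch that generates the task graph of a right-looking tiled Cholesky factorization and assigns each task greedily to whichever worker would finish it earliest. The per-kernel timings in `TIMES` are made-up placeholders, and this greedy earliest-finish-time heuristic is a generic list-scheduling baseline, not the allocation strategy or the bounds studied in the paper.

```python
# Hypothetical per-tile kernel times in arbitrary units; real values must be measured.
TIMES = {
    "POTRF": {"cpu": 1.0, "gpu": 2.0},
    "TRSM": {"cpu": 3.0, "gpu": 1.0},
    "SYRK": {"cpu": 3.0, "gpu": 0.6},
    "GEMM": {"cpu": 6.0, "gpu": 0.8},
}


def cholesky_tasks(n):
    """Yield (kernel, tiles_read, tile_written) for an n x n tiled Cholesky, in sequential order."""
    for k in range(n):
        yield "POTRF", [(k, k)], (k, k)
        for i in range(k + 1, n):
            yield "TRSM", [(k, k), (i, k)], (i, k)
        for i in range(k + 1, n):
            yield "SYRK", [(i, k), (i, i)], (i, i)
            for j in range(k + 1, i):
                yield "GEMM", [(i, k), (j, k), (i, j)], (i, j)


def schedule(n, workers):
    """Greedy earliest-finish-time list scheduling.

    workers is a list of worker kinds, e.g. ["cpu", "cpu", "gpu"].
    Returns a list of (kernel, worker_kind, start, finish) in submission order.
    """
    free_at = [0.0] * len(workers)  # time at which each worker becomes idle
    tile_ready = {}                 # finish time of the last task that wrote each tile
    plan = []
    for kernel, reads, write in cholesky_tasks(n):
        # Tracking only the last writer is enough for Cholesky: every tile a task
        # reads is either its own output tile or a finished tile never written again.
        ready = max(tile_ready.get(t, 0.0) for t in reads)
        best = None
        for w, kind in enumerate(workers):
            start = max(ready, free_at[w])
            finish = start + TIMES[kernel][kind]
            if best is None or finish < best[2]:
                best = (w, start, finish)
        w, start, finish = best
        free_at[w] = finish
        tile_ready[write] = finish
        plan.append((kernel, workers[w], start, finish))
    return plan


# Usage: two CPU cores plus one GPU on a 4 x 4 tile grid.
plan = schedule(4, ["cpu", "cpu", "gpu"])
makespan = max(finish for _kernel, _kind, _start, finish in plan)
```

With these placeholder numbers the GPU attracts most GEMM and SYRK updates while small POTRF tasks can land on a CPU core, which is the kind of per-subroutine affinity the abstract describes.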

chapter

SM-centric transformation: Circumventing hardware restrictions for flexible GPU scheduling

Bo Wu, Guoyang Chen, Dong Li, Xipeng Shen, more

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) > 497 - 498

To circumvent the limitations imposed by the hardware scheduler on GPUs, we create an SM-centric transformation technique. This technique enables complete control of the mapping between tasks and streaming multiprocessors (SMs), and enables control of the number of active thread blocks on each SM. Results show that our approach achieves better speedups than previous approaches in kernel co-run cases.

chapter

Preemptive thread block scheduling with online structural runtime prediction for concurrent GPGPU kernels

Sreepathi Pai, R. Govindarajan, Matthew J. Thazhuthaveetil

2014 23rd International Conference on Parallel Architecture and Compilation (PACT) > 483 - 484

Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) currently uses the FIFO policy to schedule thread blocks of concurrent kernels. We show that the FIFO policy leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the preemptive...
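The order dependence of FIFO dispatch is easy to reproduce in a toy model. The sketch below assumes identical thread-block slots and a fixed run time per block, simplifications that ignore real per-SM resource limits, and it models only the FIFO baseline, not the preemptive policy the paper proposes. Blocks of a later-launched kernel are dispatched only after every block of earlier kernels has been dispatched.

```python
import heapq


def fifo_finish_times(kernels, slots):
    """Simulate FIFO thread-block dispatch.

    kernels: list of (name, num_blocks, block_time) in launch order.
    slots:   number of thread blocks that can be resident at once (assumed uniform).
    Returns {name: time at which the kernel's last block finishes}.
    """
    free = [0.0] * slots  # min-heap of times at which each slot becomes free
    finish = {}
    for name, num_blocks, block_time in kernels:
        for _ in range(num_blocks):
            start = heapq.heappop(free)  # earliest available slot
            end = start + block_time
            heapq.heappush(free, end)
            finish[name] = max(finish.get(name, 0.0), end)
    return finish


print(fifo_finish_times([("long", 64, 10.0), ("short", 8, 1.0)], slots=8))
# {'long': 80.0, 'short': 81.0}
print(fifo_finish_times([("short", 8, 1.0), ("long", 64, 10.0)], slots=8))
# {'short': 1.0, 'long': 81.0}
```

Swapping only the launch order moves the short kernel's completion from 81.0 to 1.0 time units while the long kernel barely changes, which illustrates how FIFO leaves turnaround to chance.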

chapter

Dhara: A Service Abstraction-Based OS Kernel Design Model

Dharanipragada Janakiram, Hemang Mehta, S.J. Balaji

2012 IEEE 17th International Conference on Engineering of Complex Computer Systems (ICECCS) > 127 - 136

Traditional procedural operating system (OS) kernels sacrifice maintainability and understandability for optimum performance. Though object-oriented (OO) kernels can address these problems to a certain extent, they lack the layered approach of services and service compositions. We present a new kernel design model, Dhara, that raises the level of abstraction from objects and procedures to services...

chapter

Strider: Runtime Support for Optimizing Strided Data Accesses on Multi-Cores with Explicitly Managed Memories

Jae-Seung Yeom, Dimitrios S Nikolopoulos

2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 11

Multi-core processors with explicitly-managed local memories provide advanced capabilities to optimize data caching and prefetching in software. Unfortunately, these capabilities are neither easily accessible to programmers, nor exploited to their maximum potential by current language, compiler, or runtime frameworks. We present Strider, a runtime framework for optimizing compilers on multi-core processors...

chapter

Improving file tree traversal performance by scheduling I/O operations in user space

C.H. Lunde, H. Espeland, H.K. Stensland, P. Halvorsen

2009 IEEE 28th International Performance Computing and Communications Conference (IPCCC 2009) > 145 - 152

Current in-kernel disk schedulers provide efficient means to optimize the order (and minimize disk seeks) of issued, in-queue I/O requests. However, they fail to optimize sequential multi-file operations, like traversing a large file tree, because only requests from one file are available in the scheduling queue at a time. We have therefore investigated a user-level, I/O request sorting approach to...
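A rough user-space illustration of the idea: gather every file in the tree first, then read them in an order that approximates their on-disk layout instead of directory order. This sketch assumes the inode number is a usable proxy for physical location, which holds only loosely on some file systems; the paper's actual sort key and implementation are not reproduced here, and the function names are hypothetical.

```python
import os
import stat


def files_sorted_by_inode(root):
    """Return the regular files under root, ordered by inode number."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            if stat.S_ISREG(st.st_mode):
                entries.append((st.st_ino, path))
    entries.sort()  # sort key: (inode, path)
    return [path for _ino, path in entries]


def read_tree(root, chunk_size=1 << 16):
    """Read every file in inode order and return the total number of bytes read."""
    total = 0
    for path in files_sorted_by_inode(root):
        with open(path, "rb") as f:
            while block := f.read(chunk_size):
                total += len(block)
    return total
```

Collecting the whole request set before issuing reads is what gives the sort something to work with; an in-kernel scheduler that sees one file's requests at a time cannot reorder across files.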

chapter

Runtime CPU scheduler customization framework for a flexible mobile operating system

Nasr Addin Al-maweri, Khairulmizam Samsudin, Fakhrul Zamani Rokhani

2009 7th IEEE Student Conference on Research and Development (SCOReD 2009) > 85 - 88

Mobile operating systems should adapt to the requirements of different applications, such as multimedia, games, video and audio applications, and mobile calls. Process scheduling is considered the most important part of the mobile operating system, as it is responsible for adapting the operating system to these application requirements. In this work, the architecture for a runtime CPU scheduler...

chapter

Compiler assisted runtime task scheduling on a reconfigurable computer

M. Sabeghi, V.-M. Sima, K. Bertels

2009 International Conference on Field Programmable Logic and Applications (FPL) > 44 - 50

Multitasking reconfigurable computers with one or more reconfigurable processors have been used increasingly over the past few years. One of the major challenges in such systems is the scheduling and allocation of tasks on the reconfigurable fabric. In this paper we present a two-level scheduling mechanism for tightly coupled reconfigurable architecture machines. To overcome the complexity of...

chapter

Predicting Parameter Sweep Jobs: From Simulation to Grid Implementation

P. Hellinckx, S. Verboven, F. Arickx, J. Broeckhove

2009 International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2009) > 402 - 408

Efficiently using the computational power made available through desktop-grid-based distributed systems is a complicated and many-sided problem, caused by intermittent resource availability. In this paper a novel solution is presented for predicting the runtimes of parameter sweep jobs. These jobs are characterized by their lack of inter-dependence and suitability for runtime prediction by modeling...
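As a deliberately simple stand-in for runtime prediction, the sketch below estimates an unseen job's runtime as the mean runtime of the k completed jobs whose parameter vectors are closest in Euclidean distance. This nearest-neighbour scheme, the function name, and the sample data are assumptions for illustration; it is not the prediction model proposed in the paper.

```python
import math


def predict_runtime(history, params, k=2):
    """Predict a job's runtime from completed jobs.

    history: list of (parameter_tuple, observed_runtime) for finished jobs.
    params:  parameter tuple of the job to predict.
    Returns the mean runtime of the k nearest finished jobs.
    """
    if not history:
        raise ValueError("need at least one completed job")
    nearest = sorted(history, key=lambda job: math.dist(job[0], params))[:k]
    return sum(runtime for _p, runtime in nearest) / len(nearest)


# Hypothetical history of a one-parameter sweep.
history = [((1.0,), 10.0), ((2.0,), 20.0), ((10.0,), 100.0)]
print(predict_runtime(history, (1.5,)))  # 15.0
```

Because sweep jobs do not depend on each other, each finished job can simply be appended to `history`, so the estimates can improve as the sweep progresses.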

Filter options

Data set:
ieee
Keywords:
KERNEL
RUNTIME
SCHEDULING

Publication type

book (13)
article (1)

Keywords

PROCESSOR SCHEDULING (7)
HARDWARE (5)
GRAPHICS PROCESSING UNITS (4)
LINUX (4)
OPTIMIZATION (4)
SCHEDULES (4)
COMPUTER ARCHITECTURE (2)
DYNAMIC SCHEDULING (2)
ELECTRONIC MAIL (2)
INSTRUCTION SETS (2)
PERFORMANCE EVALUATION (2)
PREDICTIVE MODELS (2)
RESOURCE ALLOCATION (2)
ACCELERATORS (1)
AGING (1)
ARRAYS (1)
AVAILABILITY (1)
BUFFER STORAGE (1)
BULK SYNCHRONOUS PARALLELISM (1)
CACHE STORAGE (1)
CHOLESKY FACTORIZATION (1)
COBRA (1)
COMPILER ASSISTED RUNTIME TASK SCHEDULING (1)
COMPILER SUPPORT (1)
COMPOSITION (1)
CONFIGURATION CALL GRAPH (1)
CONTEXT (1)
COUPLINGS (1)
CPU SCHEDULER CUSTOMIZATION (1)
DATA CACHING (1)
DATA MINING (1)
DATA PLACEMENT INFORMATION (1)
DATA TRANSFER SCHEDULING (1)
DENSE LINEAR ALGEBRA (1)
DESKTOP GRID (1)
DESKTOP GRID BASED DISTRIBUTED SYSTEM (1)
DISK BLOCKS (1)
DISTANCE MEASUREMENT (1)
DISTRIBUTED COMPUTING (1)
DISTRIBUTED SYSTEM DESIGN (1)
DYNAMIC SCHEDULERS (1)
ENERGY AWARE (1)
EXPLICIT SOFTWARE-MANAGED ACCESS SCHEDULING (1)
EXPLICIT-MANAGED LOCAL MEMORIES (1)
FIELD PROGRAMMABLE GATE ARRAYS (1)
FILE TREE TRAVERSAL PERFORMANCE (1)
FLEXIBLE MOBILE OPERATING SYSTEM (1)
GIPSY (1)
GPGPU (1)
GRID COMPUTING (1)
GRID INFORMATION PREDICTION SYSTEM (1)
HAND-OPTIMIZED CODE (1)
HETEROGENEITY (1)
HETEROGENEOUS COMPUTING (1)
HETEROGENEOUS CPU+GPU PROCESSORS (1)
HETEROGENEOUS RESOURCES (1)
I/O OPERATIONS SCHEDULING (1)
I/O REQUEST SORTING (1)
IBM CELL PROCESSOR (1)
IN-KERNEL DISK SCHEDULERS (1)
INFRARED IMAGING (1)
INODES (1)
INPUT-OUTPUT PROGRAMS (1)
INTER-FILE DISK ARM MOVEMENTS (1)
KERNEL DESIGN (1)
LINEAR ALGEBRA (1)
LINUX KERNEL (1)
LOAD BALANCING (1)
MANY-CORE (1)
MICROPROCESSOR CHIPS (1)
MOBILE COMMUNICATION (1)
MOBILE COMPUTING (1)
MOBILE DEVICES (1)
MOBILE OPERATING SYSTEMS (1)
MODELLING (1)
MULTICORE PROCESSORS (1)
MULTIDIMENSIONAL ARRAY ANALYSIS (1)
MULTIPROCESSING SYSTEMS (1)
MULTISTRIDE ACCESSES (1)
OPENCL (1)
OPENCL KERNELS (1)
OPERATING SYSTEMS (COMPUTERS) (1)
OPTIMISATION (1)
OPTIMISING COMPILERS (1)
OPTIMIZING COMPILERS (1)
PARALLEL PROCESSING (1)
PARAMETER SWEEP JOB (1)
PGS (1)
PREDICTION BASED GRID SCHEDULING (1)
PREFETCHING (1)
PROBABILITY DENSITY FUNCTION (1)
PROCESS SCHEDULING (1)
PROGRAM CO-RUN (1)
PROGRAM COMPILERS (1)
PROGRAM PROCESSORS (1)
PROGRAMMING (1)
RECONFIGURABLE ARCHITECTURE MACHINES (1)

INFONA - science communication portal
