Exact values of spatial and temporal consumption are needed when judging the space and time complexity of an algorithm, yet few researchers have paid attention to whether their measurement methods were valid. In this paper, we discuss some key concepts involved in monitoring a process's spatial and temporal consumption, and then explain and distinguish those concepts. Further,...
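The abstract contrasts measured spatial and temporal consumption with asymptotic complexity. As a minimal sketch of the measurement concepts (not the paper's method), Python's standard library can capture both for a single call; note that `tracemalloc` tracks interpreter heap allocations only, not full process RSS:

```python
import time
import tracemalloc

def measure(func, *args):
    """Return (result, elapsed_seconds, peak_bytes) for one call.

    Temporal consumption: wall-clock time via a monotonic clock.
    Spatial consumption: peak Python heap usage via tracemalloc
    (interpreter allocations only, not the whole process).
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak

# Example: an O(n)-time, O(n)-space workload.
result, elapsed, peak = measure(lambda n: list(range(n)), 100_000)
print(len(result), elapsed > 0, peak > 0)
```

Comparing `peak` across growing inputs is what lets a measured curve be checked against a claimed complexity class.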
We propose a novel framework that improves locality-aware parallel programming models by defining a hierarchical data locality model extension. We also propose a hierarchical thread partitioning algorithm that synthesizes hierarchical thread placement layouts aimed at minimizing the program's overall communication costs. We demonstrate the effectiveness of our approach using...
We present a many-core full system simulation platform and its OpenCL runtime system. The OpenCL runtime system includes an on-the-fly compiler and resource manager for the ARM-based many-core platform. Using this platform, we evaluate approaches to work-item scheduling and memory management in the OpenCL memory hierarchy. Our experimental results show that scheduling work-items on a many-core system...
Multi-core processors are gaining a foothold in the domain of embedded automotive systems. The AUTOSAR Release 4.1 establishes a common standard for the use of multi-core processors in automotive systems. While interfaces and functionalities are well defined in the specification, the actual implementation is left open to the software manufacturers. We exploit the room left open by the specification...
Kernel minimization has already been established as a practical approach to reducing the trusted computing base. Existing solutions have largely focused on whole-system profiling - generating a globally minimal kernel image shared by all applications. However, since different applications use only part of the kernel's code base, the minimized kernel still includes an unnecessarily large...
In this paper we discuss how an efficient online model checker and a small-footprint RTOS can be integrated. Alternative approaches are discussed, leading to the choice of a federated approach. An implemented prototype is described, and some analytical as well as experimental evaluations are presented.
Programming languages have long incorporated type safety, increasing their level of abstraction and thus aiding programmers. Type safety eliminates whole classes of security-sensitive bugs, replacing the tedious and error-prone search for such bugs in each application with verifying the correctness of the type system. Despite their benefits, these protections often end at the process boundary, that...
Unified tracing is the process of collecting trace logs across the boundary of kernel and user spaces; it has been used to understand the in-depth correspondence between low-level events and application program context when diagnosing system failures and performance problems. Crossing the boundary from kernel space to user space to collect trace events from both spaces imposes challenges compared...
The increasing use of runtime-compiled applications provides an opportunity for coarse-grained reconfigurable architecture (CGRA) accelerators to be used in a user-transparent way. The challenge is to provide efficient runtime translation for CGRAs. Despite the apparent difficulties stemming from the heterogeneous nature of CGRAs, this paper demonstrates that it is possible to speed up runtime-compiled...
Utilizing accelerators in heterogeneous systems is an established approach for designing peta-scale applications. Today, CUDA offers a rich programming interface for GPU accelerators but requires developers to incorporate several layers of parallelism on both CPU and GPU. From this increasing program complexity emerges the need for sophisticated performance tools. This work contributes by analyzing...
Manycore architectures enable massively parallel computing on accelerators. The GPU in particular is a mainstay of high performance computing and is employed by the world's top supercomputers. Programming such accelerators includes developing a control program, which the accelerator executes to schedule the invocation timing of the accelerator's kernel program...
Message Passing Interface (MPI) has been the de facto programming model for scientific parallel applications. However, data-driven applications with irregular communication patterns are harder to implement using MPI. The Partitioned Global Address Space (PGAS) programming models present an alternative approach to improve programmability. PGAS languages like UPC are growing in popularity because of...
GPUs are becoming pervasive in scientific computing. Originally serving as peripheral accelerators, they are gradually turning into central computing nodes. However, most current directive-based approaches for parallelizing sequential legacy code, such as OpenACC and HMPP, simply off-load "hot" CPU code onto GPUs, entailing limitations such as unsupported external calls and coarse-grained...
Embedded systems require ever more flexibility. Several systems permit on-the-market software updates. However, these updates must be reliable; otherwise, the results can be catastrophic. Device drivers are frequently updated and are very vulnerable to this problem, requiring mechanisms able to capture errors arising from updates at runtime. This work proposes an approach for runtime errors...
To circumvent the limitations of the hardware scheduler on GPUs, we create an SM-centric transformation technique. This technique enables complete control of the mapping between tasks and streaming multi-processors (SMs), and enables controlling the number of active thread blocks on each SM. Results show that our approach achieves better speedups than previous approaches in kernel co-run cases.
Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) currently uses the FIFO policy to schedule thread blocks of concurrent kernels. We show that the FIFO policy leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the preemptive...
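Why FIFO "leaves performance to chance" can be seen with a toy simulator (a hypothetical sketch, not NVIDIA's thread block scheduler): blocks of whichever kernel launched first occupy the SMs, so a short kernel launched second waits behind a long one:

```python
from collections import deque

def fifo_schedule(kernels, num_sms):
    """Toy FIFO thread-block scheduler.

    `kernels` maps a kernel name to its number of thread blocks,
    in launch order. Blocks are dispatched to SMs strictly in that
    order, one wave of up to `num_sms` blocks at a time.
    Returns the dispatch order as (kernel, block_index) pairs.
    """
    queue = deque(
        (name, b) for name, blocks in kernels.items() for b in range(blocks)
    )
    order = []
    while queue:
        for _ in range(min(num_sms, len(queue))):
            order.append(queue.popleft())
    return order

# Kernel A (launched first, 6 blocks) vs. short kernel B (2 blocks), 2 SMs:
order = fifo_schedule({"A": 6, "B": 2}, num_sms=2)
first_b = next(i for i, (k, _) in enumerate(order) if k == "B")
print(first_b)  # → 6: B's first block waits behind all six of A's blocks
```

Under FIFO, B's turnaround time depends entirely on how much of A happened to be queued ahead of it, which is the fairness loss the abstract targets.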
Traditional means of gathering performance data are tracing, which is limited by the available storage, and profiling, which has limited accuracy. Performance modeling is often used to interpret the tracing data and generate performance predictions. We aim to complement the traditional data collection mechanisms with online performance modeling, a method that generates performance models while the...
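As a minimal illustration of the idea (not the paper's modeling method), an online model can be refit from streaming measurements and then queried instead of storing a full trace; here a simple least-squares fit of runtime against input size:

```python
def fit_linear(samples):
    """Least-squares fit t ~ a*n + b from (n, t) measurements.

    A toy stand-in for online performance modeling: as new
    measurements stream in, refit the model and use it to predict
    runtimes at input sizes that were never traced.
    """
    m = len(samples)
    sn = sum(n for n, _ in samples)
    st = sum(t for _, t in samples)
    snn = sum(n * n for n, _ in samples)
    snt = sum(n * t for n, t in samples)
    a = (m * snt - sn * st) / (m * snn - sn * sn)
    b = (st - a * sn) / m
    return a, b

# Synthetic measurements from a perfectly linear kernel: t = 2n + 5.
samples = [(n, 2 * n + 5) for n in (10, 20, 40, 80)]
a, b = fit_linear(samples)
predicted = a * 1000 + b  # extrapolate beyond the traced sizes
print(round(predicted))  # → 2005
```

The trade-off the abstract describes is visible here: the model needs only a handful of running sums (constant storage, unlike tracing) while still answering questions at arbitrary input sizes (unlike a fixed profile).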
We present AMGE, a programming framework and runtime system to decompose data and GPU kernels and execute them on multiple GPUs concurrently. AMGE exploits the remote memory access capability of recent GPUs to guarantee data accessibility regardless of its physical location, thus allowing AMGE to safely decompose and distribute arrays across GPU memories. AMGE also includes a compiler analysis to...
We present a fully automated approach to project the relative performance of an OpenCL program over different GPUs. Performance projections can be made within a small amount of time, and the projection overhead stays relatively constant with the input data size. As a result, the technique can help runtime tools make dynamic decisions about which GPU would run faster for a given kernel. Usage cases...
Spatial data mining techniques enable knowledge extraction from spatial databases. However, the high computational cost and the complexity of the algorithms are some of the main problems in this area. This work proposes a new algorithm, referred to as VDBSCAN+, which is derived from the VDBSCAN algorithm (Varied Density Based Spatial Clustering of Applications with Noise) and focuses on the use of parallelism...