Search results

chapter

Self-Managed Component-Based Software Architecture for Business Process Management

Bassem Debbabi, Thomas Calmant, Olivier Gattaz, Sandra Massonnat, more

2015 IEEE International Conference on Autonomic Computing > 145 - 146

2015 IEEE International Conference on Autonomic Computing (ICAC)

While the functions of Business Process Management (BPM) tools are already studied and standardized, new challenges regarding the architecture of such tools are emerging, including the need for more scalability to support increasing demands and more resilience of the overall solution to detect and avoid third-party code problems that can cause the whole system to fail. In this paper we...

chapter

Complete Runtime Tracing for Device Drivers Based on LLVM

Jia-Ju Bai, Hu-Qiu Liu, Yu-Ping Wang, Shi-Min Hu

2015 IEEE 39th Annual Computer Software and Applications Conference > 2 > 200 - 209

2015 IEEE 39th Annual Computer Software and Applications Conference (COMPSAC)

Device drivers often suffer from many more bugs than the kernel, so testing device drivers is becoming increasingly important. In software testing, runtime tracing is an important technique for monitoring the procedures a program actually executes. Meanwhile, runtime information can also help the programmer analyze the program more accurately, like verifying the correctness of code...

chapter

A GPU-accelerated two stage visual matching pipeline for image and video retrieval

Hannes Fassold, Harald Stiegler, Jakub Rosner, Marcus Thaler, more

2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI) > 1 - 5

2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)

We propose a two stage visual matching pipeline including a first step using VLAD signatures for filtering results, and a second step which reranks the top results using raw matching of SIFT descriptors. This enables adjusting the tradeoff between high computational cost of matching local descriptors and the insufficient accuracy of compact signatures in many application scenarios. We describe GPU...

chapter

Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters

Raghunath Raja Chandrasekar, Akshay Venkatesh, Khaled Hamidouche, Dhabaleswar K. Panda

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 261 - 270

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Checkpoint-restart is a predominantly used reactive fault-tolerance mechanism for applications running on HPC systems. While innumerable studies in the literature have analyzed, and optimized for, the performance and scalability of a variety of checkpointing protocols, not much research has been done from an energy or power perspective. The limited number of studies conducted along this...

chapter

Heterogeneous Habanero-C (H2C): A Portable Programming Model for Heterogeneous Processors

Deepak Majeti, Vivek Sarkar

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 708 - 717

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

Heterogeneous architectures with their diverse architectural features impose significant programmability challenges. Existing programming systems involve non-trivial learning and are not productive, not portable, and are challenging to tune for performance. In this paper, we introduce Heterogeneous Habanero-C (H2C), which is an implementation of the Habanero execution model for modern heterogeneous...

chapter

Performance Portable Applications for Hardware Accelerators: Lessons Learned from SPEC ACCEL

Guido Juckeland, Alexander Grund, Wolfgang E. Nagel

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 689 - 698

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU and an Intel Xeon Phi with respect to performance and energy consumption. It also provides observations...

chapter

Fast Sparse Matrix and Sparse Vector Multiplication Algorithm on the GPU

Carl Yang, Yangzihao Wang, John D. Owens

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 841 - 847

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

We implement a promising algorithm for sparse-matrix sparse-vector multiplication (SpMSpV) on the GPU. An efficient k-way merge lies at the heart of finding a fast parallel SpMSpV algorithm. We examine the scalability of three approaches -- no sorting, merge sorting, and radix sorting -- in solving this problem. For breadth-first search (BFS), we achieve a 1.26x speedup over state-of-the-art sparse-matrix...

chapter

Bridging the Gap between Performance and Bounds of Cholesky Factorization on Heterogeneous Platforms

Emmanuel Agullo, Olivier Beaumont, Lionel Eyraud-Dubois, Julien Herrmann, more

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 34 - 45

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

We consider the problem of allocating and scheduling dense linear algebra applications on fully heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the Cholesky factorization since it exhibits the main features of such problems. Indeed, the relative performance of CPU and GPU highly depends on the sub-routine: GPUs are for instance much more efficient to process regular kernels such...

chapter

Linux XIA: an interoperable meta network architecture to crowdsource the future internet

Michel Machado, Cody Doucette, John W. Byers

2015 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS) > 147 - 158

2015 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)

With the growing number of proposed clean-slate redesigns of the Internet, the need for a medium that enables all stakeholders to participate in the realization, evaluation, and selection of these designs is increasing. We believe that the missing catalyst is a meta network architecture that welcomes most, if not all, clean-slate designs on a level playing field, lowers deployment barriers, and leaves...

chapter

In memory detection of Windows API call hooking technique

Syed Zainudeen Mohd Shaid, Mohd Aizaini Maarof

2015 International Conference on Computer, Communications, and Control Technology (I4CT) > 294 - 298

2015 International Conference on Computer, Communications, and Control Technology (I4CT)

API call hooking is a technique that malware researchers use to mine malware's API calls. These API calls are used to represent malware's behavior, for use in malware analysis, classification or detection of samples. In this paper, an analysis of current Windows API call hooking techniques is presented; surprisingly, it was found that each technique can be detected trivially in memory. This...

chapter

PACXX: Towards a Unified Programming Model for Programming Accelerators Using C++14

Michael Haidl, Sergei Gorlatch

2014 LLVM Compiler Infrastructure in HPC > 1 - 11

2014 LLVM Compiler Infrastructure in HPC (LLVM-HPC)

We present PACXX -- a unified programming model for programming many-core systems that comprise accelerators like Graphics Processing Units (GPUs). One of the main difficulties of current GPU programming is that two distinct programming models are required: the host code for the CPU is written in C/C++ with the restricted, C-like API for memory management, while the device code for the GPU has...

chapter

QTrace: a framework for customizable full system instrumentation

Xin Tong, Andreas Moshovos

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 245 - 255

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

This work presents QTrace, an open-source instrumentation extension API for QEMU (1) that can instrument unmodified applications and OS binaries for uni- and multi-processor systems. QTrace facilitates the development of custom, full-system instrumentation tools for the X86 guest architecture enabling statistics collection and program execution studies including system-level code. This paper: illustrates...

chapter

Implementation of numerical methods for nanoscaled semiconductor device simulation using OpenCL

E. Coronado-Barrientos, A. Garcia-Loureiro, G. Indalecio, N. Seoane

2015 10th Spanish Conference on Electron Devices (CDE) > 1 - 4

2015 10th Spanish Conference on Electron Devices (CDE)

The present work implements OpenCL solvers for the FGMRES and preconditioned BCGSTAB algorithms. These solvers are integrated in a 3-D simulation tool for nanoscaled MOSFET transistors. Simulations are run on two different devices: NVIDIA Tesla S2050 and Intel Xeon Phi 3120A. The resulting execution times are compared against the optimized PSPARSLIB version of the FGMRES solver...

chapter

Free launch: Optimizing GPU dynamic kernel launches through thread reuse

Guoyang Chen, Xipeng Shen

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) > 407 - 419

2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Supporting dynamic parallelism is important for GPUs to benefit a broad range of applications. There are currently two fundamental ways for programs to exploit dynamic parallelism on GPUs: a software-based approach with software-managed worklists, and a hardware-based approach through dynamic subkernel launches. Neither is satisfactory. The former is complicated to program and is often subject to some...

chapter

CilkSpec: optimistic concurrency for Cilk

Shaizeen Aga, Sriram Krishnamoorthy, Satish Narayanasamy

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis

Recursive parallel programming models such as Cilk strive to simplify the task of parallel programming by enabling a simple divide-and-conquer programming model. This model is effective in recursively partitioning work into smaller parts and combining their results. However, recursive work partitioning can impose more constraints on concurrency than are implied by the true dependencies in a program...

chapter

VOCL-FT: introducing techniques for efficient soft error coprocessor recovery

Antonio J. Peña, Wesley Bland, Pavan Balaji

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis > 1 - 12

SC15: International Conference for High Performance Computing, Networking, Storage and Analysis

Popular accelerator programming models rely on offloading computation operations and their corresponding data transfers to the coprocessors, leveraging synchronization points where needed. In this paper we identify and explore how such a programming model enables optimization opportunities not utilized in traditional checkpoint/restart systems, and we analyze them as the building blocks for an efficient...

chapter

Real-time multi-core components for cyber-physical systems

Michael Wahler, Manuel Oriol, Aurelien Monot

2015 18th International ACM SIGSOFT Symposium on Component-Based Software Engineering (CBSE) > 37 - 42

2015 18th International ACM SIGSOFT Symposium on Component-Based Software Engineering (CBSE)

Developing correct, efficient, and maintainable real-time control software for cyber-physical systems is a notoriously difficult interdisciplinary challenge. Ever more complex control algorithms and the advent of multi-core hardware in embedded systems have made this challenge even harder. Component-based software development promises to help reduce the complexity and to increase the timing predictability...

chapter

Specializing Compiler Optimizations through Programmable Composition for Dense Matrix Computations

Qing Yi, Qian Wang, Huimin Cui

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 596 - 608

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

General-purpose compilers aim to extract the best average performance for all possible user applications. Due to the lack of specialization for different types of computations, compiler-attained performance often lags behind that of manually optimized libraries. In this paper, we demonstrate a new approach, programmable composition, to enable the specialization of compiler optimizations without...

chapter

Power and performance analysis of the Graph 500 benchmark on the Single-chip Cloud Computer

Zhiquan Lai, King Tin Lam, Cho-Li Wang, Jinshu Su

Proceedings of 2014 International Conference on Cloud Computing and Internet of Things > 9 - 13

2014 International Conference on Cloud Computing and Internet of Things (CCIOT)

The concerns of data-intensiveness and energy awareness are actively reshaping the design of high-performance computing (HPC) systems nowadays. The Graph500 is a widely adopted benchmark for evaluating the performance of computing systems for data-intensive workloads. In this paper, we introduce a data-parallel implementation of Graph500 on the Intel Single-chip Cloud Computer (SCC). The SCC features...

chapter

Runtime Checking for Paired Functions in Device Drivers

Jia-Ju Bai, Hu-Qiu Liu, Yu-Ping Wang, Shi-Min Hu

2014 21st Asia-Pacific Software Engineering Conference > 1 > 407 - 414

2014 21st Asia-Pacific Software Engineering Conference (APSEC)

Device drivers usually invoke functions to allocate resources for managing hardware devices and communicating with the kernel, and these resources should be released by functions when the work is finished. Thus allocating functions and releasing functions must be invoked in pairs. However, many developers ignore this vital rule, and some allocated resources are not released in time, which may cause...

INFONA - science communication portal
