The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
While the functions of Business Process Management (BPM) tools are already studied and standardized, new challenges regarding the architecture of such type of tools are emerging including the need for more scalability to support increasing demands, and more resilience of the overall solution to detect and avoid third-party code problems, that can causes failure of all the system. In this paper we...
Device drivers often suffer from much more bugs than the kernel, so testing device drivers becomes more and more important and necessary. In software testing, runtime tracing is an important technique to monitor real executing procedures of the program. Meanwhile, runtime information can also assist the programmer to make more accurate analysis of the program, like verifying the correctness of code...
We propose a two stage visual matching pipeline including a first step using VLAD signatures for filtering results, and a second step which reranks the top results using raw matching of SIFT descriptors. This enables adjusting the tradeoff between high computational cost of matching local descriptors and the insufficient accuracy of compact signatures in many application scenarios. We describe GPU...
Checkpoint-restart is a predominantly used reactive fault-tolerance mechanism for applications running on HPC systems. While there are innumerable studies in literature that have analyzed, and optimized for, the performance and scalability of a variety of check pointing protocols, not much research has been done from an energy or power perspective. The limited number of studies conducted along this...
Heterogeneous architectures with their diverse architectural features impose significant programmability challenges. Existing programming systems involve non-trivial learning and are not productive, not portable, and are challenging to tune for performance. In this paper, we introduce Heterogeneous Habanero-C (H2C), which is an implementation of the Habanero execution model for modern heterogeneous...
The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU and an Intel Xeon Phi with respect to performance and energy consumption. It also provides observations...
We implement a promising algorithm for sparse-matrix sparse-vector multiplication (SpMSpV) on the GPU. An efficient k-way merge lies at the heart of finding a fast parallel SpMSpV algorithm. We examine the scalability of three approaches -- no sorting, merge sorting, and radix sorting -- in solving this problem. For breadth-first search (BFS), we achieve a 1.26x speedup over state-of-the-art sparse-matrix...
We consider the problem of allocating and scheduling dense linear application on fully heterogeneous platforms made of CPUs and GPUs. More specifically, we focus on the Cholesky factorization since it exhibits the main features of such problems. Indeed, the relative performance of CPU and GPU highly depends on the sub-routine: GPUs are for instance much more efficient to process regular kernels such...
With the growing number of proposed clean-slate redesigns of the Internet, the need for a medium that enables all stakeholders to participate in the realization, evaluation, and selection of these designs is increasing. We believe that the missing catalyst is a meta network architecture that welcomes most, if not all, clean-state designs on a level playing field, lowers deployment barriers, and leaves...
API call hooking is a technique that malware researchers use to mine malware's API calls. These API calls is used to represent malware's behavior, for use in malware analysis, classification or detection of samples. In this paper, analysis of current Windows API call hooking techniques is presented where surprisingly, it was found that detection of each technique can be done trivially in memory. This...
We present PACXX -- a unified programming model for programming many-core systems that comprise accelerators like Graphics Processing Units (GPUs). One of the main difficulties of the current GPU programming is that two distinct programming models are required: the host code for the CPU is written in C/C++ with the restricted, C-like API for memory management, while the device code for the GPU has...
This work presents QTrace, an open-source instrumentation extension API for QEMU (1) that can instrument unmodified applications and OS binaries for uni- and multi-processor systems. QTrace facilitates the development of custom, full-system instrumentation tools for the X86 guest architecture enabling statistics collection and program execution studies including system-level code. This paper: illustrates...
The present work implements solvers with OpenCL of the FGMRES and preconditioned BCGSTAB algorithms. These solvers are integrated in a 3-D simulation tool of nanoscaled MOSFET transistors. Simulations are launched in two different platform devices: NVIDIA Tesla S2050 and Intel Xeon Phi 3120A. The resulting times of execution are compared against the optimized PSPARSLIB version of the FGMRES solver...
Supporting dynamic parallelism is important for GPU to benefit a broad range of applications. There are currently two fundamental ways for programs to exploit dynamic parallelism on GPU: a software-based approach with software-managed worklists, and a hardware-based approach through dynamic subkernel launches. Neither is satisfactory. The former is complicated to program and is often subject to some...
Recursive parallel programming models such as Cilk strive to simplify the task of parallel programming by enabling a simple divide-and-conquer programming model. This model is effective in recursively partitioning work into smaller parts and combining their results. However, recursive work partitioning can impose additional constraints on concurrency than is implied by the true dependencies in a program...
Popular accelerator programming models rely on offloading computation operations and their corresponding data transfers to the coprocessors, leveraging synchronization points where needed. In this paper we identify and explore how such a programming model enables optimization opportunities not utilized in traditional checkpoint/restart systems, and we analyze them as the building blocks for an efficient...
Developing correct, efficient, and maintainable real-time control software for cyber-physical systems is a notoriously difficult interdisciplinary challenge. Ever more complex control algorithms and the advent of multi-core hardware in embedded systems have made this challenge even harder. Component-based software development promises to help reduce the complexity and to increase the timing predictability...
General purpose compilers aim to extract the best average performance for all possible user applications. Due to the lack of specializations for different types of computations, compiler attained performance often lags behind those of the manually optimized libraries. In this paper, we demonstrate a new approach, programmable composition, to enable the specialization of compiler optimizations without...
The concerns of data-intensiveness and energy awareness are actively reshaping the design of high-performance computing (HPC) systems nowadays. The Graph500 is a widely adopted benchmark for evaluating the performance of computing systems for data-intensive workloads. In this paper, we introduce a data-parallel implementation of Graph500 on the Intel Single-chip Cloud Computer (SCC). The SCC features...
Device drivers usually invoke functions to allocate resources for managing hardware devices and communicating with the kernel, and these resources should be released by functions when the work is finished. Thus allocating functions and releasing functions must be invoked in pairs. However, many developers ignore this vital rule, and some allocated resources are not released in time, which may cause...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.