Serwis Infona wykorzystuje pliki cookies (ciasteczka). Są to wartości tekstowe, zapamiętywane przez przeglądarkę na urządzeniu użytkownika. Nasz serwis ma dostęp do tych wartości oraz wykorzystuje je do zapamiętania danych dotyczących użytkownika, takich jak np. ustawienia (typu widok ekranu, wybór języka interfejsu), zapamiętanie zalogowania. Korzystanie z serwisu Infona oznacza zgodę na zapis informacji i ich wykorzystanie dla celów korzytania z serwisu. Więcej informacji można znaleźć w Polityce prywatności oraz Regulaminie serwisu. Zamknięcie tego okienka potwierdza zapoznanie się z informacją o plikach cookies, akceptację polityki prywatności i regulaminu oraz sposobu wykorzystywania plików cookies w serwisie. Możesz zmienić ustawienia obsługi cookies w swojej przeglądarce.
With the advent of large high volume data, we have seen need for real time analytic techniques like Complex Event Processing. This paper extends a Complex Event Processing Engine to support real time identification of technical chart patterns from streaming data. Technical chart patterns are known interesting recurring patterns on time series data, and they are used by experts in time series data...
The ability to automatically detect faults or fault patterns to enhance system reliability is important for system administrators in reducing system failures. To achieve this objective, the message logs from cluster system are augmented with failure information, i.e., The raw log data is labelled. However, tagging or labelling of raw log data is very costly. In this paper, our objective is to detect...
Efficient utilization of high-performance computing (HPC) platforms is an important and complex problem. Execution models, abstract descriptions of the dynamic runtime behavior of the execution stack, have significant impact on the utilization of HPC systems. Using a computational chemistry kernel as a case study and a wide variety of execution models combined with load balancing techniques, we explore...
Many applications in network analysis require the computation of the network's laplacian pseudo-inverse - e.g., Topological centrality in social networks or estimating commute times in electrical networks. As large graphs become ubiquitous, the traditional approaches - with quadratic or cubic complexity in the number of vertices - do not scale. To alleviate this performance issue, a divide-and-conquer...
Scientific applications need to be moved among supercomputers, such as Tianhe-2 and TSUBAME 2.5. OpenACC provides a directive-based approach for a single source code base with function portability across different accelerators used in the supercomputers. However, the performance portability is not guaranteed by the OpenACC standard. Therefore, we propose a systematic optimization method, instead of...
Several recent papers have introduced a periodic verification mechanism to detect silent errors in iterative solvers. Chen [PPoPP'13, pp. 167 -- 176] has shown how to combine such a verification mechanism (a stability test checking the orthogonality of two vectors and recomputing the residual) with check pointing: the idea is to verify every d iterations, and to checkpoint every c × d iterations....
An R-tree is a data structure for organizing and querying multi-dimensional non-uniform and overlapping data. Efficient parallelization of R-tree is an important problem due to societal applications such as geographic information systems (GIS), spatial database management systems, and VLSI layout which employ R-trees for spatial analysis tasks such as map-overlay. As graphics processing units (GPUs)...
In this paper, we present a directive-based auto-tuning (AT) framework, called ppOpen-AT, and demonstrate its effect using simulation code based on the Finite Difference Method (FDM). The framework utilizes well-known loop transformation techniques. However, the codes used are carefully designed to minimize the software stack in order to meet the requirements of a many-core architecture currently...
Heterogeneous architectures with their diverse architectural features impose significant programmability challenges. Existing programming systems involve non-trivial learning and are not productive, not portable, and are challenging to tune for performance. In this paper, we introduce Heterogeneous Habanero-C (H2C), which is an implementation of the Habanero execution model for modern heterogeneous...
The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU and an Intel Xeon Phi with respect to performance and energy consumption. It also provides observations...
We implement a promising algorithm for sparse-matrix sparse-vector multiplication (SpMSpV) on the GPU. An efficient k-way merge lies at the heart of finding a fast parallel SpMSpV algorithm. We examine the scalability of three approaches -- no sorting, merge sorting, and radix sorting -- in solving this problem. For breadth-first search (BFS), we achieve a 1.26x speedup over state-of-the-art sparse-matrix...
We develop a methodology for modeling the energy efficiency of tiled nested-loop codes running on a graphics processing unit (GPU) and use it for energy efficiency optimization. % We use the polyhedral model, a We assume that a highly optimized and parametrized version of a tiled nested -- loop code, either written by an expert programmer or automatically produced by a polyhedral compilation tool...
In this paper we show how to analytically model two widely used distributed matrix-multiply algorithms, Cannon's 2D and Johnson's 3D, implemented within the Intel Concurrent Collections framework for shared/distributed memory execution. Our precise analytical model proceeds by estimating the computation time and communication times, taking into account factors such as the block size, communication...
Tile algorithms for matrix decomposition can generate many fine-grained tasks. Therefore, their suitability for processing with multicourse architecture has attracted much attention from the high-performance computing (HPC) community. Our implementation of tile QR decomposition for a cluster system has dynamic scheduling, OpenMP work- sharing, and other useful features. In this article, we discuss...
In recent years, due to a higher demand for portable devices, which provide restricted amounts of processing capacity and battery power, the need for energy and time efficient hard and software solutions has increased. Preliminary estimations of time and energy consumption can thus be valuable to improve implementations and design decisions. To this end, this paper presents a method to estimate the...
Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programing such systems, and offers functional portability. It does, however, suffer from poor performance portability, code tuned for one device must be re-tuned to achieve good...
Taking full advantage of SIMD instructions in C programs still requires tedious and non-portable programming using intrinsics, despite considerable efforts spent developing auto-vectorization capabilities in recent decades. Whole Function Vectorization (WFV) is a recent technique for extending the use of SIMD across entire functions. WFV has so far only been used in data-parallel languages such as...
The complexity of today's high performance computing systems, and their parallel software, requires performance analysis tools to fully understand application performance behavior. The visualization of event streams has proven to be a powerful approach for the detection of various types of performance problems. However, visualization of large numbers of process streams quickly hits the limits of available...
Big data and the Internet of Things era continue to challenge computational systems. Several technology solutions such as NoSQL databases have been developed to deal with this challenge. In order to generate meaningful results from large datasets, analysts often use a graph representation which provides an intuitive way to work with the data. Graph vertices can represent users and events, and edges...
Column-store in-memory databases have received a lot of attention because of their fast query processing response times on modern multi-core machines. Among different database operations, group by/aggregate is an important and potentially costly operation. Moreover, sort-based and hash-based algorithms are the most common ways of processing group by/aggregate queries. While sort-based algorithms are...
Podaj zakres dat dla filtrowania wyświetlonych wyników. Możesz podać datę początkową, końcową lub obie daty. Daty możesz wpisać ręcznie lub wybrać za pomocą kalendarza.