Virtualization has come to play a central role in HPC clouds owing to its ease of management and the low cost of computation and communication. Recently, Single Root I/O Virtualization (SR-IOV) technology has been introduced for high-performance interconnects such as InfiniBand and can achieve near-native performance for inter-node communication. However, the SR-IOV scheme lacks locality-aware communication support,...
Looking toward exascale systems and beyond, it has been widely argued that currently available system software will not be adequate, owing to requirements such as the ability to deal with heterogeneous architectures, the need for system-level optimization targeting specific applications, the elimination of OS noise, and, at the same time, compatibility with legacy applications. To cope...
Cloud data services, specifically key/value stores and NoSQL databases, require a large number of index lookups, each of which fetches a small amount of data, so random I/O becomes the critical performance factor. However, random reads are much less efficient than sequential reads, as our experiment demonstrates. File I/O operations are closely associated with the implementation of the I/O mechanism...
In this paper, we give a short survey of some existing solutions and describe our attempt to build an Internet of Things platform independent of the underlying hardware. We conclude that the best way to do so is to virtualize the hardware using a common microkernel-based operating system. Such an operating system exists as open source; however, porting it to a particular hardware board turned out...
The study of the I/O performance of a parallel application can be facilitated by the use of an I/O kernel: a program that generates the same I/O calls as the original application but can be executed much faster. Such I/O kernels are especially important when the programs under study are proprietary or classified and only available in binary form. In this paper, we show how to automatically create...
Developing complex scientific applications on high-performance systems requires both domain knowledge and expertise in parallel and distributed programming models. In addition, modern high-performance systems are heterogeneous, i.e., composed of multicore processors and accelerators, which, despite being efficient and powerful, are harder to program. Domain-Specific Languages (DSLs) are a promising approach to...
Code maintainability, performance portability, and future-proofing are some of the key challenges in this era of rapid change in High Performance Computing. Domain-Specific Languages and Active Libraries address these challenges by focusing on a single application domain, providing a high-level programming approach, and then using domain knowledge to deliver high performance on various...
Lattice-based cryptography has become a hot topic in recent years because it appears to be quantum-immune, i.e., resistant to attacks mounted with quantum computers. The security of lattice-based cryptosystems is determined by the hardness of certain lattice problems, such as the Shortest Vector Problem (SVP). It is therefore of prime importance to study how efficiently SVP solvers can be implemented. This...
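The abstract is cut off before describing the solvers studied. As a minimal illustration of what an exact SVP solver computes, the sketch below uses Lagrange-Gauss reduction, a classical textbook method that finds a shortest nonzero vector of a two-dimensional integer lattice. This is a swapped-in technique, not the paper's method; the names `dot` and `gauss_reduce` are hypothetical, and practical SVP solvers (enumeration, sieving) target far higher dimensions.

```python
from fractions import Fraction


def dot(u, v):
    """Inner product of two 2-D integer vectors."""
    return u[0] * v[0] + u[1] * v[1]


def gauss_reduce(b1, b2):
    """Lagrange-Gauss reduction of a 2-D lattice basis (illustrative sketch).

    Returns a reduced basis (b1, b2) in which b1 is a shortest nonzero
    vector of the lattice spanned by the input vectors.
    """
    b1, b2 = list(b1), list(b2)
    # Keep the shorter vector first.
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        # Nearest-integer projection coefficient, computed exactly with Fraction.
        mu = round(Fraction(dot(b1, b2), dot(b1, b1)))
        # Subtract the integer multiple of b1 that best shortens b2.
        b2 = [b2[0] - mu * b1[0], b2[1] - mu * b1[1]]
        if dot(b2, b2) >= dot(b1, b1):
            return b1, b2
        # b2 became shorter than b1: swap and repeat.
        b1, b2 = b2, b1
```

For example, `gauss_reduce((4, 1), (7, 2))` turns a skewed basis of the integer lattice Z^2 into the unit-length basis `([-1, 0], [0, 1])`.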
Emerging mobile devices are likely to adopt a heterogeneous CPU-GPU architecture in which an embedded GPU executes computations offloaded from the CPU as well as rendering tasks. For design space exploration of such an architecture at an early design stage, or for monitoring the dynamic behavior of a system, it is very desirable to run the same application software on a full...
We present a new device driver generation approach capable of automatically generating a large portion of device driver code for different operating systems (OSes). This approach is based on a model-driven methodology in which a small language is used to model the device features and abstract away the low-level complexities of a driver. The approach can handle different driver architectures. We...
Extracting feature vectors from a speech audio signal is a computationally intensive task, and MFCC and PLP features have remained the most popular choice for more than a decade. We developed a GPU-accelerated implementation of the feature extraction process. It produces features identical to those of the reference Hidden Markov Model Toolkit (HTK), but in a fraction of the elapsed time. The saved time can...
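MFCC extraction passes each spectral frame through a bank of triangular filters whose centers are equally spaced on the mel scale. As a small CPU-only sketch of that single step (not the paper's GPU implementation), the code below uses the mel mapping given in the HTK Book, Mel(f) = 1127 ln(1 + f/700), and computes filter center frequencies. The function names and the choice of band edges are assumptions for illustration.

```python
import math


def hz_to_mel(f):
    """Mel scale as defined in the HTK Book: 1127 * ln(1 + f / 700)."""
    return 1127.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / 1127.0) - 1.0)


def mel_centers(num_filters, lo_hz, hi_hz):
    """Center frequencies (Hz) of num_filters triangular filters.

    The band [lo_hz, hi_hz] is split into num_filters + 1 equal steps on
    the mel scale; each filter is centered on one interior step point.
    """
    lo, hi = hz_to_mel(lo_hz), hz_to_mel(hi_hz)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(num_filters)]
```

Because the spacing is uniform in mel rather than in Hz, the centers returned by `mel_centers(24, 0.0, 8000.0)` crowd together at low frequencies and spread out at high ones.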
Currently, state-of-the-art libraries such as MAGMA focus on very large linear algebra problems, while solving many small independent problems, usually referred to as batched problems, has not been given adequate attention. In this paper, we propose a batched Cholesky factorization on a GPU. Three algorithms (non-blocked, blocked, and recursive blocked) are examined. The left-looking version...
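To make the batched setting concrete, here is a minimal CPU sketch of the simplest of the three variants: a non-blocked, left-looking Cholesky factorization applied independently to every small matrix in a batch. It is plain Python, not the paper's GPU kernel, and the names `cholesky_left_looking` and `batched_cholesky` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky_left_looking(a: Matrix) -> Matrix:
    """Non-blocked, left-looking Cholesky: returns lower-triangular L with A = L L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Left-looking: column j is updated using all previously computed columns k < j.
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l


def batched_cholesky(batch: List[Matrix]) -> List[Matrix]:
    """Factor many small independent SPD matrices.

    The problems share no data, so a GPU implementation can assign each one
    to its own thread block; here they are simply processed in a loop.
    """
    return [cholesky_left_looking(a) for a in batch]
```

For instance, the classic 3x3 example `[[25, 15, -5], [15, 18, 0], [-5, 0, 11]]` factors to `L = [[5, 0, 0], [3, 3, 0], [-1, 1, 3]]`.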
Computers have been moving toward a multicore paradigm for the last several years. As a result of this paradigm shift, software developers must design applications that exploit the inherent parallelism of modern computing architectures. One area of research aimed at simplifying this shift is the development of dynamic scheduling utilities that allow the developer to specify serial code...
Industrial applications often require processing data with large dynamic ranges at low sample rates. As algorithms become more complex, handling the data ranges of variables, as required for fixed-point implementations, becomes time-consuming and can also lead to inefficient designs. Floating-point solutions overcome these limitations, trading automatic data range handling for a usually higher implementation...
We introduce a library for the productive development of image processing accelerators using C-based high-level synthesis. The key concept of our approach is to provide a set of generic building blocks applicable to a multitude of image processing applications. An efficient memory architecture that facilitates the integration of point and local image processing operators is the centerpiece...
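The distinction between point and local operators drives the memory architecture: a point operator needs only the current pixel, while a local operator needs a neighborhood, which a hardware design must buffer. The sketch below shows both kinds in plain Python with a 3x3 window and edge replication at the borders. It illustrates the operator classes only, not the library's C-based HLS building blocks; `point_op`, `local_op`, and the border policy are assumptions.

```python
from typing import Callable, List

Image = List[List[int]]


def point_op(img: Image, f: Callable[[int], int]) -> Image:
    """Point operator: each output pixel depends only on the same input pixel."""
    return [[f(p) for p in row] for row in img]


def local_op(img: Image, kernel: List[List[int]], norm: int) -> Image:
    """Local operator: each output pixel depends on its 3x3 neighborhood.

    Computes sum(kernel * window) // norm; out-of-range coordinates are
    clamped, i.e. edge pixels are replicated.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc // norm
    return out
```

A threshold such as `point_op(img, lambda p: 255 if p > 4 else 0)` can stream pixel by pixel, whereas the 3x3 box filter `local_op(img, [[1] * 3] * 3, 9)` must see two neighboring rows, which is why hardware implementations typically keep line buffers.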
In this poster, we present a novel approach, called software fingerprinting, that captures application dependencies. Our Fingerprint tool enables the user to discover, track, display, and save the dependencies of an application without modifying its source code. The tool can achieve this through both static and runtime dependency discovery, and the result is stored in a separate file called a...
In this paper, we propose an Auto-tuning (AT) function, together with an AT language, for a dedicated numerical library targeting supercomputers currently in operation. The AT function is based on well-known loop transformation techniques, such as loop splitting, loop fusion, and statement reordering. However, loop splitting that introduces copies or increases computation, and loop fusion of the split loops, are taken into account...
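As a rough illustration of the transformations named above, the sketch shows one computation written as two split loops and as a single fused loop, plus a naive tuner that times candidate variants and keeps the fastest. Everything here is hypothetical (`split_loops`, `fused_loop`, `pick_fastest`); it is not the paper's AT language or function, and in Python either variant may win, which is exactly why the choice is made empirically.

```python
import time


def split_loops(a, b):
    """Two separate loops: the second reads c[i] written by the first."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
    for i in range(n):
        d[i] = c[i] * 2.0
    return c, d


def fused_loop(a, b):
    """Fused version: legal because d[i] only needs c[i] from the same iteration."""
    n = len(a)
    c = [0.0] * n
    d = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]
        d[i] = c[i] * 2.0
    return c, d


def pick_fastest(variants, *args, repeats=5):
    """Naive auto-tuning: time each semantically equivalent variant, keep the fastest."""
    best, best_time = None, float("inf")
    for f in variants:
        start = time.perf_counter()
        for _ in range(repeats):
            f(*args)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best, best_time = f, elapsed
    return best
```

A call such as `pick_fastest([split_loops, fused_loop], a, b)` returns whichever variant ran faster on the current machine; a real tuner would also verify that the transformation preserves the data dependences before timing it.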
This paper presents a new algorithm for sandboxing system calls based on atomic trusted code regions. The algorithm protects against all kinds of code-injection attacks as well as all kinds of mimicry attacks, including known-address attacks and scanning attacks. The algorithm is lightweight and simple, and its implementation does not require any change on an untrusted machine...
Pattern libraries are important tools for high-productivity application development. Achieving the best performance is complicated by the fact that they execute user-provided code that is not known when the library is created. This makes pattern libraries good candidates for automatic software tuning. In this paper, we deal with automatic online parameter tuning of the HyPHI hybrid pattern...
Embedded systems are steadily becoming more complex as they are equipped with more functionality. Networking capability is one of the most desired features even for embedded systems; hence, network applications typically used on desktop systems must be made available in the embedded domain. Rewriting these applications to fit into embedded root file systems takes...