Search results

chapter

High performance MPI library over SR-IOV enabled infiniband clusters

Jie Zhang, Xiaoyi Lu, Jithin Jose, Mingzhe Li, et al.

2014 21st International Conference on High Performance Computing (HiPC), pp. 1-10

Virtualization has come to play a central role in HPC clouds due to easy management and the low cost of computation and communication. Recently, Single Root I/O Virtualization (SR-IOV) technology has been introduced for high-performance interconnects such as InfiniBand and can attain near-native performance for inter-node communication. However, the SR-IOV scheme lacks locality-aware communication support,...

chapter

Interface for heterogeneous kernels: A framework to enable hybrid OS designs targeting high performance computing on manycore architectures

Taku Shimosawa, Balazs Gerofi, Masamichi Takagi, Gou Nakamura, et al.

2014 21st International Conference on High Performance Computing (HiPC), pp. 1-10

As systems turn towards exascale and beyond, it has been widely argued that currently available system software will not be feasible due to various requirements, such as the ability to deal with heterogeneous architectures, the need for system-level optimization targeting specific applications, elimination of OS noise, and, at the same time, compatibility with legacy applications. To cope...

chapter

Improving Random Read Performance of Glibc

Mei Wang, Yuanyuan Zhou, Feng Xiao, Qiuming Luo

2014 13th International Symposium on Distributed Computing and Applications to Business, Engineering and Science (DCABES), pp. 78-82

Cloud data services, specifically key/value stores and NoSQL databases, require a large number of index lookups that fetch small amounts of data, so random I/O becomes the critical performance factor. However, compared with sequential reads, the efficiency of random reads is very low. Our experiments illustrate this. File I/O operations are closely associated with the implementation of the I/O mechanism...

chapter

Some experiences in building IoT platform

Aleksandar Milinkovic, Stevan Milinkovic, Ljubomir Lazic

2014 22nd Telecommunications Forum Telfor (TELFOR), pp. 1138-1141

In this paper we give a short survey of some existing solutions and describe our attempt to build an Internet of Things platform independent of the underlying hardware. We conclude that the best way to do this is to virtualize the hardware using a common microkernel-based operating system. Such an operating system exists as open source; however, porting it to the particular hardware board turned out...

chapter

Automatic Generation of I/O Kernels for HPC Applications

Babak Behzad, Hoang-Vu Dang, Farah Hariri, Weizhe Zhang, et al.

2014 9th Parallel Data Storage Workshop (PDSW), pp. 31-36

The study of the I/O performance of a parallel application can be facilitated by the use of an I/O kernel: a program that generates the same I/O calls as the original application but can be executed much faster. Such I/O kernels are especially important when the programs under study are proprietary or classified and available only in binary form. In this paper, we show how to create automatically...

chapter

A Data Flow Language to Develop High Performance Computing DSLs

Alejandro Fernandez, Vicenc Beltran, Sergi Mateo, Tomasz Patejko, et al.

2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pp. 11-20

Developing complex scientific applications on high performance systems requires both domain knowledge and expertise in parallel and distributed programming models. In addition, modern high performance systems are heterogeneous, combining multicore processors and accelerators that, despite being efficient and powerful, are harder to program. Domain-Specific Languages (DSLs) are a promising approach to...

chapter

The OPS Domain Specific Abstraction for Multi-block Structured Grid Computations

Istvan Z. Reguly, Gihan R. Mudalige, Michael B. Giles, Dan Curran, et al.

2014 Fourth International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC), pp. 58-67

Code maintainability, performance portability and future-proofing are some of the key challenges in this era of rapid change in High Performance Computing. Domain Specific Languages and Active Libraries address these challenges by focusing on a single application domain and providing a high-level programming approach, and then subsequently using domain knowledge to deliver high performance on various...

chapter

Lock-Free GaussSieve for Linear Speedups in Parallel High Performance SVP Calculation

Artur Mariano, Shahar Timnat, Christian Bischof

2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 278-285

Lattice-based cryptography has become a hot topic in recent years because it appears to be quantum-immune, i.e., resistant to attacks by quantum computers. The security of lattice-based cryptosystems is determined by the hardness of certain lattice problems, such as the Shortest Vector Problem (SVP). Thus, it is of prime importance to study how efficiently SVP-solvers can be implemented. This...

chapter

Hardware-in-the-loop simulation of Android GPGPU applications

Youngsub Ko, Saehanseul Yi, Youngmin Yi, Myungsun Kim, et al.

2014 IEEE 12th Symposium on Embedded Systems for Real-time Multimedia (ESTIMedia), pp. 108-117

Emerging mobile devices are likely to adopt a CPU-GPU heterogeneous architecture in which an embedded GPU executes computations offloaded from the CPU as well as rendering tasks. For design space exploration of such a CPU-GPU heterogeneous architecture at an early design stage, or for monitoring the dynamic behavior of a system, it is highly desirable to run the same application software on a full...

chapter

Device driver generation targeting multiple operating systems using a model-driven methodology

Hui Chen, Guillaume Godet-Bar, Frederic Rousseau, Frederic Petrot

2014 25th IEEE International Symposium on Rapid System Prototyping (RSP), pp. 30-36

We present a new device driver generation approach capable of automatically generating a large portion of device driver code for different operating systems (OSes). This approach is based on a model-driven methodology, in which a small language is used to model the device features and abstract away the low-level complexities of a driver. The approach can handle different driver architectures. We...

chapter

An open-source GPU-accelerated feature extraction tool

Josef Michalek, Jan Vanek

2014 12th International Conference on Signal Processing (ICSP 2014), pp. 450-454

Extracting feature vectors from a speech audio signal is a computationally intensive task. Nevertheless, MFCC and PLP features have remained the most popular for more than a decade. We developed a GPU-accelerated implementation of the feature extraction process. The implementation produces features identical to those of the reference Hidden Markov Toolkit (HTK) in a fraction of the elapsed time. The saved time can...

chapter

A Fast Batched Cholesky Factorization on a GPU

Tingxing Dong, Azzam Haidar, Stanimire Tomov, Jack Dongarra

2014 43rd International Conference on Parallel Processing (ICPP), pp. 432-440

Currently, state-of-the-art libraries such as MAGMA focus on very large linear algebra problems, while solving many small independent problems, usually referred to as batched problems, is not given adequate attention. In this paper, we propose a batched Cholesky factorization on a GPU. Three algorithms (non-blocked, blocked, and recursive blocked) were examined. The left-looking version...

chapter

Parallel Simulation of Superscalar Scheduling

Blake Haugen, Jakub Kurzak, Asim YarKhan, Piotr Luszczek, et al.

2014 43rd International Conference on Parallel Processing (ICPP), pp. 121-130

Computers have been moving toward a multicore paradigm for the last several years. As a result of the recent multicore paradigm shift, software developers must design applications that exploit the inherent parallelism of modern computing architectures. One of the areas of research to simplify this shift is the development of dynamic scheduling utilities that allow the developer to specify serial code...

chapter

Low-cost multiplier-based FPU for embedded processing on FPGA

Bogdan Pasca

2014 24th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-4

Industrial applications often require processing data with large dynamic ranges at low sample rates. As algorithms become more complex, handling the data range of variables required for fixed-point implementations becomes time-consuming and can also lead to inefficient designs. Floating-point solutions overcome these limitations, trading automatic data-range handling for a usually higher implementation...

chapter

An image processing library for C-based high-level synthesis

Moritz Schmid, Nicolas Apelt, Frank Hannig, Jurgen Teich

2014 24th International Conference on Field Programmable Logic and Applications (FPL), pp. 1-4

We introduce a library for the productive development of image processing accelerators using C-based high-level synthesis. The key concept of our approach is to provide a set of generic building blocks that is applicable to a multitude of image processing applications. An efficient memory architecture that facilitates easy integration of point and local image processing operators is the centerpiece...

chapter

POSTER: Fingerprinting application dependencies

Luca Clementi, Philip Papadopoulos

2014 IEEE International Conference on Cluster Computing (CLUSTER), pp. 288-289

In this poster, we present a novel approach, called software fingerprinting, that captures application dependencies. Our Fingerprint tool enables the user to discover, track, display and save the dependencies of an application without modifying its source code. The tool can achieve this through both static and runtime dependency discovery, and the result is stored in a separate file called a...

chapter

Auto-tuning of Computation Kernels from an FDM Code with ppOpen-AT

Takahiro Katagiri, Satoshi Ohshima, Masaharu Matsumoto

2014 IEEE 8th International Symposium on Embedded Multicore/Manycore SoCs (MCSoC), pp. 91-98

In this paper, we propose an Auto-tuning (AT) function with an AT language for a dedicated numerical library targeting supercomputers in operation. The AT function is based on well-known loop transformation techniques, such as loop splitting, loop fusion, and re-ordering of statements. However, loop splitting with copies or increased computation, and loop fusion of the split loops, are taken into account...

chapter

A method for system calls sandboxing based on atomic trusted code region

Milos Subotic, Nemanja Fimic, Darko Dejanovic, Goran Miljkovic

2014 IEEE Fourth International Conference on Consumer Electronics – Berlin (ICCE-Berlin), pp. 453-456

This paper presents a new algorithm for sandboxing system calls based on an atomic trusted code region. The algorithm successfully protects against any kind of code-injection attack as well as any kind of mimicry attack, including known-address attacks and scanning attacks. The algorithm is lightweight and simple. Its implementation does not require any changes on an untrusted machine...

chapter

Automatic Tuning of a Parallel Pattern Library for Heterogeneous Systems with Intel Xeon Phi

Jiri Dokulil, Siegfried Benkner

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA), pp. 42-49

Pattern libraries are important tools for high-productivity application development. Their pursuit of the best performance is complicated by the fact that they execute user-provided code that is not known when the library is created. This makes pattern libraries good candidates for automatic software tuning. In this paper, we deal with automatic online parameter tuning of the HyPHI hybrid pattern...

chapter

Formulating Optimized Storage and Memory Space Specifications for Linux Network Embedded Systems

Kleomenis Tsiligkos, Apostolos Meliones

2014 IEEE International Conference on High Performance Computing and Communications (HPCC), 2014 IEEE 6th International Symposium on Cyberspace Safety and Security (CSS) and 2014 IEEE 11th International Conference on Embedded Software and Systems (ICESS), pp. 580-584

Embedded systems are constantly becoming more complex as they are equipped with more functionality. Networking capability is one of the most desired features, even for embedded systems; hence, network applications typically used on desktop systems must become available in the embedded system domain. Rewriting these applications to fit into embedded root file systems takes...

INFONA - science communication portal