Search results

chapter

Automatic Scan Parallelization in OpenMP

Maicol Zegarra, Marcio Pereira, Xavier Martorell, Guido Araujo

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 85 - 90

Prefix scan (or simply scan) is an operator that computes all the partial sums of a vector. A scan operation results in a vector where each element is the sum of the preceding elements in the original vector up to the corresponding position. Scan is a key operation in many relevant problems such as sorting, lexical analysis, string comparison, and image filtering, among others. Although there are libraries...

chapter

Taco: A tool to generate tensor algebra kernels

Fredrik Kjolstad, Stephen Chou, David Lugato, Shoaib Kamil, more

2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE) > 943 - 948

Tensor algebra is an important computational abstraction that is increasingly used in data analytics, machine learning, engineering, and the physical sciences. However, the number of tensor expressions is unbounded, which makes it hard to develop and optimize libraries. Furthermore, the tensors are often sparse (most components are zero), which means the code has to traverse compressed formats. To...

chapter

Dataflow Programming for Stream Processing

Marcos P. Rocha, Felipe M.G. Franca, Alexandre S. Nery, Leandro S. Guedes

2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 103 - 108

Stream processing applications have demanding performance requirements that are hard to meet using traditional parallel models on modern many-core architectures, such as GPUs. On the other hand, recent dataflow computing models can naturally exploit parallelism for a wide class of applications. This work presents an extension to an existing dataflow library for Java. The library extension implements...

chapter

Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems

Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 13 - 24

Multi-/many-core CPU based architectures are seeing widespread adoption due to their unprecedented compute performance in a small power envelope. With the increasingly large number of cores on each node, applications spend a significant portion of their execution time in intra-node communication. While shared memory is commonly used for intra-node communication, it needs to copy each message once...

chapter

cudaCR: An In-Kernel Application-Level Checkpoint/Restart Scheme for CUDA-Enabled GPUs

Behnam Pourghassemi, Aparna Chandramowlishwaran

2017 IEEE International Conference on Cluster Computing (CLUSTER) > 725 - 732

Fault-tolerance is becoming increasingly important as we enter the era of exascale computing. Increasing the number of cores results in a smaller mean time between failures, and consequently, higher probability of errors. Among the different software fault tolerance techniques, checkpoint/restart is the most commonly used method in supercomputers, the de-facto standard for large-scale systems. Although...

chapter

Estimation of the multidimensional dynamical characteristic eye-motor system

Vitaliy Pavlenko, Illia Ivanov, Evgeny Kravchenko

2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS) > 2 > 645 - 650

A new method of constructing a nonparametric dynamic model of the human oculomotor system from experimental “input-output” data is developed, taking into account the nonlinear and inertial properties of the rectus muscles of the eye. Eye movement is tracked from video recordings. It is possible to determine the dynamic characteristics of the oculomotor system functions as a transition...

chapter

Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi

Azzam Haidar, Heike Jagode, Asim YarKhan, Phil Vaccaro, more

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

The emergence of power efficiency as a primary constraint in processor and system designs poses new challenges concerning power and energy awareness for numerical libraries and scientific applications. Power consumption also plays a major role in the design of data centers, in particular for peta- and exa-scale systems. Understanding and improving the energy efficiency of numerical simulation becomes...

chapter

Fast linear algebra-based triangle counting with KokkosKernels

Michael M. Wolf, Mehmet Deveci, Jonathan W. Berry, Simon D. Hammond, more

2017 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

Triangle counting serves as a key building block for a set of important graph algorithms in network science. In this paper, we address the IEEE HPEC Static Graph Challenge problem of triangle counting, focusing on obtaining the best parallel performance on a single multicore node. Our implementation uses a linear algebra-based approach to triangle counting that has grown out of work related to our...

chapter

Inter-process communication, MPI and MPICH in microkernel environment: A comparative analysis

Mahnoor Khan, Munam Ali Shah

2017 23rd International Conference on Automation and Computing (ICAC) > 1 - 7

Inter-process communication (IPC) is one of the crucial aspects of every microkernel. The Message Passing Interface (MPI) is a specification for communication among processes. Message Passing Interface Chameleon (MPICH) is a portable implementation of MPI. This paper compares IPC, MPI and MPICH in terms of...

chapter

LibHSA: One step towards mastering the era of heterogeneous hardware accelerators using FPGAs

Marc Reichenbach, Philipp Holzinger, Konrad Haublein, Tobias Lieske, more

2017 Conference on Design and Architectures for Signal and Image Processing (DASIP) > 1 - 6

Various signal and image processing applications require vast acceleration in order to enable real-time processing and meet constraints in power consumption. On FPGAs, these applications can be implemented as application-specific circuits. Although IP cores for various applications exist, even interfacing these usually requires experience in hardware design. Using FPGAs or other accelerators...

chapter

UDORN: A design framework of persistent in-memory key-value database for NVM

Xianzhang Chen, Edwin H.-M. Sha, Ahmad Abdullah, Qingfeng Zhuge, more

2017 IEEE 6th Non-Volatile Memory Systems and Applications Symposium (NVMSA) > 1 - 6

Emerging non-volatile memory (NVM) technologies provide opportunities to improve the performance of key-value databases (KVDBs) by deploying the database on NVM. However, existing in-memory KVDBs cannot fully exploit the advantages of NVM. They process data in an in-memory database and store an image on persistent storage via an underlying file system. The performance of database operations is degraded by...

chapter

Userspace RDMA Verbs on Commodity Hardware Using DPDK

Patrick MacArthur

2017 IEEE 25th Annual Symposium on High-Performance Interconnects (HOTI) > 103 - 110

RDMA (Remote Direct Memory Access) is a technology that enables user applications to perform direct data transfer between the virtual memory of processes on remote endpoints, without operating system involvement or intermediate data copies. Achieving zero intermediate data copies using RDMA requires specialized network interface hardware. Software RDMA drivers emulate RDMA semantics in software to...

chapter

Overlapping Data Transfers with Computation on GPU with Tiles

Burak Bastem, Didem Unat, Weiqun Zhang, Ann Almgren, more

2017 46th International Conference on Parallel Processing (ICPP) > 171 - 180

GPUs are employed to accelerate scientific applications; however, they require much more programming effort from programmers, particularly because of the disjoint address spaces between the host and the device. OpenACC and OpenMP 4.0 provide directive-based programming solutions to alleviate the programming burden; however, synchronous data movement can create a performance bottleneck in fully taking...

chapter

Multiverse: Easy Conversion of Runtime Systems into OS Kernels via Automatic Hybridization

Kyle C. Hale, Conor Hetland, Peter Dinda

2017 IEEE International Conference on Autonomic Computing (ICAC) > 177 - 186

The hybrid runtime (HRT) model offers a path towards high performance and efficiency. By integrating the OS kernel, runtime, and application, an HRT allows the runtime developer to leverage the full feature set of the hardware and specialize OS services to the runtime's needs. However, conforming to the HRT model currently requires a port of the runtime to the kernel level, for example to the Nautilus...

chapter

A pipeline functional language for stateful packet processing

Nicola Bonelli, Stefano Giordano, Gregorio Procissi

2017 IEEE Conference on Network Softwarization (NetSoft) > 1 - 4

The evolution of commodity PCs towards multi-core processing platforms equipped with high-speed network interfaces makes them reasonable and cost effective targets for the implementation of generic network functions. In addition, the availability of software accelerated I/O frameworks provides a convenient ground for running a broad variety of applications, from simple software switches to more complex...

chapter

Benchmarking Harp-DAAL: High Performance Hadoop on KNL Clusters

Langshi Chen, Bo Peng, Bingjing Zhang, Tony Liu, more

2017 IEEE 10th International Conference on Cloud Computing (CLOUD) > 82 - 89

Data analytics is undergoing a revolution in many scientific domains, and demands cost-effective parallel data analysis techniques. Traditional Java-based Big Data processing tools like Hadoop MapReduce are designed for commodity CPUs. In contrast, emerging manycore processors like the Xeon Phi have an order of magnitude greater computation power and memory bandwidth. To harness their computing capabilities,...

chapter

Provenance Enriched PID Kernel Information as OAI-ORE Map Replacement for SEAD Research Objects

Inna Kouper, Yu Luo, Isuru Suriarachchi, Beth Plale

2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL) > 1 - 2

PIDs and PID Kernel Information, activities of the Research Data Alliance, have the potential to expand the utility and benefit of data provenance. The poster describes such expansion and outlines a study of the trade-offs of replacing the Research Object (RO) and OAI-ORE map solution of the SEAD publishing services with the PID Kernel Information approach.

chapter

VXVDEX: Internet of threads and networks of namespaces

Renzo Davoli

2017 IEEE International Conference on Communications (ICC) > 1 - 6

A network of namespaces (NoN) is a way to connect network namespaces defined on different hosts so that they appear to be interconnected on a (virtual) Local Area Network. A NoN protects the communications from malicious or accidental interception or intrusion originated by processes running in other NoN-s. A NoN could be defined using VLANs, veth, kernel bridge definitions, etc. It would be a daunting...

chapter

Enabling One-Sided Communication Semantics on ARM

Pavel Shamis, M. Graham Lopez, Gilad Shainer

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 805 - 813

In this paper, we present our work to enable optimized one-sided communication operations on the ARM v8 architecture using a high-performance InfiniBand network interconnect, as well as an evaluation of our implementation. For this study, we started with an OpenSHMEM implementation based on Open MPI/SHMEM, and combined it with the UCX framework and the XPMEM kernel extension for shared memory communication...

chapter

Bayesian optimization for conditional hyperparameter spaces

Julien-Charles Levesque, Audrey Durand, Christian Gagne, Robert Sabourin

2017 International Joint Conference on Neural Networks (IJCNN) > 286 - 293

Hyperparameter optimization is now widely applied to tune the hyperparameters of learning algorithms. The hyperparameters can have structure, resulting in hyperparameters depending on conditions, or on the values of other hyperparameters. We target the problem of combined algorithm selection and hyperparameter optimization, which includes at least one conditional hyperparameter: the choice of the...
