Search results

chapter

PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming

Riyadh Baghdadi, Ulysse Beaugnon, Albert Cohen, Tobias Grosser, more

2015 International Conference on Parallel Architecture and Compilation (PACT) > 138 - 149

2015 International Conference on Parallel Architecture and Compilation (PACT)

Programming accelerators such as GPUs with low-level APIs and languages such as OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain specific languages (DSLs) have been proposed to hide complexity and regain performance portability. We present PENCIL, a rigorously-defined subset of GNU C99 -- enriched with additional language constructs -- that...

chapter

An Efficient Vectorization Approach to Nested Thread-level Parallelism for CUDA GPUs

Shixiong Xu, David Gregg

2015 International Conference on Parallel Architecture and Compilation (PACT) > 488 - 489

2015 International Conference on Parallel Architecture and Compilation (PACT)

Nested thread-level parallelism (TLP) is pervasive in real applications. For example, 75% (14 out of 19) of the applications in the Rodinia benchmark for heterogeneous accelerators contain kernels with nested thread-level parallelism. Efficiently mapping the enclosed nested parallelism to the GPU threads in the C-to-CUDA compilation (OpenACC in this paper) is becoming more and more important. This...

chapter

Polyhedral Optimizations of Explicitly Parallel Programs

Prasanth Chatarasi, Jun Shirako, Vivek Sarkar

2015 International Conference on Parallel Architecture and Compilation (PACT) > 213 - 226

2015 International Conference on Parallel Architecture and Compilation (PACT)

The polyhedral model is a powerful algebraic framework that has enabled significant advances to analysis and transformation of sequential affine (sub)programs, relative to traditional AST-based approaches. However, given the rapid growth of parallel software, there is a need for increased attention to using polyhedral frameworks to optimize explicitly parallel programs. An interesting side effect of supporting...

chapter

A Machine-Learning Approach for Communication Prediction of Large-Scale Applications

Nikela Papadopoulou, Georgios Goumas, Nectarios Koziris

2015 IEEE International Conference on Cluster Computing > 120 - 123

2015 IEEE International Conference on Cluster Computing (CLUSTER)

In this paper we present a machine-learning approach to predict the total communication time of parallel applications. Communication time is heavily dependent on a very wide set of parameters relevant to the architecture, runtime configuration and application communication profile. We focus our study on parameters that can be easily extracted from the application and the process mapping ahead of execution...

chapter

High performance user space sockets on low power System on a Chip platforms

Catherine H. Crawford, Piotr Padkowski, Tomasz Baranski, Angela Czubak, more

2015 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 6

2015 IEEE High Performance Extreme Computing Conference (HPEC)

With the introduction of low power System on a Chip (SoC) processor architectures in enterprise server configurations, there is a growing need to develop the software that will support scale-out, data intensive cloud applications that are deployed in data centers today. In this paper, we describe the design and implementation of a low latency user space fully compliant TCP/IP socket stack on a low...

chapter

SBIOS: An SSD-based Block I/O Scheduler with improved system performance

Jiayang Guo, Yimin Hu, Bo Mao

2015 IEEE International Conference on Networking, Architecture and Storage (NAS) > 357 - 358

2015 IEEE International Conference on Networking, Architecture and Storage (NAS)

This paper presents an SSD-based Block I/O Scheduler, short for SBIOS. SBIOS fully exploits the internal parallelism to improve the system performance. It dispatches the read requests to different blocks to make full use of SSD internal parallelism. For write requests, it tries to dispatch write requests to the same block to alleviate the block cross penalty and garbage collection overhead. The evaluation...

chapter

Hardware-Based and Hybrid L1 Data Cache Bypassing to Improve GPU Performance

Yijie Huangfu, Wei Zhang

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems > 972 - 976

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS) and 2015 IEEE 12th International Conf on Embedded Software and Systems (ICESS)

Intelligent GPU cache bypassing can improve the efficiency of using GPU memory bandwidth, which can benefit GPU performance. In this paper, we study a pure hardware-based GPU cache bypassing method that can be applied to GPU applications without having to recompile the programs. Moreover, we introduce a hybrid method that can exploit profiling information to further enhance the hardware-based bypassing...

chapter

Design Exploration for next Generation High-Performance Manycore On-chip Systems: Application to big.LITTLE Architectures

Anastasiia Butko, Abdoulaye Gamatie, Gilles Sassatelli, Lionel Torres, more

2015 IEEE Computer Society Annual Symposium on VLSI > 551 - 556

2015 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Next generation embedded systems will massively adopt on-chip many core architectures to provide both performance and energy-efficiency. This trend will definitely establish the convergence of embedded computing and high-performance computing. In such a context, one major design challenge will concern the choice of adequate architecture parameters given system requirements. Moreover, it will affect...

chapter

A benchmark for scene classification of high spatial resolution remote sensing imagery

Jingwen Hu, Tianbi Jiang, Xinyi Tong, Gui-Song Xia, more

2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS) > 5003 - 5006

2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

Scene classification for high-resolution remotely sensed imagery have been widely investigated in recent years. However, there is few public, widely accepted and large scale dataset for benchmarking different methods. This paper presents a new and large dataset consisting of 5000 high-resolution remote sensing images which is manually labeled in 20 semantic classes for scene classification. Each class...

chapter

Performance Analysis of LXC for HPC Environments

David Beserra, Edward David Moreno, Patricia Takako Endo, Jymmy Barreto, more

2015 Ninth International Conference on Complex, Intelligent, and Software Intensive Systems > 358 - 363

2015 Ninth International Conference on Complex, Intelligent, and Software Intensive Systems (CISIS)

Despite of Cloud infrastructures can be used as High Performance Computing (HPC) platforms, many issues from virtualization overhead had kept them unrelated. However, with advent of container-based virtualizers, this scenario acquires new perspectives because this technique promises to decrease the virtualization overhead, achieving a near-native performance. In this work, we analyzed the performance...

chapter

Kernel methods for short-term spatio-temporal wind prediction

Jethro Dowell, Stephan Weiss, David Infield

2015 IEEE Power & Energy Society General Meeting > 1 - 5

2015 IEEE Power & Energy Society General Meeting

Two nonlinear methods for producing short-term spatio-temporal wind speed forecast are presented. From the relatively new class of kernel methods, a kernel least mean squares algorithm and kernel recursive least squares algorithm are introduced and used to produce 1 to 6 hour-ahead predictions of wind speed at six locations in the Netherlands. The performance of the proposed methods are compared to...

chapter

DeepSketch: Deep convolutional neural networks for sketch recognition and similarity search

Omar Seddati, Stephane Dupont, Said Mahmoudi

2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI) > 1 - 6

2015 13th International Workshop on Content-Based Multimedia Indexing (CBMI)

In this paper, we present a system for sketch classification and similarity search. We used deep convolution neural networks (ConvNets), state of the art in the field of image recognition. They enable both classification and medium/high-level features extraction. We make use of ConvNets features as a basis for similarity search using k-Nearest Neighbors (kNN). Evaluation are performed on the TU-Berlin...

chapter

Dynamic user-level CPU allocation for volunteer computing in CFS-based scheduler environment

Korakit Seemakhupt, Krerk Piromsopa

2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) > 1 - 5

2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)

In this paper, we propose a CPU allocation technique to solve the problem of running volunteer application on a system with Completely Fair Scheduler (CFS) using adaptive reservation. Our allocation technique works across user boundary without requiring administrative privilege. We implemented and evaluated our technique on Linux-based system with the CFS. Our technique could mitigate performance...

chapter

Generic GNU/Linux reconfiguration platform proposal

Petr Cvek, Ondrej Novák

2015 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM) > 1 - 6

2015 IEEE International Workshop of Electronics, Control, Measurement, Signals and their application to Mechatronics (ECMSM)

This article presents a design of a dynamically reconfigurable hybrid multiprocessor system on a chip (SoC), where individual reconfiguration partitions (RP) are time multiplexed by demands of a task. Scheduling the RPs is designed to be done by a modified Linux kernel. Design is partially implemented on the experimental platform, tested by multiple benchmarks and will be extended in the future.

chapter

Evaluating Architecture-Dependent Linux Performance

Lucian Mogosanu, Mihai Carabas, Cristian Condurache, Laura Gheorghe, more

2015 20th International Conference on Control Systems and Computer Science > 499 - 505

2015 20th International Conference on Control Systems and Computer Science (CSCS)

Modern operating system kernels, such as Linux, address the trade-off between portability and performance by exposing a generic interface to user space programs, while maintaining architecture-dependent functionality as a set of separate components inside the kernel space. In particular, performance can only be achieved by ensuring that the architecture-dependent code takes advantage of the facilities...

chapter

Improving TLB Performance by Increasing Hugepage Ratio

Taowei Luo, Xiaolin Wang, Jingyuan Hu, Yingwei Luo, more

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 1139 - 1142

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Linux supports transparent huge page since 2.6.38. It can automatically map huge pages. But this implementation fails to adjust to page alignment in memory allocation and thus cannot use huge page in some situations. The design is not efficient. Our work aims to increase huge page allocation, so as to improve the utilization ratio of huge page and overall performance. The experimental results show...

chapter

Boosting GPU Performance by Profiling-Based L1 Data Cache Bypassing

Yijie Huangfu, Wei Zhang

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 1119 - 1122

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Cache memories have been introduced in recent generations of Graphics Processing Units (GPUs) to benefit general-purpose computing on GPUs (GPGPUs). In this work, we analyze the memory access patterns of GPGPU applications and propose a cost-effective profiling-based method to identify the data accesses that should bypass the L1 data cache to improve performance. The evaluation indicates that the...

chapter

An Evaluation of Unified Memory Technology on NVIDIA GPUs

Wenqiang Li, Guanghao Jin, Xuewen Cui, Simon See

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing > 1092 - 1098

2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)

Unified Memory is an emerging technology which is supported by CUDA 6.X. Before CUDA 6.X, the existing CUDA programming model relies on programmers to explicitly manage data between CPU and GPU and hence increases programming complexity. CUDA 6.X provides a new technology which is called as Unified Memory to provide a new programming model that defines CPU and GPU memory space as a single coherent...

chapter

Performance Portable Applications for Hardware Accelerators: Lessons Learned from SPEC ACCEL

Guido Juckeland, Alexander Grund, Wolfgang E. Nagel

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 689 - 698

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

The popular and diverse hardware accelerator ecosystem makes apples-to-apples comparisons between platforms rather difficult. SPEC ACCEL tries to offer a yardstick to compare different accelerator hardware and software ecosystems. This paper uses this SPEC benchmark to compare an AMD GPU, an NVIDIA GPU and an Intel Xeon Phi with respect to performance and energy consumption. It also provides observations...

chapter

Machine Learning Based Auto-Tuning for Enhanced OpenCL Performance Portability

Thomas L. Falch, Anne C. Elster

2015 IEEE International Parallel and Distributed Processing Symposium Workshop > 1231 - 1240

2015 IEEE International Parallel and Distributed Processing Symposium Workshop (IPDPSW)

Heterogeneous computing, which combines devices with different architectures, is rising in popularity, and promises increased performance combined with reduced energy consumption. OpenCL has been proposed as a standard for programming such systems, and offers functional portability. It does, however, suffer from poor performance portability, code tuned for one device must be re-tuned to achieve good...

INFONA - science communication portal

