Search results

chapter

Equalizer: Dynamic Tuning of GPU Resources for Efficient Execution

Ankit Sethia, Scott Mahlke

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 647 - 658

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

GPUs use thousands of threads to provide high performance and efficiency. In general, if one thread of a kernel uses one of the resources (compute, bandwidth, data cache) more heavily, there will be significant contention for that resource due to the large number of identical concurrent threads. This contention will eventually saturate the performance of the kernel due to contention for the bottleneck...

chapter

Multi-GPU System Design with Memory Networks

Gwangsun Kim, Minseok Lee, Jiyun Jeong, John Kim

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 484 - 495

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

GPUs are being widely used to accelerate different workloads and multi-GPU systems can provide higher performance with multiple discrete GPUs interconnected together. However, there are two main communication bottlenecks in multi-GPU systems -- accessing remote GPU memory and the communication between GPU and the host CPU. Recent advances in multi-GPU programming, including unified virtual addressing...

chapter

PORPLE: An Extensible Optimizer for Portable Data Placement on GPU

Guoyang Chen, Bo Wu, Dong Li, Xipeng Shen

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture > 88 - 100

2014 47th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

A GPU is often equipped with complex memory systems, including global memory, texture memory, shared memory, constant memory, and various levels of cache. Where to place the data is important for the performance of a GPU program. However, the decision is difficult for a programmer to make because of architecture complexity and the sensitivity of suitable data placements to input and architecture changes. This...

chapter

Explicit Versus Implicit Graph Feature Maps: A Computational Phase Transition for Walk Kernels

Nils Kriege, Marion Neumann, Kristian Kersting, Petra Mutzel

2014 IEEE International Conference on Data Mining > 881 - 886

2014 IEEE International Conference on Data Mining (ICDM)

As many real-world data can elegantly be represented as graphs, various graph kernels and methods for computing them have been proposed. Surprisingly, many of the recent graph kernels do not employ the kernel trick anymore but rather compute an explicit feature map and report higher efficiency. So, is there really no benefit of the kernel trick when it comes to graphs? Triggered by this question,...

chapter

A Hierarchy Method Based on LDA and SVM for News Classification

Limeng Cui, Fan Meng, Yong Shi, Minqiang Li, et al.

2014 IEEE International Conference on Data Mining Workshop > 60 - 64

2014 IEEE International Conference on Data Mining Workshop (ICDMW)

The growth of online data gives users access to information on the Internet, but it also creates challenges in extracting valuable knowledge. In this paper we focus on news text classification, which is meaningful not only for information providers organizing and displaying the news but also for users trying to reach valuable information easily. A hierarchy method based on LDA and SVM is proposed...

chapter

Smart multi-task scheduling for OpenCL programs on CPU/GPU heterogeneous platforms

Yuan Wen, Zheng Wang, Michael F. P. O'Boyle

2014 21st International Conference on High Performance Computing (HiPC) > 1 - 10

2014 21st International Conference on High Performance Computing (HiPC)

Heterogeneous systems consisting of multiple CPUs and GPUs are increasingly attractive as platforms for high performance computing. Such platforms are usually programmed using OpenCL which provides program portability by allowing the same program to execute on different types of device. As such systems become more mainstream, they will move from application dedicated devices to platforms that need...

chapter

DPDK-based implementation of application-tailored networks on end user nodes

Hans Wippel

2014 International Conference and Workshop on the Network of the Future (NOF) > 1 - 5

2014 International Conference and Workshop on the Network of the Future (NOF)

Application-tailored networks are customized networks optimized for application requirements. They use custom protocol stacks and network virtualization to provide flexible and efficient communication. End user nodes run a framework called NENA to connect to such networks at runtime. The current NENA implementation runs on top of the operating system's network stack and uses the Socket API. It allows...

chapter

GPU parallelization of the stochastic on-time arrival problem

Maleen Abeydeera, Samitha Samaranayake

2014 21st International Conference on High Performance Computing (HiPC) > 1 - 8

2014 21st International Conference on High Performance Computing (HiPC)

The Stochastic On-Time Arrival (SOTA) problem has recently been studied as an alternative to traditional shortest-path formulations in situations with hard deadlines. The goal is to find a routing strategy that maximizes the probability of reaching the destination within a pre-specified time budget, with the edge weights of the graph being random variables with arbitrary distributions. While this...

chapter

GWIS_FI: A universal GPU interface for exhaustive search of pairwise interactions in case-control GWAS in minutes

Qiao Wang, Fan Shi, Andrew Kowalczyk, Richard M Campbell, et al.

2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) > 403 - 409

2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Epistatic interactions between genes are believed to be a critical component in the genetic architecture of complex diseases. Genome Wide Association Studies (GWAS) may be able to detect such genetic interactions indirectly, via the identification of associated SNP markers. Major obstacles to progress in this area are: the unknown nature of epistatic interactions, little understanding of the capabilities...

chapter

Power and Energy Footprint of OpenMP Programs Using OpenMP Runtime API

Anilkumar Nandamuri, Abid M. Malik, Ahmad Qawasmeh, Barbara M. Chapman

2014 Energy Efficient Supercomputing Workshop > 79 - 88

2014 Energy Efficient Supercomputing Workshop (E2SC)

Power and energy have become dominant aspects of hardware and software design in High Performance Computing (HPC). Recently, the Department of Defense (DOD) has set a constraint that applications and architectures need to attain 75 GFLOPS/Watt in order to support future missions. This requires a significant research effort towards power and energy optimization. The OpenMP programming model is...

chapter

Runtime Support for Adaptive Spatial Partitioning and Inter-Kernel Communication on GPUs

Yash Ukidave, Charu Kalra, David Kaeli, Perhaad Mistry, et al.

2014 IEEE 26th International Symposium on Computer Architecture and High Performance Computing > 168 - 175

2014 26th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)

GPUs have gained tremendous popularity in a broad range of application domains. These applications possess varying grains of parallelism and place high demands on compute resources -- many times imposing real-time constraints, requiring flexible work schedules, and relying on concurrent execution of multiple kernels on the device. These requirements present a number of challenges when targeting current...

chapter

Pedestrian Classification Using K-means and Random Decision Forests

Francisco A.R. Alencar, Carlos Massera Filho, Diego Gomes da Silva, Denis F. Wolf

2014 Joint Conference on Robotics: SBR-LARS Robotics Symposium and Robocontrol > 103 - 108

2014 Joint Conference on Robotics: SBR-LARS Robotics Symposium and Robocontrol (SBR LARS Robocontrol)

In the field of autonomous and intelligent vehicles, the goal of pedestrian classification is to reduce the number of accidents. The object classification accuracy depends on the type of classifier and the extracted object features used for classification. Support Vector Machines (SVM) are considered the most effective classifier for this task. However, it depends on a number of factors that require researchers...

chapter

Comparative study of traditional Bayesian algorithm and MassBayes algorithm using Pendigits dataset

Khushbu Trivedi, Parvati Bhurani, Ashish Kumar

Proceedings of 3rd International Conference on Reliability, Infocom Technologies and Optimization > 1 - 4

2014 3rd International Conference on Reliability, Infocom Technologies and Optimization (ICRITO) (Trends and Future Directions)

Existing generative classifiers (e.g., Naïve Bayes) estimate the joint probability distribution p(x,y) or the likelihood p(x|y) with the help of different density estimators, which are not suitable for large data sets due to their high time and space complexities. These classifiers also make different assumptions: they allow limited dependencies among attributes and estimate one-dimensional likelihoods. A new...

chapter

Towards a practical implementation of criticality mode change in RTOS

Young-Seung Kim, Hyun-Wook Jin

Proceedings of the 2014 IEEE Emerging Technology and Factory Automation (ETFA) > 1 - 4

2014 IEEE Emerging Technology and Factory Automation (ETFA)

In order to address the trade-off between certification and resource efficiency, researchers have recently been trying to apply a criticality mode change mechanism to mixed-criticality systems. However, the actual implementation of the criticality mode change has not been studied rigorously. In this paper, we suggest a practical design to implement the criticality mode change framework for Real-Time Operating...

chapter

Disruption-free software updates in automation systems

Michael Wahler, Manuel Oriol

Proceedings of the 2014 IEEE Emerging Technology and Factory Automation (ETFA) > 1 - 8

2014 IEEE Emerging Technology and Factory Automation (ETFA)

Automation systems must primarily be deterministic and reliable, especially in safety-critical environments. With recent trends such as mass customization or Industry 4.0, there is an increasing need for automation systems to be dynamic. Changing parts of the software of today's automation systems, however, typically requires rebooting the controller, which makes software updates a complex and costly...

chapter

Overhead Analysis of Performance Counter Measurements

Thomas Roehl, Jan Treibig, Georg Hager, Gerhard Wellein

2014 43rd International Conference on Parallel Processing Workshops > 176 - 185

2014 43rd International Conference on Parallel Processing Workshops (ICCPW)

LIKWID is a set of performance-related command-line tools targeting x86 processors. Besides affinity-related tools, it also includes likwid-perfctr, which counts hardware performance events. LIKWID builds upon the Linux msr kernel module, which provides access to model-specific registers (MSRs) via a device file interface. In addition to a set of convenient functional features such as a logical...

chapter

Integrity Verification and Secure Loading of Remote Binaries for Microkernel-Based Runtime Environments

Michael Weiss, Steffen Wagner, Roland Hellman, Sascha Wessel

2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications > 544 - 551

2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom)

While most microkernel-based systems implement non-essential software components as user space tasks and strictly separate those tasks during runtime, they often rely on a static configuration and composition of their software components to ensure safety and security. In this paper, we extend a microkernel-based system architecture with a Trusted Platform Module (TPM) and propose a verification mechanism...

chapter

Cross Resource Optimisation of Database Functionality across Heterogeneous Processors

Eoghan O'Neill, John McGlone, J.G.F. Coutinho, Andrew Doole, et al.

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications > 150 - 157

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)

Significant application performance improvements can be achieved by heterogeneous compute technologies, such as multi-core CPUs, GPUs and FPGAs. The HARNESS project is developing architectural principles that enable next-generation cloud platforms to incorporate such devices, thereby vastly increasing performance, reducing energy consumption, and lowering associated cost profiles. Along with management...

chapter

Automatic Tuning of a Parallel Pattern Library for Heterogeneous Systems with Intel Xeon Phi

Jiri Dokulil, Siegfried Benkner

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications > 42 - 49

2014 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA)

Pattern libraries are important tools for high-productivity application development. Their struggle for best performance is complicated by the fact that they are used to execute user-provided code, which is not known when the library is created. This makes pattern libraries good candidates for automatic software tuning. In this paper, we deal with automatic online parameter tuning of the HyPHI hybrid pattern...

chapter

Adaptive Algorithm and Tool Flow for Accelerating SystemC on Many-Core Architectures

Christoph Roth, Simon Reder, Harald Bucher, Oliver Sander, et al.

2014 17th Euromicro Conference on Digital System Design > 137 - 145

2014 17th Euromicro Conference on Digital System Design (DSD)

This paper presents an adaptive approach for parallel simulation of SystemC RTL models on future many-core architectures such as Intel's Single-chip Cloud Computer (SCC). It is based on a configurable parallel SystemC kernel that preserves the partial order defined by the SystemC delta cycles while avoiding global synchronization as far as possible. The underlying algorithm relies on...

INFONA - science communication portal
