Search results

chapter

ApproxEigen: An approximate computing technique for large-scale eigen-decomposition

Qian Zhang, Ye Tian, Ting Wang, Feng Yuan, et al.

2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 824-830

Recognition, Mining, and Synthesis (RMS) applications are expected to make up much of the computing workload of the future. Many of these applications (e.g., recommender systems and search engines) are formulated as finding the eigenvalues and eigenvectors of large-scale matrices. These applications are inherently error-tolerant, and it is often unnecessary, sometimes even impossible, to calculate all the eigenpairs...
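The observation that only some eigenpairs may be needed can be made concrete with plain power iteration, which recovers just the dominant eigenpair of a matrix. This is a generic textbook method shown for illustration, not the ApproxEigen technique itself; the helper names `mat_vec` and `dominant_eigenpair` and their parameters are hypothetical, and the sketch assumes the matrix has a single eigenvalue of largest magnitude.

```python
import math
import random


def mat_vec(a, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dominant_eigenpair(a, iters=1000, tol=1e-12, seed=0):
    """Power iteration: estimate only the largest-magnitude eigenvalue and its
    eigenvector, instead of computing the full eigen-decomposition.
    Assumes a unique dominant eigenvalue (hypothetical helper)."""
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in a]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        # Repeatedly apply A and renormalize; the dominant direction takes over.
        w = mat_vec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v^T A v (v has unit length) estimates the eigenvalue.
        new_lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))
        if abs(new_lam - lam) < tol:
            return new_lam, v
        lam = new_lam
    return lam, v
```

For example, `dominant_eigenpair([[2.0, 1.0], [1.0, 2.0]])` approaches the eigenvalue 3 with an eigenvector proportional to (1, 1), never touching the second eigenpair.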

chapter

Playing Hare and Tortoise: The FigarOS Kernel for Fine-Grained System-Level Energy Optimizations

Timo Honig, Christopher Eibel, Benedict Herzog, Heiko Janker, et al.

2015 Brazilian Symposium on Computing Systems Engineering (SBESC), pp. 80-83

Energy has emerged as the most important resource for computing systems. Despite its exceptional importance, reducing energy demand at the application and system level remains a challenging task for programmers and engineers. This is aggravated by the fact that traditional energy-saving approaches are not only error-prone but can even lead to adverse consequences (i.e., increased energy consumption)...

chapter

Energy-efficient execution of data-parallel applications on heterogeneous mobile platforms

Alok Prakash, Siqi Wang, Alexandru Eugen Irimiea, Tulika Mitra

2015 33rd IEEE International Conference on Computer Design (ICCD), pp. 208-215

State-of-the-art mobile systems-on-chip (SoCs) include heterogeneity in various forms for accelerated and energy-efficient execution of a diverse range of applications. Modern SoCs include programmable cores such as CPUs and GPUs with very different functionality. They also integrate performance-heterogeneous cores with different power-performance characteristics but the same instruction set...

chapter

Towards Seismic Wave Modeling on Heterogeneous Many-Core Architectures Using Task-Based Runtime System

Victor Martinez, David Michea, Fabrice Dupros, Olivier Aumage, et al.

2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 1-8

Understanding three-dimensional seismic wave propagation in complex media is still one of the main challenges of quantitative seismology. Because of its simplicity and numerical efficiency, the finite-difference method is one of the standard techniques used to solve the elastodynamics equation. Additionally, this class of modeling relies heavily on parallel architectures in order to tackle...

chapter

CoBaS: Introducing a Component Based Scheduling Framework

Anselm Busse, Reinhardt Karnapke, Hans-Ulrich Heiss

2015 International Symposium on Computer Architecture and High Performance Computing Workshop (SBAC-PADW), pp. 79-84

Many-core systems and heterogeneous systems are becoming increasingly common and may soon enter the mainstream market. To harvest their full potential, the runtime system's scheduling policies have to be adapted and, in many cases, tailored to the specific system. The runtime system can be either an operating system or the management infrastructure of an infrastructure as a service...

chapter

Improving the performance of graph analysis through partitioning with sampling

Michael M. Wolf, Benjamin A. Miller

2015 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6

Numerous applications focus on the analysis of entities and the connections between them, and such data are naturally represented as graphs. In particular, the detection of a small subset of vertices with anomalous coordinated connectivity is of broad interest, for problems such as detecting strange traffic in a computer network or unknown communities in a social network. Eigenspace analysis of large-scale...

chapter

Automatic Command Queue Scheduling for Task-Parallel Workloads in OpenCL

Ashwin Mandayam Aji, Antonio J. Pena, Pavan Balaji, Wu-chun Feng

2015 IEEE International Conference on Cluster Computing (CLUSTER), pp. 42-51

OpenCL is a portable interface that can be used to program cluster nodes with heterogeneous compute devices. The OpenCL specification tightly binds its workflow abstraction, or "command queue," to a specific device for the entire program. For best performance, the user has to find the ideal queue-device mapping at command queue creation time, an effort that requires a thorough understanding...

chapter

Hierarchical clustering and k-means analysis of HPC application kernels performance characteristics

M.L. Grodowitz, Sarat Sreepathi

2015 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-6

In this work, we present the characterization of a set of scientific kernels that are representative of the behavior of fundamental and applied physics applications across a wide range of fields. We collect performance attributes in the form of micro-operation mix and off-chip memory bandwidth measurements for these kernels. Using these measurements, we apply two clustering methodologies to show which...
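As a rough illustration of the k-means half of such an analysis, the sketch below clusters per-kernel feature vectors (for instance, micro-operation fractions and a normalized bandwidth figure) with Lloyd's algorithm. The function name `kmeans`, the naive first-k initialization, and the toy points in the usage note are assumptions for illustration, not the paper's setup or measurements; the hierarchical clustering the abstract also mentions is not shown.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means over feature vectors (hypothetical helper).
    Initial centroids are simply the first k points, a simplistic choice.
    Features on very different scales would normally be normalized first."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With the synthetic points `[(0.0, 0.0), (10.0, 10.0), (0.1, 0.2), (9.8, 10.1), (0.2, 0.1), (10.2, 9.9)]` and `k=2`, the three points near the origin and the three near (10, 10) end up in separate clusters.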

chapter

A generic infrastructure for OpenCL performance analysis

Robert Dietrich, Ronny Tschuter

2015 IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), vol. 1, pp. 334-341

OpenCL is an open standard for programming parallel heterogeneous systems. It is designed for portability and is therefore used in embedded systems programming as well as in high performance computing (HPC). Because OpenCL applies to many different platforms, OpenCL library vendors have some freedom in implementing parts of the OpenCL execution model. Multiple versions of the standard...

chapter

Matchmaking Applications and Partitioning Strategies for Efficient Execution on Heterogeneous Platforms

Jie Shen, Ana Lucia Varbanescu, Xavier Martorell, Henk Sips

2015 44th International Conference on Parallel Processing (ICPP), pp. 560-569

Heterogeneous platforms are mixes of different processing units. The key factor to their efficient usage is workload partitioning. Both static and dynamic partitioning strategies have been defined in previous work, but their applicability and performance differ significantly depending on the application to execute. In this paper, we propose an application-driven method to select the best partitioning...

chapter

OpenACC Programs Examined: A Performance Analysis Approach

Robert Dietrich, Guido Juckeland, Michael Wolfe

2015 44th International Conference on Parallel Processing (ICPP), pp. 310-319

The OpenACC standard has been developed to simplify parallel programming of heterogeneous systems. Based on a set of high-level compiler directives, it allows application developers to offload code regions from a host CPU to an accelerator without the need for low-level programming with CUDA or OpenCL. Details are implicit in the programming model and managed by OpenACC API-enabled compilers and...

chapter

Extensions over OpenCL for Latency Reduction and Critical Applications

Grigore Lupescu, Emil-Ioan Slusanschi, Nicolae Tapus

2015 17th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), pp. 379-385

The complexity of the hardware and software stack makes programming GPGPUs difficult and limits application portability. This article first discusses challenges imposed by the current hardware and software model in GPGPU systems, which relies heavily on the HOST device (CPU). We then identify system bottlenecks both in the hardware design and in the software stack and present two ideas to extend the HOST and DEVICE...

chapter

A System-Level Simulation Framework for Evaluating Resource Management Policies for Heterogeneous System Architectures

Antonio Miele, Gianluca Carlo Durelli, Marco Domenico Santambrogio, Cristiana Bolchini

2015 Euromicro Conference on Digital System Design (DSD), pp. 637-644

Nowadays, heterogeneous system architectures that integrate CPUs and one or more kinds of accelerators (e.g., GPUs or HW accelerators) are a promising solution for achieving high performance on data-intensive workloads while meeting other system-level requirements on the available power/energy budget. However, heterogeneity comes at the cost of greater design and management complexity, leading to...

chapter

CFS performance improvement using Binomial Heap

Shirish Singh, Praveen Kumar

2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1822-1824

The process scheduling algorithm plays a crucial role in operating system performance, and so does the data structure used to implement it. A scheduler is designed to distribute resources fairly among tasks while maximizing CPU utilization. The Completely Fair Scheduler (CFS), the default scheduler of Linux (since kernel version 2.6.23), ensures equal opportunity among...
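CFS keeps runnable tasks ordered by virtual runtime (vruntime) and always runs the task with the smallest value; mainline Linux stores them in a red-black tree, and this paper proposes a binomial heap instead. The sketch below uses Python's `heapq` (a binary heap) only to illustrate min-vruntime selection; it is not the paper's binomial-heap implementation. The function `simulate`, the fixed slice length, and the task names and weights are hypothetical simplifications, with vruntime advanced as slice × 1024 / weight (1024 being the nice-0 weight in the kernel's weight table).

```python
import heapq

NICE_0_WEIGHT = 1024  # load weight of a nice-0 task in Linux


def simulate(tasks, slices, slice_ns=1_000_000):
    """Pick tasks CFS-style from a min-heap keyed on vruntime (hypothetical helper).
    tasks maps task name -> load weight; returns the order in which tasks ran."""
    heap = [(0, name) for name in sorted(tasks)]  # entries are (vruntime, name)
    heapq.heapify(heap)
    order = []
    for _ in range(slices):
        # Smallest vruntime runs next; ties fall back to comparing names.
        vruntime, name = heapq.heappop(heap)
        order.append(name)
        # Heavier tasks accumulate vruntime more slowly, so they run more often.
        vruntime += slice_ns * NICE_0_WEIGHT // tasks[name]
        heapq.heappush(heap, (vruntime, name))
    return order
```

With weights `{"a": 1024, "b": 2048}`, six slices run in the order a, b, b, a, b, b, so the task with twice the weight receives twice as many slices.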

chapter

Parallel implementation of low light level image enhancement using CUDA

Peiyi Shen, Liang Zhang, Juan Song, Xilu Peng, et al.

2015 IEEE International Conference on Information and Automation (ICIA), pp. 673-677

Enhancement algorithms can give low-light-level images a clear visual appearance similar to images captured in daylight. However, because of their high complexity and heavy computational cost, low-light-level image enhancement algorithms usually struggle to meet real-time requirements, which limits their use in practical applications. To address this, a parallel optimization algorithm...

chapter

Hardware-Based and Hybrid L1 Data Cache Bypassing to Improve GPU Performance

Yijie Huangfu, Wei Zhang

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 972-976

Intelligent GPU cache bypassing can improve the efficiency of using GPU memory bandwidth, which can benefit GPU performance. In this paper, we study a pure hardware-based GPU cache bypassing method that can be applied to GPU applications without having to recompile the programs. Moreover, we introduce a hybrid method that can exploit profiling information to further enhance the hardware-based bypassing...

chapter

MPI+ULT: Overlapping Communication and Computation with User-Level Threads

Huiwei Lu, Sangmin Seo, Pavan Balaji

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 444-454

As the core density of future processors keeps increasing, MPI+Threads is becoming a promising programming model for large-scale SMP clusters. Generally speaking, a hybrid MPI+Threads runtime can greatly improve intra-node parallelism and data sharing on shared-memory architectures. However, it does not help much with inter-node communication due to the inefficient integration of existing communication...

chapter

JolokiaC++: Optimizing Irregular Accesses for GPGPU

Vibha Patel, Sanjeev Aggarwal, Amey Karkare

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 583-590

We present JolokiaC++, a compiler framework that eases the coding of irregular data applications on GPUs. The effectiveness of the compiler and runtime systems of JolokiaC++ is tested using three kernels, IRREG, MOLDYN, and NBF, executed on NVIDIA GPUs. We developed extensions for the generic parallel constructs that allow portable and efficient programming of codes with irregular accesses on the GPU. We present...

chapter

CUDA Grid-Level Task Progression Algorithms

Christos Kartsaklis, Wayne Joubert, Oscar R. Hernandez, Markus Eisenbach, et al.

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS), and 2015 IEEE 12th International Conference on Embedded Software and Systems (ICESS), pp. 1628-1632

Tasking is a prominent parallel programming model. In this paper we conduct a first study into the feasibility of task-parallel execution at the CUDA grid level, rather than the stream/kernel level, for regular task graphs with fixed in-out dependencies, similar to those found in wavefront computational patterns, making the findings broadly applicable. We propose and evaluate three CUDA task progression algorithms,...

chapter

A New Approach to Embedding Semantic Link Network with Word2Vec Binary Code

Yanhong Yuan, Yao Liu, Qiaoli Huang, Zhixing Huang

2015 11th International Conference on Semantics, Knowledge and Grids (SKG), pp. 9-16

Graph-structured data has come into wide use in various fields where graphs are the natural data structure for modeling networks. Therefore, comparing two graphs has become a research focus. Traditional approaches to graph comparison face a common problem: they either incur increasing runtime on large graphs or simplify the graph representation in ways that ignore part of its topological information...

INFONA - science communication portal
