Search results

chapter

Scale-adaptive visual tracking with occlusion detection

Yulong Xu, Jiabao Wang, Yang Li, Zhuang Miao, et al.

2016 IEEE 13th International Conference on Signal Processing (ICSP) > 938 - 942

2016 IEEE 13th International Conference on Signal Processing (ICSP)

Occlusion is a challenging problem in visual object tracking. Most state-of-the-art trackers may learn the appearance of the occluding target when it becomes occluded by other objects in the scene. This paper proposes a novel approach of detecting occlusion by dividing the target into several patches and computing the peak-to-sidelobe ratio of every response map. Furthermore, our method can calculate...

chapter

Multiple kernel collaborative representation based classification

Ru Li, Qian Zhang, Zhiming Gao, Bao-Di Liu, et al.

2016 IEEE 13th International Conference on Signal Processing (ICSP) > 826 - 831

2016 IEEE 13th International Conference on Signal Processing (ICSP)

At present, collaborative representation based classification (CRC) is widely used in many pattern classification and recognition tasks. Meanwhile, spatial pyramid matching (SPM) method, which considers the spatial information in representing the image, is efficient for image classification. However, for SPM, the weights to evaluate the representation of different subregions are fixed. In this paper,...

chapter

Performance of MPI Codes Written in Python with NumPy and mpi4py

Ross Smith

2016 6th Workshop on Python for High-Performance and Scientific Computing (PyHPC) > 45 - 51

2016 6th Workshop on Python for High-Performance and Scientific Computing (PyHPC)

Python is an interpreted language that has become more commonly used within HPC applications. Python benefits from the ability to write extension modules in C, which can further use optimized libraries that have been written in other compiled languages. For HPC users, two of the most common extensions are NumPy and mpi4py. It is possible to write a full computational kernel in a compiled language...

chapter

Performance Analysis and Optimization of Clang's OpenMP 4.5 GPU Support

Matt Martineau, Simon McIntosh-Smith, Carlo Bertolli, Arpith C. Jacob, et al.

2016 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) > 54 - 64

2016 7th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)

The Clang implementation of OpenMP® 4.5 now provides full support for the specification, offering the only open source option for targeting NVIDIA® GPUs. While using OpenMP allows portability across different architectures, matching native CUDA® performance without major code restructuring is an open research issue. In order to analyze the current performance, we port a suite of representative benchmarks,...

chapter

Characterizing Power and Performance of GPU Memory Access

Tyler Allen, Rong Ge

2016 4th International Workshop on Energy Efficient Supercomputing (E2SC) > 46 - 53

2016 4th International Workshop on Energy Efficient Supercomputing (E2SC)

Power is a major limiting factor for the future of HPC and the realization of exascale computing under a power budget. GPUs have now become a mainstream parallel computation device in HPC, and optimizing power usage on GPUs is critical to achieving future goals. GPU memory is seldom studied, especially for power usage. Nevertheless, memory accesses draw significant power and are critical to understanding...

chapter

A Directive Generation Approach Using User-Defined Rules

Kazuhiko Komatsu, Ryusuke Egawa, Hiroyuki Takizawa, Hiroaki Kobayashi

2016 Fourth International Symposium on Computing and Networking (CANDAR) > 515 - 521

2016 Fourth International Symposium on Computing and Networking (CANDAR)

The appearance of various high-performance computing (HPC) systems compels a user to write a code considering the characteristic of each HPC system. To describe the system-dependent information without drastic code modifications, the directive sets such as the OpenMP directive set and the OpenACC directive set are useful. However, a code becomes complex to achieve high performance on various HPC systems...

chapter

OpenSHMEM Non-blocking Data Movement Operations with MVAPICH2-X: Early Experiences

Khaled Hamidouche, Jie Zhang, Dhabaleswar K. Panda, Karen Tomko

2016 PGAS Applications Workshop (PAW) > 9 - 16

2016 PGAS Applications Workshop (PAW)

PGAS models, with lightweight synchronization and a shared memory abstraction, are seen as a good alternative to the Message Passing model for irregular communication patterns. OpenSHMEM is a library-based PGAS model. OpenSHMEM 1.3 introduced Non-Blocking data movement operations to provide better asynchronous progress and overlap. In this paper, we present our experiences in designing Non-Blocking...

chapter

Robust and Real-Time Object Tracking Using Scale-Adaptive Correlation Filters

Qingyong Hu, Yulan Guo, Zaiping Lin, Wei An, et al.

2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA) > 1 - 8

2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA)

Correlation filter based tracking method has been widely used for its high efficiency and robustness. However, reducing model drifting while achieving both high robustness and fast scale estimation is still an open problem. In this paper, we represent the target in kernel feature space and train a classifier on a scale pyramid to achieve adaptive scale estimation. We then integrate three complementary...

chapter

Analysis and improvement of joint bilateral upsampling for depth image super-resolution

Yibing Song, Lijun Gong

2016 8th International Conference on Wireless Communications & Signal Processing (WCSP) > 1 - 5

2016 8th International Conference on Wireless Communications & Signal Processing (WCSP)

We analyze and propose an improved implementation of joint bilateral upsampling algorithm [5] for depth image super-resolution (SR). The input to the algorithm is a low resolution (LR) depth image and its corresponding high resolution (HR) color image. With the guidance of HR color image, the depth edges can be preserved during the SR process. However, in the original implementation, the sparse sampling...

chapter

Physical design of supergate cells aiming geometrical optimizations

Maicon S. Cardoso, Gustavo H. Smaniotto, Regis Zanandrea, Renato S. de Souza, et al.

2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS) > 1 - 4

2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS)

Recent papers have demonstrated that graph-based methodologies for supergate design can provide solutions with fewer transistors when compared to the widely used factoring methods. However, there is not enough discussion about the impact of those solutions on physical design, and it is important since the generated supergates have some special topological particularities. In this paper, we perform...

chapter

A Comparative Study of SYCL, OpenCL, and OpenMP

Hercules Cardoso Da Silva, Flavia Pisani, Edson Borin

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 61 - 66

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW)

Recent trends indicate that future computing systems will be composed of a group of heterogeneous computing devices, including CPUs, GPUs, and other hardware accelerators. These devices provide increased processing performance; however, creating efficient code for them may require that programmers manage memory assignments and use specialized APIs, compilers, or runtime systems, thus making their...

chapter

A CUDA implementation of the pagerank pipeline benchmark

Mauro Bisson, Everett Phillips, Massimiliano Fatica

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

2016 IEEE High Performance Extreme Computing Conference (HPEC)

This paper presents the details of a CUDA implementation of the PageRank pipeline benchmark [1], a newly proposed benchmark that aims to compare and measure the capabilities of big data systems. The reference implementation is only serial at the moment, but our CUDA implementation is parallel. The results indicate that GPU-accelerated systems have considerable potential for big data workloads.

chapter

Design space exploration of GPU Accelerated cluster systems for optimal data transfer using PCIe bus

Janki Bhimani, Miriam Leeser, Ningfang Mi

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

2016 IEEE High Performance Extreme Computing Conference (HPEC)

Use of accelerators such as GPUs is increasing, but efficient use of GPUs requires making good design choices. Such design choices include type of memory allocation and overlapping concurrency of data transfer with parallel computation. Performance varies with the application, hardware version such as generation of GPU, and software version including programming language drivers. This large number...

chapter

A Robust Methodology for Performance Analysis on Hybrid Embedded Multicore Architectures

Romain Saussard, Boubker Bouzid, Marius Vasiliu, Roger Reynaud

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 77 - 84

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)

Today's vehicles increasingly embed software intelligence in order to be safer for the driver, and to achieve autonomous driving in the near future. To answer the computational needs of these algorithms, system-on-chip (SoC) suppliers propose heterogeneous architectures. With such complex SoCs, embedding applications in vehicles becomes more and more complex for car manufacturers. Indeed, it is not...

chapter

Pushing the Limits of Online Auto-Tuning: Machine Code Optimization in Short-Running Kernels

Fernando Endo, Damien Courousse, Henri-Pierre Charles

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 265 - 272

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)

This paper proposes an online auto-tuning approach for computing kernels. Unlike existing online auto-tuners, which regenerate code with long compilation chains from the source to the binary code, our approach consists of deploying auto-tuning directly at the level of machine code generation. This allows auto-tuning to pay off in very short-running applications. As a proof of concept, our...

chapter

Directive-Based Pipelining Extension for OpenMP

Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-Chun Feng

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 481 - 484

2016 IEEE International Conference on Cluster Computing (CLUSTER)

Programming models like CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies and executes in sequence, requires extensive hand-tuning of techniques, such as pipelining to overlap computation and communication. Therefore, we propose an easy-to-use, directive-based pipelining extension...

chapter

SPEC-AX and PARSEC-AX: extracting accelerator benchmarks from microprocessor benchmarks

Snehasish Kumar, William N. Sumner, Arrvindh Shriraman

2016 IEEE International Symposium on Workload Characterization (IISWC) > 1 - 11

2016 IEEE International Symposium on Workload Characterization (IISWC)

The end of Dennard Scaling has necessitated research into the adoption of specialized architectures for offloading specific code regions in applications. Recent works in accelerator architectures have chosen diverse workloads and even diverse code regions (within the same workload) to highlight the efficacy of specific accelerator architectures. However this makes it challenging to evaluate the power/performance...

chapter

Hetero-mark, a benchmark suite for CPU-GPU collaborative computing

Yifan Sun, Xiang Gong, Amir Kavyan Ziabari, Leiming Yu, et al.

2016 IEEE International Symposium on Workload Characterization (IISWC) > 1 - 10

2016 IEEE International Symposium on Workload Characterization (IISWC)

Graphics Processing Units (GPUs) can easily outperform CPUs in processing large-scale data parallel workloads, but are considered weak in processing serialized tasks and communicating with other devices. Pursuing a CPU-GPU collaborative computing model which takes advantage of both devices could provide an important breakthrough in realizing the full performance potential of heterogeneous computing...

chapter

Benchmarking the graphulo processing framework

Timothy Weale, Vijay Gadepally, Dylan Hutchison, Jeremy Kepner

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 5

2016 IEEE High Performance Extreme Computing Conference (HPEC)

Graph algorithms have wide applicability to a variety of domains and are often used on massive datasets. Recent standardization efforts such as the GraphBLAS specify a set of key computational kernels that hardware and software developers can adhere to. Graphulo is a processing framework that enables GraphBLAS kernels in the Apache Accumulo database. In our previous work, we have demonstrated a core...

chapter

Computational and memory analysis of Tegra SoCs

Andrew Milluzzi, Alan George, Herman Lam

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

2016 IEEE High Performance Extreme Computing Conference (HPEC)

Low-power, embedded, GPU System-on-Chip (SoC) devices provide outstanding computational performance, especially for compute-intensive tasks. While clusters of SoCs for High-Performance Embedded Computing (HPEC) are not new, the computational power of these supercomputers has long lacked the efficiency of their more traditional, High-Performance Computing (HPC) counterparts. With the advent of the...

INFONA - science communication portal
