Search results

chapter

Robust and Real-Time Object Tracking Using Scale-Adaptive Correlation Filters

Qingyong Hu, Yulan Guo, Zaiping Lin, Wei An, et al.

2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA) > 1 - 8

Correlation-filter-based tracking has been widely used for its high efficiency and robustness. However, reducing model drift while achieving both high robustness and fast scale estimation is still an open problem. In this paper, we represent the target in kernel feature space and train a classifier on a scale pyramid to achieve adaptive scale estimation. We then integrate three complementary...

chapter

Analysis and improvement of joint bilateral upsampling for depth image super-resolution

Yibing Song, Lijun Gong

2016 8th International Conference on Wireless Communications & Signal Processing (WCSP) > 1 - 5

We analyze and propose an improved implementation of the joint bilateral upsampling algorithm [5] for depth image super-resolution (SR). The input to the algorithm is a low resolution (LR) depth image and its corresponding high resolution (HR) color image. With the guidance of the HR color image, the depth edges can be preserved during the SR process. However, in the original implementation, the sparse sampling...

chapter

Physical design of supergate cells aiming geometrical optimizations

Maicon S. Cardoso, Gustavo H. Smaniotto, Regis Zanandrea, Renato S. de Souza, et al.

2016 IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS) > 1 - 4

Recent papers have demonstrated that graph-based methodologies for supergate design can provide solutions with fewer transistors when compared to the widely used factoring methods. However, there is not enough discussion about the impact of those solutions on physical design, and it is important since the generated supergates have some special topological particularities. In this paper, we perform...

chapter

A Comparative Study of SYCL, OpenCL, and OpenMP

Hercules Cardoso Da Silva, Flavia Pisani, Edson Borin

2016 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW) > 61 - 66

Recent trends indicate that future computing systems will be composed of a group of heterogeneous computing devices, including CPUs, GPUs, and other hardware accelerators. These devices provide increased processing performance; however, creating efficient code for them may require that programmers manage memory assignments and use specialized APIs, compilers, or runtime systems, thus making their...

chapter

A CUDA implementation of the pagerank pipeline benchmark

Mauro Bisson, Everett Phillips, Massimiliano Fatica

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

This paper presents the details of a CUDA implementation of the PageRank pipeline benchmark [1], a newly proposed benchmark that aims to compare and measure the capabilities of big data systems. The reference implementation is only serial at the moment, but our CUDA implementation is parallel. The results indicate that GPU-accelerated systems have considerable potential for big data workloads.

chapter

Design space exploration of GPU Accelerated cluster systems for optimal data transfer using PCIe bus

Janki Bhimani, Miriam Leeser, Ningfang Mi

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

Use of accelerators such as GPUs is increasing, but efficient use of GPUs requires making good design choices. Such design choices include type of memory allocation and overlapping concurrency of data transfer with parallel computation. Performance varies with the application, hardware version such as generation of GPU, and software version including programming language drivers. This large number...

chapter

A Robust Methodology for Performance Analysis on Hybrid Embedded Multicore Architectures

Romain Saussard, Boubker Bouzid, Marius Vasiliu, Roger Reynaud

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 77 - 84

Today's vehicles increasingly embed software intelligence in order to be safer for the driver and to achieve autonomous driving in the near future. To answer the computational needs of these algorithms, system-on-chip (SoC) suppliers propose heterogeneous architectures. With such complex SoCs, embedding applications in vehicles becomes more and more complex for car manufacturers. Indeed, it is not...

chapter

Pushing the Limits of Online Auto-Tuning: Machine Code Optimization in Short-Running Kernels

Fernando Endo, Damien Courousse, Henri-Pierre Charles

2016 IEEE 10th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) > 265 - 272

This paper proposes an online auto-tuning approach for computing kernels. Unlike existing online auto-tuners, which regenerate code with long compilation chains from the source to the binary code, our approach consists of deploying auto-tuning directly at the level of machine code generation. This allows auto-tuning to pay off in very short-running applications. As a proof of concept, our...

chapter

Directive-Based Pipelining Extension for OpenMP

Xuewen Cui, Thomas R. W. Scogland, Bronis R. de Supinski, Wu-Chun Feng

2016 IEEE International Conference on Cluster Computing (CLUSTER) > 481 - 484

Programming models like CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies and executes in sequence, requires extensive hand-tuning with techniques such as pipelining to overlap computation and communication. Therefore, we propose an easy-to-use, directive-based pipelining extension...

chapter

SPEC-AX and PARSEC-AX: extracting accelerator benchmarks from microprocessor benchmarks

Snehasish Kumar, William N. Sumner, Arrvindh Shriraman

2016 IEEE International Symposium on Workload Characterization (IISWC) > 1 - 11

The end of Dennard scaling has necessitated research into the adoption of specialized architectures for offloading specific code regions in applications. Recent works in accelerator architectures have chosen diverse workloads and even diverse code regions (within the same workload) to highlight the efficacy of specific accelerator architectures. However, this makes it challenging to evaluate the power/performance...

chapter

Hetero-mark, a benchmark suite for CPU-GPU collaborative computing

Yifan Sun, Xiang Gong, Amir Kavyan Ziabari, Leiming Yu, et al.

2016 IEEE International Symposium on Workload Characterization (IISWC) > 1 - 10

Graphics Processing Units (GPUs) can easily outperform CPUs in processing large-scale data parallel workloads, but are considered weak in processing serialized tasks and communicating with other devices. Pursuing a CPU-GPU collaborative computing model which takes advantage of both devices could provide an important breakthrough in realizing the full performance potential of heterogeneous computing...

chapter

Benchmarking the graphulo processing framework

Timothy Weale, Vijay Gadepally, Dylan Hutchison, Jeremy Kepner

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 5

Graph algorithms have wide applicability to a variety of domains and are often used on massive datasets. Recent standardization efforts such as GraphBLAS specify a set of key computational kernels that hardware and software developers can adhere to. Graphulo is a processing framework that enables GraphBLAS kernels in the Apache Accumulo database. In our previous work, we have demonstrated a core...

chapter

Computational and memory analysis of Tegra SoCs

Andrew Milluzzi, Alan George, Herman Lam

2016 IEEE High Performance Extreme Computing Conference (HPEC) > 1 - 7

Low-power, embedded, GPU System-on-Chip (SoC) devices provide outstanding computational performance, especially for compute-intensive tasks. While clusters of SoCs for High-Performance Embedded Computing (HPEC) are not new, the computational power of these supercomputers has long lacked the efficiency of their more traditional, High-Performance Computing (HPC) counterparts. With the advent of the...

chapter

Variable-length VLIW encoding for code size reduction in embedded processors

Ting-Yu Shyu, Bo-Yu Su, Tay-Jyi Lin, Chingwei Yeh, et al.

2016 29th IEEE International System-on-Chip Conference (SOCC) > 296 - 299

Very-long-instruction-word (VLIW) architectures are widely adopted in high-performance and low-power digital signal processors (DSP) due to their simplicity from extensive software optimizations. However, their poor code density (usually > 2× code size for a given application) and corresponding instruction accesses can overwhelm the energy savings on DSP datapaths. This paper presents variable-length...

chapter

Parallel generation of digitally reconstructed radiographs on heterogeneous multi-GPU workstations

Marwan Abdellah, Asem Abdelaziz, E M B S Eslam Ali, Sherief Abdelaziz, et al.

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 3953 - 3956

The growing importance of three-dimensional radiotherapy treatment has been associated with the active presence of advanced computational workflows that can simulate conventional x-ray films from computed tomography (CT) volumetric data to create digitally reconstructed radiographs (DRR). These simulated x-ray images are used to continuously verify the patient alignment in image-guided therapies with...

chapter

Generation of the Single Precision BLAS Library for the Parallella Platform, with Epiphany Co-processor Acceleration, Using the BLIS Framework

Miguel Tasende

2016 IEEE 14th Intl Conf on Dependable, Autonomic and Secure Computing, 14th Intl Conf on Pervasive Intelligence and Computing, 2nd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech) > 894 - 897

The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high performance, energy-efficient, manycore architecture, Epiphany chip (used as co-processor) and one Zynq-7000 series chip, which normally runs a regular Linux OS version, serves as the main processor, and implements "glue logic" in its internal...

chapter

SoCLog: A real-time, automatically generated logging and profiling mechanism for FPGA-based Systems On Chip

Ioannis Parnassos, Panagiotis Skrimponis, Georgios Zindros, Nikolaos Bellas

2016 26th International Conference on Field Programmable Logic and Applications (FPL) > 1 - 4

Recent advances in FPGA technology and the proliferation of High Level Synthesis (HLS) tools make it possible to implement complex System on Chip (SoC) designs that realize complete applications in a single FPGA device. To be able to exploit the large performance vs. area search space of such modern FPGA-based SoCs, system architects must have the appropriate performance analysis tools to evaluate-preferably...

chapter

Temperature-Aware Register Mapping in GPGPUs

Ehsan Atoofian

2016 IEEE Trustcom/BigDataSE/ISPA > 1636 - 1643

Various architecture-based techniques have been proposed to reduce power consumption in GPGPUs. However, these techniques mostly ignore the temperature of GPGPUs. In this paper, we focus on the register file and propose a new technique to reduce its peak temperature. The register file in GPGPUs is very large, even larger than the caches, to support thousands of simultaneously executing threads. This makes register...

chapter

Labeled multi-object tracking algorithms for generic observation model

Suqi Li, Wei Yi, Bailu Wang, Lingjiang Kong

2016 19th International Conference on Information Fusion (FUSION) > 1125 - 1131

In this paper, we address the labeled multi-object tracking problem for the generic observation model (GOM) in the framework of finite set statistics. First, we derive a product-labeled multi-object (P-LMO) filter, which is a closed-form solution to the labeled multi-object Bayesian filter under the standard multi-object transition kernel and generic multi-object likelihood, and thus can be used as...

chapter

soft-NEON: A study on replacing the NEON engine of an ARM SoC with a reconfigurable fabric

Jose Raul Garcia Ordaz, Dirk Koch

2016 IEEE 27th International Conference on Application-specific Systems, Architectures and Processors (ASAP) > 229 - 230

Power is a limiting factor in the design of embedded processors. For this reason adding more instruction extensions is not a scalable option. To overcome this issue, we study the effects of replacing the NEON unit of an ARM SoC with an FPGA-like reconfigurable fabric. We measure the gap between the conventional hard-NEON and a soft-NEON implementation. We found that the soft-NEON has an overhead of...

INFONA - scientific communication portal
