Serwis Infona wykorzystuje pliki cookies (ciasteczka). Są to wartości tekstowe, zapamiętywane przez przeglądarkę na urządzeniu użytkownika. Nasz serwis ma dostęp do tych wartości oraz wykorzystuje je do zapamiętania danych dotyczących użytkownika, takich jak np. ustawienia (typu widok ekranu, wybór języka interfejsu), zapamiętanie zalogowania. Korzystanie z serwisu Infona oznacza zgodę na zapis informacji i ich wykorzystanie dla celów korzytania z serwisu. Więcej informacji można znaleźć w Polityce prywatności oraz Regulaminie serwisu. Zamknięcie tego okienka potwierdza zapoznanie się z informacją o plikach cookies, akceptację polityki prywatności i regulaminu oraz sposobu wykorzystywania plików cookies w serwisie. Możesz zmienić ustawienia obsługi cookies w swojej przeglądarce.
Correlation filter based tracking method has been widely used for its high efficiency and robustness. However, reducing model drifting while achieving both high robustness and fast scale estimation is still an open problem. In this paper, we represent the target in kernel feature space and train a classifier on a scale pyramid to achieve adaptive scale estimation. We then integrate three complementary...
We analyze and propose an improved implementation of joint bilateral upsampling algorithm [5] for depth image super-resolution (SR). The input to the algorithm is a low resolution (LR) depth image and its corresponding high resolution (HR) color image. With the guidance of HR color image, the depth edges can be preserved during the SR process. However, in the original implementation, the sparse sampling...
Recent papers have demonstrated that graph-based methodologies for supergate design can provide solutions with fewer transistors when compared to the widely used factoring methods. However, there is not enough discussion about the impact of those solutions on physical design, and it is important since the generated supergates have some special topological particularities. In this paper, we perform...
Recent trends indicate that future computing systems will be composed by a group of heterogeneous computing devices, including CPUs, GPUs, and other hardware accelerators. These devices provide increased processing performance, however, creating efficient code for them may require that programmers manage memory assignments and use specialized APIs, compilers, or runtime systems, thus making their...
This paper presents the details of a CUDA implementation of the PageRank pipeline benchmark [1], a new proposed benchmark aimed to compare and measure the capabilities of big data systems. The reference implementation is only serial at the moment, but our CUDA implementation is parallel. The results indicate that GPU accelerated systems have considerable potential for big data workloads.
Use of accelerators such as GPUs is increasing, but efficient use of GPUs requires making good design choices. Such design choices include type of memory allocation and overlapping concurrency of data transfer with parallel computation. Performance varies with the application, hardware version such as generation of GPU, and software version including programming language drivers. This large number...
Today's vehicles increasingly embed software intelligence in order to be safer for the driver, and to achieve autonomous driving in a close future. To answer the computational needs of these algorithms, system-on-chip (SoC) suppliers propose heterogeneous architectures. With such complex SoCs, embedding applications in vehicle becomes more and more complex for car manufacturers. Indeed, it is not...
This paper proposes an online auto-tuning approach for computing kernels. Differently from existing online auto-tuners, which regenerate code with long compilation chains from the source to the binary code, our approach consists on deploying auto-tuning directly at the level of machine code generation. This allows auto-tuning to pay off in very short-running applications. As a proof of concept, our...
Programming models like CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies and executes in sequence, requires extensive hand-tuning of techniques, such as pipelining to overlap computation and communication. Therefore, we propose an easy-to-use, directive-based pipelining extension...
The end of Dennard Scaling has necessitated research into the adoption of specialized architectures for offloading specific code regions in applications. Recent works in accelerator architectures have chosen diverse workloads and even diverse code regions (within the same workload) to highlight the efficacy of specific accelerator architectures. However this makes it challenging to evaluate the power/performance...
Graphics Processing Units (GPUs) can easily outperform CPUs in processing large-scale data parallel workloads, but are considered weak in processing serialized tasks and communicating with other devices. Pursuing a CPU-GPU collaborative computing model which takes advantage of both devices could provide an important breakthrough in realizing the full performance potential of heterogeneous computing...
Graph algorithms have wide applicablity to a variety of domains and are often used on massive datasets. Recent standardization efforts such as the GraphBLAS specify a set of key computational kernels that hardware and software developers can adhere to. Graphulo is a processing framework that enables GraphBLAS kernels in the Apache Accumulo database. In our previous work, we have demonstrated a core...
Low-power, embedded, GPU System-on-Chip (SoC) devices provide outstanding computational performance, especially for compute-intensive tasks. While clusters of SoCs for High-Performance Embedded Computing (HPEC) are not new, the computational power of these supercomputers has long lacked the efficiency of their more traditional, High-Performance Computing (HPC) counterparts. With the advent of the...
Very-long-instruction-word (VLIW) architectures are widely adopted in high-performance and low-power digital signal processors (DSP) due to their simplicity from extensive software optimizations. However, their poor code density (usually > 2× code size for a given application) and corresponding instruction accesses can overwhelm the energy savings on DSP datapaths. This paper presents variable-length...
The growing importance of three-dimensional radiotherapy treatment has been associated with the active presence of advanced computational workflows that can simulate conventional x-ray films from computed tomography (CT) volumetric data to create digitally reconstructed radiographs (DRR). These simulated x-ray images are used to continuously verify the patient alignment in image-guided therapies with...
The Parallella is a hybrid computing platform that came into existence as the result of a Kickstarter project by Adapteva. It is composed of the high performance, energy-efficient, manycore architecture, Epiphany chip (used as co-processor) and one Zynq-7000 series chip, which normally runs a regular Linux OS version, serves as the main processor, and implements "glue logic" in its internal...
Recent advances in FPGA technology and the proliferation of High Level Synthesis (HLS) tools makes it possible to implement complex System on Chip (SoC) designs that realize complete applications in a single FPGA device. To be able to exploit the large performance vs. area search space of such modern FPGA-based SoCs, system architects must have the appropriate performance analysis tools to evaluate-preferably...
Various architectural-based techniques have been proposed to reduce power consumption in GPGPUs. However, these techniques mostly ignore temperature of GPGPUs. In this paper, we focus on the register file and propose a new technique to reduce its peak temperature. Register file in GPGPUs is very large, even larger than caches, to support thousands of simultaneously execution threads. This makes register...
In this paper, we are devoted to the labeled multi-object tracking problem for generic observation model (GOM) in the framework of Finite set statistics. Firstly, we derive a product-labeled multi-object (P-LMO) filter which is a closed form solution to labeled multi-object Bayesian filter under the standard multi-object transition kernel and generic multi-object likelihood, and thus can be used as...
Power is a limiting factor in the design of embedded processors. For this reason adding more instruction extensions is not a scalable option. To overcome this issue, we study the effects of replacing the NEON unit of an ARM SoC with an FPGA-like reconfigurable fabric. We measure the gap between the conventional hard-NEON and a soft-NEON implementation. We found that the soft-NEON has an overhead of...
Podaj zakres dat dla filtrowania wyświetlonych wyników. Możesz podać datę początkową, końcową lub obie daty. Daty możesz wpisać ręcznie lub wybrać za pomocą kalendarza.