Search results

chapter

Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code

David Beckingsale, Olga Pearce, Ignacio Laguna, Todd Gamblin

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS) > 307 - 316

2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Increasing architectural diversity makes performance portability extremely important for parallel simulation codes. Emerging on-node parallelization frameworks such as Kokkos and RAJA decouple the work done in kernels from the parallelization mechanism, allowing for a single source kernel to be tuned for different architectures at compile time. However, computational demands in production applications...

chapter

Single-cell based random neural network for deep learning

Yonghua Yin, Erol Gelenbe

2017 International Joint Conference on Neural Networks (IJCNN) > 86 - 93

2017 International Joint Conference on Neural Networks (IJCNN)

Recent work demonstrated the value of multi clusters of spiking Random Neural Networks (RNN) with dense soma-to-soma interactions in deep learning. In this paper we go back to the original simpler structure and we investigate the power of single RNN cells for deep learning. First, we consider three approaches with the single cells, twin cells and multi-cell clusters. This first part shows that RNNs...

chapter

Exploiting Decoupled OpenCL Work-Items with Data Dependencies on FPGAs: A Case Study

Javier Alejandro Varela, Norbert Wehn, Qian Liang, Songyin Tang

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 124 - 131

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In the field of high performance heterogeneous computing systems, field programmable gate arrays (FPGAs) have shown great advantages in terms of acceleration and energy efficiency. With the inclusion of the OpenCL framework for parallel programming, the design complexity has been greatly reduced. However, the parallel implementation of applications containing data-dependent branches usually experiences...

chapter

Using convolutional neural networks for plant classification

Salar Razavi, Hulya Yalcin

2017 25th Signal Processing and Communications Applications Conference (SIU) > 1 - 4

2017 25th Signal Processing and Communications Applications Conference (SIU)

Growing concerns about increasing world population and limited food resources have been leading researchers to utilize advanced computing technology to improve the efficiency of agricultural fields. Computing technology is expected to increase the productivity, contribute to a better understanding of the relationship between environmental factors and healthy crops, reduce the labor costs for farmers...

chapter

Power Analysis of HLS-Designed Customized Instruction Set Architectures

Tejaswini Ananthanarayana, Sonia Lopez, Marcin Lukowiak

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 207 - 212

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Performance and power consumption are key features for evaluating any processor design. In this paper, we pay close attention to the impact on power and energy consumption of customized Instruction Set Architectures (ISAs) designed by means of High Level Synthesis (HLS) tools. We compare these results against a full ISA soft processor, MicroBlaze. Our customized ISA processors greatly reduce the...

chapter

New optimized GPU version of the k-means algorithm for large-sized image segmentation

Hicham Fakhi, Omar Bouattane, Mohamed Youssfi, Ouajji Hassan

2017 Intelligent Systems and Computer Vision (ISCV) > 1 - 6

2017 Intelligent Systems and Computer Vision (ISCV)

K-means is a compute-intensive iterative algorithm; each iteration consists of two steps: data assignment and recalculation of the K centroids. In order to accelerate the compute-intensive portions of k-means, the data assignment and K centroids recalculation steps are offloaded to the GPU in parallel. Only the initialization and convergence test steps are performed by the CPU. In addition, this new version...

chapter

Multi2Sim Kepler: A detailed architectural GPU simulator

Xun Gong, Rafael Ubal, David Kaeli

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 269 - 278

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Presilicon simulation is one of the key toolsets for computer architects to evaluate and optimize their future designs. As Graphics Processing Units (GPUs) have become the platform of choice in many computing communities due to their impressive processing capabilities, computer architecture researchers need a simulation framework that allows them to quantitatively consider design tradeoffs. In this...

chapter

SimBench: A portable benchmarking methodology for full-system simulators

Harry Wagstaff, Bruno Bodin, Tom Spink, Bjorn Franke

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) > 217 - 226

2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Full-system simulators are increasingly finding their way into the consumer space for the purposes of backwards compatibility and hardware emulation (e.g. for games consoles). For such compute-intensive applications simulation performance is paramount. In this paper we argue that existing benchmark suites such as SPEC CPU2006, originally designed for architecture and compiler performance evaluation,...

chapter

An FPGA Design Framework for CNN Sparsification and Acceleration

Sicheng Li, Wei Wen, Yu Wang, Song Han, et al.

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 28

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Convolutional neural networks (CNNs) have recently broken many performance records in image recognition and object detection problems. The success of CNNs, to a great extent, is enabled by the fast scaling-up of the networks that learn from a huge volume of data. The deployment of big CNN models can be both computation-intensive and memory-intensive, leaving severe challenges to hardware implementations...

chapter

Fast and Energy-Driven Design Space Exploration for Heterogeneous Architectures

Baptiste Roux, Matthieu Gautier, Olivier Sentieys, Jean-Philippe Delahaye

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) > 83

2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

In recent years, the integration of specialized hardware accelerators in Multiprocessor System-on-Chip (MpSoC) has led to a new kind of architecture combining both software (SW) and hardware (HW) computational resources. For these new Heterogeneous MpSoC (HMpSoC) architectures, performance and energy consumption depend on a large set of parameters such as the HW/SW partitioning, the type of HW implementation...

chapter

Quality Attribute Trade-Offs in Industrial Software Systems

Michael Wahler, Raphael Eidenbenz, Aurelien Monot, Manuel Oriol, et al.

2017 IEEE International Conference on Software Architecture Workshops (ICSAW) > 251 - 254

2017 IEEE International Conference on Software Architecture Workshops (ICSAW)

The main challenge of architecting modern industrial control and automation systems (ICASs) is that they need to fulfill quality attributes (QAs) traditional to real-time systems, such as timeliness and predictability, and to modern software engineering, such as modularity or reusability. QAs are often conflicting, which entails difficult trade-offs. As a consequence, even the architecture of closely...

chapter

A deep learning approach to multiple kernel fusion

Huan Song, Jayaraman J. Thiagarajan, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2292 - 2296

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Kernel fusion is a popular and effective approach for combining multiple features that characterize different aspects of data. Traditional approaches for Multiple Kernel Learning (MKL) attempt to learn the parameters for combining the kernels through sophisticated optimization procedures. In this paper, we propose an alternative approach that creates dense embeddings for data using the kernel similarities...

chapter

Learning rotation invariance in deep hierarchies using circular symmetric filters

Dhruv Kohli, Biplab Ch Das, Viswanath Gopalakrishnan, Kiran Nanjunda Iyer

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2846 - 2850

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Deep hierarchical models for feature learning have emerged as an effective technique for object representation and classification in recent years. Though the features learnt using deep models have shown a lot of promise towards achieving invariance to data transformations, this primarily comes at the expense of using much larger training data and model size. In the proposed work we devise a novel technique...

chapter

Ultra low-power visual odometry for nano-scale unmanned aerial vehicles

Daniele Palossi, Andrea Marongiu, Luca Benini

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1647 - 1650

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

One of the fundamental functionalities for autonomous navigation of Unmanned Aerial Vehicles (UAVs) is the hovering capability. State-of-the-art techniques for implementing hovering on standard-size UAVs process a camera stream to determine position and orientation (visual odometry). Similar techniques are considered unaffordable in the context of nano-scale UAVs (i.e. a few centimeters in diameter),...

chapter

On the Feasibility of Implementing TCP Using a Modular Architecture

Mohamed Oulmahdi, Nicolas Van Wambeke, Christophe Chassot, Abdelkamel Tari

2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA) > 17 - 22

2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA)

Today, the design of modern Transport protocols follows modular architecture models. This approach transposes the loose coupling pattern used in software design to protocols, allowing them to benefit from high configurability, composability, flexibility and maintainability. Taking into account the "only TCP" policy applied by systems and Internet providers leading to the non-deployment of...

chapter

Stochastic-based multi-stage streaming realization of deep convolutional neural network

Mohammed Alawad, Mingjie Lin

2017 18th International Symposium on Quality Electronic Design (ISQED) > 13 - 18

2017 18th International Symposium on Quality Electronic Design (ISQED)

Large-scale convolutional neural networks (CNNs), conceptually mimicking the operational principle of visual perception in the human brain, have been widely applied to tackle many challenging computer vision and artificial intelligence applications. Unfortunately, despite its simple architecture, a typically-sized CNN is well known to be computationally intensive. This work presents a novel stochastic-based...

chapter

A novel zero weight/activation-aware hardware architecture of convolutional neural network

Dongyoung Kim, Junwhan Ahn, Sungjoo Yoo

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1462 - 1467

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

It is imperative to accelerate convolutional neural networks (CNNs) due to their ever-widening application areas from server, mobile to IoT devices. Based on the fact that CNNs can be characterized by a significant amount of zero values in both kernel weights and activations, we propose a novel hardware accelerator for CNNs exploiting zero weights and activations. We also report a zero-induced load...

chapter

Exploiting loop-dependent Stream Reuse for stream processors

Xuejun Yang, Ying Zhang, Jingling Xue, Ian Rogers, et al.

2008 International Conference on Parallel Architectures and Compilation Techniques (PACT) > 22 - 31

2008 International Conference on Parallel Architectures and Compilation Techniques (PACT)

Memory access limits the performance of stream processors. By exploiting the reuse of data held in the Stream Register File (SRF), an on-chip storage, the number of memory accesses can be reduced. In current stream compilers, reuse is only attempted for simple stream references, those whose start and end are known. Compiler analysis from outside of stream processors does not directly enable the...

chapter

FlowOS: A pure flow-based vision of network traffic

Abdul Alim, Mehdi Bezahaf, Laurent Mathy

2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS) > 143 - 144

2012 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS)

The original Internet architecture lacked the concept of a flow, and considered traffic as a set of packets. In this short paper, we rethink this concept inside a middlebox-based platform and handle traffic as a whole block instead of as packets. We design a whole system where each input packet matching some criteria is placed in a specific structure which is shared between all processing modules...

chapter

Automatic generation of fast BLAS3-GEMM: A portable compiler approach

Xing Su, Xiangke Liao, Jingling Xue

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) > 122 - 133

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)

GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in assembly code or generated from C code by general-purpose compilers (guided by architecture-specific directives or auto-tuning). Therefore, either performance or portability suffers. We present a POrtable Compiler Approach, Poca, implemented in LLVM, to automatically generate and optimize this micro-kernel in...

INFONA - science communication portal