Search results

Chapter

FinePar: Irregularity-aware fine-grained workload partitioning on integrated architectures

Feng Zhang, Bo Wu, Jidong Zhai, Bingsheng He, et al.

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 27-38

The integrated architecture that features both CPU and GPU on the same die is an emerging and promising architecture for fine-grained CPU-GPU collaboration. However, the integration also brings forward several programming and system optimization challenges, especially for irregular applications. The complex interplay between heterogeneity and irregularity leads to very low processor utilization of...

Chapter

FlexFlow: A Flexible Dataflow Accelerator Architecture for Convolutional Neural Networks

Wenyan Lu, Guihai Yan, Jiajun Li, Shijun Gong, et al.

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 553-564

Convolutional Neural Networks (CNN) are very computation-intensive. Recently, many CNN accelerators based on the intrinsic parallelism of CNNs have been proposed. However, we observed a big mismatch between the parallel types supported by the computing engine and the dominant parallel types of CNN workloads. This mismatch seriously degrades the resource utilization of existing accelerators. In this...

Chapter

NCAP: Network-Driven, Packet Context-Aware Power Management for Client-Server Architecture

Mohammad Alian, Ahmed H. M. O. Abulila, Lokesh Jindal, Daehoon Kim, et al.

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 25-36

The rate of network packets encapsulating requests from clients can significantly affect the utilization, and thus performance and sleep states of processors in servers deploying a power management policy. To improve energy efficiency, servers may adopt an aggressive power management policy that frequently transitions a processor to a low-performance or sleep state at a low utilization. However, such...

Chapter

PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning

Linghao Song, Xuehai Qian, Hai Li, Yiran Chen

2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 541-552

Convolutional neural networks (CNNs) are at the heart of deep learning applications. Recent works PRIME [1] and ISAAC [2] demonstrated the promise of using resistive random access memory (ReRAM) to perform neural computations in memory. We found that training cannot be efficiently supported by the current schemes. First, they do not consider weight updates and the complex data dependencies in the training procedure...

Chapter

A space- and energy-efficient code compression/decompression technique for coarse-grained reconfigurable architectures

Bernhard Egger, Hochan Lee, Duseok Kang, Mansureh S. Moghaddam, et al.

2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 197-209

We present an effective code compression technique to reduce the area and energy overhead of the configuration memory for coarse-grained reconfigurable architectures (CGRA). Based on a statistical analysis of existing code, the proposed method reorders the storage locations of the reconfigurable entities and splits the wide configuration memory into a number of partitions. Code compression is achieved...

Chapter

Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs

Qingcheng Xiao, Yun Liang, Liqiang Lu, Shengen Yan, et al.

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6

Convolutional neural networks (CNNs) find applications in a variety of computer vision tasks, ranging from object recognition and detection to scene understanding, owing to their exceptional accuracy. There exist different algorithms for CNN computation. In this paper, we explore the conventional convolution algorithm together with a faster algorithm based on Winograd's minimal filtering theory for efficient FPGA...
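The abstract above names Winograd's minimal filtering as the faster alternative to conventional convolution. As a hedged illustration only (the truncated abstract does not describe the paper's FPGA design), the simplest 1-D instance, F(2,3), produces two outputs of a 3-tap filter with 4 multiplications instead of 6. The function names below are hypothetical; 2-D CNN layers typically nest this idea, e.g. F(2×2, 3×3).

```python
def winograd_filter_transform(g):
    """Transform a 3-tap filter once; the result is reused for every input tile."""
    g0, g1, g2 = g
    return (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)


def winograd_f23(d, u):
    """F(2,3): two outputs from a 4-element input tile using 4 multiplications."""
    d0, d1, d2, d3 = d
    m1 = (d0 - d2) * u[0]
    m2 = (d1 + d2) * u[1]
    m3 = (d2 - d1) * u[2]
    m4 = (d1 - d3) * u[3]
    return [m1 + m2 + m3, m2 - m3 - m4]


def correlate_1d_winograd(x, g):
    """Valid-mode 1-D correlation (what CNN layers call 'convolution'), two outputs per tile."""
    n_out = len(x) - 2
    assert n_out > 0 and n_out % 2 == 0, "sketch assumes an even number of outputs"
    u = winograd_filter_transform(g)            # computed once per filter
    y = []
    for i in range(0, n_out, 2):
        y.extend(winograd_f23(x[i:i + 4], u))   # overlapping 4-element tiles
    return y


def correlate_1d_direct(x, g):
    """Reference: 3 multiplications per output, i.e. 6 per pair of outputs."""
    return [x[i] * g[0] + x[i + 1] * g[1] + x[i + 2] * g[2] for i in range(len(x) - 2)]
```

In hardware the saving comes from trading multipliers for cheaper additions; the filter transform is amortized because CNN weights are fixed at inference time.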

Chapter

HyCUBE: A CGRA with reconfigurable single-cycle multi-hop interconnect

Manupa Karunaratne, Aditi Kulkarni Mohite, Tulika Mitra, Li-Shiuan Peh

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6

CGRAs are promising as accelerators due to their improved energy-efficiency compared to FPGAs. Existing CGRAs support reconfigurability for operations, but not communications because of the static neighbor-to-neighbor interconnect, leading to both performance loss and increased complexity of the compiler. In this paper, we introduce HyCUBE, a novel CGRA architecture with a reconfigurable interconnect...

Chapter

A kernel decomposition architecture for binary-weight Convolutional Neural Networks

Hyeonuk Kim, Jaehyeong Sim, Yeongjae Choi, Lee-Sup Kim

2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1-6

The binary-weight CNN is one of the most efficient solutions for mobile CNNs. However, a large number of operations are required to process each image. To reduce such a huge operation count, we propose an energy-efficient kernel decomposition architecture, based on the observation that a large number of operations are redundant. In this scheme, all kernels are decomposed into sub-kernels to expose...

Chapter

Work-in-progress: REDEFINE – a case for WCET-friendly hardware accelerators for real time applications

Kavitha Madhu, Tarun Singla, S K Nandy, Ranjani Narayan, et al.

2017 International Conference on Compilers, Architectures and Synthesis For Embedded Systems (CASES), pp. 1-2

REDEFINE is a distributed dynamic dataflow architecture designed to exploit parallelism at various granularities as an embedded system-on-chip (SoC). This paper dwells on the flexibility of the REDEFINE architecture and its execution model in accelerating real-time applications, coupled with a WCET analyzer that computes execution-time bounds of real-time applications.

Chapter

A programmable Galois Field processor for the Internet of Things

Yajing Chen, Shengshuo Lu, Cheng Fu, David Blaauw, et al.

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 55-68

This paper investigates the feasibility of a unified processor architecture to enable error coding flexibility and secure communication in low power Internet of Things (IoT) wireless networks. Error coding flexibility for wireless communication allows IoT applications to exploit the large tradeoff space in data rate, link distance and energy-efficiency. As a solution, we present a light-weight Galois...
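Error-correcting codes such as Reed-Solomon and ciphers such as AES are built on Galois Field arithmetic, which is the workload the abstract's processor targets. The following is a minimal software sketch of the two core operations, assuming GF(2^8) with the AES reduction polynomial x^8 + x^4 + x^3 + x + 1 (0x11B); the field size, polynomial, and function names are assumptions for illustration, not the paper's design, which the truncated abstract does not describe.

```python
AES_POLY = 0x11B  # assumed reduction polynomial: x^8 + x^4 + x^3 + x + 1


def gf256_mul(a, b, poly=AES_POLY):
    """Multiply two GF(2^8) elements (shift-and-add with modular reduction)."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^8) is bitwise XOR
        b >>= 1
        a <<= 1              # multiply a by x
        if a & 0x100:        # degree reached 8: reduce modulo the polynomial
            a ^= poly
    return result


def gf256_inv(a):
    """Multiplicative inverse via a^254, since the nonzero elements form a group of order 255."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    result, base, e = 1, a, 254
    while e:                 # square-and-multiply exponentiation
        if e & 1:
            result = gf256_mul(result, base)
        base = gf256_mul(base, base)
        e >>= 1
    return result
```

For example, `gf256_mul(0x57, 0x83)` yields `0xC1`, the worked example in the AES specification. A hardware GF processor would replace this bit-serial loop with wide parallel logic, but the arithmetic is the same.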

Chapter

Access pattern-aware cache management for improving data utilization in GPU

Gunjae Koo, Yunho Oh, Won Woo Ro, Murali Annavaram

2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), pp. 307-319

The long latency of memory operations is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (collections of threads) creates significant cache contention and premature data eviction. Prior work has recognized this problem and proposed warp throttling, which reduces the number of active warps contending for cache space...

Chapter

DFGenTool: A Dataflow Graph Generation Tool for Coarse Grain Reconfigurable Architectures

Manideepa Mukherjee, Alexander Fell, Apala Guha

2017 30th International Conference on VLSI Design and 2017 16th International Conference on Embedded Systems (VLSID), pp. 67-72

This paper presents DFGenTool, a dataflow graph (DFG) generation tool that converts loops in a sequential program written in a high-level language such as C into a DFG. DFGenTool adapts DFGs for mapping to Coarse Grain Reconfigurable Architectures (CGRA), enabling a variety of CGRA implementations and compilers to be benchmarked against a standard set of DFGs. Several kernels have been converted...
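To make "converting a loop into a DFG" concrete, here is a minimal sketch of the general idea, not DFGenTool's implementation (which works on C and whose internals the abstract does not describe). It assumes the loop body has already been reduced to a single scalar assignment and uses Python's standard `ast` module; the function `build_dfg` and its node labels are hypothetical.

```python
import ast

# Hypothetical operator table for this sketch; only + - * are handled.
_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*"}


def build_dfg(stmt_src):
    """Build a dataflow graph for one assignment such as 'c = a * b + k'.

    Returns (nodes, edges): nodes maps node id -> label, and edges is a list of
    (producer_id, consumer_id) pairs. Each variable read becomes one input node,
    so repeated uses of a variable share a single producer.
    """
    stmt = ast.parse(stmt_src).body[0]
    if not (isinstance(stmt, ast.Assign) and len(stmt.targets) == 1
            and isinstance(stmt.targets[0], ast.Name)):
        raise ValueError("sketch only handles 'name = expression'")
    nodes, edges, inputs = {}, [], {}

    def visit(node):
        if isinstance(node, ast.Name):
            if node.id not in inputs:            # first read: create an input node
                inputs[node.id] = len(nodes)
                nodes[inputs[node.id]] = node.id
            return inputs[node.id]
        if isinstance(node, ast.Constant):
            nid = len(nodes)
            nodes[nid] = repr(node.value)         # constant operand node
            return nid
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            nid = len(nodes)
            nodes[nid] = _OPS[type(node.op)]      # operation node
            edges.append((visit(node.left), nid))
            edges.append((visit(node.right), nid))
            return nid
        raise NotImplementedError(ast.dump(node))

    root = visit(stmt.value)
    out = len(nodes)
    nodes[out] = "store " + stmt.targets[0].id    # output node writes the result
    edges.append((root, out))
    return nodes, edges
```

For `c = a * b + k` this yields six nodes (`+`, `*`, `a`, `b`, `k`, `store c`) and five edges. A CGRA mapper would then place each operation node on a processing element and route each edge through the interconnect.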

Chapter

Severity grading of psoriatic plaques using deep CNN based multi-task learning

Anabik Pal, Akshay Chaturvedi, Utpal Garain, Aditi Chandra, et al.

2016 23rd International Conference on Pattern Recognition (ICPR), pp. 1478-1483

This paper addresses the problem of automatic, machine-analysis-based severity scoring of the skin disease psoriasis. Three disease parameters, namely erythema, scaling, and induration, are considered for this severity grading. Given an image containing a psoriatic plaque, the task is to predict severity scores for all three parameters. This paper presents a novel deep CNN based architecture...

Chapter

Convolutional Neural Networks for object recognition on mobile devices: A case study

Luis Tobias, Aurelien Ducournau, Francois Rousseau, Gregoire Mercier, et al.

2016 23rd International Conference on Pattern Recognition (ICPR), pp. 3530-3535

Deep Learning (DL), especially Convolutional Neural Networks (CNN), has become the state of the art for a variety of pattern recognition problems. Technological developments have allowed the use of high-end General Purpose Graphics Processing Units (GPGPU) to accelerate numerical problem solving. They not only lower computation time but also make much larger networks feasible. Hence,...

Chapter

Message-Passing Interface Cluster Built upon System Kernel Environment

Jih-Ching Chiu, Bao-Ren Guo, Chih-Hsun Chao

2016 International Computer Symposium (ICS), pp. 509-514

With the arrival of the Big Data era, the three defining characteristics of Big Data (Volume, Variety, and Velocity) present new challenges for Cloud Computing. In response to the demand for Big Data analytics, using distributed computing clusters to process vast amounts of data has become a megatrend. In this paper, we discuss the performance of distributed computing clusters provided by the current cloud computing...

Chapter

Security analysis on InfiniBand protocol implementations

Kul Prasad Subedi, Dipankar Dasgupta, Bo Chen

2016 IEEE Symposium Series on Computational Intelligence (SSCI), pp. 1-7

The growing popularity of high performance computing has led to a new focus on bypassing or eliminating traditional I/O operations that are usually the bottlenecks for fast processing of large data volumes. One such solution uses a new network communication protocol called InfiniBand (IB) which supports remote direct memory access without making two copies of data (one in user space and the other...

Chapter

Mini-apps for high performance data analysis

Sreenivas R. Sukumar, Michael A. Matheson, Ramakrishnan Kannan, Seung-Hwan Lim

2016 IEEE International Conference on Big Data (Big Data), pp. 1483-1492

Scaling up scientific data analysis and machine learning algorithms for data-driven discovery is a grand challenge that we face today. Despite the growing need for analysis from science domains that are generating ‘Big Data’ from instruments and simulations, building high-performance analytical workflows of data-intensive algorithms has been daunting because: (i) the ‘Big Data’ hardware and software...

Chapter

Kernels for scalable data analysis in science: Towards an architecture-portable future

Sreenivas R. Sukumar, Ramakrishnan Kannan, Seung-Hwan Lim, Michael A. Matheson

2016 IEEE International Conference on Big Data (Big Data), pp. 1026-1031

In this paper, we pose and address some of the unique challenges in the analysis of scientific Big Data on supercomputing platforms. Our approach identifies, implements and scales numerical kernels that are critical to the instantiation of theory-inspired analytic workflows on modern computing architectures. We present the benefits of scalable kernels towards constructing algorithms such as principal...

Chapter

Automated Optimal Architecture of Deep Convolutional Neural Networks for Image Recognition

Saleh Albelwi, Ausif Mahmood

2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 53-60

Recent advances in deep Convolutional Neural Networks (CNNs) have led to impressive progress in computer vision, especially in image classification. CNNs involve numerous hyperparameters that define the network's structure, such as network depth, kernel size, number of feature maps, stride, pooling size, and pooling regions. These hyperparameters have a significant impact on the classification...

Chapter

Extending FreeRTOS to support dynamic and distributed mapping in multiprocessor systems

G. Abich, M. G. Mandelli, F. R. Rosa, F. Moraes, et al.

2016 IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp. 712-715

With the ever-increasing complexity of both embedded application workloads and multiprocessor platforms comes a growing demand for efficient mapping heuristics capable of allocating several application workloads at runtime. Most proposed mapping techniques are bespoke implementations that assume an in-house operating system developed for a particular architecture, restricting their adoption...

INFONA - scientific communication portal
