Serwis Infona wykorzystuje pliki cookies (ciasteczka). Są to wartości tekstowe, zapamiętywane przez przeglądarkę na urządzeniu użytkownika. Nasz serwis ma dostęp do tych wartości oraz wykorzystuje je do zapamiętania danych dotyczących użytkownika, takich jak np. ustawienia (typu widok ekranu, wybór języka interfejsu), zapamiętanie zalogowania. Korzystanie z serwisu Infona oznacza zgodę na zapis informacji i ich wykorzystanie dla celów korzytania z serwisu. Więcej informacji można znaleźć w Polityce prywatności oraz Regulaminie serwisu. Zamknięcie tego okienka potwierdza zapoznanie się z informacją o plikach cookies, akceptację polityki prywatności i regulaminu oraz sposobu wykorzystywania plików cookies w serwisie. Możesz zmienić ustawienia obsługi cookies w swojej przeglądarce.
The integrated architecture that features both CPU and GPU on the same die is an emerging and promising architecture for fine-grained CPU-GPU collaboration. However, the integration also brings forward several programming and system optimization challenges, especially for irregular applications. The complex interplay between heterogeneity and irregularity leads to very low processor utilization of...
Convolutional Neural Networks (CNN) are verycomputation-intensive. Recently, a lot of CNN accelerators based on the CNN intrinsic parallelism are proposed. However, we observed that there is a big mismatch between the parallel types supported by computing engine and the dominant parallel types of CNN workloads. This mismatch seriously degrades resource utilization of existing accelerators. In this...
The rate of network packets encapsulating requests from clients can significantly affect the utilization, and thus performance and sleep states of processors in servers deploying a power management policy. To improve energy efficiency, servers may adopt an aggressive power management policy that frequently transitions a processor to a low-performance or sleep state at a low utilization. However, such...
Convolution neural networks (CNNs) are the heart of deep learning applications. Recent works PRIME [1] and ISAAC [2] demonstrated the promise of using resistive random access memory (ReRAM) to perform neural computations in memory. We found that training cannot be efficiently supported with the current schemes. First, they do not consider weight update and complex data dependency in training procedure...
We present an effective code compression technique to reduce the area and energy overhead of the configuration memory for coarse-grained reconfigurable architectures (CGRA). Based on a statistical analysis of existing code, the proposed method reorders the storage locations of the reconfigurable entities and splits the wide configuration memory into a number of partitions. Code compression is achieved...
Convolutional neural network (CNN) finds applications in a variety of computer vision applications ranging from object recognition and detection to scene understanding owing to its exceptional accuracy. There exist different algorithms for CNNs computation. In this paper, we explore conventional convolution algorithm with a faster algorithm using Winograd's minimal filtering theory for efficient FPGA...
CGRAs are promising as accelerators due to their improved energy-efficiency compared to FPGAs. Existing CGRAs support reconfigurability for operations, but not communications because of the static neighbor-to-neighbor interconnect, leading to both performance loss and increased complexity of the compiler. In this paper, we introduce HyCUBE, a novel CGRA architecture with a reconfigurable interconnect...
The binary-weight CNN is one of the most efficient solutions for mobile CNNs. However, a large number of operations are required to process each image. To reduce such a huge operation count, we propose an energy-efficient kernel decomposition architecture, based on the observation that a large number of operations are redundant. In this scheme, all kernels are decomposed into sub-kernels to expose...
REDEFINE is a distributed dynamic dataow architecture, designed for exploiting parallelism at various granularities as an embedded system-on-chip (SoC). is paper dwells on the exibility of REDEFINE architecture and its execution model in accelerating real-time applications coupled with a WCET analyzer that computes execution time bounds of real time applications.
This paper investigates the feasibility of a unified processor architecture to enable error coding flexibility and secure communication in low power Internet of Things (IoT) wireless networks. Error coding flexibility for wireless communication allows IoT applications to exploit the large tradeoff space in data rate, link distance and energy-efficiency. As a solution, we present a light-weight Galois...
Long latency of memory operation is a prominent performance bottleneck in graphics processing units (GPUs). The small data cache that must be shared across dozens of warps (a collection of threads) creates significant cache contention and premature data eviction. Prior works have recognized this problem and proposed warp throttling which reduces the number of active warps contending for cache space...
In this paper, DFGenTool, a dataflow graph (DFG) generation tool, is presented, which converts loops in a sequential program given in a high-level language such as C, into a DFG. DFGenTool adapts DFGs for mapping to Coarse Grain Reconfigurable Architectures (CGRA) to enable a variety of CGRA implementations and compilers to be benchmarked against a standard set of DFGs. Several kernels have been converted...
This paper addresses the problem of automatic machine analysis based severity scoring of psoriasis skin disease. Three different disease parameters namely, erythema, scaling and induration are considered for such severity grading. Given an image containing a psoriatic plaque the task is to predict severity scores for all the three parameters. This paper presents a novel deep CNN based architecture...
Deep Learning (DL), especially Convolutional Neural Networks (CNN), has become the state-of-the-art for a variety of pattern recognition issues. Technological developments have allowed the use of high-end General Purpose Graphic Processor Units (GPGPU) for accelerating numerical problem solving. They resort no only to lower computational time, but also allow considering much larger networks. Hence,...
With the age of Big Data coming, the three defining characteristics of Big Data–Volume, variety and Velocity, make Cloud Computing facing new challenges. In response to the demand of Big Data analytics, using distributed computing cluster to process vast amounts of data is a megatrend. In this paper, we discuss the performance of distributed computing clusters provided by the current cloud computing...
The growing popularity of high performance computing has led to a new focus on bypassing or eliminating traditional I/O operations that are usually the bottlenecks for fast processing of large data volumes. One such solution uses a new network communication protocol called InfiniBand (IB) which supports remote direct memory access without making two copies of data (one in user space and the other...
Scaling-up scientific data analysis and machine learning algorithms for data-driven discovery is a grand challenge that we face today. Despite the growing need for analysis from science domains that are generating ‘Big Data’ from instruments and simulations, building high-performance analytical workflows of data-intensive algorithms have been daunting because: (i) the ‘Big Data’ hardware and software...
In this paper, we pose and address some of the unique challenges in the analysis of scientific Big Data on supercomputing platforms. Our approach identifies, implements and scales numerical kernels that are critical to the instantiation of theory-inspired analytic workflows on modern computing architectures. We present the benefits of scalable kernels towards constructing algorithms such as principal...
Recent advancements in deep Convolutional Neural Networks (CNNs) have led to impressive progress in computer vision, especially in image classification. CNNs involve numerous hyperparameters that identify the network's structure such as depth of the network, kernel size, number of feature maps, stride, pooling size and pooling regions etc. These hyperparameters have a significant impact on the classification...
With the ever-increasing complexity of both embedded application workloads and multiprocessor platforms grows the demand for efficient mapping heuristics able of allocating several application workloads at runtime. The majority of promoted mapping techniques are bespoke implementations that consider an in-house operating system, which is developed to a particular architecture, restricting its adoption...
Podaj zakres dat dla filtrowania wyświetlonych wyników. Możesz podać datę początkową, końcową lub obie daty. Daty możesz wpisać ręcznie lub wybrać za pomocą kalendarza.