This work presents an efficient hardware accelerator design for deep residual learning algorithms, which have shown superior image recognition accuracy (>90% top-5 accuracy on the ImageNet database). Two key objectives of the acceleration strategy are to (1) maximize resource utilization and minimize data movements, and (2) employ scalable and reusable computing primitives to optimize physical design...
Today's cyber-physical systems (CPS) for emerging smart cities include hardware and software with intelligent sensing and controls. In smart cities, the use of high-definition images, videos, and context information has become a requirement for urban street data collection and processing. Field Programmable Gate Array enabled data centers and processing show great potential for its high...
It is imperative to accelerate convolutional neural networks (CNNs) due to their ever-widening application areas, from servers and mobile to IoT devices. Based on the fact that CNNs can be characterized by a significant amount of zero values in both kernel weights and activations, we propose a novel hardware accelerator for CNNs exploiting zero weights and activations. We also report a zero-induced load...
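As a rough software illustration of the idea in the abstract above (not the authors' hardware design), the sketch below skips every multiply-accumulate whose weight or activation is zero and counts how many operations remain. The function name `sparse_dot` and the sample values are hypothetical.

```python
def sparse_dot(weights, activations):
    """Dot product that skips multiply-accumulates with a zero operand.

    Returns the result and the number of multiply-accumulates actually done.
    Illustrative only: a real accelerator would do this skipping in hardware.
    """
    total = 0
    macs_done = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # a zero operand contributes nothing, so skip the work
        total += w * a
        macs_done += 1
    return total, macs_done


# Hypothetical sample data with zeros in both weights and activations.
weights = [0, 2, 0, -1, 3, 0]
activations = [5, 0, 7, 4, 1, 0]
result, macs = sparse_dot(weights, activations)
dense_result = sum(w * a for w, a in zip(weights, activations))
```

In this sample, only the pairs with two non-zero operands are computed, and the result matches the dense dot product.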
A desirable feature of a development tool for SoC design is that, given the important applications in the domain to be targeted by the SoC, a powerful hardware-software partitioning engine is available to determine which function(s) shall be mapped to hardware. However, to provide high-quality partitioning, this engine must be able to consider a rich design space of possible alternate hardware and...
The fast and energy-efficient simulation of dynamical systems defined by coupled ordinary/partial differential equations has emerged as an important problem. The accelerated simulation of coupled ODE/PDE is critical for analysis of physical systems as well as computing with dynamical systems. This paper presents a fast and programmable accelerator for simulating dynamical systems. The computing model...
Hadoop is an emerging data application for big data processing. In a Hadoop system, data compression plays a significant part in processing big data effectively. Achieving this in software requires significant compute processing. In this paper we present the detailed design of hardware compression accelerators. We also measure the performance of the hardware accelerators. Our analysis shows that...
Face detection is a critical function in many embedded applications, such as computer vision and security. Although face detection has been well studied, detecting a large number of faces with different scales and excessive variations (pose, expression, or illumination) usually involves computationally expensive classification algorithms. These algorithms may divide an image into sub-windows at different...
As the size of available data increases, it is becoming inefficient to scale the computational power of traditional systems. To overcome this problem, customized application-specific accelerators are becoming integral parts of modern system-on-chip (SoC) architectures. In this paper, we summarize existing hardware accelerators for data centers and discuss the techniques to implement and embed...
This article consists of a collection of slides from the authors' conference presentation. Are FPGAs a Promising Target in the Datacenter for Deep Learning? Yes.
VXLAN (Virtual extensible Local Area Network) is an edge-overlay model that uses an L2-in-L3 tunneling protocol. It has attracted attention for multi-tenant datacenter networks. To deploy VXLAN in legacy networks, networks can include VXLAN gateways, which forward traffic between VXLAN and non-VXLAN environments. This paper proposes the design of VXLAN gateways which are not in servers, but...
To advance datacenter capabilities beyond what commodity server designs can provide, the authors designed and built a composable, reconfigurable fabric to accelerate large-scale software services. Each instantiation of the fabric consists of a 6 x 8 2D torus of high-end field-programmable gate arrays (FPGAs) embedded into a half-rack of 48 servers. The authors deployed the reconfigurable fabric in...
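To make the 6 x 8 2D torus mentioned above concrete, here is a minimal sketch of how neighbors wrap around in such a topology. It assumes 6 rows by 8 columns and (row, column) coordinates, neither of which is stated in the abstract; the function name `torus_neighbors` is hypothetical and says nothing about how the authors wire or address their FPGAs.

```python
ROWS, COLS = 6, 8  # assumed orientation of the 6 x 8 torus (48 nodes)


def torus_neighbors(row, col, rows=ROWS, cols=COLS):
    """Return the four neighbors of (row, col) in a 2D torus.

    Links wrap around at the edges, so every node has exactly four
    neighbors: up, down, left, and right (in that order).
    """
    return [
        ((row - 1) % rows, col),  # up, wrapping from the first row to the last
        ((row + 1) % rows, col),  # down, wrapping from the last row to the first
        (row, (col - 1) % cols),  # left, wrapping from the first column to the last
        (row, (col + 1) % cols),  # right, wrapping from the last column to the first
    ]


node_count = ROWS * COLS
corner = torus_neighbors(0, 0)
```

Under these assumptions, the corner node (0, 0) is linked to (5, 0), (1, 0), (0, 7), and (0, 1), and the 48 nodes correspond to the 48 servers in the half-rack.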
This paper presents CLOSH, a source-level offloading tool for special-purpose hardware accelerators. CLOSH is designed to make it easy to accelerate existing applications when their source code is available. We evaluated CLOSH with one application on our new TV platform and found that the required cycles decreased by 27.4%.
This paper identifies a new opportunity for improving the efficiency of a processor core: memory access phases of programs. These are dynamic regions of programs where most of the instructions are devoted to memory access or address computation. They occur naturally in programs because of workload properties; alternatively, when employing an in-core accelerator, we get induced phases where the code execution...
Datacenter workloads demand high computational capabilities, flexibility, power efficiency, and low cost. It is challenging to improve all of these factors simultaneously. To advance datacenter capabilities beyond what commodity server designs can provide, we have designed and built a composable, reconfigurable fabric to accelerate portions of large-scale software services. Each instantiation of the...
Compute-intensive applications are emerging in intelligent home, retail store and automotive industries. These applications are becoming more sophisticated with new features rich in audio, video, image, and machine learning capabilities that demand heavy computations. We present the EMERALD (EMERging Applications and algorithms for Low power Device) workload suite. We profile the workloads to show...
In NAND Flash-based SSDs, deduplication can effectively address three critical issues: cell lifetime, write performance, and garbage collection overhead. However, deduplication at the SSD device level differs in many respects from deduplication in enterprise storage systems: its success lies in proper exploitation of the very limited underlying hardware resources and workload characteristics...
This paper presents a guidance algorithm for formation flight of two UAVs. Since the nonlinear guidance algorithm has good properties for following nonlinear flight trajectories based on geometry and kinematics, it is modified into a leader-follower station-keeping formation control law for two UAVs using the relation of the nonlinear guidance algorithm to the proportional navigation...
A fast and simple algorithm for designing time-optimal waveforms is presented. The algorithm accepts a given arbitrary multidimensional k-space trajectory as the input and outputs the time-optimal gradient waveform that traverses k-space along that path in minimum time. The algorithm is noniterative, and its run time is independent of the complexity of the curve, i.e., the number of switches between...
We envision a mid-session control scheme with 2-level speed scaling in which the server can control content playback speed without the client's aid. Our scheme also needs no re-encoding, so the server can still use well-known performance features. It therefore promises great advantages for set-top box makers.