We propose a highly structured neural network architecture for semantic segmentation with an extremely small model size, suitable for low-power embedded and mobile platforms. Specifically, our architecture combines i) a Haar wavelet-based tree-like convolutional neural network (CNN), ii) a random layer realizing a radial basis function kernel approximation, and iii) a linear classifier. While stages...
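The "random layer realizing a radial basis function kernel approximation" can be illustrated with random Fourier features, a common way to approximate an RBF kernel with a fixed random projection. The sketch below is an assumption about the general technique, not the authors' implementation: the function names, the `gamma` parameter, and the feature count are all hypothetical.

```python
import math
import random


def make_random_features(dim, n_features, gamma, seed=0):
    """Build a fixed random map z(x) with z(x).z(y) ~ exp(-gamma * ||x - y||^2).

    Hypothetical helper: weights are drawn once and never trained.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # std of the Gaussian projection weights
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def transform(x):
        # One cosine feature per random projection.
        return [norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel, used here only to check the approximation."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

A linear classifier trained on `transform(x)` then behaves approximately like a kernel classifier, which is one plausible reason to place a linear stage after such a layer.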
Due to increasing demand for low-power computing and diminishing returns from technology scaling, industry and academia are turning with renewed interest toward energy-efficient programmable accelerators. This paper proposes an Integrated Programmable-Array accelerator (IPA) architecture based on an innovative execution model, targeted to accelerate both data and control-flow parts of deeply embedded...
One of the fundamental functionalities for autonomous navigation of Unmanned Aerial Vehicles (UAVs) is the hovering capability. State-of-the-art techniques for implementing hovering on standard-size UAVs process a camera stream to determine position and orientation (visual odometry). Similar techniques are considered unaffordable in the context of nano-scale UAVs (i.e. a few centimeters in diameter),...
Pushing supply voltages in the near-threshold region is today one of the main avenues to minimize power consumption in digital integrated circuits. This works well with logic units, but memory operations on standard six-transistor static RAM (6T-SRAM) cells become unreliable at low voltages. Standard cell memory (SCM) works fully reliably at near-threshold voltages, but has much lower area density...
Convolutional Neural Networks (CNNs) have revolutionized the world of image classification over the last few years, pushing computer vision close to, or beyond, human accuracy. The computational effort of CNNs today requires power-hungry parallel processors and GP-GPUs. Recent efforts in designing CNN Application-Specific Integrated Circuits (ASICs) and accelerators for System-On-Chip (SoC) integration...
The logarithmic number system (LNS) has always been an interesting alternative for floating point calculations since the implementation of several arithmetic operations such as divisions, exponentiations and square-roots, which are required for computationally intensive nonlinear functions, is greatly simplified in the logarithmic space. However, additions and subtractions become nonlinear operations...
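The trade-off described in this abstract can be sketched in a few lines: storing a value as its base-2 logarithm turns multiplication, division and square root into an addition, a subtraction and a halving, while addition needs a nonlinear correction term. This is a minimal illustration under stated assumptions, not the paper's design: it handles only positive values, uses floating point to hold the logarithm, and all function names are hypothetical.

```python
import math


def to_lns(x):
    """Encode a positive real as its base-2 logarithm (sign handling omitted)."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    return math.log2(x)


def from_lns(e):
    """Decode an LNS value back to a real number."""
    return 2.0 ** e


def lns_mul(a, b):
    return a + b      # log2(x * y) = log2(x) + log2(y)


def lns_div(a, b):
    return a - b      # log2(x / y) = log2(x) - log2(y)


def lns_sqrt(a):
    return a / 2.0    # log2(sqrt(x)) = log2(x) / 2


def lns_add(a, b):
    # log2(x + y) = hi + log2(1 + 2^(lo - hi)); the second term is the
    # nonlinear function that makes addition the costly operation in LNS.
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))
```

For example, `from_lns(lns_mul(to_lns(3.0), to_lns(4.0)))` recovers the product of 3 and 4 up to rounding error, using only an addition in the logarithmic domain.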
The stringent power constraints of complex microcontroller-based devices (e.g. smart sensors for the IoT) represent an obstacle to the introduction of sophisticated functionality. Programmable accelerators would be extremely beneficial to provide the flexibility and energy efficiency required by fast-evolving IoT applications; however, the integration complexity and sub-10mW power budgets have been...
Energy-efficient computing and ultra-low-power operation are requirements for many application areas, such as IoT and wearables. While for some applications, integer and fixed-point processor instructions suffice, others (e.g. simultaneous localization and mapping - SLAM, stereo vision, nonlinear regression and classification) require a larger dynamic range, typically obtained using single/double-precision...
While high-end heterogeneous systems are increasingly supporting heterogeneous uniform memory access (hUMA) as envisioned by the Heterogeneous System Architecture (HSA) foundation, their low-power counterparts targeting the embedded domain still lack basic features like virtual memory support for accelerators. As opposed to simply passing virtual address pointers, explicit data management involving...
Many-core architectures structured as fabrics of tightly-coupled clusters have shown promising results on embedded computer vision benchmarks, providing state-of-the-art performance with a reduced power budget. We propose PULP (Parallel processing Ultra-Low Power platform), an architecture built on clusters of tightly-coupled OpenRISC ISA cores, with advanced techniques for fast performance and energy...
Reliability is a major concern in multiprocessors. Dynamic Reliability Management (DRM) aims at trading off processor performance with lifetime. The state-of-the-art publications study only the theory supported by simulation. This paper presents the first complete software implementation, working on real hardware, of a low-overhead, Android-compatible workload-aware DRM Governor for mobile multiprocessors...
Negative bias temperature instability (NBTI) adversely affects the reliability of a processor by introducing new delay-induced faults. However, the effect of these delay variations is not uniformly spread across functional units and instructions: some are affected more (hence less reliable) than others. This paper proposes an NBTI-aware compiler-directed very long instruction word (VLIW) assignment...
Modern embedded MPSoC designs increasingly couple hardware accelerators to processing cores to trade between energy efficiency and platform specialization. To assist effective design of such systems, there is a need, on the one hand, for clear methodologies to streamline accelerator definition and instantiation and, on the other, for architectural templates and run-time techniques that minimize processors-to-accelerator...
Simulators are still the primary tools for development and performance evaluation of applications running on massively parallel architectures. However, current virtual platforms are not able to tackle the complexity issues introduced by 1000-core future scenarios. We present a fast and accurate simulation framework targeting extremely large parallel systems by specifically taking advantage of the...