Search results for: Luca Benini

Items from 1 to 20 out of 25 results

chapter

Towards a Mobile Health Platform with Parallel Processing and Multi-sensor Capabilities

Florian Glaser, Philipp Schonle, Pascale Meier, Jonathan Bosser, more

2017 Euromicro Conference on Digital System Design (DSD) > 462 - 469

2017 Euromicro Conference on Digital System Design (DSD)

We present ongoing work on a platform for mobile health and implantable telemetry devices with powerful point-of-contact processing capabilities based on our VivoSoC multi-sensor medical instrumentation SoC, a custom power management IC, and only a few additional components - allowing the realisation of sub-ccm devices. We detail the powerful yet efficient acquisition and parallel processing capabilities...

chapter

Paving the Way Towards a Highly Energy-Efficient and Highly Integrated Compute Node for the Exascale Revolution: The ExaNoDe Approach

Alvise Rigo, Christian Pinto, Kevin Pouget, Daniel Raho, more

2017 Euromicro Conference on Digital System Design (DSD) > 486 - 493

2017 Euromicro Conference on Digital System Design (DSD)

Power consumption and high compute density are the key factors to be considered when building a compute node for the upcoming Exascale revolution. Current architectural design and manufacturing technologies are not able to provide the requested level of density and power efficiency to realise an operational Exascale machine. A disruptive change in the hardware design and integration process is needed...

chapter

A sub-10mW real-time implementation for EMG hand gesture recognition based on a multi-core biomedical SoC

Simone Benatti, Giovanni Rovere, Jonathan Bosser, Fabio Montagna, more

2017 7th IEEE International Workshop on Advances in Sensors and Interfaces (IWASI) > 139 - 144

2017 7th IEEE International Workshop on Advances in Sensors and Interfaces (IWASI)

Real-time biosignal classification in power-constrained embedded applications is a key step in designing portable e-health devices requiring hardware integration along with concurrent signal processing. This paper presents an application based on a novel biomedical System-On-Chip (SoC) for signal acquisition and processing combining a homogeneous multi-core cluster with a versatile bio-potential front-end...

chapter

Design of an Energy Aware Petaflops Class High Performance Cluster Based on Power Architecture

Wissam Abu Ahmad, Andrea Bartolini, Francesco Beneventi, Luca Benini, more

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) > 964 - 973

2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

In this paper we present D.A.V.I.D.E. (Development for an Added Value Infrastructure Designed in Europe), an innovative and energy efficient High Performance Computing cluster designed by E4 Computer Engineering for PRACE (Partnership for Advanced Computing in Europe). D.A.V.I.D.E. is built using best-in-class components (IBM’s POWER8-NVLink CPUs, NVIDIA TESLA P100 GPUs, Mellanox InfiniBand EDR 100...

chapter

A scan-chain based state retention methodology for IoT processors operating on intermittent energy

Pascal Alexander Hager, Hamed Fatemi, Jose Pineda de Gyvez, Luca Benini

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 1171 - 1176

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Future IoT systems are tightly constrained by cost and size and will often be operated from an energy harvester's output. Since these batteryless systems operate on intermittent energy they have to be able to retain their state during the power outages in order to guarantee computation progress. Due to the lack of large energy buffers the state needs to be saved quickly using residual energy only....

chapter

GPUguard: Towards supporting a predictable execution model for heterogeneous SoC

Bjorn Forsberg, Andrea Marongiu, Luca Benini

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017 > 318 - 321

2017 Design, Automation & Test in Europe Conference & Exhibition (DATE)

The deployment of real-time workloads on commercial off-the-shelf (COTS) hardware is attractive, as it reduces the cost and time-to-market of new products. Most modern high-end embedded SoCs rely on a heterogeneous design, coupling a general-purpose multi-core CPU to a massively parallel accelerator, typically a programmable GPU, sharing a single global DRAM. However, because of non-predictable hardware...

chapter

YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights

Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI) > 236 - 241

2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Convolutional Neural Networks (CNNs) have revolutionized the world of image classification over the last few years, pushing the computer vision close beyond human accuracy. The required computational effort of CNNs today requires power-hungry parallel processors and GP-GPUs. Recent efforts in designing CNN Application-Specific Integrated Circuits (ASICs) and accelerators for System-On-Chip (SoC) integration...

chapter

A heterogeneous multi-core system-on-chip for energy efficient brain inspired vision

Antonio Pullini, Francesco Conti, Davide Rossi, Igor Loi, more

2016 IEEE International Symposium on Circuits and Systems (ISCAS) > 2910

2016 IEEE International Symposium on Circuits and Systems (ISCAS)

Computer vision (CV) based on Convolutional Neural Networks (CNN) is a rapidly developing field thanks to CNN's flexibility, strong generalization capability and classification accuracy (matching and sometimes exceeding human performance). CNN-based classifiers are typically deployed on servers or high-end embedded platforms. However, their ability to “compress” low information density data such as...

chapter

High-efficiency logarithmic number unit design based on an improved cotransformation scheme

Youri Popoff, Florian Scheidegger, Michael Schaffner, Michael Gautschi, more

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1387 - 1392

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)

The logarithmic number system (LNS) has always been an interesting alternative for floating point calculations since the implementation of several arithmetic operations such as divisions, exponentiations and square-roots, which are required for computationally intensive nonlinear functions, is greatly simplified in the logarithmic space. However, additions and subtractions become nonlinear operations...

chapter

Automatic multiview synthesis — Prototype demo

Michael Schaffner, Frank K. Gurkaynak, Hubert Kaeslin, Luca Benini, more

2015 Visual Communications and Image Processing (VCIP) > 1

2015 Visual Communications and Image Processing (VCIP)

Overview. Today, most commercially available 3D display systems require the viewers to wear some sort of shutter- or polarization glasses, which is often regarded as an inconvenience. Ideally, a 3D display system should not require the users to wear additional gear. In fact, the optimum would be a display that replicates the original light-field of a scene. So-called multiview autostereoscopic displays...

chapter

Automatic multiview synthesis — Towards a mobile system on a chip

Michael Schaffner, Frank K. Gurkaynak, Hubert Kaeslin, Luca Benini, more

2015 Visual Communications and Image Processing (VCIP) > 1 - 4

2015 Visual Communications and Image Processing (VCIP)

Over the last couple of years, multiview autostereoscopic displays (MADs) have become commercially available which enable a limited glasses-free 3D experience. The main problem of MADs is that they require several (typically 8 or 9) views, while most of the 3D video content is in stereoscopic 3D (S3D) today. In order to bridge this gap, the research community started to devise automatic multiview...

chapter

Lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs

Pirmin Vogel, Andrea Marongiu, Luca Benini

2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS) > 45 - 54

2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)

While high-end heterogeneous systems are increasingly supporting heterogeneous uniform memory access (hUMA) as envisioned by the Heterogeneous System Architecture (HSA) foundation, their low-power counterparts targeting the embedded domain still lack basic features like virtual memory support for accelerators. As opposed to simply passing virtual address pointers, explicit data management involving...

chapter

InfiniTime: A multi-sensor energy neutral wearable bracelet

Michele Magno, Danilo Porcarelli, Davide Brunelli, Luca Benini

International Green Computing Conference > 1 - 8

2014 International Green Computing Conference (IGCC)

Wearable technology is gaining popularity, with people wearing everything "smart" from clothing to glasses and watches. Nowadays wearables are battery-powered and a critical issue is the limited lifetime. So most devices have to be recharged every few days or even hours and thus they miss the expectations for a truly unobtrusive user experience. This paper presents InfiniTIME, a novel sensor-rich...

chapter

A HLS-Based Toolflow to Design Next-Generation Heterogeneous Many-Core Platforms with Shared Memory

Paolo Burgio, Andrea Marongiu, Philippe Coussy, Luca Benini

2014 12th IEEE International Conference on Embedded and Ubiquitous Computing > 130 - 137

2014 12th IEEE International Conference on Embedded and Ubiquitous Computing (EUC)

This work describes how we use High-Level Synthesis to support design space exploration (DSE) of heterogeneous many-core systems. Modern embedded systems increasingly couple hardware accelerators and processing cores on the same chip, to trade specialization of the platform to an application domain for increased performance and energy efficiency. However, the process of designing such a platform is...

chapter

Tightly-coupled hardware support to dynamic parallelism acceleration in embedded shared memory clusters

Paolo Burgio, Giuseppe Tagliavini, Francesco Conti, Andrea Marongiu, more

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1 - 6

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Modern designs for embedded systems are increasingly embracing cluster-based architectures, where small sets of cores communicate through tightly-coupled shared memory banks and high-performance interconnections. At the same time, the complexity of modern applications requires new programming abstractions to exploit dynamic and/or irregular parallelism on such platforms. Supporting dynamic parallelism...

chapter

A tightly-coupled hardware controller to improve scalability and programmability of shared-memory heterogeneous clusters

Paolo Burgio, Robin Danilo, Andrea Marongiu, Philippe Coussy, more

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 1 - 4

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Modern designs for embedded many-core systems increasingly include application-specific units to accelerate key computational kernels with orders-of-magnitude higher execution speed and energy efficiency compared to software counterparts. A promising architectural template is based on heterogeneous clusters, where simple RISC cores and specialized HW units (HWPU) communicate in a tightly-coupled manner...

chapter

A highly efficient, thread-safe software cache implementation for tightly-coupled multicore clusters

Christian Pinto, Luca Benini

2013 IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors > 281 - 288

2013 IEEE 24th International Conference on Application-specific Systems, Architectures and Processors (ASAP)

A widely adopted design paradigm for many-core accelerators features processing elements grouped in clusters. Due to area, power and design simplicity, processors in the same clusters are often not equipped with data-caches but rather share a tightly coupled data memory (TCDM). Even if the use of a TCDM is more energy and area efficient than a cache it requires a higher programming effort because...

chapter

OpenMP-based Synergistic Parallelization and HW Acceleration for On-Chip Shared-Memory Clusters

Paolo Burgio, Andrea Marongiu, Dominique Heller, Cyrille Chavet, more

2012 15th Euromicro Conference on Digital System Design > 751 - 758

2012 15th Euromicro Conference on Digital System Design (DSD)

Modern embedded MPSoC designs increasingly couple hardware accelerators to processing cores to trade between energy efficiency and platform specialization. To assist effective design of such systems there is the need on one hand for clear methodologies to streamline accelerator definition and instantiation, on the other for architectural templates and run-time techniques that minimize processors-to-accelerator...

chapter

A tightly-coupled multi-core cluster with shared-memory HW accelerators

Masoud Dehyadegari, Andrea Marongiu, Mohammad Reza Kakoee, Luca Benini, more

2012 International Conference on Embedded Computer Systems (SAMOS) > 96 - 103

2012 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XII)

Tightly coupling hardware accelerators with processors is a well-known approach for boosting the efficiency of MPSoC platforms. The key design challenges in this area are: (i) streamlining accelerator definition and instantiation and (ii) developing architectural templates and run-time techniques for minimizing the cost of communication and synchronization between processors and accelerators. In this...

chapter

Design of a collective communication infrastructure for barrier synchronization in cluster-based nanoscale MPSoCs

Jose L. Abellan, Juan Fernandez, Manuel E. Acacio, Davide Bertozzi, more

2012 Design, Automation & Test in Europe Conference & Exhibition (DATE) > 491 - 496

2012 Design, Automation & Test in Europe Conference & Exhibition (DATE 2012)

Barrier synchronization is a key programming primitive for shared memory embedded MPSoCs. As the core count increases, software implementations cannot provide the needed performance and scalability, thus making hardware acceleration critical. In this paper we describe an interconnect extension implemented with standard cells and with a mainstream industrial toolflow. We show that the area overhead...

Keywords:
HARDWARE
Publication type:
book


Keywords

COMPUTER ARCHITECTURE (12)
PROGRAMMING (7)
PROGRAM PROCESSORS (6)
SOFTWARE (6)
ACCELERATION (5)
SYNCHRONIZATION (5)
KERNEL (4)
POWER DEMAND (4)
RANDOM ACCESS MEMORY (3)
REAL-TIME SYSTEMS (3)
REGISTERS (3)
SYSTEM-ON-CHIP (3)
BANDWIDTH (2)
COMPUTATIONAL MODELING (2)
CONVOLUTION (2)
DELAY (2)
ENGINES (2)
FABRICS (2)
FEATURE EXTRACTION (2)
HPC (2)
MEMORY MANAGEMENT (2)
MICROCONTROLLERS (2)
OPENMP (2)
PARALLEL PROCESSING (2)
POWER MANAGEMENT (2)
RENDERING (COMPUTER GRAPHICS) (2)
SUPERCOMPUTERS (2)
SYSTEM-ON-A-CHIP (2)
THREE-DIMENSIONAL DISPLAYS (2)
WIRELESS SENSOR NETWORKS (2)
3D STACKING (1)
ABSTRACTS (1)
ACCURACY (1)
ANDROIDS (1)
BATTERIES (1)
BINARYCONNECT (1)
CAMERAS (1)
CENTRAL PROCESSING UNIT (1)
CLUSTERED ARCHITECTURES (1)
COMPLEXITY THEORY (1)
COMPUTER VISION (1)
CONVOLUTION NEURAL NETWORKS ACCELERATOR (1)
DATA STRUCTURES (1)
DECISION SUPPORT SYSTEMS (1)
DESIGN FLOW (1)
DESIGN SPACE EXPLORATION (1)
DETECTION ALGORITHMS (1)
DYNAMIC SCHEDULING (1)
ELECTRODES (1)
ELECTROMYOGRAPHY (1)
EMBEDDED SYSTEMS (1)
ENCODING (1)
ENERGY AWARE (1)
ENERGY EFFICIENCY (1)
ENERGY NEUTRAL (1)
EUROPE (1)
EXASCALE (1)
FERROELECTRIC FILMS (1)
FIELD PROGRAMMABLE GATE ARRAYS (1)
FLASH FILE SYSTEM (1)
GESTURE RECOGNITION (1)
GRAPHICAL USER INTERFACES (1)
GRAPHICS PROCESSING UNITS (1)
HARDWARE DESIGN LANGUAGES (1)
HETEROGENEOUS ARCHITECTURES (1)
HETEROGENEOUS EMBEDDED SYSTEMS ON CHIP (1)
HLS (1)
HUMANOID ROBOTS (1)
HW ACCELERATION (1)
IMAGE EDGE DETECTION (1)
IMPLANTABLE MEDICAL INSTRUMENTATION (1)
INDOOR ENERGY HARVESTERS (1)
INSTRUCTION SETS (1)
INTEGRATED CIRCUIT INTERCONNECTIONS (1)
INTEGRATED CIRCUITS (1)
INTERPOLATION (1)
LINUX (1)
LIQUID COOLING (1)
LOGIC GATES (1)
LOW-POWER (1)
MANY-CORE (1)
MANY-CORE SYSTEMS (1)
MICROELECTRONICS (1)
MICROPROGRAMMING (1)
MIDDLEWARE (1)
MONITORING (1)
MPSOCS (1)
NEURAL NETWORKS (1)
NONVOLATILE MEMORY (1)
NVLINK (1)
OPTIMIZATION (1)
PERFORMANCE EVALUATION (1)
PORTABLE HEALTH CARE (1)
PORTS (COMPUTERS) (1)
POWER ARCHITECTURE (1)
POWER MEASUREMENT (1)
POWER MONITOR (1)
POWER SYSTEM MANAGEMENT (1)
PREFETCHING (1)

INFONA - science communication portal
