This work presents a sub-6 µW acoustic frontend for speech/non-speech classification in a voice activity detector (VAD) implemented in 90 nm CMOS. The power consumption of the VAD system is minimized through an architecture built around a new power-proportional sensing paradigm and through the use of machine-learning-assisted, moderate-precision analog analytics for classification. Power-proportional sensing enables hierarchical, context-aware scaling of the frontend’s power consumption according to the complexity of the ongoing information extraction, while analog analytics improves power efficiency by switching the computation of individual features on/off depending on each feature’s usefulness in a particular context. The proposed VAD system reduces power consumption by 10× compared to state-of-the-art (SotA) systems while achieving an 89% average hit rate (HR) at a 12 dB signal-to-acoustic-noise ratio (SANR) in a babble-noise context, on par with software-based VAD systems.
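To make the paradigm concrete, the following minimal Python sketch models hierarchical wake-up and per-feature power gating as described above. The channel names, power figures, usefulness weights, and greedy budget heuristic are illustrative assumptions for exposition only; they are not the chip's actual machine-learning-assisted feature selection.

```python
# Conceptual sketch (not the authors' implementation): hierarchical,
# power-proportional duty cycling with per-feature gating. All names,
# thresholds, and power numbers are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class FeatureChannel:
    name: str
    power_uw: float      # assumed analog power cost of computing this feature
    usefulness: float    # assumed per-context weight learned offline
    enabled: bool = True

def gate_features(channels, budget_uw):
    """Enable features in order of usefulness per microwatt until the power
    budget is exhausted (a greedy knapsack heuristic standing in for the
    paper's machine-learning-assisted selection)."""
    total = 0.0
    for ch in sorted(channels, key=lambda c: c.usefulness / c.power_uw,
                     reverse=True):
        ch.enabled = total + ch.power_uw <= budget_uw
        if ch.enabled:
            total += ch.power_uw
    return total

def frontend_power(activity_detected, channels, wakeup_power_uw=0.2):
    """Hierarchical wake-up: a cheap always-on stage arms the full feature
    bank only when acoustic activity is plausible, so average power tracks
    the complexity of the ongoing information extraction."""
    if not activity_detected:
        return wakeup_power_uw  # only the coarse detector runs
    return wakeup_power_uw + sum(c.power_uw for c in channels if c.enabled)

# Hypothetical feature bank for a babble-noise context.
channels = [
    FeatureChannel("band_energy_lo", 0.8, 0.9),
    FeatureChannel("band_energy_hi", 0.9, 0.4),
    FeatureChannel("zero_crossings", 0.5, 0.6),
    FeatureChannel("spectral_flux",  1.2, 0.7),
]
gate_features(channels, budget_uw=3.0)
print(frontend_power(activity_detected=True, channels=channels))  # 2.7 µW
```

Under these assumed numbers, the least cost-effective feature is gated off to stay within the budget, and the frontend idles near the wake-up stage's power when no activity is detected, illustrating how average power scales with context.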