The paper focuses on the design of a practical system pipeline for always-listening, far-field spoken command recognition in everyday smart indoor environments that consist of multiple rooms equipped with sparsely distributed microphone arrays. Such environments, for example homes and multi-room offices, present challenging acoustic scenes to state-of-the-art speech recognizers, especially under...
In this work, we investigate robust speech energy estimation and tracking schemes that aim to improve energy-based multiband speech demodulation and feature extraction for multi-microphone distant speech recognition. Building on the spatial diversity of the speech and noise recordings in a multi-microphone setup, the proposed Multichannel, Multiband Demodulation (MMD) scheme includes: 1) energy selection...
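The abstract does not spell out the demodulation algorithm it builds on. A common energy-based approach, shown here only as an illustrative sketch and not as the paper's MMD scheme, pairs the discrete Teager-Kaiser energy operator with the DESA-1 energy separation algorithm to estimate the instantaneous frequency and amplitude of a single band-passed AM-FM component. The function names `teager_kaiser` and `desa1` are hypothetical, and the multiband filterbank and the multichannel energy selection step are assumed to happen elsewhere and are not shown.

```python
import math


def teager_kaiser(s, n):
    """Discrete Teager-Kaiser energy operator at sample n:
    Psi[s](n) = s(n)^2 - s(n-1) * s(n+1)."""
    return s[n] ** 2 - s[n - 1] * s[n + 1]


def desa1(x):
    """DESA-1 energy separation for one (already band-passed) AM-FM signal.

    Returns per-sample estimates of instantaneous frequency (radians/sample)
    and amplitude envelope for n = 2 .. len(x) - 3. NaN marks samples where
    the estimate is undefined (e.g. non-positive energy).
    """
    # Backward difference y(n) = x(n) - x(n-1); y[0] is an unused placeholder.
    y = [0.0] + [x[n] - x[n - 1] for n in range(1, len(x))]

    freqs, amps = [], []
    for n in range(2, len(x) - 2):
        ex = teager_kaiser(x, n)
        ey = teager_kaiser(y, n) + teager_kaiser(y, n + 1)
        if ex <= 0.0:
            freqs.append(float("nan"))
            amps.append(float("nan"))
            continue
        # Estimate of cos(Omega); clamp to [-1, 1] to guard against rounding.
        c = min(1.0, max(-1.0, 1.0 - ey / (4.0 * ex)))
        freqs.append(math.acos(c))
        s2 = 1.0 - c * c  # estimate of sin^2(Omega)
        amps.append(math.sqrt(ex / s2) if s2 > 0.0 else float("nan"))
    return freqs, amps
```

For a pure tone x(n) = A cos(Ωn + φ), the operator gives exactly A² sin²(Ω), so DESA-1 recovers Ω and A at every sample; for example, `desa1([2.0 * math.cos(0.3 * n + 0.5) for n in range(200)])` returns frequencies close to 0.3 and amplitudes close to 2.0. In a multiband front end, the same computation would run on each band-pass channel.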
In this paper we discuss the integration of a communication model into the MOBOT assistive robotic platform and its evaluation by target users. The MOBOT platform envisions the development of cognitive robotic assistant prototypes that act proactively, adaptively and interactively toward elderly people with mild walking and cognitive impairments. The respective multimodal action recognition...
The main goal of this work is the development of an improved Large Vocabulary Continuous Speech Recognition (LVCSR) framework for Greek. Language modeling is carried out on a collection of journalistic text, while for acoustic signal processing a nonlinear approach is implemented to derive AM-FM features. Experiments are carried out on both clean and simulated far-field speech, offering...
The visual processing of Sign Language (SL) videos poses multiple interdisciplinary challenges for image processing and recognition. Based on tracking and visual feature extraction, we investigate SL visual phonetic modeling by exploiting statistical subunit (SU) models of movement-position and handshape. We further propose a new framework to construct a data-driven lexicon that retains phonetics'...
We propose an Unsupervised method for Extreme States Classification (UnESC) on feature spaces of facial cues of interest. The method is built upon Active Appearance Model (AAM) face tracking and on feature extraction from Global and Local AAMs. UnESC is applied primarily to facial pose, but is shown to be extendable to local models of the eyes and mouth. Given the importance of facial...