I-vector adaptation of DNN-HMM acoustic models has shown clear performance improvements for speech recognition. In this paper, we study this technique on the Babel task. We use Swahili as the target language (50 hours of training data) and another 6 languages as multilingual resources to train i-vector extractors respectively. Our study shows that i-vector extractors trained with more multilingual data only...
Vowel regions play an important role in various speech tasks, such as speech segmentation, speaker verification, prosody modification and emotion conversion. The instants at which the onset and offset of a vowel take place in the speech signal are known as the vowel onset point and vowel offset point, respectively. Vowel regions start with the vowel onset point and end with the vowel offset point. In this...
Scene understanding in the context of a smart meeting room involves the extraction of various kinds of cues at different levels of semantic abstraction. Specifically, human activity in a scene is usually monitored using arrays of audio and visual sensors. Tasks such as person localization and tracking, speaker ID, focus of attention detection, speech recognition and affective state recognition are...
Reverberant environments pose a challenge to speech acquisition from distant microphones. Approaches using microphone arrays have met with limited success. Recent research using audio-visual sensors for tasks such as speaker localization has shown improvement over traditional audio-only approaches. Using computer vision techniques we can estimate the orientation of the speaker's head in addition to...
In general, human beings use expressions (emotions) in speech, facial movements and gestures to convey crucial information. Expressions in speech can mostly be attributed to longer segments, i.e., suprasegmental features, also known as prosodic features. In this paper we analyze the expressions in speech using prosodic features from utterance level, word level and syllable...
This work demonstrates the development of a Keyword Spotting (KWS) system using Vowel Onset Point (VOP), Vector Quantization (VQ) and Hidden Markov Model (HMM) based techniques. The goal of a KWS system is to spot the keywords present in the test speech signal while ignoring the rest of the words. In this work, independent KWS systems are first developed using VOP, VQ and HMM techniques. Each of these...