This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features...
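Bottleneck features like those the abstract mentions are commonly taken as the activations of a narrow middle layer in an autoencoder-style network. A minimal sketch follows; the layer sizes, random input data, and the use of scikit-learn's `MLPRegressor` are all assumptions for illustration, not the paper's actual network:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.random((300, 40))  # hypothetical 40-dim visual feature frames

# Train an autoencoder (input reconstructed at the output) with a narrow
# middle layer; that middle layer's activations are the bottleneck features.
ae = MLPRegressor(hidden_layer_sizes=(32, 8, 32), activation="relu",
                  max_iter=300, random_state=0)
ae.fit(X, X)

def bottleneck(x):
    """Forward pass through the first two layers (input -> 32 -> 8)."""
    h = np.maximum(0, x @ ae.coefs_[0] + ae.intercepts_[0])
    return np.maximum(0, h @ ae.coefs_[1] + ae.intercepts_[1])

print(bottleneck(X[:5]).shape)  # (5, 8): 8-dim bottleneck features
```

In a deep-bottleneck-feature pipeline these 8-dimensional vectors would replace or augment the raw visual features fed to the recogniser.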
This paper proposes methods of using restricted Boltzmann machines (RBM) to generate the sequence of lip images for visual speech synthesis. The aim of our proposed methods is to alleviate the over-smoothing effect of the conventional hidden Markov model (HMM) based statistical approach for lip synthesis. Two model structures using RBMs to model and generate lip movements are investigated in this...
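The generation idea behind an RBM-based approach can be sketched as fitting an RBM to lip frames and then drawing new frames by Gibbs sampling. This is a generic illustration, not the paper's specific model structures; the data, dimensions, and the use of scikit-learn's `BernoulliRBM` are assumptions:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(2)
# Hypothetical training set: 200 binarised 12x12 lip images, flattened.
X = (rng.random((200, 144)) > 0.5).astype(float)

# Fit an RBM on the lip frames; generation then proceeds by Gibbs
# sampling from the learned joint distribution rather than by decoding
# smoothed HMM parameter trajectories.
rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=5,
                   random_state=0)
rbm.fit(X)

v = X[:1]              # seed the chain with a real frame
for _ in range(50):    # alternate hidden/visible sampling steps
    v = rbm.gibbs(v)
print(v.shape)  # (1, 144): one sampled lip frame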
Audio-visual speech recognition (AVSR) involves recognising what a speaker is uttering using both audio and visual cues. While phonemes, the units of speech in the audio domain, are well documented, the same is not true for the speech units in the visual domain: visemes. In the literature, only a generic viseme definition is recognised. There is no agreement on what visemes practically imply,...
This paper proposes a new feature extraction method for lip-reading, named DCT+LSDA. The Discrete Cosine Transform (DCT) is a popular method for reducing the dimension of the data and has proven very effective in lip-reading. Linear Discriminant Analysis (LDA) is a method for studying the class relationships between data points; it is a very useful method for dimensionality reduction and feature extraction...
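DCT-based feature extraction for lip-reading typically keeps only the low-frequency coefficients of the mouth-region image. A minimal sketch, assuming a 32x32 grayscale mouth crop and a simple top-left coefficient selection (zig-zag scanning is another common choice):

```python
import numpy as np
from scipy.fftpack import dct

# Hypothetical 32x32 grayscale mouth-region image, values in [0, 1].
rng = np.random.default_rng(0)
mouth = rng.random((32, 32))

def dct_features(img, k=6):
    """2-D DCT of the image; keep the top-left k x k low-frequency
    coefficients as the feature vector."""
    coeffs = dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:k, :k].flatten()

feat = dct_features(mouth)
print(feat.shape)  # (36,): 6x6 low-frequency coefficients
```

A discriminant-analysis step (LDA or LSDA) would then be applied on top of these coefficients to produce the final, class-separable features.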
In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker independent isolated digit recognition task, and compare our approach...
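The Hidden Markov Model baseline referred to here decodes each utterance with the Viterbi algorithm. A minimal sketch of that baseline for a discrete-observation HMM (the inter-frame-dependent extension is not reproduced; the toy parameters below are invented for illustration):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """obs: observation indices; pi: initial probs (S,);
    A: transition probs (S, S); B: emission probs (S, O).
    Returns the most likely hidden state path."""
    S, T = len(pi), len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)     # (prev, cur) log-scores
        psi[t] = scores.argmax(axis=0)          # best predecessor per state
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):               # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1, 1], pi, A, B))  # [0, 0, 1, 1]
```

Because the HMM assumes observations are conditionally independent given the state, contextual dependencies between adjacent frames are lost, which is the gap the inter-frame-dependent model is meant to fill.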
In this paper, a hybrid visual feature extraction method that combines an extended locally linear embedding (LLE) with visemic linear discriminant analysis (LDA) is presented for audio-visual speech recognition (AVSR). First, the extended LLE is introduced to reduce the dimension of the mouth images, which constrains the scope of finding mouth data neighborhood to the corresponding individual's...
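The per-speaker neighbourhood constraint described in the abstract can be approximated with standard LLE by embedding each speaker's mouth images separately, so neighbours are only ever drawn from the same individual. A sketch using scikit-learn's `LocallyLinearEmbedding` (the data shapes, speaker labels, and this approximation of the extended LLE are all assumptions):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(1)
# Hypothetical data: 100 flattened 16x16 mouth images per speaker.
speakers = {"spk1": rng.random((100, 256)), "spk2": rng.random((100, 256))}

# Fitting a separate embedding per speaker restricts the neighbour
# search to that speaker's own data, approximating the paper's
# individual-constrained ("extended") LLE.
embeddings = {
    spk: LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                eigen_solver="dense").fit_transform(X)
    for spk, X in speakers.items()
}
print(embeddings["spk1"].shape)  # (100, 2)
```

Visemic LDA would then be applied to the low-dimensional embeddings to maximise separation between viseme classes.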