This paper develops an Audio-Visual Speech Recognition (AVSR) method by (1) exploring high-performance visual features, (2) applying audio and visual deep bottleneck features to improve AVSR performance, and (3) investigating the effectiveness of voice activity detection in the visual modality. In our approach, many kinds of visual features are incorporated and subsequently converted into bottleneck features...
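Bottleneck features like those the abstract mentions are commonly taken as the activations of a narrow middle layer in an autoencoder-style network. A minimal sketch follows; the layer sizes, random input data, and the use of scikit-learn's `MLPRegressor` are all assumptions for illustration, not the paper's actual network:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.random((300, 40))  # hypothetical 40-dim visual feature frames

# Train an autoencoder (input reconstructed at the output) with a narrow
# middle layer; that middle layer's activations are the bottleneck features.
ae = MLPRegressor(hidden_layer_sizes=(32, 8, 32), activation="relu",
                  max_iter=300, random_state=0)
ae.fit(X, X)

def bottleneck(x):
    """Forward pass through the first two layers (input -> 32 -> 8)."""
    h = np.maximum(0, x @ ae.coefs_[0] + ae.intercepts_[0])
    return np.maximum(0, h @ ae.coefs_[1] + ae.intercepts_[1])

print(bottleneck(X[:5]).shape)  # (5, 8): 8-dim bottleneck features
```

In a deep-bottleneck-feature pipeline these 8-dimensional vectors would replace or augment the raw visual features fed to the recogniser.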
This paper proposes methods of using restricted Boltzmann machines (RBM) to generate the sequence of lip images for visual speech synthesis. The aim of our proposed methods is to alleviate the over-smoothing effect of the conventional hidden Markov model (HMM) based statistical approach for lip synthesis. Two model structures using RBMs to model and generate lip movements are investigated in this...
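The generation idea behind an RBM-based approach can be sketched as fitting an RBM to lip frames and then drawing new frames by Gibbs sampling. This is a generic illustration, not the paper's specific model structures; the data, dimensions, and the use of scikit-learn's `BernoulliRBM` are assumptions:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(2)
# Hypothetical training set: 200 binarised 12x12 lip images, flattened.
X = (rng.random((200, 144)) > 0.5).astype(float)

# Fit an RBM on the lip frames; generation then proceeds by Gibbs
# sampling from the learned joint distribution rather than by decoding
# smoothed HMM parameter trajectories.
rbm = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=5,
                   random_state=0)
rbm.fit(X)

v = X[:1]              # seed the chain with a real frame
for _ in range(50):    # alternate hidden/visible sampling steps
    v = rbm.gibbs(v)
print(v.shape)  # (1, 144): one sampled lip frame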
Audio-visual speech recognition (AVSR) involves recognising what a speaker is uttering using both audio and visual cues. While phonemes, the units of speech in the audio domain, are well documented, the same is not true for the speech units in the visual domain: visemes. In the literature, only a generic viseme definition is recognised. There is no agreement on what visemes practically imply,...
This paper proposes a new feature extraction method for lip-reading, named DCT+LSDA. The Discrete Cosine Transform (DCT) is a popular method for reducing the dimension of the data and has proven very effective in lip-reading. Linear Discriminant Analysis (LDA) is a method for studying the class relationships between data points; it is a very useful method for dimensionality reduction and feature extraction...
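DCT-based feature extraction for lip-reading typically keeps only the low-frequency coefficients of the mouth-region image. A minimal sketch, assuming a 32x32 grayscale mouth crop and a simple top-left coefficient selection (zig-zag scanning is another common choice):

```python
import numpy as np
from scipy.fftpack import dct

# Hypothetical 32x32 grayscale mouth-region image, values in [0, 1].
rng = np.random.default_rng(0)
mouth = rng.random((32, 32))

def dct_features(img, k=6):
    """2-D DCT of the image; keep the top-left k x k low-frequency
    coefficients as the feature vector."""
    coeffs = dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")
    return coeffs[:k, :k].flatten()

feat = dct_features(mouth)
print(feat.shape)  # (36,): 6x6 low-frequency coefficients
```

A discriminant-analysis step (LDA or LSDA) would then be applied on top of these coefficients to produce the final, class-separable features.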
In this paper, we present a new approach to visual speech recognition which improves contextual modelling by combining Inter-Frame Dependent and Hidden Markov Models. This approach captures contextual information in visual speech that may be lost using a Hidden Markov Model alone. We apply contextual modelling to a large speaker independent isolated digit recognition task, and compare our approach...
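The Hidden Markov Model baseline referred to here decodes each utterance with the Viterbi algorithm. A minimal sketch of that baseline for a discrete-observation HMM (the inter-frame-dependent extension is not reproduced; the toy parameters below are invented for illustration):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """obs: observation indices; pi: initial probs (S,);
    A: transition probs (S, S); B: emission probs (S, O).
    Returns the most likely hidden state path."""
    S, T = len(pi), len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)     # (prev, cur) log-scores
        psi[t] = scores.argmax(axis=0)          # best predecessor per state
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):               # backtrack
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 0, 1, 1], pi, A, B))  # [0, 0, 1, 1]
```

Because the HMM assumes observations are conditionally independent given the state, contextual dependencies between adjacent frames are lost, which is the gap the inter-frame-dependent model is meant to fill.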
In this paper, a hybrid visual feature extraction method that combines an extended locally linear embedding (LLE) with visemic linear discriminant analysis (LDA) is presented for audio-visual speech recognition (AVSR). First, the extended LLE is introduced to reduce the dimension of the mouth images, which constrains the scope of finding mouth data neighborhood to the corresponding individual's...
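The per-speaker neighbourhood constraint described in the abstract can be approximated with standard LLE by embedding each speaker's mouth images separately, so neighbours are only ever drawn from the same individual. A sketch using scikit-learn's `LocallyLinearEmbedding` (the data shapes, speaker labels, and this approximation of the extended LLE are all assumptions):

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(1)
# Hypothetical data: 100 flattened 16x16 mouth images per speaker.
speakers = {"spk1": rng.random((100, 256)), "spk2": rng.random((100, 256))}

# Fitting a separate embedding per speaker restricts the neighbour
# search to that speaker's own data, approximating the paper's
# individual-constrained ("extended") LLE.
embeddings = {
    spk: LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                eigen_solver="dense").fit_transform(X)
    for spk, X in speakers.items()
}
print(embeddings["spk1"].shape)  # (100, 2)
```

Visemic LDA would then be applied to the low-dimensional embeddings to maximise separation between viseme classes.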