Search results

Items from 1 to 14 out of 14 results

chapter

Speech recognition and lip shape feature extraction for English vowel pronunciation of the hearing-impaired based on SVM technique

Kyung-Im Han, Hye-Jung Park, Kun-Min Lee

2016 International Conference on Big Data and Smart Computing (BigComp) > 293 - 296

2016 International Conference on Big Data and Smart Computing (BigComp)

The purpose of this study is to suggest the visual teaching method for the English vowel pronunciation, especially for the hearing-impaired who mostly rely on the visual aids, based on the SVM technique. By extracting phonetic features using the SVM technique from the sounds that are hard to hear by ear, the lip shapes for each vowel were refined. The lip shape refinement for vowels is advantageous...

chapter

Recognizing emotion from singing and speaking using shared models

Biqiao Zhang, Georg Essl, Emily Mower Provost

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 139 - 145

2015 International Conference on Affective Computing and Intelligent Interaction (ACII)

Speech and song are two types of vocal communications that are closely related to each other. While significant progress has been made in both speech and music emotion recognition, few works have concentrated on building a shared emotion recognition model for both speech and song. In this paper, we propose three shared emotion recognition models for speech and song: a simple model, a single-task hierarchical...

chapter

Audio-visual speech recognition with a hybrid SVM-HMM system

Mihai Gurban, Jean-Philippe Thiran

2005 13th European Signal Processing Conference > 1 - 4

2005 13th European Signal Processing Conference

Traditional speech recognition systems use Gaussian mixture models to obtain the likelihoods of individual phonemes, which are then used as state emission probabilities in hidden Markov models representing the words. In hybrid systems, the Gaussian mixtures are replaced by more discriminant classifiers, leading to an improved performance. Most of the time the classifiers used in such systems are neural...

chapter

Multi-modal Voice Activity Detection by Embedding Image Features into Speech Signal

Yohei Abe, Akinori Ito

2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing > 271 - 274

2013 Ninth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP)

Lip movement has a close relationship with speech because the lips move when we talk. The idea behind this work is to extract the lip movement feature from the facial video and embed the movement feature into speech signal using information hiding technique. Using the proposed framework, we can provide advanced speech communication only using the speech signal that includes lip movement features,...

chapter

Lip feature selection based on BPSO and SVM

Mengjun Wang

IEEE 2011 10th International Conference on Electronic Measurement & Instruments > 3 > 56 - 59

2011 IEEE 10th International Conference on Electronic Measurement & Instruments (ICEMI)

In speech synthesis system driven by visual speech, many irrelevant and redundant features will lessen the lipreading recognition result. So it is important to select lip features with stronger discriminate performance. Feature selection algorithm based on binary particle swarm optimization (BPSO) and support vector machines (SVM) is used to select the “optimal” lip feature subset. Feature subset...

chapter

A multi-modal video analysis system

Shilin Zhang, Heping Li, Shuwu Zhang

2011 IEEE 3rd International Conference on Communication Software and Networks > 176 - 179

2011 IEEE 3rd International Conference on Communication Software and Networks (ICCSN)

In this paper, we present a system for Chinese news program management based on cross media video analysis. Audio, caption text and video frames are all important for a person to understand the meaning of the video. Given these facts, we devised a system integrating continuous Chinese speech recognition (ASR), video caption text recognition (VOCR) and object/scene recognition (OR). The news program...

chapter

Using multiple visual tandem streams in audio-visual speech recognition

Ibrahim Saygin Topkaya, Hakan Erdogan

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4988 - 4991

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The method which is called the “tandem approach” in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition using a novel setup which uses multiple classifiers to obtain multiple visual tandem features. We adopt the approach...

chapter

Lip reading using optical flow and support vector machines

A A Shaikh, D K Kumar, W C Yau, M Z C Azemin, more

2010 3rd International Congress on Image and Signal Processing > 1 > 327 - 330

3rd International Congress on Image and Signal Processing (CISP 2010)

This paper presents a lip reading technique to classify the discrete utterances without evaluating the acoustic signals. The reported technique analyzes the video data of lip motions by computing the optical flow (OF). The statistical properties of the vertical OF component were used to form the feature vectors for training the support vector machines (SVM) classifier. The impact of the variation...

chapter

Lip Detection and Tracking Using Variance Based Haar-Like Features and Kalman filter

Lirong Wang, Xiaoli Wang, Jing Xu

2010 Fifth International Conference on Frontier of Computer Science and Technology > 608 - 612

2010 Fifth International Conference on Frontier of Computer Science and Technology (FCST 2010)

Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. Lip reading is an active field that receives much attention from computer scientists. Its applications take part not only in science, such as a speech recognition system, but also in social activities, such as teaching pronunciation for deaf children in order to recover their speaking ability,...

article

Modeling Dominance in Group Conversations Using Nonverbal Activity Cues

D.B. Jayagopi, H. Hung, Chuohao Yeo, D. Gatica-Perez

IEEE Transactions on Audio, Speech, and Language Processing > 2009 > 17 > 3 > 501 - 513

Dominance - a behavioral expression of power - is a fundamental mechanism of social interaction, expressed and perceived in conversations through spoken words and audiovisual nonverbal cues. The automatic modeling of dominance patterns from sensor data represents a relevant problem in social computing. In this paper, we present a systematic study on dominance modeling in group meetings from fully...

chapter

SoTong: An Aware System of Relation Oriented Communication for Enhancing Family Relationship

Hyun Sang Cho, Dongwook Lee, Soohyun Lim, Minsoo Hahn

2008 International Symposium on Ubiquitous Virtual Reality > 71 - 74

International Symposium on Ubiquitous Virtual Reality - ISUVR 2008

In this paper we propose "SoTong" system for enhancing family relationship with relation oriented communication. In the relation oriented communications, we focus on the relationship by representation and promotion of relations together with awareness of other's situation for the connectedness and the coexistence. The system captures and analyzes communication channels among modern families...

chapter

Knowledge-assisted cross-media analysis of audio-visual content in the news domain

V. Mezaris, S. Gidaros, G.T. Papadopoulos, W. Kasper, more

2008 International Workshop on Content-Based Multimedia Indexing > 280 - 287

2008 International Workshop on Content-based Multimedia Indexing - CBMI 2008

In this paper, a complete architecture for knowledge-assisted cross-media analysis of News-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text) separately, and proposes a fusion technique based on the particular characteristics of News-related...

chapter

Query-independent learning for video search

Yuan Liu, Tao Mei, Guojun Qi, Xiuqing Wu, more

2008 IEEE International Conference on Multimedia and Expo > 1249 - 1252

2008 IEEE International Conference on Multimedia and Expo (ICME)

Most of existing learning-based methods for query-by-example take the query examples as “positive” and build a model for each query. These methods, referred to as query-dependent, only achieved limited success as they can hardly be applied to real-world applications, in which an arbitrary query is usually given. To address this problem, we propose to learn a query-independent model by exploiting...

chapter

Automatic video annotation through search and mining

E. Moxley, Tao Mei, Xian-Sheng Hua, Wei-Ying Ma, more

2008 IEEE International Conference on Multimedia and Expo > 685 - 688

2008 IEEE International Conference on Multimedia and Expo (ICME)

Conventional approaches to video annotation predominantly focus on supervised identification of a limited set of concepts, while unsupervised annotation with infinite vocabulary remains unexplored. This work aims to exploit the overlap in content of news video to automatically annotate by mining similar videos that reinforce, filter, and improve the original annotations. The algorithm employs a two-step...

Filter options

Data set:
ieee
Keywords:
SUPPORT VECTOR MACHINES
VISUALIZATION
SPEECH RECOGNITION

