The purpose of this study is to propose an SVM-based visual teaching method for English vowel pronunciation, aimed especially at hearing-impaired learners, who rely mostly on visual aids. By using the SVM technique to extract phonetic features from sounds that are hard to distinguish by ear, the lip shapes for each vowel were refined. The lip shape refinement for vowels is advantageous...
Speech and song are two types of vocal communications that are closely related to each other. While significant progress has been made in both speech and music emotion recognition, few works have concentrated on building a shared emotion recognition model for both speech and song. In this paper, we propose three shared emotion recognition models for speech and song: a simple model, a single-task hierarchical...
Traditional speech recognition systems use Gaussian mixture models to obtain the likelihoods of individual phonemes, which are then used as state emission probabilities in hidden Markov models representing the words. In hybrid systems, the Gaussian mixtures are replaced by more discriminative classifiers, leading to improved performance. Most of the time the classifiers used in such systems are neural...
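The traditional arrangement described above, with Gaussian mixtures supplying the state emission probabilities of a hidden Markov model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: observations are scalars rather than acoustic feature vectors, and the names `gaussian_pdf`, `gmm_likelihood`, and `forward_likelihood` are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def gmm_likelihood(x, components):
    """Likelihood of x under a mixture given as (weight, mean, variance) tuples."""
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in components)


def forward_likelihood(observations, initial, transitions, state_gmms):
    """Forward algorithm: total likelihood of the observation sequence.

    Each state's GMM likelihood plays the role of its emission probability.
    (Assumption: short sequences, so no log-domain scaling is applied.)
    """
    n = len(initial)
    alpha = [initial[s] * gmm_likelihood(observations[0], state_gmms[s])
             for s in range(n)]
    for x in observations[1:]:
        alpha = [
            gmm_likelihood(x, state_gmms[j])
            * sum(alpha[i] * transitions[i][j] for i in range(n))
            for j in range(n)
        ]
    return sum(alpha)


# Hypothetical two-state, left-to-right model.
state_gmms = [
    [(1.0, 0.0, 1.0)],                    # state 0: single Gaussian
    [(0.5, 4.0, 1.0), (0.5, 6.0, 1.0)],   # state 1: two-component mixture
]
initial = [1.0, 0.0]
transitions = [[0.6, 0.4], [0.0, 1.0]]
score = forward_likelihood([0.1, 0.2, 4.1, 5.9], initial, transitions, state_gmms)
```

In a hybrid system of the kind the abstract mentions, `gmm_likelihood` would be replaced by scores derived from a discriminative classifier, while the forward recursion stays the same.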
Lip movement has a close relationship with speech because the lips move when we talk. The idea behind this work is to extract the lip movement feature from the facial video and embed it into the speech signal using an information hiding technique. Using the proposed framework, we can provide advanced speech communication using only the speech signal that includes lip movement features,...
In a speech synthesis system driven by visual speech, many irrelevant and redundant features degrade lipreading recognition results. It is therefore important to select lip features with stronger discriminative power. A feature selection algorithm based on binary particle swarm optimization (BPSO) and support vector machines (SVM) is used to select the “optimal” lip feature subset. Feature subset...
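A rough sketch of binary particle swarm optimization for feature selection is given below. It is an illustration only, not the authors' algorithm: the function name `bpso_select`, the parameter values, and the `toy_fitness` function are all assumptions. In the setting the abstract describes, the fitness would presumably be an SVM-based score for the selected subset; here a deterministic stand-in is used so the example has no external dependencies.

```python
import math
import random


def bpso_select(num_features, fitness, num_particles=10, iterations=30,
                w=0.7, c1=1.5, c2=1.5, v_max=4.0, seed=0):
    """Binary PSO: each particle is a 0/1 mask over the features."""
    rng = random.Random(seed)
    positions = [[rng.randint(0, 1) for _ in range(num_features)]
                 for _ in range(num_particles)]
    velocities = [[0.0] * num_features for _ in range(num_particles)]
    personal_best = [p[:] for p in positions]
    personal_fit = [fitness(p) for p in positions]
    g = max(range(num_particles), key=lambda i: personal_fit[i])
    global_best, global_fit = personal_best[g][:], personal_fit[g]

    for _ in range(iterations):
        for i in range(num_particles):
            for d in range(num_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (personal_best[i][d] - positions[i][d])
                     + c2 * r2 * (global_best[d] - positions[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp the velocity
                velocities[i][d] = v
                # The sigmoid of the velocity is the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(positions[i])
            if fit > personal_fit[i]:
                personal_best[i], personal_fit[i] = positions[i][:], fit
                if fit > global_fit:
                    global_best, global_fit = positions[i][:], fit
    return global_best, global_fit


def toy_fitness(mask):
    """Stand-in for a classifier score (assumption): features 0, 2 and 5
    are treated as informative, and every selected feature costs a little."""
    informative = {0, 2, 5}
    reward = sum(1.0 for d, bit in enumerate(mask) if bit and d in informative)
    return reward - 0.1 * sum(mask)


best_mask, best_score = bpso_select(8, toy_fitness)
```

The per-feature penalty in `toy_fitness` reflects the motivation stated above: subsets carrying irrelevant or redundant features should score lower than compact ones.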
In this paper, we present a system for Chinese news program management based on cross media video analysis. Audio, caption text and video frames are all important for a person to understand the meaning of the video. Given these facts, we devised a system integrating continuous Chinese speech recognition (ASR), video caption text recognition (VOCR) and object/scene recognition (OR). The news program...
The so-called “tandem approach” in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition using a novel setup which uses multiple classifiers to obtain multiple visual tandem features. We adopt the approach...
This paper presents a lip reading technique to classify discrete utterances without evaluating the acoustic signals. The reported technique analyzes the video data of lip motions by computing the optical flow (OF). The statistical properties of the vertical OF component were used to form the feature vectors for training a support vector machine (SVM) classifier. The impact of the variation...
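As a small illustration of turning vertical optical-flow values into a feature vector, consider the sketch below. The abstract does not say which statistical properties were used, so the choice of mean, standard deviation, minimum, and maximum per frame is an assumption, as is the function name `vertical_flow_features`. Computing the optical flow itself is outside the scope of the sketch; the input is assumed to be one list of vertical flow values per frame.

```python
import statistics


def vertical_flow_features(frames):
    """Concatenate per-frame statistics of the vertical flow component.

    frames: list of non-empty lists, one list of vertical flow values per frame.
    Returns a flat feature vector [mean, std, min, max, mean, std, ...].
    """
    features = []
    for vertical in frames:
        features.extend([
            statistics.fmean(vertical),
            statistics.pstdev(vertical),  # population standard deviation
            min(vertical),
            max(vertical),
        ])
    return features


# Hypothetical flow values for a two-frame utterance.
vector = vertical_flow_features([[1.0, 3.0], [0.0, 0.0]])
```

A vector of this kind, computed for each utterance, could then serve as one training example for a classifier such as an SVM.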
Speaker lip motion stands out as the most linguistically relevant visual feature for speech recognition. Lip reading is an active field that receives much attention from computer scientists. Its applications appear not only in science, such as speech recognition systems, but also in social activities, such as teaching pronunciation to deaf children in order to recover their speaking ability,...
Dominance - a behavioral expression of power - is a fundamental mechanism of social interaction, expressed and perceived in conversations through spoken words and audiovisual nonverbal cues. The automatic modeling of dominance patterns from sensor data represents a relevant problem in social computing. In this paper, we present a systematic study on dominance modeling in group meetings from fully...
In this paper we propose the "SoTong" system for enhancing family relationships through relation-oriented communication. In relation-oriented communication, we focus on the relationship by representing and promoting relations, together with awareness of others' situations, for connectedness and coexistence. The system captures and analyzes communication channels among modern families...
In this paper, a complete architecture for knowledge-assisted cross-media analysis of News-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text) separately, and proposes a fusion technique based on the particular characteristics of News-related...
Most existing learning-based methods for query-by-example take the query examples as “positive” and build a model for each query. These methods, referred to as query-dependent, have achieved only limited success, as they can hardly be applied to real-world applications, in which an arbitrary query is usually given. To address this problem, we propose to learn a query-independent model by exploiting...
Conventional approaches to video annotation predominantly focus on supervised identification of a limited set of concepts, while unsupervised annotation with infinite vocabulary remains unexplored. This work aims to exploit the overlap in content of news video to automatically annotate by mining similar videos that reinforce, filter, and improve the original annotations. The algorithm employs a two-step...