This paper presents empirical evidence of user bias within a laboratory-oriented evaluation of a Spoken Dialog System. Specifically, we address bias in users' satisfaction judgements. We question the reliability of this data for modeling user emotion, focusing on contentment and frustration in a spoken dialog system. This bias is detected through machine learning experiments that were conducted...
Malaysia's political scene has been full of requests and demands for apology from errant politicians on certain sensitive issues. The media has had a field day covering the various politicians' resentment, annoyance and even rage at the slightest provocation of inadequacy or other accusations on the victim's part, and a righteous stance on the accuser's part. Name-calling and gutter politics are common...
This paper describes a reading quality scoring system based on large vocabulary continuous speech recognition (LVCSR). Our previous scoring system was based on forced alignment. A disadvantage of a forced-alignment-based system is that it can hardly catch the many kinds of reading miscues, whereas an LVCSR-based system avoids this disadvantage. The main challenge was that the LVCSR recognition rate was low on our...
Identifying cultural discrepancies in worldviews is of high priority to Cultural Intelligence (CULINT). This paper presents a CULINT computer-based methodology for increasing cultural awareness. By automatically identifying themes/motifs in textual data and using machine translation, we expose cultural discrepancies in cultural understanding. This novel methodology is empirically tested through the...
This paper discusses a remote control system for home electrical appliances using speech recognition. It is a very convenient system that lets not only visually impaired people but also elderly people control household appliances with speech commands. The goal of our system is that many kinds of household appliances, such as televisions, video recorders and air conditioners, are controlled based on...
This paper describes the analyses of the prosody of Vietnamese emotional speech, accomplished to find the relations between prosodic variations and emotional states in Vietnamese speech. These relations were obtained by investigating the variations of prosodic features in Vietnamese emotional speech in comparison with prosodic features of neutral speech. The analyses were performed on a multi-style...
This paper deals with a post-processing phase of automatic transcription of spoken documents stored in the large Czech Radio audio archive (containing hundreds of thousands of recordings). The ultimate goal of the project is to transcribe them and to allow public access to their content. In this paper we focus on methods and algorithms for unsupervised post-processing of automatically recognized recordings...
In this paper, we examined the feasibility of articulatory phonetic inversion (API) conditioned on the auditory qualities for improved speech recognition. We introduced an efficient data-driven heuristic learning algorithm to capture the articulatory-phonetic features (APFs) of English speech. We then reported the performance of the combined auditory and articulatory processing methods in the...
In this paper, we propose a very efficient novel parametric model to describe the surface and structure of the human tongue and a corresponding mathematical model for performing 3D tongue animation. A skeletal chain of virtual bones is automatically generated depending on the geometric features of the 3D object, allowing each tongue segment to be easily manipulated by its corresponding parameters,...
Short Utterance Speaker Recognition (SUSR) is an important area of speaker recognition when only a small amount of speech data is available for testing and training. We list the most commonly used state-of-the-art methods of speaker recognition and the significance of prosodic speaker recognition. A short survey of SUSR is conducted, highlighting various methodologies when using short utterances...
This paper takes into consideration the characteristics of voice processing by the human auditory system, adopts triangular filters for signal preprocessing, and applies logarithm operations to all filter outputs to extract Mel frequency cepstrum coefficients (MFCC). Using Matlab simulation of MFCC vectors of typical male and female signals, an analysis is given of the probability to be applied...
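The MFCC steps named in the abstract above (triangular filters, then a logarithm of every filter output) can be illustrated with a minimal sketch. This is not the paper's Matlab code: the function names, the mel formula 2595·log10(1 + f/700), the default filter and coefficient counts, and the final DCT-II step are all assumptions taken from common MFCC practice, and the input is assumed to be a precomputed power spectrum of one frame.

```python
import math


def hz_to_mel(f_hz):
    # Common mel-scale convention (an assumption, not stated in the abstract).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, sample_rate):
    """Triangular filters over n_bins spectrum bins covering 0..sample_rate/2."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_filters + 2 edge frequencies, equally spaced on the mel scale.
    edges = [mel_to_hz(max_mel * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * nyquist / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_from_power_spectrum(power, sample_rate, n_filters=20, n_coeffs=12):
    """Filterbank energies -> logarithm -> DCT-II (hypothetical helper)."""
    bank = mel_filterbank(n_filters, len(power), sample_rate)
    log_energies = [
        math.log(max(sum(w * p for w, p in zip(row, power)), 1e-10))
        for row in bank
    ]
    return [
        sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
            for m, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]


# Usage: a flat 257-bin power spectrum at an assumed 16 kHz sample rate.
coeffs = mfcc_from_power_spectrum([1.0] * 257, 16000)
```

Because the logarithm turns a gain change into an additive constant, scaling the input spectrum only shifts coefficient 0 in this sketch; the higher coefficients, which describe spectral shape, stay the same.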
This paper presents an analysis of the effect of thirteen different kinds of sound on visual gaze when looking freely at videos to help to predict eye positions. First, an audio-visual experiment was designed with two groups of participants, with audio-visual (AV) and visual (V) conditions, to test the sound effect. Then, an audio experiment was designed to validate the classification of sound we...
Voice based call centers enable customers to query for information by speaking to agents in the call center. Most often these call conversations are recorded for analysis with the intent of trying to identify things that can help improve the performance of the call center to serve the customer better. Today the recorded conversations are analyzed by humans by listening to call conversations, which...
Although electro larynx speech provides an important means of oral communication for laryngectomees, the resulting speech is of poor intelligibility due to the radiated noise caused by the instrument. This paper concentrates on the derivation of a minimum mean-square error spectral amplitude estimator and on its application in electro larynx speech enhancement. Also, the frequency domain...
In this paper we design a system that adopts a novel approach for emotional classification from human dialogue based on text and speech context. Our main objective is to boost the accuracy of speech emotional classification by accounting for the features extracted from the spoken text. The proposed system concatenates text and speech features and feeds them as one input to the classifier. The work...
The performance of speaker verification systems is typically measured based on their binary decision accuracy. However, in speaker verification applications where close to 100% accuracy is required, such as the systems used in the call centers of finance companies, it is not possible to rely on the binary decisions of the existing verification systems. Still, in such cases, multi-class verification...
Document summarization algorithms are most commonly evaluated according to the intrinsic quality of the summaries they produce. An alternate approach is to examine the extrinsic utility of a summary, measured by the ability of the summary to aid a human in the completion of a specific task. In this paper, we use topic identification as a proxy for relevancy determination in the context of an information...
Emotions are an important part of human communication and are expressed both verbally and non-verbally. Common nonverbal vocalizations such as laughter, cries and sighs carry important emotional content in conversations. Sighs often are associated with negative emotion. In this work, we show that emotional sighs exist along both ends of the valence axis (positive-emotion vs. negative-emotion sighs)...
While automated speaker recognition by machines can be quite good, as seen in NIST Speaker Recognition Evaluations, performance can still suffer when the environmental conditions, emotions, or recording quality change. This research examines how robust humans are compared to machine recognition for changing environments. Several data conditions including short sentences, frequency selective noise,...
Natural pitch fluctuations are essential to human singing. To effectively synthesize singing voice, the generation of these pitch fluctuations is necessary. Previous synthesis methods classify and reproduce them individually. These fluctuations, however, are found to be dependent and vary under different contexts. This paper proposes a generalized framework for F0 modelling to learn and generate these...