We are now developing a Japanese speaking test called SCAT, which is part of J-CAT (Japanese Computerized Adaptive Test), a free online proficiency test for Japanese language learners. In this paper, we focus on the sentence-reading-aloud task and the sentence generation task in SCAT, and propose an automatic scoring method for estimating the overall score of answer speech, which is holistically determined...
In this paper, we explore the retrieval of perceptually similar audio, which focuses on finding sounds according to human perception. Such retrieval is thus more “human-centered” [1] than previous audio retrieval approaches, which aim to find homologous sounds. We make comprehensive use of various acoustic features to measure perceptual similarity. Since some acoustic features may be redundant or even adverse...
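The kind of feature-based perceptual similarity described above can be illustrated with a minimal sketch, assuming librosa and NumPy are available; the choice of MFCCs plus spectral centroid and of cosine similarity is a hypothetical simplification, not the authors' actual feature set:

```python
# Illustrative sketch only: summarize each sound as a fixed-length acoustic
# feature vector, then rank candidates by cosine similarity to the query.
import librosa
import numpy as np

def perceptual_features(path):
    """Mean MFCCs (timbre) plus mean spectral centroid (brightness)."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    return np.concatenate([mfcc.mean(axis=1), centroid.mean(axis=1)])

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```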
This paper presents a method of automatic lexical stress assessment for L2 English speech. Syllable stress can be labeled at three levels - primary (P), secondary (S) and no (N) stress, but secondary stress may vary among word pronunciations within and across accents and present difficulties for human perception. Hence, evaluation of lexical stress based on all three levels (i.e., the P-S-N criterion...
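To make the P-S-N issue concrete, here is a small worked example of the contrast the abstract raises: agreement computed on all three levels versus on a collapsed two-level criterion in which the perceptually unstable secondary stress is merged with no stress. The collapse rule and the labels are assumptions for illustration, not the paper's evaluation protocol:

```python
# Hypothetical example: a syllable heard as secondary (S) by the annotator
# but labeled unstressed (N) by the system counts as an error under P-S-N,
# yet agrees under the collapsed stressed/unstressed criterion.
def collapse(labels):
    return ["P" if l == "P" else "N" for l in labels]

ref = ["P", "S", "N", "N"]   # reference stress labels for one word
hyp = ["P", "N", "N", "N"]   # system output
psn = sum(r == h for r, h in zip(ref, hyp)) / len(ref)                      # 0.75
two = sum(r == h for r, h in zip(collapse(ref), collapse(hyp))) / len(ref)  # 1.0
print(psn, two)
```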
Previous studies using functional magnetic resonance imaging (fMRI) have demonstrated that the left hemisphere is specialized for language function. On the other hand, some studies have revealed that the right hemisphere is also related to language function. The hypotheses of this study were that (1) the regions related to language function form a bilateral functional network and (2) the level of...
This paper presents empirical evidence of user bias within a laboratory-oriented evaluation of a Spoken Dialog System. Specifically, we address user bias in satisfaction judgements. We question the reliability of these data for modeling user emotion, focusing on contentment and frustration in a spoken dialog system. This bias is detected through machine learning experiments that were conducted...
This paper describes a reading quality scoring system based on large vocabulary continuous speech recognition (LVCSR). Our previous scoring system was based on forced alignment. A disadvantage of the forced-alignment-based system is that it can hardly detect the wide variety of reading miscues, whereas the LVCSR-based system avoids this limitation. The main challenge was that the LVCSR recognition rate was low on our...
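A minimal sketch of why an LVCSR-based scorer can catch miscues that forced alignment cannot: the recognizer's free output is aligned against the reference passage, and insertions, deletions and substitutions are flagged. The alignment via Python's difflib and the example sentences are illustrative, not the paper's implementation:

```python
import difflib

reference  = "the quick brown fox jumps over the lazy dog".split()
recognized = "the quick brown fox jump over lazy dog".split()  # hypothetical ASR output

# Align hypothesis words against the reference and report miscues.
matcher = difflib.SequenceMatcher(a=reference, b=recognized)
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    if op != "equal":
        print(op, reference[i1:i2], "->", recognized[j1:j2])
# replace ['jumps'] -> ['jump'];  delete ['the'] -> []
```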
This paper deals with a post-processing phase of automatic transcription of spoken documents stored in the large Czech Radio audio archive (containing hundreds of thousands of recordings). The ultimate goal of the project is to transcribe them and to allow public access to their content. In this paper we focus on methods and algorithms for unsupervised post-processing of automatically recognized recordings...
In this paper, we examined the feasibility of articulatory phonetic inversion (API) conditioned on auditory qualities for improved speech recognition. We also introduced an efficient data-driven heuristic learning algorithm to capture the articulatory-phonetic features (APFs) of English speech. We then reported the performance of the combined auditory and articulatory processing methods in the...
This paper presents an analysis of the effect of thirteen different kinds of sound on visual gaze when looking freely at videos, to help predict eye positions. First, an audio-visual experiment was designed with two groups of participants, under audio-visual (AV) and visual (V) conditions, to test the effect of sound. Then, an audio experiment was designed to validate the classification of sound we...
Voice-based call centers enable customers to query for information by speaking to agents in the call center. Most often these conversations are recorded for analysis, with the intent of identifying things that can help improve the performance of the call center and serve the customer better. Today, the recorded conversations are analyzed by humans who listen to them, which...
In this paper we design a system that adopts a novel approach to emotion classification from human dialogue based on text and speech context. Our main objective is to boost the accuracy of speech emotion classification by accounting for features extracted from the spoken text. The proposed system concatenates text and speech features and feeds them as one input to the classifier. The work...
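The feature-level fusion the abstract describes can be sketched as follows, assuming NumPy and scikit-learn; the feature dimensions, random data, and choice of classifier are placeholders for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
text_feats   = rng.random((n, 50))     # e.g. lexical/sentiment features
speech_feats = rng.random((n, 30))     # e.g. prosodic/spectral features
labels       = rng.integers(0, 4, n)   # four hypothetical emotion classes

# Concatenate both modalities into one vector per utterance,
# then train a single classifier on the fused input.
fused = np.hstack([text_feats, speech_feats])
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
```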
The performance of speaker verification systems is typically measured by their binary decision accuracy. However, in speaker verification applications where close to 100% accuracy is required, such as systems used in the call centers of finance companies, it is not possible to rely on the binary decisions of existing verification systems. Still, in such cases, multi-class verification...
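One way to read "not relying on binary decisions" is a reject option: scores in a low-confidence band are escalated rather than forced into accept/reject. This sketch and its thresholds are purely illustrative, not the verification scheme the paper develops:

```python
def decide(score, accept_t=0.9, reject_t=0.1):
    """Three-way decision with a deferral band instead of a hard threshold."""
    if score >= accept_t:
        return "accept"
    if score <= reject_t:
        return "reject"
    return "refer"   # low confidence: escalate to a human agent

for s in (0.95, 0.50, 0.02):
    print(s, decide(s))
```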
Emotions are an important part of human communication and are expressed both verbally and non-verbally. Common nonverbal vocalizations such as laughter, cries and sighs carry important emotional content in conversations. Sighs are often associated with negative emotion. In this work, we show that emotional sighs exist at both ends of the valence axis (positive-emotion vs. negative-emotion sighs)...
We introduce a novel metric for speech recognition success in voice search tasks, designed to reflect the impact of speech recognition errors on the user's overall experience with the system. The computation of the metric is seeded using intuitive labels from human subjects and subsequently automated by replacing human annotations with a machine learning algorithm. The results show that search-based recognition...
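The seeding-then-automation loop the abstract outlines can be sketched as a classifier trained on human success labels and then applied to unlabeled interactions. The features (word error count, whether a result was clicked) and the data are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X_seed = np.array([[0, 1], [3, 0], [1, 1], [5, 0]])  # [word errors, clicked]
y_seed = np.array([1, 0, 1, 0])                      # human "success" labels

# Train on the human-labeled seed set, then replace the annotators.
model = LogisticRegression().fit(X_seed, y_seed)
print(model.predict([[2, 1]]))   # automated judgment for a new interaction
```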
Computer lip-reading is one of the great signal processing challenges. Not only is the signal noisy, it is also highly variable. However, comparisons with human lip-readers are almost unknown, partly because of the paucity of skilled human lip-readers and partly because most automatic systems handle only data that are trivial and therefore not representative of human speech. Here we generate a...
This paper shows that pattern classification based on machine learning is a powerful tool for analyzing human brain activity data obtained by magnetoencephalography (MEG). In our previous work, a weighting method using multiple kernel learning was proposed, but this method had a high computational cost. In this paper, we propose a novel and fast weighting method using an AdaBoost algorithm to find...
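A rough sketch of boosting-based feature weighting in the spirit described above, using scikit-learn's AdaBoostClassifier on stand-in MEG data; the data shapes and the use of feature importances as channel weights are assumptions, not the authors' algorithm:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
n_trials, n_channels = 300, 64
X = rng.standard_normal((n_trials, n_channels))  # stand-in MEG sensor features
y = rng.integers(0, 2, n_trials)                 # two stimulus conditions

# Boosting over decision stumps yields a per-feature importance that can
# serve as a cheap weighting, avoiding the cost of multiple kernel learning.
ada = AdaBoostClassifier(n_estimators=100).fit(X, y)
weights = ada.feature_importances_
print("most informative channels:", np.argsort(weights)[::-1][:5])
```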
Whenever we listen to a voice for the first time, we attribute personality traits to the speaker. The process takes place in a few seconds and is spontaneous and unconscious. While the process is not necessarily accurate (attributed traits do not necessarily correspond to the actual traits of the speaker), it nevertheless significantly influences our behavior toward others, especially when it comes to social...
In a leading service economy like India, services lie at the very center of economic activity. Competitive organizations now look not only at the skills and knowledge of an employee, but also at the behavior required to be successful on the job. Emotionally competent employees can effectively deal with occupational stress and maintain psychological well-being. This study explores the scope of the...
This paper presents the reliability of a multilayer perceptron (MLP) in speaker identification using characteristics extracted from speakers' voices. Classification accuracy depends on the speaking condition and varies by up to 23% across conditions. Results of the simulation experiment show that the MLP is effective in speaker identification, especially in the case of retelling and synchronous speech, where we achieved...
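A minimal sketch of an MLP speaker-identification setup of the kind described, using scikit-learn's MLPClassifier; the feature extraction, data shapes, and network size are placeholders, not the paper's configuration:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_utts, n_feats, n_speakers = 500, 20, 10
X = rng.standard_normal((n_utts, n_feats))   # e.g. cepstral features per utterance
y = rng.integers(0, n_speakers, n_utts)      # speaker identities

# One hidden layer; the classifier maps voice features to speaker labels.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X, y)
print("training accuracy:", mlp.score(X, y))
```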
Spoken interactions usually have accurate timing and alignment between interlocutors: turn-taking and topic flow are managed in a manner that provides conversational fluency and smooth progress of the interaction. Turn-taking and topic flow are also important in applications such as robot companions that interact with a user in real time. The creation of a multimodal conversational corpus for modeling...