In this paper, we investigate the effectiveness of articulatory information for Mandarin tone modeling and recognition in a deep neural network – hidden Markov model (DNN-HMM) framework. Conventional approaches use prosodic evidence (e.g., F0, duration and energy) to build tone classifiers; here we propose performance enhancement techniques in three areas: (i) adding articulatory features...
Detecting pronunciation erroneous tendencies (PETs) can provide second-language learners with detailed, instructive feedback in computer-aided pronunciation training (CAPT) systems. Due to data sparseness, DNN-HMM achieved limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigates how to further improve the performance...
In the process of learning Chinese as a second language (CSL), native Japanese speakers have difficulty with tone perception. Among the four Chinese lexical tones, the tone pairs Tone 1-Tone 2 and Tone 1-Tone 4 are problematic for Japanese CSL beginners. To help them develop an efficient ability to discriminate these tone pairs, we designed a hybrid perceptual training scheme which combined adaptive...
Automatic prosodic boundary detection and annotation are important for both speech understanding and natural speech synthesis. Manual annotation of prosodic boundary labels is laborious and time-consuming. In this paper, from the perspective of the interaction of adjacent tones, we propose a method to automatically detect prosodic boundaries based on tone nucleus features and a deep neural network (DNN)...
Nasal Finals play an important role in distinguishing lexical meanings in Standard Chinese, but it is still unclear what the primary perceptual cues for nasal Finals are. The present study looks into this question, especially the primary perceptual cues for native Chinese listeners. We conducted two perceptual experiments with three-formant synthetic stimuli in which the second formant (F2) and the...
Prosodic boundaries play an important role in the intelligibility and naturalness of speech. Finding a quantitative measure of their importance is an interesting topic. Previous studies have quantitatively measured the importance of prosodic boundaries based on functional loads (FLs). However, earlier work estimating the information contribution of prosodic boundaries was carried out under the hypothesis...
Although previous studies have demonstrated that fundamental frequency (f0) is a significant cue in tone perception, other factors (e.g., segments) also influence it. In the present study, we explored the effect of Initials and Finals on tone perception. One perception experiment was conducted with six continua (i.e., Tone 2-Tone 3) based on syllables which include three types of Initials (i.e.,...
Tone is a distinctive feature in Mandarin Chinese. Tone recognition is useful for distinguishing ambiguous words in Mandarin Chinese speech recognition. Most traditional studies have focused on prosodic features (e.g., F0, duration and energy) to improve the performance of tone recognition. In this paper, we propose a novel framework to integrate articulatory features (AFs) and MFCC into a DNN-HMM based...
It is important to provide detailed and instructive feedback in computer-assisted pronunciation training (CAPT) systems. However, the feedback is limited by the accuracy of erroneous tendency detection. This paper proposes applying senone log-likelihood ratio based articulatory features (AFs) to improve pronunciation erroneous tendency (PET) detection performance. Also the feedback information of...
A Chinese interlanguage corpus lays a foundation for studying the speech production, such as typical pronunciation errors, of non-native Chinese speakers. Traditional Chinese interlanguage corpora have difficulty covering important phonetic types such as tones and syllables in context. This paper presents the construction of an interlanguage corpus which contains 103 sentences covering 394 syllable types...
L2 learners of Mandarin have difficulty learning native-like pronunciation of nasal codas. In order to help them learn native-like pronunciation, we propose to develop targeted classifiers for automatic pronunciation error detection. In this paper, perceptual experiments with modified speech are designed to analyze the exact position of the landmark of a nasal coda. Based on perceptual results from...
Functional load (FL) is a quantitative measure of the importance of phonological contrasts, which stand for the differentiation of communicative linguistic units. Correct estimation of FLs is useful for studies of speech recognition, language evolution, language teaching, etc. Conventional approaches use phonological transcriptions and unigram probabilities for the estimation, hence weak in...
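As a rough illustration of the conventional estimate mentioned in this abstract, the sketch below uses the widely used entropy-based formulation: the FL of a contrast is the relative drop in unigram entropy of the lexicon when the two phonemes are merged. This is an assumption about what "conventional" means here, not the method proposed in the paper; the toy lexicon, its counts, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (bits) of the unigram distribution given by raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def functional_load(word_counts, a, b):
    """Relative entropy loss when phoneme b is merged into phoneme a.

    word_counts maps a word (tuple of phoneme symbols) to its corpus count.
    """
    h_before = entropy(word_counts)
    merged = Counter()
    for word, count in word_counts.items():
        # Words that differed only by the a/b contrast collapse into one entry.
        merged_word = tuple(a if p == b else p for p in word)
        merged[merged_word] += count
    h_after = entropy(merged)
    return (h_before - h_after) / h_before


# Hypothetical toy lexicon: "ba" and "pa" form a minimal pair, "mi" does not.
toy_lexicon = Counter({("b", "a"): 4, ("p", "a"): 4, ("m", "i"): 8})

fl_b_p = functional_load(toy_lexicon, "b", "p")  # contrast carries information
fl_b_m = functional_load(toy_lexicon, "b", "m")  # no words collapse
print(fl_b_p, fl_b_m)
```

In this toy case merging b and p collapses a minimal pair and lowers the entropy, while merging b and m leaves every word distinct, so only the first contrast receives a non-zero load.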
This paper attempts to provide some insights into the relationship between the differentiability and the classification importance of consonants in Chinese speech communication. The two characteristics can be modelled by the perceptual distance and the functional load respectively. We perform a clustering analysis of Chinese consonants based on functional load (FL) relying on mutual information (MI)...
This paper presents several findings from a comparison of F0 range in Chinese speech by native Chinese speakers (CC), Chinese speech by Japanese learners (CJ) and Japanese speech by native Japanese speakers (JJ). The purpose of this work as a whole is to investigate possible F0 range differences between L1 and L2 Chinese and to examine whether these differences are correlated with the learners' Japanese L1. “Long term...
In computer-assisted pronunciation training (CAPT) systems, detecting mispronunciations produced by non-native speakers and providing detailed, instructive feedback are two key points, because both help L2 learners improve their pronunciation more effectively. We wish to give feedback related to articulation placement and articulation manner, and to connect it with the detection task through modeling...
This paper uses a pitch projection method to synthesize teaching speech by selecting an appropriate standard voice. To synthesize the teaching speech, the lexical tones in learners' speech are converted into standard tones, while the segments and timbre are kept unchanged. The complex variation of the speech signal is thereby reduced, except for tone. The paper then carries out tone training experiments for Japanese based...
To help Japanese C2L learners learn Mandarin Chinese tones efficiently, we have devised a hybrid perceptual training paradigm consisting of adaptive and high-variability training periods. Through the training, the 13 participants improved their accuracy in distinguishing Tone 2 and Tone 3 from 84.2% in the pre-test to 94.2% in the post-test within 6 days. After six months...
Pinyin-to-character (P2C) conversion is mostly used to input Chinese characters into a computer. Its main problem is homophones, which is addressed by exploiting contextual information provided by a lexicon and an n-gram language model (LM). Our investigation of state-of-the-art P2C technologies reveals that conventional optimization methods for them are mostly based on minimizing...
A segmentation posterior probability based endpointing algorithm for robust ASR is proposed. First, each speech signal is partitioned into homogeneous segments via auto-segmentation. Then posterior probabilities of all possible endpoints are computed, based on the segmentation likelihoods of all levels in a selected range. Endpoints with the highest posterior probabilities are finally selected. The...
The acoustic mismatch between the training and test environments leads to differences in the statistical characteristics of speech parameters. Since kurtosis, as a statistical characteristic, can measure the non-Gaussianity of a random variable, kurtosis normalization makes the training and test speech parameters match the standard normal distribution in some sense. In this paper,...
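To make the role of kurtosis concrete, the sketch below computes the excess kurtosis of a sample, which is 0 for a Gaussian variable, negative for flatter distributions and positive for heavy-tailed ones. It only illustrates the measure; the normalization procedure itself is not described in this excerpt and is not reproduced here. The function name and sample data are hypothetical.

```python
def excess_kurtosis(samples):
    """Population excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n  # variance
    m4 = sum((x - mean) ** 4 for x in samples) / n  # fourth central moment
    return m4 / (m2 ** 2) - 3.0


# Hypothetical feature values: a two-point distribution (very flat)
# and a mostly-zero sequence with two large outliers (heavy-tailed).
two_point = [-1.0, 1.0] * 50
heavy_tailed = [0.0] * 98 + [-10.0, 10.0]

print(excess_kurtosis(two_point))     # negative: sub-Gaussian
print(excess_kurtosis(heavy_tailed))  # positive: super-Gaussian
```

A value far from zero in either direction signals a departure from Gaussianity, which is the property a kurtosis-based normalization would aim to reduce.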