This paper proposes a novel decoding algorithm that integrates both steady speech segments and the location information of observations into the conventional path-extension framework. First, speech segments with a stable spectrum are extracted. Second, a preliminary improved algorithm is obtained by modifying the traditional inter-HMM extension framework using the detected steady segments. Then, at probability...
For the same tone pattern, different articulatory characteristics may cause the pitch contour to vary. This paper applies articulatory features, which represent articulatory information, together with prosodic features to tone modeling. Three kinds of tone models are trained to verify the effectiveness of the articulatory features. Tone recognition experiments indicate that significant improvement can be...
Accurately modeling the acoustic variability caused by coarticulation is important in continuous speech recognition. Recent research indicates that syllable units model intra-syllable coarticulation better than sub-syllable units. However, most continuous Mandarin speech recognition systems use context-dependent phones or initials/finals (IFs) as the basic acoustic unit because it...
Tone plays an important role in distinguishing ambiguous words in Mandarin Chinese speech recognition. In this paper, we make full use of pitch information. On the one hand, we interpolate the F0 contour to make it continuous across voiced and unvoiced segments, in order to embed F0 into the speech recognition system in two streams, of which the cepstrum and its first- and second-order derivatives constitute...
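The F0 interpolation step described above can be illustrated with a minimal sketch. It assumes a frame-level F0 track in Hz where 0 marks unvoiced frames, and it fills gaps by plain linear interpolation; the abstract does not say which interpolation scheme the authors used, so the scheme, the function names (`interpolate_f0`, `simple_delta`, `build_f0_stream`) and the simplified delta formula are all hypothetical.

```python
import numpy as np


def interpolate_f0(f0) -> np.ndarray:
    """Fill unvoiced frames (assumed to be marked by F0 == 0) by linear
    interpolation between the surrounding voiced frames.

    Leading/trailing unvoiced frames are held at the nearest voiced value,
    which is np.interp's edge behaviour.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        # No voiced frames at all: nothing to interpolate from.
        return np.zeros_like(f0)
    frames = np.arange(len(f0))
    return np.interp(frames, frames[voiced], f0[voiced])


def simple_delta(x: np.ndarray) -> np.ndarray:
    """Central first-order difference with edge padding
    (a simplified stand-in for regression-based delta features)."""
    padded = np.pad(np.asarray(x, dtype=float), 1, mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0


def build_f0_stream(f0) -> np.ndarray:
    """Hypothetical F0 feature stream: interpolated F0 plus its
    first- and second-order deltas, shape (num_frames, 3)."""
    f0_cont = interpolate_f0(f0)
    d1 = simple_delta(f0_cont)
    d2 = simple_delta(d1)
    return np.stack([f0_cont, d1, d2], axis=1)
```

For example, `interpolate_f0([0, 100, 0, 0, 130, 0])` yields `[100, 100, 110, 120, 130, 130]`: the two unvoiced frames between the voiced values 100 and 130 are bridged linearly, and the edge frames are held constant. Because the contour no longer drops to zero, its deltas stay finite, which is what allows F0 to sit in a continuous-valued stream next to the cepstral stream.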
In this paper, we propose a novel interpolated language model that combines interpolation and backing-off along a class hierarchy. The corresponding approach to estimating the interpolation coefficients is also presented. We use the Minimum Discriminative Information (MDI) method to cluster the vocabulary hierarchically into a word-clustering tree. The tree...
Statistical confusability between different acoustic models is an important factor in the character substitution error rate of large vocabulary continuous speech recognition. In this paper, we take gender and speaking style into consideration in Mandarin speech recognition. We modeled phonemes in different speaking styles, including read speech by female and male speakers, and spontaneous dialogue. Then minimum...
In this study, we combine Mandarin characteristics with Mandarin acoustic attributes and text information, and use hierarchical-model-based ensemble machine learning to predict Mandarin pitch accent. Our model makes the best use of the advantages of both the hierarchical prosody structure and ensemble machine learning. When comparing our model with classification and regression tree (CART), support vector machine...
Prosody is an important factor for a high-quality text-to-speech (TTS) system. Prosody is often described with a hierarchical structure, so generating the hierarchical prosody structure is very important both in corpus building and in real-time text analysis; however, the prosody labeling procedure is laborious and time-consuming. In this paper, an automatic prosody boundary labeling system is...
In a large vocabulary continuous speech recognition system based on the stochastic segment model (SSM), multistage decoding and pruning can noticeably reduce decoding time. Generally, decoding and pruning are performed for only one segment at a time. In this paper, a decoding algorithm based on neighboring segments is proposed. This algorithm decodes multiple segments at the same time, so that the...
Analyzing the information structure and prosodic structure of sentences and discourse is key to improving the naturalness of speech synthesis and reducing the error rate of speech recognition. Based on a large speech corpus (ASCCD) with prosodic structure labels, we measured the duration and pitch characteristics of prosodic phrases. The statistical results on duration and pitch are presented...
In this paper, a novel adaptive-step decoding method using steady-energy pieces (SEPs) is explored in a segment model (SM) based LVCSR system. Using speech analysis methods and statistical classification tools, the start and end points of SEPs are first detected. In the SM decoding stage, frame-by-frame decoding for segments that start or end within SEPs is skipped and replaced by SEP-based decoding. In...
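To make the idea of a steady-energy piece concrete, here is a minimal sketch that marks runs of frames whose short-time log energy changes little from frame to frame. This is a simple threshold heuristic swapped in for illustration only: the abstract says SEPs are detected with unspecified "speech analysis methods and statistical classification tools", so the detector, the function names, and the parameters (`frame_len`, `hop`, `max_delta`, `min_frames`) are hypothetical.

```python
import numpy as np


def frame_log_energy(signal, frame_len=400, hop=160) -> np.ndarray:
    """Short-time log energy per frame (hypothetical defaults:
    25 ms frames with a 10 ms hop at 16 kHz)."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    energies = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_len]
        # Small constant avoids log(0) on silent frames.
        energies[i] = np.log(np.sum(frame ** 2) + 1e-10)
    return energies


def find_steady_pieces(log_energy, max_delta=0.5, min_frames=5):
    """Return (start, end) frame index pairs (end exclusive) of runs in
    which every consecutive log-energy change is below max_delta and the
    run spans at least min_frames frames."""
    e = np.asarray(log_energy, dtype=float)
    pieces = []
    start = 0
    for t in range(1, len(e) + 1):
        # A run ends at the end of the track or at an energy jump.
        if t == len(e) or abs(e[t] - e[t - 1]) >= max_delta:
            if t - start >= min_frames:
                pieces.append((start, t))
            start = t
    return pieces
```

For instance, with `e = [0]*5 + [5]*3 + [10]*6` the function returns `[(0, 5), (8, 14)]`: the middle run of three frames is too short to count. In a decoder of the kind the abstract describes, such (start, end) pairs would mark where frame-by-frame segment hypotheses could be replaced by piece-level decoding.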