In HMM/DNN automatic speech recognition (ASR) systems, the DNNs model the posterior probabilities of triphone states. However, triphone states are unevenly distributed in the training data. As a result, the training algorithm tends to converge to a local optimum that favors data-rich states over data-poor ones. Thus, the imbalance of the training data degrades ASR performance, especially for...
In current DNN/HMM hybrid systems, the DNN models are trained on 1-of-V targets obtained by Viterbi-based forced alignment. The states are treated as unrelated and isolated. In fact, some phonemes are acoustically similar. This is especially true for Chinese: because it is a tonal language, the number of similar pairs is quadrupled. To add the similarity information between states into the model training,...
Korean is an agglutinative language in which pronunciations are affected by long-term context. In this paper, long-term temporal information is investigated to improve Korean LVCSR. TRAP-based MLP features, which can exploit acoustic information scattered over several hundred milliseconds, are employed to obtain additional information beyond the conventional cepstral features. In...
Acoustic modeling in large vocabulary continuous speech recognition (LVCSR) systems, which is normally based on context-dependent phones, is heavily limited in its capability to represent the relationship between transcriptions and the corresponding variation in raw speech utterances. To describe this relationship more accurately, this paper presents an alternative strategy in which speech attributes are used to capture acoustic...
This paper presents our recent work on the development of a real-world voice retrieval system, which automatically updates language models for a specific domain with the latest web data. Two of the main difficulties in building this system are tackled in this paper. First, when people use voice retrieval systems, newly created "hot words" are entered as keywords. In order to ensure...
This paper reports our recent work on optimizing articulatory feature (AF) based confidence measures and combining them with the traditional HMM-based confidence measures. Different articulatory properties are analyzed using a separate AF-based confidence calculation method proposed in this paper, and are observed to be both complementary and redundant. A more compact subset is chosen and assembled...
HMM-based TTS can produce highly intelligible speech of decent quality. However, the synthesized speech sometimes exhibits perceptibly annoying glitches due to F0 extraction errors in the training data and voiced/unvoiced swapping errors in F0 generation. In conventional MSD-based F0 modeling [10], the two dual but incompatible probabilistic spaces, the continuous probability density for...
The great success of the Minimum Phone Error (MPE) training criterion in monolingual large vocabulary continuous speech recognition (LVCSR) tasks motivates us to apply it to bilingual LVCSR systems. In this paper, in conjunction with previously established bilingual phoneme inventory construction techniques, we comprehensively investigate the performance of MPE/fMPE on various Mandarin-English...
This paper presents a novel bilingual model modification approach to improve nonnative speech recognition accuracy in the presence of accented pronunciation variations. Each state of the baseline nonnative acoustic model is modified with several candidate states from the auxiliary acoustic model, which is trained on the speakers' native language. State mapping criteria and n-best candidates are investigated,...
The performance of automatic speech recognition decreases drastically for nonnative speakers, especially those who are just beginning to learn a foreign language or who have heavy accents. This paper presents a novel bilingual model modification approach that improves nonnative speech recognition by accounting for the large variations in accented pronunciations. Each state of the baseline nonnative acoustic...
Speech recognition accuracy has been observed to decrease for nonnative speakers, especially those who are just beginning to learn a foreign language or who have heavy accents. This paper presents a novel bilingual model modification approach that improves nonnative speech recognition by accounting for the large variations in accented pronunciations. Each state of the baseline nonnative acoustic models...