Forced alignment for speech synthesis traditionally aligns a phoneme sequence predetermined by the front-end text processing system. This sequence is not altered during alignment, i.e., it is forced, despite possibly being faulty. The consistency assumption is the assumption that these mistakes do not degrade models, as long as the mistakes are consistent across training and synthesis. We present...
Speech segmentation refers to the problem of determining the phoneme boundaries from an acoustic recording of an utterance together with its orthographic transcription. This paper focuses on a particular case of hidden Markov model (HMM)-based forced alignment in which the models are trained directly on the corpus to be aligned. The obvious advantage of this technique is that it is applicable to any language...
“Human BeatBox” (HBB) is a newly expanding contemporary singing style where the vocalist imitates percussive drum-beat sounds as well as pitched musical instrument sounds. Drum sounds typically use a notation based on plosives and fricatives, and instrument sounds cover vocalisations that go beyond spoken-language vowels. HBB hence constitutes an interesting use case for expanding techniques initially...
Both unit-selection and HMM-based speech synthesis require large annotated speech corpora. To generate more natural speech, considering the prosodic nature of each phoneme of the corpus is crucial. Generally, phonemes are assigned labels which should reflect their suprasegmental characteristics. Labels often result from an automatic syntactic analysis, without checking the acoustic realization of...
Several automatic phonetic alignment tools have been proposed in the literature. They usually rely on pre-trained speaker-independent models to align new corpora. Their drawback is that they cover a very limited number of languages and might not perform properly for different speaking styles. This paper presents a new tool for automatic phonetic alignment available online. Its specificity is that...