One of the difficulties in sung speech recognition is the small distance between phonemes of sung speech in the acoustic space. We therefore considered clustering the speech based on pitch (fundamental frequency, F0) to create a larger distance between the phonemes. In addition, we considered a two-stage training method for DNN-HMM: the first stage is trained by using conventional acoustic features...
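The abstract does not say how the pitch-based clustering is performed. As a minimal sketch, assuming that each utterance has already been summarised by its mean F0 in Hz, utterances could be grouped with a plain one-dimensional k-means on log-F0. The helper names (`kmeans_1d`, `cluster_by_pitch`), the log2 scale, and the choice of `k` are illustrative assumptions, not the authors' method.

```python
import math
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> Tuple[List[float], List[int]]:
    """Plain 1-D k-means (Lloyd's algorithm) with quantile-based initialisation."""
    data = sorted(values)
    n = len(data)
    # Start the k centroids at evenly spaced quantiles of the sorted data.
    centroids = [data[min(n - 1, (2 * i + 1) * n // (2 * k))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


def cluster_by_pitch(mean_f0_hz: List[float], k: int = 3) -> List[int]:
    """Hypothetical helper: assign each utterance to a pitch cluster.

    Clustering on log2(F0) treats equal musical intervals (e.g. one octave)
    as equal distances, which suits sung material.
    """
    log_f0 = [math.log2(f0) for f0 in mean_f0_hz]
    _, labels = kmeans_1d(log_f0, k)
    return labels


# Three groups roughly an octave apart (illustrative values, not data from the paper).
print(cluster_by_pitch([110, 115, 120, 220, 230, 240, 440, 450, 460]))  # prints [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Separate acoustic models could then be trained per cluster; working in log-frequency is a design choice made here because pitch perception is roughly logarithmic, not something the abstract states.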
In this paper, we present our approach to the rapid and efficient development of an automatic speech recognition (ASR) system for Russian. We reuse tools, procedures and data that we previously designed and collected for two other Slavic languages, Czech and Slovak. We show how we built a large corpus of texts acquired from major publishers' web pages and converted it from Cyrillic to Latin script to simplify...
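The abstract does not give the transliteration scheme used. A minimal sketch, assuming a simple letter-by-letter table with Czech-style diacritics (š, č, ž), shows how such a Cyrillic-to-Latin conversion of a text corpus can work. The table `CYR_TO_LAT` and the function `transliterate` are hypothetical illustrations, not the mapping used in the paper.

```python
# Hypothetical Czech-style Cyrillic-to-Latin table for Russian (lower case only).
# It is NOT the paper's mapping, which the abstract does not give.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "jo",
    "ж": "ž", "з": "z", "и": "i", "й": "j", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "ch", "ц": "c", "ч": "č", "ш": "š", "щ": "šč", "ъ": "",
    "ы": "y", "ь": "'", "э": "è", "ю": "ju", "я": "ja",
}


def transliterate(text: str) -> str:
    """Lower-case the text and map each Cyrillic letter to its Latin string.

    Characters missing from the table (digits, punctuation, spaces, Latin
    letters) are passed through unchanged.
    """
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


print(transliterate("Привет, мир"))  # prints privet, mir
```

Lower-casing first keeps the table small, which is common in ASR text normalisation; a production scheme would also need rules for context-dependent letters, which this sketch ignores.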
This paper investigates the problem of defining the set of acoustic-phonetic units for flexible-vocabulary continuous speech recognition systems. As an alternative to the classical modeling approach with biphones and triphones, a set of stationary/transitory state units is defined whose number is small enough for them to form a closed set that can be trained once and for all. A major benefit of these units is...
Shigin is the singing of Japanese or Chinese poetry, following a melody called “seicho” in Japanese. However, it is difficult to master Shigin because a trainer teaches according to his/her own impressions, and its melody employs a relative musical scale. Therefore, this paper proposes a singing training support system for Shigin that clarifies differences in signal characteristics between a trainee...
In this paper, Arabic alphadigits are investigated from the point of view of speech recognition. Limited-vocabulary Arabic automatic speech recognition systems (ASRs) were designed, implemented, and tested using isolated-word utterances consisting of Arabic letters and/or digits. These systems were implemented separately using phoneme-level and word-level HMMs in distinct...
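For the word-level case, isolated-word recognition with HMMs usually means scoring the utterance against one HMM per vocabulary word and picking the best-scoring word. The sketch below shows that decision rule with the scaled forward algorithm, assuming discrete (vector-quantised) observation symbols and two toy 2-state left-to-right models; the class `DiscreteHMM`, the function `recognise`, and the model names and probabilities are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


class DiscreteHMM:
    """Hidden Markov model with discrete (vector-quantised) observation symbols."""

    def __init__(self, start: List[float], trans: List[List[float]], emit: List[List[float]]):
        self.start = start  # start[i]: probability of starting in state i
        self.trans = trans  # trans[i][j]: probability of moving from state i to state j
        self.emit = emit    # emit[i][s]: probability of state i emitting symbol s

    def log_likelihood(self, obs: Sequence[int]) -> float:
        """Scaled forward algorithm: returns log P(obs | model)."""
        n = len(self.start)
        alpha = [self.start[i] * self.emit[i][obs[0]] for i in range(n)]
        total = 0.0
        for t in range(len(obs)):
            if t > 0:
                alpha = [
                    sum(alpha[i] * self.trans[i][j] for i in range(n)) * self.emit[j][obs[t]]
                    for j in range(n)
                ]
            scale = sum(alpha)
            if scale == 0.0:
                return float("-inf")  # the sequence is impossible under this model
            total += math.log(scale)
            alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        return total


def recognise(obs: Sequence[int], models: Dict[str, DiscreteHMM]) -> str:
    """Isolated-word decision: pick the word whose HMM scores the utterance highest."""
    return max(models, key=lambda word: models[word].log_likelihood(obs))


# Two toy 2-state left-to-right word models over a 3-symbol alphabet (hypothetical).
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
MODELS = {
    "word_a": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "word_b": DiscreteHMM([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}

print(recognise([0, 0, 1, 1], MODELS))  # prints word_a
```

A phoneme-level system would instead concatenate phoneme HMMs into word models; the decision rule over the vocabulary stays the same.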
This paper examines the possibility of applying phone-pronunciation variability descriptors to emotion classification. The proposed group of descriptors comprises a set of statistical parameters of Poincaré maps, which are derived for the evolution of the formant frequencies and energy of voiced-speech segments. Poincaré maps are represented by means of four different parameters that summarize various...
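The abstract does not name its four parameters. To illustrate what statistical parameters of a Poincaré map can look like, the sketch below computes SD1 and SD2, a pair of descriptors commonly used for Poincaré plots in heart-rate-variability analysis, for a per-frame track such as a formant frequency or frame energy of a voiced segment. The function `poincare_descriptors` and the example track are hypothetical; this is not the paper's descriptor set.

```python
import math
import statistics
from typing import Dict, Sequence


def poincare_descriptors(track: Sequence[float]) -> Dict[str, float]:
    """SD1/SD2 descriptors of the first-return (Poincaré) map x[n] -> x[n+1].

    SD1 measures scatter perpendicular to the identity line (frame-to-frame
    variability); SD2 measures scatter along it (longer-term variability).
    Requires at least three samples, i.e. two map points.
    """
    if len(track) < 3:
        raise ValueError("need at least three samples")
    xs, ys = track[:-1], track[1:]
    # Rotate each map point (x, y) by 45 degrees onto the identity-line axes.
    across = [(y - x) / math.sqrt(2) for x, y in zip(xs, ys)]
    along = [(y + x) / math.sqrt(2) for x, y in zip(xs, ys)]
    return {"sd1": statistics.stdev(across), "sd2": statistics.stdev(along)}


# Hypothetical F1 track (Hz) of one voiced segment, one value per frame.
print(poincare_descriptors([500.0, 520.0, 510.0, 530.0, 515.0]))
```

A steadily gliding formant gives SD1 near zero, while rapid frame-to-frame jitter raises SD1; that contrast is the kind of pronunciation variability such descriptors are meant to capture.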
In this paper, we propose to use syllables as the acoustic units representing speech signals in an automatic speech recognition (ASR) system in order to improve the automatic recognition of dysarthric speech. The motivation for using syllables comes from studies of human perception, which demonstrate the central role played by the syllable in human perception and generation of...
A person's speaking style, consisting of such attributes as voice, choice of vocabulary, and the physical motions employed, not only expresses the speaker's identity but also emphasizes the content of an utterance. Speech combining these aspects of speaking style becomes more vivid and expressive to listeners. Recent research on speaking style modeling has paid more attention to speech signal processing...
Uyghur is an agglutinative language and one of the least studied languages in the area of speech recognition. In this work, we present the research process for Uyghur large-vocabulary continuous speech recognition based on HMMs (hidden Markov models). This paper introduces the process of data collection (text corpus and speech corpus), the selection of units for speech recognition, the creation of acoustic...