The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this work, we address the issues involved in whisper-to-audible speech conversion. Spectral mapping techniques using Gaussian mixture models or Artificial Neural Networks borrowed from voice conversion have been applied to transform whisper spectral features to normally phonated audible speech. However, the modeling and generation of fundamental frequency (F0) and its contour in the converted speech...
One of the issues in using audio books for building a synthetic voice is the segmentation of large speech files. The use of the Viterbi algorithm to obtain phone boundaries on large audio files fails primarily because of huge memory requirements. Earlier works have attempted to resolve this problem by using large vocabulary speech recognition system employing restricted dictionary and language model...
Analysis of human reference summaries of broadcast news showed that humans give preference to anchor speaker segments while constructing a summary. Therefore, we exploit the role of anchor speaker in a news show by tracking his/her speech to construct indicative/informative extractive audio summaries. Speaker tracking is done by Bayesian information criterion (BIC) technique. The proposed technique...
Screen reader is a form of assistive technology to help visually impaired people to use or access the computer and Internet. So far, it has remained expensive and within the domain of English (and some foreign) language computing. For Indian languages this development is limited by: availability of Text-to-Speech (TTS) system in Indian languages, support for reading glyph based font encoded text,...
Statistical parametric synthesis becoming more popular in recent years due to its adaptability and size of the synthesis. Mel cepstral coefficients, fundamental frequency (f0) and duration are the main components for synthesizing speech in statistical parametric synthesis. The current study mainly concentrates on mel cesptral coefficients. Durations and f0 are taken from the original data. In this...
In this paper, we propose to use artificial neural networks (ANN) for voice conversion. We have exploited the mapping abilities of ANN to perform mapping of spectral features of a source speaker to that of a target speaker. A comparative study of voice conversion using ANN and the state-of-the-art Gaussian mixture model (GMM) is conducted. The results of voice conversion evaluated using subjective...
In this paper we propose a technique for a syllable based speech synthesis system. While syllable based synthesizers produce better sounding speech than diphone and phone, the coverage of all syllables is a non-trivial issue. We address the issue of coverage of syllables through approximating the syllable when the required syllable is not found. To verify our hypothesis, we conducted perceptual studies...
Indian languages are syllabic in nature where many syllables are found common across its languages. This motivates us to build a global syllable set by combining multiple language syllables to build a synthesizer which can borrow units from a different language when the required syllable is not found. Such synthesizer make use of speech database in different languages spoken by different speakers,...
Pattern classification is an important task in speech recognition and speaker verification. Given the feature vectors of an input the goal is to capture the characteristics of these features unique to each class. This paper deals with exploring Auto Associative Neural Network (AANN) models for the task of speaker verification and speech recognition. We show that AANN models produce comparable performance...
In this paper we present our argument that context information could be used in early stages i.e., during the definition of mapping of the words into sequence of graphemes. We show that the early tagged contextual graphemes play a significant role in improving the performance of grapheme based speech synthesis and speech recognition systems.
In this paper we address the issue of pronunciation modeling for conversational speech synthesis. We experiment with two different HMM topologies (fully connected state model and forward connected state model) for sub-phonetic modeling to capture the deletion and insertion of sub-phonetic states during speech production process. We show that the experimented HMM topologies have higher log likelihood...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.