This letter proposes an efficient method for extracting pitch from speech signals for the hidden Markov model (HMM)-based speech synthesis system (HTS). In the proposed method, voicing detection and pitch estimation are performed using the mean signal obtained from continuous wavelet transform coefficients. The proposed pitch extraction method is integrated in the HMM-based speech synthesis system...
This paper proposes a new excitation modeling method for improving the quality of HMM-based speech synthesis. The proposed excitation (source) modeling method models the pitch-synchronous residual frames extracted from the excitation signal. Initially, principal component analysis is performed on the pitch-synchronous residual frames. Based on the analysis, the pitch-synchronous residual frames are...
In this work, we attempt emotion classification with a view to synthesizing story speech. We propose emotion-specific text features (ESF) for classifying sentences from children's stories into five emotion categories: happy, sad, anger, fear and neutral. ESF is a five-dimensional feature vector, where each dimension corresponds to the weight of the sentence according to each emotion class...
The present work investigates the importance of excitation source features for language identification (LID). The linear prediction residual (LPR) represents the excitation source signal. By processing the LPR at the sub-segmental, segmental and supra-segmental levels, we can obtain the language-specific information present within a glottal cycle, within a sequence of a few glottal cycles and at the prosody...
In emotional speech, some words and phrases are observed to be spoken more prominently than in neutral speech. The prominence of these words and phrases is reflected in prosodic features such as their duration, intonation and intensity patterns. Neutral and emotional speech differ fundamentally in their prosodic aspects. Three acoustic...
This paper is concerned with speech-signal-based emotion recognition. The linear prediction (LP) residual mainly contains source-specific emotional information. The LP residual is derived by inverse filtering of the speech signal. To characterize the basic emotions, the LP residual is explored at the sub-segmental, segmental and supra-segmental levels. Gaussian mixture models (GMMs)...
This paper proposes a method for modeling the excitation signal to improve the quality of the HMM-based speech synthesis system (HTS). A single optimal residual frame that closely relates to all frames of a phone is chosen to represent the entire residual signal of the phone. The optimal residual frames of all phones present in the speech corpus are efficiently grouped based on positional and contextual features...
This work is mainly aimed at identifying the emotion contribution of different vowels in the Telugu language. Instead of processing the entire speech signal, we propose to focus only on the vowel parts of the utterance (/a/, /i/, /u/, /e/ and /o/). By analysing the vowels, we can discriminate the emotions. In this work, spectral and prosodic features are used for studying the effect of emotions on different vowels...
In this paper, a framework for synthesizing Telugu emotional speech for storytelling applications is presented. SABLE, an XML-based markup language, is used to synthesize the emotions from a given story text. SABLE markup defines a set of tags to improve the quality of the synthesized speech from the concatenative speech synthesizer. In this work, a subset of prosody tags is used to synthesize the...
In this paper, we introduce a speech database consisting of 27 Indian languages for analyzing the language-specific information present in speech. In the context of Indian languages, a systematic analysis of various speech features and classification models for automatic language identification has not been performed, because of the lack of a proper speech corpus covering the majority of the Indian languages...
In this paper, we propose a subword-based approach for grapheme-to-phoneme (G2P) conversion in a text-to-speech (TTS) synthesis system. The proposed method resolves the problems present in both the manual and rule-based approaches for G2P conversion. The subword method uses a segmentation procedure which splits a word into its main part (root word) and subword part (suffix). By proper segmentation...
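The root/suffix segmentation mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure: the longest-suffix-match rule, the function name `split_root_suffix` and the toy English suffix list are all assumptions chosen only to show the idea of splitting a word into a root and a suffix.

```python
def split_root_suffix(word, suffixes):
    """Split a word into (root, suffix) using longest-suffix matching.

    Assumption: the suffix inventory is supplied by the caller; the
    abstract does not say how suffixes are obtained.
    """
    # Try longer suffixes first so that "ings" wins over "s", for example.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Skip empty suffixes and require a non-empty root.
        if suffix and len(word) > len(suffix) and word.endswith(suffix):
            return word[: -len(suffix)], suffix
    # No known suffix: the whole word is treated as the root.
    return word, ""


# Hypothetical toy suffix list, for illustration only.
TOY_SUFFIXES = {"ing", "ed", "s"}

if __name__ == "__main__":
    for w in ["walking", "cats", "cat"]:
        print(w, split_root_suffix(w, TOY_SUFFIXES))
```

In a G2P setting, the root and the suffix could then be looked up or converted separately, which is presumably what makes the segmentation useful; the details of that step are not given in the abstract.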
This paper discusses the development of a Bengali screen reader using the Festival speech synthesizer. The screen reader is developed so that visually challenged people can use a computer without difficulty. The usability of the system is checked throughout the development and appropriate modifications are made. An unrestricted Bengali text-to-speech (TTS) synthesis system which can produce...
Speech coding is one of the major sources of degradation in building speech systems for mobile environments. In this paper, we explore the effect of low bit rate speech coding on the accuracy of epoch detection. An epoch is the instant of significant excitation of the vocal-tract system during the production of speech. Many speech applications depend on the accurate estimation of...
This paper explores the linear prediction (LP) residual of the speech signal for characterizing the basic emotions. The emotions used in this study are anger, compassion, disgust, fear, happy, neutral, sarcastic and surprise. The LP residual is derived by inverse filtering of the speech signal, and the process is known as LP analysis. The LP residual mainly contains higher-order relations among the samples. For...
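Several of the abstracts above rely on deriving the LP residual by inverse filtering. A minimal sketch of that step follows, assuming the autocorrelation method with the Levinson-Durbin recursion and no windowing or pre-emphasis; the abstracts do not specify these details, and the function names are illustrative only.

```python
def autocorrelation(signal, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sequence of floats."""
    n = len(signal)
    return [
        sum(signal[i] * signal[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lp_coefficients(signal, order):
    """Inverse-filter coefficients a[0..order], a[0] = 1, via Levinson-Durbin.

    The inverse filter is A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    r = autocorrelation(signal, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) input: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this stage
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a


def lp_residual(signal, order):
    """LP residual: the signal passed through the inverse filter A(z)."""
    a = lp_coefficients(signal, order)
    residual = []
    for n in range(len(signal)):
        # e[n] = sum_k a[k] * s[n - k], treating samples before n = 0 as zero.
        residual.append(
            sum(a[k] * signal[n - k] for k in range(order + 1) if n - k >= 0)
        )
    return residual


if __name__ == "__main__":
    # Synthetic first-order example: an impulse through 1 / (1 - 0.9 z^-1).
    s = [0.9 ** n for n in range(200)]
    e = lp_residual(s, order=1)
    print(lp_coefficients(s, 1), e[:3])
```

In practice the analysis would be run frame by frame on windowed speech with a higher order; the whole-signal, first-order example here is only meant to show that inverse filtering removes the predictable part of the signal and leaves the excitation.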