In this paper, we employ learned dictionaries to compute sparse representations of speech utterances, which are used to reduce the footprint of unit-selection-based speech synthesis (USS) systems. A speech database labeled at the phoneme level is used to obtain multiple examples of the same phoneme, and all the examples of each phoneme are then used to learn a single overcomplete dictionary...
Emotions in human speech are short-lived. In an emotive utterance, the emotive gestures produced by the emotive state of the speaker persist only for a short duration. In this study, the regions of an utterance that are highly influenced by the emotive state of the speaker are detected. These regions are labeled as emotionally significant regions. Data from the detected emotionally significant...
Query-by-example spoken term detection (QbE-STD) refers to the task of determining the subsequence of a reference that matches a query, where both the query and the reference are in audio format. Dynamic time warping (DTW) based techniques are explored to match the two sequences of different lengths in an unsupervised manner. In this paper, a completely unsupervised approach based on Segmental...
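As a rough illustration of the DTW matching mentioned in this abstract, the following is a minimal sketch of textbook DTW, with an optional subsequence variant that lets the query start and end anywhere in the reference. It is not the segmental method proposed in the paper, whose details are truncated here. The function name `dtw_distance`, the `subsequence` flag, and the use of Euclidean distance between feature frames are all assumptions made for this example.

```python
import math


def dtw_distance(query, reference, subsequence=False):
    """Accumulated DTW cost between two sequences of feature frames.

    Each frame is a tuple of floats. Euclidean frame distance is an
    assumption of this sketch, not something stated in the abstract.
    With subsequence=True the query may match any contiguous region
    of the reference (free start and end points in the reference).
    """
    n, m = len(query), len(reference)
    inf = math.inf
    # cost[i][j]: best accumulated cost aligning query[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    if subsequence:
        # Free start: the alignment may begin at any reference frame.
        for j in range(m + 1):
            cost[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], reference[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # query frame advances alone
                cost[i][j - 1],      # reference frame advances alone
                cost[i - 1][j - 1],  # both advance together
            )
    if subsequence:
        # Free end: the alignment may stop at any reference frame.
        return min(cost[n][1:])
    return cost[n][m]


# Hypothetical one-dimensional "features" for illustration only.
query = [(1.0,), (2.0,)]
reference = [(5.0,), (1.0,), (2.0,), (7.0,)]
full_cost = dtw_distance(query, reference)
sub_cost = dtw_distance(query, reference, subsequence=True)
```

In the subsequence setting the query is found inside the reference at no cost, whereas the full alignment must also account for the unrelated leading and trailing reference frames.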
In this paper, non-uniform duration modification is exploited along with other prosody features for neutral-to-anger speech conversion. The non-uniform duration modification method modifies the durations of vowel and pause segments by different modification factors. Vowel segments are modified by factors based on their identities, and pause segments by uniform factors. Consonant and transition...
Robust syllabification of continuous speech is a vital aspect of language and speech processing systems. Syllabification of speech can be done by detecting the syllable nuclei. The syllable is the basic production unit of human speech, and syllable nuclei can be attributed to high-energy sonorants or resonant sounds, which are relatively loud and carry a clear pitch. In this work, high spectral energy at...
In this paper, we introduce a speech database consisting of 27 Indian languages for analyzing language-specific information present in speech. In the context of Indian languages, a systematic analysis of various speech features and classification models for automatic language identification has not been performed, because of the lack of a proper speech corpus covering the majority of the Indian languages...
Speech coding is one of the major sources of degradation involved in building speech systems for mobile environments. In this paper, we explore the effect of low-bit-rate speech coding on the accuracy of epoch detection. An epoch is the instant of significant excitation of the vocal-tract system during speech production. Many speech applications depend on the accurate estimation of...