In this paper, features based on the sparse representation (SR) are proposed for the classification of speech units. The proposed method employs multiple dictionaries to effectively model variations present in the speech signal. Here, a Gaussian mixture model (GMM) is built using spectral features corresponding to frames of all the examples of a speech class. Multiple dictionaries corresponding to...
This paper presents the implementation of a practical voice recognition system using MATLAB (R2014b) to secure a given user's system so that only the user may access it. Voice recognition systems have two phases, training and testing. During the training phase, the characteristic features of the speaker are extracted from the speech signal and stored in a database. In the testing phase, the stored...
Speaker diarization is the task of determining “who spoke when” in a speech recording of an unknown duration containing an unknown number of speakers. The very unsupervised nature of this task makes it more challenging and demands that the feature representation used be highly discriminative across speakers. Commonly used features based on the short time Fourier transform are usually derived from...
Emotions in human speech are short lived. In an emotive utterance, the emotive gestures produced due to the emotive state of the speaker persist only for a short duration. In this study, the regions of an utterance that are highly influenced by the emotive state of the speaker are detected. These regions are labeled as emotionally significant regions. Data from the detected emotionally significant...
Query-by-example spoken term detection (QbE-STD) refers to the task of determining the subsequence of a reference that matches a query, where both the query and the reference are in audio format. Dynamic time warping (DTW) based techniques are explored to match the two sequences of different lengths in an unsupervised manner. In this paper, a completely unsupervised approach based on Segmental...
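To illustrate the DTW idea mentioned in the abstract above, the following is a minimal sketch of classic global DTW between two sequences of different lengths. It is not the segmental or subsequence variant that the paper describes. The function name `dtw_distance`, the use of scalar frames, and the absolute-difference local cost are assumptions made for illustration; real systems compare feature vectors frame by frame.

```python
import math


def dtw_distance(query, reference):
    """Accumulated cost of the best global DTW alignment of two 1-D sequences.

    Hypothetical helper for illustration: frames are plain numbers and the
    local cost is the absolute difference between them.
    """
    n, m = len(query), len(reference)
    # cost[i][j] holds the minimal accumulated cost of aligning
    # query[:i] with reference[:j]; row/column 0 act as boundaries.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(query[i - 1] - reference[j - 1])
            # Allowed steps: advance the query only, the reference only, or both.
            cost[i][j] = local + min(
                cost[i - 1][j],
                cost[i][j - 1],
                cost[i - 1][j - 1],
            )
    return cost[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` evaluates to `0.0`, because the repeated `2` in the longer sequence can be aligned to the single `2` in the shorter one at no cost.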
Speech is an informative signal that conveys many kinds of information, such as the status of the speaker and the speaker's environmental conditions, along with other parameters that are classified as prosodic features and general features of speech. As a signal, speech can be analysed and inspected against various criteria using several available techniques. In this...
In this paper, we introduce a speech database consisting of 27 Indian languages for analyzing the language-specific information present in speech. In the context of Indian languages, a systematic analysis of various speech features and classification models for automatic language identification has not been performed, because of the lack of a proper speech corpus covering the majority of the Indian languages...