One of the major challenges in human emotion recognition is the extraction of features containing maximum prosodic information. The accuracy of the entire emotion detection system ultimately relies on the efficiency of the selected features. When it comes to identifying emotions from voice, ambiguity in detection can never be completely avoided, for several reasons. Exclusion of redundant information to...
The performance of speech classification tasks can be improved by accurate acoustic modeling, which is responsible for establishing the relationship between the speech signal and the phonetic units produced by the speaker. In this paper, Acoustic Modeling (AM) is performed using the Reservoir Computing (RC) technique, with which the input speech signal frames can be identified and classified...
Nowadays, speech-based biometric systems such as automatic speaker verification (ASV) are highly prone to spoofing attacks by an impostor. With recent developments in various voice conversion (VC) and speech synthesis (SS) algorithms, these spoofing attacks pose a serious potential threat to current state-of-the-art ASV systems. To impede such attacks and enhance the security of the ASV...
Home automation with voice recognition can achieve a high level of performance in real-world environments. However, such performance drops significantly in mismatched noisy conditions. To solve this problem, we propose an improved method of extracting Mel Frequency Cepstral Coefficients (MFCC) that increases accuracy by up to 20% over the traditional method. This paper describes an approach to speech recognition...
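The standard MFCC pipeline that this abstract builds on can be sketched as follows. This is an illustrative baseline implementation, not the paper's improved method; the frame length, hop size, filter count, and coefficient count below are common defaults, assumed here for concreteness.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(mel(0.0), mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * inv_mel(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)  # rising edge
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)  # falling edge
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_coeffs=13):
    """Frame -> window -> power spectrum -> mel filterbank -> log -> DCT-II."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, n_fft, sr)
    coeffs = []
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len] * window
        spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2
        energies = np.log(fb @ spec + 1e-10)
        # DCT-II of the log filterbank energies, keeping the first n_coeffs
        n = np.arange(n_filters)
        dct = np.array([np.sum(energies * np.cos(np.pi * k * (2 * n + 1)
                        / (2 * n_filters))) for k in range(n_coeffs)])
        coeffs.append(dct)
    return np.array(coeffs)

# Example: one second of a 440 Hz tone at 16 kHz
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)  # (n_frames, 13)
```

A noise-robust variant like the one proposed would typically modify a stage of this pipeline (e.g. the filterbank or the spectral estimate) while keeping the overall structure.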
With information processing and retrieval of spoken documents becoming an important topic, there is a need for systems that perform automatic segmentation of audio streams. Among such algorithms, spoken term discovery allows the extraction of word-like units (terms) directly from the continuous speech signal, in an unsupervised manner and without any knowledge of the language at hand. Since the performance...
Classification of speech signals is one of the most vital problems in speech perception and spoken word recognition. Although there have been many studies on the classification of speech signals, the results are still limited. In this paper, we propose an image-based approach to speech signal classification based on the combination of Local Naïve Bayes Nearest Neighbor (LNBNN) and Scale-invariant...
Mispronunciation is commonly observed in children from age 2 to 8 years. Some of the common mispronunciations are stopping, fronting, backing and affrication. These processes are known as phonological processes. Identification of these processes is crucial in studying the vocal tract development pattern and treating the phonological disorders in children. The features that clearly discriminate correctly...
This paper presents experiments on feature selection for emotional speech classification. There are 152 features used in this experiment. The minimum redundancy maximum relevance (mRMR) method is applied for feature selection. The experiments are constructed from two corpora: the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and the Emotional Tagged Corpus on Lakorn (EMOLA), which...
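The greedy mRMR selection loop can be sketched as below. This is an illustrative simplification, not the paper's exact setup: the canonical criterion uses mutual information, whereas this sketch approximates relevance and redundancy with absolute Pearson correlation, and the data is synthetic.

```python
import numpy as np

def mrmr_select(X, y, k):
    """Greedy minimum-redundancy maximum-relevance selection.
    Relevance and redundancy are approximated with absolute Pearson
    correlation (a common simplification of the MI-based criterion)."""
    n_feat = X.shape[1]
    # relevance of each feature: |corr(feature, label)|
    rel = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_feat)])
    selected = [int(np.argmax(rel))]          # start with the most relevant
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # redundancy: mean correlation with already-selected features
            red = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                           for s in selected])
            score = rel[j] - red              # MID criterion: relevance - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X[:, 3] + 0.1 * rng.normal(size=200)          # feature 3 drives the label
X[:, 7] = X[:, 3] + 0.3 * rng.normal(size=200)    # noisier redundant copy
picks = mrmr_select(X, y, 3)
print(picks[0])  # feature 3 is selected first
```

The redundancy term is what distinguishes mRMR from plain relevance ranking: the near-duplicate feature 7 is penalized even though its correlation with the label is high.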
Tree-based context clustering processes reduce the sizes of acoustic models in Hidden Markov Model (HMM) speech synthesis systems and eliminate problems arising from unseen sound units. Representations of speech units in speech synthesis systems are often LPC or MCEP features, whose characteristics promote speech reconstruction rather than discrimination among different sound units. In this...
This manuscript investigates the problem of classifying continuous general-purpose audio data for content-based retrieval. The paper presents a scheme for classifying audio data; segmentation is also performed on the same data so that the processing rate is faster. The audio data can be classified into eight categories: simple speech, noise, silence, music, single speech with music, double speech with...
In this paper, we discuss efficient implementation of machine learning algorithms on DSPs. Specifically, we implement OCR and speech recognition on DSP and show how they can be optimized using fixed point routines. We illustrate the optimal usage of DSP resources like MAC units, shifters and software pipelining through assembly code structuring which massively reduces the MIPS consumed by the processor...
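The fixed-point routines the abstract refers to can be illustrated with a Q15 multiply-accumulate, the operation a DSP MAC unit performs in hardware. This is a conceptual sketch in Python, not the authors' DSP assembly; the Q15/Q30 format choice is an assumption, though it is the usual one on 16-bit DSPs.

```python
def float_to_q15(x):
    """Quantize a float in [-1, 1) to a 16-bit Q15 integer."""
    return max(-32768, min(32767, int(round(x * 32768))))

def q15_mac(acc, a, b):
    """Multiply-accumulate: the Q30 product of two Q15 values is
    added to a wide accumulator, as a DSP MAC unit does in one cycle."""
    return acc + a * b

def q15_dot(xs, ys):
    """Fixed-point dot product of two Q15 vectors, result as float."""
    acc = 0
    for a, b in zip(xs, ys):
        acc = q15_mac(acc, a, b)
    return acc / (1 << 30)  # rescale the Q30 accumulator back to float

xs = [float_to_q15(v) for v in (0.5, -0.25, 0.125)]
ys = [float_to_q15(v) for v in (0.5, 0.5, 0.5)]
print(q15_dot(xs, ys))  # 0.5*0.5 - 0.25*0.5 + 0.125*0.5 = 0.1875
```

Replacing floating-point multiplies with integer MACs like this is the core of the MIPS reduction on DSPs, since the MAC executes in a single cycle and pipelines well.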
Voice is the most prominent and primary mode of communication among human beings. Through speech, humans can communicate with machines; thus this technique is used in the education, military, and medical sectors. Though this is not a new area, for the last few decades researchers have been working on improving the accuracy of voice recognition systems. The design of such a system concerns major issues...
The objective of this paper is to study the effect of speaking mode on a spoken term detection (STD) system. The experiments are conducted with respect to query words recorded in an isolated manner and words cut out from continuous speech. Durations of phonemes in query words vary greatly between these two modes. Hence the pattern matching stage, which takes care of temporal variations, plays a crucial role...
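Temporal variation between isolated and continuous-speech query words is classically handled with dynamic time warping (DTW); the abstract does not name its matcher, so the following is a generic sketch of that standard technique, here on scalar sequences rather than real feature vectors.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two sequences.
    Aligns sequences of different durations by letting each point
    match one or more points of the other sequence."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # stretch a
                                 D[i][j - 1],      # stretch b
                                 D[i - 1][j - 1])  # one-to-one match
    return D[n][m]

# A query and a time-stretched version of it align at zero cost...
print(dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 3, 4]))  # 0.0
# ...while a genuinely different pattern accumulates cost.
print(dtw_distance([1, 2, 3, 4], [4, 3, 2, 1]))
```

In a real STD system the sequences would be per-frame feature vectors (e.g. MFCCs) and `dist` a vector distance, but the alignment recurrence is identical.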
This paper presents a speaker-based Language Independent Isolated Speech Recognition System (LIISRS). The most popular feature extraction technique, Mel Frequency Cepstral Coefficients (MFCC), is used for training the system. Representative specific features are identified using the K-Means algorithm. The distortion measure is calculated using the Euclidean distance function. Pitch contour characteristics are used...
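The K-Means step with a Euclidean distortion measure can be sketched as follows. This is a generic illustration on toy 2-D vectors, not the paper's configuration; the naive first-k initialization is an assumption made to keep the sketch deterministic.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(vectors, k, iters=50):
    """Plain K-Means: find k representative vectors (centroids)
    and report the total Euclidean distortion of the codebook."""
    # naive initialization: the first k vectors (fine for a sketch)
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:  # assign each vector to its nearest centroid
            idx = min(range(k), key=lambda i: euclidean(v, centroids[i]))
            clusters[idx].append(v)
        for i, cl in enumerate(clusters):  # recompute centroids
            if cl:
                centroids[i] = [sum(col) / len(cl) for col in zip(*cl)]
    distortion = sum(min(euclidean(v, c) for c in centroids) for v in vectors)
    return centroids, distortion

# Two well-separated groups of 2-D "feature vectors"
data = [[0.0, 0.1], [0.1, 0.0], [0.1, 0.1],
        [5.0, 5.1], [5.1, 5.0], [5.1, 5.1]]
centroids, distortion = kmeans(data, 2)
print(round(distortion, 3))  # small: each point sits near its centroid
```

In the described system the vectors would be per-frame MFCCs, and the distortion of a test utterance against each speaker's codebook drives the recognition decision.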
This paper motivates the use of a combination of mel frequency cepstral coefficients (MFCC) and their delta derivatives (DMFCC and DDMFCC), calculated using mel-spaced Gaussian filter banks, for text-independent speaker recognition. MFCC, modeled on the human auditory system, shows robustness against noise and session changes and has hence become synonymous with speaker recognition. Our main aim is to test...
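The delta (DMFCC) and delta-delta (DDMFCC) derivatives mentioned here are conventionally computed with a regression over neighboring frames. The sketch below uses the standard formula with a window of N = 2 and a random stand-in for the MFCC matrix; it illustrates the feature construction only, not the paper's Gaussian filterbank variant.

```python
import numpy as np

def deltas(feats, N=2):
    """Delta coefficients via the standard regression formula:
    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with edge frames padded by repetition."""
    denom = 2 * sum(n * n for n in range(1, N + 1))
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    out = np.zeros_like(feats, dtype=float)
    for t in range(feats.shape[0]):
        for n in range(1, N + 1):
            out[t] += n * (padded[t + N + n] - padded[t + N - n])
    return out / denom

mfcc = np.random.default_rng(0).normal(size=(98, 13))  # stand-in MFCC matrix
d = deltas(mfcc)     # DMFCC: frame-to-frame velocity
dd = deltas(d)       # DDMFCC: acceleration
combined = np.hstack([mfcc, d, dd])
print(combined.shape)  # (98, 39)
```

Stacking the statics with both derivative orders yields the familiar 39-dimensional feature vector per frame.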
Speech is the standard means of communication among people. Automatic Speech Recognition (ASR) applications allow users to interact with machines through speech and perform their tasks effortlessly. Speech recognition applications in native languages will enable illiterate and semi-literate people to use computer services with little or no knowledge of operating computers, and to lead better...
This paper formulates a novel approach to spoken document information retrieval for spontaneous speech corpora. The conventional method for this problem is to use an Automatic Speech Recognizer (ASR) integrated with a typical information retrieval method. However, ASRs tend to produce transcripts of spontaneous speech with significant word error rates, which is a drawback of standard...
Though emotional speech recognition has gained increasing interest in the field of Human-Computer Interaction, it is still a challenge to automatically determine the emotion type and the boundaries of each emotionally salient segment in continuous speech, a task named Automatic Emotion Variation Detection (AEVD). In this task, the input utterances are not pre-segmented and may contain emotion...
This paper is aimed at morphing speech uttered by a source speaker so that it seems to be spoken by another target speaker: a new identity is given while the original content is preserved. The proposed method transforms the vocal tract parameters and glottal excitation of the source speaker into the target speaker's acoustic characteristics. It relates to the development of appropriate vocal...
We propose new features for language recognition using Gaussian computations. The new features are derived from traditional features such as Mel frequency cepstral coefficients (MFCC) using the fuzzy c-means clustering algorithm. MFCC feature vectors derived from a large corpus of all languages under consideration are grouped into c clusters using the fuzzy c-means algorithm, and one Gaussian distribution...
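The fuzzy c-means step that groups MFCC vectors into c clusters can be sketched as follows. This is the standard Bezdek algorithm on toy 2-D data, shown for illustration; the fuzzifier m = 2 and the synthetic vectors are assumptions, not values from the paper.

```python
import numpy as np

def fuzzy_cmeans(X, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: soft memberships u[i, k] of vector i in cluster k.
    m > 1 controls fuzziness; returns cluster centers and memberships."""
    rng = np.random.default_rng(seed)
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)  # each row sums to 1
    for _ in range(iters):
        um = u ** m
        # centers: membership-weighted means of the data
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        # distances from every vector to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-10
        # membership update: u_ik proportional to d_ik^(-2/(m-1))
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return centers, u

# Toy stand-in for MFCC vectors: two well-separated groups
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
centers, u = fuzzy_cmeans(X, c=2)
print(np.round(u, 3))  # memberships near 0/1 for well-separated data
```

Unlike hard K-Means, every vector contributes to every cluster with a graded membership, which is what allows a Gaussian to be fitted per cluster from soft assignments in the described feature derivation.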