Assessment of the language environment of children in early childhood is a challenging task for both humans and machines, and understanding the classroom environment of early learners is an essential step towards facilitating language acquisition and development. This paper explores an approach for intelligent language environment monitoring based on the duration of child-to-child and adult-to-child...
This paper proposes a novel approach to voice conversion with non-parallel training data. The idea is to bridge between speakers by means of Phonetic PosteriorGrams (PPGs) obtained from a speaker-independent automatic speech recognition (SI-ASR) system. It is assumed that these PPGs can represent articulation of speech sounds in a speaker-normalized space and correspond to spoken content speaker-independently...
How the Qur'an is read and its sounds are pronounced depends on the type of recitation, such as the recitation of Warsh or the recitation of Hafss. It is very important to recognise the type of recitation, especially given the diversity and spread of Qira'at in the world. This research presents a speech recognition system that distinguishes between the different types of the Qur'an...
Jitter and shimmer, as indicators of voice quality, are often used to detect speech disorders and a variety of narrative styles. In this article we examine the suitability of jitter and shimmer voice quality measurements for the speaker verification task. We combine these voice quality features and further prosodic features with short-term spectral features (namely MFCCs). For the purposes of...
In this work, a new feature, residual sinusoidal peak amplitude (RSPA), is proposed for emotion classification. The RSPA feature is evaluated from the LP residual of the speech signal using a sinusoidal model. The residual signal is a major source of excitation, and emotional information is expected to be well manifested in it. The effectiveness of the proposed feature is explored...
Speech enhancement using the Kalman filter is an extensively researched area. The vast majority of work in this area uses linear predictive coding (LPC) to model the speech signal. A few important studies have revealed the superiority of Mel Frequency Cepstral Coefficients (MFCC) over LPC for speech recognition. With this paper, the shortcomings of speech enhancement using LPC with Kalman filters...
This work highlights the importance of a phonetic match between the training and test sessions for a text-independent framework under limited test data conditions. The robustness of text-independent speaker verification (SV) tends to fall as the amount of speech involved is reduced. From the point of view of a deployable, application-oriented system, the amount of speech involved is expected to be less to...
Powerful automatic speech recognition (ASR) systems are a matter of commercial importance, as many leading companies are racing toward industry- and consumer-level production. Environmental noise is one of the major causes of degraded speech quality: speech gets obscured by loud background sound. This adversely affects the performance of automatic speech recognition systems. We also know that human...
Marathi is spoken by the native people of Maharashtra. Spoken word recognition in Marathi is a widely studied area of research. This paper describes a method for recognition of Marathi numerals from ‘Shunya’ (zero) to ‘Nau’ (nine) using the Bark scale and the Discrete Sine Transform. Features extracted using the Bark scale are transformed and reduced using statistical properties. A unique method for feature vector...
Query-by-Example Spoken Term Detection (QbE-STD) in low-resource settings is a retrieval task in which the query is given as an audio example. The search phase involves computationally intensive Dynamic Time Warping (DTW)-based matching techniques. Reducing the search space is therefore important for reducing the computational complexity....
Current approaches to spoken language recognition (LR) are predominantly based on the GMM mean supervector as the representation of utterances. It is assumed that the language information lies in a linear manifold of low-dimensional spaces. Exploiting this, low-dimensional projections of the GMM mean supervectors, known as i-vectors, are derived using a total variability matrix. The i-vector...
Multimedia data generated and stored online is growing at a huge rate. To access the desired multimedia data in real time, it needs to be analyzed and stored in a categorized manner. This paper presents a data mining-based approach for the categorization of musical data. Experiments are performed using various data mining classifiers and preprocessing methods. This paper also compares the performance...
The presented work explores the role of pitch-adaptive cepstral features in the context of automatic speech recognition (ASR) of children's speech using acoustic models trained on adults' speech. On account of the large acoustic mismatch between training and test data, highly degraded recognition rates are noted in such cases. Earlier studies have shown that the said acoustic mismatch is aided by the insufficient...
In this paper, an attempt is made to examine and evaluate the effect of the bottleneck and hierarchical bottleneck (HBN) frameworks in MLP-based Automatic Speech Recognition (ASR) systems. In particular, the bottleneck and hierarchical bottleneck frameworks are analyzed using Volterra series. Experiments on several architectures with incorporation of systematic hierarchical and bottleneck properties...
In this work, a novel countermeasure is proposed for protecting speaker verification (SV) systems against replay-based spoofing attacks. Replay attacks are attacks in which recorded speech of a particular speaker is played back to the system while claiming to be the authentic speaker. On analyzing live and recorded speech examples, it was noted that the low frequency contents get suppressed...
An artificial neural network is one of the most important models for training features in a voice conversion task. Typically, Neural Networks (NNs) are not effective in processing low-dimensional F0 features, so methods that use neural networks to train Mel Cepstral Coefficients (MCC) do not perform outstandingly. However, F0 can robustly represent various prosody...
Emotion recognition plays a significant role in affective computing and adds value to machine intelligence. While the emotional state of a person can be manifested in different ways such as facial expressions, gestures, movements and postures, recognition of emotion from speech has gathered much interest over others. However, after years of research, recognizing the emotional state of individuals...
Identifying different audio segments in videos is the first step for many important tasks such as event detection and speech transcription. Approaches using Mel-Frequency Cepstral Coefficients (MFCCs) with Gaussian mixture models (GMMs) and hidden Markov models (HMMs) perform reasonably well in stationary conditions but do not scale to a broad range of environmental conditions. This paper focuses...
Sound is a useful and versatile form of communication, and each sound has its own characteristics and frequency levels. Sound serves two basic functions for people around the world: signaling and communication. Sound identification faces several problems, such as pitch, velocity, and the accuracy of processing voice data. The motivation of this research is to recognize and analyze human...
Asthma is a lung disease that affects airflow to and from the lungs. A whistling sound occurs when a person suffering from asthma breathes in and out. Major symptoms of asthma are chest stiffness, shortness of breath, and coughing during the night and morning. In this paper, asthma is analyzed with the help of Mel Frequency Cepstral Coefficients (MFCC). In this system, MFCC for Normal Voice and for...