The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this paper, we propose to use modified Gammatone filterbank with Teager Energy Operator (TEO) for environmental sound classification (ESC) task. TEO can track energy as a function of both amplitude and frequency of an audio signal. TEO is better for capturing energy variations in the signal that is produced by a real physical system, such as, environmental sounds that contain amplitude and frequency...
To learn auditory filterbanks, recently, we have proposed an unsupervised learning model based on convolutional restricted Boltzmann machine (RBM) with rectified linear units. In this paper, theory, training algorithm of our proposed model, and detailed analysis of learned filterbank are being presented. Learning of the model with different databases shows that the model is able to learn cochlear-like...
There has been a significant research attention for unsupervised representation learning to learn the features for speech processing applications. In this paper, we investigate unsupervised representation learning using Convolutional Restricted Boltzmann Machine (ConvRBM) with rectified units for speech recognition task. Temporal modulation representation is learned using log Mel-spectrogram as an...
In this paper, an attempt is made to examine and evaluate the effect of bottleneck and the hierarchical bottleneck (HBN) framework in MLP-based Automatic Speech Recognition (ASR) systems. In particular, the bottleneck and hierarchical bottleneck framework are analyzed using Volterra series. Experiments on several architectures with incorporation of systematic hierarchical and bottleneck properties...
Convolutional Restricted Boltzmann Machine (ConvRBM) as a model for speech signal is presented in this paper. We have developed ConvRBM with sampling from noisy rectified linear units (NReLUs). ConvRBM is trained in an unsupervised way to model speech signal of arbitrary lengths. Weights of the model can represent an auditory-like filterbank. Our proposed learned filterbank is also nonlinear with...
In this paper, auditory spectrogram is proposed for analysis of HIE and asthma infant cries. Auditory spectrogram represents a 2-dimensional (i.e., 2-D) pattern of neural activity, distributed along a logarithmic frequency-axis. Features are derived from the auditory spectrograms of each class. These features are then used to train support vector machine (SVM) classifier. Effectiveness of the proposed...
The generalized statistical framework of Hidden Markov Model (HMM) has been successfully applied from the field of speech recognition to speech synthesis. In this work, we have applied HMM-based Speech Synthesis System (HTS) method to Gujarati language. Adaption and evaluation of HTS for Gujarati language has been done here. Evaluation of HTS system built using Gujarati data is done in terms of naturalness...
This paper analyzes the distance-based objective measures for evaluation of Text-to-Speech (TTS) systems (which is generally used objective measures). In this paper, we discuss some aspects of evaluation of speech quality of synthesized speech. Some of the limitations and issues of subjective evaluation are discussed and importance of objective measures is presented. Traditional objective measure...
In this paper, use of Viterbi-based algorithm and spectral transition measure (STM)-based algorithm for the task of speech data labeling is being attempted. In the STM framework, we propose use of several spectral features such as recently proposed cochlear filter cepstral coefficients (CFCC), perceptual linear prediction cepstral coefficients (PLPCC) and RelAtive SpecTrAl (RASTA)-based PLPCC in addition...
In this paper, we discuss a consortium effort on building text to speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages. A unified framework is therefore attempted required for building TTSes for Indian languages. As Indian languages are syllable-timed, a syllable-based framework is developed. As quality of speech synthesis is of paramount interest, unit-selection synthesizers...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.