Serwis Infona wykorzystuje pliki cookies (ciasteczka). Są to wartości tekstowe, zapamiętywane przez przeglądarkę na urządzeniu użytkownika. Nasz serwis ma dostęp do tych wartości oraz wykorzystuje je do zapamiętania danych dotyczących użytkownika, takich jak np. ustawienia (typu widok ekranu, wybór języka interfejsu), zapamiętanie zalogowania. Korzystanie z serwisu Infona oznacza zgodę na zapis informacji i ich wykorzystanie dla celów korzytania z serwisu. Więcej informacji można znaleźć w Polityce prywatności oraz Regulaminie serwisu. Zamknięcie tego okienka potwierdza zapoznanie się z informacją o plikach cookies, akceptację polityki prywatności i regulaminu oraz sposobu wykorzystywania plików cookies w serwisie. Możesz zmienić ustawienia obsługi cookies w swojej przeglądarce.
In this paper, we propose to use modified Gammatone filterbank with Teager Energy Operator (TEO) for environmental sound classification (ESC) task. TEO can track energy as a function of both amplitude and frequency of an audio signal. TEO is better for capturing energy variations in the signal that is produced by a real physical system, such as, environmental sounds that contain amplitude and frequency...
To learn auditory filterbanks, recently, we have proposed an unsupervised learning model based on convolutional restricted Boltzmann machine (RBM) with rectified linear units. In this paper, theory, training algorithm of our proposed model, and detailed analysis of learned filterbank are being presented. Learning of the model with different databases shows that the model is able to learn cochlear-like...
There has been a significant research attention for unsupervised representation learning to learn the features for speech processing applications. In this paper, we investigate unsupervised representation learning using Convolutional Restricted Boltzmann Machine (ConvRBM) with rectified units for speech recognition task. Temporal modulation representation is learned using log Mel-spectrogram as an...
In this paper, an attempt is made to examine and evaluate the effect of bottleneck and the hierarchical bottleneck (HBN) framework in MLP-based Automatic Speech Recognition (ASR) systems. In particular, the bottleneck and hierarchical bottleneck framework are analyzed using Volterra series. Experiments on several architectures with incorporation of systematic hierarchical and bottleneck properties...
Convolutional Restricted Boltzmann Machine (ConvRBM) as a model for speech signal is presented in this paper. We have developed ConvRBM with sampling from noisy rectified linear units (NReLUs). ConvRBM is trained in an unsupervised way to model speech signal of arbitrary lengths. Weights of the model can represent an auditory-like filterbank. Our proposed learned filterbank is also nonlinear with...
In this paper, auditory spectrogram is proposed for analysis of HIE and asthma infant cries. Auditory spectrogram represents a 2-dimensional (i.e., 2-D) pattern of neural activity, distributed along a logarithmic frequency-axis. Features are derived from the auditory spectrograms of each class. These features are then used to train support vector machine (SVM) classifier. Effectiveness of the proposed...
The generalized statistical framework of Hidden Markov Model (HMM) has been successfully applied from the field of speech recognition to speech synthesis. In this work, we have applied HMM-based Speech Synthesis System (HTS) method to Gujarati language. Adaption and evaluation of HTS for Gujarati language has been done here. Evaluation of HTS system built using Gujarati data is done in terms of naturalness...
This paper analyzes the distance-based objective measures for evaluation of Text-to-Speech (TTS) systems (which is generally used objective measures). In this paper, we discuss some aspects of evaluation of speech quality of synthesized speech. Some of the limitations and issues of subjective evaluation are discussed and importance of objective measures is presented. Traditional objective measure...
In this paper, use of Viterbi-based algorithm and spectral transition measure (STM)-based algorithm for the task of speech data labeling is being attempted. In the STM framework, we propose use of several spectral features such as recently proposed cochlear filter cepstral coefficients (CFCC), perceptual linear prediction cepstral coefficients (PLPCC) and RelAtive SpecTrAl (RASTA)-based PLPCC in addition...
In this paper, we discuss a consortium effort on building text to speech (TTS) systems for 13 Indian languages. There are about 1652 Indian languages. A unified framework is therefore attempted required for building TTSes for Indian languages. As Indian languages are syllable-timed, a syllable-based framework is developed. As quality of speech synthesis is of paramount interest, unit-selection synthesizers...
Podaj zakres dat dla filtrowania wyświetlonych wyników. Możesz podać datę początkową, końcową lub obie daty. Daty możesz wpisać ręcznie lub wybrać za pomocą kalendarza.