Detecting pronunciation erroneous tendencies (PETs) can provide second language learners with detailed, instructive feedback in computer-aided pronunciation training (CAPT) systems. Due to data sparseness, DNN-HMM achieved only limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigated how to further improve the performance...
Lyrics are an important part of songs. Lyrics recognition is the basis of retrieving songs and recognizing their content, and is therefore of great value. At present, research on speech recognition has made great progress, but recognizing lyrics in songs with accompaniment remains difficult. Related research is generally lacking, especially for Chinese lyrics in songs with accompaniment,...
This paper presents work on the phonetic analysis of classical Arabic speech. A hidden Markov model classifier is applied to Arabic phonemes. For the purpose of this work, a new classical Arabic speech corpus was created, based on selected recordings of recitations of The Holy Quran. A number of acoustic features are analyzed and compared. These are: linear predictive coding (LPC)...
In this article we applied support vector machines to the acoustic model of a speech recognition system based on MFCC and LPC features for an Azerbaijani dataset. This dataset has previously been used for speech recognition with a multilayer artificial neural network, which achieved some results. The main goal of this work is to apply SVM techniques to the Azerbaijani speech recognition system. The variety of results of SVM...
Detailed analysis of tonal features for Tibetan Lhasa dialect is an important task for Tibetan automatic speech recognition (ASR) applications. However, it is difficult to utilize tonal information because it remains controversial how many tonal patterns the Lhasa dialect has. Therefore, few studies have focused on modeling the tonal information of the Lhasa dialect for speech recognition purpose...
Speech is not only a way for infants under one year of age to communicate with the outside world, but also an important source of information reflecting their emotions and needs, as well as their health status and mental level. In order to explore intelligent machine technology for understanding infants' emotions and needs from speech signals, and thereby help parents in child rearing, this paper studied...
Present Mel-frequency cepstral coefficient (MFCC) based Bangla automatic speech recognition (ASR) systems are mostly implemented with delta and acceleration coefficients. With the delta and acceleration coefficients of the MFCCs and the log energy, a 39-dimensional feature vector is obtained every 10 ms. In this paper, our objective is to observe the effect of third differential coefficients on the performance...
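How 13 static coefficients become the 39-dimensional vector can be sketched with the standard regression formula for delta features. A minimal numpy illustration; the window half-width N=2 and the random "static" matrix are assumptions, not values from the paper:

```python
import numpy as np

def deltas(feats, N=2):
    """Regression-based delta coefficients over a window of +/-N frames.

    feats: (num_frames, num_coeffs) array of static features.
    """
    padded = np.pad(feats, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(feats, dtype=float)
    for t in range(feats.shape[0]):
        out[t] = sum(
            n * (padded[t + N + n] - padded[t + N - n]) for n in range(1, N + 1)
        ) / denom
    return out

# 100 frames of 13 static coefficients (e.g. 12 MFCCs + log energy)
static = np.random.randn(100, 13)
delta = deltas(static)               # first differential
accel = deltas(delta)                # second differential (acceleration)
features = np.hstack([static, delta, accel])
print(features.shape)                # (100, 39): one 39-dim vector per frame
```

Third differentials, as studied in the paper, would simply apply the same regression once more to the acceleration coefficients.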
This paper presents a comparative study and evaluation of the performance of four speech feature vectors, i.e., MFCC, IMFCC, LFCC, and PNCC, in a speaker verification system based on speaker modeling with the Gaussian mixture model (GMM) under clean and noisy speech conditions. The TIMIT and NOISEX-92 datasets were used for the speech signals and noise, respectively. The evaluation...
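GMM-based speaker modeling scores a test utterance by its average per-frame log-likelihood under a speaker's mixture model. A minimal diagonal-covariance scoring sketch; the two-component toy model and data are illustrative assumptions, not parameters from the study:

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood of frames X under a diagonal-covariance GMM.

    X: (T, D) frames; weights: (M,); means, variances: (M, D).
    """
    T, D = X.shape
    diff = X[:, None, :] - means[None, :, :]              # (T, M, D)
    # log N(x | mu_m, diag(var_m)) for every frame/component pair -> (T, M)
    log_norm = -0.5 * (
        D * np.log(2 * np.pi)
        + np.sum(np.log(variances), axis=1)[None, :]
        + np.sum(diff**2 / variances[None, :, :], axis=2)
    )
    # log sum_m w_m N(...) via the log-sum-exp trick for numerical stability
    weighted = log_norm + np.log(weights)[None, :]
    mx = weighted.max(axis=1, keepdims=True)
    frame_ll = mx[:, 0] + np.log(np.exp(weighted - mx).sum(axis=1))
    return frame_ll.mean()

rng = np.random.default_rng(0)
weights = np.array([0.6, 0.4])
means = np.array([[0.0, 0.0], [3.0, 3.0]])
variances = np.ones((2, 2))
X_match = rng.normal(0.0, 1.0, size=(200, 2))      # frames near component 1
X_mismatch = rng.normal(8.0, 1.0, size=(200, 2))   # frames far from both components
print(gmm_loglik(X_match, weights, means, variances) >
      gmm_loglik(X_mismatch, weights, means, variances))  # True
```

In verification, this score is typically compared against the same utterance's score under a universal background model to accept or reject the claimed identity.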
A Phonetic Engine (PE) is a system used to determine the sequence of phones in a spoken utterance. The International Phonetic Alphabet (IPA) is used to transcribe the speech database. This work focuses on developing a multilingual PE for four Indian languages, namely Bengali, Hindi, Urdu and Telugu; the approach can be extended to further languages. For developing the PE, read speech...
This paper examines the performance of an independent speaker identification system (SIS) based on a template model using a vector quantization (VQ) method. The template model relies on a comparison process in which the speaker model with the smallest distortion score is identified. In order to analyze the system's decisions and their confidence, a thresholding...
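The VQ comparison process can be sketched directly: each speaker is represented by a codebook, and the identified speaker is the one whose codebook gives the smallest average distortion on the test frames. The hand-picked codebooks and data below are illustrative assumptions (a real system would train codebooks with LBG/k-means on MFCC frames):

```python
import numpy as np

def avg_distortion(frames, codebook):
    """Mean squared distance from each frame to its nearest codeword."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)  # (T, K)
    return d2.min(axis=1).mean()

def identify(frames, codebooks):
    """Return the speaker whose codebook yields the smallest distortion score."""
    scores = {spk: avg_distortion(frames, cb) for spk, cb in codebooks.items()}
    return min(scores, key=scores.get), scores

rng = np.random.default_rng(1)
codebooks = {
    "spk_a": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "spk_b": np.array([[5.0, 5.0], [6.0, 6.0]]),
}
test_frames = rng.normal(0.5, 0.3, size=(50, 2))  # frames near spk_a's codewords
best, scores = identify(test_frames, codebooks)
print(best)  # spk_a
```

A thresholding rule of the kind the paper analyzes would then reject the decision when even the smallest distortion score exceeds some confidence threshold.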
Speech recognition is a broad subject, as speech is a natural way of communication. Acoustic and language models for such systems are available, but mostly for the English language [15]. In India, many people cannot understand or speak English, so an English-language speech recognition system is of no use to them. Here we present an isolated Hindi word recognition system which...
It is well known that the variability in speech caused by the accents or dialects of speakers degrades the performance of speech recognition systems. One method to prevent this degradation is to correctly identify the accent or dialect of a speaker so that the putative system can be designed to use this information. In this paper, we apply the extreme learning machine, an efficient neural network...
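The extreme learning machine mentioned above trains only the output layer: hidden-layer weights are random and fixed, and the output weights come from a single least-squares solve. A minimal sketch on a toy two-class problem (standing in for, say, two accents); all data and layer sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 2-D features for two well-separated classes (illustrative data)
X = np.vstack([rng.normal(-1, 0.4, (100, 2)), rng.normal(1, 0.4, (100, 2))])
y = np.hstack([np.zeros(100), np.ones(100)])
T = np.eye(2)[y.astype(int)]          # one-hot targets, shape (200, 2)

# ELM step 1: random, fixed hidden layer (never trained)
n_hidden = 50
W = rng.normal(size=(2, n_hidden))    # input -> hidden weights
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                # hidden activations, shape (200, 50)

# ELM step 2: output weights from one least-squares solve (pseudoinverse)
beta, *_ = np.linalg.lstsq(H, T, rcond=None)

pred = (H @ beta).argmax(axis=1)
print((pred == y).mean())             # training accuracy, close to 1.0
```

The absence of iterative backpropagation is what makes the ELM efficient to train, which is the property the paper exploits for accent/dialect identification.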
Speech recognition is one of the hot spots in the field of audio technology. For recognizing lyrics with accompaniment, there are two commonly used methods: one applies automatic speech recognition technology to singing; the other uses sound classification, extracting audio features and then using a pattern-matching classifier for classification...
The act of reading the Qur'an and pronouncing its sounds depends on the type of recitation, referring here to the recitation of Warsh or the recitation of Hafss. It is very important to recognise the type of recitation, especially given the diversity and spread of Qira'at in the world. This research presents a speech recognition system that distinguishes between the different types of the Qur'an...
In this work, a new feature, residual sinusoidal peak amplitude (RSPA), is proposed for emotion classification. The RSPA feature is computed from the LP residual of the speech signal using a sinusoidal model. The residual signal is a major source of the excitation, and emotional information is expected to be well manifested in it. The effectiveness of the proposed feature is explored...
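The LP residual underlying RSPA is the prediction error left after inverse-filtering the signal with its linear-prediction coefficients. A minimal sketch using the autocorrelation method; the order, test signal, and noise level are illustrative assumptions, and a real system would work frame-by-frame on windowed speech:

```python
import numpy as np

def lp_residual(x, order=10):
    """LP residual: error after subtracting the linear prediction of x from itself."""
    # Autocorrelation method: solve the normal equations R a = r for predictor a
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    # Prediction x_hat[n] = sum_k a_k * x[n-k]; residual e[n] = x[n] - x_hat[n]
    pred = np.convolve(x, np.concatenate(([0.0], a)))[: len(x)]
    return x - pred

rng = np.random.default_rng(3)
n = np.arange(400)
x = np.sin(2 * np.pi * 0.05 * n) + 0.01 * rng.normal(size=n.size)  # predictable signal
e = lp_residual(x)
# After the initial transient, residual energy is far below signal energy
print(np.mean(e[20:] ** 2) < 0.1 * np.mean(x[20:] ** 2))  # True
```

The RSPA feature itself then fits a sinusoidal model to this residual and takes peak amplitudes; that step is specific to the paper and is not reproduced here.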
In this paper, an attempt is made to examine and evaluate the effect of the bottleneck and hierarchical bottleneck (HBN) frameworks in MLP-based automatic speech recognition (ASR) systems. In particular, the bottleneck and hierarchical bottleneck frameworks are analyzed using Volterra series. Experiments on several architectures incorporating systematic hierarchical and bottleneck properties...
Sound is a useful and versatile form of communication, where each sound has its own characteristics and frequency levels. Sound serves two basic functions for people around the world: signaling and communication. Several problems arise in sound identification, such as pitch, velocity, and the accuracy of voice-data processing. The motivation of this research is to recognize and analyze human...
The present era is full of speech recognition based services and products. Machine learning paradigms are at the centre stage of speech recognition methodology. Automatic speech recognition (ASR) technology has evolved rapidly in recent years, with emerging applications in mobile computing, natural user interfaces, and man-machine assistive technology. In this paper, for the first time, we present...
An issue that has so far received little attention in emotional speaker recognition systems is the context of the speech databases used to develop and evaluate them. We therefore propose and assess an emotional speaker recognition system based on different feature extraction methods, focusing on the differences between simulated and natural emotional speech databases (BERLIN and IEMOCAP)...
This paper presents a comparison of three feature extraction techniques in an ASR system. Compared with the primarily used MFCC (Mel Frequency Cepstral Coefficients) technique, PNCC (Power Normalized Cepstral Coefficients) achieves an impressive improvement in noisy speech recognition due to its suppression in the high-frequency spectrum of the human voice. The techniques differ in that MFCC uses traditional...