This paper proposes an approach to accent adaptation that uses accent-dependent bottleneck (BN) features to improve the performance of a multi-accent Mandarin speech recognition system. The adaptation architecture uses two neural networks. First, a deep neural network (DNN) acoustic model acts as a feature extractor that extracts accent-dependent BN (BN-DNN) features. The input...
Automatic Speech Recognition (ASR) systems suffer from many types of noise in different environments. Developing robust ASR systems is an attractive research topic due to high demand in many commercial applications. In this paper, the Mel-Frequency Cepstral Coefficients (MFCC) are modified to be robust to noise, with the spectrogram used as the time-frequency analysis tool. The proposed...
Detecting pronunciation erroneous tendency (PET) can provide second language learners with detailed, instructive feedback in computer-aided pronunciation training (CAPT) systems. Due to data sparseness, DNN-HMM achieved limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigated how to further improve the performance...
In this paper, we apply Locality Sensitive Discriminant Analysis (LSDA) to a speaker verification system for intersession variability compensation. As opposed to LDA, which fails to discover the local geometrical structure of the data manifold, LSDA finds a projection that maximizes the margin between i-vectors from different speakers in each local area. Since the number of samples varies in a wide...
An Automatic Speech Recognition (ASR) system is defined as the transformation of acoustic speech signals into a string of words. This paper presents an ASR system based on isolated word structure using Mel-Frequency Cepstral Coefficients (MFCCs), Dynamic Time Warping (DTW) and K-Nearest Neighbor (KNN) techniques. The Mel-Frequency scale used to capture the significant characteristics of the speech...
Lyrics are an important part of songs. Lyrics recognition is the basis for retrieving songs and recognizing their content, which is of great value. Speech recognition research has made great progress, but recognizing lyrics in songs with accompaniment remains difficult. Related research is generally lacking, especially for Chinese lyrics in songs with accompaniment,...
A real-time speaker identification (SI) system is an application of biometric systems in which voice samples are collected in real time. As a result, noise contamination of speaker samples is the natural scenario. In this work, we tried to increase the accuracy of a real-time SI system. We analysed the SI system using different feature extraction methods with a GMM-ML classifier. We found that MFCC...
This paper presents work on the phonetic analysis of classical Arabic speech. A hidden Markov model classifier is applied to Arabic phonemes. For the purpose of this work, a new classical Arabic speech corpus is created. The corpus is based on selected recordings of recitations of The Holy Quran. A number of acoustic features are analyzed and compared. Those are: linear predictive coding (LPC)...
There have been many empirical studies demonstrating the important link between customer satisfaction and sales performance; as such, many Customer Satisfaction Indexes (CSI) have been developed. Almost all CSIs to date use the survey or questionnaire method, which has its flaws. In order to quantify the CSI, we propose the use of speech analysis based on the affective space model where the valence and...
In this article we applied Support Vector Machines (SVM) to the acoustic model of a speech recognition system based on MFCC and LPC features for an Azerbaijani dataset. This dataset has previously been used for speech recognition with a multilayer artificial neural network, which achieved some results. The main goal of this work is to apply SVM techniques to the Azerbaijani speech recognition system. The variety of results of SVM...
Speech analysis forms the first layer in the process of automatic speech recognition. All speech recognition systems primarily perform pattern recognition, and therefore they perform well when input features with certain properties are provided. The Mel-scale cepstral coefficients and LP coefficients transformed into cepstral coefficients are the best techniques for performing the automatic speech recognition...
Speaker identification is a biometric technique for determining an unknown speaker's identity among a number of speakers using distinguishing latent information in uttered speech. Crime investigation, security control, telephone banking and trading, and information reservation are some applications of this technique. Frequency Domain Linear Prediction (FDLP) is a time-frequency-based feature that has been...
The major problem of most speech recognition systems is their unsatisfactory effectiveness (impact on recognition rate), efficiency (feature vector dimension), shift variance, and robustness in noisy conditions. Feature extraction plays a very important role in the speech recognition process, because better features improve the recognition rate. This paper presents a speech feature extraction...
Recent research has shown that using senone posteriors for i-vector extraction can achieve outstanding performance. In this paper, we extend this idea to robust speaker verification by constructing a deep neural network (DNN) comprising a deep belief network (DBN) stacked on top of a denoising autoencoder (DAE). The proposed method addresses noise robustness from two perspectives: (1) denoising the...
Detailed analysis of the tonal features of the Tibetan Lhasa dialect is an important task for Tibetan automatic speech recognition (ASR) applications. However, it is difficult to utilize tonal information because it remains controversial how many tonal patterns the Lhasa dialect has. Therefore, few studies have focused on modeling the tonal information of the Lhasa dialect for speech recognition purpose...
Speech is not only a way for infants under one year of age to communicate with the outside world, but also an important source of information reflecting their emotions and needs, as well as their health status and mental level. In order to explore intelligent machine technology for understanding infants' emotions and needs from speech signals, and thereby help parents in child rearing, this paper studied...
This study was performed to evaluate the feasibility of short-time energy as an input feature vector to be used as the recognition key in a voice biometric system for recognizing Cerebral Palsy (CP). To retrieve the characteristics of the voice, Mel-Frequency Cepstral Coefficients (MFCC) were used as the feature extraction algorithm, while Neuro Fuzzy was used as the classifier algorithm...
In this paper, we introduce a new approach to recognizing discrete speech, specifically predefined words. Our approach is mainly based on Principal Components Analysis (PCA) and Neural Networks (NN). We first build a database from 20 speakers, each of whom uttered each of 10 predefined Persian words 5 times. Then we apply Voice Activity Detection (VAD)...
In this work, we explore prediction of different physical parameters from speech data. We aim to predict shoulder size and waist size of people from speech data in addition to the conventional height and weight parameters. A data-set with this information is created from 207 volunteers. A bag of words representation based on log magnitude spectrum is used as features. A support vector regression predicts...
In this globalized world, call centers and BPOs are increasing at an exponential rate. There is stiff competition among various companies, and every company wants to have its clients happy and satisfied with the resolution of their problems. For this purpose, Agent Quality Monitoring is an important requirement. Since in a typical call centre, thousands of calls are made by agents in a single day, it...