The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Automatic dialect classification has emerged as an important area in the speech research field. Effective dialect classification is useful in developing robust speech systems, such as speech recognition and speaker identification. In this paper, two novel algorithms are proposed to improve dialect classification for text-independent spontaneous speech in Arabic and Spanish languages, along with probe...
Computional learning from multimodal data is often done with matrix factorization techniques such as NMF (Non-negative Matrix Factorization), pLSA (Probabilistic Latent Semantic Analysis) or LDA (Latent Dirichlet Allocation). The different modalities of the input are to this end converted into features that are easily placed in a vectorized format. An inherent weakness of such a data representation...
Active Learning (AL) is designed to aid the labor-intensive process of training acoustic model for speech recognition. In AL, only the most informative training samples are selected for manual annotation. Thus, how to evaluate the unlabeled samples is worth researching. In this paper, we propose a unified framework to generate confusion networks of multiple levels including character, syllable and...
Researchers in human language processing and acquisition are making an increasing use of computational models. Computer simulations provide a valuable platform to reproduce hypothesised learning mechanisms that are otherwise very difficult, if not impossible, to verify on human subjects. However, computational models come with problems and risks. It is difficult to (automatically) extract essential...
In this paper, we introduce a fully autonomous vehicle classification system that continuously learns from largeamounts of unlabeled data. For that purpose, we proposea novel on-line co-training method based on visual and acoustic information. Our system does not need complicated microphone arrays or video calibration and automatically adapts to specific traffic scenes. These specialized detectors...
This paper compares three different approaches currently used in recognizing contact calls made from the North Atlantic Right Whale (NRW), Eubalaena glacialis. We present two new approaches consisting of machine learning algorithms based on artificial neural networks (NET) and the classification and regression tree classifiers (CART), and compare their performance with earlier work that employs multi-Stage...
Autonomous signal detection of the North Atlantic right whale (NRW), Eubalaena glacialis, is becoming an important factor in monitoring and conservation for this highly endangered species. Both online and offline systems exist to help study and protect animals within this population. In both cases auto-detection of species-specific calls plays a vital role in localizing individual animal by searching...
This paper investigates the Bayesian Ying-Yang (BYY) learning for speech recognition via Gaussian mixture models (GMMs) based Hidden Markov models (HMMs). A two level procedure is proposed with the hidden Markov level trained still under the maximum likelihood principle by the Baum-Welch algorithm but with the GMMs level trained under the BYY best harmony. We proposed a new batch way EM-like Ying-Yang...
Many of the existing machine-learning approaches to speech summarization cast important sentence selection as a two-class classification problem and have shown empirical success for a wide variety of summarization tasks. However, the imbalanced-data problem sometimes results in a trained speech summarizer with unsatisfactory performance. On the other hand, training the summarizer by improving the...
Preparation of a lexicon for speech recognition systems can be a significant effort in languages where the written form is not exactly phonetic. On the other hand, in languages where the written form is quite phonetic, some common words are often mispronounced. In this paper, we use a combination of lexicon learning techniques to explore whether a lexicon can be learned when only a small lexicon is...
This paper describes recent improvements to the Cambridge Arabic Large Vocabulary Continuous Speech Recognition (LVSCR) Speech-to-Text (STT) system. It is shown that Multi-Layer Perceptron (MLP) features trained on phonetic targets can improve the performance of both phonemic and graphemic systems. Also, a morphological decomposition scheme is extended from the graphemic domain to the phonetic domain,...
We propose a committee-based active learning method for large vocabulary continuous speech recognition. In this approach, multiple recognizers are prepared beforehand, and the recognition results obtained from them are used for selecting utterances. Here, a progressive search method is used for aligning sentences, and voting entropy is used as a measure for selecting utterances. We apply our method...
This paper addresses the training of classification trees for weakly labelled data. We call “weakly labelled data”, a training set such as the prior labelling information provided refers to vector that indicates the probabilities for instances to belong to each class. Classification tree typically deals with hard labelled data, in this paper a new procedure is suggested in order to train a tree from...
This paper addresses weakly supervised object recognition. We show how the combination of an image-level inference, in terms of image-level object class priors, can lead to better training of object recognition models. Stated within a probabilistic setting, the proposed approach is applied to fisheries acoustics and fish school recognition.
Aiming at the requirement of class incremental learning in acoustic fault identification research, a network model using a novel Self-organizing map--negative self-organizing map (NSOM) and probabilistic neural network (PNN) is proposed. The experiment of acoustic fault identification of underwater vehicle shows that the proposed network has better capability of class incremental learning than traditional...
Grapheme-based acoustic modeling for Arabic is a demanding research area since high phonetic transcription accuracy is not yet solved completely. In this paper, we are studying the use of a pure grapheme-based approach using Gaussian mixture model to implicitly model missing diacritics and investigating the effect of Gaussian densities and amount of training data on speech recognition accuracy. Two...
In this study, we combine the Mandarin characteristics with Mandarin acoustic attribute and text information and use hierarchical model based ensemble machine learning to predict Mandarin pitch accent. Our model could make the best of advantages of prosody hierarchical structure and ensemble machine learning. When comparing our model with classification and regression tree (CART), support vector machine...
Statistical training allows the establishment of a probabilistic classification model. In the supervised case, the model is assessed from a labelled dataset, i.e. each observed data has a label. In the weakly-supervised case, the label is not exactly known. In our instance, the probability to associate the observation to the different classes is known. Thus, labels for the data are a probability vector...
In this paper we show how common training criteria like for example MPE or MMI can be extended to incorporate a margin term. In addition, a transducer-based training implementation is presented, which covers a large variety of discriminative training criteria for ASR, including the standard MMI, MPE, and MCE criteria, as well as the modifications to these criteria presented here. The modified criteria...
This study proposes an approach to improving speaker recognition through the process of minute vocal tract length perturbation of training files, coupled with pitch normalization for both train and test data. The notion of perturbation as a method for improving the robustness of training data for supervised classification is taken from the field of optical character recognition, where distorting characters...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.