This paper proposes a hybrid speaker diarization system. Its main body is a variational Bayes hidden Markov model (VB-HMM) speaker diarization system, which avoids making premature hard decisions and exploits soft speaker information iteratively. As a result, it outperforms most mainstream speaker diarization systems. Unfortunately, this system is sensitive...
End-to-end speech recognition systems have been successfully implemented and have become competitive replacements for hybrid systems. A common loss function for training end-to-end systems is connectionist temporal classification (CTC), which maximizes the log likelihood of the associated transcription sequence given the feature sequence. However, CTC training has some weaknesses. The...
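The CTC objective described above can be made concrete with the standard CTC forward (alpha) recursion, sketched below in plain Python. This is a minimal sketch of ordinary CTC, not of any modified training the paper goes on to propose. The function names, the blank index 0, and the input format (a list of per-frame output probabilities) are assumptions for illustration. Training would minimize the returned value, which is the same as maximizing the log likelihood.

```python
import math

NEG_INF = float("-inf")


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def ctc_neg_log_likelihood(probs, labels, blank=0):
    """Return -log p(labels | inputs) using the CTC forward (alpha) recursion.

    probs:  T x V list of per-frame output probabilities (rows sum to 1).
    labels: target label indices, without blanks.
    """
    # Extended target: blank, l1, blank, l2, ..., lN, blank
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)
    log_p = [[math.log(p) if p > 0 else NEG_INF for p in row] for row in probs]

    # Paths may start with the leading blank or with the first label
    alpha = [NEG_INF] * S
    alpha[0] = log_p[0][ext[0]]
    if S > 1:
        alpha[1] = log_p[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [NEG_INF] * S
        for s in range(S):
            prev = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                prev.append(alpha[s - 1])  # advance by one position
            # Skipping the blank in between is allowed only if the labels differ
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                prev.append(alpha[s - 2])
            new_alpha[s] = log_sum_exp(prev) + log_p[t][ext[s]]
        alpha = new_alpha

    # Valid paths end on the last label or on the trailing blank
    final = [alpha[S - 1]] if S == 1 else [alpha[S - 1], alpha[S - 2]]
    return -log_sum_exp(final)
```

For example, with two frames of uniform outputs over {blank, 1}, the target `[1]` is produced by three alignments (`1 1`, `blank 1`, `1 blank`), so its likelihood is 0.75. A repeated target such as `[1, 1]` cannot fit into two frames, because CTC needs a blank between repeats, so its negative log likelihood is infinite.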
Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term memory RNNs (LSTM RNNs) have been proposed to solve this problem and have achieved excellent results. However, because of their large size, LSTM RNNs are more prone to overfitting,...
Exploiting sparseness in deep neural networks is an important method for reducing the computational cost. In this paper, we study neuron sparseness in deep neural networks for acoustic modeling. For the feed-forward stage, we only activate neurons whose input values are larger than a given threshold, and set the outputs of inactive nodes to zero. Thus, only a few nonzero outputs are fed to the next...
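The thresholded feed-forward stage described above can be sketched as follows. Only neurons whose input exceeds the threshold produce output, and the next layer multiplies weights only against those nonzero outputs. This is a minimal sketch under stated assumptions. The function name and the layer format are hypothetical, and the activation applied to active neurons is assumed to be the identity by default because the abstract does not name one.

```python
def sparse_forward(x, layers, threshold, activation=lambda z: z):
    """Feed-forward pass that keeps only neurons whose input exceeds `threshold`.

    x:      dense input vector.
    layers: list of (weights, biases); weights[j][i] connects input i to neuron j.
    Returns {neuron_index: output} for the active neurons of the last layer;
    every neuron missing from the dict has output 0.
    """
    # Only nonzero inputs are carried forward
    active = {i: v for i, v in enumerate(x) if v != 0.0}
    for weights, biases in layers:
        next_active = {}
        for j, (w_row, b) in enumerate(zip(weights, biases)):
            # Multiply-adds run over active inputs only; zero inputs are skipped
            z = b + sum(w_row[i] * v for i, v in active.items())
            if z > threshold:
                next_active[j] = activation(z)
            # Otherwise neuron j is inactive and its output is treated as 0
        active = next_active
    return active
```

Storing the active outputs as a dict keeps the per-layer cost proportional to the number of active inputs, which is where the computational saving comes from. A real implementation would use sparse matrix kernels instead.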
A linear dynamic programming model was developed to study how to optimize the routes for achieving China's non-fossil fuel development goals. To deal with the flip-flop phenomenon that arises when linear programming is applied to describe and solve the above managerial and engineering optimization problem, a model for optimizing the routes of non-fossil fuel development (MORN model) was applied in multi-constraints...
The Albayzin 2012 language recognition evaluation (LRE) is one of the most challenging language recognition evaluations, mainly because: (1) the target languages are more easily confused with other languages, which can degrade system performance; (2) the development and test data are heterogeneous with respect to duration, number of speakers, ambient noise/music, channel conditions, etc.; (3) signals...
In this paper, we propose a method to improve the detection of mispronunciation types in the speech of non-native learners. To cope with the low-resource condition of non-native speech and the differences between native and non-native speech, the following efforts are made: 1) training the acoustic model with the low-resource non-native data; 2) introducing the articulatory-based tandem feature; 3) pooling auxiliary native...
The Context-Dependent Deep-Neural-Network HMM, or CD-DNN-HMM, is a powerful acoustic modeling technique. Its training process typically involves unsupervised pre-training and supervised fine-tuning. In this paper, we demonstrate that the performance of DNNs can be improved by using a large amount of unlabeled data in the training procedure. In our method, a CD-DNN-HMM trained using 309 hours of unlabeled...
This paper presents a method to improve mispronunciation detection performance for a low-resource acoustic model. One hour of speech data is randomly selected from CU-CHLOE to simulate a low-resource non-native English scenario. The tandem feature derived from an articulatory-based Multi-Layer Perceptron (MLP) is used in place of a traditional spectral feature (e.g. PLP). Further, motivated by similar...
Several studies have shown that network features (e.g., packet interval and packet size) can be well modeled by a hidden Markov model (HMM) with appropriate hidden variables that capture the current state of the network. In this paper, we propose an HMM-based prediction mechanism to assist Power Saving (PS) in WiMAX. In comparison with prior models whose analyses are often...
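HMM-based prediction of network behaviour comes down to filtering the hidden network state from past observations and then pushing that belief one step ahead. The sketch below computes the predictive distribution of the next observation for a discrete HMM. It is a generic illustration, not the paper's mechanism. The function name, the discrete observation symbols (for example 0 = "no packet", 1 = "packet" in a frame), and the parameter layout are assumptions.

```python
def _normalize(v):
    total = sum(v)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [x / total for x in v]


def predict_next_observation(obs, pi, A, B):
    """Return [P(o_{t+1} = k | o_1..o_t) for each symbol k] for a discrete HMM.

    pi[i]:   initial state probability
    A[i][j]: transition probability from state i to state j
    B[j][k]: probability that state j emits symbol k
    """
    n = len(pi)
    prior = list(pi)  # P(s_t | o_1..o_{t-1}); starts as P(s_1)
    for o in obs:
        # Filtering: condition the state belief on the new observation
        posterior = _normalize([prior[j] * B[j][o] for j in range(n)])
        # Prediction: push the belief one step through the transitions
        prior = [sum(posterior[i] * A[i][j] for i in range(n)) for j in range(n)]
    # Predictive distribution of the next observation symbol
    return [sum(prior[j] * B[j][k] for j in range(n)) for k in range(len(B[0]))]
```

A power-saving policy could, for instance, keep a mobile station asleep while the predicted probability of the "packet" symbol stays below some threshold. That policy and its threshold are hypothetical and are not taken from the paper.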
Handover interruption is a critical issue that has long been studied in wireless networks, with seamless and lossless handover as the goal. This paper proposes a model for gap-utilized handover that uses HMM-based traffic-pattern learning, and presents the analytic details of its QoS performance. Handover that exploits traffic gaps, i.e., periods in which no packets are transferred, can reduce packet loss/delay...
Automatic multilingual speech recognition has always been a difficult task. This paper presents recent work on the development of a Mandarin-English bilingual speech recognition system. A single unified set of bilingual acoustic models based on a novel State-Time-Alignment (STA) method is proposed to balance the performance and the complexity of the bilingual speech recognition system, and a comparison...
In wireless sensor networks, it has been proved that reliable transmission protocols that send redundant packets hop-by-hop to the upstream neighbour are more energy-efficient than those using end-to-end error recovery and control schemes. This gives applications an opportunity to find a trade-off point between transmission probability and energy consumption. The problem...
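The trade-off between delivery probability and energy can be illustrated with a deliberately simple model. Each hop sends `copies` independent copies, a hop succeeds if at least one copy arrives, and every transmitted copy costs one unit of energy. These independence and unit-cost assumptions, together with the function names, are ours for illustration and are not the paper's model. The sketch finds the cheapest redundancy level that meets a target end-to-end delivery probability.

```python
def end_to_end_success(p_link, copies, hops):
    """Delivery probability when each hop sends `copies` independent copies
    and a hop succeeds if at least one copy gets through."""
    per_hop = 1.0 - (1.0 - p_link) ** copies
    return per_hop ** hops


def cheapest_redundancy(p_link, hops, target, max_copies=20):
    """Smallest per-hop copy count meeting `target`, with its transmission cost
    (copies * hops, assuming unit energy per transmitted copy)."""
    for k in range(1, max_copies + 1):
        if end_to_end_success(p_link, k, hops) >= target:
            return k, k * hops
    return None  # target unreachable within max_copies
```

For a single hop with link success probability 0.5, one copy gives 0.5 and two copies give 0.75. A target of 0.75 therefore costs two transmissions.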
We propose a wavelet-based Markov chain (WBMC) model for natural images, which clearly exposes the statistical divergence between cover images and stego images. Based on the Markov chain empirical matrix, we discuss the differences that the stego process generates in the low-frequency and high-frequency domains, and then define two models, the WBMC_L model and the WBMC_H model, respectively, to construct our WBMC...
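A Markov chain empirical matrix is typically built by taking differences between neighbouring coefficients, clipping them to a small range, and counting transitions between consecutive differences. The sketch below does this for horizontal neighbours in a 2-D array of integer coefficients. It is a generic illustration, not the paper's WBMC_L/WBMC_H construction. The wavelet decomposition is omitted, so the input is assumed to be already-rounded subband coefficients, and the function name and the `clip` parameter are hypothetical.

```python
def empirical_transition_matrix(coeffs, clip=3):
    """Empirical Markov transition matrix of horizontal coefficient differences.

    coeffs: 2-D list of integer (rounded) coefficients, e.g. one wavelet subband.
    Differences d = c[i] - c[i + 1] are clipped to [-clip, clip]; entry
    M[a + clip][b + clip] estimates P(next difference = b | difference = a).
    """
    size = 2 * clip + 1
    counts = [[0] * size for _ in range(size)]
    for row in coeffs:
        d = [max(-clip, min(clip, row[i] - row[i + 1])) for i in range(len(row) - 1)]
        # Count transitions between consecutive differences in this row
        for a, b in zip(d, d[1:]):
            counts[a + clip][b + clip] += 1
    matrix = []
    for r in counts:
        total = sum(r)
        # Rows with no observed transitions are left as all zeros
        matrix.append([c / total if total else 0.0 for c in r])
    return matrix
```

A steganalysis feature vector could then be formed from the entries of such matrices. Any change that embedding introduces in these transition statistics is what a classifier would try to detect.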
This paper introduces a novel isolated-word speech recognition system-on-chip (SoC). An Application Specific Integrated Circuit (ASIC) with a unique vector accelerator is designed in the SoC to implement a continuous-density hidden Markov model (CHMM) recognition algorithm based on Mel-Frequency Cepstral Coefficient (MFCC) features. Owing to hardware/software co-design, the cost of the ASIC is...
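Isolated-word recognition with continuous-density HMMs scores the feature frames against each word's model and picks the best-scoring word. The sketch below uses left-to-right HMMs with diagonal Gaussian emissions and Viterbi scoring. It illustrates the general algorithm only. MFCC extraction and the hardware accelerator are not modelled, the frames are assumed to be precomputed feature vectors, each state has a single Gaussian, and all names and the model dict layout are hypothetical. The SoC may use forward scoring or Gaussian mixtures instead.

```python
import math

NEG_INF = float("-inf")


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def left_to_right(n_states, stay=0.5):
    """Log transition matrix: each state loops or moves to the next one."""
    log_A = [[NEG_INF] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        log_A[i][i] = math.log(stay)
        log_A[i][i + 1] = math.log(1.0 - stay)
    log_A[-1][-1] = 0.0  # final state absorbs
    return log_A


def viterbi_score(frames, model):
    """Best-path log-likelihood; paths start in state 0 and end in the last state."""
    means, variances, log_A = model["means"], model["vars"], model["log_A"]
    n = len(means)
    delta = [NEG_INF] * n
    delta[0] = log_gauss_diag(frames[0], means[0], variances[0])
    for x in frames[1:]:
        delta = [max(delta[i] + log_A[i][j] for i in range(n))
                 + log_gauss_diag(x, means[j], variances[j])
                 for j in range(n)]
    return delta[-1]


def recognize(frames, models):
    """Return the word whose HMM gives the highest Viterbi score."""
    return max(models, key=lambda word: viterbi_score(frames, models[word]))
```

The inner loop of `viterbi_score`, which evaluates Gaussian log densities over feature dimensions, is the vector-heavy part that a hardware accelerator would typically target.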
To address the randomness and fluctuation of experimental errors and of neural network prediction errors, a neural network model corrected by a fuzzy Markov chain was introduced. When the neural network was used for prediction, the errors between the actual values and the network outputs were randomly distributed, which can be simulated by a Markov chain. According to the forecasting...
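The idea of simulating prediction errors with a Markov chain can be sketched as follows. Past relative errors are binned into states, state-to-state transitions are counted, and the expected next error from the current state is used to correct the network's forecast. This simplified sketch uses crisp intervals. The paper's fuzzy membership step is not reproduced, and the function names, the relative-error definition, and the use of interval midpoints are assumptions.

```python
def error_state(e, bounds):
    """Index of the interval [bounds[i], bounds[i+1]) containing e (clamped)."""
    for i in range(len(bounds) - 2):
        if e < bounds[i + 1]:
            return i
    return len(bounds) - 2


def markov_corrected_forecast(errors, bounds, nn_forecast):
    """Correct a neural-network forecast with a Markov chain over past errors.

    errors: past relative errors e_t = (actual - forecast) / forecast
    bounds: sorted state boundaries; n boundaries give n - 1 error states
    """
    if not errors:
        return nn_forecast
    n = len(bounds) - 1
    states = [error_state(e, bounds) for e in errors]
    # Count observed state-to-state transitions
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    row = counts[states[-1]]
    total = sum(row)
    if total == 0:
        return nn_forecast  # no transition ever seen from the current state
    # Expected next relative error, using each state's interval midpoint
    mids = [(bounds[i] + bounds[i + 1]) / 2 for i in range(n)]
    expected = sum(c / total * m for c, m in zip(row, mids))
    return nn_forecast * (1.0 + expected)
```

Because the relative error is defined as (actual - forecast) / forecast, the corrected value is the forecast scaled by one plus the expected error.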
In previous speech emotion recognition systems, supervised learning is frequently used to train classifiers on large numbers of labeled examples. However, labeling large amounts of data requires considerable time and human effort. This paper presents an enhanced co-training algorithm that uses a large amount of unlabeled speech utterances to build a semi-supervised learning system. It uses two conditionally...
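For reference, the basic co-training loop that such an algorithm builds on can be sketched as follows. Each example has two feature views, one classifier is trained per view, and each classifier labels its most confident unlabeled examples for the shared labeled set. This is plain co-training, not the paper's enhanced variant. The nearest-centroid classifiers, the margin-based confidence, the two-view input format, and all names (including the emotion labels in the usage note) are hypothetical choices made to keep the sketch self-contained.

```python
def centroids(X, y):
    """Mean feature vector of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def classify(cents, x):
    """Nearest-centroid label and a confidence margin (gap to the runner-up)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, m)), c)
                   for c, m in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def co_train(lab1, lab2, labels, unl1, unl2, rounds=3, per_round=1):
    """Each view's classifier labels its most confident unlabeled examples,
    which are then added (in both views) to the shared labeled set."""
    lab1, lab2, labels = list(lab1), list(lab2), list(labels)
    pool = list(range(len(unl1)))
    for _ in range(rounds):
        for own_lab, own_unl in ((lab1, unl1), (lab2, unl2)):
            if not pool:
                return lab1, lab2, labels
            cents = centroids(own_lab, labels)
            # Most confident first; sorted() is stable, so ties keep index order
            ranked = sorted(pool, key=lambda i: -classify(cents, own_unl[i])[1])
            for i in ranked[:per_round]:
                label = classify(cents, own_unl[i])[0]
                lab1.append(unl1[i])
                lab2.append(unl2[i])
                labels.append(label)
                pool.remove(i)
    return lab1, lab2, labels
```

For example, starting from one "neutral" and one "angry" utterance per view, each round lets each view's classifier pseudo-label one more utterance for both classifiers to train on.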
In this paper, a new three-phase framework for large-vocabulary keyword spotting is proposed. In the first phase, an N-best sub-word lattice is generated by a hidden Markov model (HMM). In the second phase, keyword candidates are hypothesized by dynamic keyword matching. In the last phase, a two-pass confidence measure, which provides complementary information, is used for keyword verification...
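One common way to match a keyword dynamically against recognized sub-word strings is approximate substring matching by dynamic programming. This is an edit distance in which the match may start and end anywhere in the hypothesis. The sketch applies it to each N-best sub-word string and keeps candidates within a cost budget. It is an assumption that the paper's "dynamic keyword matching" works this way, and the lattice is simplified to a list of N-best strings. The function names and the `max_cost` parameter are hypothetical.

```python
def best_match(keyword, hyp):
    """Minimum edit distance between `keyword` and any contiguous span of `hyp`.

    Returns (cost, end), where `end` is the position in `hyp` just past the
    best-matching span.
    """
    prev = [0] * (len(hyp) + 1)  # empty keyword prefix matches anywhere for free
    for i in range(1, len(keyword) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            sub = 0 if keyword[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j - 1] + sub,  # match or substitution
                         prev[j] + 1,        # keyword unit missing from hyp
                         cur[j - 1] + 1)     # extra unit inserted in hyp
        prev = cur
    cost = min(prev)
    return cost, prev.index(cost)


def spot(keyword, nbest, max_cost=1):
    """Keyword candidates (rank, end, cost) across N-best sub-word strings."""
    hits = []
    for rank, hyp in enumerate(nbest):
        cost, end = best_match(keyword, hyp)
        if cost <= max_cost:
            hits.append((rank, end, cost))
    return hits
```

Allowing a small nonzero `max_cost` tolerates sub-word recognition errors at the price of more false candidates, which a later confidence-based verification stage would then have to reject.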