The OpenKWS14 keyword search evaluation is one of the most challenging and influential evaluations in the field of speech recognition. Its goal is to build a high-performance keyword search system for a minority language with limited training data in a short period of time. We present the system of the Department of Electronic Engineering, Tsinghua University (THUEE team) for the OpenKWS14 keyword...
Exploiting sparseness in deep neural networks is an important method for reducing the computational cost. In this paper, we study neuron sparseness in deep neural networks for acoustic modeling. For the feed-forward stage, we only activate neurons whose input values are larger than a given threshold, and set the outputs of inactive nodes to zero. Thus, only a few nonzero outputs are fed to the next...
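The thresholding rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `sparse_layer`, its parameters, and the choice to pass the pre-activation through unchanged for active neurons are all hypothetical, since the abstract does not specify the activation function.

```python
# Illustrative sketch only; every name here is hypothetical.

def sparse_layer(inputs, weights, biases, threshold):
    """Compute one feed-forward layer with threshold-gated neurons.

    inputs:    list of floats (outputs of the previous layer)
    weights:   list of rows, one row of input weights per neuron
    biases:    list of floats, one per neuron
    threshold: a neuron is active only if its input value exceeds this
    """
    # Zero inputs contribute nothing, so skip them. This is where a sparse
    # previous layer saves multiplications.
    nonzero_inputs = [(i, x) for i, x in enumerate(inputs) if x != 0.0]

    outputs = []
    for row, bias in zip(weights, biases):
        pre_activation = bias + sum(row[i] * x for i, x in nonzero_inputs)
        # Assumption: active neurons output their input value unchanged;
        # inactive neurons output exactly zero.
        outputs.append(pre_activation if pre_activation > threshold else 0.0)
    return outputs


if __name__ == "__main__":
    hidden = sparse_layer(
        inputs=[1.0, 0.0, 2.0],
        weights=[[0.5, 9.0, 0.25], [-1.0, 9.0, 0.1], [0.1, 0.1, 0.1]],
        biases=[0.0, 0.0, 0.0],
        threshold=0.5,
    )
    print(hidden)
```

Because inactive neurons emit exact zeros, the next layer can skip them in the same way, which is the source of the computational saving the abstract refers to.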
In a power grid with "strong DC and weak AC", commutation failure on an ultra-high voltage (UHV) line will result in angle swing between two regional power grids. Our analysis shows that, unlike the scenario in the single machine infinite bus system, the speed of angle swing between two regional power grids after the fault is slow since the power vacancy is not very large (compared with the...
The recently introduced deep neural network (DNN) has achieved unprecedented gains in many challenging automatic speech recognition (ASR) tasks. In this paper, deep neural network hidden Markov model (DNN-HMM) acoustic models are introduced to phonotactic language recognition and outperform artificial neural network hidden Markov model (ANN-HMM) and Gaussian mixture model hidden Markov model (GMM-HMM)...
In spoken language recognition (SLR), discriminatively trained models always outperform non-discriminative models but are computationally expensive and complex to implement. In this paper, we explore a novel approach to discriminative vector space model (VSM) training by using a boosting regression framework, in which an ensemble of VSMs is trained sequentially. The effectiveness of our boosting variation...
Short utterances are a great challenge for speaker recognition because very limited data can be used for training and testing. To give a robust estimation, the number of model parameters for a short utterance should be smaller than that for a long utterance; however, this may impede the model's descriptive capability. In this paper, we propose a multi-scale kernel (MSK) approach to solve this...
At present, the i-vector model has become the state-of-the-art technology for speaker recognition. It represents a speech utterance as a low-dimensional, fixed-length compact i-vector. For some real applications, the i-vector extraction procedure is relatively slow and requires too much memory. Some fast extraction methods based on numerical approximation have been proposed to speed up the computation and to save...
This paper introduces an approach based on Fisher vector feature representation for speaker verification. The Fisher vector originates from the Fisher kernel and represents each utterance as a high-dimensional vector by encoding the derivatives of the log-likelihood of the UBM with respect to its means and variances. This representation captures the average first- and second-order differences between...
Phone recognition followed by a support vector machine (PR-SVM) has been proposed for language recognition tasks and has shown encouraging results. However, it still suffers from problems such as the curse of dimensionality caused by the increasing order of the N-gram feature supervector, the rapidly increasing number of possible parameters because of fast exact match of the phoneme history, etc....
Recurrent neural network language models (RNNLMs) have proved superior to many other competitive language modeling techniques in terms of perplexity and word error rate. The remaining problem is the great computational complexity of RNNLMs in the output layer, resulting in long evaluation times. Typically, a class-based RNNLM with the output layer factorized was proposed for speedup, which...
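As a rough illustration of output-layer factorization, the sketch below uses the commonly described class-based decomposition P(w | h) = P(c(w) | h) · P(w | c(w), h), so that normalization runs over the classes and over the words of one class instead of the whole vocabulary. The abstract is truncated and gives no details, so this is an assumption about the general technique rather than the method of the paper; all function and variable names are hypothetical, and the scores stand in for output-layer activations.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_factored_prob(word, class_scores, word_scores_by_class, word_to_class):
    """Return P(word | history) = P(class | history) * P(word | class, history).

    class_scores:         one score per class (hypothetical network outputs)
    word_scores_by_class: list of dicts, word -> score, one dict per class
    word_to_class:        dict mapping each word to its class index
    """
    c = word_to_class[word]
    p_class = softmax(class_scores)[c]
    # Only the words inside class c are normalized, not the full vocabulary.
    members = sorted(word_scores_by_class[c])
    within = softmax([word_scores_by_class[c][w] for w in members])
    return p_class * within[members.index(word)]


if __name__ == "__main__":
    class_scores = [0.0, 1.0]
    word_scores = [{"a": 0.5, "b": -0.5}, {"c": 2.0, "d": 0.0, "e": 1.0}]
    word_to_class = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}
    total = sum(
        class_factored_prob(w, class_scores, word_scores, word_to_class)
        for w in word_to_class
    )
    print(total)  # the factored probabilities still sum to one (up to rounding)
```

With a vocabulary of V words split into roughly sqrt(V) classes of similar size, each evaluation normalizes over about 2·sqrt(V) scores instead of V.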
A compact acoustic model for speech recognition is proposed based on nonlinear manifold modeling of the acoustic feature space. Acoustic features of the speech signal are assumed to form a low-dimensional manifold, which is modeled by a mixture of factor analyzers. Each factor analyzer describes a local area of the manifold using a low-dimensional linear model. For an HMM-based speech recognition system,...
Signal structure is one of the decisive factors in the inherent performance of a satellite navigation system, and it is one of the critical technologies that must be resolved during the system design and upgrading process. In order to improve code tracking precision and achieve better bit error rate (BER) performance at the same time, we combine low-density parity-check (LDPC) codes and minimum shift...
The Albayzin 2012 language recognition evaluation (LRE) is one of the most challenging language recognition evaluations, which is mainly reflected in: (1) the target languages are more confusable with other languages, which might push down the system performance; (2) development and test data are heterogeneous regarding duration, number of speakers, ambient noise/music, channel conditions, etc.; (3) signals...
With the rapid development of keyword advertising on search engine platforms, competitive advertising has become a novel strategy for advertisers to gain more potential market share. Though keyword suggestion methods can help match the keywords chosen by the advertisers and the queries in search engines, mainstream keyword suggestion methods suggest keywords by directly extending seed keywords and cannot...
The Context-Dependent Deep-Neural-Network HMM, or CD-DNN-HMM, is a powerful acoustic modeling technique. Its training process typically involves unsupervised pre-training and supervised fine-tuning. In this paper, we demonstrate that the performance of DNNs can be improved by utilizing a large amount of unlabeled data in the training procedure. In our method, CD-DNN-HMM trained using 309 hours of unlabeled...
With the rapid development of the Smart City, the video surveillance platform, a critical part of it, is now under substantial pressure. The current video surveillance platform appears to have several bottlenecks: an overloaded streaming media server, weak fault tolerance, and poor scalability. After an investigation on the architecture of the current stand-alone streaming media server, this paper proposes...
Using neural networks to estimate the probabilities of word sequences has shown significant promise for statistical language modeling. Typical modeling methods include multi-layer neural networks, log-bilinear networks, and recurrent neural networks. In this paper, we propose the temporal kernel neural network language model, a variant of the models mentioned above. This model explicitly captures...
In the field of prosody event detection, many local acoustic features have been proposed for representing the prosodic characteristics of speech units. The context information that represents possible regularities underlying neighboring prosody events, however, has not been used effectively. The main difficulty in utilizing prosodic context is that it is hard to capture the long-distance sequential dependency...
A new method for image denoising is presented, which combines the strengths of the wave atoms transform and Cycle Spinning. Due to the lack of translation invariance of the wave atoms transform, image denoising by coefficient thresholding would lead to pseudo-Gibbs phenomena. Cycle Spinning was employed to avoid the artifacts. Experimental results show that the method can remove...
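The cycle spinning idea can be sketched as "shift, denoise, unshift, average". The example below is a minimal illustration under clearly stated assumptions: it works on a 1-D signal rather than an image, and it substitutes a one-level Haar wavelet transform with hard thresholding for the wave atoms transform, because Haar is easy to write from scratch and is likewise not translation invariant. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level orthonormal Haar transform of an even-length list."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def threshold_denoise(x, thr):
    """Hard-threshold the detail coefficients (stand-in for wave atoms)."""
    approx, detail = haar_forward(x)
    detail = [d if abs(d) > thr else 0.0 for d in detail]
    return haar_inverse(approx, detail)


def shift(x, k):
    """Circularly shift x by k samples (positive k moves samples right)."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


def cycle_spin(x, thr, shifts):
    """Average the denoised results over the given circular shifts."""
    acc = [0.0] * len(x)
    for k in shifts:
        restored = shift(threshold_denoise(shift(x, k), thr), -k)
        acc = [a + r for a, r in zip(acc, restored)]
    return [a / len(shifts) for a in acc]


if __name__ == "__main__":
    noisy = [0.1, -0.1, 0.2, 0.0, 5.1, 4.9, 5.0, 5.2]
    print(cycle_spin(noisy, thr=0.3, shifts=[0, 1]))
```

Averaging over shifts means that no single alignment of the transform grid with an edge dominates the result, which is how cycle spinning suppresses the artifacts of a shift-variant transform.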
Computing supervectors from many sliced utterance feature vectors as the inputs to a support vector machine is used in many state-of-the-art systems for speaker and language recognition. This feature recombination method can achieve very good recognition results, but it is also very time-consuming. By analyzing the supervector computation procedure, we found great data-parallel potential. We can use vector/matrix...