When applied to speech, Non-negative Matrix Factorization is capable of learning a small vocabulary of words, forgoing any prior linguistic knowledge. This makes it well suited to small-scale speech applications where flexibility is of the utmost importance, e.g. assistive technology for the speech impaired. However, its performance depends on the way its inputs are represented. We propose the use...
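As a rough illustration of the factorization this abstract refers to, the sketch below factors a non-negative matrix `V` into non-negative factors `W` and `H` with the standard multiplicative updates for the squared (Frobenius) error. The cost function, the toy matrix, and all names (`nmf`, `rank`, `eps`) are assumptions made for illustration; the abstract does not state which NMF variant or input representation the authors use.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x rank) and H (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy data (hypothetical): a 4 x 3 non-negative matrix of rank 2.
V = [[1.0, 2.0, 0.0],
     [2.0, 4.0, 0.0],
     [0.0, 1.0, 3.0],
     [0.0, 2.0, 6.0]]
W0, H0 = nmf(V, rank=2, iterations=0)  # random starting point
W, H = nmf(V, rank=2)                  # same start, after 200 updates
```

In a speech setting, the columns of `V` would typically hold one fixed-length representation per utterance, so that the columns of `W` can be read as recurring acoustic patterns; that reading is a common interpretation, not a claim about this paper's setup.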
Evaluating the accuracy of HMM-based and SVM-based spotters in detecting keywords and recognizing the true place of keyword occurrence shows that the HMM-based spotter detects the place of occurrence more precisely than the SVM-based spotter. On the other hand, the SVM-based spotter performs much better in detecting
We propose two simple methods to improve the performance of a keyword spotting system. In our application, users may change the keywords at any time. We therefore focused on phone-based GMM-HMM models, since they do not require keyword-specific training data. However, the GMM-HMM based models usually
Text analysis of a web page is more difficult than the analysis of the text of a normal document due to the presence of additional information, such as HTML structure, styling codes, irrelevant text, and hyperlinks. In this paper, we propose an unsupervised method to extract keywords from a web page. The
This paper proposes a new methodology that automatically generates English mnemonic keywords to support the learning of basic Japanese vocabulary. A new phonetic algorithm, called JemSoundex, is also introduced for phonetically transliterating the Japanese and English languages for phonetic matching. The effective
In this paper, we propose a method to realize the recently developed keyword-aware grammar for LVCSR-based keyword search using weighted finite-state automata (WFSA). The approach creates a compact and deterministic grammar WFSA by inserting keyword paths into an existing n-gram WFSA. Tested on the evalpart1 data of the
In this study, a new keyword spotting (KWS) system that utilizes phone confusion networks (PCNs) is presented. The new system exploits the compactness and accuracy of phone confusion networks to deliver fast and accurate results. Special design considerations are provided within the new algorithm to account for phone
This paper presents a novel architecture for keyword spotting in spontaneous speech, in which the keyword model is trained from a small number of acoustic examples provided by a user. The word-spotting architecture relies on scoring patch feature vector sequences extracted by using sliding windows, and performing keyword
This paper presents a template-based system for speaker-independent keyword spotting (KWS) in continuous speech that can help in automatic analysis, indexing, search and retrieval of user-generated videos by content. Extensive experiments on clean speech confirm that the proposed approach is superior to an HMM approach when applied to noisy speech with different signal-to-noise ratio (SNR) levels...
The task of zero resource query-by-example keyword search has received much attention in recent years as the speech technology needs of the developing world grow. These systems traditionally rely upon dynamic time warping (DTW) based retrieval algorithms with runtimes that are linear in the size of the search
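To make the runtime remark concrete, here is a minimal sketch of DTW-based retrieval: the query is aligned against every utterance in the collection, so the total work grows linearly with the collection size. It uses whole-sequence DTW with a Euclidean frame cost as simplifying assumptions (query-by-example systems usually use a subsequence variant), and the function names `dtw_distance` and `search` are hypothetical.

```python
from math import inf


def dtw_distance(query, reference):
    """Return the DTW alignment cost between two sequences of feature vectors."""
    def frame_cost(a, b):
        # Euclidean distance between two frames (an assumed local cost).
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    n, m = len(query), len(reference)
    # cost[i][j] = best cumulative cost aligning query[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_cost(query[i - 1], reference[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # query frame repeated
                                 cost[i][j - 1],      # reference frame repeated
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def search(query, collection):
    """Score the query against every utterance and return (cost, index) pairs, best first."""
    return sorted((dtw_distance(query, utt), idx) for idx, utt in enumerate(collection))


# Toy one-dimensional "features" (hypothetical data).
query = [[0.0], [1.0], [2.0]]
collection = [
    [[5.0], [5.0]],
    [[0.0], [0.0], [1.0], [2.0]],  # a time-stretched copy of the query
]
ranking = search(query, collection)
```

Each call to `dtw_distance` costs time proportional to the product of the two sequence lengths, and `search` makes one such call per utterance, which is the linear dependence the abstract describes.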
Markov Model/Artificial Neural Network (HMM/ANN) keyword spotting framework. The feature extraction method used was Mel-Frequency Cepstral Coefficients (MFCC). The ANN is a 3-layer feedforward Multi-Layer Perceptron (MLP). In recognizing the words, an HMM decoder was used which implemented the Viterbi
We study a user-friendly voice interface for consumer electronics and propose a voice activation system that activates speech recognition only when voice sounds from legitimate users are detected. The proposed system enables efficient operation of speech recognition in a continuous listening environment without any touch and/or key input.
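The decoding step mentioned in this abstract can be sketched as a generic Viterbi search over log scores. The two-state `filler`/`keyword` model, the discrete observation symbols, and all probabilities below are hypothetical toy values; in a hybrid HMM/ANN system the emission scores would be derived from the network's outputs rather than from a lookup table.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state sequence and its log probability."""
    # best[t][s] = best log score of any path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


def logs(table):
    """Convert a nested dict of probabilities to log probabilities."""
    return {k: {j: math.log(p) for j, p in row.items()} for k, row in table.items()}


# Hypothetical toy model: symbol "b" is typical of the keyword state.
states = ["filler", "keyword"]
log_start = {"filler": math.log(0.9), "keyword": math.log(0.1)}
log_trans = logs({"filler": {"filler": 0.8, "keyword": 0.2},
                  "keyword": {"filler": 0.2, "keyword": 0.8}})
log_emit = logs({"filler": {"a": 0.9, "b": 0.1},
                 "keyword": {"a": 0.1, "b": 0.9}})

path, log_prob = viterbi(["a", "b", "b", "a"], states, log_start, log_trans, log_emit)
```

Working in the log domain avoids numerical underflow on long observation sequences, which is why the sketch adds scores instead of multiplying probabilities.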
This paper presents a supervised framework for extracting keywords from meeting transcripts, a genre that is significantly different from written text or other speech domains such as broadcast news. In addition to the traditional frequency- or position-based clues, we investigate a variety of novel features, including
Most traditional template-matching-based keyword recognition methods do not need training data and rely only on frame matching. However, their recognition speed is relatively slow, so they cannot be used in practice. The LVCSR-based method needs to convert the speech signal into text before recognition, which has an
This paper presents a new technique for preparing word templates to improve the performance of dynamic time warping based keyword spotting. The proposed technique selects one reference template from a small set of examples and, in contrast to existing model-based approaches, does not require extensive training
sequence during training. This paper explores the design of an ASR-free end-to-end system for text query-based keyword search (KWS) from speech trained with minimal supervision. Our E2E KWS system consists of three sub-systems. The first sub-system is a recurrent neural network (RNN)-based acoustic auto-encoder trained to
corpus. Using a bigram phoneme language model, phoneme recognition experiments are performed on a two-hour independent test set using Viterbi decoding, which show a relative 33.3% improvement by our CD-DNN acoustic model. We then present a filler-based hybrid DNN-HMM keyword spotting (KWS) system which to our knowledge is
We explore techniques to improve the robustness of small-footprint keyword spotting models based on deep neural networks (DNNs) in the presence of background noise and in far-field conditions. We find that system performance can be improved significantly, with relative improvements up to 75% in far-field conditions
A linguistic analyzer based on KCTs (keyword classification trees) was trained on sentences from the ATIS (Air Travel Information System) air travel task and incorporated into the system (CHANEL) built at CRIM (Centre de Recherche Informatique de Montreal) for the Nov. 1992 ATIS benchmarks. Word sequences were
With the completion of the IARPA Babel program, it is possible to systematically analyze the performance of speech recognition systems across a wide variety of languages. We select 16 languages from the dataset and compare performance using a deep neural network-based acoustic model. The focus is on keyword spotting