Evaluating the accuracy of HMM-based and SVM-based spotters in detecting keywords and recognizing the true place of keyword occurrence shows that the HMM-based spotter detects the place of occurrence more precisely than the SVM-based spotter. On the other hand, the SVM-based spotter performs much better in detecting
search tools will make it easier for them to learn course subject matter at their own pace. As a first step in this direction, we seek to help instructors create an index for their lecture videos using audio keyword search, with queries recorded by the instructor on their laptop and/or created from video excerpts. For this
We propose two simple methods to improve the performance of a keyword spotting system. In our application, the users are allowed to change the keywords anytime if they want. Thus we focused on phone-based GMM-HMM models since they do not require keyword-specific training data. However, the GMM-HMM based models usually
In this study, a new keyword spotting system (KWS) that utilizes phone confusion networks (PCNs) is presented. The new system exploits the compactness and accuracy of phone confusion networks to deliver fast and accurate results. Special design considerations are provided within the new algorithm to account for phone
This paper presents a novel architecture for keyword spotting in spontaneous speech, in which keyword model is trained from a small number of acoustic examples provided by a user. The word-spotting architecture relies on scoring patch feature vector sequences extracted by using sliding windows, and performing keyword
Visual words of Bag-of-Visual-Words (BoVW) framework are independent each other, which results in not only discarding spatial orders between visual words but also lacking semantic information. This study is inspired by word embeddings that a similar embedding procedure is applied to a large number of visual words. By this way, the corresponding embedding vectors of the visual words can be formulated...
In keyword spotting applications, language modeling directly affects the system performance, as well as the acoustical modeling. This study focuses on the effects of different language models on the keyword spotting performance on Turkish voice recordings. Three different systems, one of which is proposed by us, that
Keyword search for spoken documents has become more and more important nowadays due to the increasing amount of spoken data. The typical system makes use of an Automatic Speech Recognition system (ASR) and information retrieval methods. While a number of studies have been done to get the optimal system performance
activation features (often using Keyword Spotting, KWS) of their devices. Standard KWS methods aim to statistically model the generation process of the speech signal, requiring hours of recorded and transcribed speech for training, and therefore are not adequate for limited-data scenarios. In this paper we propose a new KWS
A novel spoken keyword search grammar representation framework is proposed to combine the advantages of conventional keyword-filler based keyword search (KWS) and the LVCSR-based KWS systems. The proposed grammar representation allows keyword search systems to be flexible on keyword target settings as in the LVCSR
In this paper, we present the results of evaluating the robustness to language change of a previously proposed keyword spotting system. We assessed the robustness of this system when trained on clean English dataset and tested on telephony Persian speech. To have better recognition rate on telephony data, we used
We propose a new method of dynamically synthesizing Korean character image templates and then converting them into P2DHMMs in real-time. This method is more advantageous than whole character HMMs in memory requirement as well as training difficulty.
Language Model (LM) constitutes one of the key components in Keyword Spotting (KWS). The rapid development of the World Wide Web (WWW) makes it an extremely large and valuable data source for LM training, but it is not optimal to use the raw transcripts from WWW due to the mismatch of content between the web corpus
This paper proposes a document indexing method for keyword spotting based on semi-Markov conditional random fields (semi-CRFs), which provide a theoretical framework for fusing the information of different contexts. The candidate segmentation-recognition lattice is first augmented based on the linguistic context to
The H-KWS 2016, organized in the context of the ICFHR 2016 conference aims at setting up an evaluation framework for benchmarking handwritten keyword spotting (KWS) examining both the Query by Example (QbE) and the Query by String (QbS) approaches. Both KWS approaches were hosted into two different tracks, which in
corpus. Using a bigram phoneme language model, phoneme recognition experiments are performed on a two hour independent test set using the Viterbi decoding which show a relative 33.3% improvement by our CD-DNN acoustic model. We then present a filler based Hybrid DNN-HMM Keyword Spotting KWS system which to our knowledge is
We explore techniques to improve the robustness of small-footprint keyword spotting models based on deep neural networks (DNNs) in the presence of background noise and in far-field conditions. We find that system performance can be improved significantly, with relative improvements up to 75% in far-field conditions
perceptron learning to improve system performance. The entire suite of semi-supervised methods presented in this paper was evaluated under the IARPA Babel program for the keyword spotting tasks. Our semi-supervised system had the best performance in the OpenKWS13 surprise language evaluation for the limited condition. In this
For text-query-based keyword spotting from handwritten Chinese documents, the index is usually organized as a candidate lattice to overcome the ambiguity of character segmentation. Each edge in the lattice denotes a candidate character associated with a candidate class. Character similarity (between character and
The task in keyword spotting (KWS) is to hypothesise times at which any of a set of key terms occurs in audio. An important aspect of such systems are the scores assigned to these hypotheses, the accuracy of which have a significant impact on performance. Estimating these scores may be formulated as a confidence
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.