automatic speech recognisers using HMM/GMM, SGMM and DNN/HMM acoustic models as keyword spotters. We present the first results indicating promising performance of the radio-browsing system.
characters, even on syllabic alphabets like Amharic. In addition, we report improvements in word error rate from rescoring lattices and evaluate keyword search performance on several languages.
Cantonese speech recognition and keyword search tasks. Experiments show that starting from an expert lexicon of only 1K words, we are able to generate a lexicon that works reasonably well when compared with an expert-crafted lexicon of 5K words.
This work proposes a voice-activity home care system which can construct a life log associated with voices at home. Accordingly, the techniques of sound-pressure-level calculation, abnormal sound detection, noise reduction, text-independent speaker recognition and keyword spotting are developed. In abnormal sound
This paper extends recent research on training data selection for speech transcription and keyword spotting system development. Selection techniques were explored in the context of the IARPA-Babel Active Learning (AL) task for 6 languages. Different selection criteria were considered with the goal of improving over a
In particular for “low resource” Keyword Search (KWS) and Speech-to-Text (STT) tasks, more untranscribed test data may be available than training data. Several approaches have been proposed to make this data useful during system development, even when initial systems have Word Error Rates (WER) above 70
an additional held-out target language. STT gains achieved through using multilingual bottleneck features in a Tandem configuration are shown to also apply to keyword search (KWS). Further improvements in both STT and KWS were observed by incorporating language questions into the Tandem GMM-HMM decision trees for the
This article presents a method for automatic tagging of YouTube videos. The proposed method combines an automatic speech recognition (ASR) system that extracts the spoken content and a keyword extraction component that aims at finding a small set of tags representing a video. In order to improve the robustness of
One of the most serious problems of conventional knowledge management (KM) has been identified as the tardy and ineffective acquisition of knowledge. To resolve this problem, knowledge must be autonomously acquired according to its context of use by applying the technique of keyword extraction in machine
prototype system demonstrates our latest development on automatic speech recognition, keyword spotting, personalized text-to-speech synthesis and visual speech synthesis. The second demo exhibits a virtual concert with immersive audio effects. Through our virtual auditory technology, wearing simple earphones, listeners are
, the system carries out a conversation with the user to explicitly understand his/her needs and accordingly filters search results for display. The conversation between the system and the user is based on word co-occurrence keyword extraction and the Artificial Intelligence Markup Language (AIML) technique. As per initial
extraction, text normalization, keyword extraction, shot boundary detection, face detection and recognition, and near duplicate keyframe detection. These processing components detect a rich set of metadata information, which is collected by the video monitoring server. On a web interface, users can tune to different digital TV
results in up to 1.1% absolute Word Error Rate (WER) improvement as compared to keyword-based approaches. The proposed approach reduces the WER by 6.3% absolute in our experiments, compared to an in-domain LM without considering any Web data.
recognition using audio and visual cues. The novelty lies in putting together the tasks such that they can provide relevant information to one another. We evaluate the performance of our system and present results for tasks such as keyword spotting and tracking re-identification on real-world meeting scenes collected in our
increase in keyword spotting accuracy. The key finding was that performance improvements observed were due to increased recognition accuracy for words associated with the visual field but not the current focus of visual attention.
system for unsupervised word-clustering, which is able to recognize and learn the structure of speech online in a unified framework. To do so, we have extended HMM-based filler-free keyword spotting with acoustic model acquisition. To evaluate and control the dynamics of the combined acquisition-recognition process, we propose
, we propose a method for detecting keywords and rejecting out-of-vocabulary (OOV) words. It consists of a filler-modeling technique and utterance verification. Finally, we implement the ASR software on PDAs (Samsung SPH-M4300 and HP iPAQ-RW6100), one kind of portable device. It works in 54.7% of real-time with the
error. However, this prevalent performance metric is not desirable in many practical applications. For example, the cost of "recognition" error is required to be differentiated in keyword spotting systems. In this paper, we propose an extended framework for the speech recognition problem with non-uniform classification
round, keyword histograms are automatically generated for the refinement of the search query, so that the reformulated query better fits the target topic. We have also developed an image-based refinement module, which uses region analysis of the video key-frames. An SR-tree-like indexing structure is constructed for
General-purpose computation on GPUs has been a hot research topic in recent years. The paper presents a parallel implementation of the Viterbi algorithm on a GPU, based on the features of the GPU and the characteristics of the Viterbi algorithm in a keyword spotting system. Experimental results using an NVIDIA 9600 GT GPU show that