When applied to speech, Non-negative Matrix Factorization is capable of learning a small vocabulary of words without requiring any prior linguistic knowledge. This makes it suitable for small-scale speech applications where flexibility is of the utmost importance, e.g. assistive technology for the speech impaired. However, its performance depends on the way its inputs are represented. We propose the use...
This paper proposes a new methodology that automatically generates English mnemonic keywords to support the learning of basic Japanese vocabulary. A new phonetic algorithm, called JemSoundex, is also introduced for phonetically transliterating the Japanese and English languages for phonetic matching. The effective
Search operations have become indispensable, and much research is devoted to storing and processing the indices they require in a simple and effective manner. Whenever indices are stored, both the space they occupy and their ease of access must be taken into account. This paper briefly describes the existing system, the inverted index, and discusses the limitations...
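For readers unfamiliar with the baseline, here is a minimal sketch of an inverted index of the kind this abstract starts from. It assumes simple lowercase tokenization and Boolean AND queries; the function names are hypothetical, and the paper's own proposal is not reproduced. Each term maps to a sorted posting list of document IDs, which is what makes lookups cheap and also what occupies the space the abstract is concerned with.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Assumption: lowercase and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_and(index, query):
    """Return IDs of documents that contain every query term (Boolean AND)."""
    lists = [set(index.get(term, [])) for term in tokenize(query)]
    if not lists:
        return []
    return sorted(set.intersection(*lists))

docs = [
    "keyword search on speech",
    "inverted index for keyword search",
    "speech recognition with a large vocabulary",
]
index = build_inverted_index(docs)
print(index["keyword"])                            # [0, 1]
print(search_and(index, "keyword search speech"))  # [0]
```

Production systems usually compress posting lists (e.g. by storing gaps between IDs) rather than keeping plain lists as this sketch does.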
investigation into the improvements in the accuracy of a search system that network analysis techniques provide by supporting the discovery of relations among the items stored in the repository. For this reason, we have developed the SEEN prototype, a keyword search tool that exploits network analysis. SEEN has been evaluated against a
Document indexing is an essential task performed by archivists or automatic indexing tools. To retrieve the documents relevant to a query, the keywords describing each document must be chosen carefully. Archivists have to identify the right topic of a document before they start extracting keywords. For an archivist
lower out-of-vocabulary rates. This paper proposes a morph-to-word transduction to convert morph sequences into word sequences. This enables powerful word language models to be applied. In addition, techniques such as pruning, confusion network decoding, keyword search and many others are expected to benefit from
In this study, we present a pre-filtering method for dynamic time warping (DTW) to improve the efficiency of a posteriorgram-based keyword search (KWS) system. The ultimate aim is to improve the performance of a large vocabulary continuous speech recognition (LVCSR) based KWS system using the posteriorgram-based KWS
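For context, the sketch below shows classic DTW, the alignment step whose cost such a pre-filter is meant to reduce. It is a generic textbook version, not the paper's system: it aligns a query against a single candidate segment, and the local distance for posteriorgram frames, -log of the inner product of two posterior vectors, is an assumption based on common practice in posteriorgram-based search. The pre-filtering method itself is not shown, and the function names are hypothetical.

```python
import math

def dtw_distance(query, segment, dist):
    """Minimal cumulative cost of aligning `query` with `segment` (classic DTW)."""
    n, m = len(query), len(segment)
    # cost[i][j]: best cost of aligning query[:i] with segment[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = dist(query[i - 1], segment[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # segment frame j reused
                                     cost[i][j - 1],      # query frame i reused
                                     cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]

def neg_log_inner(p, q, eps=1e-10):
    """Assumed local distance between two posterior vectors: -log(p . q)."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))

query = [[0.9, 0.1], [0.1, 0.9]]            # toy 2-class posteriorgram
good = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
bad = [[0.1, 0.9], [0.1, 0.9], [0.9, 0.1]]
print(dtw_distance(query, good, neg_log_inner) < dtw_distance(query, bad, neg_log_inner))  # True
```

The full table costs O(nm) per candidate segment, which is why discarding unpromising segments before running DTW can pay off.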
In many cases keywords from a restricted set of possible keywords have to be assigned to texts. A common way to find the best keywords is to rank terms occurring in the text according to their tf.idf value. This requires a corpus of texts from which document frequencies can be derived. In this paper we show that we
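The common baseline described above, ranking a text's terms by tf.idf against a reference corpus and keeping only terms from a restricted keyword set, can be sketched as follows. This is a minimal illustration, not the paper's method; the smoothed idf formula log((1 + N) / (1 + df)) + 1 is one of several standard variants and is an assumption here, as are the helper names.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase alphabetic tokens, no stemming or stop-word removal.
    return re.findall(r"[a-z]+", text.lower())

def document_frequencies(corpus):
    """Number of corpus documents in which each term occurs."""
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return df

def rank_keywords(text, corpus, allowed=None, top_k=3):
    """Rank the terms of `text` by tf.idf, optionally keeping only `allowed` keywords."""
    df = document_frequencies(corpus)
    n_docs = len(corpus)
    scores = {}
    for term, tf in Counter(tokenize(text)).items():
        if allowed is not None and term not in allowed:
            continue
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed idf (one variant)
        scores[term] = tf * idf
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]

corpus = ["speech recognition system", "keyword search system", "speech keyword system"]
text = "keyword spotting keyword speech system"
print(rank_keywords(text, corpus))  # ['keyword', 'spotting', 'speech']
print(rank_keywords(text, corpus, allowed={"keyword", "speech", "system"}, top_k=2))  # ['keyword', 'speech']
```

The corpus dependency the abstract mentions is visible in `document_frequencies`: without a corpus there is no df to compute idf from.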
This paper proposes a systematic full-text document search that combines the keyword similarity and structural similarity of the documents under consideration. The approach operates in two steps. The first step uses a set of designated keywords to retrieve potentially relevant documents by means of an open source tool. The second step
research on using RNNLMs for keyword search systems has been relatively limited. In this paper the application of RNNLMs for the IARPA Babel keyword search task is investigated. In order to supplement the limited acoustic transcription data, large amounts of web texts are also used in large vocabulary design and LM training
This paper proposes a Bag of Visual Words (BoVW) based approach for keyword spotting in Mongolian historical document images. The first step divides the scanned Mongolian historical document images into word images through several preprocessing steps, such as connected component analysis, binarization
suggested in this study. Four previously constructed keyword-based research networks, with journal papers or research projects as network actors, are selected as the targets of this empirical study: 1) Technology Foresight Paper Network: 181 papers and 547 keywords, 2) Regional Innovation System Paper Network: 431 papers and
This paper presents an improved acoustic keyword spotting (KWS) algorithm for Mandarin conversational speech that uses a novel confusion garbage model. Examining the KWS corpus, we found many words whose pronunciation is similar to that of the predefined keywords, although they have different Chinese characters and different
In this paper we aim to enhance keyword search for conversational telephone speech under low-resource conditions. Two techniques for improving the detection of out-of-vocabulary keywords are assessed in this study: using extra text resources to augment the lexicon and language model, and using subword units for keyword
Keyword extraction is an important application in information technology. Automatic keyword extraction can help people grasp what an article is primarily about without reading the full text carefully. This paper introduces a keyword extraction algorithm based on PageRank over synonyms. Firstly
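As a rough illustration of PageRank-based keyword extraction, the sketch below ranks words by PageRank on a word co-occurrence graph, in the style of TextRank. This is a swapped-in generic technique, not the paper's synonym-based algorithm: the optional `canonical` mapping, which merges a word into a representative synonym before ranking, is a hypothetical stand-in for a synonym step, and all names are assumptions.

```python
import re
from collections import defaultdict

def pagerank_keywords(text, canonical=None, window=2, damping=0.85,
                      iterations=50, top_k=3):
    """Rank words by PageRank on a word co-occurrence graph (TextRank-style)."""
    canonical = canonical or {}
    # Hypothetical synonym step: map each word to a representative synonym.
    words = [canonical.get(w, w) for w in re.findall(r"[a-z]+", text.lower())]
    # Undirected edges between words whose positions differ by less than `window`.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    nodes = sorted(neighbors)
    if not nodes:
        return []
    n = len(nodes)
    score = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):  # power iteration
        score = {
            v: (1 - damping) / n
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[v])
            for v in nodes
        }
    return sorted(nodes, key=lambda v: (-score[v], v))[:top_k]

print(pagerank_keywords("speech keyword speech search speech model", top_k=1))  # ['speech']
print(pagerank_keywords("voice keyword speech search",
                        canonical={"voice": "speech"}, top_k=1))  # ['speech']
```

On real text, stop words would dominate such a graph, so practical systems filter them (or keep only nouns and adjectives) before building the edges.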
An important facility for keyword search on XML data is suggesting alternative queries when user queries contain typographical errors. Query suggestion can thus improve users' search experience by avoiding empty or poor-quality results. In this paper, we study the problem of effectively and
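A simple baseline for this kind of query suggestion replaces each query term that does not occur in the data with the closest indexed term by edit distance. The sketch below is that generic baseline, not the paper's XML-specific technique; the vocabulary (assumed to be the set of terms indexed from the data), the distance threshold, and the function names are all hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]

def suggest_query(query, vocabulary, max_dist=2):
    """Replace each unknown query term with its closest vocabulary term, if close enough."""
    terms = []
    for term in query.lower().split():
        if term not in vocabulary:
            ranked = sorted(vocabulary, key=lambda v: (levenshtein(term, v), v))
            if ranked and levenshtein(term, ranked[0]) <= max_dist:
                term = ranked[0]
        terms.append(term)
    return " ".join(terms)

vocabulary = {"keyword", "search", "xml", "database"}
print(levenshtein("kitten", "sitting"))                 # 3
print(suggest_query("keywrod serach xml", vocabulary))  # keyword search xml
```

Correcting each term independently ignores whether the corrected terms actually co-occur in the data, so a suggestion can still return empty results; that gap is what more elaborate approaches address.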
Spoken keyword search in low-resource conditions suffers from the out-of-vocabulary (OOV) problem and from insufficient text data for language model (LM) training. Web-crawled text data is used to expand the vocabulary and to augment the language model. However, the mismatch between web text and the target speech data brings
The application of the speaker-independent large-vocabulary CSR (continuous speech recognition) system DECIPHER to the keyword-spotting task is described. A transcription is generated for the incoming spontaneous speech by using a CSR system, and any keywords that occur in the transcription are hypothesized. It is
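Once a word-level transcription exists, the transcribe-then-search step described here reduces to matching transcript words against the keyword list. A minimal sketch, assuming the recognizer's output is available as a list of words with start and end times; the `Word` type and function name are hypothetical, the transcript is a hand-written stand-in, and DECIPHER itself is not modelled.

```python
from typing import NamedTuple

class Word(NamedTuple):
    text: str
    start: float  # seconds from the start of the utterance
    end: float

def hypothesize_keywords(transcript, keywords):
    """Return (keyword, start, end) for each transcript word that is a keyword."""
    wanted = {k.lower() for k in keywords}
    return [(w.text.lower(), w.start, w.end)
            for w in transcript if w.text.lower() in wanted]

# Hand-written stand-in for recognizer output (assumption).
transcript = [Word("I", 0.0, 0.1), Word("need", 0.1, 0.4),
              Word("billing", 0.4, 0.9), Word("support", 0.9, 1.4)]
print(hypothesize_keywords(transcript, {"billing", "refund"}))  # [('billing', 0.4, 0.9)]
```

This sketch handles single-word keywords only and ignores recognition confidence; multi-word keywords would need a phrase match over consecutive words.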
Keyword spotting is the task of identifying the occurrences of certain desired keywords in an arbitrary speech signal. Keyword spotting has many applications, one of which is telephone routing. In particular, we consider a large company that receives thousands of telephone calls daily. We are interested in the