Spam mail is one of the greatest challenges faced by internet service providers, organizations and internet users alike. Spam may be targeted, sent with malicious intent, or sent simply as commercial marketing; on the whole it is unwanted by everyone except the sender. Spam filters continuously evolve as spammers become more tech-savvy and creative. Machine learning algorithms have been popularly...
In this study, a new keyword spotting system (KWS) that utilizes phone confusion networks (PCNs) is presented. The new system exploits the compactness and accuracy of phone confusion networks to deliver fast and accurate results. Special design considerations are provided within the new algorithm to account for phone
In this work, we propose a new descriptor called Gradient Local Binary Patterns (GLBP) for automatic keyword spotting in handwritten documents. GLBP is a gradient feature that improves the Histogram of Oriented Gradients (HOG) by calculating the gradient information at transitions of the Local Binary Pattern
Deep learning had a significant impact on diverse pattern recognition tasks in the recent past. In this paper, we investigate its potential for keyword spotting in handwritten documents by designing a novel feature extraction system based on Convolutional Deep Belief Networks. Sliding window features are learned from
effective in terms of better precision. The proposed method uses keyword clusters for query expansion. Visual features are used to detect duplicate images. Removing duplicates further improves precision and recall in the retrieval results
In this paper, a segmentation-free keyword spotting method is proposed for Bangla handwritten documents. In order to tolerate large variations in handwriting, we extracted key points using the SIFT key point detector, together with the end and intersection points found by morphological operations. Heat Kernel signature
We propose a new segmentation-free method for keyword spotting in handwritten documents based on Heat Kernel Signature (HKS). After key points are located by the SIFT key point detector on the document pages and the query image, HKS descriptors are extracted from a local patch centered at each key point. In order
This paper proposes a strategy for summary sentence selection in query-focused multi-document summarization by extracting keywords from the relevant document set. It calculates a query-related feature and a topic-related feature for every word in the relevant document set, then obtains the importance of the word
In this paper we propose a novel and efficient technique for finding keywords typed by the user in digitised machine-printed historical documents using the dynamic time warping (DTW) algorithm. The method uses word portions located at the beginning and end of each segmented word of the processed documents and tries to
The retrieval source for a stage design knowledge base is the text script, and text processing has become a key technology for obtaining related information from the script. This paper proposes a method for extracting keywords by category from the script by analyzing the characteristics of the script
One of the key components of constructing an ontology is a taxonomy. Creating a comprehensive taxonomy involves extracting keywords and keyphrases from the domain corpus. It is a time-consuming endeavour that requires domain expertise and syntactic and structural knowledge of the corpus in question. In this paper we
We develop and analyze an unsupervised and domain-independent method for extracting keywords from single documents. Our approach differs from previous ones in how it identifies candidate keywords, prunes the list of candidate keywords with several filtering heuristics and selects keywords from the list of
Most traditional template-matching-based keyword recognition methods need no training data and rely only on frame matching. However, their recognition speed is relatively slow and they cannot be used in practice. The LVCSR-based method needs to convert the speech signal into text before recognition, which has an
This paper presents a new technique for preparing word templates to improve the performance of keyword spotting based on dynamic time warping. The proposed technique selects one reference template from a small set of examples and, in contrast to existing model-based approaches, does not require extensive training
families, alphabets, phone sets and vocabulary sizes. In particular, it looks at ensembles of stimulated networks to ensure that improved generalisation will withstand system combination effects. In order to assess stimulated training beyond 1-best transcription accuracy, this paper looks at keyword search as a proxy for
investigation about the improvements in the accuracy of a search system provided by network analysis techniques supporting the discovery of relations among the items stored in the repository. For this reason, we have developed the SEEN prototype, a keyword search tool exploiting network analysis. SEEN has been evaluated against a
This paper proposes a method for keyword spotting in offline Chinese handwritten documents using a statistical model. Given a text query word, the method measures the similarity between the query word and every candidate word in the document by combining a character classifier and four classifiers characterizing the
This paper presents a new approach to keyword spotting in degraded document images. Two prevalent word indexing methods, OCR and word shape coding, are combined compactly based on recognition confidence evaluation. The basic procedures are as follows. First, OCR candidates are used for OCR indexing. Second, a new stroke
Feature extraction is the key technology in text categorization. Traditional text classification uses the word as the feature, and its effect on text classification is evident. The feature extraction method using base phrases and keywords changes the feature extraction of Chinese text from
As the amount of data increases and the relations among the data become more complex, access to the information implicit in the data becomes more difficult, and methods for obtaining data from diverse texts and analyzing them become more significant. One such method is the highly effective technique of keyword extraction