The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this paper we propose a novel and efficient technique for finding keywords typed by the user in digitised machine-printed historical documents using the dynamic time warping (DTW) algorithm. The method uses word portions located at the beginning and end of each segmented word of the processed documents and try to
In this paper, we introduced language network and described three kinds of networks. Keyword extraction is an important technology in many areas of document processing. In particularly, a keyword extraction algorithm based on language network and PageRank is proposed. Firstly a semantic network for a single document
Keywords are indexed automatically for large-scale categorization corpora. Indexed keywords of more than 20 documents are selected as seed words, thus overcoming subjectivity of selecting seed words in clustering; at the same time, clustering is limited to particular category corpora and keywords indexed feature
In this paper, a new information extraction system by statistical shallow parsing in unconstrained handwritten documents is introduced. Unlike classical approaches found in the literature as keyword spotting or full document recognition, our approach relies on a strong and powerful global handwriting model. A entire
attributes must be shared to have at every node a more accurate estimation of the global classifier. When expanding the knowledge of the local classifiers, to reduce costs, the network traffic should be kept to a minimum. We propose a probabilistic model for a keyword selection method which makes a more thorough analysis
both manual and automatic transcriptions, for non-English documents, we use automatic translations. In this work, we use AdaBoost, a discriminative classification method with both lexical and semantic features. The results indicate 11%-13% relative improvement over a baseline keyword-spotting-based approach. We also show
This paper presents a novel framework for multi-folder email classification using graph mining as the underlying technique. Although several techniques exist (e.g., SVM, TF-IDF, n-gram) for addressing this problem in a delimited context, they heavily rely on extracting high-frequency keywords, thus ignoring the
Rooted in multi-document summarization, maximum marginal relevance (MMR) is a widely used algorithm for meeting summarization (MS). A major problem in extractive MS using MMR is finding a proper query: the centroid based query which is commonly used in the absence of a manually specified query, can not significantly outperform a simple baseline system. We introduce a simple yet robust algorithm to...
contextual information and keywords extracted from documents. For our experiments, we preprocessed hundreads of TASKs in the aircraft's maintenance manual and made several cases for context. Our experiments showed that our proposing system could provide information related with context.
user's request, the user has to construct the request using the keywords that best describe the user's objective and match correctly with the Web Service name or location. Clustering Web services based on function similarities would greatly boost the ability of Web services search engines to retrieve the most relevant Web
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.