This paper proposes semantic based keyphrase recovery for domain-independent keyphrase extraction. In this method, we add a keyphrase recovery function as a post- process of the conventional keyphrase extractors in order to reconsider the failed keyphrases by semantic matching based on sentence meaning. We also add the Domain Identification Function to determine the related domain of the keyphrases...
the semantic units from text according to the keywords representing users’ interests and can organise semantic units into facets reflecting certain aspects of a text. The mechanism can display facets of a text with a set of operations. The proposed mechanism considers human reading process. With this mechanism, readers
A user who wants to get information from a relational database needs to know database schema and structured query languages like SQL. The ordinary users are not familiar to those things, so searching information from relational databases is hard to them. Keyword search is a solution of the problem, where a keyword
We propose a new segmentation-free method for keyword spotting in handwritten documents based on Heat Kernel Signature (HKS). After key points are located by the key point detector for SIFT on the document pages and the query image, HKS descriptors are extracted from a local patch centered at each key point. In order
In keyword spotting applications, language modeling directly affects the system performance, as well as the acoustical modeling. This study focuses on the effects of different language models on the keyword spotting performance on Turkish voice recordings. Three different systems, one of which is proposed by us, that
We propose a fully automatic method for summarizing and indexing unstructured presentation videos based on text extracted from the projected slides. We use changes of text in the slides as a means to segment the video into semantic shots. Unlike precedent approaches, our method does not depend on availability of the electronic source of the slides, but rather extracts and recognizes the text directly...
This paper proposes a strategy of the summary sentence selection for query-focused multi-document summarization through extracting keywords from relevant document set. It calculates the query related feature and the topic related feature for every word in relevant document set, then obtains the importance of the word
We develop and analyze an unsupervised and domain-independent method for extracting keywords from single documents. Our approach differs from the previous ones in the way of identifying candidate keywords, pruning the list of candidate keywords with several filtering heuristics and selecting keywords from the list of
Given a set of keywords, we find a maximum Web query (containing the most keywords possible) that respects user-defined bounds on the number of returned hits. We assume a real-world setting where the user is not given direct access to a Web search engine's index, i.e., querying is possible only through an interface
In order to over the shortcoming of the incomprehensive of summarization, a new lexical-chain-based keywords extraction and automatic summarization algorithm from Chinese texts based on the unknown word recognition using co-occurrence of neighbor words is proposed in this paper, and an algorithm for constructing
Language Model (LM) constitutes one of the key components in Keyword Spotting (KWS). The rapid development of the World Wide Web (WWW) makes it an extremely large and valuable data source for LM training, but it is not optimal to use the raw transcripts from WWW due to the mismatch of content between the web corpus
This paper presents a text query-based method for keyword spotting from online Chinese handwritten documents. The similarity between a text word and handwriting is obtained by combining the character similiarity scores given by a character classifier. To overcome the ambiguity of character segmentation, multiple
topic analysis of LDA for feature selection and compare it with the classical feature selection metrics in text categorization. For the experiments, we use SVM as the classifier and tf*idf weighting for weighting the terms. We observed that almost in all metrics, information gain performs best at all keyword numbers while
The feature extraction is the most key technology of text categorization. The word is used as the feature in the traditional text classification, and its effect for the text classification is evidence. The feature extraction method using base phrase and keyword changes the feature extraction of Chinese text from
Spotting keywords in handwritten documents without transcription is a valuable method as it allows one to search, index, and classify such documents. In this paper we show that keyword spotting based on bi-directional Long Short-Term Memory (BLSTM) recurrent neural nets can successfully be applied on online
This paper presents a corpus-based approach for extracting keywords from a text written in a language that has no word boundary. Based on the concept of Thai character cluster, a Thai running text is preliminarily segmented into a sequence of inseparable units, called TCCs. To enable the handling of a large-scaled
This paper presents a new keyword extraction algorithm for Chinese news Web pages using lexical chains and word co-occurrence combined with frequency features, cohesion features, and corelation features. A lexical chain is an external performance consistency by semantically related words of a text, and is the
platform, N-gram and word co-occurrence statistical analysis are combined to carry out Chinese keyword extraction experiment. Firstly, candidate keywords are extracted with bi-gram model. Then, a set of co-occurrences between every word in bi-grams and frequent words is generated. Co-occurrence distribution shows importance
Keyword extraction has been a very traditional topic in Natural Language Processing. However, most methods have been too complicated and slow to be applied in real applications, for example in web-based system. This paper proposes an approach which will complete some preparing works focusing on exploring the
This paper proposes a systematic full text search on document using a combined keyword and structural similarity of documents under consideration. The approach operates in two steps. The first step uses a set of designated keywords to acquire potential desired documents by means of an open source tool. The second step
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 by the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.