This paper presents a corpus-based approach for extracting keywords from text written in a language without explicit word boundaries. Based on the concept of the Thai character cluster (TCC), running Thai text is first segmented into a sequence of inseparable units called TCCs. To enable the handling of a large-scale
Consider an information repository whose content is categorized. A data item in the repository can belong to multiple categories, and new data is continuously added to the system. In this paper, we describe a system, CS*, which takes a keyword query and returns the relevant top-K categories. In contrast, traditional
This paper compares the performance of keyword-based and machine-learning-based chest x-ray report classification for Acute Lung Injury (ALI). ALI mortality is approximately 30 percent. This high mortality is, in part, a consequence of delayed manual chest x-ray classification. An automated system could reduce the time to
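To make the keyword-based side of this comparison concrete, here is a minimal sketch of keyword classification of free-text reports: a report is flagged when a sentence mentions a target phrase that is not preceded by a negation cue. The keyword list and the negation cues are hypothetical illustrations chosen for this sketch, not the rules used in the study.

```python
import re

# Hypothetical keyword list for ALI-suggestive findings (illustrative, not the study's list).
ALI_KEYWORDS = ("bilateral infiltrates", "diffuse opacities", "pulmonary edema")

# Hypothetical negation cues, matched as whole words before the keyword.
NEGATION = re.compile(r"\b(?:no|without|negative for)\b")


def classify_report(report: str) -> bool:
    """Flag a report if any sentence mentions an ALI keyword without a preceding negation cue."""
    for sentence in re.split(r"[.;\n]", report.lower()):
        for keyword in ALI_KEYWORDS:
            idx = sentence.find(keyword)
            if idx == -1:
                continue
            # Only the text before the keyword, within the same sentence, is checked.
            if not NEGATION.search(sentence[:idx]):
                return True
    return False


print(classify_report("Bilateral infiltrates are present."))     # True
print(classify_report("No pulmonary edema. Heart size normal."))  # False
```

Rules like these are transparent and fast, but they are brittle against paraphrase and complex negation, which is the kind of limitation a machine learning classifier is usually compared against.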
Keyword extraction is an important technique for search engines. In this paper, to address the shortcomings of the traditional keyword extraction algorithm, we propose an improved weight computation strategy. Experimental results show that the improved method performs significantly better than the
The Internet is becoming an increasingly important platform for everyday life and work. Keyword extraction is expected to help people quickly find hot topics on the web, since the keywords of a document provide important information about its content. In this paper, we propose to use text clustering
Based on an analysis of the shortcomings of existing Chinese matching algorithms and of the characteristics of approximately duplicate records, this paper proposes a duplicate-record cleaning method based on an improved keyword matching algorithm. Experiments show that this method improves recall
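The improved matching algorithm itself is not described in this excerpt. As a generic stand-in, the sketch below flags two records as approximate duplicates when the Jaccard similarity of their keyword sets reaches a threshold. The whitespace tokenizer and the 0.8 threshold are assumptions for illustration; Chinese records would first need word segmentation before a step like this applies.

```python
from itertools import combinations


def keywords(record: str) -> set[str]:
    """Hypothetical keyword extractor: lowercase, split on whitespace, drop 1-char tokens."""
    return {tok for tok in record.lower().split() if len(tok) > 1}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(records: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs whose keyword sets are similar enough to be flagged as duplicates."""
    kw = [keywords(r) for r in records]
    return [
        (i, j)
        for i, j in combinations(range(len(records)), 2)
        if jaccard(kw[i], kw[j]) >= threshold
    ]


records = [
    "Beijing University of Technology Computer Science",
    "Computer Science Beijing University of Technology",
    "Shanghai Jiao Tong University",
]
print(find_duplicates(records))  # [(0, 1)]
```

Because set comparison ignores word order, the first two records match even though their fields are arranged differently, which is one common source of approximate duplicates.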
In cross-language information retrieval (CLIR), the query is often composed of a series of keywords rather than a complete natural-language sentence. The lack of necessary contextual and syntactic information in such a query makes it impossible to obtain a unique translation of the query with
Given the huge number of research articles in the biomedical domain, it is increasingly important to develop methods for finding articles relevant to specific research interests. Keyword extraction is a useful way to identify important topics in documents and summarize their main information. Unfortunately
This paper focuses on building a question-answering system for the biomedical domain, and it applies different approaches to the different processing phases. First, a shallow parser is used to identify question types and extract keywords, and the keywords are expanded with UMLS for the purpose of
Due to the exponential growth of available text documents in digital form, it is of great importance to develop techniques for automatic document classification based on textual content. Earlier document classification techniques used keyword-based features and related statistics to achieve good results when
streets, with or without GPS logs. These videos are clipped into sequences of images and associated with streets on the road network. In this service, a sequence of images is called a "trail", and it can be shared with other users via keyword search, range search, or a direct link. Currently, thousands of trails, implying
consumption. We propose a classification method based on flow information. Our classification uses a combination of keyword matching and statistical behavior profiles. Keywords are predefined by observing both audio and video traffic. Behavior profiles consist of three attributes: the average received
ordinary users to use. In this paper, we propose EasyUI, a novel keyword-based user interface system that achieves web-scale data integration while remaining easy for ordinary users to use. Dealing with heterogeneity at web scale presents many new challenges. We propose new methods to address these challenges, i.e., indexing schemata
designed and implemented to resolve the problem of cross-language queries in the image retrieval process. It can greatly reduce the time and effort needed for searching. Experiments on diverse queries in Yahoo image search show that the proposed scheme can improve image results for non-English keyword
In this paper, we propose the "added" use of proximity search in a Web search query to narrow down the set of documents returned as answers to a keyword-based search query. This approach adds value to Web search query results by allowing users to better express what they are looking for. Most of the
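Proximity search keeps only the documents in which the query keywords occur close to one another. A minimal sketch, assuming regex word tokenization and defining "close" as "all query terms fit inside a window of at most `window` consecutive tokens"; both are hypothetical choices for illustration, not the paper's definition. The shortest covering window is found with a standard two-pointer sliding window.

```python
import re
from typing import Optional


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def min_span(tokens: list[str], terms: set[str]) -> Optional[int]:
    """Length in tokens of the shortest window containing every query term, or None."""
    if not terms:
        return None
    counts: dict[str, int] = {}
    best: Optional[int] = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in terms:
            counts[tok] = counts.get(tok, 0) + 1
        # While the window [left, right] covers all terms, record it and shrink from the left.
        while len(counts) == len(terms):
            span = right - left + 1
            if best is None or span < best:
                best = span
            left_tok = tokens[left]
            if left_tok in counts:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    del counts[left_tok]
            left += 1
    return best


def proximity_filter(docs: list[str], query: str, window: int) -> list[int]:
    """Indices of documents whose query terms all fall within `window` tokens."""
    terms = set(tokenize(query))
    result = []
    for i, doc in enumerate(docs):
        span = min_span(tokenize(doc), terms)
        if span is not None and span <= window:
            result.append(i)
    return result


docs = ["the quick brown fox jumps", "quick a b c d e f fox"]
print(proximity_filter(docs, "quick fox", window=3))  # [0]
```

Both documents would match a plain keyword query for "quick fox"; the proximity constraint removes the second one, where the terms are far apart.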
This paper presents a novel framework for multi-folder email classification using graph mining as the underlying technique. Although several techniques exist (e.g., SVM, TF-IDF, n-gram) for addressing this problem in a delimited context, they heavily rely on extracting high-frequency keywords, thus ignoring the
In this paper, the current classification is refined through K-means reclassification based on feedback from Web usage mining, in order to improve the accuracy of news recommendation and the convergence of classification. The method can extract the most relevant keywords and eliminate the interference of multi-vocal
This paper presents an integrated approach that automatically provides an overview of the content of Thai websites based on a tag cloud. The approach is intended to address the information overload issue by presenting the overview to users so that they can assess whether the information meets their needs. The approach incorporates Web content extraction, Thai word segmentation, and information...
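The final step of such a pipeline, turning word frequencies into a tag cloud, can be sketched as follows. This assumes the text has already been segmented into words (Thai word segmentation is outside this sketch) and uses linear scaling between hypothetical minimum and maximum font sizes; the paper's actual weighting scheme is not given in the excerpt.

```python
from collections import Counter


def tag_cloud(words: list[str], top_n: int = 10,
              min_size: int = 12, max_size: int = 36) -> dict[str, int]:
    """Map the top_n most frequent words to font sizes scaled linearly by frequency."""
    counts = Counter(words).most_common(top_n)
    if not counts:
        return {}
    hi = counts[0][1]   # highest frequency among the selected words
    lo = counts[-1][1]  # lowest frequency among the selected words
    sizes = {}
    for word, freq in counts:
        if hi == lo:
            # All selected words are equally frequent: use the middle size.
            size = (min_size + max_size) // 2
        else:
            size = min_size + round((freq - lo) * (max_size - min_size) / (hi - lo))
        sizes[word] = size
    return sizes


words = ["data"] * 4 + ["web"] * 2 + ["thai"]
print(tag_cloud(words))  # {'data': 36, 'web': 20, 'thai': 12}
```

Linear scaling is the simplest choice; logarithmic scaling is a common alternative when a few very frequent words would otherwise dominate the cloud.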
Sentence similarity is a theoretical basis and key technology of question answering systems. The method presented in this paper works as follows. First, dependency question sets are obtained, and keywords are extracted from the major components of the question sentences and of the target question from the related libraries; candidate question sets are then obtained through...
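The excerpt stops before explaining how the candidate question sets are obtained. As one hypothetical illustration of keyword-based candidate selection, the sketch below ranks library questions by keyword overlap with the user's question. The stopword list and the Dice coefficient score are assumptions made for this sketch, not the paper's method, which also relies on dependency analysis.

```python
import re

# Hypothetical stopword list; a real system would use a curated list or a dependency parse.
STOPWORDS = {"what", "is", "the", "a", "an", "of", "how", "do", "does", "to", "in", "i"}


def keywords(sentence: str) -> set[str]:
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"\w+", sentence.lower()) if w not in STOPWORDS}


def dice(a: set[str], b: set[str]) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|); 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_candidates(question: str, library: list[str],
                    top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to top_k library questions sharing keywords with `question`, best first."""
    q = keywords(question)
    scored = [(cand, dice(q, keywords(cand))) for cand in library]
    scored = [pair for pair in scored if pair[1] > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


library = ["How to reset a forgotten password", "What is the refund policy", "Change password"]
for cand, score in rank_candidates("How do I reset my password?", library):
    print(f"{score:.2f}  {cand}")
# 0.67  How to reset a forgotten password
# 0.40  Change password
```

A sentence-similarity measure would then be applied only to this short candidate list, which keeps the more expensive comparison step cheap.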
In recent years, with the development of Semantic Web technology, applications of ontologies have become increasingly diverse. The main application of ontologies is information retrieval. By using ontologies, we expect to offer more accurate information to users. However, although most ontology applications are in information retrieval, they lack interaction with...