Medical document categorization is the process of automatically assigning one or more predefined category labels to medical documents. Document indexing plays a very important role in the process of classification. This paper proposes an improved method of computing term weights which is called tfidfie (term frequency, inverted document frequency and inverted entropy). In comparison with the tfidf...
Catalog pages form the intermediate layer in the architecture of a standard Web site; research on information retrieval for this kind of page can therefore help improve Web crawler efficiency. A page is called "catalog-style" if its main body is displayed as a sequence of regular entries, and the central link in each entry apparently contains the page's major information...
Text classification is an active research area in information retrieval and natural language processing. A fundamental tool in text classification is a list of 'stop' words (a stop word list) that is used to identify frequent words that are unlikely to assist in classification and hence are deleted during pre-processing. Until now, many stop word lists have been developed for the English language. However,...
The growth of internet information delivery has made automatic text categorization essential. This investigation explores the challenges of multi-class text categorization using a one-against-one fuzzy support vector machine with Reuters news as the example data. While the fuzzy set theory is incorporated into the OAO-SVM in the classifying module, the influence of the samples with high uncertainty...
Automatic content categorization by means of taxonomies is a powerful tool for information retrieval and search technologies as it improves the accessibility of data both for humans and machines. While research on automatic categorization has mainly focused on the problem of classifier design, hardly any effort has been spent on the optimization of the taxonomy size itself. However, taxonomy tailoring...
This thesis presents a novel two-stage model that integrates the theories and techniques from the fields of information retrieval/filtering (IR/IF) and the fields of machine learning and data mining to provide more precise document filtering and retrieval. The first stage is topic filtering. The topic filtering stage is intended to minimize information mismatch by filtering out the most likely irrelevant...
Text clustering as a method of organizing retrieval results can organize large amounts of web search results into a small number of clusters in order to facilitate users' quick browsing. In this paper, we propose a text clustering method based on ontology which is different from traditional text clustering and can improve clustering performance. This method implements word clustering by calculating...
Text representation, which is a fundamental and necessary process for text-based intelligent information processing, includes the tasks of determining the index terms for documents and producing the numeric vectors corresponding to the documents. In this paper, multi-word, which is regarded as containing more contextual semantics than individual word and possessing the favorable statistical characteristics,...
Web research in Mexico has been addressing issues related mainly to search mechanisms, information extraction, and mediating user interaction and group collaboration. In this paper we provide an overview of representative projects in the area and present a sample of recent advances by research groups in Mexican institutions. These include initiatives aimed at exploring extraction techniques that regard...
Recently, automatic text categorization has made rapid progress and become one of the hotspots in the information processing field. Text tendency classification is one type of text categorization, which has very important applications in information retrieval, bad information identification and filtering, content security management, and analysis of public opinion tendency. To aim at the important influence...
Information extraction systems are used to extract only relevant text information in digital repositories. The current work proposes an automatic system to extract information in semi-structured official journals. In our approach, given an input document, a Machine Learning (ML) algorithm classifies the document's fragments into class labels which correspond to the data fields to be extracted...
In Chinese text categorization systems, for most classifiers using the vector space model (VSM), all attributes of documents construct a high-dimensional feature space, and the high dimensionality of the feature space is the bottleneck of categorization. TFIDF is a common method used to weight the terms in a document. The method is simple, but it doesn't consider the unbalanced distribution of terms...
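As a rough illustration of the TFIDF weighting this abstract refers to, the following is a minimal sketch of the textbook formulation (relative term frequency multiplied by the logarithm of inverse document frequency). It is not the method proposed in the paper; the function name `tfidf` and the toy documents are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. This is the textbook tf-idf variant
    (relative term frequency times log inverse document frequency), used
    here only as an illustrative assumption.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy corpus, already tokenized.
docs = [
    ["text", "categorization", "feature", "space"],
    ["text", "feature", "selection"],
    ["text", "classifier"],
]
w = tfidf(docs)
```

In this sketch a term that occurs in every document (here "text") receives weight zero, while rarer terms receive higher weights. The weight depends only on how many documents contain the term, not on how those documents are spread across categories.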
Over the past decade, more and more users of the Internet rely on the search engines to help them find the information they need. However, the information they find depends, to a large extent, on the ranking mechanism of the search engines they use. Not surprisingly, it, in general, consists of a large amount of information that is completely irrelevant. To help users of the Internet find the information...
With an increasing amount of audio and video materials made available on the web, information extraction from multimedia documents is becoming a key area of growing business and technology interest. Research opportunities range from traditional topics, such as multimedia signal representation, processing, coding, modeling, authentication, and recognition, to emerging subjects, such as language modeling,...
Information extraction (IE) aims to extract from textual documents only the fragments which correspond to data fields required by the user. In this paper, we present new experiments evaluating a hybrid machine learning approach for IE that combines text classifiers and hidden Markov models (HMM). In this approach, a text classifier technique generates an initial output, which is refined by an HMM,...
This paper presents a keyword extraction technique that can be used for tracking topics over time. In our work, keywords are a set of significant words in an article that gives high-level description of its contents to readers. Identifying keywords from a large amount of on-line news data is very useful in that it can produce a short summary of news articles. As on-line text documents rapidly increase...
Concept hierarchy is a hierarchically organized collection of domain concepts. It is particularly useful in many applications such as information retrieval, document browsing and document classification. One of the important tasks in the construction of concept hierarchy is the identification of suitable terms with appropriate size of domain vocabulary. One way of achieving such a size is by using...
An experimental prototype system was created and used to investigate how information relevant to analyst queries, and constrained by a contextual model, can be found over a large information space. Agents employing the ant model sift through documents quickly using a transductive support machine classifier and return those meeting a classifier which is constantly refined through feedback from semantic...
With the development of the Internet, there is an enormous number of web pages online, so a good page ranking algorithm is critical for users to obtain useful results. The traditional ranking method is suitable for general search engines, but not for focused search engines or search engines based on categorization. With state of the art in text categorization, so many cross-subjects appear, and the...
Text mining tasks include text categorization, text clustering, concept/entity extraction, document summarization, and entity relation modeling. In this paper, the focus is on concept/entity extraction only. The major challenge in extracting concepts/entities from texts is that natural language words are always ambiguous. Up to now, not much research in text mining especially in concept/entity...