The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Scientific documents are unstructured data consisting of natural language and hard for scientists to read and manage. Keywords are very helpful for scientists to search the related documents and know about their contents in a prompt way. In this paper we investigate a kind of data preprocessing technique used in SVM
Keywords are indexed automatically for large-scale categorization corpora. Indexed keywords of more than 20 documents are selected as seed words, thus overcoming subjectivity of selecting seed words in clustering; at the same time, clustering is limited to particular category corpora and keywords indexed feature
Keyword (Feature) selection enhances and improves many Information Retrieval (IR) tasks such as document categorization, automatic topic discovery, etc. The problem of keyword selection is usually solved using supervised algorithms. In this paper, we propose an unsupervised approach that combines keyword selection and
This paper presents the comparison of the text document space dimension reduction and the text document clustering and also the keyword space dimension reduction and keyword clustering by the latent semantic analysis and by the Hebbian neural network with Oja learning rule. Results of this neural network are compared
Traditional Web search engines mostly adopt a keyword-based approach. When the keyword submitted by the user is ambiguous, search result usually consists of documents related to various meanings of the keyword, while the user is probably interested in only one of them. In this paper we attempt to provide a solution to
Since keyword-based search engine usually return large amount of results in which there are many unrelated documents and many documents with same content, automatic clustering technology is used to classify the retrieval results. While there are large amount of Web retrieval results, the clustering process usually
Remote Electronic Document (CReED) provided an access control to all documents that will grant different privileges to each user of the system. It also utilized a keyword analyzer and result matcher that will make searching and retrieving of documents faster and easier. CReED used a scanner device and file importing tool to
This paper addresses the problem of clustering dynamic collections of web documents. We show an iterative algorithm based on a fine-grained keyword extraction (simple, compound words and proper nouns). Each new document inserted in the collection is either assigned to an existing class containing documents of the same
In this paper, reclassification for the current classification through K-means would be implemented based on the feedback of Web usage mining in order to improve the accuracy of news recommendation and convergence of classification. It could extract most relative keywords and eliminate the disturbance of multi-vocal
user's request, the user has to construct the request using the keywords that best describe the user's objective and match correctly with the Web Service name or location. Clustering Web services based on function similarities would greatly boost the ability of Web services search engines to retrieve the most relevant Web
With the increasing amount of unstructured content available electronically on the web, content categorization becomes very important for efficient information retrieval. The basic approaches for information retrieval in text documents are searching using keywords, categorization of the documents and filtering out the
Web has grown to a huge mass of information resource and is diverse in content. To search such rich source of information one has to be very precise in using keywords in queries to retrieve the relevant documents. Most of the queries issued to search engines are short and have ambiguous context. One way to produce
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.