The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Internet is becoming an increasingly important platform for ordinary life and work. It is expected that keyword extraction can help people quickly find hot spots on the web, since keywords in a document provide important information about the content of the document. In this paper, we propose to use text clustering
huge irrelevant search hits. In this paper, we propose an improved method for ranking of search results to reduce human efforts on locating interesting hits. The search results are re-ranked using adaptive user interest hierarchies (AUIH), which considers both investigator-defined keywords and user interest learnt from
associated with an image. In our approach, we divide images into small tiles and create visual keywords using a high-dimensional clustering algorithm. These visual keywords act the same as text keywords. One of the challenges of this approach is to identify an appropriate size for visual keywords. In this paper, we report our
The content of a text is mainly defined by keywords and named entities occurring in it. In particular for news articles, named entities are usually important to define their semantics. However, named entities have ontological features, namely, their aliases, types, and identifiers, which are hidden from their textual
In document categorization method by using similarity measures based on word vectors, it is important to determine key words to characterize each document. However, conventional methods select the key words based on their frequency or/and particular importance index such as tf-idf. In this paper, we propose a method to characterize each document by using temporal clusters of technical term usages...
digital library based on topic or concept features. Firstly, documents in a special domain are automatically produced by document classification approach. It integrates the rule-based and statistical method to classify the documents in the large-scale collection. Then, the keywords of each document are extracted using the
With rapid development of Internet information, It is quite an important project for data mining that how to classify these large amounts of texts. In this paper, we propose an improved text classify cluster algorithm, while calculating similarity, we synthetically consider the relationship between keywords and
Web has grown to a huge mass of information resource and is diverse in content. To search such rich source of information one has to be very precise in using keywords in queries to retrieve the relevant documents. Most of the queries issued to search engines are short and have ambiguous context. One way to produce
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.