Text analysis of a web page is more difficult than the analysis of the text of a normal document due to the presence of additional information such as HTML structure, styling codes, irrelevant text, and hyperlinks. In this paper, we propose an unsupervised method to extract keywords from a web page. The
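A minimal sketch of such a pipeline, assuming the page arrives as raw HTML: drop script/style blocks, strip the remaining tags, and rank the surviving terms by frequency. The stopword list and the frequency scoring below are illustrative stand-ins, not the paper's actual method.

```python
# Hedged sketch: unsupervised keyword extraction from raw HTML.
# Tag stripping + term frequency only; not the paper's algorithm.
import re
from collections import Counter

STOPWORDS = {"the", "and", "for", "that", "with", "from", "this"}

def extract_keywords(html: str, top_k: int = 10) -> list[str]:
    # Remove script/style blocks, then strip all remaining tags.
    text = re.sub(r"(?s)<(script|style).*?</\1>", " ", html)
    text = re.sub(r"<[^>]+>", " ", text)
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]{3,}", text)]
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_k)]

page = ("<html><body><h1>Keyword extraction</h1>"
        "<p>Extracting keywords from web pages.</p></body></html>")
print(extract_keywords(page))
```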
The Internet is becoming an increasingly important platform for everyday life and work. It is expected that keyword extraction can help people quickly find hot spots on the web, since the keywords in a document provide important information about its content. In this paper, we propose to use text clustering
Keywords are indexed automatically for large-scale categorization corpora. Indexed keywords appearing in more than 20 documents are selected as seed words, thus overcoming the subjectivity of selecting seed words for clustering; at the same time, clustering is limited to the corpora of particular categories, and keywords indexed feature
We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with
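The snippet does not spell out the induced k-bisecting algorithm or its similarity measures; as a rough stand-in, a plain bisecting k-means over keyword vectors illustrates the repeated two-way splitting it builds on.

```python
# Hedged sketch: generic bisecting k-means (not the paper's induced variant).
# X holds one vector per keyword; the largest cluster is repeatedly split in two.
import numpy as np
from sklearn.cluster import KMeans

def bisecting_kmeans(X: np.ndarray, k: int, seed: int = 0) -> list[np.ndarray]:
    clusters = [np.arange(X.shape[0])]                 # one cluster of all rows
    while len(clusters) < k:
        largest = max(range(len(clusters)), key=lambda i: clusters[i].size)
        idx = clusters.pop(largest)
        half = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(X[idx])
        clusters += [idx[half == 0], idx[half == 1]]
    return clusters                                    # list of row-index arrays
```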
This paper proposes a novel method to generate labels for grouping and organizing the search results returned by auxiliary search engines. It applies statistical techniques to measure the co-occurrence of keywords, forming a label matrix from them, and then agglomerates them into higher-level
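A sketch of the co-occurrence step under simple assumptions: count how often keyword pairs appear in the same result snippet, then agglomerate keywords whose co-occurrence is high. The paper's label-selection heuristics are not reproduced.

```python
# Hedged sketch: keyword co-occurrence matrix + agglomerative grouping.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cooccurrence(snippets: list[set[str]], vocab: list[str]) -> np.ndarray:
    pos = {w: i for i, w in enumerate(vocab)}
    M = np.zeros((len(vocab), len(vocab)))
    for words in snippets:
        present = [pos[w] for w in words if w in pos]
        for i in present:
            for j in present:
                if i != j:
                    M[i, j] += 1                # pair seen in the same snippet
    return M

vocab = ["cluster", "keyword", "search", "label"]
snippets = [{"cluster", "keyword"}, {"search", "label"},
            {"cluster", "keyword", "label"}]
M = cooccurrence(snippets, vocab)
dist = M.max() - M                              # frequent co-occurrence = small distance
np.fill_diagonal(dist, 0.0)
groups = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                 linkage="average").fit_predict(dist)
print(dict(zip(vocab, groups)))
```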
addition, we use a keyword extraction method based on the maximum entropy model to discard useless information. The experimental results show that the keyword extraction algorithm achieves 70% precision, and that the conditional-probability-based algorithm is more precise than the token-based algorithm. HIMA
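Maximum entropy classification coincides with logistic regression, so the filtering step can be sketched as follows; the candidate features (frequency, first-occurrence position, term length) and the toy training data are assumptions, not the paper's setup.

```python
# Hedged sketch: a maximum-entropy (logistic regression) keyword filter.
# Features per candidate term: [frequency, relative first position, length].
import numpy as np
from sklearn.linear_model import LogisticRegression

X_train = np.array([[12, 0.05, 9],   # frequent, appears early, long -> keyword
                    [ 1, 0.90, 3],   # rare, appears late, short    -> noise
                    [ 8, 0.10, 7],
                    [ 2, 0.80, 4]])
y_train = np.array([1, 0, 1, 0])     # 1 = keyword, 0 = useless

maxent = LogisticRegression().fit(X_train, y_train)
print(maxent.predict_proba([[10, 0.08, 8]])[0, 1])   # keyword probability
```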
Since keyword-based search engines usually return a large number of results, among which there are many unrelated documents and many documents with the same content, automatic clustering technology is used to classify the retrieval results. Because there is a large volume of Web retrieval results, the clustering process usually
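A sketch under simple assumptions: exact duplicates are hashed away first, then the remaining snippets are clustered with TF-IDF and k-means. A real system would also need near-duplicate detection.

```python
# Hedged sketch: dedupe retrieval results, then cluster them.
from hashlib import sha1
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

results = ["web page clustering survey", "web page clustering survey",
           "keyword extraction from tweets", "detecting phishing web sites",
           "phishing site detection with classifiers"]
unique = list({sha1(r.encode()).hexdigest(): r for r in results}.values())

X = TfidfVectorizer().fit_transform(unique)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for label, doc in sorted(zip(labels, unique)):
    print(label, doc)
```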
This paper proposes a system for finding a user's interests on the Internet. It is based on the user's browsing behavior and the contents of the visited pages. The system has two features. One is that it builds the user's browsing interests implicitly, as multiple keyword vectors, one per interest. The other is that it can
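A minimal sketch of "one keyword vector per interest", assuming the visited pages are available as plain text: cluster the pages, then take each cluster centroid's top TF-IDF terms as one interest vector. The implicit, behavior-driven parts of the system are not reproduced.

```python
# Hedged sketch: derive one keyword vector per browsing interest.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

pages = ["python pandas dataframe tutorial", "numpy array broadcasting guide",
         "marathon training plan for beginners", "trail running shoe review"]
vec = TfidfVectorizer()
X = vec.fit_transform(pages)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = np.array(vec.get_feature_names_out())
for c, centroid in enumerate(km.cluster_centers_):
    top = terms[np.argsort(centroid)[::-1][:3]]       # strongest terms = interest
    print(f"interest {c}:", list(top))
```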
Web 2.0 tools and environments have made tagging, the act of assigning keywords to on-line objects, a popular way to annotate shared resources. The success of now-prominent tagging systems makes tagging "the natural way for people to classify objects as well as an attractive way to discover new material". One of the
users to sift through and find relevant information. The information retrieval techniques commonly used are based on keywords. These techniques use keyword lists to describe the content of information, but one problem with such lists is that they say nothing about the semantic relationships between keywords, nor do they
quality of text-mined data, while efficacy relied on the context of the chosen techniques. Although developments in automated keyword extraction methods have made differences in the quality of data selection, the efficacy of Natural Language Processing (NLP) methods using verified keywords remains a challenge. In this
event can be effortlessly found using keyword matching, but there are numerous tweets that are likely to contain semantically identical information. Moreover, many systems exist for summarizing tweets related to a particular event, but they have numerous limitations and are unable to provide accurate
keyword specified by the investigator or suggested by the system. Experiments were conducted on a dummy crime dataset to test the accuracy and scalability of the proposed system. The experimental results showed that subject suggestion improved accuracy and thus sped up the process of searching for evidence.
title, keyword, and link-text information to represent the website. Heterogeneous classifiers are then built based on these different features. We propose a principled ensemble classification algorithm to combine the predictions of the different phishing detection classifiers. A hierarchical clustering technique has been
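The snippet does not give the combination rule, so the sketch below substitutes a simple weighted average of the per-classifier phishing probabilities; the classifier names and weights are illustrative assumptions, not the paper's principled algorithm.

```python
# Hedged stand-in for ensemble combination: weighted average of the
# phishing probabilities produced by heterogeneous classifiers.
def ensemble_score(probs: dict[str, float], weights: dict[str, float]) -> float:
    total = sum(weights.values())
    return sum(weights[name] * p for name, p in probs.items()) / total

probs = {"title_clf": 0.9, "keyword_clf": 0.7, "link_text_clf": 0.4}
weights = {"title_clf": 0.5, "keyword_clf": 0.3, "link_text_clf": 0.2}
print(ensemble_score(probs, weights))   # 0.74 -> above 0.5, flag as phishing
```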
This paper presents a summary of the experience obtained with a modified clustering algorithm based on Projective Adaptive Resonance Theory. The algorithm was proposed by the authors and was tested for text processing. Its possible usage is exemplified by text document clustering and the generation of keyword
integrating both low-level visual features and high-level textual keywords. Unfortunately, manual image annotation is a tedious process and may not be feasible for large image databases. To overcome this limitation, several approaches that can annotate images in a semi-supervised or unsupervised way have emerged. In this paper
In this paper, we examine the significance of expanding the user query with two techniques, namely Efficient Clustering-By-Direction and Theme Clustering. These two techniques produce clusters of the keywords extracted from the set of documents retrieved for the user query. The former clustering is based on
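Neither Efficient Clustering-By-Direction nor Theme Clustering is detailed in this snippet; the sketch below uses plain k-means as a stand-in to show the overall expansion flow: cluster the retrieved documents' terms, pick the cluster nearest the query, and append its top keywords.

```python
# Hedged sketch: query expansion from keyword clusters (k-means stand-in).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

retrieved = ["jaguar speed in the wild", "jaguar habitat and prey",
             "jaguar car engine specs", "jaguar dealership prices"]
query = "jaguar habitat"

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(retrieved)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Pick the cluster whose centroid is most similar to the query.
q = vec.transform([query]).toarray().ravel()
centroids = [np.asarray(X[labels == k].mean(axis=0)).ravel() for k in (0, 1)]
best = int(np.argmax([cen @ q for cen in centroids]))

terms = np.array(vec.get_feature_names_out())
expansion = terms[np.argsort(centroids[best])[::-1][:3]]
print(query, "->", query, *expansion)
```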
as the services management. Existing methods for Web services clustering mostly focus on directly utilizing key features from WSDL documents, e.g., input/output parameters and keywords from the description text. The probabilistic topic model Latent Dirichlet Allocation (LDA) has also been adopted, which extracts latent topic features
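A sketch of the LDA step, assuming the WSDL description text has already been extracted: derive document-topic proportions and cluster the services in topic space. The service descriptions below are illustrative.

```python
# Hedged sketch: latent topic features for service descriptions, then clustering.
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

descriptions = ["get weather forecast by city", "current temperature and wind speed",
                "convert currency exchange rate", "get euro to dollar rate"]
counts = CountVectorizer().fit_transform(descriptions)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(topics)
print(labels)   # services grouped by latent topic rather than raw keywords
```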
done on a set of data chosen to form the basis, as is done with keywords. If the base data is chosen arbitrarily, the method is automatic, whereas if some 'knowledge' or 'background' informs the choice, it is adaptive. Statistical features of the images are extracted from the pixel map of the image. The extracted features are
fuzzy Euclidean distance clustering algorithm after applying the MeSH ontology to medical theses data, for better categorization of the keywords within the data.
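A compact fuzzy c-means with Euclidean distance, assuming the MeSH-mapped keywords have already been embedded as numeric vectors; the ontology-mapping step itself is omitted.

```python
# Hedged sketch: fuzzy c-means with Euclidean distance (numpy only).
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(c), size=len(X))          # soft memberships (n x c)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]    # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        U = 1.0 / d ** (2 / (m - 1))                    # inverse-distance update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

X = np.array([[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 4.9]])
centers, U = fuzzy_c_means(X)
print(U.round(2))   # each keyword vector's degree of membership per cluster
```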