To improve Web search results and enhance Web crawling, this paper proposes clustering Web pages based on search keywords. The method first uses the matching degree between Web pages and the search keywords to decide the order in which result pages are shown. Then...
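The snippet above does not define "matching degree". The sketch below fills that gap with an assumption: the matching degree is taken to be the share of search keywords that occur in a page's text, and result pages are ordered by that score. `matching_degree` and `rank_pages` are hypothetical names, not the paper's.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lower-cased word tokens of a text (simple regex tokenizer)."""
    return set(re.findall(r"\w+", text.lower()))


def matching_degree(page_text: str, keywords: list[str]) -> float:
    """Assumed matching degree: fraction of the search keywords found in the page."""
    if not keywords:
        return 0.0
    tokens = tokenize(page_text)
    hits = sum(1 for kw in keywords if kw.lower() in tokens)
    return hits / len(keywords)


def rank_pages(pages: dict[str, str], keywords: list[str]) -> list[tuple[str, float]]:
    """Order pages by descending matching degree; ties are broken by page id."""
    scored = [(pid, matching_degree(text, keywords)) for pid, text in pages.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

For example, with the keywords `["web", "clustering", "keywords"]`, a page titled "Clustering Web pages by keywords" matches all three and ranks above "Web crawling basics", which matches one.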
We consider topic detection without any prior knowledge of category structure or possible categories. Keywords are extracted and clustered based on different similarity measures using the induced k-bisecting clustering algorithm. Evaluation on Wikipedia articles shows that clusters of keywords correlate strongly with...
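The snippet above clusters keywords with an "induced k-bisecting" algorithm whose details are not given. As a rough sketch of the underlying idea only, the code below implements plain bisecting k-means: start with all keywords in one cluster, then repeatedly split the cluster with the largest sum of squared errors (SSE) using 2-means until k clusters exist. The 2-D keyword vectors are hypothetical; a real system would use co-occurrence or embedding vectors and possibly a different similarity measure.

```python
Vector = tuple[float, ...]


def dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: list[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def sse(names: list[str], items: dict[str, Vector]) -> float:
    """Sum of squared distances of a cluster's members to its centroid."""
    c = centroid([items[n] for n in names])
    return sum(dist2(items[n], c) for n in names)


def two_means(names: list[str], items: dict[str, Vector], iters: int) -> list[list[str]]:
    """Split one cluster in two with Lloyd's 2-means, seeded by a far-apart pair."""
    first = items[names[0]]
    a = max(names, key=lambda n: dist2(items[n], first))
    b = max(names, key=lambda n: dist2(items[n], items[a]))
    centers = [items[a], items[b]]
    groups = [list(names), []]
    for _ in range(iters):
        new_groups = [[], []]
        for n in names:
            side = 0 if dist2(items[n], centers[0]) <= dist2(items[n], centers[1]) else 1
            new_groups[side].append(n)
        if not new_groups[0] or not new_groups[1] or new_groups == groups:
            break  # converged, or the split would leave a side empty
        groups = new_groups
        centers = [centroid([items[n] for n in g]) for g in groups]
    return groups


def bisecting_kmeans(items: dict[str, Vector], k: int, iters: int = 20) -> list[set[str]]:
    """Repeatedly bisect the highest-SSE splittable cluster until k clusters exist."""
    clusters = [list(items)]
    while len(clusters) < k:
        # Only clusters containing at least two distinct vectors can be split.
        splittable = [c for c in clusters if len({items[n] for n in c}) > 1]
        if not splittable:
            break
        target = max(splittable, key=lambda c: sse(c, items))
        clusters.remove(target)
        clusters.extend(two_means(target, items, iters))
    return [set(c) for c in clusters]
```

With toy vectors placing "svm", "kernel", "margin" near the origin, "goal", "striker", "league" near (5, 5), and "bread", "flour" near (10, 0), `bisecting_kmeans(items, 3)` separates the three groups. Seeding 2-means with a far-apart pair keeps the sketch deterministic; random restarts are a common alternative.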
...information is deficient and noisy on YouTube. In this paper, we propose a novel dual updating method for YouTube video topic discovery. We first enrich the document representation of each video with its related videos, then extract meaningful topics via keyword cores; finally, the video response links and the...
This paper proposes a structure that automatically analyzes the parameters of Chinese test items. This structure uses latent semantic analysis (LSA) to analyze the relationships between keywords across all test items in an item bank. It also uses a similarity measure to calculate the degree of similarity between keywords. We...
...associated with an image. In our approach, we divide images into small tiles and create visual keywords using a high-dimensional clustering algorithm. These visual keywords play the same role as text keywords. One challenge of this approach is identifying an appropriate size for visual keywords. In this paper, we report our...
Keyword (feature) selection improves many Information Retrieval (IR) tasks, such as document categorization and automatic topic discovery. Keyword selection is usually performed with supervised algorithms. In this paper, we propose an unsupervised approach that combines keyword selection and...
This paper compares latent semantic analysis (LSA) with a Hebbian neural network trained by Oja's learning rule for dimension reduction and clustering of both the text document space and the keyword space. Results of this neural network are compared...
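Oja's rule is the neural-network side of the comparison above. A single linear neuron with output y = w·x is updated by Δw = η·y·(x − y·w); for zero-mean data and a small learning rate η, w tends toward a unit vector along the leading eigenvector of the data covariance, the direction that a rank-1 truncated SVD of the centered data also recovers. The sketch below uses tiny 2-D toy vectors in place of real term or document vectors; `eta`, `epochs`, and `seed` are illustrative choices, not values from the paper.

```python
import math
import random


def oja_first_component(data, eta=0.01, epochs=200, seed=0):
    """Estimate the leading principal direction of zero-mean data with Oja's rule."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(len(data[0]))]
    for _ in range(epochs):
        for x in data:
            y = sum(wi * xi for wi, xi in zip(w, x))  # linear neuron output y = w . x
            # Hebbian growth (eta*y*x) minus decay (eta*y*y*w) keeps |w| close to 1.
            w = [wi + eta * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

On toy data spread mostly along the diagonal, for example `[(2, 2), (-2, -2), (1, 1), (-1, -1), (0.1, -0.1), (-0.1, 0.1)]`, the returned `w` has length close to 1 and points along (1, 1)/√2 up to sign.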
Traditional Web search engines mostly adopt a keyword-based approach. When the keyword submitted by the user is ambiguous, the search results usually consist of documents related to various meanings of the keyword, while the user is probably interested in only one of them. In this paper, we attempt to provide a solution to...
This paper proposes a novel method to generate labels for grouping and organizing the search results returned by auxiliary search engines. It applies statistical techniques to count co-occurring keywords and form a label matrix from them, and then agglomerates them into higher-level...
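The snippet above starts by measuring how often keywords co-occur across search results. A minimal sketch of that counting step only, assuming whitespace tokenization and a given lower-case keyword vocabulary; the label matrix and the agglomeration into higher-level labels are not shown, and `cooccurrence` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(results: list[str], vocabulary: set[str]) -> Counter:
    """Count, for each unordered keyword pair, how many results contain both keywords."""
    counts: Counter = Counter()
    for text in results:
        # Keywords present in this result, sorted so each pair has one canonical order.
        present = sorted({w for w in text.lower().split() if w in vocabulary})
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts
```

For results about "Java" the island and "Java" the language, the pair ("island", "java") accumulates counts only from island-related results, which is the kind of signal a label matrix can group on.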
...addition, we use a keyword extraction method based on the maximum entropy model to filter out useless information. The experimental results show that the keyword extraction algorithm achieves 70% precision, and that the conditional-probability-based algorithm is more precise than the token-based algorithm. HIMA...
...sense discovery problem. Given a query and a list of result pages, our unsupervised method detects word sense communities in the extracted keyword network. The documents are assigned to several refined word sense communities to form clusters. We use the modularity score of the discovered keyword community structure to...
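The snippet above scores a keyword community structure by its modularity. A minimal sketch of the standard Newman-Girvan modularity for an undirected, unweighted graph, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of the nodes in c. The keyword graph and sense labels in the example are hypothetical, and the paper may use a weighted variant.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping each node to a label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For a query such as "jaguar", two keyword triangles (cat, wildlife, habitat) and (car, engine, dealer) joined by a single habitat–car edge give Q = 2·(3/7 − 1/4) = 5/14 ≈ 0.357 when split by sense, and Q = 0 when all keywords are placed in one community.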
The content of a text is mainly defined by the keywords and named entities occurring in it. For news articles in particular, named entities are usually important in defining their semantics. However, named entities have ontological features, namely their aliases, types, and identifiers, which are hidden from their textual...
Since keyword-based search engines usually return a large number of results, including many unrelated documents and many documents with the same content, automatic clustering is used to classify the retrieval results. When there is a large number of Web retrieval results, the clustering process usually...
...of search results until they find all the content they were actually looking for. To address this limitation, we propose an algorithm that clusters search results using keyword similarity. YouTube search results are clustered with the Markov clustering algorithm, which helps users...
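The snippet above clusters YouTube results with the Markov clustering algorithm (MCL). A minimal sketch of MCL on a small similarity graph: add self-loops, make the matrix column-stochastic, then alternate expansion (matrix squaring, i.e. two-step random walks) and inflation (raising entries to a power and renormalizing columns) until the matrix stops changing. It assumes entry (i, j) of the input holds a keyword similarity between results i and j; the snippet does not say how that similarity is computed. The pruning threshold and reading clusters off non-zero rows are simplifications.

```python
def normalize_columns(m):
    """Scale each column of a square matrix (in place) so it sums to 1."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a, b):
    """Dense square matrix product (adequate for small illustrative graphs)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def markov_cluster(adj, inflation=2.0, max_iter=100, tol=1e-6, prune=1e-5):
    """Markov clustering (MCL) of a symmetric similarity/adjacency matrix."""
    n = len(adj)
    # Usual MCL preprocessing: add self-loops, then make every column sum to 1.
    m = normalize_columns([[float(adj[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along two-step random walks
        # Inflation: raise entries to a power to favor strong flows; drop tiny ones.
        inflated = [[v ** inflation if v > prune else 0.0 for v in row] for row in expanded]
        new_m = normalize_columns(inflated)
        delta = max(abs(new_m[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new_m
        if delta < tol:
            break
    # Rows that keep non-zero mass act as attractors; each lists its cluster's nodes.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 0) for i in range(n)}
    clusters.discard(frozenset())
    return clusters
```

A larger `inflation` value tends to produce more, smaller clusters; 2.0 is a common default choice rather than a value taken from the paper.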
Keyword-based search suffers from the problems of synonymous and polysemous queries. Current approaches address only synonymous queries, in which different queries may express the same information need. But the problem of polysemous queries, i.e., the same query having different intentions...
The proliferation of Web services demands a discovery mechanism that finds advertisements satisfying requests more accurately. OWL-S provides capability-based descriptions and a logical inference mechanism for semantic matching. UDDI provides a registry of businesses and Web services, but its keyword search...
...process in which groups of semantically similar queries are identified. An efficient clustering algorithm, suffix tree clustering, is developed in the study. Meanwhile, a keyword-based similarity measure is used to determine the closest cluster to a given query, and Chinese synonymy is also considered in the...
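The snippet above assigns a query to its closest cluster with a keyword-based similarity measure that also takes synonyms into account. The measure itself is not given, so this sketch assumes Jaccard similarity between keyword sets after mapping synonyms to a canonical form. The synonym table is a toy English stand-in for a Chinese synonym resource, suffix tree clustering itself is not shown, and all function names are hypothetical.

```python
def normalize(keywords, synonyms):
    """Map each keyword to a canonical form using a (hypothetical) synonym table."""
    return {synonyms.get(k, k) for k in keywords}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_cluster(query_keywords, clusters, synonyms):
    """Return (cluster id, score) for the cluster most Jaccard-similar to the query."""
    q = normalize(query_keywords, synonyms)
    best_id, best_score = None, -1.0
    for cid, kws in clusters.items():
        score = jaccard(q, normalize(kws, synonyms))
        if score > best_score:
            best_id, best_score = cid, score
    return best_id, best_score
```

With `synonyms = {"cellphone": "mobile", "handset": "mobile"}`, the query {"cellphone", "battery"} matches a cluster described by {"mobile", "price", "battery"} with score 2/3, even though "cellphone" never appears in that cluster literally.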
Web 2.0 tools and environments have made tagging, the act of assigning keywords to on-line objects, a popular way to annotate shared resources. The success of now-prominent tagging systems makes tagging "the natural way for people to classify objects as well as an attractive way to discover new material". One of the...
In this research, we used a proxy server to search for information related to the user's browsed Web pages. From the proxy server's records, we constructed a profile of the user's browsing habits. At the end of the user's search subsystem, we will use a content-based concept to extract keywords to obtain...
In document categorization methods that use similarity measures based on word vectors, it is important to determine the keywords that characterize each document. However, conventional methods select keywords based on their frequency and/or a particular importance index such as tf-idf. In this paper, we propose a method that characterizes each document using temporal clusters of technical term usage...
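The snippet above contrasts its method with conventional keyword selection by frequency or tf-idf. For reference, here is a minimal sketch of that baseline, using one common tf-idf variant (tf = term count / document length, idf = ln(N / df)). Other weighting schemes exist, and this is not the paper's temporal-cluster method.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """tf-idf weight of each term in each tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


def top_keywords(weights: dict[str, float], k: int) -> list[str]:
    """The k highest-weighted terms of one document (ties broken alphabetically)."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]
```

A term that appears in every document, such as "network" in a small corpus of networking and neural-network abstracts, gets weight 0 regardless of how often it occurs. This weakness of purely frequency-based selection is what motivates alternatives like the one proposed above.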