This paper presents a new keyword extraction algorithm for Chinese news Web pages that uses lexical chains and word co-occurrence combined with frequency features, cohesion features, and correlation features. A lexical chain is a sequence of semantically related words that reflects the cohesive structure of a text, and is the
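The snippet above combines frequency and co-occurrence features to score keywords. A minimal sketch of that idea follows; the sample text, the sentence-level co-occurrence window, and the weights `alpha`/`beta` are hypothetical illustrations, not the paper's actual features or formulas.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy text; the paper works on Chinese news pages.
text = "chains link words. words form chains. chains help extraction."
sentences = [s.split() for s in text.strip(".").split(". ")]

# Frequency feature: raw term frequency across the whole text.
tf = Counter(w for s in sentences for w in s)

# Co-occurrence feature: how often two words share a sentence.
cooc = Counter()
for s in sentences:
    for a, b in combinations(sorted(set(s)), 2):
        cooc[(a, b)] += 1

def score(word, alpha=1.0, beta=0.5):
    # Linear mix of the two features (weights are hypothetical).
    co = sum(c for pair, c in cooc.items() if word in pair)
    return alpha * tf[word] + beta * co

ranked = sorted(tf, key=score, reverse=True)
```

A word that is both frequent and well connected (here, "chains") rises to the top of the ranking.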
quality of information retrieval. The contributions of our research are twofold. First, the existing ranking algorithms of search engines are classified, and we extend the expression of queries by “keyword and ”, instead of keywords only. Second, a new ranking algorithm based on user feedback and semantic tags is
In order to improve the search results of Web pages and enhance Web crawling, this paper proposes Web page clustering based on search keywords, which first employs the matching degree between Web pages and search keywords to decide the order in which result pages are shown. Then
In this work, we compare various text-based pornographic Web filtering techniques, including blacklist and keyword blocking. The technique called SV is modified to extract a representative feature vector. Each test Web page's features are extracted and gathered as a vector. The vector is then summarized
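Keyword blocking, the simplest of the techniques being compared, can be sketched as follows; the blocklist and the hit threshold are hypothetical placeholders, not the paper's data.

```python
# Hypothetical blocklist; a real filter would use a curated list.
BLOCKLIST = {"badword1", "badword2"}

def should_block(page_text: str, threshold: int = 1) -> bool:
    """Flag the page if it contains at least `threshold` blocklisted words."""
    words = page_text.lower().split()
    hits = sum(1 for w in words if w in BLOCKLIST)
    return hits >= threshold
```

Blacklist blocking works the same way at the URL level: the page is rejected when its host appears in a list of known bad domains.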
This paper explores a unique way in which the thinking algorithm adds an extra logical substrate to a Web search query using artificial intelligence. Rather than relying on keyword search alone, the algorithm tries to assess the user's motives behind entering a query. The algorithm tries to find the reasons as
In this paper, the current classification is reclassified through K-means based on feedback from Web usage mining, in order to improve the accuracy of news recommendation and the convergence of classification. This approach can extract the most relevant keywords and eliminate the disturbance of multi-vocal
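The reclassification step can be illustrated with a toy K-means over keyword-count vectors. This is a generic sketch, not the paper's pipeline: the vocabulary, documents, and squared-Euclidean distance are assumptions for illustration.

```python
import random

# Hypothetical keyword vocabulary and news snippets.
VOCAB = ["goal", "match", "election", "vote"]
docs = ["goal match goal", "match goal", "election vote", "vote election vote"]

def tf_vector(text):
    """Represent a document by its keyword counts."""
    words = text.lower().split()
    return [words.count(t) for t in VOCAB]

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means with squared Euclidean distance."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center.
        for j, p in enumerate(points):
            labels[j] = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
        # Update step: each center becomes the mean of its cluster.
        for i in range(k):
            members = [points[j] for j in range(len(points)) if labels[j] == i]
            if members:
                centers[i] = [sum(d) / len(members) for d in zip(*members)]
    return labels

labels = kmeans([tf_vector(d) for d in docs], k=2)
```

The two sports snippets land in one cluster and the two politics snippets in the other; in the paper's setting, usage-mining feedback would adjust the keywords before re-clustering.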
Domain-specific search focuses on one area of knowledge. Applying broad-based ranking algorithms to vertical search domains is not desirable, because the broad-based ranking model builds upon data from multiple domains existing on the web. Vertical search engines attempt to use a focused crawler that indexes only web pages relevant to a predefined topic. With the Ranking Adaptation Model, one can adapt an...
of HTML pages, and the proposed algorithms are applied. A complete evaluation indicates the effectiveness of our technique. The experimental results show improved precision and recall for the proposed algorithms with respect to keyword-based search. The algorithms are implemented in Java, and its
Current classification techniques use word matching and clustering to classify webpages. These techniques take an ad hoc approach of checking and matching all the keywords in a webpage for classification. These methods are efficient but not without problems. In general, they suffer from the following
Web page classification plays an essential role in facilitating more efficient information retrieval and information processing. Conventionally, web text documents are represented by a term-frequency matrix for classification purposes. However, considering the limitations of representing documents using terms or keywords
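The conventional term-frequency representation mentioned above can be sketched in a few lines; the document collection here is hypothetical.

```python
# Hypothetical document collection.
docs = [
    "web page classification with keywords",
    "keywords help web search",
]

# Vocabulary: every distinct term in the collection, in a fixed order.
vocab = sorted({w for d in docs for w in d.split()})

# Term-frequency matrix: tf[i][j] = count of vocab[j] in docs[i].
tf = [[d.split().count(t) for t in vocab] for d in docs]
```

Each row is then fed to a classifier. The limitation the snippet alludes to is that this representation treats terms as independent symbols, so synonyms and polysemous words are not handled.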
rely on indexing web pages, so the information obtained by the tourist is still unsatisfactory because it only shows web pages whose keywords appear in the article. A support system that recognizes tourism places on web pages is required to produce a better information presentation. In this study, the recognition
done on a set of data is chosen to form the basis, as is done with keywords. If the base data is chosen arbitrarily, the method is automatic, whereas if some 'knowledge' or 'background' informs the choice, it is adaptive. Statistical features of the images are extracted from the pixel map of the image. The extracted features are
the websites into their most appropriate category. Several parameters, such as the weight applied to each feature and the keywords used to classify the websites, were tuned to yield better results. The experimental evaluation revealed that the implemented method provides very high accuracy. In particular, we obtained an
Traditional automatic classifiers often make misclassifications. Folksonomy, a new manual classification scheme based on the tagging efforts of users with freely chosen keywords, can effectively resolve this problem. Even though the scalability of folksonomy is much higher than that of other manual classification schemes, the
Traditional information retrieval (IR) methods use keyword matching to filter documents, but often retrieve unrelated Web pages. In order to effectively classify Web pages, we present a Web page categorization algorithm named WebPSC (Web page similarity categorization). This algorithm uses latent semantic
obtain the latent semantic structure of the original term-document matrix, solving the problem of polysemous and synonymous keywords. LS-SVM is an effective method for learning classification knowledge from massive data, especially when labeled examples are costly to obtain. We adopt a novel method of Web page
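The latent-semantic step described above is conventionally computed with a truncated SVD of the term-document matrix: terms with similar usage patterns collapse to nearby points in the low-rank space. The toy matrix below is an illustration only, and the LS-SVM classifier itself is not reproduced.

```python
import numpy as np

# Toy term-document count matrix (rows = terms, columns = documents).
# "car" and "automobile" are synonyms with identical usage patterns.
A = np.array([
    [2, 1, 0],   # car
    [2, 1, 0],   # automobile
    [0, 0, 2],   # election
], dtype=float)

# Truncated SVD: keep only the top-k singular directions.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]   # term coordinates in the latent space
```

In this space the two synonyms have identical coordinates while the unrelated term is orthogonal to them, which is what lets the latent-semantic representation sidestep exact keyword matching.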