The accumulation of web documents on the Internet, together with the rapid change and exponential growth of these pages, has made organizing and retrieving them manually almost impossible. A system is therefore needed that can automatically assign these pages to the relevant classes and make the results available to the tools that use them. Unfortunately, the classification of Persian...
There are two shortcomings when classification based on association rules is applied to web documents: first, the method processes the web document as plain text, ignoring the HTML tag information of the web page; second, each item of the association rules is only a word in the web page, without considering the weight of the word, or it quantifies the weight...
Web document classification is an important Web mining technique. Web page classification has been studied extensively since the Internet became a huge database of information. k-NN is a simple classification algorithm that assigns a pattern of unknown class to the class of the majority of its k nearest neighbors of known class, according to a distance measure,...
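The majority-vote rule described in the abstract above can be illustrated with a minimal sketch. It is not the paper's method: the cosine distance, the bag-of-words weighting, the toy training set `TRAIN`, and the function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Turn a text into a sparse term-frequency dict (hypothetical weighting)."""
    return dict(Counter(text.lower().split()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def knn_classify(query, labeled_docs, k=3):
    """Assign the query to the majority class among its k nearest neighbors."""
    # Sort the labeled documents by distance to the query and keep the k closest.
    neighbors = sorted(labeled_docs, key=lambda d: cosine_distance(query, d[0]))[:k]
    # Each neighbor casts one vote for its own class.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy labeled collection, invented for this example only.
TRAIN = [
    (bag_of_words("football match goal league"), "sports"),
    (bag_of_words("tennis match player serve"), "sports"),
    (bag_of_words("stock market shares price"), "finance"),
    (bag_of_words("bank interest rate market"), "finance"),
]

if __name__ == "__main__":
    query = bag_of_words("goal scored in the football match")
    print(knn_classify(query, TRAIN, k=3))
```

The choice of distance measure and of k strongly affects the result; cosine distance over term counts is only one common option for text.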
Automatic document classification has been a subject of research since the early 1960s. However, further research is still required and possible, because the results obtained so far remain open to enhancement and refinement. Although much has been written on the subject, very little research has been reported on the automatic classification of Arabic documents, none of which...
Text classification categorizes Web documents in large collections into predefined classes based on their contents. Unfortunately, the classification process can be time-consuming, and users are still required to spend a considerable amount of time scanning through the classified Web documents to identify the ones that satisfy their information needs. To solve this problem, we first introduce CorSum,...
The automatic extraction of opinions on products from the Web has been receiving increasing interest. Such extracted knowledge helps to find out what other people think about a particular product or service. With the growing availability of resources like online review sites and personal blogs, new opportunities and challenges arise as people can, and do, actively use information technologies to seek...
Nowadays, the number of universities, laboratories, government agencies and companies that place their documents online and make them searchable is increasing, because the Internet infrastructure for global data access is fully functional. However, a large number of organizations have documents that lack metadata. The lack of metadata hinders not only the discovery and dissemination of these...
Arranging large amounts of data into related groups is an important way to help us make better decisions about them. Clustering and classification are two efficient methods of grouping huge volumes of data. Most clustering and classification methods that address Web page grouping problems use fixed-size vectors in their learning algorithms. In the real world of the WWW this assumption is not reliable. In this paper...
Based on NS-IMMC, this paper proposes a new method of generating the cause and effect of a news topic. The new method chooses representative sentences for news documents according to the particular structure of news (NS, news structure), and then uses IMMC (Improved Min-Max clustering) to classify these representative sentences and generate a multi-document summary that represents the topic cause-and-effect...
Past studies on emotion classification focus on the writer's emotional state. This research addresses the reader aspect instead. The classification of documents into reader-emotion categories has several applications. One of them is to integrate reader-emotion classification into a Web search engine to allow users to retrieve documents that contain relevant contents and at the same time instill...
In this highly developed and rapidly changing Internet era, the amount of information on the Internet is increasing massively. Web page indexing catalogues and search engines, which help information seekers to collect Web information rapidly and precisely, have already become indispensable tools on the Internet. How to perform precise Web page classification that can effectively assist with...
Studying link structure of the World Wide Web (WWW) is an area which has attracted a lot of interest. Several papers have been published on structural analysis of hyperlinked environments such as the WWW. The WWW can be modeled as a graph and valuable information can be derived by analyzing links between the Web-pages primarily for the purpose of building better search engines. Many novel methods...
People often take notes while reading and browsing; hence annotation is very important in everyday life, anytime and anywhere. We developed a free-form annotation tool for collaboration that provides a convenient way to create annotations. Our approach is characterized by two design criteria: 1) digital ink annotation: help users to focus on annotated...
Due to the exponential growth of documents on the Internet and the emergent need to organize them, automatic document classification has received ever-increasing attention in recent years. The particle swarm optimization (PSO) algorithm, new to the document classification community, is a robust stochastic evolutionary algorithm based on the movement and intelligence of swarms. In this paper,...
This paper describes a method for extracting reliable reputation from the Web. In this research, reliable reputation is information whose polarity is opposite to the contributor's stance (positive or negative). We call this information "fair reputation". In order to extract fair reputations, we address the following two tasks. The first task is classification of feedback documents...
In this paper we process and analyze Web search engine query and click data from the perspective of the documents (URLs) selected. We initially define possible document categories and select descriptive variables to define the documents. The URL dataset is preprocessed and analyzed using some traditional statistical methods, and then processed by the Kohonen (1984) SOM clustering technique, which we...
Current Web search engines are not able to adapt their operations to the evolving needs, interests and preferences of their users. To cope with this weakness, we developed a system able to classify HTML (or XML) documents into user-prespecified categories of interest. The system processes the user's current profile and a set of representative documents for each category of interest, and produces a...