Web automation programs offer a means for users to enhance the usability of the web. These programs can be published on a wiki or other repository, thereby making them available for use by other users. However, in addition to programs of broad usefulness to the community at large, these repositories also contain many programs that are unreliable or highly specialized to the needs of very small sub-...
Demographic information plays an important role in gaining valuable insights about a website's user base and is used extensively to target online advertisements and promotions. This paper investigates machine-learning approaches for predicting the demographic attributes of websites using information derived from their content and their hyperlinked structure and not relying on any information directly...
In this paper we aim to solve the recommendation problem by using virtual ratings in online environments where user rating information is not available. In fact, on most current websites, especially Chinese video-sharing ones, traditional pure rating-based collaborative filtering methods are inadequate due to the sparsity of rating data. Motivated by...
A large amount of labeled training data is required to develop robust and effective facial expression analysis methods. However, obtaining such data is typically a tedious and time-consuming task whose cost is proportional to the size of the database. Due to the rapid advance of Internet and Web technologies, it is now feasible to collect a tremendous number of images with potential label information at a...
Detecting community structure in the blogosphere faces several obstacles. The data in a blog may be updated frequently by its owner, so the blogosphere as a whole grows very large in a short period of time. It can be very expensive to deal with such a huge amount of data using traditional methods. Meanwhile, few blogs in the blogosphere can be identified...
The blog is characterized as a communication system for disseminating information and expressing opinions. Is the blog a system suitable for collaboration? The research presented in this paper investigates the use of citation and response to enable a conversation on the blog. From analysis of the discourse structuring and the possibilities of relationship among participants, were developed some research...
We propose an approach for gathering web pages written in a specific language. The approach consists of a language predictor and a web site crawler. The language predictor is a machine learning based component that can learn from an example host graph some characteristics of relevant hosts, and is used to calculate the language degree of a web server whether it has a high probability to serve web...
With the development of the blogosphere, millions of users all around the world contribute rich information to blogs. However, little work has been done so far on blog extraction. Unlike the traditional template-dependent wrapper, the template-independent wrapper in this paper extracts not only blog articles but also the blogroll. In our method, blog extraction is formalized as a machine...
The typical task of unsupervised learning is to organize data, for example into clusters, typically disjoint clusters (e.g., the K-means algorithm). However, one would expect a clustering of books into topics, for example, to produce overlapping clusters. This is even more the case in social networks, a source of ever-increasing data. Finding the groups or communities in social networks based on interactions...
Weblogs (blogs) serve as a gateway to a large blog reader population, so blog authors can potentially influence a large reader population by expressing their thoughts and expertise in their blog posts. An important and complex problem, then, is figuring out why and how influence propagates through the blogosphere. While a number of previous studies have looked at the network characteristics of blogs...
In this paper we compare four machine learning techniques for blog comment spam filtering. The techniques are Naïve Bayes, k-nearest neighbor, neural networks, and support vector machines. For this comparative study we used a blog comment corpus that has been affected by spam, which is our study case in this work. We classify the comments of this blog comment corpus, which...
Automatic categorization of videos in a Web-scale unconstrained collection such as YouTube is a challenging task. A key issue is how to build an effective training set in the presence of missing, sparse or noisy labels. We propose to achieve this by first manually creating a small labeled set and then extending it using additional sources such as related videos, searched videos, and text-based webpages...
Web tracking sites or Web bugs are potential but serious threats to users' privacy during Web browsing. Web sites and their associated advertising sites surreptitiously gather the profiles of visitors and possibly abuse or improperly expose them, even if visitors do not provide their profiles consciously. In order to prevent such activities in a corporate network, most companies employ filters that...
Different Web recommendation systems have been proposed to address the problem of information overload on the Internet. They attempt to guide users toward interesting and useful items in a large information space. They anticipate the information needs of on-line users and provide them with recommendations to facilitate and personalize their navigation. There are many approaches to building such systems,...
The World-Wide Web provides every internet citizen with access to an abundance of information, but it becomes increasingly difficult to identify the relevant pieces of information. Research in web mining tries to address this problem by applying techniques from data mining and machine learning to Web data and documents. Web mining is an application of data mining. Without the internet, life would...
This paper studies web wrapper generation for pages of forum, blog, and news web sites. More and more web pages are dynamically generated using a common template populated with data from databases. This paper proposes a novel method that uses tree alignment and transfer learning to generate wrappers from this kind of web page. We present a new tree alignment algorithm to find...
In recent years, much research has been devoted to the investigation of emulating XML; on the other hand, few have refined the essential unification of Byzantine fault tolerance and write-ahead logging. After years of key research into the World Wide Web, we argue the online algorithms, which embodies the natural principles of machine learning. Sangu, our new methodology for autonomous configurations,...
This paper proposes a minimally supervised method for acquiring high-level semantic relations such as causality and prevention from the Web. Our method learns linguistic patterns that express causality such as "x gave rise to y", and uses them to extract causal noun pairs like (global warming, malaria epidemic) from sentences like "global warming gave rise to a new malaria epidemic". The novelty...
Weblogs are widely used, and the number of users is increasing rapidly. Weblogs reflect every aspect of society, such as politics, economy, and culture, so research on topic-relevance retrieval for weblogs becomes necessary. Because the corpus is noisy and it is usually difficult to obtain an appropriate query, common methods sometimes fail to reach an acceptable precision. We design...
People often want to listen to music that best fits their current emotion. A grasp of the emotions in songs could greatly help us discover music effectively. In this paper, we aimed at automatically classifying the moods of songs based on lyrics and metadata, and proposed several methods for supervised learning of classifiers. In the future, we plan to use automatically identified moods of songs as metadata...