We propose using multi-layer multiple instance learning (MMIL) for image set classification and applying it to the task of cannabis website classification. We treat each image as an instance in an image set; each image is in turn viewed as containing instances of local image patches. This representation naturally extends traditional multiple instance learning (MIL) to multiple layers. We then show...
As more and more online sources become available on the Web, it becomes very time-consuming, if not impossible, to visit and search all web sites one by one. Many search engines have been developed to help users find the information they need. However, search engines work poorly for online sources whose data often reside in the deep web, which is not part of the surface web indexed by standard search engines...
Like search engines, recommender systems have become a tool that cannot be ignored by websites with a large selection of products, music, news or simply webpage links. The performance of this kind of system depends on a large amount of information. At the same time, the amount of information on the Web is continuously growing, especially due to increased User Generated Content since the appearance...
A query burst is a period of heightened user interest in a topic, which yields a higher frequency of related search queries. In this paper we examine the behavior of search engine users during a query burst, compared to before and after this period. The purpose of this study is to gain insights into how search engines and content providers should respond to a query burst. We analyze one...
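The definition of a query burst above can be illustrated with a minimal sketch. It assumes a list of per-period query counts for one topic and flags periods whose count exceeds a multiple of the median. The function name `burst_periods` and the `factor` threshold are hypothetical choices for illustration, not the criterion used in the paper.

```python
from statistics import median
from typing import List


def burst_periods(counts: List[int], factor: float = 3.0) -> List[int]:
    """Return indices of periods whose query count exceeds factor * median.

    Assumption: the median of the whole series stands in for the
    "normal" query frequency; the paper may define the baseline differently.
    """
    if not counts:
        return []
    baseline = median(counts)
    return [i for i, c in enumerate(counts) if c > factor * baseline]


# Hypothetical daily query counts for one topic.
daily_counts = [10, 12, 11, 50, 60, 12, 9]
print(burst_periods(daily_counts))
```

A median baseline is used here because a short burst barely shifts it, whereas a mean would be pulled upward by the burst itself.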
In the past, many sentiment analysis studies have focused on English reviews. These contain abundant research results which extract features and opinions, identify semantic orientation, and associate features with opinions. Although this approach has performed well for English reviews, it is not as successful with Chinese reviews. In this paper, we aim to develop a sentiment analysis...
This paper presents a novel method to estimate characteristics of information sources about a topic by analyzing their information diffusion subnetworks in blogspace. In an information diffusion network, each influential information source has an affected subnetwork whose nodes are reachable from it. We define three information diffusion properties of the subnetwork using the numbers of three types...
In recent years, social media has become ubiquitous and important for social networking and content sharing. And yet, the content that is generated from these websites remains largely untapped. In this paper, we demonstrate how social media content can be used to predict real-world outcomes. In particular, we use the chatter from Twitter.com to forecast box-office revenues for movies. We show that...
Very few user interfaces have been designed for exploring and gathering information from large weblog datasets in a way that provides an integrated and aggregated knowledge collection and information analysis tool. Users have to rely on their own capability to find, select or filter entries and navigate through a blog archive. For weblogs with a large collection of entries this...
Social media such as weblogs and SNS are useful for information gathering in the Web 2.0 era. People visit each other's personal websites and form online social networks of acquaintances. As an efficient, active medium for information gathering, social media require functions for accelerating information diffusion in the online social network. The paper proposes a mechanism of...
Web log analysis can be helpful in gaining information about the usability of a web site, web performance, for marketing purposes, or for the development of business intelligence tools in e-commerce systems. User segmentation is one of the problems addressed in the marketing and e-commerce spheres. Various software tools have been developed to support web analysis. However, most of them provide only information through...
Collaborative tagging has emerged as a popular and effective method for organizing and describing pages on the Web. We present Treelicious, a system that allows hierarchical navigation of tagged web pages. Our system enriches the navigational capabilities of standard tagging systems, which typically exploit only popularity and co-occurrence data. We describe a prototype that leverages the Wikipedia...
Our research challenge is to provide a mechanism for splitting a long-term log of queries submitted to a Web Search Engine (WSE) into user task-based sessions. The hypothesis is that some query sessions entail the concept of user task. We present an approach that relies on a centroid-based and a density-based clustering algorithm, which consider query inter-arrival times and use a novel distance...
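To make the role of query inter-arrival times concrete, here is a minimal sketch of a plain time-gap splitter. This is a simple baseline swapped in for illustration, not the clustering approach the abstract describes. The name `split_by_time_gap`, the `(timestamp, text)` record layout, and the 30-minute default gap are all assumptions.

```python
from typing import List, Tuple

# Assumed record layout: (timestamp in seconds, query text).
Query = Tuple[float, str]


def split_by_time_gap(log: List[Query], max_gap: float = 1800.0) -> List[List[Query]]:
    """Start a new session whenever the gap to the previous query exceeds max_gap.

    Assumption: a fixed time threshold; the abstract's method instead uses
    clustering with a distance function that is not reproduced here.
    """
    sessions: List[List[Query]] = []
    for ts, text in sorted(log):
        # Compare against the timestamp of the last query in the current session.
        if sessions and ts - sessions[-1][-1][0] <= max_gap:
            sessions[-1].append((ts, text))
        else:
            sessions.append([(ts, text)])
    return sessions


# Hypothetical log: two queries one minute apart, then one over an hour later.
example_log = [(0.0, "cheap flights"), (60.0, "cheap flights rome"), (4000.0, "pasta recipe")]
print(len(split_by_time_gap(example_log)))
```

A fixed threshold groups queries only by time, so it cannot separate interleaved tasks or join one task resumed after a break, which is the kind of limitation a task-based approach aims to address.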
User Navigation Behavior Mining (UNBM) mainly studies the problem of extracting interesting user access patterns from user access sequences (UAS), which are usually used for user access prediction and web page recommendation. By analyzing real-world web data, we find that most user access sequences carry hybrid features of different patterns, rather than a single one. Therefore, the...
Due to the complexity of topical opinion retrieval systems, standard measures, such as MAP or precision, do not fully succeed in assessing their performance. In this paper we introduce an evaluation framework based on artificially defined opinion classifiers. Using Monte Carlo sampling, we perturb a relevance ranking by the outcomes of these classifiers and analyse how the opinion retrieval performance...
Representing web data in a machine-understandable format is a crucial task for the next generation of the web. Most current web pages are dynamic pages. A large percentage of these web pages get their contents from an underlying database. This work proposes an approach to represent dynamic web pages in the Concept Description Language (CDL) semantic format. This format does not depend on ontologies...
An information recommender system attempts to present information that is likely to be useful for the user. Showing the reason for a recommendation is an important role of the system. However, current recommender systems give only simple or quantitative reasons for the recommendation. In this paper, we aim at giving precise and non-quantitative reasons which are also easy to understand. We make use of formulas...
Although static ranked lists remain the dominant Web search interface, they can limit the ability of Web searchers to find desired information when it is buried deep in the collection of search results. Web search visualization and Web search personalization are two active research directions that have shown promise for improving the user experience while searching the Web. In this paper, we propose...
Web content clustering is a very important part of topic detection and tracking. In our paper we focus on the pre-processing phase of web content clustering, specifically on blog articles published in the Slovak language. We evaluate the impact of different data pre-processing methods on the success of blog clustering. We found that applying various text data manipulation techniques in pre-processing can improve...
In order to address issues such as information overload and the navigation problem, which plague users on the Web, we need to improve user support for query construction, modification, result browsing and information exploration. The Semantic Web aimed to address many of these issues by providing machine-processable information and application interoperability, but as of today has failed to reach widespread...
In order to talk to each other meaningfully, conversational partners utilize different types of conversational knowledge. Because speakers often use grammatically incomplete and incorrect sentences in spontaneous language, knowledge about conversational and terminological context turns out to be as important in language understanding as traditional linguistic analysis. In the context...