The web is today's primary publication medium, making web archiving an important activity for historical and analytical purposes. Web pages are increasingly interactive, resulting in pages that are correspondingly difficult to archive. JavaScript enables interactions that can potentially change the client-side state of a representation. We refer to representations that load embedded resources via...
Web archives preserve an unprecedented abundance of materials regarding major events and transformations in our society. In this paper, we present an approach for building event-centric sub-collections from such large archives, which includes not only the core documents related to the event itself but, even more importantly, documents describing related aspects (e.g., premises and consequences). This...
Social media are now among the most common communication media. Millions of users rely on them daily to share information with their network community via messages, referred to as posts. The massive volume of information shared is extremely diverse and covers a vast spectrum of topics and interests. Automatically identifying the topics of the posts is of particular interest as this...
In this paper we propose and evaluate three approaches for automated classification of texts in over 60 languages without the need for a manually annotated dataset in those languages. All approaches are based on the randomized Explicit Semantic Analysis method using multilingual Wikipedia articles as their knowledge repository. We evaluate the proposed approaches by classifying a Twitter dataset in...
Wikipedia is one of the fastest growing websites and a primary source of knowledge on the Internet. Being a wiki, its content is crowd-sourced by the users. This has many benefits and it is one of the main reasons it has grown to reach more than 5 million articles in its English version. Nevertheless, this also raises issues, like the overlinking of articles, which are difficult to deal with by editors...
Social Information Retrieval can be interpreted as querying the private information spaces of others within one's social network. One of the crucial steps in such a search approach is to identify the set of potential information providers to route the query to. In this experiment, we compare various routing mechanisms based on topic models (Latent Dirichlet Allocation, LDA), Explicit Semantic Analysis...
Trolling describes a range of antisocial online behaviors that aim at disrupting the normal operation of online social networks and media. Combating trolling is an important problem in the online world. Existing approaches rely on human-based or automatic mechanisms for identifying trolls and troll posts. In this paper we take a novel approach to the trolling problem: our goal is to identify the targets...
With the rapid development of the Internet, more and more people use social networks to share information and express their views, which leads to a vigorous growth of information. How to select useful and interesting information for users, that is, how to identify user topic interests, is receiving more and more attention. The tag functionality in microblogs can easily capture user topic interests and achieve information recommendation...
In the social world, knowledge, data, and concepts are shared within a group through a network of interactions and relationships. A community is formed by a group of individuals with the same interests who share common values among themselves at a higher rate than with those outside the community. It can be a social unit of any size. The significant task in studying a social network is to identify...
Inferring potential links is a fundamental problem in social networks. In the link recommendation problem, the aim is to suggest a list of potential people to each user, ordered by the preferences of the user. Although various approaches have been developed to solve this problem, the difficulty of producing a ranking list with high precision at the top -- the most important consideration for real...
Social networking or social media sites are involved in our daily life. With the increasing popularity of social media sites like Twitter and Facebook, people are using their social network to find answers to their questions. Not every social media site can answer a user's question. Different social media sites have different strengths; for example, the StackOverflow community has a specific interest in programming...
In this work, we present the Klout Score, an influence scoring system that assigns scores to 750 million users across 9 different social networks on a daily basis. We propose a hierarchical framework for generating an influence score for each user, by incorporating information for the user from multiple networks and communities. Over 3600 features that capture signals of influential interactions are...
This paper provides a brief survey of semantic similarity, including semantic similarity between concepts and semantic textual similarity. We classify methods of semantic similarity between concepts into four categories based on the background information resource used, and classify methods of semantic textual similarity into four categories as well. As a basic methodology of text-related research and applications, semantic...
Social media has become a part of our daily life and we use it for many reasons. One of its uses is to get our questions answered. Given a multitude of social media sites, however, one immediate challenge is to pick the most relevant site for a question. This is a challenging problem because (1) questions are usually short, and (2) social media sites evolve. In this work, we propose to utilize topic...
Most traditional social networks rely on explicitly given relations between users, their friends and followers. In this paper, we go beyond well structured data repositories and create a person-centric network from unstructured text — the Wikipedia Social Network. To identify persons in Wikipedia, we make use of interwiki links, Wikipedia categories and person related information available in Wikidata...
Mining the silent members, also called lurkers, of an online community has been recognized as an important problem that accompanies the extensive use of social networks. Existing solutions to the ranking of lurkers can aid in understanding lurking behaviors in social networks; however, they ignore any information concerning the time dimension. In this work we push forward research in lurker mining...
Online social networks like Slashdot bring valuable information to millions of users - but their accuracy is based on the integrity of their user base. Unfortunately, there are many “trolls” on Slashdot who post misinformation and compromise system integrity. In this paper, we develop a general algorithm called TIA (short for Troll Identification Algorithm) to classify users of an online “signed”...
Detecting collaborative cheating in an online shopping system is an important but challenging issue. In this paper, we propose a novel approach to detect collusive manipulation of ratings in Amazon, an online shopping system. Rather than focusing on rating values, we believe the online shopping and rating activities have nontrivial attributes in terms of social network connections. Our major...
This work introduces a semantics-based navigation application called WNavis. It facilitates information-seeking activities in internal link-based websites within Wikipedia. Our goal is to develop an application that helps users easily find related articles on a given topic and then quickly check the content of articles to explore concepts in Wikipedia. We constructed a subject-based network by analyzing...
This paper describes a method for inferring when a person might be coordinating with others based on their behavior. We show that, in Wikipedia, editing behavior is more random when coordinating with others. We analyzed this using both entropy and conditional entropy. These algorithms rely only on timestamped events associated with entities, making them broadly applicable to other domains. In this...