This paper discusses part one of the main work in the field of data science, mining and analytics. A family of algorithms is developed to predict the educational relevance of individuals' talents through the lens of personality features (unstructured and semi-structured) and academic/career data. The big data (unstructured and semi-structured) contains lots of valuable information that can be mined and analyzed...
Social media usage has increased marginally in the last decade and continues to grow. Companies, data scientists, and researchers are trying to infer meaningful information from this vast amount of data. One of the most important target applications is to find influential people in these networks. This information can serve many purposes, such as user or content recommendation, viral...
The phone number, a unique identifier, has emerged as an important piece of Personally Identifiable Information (PII) in the last few years. Other PII like e-mail and online identity have been exploited in the past to launch phishing and spam attacks against their owners. The reach and security of a phone number provide a genuine advantage over e-mail or online identity, making it the most vulnerable attack vector. In...
In severe outbreaks such as Ebola, bird flu and SARS, people share news, thoughts and responses regarding the outbreaks on social media. Understanding how people perceive severe outbreaks, what their responses are, and what factors affect these responses becomes important. In this paper, we conduct a comprehensive study of understanding and mining the spread of Ebola-related information...
Analysis of large networks is of interest to many disciplines. Full network data are often hard to collect, store and analyze. In particular, in many social science surveys, ego nomination techniques have been used to collect the egocentric networks of the randomly sampled survey respondents. In this paper, we propose a sample-GLMLE method that fits exponential random graph models (ERGM) to such...
Given a set AL of community detection algorithms and a graph G as inputs, we propose two ensemble methods EnDisCo and MeDOC that (respectively) identify disjoint and overlapping communities in G. EnDisCo transforms a graph into a latent feature space by leveraging multiple base solutions and discovers disjoint community structure. MeDOC groups similar base communities into a meta-community and detects...
Community detection is a fundamental task in social network analysis. In this paper, first we develop an endorsement filtered user connectivity network by utilizing Heider's structural balance theory and certain Twitter triad patterns. Next, we develop three Nonnegative Matrix Factorization frameworks to investigate the contributions of different types of user connectivity and content information...
We study the problem of synopsis construction of massive graph streams arriving in real-time. Many graphs such as those formed by the activity on social networks, communication networks, and telephone networks are defined dynamically as rapid edge streams on a massive domain of nodes. In these rapid and massive graph streams, it is often not possible to estimate the frequency of individual items (e...
The smart grid interconnects a power grid (network) and a communication network, and enables bi-directional flow of electricity and information. To prevent the cascading failures which occur when the disruptions in one network cause disruptions in the other network, robustness should be enhanced by increasing the number of links (edges) between the power grid and the information flow network. Given...
Nowadays, in a world of limited attention, techniques that maximize the spread of social influence are more than welcome. Companies try to maximize their profits on sales by providing customers with free samples, believing in the power of word-of-mouth marketing; governments and non-governmental organizations often want to introduce positive changes in society by appropriately selecting...
Skyline queries are currently the most notable type of multi-criteria search algorithm. A skyline query returns all of the data points in a given dataset that are not dominated by other data points. However, this type of query is limited by the fact that the number of results cannot be controlled. In some cases, this can result in an excessive number of results, whereas other cases result in an...
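The dominance definition in the abstract above can be illustrated with a minimal sketch. It assumes that smaller values are better in every dimension and uses a naive quadratic scan; the function names `dominates` and `skyline` and the sample data are hypothetical and are not taken from the paper, which does not describe its own algorithm in the visible excerpt.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a dominates b.

    Assumption: smaller is better in every dimension. a dominates b when
    a is no worse than b everywhere and strictly better somewhere.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical data: (price, distance to the beach) for five hotels.
hotels = [(50, 8), (60, 3), (80, 2), (90, 5), (55, 9)]
result = skyline(hotels)
print(result)  # [(50, 8), (60, 3), (80, 2)]
```

Here (90, 5) is dropped because (60, 3) is both cheaper and closer, and (55, 9) is dropped because (50, 8) is. The sketch also shows the limitation the abstract mentions: the size of the result depends entirely on the data, not on anything the caller can set.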
We have built a tool for inspecting and managing data lakes. The motivations for creating this tool are 1) schema discovery (determining links pertinent to solving a data analysis problem), 2) discovering high-risk links in data schemas that give rise to information security problems, and 3) discovering high-value relationships enabling data asset curation. The tool works by extracting metadata from...
In network science, several topology-based link prediction methods have been developed so far. The classic social network link prediction approach takes as input a snapshot of a whole network. However, with human activities behind it, this social network keeps changing. In this paper, we consider the link prediction problem as a time-series problem and propose a hybrid link prediction model that combines...
Investors have always been interested in stock price forecasting. Since the development of electronic media, hundreds of pieces of financial news are released on different media every day. Numerous studies have attempted to examine whether stock price forecasting through text mining technology and machine learning could lead to abnormal returns. However, few of them involved the discussion on whether...
Networks extracted from social media platforms frequently include multiple types of links that dynamically change over time; these links can be used to represent dyadic interactions such as economic transactions, communications, and shared activities. Organizing this data into a dynamic multiplex network, where each layer is composed of a single edge type linking the same underlying vertices, can...
Social networks are known to form on the basis of homophily, where nodes with some type of similar characteristics are more likely to be connected. Some of the most fundamental human characteristics are reflected by an individual's personality, which represents a persistent disposition governing a human's outlook and approach to diverse situations. While taking into account demographics of age and...
In this work we present a new local, vertex-level measure of community change. Our measure detects vertices that change community membership due to the actions (edges) of a vertex itself and not only due to global community shifts. The local nature of our measure is important for analyzing real graphs because communities may change to a large degree from one snapshot in time to the next. Using both...
As the anchor points for building social relationships in cyberspace, online social networks (OSNs) are an integral part of modern people's lives. Since different OSNs are designed to address specific social needs, people take part in multiple OSNs to cover different facets of their lives. While the fragmented pieces of information about a user in each OSN may be of limited use, serious privacy...
Traditional network classification techniques become computationally intractable when applied to a network that is presented in a streaming fashion with continuous updates. In this paper, we examine the problem of classification in dynamic streaming networks, or graphs. Two scenarios have been considered: the graph transaction scenario and the one large graph scenario. We propose a unified framework...
Cyberbullying is a major problem affecting more than half of all American teens. Prior work has largely focused on detecting cyberbullying after the fact. In this paper, we investigate the prediction of cyberbullying incidents on Instagram, a popular media-based social network. The novelty of this work is building a predictor that can anticipate the occurrence of cyberbullying incidents before they...