In the literature, a number of methods have been proposed for semi-supervised learning. Recently, graph-based methods of semi-supervised learning have become popular because of their capability of handling large amounts of unlabeled data. However, existing graph-based semi-supervised learning algorithms do not optimize the process of selecting better labeled data. We have developed a new selective...
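The abstract is cut off before the authors' selective method is described. For orientation only, here is a minimal sketch of generic iterative label propagation, a standard graph-based semi-supervised baseline; it is not the authors' method, and the function name, the edge-list input and the fixed iteration count are illustrative assumptions.

```python
def label_propagation(edges, labels, n_iter=100):
    """Generic iterative label propagation (illustrative baseline, not the paper's method).

    edges: list of (u, v, weight) tuples describing an undirected graph.
    labels: dict node -> class for the labeled subset; these scores stay clamped.
    Returns a dict node -> predicted class for every node.
    """
    neighbors = {}
    for u, v, w in edges:
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))
    for n in labels:
        neighbors.setdefault(n, [])  # labeled nodes may be isolated
    classes = sorted(set(labels.values()))
    # Each node holds one score per class; labeled nodes start as one-hot vectors.
    scores = {n: {c: 0.0 for c in classes} for n in neighbors}
    for n, c in labels.items():
        scores[n][c] = 1.0
    for _ in range(n_iter):
        updated = {}
        for n, nbrs in neighbors.items():
            total = sum(w for _, w in nbrs)
            if n in labels or total == 0:
                updated[n] = scores[n]  # clamp known labels; leave isolated nodes as-is
                continue
            # Unlabeled node: weighted average of its neighbors' class scores.
            updated[n] = {
                c: sum(w * scores[m][c] for m, w in nbrs) / total for c in classes
            }
        scores = updated
    return {n: max(s, key=s.get) for n, s in scores.items()}


edges = [("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0), ("d", "e", 1.0)]
result = label_propagation(edges, {"a": "spam", "e": "ham"})
print(result["b"], result["d"])
```

On this path graph, the unlabeled node next to each labeled endpoint inherits that endpoint's class, which is the smoothness assumption graph-based methods rely on.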
Sparse subspace clustering (SSC) is an effective approach to clustering high-dimensional data. However, how to adaptively select the number of clusters/eigenvectors for different data sets, especially when the data are corrupted by noise, is a major challenge in SSC and an open problem in the field of data mining. In this paper, considering the fact that the eigenvectors are robust to noise, we develop...
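The abstract breaks off before describing the proposed selection rule. As a point of comparison only, the classic eigengap heuristic from spectral clustering picks the number of clusters k where the gap between consecutive Laplacian eigenvalues is largest. The sketch below assumes the caller has already computed the eigenvalues of the graph Laplacian built from the SSC affinity matrix; the function name is hypothetical.

```python
def eigengap_num_clusters(eigenvalues, max_k=None):
    """Classic eigengap heuristic (a baseline, not the paper's method).

    eigenvalues: eigenvalues of a graph Laplacian, in any order.
    Returns the 1-indexed k that maximizes lambda_{k+1} - lambda_k.
    """
    vals = sorted(eigenvalues)
    if len(vals) < 2:
        return len(vals)
    limit = len(vals) - 1 if max_k is None else min(max_k, len(vals) - 1)
    # gaps[i] is the gap right after the (i + 1)-th smallest eigenvalue.
    gaps = [vals[i + 1] - vals[i] for i in range(limit)]
    return max(range(len(gaps)), key=gaps.__getitem__) + 1


k = eigengap_num_clusters([0.0, 0.01, 0.02, 0.9, 1.1, 1.3])
print(k)
```

Three near-zero eigenvalues followed by a large jump suggest three clusters. This heuristic is known to degrade when noise blurs the gap, which is the difficulty the abstract points to.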
The widespread adoption of Twitter during election campaigns has created a valuable opportunity to monitor political deliberation nationwide. Recent work has analyzed online attention to forecast election results, addressing some limitations of opinion polling. However, the reproducibility of such methods remains a challenge given that most of them rely on the number of mentions of political parties or candidates...
Social-media debates on longitudinal political topics often take the form of adversarial discussions: highly polarized user posts, favoring one of two opposing parties, over an extended time period. Recent prominent cases are the US Presidential campaign and the UK Brexit referendum. This paper approaches such discussions as a multi-faceted data space, and applies data mining to identify interesting...
Social surveys have been used by researchers and policy makers as an essential tool for understanding social and political activities in society. Social media has introduced a new way of capturing data from large numbers of people. Unlike surveys, social media deliver data more rapidly and cheaply. In this paper, we aim to rapidly identify socio-political activity in South Africa using proxy data...
In this work we demonstrate a method to detect controversy on news issues. This is done by analyzing people's reactions on social media to news articles reporting these issues. Detecting controversial news topics on the web is a relevant problem today. It helps to identify the issues on which people have divided opinions and is especially useful for topics such as a presidential election,...
The use of the rating of perceived exertion (RPE) as a measure of internal load has become a common methodology in team sports owing to its low cost. The aim of this study was to build a machine learning process able to describe players' RPE from the external load extracted from GPS data. In this paper, we propose a multidimensional approach to assess the RPE in professional soccer which is based on GPS measurements and machine...
Advanced statistics have proved to be a crucial tool for basketball coaches in order to improve training skills. Indeed, the performance of the team can be further optimized by studying the behaviour of players under certain conditions. In the United States of America, companies such as STATS or Second Spectrum use a complex multi-camera setup to deliver advanced statistics to all NBA teams, but the...
Online data provide a way to monitor how users behave in social systems like social networks and online games, and understand which features turn an ordinary individual into a successful one. Here, we propose to study individual performance and success in Multiplayer Online Battle Arena (MOBA) games. Our purpose is to identify those behaviors and playing styles that are characteristic of players with...
In this paper, we propose a very simple method for learning relationships between events by accounting for the spatial or temporal sequence in which the events occur. The underlying idea behind our proposed method is that for certain data processing applications, such as data collected from retail shoppers, relational access to data is more useful and immediately informative than sequential access...
Contrast patterns are itemsets that occur frequently in one dataset but not in another. These patterns have been successfully applied to many data mining tasks, such as prediction, classification and clustering. However, none of the previous studies has considered extracting contrast patterns from different types of datasets. In this paper, we introduce a new type of contrast pattern, Conditional...
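The definition in the first sentence can be made concrete with support thresholds. The sketch below is a brute-force illustration under assumed thresholds (`min_sup` in the first dataset, `max_other` in the second) and a hypothetical `max_len` cap on itemset size; it implements only the basic notion of a contrast pattern, not the Conditional variant the paper introduces.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    s = set(itemset)
    return sum(1 for t in transactions if s <= set(t)) / len(transactions)


def contrast_patterns(d1, d2, min_sup=0.5, max_other=0.1, max_len=2):
    """Brute-force search for itemsets frequent in d1 but rare in d2.

    Thresholds are illustrative assumptions; real miners prune the search
    space instead of enumerating every combination.
    """
    items = sorted({i for t in d1 for i in t})
    found = []
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            s1 = support(itemset, d1)
            s2 = support(itemset, d2)
            if s1 >= min_sup and s2 <= max_other:
                found.append((itemset, s1, s2))
    return found


d1 = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
d2 = [{"b", "c"}, {"c"}, {"b"}]
patterns = contrast_patterns(d1, d2)
for itemset, s1, s2 in patterns:
    print(itemset, round(s1, 2), s2)
```

Here `b` and `c` are equally common in both datasets, so only itemsets containing `a` survive as contrasts.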
Forecasting models that utilize multiple predictors are gaining popularity in a variety of fields. In some cases they allow constructing more precise forecasting models, leveraging the predictive potential of many variables. Unfortunately, in practice we do not know which observed predictors have a direct impact on the target variable. Moreover, adding unrelated variables may diminish the quality...
Agent-based modeling is a paradigm for modeling dynamic systems of interacting agents that are individually governed by specified behavioral rules. Training such agents to produce an emergent behavior by specifying the emergent behavior (as opposed to agent-level behavior) is easier from a demonstration perspective. Without the involvement of manual behavior specification via code or reliance on...
Identifying meaningful signal buried in noise is a problem of interest arising in diverse scenarios of data-driven modeling. We present here a theoretical framework for exploiting intrinsic geometry in data that resists noise corruption, and might be identifiable under severe obfuscation. Our approach is based on uncovering a valid complete inner product on the space of ergodic stationary finite valued...
With the rapid development of data mining in social computing systems (e.g., crowdsourcing systems), statistical inference from crowdsourced data serves as a powerful tool for providing diversified services. To support critical applications (e.g., recommendation), in this paper we focus on collaborative ranking problems and construct a system whose input is crowdsourced pairwise comparisons...
The paper considers the problem of feature selection in learning using privileged information (LUPI), where some of the features (referred to as privileged ones) are only available for training, while being absent for test data. In the latest implementation of LUPI, these privileged features are approximated using regressions constructed on standard data features, but this approach could lead to polluting...
Performing statistical inference on collections of graphs is important in many disciplines. Graph embedding, in which the vertices of a graph are mapped to vectors in a low-dimensional Euclidean space, has gained traction as a basic tool for graph analysis. Here we describe an omnibus embedding in which multiple graphs on the same vertex set are jointly embedded into a single space with a distinct...
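The joint embedding starts from a single matrix that stacks all graphs. Assuming the standard omnibus construction, for m adjacency matrices A_1, ..., A_m on the same n vertices the omnibus matrix is mn × mn with block (i, j) equal to (A_i + A_j) / 2. The sketch below builds only that matrix in pure Python; the spectral embedding step that follows (a truncated eigendecomposition) is omitted, and the function name is hypothetical.

```python
def omnibus_matrix(adjs):
    """Build the mn x mn omnibus matrix from m adjacency matrices on n vertices.

    Block (i, j) is (A_i + A_j) / 2, so the diagonal blocks are the A_i
    themselves and the matrix is symmetric whenever every A_i is.
    """
    m = len(adjs)
    n = len(adjs[0])
    size = m * n
    M = [[0.0] * size for _ in range(size)]
    for i in range(m):
        for j in range(m):
            for r in range(n):
                for c in range(n):
                    M[i * n + r][j * n + c] = (adjs[i][r][c] + adjs[j][r][c]) / 2
    return M


A1 = [[0, 1], [1, 0]]  # a single edge between vertices 0 and 1
A2 = [[0, 0], [0, 0]]  # the same two vertices with no edges
M = omnibus_matrix([A1, A2])
for row in M:
    print(row)
```

Embedding M yields one vector per (graph, vertex) pair, all in the same space, which is what allows vertices to be compared directly across graphs.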
Given a stream of heterogeneous edges, comprising different types of nodes and edges, which arrive in an interleaved fashion to multiple different graphs evolving simultaneously, how can we spot the anomalous graphs in real-time using only constant memory? This problem is motivated by and generalizes from its application in security to host-level advanced persistent threat (APT) detection. In this...
Graph data management and mining in HPC environments has been a widely discussed issue in recent times. In this talk I will describe the use of Partitioned Global Address Space languages for graph data mining and management. I will first discuss the rationale behind X10 based graph libraries and graph database benchmarks using ScaleGraph and XGDBench as examples. Next, I will take Acacia which is...
Networks naturally capture a host of real-world interactions, from social interactions and email communication to brain activity. However, graphs are not always directly observed, especially in scientific domains, such as neuroscience, where monitored brain activity is often captured as time series. How can we efficiently infer networks from time series data (e.g., model the functional organization...
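The abstract is truncated before the proposed inference method, so the following is only a common baseline for comparison: link two nodes when the absolute Pearson correlation of their time series exceeds a threshold. The 0.8 threshold and the function names are illustrative assumptions, and constant series (zero variance) are not handled.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_network(series, threshold=0.8):
    """Return edges (u, v, r) for node pairs whose |r| >= threshold.

    series: dict node -> list of samples, all of equal length.
    """
    edges = []
    for u, v in combinations(sorted(series), 2):
        r = pearson(series[u], series[v])
        if abs(r) >= threshold:
            edges.append((u, v, r))
    return edges


series = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 1, 4, 2, 3]}
edges = correlation_network(series)
print(edges)
```

This pairwise approach scales quadratically in the number of nodes and captures only linear, zero-lag dependence, which is part of why more efficient and expressive inference methods are of interest.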