Smartcard data provide a wealth of information that is increasingly used nowadays. In the field of transport, they offer the opportunity to study passenger behavior, leading to a better knowledge of public transit demand and thereby giving transport operators the ability to adapt their transport offer and services accordingly, both in space and in time. In particular, an accurate characterization...
In many real-world applications there is a need to monitor the distribution of a population across different classes, and to track changes in this distribution over time. As an example, an important task is to monitor the percentage of unemployed adults in a given region. When the membership of an individual in a class cannot be established deterministically, a typical solution is the classification...
Online social networks have changed the ways in which people communicate and interact, and have also impacted the business landscape. One recent trend is firms using online social networks as a part of the job hiring process. Firms scrutinize potential employees using their social network profiles, sometimes even seeking access to restricted parts of the profile, for example by demanding applicants...
Feature selection is an important step in building predictive models for most real-world problems. Lasso is one of the most popular feature selection methods. However, it is unstable in selecting features when the features are correlated. In this work, we propose a new method that aims to increase the stability of Lasso by encouraging similarities between features based on their relatedness,...
The global financial crisis of 2007 and its severely damaging consequences for other financial markets worldwide show the importance of understanding the impact of, and contagion between, different financial markets. A variety of methods have been proposed and implemented to study market contagion. However, most of the existing literature simply tests for the existence of market contagion in financial crises,...
Modern organizations invest a lot of resources in recruiting, managing, and retaining people with high value and talent. In spite of several studies over the past fifty years, there is no silver bullet for talent management, since the area itself is constantly evolving due to the ever-changing nature of the enterprise in the knowledge economy. In this paper, we adopt an analytics-based approach to...
The availability on the Internet of huge amounts of blog posts, messages and comments makes it possible to study people's attitudes toward various topics. Sentiment Analysis, Opinion Mining and Emotion Analysis denote the area of research in Computer Science aimed at studying, analyzing and classifying text documents based on the opinions their authors express on various topics. While this is...
We investigate metric learning in the context of dynamic time warping (DTW), by far the most popular dissimilarity measure used for the comparison and analysis of motion capture data. While metric learning enables a problem-adapted representation of data, most methods have been proposed for vectorial data only. In this contribution, we extend the popular principle offered by the large margin...
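For readers unfamiliar with DTW, the sketch below shows the classic dynamic-programming recurrence between two multivariate sequences (for example, motion capture frames). It assumes a Euclidean local cost by default; the `local_cost` parameter is a hypothetical hook marking where a learned metric could be plugged in. It does not reproduce the paper's large-margin training.

```python
import math
from typing import Callable, Sequence

Frame = Sequence[float]


def dtw_distance(
    x: Sequence[Frame],
    y: Sequence[Frame],
    local_cost: Callable[[Frame, Frame], float] = math.dist,
) -> float:
    """Classic DTW between two multivariate sequences.

    local_cost is an illustrative hook (assumption): Euclidean by default,
    but a learned metric could be substituted here.
    """
    n, m = len(x), len(y)
    # acc[i][j] = minimal accumulated cost of aligning x[:i] with y[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(x[i - 1], y[j - 1])
            # Allowed steps: advance in x only, in y only, or in both (diagonal)
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

Because warping lets one frame align with several, `dtw_distance([[0.0], [1.0], [2.0]], [[0.0], [1.0], [1.0], [2.0]])` is `0.0` even though the sequences have different lengths.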
Comparing and classifying graphs are two essential steps in network analysis across many scientific and application domains. Here we deal with both operations by introducing the Hamming-Ipsen-Mikhailov (HIM) distance, a novel metric that quantifies the difference between two graphs sharing the same vertices. The new measure combines the local Hamming edit distance and the global...
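As an illustration of the local component only, the sketch below computes a Hamming distance between two graphs on the same N vertices, scaled to [0, 1]. It assumes binary adjacency matrices without self-loops and a normalization by N(N-1) over ordered vertex pairs; the global Ipsen-Mikhailov part and the way HIM combines the two are not reproduced here.

```python
from typing import Sequence


def normalized_hamming(a: Sequence[Sequence[int]], b: Sequence[Sequence[int]]) -> float:
    """Hamming distance between two graphs on the same N vertices, in [0, 1].

    Assumes binary adjacency matrices with no self-loops; mismatches are
    counted over all ordered pairs i != j and divided by N(N-1).
    """
    n = len(a)
    if n < 2 or len(b) != n:
        raise ValueError("need two adjacency matrices over the same N >= 2 vertices")
    mismatches = sum(
        abs(a[i][j] - b[i][j]) for i in range(n) for j in range(n) if i != j
    )
    return mismatches / (n * (n - 1))
```

With this scaling, a complete graph compared with an empty graph on the same vertices gives 1.0, and identical graphs give 0.0.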
Human behavior is predictable in principle: people are systematic in their everyday choices. This predictability can be used to plan events and infrastructure, both for the public good and for private gains. In this paper we investigate the largely unexplored relationship between the systematic behavior of a customer and their profitability for a retail company. We estimate a customer's behavioral entropy...
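The abstract is truncated before defining behavioral entropy. One common way to quantify how systematic someone is uses the Shannon entropy (in bits) of the empirical distribution of their choices. The minimal sketch below applies that assumption; the choice labels (for example, shops or products) are hypothetical, and this is not necessarily the authors' exact estimator.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def behavioral_entropy(choices: Iterable[Hashable]) -> float:
    """Shannon entropy (bits) of the empirical distribution of a customer's choices.

    Lower values mean more systematic behavior; this is an illustrative
    definition, assumed rather than taken from the paper.
    """
    counts = Counter(choices)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # sum of p * log2(1/p) over observed choices
    return sum((c / total) * math.log2(total / c) for c in counts.values())
```

A customer who always makes the same choice scores 0.0, while one who spreads purchases evenly over four options scores 2.0 bits.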
Efficient energy planning is a key feature of future smart cities. Real-time optimization of energy distribution and storage is the real added value of smart grids and smart cities. However, energy providers' existing infrastructures cannot estimate and predict real-time fluctuations in energy demand, and are not scalable enough to integrate, with low cost and effort, hardware...
Computer vision enables in-situ monitoring of animal populations at a lower cost and with less ecosystem disturbance than human observers. However, end-users may not fully understand computer vision uncertainty, and the uncertainty assessments performed by technology experts may not fully address end-user needs. This knowledge gap can yield misinterpretations of computer vision data, and...
A regionalization system delineates the geographical landscape into spatially contiguous, homogeneous units for landscape ecology research and applications. In this study, we investigated a quantitative approach for developing a regionalization system using constrained clustering algorithms. Unlike conventional clustering, constrained clustering uses domain constraints to help guide the clustering...
In real-world social networks, there is increasing interest in tracking the evolution of groups of users. Existing approaches track evolving communities sequentially over time, comparing communities by their nodes with a similarity measure such as the Jaccard index or a modified version of it. Such a measure supports one-to-one comparisons for matching communities. However, tracking...
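To make the Jaccard-based matching concrete, the sketch below compares each community in one snapshot against every community in the previous snapshot and keeps the most similar predecessor. The community identifiers and the 0.3 matching threshold are illustrative assumptions, not values from the paper.

```python
from typing import Dict, Optional, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_communities(
    prev: Dict[str, Set[int]],
    curr: Dict[str, Set[int]],
    threshold: float = 0.3,  # hypothetical matching threshold
) -> Dict[str, str]:
    """Map each community at time t to its most similar community at t-1."""
    matches: Dict[str, str] = {}
    for cid, nodes in curr.items():
        best_id: Optional[str] = None
        best_sim = 0.0
        # One-to-one comparison against every community of the previous snapshot
        for pid, prev_nodes in prev.items():
            sim = jaccard(nodes, prev_nodes)
            if sim > best_sim:
                best_id, best_sim = pid, sim
        # Communities with no sufficiently similar predecessor stay unmatched (new)
        if best_id is not None and best_sim >= threshold:
            matches[cid] = best_id
    return matches
```

Because every comparison involves exactly two communities, events such as a merge or split show up only indirectly, for example as several current communities mapping to the same predecessor.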
In recent years, several research works have collected context data from various sensors for activity inference. We observe that users perform several actions on their mobile phones: taking photos, performing check-ins, and accessing Wi-Fi networks. These actions generate spatial-temporal data that could be used to capture user activities. Spatial-temporal data could indicate...
As access to broadband continues to grow along with the now almost ubiquitous availability of mobile phones, the landscape of the e-content delivery space has never been so dynamic. To establish their position in the market, businesses are beginning to realize that understanding each of their customers' likes and dislikes is perhaps as important as the offered content itself. Further, a number of...
In recent years, the importance of identifying actionable patterns, from which decision-support actions can be derived, has become increasingly recognized. A typical shift is toward identifying high-utility rather than highly frequent patterns. Accordingly, High Utility Itemset (HUI) mining methods have become quite popular, as well as faster and more reliable than before. However,...
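The sketch below illustrates the difference between frequency and utility. It assumes the usual HUI setting, in which each transaction records purchased quantities and each item has a unit profit. The utility of an itemset is the sum, over transactions containing it, of quantity times profit for its items. The brute-force enumeration is for illustration only; practical HUI miners prune the search space, for example with transaction-weighted utility.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Transaction = Dict[str, int]  # item -> purchased quantity


def itemset_utility(
    itemset: Sequence[str], transactions: List[Transaction], profit: Dict[str, int]
) -> int:
    """Sum of quantity * unit profit of the itemset's items over transactions containing it."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(
    transactions: List[Transaction], profit: Dict[str, int], min_util: int
) -> Dict[FrozenSet[str], int]:
    """Brute-force enumeration of all itemsets with utility >= min_util (exponential)."""
    items = sorted({item for t in transactions for item in t})
    result: Dict[FrozenSet[str], int] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            u = itemset_utility(combo, transactions, profit)
            if u >= min_util:
                result[frozenset(combo)] = u
    return result
```

For instance, with hypothetical transactions `[{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3}]`, profits `{"a": 5, "b": 1, "c": 2}` and `min_util=10`, the itemset `{a, c}` qualifies (utility 12) although it occurs in only one transaction.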
Comparing two sets of multivariate samples is a central problem in data analysis. From a statistical standpoint, the simplest way to perform such a comparison is to resort to a non-parametric two-sample test (TST), which checks whether the two sets can be seen as i.i.d. samples of an identical unknown distribution (the null hypothesis). If the null is rejected, one wishes to identify regions accounting...
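As an illustration of a non-parametric two-sample test, the sketch below runs a permutation test. Under the null hypothesis the sample labels are exchangeable, so the observed statistic is compared against statistics computed on reshuffled labels. The statistic used here, the Euclidean distance between sample means, is a simple assumed choice that only detects mean shifts; kernel statistics such as MMD are a more powerful alternative. This sketch does not locate the regions responsible for a rejection.

```python
import math
import random
from typing import List, Sequence

Sample = Sequence[float]


def mean_distance(x: Sequence[Sample], y: Sequence[Sample]) -> float:
    """Euclidean distance between the empirical means of two multivariate samples."""
    dim = len(x[0])
    mean_x = [sum(p[k] for p in x) / len(x) for k in range(dim)]
    mean_y = [sum(p[k] for p in y) / len(y) for k in range(dim)]
    return math.dist(mean_x, mean_y)


def permutation_two_sample_test(
    x: Sequence[Sample], y: Sequence[Sample], n_perm: int = 1000, seed: int = 0
) -> float:
    """Permutation p-value for H0: x and y are i.i.d. draws from the same distribution."""
    rng = random.Random(seed)
    observed = mean_distance(x, y)
    pooled: List[Sample] = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Under H0 the labels are exchangeable, so reshuffle them
        rng.shuffle(pooled)
        if mean_distance(pooled[:n_x], pooled[n_x:]) >= observed:
            exceed += 1
    # +1 in numerator and denominator accounts for the observed labelling itself
    return (exceed + 1) / (n_perm + 1)
```

A small p-value (for example, below 0.05) leads to rejecting the null hypothesis that both sets come from the same distribution.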
We consider a novel distributed learning problem: A server receives potentially unlimited data from clients in a sequential manner, but only a small initial fraction of these data are labeled. Because communication bandwidth is expensive, each client is limited to sending the server only a small (high-priority) fraction of the unlabeled data it generates, and the server is limited in the amount of...
Because of its wide range of applications, e.g., GPS applications and location-based services, spatial pattern discovery is an important task in data mining. A co-location pattern is defined as a subset of spatial items whose instances are often located together in spatial proximity. Current co-location mining algorithms are unable to quantify the spatial proximity of a co-location pattern. We propose...
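For context, the sketch below computes the participation index commonly used in co-location mining to measure how prevalent a pattern is. It illustrates the conventional measure, not the proximity measure the authors propose. It assumes a size-2 pattern, planar point instances, and a hypothetical neighbor distance threshold `d`. Each feature's participation ratio is the fraction of its instances that have a neighbor of the other feature, and the index is the minimum of the two ratios.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def participation_index(
    instances: Dict[str, List[Point]], f1: str, f2: str, d: float
) -> float:
    """Participation index of the size-2 co-location pattern {f1, f2}.

    Two instances are neighbors when their Euclidean distance is <= d
    (d is an illustrative threshold).
    """
    a, b = instances[f1], instances[f2]
    if not a or not b:
        return 0.0
    part_a, part_b = set(), set()
    for i, p in enumerate(a):
        for j, q in enumerate(b):
            if math.dist(p, q) <= d:
                # Both instances take part in at least one row instance of the pattern
                part_a.add(i)
                part_b.add(j)
    # Participation ratio per feature; the index is the minimum ratio
    return min(len(part_a) / len(a), len(part_b) / len(b))
```

Note that the index only counts whether instances have neighbors within `d`. Two patterns whose instances sit at very different distances below the threshold receive the same score, which illustrates the gap the abstract points to.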