2016 IEEE 16th International Conference on Data Mining (ICDM)

chapter

Robust Convex Clustering Analysis

Qi Wang, Pinghua Gong, Shiyu Chang, Thomas S. Huang, et al.

2016 IEEE 16th International Conference on Data Mining (ICDM) > 1263 - 1268

Clustering is an unsupervised learning approach that explores data and seeks groups of similar objects. Many classical clustering models such as k-means and DBSCAN are based on heuristic algorithms and suffer from locally optimal solutions and numerical instability. Recently, convex clustering has received increasing attention; it leverages sparsity-inducing norms and enjoys many attractive...

chapter

Event Grounding from Multimodal Social Network Fusion

Hyunsouk Cho, Jinyoung Yeo, Seung-Won Hwang

2016 IEEE 16th International Conference on Data Mining (ICDM) > 835 - 840

This paper studies the problem of extracting real-world event information from social media streams. Although existing work focuses on event signals of bursty mentions extracted from a single source of textual streams, these signals are likely to be noisy due to ambiguous occurrences of individual mentions. To extract accurate event signals, we propose a framework capable of "grounding"...

chapter

A Theoretical Analysis of the Fuzzy K-Means Problem

Johannes Blömer, Sascha Brauer, Kathrin Bujna

2016 IEEE 16th International Conference on Data Mining (ICDM) > 805 - 810

One of the most popular fuzzy clustering techniques is the fuzzy K-means algorithm (also known as fuzzy c-means or the FCM algorithm). In contrast to the K-means and K-median problems, the underlying fuzzy K-means problem has not been studied from a theoretical point of view. In particular, there are no algorithms with approximation guarantees similar to the famous K-means++ algorithm known for the fuzzy...

chapter

Interpretable Clustering via Discriminative Rectangle Mixture Model

Junxiang Chen, Yale Chang, Brian Hobbs, Peter Castaldi, et al.

2016 IEEE 16th International Conference on Data Mining (ICDM) > 823 - 828

Clustering is a technique that is usually applied as a tool for exploratory data analysis. Because of the exploratory nature of this task, it would be beneficial if a clustering method generated interpretable results and allowed domain knowledge to be incorporated. This motivates us to develop a probabilistic discriminative model that learns a rectangular decision rule for each cluster, which we call Discriminative...

chapter

A Combinatorial Approach to Role Discovery

Albert Arockiasamy, Aristides Gionis, Nikolaj Tatti

2016 IEEE 16th International Conference on Data Mining (ICDM) > 787 - 792

We provide a new formulation for the problem of role discovery in graphs. Our definition is structural: two vertices should be assigned to the same role if the roles of their neighbors, when viewed as multi-sets, are similar enough. An attractive characteristic of our approach is that it is based on optimizing a well-defined objective function, and thus, contrary to previous approaches, the role-discovery...

chapter

DESQ: Frequent Sequence Mining with Subsequence Constraints

Kaustubh Beedkar, Rainer Gemulla

2016 IEEE 16th International Conference on Data Mining (ICDM) > 793 - 798

Frequent sequence mining methods often make use of constraints to control which subsequences should be mined, e.g., length, gap, span, regular-expression, and hierarchy constraints. We show that many subsequence constraints—including and beyond those considered in the literature—can be unified in a single framework. In more detail, we propose a set of simple and intuitive "pattern expressions"...

chapter

Efficient Sampling-Based Kernel Mean Matching

Swarup Chandra, Ahsanul Haque, Latifur Khan, Charu Aggarwal

2016 IEEE 16th International Conference on Data Mining (ICDM) > 811 - 816

Many real-world applications exhibit scenarios where the distributions represented by training and test data are not similar but are related by a covariate shift, i.e., they have equal class-conditional distributions but unequal covariate distributions. Traditional data mining techniques struggle to learn a good predictive model in the presence of covariate shift. Recent studies have proposed approaches to address...

chapter

Mining Summaries for Knowledge Graph Search

Qi Song, Yinghui Wu, Xin Luna Dong

2016 IEEE 16th International Conference on Data Mining (ICDM) > 1215 - 1220

Mining and searching large, heterogeneous knowledge graphs is challenging under real-world resource constraints such as response time. This paper studies a framework that discovers summaries to facilitate knowledge graph search. 1) We introduce a class of summaries characterized by graph patterns. In contrast to conventional summaries defined by frequent subgraphs, the summaries are capable of adaptively...

chapter

Structure Selection for Convolutive Non-negative Matrix Factorization Using Normalized Maximum Likelihood Coding

Atsushi Suzuki, Kohei Miyaguchi, Kenji Yamanishi

2016 IEEE 16th International Conference on Data Mining (ICDM) > 1221 - 1226

Convolutive non-negative matrix factorization (CNMF) is a promising method for extracting features from sequential multivariate data. Conventional algorithms for CNMF require that the structure, or the number of bases for expressing the data, be specified in advance. We are concerned with the issue of how we can select the best structure of CNMF from given data. We first introduce a framework of probabilistic...

chapter

POI Recommendation: A Temporal Matching between POI Popularity and User Regularity

Zijun Yao, Yanjie Fu, Bin Liu, Yanchi Liu, et al.

2016 IEEE 16th International Conference on Data Mining (ICDM) > 549 - 558

Point-of-interest (POI) recommendation, which provides personalized recommendations of places to mobile users, is an important task in location-based social networks (LBSNs). However, unlike traditional interest-oriented merchandise recommendation, POI recommendation is more complex due to timing effects: we need to examine whether the POI fits a user's availability. While there are...

chapter

A Robust Framework for Classifying Evolving Document Streams in an Expert-Machine-Crowd Setting

Muhammad Imran, Sanjay Chawla, Carlos Castillo

2016 IEEE 16th International Conference on Data Mining (ICDM) > 961 - 966

An emerging challenge in the online classification of social media data streams is to keep the categories used for classification up-to-date. In this paper, we propose an innovative framework based on an Expert-Machine-Crowd (EMC) triad to help categorize items by continuously identifying novel concepts in heterogeneous data streams often riddled with outliers. We unify constrained clustering and...

chapter

Personalized Ranking in Signed Networks Using Signed Random Walk with Restart

Jinhong Jung, Woojeong Jin, Lee Sael, U Kang

2016 IEEE 16th International Conference on Data Mining (ICDM) > 973 - 978

How can we rank users in signed social networks? Relationships between nodes in a signed network are represented as positive (trust) or negative (distrust) edges. Many social networks have adopted signed networks to express trust between users. Consequently, ranking friends or enemies in signed networks has received much attention from the data mining community. The ranking problem, however, is challenging...

chapter

ExploreKit: Automatic Feature Generation and Selection

Gilad Katz, Eui Chul Richard Shin, Dawn Song

2016 IEEE 16th International Conference on Data Mining (ICDM) > 979 - 984

Feature generation is one of the challenging aspects of machine learning. We present ExploreKit, a framework for automated feature generation. ExploreKit generates a large set of candidate features by combining information in the original features, with the aim of maximizing predictive performance according to user-selected criteria. To overcome the exponential growth of the feature space, ExploreKit...

chapter

Mining Statistically Significant Attribute Associations in Attributed Graphs

Jihwan Lee, Keehwan Park, Sunil Prabhakar

2016 IEEE 16th International Conference on Data Mining (ICDM) > 991 - 996

Graphs are widely used to represent many different kinds of real-world data such as social networks, protein-protein interactions, and road networks. In many cases, each node in a graph is associated with a set of attributes, and it is critical not only to consider the link structure of a graph but also to use the attribute information to achieve more meaningful results in various graph mining tasks. Most...

chapter

Steering Social Media Promotions with Effective Strategies

Kun Kuang, Meng Jiang, Peng Cui, Shiqiang Yang

2016 IEEE 16th International Conference on Data Mining (ICDM) > 985 - 990

On social media platforms, companies, organizations, and individuals use the sharing or retweeting function to promote their products, policies, and ideas. While a growing body of research has focused on identifying the promoters among millions of users, the promoters themselves want to know which strategies can improve promotional effectiveness, which is rarely studied in...

chapter

Time-Aware User Identification with Topic Models

Clément Lesaege, François Schnitzler, Anne Lambert, Jean-Ronan Vigouroux

2016 IEEE 16th International Conference on Data Mining (ICDM) > 997 - 1002

Accounts are often shared by multiple users, each of them having different item consumption and temporal habits. Identifying the active user can lead to improvements in a variety of services by switching from account-personalized services to user-personalized services. To do so, we develop a topic model extending Latent Dirichlet Allocation using a hidden variable representing the active user and...

chapter

Learning Deep Networks from Noisy Labels with Dropout Regularization

Ishan Jindal, Matthew Nokleby, Xuewen Chen

2016 IEEE 16th International Conference on Data Mining (ICDM) > 967 - 972

Large datasets often have unreliable labels—such as those obtained from Amazon's Mechanical Turk or social media platforms—and classifiers trained on mislabeled datasets often exhibit poor performance. We present a simple, effective technique for accounting for label noise when training deep neural networks. We augment a standard deep network with a softmax layer that models the label noise statistics...

chapter

Outlier Detection from Network Data with Subnetwork Interpretation

Xuan-Hong Dang, Arlei Silva, Ambuj Singh, Ananthram Swami, et al.

2016 IEEE 16th International Conference on Data Mining (ICDM) > 847 - 852

Detecting a small number of outliers from a set of data observations is always challenging. This problem is more difficult in the setting of multiple network samples, where computing the anomalous degree of a network sample is generally not sufficient. Explaining why a given network is exceptional, in the form of a subnetwork, is equally important. We develop a novel algorithm...

chapter

Service Usage Analysis in Mobile Messaging Apps: A Multi-label Multi-view Perspective

Yanjie Fu, Junming Liu, Xiaolin Li, Xinjiang Lu, et al.

2016 IEEE 16th International Conference on Data Mining (ICDM) > 877 - 882

Service usage analysis, which aims to identify customers' messaging behaviors based on encrypted App traffic flows, has become a challenging and emerging task for service providers. Prior literature usually starts by segmenting a traffic sequence into single-usage subsequences and then classifies the subsequences into different usage types. However, such approaches can suffer from inaccurate traffic segmentations...