Search results

chapter

Semi-Supervised Multi-label Dimensionality Reduction

Baolin Guo, Chenping Hou, Feiping Nie, Dongyun Yi

2016 IEEE 16th International Conference on Data Mining (ICDM) > 919 - 924

2016 IEEE 16th International Conference on Data Mining (ICDM)

Multi-label data with high dimensionality arise frequently in data mining and machine learning. Using high-dimensional data directly is not only time-consuming but also computationally unreliable. Supervised dimensionality reduction approaches assume that large amounts of labeled data are available. It is infeasible to label a large number of training samples in practice...

chapter

DeBot: Twitter Bot Detection via Warped Correlation

Nikan Chavoshi, Hossein Hamooni, Abdullah Mueen

2016 IEEE 16th International Conference on Data Mining (ICDM) > 817 - 822

2016 IEEE 16th International Conference on Data Mining (ICDM)

We develop a warped correlation finder to identify correlated user accounts in social media websites such as Twitter. The key observation is that humans cannot be highly synchronous for a long duration; thus, highly synchronous user accounts are most likely bots. Existing bot detection methods are mostly supervised, require a large amount of labeled data to train, and do not consider cross-user...

chapter

Knowledge Graph Constraints for Multi-label Graph Classification

Martin Ringsquandl, Steffen Lamparter, Ingo Thon, Raffaello Lepratti, more

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 121 - 127

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

Graph classification methods have gained increasing attention in different domains, such as classifying functions of molecules or detecting bugs in software programs. Similarly, predicting events in manufacturing operations data can be compactly modeled as a graph classification problem. Feature representations of graphs are usually found by mining discriminative sub-graph patterns that are non-uniformly...

chapter

Interactive Independent Topic Analysis for Service

Takahiro Nishigaki, Katsumi Nitta, Takashi Onoda

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 560 - 567

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

In this paper, we propose an interactive constrained independent topic analysis for text mining. Independent Topic Analysis (ITA) is a method for extracting independent topics from document data using independent component analysis. ITA makes it possible to extract the topics that are most independent of one another. By extracting the independent topic, it is easy...

chapter

Efficient Algorithms for the Three Locus Problem in Genome-Wide Association Study

Sanguthevar Rajasekaran, Subrata Saha

2016 IEEE 16th International Conference on Data Mining (ICDM) > 1155 - 1160

2016 IEEE 16th International Conference on Data Mining (ICDM)

Thanks to recent advances in sequencing technology, thousands of genomes have been sequenced. This sequence data can be fruitfully employed in diagnosis, drug design, etc. Genome-wide Association Study (GWAS) focuses on this important problem of extracting useful information from genomic data. As an example, a comparison of different genomes could shed light on the causes of different diseases. Human...

chapter

Distributed Mining and Modeling of Dynamic Lead-Lag Relations in Evolving Entities

Tian Guo, Jean-Paul Calbimonte, Karl Aberer

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 45 - 52

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

Discovering and modeling lead-lag relations is a critical task in a variety of domains, including energy management, financial markets and environment monitoring. This task becomes more challenging when processing massive and highly dynamic data sources, often produced by sensors and live feeds that collect data about evolving entities in the real world. To cope with this data volume and velocity,...

chapter

Maximal Sequence Mining approach for topic detection from microblog streams

Fereshteh Jafariakinabad, Kien A. Hua

2016 IEEE Symposium Series on Computational Intelligence (SSCI) > 1 - 8

2016 IEEE Symposium Series on Computational Intelligence (SSCI)

The unprecedented expansion of user-generated content in recent years demands greater efforts at information filtering in order to extract high-quality information from the huge amount of available data. In particular, topic detection from microblog streams is the first step toward monitoring and summarizing social data. This task is challenging due to the short and noisy characteristics of microblog content...

chapter

Leveraging large sensor streams for robust cloud control

Alok Singh, Eric Stephan, Todd Elsethagen, Matt MacDuff, more

2016 IEEE International Conference on Big Data (Big Data) > 2115 - 2120

2016 IEEE International Conference on Big Data (Big Data)

Today's dynamic computing deployment for commercial and scientific applications is propelling us to an era where minor inefficiencies can snowball into significant performance and operational bottlenecks. Data center operations increasingly rely on sensor-based control systems for key decision insights. The increased sampling frequencies, cheaper storage costs and prolific deployment of sensors...

chapter

Discovering Spatial Regions of High Correlation

Prerna Agarwal, Richa Verma, Venkata M. V. Gunturi

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 1082 - 1089

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

Given a set of events of two different types (e.g. locations of crime incidents/road accidents) in geographic space and minimum density and area thresholds, spatial regions of high correlation discovery (RHC) aims to determine rectangular-shaped areas of high correlation between two event types. RHC discovery is important to many fields like transportation engineering, criminology, and epidemiology...

chapter

Method for Extraction of Purchase Behavior and Product Character Using Dynamic Topic Model

Mamoru Emoto

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 778 - 782

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

In this study, we focus on the extraction of latent topic transitions from POS data. POS analysis is conducted to obtain frequent patterns of customer behavior. The fundamental method for POS analysis is market basket analysis, which extracts the sets of products that are often bought at the same time. In market basket analysis, however, the effect of...

chapter

Discovering Multi-type Correlated Events with Time Series for Exception Detection of Complex Systems

Peng Xun, Pei-Dong Zhu, Cun-Lu Li, Hao-Yang Zhu

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 21 - 28

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

As systems grow more complex, exception detection becomes both more important and more difficult. For most complex systems, such as cloud platforms, exception detection is mainly conducted by analyzing a large amount of telemetry data collected from systems at runtime. Time series data and event data are two major types of telemetry data. Techniques of correlation analysis are important tools that...

chapter

Explore the Adequate and Concise Information from Communication Signals in Terms of Graphs

Kun Yan, Hsiao-Chun Wu, Hailin Xiao, Xiangli Zhang

2016 IEEE Global Communications Conference (GLOBECOM) > 1 - 6

GLOBECOM 2016 - 2016 IEEE Global Communications Conference

In this paper, a novel adequate and concise information extraction approach is explored to provide a promising alternative for manifesting the intrinsic structure of the cyclostationary signals, such as communication signals. A novel graph-based signal representation is proposed to interpret the spectral correlation function into a graph and its adjacency matrix. This graph can represent the proposed...

chapter

In pursuit of outliers in multi-dimensional data streams

Shiblee Sadik, Le Gruenwald, Eleazar Leal

2016 IEEE International Conference on Big Data (Big Data) > 512 - 521

2016 IEEE International Conference on Big Data (Big Data)

Among many Big Data applications are those that deal with data streams. A data stream is a sequence of data points with timestamps that possesses the properties of transiency, infiniteness, uncertainty, concept drift, and multi-dimensionality. In this paper we propose an outlier detection technique called Orion that addresses all the characteristics of data streams. Orion looks for a projected dimension...

chapter

Designing Sketches for Similarity Filtering

Vladimir Mic, David Novak, Pavel Zezula

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 655 - 662

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

The amount of data currently being produced emphasizes the importance of techniques for efficient data processing. Searching big data collections according to the similarity of data corresponds well to human perception. This paper focuses on similarity search using the concept of sketches, compact bit-string representations of data objects compared by Hamming distance, which can be used for filtering...

chapter

Predicting the Bursts of Data Access Streams by Filtering Correlated I/Os

Lifeng Huang, Yuhui Deng, Cheng Hu, Yongtao Zhou, more

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS) > 174 - 181

2016 IEEE 18th International Conference on High Performance Computing and Communications; IEEE 14th International Conference on Smart City; IEEE 2nd International Conference on Data Science and Systems (HPCC/SmartCity/DSS)

Bursty behavior normally indicates that the workload generated by data accesses arrives in short, uneven spurts. In order to handle the bursts, the physical resources of IT devices have to be configured to offer capacity that goes far beyond the average resource utilization, thus satisfying performance requirements. However, this kind of fat provisioning wastes resources when the system does...

chapter

Mining causality graph for automatic web-based service diagnosis

Xiaohui Nie, Youjian Zhao, Kaixin Sui, Dan Pei, more

2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC) > 1 - 8

2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC)

It is crucial for an Internet company to provide highly reliable web-based services. Web-based services typically have many components running in a large-scale infrastructure with complex interactions. Although diagnosis is an indispensable part of high reliability, it remains a thorny problem. With the growth of system scale and complexity, it becomes even more difficult. In this paper, we propose...

chapter

Explode: An Extensible Platform for Differentially Private Data Analysis

Emir Esmerdag, Mehmet Emre Gursoy, Ali Inan, Yucel Saygin

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW) > 1300 - 1303

2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)

Differential privacy (DP) has emerged as a popular standard for privacy protection and received great attention from the research community. However, practitioners often find DP cumbersome to implement, since it requires additional protocols (e.g., for randomized response, noise addition) and changes to existing database systems. To avoid these issues we introduce Explode, a platform for differentially...

chapter

A Study on Sports Tourism Competitiveness Based on Factor Analysis Method

Xinhua Li, Juntao Chen, Jinmei Zhan, Lin Liu

2016 12th International Conference on Computational Intelligence and Security (CIS) > 673 - 676

2016 12th International Conference on Computational Intelligence and Security (CIS)

In this paper, the key indicators of sports tourism competitiveness were selected through the factor analysis method. Using factor analysis in SPSS 22.0, the sports tourism data of each city and county of Hainan were analyzed, and the sports tourism competitiveness of each city and county was also evaluated in comprehensive scores...

chapter

Online learning of Contextual Hidden Markov Models for temporal-spatial data analysis

Yuxun Zhou, Reza Arghandeh, Costas J. Spanos

2016 IEEE 55th Conference on Decision and Control (CDC) > 6335 - 6341

2016 IEEE 55th Conference on Decision and Control (CDC)

The problem of mining a network of time series data arises naturally in many research areas, including energy systems, quantitative finance, bioinformatics, environmental monitoring, traffic monitoring, etc. Two emerging challenges shared by many of these applications are (1) the modeling of temporal-spatial dependence with contextual information and (2) the design of efficient learning algorithms...

chapter

Knowledge acquisition at the time of Big Data

Francis Rousseaux, Stephane Cormier

2016 Federated Conference on Computer Science and Information Systems (FedCSIS) > 1343 - 1348

2016 Federated Conference on Computer Science and Information Systems (FedCSIS)

What exactly is ‘Big Data’, and for what purpose and application is it really efficient? Between the commercial promises made by industrial actors and the Cassandra-like cautions of some whistle-blowers, we propose a singular Big Data field to investigate with Inductive Data-Driven Algorithms: developing collections. Last but not least, we investigate the innovative possibility to curate ‘figural’...

INFONA - science communication portal
