2008 IEEE International Conference on Data Mining Workshops

Items 1 to 18 of 18 results

chapter

Hunting for Coherent Co-clusters in High Dimensional and Noisy Datasets

M. Deodhar, J. Ghosh, G. Gupta, Hyuk Cho, et al.

2008 IEEE International Conference on Data Mining Workshops > 654 - 663

Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters from such datasets. Additionally,...

chapter

Efficient Distance Computation Using SQL Queries and UDFs

S.K. Pitchaimalai, C. Ordonez, C. Garcia-Alvarado

2008 IEEE International Conference on Data Mining Workshops > 533 - 542

Distance computation is one of the most computationally intensive operations employed by many data mining algorithms. Performing such matrix computations within a DBMS creates many optimization challenges. We propose techniques to efficiently compute Euclidean distance using SQL queries and user-defined functions (UDFs). We concentrate on efficient Euclidean distance computation for the well-known...
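The abstract names two routes: plain SQL queries and UDFs. A rough illustration of both follows, written with Python's built-in sqlite3 module instead of the DBMS used in the paper. The tables X and C, their columns, the two-dimensional toy data and the UDF name euclid2 are all hypothetical. The paper's own query formulations and optimizations are not reproduced.

```python
import math
import sqlite3

# Hypothetical tables in a "horizontal" layout: one row per vector, d = 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE X (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")  # data points
conn.execute("CREATE TABLE C (j INTEGER PRIMARY KEY, c1 REAL, c2 REAL)")  # e.g. centroids
conn.executemany("INSERT INTO X VALUES (?, ?, ?)", [(1, 0.0, 0.0), (2, 3.0, 4.0)])
conn.executemany("INSERT INTO C VALUES (?, ?, ?)", [(1, 0.0, 0.0), (2, 6.0, 8.0)])

# Route 1: a plain SQL query. The cross join pairs every point with every
# centroid, and the squared distance is an arithmetic expression per pair.
sql_rows = conn.execute(
    """
    SELECT X.i, C.j,
           (X.x1 - C.c1) * (X.x1 - C.c1) + (X.x2 - C.c2) * (X.x2 - C.c2) AS d2
    FROM X CROSS JOIN C
    ORDER BY X.i, C.j
    """
).fetchall()
print(sql_rows)  # [(1, 1, 0.0), (1, 2, 100.0), (2, 1, 25.0), (2, 2, 25.0)]


# Route 2: a scalar UDF written in Python and registered with SQLite;
# the engine calls it once per joined row.
def euclid2(x1: float, x2: float, c1: float, c2: float) -> float:
    """Euclidean (not squared) distance between two 2-D vectors."""
    return math.sqrt((x1 - c1) ** 2 + (x2 - c2) ** 2)


conn.create_function("euclid2", 4, euclid2)
udf_rows = conn.execute(
    "SELECT X.i, C.j, euclid2(X.x1, X.x2, C.c1, C.c2) "
    "FROM X CROSS JOIN C ORDER BY X.i, C.j"
).fetchall()
print(udf_rows)  # [(1, 1, 0.0), (1, 2, 10.0), (2, 1, 5.0), (2, 2, 5.0)]
```

Route 1 keeps all the arithmetic inside the query engine. Route 2 moves it into host-language code, which is easier to write but adds a function call per row. No claim is made here about which is faster in a given DBMS.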

chapter

Actionable Knowledge Discovery for Threats Intelligence Support Using a Multi-dimensional Data Mining Methodology

O. Thonnard, M. Dacier

2008 IEEE International Conference on Data Mining Workshops > 154 - 163

This paper describes a multi-dimensional knowledge discovery and data mining (KDD) methodology that aims at discovering actionable knowledge related to Internet threats, taking into account domain expert guidance and the integration of domain-specific intelligence during the data mining process. The objectives are twofold: i) to develop global indicators for assessing the prevalence of certain malicious...

chapter

Multiple-Instance Regression with Structured Data

K.L. Wagstaff, T. Lane, A. Roper

2008 IEEE International Conference on Data Mining Workshops > 291 - 300

We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents...

chapter

Extension of Partitional Clustering Methods for Handling Mixed Data

Y. Naija, S. Chakhar, K. Blibech, R. Robbana

2008 IEEE International Conference on Data Mining Workshops > 257 - 266

Clustering is an active research topic in data mining and different methods have been proposed in the literature. Most of these methods are based on the use of a distance measure defined either on numerical attributes or on categorical attributes. However, in fields such as road traffic and medicine, datasets are composed of numerical and categorical attributes. Recently, there have been several proposals...
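The abstract is cut off before it describes the proposed extension, so the sketch below does not reproduce it. It shows k-prototypes (Huang) instead, a well-known partitional method for mixed data. Its dissimilarity adds the squared Euclidean distance on numerical attributes to a weighted count of categorical mismatches. Prototypes are updated with means for numerical attributes and modes for categorical ones. The weight gamma, the record layout and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# A mixed record: (numerical attributes, categorical attributes).
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(a: Record, b: Record, gamma: float = 1.0) -> float:
    """Squared Euclidean distance on the numerical part plus gamma times
    the number of mismatched categorical values (k-prototypes style)."""
    numeric = sum((x - y) ** 2 for x, y in zip(a[0], b[0]))
    mismatches = sum(1 for x, y in zip(a[1], b[1]) if x != y)
    return numeric + gamma * mismatches


def update_prototype(members: List[Record]) -> Record:
    """Mean of each numerical attribute, mode of each categorical attribute."""
    n = len(members)
    means = [sum(m[0][r] for m in members) / n for r in range(len(members[0][0]))]
    modes = [Counter(m[1][s] for m in members).most_common(1)[0][0]
             for s in range(len(members[0][1]))]
    return (means, modes)


def k_prototypes(data: List[Record], protos: List[Record],
                 gamma: float = 1.0, iters: int = 20):
    """Alternate nearest-prototype assignment and prototype updates."""
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(len(protos)),
                      key=lambda k: mixed_dissimilarity(x, protos[k], gamma))
                  for x in data]
        new_protos = []
        for k, old in enumerate(protos):
            members = [x for x, lab in zip(data, labels) if lab == k]
            # An empty cluster keeps its previous prototype.
            new_protos.append(update_prototype(members) if members else old)
        if new_protos == protos:
            break
        protos = new_protos
    return labels, protos


data = [([1.0], ["red"]), ([1.2], ["red"]), ([8.0], ["blue"]), ([8.5], ["blue"])]
labels, protos = k_prototypes(data, [data[0], data[2]])
print(labels)  # [0, 0, 1, 1]
```

The value of gamma sets how much a single categorical mismatch weighs against numerical distance. In practice it would be tuned to the scale of the numerical attributes.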

chapter

Word Sense Discovery for Web Information Retrieval

T. Nykiel, H. Rybinski

2008 IEEE International Conference on Data Mining Workshops > 267 - 274

Word meaning disambiguation has always been an important problem in many computer science tasks, such as information retrieval and extraction. One of the problems faced in automatic word sense discovery is the number of different senses a word can have. Often, senses are dominated by other, more frequent ones. Discovering such dominated meanings can significantly improve the quality of many text-related...

chapter

Clustering Events on Streams Using Complex Context Information

YongChul Kwon, Wing Yee Lee, M. Balazinska, Guiping Xu

2008 IEEE International Conference on Data Mining Workshops > 238 - 247

Monitoring applications play an increasingly important role in many domains. They detect events in monitored systems and take actions such as invoking a program or notifying an administrator. Often, administrators must then manually investigate events to find the source of a problem. Stream processing engines (SPEs) are general-purpose data management systems for monitoring applications. They provide...

chapter

Distributed Data Mining Models as Services on the Grid

E. Cesario, D. Talia

2008 IEEE International Conference on Data Mining Workshops > 486 - 495

This paper describes how distributed data mining models, such as collective learning, ensemble learning, and meta-learning models, can be implemented as WSRF mining services by exploiting the Grid infrastructure. Our goal is to design a general distributed architectural model that can be exploited for different distributed mining algorithms deployed as Grid services for the analysis of dispersed data...

chapter

Risk Assessment of Atmospheric Hazard Releases Using K-Means Clustering

G. Cervone, P. Franzese, Y. Ezber, Z. Boybeyi

2008 IEEE International Conference on Data Mining Workshops > 342 - 348

Unsupervised machine learning algorithms are used to perform statistical analysis of several transport and dispersion model runs which simulate emissions from a fixed source under different atmospheric conditions. A clustering algorithm is used to automatically group the results of the transport and dispersion simulations according to their respective cloud characteristics. Each cluster of clouds...
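The abstract does not give the feature set or the initialization used, so this is only a generic sketch of Lloyd's K-means algorithm in plain Python. Each point stands for a hypothetical feature vector summarizing one simulated cloud. The random initialization, the iteration cap and the function names are assumptions.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, until nothing changes."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k input points chosen at random
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2-D cloud descriptors forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

K-means only finds a local optimum, so real analyses usually run it from several seeds and keep the run with the lowest within-cluster sum of squares.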

chapter

Using Betweenness Centrality to Identify Manifold Shortcuts

W.J. Cukierski, D.J. Foran

2008 IEEE International Conference on Data Mining Workshops > 949 - 958

High-dimensional data presents a significant challenge to a broad spectrum of pattern recognition and machine-learning applications. Dimensionality reduction (DR) methods serve to remove unwanted variance and make such problems tractable. Several nonlinear DR methods, such as the well-known ISOMAP algorithm, rely on a neighborhood graph to compute geodesic distances between data points. These graphs...
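The abstract is cut off before the method is described, so this is not the authors' procedure. It only sketches the two ingredients the title names. The first is a symmetrized k-nearest-neighbour graph of the kind ISOMAP builds. The second is edge betweenness, computed with Brandes' algorithm. The intuition is that an edge bridging two distant parts of a manifold carries unusually many shortest paths. The graph is treated as unweighted for brevity, although ISOMAP weights edges by distance. All names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, FrozenSet, List, Sequence, Set

Graph = Dict[int, Set[int]]


def knn_graph(points: List[Sequence[float]], k: int) -> Graph:
    """Symmetrized k-nearest-neighbour graph (unweighted for brevity)."""
    adj: Graph = {i: set() for i in range(len(points))}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: sum((a - b) ** 2 for a, b in zip(p, points[j])))
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def edge_betweenness(adj: Graph) -> Dict[FrozenSet[int], float]:
    """Brandes' algorithm for edge betweenness on an unweighted, undirected graph."""
    eb: Dict[FrozenSet[int], float] = defaultdict(float)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order: List[int] = []
        pred: Dict[int, List[int]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += credit
                delta[v] += credit
    # Every unordered pair of endpoints was counted once from each side.
    return {e: val / 2.0 for e, val in eb.items()}


# Two triangles joined by the single edge (2, 3).
barbell: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
scores = edge_betweenness(barbell)
print(max(scores, key=scores.get), scores[frozenset((2, 3))])  # frozenset({2, 3}) 9.0
```

The bridging edge scores 9.0, one unit for each of the nine node pairs it separates. Edges inside a triangle score far lower, which is the kind of contrast a shortcut detector could threshold on.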

chapter

Estimating True and False Positive Rates in Higher Dimensional Problems and Its Data Mining Applications

A. Foss, O.R. Zaiane

2008 IEEE International Conference on Data Mining Workshops > 673 - 681

If we can estimate the accuracy of our observations, then we can estimate the true and false positive rates over a series of samples in high-dimensional data mining problems. To date, such issues have been largely neglected, and no algorithm has previously been provided to facilitate the computations involved. In high-dimensional data mining tasks, increasing sparsity leads to decreasing true positive...

chapter

A New Graph-Based Algorithm for Clustering Documents

A.P. Suarez, J.F.M. Trinidad, J.A.C. Ochoa, J.E.M. Pagola

2008 IEEE International Conference on Data Mining Workshops > 710 - 719

In this paper, a new algorithm for document clustering, called CStar, is presented. This algorithm improves on recently developed algorithms such as the generalized star (GStar) and ACONS algorithms, which were originally proposed to reduce some drawbacks of previous Star-like algorithms. The CStar algorithm uses the condensed star-shaped sub-graph concept defined by ACONS, but defines a new heuristic that...
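The CStar heuristic itself is truncated in the abstract, so the sketch below shows a different, older algorithm: the original Star clustering algorithm (Aslam et al.), from which the Star-like family takes its name. Documents are hypothetical sparse term-weight dictionaries compared with cosine similarity. Pairs with similarity of at least sigma are joined by an edge. The highest-degree unmarked vertex then repeatedly becomes a star center together with all of its neighbours, so clusters may overlap.

```python
import math
from typing import Dict, List, Set

Doc = Dict[str, float]  # hypothetical sparse term -> weight vector


def cosine(a: Doc, b: Doc) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def star_clusters(docs: List[Doc], sigma: float) -> List[Set[int]]:
    """Original Star algorithm: threshold graph plus greedy star covering."""
    n = len(docs)
    # sigma-similarity graph: an edge wherever cosine similarity >= sigma.
    adj = {i: {j for j in range(n) if j != i and cosine(docs[i], docs[j]) >= sigma}
           for i in range(n)}
    # Visit vertices by decreasing degree (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(adj[i]), i))
    marked: Set[int] = set()
    clusters: List[Set[int]] = []
    for v in order:
        if v not in marked:
            star = {v} | adj[v]  # center plus all its satellites
            clusters.append(star)
            marked |= star
    return clusters


docs = [
    {"data": 1.0, "mining": 1.0},
    {"data": 1.0, "mining": 1.0, "rules": 1.0},
    {"data": 1.0, "mining": 2.0},
    {"face": 1.0, "video": 1.0},
    {"face": 1.0, "video": 1.0, "pose": 1.0},
]
print(star_clusters(docs, sigma=0.7))  # [{0, 1, 2}, {3, 4}]
```

Building the threshold graph this way costs O(n^2) similarity computations, which is acceptable for a sketch but not for large corpora.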

chapter

Detecting and Tracking Spatio-temporal Clusters with Adaptive History Filtering

J. Rosswog, K. Ghose

2008 IEEE International Conference on Data Mining Workshops > 448 - 457

This paper addresses the problem of detecting and tracking moving clusters in spatio-temporal data sets. Spatio-temporal data sets contain data elements that move in space over time. Traditional data clustering algorithms work well on static data sets that contain well-separated clusters. When traditional techniques are applied to spatio-temporal data, they break down when the moving data elements intersect...

chapter

A New Method for Multi-view Face Clustering in Video Sequence

Panpan Huang, Yunhong Wang, Ming Shao

2008 IEEE International Conference on Data Mining Workshops > 869 - 873

In the problem of face clustering with multiple views, the similarity between faces of different persons in similar poses is usually greater than the similarity between multi-view faces of the same person. This can have a tremendous impact on the clustering result that is sent back to the user. To solve this problem, we should perform pose clustering first and then, within each 'pose group', clustering...

chapter

Semi-supervised Collaborative Clustering with Partial Background Knowledge

G. Forestier, C. Wemmert, P. Gancarski

2008 IEEE International Conference on Data Mining Workshops > 211 - 217

In this paper, we present a new algorithm for semi-supervised clustering. We assume that a small set of labeled samples is available and use it in a clustering algorithm to discover relevant patterns. We study how our algorithm performs compared with two other semi-supervised algorithms when the data are multimodal. Then, we study the case where the user is able to produce a few samples for some classes but not for each...
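The abstract does not spell out the collaborative algorithm, so this sketch swaps in a different technique: Seeded K-means (Basu et al.), in which labeled samples fix the starting centroids of their classes. To mimic the abstract's case of samples for only some classes, clusters with no labeled samples start from the point farthest from the centroids chosen so far. That farthest-first rule, the data layout and the function names are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def mean(points: Sequence[Point]) -> Point:
    return tuple(sum(c) / len(points) for c in zip(*points))


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(points: List[Point], seeds: Dict[int, List[Point]], k: int,
                  iters: int = 100):
    """Seeded K-means sketch. seeds maps a cluster index in range(k) to its
    labeled samples; some classes may have none (partial background knowledge)."""
    init: Dict[int, Point] = {c: mean(s) for c, s in seeds.items() if s}
    for c in range(k):
        if c not in init:
            # Assumed rule: start an unseeded cluster at the point farthest
            # from every centroid chosen so far (farthest-first).
            init[c] = max(points, key=lambda p: min(
                (sq_dist(p, q) for q in init.values()), default=float("inf")))
    centroids = [init[c] for c in range(k)]
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Recompute means; a cluster that lost all members keeps its centroid.
        new = [mean([p for p, lab in zip(points, labels) if lab == c])
               if c in labels else centroids[c] for c in range(k)]
        if new == centroids:
            break
        centroids = new
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
seeds = {0: [(0.0, 0.0)], 1: [(5.0, 5.0)]}  # no labeled samples for class 2
labels, centroids = seeded_kmeans(points, seeds, k=3)
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Here the seeds act only on initialization. A stricter variant (Constrained K-means) would also keep every seed in its own class during the assignment step.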

chapter

Unifying Unknown Nodes in the Internet Graph Using Semisupervised Spectral Clustering

A. Almog, J. Goldberger, Y. Shavitt

2008 IEEE International Conference on Data Mining Workshops > 174 - 183

Most research on Internet topology is based on active measurement methods. A major difficulty in using these tools is that one comes across many unresponsive routers. Different methods of dealing with these anonymous nodes to preserve the connectivity of the real graph have been suggested. One of the more practical approaches involves using a placeholder for each unknown, resulting in multiple copies...

chapter

An Efficient Search Algorithm for Content-Based Image Retrieval with User Feedback

A. Po Leung, P. Auer

2008 IEEE International Conference on Data Mining Workshops > 884 - 890

We propose a probabilistic model for the relevance feedback of users looking for target images. This model takes into account user errors and user uncertainty about distinguishing similarly relevant images. Based on this model, we have developed an algorithm, which selects images to be presented to the user for further relevance feedback until a satisfactory image is found. In each query session,...

chapter

Bounding and Estimating Association Rule Support from Clusters on Binary Data

C. Ordonez, Kai Zhao, Zhibo Chen

2008 IEEE International Conference on Data Mining Workshops > 609 - 618

The theoretical relationship between association rules and machine learning techniques needs to be studied in more depth. This article studies the use of clustering as a model for association rule mining. The clustering model is exploited to bound and estimate association rule support and confidence. We first study the efficient computation of the clustering model with K-means; we show the sufficient...
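The abstract stops before giving the paper's formulas, so what follows is one plausible way, not necessarily the authors', to bound and estimate itemset support from a clustering model of binary data. Each cluster j is summarized by two quantities:

- its weight W[j], the fraction of transactions it holds;
- its centroid C[j][i], the fraction of its transactions that contain item i.

The estimate assumes items are independent within a cluster. The bounds use only the per-item frequencies. The upper bound is the smallest item frequency, and the lower bound is the Fréchet bound max(0, sum of frequencies - (|X| - 1)). Names and toy data are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Model = Tuple[List[float], List[List[float]]]  # (weights W, centroids C)


def cluster_model(transactions: List[Set[int]], labels: List[int],
                  n_items: int, k: int) -> Model:
    """W[j]: fraction of transactions in cluster j.
    C[j][i]: fraction of cluster j's transactions that contain item i."""
    counts = [0] * k
    C = [[0.0] * n_items for _ in range(k)]
    for t, j in zip(transactions, labels):
        counts[j] += 1
        for i in t:
            C[j][i] += 1
    for j in range(k):
        if counts[j]:
            C[j] = [c / counts[j] for c in C[j]]
    n = len(transactions)
    return [c / n for c in counts], C


def support_estimate(X: Sequence[int], model: Model) -> float:
    """Estimate assuming items are independent inside each cluster."""
    W, C = model
    total = 0.0
    for w, c in zip(W, C):
        p = w
        for i in X:
            p *= c[i]
        total += p
    return total


def support_bounds(X: Sequence[int], model: Model) -> Tuple[float, float]:
    """Bounds valid for any data with these per-cluster item frequencies (X non-empty)."""
    W, C = model
    lower = sum(w * max(0.0, sum(c[i] for i in X) - (len(X) - 1)) for w, c in zip(W, C))
    upper = sum(w * min(c[i] for i in X) for w, c in zip(W, C))
    return lower, upper


transactions = [{0, 1}, {0, 1}, {0}, {2}, {2, 3}]
model = cluster_model(transactions, labels=[0, 0, 0, 1, 1], n_items=4, k=2)
est = support_estimate([0, 1], model)
lo, hi = support_bounds([0, 1], model)
print(round(est, 3), round(lo, 3), round(hi, 3))  # 0.4 0.4 0.4
```

On this toy data, the estimate and both bounds for X = {0, 1} coincide with the true support of 2/5 because the clusters separate the transactions cleanly. On real data they would generally differ. An estimated confidence for a rule X => Y follows as support_estimate(X ∪ Y) / support_estimate(X).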

Content availability: Available
Keywords: CLUSTERING ALGORITHMS

Keywords

PATTERN CLUSTERING (11)
DATA MINING (10)
DISTANCE MEASUREMENT (8)
ALGORITHM DESIGN AND ANALYSIS (5)
CLASSIFICATION ALGORITHMS (5)
CLUSTERING (3)
COMPUTATIONAL MODELING (3)
DATA MODELS (3)
FEATURE EXTRACTION (3)
GRAPH THEORY (3)
INTERNET (3)
LEARNING (ARTIFICIAL INTELLIGENCE) (3)
PATTERN CLASSIFICATION (3)
STATISTICAL ANALYSIS (3)
BUILDINGS (2)
CLUSTERING METHODS (2)
DATA ANALYSIS (2)
ESTIMATION (2)
IP NETWORKS (2)
ITEMSETS (2)
K-MEANS CLUSTERING ALGORITHM (2)
NOISE (2)
NOISE MEASUREMENT (2)
PROBABILITY (2)
QUERY PROCESSING (2)
TEXT MINING (2)
ACCURACY (1)
ACONS ALGORITHM (1)
ACTIVE MEASUREMENT METHODS (1)
ADAPTATION MODEL (1)
ADAPTIVE FILTERS (1)
ADAPTIVE HISTORY FILTERING (1)
AGRICULTURE (1)
ANONYMOUS NODES (1)
APPROXIMATION METHODS (1)
ARTIFICIAL INTELLIGENCE (1)
ASSOCIATION RULE SUPPORT ESTIMATION (1)
ASSOCIATION RULES (1)
ASTROPHYSICS (1)
ATMOSPHERIC HAZARD (1)
ATMOSPHERIC MODELING (1)
BAG LABELS (1)
BETWEENNESS (1)
BINARY DATA (1)
BIOLOGICALLY MEANINGFUL COCLUSTERS (1)
BISMUTH (1)
BOSPHORUS CHANNEL (1)
BOUND (1)
CATEGORICAL ATTRIBUTE (1)
CATEGORY THEORY (1)
CDM (1)
CENTRALITY (1)
CLASSIFICATION TECHNIQUE (1)
CLIQUE-BASED CLUSTERING TECHNIQUE (1)
CLOUDS (1)
CLUSTER MINING (1)
CLUSTERING ALGORITHM (1)
CLUSTERING DOCUMENTS (1)
CLUSTERING PROBLEMS (1)
CLUSTERING TECHNIQUE (1)
CO-CLUSTERING (1)
COCLUSTERING ALGORITHM (1)
COHERENT COCLUSTERS (1)
COHESIVE EXPRESSIONS (1)
COLLABORATION (1)
COLLABORATIVE CLUSTERING (1)
COMPLEX CONTEXT INFORMATION (1)
COMPUTATIONAL COMPLEXITY (1)
COMPUTER AIDED SOFTWARE ENGINEERING (1)
CONDENSED STAR-SHAPED SUBGRAPH CONCEPT (1)
CONFERENCES (1)
CONTAMINATION (1)
CONTENT-BASED IMAGE RETREIVAL (1)
CONTENT-BASED IMAGE RETRIEVAL (1)
CONTENT-BASED RETRIEVAL (1)
CONTEXT DISTANCE MEASURE (1)
CROP YIELD PREDICTION (1)
CSTAR ALGORITHM (1)
DATA CLASSIFICATION (1)
DATA CLUSTERING (1)
DATA HANDLING (1)
DATA MANAGEMENT SYSTEMS (1)
DATA MINING ALGORITHMS (1)
DATA REDUCTION (1)
DATA STREAM (1)
DATABASES (1)
DBMS (1)
DELAY (1)
DENSITY-BASED CLUSTERING METHOD (1)
DIMENSIONALITY REDUCTION (1)
DIMENSIONALITY REDUCTION METHOD (1)
DISPERSION (1)
DISPERSION MODEL (1)
DISTANCE (1)
DISTINCT DISTRIBUTIONS (1)
DISTRIBUTED ALGORITHMS (1)
DISTRIBUTED ARCHITECTURAL MODEL (1)
DISTRIBUTED DATA MINING (1)
DISTRIBUTED DATA MINING MODEL (1)

INFONA - science communication portal
