Machine Learning: ECML 2005

chapter

Data Analysis in the Life Sciences — Sparking Ideas —

Michael R. Berthold

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Invited Talks > 1-1

Data from various areas of Life Sciences have increasingly caught the attention of data mining and machine learning researchers. Not only is the amount of data available mind-boggling but the diverse and heterogeneous nature of the information is far beyond any other data analysis problem so far. In sharp contrast to classical data analysis scenarios, the life science area poses challenges of a rather...

chapter

Machine Learning for Natural Language Processing (and Vice Versa?)

Claire Cardie

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Invited Talks > 2-2

Over the past 10-15 years, the influence of methods from machine learning has transformed the way that research is done in the field of natural language processing. This talk will begin by covering the history of this transformation. In particular, learning methods have proved successful in producing stand-alone text-processing components to handle a number of linguistic tasks. Moreover, these components...

chapter

Statistical Relational Learning: An Inductive Logic Programming Perspective

Luc De Raedt

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Invited Talks > 3-5

In the past few years there has been a lot of work lying at the intersection of probability theory, logic programming and machine learning [14,18,13,9,6,1,11]. This work is known under the names of statistical relational learning [7,5], probabilistic logic learning [4], or probabilistic inductive logic programming. Whereas most of the existing works have started from a probabilistic learning perspective...

chapter

Recent Advances in Mining Time Series Data

Eamonn Keogh

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Invited Talks > 6-6

Much of the world’s supply of data is in the form of time series. Furthermore, as we shall see, many types of data can be meaningfully converted into “time series”, including text, DNA, video, images, etc. The last decade has seen an explosion of interest in mining time series data from the academic community. There has been significant work on algorithms to classify, cluster, segment, index, discover...

chapter

Focus the Mining Beacon: Lessons and Challenges from the World of E-Commerce

Ron Kohavi

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Invited Talks > 7-7

Electronic Commerce is now entering its second decade, with Amazon.com and eBay now in existence for ten years. With massive amounts of data, an actionable domain, and measurable ROI, multiple companies use data mining and knowledge discovery to understand their customers and improve interactions. We present important lessons and challenges using e-commerce examples across two dimensions: (i) business-level...

chapter

Data Streams and Data Synopses for Massive Data Sets (Invited Talk)

Yossi Matias

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Invited Talks > 8-9

With the proliferation of data intensive applications, it has become necessary to develop new techniques to handle massive data sets. Traditional algorithmic techniques and data structures are not always suitable to handle the amount of data that is required and the fact that the data often streams by and cannot be accessed again. A field of research established over the past decade is that of handling...

chapter

Clustering and Metaclustering with Nonnegative Matrix Decompositions

Liviu Badea

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 10-22

Although very widely used in unsupervised data mining, most clustering methods are affected by the instability of the resulting clusters w.r.t. the initialization of the algorithm (as e.g. in k-means). Here we show that this problem can be elegantly and efficiently tackled by meta-clustering the clusters produced in several different runs of the algorithm, especially if “soft” clustering algorithms...

chapter

A SAT-Based Version Space Algorithm for Acquiring Constraint Satisfaction Problems

Christian Bessiere, Remi Coletta, Frédéric Koriche, Barry O’Sullivan

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 23-34

Constraint programming is rapidly becoming the technology of choice for modelling and solving complex combinatorial problems. However, users of this technology need significant expertise in order to model their problems appropriately. The lack of availability of such expertise is a significant bottleneck to the broader uptake of constraint technology in the real world. We present a new SAT-based version...

chapter

Estimation of Mixture Models Using Co-EM

Steffen Bickel, Tobias Scheffer

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 35-46

We study estimation of mixture models for problems in which multiple views of the instances are available. Examples of this setting include clustering web pages or research papers that have intrinsic (text) and extrinsic (references) attributes. Our optimization criterion quantifies the likelihood and the consensus among models in the individual views; maximizing this consensus minimizes a bound on...

chapter

Nonrigid Embeddings for Dimensionality Reduction

Matthew Brand

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 47-59

Spectral methods for embedding graphs and immersing data manifolds in low-dimensional spaces are notoriously unstable due to insufficient and/or numerically ill-conditioned constraint sets. We show why this is endemic to spectral methods, and develop low-complexity solutions for stiffening ill-conditioned problems and regularizing ill-posed problems, with proofs of correctness. The regularization...

chapter

Multi-view Discriminative Sequential Learning

Ulf Brefeld, Christoph Büscher, Tobias Scheffer

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 60-71

Discriminative learning techniques for sequential data have proven to be more effective than generative models for named entity recognition, information extraction, and other tasks of discrimination. However, semi-supervised learning mechanisms that utilize inexpensive unlabeled sequences in addition to few labeled sequences – such as the Baum-Welch algorithm – are available only for generative models...

chapter

Robust Bayesian Linear Classifier Ensembles

Jesús Cerquides, Ramon López de Mántaras

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 72-83

Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Usually probabilistic classifiers restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect...

chapter

An Integrated Approach to Learning Bayesian Networks of Rules

Jesse Davis, Elizabeth Burnside, Inês Castro Dutra, David Page, et al.

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 84-95

Inductive Logic Programming (ILP) is a popular approach for learning rules for classification tasks. An important question is how to combine the individual rules to obtain a useful classifier. In some instances, converting each learned rule into a binary feature for a Bayes net learner improves the accuracy compared to the standard decision list approach [3,4,14]. This results in a two-step process,...

chapter

Thwarting the Nigritude Ultramarine: Learning to Identify Link Spam

Isabel Drost, Tobias Scheffer

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 96-107

The page rank of a commercial web site has an enormous economic impact because it directly influences the number of potential customers that find the site as a highly ranked search engine result. Link spamming – inflating the page rank of a target page by artificially creating many referring pages – has therefore become a common practice. In order to maintain the quality of their search results, search...

chapter

Rotational Prior Knowledge for SVMs

Arkady Epshteyn, Gerald DeJong

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 108-119

Incorporation of prior knowledge into the learning process can significantly improve low-sample classification accuracy. We show how to introduce prior knowledge into linear support vector machines in form of constraints on the rotation of the normal to the separating hyperplane. Such knowledge frequently arises naturally, e.g., as inhibitory and excitatory influences of input variables. We demonstrate...

chapter

On the LearnAbility of Abstraction Theories from Observations for Relational Learning

Stefano Ferilli, Teresa M. A. Basile, Nicola Di Mauro, Floriana Esposito

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 120-132

The most common methodology in symbolic learning consists in inducing, given a set of observations, a general concept definition. It is widely known that the choice of the proper description language for a learning problem can affect the efficacy and effectiveness of the learning task. Furthermore, most real-world domains are affected by various kinds of imperfections in data, such as inappropriateness...

chapter

Beware the Null Hypothesis: Critical Value Tables for Evaluating Classifiers

George Forman, Ira Cohen

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 133-145

Scientists regularly decide the statistical significance of their findings by determining whether they can, with sufficient confidence, rule out the possibility that their findings could be attributed to random variation—the ‘null hypothesis.’ For this, they rely on tables with critical values pre-computed for the normal distribution, the t-distribution, etc. This paper provides such tables (and methods...

chapter

Kernel Basis Pursuit

Vincent Guigue, Alain Rakotomamonjy, Stéphane Canu

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 146-157

Estimating a non-uniformly sampled function from a set of learning points is a classical regression problem. Kernel methods have been widely used in this context, but every problem leads to two major tasks: optimizing the kernel and setting the fitness-regularization compromise. This article presents a new method to estimate a function from noisy learning points in the context of RKHS (Reproducing...

chapter

Hybrid Algorithms with Instance-Based Classification

Iris Hendrickx, Antal van den Bosch

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 158-169

In this paper we aim to show that instance-based classification can replace the classifier component of a rule learner and of maximum-entropy modeling, thereby improving the generalization accuracy of both algorithms. We describe hybrid algorithms that combine rule learning models and maximum-entropy modeling with instance-based classification. Experimental results show that both hybrids are able...

chapter

Learning and Classifying Under Hard Budgets

Aloak Kapoor, Russell Greiner

Lecture Notes in Computer Science > Machine Learning: ECML 2005 > Long Papers > 170-181

Since resources for data acquisition are seldom infinite, both learners and classifiers must act intelligently under hard budgets. In this paper, we consider problems in which feature values are unknown to both the learner and classifier, but can be acquired at a cost. Our goal is a learner that spends its fixed learning budget b_L acquiring training data, to produce...

Machine Learning: ECML 2005
16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005. Proceedings
