Classical attribute-value descriptions induce a multi-dimensional geometric space. One way to compute the distance between descriptions in such a space is to evaluate a Euclidean distance between tuples of coordinates. This is the ground on which a large part of the Machine Learning literature has built its methods and techniques. However, the complexity of some domains requires the use...
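As a minimal illustration of the distance computation mentioned in this abstract, the sketch below evaluates the Euclidean distance between two descriptions given as tuples of numeric coordinates. The function name `euclidean_distance` and the sample values are assumptions made for illustration; they are not taken from the paper, which goes on to consider domains where this simple setting is insufficient.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length tuples of numeric coordinates.

    Hypothetical helper for illustration only; not the paper's method.
    """
    if len(a) != len(b):
        raise ValueError("descriptions must have the same number of attributes")
    # Sum the squared per-attribute differences, then take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two descriptions with two numeric attributes each (made-up values).
d = euclidean_distance((0.0, 0.0), (3.0, 4.0))
```

This only applies when every attribute is numeric and the attributes are on comparable scales.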
In this paper we present a new algorithm for semisupervised clustering. We assume that a small set of labeled samples is available, and we use it in a clustering algorithm to discover relevant patterns. We study how our algorithm performs against two other semisupervised algorithms when the data are multimodal. Then, we study the case where the user is able to produce a few samples for some classes but not for each...
In this paper we consider the problem of discovering frequent temporal patterns in a database of temporal sequences, where a temporal sequence is a set of items with associated dates and durations. Since the quantitative temporal information appears to be fundamental in many contexts, it is taken into account in the mining processes and returned as part of the extracted knowledge. To this end, we...
The function of proteins in living cells varies with their localization. Extracellular plant proteins are responsible for vital functions such as nutrition acquisition, protection from pathogens, communication with other soil organisms, etc. Hence, characterizing these proteins and distinguishing them from intracellular proteins is of high interest to biologists. Nonetheless, the small...
Monitoring applications play an increasingly important role in many domains. They detect events in monitored systems and take actions such as invoking a program or notifying an administrator. Often administrators must then manually investigate events to figure out the source of a problem. Stream processing engines (SPEs) are general-purpose data management systems for monitoring applications. They provide...
Longitudinal data consist of repeated measurements of some variables which describe the dynamics of a domain (process or phenomenon) over time. They can be analyzed in order to explain what event may cause the transition from one state to the next during the evolution of the domain. Generally, approaches to this explanation problem rely on the exclusive usage of domain knowledge, while an analysis...
Clustering is an active research topic in data mining, and different methods have been proposed in the literature. Most of these methods are based on the use of a distance measure defined either on numerical attributes or on categorical attributes. However, in fields such as road traffic and medicine, datasets are composed of both numerical and categorical attributes. Recently, there have been several proposals...
Word meaning disambiguation has always been an important problem in many computer science tasks, such as information retrieval and extraction. One of the problems faced in automatic word sense discovery is the number of different senses a word can have. Often, senses are dominated by other, more frequent ones. Discovering such dominated meanings can significantly improve the quality of many text-related...
Structured data has recently become increasingly abundant in many application domains. In this paper, as a form of correlation mining, we propose new data mining problems of finding frequent and correlated pairs of patterns in structured databases. First, we consider the problem of finding all frequent and correlated pattern pairs in two-dimensional structured databases. Then, two kinds of top-k...
Action rules describe possible transitions of objects from one state to another with respect to a distinguished attribute. Previous research on action rule discovery usually required the extraction of classification rules before constructing any action rule. This paper gives a new approach for generating association-type action rules. The notion of frequent action sets and an Apriori-like strategy generating...
We present a multiple-instance regression algorithm that models internal bag structure to identify the items most relevant to the bag labels. Multiple-instance regression (MIR) operates on a set of bags with real-valued labels, each containing a set of unlabeled items, in which the relevance of each item to its bag label is unknown. The goal is to predict the labels of new bags from their contents...
In some applications, the whole structure of the target data can be represented naturally in "multi-structured graphs", which are complex graphs whose vertices consist of a set of structured data such as itemsets, sequences and so on. To capture the strong affinity relationship in multi-structured graphs, in this paper, we propose an algorithm named HFMG to discover novel and meaningful frequent...
Automatic indexing of music by instruments and their types is a challenging problem, especially when multiple instruments are playing at the same time. We have built a database containing more than one million musical instrument sounds, each described by a large number of features including standard MPEG7 audio descriptors, features for speech recognition, and many new audio features developed by...
The real-world process of generating a large spatio-temporal data collection presents a very difficult technical problem. First, this process is very expensive, requiring the installation of many high-technology software tools and modern hardware infrastructure (sensors, servers, GPS infrastructure, etc.); second, the recorded trajectories sometimes cannot represent any special traffic or movement...
INGENS is a prototype GIS which integrates a geographic knowledge discovery engine to mine several kinds of spatial KDD objects from the topographic maps stored in a spatial database. In this paper we describe the main principles of an inductive spatial database in INGENS. An inductive database makes it possible to store KDD objects permanently and to integrate database technology with systems for the geographic knowledge...
Map generalization is used to derive maps for secondary scales and/or specific goals. This operation greatly benefits spatial decision support systems, as it can provide a global and simplified representation of a phenomenon while discarding irrelevant information. The recent popularity of OLAP systems for various application domains has generated much interest in the development of spatial OLAP (SOLAP)...
Unsupervised machine learning algorithms are used to perform statistical analysis of several transport and dispersion model runs which simulate emissions from a fixed source under different atmospheric conditions. A clustering algorithm is used to automatically group the results of the transport and dispersion simulations according to their respective cloud characteristics. Each cluster of clouds...
Detection of anomalies in multivariate time series is an important data mining task with potential applications in medical diagnosis, ecosystem modeling, and network traffic monitoring. In this paper, we present a robust graph-based algorithm for detecting anomalies in noisy multivariate time series data. A key feature of the algorithm is the alignment of kernel matrices constructed from the time...
We present and discuss several spatiotemporal kernels designed to mine real-life and simulated data in support of drought prediction. We implement and empirically validate these kernels for support vector machines. Issues related to the nature of geographic data such as autocorrelation and directionality are investigated.
Coastal buoys and stations provide frequent, high-quality marine observations for oceanographic study, weather service, atmospheric and public safety. Sharing the generated data sets requires tremendous effort and coordination among the different sensor network agencies to come to a shared understanding and to disseminate the data in a uniform way. Syntactic standardization provides data description...