We propose a medically driven data mining application: a system for diagnosing gait patterns related to health problems of the elderly, to support their independent living. The gait of the elderly is captured with a motion capture system, which consists of tags attached to the body and sensors situated in the apartment. The position of the tags is acquired by the sensors and the resulting time series of position coordinates...
Handling changes over time in supervised learning (concept drift) has lately received a great deal of attention, and a number of adaptive learning strategies have been developed. Most of them make the optimistic assumption that new labels become available immediately. In real sequential classification tasks this is often unrealistic due to task-specific delayed labeling or associated labeling costs....
Transcriptional regulatory network identification is both a fundamental challenge in systems biology and an important practical application of data mining and machine learning. In this study, we propose a semi-supervised learning-based integrative scoring approach to tackle this challenge and predict transcriptional regulations. Our approach outperforms a state-of-the-art label propagation method...
Generally, a larger amount of data may increase statistical power. However, many algorithms in the data mining community focus only on small samples. This is because, when the sample size increases, the data set is not necessarily identically distributed in spite of being generated by some common data generating mechanism. In this paper, we observe that restricted Bayesian network classifiers are robust even when...
Summary form only given. In this talk, we will present how semantics can improve the quality of the data mining process. In particular, we will first focus on geospatial schema matching with high quality cluster assurance. Next, we will focus on location mining from social networks. With regard to the first problem, resolving semantic heterogeneity across distinct data sources remains a highly relevant...
Relational database mining, where data are mined across multiple relations, is increasingly commonplace. When considering a complex database schema, it becomes difficult to identify all possible relationships between attributes from the different relations. That is, seemingly harmless attributes may be linked to confidential information, leading to data leaks when building a model. In this way, we...
Lazy Associative Rule Mining (LARM) integrates lazy learning and Associative Rule Mining (ARM) to tailor label prediction results by generating related class associative rules (CARs) only when an unlabeled document arrives. However, two main problems must be carefully addressed in LARM classification: (1) computing efficiency and (2) dominant class bias prediction. The main idea of the proposed method,...
S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data. Keyed data events are routed with affinity to Processing Elements (PEs), which consume the events and do one or both of the following: (1) emit one or more events which may be consumed by other PEs,...
Social bookmarking tools are rapidly emerging on the Web, as witnessed by the overwhelming number of participants. In such spaces, users annotate resources by means of any keyword or tag that they find relevant, giving rise to lightweight conceptual structures also known as folksonomies. In this respect, needless to say, ontologies can be of benefit for enhancing information retrieval metrics...
Document classification plays an increasingly important role in extracting and organizing knowledge. However, the Web document classification task is hindered by the huge number of Web documents combined with the limited human judgment available for labeling training data. To obtain sufficient training data in a cost-efficient way, in this paper, we propose a semi-supervised learning approach to predict a document's...
This paper presents a triple-random ensemble learning method for handling multi-label classification problems. The proposed method integrates and develops the concepts of random subspace, bagging and random k-label sets ensemble learning methods to form an approach to classify multi-label data. It applies the random subspace method to feature space, label space as well as instance space. The devised...
Time-series classification is an active research topic in machine learning, as it finds applications in numerous domains. The k-NN classifier, based on the dynamic time warping (DTW) distance, has been shown to be competitive with many state-of-the-art time-series classification methods. Nevertheless, due to the complexity of time-series data sets, our investigation demonstrates that a single, global...
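For readers unfamiliar with the baseline this abstract refers to, the following is a minimal sketch of a k-NN classifier using the textbook DTW distance (DTW is commonly expanded as dynamic time warping). It is not the method proposed in the paper, whose description is truncated above. The function names `dtw_distance` and `knn_predict`, the absolute-difference local cost, and the toy data are all assumptions made for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (assumed cost: |x - y|)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def knn_predict(train, query, k=1):
    """Majority vote among the k training series closest to `query` under DTW.

    `train` is a list of (series, label) pairs (hypothetical data layout).
    """
    neighbours = sorted(train, key=lambda item: dtw_distance(item[0], query))[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


# Toy usage: the query is a time-shifted copy of the "spike" series.
train = [([0, 0, 1, 0], "spike"), ([1, 1, 1, 1], "flat")]
prediction = knn_predict(train, [0, 1, 0, 0], k=1)
```

This sketch uses a single global value of k for every query; it runs in O(nm) time per distance and omits warping-window constraints.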
This paper presents an improved fuzzy neural network (IFNN) for pattern recognition. The IFNN consists of several sub-networks, which represent different patterns. Each sub-network distinguishes a particular pattern from the others, and each pattern corresponds to certain inputs. In the IFNN, an empirical formula tested many times is used to calculate the number of nodes in the hidden layer, and the learning...
MEDLINE®, the flagship database of the U.S. National Library of Medicine, is a critical source of information for biomedical research and clinical medicine. The automated extraction of bibliographic data, such as article titles, author names, abstracts, and references, is essential to the affordable creation of this citation database. References, typically appearing at the end of journal articles,...
In this paper we propose a new approach based on Symbolic Aggregate approximation (SAX), called improved iSAX, for efficient and accurate discovery of the important patterns that are essential for time series data. The original SAX approach allows a very high-quality dimensionality reduction and distance measures to be defined on the symbolic approach, and it is based on PAA (Piecewise Aggregate Approximation)...
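As background for the PAA and SAX steps this abstract mentions, here is a minimal sketch of the original SAX pipeline (z-normalise, reduce with PAA, then map segment means to symbols). It does not implement the improved iSAX variant, whose details are truncated above. The function names, the four-letter alphabet, and the requirement that the series length be divisible by the segment count are assumptions made to keep the example short; the breakpoints are the rounded standard-normal quartiles commonly used for an alphabet of size 4.

```python
from bisect import bisect_right
from statistics import mean, pstdev

# Rounded quartile breakpoints of the standard normal distribution (alphabet size 4).
BREAKPOINTS_4 = [-0.67, 0.0, 0.67]


def znorm(series):
    """Rescale a series to zero mean and unit variance (constant series map to zeros)."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if n % segments != 0:
        raise ValueError("this sketch assumes len(series) is divisible by segments")
    size = n // segments
    return [mean(series[i:i + size]) for i in range(0, n, size)]


def sax_word(series, segments, breakpoints=BREAKPOINTS_4, alphabet="abcd"):
    """Turn a numeric series into a SAX word of `segments` symbols."""
    reduced = paa(znorm(series), segments)
    # Each segment mean falls into one breakpoint interval, which selects a letter.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in reduced)


# Toy usage: a step from low to high becomes the two-symbol word "ad".
word = sax_word([0, 0, 0, 0, 10, 10, 10, 10], segments=2)
```

The word length (`segments`) and the alphabet size together control how much of the original series is retained after the reduction.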
Learning in the presence of data imbalances presents a great challenge to machine learning. Imbalanced data sets represent a significant problem because the corresponding classifier has a tendency to ignore samples which have smaller representation in the training sets. In this paper, we propose an ensemble-based learning algorithm as a new ensemble classifier model, called the SVM-C5.0 Ensemble Classifier...
The rapidly growing amount of available digital documents of various formats, and the possibility to access these through internet-based technologies in distributed environments, have led to the necessity to develop solid methods to properly organize and structure documents in large digital libraries and repositories. Specifically, the extremely large size of document collections makes it impossible...
This paper proposes a new feature-selection strategy by integrating the Rough Set Theory (RST) and Particle Swarm Optimisation (PSO) algorithms to generate a set of discriminatory features for the classification problem. The proposed method is seen as a marriage between filter and wrapper approaches in which the RST is used to pre-reduce the feature set before optimisation by PSO, a meta-heuristic...
In this paper, we apply a classification system, denoted Belief Rough Set Classifier (BRSC) and based on the hybridization of belief functions and rough sets, to learn decision rules from uncertain data consisting of web usage. The uncertainty appears only in decision attributes and is handled by the Transferable Belief Model (TBM), one interpretation of the belief function theory. The web usage mining dataset...