Graph propositionalization methods transform structured and relational data into a fixed-length feature vector format that can be used by standard machine learning methods. However, the choice of propositionalization method may have a significant impact on the performance of the resulting classifier. Six different propositionalization methods are evaluated when used in conjunction with random forests...
Previous studies have investigated the use of dimensionality reduction to improve nearest neighbor classification of high-dimensional data, such as microarrays. It has been demonstrated that fusing dimensionality reduction methods, either by fusing classifiers obtained from each set of reduced features or by fusing all reduced features, is better than using any single dimensionality...
Both theory and a wealth of empirical studies have established that ensembles are more accurate than single predictive models. Unfortunately, the problem of how to maximize ensemble accuracy is, especially for classification, far from solved. In essence, the key problem is to find a suitable criterion, typically based on training or selection set performance, highly correlated with ensemble accuracy...
Dimensionality reduction has been demonstrated to improve the performance of the k-nearest neighbor (kNN) classifier for high-dimensional data sets, such as microarrays. However, the effectiveness of different dimensionality reduction methods varies, and it has been shown that no single method consistently outperforms the others. In contrast to using a single method, two approaches to fusing the result...
The test set accuracy for ensembles of classifiers selected based on single measures of accuracy and diversity as well as combinations of such measures is investigated. It is found that by combining measures, a higher test set accuracy may be obtained than by using any single accuracy or diversity measure. It is further investigated whether a multi-criteria search for an ensemble that maximizes both...
When using the output of classifiers to calculate the expected utility of different alternatives in decision situations, the correctness of predicted class probabilities may be of crucial importance. However, even very accurate classifiers may output class probabilities of rather poor quality. One way of overcoming this problem is by means of calibration, i.e., mapping the original class probabilities...
When using machine learning for in silico modeling, the goal is normally to obtain highly accurate predictive models. Often, however, models should also bring insights into interesting relationships in the domain. It is then desirable that machine learning techniques have the ability to obtain small and transparent models, where the user can control the tradeoff between accuracy, comprehensibility...
Ensemble classifiers are known to generally perform better than their constituent classifiers. Whereas a lot of work has focused on the generation of classifiers for ensembles, much less attention has been given to the fusion of individual classifier outputs. One approach to fusing the outputs is to apply Shafer's theory of evidence, which provides a flexible framework for expressing and fusing...
The main purpose of this study was to determine whether results on training or validation data can be used to estimate ensemble performance on novel data. With the specific setup evaluated, i.e., using ensembles built from a pool of independently trained neural networks and targeting diversity only implicitly, the answer is a resounding no. Experimentation, using 13 UCI datasets,...
Ensemble classifiers are known to generally perform better than each individual classifier of which they consist. One approach to classifier fusion is to apply Shafer's theory of evidence. While most approaches have adopted Dempster's rule of combination, a multitude of combination rules have been proposed. A number of combination rules as well as two voting rules are compared when used in...
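As an illustration of the kind of fusion discussed in the abstract above, the following is a minimal sketch of Dempster's rule of combination for two classifier outputs. It is not the setup evaluated in the paper: the function name `dempster_combine`, the representation of a mass function as a dictionary keyed by `frozenset` focal elements, and the example masses are all assumptions made for this sketch.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule (illustrative sketch).

    Each mass function is a dict mapping a frozenset of class labels
    (a focal element) to its mass; the masses in one dict sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence: the product of masses goes to the intersection.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal elements: the product counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict; the rule is undefined.")
    # Renormalize by the non-conflicting mass.
    norm = 1.0 - conflict
    return {focal: mass / norm for focal, mass in combined.items()}


# Hypothetical outputs of two classifiers over the classes {"a", "b"}.
A, B, AB = frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})
m1 = {A: 0.6, B: 0.1, AB: 0.3}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
fused = dempster_combine(m1, m2)
```

Mass assigned to the full set `{"a", "b"}` expresses ignorance rather than support for either class, which is what distinguishes this representation from a plain probability vector.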
For both single probability estimation trees (PETs) and ensembles of such trees, commonly employed class probability estimates correct the observed relative class frequencies in each leaf to avoid anomalies caused by small sample sizes. The effect of such corrections in random forests of PETs is investigated, and the use of the relative class frequency is compared to using two corrected estimates,...
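The abstract above is truncated before it names the two corrected estimates, so the sketch below does not claim to reproduce them. It only contrasts the uncorrected relative class frequency in a leaf with the Laplace estimate, one widely used correction for small samples, chosen here as an assumption. The function names and the leaf counts are hypothetical.

```python
def relative_frequency(counts):
    """Uncorrected estimate: observed class frequencies in a leaf."""
    total = sum(counts)
    return [c / total for c in counts]


def laplace_estimate(counts):
    """Laplace-corrected estimate: add one pseudo-count per class.

    Assumption for illustration only; the source abstract does not
    state which corrections were evaluated.
    """
    total = sum(counts)
    n_classes = len(counts)
    return [(c + 1) / (total + n_classes) for c in counts]


# Hypothetical leaf holding three training examples, all of the first class.
leaf_counts = [3, 0]
raw = relative_frequency(leaf_counts)      # extreme estimate from a tiny sample
smoothed = laplace_estimate(leaf_counts)   # pulled toward the uniform distribution
```

With only three examples in the leaf, the raw frequency assigns zero probability to the second class, while the corrected estimate keeps it strictly positive.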
Two strategies for fusing information from multiple sources when generating predictive models in the domain of pesticide classification are investigated: i) fusing different sets of features (molecular descriptors) before building a model and ii) fusing the classifiers built from the individual descriptor sets. An empirical investigation demonstrates that the choice of strategy can have a significant...