Financial forecasting is the basis for budgeting activities and estimating future financing needs. Applying machine learning and data mining models to financial forecasting is both effective and efficient. Among different kinds of machine learning models, kernel methods are well accepted since they are more robust and accurate than traditional models, such as neural networks. However, learning from...
We present a new approach to semi-supervised anomaly detection. Given a set of training examples believed to come from the same distribution or class, the task is to learn a model that will be able to distinguish examples in the future that do not belong to the same class. Traditional approaches typically compare the position of a new data point to the set of "normal" training data points in a chosen...
Recently, the following discrimination-aware classification problem was introduced: given a labeled dataset and an attribute B, find a classifier with high predictive accuracy that at the same time does not discriminate on the basis of the given attribute B. This problem is motivated by the fact that available historical data is often biased due to discrimination, e.g., when B denotes ethnicity. Using...
In many applications of data mining we know beforehand that the response variable should be increasing (or decreasing) in the attributes. Such relations between response and attributes are called monotone. In this paper we present a new algorithm to compute an optimal monotone classification of a data set for convex loss functions. Moreover, we show how the algorithm can be extended to compute all...
This paper introduces a simple yet powerful data transformation strategy for kernel machines. Instead of adapting the parameters of the kernel function w.r.t. the given data (as in conventional methods), we adjust both the kernel hyper-parameters and the given data itself. Using this approach, the input data is transformed to be more representative of the assumptions encoded in the kernel function...
Quantification is the name given to a novel machine learning task which deals with correctly estimating the number of elements of one class in a set of examples. The output of a quantifier is a real value. Since the training instances are the same as in a classification problem, a natural approach is to train a classifier and to derive a quantifier from it. Some previous works have shown that just classifying...
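The "train a classifier, then derive a quantifier from it" idea mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible derivation, namely counting the examples the classifier labels as positive and dividing by the total; the names `classify_and_count` and `threshold_classifier` are hypothetical and are not taken from the paper, whose own method is not shown in the truncated abstract.

```python
from typing import Callable, Sequence


def classify_and_count(classifier: Callable[[float], bool],
                       examples: Sequence[float]) -> float:
    """Estimate the fraction of positive examples by classifying each one.

    This is a naive quantifier built on top of a classifier (an
    illustrative assumption, not the method proposed in the paper).
    """
    if not examples:
        raise ValueError("examples must be non-empty")
    positives = sum(1 for x in examples if classifier(x))
    return positives / len(examples)


def threshold_classifier(x: float) -> bool:
    """Hypothetical stand-in for a trained classifier: positive if x >= 0.5."""
    return x >= 0.5


# Toy data: two of the four values are at or above the threshold.
scores = [0.1, 0.7, 0.9, 0.4]
prevalence = classify_and_count(threshold_classifier, scores)
```

The output is a real value (an estimated class prevalence) rather than a per-example label, which is what distinguishes a quantifier from the classifier it is built on.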
When active learning is applied to real-world applications, human experts usually act as oracles to provide labels. However, humans make mistakes, so noise might be introduced during the learning process. Most previous studies simplify the problem by assuming uniformly distributed noise over the sample space. Such an assumption, however, might fail to precisely reflect the human experts' behaviour in...
When we think of an object in a supervised learning setting, we usually perceive it as a collection of fixed attribute values. Although this setting may be well suited for many classification tasks, we propose a new object representation and with it a new challenge in data mining: an object is no longer described by one set of attributes but is represented in a hierarchy of attribute sets in different...
We study the retrieval task that ranks a set of objects for a given query in the pairwise preference learning framework. Researchers recently found that raw features (e.g. words for text retrieval) and their pairwise features, which describe relationships between two raw features (e.g. word synonymy or polysemy), could greatly improve retrieval precision. However, most existing methods can...
Predicting people whom other people may like has recently become an important task in many online social networks. Traditional collaborative filtering (CF) approaches are popular in recommender systems to effectively predict user preferences for items. One major problem in CF is computing similarity between users or items. Traditional CF methods often use heuristic methods to combine the ratings given...
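As an illustration of the user-similarity computation mentioned above, the following minimal sketch uses cosine similarity over co-rated items, one common heuristic in collaborative filtering. It is an assumption chosen for illustration, not the method the paper proposes; the function name and the rating dictionaries are hypothetical.

```python
import math
from typing import Dict


def cosine_similarity(ratings_a: Dict[str, float],
                      ratings_b: Dict[str, float]) -> float:
    """Cosine similarity between two users, restricted to co-rated items.

    Returns 0.0 when the users share no rated items or when either
    user's shared ratings are all zero.
    """
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in shared)
    norm_a = math.sqrt(sum(ratings_a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(ratings_b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical ratings keyed by item identifier.
alice = {"item1": 3.0, "item2": 4.0}
bob = {"item1": 3.0, "item2": 4.0, "item3": 1.0}
carol = {"item4": 5.0}

similarity_alice_bob = cosine_similarity(alice, bob)
similarity_alice_carol = cosine_similarity(alice, carol)
```

Restricting the computation to co-rated items is itself a design choice: it ignores how many items the two users have in common, which is one reason such heuristics are often replaced or supplemented by learned similarity measures.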
In active learning, where a learning algorithm has to purchase the labels of its training examples, it is often assumed that there is only one labeler available to label examples, and that this labeler is noise-free. In reality, it is possible that there are multiple labelers available (such as human labelers in the online annotation tool Amazon Mechanical Turk) and that each such labeler has a different...
In many machine learning and data mining applications, we aim not only to establish accurate black-box predictors but also to discover predictive patterns in data that enhance our interpretation and understanding of underlying physical, biological and other natural processes. Sparse representation is one of the focuses in this direction. More recently, structural sparsity...
The prevailing approach to evaluating classifiers in the machine learning community involves comparing the performance of several algorithms over a series of usually unrelated data sets. However, beyond this there are many dimensions along which methodologies vary wildly. We show that, depending on the stability and similarity of the algorithms being compared, these sometimes-arbitrary methodological...
This paper presents a novel framework for multi-folder email classification using graph mining as the underlying technique. Although several techniques exist (e.g., SVM, TF-IDF, n-gram) for addressing this problem in a delimited context, they heavily rely on extracting high-frequency keywords, thus ignoring the inherent structural aspects of an email (or document in general) which can play a critical...
Human motion recognition in video data has several interesting applications in fields such as gaming, senior/assisted living environments, and surveillance. In these scenarios, we might have to consider adding new motion classes (i.e., new types of human motions to be recognized) as well as new training data (say, for handling different types of subjects). Hence, both accuracy of classification and...
Collecting, monitoring, and analyzing data automatically by well-instrumented systems is frequently motivated by human decision-making. However, the same need occurs when system software decisions are to be justified. Compiler optimization or storage management requires several decisions which result in more or less resource consumption, be it energy, memory, or runtime. A multitude of system data...
An efficient incremental approach to the discriminative common vector (DCV) method for dimensionality reduction and classification is presented. Starting from the original batch method, an incremental formulation is given. The main idea is to minimize both matrix operations and space constraints. To this end, a straightforward per-sample correction is obtained enabling the possibility of setting...
With more older adults and people with cognitive disorders preferring to stay independently at home, prompting systems that assist with Activities of Daily Living (ADLs) are in demand. In this paper, with the introduction of “The PUCK”, we take the very first approach to automate a prompting system without any predefined rule set or user feedback. We statistically analyze realistic prompting data...
Traditional supervised learning assumes that instances are described by observable attributes. The goal is to learn to predict the labels for unseen instances. In many real-world applications the values of some attributes are not only observable, but can be proactively chosen by a decision maker. Furthermore, in some such applications the decision maker is interested not only in generating accurate...
In this work, we investigate sentiment mining of Arabic text at both the sentence level and the document level. Existing research in Arabic sentiment mining remains very limited. For sentence-level classification, we investigate two approaches. The first is a novel grammatical approach that uses a general structure for the Arabic sentence. The second approach is based on the semantic...