The growing complexity and variability characterizing markets have induced scholars and marketers to propose new segmentation approaches. Recent research has shown that including the context in which a transaction occurs in customer behavior models improves the ability to predict customer behavior. However, no systematic research has studied whether contextual information really matters in market...
The performance of user profiling models depends on both the predictive accuracy and the cost of incorrect predictions. In this paper we study whether including contextual information leads to a decrease in the misclassification cost. Several experiments were conducted, varying the cost ratio, the market granularity and the granularity of context. The experimental results show that context...
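The abstract contrasts predictive accuracy with the cost of incorrect predictions. The sketch below makes that distinction concrete for a binary classifier. It assumes class 1 is the positive class, a cost `cost_fn` per false negative and `cost_fp` per false positive (so the cost ratio is `cost_fn / cost_fp`), and it adds the standard expected-cost decision rule. The function names are hypothetical, and this is not the paper's own experimental setup.

```python
def misclassification_cost(y_true, y_pred, cost_fn, cost_fp=1.0):
    """Total cost of a binary classifier's errors (1 = positive class).

    cost_fn is charged per false negative, cost_fp per false positive;
    the cost ratio is cost_fn / cost_fp.
    """
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn * cost_fn + fp * cost_fp


def cost_sensitive_label(p_positive, cost_fn, cost_fp=1.0):
    """Pick the label with the lower expected cost.

    Predicting 0 costs p * cost_fn in expectation and predicting 1 costs
    (1 - p) * cost_fp, so we predict 1 when p > cost_fp / (cost_fp + cost_fn).
    """
    return 1 if p_positive * cost_fn > (1 - p_positive) * cost_fp else 0


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]
    # Both prediction vectors are 50% accurate, but their costs differ at ratio 5.
    print(misclassification_cost(y_true, [1, 0, 1, 0], cost_fn=5.0))  # 1 FN + 1 FP
    print(misclassification_cost(y_true, [1, 1, 1, 1], cost_fn=5.0))  # 0 FN + 2 FP
```

With a cost ratio of 5, the first prediction vector costs 6.0 and the second 2.0, even though their accuracy is identical. Changing `cost_fn` reproduces the kind of cost-ratio sweep the abstract describes.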
A variety of services that depend on highly developed networks and personal equipment have recently become available. With these advances, connecting this equipment has become increasingly complicated. Problems such as connection failures have increased, and determining their cause has become difficult in some cases, because software is often updated to keep up with advances in services or security. Telecom...
Error-reduction sampling (ERS) is a high-performing (but computationally expensive) query selection strategy for active learning. Subset optimisation has been proposed to reduce the computational expense by applying ERS to only a subset of examples from the pool. This paper compares techniques used to construct the subset, namely random sub-sampling and pre-filtering. We focus on pre-filtering which populates...
This paper presents a new keyword extraction algorithm for Chinese news Web pages using lexical chains and word co-occurrence combined with frequency features, cohesion features, and correlation features. A lexical chain is a sequence of semantically related words that externally manifests the cohesion of a text, and it represents the semantic content of a portion of the text. Word co-occurrence distribution...
For multi-view learning, existing methods usually train classifiers on the originally provided features, ignoring the latent correlation between different views. In this paper, semantic features integrating information from multiple views are extracted for pattern representation. Canonical correlation analysis is used to learn the representation of semantic spaces where semantic features...
Clustering problems often involve datasets where only a part of the data is relevant to the problem, e.g., in microarray data analysis only a subset of the genes show cohesive expressions within a subset of the conditions/features. The existence of a large number of non-informative data points and features makes it challenging to hunt for coherent and meaningful clusters from such datasets. Additionally,...
An association rule (AR) is a common knowledge model in data mining that describes an implicative co-occurring relationship between two disjoint sets of binary-valued transaction database attributes (items), expressed in the form of an "antecedent ⇒ consequent" rule. A variant of the AR is the weighted association rule (WAR). With regard to a marketing context, this paper introduces a...
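The "antecedent ⇒ consequent" form is usually scored by support and confidence. The sketch below computes both over transactions stored as item sets. It also shows one simple weighted variant, which is an assumption here: support multiplied by the mean weight of the rule's items, using a hypothetical per-item weight table (for example, profit margins). The paper's own WAR definition is truncated above and may differ.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) for the rule antecedent => consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    if antecedent & consequent:
        raise ValueError("antecedent and consequent must be disjoint")
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)


def weighted_support(transactions, itemset, weights):
    """Assumed WAR-style score: support scaled by the mean weight of the items."""
    itemset = frozenset(itemset)
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support(transactions, itemset)


if __name__ == "__main__":
    baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    print(support(baskets, {"bread", "milk"}))        # 2 of 4 baskets
    print(confidence(baskets, {"bread"}, {"milk"}))   # 2 of the 3 bread baskets
    print(weighted_support(baskets, {"bread", "milk"}, {"bread": 0.5, "milk": 1.0, "butter": 0.2}))
```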
Distance computation is one of the most computationally intensive operations employed by many data mining algorithms. Performing such matrix computations within a DBMS creates many optimization challenges. We propose techniques to efficiently compute Euclidean distance using SQL queries and user-defined functions (UDFs). We concentrate on efficient Euclidean distance computation for the well-known...
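The abstract describes computing Euclidean distance inside the database with SQL queries and user-defined functions. The sketch below shows both routes on a tiny example, using Python's built-in `sqlite3` module as a stand-in DBMS, which is an assumption since the paper's DBMS is not named here:

- a plain SQL arithmetic expression for the squared distance;
- a scalar UDF, registered with `Connection.create_function`, that returns the distance itself.

The table layout (`X` for 2-D points, `C` for centroids) and the UDF name `euclid` are hypothetical. This is a sketch, not the paper's optimized techniques.

```python
import math
import sqlite3


def nearest_centroids(points, centroids):
    """Return (centroid index, distance) for each 2-D point, computed in SQL."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical horizontal layout: one row per point, one column per dimension.
    conn.execute("CREATE TABLE X (i INTEGER PRIMARY KEY, x1 REAL, x2 REAL)")
    conn.execute("CREATE TABLE C (j INTEGER PRIMARY KEY, c1 REAL, c2 REAL)")
    conn.executemany("INSERT INTO X VALUES (?, ?, ?)",
                     [(i, p[0], p[1]) for i, p in enumerate(points)])
    conn.executemany("INSERT INTO C VALUES (?, ?, ?)",
                     [(j, c[0], c[1]) for j, c in enumerate(centroids)])
    # Scalar UDF: SQLite calls this Python function once per (point, centroid) pair.
    conn.create_function("euclid", 4, lambda a1, a2, b1, b2: math.hypot(a1 - b1, a2 - b2))
    rows = conn.execute(
        """
        SELECT X.i, C.j,
               (X.x1 - C.c1) * (X.x1 - C.c1) + (X.x2 - C.c2) * (X.x2 - C.c2) AS sq,
               euclid(X.x1, X.x2, C.c1, C.c2) AS d
        FROM X CROSS JOIN C
        ORDER BY X.i, sq, C.j
        """
    ).fetchall()
    conn.close()
    nearest = {}
    for i, j, _sq, d in rows:
        nearest.setdefault(i, (j, d))  # rows are sorted, so the first per point is closest
    return [nearest[i] for i in range(len(points))]


if __name__ == "__main__":
    print(nearest_centroids([(0, 0), (10, 10), (4, 0)], [(1, 0), (9, 10)]))
```

Ordering by the squared distance avoids needing a SQL `sqrt`, which not every SQLite build provides. The UDF shows the alternative of pushing the full computation into user-defined code.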
For many data mining applications, it is necessary to develop algorithms that use unlabeled data to improve the accuracy of supervised learning. Co-Training is a popular semi-supervised learning algorithm. It assumes that each example is represented by two or more redundantly sufficient sets of features (views) and these views are independent given the class. However, these assumptions are not...
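The sketch below illustrates the basic co-training loop on examples that have two views. In each round, a classifier trained on one view labels the unlabeled example it is most confident about, and that example joins the shared labeled set used by both views. Several choices are assumptions made to keep the example self-contained:

- a nearest-centroid classifier;
- confidence measured as the margin between the two nearest class centroids;
- one example added per view per round.

This is not the variant studied in the paper.

```python
import math


def centroids(xs, ys):
    """Per-class mean vector (nearest-centroid classifier 'training')."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        s = sums.setdefault(y, [0.0] * len(x))
        for k, v in enumerate(x):
            s[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in s) for y, s in sums.items()}


def predict(cents, x):
    """Return (label, margin): margin = gap between the two closest centroids."""
    ds = sorted((math.dist(x, c), y) for y, c in cents.items())
    margin = ds[1][0] - ds[0][0] if len(ds) > 1 else float("inf")
    return ds[0][1], margin


def co_train(labeled, unlabeled, rounds=10):
    """labeled: list of (view1, view2, label); unlabeled: list of (view1, view2)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            # Train on this view only, then label its most confident pool example.
            cents = centroids([ex[view] for ex in labeled], [ex[2] for ex in labeled])
            best = max(range(len(pool)), key=lambda n: predict(cents, pool[n][view])[1])
            label, _ = predict(cents, pool[best][view])
            v1, v2 = pool.pop(best)
            labeled.append((v1, v2, label))  # the other view learns from this label
    c1 = centroids([ex[0] for ex in labeled], [ex[2] for ex in labeled])
    c2 = centroids([ex[1] for ex in labeled], [ex[2] for ex in labeled])
    return c1, c2, labeled
```

Usage: start from two labeled examples per view plus a pool of unlabeled pairs, call `co_train`, and classify new data with either returned model. The quality of the added labels depends on how well the two-view assumptions mentioned above hold.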
With the emergence of large-volume and high-speed streaming data, recent techniques for stream mining of CFIs (closed frequent itemsets) will become inefficient. When concept drift occurs at a slow rate in high-speed data streams, the rate of change of information across different sliding windows will be negligible. So, the user won't be devoid of change in information if we slide window...
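As background for the abstract, the sketch below mines closed frequent itemsets in each position of a sliding window over a transaction stream. An itemset is closed if no proper superset has the same support. The miner is a naive brute-force baseline, assumed only for illustration and workable only for small item sets. The `step` parameter controls how far the window advances. The paper's own strategy for sliding under slow drift is truncated above and is not reproduced here.

```python
from itertools import combinations


def frequent_itemsets(window, min_support):
    """Brute-force count of every itemset with support >= min_support (absolute count)."""
    items = sorted(set().union(*window)) if window else []
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in window if s <= t)
            if count >= min_support:
                result[s] = count
                found = True
        if not found:
            break  # anti-monotonicity: no frequent k-itemset means no frequent (k+1)-itemset
    return result


def closed_itemsets(freq):
    """Keep itemsets with no proper superset of equal support (such a superset is also frequent)."""
    return {s: c for s, c in freq.items()
            if not any(s < t and c == ct for t, ct in freq.items())}


def sliding_cfis(stream, width, step, min_support):
    """Closed frequent itemsets for each window of `width` transactions, advancing by `step`."""
    out = []
    for start in range(0, len(stream) - width + 1, step):
        window = [frozenset(t) for t in stream[start:start + width]]
        out.append(closed_itemsets(frequent_itemsets(window, min_support)))
    return out


if __name__ == "__main__":
    stream = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
    for cfis in sliding_cfis(stream, width=3, step=1, min_support=2):
        print(cfis)
```

In the first window, {b} is frequent but not closed, because {a, b} has the same support. Increasing `step` processes fewer windows at the cost of coarser tracking of change.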
Weka4WS is an extension of the Weka toolkit to support remote execution of data mining tasks as grid services. A first version of Weka4WS supporting concurrent execution of multiple data mining tasks on remote grid nodes has been presented in a previous work. In this paper we present a new version that also supports the composition and execution of data mining workflows on a grid. This new version of...
Behavior is increasingly recognized as a key component in business intelligence and problem-solving. Different from traditional behavior analysis, which mainly focuses on implicit behavior and explicit business appearance resulting from business usage and customer demographics, this paper proposes the field of Behavior Informatics and Analytics (BIA), to support explicit behavior involvement through...
We introduce a flexible scoring model that can be used by property and casualty insurers that have access to a risk-sharing pool to better select the insureds to transfer to the pool. The model discriminates between insureds whose transfer is likely to be profitable under the pool regulations and those paying a fair premium. This model makes use of feature selection methods to automatically discover...
Sales prediction is an important problem for different companies involved in manufacturing, logistics, marketing, wholesaling and retailing. Food companies are especially concerned with predicting sales of products that have a short shelf life and seasonal changes in demand. The demand may depend on many hidden contexts, not given explicitly in the form of predictive features. Even if some changes are known...
In this paper we consider the problem of discovering frequent temporal patterns in a database of temporal sequences, where a temporal sequence is a set of items with associated dates and durations. Since the quantitative temporal information appears to be fundamental in many contexts, it is taken into account in the mining processes and returned as part of the extracted knowledge. To this end, we...
This paper describes a multi-dimensional knowledge discovery and data mining (KDD) methodology that aims at discovering actionable knowledge related to Internet threats, taking into account domain expert guidance and the integration of domain-specific intelligence during the data mining process. The objectives are twofold: i) to develop global indicators for assessing the prevalence of certain malicious...