Discovering patterns in a sequence is an important aspect of data mining. One popular class of such patterns is episodes: patterns in sequential data describing events that often occur in the vicinity of each other. Episodes also enforce the order in which events are allowed to occur. In this work we introduce a technique for discovering closed episodes. Adopting existing approaches for discovering...
Summary form only given. We consider the problem of assessing the significance of groups in high-dimensional data. In the case of supervised classification where there are data of known origin with respect to the groups under consideration, a guide to the degree of separation among the groups can be given in terms of the estimated error rate of a classifier formed to allocate a new observation...
Traditional sequential patterns do not take into account additional contextual information since patterns extracted from data are usually general. By considering the fact that a pattern is associated with one specific context the decision expert can then adapt his strategy considering the type of customers. In this paper we propose to mine more precise patterns of the form "young users buy products...
Sport result prediction is nowadays very popular among fans around the world, which has particularly contributed to the expansion of sports betting. This makes the problem of predicting the results of sporting events a new and interesting challenge. Consequently, systems dealing with this problem are developed every day. This paper presents one such system, which uses data mining techniques in order to...
The class imbalance problem often occurs in real applications: one class may have far fewer examples than another in the training set. Under-sampling is a very popular approach to this problem. It is very efficient because it uses only a subset of the majority class. The drawback of under-sampling is that it throws away many potentially...
In this paper, we propose a framework to answer questions of opinion type. The data source is the web pages returned from the search engine. Using a Bayes classifier, the main texts on the pages are classified into three categories at sentence level: positive review, negative review and neutral review. The K-means method is used to cluster the positive-review and negative-review sentences respectively...
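As an illustration of the under-sampling idea described in the abstract above, here is a minimal sketch of plain random under-sampling. It is not the method proposed in that paper (the abstract is truncated before the method is stated). The function name `random_undersample` and the toy data are assumptions made for this example only.

```python
import random
from collections import defaultdict

def random_undersample(samples, labels, seed=0):
    """Hypothetical helper: balance classes by keeping a random subset
    of every class that is larger than the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    # Size of the smallest (minority) class.
    n_min = min(len(xs) for xs in by_class.values())
    balanced = []
    for y, xs in by_class.items():
        # Keep the minority class whole; sample n_min items from larger classes.
        kept = xs if len(xs) == n_min else rng.sample(xs, n_min)
        balanced.extend((x, y) for x in kept)
    rng.shuffle(balanced)
    return balanced

# Toy data (assumed): 10 minority examples (label 1), 90 majority examples (label 0).
samples = list(range(100))
labels = [1] * 10 + [0] * 90
balanced = random_undersample(samples, labels)
```

The 80 discarded majority examples are exactly the "thrown away" data that the abstract identifies as the drawback of this approach.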
The problem of privacy-preserving data mining has become more and more important in recent years. Many successful and efficient techniques have been developed. However, in collaborative data analysis, part of the datasets may come from different data owners and may be processed using different data distortion methods. Thus, combinations of datasets processed using different methods are of practical...
Learning in a non-stationary environment and in the presence of class imbalance has been receiving more recognition from the computational intelligence community, but little work has been done to create an algorithm or a framework that can handle both issues simultaneously. We have recently introduced a new member to the Learn++ family of algorithms, Learn++.NSE, which is designed to track non-stationary...
Action rules are built from atomic expressions called atomic action terms, and they describe possible transitions of objects from one state to another. They involve changes of values within one decision attribute. An association action rule is similar to an action rule, but it may refer to changes of values involving several attributes listed in its decision part. Action paths are defined as sequences...
Accuracy is a very important criterion for a classifier in the process of classification. In this paper, a unified paradigm for calculating the accuracy of different classifiers, using topological covering-based granular computing, is presented under a given sample space and different ideal classification assumptions. Corresponding examples for the calculation of accuracy in different...
Association rule mining is one of the most popular data mining techniques to find associations among items in a set by mining necessary patterns in a large database. Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e....
In recent years, the management and processing of data streams has become a topic of active research in several fields of computer science, such as distributed systems, database systems, and data mining. In data stream applications, such as network monitoring, telecommunication systems and sensor networks, because of online monitoring, answering the user's queries should be time and space efficient...
The study of host-pathogen protein-protein interactions (PPIs) is essential to understanding the disease-causing mechanisms of human pathogens. A large number of scientific findings about PPIs are reported in the biomedical literature. Building a document classification system can accelerate the process of mining and curation of PPI knowledge. With more and more imbalanced datasets appearing, how to...
Suppose that we are interested in classifying n points in a z-dimensional space into two groups having response 1 and response 0 as the target variable. In some real data cases in customer classification, it is difficult to discriminate the favorable customers showing response 1 from others because many response 1 points and 0 points are closely located. In such a case, to find the denser regions...
Controlling the space consumption and improving the precision of mining results are two challenges of frequent pattern mining in data streams. The parameter ε, which denotes the maximum error, is widely used to reduce the space consumption. In this paper, we first propose a computational strategy for identifying the maximum error, consisting of resource awareness and polynomial approximation, and then propose...
We present Bautext, a new minimally supervised approach for automatically extracting ratable aspects from customer reviews and classifying them into some previously defined categories. Bautext requires a small number of seed words as supervised data and uses a bootstrapping mechanism to progressively collect new members for each category. Learning new category members and the category-specific terms for...
When large data repositories are coupled with geographic distribution of data, users and systems, it is necessary to combine different technologies for implementing high-performance distributed knowledge discovery systems. On the other hand, computational grid is emerging as a very promising infrastructure for high-performance distributed computing. Grid applications such as astronomy, chemistry,...
Patterns represent an important tool for communicating, documenting and looking up best practices for both novice and expert system developers and designers. Although there are a number of different patterns and pattern languages available, it is still unclear how to validate patterns in a structured way. Within this paper, we aim to fill this gap by introducing a Quality Criteria Framework developed...
Nowadays, with the information explosion, it is becoming increasingly important for users to find a resource quickly and efficiently in social tagging systems. To deal with this problem, this paper constructs an information classifying and exploring system based on users' tagging behaviors. We group the tags and resources by their semantic relations to construct Tag Bundles automatically, and generate a suitable...
Feature selection is an important problem in the fields of machine learning and pattern recognition. Data stream data is high-dimensional and sparse, so its dimensionality needs to be reduced. The study of feature selection methods suitable for data stream classification is therefore of great value, yet this area currently lacks in-depth study. This paper summarizes the current data stream classification...