Search results

chapter

Mining Closed Strict Episodes

N Tatti, B Cule

2010 IEEE International Conference on Data Mining > 501 - 510

2010 10th IEEE International Conference on Data Mining (ICDM 2010)

Discovering patterns in a sequence is an important aspect of data mining. One popular choice of such patterns are episodes, patterns in sequential data describing events that often occur in the vicinity of each other. Episodes also enforce in which order events are allowed to occur. In this work we introduce a technique for discovering closed episodes. Adopting existing approaches for discovering...

chapter

Assessing the Significance of Groups in High-Dimensional Data

G McLachlan

2010 IEEE International Conference on Data Mining > 6

2010 10th IEEE International Conference on Data Mining (ICDM 2010)

Summary form only given. We consider the problem of assessing the significance of groups in high-dimensional data. In the case of supervised classification where there are data of known origin with respect to the groups under consideration, a guide to the degree of separation among the groups can be given in terms of the estimated error rate of a classifier formed to allocate a new observation...

chapter

Contextual Sequential Pattern Mining

J Rabatel, S Bringay, P Poncelet

2010 IEEE International Conference on Data Mining Workshops > 981 - 988

2010 10th IEEE International Conference on Data Mining Workshops (ICDMW 2010)

Traditional sequential patterns do not take into account additional contextual information since patterns extracted from data are usually general. By considering the fact that a pattern is associated with one specific context the decision expert can then adapt his strategy considering the type of customers. In this paper we propose to mine more precise patterns of the form "young users buy products...

chapter

The use of data mining for basketball matches outcomes prediction

Dragan Miljković, Ljubisa Gajić, Aleksandar Kovačević, Zora Konjović

IEEE 8th International Symposium on Intelligent Systems and Informatics > 309 - 312

2010 IEEE 8th International Symposium on Intelligent Systems and Informatics (SISY 2010)

Sport result prediction is nowadays very popular among fans around the world, which particularly contributed to the expansion of sports betting. This makes the problem of predicting the results of sporting events a new and interesting challenge. Consequently, systems dealing with this problem are developed every day. This paper presents one such system, which uses data mining techniques in order to...

chapter

Cluster-based majority under-sampling approaches for class imbalance learning

Yan-Ping Zhang, Li-Na Zhang, Yong-Cheng Wang

2010 2nd IEEE International Conference on Information and Financial Engineering > 400 - 404

2010 2nd IEEE International Conference on Information and Financial Engineering (ICIFE 2010)

The class imbalance problem usually occurs in real applications. Class imbalance means that one class may have far fewer examples than another in the training set. Under-sampling is a very popular approach to deal with this problem. The under-sampling approach is very efficient because it uses only a subset of the majority class. The drawback of under-sampling is that it throws away many potentially...

chapter

A Framework to Answer Questions of Opinion Type

Xiangdong Su, Guanglai Gao, Yu Tian

2010 Seventh Web Information Systems and Applications Conference > 166 - 169

2010 7th Web Information Systems and Applications Conference (WISA 2010). Workshop on Semantic Web and Ontology (SWON2010). Workshop on Electronic Government Technology and Application (EGTA 2010)

In this paper, we propose a framework to answer questions of opinion type. The data source is the web pages returned from the search engine. By using Bayes Classifier, the main texts on the pages are classified into three categories at sentence level: positive review, negative review and neutral review. K-means method is used to cluster the sentences of positive review and negative review respectively...

chapter

Combined data distortion strategies for privacy-preserving data mining

Bo Peng, Xingyu Geng, Jun Zhang

2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE) > 1 > V1-572 - V1-576

2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE 2010)

The problem of privacy-preserving data mining has become more and more important in recent years. Many successful and efficient techniques have been developed. However, in collaborative data analysis, part of the datasets may come from different data owners and may be processed using different data distortion methods. Thus, combinations of datasets processed using different methods are of practical...

chapter

An Incremental Learning Algorithm for Non-stationary Environments and Class Imbalance

Gregory Ditzler, Robi Polikar, Nitesh Chawla

2010 20th International Conference on Pattern Recognition > 2997 - 3000

2010 20th International Conference on Pattern Recognition (ICPR 2010)

Learning in a non-stationary environment and in the presence of class imbalance has been receiving more recognition from the computational intelligence community, but little work has been done to create an algorithm or a framework that can handle both issues simultaneously. We have recently introduced a new member to the Learn⁺⁺ family of algorithms, Learn⁺⁺.NSE, which is designed to track non-stationary...

chapter

Association Action Rules and Action Paths Triggered by Meta-actions

Angelina A Tzacheva, Zbigniew W Ras

2010 IEEE International Conference on Granular Computing > 772 - 776

2010 IEEE International Conference on Granular Computing (GrC-2010)

Action rules are built from atomic expressions called atomic action terms, and they describe possible transitions of objects from one state to another. They involve changes of values within one decision attribute. An association action rule is similar to an action rule, but it may refer to changes of values involving several attributes listed in its decision part. Action paths are defined as sequences...

chapter

A Unified Paradigm for the Accuracy of Classification Based on Granular Computing

Yongbing Chen, Shuang Liu, Ping Ye

2010 IEEE International Conference on Granular Computing > 669 - 672

2010 IEEE International Conference on Granular Computing (GrC-2010)

Accuracy is a very important criterion for a classifier in the process of classification. In this paper, a unified paradigm for calculating the accuracy of different classifiers, using topological covering-based granular computing, is presented under the given sample space and different ideal classification assumptions. And corresponding examples for the calculation of accuracy in different...

chapter

Mining positive and negative association rules

B Ramasubbareddy, A Govardhan, A Ramamohanreddy

2010 5th International Conference on Computer Science & Education > 1403 - 1406

2010 5th International Conference on Computer Science & Education (ICCSE 2010)

Association rule mining is one of the most popular data mining techniques to find associations among items in a set by mining necessary patterns in a large database. Typical association rules consider only items enumerated in transactions. Such rules are referred to as positive association rules. Negative association rules also consider the same items, but in addition consider negated items (i.e....

chapter

Classification and evaluation of data mining techniques for data stream requirements

M Kholghi, H Hassanzadeh, M Keyvanpour

2010 International Symposium on Computer, Communication, Control and Automation (3CA) > 1 > 474 - 478

2010 International Symposium on Computer, Communication, Control and Automation (3CA 2010)

In recent years, the management and processing of data streams has become a topic of active research in several fields of computer science, such as distributed systems, database systems, and data mining. In data stream applications, such as network monitoring, telecommunication systems and sensor networks, because of online monitoring, answering the user's queries should be time and space efficient...

chapter

Active Learning Algorithm for Threshold of Decision Probability on Imbalanced Text Classification Based on Protein-Protein Interaction Documents

Guixian Xu, Zhendong Niu, Xu Gao, Yujuan Cao, et al.

2010 International Conference on Data Storage and Data Engineering > 78 - 82

2010 International Conference on Data Storage and Data Engineering (DSDE 2010)

The study of host-pathogen protein-protein interactions (PPIs) is essential to understand the disease-causing mechanisms of human pathogens. A large number of scientific findings about PPIs are generated in the biomedical literature. Building a document classification system can accelerate the process of mining and curation of PPI knowledge. With more and more imbalanced datasets appearing, how to...

chapter

Assessment of the Trade-off Curve Accuracy in the Bump Hunting Using the Tree-GA

H. Hirose

2010 Third International Conference on Knowledge Discovery and Data Mining > 597 - 600

2010 3rd International Conference on Knowledge Discovery and Data Mining (WKDD 2010)

Suppose that we are interested in classifying n points in a z-dimensional space into two groups having response 1 and response 0 as the target variable. In some real data cases in customer classification, it is difficult to discriminate the favorable customers showing response 1 from others because many response 1 points and 0 points are closely located. In such a case, to find the denser regions...

chapter

Computing Maximum Error and Reduced Threshold of Mining Frequent Patterns in Data Stream

Hao Guanghao, Zheng Yongqing, Cui Lizhen

2009 International Conference on Information Engineering and Computer Science > 1 - 4

2009 International Conference on Information Engineering and Computer Science. ICIECS 2009

Controlling the space consumption and improving the precision of mining results are two challenges of frequent pattern mining in data streams. The parameter ε, which denotes the maximum error, is widely used to reduce the space consumption. In this paper, we first propose a computational strategy for identifying the maximum error, consisting of resource awareness and polynomial approximation, and then propose...

chapter

A New Minimally Supervised Learning Method for Semantic Term Classification - Experimental Results on Classifying Ratable Aspects Discussed in Customer Reviews

Thao Pham Thanh Nguyen, T. Hayashi, R. Onai, Y. Nishioka, et al.

2009 IEEE International Conference on Data Mining Workshops > 43 - 50

2009 IEEE International Conference on Data Mining Workshops (ICDMW 2009)

We present Bautext, a new minimally supervised approach for automatically extracting ratable aspects from customer reviews and classifying them into some previously defined categories. Bautext requires a small amount of seed words as supervised data and uses a bootstrapping mechanism to progressively collect new members for each category. Learning new category members and the category-specific terms for...

chapter

Performance of distributed apriori algorithms on a computational grid

S.S. Rawat, L. Rajamani

2009 IEEE Asia-Pacific Services Computing Conference (APSCC) > 163 - 167

2009 IEEE Asia-Pacific Services Computing Conference (APSCC 2009)

When large data repositories are coupled with geographic distribution of data, users and systems, it is necessary to combine different technologies for implementing high-performance distributed knowledge discovery systems. On the other hand, computational grid is emerging as a very promising infrastructure for high-performance distributed computing. Grid applications such as astronomy, chemistry,...

chapter

Introducing a Comprehensive Quality Criteria Framework for Validating Patterns

D. Wurhofer, M. Obrist, E. Beck, M. Tscheligi

2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns > 242 - 247

2009 Computation World: Future Computing, Service Computation, Cognitive, Adaptive, Content, Patterns (ComputationWorld 2009)

Patterns represent an important tool for communicating, documenting and looking up best practices for both novice and expert system developers and designers. Although there are a number of different patterns and pattern languages available, it is still unclear how to validate patterns in a structured way. Within this paper, we aim to fill this gap by introducing a Quality Criteria Framework developed...

chapter

A Classifying and Exploring System Based on Users' Tagging Behaviors

Zhang Lei-ming, Li Qiu-dan, Zheng Nan

2009 Chinese Conference on Pattern Recognition > 1 - 5

2009 Chinese Conference on Pattern Recognition. (CCPR 2009) and the First CJK Joint Workshop on Pattern Recognition (CJKPR)

Nowadays, with the information explosion, it becomes increasingly important for users to find a resource quickly and efficiently in social tagging systems. To deal with this problem, this paper constructs an information classifying and exploring system based on users' tagging behaviors. We group the tags and resources by their semantic relations to construct Tag Bundles automatically, and generate a suitable...

chapter

Feature Selection for Classifying Data Stream Based on Maximum Entropy

Yao-zong Liu, Yong-li Wang, Wei Wei, Hong Zhang

2009 Chinese Conference on Pattern Recognition > 1 - 5

2009 Chinese Conference on Pattern Recognition. (CCPR 2009) and the First CJK Joint Workshop on Pattern Recognition (CJKPR)

Feature selection is an important problem in the fields of machine learning and pattern recognition. Data stream classification involves high-dimensional, sparse data whose dimensionality needs to be reduced, so feature selection methods suited to data stream classification are of great research value, yet this area currently lacks in-depth study. This paper summarizes the current data stream classification...

INFONA - science communication portal
