Search results

chapter

An Information Flow Based Approach to Schema Transformations for Information Systems Integration

Kaibo Xu, Junkang Feng

2010 International Conference on Management and Service Science > 1 - 4

2010 International Conference on Management and Service Science (MASS 2010)

In information systems integration, whether the instances of a schema of an information system may be recovered from those of another is a question, which seems profound, and yet has not been well investigated. We shed some light on it by using the notion of information carrying relation. In the literature, a conditional probability based approach has been proposed to finding such a relation within...

chapter

Automatic Annotation for the Generation of Extraction Rules

Shi Yufei, Chen Rong

2010 International Conference on Management and Service Science > 1 - 5

2010 International Conference on Management and Service Science (MASS 2010)

Current Web information extraction systems are supervised systems which require manual annotation of training instances in order to learn extraction rules. The annotation is tedious and subject to changes when Web sites upgrade. In this paper, we present a finite-state-transducer-based method of automatic annotation, which can deal with pages with missing attributes, multiple-valued attributes, multi-ordering...

chapter

A High-Dimensional Access Method for Approximated Similarity Search in Text Mining

F Artigas-Fuentes, R Gil-García, J M Badía-Contelles

2010 20th International Conference on Pattern Recognition > 3155 - 3158

2010 20th International Conference on Pattern Recognition (ICPR 2010)

In this paper, a new access method for very high-dimensional data space is proposed. The method uses a graph structure and pivots for indexing objects, such as documents in text mining. It also applies a simple search algorithm that uses distance- or similarity-based functions in order to obtain the k-nearest neighbors for novel query objects. This method shows a good selectivity over very high-dimensional...

chapter

Geometrical and Combinatorial Nature of Pearson Residuals

S Tsumoto, S Hirano

2010 IEEE International Conference on Granular Computing > 489 - 494

2010 IEEE International Conference on Granular Computing (GrC-2010)

This paper focuses on residual analysis of statistical independence of multiple variables from the viewpoint of linear algebra. The results show that multidimensional residuals are represented as linear sum of determinants of 2 × 2 submatrices, which can be viewed as information granules measuring the degree of statistical dependence.

chapter

P-top-k queries in probabilistic framework from information extraction models

Ming He, Yong-ping Du

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 5 > 2376 - 2379

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Many applications today need to manage data that is uncertain, such as information extraction (IE), data integration, sensor RFID networks, and scientific experiments. Top-k queries are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering top-k queries in a probabilistic framework from a state-of-the-art statistical IE model, semi-Conditional...

chapter

Research on Improved Quality Measures for the Fuzzy Association Rules of One Airborne Radar Intelligence Database

Jian Cui, Qiang Li, Long-Po Yang, Yong Liu

2010 International Conference on Internet Technology and Applications > 1 - 4

2010 International Conference on Internet Technology and Applications (iTAP 2010)

To address the problems of rule redundancy and long algorithm execution time when mining an airborne radar intelligence database with the fuzzy association rules algorithm, this paper defines a new QL-implicator based fuzzy support measure in order to enhance the recognition probability of the positive association rules and introduces the fuzzy conditional entropy measure (CE-measure)...

chapter

Empirical evaluation of active sampling for CRF-based analysis of pages

Manabu Ohta, Ryohei Inoue, Atsuhiro Takasu

2010 IEEE International Conference on Information Reuse & Integration > 13 - 18

2010 IEEE International Conference on Information Reuse & Integration (IRI 2010)

We propose an automatic method of extracting bibliographies for academic articles scanned with OCR markup. The method uses conditional random fields (CRF) for labeling serially OCR-ed text lines on an article's title page as appropriate names for bibliographic elements. Although we achieved excellent extraction accuracies for some Japanese academic journals, we needed a substantial amount of training...

chapter

Direction clustering for characterizing movement patterns

Wenjun Zhou, Hui Xiong, Yong Ge, Jannite Yu, et al.

2010 IEEE International Conference on Information Reuse & Integration > 165 - 170

2010 IEEE International Conference on Information Reuse & Integration (IRI 2010)

The increasing availability of motion data creates unprecedented opportunities to change the paradigm for characterizing movement patterns. While cluster analysis is usually a useful starting point for understanding and exploring data, conventional clustering algorithms are not designed for handling trajectory data. Therefore, in this paper, we propose a direction-based clustering (DEN) method, which...

chapter

On new statistical technologies of quantitative seismic information mining from electromagnetic satellite data

Anxu Wu

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 4 > 1715 - 1719

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

In this paper, the pattern information (PI) method is modified and applied for the first time to the processing of electromagnetic satellite observation data. Taking moderate-to-strong earthquakes as examples, the IAP data recorded by the French DEMETER electromagnetic satellite were systematically processed with the modified PI method. We find that the variation in non-seismic regions and...

chapter

A LDA-Based Approach for Interactive Web Mining of Topic Evolutionary Patterns

Bin Zhou, Jiuming Huang, Kai Cui

2010 International Conference on Internet Technology and Applications > 1 - 5

2010 International Conference on Internet Technology and Applications (iTAP 2010)

Many real-world Web mining tasks need to discover topics interactively, which means the users are likely to intervene in the topic discovery and selection processes by expressing their preferences. In this paper, a new algorithm based on Latent Dirichlet Allocation (LDA) is proposed for interactive topic evolution pattern detection. To eliminate topics of no interest, it allows the users to add...

chapter

Document Relevance Identifying and its Effect in Query-Focused Text Summarization

Tingting He, Fang Li, Liang Ma

2010 IEEE International Conference on Granular Computing > 206 - 211

2010 IEEE International Conference on Granular Computing (GrC-2010)

An important issue is that text summarization has to reflect the user's personal information need and provide an indicative message to the user. In this paper, a method of acquiring relevant documents based on user-feedback information and transductive-inference SVM machine learning is presented. This method avoids the subjectivity of deciding relevant documents empirically. Furthermore, a sentence selection...

chapter

Application of Bayesian Network in Improving Customer Credit Precision

Gang Ma, Bin Li, Fangfang Yang

2010 International Conference on Management and Service Science > 1 - 3

2010 International Conference on Management and Service Science (MASS 2010)

To make CRM more effective, we need to classify customers and provide personalized service so as to promote customer satisfaction and loyalty; analyzing and appraising customer credit is an important step. In the traditional method, the precision of customer credit evaluation is insufficient, which puts the enterprise in a dilemma. In view of this problem, this article...

chapter

Information Granules of Statistical Dependence in Multiway Contingency Tables

Shusaku Tsumoto, Shoji Hirano

2010 IEEE International Conference on Granular Computing > 483 - 488

2010 IEEE International Conference on Granular Computing (GrC-2010)

This paper focuses on residual analysis of statistical independence of multiple variables from the viewpoint of linear algebra. The results show that multidimensional residuals are represented as linear sum of determinants of 2 × 2 submatrices, which can be viewed as information granules measuring the degree of statistical dependence.

chapter

A Probabilistic Approach to Apriori Algorithm

Vaibhav Sharma, M M Sufyan Beg

2010 IEEE International Conference on Granular Computing > 402 - 408

2010 IEEE International Conference on Granular Computing (GrC-2010)

We consider the problem of applying probability concepts to discover frequent itemsets in a transaction database. The paper presents a probabilistic algorithm to discover association rules. The proposed algorithm outperforms the Apriori algorithm for larger databases without losing a single rule. It involves a single database scan and significantly reduces the number of unsuccessful candidate sets...

chapter

Fuzziness vs. probability in a data mining application for soil classification

Feng Qi

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery > 6 > 2614 - 2618

2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD)

Data mining methods have been proven effective in extracting knowledge from existing data sources for the classification of soils. Previous studies have suggested that soils are spatial entities with fuzzy boundaries and prompted the development of data mining methods to extract knowledge that allows for fuzzy classifications of soils. This paper first looks at the nature of soil classification from...

chapter

Research and application of conditional probability decision tree algorithm in data mining

XianMin Wei

2010 Second Pacific-Asia Conference on Circuits, Communications and Systems > 2 > 78 - 80

2010 Second Pacific-Asia Conference on Circuits, Communications and Systems (PACCS 2010)

Decision tree algorithms are a very active research area of data mining. This paper describes the basic decision tree idea in data mining, then discusses the computational complexity of the classical decision tree algorithm (ID3 algorithm). An improved algorithm that constructs a decision tree using statistical theory and ideas of conditional probability is then proposed. Experiments show...

chapter

Compression of OLAP Cubes for Aggregate Queries Based on Copula Approach

Yazhuo Gao, Zhiwei Ni, Liping Ni

2010 Third International Conference on Business Intelligence and Financial Engineering > 67 - 71

Third International Conference on Business Intelligence and Financial Engineering (BIFE 2010)

This paper introduces the Copula approach, which has been widely used in statistics, to the construction of OLAP cubes for the first time. Based on this approach, a novel scheme is proposed to compress data and answer any OLAP query without accessing raw data. The procedure of this scheme can be generally divided into three steps. First, find the proper distribution functions to fit the marginal...

chapter

An electro-optical tracking method in target separation based on fuzzy clustering association rules

Guo Tong-jian, Gao Hui-bin, Zhang Shu-mei, Wu Yong-jun

2010 International Conference on Computer, Mechatronics, Control and Electronic Engineering > 4 > 289 - 292

2010 International Conference on Computer, Mechatronics, Control and Electronic Engineering (CMCE 2010)

An effective tracking method is proposed to solve the problem that the electro-optical tracking system in a missile range easily loses the real target during target separation. Before target separation, the error correction value of the theoretical trajectory is obtained by the theoretical trajectory correction algorithm. In the phase of target separation, the theoretical trajectory of the target...

chapter

Language Models and Topic Models for Personalizing Tag Recommendation

R Krestel, P Fankhauser

2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology > 1 > 82 - 89

2010 IEEE/ACM International Conference on Web Intelligence-Intelligent Agent Technology (WI-IAT)

More and more content on the Web is generated by users. To organize this information and make it accessible via current search technology, tagging systems have gained tremendous popularity. Especially for multimedia content, they allow resources to be annotated with keywords (tags), which opens the door for classic text-based information retrieval. To support the user in choosing the right keywords, tag...

chapter

Mining Human Location-Routines Using a Multi-Level Approach to Topic Modeling

K Farrahi, D Gatica-Perez

2010 IEEE Second International Conference on Social Computing > 446 - 451

2010 IEEE Second International Conference on Social Computing (SocialCom 2010); the Second IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT 2010)

In this work we address the problem of modeling varying time duration sequences for large-scale human routine discovery from cellphone sensor data using a multi-level approach to probabilistic topic models. We use an unsupervised learning approach that discovers human routines of varying durations ranging from half-hourly to several hours. Our methodology can handle large sequence lengths based on...

INFONA - science communication portal

