In information systems integration, the question of whether the instances of one information system's schema can be recovered from those of another seems profound, yet it has not been well investigated. We shed some light on it by using the notion of an information-carrying relation. In the literature, a conditional-probability-based approach has been proposed for finding such a relation within...
Current Web information extraction systems are supervised systems which require manual annotation of training instances in order to learn extraction rules. The annotation is tedious and subject to changes when Web sites upgrade. In this paper, we present a finite-state-transducer-based method of automatic annotation, which can deal with pages with missing attributes, multiple-valued attributes, multi-ordering...
In this paper, a new access method for very high-dimensional data spaces is proposed. The method uses a graph structure and pivots for indexing objects, such as documents in text mining. It also applies a simple search algorithm that uses distance- or similarity-based functions to obtain the k-nearest neighbors for novel query objects. This method shows good selectivity over very high-dimensional...
This paper focuses on residual analysis of statistical independence of multiple variables from the viewpoint of linear algebra. The results show that multidimensional residuals are represented as a linear sum of determinants of 2 × 2 submatrices, which can be viewed as information granules measuring the degree of statistical dependence.
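The claim in the abstract above can be illustrated for the simplest case, a two-way contingency table. This is a minimal sketch, not the paper's own formulation: it assumes the residual of a cell is the observed count minus the count expected under independence, and the function names `residuals` and `residuals_from_determinants` are hypothetical. With table total N, the residual of cell (i, j) equals 1/N times the sum of the 2 × 2 determinants x_ij·x_kl − x_il·x_kj over all k ≠ i and l ≠ j.

```python
def residuals(table):
    """Observed count minus expected count under independence, per cell."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [
        [table[i][j] - row_totals[i] * col_totals[j] / n
         for j in range(len(col_totals))]
        for i in range(len(row_totals))
    ]


def residuals_from_determinants(table):
    """Same residuals, written as a sum of 2 x 2 determinants divided by N."""
    n = sum(sum(row) for row in table)
    rows, cols = len(table), len(table[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            total = 0.0
            for k in range(rows):
                for l in range(cols):
                    if k != i and l != j:
                        # Determinant of the submatrix on rows {i, k}, columns {j, l}.
                        total += table[i][j] * table[k][l] - table[i][l] * table[k][j]
            out_row.append(total / n)
        out.append(out_row)
    return out


def max_abs_difference(a, b):
    """Largest cell-wise absolute difference between two equally shaped tables."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Illustrative counts only; they are not data from the paper.
example = [[10, 20, 30], [20, 15, 5]]
difference = max_abs_difference(residuals(example), residuals_from_determinants(example))
```

For a 2 × 2 table the sum has a single term, so each residual is the table's determinant divided by N, up to sign. A determinant of zero therefore corresponds exactly to independence.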
Many applications today need to manage data that is uncertain, such as information extraction (IE), data integration, sensor RFID networks, and scientific experiments. Top-k queries are often natural and useful in analyzing uncertain data in those applications. In this paper, we study the problem of answering top-k queries in a probabilistic framework from a state-of-the-art statistical IE model, semi-Conditional...
To address the problems of rule redundancy and long execution time when mining an airborne radar intelligence database with the fuzzy association rules algorithm, this paper defines a new QL-implicator-based fuzzy support measure to enhance the recognition probability of positive association rules and introduces the fuzzy conditional entropy measure (CE-measure)...
We propose an automatic method of extracting bibliographies for academic articles scanned with OCR markup. The method uses conditional random fields (CRF) for labeling serially OCR-ed text lines on an article's title page as appropriate names for bibliographic elements. Although we achieved excellent extraction accuracies for some Japanese academic journals, we needed a substantial amount of training...
The increasing availability of motion data creates unprecedented opportunities to change the paradigm for characterizing movement patterns. While cluster analysis is usually a useful starting point for understanding and exploring data, conventional clustering algorithms are not designed for handling trajectory data. Therefore, in this paper, we propose a direction-based clustering (DEN) method, which...
In this paper, the pattern information (PI) method is modified and applied for the first time to the processing of observation data from an electromagnetic satellite. Taking moderate-to-strong earthquakes as examples, the IAP data recorded by the French DEMETER electromagnetic satellite were systematically processed with the modified PI method. We can find that the variation in non-seismic regions and...
Many real-world Web mining tasks need to discover topics interactively, which means that users are likely to intervene in the topic discovery and selection processes by expressing their preferences. In this paper, a new algorithm based on Latent Dirichlet Allocation (LDA) is proposed for interactive topic evolution pattern detection. To eliminate topics that are not of interest, it allows the users to add...
An important issue in text summarization is that a summary has to reflect the user's personal information need and provide an indicative message to the user. In this paper, a method of acquiring relevant documents based on user-feedback information and transductive inference SVM machine learning is presented. This method avoids the subjectivity of deciding relevant documents empirically. Furthermore, a sentence selection...
To make CRM more effective, we need to classify customers and deliver personalized service so that customer satisfaction and loyalty improve; analyzing and appraising customer credit is an important step. In the traditional method, the precision of customer credit evaluation is insufficient, which puts the enterprise in a dilemma. In view of this problem, this article...
We consider the problem of applying probability concepts to discover frequent itemsets in a transaction database. The paper presents a probabilistic algorithm to discover association rules. The proposed algorithm outperforms the Apriori algorithm for larger databases without losing a single rule. It involves a single database scan and significantly reduces the number of unsuccessful candidate sets...
Data mining methods have been proven effective in extracting knowledge from existing data sources for the classification of soils. Previous studies have suggested that soils are spatial entities with fuzzy boundaries and prompted the development of data mining methods to extract knowledge that allows for fuzzy classifications of soils. This paper first looks at the nature of soil classification from...
Decision tree algorithms are a very active research area in data mining. This paper describes the basic idea of decision trees in data mining, then discusses the computational complexity of the classical decision tree algorithm (the ID3 algorithm). An improved algorithm that constructs a decision tree using statistical theory and ideas from conditional probability is then proposed. Experiments show...
This paper introduces the copula approach, which has been widely used in statistics, to the construction of OLAP cubes for the first time. Based on this approach, a novel scheme is proposed to compress data and answer any OLAP query without accessing raw data. The procedure of this scheme can be divided into three steps. First, find the proper distribution functions to fit the marginal...
An effective tracking method is proposed to solve the problem that the electro-optical tracking system at a missile range easily loses the real target during target separation. Before target separation, the error-correcting value of the theoretical trajectory is obtained by the theoretical trajectory correcting algorithm. In the phase of target separation, the theoretical trajectory of the target...
More and more content on the Web is generated by users. To organize this information and make it accessible via current search technology, tagging systems have gained tremendous popularity. Especially for multimedia content, they allow users to annotate resources with keywords (tags), which opens the door for classic text-based information retrieval. To support the user in choosing the right keywords, tag...
In this work we address the problem of modeling varying time duration sequences for large-scale human routine discovery from cellphone sensor data using a multi-level approach to probabilistic topic models. We use an unsupervised learning approach that discovers human routines of varying durations ranging from half-hourly to several hours. Our methodology can handle large sequence lengths based on...