Vertical association rule mining is a recent and effective mining method that uses the support sets of frequent itemsets to calculate the support of candidate itemsets. It overcomes the drawbacks of Apriori and related algorithms, which produce a large number of candidate itemsets and require many database scans. The vertical association rules mining algorithm needs...
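The support-set idea described above can be sketched as follows. This is a minimal illustration and not the algorithm from the paper, whose details are truncated here. It assumes each item's support set is the set of transaction ids containing it, and that a candidate's support set is the intersection of the support sets of two frequent itemsets that join to form it. The function names `build_tidsets` and `frequent_itemsets` are hypothetical.

```python
from itertools import combinations


def build_tidsets(transactions):
    """Map each item to its support set: the ids of transactions containing it."""
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)
    return tidsets


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} using support-set intersection (assumed scheme).

    The database is scanned once to build the vertical layout; every later
    support count comes from intersecting support sets, not from rescanning.
    """
    tidsets = build_tidsets(transactions)
    # Level 1: frequent single items with their support sets.
    current = {
        frozenset([item]): tids
        for item, tids in tidsets.items()
        if len(tids) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: len(tids) for itemset, tids in current.items()})
        next_level = {}
        for (a, tids_a), (b, tids_b) in combinations(current.items(), 2):
            candidate = a | b
            # Join only itemsets that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            tids = tids_a & tids_b  # support set of the candidate
            if len(tids) >= min_support:
                next_level[candidate] = tids
        current = next_level
    return result
```

Calling `frequent_itemsets(transactions, min_support)` with a list of item sets and an absolute support threshold returns a dictionary keyed by `frozenset`. The intersection step is what replaces the repeated database scans of an Apriori-style search.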
An anomaly intrusion detection algorithm based on minimal diversity is proposed. It can deal with mixed attributes, and so overcomes a deficiency of most unsupervised learning methods. Based on the minimal diversity measurement, we use a small amount of labeled data to guide clustering. When detecting a new record, we calculate its diversity from the existing clusters to determine its category. This...
Grammar rule sets are often used in natural language processing. Usually, a rule set can only be derived manually from linguistic materials. In this paper, by studying how grammar rules are used in natural language processing, we propose a method that applies a typical data mining technique, association rule mining, to discover Chinese grammar rules in a real corpus environment. And we build...
When studying attribute reduction, we need to enumerate all the minimal reductions to prove the correctness of an attribute reduction algorithm. The combinatorial explosion of attribute combinations makes the time complexity of the enumeration algorithm very high. An algorithm based on grouping the rows of the decision table is put forward to validate whether an attribute set is a reduction. Based on this algorithm, a...
A feature point detection algorithm based on scale-space theory is presented. The algorithm overcomes the drawback that a typical single-scale Harris detector usually either misses significant corner points or detects false corner points due to noise and position displacement. Initial matching is performed using the similarity of the image gradient modulus and argument, and the crude matching...
Web text currently contains a large quantity of uncertain and unstructured content. It is difficult to cluster such text with conventional classification methods. An algorithm for Web text clustering analysis based on fuzzy sets is proposed in this paper, and the algorithm is described in detail through an example. The technique can improve the time and space complexity of the algorithm,...
A novel finite-state approach that makes practical use of modal operators for Chinese partial parsing is presented in this paper. Traditional rule-based partial parsing approaches use regular grammars to approximate context-free or simplified context-sensitive grammars, with some loss of descriptive ability. Until now, how to solve the non-categorical problem within the rule-based paradigm...
To extract the bottle boundary and pick out unqualified tube-type bottles that may have breaks or irregular shapes, a novel processing method based on the wavelet transform and mathematical morphology is adopted. Industrial bottle images contain noise, so traditional and theoretical image processing algorithms need to be improved. In addition, because the bottle inspection standard is...
The number of Web databases is increasing at a surprising rate. Data in Web databases are hidden behind query forms. Because general-purpose crawlers have difficulty reaching these data, massive resources are wasted. To integrate Web databases and make querying more convenient for users, one of the important problems in this research area is to understand what a query form...
An implementation of a data preprocessing system for Web usage mining and the details of an algorithm for path completion are presented. After user session identification, the missing pages in user access paths are appended using the referer-based method, which is an effective solution to the problems introduced by proxy servers and local caching. The reference length of pages in complete path...
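A referer-based path completion step can be sketched as below. The abstract does not give the algorithm's details, so this follows a common formulation as an assumption: if a request's referer is not the page viewed immediately before it but does appear earlier in the session, the user is presumed to have navigated back through cached pages, and those pages are re-inserted into the path. The function name `complete_path` and the `(page, referer)` tuple layout are hypothetical.

```python
def complete_path(requests):
    """Rebuild a session's full access path from logged (page, referer) pairs.

    Assumed rule: when the referer differs from the last page in the path but
    occurs earlier in it, the pages between are re-added in reverse order,
    modelling Back-button visits served from a local cache or proxy.
    """
    path = []
    for page, referer in requests:
        if path and referer is not None and path[-1] != referer and referer in path:
            # Index of the most recent visit to the referer page.
            last_seen = len(path) - 1 - path[::-1].index(referer)
            # Pages revisited while backtracking, ending at the referer itself.
            backtracked = path[last_seen:-1][::-1]
            path.extend(backtracked)
        path.append(page)
    return path
```

For a session logged as A, then B (from A), then C (from B), then D (from A), the sketch yields the path A, B, C, B, A, D: the two backward steps were never logged by the server and are restored from the referer field.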
In this paper we propose a method that simultaneously performs image denoising and salient curve extraction among random dot patterns using tensor voting. Given an image containing random dots, the pixels are first converted into a set of tokens to be preprocessed by tensor voting; the voting results are then binarized and thinned by a morphological filter to extract the salient curves. Finally, the...
Text categorization is an important research field within text mining. The initial objective of text categorization is to recognize, understand and organize collections of texts or documents. Categorization is generally treated as supervised learning, in which similarity is inferred from a collection of categorized texts used for training. Obviously, the typical approaches...
Digital watermarking is a new and effective method for digital copyright protection and data security. A new image retrieval scheme based on digital watermarking is proposed. First, the binary watermark information is embedded in the carrier image; then the NC (normalized correlation coefficient) between the query watermark and the extracted watermark is computed to obtain the retrieval results. The watermarking...
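The NC comparison mentioned above can be illustrated with a short sketch. The abstract does not state which definition of NC the authors use; the version below assumes one widely used form, the inner product of the two watermarks divided by the product of their Euclidean norms, applied to flat sequences of 0/1 bits. The function name `normalized_correlation` is hypothetical.

```python
import math


def normalized_correlation(query_bits, extracted_bits):
    """Normalized correlation between two equal-length binary watermarks.

    Assumed definition: sum(w * w') / sqrt(sum(w^2) * sum(w'^2)).
    Returns 0.0 if either watermark is all zeros.
    """
    if len(query_bits) != len(extracted_bits):
        raise ValueError("watermarks must have the same length")
    inner = sum(a * b for a, b in zip(query_bits, extracted_bits))
    energy = sum(a * a for a in query_bits) * sum(b * b for b in extracted_bits)
    if energy == 0:
        return 0.0
    return inner / math.sqrt(energy)
```

Under this definition, identical watermarks give a value of 1 and watermarks with no overlapping 1 bits give 0, so candidate images could be ranked by the NC of their extracted watermark against the query watermark.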
With the rapid development of the Internet and communication technology, huge amounts of data have accumulated. Short texts, such as chat room conversations and emails, are common in such data. It is useful to cluster such short documents to reveal the structure of the data or to help build other data mining applications. But most current clustering algorithms cannot achieve acceptable clustering accuracy...
To assess the quality of the traditional Chinese medicine Eucommia Bark in real time, based on the characteristics of the Eucommia Bark fingerprint, the basic concepts of rough sets are briefly introduced. Because rough sets can only deal with discrete data, discretization is the key factor in applying rough sets to quality assessment; we present a discretization method based on cluster...
Layout understanding of the ballot image is the basis of ballot image recognition, and ballot table recognition is the key to layout understanding of the ballot image. This paper presents a method based on a binomial tree model to realize ballot table recognition. We first analyze the layout characteristics of the ballot image, then study the logical dependence between the candidate...
This paper proposes an improved method for developing reusable components from legacy non-object-oriented code. By analyzing the key data types of subroutines in a non-object-oriented system, this method extracts the meaningful objects and packages them as reusable components. This method has been implemented and applied successfully in our experimental automatic object-extraction system for C programs.
In this study, concept lattices are used to mine correlated policy rules. With conventional policy storage models, a conflict detection routine takes a long time to search every policy in the policy repository for conflicts before a new dynamic policy is added. A novel storage model for dynamic policies is proposed to address this problem. Dynamic policies...
The emergence of search engines set off an unprecedented storm of information. In recent years, a new breakthrough, vertical search, has emerged on the basis of general search engines; unlike general search engines, it relies on pre-analysis. A successful vertical search engine must be based on the accurate extraction of a wide variety of Web information. However, unstable...