This paper proposes a multi-relational Bayesian classification algorithm based on rough sets. A relational graph is used to dynamically choose the associative tables linked to the target table, a tuple ID propagation approach directly solves the association rule mining problem across multiple database relations, and the rough-set concept of a core is introduced to simplify the associative...
Discretization of continuous-valued attributes is one of the key problems in rough set theory. A multiscale rough set model (MRSM) is developed that describes discretization at multiple scales and analyzes how classifications and certainty relate across scales. In view of the model's efficiency and effectiveness, an optimal scale can be acquired with self-organization, self-study...
SVM has been used successfully in speaker identification, but training an SVM on all the training data takes a long time and a large amount of memory, so training data selection (TDS) is an important step in building an effective speaker identification system. In this paper, a novel TDS method based on PCA and improved ant colony cluster (IACC) is proposed to solve this problem. The...
In many areas of pattern recognition and machine learning, subspace selection is an essential step. Fisher's linear discriminant analysis (LDA) is one of the most well-known linear subspace selection methods. However, LDA suffers from the class separation problem. The projection to a subspace tends to merge close class pairs. A recent result, named maximizing the geometric mean of Kullback-Leibler...
For online fault diagnosis, texture features, which are usually used in image processing, are applied for the first time to early fault signal recognition. A parameter R based on the gray-level co-occurrence matrix is defined, and a method for extracting R from texture features is presented. Then, a novel fault signal recognition algorithm based on the parameter R of the texture...
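The gray-level co-occurrence matrix mentioned above can be sketched as follows. This is a minimal illustration of the standard matrix only: the abstract does not define the parameter R or say how a fault signal is turned into gray levels, so the function name `glcm`, its `levels`, `dx` and `dy` parameters, and the sample image are all assumptions made for this example.

```python
from collections import Counter

def glcm(image, levels, dx=1, dy=0):
    """Count how often gray level a has gray level b at offset (dx, dy)."""
    counts = Counter()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # only count pairs whose neighbour lies inside the image
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    # entry [a][b] is the number of (a, b) pairs found
    return [[counts[(a, b)] for b in range(levels)] for a in range(levels)]

# hypothetical 4x4 image with gray levels 0..3
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
matrix = glcm(image, levels=4)
# matrix == [[2, 2, 1, 0], [0, 2, 0, 0], [0, 0, 3, 1], [0, 0, 0, 1]]
```

Texture statistics are usually computed from a normalized version of this matrix; which statistic corresponds to R cannot be determined from the abstract.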
With the rapid development of information retrieval and web technology, web entity retrieval has become a popular new way of getting specific information, such as looking for a book or a movie. As in document retrieval, a query generally returns too many results, so ranking remains a necessary step in the entity retrieval process. This paper focuses on the ranking...
By analyzing the theory and methods commonly used by current retailing enterprises, the main basis and influencing factors of commodity procurement in a supply chain environment can be determined. Commodities can also be classified according to 8 factors, such as price, delivery cycle, storage life, purchase quantity, sale quantity, transportation cost, seasonality...
A hybrid constrained semi-supervised clustering algorithm (HCC) is proposed in which both labeled data and pairwise constraints are used when clustering a given dataset to get a better clustering result. This paper gives a theoretical derivation and experiments on UCI data sets, and the experiments show that the quality of clustering using two kinds of constraint information is better than only one kind of...
Many existing clustering algorithms use a single prototype to represent a cluster. However, it is sometimes very difficult to find a suitable prototype for a cluster with an arbitrary shape. One possible solution is to employ multiple prototypes instead. In this paper, we propose a minimum spanning tree (MST) based multi-prototype clustering algorithm. It is a split and merge scheme. In the...
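To illustrate why a minimum spanning tree is a natural basis for clustering, here is a minimal sketch of the simplest MST-based approach: build the tree with Prim's algorithm and cut its longest edges. This is not the split and merge multi-prototype scheme the abstract proposes, whose details are truncated; the function names `mst_edges` and `mst_clusters` and the sample points are hypothetical.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (dist, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # pick the closest point that is not yet in the tree
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, k):
    """Cut the k-1 longest MST edges and label the connected components."""
    kept = sorted(mst_edges(points))[: len(points) - k]
    root = list(range(len(points)))   # union-find parents

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]   # path halving
            i = root[i]
        return i

    for _, i, j in kept:
        root[find(i)] = find(j)
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(points, k=2))   # two groups of three points each
```

Because the tree follows chains of nearby points, cutting long edges can separate elongated or curved clusters that a single centroid would represent poorly.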
This paper proposes an improved FCM algorithm that addresses several problems of the fuzzy c-means algorithm, such as its sensitivity to initial conditions, which usually leads to local minima. The new algorithm can obtain global optimal solutions through a new, simple and efficient rule for selecting the initial cluster centers, furthermore alternating optimization in terms of a novel separable criterion...
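For context, the baseline that such improvements start from is the standard fuzzy c-means formulation, written out below. The abstract's new selection rule and separable criterion are truncated and are not reproduced here; the symbols (N data points x_i, K cluster centers c_j, memberships u_ij, fuzzifier m) are the conventional ones and are an assumption about notation, not taken from the paper.

```latex
J_m(U, C) = \sum_{i=1}^{N} \sum_{j=1}^{K} u_{ij}^{m} \, \lVert x_i - c_j \rVert^2,
\qquad \sum_{j=1}^{K} u_{ij} = 1 \ \text{for each } i, \quad m > 1

u_{ij} = \left[ \sum_{k=1}^{K} \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

Standard FCM alternates the two update rules until the memberships stop changing. Since each step only decreases J_m locally, the result depends on the initial centers, which is the sensitivity the abstract refers to.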
This paper proposes a new point symmetry-based ant clustering algorithm that can detect the number of clusters and the proper partitions of data sets that possess the property of symmetry. The proposed algorithm includes a revised ant clustering algorithm that reduces the running time of the standard ant clustering algorithm. Each ant represents a data object. It will decide its...
Several features of Chinese texts create a technological bottleneck in Chinese text mining, and at present the results of Chinese text clustering obtained by traditional methods are not very satisfactory. In this paper, we propose applying an English text clustering method called Text Clustering via Particle Swarm Optimizer (TCPSO) to solve the Chinese text clustering...
To address the problem that the K-Means clustering algorithm fails to correctly distinguish non-convex clusters, the distance computation in the algorithm is replaced with a density metric that reflects the characteristics of the data themselves. In this metric, a Delaunay triangulation graph, which has the advantages of nearest neighbour and adjacency, is introduced to compute...
Literature-based discovery links two or more literature concepts that have heretofore not been linked (i.e., disjoint), in order to produce novel, interesting, plausible, and intelligible knowledge. Cluster analysis is the core of literature-based discovery. This paper proposes an improved fuzzy c-means (FCM) algorithm based on the analysis of existing clustering analysis of literature-based...
Spatial data mining is the process of identifying or extracting efficient, novel, potentially useful and ultimately understandable patterns from spatial data sets, and spatial clustering analysis is one of its most important research directions. Clustering criteria implied in massive data can be discovered by spatial clustering analysis, which can be used to explore...
Clustering in high dimensional data is an important task. Subspace clustering has emerged as a possible solution to the challenges associated with high dimensional clustering. A subspace cluster is a subset of points together with a subset of attributes, such that some category of value of cluster points has great aggregation in these attributes. This paper proposes a subspace clustering algorithm...
K-means clustering is sensitive to its starting points, and its time cost is high for large-scale data such as audio. Sampling is widely applied to find “better” starting points and so speed up convergence. However, how to choose a reasonable sampling rate remains a problem. In this paper, we report our initial exploration of locating reasonable sampling rates...
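The sampling idea described above can be sketched as follows: run K-means on a small random sample first, then use the resulting centers as starting points for the full data set. This is a minimal illustration under stated assumptions, not the method the paper explores; `sampled_start_kmeans`, its `sample_rate` and `seed` parameters, and the sample data are hypothetical, and the abstract's way of choosing the sampling rate is not shown.

```python
import math
import random

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # recompute each center as the mean of its group; keep it if the group is empty
        new_centers = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers

def sampled_start_kmeans(points, k, sample_rate, seed=0):
    """Cluster a random sample first, then refine on all points."""
    rng = random.Random(seed)
    m = min(len(points), max(k, int(len(points) * sample_rate)))
    sample = rng.sample(points, m)
    rough = kmeans(sample, sample[:k])   # cheap pass on the sample only
    return kmeans(points, rough)         # full pass, seeded with the rough centers

# hypothetical data: two well-separated groups on a line
data = [(i * 0.1, 0.0) for i in range(10)] + [(100 + i * 0.1, 0.0) for i in range(10)]
print(sorted(sampled_start_kmeans(data, k=2, sample_rate=0.5)))
```

A lower `sample_rate` makes the first pass cheaper but the rough centers less reliable, which is the trade-off behind the question of a reasonable sampling rate.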
Clustering is a hot research field in data mining. Many methods and algorithms have been designed for the different types of data sets on which data analysis operates. This paper presents a Local Agglomerative Characteristic (LAC) based algorithm for data clustering, which can handle clusters of different sizes, shapes, and densities, and can work well on different distributed and natural variant...
Climate factors govern the distribution of plant species, which is in turn an indicator of the climate of the corresponding region. Spatial clustering methods are an important component of spatial data mining. We obtained distribution data for more than 100 Chinese genuine regional herb plants to serve as basic data for spatial analysis. A spatial clustering algorithm based on spatial contiguity relations in GIS was...
In this paper, we propose a new kernel function that makes use of Riemannian geodesic distances among data points, and present a geometric median shift algorithm over Riemannian manifolds. Relying on the geometric median shift, together with geodesic distances, our approach is able to effectively cluster data points distributed on Riemannian manifolds. In addition to improving the clustering results,...