Text Categorization (TC) is an important component in many information organization and information management tasks. In many TC applications, the case base grows quickly, which makes case retrieval inefficient. Using Case-Base Maintenance learning via the GC (Generalization Capability) algorithm, which can reduce the number of cases used by the KNN algorithm, can improve efficiency...
Based on complex network theory, we propose a clustering algorithm based on content similarity. First, the Chinese documents are represented by the vector-space model, and the content similarity between any two documents is computed as their cosine similarity. Each network node is then defined as a document, and each edge weight is defined as the similarity obtained by the cosine similarity...
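The construction described in this abstract (documents as vectors, cosine similarity as edge weight) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the whitespace tokenizer, the raw term-frequency weighting, and the `threshold` parameter are all assumptions, and real Chinese text would first need word segmentation.

```python
import math
from collections import Counter
from itertools import combinations

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_similarity_graph(docs, threshold=0.0):
    """Return {(i, j): weight} for document pairs whose similarity exceeds threshold.

    Assumption: documents are already segmented into whitespace-separated terms.
    """
    vectors = [Counter(doc.split()) for doc in docs]
    edges = {}
    for i, j in combinations(range(len(vectors)), 2):
        weight = cosine_similarity(vectors[i], vectors[j])
        if weight > threshold:
            edges[(i, j)] = weight
    return edges
```

Calling `build_similarity_graph(docs)` on a list of segmented documents yields a weighted edge dictionary in which nodes are document indices; pairs that share no terms get no edge.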
A hybrid constrained semi-supervised clustering algorithm (HCC) is proposed, in which both labeled data and pairwise constraints are used in clustering a given dataset to obtain a better clustering result. This paper gives a theoretical derivation and experiments on UCI data sets, and the experiments show that the quality of clustering using two kinds of constraint information is better than only one kind of...
Many existing clustering algorithms use a single prototype to represent a cluster. However, it is sometimes very difficult to find a suitable prototype for representing a cluster with an arbitrary shape. One possible solution is to employ multiple prototypes instead. In this paper, we propose a minimum spanning tree (MST) based multi-prototype clustering algorithm. It is a split and merge scheme. In the...
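The split-and-merge scheme of this paper is not recoverable from the truncated abstract. As background only, the sketch below shows the classic, simpler MST clustering idea: build the minimum spanning tree and cut its longest edges. The function names and the choice of Euclidean distance are assumptions, and this is not the multi-prototype algorithm the abstract proposes.

```python
import math

def prim_mst(points):
    """Return the MST edges as (length, u, v) using Prim's algorithm, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges

def mst_clusters(points, k):
    """Cut the k-1 longest MST edges and return one cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    kept = sorted(prim_mst(points))[: len(points) - k]
    root = list(range(len(points)))  # union-find over the kept edges

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for _, u, v in kept:
        root[find(u)] = find(v)
    return [find(i) for i in range(len(points))]
```

Because clusters are defined by connectivity rather than by distance to a single centre, this kind of method can follow elongated or irregular shapes, which is the limitation of single-prototype methods that the abstract points out.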
This paper proposes an improved FCM algorithm that addresses several problems of the Fuzzy C-Means algorithm, such as its sensitivity to initial conditions, which usually leads to locally minimal results. The new algorithm can obtain globally optimal solutions through a new, simple and efficient rule for selecting the initial cluster centers, furthermore alternating optimization in terms of a novel separable criterion...
This paper proposes a new point symmetry-based ant clustering algorithm which can detect the number of clusters and the proper partitions from data sets when the data sets possess the property of symmetry. In the proposed algorithm, a revised ant clustering algorithm is presented which can reduce the running time of the standard ant clustering algorithm. Each ant represents a data object. It will decide its...
Several features of Chinese texts create a technical bottleneck in Chinese text mining, and at present the results of Chinese text clustering obtained by traditional methods are not very satisfactory. In this paper, we propose applying the English text clustering method called Text Clustering via Particle Swarm Optimizer (TCPSO) to solve the Chinese text clustering...
To address the problem that the K-Means clustering algorithm fails to correctly distinguish non-convex clusters, the distance computation in the algorithm is changed and a density metric that reflects the characteristics of the data themselves is adopted instead. In this mode, the Delaunay triangulation graph, which has the advantages of nearest neighbour and adjacency, is introduced to compute...
Literature-based discovery is the linking of two or more literature concepts that have heretofore not been linked (i.e., disjoint), in order to produce novel, interesting, plausible, and intelligible knowledge. Cluster analysis is the core of literature-based discovery. This paper proposes an improved fuzzy c-means (FCM) algorithm based on the analysis of existing clustering analysis of literature-based...
Spatial data mining is the process of identifying or extracting efficient, novel, potentially useful and ultimately understandable patterns from a spatial data set. Spatial clustering analysis is one of the most important research directions in spatial data mining. Clustering criteria implied in massive data can be discovered by spatial clustering analysis methods, which can be used to explore...
Clustering in high dimensional data is an important task. Subspace clustering has emerged as a possible solution to the challenges associated with high dimensional clustering. A subspace cluster is a subset of points together with a subset of attributes, such that some category of value of cluster points has great aggregation in these attributes. This paper proposes a subspace clustering algorithm...
K-means clustering is sensitive to its starting points, and its time cost is high for large-scale data such as audio. Sampling is widely applied to find “better” starting points and speed up the convergence of the clustering procedure. However, how to choose a reasonable sampling rate remains a problem. In this paper, we report our initial exploration of locating reasonable sampling rates...
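The general sampling approach mentioned here can be sketched as: run K-means on a random sample, then use the resulting centres as starting points for the full data set. The code below is a minimal illustration under assumptions (plain Lloyd iterations, Euclidean distance, hypothetical names `kmeans` and `sampled_init_kmeans`). It does not reproduce the paper's method for locating a reasonable sampling rate and only exposes `sampling_rate` as a parameter.

```python
import math
import random

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm from given initial centers; returns (centers, iterations)."""
    centers = [tuple(c) for c in centers]
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            return centers, iteration
        centers = new_centers
    return centers, max_iter

def sampled_init_kmeans(points, k, sampling_rate, seed=0):
    """Cluster a random sample first, then refine on all points.

    Assumptions: points is a list of equal-length tuples and 0 < sampling_rate <= 1.
    """
    rng = random.Random(seed)
    sample_size = min(len(points), max(k, int(len(points) * sampling_rate)))
    sample = rng.sample(points, sample_size)
    initial = rng.sample(sample, k)
    sample_centers, _ = kmeans(sample, initial)
    return kmeans(points, sample_centers)
```

The second return value is the number of Lloyd iterations on the full data, so trying several values of `sampling_rate` and comparing that count is one simple way to explore the trade-off the abstract raises.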
Clustering is a hot research field in data mining. Many methods and algorithms have been designed for the different types of data sets on which data analysis operates. This paper presents a Local Agglomerative Characteristic (LAC) based algorithm for data clustering, which can handle clusters of different sizes, shapes, and densities, and can work well on different distributed and natural variant...
Climate factors govern the distribution of plant species, which is an indicator of the climate of the corresponding region. Spatial clustering methods are an important component of spatial data mining. We obtained distribution data of more than 100 Chinese genuine regional herb plants to serve as basic data for spatial analysis. A spatial clustering algorithm based on spatial contiguity relations in GIS was...
In this paper, we propose a new kernel function that makes use of Riemannian geodesic distances among data points, and present a geometric median shift algorithm over Riemannian manifolds. Relying on the geometric median shift, together with geodesic distances, our approach is able to effectively cluster data points distributed on Riemannian manifolds. In addition to improving the clustering results,...
Since its emergence, BLOG has represented not only a new network technology but also the beginning of a new lifestyle. How to utilize and mine BLOG content, which contains hidden sentiment and is updated in real time, is a big challenge in the data-mining domain. As most existing methods for topic mining of network text work by clustering the text's topics and labels, which are labeled...
With the widespread use of Internet applications, more and more enterprises build their Web sites and provide business information through Web pages. Web page classification could be used to assign enterprise Web pages to one or more predefined business categories. For the purpose of Internet-based enterprise administration in an E-government system, algorithms and applications related to web page classification...