This paper first studies methods of web document mining and text clustering and summarizes fuzzy clustering algorithms and similarity measure functions. It then proposes a modified similarity function intended to address the problems of feature selection and feature extraction in high-dimensional space. Finally, the paper proposes a dynamic fuzzy clustering algorithm (DCFCM) by combining...
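The abstract does not spell out DCFCM or the modified similarity function, so the sketch below shows only the standard fuzzy c-means (FCM) baseline that fuzzy document clustering builds on: every document receives a membership degree in each cluster instead of a single hard label. It is a minimal sketch, assuming dense numeric feature vectors, Euclidean distance and fuzzifier m = 2; the function name `fuzzy_c_means` and its parameters are hypothetical.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means (not DCFCM). Returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix u[i][j]; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    exp = 2.0 / (m - 1.0)
    centers = []
    for _ in range(iters):
        # Center update: mean of all points weighted by u^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / sw
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        max_change = 0.0
        for i in range(n):
            dists = [math.dist(points[i], ctr) for ctr in centers]
            if min(dists) == 0.0:
                # The point sits exactly on a center: full membership there.
                hit = dists.index(0.0)
                new_row = [1.0 if j == hit else 0.0 for j in range(c)]
            else:
                new_row = [1.0 / sum((dists[j] / dists[k]) ** exp for k in range(c))
                           for j in range(c)]
            max_change = max(max_change, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        # Stop once memberships have (almost) stopped changing.
        if max_change < tol:
            break
    return centers, u
```

For example, `fuzzy_c_means([[0, 0], [0, 1], [10, 10], [10, 11]], c=2)` should give the first two points their highest membership in one cluster and the last two in the other, while still assigning each point a small nonzero membership in the far cluster.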
Document clustering is a widely studied problem in text categorization. It is the process of partitioning a given set of documents into disjoint clusters such that documents in the same cluster are similar. K-means, one of the simplest unsupervised learning algorithms, solves the well-known clustering problem by following a simple and easy way to classify a given data set through a certain number...
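As an illustration of the K-means procedure mentioned above, here is a minimal sketch of Lloyd's algorithm, which alternates an assignment step and a mean-update step. It assumes documents have already been turned into dense numeric vectors; the function name `kmeans`, the random initialization from the data points and the stopping rule are illustrative choices, not taken from the paper.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means. Returns (labels, centers)."""
    rng = random.Random(seed)
    # Initialize centers with k data points chosen at random without replacement.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (Euclidean).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

For example, `kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)` places the first two points in one cluster and the last two in the other.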
Document clustering is the process of partitioning a set of n unlabeled documents into clusters such that documents in each cluster share some common concepts. Each concept is conveniently represented by some key terms. Using words as features, each document is represented as a vector in a very high-dimensional vector space. However, most document vectors are sparse; for example, more than ten thousand...
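The sparsity point can be made concrete: if each document is stored as a mapping from term to weight that holds only the terms actually present, similarity computations touch only nonzero entries. The sketch below assumes a naive whitespace tokenizer and raw term-frequency weights (no stemming, stop-word removal or TF-IDF), and uses cosine similarity as a common choice for document vectors; `to_sparse_vector` and `cosine` are hypothetical helper names.

```python
import math
from collections import Counter


def to_sparse_vector(text):
    """Term-frequency vector stored sparsely: only terms that occur get an entry."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity of two sparse vectors; absent terms count as zero."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

For example, "fuzzy clustering of web documents" and "clustering web documents" share three terms and get a cosine of about 0.77, while two documents with no terms in common score 0, no matter how large the overall vocabulary is.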
In this paper we describe our work on developing a novel technique for discovering implicit knowledge about patents from multilingual patent information sources. We developed a system platform that supports locating similar and relevant multilingual patent documents. The platform was implemented using a multilingual vector space based on the latent semantic indexing (LSI) model and utilizing...
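The abstract gives no details of the multilingual platform, so the sketch below covers only the single-language core of LSI: take a truncated singular value decomposition of the term-document matrix A and represent each document by its coordinates in the top k latent dimensions (the rows of V_k Σ_k). To stay dependency-free it finds those dimensions by power iteration with deflation on the document Gram matrix AᵀA, which is only practical for tiny matrices; a real system would call an SVD routine from a numerical library. The function name `lsi_doc_vectors` and its parameters are hypothetical.

```python
import math
import random


def lsi_doc_vectors(term_doc, k, iters=200, seed=0):
    """Return k-dimensional latent coordinates per document (rows of V_k * Sigma_k).

    term_doc[t][d] is the weight of term t in document d.
    """
    n_docs = len(term_doc[0])
    # Gram matrix G = A^T A; its eigenpairs are (sigma^2, v) from the SVD of A.
    g = [[sum(row[i] * row[j] for row in term_doc) for j in range(n_docs)]
         for i in range(n_docs)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n_docs)]
    for r in range(k):
        # Power iteration converges to the dominant eigenvector of the current G.
        v = [rng.random() for _ in range(n_docs)]
        for _ in range(iters):
            w = [sum(g[i][j] * v[j] for j in range(n_docs)) for i in range(n_docs)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # nothing left to extract (rank of A is below k)
            v = [x / norm for x in w]
        # Rayleigh quotient v^T G v estimates the eigenvalue sigma^2.
        lam = sum(v[i] * sum(g[i][j] * v[j] for j in range(n_docs))
                  for i in range(n_docs))
        sigma = math.sqrt(max(lam, 0.0))
        for d in range(n_docs):
            coords[d][r] = sigma * v[d]
        # Deflation: subtract this component so the next pass finds the next one.
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n_docs)]
             for i in range(n_docs)]
    return coords
```

With a toy matrix in which documents 0 and 1 use only the terms "patent" and "claim" and documents 2 and 3 use only "cluster" and "kmeans", `lsi_doc_vectors(A, 2)` maps each pair onto the same latent vector, and the two pairs end up orthogonal to each other.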
Clustering is currently one of the most crucial techniques for dealing with the massive amount of heterogeneous information on the web, which is beyond human beings' capacity to digest. Recent studies have shown that the K-means algorithm, the most commonly used partitioning-based clustering algorithm, is more suitable for large datasets. However, the K-means algorithm can generate a local optimal...
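The abstract is cut off before it describes its own remedy, so the sketch below shows only a common, generic mitigation for K-means' sensitivity to initialization, not the paper's method: run Lloyd's algorithm from several random starts and keep the result with the lowest sum of squared errors (SSE). The names `lloyd` and `kmeans_restarts` and the default restart count are hypothetical.

```python
import math
import random


def lloyd(points, k, rng, iters=100):
    """One K-means run from a random start. Returns (sse, labels, centers)."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    # Sum of squared distances from each point to its assigned center.
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return sse, labels, centers


def kmeans_restarts(points, k, restarts=10, seed=0):
    """Run `restarts` independent K-means runs and keep the lowest-SSE result."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0])
```

On six points forming three well-separated pairs, for instance, a single start that seeds two centers inside one pair and the third in another pair can stop at an SSE of 100.5, whereas the optimal clustering has an SSE of 1.5; keeping the best of several restarts makes reaching that optimum much more likely, though it still offers no guarantee.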