This article presents a clustering-based approach to fuzzy system identification. To construct an effective initial fuzzy model, it proposes a modular method for identifying fuzzy systems based on a hybrid clustering technique. Moreover, determining the proper number of clusters and their appropriate locations is one of the primary considerations in constructing...
Finding similar crime case subsets is an important task for intelligence analysts in crime investigation. It can not only provide multiple clues for solving crimes but also improve the efficiency of catching criminals. However, the conventional approach of querying specific attributes in relational databases has two defects: first, it is relatively inefficient when many incidents must be handled;...
This paper introduces clustering-based sentiment analysis, a new approach to sentiment analysis. By applying a TF-IDF weighting method and a voting mechanism, and by importing term scores, it obtains an acceptable and stable clustering result. It has competitive advantages over the two existing kinds of approaches: symbolic techniques and supervised learning methods. It is a well...
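The TF-IDF weighting mentioned above is shown here as a minimal sketch of one common variant: term frequency normalized by document length, multiplied by log(N/df). The function name and the whitespace tokenizer are illustrative assumptions. The paper's voting mechanism and imported term scores are not reproduced.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per (non-empty) document.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    # Document frequency: each term is counted at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


w = tf_idf(["good movie good plot", "bad movie", "good acting"])
```

A term that appears in every document gets weight 0. Rare terms such as "plot" outweigh terms shared across documents such as "movie".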
Protocols such as TCP depend on loss detection and recovery algorithms to provide a reliable data delivery service. TCP detects loss events either by a retransmission timeout or by the receipt of duplicate acknowledgements. Since TCP has no explicit knowledge of the cause of packet loss, it always treats loss as an indication of congestion and adjusts its sending rate conservatively to maintain...
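The two loss signals named above can be sketched as a toy sender-side detector. It assumes the common fast-retransmit threshold of three duplicate ACKs (as in RFC 5681) and a fixed retransmission timeout. The class and method names are hypothetical. Real TCP stacks also estimate the RTO from RTT samples and run the timer only while data is outstanding.

```python
DUPACK_THRESHOLD = 3  # fast-retransmit threshold used by standard TCP (RFC 5681)


class LossDetector:
    """Toy sender-side loss detector; a simplification, not a full TCP."""

    def __init__(self, rto):
        self.rto = rto            # fixed retransmission timeout in seconds (assumed)
        self.last_ack = None      # highest cumulative ACK number seen so far
        self.dup_count = 0        # duplicates of last_ack received since it was new
        self.last_progress = 0.0  # time at which the last new ACK arrived

    def on_ack(self, ack_no, now):
        """Return 'fast_retransmit' on the third duplicate ACK, otherwise None."""
        if self.last_ack is None or ack_no > self.last_ack:
            # New data acknowledged: reset the duplicate counter and the timer.
            self.last_ack = ack_no
            self.dup_count = 0
            self.last_progress = now
            return None
        if ack_no == self.last_ack:
            self.dup_count += 1
            if self.dup_count == DUPACK_THRESHOLD:
                return "fast_retransmit"
        return None  # older ACKs and further duplicates are ignored here

    def timed_out(self, now):
        """True if no new ACK has arrived within one RTO."""
        return now - self.last_progress >= self.rto


d = LossDetector(rto=1.0)
events = [d.on_ack(a, t) for a, t in [(1000, 0.1), (2000, 0.2), (2000, 0.3), (2000, 0.4), (2000, 0.5)]]
```

In this sketch both outcomes, a fast retransmit or a timeout, would lead the sender to back off. That is the cause-agnostic behaviour the abstract describes.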
Active learning and semi-supervised learning are both important techniques for improving a learned model with unlabeled data when labeled data is difficult to obtain and unlabeled data is plentiful and easy to collect. Combining active learning with a semi-supervised learning algorithm that uses Gaussian fields and harmonic functions was recently suggested. This work showed that...
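The harmonic-function idea behind the Gaussian-field method can be shown with a short sketch. Labeled points stay clamped to their labels. Each unlabeled point's score is repeatedly replaced by the weighted average of all other scores, using Gaussian edge weights. Several assumptions keep it brief: a fully connected graph, a single bandwidth `sigma`, and an iterative rather than closed-form solution. The active-learning query selection studied in the work is not shown.

```python
import math


def harmonic_labels(points, labels, sigma=1.0, iters=500):
    """Binary harmonic-function label propagation on a fully connected Gaussian graph.

    points: list of coordinate tuples
    labels: dict {index: 0 or 1} for the labeled points
    Returns a list f where f[i] is a soft score in [0, 1].
    """
    n = len(points)
    # Gaussian edge weights w_ij = exp(-||x_i - x_j||^2 / sigma^2), no self-loops.
    w = [[0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / sigma ** 2)
          for j in range(n)] for i in range(n)]
    f = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(iters):
        for i in range(n):
            if i in labels:
                continue  # labeled nodes stay clamped
            # Harmonic property: an unlabeled value equals the weighted mean of its neighbours.
            f[i] = sum(w[i][j] * f[j] for j in range(n)) / sum(w[i])
    return f


f = harmonic_labels([(0.0,), (0.1,), (0.9,), (1.0,)], {0: 0, 3: 1})
```

The point near the label-0 example scores below 0.5 and the point near the label-1 example scores above it. The symmetric layout makes the two scores sum to one.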
Traditional clustering is a powerful technique for revealing "hot" topics among documents. However, it struggles to discover new types of events that emerge gradually. In this paper, we propose a novel model for detecting new clusters in time-streaming documents. It consists of three parts: the cluster definition based on a Multi-Representation Index Tree (MI-Tree), the new cluster detecting...
We address a problem in conventional Gaussian mixture model (GMM)-based spectral conversion from the viewpoint of optimal conversion function selection. The proposed method is motivated by the observation that if the optimal conversion function under the minimum mel-cepstral distortion (MMCD) criterion can be selected during the conversion stage, the conversion performance in terms of mel-cepstral distortion...
Collaborative filtering, a technique for predicting user preferences by exploiting the behavior patterns of groups of users, has become a main prediction technique in recommender systems. One crucial problem for collaborative filtering algorithms is how best to learn the preferences of a new user who has rated few or no examples. Active learning provides effective strategies to select...
Support Vector Machine (SVM) ensembles have been widely used to improve classification accuracy in complicated pattern recognition tasks. In this work we propose to apply an ensemble of SVMs coupled with feature-subset selection methods to alleviate the curse of dimensionality associated with expression-based classification of DNA microarray data. We compare the single SVM classifier to SVM ensembles...
In traditional e-commerce websites, social tags are used only for product classification and are not applied to personalized recommendation. In this paper, we propose a personalized recommendation model based on social tags. We build a user interest model for products that reflects user interests and product features directly through social tags, and optimize the interest model...
Searching for initial centers in high-dimensional space is an interesting and important problem that is relevant to the many variants of the K-Means algorithm. However, it is a very difficult problem due to the "curse of dimensionality" and the inherent sparsity of the data. The IMSND algorithm is one of the latest initialization methods based on the idea of shared neighborhood density. Concerning...
This study proposes a novel classification technique, GA/k-prototypes, which combines a genetic algorithm with the k-prototypes clustering mechanism to support classification. The genetic algorithm adjusts the weights applied to input attributes so that the majority of data records in each cluster share the same outcome class. We conduct...
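The technique above rests on two ingredients: attribute weights inside the k-prototypes dissimilarity, and a fitness that rewards clusters dominated by one outcome class. The sketch below shows one way to write each. Both the weighted form of Huang's k-prototypes distance and the purity-based fitness are assumptions for illustration. The genetic operators that evolve the weight vector are omitted.

```python
from collections import Counter


def weighted_kproto_distance(x, proto, num_idx, cat_idx, weights, gamma=1.0):
    """k-prototypes style dissimilarity with per-attribute weights (assumed form).

    Numeric attributes contribute weighted squared differences; each categorical
    mismatch contributes gamma times that attribute's weight.
    """
    numeric = sum(weights[j] * (x[j] - proto[j]) ** 2 for j in num_idx)
    categorical = sum(weights[j] for j in cat_idx if x[j] != proto[j])
    return numeric + gamma * categorical


def purity(assignments, outcomes):
    """Fraction of records whose outcome matches the majority class of their cluster.

    A genetic algorithm could use this as the fitness of a candidate weight vector.
    """
    by_cluster = {}
    for cluster, outcome in zip(assignments, outcomes):
        by_cluster.setdefault(cluster, []).append(outcome)
    majority_hits = sum(Counter(v).most_common(1)[0][1] for v in by_cluster.values())
    return majority_hits / len(outcomes)


dist = weighted_kproto_distance((1.0, "red"), (3.0, "blue"), [0], [1], [0.5, 2.0])
fitness = purity([0, 0, 1, 1], ["y", "y", "n", "y"])
```

Raising the weight of an attribute that separates the outcome classes makes clusters form along it. This tends to raise the purity fitness.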
RBF networks perform well on data-mining prediction tasks, and the k-means clustering algorithm is one of the most commonly used algorithms for determining the basis functions of RBF networks. K-means requires the number of clusters for initialization, and the accuracy of RBF networks changes with that number. But we cannot resort to increasing the number of clusters in the RBF...
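The role of k-means here can be made concrete with a minimal one-dimensional sketch of an RBF network. Its Gaussian centers come from k-means, and its output weights are fitted by least squares. The shared `width`, the small ridge term and all function names are assumptions. The parameter `k` is the knob whose effect on accuracy the abstract discusses.

```python
import math
import random


def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain Lloyd's k-means on scalar inputs; returns the sorted centers."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # k data points chosen at random as initial centers
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            nearest = min(range(k), key=lambda c: abs(x - centers[c]))
            groups[nearest].append(x)
        # Move each center to its group mean; keep it in place if the group is empty.
        centers = [sum(g) / len(g) if g else centers[c] for c, g in enumerate(groups)]
    return sorted(centers)


def solve(a, b):
    """Solve the square system a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_rbf(xs, ys, k, width=1.0, ridge=1e-8):
    """Gaussian RBF network: k-means centers, then least-squares output weights."""
    centers = kmeans_1d(xs, k)

    def phi(x):
        # Activations of the k Gaussian basis functions at input x.
        return [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]

    rows = [phi(x) for x in xs]
    # Normal equations (Phi^T Phi + ridge * I) w = Phi^T y; the ridge guards against singularity.
    ata = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j else 0.0) for j in range(k)]
           for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    weights = solve(ata, aty)

    def predict(x):
        return sum(w * p for w, p in zip(weights, phi(x)))

    return predict


xs = [i / 10 for i in range(31)]
ys = [math.sin(x) for x in xs]
model = fit_rbf(xs, ys, k=5, width=0.5)
```

Re-running `fit_rbf` with different `k` values and comparing the errors is a simple way to see the accuracy sensitivity the abstract mentions.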
This paper presents a novel document clustering technique based on frequent concepts. The proposed FCDC (Frequent Concepts based Document Clustering) algorithm works with frequent concepts rather than the frequent itemsets used in traditional text mining techniques. Many well-known clustering algorithms treat documents as bags of words and ignore the important relationships between...
Simplified Silhouette Filter (SSF) is a recently introduced feature selection method that automatically estimates the number of features to be selected. To do so, it combines a sampling strategy with a clustering algorithm that seeks clusters of correlated (potentially redundant) features. It is well known that the choice of similarity measure can have a great impact on clustering results. As a consequence,...
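The simplified silhouette replaces average pairwise distances with distances to cluster prototypes. For each item, a is the distance to its own prototype and b is the distance to the nearest other prototype, giving s = (b - a) / max(a, b). The sketch below computes the mean score with a pluggable dissimilarity, here 1 - |Pearson correlation| between feature vectors. That measure, the use of member features as prototypes and the function names are assumptions. They are not necessarily the measures compared in the paper.

```python
import math
from statistics import fmean


def pearson(u, v):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mu, mv = fmean(u), fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def corr_distance(u, v):
    """Dissimilarity between two features: 1 - |Pearson correlation| (one possible choice)."""
    return 1.0 - abs(pearson(u, v))


def simplified_silhouette(items, assign, prototypes, dist):
    """Mean simplified silhouette: a = distance to own prototype, b = to nearest other one."""
    scores = []
    for x, c in zip(items, assign):
        a = dist(x, prototypes[c])
        b = min(dist(x, p) for k, p in enumerate(prototypes) if k != c)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return fmean(scores)


f1, f2 = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]     # perfectly correlated pair
f3, f4 = [1.0, -1.0, 1.0, -1.0], [2.0, -2.0, 2.0, -2.0]  # another correlated pair
feats, protos = [f1, f2, f3, f4], [f1, f3]
good = simplified_silhouette(feats, [0, 0, 1, 1], protos, corr_distance)
bad = simplified_silhouette(feats, [0, 1, 1, 1], protos, corr_distance)
```

Grouping each redundant feature with its correlated partner scores close to 1. Misplacing f2 drops the mean, which shows how the chosen dissimilarity drives the result.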
To address the current problem of serious academic plagiarism in the web environment, this article proposes a topic-bag-based text copy detection algorithm that draws on semantic clustering and multi-instance learning. First, a paper is divided into a three-layer tree: a leaf node denotes a sentence; a branch node represents a topic bag, and...
Recently, nonnegative matrix factorization (NMF) has proven powerful for nonnegative data analysis, especially for analyzing gene expression data. We propose a modified consensus clustering mechanism with soft sample assignment to improve clustering accuracy. The idea is to use a normalized inner product (cosine similarity) matrix as the connectivity matrix of the consensus clustering. The...
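One possible reading of the soft connectivity matrix described above is sketched below. The usual hard version sets entry (i, j) to 1 when samples i and j share the same argmax cluster in the NMF coefficient matrix H. This sketch instead uses the cosine similarity of their columns of H and averages the result over runs. The matrix layout (k rows by n samples) and the function names are assumptions. The NMF runs that produce H are not shown, and every column of H is assumed to be non-zero.

```python
import math


def cosine_connectivity(h):
    """Soft connectivity matrix from one factorization.

    h: list of k rows, each of length n (the NMF coefficient matrix H, k x n).
    Entry (i, j) is the cosine similarity of the columns for samples i and j.
    """
    n = len(h[0])
    cols = [[row[j] for row in h] for j in range(n)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def consensus(h_runs):
    """Average the soft connectivity matrices over several NMF runs."""
    mats = [cosine_connectivity(h) for h in h_runs]
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)] for i in range(n)]


H = [[1.0, 0.9, 0.0],
     [0.0, 0.1, 1.0]]
c = cosine_connectivity(H)
```

Because H is nonnegative, every entry lies in [0, 1]. Samples 0 and 1 (mostly in cluster 1) get a connectivity near 1, and samples 0 and 2 get 0. A hard argmax assignment would discard that graded information.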
Node clustering has wide-ranging applications in decentralized P2P networks such as P2P file sharing systems, mobile ad hoc networks, P2P sensor networks, and so forth. This paper proposes an approach to constructing clusters in unstructured P2P networks based on small-world theory. In contrast to centralized graph clustering algorithms, our scheme is completely decentralized and uses only the knowledge...
Learning a compact yet discriminative codebook for classifying human actions is a challenging problem. One difficulty is that the learning procedure is split into two independent phases (dimension reduction and clustering), which results in the loss of discriminative information that clustering requires. Moreover, the traditionally used principal component analysis is not optimized for class...
Grey relational analysis is widely used in many fields, such as education, decision-making in economics, marketing research, medicine, computer science, system modeling, social science, chemistry, management, etc. In this paper, grey relational analysis and fuzzy c-means algorithms are compared. Finally, a real data set is used to show that the performance of the Grey Relational...
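The classic grey relational grade can be computed as follows. For each comparison sequence xi, let Δ(k) = |x0(k) - xi(k)|. The coefficients are ξ(k) = (Δmin + ρΔmax) / (Δ(k) + ρΔmax), where Δmin and Δmax are taken over all sequences and positions, and the grade is their average. The sketch assumes already-normalized sequences and the commonly used distinguishing coefficient ρ = 0.5. The comparison with fuzzy c-means in the paper is not reproduced.

```python
def grey_relational_grades(reference, comparisons, rho=0.5):
    """Grey relational grade of each comparison sequence with respect to the reference.

    Sequences are assumed to be normalized to comparable scales beforehand, and
    at least one deviation must be non-zero (otherwise d_max is 0).
    """
    # Absolute deviations from the reference sequence, position by position.
    deltas = [[abs(r - x) for r, x in zip(reference, seq)] for seq in comparisons]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades


g = grey_relational_grades([1.0, 0.8, 0.6], [[1.0, 0.8, 0.6], [0.0, 0.0, 0.0]])
```

A sequence identical to the reference gets grade 1. Larger deviations pull the grade toward ρ / (1 + ρ) at the extreme.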