This paper focuses on detecting overlapping communities in directed graphs by implementing a new algorithm and analyzing it with various performance metrics. The algorithm aims to find core nodes of the directed graph, which are subsets of communities and have a higher contact frequency. These are then extended to find communities using compactness measurement (CM). The compactness of...
Credit scoring plays an important role in financial institutions, debt-based crowdfunding platforms, and peer-to-peer lending platforms. In the last few years, adopting ensemble methods for credit scoring has become much more popular. However, the performance of ensemble methods is easily affected by the parameter settings and the number of base classifiers. Ensemble classification based...
In this paper we derive a clustering method based on the Hidden Conditional Random Field (HCRF) model in order to maximize the performance of a wireless sensor. The novelty of our approach to clustering lies in the application of an index-invariant graph that we defined in previous work and that precisely links a hyper-tree structure to the data set assumptions. We show that a set of conditional...
It is well known that clustering is an unsupervised machine learning technique. However, most clustering methods require several parameters to be set, such as the number of clusters, the shape of clusters, or other user- or problem-specific parameters and thresholds. In this paper, we propose a new clustering approach which is fully autonomous, in the sense that it does not require parameters to be pre-defined...
Data clustering analysis is the process of finding similarities in data so that items are assigned to groups that are internally homogeneous and as heterogeneous as possible from one another. There are several analysis methods, among which the K-means clustering algorithm is widely used in different research areas. Therefore, this paper reviews the best-known variants of clustering methods, which are K-means, IRP-K-means and...
In this paper, we first show that there exists a day pattern in equity volatility and that this volatility pattern differs from the daily volume profile. To further emphasize the most important volatility changes during the day, we fold the continuous minute-by-minute stock data into an n-by-p matrix, where n is the number of days and p is the number of minutes during trading hours, and decompose the matrix using...
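The folding step described in this abstract can be sketched as follows. This is a minimal illustration, assuming the minute-by-minute data arrive as one flat sequence with the same number of minutes p for every day; the function name `fold_minutes` and the sample values are hypothetical, and the decomposition that the abstract goes on to mention is not shown because the listing truncates it.

```python
def fold_minutes(series, minutes_per_day):
    """Fold a flat minute-by-minute series into an n-by-p matrix (list of rows).

    Each row holds one trading day; n = len(series) / minutes_per_day.
    Assumes every day contributes exactly `minutes_per_day` observations.
    """
    if minutes_per_day <= 0:
        raise ValueError("minutes_per_day must be positive")
    if len(series) % minutes_per_day != 0:
        raise ValueError("series length is not a whole number of days")
    return [
        list(series[start:start + minutes_per_day])
        for start in range(0, len(series), minutes_per_day)
    ]


# Toy input: 2 days with 3 "minutes" each (made-up values).
flat = [0.10, 0.12, 0.11, 0.20, 0.18, 0.19]
matrix = fold_minutes(flat, minutes_per_day=3)
n_days, n_minutes = len(matrix), len(matrix[0])
```

Row i of `matrix` then corresponds to day i and column j to the j-th minute of the trading session, which is the layout a matrix decomposition would operate on.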
Subspace clustering has typically been approached as an unsupervised machine learning problem. However, in several applications where the union-of-subspaces model is useful, it is also reasonable to assume access to a small number of labels. In this paper we investigate the benefit that labeled data brings to the subspace clustering problem. We focus on incorporating labels into the k-subspaces...
The goal of this paper was to apply the fuzzy clustering algorithm known as Fuzzy C-Means to color image segmentation, which is an important problem in pattern recognition and computer vision. For computational experiments, serial and parallel versions were implemented. Both were tested using various parameters and random number generator seeds. Various distance measures were used: Euclidean, Manhattan...
In this paper, we present a Gaussian test-based hierarchical clustering method for high-resolution TerraSAR-X images. The purpose is to obtain homogeneous clusters. k-means is used to split image features to create a hierarchical structure. As image feature vectors usually fall into a high-dimensional feature space, we test different distance metrics in an attempt to tackle the curse of dimensionality...
In this paper, a new method for clustering time series data points, Variable Markov Oracle, is proposed. Variable Markov Oracle is based on previous results of Audio Oracle, a method for fast indexing of repeating sub-clips in an audio stream. The proposed method is capable of discovering natural clusters with temporal relations without specifying the number of clusters. The discovery of inherent clusters...
This paper presents a method for clustering food offers based on the cuckoo search algorithm. The proposed method clusters food offers based on the similarity between their nutritional features (e.g., calcium, vitamins) and/or ingredients. The similarity is evaluated using the Sorensen-Dice coefficient. To test the clustering method proposed here, we have developed in-house a set of 800 food...
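For reference, the Sorensen-Dice coefficient named in this abstract is, for two sets A and B, 2|A ∩ B| / (|A| + |B|). The sketch below applies it to two ingredient sets; the function name, the sample offers, and the handling of two empty sets are assumptions made for illustration, and the paper's cuckoo search clustering itself is not reproduced.

```python
def sorensen_dice(a, b):
    """Sorensen-Dice coefficient of two collections, treated as sets.

    Returns a value in [0, 1]: 0 for disjoint sets, 1 for identical sets.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed convention: two empty sets count as identical
    return 2 * len(set_a & set_b) / (len(set_a) + len(set_b))


# Hypothetical food offers described by their ingredients.
offer_1 = {"milk", "oats", "honey"}
offer_2 = {"milk", "oats", "banana", "cocoa"}
score = sorensen_dice(offer_1, offer_2)  # 2 shared of 3 + 4 items: 4/7
```

A clustering method can use `1 - score` as a dissimilarity between offers when a distance-like quantity is needed.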
In this paper we propose a noise detection system based on similarities between instances. Given a data set with instances that belong to multiple classes, a noise instance denotes a wrongly classified record. The similarity between differently labeled instances is determined by computing distances between them using several of the standard metrics. In order to ensure that this approach is computational...
Fuzzy clustering, which enables flexible classification, is very useful but sometimes calculates the degree of membership of an object in a cluster too exactly. To solve this problem, a clustering method called rough k-means (RKM) was proposed by Lingras et al. RKM, an extended method that uses a rough set representation, can classify more roughly than fuzzy clustering without lack...
Smartphones are becoming increasingly popular, which has also attracted hackers. With the increasing capabilities of such phones, more and more malicious software targeting these devices has been developed. Malware can seriously damage an infected device within seconds. In this paper, we propose to use trimming approaches for automatic clustering (trimmed k-means, Tclust) of smartphone applications...
Traditional hierarchical text clustering methods assume that the documents are represented only by "technical information", i.e., keywords, phrases, expressions and named entities that can be directly extracted from the texts. However, in many scenarios there is additional valuable information about the documents which is usually disregarded during the clustering task, such as user-validated...
Image clustering has been attracting mounting attention in widely used fields such as data compression, information retrieval, and character recognition, due to the emerging applications of various web-based and mobile-based image retrieval and services. To study this, in this paper we propose a novel image clustering algorithm, based on the Voronoi diagram, for the effective discovery of image clusters...
Document clustering groups documents according to certain semantic features defined on the document set for measuring the similarity between two documents. Keyword models, such as the TFIDF model of a document, have been widely used as features for document clustering. However, they lack semantic structure, which limits their further use in document analysis. Topic model has been developed to...
In this paper, we focus on the problem of clustering faces in videos. Unlike traditional clustering of a collection of facial images, a video provides some inherent benefits: faces from a face track must belong to the same person, and faces from the same video frame cannot belong to the same person. These benefits can be used to enhance the clustering performance. More precisely, we convert the above benefits...
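The two video cues stated in this abstract can be turned into pairwise constraints in a few lines. The sketch below is only an illustration of that idea, not the paper's formulation (the listing truncates how the authors use these cues); the tuple layout `(face_id, track_id, frame_id)` and all names are hypothetical.

```python
from itertools import combinations


def pairwise_constraints(detections):
    """Derive pairwise constraints from face detections in a video.

    detections: iterable of (face_id, track_id, frame_id) tuples.
    Returns (same_person, different_person), two sets of face_id pairs:
      - same track       -> the faces must belong to the same person
      - same frame, but
        different tracks -> the faces cannot belong to the same person
    """
    same_person, different_person = set(), set()
    for (id_a, track_a, frame_a), (id_b, track_b, frame_b) in combinations(detections, 2):
        if track_a == track_b:
            same_person.add((id_a, id_b))
        elif frame_a == frame_b:
            different_person.add((id_a, id_b))
    return same_person, different_person


# Toy example: two tracks, each visible in frames 0 and 1.
detections = [("f0", "t1", 0), ("f1", "t2", 0), ("f2", "t1", 1), ("f3", "t2", 1)]
same, different = pairwise_constraints(detections)
```

Pairs that share neither a track nor a frame receive no constraint and are left for the clustering step to decide.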
Nowadays, smartphones are widely used due to their capabilities in communication and multimedia processing. Smartphones provide access to a tremendous amount of sensitive business-related information, such as customer contacts, financial data, and intranet networks. Hence, the Internet of the future will be the mobile Internet. However, the threat of malicious software has become an important factor...
Semi-supervised clustering is a popular machine learning technique used for challenging data categorization tasks when some prior knowledge is available to users. In this paper, we report empirical studies of our newly proposed semi-supervised clustering framework, which utilizes multiple viewpoints for the similarity measure with the help of the prior knowledge. Two different MVS-based approaches...