As an important branch of machine learning, clustering is widely used for data analysis in various domains. The hierarchical clustering algorithm, one of the traditional clustering algorithms, has excellent stability yet relatively poor time complexity. In this paper, we propose an efficient hierarchical clustering algorithm that iteratively searches for given nodes' nearest neighbors, which depends on an...
Many real-world applications involve multi-label data streams, so effective concept drift detection methods should be able to consider the unique properties of multi-label stream data, such as label dependence. To deal with these challenges, we propose an efficient and effective method to detect concept drift based on label grouping and entropy for multi-label data. Two methods are proposed to group...
Ensuring high reliability of large-scale clusters is becoming more critical as the size of these machines continues to grow, since this increases the complexity and number of interactions between different nodes and thus results in a high failure frequency. For this reason, predicting node failures in order to prevent errors from happening in the first place has become extremely valuable. A common...
Incomplete data clustering plays an important role in big data analysis and processing. Existing algorithms for clustering incomplete high-dimensional big data perform poorly in both efficiency and effectiveness. The paper proposes an incomplete high-dimensional big data clustering algorithm based on feature selection and a partial distance strategy. First, a hierarchical clustering-based...
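The abstract names a partial distance strategy without defining it. A minimal sketch of one common formulation is given below, assuming missing values are encoded as `None`: squared differences are summed over the features observed in both vectors and rescaled by the ratio of total to observed features. The function name `partial_distance` and the `None` encoding are illustrative assumptions, not details taken from the paper.

```python
import math


def partial_distance(x, y):
    """Distance between two equal-length vectors that may contain None.

    Only features observed in BOTH vectors contribute; the sum of squared
    differences is rescaled by (total features / observed features) so that
    vectors with many missing entries are not treated as artificially close.
    This is an assumed, commonly used formulation.
    """
    observed = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not observed:
        raise ValueError("the vectors share no observed feature")
    squared = sum((a - b) ** 2 for a, b in observed)
    return math.sqrt(squared * len(x) / len(observed))


x = [1.0, None, 3.0, 4.0]
y = [1.0, 2.0, None, 0.0]
d = partial_distance(x, y)  # uses only the first and last features
```

For complete vectors the rescaling factor is 1, so the function reduces to the ordinary Euclidean distance.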
This paper proposes a new method for automated clustering of high-dimensional datasets. The method is based on a recursive binary division strategy that successively divides an original dataset into distinct clusters. Each binary division is carried out using a model-free expectation maximization scheme that exploits the posterior probability computation capability of the quasi-supervised learning...
Aiming at the long response time, inaccurate recommendation, and cold-start problems faced by current recommendation algorithms, this paper, taking a movie recommendation system as an example, proposes a collaborative filtering recommendation model based on user credibility clustering. This model divides the recommendation process into offline and online phases. Offline, it uses the result of user's...
Image annotation has been identified as a suitable means of eliminating the semantic gap that has made the accuracy of content-based image retrieval unsatisfactory. However, existing methods of automatic image annotation depend on supervised learning, which can be difficult to implement due to the need for manually annotated training samples, which are not always readily available...
Planted (l, d) motif discovery has been successfully used to locate transcription factor binding sites in dozens of promoter sequences over the past decade. However, not enough work has been done on identifying (l, d) motifs in next-generation sequencing (ChIP-seq) data sets, which contain thousands of input sequences and thereby bring a new challenge of making a good identification in reasonable...
Texture feature extraction (TFE) plays an important role in satellite image processing applications. This paper proposes a novel method for satellite imagery classification. Our proposed method is a combination of Local Binary Pattern (LBP) and the fuzzy c-means classification algorithm. The Local Binary Pattern is calculated by thresholding a 3 × 3 neighborhood of each pixel by the center pixel value...
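The thresholding step described above can be sketched as follows. This is a minimal illustration of the basic 3 × 3 LBP operator, not the paper's implementation: the clockwise bit order starting at the top-left neighbor, the `>=` comparison, and the function names `lbp_code` and `lbp_image` are assumptions made for the example.

```python
def lbp_code(patch):
    """Return the 8-bit LBP code of a 3x3 patch (three rows of three values)."""
    center = patch[1][1]
    # Neighbors visited clockwise from the top-left pixel (assumed convention).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # A neighbor at least as bright as the center sets its bit to 1.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code


def lbp_image(image):
    """Compute LBP codes for every interior pixel of a 2-D list of intensities."""
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(1, rows - 1):
        result_row = []
        for c in range(1, cols - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            result_row.append(lbp_code(patch))
        result.append(result_row)
    return result


patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)  # bits 0, 4, 5, 6, 7 are set
```

A histogram of the resulting codes over an image region is the usual texture descriptor that a clustering step such as fuzzy c-means would then consume.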
Idiopathic generalized epilepsy (IGE) and symptomatic generalized epilepsy (SGE) are two kinds of generalized epilepsy. In this study, we discuss methods for automatic segmentation of MR images for patients with these two kinds of epilepsy. K-means clustering, expectation-maximization, and fuzzy c-means algorithms were employed to perform segmentation on brain images for patients with IGE...
Big data is a set of very large and complex data that is hard to load on computers. The main challenge in the big data world relates to searching, categorizing, and analyzing such data, especially when they are unbalanced. Although there is a lot of work in the field of big data, analyzing unbalanced big data is still a fundamental challenge in this area. In this paper we try to solve the problem of RSIO-LFCM...
Parkinson's disease (PD) is a chronic, progressive neurological disorder caused by a lack of the chemical dopamine in the brain. To date, there is still no cure or prevention for PD, and the disease usually worsens gradually over time. However, the disease can be controlled with some treatment, especially in the early stage. Hence, this study proposes a method for early detection and diagnosis of...
The original K-medoid algorithm selects initial medoids arbitrarily, which affects the resulting clusters and can lead to unstable and empty clusters that are not meaningful. The number of iterations can also be rather high, so K-medoid is not well suited to big databases because of its computational complexity. Also the original k-means algorithm is computationally. Though existing algorithms usually...
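To make the role of the initial medoids concrete, here is a minimal sketch of an alternating (assign, then update) k-medoids loop. It is a simpler variant than the swap-based PAM procedure and is not the algorithm from the paper. The function name `k_medoids`, the Euclidean distance, and the requirement that points be distinct tuples with distinct initial medoid indices are assumptions of this example.

```python
def k_medoids(points, medoids, max_iter=100):
    """Alternating k-medoids.

    points  : list of equal-length numeric tuples (assumed distinct)
    medoids : list of distinct indices into points, used as the initial medoids
    Returns the final medoid indices and the clusters (lists of point indices)
    from the last assignment step.
    """
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b)) ** 0.5

    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to the other members.
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for members in clusters.values()
        ]
        if new_medoids == medoids:
            break  # converged: medoids no longer change
        medoids = new_medoids
    return medoids, list(clusters.values())


points = [(0,), (1,), (2,), (10,), (11,), (12,)]
final_medoids, clusters = k_medoids(points, medoids=[0, 1])
```

Because the result is a function of the `medoids` argument, running the sketch with different starting indices is a quick way to observe the sensitivity to initialization that the abstract criticizes.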
In today's networked environment, a massive volume of data is being generated, gathered, and stored in databases across the world. This trend is growing very fast, year after year. Today it is normal to find databases with terabytes of data, in which vital information and knowledge are hidden. It is not feasible to mine the hidden information in such databases without efficient mining techniques for extracting...
This paper focuses on collaborative filtering (CF) in a TV recommendation system that combines content-based and collaborative filtering recommendation mechanisms. We propose an algorithm that uses the self-organizing map (SOM) to optimize improved k-means (IK) clustering in collaborative filtering. The whole clustering algorithm is divided into two phases: at the first stage, the quantity of...
Clustering the genes based on their expression patterns is one of the important subjects in analyzing microarray data. Discovering the genes co-expressed in particular conditions has been done by different clustering algorithms. In these methods, the similar genes are located in the same cluster. Thus, the closer the similar genes, the further the dissimilar ones will be. Each of the applied methods...
The majority of learning systems do not take real-world data problems into consideration and assume that training sets are perfect. However, in real-world data, this hypothesis is not always true. In fact, real-world data is characterized by many different problems, such as redundancy, incoherence, or large data size. In this paper we focus on the problem of class imbalance. Many solutions...
In our previous study, a grouping-genetic-algorithm-based (GGA-based) attribute clustering process was proposed for grouping features. In this paper, we further improve its performance and propose a center-based GGA for attribute clustering (CGGA). A new encoding scheme with corresponding crossover and mutation operators is designed, and an improved fitness function is proposed to achieve better...
Ultra-wideband (UWB) signals transmit a large amount of information over a short distance with low power, and the signals reflected by the inspected materials can be obtained without contact with the materials. As a result, the reflected UWB signals offer a potential contactless material identification or classification tool. In this paper, we study the UWB signals collected in a series of...
This paper proposes a shortwave direction finding (DF) and crossing location (CL) algorithm based on an improved clustering and outlier detection method. The proposed algorithm can reject scattered azimuths and revise erroneous monitoring data using the improved clustering and outlier detection method. In practical verification, the quantitative results show that, compared to the traditional...