Clustering is a classic topic in optimization with k-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best known algorithm for k-means with a provable guarantee is a simple local search heuristic yielding an approximation guarantee of 9+ε, a ratio that is known to be tight with respect to such methods. We overcome this barrier...
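To make the idea of a swap-based local search for k-means concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm analysed in the abstract above: candidate centers are restricted to the input points, only one center is swapped at a time, and the names `sq_dist`, `kmeans_cost`, `local_search_kmeans`, and `min_gain` are hypothetical. No approximation ratio is claimed for this simplified variant.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_cost(points, centers):
    """k-means objective: each point pays the squared distance to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def local_search_kmeans(points, k, min_gain=1e-9):
    """Single-swap local search (assumes at least k distinct input points).

    Starts from the first k points and keeps replacing one center by one
    non-center input point while the cost drops by more than min_gain.
    """
    centers = list(points[:k])
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in centers:
                    continue
                trial = centers[:i] + [p] + centers[i + 1:]
                trial_cost = kmeans_cost(points, trial)
                if trial_cost < cost - min_gain:
                    centers, cost = trial, trial_cost
                    improved = True
    return centers, cost


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(local_search_kmeans(data, k=2))
```

Each accepted swap strictly lowers the cost, so the loop terminates. Every pass evaluates on the order of k·n candidate swaps, each costing O(nk) distance computations, which is why practical variants bound the number of passes or require a minimum relative improvement.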
MicroRNAs form a family of single-stranded RNA molecules, approximately 22 nucleotides long, that are present in all animals and plants. Various studies have revealed that microRNAs tend to cluster on chromosomes. In this regard, a novel clustering algorithm is presented in this paper, integrating the rough hypercuboid approach with fuzzy c-means. Using the concept of rough hypercuboid equivalence...
One of the most popular fuzzy clustering techniques is the fuzzy K-means algorithm (also known as the fuzzy c-means or FCM algorithm). In contrast to the K-means and K-median problems, the underlying fuzzy K-means problem has not been studied from a theoretical point of view. In particular, there are no algorithms with approximation guarantees similar to the famous K-means++ algorithm known for the fuzzy...
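For readers unfamiliar with fuzzy K-means, the following Python sketch shows the standard textbook alternating updates (soft memberships, then weighted centers). It is not taken from the paper summarised above; the function names, the fuzzifier default `m=2.0`, and the fixed iteration count are assumptions made for illustration, and `m` must be greater than 1.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def memberships(points, centers, m=2.0):
    """Membership of each point in each cluster; every row sums to 1."""
    exponent = 1.0 / (m - 1.0)  # applied to squared distances
    rows = []
    for x in points:
        d2 = [sq_dist(x, c) for c in centers]
        zeros = [i for i, d in enumerate(d2) if d == 0.0]
        if zeros:
            # The point sits exactly on one or more centers: share it among them.
            row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
        else:
            inv = [(1.0 / d) ** exponent for d in d2]
            total = sum(inv)
            row = [v / total for v in inv]
        rows.append(row)
    return rows


def update_centers(points, U, old_centers, m=2.0):
    """Each center becomes the mean of all points weighted by membership**m."""
    dim = len(points[0])
    new_centers = []
    for i, old in enumerate(old_centers):
        weights = [U[j][i] ** m for j in range(len(points))]
        total = sum(weights)
        if total == 0.0:
            new_centers.append(old)  # no point supports this center; keep it
            continue
        new_centers.append(tuple(
            sum(w * p[d] for w, p in zip(weights, points)) / total
            for d in range(dim)
        ))
    return new_centers


def fuzzy_c_means(points, centers, m=2.0, iterations=20):
    """Alternate membership and center updates for a fixed number of rounds."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        U = memberships(points, centers, m)
        centers = update_centers(points, U, centers, m)
    return centers, memberships(points, centers, m)


if __name__ == "__main__":
    pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
    print(fuzzy_c_means(pts, [(0.0,), (11.0,)]))
```

As with hard K-means, the result depends on the initial centers, which is part of why seeding schemes in the spirit of K-means++ are of interest for the fuzzy objective.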
Clustering streaming data has gained importance in recent years due to an expanding opportunity to discover knowledge in widely available data streams. Because streams are potentially evolving and unbounded sequences of data objects, clustering algorithms capable of fast, incremental processing of data points are necessary. This paper presents a method of clustering high-dimensional data streams...
The single-linkage (SLINK) hierarchical clustering algorithm is often preferred over traditional partitioning-based clustering because it does not require the number of clusters as input. However, due to its high time complexity and inherent data dependencies, it does not scale well to large datasets. To the best of our knowledge, all existing parallel SLINK algorithms are based on the traditional...
Clustering is the most effective method for identifying outliers in UCI Repository datasets. This paper proposes detecting outliers in UCI datasets using the Adaptive Rough Fuzzy C-Means clustering algorithm. In the first phase of the Adaptive Rough Fuzzy C-Means algorithm, the Rough k-means algorithm is used to pre-process the UCI Repository dataset, and it normally identifies the outliers...
With the growing number of data clouds in different geographical areas, the availability of a datacenter and the cost of using it are two factors of concern to cloud users. The present research aims to present a method that uses K-means clustering and the NSGA-II multi-objective algorithm to maximize availability and minimize cost when selecting a datacenter. The proposed approach was applied to some real...
An important step in the appearance preservation of real materials is the analysis of how they interact with light. Since this phenomenon occurs at a microscopic level, heuristics of varying complexity have been developed to capture and reproduce it. In order to minimize sampling efforts, one of these approaches consists in representing the reflectance of a material as a linear combination of...
Clustering is a subject of statistical data analysis that is studied across disciplines. In this study, among the various types of clustering algorithms, those derived from Density Based Spatial Clustering of Applications with Noise (DBSCAN) are investigated. Although DBSCAN is a well-known density-based algorithm, it has some bottlenecks. Enhanced versions of DBSCAN have therefore been developed to provide some...
Spectral clustering has shown superior performance in analyzing cluster structure. However, its exponential computational complexity limits its application to large-scale data. To tackle this problem, many low-rank matrix approximation algorithms have been proposed, of which the Nyström method is an approach with provably lower approximation error. The algorithms commonly combine two powerful...
Data mining is a method for extracting useful information from data, but classical data mining approaches cannot be applied directly to big data because of their sheer complexity. The data produced by numerous scientific applications and incorporated environments has grown rapidly in recent years, not only in size but also in variety. The data collected is of very...
The amount of unstructured text data available is growing exponentially due to the proliferation of digital information such as emails, text messages, blogs, social media posts, and product reviews. For users of e-commerce websites such as Amazon, navigating thousands of reviews before buying a product can be a daunting task. Unsupervised machine learning techniques can be used to automatically analyze...
The application of clustering algorithms to real-life data has interested many researchers, and vague approaches, or their hybridization with other analogous approaches, have gained special attention due to their great effectiveness. Recently, the rough intuitionistic fuzzy c-means algorithm was proposed by Tripathy et al. [3], who established its superiority over all other algorithms contained...
Clustering large collections of binary programs is a challenging task due to two factors. First, a way to determine whether two samples are similar is required. Second, pairwise comparison is impractical on collections comprising millions of items. This paper will mainly focus on the second factor and will propose a clustering algorithm based on the properties of MinHash functions. The...
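The general MinHash idea behind avoiding all-pairs comparison can be sketched as follows. This is a generic illustration, not the clustering algorithm of the paper summarised above: each sample is assumed to be represented by a non-empty set of string features, and the helper names, the 64 hash functions, and the 16 bands are hypothetical choices.

```python
import hashlib
from collections import defaultdict


def seeded_hash(seed, item):
    """Deterministic 64-bit hash of (seed, item); stands in for a hash family."""
    data = f"{seed}:{item}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features, num_hashes=64):
    """Signature entry i is the minimum of hash function i over the feature set."""
    return [min(seeded_hash(seed, f) for f in features) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature entries estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_buckets(signatures, bands=16):
    """Group samples that agree on all rows of at least one band.

    Only samples sharing a bucket need a direct comparison, which replaces
    the all-pairs scan over the whole collection.
    """
    buckets = defaultdict(list)
    for name, sig in signatures.items():
        rows = len(sig) // bands
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].append(name)
    return buckets


if __name__ == "__main__":
    samples = {
        "a": {"push", "mov", "call", "ret"},
        "b": {"push", "mov", "call", "ret"},
        "c": {"xor", "jmp", "nop", "int"},
    }
    sigs = {name: minhash_signature(feats) for name, feats in samples.items()}
    print(estimated_jaccard(sigs["a"], sigs["b"]), estimated_jaccard(sigs["a"], sigs["c"]))
```

More bands with fewer rows each make it likelier that moderately similar samples share a bucket, at the price of more candidate pairs to verify.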
Solutions for facility location problems are numerous. As the problem is NP-hard, continuous efforts have been made to find more efficient techniques. The nature of the facility adds to its variety. A popular approach has been based on geometric solutions. Other methods have also been tried; one of them is based on density, applied to large databases as in Spatial Data Mining and Geographic Information...
Clustering is a familiar concept in the realm of data mining and has wide applications in areas like image processing, pattern recognition and rule generation. Uncertainty is a common feature of present-day databases. In order to handle these datasets, several clustering algorithms have been formulated in the literature. The first was the Fuzzy C-Means (FCM) algorithm, which was followed by...
A Semi-supervised Segmentation Fusion algorithm is proposed using consensus and distributed learning. The aim of Unsupervised Segmentation Fusion (USF) is to achieve a consensus among different segmentation outputs obtained from different segmentation algorithms by computing an approximate solution to the NP problem with less computational complexity. Semi-supervision is incorporated in USF using...
Network connectivity maintenance in failure-prone environments has received more attention in recent years. Unfortunately, in a hostile environment there is a need for additional active nodes, i.e. backbone nodes, which can compensate for the failure of other nodes. One of the main design challenges for wireless sensor networks (WSNs) is to obtain a connected dominating set (CDS) in polynomial time with low...
Matrix factorization based techniques, such as nonnegative matrix factorization and concept factorization, have attracted great attention in dimensionality reduction and data clustering. Previous studies show that both of them yield impressive results on image processing and document clustering. However, both of them are essentially unsupervised methods and cannot incorporate label information. In...
The clustering of glyphs (individual letters/characters/symbols) is typically the first step in document processing algorithms and a critical enabling technology for most historical document indexing techniques. In this work, we take a step back from current domain/language specialized research efforts to consider the problem from an agnostic perspective. In particular, we claim that, independent...