Data-dependent dissimilarity provides a better closest adaptation than distance measures. When dealing with arbitrary types of data sets, especially those with manifold structures, mass-based dissimilarity [1] cannot perform well. Taking the structure into account, this paper introduces a generic structural mass-based dissimilarity that is easily applied to existing algorithms in different tasks...
MicroRNAs form a family of single-stranded RNA molecules, approximately 22 nucleotides long, that are present in all animals and plants. Various studies have revealed that microRNAs tend to cluster on chromosomes. In this regard, a novel clustering algorithm is presented in this paper, integrating the rough hypercuboid approach with fuzzy c-means. Using the concept of rough hypercuboid equivalence...
Every day, huge amounts of information are transferred from one network to another, and this information may be exposed to attacks. Information and information systems should be protected from unauthorized users. Providing and maintaining the confidentiality and integrity of information is a very tedious job, so intrusion detection plays a very important role. Although various methods are used to protect...
Distributed applications from different domains such as health care, e-commerce, science, and social networks tend to generate large volumes of heterogeneous data that grow exponentially over time, leading to big data sets. Descriptive analytics on big data sets poses a great challenge for traditional data analytics tools, since it must be performed on the full data set, unlike predictive...
ROCK is a popular algorithm for clustering categorical data due to its ingenious concept of links between data points. The only issue with this method is its time complexity. The procedure is inherently slow, with a maximum of N-k iterations. This paper shows how properties of the dataset can be utilized to reduce the total number of iterations by a factor of 10 or more. The reduction becomes more significant as the size of the dataset grows...
In this paper, we report an application of data analytics in a real-world business case from the telecom industry. This work was carried out in collaboration with an IT company in India that has a large data set of telecom customers. As part of data analytics, the first task was to perform cleansing of bad and missing data, transforming heterogeneous formats into a unified format, semantic analysis on the data (semantics...
Flow cytometry (FCM) is a well-known method that is broadly used in clinical and research laboratories, both of which have been target domains of FCM applications. The key research question in this field is “how to effectively automate FCM data analysis?” To answer this question, this paper systematically reviews current advances in the automation of...
A persona in a social network is defined as a person's activities and attributes in that network as seen by others. A community in a social network is defined as a group of users in that network who share common interests and are most likely to interact with each other. For community detection, a user's persona and its connections with the other users in a network,...
Gathering the most relevant data for one's needs from the huge collection of data on the internet is very difficult. To make it easier, we propose an application called text clustering, which is the automatic grouping of text documents into clusters, so that documents within a cluster are similar to each other but not similar to documents in other clusters. Most existing...
In recent years, nonnegative matrix factorization (NMF) has attracted much attention in the machine learning and signal processing fields due to its interpretability of data in a low-dimensional subspace. For clustering problems, symmetric nonnegative matrix factorization (SNMF), as an extension of NMF, factorizes the similarity matrix of data points directly and outperforms NMF when dealing with nonlinear data...
Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited research in the clustering of point patterns - sets or multi-sets of unordered elements - that are found in numerous applications and data sources. In this paper, we...
Event segmentation is an important step in monitoring and management applications that categorizes different events into different segments. This is especially important when the applications to be monitored and managed are large-scale, comprehensive, and data-intensive in nature. The process of segmentation is based on data clustering, which is one of the key data mining methods used these days. There...
One of the most popular fuzzy clustering techniques is the fuzzy K-means algorithm (also known as fuzzy-c-means or FCM algorithm). In contrast to the K-means and K-median problems, the underlying fuzzy K-means problem has not been studied from a theoretical point of view. In particular, there are no algorithms with approximation guarantees similar to the famous K-means++ algorithm known for the fuzzy...
In recent years, there has been rapid growth in online communication. There are many social networking sites and related mobile applications, and more are still emerging. A huge amount of data is generated by these sites every day, and this data can be used as a source for various analyses. Twitter is one of the most popular networking sites, with millions of users. There are users with different...
In recent years many different subspace clustering algorithms and related methods have been proposed. They promise to not only find hidden structures in data sets, but also to select for each structure the features, which are most prominent. Yet, most of these methods suffer from the same problem: finding a satisfactory clustering result heavily depends on an adequate configuration of the parameters. In...
Many clustering evaluation methods are computed as a ratio between two objectives; typically, these objectives express the compactness of all clusters while trying to maximize the separation between individual clusters. However, the clustering process itself is typically implemented as a single objective problem: optimizing a linear combination of between-points closeness. We propose MoCham - a hierarchical...
This paper first introduces the relevant principles of the K-Means algorithm, the simulated annealing (SA) algorithm, and the particle swarm optimization (PSO) algorithm. Then, to address the influence of the initial value of the K-Means algorithm on its optimal solution, a hybrid K-Means algorithm based on SA-PSO is proposed. The new algorithm uses the advantage of jumping out of local...
A hybrid clustering approach is proposed for processing image-like data such as plots in flow cytometry. Clustering or partitioning data into relatively homogeneous and coherent subpopulations can be an effective pre-processing method to achieve data analysis tasks such as pattern recognition and classification. Our method uses a graph to model the initial manual partition of the dataset. Based on...
Clustering streaming data has gained importance in recent years due to an expanding opportunity to discover knowledge in widely available data streams. As streams are potentially evolving and unbounded sequences of data objects, clustering algorithms capable of performing fast and incremental processing of data points are necessary. This paper presents a method of clustering high-dimensional data streams...
Clustering can help to make large datasets more manageable by grouping together similar objects. However, most clustering approaches are unable to scale to very large datasets (e.g. more than 10 million objects). The K-Tree is a data structure and clustering algorithm that has proven to be scalable with large streaming datasets. Here, we apply the K-Tree to spatial data (satellite images) and extend...