In this digital world, we face a flood of data yet remain starved of knowledge. Data mining is needed to extract hidden patterns from the vast amounts of data now available. Clustering is one such mining tool; it addresses this situation through a series of steps referred to as cluster analysis. It is the process of grouping patterns into clusters based...
Microarray data play a vital role in simultaneously monitoring the expression profiles of a large number of genes under various experimental conditions. In bioinformatics research, recognizing co-expressed and coherent patterns is a major objective of microarray data analysis. The K-means clustering algorithm is gaining popularity in the knowledge discovery domain for effectively...
In this paper, an accurate and proactive technique is developed to detect Distributed Denial of Service (DDoS) attacks. This is achieved by using an entropy concept to measure abnormal traffic changes according to the phases of the attack. This traffic is then clustered by using a modified DBSCAN algorithm, and the centroids for the resulting clusters are then used as patterns for efficient distance-based...
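The entropy measurement mentioned in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say which traffic feature the entropy is computed over or how the attack phases are delimited, so the use of source addresses, fixed-size windows, and the function names `shannon_entropy` and `windowed_entropy` are all assumptions made for illustration; this is not the paper's method.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def windowed_entropy(values, window_size):
    """Entropy of each consecutive, non-overlapping window of `values`.

    Assumption: traffic is summarized as a flat list of one feature
    (for example, the source address of each packet).
    """
    return [
        shannon_entropy(values[i:i + window_size])
        for i in range(0, len(values), window_size)
    ]


if __name__ == "__main__":
    # Hypothetical traffic: the first window has four distinct sources,
    # the second is dominated by a single source.
    packets = ["a", "b", "c", "d", "x", "x", "x", "x"]
    print(windowed_entropy(packets, 4))
```

A sharp change in entropy between windows is the kind of signal such a detector could pass on to a clustering stage.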
Large amounts of high-dimensional data accumulate in databases in day-to-day life. Data mining is used to extract useful information from this high-dimensional data. To classify or cluster high-dimensional data, its dimensionality needs to be reduced. Feature selection is used to select the features that are relevant to the analysis...
Microarray technology is an essential tool for observing and monitoring the genes in a living organism. Biclustering is a strategy for identifying genes that are co-regulated under a subset of conditions but are not necessarily co-regulated across other conditions. The dataset is in the form of a matrix in which the rows represent a set of genes and the columns represent a set of conditions...
In on-line fault monitoring of rolling bearings, we have no information about the number of clusters in the acquired signal data, which poses great challenges for on-line fault diagnosis with clustering algorithms. In this paper, we extract three time-domain features of the vibration signals of rolling bearings as parameters, and then multi-scale possibilistic clustering (MPCM)...
Data retrieval is a key process for acquiring information as required. The need for proper information has increased. The most basic tool that provides this service is the browser. It traverses the data according to the user's query and returns search results for all related information. Hence, finding the required information becomes a time-consuming process. In this paper, the focus is on content...
Deep neural networks (DNNs) trained with supervision have been successfully applied to several image classification tasks. However, how to extract powerful data representations and discover semantic concepts from unlabeled data is a more practical issue. Unsupervised feature learning methods aim at extracting abstract representations from unlabeled data. A large body of research illustrates...
Clustering, or unsupervised classification, is an important issue in bioinformatics. It serves to automatically group protein sequences into families. Most researchers treat the biclass clustering problem. In this paper we present our approach for the multiclass clustering of protein sequences. It is a difficult problem, because we rely on primary structure. This approach consists of four steps. In...
Community detection has attracted considerable attention across many areas, as it can be used to discover the structure and features of complex networks. With the increasing size of social networks in the real world, community detection approaches should be fast and accurate. The Label Propagation Algorithm (LPA) is known to be one of the near-linear solutions and benefits from easy implementation,...
Botnets, which consist of remotely controlled compromised machines called bots, provide a distributed platform for several threats against cyber world entities and enterprises. An intrusion detection system (IDS) provides an efficient countermeasure against botnets. It continually monitors and analyzes network traffic for potential vulnerabilities and the possible existence of active attacks. A payload-inspection-based...
Recently, the number of features in different problem domains has grown enormously. In order to select the best representation (attributes) for these problems, a deep knowledge of the problem domain is required. As this type of knowledge is not always available, feature selection needs to be applied as an automatic process for selecting the most relevant attributes in a dataset. In this paper, we propose...
Data mining methods like clustering enable police to get a clearer picture of criminal identification and prediction. Clustering algorithms help to extract hidden patterns to identify groups and their similarities. In this paper, a modified k-mean algorithm is proposed. Each data point is allocated more accurately to its suitable class or cluster. The modified k-mean algorithm reduces the...
The sparsity and curse of dimensionality of high-dimensional data cause traditional clustering algorithms such as K-Means and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) to produce low-quality clusters and increase the time complexity exponentially. Many projected clustering algorithms have been proposed to deal with noisy high-dimensional data. However, most of...
Data clustering analysis is the process of finding similarities in data so that items are assigned to groups that are internally homogeneous and as heterogeneous as possible between groups. There are several analysis methods, among which the K-means clustering algorithm is widely used in different research areas. Therefore, this paper reviews the best-known variants of clustering methods, which are K-means, IRP-K-means and...
The aim of this paper is to provide a detailed, ordered description and analysis of commonly used and important clustering algorithms, with a focus on recent advances, and to provide an extensive comparison of these algorithms in terms of their complexity and applications.
The paper presents a graph model and an agglomerative algorithm for text document clustering. Given a set of documents, the associations among frequently co-occurring terms in any of the documents naturally form a graph, which can be decomposed into connected components at various levels. Each connected component represents a concept in the collection. These concepts can categorize documents into...
With the explosive increase of data volume, the research of data quality and data usability draws extensive attention. In this work, we focus on one aspect of data usability -- incomplete data imputation, and present a novel missing value imputation method using stacked auto-encoder and incremental clustering (SAICI). Specifically, SAICI's functionality rests on four pillars: (i) a distinctive value...
Traffic anomalies on a network usually prevent authorized users from accessing it properly. They are caused by an increased number of simultaneous users or by a botnet attack on the network. This research proposes a method to detect whether traffic is anomalous. It uses the K-Means algorithm as the detection algorithm, modified in the determination of the centroid and the cluster...
Clustering large collections of binary programs is a challenging task due to two factors. First, a way to determine whether two samples are similar is required. Second, pairwise comparison is impractical on collections comprising millions of items. This paper will mainly focus on the second factor and will propose a clustering algorithm based on the properties of MinHash functions. The...
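The MinHash property this abstract relies on can be sketched briefly: the fraction of positions at which two MinHash signatures agree estimates the Jaccard similarity of the underlying sets, so compact signatures can stand in for full pairwise set comparison. The sketch below is a generic illustration, not the paper's clustering algorithm, which the truncated abstract does not describe. The feature strings, the choice of salted BLAKE2b as the hash family, and the names `minhash_signature` and `estimate_similarity` are assumptions.

```python
import hashlib


def minhash_signature(items, num_hashes=64):
    """MinHash signature of a non-empty collection of strings.

    Each of the `num_hashes` hash functions is BLAKE2b with a different
    salt (an illustrative choice); the signature keeps the minimum hash
    value seen under each function.
    """
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"),
                                digest_size=8, salt=salt).digest(),
                "big",
            )
            for item in items
        ))
    return signature


def estimate_similarity(sig_a, sig_b):
    """Estimated Jaccard similarity: fraction of matching positions."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


if __name__ == "__main__":
    # Hypothetical feature sets extracted from two binaries.
    sample_a = {"push ebp", "mov ebp, esp", "call 0x10", "ret"}
    sample_b = {"push ebp", "mov ebp, esp", "call 0x20", "ret"}
    sig_a = minhash_signature(sample_a)
    sig_b = minhash_signature(sample_b)
    print(estimate_similarity(sig_a, sig_b))
```

In a large collection, signatures like these are typically grouped by shared values so that only likely-similar samples are ever compared directly.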