This paper studies detection algorithms for complex networks and improves an algorithm based on K-means. With reference to node density properties, it puts forward a community structure detection algorithm (BSTN) based on the similarity between the nodes of a complex network. The algorithm greatly reduces the number of iterations. Using the algorithm in the computer generated stochastic...
This paper studies a clustering problem with balancing constraints, meaning that the number of samples in each cluster must be at least a pre-given value. A modified k-means clustering algorithm is proposed, which adopts a heuristic cluster assignment algorithm to deal with the balancing constraints. Numerical computation shows that the proposed algorithm can deal with the balancing...
Data clustering has been applied in multiple fields such as machine learning, data mining, wireless sensor networks and pattern recognition. One of the most famous clustering approaches is K-means, which has been used effectively in many clustering problems, but the algorithm has drawbacks such as convergence to local optima and sensitivity to the initial points. Artificial fish swarm algorithm (AFSA)...
Data mining has been defined as "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data". Clustering is the automated search for groups of related observations in a data set. The K-Means method is one of the most commonly used clustering techniques for a variety of applications. This paper proposes a method for making the K-Means algorithm...
Data mining has become an important topic in effective analysis of gene expression data due to its wide application in the biomedical industry. Within a gene expression matrix there are usually several particular macroscopic phenotypes of samples. Selection of genes most relevant and informative for certain phenotypes is an important aspect in gene expression analysis. Currently most of the research...
We present an approach for grouping single-speaker speech segments into speaker-specific clusters. Our approach is based on applying the K-means clustering algorithm to a suitable discriminant subspace, where the Euclidean distance reflects speaker differences. A core feature of our approach is approximating speaker-conditional statistics, which are not available, with single-speaker segment statistics,...
Automatic seizure detection is becoming popular in modern epilepsy monitoring units since it assists diagnostic monitoring and reduces manual review of large volumes of EEG recordings. In this paper, we describe the application of machine learning algorithms for building patient-specific seizure detectors on multiple frequency bands of intra-cranial electroencephalogram (iEEG) recorded by a dense...
WebEpi is an epidemiological WebGIS service developed for the Population Health Epidemiology Unit of the Tasmania Department of Health and Human Services (DHHS). Epidemiological geographical studies help analyze public health surveillance and medical situations. It is still a challenge to conduct large-scale geographical information exploration of epidemiology surveillance based on patterns and relationships...
K-Means is a popular clustering algorithm which adopts an iterative refinement procedure to determine data partitions and to compute their associated centres of mass, called centroids. The straightforward implementation of the algorithm is often referred to as "brute force" since it computes a proximity measure from each data point to each centroid at every iteration of the K-Means process. Efficient...
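The brute-force procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the use of squared Euclidean distance as the proximity measure, and the random choice of initial centroids are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_brute_force(points, k, max_iter=100, seed=0):
    """Plain K-means: every point is compared with every centroid each iteration.

    `points` is a list of equal-length numeric tuples. Initial centroids are
    k distinct input points chosen at random (an assumption of this sketch).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [None] * len(points)

    for _ in range(max_iter):
        # Assignment step: n * k distance computations, hence "brute force".
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # partition is stable, so the centroids will not move
        assignments = new_assignments

        # Update step: move each centroid to the centre of mass of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]

    return centroids, assignments


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centres, labels = kmeans_brute_force(data, k=2)
    print(centres, labels)
```

Each iteration of this sketch costs a number of distance computations proportional to the number of points times the number of centroids, which is the cost that more efficient variants try to reduce.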
Clustering, or data grouping, is a key initial procedure in image processing. At present, the size of company databases has increased dramatically, and these databases contain large amounts of text and images. Companies need to mine these huge databases and make accurate decisions in a short time in order to gain a marketing advantage. Because an image is a collection of pixels, it is difficult to...
Most clustering algorithms perform poorly when the dimensionality of the data set increases, because some dimensions contain irrelevant or noisy data and random initialization of the cluster centres leads to a locally optimal clustering. In this paper, we propose a technique for reducing the effect of high dimensionality and of random initialization of the cluster centres. It consists of three phases....
An intrusion detection system (IDS) is an active and driving defense technology. This paper mainly focuses on intrusion detection based on data mining. The aim is to improve the detection rate and decrease the false alarm rate, and the main research method is clustering analysis. An intrusion detection algorithm and model are proposed, and corresponding simulation experiments are presented. Firstly, a method to reduce...
The k-means algorithm is one of the best-known and most popular clustering algorithms. K-means seeks an optimal partition of the data by minimizing the sum of squared error with an iterative optimization procedure, which belongs to the category of hill climbing algorithms. Hill climbing searches are known for converging to local optima. Since k-means can converge to a local optimum,...
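For reference, the sum of squared error mentioned in this abstract is commonly written as below. The notation is an assumption of this note rather than the paper's own: $C_1, \dots, C_k$ denote the clusters, $x_i$ the data points, and $\mu_j$ the centroid of cluster $C_j$.

```latex
J(C_1, \dots, C_k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{\lvert C_j \rvert} \sum_{x_i \in C_j} x_i
```

Each assignment step and each centroid update can only lower $J$ or leave it unchanged, which is why the procedure behaves like hill climbing and may stop at a local optimum.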
In this paper, the author used K-means and fuzzy K-means to analyze the classification of precipitation in JingDeZhen City. The results showed that fuzzy K-means is the more efficient data clustering algorithm, with greater value for wider adoption and practical application.
System anomaly detection is very important for development, maintenance and performance refinement in large scale distributed systems. Analyzing the system logs produced by distributed systems is a good way to troubleshoot and diagnose problems. However, due to the increasing scale and complexity of distributed systems, the logs are very large. Thus, it's inefficient for...
In data mining today, clustering of large datasets is an important technique, widely applied in many applications including social network analysis. Applying a more specific pre-processing method to prepare the data for clustering algorithms is considered a significant step towards generating meaningful segments. In this paper we propose an innovative clustering technique...
In many application domains, such as information retrieval, computational biology, and image processing, the data dimension is usually very high. Developing effective clustering methods for high-dimensional datasets is a challenging problem due to the curse of dimensionality. The k-means clustering algorithm is used in many practical applications, but it is computationally expensive and the quality...
Based on complex network theory, we propose a clustering algorithm based on content similarity. Firstly, the Chinese documents are represented by the vector-space model, and the content similarity between any two documents is computed by the cosine similarity. Consequently, the network node is defined as a document, and the edge weight is defined as the similarity obtained by the cosine similarity...
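The document network described in this abstract can be illustrated with a short sketch. It assumes, hypothetically, that each document has already been segmented into whitespace-separated terms and that raw term counts serve as the vector-space weights; the paper's actual weighting scheme and the function names used here are not taken from the source.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def build_similarity_graph(docs):
    """Return {(i, j): weight} with one weighted edge per similar document pair.

    Nodes are document indices; the edge weight is the cosine similarity of
    the two documents' term-count vectors. Pairs with zero similarity get
    no edge.
    """
    vectors = [Counter(doc.split()) for doc in docs]
    edges = {}
    for i, j in combinations(range(len(vectors)), 2):
        weight = cosine_similarity(vectors[i], vectors[j])
        if weight > 0:
            edges[(i, j)] = weight
    return edges


if __name__ == "__main__":
    corpus = ["network node edge", "network node weight", "speech audio"]
    for pair, weight in build_similarity_graph(corpus).items():
        print(pair, round(weight, 3))
```

A clustering or community detection step would then operate on the resulting weighted edges; that step is not shown because the abstract is truncated before it is described.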
K-means clustering is sensitive to its starting points and is computationally expensive for large-scale data, such as audio. Sampling is widely applied to find “better” starting points in order to speed up convergence of the clustering. However, how to choose a reasonable sampling rate remains a problem. In this paper, we report our initial exploration of locating reasonable sampling rates...
The k-means method is a widely used clustering technique because of its simplicity and speed. However, the clustering result depends heavily on the chosen initial value. In this report, we propose a seeding method with independent component analysis for the k-means method. Using a benchmark dataset, we evaluate the performance of our proposed method and compare it with other seeding methods.