Document clustering organizes documents into groups. The Vector Space Model (VSM) represents each document as a vector, which makes documents easier to cluster. The main problem with clustering text documents is the very high dimensionality of the data: each term in a document represents a dimension. To reduce the dimensions of the document vector space, it is preprocessed...
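As a rough illustration of the vector space representation mentioned in this abstract, the sketch below builds one term-count vector per document. It is not the paper's method: the whitespace tokenizer, the raw-count weighting, and the function name `build_vsm` are assumptions made for the example, and no preprocessing or dimensionality reduction is applied.

```python
from collections import Counter


def build_vsm(docs):
    """Return (vocabulary, vectors) with one term-count vector per document.

    Hypothetical helper: tokenizes on whitespace and uses raw term counts.
    """
    tokenized = [doc.lower().split() for doc in docs]
    # Every distinct term across the collection becomes one dimension.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


docs = ["data clustering groups data", "clustering text documents"]
vocabulary, vectors = build_vsm(docs)
print(vocabulary)
print(vectors)
```

Even with two short documents the vocabulary already has five dimensions, which hints at why dimensionality grows so quickly for real text collections.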
Clustering is a way of combining data objects or data points into disjoint clusters. The basic concept behind clustering is that the data objects in the same cluster should be related to each other and the data objects belonging to different clusters should differ from each other. This research paper proposes a new algorithm which combines the features of K-means clustering algorithm and Hierarchical...
Data clustering is a major part of big data analytics. It helps to categorize data, which in turn helps to reveal hidden patterns. K-means is one such clustering algorithm, well known for its simple computation and its ability to be executed in parallel. Big data analytics requires distributed computing, which can be achieved using MapReduce...
This paper presents an improved algorithm for the Hybrid Approach of Neural network and Level-2 Fuzzy set (HANN-L2F). The main structure consists of two parts. The first part is a Neuro-Fuzzy system that combines an MLP neural network with the level-2 fuzzy system. The second part uses k-nearest neighbor to classify the output of the Neuro-Fuzzy system. The HANN-L2F is an algorithm with...
With the development of digital cable interactive business and the diversification of customers' demands, effectively grouping TV programmes based on user preferences is vital for market segmentation and differentiation. The study summarizes the main principles and characteristics of clustering algorithms, and uses the K-Means algorithm to show TV programme preference grouping based on 52392 subscribers...
Educational Data Mining (EDM) is a learning science, and an emerging discipline, concerned with analyzing and studying data from academic databases. Through the exploration of these large datasets, using various data mining methods, one can identify unique patterns which will help study, predict and improve a student's academic performance. This paper elaborates a study on various Educational Data...
This paper investigates the use of a machine learning clustering technique to segment and target customers of a wholesale distributor. It describes the selection, analysis, and interpretation of clusters for evaluating customers' annual spending on the products. We show how circular statistics can categorize customers by looking at the annual spending on six essential product categories. Several clusters...
Cardiotocography (CTG) records the fetal heart rate (FHR) signal and intrauterine pressure (IUP) simultaneously. CTG is widely used for diagnosing and evaluating the condition of the pregnancy and the fetus up until delivery. The high dimensionality of CTG data is a problem for classification; by extracting features we can obtain the useful information from CTG data, and in this research, K-Means Algorithm...
Data clustering is one of the popular tasks recently used in the educational data mining arena for grouping similar students by several aspects such as study performance, behavior, skill, etc. Many well-known clustering algorithms such as k-means, expectation-maximization, spectral clustering, etc. were employed in the related works. None of them has taken into consideration the incompleteness of...
K-means is one of the most significant clustering algorithms in data mining. It performs well in many cases, especially on massive data sets. However, the result of K-means clustering depends largely on the initial centers, which makes it difficult for K-means to reach the global optimum. In this paper, we developed a novel algorithm based on finding density peaks to optimize the initial centers for...
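The dependence on initial centers described in this abstract can be illustrated with a small example. The sketch below is a plain Lloyd-style k-means, not the density-peak initialization the paper proposes; the function name `kmeans`, the four sample points, and both sets of starting centers are assumptions chosen so that one start converges to a poorer local optimum.

```python
def kmeans(points, centers, max_iter=100):
    """Plain k-means from given initial centers; returns (centers, sse)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to its cluster mean
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else c
            for cluster, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Sum of squared distances from each point to its nearest center.
    sse = sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )
    return centers, sse


# Four corners of a wide rectangle: the natural split is left vs right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_good = kmeans(points, [(0, 0.5), (4, 0.5)])
_, sse_bad = kmeans(points, [(2, 0), (2, 1)])
print(sse_good, sse_bad)
```

Both runs converge immediately, but the second start settles on a top-versus-bottom split with a much larger error, which is the kind of local optimum that better initialization aims to avoid.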
With the advent of the big data era, traditional data mining algorithms have become inadequate for the task of massive data analysis, management and mining. The development of cloud computing brings new life to algorithm parallelization. In this paper, we have studied the K-means algorithm, one of the clustering algorithms. We then attempt to improve this algorithm via a method that samples the large-scale...
In this paper, we propose a hybrid method for intrusion detection which is based on k-means, naive Bayes and a back propagation neural network (KBB). Initially we apply k-means, which is a partition-based, unsupervised cluster analysis method. We obtain the gathered data in the form of clusters, which can be easily processed and learned by any machine learning algorithm. These outcomes are provided to...
Boundary devices, such as routers, firewalls, proxies, and domain controllers, continuously generate logs showing the behaviors of internal and external users, the working state of the network, and the state of the devices themselves. Analyzing these logs rapidly and efficiently is valuable for security and reliability. However, it is a challenging task due to the fact...
Distributed data mining techniques, and distributed clustering in particular, have been widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering approaches normally generate global models by aggregating local results that are obtained on each site. While this approach mines the datasets on their locations...
Online word-of-mouth activity is a very typical index of the lifecycle evolution model of a product, and understanding the product lifecycle can help decision makers formulate marketing strategies. In this paper, the data sets for the online comments on various types of products are studied; based on management theory and economics theory, and by applying such methods as...
K-means is the most widely used clustering algorithm due to its fairly straightforward implementation in various problems. Meanwhile, when the number of clusters increases, the number of iterations also tends to increase slightly. However, there are still opportunities for improvement, as some studies in the literature indicate. In this study, improved implementations of k-means algorithm with a centroid...
Agricultural crop production depends on various factors such as biology, climate, economy and geography. These factors have different impacts on agriculture, which can be quantified using appropriate statistical methodologies. By applying such methodologies and techniques to historical crop yields, it is possible to obtain information or knowledge which can be helpful to farmers and government organizations...
In this paper we propose an accurate clustering algorithm as a necessary step of Single Channel Independent Component Analysis (SCICA) in the context of the fast extraction of protein profiles from mass spectra (MALDI-TOF) data. In general, K-means clustering is employed for clustering of the basis vectors. However, given its iterative and statistical nature, convergence to the same clusters...
Cluster analysis is a main task of exploratory data mining and plays an important role in many applications. Numerous clustering techniques in data mining work efficiently for low-dimensional data but fail to handle high-dimensional data. In this paper we evaluated the performance efficiency of K-means and Agglomerative hierarchical clustering methods based on Euclidean and Manhattan distance...
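For reference, the two distance measures named in this abstract can be written as follows. This is a minimal sketch using the standard definitions; the function names and the sample points are illustrative and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


p, q = (0, 0), (3, 4)
d_euclidean = euclidean(p, q)
d_manhattan = manhattan(p, q)
print(d_euclidean, d_manhattan)
```

Either function can serve as the dissimilarity measure in a clustering routine; the choice affects which points count as neighbours and therefore the shape of the resulting clusters.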
The focus of this research was to use Educational Data Mining (EDM) techniques to conduct a quantitative analysis of students' interaction with an e-learning system through instructor-led non-graded and graded courses. This exercise is useful for establishing a guideline for a series of online short courses for them. The access behaviour of a group of 412 students in an e-learning system was analysed and...