Search results

chapter

Frequent term based text document clustering: A new approach

Manoj Kumar, D K Yadav, Vijay Kumar Gupta

2015 International Conference on Soft Computing Techniques and Implementations (ICSCTI) > 11 - 15

2015 International Conference on Soft Computing Techniques and Implementations (ICSCTI)

Document clustering organizes documents into groups. The Vector Space Model (VSM) represents each document as a vector, which makes clustering easier. The main problem with text document clustering is the very high dimensionality of the data, since each term in a document represents a dimension. To reduce the dimensions of the document vector space, it is preprocessed...

chapter

Performance evaluation of enhanced hierarchical and partitioning based clustering algorithm (EPBCA) in data mining

Gurpreet Singh, Jaskaranjit Kaur, Yusuf Mulge

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) > 805 - 810

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT)

Clustering is a way of combining data objects or data points into disjoint clusters. The basic concept behind clustering is that data objects in the same cluster should be related to each other, while data objects belonging to different clusters should differ from each other. This research paper proposes a new algorithm which combines the features of the K-means clustering algorithm and Hierarchical...

chapter

A MapReduce framework to implement enhanced K-means algorithm

Rajashree Shettar, Bhimasen. V. Purohit

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) > 361 - 363

2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT)

Data clustering is a major part of big data analytics. It helps to categorize the data, which in turn helps to recognize hidden patterns. K-means is one such clustering algorithm, well known for its simple computation and its ability to be executed in parallel. Big data analytics requires distributed computing which can be achieved using MapReduce...

chapter

Improvements the HANN-L2F for classification by using k-means

Jirawat Teyakome, Narissara Eiamkanitchat

2015 7th International Conference on Information Technology and Electrical Engineering (ICITEE) > 621 - 625

2015 7th International Conference on Information Technology and Electrical Engineering (ICITEE)

This paper presents an improved algorithm for the Hybrid Approach of Neural network and Level-2 Fuzzy set (HANN-L2F). The main structure includes two parts. The first part is a Neuro-Fuzzy system, combining an MLP neural network with the level-2 Fuzzy system. The second part uses k-nearest neighbor to classify the output from the Neuro-Fuzzy system. The HANN-L2F is an algorithm with...

chapter

Application of clustering algorithm on TV programmes preference grouping of subscribers

Haiyue Zhang, Jianping Chai, Yan Wang, Min An, more

2015 IEEE International Conference on Computer and Communications (ICCC) > 40 - 44

2015 IEEE International Conference on Computer and Communications (ICCC)

With the development of digital cable interactive business and the diversification of customers' demand, effectively grouping TV programmes based on user preferences is vital for market segmentation and differentiation. The study summarizes the main principles and characteristics of clustering algorithms, and uses the K-Means algorithm to show TV programme preference grouping based on 52392 subscribers...

chapter

Educational Data Mining techniques and their applications

John Jacob, Kavya Jha, Paarth Kotak, Shubha Puthran

2015 International Conference on Green Computing and Internet of Things (ICGCIoT) > 1344 - 1348

2015 International Conference on Green Computing and Internet of Things (ICGCIoT)

Educational Data Mining (EDM) is a learning science, and an emerging discipline, concerned with analyzing and studying data from academic databases. Through the exploration of these large datasets, using various data mining methods, one can identify unique patterns which will help study, predict and improve a student's academic performance. This paper elaborates a study on various Educational Data...

chapter

Segmenting and targeting customers through clusters selection & analysis

Ilung Pranata, Geoff Skinner

2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS) > 303 - 308

2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS)

This paper investigates the use of a machine learning clustering technique to segment and target customers of a wholesale distributor. It describes the selection, analysis, and interpretation of clusters for evaluating customers' annual spending on the products. We show how circular statistics can categorize customers by looking at the annual spending on six essential product categories. Several clusters...

chapter

Fetal state classification from cardiotocography based on feature extraction using hybrid K-Means and support vector machine

Nurul Chamidah, Ito Wasito

2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS) > 37 - 41

2015 International Conference on Advanced Computer Science and Information Systems (ICACSIS)

Cardiotocography (CTG) records the fetal heart rate (FHR) signal and intrauterine pressure (IUP) simultaneously. CTG is widely used for diagnosing and evaluating pregnancy and fetal condition up until delivery. The high dimensionality of CTG data is a problem for classification computation; by extracting features we can get the useful information from CTG data, and in this research, K-Means Algorithm...

chapter

A robust and effective algorithmic framework for incomplete educational data clustering

Vo Thi Ngoc Chau, Nguyen Hua Phung, Vo Thi Ngoc Tran

2015 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS) > 65 - 70

2015 2nd National Foundation for Science and Technology Development Conference on Information and Computer Science (NICS)

Data clustering is one of the popular tasks recently used in the educational data mining arena for grouping similar students by several aspects such as study performance, behavior, skill, etc. Many well-known clustering algorithms such as k-means, expectation-maximization, spectral clustering, etc. were employed in the related works. None of them has taken into consideration the incompleteness of...

chapter

Density K-means: A new algorithm for centers initialization for K-means

Xv Lan, Qian Li, Yi Zheng

2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS) > 958 - 961

2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS)

K-means is one of the most significant clustering algorithms in data mining. It performs well in many cases, especially on massive data sets. However, the result of clustering by K-means largely depends upon the initial centers, which makes it difficult for K-means to reach a global optimum. In this paper, we developed a novel algorithm based on finding density peaks to optimize the initial centers for...

chapter

Research and improve on K-means algorithm based on hadoop

Kehe Wu, Wenjing Zeng, Tingting Wu, Yanwen An

2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS) > 334 - 337

2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS)

With the advent of the big data era, traditional data mining algorithms have become inadequate for the task of massive data analysis, management and mining. The development of cloud computing brings new life to algorithm parallelization. In this paper, we study the K-means algorithm, one of the clustering algorithms. We then attempt to improve this algorithm via a method that samples the large-scale...

chapter

KBB: A hybrid method for intrusion detection

Shreya Dubey, Jigyasu Dubey

2015 International Conference on Computer, Communication and Control (IC4) > 1 - 6

2015 International Conference on Computer, Communication and Control (IC4)

In this paper, we propose a hybrid method for intrusion detection which is based on k-means, naive Bayes and a back propagation neural network (KBB). Initially we apply k-means, which is a partition-based, unsupervised cluster analysis method. We obtain the gathered data in the form of clusters, which can be easily processed and learned by any machine learning algorithm. These outcomes are provided to...

chapter

Analyzing Boundary Device Logs on the In-memory Platform

Feng Cheng, Andrey Sapegin, Marian Gawron, Christoph Meinel

2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems > 1367 - 1372

2015 IEEE 17th International Conference on High Performance Computing and Communications (HPCC), 2015 IEEE 7th International Symposium on Cyberspace Safety and Security (CSS) and 2015 IEEE 12th International Conf on Embedded Software and Systems (ICESS)

Boundary devices, such as routers, firewalls, proxies, and domain controllers, continuously generate logs showing the behaviors of internal and external users, the working state of the network, and the state of the devices themselves. Analyzing these logs rapidly and efficiently makes great sense in terms of security and reliability. However, it is a challenging task due to the fact...

chapter

Distributed clustering algorithm for spatial data mining

Malika Bendechache, M-Tahar Kechadi

2015 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM) > 60 - 65

2015 2nd IEEE International Conference on Spatial Data Mining and Geographical Knowledge Services (ICSDM)

Distributed data mining techniques, and mainly distributed clustering, have been widely used in the last decade because they deal with very large and heterogeneous datasets which cannot be gathered centrally. Current distributed clustering approaches normally generate global models by aggregating local results that are obtained on each site. While this approach mines the datasets on their locations...

chapter

ICA k-means based time series clustering analysis of online word-of-mouth

Na Pan, Hong Li, Chunyang Liu

2015 12th International Conference on Service Systems and Service Management (ICSSSM) > 1 - 6

2015 12th International Conference on Service Systems and Service Management (ICSSSM)

Online word-of-mouth activity is a very typical index of the lifecycle evolution model of a product, and understanding product lifecycle can help corresponding decision makers with their formulation of marketing strategies. In this paper, the data sets for the online comments on various types of products are studied; based on management theory and economics theory, and by applying such methods as...

chapter

k-Means Performance Improvements with Centroid Calculation Heuristics Both for Serial and Parallel Environments

Jeyhun Karimov, Murat Ozbayoglu, Erdogan Dogdu

2015 IEEE International Congress on Big Data > 444 - 451

2015 IEEE International Congress on Big Data (BigData Congress)

K-means is the most widely used clustering algorithm due to its fairly straightforward implementations in various problems. Meanwhile, when the number of clusters increases, the number of iterations also tends to increase slightly. However, there are still opportunities for improvement, as some studies in the literature indicate. In this study, improved implementations of k-means algorithm with a centroid...

chapter

Applying data mining techniques to predict annual yield of major crops and recommend planting different crops in different districts in Bangladesh

A. T. M Shakil Ahamed, Navid Tanzeem Mahmood, Nazmul Hossain, Mohammad Tanzir Kabir, more

2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD) > 1 - 6

2015 IEEE/ACIS 16th International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD)

Agricultural crop production depends on various factors such as biology, climate, economy and geography. Several factors have different impacts on agriculture, which can be quantified using appropriate statistical methodologies. Applying such methodologies and techniques on historical yield of crops, it is possible to obtain information or knowledge which can be helpful to farmers and government organizations...

chapter

An accurate clustering algorithm for fast protein-profiling using SCICA on MALDI-TOF

Amit Acharyya, Mavuduru Neehar, Ganesh R. Naik

2015 IEEE International Symposium on Circuits and Systems (ISCAS) > 69 - 72

2015 IEEE International Symposium on Circuits and Systems (ISCAS)

In this paper we propose an accurate clustering algorithm as the necessary step of the Single Channel Independent Component Analysis (SCICA) in the context of the fast extraction of protein profiles from the mass spectra (MALDI-TOF) data. In general K-means clustering is employed for clustering of the basis vectors. However given its iterative and statistical nature, convergence to the same clusters...

chapter

Metric Based Performance Analysis of Clustering Algorithms for High Dimensional Data

Smita Chormunge, Sudarson Jena

2015 Fifth International Conference on Communication Systems and Network Technologies > 1060 - 1064

2015 Fifth International Conference on Communication Systems and Network Technologies (CSNT)

Cluster analysis is a main task of exploratory data mining and plays an important role in many applications. Numerous clustering techniques in data mining work efficiently for low dimensional data but fail to handle high dimensional data. In this paper we evaluated the performance efficiency of K-means and Agglomerative hierarchical clustering methods based on Euclidean and Manhattan distance...

chapter

Students behavioural analysis in an online learning environment using data mining

I. P. Ratnapala, R. G. Ragel, S. Deegalla

7th International Conference on Information and Automation for Sustainability > 1 - 7

2014 7th International Conference on Information and Automation for Sustainability (ICIAfS)

The focus of this research was to use Educational Data Mining (EDM) techniques to conduct a quantitative analysis of students' interaction with an e-learning system through instructor-led non-graded and graded courses. This exercise is useful for establishing a guideline for a series of online short courses for them. The access behaviour of a group of 412 students in an e-learning system was analysed and...

INFONA - science communication portal


