Clustering is an important tool for analyzing gene expression data, and many clustering algorithms have been proposed for this purpose. In this article we cluster real-life gene expression data with K-means, a well-known clustering algorithm. We also propose a new method for determining the initial cluster centers for K-means. We have compared results of our method...
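The abstract does not describe the proposed initialization, so the sketch below uses a different, well-known deterministic heuristic, farthest-first (maximin) seeding, as a stand-in. It is not the authors' method. It only illustrates where an initialization step plugs into Lloyd's K-means. The function names `farthest_first_init` and `kmeans` are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Stand-in seeding (NOT the paper's method): start at the first point,
    then repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centers))
        centers.append(nxt)
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means: alternate the assignment and mean-update steps."""
    centers = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's old center
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers, labels


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = kmeans(points, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because K-means only converges to a local optimum, the choice of initial centers directly affects the final partition, which is why initialization methods like the one the article proposes are worth studying.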
With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data is implicit, previously unknown, and potentially useful information. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data...
Computer system security is of great importance today. In the last few years, the number of intrusions has grown so markedly that intrusion detection has become a dominant concern in current information security. Firewalls cannot provide complete protection: relying on a firewall system alone is not enough to protect a corporate network from all types of network...
Classification is a central problem in the fields of data mining and machine learning. Using a training set of labeled instances, the task is to build a model (classifier) that can be used to predict the class of new unlabelled instances. Data preparation is crucial to the data mining process, and its focus is to improve the fitness of the training data for the learning algorithms to produce more...
Currently there are many techniques based on information and communication technology aimed at assessing student performance. Data mining applied in the educational field (educational data mining) is one of the most popular techniques used to provide feedback on the teaching-learning process. In recent years there has been a large number of open source applications in...
Studying the dynamic behaviour of solar radiation is an important task for PV system efficiency. In this paper we therefore propose a time series data mining method to detect the underlying dynamics present in hourly solar radiation time series. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to cluster the solar radiation time series and to detect noisy data. Moreover,...
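To show how DBSCAN separates dense groups from noise, here is a minimal generic sketch on small feature vectors. It assumes Euclidean distance and the usual `eps` (neighbourhood radius) and `min_pts` (minimum neighbourhood size) parameters; how the paper turns hourly radiation series into points, and which parameter values it uses, are not stated in the abstract, so the toy data below is purely illustrative.

```python
def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


values = [(1.0,), (1.1,), (1.2,), (5.0,), (5.1,), (5.2,), (9.9,)]
print(dbscan(values, eps=0.5, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Unlike K-means, DBSCAN does not need the number of clusters in advance and explicitly labels isolated points as noise, which is the property the abstract relies on for detecting noisy data.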
Association rule mining is one of the most relevant techniques in data mining, aiming to extract correlations among sets of items or products in transactional databases. The huge number of extracted association rules is the main obstacle that a decision maker faces. Hence, many interestingness measures have been proposed to evaluate the association rules. However, the abundance of these measures...
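As a concrete reference point for "interestingness measures", the sketch below computes three classic ones (support, confidence, and lift) for a rule A → C over a list of transactions. Which measures the paper actually compares is not stated in the abstract; these three are chosen only as familiar examples, and the toy transactions are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = supp(A and C) / supp(A)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """Confidence divided by the consequent's support.

    lift > 1 suggests positive correlation, < 1 negative, = 1 independence."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)


transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
print(confidence(transactions, {"bread"}, {"milk"}))  # about 0.667
print(lift(transactions, {"bread"}, {"milk"}))        # about 0.889
```

The example shows why a single measure can mislead: the rule bread → milk has a confidence of two thirds, yet its lift is below 1, meaning buying bread slightly lowers the chance of buying milk in this data.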
Crime is one of the most prevalent and alarming aspects of our society, and its prevention is a vital task. Crime analysis is a systematic way of detecting and investigating patterns and trends in crime. In this work, we use various data mining clustering approaches to analyse the crime data of Tamilnadu. The crime data is extracted from the National Crime Records Bureau (NCRB) of India. It consists...
Medical data mining is a significant research field, as medical organizations produce large volumes of data on a daily basis. Handling this vast amount of data is challenging, so the data must be mined to extract useful patterns for disease prediction. This paper proposes a hybrid K-means and Support Vector Machine algorithm for disease prediction. The...
Traditional sample-weighting k-means clustering often assigns one identical weight to every sample of a cluster. This paper instead proposes a novel sample-weighting k-means clustering algorithm based on angle information (SWKMA). In SWKMA, firstly, the samples of each cluster are divided into two types according to angle information, and secondly, different weighting...
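For orientation, sample weighting usually enters k-means through the center-update step. The formula below is the generic weighted centroid, assuming non-negative per-sample weights $w_i$; the abstract does not give SWKMA's angle-based rule for choosing $w_i$, so the weights are left unspecified here.

```latex
\mathbf{c}_k \;=\; \frac{\sum_{\mathbf{x}_i \in C_k} w_i \, \mathbf{x}_i}{\sum_{\mathbf{x}_i \in C_k} w_i},
\qquad w_i \ge 0,\quad \sum_{\mathbf{x}_i \in C_k} w_i > 0 .
```

With the identical weighting the abstract attributes to the traditional scheme ($w_i = w$ for every sample of $C_k$), the weights cancel and $\mathbf{c}_k$ reduces to the ordinary mean $\frac{1}{|C_k|}\sum_{\mathbf{x}_i \in C_k} \mathbf{x}_i$.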
This paper studies the problem of imbalanced data classification and proposes bi-directional sampling based on clustering (BDSK) to address it. The algorithm combines SMOTE over-sampling with K-means-based under-sampling to solve both the within-class and the between-class imbalance problems. It not only avoids introducing too much noise but also...
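The over-sampling half of BDSK is SMOTE, which creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below shows only that generic SMOTE step under simple assumptions (Euclidean distance, plain Python tuples, at least two minority samples); it omits BDSK's K-means-based under-sampling, and the function name `smote` and its parameters are hypothetical.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of equal-length tuples (needs at least two samples).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (excluding x), by squared distance
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_samples = smote(minority, n_new=10, k=2)
print(len(new_samples))  # 10
```

Every synthetic point lies on a segment between two existing minority samples, so it stays inside the region the minority class already occupies; this is why SMOTE tends to add less noise than, for example, random jittering.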
Lung cancer is the number one cause of cancer deaths in both men and women worldwide. The two types of lung cancer, which grow and spread differently, are small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). Treatment of lung cancer can involve a combination of surgery, chemotherapy, and radiation therapy as well as newer experimental methods. The general prognosis of...
Rapid computerization and technological advancement have led to huge amounts of data in databases. Research has shown that the amount of data in the world doubles every 20 months. However, the available data contains a large number of noisy values and thus cannot be used directly. Extracting information from this vast pool of data has emerged as a major challenge.
Clustering is a way of grouping data objects or data points into disjoint clusters. The basic concept behind clustering is that data objects in the same cluster should be related to each other, while data objects belonging to different clusters should differ from each other. This research paper proposes a new algorithm which combines the features of K-means clustering algorithm and Hierarchical...
This paper presents an improved algorithm for the Hybrid Approach of Neural network and Level-2 Fuzzy set (HANN-L2F). Its structure consists of two parts. The first part is a neuro-fuzzy system that combines an MLP neural network with a level-2 fuzzy system. The second part uses k-nearest neighbors to classify the output of the neuro-fuzzy system. The HANN-L2F is an algorithm with...
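The second stage of HANN-L2F is a k-nearest-neighbor classifier. The sketch below shows plain majority-vote k-NN with Euclidean distance; the distance measure, the value of `k`, and what features the neuro-fuzzy stage actually passes on are assumptions, since the abstract does not specify them. `knn_predict` is a hypothetical name.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training samples.

    train: list of (feature_tuple, label) pairs.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, most_common keeps first-encountered order, i.e. the nearer label wins.
    return votes.most_common(1)[0][0]


train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # a
print(knn_predict(train, (5.5, 5.5)))  # b
```

k-NN has no training phase beyond storing labelled samples, which makes it a convenient final classifier to attach to the output of another model such as a neuro-fuzzy network.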
With the development of interactive digital cable business and the diversification of customer demand, effectively grouping TV programmes by user preference is vital for market segmentation and differentiation. The study summarizes the main principles and characteristics of clustering algorithms and uses the K-means algorithm to show TV programme preference grouping based on 52392 subscribers...
Cardiotocography (CTG) simultaneously records the fetal heart rate (FHR) signal and intrauterine pressure (IUP). CTG is widely used to diagnose and evaluate the condition of the pregnancy and the fetus up until delivery. The high dimensionality of CTG data is a problem for classification; feature extraction recovers the useful information from CTG data, and in this research, K-Means Algorithm...
With the advent of the big data era, traditional data mining algorithms have become inadequate for massive data analysis, management, and mining. The development of cloud computing has brought new life to algorithm parallelization. In this paper, we study the K-means algorithm, one of the best-known clustering algorithms, and then attempt to improve it by sampling the large-scale...
In this paper, we propose a hybrid intrusion detection method based on k-means, naive Bayes, and a back-propagation neural network (KBB). We first apply k-means, a partition-based, unsupervised cluster analysis method, which organizes the gathered data into clusters that can easily be processed and learned by any machine learning algorithm. These outcomes are provided to...
The rise in the amount of information on the internet in the last few years has increased the risk of information flooding, which in turn makes it harder for users to access relevant data. Moreover, with the growing number of websites and web pages, webmasters find it challenging to tailor content to users' needs. The information demand of the online users can be figured...