Data mining and machine learning are among the most active research areas and are increasingly popular in health organizations. Hidden patterns in patient data can be extracted by applying data mining. The techniques and tools of data mining are very helpful because they provide health care professionals with significant knowledge to support a decision. Researchers have shown several utilities...
This paper studies the imbalanced data classification problem and proposes bi-directional sampling based on clustering (BDSK) for imbalanced data classification. The algorithm combines the SMOTE over-sampling algorithm with an under-sampling algorithm based on K-Means to solve both the within-class and the between-class imbalance problems. It not only avoids introducing too much noise but also...
Data mining is a method for extracting useful information from data, but classical data mining approaches cannot be applied directly to big data because of their sheer complexity. The data generated by numerous scientific applications and incorporated environments has recently grown rapidly, not only in size but also in variety. The data collected is of very...
Many sports are followed by large crowds, and soccer is the most popular among them. During the game, the referee is responsible for protecting players' health and ensuring proper implementation of the rules. To achieve these tasks, the referee needs tremendous physical and mental fitness, has to be able to interpret events according to the spirit of the rules and needs to...
In this model, we propose an innovative recruitment system using social networking websites like Twitter and LinkedIn, along with the code repository hosting website GitHub and competitive coding platforms like SPOJ. It aims to develop advanced search engines that automatically sort job-seekers based on job offer requirements using various data mining and machine learning techniques. Vritthi allows...
The paper proposes a new framework to predict chronic lupus disease. A new algorithm is proposed that is suitable for supervised, semi-supervised and unsupervised data. The algorithm is named CAC (Clustering Association and Classification). The best algorithms are selected based on accuracy. The 8 major attributes for diagnosing lupus have been identified and considered for prediction...
Lung cancer is the number one cause of cancer deaths in both men and women worldwide. The two types of lung cancer, which grow and spread differently, are small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC). Treatment of lung cancer can involve a combination of surgery, chemotherapy, and radiation therapy as well as newer experimental methods. The general prognosis of...
Social media analytics plays a major role in e-commerce by extracting useful information about a product or service. Opinion mining has become the key process of social media analytics. Twitter is a large online social platform where millions of people share their opinions. In this paper two clustering techniques, k-means and DBSCAN, are applied to an annotated Twitter dataset in order to...
Given the volume of information available today, recent progress in data mining research has led to the development of various efficient methods for mining interesting patterns in large databases. Data mining plays a vital role in the knowledge discovery process by analyzing huge amounts of data from various sources and summarizing it into useful information. It is helpful for analyzing the volumes of data in different domains...
This paper is an attempt to apply EDM methods to Moodle data in order to detect specific behaviours within student groups that tend to fail the course. The research is conducted on Moodle logs gathered in the blended course Programming 1. Extracting and using crucial information in time can be a turning point for students at the at-risk stage, which is what we tried to achieve in this research.
Rapid computerization and advances in technology have led to a huge amount of data in databases. Research has shown that the amount of data in the world doubles every 20 months. However, this available data contains a large number of noisy values and thus cannot be used directly. The extraction of information from this vast pool of data has emerged as a major challenge.
Outlier detection is an important issue in the realm of data mining. Several applications rely on outlier detection, such as intrusion detection, fraud detection, medical and public health data, image processing, etc. Clustering-based outlier detection algorithms are considered the most important outlier detection approaches. They provide a high detection rate; however, they suffer from high false...
Data mining techniques now play an important role in the analysis of mass network information and big data. Cluster analysis, one of the main methods in data mining, draws great interest from researchers in various fields, who have proposed many algorithms such as the k-means algorithm and its variants and the density-based algorithm and its variants. However, these algorithms all have their own problems...
Automatic methods for the early detection of plant diseases could be vital for precise fruit protection. Traditionally, the agriculture expert's knowledge is descriptive and experiment-based; it is therefore difficult to describe it mathematically and subsequently build a decision system that can replace it. Key parameters of a decision-based fruit protection system could differ for classes of plants and...
This paper proposes an improved method that applies the principal component analysis (PCA) algorithm to an existing fingerprinting localization method based on iterative K-means, grid scoring (KS) and AP scoring (AS). In the off-line phase, the suggested method first evaluates the localization capability of every access point (AP), and then generates only a few new principal components...
Current microarchitectures are equipped with SIMD instruction sets enabling massive data parallelism within each core. Instruction sets like AVX or SSE operate on large reserved registers and support a wide range of parallel arithmetic or logical operations enabling up to 16 double precision floating point operations per clock cycle. Current data mining applications are usually far from fully exploiting...
The paper analyses the problem of soil clustering and the spatial representation of the obtained results, based on in-situ measurements of physical and chemical characteristics of soil. K-means and fuzzy K-means algorithms are adapted for soil data clustering. A database of soil samples collected in Montenegro is used for a comparative analysis of the algorithms. Classified soil data are presented...
This paper discusses the relation between dorm arrangement and student performance. An unsupervised learning algorithm, the k-means algorithm, is the main tool used in the analysis. Students are grouped into several clusters according to the similarity of their performance scores. The paper analyzes the clustering result by comparing it with the actual dorm arrangement. In the end, drawbacks...
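The clustering step described in this abstract can be illustrated with a minimal sketch of plain k-means (Lloyd iterations). The abstract does not give the paper's data, features or settings, so the `kmeans` helper, the two-subject score tuples and the choice of two starting centers below are all hypothetical.

```python
import math

def kmeans(points, centers, iters=20):
    """Plain Lloyd iterations: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    centers = list(centers)
    labels = []
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: recompute each center as the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers

# Hypothetical (score_subject_1, score_subject_2) pairs for five students.
scores = [(55.0, 60.0), (58.0, 62.0), (90.0, 88.0), (92.0, 91.0), (57.0, 59.0)]
labels, centers = kmeans(scores, centers=[scores[0], scores[2]])
print(labels)
```

The resulting labels could then be compared with any external grouping, such as a room assignment, which is the kind of comparison the abstract describes.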
Partition Around Medoids (PAM) is a variation of the well-known k-Means clustering algorithm in which the center of each cluster must be chosen from among the objects being clustered. PAM is used in a wide spectrum of applications, e.g. text analysis, bioinformatics, intelligent transportation systems, etc. There are approaches to speed up the k-Means and PAM algorithms by means of graphic accelerators but...
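The difference between a k-Means center and a PAM center can be sketched as follows. This is only an illustration of the medoid idea for a single cluster, not the full PAM algorithm and not the accelerated method the abstract refers to; the `centroid` and `medoid` helpers and the sample points are hypothetical.

```python
import math

def centroid(points):
    """k-Means style center: the coordinate-wise mean, usually not an input object."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def medoid(points):
    """PAM style center: the input object with the smallest total distance to the others."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Hypothetical cluster with one far-away object.
cluster = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0), (13.0, 0.0)]
print(centroid(cluster))  # mean of the coordinates
print(medoid(cluster))    # always one of the objects in `cluster`
```

Because the medoid must be an actual object, it only requires pairwise distances, which is one reason the approach suits data where a mean is not meaningful.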
The main contribution of this work is showing how to obtain a classification of visitors to an amusement park by using cluster analysis and visualization techniques. The selection of variables for K-means algorithm and the results obtained are visually analyzed in dispersion graphs according to their Principal Components, in boxplots and in a Linear Model so as to fine-tune a result that can explain...