Web spam is a major problem for search engine users on the World Wide Web. Spam websites use deceptive techniques to achieve high rankings. Although many researchers have presented different approaches to web spam classification and detection, it remains an open issue in computer science. Analyzing and evaluating these websites can be an effective step toward discovering and categorizing their features...
Data mining algorithms are of great importance in health care, especially for uncovering patterns or models hidden in databases. Leukemia affects the condition of the blood and can be detected with a complete blood count (CBC). This study aims to predict the presence of leukemia by determining the relationships among blood properties...
Anomaly detection is the process of finding outlying records in a given data set. This paper studies a well-known anomaly detection technique on the “Short Message Service Centre” server, which is used in telecommunications to handle and store messages. The server was studied in detail, and a script was written to gather all the required data, which then went through a cleaning phase and...
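The abstract does not name the technique it studies. As a stand-in, the sketch below flags outlying records with a simple z-score rule; the threshold and the per-minute message counts are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return the indices of values whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical per-minute message counts from an SMS server log.
counts = [120, 118, 125, 122, 119, 121, 480, 117, 123, 120]
print(zscore_outliers(counts, threshold=2.5))  # prints [6]
```

With only n points, a single outlier's z-score (using the population standard deviation) can never exceed sqrt(n - 1), which is 3 here; that is why the example uses 2.5 rather than the common default of 3.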
The Titanic disaster occurred 100 years ago, on April 15, 1912, killing about 1,500 passengers and crew members. The fateful incident still compels researchers and analysts to understand what may have led to the survival of some passengers and the demise of others. Using machine learning methods and a dataset of 891 rows in the training set and 418 rows in the test set, the research...
E-learning courses have been in high demand in recent times, and with them grows the need to study and predict student performance. Given the growing popularity of educational technology, various data mining algorithms suitable for predicting student performance are reviewed. The best algorithm depends on the kind of prediction the faculty wants to make. As the amount...
There is an increasing need to mine and analyze health data from smart home medical systems and community medical organizations. To reduce the influence of irrelevant attributes, this study proposes an improved C4.5 decision tree method for medical diagnosis based on the RELIEFF attribute weighting technique. The method has two steps: the first is to delete the irrelevant...
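RELIEFF extends the original Relief algorithm. The sketch below implements only basic Relief for numeric features and two classes (one nearest hit and one nearest miss per instance, with every instance used once instead of random sampling), so it is a simplified illustration of attribute weighting, not the paper's exact step; the toy data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief weights: higher means the feature separates classes better."""
    n, d = len(X), len(X[0])
    # Per-feature ranges normalise differences to [0, 1]; constant features get span 1.
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(j, a, b) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        # Each class needs at least two instances so a nearest hit exists.
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest same-class instance
        m = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest other-class instance
        for j in range(d):
            # Reward features that differ across classes, penalise those that differ within.
            w[j] += (diff(j, X[i], X[m]) - diff(j, X[i], X[h])) / n
    return w


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [5.0, 4.0], [5.1, 2.0], [4.8, 5.0]]
y = [0, 0, 0, 1, 1, 1]
print(relief_weights(X, y))
```

A feature whose weight falls below some cut-off would be the natural candidate for deletion before building the tree; where that cut-off lies is left open here.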
Every day, huge amounts of information are transferred from one network to another, and this information may be exposed to attacks. Information and information systems should be protected from unauthorized users. Providing and maintaining the confidentiality and integrity of information is a very tedious job, so intrusion detection plays a very important role. Although various methods are used to protect...
Worldwide studies of the causes of death have observed that heart disease is the major cause of death. If recent trends are allowed to continue, 23.6 million people will die from heart disease in 2030. The healthcare industry collects large amounts of heart disease data, which unfortunately are not “mined” to discover hidden information for effective decision making. In...
In recent years, using machine learning methods to predict user interest has become a hot research direction in electronic commerce. At present, the naive Bayesian algorithm offers simple implementation and high classification efficiency. However, it depends too heavily on the distribution of samples in the sample space, and...
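A minimal categorical naive Bayes classifier with add-one (Laplace) smoothing shows why the method is simple and fast: training is just counting. The feature names and the user-interest data below are hypothetical, and this is the plain algorithm, not the improved variant the paper goes on to propose.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[c][j][v] = how often feature j takes value v within class c
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feature_counts[c][j][v] += 1
            values[j].add(v)
    return class_counts, feature_counts, values


def predict_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / total)  # log prior
        for j, v in enumerate(row):
            # Add-one smoothing keeps unseen values from zeroing the product.
            score += math.log((feature_counts[c][j][v] + 1) / (nc + len(values[j])))
        if score > best_score:
            best, best_score = c, score
    return best


# Hypothetical (device, visited category) -> interested?
rows = [("mobile", "sports"), ("mobile", "books"), ("desktop", "sports"),
        ("desktop", "books"), ("mobile", "sports"), ("desktop", "books")]
labels = ["yes", "no", "yes", "no", "yes", "no"]
model = train_nb(rows, labels)
print(predict_nb(model, ("mobile", "sports")))  # prints yes
```

The per-feature product is exactly where the dependence on the sample distribution shows: each likelihood is estimated from the class's own counts, assuming features are independent given the class.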
Based on an analysis of the basic concepts of data mining and the structure of decision trees, this paper uses the C4.5 decision tree algorithm to build a soil quality grade prediction model, taking the soil composition in Lishu as the training sample. The C4.5 algorithm also expresses the acquired knowledge in the form of quantitative rules. The experimental results show that the expression...
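C4.5 chooses the splitting attribute by gain ratio: information gain divided by split information, which penalises attributes with many distinct values. The sketch below computes it for a categorical attribute; the soil texture and grade values are hypothetical, not the Lishu data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5 gain ratio of a categorical attribute: information gain / split info."""
    n = len(labels)
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(v, []).append(c)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical soil attribute and quality grade.
texture = ["sandy", "sandy", "loam", "loam", "clay", "clay"]
grade = ["low", "low", "high", "high", "high", "low"]
print(round(gain_ratio(texture, grade), 2))  # prints 0.42
```

Here the gain is 2/3 bit and the split information is log2(3) bits, giving about 0.42; the tree would split on whichever attribute scores highest.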
Data mining is the process of extracting hidden predictive models from large databases, using a variety of methods and algorithms. Classification is a supervised method that builds a model for predicting the class of new instances. Many classification algorithms are available, such as decision trees, neural networks, support vector machines, k-nearest neighbour, and Bayesian classification. Decision tree...
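Of the classifiers listed, k-nearest neighbour is the easiest to show in a few lines: a new instance takes the majority label of its k closest training points. The sketch below assumes numeric features and Euclidean distance; the training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster training data.
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # prints A
```

Unlike a decision tree, k-NN builds no explicit model; all the work happens at prediction time, which is why it scales poorly to large databases.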
When traditional sample selection methods are used to compress large data sets, the computational complexity is very high and the process is very time-consuming. To avoid these shortcomings, we propose a new sample selection method based on non-stable cut points. Using the basic property of convex functions that their extreme values occur at the endpoints of intervals, the method measures...
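The abstract is cut off before it defines a non-stable cut point. Assuming it follows the boundary-point idea of Fayyad and Irani (on a sorted numeric attribute, only midpoints between adjacent values with different class labels can be optimal entropy cut points), the sketch below finds such candidate cut points; the data are hypothetical and tied values are not handled specially.

```python
def unstable_cut_points(values, labels):
    """Midpoints between adjacent sorted values whose class labels differ.

    Assumption: this mirrors boundary points in the Fayyad-Irani sense; the
    paper's own definition of a non-stable cut point may differ.
    """
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:  # class changes between two distinct values
            cuts.append((v1 + v2) / 2)
    return cuts


# Hypothetical attribute values and class labels.
print(unstable_cut_points([1, 2, 3, 4, 5, 6], list("aabbaa")))  # prints [2.5, 4.5]
```

Samples lying next to such cut points are the ones that shape class boundaries, which makes them natural candidates to keep when compressing a data set.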
Extracting relevant information from data is a challenging task. An analyst may often end up with an erroneous classifier because of huge, redundant, unreliable, and noisy data, or because of misinterpreted results and techniques inappropriate for the specific situation. In our study, we investigated two main approaches in data mining, which are Decision Tree (J48...
In cancer diagnosis, machine learning algorithms are computer programs that try to predict cancer type from past data. The eventual goal is a trained machine learning algorithm that, given the gene expression levels of a cancer patient, can accurately predict the type and severity of their cancer, aiding the doctor in treating it. The existing technology...
Nowadays, classification problems have become more challenging due to the variety of data set types. Some data are suited to machine learning techniques and some to statistical learning techniques. This work proposes a new hybrid ensemble of machine and statistical learning models using confidence-based boosting. The proposed method, which uses variants of base classifiers...
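The paper's confidence-based boosting scheme is not described in the abstract. For orientation only, the sketch below shows one round of standard discrete AdaBoost, where each base classifier gets a vote weight (alpha) from its weighted error and misclassified samples are up-weighted for the next round; the weights and outcomes are hypothetical.

```python
import math


def adaboost_round(weights, correct):
    """One discrete AdaBoost round.

    weights: current sample weights summing to 1.
    correct: booleans, True where the base classifier was right.
    Returns (alpha, new_weights).
    """
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
    alpha = 0.5 * math.log((1 - err) / err)  # base classifier's vote weight
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    z = sum(new)  # normaliser so the new weights sum to 1
    return alpha, [w / z for w in new]


alpha, new_weights = adaboost_round([0.25] * 4, [True, True, True, False])
print(round(alpha, 4), [round(w, 4) for w in new_weights])
```

After the update the misclassified samples always carry half of the total weight, which forces the next base classifier to concentrate on them.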
The challenge of choosing the best algorithm and its best hyperparameters for a given problem is known as the Combined Algorithm Selection and Hyperparameter Optimization (CASH) problem. Among the available classification algorithms are those based on human-comprehensible representations, such as decision trees and classification rule induction. These algorithms are usually chosen for the clarity of their results...
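CASH treats the algorithm choice as one more dimension of the search space. The sketch below searches that joint space exhaustively with a caller-supplied scoring function (in practice, for example, cross-validated accuracy); the algorithm names, hyperparameters, and toy scores are hypothetical.

```python
from itertools import product


def cash_grid_search(space, evaluate):
    """Exhaustively search a joint (algorithm, hyperparameters) space.

    space: {algorithm_name: {param_name: [candidate values, ...]}}
    evaluate: callable(algorithm_name, params) -> score, higher is better.
    Returns (best_algorithm, best_params, best_score).
    """
    best = (None, None, float("-inf"))
    for algo, grid in space.items():
        names = list(grid)
        # product() over an empty grid yields one empty combination,
        # so a parameter-free algorithm is still evaluated once.
        for combo in product(*(grid[n] for n in names)):
            params = dict(zip(names, combo))
            score = evaluate(algo, params)
            if score > best[2]:
                best = (algo, params, score)
    return best


# Hypothetical search space and a toy scoring function standing in for validation.
space = {
    "decision_tree": {"max_depth": [2, 4, 8], "min_leaf": [1, 5]},
    "rule_induction": {"min_coverage": [2, 10]},
}


def toy_score(algo, params):
    base = {"decision_tree": 0.70, "rule_induction": 0.60}[algo]
    return base + 0.01 * params.get("max_depth", 0)


print(cash_grid_search(space, toy_score))
```

Exhaustive search grows multiplicatively with every hyperparameter, which is why practical CASH tools replace it with smarter strategies such as Bayesian optimization.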
With the advance of computer science, the volume of data that must be processed in many practical situations has increased dramatically, challenging many traditional machine learning techniques. With this in mind, this paper presents an intensive study of optimizing the decision tree algorithm and porting it to big data analysis. An optimized genetic algorithm is merged...
Extracting meaningful patterns from data can be challenging. Irrelevant, redundant, noisy, and unreliable data, misinterpreted results, and a technique's unsuitability for extracting unknown patterns may lead an analyst to develop an erroneous classifier. This research is motivated by the ‘No Free Lunch’ theorem, which can be simplified as: there is no classification technique that works best for every...
The rapid development of information technologies, in particular progress in methods of collecting, storing, and processing data, has allowed many organizations to collect huge data arrays for analysis. Human experts cannot keep up because the amount of data is too large. This creates demand for automatic data analysis methods, the number of which increases annually...
Decision tree technologies have been found to be a helpful way of uncovering human decision making within a host. A decision tree performs variable screening or feature selection and requires relatively little effort from users to prepare the data. In the proposed algorithm, we first minimize unnecessary redundancy in the decision tree, reducing the volume...