We developed a credit card fraud detection solution for a major bank in Turkey. The study took about three years to complete, and the resulting system has been in use since February 2013. It had a great impact on the rule-based fraud detection process used by the bank. Indeed, while eighty percent of the rules have been eliminated and the number of alerts has been cut in half, a significant increase...
In the real world, the distribution of a data stream changes over time. This problem is usually called concept drift [1]. The state-of-the-art decision tree classification method CVFDT [2] handles concept drift well, but its efficiency suffers because it processes instances in a general way, without considering the type of concept drift. In this paper, an algorithm called...
In this paper, an improved decision tree algorithm based on a dispersion measure of attribute information is proposed. It combines information gain and the dispersion of attribute information into a single attribute-selection criterion, in order to overcome the ID3 decision tree algorithm's bias toward multi-valued attributes. The experimental results demonstrate...
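The multi-value bias mentioned in this abstract can be illustrated with plain information gain. The following is a minimal sketch of the standard ID3 criterion only; it does not implement the dispersion measure proposed in the paper, which the truncated abstract does not specify. The toy data and the names `record_id` and `weather` are assumptions made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on the attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the partitions produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: four records with a binary class label.
labels = ["yes", "yes", "no", "no"]
record_id = ["a", "b", "c", "d"]                 # unique value per record
weather = ["sunny", "sunny", "rainy", "sunny"]   # two distinct values

gain_id = information_gain(record_id, labels)
gain_weather = information_gain(weather, labels)
```

In this toy set the identifier-like attribute takes a distinct value in every record, so each partition is pure and it receives the maximum possible gain even though it cannot generalize. Plain information gain would therefore prefer it over `weather`, which is the kind of deficiency a corrected selection criterion has to address.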
Data mining techniques, used to analyze data and discover correlations already present in databases, have proven very reliable and useful, especially when large volumes of data are processed. These techniques have been applied to many areas, such as marketing, medicine, diagnosis, business, biology, astronomy and others. In particular, astronomy requires techniques that allow the recognition or...
This paper aims to classify the demographics that affect successful high school recruitment. It also aims to minimize the workload of the admission/recruitment personnel in planning school-to-school promotion without compromising the chance of recruiting more high school students. Historical data of College Entrance Test examinees is used for classification. In view of this, the researchers have developed...
Violations by listed companies in disclosing accounting information seriously mislead ordinary investors and cause them huge losses. Therefore, it is particularly necessary to analyze and identify such violations using scientific and effective methods, so that investment risks can be avoided in advance. In this paper, we first use the t-statistic to select eight useful and characteristic...
Interwell connectivity of an injection-production system is an important piece of information for reservoir performance analysis. It is highly significant for studying the distribution of remaining oil and adjusting the oilfield development plan. In order to change the status quo of inferring interwell connectivity in the No. 1 Oil Production Plant of Daqing Oilfield, an automatic identification method based...
Data mining is the process of inferring knowledge from huge volumes of data. It has three major components: clustering or classification, association rules, and sequence analysis. Put simply, classification/clustering analyzes a set of data and generates a set of grouping rules that can be used to classify future data. Data mining is the process of extracting information from a data set...
This paper gives an overview of the main clustering techniques and classification algorithms for evaluating risk and safety in the civil aviation industry. It aims to study the performance of different clustering algorithms, compared on the basis of the time taken to build the model and the clusters evaluated. The database contains a number of accident data records for all categories of aviation...
Data mining is an area of computer science with huge potential; it is the process of discovering or extracting information from large databases or datasets. There are many different areas within data mining, and one of them is classification, or supervised learning. Classification can in turn be implemented through a number of different approaches or algorithms. We have conducted the comparison...
The C4.5 algorithm can produce an overgrown decision tree that overfits the training data. In order to overcome these disadvantages, this paper proposes a post-pruning decision tree algorithm based on Bayesian theory, in which each branch of the decision tree generated by the C4.5 algorithm is validated using Bayes' theorem, and then those branches that do not meet the conditions...
Identifying the major contributing factors to traffic collisions and their severity will assist highway safety improvement initiatives through improved facility design and educational programs that address needs arising from changes in demographics. The traffic collision data used in this study were collected over the last 20 years on rural highways and urban streets in Saskatchewan, Canada. In...
This paper proposes applying data mining techniques to predict school failure and dropout. We use real data on 670 middle-school students from Zacatecas, México, and employ white-box classification methods, such as induction rules and decision trees. Experiments attempt to improve their accuracy in predicting which students might fail or drop out by, first, using all the available attributes; next,...
Time series shapelets are small, local time series subsequences that are in some sense maximally representative of a class. E. Keogh uses the distance to the shapelet to classify objects. Even though shapelet classification can be interpretable and more accurate than many state-of-the-art classifiers, shapelets have one main limitation, namely that the shapelet classification training process is offline,...
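The distance-based classification mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the commonly used definition in which the distance from a series to a shapelet is the smallest Euclidean distance between the shapelet and any equal-length subsequence of the series. The function names, the threshold rule, and the class labels are hypothetical and are not taken from the paper.

```python
import math


def shapelet_distance(series, shapelet):
    """Smallest Euclidean distance between `shapelet` and any window of `series`."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet over every window of the same length.
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + i] - shapelet[i]) ** 2 for i in range(m)))
        best = min(best, d)
    return best


def classify(series, shapelet, threshold):
    """Hypothetical one-shapelet rule: a series that comes close enough has the pattern."""
    if shapelet_distance(series, shapelet) <= threshold:
        return "has_pattern"
    return "no_pattern"


# Assumed toy inputs: a small bump used as the shapelet.
bump = [1.0, 2.0, 1.0]
with_bump = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
flat = [0.0, 0.0, 0.0, 0.0]
```

Because every window of every training series has to be scanned for each candidate shapelet, a brute-force search of this kind is expensive, which is consistent with training being treated as an offline step.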
Data mining is the process of finding hidden information in databases that store historical data, also known as data warehouses. Classification, a very well-known data mining technique, groups similar data objects by establishing a relationship between the objects under test and the predefined class labels obtained during the training phase. Of all the classification algorithms, decision tree...
Cancer prognosis prediction improves the quality of treatment and increases patient survivability. Conventional methods of cancer prediction deal with a single class, limiting the prognosis prediction to one response variable. The SEER Public Use cancer database has more prominent variables that support a better prediction approach. The objective of this paper is to find the prominent labels...
Rule discovery is an important classification method that has attracted a significant number of researchers in recent years. Rule discovery, or rule mining, uses a set of IF-THEN rules to assign instances to a class or category. Besides the classical approaches, many rule mining approaches use biologically inspired algorithms such as evolutionary algorithms and swarm intelligence approaches. In this paper,...
In a smart grid environment, islanding detection plays an important role in the reliable operation of distributed generation (DG) units. In this paper, an intelligent islanding detection algorithm for PV and DFIG units is proposed. A decision tree algorithm is used to classify islanding detection instances. This algorithm is rapid, simple, intelligible and easy to interpret. The error rate of this...
Today, location technologies are integrated into many devices, enabling location-based services. Movement data recorded with these devices can be uploaded to websites and shared with others. Movement data can be organized using keywords and semantic tags, e.g. walking and running. Our main goal is to automatically classify movement data as walking, cycling or driving. In contrast to other work, we use...
This paper compares the classification accuracy and performance of traditional and probability-based decision tree classifiers, and examines the algorithms described in "Decision Trees for Uncertain Data" that aim to improve the construction efficiency of probability-based decision trees. To this end, it tested several algorithms, namely AVG, UDT, UDT-BP, UDT-LP,...