Credit is becoming one of the most important sources of income in banking. Past studies indicate that credit risk scoring models based on Logistic Regression and Neural Networks have performed better. The purpose of this paper is to conduct a comparative study of the accuracy of classification models and to reduce credit risk. In this paper, we use data mining of enterprise software to construct four classification...
Data mining is the process of inferring knowledge from huge volumes of data. Data mining has three major components: Clustering or Classification, Association Rules, and Sequence Analysis. By simple definition, classification/clustering analyzes a set of data and generates a set of grouping rules that can be used to classify future data. Data mining is the process of extracting information from a data set...
This paper presents an overview of the main clustering techniques and classification algorithms for evaluating risk and safety in the civil aviation industry. It aims to study the performance of different clustering algorithms, compared on the basis of the time taken to build the model and the evaluated clusters. The database contains a number of accident data records for all categories of aviation...
The work described in this paper applies machine learning techniques to a database of accommodative esotropic patients. Accommodative esotropia is an eye disease that, when left untreated, leads to blindness. Patients whose muscles deteriorate most often need corrective surgery in order to prevent this, since less invasive methods of treatment tend to fail in these patients. It is often difficult for...
Today location technologies are integrated into many devices enabling location-based services. Movement data recorded with these devices can be uploaded to web sites and shared with others. Movement data can be organized using keywords and semantic tags, e.g. walking and running. Our main goal is to automatically classify movement data as walking, cycling or driving. In contrast to other work we use...
Support Vector Machines are state-of-the-art tools in data mining. However, their strength is also their main weakness, as the generated nonlinear models are typically regarded as incomprehensible black-box models. Therefore, opening the black box, or making SVMs explainable, has become more important and necessary in areas such as medical diagnosis and credit evaluation. Rule extraction from SVMs,...
The CHRONIOUS system is an integrated platform aimed at the management of chronic disease patients. One of the most important components of the system is a Decision Support System (DSS) that has been developed in a Smart Device (SD). This component decides on the patient's current health status by combining several data, which are either acquired by wearable sensors or entered manually by the patient or...
Support Vector Machines have been promising tools for data mining in recent years because of their good performance. However, a main weakness of SVMs is a lack of comprehensibility: people cannot understand what the “optimal hyperplane” means and lack confidence in the predictions, especially when they are not domain experts. In this paper we introduce a new method to extract knowledge with...
Based on Landsat-5 TM images from May 4 and January 12, 1997, this paper extracts land cover information for the cities of Wuxi and Changzhou, which belong to the heavily polluted area of Taihu Lake, China. First, the paper collects and analyzes the spectral characteristics of the main land cover types, and then determines the main indexes used in the decision tree in combination with the characteristics of...
Datasets used in financial distress prediction are unbalanced. Traditional machine learning methods such as neural networks and support vector machines are premised on the hypothesis that the class distribution is basically balanced. Classification of an unbalanced dataset leans toward the majority samples, which results in lower identification of the minority, while conventional down-sampling...
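The abstract above is truncated, so the authors' own method is not shown here. For context only, conventional random down-sampling of the majority class, which the abstract mentions, might look like the following minimal Python sketch. The function name `downsample`, the toy data, and the fixed seed are illustrative assumptions and are not taken from the paper.

```python
import random
from collections import defaultdict


def downsample(samples, labels, seed=0):
    """Randomly down-sample every class to the size of the smallest class.

    Hypothetical helper for illustration; not the paper's method.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # The smallest class determines how many samples each class keeps.
    target = min(len(xs) for xs in by_class.values())

    balanced = []
    for y, xs in by_class.items():
        # Draw `target` samples without replacement from each class.
        for x in rng.sample(xs, target):
            balanced.append((x, y))

    rng.shuffle(balanced)
    return balanced


# Assumed toy data: 8 majority samples (label 0) and 2 minority samples (label 1).
samples = list(range(10))
labels = [0] * 8 + [1] * 2

balanced = downsample(samples, labels)
counts = {y: sum(1 for _, lab in balanced if lab == y) for y in (0, 1)}
print(counts)  # {0: 2, 1: 2}
```

The sketch discards six of the eight majority samples to reach balance, which shows the information loss that is the usual drawback of plain down-sampling.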
Land cover change assessment is one of the main applications of remotely sensed data. Changes in forest cover have widespread effects on the provision of ecosystem services, and provide important feedbacks to climate change and biodiversity. Moreover, improving the accuracy of image interpretation is critical for better understanding forest change. Parametric methods...
Rising computer violence, such as Distributed Denial of Service (DDoS), web vandalism, and cyber bullying, becomes a more serious issue when it is politically motivated and intentionally conducted to generate fear in society. These kinds of activity are categorized as cyber terrorism. As the number of such cases increases, the availability of information regarding these actions is required...
The increase in malware exploiting the Internet daily has become a serious threat. Manual heuristic inspection in malware analysis is no longer considered effective and efficient given the high spreading rate of malware. Hence, automated behavior-based malware detection using machine learning techniques is considered a profound solution. The behavior of each malware on an emulated...
The following topics are dealt with: data mining; local clustering; spatiotemporal event detection; time series; Markov models; email classification; data stream; parallel mining; Bayesian network; unsupervised learning; missing values prediction; anomaly detection; decision tree; binary classifier; data similarity matrix; data mapping; support vector machine; MapReduce; document similarity; social...
Network Intrusion Detection aims at distinguishing the behavior of the network. It is an inseparable part of the information security system. Due to the rapid development of attack patterns, it is necessary to develop a system that can upgrade itself as new threats are detected. The detection rate should also be high, because the rate at which attacks are carried out on the network is very high. In response...
The metadata embedded in program executables provides information that can be useful for automated malware detection or classification. With potentially tens of thousands of variants per malware family, it is unclear how much consistency there is in the metadata, and whether different families exhibit different consistencies. Header information from multiple variants of recent malware was studied...
Individual credit risk evaluation is an important and challenging data mining problem in the financial analysis domain. This paper compares the effectiveness of four data mining algorithms, logistic regression (LR), decision tree (C4.5), support vector machine (SVM) and neural networks (NN), by applying them to two credit data sets. Experimental results show that the LR and SVM algorithms produced the best...
Aiming at knowledge mining from fuzzy and uncertain information, this paper discusses the definition and properties of the fuzzy formal context. A method of constructing the fuzzy concept lattice of the fuzzy formal context is proposed, at the core of which is the definition of the fuzzy product concept: the intents of two concepts are combined to form the intent of the product concept; using...
Recently, DoS (Denial of Service) detection has become more and more important in web security. In this paper, we argue that DoS attacks can be treated as continuous data streams, and can thus be detected using stream data mining methods. More specifically, we propose a new Weighted Ensemble learning model to detect DoS attacks. The Weighted Ensemble model first trains base classifiers using different...
To analyze water inrush data that are few in number and low in accuracy, a linear-kernel H-SVMs model was presented. First, a model was derived to evaluate the generalization power of H-SVMs; then a novel method to build H-SVMs was put forward. The separation distances of SVMs are regarded as the indices for classifying and clustering. Through the top-down and bottom-up routes, the...