In this paper we present an analysis of the prominent educational characteristics that differentiate people from two regions of the world: advanced economies versus East Asia and the Pacific countries. The automatic multivariate analysis of classification trends is demonstrated with the visual data mining tool KNIME. We found from the empirical studies that from the years 1950...
Online peer-to-peer (P2P) lending has developed explosively in recent years, which could benefit both sides of individual lending. In this study, a data mining (DM) approach to predicting the performance of a P2P loan before it is funded is proposed. Using data from the Lending Club, we explore the characteristics of a loan and its applicant and use random forest for feature selection in the modeling...
In BioWorld, a medical intelligent tutoring system, novice physicians are tasked with solving virtual patient cases. Whilst the importance of modeling and predicting clinical reasoning is recognized, an important aspect of the learner contribution remains unexplored — the written case summary prepared by the learner. The premise of investigating the case summaries is that it captures the thought and...
Regarding information technologies, transnational education has to face several challenges in order to offer a suitable education for computer science students worldwide. Software tools, especially open source ones, give students the possibility of experimenting with the best-known techniques in the area. Among them, the KEEL software tool can be highlighted as a versatile framework for understanding...
Attracting more students into science and engineering disciplines has concerned many researchers for decades. The literature has used traditional statistical methods and qualitative techniques to identify the factors that most affect student retention and to predict student persistence. In this paper we developed two neural network models using a feed-forward backpropagation network to predict retention for students...
In this paper we propose a data mining approach to predicting wine quality levels in order to help wine enterprises improve their products. A large dataset is considered and three regression techniques are applied. The comparison shows that the model built with a neural network is more accurate than the alternatives and that it can help improve the quality of wine production.
Recently, the following discrimination-aware classification problem was introduced: given a labeled dataset and an attribute B, find a classifier with high predictive accuracy that at the same time does not discriminate on the basis of the given attribute B. This problem is motivated by the fact that available historical data is often biased due to discrimination, e.g., when B denotes ethnicity. Using...
To improve the intelligibility and efficiency of knowledge expression in land evaluation, this paper proposes a land evaluation method that combines simplified fuzzy classification association rules with fuzzy decision. To reduce the complexity of the land evaluation models and further improve the efficiency and intelligibility of fuzzy classification association rules, an algorithm to eliminate...
This paper applies a DEA model to a sample of 58 listed power-sector companies in China's securities market in 2008, with a view to distinguishing companies with financial risk from those without, instead of relying on the ST designation as in the past. Then, after comparing a logit regression model and an LVQ neural network in predicting company financial risk, the conclusion was drawn that neural network LVQ...
Aiming at the low accuracy of intrusion detection systems, this paper analyzes the Bayesian classification algorithm and gives some improvements, using the kddcup99 experimental data, in order to find reasonable data pre-processing methods and a more effective classification algorithm to improve the accuracy of intrusion detection systems.
Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. A novel algorithm based on clustering to extract rules from neural networks is proposed. After neural networks have been trained and pruned successfully, inner-rules are generated by...
The literature on protein function prediction is currently dominated by works aimed at maximizing predictive accuracy, ignoring the important issues of validation and interpretation of discovered knowledge, which can lead to new insights and hypotheses that are biologically meaningful and advance the understanding of protein functions by biologists. The overall goal of this paper is to critically...
In recent years, cellular automata (CA) have been widely used to simulate urban systems and related fields. The primary issue is to explore a set of dynamic transition rules that incorporate a neighborhood effect. However, most transition rules are defined statically by one or several equations. In this study, autoregression (AR) is introduced to establish a CA-AR model integrating CA for data...
Representation and similarity measurement of time series are the research basis of time-series data mining. This paper uses ESAX (extended symbolic aggregate approximation) to represent time series and proposes an improved similarity measure for time series, ESSVS (ESAX statistical vector space), based on the statistical symbolic vector space method. ESSVS measures the time series similarity...
In predicting financial distress, irrelevant or correlated features in the samples can spoil the performance of the SVR classifier, leading to a decrease in prediction accuracy. To solve these problems, this paper uses rough sets as a preprocessor of SVR to select a subset of input variables and employs the particle swarm optimization algorithm (PSOA) to optimize...
Ranking model construction is an important topic in Web mining. Recently, many approaches based on the idea of "learning to rank" have been proposed for this task and most of them attempt to score all documents of different queries by resorting to a single function. In this paper, we propose a novel framework of query-dependent ranking. A simple similarity measure is used to calculate similarities...
Asset assignment and scheduling algorithms were developed and implemented to support a team-in-the-loop planning experiment conducted at the Naval Postgraduate School (NPS) in March 2009. The experiment examined planning and information flows among three cells in an abstracted and simplified Maritime Operations Center (MOC). This paper describes two optimization-based modules that focused on the Future...
This paper aims to develop a decision rule-based model of household car ownership using household travel survey data. The rough set approach, one of several data mining techniques, provides an effective tool for modeling the data set with decision rule sets. In this paper, the output “if-then” rules show the significant relationships between car ownership type and household characteristics. The...
This paper studies the dynamic decision-making problem in an intuitionistic fuzzy environment. For a decision matrix composed of intuitionistic fuzzy numbers, an intuitionistic fuzzy priority rating model is established to determine the membership relation of each alternative to the excellent alternative set, which is also described by an intuitionistic fuzzy number. Furthermore, based on the score...
Document-level information retrieval can unfortunately lead to highly inaccurate relevance ranking in answering object-oriented queries. A paradigm is proposed to enable searching at the object level. However, this reliability assumption is no longer valid in the object retrieval context when multiple copies of information about the same object typically exist. To resolve multiple copies inconsistent...