The purpose of this study is to clarify the applicability of a data-driven approach in the accounting area. As a first stage, focusing on model comparison, this paper shows the effectiveness of model selection with data mining techniques for the development of an earnings prediction model based on financial statement data. In the accounting area, researchers have not considered the characteristic of financial...
For big-data-based wine quality identification, data mining classification algorithms are introduced to identify the quality level of a wine from its content of several impact compounds. Logistic regression, BP neural network, and SVM classification algorithms are introduced, and the three algorithms are applied to the modeling analysis of wine quality...
Self-paced learning (SPL), a recently proposed learning strategy that progressively adds training instances from simple to complex, can typically reduce the risk of reaching local optima. SPL selects instances based on their losses over the entire data set in each iteration. As a result, the selected instances are likely to be highly imbalanced, e.g., very few (even no) instances of some...
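The loss-based selection step described in this abstract can be illustrated with a minimal sketch. It assumes a hard loss threshold that grows over iterations, which is one common way to phrase self-paced selection and not necessarily the paper's formulation. The function names and the loss and label values are hypothetical.

```python
from collections import Counter

def select_easy(losses, threshold):
    """Return indices of instances whose loss is below the threshold."""
    return [i for i, loss in enumerate(losses) if loss < threshold]

def class_counts(selected, labels):
    """Count how many selected instances fall in each class."""
    return Counter(labels[i] for i in selected)

# Hypothetical per-instance losses and class labels (illustrative values only).
losses = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]
labels = ["a", "b", "a", "b", "a", "b"]

# Assumption: the threshold grows each iteration, admitting harder instances later.
for threshold in (0.5, 0.75, 1.0):
    selected = select_easy(losses, threshold)
    print(threshold, selected, dict(class_counts(selected, labels)))
```

With these made-up values, the first pass selects only instances of class "a". This is the kind of imbalance that arises when selection looks at losses over the whole data set without regard to class.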
Financial information extraction from large financial reports is a tedious task. This paper describes page-wise feature generation and the application of learning algorithms to identify financial information (balance sheets, cash flows, and income statements) in Form 10-K or annual reports of companies. Balance sheets, cash flows, and income statements have some structure in them and are semi-structured...
Mining advisor-advisee relationships can benefit many interesting applications such as advisor recommendation and protege performance analysis. Based on the hypothesis that advisor-advisee relationships among researchers are hidden in scholarly big data, we propose in this work a deep-learning-based advisor-advisee relationship identification method which considers the personal properties and network...
Cataclysmic variable (CV) stars are binary stars that consist of two components: a white dwarf primary and a mass-transferring secondary. Because cataclysmic variables are relatively faint and show a large number of irregular changes, it is not easy to obtain valuable data and important research results from observation. Such data are nevertheless significant for subsequent research on these spectra. In general,...
During the last two decades, the credit card system has been widely used as a mechanism to drive the global economy to grow dramatically. A credit card provider has issued millions of credit cards to its customers. However, issuing credit cards to the wrong customers can be a crucial factor in a financial crisis, e.g., the ones that happened in 1997 and 2008. This paper presents a systematic analysis and a...
Prediction of faults in proposed software is helpful in deciding the amount of effort to be given to software development. We observed that a good number of authors hypothesized that the performance of a fault prediction model depends on the source code metrics used as input to the model. Feature selection is the process of selecting a suitable set of source code metrics which may...
An anomaly-based Intrusion Detection System (IDS) identifies intrusions by training itself to recognize acceptable behavior of the network. It then raises an alarm whenever any anomalous network behavior outside the boundaries of its training sets is observed. However, anomaly-based IDSs are usually prone to a high false positive rate due to the difficulties involved in defining normal and abnormal network...
Context: Cross-project defect prediction (CPDP) research has been popular and many CPDP methods have been proposed. While these methods use cross-project data as is for their inputs, useless or noisy information in the cross-project data can degrade predictive and computational performance. Removing such information simplifies the cross-project data, and it will affect the performance...
Intensive Care Unit (ICU) patients have significant morbidity and mortality, often from complications that arise during the hospital stay. Severe sepsis is one of the leading causes of death among these patients. Predictive models have the potential to allow for earlier detection of severe sepsis and ultimately earlier intervention. However, current methods for identifying and predicting severe sepsis...
While active learning has drawn broad attention in recent years, there are relatively few studies on stopping criteria for active learning. Here we propose a novel model-stability-based stopping criterion, which considers the potential of each unlabeled example to change the model once added to the training set. The underlying motivation is that active learning should terminate when the model does...
Prognostic modeling is central to medicine, as it is often used to predict patients' outcomes and responses to treatments and to identify important medical risk factors. Logistic regression is one of the most widely used approaches for clinical prediction modeling. Traumatic brain injury (TBI) is an important public health issue and a leading cause of death and disability worldwide. In this study, we adapt...
Prognostic models for end-stage renal disease (ESRD) have been researched extensively given its increasing prevalence internationally. Studies have proposed different machine learning and statistical algorithms for these models, tailored to different medical datasets that contain many missing values, to obtain optimal outcomes. We approached this issue by applying stepwise logistic regression, ANN,...
Web social media has become one of the major channels for people to express their opinions, share their feelings and communicate with others. Public opinions often ebb and flow with time due to the occurrence of social events and the mutual influence of people on certain topics. The dynamic change of public opinions reflects the evolution and trend of public attitudes and can facilitate many security-related...
Falls are a common and serious problem faced by older populations. There is a growing interest in estimating the risk of falling for older people using body-worn sensors and simple movement tasks, allowing appropriate fall prevention programs to be administered in a timely manner to the high risk population. This study investigated the capability and validity of using a waist-mounted triaxial accelerometer...
Credit is becoming one of the most important sources of income in banking. Past studies indicate that credit risk scoring models perform better with Logistic Regression and Neural Networks. The purpose of this paper is to conduct a comparative study on the accuracy of classification models and reduce credit risk. In this paper, we use enterprise data mining software to construct four classification...
Many problems in machine learning involve variable-size structured data, such as sets, sequences, trees, and graphs. Generative (i.e. model based) kernels are well suited for handling structured data since they are able to capture their underlying structure by allowing the inclusion of prior information via specification of the source models. In this paper we focus on marginalisation kernels for variable...
The success or failure of the banking industry depends largely on the industry's ability to properly evaluate credit risk. Credit evaluation of any potential credit application remains a challenge for banks all over the world. This paper checks the applicability of a new integrated model to sample data taken from Indian banks. The integrated model is a combination model based...
A combined forecast model is proposed to evaluate residential loans, improving on the accuracy of a single evaluation model. First, a Relevance Vector Machine (RVM) model and a logistic regression model are each trained on the financial data. Then a weighted average rule is used to fuse the two models, with the weights obtained by a weight training procedure. Finally, the combined model is employed...
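The weighted-average fusion step in this abstract can be sketched as follows. The abstract does not say how the weights are trained, so this sketch assumes a simple grid search that minimizes the mean squared error (Brier score) on held-out data. The function names and sample values are hypothetical, and the two probability lists stand in for the outputs of the RVM and logistic regression models.

```python
def fuse(p_a, p_b, w):
    """Weighted average of two models' predicted probabilities."""
    return [w * a + (1.0 - w) * b for a, b in zip(p_a, p_b)]

def brier(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)

def fit_weight(p_a, p_b, labels, steps=100):
    """Assumed weight training: grid-search w in [0, 1] on held-out data."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates, key=lambda w: brier(fuse(p_a, p_b, w), labels))

# Hypothetical held-out predictions from two already-trained models.
p_model_a = [0.9, 0.2, 0.7, 0.1]
p_model_b = [0.6, 0.4, 0.8, 0.3]
y_true = [1, 0, 1, 0]

w = fit_weight(p_model_a, p_model_b, y_true)
fused = fuse(p_model_a, p_model_b, w)
print(w, brier(fused, y_true))
```

Because the grid includes w = 0 and w = 1, the fused model's held-out error under this criterion is never worse than that of either single model.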