Data mining is the process of extracting patterns from large data sets by combining methods from statistics and artificial intelligence with database management. One of the classes of data mining is classification, where the goal is to generalize a known structure to apply to new data. Supervised learning, a branch of classification algorithms, uses a set of training data to produce an inferred...
The selection of SVM model parameters affects how well suspicious financial transactions are identified. To solve this problem, this paper proposes a cross-validation method to find the optimal SVM classifier parameters. The cross-validation method finds the optimal parameters, based on the highest classification accuracy rate, through grid search; it can effectively avoid the state of over-learning...
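The idea of picking parameters by cross-validated grid search can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract gives no code, so the function names (`k_fold_indices`, `cross_val_accuracy`, `grid_search`, `fit_knn`) are hypothetical, and a small k-nearest-neighbour classifier stands in for the SVM to keep the sketch dependency-free. Any `fit` function that returns a `predict` callable could be substituted.

```python
import random
from itertools import product


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit, X, y, params, k=5):
    """Fraction of samples classified correctly while held out of training."""
    correct = 0
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train], **params)
        correct += sum(predict(X[i]) == y[i] for i in held_out)
    return correct / len(X)


def grid_search(fit, X, y, grid, k=5):
    """Try every parameter combination; keep the best cross-validated accuracy."""
    names = sorted(grid)
    best_params, best_acc = None, -1.0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        acc = cross_val_accuracy(fit, X, y, params, k)
        if acc > best_acc:  # strict '>' keeps the first of any tied candidates
            best_params, best_acc = params, acc
    return best_params, best_acc


def fit_knn(X_train, y_train, k_neighbors=1):
    """Stand-in classifier (assumption: NOT an SVM), used only for illustration."""
    def predict(x):
        nearest = sorted(
            range(len(X_train)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)),
        )[:k_neighbors]
        votes = [y_train[i] for i in nearest]
        return max(set(votes), key=votes.count)
    return predict


if __name__ == "__main__":
    # Toy data (made up): two well-separated one-dimensional clusters.
    X = [[i * 0.1] for i in range(10)] + [[5.0 + i * 0.1] for i in range(10)]
    y = [0] * 10 + [1] * 10
    print(grid_search(fit_knn, X, y, {"k_neighbors": [1, 3, 5]}))
```

Because every candidate is scored only on samples it was not trained on, a parameter setting that merely memorizes the training data gains no advantage, which is the sense in which this procedure guards against over-learning.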
In the medical field, a large amount of unstructured information expressed in natural language exists in medical literature, technical documentation and medical records. IE (Information Extraction), one of the most important research directions in natural language processing, aims to help humans extract information of interest automatically. NER (Named Entity Recognition) is one of the subsystems of IE...
Patent documents are a kind of open scientific literature protected by law, and their abstracts often concisely summarize the main information. Information extraction and analysis of the abstracts can contribute to better protection of intellectual property rights and promotion of enterprise technological innovation. This paper focuses on patent abstracts and views information extraction of patent...
Customer profiles are by definition made up of factual and transactional data. It is often the case that due to reasons such as high cost of data acquisition and/or protection, only the transactional data are available for data mining operations. Transactional data, however, tend to be highly sparse and skewed due to a large proportion of customers engaging in very few transactions. This can result...
The CHRONIOUS system is an integrated platform aimed at the management of chronic disease patients. One of the most important components of the system is a Decision Support System (DSS) that has been developed in a Smart Device (SD). This component decides on the patient's current health status by combining several data, which are acquired either by wearable sensors or manually inputted by the patient or...
Data mining is a very popular technique which is successfully used in many areas. The aim of this paper is to present a hybrid model for data classification from input datasets. The proposed model extracts knowledge using fuzzy rule-based systems and performs the classification task with fuzzy if-then rules. The proposed method performs the classification task and extracts the required knowledge using fuzzy...
Knowledge discovery from the Web is a cyclic process. In this paper we focus on the important part of transforming unstructured information from Web pages into structured relations. Relation extraction systems capture information from natural language text on Web pages, called Web text. However, extraction is quite costly and time consuming. Worse, many Web pages may not contain a textual representation...
Although data mining techniques have made tremendous progress, being "knowledge-poor" remains a major gap in current data mining systems. Few studies note that useful knowledge is not only the final result of an intelligent classification, clustering or prediction algorithm, but also runs through the whole process of data mining, in which much potentially useful information is...
Land cover change assessment is one of the main applications of remotely sensed data. Changes in forest cover have widespread effects on the provision of ecosystem services, and provide important feedbacks to climate change and biodiversity. Moreover, improving the accuracy of image interpretation is critical for better understanding forest change. Parametric methods...
This article proposes a question classification approach that integrates multiple semantic features. It targets two problems in Chinese question classification models: inaccurate semantic information extraction and slow processing caused by an excessively high eigenvector dimension. With the help of HowNet and the support vector machine and syntactic and semantic information of question...
Recently, the introduction of CT (Culture Technology) into intelligent systems has been forming a new paradigm. Within this area, research on musical fountains has begun but is still in the basic research phase. A musical fountain system requires a scenario to control the music and the fountain nozzles. In general, these musical fountain scenarios are created by experts. Musical fountains therefore need too much...
Opinion mining is a task of growing interest in both research and practical applications. It deals with the computational treatment of opinion, sentiment, and subjectivity in documents. This paper focuses on retrieving opinion documents and giving their sentiment orientation. Mining and ranking of topic-relevant opinion documents are implemented with a sentiment model, combining the existing knowledge...
A number of matrix-based data distortion methods are presented and experimentally studied in this paper. The performances of seven methods are compared in terms of utility, privacy and computational cost. We find that left multiplication based random projection methods are useless in data privacy protection. Even though there is no application-free solution in data privacy protection, the nonnegative...
A network intrusion detection system needs to handle huge volumes of data selected from network environments, which usually contain many irrelevant or redundant features. This makes intrusion detection resource-intensive and results in poor real-time processing performance and a poor intrusion detection rate. Without loss of generality, feature selection can effectively improve the classification...
The field of text mining has evolved over the past years to analyze textual resources. However, it can be used in several other applications. In this research, we are particularly interested in applying text mining techniques to audio materials, after converting them into text, in order to detect the speakers' emotions. We describe our overall methodology and present our experimental results. In...
Little literature introduces approaches to feature selection, which plays an important role in customer churn prediction. In addition, because of the imbalanced data classification problem, most traditional approaches select the important features for churn prediction ineffectively. This paper proposes a new filter feature selection approach for customer churn...
Rising computer violence, such as Distributed Denial of Service (DDoS) attacks, web vandalism, and cyber bullying, becomes a more serious issue when it is politically motivated and intentionally conducted to generate fear in society. These kinds of activity are categorized as cyber terrorism. As the number of such cases increases, the availability of information regarding these actions is required...
The increase in malware exploiting the Internet daily has become a serious threat. Manual heuristic inspection in malware analysis is no longer considered effective or efficient given the high spreading rate of malware. Hence, automated behavior-based malware detection using machine learning techniques is considered a promising solution. The behavior of each malware on an emulated...
Development of a feature ranking method that is based upon the discriminative power of features and unbiased towards classifiers is of interest. We have studied a consensus feature ranking method, based on multiple classifiers, and have shown its superiority to well-known statistical ranking methods. In a target environment such as a medical dataset, missing values and an unbalanced distribution of data must...