With an ever-increasing amount of information made available via the Internet, it is getting more and more difficult to find the relevant pieces of information. Recommender systems have thus become an essential part of information technology. Although a lot of research has been devoted to this area, the factors influencing the quality of recommendations are not completely understood. This paper examines...
Data mining technology is used to extract useful knowledge hidden in large collections of data. It has also drawn criticism, notably over potential privacy invasion and potential discrimination. The latter consists of treating individuals unfairly on the basis of their belonging to a specific group. Data mining...
The proposed methodology compares classification techniques for predicting the cognitive skill of students, which can be evaluated by conducting an online test. The paper focuses on the comparative performance of the C4.5 algorithm and the Naïve Bayes classifier to determine which is better suited, in terms of accuracy, to predicting the level of skill, with experiments run in RapidMiner.
Classification of data points in a data stream poses a fundamentally different set of challenges than data mining on static data. While streaming data is often placed into the context of "Big Data" (or more specifically "Fast Data") wherein one-pass algorithms are used, true data streams offer additional hurdles due to their dynamic, evolving, and non-stationary nature. During the...
Speech recognition systems are based on either a parametric or a non-parametric approach. Parametric systems such as HMMs have been the dominant technology for speech recognition in the past decade. Despite many advancements and enhancements in the design of these systems, key problems, such as long-term temporal dependence, have not yet been solved. Recently due to availability...
The efficiency of general classification models varies across problems according to the characteristics and the space of the problem. Even for a particular problem, no single classifier may show a clear advantage over the others. Ensemble classifier methods aim to combine the results of several classifiers so that the deficiencies of each classifier are covered by the others. This combination...
Machine learning techniques have been earnestly explored by many software engineering researchers. In the current state of the art, there is no conclusive evidence as to which machine learning techniques are most accurate and efficient for software defect prediction, but some recent studies suggest that combining multiple machine learners, that is, ensemble learning, may be a more accurate alternative...
Tree Augmented Naive Bayes Classification (TANC) does not handle continuous data well, and it ignores records with missing attribute values, which can reduce the accuracy of the results. To resolve this problem, an improved algorithm based on C4.5 is proposed in this paper. The proposed algorithm first modifies the available training data according to the predictions of C4.5, then...
Supervised learning is a commonly used tool for link prediction in social networks, where data imbalance is a major challenge because only a small portion of nodes may have social connections. In this paper, we propose an approach combining k-nearest neighbor sampling and random sampling to address the data imbalance issue for social link prediction. In our solution, we use two sampling approaches...
An increasing number of adaptive protocols use training data to learn optimal parameter choices for adaptation in wireless communication networks. For instance, several recent papers have studied link adaptation protocols based on context information such as node velocity and SNR. However, a number of embedded sensors providing context information frequently report erroneous values, e.g., GPS errors...
The growing scale of server infrastructure in large datacenters has led to an increased need for effective server workload prediction mechanisms. Two main challenges in the server workload prediction task are the lack of large-scale training data and changes in the underlying distribution of server workloads in events such as a change in the dominant applications of servers or a change in the allocation of servers, etc...
The decision tree classification algorithm provides a fast and effective classification method for datasets: it calculates the information gain of each attribute and selects the attribute with the greatest information gain as the split. However, to the best of our knowledge, when we use the traditional decision tree algorithm to analyze real-life data, we encounter some unusual situations. This...
Traditional data-driven prognostics often requires a large amount of failure data for offline training in order to achieve good accuracy in online prediction. However, in many engineered systems, failure data are fairly expensive and time-consuming to obtain while suspension data are readily available. In such cases, it becomes critical to utilize suspension data, which may carry...
Classification using association is a recent data mining approach that integrates association rule discovery and classification. A modified version of the Multi-class Classification based on Association Rule (MCAR) is proposed in this paper. The proposed classifier, known as Modified Multi-class Classification based on Association Rule, MMCAR, employs a new rule production function which resulted...
Associating functional information with biological sequences remains a challenge for machine learning methods. The performance of these methods often depends on deriving predictive features from the sequences sought to be classified. Feature generation is a difficult problem, as the connection between the sequence features and the sought property is not known a priori. It is often the task of domain...
Predicting the clinical outcome prior to minimally invasive treatments for Benign Prostatic Hyperplasia (BPH) cases would be very useful. However, clinical prediction has not been reliable in spite of multiple assessment parameters, such as symptom indices and flow rates. In our prior study, Artificial Intelligence (AI) algorithms were used to train computers to predict the surgical outcome in BPH patients...
Cross-lingual projection encounters two major challenges, the noise from word-alignment error and the syntactic divergences between two languages. To solve these two problems, a semi-supervised learning framework of cross-lingual projection is proposed to get better annotations using parallel data. Moreover, a projection model is introduced to model the projection process of labeling from the resource-rich...
In bioinformatics, predicting protein subcellular location is an important task, because a protein has to be located in its proper position in a cell to perform its biological functions. Therefore, predicting protein location is an important and challenging task in current molecular and cellular biology. In this paper, a computational method based on the AdaBoost.M1 algorithm and pseudo amino acids...
Feature selection is an important data preprocessing step in pattern recognition. Recently, a wrapper-type semi-supervised feature selection method, known as FW-SemiFS, was proposed to overcome the small labeled sample problem of supervised feature selection. FW-SemiFS does not consider the confidence of predicted unlabeled data, but rather evaluates the relevance of features according to their frequency...
Support Vector Machines have been promising tools for data mining in recent years because of their good performance. However, a main weakness of SVMs is their lack of comprehensibility: people cannot understand what the “optimal hyperplane” means and lack confidence in the predictions, especially when they are not domain experts. In this paper we introduce a new method to extract knowledge with...