Minimum redundancy maximum relevancy (mRMR) is one of the successful criteria used by many feature selection techniques to evaluate the discriminating ability of features. We combine a dynamic sample space with mRMR and propose a new feature selection method. In each iteration, the weighted mRMR values are calculated on a dynamic sample space consisting of the current unlabelled samples. The feature...
Among the large number of genes present in microarray data, only a small fraction are effective for performing a given diagnostic test. However, it is very difficult to identify these genes for disease diagnosis. In this regard, a new supervised gene clustering algorithm is proposed to cluster genes from microarray data. The proposed method directly incorporates the information of response...
The Cormack classification is regarded as a gold-standard indicator for predicting whether tracheal intubation will be difficult in clinical practice. Some anaesthetists estimate the airway state by examining single airway features. However, specialists agree that the prediction accuracy for a difficult airway may be improved if multiple static and dynamic metrical airway features are considered. In this paper, we developed...
Each year, insect-borne diseases kill more than one million people, and harmful insects destroy tens of billions of dollars worth of crops and livestock. At the same time, beneficial insects pollinate three-quarters of all food consumed by humans. Given the extraordinary impact of insects on human life, it is somewhat surprising that machine learning has made very little impact on understanding (and...
Microarray data suffer from a low instance count and high dimensionality, which prevent classifiers from building accurate models. This may result in significantly different classification accuracies depending on the classifier and features chosen. It is therefore important to select the classifier and feature selection method that perform well on a specific data set. This paper proposes a novel criterion...
As a growing number of protein structures are resolved without known functions, using computational methods to help predict protein functions from the structures becomes more and more important. Some computational methods predict protein functions by aligning to homologous proteins with known functions, but they fail to work if such homology cannot be identified. In this paper we classify enzymes/non-enzymes...
The Naïve Bayesian classifier has been suggested as an effective method for constructing anti-spam filters owing to its strong categorization ability and high precision. The artificial immune system (AIS) has become a new branch of computational intelligence owing to its good self-learning, self-adaptability and robustness. This paper proposes a new spam filtering method based on Naïve Bayes and AIS, and analyses the key problems...
Development of a feature ranking method based upon the discriminative power of features and unbiased towards classifiers is of interest. We have studied a consensus feature ranking method, based on multiple classifiers, and have shown its superiority to well known statistical ranking methods. In a target environment such as a medical dataset, missing values and an unbalanced distribution of data must...
This paper presents a simulation-based empirical study of the performance profile of random subsample ensembles with a hybrid mix of base learners in high-dimensional feature spaces. The performance of a hybrid random subsample ensemble that uses a combination of C4.5, k-nearest neighbor (kNN) and naïve Bayes base learners is assessed through statistical testing in comparison to those...
In this paper we propose a novel Support Vector Machine (SVM) based approach for removing noisy data from datasets. It is observed that instability present in a dataset greatly affects the overall performance of any classifier. Hence, we propose a methodology for the removal of such instabilities. In the proposed approach, we proceed by determining the clusters formed using support equilibrium...
Parallel corpora are essential for training statistical machine translation models. Since parallel sentence-aligned corpora are usually noisy due to inexact automatic methods when generated from parallel or comparable documents, we need to clean parallel corpora. In this paper, new features are introduced to assess the correctness of a sentence pair. Also, the impact of new features in combination...
The continued growth of email usage, naturally followed by an increase in unsolicited emails (so-called spam), motivates research in the area of spam filtering. In spam filtering systems, addressing the evolving nature of spam, which renders the related models obsolete, has always been a challenge. In this paper an adaptive spam filtering system based on language model is proposed...
Over the last few years, researchers have increasingly employed machine learning methods in the domain of cancer prognosis. The main motivation behind these efforts is to help oncologists make accurate and less invasive decisions about a patient's treatment. Moreover, it would relieve many cancer patients of agonizingly complex surgical treatments and their colossal costs. In this paper, we have proposed...
Worms are self-contained programs that spread over the Internet. Worms cause problems such as loss of information, information theft and denial-of-service attacks. The first part of the paper evaluates the detection of worms based on content classification using all machine learning techniques available in the WEKA data mining tools. The four most accurate and reasonably fast classifiers are identified for...
In this paper, we present a practical algorithm to deal with the data-specific classification problem when there are datasets with different properties. We propose to integrate error rate, missing values and expert judgment as factors for determining data-specific pruning to form Expert Knowledge Based Pruning (EKBP). We conduct an extensive experimental study on 40 openly available real-world datasets...
In this work we have reformulated the twin support vector machine (TWSVM) classifier by imposing unity norm of the normal vectors of the hyperplanes as constraints. TWSVM with unity norm hyperplanes removes the shortcomings of the classical TWSVM formulation. The resulting new formulation is a nonlinear programming problem, which is solved by a sequential quadratic optimization method. The performance...
Knowledge of structural classes is useful for understanding folding patterns in proteins. Although numerous methods have been proposed and have achieved promising results in structural class prediction, some problems in using protein-sequence information have impeded further development. In this paper, a combined representation of protein-sequence information is proposed for prediction of protein structural class,...
In this paper we apply Machine Learning (ML) techniques to static features extracted from Android application files in order to classify those files. Features are extracted from Android's Java byte-code (i.e., .dex files) and other file types such as XML files. Our evaluation focused on classifying two types of Android applications: tools and games. Successful differentiation between...
With the emergence of large-scale databases and people's increasing concern about individual privacy, privacy-preserving data mining has become an active research area, one that includes the support vector machine (SVM). In this paper, a novel privacy-preserving SVM for horizontally partitioned data is given. It has comparable accuracy to that of an ordinary SVM as we obtain the SVM by using the distinct property...
The support vector machine (SVM) and K-nearest neighbor (KNN) classifier is a combined classification method that performs very well in various applications. The purpose of this study is to examine the performance of the SVM-KNN classifier on the diagnosis of breast cancer using a tumor dataset. The objective is to classify a tumor as either benign or malignant based on cell descriptions gathered...