The goals of this paper were twofold: to continue and refine previous research on tree cover type classification by harnessing modern machine learning models, and to extend the conclusions of that work to demonstrate that results gained from such models can assist U.S. land management agencies with the challenges they currently face. Using the same dataset as the past study, an artificial...
Click-through rate estimation, the core task of programmatic display advertising, is associated with typical big data problems. Online algorithms for generalized linear models, such as Logistic Regression, are the most widely used data mining techniques for learning at such a massive scale. Since these models are unable to capture the underlying nonlinear data patterns, conjunction features are often...
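As a rough illustration of what "conjunction features" usually means for a linear model, the sketch below crosses pairs of categorical fields so that a model such as logistic regression can weight each combination separately. The field names (`site`, `ad`, `device`) and the helper `add_conjunctions` are hypothetical and not taken from the paper.

```python
from itertools import combinations


def add_conjunctions(example):
    """Return the original categorical features plus all pairwise conjunctions.

    `example` maps a field name to its categorical value. Each output string
    is one binary (one-hot style) feature for a linear model.
    """
    features = {f"{name}={value}" for name, value in example.items()}
    # Sort the fields so that each pair is always written in the same order.
    for (n1, v1), (n2, v2) in combinations(sorted(example.items()), 2):
        features.add(f"{n1}={v1}&{n2}={v2}")
    return features


# Hypothetical ad impression with three categorical fields.
example = {"site": "news", "ad": "shoes", "device": "mobile"}
crossed = add_conjunctions(example)
```

With three fields this yields three single features and three pairwise conjunctions. The number of crossed features grows quadratically with the number of fields, which is one reason such expansions are costly at large scale.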
Machine learning and data mining techniques have been widely used in recent years to improve network intrusion detection. These techniques make it possible to automate anomaly detection in network traffic. One of the major problems that researchers are facing is the lack of published data available for research purposes. The KDD'99 dataset was used by researchers for over a decade even though...
Real-world data such as medical images and sensor measurements is usually high-dimensional and limited. Using such datasets directly in machine learning tasks can lead to poor generalization. Feature learning is a general approach for transforming high-dimensional data points to a representational space with lower dimensionality. Machine learning models can be trained efficiently with such representations...
Growing bacterial resistance to antibiotics is spurring research on utilizing naturally-occurring antimicrobial peptides (AMPs) as templates for novel drug design. While experimentalists mainly focus on systematic point mutations to measure the effect on antibacterial activity, the computational community seeks to understand what determines such activity in a machine learning setting. The latter seeks...
Modern machine-learning techniques greatly reduce the efforts required to conduct high-quality program compilation, which, without the aid of machine learning, would otherwise heavily rely on human manipulation as well as expert intervention. The success of the application of machine-learning techniques to compilation tasks can be largely attributed to the recent development and advancement of program...
Bug prediction has been a hot research topic for the past two decades, during which different machine learning models based on a variety of software metrics have been proposed. Feature selection is a technique that removes noisy and redundant features to improve the accuracy and generalizability of a prediction model. Although feature selection is important, it adds yet another step to the process...
This paper presents a novel feature selection method for detecting botnets in their Command and Control (C&C) phase. A major problem is that researchers have proposed features based on their expertise, but there is no method to evaluate these features, even though some of them may yield a lower detection rate than others. To this end, we find the feature set based on connections...
One of the main tasks of machine learning and data mining is feature selection. Depending on the task, different methods are applied to find an optimal balance between speed and feature selection quality. The MeLiF algorithm effectively solves the feature selection problem by building an ensemble of feature ranking filters. It reduces the filter aggregation problem to a linear form optimization problem and works as a wrapper,...
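The idea of aggregating feature ranking filters through a linear form can be sketched as follows. This is a minimal illustration of the aggregation step only, not the MeLiF algorithm itself: the filter scores and weights are made-up values, and the wrapper search that would tune the weights against a classifier is omitted.

```python
def aggregate_scores(filter_scores, weights):
    """Combine per-filter feature scores into one score per feature.

    `filter_scores[i][j]` is the score that filter i assigns to feature j;
    the result for feature j is the weighted sum over all filters.
    """
    n_features = len(filter_scores[0])
    return [
        sum(w * scores[j] for w, scores in zip(weights, filter_scores))
        for j in range(n_features)
    ]


def top_k(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical scores from two ranking filters over four features.
filter_scores = [
    [0.9, 0.1, 0.5, 0.3],
    [0.2, 0.8, 0.7, 0.1],
]
weights = [0.5, 0.5]  # assumed; a wrapper would search over these coefficients

combined = aggregate_scores(filter_scores, weights)
selected = top_k(combined, 2)
```

In a wrapper setting, each candidate weight vector would be judged by training a model on the selected features, so the search runs over a handful of coefficients rather than over all feature subsets.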
In this paper, we propose an innovative approach for feature selection and model updating in big data machine learning. Since hard drive access is the biggest barrier for big data problems, it is natural to reduce disk I/O operations when evaluating different combinations of features or updating a learning machine. In particular, we are interested in discovering whether small enough matrices exist...
This paper proposes a new machine-learning-based prediction model for transient stability analysis. It extracts features ahead of the time point at which we want to make a prediction, which leaves an interval in which to take action. The proposed model also takes network information into consideration and tries to analyze how nodes in the power grid influence each other. Compared to traditional algorithms...
An increasing number of simultaneous localization and mapping (SLAM) systems use appearance-based localization to improve the quality of pose estimates. However, as the time spans and the size of the areas we want to cover grow, appearance-based maps often become too large to handle and consist of features that are not always reliable for localization purposes. This paper presents...
Measuring energy efficiency and its influence factors is an important way of evaluating energy efficiency. In this paper, a character identification method is proposed to determine the influence factors of energy efficiency, and the energy efficiency of 24 provinces in China is analyzed and evaluated by a deep learning method. By comparison, two classification and prediction models are built with two other...
Credit scoring prediction is a focus of the banking sector for identifying fraudulent customers and reducing illegal activities. The use of ensemble classifiers in machine learning plays a vital role in prediction problems. The aim of this study is to analyze the accuracy of ensemble methods in classifying customers into a good risk group or a bad risk group. In this paper, experiments are conducted using...
Border Gateway Protocol (BGP) anomalies affect network operations and, hence, their detection is of interest to researchers and practitioners. Various machine learning techniques have been applied for detection of such anomalies. In this paper, we first employ the minimum Redundancy Maximum Relevance (mRMR) feature selection algorithms to extract the most relevant features used for classifying BGP...
Detecting explicit user actions, i.e., requests for web pages such as hyper-link clicks, from passive traces is fundamental for many applications, such as network forensics or content popularity estimation. Every URL explicitly visited by a user usually triggers further automatic URL requests to obtain all objects that compose the web page. HTTP traces provide a summary of all URLs requested by users,...
In this paper we present an empirical evaluation of various feature selection techniques that are applicable to the analysis of funding decisions, that is, whether or not to award funding to a specific scientific project. The input data are a set of review forms (questionnaires), filled in by domain experts, together with the final decisions of the expert committee about project funding. The data was provided by the Russian...
Millions of people across the globe use email for communication, and it is a vital application for many businesses. A considerable amount of unsolicited mail flows into users' mailboxes on a daily basis. A major negative aspect over the past decade has been bulk spam and phishing mail. Besides being wearisome for many email users, such unsolicited spam emails also put pressure...
Based on the methods of traditional topic-based text classification, a machine learning method was applied to the coarse-grained sentiment classification of reviews. Sentiment classification involves many problems. In this paper, the sentiment Vector Space Model (s-VSM) was used for text representation to address data sparseness. In addition, the critical issues of the sentiment classification,...