Rapid urbanization has generated a large number of construction land problems in China, such as the idleness and illegal use of land. Whereas most methods have focused on the discovery of idle construction land by spatial overlay analysis, far less attention has been paid to the prediction of the idle construction land in advance. In this paper, a new method based on the Gradient Boosting Machine...
Recent computing trends produce vast amounts of data every minute, and a large share of real-life data sets are imbalanced. In practical data mining, an imbalanced data set is prone to misguide a data mining model, so the data set needs pre-processing before mining. This work focuses on some practical data mining techniques and produces a valid evaluation...
The major disadvantage of the Support Vector Machine (SVM) arises in its training phase, which requires solving a quadratic programming problem and makes computation very costly. With the integration of LiDAR data and high spatial resolution orthophotos, more input data layers are available for object-based Support Vector Machine classification. Initially, confusion among classes arises because of the presence...
A random forest (RF) is an ensemble machine learning algorithm used for classification and regression. It consists of multiple decision trees that are built from randomly sampled data. Compared with other machine learning algorithms, the RF is simple and fast in both learning and identification. It is widely applicable to various recognition systems. Since it is necessary to...
The ability of an intrusion detection system (IDS) to accurately detect potential attacks is crucial in protecting network resources and data from the attack's destructive effects. Among many techniques available for incorporation into IDS to improve its accuracy, classification algorithms have been demonstrated to produce impressive and efficient results in detecting IPv4-based attacks but have not...
We aim to study the modeling limitations of the commonly employed boosted decision trees classifier. Inspired by the success of large, data-hungry visual recognition models (e.g. deep convolutional neural networks), this paper focuses on the relationship between modeling capacity of the weak learners, dataset size, and dataset properties. A set of novel experiments on the Caltech Pedestrian Detection...
The Islamic State of Iraq and Syria (ISIS) is an extremist militant group in the Middle East known to employ social media for propaganda and recruiting purposes. In particular, the social media website Twitter is well known to be exploited by ISIS supporters. To this end, we devise an effective and scalable classification scheme to filter out ISIS propaganda accounts from the rest of the Twitter accounts...
kNN (k nearest neighbors) is widely adopted because of its simplicity. However, its shortcomings cannot be neglected, especially its time complexity. Consequently, a great number of approaches have emerged over the decades to cope with this issue, at the cost of a tradeoff in classification performance. In this paper, a novel improved kNN algorithm is proposed with a better performance than traditional...
HEVC (H.265) has brought significant improvements in coding efficiency. However, the reduction in bitrate comes with an increase in computational complexity. This paper presents a data mining approach to reduce the complexity of inter partition modes in HEVC. Determining the CU-splitting in inter partition modes requires substantial resources, so the goal of the work is to terminate...
Because video streaming is the current "killer" application, telecommunication service providers that want to stay competitive need to be able to answer a fundamental question: to what extent is the available network infrastructure able to successfully provide users with a satisfactory experience when running video streaming applications? Answering this question is far from trivial...
In analyzing streaming data in which the underlying data distribution may change or the concept of interest may drift over time, the ability of a classifier to adapt to drifted concepts is very important to maintaining the prediction performance. However, the true class labels of data samples are often available only after some period of time or they are obtained by experts' efforts. In this paper,...
Spatial analysis in many fields requires effective address extraction from text reports. This problem is of particular importance in social science, where news reports contain information about socially relevant incidents. Previous address extraction work focuses on web pages where addresses are separated from other text; however, news reports contain addresses embedded in text. Hence, the need for...
Network traffic classification is of great importance for both internet service providers (ISPs) and quality of service (QoS) management. During the last two decades, many machine learning models have been proposed and applied to different types of real-time applications to classify their real-time traffic, obtaining highly accurate results. However, no research has been done on WeChat...
An artificial immune system (AIS) is considered an adaptive computational intelligence method that could be used for detecting and preventing current computer network threats. An AIS generates antibodies (self) capable of recognizing antigens (non-self), which is considered an anomaly detection technique. This paper aims to develop an artificial immune system (AIS) that consists of two levels. Level one is developed...
Naive Bayes classifiers are widely used to filter spam emails; however, the strong independence assumptions between features limit their performance in accurately identifying spam. To address this issue, we propose a support vector machine based naive Bayes (SVM-NB) filtering system. The SVM-NB first constructs an optimal separating hyperplane that divides samples in the training set into two...
Hough Forest is a framework combining Hough Transform and Random Forest for object detection. The purpose of the present paper is to improve the efficiency and reliability of the original framework by means of two contributions. First, instead of generating the image samples by drawing patches randomly from the training set, we bias this step toward the most relevant image content by selecting...
Sybil detection is an important task in cyber security research. Over the past years, many data mining algorithms have been adopted to fulfill this task. Using classification and regression for sybil detection is very challenging. Despite existing research on modeling classification for sybil detection and prediction, this research proposes a new solution on how sybil activity could...
Real time data analysis in data streams is a highly challenging area in big data. The surge in big data techniques has recently attracted considerable interest to the detection of significant changes or anomalies in data streams. There is a variety of literature across a number of fields relevant to anomaly detection. The growing number of techniques, from seemingly disconnected areas, prevents a...
A standard data set is useful for empirically evaluating classification rule learning algorithms. However, there is still no standard data set that is general enough for various situations. Data sets from the real world are limited to specific applications, and the numbers of attributes, rules, and samples in real data are fixed. A data generator is proposed here to produce a synthetic data set which...
This paper presents a salary prediction system using profiles of graduated students as a model. A data mining technique is applied to generate a model that predicts the salary of individual students who have attributes similar to the training data. In this work, we also conducted an experiment to compare five data mining techniques including Decision trees, Naive Bayes, K-Nearest neighbor, Support vector...