DDoS attacks pose a huge threat to networks, and how to detect them effectively is a hot topic in information security. Several methods have been designed to detect DDoS attacks, but their detection rates are low. Moreover, DDoS detection is easily misled by flash crowd traffic. In this paper, a new method to detect DDoS attacks based on the RDF-SVM algorithm is proposed. By considering the importance...
Semi-supervised clustering is an important task that aims to group data objects into classes (clusters) such that the similarity of objects within a cluster is high and the similarity of objects in different clusters is low. A dataset may sometimes be of mixed nature, that is, it may consist of both numeric and categorical data. So two types...
This paper presents a data-driven approach for exploring outage duration in power distribution systems. The primary goals are to analyze and interpret the outage duration according to the underlying causes and to identify significant variables that strongly impact the outage duration. To carry out this study, actual outage data collected by a major utility company that has operations in the southeastern...
Due to the imbalanced distribution of business data, missing user features, and many other reasons, directly applying big data techniques to real business data tends to deviate from the business goals. It is difficult to model insurance business data with classification algorithms such as logistic regression and SVM. This paper exploits a heuristic bootstrap sampling approach combined with...
In today's world, the security of computer systems is of great importance. In the last few years, the number of intrusions has grown to the point that intrusion detection has become a dominant concern of current information security. Firewalls cannot provide complete protection. Relying on a firewall system alone is not enough to protect a corporate network from all types of network...
The Titanic disaster occurred 100 years ago on April 15, 1912, killing about 1500 passengers and crew members. The fateful incident still compels researchers and analysts to understand what could have led to the survival of some passengers and the demise of others. With the use of machine learning methods and a dataset consisting of 891 rows in the train set and 418 rows in the test set, the research...
The wide-reaching use of the IEEE 802.11 standard as a solution for extensive network coverage with high bandwidth has raised various security threats. The wide use of Wi-Fi (Wireless Fidelity) has enabled us to access the internet easily, and it has also paved the way for many hacking attacks. Anomaly detection as applied to detecting active data breaches...
To improve the efficiency and adaptability of the classical random forest algorithm in large data environments, an improved random forest algorithm based on Spark is proposed. First, an improved random forest algorithm (FRF) based on the Fayyad boundary point principle is proposed to deal with the shortcomings of the classical random forest algorithm in the discretization of continuous...
In recent years, type II diabetes has become a serious disease that threatens human physical and mental health. Efficient predictive modeling is required for medical researchers and practitioners. This study proposes a type II diabetes prediction model based on random forest, which aims at analyzing the effects of some readily available indicators (age, weight, waist, hip, etc.) on diabetes and discovering some...
Recent computing trends produce vast amounts of data every minute, and a large share of real-life data sets are imbalanced. In practical data mining, an imbalanced data set is prone to mislead a data mining model. A data set therefore needs pre-processing before mining. This work focuses on some practical data mining techniques and produces a valid evaluation...
Risk analysis of electricity bill charges has been both challenging and important in the field of electric power supply in China. In this paper, a novel method for predicting electricity bill charge risk is proposed. The SMOTE (synthetic minority oversampling technique) algorithm is first used to under-sample the majority class and over-sample the minority class, and then it is combined with some...
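The over-sampling step named in the abstract above can be illustrated with a minimal sketch of the generic SMOTE idea: each synthetic point is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. This is not the paper's implementation. The function name `smote`, the parameters `n_new`, `k` and `seed`, and the sample points are all assumptions made for illustration, and the under-sampling of the majority class mentioned in the abstract is not covered.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Made-up minority-class samples with two features each
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class rather than being exact duplicates of it.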
Nonintrusive load monitoring (NILM) is a procedure for analyzing changes in the power (current and voltage) going into a household and classifying the appliances used in the house according to their individual energy consumption. Utility companies use smart electric meters together with NILM to examine the particular uses of electric power in households. The focus of this paper is on the...
Cognitive radio (CR) network technology is widely used as an approach to address the scarcity of radio spectrum by allowing unlicensed users to access the licensed spectrum. Since licensed users in a CR network are easily affected by the introduction of unlicensed users, we have to avoid causing interference to the licensed users when the unlicensed users try to transmit data on...
Hyperspectral images have the advantages of a wide spectral range and high spectral resolution, and are widely applied in terrain classification. In this paper, we study classification methods for airborne hyperspectral images. Considering that a hyperspectral image has a large number of bands and that there is redundancy among the bands, the principal component analysis...
In this paper, we introduce a new land cover mapping technique that takes advantage of a weighted random forest [1] and the level set method [2] so that each compensates for the weaknesses of the other. The weighted random forest can accurately estimate the likelihood that a pixel belongs to each class, while the level set method can capture the dependency among neighboring pixels. As a result, by combining their...
Sentiment analysis refers to classifying the emotion of a text as positive or negative. Studies on sentiment analysis are generally based on English and other languages, while there are limited studies on Turkish. In this study, after constructing a dataset from the well-known hotel reservation site booking.com, we compare the performance of different machine learning approaches. We...
In this paper, we propose a new random forest algorithm designed specifically for the land cover mapping problem. Three approaches are investigated, namely, pixel-based, neighbor-looking, and a combination of both. In the pixel-based approach, we use the fact that all decision trees are different, whereas in the neighbor-looking approach, the decisions from neighboring pixels are used when the decisions from...
This paper proposes a novel framework for automated detection of urban road manhole covers using mobile laser scanning (MLS) data. First, to narrow the search regions and reduce the computational complexity, road surface points are segmented from a raw point cloud via a curb-based road surface segmentation approach and rasterized into a georeferenced intensity image through inverse distance weighted...
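The abstract above is cut off, so the authors' exact rasterization procedure is not shown here. As a hedged illustration of the general technique it names, the sketch below applies plain inverse distance weighting: each grid cell takes a weighted mean of nearby point intensities, with weights proportional to 1 / distance^power. The function names `idw` and `rasterize`, the `power` and `cell` parameters, and the sample values are assumptions, not details from the paper.

```python
import math


def idw(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    samples is a list of (sx, sy, value) tuples. Hypothetical helper.
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # query point coincides with a sample
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def rasterize(samples, width, height, cell=1.0):
    """Fill a height x width grid, evaluating IDW at each cell centre."""
    return [
        [idw(samples, (c + 0.5) * cell, (r + 0.5) * cell) for c in range(width)]
        for r in range(height)
    ]


# Made-up points: (x, y, intensity)
points = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (1.0, 2.0, 30.0)]
image = rasterize(points, width=2, height=2)
```

A practical pipeline would usually restrict each cell to points within a search radius instead of using every point, which this sketch omits for brevity.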
Data analysis for intrusion detection based on large-scale data is both important and practical. To address the limitations of current systems, which rely on simulation and off-line analysis, a system is proposed for intrusion detection and analysis on a real website. The system integrates two subsystems, intrusion detection and large-scale data analysis. Through network construction and software design,...
Clustering is a widely used technique in data mining applications for discovering patterns in underlying data. Most traditional clustering algorithms are limited to handling datasets that contain either numeric or categorical attributes. However, data sets with mixed types of attributes are common in real life data mining applications. In this paper, we introduce a new framework for clustering mixed...