Internet traffic classification is an area of current research interest. The failure of port and payload based classification motivates researchers to head towards a machine learning (ML) approach. However, training and testing dataset validation has not been formally addressed. This paper discusses the problem of ML dataset validation and highlights three training issues to be considered in ML classification...
The past decade has seen a lot of research on statistics-based network protocol identification using machine learning techniques. Prior studies have shown promising results in terms of high accuracy and fast classification speed. However, most works have embodied an implicit assumption that all protocols are known in advance and present in the training data, which is unrealistic since real-world...
The growing number of digital documents available on the Web, and the usefulness of the information they contain for different purposes, creates an essential need to organize them. This task must be automated in order to save costs and manpower. In the research community, the main approach to this problem is based on the application of machine learning techniques. This article studies the main machine...
Anomaly detection in computer networks is an actively researched topic in the field of intrusion detection. The Internet Analysis System (IAS) is a software framework which provides passive probes and centralized backend services to collect purely statistical network data in distributed computer networks. This paper presents an empirical evaluation of the IAS data format for detecting anomalies, caused...
Real-time classification of Internet traffic according to application types is vital for network management and surveillance. Identifying emerging applications based on well-known port numbers is no longer reliable. While deep packet inspection (DPI) solutions can be accurate, they require constant updates of signatures and become infeasible for encrypted payloads, especially in multimedia applications...
Network traffic classification is fundamental to much network research and management. However, the effectiveness of port-based and simple payload-based classification has diminished with the rapid growth of peer-to-peer (P2P) applications that use dynamic ports, disguising techniques, and encryption to avoid detection. An alternative method based on statistics and machine...
This paper presents an opinion analysis system based on linguistic knowledge acquired from small-scale annotated text and raw topic-relevant Web pages. Based on observation of the annotated opinion corpus, some word-, collocation- and sentence-level linguistic features for opinion analysis are discovered. Supervised and unsupervised learning techniques are developed to learn these features...
The three most important issues for anomaly-detection-based intrusion detection systems that use data mining methods are: feature selection, data value normalization, and the choice of data mining algorithms. In this paper, we primarily study the feature selection of network traffic and its impact on detection rates. We use the KDD CUP 1999 dataset as the sample for the study. We group the features of...
Classification of numerical data is an important research topic in machine learning. However, incomplete data is very common in real-world applications, and its presence degrades the learning quality of classification models. To show the definition of missing data more intuitively,...
Style-based text authorship identification extracts features from texts of known authorship, constructs a classifier, and then identifies the authors of disputed texts. Authorship identification belongs to the domain of style classification and is a branch of text classification. In contrast with text classification, which deals with the content of texts, authorship identification focuses on the formal properties of texts...
Signature-based anti-viruses are very accurate, but are limited in detecting new malicious code. Dozens of new malicious codes are created every day, and the rate is expected to increase in coming years. To extend the generalization to detect unknown malicious code, heuristic methods are used; however, these are not successful enough. Recently, classification algorithms were used successfully for...
Port state control (PSC) inspection is the most important mechanism for ensuring maritime safety worldwide. Recently, several SVM-based risk assessment systems have been proposed. They estimate the risk of each candidate ship from its generic factors and inspection history factors in order to select high-risk ships before conducting on-board PSC inspection. However, how to improve the performance of the...
In this paper, a face indexing scheme based on spatial similarity is proposed. The spatial scattering of anatomically relevant dominant points on faces is preserved in a kd-tree index structure for efficient retrieval. The methodology is invariant to linear transformation and is robust to pose and expression variations. Experimentation on the ORL face database has corroborated the retrieval effectiveness...