An improved KNN text classification algorithm based on Simhash is proposed. It introduces Simhash and takes the average Hamming distance of adjacent texts as a unit, which addresses the problems caused by data imbalance and the large computational overhead of traditional KNN text classification algorithms. Experimental results demonstrate that the proposed algorithm achieves higher precision, a...
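To make the general idea concrete, here is a minimal sketch of Simhash fingerprints combined with a nearest-neighbour vote over Hamming distance. It is not the paper's algorithm: the "average Hamming distance of adjacent texts" step is not reproduced, tokens are unweighted, and the names `simhash`, `hamming`, and `knn_classify` are hypothetical helpers chosen for illustration.

```python
import hashlib

def simhash(tokens, bits=64):
    """Unweighted Simhash: each token votes +1/-1 on every bit position."""
    counts = [0] * bits
    for tok in tokens:
        # Assumption: MD5 truncated to `bits` bits serves as the per-token hash.
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")

def knn_classify(query_tokens, labelled, k=3):
    """labelled: list of (fingerprint, label). Majority vote among k nearest."""
    q = simhash(query_tokens)
    nearest = sorted(labelled, key=lambda item: hamming(q, item[0]))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

Comparing 64-bit fingerprints with XOR and a bit count is much cheaper than comparing full term vectors, which is one plausible reading of the reduced overhead the abstract mentions.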
With the rapid development of the mobile Internet, the Android system has become the dominant operating system; it has an unparalleled influence on the smartphone industry and is closely tied to people's lives. However, the open-source nature of the Android system and its lack of regulation have led to many security risks. Because of the widespread use of the Android Intent mechanism, the...
This paper studies the accuracy of a fully distributed pattern recognition algorithm, namely the P2P-GN, on imbalanced datasets, a problem that is often neglected by most of the available distributed algorithms. A major distinction of the P2P-GN compared to the other approaches is that it forms a single global classifier, instead of building many local classifiers (one at every site). Fine-granularity...
As access to the Internet has become much easier over the last decade or so, people are using online applications more than ever. Online marketing, and indeed e-commerce as a whole, is growing day by day, if not minute by minute. Online reviews play a very important role in this field and are proving valuable for decision making from a customer's point of view. Even though these...
In modern networks, different applications generate many different types of network traffic. To improve the performance of network management, it is important to identify and classify Internet traffic. The machine learning (ML) technique based on per-flow statistics has been widely used in traffic classification. Different from traditional classification methods,...
Internet traffic classification is one of the key foundations for research and traffic engineering on the Internet. With the rapid increase in Internet applications and in the number of Internet flows, technical challenges have accompanied the development of traffic classification all along. Currently, the machine learning-based technique has attracted much attention, since it can address the issues...
Internet traffic classification is an area of current research interest. The failure of port-based and payload-based classification has motivated researchers to move towards a machine learning (ML) approach. However, training and testing dataset validation has not been formally addressed. This paper discusses the problem of ML dataset validation and highlights three training issues to be considered in ML classification...
Nowadays, while we enjoy the convenience brought by such a huge repository of online web information, we may have difficulty finding the web pages relevant to the particular information we are searching for. Hence, it is essential to classify web documents to facilitate the search and retrieval of pages. Existing algorithms work well with a small quantity of web pages, whereas,...
There are few Chinese dish recommendation algorithms due to the variety of Chinese dishes. It can be impossible to find one's favorite dishes in a restaurant from the name or the ingredients of a dish alone. The algorithm in this paper uses the user's ordering history to quantify the user's taste with the k-means clustering method and determines the number of the user's favorite tastes with the BWP index. With the...
The classification of data sessions on the Internet is a crucial issue for authorities involved in lawful interception. Some Internet Service Providers (ISPs) can provide a panel of IP nodes that, tuned to detect specific data patterns, are able to send an alert when a data session in a targeted class is found. Unfortunately, several applications generate a bulk of IP traffic not characterized by a recognizable...
With the information explosion of the Internet era, traditional classification methods such as KNN (k-nearest neighbor) and Naive Bayes (NB) encounter bottlenecks due to the endless stream of new words. In this paper, through comparison with the Rocchio and Bayesian algorithms, it is found that centroid-based algorithms are insufficient for text classification. Therefore, a novel feature...
Researchers have started looking for Internet traffic recognition techniques that do not depend on ‘well-known’ TCP or UDP port numbers or on interpreting the contents of packet payloads. Newer approaches classify traffic by recognizing statistical patterns in externally observable attributes of the traffic (such as typical packet lengths and inter-arrival times). The main goal is to cluster or...
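As an illustration of "externally observable attributes", the following minimal sketch derives a few per-flow statistics from packet lengths and inter-arrival times. The function name `flow_features`, the input layout, and the choice of statistics are assumptions made for this example; the abstract does not specify which features the authors use.

```python
from statistics import mean

def flow_features(packets):
    """Summarise one flow without looking at ports or payloads.

    packets: non-empty list of (timestamp_seconds, length_bytes) tuples,
    assumed to be sorted by timestamp.
    """
    lengths = [length for _, length in packets]
    times = [t for t, _ in packets]
    # Inter-arrival times are the gaps between consecutive packets.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "mean_len": mean(lengths),
        "max_len": max(lengths),
        "mean_iat": mean(gaps) if gaps else 0.0,
    }
```

A vector of such values per flow could then be passed to any clustering or classification method.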
As a new form of malicious software, phishing websites have appeared frequently in recent years, causing great harm to online financial services and data security. In this paper, we design and implement an intelligent model for detecting phishing websites. In this model, we extract 10 different types of features such as title, keyword and link text information to represent the website. Heterogeneous...
Active Queue Management (AQM) acts as an enhanced mechanism for end-to-end congestion control. It can improve the queue length and usability of data links by dropping packets at nodes in the network. In this paper, we describe the background and design indicators for AQM and discuss the current state of AQM research. Then we will focus on some classical AQM algorithm...
This paper implements a network traffic classification method based on a Gaussian Mixture Model-Hidden Markov Model that uses packet-level properties of network traffic flows (PLGMM-HMM). Our model first builds PLGMM-HMMs from two packet-level properties, inter-packet time and payload size, respectively; then, we construct the estimation function by computing the F-Measure value through classifying...
Depending on the question, various answering methods and answer sources can be used. In this paper, we build a distributed QA system to handle different types of questions and web sources. When a user question is entered, the broker distributes the question over multiple sub-QAs according to question type. The selected sub-QAs find locally optimal candidate answers, which are then collected into the...
Abnormal events such as large scale power outages, misconfigurations, and worm attacks can affect the global routing infrastructure and consequently create regional or global Internet service interruptions. As a result, early detection of abnormal events is of critical importance. In this study we present a framework based on data mining algorithms that are applied to anomaly detection on global routing...
Knowledge discovery from the Web is a cyclic process. In this paper we focus on the important part of transforming unstructured information from Web pages into structured relations. Relation extraction systems capture information from natural language text on Web pages, called Web text. However, extraction is quite costly and time consuming. Worse, many Web pages may not contain a textual representation...
In this work, we developed a self-organizing map (SOM) technique for using web-based text analysis to forecast when a group is undergoing a phase change. By “phase change”, we mean that an organization has fundamentally shifted attitudes or behaviors. For instance, when ice melts into water, the characteristics of the substance change. A formerly peaceful group may suddenly adopt violence, or a violent...
Multiple Classifier Systems are nowadays one of the most promising directions in pattern recognition. There are many methods of decision making based on classifier groups. The most popular are methods that have their origin in voting, where the decision of the common classifier is a combination of the simple classifiers' decisions. The paper presents an idea of how a decision about an attack in application...
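The voting idea can be sketched as plain majority voting, the simplest member of that family. This is only an illustration under assumptions: the paper's actual combination rule is not given in the abstract, and `majority_vote` and the class labels below are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most base classifiers.

    predictions: non-empty list with one label per base classifier.
    Ties go to the label that appears first in the list.
    """
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical base classifiers judging one request.
decision = majority_vote(["attack", "normal", "attack"])
```

Weighted voting, where each classifier's vote is scaled by its estimated reliability, is a common refinement of the same scheme.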