Network analysis methods have been applied in many areas such as computer science, social science, biology and physics. In this paper, we apply network analysis methods to the linguistic domain for classifying subjective documents. In particular, we view subjective documents as related to one another through common subjective words, and build a subjective document network of which nodes...
As the number of pages on the web is constantly increasing, there is a need to classify pages into categories to facilitate indexing or searching them. In the method proposed here, we use both textual and visual information to find a suitable representation of web page content. In this paper, several term weights based on TF or TF-IDF weighting are proposed. Modification is based on visual areas,...
We consider here the task of multi-label classification for data organized in a multi-relational graph. We propose the IMMCA model - Iterative Multi-label Multi-Relational Classification Algorithm - a general algorithm for solving the inference and learning problems for this task. Inference is performed iteratively by propagating scores according to the multi-relational structure of the data. We detail...
In this paper we introduce a framework for semi-automatic to fully automatic discovery and acquisition of bag-of-words style interest profiles from openly accessible Social Web communities. To do so, we construct a semantic taxonomy search tree from the target domain (the domain for which we are acquiring profiles), starting with generic concepts at the root down to specific instances at the leaves, then...
In this paper, a novel method for profiling phishing activity from an analysis of phishing emails is proposed. Profiling is useful in determining the activity of an individual or a particular group of phishers. Work in the area of phishing is usually aimed at detection of phishing emails. In this paper, we concentrate on profiling as distinct from detection of phishing emails. We formulate the profiling...
The goal of semi-supervised learning (SSL) methods is to reduce the amount of labeled training data required by learning from both labeled and unlabeled instances. Macskassy and Provost (2007) proposed the weighted-vote relational neighbor classifier (wvRN) as a simple yet effective baseline for semi-supervised learning on network data. It is similar to many recent graph-based SSL methods and is shown...
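For readers unfamiliar with wvRN, the following is a minimal sketch of the general weighted-vote idea, not the method of this paper: each unlabeled node's class distribution is repeatedly replaced by the weighted average of its neighbors' distributions, while labeled nodes stay fixed. The function name `wvrn`, the edge-list input format, the class-prior initialization and the fixed iteration count are all assumptions made for illustration.

```python
from collections import defaultdict


def wvrn(edges, labels, classes, n_iter=50):
    """Illustrative weighted-vote relational neighbor sketch.

    edges:   iterable of (u, v, weight) for an undirected graph
    labels:  dict mapping each labeled node to its class (must be non-empty)
    classes: list of all class names
    Returns a dict node -> {class: estimated probability}.
    """
    nbrs = defaultdict(list)
    for u, v, w in edges:
        nbrs[u].append((v, w))
        nbrs[v].append((u, w))

    # Class prior from the labeled nodes, used to initialize unlabeled nodes.
    counts = {c: 0 for c in classes}
    for c in labels.values():
        counts[c] += 1
    total = sum(counts.values())
    prior = {c: counts[c] / total for c in classes}

    probs = {}
    for node in nbrs:
        if node in labels:
            probs[node] = {c: 1.0 if labels[node] == c else 0.0 for c in classes}
        else:
            probs[node] = dict(prior)

    for _ in range(n_iter):
        updated = {}
        for node in nbrs:
            if node in labels:
                continue  # labeled nodes keep their known class
            z = sum(w for _, w in nbrs[node])
            if z == 0:
                continue  # no usable evidence from neighbors
            # Weighted vote: average the neighbors' current estimates.
            updated[node] = {
                c: sum(w * probs[n][c] for n, w in nbrs[node]) / z
                for c in classes
            }
        probs.update(updated)
    return probs


# Chain a - b - c - d with only the two end nodes labeled.
example = wvrn(
    edges=[("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)],
    labels={"a": "pos", "d": "neg"},
    classes=["pos", "neg"],
)
```

On this chain, node `b` sits closer to the positive end and node `c` closer to the negative end, so their estimates lean in opposite directions.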
Sensors are being deployed to improve border security, generating enormous collections of data and databases. Unfortunately, these sensors can respond to a variety of stimuli, sometimes reacting to meaningful events and sometimes triggered by random events which are considered false alarms. The intent of this project is to supplement human intelligence in a sensor network framework that can assist in...
Automatic classification of Chinese Web documents is one of the core technologies in Chinese information retrieval, and Web spider technology is key to it. This work addresses Web information exploration, a cutting-edge research area, combined with the overall requirements of the Chinese Web Document Classification System Framework, achieving roaming of the...
Analysis of the dissolved gas content in power transformer oil is very important for monitoring latent transformer faults and ensuring normal operation of the entire power system. It is a complicated problem due to its nonlinearity and the small quantity of training data. Support vector machines (SVM) have been successfully employed to solve classification...
AdaBoost is known as an effective method for improving the performance of base classifiers both theoretically and empirically. However, previous studies have shown that AdaBoost is prone to overfitting, especially in noisy domains. On the other hand, the k-nearest neighbors (kNN) rule is one of the oldest and simplest methods for pattern classification, when cleverly combined with prior knowledge,...
Networks are increasingly popular in present-day society. The least squares support vector machine is a modified support vector machine for classification, which can solve a convex quadratic programming problem. The least squares support vector machine is applied to network intrusion detection. We use the KDDCUP99 experimental data of MIT Lincoln Laboratory to study the classification performance of...
We present a system for miRNA classification that implements a wide variety of miRNA features found in the literature: structural, thermodynamical, information-theoretical, statistical, and comparative. A total of 1485 features are computed and various tests are performed. The classifier of choice is Random Forests, which is also employed along with various feature selection strategies to determine...
Focusing on a problem in the production practice of the sintering process, a novel classifier based on the BP learning algorithm is proposed for on-line quality inference of sintered ore. To speed up the convergence of the BP learning algorithm, a learning algorithm with an adaptive variable step size is adopted. On the basis of the above work, a quality prediction model is proposed in this paper. Experimental...
To address the low recognition rate of the Principal Component Analysis (PCA) algorithm in face recognition, this paper proposes a modular PCA algorithm based on the within-class median. First, the within-class median of each sub-image of all training samples in each class is calculated and used to normalize the corresponding sub-image of each within-class sample. After that, the best...
As the classical decision tree classification algorithm, ID3 is known for its high classification speed, strong learning ability and easy construction. However, when it is used for classification, it tends to choose attributes that have many values, which affects its practicality. To solve this problem, this paper proposes a decision tree algorithm based on...
Different condition attributes have different impacts on categories in a knowledge database. To describe the importance of different attributes, we need to establish the “space” that can measure the degree of this importance. Classification accuracy is an agreed standard for ranking attribute sets. We need to remove redundant condition attributes and find a satisfactory reduction under the condition...
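The bias toward many-valued attributes mentioned above can be seen in a small worked example. The sketch below computes the standard information gain used by ID3 on a made-up toy table (the rows and the attribute names `id`, `windy` and `play` are hypothetical); it only illustrates the problem and does not reproduce the remedy proposed in the paper, which is cut off in this abstract.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(rows, attr, target):
    """Entropy reduction obtained by splitting `rows` on attribute `attr`."""
    base = entropy([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Hypothetical toy data: "id" is unique per row and carries no real signal.
rows = [
    {"id": 1, "windy": "yes", "play": "no"},
    {"id": 2, "windy": "yes", "play": "no"},
    {"id": 3, "windy": "no", "play": "yes"},
    {"id": 4, "windy": "no", "play": "yes"},
    {"id": 5, "windy": "yes", "play": "yes"},
    {"id": 6, "windy": "no", "play": "no"},
]

gain_id = information_gain(rows, "id", "play")
gain_windy = information_gain(rows, "windy", "play")
```

Because every `id` value isolates a single row, each resulting subset is pure and `gain_id` equals the full entropy of the target, so ID3 would prefer `id` over `windy` even though it cannot generalize to unseen rows.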
An algorithm for finding novel temporal patterns in a temporal database is presented. The basic idea of the algorithm is first to extract the feature sequence from a time series, then to compare the feature points of the time series with those of the normal pattern to decide whether there is a novel pattern in the time series. The temporal relation of the time series is preserved in the feature...
The decision tree algorithm is a hot topic in the field of data mining and is commonly used to build classifiers and prediction models. It is widely applied in practice. This paper describes decision tree technology and its development process, focuses on typical decision tree algorithms, analyzes their advantages and disadvantages, compares several algorithms, and finally discusses the...
Frequent pattern mining is an important data mining task with many real-world applications. By considering different weights of the items, weighted frequent pattern mining can discover more important knowledge than traditional frequent pattern mining. In this paper, we present a new algorithm called SMFPM to discover weighted frequent patterns over data streams. The proposed method is based...
Identifying subjective relationships is important for opinion mining on product reviews in Chinese. The commonly used identification methods adopt classifiers as the identifier. However, it is difficult to maintain high accuracy and a high recall rate simultaneously because of the instability of existing classification methods. Motivated by this, we present a method based on sentential features and ensemble...