A huge amount of data in today's world is stored in the form of electronic documents. Text mining is the process of extracting information from these textual documents. Text classification is the process of assigning text documents to a fixed number of predefined classes. Applications of text classification include spam filtering, email routing, sentiment analysis, language identification...
Feature representation plays an important role in text classification. Feature mapping based on label information is an algorithm suited to Binary Relevance. Compared with conventional text representations, it keeps the dimensionality of the text representation under control by means of word embeddings. More importantly, it takes full advantage of the general characteristics of the labels in text representation...
The basic idea behind classifier ensembles is to use more than one classifier in the expectation of improving overall accuracy. It is known that classifier ensembles boost overall classification performance depending on two factors, namely the individual success of the base learners and their diversity. One way of providing diversity is to use base learners of the same type or of different types. When the...
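As a minimal illustration of combining several base learners, the sketch below implements plain majority voting. The `majority_vote` function and the three keyword rules are hypothetical stand-ins for trained base learners; the abstract does not say which combination rule or which base learners the authors use.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# Hypothetical interface: a base learner is any callable mapping a document to a label.
Classifier = Callable[[str], Hashable]


def majority_vote(classifiers: Sequence[Classifier], doc: str) -> Hashable:
    """Return the label predicted by the largest number of base learners.

    Ties go to the label that was predicted first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    votes = Counter(clf(doc) for clf in classifiers)
    label, _count = votes.most_common(1)[0]
    return label


# Three deliberately different (diverse) keyword rules acting as toy base learners.
rules = [
    lambda d: "spam" if "free" in d else "ham",
    lambda d: "spam" if "winner" in d else "ham",
    lambda d: "spam" if "!" in d else "ham",
]

print(majority_vote(rules, "you are a winner, claim your free prize"))  # spam
```

The vote only helps when the base learners make different mistakes, which is why diversity matters as much as individual accuracy.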
Addressing the multiclass text categorization problem, a classification algorithm based on the multiconlitron and the 1-a-r (one-against-rest) method is presented. The 1-a-r method is used to convert a multiclass categorization problem into several binary problems. A multiconlitron is constructed for each binary problem in the input space. The class of a text to be classified is decided by the multiconlitrons. The classification experiments are...
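The 1-a-r decomposition can be sketched as follows, assuming 1-a-r means one-against-rest: each class gets its own binary problem, and a new text is assigned to the class whose binary model scores highest (the highest-score rule is an assumption, since the abstract only says the multiconlitrons decide). The multiconlitron itself is not reproduced; `centroid_learner` is a hypothetical toy substitute, and all function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]
# A binary learner takes inputs X and labels y in {+1, -1} and returns a scorer;
# a larger score means "more likely positive".
BinaryLearner = Callable[[List[Vector], List[int]], Scorer]


def one_vs_rest_labels(y: Sequence[str], positive: str) -> List[int]:
    """Relabel a multiclass target: +1 for `positive`, -1 for every other class."""
    return [1 if label == positive else -1 for label in y]


def train_one_vs_rest(X: List[Vector], y: Sequence[str], learner: BinaryLearner) -> Dict[str, Scorer]:
    """Train one binary model per class (that class vs. all remaining classes)."""
    return {c: learner(X, one_vs_rest_labels(y, c)) for c in sorted(set(y))}


def predict_one_vs_rest(models: Dict[str, Scorer], x: Vector) -> str:
    """Assign the class whose binary model gives the highest score."""
    return max(models, key=lambda c: models[c](x))


def centroid_learner(X: List[Vector], y: List[int]) -> Scorer:
    """Toy stand-in for the multiconlitron: compare distances to the two class centroids."""
    def centroid(points: List[Vector]) -> List[float]:
        return [sum(col) / len(points) for col in zip(*points)]

    pos = centroid([x for x, t in zip(X, y) if t == 1])
    neg = centroid([x for x, t in zip(X, y) if t == -1])
    return lambda x: math.dist(x, neg) - math.dist(x, pos)


X = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
y = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_rest(X, y, centroid_learner)
print(predict_one_vs_rest(models, [0, 0.5]))  # a
```

Taking the highest score is one common way to resolve texts that several binary models claim, or that none of them claims.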
Nowadays, the exponential growth in the generation of textual documents and the emerging need to structure them have increased attention to the automated classification of documents into predefined categories. There is a wide range of supervised learning algorithms that deal with text classification. This paper deals with an approach for building a machine learning system in R that uses K-Nearest Neighbors...
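The paper builds its system in R; the sketch below uses Python, for illustration only, to show the core of K-Nearest Neighbors text classification: documents become bag-of-words vectors, cosine similarity ranks the training documents, and the k most similar ones vote on the label. The tokenisation, the similarity measure, k = 3 and the toy corpus are assumptions, not details from the abstract.

```python
import math
from collections import Counter
from typing import List, Tuple


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace tokenisation into term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(train: List[Tuple[str, str]], query: str, k: int = 3) -> str:
    """Label `query` by majority vote among its k most similar training documents."""
    q = bag_of_words(query)
    ranked = sorted(train, key=lambda item: cosine(bag_of_words(item[0]), q), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


train = [
    ("stock market shares fall", "finance"),
    ("bank raises interest rates", "finance"),
    ("investors buy shares", "finance"),
    ("team wins the football match", "sport"),
    ("player scores in the match", "sport"),
    ("coach praises the team", "sport"),
]
print(knn_classify(train, "shares rally as investors return"))  # finance
```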
AdaBoost is one of the most popular classification algorithms and has been successfully used for text classification, face detection, and tracking. However, noise sensitivity is regarded as a major disadvantage, and previous work shows that AdaBoost tends to overfit on data sets containing noisy data. To improve the noise tolerance of conventional AdaBoost, this paper proposes a preprocessing...
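To see why noise hurts AdaBoost, consider one reweighting round of standard discrete AdaBoost: every misclassified sample, including a mislabeled (noisy) one, has its weight multiplied by e^alpha, so it gains influence in later rounds. This is the textbook update, not the preprocessing method the paper proposes; `adaboost_round` is an illustrative name.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(weights: Sequence[float], y_true: Sequence[int],
                   y_pred: Sequence[int]) -> Tuple[float, List[float]]:
    """One reweighting step of discrete AdaBoost for labels in {-1, +1}.

    `weights` are the current sample weights (assumed to sum to 1).
    Returns the weak learner's vote weight alpha and the renormalised weights.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite for perfect/useless learners
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up by exp(alpha), correct ones down.
    raw = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


# Four samples; the last one is misclassified (e.g. because its label is noisy).
alpha, new_weights = adaboost_round([0.25] * 4, [1, 1, -1, -1], [1, 1, -1, 1])
print(round(alpha, 3), [round(w, 3) for w in new_weights])  # 0.549 [0.167, 0.167, 0.167, 0.5]
```

After a single round the one misclassified sample already carries half of the total weight; if its label is simply wrong, later weak learners are pushed to fit that noise.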
With the development of computer and network technologies and the explosion of digital Chinese news text, we face massive amounts of unstructured news data. A better approach to knowledge extraction and storage can, on the one hand, help readers understand the core content of the news and, on the other hand, build up a complete store of news knowledge that supports reporting. In recent years, information extraction technology...
There is constantly growing interest in evaluating music information retrieval (MIR) systems that can provide effective management of music resources. A crucial characteristic of music is its emotion, which reflects human perception. To make the automatic classification of emotion in Chinese music more effective, we use song lyrics to analyze and classify music by emotion....
In text classification, the similarity between texts needs to be calculated, but existing classification methods only consider the similarity between feature words and categories and do not take into account the semantic similarity between feature words. In this paper, a new classification model, LDA-KNN, combining LDA (Latent Dirichlet Allocation) and KNN (K-Nearest Neighbor), is proposed. LDA is used to solve the problem...
For mobile data security detection, a text classification model can be deployed in the application layer to detect malicious attacks. Since the traditional C4.5 decision tree has the disadvantage of not considering the interaction between attributes during attribute selection, an improved C4.5 decision tree model based on the AdaBoost algorithm is put forward. The problem in measuring the...
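For reference, the attribute-selection criterion of standard C4.5 is the gain ratio: information gain divided by split information. The sketch below computes it for one discrete attribute; it does not model the attribute interactions or the AdaBoost-based improvement described in the abstract, and the toy attribute and labels are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(items: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `items`."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """C4.5 gain ratio of splitting `labels` on a discrete attribute `values`."""
    n = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # penalises attributes with many distinct values
    return info_gain / split_info if split_info > 0 else 0.0


# Toy attribute: does the message contain the word "free"?
contains_free = ["yes", "yes", "no", "no"]
label = ["spam", "spam", "ham", "ham"]
print(gain_ratio(contains_free, label))  # 1.0
```

Because each attribute is scored on its own, this criterion cannot see how two attributes behave jointly, which is the limitation the abstract points to.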
Nowadays, large volumes of text data are being produced in real time due to the expansion of communication. This data must be organized for exploitation and for the extraction of useful information. Topic-based text classification is one efficient solution to this problem. Algorithms for text classification are efficient only if they can handle high-dimensional data. In this paper, a...
In this paper, we investigate the influence of outliers in the training set on the quality of a probabilistic classifier. Using the naive Bayes classifier as an example, we show how its quality characteristics depend on the percentage of outliers. This dependence is measured with three basic metrics of classifier quality: precision, recall, and F1 score. Finally, we propose a method for reducing the outliers...
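The three quality metrics named above can be computed from true-positive, false-positive and false-negative counts. Below is a minimal sketch that treats one class as positive; the function name and the toy labels are made up for illustration.

```python
from typing import Hashable, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                        positive: Hashable) -> Tuple[float, float, float]:
    """Precision, recall and F1 score for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "ham", "spam"]
# tp = 2, fp = 1, fn = 1, so each metric is about 0.667 here.
print(precision_recall_f1(y_true, y_pred, positive="spam"))
```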
For data classification, a feature subset is selected from all features using prior knowledge or determined by empirical experiments; however, the best subset varies with content, feature measures, and classifiers. This paper presents a filter-based algorithm that selects a subset of features using outlier cut-offs on the relevance between features and target categories. This algorithm uses the statistical techniques...
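The abstract does not say which statistical cut-off the algorithm uses. Assuming, purely for illustration, that a feature is kept when its relevance score exceeds Tukey's upper fence (Q3 + 1.5 × IQR), the selection step could look like the sketch below. The relevance scores are hypothetical; in practice they would come from some feature-category relevance measure, which is also left unspecified here.

```python
import statistics
from typing import Dict, List


def select_by_outlier_cutoff(relevance: Dict[str, float], whisker: float = 1.5) -> List[str]:
    """Keep features whose relevance score is an upper outlier.

    Assumption: the cut-off is Tukey's upper fence Q3 + whisker * IQR.
    """
    q1, _median, q3 = statistics.quantiles(relevance.values(), n=4)
    cutoff = q3 + whisker * (q3 - q1)
    return sorted(f for f, score in relevance.items() if score > cutoff)


# Hypothetical relevance of each word to the category "sport".
relevance = {
    "goal": 0.90, "match": 0.75,
    "the": 0.10, "a": 0.11, "of": 0.12, "and": 0.10, "in": 0.13,
    "to": 0.09, "is": 0.11, "on": 0.12, "for": 0.10, "it": 0.14,
}
print(select_by_outlier_cutoff(relevance))  # ['goal', 'match']
```

Lowering `whisker` lowers the cut-off and therefore keeps more features.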
Public comments on trending social events on Weibo have attracted a lot of attention in recent years. To remedy the lack of sentiment analysis of typical events on Weibo, this paper puts forward a classification method based on a sentiment dictionary, whose accuracy rate is close to 50%. This paper also proposes a sentiment classification method based on Naive Bayes in order to...
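A minimal multinomial Naive Bayes sentiment classifier with add-one (Laplace) smoothing is sketched below. The abstract does not specify the Naive Bayes variant, the features, or the smoothing, so these are assumptions, as are the tiny training set and the function names.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def train_nb(docs: List[Tuple[List[str], str]]) -> Model:
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return class_counts, word_counts, vocab


def predict_nb(model: Model, tokens: List[str]) -> str:
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_docs)  # log prior
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


docs = [
    ("good great happy".split(), "pos"),
    ("great movie love".split(), "pos"),
    ("bad awful sad".split(), "neg"),
    ("awful movie hate".split(), "neg"),
]
model = train_nb(docs)
print(predict_nb(model, "great happy movie".split()))  # pos
```

Working in log space avoids underflow when many small word probabilities are multiplied.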
The Support Vector Machine (SVM) is a widely used text classification method. Although SVM performs well in practice, it has two problems: the data distribution is not taken into consideration during classification, and its performance is strongly affected by noise. In view of this, a Fuzzy Support Vector Machine based on Manifold Discriminant Analysis (FSVM-MDA) is proposed and...
This paper presents a semantic naïve Bayes classification technique based on our tensor space model for text representation. In our work, each Wikipedia article is defined as a single concept, and a document is represented as a 2nd-order tensor. Our method extends conventional naïve Bayes by incorporating semantic concept features into term feature statistics under the tensor-space...
Since a large amount of information is stored in text format, text mining has very high application potential. One of the main applications of text mining is classifying texts by subject. In this paper, we propose a new method to increase classification accuracy and efficiency by considering different methods of Persian text classification. We used 5330 news...
With the rapid growth in the number of short texts, effective automatic classification of short text is a problem that needs to be solved in the information domain. Based on the characteristics of short text, this paper proposes Bagging_NB and Bagging_BSJ, two classification algorithms that improve on current ensemble classifiers. Traditional classifiers NB, SVM,...
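Both algorithm names point to bagging, which rests on a standard idea: train the same base learner on several bootstrap samples of the training set and combine the models by voting. The sketch below shows only that generic scheme, not the specific improvements in Bagging_NB or Bagging_BSJ; `one_nn` is a hypothetical toy base learner standing in for NB or another classifier, and the number of bags and the seed are arbitrary choices.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[float, str]
Predictor = Callable[[float], str]


def bootstrap_sample(data: List[Example], rng: random.Random) -> List[Example]:
    """Draw len(data) examples uniformly with replacement."""
    return rng.choices(data, k=len(data))


def bagging_predict(data: List[Example], learner: Callable[[List[Example]], Predictor],
                    x: float, n_bags: int = 15, seed: int = 0) -> str:
    """Train one base model per bootstrap sample and combine them by majority vote."""
    rng = random.Random(seed)
    votes = Counter(learner(bootstrap_sample(data, rng))(x) for _ in range(n_bags))
    return votes.most_common(1)[0][0]


def one_nn(sample: List[Example]) -> Predictor:
    """Toy base learner: 1-nearest neighbour on a single numeric feature."""
    return lambda x: min(sample, key=lambda item: abs(item[0] - x))[1]


data = [(0.0, "a"), (0.2, "a"), (0.4, "a"), (10.0, "b"), (10.2, "b"), (10.4, "b")]
print(bagging_predict(data, one_nn, 0.1))
```

An odd number of bags avoids ties in a two-class vote; the fixed seed only makes the example reproducible.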
SMS (Short Message Service) is still the primary choice of communication medium, even though mobile phones now offer a variety of messenger applications. However, the reduction in SMS tariffs has led to an increase in SMS spam, which some people use as a channel for advertising and fraud. Therefore, it has become an important issue, as it can bug and...
This research proposes an approach to text classification that uses a simple neural network called the Dynamic Text Classifier Neural Network (DTCNN). The network takes as input word vectors of variable dimension, called Dynamic Token Vectors (DTV), which avoid information loss. The proposed neural network is designed to classify both long and short texts into categories. The learning...