The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Nowadays, text classification (TC) becomes the main applications of NLP (natural language processing). Actually, we have a lot of researches in classifying text documents, such as Random Forest, Support Vector Machines and Naive Bayes. However, most of them are applied for English documents. Therefore, the text classification researches on Vietnamese still are limited. By using a Vietnamese news corpus,...
Authorship attribution has been well studied in terms of text classification with many diverse feature sets. However, finding topic independent features is hard and trained models with hand crafted features in one domain may not work in another domain. In this study we used a semi-supervised neural language model which is known as document embeddings for authorship attribution problem. This method...
Since short text is characterized of the short length, sparse features and strong context dependency, the traditional models have a limited precision. Motivated by this, this article offers an empirical exploration on a character-level model which implements a combination of convolutional neural network(CNN) and recurrent neural networks(RNN) for short text classification. Including the highway networks...
Huge amount of data in today's world are stored in the form of electronic documents. Text mining is the process of extracting the information out of those textual documents. Text classification is the process of classifying text documents into fixed number of predefined classes. The application of text classification includes spam filtering, email routing, sentiment analysis, language identification...
Nowadays the exponential growth of generation of textual documents and the emergent need to structure them increase the attention to the automated classification of documents into predefined categories. There is wide range of supervised learning algorithms that deal with text classification. This paper deals with an approach for building a machine learning system in R that uses K-Nearest Neighbors...
In the text classification, The similarity between the text need to be calculated, but the existing classification methods only consider the similarity between feature words and categories and does not involve the semantic similarity between feature words. In this paper, a new classification model LDA (Latent Dirichlet Allocation) — KNN (K-Nearest Neighbor) is proposed. LDA is used to solve the problem...
Nowadays, large volumes of text data are being produced in real time due to expansion of communication. It is necessary to organize this data for exploitation and extraction of useful information. Text classification based on the topic is one of the efficient solutions to this problem. Efficient algorithms are applied for text classification if they address high dimensional data. In this paper, a...
The demand of text classification is growing significantly in web searching, data mining, web ranking, recommendation systems and so many other fields of information and technology. This paper illustrates the text classification process on different dataset using some standard supervised machine learning techniques. Text documents can be classified through various kinds of classifiers. Labeled text...
Spoken language understanding (SLU) is one of the important problem in natural language processing, and especially in dialog system. Fifth Dialog State Tracking Challenge (DSTC5) introduced a SLU challenge task, which is automatic tagging to speech utterances by two speaker roles with speech acts tag and semantic slots tag. In this paper, we focus on speech acts tagging. We propose local coactivate...
Feature selection, which aims at obtaining a compact and effective feature subset for better performance and higher efficiency, has been studied for decades. The traditional feature selection metrics, such as Chi-square and information gain, fail to consider how important a feature is in a document. Features, no matter how much effective semantic information they hold, are treated equally. Intuitively,...
This paper presents a semantic naïve Bayes classification technique that is based upon our tensor space model for text representation. In our work, each of Wikipedia articles is defined as a single concept, and a document is represented as a 2nd-order tensor. Our method expands the conventional naïve Bayes by incorporating the semantic concept features into term feature statistics under the tensor-space...
Text classification is an important task in managing huge repository of textual content prevailing in various domains. In this paper, we propose to use sparse representation classifier (SRC) and support vector machines (SVMs) based classifiers using frequency-based kernels for text classification. We consider term-frequency (TF) representation for a text document. The sparse representation of an example...
With the rapid growth of the number of short text, how to effectively realize the automatic classification of short text is needed to be solved in the information domain. According to the characteristics of short text, this paper proposes Bagging_NB & Bagging_BSJ, which are two classification algorithms based on the improvement of current integrated classifiers. Traditional classifier NB, SVM,...
SMS (Short Message Service) is still the primary choice as a communication medium even though nowadays mobile phone is growing with a variety of communication media messenger applications. However, nowadays along with the SMS tariff reduction leads to the increase of SMS spam, as used by some people as an alternative to advertise and fraud. Therefore, it becomes an important issue as it can bug and...
This research proposes an approach for text classification that uses a simple neural network called Dynamic Text Classifier Neural Network (DTCNN). The neural network uses as input vectors of words with variable dimension without information loss called Dynamic Token Vectors (DTV). The proposed neural network is designed for the classification of large and short text into categories. The learning...
Text classification is an important task in natural language processing that aims to determine the category of a document. In the simplest settings, we adopt the bag-of-words model and convert documents in the corpus into term frequency vectors so that the classifier can process them. Because the bag-of-words model retains only the number of occurrences of each individual term, the classifier cannot...
In modern society, some famous news websites such as Sina and Times server to provide information every day for millions of users. But with the continuous development of information technology, the amount of disorder data is increasing. How to organize the text and make automatically text classification has become a challenge. The traditional manual classification of news text not only consumes a...
Classification systems adapts many machine learning techniques for quality performance in data classification. The neural networks has some unique characteristics and features which can handle high dimensional features and documents with noise and contradictory data. Classification is important to classify the input text into different domains appropriately. This paper give out a move towards classification...
The key of big text documents data analysis is to classify those text documents. To classify those text documents, it is necessary to represent those text documents as vectors which is vector space model (VSM). A powerful vector space model should remain the classification information with dimensions as little as possible. To achieve that, it is important to select most effective features for text...
In the current era, there is a high demand of accurate text identification and categorization methods in N - Lingual non-scanned and scanned machine printed documents, where N represents mono, bi, tri or multi mode. In this paper, a technical study and analysis is presented to show N-lingual document classification for normal text, printed and handwritten documents. Text classification for normal...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.