The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Nowadays, text classification (TC) becomes the main applications of NLP (natural language processing). Actually, we have a lot of researches in classifying text documents, such as Random Forest, Support Vector Machines and Naive Bayes. However, most of them are applied for English documents. Therefore, the text classification researches on Vietnamese still are limited. By using a Vietnamese news corpus,...
Text classification (TC) is a task that assigns a text to one or more classes and predefined categories. Constructing text classifiers with high accuracy is a vital task in biomedical field, given the wealth of information hidden in unlabelled documents. Because of large feature spaces, traditionally discriminative approaches, such as logistic regression and support vector machines with n-gram and...
The increased use of the Internet and the ease of access to online communities like social media have provided an avenue for cybercrimes. Cyberbullying, which is a kind of cybercrime, is defined as an aggressive, intentional action against a defenseless person by using the Internet, social media, or other electronic contents. Researchers have found that many of the bullying cases have tragically ended...
Today, it is not possible to use human power alone to cope with the increasing amount of data. For this reason, some automated methods are needed to group similar documents together or to place documents in predefined categories according to certain rules. The use of automated classification techniques is becoming increasingly important for this reason. In this study, a database consisting of 22 thousand...
Social media offer abundant information for studying people's behaviors, emotions and opinions during the evolution of various rare events such as natural disasters. It is useful to analyze the correlation between social media and human-affected events. This study uses Hurricane Sandy 2012 related Twitter text data to conduct information extraction and text classification. Considering that the original...
Authorship attribution has been well studied in terms of text classification with many diverse feature sets. However, finding topic independent features is hard and trained models with hand crafted features in one domain may not work in another domain. In this study we used a semi-supervised neural language model which is known as document embeddings for authorship attribution problem. This method...
In this paper, we present a novel method to classify directions of capital flows in Internet finance. Our method is different from previous text classification methods in that extracts key sentences which may directly reflect the semantics of input text before classification. We use the Bi-LSTM model as a classifier to process input sentences. In this paper, we represent the matrix of key sentences...
Since short text is characterized of the short length, sparse features and strong context dependency, the traditional models have a limited precision. Motivated by this, this article offers an empirical exploration on a character-level model which implements a combination of convolutional neural network(CNN) and recurrent neural networks(RNN) for short text classification. Including the highway networks...
In daily life, we use the internet for many purposes. The Internet makes easier our life and it has led to the providing to occur new technologies. Several smart devices that use the Internet infrastructure generates digital data in different formats and with different generation speeds. The evaluation of the generated data is carried out by the algorithms associated with the field of machine learning...
With the development of technology, people are entering the virtual world more and more. Parallel to this, the internet becomes a bigger network every day and it gets a complex structure depending on this growth. Achieving the desired information with structred data becomes an increasingly important problem. One of the useful ways to find solution for this problem is to divide this complex data into...
Huge amount of data in today's world are stored in the form of electronic documents. Text mining is the process of extracting the information out of those textual documents. Text classification is the process of classifying text documents into fixed number of predefined classes. The application of text classification includes spam filtering, email routing, sentiment analysis, language identification...
Nowadays the exponential growth of generation of textual documents and the emergent need to structure them increase the attention to the automated classification of documents into predefined categories. There is wide range of supervised learning algorithms that deal with text classification. This paper deals with an approach for building a machine learning system in R that uses K-Nearest Neighbors...
This work includes processing and classification of tweets which are written in Turkish language. Four different sector tweet datasets are vectorized with Word Embedding model and classified with Support Vector Machine and Random Forests classifiers and results have been compared. We have showed that sector based tweet classification is more successful compared to general tweets. Accuracy rates for...
Text feature selection plays an important role in text mining. Terms are the key players in document representation. The document representation can help application in following areas-indexing, summarization, classification, clustering and filtering. Text instances come with a challenge of high dimensional feature space and using such features can be extremely useful in text analysis. Hence it is...
Automatic text classification is the key technology to process and organize large-scale text data. It is well known that the high dimensionality of feature space is a main challenge for text classification. In order to attenuate such a problem as well as inspired by existing arts, we propose an effective text feature selection algorithm by novelly fusing the classical methodologies of Gini index and...
Text classification, a simple and effective method, is considered as the key technology to deal with and organize a large amount of text data. At present, the simple text classification is unable to meet the increasing of user's demand, hierarchical text classification has received extensive attention and has broad application prospects. Hierarchical feature selection algorithm is the key technology...
In the text classification, The similarity between the text need to be calculated, but the existing classification methods only consider the similarity between feature words and categories and does not involve the semantic similarity between feature words. In this paper, a new classification model LDA (Latent Dirichlet Allocation) — KNN (K-Nearest Neighbor) is proposed. LDA is used to solve the problem...
Nowadays, large volumes of text data are being produced in real time due to expansion of communication. It is necessary to organize this data for exploitation and extraction of useful information. Text classification based on the topic is one of the efficient solutions to this problem. Efficient algorithms are applied for text classification if they address high dimensional data. In this paper, a...
The demand of text classification is growing significantly in web searching, data mining, web ranking, recommendation systems and so many other fields of information and technology. This paper illustrates the text classification process on different dataset using some standard supervised machine learning techniques. Text documents can be classified through various kinds of classifiers. Labeled text...
The classification of text documents into a number of pre-defined categories has many application scenarios, for example the classification of news items into thematic sections. Documents to be classified are commonly represented by a bag-of-words feature vector. The bag-of-words model cannot handle two language phenomena: synonymy and polysemy, besides, dimensions of feature vectors are orthogonal...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.