Text classification (TC) has become one of the main applications of natural language processing (NLP). Much research has addressed the classification of text documents with methods such as Random Forest, Support Vector Machines, and Naive Bayes. However, most of these methods have been applied to English documents, so text classification research on Vietnamese is still limited. By using a Vietnamese news corpus,...
The increased use of the Internet and the ease of access to online communities such as social media have provided an avenue for cybercrime. Cyberbullying, a kind of cybercrime, is defined as an aggressive, intentional action against a defenseless person carried out through the Internet, social media, or other electronic content. Researchers have found that many of the bullying cases have tragically ended...
Today, human effort alone cannot cope with the increasing amount of data. Automated methods are therefore needed to group similar documents together or to place documents in predefined categories according to certain rules, and the use of automated classification techniques is becoming increasingly important. In this study, a database consisting of 22 thousand...
Natural Language Processing (NLP) is a prominent field that includes various subareas such as text classification, error correction, and machine translation. Compared with other languages, there is a limited number of Turkish NLP studies in the literature. In this study, we apply text classification to Turkish documents by using n-gram features. Our algorithm applies different preprocessing techniques,...
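As an illustration of the n-gram features mentioned in the abstract above, here is a minimal sketch. The abstract does not say whether the study uses character or word n-grams, or how they are weighted, so character n-grams with raw counts are an assumption made for this example, and the function name `char_ngrams` is hypothetical.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in `text`.

    Assumption: character-level n-grams with raw counts; the abstract
    does not specify the n-gram type or weighting that was used.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    # Slide a window of width n over the text, one character at a time.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# Example: bigram counts for a short Turkish word.
features = char_ngrams("kitap", 2)
```

The resulting counts can serve as a sparse feature vector for any standard classifier; a text shorter than `n` simply yields an empty counter.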
A huge amount of data in today's world is stored in the form of electronic documents. Text mining is the process of extracting information from those textual documents. Text classification is the process of assigning text documents to a fixed number of predefined classes. Applications of text classification include spam filtering, email routing, sentiment analysis, language identification...
YouTube is one of the most popular video sharing platforms in Indonesia. A person can react to a video by commenting on it. A comment may contain an emotion that can be identified automatically. In this study, we conducted experiments on emotion classification of Indonesian YouTube comments. A corpus containing 8,115 YouTube comments was collected and manually labelled using 6 basic emotion label...
Classification of text documents is commonly carried out using various bag-of-words models that are generated using feature selection methods. In these models, the selected features are used as input to well-known classifiers such as Support Vector Machines (SVM) and neural networks. In recent years, a technique called word embeddings has been developed for text mining, and deep learning models using...
Essays in different text genres have different ideas and writing methods. Predicting the text genre first helps achieve better accuracy when predicting literary success or finding beautiful words and sentences in an essay. It also helps set different standards for different text genres when writing is scored by computer. Words and structure can be effective in discriminating...
Feature representation plays an important role in text classification. Feature mapping based on label information is an algorithm suitable for Binary Relevance. Compared with conventional text representation, it keeps the dimensionality of the text representation under control by means of word embeddings. More importantly, it takes full advantage of the general characteristics of the label on text representation...
The basic idea behind classifier ensembles is to use more than one classifier in the expectation of improving overall accuracy. Classifier ensembles are known to boost overall classification performance depending on two factors, namely the individual success of the base learners and their diversity. One way of providing diversity is to use the same or different types of base learners. When the...
In this information era, the number of websites on the Internet has increased dramatically within a few years. All kinds of information and services can be retrieved from websites. However, the most valuable content of a website is still the text related to its topic or category. Yet only a few studies have focused on categorizing Thai-language information. The rest of the research...
Aiming at the multiclass text categorization problem, a classification algorithm based on the multiconlitron and the 1-a-r method is presented. The 1-a-r method is used to convert a multiclass categorization problem into several binary problems. A multiconlitron is constructed for each binary problem in the input space. For a text to be classified, its class is decided by the multiconlitrons. The classification experiments are...
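The conversion of a multiclass problem into several binary problems, as described in the abstract above, can be sketched as follows. This assumes that "1-a-r" stands for the common one-against-rest scheme. The multiconlitron classifier itself is not shown; the helper names and the highest-score decision rule are hypothetical choices for illustration, not the paper's method.

```python
def one_against_rest_labels(labels):
    """Build one binary label vector per class.

    For each class c, a document gets 1 if its label is c and 0
    otherwise, giving one binary problem per class.
    """
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def decide_class(scores):
    """Pick the class whose binary classifier returns the highest score.

    Assumption: each binary classifier outputs a real-valued score;
    the abstract does not state the actual decision rule.
    """
    return max(scores, key=scores.get)


# Four documents with three classes yield three binary problems.
binary_problems = one_against_rest_labels(
    ["sports", "politics", "sports", "tech"]
)
```

Any binary classifier can then be trained on each label vector in `binary_problems`, and `decide_class` combines their outputs for a new document.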
This work covers the processing and classification of tweets written in Turkish. Tweet datasets from four different sectors are vectorized with a word embedding model and classified with Support Vector Machine and Random Forest classifiers, and the results are compared. We show that sector-based tweet classification is more successful than the classification of general tweets. Accuracy rates for...
Because of the feature sparseness problem, conventional text classification methods rarely perform well on short texts. This paper presents a novel feature extension method based on the TNG model to solve this problem. The algorithm can infer not only the unigram word distribution but also the phrase distribution for each topic. We can build a feature extension library using the TNG algorithm...
In natural language processing and text mining, highly successful applications have been developed with recently introduced techniques. In particular, noticeable performance gains have been achieved in many applications by using word embeddings. In this paper, we propose a novel text mining method based on word embeddings and the Fisher vector. The automatic analysis of political records is selected...
Text feature selection plays an important role in text mining. Terms are the key players in document representation. Document representation supports applications in the following areas: indexing, summarization, classification, clustering, and filtering. Text instances come with the challenge of a high-dimensional feature space, and using such features can be extremely useful in text analysis. Hence it is...
Automatic text classification is a key technology for processing and organizing large-scale text data. It is well known that the high dimensionality of the feature space is a main challenge for text classification. To mitigate this problem, and inspired by existing work, we propose an effective text feature selection algorithm that fuses the classical methodologies of the Gini index and...
Text classification, a simple and effective method, is considered a key technology for handling and organizing large amounts of text data. At present, simple text classification cannot meet increasing user demand, and hierarchical text classification has received extensive attention and has broad application prospects. The hierarchical feature selection algorithm is the key technology...
A class imbalance problem appears in many real-world applications, e.g. fault diagnosis, text categorization, and fraud detection. When dealing with a large-scale imbalanced dataset, feature selection becomes a great challenge. To address it, this work proposes a feature selection approach based on a decision tree rule. The effectiveness of the proposed approach is verified by classifying a large-scale...
Nowadays, large volumes of text data are being produced in real time due to the expansion of communication. This data must be organized so that useful information can be exploited and extracted. Topic-based text classification is one efficient solution to this problem. Algorithms for text classification are efficient if they can handle high-dimensional data. In this paper, a...