Text classification (TC) has become one of the main applications of natural language processing (NLP). There is a lot of research on classifying text documents with methods such as Random Forest, Support Vector Machines and Naive Bayes. However, most of it is applied to English documents, so text classification research on Vietnamese is still limited. By using a Vietnamese news corpus,...
The increased use of the Internet and the ease of access to online communities like social media have provided an avenue for cybercrime. Cyberbullying, which is a kind of cybercrime, is defined as an aggressive, intentional action against a defenseless person by using the Internet, social media, or other electronic content. Researchers have found that many of the bullying cases have tragically ended...
Today, it is not possible to use human power alone to cope with the increasing amount of data. For this reason, automated methods are needed to group similar documents together or to place documents in predefined categories according to certain rules, and the use of automated classification techniques is becoming increasingly important. In this study, a database consisting of 22 thousand...
There has been a phenomenal increase in the utility of text classification (TC) in applications like targeted advertisement and sentiment analysis. Most applications demand that the model be efficient and robust, yet produce accurate categorizations. This is quite challenging, as there is a dearth of labelled training data because it requires assigning labels after reading the whole document. Secondly,...
Twitter is one of the most popular social media networks in the world. It is used by companies, media organizations and individual users alike. Media organizations use Twitter to announce news. Although the language of the news is formal, the words preferred for sharing information differ between organizations. In this study, we proposed an approach to recognize the Twitter...
Huge amounts of data in today's world are stored in the form of electronic documents. Text mining is the process of extracting information from those textual documents. Text classification is the process of classifying text documents into a fixed number of predefined classes. Applications of text classification include spam filtering, email routing, sentiment analysis, language identification...
Due to the vast amount of data, searching and obtaining relevant information on the web is a challenging task. Although a broad range of classification techniques have been proposed to improve information retrieval methods, many difficulties remain because of the continuous increase in the amount of web content, as well as its diversity. In this paper, we propose a method that...
At present, solving the high-dimensionality and text sparsity problems in short text classification is a great challenge. To resolve these problems, this paper proposes a method which takes the correlation between lexical items and tags into account before completing the Latent Dirichlet Allocation (LDA) topic model. Meanwhile, this paper adjusts the parameters of the Support Vector Machine (SVM) to find the optimal values by...
In this paper we tackle the issue of sentiment analysis of social network posts in a rarely targeted language, Slovak. There is a significant lack of research in this area for minor languages, as they often introduce additional language-specific issues for text processing. In the case of Slovak, common issues are high inflection, complex morphology and syntax. User-generated content of social networks...
In this paper we present the results of research on automatic extremist text detection. For this purpose an experimental dataset in the Russian language was created. Under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution...
Text classification is a process of classifying documents into predefined categories through different classifiers learned from labelled or unlabelled training samples. Many researchers who work on binary text classification attempt to find a more effective way to separate relevant texts from a large data set. However, current text classifiers cannot unambiguously describe the decision boundary between...
Beginner counselors are more likely to continue counseling according to their own interests, and they tend to make heavy use of closed-ended questions in order to confirm their interpretation with the client. When expert counselors teach counseling skills to beginner counselors, we consider that the reaction of a client to a beginner counselor's question is important to visualize...
This work covers the processing and classification of tweets written in Turkish. Tweet datasets from four different sectors are vectorized with a word embedding model and classified with Support Vector Machine and Random Forest classifiers, and the results are compared. We show that sector-based tweet classification is more successful than classification of general tweets. Accuracy rates for...
The Internet and social media provide a major source of information about people's opinions. Due to the rapidly growing number of online documents, obtaining and analyzing the desired opinionated information becomes a time-consuming and hard task. Sentiment analysis is the classification of sentiments expressed in documents. To improve classification performance, feature selection methods which...
Text feature selection plays an important role in text mining. Terms are the key players in document representation. The document representation can help applications in the following areas: indexing, summarization, classification, clustering and filtering. Text instances come with the challenge of a high-dimensional feature space and using such features can be extremely useful in text analysis. Hence it is...
Automatic text classification is the key technology to process and organize large-scale text data. It is well known that the high dimensionality of the feature space is a main challenge for text classification. To mitigate this problem, and inspired by existing work, we propose an effective text feature selection algorithm that fuses, in a novel way, the classical methodologies of Gini index and...
Text classification, a simple and effective method, is considered the key technology for handling and organizing a large amount of text data. At present, simple text classification is unable to meet increasing user demand; hierarchical text classification has received extensive attention and has broad application prospects. Hierarchical feature selection algorithm is the key technology...
Nowadays, large volumes of text data are being produced in real time due to the expansion of communication. It is necessary to organize this data for exploitation and extraction of useful information. Text classification based on topic is one of the efficient solutions to this problem. Efficient algorithms can be applied to text classification if they handle high-dimensional data. In this paper, a...
The demand for text classification is growing significantly in web searching, data mining, web ranking, recommendation systems and many other fields of information and technology. This paper illustrates the text classification process on different datasets using some standard supervised machine learning techniques. Text documents can be classified through various kinds of classifiers. Labeled text...
Social media generates a large volume of data through tweets and text messages during and after any disaster. The analysis and classification of the obtained data at the time of a disaster is essential for conveying the information to the appropriate rescue personnel. In this paper, an automated text classification system is proposed in order to classify the data effectively. The classification of...