The increased use of the Internet and the ease of access to online communities such as social media have provided an avenue for cybercrime. Cyberbullying, a form of cybercrime, is defined as an aggressive, intentional action against a defenseless person using the Internet, social media, or other electronic content. Researchers have found that many bullying cases have tragically ended...
A huge amount of data in today's world is stored in the form of electronic documents. Text mining is the process of extracting information from those textual documents. Text classification is the process of assigning text documents to a fixed number of predefined classes. Applications of text classification include spam filtering, email routing, sentiment analysis, language identification...
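To make "assigning documents to predefined classes" concrete, here is a minimal multinomial Naive Bayes classifier using only the Python standard library. It is a generic illustration of the task, not a method from the listed work; the toy spam/ham data, the whitespace tokenizer and the smoothing constant are assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercased whitespace tokenization; real systems use a proper tokenizer.
    return text.lower().split()


class MultinomialNB:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing constant

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # term counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_posterior(self, doc, label):
        # log P(c) + sum of log P(w | c) over tokens (up to a constant shared by all classes)
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + self.alpha * len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # skip words never seen in training
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, doc):
        # Pick the class with the highest (log) posterior score
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Hypothetical toy training data
train = ["win cash now", "cheap pills win", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
clf = MultinomialNB().fit(train, labels)
print(clf.predict("win cheap cash"))  # spam
```

Working in log space avoids numeric underflow when documents are long.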
At present, solving the high-dimensionality and text sparsity problems in short text classification is a great challenge. To resolve these problems, this paper proposes a method that takes into account the correlation between lexical items and tags before completing the Latent Dirichlet Allocation (LDA) topic model. In addition, this paper adjusts the parameters of the Support Vector Machine (SVM) to find the optimal values by...
In this paper we tackle the issue of sentiment analysis of social network posts in a little-studied language: Slovak. There is a significant lack of research in this area for minor languages, as they often introduce additional language-specific issues for text processing. In the case of Slovak, common issues are high inflection and complex morphology and syntax. User-generated content of social networks...
In this paper we present the results of research on automatic extremist text detection. For this purpose, an experimental dataset in the Russian language was created; under Russian legislation we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution...
Text feature selection plays an important role in text mining. Terms are the key players in document representation, which supports applications in the following areas: indexing, summarization, classification, clustering and filtering. Text instances come with the challenge of a high-dimensional feature space, and using such features can be extremely useful in text analysis. Hence it is...
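One widely used way to shrink a high-dimensional term space is to score each term with the chi-square statistic against a class and keep the top k. The abstract does not say which selection method the paper uses; chi-square is swapped in here purely as an illustration, and the toy documents and function names are hypothetical.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Chi-square score of each term for class `target` vs. the rest,
    using document-level term presence (2x2 contingency table)."""
    doc_sets = [set(d.lower().split()) for d in docs]
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_pos = Counter()  # documents in the target class containing the term
    df_all = Counter()  # all documents containing the term
    for terms, y in zip(doc_sets, labels):
        df_all.update(terms)
        if y == target:
            df_pos.update(terms)
    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]   # in class, has term
        b = df - a         # not in class, has term
        c = n_pos - a      # in class, lacks term
        d = n - n_pos - b  # not in class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(docs, labels, target, k):
    # Highest scores first; ties broken alphabetically for determinism
    scores = chi_square_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = ["win cash now", "cheap pills win", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
print(select_top_k(docs, labels, "spam", 2))  # ['meeting', 'win']
```

Chi-square rewards strong association in either direction, which is why "meeting" (never in spam) ranks as high as "win" (always in spam).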
Nowadays, large volumes of text data are being produced in real time due to the expansion of communication. This data must be organized before useful information can be exploited and extracted from it. Topic-based text classification is one efficient solution to this problem, but text classification algorithms are efficient only if they can handle high-dimensional data. In this paper, a...
As social media has become increasingly popular in the modern world, people are using these platforms to express their opinions about products, businesses, and services, and the need to categorize these consumer reviews has become prominent. One effective solution is sentiment analysis (SA), which has been an active research topic. The goal of SA is to automatically extract and classify user opinions...
In this paper we study authorship attribution in Arabic poetry using text mining classification. Several features, such as characters, poetry sentence length, word length, rhyme, meter and the first word in the sentence, are used as input data for the text mining classification algorithms Naïve Bayes (NB), Support Vector Machine (SVM), and Sequential Minimal Optimization (SMO). The experimental data set was divided...
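Several of the listed features are simple counts that can be computed directly from the text. The sketch below extracts a few of them (verse length, word length, character frequencies, first words) from a poem given as newline-separated verses; the input format and function name are assumptions, and rhyme and meter are omitted because they require Arabic prosodic analysis not described in the abstract.

```python
from collections import Counter


def stylometric_features(poem):
    """Simple per-poem features; assumes one verse per line, words separated by spaces."""
    verses = [v.split() for v in poem.splitlines() if v.strip()]
    words = [w for v in verses for w in v]
    return {
        "avg_verse_length": sum(len(v) for v in verses) / len(verses),  # words per verse
        "avg_word_length": sum(len(w) for w in words) / len(words),     # characters per word
        "char_freq": Counter(ch for w in words for ch in w),            # character counts
        "first_words": Counter(v[0] for v in verses),                   # first word of each verse
    }


features = stylometric_features("a bb ccc\ndd e")
print(features["avg_verse_length"])  # 2.5
```

Vectors built from such features could then be fed to classifiers like NB or SVM.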
Nowadays there are numerous user-generated restaurant reviews available on the Internet, which are considered valuable resources for customers' decision making. In reality, not every review available online is helpful to users, so there is a need to filter out unqualified reviews. There have been several studies on spam review detection that attempt to detect unqualified reviews using...
Text classification deals with allocating a text document to a predetermined class. Generally, this involves learning about a class from representations of documents belonging to that class. In this paper, we propose a classifier combination that uses a Multinomial Naïve Bayesian (MNB) classifier along with a Bayesian Network (BN) classifier. The results of the two classifiers are combined by taking an...
Natural language processing is an important part of data mining, which counts for a lot in the Internet age. Feature extraction affects the accuracy of text classification. This paper proposes a method of iterative feature space evolution to optimize the result: by adjusting the extended dictionary and the stop-word list, we repeatedly optimize the feature space to obtain a better classifier model. The final...
We offer an automated way of estimating the author of a song using only its lyrics. To this end, we introduce a complete text classification framework that takes raw lyrics data as input and reports the estimated songwriter. The performance of the system is evaluated based on its classification and retrieval ability on a large dataset of Turkish songs, which was collected in this study. The results...
Recently, many information retrieval (IR) based bug localization approaches have been proposed in the literature. These approaches use IR techniques to process a textual bug report and a collection of source code files to find buggy files. They output a ranked list of files sorted by their likelihood of containing the bug. Recent approaches can achieve reasonable accuracy; however,...
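The core ranking step described above can be sketched as a TF-IDF vector space model: treat the bug report as a query, each source file as a document, and sort files by cosine similarity. This is a common baseline chosen for illustration, not the method of any specific approach in the abstract; the file names, file contents and camelCase splitting are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split camelCase identifiers (e.g. "parseConfig" -> "parse", "config"), keep letters only
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    tfs = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for tf in tfs for term in tf)
    idf = {t: math.log(n / d) + 1.0 for t, d in df.items()}  # smoothed IDF
    return [{t: c * idf[t] for t, c in tf.items()} for tf in tfs], idf


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank_files(bug_report, files):
    """Return file names sorted by similarity to the bug report (most likely first)."""
    names = list(files)
    vectors, idf = tfidf_vectors([files[name] for name in names])
    query = {t: c * idf[t] for t, c in Counter(tokenize(bug_report)).items() if t in idf}
    scored = [(cosine(query, vec), name) for name, vec in zip(names, vectors)]
    return [name for _, name in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical source files and bug report
files = {
    "Parser.java": "class Parser { void parseConfig(String path) { readFile(path); } }",
    "Renderer.java": "class Renderer { void drawWindow() { paintCanvas(); } }",
}
print(rank_files("Crash when parsing config file path", files))  # ['Parser.java', 'Renderer.java']
```

Splitting identifiers matters because bug reports use natural language ("config file") while code uses compound names (`readFile`, `parseConfig`).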
With the development of the Internet, people share their emotional states or attitudes on online social websites, leading to an explosive increase in the scale of data. Mining the sentiment information behind these data helps people understand public opinion and social trends. In this paper, a sentiment analysis algorithm adapted to Weibo (microblog) data is proposed. Given that a Weibo post is usually short,...
We propose a novel algorithm, QuIET, for binary classification of texts. The method automatically generates a set of span queries from a set of annotated documents and uses the query set to categorize unlabeled texts. QuIET generates models that are human-understandable. We describe the method and evaluate it empirically against Support Vector Machines, demonstrating comparable performance for a...
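A span query matches when its terms occur close to each other in a text, similar in spirit to the slop parameter of Lucene's `SpanNearQuery`. The sketch below shows only how a set of two-term span queries could be applied to label a text (positive if any query matches); the queries are hand-written hypothetical examples, and QuIET's query-generation step is not reproduced here.

```python
from itertools import product


def positions(tokens, term):
    return [i for i, tok in enumerate(tokens) if tok == term]


def span_near(tokens, first, second, slop, in_order=True):
    """True if `second` occurs with at most `slop` intervening tokens from `first`
    (and after it, when `in_order` is set)."""
    for i, j in product(positions(tokens, first), positions(tokens, second)):
        if i == j or (in_order and j < i):
            continue
        if abs(j - i) - 1 <= slop:
            return True
    return False


def classify(text, queries):
    # Positive label if any (first, second, slop) query in the set matches
    tokens = text.lower().split()
    return any(span_near(tokens, a, b, slop) for a, b, slop in queries)


# Hypothetical query set
queries = [("interest", "rate", 1), ("rate", "cut", 2)]
print(classify("The central bank raised the interest rate again", queries))  # True
print(classify("rate of interest", queries))  # False
```

Because each query is just a few terms and a distance, a model of this form stays readable, which matches the abstract's emphasis on human-understandable models.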
Web topic detection is a crucial prerequisite for web-based data integration and a key component of vertical search engines, so it attracts much attention from both industry and academia. In this paper, we propose a domain-lexicon-based framework for Web topic detection. In our framework, we first extract the topical features from the web page. Next, we employ Vector Space...
The RLS-MARS (Regularized Least Squares-Multi Angle Regression and Shrinkage) feature selection model is used to select the relevant information, considering both keeping and leaving out the regularizer. The RLS-MARS model aims to find a series of directions in multidimensional space, leading the gradient vectors to change along those directions which would make the gradient matrix's...
With the development of microblogs, many studies have paid special attention to sentiment classification of microblog reviews. This paper summarizes three well-known methods for text classification and then improves one of them for sentiment analysis. We propose a new model in which we introduce efficient approaches to select features, calculate weights, train samples and evaluate classifier...
Although there have been previous studies attributing authorship to a specific individual, we found few efforts to group authors by their affiliations. This paper presents our work on classifying website forum posts by the author's group affiliation. Specifically, we seek to classify translated website forum posts by the (inferred) political affiliation of the author...