Text classification (TC) is a task that assigns a text to one or more predefined classes or categories. Constructing text classifiers with high accuracy is a vital task in the biomedical field, given the wealth of information hidden in unlabelled documents. Because of large feature spaces, traditionally discriminative approaches, such as logistic regression and support vector machines with n-gram and...
The increased use of the Internet and the ease of access to online communities like social media have provided an avenue for cybercrimes. Cyberbullying, a kind of cybercrime, is defined as an aggressive, intentional action against a defenseless person by using the Internet, social media, or other electronic content. Researchers have found that many of the bullying cases have tragically ended...
Social media offer abundant information for studying people's behaviors, emotions and opinions during the evolution of various rare events such as natural disasters. It is useful to analyze the correlation between social media and human-affected events. This study uses Twitter text data related to Hurricane Sandy (2012) to conduct information extraction and text classification. Considering that the original...
In this paper, we present a novel method to classify directions of capital flows in Internet finance. Our method differs from previous text classification methods in that it extracts key sentences which may directly reflect the semantics of the input text before classification. We use the Bi-LSTM model as a classifier to process input sentences. In this paper, we represent the matrix of key sentences...
Since short text is characterized by short length, sparse features and strong context dependency, traditional models have limited precision. Motivated by this, this article offers an empirical exploration of a character-level model which combines a convolutional neural network (CNN) and recurrent neural networks (RNN) for short text classification. Including the highway networks...
With the development of technology, people are entering the virtual world more and more. In parallel, the internet becomes a bigger network every day and acquires a more complex structure as it grows. Finding the desired information in structured data becomes an increasingly important problem. One useful way to solve this problem is to divide this complex data into...
Most of the data generated in the real world, such as images, speech signals, or fMRI scans, has high dimensionality. Therefore, dimensionality reduction techniques can be used to reduce the number of variables in that data, and system performance can then be improved. Because processing high-dimensional data increases complexity in both execution time and memory usage...
People share their opinions about things like products, movies and services using social media channels. The analysis of these textual contents for sentiments is a gold mine for marketing experts; thus, automatic sentiment analysis is a popular area of applied artificial intelligence. We propose a latent syntactic structure-based approach for sentiment analysis which requires only sentence-level polarity...
Huge amounts of data in today's world are stored in the form of electronic documents. Text mining is the process of extracting information from those textual documents. Text classification is the process of classifying text documents into a fixed number of predefined classes. Applications of text classification include spam filtering, email routing, sentiment analysis, language identification...
At present, solving the high-dimensionality and text-sparsity problems in short text classification is a great challenge. To resolve these problems, this paper proposes a method which takes into account the correlation between lexical items and tags before completing the Latent Dirichlet Allocation (LDA) topic model. Meanwhile, this paper adjusts the parameters of the Support Vector Machine (SVM) to find the optimal values by...
In this paper we tackle the issue of sentiment analysis of social network posts in Slovak, a language that has not been well targeted so far. There is a significant lack of research in this area for minor languages, as they often introduce additional language-specific issues for text processing. In the case of Slovak, common issues are high inflection, complex morphology and syntax. User-generated content of social networks...
In this paper we present the results of research on automatic extremist text detection. For this purpose, an experimental dataset in the Russian language was created. Under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution...
Text feature selection plays an important role in text mining. Terms are the key players in document representation. The document representation can help applications in the following areas: indexing, summarization, classification, clustering and filtering. Text instances come with the challenge of a high-dimensional feature space, and using such features can be extremely useful in text analysis. Hence it is...
In text classification, the similarity between texts needs to be calculated, but existing classification methods consider only the similarity between feature words and categories and do not involve the semantic similarity between feature words. In this paper, a new classification model, LDA-KNN (Latent Dirichlet Allocation, K-Nearest Neighbor), is proposed. LDA is used to solve the problem...
Nowadays, large volumes of text data are being produced in real time due to the expansion of communication. It is necessary to organize this data for exploitation and extraction of useful information. Topic-based text classification is one of the efficient solutions to this problem. Algorithms are efficient for text classification if they can handle high-dimensional data. In this paper, a...
As social media has become increasingly popular in the modern world, people are using these platforms to express their opinions about products, businesses, and services. The need for categorizing these consumer reviews has become prominent. One effective solution is sentiment analysis (SA), which has been an active research topic. The goal of SA is to automatically extract and classify user opinions...
In this paper, we study authorship attribution in Arabic poetry using text mining classification. Several features, such as characters, poetry sentence length, word length, rhyme, meter and the first word in the sentence, are used as input data for the text mining classification algorithms Naïve Bayes (NB), Support Vector Machine (SVM), and Sequential Minimal Optimization (SMO). The experimental data set was divided...
Nowadays there are numerous user-generated restaurant reviews available on the Internet, which are considered valuable resources for customers' decision making. In reality, not every review available online is helpful to users, so the need for filtering unqualified reviews has been recognized. There have been several studies on spam review detection that attempt to detect unqualified reviews using...
Information is one of the foremost assets in the present world. Within that, text information plays an essential role and can take diverse forms. Natural images that contain such text information are called scene text images. Semantic information of the image is used for content-based image retrieval, indexing and classification purposes. The first stage of text extraction is the text and non-text...
The exponential growth of unstructured messages generated by computer systems and applications in modern computing environments poses a significant challenge in managing and using the information contained in the messages. Although these data contain a wealth of information that is useful for advanced threat detection, the sheer volume, variety, and complexity of data make it difficult to analyze...