Nowadays, text classification (TC) has become one of the main applications of natural language processing (NLP). There is a large body of research on classifying text documents with methods such as Random Forest, Support Vector Machines and Naive Bayes. However, most of these methods have been applied to English documents, so text classification research on Vietnamese is still limited. By using a Vietnamese news corpus,...
Text classification (TC) is the task of assigning a text to one or more predefined classes or categories. Constructing text classifiers with high accuracy is a vital task in the biomedical field, given the wealth of information hidden in unlabelled documents. Because of large feature spaces, traditional discriminative approaches, such as logistic regression and support vector machines with n-gram and...
The increased use of the Internet and the ease of access to online communities like social media have provided an avenue for cybercrimes. Cyberbullying, which is a kind of cybercrime, is defined as an aggressive, intentional action carried out against a defenseless person using the Internet, social media, or other electronic content. Researchers have found that many of the bullying cases have tragically ended...
Today, it is not possible to use human power alone to cope with the increasing amount of data. For this reason, some automated methods are needed to group similar documents together or to place documents in predefined categories according to certain rules. The use of automated classification techniques is becoming increasingly important for this reason. In this study, a database consisting of 22 thousand...
This paper presents a novel approach to launch and defend against the causative and evasion attacks on machine learning classifiers. As the preliminary step, the adversary starts with an exploratory attack based on deep learning (DL) and builds a functionally equivalent classifier by polling the online target classifier with input data and observing the returned labels. Using this inferred classifier,...
Multilingual support in global applications that integrate and filter social media data is a significant challenge due to the cost of manually developing such social media filters for each language. Using the LITMUS landslide information system as an experimental platform, we compared six design alternatives with varied combinations of manually developed filters and automatically translated filters for...
There has been a phenomenal increase in the utility of text classification (TC) in applications like targeted advertisement and sentiment analysis. Most applications demand that the model be efficient and robust, yet produce accurate categorizations. This is quite challenging, as there is a dearth of labelled training data because labelling requires reading the whole document. Secondly,...
Social media offer abundant information for studying people's behaviors, emotions and opinions during the evolution of various rare events such as natural disasters. It is useful to analyze the correlation between social media and human-affected events. This study uses Hurricane Sandy 2012 related Twitter text data to conduct information extraction and text classification. Considering that the original...
Authorship attribution has been well studied in terms of text classification with many diverse feature sets. However, finding topic-independent features is hard, and models trained with hand-crafted features in one domain may not work in another domain. In this study we used a semi-supervised neural language model, known as document embeddings, for the authorship attribution problem. This method...
In this paper, we present a novel method to classify directions of capital flows in Internet finance. Our method is different from previous text classification methods in that it extracts key sentences which may directly reflect the semantics of the input text before classification. We use the Bi-LSTM model as a classifier to process input sentences. In this paper, we represent the matrix of key sentences...
Twitter enables large populations of end-users of software to publicly share their experiences and concerns about software systems in the form of micro-blogs. Such data can be collected and classified to help software developers infer users' needs, detect bugs in their code, and plan for future releases of their systems. However, automatically capturing, classifying, and presenting useful tweets is...
Since short text is characterized by short length, sparse features and strong context dependency, traditional models have limited precision. Motivated by this, this article offers an empirical exploration of a character-level model which combines a convolutional neural network (CNN) and recurrent neural networks (RNN) for short text classification. Including the highway networks...
In daily life, we use the Internet for many purposes. The Internet makes our lives easier and has led to the emergence of new technologies. Several smart devices that use the Internet infrastructure generate digital data in different formats and at different generation speeds. The evaluation of the generated data is carried out by the algorithms associated with the field of machine learning...
With the development of technology, people are entering the virtual world more and more. In parallel, the Internet becomes a bigger network every day and takes on a more complex structure as it grows. Reaching the desired information with structured data becomes an increasingly important problem. One of the useful ways to find a solution for this problem is to divide this complex data into...
Most of the data generated in the real world, such as images, speech signals, or fMRI scans, has high dimensionality. Therefore, dimensionality reduction techniques can be used to reduce the number of variables in that data and thereby improve system performance. Because the processing of high-dimensional data increases complexity in both execution time and memory usage...
Twitter is one of the most popular social media networks in the world. It is mostly used by corporate companies and media, as well as by individual users. Media organizations use Twitter to announce news. Although the language of the news is formal, the words preferred for sharing information differ for each organization. In this study, we proposed an approach to recognize the Twitter...
People share their opinions about things like products, movies and services using social media channels. The analysis of these textual contents for sentiments is a gold mine for marketing experts, thus automatic sentiment analysis is a popular area of applied artificial intelligence. We propose a latent syntactic structure-based approach for sentiment analysis which requires only sentence-level polarity...
An intelligent system uses machine learning algorithms to provide outputs to every input provided. The introduction of emotions in intelligent systems is required to create systems that are more similar to human beings and thus more reliable. In this paper, the idea of introducing the emotion ‘uncertainty’ in Intelligent Systems is proposed. A Semi-Automated Intelligent System is introduced in this...
A huge amount of data in today's world is stored in the form of electronic documents. Text mining is the process of extracting information from those textual documents. Text classification is the process of classifying text documents into a fixed number of predefined classes. The applications of text classification include spam filtering, email routing, sentiment analysis, language identification...
Type information of objects is very valuable in linked data. However, many linked datasets have incomplete type information. Traditional research on type inference is able to find missing types by means of reasoning, but it may become invalid on data with an incomplete or incorrect schema. In this paper, we propose a text-classification approach to type prediction in linked data. An Object Graph...