A base system exists for classifying problem tickets in the Enterprise domain. Two deep learning algorithms, Gated Recurrent Unit (GRU) and Long Short-Term Memory (LSTM), were investigated for solving this classification problem, and experiments were conducted with different parameters and numbers of layers. The paper presents the architectures tried, the results obtained, our conclusions...
Different types of classifiers were investigated for classifying problem tickets in the Enterprise domain. Building an accurate classifier remained challenging even after data cleaning and other accuracy-improving pre-processing techniques. An ensemble of classifiers gave better accuracy than the individual classifiers. The maximum accuracy was obtained by enhancing the ensemble...
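The abstract above does not say how the ensemble combines its members or how it was enhanced. As a minimal sketch, assuming plain majority (hard) voting over classifiers that are already trained, the combination step could look like this; the function name, ticket labels, and predictions are hypothetical.

```python
from collections import Counter
from typing import Sequence

def majority_vote(predictions: Sequence[Sequence[str]]) -> list[str]:
    """Combine per-classifier label predictions by hard majority voting.

    predictions[i][j] is classifier i's label for ticket j.
    Ties go to the label encountered first (Counter.most_common ordering).
    """
    n_items = len(predictions[0])
    combined = []
    for j in range(n_items):
        votes = Counter(p[j] for p in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined

# Hypothetical predictions from three classifiers for four tickets.
preds = [
    ["network", "hardware", "access", "network"],
    ["network", "software", "access", "hardware"],
    ["hardware", "software", "access", "network"],
]
print(majority_vote(preds))  # ['network', 'software', 'access', 'network']
```

Hard voting only needs each member's predicted label; soft voting over class probabilities is a common alternative when the members expose calibrated scores.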
With the increasing number of Mauritian-owned websites on the internet, classifying them is becoming highly important. Our objective in this research is to classify a list of websites into seven broad categories, namely education, entertainment, government, health, tourism, sports, and shopping. The homepages of three hundred and nineteen websites were used in this study. We have exploited...
With the expansion of Web 2.0, huge amounts of data, notably news articles, are produced every day. This content needs to be exploited to extract relevant information and build knowledge databases. In this context, processing the temporal dimension of language and extracting temporal information from electronic news articles is becoming a prominent task. To this end, we propose...
The automatic insertion of diacritics in electronic texts is necessary for a number of languages, such as French, Romanian, Croatian, Sindhi, and Vietnamese. When diacritics are removed from a word and the resulting string of characters is not itself a word, the diacritics are easy to recover. However, sometimes the resulting string is also a word, possibly with different grammatical properties or a...
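The easy and hard cases described above can be illustrated with a small sketch: strip diacritics using Unicode NFD decomposition, index a word list by its stripped form, and look up how many words a stripped string could come from. The tiny French lexicon and the helper names are hypothetical, and this dictionary lookup is only an illustration of the problem, not the paper's method.

```python
import unicodedata
from collections import defaultdict

def strip_diacritics(word: str) -> str:
    # NFD splits "é" into "e" + a combining accent; combining marks are then dropped.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def build_index(lexicon: list[str]) -> dict[str, set[str]]:
    # Map each diacritic-free form to every lexicon word that reduces to it.
    index: dict[str, set[str]] = defaultdict(set)
    for word in lexicon:
        index[strip_diacritics(word)].add(word)
    return index

def restore(stripped: str, index: dict[str, set[str]]) -> list[str]:
    # One candidate: recovery is trivial. Several: context is needed to choose.
    return sorted(index.get(stripped, set()))

# Hypothetical mini-lexicon of French words.
LEXICON = ["été", "ou", "où", "élève", "élevé"]
index = build_index(LEXICON)
print(restore("ete", index))  # ['été']  ("ete" is not itself a word)
print(restore("ou", index))   # ['ou', 'où']  ("or" vs "where": ambiguous)
```

In the ambiguous case ("ou", "eleve"), a real system has to use surrounding context, which is exactly the situation the abstract singles out.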
Organizations derive policies from a wide variety of sources, such as business plans, laws, regulations, and contracts. However, no efficient process yet exists for quickly finding or automatically deriving policies from uncontrolled natural language sources. The goal of our research is to assure compliance with established policies by ensuring policies in existing natural language texts are...
Community Question Answering (CQA) has become a popular and effective means of seeking information on the Web. Users can now post a question in natural language on a popular CQA portal and rely on other users to provide answers. These online collaborative services are attracting users and questions at an explosive rate, while how to correctly...
In this paper, we present a new mathematical model based on the Vector Space Model and consider its implications. The proposed method is evaluated through several experiments in which we classify newspaper articles from the English Reuters-21578 data set and the Taiwanese China Times 2005 data set. The Reuters-21578 data set is a benchmark data set for automatic...
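The new model itself is not described in this excerpt. For reference, here is a minimal sketch of the standard Vector Space Model it builds on: documents become term-frequency vectors, similarity is the cosine of the angle between them, and a test document takes the category of its most similar training document. The whitespace tokenisation, the 1-nearest-neighbour rule, and the toy documents and labels are all assumptions for illustration.

```python
import math
from collections import Counter

def tf_vector(text: str) -> Counter:
    # Bag-of-words term frequencies; lowercase whitespace tokenisation is a simplification.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(text: str, training: list[tuple[str, str]]) -> str:
    # Assign the label of the most similar training document (1-nearest neighbour).
    query = tf_vector(text)
    best_label, _ = max(
        ((label, cosine(query, tf_vector(doc))) for doc, label in training),
        key=lambda pair: pair[1],
    )
    return best_label

# Hypothetical toy training set.
TRAINING = [
    ("oil prices rise as crude supply falls", "crude"),
    ("wheat and corn harvest exports grow", "grain"),
]
print(classify("crude oil supply", TRAINING))  # crude
```

Practical VSM classifiers usually add TF-IDF weighting and compare against class centroids or many neighbours; the bare version above only shows the geometry.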
In this paper, we present a Neural Network (NN) model for classifying Arabic texts. We propose using Singular Value Decomposition (SVD) as a preprocessor for the NN in order to further reduce the data in both size and dimensionality. SVD makes the data more amenable to classification and makes the training process converge faster. Specifically, the effectiveness...
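A minimal sketch of the SVD preprocessing idea, not the authors' pipeline: assuming a documents-by-terms matrix `X` and NumPy, a rank-`k` truncated SVD projects each document onto the top `k` singular directions, giving a much smaller input vector for the neural network. The function name, matrix, and `k` are hypothetical.

```python
import numpy as np

def svd_reduce(X: np.ndarray, k: int) -> np.ndarray:
    """Project rows of X (documents x terms) onto the top-k right singular vectors."""
    # full_matrices=False gives the economy-size decomposition X = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # X @ Vt[:k].T equals U[:, :k] * s[:k]: k-dimensional document features.
    return U[:, :k] * s[:k]

rng = np.random.default_rng(0)
X = rng.random((6, 50))  # hypothetical: 6 documents, 50 term weights each
Z = svd_reduce(X, k=3)
print(X.shape, "->", Z.shape)  # (6, 50) -> (6, 3)
```

New documents would be projected with the same `Vt[:k]` learned on the training set, so the network always sees `k` inputs per document.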
Sentiment analysis aims to predict sentiment orientation automatically. Traditional methods for this problem are mostly based on supervised learning, which is time-consuming and hard to extend. In this paper, we propose a novel sentiment analysis method based on unsupervised learning combined with a set of language rules. It is not necessary to have a positive sentiment dictionary beforehand...
Speech production and the phonetic features of speech gradually improve in children who receive auditory feedback after cochlear implantation or through hearing aids. In this study, voice disorders in children with cochlear implants and hearing aids are classified. Thirty Persian children participated in the study: 6 children in each of levels 1 to 3 and 12 in level 4. Voice samples of 5 isolated Persian words...
Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. The categorization algorithm is the most critical factor in the performance of a text categorization system. Inductive learning classifiers are put forward. Very accurate text categorization results can be learned...
This paper presents the building of a part-of-speech (POS) tagger for the Malayalam language using Support Vector Machines (SVM). A POS tagger plays an important role in natural language applications such as speech recognition, natural language parsing, information retrieval, and information extraction. This supervised machine learning approach to POS tagging requires a large annotated training corpus to tag...
To deal with noise samples in the training dataset, this paper proposes a new method, applicable to text classification, for reducing the size of the training dataset. The method describes the distribution of the training dataset by the representativeness score of each sample within the class it belongs to, so as to distinguish representative samples from noise samples in each class. The new method...
Since automatic word segmentation of Chinese text causes a loss of information, a word segmentation method that uses the lexical chunk as the segmentation unit is proposed. Chinese text is first segmented with a traditional segmentation method; then the mutual information between two lexical entries and the adjacency frequency of two or more lexical entries are calculated, and according to these calculated values, the method judges and...
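The exact formulas are not given in the excerpt above. As a hedged illustration, the usual pointwise mutual information between two adjacent entries x and y, PMI(x, y) = log2(P(xy) / (P(x)P(y))), can be estimated from a pre-segmented corpus as below; a high score marks a pair that co-occurs far more often than chance and is therefore a candidate for merging into one lexical chunk. The corpus, the helper name, and the merge interpretation are assumptions.

```python
import math
from collections import Counter

def adjacent_pmi(sentences: list[list[str]]) -> dict[tuple[str, str], float]:
    """PMI of each adjacent pair of lexical entries, estimated from raw counts."""
    unigrams: Counter = Counter()
    bigrams: Counter = Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))  # adjacency frequency of each pair
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), c_xy in bigrams.items():
        p_xy = c_xy / n_bi
        p_x, p_y = unigrams[x] / n_uni, unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores

# Hypothetical corpus already segmented by a conventional segmenter.
corpus = [
    ["计算机", "科学", "的", "发展"],
    ["计算机", "科学", "的", "应用"],
    ["发展", "的", "方向"],
]
scores = adjacent_pmi(corpus)
print(max(scores, key=scores.get))  # ('计算机', '科学'): candidate chunk
```

A threshold on the score (and a minimum adjacency frequency, to avoid rewarding rare pairs) would decide which pairs are actually merged.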
Orientation detection is an important preprocessing step for accurate recognition of text from document images. Many existing orientation detection techniques rely on the fact that ascenders are more frequent than descenders in Roman-script text, but this approach is not applicable to documents in other scripts such as Urdu and Arabic. In this paper, we propose a discriminative learning approach...
Automatic document classification has been a subject of research since the early 1960s. However, further research is still needed and possible, because the results obtained so far can be enhanced and refined. Although much literature has been written on the subject, very little research has been reported on the automatic classification of Arabic documents, none of which...
Type functional application is employed to resolve overlapping ambiguity in Chinese word segmentation. Unlike traditional methods, which treat Chinese overlapping ambiguity as a classification problem, the proposed approach regards this task as a sentence type calculus problem. The method is based on type theory, and its benefit is that...
Most Chinese text classification methods are based on Chinese word segmentation and a bag-of-words (BOW) representation, so classification performance relies largely on the accuracy of segmentation. Unfortunately, segmentation cannot achieve perfect precision and disambiguation. To solve this problem, a novel Chinese text classification method using a string kernel is presented. The string kernel computes...
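The excerpt does not say which string kernel is used; a common choice in text classification is the string subsequence kernel, which weights gapped character subsequences. As a simpler illustration of the same idea, a p-spectrum kernel compares two texts by counting the character substrings of length p they share, so no word segmentation is required. The function names and the default `p = 2` are assumptions.

```python
import math
from collections import Counter

def spectrum(text: str, p: int) -> Counter:
    # Counts of every contiguous character substring of length p (no segmentation).
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))

def spectrum_kernel(s: str, t: str, p: int = 2) -> int:
    # Inner product of the two p-gram count vectors.
    a, b = spectrum(s, p), spectrum(t, p)
    return sum(a[g] * b[g] for g in a if g in b)

def normalized_kernel(s: str, t: str, p: int = 2) -> float:
    # Cosine-normalised kernel in [0, 1]; 0.0 if either text is shorter than p.
    denom = math.sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))
    return spectrum_kernel(s, t, p) / denom if denom else 0.0

print(spectrum_kernel("中文文本分类", "文本分类方法"))  # 3 (shared: 文本, 本分, 分类)
```

Because the kernel is an inner product in p-gram space, it can be plugged directly into a kernel classifier such as an SVM through a precomputed Gram matrix.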
Previous research on improving the performance of Internet search engines has focused on classifying questions, sentences, and user goals, but not on classifying sentences and phrases by query intention versus non-query intention. This paper investigates a system for classifying sentences and phrases by query intention and non-query intention, by first analyzing previous work and, based on...