Natural language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages. Ambiguity is one of the problems that has long been a great challenge for computational linguists. This paper concentrates on the problem of target word selection in Myanmar-to-English machine translation, for which the approach is directly...
Sentiment lexicons are language resources widely used in opinion mining and important tools in unsupervised sentiment classification. We present a comparative study of sentiment classification of reviews on six different domains using sentiment lexicons from different sources. Our results highlight the tendency of a lexicon's performance to be imbalanced towards one class, and indicate lexicon accuracy...
In this work, we investigate sentiment mining of Arabic text at both the sentence level and the document level. Existing research in Arabic sentiment mining remains very limited. For sentence-level classification, we investigate two approaches. The first is a novel grammatical approach that employs the use of a general structure for the Arabic sentence. The second approach is based on the semantic...
This paper investigates lexical stress detection for Chinese learners of English, where a combined differential acoustic feature is developed to represent the lexical stress of polysyllabic words in continuous speech. The frame-averaged features and the intra-word contextual information can be input to the classifiers without normalization. The word-based stress detection method proposed in...
In this paper, we propose a novel heuristic approach to segment recognizable symbols from online Kannada word data and perform recognition of the entire word. Two different estimates of the first derivative are extracted from the preprocessed stroke groups and used as features for classification. Estimate 2 proved better, resulting in 88% accuracy, which is 3% more than that achieved with estimate 1. Classification...
We present a complete online handwritten character recognition system for Indian languages that handles the ambiguities in segmentation as well as recognition of the strokes. The recognition is based on a generative model of handwriting formation, coupled with a discriminative model for classification of strokes. Such an approach can seamlessly integrate language and script information in the generative...
In this paper, a new method of multi-layer classifier integration based on single classifiers, called Auto Weight Adjust, is proposed. Among the most widely used classifiers, the Maximum Entropy (ME) model has excellent performance, and Naïve Bayesian (NB) is preferred by researchers because it is simple and useful. In our experiments, we therefore chose ME and NB as the single classifiers and use the ME classifier result...
In this paper, we present a new mathematical model based on a “Vector Space Model” and consider its implications. The proposed method is evaluated by performing several experiments. In these experiments, we classify newspaper articles from the English Reuters-21578 data set and the Taiwanese China Times 2005 data set using the proposed method. The Reuters-21578 data set is a benchmark data set for automatic...
This paper presents a high-performance bilingual OCR system for printed Thai and English text. Given the complex nature of both the Thai and English languages, the first stage is to identify languages within different zones by using geometric properties for differentiation. The second stage is the process of character recognition, in which the technique developed includes a feature extractor and a classifier...
We present a study of designing compact recognizers of handwritten Chinese characters using multiple-prototype based classifiers. A modified Quickprop algorithm is proposed to optimize a sample-separation-margin based minimum classification error objective function. A split vector quantization technique is used to compress classifier parameters. Benchmark results are reported for classifiers with different...
In this paper, we compare the experimental results for Tamil online handwritten character recognition using HMM and Statistical Dynamic Time Warping (SDTW) as classifiers. HMM was used for a 156-class problem. Different feature sets and values for the HMM states & mixtures were tried and the best combination was found to be 16 states & 14 mixtures, giving an accuracy of 85%. The features used...
To address problems in traditional Bayesian classification such as a fixed training set and incomplete information, an incremental learning mechanism is introduced. Taking into account the characteristics of question sentences in Chinese question answering systems, a Semi-Naive Bayesian model is used to construct the classifier. In order to make the prior distribution of samples tend toward a uniform distribution, samples...
Text categorization, the assignment of natural language texts to one or more predefined categories based on their content, is an important component in many information organization and management tasks. The categorization algorithm is the most critical factor in the performance of a text categorization system. Inductive learning classifiers are put forward. Very accurate text categorization results can be learned...
A number of effective classification algorithms have been developed for spoken language recognition, and it has been common practice in the NIST Language Recognition Evaluations (LREs) to apply information fusion to boost the performance of the recognition system. This paper investigates the fusion of multiple output scores generated using different classifiers that complement to further...
Extracting useful information from user-generated text on the web is an important area of ongoing research in natural language processing, machine learning, and data mining. Online tools like emails, news groups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide an added advantage of anonymity. Millions of people post information on different...
Machine learning has become the predominant problem-solving strategy for computational linguistics problems in the last decade. In this paper, we present an implemented machine learning system for the automatic identification of non-referential pronouns in Arabic texts. Our system is based on a Bayesian network which has shown its efficiency for modeling NLP problems. We have evaluated our approach...
To overcome the limitations of traditional text classification approaches based on bag-of-words representation and to effectively incorporate linguistic knowledge and a conceptual index into the text vector space model, based on two thesauri, HowNet and Tongyici Cilin (hereinafter referred to as Cilin), we use a semantic vector to describe a document instead of the traditional keyword vector, which is based on...
This paper compares the performance of keyword and machine learning-based chest x-ray report classification for Acute Lung Injury (ALI). ALI mortality is approximately 30 percent. High mortality is, in part, a consequence of delayed manual chest x-ray classification. An automated system could reduce the time to recognize ALI and lead to reductions in mortality. For our study, 96 and 857 chest x-ray...
Analyzing requirements for consistency and checking them for correctness can require significant effort, particularly if they have not been maintained with a requirements management tool (e.g., DOORS) or specified in a machine-readable notation. By restricting the number of requirements being analyzed, fewer opportunities exist for introducing errors into the analysis. This can be accomplished by...
Motivated by the numerous applications of analysing opinions in multi-domain scenarios, this paper studies the potential of a still rarely considered approach to the problem of multi-domain sentiment analysis based on SentiWordNet as a lexical resource. SentiWordNet scores are exploited together with additional features to assign a polarity to a text using machine learning. On the other hand, a rule-based...