The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Pursuing on the analysis of product reviews, an unsupervised product features categorization method is proposed. Morphemes as smallest linguistic meaningful unit are induced in measuring the intra relationship among product features instead of words. Opinion words around product features are chosen to represent the inter relationship among product features instead of full context information. The...
Word Sense Disambiguation (WSD) is the task of selecting the meaning of a word based on the context in which the word occurs. The principal statistical WSD approaches are supervised and unsupervised learning. The Lesk method is an example of unsupervised disambiguation. We present a measure for sense assignment useful for the simple Lesk algorithm. We use word co-occurrences of the gloss and the context,...
Word Sense Disambiguation in text is still a difficult problem as the best supervised methods require laborious and costly manual preparation of training data. On the other hand, the unsupervised methods express significantly lower accuracy and produce results that are not satisfying for many application. The goal of this work is to develop a model of Word Sense Disambiguation which minimises the...
We applied a structure learning model, Max-Margin Structure (MMS), to natural language processing (NLP) tasks, where the aim is to capture the latent relationships within the output language domain. We formulate this model as an extension of multi-class Support Vector Machine (SVM) and present a perceptron-based learning approach to solve the problem. Experiments are carried out on two related NLP...
One nodus existing in Chinese word segmentation is the ambiguity problem of which more than 85% are crossing ambiguity, therefore it is significant to decrease the error in dealing with the crossing ambiguity. Taking the advantage of the characteristics of the crossing ambiguity string, a novel method based on the mutual information and t-test difference is proposed to deal with the ambiguities in...
There are many connotative semantic features in Chinese which can help Chinese named entity recognition. Moreover, one of the important strongpoint of maximum entropy model is that it can syncretize features in different granularity and level. With that in mind, many Chinese named entity semantic knowledge bases were established by extracting information from corpus in this paper. However, because...
In this paper, we would like to introduce a new approach to recover Vietnamese text's accents. Given a Vietnamese text in which accents are lost, our goal is to seek for a recovered text that yields a best lexical probability. Using a dynamic programming approach, we first build a model of language for Vietnamese as a lexical database which gives lexical probabilities to Vietnamese sentences. Second,...
The rich heritage of Chinese tea culture has attracted an increasing number of people in the world, but the translating of such classical and specialized literature proves to be extremely arduous. Machine translation (MT) is introduced to facilitate the decrypting process. However, when the popular online translation Systran is tried bi-directionally on a high-frequency wordlist from 24 ancient tea...
This paper presents a maximum entropy tagger for the identification of intra-sentential temporal relations between temporal expressions and eventualities mediated by temporal signals in constructions of the kind "eventuality + signal + temporal relation". The tagger reports an accuracy rate of 90.8%, outperforming the baseline (81.8%). One of the main results of this work is represented...
Covering ambiguity is a vital issue in Chinese word segmentation. Challenges are that disambiguation is depending on the contextual information. This paper collected contextual information statistics of covering ambiguity words and found a context calculation mode by using log likelihood ratio. A weighing calculation formula is designed for considering contextual informationpsilas window size and...
Clustering semantically related terms is crucial for many applications such as document categorization, and word sense disambiguation. However, automatically identifying semantically similar terms is challenging. We present a novel approach for automatically determining the degree of relatedness between terms to facilitate their subsequent clustering. Using the analogy of ensemble classifiers in machine...
Natural languages are typically replete with homographs, words which have more than one meaning. Consequently, machine understanding of natural language sentences sometimes suffers from certain ambiguities in getting the correct sense of a word in a given sentence. In this work we present a trainable model for word sense disambiguation (WSD) for resolving this ambiguity. The proposed model applies...
Word sense disambiguation (WSD) is a process of identifying proper meaning of words that may have multiple meanings. It is regarded as one of the most challenging problems in the field of natural language processing (NLP). Nepali Language also has words that have multiple meanings, thus giving rise to the problem of WSD in it. In this paper, we investigate the impact of NLP resources like morphology...
The paper describes the development and usage of a grammar developed to extract definitions from documents. One of the most important practical usages of the developed grammar is the automatic extraction of definitions from web documents. Three evaluation scenarios were run, the results of these experiments being the main focus of the paper. One scenario uses an e-learning context and previously annotated...
The effective automatic judgment of the Chinese words sentiment polarity, the most important part of the Chinese sentiment analysis, can improve the building of the subjectivity lexicon and the efficiency of the sentiment analysis. The technology of the Chinese word subjectivity and objectivity judgment is discussed and analyzed, the subjectivity dictionary is defined and the subjective feature model...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.