Text is a vital feature in applications of computer vision. Traditional Chinese character recognition techniques are mainly based on optical character recognition (OCR); however, they cannot obtain satisfactory results from images affected by complex conditions, such as viewpoint differences, scale changes, added noise, and complex backgrounds. To solve these problems, inspired by the SIFT descriptor,...
In this paper, we present a new method to extract product entities from Chinese customer reviews. The approach requires no segmentation, no domain dictionary, and little prior domain knowledge, which makes it more suitable for resource-limited domains. Quite different from previous work, the proposed method first gets the entity candidates using a general version of the bootstrapping algorithm and then distributes...
Named entity (NE) extraction for the Thai language is a difficult and time-consuming task because Thai sentences are composed of a series of words formed by a stream of characters. Moreover, there are no delimiters (blank spaces) to show word boundaries. Currently, most named entity extraction methods for the Thai language are associated with word segmentation and part-of-speech (POS) tagging processes...
This paper presents Thai named entity recognition (NER) systems using Conditional Random Fields (CRFs). In previous studies of Thai NER, no system has used syllable-segmented data as input; all have used word-segmented data. Since the results of some studies on NER in other languages, such as Chinese, show that character-based systems perform better than word-based ones, this study...
As more and more users post their reviews on the web, opinion mining has become increasingly important. Polarity analysis and opinion mining is the process of automatically extracting polarity and opinions with computer technology. This paper focuses on mining opinions from Chinese review sentences, obtaining a comprehensive evaluation of products, and ranking products by a single feature or by all features. Methods are...
We applied a structure learning model, Max-Margin Structure (MMS), to natural language processing (NLP) tasks, where the aim is to capture the latent relationships within the output language domain. We formulate this model as an extension of multi-class Support Vector Machine (SVM) and present a perceptron-based learning approach to solve the problem. Experiments are carried out on two related NLP...
Building a domain model from a specialized corpus requires identifying candidate terms. It also includes identifying semantic relations between terms. Once this model is constructed, it can be used for many information retrieval tasks. In this process, multi-word terms are of great importance. On the one hand, they constitute domain-relevant candidate terms. On the other hand, syntactic relations...
In this paper, we report on our participation in the ESTER 2 (Evaluation des Systemes de Transcription Enrichie d'Emissions Radiophoniques) evaluation campaign, on the Named Entity Recognition for French track. After describing the ESTER 2 goals and guidelines, we present our deep robust parser. Then we show how we adapt the existing French NER module of this parser to the ESTER 2 task. The results we...
With the increased demand for English communication, various styles of learning support methods have been proposed and provided to Japanese learners. However, many learners still find it hard to read, write, and speak in English. Regardless of language difference, understanding the other's intention and emotional status accurately and expressing what they think or feel to the others...
Named entity relations are a foundation of semantic networks, ontologies, and the Semantic Web, and are widely used in information retrieval and machine translation, as well as in automatic question answering systems. Relation feature selection and extraction are two key issues. The location features possess excellent computability and operability, and the semantic features have strong intelligibility...
This paper briefly reports on a collection of Korean word associations (KorWA). Then, by constructing a Korean semantic network based on the data, it identifies some structural features and discusses the network's potential to support semantic study and language learning.
This paper describes an online handwritten Japanese character string recognition system based on conditional random fields, which integrates the information of character recognition, linguistic context and geometric context in a principled framework, and can effectively overcome the variable length of candidate segmentation. For geometric context, we employ both unary and binary feature functions,...
Automatic assessment of word stress errors is an integral part of an oral language grading system. However, two problems are yet to be solved: the properties of a vowel depend on its context, and data for the different vowel classes are sparse. This paper briefly introduces a hybrid method consisting of both traditional prosodic features and the proposed context-dependent strategies. In classification...
The process of evaluating, classifying, and assigning bugs to programmers is a difficult and time-consuming task that depends greatly on the quality of the bug report itself. It has been shown that the quality of reports originating from bug trackers or ticketing systems can vary significantly. In this research, we apply information retrieval (IR) and natural language processing (NLP) techniques...
Phrasal verbs, which generally comprise a verb followed by a preposition, are a commonly occurring feature of English. A phrasal verb can acquire entirely different meanings in different contexts. Because their meanings are highly context-dependent, phrasal verbs can be disambiguated only by a technique that uses semantic information pertaining to the context. This paper presented...
This paper proposes a novel approach to improve kernel-based word sense disambiguation (WSD). We first explain why linear kernels are more suitable for WSD and many other natural language processing problems than translation-invariant kernels. Based on the linear kernel, two external knowledge sources are integrated. One comprises a set of linguistic rules to find the crucial features. For the...
Multiword chunking is designed as a shallow parsing technique to recognize the external constituent and internal relation tags of a chunk in a sentence. In this paper, we propose a new solution to this problem. We design a new relation tagging scheme to represent different intra-chunk relations and run several feature-engineering experiments to select the best baseline statistical model. We...
Chinese named entity recognition (NER) is studied in two directions: inner structure and outer surroundings. Inner structural analyses derive the composition of person, location, and organization names from a linguistic point of view. However, inner structural rules for named entities provide only necessary, not sufficient, conditions for a sequence of Chinese characters to be an entity name. Whether a...