With the rapid development of the Internet, extracting personal relations from the Internet has become an important research topic in information extraction. However, current relation extraction research focuses mainly on English, and far less work addresses Chinese. At the same time, there are two main problems in current personal relation extraction approaches: 1) it...
A statistical approach that uses the context surrounding a space as its main feature has been widely used for Thai sentence breaking. However, it does not capture contextual behaviour across the entire sentence. Moreover, it does not take advantage of Thai grammar rules to determine a sentence boundary. This paper proposes a hybrid approach that integrates a rule-based method...
In this paper, we present the problem of appropriate feature selection for constructing a Maximum Entropy (ME) based Named Entity Recognition (NER) system under the multiobjective optimization (MOO) framework. Two conflicting objective functions are optimized simultaneously using the search capability of MOO. These objectives are (i) the dimensionality of the features, which is to be minimized,...
This paper presents a semi-supervised learning method for Vietnamese part-of-speech tagging. We take two powerful tagging models, Conditional Random Fields (CRFs) and Guided Online-Learning models (GLs), as base learning models. We then propose a semi-supervised tagging model for both CRFs and GLs. The main idea is to use a word-cluster model as an associate...
The performance of an HMM-based text-to-speech (TTS) system is affected by the basic modeling units and the size of the training data. This paper compares two HMM-based Mandarin TTS systems, one using syllables and the other using phones as basic units, each trained on 1000, 3000 and 5000 sentences. Corpora from two female speakers are used as training data for evaluation. For both corpora, the system using syllable...
Since whether or not a character sequence refers to an object in the real world is determined mostly by its context, context pattern induction plays an important role in entity recognition, which is an important task in the field of natural language processing (NLP). We present a nominal entity recognition method based on context pattern induction. It induces high-precision context patterns in...
This paper describes a hybrid system that applies a maximum entropy (MaxEnt) model together with a hidden Markov model (HMM) and some linguistic rules to recognize named entities in the Oriya language. The main advantage of our system is that it uses both the HMM and the MaxEnt model successively, along with some manually developed linguistic rules. First we use MaxEnt to identify named entities in the Oriya corpus, then tagging...
This paper presents Thai named entity recognition (NER) systems using Conditional Random Fields (CRFs). In previous studies of Thai NER, no system has used syllable-segmented data as input; all have used word-segmented data. Since some research on NER in other languages, such as Chinese, shows that character-based systems perform better than word-based ones, this study...
PLSA is one of the most powerful language models for adaptation to target speech. The vocabulary-divided PLSA language model (VD-PLSA) performs better than the conventional PLSA model because it can be adapted to the target topic and the target speaking style individually. However, the entire vocabulary must be manually divided into three categories (topic, speaking style, and general category)...