Because machine translation systems still cannot produce satisfactory output, various interactive machine translation (IMT) approaches have recently been proposed. State-of-the-art IMT systems use the human-validated prefix as the only constraint guiding decoding, so the human guidance is quite insufficient. This paper extends the human-computer interactions by allowing translators to provide...
Emotion is a primary semantic component of human communication. This study focuses on automatic emotion detection in descriptive sentences and on how it can be used to tune facial expression parameters for virtual character generation. We therefore present a classification-based sentiment analysis approach that maps a sentiment sentence to an emotional state. Each sentence is represented as a feature...
We propose a statistical semantic analysis method for Chinese terms. We use words, part-of-speech (POS) tags, word distances, word contexts and the first sememe of a word in HowNet as features to train a Support Vector Machine (SVM) model for analyzing term semantics. The model is used to identify dependencies embedded inside a term. A Conditional Random Field (CRF) model is used afterwards to incorporate...
State-of-the-art Machine Translation (MT) systems are still far from perfect. An alternative is so-called Interactive Machine Translation (IMT). In this paper, we present some novel methods to improve statistical phrase-based IMT. We use a dynamic distortion limit to balance the requirements of long-distance reordering and decoding speed. We also introduce the difference function...
In large-scale translation of scientific and technical literature, where many people are involved, inconsistency in the translation of the same terminology is inevitable. First, this paper carries out a comprehensive analysis of terminology translation inconsistency, finding that most cases are translations with the same meaning but different indications, which affects the readability of the whole article...
The integration of industrialization and IT application is going to be one of China's new economic development strategies. How to make Big Data useful in generating significant productivity improvements in industry has therefore become one of the most important issues. This paper outlines the platform of knowledge service based on big data processing techniques, which have...
Latent Semantic Analysis (LSA) is a technique for analyzing latent concepts. LSA is based on the Vector Space Model (VSM) and statistics, and it usually takes the Singular Value Decomposition (SVD) as its kernel algorithm. LSA typically increases the scale of the training data to improve system performance. However, as it needs many extra operations, and it also generates too much co-occurrence...
Knowledge discovery in Ancient Medical Literatures (AMLs) is a research focus due to the wide application of computer technology in Traditional Chinese Medicine (TCM). The foundation of knowledge discovery research is to obtain semantic labels within the AMLs and to restructure the text. Due to the diversity of AMLs, the low coverage rate of current semantic lexicons and the ambiguities of the lexicon words,...
A prepositional phrase (PP) consists of two parts: a preposition as the leading part and a word or phrase as the tail part. Based on this fact, this paper proposes a new approach for identifying PPs. In this method, PP identification is transformed into identifying the collocation of the preposition itself and the right-boundary word. The Cascaded Conditional Random Fields (CCRFs) is...
Concept acquisition is an important part of domain ontology construction, and how to accomplish assisted concept acquisition has become a research focus. In this paper, a character-based CRF model is adopted to obtain the set of candidate terms, and we propose an active learning algorithm to select a concept from the set of candidate terms for the user and use the stochastic gradient descent algorithm...
The paper proposes a method for identifying Maximal-Length Noun Phrases (MNPs) based on Maximal-Length Preposition Phrases (MPPs). We identify MNPs using the mutually restricting characteristics of MNPs and adverbial MPPs. We employ a Conditional Random Fields (CRFs) model in the identification process, and use new tags and the above long-distance word as features. Experimental results show high-quality performance...
Based on the characteristics of the Chinese language, this paper proposes a statistical parsing method based on Maximal Noun Phrase (MNP) pre-processing. It is preferable to separate MNP parsing from parsing of the full sentence. First, the MNPs in a sentence are identified; next, each MNP is represented by its head; then the sentence is parsed with the MNP heads in place of the MNPs. Therefore, the original...
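The substitution step in this abstract (represent each identified MNP by its head before parsing the full sentence) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `collapse_mnps`, the `(start, end, head)` span format, and the English example sentence are all assumptions made here, and the MNP identifier and the statistical parser themselves are not shown.

```python
from typing import List, Tuple

# Hypothetical span format (an assumption, not from the paper):
# (start, end, head) with `end` exclusive and `head` the index of the head word.
Span = Tuple[int, int, int]


def collapse_mnps(
    tokens: List[str], spans: List[Span]
) -> Tuple[List[str], List[Tuple[int, List[str]]]]:
    """Replace each MNP span with its head token.

    Returns the shortened sentence plus, for each collapsed span, the
    position of its head in the shortened sentence and the original
    tokens, so the MNP can be re-attached after parsing.
    """
    reduced: List[str] = []
    collapsed: List[Tuple[int, List[str]]] = []
    pos = 0
    for start, end, head in sorted(spans):
        # Spans must be non-overlapping, in range, and contain their head.
        if not (pos <= start <= head < end <= len(tokens)):
            raise ValueError(f"invalid or overlapping span: {(start, end, head)}")
        reduced.extend(tokens[pos:start])          # copy words before the MNP
        collapsed.append((len(reduced), tokens[start:end]))
        reduced.append(tokens[head])               # keep only the head word
        pos = end
    reduced.extend(tokens[pos:])                   # copy the remaining words
    return reduced, collapsed


# Illustrative input only; the paper works on Chinese sentences.
tokens = ["the", "old", "parser", "handles", "very", "long", "noun", "phrases"]
spans = [(0, 3, 2), (4, 8, 7)]
reduced, collapsed = collapse_mnps(tokens, spans)
```

The shortened sentence in `reduced` is what a full-sentence parser would receive, while `collapsed` keeps enough information to parse each MNP separately and attach it back at its head position.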
This paper proposes a new cascade algorithm based on conditional random fields. The algorithm is applied to automatic recognition of Chinese verb-object collocations and combined with a new sequence labeling of “ONIY”. Experiments compare identification results under two segmentations and part-of-speech tag sets. The comprehensive experimental results show that the best performance is 90.65% in F-score...
This paper proposes a head-driven method for translating English patent terms into Chinese. First, word alignment information and English NP parse trees are formed. The correspondence between word alignment information and syntactic structure is then built using head constraints. The NP translation pattern database is formed as the basis for term reordering. Then...
Term relation extraction is the basis for automatically building a repository of terms. Current research on term relation extraction focuses on how to obtain effective features for describing the relationship. This paper presents a novel method of feature extraction based on latent relation analysis, which automatically obtains the relation characteristics from a patent corpus as the features...
Latent semantic indexing (LSI) is an effective dimensionality reduction method that has been applied to many text learning tasks, such as text categorization and information retrieval. This paper thoroughly analyses the influence of the text window on the mapping of latent semantic indexing and brings forward a latent semantic analysis method based on the semantic block which strengthens the...
Latent Semantic Indexing (LSI) is an effective feature extraction method that has been applied to many text learning tasks, such as text clustering and information retrieval. This paper thoroughly analyses the influence of term co-occurrences on the mapping of Latent Semantic Indexing and brings forward a method named pseudo document which strengthens the beneficial term co-occurrences...
This paper proposes an approach for Chinese maximal noun phrase parsing based on cascaded conditional random fields. In this approach, the parse tree of a Chinese maximal noun phrase is constructed layer by layer. The Chinese chunks are first recognized by the lower conditional random fields model; the result is then passed as input to the higher model for recognition of phrases, the process of recognizing...
This paper proposes an NP tree matching method for translating English patent titles into Chinese. First, a bilingual example database for patent titles is built. English parse trees are produced by an English parser, forming an NP tree database. The input patent title to be translated is first parsed into a tree. Then NP trees that match the input NP tree are searched for in the NP tree...
Answer confidence evaluation is a central issue in answer validation. The paper proposes a three-level answer confidence evaluation comprising linguistic-level, information-level and knowledge-level evaluation. Multi-level answer validation approaches need to be combined in order to improve the performance of QA systems. Finally, the paper introduces confidence...