With the development of the Web, large numbers of documents are being published on the Internet, and more and more digital libraries, news sources and internal company data are becoming available. Automatic text categorization is therefore increasingly important for handling such massive data. However, text preprocessing is still the bottleneck of text categorization based on the vector space model (VSM). The result of text preprocessing...
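The abstract above refers to text categorization in the vector space model. As a minimal sketch of what "VSM" means here, the code below turns already-tokenized documents into TF-IDF weighted sparse vectors and compares them with cosine similarity. The weighting scheme (raw term frequency times log inverse document frequency) and the function names are assumptions for illustration, not the paper's preprocessing pipeline.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each tokenized document to a sparse TF-IDF vector (dict: term -> weight)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Terms occurring in every document get weight 0 (log(n / n) = 0).
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [["stock", "market", "rises"], ["stock", "market", "falls"], ["football", "match"]]
vecs = tf_idf_vectors(docs)
```

The two finance documents share weighted terms and so get a positive similarity, while the football document shares none with them and scores 0.0. A categorizer would then compare such vectors against class centroids or feed them to a classifier.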
In this paper, we describe a system that evaluates the intonation of English utterances by native Japanese speakers, using synthesized speech to enable rapid development of a CALL system. To evaluate the intonation of learners' utterances, we need reference utterances, for which utterances by native English speakers should be used. However, it is costly to gather native speakers' utterances for all sentences in...
This paper presents a method for identifying sentiments and sentimental agents based on a Chinese sentimental sentence dictionary. Our method can identify eight kinds of sentiment (joy, sorrow, love, disgust, surprise, anxiety, anger and hate) as well as the main sentimental agent. The sentimental sentence dictionary is composed of sentimental sentence patterns, and the sentiment of a candidate...
This paper studies the word sense disambiguation of the English modal verb "may". Based on an analysis of the sense, category of modality and function of "may" in different contexts in the training corpus, a back-propagation neural network model for the word sense disambiguation of "may" is established. It takes the mutual information of epistemic and non-epistemic "may"...
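The abstract is cut off right where it says which mutual information is used, so the following is only an assumed illustration: pointwise mutual information between a context word and a sense label ("epistemic" vs "non-epistemic"), estimated from labeled training instances. Such scores are one plausible kind of input feature for a neural classifier; the instance format and the function name `pmi_table` are hypothetical.

```python
import math
from collections import Counter


def pmi_table(instances):
    """instances: list of (sense_label, set_of_context_words).
    Returns {(word, sense): PMI}, with probabilities estimated as the fraction
    of instances containing the word, the sense, or both."""
    n = len(instances)
    sense_counts = Counter(sense for sense, _ in instances)
    word_counts = Counter()
    joint_counts = Counter()
    for sense, words in instances:
        for w in set(words):
            word_counts[w] += 1
            joint_counts[(w, sense)] += 1
    # PMI(w, s) = log2( P(w, s) / (P(w) * P(s)) ); pairs never seen together are omitted.
    return {
        (w, s): math.log2((c / n) / ((word_counts[w] / n) * (sense_counts[s] / n)))
        for (w, s), c in joint_counts.items()
    }


# Toy data: "may well be" (epistemic) vs "you may leave" (non-epistemic).
instances = [
    ("epistemic", {"well", "be"}),
    ("epistemic", {"be"}),
    ("non-epistemic", {"you", "leave"}),
    ("non-epistemic", {"you"}),
]
table = pmi_table(instances)
```

Here "you" always co-occurs with the non-epistemic sense, so its PMI with that sense is log2(2) = 1, and it has no entry for the epistemic sense.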
Recently, computer-based emotion recognition has attracted a great deal of attention from researchers because of its broad applications. Emotion estimation from textual input has also become an active research area as natural language processing (NLP) technology develops. However, in Chinese negative sentences the original emotion may be reversed, which makes obtaining correct recognition results...
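To make the negation problem described above concrete, here is a minimal lexicon-based sketch in which a negator reverses the polarity of the next emotion word. The tiny lexicon, the negator list and the scoping rule (negation stays pending until the next emotion word; two negators cancel) are all assumptions for illustration, not the paper's method.

```python
# Hypothetical toy lexicons; a real system would use a full Chinese emotion lexicon.
EMOTION_LEXICON = {"高兴": 1, "喜欢": 1, "讨厌": -1}  # happy, like, dislike
NEGATORS = {"不", "没", "没有", "别"}


def polarity(tokens):
    """Sum emotion-word polarities over a segmented sentence. Negators seen
    since the previous emotion word are counted; an odd count flips the sign."""
    score = 0
    negations = 0
    for tok in tokens:
        if tok in NEGATORS:
            negations += 1
        elif tok in EMOTION_LEXICON:
            value = EMOTION_LEXICON[tok]
            score += -value if negations % 2 else value
            negations = 0  # the negation scope ends at this emotion word
        # Any other token leaves the pending negation count unchanged.
    return score
```

With this rule "我不高兴" ("I am not happy") scores -1 instead of +1, and the double negation "不是不喜欢" ("it is not that I don't like it") comes out positive.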
Research on cross-language information retrieval (CLIR) increasingly concentrates on selecting among candidate translations of the query keywords. Translation accuracy has a direct impact on precision and recall. This thesis presents three HowNet-based methods to resolve query translation ambiguity in CLIR. The first is based on semantic relations, and it uses semantic relation...
We have developed a system that can semi-automatically extract numerical and named entity sets from a large number of Japanese documents and create various kinds of tables and graphs. In our experiments, the system semi-automatically created approximately 300 kinds of graphs and tables at precisions of 0.2-0.8, with only two hours of manual preparation, from two years' worth of newspaper articles...
Mining the Web for customer opinions on different products is both a useful and a challenging task. Previous approaches to customer review classification included document-level, sentence-level and clause-level sentiment analysis, as well as feature-based opinion summarization. In this paper, we present a feature-driven opinion summarization method, where the term "driven" is employed to describe the...
This paper presents a two-step dependency parser for deterministic parsing of Chinese. By dividing a sentence into two parts and parsing them separately, error accumulation can be effectively avoided. Previous work on shift-reduce dependency parsers may do less to counter the greedy nature of deterministic parsing. This paper improves on a kind of deterministic dependency parsing method to weaken...
Speech recognition systems are usually trained on huge numbers of transcribed utterances, and preparing the training data is extremely time-consuming and costly. To reduce the number of training examples that must be labeled, active learning is applied to acoustic modeling for speech recognition. This learning scheme iteratively inspects the unlabeled samples and selects the most informative samples corresponding...
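The active learning scheme described above can be sketched as a generic pool-based loop. The excerpt is cut off before it says how "most informative" is measured, so this sketch assumes least-confidence sampling: pick the utterances whose best recognition hypothesis has the lowest posterior. The callbacks `train`, `posterior` and `annotate` are hypothetical stand-ins for acoustic-model training, decoding and manual transcription.

```python
from typing import Dict, Hashable, List, Sequence


def least_confident(posteriors: Dict[Hashable, Sequence[float]], k: int) -> List[Hashable]:
    """Return the k sample ids whose best hypothesis has the lowest posterior."""
    confidence = {sid: max(p) for sid, p in posteriors.items()}
    return sorted(confidence, key=confidence.get)[:k]


def active_learning(pool, labeled, train, posterior, annotate, rounds, k):
    """Pool-based loop: train, score the unlabeled pool, move the k least
    confident samples into the labeled set, and repeat."""
    for _ in range(rounds):
        if not pool:
            break
        model = train(labeled)
        scores = {sid: posterior(model, x) for sid, x in pool.items()}
        for sid in least_confident(scores, k):
            labeled[sid] = annotate(pool.pop(sid))
    return train(labeled)
```

Only the selected utterances are sent for transcription in each round, which is where the labeling savings come from; other informativeness measures (e.g. entropy over the N-best list) could replace `least_confident` without changing the loop.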
This paper presents a systematic study of disambiguating sentiment-ambiguous adjectives in context in real text, a task at the intersection of word sense disambiguation and sentiment analysis. First, we address the issue of inter-annotator agreement on assigning semantic orientations to word occurrences in real text. Second, we demonstrate that co-occurring sentiment-monosemous adjectives can...
This paper presents a novel extractive approach to multi-document summarization that uses geodesic distance to compute sentence similarity. A text relationship map is constructed from the geodesic distance between every pair of sentences. Sentences with higher degree in the map are selected and grouped into clusters. Finally, the sentences with the highest degree in each cluster are...
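The first two steps described above (geodesic distances, then a degree-based relationship map) can be sketched as follows. This assumes sentences are already embedded as numeric vectors and approximates geodesic distance the Isomap way: connect each sentence to its k nearest neighbours by Euclidean distance, then take shortest-path lengths over that graph. The values of `k` and `threshold` are hypothetical, and the clustering step is omitted.

```python
import math


def geodesic_distances(vectors, k):
    """Approximate geodesic distances: symmetric k-nearest-neighbour graph
    with Euclidean edge weights, then all-pairs shortest paths (Floyd-Warshall)."""
    n = len(vectors)
    euclid = [[math.dist(a, b) for b in vectors] for a in vectors]
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        neighbours = sorted((j for j in range(n) if j != i), key=lambda j: euclid[i][j])[:k]
        for j in neighbours:
            d[i][j] = d[j][i] = euclid[i][j]  # keep each kNN edge in both directions
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    return d


def relationship_degrees(dist, threshold):
    """Degree of each sentence in a map that links pairs closer than threshold."""
    n = len(dist)
    return [sum(1 for j in range(n) if j != i and dist[i][j] < threshold) for i in range(n)]


# Toy 2-D "sentence vectors": three close together, one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
dist = geodesic_distances(points, k=1)
degrees = relationship_degrees(dist, threshold=3.0)
```

The outlier ends up with degree 0, so a degree-based selector would favour the three mutually close sentences. Floyd-Warshall is cubic in the number of sentences, which is acceptable for a sketch but a real system might use Dijkstra on a sparse graph.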
In Chinese language processing, new words are particularly problematic. It is impossible to obtain a complete dictionary, as new words are constantly being created. We propose a unified dual-chain unequal-state CRF model that detects new words together with their parts of speech in Chinese texts, regardless of word type (compound words, abbreviations, person names, etc.). The dual-chain unequal-state CRF...
This paper proposes a novel approach to improving kernel-based word sense disambiguation (WSD). We first explain why linear kernels are more suitable than translation-invariant kernels for WSD and many other natural language processing problems. Two external knowledge sources are then integrated on top of the linear kernel. One is a set of linguistic rules for finding the crucial features. For the...
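The distinction drawn in the abstract above can be shown in a few lines. A linear kernel is a plain dot product, so on binary bag-of-features vectors it counts shared features; a translation-invariant kernel depends only on the difference x - y. The Gaussian RBF kernel and `gamma=0.5` below are chosen as a standard example of the latter; this only illustrates the definitions and does not reproduce the paper's argument.

```python
import math


def linear_kernel(x, y):
    """k(x, y) = <x, y>; for binary feature vectors this counts shared features."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.5):
    """Translation-invariant: k(x, y) = exp(-gamma * ||x - y||^2) depends only on x - y."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


x = [1, 0, 1, 0]
y = [1, 0, 0, 1]
shift = [1, 1, 1, 1]
xs = [a + s for a, s in zip(x, shift)]
ys = [b + s for b, s in zip(y, shift)]
```

Shifting both vectors by the same offset changes the linear kernel (1 becomes 9) but leaves the RBF kernel untouched, because only the difference between the vectors matters to it.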
A new method for acoustic model training based on model selection is proposed. The MPE-trained model and the MLE-trained model are used for model selection in subsequent training. The selection criterion is based on the ratio of the inter-variance to the intra-variance of each model. In addition, we propose a clustering method for the model in order to obtain accuracy information for the weight calculation...
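The selection criterion above is a ratio of inter-variance to intra-variance. The excerpt does not say what the groups are, so this sketch assumes scalar observations already partitioned into groups (for example, per class or per model component) and computes the classic between-group over within-group variance ratio. The function name and the grouping are hypothetical.

```python
from statistics import fmean


def inter_intra_ratio(groups):
    """groups: list of lists of scalar observations.
    Returns between-group (inter) variance divided by within-group (intra) variance."""
    all_values = [x for g in groups for x in g]
    n = len(all_values)
    grand_mean = fmean(all_values)
    means = [fmean(g) for g in groups]
    # Inter: spread of the group means around the grand mean, weighted by group size.
    inter = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)) / n
    # Intra: spread of each observation around its own group mean.
    intra = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / n
    return inter / intra
```

For well-separated groups such as `[[0, 2], [10, 12]]` the ratio is 25.0, while overlapping groups such as `[[0, 10], [2, 12]]` give 0.04. A larger value means better separation relative to spread; how the paper turns this value into a choice between the MPE and MLE models is not stated in the excerpt.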
Sentence similarity computation plays an important role in question answering (QA) systems. Because one meaning can be expressed by many different questions, we present a new fuzzy-set-based approach to question matching. In this paper, we establish a library of standard questions, each associated with a series of keywords. The main focus of this paper lies with matching of standard...
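One way to read the fuzzy-set matching described above: treat each standard question's keywords as a fuzzy set whose membership weights say how essential each keyword is, and score a user question by the weighted fraction of those keywords it contains. The weights, the `threshold` of 0.5, the scoring formula and the library contents below are all hypothetical; the excerpt ends before describing the paper's actual membership function.

```python
def match_degree(query_words, keyword_weights):
    """Weighted fraction of a standard question's keywords found in the query."""
    total = sum(keyword_weights.values())
    hit = sum(w for kw, w in keyword_weights.items() if kw in query_words)
    return hit / total if total else 0.0


def best_standard_question(query_words, library, threshold=0.5):
    """Return the id of the best-matching standard question, or None if no
    question reaches the threshold."""
    best_id, best_score = None, threshold
    for qid, weights in library.items():
        score = match_degree(query_words, weights)
        if score >= best_score:
            best_id, best_score = qid, score
    return best_id


# Hypothetical library: standard question id -> {keyword: membership weight}.
library = {
    "reset_password": {"reset": 1.0, "password": 1.0, "account": 0.5},
    "opening_hours": {"open": 1.0, "hours": 1.0},
}
```

A query containing "reset" and "password" matches the first standard question with degree 2.0 / 2.5 = 0.8, so differently worded questions that share the key terms map to the same standard question.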
Our objective is to estimate and clarify the factors that determine the degree of importance of information, by extracting the words that characterize that degree of importance, and to construct a system that estimates it automatically. We studied the degree of importance of information using machine learning. We first performed experiments using newspaper documents (Dn). In...
Cross-document coreference resolution, an important subtask in natural language processing systems, focuses on determining whether two mentions from different documents refer to the same real-world entity. In this paper we present a two-step approach consisting of a classification phase and a clustering phase. In a novel way, the clustering is produced by a graph-cutting algorithm,...
This paper proposes a novel reordering model based on reordering source-language chunks. The model is used as a preprocessing step for phrase-based translation models and integrates well with them. Because it is chunk-based, it can take syntactic information into account during reordering without requiring a full parse of the source sentence. Two experiments...
The 'ZHE' imperfective has long been a burning issue in linguistic research. However, until now, few studies have addressed the semantic meanings of 'temporal adverbial + ZHE imperfective' sentences and their formalization. This paper addresses the formalization of the 'ZHE' imperfective and its combinations with temporal adverbials through automatic parsing using CTT (Copenhagen...