Media companies today face a serious problem in cataloging news, owing to the large number of articles received by their documentation departments. This manual labor is prone to errors and omissions because of the differing points of view and expertise levels of individual staff members. The large size of the word list in a thesaurus adds further difficulty. In this paper,...
This research is part of a larger integrated approach for extracting information of interest from free text and visualizing the semantic relatedness between phrases of interest. This paper defines a key new structure, the expanded entity phrase (EPx), and presents an approach for extracting EPx's from free text. The structure of the EPx's facilitates quantitative...
Sentiment lexicons are language resources widely used in opinion mining and important tools in unsupervised sentiment classification. We present a comparative study of sentiment classification of reviews on six different domains using sentiment lexicons from different sources. Our results highlight the tendency of a lexicon's performance to be imbalanced towards one class, and indicate lexicon accuracy...
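The unsupervised lexicon-based classification described in this abstract can be sketched in a few lines. The lexicon and scoring rule below are illustrative assumptions, not any of the lexicons or domains the study actually compares:

```python
# Minimal sketch of unsupervised lexicon-based sentiment classification.
# LEXICON is a toy stand-in for a real sentiment lexicon (words -> polarity scores).
LEXICON = {"great": 1.0, "excellent": 1.0, "poor": -1.0, "terrible": -1.0}

def classify(review: str) -> str:
    """Sum the polarity of known words; the sign decides the class."""
    tokens = review.lower().split()
    score = sum(LEXICON.get(t, 0.0) for t in tokens)
    return "positive" if score > 0 else "negative"

print(classify("the battery life is excellent and the screen is great"))  # positive
print(classify("terrible camera and poor build quality"))                 # negative
```

The class imbalance the abstract reports would show up here as a lexicon whose entries skew toward one polarity, biasing the sign of the summed score.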
Automatic Chinese term extraction is an important issue in Natural Language Processing. This paper proposes an improved method based on C/NC-value to extract Chinese terms. We remove the linguistic part of the C/NC-value method and add two statistical parameters, mutual information and log-likelihood ratio, to calculate the statistical features of each candidate string. The results of the...
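The two statistical parameters named above are standard association measures. As a rough sketch (the exact counting scheme in the paper may differ), both can be computed from corpus counts of a two-part candidate string:

```python
import math

def pmi(n_xy, n_x, n_y, n_total):
    """Pointwise mutual information of a candidate pair (x, y)
    from joint and marginal corpus counts."""
    p_xy = n_xy / n_total
    p_x, p_y = n_x / n_total, n_y / n_total
    return math.log2(p_xy / (p_x * p_y))

def log_likelihood_ratio(n_xy, n_x, n_y, n_total):
    """Dunning's log-likelihood ratio comparing 'y follows x' against
    'y follows anything else'; larger values mean stronger association."""
    def ll(k, n, x):
        eps = 1e-12  # clamp to avoid log(0)
        x = min(max(x, eps), 1 - eps)
        return k * math.log(x) + (n - k) * math.log(1 - x)
    k1, n1 = n_xy, n_x                      # y after x
    k2, n2 = n_y - n_xy, n_total - n_x      # y after not-x
    p, p1, p2 = n_y / n_total, k1 / n1, k2 / n2
    return 2 * (ll(k1, n1, p1) + ll(k2, n2, p2)
                - ll(k1, n1, p) - ll(k2, n2, p))

# A pair that co-occurs far more often than chance scores high on both.
print(pmi(10, 20, 30, 1000), log_likelihood_ratio(10, 20, 30, 1000))
```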
Keyphrase extraction is a fundamental research task in natural language processing and text mining. A limitation of previous keyphrase extraction methods based on semantic analysis is that the acquisition of the semantic features within phrases is restricted by the constructed thesaurus and language. An approach to the acquisition of the semantic features within phrases from a single document is proposed...
The paper proposes an approach to contextualized question answering. Contextualization is achieved by using an ontology, and answers are provided from a domain-specific document collection. The approach consists of several phases: data preparation, data enhancement, data indexing, and question handling. The functioning of the proposed approach is demonstrated on English document...
There exists a large and underutilized resource of archaeological literature, both formal, such as scholarly journals, and less formal, in the form of 'grey literature'. In the archaeological domain the vast majority of this literature contains some geo-spatial element, as well as the expected temporal information, and its ease of discovery would therefore be greatly enhanced were it accessible via a...
In this paper, we present an ontology modeling tool for building Chinese domain ontologies, which not only offers automatic knowledge acquisition from multiple dictionaries, but also follows a dictionary-based ontology modeling methodology that combines Methontology and throwaway prototyping. Through a case study of spatial information science ontology modeling, the theoretical method and implementation...
Based on an analysis of the traditional forward maximum matching word segmentation algorithm and its underlying principles, and drawing on word frequency statistics, we design a new dictionary structure and use it to improve the maximum matching algorithm. Time-complexity analysis and experiments show that the improved...
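The baseline algorithm the abstract improves on is simple enough to sketch. This is plain forward maximum matching over a set-based dictionary, not the paper's improved dictionary structure:

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy forward maximum matching: at each position, emit the
    longest dictionary word starting there (fall back to one character)."""
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

vocab = {"研究生", "研究", "生命", "命", "起源"}
print(forward_max_match("研究生命起源", vocab))  # ['研究生', '命', '起源']
```

The example also shows the method's known weakness: the greedy longest match picks 研究生 ("graduate student") and misses the intended 研究 / 生命 ("research" / "life") split, which is one motivation for augmenting the dictionary with frequency statistics.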
Two approaches to keyword extraction are commonly used: one relies only on information from individual words, such as word frequency and TF-IDF; the other is based on relationships between words. That relationship is usually described as word similarity, derived from a corpus (WordNet, HowNet) or a man-made thesaurus. With the information explosion nowadays, the words we use are growing and changing...
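The first family of methods mentioned above can be sketched directly. This is a minimal TF-IDF scorer over a toy document collection; the smoothing-free IDF formula is one common variant, not necessarily the one any particular paper uses:

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by TF-IDF against the collection."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency of each word
    for toks in tokenized:
        df.update(set(toks))
    tf = Counter(tokenized[doc_index])  # term frequency in the target doc
    n = len(tokenized[doc_index])
    scores = {w: (c / n) * math.log(len(docs) / df[w]) for w, c in tf.items()}
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]]

docs = ["the cat sat", "the dog ran", "the cat ate fish"]
print(tfidf_keywords(docs, 1, 2))  # words unique to doc 1 rank highest
```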
In recent years, researchers have begun to pay more attention to emotion recognition in natural language processing. To support this pursuit, this paper proposes a semi-automatic approach to creating a Chinese emotion thesaurus tagged with emotion intensity, based on two language resources, HowNet and Tongyici Cilin. As a basic emotion resource, the emotion thesaurus should be used...
A HowNet-based Chinese Word Lexical Semantic Similarity Measurement (WLSSM) pattern is proposed in this paper, in which the organizational structure of HowNet is used to extract abundant semantic information. The paper utilizes the grammatical rules of the Knowledge Database Mark-up Language (KDML) and adopts both maximum matching at each level of sememes and sememe depth information for WLSSM. Compared...
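Depth-based sememe similarity measures of the family used in HowNet work typically score two sememes by their distance through the taxonomy. The sketch below uses a tiny hypothetical taxonomy and the common alpha / (alpha + distance) form; it is a stand-in for, not a reproduction of, the paper's WLSSM formula:

```python
# Toy sememe taxonomy (child -> parent); a stand-in for HowNet's hierarchy.
PARENT = {"human": "animate", "animal": "animate", "animate": "entity",
          "plant": "entity", "entity": None}

def path_to_root(node):
    """Return the list [node, parent, ..., root]."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def sememe_sim(a, b, alpha=1.6):
    """Similarity alpha / (alpha + d), where d is the path distance
    between a and b through their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    ancestors = set(pa)
    for d_b, node in enumerate(pb):
        if node in ancestors:
            d_a = pa.index(node)
            return alpha / (alpha + d_a + d_b)
    return 0.0  # no common ancestor

print(round(sememe_sim("human", "animal"), 3))  # shared parent -> distance 2
```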
Despite all the advances in techniques to block spam e-mail messages, we still receive them frequently, mainly because of spammers' ability to modify messages to pass the filters. A testing technique that resembles the behavior of spammers would therefore increase the number of scenarios tested, and would allow filters to be developed based on the potential changes made by the...
We propose in this paper to use NLP approaches to extract and validate induced syntactic relations (verb-object). We employ a syntactic parser and a semantic proximity measure to extract them. We then focus on a Web validation system and a semantic-vector-based approach, and finally we propose approaches that combine both in order to rank induced syntactic relations. The semantic vectors approach is a Roget-based...
We can observe a clear distinction in the approaches to natural language processing before and after the year 2000. Before the twenty-first century, grammar rules and word dictionaries for computer use were written by the instinct of linguists, while after the turn of the century these have basically been obtained, or have been recognized to be obtainable, from linguistic data by proper automatic analyses...
This paper shows that a graph-based co-clustering approach is suitable for the extraction of verb synonyms from large-scale texts. The proposed bipartite-graph algorithm can produce clusters of verb synonyms, as well as noun synonyms, by taking into account word co-occurrence between a verb and its arguments. Experimental results show that the co-clustering approach achieves higher accuracy than a...
This paper presents a new keyword extraction algorithm for Chinese news Web pages using lexical chains and word co-occurrence, combined with frequency, cohesion, and correlation features. A lexical chain is a sequence of semantically related words in a text, and represents the semantic content of a portion of that text. Word co-occurrence distribution...
This paper presents a method of sentiment and sentimental-agent identification based on a Chinese sentimental sentence dictionary. Our method can identify eight kinds of sentiment (joy, sorrow, love, disgust, surprise, anxiety, anger, and hate), as well as the main sentimental agent. The sentimental sentence dictionary is composed of sentimental sentence patterns, and the sentiment of a candidate...
This paper proposes a novel approach to improving kernel-based word sense disambiguation (WSD). We first explain why linear kernels are more suitable for WSD, and for many other natural language processing problems, than translation-invariant kernels. Based on the linear kernel, two external knowledge sources are integrated. One comprises a set of linguistic rules to find the crucial features. For the...
When using natural language questions to retrieve documents, query expansion is a key factor affecting retrieval performance. By analyzing traditional query expansion methods, this paper puts forward a query expansion method based on set theory for answer-document retrieval. To verify the validity of the method, a similarity calculation method for questions and candidate answers...
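One simple set-theoretic reading of query expansion is to take the union of each query term with the synonym sets that contain it. The synonym sets below are hypothetical, and this union-of-synsets scheme is an illustrative guess, not the paper's actual method:

```python
# Hypothetical synonym sets; a real system would draw these from a thesaurus.
SYNSETS = [{"car", "automobile", "vehicle"}, {"buy", "purchase"}]

def expand_query(terms):
    """Expand the query: union each term with every synonym set containing it."""
    expanded = set(terms)
    for term in terms:
        for synset in SYNSETS:
            if term in synset:
                expanded |= synset
    return expanded

print(sorted(expand_query({"buy", "car"})))
```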