Extracting medical knowledge from structured data mining of many medical records and from unstructured data mining of natural-language source text on the Internet will become increasingly important for clinical decision support. Output from these sources can be transformed into large numbers of elements of knowledge in a Knowledge Representation Store (KRS), here using the notation and to some extent...
The Sejong Electronic (machine-readable) Dictionary, developed by the 21st century Sejong Plan, contains systematically organized information on Korean words. It helps to solve the problems encountered in the electronic formatting of a still-commonly-used hard-copy dictionary. The Sejong Electronic Dictionary, however, has a limitation relating to sentence structure and selection-restricted nouns...
Emails have become increasingly popular and are now an indispensable tool for communication and document exchange. Because of their convenience, people use emails every day at work, at school, and for personal matters. Consequently, the number of emails people receive daily keeps increasing, causing them to spend more time organizing them. People often need to classify and move email into...
This paper proposes a method for exploring technical phrase frames by extracting word n-grams that match our information needs and interests from research paper titles. Technical phrase frames, the outcome of our method, are phrases with wildcards that may be substituted for any technical term. Our method first extracts word trigrams from research paper titles and constructs a co-occurrence...
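The trigram-extraction step described in this abstract can be sketched as follows. This is a toy illustration, not the paper's implementation: the framing scheme shown here (replacing the middle word of each trigram with a wildcard) and all function names are assumptions, since the abstract is truncated before the method's details.

```python
from collections import Counter

def trigrams(title):
    """Extract word trigrams from a research paper title."""
    words = title.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def phrase_frames(titles, wildcard="*"):
    """Turn title trigrams into phrase frames by replacing the middle
    word with a wildcard slot, accumulating counts across titles."""
    counts = Counter(t for title in titles for t in trigrams(title))
    frames = Counter()
    for (w1, w2, w3), c in counts.items():
        frames[(w1, wildcard, w3)] += c
    return frames

titles = [
    "a neural approach to machine translation",
    "a statistical approach to machine translation",
]
frames = phrase_frames(titles)
```

Here the frame `("approach", "*", "machine")` accumulates evidence from both titles even though the middle words differ, which is the intuition behind wildcard phrase frames.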
It is necessary for a researcher to know the historical transitions of researchers and research topics. Although Web search engines can be used to obtain such information, collecting it across a long time period is difficult and laborious. Thus, we propose a method for automatically extracting historical transitions of researchers and research topics by using co-occurrence information...
This study examines how the Latent Dirichlet Allocation (LDA) model combined with natural language processing techniques can be used to identify hot topics from free-text customer reviews. To verify the validity of the proposed approach, 21 580 restaurant reviews are collected. Each review is viewed as a probabilistic mixture of latent topics and each topic is treated as a probability distribution...
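The generative view in this abstract (each review a mixture of latent topics, each topic a distribution over words) is standard LDA, and can be illustrated with a minimal collapsed Gibbs sampler. This is a stdlib-only sketch, not the study's pipeline; the hyperparameters, review texts, and function name are all assumptions for illustration.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA over tokenized docs.
    Returns per-document topic counts, per-topic word counts, vocab."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}
    ndk = [[0] * n_topics for _ in docs]      # doc-topic counts
    nkw = [[0] * V for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                       # topic totals
    z = []                                    # topic assignment per token
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            t = rng.randrange(n_topics)
            zs.append(t)
            ndk[d][t] += 1; nkw[t][widx[w]] += 1; nk[t] += 1
        z.append(zs)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t, wi = z[d][i], widx[w]
                ndk[d][t] -= 1; nkw[t][wi] -= 1; nk[t] -= 1
                # full conditional: p(k) ∝ (n_dk + α)(n_kw + β)/(n_k + Vβ)
                weights = [(ndk[d][k] + alpha) * (nkw[k][wi] + beta)
                           / (nk[k] + V * beta) for k in range(n_topics)]
                r = rng.random() * sum(weights)
                for k, wgt in enumerate(weights):
                    r -= wgt
                    if r <= 0:
                        t = k
                        break
                z[d][i] = t
                ndk[d][t] += 1; nkw[t][wi] += 1; nk[t] += 1
    return ndk, nkw, vocab

reviews = [["pizza", "tasty", "pizza", "cheap"],
           ["service", "slow", "service", "rude"],
           ["pizza", "tasty", "service", "slow"]]
doc_topics, topic_words, vocab = lda_gibbs(reviews, n_topics=2)
```

With real review corpora one would normally use an optimized library implementation rather than plain Gibbs sampling in Python, but the sampler above makes the "review = mixture of topics" model concrete.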
A number of Learning Management Systems (LMSs) exist on the market today. A subset of an LMS is the component in which student assessment is managed. In some forms of assessment, such as open questions, the LMS is incapable of evaluating the students' responses and therefore human intervention is necessary. In order to assess at higher levels of Bloom's (1956) taxonomy, it is necessary to include open-style...
Keyphrase extraction is a fundamental research task in natural language processing and text mining. A limitation of previous keyphrase-extraction methods based on semantic analysis is that acquiring the semantic features within phrases is restricted by the constructed thesaurus and by the language. An approach to acquiring the semantic features within phrases from a single document is proposed...
Multi-document summarization is an important research area of NLP. Most methods or techniques of multi-document summarization either consider the document collection as single-topic or treat every sentence as single-topic, but lack a systematic analysis of the subtopic semantics hidden inside the documents. This paper presents a Subtopic-based Multi-documents Summarization (SubTMS) method...
Automatic proofreading of Chinese text opens up broad possibilities for the application of natural language processing. Based on the distribution of single characters after word segmentation in Chinese text, the characteristics of typical errors, and a character trigram model, this paper presents an effective automatic text-proofreading algorithm. Experiments show that our method achieves better precision and...
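The character-trigram idea can be sketched with a simple zero-count heuristic: flag any position whose character trigram never occurred in the training corpus. This is an assumption-laden toy (English strings for readability, unsmoothed counts); the paper's actual model would use a smoothed trigram probability over segmented Chinese text.

```python
from collections import Counter

def train_char_trigrams(corpus):
    """Count character trigrams over a training corpus, padding each
    sentence with boundary symbols so edge characters get trigrams."""
    tri = Counter()
    for sent in corpus:
        s = "^^" + sent + "$"
        for i in range(len(s) - 2):
            tri[s[i:i + 3]] += 1
    return tri

def flag_suspects(sentence, tri):
    """Flag positions whose character trigram has zero training count --
    candidate typo locations under this crude heuristic."""
    s = "^^" + sentence + "$"
    return [(i, s[i + 2]) for i in range(len(s) - 2) if tri[s[i:i + 3]] == 0]

model = train_char_trigrams(["hello world", "hello there"])
suspects = flag_suspects("hellp world", model)
```

The typo `hellp` produces unseen trigrams such as `llp`, so its position is flagged, while a sentence drawn from the training data yields no suspects.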
The hierarchical phrase-based translation model has proven to be a simple and powerful machine translation model. However, due to computational complexity constraints, the extraction and use of hierarchical rules are usually restricted under certain limits, and these limits can have a negative impact on the performance of the translation model, especially for reordering. This paper presents...
This paper presents a simple method to extract compounds using statistical collocations and POS bigram probabilities without a POS tagger. Statistical collocation was used to determine the strength of word co-occurrences. Probabilities of POS sequences were used to adjust the strength of collocation within a possible compound. These probabilities were estimated from compounds found in the dictionary....
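The two-step scoring this abstract describes (collocation strength, then a POS-sequence adjustment) can be sketched as below. The excerpt does not name the paper's collocation statistic, so pointwise mutual information is used here as a stand-in; the counts and POS probabilities are invented for illustration.

```python
import math
from collections import Counter

def pmi(pair, bigrams, unigrams, total):
    """Pointwise mutual information -- one common collocation-strength
    statistic (a stand-in; the paper's exact measure is not given)."""
    w1, w2 = pair
    return math.log2((bigrams[pair] / total)
                     / ((unigrams[w1] / total) * (unigrams[w2] / total)))

def compound_score(pair, pos_pair, bigrams, unigrams, total, pos_probs):
    """Adjust collocation strength by the probability of the POS
    sequence, as estimated from compounds found in a dictionary."""
    return pmi(pair, bigrams, unigrams, total) * pos_probs.get(pos_pair, 0.0)

unigrams = Counter({"data": 10, "mining": 10, "the": 40})
bigrams = Counter({("data", "mining"): 5, ("the", "data"): 2})
total = 100
pos_probs = {("NN", "NN"): 0.6, ("DT", "NN"): 0.05}  # assumed estimates
score = compound_score(("data", "mining"), ("NN", "NN"),
                       bigrams, unigrams, total, pos_probs)
```

A noun-noun pair like "data mining" scores high on both factors, while "the data" is penalized both by weak collocation and by the low probability that a determiner-noun sequence forms a compound.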
Translating ancient Chinese poems is valuable but difficult. Automatically choosing English rhymes when translating ancient Chinese poems would assist translators. This paper extracts three important factors that influence English rhymes, presents a set of statistical models based on these factors, and then trains these models and acquires their parameters, which at last are used to recommend...
This paper describes a novel approach to linguistic processing for robots through integration of a motion language module and a natural language module. The motion language module represents associations between symbolized motion patterns and words. The natural language module models sentences. The motion language module and the natural language module are graphically integrated. The integration...
Word segmentation is one of the most important tasks in NLP. For Vietnamese, with its own linguistic features, this task faces particular challenges, especially in determining word boundaries. To tackle Vietnamese word segmentation, in this paper we propose the WS4VN system, which uses a new approach based on the maximum matching algorithm combined with stochastic models using part-of-speech information...
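The maximum matching baseline underlying this abstract can be shown in a few lines: Vietnamese words span one or more whitespace-separated syllables, and greedy forward matching takes the longest dictionary entry at each position. This is a generic sketch of the algorithm, not the WS4VN system; the toy lexicon and the fallback-to-one-syllable rule are assumptions.

```python
def max_match(sentence, dictionary, max_syllables=4):
    """Greedy forward maximum matching over whitespace-separated
    syllables: take the longest dictionary entry at each position,
    falling back to a single syllable when nothing matches."""
    syllables = sentence.split()
    tokens, i = [], 0
    while i < len(syllables):
        for n in range(min(max_syllables, len(syllables) - i), 0, -1):
            cand = " ".join(syllables[i:i + n])
            if n == 1 or cand in dictionary:
                tokens.append(cand)
                i += n
                break
    return tokens

lexicon = {"học sinh", "đi học"}  # toy dictionary of multi-syllable words
segmented = max_match("học sinh đi học", lexicon)
```

On this input the segmenter returns the two dictionary words rather than four isolated syllables; resolving cases where greedy matching is ambiguous is exactly where the stochastic POS-based models the abstract mentions would come in.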
Syntactic parsing is a central problem and a challenge in the field of natural language processing. It attracts many studies, and consequently effective parsers exist for several popular languages such as English and Chinese. For Vietnamese parsing, there have been only a few studies focusing on this problem; these studies do not apply modern techniques, and no popular parser has been released...
Research on machine translation has a long history, and many methods and techniques have been proposed and developed. However, low translation quality is still a major problem and many related problems remain unresolved. Super-function-based machine translation was proposed to perform translation without going through the syntactic and semantic analysis that many machine translation systems usually do...
This paper presents an exponential language model (ELM) for modeling and managing knowledge elements. The model has been developed based on the minimum sample risk (MSR) algorithm, which is a discriminative training method. ELM uses features to capture global, domain, or sentential language phenomena that are composed of named entities, part-of-speech strings, personal usage words, positions of words, sentence...
Enhancing Arabic tagging is of great importance in many NLP applications. This paper presents a simple comparison tool that compares two powerful tagging systems for Arabic. The first is the ASVM Tagger by Diab M. et al.; the second is the RDI Arab Tagger, which relies on simple but powerful long n-gram probability estimation plus an A* search algorithm for disambiguation. This comparison is done to superimpose...
In this paper, we introduce a new semantic induction metric that can induce semantic classes from a set of domain-specific unannotated data. We emphasize the co-occurrence probability rather than mere distances between word probability distributions. Compared to the traditional approach of using only the right or left context to calculate similarity, we use both left and right information simultaneously...
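The "both left and right context" idea can be made concrete by keeping left and right neighbours as distinct features in each word's co-occurrence vector and comparing words by cosine similarity. This is a generic distributional-similarity sketch under assumed toy data, not the paper's metric, which the truncated abstract does not fully specify.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences):
    """Build a co-occurrence vector per word that keeps left and
    right neighbours as distinct features (L: and R: prefixes)."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            if i > 0:
                vecs[w]["L:" + sent[i - 1]] += 1
            if i < len(sent) - 1:
                vecs[w]["R:" + sent[i + 1]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sents = [["book", "a", "flight", "to", "boston", "today"],
         ["book", "a", "flight", "to", "denver", "today"]]
vecs = context_vectors(sents)
sim = cosine(vecs["boston"], vecs["denver"])
```

Because "boston" and "denver" share both their left context ("to") and their right context ("today"), they come out maximally similar, suggesting they belong to one induced semantic class (city names).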