An important property of a corpus is its domain. Usually, the quality of an SMT system trained on an in-domain corpus increases by adding out-of-domain sentences to its training corpus. In this paper we have shown that out-of-domain corpora may also contain sentences which are suitable for improving the quality of an in-domain corpus. These sentences have words and phrases that occur in in-domain corpora, so their...
The objective of this work is to improve Statistical Machine Translation (SMT) by using a Self-Organizing Map (SOM). In general, there are two processes: training and translating. The training process prepares resources from a number of bilingual corpora, which are then used by the translating process. However, much of the resource data is still irrelevant. The main method of this research focuses on new...
The Internet has become the primary medium for information access and commerce in today's globalized world, and the demographics of the Internet are rapidly becoming multilingual. Multilingual content management is therefore a vital issue for making information available in the native languages of Internet users. To provide the contents to the users in their native languages most...
This paper documents recent work carried out for PeEn-SMT, our Statistical Machine Translation system for translation between the English-Persian language pair. We give details of our previous SMT system, and present our current development of significantly larger corpora. We explain how recent tests using much larger corpora helped to evaluate problems in parallel corpus alignment, corpus content,...
Natural Language Processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages. Ambiguity is one of the problems that has been a great challenge for computational linguists. This paper concentrates on the problem of target word selection in Myanmar-to-English machine translation, for which the approach is directly...
In machine translation (MT), one of the main problems to handle is word reordering. This paper focuses on designing and implementing an effective machine translation system for Myanmar to English. The framework of this paper is a reordering approach for English sentences. We propose an approach to generate the target sentence by using a reordering model that can be incorporated into the Statistical...
This paper presents a novel approach to overcome a limitation inherent in statistical machine translation services, where the translation of new terms is not covered. The proposed approach is based on the power of user-generated content to drive Arabic translations of English words. Our initial pilot experiment reveals the potential of our approach. This approach can act as an add-on to improve...
Calculating the similarity of Wikipedia articles in different languages is helpful for bilingual dictionary construction and various other research areas. However, standard methods for document similarity calculation are usually very simple. Therefore, we describe an approach of translating one Wikipedia article into the language of the other article, and then calculating article similarity with standard...
Automated translation (MT) tools have become an urgent need in a multilingual environment. Although there are many tools available on the market, a robust MT tool is unfortunately still a dream. The purpose of this paper is to discuss challenging issues in MT tool development, review the state of the art of MT tools, and propose a framework for semantic-based translation. The focus of this paper is English...
This paper introduces the normalized Google distance into the study of word sense disambiguation and presents a novel unsupervised method of word sense disambiguation. The normalized Google distance is a theory of similarity between words and phrases, based on information distance and Kolmogorov complexity, that uses the World Wide Web as a database, with its page counts derived from a search engine such...
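As an illustration of the measure named in the abstract above, the normalized Google distance is commonly defined from page counts as NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f(x) and f(y) are the hit counts for each term, f(x, y) is the number of pages containing both, and N is the number of pages indexed. The sketch below is a minimal Python version of that formula only. The function name and all counts are hypothetical, and it is not the disambiguation method of the paper, whose details are truncated in this listing.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Compute NGD from page counts.

    fx, fy -- hit counts for each term on its own
    fxy    -- hit count for pages containing both terms
    n      -- total number of indexed pages (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur, or never co-occur, are treated as infinitely far apart.
        return math.inf
    if n <= min(fx, fy):
        raise ValueError("n must exceed the smaller of the two term counts")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Hypothetical page counts, invented for illustration only.
distance = normalized_google_distance(fx=8000, fy=4000, fxy=2000, n=10**9)
print(distance)
```

A smaller value means the two terms tend to appear on the same pages. An unsupervised disambiguation scheme could, for instance, prefer the sense whose gloss words have the smallest distance to the context words, though that step is an assumption here rather than something stated in the abstract.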
Inflection and derivation have been the main ways of creating new words in English. With the development of science and technology, words as such appear faster than ever in scientific literature. Influenced by English, Chinese words with multiple affixes are also becoming a major way of new word creation in scientific literature. By studying the similarities of their original sources, this paper employs...
Transfer grammar is an integral component of a rule-based machine translation system. In this paper, we describe a subset of the transfer grammar developed for a Tamil-to-Hindi machine translation system, i.e., the transfer of nominal constructions from Tamil to Hindi. Nominal constructions in Tamil, which is an agglutinative language, take multiple suffixes which may be case markers or other suffixes...
Target phrase selection, a crucial component of the state-of-the-art phrase-based statistical machine translation (PBSMT) model, plays a key role in generating accurate translation hypotheses. Inspired by context-rich word-sense disambiguation techniques, machine translation (MT) researchers have successfully integrated various types of source language context into the PBSMT model to improve target...
In the English-Vietnamese machine translation (EVMT) project at Ho Chi Minh City University of Technology, there are some problems that cause the system to malfunction. One of the most undesired phenomena is the lexical gap. A lexical gap occurs when there is no Vietnamese word equivalent to an English word. There are some approaches to this obstacle. Some researchers prefer replacing lexical gap by its nearest...
Conversion from another language to a native language is in high demand due to the increasing use of web-based applications. First, the sentence in a native language is converted to Universal Networking Language (UNL) expressions, and then the UNL expressions can be converted to any native language. UNL systems have already been developed for most languages, but there are no algorithms to...
This paper presents a method for training the log-linear model in statistical machine translation based on the structural support vector machine. The method is designed to directly optimize parameters with respect to translation quality. By adopting the maximum-margin principle of the SVM, the MT model can learn from training samples with generalization capability. Experiments are carried out on a hierarchical phrase-based...
The following topics are dealt with: language lexicon, morphology, syntax and parsing; information extraction; text understanding and summarization; machine translation; language resources; semantics; and spoken language processing.
This paper applies corpus linguistics techniques to study unique words in Hawks' translation of Hong Lou Meng in comparison with Yang Xianyi's translation of it. Concordance and Concapp are used to elicit data from Hong Lou Meng. By analyzing three representative unique words, “dark-red”, “penny”, and “mile”, which belong to color terms, units of money, and units of distance respectively, this paper attempts...
This paper presents an evaluation methodology for an English-to-Sinhala machine translation system. The system has been developed using a multi-agent approach and is powered by the concept of "Varanegeema". The translation system works through communication among nine agents, namely English Morphological Analyzer Agent, English Parser Agent, English to...