The article deals with the use of prefixes in the Czech accentual syllabic trochee. We test a hypothesis raised by Miroslav Červenka, Květa Sgallová, and Petr Kaiser which states that some authors in the 19th century used prefixes to moderate rhythmical irregularities. In our analysis – based on automatic prefix recognition in a large body of poetic texts from the Corpus of Czech Verse – we observe...
This paper proposes two new measures for dealing with word dispersion in a language corpus: reduced frequency and rarity. Their calculation is described and some results from the Czech National Corpus (CNC) are presented. Some previous approaches are briefly mentioned.
If a corpus is submitted to a morphological analysis, there always remain some words that the analyser could not recognize (foreign names, misspellings, ...). However, if a human reads the texts, he usually understands them, even if he does not know as many words as there are in the lexicon used by the morphological analyser. The language itself helps him to recognize unknown words. It is not only semantics...
In the paper, we present a software tool Affisix for automatic recognition of prefixes. On the basis of an extensive list of words in a language, it determines the segments – candidates for prefixes. There are two methods implemented for the recognition – the entropy method and the squares method. We briefly describe the methods, propose their improvements and present the results of experiments with...
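The abstract above names an entropy method without giving details. As a rough illustration only, the sketch below scores word-initial segments by the entropy of the character that follows them, on the assumption that a high value hints at a morpheme boundary. The function name `successor_entropy`, the `max_len` parameter, and the toy word list are hypothetical and are not taken from Affisix.

```python
import math
from collections import Counter, defaultdict


def successor_entropy(words, max_len=4):
    """Map each word-initial segment to the entropy (in bits) of the next character.

    Assumption: this is a generic successor-entropy heuristic, not the
    exact method implemented in Affisix.
    """
    followers = defaultdict(Counter)
    for word in words:
        # Only consider segments that are followed by at least one character.
        for k in range(1, min(max_len, len(word) - 1) + 1):
            followers[word[:k]][word[k]] += 1

    entropy = {}
    for segment, counts in followers.items():
        total = sum(counts.values())
        entropy[segment] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Toy input (hypothetical); a real run would use a large word list.
words = ["redo", "rerun", "retake", "remake", "read"]
scores = successor_entropy(words)
for segment in sorted(scores, key=scores.get, reverse=True)[:3]:
    print(segment, round(scores[segment], 2))
```

In this toy list the segment "re" is followed by five different characters, so it scores higher than "r", which is always followed by "e". On real data a threshold or a comparison with neighbouring segment lengths would be needed to pick candidates.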
The paper deals with automatic methods for prefix extraction and their comparison. We present experiments with Czech and English and compare the results with regard to the size and type (wordforms vs. lemmas) of input data.
We discuss two types of asymmetry between wordforms and their (morphological) characteristics, namely (morphological) variants and homographs. We introduce a concept of multiple lemma that allows for unique identification of wordform variants as well as ‘morphologically-based’ identification of homographic lexemes. The deeper insight into these concepts allows further refining of morphological dictionaries...
We investigate machine translation (MT) of user search queries in the context of cross-lingual information retrieval (IR) in the medical domain. The main focus is on techniques to adapt MT to increase translation quality; however, we also explore MT adaptation to improve effectiveness of cross-lingual IR. Our MT system is Moses, a state-of-the-art phrase-based statistical machine translation system...