Style transformation refers to the process by which a piece of text written in a certain style of writing is transformed into another text exhibiting a distinctly different style of writing without significant change to the meaning of individual sentences. In this paper we continue investigation into the linguistic style transformation problem and demonstrate current achievements in transformation...
An unconstrained online handwritten Chinese text line dataset, SCUT-COUCH Textline_NU, a subset of SCUT-COUCH [1] [2], is built to facilitate research on unconstrained online Chinese text recognition. Texts for hand copying are sampled from the China Daily corpus in a stratified random manner. The current version of SCUT-COUCH Textline_NU has 8,809 text lines (4,813 lines are collected by touch...
Inter-language studies of the textual semantic accessibility scale (SAS) are a new branch of computational linguistics, and the present paper statistically probes the SASes in English, French and Japanese literary works sampled from the corresponding corpora. First, six control groups are formed from equidistant texts extracted every 10 pages, 5 pages, 4 pages, 3 pages, 2...
Steganography is the practice of hiding information in cover media such as text and pictures. An improved approach is proposed to embed secret data in Arabic text cover media using Kashida, an Arabic extension character. The proposed approach maximizes the use of Kashida to hide more information, represented as binary bits, in Arabic text cover media. A stego system has been developed based on this...
In information retrieval (IR) systems, a query is compared against a collection of documents, which are then ranked according to a particular similarity measure. Since texts with the same content can be written by different authors, the writing styles of the documents vary accordingly. This observation motivates investigating text by means of style. In this paper, we analyze...
The difficulties in segmenting cursive words into individual characters have shifted the focus of handwriting recognition research from segmentation-based approaches to segmentation-free (holistic) methods. However, maintaining and training a large number of prototypes (models) that represent the words in the dictionary makes the training process extremely expensive and difficult in computing resources...
Text categorization is the task of assigning predefined categories to natural language text. With the widely used 'bag of words' representation, previous research usually assigns each word a value indicating whether the word appears in the document concerned or how frequently it appears. Although these values are useful for text categorization, they have not fully expressed the abundant information...
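The two word values mentioned in this abstract (presence and frequency) can be shown with a minimal Python sketch. The function name `bag_of_words`, the whitespace tokenization, and the sample vocabulary are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def bag_of_words(document, vocabulary):
    """Return (binary, frequency) vectors for a document over a fixed vocabulary.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    which is an assumption and far simpler than a real preprocessing pipeline.
    """
    counts = Counter(document.lower().split())
    # Binary value: 1 if the word appears in the document, otherwise 0.
    binary = [1 if counts[word] > 0 else 0 for word in vocabulary]
    # Frequency value: how many times the word appears in the document.
    frequency = [counts[word] for word in vocabulary]
    return binary, frequency


vocabulary = ["text", "word", "category"]
binary, frequency = bag_of_words(
    "Text categorization assigns a category to text", vocabulary
)
print(binary)
print(frequency)
```

Both vectors discard word order and context, which is the kind of information the abstract says these values do not fully express.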
Typing Japanese text on a computer is not as straightforward as typing Western text. East Asian languages use very large sets of symbols, which are called ideograms or hieroglyphs. Typing words drawn from a set of thousands of symbols requires a procedure in which Latin characters are converted into hieroglyphs. This application is designed to assist typing Japanese texts with...
This paper presents an approach for repairing word order errors in English text by reordering words in a sentence and choosing the version that maximizes the number of trigram hits according to a language model. The novelty of this method concerns the use of an efficient confusion matrix technique for reordering the words. For further reducing the number of permutations the use of unigrams' probability...
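The core idea of this abstract, choosing the word order with the most trigram hits, can be illustrated with a minimal Python sketch. It assumes a toy language model that is just a set of trigrams collected from reference sentences, and it enumerates every permutation by brute force instead of using the confusion matrix technique the paper describes. All function names are hypothetical.

```python
from itertools import permutations


def trigrams(words):
    """Return the consecutive word triples of a word sequence."""
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def build_trigram_set(reference_sentences):
    """Toy stand-in for a language model: the set of trigrams seen in the references."""
    known = set()
    for sentence in reference_sentences:
        known.update(trigrams(sentence.split()))
    return known


def trigram_hits(words, known):
    """Count how many trigrams of the word sequence occur in the known set."""
    return sum(1 for t in trigrams(words) if t in known)


def repair_word_order(sentence, known):
    """Brute-force search: try every permutation and keep the one with most hits.

    This is factorial in sentence length, so it is only practical for short
    sentences; it is an assumption-laden illustration, not the paper's method.
    """
    words = sentence.split()
    best = max(permutations(words), key=lambda p: trigram_hits(p, known))
    return " ".join(best)


known = build_trigram_set(["the cat sat on the mat"])
print(repair_word_order("cat the sat on mat the", known))
```

The factorial growth of the search space is the reason a method of this kind needs some way to reduce the number of permutations considered.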