This paper describes a web-based system for page segmentation and text recognition of historical documents. The system is organised as a pipeline of four steps: 1) digitisation, 2) preprocessing, 3) textline extraction, and 4) handwritten text recognition based on hidden Markov models. In this study we evaluated the system on the “Statuti del Doge Tiepolo”, a 14th century manuscript written...
Arabic script is cursive in both printed and handwritten forms. This intrinsic cursiveness makes the segmentation task challenging. An Arabic word generally consists of multiple parts known as Parts of Arabic Words (PAWs) or simply sub-words. Sub-words quite frequently share the same vertical space, which makes the vertical projection segmentation technique inefficient. Several Arabic letters...
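The limitation of vertical projection mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the function name `vertical_projection_segments` and the toy binary images are hypothetical, and the image is assumed to be a list of rows with 1 for ink and 0 for background. The sketch cuts wherever a column contains no ink, so two sub-words that overlap in even one column are returned as a single segment.

```python
def vertical_projection_segments(image):
    """Split a binary image (rows of 0/1) into column ranges that contain ink.

    Returns a list of (start, end) pairs, end exclusive. A cut is only
    possible at columns whose ink count is zero.
    """
    width = len(image[0])
    # Column profile: number of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x                      # a run of ink columns begins
        elif count == 0 and start is not None:
            segments.append((start, x))    # the run ends at an empty column
            start = None
    if start is not None:
        segments.append((start, width))    # run reaches the right edge
    return segments


# Two sub-words separated by an empty column: the profile finds the gap.
separated = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]

# Two sub-words that do not touch (row 1 is blank between them) but
# share column 2: no empty column exists, so they cannot be split.
overlapping = [
    [1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]

print(vertical_projection_segments(separated))
print(vertical_projection_segments(overlapping))
```

In the second image the two components are disconnected, yet the projection reports one segment, which is the failure mode the abstract attributes to sub-words sharing vertical space.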
Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform previous state-of-the-art on two publicly available video text datasets — ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and...
This paper describes an online handwritten cursive word recognition approach by combining segmentation-free and segmentation-based methods. To search the optimal segmentation and recognition path as the recognition result, we can attempt two methods: segmentation-free and segmentation-based, where we expand the search space using a character-synchronous beam search strategy. The probable search paths...
In this work, a system based on a Bayesian approach is proposed for the off-line recognition of handwritten Arabic words. Different structural features, such as ascenders, descenders, loops and diacritics, are extracted from the word's image, taking into account the morphology of handwritten Arabic words. For accurate feature extraction, we propose a novel method to estimate the word's baseline and...
The segmentation of touching characters is still a challenging problem in offline Chinese handwriting recognition. One feasible solution is through the over-segmentation strategy which maintains a high recall of correct cuts between adjacent characters and a moderate level of redundant cuts within a single character. Previous redundant cut filtering methods rely on either pure heuristics or learned...
Urdu Nastaleeq is a highly cursive, context sensitive language, written diagonally from top right to bottom left. This makes it difficult to segment the partial word or a complete word into characters. Further due to stacking of characters, the segmentation at the character level is hard to perform. Some researchers have performed the segmentation and have succeeded to a good extent, but still some...
The main aim of the proposed system is to increase security in the banking environment using signature verification. In some real-world settings, a signature can be used for personal identification. It is used for authentication or for concluding a document. Signature verification is very important for reducing fraud in banks. It increases accuracy and efficiency. Various methods...
Urdu Nastaleeq is a highly cursive, context sensitive language, written diagonally from top right to bottom left, which makes it difficult to segment a partial word or a complete word into characters. Further, due to stacking of characters, the segmentation at the character level is hard to perform. Some researchers have performed the ligature level segmentation and have succeeded to a great extent,...
This paper describes a comparison between online handwritten cursive word recognition using segmentation-free method and that using segmentation-based method. To search the optimal segmentation and recognition path as the recognition result, we attempt two methods: segmentation-free and segmentation-based, where we expand the search space using a character-synchronous beam search strategy. The probable...
Machine simulation of human reading has been a subject of intensive research for almost four decades. Automatic Urdu character recognition remains a challenging task due to its cursive nature, despite the fact that the latest improvements in recognition methods and systems for Latin script are very promising. This work introduces a robust approach based on statistical models that provide a solution for...
Segmentation of lines, words and characters is one of the critical phases of optical character recognition (OCR). Due to imperfect segmentation, most recognition systems produce poor recognition rates. In this paper we discuss a novel approach for line, word and character segmentation of printed Manipuri documents. Little work has been done for optical character recognition on...
This paper presents a novel approach for offline Bangla (Bengali) handwritten word recognition using a Hidden Markov Model (HMM). Due to the presence of complex features such as headline, vowels, modifiers, etc., character segmentation in Bangla script is not easy. Also, the positions of vowels and compound characters make the task of segmenting words into characters very complex. To take care of these...
In this paper, we propose an analytical approach to the offline recognition of handwritten Arabic. Our method is based on the Hidden Markov Model Toolkit (HTK), a type of modeling that takes into consideration the characteristics of Arabic script and possible inclinations of cursive words. The objective is to propose a methodology for rapid implementation of our approach. To this end, a preprocessing...
Recognition of curved text in natural scene images is a challenging task. Due to complex backgrounds, noise, and the unpredictable characteristics of scene text, text characters in strings are often touching, which affects the performance of segmentation and recognition. This paper presents a novel approach for curved text recognition using Hidden Markov Models (HMM). From curved text, a path of sliding...
For most Indian scripts, Optical Character Recognition (OCR) problems are often formulated as an isolated character (symbol) classification task followed by a post-classification stage (which contains modules like Unicode generation, error correction etc.) to generate the textual representation. Such approaches are prone to failures due to (i) difficulties in designing reliable word-to-symbol...
A hidden Markov model (HMM) based method for Chinese legal amount recognition is presented in this paper. In the training phase, gradient features are extracted from sliding windows and character HMMs are trained with single character images. In the recognition phase, the text line image is segmented using a sentence HMM, which is constructed from character HMMs according to a strict language model. The...
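The idea of segmenting a text line by decoding a chain of character models can be sketched with a heavily simplified Viterbi alignment. This is an illustration under stated assumptions, not the paper's system: each character is reduced to a single state, transitions are unweighted, the character sequence is assumed known, and the names `align_frames` and `frame_scores` as well as the toy scores are hypothetical.

```python
def align_frames(frame_scores, n_chars):
    """Viterbi alignment of sliding-window frames to an ordered character chain.

    frame_scores[t][k] is the log-likelihood of frame t under character k.
    Each frame either stays on the current character or advances to the
    next one. Returns the character index assigned to every frame.
    Assumes at least as many frames as characters.
    """
    n_frames = len(frame_scores)
    neg_inf = float("-inf")
    best = [[neg_inf] * n_chars for _ in range(n_frames)]
    back = [[0] * n_chars for _ in range(n_frames)]

    best[0][0] = frame_scores[0][0]        # the line must start in character 0
    for t in range(1, n_frames):
        for k in range(n_chars):
            stay = best[t - 1][k]
            move = best[t - 1][k - 1] if k > 0 else neg_inf
            if move > stay:
                best[t][k] = move + frame_scores[t][k]
                back[t][k] = k - 1
            else:
                best[t][k] = stay + frame_scores[t][k]
                back[t][k] = k

    # Backtrack from the last character at the last frame.
    k = n_chars - 1
    path = [k]
    for t in range(n_frames - 1, 0, -1):
        k = back[t][k]
        path.append(k)
    path.reverse()
    return path


def cut_positions(path):
    """Frame indices at which the aligned character changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]


# Toy scores: the first two frames fit character 0, the last three fit character 1.
frame_scores = [
    [0.0, -5.0],
    [0.0, -5.0],
    [-5.0, 0.0],
    [-5.0, 0.0],
    [-5.0, 0.0],
]
path = align_frames(frame_scores, n_chars=2)
print(path, cut_positions(path))
```

The cut positions fall out of the decoding itself, so no separate segmentation step is needed before recognition. A real character HMM would use several states per character and trained transition probabilities.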
In this paper, we present a solution towards building a retrieval system over handwritten document images that i) is recognition-free, ii) allows text-querying, iii) can retrieve at sub-word level, iv) can search for out-of-vocabulary words. Unlike previous approaches that operate at either character or word levels, we use character n-gram images (CNG-img) as the retrieval primitive. CNG-img are sequences...
Character Recognition is the process by which machines understand a human-readable text document. Today many researchers in academia and industry are interested in this direction. This paper describes a novel method of Character Recognition. The main objective is to use the Radon Transform and Principal Component Analysis to obtain a set of invariant features, on the basis of which a character...
This paper presents an effective approach for the offline recognition of unconstrained handwritten Chinese texts. Under the general integrated segmentation-and-recognition framework with character oversegmentation, we investigate three important issues: candidate path evaluation, path search, and parameter estimation. For path evaluation, we combine multiple contexts (character recognition scores,...