Word spotting strategies employed in historical handwritten documents face many challenges due to variation in writing style and intense degradation. In this paper, a new method that permits effective word spotting in handwritten documents is presented. It relies upon document-oriented local features, which take into account information around representative keypoints, as well as a matching process...
The recognition accuracy of ligature-based Urdu optical character recognition (OCR) systems depends heavily on the accuracy of the segmentation that converts Urdu text into lines and ligatures. In general, line- and ligature-based Urdu OCR systems are more successful than character-based ones. This paper presents the techniques for segmenting Urdu Nastaleeq text images into lines and...
In recent years there has been a growing interest in digitizing the large numbers of books and documents that predate the widespread adoption of digital technologies. Many of these digitization initiatives deal with huge collections of handwritten documents, for which document image analysis techniques (page segmentation, keyword spotting, optical character recognition (OCR), etc.) are...
Deep learning has had a significant impact on diverse pattern recognition tasks in the recent past. In this paper, we investigate its potential for keyword spotting in handwritten documents by designing a novel feature extraction system based on Convolutional Deep Belief Networks. Sliding window features are learned from word images in an unsupervised manner. The proposed features are evaluated both for...
This paper deals with identifying a writer from his/her offline handwriting. In a multilingual country where a writer can scribe in multiple scripts, writer identification becomes challenging when we have individual handwriting data in one script while we need to verify/identify a writer from handwriting in another script. In this paper such an issue is addressed with two scripts: English and Bengali...
Image segmentation plays an important role in digitized crime scene forensics. Particularly in the context of modern high-resolution, contactless and non-destructive acquisition and analysis of handwriting impression traces by means of 3D sensors, one main challenge is the separation of writing trace areas and non-traces by image segmentation. In earlier work, the authors have presented the general, yet qualitative...
The individuality of handwriting is the reason why it is used as a common basis for detecting character traits of the writer. It is believed that dynamic information improves the accuracy of the analysis, but it is not contained in an offline handwritten text. In order to recover dynamic information, a novel approach for handwriting trajectory recovery is proposed in this paper. The procedure...
We propose a novel approach to assist content transcription of handwritten digital documents. It adopts a segmentation-based keyword retrieval method that follows the query-by-string paradigm and exploits user validation of the retrieved words to improve its performance during operation. Our approach starts with an initial training set, which contains only a few pages and a tentative...
Existing transcripts of historic manuscripts are a very valuable resource for training models useful for automatic recognition, aided transcription, and/or indexing of the remaining untranscribed parts of these collections. However, these existing transcripts generally exhibit two main problems which limit their usefulness: a) the text of the transcripts is seldom aligned with manuscript lines, and...
This paper deals with offline handwritten word recognition of a major Indic script: Bengali. Due to the structure of this script, the characters (mostly ortho-syllables) are frequently overlapping and hard to segment, especially when the writing is cursive. Individual character recognition and the combination of outputs can increase the likelihood of errors. Instead, a better approach can be sending...
This paper presents an interactive approach for fast and accurate page layout segmentation. It is a scribble-based interactive segmentation approach, where the user draws scribbles on the various regions and the system performs page layout segmentation. The user can correct and refine the resulting segmentation by drawing new scribbles. To classify the various regions of the page, we apply a bank...
Line segmentation is crucial in handwritten text recognition and analysis tasks. A new text line extraction scheme based on a data clustering algorithm is proposed. Our approach starts by determining the number of lines and setting the initial positions of the text line midpoints using a modified piece-wise projection profile technique. We then apply a competitive learning algorithm to adaptively move...
In this paper, we propose a new writing type and script text classification technique to recognize the identity of texts extracted from heterogeneous document images. English, French and Arabic languages are used in these documents with mixed handwritten and machine-printed types. In order to identify each text-line/word image, we propose to use 23 features computed on a fixed-length sliding window...
Urdu Nastaleeq is a highly cursive, context-sensitive script, written diagonally from top right to bottom left. This makes it difficult to segment a partial or complete word into characters. Furthermore, due to the stacking of characters, segmentation at the character level is hard to perform. Some researchers have performed the segmentation and have succeeded to a good extent, but still some...
Optical Character Recognition (OCR) is one of the key research areas of Artificial Intelligence (AI), and image text recognition is one of the challenging fields of OCR. The presented work offers a recognition system for segmented characters of cursive scripts (e.g., Arabic, Urdu) from their images. The presented methodology consists of the following phases: (1) Image Acquisition, (2) Preprocessing, (3) Chain...
Automatic identification of the author of a document has a variety of applications for both online and offline handwritten data, such as facilitating the use of writer-dependent recognizers, verifying a claimed identity for security, enabling personalized HCI, and countering repudiation for legal purposes. Most of the existing writer identification techniques require the data to be from a specific...
Today, Arabic is one of the major challenges in Optical Character Recognition (OCR) in support of digital communication. There is much research on Arabic OCR for both printed and handwritten input. However, research on Arabic OCR with harakat is still scarce due to the high degree of difficulty of the segmentation techniques. In this paper, we propose a new segmentation scheme of the Arabic character...
Text line extraction is an important part of document image analysis. It provides significant information for follow-up character recognition and text-based retrieval. By analyzing the layout style and writing features of a document image with radicalized Bagua layout in Jiugong in Shui script, we propose a multi-directional text partition method for Shui Script based on Delaunay triangular mesh and...
The Lanna Dharma alphabet was used in the past in the north of Thailand, mainly for religious communication. Most handwritten Lanna Dharma is found in the form of old palm-leaf manuscripts. These documents have not been properly preserved; they remain unprotected and have been damaged over time. To preserve these valuable documents, handwritten optical character recognition is one of the first choices. This paper proposes...
In this paper, we present a modified method for detecting text lines in handwritten documents based on the Block-Based Hough Transform. The algorithm has a practical application in manuscript author identification. The proposed technique consists of three steps: preprocessing, detection of potential text lines, and elimination of false ones. The first step covers the following operations: image...