In this paper, we present a novel approach for segmentation-based handwritten keyword spotting. The proposed approach relies upon the extraction of a simple yet efficient descriptor which is based on projections of oriented gradients. To this end, a global and a local word image descriptor, together with their combination, are proposed. Retrieval is performed using the Euclidean distance between...
Word segmentation refers to the process of defining the word regions of a text line. It is a critical stage towards word and character recognition as well as word spotting and mainly concerns three basic stages, namely preprocessing, distance computation and gap classification. In this paper, we propose a novel word segmentation method which uses the Student's t-distribution for the gap classification...
The result of a document image segmentation task, e.g. text line or word segmentation, is usually a labeled image with each label corresponding to a different segmented region. For many applications, the segmented regions need to be stored and represented in an efficient way, using simple geometric shapes. A challenging task is to restrict all pixels corresponding to a specific label inside a polygon...
Word spotting and recognition are among the most important applications used today in the field of document processing and text understanding. In word spotting, the goal is to search a scanned document for instances of a specific word. In word recognition, we aim to identify the transcription of the document words. While substantial work in both topics has been published, not all are readily adaptable...
In this paper, we present a new approach for off-line isolated character recognition. The proposed method relies upon the application of a projection-based feature extraction stage, which resembles the Radon transform, on both the original image and a set of generated images corresponding to different gradient orientations of the original image. For the classification stage, Support Vector Machines...
Recognition of old Greek document images containing polytonic (multi-accent) characters is a challenging task due to the large number of existing character classes (more than 270), which cannot be handled sufficiently by current OCR technologies. Taking into account that the Greek polytonic system was used from late antiquity until recently, a large amount of scanned Greek documents still remains...
Document image segmentation is a fundamental step in the document image analysis pipeline as it affects the accuracy of subsequent processing steps. An objective and realistic evaluation of page segmentation techniques is crucial for a quantitative comparison among them. In this paper, a goal-oriented performance evaluation methodology that calculates a comprehensive evaluation measure SR (Success...
H-KWS 2014 is the Handwritten Keyword Spotting Competition organized in conjunction with the ICFHR 2014 conference. The main objective of the competition is to record current advances in keyword spotting algorithms using established performance evaluation measures frequently encountered in the information retrieval literature. The competition comprises two distinct tracks, namely, a segmentation-based...
In order to achieve accurate text recognition performance for historical handwritten document images, robust and efficient page segmentation is necessary. In this paper, we propose a text zone detection step followed by a text line segmentation method suitable for historical handwritten documents. Our aim is to handle several challenging cases such as horizontal and vertical rule lines overlapping with...
Transcript mapping refers to the process of aligning meaningful units of a handwritten document image (e.g. text lines, words, characters) with the corresponding transcription information. It has many applications such as (i) fast generation of ground truth at different granularity levels and (ii) indexing handwritten collections for document retrieval. In this paper, a novel transcript mapping technique...
Tran Scriptorium is a 3-year project that aims to develop innovative, cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using Handwritten Text Recognition (HTR) technology. The production of ground-truth (GT) of a dataset of handwritten document images is among the first tasks. We address novel approaches for the faster production...
Recognition of curved text in natural scene images is a challenging task. Due to complex backgrounds and the unpredictable characteristics of scene text and noise, text characters in strings are often touching, which affects the performance of segmentation and recognition. This paper presents a novel approach for curved text recognition using Hidden Markov Models (HMM). From curved text, a path of sliding...
This paper presents the results of the Handwriting Segmentation Contest that was organized in the context of ICDAR 2013. The general objective of the contest was to use well-established evaluation practices and procedures to record recent advances in off-line handwriting segmentation. Two benchmarking datasets, one for text line and one for word segmentation, were created in order to test and compare...