An approach for detecting decorative elements (such as initials and headlines) and text regions in ancient manuscripts is presented. Due to their age, ancient manuscripts suffer from degradation, staining, and ink that has faded over time. Identifying decorative elements and text regions allows indexing a manuscript and serves as input for Optical Character Recognition...
Text recognition in ancient documents poses specific challenges such as degradation and staining, faded ink, fluctuating text lines, superimposed text elements, and varying layouts, among others. To cope with these challenges, a texture-based approach is proposed, which exploits the fact that different kinds of textures have distinct orientation distributions. The orientation information...
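The idea of an orientation distribution can be illustrated with a minimal sketch. It is not the method of the abstract above, whose details are truncated; it only assumes a grayscale image given as a list of rows and builds a magnitude-weighted histogram of gradient orientations. The function name `orientation_histogram`, the `bins` parameter, and the use of central differences are all assumptions made for illustration.

```python
import math


def orientation_histogram(image, bins=8):
    """Magnitude-weighted histogram of gradient orientations in [0, pi).

    `image` is a list of equal-length rows of numeric gray values.
    This is an illustrative sketch, not the approach from the abstract.
    """
    hist = [0.0] * bins
    height, width = len(image), len(image[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the intensity gradient.
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue  # flat pixel: no orientation to record
            # Fold the direction into [0, pi): opposite gradients
            # describe the same stroke orientation.
            angle = math.atan2(gy, gx) % math.pi
            index = min(bins - 1, int(angle / math.pi * bins))
            hist[index] += magnitude
    total = sum(hist)
    # Normalize so histograms of differently sized patches are comparable.
    return [h / total for h in hist] if total else hist
```

Under these assumptions, a patch whose intensity changes only from left to right puts all of its weight in the first bin, while a patch with strokes in many directions spreads its weight across bins. Such a difference is the kind of cue a texture-based classifier could use to separate region types.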
Before the image of a document enters an OCR module, it should undergo preprocessing and document layout analysis steps. Document layout analysis usually comes after preprocessing. Noise removal and skew correction are two major preprocessing operations. Document layout analysis itself is divided into physical and logical layout analysis. Physical layout analysis decomposes the image of a document...
The paper describes a new approach using conditional random fields (CRFs) to extract physical and logical layouts in unconstrained handwritten letters, such as those sent by individuals to companies. In this approach, the extraction of the layouts is treated as a labeling task that consists of assigning a label to each pixel of the document image. This label is chosen from a set of labels depicting...
A new hybrid page layout analysis algorithm is proposed, which uses bottom-up methods to form an initial data-type hypothesis and locate the tab-stops that were used when the page was formatted. The detected tab-stops are used to deduce the column layout of the page. The column layout is then applied in a top-down manner to impose structure and reading-order on the detected regions. The complete...
In this paper we propose a new approach to improve electronic editions of human science corpora by providing an efficient estimation of manuscript page structure. In any handwritten document analysis process, text line segmentation is an important stage. The presence of variable inter-line spaces, of inconstant base-line skews, overlapping and occlusions in unconstrained ancient 19th-century handwritten...
Segmenting and classifying text and non-text regions is an important step in a document layout analysis system before the document is presented to an OCR system. Heuristic rules have been used to segment and classify the text and non-text blocks. This research focuses on the classification of non-text blocks in technical documents into tables, graphs, and figures. A comparative study is conducted between backpropagation...
This paper presents a novel approach for visual scene modeling and classification, investigating the combined use of text modeling methods and local invariant features. Our work attempts to elucidate (1) whether a text-like bag-of-visterms (BOV) representation (histogram of quantized local visual features) is suitable for scene (rather than object) classification, (2) whether some analogies between...
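The parenthetical definition of the BOV representation, a histogram of quantized local visual features, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that local descriptors have already been extracted and that a codebook of visual words already exists (for example from clustering), and the names `bov_histogram`, `descriptors`, and `codebook` are hypothetical.

```python
from math import dist


def bov_histogram(descriptors, codebook):
    """Normalized bag-of-visterms histogram for one image.

    `descriptors` is a sequence of feature vectors from one image and
    `codebook` is a sequence of visual-word centers of the same dimension.
    Both are assumed to be given; how they are obtained is out of scope.
    """
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        # Quantize: pick the index of the nearest visual word
        # (Euclidean distance).
        nearest = min(
            range(len(codebook)),
            key=lambda k: dist(descriptor, codebook[k]),
        )
        counts[nearest] += 1
    total = sum(counts)
    # Normalize so images with different numbers of features are comparable.
    if total == 0:
        return [0.0] * len(codebook)
    return [count / total for count in counts]
```

As with a bag-of-words model for text, the histogram discards where each feature occurred in the image and keeps only how often each visual word appeared.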