In this paper we present a new database with handwritten Arabic script. It is based on five books written by different writers from the years 1088–1451. We took 680 pages from these five books, and fully annotated them on the sub-word level. For each page we manually applied bounding boxes on the different sub-words and annotated the sequence of characters. It consists of 121,636 sub-word appearances...
Social networks have a large amount of data available, but people often do not provide some of their personal data, such as age, gender, and other demographics. Although sentiment analysis uses such data to develop useful applications in people’s daily lives, this type of analysis still fails at times, either because of the restricted number of words contained in the word dictionaries or because...
Handwriting can be used to predict or analyze a person's behavioral or personality traits by studying its characteristics. In this work, various characteristics such as size, spacing, slant, skew, and pressure are studied. Since a signature reflects important characteristics of a person, we analyze signatures as well. As our objective is to build an automated or computerized handwriting...
In recent years there has been a growing interest in digitizing the extensive collections of books and documents that predate the widespread adoption of digital technologies. Many of these digitizing initiatives deal with huge collections of handwritten documents, for which document image analysis techniques (page segmentation, keyword-spotting, optical character recognition (OCR), etc.) are...
In this paper, we present a modified method for detecting text lines in handwritten documents based on the Block-Based Hough Transform. The algorithm has a practical application in manuscript author identification. The proposed technique consists of three steps: preprocessing, detecting potential text lines, and eliminating the false ones. The first step covers the following operations: image...
Identification of individuals from handwritten documents using automated recognition systems has gained significant research interest due to the wide variety of applications it offers for forensic analysis, signature verification, classification of historical writings and other document analysis tasks. In this paper, we present a framework that combines different feature space representations of handwriting...
A corpus is a large collection of texts that can be automatically analyzed for linguistic patterns and structures using interactive tools. Corpus-based language learning has gained prominence in recent years thanks to the advances in computing technologies, such as text mining, searching, and natural language processing. The size and variety of corpora have also grown significantly in recent years...
This paper describes a database of on-line handwritten patterns that mix text, figures, tables, maps, diagrams and so on. Pen-based and touch-based interfaces are now becoming widespread, and their surfaces are getting larger. People can write and draw mixed objects without paying attention to the differences between objects or to mode changes. Moreover, they may write text in any direction in combination...
This paper introduces a new offline handwriting database that was developed to be employed in performance evaluation, result comparison and development of new methods related to handwriting analysis and recognition. The database can particularly be used for signature verification, writer recognition and writer demographics classification. In addition, the database also supports isolated digit recognition,...
A key factor in building effective writer identification/verification systems is the amount of data required to build the underlying models. In this research we systematically examine data sufficiency bounds for two broad approaches to online writer identification -- feature space models vs. writer-style space models. We report results from 40 experiments conducted on two publicly available datasets...
Segmenting the script image of a document is the most decisive step for successfully transliterating that image into another script, for example automatically transliterating a printed Javanese manuscript image into a Latin manuscript. This paper gives an example of applying a modified profile projection to the segmentation of Javanese script document image of...
The estimation and correction of handwritten word skew is a difficult and challenging task since it has to be independent of the variations due to handwriting style and writing conditions. In this paper, a coarse-to-fine technique that integrates core-region information is presented. First, a rough estimation and correction of the skew is accomplished by cutting the word vertically into two overlapping...
Codebook-based representations have been effectively employed for writer identification. Most of the codebook-based methods generate a codebook by clustering a set of patterns extracted from an independent data set. The probability of occurrence of the codebook patterns in a given writing is then used to characterize its author. This study investigates the hypothesis that the codebook is merely a...
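The codebook idea described in the abstract above can be illustrated with a minimal Python sketch. It assumes the codebook has already been built (the clustering step is not shown) and that each extracted pattern is a plain feature tuple. The function names `nearest_codeword`, `occurrence_histogram`, and `chi_square_distance` are hypothetical and are not taken from the paper, and the chi-square comparison is one common choice rather than the study's stated measure.

```python
from collections import Counter


def nearest_codeword(pattern, codebook):
    """Return the index of the codebook entry closest to `pattern`
    under squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(pattern, codebook[i])),
    )


def occurrence_histogram(patterns, codebook):
    """Probability of occurrence of each codebook entry in one writing sample."""
    if not patterns:
        raise ValueError("at least one pattern is required")
    counts = Counter(nearest_codeword(p, codebook) for p in patterns)
    total = len(patterns)
    return [counts[i] / total for i in range(len(codebook))]


def chi_square_distance(h1, h2):
    """Dissimilarity between two occurrence histograms (assumed measure).
    Bins that are empty in both histograms are skipped."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)
```

In such a scheme, a questioned writing would be attributed to the known writer whose histogram has the smallest distance to it.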
This paper investigates highly discriminating features for writer identification for off-line handwritten text lines and passages. Five categories of features are tested: slant and slant energy, skew, pixel distribution, curvature, and entropy. Four experiments are run utilizing the IAM Handwriting Database and the ICDAR 2011 Writer Identification Contest dataset: the first, on 10 writers from the...
In recent years, several methods have been proposed for content-based retrieval from manuscripts, mostly based on character or word similarity. In this paper, we present a new segmentation-free method, called Harris Corner Matching (HCM), which accepts an arbitrary writing pattern as a model and allows retrieval of similar patterns from a possibly large database. Retrieval is performed in two steps...
In this paper, we propose an original representation model for handwriting document images. Most state-of-the-art handwriting representation models use textural properties, selective dominant features (such as stroke orientation or gradient orientation), or structural properties only in isolation. To avoid the drawbacks of using the properties from a single aspect, we design a comprehensive model that...
This paper introduces the Generalized Eigen Cooccurrence Matrix (GECM) as a new feature to describe complex structures like images of handwriting for palaeographic expertise. It measures the spatial dependency between two features in the image. It generalizes the popular grey level cooccurrence Dependencies (SGLD), which uses the luminance for the two features. Second-order statistics generate high dimensional...
In this paper we present a new dual mode, twin-folio structured English handwriting dataset IBM_UB_1. IBM_UB_1 is our first major release from a large multilingual handwriting corpus. Containing over 6000 pages of handwritten matter, this dataset can be used for unconstrained handwriting recognition; more importantly, the dataset's unique twin-folio structure presents a natural fit for research...
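For orientation, the classic grey-level co-occurrence statistic that the abstract above says GECM generalizes can be sketched in a few lines of Python. This is only the luminance-based baseline, not the GECM itself. The function name `cooccurrence_matrix` and the parameters `levels`, `dx`, and `dy` are assumptions made for this illustration.

```python
def cooccurrence_matrix(image, levels, dx=1, dy=0):
    """Count how often grey level i at (row, col) co-occurs with
    grey level j at (row + dy, col + dx).

    `image` is a list of rows of integer grey levels in range(levels).
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows = len(image)
    for r in range(rows):
        for c in range(len(image[r])):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose second pixel falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < len(image[r2]):
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix
```

The matrix has `levels × levels` entries per displacement, which shows how quickly such second-order statistics grow in dimension.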
Text line detection is a pre-processing step for automated document analysis such as word spotting or OCR. It is additionally used for document structure analysis or layout analysis. Considering mixed layouts, degraded documents and handwritten documents, text line detection is still challenging. We present a novel approach that targets torn documents having varying layouts and writing. The proposed...
Document skew correction is one of the core preprocessing steps in document analysis systems. In this paper, the author proposes a new multi-step skew detection technique for printed Arabic documents. The technique exploits the unique property of the writing line of Arabic script and is based on connected component analysis and projection profiles. The proposed technique works for different types...
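A generic projection-profile skew estimate can be sketched as follows. This is not the multi-step technique proposed in the abstract above, which also relies on connected component analysis. It is a plain brute-force search under the assumption that the page is given as a list of foreground pixel coordinates `(x, y)`, and the names `profile_energy` and `estimate_skew` are hypothetical.

```python
import math


def profile_energy(pixels, angle_deg):
    """Sum of squared row counts of the horizontal projection profile
    after rotating the foreground pixels by -angle_deg.

    The value is largest when text lines are horizontal, because the
    pixels of each line then fall into few rows."""
    theta = math.radians(angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    rows = {}
    for x, y in pixels:
        row = round(-x * sin_t + y * cos_t)  # y coordinate after rotation
        rows[row] = rows.get(row, 0) + 1
    return sum(n * n for n in rows.values())


def estimate_skew(pixels, candidates):
    """Return the candidate angle (in degrees) with the sharpest profile."""
    return max(candidates, key=lambda a: profile_energy(pixels, a))
```

A call such as `estimate_skew(pixels, range(-10, 11))` searches whole-degree angles; a finer grid or a coarse-to-fine search would refine the estimate at a higher cost.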