We have proposed a complete system for text detection and localization in gray scale scene images. A boosting framework integrating feature and weak classifier selection based on computational complexity is proposed to construct efficient text detectors. The proposed scheme uses a small set of heterogeneous features which are spatially combined to build a large set of features. A neural network based...
Detection of characters in scenery images is often a very difficult problem. Although many researchers have tackled this problem and achieved good performance, it is still difficult to suppress the many false alarms and misses. This paper investigates a conspicuous character pattern, which is a special pattern designed for easier detection. In order to have an example of the conspicuous...
We present a writer adaptive training and writer clustering approach for an HMM based Arabic handwriting recognition system to handle different handwriting styles and their variations. Additionally, a writing variant model refinement for specific writing variants is proposed. Current approaches try to compensate for the impact of different writing styles during preprocessing and normalization steps. Writer...
The extraction of textual content from colour documents of a graphical nature is a complicated task. The text can be rendered in any colour, size and orientation while the existence of complex background graphics with repetitive patterns can make its localization and segmentation extremely difficult. Here, we propose a new method for extracting textual content from such colour images that makes no...
In this paper, we present a new text line extraction method for handwritten Arabic documents. The proposed technique is based on a generalized adaptive local connectivity map (ALCM) using a steerable directional filter. The algorithm is designed to solve the particularly complex problems seen in handwritten documents such as fluctuating, touching or crossing text lines. The proposed algorithm consists...
Performance evaluation of document recognition systems is a difficult and practically important problem. Issues arise in defining requirements, in characterizing the system's range of inputs and outputs, in interpreting published performance evaluation results, in reproducing performance evaluation experiments, in choosing training and test data, and in selecting performance metrics. We discuss...
Although it has long been recognized that non-biometric factors (for example, general demographic characteristics) can have an impact on the performance of automated person identification systems, such information is not routinely adopted in most practical biometric processing. In forensic applications, however, such additional information may be exploited most productively, since typical scenarios...
The forensic investigation of a questioned signature written on a piece of paper is a challenging task. Electronic pen-tablets for recording writing movements are considered valuable tools to assist in this effort. However, little is known about the precision and reliability of such electronic devices, which were not originally intended as forensic equipment. Moreover, very few studies are conducted...
In this paper we analyze our recent research on the use of document analysis techniques for metadata extraction from PDF papers. We describe a package that is designed to extract basic metadata from these documents. The package is used in combination with a digital library software suite to easily build personal digital libraries. The proposed software is based on a suitable combination of several...
In this paper, we tackle the problem of localizing graphical symbols on complex technical document images by using an original approach to solve the subgraph isomorphism problem. In the proposed system, document and symbol images are represented by vector-attributed region adjacency graphs (RAG) which are extracted by a segmentation process and feature extractors. Vertices representing regions are...
Standard databases play very important roles in pattern recognition tasks. To compare the performance of different algorithms, they must be tested on the same dataset. In Farsi, there is no database of handwritten texts to evaluate different algorithms. In this paper, an unconstrained Farsi handwritten text database is introduced. 250 participants of different ages and education levels filled 1000...
A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. GERMANA is the result of digitising and annotating a 764-page Spanish manuscript from 1891, in which most pages only contain nearly calligraphed text written on ruled sheets of well-separated lines. To our knowledge, it is the...
In today's world, form processing systems must be able to recognize mutant forms that appear to be based on differing templates but are actually only a variation of the original. A single definition of a representative template actually covers large varieties of the same logical templates. We developed a method and system, similar to the human visual system, which differentiates between templates...
In document image understanding, public datasets with ground-truth are an important part of scientific work. They are not only helpful for developing new methods, but also provide a way of comparing performance. Generating these datasets, however, is time consuming and cost-intensive work, requiring a lot of manual effort. In this paper we both propose a way to semi-automatically generate ground-truthed...
With large databases of document images available, a method for users to find keywords in documents will be useful. One approach is to perform Optical Character Recognition (OCR) on each document followed by indexing of the resulting text. However, if the quality of the document is poor or time is critical, complete OCR of all images is infeasible. This paper builds upon previous work on Word Shape...
With the rise of tools for clinical decision support, there is an increased need for automatic processing of electrocardiogram (ECG) documents. In fact, many systems have already been developed to perform signal processing tasks such as 12-lead off-line ECG analysis and real-time patient monitoring. All these applications require an accurate detection of the heart rate of the ECG. In this paper,...
Developing effective learning algorithms for online cursive word recognition is still a challenging research issue. In this paper, we propose a probabilistic framework to model the inherent ambiguity of cursive handwriting by using a soft target vector for each character class. In the proposed algorithm, the values of soft targets are estimated by introducing a lower bound on the log likelihood and optimizing...
This paper describes a document recognition system for 16th century German staffless lute tablature notation. We present methods for page layout analysis, symbol recognition and symbol layout analysis and report error rates for these methods on a variety of historic prints. Page layout analysis is based on horizontal separator lines, which may interfere with other symbols. The proposed algorithm for...