Current approaches to text-line segmentation are often either highly specialized to specific domains or dependent on many parameters. More specifically, large text lines, i.e., headings and titles in Arabic-like scripts, are not segmented correctly by state-of-the-art methods. In this work, we present a simple and robust text-line segmentation approach. The proposed...
Brain development is a protracted and dynamic process. Many studies have charted the trajectory of white matter development, but here we sought to map these effects in greater detail, based on a large set of fiber tracts automatically extracted from HARDI (high angular resolution diffusion imaging) at 4 tesla. We used autoMATE (automated multi-atlas tract extraction) to extract diffusivity measures...
Traumatic brain injury (TBI) is the leading cause of death and disability in children, and can lead to long-lasting functional impairment. Many factors influence outcome, but imaging studies examining the effects of individual variables are limited by sample size. Roughly 20–40% of hospitalized TBI patients experience seizures, but not all of these patients go on to develop a recurrent seizure disorder...
This paper presents the first Pashto text image database for scientific research and thereby the first dataset with complete handwritten and printed text-line images that covers the full alphabets of the Arabic and Persian languages. Languages like Pashto, written in a complex way by calligraphers, still require a mature Optical Character Recognition (OCR) system. Although 50 million people use...
Optical Character Recognition (OCR) of cursive scripts like Pashto and Urdu is difficult due to the presence of complex ligatures and connected writing styles. In this paper, we evaluate and compare different approaches for the recognition of such complex ligatures. The approaches include the Hidden Markov Model (HMM), the Long Short-Term Memory (LSTM) network, and the Scale Invariant Feature Transform (SIFT). Current...
Atomic segmentation of cursive scripts into constituent characters is one of the most challenging problems in pattern recognition. To avoid segmentation in cursive script, concrete shapes are considered as recognizable units. Therefore, the objective of this work is to identify alternative recognizable units in Pashto cursive script. These alternatives are ligatures and primary ligatures. However,...
Cursive handwriting recognition is still an active topic of research, especially for non-Latin scripts. One of the techniques that yields the best recognition results is based on recurrent neural networks, with neurons modeled by long short-term memory (LSTM) cells and the alignment of the label sequence to the output sequence performed by a connectionist temporal classification (CTC) layer. However, network training...
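As a rough illustration of the CTC alignment idea mentioned above, the following minimal sketch shows greedy (best-path) CTC decoding: take the top label per frame, merge consecutive repeats, then drop blanks. It is not taken from the paper. The function name `ctc_greedy_decode`, the blank index `0`, and the plain-list input format are all assumptions made for this example.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank label (a convention chosen for this sketch)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Collapse per-frame label scores into a label sequence using the CTC rule.

    frame_scores: list of per-time-step score lists, one score per label.
    """
    # Pick the highest-scoring label index at every time step.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    # Merge runs of identical consecutive labels into a single label.
    collapsed = [label for label, _ in groupby(best_path)]
    # Remove blanks; a blank between two equal labels keeps them as separate outputs.
    return [label for label in collapsed if label != blank]
```

Greedy decoding is only the simplest way to read out a CTC-trained network; beam search or lexicon-constrained decoding are common alternatives.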
Recurrent neural networks (RNNs) have been successfully applied to the recognition of cursive handwritten documents in both English and Arabic scripts. The ability of RNNs to model context in sequence data like speech and text makes them a suitable candidate for developing OCR systems for printed Nabataean scripts (including Nastaleeq, for which no OCR system is available to date). In this work, we have presented...
Optical character recognition (OCR) of machine-printed Latin script documents is widely claimed to be a solved problem. However, error-free OCR of degraded or noisy text is still challenging for modern OCR systems. Most recent approaches perform segmentation-based character recognition. This is tricky because segmentation of degraded text is itself problematic. This paper describes a segmentation...
A large amount of real-world data is required to train and benchmark any character recognition algorithm. Developing a page-level ground-truth database for this purpose is overwhelmingly laborious, as it involves considerable manual effort to produce a reasonable database that covers all possible words of a language. Moreover, generating such a database for historical (degraded) documents or for a cursive...
Segmentation and recognition of screen-rendered text is a challenging task due to its low resolution (72 or 96 ppi) and use of antialiased rendering. This paper evaluates Hidden Markov Model (HMM) techniques for OCR of low-resolution text (both screen-rendered isolated characters and screen-rendered text lines) and compares their performance with that of other commercial and open-source OCR systems...
Document script recognition is one of the important preprocessing steps in a multilingual optical character recognition (MOCR) system. An MOCR system requires prior knowledge of the script to accurately recognize multilingual text in a single document. In multilingual documents, two scripts can be mixed together within a single text line. Many existing script recognition methods lack the ability to recognize...