Subspace learning algorithms aim to find low-dimensional linear manifolds that are representative of the data at hand. In this paper we propose a semi-supervised approach that fits any given dataset to a low-dimensional subspace while maintaining class separability. Unlike many existing subspace learning algorithms, our approach has no tunable parameters, which obviates the need for cross-validation...
Word level handwritten CAPTCHA generation involves picking a handwritten word from a pre-existing database and cumulatively applying distortions and noise models. In principle, the addition of distortion and noise makes the CAPTCHA robust to automated attacks. However, the primary drawback of the word level CAPTCHA generation is that it limits us to words that already exist in our data set. If the...
The ICML 2013 Workshop on Challenges in Representation Learning (http://deeplearning.net/icml2013-workshop-competition) focused on three challenges: the black box learning challenge, the facial expression recognition challenge, and the multimodal learning challenge. We describe the datasets created for these challenges and summarize the results of the competitions. We provide suggestions...
A key factor in building effective writer identification/verification systems is the amount of data required to build the underlying models. In this research we systematically examine data sufficiency bounds for two broad approaches to online writer identification -- feature space models vs. writer-style space models. We report results from 40 experiments conducted on two publicly available datasets...
Popular CAPTCHA systems consist of garbled printed text character images with significant distortions and noise. It is believed that humans have little difficulty in deciphering the text, whereas automated systems are foiled by the added noise and distortion. However, in recent years, several text based CAPTCHAs have been reported as broken, that is, automated systems can identify the text in the...
Writer identification is the process of determining the author of a handwritten specimen by utilizing characteristics inherent in the sample. In this work, we apply the concept of accents in handwriting to introduce a novel perspective for writer identification. Analogous to speech, accents in handwriting can be defined as distinctive writing quirks that are unique to a group of people sharing a common...
Writer identification can be seen as a multi-class learning problem in which each writer is a distinct class. One of the fundamental approaches to solving a multi-class problem is to break it into binary classification tasks. In this work we propose a generic approach for multi-class classification using an ensemble of binary classifiers. We assign a distributed output representation to each...
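One well-known way to combine an ensemble of binary classifiers with a distributed output representation is error-correcting output coding (ECOC): each class gets a binary codeword, one binary classifier is trained per codeword bit, and a sample is assigned to the class whose codeword is nearest in Hamming distance. The sketch below illustrates that general technique only, under stated assumptions; it is not the authors' method, since the abstract is truncated before the details. The 4-class exhaustive codebook, the least-squares binary learner, and the names `CODEBOOK`, `train_binary`, `fit`, and `predict` are all hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical exhaustive ECOC codebook for 4 classes (e.g. 4 writers):
# one row per class, one column per binary classifier. Every pair of rows
# differs in 4 positions, so a single wrong bit can still be decoded correctly.
CODEBOOK = np.array([
    [ 1,  1,  1,  1,  1,  1,  1],
    [-1, -1, -1, -1,  1,  1,  1],
    [-1, -1,  1,  1, -1, -1,  1],
    [-1,  1, -1,  1, -1,  1, -1],
])


def _with_bias(X):
    """Append a constant column so the linear model has an intercept."""
    return np.hstack([X, np.ones((len(X), 1))])


def train_binary(X, y):
    """Hypothetical binary learner: least-squares linear fit to labels in {-1, +1}."""
    w, *_ = np.linalg.lstsq(_with_bias(X), y.astype(float), rcond=None)
    return w


def fit(X, labels, codebook=CODEBOOK):
    """Train one binary classifier per codebook column.

    For column j, every sample is relabelled with bit j of its class's codeword,
    so each classifier learns one two-way split of the classes.
    """
    return [train_binary(X, codebook[labels, j]) for j in range(codebook.shape[1])]


def predict(X, models, codebook=CODEBOOK):
    """Decode by picking the class whose codeword is closest in Hamming distance."""
    Xb = _with_bias(X)
    # Stack the sign of every binary classifier into a predicted codeword per sample.
    bits = np.column_stack([np.sign(Xb @ w) for w in models])
    # Count disagreeing bits between each predicted codeword and each class codeword.
    dists = (bits[:, None, :] != codebook[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)
```

For example, with four synthetic clusters centred at `5 * np.eye(4)`, calling `fit(X, labels)` and then `predict(5 * np.eye(4), models)` should recover the class indices 0 to 3. The redundancy in the codebook is the design point: because codewords are far apart, an individual binary classifier can err without changing the final decision.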
Accent in speech is defined as a distinctive mode of pronunciation that is unique to a geographical region. In a similar way, we define accent in handwriting as distinctive writing characteristics that are unique to a group of people sharing a common native script. In other words, we postulate that a group of people with a common native script will share certain traits in their handwriting that can...
In this paper we present IBM_UB_1, a new dual-mode, twin-folio structured English handwriting dataset. IBM_UB_1 is our first major release from a large multilingual handwriting corpus. Containing over 6000 pages of handwritten matter, this dataset can be used not only for unconstrained handwriting recognition; more importantly, its unique twin-folio structure presents a natural fit for research...
Availability of sufficient labeled data is key to the performance of any learning algorithm. However, in document analysis, obtaining large amounts of labeled data is difficult, and the scarcity of labeled samples is often the main bottleneck for algorithm performance. Unlabeled samples, by contrast, are abundant. We propose a semi-supervised framework for writer identification...
With the explosive growth of the tablet form factor and greater availability of pen-based direct input, writer identification in online environments is becoming increasingly critical for a variety of downstream applications such as intelligent and adaptive user environments, search, retrieval, indexing, and digital forensics. Extant research has approached writer identification by using writing styles...
Accent in handwriting can be defined as the influence of a writer's native script on his/her writing style in another script. In this paper, we address the problem of detecting the existence of accents in handwriting. We use two sets of writers: those who can write only in English, and multilingual writers who can also write in English. We learn the writing...
Text recognition techniques, and the performance of text recognition systems and software, have improved greatly in recent years. OCR systems can now read machine-printed documents with good accuracy. However, these advances are primarily for Latin scripts, and even for such scripts performance is limited on handwritten documents. Little work has been done for cursive scripts such as Arabic, and still there is...