This paper proposes a novel segmentation-free approach using deep neural network based hidden Markov model (DNN-HMM) for offline handwritten Chinese text recognition. In the general Bayesian framework, three key issues are comprehensively investigated, namely feature extraction, character modeling, and language modeling. First, as for the feature extraction on the basis of each frame or sliding window,...
Fine-grained classification is an extremely challenging problem in computer vision, compounded by subtle differences in shape, pose, illumination and appearance. While convolutional neural networks have become the versatile jack-of-all-trades tool in modern computer vision, approaches for fine-grained recognition still rely on localization of keypoints and parts to learn discriminative features for...
The cascade regression framework has recently been applied successfully to facial landmark detection, achieving state-of-the-art performance. It requires a large number of facial images with labeled landmarks for training regression models. We propose to use the cascade regression framework to detect the eye center by capturing the contextual and shape information of other related eye landmarks. While for eye...
The recently proposed trainable COSFIRE filters are highly effective in a wide range of computer vision applications, including object recognition, image classification, contour detection and retinal vessel segmentation. A COSFIRE filter is selective for a collection of contour parts in a certain spatial arrangement. These contour parts and their spatial arrangement are determined in an automatic...
This paper addresses the problem of transferring CNNs pre-trained for face recognition to a face attribute prediction task. To transfer an off-the-shelf CNN to a novel task, a typical solution is to fine-tune the network towards the novel task. As demonstrated in the state-of-the-art face attribute prediction approach, fine-tuning the high-level CNN hidden layer by using labeled attribute data leads...
We propose a novel method for extracting features from images of people using co-occurrence attributes, which are then used for person re-identification. Existing methods extract features based on simple attributes such as gender, age, hair style, or clothing. Our method instead extracts more informative features using co-occurrence attributes, which are combinations of physical and adhered human...
Object detection and localization in images involve a multi-scale reasoning process. First, responses of object detectors are known to vary with image scale. Second, contextual relationships on a part-level, object-level, and scene-level appear at different scales of the image. This paper studies efficient modeling of these two components by training multi-scale template models. The input to the proposed...
Bilinear-model-based feature-space Maximum Likelihood Linear Regression (FMLLR) speaker adaptation has shown good performance for GMM-HMMs, especially when the amount of adaptation data is limited. In this paper, we propose using bilinear model features as inputs to deep neural networks (DNNs) for rapid speaker adaptation of acoustic modeling to facilitate utterance-level normalization. The effectiveness...
When applied to phoneme recognition, the Connectionist Temporal Classification (CTC) objective function allows a neural network to be trained with the phoneme-level transcriptions of training utterances. A limitation of CTC is that it cannot be applied directly to network training with large speech corpora, since those corpora usually have only word-level transcriptions. This work extends the...
Overfitting is a common issue in automatic speech recognition and is especially harmful when the amount of training data is limited. To address this problem, this article investigates acoustic modeling through Multi-Task Learning with two speaker-related auxiliary tasks. Multi-Task Learning is a regularization method which aims at improving the network's generalization ability, by...
This paper proposes a novel CNN-based approach to content-based image retrieval that exploits supervised learning. We employ a deep CNN model to derive feature representations from the activations of the deepest layers, and we refine the weights of the utilized layers to produce better image descriptors using information obtained from the available data labels. To this end,...
The forced landing problem has become one of the main impediments to UAVs entering civilian airspace. Unfortunately, there is no robust forced landing site detection system that reliably detects a safe landing site. One of the main reasons for this is the difficulty of considering the various classes of surface to determine whether they are safe or not. We propose a robust UAV landing site detection...
On-line supervised spotting and classification of subsequences can be performed by comparing some distance between the stream and previously learnt time series. However, learning a few incorrect time series can trigger disproportionately many false alarms. In this paper, we propose a fast technique to prune bad instances away and automatically select appropriate distance thresholds. Our main contribution...
Domain adaptation (DA) aims to eliminate the difference between the distribution of the labeled source domain, on which a classifier is trained, and that of the unlabeled or partly labeled target domain, to which the classifier is to be applied. Compared with semi-supervised domain adaptation, where some labeled data from the target domain is utilized to help train the classifier, the unsupervised domain adaptation...
Attributes are defined as mid-level image characteristics shared among different categories. These characteristics are well suited to classification problems, especially when training data are scarce. In this paper, we design discriminative real-valued attributes by learning nonlinear inductive maps. Our method is based on solving a constrained optimization problem that mixes three criteria;...
We introduce Delay Pruning, a simple yet powerful technique to regularize dynamic Boltzmann machines (DyBM). The recently introduced DyBM provides a particularly structured Boltzmann machine, as a generative model of a multi-dimensional time-series. This Boltzmann machine can have infinitely many layers of units but allows exact inference and learning based on its biologically motivated structure...
In this work, we propose a metric adaptation method for set-based face verification and evaluate it on the newly released IARPA Janus Benchmark A (IJB-A) dataset and its extended version, the Janus Challenging Set 2 (CS2). A template-specific metric is trained to adaptively learn the discriminative information in test templates and the negative training set, which contains subjects that are mutually...
Imitation cartoon drawing is an important skill for cartoonists, requiring considerable practice and guidance. In this paper, we propose EvaToon, an imitation drawing evaluation system that automatically assigns scores and marks improperly drawn regions. With our system, cartoonists can practise and get guidance by themselves. We have cooperated with several experts on developing...
A new online handwritten Mongolian word database, MRG-OHMW, is introduced in this paper. The database contains 946 Mongolian words written by 300 people from the Mongolian ethnic minority. These words are composed of one to fourteen Mongolian characters and were selected from a large-scale Mongolian text corpus according to their frequency of usage. The current version of this database is collected...
In recent years, growing attention has been paid to recognizing text in natural scene images. Scene character recognition (SCR) is an important step in automating the process of reading text in natural scenes.