Image captioning is a challenging problem owing to the complexity in understanding the image content and diverse ways of describing it in natural language. Recent advances in deep neural networks have substantially improved the performance of this task. Most state-of-the-art approaches follow an encoder-decoder framework, which generates captions using a sequential recurrent prediction model. However,...
Person re-identification is an open and challenging problem in computer vision. Existing approaches have concentrated on either designing the best feature representation or learning optimal matching metrics in a static setting where the number of cameras in a network is fixed. Most approaches have neglected the dynamic and open world nature of the re-identification problem, where a new camera may...
Recently, there has been a lot of interest in automatically generating descriptions for an image. Most existing language-model based approaches for this task learn to generate an image description word by word in its original word order. However, for humans, it is more natural to locate the objects and their relationships first, and then elaborate on each object, describing notable attributes. We...
This paper addresses the deep face recognition (FR) problem under the open-set protocol, where ideal face features are expected to have smaller maximal intra-class distance than minimal inter-class distance under a suitably chosen metric space. However, few existing algorithms can effectively achieve this criterion. To this end, we propose the angular softmax (A-Softmax) loss that enables convolutional neural...
The research focus of designing local patch descriptors has gradually shifted from handcrafted ones (e.g., SIFT) to learned ones. In this paper, we propose to learn a high-performance descriptor in Euclidean space via a Convolutional Neural Network (CNN). Our method is distinctive in four aspects: (i) We propose a progressive sampling strategy which enables the network to access billions of training...
A novel dataset for benchmarking image-based localization is presented. With increasing research interests in visual place recognition and localization, several datasets have been published in the past few years. One of the evident limitations of existing datasets is that precise ground truth camera poses of query images are not available in a meaningful 3D metric system. This is in part due to the...
In domain adaptation, maximum mean discrepancy (MMD) has been widely adopted as a discrepancy metric between the distributions of source and target domains. However, existing MMD-based domain adaptation methods generally ignore changes in the class prior distributions, i.e., class weight bias across domains. This remains an open but ubiquitous problem in domain adaptation, which can be caused by...
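As a rough illustration of the MMD discrepancy mentioned in the abstract above, here is a minimal sketch of the standard biased empirical estimate of squared MMD with an RBF kernel. It is not the method of the listed paper (the abstract is truncated and does not give one), and the function names `rbf_kernel` and `mmd_squared` and the `gamma` parameter are assumptions made for this example.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def mmd_squared(source, target, gamma=1.0):
    """Biased empirical estimate of squared MMD between two sample sets.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 * E[k(s, t)],
    where each expectation is replaced by a mean over all sample pairs.
    """
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (
        mean_kernel(source, source)
        + mean_kernel(target, target)
        - 2.0 * mean_kernel(source, target)
    )


# Hypothetical toy samples: the target set is the source set shifted by 2.
source_samples = [[0.0, 0.0], [0.1, -0.1], [-0.2, 0.1]]
target_samples = [[2.0, 2.0], [2.1, 1.9], [1.8, 2.1]]
print(mmd_squared(source_samples, target_samples))
```

The estimate is zero when both sample sets are identical and grows as the two distributions move apart. Note that this plain form weights every sample equally, so it does not account for the class weight bias the abstract refers to.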
Re-identification of people in surveillance footage must cope with drastic variations in color, background, viewing angle and a person's pose. Supervised techniques are often the most effective, but require extensive annotation which is infeasible for large camera networks. Unlike previous supervised learning approaches that require hundreds of annotated subjects, we learn a metric using a novel one-shot...
Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant...
For the mathematical model of a tug handling simulator, a locally optimal locally weighted learning (LWL) method is proposed. First, the sample space is rearranged to reduce the one-to-many mapping and non-separability of ship motion states. Second, a distance metric is learned by leave-one-out cross validation for every sample, and this approach improves the nonlinear mapping ability and robustness...
In this work, a diversified deep structural metric learning is proposed for remote sensing image classification. Firstly, a deep structural metric learning is introduced to take full advantage of structural information of training batches. Secondly, we impose a diversity regularization over the factors of deep structural metric learning to encourage them to be uncorrelated, such that each factor tends...
Click-baits are headlines that exaggerate the facts or hide part of the facts to attract user clicks. Click-baits deter readers from effectively and efficiently obtaining information in the era of information explosion, and will obviously affect user experience in news aggregator sites like Google News and Yahoo News. Detecting and preventing click-baits has become crucial. Previous work achieved remarkable...
This paper deals with TanDEM-X and Cartosat-1 DEM fusion over urban areas with support of weight maps predicted by an artificial neural network (ANN). Although the TanDEM-X DEM is a global elevation dataset of unprecedented accuracy (following HRTI-3 standard), its quality decreases over urban areas because of artifacts intrinsic to the SAR imaging geometry. DEM fusion techniques can be used to improve...
Understanding temporal expressions is an important foundation of many NLP tasks. However, the varied representations of temporal expressions make them difficult to analyze and understand. To parse such expressions, an effective method for classifying temporal expressions is important. A temporal expression may belong to one or more classes, but the classification usually requires manual annotation...
Hyperspectral image classification, an astonishing tool to distinguish the land covers in remotely sensed hyperspectral images, has been investigated by multiple disciplines such as geoscience, environmental science, mathematics, and computer vision. Following early machine learning (e.g., support vector machines and neural networks) and feature extraction theories (e.g., principal component analysis),...
This paper presents a combination of machine learning and lexicon-based approaches for sentiment analysis of student feedback. The textual feedback, typically collected towards the end of a semester, provides useful insights into the overall teaching quality and suggests valuable ways for improving teaching methodology. The paper describes a sentiment analysis model trained using TF-IDF and lexicon-based...
Facial attractiveness computation is a challenging task because of the lack of labeled data and discriminative features. In this paper, an end-to-end label distribution learning (LDL) framework with deep convolutional neural network (CNN) and geometric features is proposed to meet these two challenges. Different from the previous work, we recast this task as an LDL problem. Compared with the single...
The extraction and recognition of scene text in images is an important way to understand the semantic information in an image. Scene text detection is still a challenging problem. In this paper, we present a scene text localization method based on the pruning of the Maximally Stable Extremal Region (MSER) tree and Linkage-tree. Concretely, the MSER-tree is first constructed and overlapping MSERs are...
Kinship verification from facial images is a challenging task in computer vision. The majority of recent verification algorithms concatenate the features of all patches in a facial image to build the final feature representation, which implicitly takes every facial part into account for kinship verification. However, considering all face regions is questionable, since certain facial parts such as the...
In this work we present three methods to improve a deep convolutional neural network approach to near-infrared heterogeneous face recognition. We first present a method to distill extra information from a pre-trained visible face network through the output logits of the network. Next, we put forth an altered contrastive loss function that uses the ℓ1 norm instead of the ℓ2 norm as a distance metric...
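To make the ℓ1 versus ℓ2 distinction in the abstract above concrete, the following is a minimal sketch of the standard pairwise contrastive loss with a switchable distance norm. The truncated abstract does not give the authors' exact altered loss, so this is only the textbook form; the names `pair_distance` and `contrastive_loss` and the `margin` default are assumptions made for this example.

```python
def pair_distance(a, b, norm="l1"):
    """Distance between two equal-length embeddings under the chosen norm."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if norm == "l1":
        return sum(diffs)
    if norm == "l2":
        return sum(d * d for d in diffs) ** 0.5
    raise ValueError(f"unsupported norm: {norm}")


def contrastive_loss(a, b, same_identity, margin=1.0, norm="l1"):
    """Textbook contrastive loss for one pair of embeddings.

    Matching pairs are pulled together (loss grows with distance);
    non-matching pairs are pushed apart until they exceed the margin.
    """
    d = pair_distance(a, b, norm)
    if same_identity:
        return 0.5 * d * d
    return 0.5 * max(0.0, margin - d) ** 2


# Hypothetical embeddings of two face images.
anchor = [0.0, 0.0]
other = [3.0, 4.0]
print(pair_distance(anchor, other, "l1"))  # 7.0
print(pair_distance(anchor, other, "l2"))  # 5.0
print(contrastive_loss(anchor, other, same_identity=True, norm="l1"))  # 24.5
```

The only change between the two variants is the distance inside the loss; the margin logic is the same in both.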