Region-based Convolutional Neural Network (CNN) detectors such as Faster R-CNN and R-FCN have shown promising results for object detection by combining a region proposal subnetwork with a classification subnetwork. Although R-FCN achieves higher detection speed while maintaining detection performance, the global structure information is ignored by the position-sensitive...
Human activity recognition is a challenging high-level vision task in which multiple factors, such as the subject, the object, and their diverse interactions, have to be considered and modeled. Current learning-based methods are limited in their capability to integrate human-level concepts into an easily extensible computational framework. Inspired by the existing human memory model, we present a context-associative...
Visually perceiving human motion at the semantic level is an important yet challenging problem in the multimedia area. In this work, we propose a novel approach to map the low-level responses from visual detection to semantically sensitive descriptions of human actions. The feature map is triggered by the output of deformable part model detection, in which the critical information about body parts configuration...
This paper focuses on multimodal gender recognition. To achieve robust and discriminative gender recognition, visual observations from both the face and the corresponding fingerprints are fused. The bag-of-words model is employed to structure the image representation. We propose a novel supervised method to construct the visual words, by which the redundant feature...
In this paper, we present an efficient discriminative method for human pose estimation. This method learns a direct mapping from visual observations to human body configurations. The framework requires visual features that are powerful enough to discriminate the subtle differences between similar human poses. We propose to describe the image features using salient interest points that are...