The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Convolutional Neural Networks (CNNs) are responsible for major breakthroughs in object recognition in still images. This work presents an end to end very deep architecture with small convolutional kernel size, small convolutional strides and very deep network architecture for person re-identification in video streams. To achieve such system several good practices for the training were tested, namely:...
A robot needs to localize an unknown object before grasping it. When the robot only has a monocular sensor, how can it get the object pose? In this work, we present a method of localizing the 6-DOF pose of a target object using a robotic arm and a hand-mounted monocular camera. The method includes an object recognition and a localization process. The recognition process uses point features on a surface...
In this paper, we present a novel approach to estimate the relative depth of regions in monocular images. There are several contributions. First, the task of monocular depth estimation is considered as a learning-to-rank problem which offers several advantages compared to regression approaches. Second, monocular depth clues of human perception are modeled in a systematic manner. Third, we show that...
This paper targets to bring together the research efforts on two fields that are growing actively in the past few years: multicamera person Re-Identification (ReID) and large-scale image retrieval. We demonstrate that the essentials of image retrieval and person ReID are the same, i.e., measuring the similarity between images. However, person ReID requires more discriminative and robust features to...
Eye gaze is an important non-verbal cue for human affect analysis. Recent gaze estimation work indicated that information from the full face region can benefit performance. Pushing this idea further, we propose an appearance-based method that, in contrast to a long-standing line of work in computer vision, only takes the full face image as input. Our method encodes the face image using a convolutional...
Despite significant progress in the development of human action detection datasets and algorithms, no current dataset is representative of real-world aerial view scenarios. We present Okutama-Action, a new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing in...
In this work we present three methods to improve a deep convolutional neural network approach to near-infrared heterogeneous face recognition. We first present a method to distill extra information from a pre-trained visible face network through the output logits of the network. Next, we put forth an altered contrastive loss function that uses the ℓ1 norm instead of the ℓ2 norm as a distance metric...
Person re-identification is an important technique towards automatic search of a person's presence in a surveillance video. Two fundamental problems are critical for person re-identification:feature representation and metric learning. At present, there are many methods in the study of person re-identification, which has achieved remarkable results. Due to the difference of the data distribution in...
Traffic surveillance has always been a challenging task to automate. The main difficulties arise from the high variation of the vehicles appertaining to the same category, low resolution, changes in illumination and occlusions. Due to the lack of large labeled datasets, deep learning techniques still have not shown their full potential. In this paper, we train an Ensemble of Deep Networks (EDeN) to...
We present a method for estimating the body orientation of seated people in a smart room by fusing low-resolution range information collected from downward pointed time-of-flight (ToF) sensors with synchronized speaker identification information from microphone recordings. The ToF sensors preserve the privacy of the occupants in that they only return the range to a small set of hit points. We propose...
Multispectral images that combine visual-optical (VIS) and infrared (IR) image information are a promising source of data for automatic person detection. Especially in automotive or surveillance applications, challenging conditions such as insufficient illumination or large distances between camera and object occur regularly and can affect image quality. This leads to weak image contrast or low object...
This paper tackles the problem of reconstructing 3D human poses from 2D landmarks, which is still an ill-posed problem. A widely-used approach is active shape model (ASM) which considers an unknown 3D shape as a linear combination of predefined basis shapes. The existing methods often resolve an optimization problem to reckon the weights and viewpoints of basis shapes, but they could fall into a locally-optimal...
Frame dropping is a type of video manipulation where consecutive frames are deleted to omit content from the original video. Automatically detecting dropped frames across a large archive of videos while maintaining a low false alarm rate is a challenging task in digital video forensics. We propose a new approach for forensic analysis by exploiting the local spatio-temporal relationships within a portion...
Image content or metadata editing software availability and ease of use has resulted in a high demand for automatic image tamper detection algorithms. Most previous work has focused on detection of tampered image content, whereas we develop techniques to detect metadata tampering in outdoor images using sun altitude angle and other meteorological information like temperature, humidity and weather,...
An increasing number of digital images are being shared and accessed through websites, media, and social applications. Many of these images have been modified and are not authentic. Recent advances in the use of deep convolutional neural networks (CNNs) have facilitated the task of analyzing the veracity and authenticity of largely distributed image datasets. We examine in this paper the problem of...
The successful deep convolutional neural networks for visual object recognition typically rely on a massive number of training images that are well annotated by class labels or object bounding boxes with great human efforts. Here we explore the use of the geographic metadata, which are automatically retrieved from sensors such as GPS and compass, in weakly-supervised learning techniques for landmark...
Automatic person re-identification (re-id) across camera boundaries is a challenging problem. Approaches have to be robust against many factors which influence the visual appearance of a person but are not relevant to the person's identity. Examples for such factors are pose, camera angles, and lighting conditions. Person attributes are a semantic high level information which is invariant across many...
3D pose estimation is a key component of many important computer vision tasks like autonomous navigation and robot manipulation. Current state-of-the-art approaches for 3D object pose estimation, like Viewpoints & Keypoints and Render for CNN, solve this problem by discretizing the pose space into bins and solving a pose-classification task. We argue that 3D pose is continuous and can be solved...
Protecting visual secrets is an important problem due to the prevalence of cameras that continuously monitor our surroundings. Any viable solution to this problem should also minimize the impact on the utility of applications that use images. In this work, we build on the existing work of adversarial learning to design a perturbation mechanism that jointly optimizes privacy and utility objectives...
Vehicle re-identification (re-id) plays an important role in the automatic analysis of the drastically increasing urban surveillance videos. Similar to the other image retrieval problems, vehicle re-id suffers from the difficulties caused by various poses of vehicles, diversified illuminations, and complicated environments. Triplet-wise training of convolutional neural network (CNN) has been studied...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.