This paper tackles the problem of estimating 3D human poses from given 2D landmarks, which remains an ill-posed problem. Existing works have successfully applied the Active Shape Model approach to estimate 3D human poses, but the error is still high. In this paper, we propose an improved method that uses a cascade of neural networks to make the estimated shape closer to the ground-truth shape...
A novel dataset for benchmarking image-based localization is presented. With increasing research interest in visual place recognition and localization, several datasets have been published in the past few years. One of the evident limitations of existing datasets is that precise ground truth camera poses of query images are not available in a meaningful 3D metric system. This is in part due to the...
In this paper we formulate structure from motion as a learning problem. We train a convolutional network end-to-end to compute depth and camera motion from successive, unconstrained image pairs. The architecture is composed of multiple stacked encoder-decoder networks, the core part being an iterative network that is able to improve its own predictions. The network estimates not only depth and motion,...
We explore 3D human pose estimation from a single RGB image. While many approaches try to directly predict 3D pose from image measurements, we explore a simple architecture that reasons through intermediate 2D pose predictions. Our approach is based on two key observations: (1) deep neural nets have revolutionized 2D pose estimation, producing accurate 2D predictions even for poses with self-occlusions...
Re-identification of people in surveillance footage must cope with drastic variations in color, background, viewing angle and a person's pose. Supervised techniques are often the most effective, but require extensive annotation which is infeasible for large camera networks. Unlike previous supervised learning approaches that require hundreds of annotated subjects, we learn a metric using a novel one-shot...
Removing pixel-wise heterogeneous motion blur is challenging due to the ill-posed nature of the problem. The predominant solution is to estimate the blur kernel by adding a prior, but extensive literature on the subject indicates the difficulty of identifying a prior that is both suitably informative and general. Rather than imposing a prior based on theory, we propose instead to learn one from the data...
We present an unsupervised learning framework for the task of monocular depth and camera motion estimation from unstructured video sequences. In common with recent work [10, 14, 16], we use an end-to-end learning approach with view synthesis as the supervisory signal. In contrast to the previous work, our method is completely unsupervised, requiring only monocular video sequences for training. Our...
Learning-based methods have shown very promising results for the task of depth estimation in single images. However, most existing approaches treat depth prediction as a supervised regression problem and as a result, require vast quantities of corresponding ground truth depth data for training. Simply recording high-quality depth data in a range of environments is a challenging problem. In this paper, we...
Coin recognition is one of the most important activities for modern banking and currency processing systems in which machine vision is widely used. The technique at the heart of such systems is object recognition in a digital image. Although it has high recognition speed, the traditional method of coin recognition cannot recognize coins of similar sizes. This paper presents a method based...
This paper presents a computer vision-based methodology for human action recognition. First, shape-based pose features are constructed from area ratios to identify the human silhouette in images. The proposed features are invariant to translation and scaling. Once the human body features are extracted from videos, different human actions are learned individually on the training frames of...
Surveillance systems play a critical role in security. A surveillance system with cameras that work in the visible spectrum is sufficient for most cases. However, problems may arise at night, or in areas with less-than-ideal illumination. Cameras with thermal infrared technology can be a better option in these situations since they do not rely on illumination to...
Convolutional Neural Networks (CNNs) are responsible for major breakthroughs in object recognition in still images. This work presents an end-to-end, very deep architecture with small convolutional kernels and small convolutional strides for person re-identification in video streams. To build such a system, several good training practices were tested, namely:...
A robot needs to localize an unknown object before grasping it. When the robot only has a monocular sensor, how can it get the object pose? In this work, we present a method of localizing the 6-DOF pose of a target object using a robotic arm and a hand-mounted monocular camera. The method includes an object recognition process and a localization process. The recognition process uses point features on a surface...
In this paper, we present a novel approach to estimate the relative depth of regions in monocular images. There are several contributions. First, the task of monocular depth estimation is considered as a learning-to-rank problem, which offers several advantages compared to regression approaches. Second, monocular depth cues of human perception are modeled in a systematic manner. Third, we show that...
This paper aims to bring together the research efforts of two fields that have grown actively in the past few years: multicamera person Re-Identification (ReID) and large-scale image retrieval. We demonstrate that the essentials of image retrieval and person ReID are the same, i.e., measuring the similarity between images. However, person ReID requires more discriminative and robust features to...
Eye gaze is an important non-verbal cue for human affect analysis. Recent gaze estimation work indicated that information from the full face region can benefit performance. Pushing this idea further, we propose an appearance-based method that, in contrast to a long-standing line of work in computer vision, only takes the full face image as input. Our method encodes the face image using a convolutional...
Despite significant progress in the development of human action detection datasets and algorithms, no current dataset is representative of real-world aerial view scenarios. We present Okutama-Action, a new video dataset for aerial view concurrent human action detection. It consists of 43 minute-long fully-annotated sequences with 12 action classes. Okutama-Action features many challenges missing in...
In this work we present three methods to improve a deep convolutional neural network approach to near-infrared heterogeneous face recognition. We first present a method to distill extra information from a pre-trained visible face network through the output logits of the network. Next, we put forth an altered contrastive loss function that uses the ℓ1 norm instead of the ℓ2 norm as a distance metric...
Person re-identification is an important technique for automatically searching for a person's presence in surveillance video. Two fundamental problems are critical for person re-identification: feature representation and metric learning. Many existing methods for person re-identification have achieved remarkable results. Due to the difference of the data distribution in...
Traffic surveillance has always been a challenging task to automate. The main difficulties arise from the high variation among vehicles belonging to the same category, low resolution, changes in illumination and occlusions. Due to the lack of large labeled datasets, deep learning techniques have not yet shown their full potential. In this paper, we train an Ensemble of Deep Networks (EDeN) to...