We propose a simple, yet effective approach for real-time hand pose estimation from single depth images using three-dimensional Convolutional Neural Networks (3D CNNs). Image based features extracted by 2D CNNs are not directly suitable for 3D hand pose estimation due to the lack of 3D spatial information. Our proposed 3D CNN taking a 3D volumetric representation of the hand depth image as input can...
We propose a unified formulation for the problem of 3D human pose estimation from a single raw RGB image that reasons jointly about 2D joint estimation and 3D pose reconstruction to improve both tasks. We take an integrated approach that fuses probabilistic knowledge of 3D human pose with a multi-stage CNN architecture and uses the knowledge of plausible 3D landmark locations to refine the search...
Recent advances with Convolutional Networks (ConvNets) have shifted the bottleneck for many computer vision tasks to annotated data collection. In this paper, we present a geometry-driven approach to automatically collect annotations for human pose prediction tasks. Starting from a generic ConvNet for 2D human pose, and assuming a multi-view setup, we describe an automatic way to collect accurate...
Monocular 3D object parsing is highly desirable in various scenarios, including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in the 2D image and in 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply...
This paper addresses the problem of amodal perception in 3D object detection. The task is not only to localize objects in the 3D world, but also to estimate their physical sizes and poses, even if only parts of them are visible in the RGB-D image. Recent approaches have attempted to harness the point cloud from the depth channel to exploit 3D features directly in 3D space and demonstrated the superiority...
We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D...
In this paper, we present a novel approach, called Deep MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine object proposal that boosts the vehicle detection. Moreover,...
3D models provide a common ground for different representations of human bodies. In turn, robust 2D estimation has proven to be a powerful tool to obtain 3D fits in-the-wild. However, depending on the level of detail, it can be hard or even impossible to acquire labeled data for training 2D estimators on a large scale. We propose a hybrid approach to this problem: with an extended version of the recently...
The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset. Kinetics has two orders of magnitude more data, with 400 human...
We present a new Cascaded Shape Regression (CSR) architecture, namely Dynamic Attention-Controlled CSR (DAC-CSR), for robust facial landmark detection on unconstrained faces. Our DAC-CSR divides facial landmark detection into three cascaded sub-tasks: face bounding box refinement, general CSR and attention-controlled CSR. The first two stages refine initial face bounding boxes and output intermediate...
In this paper, we focus on the 3D crater detection problem on the lunar surface, which supports high-precision spacecraft landing and rover navigation in moon exploration projects. A random structured forests method is first applied to detect the 2D edges of craters, and then dense correspondence between CCD stereo images is used to estimate the elevations of craters. Finally, we propose a 3D crater detection model,...
Hybrid vertical cavity lasers employing high-contrast grating reflectors are attractive for Si-integrated light source applications. Here, a method for reducing a three-dimensional (3D) optical simulation of this laser structure to lower-dimensional simulations is suggested, which allows for very fast and approximate analysis of the quality-factor of the 3D cavity. This approach enables us to efficiently...
A simple and efficient method is presented to enhance the depth perception of an image. The approach termed Depth-Stretch (D-stretch) is a tone mapping operation that is applied to the shading component of the given image. Although re-rendering a scene under geometric transformations typically requires extracting the 3D model of the scene, we show that under very simple assumptions D-stretch can be...
This paper points out a new telltale trace – the characteristic of perspective distortion (CPD), for the image forensics of faces. The perspective distortion is determined by the position of image shooting, and it is often overlooked when creating a forgery, which results in the inconsistency between the claimed camera parameters and the CPD in the face image. To investigate this consistency problem,...
3D pose estimation is a key component of many important computer vision tasks like autonomous navigation and robot manipulation. Current state-of-the-art approaches for 3D object pose estimation, like Viewpoints & Keypoints and Render for CNN, solve this problem by discretizing the pose space into bins and solving a pose-classification task. We argue that 3D pose is continuous and can be solved...
We propose a novel 3D-assisted coarse-to-fine extreme-pose facial landmark detection system in this work. For a given face image, our system first refines the face bounding box with landmark locations inferred from a 3D face model generated by a Recurrent 3D Regressor at coarse level. Another R3R is then employed to fit a 3D face model onto the 2D face image cropped with the refined bounding box at...
Motion analysis is often restricted to a laboratory setup with multiple cameras and force sensors, which requires expensive equipment and knowledgeable operators. It therefore lacks simplicity and flexibility. We propose an algorithm combining monocular 3D pose estimation with physics-based modeling to introduce a statistical framework for fast and robust 3D motion analysis from 2D video data. We...
We propose a model-based approach to obtain local pose estimates of micro aerial vehicles (MAVs), with respect to electric towers, using 2D laser scanners. A simple planar model for the body of an electric tower is presented, which is used in an iterative closest point (ICP) framework to register incoming laser scans. This is complemented with attitude estimates from IMU measurements to obtain a complete...
This paper provides a novel texture search method for texture images. Creating computer graphics (CG) is a common task in many kinds of media production. However, it demands a great deal of time and effort from CG creators. In addition, it is difficult for non-professional creators to make a 3D CG scene, because they have to choose appropriate colors, textures, and lighting patterns in addition to 3D...
The paper describes an optimization process to evaluate the radius of rebars located inside a reinforced concrete sample by solving a 2D inverse problem. This process has been applied to measurements and gives an acceptable value of the radius. However, differences remain between the measured fields and the fields computed using the optimized radius. A 3D model of the device has been proposed to address this drawback.