The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Image matting is a fundamental computer vision problem and has many applications. Previous algorithms have poor performance when an image has similar foreground and background colors or complicated textures. The main reasons are prior methods 1) only use low-level features and 2) lack high-level context. In this paper, we propose a novel deep learning based algorithm that can tackle both these problems...
Visual recognition of wet surfaces and their degrees of wetness is important for many computer vision applications. It can inform slippery spots on a road to autonomous vehicles, muddy areas of a trail to humanoid robots, and the freshness of groceries to us. In the past, monochromatic appearance change, the fact that surfaces darken when wet, has been modeled to recognize wet surfaces. In this paper,...
Improvements in color constancy have arisen from the use of convolutional neural networks (CNNs). However, the patch-based CNNs that exist for this problem are faced with the issue of estimation ambiguity, where a patch may contain insufficient information to establish a unique or even a limited possible range of illumination colors. Image patches with estimation ambiguity not only appear with great...
In this work we pursue a data-driven approach to the problem of estimating surface normals from a single intensity image, focusing in particular on human faces. We introduce new methods to exploit the currently available facial databases for dataset construction and tailor a deep convolutional neural network to the task of estimating facial surface normals in-the-wild. We train a fully convolutional...
This paper tackles the photometric stereo problem in the presence of inaccurate lighting, obtained either by calibration or by an uncalibrated photometric stereo method. Based on a precise modeling of noise and outliers, a robust variational approach is introduced. It explicitly accounts for self-shadows, and enforces robustness to cast-shadows and specularities by resorting to redescending M-estimators...
This paper proposes a new extrinsic calibration of kaleidoscopic imaging system by estimating normals and distances of the mirrors. The problem to be solved in this paper is a simultaneous estimation of all mirror parameters consistent throughout multiple reflections. Unlike conventional methods utilizing a pair of direct and mirrored images of a reference 3D object to estimate the parameters on a...
Multi-view stereo relies on feature correspondences for 3D reconstruction, and thus is fundamentally flawed in dealing with featureless scenes. In this paper, we propose polarimetric multi-view stereo, which combines per-pixel photometric information from polarization with epipolar constraints from multiple views for 3D reconstruction. Polarization reveals surface normal information, and is thus helpful...
Maximum consensus estimation plays a critically important role in computer vision. Currently, the most prevalent approach draws from the class of non-deterministic hypothesize-and-verify algorithms, which are cheap but do not guarantee solution quality. On the other extreme, there are global algorithms which are exhaustive search in nature and can be costly for practical-sized inputs. This paper aims...
Monocular 3D object parsing is highly desirable in various scenarios including occlusion reasoning and holistic scene interpretation. We present a deep convolutional neural network (CNN) architecture to localize semantic parts in 2D image and 3D space while inferring their visibility states, given a single RGB image. Our key insight is to exploit domain knowledge to regularize the network by deeply...
This paper addresses the problem of amodal perception of 3D object detection. The task is to not only find object localizations in the 3D world, but also estimate their physical sizes and poses, even if only parts of them are visible in the RGB-D image. Recent approaches have attempted to harness point cloud from depth channel to exploit 3D features directly in the 3D space and demonstrated the superiority...
A human action can be seen as transitions between ones body poses over time, where the transition depicts a temporal relation between two poses. Recognizing actions thus involves learning a classifier sensitive to these pose transitions as well as to static poses. In this paper, we introduce a novel method called transitions forests, an ensemble of decision trees that both learn to discriminate static...
Scene flow describes the motion of 3D objects in real world and potentially could be the basis of a good feature for 3D action recognition. However, its use for action recognition, especially in the context of convolutional neural networks (ConvNets), has not been previously studied. In this paper, we propose the extraction and use of scene flow for action recognition from RGB-D data. Previous works...
Detecting faces with occlusions is a challenging task due to two main reasons: 1) the absence of large datasets of masked faces, and 2) the absence of facial cues from the masked regions. To address these two issues, this paper first introduces a dataset, denoted as MAFA, with 30, 811 Internet images and 35, 806 masked faces. Faces in the dataset have various orientations and occlusion degrees, while...
Social relations are the foundation of human daily life. Developing techniques to analyze such relations from visual data bears great potential to build machines that better understand us and are capable of interacting with us at a social level. Previous investigations have remained partial due to the overwhelming diversity and complexity of the topic and consequently have only focused on a handful...
Motivated by previous success of using non-parametric methods to recognize objects, e.g., NBNN [2], we extend it to recognize actions using skeletons. Each 3D action is presented by a sequence of 3D poses. Similar to NBNN, our proposed Spatio-Temporal-NBNN applies stage-to-class distance to classify actions. However, ST-NBNN takes the spatio-temporal structure of 3D actions into consideration and...
Building robust classifiers trained on data susceptible to group or subject-specific variations is a challenging pattern recognition problem. We develop hierarchical Bayesian neural networks to capture subject-specific variations and share statistical strength across subjects. Leveraging recent work on learning Bayesian neural networks, we build fast, scalable algorithms for inferring the posterior...
We present a novel method to track 3D models in color and depth data. To this end, we introduce approximations that accelerate the state-of-the-art in region-based tracking by an order of magnitude while retaining similar accuracy. Furthermore, we show how the method can be made more robust in the presence of depth data and consequently formulate a new joint contour and ICP tracking energy. We present...
Scene text detection has attracted great attention these years. Text potentially exist in a wide variety of images or videos and play an important role in understanding the scene. In this paper, we present a novel text detection algorithm which is composed of two cascaded steps: (1) a multi-scale fully convolutional neural network (FCN) is proposed to extract text block regions, (2) a novel instance...
In our overly-connected world, the automatic recognition of virality – the quality of an image or video to be rapidly and widely spread in social networks – is of crucial importance, and has recently awaken the interest of the computer vision community. Concurrently, recent progress in deep learning architectures showed that global pooling strategies allow the extraction of activation...
Speckle refers to the granular patterns that occur in ultrasound images due to wave interference. Speckle removal can greatly improve the visibility of the underlying structures in an ultrasound image and enhance subsequent post processing. We present a novel framework for speckle removal based on low-rank non-local filtering. Our approach works by first computing a guidance image that assists in...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.