While video-based activity analysis and recognition has received broad attention, the existing body of work mostly deals with the single object/person case. Modeling involving multiple objects and recognition of coordinated group activities, present in a variety of applications such as surveillance, sports, biological records, and so on, is the main focus of this paper. Unlike earlier attempts which model...
The traditional bundle adjustment algorithm for the structure from motion problem has a computational complexity of O((m+n)^3) per iteration and a memory requirement of O(mn(m+n)), where m is the number of cameras and n is the number of structure points. The sparse version of bundle adjustment has a computational complexity of O(m^3+mn) per iteration and a memory requirement of O(mn). Here we propose an algorithm...
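To make the complexity figures quoted above concrete, the following sketch evaluates the dominant terms for one problem size. The function names and the chosen values of m and n are illustrative assumptions, and big-O constants are ignored, so the numbers indicate growth only, not actual run time or memory use.

```python
def dense_ba_cost(m, n):
    """Dominant terms for traditional bundle adjustment (constants ignored)."""
    time_per_iter = (m + n) ** 3
    memory = m * n * (m + n)
    return time_per_iter, memory


def sparse_ba_cost(m, n):
    """Dominant terms for sparse bundle adjustment (constants ignored)."""
    time_per_iter = m ** 3 + m * n
    memory = m * n
    return time_per_iter, memory


# Hypothetical problem size: 100 cameras, 10,000 structure points.
m, n = 100, 10_000
dense_time, dense_mem = dense_ba_cost(m, n)
sparse_time, sparse_mem = sparse_ba_cost(m, n)

print(f"dense : time ~ {dense_time:.3e}, memory ~ {dense_mem:.3e}")
print(f"sparse: time ~ {sparse_time:.3e}, memory ~ {sparse_mem:.3e}")
```

Because n is typically much larger than m, the cubic dependence on n in the dense case dominates, which is why exploiting sparsity matters.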
Compressed sensing (CS) suggests that a signal, sparse in some basis, can be recovered from a small number of random projections. In this paper, we apply CS theory to sparse background-subtracted silhouettes and show the usefulness of such an approach in various multi-view estimation problems. The sparsity of the silhouette images corresponds to sparsity of object parameters (location, volume...
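The basic CS claim, that a sparse signal can be recovered from few random projections, can be illustrated with a minimal sketch. It uses orthogonal matching pursuit on a synthetic sparse vector; the choice of solver, the Gaussian measurement matrix, and the sizes n, m, k are all assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def omp(A, y, k):
    """Orthogonal matching pursuit: greedily find a k-sparse x with A @ x ~ y."""
    n = A.shape[1]
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-12:
            break  # already explained the measurements
        # Select the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        support.append(j)
        # Re-fit all selected coefficients by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(n)
    x_hat[support] = coef
    return x_hat


rng = np.random.default_rng(0)
n, m, k = 128, 64, 3  # signal length, number of projections, sparsity

# Synthetic k-sparse signal with nonzero magnitudes in [1, 2].
x_true = np.zeros(n)
idx = rng.choice(n, size=k, replace=False)
x_true[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

A = rng.standard_normal((m, n)) / np.sqrt(m)  # random projection matrix
y = A @ x_true                                # m < n measurements
x_hat = omp(A, y, k)

print("max abs reconstruction error:", np.max(np.abs(x_hat - x_true)))
```

Here the signal is sparse in the standard basis; for silhouettes, the sparsity would come from background subtraction leaving few nonzero pixels.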
Image gradients form powerful cues in a host of vision and graphics applications. In this paper, we consider multiple views of a textured planar scene and consider the problem of estimating the scene texture map using these multi-view inputs. Modeling each camera view as a projective transformation of the scene, we show that the problem is equivalent to that of studying the effect of noise (and the...
Summarizing the contents of a video containing human activities is an important problem in computer vision and has important applications in automated surveillance systems. Summarizing a video requires one to identify and learn a 'vocabulary' of action-phrases corresponding to specific events and actions occurring in the video. We propose a generative model for dynamic scenes containing human activities...
Human faces undergo a lot of change in appearance as they age. Though facial aging has been studied for decades, it is only recently that attempts have been made to address the problem from a computational point of view. Most of these early efforts follow a simulation approach in which matching is performed by synthesizing face images at the target age. Given the innumerable different ways in which...
We propose a two-fold approach towards modeling facial aging in adults. Firstly, we develop a shape transformation model that is formulated as a physically-based parametric muscle model that captures the subtle deformations facial features undergo with age. The model implicitly accounts for the physical properties and geometric orientations of the individual facial muscles. Next, we develop an image...
Many applications in computer vision and pattern recognition involve drawing inferences on certain manifold-valued parameters. In order to develop accurate inference algorithms on these manifolds, we need to a) understand the geometric structure of these manifolds, b) derive appropriate distance measures, and c) develop probability distribution functions (pdf) and estimation techniques that are consistent...
Biometric matching decisions have traditionally been made based solely on a score that represents the similarity of the query biometric to the enrolled biometric(s) of the claimed identity. Fusion schemes have been proposed to benefit from the availability of multiple biometric samples (e.g., multiple samples of the same fingerprint) or multiple different biometrics (e.g., face and fingerprint). These...
Joint processing of sensor array outputs improves the performance of parameter estimation and hypothesis testing problems beyond the sum of the individual sensor processing results. When the sensors have high data sampling rates, arrays are tethered, creating a disadvantage for their deployment and also limiting their aperture size. In this paper, we develop the signal processing algorithms for randomly...
Estimation based on received signal strength (RSS) is crucial in sensor networks for sensor localization, target tracking, etc. In this paper, we present a Gaussian approximation of the Chi distribution that is applicable to general RSS source localization problems in sensor networks. Using our Gaussian approximation, we provide a factorized variational Bayes (VB) approximation to the location and...
In surveillance applications, it is common to have multiple cameras observing targets exhibiting motion on a ground plane. Tracking and estimation of the location of a target on the plane becomes an important inference problem. In this paper, we study the problem of combining estimates of location obtained from multiple cameras. We model the relation between the uncertainty in the location estimation...
We study the beneficial effect of side information on the Structure from Motion (SfM) estimation problem. The side information that we consider is a measurement of a 'reference vector' and the distance from a fixed plane perpendicular to that reference vector. Firstly, we show that in the presence of this information, the SfM equations can be rewritten in a form similar to a bilinear form in their unknowns. Secondly,...
In this paper, we propose a non-stationary stochastic filtering framework for the task of albedo estimation from a single image. There are several approaches in the literature for albedo estimation, but few include the errors in estimates of surface normals and light source directions to improve the albedo estimate. The proposed approach effectively utilizes the error statistics of surface normals and...
Visual tracking is a very important front-end to many vision applications. We present a new framework for robust visual tracking in this paper. Instead of just looking forward in the time domain, we incorporate both forward and backward processing of video frames using a novel time-reversibility constraint. This leads to a new minimization criterion that combines the forward and backward similarity...
A strong requirement to come up with secure and user-friendly ways to authenticate and identify people, to safeguard their rights and interests, has probably been the main guiding force behind biometrics research. Though a vast amount of research has been done to recognize humans based on still images, the problem is still far from solved for unconstrained scenarios. This has led to an increased...
A critical step in fitting a linear mixing model to hyperspectral imagery is the estimation of the abundances. The abundances are the percentage of each endmember within a given pixel; therefore, they should be non-negative and sum to one. With the advent of kernel-based algorithms for hyperspectral imagery, kernel-based abundance estimates have become necessary. This paper presents such an algorithm...
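The two abundance constraints (non-negative, summing to one) can be illustrated with a minimal sketch for the plain linear mixing model. It is not the kernel-based algorithm of the paper: it uses projected gradient descent onto the probability simplex as a stand-in technique, and the endmember matrix, pixel, and all function names are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {a : a >= 0, sum(a) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def estimate_abundances(E, y, n_iter=2000):
    """Minimize ||E @ a - y||^2 subject to a >= 0 and sum(a) = 1."""
    p = E.shape[1]
    a = np.full(p, 1.0 / p)                # start at the simplex center
    step = 1.0 / np.linalg.norm(E, 2) ** 2  # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = E.T @ (E @ a - y)
        a = project_to_simplex(a - step * grad)
    return a


rng = np.random.default_rng(0)
E = rng.uniform(0.0, 1.0, size=(50, 3))  # hypothetical: 50 bands, 3 endmembers
a_true = np.array([0.6, 0.3, 0.1])
y = E @ a_true                           # noise-free mixed pixel

a_hat = estimate_abundances(E, y)
print("estimated abundances:", a_hat, "sum:", a_hat.sum())
```

Projecting after every gradient step keeps each iterate feasible, so the constraints hold exactly rather than only approximately at convergence.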
In general the visual-hull approach for performing integrated face and gait recognition requires at least two cameras. In this paper we present experimental results for fusion of face and gait for the single camera case. We considered the NIST database which contains outdoor face and gait data for 30 subjects. In the NIST database, subjects walk along an inverted Sigma pattern. In (A. Kale, et al...
We introduce an epitomic representation for modeling human activities in video sequences. A video sequence is divided into segments within which the dynamics of objects is assumed to be linear and modeled using linear dynamical systems. The tuple consisting of the estimated system matrix, statistics of the input signal and the initial state value is said to form an epitome. The system matrices are...
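The per-segment model described above can be sketched as follows, under simplifying assumptions: the state is observed directly, the dynamics are x[t+1] = A x[t] + w[t], and the system matrix is fit by ordinary least squares. The function name `fit_epitome` and the synthetic segment are hypothetical, and the estimator used in the paper may differ.

```python
import numpy as np


def fit_epitome(X):
    """Fit x[t+1] = A x[t] + w[t] to one segment X of shape (T, d).

    Returns the tuple (A, input mean, input covariance, initial state).
    """
    past, future = X[:-1], X[1:]
    # Least-squares solution of past @ A.T ~ future.
    A_t, *_ = np.linalg.lstsq(past, future, rcond=None)
    resid = future - past @ A_t  # estimated input signal w[t]
    return A_t.T, resid.mean(axis=0), np.cov(resid, rowvar=False), X[0]


# Synthetic segment: a slowly decaying 2-D rotation with small input noise.
rng = np.random.default_rng(0)
theta = 0.2
A_true = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])
T = 50
X = np.zeros((T, 2))
X[0] = [1.0, 0.0]
for t in range(T - 1):
    X[t + 1] = A_true @ X[t] + 1e-3 * rng.standard_normal(2)

A_hat, w_mean, w_cov, x0 = fit_epitome(X)
print("estimated system matrix:\n", A_hat)
```

Each segment of a video would yield one such tuple, so a sequence is summarized by a short list of tuples rather than by its raw frames.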
Clustering video sequences in order to infer and extract activities from a single video stream is an extremely important problem and has significant potential in video indexing, surveillance, activity discovery and event recognition. Clustering a video sequence into activities requires one to simultaneously recognize activity boundaries (activity consistent subsequences) and cluster these activity...