Multiple-instance learning (MIL) is a new paradigm of supervised learning that deals with the classification of bags. Each bag is presented as a collection of instances from which features are extracted. In MIL, we are usually confronted with a large instance space even for moderately sized data sets, since each bag may contain many instances. Hence it is important to design efficient instance pruning...
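To make the bag/instance distinction concrete, here is a minimal sketch assuming the standard MIL rule that a bag is positive if at least one of its instances is positive. The linear scorer, its weights `w` and bias `b`, and the function names are hypothetical illustrations, not the method of the paper. Note that every instance in every bag has to be scored, which is why the instance space grows with bag size.

```python
from typing import Sequence

Instance = Sequence[float]


def instance_score(x: Instance, w: Sequence[float], b: float) -> float:
    """Hypothetical linear instance scorer: w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify_bags(bags: Sequence[Sequence[Instance]], w: Sequence[float], b: float) -> list[bool]:
    """Label each (non-empty) bag positive iff its best-scoring instance is positive.

    This encodes the assumed standard MIL rule: one positive instance
    is enough to make the whole bag positive.
    """
    return [max(instance_score(x, w, b) for x in bag) > 0 for bag in bags]


bags = [[(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (-1.0, 0.0)]]
print(classify_bags(bags, w=(1.0, 1.0), b=-1.0))
```

The first bag contains one instance scoring 1.0 and is labelled positive; every instance of the second bag scores below zero.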
In many image and video collections, we have access only to partially labeled data. For example, personal photo collections often contain several faces per image and a caption that only specifies who is in the picture, but not which name matches which face. Similarly, movie screenplays can tell us who is in the scene, but not when and where they are on the screen. We formulate the learning problem...
The ambiguity inherent in a localized analysis of events from video can be resolved by exploiting constraints between events and examining only feasible global explanations. We show how jointly recognizing and linking events can be formulated as labeling of a Bayesian network. The framework can be extended to multiple linking layers, expressing explanations as compositional hierarchies. The best global...
In this paper we introduce a novel method to detect and localize abnormal behaviors in crowd videos using the Social Force model. For this purpose, a grid of particles is placed over the image and it is advected with the space-time average of optical flow. By treating the moving particles as individuals, their interaction forces are estimated using the social force model. The interaction force is then mapped...
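The abstract is cut off before it defines the interaction force. As a hedged sketch, the snippet below assumes a commonly used social-force form with unit particle mass: the interaction force is the relaxation toward a desired velocity minus the particle's actual acceleration, F_int = (v_des - v) / tau - dv/dt. The desired velocity is assumed to blend the particle's own optical flow with the local average flow through a weight `p`. The function name and the parameters `p`, `tau` and `dt` are illustrative assumptions, not the paper's notation.

```python
Vec = tuple[float, float]


def interaction_force(v_prev: Vec, v_curr: Vec, v_flow: Vec, v_avg: Vec,
                      p: float = 0.5, tau: float = 0.5, dt: float = 1.0) -> Vec:
    """Hypothetical helper: interaction force on one particle of unit mass.

    Assumed model:
      v_des = (1 - p) * v_flow + p * v_avg     (desired velocity)
      F_int = (v_des - v_curr) / tau - dv/dt   (dv/dt by finite difference)
    """
    v_des = tuple((1 - p) * f + p * a for f, a in zip(v_flow, v_avg))
    dvdt = tuple((c - q) / dt for c, q in zip(v_curr, v_prev))
    return tuple((d - c) / tau - acc for d, c, acc in zip(v_des, v_curr, dvdt))


# A particle at rest whose surroundings move to the right feels a push to the right.
print(interaction_force((0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (2.0, 0.0)))
```

A particle that moves exactly as its desired velocity and does not accelerate experiences zero force. Large force magnitudes therefore flag particles whose motion disagrees with the surrounding flow.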
We present a novel framework for recognizing repetitive sequential events performed by human actors with strong temporal dependencies and potential parallel overlap. Our solution incorporates sub-event (or primitive) detectors and a spatiotemporal model for sequential event changes. We develop an effective and efficient method to integrate primitives into a set of sequential events where strong temporal...
We study the problem of object classification when training and test classes are disjoint, i.e. no training examples of the target classes are available. This setup has hardly been studied in computer vision research, but it is the rule rather than the exception, because the world contains tens of thousands of different object classes and for only very few of them have image collections been formed...
Maximum likelihood (ML) estimation is widely used in many computer vision problems involving the estimation of geometric parameters, from conic fitting to bundle adjustment for structure and motion. This paper presents a detailed discussion of the bias of ML estimates derived for these problems. Statistical theory states that although ML estimates attain maximum accuracy in the limit as the sample...
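As a textbook illustration of finite-sample ML bias, consider the ML estimate of a Gaussian variance from n i.i.d. samples. This is not one of the geometric estimation problems analysed in the paper.

```latex
\hat{\sigma}^2_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2,
\qquad
\sum_{i=1}^{n}(x_i-\bar{x})^2 = \sum_{i=1}^{n}(x_i-\mu)^2 - n(\bar{x}-\mu)^2,

\mathbb{E}\Big[\sum_{i=1}^{n}(x_i-\bar{x})^2\Big] = n\sigma^2 - n\cdot\frac{\sigma^2}{n} = (n-1)\sigma^2
\;\;\Longrightarrow\;\;
\mathbb{E}\big[\hat{\sigma}^2_{\mathrm{ML}}\big] = \frac{n-1}{n}\,\sigma^2,
\qquad
\operatorname{bias} = -\frac{\sigma^2}{n}.
```

The bias vanishes as n grows, which is consistent with the asymptotic behaviour described above. For small samples, however, it is not negligible.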
Many semi-supervised learning algorithms only deal with binary classification. Their extension to the multi-class problem is usually obtained by repeatedly solving a set of binary problems. Additionally, many of these methods do not scale well to a large number of unlabeled samples, which limits their application to large-scale problems with many classes and unlabeled samples. In...
We address the problem of label assignment in computer vision: given a novel 3D or 2D scene, we wish to assign a unique label to every site (voxel, pixel, superpixel, etc.). To this end, the Markov Random Field framework has proven to be a model of choice as it uses contextual information to yield improved classification results over locally independent classifiers. In this work we adapt a functional...
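The abstract is truncated before describing its inference method. The sketch below therefore uses a different, deliberately simple technique: iterated conditional modes (ICM) on an MRF with a Potts pairwise term. It only shows how contextual information changes the output of a locally independent classifier. The function name, the `unary` cost table, the `neighbours` adjacency lists and the weight `beta` are all hypothetical.

```python
from typing import Sequence


def icm_potts(unary: Sequence[Sequence[float]],
              neighbours: Sequence[Sequence[int]],
              beta: float, max_iters: int = 10) -> list[int]:
    """Approximate MRF labeling by iterated conditional modes (ICM).

    unary[s][l]   : cost of assigning label l to site s (voxel, pixel, superpixel, ...)
    neighbours[s] : indices of the sites adjacent to s
    beta          : Potts penalty paid for each neighbour with a different label
    """
    # Start from the locally independent classifier: the best unary label per site.
    labels = [min(range(len(u)), key=u.__getitem__) for u in unary]

    def local_cost(s: int, l: int) -> float:
        disagreements = sum(labels[t] != l for t in neighbours[s])
        return unary[s][l] + beta * disagreements

    for _ in range(max_iters):
        changed = False
        for s in range(len(unary)):
            best = min(range(len(unary[s])), key=lambda l: local_cost(s, l))
            if best != labels[s]:
                labels[s] = best
                changed = True
        if not changed:  # converged to a local minimum of the energy
            break
    return labels


# Five sites on a chain; the middle site weakly prefers label 1 on its own.
unary = [[0.0, 1.0], [0.0, 1.0], [0.6, 0.4], [0.0, 1.0], [0.0, 1.0]]
chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
print(icm_potts(unary, chain, beta=0.0))  # no context: independent labels
print(icm_potts(unary, chain, beta=0.5))  # context smooths the outlier away
```

ICM only reaches a local minimum of the MRF energy. It appears here because it is easy to read, not because it matches the approach in the paper.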
In this paper, we address the problem of learning an adaptive appearance model for object tracking. In particular, a class of tracking techniques called “tracking by detection” has been shown to give promising results at real-time speeds. These methods train a discriminative classifier in an online manner to separate the object from the background. This classifier bootstraps itself by using...
We propose a geometric method for visual tracking, in which the 2-D affine motion of a given object template is estimated in a video sequence by means of coordinate-invariant particle filtering on the 2-D affine group Aff(2). Tracking performance is further enhanced through a geometrically defined optimal importance function, obtained explicitly via Taylor expansion of a principal component analysis...
A novel particle filter, the memory-based particle filter (M-PF), is proposed that can visually track moving objects that have complex dynamics. We aim to realize robustness against abrupt object movements and quick recovery from tracking failure caused by factors such as occlusions. To that end, we eliminate the Markov assumption from the previous particle filtering framework and predict the prior...
We propose a biologically inspired framework for visual tracking based on discriminant center surround saliency. At each frame, discrimination of the target from the background is posed as a binary classification problem. From a pool of feature descriptors for the target and background, a subset that is most informative for classification between the two is selected using the principle of maximum...
Non-rigid object detection and articulated pose estimation are two related and challenging problems in computer vision. Numerous models have been proposed over the years and often address different special cases, such as pedestrian detection or upper body pose estimation in TV footage. This paper shows that such specialization may not be necessary, and proposes a generic approach based on the pictorial...
We present a method for the detection of instances of an object class, such as cars or pedestrians, in natural images. Similarly to some previous works, this is accomplished via generalized Hough transform, where the detections of individual object parts cast probabilistic votes for possible locations of the centroid of the whole object; the detection hypotheses then correspond to the maxima of the...
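The voting step described above can be sketched as follows. The sketch assumes that each detected part supplies a list of candidate centroid offsets with vote weights, and that the vote space is quantized into square cells of a hypothetical size `cell`. Detection hypotheses are then read off as local maxima of the accumulated votes above a threshold. The data layout and function names are illustrative assumptions, not the paper's implementation.

```python
from collections import defaultdict
from typing import Iterable

# A part: (x, y, [(dx, dy, weight), ...]) where (x + dx, y + dy) is a voted centroid.
Part = tuple[float, float, list[tuple[float, float, float]]]
Cell = tuple[int, int]


def hough_vote(parts: Iterable[Part], cell: float = 10.0) -> dict[Cell, float]:
    """Accumulate weighted centroid votes in a quantized 2-D vote space."""
    acc: dict[Cell, float] = defaultdict(float)
    for x, y, votes in parts:
        for dx, dy, weight in votes:
            key = (int((x + dx) // cell), int((y + dy) // cell))
            acc[key] += weight
    return acc


def local_maxima(acc: dict[Cell, float], threshold: float) -> list[tuple[Cell, float]]:
    """Return cells above threshold that strictly beat all 8 neighbours, best first."""
    peaks = []
    for (i, j), score in acc.items():
        if score < threshold:
            continue
        neighbours = (acc.get((i + di, j + dj), 0.0)
                      for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0))
        if all(score > n for n in neighbours):
            peaks.append(((i, j), score))
    return sorted(peaks, key=lambda p: -p[1])


parts = [(40, 40, [(10, 10, 0.6), (0, 30, 0.4)]),
         (60, 45, [(-10, 5, 0.9)]),
         (50, 70, [(0, -20, 0.7)])]
print(local_maxima(hough_vote(parts), threshold=1.0))
```

Three parts agree on a centroid in cell (5, 5), which emerges as the single hypothesis. The stray vote in cell (4, 7) stays below the threshold.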
This paper presents a unified framework for object detection, segmentation, and classification using regions. Region features are appealing in this context because: (1) they encode shape and scale information of objects naturally; (2) they are only mildly affected by background clutter. Regions have not been popular as features due to their sensitivity to segmentation errors. In this paper, we start...
We present a discriminative Hough transform based object detector where each local part casts a weighted vote for the possible locations of the object center. We show that the weights can be learned in a max-margin framework which directly optimizes the classification performance. The discriminative training takes into account both the codebook appearance and the spatial distribution of its position...
We generalize reflection symmetry detection to a curved glide reflection symmetry detection problem. We propose a unifying, local feature based approach for curved glide reflection symmetry detection from real, unsegmented images, where the classic reflection symmetry becomes one of four special cases. Our method detects and groups statistically dominant local reflection axes in a 3D parameter space...
In recent years, 3D deformable surface reconstruction from single images has attracted renewed interest. It has been shown that preventing the surface from either shrinking or stretching is an effective way to resolve the ambiguities inherent to this problem. However, while the geodesic distances on the surface may not change, the Euclidean ones decrease when folds appear. Therefore, when applied...
The layered dynamic texture (LDT) is a generative model that represents video as a collection of stochastic layers of different appearance and dynamics. Each layer is modeled as a temporal texture sampled from a different linear dynamical system, with regions of the video assigned to a layer using a Markov random field. Model parameters are learned from training video using the EM algorithm. However,...
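The generative side of this description can be sketched in two steps. Each layer is assumed to be a standard linear dynamical system x_{t+1} = A x_t + v_t, y_t = C x_t + w_t with isotropic Gaussian noise of standard deviations `q` and `r`. Each pixel then copies its value from the layer it is assigned to. Here the pixel-to-layer `labels` are simply given; in the LDT they are governed by an MRF, which this sketch omits. EM learning is not shown, and all names are hypothetical.

```python
import random
from typing import Optional, Sequence

Matrix = Sequence[Sequence[float]]


def matvec(m: Matrix, v: Sequence[float]) -> list[float]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sample_lds(a: Matrix, c: Matrix, x0: Sequence[float], steps: int,
               q: float = 0.0, r: float = 0.0, seed: Optional[int] = None) -> list[list[float]]:
    """Sample frames y_1..y_T from x_{t+1} = A x_t + v_t, y_t = C x_t + w_t (x_1 = x0)."""
    rng = random.Random(seed)
    x = list(x0)
    frames = []
    for _ in range(steps):
        frames.append([yi + rng.gauss(0.0, r) for yi in matvec(c, x)])  # observation noise w_t
        x = [xi + rng.gauss(0.0, q) for xi in matvec(a, x)]             # state noise v_t
    return frames


def compose(layer_frames: Sequence[Sequence[Sequence[float]]],
            labels: Sequence[int]) -> list[list[float]]:
    """Pixel i of frame t is taken from layer labels[i] (labels fixed here, not an MRF sample)."""
    steps = len(layer_frames[0])
    return [[layer_frames[labels[i]][t][i] for i in range(len(labels))] for t in range(steps)]


# Two one-dimensional-state layers over a two-pixel "video", noise-free for readability.
decaying = sample_lds([[0.5]], [[1.0], [1.0]], [1.0], steps=2)
constant = sample_lds([[1.0]], [[3.0], [3.0]], [1.0], steps=2)
print(compose([decaying, constant], labels=[0, 1]))
```

With nonzero `q` and `r` and a fixed `seed`, the same call produces reproducible stochastic textures.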