This paper examines how the effectiveness of a gait recognition process depends on the length of the video sequence used. To this end, a well-known gait representation, the Gait Energy Image (GEI), is incrementally computed from gait cycles in the order they occur. The main objective is to determine the minimum number of gait cycles required to obtain discriminant GEIs. An experimental...
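To make the incremental computation concrete, here is a minimal sketch. It assumes the usual definition of a GEI as the pixel-wise mean of aligned binary silhouettes over one gait cycle, and it assumes that "incremental" means keeping a running mean of the per-cycle GEIs; the abstract does not state the exact update rule. The names `cycle_gei` and `IncrementalGEI` are hypothetical.

```python
from typing import List, Optional

Frame = List[List[float]]  # aligned binary silhouette: 1.0 = foreground, 0.0 = background


def cycle_gei(silhouettes: List[Frame]) -> Frame:
    """Pixel-wise mean of the silhouettes belonging to one gait cycle."""
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [
        [sum(frame[r][c] for frame in silhouettes) / n for c in range(cols)]
        for r in range(rows)
    ]


class IncrementalGEI:
    """Running mean of per-cycle GEIs (an assumed update rule, not the paper's)."""

    def __init__(self) -> None:
        self.cycles = 0
        self.gei: Optional[Frame] = None

    def update(self, silhouettes: List[Frame]) -> Frame:
        """Fold one more gait cycle into the accumulated GEI and return it."""
        new = cycle_gei(silhouettes)
        self.cycles += 1
        if self.gei is None:
            self.gei = new
        else:
            k = self.cycles
            # Standard running-mean update: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
            self.gei = [
                [old + (cur - old) / k for old, cur in zip(old_row, new_row)]
                for old_row, new_row in zip(self.gei, new)
            ]
        return self.gei
```

With an accumulator like this, the GEI available after each cycle can be passed to a classifier to observe how recognition changes as more cycles are added.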
Image inpainting in a cluttered scene containing objects at several depths can be more challenging than conventional image inpainting. The basic inpainting problem is to fill in the missing parts of an image left by removing undesired objects from the foreground layer. In this paper we propose a method to inpaint missing portions of an image in different depth layers. The...
Existing techniques for object tracking with Multiple Instance Learning take the approach of extracting low-level patches of fixed size and aspect ratios within each image, and employ many simplistic assumptions. In this work, we propose an approach that automatically utilizes image segments as input primitives to develop a multi-level segmentation-based system, and build a target model refinement...
Design of video storyboards has emerged as a popular research area in the multimedia community. Different pattern clustering techniques are applied to extract the key frames from a video sequence to form a storyboard. In this paper, we propose an automatic method for the selection of key frames of a video sequence using Delaunay graphs. We prune certain edges from the Delaunay graph using an iterative...
In this paper, we propose an approach for human activity categorization based on optical flow direction and magnitude features. The main contribution of this paper is the feature representation that mirrors the geometry of the human body and relationships between its moving regions when performing activities. The features are quantified using a quantization algorithm. We analyze the performance...
In this study, a visual attention region determination approach for H.264 videos using spatiotemporal features is proposed. After Gaussian filtering in Lab color space, the phase spectrum of Fourier transform (PFT) is used to generate the spatial saliency map of each video frame. On the other hand, the motion vector fields from an H.264 video bitstream are backward accumulated and the phase spectrum...
Key frame extraction aims to obtain a (very) compressed set of video frames that summarizes the essential content of a video sequence. In this paper, a well-known information-theoretic measure, the Jensen-Rényi divergence (JRD), is studied to estimate the frame-by-frame distance between consecutive video images, for segmenting shots/subshots and for choosing key frames. Our new key...
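As an illustration of the measure named above, the following sketch computes the Jensen-Rényi divergence between normalized intensity histograms of frames. It uses the standard definition, the Rényi entropy of the weighted mixture minus the weighted sum of the individual Rényi entropies, with order alpha in (0, 1). The histogram binning, the default alpha, and all function names are assumptions for illustration; the abstract does not specify how frames are turned into distributions.

```python
import math
from typing import List, Optional, Sequence


def normalized_histogram(pixels: Sequence[int], bins: int = 4, max_value: int = 256) -> List[float]:
    """Intensity histogram of a flattened frame, normalized to sum to 1."""
    counts = [0] * bins
    for v in pixels:
        counts[min(v * bins // max_value, bins - 1)] += 1
    return [c / len(pixels) for c in counts]


def renyi_entropy(p: Sequence[float], alpha: float = 0.5) -> float:
    """Renyi entropy of order alpha (natural log), restricted to 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return math.log(sum(x ** alpha for x in p if x > 0)) / (1.0 - alpha)


def jensen_renyi_divergence(
    dists: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    alpha: float = 0.5,
) -> float:
    """Entropy of the weighted mixture minus the weighted mean of the entropies."""
    n = len(dists)
    if weights is None:
        weights = [1.0 / n] * n  # assumed default: equal weight per frame
    mixture = [
        sum(w * d[i] for w, d in zip(weights, dists))
        for i in range(len(dists[0]))
    ]
    return renyi_entropy(mixture, alpha) - sum(
        w * renyi_entropy(d, alpha) for w, d in zip(weights, dists)
    )
```

Applied to the histograms of two consecutive frames, the value is zero when the histograms match and grows as they diverge, so a peak in the frame-by-frame sequence is a candidate shot boundary under these assumptions.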
Human actions can be represented as a bag of spatiotemporal visual words extracted from input video sequences. For human action categorization, labeled LDA (L-LDA) extends latent Dirichlet allocation (LDA) by providing action class labels to each video. To handle parameter uncertainty in L-LDA, this paper further extends L-LDA within the type-2 fuzzy set (T2 FS) framework, referred to as...
This paper introduces CMV100, a new research dataset for people tracking and re-identification in sparse camera networks. Baseline methods for re-identification performance analysis are also proposed. The dataset consists of over 400 indoor video sequences in total. The number of visually distinctive human objects is 100, and each person appears in three different views on average and in at most five...
We propose a new framework for recognizing three-dimensional (3D) objects that needs few reference images. We tackle a key issue of 3D object recognition: the trade-off between the number of reference images and recognition accuracy. We assume that a reference image is a photo of the object from an arbitrary position, typical of those used to exhibit goods on the web. The framework, first, estimates...
To deal with the drifting issue in visual tracking, we propose an Online Transfer Boosting (OTB) algorithm that transfers knowledge from three different source domains to the target domain to improve the performance of the online classifier used in tracking-by-detection. In particular, the OTB algorithm integrates three types of knowledge by: (1) transferring prior knowledge from the first frame using...
This paper introduces an algorithm for dense motion segmentation of pedestrians in crowded video sequences. This algorithm realizes dense and temporally-consistent segmentation under severe occlusion conditions. To segment the whole appearance of each articulated object, the algorithm uses temporal invariance of geodesic distance (similarity) between segments as a criterion for motion segmentation...
Computing similarities between data samples is a fundamental step in most Pattern Recognition (PR) tasks. Better similarity measures lead to more accurate prediction of labels. Computing similarities between video sequences has long been a challenging problem for the PR community because videos have both spatial and temporal context, which are hard to capture. We describe a novel approach that...
For achieving efficient action recognition, some recent works propose to select a smaller number of frames in a video sequence instead of the entire sequence of frames. In this study, we propose to represent a frame by a combination of local and global descriptors instead of the silhouette used in our previous approach aiming at frame selection. Action recognition is then executed on the basis of...
Accurate segmentation provides a useful contour constraint to alleviate drifting during online learning for tracking. Towards this end, we present a closed-loop method for object tracking that links Hough forests and alpha matting via an effective back-projection scheme for patches. A novel hybrid-Hough-forests-based method first estimates object location. Given the object location, the trimap of...