Segmentation of moving objects in a scene is difficult for non-stationary cameras, and especially challenging in the presence of fast and unstable egomotion, e.g., as encountered with car-mounted cameras or wearable devices. Based on an analysis of motion vanishing points of the scene and estimated depth, a geometric model that relates extracted 2D motion to a 3D motion field relative to the camera...
In contrast to still image analysis, motion information offers a powerful means to analyze video. In particular, motion trajectories determined from keypoints have become very popular in recent years for a variety of video analysis tasks, including search, retrieval and classification. Additionally, cloud-based analysis of media content has been gaining momentum, so efficient communication of salient...
This paper proposes a framework for tracking multiple fluorescent objects in 2D + time video-microscopy. We present a novel batch-processing track-before-detect multiple object tracking approach based on a spatio-temporal marked point process model of ellipses. Our approach takes into account events such as births, deaths, splits and merges of objects which are motivated by the biological and physical...
Person re-identification is the process of recognizing a person across a network of cameras with non-overlapping fields of view. In this paper we present an unsupervised multi-shot approach based on a patch-based dynamic appearance model. We use deformable graph matching for person re-identification using histograms of color and texture as features of nodes. Each graph model spans multiple images...
Curvature and torsion of discrete curves are important quantities in numerous applications in 3D image processing. Classical algorithms based on high-order derivatives lead to high errors when computing the torsion of 3D curves from low-resolution discrete data. To address this challenge, we present a discrete, parameter-free approach to calculate the torsion values without fitting continuous curves on...
In this paper, a robust moving camera calibration method is proposed in order to synthesize a free viewpoint soccer video with a high degree of accuracy. The main problem in video registration-based moving camera calibration is that the calibration accuracy is very low if the detected feature points are from moving objects. In order to solve this problem, the proposed method tracks the feature points...
This work proposes a trajectory clustering-based approach for segmenting flow patterns in high density crowd videos. The goal is to produce a pixel-wise segmentation of a video sequence (static camera), where each segment corresponds to a different motion pattern. Unlike previous studies that use only motion vectors, we extract full trajectories so as to capture the complete temporal evolution of...
Crowd video retrieval is an important problem in surveillance video management in the era of big data, e.g., video indexing and browsing. In this paper, we address this issue from the motion-level perspective by using hand-drawn sketches as queries. Motion sketch based crowd video retrieval naturally suffers from challenges in motion-level video indexing and sketch representation. We tackle them by...
This paper presents a novel approach to detecting crowd groups and learning semantic regions with a Gestalt laws-based similarity. Unlike existing approaches based on optical flow or complete trajectories, our model takes tracklets as its raw input, because they carry more detailed information. Although these tracklets do not span the same duration, they are more robust to noise...
We address the problem of full-body human pose estimation in video. Most previous work considers a body part, a pose, or the trajectory of a body part as the basic unit from which to compose the pose sequence. In contrast, we consider the tracklet of a body part as the basic unit. Based on this medium-granularity representation, we develop a spatio-temporal graphical model to select an optimal tracklet for each part in each video...
In this paper we propose to improve the localization and the 3D mapping provided by an RGBD SLAM algorithm, using prior knowledge of the 3D model of the environment. The proposed solution relies on a feature-based RGBD SLAM algorithm to localize the camera and update the 3D map of the scene. To improve the accuracy and the robustness of the localization, we propose to combine in a local bundle...
Scene depth variation is an important factor that leads to spatially-varying camera motion blur. Most of the previous methods require auxiliary cameras or user interaction to make depth-aware deblurring tractable. In this work, we propose to use a noisy/blurred/noisy image sequence and simultaneously recorded inertial measurements to jointly estimate scene depth and remove spatially-varying blur caused...
Capturing multiple images using the burst mode of handheld cameras makes it possible to obtain a high-resolution (HR) image by exploiting the subpixel motion among the captured images arising from handshake. However, the caveat with mobile phone cameras is that they produce rolling shutter (RS) distortions that must be accounted for in the super-resolution process. We propose a method in which we obtain...
Sign Language Recognition (SLR) aims to translate Sign Language (SL) into speech or text, so as to facilitate communication between hearing-impaired people and hearing people. This problem has broad social impact; however, it is challenging due to the variation among signers and the complexity of sign words. Traditional methods for SLR generally use handcrafted features and Hidden...
This paper presents a methodology to characterize information about groups of people with the main goal of detecting cultural aspects. Based on tracked pedestrians, groups are detected and characterized. Group information is then used to identify cultural aspects in videos, based on the Hofstede cultural dimensions theory. The presented work was tested on videos of pedestrian groups recorded in different...
Mass gatherings to protest or demonstrate can sometimes turn violent and take the shape of a riot. It has been observed that demonstrations generally deteriorate into riots after instigation by perpetrators. Therefore, identifying the instigator(s) can prevent people from turning into a mob and help law enforcement agencies keep the crowd under control. To the best of our knowledge there has been no...
Motion information is a key factor for action recognition and has been eagerly pursued for decades. How to effectively learn motion features in Convolutional Networks (ConvNets) remains an open issue. Prevalent ConvNets often take several full frames of video as input at a time, which can be a heavy burden for network training. In this paper, we introduce a novel framework called Tube ConvNets, by...
We present a novel video representation for human action recognition by considering temporal sequences of visual words. Based on state-of-the-art dense trajectories, we introduce temporal bundles of dominant, that is most frequent, visual words. These are employed to construct a complementary action representation of ordered dominant visual word sequences, that additionally incorporates fine grained...
Recent advances in image captioning have led to increasing interest in video captioning. However, most work on video captioning focuses on generating captions from a single input of aggregated features, which hardly deviates from the image captioning process and does not take full advantage of the dynamic content present in videos. We attempt to generate video captions that convey richer content by temporally...
We propose a passive forgery detection technique for locating spliced regions in motion blurred images of 3D scenes. We consider general camera motion in hand-held cameras and utilize discrepancies in local motion blur patterns as a cue for splicing detection. We first devise an automatic and computationally efficient scheme to estimate the camera motion using only the blur kernels from authentic...