Object proposals for detecting moving or static video objects need to address issues such as speed, memory complexity and temporal consistency. We propose an efficient Video Object Proposal (VOP) generation method and show its efficacy in learning a better video object detector. A deep-learning based video object detector learned using the proposed VOP achieves state-of-the-art detection performance...
Recent progress in video description has shown promising results by combining object/action recognition and natural language processing techniques. However, even the simplest form of the generated sentence, the SVO triplet (Subject/Verb/Object), can be misleading because it lacks role relationship analysis. When the system detects keywords "person", "baby" and "feed",...
We present a new framework for capturing videos using sensor-rich mobile devices, such as smartphones, tablets, etc. Many of today's mobile devices are equipped with a variety of sensors, including accelerometers, magnetometers and gyroscopes, which are rarely used during video capture for anything more than video stabilization. We demonstrate that these sensors, together with the information that...
In this paper, we propose a new discriminative framework based on Hough forests that enables us to efficiently recognize and localize sequential data in the form of spatio-temporal trajectories. Contrary to traditional decision forest-based methods, where predictions are made independently of their output temporal context, we introduce the concept of "transition", which enforces the temporal...
A video coding system is presented that partitions the scene into "visual structures" and a residual "background" layer. A low-level representation ("track-template") of visual structures is proposed that exploits their temporal redundancy. A dictionary of track-templates is constructed that is used to encode video frames. We make optimal use of the dictionary in terms...
Exploiting contextual cues has been a key idea to improve people detection in crowded scenes. Along this line we present a novel context-driven approach to detect people in crowded scenes. Based on a context graph that incorporates both geometric and social contextual patterns in crowds, we apply label propagation to discover weak detections contextually compatible with true detections while suppressing...
In this paper, we propose a structured dictionary learning framework for video-based face recognition. We discover the invariant structural information from different videos of each subject. Specifically, we employ dictionary learning and low-rank approximation to preserve the invariant structure of face images in videos. The learned dictionary is both discriminative and reconstructive. Thus, we not...
This paper introduces self-taught object localization, a novel approach that leverages deep convolutional networks trained for whole-image recognition to localize objects in images without additional human supervision, i.e., without using any ground-truth bounding boxes for training. The key idea is to analyze the change in the recognition scores when artificially masking out different regions of...
Action recognition from a single image is an important task for applications such as image annotation, robotic navigation, video surveillance and several others. Existing methods for recognizing actions from still images mainly rely on either bag-of-feature representations or pose estimation from articulated body-part models. However, the relationship between the action and the containing image is...
Given 3D outdoor scenes acquired by a LIDAR sensor, we address the problem of semantic segmentation of 3D point clouds involving simultaneously segmenting and classifying the data. The capability of semantic segmentation is essential for several applications, such as autonomous robot navigation and 3D reconstruction of point clouds. In this paper, we present a higher-order class-specific CRF model...
Non-contact measurement of cardiac pulse signals has attracted great interest due to its convenience and cost effectiveness. However, extracting pulse signals on mobile handheld devices (e.g. smartphones) based on face videos captured by mobile cameras usually suffers from low measurement accuracy due to misalignment errors in face tracking and inevitable illumination changes in a mobile scenario,...
This paper focuses on a method of constructing panoramas from a quadcopter, and a new mosaicing sub-problem when the scene contains significant regions of vacant spaces. These vacant spaces yield little to no features to match input images and hence challenge existing mosaicing techniques. We describe a framework that is able to handle this unique input by leveraging the availability of the inertial...
Recently, convolutional neural networks (ConvNets) have emerged as state-of-the-art classification and detection algorithms, achieving near-human performance in visual detection. However, ConvNet algorithms are typically very computation- and memory-intensive. In order to be able to embed ConvNet-based classification into wearable platforms and embedded systems such as smartphones or ubiquitous electronics...
Despite the strong performance of the Support Vector Machine (SVM) on many practical classification problems, the algorithm is not directly applicable to multi-dimensional trajectories having different lengths. In this paper, a new class of SVM that is applicable to trajectory classification, such as action recognition, is developed by incorporating two efficient time-series distance measures into the kernel...
Fine-grained activity recognition focuses on recognition at subordinate levels. This task is made difficult by low inter-class variability and high intra-class variability caused by human motion and objects. We propose that recognition of such activities can be significantly improved by grouping and decomposing them into a hierarchy of multiple abstraction layers; we introduce a Hierarchical Activity...
This paper revisits the problem of human action recognition from skeleton joint locations, and analyses the tradeoff of sampling the joint space with respect to the recognition performance and computational complexity. The provided insights led to the design of a new algorithm for automatically selecting the most appropriate set of joints for each action. During the training stage, the approach applies...
Sliding window is one direct way to extend a successful recognition system to handle the more challenging detection problem. While action recognition decides only whether or not an action is present in a pre-segmented video sequence, action detection identifies the time interval where the action occurred in an unsegmented video stream. Sliding window approaches can however be slow as they maximize...
Text detection in stores has valuable applications that could transform the shopping experience, yet cluttered store environments present distinct challenges for existing techniques. We propose a strategy for text detection in stores that exploits a repetition prior. Leveraging the fact that shops typically display multiple instances of the same product on the shelf, our approach localizes text regions...
A novel approach for the fusion of heterogeneous object detection methods is proposed. In order to effectively integrate the outputs of multiple detectors, the level of ambiguity in each individual detection score is estimated using the precision/recall relationship of the corresponding detector. The main contribution of the proposed work is a novel fusion method, called Dynamic Belief Fusion (DBF),...