This paper presents a framework for anomaly detection in videos which considers both motion and appearance features. For motion cues, we propose a new feature called 3D-HOF, which effectively extracts both velocity and orientation from the optical flow map. At the same time, we introduce the concept of the “depth of field” problem to make the detection more accurate when the velocity of an object may...
Video is now one of the major sources of information for forensics. However, video documents can originate from various recording devices (CCTV, mobile devices, etc.) with inconsistent quality and can sometimes be recorded in challenging light or motion conditions. Therefore, the amount of information that can be extracted relying solely on the video image can vary to a great extent. Most of the...
We address the problem of full-body human pose estimation in video. Most previous work considers a body part, a pose, or the trajectory of a body part as the basic unit from which to compose the pose sequence. In contrast, we consider a tracklet of a body part as the basic unit. Based on this medium-granularity representation, we develop a spatio-temporal graphical model to select an optimal tracklet for each part in each video...
We consider fully automated behavior understanding through visual cues in industrial environments. In contrast to most existing work, which relies on domain knowledge to construct complex handcrafted features from inputs, we exploit a Convolutional Neural Network (CNN), a type of deep model that can act directly on the raw inputs, to automate the process of feature construction. Although...
Here we combine the rich, overcomplete signal representation afforded by the scattering transform with a probabilistic graphical model which captures hierarchical dependencies between coefficients at different layers. The wavelet scattering network results in a high-dimensional representation which is translation invariant and stable to deformations whilst preserving informative content. Such...
In this paper we consider the problem of semi-supervised learning with deep Convolutional Neural Networks (ConvNets). Semi-supervised learning is motivated by the observation that unlabeled data is cheap and can be used to improve the accuracy of classifiers. We propose an unsupervised regularization term that explicitly forces the classifier's predictions for multiple classes to be mutually exclusive...
Document is unavailable: This DOI was registered to an article that was not presented by the author(s) at this conference. As per section 8.2.1.B.13 of IEEE's "Publication Services and Products Board Operations Manual," IEEE has chosen to exclude this article from distribution. We regret any inconvenience.
Sign Language Recognition (SLR) aims at translating Sign Language (SL) into speech or text, so as to facilitate communication between hearing-impaired people and hearing people. This problem has broad social impact; however, it is challenging due to the variation between signers and the complexity of sign words. Traditional methods for SLR generally use handcrafted features and Hidden...
Vision-based sign language recognition (SLR) is a challenging task due to the complexity of signs and limited data collection. To improve recognition precision, this paper proposes an adaptive framework of Hidden Markov Models (HMMs) based on Gaussian mixture models (GMMs). We discover that inherent latent states in HMMs are not only related to the number of key gestures and body poses, but also related...
With the advent of cost-effective depth sensors and the development of fast human-pose estimation algorithms, interest in action recognition from temporal skeleton sequences has been renewed. In this work we claim the task can be naturally seen as a Multiple Instance Learning (MIL) problem. Specifically, we model skeleton sequences as bags of time-stamped descriptors, and we present a new framework...
We present a novel video representation for human action recognition by considering temporal sequences of visual words. Based on state-of-the-art dense trajectories, we introduce temporal bundles of dominant (that is, most frequent) visual words. These are employed to construct a complementary action representation of ordered dominant visual word sequences that additionally incorporates fine-grained...
Sensitivity to spatial details drops across the visual periphery, and hence a video streaming system that gracefully degrades quality away from the viewpoint of the observer provides an optimum viewing experience with potentially large bitrate savings. As reaction latency is an important performance parameter of such systems, good prediction of future gaze locations at the transmission end is very...
This paper presents a new visual speaker authentication scheme which can extract the most representative details of a speaker's lip feature. For each speaker, the entire utterance pronouncing a specific prompt text is divided into several word-level segments and a mute segment. Three kinds of lip feature details are investigated including: i) lip movements in each word segment; ii) lip movements in...
In this paper, we propose a novel framework for dynamical analysis of human actions from 3D motion capture data using topological data analysis. We model human actions using the topological features of the attractor of the dynamical system. We reconstruct the phase-space of time series corresponding to actions using time-delay embedding, and compute the persistent homology of the phase-space reconstruction...
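The time-delay embedding step named in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `delay_embed`, the parameters `dim` (embedding dimension) and `tau` (delay in samples), and the toy signal are all assumptions chosen for illustration, and the persistent homology computation is left out.

```python
def delay_embed(series, dim, tau):
    """Map a 1-D series to points (x[i], x[i + tau], ..., x[i + (dim - 1) * tau]).

    Hypothetical helper: `dim` and `tau` are illustrative parameter names.
    """
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    n_points = len(series) - (dim - 1) * tau
    if n_points <= 0:
        raise ValueError("series is too short for this dim and tau")
    return [
        tuple(series[i + k * tau] for k in range(dim))
        for i in range(n_points)
    ]


# Toy signal standing in for one coordinate of a motion-capture joint (assumed data).
signal = [0, 1, 2, 3, 4, 5, 6]
points = delay_embed(signal, dim=3, tau=2)
print(points)
```

Each resulting tuple is one point of the reconstructed phase space; a topological summary such as persistent homology would then be computed on this point cloud.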