This paper presents a novel method to predict future human activities from partially observed RGB-D videos. Human activity prediction is generally difficult due to its non-Markovian property and the rich context between humans and their environments. We use a stochastic grammar model to capture the compositional structure of events, integrating human actions, objects, and their affordances. We represent...
Detecting actions in untrimmed videos is an important yet challenging task. In this paper, we present the structured segment network (SSN), a novel framework which models the temporal structure of each action instance via a structured temporal pyramid. On top of the pyramid, we further introduce a decomposed discriminative model comprising two classifiers, respectively for classifying actions and...
Interest in global security has encouraged researchers to propose novel algorithms for robust biometric systems. One interesting biometric trait is identifying humans on the basis of their walking patterns, called gait recognition. In this paper, our contribution is two-fold. First, we discuss the modules of model-free gait recognition techniques. Second, we perform the comparative...
This paper addresses issues in human fall detection from videos. Unlike the handcrafted features used in conventional machine learning, we extract features from Convolutional Neural Networks (CNNs) for human fall detection. Similar to many existing works using two-stream inputs, we use a spatial CNN stream with raw image differences and a temporal CNN stream with optical flow as the inputs of the CNN....
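The "raw image difference" input mentioned in this abstract is simply the per-pixel change between consecutive frames. A minimal sketch (a hypothetical illustration, not the authors' code; frames are assumed to be 2-D lists of grayscale intensities):

```python
def frame_difference(frame_a, frame_b):
    """Absolute per-pixel difference between two equally sized
    grayscale frames, given as 2-D lists of intensities."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

# A falling subject produces large differences between consecutive frames.
prev = [[10, 10], [10, 10]]
curr = [[10, 60], [200, 10]]
print(frame_difference(prev, curr))  # [[0, 50], [190, 0]]
```

In the two-stream setup described, an image like this would be fed to the spatial CNN stream, while dense optical flow would feed the temporal stream.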
We present a system for temporal detection of social interactions. Many works to date have succeeded in recognising activities from clipped videos in datasets, but for robotic applications it is important to be able to move to more realistic data. For this reason, the proposed approach temporally detects intervals where individual or social activity is occurring. Recognition of human activities...
In this paper we address the problem of online video abnormal event detection. A vast number of methods to automatically detect abnormal events in videos have been proposed recently. However, the majority of these methods cannot attain online performance; in other words, they cannot detect events as soon as they occur. Thus there is a lack of methods specifically aimed at detecting...
We address the problem of temporal action localization in videos. We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum of frame-wise classification scores. Additionally, our model classifies the start, middle, and end of each action as separate components, allowing our system to explicitly model each action's temporal...
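When each window is scored as the sum of frame-wise classification scores, the best arbitrary-length window reduces to a maximum-subarray search. A small sketch of that scoring step (a hypothetical illustration under that assumption, not the authors' full model):

```python
def best_window(frame_scores):
    """Return (start, end, score) of the contiguous window maximizing
    the sum of per-frame classification scores, via Kadane's algorithm.
    `end` is exclusive."""
    best = (0, 1, frame_scores[0])
    cur_start, cur_sum = 0, 0.0
    for i, s in enumerate(frame_scores):
        if cur_sum <= 0:
            cur_start, cur_sum = i, s  # restart the window here
        else:
            cur_sum += s               # extend the current window
        if cur_sum > best[2]:
            best = (cur_start, i + 1, cur_sum)
    return best

# Positive scores suggest the action is present in that frame.
scores = [-1.0, 2.0, 3.0, -0.5, 1.5, -4.0]
print(best_window(scores))  # (1, 5, 6.0)
```

The abstract's structured prediction additionally distinguishes start/middle/end components, which this one-score sketch omits.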
In this paper, we introduce Key-Value Memory Networks to a multimodal setting and a novel key-addressing mechanism to deal with sequence-to-sequence models. The proposed model naturally decomposes the problem of video captioning into vision and language segments, dealing with them as key-value pairs. More specifically, we learn a semantic embedding (v) corresponding to each frame (k) in the video,...
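Key-addressing in a key-value memory generally means weighting each value by the softmax similarity between a query and the corresponding key. A bare-bones sketch of that mechanism (a generic illustration with made-up vectors, not the paper's semantic embedding):

```python
import math

def attend(query, keys, values):
    """Key-addressing: softmax over dot-product similarities between
    the query and each key, then a weighted sum of the values."""
    sims = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(sims)                                  # for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# The value whose key best matches the query dominates the output.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
print(attend([5.0, 0.0], keys, values))  # close to [10.0, 0.0]
```

In the captioning setting described above, the keys would be frame representations and the values their learned semantic embeddings.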
This paper presents a framework for saliency estimation and fixation prediction in videos. The proposed framework is based on a hierarchical feature representation obtained by stacking convolutional layers of independent subspace analysis (ISA) filters. The feature learning is thus unsupervised and independent of the task. To compute the saliency, we then employ a multiresolution saliency architecture...
Global motion estimation (GME) algorithms are typically employed on aerial videos captured by on-board UAV cameras to compensate for the artificial motion induced in these video frames due to camera motion. However, existing methods for GME have high computational complexity and are therefore not suitable for on-board processing in UAVs with limited computing capabilities. In this paper, we propose...
Deep visual attention in computer vision has attracted much interest over the past years and has made great contributions, especially in image classification, image captioning, and action recognition. However, because they rely wholly or partially on backpropagation (BP) training, existing models cannot show the true power of attention in computational efficiency and focusing accuracy. Our intuition is that the attention mechanism should...
In this paper, we propose a new video representation incorporating image based deep features and an efficient pooling strategy for the purpose of action recognition. The Convolutional Neural Network (CNN) based features have very recently emerged as the new state of the art for image classification. Several attempts have been made to extend such CNN models for videos by explicitly focusing on the...
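A common baseline for the pooling step this abstract alludes to is element-wise averaging of per-frame CNN feature vectors into a single video descriptor. A minimal sketch (an illustrative baseline, not necessarily the paper's proposed strategy):

```python
def average_pool(frame_features):
    """Pool per-frame feature vectors into one video-level descriptor
    by element-wise averaging."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

# Three frames, each with a 2-dimensional feature vector.
frames = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
print(average_pool(frames))  # [2.0, 2.0]
```

Pooling makes the representation invariant to video length, at the cost of discarding temporal order.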
Group activity recognition from videos is a very challenging problem that has barely been addressed. We propose an activity recognition method using group context. In order to encode both single-person descriptions and two-person interactions, we learn mappings from high-dimensional feature spaces to low-dimensional dictionaries. In particular, the proposed two-person descriptor takes into account geometric...
Most affect-based systems analyse facial expressions for emotion detection, and utilize face detection and recognition methods in order to perform effective affect analysis. Recent work has demonstrated the efficacy of deep architectures for face recognition by training them as classifiers on voluminous datasets. Some architectures are trained as classifiers, and some directly learn an embedding via a triplet...
In this paper, we address the problem of recognizing unfinished human activity from partially observed videos. Specifically, we propose a novel human activity descriptor, which can represent pairwise relationships among human activities in a compact manner using pre-trained Convolutional Neural Networks (CNNs) by capturing the discriminative sub-volume. The potentially important relationship among...
We propose a novel geometric framework for analyzing spontaneous facial expressions, with the specific goal of comparing, matching, and averaging the shapes of landmark trajectories. Here we represent facial expressions by the motion of the landmarks across time. The trajectories are represented by curves. We use elastic shape analysis of these curves to develop a Riemannian framework for analyzing...
Activity recognition in videos is a challenging task, especially when only a scarce number of samples is available for modelling the problem. The task becomes even harder when using generative models such as mixture models or Hidden Markov Models (HMMs), as they demand many samples to determine their parameters. Additionally, these models rely on the appropriate selection of some parameters, for instance...
The MapReduce framework is being increasingly used in the scientific computing and image/video processing fields. Relevant research has tailored it to these fields' specificities, but there are still overwhelming limitations when it comes to temporal locality-sensitive computations. The performance of this class of computations is closely tied to an efficient use of the memory hierarchy, a concern that...
Player believability is often defined as the ability of a game playing character to convince an observer that it is being controlled by a human. The agent's behavior is often assumed to be the main contributor to the character's believability. In this paper we reframe this core assumption and instead focus on the impact of the game environment and aspects of game design (such as level design) on the...
In this study, we make use of brain activation data to investigate the perceptual plausibility of a visual and an auditory model for visual and auditory saliency in video processing. These models have already been successfully employed in a number of applications. In addition, we experiment with parameters, modifications and suitable fusion schemes. As part of this work, fMRI data from complex video...