Attributes are visual concepts that can be detected by machines, understood by humans, and shared across categories. They are particularly useful for fine-grained domains where categories are closely related to one another (e.g. bird species recognition). In such scenarios, relevant attributes are often local (e.g. “white belly”), but the question of how to choose these local attributes remains largely...
Attribute-based query offers an intuitive way of image retrieval, in which users can describe the intended search targets with understandable attributes. In this paper, we develop a general and powerful framework to solve this problem by leveraging a large pool of weak attributes comprised of automatic classifier scores or other mid-level representations that can be easily acquired with little or...
We propose a framework that performs action recognition and identity maintenance of multiple targets simultaneously. Instead of first establishing tracks using an appearance model and then performing action recognition, we construct a network flow-based model that links detected bounding boxes across video frames while inferring activities, thus integrating identity maintenance and action recognition...
We present a generic framework for object segmentation using depth maps based on Random Forest and Graph-cuts theory, and apply it to the segmentation of human limbs in depth maps. First, from a set of random depth features, Random Forest is used to infer a set of label probabilities for each data sample. This vector of probabilities is used as the unary term in the α-β swap Graph-cuts algorithm. Moreover,...
State-of-the-art methods for human detection and pose estimation require many training samples for best performance. While large, manually collected datasets exist, the captured variations w.r.t. appearance, shape and pose are often uncontrolled, thus limiting the overall performance. In order to overcome this limitation, we propose a new technique to extend an existing training set that allows to explicitly...
This paper introduces a probabilistic graphical model for continuous action recognition with two novel components: a substructure transition model and a discriminative boundary model. The first component encodes the sparse and global temporal transition prior between action primitives in a state-space model to handle the large spatial-temporal variations within an action class. The second component enforces...
Simple rule-based multi-agent systems are widely used in the fields of social simulation and game artificial intelligence in order to incorporate the complexity and richness of action and interaction into characters in virtual environments while keeping computational cost low. This paper presents an approach to synthesize the spatio-temporal dynamics of groups in standing conversation: four...
We propose a novel mode of feedback for image search, where a user describes which properties of exemplar images should be adjusted in order to more closely match his/her mental model of the image(s) sought. For example, perusing image results for a query “black shoes”, the user might state, “Show me shoe images like these, but sportier.” Offline, our approach first learns a set of ranking functions,...
This paper addresses the problem of scene categorization while arguing that better and more accurate results can be obtained by endowing the computational process with perceptual relations between scene categories. We first describe a psychophysical paradigm that probes human scene categorization, extracts perceptual relations between scene categories, and suggests that these perceptual relations...
Security breaches that affect personal data and organisational systems have become increasingly significant in the global information technology (IT) industry. There is scope for research on the factors that influence user behaviour and attitudes toward this aspect of information security and their impact on organisations' network integrity. This research aims to study the critical success factors (CSF) for employees...
In many visual classification tasks the spatial distribution of discriminative information is (i) non-uniform, e.g. a person ‘reading’ can be distinguished from one ‘taking a photo’ based on the area around the arms, i.e. ignoring the legs, and (ii) subject to intra-class variations, e.g. different readers may hold their books differently. Motivated by these observations, we propose to learn the discriminative spatial...
Fine-grained categorization refers to the task of classifying objects that belong to the same basic-level class (e.g. different bird species) and share similar shape or visual appearances. Most of the state-of-the-art basic-level object classification algorithms have difficulties in this challenging problem. One reason for this can be attributed to the popular codebook-based image representation,...
Pedestrian detection from images is an important yet challenging task. Conventional methods usually identify human figures using image features inside local regions. In this paper we show that, besides the local features, context cues in the neighborhood provide important constraints that are not yet well utilized. We propose a framework to incorporate the context constraints for detection...
The need for early detection of temporal events from sequential data arises in a wide spectrum of applications ranging from human-robot interaction to video security. While temporal event detection has been extensively studied, early detection is a relatively unexplored problem. This paper proposes a maximum-margin framework for training temporal event detectors to recognize partial events, enabling...
Since high-level events in images (e.g. “dinner”, “motorcycle stunt”, etc.) may not be directly correlated with their visual appearance, low-level visual features do not carry enough semantics to classify such events satisfactorily. This paper explores a fully compositional approach for event-based image retrieval which is able to overcome this shortcoming. Furthermore, the approach is fully scalable...
We present a video summarization approach for egocentric or “wearable” camera data. Given hours of video, the proposed method produces a compact storyboard summary of the camera wearer's day. In contrast to traditional keyframe selection techniques, the resulting summary focuses on the most important objects and people with which the camera wearer interacts. To accomplish this, we develop region cues...
We present a machine learning framework that automatically generates a model set of landmarks for some class of registered 3D objects: here we use human faces. The aim is to replace heuristically-designed landmark models by something that is learned from training data. The value of this automatically generated model is an expected improvement in robustness and precision of learning-based 3D landmarking...
In this paper, we propose an effective method to recognize human actions from 3D positions of body joints. With the release of RGBD sensors and the associated SDK, human body joints can be extracted in real time with reasonable accuracy. In our method, we propose a new type of feature based on position differences of joints, EigenJoints, which combine action information including static posture, motion,...
In this paper, we present a gamesourcing method for automatically and rapidly acquiring labeled images of human poses to obtain ground truth data as input for human pose estimation from 2D images. Typically, these datasets are constructed manually through a tedious process of clicking on joint locations in images. By using a low-cost RGBD sensor, we capture synchronized, registered images, depth maps,...
The launch of Xbox Kinect has produced a very successful computer vision product and had a big impact on the gaming industry; this sheds light on a wide variety of potential applications related to action recognition. The accurate estimation of human poses from the depth image is universally a critical step. However, existing pose estimation systems exhibit failures when faced with severe occlusion. In...