2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

Neural Aggregation Network for Video Face Recognition

Jiaolong Yang, Peiran Ren, Dongqing Zhang, Dong Chen, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5216-5225

This paper presents a Neural Aggregation Network (NAN) for video face recognition. The network takes a face video or face image set of a person with a variable number of face images as its input, and produces a compact, fixed-dimension feature representation for recognition. The whole network is composed of two modules. The feature embedding module is a deep Convolutional Neural Network (CNN) which...
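The abstract is cut off before it explains how the aggregation step works. As a hedged illustration only, the sketch below shows one generic way to turn a variable number of per-frame feature vectors into a single fixed-dimension vector: attention-weighted pooling. Each feature is scored against a query vector, the scores are softmax-normalized, and the normalized weights are used to average the features. The function names and the fixed query are hypothetical (a trained network would learn such parameters), and this sketch is not presented as NAN's exact design.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Aggregate any number of d-dim feature vectors into one d-dim vector.

    Each feature f_k gets score e_k = query . f_k; the softmax of the scores
    gives weights a_k (summing to 1), and the output is sum_k a_k * f_k.
    'query' is a hypothetical parameter that a trained network would learn.
    """
    if not features:
        raise ValueError("need at least one feature vector")
    d = len(query)
    if any(len(f) != d for f in features):
        raise ValueError("every feature must have the query's dimension")
    scores = [sum(q * x for q, x in zip(query, f)) for f in features]
    weights = softmax(scores)
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(d)]


if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]
    short_clip = [[0.2, 0.9, 0.1], [0.8, 0.1, 0.3]]  # 2 frames
    long_clip = [[0.5, 0.5, 0.5]] * 7                # 7 frames
    # Both clips yield a 3-dimensional descriptor despite different lengths.
    print(attention_pool(short_clip, query))
    print(attention_pool(long_clip, query))
```

The weights sum to one, and the result does not depend on frame order. Clips of any length therefore map to a vector of the same size, and frames that score higher against the query contribute more.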

chapter

Learning and Refining of Privileged Information-Based RNNs for Action Recognition from Depth Sequences

Zhiyuan Shi, Tae-Kyun Kim

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4684-4693

Existing RNN-based approaches for action recognition from depth sequences require either skeleton joints or hand-crafted depth features as inputs. An end-to-end approach, mapping raw depth maps directly to action classes, is non-trivial to design because 1) a single-channel map lacks texture, which weakens its discriminative power, and 2) the set of depth training data is relatively small. To address these...

chapter

Social Scene Understanding: End-to-End Multi-person Action Localization and Collective Activity Recognition

Timur Bagautdinov, Alexandre Alahi, Francois Fleuret, Pascal Fua, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3425-3434

We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense...

chapter

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3337-3345

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail. While one new captioning approach, dense captioning, can potentially describe images in finer levels of detail by captioning many regions within an image, it in turn is unable to...

chapter

Predictive-Corrective Networks for Action Detection

Achal Dave, Olga Russakovsky, Deva Ramanan

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2067-2076

While deep feature learning has revolutionized techniques for static-image understanding, the same does not quite hold for video processing. Architectures and optimization techniques used for video are largely based on those for static images, potentially underutilizing rich video information. In this work, we rethink both the underlying network architecture and the stochastic learning paradigm for...

chapter

Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition

Yufei Wang, Zhe Lin, Xiaohui Shen, Scott Cohen, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7378-7387

Recently, there has been a lot of interest in automatically generating descriptions for an image. Most existing language-model based approaches for this task learn to generate an image description word by word in its original word order. However, for humans, it is more natural to locate the objects and their relationships first, and then elaborate on each object, describing notable attributes. We...

chapter

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

Qing Sun, Stefan Lee, Dhruv Batra

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7215-7223

We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason about both forward and backward time dependencies. Beam Search (BS) is a widely used approximate inference algorithm for decoding sequences from unidirectional neural sequence models. Interestingly, approximate inference in bidirectional...
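For reference, the sketch below shows ordinary left-to-right Beam Search over a unidirectional model. This is the baseline the abstract says it extends; it is not the paper's bidirectional algorithm. A sequence's score is the sum of its token log-probabilities, and only the beam_width best partial sequences survive each step. The toy bigram table, toy_step, and beam_search are hypothetical names chosen for illustration.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[Tuple[str, ...], float]  # (token sequence, total log-prob)
# A unidirectional "model": maps a prefix to next-token log-probabilities.
NextTokenFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def beam_search(step: NextTokenFn, beam_width: int, max_len: int,
                eos: str = "</s>") -> List[Hypothesis]:
    """Standard left-to-right beam search.

    At each time step every surviving prefix is expanded with every next token,
    and the beam_width highest-scoring candidates are kept. Candidates ending in
    eos move to the finished list. Returns hypotheses sorted best-first, so the
    first entry is the approximate 1-best and the first M approximate the M-best.
    """
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        candidates = [(prefix + (tok,), score + logp)
                      for prefix, score in beams
                      for tok, logp in step(prefix).items()]
        candidates.sort(key=lambda c: c[1], reverse=True)  # stable on ties
        beams = []
        for seq, score in candidates[:beam_width]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # keep hypotheses that hit max_len unfinished
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


# Hypothetical toy bigram model: the next-token distribution depends on the last token.
BIGRAMS = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.5, "cat": 0.5},
    "the": {"dog": 0.9, "cat": 0.1},
    "dog": {"</s>": 1.0},
    "cat": {"</s>": 1.0},
}


def toy_step(prefix: Tuple[str, ...]) -> Dict[str, float]:
    last = prefix[-1] if prefix else "<s>"
    return {tok: math.log(p) for tok, p in BIGRAMS[last].items()}


if __name__ == "__main__":
    for width in (1, 2):
        seq, score = beam_search(toy_step, width, max_len=5)[0]
        print(width, " ".join(seq), round(math.exp(score), 2))
```

With beam width 1 (greedy decoding), the toy model commits to "a" and finishes with probability 0.3. With width 2, it keeps "the" alive and finds "the dog" with probability 0.36. This gap between greedy and wider beams is the kind of approximation beam search trades off.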

chapter

Multi-level Attention Networks for Visual Question Answering

Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4187-4195

Inspired by the recent success of text-based question answering, visual question answering (VQA) has been proposed to automatically answer natural language questions with reference to a given image. Compared with text-based QA, VQA is more challenging because reasoning in the visual domain requires both effective semantic embedding and fine-grained visual understanding. Existing approaches predominantly...

chapter

An Empirical Evaluation of Visual Question Answering for Novel Objects

Santhosh K. Ramakrishnan, Ambar Pal, Gaurav Sharma, Anurag Mittal

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7312-7321

We study the problem of answering questions about images in a harder setting, where the test questions and corresponding images contain novel objects that were not queried about in the training data. Such a setting is inevitable in the real world: owing to the heavy-tailed distribution of visual categories, some objects will not be annotated in the training set. We show...

chapter

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5263-5271

Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, which limits the ability of recent captioning models to describe objects outside their training corpora (i.e., novel objects). In this paper, we present Long Short-Term Memory with Copying Mechanism (LSTM-C), a new architecture...

chapter

DESIRE: Distant Future Prediction in Dynamic Scenes with Interacting Agents

Namhoon Lee, Wongun Choi, Paul Vernaza, Christopher B. Choy, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2165-2174

We introduce a Deep Stochastic IOC RNN Encoder-decoder framework, DESIRE, for the task of predicting the future of multiple interacting agents in dynamic scenes. DESIRE effectively predicts future locations of objects in multiple scenes by 1) accounting for the multi-modal nature of future prediction (i.e., given the same context, the future may vary), 2) foreseeing the potential future outcomes and...

chapter

Agent-Centric Risk Assessment: Accident Anticipation and Risky Region Localization

Kuo-Hao Zeng, Shih-Han Chou, Fu-Hsiang Chan, Juan Carlos Niebles, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1330-1338

For survival, a living agent (e.g., a human in Fig. 1(a)) must have the ability to assess risk (1) by temporally anticipating accidents before they occur (Fig. 1(b)), and (2) by spatially localizing risky regions (Fig. 1(c)) in the environment to move away from threats. In this paper, we take an agent-centric approach to study the accident anticipation and risky region localization tasks. We propose...

chapter

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1090-1099

Associating image regions with text queries has recently been explored as a new way to bridge visual and linguistic representations. A few pioneering approaches have been proposed based on recurrent neural language models trained generatively (e.g., generating captions), but they achieve somewhat limited localization accuracy. To better address natural-language-based visual entity localization, we propose...

chapter

Video Captioning with Transferred Semantic Attributes

Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 984-992

Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community. Most recent progress on this problem has been achieved by employing 2-D and/or 3-D Convolutional Neural Networks (CNNs) to encode video content and Recurrent Neural Networks (RNNs) to decode a sentence. In this paper, we present Long Short-Term Memory with Transferred...

chapter

On Human Motion Prediction Using Recurrent Neural Networks

Julieta Martinez, Michael J. Black, Javier Romero

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4674-4683

Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion,...

chapter

Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks

Hongsong Wang, Liang Wang

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3633-3642

Recently, skeleton-based action recognition has gained popularity owing to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited in their ability to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNNs) to handle raw skeletons focus only on the contextual dependency in the temporal...

chapter

See the Forest for the Trees: Joint Spatial and Temporal Recurrent Neural Networks for Video-Based Person Re-identification

Zhen Zhou, Yan Huang, Wei Wang, Liang Wang, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6776-6785

Surveillance cameras are widely used in many different scenes. Accordingly, there is a pressing need to recognize a person across different cameras, a task known as person re-identification. This topic has attracted increasing interest in computer vision recently. However, less attention has been paid to video-based approaches than to image-based ones. Two steps are usually involved in previous approaches,...