2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering

Yunseok Jang, Yale Song, Youngjae Yu, Youngjin Kim, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1359 - 1367

Vision and language understanding has emerged as a subject undergoing intense study in Artificial Intelligence. Among many tasks in this line of research, visual question answering (VQA) has been one of the most successful ones, where the goal is to learn a model that understands visual content at region-level details and finds their associations with pairs of questions and answers in the natural...

chapter

Turning an Urban Scene Video into a Cinemagraph

Hang Yan, Yebin Liu, Yasutaka Furukawa

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1629 - 1637

This paper proposes an algorithm that turns a regular video capturing urban scenes into a high-quality endless animation, known as a Cinemagraph. The creation of a Cinemagraph usually requires a static camera in a carefully configured scene. The task becomes challenging for a regular video with a moving camera and objects. Our approach first warps an input video into the viewpoint of a reference camera...

chapter

Fast Video Classification via Adaptive Cascading of Deep Models

Haichen Shen, Seungyeop Han, Matthai Philipose, Arvind Krishnamurthy

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2197 - 2205

Recent advances have enabled oracle classifiers that can classify across many classes and input distributions with high accuracy without retraining. However, these classifiers are relatively heavyweight, so that applying them to classify video is costly. We show that day-to-day video exhibits highly skewed class distributions over the short term, and that these distributions can be classified by much...

chapter

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3185 - 3194

The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, since they can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme which can discover and leverage the hierarchical structure of the video. Unlike the classical encoder-decoder approach, in which a video is encoded...

chapter

End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering

Youngjae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3261 - 3269

We propose a high-level concept word detector that can be integrated with any video-to-language models. It takes a video as input and generates a list of concept words as useful semantic priors for language generation models. The proposed word detector has two important properties. First, it does not require any external knowledge sources for training. Second, the proposed word detector is trainable...

chapter

Generating Descriptions with Grounded and Co-referenced People

Anna Rohrbach, Marcus Rohrbach, Siyu Tang, Seong Joon Oh, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4196 - 4206

Learning how to generate descriptions of images or videos has received major interest in both the Computer Vision and Natural Language Processing communities. While a few works have proposed to learn a grounding during the generation process in an unsupervised way (via an attention mechanism), it remains unclear how good the quality of the grounding is and whether it benefits the description quality....

chapter

Factorized Variational Autoencoders for Modeling Audience Reactions to Movies

Zhiwei Deng, Rajitha Navarathna, Peter Carr, Stephan Mandt, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6014 - 6023

Matrix and tensor factorization methods are often used for finding underlying low-dimensional patterns from noisy data. In this paper, we study non-linear tensor factorization methods based on deep variational autoencoders. Our approach is well-suited for settings where the relationship between the latent representation to be learned and the raw data representation is highly complex. We apply our...

chapter

Supervising Neural Attention Models for Video Captioning by Human Gaze Data

Youngjae Yu, Jongwook Choi, Yeonhwa Kim, Kyung Yoo, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6119 - 6127

The attention mechanisms in deep neural networks are inspired by human attention, which sequentially focuses on the most relevant parts of the information over time to generate prediction output. The attention parameters in those models are implicitly trained in an end-to-end manner, yet there have been few trials to explicitly incorporate human gaze tracking to supervise the attention models. In this...

chapter

UntrimmedNets for Weakly Supervised Action Recognition and Detection

Limin Wang, Yuanjun Xiong, Dahua Lin, Luc Van Gool

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6402 - 6411

Current action recognition methods heavily rely on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn action recognition models from untrimmed videos without the requirement of temporal annotations of action instances...

chapter

A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering

Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron Courville, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7359 - 7368

While deep convolutional neural networks frequently approach or exceed human-level performance in benchmark tasks involving static images, extending this success to moving images is not straightforward. Video understanding is of interest for many applications, including content recommendation, prediction, summarization, event/object detection, and understanding human visual perception. However, many...
