2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

Temporal Attention-Gated Model for Robust Sequence Classification

Wenjie Pei, Tadas Baltrusaitis, David M. J. Tax, Louis-Philippe Morency

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 820 - 829

Typical techniques for sequence classification are designed for well-segmented sequences which have been edited to remove noisy or irrelevant parts. Therefore, such methods cannot be easily applied on noisy sequences expected in real-world applications. In this paper, we present the Temporal Attention-Gated Model (TAGM) which integrates ideas from attention models and gated recurrent networks to better...

chapter

StyleNet: Generating Attractive Visual Captions with Styles

Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 955 - 964

We propose a novel framework named StyleNet to address the task of generating attractive captions for images and videos with different styles. To this end, we devise a novel model component, named factored LSTM, which automatically distills the style factors in the monolingual text corpus. Then at runtime, we can explicitly control the style in the caption generation process so as to produce attractive...

chapter

Self-Critical Sequence Training for Image Captioning

Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jerret Ross, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1179 - 1195

Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant...

chapter

Bidirectional Multirate Reconstruction for Temporal Modeling in Videos

Linchao Zhu, Zhongwen Xu, Yi Yang

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1339 - 1348

Despite the recent success of neural networks in image feature learning, a major problem in the video domain is the lack of sufficient labeled data for learning to model temporal information. In this paper, we propose an unsupervised temporal modeling method that learns from untrimmed videos. The speed of motion varies constantly, e.g., a man may run quickly or slowly. We therefore train a Multirate...

chapter

Locality-Sensitive Deconvolution Networks with Gated Fusion for RGB-D Indoor Semantic Segmentation

Yanhua Cheng, Rui Cai, Zhiwei Li, Xin Zhao, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1475 - 1483

This paper focuses on indoor semantic segmentation using RGB-D data. Although the commonly used deconvolution networks (DeconvNet) have achieved impressive results on this task, we find there is still room for improvements in two aspects. One is about the boundary segmentation. DeconvNet aggregates large context to predict the label of each pixel, inherently limiting the segmentation precision of...

chapter

Feedback Networks

Amir R. Zamir, Te-Lin Wu, Lin Sun, William B. Shen, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1808 - 1817

Currently, the most successful learning models in computer vision are based on learning successive representations followed by a decision layer. This is usually actualized through feedforward multilayer neural networks, e.g. ConvNets, where each layer forms one of such successive representations. However, an alternative that can achieve the same goal is a feedback based approach in which the representation...

chapter

Interpretable Structure-Evolving LSTM

Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2175 - 2184

This paper develops a general framework for learning interpretable data representation via Long Short-Term Memory (LSTM) recurrent neural networks over hierarchical graph structures. Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network...

chapter

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

Lorenzo Baraldi, Costantino Grana, Rita Cucchiara

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3185 - 3194

The use of Recurrent Neural Networks for video captioning has recently gained a lot of attention, since they can be used both to encode the input video and to generate the corresponding description. In this paper, we present a recurrent video encoding scheme which can discover and leverage the hierarchical structure of the video. Unlike the classical encoder-decoder approach, in which a video is encoded...

chapter

Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning

Jiasen Lu, Caiming Xiong, Devi Parikh, Richard Socher

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3242 - 3250

Attention-based neural encoder-decoder frameworks have been widely adopted for image captioning. Most methods force visual attention to be active for every generated word. However, the decoder likely requires little to no visual information from the image to predict non-visual words such as "the" and "of". Other words that may seem visual can often be predicted reliably just from the language model e...

chapter

Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks

Hongsong Wang, Liang Wang

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3633 - 3642

Recently, skeleton based action recognition gains more popularity due to cost-effective depth sensors coupled with real-time skeleton estimation algorithms. Traditional approaches based on handcrafted features are limited to represent the complexity of motion patterns. Recent methods that use Recurrent Neural Networks (RNN) to handle raw skeletons only focus on the contextual dependency in the temporal...

chapter

Global Context-Aware Attention LSTM Networks for 3D Action Recognition

Jun Liu, Gang Wang, Ping Hu, Ling-Yu Duan, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3671 - 3680

Long Short-Term Memory (LSTM) networks have shown superior performance in 3D human action recognition due to their power in modeling the dynamics and dependencies in sequential data. Since not all joints are informative for action analysis and the irrelevant joints often bring a lot of noise, we need to pay more attention to the informative ones. However, original LSTM does not have strong attention...

chapter

Gated Feedback Refinement Network for Dense Image Labeling

Md Amirul Islam, Mrigank Rochan, Neil D. B. Bruce, Yang Wang

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4877 - 4885

Effective integration of local and global contextual information is crucial for dense labeling problems. Most existing methods based on an encoder-decoder architecture simply concatenate features from earlier layers to obtain higher-frequency details in the refinement stages. However, there are limits to the quality of refinement possible if ambiguous information is passed forward. In this paper we...

chapter

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5085 - 5093

Training convolutional networks (CNNs) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large networks that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect...

chapter

Person Search with Natural Language Description

Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5187 - 5196

Searching persons in large-scale image databases with the query of natural language description has important applications in video surveillance. Existing methods mainly focused on searching persons with image-based or attribute-based queries, which have major limitations for a practical usage. In this paper, we study the problem of person search with natural language description. Given the textual...

chapter

Residual Attention Network for Image Classification

Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6450 - 6458

In this work, we propose Residual Attention Network, a convolutional neural network using attention mechanism which can incorporate with state-of-art feed forward network architecture in an end-to-end training fashion. Our Residual Attention Network is built by stacking Attention Modules which generate attention-aware features. The attention-aware features from different modules change adaptively...

chapter

Collaborative Deep Reinforcement Learning for Joint Object Search

Xiangyu Kong, Bo Xin, Yizhou Wang, Gang Hua

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7072 - 7081

We examine the problem of joint top-down active search of multiple objects under interaction, e.g., person riding a bicycle, cups held by the table, etc. Such objects under interaction often can provide contextual cues to each other to facilitate more efficient search. By treating each detector as an agent, we present the first collaborative multi-agent deep reinforcement learning algorithm to learn...

chapter

Expert Gate: Lifelong Learning with a Network of Experts

Rahaf Aljundi, Punarjay Chakravarty, Tinne Tuytelaars

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7120 - 7129

In this paper we introduce a model of lifelong learning, based on a Network of Experts. New tasks / experts are learned and added to the model sequentially, building on what was learned before. To ensure scalability of this process, data from previous tasks cannot be stored and hence is not available when learning a new task. A critical issue in such context, not addressed in the literature so far,...

chapter

Recurrent Modeling of Interaction Context for Collective Activity Recognition

Minsi Wang, Bingbing Ni, Xiaokang Yang

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7408 - 7416

Modeling of high-order interactional context, e.g., group interaction, lies at the center of collective/group activity recognition. However, most of the previous activity recognition methods do not offer a flexible and scalable scheme to handle the high-order context modeling problem. To explicitly address this fundamental bottleneck, we propose a recurrent interactional context modeling scheme based...
