2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

Full Resolution Image Compression with Recurrent Neural Networks

George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5435 - 5443

This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural...

chapter

Zero Shot Learning via Multi-scale Manifold Regularization

Shay Deutsch, Soheil Kolouri, Kyungnam Kim, Yuri Owechko, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5292 - 5299

We address zero-shot learning using a new manifold alignment framework based on a localized multi-scale transform on graphs. Our inference approach includes a smoothness criterion for a function mapping nodes on a graph (visual representation) onto a linear space (semantic representation), which we optimize using multi-scale graph wavelets. The robustness of the ensuing scheme allows us to operate...

chapter

Light Field Blind Motion Deblurring

Pratul P. Srinivasan, Ren Ng, Ravi Ramamoorthi

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2354 - 2362

We study the problem of deblurring light fields of general 3D scenes captured under 3D camera motion and present both theoretical and practical contributions. By analyzing the motion-blurred light field in the primal and Fourier domains, we develop intuition into the effects of camera motion on the light field, show the advantages of capturing a 4D light field instead of a conventional 2D image for...

chapter

Interpretable Structure-Evolving LSTM

Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2175 - 2184

This paper develops a general framework for learning interpretable data representation via Long Short-Term Memory (LSTM) recurrent neural networks over hierarchical graph structures. Instead of learning LSTM models over the pre-fixed structures, we propose to further learn the intermediate interpretable multi-level graph structures in a progressive and stochastic way from data during the LSTM network...

chapter

Unified Embedding and Metric Learning for Zero-Exemplar Event Detection

Noureldien Hussein, Efstratios Gavves, Arnold W. M. Smeulders

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2087 - 2096

Event detection in unconstrained videos is conceived as a content-based video retrieval with two modalities: textual and visual. Given a text describing a novel event, the goal is to rank related videos accordingly. This task is zero-exemplar: no video examples are given for the novel event. Related works train a bank of concept detectors on external data sources. These detectors predict confidence...

chapter

FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos

Suyog Dutt Jain, Bo Xiong, Kristen Grauman

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2117 - 2126

We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel level segmentation masks for all prominent objects in videos. We formulate this task as a structured prediction problem and design a two-stream fully convolutional neural network which fuses together motion and appearance in a unified...

chapter

ER3: A Unified Framework for Event Retrieval, Recognition and Recounting

Zhanning Gao, Gang Hua, Dongqing Zhang, Nebojsa Jojic, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2107 - 2116

We develop a unified framework for complex event retrieval, recognition and recounting. The framework is based on a compact video representation that exploits the temporal correlations in image features. Our feature alignment procedure identifies and removes the feature redundancies across frames and outputs an intermediate tensor representation we call video imprint. The video imprint is then fed...

chapter

Dual Attention Networks for Multimodal Reasoning and Matching

Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2156 - 2164

We propose Dual Attention Networks (DANs) which jointly leverage visual and textual attention mechanisms to capture fine-grained interplay between vision and language. DANs attend to specific regions in images and words in text through multiple steps and gather essential information from both modalities. Based on this framework, we introduce two types of DANs for multimodal reasoning and matching,...

chapter

Spatiotemporal Pyramid Network for Video Action Recognition

Yunbo Wang, Mingsheng Long, Jianmin Wang, Philip S. Yu

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2097 - 2106

Two-stream convolutional networks have shown strong performance in video action recognition tasks. The key idea is to learn spatiotemporal features by fusing convolutional networks spatially and temporally. However, it remains unclear how to model the correlations between the spatial and temporal structures at multiple abstraction levels. First, the spatial stream tends to fail if two videos share...

chapter

Temporal Action Co-Segmentation in 3D Motion Capture Data and Videos

Konstantinos Papoutsakis, Costas Panagiotakis, Antonis A. Argyros

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2146 - 2155

Given two action sequences, we are interested in spotting/co-segmenting all pairs of sub-sequences that represent the same action. We propose a totally unsupervised solution to this problem. No a-priori model of the actions is assumed to be available. The number of common sub-sequences may be unknown. The sub-sequences can be located anywhere in the original sequences, may differ in duration and the...

chapter

Flexible Spatio-Temporal Networks for Video Prediction

Chaochao Lu, Michael Hirsch, Bernhard Scholkopf

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2137 - 2145

We describe a modular framework for video frame prediction. We refer to it as a Flexible Spatio-Temporal Network (FSTN) as it allows the extrapolation of a video sequence as well as the estimation of synthetic frames lying in between observed frames and thus the generation of slow-motion videos. By devising a customized objective function comprising decoding, encoding, and adversarial losses, we are...

chapter

Relationship Proposal Networks

Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5226 - 5234

Image scene understanding requires learning the relationships between objects in the scene. A scene with many objects may have only a few individual interacting objects (e.g., in a party image with many people, only a handful of people might be speaking with each other). To detect all relationships, it would be inefficient to first detect all individual objects and then classify all pairs, not only...

chapter

Few-Shot Object Recognition from Machine-Labeled Web Images

Zhongwen Xu, Linchao Zhu, Yi Yang

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5358 - 5366

With the tremendous advances made by Convolutional Neural Networks (ConvNets) on object recognition, we can now easily obtain adequately reliable machine-labeled annotations from predictions by off-the-shelf ConvNets. In this work, we present an abstraction memory based framework for few-shot learning, building upon machine-labeled image annotations. Our method takes large-scale machine-annotated...

chapter

Joint Graph Decomposition & Node Labeling: Problem, Algorithms, Applications

Evgeny Levinkov, Jonas Uhrig, Siyu Tang, Mohamed Omran, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1904 - 1912

We state a combinatorial optimization problem whose feasible solutions define both a decomposition and a node labeling of a given graph. This problem offers a common mathematical abstraction of seemingly unrelated computer vision tasks, including instance-separating semantic segmentation, articulated human body pose estimation and multiple object tracking. Conceptually, it generalizes the unconstrained...

chapter

Dense Captioning with Joint Inference and Visual Context

Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1978 - 1987

Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase. We identify two key challenges of dense captioning that need to be properly addressed when tackling the problem. First,...

chapter

Emotion Recognition in Context

Ronak Kosti, Jose M. Alvarez, Adria Recasens, Agata Lapedriza

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1960 - 1968

Understanding what a person is experiencing from her frame of reference is essential in our everyday life. For this reason, one can think that machines with this type of ability would interact better with people. However, there are no current systems capable of understanding people's emotional states in detail. Previous research on computer vision to recognize emotions has mainly focused on analyzing...

chapter

Perceptual Generative Adversarial Networks for Small Object Detection

Jianan Li, Xiaodan Liang, Yunchao Wei, Tingfa Xu, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1951 - 1959

Detecting small objects is notoriously challenging due to their low resolution and noisy representation. Existing object detection pipelines usually detect small objects through learning representations of all the objects at multiple scales. However, the performance gain of such ad hoc architectures is usually too limited to justify the computational cost. In this work, we address the small object detection...

chapter

Generative Hierarchical Learning of Sparse FRAME Models

Jianwen Xie, Yifei Xu, Erik Nijkamp, Ying Nian Wu, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1933 - 1941

This paper proposes a method for generative learning of hierarchical random field models. The resulting model, which we call the hierarchical sparse FRAME (Filters, Random field, And Maximum Entropy) model, is a generalization of the original sparse FRAME model by decomposing it into multiple parts that are allowed to shift their locations, scales and rotations, so that the resulting model becomes...

chapter

Deep Unsupervised Similarity Learning Using Partially Ordered Sets

Miguel A. Bautista, Artsiom Sanakoyeu, Bjorn Ommer

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1923 - 1932

Unsupervised learning of visual similarities is of paramount importance to computer vision, particularly due to lacking training data for fine-grained similarities. Deep learning of similarities is often based on relationships between pairs or triplets of samples. Many of these relations are unreliable and mutually contradicting, implying inconsistencies when trained without supervision information...

chapter

Generating Holistic 3D Scene Abstractions for Text-Based Image Retrieval

Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1942 - 1950

Spatial relationships between objects provide important information for text-based image retrieval. As users are more likely to describe a scene from a real world perspective, using 3D spatial relationships rather than 2D relationships that assume a particular viewing direction, one of the main challenges is to infer the 3D structure that bridges images with users' text descriptions. However, direct...
