2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

Unsupervised Video Summarization with Adversarial LSTM Networks

Behrooz Mahasseni, Michael Lam, Sinisa Todorovic

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2982 - 2991

This paper addresses the problem of unsupervised video summarization, formulated as selecting a sparse subset of video frames that optimally represent the input video. Our key idea is to learn a deep summarizer network to minimize distance between training videos and a distribution of their summarizations, in an unsupervised way. Such a summarizer can then be applied on a new video for estimating...

chapter

Variational Autoencoded Regression: High Dimensional Regression of Visual Data on Complex Manifold

Youngjoon Yoo, Sangdoo Yun, Hyung Jin Chang, Yiannis Demiris, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2943 - 2952

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

This paper proposes a new high dimensional regression method by merging Gaussian process regression into a variational autoencoder framework. In contrast to other regression methods, the proposed method focuses on the case where output responses are on a complex high dimensional manifold, such as images. Our contributions are summarized as follows: (i) A new regression method estimating high dimensional...

chapter

Deep Image Harmonization

Yi-Hsuan Tsai, Xiaohui Shen, Zhe Lin, Kalyan Sunkavalli, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2799 - 2807

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Compositing is one of the most common operations in photo editing. To generate realistic composites, the appearances of foreground and background need to be adjusted to make them compatible. Previous approaches to harmonize composites have focused on learning statistical relationships between hand-crafted appearance features of the foreground and background, which is unreliable especially when the...

chapter

StyleBank: An Explicit Representation for Neural Image Style Transfer

Dongdong Chen, Lu Yuan, Jing Liao, Nenghai Yu, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2770 - 2779

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We propose StyleBank, which is composed of multiple convolution filter banks and each filter bank explicitly represents one style, for neural image style transfer. To transfer an image to a specific style, the corresponding filter bank is operated on top of the intermediate feature embedding produced by a single auto-encoder. The StyleBank and the auto-encoder are jointly learnt, where the learning...

chapter

Full Resolution Image Compression with Recurrent Neural Networks

George Toderici, Damien Vincent, Nick Johnston, Sung Jin Hwang, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5435 - 5443

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

This paper presents a set of full-resolution lossy image compression methods based on neural networks. Each of the architectures we describe can provide variable compression rates during deployment without requiring retraining of the network: each network need only be trained once. All of our architectures consist of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural...

chapter

Flexible Spatio-Temporal Networks for Video Prediction

Chaochao Lu, Michael Hirsch, Bernhard Scholkopf

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2137 - 2145

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We describe a modular framework for video frame prediction. We refer to it as a Flexible Spatio-Temporal Network (FSTN) as it allows the extrapolation of a video sequence as well as the estimation of synthetic frames lying in between observed frames and thus the generation of slow-motion videos. By devising a customized objective function comprising decoding, encoding, and adversarial losses, we are...

chapter

Semantic Autoencoder for Zero-Shot Learning

Elyor Kodirov, Tao Xiang, Shaogang Gong

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4447 - 4456

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Existing zero-shot learning (ZSL) models typically learn a projection function from a feature space to a semantic embedding space (e.g. attribute space). However, such a projection function is only concerned with predicting the training seen class semantic representation (e.g. attribute prediction) or classification. When applied to test data, which in the context of ZSL contains different (unseen)...

chapter

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks

Xiao Yang, Ersin Yumer, Paul Asente, Mike Kraley, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4342 - 4351

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover,...

chapter

Joint Sequence Learning and Cross-Modality Convolution for 3D Biomedical Segmentation

Kuan-Lun Tseng, Yen-Liang Lin, Winston Hsu, Chung-Yang Huang

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3739 - 3746

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Deep learning models such as convolutional neural network have been widely used in 3D biomedical segmentation and achieve state-of-the-art performance. However, most of them often adapt a single modality or stack multiple modalities as different input channels, which ignores the correlations among them. To leverage the multi-modalities, we propose a deep convolution encoder-decoder structure with...

chapter

Synthesizing Normalized Faces from Facial Identity Features

Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3386 - 3395

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We present a method for synthesizing a frontal, neutral-expression image of a persons face, given an input face photograph. This is achieved by learning to generate facial landmarks and textures from features extracted from a facial-recognition network. Unlike previous generative approaches, our encoding feature vector is largely invariant to lighting, pose, and facial expression. Exploiting this...

chapter

Neural Scene De-rendering

Jiajun Wu, Joshua B. Tenenbaum, Pushmeet Kohli

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7035 - 7043

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We study the problem of holistic scene understanding. We would like to obtain a compact, expressive, and interpretable representation of scenes that encodes information such as the number of objects and their categories, poses, positions, etc. Such a representation would allow us to reason about and even reconstruct or manipulate elements of the scene. Previous works have used encoder-decoder based...

chapter

Online Video Object Segmentation via Convolutional Trident Network

Won-Dong Jang, Chang-Su Kim

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7474 - 7483

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

A semi-supervised online video object segmentation algorithm, which accepts user annotations about a target object at the first frame, is proposed in this work. We propagate the segmentation labels at the previous frame to the current frame using optical flow vectors. However, the propagation is error-prone. Therefore, we develop the convolutional trident network (CTN), which has three decoding branches:...

chapter

Weakly Supervised Semantic Segmentation Using Web-Crawled Videos

Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2224 - 2232

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We propose a novel algorithm for weakly supervised semantic segmentation based on image-level class labels only. In weakly supervised setting, it is commonly observed that trained model overly focuses on discriminative parts rather than the entire object area. Our goal is to overcome this limitation with no additional human intervention by retrieving videos relevant to target class labels from web...

chapter

Hallucinating Very Low-Resolution Unaligned and Noisy Face Images by Transformative Discriminative Autoencoders

Xin Yu, Fatih Porikli

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5367 - 5375

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Most of the conventional face hallucination methods assume the input image is sufficiently large and aligned, and all require the input image to be noise-free. Their performance degrades drastically if the input image is tiny, unaligned, and contaminated by noise. In this paper, we introduce a novel transformative discriminative autoencoder to 8X super-resolve unaligned noisy and tiny (16X16) low-resolution...

chapter

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Sam Gross, Marc'Aurelio Ranzato, Arthur Szlam

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5085 - 5093

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Training convolutional networks (CNNs) that fit on a single GPU with minibatch stochastic gradient descent has become effective in practice. However, there is still no effective method for training large networks that do not fit in the memory of a few GPU cards, or for parallelizing CNN training. In this work we show that a simple hard mixture of experts model can be efficiently trained to good effect...

chapter

Deep Multimodal Representation Learning from Temporal Data

Xitong Yang, Palghat Ramesh, Radha Chitta, Sriganesh Madhvanath, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5066 - 5074

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

In recent years, Deep Learning has been successfully applied to multimodal learning problems, with the aim of learning useful joint representations in data fusion applications. When the available modalities consist of time series data such as video, audio and sensor signals, it becomes imperative to consider their temporal structure during the fusion process. In this paper, we propose the Correlational...

chapter

Deep Image Matting

Ning Xu, Brian Price, Scott Cohen, Thomas Huang

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 311 - 320

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Image matting is a fundamental computer vision problem and has many applications. Previous algorithms have poor performance when an image has similar foreground and background colors or complicated textures. The main reasons are prior methods 1) only use low-level features and 2) lack high-level context. In this paper, we propose a novel deep learning based algorithm that can tackle both these problems...

chapter

Lip Reading Sentences in the Wild

Joon Son Chung, Andrew Senior, Oriol Vinyals, Andrew Zisserman

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3444 - 3453

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem – unconstrained natural language sentences, and in the wild videos. Our key contributions are: (1) a Watch, Listen, Attend and Spell...

chapter

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

Qing Sun, Stefan Lee, Dhruv Batra

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7215 - 7223

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We develop the first approximate inference algorithm for 1-Best (and M-Best) decoding in bidirectional neural sequence models by extending Beam Search (BS) to reason about both forward and backward time dependencies. Beam Search (BS) is a widely used approximate inference algorithm for decoding sequences from unidirectional neural sequence models. Interestingly, approximate inference in bidirectional...

chapter

Learning Diverse Image Colorization

Aditya Deshpande, Jiajun Lu, Mao-Chuang Yeh, Min Jin Chong, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2877 - 2885

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Colorization is an ambiguous problem, with multiple viable colorizations for a single grey-level image. However, previous methods only produce the single most probable colorization. Our goal is to model the diversity intrinsic to the problem of colorization and produce multiple colorizations that display long-scale spatial co-ordination. We learn a low dimensional embedding of color fields using a...

INFONA - science communication portal

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Unsupervised Video Summarization with Adversarial LSTM Networks

Variational Autoencoded Regression: High Dimensional Regression of Visual Data on Complex Manifold

Deep Image Harmonization

StyleBank: An Explicit Representation for Neural Image Style Transfer

Full Resolution Image Compression with Recurrent Neural Networks

Flexible Spatio-Temporal Networks for Video Prediction

Semantic Autoencoder for Zero-Shot Learning

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks

Joint Sequence Learning and Cross-Modality Convolution for 3D Biomedical Segmentation

Synthesizing Normalized Faces from Facial Identity Features

Neural Scene De-rendering

Online Video Object Segmentation via Convolutional Trident Network

Weakly Supervised Semantic Segmentation Using Web-Crawled Videos

Hallucinating Very Low-Resolution Unaligned and Noisy Face Images by Transformative Discriminative Autoencoders

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Deep Multimodal Representation Learning from Temporal Data

Deep Image Matting

Lip Reading Sentences in the Wild

Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning

Learning Diverse Image Colorization

Filter options

Publication date

Keywords

INFONA - science communication portal

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) $("#expandableTitles").expandable();

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)