2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

Kernel Pooling for Convolutional Neural Networks

Yin Cui, Feng Zhou, Jiang Wang, Xiao Liu, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3049 - 3058

Convolutional Neural Networks (CNNs) with Bilinear Pooling, initially in their full form and later using compact representations, have yielded impressive performance gains on a wide range of visual tasks, including fine-grained visual categorization, visual question answering, face recognition, and description of texture and style. The key to their success lies in the spatially invariant modeling...

chapter

Comprehension-Guided Referring Expressions

Ruotian Luo, Gregory Shakhnarovich

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3125 - 3134

We consider generation and comprehension of natural language referring expressions for objects in an image. Unlike generic image captioning, which lacks natural standard evaluation criteria, the quality of a referring expression may be measured by the receiver's ability to correctly infer which object is being described. Following this intuition, we propose two approaches to utilize models trained for comprehension...

chapter

Supervising Neural Attention Models for Video Captioning by Human Gaze Data

Youngjae Yu, Jongwook Choi, Yeonhwa Kim, Kyung Yoo, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6119 - 6127

The attention mechanisms in deep neural networks are inspired by human attention, which sequentially focuses on the most relevant parts of the information over time to generate prediction output. The attention parameters in those models are implicitly trained in an end-to-end manner, yet there have been few trials to explicitly incorporate human gaze tracking to supervise the attention models. In this...

chapter

Are Large-Scale 3D Models Really Necessary for Accurate Visual Localization?

Torsten Sattler, Akihiko Torii, Josef Sivic, Marc Pollefeys, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6175 - 6184

Accurate visual localization is a key technology for autonomous navigation. 3D structure-based methods employ 3D models of the scene to estimate the full 6DOF pose of a camera very accurately. However, constructing (and extending) large-scale 3D models is still a significant challenge. In contrast, 2D image retrieval-based methods only require a database of geo-tagged images, which is trivial to construct...

chapter

Visual Translation Embedding Network for Visual Relation Detection

Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, Tat-Seng Chua

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3107 - 3115

Visual relations, such as "person ride bike" and "bike next to car", offer a comprehensive scene understanding of an image, and have already shown their great utility in connecting computer vision and natural language. However, due to the challenging combinatorial complexity of modeling subject-predicate-object relation triplets, very little work has been done to localize and predict visual relations...

chapter

Low-Rank-Sparse Subspace Representation for Robust Regression

Yongqiang Zhang, Daming Shi, Junbin Gao, Dansong Cheng

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2972 - 2981

Learning a robust regression model from high-dimensional corrupted data is an essential and difficult problem in many practical applications. The state-of-the-art methods have studied low-rank regression models that are robust against typical noise (such as Gaussian noise and out-sample sparse noise) or outliers, such that a regression model can be learned from clean data lying on underlying subspaces...

chapter

BIND: Binary Integrated Net Descriptors for Texture-Less Object Recognition

Jacob Chan, Jimmy Addison Lee, Qian Kemao

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3020 - 3028

This paper presents BIND (Binary Integrated Net Descriptor), a texture-less object detector that encodes multi-layered binary-represented nets for high-precision edge-based description. Our proposed concept aligns layers of object-sized patches (nets) onto highly fragmented, occlusion-resistant line-segment midpoints (linelets) to encode regional information into efficient binary strings. These lightweight...

chapter

Generating the Future with Adversarial Transformers

Carl Vondrick, Antonio Torralba

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2992 - 3000

We learn models to generate the immediate future in video. This problem has two main challenges. Firstly, since the future is uncertain, models should be multi-modal, which can be difficult to learn. Secondly, since the future is similar to the past, models store low-level details, which complicates learning of high-level semantics. We propose a framework to tackle both of these challenges. We present...

chapter

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

Xiaolong Wang, Abhinav Shrivastava, Abhinav Gupta

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3039 - 3048

How do we learn an object detector that is invariant to occlusions and deformations? Our current solution is to use a data-driven strategy – collect large-scale datasets which have object instances under different conditions. The hope is that the final classifier can use these examples to learn invariances. But is it really possible to see all the occlusions in a dataset? We argue that...

chapter

Learning Random-Walk Label Propagation for Weakly-Supervised Semantic Segmentation

Paul Vernaza, Manmohan Chandraker

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2953 - 2961

Large-scale training for semantic segmentation is challenging due to the expense of obtaining training data for this task relative to other vision tasks. We propose a novel training approach to address this difficulty. Given cheaply-obtained sparse image labelings, we propagate the sparse labels to produce guessed dense labelings. A standard CNN-based segmentation network is trained to mimic these...

chapter

Unsupervised Video Summarization with Adversarial LSTM Networks

Behrooz Mahasseni, Michael Lam, Sinisa Todorovic

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2982 - 2991

This paper addresses the problem of unsupervised video summarization, formulated as selecting a sparse subset of video frames that optimally represent the input video. Our key idea is to learn a deep summarizer network to minimize distance between training videos and a distribution of their summarizations, in an unsupervised way. Such a summarizer can then be applied on a new video for estimating...

chapter

Growing a Brain: Fine-Tuning by Increasing Model Capacity

Yu-Xiong Wang, Deva Ramanan, Martial Hebert

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3029 - 3038

CNNs have made an undeniable impact on computer vision through the ability to learn high-capacity models with large annotated training sets. One of their remarkable properties is the ability to transfer knowledge from a large source dataset to a (typically smaller) target dataset. This is usually accomplished through fine-tuning a fixed-size network on new target data. Indeed, virtually every contemporary...

chapter

Semantic Amodal Segmentation

Yan Zhu, Yuandong Tian, Dimitris Metaxas, Piotr Dollar

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3001 - 3009

Common visual recognition tasks such as classification, object detection, and semantic segmentation are rapidly reaching maturity, and given the recent rate of progress, it is not unreasonable to conjecture that techniques for many of these problems will approach human levels of performance in the next few years. In this paper we look to the future: what is the next frontier in visual recognition?...

chapter

Accurate Depth and Normal Maps from Occlusion-Aware Focal Stack Symmetry

Michael Strecke, Anna Alperovich, Bastian Goldluecke

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2529 - 2537

We introduce a novel approach to jointly estimate consistent depth and normal maps from 4D light fields, with two main contributions. First, we build a cost volume from focal stack symmetry. However, in contrast to previous approaches, we introduce partial focal stacks in order to be able to robustly deal with occlusions. This idea already yields significantly better disparity maps. Second, even recent...

chapter

Deep TEN: Texture Encoding Network

Hang Zhang, Jia Xue, Kristin Dana

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2896 - 2905

We propose a Deep Texture Encoding Network (Deep-TEN) with a novel Encoding Layer integrated on top of convolutional layers, which ports the entire dictionary learning and encoding pipeline into a single model. Current methods build from distinct components, using standard encoders with separate off-the-shelf features such as SIFT descriptors or pre-trained CNN features for material recognition. Our...

chapter

Attend in Groups: A Weakly-Supervised Deep Learning Framework for Learning from Web Data

Bohan Zhuang, Lingqiao Liu, Yao Li, Chunhua Shen, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2915 - 2924

Large-scale datasets have driven the rapid development of deep neural networks for visual recognition. However, annotating a massive dataset is expensive and time-consuming. Web images and their labels are, in comparison, much easier to obtain, but direct training on such automatically harvested images can lead to unsatisfactory performance, because the noisy labels of Web images adversely affect the...

chapter

Efficient Linear Programming for Dense CRFs

Thalaiyasingam Ajanthan, Alban Desmaison, Rudy Bunel, Mathieu Salzmann, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2934 - 2942

The fully connected conditional random field (CRF) with Gaussian pairwise potentials has proven popular and effective for multi-class semantic segmentation. While the energy of a dense CRF can be minimized accurately using a linear programming (LP) relaxation, the state-of-the-art algorithm is too slow to be useful in practice. To alleviate this deficiency, we introduce an efficient LP minimization...

chapter

Hierarchical Multimodal Metric Learning for Multimodal Classification

Heng Zhang, Vishal M. Patel, Rama Chellappa

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2925 - 2933

Multimodal classification arises in many computer vision tasks such as object classification and image retrieval. The idea is to utilize multiple sources (modalities) measuring the same instance to improve the overall performance compared to using a single source (modality). The varying characteristics exhibited by multiple modalities make it necessary to simultaneously learn the corresponding metrics...

chapter

Awesome Typography: Statistics-Based Text Effects Transfer

Shuai Yang, Jiaying Liu, Zhouhui Lian, Zongming Guo

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2886 - 2895

In this work, we explore the problem of generating fantastic special effects for typography. This is quite challenging because of the model diversity needed to illustrate varied text effects for different characters. To address this issue, our key idea is to exploit the high regularity of the spatial distribution of text effects to guide the synthesis process. Specifically, we characterize...

chapter

Variational Autoencoded Regression: High Dimensional Regression of Visual Data on Complex Manifold

Youngjoon Yoo, Sangdoo Yun, Hyung Jin Chang, Yiannis Demiris, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2943 - 2952

This paper proposes a new high dimensional regression method by merging Gaussian process regression into a variational autoencoder framework. In contrast to other regression methods, the proposed method focuses on the case where output responses are on a complex high dimensional manifold, such as images. Our contributions are summarized as follows: (i) A new regression method estimating high dimensional...
