2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

Person Search with Natural Language Description

Shuang Li, Tong Xiao, Hongsheng Li, Bolei Zhou, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5187 - 5196

Searching persons in large-scale image databases with the query of natural language description has important applications in video surveillance. Existing methods mainly focused on searching persons with image-based or attribute-based queries, which have major limitations for a practical usage. In this paper, we study the problem of person search with natural language description. Given the textual...

chapter

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Jonathan Krause, Justin Johnson, Ranjay Krishna, Li Fei-Fei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3337 - 3345

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Recent progress on image captioning has made it possible to generate novel sentences describing images in natural language, but compressing an image into a single sentence can describe visual content in only coarse detail. While one new captioning approach, dense captioning, can potentially describe images in finer levels of detail by captioning many regions within an image, it in turn is unable to...

chapter

Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures

Fanyi Xiao, Leonid Sigal, Yong Jae Lee

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5253 - 5262

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks. Specifically, the model is trained with images and their associated image-level captions, without any explicit region-to-phrase correspondence annotations. To this end, we introduce an end-to-end model...

chapter

Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5376 - 5384

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question Answering (TQA) dataset that includes 1,076 lessons and 26,260 multi-modal questions, taken from middle school science curricula. Our analysis shows that a significant portion of questions require complex parsing...

chapter

A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering

Tegan Maharaj, Nicolas Ballas, Anna Rohrbach, Aaron Courville, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7359 - 7368

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

While deep convolutional neural networks frequently approach or exceed human-level performance in benchmark tasks involving static images, extending this success to moving images is not straightforward. Video understanding is of interest for many applications, including content recommendation, prediction, summarization, event/object detection, and understanding human visual perception. However, many...

chapter

Fine-Grained Image Classification via Combining Vision and Language

Xiangteng He, Yuxin Peng

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7332 - 7340

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Fine-grained image classification is a challenging task due to the large intra-class variance and small inter-class variance, aiming at recognizing hundreds of sub-categories belonging to the same basic-level category. Most existing fine-grained image classification methods generally learn part detection models to obtain the semantic parts for better classification accuracy. Despite achieving promising...

chapter

Modeling Relationships in Referential Expressions with Compositional Modular Networks

Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4418 - 4427

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

People often refer to entities in an image in terms of their relationships with other entities. For example, the black cat sitting under the table refers to both a black cat entity and its relationship with another table entity. Understanding these relationships is essential for interpreting and grounding such natural language expressions. Most prior work focuses on either grounding entire referential...

chapter

GuessWhat?! Visual Object Discovery through Multi-modal Dialogue

Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4466 - 4475

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection...

chapter

Multi-level Attention Networks for Visual Question Answering

Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4187 - 4195

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Inspired by the recent success of text-based question answering, visual question answering (VQA) is proposed to automatically answer natural language questions with the reference to a given image. Compared with text-based QA, VQA is more challenging because the reasoning process on visual domain needs both effective semantic embedding and fine-grained visual understanding. Existing approaches predominantly...

chapter

Tracking by Natural Language Specification

Zhenyang Li, Ran Tao, Efstratios Gavves, Cees G. M. Snoek, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7350 - 7358

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

This paper strives to track a target object in a video. Rather than specifying the target in the first frame of a video by a bounding box, we propose to track the object based on a natural language specification of the target, which provides a more natural human-machine interaction as well as a means to improve tracking results. We define three variants of tracking by language specification: one relying...

chapter

Unambiguous Text Localization and Retrieval for Cluttered Scenes

Xuejian Rong, Chucai Yi, Yingli Tian

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3279 - 3287

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Text instance as one category of self-described objects provides valuable information for understanding and describing cluttered scenes. In this paper, we explore the task of unambiguous text localization and retrieval, to accurately localize a specific targeted text instance in a cluttered image given a natural language description that refers to it. To address this issue, first a novel recurrent...

chapter

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1090 - 1099

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Associating image regions with text queries has been recently explored as a new way to bridge visual and linguistic representations. A few pioneering approaches have been proposed based on recurrent neural language models trained generatively (e.g., generating captions), but achieving somewhat limited localization accuracy. To better address natural-language-based visual entity localization, we propose...

chapter

Video Captioning with Transferred Semantic Attributes

Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 984 - 992

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Automatically generating natural language descriptions of videos plays a fundamental challenge for computer vision community. Most recent progress in this problem has been achieved through employing 2-D and/or 3-D Convolutional Neural Networks (CNNs) to encode video content and Recurrent Neural Networks (RNNs) to decode a sentence. In this paper, we present Long Short-Term Memory with Transferred...

INFONA - science communication portal

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Person Search with Natural Language Description

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures

Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

A Dataset and Exploration of Models for Understanding Video Data through Fill-in-the-Blank Question-Answering

Fine-Grained Image Classification via Combining Vision and Language

Modeling Relationships in Referential Expressions with Compositional Modular Networks

GuessWhat?! Visual Object Discovery through Multi-modal Dialogue

Multi-level Attention Networks for Visual Question Answering

Tracking by Natural Language Specification

Unambiguous Text Localization and Retrieval for Cluttered Scenes

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Video Captioning with Transferred Semantic Attributes

Filter options

Publication date

Keywords

INFONA - science communication portal

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) $("#expandableTitles").expandable();

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)