2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5376-5384

We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question Answering (TQA) dataset that includes 1,076 lessons and 26,260 multi-modal questions, taken from middle school science curricula. Our analysis shows that a significant portion of questions require complex parsing...

Modeling Relationships in Referential Expressions with Compositional Modular Networks

Ronghang Hu, Marcus Rohrbach, Jacob Andreas, Trevor Darrell, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4418-4427

People often refer to entities in an image in terms of their relationships with other entities. For example, "the black cat sitting under the table" refers to both a black cat entity and its relationship with another entity, the table. Understanding these relationships is essential for interpreting and grounding such natural language expressions. Most prior work focuses on either grounding entire referential...

GuessWhat?! Visual Object Discovery through Multi-modal Dialogue

Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4466-4475

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection...

Tracking by Natural Language Specification

Zhenyang Li, Ran Tao, Efstratios Gavves, Cees G. M. Snoek, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7350-7358

This paper strives to track a target object in a video. Rather than specifying the target in the first frame of a video by a bounding box, we propose to track the object based on a natural language specification of the target, which provides a more natural human-machine interaction as well as a means to improve tracking results. We define three variants of tracking by language specification: one relying...

Video Captioning with Transferred Semantic Attributes

Yingwei Pan, Ting Yao, Houqiang Li, Tao Mei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 984-992

Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community. Most recent progress on this problem has been achieved by employing 2-D and/or 3-D Convolutional Neural Networks (CNNs) to encode video content and Recurrent Neural Networks (RNNs) to decode a sentence. In this paper, we present Long Short-Term Memory with Transferred...

INFONA - science communication portal
