2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering

Youngjae Yu, Hyungjin Ko, Jongwook Choi, Gunhee Kim

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3261 - 3269

We propose a high-level concept word detector that can be integrated with any video-to-language model. It takes a video as input and generates a list of concept words as useful semantic priors for language generation models. The proposed word detector has two important properties. First, it does not require any external knowledge sources for training. Second, the proposed word detector is trainable...

chapter

Dual Attention Networks for Multimodal Reasoning and Matching

Hyeonseob Nam, Jung-Woo Ha, Jeonghee Kim

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2156 - 2164

We propose Dual Attention Networks (DANs) which jointly leverage visual and textual attention mechanisms to capture fine-grained interplay between vision and language. DANs attend to specific regions in images and words in text through multiple steps and gather essential information from both modalities. Based on this framework, we introduce two types of DANs for multimodal reasoning and matching,...

chapter

Graph-Structured Representations for Visual Question Answering

Damien Teney, Lingqiao Liu, Anton van den Hengel

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3233 - 3241

This paper proposes to improve visual question answering (VQA) with structured representations of both scene contents and questions. A key challenge in VQA is the need for joint reasoning over the visual and text domains. The predominant CNN/LSTM-based approach to VQA is limited by monolithic vector representations that largely ignore structure in the scene and in the question. CNN feature vectors cannot...

chapter

Knowledge Acquisition for Visual Question Answering via Iterative Querying

Yuke Zhu, Joseph J. Lim, Li Fei-Fei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6146 - 6155

Humans possess an extraordinary ability to learn new skills and new knowledge for problem solving. Such learning ability is also required by an automatic model to deal with arbitrary, open-ended questions in the visual world. We propose a neural-based approach to acquiring task-driven information for visual question answering (VQA). Our model proposes queries to actively acquire relevant information...

chapter

Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension

Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5376 - 5384

We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images. We present the Textbook Question Answering (TQA) dataset that includes 1,076 lessons and 26,260 multi-modal questions, taken from middle school science curricula. Our analysis shows that a significant portion of questions require complex parsing...

chapter

Counting Everyday Objects in Everyday Scenes

Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4428 - 4437

We are interested in counting the number of instances of object classes in natural, everyday images. Previous counting approaches tackle the problem in restricted domains such as counting pedestrians in surveillance videos. Counts can also be estimated from outputs of other vision tasks like object detection. In this work, we build dedicated models for counting designed to tackle the large variance...

chapter

Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6325 - 6334

Problems at the intersection of vision and language are of significant importance both as challenging research questions and for the rich set of applications they enable. However, inherent structure in our world and bias in our language tend to be a simpler signal for learning than visual modalities, resulting in models that ignore visual information, leading to an inflated sense of their capability...

chapter

GuessWhat?! Visual Object Discovery through Multi-modal Dialogue

Harm de Vries, Florian Strub, Sarath Chandar, Olivier Pietquin, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4466 - 4475

We introduce GuessWhat?!, a two-player guessing game as a testbed for research on the interplay of computer vision and dialogue systems. The goal of the game is to locate an unknown object in a rich image scene by asking a sequence of questions. Higher-level image understanding, like spatial reasoning and language grounding, is required to solve the proposed task. Our key contribution is the collection...

chapter

Multi-level Attention Networks for Visual Question Answering

Dongfei Yu, Jianlong Fu, Tao Mei, Yong Rui

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4187 - 4195

Inspired by the recent success of text-based question answering, visual question answering (VQA) is proposed to automatically answer natural language questions with reference to a given image. Compared with text-based QA, VQA is more challenging because reasoning in the visual domain needs both effective semantic embedding and fine-grained visual understanding. Existing approaches predominantly...

chapter

An Empirical Evaluation of Visual Question Answering for Novel Objects

Santhosh K. Ramakrishnan, Ambar Pal, Gaurav Sharma, Anurag Mittal

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7312 - 7321

We study the problem of answering questions about images in a harder setting, where the test questions and corresponding images contain novel objects that were not queried about in the training data. Such a setting is inevitable in the real world: owing to the heavy-tailed distribution of visual categories, some objects will not be annotated in the training set. We show...

chapter

Query-Focused Video Summarization: Dataset, Evaluation, and a Memory Network Based Approach

Aidean Sharghi, Jacob S. Laurel, Boqing Gong

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2127 - 2136

Recent years have witnessed a resurgence of interest in video summarization. However, one of the main obstacles to research on video summarization is user subjectivity: users have various preferences over the summaries. This subjectivity causes at least two problems. First, no single video summarizer fits all users unless it interacts with and adapts to the individual users. Second,...

chapter

TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering

Yunseok Jang, Yale Song, Youngjae Yu, Youngjin Kim, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1359 - 1367

Vision and language understanding has emerged as a subject undergoing intense study in Artificial Intelligence. Among many tasks in this line of research, visual question answering (VQA) has been one of the most successful ones, where the goal is to learn a model that understands visual content at region-level details and finds their associations with pairs of questions and answers in the natural...

chapter

Visual Dialog

Abhishek Das, Satwik Kottur, Khushi Gupta, Avi Singh, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1080 - 1089

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from the history, and answer the question accurately. Visual Dialog is disentangled enough from a...
