2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

Learning Non-maximum Suppression

Jan Hosang, Rodrigo Benenson, Bernt Schiele

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6469 - 6477

Object detectors have hugely profited from moving towards an end-to-end learning paradigm: proposals, features, and the classifier becoming one neural network improved results two-fold on general object detection. One indispensable component is non-maximum suppression (NMS), a post-processing algorithm responsible for merging all detections that belong to the same object. The de facto standard NMS...

chapter

Scene Graph Generation by Iterative Message Passing

Danfei Xu, Yuke Zhu, Christopher B. Choy, Li Fei-Fei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3097 - 3106

Understanding a visual scene goes beyond recognizing individual objects in isolation. Relationships between objects also constitute rich semantic information about the scene. In this work, we explicitly model the objects and their relationships using scene graphs, a visually-grounded graphical structure of an image. We propose a novel end-to-end model that generates such structured scene representation...

chapter

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

Xiaolong Wang, Abhinav Shrivastava, Abhinav Gupta

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3039 - 3048

How do we learn an object detector that is invariant to occlusions and deformations? Our current solution is to use a data-driven strategy – collect large-scale datasets which have object instances under different conditions. The hope is that the final classifier can use these examples to learn invariances. But is it really possible to see all the occlusions in a dataset? We argue that...

chapter

Self-Learning Scene-Specific Pedestrian Detectors Using a Progressive Latent Model

Qixiang Ye, Tianliang Zhang, Wei Ke, Qiang Qiu, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2057 - 2066

In this paper, a self-learning approach is proposed towards solving the scene-specific pedestrian detection problem without any human annotation involved. The self-learning approach is deployed as progressive steps of object discovery, object enforcement, and label propagation. In the learning procedure, object locations in each frame are treated as latent variables that are solved with a progressive...

chapter

Relationship Proposal Networks

Ji Zhang, Mohamed Elhoseiny, Scott Cohen, Walter Chang, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5226 - 5234

Image scene understanding requires learning the relationships between objects in the scene. A scene with many objects may have only a few individual interacting objects (e.g., in a party image with many people, only a handful of people might be speaking with each other). To detect all relationships, it would be inefficient to first detect all individual objects and then classify all pairs, not only...

chapter

Dense Captioning with Joint Inference and Visual Context

Linjie Yang, Kevin Tang, Jianchao Yang, Li-Jia Li

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1978 - 1987

Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase. We identify two key challenges of dense captioning that need to be properly addressed when tackling the problem. First,...

chapter

Scale-Aware Face Detection

Zekun Hao, Yu Liu, Hongwei Qin, Junjie Yan, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1913 - 1922

Convolutional neural network (CNN) based face detectors are inefficient in handling faces of diverse scales. They rely on either fitting a large single model to faces across a large scale range or multi-scale testing. Both are computationally expensive. We propose Scale-aware Face Detection (SAFD) to handle scale explicitly using CNN, and achieve better performance with less computation cost. Prior...

chapter

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1417 - 1426

Temporal action localization is an important yet challenging problem. Given a long, untrimmed video consisting of multiple action instances and complex background contents, we need not only to recognize their action categories, but also to localize the start time and end time of each instance. Many state-of-the-art systems use segment-level classifiers to select and rank proposal segments of pre-determined...

chapter

Fully Convolutional Instance-Aware Semantic Segmentation

Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4438 - 4446

We present the first fully convolutional end-to-end solution for the instance-aware semantic segmentation task. It inherits all the merits of FCNs for semantic segmentation [29] and instance mask proposal [5]. It performs instance mask prediction and classification jointly. The underlying convolutional representation is fully shared between the two sub-tasks, as well as between all regions of interest...

chapter

What is and What is Not a Salient Object? Learning Salient Object Detector by Ensembling Linear Exemplar Regressors

Changqun Xia, Jia Li, Xiaowu Chen, Anlin Zheng, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4399 - 4407

Finding what is and what is not a salient object can be helpful in developing better features and models in salient object detection (SOD). In this paper, we investigate the images that are selected and discarded in constructing a new SOD dataset and find that many similar candidates, complex shape and low objectness are three main attributes of many non-salient objects. Moreover, objects may have...

chapter

Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition

Jianlong Fu, Heliang Zheng, Tao Mei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4476 - 4484

Recognizing fine-grained categories (e.g., bird species) is difficult due to the challenges of discriminative region localization and fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that region detection and fine-grained feature learning are mutually correlated and thus can reinforce each other. In this paper, we propose...

chapter

Straight to Shapes: Real-Time Detection of Encoded Shapes

Saumya Jetley, Michael Sapienza, Stuart Golodetz, Philip H. S. Torr

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4207 - 4216

Current object detection approaches predict bounding boxes that provide little instance-specific information beyond location, scale and aspect ratio. In this work, we propose to regress directly to objects' shapes in addition to their bounding boxes and categories. It is crucial to find an appropriate shape representation that is compact and decodable, and in which objects can be compared for higher-order...

chapter

Deep Self-Taught Learning for Weakly Supervised Object Localization

Zequn Jie, Yunchao Wei, Xiaojie Jin, Jiashi Feng, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4294 - 4302

Most existing weakly supervised localization (WSL) approaches learn detectors by finding positive bounding boxes based on features learned with image-level supervision. However, those features do not contain spatial location related information and usually provide poor-quality positive samples for training a detector. To overcome this issue, we propose a deep self-taught learning approach, which makes...

chapter

Towards Accurate Multi-person Pose Estimation in the Wild

George Papandreou, Tyler Zhu, Nori Kanazawa, Alexander Toshev, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3711 - 3719

We propose a method for multi-person detection and 2-D pose estimation that achieves state-of-the-art results on the challenging COCO keypoints task. It is a simple, yet powerful, top-down approach consisting of two stages. In the first stage, we predict the location and scale of boxes which are likely to contain people; for this we use the Faster RCNN detector. In the second stage, we estimate the keypoints...

chapter

Learning Video Object Segmentation from Static Images

Federico Perazzi, Anna Khoreva, Rodrigo Benenson, Bernt Schiele, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3491 - 3500

Inspired by recent advances of deep learning in instance segmentation and object tracking, we introduce the concept of convnet-based guidance applied to video object segmentation. Our model proceeds on a per-frame basis, guided by the output of the previous frame towards the object of interest in the next frame. We demonstrate that highly accurate object segmentation in videos can be enabled by using...

chapter

Detecting Oriented Text in Natural Images by Linking Segments

Baoguang Shi, Xiang Bai, Serge Belongie

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3482 - 3490

Most state-of-the-art text detection methods are specific to horizontal Latin text and are not fast enough for real-time applications. We introduce Segment Linking (SegLink), an oriented text detection method. The main idea is to decompose text into two locally detectable elements, namely segments and links. A segment is an oriented box covering a part of a word or text line; a link connects two adjacent...

chapter

Social Scene Understanding: End-to-End Multi-person Action Localization and Collective Activity Recognition

Timur Bagautdinov, Alexandre Alahi, Francois Fleuret, Pascal Fua, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3425 - 3434

We present a unified framework for understanding human social behaviors in raw image sequences. Our model jointly detects multiple individuals, infers their social actions, and estimates the collective actions with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end to generate dense...

chapter

Joint Detection and Identification Feature Learning for Person Search

Tong Xiao, Shuang Li, Bochao Wang, Liang Lin, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3376 - 3385

Existing person re-identification benchmarks and methods mainly focus on matching cropped pedestrian images between queries and candidates. However, it is different from real-world scenarios where the annotations of pedestrian bounding boxes are unavailable and the target person needs to be searched from a gallery of whole scene images. To close the gap, we propose a new deep learning framework for...

chapter

Neural Scene De-rendering

Jiajun Wu, Joshua B. Tenenbaum, Pushmeet Kohli

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7035 - 7043

We study the problem of holistic scene understanding. We would like to obtain a compact, expressive, and interpretable representation of scenes that encodes information such as the number of objects and their categories, poses, positions, etc. Such a representation would allow us to reason about and even reconstruct or manipulate elements of the scene. Previous works have used encoder-decoder based...

chapter

InstanceCut: From Edges to Instances with MultiCut

Alexander Kirillov, Evgeny Levinkov, Bjoern Andres, Bogdan Savchynskyy, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7322 - 7331

This work addresses the task of instance-aware semantic segmentation. Our key motivation is to design a simple method with a new modelling-paradigm, which therefore has a different trade-off between advantages and disadvantages compared to known approaches. Our approach, which we term InstanceCut, represents the problem by two output modalities: (i) an instance-agnostic semantic segmentation and (ii) all...
