2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

chapter

Learning to Learn from Noisy Web Videos

Serena Yeung, Vignesh Ramanathan, Olga Russakovsky, Liyue Shen, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7455 - 7463

Understanding the simultaneously very diverse and intricately fine-grained set of possible human actions is a critical open problem in computer vision. Manually labeling training videos is feasible for some action classes but doesn't scale to the full long-tailed distribution of actions. A promising way to address this is to leverage noisy data from web queries to learn new actions, using semi-supervised...

chapter

Temporal Action Localization by Structured Maximal Sums

Zehuan Yuan, Jonathan C. Stroud, Tong Lu, Jia Deng

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3215 - 3223

We address the problem of temporal action localization in videos. We pose action localization as a structured prediction over arbitrary-length temporal windows, where each window is scored as the sum of frame-wise classification scores. Additionally, our model classifies the start, middle, and end of each action as separate components, allowing our system to explicitly model each action's temporal...
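The idea of scoring a window as the sum of frame-wise scores can be illustrated with a small sketch. This is not the authors' implementation: it assumes hypothetical per-frame scores for a single action class (positive where a frame looks like the action, negative elsewhere) and uses the classic maximum-subarray (Kadane) scan to pick the best window. The function name `best_window` is made up for this illustration, and the separate start/middle/end components mentioned in the abstract are not modeled here.

```python
from typing import List, Tuple


def best_window(scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, total) for the contiguous window whose
    summed frame-wise score is maximal. `end` is inclusive.

    `scores` is an assumed list of per-frame classification scores
    for one action class; it is illustrative input, not real data.
    """
    if not scores:
        raise ValueError("scores must be non-empty")

    best_start, best_end, best_sum = 0, 0, scores[0]
    cur_start, cur_sum = 0, scores[0]

    for i in range(1, len(scores)):
        if cur_sum < 0:
            # A negative running sum can only hurt: restart the window here.
            cur_start, cur_sum = i, scores[i]
        else:
            # Otherwise extend the current window by one frame.
            cur_sum += scores[i]
        if cur_sum > best_sum:
            best_start, best_end, best_sum = cur_start, i, cur_sum

    return best_start, best_end, best_sum


if __name__ == "__main__":
    frame_scores = [-1.0, 2.0, 3.0, -1.0, 4.0, -5.0, 1.0]
    print(best_window(frame_scores))
```

The scan is linear in the number of frames, which is what makes scoring arbitrary-length windows tractable in this simplified setting.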

chapter

Unified Embedding and Metric Learning for Zero-Exemplar Event Detection

Noureldien Hussein, Efstratios Gavves, Arnold W. M. Smeulders

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2087 - 2096

Event detection in unconstrained videos is conceived as a content-based video retrieval with two modalities: textual and visual. Given a text describing a novel event, the goal is to rank related videos accordingly. This task is zero-exemplar: no video examples are given for the novel event. Related works train a bank of concept detectors on external data sources. These detectors predict confidence...

chapter

FusionSeg: Learning to Combine Motion and Appearance for Fully Automatic Segmentation of Generic Objects in Videos

Suyog Dutt Jain, Bo Xiong, Kristen Grauman

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2117 - 2126

We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel-level segmentation masks for all prominent objects in videos. We formulate this task as a structured prediction problem and design a two-stream fully convolutional neural network which fuses together motion and appearance in a unified...

chapter

Temporal Action Co-Segmentation in 3D Motion Capture Data and Videos

Konstantinos Papoutsakis, Costas Panagiotakis, Antonis A. Argyros

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2146 - 2155

Given two action sequences, we are interested in spotting/co-segmenting all pairs of sub-sequences that represent the same action. We propose a totally unsupervised solution to this problem. No a-priori model of the actions is assumed to be available. The number of common sub-sequences may be unknown. The sub-sequences can be located anywhere in the original sequences, may differ in duration and the...

chapter

CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos

Zheng Shou, Jonathan Chan, Alireza Zareian, Kazuyuki Miyazawa, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1417 - 1426

Temporal action localization is an important yet challenging problem. Given a long, untrimmed video consisting of multiple action instances and complex background contents, we need not only to recognize their action categories, but also to localize the start time and end time of each instance. Many state-of-the-art systems use segment-level classifiers to select and rank proposal segments of pre-determined...

chapter

The World of Fast Moving Objects

Denys Rozumnyi, Jan Kotera, Filip Sroubek, Lukas Novotny, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4838 - 4846

The notion of a Fast Moving Object (FMO), i.e. an object that moves over a distance exceeding its size within the exposure time, is introduced. FMOs may, and typically do, rotate with high angular speed. FMOs are very common in sports videos, but are not rare elsewhere. In a single frame, such objects are often barely visible and appear as semitransparent streaks. A method for the detection and tracking...

chapter

Identifying First-Person Camera Wearers in Third-Person Videos

Chenyou Fan, Jangwon Lee, Mingze Xu, Krishna Kumar Singh, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4734 - 4742

We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks while multiple people are wearing body-worn cameras and a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the camera...

chapter

PoseTrack: Joint Multi-person Pose Estimation and Tracking

Umar Iqbal, Anton Milan, Juergen Gall

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4654 - 4663

In this work, we introduce the challenging problem of joint multi-person pose estimation and tracking of an unknown number of persons in unconstrained videos. Existing methods for multi-person pose estimation in images cannot be applied directly to this problem, since it also requires solving the problem of person association over time in addition to estimating the pose of each person. We therefore...

chapter

CERN: Confidence-Energy Recurrent Network for Group Activity Recognition

Tianmin Shu, Sinisa Todorovic, Song-Chun Zhu

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4255 - 4263

This work is about recognizing human activities occurring in videos at distinct semantic levels, including individual actions, interactions, and group activities. The recognition is realized using a two-level hierarchy of Long Short-Term Memory (LSTM) networks, forming a feed-forward deep architecture, which can be trained end-to-end. In comparison with existing architectures of LSTMs, we make two...

chapter

LSTM Self-Supervision for Detailed Behavior Analysis

Biagio Brattoli, Uta Buchler, Anna-Sophia Wahl, Martin E. Schwab, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3747 - 3756

Behavior analysis provides a crucial non-invasive and easily accessible diagnostic tool for biomedical research. A detailed analysis of posture changes during skilled motor tasks can reveal distinct functional deficits and their restoration during recovery. Our specific scenario is based on a neuroscientific study of rodents recovering from a large sensorimotor cortex stroke and skilled forelimb grasping...

chapter

Towards a Quality Metric for Dense Light Fields

Vamsi Kiran Adhikarla, Marek Vinkler, Denis Sumin, Rafal K. Mantiuk, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3720 - 3729

Light fields have become a popular representation of three-dimensional scenes, and there is interest in their processing, resampling, and compression. As those operations often result in loss of quality, there is a need to quantify it. In this work, we collect a new dataset of dense reference and distorted light fields as well as the corresponding quality scores which are scaled in perceptual units. The...

chapter

Binge Watching: Scaling Affordance Learning from Sitcoms

Xiaolong Wang, Rohit Girdhar, Abhinav Gupta

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3366 - 3375

In recent years, there has been a renewed interest in jointly modeling perception and action. At the core of this investigation is the idea of modeling affordances. However, when it comes to predicting affordances, even the state-of-the-art approaches still do not use any ConvNets. Why is that? Unlike semantic or 3D tasks, there still does not exist any large-scale dataset for affordances. In this...

chapter

On the Effectiveness of Visible Watermarks

Tali Dekel, Michael Rubinstein, Ce Liu, William T. Freeman

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 6864 - 6872

Visible watermarking is a widely used technique for marking and protecting copyrights of many millions of images on the web, yet it suffers from an inherent security flaw: watermarks are typically added in a consistent manner to many images. We show that this consistency makes it possible to automatically estimate the watermark and recover the original images with high accuracy. Specifically, we present...

chapter

Unsupervised Learning of Long-Term Motion Dynamics for Videos

Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7101 - 7110

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the complexity of the learning framework, we propose to describe the motion as a sequence of atomic 3D flows computed with RGB-D modality. We use a Recurrent Neural Network...

chapter

Predicting Salient Face in Multiple-Face Videos

Yufan Liu, Songyang Zhang, Mai Xu, Xuming He

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3224 - 3232

Although the recent success of convolutional neural networks (CNNs) has advanced the state of the art in saliency prediction for static images, little work has addressed the problem of predicting attention in videos. On the other hand, we find that the attention of different subjects consistently focuses on a single face in each frame of videos involving multiple faces. Therefore, we propose in this paper a novel...

chapter

HOPE: Hierarchical Object Prototype Encoding for Efficient Object Instance Search in Videos

Tan Yu, Yuwei Wu, Junsong Yuan

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3195 - 3204

This paper tackles the problem of efficient and effective object instance search in videos. To effectively capture the relevance between a query and video frames and precisely localize the particular object, we leverage the object proposals to improve the quality of object instance search in videos. However, hundreds of object proposals obtained from each frame could result in unaffordable memory...

chapter

Weakly Supervised Semantic Segmentation Using Web-Crawled Videos

Seunghoon Hong, Donghun Yeo, Suha Kwak, Honglak Lee, et al.

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2224 - 2232

We propose a novel algorithm for weakly supervised semantic segmentation based on image-level class labels only. In the weakly supervised setting, it is commonly observed that the trained model focuses too heavily on discriminative parts rather than the entire object area. Our goal is to overcome this limitation with no additional human intervention by retrieving videos relevant to target class labels from web...

chapter

Procedural Generation of Videos to Train Deep Action Recognition Networks

Cesar Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel Lopez

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2594 - 2604

Deep learning for human action recognition in videos is making significant progress, but is slowed down by its dependency on expensive manual labeling of large video collections. In this work, we investigate the generation of synthetic training data for action recognition, as it has recently shown promising results for a variety of other computer vision tasks. We propose an interpretable parametric...

chapter

Predicting Behaviors of Basketball Players from First Person Videos

Shan Su, Jung Pyo Hong, Jianbo Shi, Hyun Soo Park

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1206 - 1215

This paper presents a method to predict the future movements (location and gaze direction) of basketball players as a whole from their first person videos. The predicted behaviors reflect an individual physical space that affords taking the next actions while conforming to social behaviors by engaging in joint attention. Our key innovation is to use the 3D reconstruction of multiple first person...
