Search results

Zero-Shot Classification with Discriminative Semantic Representation Learning

Meng Ye, Yuhong Guo

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5103 - 5111

Zero-shot learning, a special case of unsupervised domain adaptation where the source and target domains have disjoint label spaces, has become increasingly popular in the computer vision community. In this paper, we propose a novel zero-shot learning method based on discriminative sparse non-negative matrix factorization. The proposed approach aims to identify a set of common high-level semantic...

Semantic Autoencoder for Zero-Shot Learning

Elyor Kodirov, Tao Xiang, Shaogang Gong

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4447 - 4456

Existing zero-shot learning (ZSL) models typically learn a projection function from a feature space to a semantic embedding space (e.g. attribute space). However, such a projection function is only concerned with predicting the training seen class semantic representation (e.g. attribute prediction) or classification. When applied to test data, which in the context of ZSL contains different (unseen)...

Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks

Xiao Yang, Ersin Yumer, Paul Asente, Mike Kraley, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4342 - 4351

We present an end-to-end, multimodal, fully convolutional network for extracting semantic structures from document images. We consider document semantic structure extraction as a pixel-wise segmentation task, and propose a unified model that classifies pixels based not only on their visual appearance, as in the traditional page segmentation task, but also on the content of underlying text. Moreover,...

Semantically Consistent Regularization for Zero-Shot Recognition

Pedro Morgado, Nuno Vasconcelos

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2037 - 2046

The role of semantics in zero-shot learning is considered. The effectiveness of previous approaches is analyzed according to the form of supervision provided. While some learn semantics independently, others only supervise the semantic subspace explained by training classes. Thus, the former is able to constrain the whole space but lacks the ability to model semantic correlations. The latter addresses...

Matrix Tri-Factorization with Manifold Regularizations for Zero-Shot Learning

Xing Xu, Fumin Shen, Yang Yang, Dongxiang Zhang, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2007 - 2016

Zero-shot learning (ZSL) aims to recognize objects of unseen classes with available training data from another set of seen classes. Existing solutions are focused on exploring knowledge transfer via an intermediate semantic embedding (e.g., attributes) shared between seen and unseen classes. In this paper, we propose a novel projection framework based on matrix tri-factorization with manifold regularizations...

Learning Spatial Regularization with Image-Level Supervisions for Multi-label Image Classification

Feng Zhu, Hongsheng Li, Wanli Ouyang, Nenghai Yu, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2027 - 2036

Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved by exploiting semantic relations between labels in recent years. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we...

Predicting Behaviors of Basketball Players from First Person Videos

Shan Su, Jung Pyo Hong, Jianbo Shi, Hyun Soo Park

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1206 - 1215

This paper presents a method to predict the future movements (location and gaze direction) of basketball players as a whole from their first-person videos. The predicted behaviors reflect an individual physical space that affords taking the next actions while conforming to social behaviors by engaging in joint attention. Our key innovation is to use the 3D reconstruction of multiple first person...

Weakly-Supervised Visual Grounding of Phrases with Linguistic Structures

Fanyi Xiao, Leonid Sigal, Yong Jae Lee

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5253 - 5262

We propose a weakly-supervised approach that takes image-sentence pairs as input and learns to visually ground (i.e., localize) arbitrary linguistic phrases, in the form of spatial attention masks. Specifically, the model is trained with images and their associated image-level captions, without any explicit region-to-phrase correspondence annotations. To this end, we introduce an end-to-end model...

Conditional Similarity Networks

Andreas Veit, Serge Belongie, Theofanis Karaletsos

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1781 - 1789

What makes images similar? To measure the similarity between images, they are typically embedded in a feature-vector space, in which their distances preserve the relative dissimilarity. However, when learning such similarity embeddings, the simplifying assumption is commonly made that images are only compared to one unique measure of similarity. A main reason for this is that contradicting notions of...

Webly Supervised Semantic Segmentation

Bin Jin, Maria V. Ortiz Segovia, Sabine Susstrunk

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1705 - 1714

We propose a weakly supervised semantic segmentation algorithm that uses image tags for supervision. We apply the tags in queries to collect three sets of web images, which encode the clean foregrounds, the common backgrounds, and realistic scenes of the classes. We introduce a novel three-stage training pipeline to progressively learn semantic segmentation models. We first train and refine a class-specific...

Locality-Sensitive Deconvolution Networks with Gated Fusion for RGB-D Indoor Semantic Segmentation

Yanhua Cheng, Rui Cai, Zhiwei Li, Xin Zhao, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1475 - 1483

This paper focuses on indoor semantic segmentation using RGB-D data. Although the commonly used deconvolution networks (DeconvNet) have achieved impressive results on this task, we find there is still room for improvement in two aspects. One concerns boundary segmentation. DeconvNet aggregates large context to predict the label of each pixel, inherently limiting the segmentation precision of...

Automatic Discovery, Association Estimation and Learning of Semantic Attributes for a Thousand Categories

Ziad Al-Halah, Rainer Stiefelhagen

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5112 - 5121

Attribute-based recognition models, due to their impressive performance and their ability to generalize well on novel categories, have been widely adopted for many computer vision applications. However, usually both the attribute vocabulary and the class-attribute associations have to be provided manually by domain experts or a large number of annotators. This is very costly and not necessarily optimal...

Deep Variation-Structured Reinforcement Learning for Visual Relationship and Attribute Detection

Xiaodan Liang, Lisa Lee, Eric P. Xing

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4408 - 4417

Computers still struggle to understand the interdependency of objects in the scene as a whole, e.g., relations between objects or their attributes. Existing methods often ignore global context cues capturing the interactions among different object instances, and can only recognize a handful of types by exhaustively training individual detectors for all possible relationships. To capture such global...

Predicting Ground-Level Scene Layout from Aerial Imagery

Menghua Zhai, Zachary Bessinger, Scott Workman, Nathan Jacobs

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4132 - 4140

We introduce a novel strategy for learning to extract semantically meaningful features from aerial imagery. Instead of manually labeling the aerial imagery, we propose to predict (noisy) semantic features automatically extracted from co-located ground imagery. Our network architecture takes an aerial image as input, extracts features using a convolutional neural network, and then applies an adaptive...

Mining Object Parts from CNNs via Active Question-Answering

Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3890 - 3899

Given a convolutional neural network (CNN) that is pre-trained for object classification, this paper proposes to use active question-answering to semanticize neural patterns in conv-layers of the CNN and mine part concepts. For each part concept, we mine neural patterns in the pre-trained CNN, which are related to the target part, and use these patterns to construct an And-Or graph (AOG) to represent...

Visual-Inertial-Semantic Scene Representation for 3D Object Detection

Jingming Dong, Xiaohan Fei, Stefano Soatto

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3567 - 3577

We describe a system to detect objects in three-dimensional space using video and inertial sensors (accelerometer and gyrometer), ubiquitous in modern mobile platforms from phones to drones. Inertials afford the ability to impose class-specific scale priors for objects, and provide a global orientation reference. A minimal sufficient representation, the posterior of semantic (identity) and syntactic...

A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

Licheng Yu, Hao Tan, Mohit Bansal, Tamara L. Berg

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 3521 - 3529

Referring expressions are natural language constructions used to identify particular objects within a scene. In this paper, we propose a unified framework for the tasks of referring expression comprehension and generation. Our model is composed of three modules: speaker, listener, and reinforcer. The speaker generates referring expressions, the listener comprehends referring expressions, and the reinforcer...

A unified framework for flickr group recommendation based on tetradic decomposition

Xiaofang Wang, Xiuyang Zhao, Jin Zhou, Ming Xu

2017 4th International Conference on Information, Cybernetics and Computational Social Systems (ICCSS) > 300 - 305

Unlike current research on Flickr group recommendation, which recommends groups to either users or images, this work proposes a unified framework that recommends groups to both users and images. Four types of entities in the Flickr system (users, tags, images, and groups) are integrated into a tetradic model, and then we use tetradic decomposition to discover the latent semantic...

Cognitive exploration of regions through analyzing geo-tagged social media data

Yunzhe Wang, George Baciu, Chenhui Li

2017 IEEE 16th International Conference on Cognitive Informatics & Cognitive Computing (ICCI*CC) > 59 - 64

Social media has now become a pervasive global communication channel. Many applications and platforms have become available for users to post messages, follow friends and share experiences. Due to the high frequency with which users update their states, a large amount of data is being generated around the world every second. By analyzing this data, valuable patterns can be extracted such as the distribution...

Fine-Grained Image Classification via Combining Vision and Language

Xiangteng He, Yuxin Peng

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 7332 - 7340

Fine-grained image classification is a challenging task due to the large intra-class variance and small inter-class variance, aiming at recognizing hundreds of sub-categories belonging to the same basic-level category. Most existing fine-grained image classification methods generally learn part detection models to obtain the semantic parts for better classification accuracy. Despite achieving promising...
