Convolutional neural network (CNN) based trackers have recently achieved strong tracking performance. Most existing CNN-based trackers treat tracking as either a classification problem or a similarity-search problem. The two formulations have complementary strengths and limitations because of their different supervised objectives. In this paper, we propose a multi-task CNN for visual tracking, not only fully...
We propose ‘Hide-and-Seek’, a weakly-supervised framework that aims to improve object localization in images and action localization in videos. Most existing weakly-supervised methods localize only the most discriminative parts of an object rather than all relevant parts, which leads to suboptimal performance. Our key idea is to hide patches in a training image randomly, forcing the network to seek...
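The core of the Hide-and-Seek idea is simple enough to sketch: divide each training image into a grid and randomly blank out patches so the network cannot rely solely on the most discriminative part. The sketch below is illustrative only; the grid size, hiding probability, and fill value are assumptions, not the paper's exact settings.

```python
import numpy as np

def hide_patches(image, grid=4, p_hide=0.5, fill=0.0, rng=None):
    """Randomly hide grid cells of a training image (Hide-and-Seek sketch).

    Each of the grid x grid cells is replaced by `fill` with probability
    `p_hide`, forcing a localization network to look beyond the most
    discriminative region.
    """
    rng = rng or np.random.default_rng()
    out = image.copy()
    h, w = image.shape[:2]
    ph, pw = h // grid, w // grid
    for i in range(grid):
        for j in range(grid):
            if rng.random() < p_hide:
                out[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw] = fill
    return out
```

Note that hiding is applied only at training time; at test time the full image is shown, which is what makes the scheme weakly supervised rather than an architectural change.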
In this paper, we address the problem of spatio-temporal person retrieval from videos using a natural language query, in which we output a tube (i.e., a sequence of bounding boxes) which encloses the person described by the query. For this problem, we introduce a novel dataset consisting of videos containing people annotated with bounding boxes for each second and with five natural language descriptions...
This article shares the results obtained in the teacher-training phase for the analysis, development, and publication of accessible courses using the ATutor learning management platform. This training phase was carried out within the framework of a research project: “Didactic and technological development in teaching scenarios for the training of teachers who welcome diversity: factors for...
We present an unsupervised representation learning approach using videos without semantic labels. We leverage the temporal coherence as a supervisory signal by formulating representation learning as a sequence sorting task. We take temporally shuffled frames (i.e., in non-chronological order) as inputs and train a convolutional neural network to sort the shuffled sequences. Similar to comparison-based...
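The sequence-sorting pretext task described above can be sketched as a data-preparation step: sample a few frames in chronological order, apply a random permutation, and ask the network to classify which permutation was applied. The function below is a minimal sketch under assumed names and sampling choices; it is not the paper's exact pipeline.

```python
import itertools
import random

def make_sorting_sample(frames, tuple_len=4, rng=random):
    """Build one training sample for the sequence-sorting pretext task.

    Samples `tuple_len` frames in chronological order, shuffles them with a
    random permutation, and returns the shuffled frames together with the
    permutation index the network must predict (tuple_len! classes).
    """
    idx = sorted(rng.sample(range(len(frames)), tuple_len))
    perms = list(itertools.permutations(range(tuple_len)))
    label = rng.randrange(len(perms))
    perm = perms[label]
    shuffled = [frames[idx[k]] for k in perm]
    return shuffled, label
```

Treating the permutation index as a class label turns temporal coherence into an ordinary classification loss, so no human annotation is needed.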
Recently, many have come to believe that learner-centered approaches such as active learning and cooperative learning improve learning outcomes and are more effective than traditional lectures. Moreover, in addition to paper-based materials such as textbooks, face-to-face co-located communication frequently utilizes digital video and other visual reference...
Understanding the simultaneously very diverse and intricately fine-grained set of possible human actions is a critical open problem in computer vision. Manually labeling training videos is feasible for some action classes but doesn't scale to the full long-tailed distribution of actions. A promising way to address this is to leverage noisy data from web queries to learn new actions, using semi-supervised...
The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio. Unlike previous works that have focussed on recognising a limited number of words or phrases, we tackle lip reading as an open-world problem – unconstrained natural language sentences, and in-the-wild videos. Our key contributions are: (1) a Watch, Listen, Attend and Spell...
This paper proposes efficient and powerful deep networks for action prediction from partially observed videos containing temporally incomplete action executions. Unlike after-the-fact action recognition, the action prediction task requires action labels to be predicted from these partially observed videos. Our approach exploits abundant sequential context information to enrich the feature representations...
Current action recognition methods heavily rely on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, which is able to directly learn action recognition models from untrimmed videos without the requirement of temporal annotations of action instances...
This paper presents a novel frame-pair based method for visual object tracking. Instead of adopting two-stream Convolutional Neural Networks (CNNs) to represent each frame, we stack frame pairs as the input, resulting in a single-stream CNN tracker with far fewer parameters. The proposed tracker can learn generic motion patterns of objects from far fewer annotated videos than previous methods. Besides,...
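The frame-pair input described above amounts to concatenating two consecutive frames along the channel axis, so a single-stream CNN sees a 6-channel tensor instead of two separate 3-channel streams. The snippet below is a sketch of that input construction under assumed names; the paper's exact preprocessing (cropping, normalization) may differ.

```python
import numpy as np

def stack_frame_pair(frame_t, frame_tp1):
    """Stack two consecutive HxWx3 frames along the channel axis.

    The resulting HxWx6 tensor lets a single-stream CNN observe motion
    between the frames without a separate optical-flow stream.
    """
    assert frame_t.shape == frame_tp1.shape, "frames must have equal shape"
    return np.concatenate([frame_t, frame_tp1], axis=-1)
```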
The successful deep convolutional neural networks for visual object recognition typically rely on a massive number of training images that are well annotated by class labels or object bounding boxes with great human efforts. Here we explore the use of the geographic metadata, which are automatically retrieved from sensors such as GPS and compass, in weakly-supervised learning techniques for landmark...
We present VideoWhisper, a novel approach for unsupervised video representation learning, in which a video sequence is treated as a self-supervision entity, based on the observation that the sequence encodes video temporal dynamics (e.g., object movement and event evolution). Specifically, for each video sequence, we use a pre-learned visual dictionary to generate a sequence of high-level semantics,...
This paper presents a framework for saliency estimation and fixation prediction in videos. The proposed framework is based on a hierarchical feature representation obtained by stacking convolutional layers of independent subspace analysis (ISA) filters. The feature learning is thus unsupervised and independent of the task. To compute the saliency, we then employ a multiresolution saliency architecture...
Deep visual attention has attracted considerable interest in computer vision over the past years, contributing notably to image classification, image captioning, and action recognition. However, because these models rely wholly or partially on backpropagation (BP) training, they cannot realize the full potential of attention in computational efficiency and focusing accuracy. Our intuition is that the attention mechanism should...
Human activity detection is an important area of research in computer vision. This paper focuses on recognizing activities performed by construction personnel at construction sites. The method uses a bag-of-features (BOF) approach to detect an activity. We consider five types of activities performed at construction sites, namely ladder climbing, brick laying, carpentry work, painting and plastering work...
Automatic transcriptions of consumer-generated multimedia content such as “YouTube” videos still exhibit high word error rates. Such data typically spans a very broad domain, has been recorded in challenging conditions with cheap hardware and a focus on the visual modality, and may have been post-processed or edited.
We present an approach to automatically generating verbal commentaries for tennis games. We introduce a novel application that requires a combination of techniques from computer vision, natural language processing and machine learning. A video sequence is first analysed using state-of-the-art computer vision methods to track the ball, fit the detected edges to the court model, track the players, and...
Over the last decades, visual representations of data have been a commonly used medium to bolster human cognition in the performance evaluation of professional athletes. However, current approaches to these visualizations still build upon the paper-based principles of the initial designs, with solid backgrounds. As a result, these visualizations usually fail to provide explicit information about...
Motivated by the recent advances in human-robot interaction we present a new dataset, a suite of tools to handle it and state-of-the-art work on visual gestures and audio commands recognition. The dataset has been collected with an integrated annotation and acquisition web-interface that facilitates on-the-way temporal ground-truths for fast acquisition. The dataset includes gesture instances in which...