Search results

Items from 1 to 20 out of 73 results

chapter

An Empirical Study of Language CNN for Image Captioning

Jiuxiang Gu, Gang Wang, Jianfei Cai, Tsuhan Chen

2017 IEEE International Conference on Computer Vision (ICCV) > 1231 - 1240

Language models based on recurrent neural networks have dominated recent image caption generation tasks. In this paper, we introduce a language CNN model which is suitable for statistical language modeling tasks and shows competitive performance in image captioning. In contrast to previous models which predict next word based on one previous word and hidden state, our language CNN is fed with all...

chapter

Novel hybrid CNN-SVM model for recognition of functional magnetic resonance images

Xiaolong Sun, Juyoung Park, Kyungtae Kang, Junbeom Hur

2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC) > 1001 - 1006

This paper proposes a novel hybrid model that integrates the synergy of two superior classifiers for functional magnetic resonance imaging (fMRI) recognition, namely, convolutional neural networks (CNNs) and support vector machines (SVMs), both of which have proven results in the field of image recognition. In the proposed model, the CNN functions as a trainable feature extractor and the SVM functions...

chapter

Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval

Jifei Song, Qian Yu, Yi-Zhe Song, Tao Xiang, more

2017 IEEE International Conference on Computer Vision (ICCV) > 5552 - 5561

Human sketches are unique in being able to capture both the spatial topology of a visual object, as well as its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) importantly leverages on such fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments...

chapter

Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies

Amir Sadeghian, Alexandre Alahi, Silvio Savarese

2017 IEEE International Conference on Computer Vision (ICCV) > 300 - 311

The majority of existing solutions to the Multi-Target Tracking (MTT) problem do not combine cues over a long period of time in a coherent fashion. In this paper, we present an online method that encodes long-term temporal dependencies across multiple cues. One key challenge of tracking methods is to accurately track occluded targets or those which share similar appearance properties with surrounding...

chapter

Look, Perceive and Segment: Finding the Salient Objects in Images via Two-stream Fixation-Semantic CNNs

Xiaowu Chen, Anlin Zheng, Jia Li, Feng Lu

2017 IEEE International Conference on Computer Vision (ICCV) > 1050 - 1058

Recently, CNN-based models have achieved remarkable success in image-based salient object detection (SOD). In these models, a key issue is to find a proper network architecture that best fits for the task of SOD. Toward this end, this paper proposes two-stream fixation-semantic CNNs, whose architecture is inspired by the fact that salient objects in complex images can be unambiguously annotated by...

chapter

Multi-scale Deep Learning Architectures for Person Re-identification

Xuelin Qian, Yanwei Fu, Yu-Gang Jiang, Tao Xiang, more

2017 IEEE International Conference on Computer Vision (ICCV) > 5409 - 5418

Person Re-identification (re-id) aims to match people across non-overlapping camera views in a public space. It is a challenging problem because many people captured in surveillance videos wear similar clothes. Consequently, the differences in their appearance are often subtle and only detectable at the right location and scales. Existing re-id models, particularly the recently proposed deep learning...

chapter

Mobile content based image retrieval architectures

Arif Rahman, Edi Winarko, Moh. Edi Wibowo

2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI) > 1 - 4

Mobile device features such as cameras and other sensors are evolving rapidly nowadays. Supported by a reliable communications network, this enables new methods of information retrieval. A mobile device can capture an image with its camera and pass it to a retrieval system to get the information needed. This system, called Mobile Content-Based Image Retrieval (MCBIR), generally consists of two parts:...

chapter

VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization

Ronald Clark, Sen Wang, Andrew Markham, Niki Trigoni, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 2652 - 2660

Machine learning techniques, namely convolutional neural networks (CNN) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images. However, in most cases image sequences, rather than only single images, are readily available. To this extent, none of the proposed learning-based approaches exploit the valuable constraint of temporal smoothness, often leading...

chapter

Spatially Adaptive Computation Time for Residual Networks

Michael Figurnov, Maxwell D. Collins, Yukun Zhu, Li Zhang, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 1790 - 1799

This paper proposes a deep learning architecture based on Residual Network that dynamically adjusts the number of executed layers for the regions of the image. This architecture is end-to-end trainable, deterministic and problem-agnostic. It is therefore applicable without any modifications to a wide range of computer vision problems such as image classification, object detection and image segmentation...

chapter

Accurate Single Stage Detector Using Recurrent Rolling Convolution

Jimmy Ren, Xiaohao Chen, Jianbo Liu, Wenxiu Sun, more

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 752 - 760

Most of the recent successful methods in accurate object detection and localization used some variants of R-CNN style two stage Convolutional Neural Networks (CNN) where plausible regions were proposed in the first stage then followed by a second stage for decision refinement. Despite the simplicity of training and the efficiency in deployment, the single stage detection methods have not been as competitive...

chapter

Three-Skips CNN for road scene semantic segmentation

Jing Tang, Xin Wang

2017 IEEE International Conference on Information and Automation (ICIA) > 858 - 863

In this paper we propose a deep learning architecture to make the best use of global and local information for pixel-wise semantic segmentation. The architecture of the three-skips CNN is built with the convolutional layers of the VGG16 network and their mirrored convolutional layers. Our architecture is aimed at road scene understanding. In order to save memory and computational time, we use unpooling layers to map...

chapter

Deep learning for multimodal-based video interestingness prediction

Yuesong Shen, Claire-Hélène Demarty, Ngoc Q. K. Duong

2017 IEEE International Conference on Multimedia and Expo (ICME) > 1003 - 1008

Predicting interestingness of media content remains an important, but challenging research subject. The difficulty comes first from the fact that, besides being a high-level semantic concept, interestingness is highly subjective and its global definition has not been agreed yet. This paper presents the use of up-to-date deep learning techniques for solving the task. We perform experiments with both...

chapter

A joint model for action localization and classification in untrimmed video with visual attention

Weimian Li, Wenmin Wang, Xiongtao Chen, Jinzhuo Wang, more

2017 IEEE International Conference on Multimedia and Expo (ICME) > 619 - 624

In this paper, we introduce a joint model that learns to directly localize the temporal bounds of actions in untrimmed videos as well as precisely classify what actions occur. Most existing approaches tend to scan the whole video to generate action instances, which is highly inefficient. Instead, inspired by human perception, our model is formulated based on a recurrent neural network to observe...

chapter

On Detecting Partially Occluded Faces with Pose Variations

Tarik Alafif, Zeyad Hailat, Melih Aslan, Xuewen Chen

2017 14th International Symposium on Pervasive Systems, Algorithms and Networks & 2017 11th International Conference on Frontier of Computer Science and Technology & 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC) > 28 - 37

Face detection in unconstrained environments is a challenging problem due to partial occlusions with pose variations. Existing partially occluded face detection methods require training several models, computing hand-crafted features, or both. In this paper, our contributions are two-fold. First, we propose our Large-Scale Deep Learning (LSDL), a method that requires a single Convolutional Neural Network...

chapter

An embedded FPGA architecture for efficient visual saliency based object recognition implementation

Hanen Chenini

2017 6th International Conference on Systems and Control (ICSC) > 187 - 192

In this article, we propose a new optimized embedded architecture, based on soft-core processors, oriented to visual-attention-based object recognition applications. Our recognition approach relies mainly on two specific modules for online processing of acquired images in real time: a novel saliency-based feature detector/descriptor module and an object classifier module. To deal with such parallel/pipeline...

chapter

Character-level deep conflation for business data analytics

Zhe Gan, P. D. Singh, Ameet Joshi, Xiaodong He, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2222 - 2226

Connecting different text attributes associated with the same entity (conflation) is important in business data analytics since it could help merge two different tables in a database to provide a more comprehensive profile of an entity. However, the conflation task is challenging because two text strings that describe the same entity could be quite different from each other for reasons such as misspelling...

chapter

Pre-trained classifiers with One Shot Similarity for context aware face verification and identification

Monika Sharma, Ramya Hebbalaguppe, Lovekesh Vig

2017 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA) > 1 - 7

Most affect based systems analyse facial expressions for emotion detection, and utilize face detection and recognition methods in order to do effective affect analysis. Recent work has demonstrated the efficacy of deep architectures for face recognition by training as classifiers on voluminous datasets. Some architectures are trained as classifiers, and some directly learn an embedding via a triplet...

chapter

Convolutional Neural Networks for object recognition on mobile devices: A case study

Luis Tobias, Aurelien Ducournau, Francois Rousseau, Gregoire Mercier, more

2016 23rd International Conference on Pattern Recognition (ICPR) > 3530 - 3535

Deep Learning (DL), especially Convolutional Neural Networks (CNN), has become the state of the art for a variety of pattern recognition problems. Technological developments have allowed the use of high-end General Purpose Graphic Processor Units (GPGPU) for accelerating numerical problem solving. They not only lower computational time, but also allow considering much larger networks. Hence,...

chapter

End-to-End attention based text-dependent speaker verification

Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, more

2016 IEEE Spoken Language Technology Workshop (SLT) > 171 - 178

A new type of End-to-End system for text-dependent speaker verification is presented in this paper. Previously, using the phonetic discriminate/speaker discriminate DNN as a feature extractor for speaker verification has shown promising results. The extracted frame-level (bottleneck, posterior or d-vector) features are equally weighted and aggregated to compute an utterance-level speaker representation...

chapter

Non-deep CNN for multi-modal image classification and feature learning: An Azure-based model

Sohini Roychowdhury, Johnny Ren

2016 IEEE International Conference on Big Data (Big Data) > 2893 - 2812

Convolutional Neural Networks (CNN) are useful methods for identification of previously unknown embedded patterns in images. Several object and facial recognition along with image segmentation tasks have benefited from the non-linear abstraction of hybrid features using CNN. This work presents a novel CNN model parametrization work-flow developed on the cloud-computing platform of Microsoft Azure...

Data set:
ieee
Keywords:
COMPUTER ARCHITECTURE
COMPUTATIONAL MODELING
FEATURE EXTRACTION

INFONA - science communication portal
