Search results

Items 1 to 20 of 71 results

chapter

Creativity: Generating Diverse Questions Using Variational Autoencoders

Unnat Jain, Ziyu Zhang, Alexander Schwing

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5415 - 5424

Generating diverse questions for given images is an important task for computational education, entertainment and AI assistants. Different from many conventional prediction techniques is the need for algorithms to generate a diverse set of plausible questions, which we refer to as creativity. In this paper we propose a creative algorithm for visual question generation which combines the advantages...

chapter

Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects

Ting Yao, Yingwei Pan, Yehao Li, Tao Mei

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 5263 - 5271

Image captioning often requires a large set of training image-sentence pairs. In practice, however, acquiring sufficient training pairs is always expensive, making the recent captioning models limited in their ability to describe objects outside of training corpora (i.e., novel objects). In this paper, we present Long Short-Term Memory with Copying Mechanism (LSTM-C) — a new architecture...

chapter

On Human Motion Prediction Using Recurrent Neural Networks

Julieta Martinez, Michael J. Black, Javier Romero

2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) > 4674 - 4683

Human motion modelling is a classical problem at the intersection of graphics and computer vision, with applications spanning human-computer interaction, motion synthesis, and motion prediction for virtual and augmented reality. Following the success of deep learning methods in several computer vision tasks, recent work has focused on using deep recurrent neural networks (RNNs) to model human motion,...

chapter

Improving acoustic modeling using audio-visual speech

Ahmed Hussen Abdelaziz

2017 IEEE International Conference on Multimedia and Expo (ICME) > 1081 - 1086

Reliable visual features that encode the articulator movements of speakers can dramatically improve the decoding accuracy of automatic speech recognition systems when combined with the corresponding acoustic signals. In this paper, a novel framework is proposed to utilize audio-visual speech not only during decoding but also for training better acoustic models. In this framework, a multi-stream hidden...

chapter

Lip-reading via a DNN-HMM hybrid system using combination of the image-based and model-based features

Mohammad Hasan Rahmani, Farshad Almasganj

2017 3rd International Conference on Pattern Recognition and Image Analysis (IPRIA) > 195 - 199

Introducing features that better represent the visual information of speakers during speech production is still an open issue that strongly affects the quality of lip-reading and Audio Visual Speech Recognition (AVSR) tasks. In this paper, three different types of visual features from both the image-based and model-based ones are investigated inside a professional lip reading task. The simple...

chapter

Improving audio-visual speech recognition using deep neural networks with dynamic stream reliability estimates

Hendrik Meutzner, Ning Ma, Robert Nickel, Christopher Schymura, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5320 - 5324

Audio-visual speech recognition is a promising approach to tackling the problem of reduced recognition rates under adverse acoustic conditions. However, finding an optimal mechanism for combining multi-modal information remains a challenging task. Various methods are applicable for integrating acoustic and visual information in Gaussian-mixture-model-based speech recognition, e.g., via dynamic stream...

chapter

Expressive visual text to speech and expression adaptation using deep neural networks

Jonathan Parker, Ranniery Maia, Yannis Stylianou, Roberto Cipolla

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4920 - 4924

In this paper, we present an expressive visual text to speech system (VTTS) based on a deep neural network (DNN). Given an input text sentence and a set of expression tags, the VTTS is able to produce not only the audio speech, but also the accompanying facial movements. The expressions can either be one of the expressions in the training corpus or a blend of expressions from the training corpus....

chapter

End-to-end visual speech recognition with LSTMS

Stavros Petridis, Zuwei Li, Maja Pantic

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2592 - 2596

Traditional visual speech recognition systems consist of two stages, feature extraction and classification. Recently, several deep learning approaches have been presented which automatically extract features from the mouth images and aim to replace the feature extraction stage. However, research on joint learning of features and classification is very limited. In this work, we present an end-to-end...

chapter

Semi-supervised understanding of complex activities from temporal concepts

Carlos Fernando Crispim, Michal Koperski, Serhan Cosar, Francois Bremond

2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) > 80 - 87

Methods for action recognition have evolved considerably over the past years and can now automatically learn and recognize short term actions with satisfactory accuracy. Nonetheless, the recognition of complex activities - compositions of actions and scene objects - is still an open problem due to the complex temporal and composite structure of this category of events. Existing methods focus either...

chapter

Embodied gesture learning from one-shot

Maria E. Cabrera, Juan P. Wachs

2016 25th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) > 1092 - 1097

This paper discusses the problem of one shot gesture recognition. This is relevant to the field of human-robot interaction, where the user's intentions are indicated through spontaneous gesturing (one shot) to the robot. The novelty of this work consists of learning the process that leads to the creation of a gesture, rather than on the gesture itself. In our case, the context involves the way in which...

chapter

Deep neural network and switching Kalman filter based continuous affect recognition

Ercheng Pei, Xiaohan Xia, Le Yang, Dongmei Jiang, et al.

2016 IEEE International Conference on Multimedia & Expo Workshops (ICMEW) > 1 - 6

In this paper, we propose the deep neural network - switching Kalman filter (DNN-SKF) based frameworks for both single modal and multi-modal continuous affective dimension estimation. The DNN-SKF framework firstly models the complex nonlinear relationship between the input (audio, visual, or lexical) features and the affective dimensions via the non-recurrent DNN, then models the temporal dynamics...

chapter

Describing images by feeding LSTM with structural words

Shubo Ma, Yahong Han

2016 IEEE International Conference on Multimedia and Expo (ICME) > 1 - 6

Generating semantic descriptions has drawn increasing attention recently. Describing objects with adaptive adjunct words makes the sentence more informative. In this paper, we focus on the generation of descriptions for images according to the structural words we have generated such as a tetrad of <object, attribute, activity, scene>. We propose to use a deep machine translation method to generate semantically...

chapter

An end-to-end generative framework for video segmentation and recognition

Hilde Kuehne, Juergen Gall, Thomas Serre

2016 IEEE Winter Conference on Applications of Computer Vision (WACV) > 1 - 8

We describe an end-to-end generative approach for the segmentation and recognition of human activities. In this approach, a visual representation based on reduced Fisher Vectors is combined with a structured temporal model for recognition. We show that the statistical properties of Fisher Vectors make them an especially suitable front-end for generative models such as Gaussian mixtures. The system...

chapter

Skeleton-Free Body Pose Estimation from Depth Images for Movement Analysis

Ben Crabbe, Adeline Paiement, Sion Hannuna, Majid Mirmehdi

2015 IEEE International Conference on Computer Vision Workshop (ICCVW) > 312 - 320

In movement analysis frameworks, body pose may often be adequately represented in a simple, low-dimensional, and high-level space, while full body joints' locations constitute excessively redundant and complex information. We propose a method for estimating body pose in such high-level pose spaces, directly from a depth image and without relying on intermediate skeleton-based steps. Our method is...

chapter

Achieving "synergy" in cognitive behavior of humanoids via deep learning of dynamic visuo-motor-attentional coordination

Jungsik Hwang, Minju Jung, Naveen Madapana, Jinhyung Kim, et al.

2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids) > 817 - 824

The current study examines how adequate coordination among different cognitive processes including visual recognition, attention switching, action preparation and generation can be developed via learning of robots by introducing a novel model, the Visuo-Motor Deep Dynamic Neural Network (VMDNN). The proposed model is built on coupling of a dynamic vision network, a motor generation network, and a...

chapter

Noise-robust and stress-free visualization of pronunciation diversity of World Englishes using a learner's self-centered viewpoint

Yuichi Sato, Yosuke Kashiwagi, Nobuaki Minematsu, Daisuke Saito, et al.

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) > 1 - 6

The term “World Englishes” describes the current and real state of English, and one of its main characteristics is a large diversity of pronunciation, called accents. We have developed two techniques of individual-based clustering of the diversity [1, 2] and educationally-effective visualization of the diversity [3]. Accent clustering requires a technique to quantify the accent gap between any...

chapter

An expectation maximisation algorithm for behaviour analysis in video

Olga Isupova, Lyudmila Mihaylova, Danil Kuzin, Garik Markarian, et al.

2015 18th International Conference on Information Fusion (Fusion) > 126 - 133

Surveillance systems require advanced algorithms able to make decisions without a human operator or with minimal assistance from human operators. In this paper we propose a novel approach for dynamic topic modeling to detect abnormal behaviour in video sequences. The topic model describes activities and behaviours in the scene assuming behaviour temporal dynamics. The new inference scheme based on...

chapter

Human action recognition using an improved string edit distance

Pasquale Foggia, Benoit Gauzere, Alessia Saggese, Mario Vento

2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) > 1 - 6

In this paper we propose an improvement of a human action recognition method that uses a string-based representation and a string edit distance to compare the observed action with reference actions in the training set. In particular, the original improvement is based on a specific formulation of the string edit distance that is more suited to take into account the problems related to noise and to...

chapter

LIP movement generation using restricted Boltzmann machines for visual speech synthesis

Zheng-Chen Liu, Zhen-Hua Ling, Li-Rong Dai

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) > 606 - 610

This paper proposes methods of using restricted Boltzmann machines (RBM) to generate the sequence of lip images for visual speech synthesis. The aim of our proposed methods is to alleviate the over-smoothing effect of the conventional hidden Markov model (HMM) based statistical approach for lip synthesis. Two model structures using RBMs to model and generate lip movements are investigated in this...

chapter

An Exemplar-Based Hidden Markov Model with Discriminative Visual Features for Lipreading

Xin Liu, Yiu-Ming Cheung

2014 Tenth International Conference on Computational Intelligence and Security (CIS) > 90 - 93

In this paper, we address an exemplar-based hidden Markov model (HMM) that represents the lip motion activity using visual cues for lipreading. The discriminative visual features including the geometric shape parameters and contour-constrained spatial histogram are selected for representing each lip frame. Then, a set of exemplars associated with the HMM is learned jointly to serve as a typical representation...

Filter options

Keywords:
VISUALIZATION
TRAINING
HIDDEN MARKOV MODELS

Keywords

FEATURE EXTRACTION (32)
SPEECH (19)
SPEECH RECOGNITION (19)
COMPUTATIONAL MODELING (9)
LEARNING (ARTIFICIAL INTELLIGENCE) (9)
ACCURACY (8)
IMAGE CLASSIFICATION (8)
ACOUSTICS (6)
HIDDEN MARKOV MODEL (6)
IMAGE SEGMENTATION (6)
SUPPORT VECTOR MACHINES (6)
VECTORS (6)
DATA MINING (5)
DATABASES (5)
IMAGE RETRIEVAL (5)
TRAJECTORY (5)
VIDEO SIGNAL PROCESSING (5)
ADAPTATION MODEL (4)
AUTOMATIC IMAGE ANNOTATION (4)
CLASSIFICATION ALGORITHMS (4)
HISTOGRAMS (4)
JOINTS (4)
NEURAL NETWORKS (4)
PRINCIPAL COMPONENT ANALYSIS (4)
PROBABILITY (4)
TESTING (4)
ADAPTIVE ASYMMETRIC LEARNING (3)
ARTIFICIAL NEURAL NETWORKS (3)
ASPECT MODEL (3)
CONTENT-BASED RETRIEVAL (3)
DATA MODELS (3)
EMOTION RECOGNITION (3)
FACE RECOGNITION (3)
GESTURE RECOGNITION (3)
HANDICAPPED AIDS (3)
HMM (3)
IMAGE SEQUENCES (3)
MOUTH (3)
NOISE (3)
PLSA (3)
PREDICTIVE MODELS (3)
PROBABILISTIC LATENT SEMANTIC ANALYSIS (3)
RECURRENT NEURAL NETWORKS (3)
ROBUSTNESS (3)
SIGNAL TO NOISE RATIO (3)
SPEECH PROCESSING (3)
SPEECH SYNTHESIS (3)
STANDARD COREL DATASET (3)
SUPPORT VECTOR MACHINE (3)
TRAINING DATA (3)
TRANSFORMS (3)
VISUAL SPEECH SYNTHESIS (3)
ACOUSTIC SIGNAL PROCESSING (2)
AFFECT RECOGNITION (2)
APPROXIMATION THEORY (2)
AUDIO-VISUAL AUTOMATIC SPEECH RECOGNITION (2)
AUDIO-VISUAL SPEECH RECOGNITION (2)
AUDIO-VISUAL SYSTEMS (2)
CAMERAS (2)
COMPLEXITY THEORY (2)
CONFERENCES (2)
CONTENT-BASED IMAGE RETRIEVAL (2)
CORRELATION (2)
DECODING (2)
DOCUMENT IMAGE PROCESSING (2)
DYNAMIC STREAM WEIGHTING (2)
ESTIMATION (2)
FACIAL EXPRESSIONS (2)
GAMES (2)
HUMANS (2)
IMAGE ANNOTATION (2)
IMAGE CODING (2)
IMAGE COLOR ANALYSIS (2)
IMAGE PROCESSING (2)
IMAGE REPRESENTATION (2)
INFORMATION RETRIEVAL (2)
KERNEL (2)
LIPREADING (2)
LIPS (2)
MACHINE LEARNING (2)
MATHEMATICAL MODEL (2)
MEASUREMENT (2)
MOBILE ROBOTS (2)
MODULATION (2)
MULTIMEDIA COMMUNICATION (2)
OBJECT RECOGNITION (2)
PHOTO-REAL (2)
PRODUCTION (2)
RELIABILITY (2)
ROBOT VISION (2)
TALKING HEAD (2)
TEXTUAL MODALITIES (2)
TRAJECTORY-GUIDED (2)
VECTOR QUANTISATION (2)
VECTOR QUANTIZATION (2)
VIDEO CODING (2)
VIDEO SEGMENTATION (2)

INFONA - science communication portal
