Search results

Dynamic and interactive gesture recognition algorithm based on Kinect

Guangjun Dai, Lei Yu, Jun Huang

2017 29th Chinese Control And Decision Conference (CCDC) > 3479 - 3484

With the constant development of smart devices, gesture recognition is being applied in more and more fields. Gesture recognition devices currently on the market are inconvenient and expensive. A gesture recognition method based on Kinect is proposed in this paper. The Kinect camera is used to capture gesture images, and then a hidden Markov model is established to recognize dynamic gestures so that the operation...

Acoustic novelty detection with adversarial autoencoders

Emanuele Principi, Fabio Vesperini, Stefano Squartini, Francesco Piazza

2017 International Joint Conference on Neural Networks (IJCNN) > 3324 - 3330

Novelty detection is the task of recognising events that differ from a model of normality. This paper proposes an acoustic novelty detector based on neural networks trained with an adversarial training strategy. The proposed approach is composed of a feature extraction stage that calculates Log-Mel spectral features from the input signal. Then, an autoencoder network, trained on a corpus of “normal”...

Asymmetric stacked autoencoder

Angshul Majumdar, Aditay Tripathi

2017 International Joint Conference on Neural Networks (IJCNN) > 911 - 918

Traditional stacked autoencoders have an equal number of encoders and decoders. However, when fine-tuned as a deep neural network, the decoder portion is detached and never used. This raises the question: ‘do we need an equal number of decoders and encoders?’ In this study we explore asymmetric autoencoders, with an unequal number of encoders and decoders. We specifically address two tasks: 1. Classification...

Relational autoencoder for feature extraction

Qinxue Meng, Daniel Catchpoole, David Skillicorn, Paul J. Kennedy

2017 International Joint Conference on Neural Networks (IJCNN) > 364 - 371

Feature extraction becomes increasingly important as data grows high dimensional. Autoencoder as a neural network based feature extraction method achieves great success in generating abstract features of high dimensional data. However, it fails to consider the relationships of data samples which may affect experimental results of using original and new features. In this paper, we propose a Relation...

A class-specific copy network for handling the rare word problem in neural machine translation

Feng Wang, Wei Chen, Zhen Yang, Xiaowei Zhang, more

2017 International Joint Conference on Neural Networks (IJCNN) > 2658 - 2664

Neural machine translation (NMT) has shown promising results and rapidly gained adoption in many large-scale settings. With NMT models being widely used in production settings, their long-standing weakness in handling rare and out-of-vocabulary words has been amplified. In order to release the model from the stress of “understanding” the rare words, the copy mechanism has been proposed to...

Class-wise deep dictionary learning

Vanika Singhal, Prerna Khurana, Angshul Majumdar

2017 International Joint Conference on Neural Networks (IJCNN) > 1125 - 1132

In this work we propose a new framework for combined feature extraction and classification. The basic idea stems from sparse representation based classification, wherein the training samples from each class are assumed to form a basis for representing the same. Later studies learned a basis for each class using dictionary learning; these were shallow techniques where only one level of dictionary...

Feature selection using multiple auto-encoders

Xinyu Guo, Ali A. Minai, Long J. Lu

2017 International Joint Conference on Neural Networks (IJCNN) > 4602 - 4609

Real-world data such as medical images and sensor measurements is usually high-dimensional and limited. Using such datasets directly in machine learning tasks can lead to poor generalization. Feature learning is a general approach for transforming high-dimensional data points to a representational space with lower dimensionality. Machine learning models can be trained efficiently with such representations...

Hybrid deep autoencoder with Curvature Gaussian for detection of various types of cells in bone marrow trephine biopsy images

Tzu-Hsi Song, Victor Sanchez, Hesham ElDaly, Nasir M. Rajpoot

2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017) > 1040 - 1043

Automated cell detection is a critical step for a number of computer-assisted, pathology-related image analysis algorithms. However, automated cell detection is complicated due to the variable cytomorphological and histological factors associated with each cell. In order to efficiently resolve the challenge of automated cell detection, deep learning strategies are widely applied and have recently been...

Cluster Adapted Signalling for Intra Prediction in HEVC

Kevin Reuze, Pierrick Philippe, Wassim Hamidouche, Olivier Deforges

2017 Data Compression Conference (DCC) > 191 - 200

The High Efficiency Video Coding (HEVC) standard defines 35 Intra Prediction Modes (IPM) to provide an efficient compression of intra coded blocks. Those IPMs are signalled to the decoder through the use of three compression tools: prediction, clustering and coding. In this paper we provide improvements to these three tools through: new labels for the prediction, new tests for the clustering and new...

Fast speech keyword recognition based on improved filler model

Yang Wang, Jie Yang, Le Zhang

2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) > 530 - 534

Most traditional template-matching-based keyword recognition methods need no training data and rely only on frame matching. However, their recognition speed is relatively slow, so they cannot be used in practice. The LVCSR-based method needs to convert the speech signal into text before recognition, which has an important impact on the final recognition performance. In this paper, we propose a method...

Analysis of keyword spotting performance across IARPA babel languages

William Hartmann, Damianos Karakos, Roger Hsiao, Le Zhang, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5765 - 5769

With the completion of the IARPA Babel program, it is possible to systematically analyze the performance of speech recognition systems across a wide variety of languages. We select 16 languages from the dataset and compare performance using a deep neural network-based acoustic model. The focus is on keyword spotting using the actual term-weighted value (ATWV) metric. We demonstrate that ATWV is keyword...

Advances in all-neural speech recognition

Geoffrey Zweig, Chengzhu Yu, Jasha Droppo, Andreas Stolcke

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4805 - 4809

This paper advances the design of CTC-based all-neural (or end-to-end) speech recognizers. We propose a novel symbol inventory, and a novel iterated-CTC method in which a second system is used to transform a noisy initial output into a cleaner version. We present a number of stabilization and initialization methods we have found useful in training these networks. We evaluate our system on the commonly...

Encoder-decoder with focus-mechanism for sequence labelling based spoken language understanding

Su Zhu, Kai Yu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5675 - 5679

This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding. We introduce Bidirectional Long Short Term Memory - Long Short Term Memory networks (BLSTM-LSTM) as the encoder-decoder model to fully utilize the power of deep learning. In the sequence labelling task, the input and output sequences are aligned word by word, while the...

Combination strategy based on relative performance monitoring for multi-stream reverberant speech recognition

Feifei Xiong, Stefan Goetze, Bernd T. Meyer

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4870 - 4874

A multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) in environments with different reverberation characteristics. We propose a room parameter estimation model to establish a reliable combination strategy which performs on either DNN posterior probabilities or word lattices. The model is implemented by training a multilayer perceptron...

Supervised monaural source separation based on autoencoders

Keiichi Osako, Yuki Mitsufuji, Rita Singh, Bhiksha Raj

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 11 - 15

In this paper, we propose a new supervised monaural source separation method based on autoencoders. We employ the autoencoder for the dictionary training such that the nonlinear network can encode the target source with high expressiveness. The dictionary is trained by each target source without the mixture signal, which makes the system independent from the context where the dictionaries will be used. In...

Deterministic annealing based design of error resilient predictive compression systems

Bharath Vishwanath, Tejaswi Nanjundaswamy, Sina Zamani, Kenneth Rose

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3684 - 3688

This paper considers near-optimal design of predictive compression systems that account for packet loss over unreliable networks. Major challenges to address include propagation of errors due to packet loss through the prediction loop, mismatch between statistics used for design and during operation, and above all a cost function that is fraught with poor local minima. Accurately estimating and minimizing...

Minimum Bayes risk training of CTC acoustic models in maximum a posteriori based decoding framework

Naoyuki Kanda, Xugang Lu, Hisashi Kawai

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4855 - 4859

When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...

Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features

Ondrej Klejch, Peter Bell, Steve Renals

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5700 - 5704

In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows the system to map from per frame acoustic features to word level representations by replacing the traditional encoder in the encoder-decoder architecture with a hierarchical encoder. Furthermore, we show that a system combining lexical and acoustic...

End-to-end speech recognition and keyword search on low-resource languages

Andrew Rosenberg, Kartik Audhkhasi, Abhinav Sethy, Bhuvana Ramabhadran, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5280 - 5284

In recent years, so-called “end-to-end” speech recognition systems have emerged as viable alternatives to traditional ASR frameworks. Keyword search, localizing an orthographic query in a speech corpus, is typically performed by using automatic speech recognition (ASR) to generate an index. Previous work has evaluated the use of end-to-end systems for ASR on well known corpora (WSJ, Switchboard,...

Effective keyword search for low-resourced conversational speech

Rasa Lileikyte, Thiago Fraga-Silva, Lori Lamel, Jean-Luc Gauvain, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5785 - 5789

In this paper we aim to enhance keyword search for conversational telephone speech under low-resourced conditions. Two techniques to improve the detection of out-of-vocabulary keywords are assessed in this study: using extra text resources to augment the lexicon and language model, and using subword units for keyword search. Two approaches for data augmentation are explored to extend the limited amount...

INFONA - science communication portal
