We consider a training data collection mechanism wherein, instead of annotating each training instance with a class label, additional features drawn from a known class-conditional distribution are acquired concurrently. Considering true labels as latent variables, a maximum likelihood approach is proposed to train a classifier based on these unlabeled training data. Furthermore, the case of correlated...
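The abstract above describes maximum-likelihood training with true labels as latent variables. A minimal sketch of that idea, under assumptions not taken from the paper (two classes, Gaussian features, EM updating only the unknown class-conditional means of the primary feature while the auxiliary feature's class-conditional distribution is known):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: auxiliary feature z has a KNOWN class-conditional
# Gaussian distribution; the primary feature x has unknown class-conditional
# means that we estimate by maximum likelihood with labels treated as latent.
mu_z = np.array([-1.0, 1.0])        # known class-conditional means of z
n = 500
y = rng.integers(0, 2, n)           # latent true labels (never observed)
x = rng.normal(2.0 * y - 1.0, 1.0)  # primary feature (true means -1, +1)
z = rng.normal(mu_z[y], 1.0)        # auxiliary feature, known distribution

mu_x = np.array([-0.1, 0.1])        # crude init for the unknown means of x
prior = np.array([0.5, 0.5])
for _ in range(50):
    # E-step: responsibilities r[i, c] proportional to p(c) p(x|c) p(z|c)
    logp = (np.log(prior)
            - 0.5 * (x[:, None] - mu_x) ** 2
            - 0.5 * (z[:, None] - mu_z) ** 2)
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: refit the unknown means and the class prior
    mu_x = (r * x[:, None]).sum(axis=0) / r.sum(axis=0)
    prior = r.mean(axis=0)

print(sorted(mu_x))  # recovered means should lie near -1 and +1
```

Because the auxiliary feature's distribution is known, it anchors the class identities and avoids the label-switching ambiguity of ordinary mixture fitting.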
Keystroke dynamics is a biometric characteristic that depends on the typing style of users. Over the past thirty years, dozens of classifiers have been proposed for distinguishing people by their keystroke dynamics, and many have obtained excellent results in evaluation. However, a more common case is that only normal instances are available and none of the rare classes are observed. This leads us to use...
Part-of-speech tagging, the problem of assigning each word of a text a part-of-speech tag, can be approached with several different methods or techniques. In this paper, we conducted part-of-speech tagging experiments for Bahasa Indonesia using statistical approaches (Unigram, Hidden Markov Models) and Brill's tagger. In this study, we used a supervised POS tagging approach requiring a large number of...
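The Unigram baseline named above can be sketched in a few lines: each word receives its most frequent tag in a training corpus, with a default tag for unseen words. The toy words, tags, and default below are illustrative assumptions, not the paper's data:

```python
from collections import Counter, defaultdict

# Hypothetical tagged training pairs (word, tag); tags are assumed here.
train = [("saya", "PRP"), ("makan", "VB"), ("nasi", "NN"),
         ("makan", "VB"), ("makan", "NN")]

counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def unigram_tag(word, default="NN"):
    """Assign the word's most frequent training tag, or the default."""
    return counts[word].most_common(1)[0][0] if word in counts else default

print([unigram_tag(w) for w in ["saya", "makan", "tidur"]])
# ['PRP', 'VB', 'NN'] -- "makan" resolves to VB (2 votes vs 1),
# unseen "tidur" falls back to the default tag
```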
Automatic facial action unit (AU) and expression detection from videos is a long-standing problem. The problem is challenging in part because classifiers must generalize to previously unknown subjects that differ markedly in behavior and facial morphology (e.g., heavy versus delicate brows, smooth versus deeply etched wrinkles) from those on which the classifiers are trained. While some progress has...
This paper investigates the use of a Dirichlet process hidden Markov model (DPHMM) tokenizer for the template-matching-based query-by-example spoken term detection (QbE-STD) task. The DPHMM can be obtained following an unsupervised iterative procedure without any training transcriptions. The STD performance of the DPHMM tokenizer is evaluated on the TIMIT corpus. We construct three kinds of DPHMM-based QbE-STD...
There are several challenges in building an Automatic Speech Recognition (ASR) system for low-resource languages such as Indic languages. One problem is access to the large amounts of training data required to build Acoustic Models (AM) from scratch. In the context of Indian English, another challenge encountered is code-mixing, as many Indian speakers are multilingual and exhibit code-mixing in their...
In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system; the culmination of four years of research and development under the IARPA Babel program. The system was constructed in response to the NIST Open Keyword Search (OpenKWS) evaluation of 2016. We present our technological breakthroughs in building top-performing keyword spotting processing systems for new...
Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov model (HMM) to obtain binary class labels as the targets for DNN training. Subword classes in speech recognition systems correspond to context-dependent tied states or senones. The present work addresses some limitations of GMM-HMM senone alignments for DNN training. We hypothesize...
This paper presents a new hybrid approach for polyphonic Sound Event Detection (SED) which incorporates a temporal structure modeling technique based on a hidden Markov model (HMM) with a frame-by-frame detection method based on a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). The proposed BLSTM-HMM hybrid system makes it possible to model sound event-dependent temporal...
When using connectionist temporal classification (CTC) based acoustic models (AMs) for large vocabulary continuous speech recognition (LVCSR), most previous studies have used a naive interpolation of the CTC-AM score and an additional language model score, although there is no theoretical justification for such an approach. On the other hand, we recently proposed a theoretically more sound decoding...
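The "naive interpolation" this abstract questions is commonly a weighted sum of the CTC acoustic-model log score and a language-model log score, often with a word-insertion bonus. A minimal sketch of that baseline combination (weights and scores below are illustrative assumptions, not from the paper):

```python
# Rank a hypothesis W for audio X by:
#   log P_CTC(X|W) + lm_weight * log P_LM(W) + word_bonus * |W|
def naive_score(log_p_ctc: float, log_p_lm: float,
                lm_weight: float = 0.8, word_count: int = 1,
                word_bonus: float = 1.0) -> float:
    return log_p_ctc + lm_weight * log_p_lm + word_bonus * word_count

# Two hypothetical two-word hypotheses:
h1 = naive_score(log_p_ctc=-12.0, log_p_lm=-3.0, word_count=2)
h2 = naive_score(log_p_ctc=-11.0, log_p_lm=-6.0, word_count=2)
best = max([("h1", h1), ("h2", h2)], key=lambda t: t[1])[0]
print(best)  # h1: -12 - 2.4 + 2 = -12.4 beats h2: -11 - 4.8 + 2 = -13.8
```

The point the abstract makes is that this weighted sum, however effective in practice, is heuristic: it is not derived from a single probabilistic model of the decoding problem.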
Batch normalization is a standard technique for training deep neural networks. In batch normalization, the input of each hidden layer is first mean-variance normalized and then linearly transformed before applying non-linear activation functions. We propose a novel unsupervised speaker adaptation technique for batch normalized acoustic models. The key idea is to adjust the linear transformations previously...
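The batch-norm computation described above can be sketched as follows, with NumPy and a single hidden layer as simplifying assumptions. The per-feature scale `gamma` and shift `beta` are the linear-transformation parameters that a speaker-adaptation scheme in this style would adjust per speaker:

```python
import numpy as np

def batch_norm_layer(x, gamma, beta, eps=1e-5):
    """Mean-variance normalize each feature, apply the linear transform
    (scale gamma, shift beta), then a ReLU non-linearity."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)       # mean-variance normalization
    return np.maximum(gamma * x_hat + beta, 0)  # linear transform + ReLU

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # toy mini-batch: 2 frames, 2 features
gamma = np.ones(2)              # adaptable per-feature scale
beta = np.zeros(2)              # adaptable per-feature shift
out = batch_norm_layer(x, gamma, beta)
print(out)
```

Adapting only `gamma` and `beta` keeps the speaker-specific parameter count tiny compared with retraining the full weight matrices.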
DNNs have shown remarkable performance in multilingual scenarios; however, these models are often so large that adaptation to a target language with a relatively small amount of data cannot be accomplished well. In our previous work, we utilized Low-Rank Factorization (LRF) using singular value decomposition for multilingual DNNs to learn compact models which can be adapted more successfully...
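Low-rank factorization via singular value decomposition, as named in this abstract, replaces a weight matrix with two thin factors. A sketch with assumed layer sizes (not the paper's actual network):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(1024, 512))   # hypothetical hidden-layer weight matrix
k = 64                             # retained rank (hyperparameter)

# SVD: W = U diag(s) V^T; keep only the top-k singular directions.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]               # shape (1024, k)
B = Vt[:k, :]                      # shape (k, 512)
W_approx = A @ B                   # rank-k reconstruction of W

orig_params = W.size               # 1024 * 512 = 524288
lrf_params = A.size + B.size       # 1024*64 + 64*512 = 98304
print(orig_params, lrf_params)
```

The layer then computes `B @ x` followed by `A @ (...)`, cutting parameters by more than 5x at this rank; the compact factors are what would be fine-tuned on the small target-language data.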
In this paper, a multi-stream framework with deep neural network (DNN) classifiers is applied to improve automatic speech recognition (ASR) performance in environments with different reverberation characteristics. We propose a room parameter estimation model to determine the stream weights for DNN posterior probability combination with the aim of obtaining reliable log-likelihoods for decoding...
In this paper, we analyze the feasibility of using a single well-resourced language, English, as a source language for multilingual techniques in the context of a Stacked Bottle-Neck tandem system. The effects of the amount of data and the number of tied states in the source language on the performance of the ported system are evaluated together with different porting strategies. Generally, increasing the data amount and level-of-detail...
Deep neural network (DNN) acoustic models can be adapted to under-resourced languages by transferring the hidden layers. An analogous transfer problem, popularly known as few-shot learning, is to recognise rarely seen objects based on their meaningful attributes. In a similar way, this paper proposes a principled way to represent the hidden layers of a DNN in terms of attributes shared across languages. The diverse...
Detecting pronunciation erroneous tendency (PET) can provide second-language learners with detailed, instructive feedback in computer-aided pronunciation training (CAPT) systems. Due to data sparseness, DNN-HMM achieved limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigates how to further improve the performance...
The main goal of this paper is to explain important terms of word sense disambiguation (WSD) in the Slovak language. A comprehensive survey of current approaches and evaluation methodologies is provided. Special attention is given to the necessary language resources and tools. The paper deals with problems specific to the Slovak language: missing language resources, rich morphology, free word order and...
Carnegie Mellon University's (CMU) Sphinx framework is increasingly used for Arabic speech recognition in general, and has been applied to the Holy Quran in particular. Generating the language model includes the tedious task of preparing transcriptions for all the data. In this paper, we investigate the fault-tolerance of an automatically generated language model as compared to a corrected and uncorrected...
This paper presents a novel method for acoustic modeling of an under-resourced language by “mapping” from acoustic models of well-resourced languages. The proposed method can be considered as a “many-to-one mapping” method where one speech unit in the target language is built as a linear combination of the source speech unit models and hence we can explicitly observe the relationship of the source...
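The "many-to-one mapping" idea above, a target speech unit built as a linear combination of source speech unit models, can be illustrated on toy model means (the units, dimensions, and weights below are purely hypothetical):

```python
import numpy as np

# Means of three source-language unit models (2-D toy feature space).
source_means = np.array([[0.0, 1.0],    # source unit A
                         [2.0, 0.0],    # source unit B
                         [4.0, 3.0]])   # source unit C

# Learned mapping weights for one target-language unit; they sum to 1,
# and their magnitudes expose how strongly each source unit contributes.
weights = np.array([0.5, 0.3, 0.2])

target_mean = weights @ source_means
print(target_mean)  # [0.5*0 + 0.3*2 + 0.2*4, 0.5*1 + 0.3*0 + 0.2*3] = [1.4, 1.1]
```

Reading off the weights is what lets the method "explicitly observe the relationship" between source units and the target unit, unlike an opaque retrained model.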
The increasing profusion of commercial automatic speech recognition technology applications has been driven by big-data techniques using high-quality labelled speech datasets. Children's speech has greater time- and frequency-domain variability than typical adult speech, lacks good large-scale training data, and presents difficulties relating to capture quality. Each of these factors reduces the performance...