This paper describes the speaker identification (SID) system developed by the Patrol team for the first phase of the DARPA RATS (Robust Automatic Transcription of Speech) program, which seeks to advance state-of-the-art detection capabilities on audio from highly degraded communication channels. We present results using multiple SID systems differing mainly in the algorithm used for voice activity...
In recent years, there have been significant advances in the field of speaker recognition that have resulted in very robust recognition systems. The primary focus of many recent developments has shifted to the problem of recognizing speakers in adverse conditions, e.g., in the presence of noise or reverberation. In this paper, we present the UMD-JHU speaker recognition system applied on the NIST 2010 SRE...
This paper summarizes the 2010 CLSP Summer Workshop on speech recognition at Johns Hopkins University. The key theme of the workshop was to improve on state-of-the-art speech recognition systems by using Segmental Conditional Random Fields (SCRFs) to integrate multiple types of information. This approach uses a state-of-the-art baseline as a springboard from which to add a suite of novel features...
Speech analysis requires substantial computation. It is desirable to run this analysis only when needed and at other times to go to a low-power state. Here we propose a self-biased, low-power speech-detection wake-up circuit which interfaces directly to standard electret microphones. The speech detector includes a microphone preamplifier, a power-extraction squaring circuit, a bandpass filter passing...
Humans are able to process speech and other sounds effectively in adverse environments, hearing through noise, reverberation, and interference from other speakers. To date, machines have been unable to match human performance. One profound difference between biological and engineering systems comes at the input stage. In machines, an acoustic signal is typically chopped into short equally spaced segments...
In this paper, we present a robust spectro-temporal feature extraction technique using autoregressive (AR) models of sub-band Hilbert envelopes. AR models of Hilbert envelopes are derived using frequency domain linear prediction (FDLP). From the sub-band Hilbert envelopes, spectral features are derived by integrating these envelopes in short-term frames, and the temporal features are formed by converting...
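The core FDLP step described above can be sketched compactly: linear prediction applied to the DCT coefficients of a signal segment yields an all-pole model of that segment's temporal (Hilbert) envelope, dual to ordinary linear prediction modeling the spectral envelope. The function below is an illustrative reconstruction, not the authors' code; the model order, segment length, and function name are arbitrary choices for the sketch.

```python
import numpy as np
from scipy.fft import dct
from scipy.signal import freqz

def fdlp_envelope(segment, order=20, npoints=512):
    """All-pole estimate of a segment's squared temporal envelope via FDLP.

    Linear prediction is run on the DCT coefficients of the segment
    (rather than on the waveform), so the AR model's "spectrum" is read
    out along the time axis of the segment.
    """
    c = dct(segment, type=2, norm="ortho")
    # Autocorrelation of the DCT sequence, lags 0..order
    full = np.correlate(c, c, mode="full")
    r = full[len(c) - 1 : len(c) + order]
    # Levinson-Durbin recursion for the AR coefficients
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        a[1 : i + 1] += k * a[0:i][::-1]
        err *= 1.0 - k * k
    # Sweeping the AR model over [0, pi) sweeps the segment in time
    _, h = freqz([np.sqrt(err)], a, worN=npoints)
    return np.abs(h) ** 2
```

For a burst of energy centered three quarters of the way through a segment, the returned envelope peaks around three quarters of the way along its axis, which is the time-frequency duality the technique relies on.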
This paper proposes a novel feature extraction technique for speech recognition based on the principles of sparse coding. The idea is to express a spectro-temporal pattern of speech as a linear combination of an overcomplete set of basis functions such that the weights of the linear combination are sparse. These weights (features) are subsequently used for acoustic modeling. We learn a set of overcomplete...
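As a toy illustration of the sparse-coding idea (not the authors' learned dictionary or solver), the greedy matching-pursuit sketch below approximates a vector as a sparse linear combination of unit-norm atoms from an overcomplete dictionary; the resulting sparse weight vector is the kind of feature the abstract describes. All names and sizes here are invented for the example.

```python
import numpy as np

def matching_pursuit(x, D, n_nonzero=5):
    """Greedy sparse coding: find a sparse w with x ~ D @ w.

    D has unit-norm columns (atoms) and more columns than rows,
    i.e. the dictionary is overcomplete.
    """
    residual = x.astype(float).copy()
    w = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        corr = D.T @ residual            # projection onto every atom
        j = int(np.argmax(np.abs(corr))) # best-matching atom
        w[j] += corr[j]                  # accumulate its weight
        residual -= corr[j] * D[:, j]    # remove its contribution
    return w

# Toy usage: a 64-dim "spectro-temporal patch", 256-atom dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
x = 2.0 * D[:, 17] - 1.5 * D[:, 101]     # truly sparse input
w = matching_pursuit(x, D)
```

With a generic random dictionary, the two planted atoms dominate the recovered weights and the reconstruction residual is small, which is the behavior the acoustic-modeling features exploit.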
In this paper, we present a new noise compensation technique for modulation frequency features derived from syllable length segments of subband temporal envelopes. The subband temporal envelopes are estimated using frequency domain linear prediction (FDLP). We propose a technique for noise compensation in FDLP where an estimate of the noise envelope is subtracted from the noisy speech envelope. The...
Frequency domain linear prediction (FDLP) represents an efficient technique for representing the long-term amplitude modulations (AM) of speech/audio signals using autoregressive models. For the proposed analysis technique, relatively long temporal segments (1000 ms) of the input signal are decomposed into a set of sub-bands. FDLP is applied on each sub-band to model the temporal envelopes. The residual...
This paper focuses on resolving a number of issues that appear when the performance of human speech recognition is compared to that of automatic speech recognition. In particular, human experimental data suggest that the resulting error is the product of the errors in the individual streams. On the other hand, Bayesian combination requires a multiplication of the estimates of prior probabilities and likelihoods. We...
We present a framework to apply Volterra series to analyze multi-layered perceptrons trained to estimate the posterior probabilities of phonemes in automatic speech recognition. The identified Volterra kernels reveal the spectro-temporal patterns that are learned by the trained system for each phoneme. To demonstrate the applicability of Volterra series, we analyze a multilayered perceptron trained...
We present a new feature extraction technique for phoneme recognition that uses short-term spectral envelope and modulation frequency features. These features are derived from sub-band temporal envelopes of speech estimated using frequency domain linear prediction (FDLP). While spectral envelope features are obtained by the short-term integration of the sub-band envelopes, the modulation frequency...
Automatic speech recognition (ASR) systems continue to make errors during search when handling various phenomena including noise, pronunciation variation, and out of vocabulary (OOV) words. Predicting the probability that a word is incorrect can prevent the error from propagating and perhaps allow the system to recover. This paper addresses the problem of detecting errors and OOVs for read Wall Street...
Audio coding based on frequency domain linear prediction (FDLP) uses an auto-regressive model to approximate Hilbert envelopes in frequency sub-bands over relatively long temporal segments. Although the basic technique achieves good quality of the reconstructed signal, there is a need to improve its coding efficiency. In this paper, we present a novel method for the application of temporal masking...
In this paper, we investigate the significance of contextual information in a phoneme recognition system using the hidden Markov model - artificial neural network paradigm. Contextual information is probed at the feature level as well as at the output of the multilayered perceptron. At the feature level, we analyze and compare different methods to model sub-phonemic classes. To exploit the contextual...
The modulation spectrum is an efficient representation for describing dynamic information in signals. In this work we investigate how to exploit different elements of the modulation spectrum for extraction of information in automatic speech recognition (ASR). Parallel and hierarchical (sequential) approaches are investigated. Parallel processing combines outputs of independent classifiers applied...
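One common way to obtain the modulation spectrum of a sub-band is shown below as a generic sketch (not this paper's exact front end; the band edges, sampling rate, and test signal are arbitrary choices): band-pass filter the signal, take the Hilbert envelope, and Fourier-transform the envelope.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def modulation_spectrum(x, fs, band=(300.0, 800.0)):
    """Magnitude modulation spectrum of one sub-band of x."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    sub = sosfilt(sos, x)
    env = np.abs(hilbert(sub))   # temporal envelope of the sub-band
    env = env - env.mean()       # drop DC so slow modulations stand out
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, spec

# Usage: a 500 Hz carrier amplitude-modulated at 4 Hz, 1 s at 8 kHz
fs = 8000
t = np.arange(fs) / fs
x = (1.0 + 0.8 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 500 * t)
freqs, spec = modulation_spectrum(x, fs)
```

For this signal the modulation spectrum is dominated by a component near 4 Hz, the syllable-rate range that ASR front ends based on this representation typically emphasize.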
This paper addresses the detection of OOV segments in the output of a large vocabulary continuous speech recognition (LVCSR) system. First, standard confidence measures from frame-based word- and phone-posteriors are investigated. Substantial improvement is obtained when posteriors from two systems, a strongly constrained one (LVCSR) and a weakly constrained one (a phone posterior estimator), are combined. We show...
Performance of a typical automatic speech recognition (ASR) system severely degrades when it encounters speech from reverberant environments. Part of the reason for this degradation lies in feature extraction techniques that use analysis windows much shorter than typical room impulse responses. We present a feature extraction technique based on modeling temporal envelopes of the speech signal...
In this paper we propose an extension of the very low bit-rate speech coding technique, exploiting predictability of the temporal evolution of spectral envelopes, for wide-band audio coding applications. Temporal envelopes in critical-band-sized sub-bands are estimated using frequency domain linear prediction applied to relatively long time segments. The sub-band residual signals, which play an...
The paper presents an alternative approach to automatic recognition of speech in which each targeted word is classified by a separate binary classifier against all other sounds. No time alignment is done. To build a recognizer for N words, N parallel binary classifiers are applied. The system first estimates uniformly sampled posterior probabilities of phoneme classes, followed by a second step in...