Search results for: M. Hasegawa-Johnson

Items from 1 to 20 out of 20 results

chapter

Toward overcoming fundamental limitation in frequency-domain blind source separation for reverberant speech mixtures

Lae-Hoon Kim, M Hasegawa-Johnson

2010 Conference Record of the Forty Fourth Asilomar Conference on Signals, Systems and Computers > 542 - 545

2010 44th Asilomar Conference on Signals, Systems and Computers

Blind source separation can be implemented in the frequency domain using one-tap multiplication operation in each frequency bin, but only when the frame length is long enough to disregard temporal aliasing effects. If we take a short-time frequency transformation with a window shorter than a room reverberation time, the justification above does not hold anymore. In this paper, we present an appropriate...

chapter

Joint estimation of DOA and speech based on EM beamforming

Lae-Hoon Kim, M Hasegawa-Johnson, G Potamianos, V Libal

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 121 - 124

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

In this paper, we propose a multi-microphone joint optimal estimation of the direction of arrival (DOA) and the source speech signal through newly introduced EM beamforming. This produces a posterior PDF for the DOA, based only on the reliable speech spectrum. By maximizing over the posterior PDF of the DOA, we achieve maximum a posteriori DOA estimation. After convergence, the estimated source spectrum...

chapter

Toward robust learning of the Gaussian mixture state emission densities for hidden Markov models

Hao Tang, M Hasegawa-Johnson, T S Huang

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 5242 - 5245

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

One important class of state emission densities of the hiddenMarkov model (HMM) is the Gaussian mixture densities. The classical Baum-Welch algorithm often fails to reliably learn the Gaussian mixture densities when there is insufficient training data, due to the large number of free parameters present in the model. In this paper, we propose a novel strategy for robustly and accurately learning the...

chapter

Kernel metric learning for phonetic classification

Jui-Ting Huang, Xi Zhou, M. Hasegawa-Johnson, T. Huang

2009 IEEE Workshop on Automatic Speech Recognition&Understanding > 141 - 145

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009)

While a sound spoken is described by a handful of frame-level spectral vectors, not all frames have equal contribution for either human perception or machine classification. In this paper, we introduce a novel framework to automatically emphasize important speech frames relevant to phonetic information. We jointly learn the importance of speech frames by a distance metric across the phone classes,...

chapter

Emotion recognition from speech VIA boosted Gaussian mixture models

Hao Tang, S.M. Chu, M. Hasegawa-Johnson, T.S. Huang

2009 IEEE International Conference on Multimedia and Expo > 294 - 297

2009 IEEE International Conference on Multimedia and Expo (ICME)

Gaussian mixture models (GMMs) and the minimum error rate classifier (i.e. Bayesian optimal classifier) are popular and effective tools for speech emotion recognition. Typically, GMMs are used to model the class-conditional distributions of acoustic features and their parameters are estimated by the expectation maximization (EM) algorithm based on a training data set. Then, classification is performed...

chapter

Acoustic fall detection using Gaussian mixture models and GMM supervectors

Xiaodan Zhuang, Jing Huang, G. Potamianos, M. Hasegawa-Johnson

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 69 - 72

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

We present a system that detects human falls in the home environment, distinguishing them from competing noise, by using only the audio signal from a single far-field microphone. The proposed system models each fall or noise segment by means of a Gaussian mixture model (GMM) supervector, whose Euclidean distance measures the pairwise difference between audio segments. A support vector machine built...

chapter

A novel Gaussianized vector representation for natural scene categorization

Xi Zhou, Xiaodan Zhuang, Hao Tang, M. Hasegawa-Johnson, more

2008 19th International Conference on Pattern Recognition > 1 - 4

ICPR 2008 19th International Conference on Pattern Recognition

This paper presents a novel Gaussianized vector representation for scene images by an unsupervised approach. First, each image is encoded as an ensemble of orderless bag of features, and then a global Gaussian Mixture Model (GMM) learned from all images is used to randomly distribute each feature into one Gaussian component by a multinomial trial. The parameters of the multinomial distribution are...

chapter

Face age estimation using patch-based hidden Markov model supervectors

Xiaodan Zhuang, Xi Zhou, M. Hasegawa-Johnson, T. Huang

2008 19th International Conference on Pattern Recognition > 1 - 4

ICPR 2008 19th International Conference on Pattern Recognition

Recent studies in patch-based Gaussian Mixture Model (GMM) approaches for face age estimation present promising results. We propose using a hidden Markov model (HMM) supervector to represent face image patches, to improve from the previous GMM supervector approach by capturing the spatial structure of human faces and loosening the assumption of identical face patch distribution within a face image...

chapter

Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar

Hao Tang, Yuxiao Hu, Yun Fu, M. Hasegawa-Johnson, more

2008 IEEE International Conference on Multimedia and Expo > 1205 - 1208

2008 IEEE International Conference on Multimedia and Expo (ICME)

In this paper, we propose a complete pipeline of efficient and low-cost techniques to construct a realistic 3D text-driven emotive audio-visual avatar from a single 2D frontal-view face image of any person on the fly. This real-time conversion is achieved through three steps. First, a personalized 3D face model is built based on the 2D face image using a fully automatic 3D face shape and texture reconstruction...

chapter

Regression from patch-kernel

Shuicheng Yan, Xi Zhou, Ming Liu, M. Hasegawa-Johnson, more

2008 IEEE Conference on Computer Vision and Pattern Recognition > 1 - 8

2008 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

In this paper, we present a patch-based regression framework for addressing the human age and head pose estimation problems. Firstly, each image is encoded as an ensemble of orderless coordinate patches, the global distribution of which is described by Gaussian mixture models (GMM), and then each image is further expressed as a specific distribution model by Maximum a Posteriori adaptation from the...

chapter

Feature analysis and selection for acoustic event detection

Xiaodan Zhuang, Xi Zhou, T.S. Huang, M. Hasegawa-Johnson

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 17 - 20

ICASSP 2008. IEEE International Conference on Acoustic, Speech and Signal Processes

Speech perceptual features, such as Mel-frequency Cepstral Coefficients (MFCC), have been widely used in acoustic event detection. However, the different spectral structures between speech and acoustic events degrade the performance of the speech feature sets. We propose quantifying the discriminative capability of each feature component according to the approximated Bayesian accuracy and deriving...

chapter

Optimal speech estimator considering room response as well as additive noise: Different approaches in low and high frequency range

Lae-Hoon Kim, M. Hasegawa-Johnson

2008 IEEE International Conference on Acoustics, Speech and Signal Processing > 4573 - 4576

ICASSP 2008. IEEE International Conference on Acoustic, Speech and Signal Processes

This paper proposes minimum mean squared error (MMSE) speech signal estimation in a reverberant space using different optimal estimators in the low and high frequency ranges. At low frequencies, an MMSE spectral amplitude estimator divided by the spectral amplitude of a representative impulse response produces optimal performance. In the high frequency range, the MMSE estimator is computed based on...

chapter

EAVA: A 3D Emotive Audio-Visual Avatar

Hao Tang, Yun Fu, Jilin Tu, T.S. Huang, more

2008 IEEE Workshop on Applications of Computer Vision > 1 - 6

2008 IEEE Workshop on Applications of Computer Vision

Emotive audio-visual avatars have the potential of significantly improving the quality of Human-Computer Interaction (HCI). In this paper, the various technical approaches of a novel framework leading to a text-driven 3D Emotive Audio-Visual Avatar (EAVA) are proposed. Primary work is focused on 3D face modeling, realistic emotional facial expression animation, emotive speech synthesis, and the co-articulation...

article

Humanoid Audio–Visual Avatar With Emotive Text-to-Speech Synthesis

Hao Tang, Yun Fu, Jilin Tu, M. Hasegawa-Johnson, more

IEEE Transactions on Multimedia > 2008 > 10 > 6 > 969 - 981

Emotive audio-visual avatars are virtual computer agents which have the potential of improving the quality of human-machine interaction and human-human communication significantly. However, the understanding of human communication has not yet advanced to the point where it is possible to make realistic avatars that demonstrate interactions with natural-sounding emotive speech and realistic-looking...

chapter

A Multi-Stream Approach to Audiovisual Automatic Speech Recognition

M. Hasegawa-Johnson

2007 IEEE 9th Workshop on Multimedia Signal Processing > 328 - 331

IEEE 9th Workshop on Multimedia Signal Processing, 2007. MMSP 2007

This paper proposes a multi-stream approach to automatic audiovisual speech recognition, based in part on Hickok and Poeppel's dual-stream model of human speech processing. The dual-stream model proposes that semantic networks may be accessed by at least three parallel neural streams: at least two ventral streams that map directly from acoustics to words (with different time scales), and at least...

chapter

Exploring Discriminative Learning for Text-Independent Speaker Recognition

Ming Liu, Zhengyou Zhang, M. Hasegawa-Johnson, T.S. Huang

Multimedia and Expo, 2007 IEEE International Conference on > 56 - 59

2007 IEEE International Conference on Multimedia and Expo

Speaker verification is a technology of verifying the claimed identity of a speaker based on the speech signal from the speaker (voice print). To learn the score of similarity between each pair of target and trial utterances, we investigated two different discriminative learning frameworks: Fisher mapping followed by SVM learning and utterance transform followed by iterative cohort modeling (ICM)...

chapter

Robust Analysis and Weighting on MFCC Components for Speech Recognition and Speaker Identification

Xi Zhou, Yun Fu, Ming Liu, M. Hasegawa-Johnson, more

Multimedia and Expo, 2007 IEEE International Conference on > 188 - 191

2007 IEEE International Conference on Multimedia and Expo

Mismatch between training and testing data is a major error source for both automatic speech recognition (ASR) and automatic speaker identification (ASI). In this paper, we first present a statistical weighting concept to exploit the unequal sensitivity of mel-frequency cepstral coefficients (MFCC) components to against the mismatch, such as ambient noise, recording equipment, transmission channels,...

chapter

Articulatory Feature-Based Methods for Acoustic and Audio-Visual Speech Recognition: Summary from the 2006 JHU Summer workshop

K. Livescu, O. Cetin, M. Hasegawa-Johnson, S. King, more

2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '7 > 4 > IV-621 - IV-624

2007 IEEE International Conference on Acoustics, Speech, and Signal Processing

We report on investigations, conducted at the 2006 Johns Hopkins Workshop, into the use of articulatory features (AFs) for observation and pronunciation models in speech recognition. In the area of observation modeling, we use the outputs of AF classifiers both directly, in an extension of hybrid HMM/neural network models, and as part of the observation vector, an extension of the "tandem"...

chapter

Hmm-Based and Svm-Based Recognition of the Speech of Talkers With Spastic Dysarthria

M. Hasegawa-Johnson, J. Gunderson, A. Penman, T. Huang

2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings > 3 > III

2006 IEEE International Conference on Acoustics, Speech, and Signal Processing

This paper studies the speech of three talkers with spastic dysarthria caused by cerebral palsy. All three subjects share the symptom of low intelligibility, but causes differ. First, all subjects tend to reduce or delete word-initial consonants; one subject deletes all consonants. Second, one subject exhibits a painstaking stutter. Two algorithms were used to develop automatic isolated digit recognition...

chapter

Generalized Optimal Multi-Microphone Speech Enhancement Using Sequential Minimum Variance Distortionless Response(MVDR) Beamforming and Postfiltering

Lae-Hoon Kim, M. Hasegawa-Johnson, Koeng-Mo Sung

2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings > 3 > III

2006 IEEE International Conference on Acoustics, Speech, and Signal Processing

A theoretical basis for optimal multichannel speech enhancements presented, sufficient, flexible to be used with any assumed statistical model and optimality criterion. Any Bayesian optimal one-channel estimator for speech enhancement can be generalized to the multichannel case as a sequentially constructed minimum variance distortionless response (MVDR) beamformer followed by an optimal one-channel...

Filter options

Publication date

Set your own date range

Publication type

book (19)
article (1)

Keywords

HIDDEN MARKOV MODELS (7)
SPEECH (6)
SPEECH RECOGNITION (6)
GAUSSIAN PROCESSES (5)
BAYES METHODS (4)
FACE (4)
SPEECH PROCESSING (4)
SUPPORT VECTOR MACHINES (4)
AVATARS (3)
COMPUTER ANIMATION (3)
DATABASES (3)
EMOTION RECOGNITION (3)
ESTIMATION (3)
GAUSSIAN MIXTURE MODEL (3)
HUMANS (3)
KERNEL (3)
LEARNING (ARTIFICIAL INTELLIGENCE) (3)
MAXIMUM LIKELIHOOD ESTIMATION (3)
STATISTICAL ANALYSIS (3)
TRAINING (3)
ACOUSTICS (2)
ADAPTATION MODEL (2)
ARRAY SIGNAL PROCESSING (2)
AUDIO-VISUAL SYSTEMS (2)
BOOSTING (2)
CEPSTRAL ANALYSIS (2)
DISTANCE MEASUREMENT (2)
EMOTIVE SPEECH SYNTHESIS (2)
EQUATIONS (2)
FACE RECOGNITION (2)
FACIAL ANIMATION (2)
FEATURE EXTRACTION (2)
GAUSSIAN DISTRIBUTION (2)
GAUSSIAN MIXTURE MODELS (2)
HMM (2)
HUMAN COMPUTER INTERACTION (2)
IMAGE REPRESENTATION (2)
LEAST MEAN SQUARES METHODS (2)
SPEAKER RECOGNITION (2)
SPEECH EMOTION RECOGNITION (2)
SPEECH SYNTHESIS (2)
WORD ERROR RATE (2)
2006 JHU SUMMER WORKSHOP (1)
2D FRONTAL-VIEW IMAGE (1)
3–D FACE MODELING AND ANIMATION (1)
3D EMOTIVE AUDIO-VISUAL AVATAR (1)
3D FACE MODELING (1)
3D FACE RECONSTRUCTION (1)
3D TEXT-DRIVEN EMOTIVE AUDIO-VISUAL AVATAR (1)
ACOUSTIC (1)
ACOUSTIC EVENT DETECTION (1)
ACOUSTIC FALL DETECTION (1)
ACOUSTIC FEATURE (1)
ACOUSTIC SIGNAL DETECTION (1)
ACOUSTIC SIGNAL PROCESSING (1)
ACOUSTIC TRANSDUCERS (1)
ADDITIVE NOISE (1)
AGING (1)
AMBIENT NOISE (1)
APPROXIMATED BAYESIAN ACCURACY (1)
ARTICULATORY FEATURE-BASED METHODS (1)
AUDIO–VISUAL AVATAR (1)
AUDIO-VISUAL RECOGNITION (1)
AUDIO-VISUAL SPEECH (1)
AUDIO-VISUAL SPEECH RECOGNITION (1)
AUDIOVISUAL AUTOMATIC SPEECH RECOGNITION (1)
AUTOMATIC 3D FACE SHAPE RECONSTRUCTION (1)
AUTOMATIC ISOLATED DIGIT RECOGNITION SYSTEMS (1)
AUTOMATIC SPEAKER IDENTIFICATION (1)
AUTOMATIC SPEECH RECOGNITION (1)
AVICAR CORPUS (1)
BAUM-WELCH ALGORITHM (1)
BAYESIAN ACCURACY (1)
BAYESIAN BEAMFORMING (1)
BAYESIAN OPTIMAL CLASSIFIER (1)
BAYESIAN OPTIMAL ONE-CHANNEL ESTIMATOR (1)
BAYESIAN SENSE (1)
BELIEF NETWORKS (1)
BLIND SOURCE SEPARATION (1)
BOOSTED GAUSSIAN MIXTURE MODEL (1)
BOOSTING BAUM-WELCH ALGORITHM (1)
CHANNEL INVERSION (1)
CHIL HEAD POSE DATABASE (1)
CLASS-CONDITIONAL DISTRIBUTION (1)
CONVOLUTION (1)
CUAVE AUDIO-VISUAL DIGITS CORPUS (1)
DATA MINING (1)
DECONVOLUTION (1)
DIRECTION OF ARRIVAL (1)
DIRECTION OF ARRIVAL ESTIMATION (1)
DIRECTION-OF-ARRIVAL ESTIMATION (1)
DISCRIMINATIVE LEARNING METHODS (1)
DOA ESTIMATION (1)
DUAL-STREAM MODEL (1)
DYNAMIC BAYESIAN NETWORK (1)
DYNAMIC BAYESIAN NETWORKS (1)
DYNAMIC PROGRAMMING SCHEME (1)
EM ALGORITHM (1)
EM BEAMFORMING (1)
EM-GMM ALGORITHM (1)
more

INFONA - science communication portal

Search results for: M. Hasegawa-Johnson

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Publication type

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options