Speech transients have been shown to be important cues for identifying and discriminating speech sounds. We previously described a wavelet packet-based method for extracting transient speech (Rasetshwane et al. WASPAA 2007, pp. 179-182). The algorithm uses a "transitivity function" to characterize the rate of change of wavelet coefficients, and it can be implemented in real-time to process...
This paper presents a new approach to speech enhancement based on a distributed microphone network. Each microphone is used to simultaneously classify the input as one of the noise types or as speech. To enhance the speech signal, a modified spectral subtraction approach is used that utilizes the sound information of the entire network to update the noise model even during speech. This improves...
We present an approach to model-based voice activity detection (VAD) for harsh environments. By using mel-frequency cepstral coefficient (MFCC) features extracted from clean and noisy speech samples, an artificial neural network is trained to provide a reliable model. There are three main aspects to this study: first, in addition to the developed model, recent state-of-the-art VAD methods...
Transcription of music is the process of generating a symbolic representation such as a score sheet or a MIDI file from an audio recording of a piece of music. A statistical machine learning approach for detecting note onsets in polyphonic piano music is presented. An area from the spectrogram of the sound is concatenated into one feature vector. A cascade of boosted classifiers is used for dimensionality...
A blind approach for estimating the signal to noise ratio (SNR) of a speech signal corrupted by additive noise is proposed. The method is based on a pattern recognition paradigm using various linear predictive based features, a neural network classifier and estimation combination. Blind SNR estimation is very useful in speaker identification systems in which a confidence metric is determined along...
We consider the problem of word boundary detection in spontaneous speech utterances. Acoustic features have been well explored in the literature in the context of word boundary detection; however, in spontaneous speech from the Switchboard-I corpus, we found that the accuracy of word boundary detection using acoustic features is poor (F-score ~ 0.63). We propose a new feature that captures lexical cues...
Monaural speech segregation in reverberant environments is a very difficult problem. We develop a supervised learning approach by proposing an objective function that directly relates to the computational goal of maximizing signal-to-noise ratio. The model trained using this new objective function yields significantly better results for time-frequency unit labeling. In our segregation system, a segmentation...
The Mel-frequency cepstral coefficient (MFCC) is the most widely used feature in speech and speaker recognition. However, the traditional MFCC is very sensitive to noise interference, which tends to drastically degrade the performance of recognition systems because of mismatches between training and testing conditions. In this paper, we propose a new speaker recognition algorithm based on the dynamic MFCC parameters...
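The dynamic MFCC parameters mentioned above are typically delta coefficients computed by regression over neighboring frames. As an illustrative sketch (assuming the standard delta regression formula; the paper's exact formulation and window size are not given in this excerpt):

```python
import numpy as np

def delta(features, N=2):
    """Compute delta (dynamic) coefficients from a (frames x coeffs)
    feature matrix using the standard regression formula:
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Edge frames are handled by repeating the first/last frame."""
    T = len(features)
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2 * sum(n * n for n in range(1, N + 1))
    deltas = np.zeros_like(features, dtype=float)
    for n in range(1, N + 1):
        deltas += n * (padded[N + n:T + N + n] - padded[N - n:T + N - n])
    return deltas / denom

# Toy 5-frame, 3-coefficient "MFCC" matrix increasing by 3 per frame:
# interior deltas recover the per-frame slope of 3.
mfcc = np.arange(15, dtype=float).reshape(5, 3)
d = delta(mfcc)
```

Delta-delta (acceleration) features are obtained by applying the same operation to the deltas.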
Unvoiced speech poses a big challenge to current monaural speech segregation systems. It lacks harmonic structure and is highly susceptible to interference due to its relatively weak energy. This paper describes a new approach to segregate unvoiced speech from nonspeech interference. The system first estimates a voiced binary mask, and then performs unvoiced speech segregation in two stages: segmentation...
The paper analyzes the short-term auto-correlation property of speech signals and confirms it through detailed comparative experiments with other kinds of signals. By applying the auto-correlation properties of the current speech frame and nearby frames, a new feature for voice activity detection called weighted short-term summation of auto-correlation (WSAC) is formed. Experiments show that the new VAD feature...
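The idea of a weighted summation of short-term autocorrelation can be sketched in a few lines of numpy. This is only an illustration of the principle (the exact WSAC weighting and multi-frame combination in the paper are not reproduced; the decaying lag weights below are a hypothetical choice):

```python
import numpy as np

def frame_autocorr_feature(frame, max_lag=40):
    """Weighted summation of normalized short-term autocorrelation
    over lags 1..max_lag. Voiced speech, being quasi-periodic, yields
    larger values than white-like noise. Illustrative sketch only."""
    frame = frame - np.mean(frame)
    r0 = np.dot(frame, frame)
    if r0 == 0:
        return 0.0
    feats = []
    for lag in range(1, max_lag + 1):
        r = np.dot(frame[:-lag], frame[lag:]) / r0  # normalized autocorrelation
        feats.append(abs(r))
    weights = np.linspace(1.0, 0.5, max_lag)  # hypothetical decaying lag weights
    return float(np.dot(weights, feats))

# A periodic "voiced" frame scores much higher than white noise.
rng = np.random.default_rng(0)
t = np.arange(400)
voiced = np.sin(2 * np.pi * t / 50)   # strongly periodic
noise = rng.standard_normal(400)      # aperiodic
score_voiced = frame_autocorr_feature(voiced)
score_noise = frame_autocorr_feature(noise)
```

Thresholding such a score per frame gives a simple periodicity-based VAD decision.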
In traditional VAD algorithms, high-order statistics (HOS) are usually applied in the time domain and limited to the white-noise case. In this paper, a spectral-domain HOS feature called spectral kurtosis is introduced, on the basis of which the differing characteristics of speech and noise in the spectral domain are explored. By introducing a "time delay" and double thresholds...
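A minimal sketch of spectral kurtosis, assuming the common definition as the kurtosis of the STFT magnitude in each frequency bin across frames (the paper's time-delay and double-threshold scheme is not reproduced here):

```python
import numpy as np

def spectral_kurtosis(signal, frame_len=256, hop=128):
    """Per-bin excess kurtosis of STFT magnitudes across frames.
    Stationary Gaussian-like noise gives a low, flat value per bin,
    while intermittent speech energy gives high peaks."""
    n_frames = (len(signal) - frame_len) // hop + 1
    window = np.hanning(frame_len)
    mags = np.array([
        np.abs(np.fft.rfft(window * signal[i * hop:i * hop + frame_len]))
        for i in range(n_frames)
    ])                                        # shape: (frames, bins)
    mu = mags.mean(axis=0)
    sigma = mags.std(axis=0) + 1e-12
    return np.mean(((mags - mu) / sigma) ** 4, axis=0) - 3.0  # excess kurtosis

# Example: low-level noise with a brief strong tone at FFT bin 32;
# the bin seeing the intermittent tone shows much higher kurtosis.
rng = np.random.default_rng(1)
sig = 0.1 * rng.standard_normal(10240)
n = np.arange(1024)
sig[:1024] += 5.0 * np.sin(2 * np.pi * 32 * n / 256)
sk = spectral_kurtosis(sig)
```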
In this paper, a new feature selection method for speaker recognition is proposed to keep high-quality speech frames for speaker modelling and to remove noisy and corrupted speech frames. In order to obtain robust voice activity detection in a variety of acoustic conditions, the spectral subtraction algorithm is adopted to estimate the frame power. An energy-based frame selection algorithm is then...
State-of-the-art automatic speech recognition systems typically adopt the feature set containing mel-frequency cepstral coefficients (MFCC) and their time derivatives. The noise vulnerability of MFCC significantly degrades the recognition performance of such systems in noisy conditions. This paper describes a noise-robust feature extraction method. A set of new MFCC features is derived from the dynamic...
Studies have shown that, depending on speaker task and environmental conditions, recognizers are sensitive to noisy, stressful environments. The focus of this study is to achieve robust recognition in diverse environmental conditions by extracting robust features. Central to the technique is the root cepstrum coefficient (RCC) method, used instead of the logarithm amplitude spectrum and discrete cosine transform...
Mel-frequency cepstral coefficients (MFCC) are the most widely used features for speech recognition. However, MFCC-based speech recognition performance degrades in presence of additive noise. In this paper, we propose a set of noise-robust features based on conventional MFCC feature extraction method. Our proposed method consists of two steps. In the first step, mel sub-band Wiener filtering is carried...
A robust speech feature extraction method based on the power law of hearing and a non-uniform spectral compression technique is proposed, and the corresponding model compensation algorithm is given. The mismatch functions, reflecting the effects of additive noise and spectral compression, and the model compensation formulae are derived. Experimental results show that a significant improvement...
Most current audio-visual automatic speech recognition (AV-ASR) systems use static weights to balance audio and visual information during information fusion. State-of-the-art research has led to using audio reliability metrics to dynamically change the fusion weights, successfully improving overall recognition results. So far, however, incorporating visual reliability metrics...
Existing voice activity detectors (VADs) typically depend on specific audio codecs and show degraded performance in the presence of music signals. This paper presents a sound activity detection method independent of audio codecs. An entropy feature set with adaptive noise estimation updates is proposed to improve the performance of entropy in detecting both speech and music. Afterwards, a...
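The basic spectral-entropy feature underlying such detectors can be sketched as follows (the adaptive noise-estimation update proposed in the paper is not shown): the magnitude spectrum is normalized into a probability mass function, so flat broadband noise yields high entropy while speech or music energy concentrated in a few bins yields low entropy.

```python
import numpy as np

def spectral_entropy(frame):
    """Spectral entropy of one frame: normalize the power spectrum
    to a probability mass function and take its Shannon entropy.
    Flat (noise-like) spectra score high; peaky (tonal) spectra low."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    p = spec / (np.sum(spec) + 1e-12)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# A pure tone concentrates all power in one bin (entropy near zero),
# while white noise spreads power across all bins (high entropy).
t = np.arange(512)
tone = np.sin(2 * np.pi * 16 * t / 512)
rng = np.random.default_rng(0)
white = rng.standard_normal(512)
```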
One of the most recent models for voice conversion is the classical LPC analysis-synthesis model combined with GMM, which aims to separate excitation from vocal-tract information and to learn the transformation rules with statistical methods. However, it does not work as well as expected due to the inaccuracy of the extracted feature information as well as the overly smoothed spectral...
The Mel-frequency cepstral coefficients (MFCC) are widely used for speech recognition. However, MFCC-based speech recognition performance degrades in presence of additive noise. In this paper, we propose a set of noise-robust features based on conventional MFCC feature extraction method. Our proposed method consists of two steps. In the first step, Mel sub-band spectral subtraction is carried out...
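Conventional magnitude spectral subtraction, the building block behind the mel sub-band variant described above, can be sketched on a single frame (the paper applies subtraction per mel sub-band before MFCC extraction; that variant, and the noise estimator, are not reproduced here; the oracle noise spectrum below is for illustration only):

```python
import numpy as np

def spectral_subtraction(noisy, noise_est, floor=0.02):
    """Minimal single-frame magnitude spectral subtraction.
    noisy:     time-domain frame of noisy speech
    noise_est: estimated noise magnitude spectrum (e.g. averaged over
               known noise-only frames)
    The noisy phase is reused; a spectral floor avoids negative
    magnitudes (musical-noise suppression heuristic)."""
    spec = np.fft.rfft(noisy)
    mag = np.abs(spec)
    phase = np.angle(spec)
    clean_mag = np.maximum(mag - noise_est, floor * mag)  # subtract with floor
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(noisy))

# Illustration with a known noise realization as the "estimate".
rng = np.random.default_rng(0)
t = np.arange(512)
clean = np.sin(2 * np.pi * 8 * t / 512)
noise = 0.3 * rng.standard_normal(512)
noisy = clean + noise
noise_mag = np.abs(np.fft.rfft(noise))   # oracle noise spectrum, illustration only
enhanced = spectral_subtraction(noisy, noise_mag)
```

Real systems estimate `noise_est` from speech-absent frames (e.g. via a VAD) rather than from an oracle.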