Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates the components of multichannel speech enhancement and speech recognition into a single neural-network-based architecture and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently clarified. An open...
This paper proposes a new noise-robust speech recognition method. For noisy conditions, several noise reduction methods have been developed and are applied under various noise conditions. However, it is still not easy to achieve high recognition accuracy, for example, for speech with similar pronunciations. In this paper, a new processing algorithm for the speech modulation spectrum is proposed...
This paper proposes a system that not only identifies the gender of a speaker in a text-independent manner but also works efficiently in noisy environmental conditions in real time. The noisy environmental conditions are places where noise signals are generated at different SNRs (Signal-to-Noise Ratios), such as a train station, restaurant, exhibition hall, airport, and so on. The algorithms...
Recent research shows that the i-vector framework for speaker recognition can significantly benefit from phonetic information. A common approach is to use a deep neural network (DNN) trained for automatic speech recognition to generate a universal background model (UBM). Studies in this area have been done in relatively clean conditions. However, strong background noise is known to severely reduce...
The Automatic Captioned Relay Service is crucial for people who are deaf or hard of hearing to communicate with others in daily life. This service uses Automatic Speech Recognition (ASR) to transcribe speech into captions. If the waiting time caused by non-streaming speech recognition can be reduced, the relay service will support more users. In this paper, we propose a method for improving a voice activity...
In speech interfaces, it is often necessary to understand the overall auditory environment, not only recognizing what is being said, but also being aware of the location or actions surrounding the utterance. However, automatic speech recognition (ASR) becomes difficult when recognizing speech with environmental sounds. Standard solutions treat environmental sounds as noise, and remove them to improve...
We propose a sudden-noise suppression method for speech recognition using a phase linearity feature for noise detection. Our investigation of sound data recorded in actual retail stores shows that short, sudden noises are dominant in such environments. We also confirm the negative effect of such noises on speech recognition performance. Our method addresses this problem by focusing on sudden noises...
In this paper, a two-layer Gaussian Mixture Model (GMM) structure for Vector Taylor Series (VTS) feature compensation is proposed for robust speech recognition. Since a GMM with numerous mixture components is used for VTS, the computational complexity of VTS is extremely high. To deal with this issue, we propose a two-layer GMM structure for VTS. In detail, a GMM with fewer mixture components is utilized...
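The coarse-to-fine idea behind such a two-layer structure can be sketched as follows; the diagonal-covariance Gaussians, the coarse-to-fine component mapping, and the top-N selection below are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def log_gauss(x, means, variances):
    # Diagonal-covariance Gaussian log-densities for one frame x
    # against every component row: returns shape (n_components,)
    d = x - means
    return -0.5 * np.sum(np.log(2 * np.pi * variances) + d * d / variances, axis=1)

def two_layer_select(x, small_means, small_vars, mapping,
                     big_means, big_vars, top=2):
    """First layer: score a coarse GMM on the frame.
    Second layer: evaluate only the fine components mapped to the
    best coarse components, instead of the whole large GMM."""
    coarse = log_gauss(x, small_means, small_vars)
    best = np.argsort(coarse)[-top:]               # most likely coarse components
    fine_idx = np.concatenate([mapping[int(b)] for b in best])
    fine = log_gauss(x, big_means[fine_idx], big_vars[fine_idx])
    return fine_idx, fine
```

The saving comes from evaluating only a small subset of the large GMM's components per frame, at the cost of an extra coarse-GMM pass.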
Sub-band speech processing is well known in robust speech recognition. In recent years, on the other hand, deep neural networks (DNNs) have been widely used in speech recognition for acoustic modeling as well as feature extraction and transformation. In this paper, we propose to use a deep belief network (DBN) as a post-processing method for de-noising at the Mel sub-band level, where we enhance the logarithm...
Detection of whispered speech in the presence of high levels of background noise has applications in fraudulent behaviour recognition. For instance, it can serve as an indicator of possible insider trading. We propose a deep neural network (DNN)-based whispering detection system, which operates on both magnitude and phase features, including the group delay feature from all-pole models (APGD). We...
The recognition rate of speech recognition systems declines in noisy environments. In the signal space, a speech enhancement algorithm that combines the a priori Signal-to-Noise Ratio (SNR) with the Auditory Masking Effect can effectively remove noise from the speech signal. In the feature space, an improved non-uniform spectral perceptual compression feature extraction algorithm can effectively compress...
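One widely used way to estimate the a priori SNR in enhancement schemes of this kind is the decision-directed rule combined with a Wiener gain; the sketch below shows that combination only (the auditory-masking-effect component from the abstract is omitted, and the constants are illustrative):

```python
import numpy as np

def wiener_enhance(mag_frames, noise_power, alpha=0.98, snr_floor=1e-3):
    """Decision-directed a priori SNR estimate plus Wiener gain, per frame.
    mag_frames: (n_frames, n_bins) magnitude spectra.
    noise_power: (n_bins,) noise power estimate (assumed known)."""
    prev_clean_power = np.zeros(mag_frames.shape[1])
    out = np.empty_like(mag_frames)
    for t, mag in enumerate(mag_frames):
        post_snr = (mag ** 2) / noise_power              # a posteriori SNR
        prio_snr = (alpha * prev_clean_power / noise_power
                    + (1 - alpha) * np.maximum(post_snr - 1.0, 0.0))
        prio_snr = np.maximum(prio_snr, snr_floor)       # avoid over-attenuation
        gain = prio_snr / (1.0 + prio_snr)               # Wiener filter gain
        out[t] = gain * mag
        prev_clean_power = out[t] ** 2                   # feedback for next frame
    return out
```

The recursion smooths the SNR estimate over time, which is what reduces musical noise compared with using the instantaneous a posteriori SNR alone.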
Generally, speech recognition systems are specific to speech/spoken word recognition or to speaker identification/verification. In this paper, an attempt has been made to find a better combination of speech feature extraction and artificial neural network model for speaker identification combined with spoken word recognition in a generally noisy background (i.e., a home/office environment). Different speech...
Automatic speech recognition is one of the challenging areas in the field of speech signal processing. Automatic speech recognition technology converts a speech signal into text. This paper presents the implementation of an isolated Kannada word recognizer using Vector Quantization (VQ) and Fuzzy C-Means (FCM) techniques. The paper compares and contrasts the recognition accuracies of the FCM and k-means techniques...
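For reference, a minimal FCM implementation in NumPy; the fuzzifier m = 2, the iteration count, and the random initialization are illustrative choices, not values taken from the paper:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, iters=50, seed=0):
    """Fuzzy C-Means clustering.
    X: (n, d) data; c: number of clusters.
    Returns soft memberships U of shape (n, c) and centroids (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)             # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centroids[None], axis=2) + 1e-12
        # Standard FCM membership update: u_ik ∝ d_ik^(-2/(m-1))
        U = 1.0 / (dist ** (2.0 / (m - 1.0)))
        U /= U.sum(axis=1, keepdims=True)
    return U, centroids
```

Unlike k-means (hard VQ), each point keeps a graded membership in every cluster, which is what lets FCM model frames that fall between codebook entries.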
Speech processing and its application to intelligent systems is a state-of-the-art research area. Systems are getting smarter day by day with the introduction of speech signals to control machines. The basic model of the human speech system is used to design an algorithm that is able to detect words as well as alphabetic letters to generate commands and control a machine. The speech enhancement...
In emotion recognition from speech, several well-established corpora are used to date for the development of classification engines. The data is annotated differently, and the community in the field uses a variety of feature extraction schemes. The aim of this paper is to investigate promising features for individual corpora and then compare the results for proposing optimal features across data sets,...
Spoken language identification is a technique to model and classify the language spoken by an unknown person. The language identification task is more challenging in real environmental conditions due to the addition of different types of noise. The presence of noise in the speech signal causes several problems. This paper covers several aspects of language identification in noisy environments. Experiments have been carried...
This paper presents the use of lip-reading and Thai speech to control electronic devices in a vehicle. The Viola-Jones algorithm detects the face of the driver and the constrained local model detects their mouth area before three lips features are extracted. Hidden Markov models are utilized to recognize speech and lip movement, with the lip movement recognizer offering better accuracy than the speech...
The speaker verification (SV) task has been an active area of research for the last thirty years. One recent research topic is improving the robustness of SV systems in challenging environments. This paper examines the robustness of a current state-of-the-art SV system against background noise corruption. Specifically, we consider the scenario where the SV system is trained on noise-free...
This paper addresses the problem of robust text-independent speaker verification when some of the features for the target signal are heavily masked by noise. In the framework of Gaussian mixture models (GMMs), a new approach based on the spectral subtraction technique and the statistical missing feature compensation is presented. The identity of spectral features missing due to noise masking is provided...
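The two front-end steps described here can be sketched as follows, assuming power spectra and a separately obtained noise estimate; the oversubtraction floor and the SNR threshold are illustrative values, not the paper's:

```python
import numpy as np

def subtract_and_mask(noisy_power, noise_power, beta=0.01, snr_db_thresh=0.0):
    """Spectral subtraction plus a binary missing-feature mask.
    noisy_power, noise_power: (n_frames, n_bins) power spectra.
    Returns the subtracted spectrum and a reliability mask
    (True = reliable, False = missing / noise-masked)."""
    # Subtract the noise estimate, with a spectral floor to avoid negatives
    clean_est = np.maximum(noisy_power - noise_power, beta * noisy_power)
    # Bins whose estimated local SNR falls below the threshold are "missing"
    local_snr_db = 10.0 * np.log10(clean_est / np.maximum(noise_power, 1e-12))
    reliable = local_snr_db > snr_db_thresh
    return clean_est, reliable
```

The mask is then handed to a missing-feature back end (e.g., GMM marginalization over the unreliable bins) rather than being used to zero out features directly.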
We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real-time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time and frequency bin...
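One common way to obtain such a feature is from the magnitude-squared coherence (MSC) between two microphones per time-frequency bin: a directional source yields high coherence, diffuse noise low coherence. The sketch below uses recursively smoothed spectra and maps diffuseness as 1 − MSC; both the smoothing constant and this mapping are assumptions, not necessarily the authors' estimator:

```python
import numpy as np

def diffuseness(X1, X2, alpha=0.8):
    """X1, X2: complex STFTs (n_frames, n_bins) of two channels.
    Returns a per-bin diffuseness feature in [0, 1], computed from
    recursively smoothed auto- and cross-spectra."""
    S11 = np.full(X1.shape[1], 1e-12)
    S22 = np.full(X1.shape[1], 1e-12)
    S12 = np.zeros(X1.shape[1], dtype=complex)
    out = np.empty(X1.shape, dtype=float)
    for t in range(X1.shape[0]):
        S11 = alpha * S11 + (1 - alpha) * np.abs(X1[t]) ** 2
        S22 = alpha * S22 + (1 - alpha) * np.abs(X2[t]) ** 2
        S12 = alpha * S12 + (1 - alpha) * X1[t] * np.conj(X2[t])
        msc = np.abs(S12) ** 2 / (S11 * S22)     # magnitude-squared coherence
        out[t] = 1.0 - np.clip(msc, 0.0, 1.0)
    return out
```

Because only second-order statistics between channels are needed, the feature is cheap to compute in real time and requires no direction-of-arrival estimate, matching the property the abstract highlights.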