Search results

Items from 1 to 20 out of 32 results

chapter

A novel pitch extraction based on jointly trained deep BLSTM Recurrent Neural Networks with bottleneck features

Bin Liu, Jianhua Tao, Dawei Zhang, Yibin Zheng

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 336 - 340

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Pitch is an important characteristic of speech and is useful for many applications. However, it is still challenging to estimate pitch in strong noise. In this paper, we propose a joint training approach to determine pitch. First, a Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTMRNN) is trained to map noisy to clean speech features. Second, the pitch estimation is also...

chapter

Respiratory airflow estimation from lung sounds based on regression

Elmar Messner, Martin Hagmuller, Paul Swatek, Freyja-Maria Smolle-Juttner, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 1123 - 1127

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The aim of this work is the estimation of respiratory flow from lung sound recordings, i.e. acoustic airflow estimation. With a 16-channel lung sound recording device, we simultaneously record the respiratory flow and the lung sounds on the posterior chest from six lung-healthy subjects in supine position. For the recordings of four selected sensor positions, we extract linear frequency cepstral coefficient...

chapter

On time-frequency mask estimation for MVDR beamforming with application in robust speech recognition

Xiong Xiao, Shengkui Zhao, Douglas L. Jones, Eng Siong Chng, more

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 3246 - 3250

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Acoustic beamforming has played a key role in robust automatic speech recognition (ASR) applications. Accurate estimates of the speech and noise spatial covariance matrices (SCM) are crucial for successfully applying minimum variance distortionless response (MVDR) beamforming. Reliable estimation of time-frequency (TF) masks can improve the estimation of the SCMs and significantly improve...

chapter

A discriminative unsupervised method for speaker recognition using deep learning

Muhammad Muneeb Saleem, John H.L. Hansen

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) > 1 - 5

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)

A Gaussian mixture model (GMM) is used in state-of-the-art i-Vector based speaker recognition systems for acoustic space division and prediction. The main purpose of such acoustic space clustering is to constrain the acoustic comparison in small regions where between-speaker differences are the main source of variability. In this study, we investigate two unsupervised discriminative approaches as...

chapter

Application of artificial neural network in Geology: Porosity estimation and lithological facies classification

Suihong Son, Jiagen Hou, Yuming Liu, Sifan Cao, more

2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) > 740 - 744

2016 12th International Conference on Natural Computation and 13th Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)

Based on the relationship between porosity (or lithological facies) and other petrophysical properties, Artificial neural networks (ANN) are respectively trained for porosity estimation and lithological facies classification, using core porosity (CPOR) data and core lithological facies interpretation results of part of core interval together with some well logs (petrophysical properties). After the...

chapter

Grid optimization based methods for estimating and tracking doubly spread underwater acoustic channels

Ye Qin, Shefeng Yan, Zhuqing Yuan, Lijun Xu

OCEANS 2016 - Shanghai > 1 - 8

OCEANS 2016 - Shanghai

In this paper, the methods for estimating and tracking sparse doubly spread channels in single-carrier coherent communications are investigated. The sparse doubly spread channel is parameterized by a few paths with different delays, Doppler scales, and gains. Based on the model, a low-complexity channel estimation algorithm is proposed. The channel estimation is divided into two stages, the first...

chapter

Quality estimation for ASR k-best list rescoring in spoken language translation

Raymond W. M. Ng, Kashif Shah, Wilker Aziz, Lucia Specia, more

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5226 - 5230

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Spoken language translation (SLT) combines automatic speech recognition (ASR) and machine translation (MT). During the decoding stage, the best hypothesis produced by the ASR system may not be the best input candidate to the MT system, but making use of multiple sub-optimal ASR results in SLT has been shown to be too complex computationally. This paper presents a method to rescore the k-best ASR output...

chapter

Estimating confidence scores on ASR results using recurrent neural networks

Kaustubh Kalgaonkar, Chaojun Liu, Yifan Gong, Kaisheng Yao

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4999 - 5003

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper we present a confidence estimation system using recurrent neural networks (RNN) and compare it to a traditional multilayer perceptron (MLP) based system. The ability of RNNs to capture sequence information and improve decisions using processed history was the main motivation to explore RNNs for confidence estimation. In this paper we also explore two subtle variations of confidence estimator:...

chapter

A quantitative comparison of blind C₅₀ estimators

P. Peso Parada, D. Sharma, J. Lainez, D. Barreda, more

2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC) > 298 - 302

2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC)

The problem of blind estimation of the room acoustic clarity index C₅₀ from single-channel reverberant speech signals is presented in this paper. We analyze the performance of several machine learning methods for a regression task using 309 features derived from the speech signal and modeled with a Deep Belief Network (DBN), Classification And Regression Tree (CART) and Linear Regression (LR). These...

chapter

Generalization of supervised learning for binary mask estimation

Tobias May, Timo Gerkmann

2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC) > 154 - 158

2014 14th International Workshop on Acoustic Signal Enhancement (IWAENC)

This paper addresses the problem of speech segregation by estimating the ideal binary mask (IBM) from noisy speech. Two methods are compared. The first is a supervised learning approach that incorporates a priori knowledge about the feature distribution observed during training. The second method relies solely on a frame-based speech presence probability (SPP) estimation, and therefore does not depend...

chapter

Application of SVM-based correctness predictions to unsupervised discriminative speaker adaptation

Matthew Gibson, Thomas Hain

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4341 - 4344

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

The effectiveness of unsupervised speaker adaptation is typically limited by errors in the estimated transcription of the adaptation data. Previous work has mitigated this negative effect by using only those sections of the adaptation data which are transcribed with relatively high confidence. In this work, phoneme correctness predictions are integrated into a discriminative unsupervised speaker adaptation...

chapter

Speaker age estimation and gender detection based on supervised Non-Negative Matrix Factorization

Mohamad Hasan Bahari, Hugo Van Hamme

2011 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications (BIOMS) > 1 - 6

2011 IEEE Workshop on Biometric Measurements and Systems for Security and Medical Applications (BIOMS)

In many criminal cases, evidence might be in the form of telephone conversations or tape recordings. Therefore, law enforcement agencies have been concerned about accurate methods to profile different characteristics of a speaker from recorded voice patterns, which facilitate the identification of a criminal. This paper proposes a new approach for speaker gender detection and age estimation, based...

chapter

Increasing discriminative capability on MAP-based mapping function estimation for acoustic model adaptation

Yu Tsao, Ryosuke Isotani, Hisashi Kawai, Satoshi Nakamura

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5320 - 5323

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this study, we propose increasing discriminative power on the maximum a posteriori (MAP)-based mapping function estimation for acoustic model adaptation. Based on the effective and stable learning advantages of MAP-based estimation, we incorporate a discriminative term and derive a new objective function. By applying the new function for online mapping function estimation, we developed discriminative...

chapter

An evaluation of alaryngeal speech enhancement methods based on voice conversion techniques

Hironori Doi, Keigo Nakamura, Tomoki Toda, Hiroshi Saruwatari, more

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5136 - 5139

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this study, we evaluate our proposed methods for enhancing alaryngeal speech based on statistical voice conversion techniques. Voice conversion based on a Gaussian mixture model has been applied to the conversion of alaryngeal speech into normal speech (AL-to-Speech). Moreover, one-to-many eigenvoice conversion (EVC) has also been applied to AL-to-Speech to enable the recovery of the original voice...

chapter

HMM-based separation of acoustic transfer function for single-channel sound source localization

Ryoichi Takashima, Tetsuya Takiguchi, Yasuo Ariki

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 2830 - 2833

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

This paper presents a sound source (talker) localization method using only a single microphone, where a HMM (Hidden Markov Model) of clean speech is introduced to estimate the acoustic transfer function from a user's position. The new method is able to carry out this estimation without measuring impulse responses. The frame sequence of the acoustic transfer function is estimated by maximizing the...

chapter

Improving online incremental speaker adaptation with eigen feature space MLLR

Xiaodong Cui, Jian Xue, Bowen Zhou

2009 IEEE Workshop on Automatic Speech Recognition & Understanding > 136 - 140

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009)

This paper investigates an eigen feature space maximum likelihood linear regression (fMLLR) scheme to improve the performance of online speaker adaptation in automatic speech recognition systems. In this stochastic-approximation-like framework, the traditional incremental fMLLR estimation is considered as a slowly changing mean of the eigen fMLLR. It helps the adaptation when only a limited amount...

chapter

Generalization problem in ASR acoustic model training and adaptation

S. Furui

2009 IEEE Workshop on Automatic Speech Recognition & Understanding > 1 - 10

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009)

Since speech is highly variable, even if we have a fairly large-scale database, we cannot avoid the data sparseness problem in constructing automatic speech recognition (ASR) systems. How to train and adapt statistical models using limited amounts of data is one of the most important research issues in ASR. This paper summarizes major techniques that have been proposed to solve the generalization...

chapter

Feature selection for room volume identification from room impulse response

N.R. Shabtai, Y. Zigel, B. Rafaely

2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics > 249 - 252

2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

The room impulse response (RIR) can be used to calculate many room acoustical parameters, such as the reverberation time (RT). However, estimating the room volume, another important room parameter, from the RIR is typically a more difficult task requiring extraction of other features from the RIR. Most of the existing fully-blind methods for estimating the room volume from the RIR do not combine features...

chapter

Local sensor system for badminton smash analysis

Chang Tai Kiang, Chan Kuan Yoong, A.C. Spowage

2009 IEEE Instrumentation and Measurement Technology Conference > 883 - 888

2009 IEEE Instrumentation and Measurement Technology Conference (I2MTC)

This paper presents the development of a sensor system for the analysis of badminton smashes. During a badminton game, the ability to execute a powerful smash is fundamental for a player to be competitive. In most games, the winning factor is often attributed to a high shuttle speed during the execution of a smash. It was envisioned that the shuttle speed can be correlated from the speed of...

chapter

Automatic topic detection strategy for information retrieval in spoken document

Shan Jin, H. Misra, T. Sikora, J. Jose

2009 10th Workshop on Image Analysis for Multimedia Interactive Services > 300 - 303

2009 10th Workshop on Image Analysis for Multimedia Interactive Services. WIAMIS 2009

This paper suggests an alternative solution for the task of spoken document retrieval (SDR). The proposed system runs retrieval on multi-level transcriptions (word and phone) produced by word and phone recognizers respectively, and their outputs are combined. We propose to use a latent Dirichlet allocation (LDA) model for capturing the semantic information in the word transcription. The LDA model is employed...

Keywords:
TRAINING
ACOUSTICS
ESTIMATION

INFONA - science communication portal
