Search results

Items from 1 to 20 out of 187 results

chapter

Research on multi-base depth neural network speech recognition

Cai Jun, Li Fei, Zhang Yi, Liu Yu

2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC) > 1540 - 1544

2017 IEEE 2nd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC)

In speech recognition systems, an improved multi-base neural network speech recognition model is proposed to solve the problem of long learning time and slow convergence rate of deep neural networks. However, the improved model introduces a large number of parameters in the training process, which causes the model to over-fit on the test set, resulting in the deterioration of generalization ability and the...

chapter

Automatic speech recognition models: A characteristic and performance review

U. G. Patil, S. D. Shirbahadurkar, A. N. Paithane

2016 International Conference on Computing Communication Control and automation (ICCUBEA) > 1 - 7

2016 International Conference on Computing Communication Control and automation (ICCUBEA)

This paper presents a review of a few notable speech recognition models reported in the last decade. Firstly, the models are categorized into sparse models, learning models and domain-specific models. Subsequently, the characteristics of the models are observed using speech constraints, algorithmic constraints and performance constraints. The performance of these models reported in...

chapter

Evaluation of wains as a classifier for automatic speech recognition

Rosemary T. Salaja, Ronan Flynn, Michael Russell

2015 26th Irish Signals and Systems Conference (ISSC) > 1 - 6

2015 26th Irish Signals and Systems Conference (ISSC)

This paper introduces a new back-end classifier for a speech recognition system that is based on artificial life (ALife). The ALife species being used for classification purposes are called wains, which were developed using the Créatúr framework. The speech recognition task used in the evaluation of the new classifier is that of isolated digit recognition. Performance of the proposed back-end classifier...

chapter

Feature extraction analysis on Indonesian speech recognition system

Untari N. Wisesty, Adiwijaya, Widi Astuti

2015 3rd International Conference on Information and Communication Technology (ICoICT) > 54 - 58

2015 3rd International Conference on Information and Communication Technology (ICoICT)

Speech recognition is widely applied to speech-to-text and speech-to-emotion tasks, in order to make gadgets and computers easier to use, or to help people with hearing disabilities. Feature extraction is one of the significant steps in the performance of speech recognition. Therefore, proper selection is needed. In this paper, we analyze feature extraction that can have good performance for Indonesian speech...

chapter

Deep neural networks for cochannel speaker identification

Xiaojia Zhao, Yuxuan Wang, DeLiang Wang

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4824 - 4828

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Speaker identification (SID) in cochannel speech, where two speakers are talking simultaneously over a single recording channel, is a challenging problem. Previous studies address this problem in the anechoic environment under the Gaussian mixture model (GMM) framework. On the other hand, cochannel SID in reverberant conditions has not been addressed. This paper studies cochannel SID in both anechoic...

chapter

Speech recognition with prediction-adaptation-correction recurrent neural networks

Yu Zhang, Dong Yu, Michael L. Seltzer, Jasha Droppo

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5004 - 5008

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We propose the prediction-adaptation-correction RNN (PAC-RNN), in which a correction DNN estimates the state posterior probability based on both the current frame and the prediction made on the past frames by a prediction DNN. The result from the main DNN is fed back to the prediction DNN to make better predictions for the future frames. In the PAC-RNN, we can consider that, given the new, current...

chapter

Improved recognition of contact names in voice commands

Petar Aleksic, Cyril Allauzen, David Elson, Aleksandar Kracun, more

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5172 - 5175

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The recognition of contact names in mobile-device voice commands is a challenging problem. Some of the difficulties include potentially infinite vocabularies, low probability of contact tokens in the language model (LM), increased false triggering of contact voice commands when none are spoken, and very large and noisy contact name lists. In this paper we suggest solutions for each of these difficulties.

chapter

Discriminative spectral learning of hidden markov models for human activity recognition

Alfredo Nazabal, Antonio Artes-Rodriguez

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 1966 - 1970

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Hidden Markov Models (HMMs) are one of the most important techniques to model and classify sequential data. Maximum Likelihood (ML) and (parametric and non-parametric) Bayesian estimation of the HMM parameters suffer from local maxima, and on massive datasets they can be especially time consuming. In this paper, we extend the spectral learning of HMMs, a moment matching learning technique free from...

chapter

Weighted training for speech under Lombard Effect for speaker recognition

Muhammad Muneeb Saleem, Gang Liu, John H.L. Hansen

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4350 - 4354

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

The presence of Lombard Effect in speech is proven to have severe effects on the performance of speech systems, especially speaker recognition. Varying kinds of Lombard speech are produced by speakers under influence of varying noise types [1]. This study proposes a high-accuracy classifier using deep neural networks for detecting various kinds of Lombard speech against neutral speech, independent...

chapter

Longer-length acoustic units for continuous speech recognition

Annika Hamalainen, Johan de Veth, Lou Boves

2005 13th European Signal Processing Conference > 1 - 4

2005 13th European Signal Processing Conference

Recent research on the TIMIT database suggests that longer-length acoustic units are better suited for modelling pronunciation variation and long-term temporal dependencies in speech than traditional phoneme-length units, yielding substantial improvements in recognition accuracy [9]. In this paper, we investigate whether similar improvements can be gained on another database, viz. excerpts from novels...

chapter

Speaker based Language Independent Isolated Speech Recognition System

Shanthi Therese S., Chelpa Lingam

2015 International Conference on Communication, Information & Computing Technology (ICCICT) > 1 - 7

2015 International Conference on Communication, Information & Computing Technology (ICCICT)

This paper presents a speaker based Language Independent Isolated Speech Recognition System (LIISRS). The most popular feature extraction technique, Mel Frequency Cepstral Coefficients (MFCC), is used for training the system. Representative specific features are identified using the K-Means algorithm. The distortion measure is calculated using the Euclidean distance function. Pitch contour characteristics are used...

chapter

Two-stage phone recognition system using articulatory and spectral features

K E Manjunath, K. Sreenivasa Rao, M Gurunath Reddy

2015 International Conference on Signal Processing and Communication Engineering Systems > 107 - 111

2015 International Conference on Signal Processing And Communication Engineering Systems (SPACES)

In this paper, we propose a two-stage phone recognition system using articulatory and spectral features. In the first stage, articulatory features are predicted from spectral features using FeedForward Neural Networks (FFNNs). In the second stage, phone recognition is carried out using the predicted articulatory features and spectral features together. FFNNs and Hidden Markov Models are explored for...

chapter

Improvement of phone recognition accuracy using source and system features

K E Manjunath, K. Sreenivasa Rao, M Gurunath Reddy

2015 International Conference on Signal Processing and Communication Engineering Systems > 501 - 505

2015 International Conference on Signal Processing And Communication Engineering Systems (SPACES)

The goal of this work is to improve phone recognition accuracy using combination of source and system features. As speech is produced by exciting time varying vocal tract system with time varying excitation, we want to explore both source and system components of speech production system for phone recognition. The excitation source information is derived by processing linear prediction residual of...

chapter

Unsupervised speaker adaptation of DNN-HMM by selecting similar speakers for lecture transcription

Masato Mimura, Tatsuya Kawahara

Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2014 Asia-Pacific > 1 - 4

2014 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Unsupervised speaker adaptation of Deep Neural Network (DNN) is investigated for lecture transcription tasks, in which a single speaker gives a long speech and thus speaker adaptation is important. The proposed method selects similar speakers to the test data (test speaker) from the training database, which are used for retraining the baseline DNN. Several speaker characteristic features are defined...

chapter

Speech/Music Classification of Short Audio Segments

Toni Hirvonen

2014 IEEE International Symposium on Multimedia > 135 - 138

2014 IEEE International Symposium on Multimedia (ISM)

Research on speech/music classification of digital audio has been both popular in academia and increasingly utilized in industry. Most of the usual methods use carefully hand-crafted features with Gaussian Mixture Models. To get the best performance, some of the features necessitate a long latency due to look ahead, and/or a long onset error. This paper aims to take a different approach to the problem...

chapter

Using k-Nearest Neighbor and Speaker Ranking for Phoneme Prediction

Muhammad Rizwan, David V. Anderson

2014 13th International Conference on Machine Learning and Applications > 383 - 387

2014 13th International Conference on Machine Learning and Applications (ICMLA)

Speech recognition systems are based on either a parametric approach or a non-parametric approach. Parametric systems such as HMMs have been the dominant technology for speech recognition in the past decade. Despite many advancements and enhancements in the design of these systems, key problems such as long term temporal dependence have not yet been solved. Recently due to availability...

chapter

Automatic pronunciation error detection of nonnative Arabic Speech

Afnan Al Hindi, Mansour Alsulaiman, Ghulam Muhammad, Saad Al-Kahtani

2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA) > 190 - 197

2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA)

Computer assisted language learning (CALL) and, more specifically, computer assisted pronunciation training (CAPT) have received considerable attention in recent years. CAPT allows continuous feedback to the learner without requiring the sole attention of the teacher; it facilitates self study and encourages interactive use of the language in preference to rote learning. One of the important processes...

chapter

Speech emotion recognition

S. Lalitha, Abhishek Madhavan, Bharath Bhushan, Srinivas Saketh

2014 International Conference on Advances in Electronics Computers and Communications > 1 - 4

2014 International Conference on Advances in Electronics, Computers and Communications (ICAECC)

In the past decade a lot of research has gone into Automatic Speech Emotion Recognition (SER). The primary objective of SER is to improve the man-machine interface. It can also be used to monitor the psychophysiological state of a person in lie detectors. In recent times, speech emotion recognition has also found applications in medicine and forensics. In this paper 7 emotions are recognized using pitch...

chapter

Proposed combination of PCA and MFCC feature extraction in speech recognition system

Hoang Trang, Tran Hoang Loc, Huynh Bui Hoang Nam

2014 International Conference on Advanced Technologies for Communications (ATC 2014) > 697 - 702

2014 International Conference on Advanced Technologies for Communications (ATC)

In a speech recognition system, Mel Frequency Cepstrum Coefficients (MFCC) feature extraction is an important process. It has also been widely used in many applications. In this paper, we present the conventional MFCC feature extraction method and propose two novel versions of the MFCC method that combine the PCA technique with conventional MFCC feature extraction. Finally, these three...

chapter

Syllabic Markov models of Arabic HMMs of spoken Arabic using CV units

Michael Ingleby, Fatmah Baothman

2014 Third IEEE International Colloquium in Information Science and Technology (CIST) > 254 - 259

2014 Third IEEE International Colloquium in Information Science and Technology (CIST)

We survey evidence (orthographic, distributional, phonological and psycholinguistic) in favor of a model of Arabic speech sounds based on the CV unit and extensive use of the silent sukuun vowel. We then construct a small-vocabulary multi-speaker CV HMM similar to the phonemic HMMs based on tied triphones that are widely used in speech recognizers for English and other European languages. Using experimental...

Keywords:
TRAINING
ACCURACY
SPEECH RECOGNITION

Keywords

SPEECH (130)
HIDDEN MARKOV MODELS (107)
FEATURE EXTRACTION (54)
ACOUSTICS (46)
SPEECH PROCESSING (30)
DATABASES (26)
MEL FREQUENCY CEPSTRAL COEFFICIENT (25)
ARTIFICIAL NEURAL NETWORKS (23)
AUTOMATIC SPEECH RECOGNITION (21)
SUPPORT VECTOR MACHINES (21)
TRAINING DATA (20)
NATURAL LANGUAGE PROCESSING (19)
DATA MODELS (18)
TESTING (18)
CLASSIFICATION ALGORITHMS (16)
DATA MINING (15)
SPEAKER RECOGNITION (15)
NOISE (14)
ROBUSTNESS (13)
LEARNING (ARTIFICIAL INTELLIGENCE) (12)
NEURAL NETWORKS (12)
SIGNAL PROCESSING (12)
COMPUTATIONAL MODELING (11)
CORRELATION (11)
HIDDEN MARKOV MODEL (11)
HMM (11)
MATHEMATICAL MODEL (11)
PATTERN RECOGNITION (10)
ACOUSTIC MODELING (9)
DISCRIMINATIVE TRAINING (9)
EMOTION RECOGNITION (9)
SUPPORT VECTOR MACHINE CLASSIFICATION (9)
VECTORS (9)
COMPUTERS (8)
CONTEXT (8)
DECODING (8)
ENTROPY (8)
GAUSSIAN PROCESSES (8)
MACHINE LEARNING (8)
PATTERN CLASSIFICATION (8)
SPEAKER IDENTIFICATION (8)
STATISTICAL ANALYSIS (8)
SUPPORT VECTOR MACHINE (8)
VOCABULARY (8)
ADAPTATION MODEL (7)
ALGORITHM DESIGN AND ANALYSIS (7)
CEPSTRAL ANALYSIS (7)
CONFERENCES (7)
EDUCATIONAL INSTITUTIONS (7)
EQUATIONS (7)
MFCC (7)
NATURAL LANGUAGES (7)
NEURAL NETS (7)
SIGNAL CLASSIFICATION (7)
WRITING (7)
ACOUSTIC SIGNAL PROCESSING (6)
FACE RECOGNITION (6)
INFORMATION RETRIEVAL (6)
LANGUAGE MODEL (6)
LATTICES (6)
OPTIMIZATION (6)
PRINCIPAL COMPONENT ANALYSIS (6)
PROBABILITY (6)
SIGNAL PROCESSING ALGORITHMS (6)
SVM (6)
TRANSFORMS (6)
ADAPTATION MODELS (5)
CLASSIFICATION (5)
CONTEXT MODELING (5)
DETECTORS (5)
DICTIONARIES (5)
ELECTRONIC MAIL (5)
ENCODING (5)
MICROPHONES (5)
PRESSES (5)
ROBUST SPEECH RECOGNITION (5)
SIGNAL TO NOISE RATIO (5)
SMOOTHING METHODS (5)
SPEECH ANALYSIS (5)
ANALYTICAL MODELS (4)
ARTIFICIAL INTELLIGENCE (4)
CHARACTER RECOGNITION (4)
CLUSTERING METHODS (4)
EIGENVALUES AND EIGENFUNCTIONS (4)
ERROR ANALYSIS (4)
IMAGE COLOR ANALYSIS (4)
IMAGE PROCESSING (4)
IMAGE SEGMENTATION (4)
LABELING (4)
LABORATORIES (4)
LANGUAGE MODELING (4)
MAXIMUM LIKELIHOOD ESTIMATION (4)
MULTILAYER PERCEPTRONS (4)
PATTERN CLUSTERING (4)
PREDICTION ALGORITHMS (4)
RECOGNITION (4)
SHAPE (4)

INFONA - science communication portal
