Search results

Items from 81 to 100 out of 654 results

chapter

Improving BLSTM RNN based Mandarin speech recognition using accent dependent bottleneck features

Jiangyan Yi, Hao Ni, Zhengqi Wen, Jianhua Tao

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

This paper proposes an approach to accent adaptation that uses accent dependent bottleneck (BN) features to improve the performance of a multi-accent Mandarin speech recognition system. The adaptation architecture uses two neural networks. First, a deep neural network (DNN) acoustic model acts as a feature extractor for accent dependent BN (BN-DNN) features. The input...

chapter

Robust Automatic Speech Recognition system based on using adaptive time-frequency masking

Ahmed Mostafa Gouda, Mohamed Tamazin, Mohamed Khedr

2016 11th International Conference on Computer Engineering & Systems (ICCES) > 181 - 186

2016 11th International Conference on Computer Engineering & Systems (ICCES)

Automatic Speech Recognition (ASR) systems suffer from many types of noise in different environments. Developing robust ASR systems is an attractive research topic because of high demand in many commercial applications. In this paper, the Mel-Frequency Cepstral Coefficients (MFCC) are modified to be robust to noise, with the spectrogram used as the time-frequency analysis tool. The proposed...

chapter

DNN based detection of pronunciation erroneous tendency in data sparse condition

Yingming Gao, Yanlu Xie, Ju Lin, Jinsong Zhang

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

Detecting pronunciation erroneous tendency (PET) can provide second language learners with detailed, instructive feedback in computer aided pronunciation training (CAPT) systems. Due to data sparseness, DNN-HMM achieved limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigated how to further improve the performance...

chapter

Locality sensitive discriminant analysis for speaker verification

Danwei Cai, Weicheng Cai, Zhidong Ni, Ming Li

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

In this paper, we apply Locality Sensitive Discriminant Analysis (LSDA) to a speaker verification system for intersession variability compensation. As opposed to LDA, which fails to discover the local geometrical structure of the data manifold, LSDA finds a projection which maximizes the margin between i-vectors from different speakers at each local area. Since the number of samples varies in a wide...

chapter

Isolated word Automatic Speech Recognition (ASR) System using MFCC, DTW & KNN

Muhammad Atif Imtiaz, Gulistan Raja

2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast) > 106 - 110

2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast)

An Automatic Speech Recognition (ASR) system is defined as the transformation of acoustic speech signals into a string of words. This paper presents an ASR approach based on isolated word structure using Mel-Frequency Cepstral Coefficients (MFCCs), Dynamic Time Warping (DTW) and K-Nearest Neighbor (KNN) techniques. The Mel-Frequency scale used to capture the significant characteristics of the speech...

chapter

Research on the recognition of isolated Chinese lyrics in songs with accompaniment based on deep belief networks

Juanjuan Cai, Nana Wang, Hui Wang, Bing Zhu

2016 IEEE 13th International Conference on Signal Processing (ICSP) > 535 - 540

2016 IEEE 13th International Conference on Signal Processing (ICSP)

Lyrics are an important part of songs. Lyrics recognition is the basis of retrieving songs and recognizing the content of songs, which is of great value. At present, speech recognition research has made great progress, but there are still difficulties in recognizing lyrics in songs with accompaniment. Related research is generally lacking, especially for Chinese lyrics in songs with accompaniment,...

chapter

Real-time speaker identification system using cepstral features

Monalisha Barik, Susanta Kumar Sarangi, Sushanta Kumar Sahu

2016 2nd International Conference on Communication Control and Intelligent Systems (CCIS) > 89 - 93

2016 2nd International Conference on Communication, Control & Intelligent Systems (CCIS)

A real-time speaker identification (SI) system is an application of biometrics in which voice samples are collected in real time. As a result, noise contamination of speaker samples is the natural scenario. In this work, we tried to increase the accuracy of a real-time SI system. We analysed the SI system using different feature extraction methods with a GMM-ML classifier. We found that MFCC...

chapter

A Comparative Study of Different Speech Features for Arabic Phonemes Classification

Ali Meftah, Yousef A. Alotaibi, Sid-Ahmed Selouani

2016 European Modelling Symposium (EMS) > 47 - 52

2016 European Modelling Symposium (EMS)

This paper presents work on the phonetic analysis of classical Arabic speech. A hidden Markov model classifier is applied to Arabic phonemes. For the purpose of this work, a new classical Arabic speech corpus is created. The corpus is based on selected recordings of recitations of The Holy Quran. A number of acoustic features are analyzed and compared. Those are: linear predictive coding (LPC)...

chapter

Measuring Customer Satisfaction through Speech Using Valence-Arousal Approach

Norhaslinda Kamaruddin, Abdul Wahab Abdul Rahman, Aina Najwa Razman Shah

2016 6th International Conference on Information and Communication Technology for The Muslim World (ICT4M) > 298 - 303

2016 6th International Conference on Information and Communication Technology for The Muslim World (ICT4M)

There has been much empirical research demonstrating the important link between customer satisfaction and sales performance; as such, many Customer Satisfaction Indexes (CSI) have been developed. Almost all CSIs to date use the survey or questionnaire method, which has its flaws. In order to quantify the CSI, we propose the use of speech analysis based on the affective space model where the valence and...

chapter

Speech recognition using Support Vector Machines

Kamil Aida-zade, Anar Xocayev, Samir Rustamov

2016 IEEE 10th International Conference on Application of Information and Communication Technologies (AICT) > 1 - 4

2016 IEEE 10th International Conference on Application of Information and Communication Technologies (AICT)

In this article we applied Support Vector Machines to the acoustic model of a speech recognition system based on MFCC and LPC features for an Azerbaijani dataset. This dataset has previously been used for speech recognition with a multilayer artificial neural network and achieved some results. The main goal of this work is applying SVM techniques to the Azerbaijan Speech Recognition System. The variety of results of SVM...

chapter

Feature extraction and analysis of MISING speech vowels

Rizwan Rehman, Gopal Chandra Hazarika, Devid Kardong

2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES) > 261 - 265

2016 International Conference on Signal Processing, Communication, Power and Embedded System (SCOPES)

Speech analysis forms the first layer in the process of automatic speech recognition. All speech recognition systems primarily perform pattern recognition and therefore perform well when input features with certain properties are provided. The Mel-scale cepstral coefficients and LP coefficients transformed into cepstral coefficients are the best techniques for performing the automatic speech recognition...

chapter

Frequency Domain Linear Prediction-based robust text-dependent speaker identification

M. A. Islam

2016 International Conference on Innovations in Science, Engineering and Technology (ICISET) > 1 - 4

2016 International Conference on Innovations in Science, Engineering and Technology (ICISET)

Speaker identification is a biometric technique for determining an unknown speaker's identity among a number of speakers using distinguishing latent information in uttered speech. Crime investigation, security control, telephone banking and trading, and information reservation are some applications of this technique. Frequency Domain Linear Prediction (FDLP) is a time-frequency-based feature that has been...

chapter

Feature extraction and classification of the Indonesian syllables using Discrete Wavelet Transform and statistical features

Domy Kristomo, Risanuri Hidayat, Indah Soesanti

2016 2nd International Conference on Science and Technology-Computer (ICST) > 88 - 92

2016 2nd International Conference on Science and Technology-Computer (ICST)

The major problems of most speech recognition systems are their unsatisfactory effectiveness (impact on recognition rate), efficiency (feature vector dimension), shift variance, and robustness in noisy conditions. Feature extraction plays a very important role in the speech recognition process, because better features improve the recognition rate. This paper presents a speech feature extraction...

chapter

Senone I-vectors for robust speaker verification

Zhili Tan, Yingke Zhu, Man-Wai Mak, Brian Kan-Wing Mak

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Recent research has shown that using senone posteriors for i-vector extraction can achieve outstanding performance. In this paper, we extend this idea to robust speaker verification by constructing a deep neural network (DNN) comprising a deep belief network (DBN) stacked on top of a denoising autoencoder (DAE). The proposed method addresses noise robustness from two perspectives: (1) denoising the...

chapter

Exploring tonal information for Lhasa dialect acoustic modeling

Jian Li, Hongcui Wang, Longbiao Wang, Jianwu Dang, et al.

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Detailed analysis of tonal features for Tibetan Lhasa dialect is an important task for Tibetan automatic speech recognition (ASR) applications. However, it is difficult to utilize tonal information because it remains controversial how many tonal patterns the Lhasa dialect has. Therefore, few studies have focused on modeling the tonal information of the Lhasa dialect for speech recognition purpose...

chapter

Recognition of infant's emotions and needs from speech signals

Xuan Zhou, Hongzhi Hu, Lina Wei, Jian Wang, et al.

2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC) > 4620 - 4625

2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Speech is not only a way for infants under one year of age to communicate with the outside world, but also an important source of information reflecting their emotions and needs, as well as their health status and mental level. In order to explore intelligent machine technology for understanding infants' emotions and needs from speech signals, and thereby help parents in child rearing, this paper studied...

chapter

Evaluating the usage of short-time energy on voice biometrics system for cerebral palsy

Syifaun Nafisah, Oyas Wahyunggoro, Lukito Edi Nugroho

2016 8th International Conference on Information Technology and Electrical Engineering (ICITEE) > 1 - 6

2016 8th International Conference on Information Technology and Electrical Engineering (ICITEE)

This study was performed to evaluate the feasibility of short-time energy as an input feature vector used as the recognition key in a voice biometric system for recognizing Cerebral Palsy (CP). To retrieve the characteristics of the voice, Mel-Frequency Cepstral Coefficients (MFCC) were used as the feature extraction algorithm, while Neuro Fuzzy was used as the classifier algorithm...

chapter

Speech recognition using Principal Components Analysis and Neural Networks

Shaham Shabani, Yaser Norouzi

2016 IEEE 8th International Conference on Intelligent Systems (IS) > 90 - 95

2016 IEEE 8th International Conference on Intelligent Systems (IS)

In this paper, we introduce a new approach to recognizing discrete speech, specifically predefined words. Our approach is mainly based on Principal Components Analysis (PCA) and Neural Networks (NN). To do so, we first build a database from 20 speakers, each of whom uttered each of 10 predefined Persian words 5 times. Then we apply Voice Activity Detection (VAD)...

chapter

Estimating multiple physical parameters from speech data

Shareef Babu Kalluri, Ashwin Vijayakumar, Deepu Vijayasenan, Rita Singh

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) > 1 - 5

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP)

In this work, we explore prediction of different physical parameters from speech data. We aim to predict shoulder size and waist size of people from speech data in addition to the conventional height and weight parameters. A data-set with this information is created from 207 volunteers. A bag of words representation based on log magnitude spectrum is used as features. A support vector regression predicts...

chapter

An Automatic and Robust System for Identification of Problematic Call Centre Conversations

Joyjit Chatterjee, Ayush Saxena, Garima Vyas

2016 International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE) > 325 - 330

2016 International Conference on Micro-Electronics and Telecommunication Engineering (ICMETE)

In this Globalized world, the Call Centers and BPOs are increasing at an exponential rate. There is stiff competition among various companies and every company wants to have its clients happy and satisfied with the resolution of the problems. For this purpose, Agent Quality Monitoring is an important requirement. Since in a typical Call Centre, thousands of calls are made by agents in a single day, it...

Data set:
ieee
Keywords:
FEATURE EXTRACTION
MEL FREQUENCY CEPSTRAL COEFFICIENT
SPEECH
Publication type:
book


Content availability

Available (651)
None (3)

Keywords

SPEECH RECOGNITION (353)
TRAINING (149)
HIDDEN MARKOV MODELS (147)
SPEAKER RECOGNITION (147)
MFCC (143)
DATABASES (117)
SPEECH PROCESSING (103)
SUPPORT VECTOR MACHINES (92)
ACCURACY (90)
CEPSTRAL ANALYSIS (76)
NOISE (70)
EMOTION RECOGNITION (69)
FILTER BANKS (50)
SPEAKER IDENTIFICATION (44)
GMM (42)
ROBUSTNESS (42)
GAUSSIAN MIXTURE MODEL (39)
NOISE MEASUREMENT (37)
GAUSSIAN PROCESSES (34)
MATHEMATICAL MODEL (34)
VECTORS (33)
CLASSIFICATION ALGORITHMS (32)
ARTIFICIAL NEURAL NETWORKS (31)
DATA MINING (31)
SPEAKER VERIFICATION (31)
MEL FREQUENCY CEPSTRAL COEFFICIENTS (30)
CORRELATION (28)
TESTING (27)
AUTOMATIC SPEECH RECOGNITION (26)
VECTOR QUANTIZATION (26)
MEL-FREQUENCY CEPSTRAL COEFFICIENTS (24)
SIGNAL TO NOISE RATIO (24)
SVM (24)
COMPUTATIONAL MODELING (23)
DISCRETE COSINE TRANSFORMS (23)
FILTER BANK (23)
AUDIO SIGNAL PROCESSING (22)
HIDDEN MARKOV MODEL (21)
KERNEL (20)
PRINCIPAL COMPONENT ANALYSIS (20)
SIGNAL CLASSIFICATION (20)
SUPPORT VECTOR MACHINE (20)
NATURAL LANGUAGE PROCESSING (18)
FILTERING THEORY (17)
SIGNAL PROCESSING (17)
HMM (16)
MEL-FREQUENCY CEPSTRAL COEFFICIENT (16)
MUSIC (16)
ACOUSTIC SIGNAL PROCESSING (15)
LPC (15)
NEURAL NETWORKS (15)
NIST (15)
COMPUTERS (14)
SUPPORT VECTOR MACHINE CLASSIFICATION (14)
ADAPTATION MODELS (13)
MEL FREQUENCY CEPSTRAL COEFFICIENTS (MFCC) (13)
MICROPHONES (13)
NEURAL NETWORK (13)
SPEECH CODING (13)
SPEECH EMOTION RECOGNITION (13)
SPEECH ENHANCEMENT (13)
TIME FREQUENCY ANALYSIS (13)
TRANSFORMS (13)
ALGORITHM DESIGN AND ANALYSIS (12)
DATA MODELS (12)
DISCRETE WAVELET TRANSFORMS (12)
FEATURE SELECTION (12)
GAUSSIAN MIXTURE MODELS (12)
HARMONIC ANALYSIS (12)
INDEXES (12)
LEARNING (ARTIFICIAL INTELLIGENCE) (12)
PATTERN CLASSIFICATION (12)
VECTOR QUANTISATION (12)
WAVELET TRANSFORMS (12)
ACOUSTICS (11)
CEPSTRUM (11)
CLASSIFICATION (11)
CONFERENCES (11)
NEURAL NETS (11)
ROBUST SPEECH RECOGNITION (11)
SPEAKER DIARIZATION (11)
MACHINE LEARNING (10)
PITCH (10)
SPECTRAL ANALYSIS (10)
ACOUSTIC FEATURES (9)
AUDIO CLASSIFICATION (9)
EQUATIONS (9)
ESTIMATION (9)
HEURISTIC ALGORITHMS (9)
NEURONS (9)
POLYNOMIALS (9)
SPEECH ANALYSIS (9)
SPEECH FEATURE EXTRACTION (9)
TRAINING DATA (9)
VISUALIZATION (9)
VQ (9)
ADAPTATION MODEL (8)

INFONA - science communication portal
