Emotions exhibited by a speaker can be detected by analyzing his/her speech, facial expressions, and gestures, or by combining these modalities. This paper concentrates on determining the emotional state from speech signals. Various acoustic features such as energy, zero crossing rate (ZCR), fundamental frequency, Mel Frequency Cepstral Coefficients (MFCCs), etc., are extracted for short-term, overlapping...
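Two of the listed features are simple enough to sketch directly. The following minimal NumPy example computes short-term energy and ZCR over overlapping frames; the frame length, hop size, and the 440 Hz test tone are illustrative assumptions, not values from the paper:

```python
import numpy as np

def frame_energy(frame: np.ndarray) -> float:
    """Short-term energy: mean of squared samples in the frame."""
    return float(np.mean(frame ** 2))

def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.sign(frame)
    # Treat exact zeros as positive so a flat zero run adds no crossings.
    signs[signs == 0] = 1
    return float(np.mean(signs[1:] != signs[:-1]))

# Illustrative input: a 440 Hz sine at 16 kHz, 25 ms frames with 10 ms hop.
sr = 16000
t = np.arange(int(0.1 * sr)) / sr
signal = np.sin(2 * np.pi * 440 * t)
frame_len, hop = int(0.025 * sr), int(0.010 * sr)
frames = [signal[i:i + frame_len]
          for i in range(0, len(signal) - frame_len + 1, hop)]
energies = [frame_energy(f) for f in frames]
zcrs = [zero_crossing_rate(f) for f in frames]
```

For a pure tone, the per-sample ZCR is roughly twice the frequency divided by the sampling rate (here about 0.055), and the energy of a unit sine is about 0.5.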
Many pattern recognition problems involve characterizing samples with continuous labels instead of discrete categories. While regression models are suitable for these learning tasks, the labels are often discretized into binary classes to formulate the problem as a conventional classification task (e.g., classes with low versus high values). This methodology imposes intrinsic limitations on the classification...
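As a concrete illustration of the discretization the abstract describes, one common choice is a median split of the continuous labels into "low" and "high" classes; the toy scores below are illustrative assumptions, not data from the paper:

```python
import numpy as np

def median_split(labels: np.ndarray) -> np.ndarray:
    """Binarize continuous labels: 1 if above the median, else 0.
    Samples near the threshold get hard class assignments, which is
    one of the intrinsic limitations such discretization brings."""
    return (labels > np.median(labels)).astype(int)

scores = np.array([0.1, 0.4, 0.45, 0.55, 0.6, 0.9])
classes = median_split(scores)  # → array([0, 0, 0, 1, 1, 1])
```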
Despite the advances of information technology tools in the speech recognition task, the challenge of finding a rapid and efficient approach remains a principal research topic. In this paper, we apply the k-nearest neighbors (kNN) algorithm to TIMIT phoneme recognition with two models: crisp and fuzzy. Essentially, we explore the contribution of the fuzzy aspect over the crisp version of the kNN algorithm...
Whispered speech, as an alternative speaking style for normal phonated (non-whispered) speech, has received little attention in speech emotion recognition. Currently, speech emotion recognition systems are exclusively designed to process normal phonated speech and can result in significantly degraded performance on whispered speech because of the fundamental differences between normal phonated speech...
We present a joint noise- and mask-aware training strategy for deep neural network (DNN) based speech enhancement with sub-band features. First, an analysis of the previously proposed dynamic noise-aware training approach on wide-band (16 kHz) speech data shows that full-band dynamic noise features cannot always improve enhancement performance, owing to inaccurate noise estimation....
In recent years, we have seen a surge of interest in neuromorphic computing and its hardware design for cognitive applications. In this work, we present new neuromorphic architecture, circuit, and device co-designs that enable spike-based classification for speech recognition tasks. The proposed neuromorphic speech recognition engine supports a sparsely connected deep spiking network with coarse granularity,...
The signal-processing front end for extracting the feature set is an important stage in any speaker recognition system. There are many types of features, derived in different ways, that have a strong impact on the recognition rate. This paper uses one such technique, known as Mel Frequency Cepstrum Coefficients (MFCCs), to extract the feature set from a speech signal and represent the signal parametrically...
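A minimal from-scratch sketch of the standard MFCC pipeline (windowing, power spectrum, mel filterbank, log compression, DCT) might look as follows; the frame size, FFT length, and filter counts are common defaults, not values taken from the paper:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters equally spaced on the mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def dct_ii(x, n_out):
    """Type-II DCT along the last axis (orthonormal scaling omitted)."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T

def mfcc(frame, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    """MFCCs for one pre-framed window of speech."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    fbank_energies = mel_filterbank(n_filters, n_fft, sr) @ spectrum
    log_energies = np.log(fbank_energies + 1e-10)
    return dct_ii(log_energies, n_ceps)

frame = np.random.randn(400)   # 25 ms of "speech" at 16 kHz (illustrative)
coeffs = mfcc(frame)           # 13 cepstral coefficients
```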
We propose a unified deep neural network (DNN) approach to achieve both high-quality enhanced speech and high-accuracy automatic speech recognition (ASR) simultaneously on the recent REverberant Voice Enhancement and Recognition Benchmark (REVERB) Challenge. These two goals are accomplished by two proposed techniques, namely DNN-based regression to enhance reverberant and noisy speech, followed by...
This paper investigates four single-channel speech dereverberation algorithms, i.e., two unsupervised approaches based on (i) spectral enhancement and (ii) linear prediction, as well as two supervised approaches relying on machine learning which incorporate deep neural networks to predict either (iii) the magnitude spectrogram or (iv) the ideal ratio mask. The relative merits of the four algorithms...
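For approach (iv), the ideal ratio mask is conventionally defined per time-frequency bin as sqrt(|S|^2 / (|S|^2 + |N|^2)), where S is the target component and N the interference; a minimal sketch, with toy 2x2 "spectrograms" as illustrative inputs:

```python
import numpy as np

def ideal_ratio_mask(clean_spec, noise_spec):
    """IRM per T-F bin: sqrt(|S|^2 / (|S|^2 + |N|^2)).
    clean_spec / noise_spec: magnitude spectrograms of the target
    and interference (e.g. the reverberant tail) components."""
    s2 = np.abs(clean_spec) ** 2
    n2 = np.abs(noise_spec) ** 2
    return np.sqrt(s2 / (s2 + n2 + 1e-10))

# Illustrative toy spectrograms (2 frequency bins x 2 frames):
clean = np.array([[2.0, 0.5], [1.0, 0.0]])
noise = np.array([[0.0, 0.5], [1.0, 2.0]])
mask = ideal_ratio_mask(clean, noise)
```

Multiplying the mixture magnitude spectrogram by this mask yields an estimate of the clean magnitude; a DNN trained on approach (iv) learns to predict the mask from the mixture alone.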
Aiming at language model (LM) adaptation for interactive speech transcription, this paper proposes a topic-based adaptation method that uses users' correction information. To infer the topic of each utterance in continuous speech, the method uses the correction information of the preceding utterances adjacent to the current one. Perplexity is calculated for topic inference. Topic-related LMs are interpolated...
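Perplexity, used here for topic inference, is the exponential of the negative mean log-probability that a model assigns to the words of a sequence; a minimal sketch (the probabilities below are illustrative, not from the paper):

```python
import numpy as np

def perplexity(word_probs):
    """Perplexity of a word sequence under an LM: exp of the negative
    mean log-probability the model assigned to each word."""
    logp = np.log(np.asarray(word_probs, dtype=float))
    return float(np.exp(-logp.mean()))

# A model assigning uniform probability 1/4 to every word has perplexity 4:
pp = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower perplexity means the topic-specific LM fits the utterance better, which is how such a score can drive topic selection.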
An investigation into the classification of emotional speech across different language families is proposed in this paper. Datasets in three languages are analyzed: CDESD in Mandarin, Emo-DB in German, and DES in Danish. With 2-D classification in the arousal-appraisal space, better recognition performance is observed in the arousal dimension than in the appraisal dimension. The classification rates in cross-language...
Query-by-Example Spoken Term Detection (QbE-STD) has been a hot research topic in the speech recognition field. Since template representation is a key component of QbE-STD, many researchers have been committed to developing effective template representations to obtain better performance. The Gaussian posteriorgram has been widely used because the GMM that generates the Gaussian posteriorgram...
In this work, the Fuzzy kNN (FkNN), an alternative to the standard kNN algorithm, is used for TIMIT phoneme recognition. The phoneme is the smallest unit that composes speech; for this reason, accurate phoneme recognition enables effective word and text recognition. Thus, the main idea consists in assigning phoneme memberships to the data by measuring the distance to its kNN...
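The membership assignment described above can be sketched with the classic distance-weighted fuzzy kNN rule, where each of the k nearest neighbours votes with weight proportional to 1/d^(2/(m-1)); the toy 1-D data and parameter choices below are illustrative assumptions, not TIMIT features:

```python
import numpy as np

def fuzzy_knn_memberships(x, train_x, train_y, n_classes, k=3, m=2.0):
    """Class memberships for query x under distance-weighted fuzzy kNN:
    each of the k nearest neighbours contributes 1/d^(2/(m-1)) to its
    class, and the result is normalized to sum to 1 across classes."""
    d = np.linalg.norm(train_x - x, axis=1)
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] ** (2.0 / (m - 1.0)) + 1e-10)
    u = np.zeros(n_classes)
    for idx, weight in zip(nn, w):
        u[train_y[idx]] += weight
    return u / u.sum()

# Toy 2-class example with 1-D "features":
train_x = np.array([[0.0], [0.1], [1.0], [1.1]])
train_y = np.array([0, 0, 1, 1])
u = fuzzy_knn_memberships(np.array([0.05]), train_x, train_y,
                          n_classes=2, k=3)
```

The query lies between the two class-0 points, so its membership in class 0 dominates; a crisp kNN would output only the hard label, whereas the memberships quantify how confident the assignment is.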
Studies show that, for small-scale populations, multimodal biometric systems perform better than single-modal biometric systems in recognition by robots. This paper establishes a new fusion method for multi-biometric feature identification that combines visual with auditory information. Before the fusion, speaker recognition based on vector quantization and face recognition based on sparse representation...
This paper proposes an end-to-end framework, namely fully convolutional recurrent network (FCRN) for handwritten Chinese text recognition (HCTR). Unlike traditional methods that rely heavily on segmentation, our FCRN is trained with online text data directly and learns to associate the pen-tip trajectory with a sequence of characters. FCRN consists of four parts: a path-signature layer to extract...
Speech is the natural, vocalized, and primary means of communication. Speech is easy, hands-free, fast, and does not require any technical knowledge. Communicating with a computer using speech is a simple and comfortable way for human beings, and speech recognition systems have made this possible. Acoustic and language models for such systems are available, but mostly in the English language. In India there are so many people...
Although Query-by-Example techniques based on Euclidean distance in a multidimensional feature space have proved to be effective for image databases, this approach cannot be effectively applied to video since the number of dimensions would be massive due to the richness and complexity of video data. The above issue has been addressed in two recent solutions, namely Deterministic Quantization (DQ)...
Deep neural networks (DNNs) have tremendously improved the performance of automatic speech recognition (ASR). On the other hand, end-to-end speech recognition systems can achieve state-of-the-art performance using Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) and the Connectionist Temporal Classification (CTC) method for unsegmented sequence data. In this paper, we therefore propose a lightweight...
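The CTC step that maps frame-level network outputs to an unsegmented label sequence can be sketched as the standard collapse rule: merge consecutive repeats, then drop blanks (the blank index and example path below are illustrative):

```python
def ctc_collapse(path, blank=0):
    """Map a frame-level CTC path to an output label sequence:
    merge consecutive repeated labels, then remove blank symbols."""
    out = []
    prev = None
    for label in path:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# e.g. the path [0, 3, 3, 0, 3, 5, 5, 0] collapses to [3, 3, 5]:
decoded = ctc_collapse([0, 3, 3, 0, 3, 5, 5, 0])
```

The blank symbol between the two 3s is what allows CTC to emit the same label twice in a row, which is why training inserts blanks between repeated targets.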
A new type of End-to-End system for text-dependent speaker verification is presented in this paper. Previously, using a phonetically discriminative / speaker-discriminative DNN as a feature extractor for speaker verification has shown promising results. The extracted frame-level (bottleneck, posterior or d-vector) features are equally weighted and aggregated to compute an utterance-level speaker representation...
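The equal-weight aggregation described above can be sketched as a simple mean over frame-level features followed by L2 normalization; the feature dimension and random "frames" below are illustrative assumptions:

```python
import numpy as np

def utterance_embedding(frame_features):
    """Equal-weight aggregation: the utterance-level speaker
    representation is the mean of the frame-level (e.g. d-vector)
    features, L2-normalized for cosine scoring."""
    emb = np.mean(frame_features, axis=0)
    return emb / (np.linalg.norm(emb) + 1e-10)

frames = np.random.randn(200, 64)   # 200 frames of 64-dim features
d_vector = utterance_embedding(frames)
```

Two such utterance embeddings are then typically compared with cosine similarity; equal weighting treats every frame as equally informative, which is the assumption the end-to-end system in this abstract revisits.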
Long short-term memory (LSTM) recurrent neural network based language models are known to improve speech recognition performance. However, significant effort is required to optimize network structures and training configurations. In this study, we automate the development process using evolutionary algorithms. In particular, we apply the covariance matrix adaptation-evolution strategy (CMA-ES), which...