Search results

Items from 21 to 40 out of 2,284 results

chapter

A DNN regression approach to speech enhancement by artificial bandwidth extension

Johannes Abel, Tim Fingscheidt

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 219-223

Artificial speech bandwidth extension (ABE) is an extremely effective means of speech enhancement at the receiver side of a narrowband telephony call. First approaches incorporating deep neural networks (DNNs) into the estimation of the upper-band speech representation have already appeared. In this paper we propose a regression-based DNN ABE that is trained and tested on acoustically different speech databases,...

chapter

Simultaneous learning of speech feature and segment for classification of Parkinson disease

Yongming Li, Cheng Zhang, Yunjian Jia, Ping Wang, et al.

2017 IEEE 19th International Conference on e-Health Networking, Applications and Services (Healthcom), pp. 1-6

Speech feature learning is very important for the design of classification algorithms for Parkinson's disease (PD). Existing speech feature learning methods for PD classification attend only to the speech features. This paper proposes a novel hybrid feature learning algorithm that pools the features of all speech segments of each subject, thereby obtaining new and highly efficient...

chapter

Broadband DOA estimation using convolutional neural networks trained with noise signals

Soumitro Chakrabarty, Emanuel A. P. Habets

2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 136-140

A convolutional neural network (CNN)-based classification method for broadband direction-of-arrival (DOA) estimation is proposed, in which the phase components of the short-time Fourier transform coefficients of the received microphone signals are fed directly into the CNN, and the features required for DOA estimation are learned during training. Since only the phase component of the input is used, the CNN can be trained with...

chapter

Research on voiceprint recognition based on weighted clustering recognition SVM algorithm

Yang Wu, Lihong Xu, Yandong Chen, Xueyang Zhang

2017 Chinese Automation Congress (CAC), pp. 1144-1148

The support vector machine (SVM) algorithm has received much attention in voiceprint recognition research, especially for small-sample datasets. However, as the number of recognition targets and the number of speech features grow, model training and recognition slow down significantly. To solve this problem, a new weighted clustering algorithm is proposed, which uses the “one-to-one” SVM model...

chapter

Impact of Bandwidth and Channel Variation on Presentation Attack Detection for Speaker Verification

Hector Delgado, Massimiliano Todisco, Nicholas Evans, Md Sahidullah, et al.

2017 International Conference of the Biometrics Special Interest Group (BIOSIG), pp. 1-6

Vulnerabilities to presentation attacks can undermine confidence in automatic speaker verification (ASV) technology. While efforts to develop countermeasures, known as presentation attack detection (PAD) systems, are now under way, the majority of past work has been performed with high-quality speech data. Many practical ASV applications are narrowband and encompass various coding and other channel...

chapter

Single-channel speech separation based on deep clustering with local optimization

Taotao Fu, Ge Yu, Lili Guo, Yan Wang, et al.

2017 3rd International Conference on Frontiers of Signal Processing (ICFSP), pp. 44-49

There are many challenges in single-channel multi-person mixed speech separation, such as modeling the temporal continuity of the speech signals and improving the frame separation performance simultaneously. In this paper, a separation method based on Deep Clustering with local optimization by the improved Non-Negative Matrix Factorization (NMF) combined with Factorial Conditional Random Fields (FCRF)...

chapter

Extended list of stop words: Does it work for keyphrase extraction from short texts?

Svetlana Popova, Gabriella Skitalinskaya

2017 12th International Scientific and Technical Conference on Computer Sciences and Information Technologies (CSIT), vol. 1, pp. 401-404

In this paper we study the problem of keyphrase extraction from short texts written in Russian. As texts we consider messages posted on Internet car forums about buying or repairing cars. Our main assumption is that constructing stop-word lists for keyphrase extraction can be effective if it is done on the basis of a small, expert-annotated collection. The results show that even...

chapter

Novel alignment method for DNN TTS training using HMM synthesis models

Sinisa Suzic, Tijana Delic, Darko Pekar, Vladimir Ostojic

2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY), pp. 271-276

In order to train neural networks (NN) for text-to-speech synthesis (TTS), phonetic segmentation must be performed. The most accurate segmentation is performed manually, but the process of creating manual alignments is costly and time-consuming, so automatic procedures are preferable. In this paper, a simple alignment method based on models trained during hidden Markov Model (HMM) based TTS system...

chapter

Effects of Different Behaviors between Cross Cultures on Learners When Studying

Sanggyu Shin, Hiroshi Hashimoto, Ikuyo Yoshida

2017 International Conference on Culture and Computing (Culture and Computing), pp. 82-88

Various studies on language behavior across cultures have been conducted from a sociolinguistic standpoint. However, the effect of cross-cultural language behavior has not been studied in the context of e-Learning, where lectures are delivered as video over the web. It is important to explore this context since e-Learning has gained popularity not only for educating students in universities...

chapter

Snore recognition using a reduced set of spectral features

Enrique M. Albornoz, Leandro A. Bugnon, Cesar E. Martinez

2017 XVII Workshop on Information Processing and Control (RPIC), pp. 1-5

Snoring affects the sleep quality of both the snorer and the people around them. Some types of snoring are related to sleep apnea, which leads to daytime sleepiness and to several health risks. Thus, automatic detection of the different types of snoring may lead to more specific diagnosis and treatment. In this work, we propose to use a reduced set of speech-related features that includes...

chapter

Neural network alternatives to convolutive audio models for source separation

Shrikant Venkataramani, Cem Subakan, Paris Smaragdis

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6

The convolutive non-negative matrix factorization (NMF) model factorizes a given audio spectrogram using frequency templates that have a temporal dimension. In this paper, we present a convolutional auto-encoder model that acts as a neural network alternative to convolutive NMF. Using the modeling flexibility granted by neural networks, we also explore the idea of using a recurrent neural network in the encoder...

chapter

Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR

Tsubasa Ochiai, Shinji Watanabe, Shigeru Katagiri

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6

Recently we proposed a novel multichannel end-to-end speech recognition architecture that integrates multichannel speech enhancement and speech recognition into a single neural-network-based architecture, and demonstrated its fundamental utility for automatic speech recognition (ASR). However, the behavior of the proposed integrated system remains insufficiently understood. An open...

chapter

Mel-Generalized cepstral regularization for discriminative non-negative matrix factorization

Li Li, Hirokazu Kameoka, Shoji Makino

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6

The non-negative matrix factorization (NMF) approach has been shown to work reasonably well for monaural speech enhancement tasks. This paper addresses two shortcomings of the original NMF approach: (1) the objective functions for basis training and for separation (Wiener filtering) are inconsistent (the basis spectra are not trained so that the separated signal becomes optimal); (2) minimizing...

chapter

Voice transformation using pitch and spectral mapping

Anisha Yathigiri, Meenalatha Bathula, Susmitha Kothapalli, Susmitha Vekkot, et al.

2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 1540-1544

This paper presents a voice transformation model that uses pitch data and feed-forward neural networks applied to line spectral frequencies (LSFs). The aim of this work is to transform a speech signal produced by a source speaker by modifying voice individuality parameters so that it appears to be spoken by a chosen target speaker, without modifying the message content. Most of the previous...

chapter

Learning embeddings for speaker clustering based on voice equality

Yanick X. Lukic, Carlo Vogt, Oliver Dürr, Thilo Stadelmann

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6

Recent work has shown that convolutional neural networks (CNNs) trained in a supervised fashion for speaker identification are able to extract features from spectrograms which can be used for speaker clustering. These features are represented by the activations of a certain hidden layer and are called embeddings. However, previous approaches require plenty of additional speaker data to learn the embedding,...

chapter

Gaussian density guided deep neural network for single-channel speech enhancement

Li Chai, Jun Du, Yan-nan Wang

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6

Recently, minimum mean squared error (MMSE) has been the benchmark optimization criterion for deep neural network (DNN)-based speech enhancement. In this study, a probabilistic learning framework for estimating the DNN parameters for single-channel speech enhancement is proposed. First, statistical analysis shows that the prediction error vector at the DNN output closely follows a unimodal density...

chapter

Instrumental shell for pronunciation training simulator design

Anastasiya G. Digor, Irina L. Artemeva, Ekaterina M. Lukina, Viktoriya L. Zavyalova

2017 Second Russia and Pacific Conference on Computer Technology and Applications (RPC), pp. 33-38

The article describes the conceptual design of an instrumental shell for creating pronunciation training simulators. The project is based on linguistic knowledge and speech recognition methods. Discrepancies between the phonetic systems of different (native and non-native) languages are taken into account. The main approaches to speech recognition are analyzed and the required components of the simulator...

chapter

Development of an Android application in Kannada to enhance picture naming skills in persons with aphasia

Rajath Shenoy, Sudhindra Nayak, Mahendra Kumar Hegde, Narasimha Kini, et al.

2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 2134-2140

Stroke is one of the leading causes of death and disability in India, and its prevalence has risen steeply in recent decades. Among the many consequences of stroke, the loss of the ability to use language (aphasia) is a major detriment to the quality of life of the affected individuals. In the recent past, the rise of model-driven treatment approaches has shown promising...

chapter

Development of speech corpora for Goalparia dialect and similar languages

Tanvira Ismail, L. Joyprakash Singh

2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pp. 170-173

An accurate dialect identification technique helps improve the speech recognition systems found in most present-day electronic devices, and is also expected to help provide new services in e-health and telemedicine, which are especially important for older and homebound people. The accuracy of a dialect identification system is highly dependent on its speech corpora. Therefore,...

chapter

On generating mixing noise signals with basis functions for simulating noisy speech and learning DNN-based speech enhancement models

Shi-Xue Wen, Jun Du, Chin-Hui Lee

2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1-6

We first examine the generalization issue with the noise samples used in training nonlinear mapping functions between noisy and clean speech features for deep neural network (DNN)-based speech enhancement. Then an empirical proof is established to explain why the DNN-based approach has good noise generalization capability, provided that a large collection of noise types is included in generating...

Keywords:
TRAINING
SPEECH

Content availability

Available (2,274)
None (10)

Keywords

SPEECH RECOGNITION (1,071)
HIDDEN MARKOV MODELS (916)
FEATURE EXTRACTION (648)
ACOUSTICS (482)
SPEECH PROCESSING (312)
DATABASES (292)
SPEAKER RECOGNITION (286)
MEL FREQUENCY CEPSTRAL COEFFICIENT (249)
SUPPORT VECTOR MACHINES (248)
ACCURACY (241)
DATA MODELS (184)
SPEECH SYNTHESIS (170)
ARTIFICIAL NEURAL NETWORKS (168)
TESTING (168)
COMPUTATIONAL MODELING (165)
TRAINING DATA (161)
NEURAL NETWORKS (155)
NATURAL LANGUAGE PROCESSING (151)
DATA MINING (143)
NOISE MEASUREMENT (127)
VECTORS (127)
ADAPTATION MODELS (125)
NOISE (121)
EMOTION RECOGNITION (114)
AUTOMATIC SPEECH RECOGNITION (113)
SIGNAL TO NOISE RATIO (105)
ADAPTATION MODEL (102)
HIDDEN MARKOV MODEL (102)
MATHEMATICAL MODEL (101)
GAUSSIAN PROCESSES (100)
SPEECH ENHANCEMENT (91)
CONTEXT (89)
KERNEL (88)
DECODING (86)
CLASSIFICATION ALGORITHMS (84)
LEARNING (ARTIFICIAL INTELLIGENCE) (82)
HMM (80)
GAUSSIAN MIXTURE MODEL (79)
NIST (77)
ESTIMATION (72)
ROBUSTNESS (72)
DICTIONARIES (71)
MFCC (70)
CEPSTRAL ANALYSIS (68)
SPEAKER VERIFICATION (68)
VOCABULARY (68)
MAXIMUM LIKELIHOOD ESTIMATION (67)
CORRELATION (62)
PATTERN CLASSIFICATION (62)
SPEECH CODING (62)
MACHINE LEARNING (60)
MICROPHONES (60)
ERROR ANALYSIS (59)
SPEAKER IDENTIFICATION (59)
NEURAL NETS (58)
STATISTICAL ANALYSIS (58)
DEEP NEURAL NETWORKS (56)
SUPPORT VECTOR MACHINE (55)
TRANSFORMS (55)
VISUALIZATION (55)
ALGORITHM DESIGN AND ANALYSIS (53)
SPECTROGRAM (51)
DEEP NEURAL NETWORK (50)
TEXT ANALYSIS (48)
CLUSTERING ALGORITHMS (47)
OPTIMIZATION (47)
GMM (46)
STANDARDS (46)
SVM (46)
VOICE CONVERSION (46)
CONTEXT MODELING (45)
NEURONS (45)
CONFERENCES (43)
HUMANS (43)
VECTOR QUANTIZATION (43)
PREDICTIVE MODELS (42)
EDUCATIONAL INSTITUTIONS (41)
ACOUSTIC SIGNAL PROCESSING (40)
PRINCIPAL COMPONENT ANALYSIS (40)
RECURRENT NEURAL NETWORKS (40)
PROBABILITY (39)
ENTROPY (38)
NATURAL LANGUAGES (38)
DISCRIMINATIVE TRAINING (37)
SIGNAL PROCESSING (37)
SIGNAL PROCESSING ALGORITHMS (36)
SUPPORT VECTOR MACHINE CLASSIFICATION (35)
AUDITORY SYSTEM (34)
DECISION TREES (34)
DETECTORS (34)
SIGNAL CLASSIFICATION (34)
LATTICES (33)
NEURAL NETWORK (33)
EQUATIONS (32)
REVERBERATION (32)
TRAJECTORY (32)
COMPUTERS (31)
JOINTS (31)

INFONA - science communication portal