In order to train neural networks (NNs) for text-to-speech (TTS) synthesis, phonetic segmentation must be performed. The most accurate segmentation is performed manually, but the process of creating manual alignments is costly and time-consuming, so automatic procedures are preferable. In this paper, a simple alignment method based on models trained during hidden Markov model (HMM)-based TTS system...
This paper presents a voice transformation model that uses pitch data and feed-forward neural networks on Line Spectral Frequencies. The aim of this work is to transform a speech signal produced by a source speaker by modifying voice-individuality parameters such that it appears to be spoken by a chosen target speaker, without modifying the message content. Most of the previous...
Speech uttered by human beings contains information about the speaker, the language and the content. The language of an utterance can easily be identified by extracting language-specific information from it. Identifying the language of speech is known as Language Identification (LID). Identifying the language from speech is helpful in its translation, speech recognition and speech-activated automatic...
One of the difficulties in sung speech recognition is the small distance in acoustic space between phonemes in sung speech. Therefore, we considered clustering the speech by pitch (fundamental frequency, F0) to create a larger distance between the phonemes. In addition, we considered a two-stage training method for the DNN-HMM: the first stage is trained using conventional acoustic features...
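The pitch-based clustering idea above can be sketched minimally as follows, assuming per-frame F0 values have already been extracted. The band boundaries and band names here are illustrative assumptions, not values from the paper:

```python
def cluster_frames_by_pitch(f0_values, boundaries=(200.0, 350.0)):
    """Assign each frame index to a pitch band so that separate
    acoustic models could be trained per band.
    Frames with F0 <= 0 are treated as unvoiced."""
    low, high = boundaries
    clusters = {"unvoiced": [], "low": [], "mid": [], "high": []}
    for i, f0 in enumerate(f0_values):
        if f0 <= 0:
            clusters["unvoiced"].append(i)
        elif f0 < low:
            clusters["low"].append(i)
        elif f0 < high:
            clusters["mid"].append(i)
        else:
            clusters["high"].append(i)
    return clusters

# Hypothetical per-frame F0 track (Hz); 0.0 marks an unvoiced frame.
frames = [0.0, 180.0, 240.0, 400.0, 220.0]
print(cluster_frames_by_pitch(frames))
```

In practice each cluster's frames would feed a band-specific acoustic model, which is what creates the larger inter-phoneme distance the abstract refers to.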
Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector and relevance Maximum-a-Posteriori (MAP), have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors with a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM)...
Different modes of vibration of the vocal folds contribute significantly to voice quality. Neutral-mode phonation, used in the modal voice, is the reference against which the other modes, called non-modal phonations, can be contrastively described. This paper investigates the impact of non-modal phonation on phonological posteriors, the probabilities of phonological features inferred from the...
State-of-the-art approaches to text-to-speech (TTS) synthesis, such as unit selection and HMM synthesis, are data-driven: they use a prerecorded corpus of natural speech to build a voice. This paper investigates the influence of the size of the speech corpus on five different perceptual quality dimensions. Six German unit selection voices were created based on subsets of different sizes...
For acoustic modeling, DNNs have become popular due to the substantial performance improvements observed in many automatic speech recognition (ASR) tasks. Typically, DNNs with deep (many layers) and wide (many hidden units per layer) architectures are chosen in order to achieve good gains. An issue with such approaches is the explosion in the number of learnable parameters. Thus,...
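The parameter explosion is easy to quantify: a fully-connected network with layer sizes n_0, ..., n_L has sum(n_{i-1} * n_i + n_i) weights and biases. A quick illustration with hypothetical layer sizes of a typical ASR net (not figures from the paper):

```python
def dnn_param_count(layer_sizes):
    """Total weights plus biases of a fully-connected network
    whose layer widths are given in order, input to output."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# 440 inputs, six hidden layers of 2048 units, 9000 output senones.
sizes = [440, 2048, 2048, 2048, 2048, 2048, 2048, 9000]
print(dnn_param_count(sizes))  # 40325928, i.e. roughly 40 million
```

Doubling the hidden width roughly quadruples the hidden-to-hidden terms, which is why wide-and-deep architectures grow so quickly.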
In this paper, we analyze the feasibility of using a single well-resourced language, English, as the source language for multilingual techniques in the context of a Stacked Bottle-Neck tandem system. The effect of the amount of data and the number of tied states in the source language on the performance of the ported system is evaluated, together with different porting strategies. Generally, increasing the data amount and level-of-detail...
Automatic speech recognition (ASR) of code-switching speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we investigate the feasibility of using multilingually trained deep neural networks (DNNs) for the ASR of Frisian speech containing code-switches to Dutch, with the aim of building a robust recognizer that can handle this phenomenon...
This paper presents a text-dependent speaker verification system using Mel-Frequency Cepstral Coefficients (MFCC) and a Support Vector Machine (SVM). The MFCC technique is used to extract characteristic features from the voice recordings spoken by the user, and the SVM is used to classify all the speaker and impostor models. A Malay spoken-digit database is utilized for the training...
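The MFCC-plus-SVM pipeline can be sketched as below. This is a minimal illustration assuming scikit-learn is available; the random vectors stand in for utterance-level MFCC features, which in the paper would come from the real Malay digit recordings:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for 13-dimensional MFCC feature vectors:
# one cluster for the target speaker, one for impostors.
target = rng.normal(loc=0.0, scale=1.0, size=(40, 13))
impostor = rng.normal(loc=3.0, scale=1.0, size=(40, 13))

X = np.vstack([target, impostor])
y = np.array([1] * 40 + [0] * 40)  # 1 = target speaker, 0 = impostor

clf = SVC(kernel="rbf").fit(X, y)

# Verify a new utterance-level feature vector drawn near the target.
probe = rng.normal(loc=0.0, scale=1.0, size=(1, 13))
decision = "accepted" if clf.predict(probe)[0] == 1 else "rejected"
print(decision)
```

Real systems would extract MFCCs from audio (e.g. with a dedicated feature-extraction library), pool them per utterance, and tune the SVM kernel and its parameters on held-out data.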
Building synthetic child voices is considered a difficult task due to the challenges associated with data collection. As a result, speaker adaptation in conjunction with Hidden Markov Model (HMM)-based synthesis has become prevalent in this domain because the approach caters for limited amounts of data. An initial average voice model is trained using data from multiple speakers and adapted to resemble...
As emotion recognition from speech has matured to a degree where it becomes suitable for real-life applications, it is time to develop techniques for matching different types of emotional data with multi-dimensional and category-based annotations. The categorical approach is usually applied to acted ‘full blown’ emotions, and multi-dimensional annotation is often preferred for spontaneous real...
A Phonetic Engine (PE) is a system used to determine the sequence of phones in a spoken utterance. The International Phonetic Alphabet (IPA) is used to transcribe the speech database. This work focuses on developing a multilingual PE for four Indian languages, namely Bengali, Hindi, Urdu and Telugu; the approach can be extended to any number of languages. For developing the PE, read speech...
This paper designs a software system for the smartphone platform. The purpose of this system is to provide a reasonable method for evaluating the English accent of non-native speakers, based on phoneme recognition and fluency assessment, taking advantage of a Hidden Markov Model (HMM). Meanwhile, this paper uses a neural network algorithm to combine the objective scoring with experts' scoring to increase the accuracy...
In this paper, a novel sparse representation over learned and exemplar dictionaries is explored to estimate the speech information in stressed speech. Stressed speech contains both speech and stress information. Acoustic variabilities induced by the presence of stress information degrade the performance of speech recognition systems. In this work, the acoustic variabilities...
In this paper we compare two state-of-the-art speech synthesis techniques (corpus- and HMM-based) in terms of expressive speech synthesis. Two corpora were composed with different speaking styles (broadcast news and literature reading) from the same female speaker. Our aim was to determine to what extent the different technologies reproduce these styles. The corpora and the synthetic expressive speech...
Classification of speech signals is one of the most vital problems in speech perception and spoken-word recognition. Although there have been many studies on the classification of speech signals, the results are still limited. In this paper, we propose an image-based approach to speech signal classification based on the combination of Local Naïve Bayes Nearest Neighbor (LNBNN) and Scale-invariant...
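The core Naïve Bayes Nearest Neighbor principle underlying LNBNN can be sketched as follows. This simplified version omits the SIFT descriptor extraction and LNBNN's local per-descriptor refinement: it classifies a bag of query descriptors by summing, for each class, each descriptor's squared distance to its nearest stored descriptor of that class. The class labels and random descriptors are hypothetical:

```python
import numpy as np

def nbnn_classify(query_descriptors, class_descriptors):
    """Naive Bayes Nearest Neighbor: choose the class minimising the
    sum over query descriptors of the squared distance to the nearest
    stored descriptor of that class."""
    scores = {}
    for label, descs in class_descriptors.items():
        # Pairwise squared distances, shape (n_query, n_class_descriptors).
        d = ((query_descriptors[:, None, :] - descs[None, :, :]) ** 2).sum(-1)
        scores[label] = d.min(axis=1).sum()
    return min(scores, key=scores.get)

rng = np.random.default_rng(1)
classes = {
    "yes": rng.normal(0.0, 1.0, size=(30, 8)),
    "no": rng.normal(4.0, 1.0, size=(30, 8)),
}
query = rng.normal(0.0, 1.0, size=(10, 8))  # drawn near class "yes"
label = nbnn_classify(query, classes)
print(label)
```

In the image-based setting described above, the descriptors would be SIFT features computed from spectrogram images of the speech signal rather than random vectors.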
The hybrid speech synthesis system, which combines the hidden Markov model with the unit selection method, has recently been widely used and researched in both industry and academia due to its naturalness and expressiveness. However, the target duration, which is used to control the duration of the selected candidates, is still predicted via the state-based duration model, whose performance is far from satisfactory...
Automatic Speech Recognition (ASR) is the process of converting human speech, in the form of an acoustic waveform, into text. In this paper we discuss building an automatic speech recognition system for Telugu news. A Telugu speech database is prepared along with transcriptions and a pronunciation dictionary. The Telugu speech files were collected from Telugu TV news channels. Most of the selected...