Search results

Items from 1 to 20 out of 159 results

chapter

Morse Codes Enter Using Finger Gesture Recognition

Ricky Li, Minh Nguyen, Wei Qi Yan

2017 International Conference on Digital Image Computing: Techniques and Applications (DICTA) > 1 - 8

Morse code is one of the earliest means of telecommunications; however, it is rarely used nowadays due to viral mobile communications. Although a person can tap Morse codes using his/her fingers easily, perhaps nobody is aware of this kind of finger gestures anymore. In this paper, we will develop a prototype combined together the principle of old Morse code with finger gesture recognition in digital...

chapter

Novel alignment method for DNN TTS training using HMM synthesis models

Sinisa Suzic, Tijana Delic, Darko Pekar, Vladimir Ostojic

2017 IEEE 15th International Symposium on Intelligent Systems and Informatics (SISY) > 271 - 276

In order to train neural networks (NN) for text-to-speech synthesis (TTS), phonetic segmentation must be performed. The most accurate segmentation is performed manually, but the process of creating manual alignments is costly and time-consuming, so automatic procedures are preferable. In this paper, a simple alignment method based on models trained during hidden Markov Model (HMM) based TTS system...

chapter

Implementation of ANN based speech recognition system on an embedded board

Pranjali P. Patange, John Sahaya Rani Alex

2017 International Conference on Nextgen Electronic Technologies: Silicon to Software (ICNETS2) > 408 - 412

Speech recognition systems are ubiquitous and find its application in automated voice control, voice dialling and automated directory assistance. This paper aims at implementing a neural network based isolated spoken word recognition system on an embedded board — Raspberry Pi using open source software called octave. Mel-Frequency Cepstral Coefficient (MFCC) features are extracted from speech signal...

chapter

Automated structure discovery and parameter tuning of neural network language model based on evolution strategy

Tomohiro Tanaka, Takafumi Moriya, Takahiro Shinozaki, Shinji Watanabe, et al.

2016 IEEE Spoken Language Technology Workshop (SLT) > 665 - 671

Long short-term memory (LSTM) recurrent neural network based language models are known to improve speech recognition performance. However, significant effort is required to optimize network structures and training configurations. In this study, we automate the development process using evolutionary algorithms. In particular, we apply the covariance matrix adaptation-evolution strategy (CMA-ES), which...

chapter

Boosting performance on low-resource languages by standard corpora: An analysis

Frantisek Grezl, Martin Karafiat

2016 IEEE Spoken Language Technology Workshop (SLT) > 629 - 636

In this paper, we analyze the feasibility of using single well-resourced language - English - as a source language for multilingual techniques in context of Stacked Bottle-Neck tandem system. The effect of amount of data and number of tied-states in the source language on performance of ported system is evaluated together with different porting strategies. Generally, increasing data amount and level-of-detail...

chapter

Deep neural network based voice conversion with a large synthesized parallel corpus

Zhengqi Wen, Kehuang Li, Jianhua Tao, Chin-Hui Lee

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

We propose a voice conversion framework to map the speech features of a source speaker to a target speaker based on deep neural networks (DNNs). Due to a limited availability of the parallel data needed for a pair of source and target speakers, speech synthesis and dynamic time warping are utilized to construct a large parallel corpus for DNN training. With a small corpus to train DNNs, a lower log...

chapter

Research on the recognition of isolated Chinese lyrics in songs with accompaniment based on deep belief networks

Juanjuan Cai, Nana Wang, Hui Wang, Bing Zhu

2016 IEEE 13th International Conference on Signal Processing (ICSP) > 535 - 540

Lyrics are an important part of songs. Lyrics recognition is the basis of retrieving songs and recognizing the content of songs, which is of great value. At present, the research of speech recognition has made great progresses. But there are still difficulties in recognition of lyrics in songs with accompaniment. Related research is generally lacking, especially for Chinese lyrics in songs with accompaniment,...

chapter

Learning auxiliary categorical information for speech synthesis based on deep and recurrent neural networks

Zhengqi Wen, Kehuang Li, Zhen Huang, Jianhua Tao, et al.

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

We proposed an auxiliary categorization framework for training speech synthesis systems using deep neural networks (DNNs) and recurrent neural networks (RNNs). The adopted artificial neural networks (ANNs) are regression models comprising a few hidden layers and an affine-transform layer for transforming the contextual features into a set of speech synthesis parameters. In order to incorporate categorization...

chapter

Estimation of macro sleep stages from whole night audio analysis

E. Dafna, M. Halevi, D. Ben Or, A. Tarasiuk, et al.

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 2847 - 2850

During routine sleep diagnostic procedure, sleep is broadly divided into three states: rapid eye movement (REM), non-REM (NREM) states, and wake, frequently named macro-sleep stages (MSS). In this study, we present a pioneering attempt for MSS detection using full night audio analysis. Our working hypothesis is that there might be differences in sound properties within each MSS due to breathing efforts...

chapter

Lane changing prediction at highway lane drops using support vector machine and artificial neural network classifiers

Yangliu Dou, Fengjun Yan, Daiwei Feng

2016 IEEE International Conference on Advanced Intelligent Mechatronics (AIM) > 901 - 906

High accuracy of lane changing prediction is beneficial to driver assistant system and fully autonomous cars. This paper proposes a lane changing prediction model based on combined method of Supporting Vector Machine (SVM) and Artificial Neural Network (ANN) at highway lane drops. The vehicle trajectory data are from Next Generation Simulation (NGSIM) data set on U.S. Highway 101 and Interstate 80...

chapter

An Efficient Approach of Training Artificial Neural Network to Recognize Bengali Hand Sign

Alvi Mahadi, Fatema Tuj Johora, Mohammad Abu Yousuf

2016 IEEE 6th International Conference on Advanced Computing (IACC) > 152 - 157

This work proposes a system that percepts hand signs and gestures via computer vision system and extract sufficient amount of images from it. After applying image processing and extracting the features of the images, the system uses an algorithm to recognize the hand signs and gestures. In the process of recognizing the hand signs, the Artificial Neural Network (ANN) is being trained with some specific...

chapter

A probabilistic interpretation for artificial neural network-based voice conversion

Hsin-Te Hwang, Yu Tsao, Hsin-Min Wang, Yih-Ru Wang, et al.

2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 552 - 558

Voice conversion (VC) using artificial neural networks (ANNs) has shown its capability to produce better sound quality of the converted speech than that using Gaussian mixture model (GMM). Although ANN-based VC works reasonably well, there is still room for further improvement. One of the promising ways is to adopt the successful techniques in statistical model-based parameter generation (SMPG), such...

article

Fault Identification in Distributed Sensor Networks Based on Universal Probabilistic Modeling

Stavros Ntalampiras

IEEE Transactions on Neural Networks and Learning Systems > 2015 > 26 > 9 > 1939 - 1949

This paper proposes a holistic modeling scheme for fault identification in distributed sensor networks. The proposed scheme is based on modeling the relationship between two datastreams by means of a hidden Markov model (HMM) trained on the parameters of linear time-invariant dynamic systems, which estimate the specific relationship over consecutive time windows. Every system state, including the nominal...

chapter

Multiple classifiers fusion to classify acoustic events in ONC hydrophone data

Gorkem Cipli, Farook Sattar, Peter F. Driessen

2015 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM) > 467 - 472

In this paper, we present a new framework of multiple classifiers fusion to classify acoustic events in ONC (Ocean Network Canada) hydrophone data. The outputs of three different classifiers are fused based on aggregation of a generated decision matrix. An ensemble class label is thereby obtained for the classification of acoustic events into multiple classes of whale calls, boat sounds and noise...

chapter

Different strategies in the development of ANFIS to recognize vowels

Jesus Julian Valencia-Jimenez, Antonio Fernandez-Caballero

2015 10th Iberian Conference on Information Systems and Technologies (CISTI) > 1 - 6

This document is the result of a first approach to recognizing vowels through using Adaptive Neuro-Fuzzy Inference Systems (ANFIS), applying different strategies to train these systems. The tests perform with data from a same speaker used for training and checking; also from another speaker with different gender for checking. Training data are established minimally. Different strategies are applied...

chapter

Comparison of two different text-to-speech alignment systems: Speech synthesis based vs. hybrid HMM/ANN

O. Deroo, F. Malfrere, T. Dutoit

9th European Signal Processing Conference (EUSIPCO 1998) > 1 - 4

In this paper we compare two different methods for phonetically labeling a speech database. The first approach is based on the alignment of the speech signal on a high quality synthetic speech pattern, and the second one uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a speaker never seen in the training stage of the HMM/ANN system and manually segmented...

chapter

A new training algorithm for hybrid HMM/ANN speech recognition systems

Herve Bourlard, Yochai Konig, Nelson Morgan, Christophe Ris

1996 8th European Signal Processing Conference (EUSIPCO 1996) > 1 - 4

In this paper, we briefly describe REMAP, an approach for the training and estimation of posterior probabilities, and report its application to speech recognition. REMAP is a recursive algorithm that is reminiscent of the Expectation Maximization (EM) [5] algorithm for the estimation of data likelihoods. Although very general, the method is developed in the context of a statistical model for transition-based...

chapter

Full-rank linear-chain NeuroCRF for sequence labeling

Marc-Antoine Rondeau, Yi Su

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5281 - 5285

Inspired by the success of deep neural network-hidden Markov model (DNN-HMM) in acoustic modeling for automatic speech recognition, a number of researchers from various fields have independently proposed the idea of combining DNN and conditional random fields (CRFs). Despite their subtle differences, this class of models is collectively referred to as “NeuroCRF” in this paper. We focus our attention...

chapter

Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables

Zoltan Tuske, Muhammad Ali Tahir, Ralf Schluter, Hermann Ney

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4285 - 4289

In the hybrid approach, neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast to this, in the tandem approach neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that GMM can be easily integrated into the deep neural network framework. By...

chapter

Word embedding for recurrent neural network based TTS synthesis

Peilu Wang, Yao Qian, Frank K. Soong, Lei He, et al.

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4879 - 4883

The current state of the art TTS synthesis can produce synthesized speech with highly decent quality if rich segmental and suprasegmental information are given. However, some suprasegmental features, e.g., Tone and Break (TOBI), are time consuming due to being manually labeled with a high inconsistency among different annotators. In this paper, we investigate the use of word embedding, which represents...

Data set:
ieee
Keywords:
ARTIFICIAL NEURAL NETWORKS
TRAINING
HIDDEN MARKOV MODELS

Publication type

book (151)
article (8)

Keywords

SPEECH (57)
SPEECH RECOGNITION (57)
NEURAL NETS (45)
FEATURE EXTRACTION (38)
ACOUSTICS (21)
DATA MINING (21)
DATABASES (20)
HIDDEN MARKOV MODEL (19)
LEARNING (ARTIFICIAL INTELLIGENCE) (19)
ACCURACY (18)
NATURAL LANGUAGE PROCESSING (18)
MATHEMATICAL MODEL (16)
NEURAL NETWORKS (16)
ARTIFICIAL NEURAL NETWORK (15)
CLASSIFICATION ALGORITHMS (15)
DATA MODELS (15)
NEURAL NETWORK (14)
NEURONS (14)
PATTERN CLASSIFICATION (14)
SPEECH PROCESSING (14)
SUPPORT VECTOR MACHINES (14)
COMPUTATIONAL MODELING (13)
HMM (13)
MEL FREQUENCY CEPSTRAL COEFFICIENT (12)
PROBABILITY (12)
TRAINING DATA (12)
HANDWRITING RECOGNITION (11)
PATTERN RECOGNITION (9)
SPEECH SYNTHESIS (9)
ESTIMATION (8)
MACHINE LEARNING (8)
PREDICTIVE MODELS (8)
GAUSSIAN PROCESSES (7)
IMAGE SEGMENTATION (7)
MULTILAYER PERCEPTRONS (7)
SPEAKER RECOGNITION (7)
TESTING (7)
TEXT ANALYSIS (7)
VECTORS (7)
VOCABULARY (7)
ACOUSTIC SIGNAL PROCESSING (6)
AUTOMATIC SPEECH RECOGNITION (6)
CEPSTRAL ANALYSIS (6)
CONFERENCES (6)
DECISION TREES (6)
GESTURE RECOGNITION (6)
MAXIMUM LIKELIHOOD ESTIMATION (6)
NOISE (6)
PRINCIPAL COMPONENT ANALYSIS (6)
SIGNAL PROCESSING (6)
SUPPORT VECTOR MACHINE (6)
SUPPORT VECTOR MACHINE CLASSIFICATION (6)
ADAPTATION MODEL (5)
ALGORITHM DESIGN AND ANALYSIS (5)
BACKPROPAGATION (5)
BIOLOGICAL SYSTEM MODELING (5)
CLUSTERING ALGORITHMS (5)
COVARIANCE MATRIX (5)
DECODING (5)
DEEP BELIEF NETWORKS (5)
ENCODING (5)
ERROR ANALYSIS (5)
MFCC (5)
ROBUSTNESS (5)
SIGNAL PROCESSING ALGORITHMS (5)
STATISTICAL ANALYSIS (5)
TAGGING (5)
VITERBI ALGORITHM (5)
ANN (4)
CHARACTER RECOGNITION (4)
COMPUTERS (4)
CONDITIONAL RANDOM FIELDS (4)
DYNAMIC TIME WARPING (4)
EDUCATIONAL INSTITUTIONS (4)
FAULT DIAGNOSIS (4)
GENETIC ALGORITHM (4)
GENETIC ALGORITHMS (4)
GMM (4)
HANDWRITTEN CHARACTER RECOGNITION (4)
IMAGE RECOGNITION (4)
INTRUSION DETECTION (4)
KERNEL (4)
MARKOV PROCESSES (4)
PHONE RECOGNITION (4)
RADIAL BASIS FUNCTION NETWORKS (4)
SENSORS (4)
STANDARDS (4)
TIME SERIES (4)
TRANSFORMS (4)
WRITING (4)
ACCELERATION (3)
ACOUSTIC MODELING (3)
AMINO ACIDS (3)
BUILDINGS (3)
CLASSIFICATION (3)
COMPUTER ARCHITECTURE (3)
CONVOLUTION (3)

INFONA - science communication portal
