The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Morse code is one of the earliest means of telecommunications; however, it is rarely used nowadays due to viral mobile communications. Although a person can tap Morse codes using his/her fingers easily, perhaps nobody is aware of this kind of finger gestures anymore. In this paper, we will develop a prototype combined together the principle of old Morse code with finger gesture recognition in digital...
In order to train neural networks (NN) for text-to-speech synthesis (TTS), phonetic segmentation must be performed. The most accurate segmentation is performed manually, but the process of creating manual alignments is costly and time-consuming, so automatic procedures are preferable. In this paper, a simple alignment method based on models trained during hidden Markov Model (HMM) based TTS system...
Speech recognition systems are ubiquitous and find its application in automated voice control, voice dialling and automated directory assistance. This paper aims at implementing a neural network based isolated spoken word recognition system on an embedded board — Raspberry Pi using open source software called octave. Mel-Frequency Cepstral Coefficient (MFCC) features are extracted from speech signal...
Long short-term memory (LSTM) recurrent neural network based language models are known to improve speech recognition performance. However, significant effort is required to optimize network structures and training configurations. In this study, we automate the development process using evolutionary algorithms. In particular, we apply the covariance matrix adaptation-evolution strategy (CMA-ES), which...
In this paper, we analyze the feasibility of using single well-resourced language - English - as a source language for multilingual techniques in context of Stacked Bottle-Neck tandem system. The effect of amount of data and number of tied-states in the source language on performance of ported system is evaluated together with different porting strategies. Generally, increasing data amount and level-of-detail...
We propose a voice conversion framework to map the speech features of a source speaker to a target speaker based on deep neural networks (DNNs). Due to a limited availability of the parallel data needed for a pair of source and target speakers, speech synthesis and dynamic time warping are utilized to construct a large parallel corpus for DNN training. With a small corpus to train DNNs, a lower log...
Lyrics are an important part of songs. Lyrics recognition is the basis of retrieving songs and recognizing the content of songs, which is of great value. At present, the research of speech recognition has made great progresses. But there are still difficulties in recognition of lyrics in songs with accompaniment. Related research is generally lacking, especially for Chinese lyrics in songs with accompaniment,...
We proposed an auxiliary categorization framework for training speech synthesis systems using deep neural networks (DNNs) and recurrent neural networks (RNNs). The adopted artificial neural networks (ANNs) are regression models comprising a few hidden layers and an affine-transform layer for transforming the contextual features into a set of speech synthesis parameters. In order to incorporate categorization...
During routine sleep diagnostic procedure, sleep is broadly divided into three states: rapid eye movement (REM), non-REM (NREM) states, and wake, frequently named macro-sleep stages (MSS). In this study, we present a pioneering attempt for MSS detection using full night audio analysis. Our working hypothesis is that there might be differences in sound properties within each MSS due to breathing efforts...
High accuracy of lane changing prediction is beneficial to driver assistant system and fully autonomous cars. This paper proposes a lane changing prediction model based on combined method of Supporting Vector Machine (SVM) and Artificial Neural Network (ANN) at highway lane drops. The vehicle trajectory data are from Next Generation Simulation (NGSIM) data set on U.S. Highway 101 and Interstate 80...
This work proposes a system that percepts handsigns and gestures via computer vision system and extractsufficient amount of images from it. After applying imageprocessing and extracting the features of the images, the systemuses an algorithm to recognize the hand signs and gestures. Inthe process of recognizing the hand signs, the Artificial NeuralNetwork (ANN) is being trained with some specific...
Voice conversion (VC) using artificial neural networks (ANNs) has shown its capability to produce better sound quality of the converted speech than that using Gaussian mixture model (GMM). Although ANN-based VC works reasonably well, there is still room for further improvement. One of the promising ways is to adopt the successful techniques in statistical model-based parameter generation (SMPG), such...
This paper proposes a holistic modeling scheme for fault identification in distributed sensor networks. The proposed scheme is based on modeling the relationship between two datastreams by means of a hiddenMarkov model (HMM) trained on the parameters of linear time-invariant dynamic systems, which estimate the specific relationship over consecutive time windows. Every system state, including the nominal...
In this paper, we present a new framework of multiple classifiers fusion to classify acoustic events in ONC (Ocean Network Canada) hydrophone data. The outputs of three different classifiers are fused based on aggregation of a generated decision matrix. An ensemble class label is thereby obtained for the classification of acoustic events into multiple classes of whale calls, boat sounds and noise...
This document is the result of a first approach to recognizing vowels through using Adaptive Neuro-Fuzzy Inference Systems (ANFIS), applying different strategies to train these systems. The tests perform with data from a same speaker used for training and checking; also from another speaker with different gender for checking. Training data are established minimally. Different strategies are applied...
In this paper we compare two different methods for phonetically labeling a speech database. The first approach is based on the alignment of the speech signal on a high quality synthetic speech pattern, and the second one uses a hybrid HMM/ANN system. Both systems have been evaluated on French read utterances from a speaker never seen in the training stage of the HMM/ANN system and manually segmented...
In this paper, we briefly describe REMAP, an approach for the training and estimation of posterior probabilities, and report its application to speech recognition. REMAP is a recursive algorithm that is reminiscent of the Expectation Maximization (EM) [5] algorithm for the estimation of data likelihoods. Although very general, the method is developed in the context of a statistical model for transition-based...
Inspired by the success of deep neural network-hidden Markov model (DNN-HMM) in acoustic modeling for automatic speech recognition, a number of researchers from various fields have independently proposed the idea of combining DNN and conditional random fields (CRFs). Despite their subtle differences, this class of models is collectively referred to as “NeuroCRF” in this paper. We focus our attention...
In the hybrid approach, neural network output directly serves as hidden Markov model (HMM) state posterior probability estimates. In contrast to this, in the tandem approach neural network output is used as input features to improve classic Gaussian mixture model (GMM) based emission probability estimates. This paper shows that GMM can be easily integrated into the deep neural network framework. By...
The current state of the art TTS synthesis can produce synthesized speech with highly decent quality if rich segmental and suprasegmental information are given. However, some suprasegmental features, e.g., Tone and Break (TOBI), are time consuming due to being manually labeled with a high inconsistency among different annotators. In this paper, we investigate the use of word embedding, which represents...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.