The article presents studies on the automatic recognition of whispery speech. The research used a new corpus of whispery speech. The aim of the studies presented in this paper was to check how the vocabulary size and the language model order influence speech recognition quality. It has been concluded that even using recordings with only 5,000 different words it is possible...
Speaker recognition has been developed over many years, and many different methods exist. MFCC is one of the more successful methods because it is generally modeled on the human auditory system. It offers a high recognition success rate and strong robustness against noise in the lower frequency regions. However, in the higher frequency regions, it captures speaker characteristics information...
In this paper, we present a novel spectrum mapping method, Continuous Frequency Warping and Magnitude Scaling (CFWMS), for voice conversion under the Joint Density Gaussian Mixture Model (JDGMM) framework. JDGMM is a mature clustering technique that models the joint probability density of speech signals from paired speakers. The conventional JDGMM-based approaches morph the spectral features via least...
In this paper, we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training...
The sparsity factor of a speech signal plays an important role in speech processing. This paper proposes a method in which a variable sparsity regularization factor is applied to the mixed signal and used to separate monaural speech signals. The sparsity regularization factor for each individual training and testing signal was found using particle swarm optimization. The algorithm has been tested for...
A much-discussed topic in recent years is the interconnectedness of industrial plants in the field of Cyber-Physical Production Systems (CPPS). In the future, the data and aggregated information from various production plants will be available globally at any time. Particularly in maintenance, this could be a helpful expansion of the information available to the maintenance staff, since maintenance information...
This paper describes the implementation of an HMM (Hidden Markov Model) based speaker-independent isolated-word Automatic Speech Recognition (ASR) system for the Nepali language, a commonly spoken language in Nepal. The system has been developed in Python using the numpy[1] and YAHMM[2] libraries. The system is trained on different Nepali words by collecting data from different speakers in a room environment....
This paper compares the performance of several training techniques for automatic speech recognition systems. Speech recognition accuracy was used as the measure of performance. Different kinds of outdoor and indoor noise were used in the study. Methods that train on noisy speech are shown to be superior to the competing technique of training on clean speech. It has been found that training by...
Deep learning has brought a breakthrough in the performance of speech recognition. Speech recognition systems based on deep neural networks have achieved state-of-the-art performance on various speech recognition tasks. Almost all of these systems utilize the Mel-frequency cepstral coefficients or the Mel-scale log-filterbank coefficients, which are based on the short-time Fourier transform. Although...
The authors are developing a talking robot, a mechanical vocalization system that models the human articulatory system. The talking robot is constructed from mechanical parts modeled biologically and functionally on human vocal organs. In this study, a newly redesigned artificial vocal cord is developed for the purpose of extending the speaking capability of the talking robot...
This paper describes a novel algorithm to improve performance on the sparsity-based single-channel speech separation (SCSS) problem based on compressed sensing, which is an emerging technique for efficient data reconstruction. The conventional approach assumes the mixing conditions and source signals are stationary. For practical applications of audio source separation, however, we face the challenges...
We present a method for estimating the body orientation of seated people in a smart room by fusing low-resolution range information collected from downward-pointed time-of-flight (ToF) sensors with synchronized speaker identification information from microphone recordings. The ToF sensors preserve the privacy of the occupants in that they only return the range to a small set of hit points. We propose...
This paper presents a 2D image processing approach to playback detection in automatic speaker verification (ASV) systems using spectrograms as the speech signal representation. Three feature extraction and classification methods: histograms of oriented gradients (HOG) with support vector machines (SVM), Haar wavelets with an AdaBoost classifier and deep convolutional neural networks (CNN) were compared on...
Touchscreen assistive technology is designed to support speech interaction between visually disabled people and mobile devices, allowing the use of a choreography of gestures to interact with a touch user interface. This paper presents the evaluation of VoiceOver, a screen reader in Apple Inc. products, made in the research project Visually impaired users touching the screen- A user evaluation of...
Reliable visual features that encode the articulator movements of speakers can dramatically improve the decoding accuracy of automatic speech recognition systems when combined with the corresponding acoustic signals. In this paper, a novel framework is proposed to utilize audio-visual speech not only during decoding but also for training better acoustic models. In this framework, a multi-stream hidden...
This paper presents the main improvements recently made to the large-vocabulary continuous speech recognition (LVCSR) system for the Romanian language developed by the Speech and Dialogue (SpeeD) research laboratory. While the most important improvement consists in the use of DNN-based acoustic models, instead of the classic HMM-GMM approach, several other aspects are discussed in the paper: a significant...
This paper presents the work done towards developing a Romanian speech corpus for automatic speech recognition in the banking domain. This work is done in the context of the Speech2Process project, which aims to create a system that makes interaction between customers and agents in the contact center much easier. The application that uses the banking corpus will provide automatic response to...
This visual paper proposes a framework for detecting depression in cancer patients using prosodic and statistical features extracted from their speech while they chat with a virtual coach.
This paper presents a study on applying ASR (Automatic Speech Recognition) technology in English pronunciation correction. We also discuss the relationship between ELF/ESL learners' self-improvement and English teachers' classroom teaching. The results show that ASR technology can help Chinese learners of English improve their English pronunciation. The research aims to provide a new and practical...
In this research, we conducted a questionnaire survey and interviews with university faculty members on the current state of flipped classrooms in order to identify methods for supporting teachers. The survey showed that flipped classrooms are practiced with various subjects and class sizes, which indicates the necessity of support for a wide range of subjects. In the educational method, the authors found that...