We describe a method of lexicon expansion to tackle variations in spontaneous speech. Such variations are found widely in programs such as conversational talk shows and are typically observed as unintelligible utterances with a high speech rate. Unlike read speech in news programs, these variations often severely degrade automatic speech recognition (ASR) performance. Then, these variations...
We propose two simple methods to improve the performance of a keyword spotting system. In our application, users are allowed to change the keywords at any time. We therefore focused on phone-based GMM-HMM models, since they do not require keyword-specific training data. However, GMM-HMM based models usually have a very high false alarm rate, i.e., a keyword is not present but the system gives...
A system for automatically evaluating singing enthusiasm is proposed in this study. Singing enthusiasm is defined as how much enthusiasm is perceived in the song being evaluated. The system evaluates singing enthusiasm on the basis of pitch accuracy, vibrato, diminuendo, roughness, and the correlation between pitch and loudness. A support vector regression (SVR) machine is used for the evaluation...
In an augmented reality scenario, the image perceived through an optical see-through head-mounted display (OST-HMD) contains color distortion due to background color blending. To reduce color blending, accurate estimation of the background color is necessary. In this paper, we perform colorimetric estimation of the background from camera images via local linear regression. Using the estimated background color, the virtual image is compensated. Experimental...
The evacuation of children and the elderly from disaster areas is sometimes difficult. This study aims to use a vibration sensor to estimate situations involving people who remain in a devastated building. This paper proposes a method to estimate the attributes of the people, such as their age or sex, based on the vibration data produced by their footsteps. The vibration data obtained through sensors...
Training very deep neural networks is very difficult because of gradient degradation. However, the incomparable expressiveness of the many deep layers is highly desirable at testing time and usually leads to better performance. Recently, training techniques such as residual networks that enable us to train very deep networks have proved to be a great success. In this paper, we studied the application...
Detecting pronunciation erroneous tendencies (PETs) can provide second-language learners with detailed, instructive feedback in computer-aided pronunciation training (CAPT) systems. Due to data sparseness, DNN-HMM achieved limited improvement over GMM-HMM in our previous work. Instead of directly employing DNN-HMM to detect PETs, this paper investigated how to further improve the performance...
We adopt a linear activation function at the output layer and globally normalize the target features into zero mean and unit variance to learn the complicated mapping from reverberant to anechoic speech with a regression model based on deep neural networks (DNNs). The proposed feature activation and normalization framework was found to retain clearly observable harmonics and improve the speech quality...
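As an illustration of the global target normalization mentioned in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes that "global" means one mean and one standard deviation per feature dimension, computed over all training frames. The function names `fit_global_stats`, `normalize`, and `denormalize` are hypothetical. With a linear output layer, the network predicts in the normalized domain, and the stored statistics map predictions back to the original feature scale.

```python
from statistics import fmean, pstdev


def fit_global_stats(frames):
    """Return per-dimension mean and standard deviation over all frames."""
    dims = list(zip(*frames))  # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    # Fall back to 1.0 so a constant dimension does not divide by zero.
    stds = [pstdev(d) or 1.0 for d in dims]
    return means, stds


def normalize(frames, means, stds):
    """Scale each dimension to zero mean and unit variance."""
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]


def denormalize(frames, means, stds):
    """Map normalized values (e.g. linear-output predictions) back."""
    return [[x * s + m for x, m, s in zip(f, means, stds)] for f in frames]


# Toy target features: three frames, two dimensions (assumed data).
targets = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
means, stds = fit_global_stats(targets)
normalized = normalize(targets, means, stds)
restored = denormalize(normalized, means, stds)
```

The statistics are fitted once on the training targets and reused unchanged at test time, so that predictions and targets share the same scale.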
In this paper, we describe the use of a voice conversion algorithm for improving the intelligibility of speech by patients with articulation disorders caused by a wide glossectomy and/or segmental mandibulectomy. As a first trial, to demonstrate the difficulty of the task at hand, we implemented a conventional Gaussian mixture model (GMM)-based algorithm using a frame-by-frame approach. We compared...
To improve the performance of automatic speech recognition (ASR) in noisy conditions, it is effective to prepare multiple ASR systems that can address a wide variety of noise. However, the optimal ASR system is different for each environment, and mismatches between training and testing degrade ASR performance. In this situation, the overall system combination of multiple systems is effective; however, the computational...