The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
In this paper, we target improving the accuracy of acoustic modelling for statistical parametric speech synthesis (SPSS) and introduce the convolutional neural network (CNN) due to its powerful capacity in locality modelling. A novel model architecture combining unidirectional long short-term memory (LSTM) and a time-domain convolutional output layer (COL) is proposed and employed to acoustic modelling...
Recurrent neural networks (RNNs) with a gating mechanism have been shown to give state-of-the-art performance in acoustic modeling, such as gated recurrent unit (GRU), long short-term memory (LSTM), long short-term memory projected (L-STMP), etc. But little is known about why these gated RNNs work and what the differences are among these networks. Based on a series of experimental comparison and analysis,...
Speech activity detection (SAD) is the key preprocess of speech application. This paper proposed a subspace Gaussian mixture model (SGMM) and phoneme combination based SAD algorithm. This algorithm is efficient, small and can utilize speech recognition corpus directly. Results indicate that, compared with the baseline, our proposed method results in an absolute improvement of 4.9% frame error rate...
In this paper, we propose a discriminative method for the acoustic feature based language recognizer, which is a modification of the polynomial expansion in generalized linear discriminant sequence (GLDS) kernel. It is inspired by the Gaussian mixture model-support vector machine (GMM-SVM) system which has been successfully used in both speaker and language recognition. Because of the restriction...
With the fast development of high-speed network and digital video recording technologies, broadcast video has been playing a more and more important role in our daily life. In this paper, we propose a novel news story segmentation scheme which can segment broadcast video into story units with multi-modal information fusion (MMIF) strategy. Compared with traditional methods, the proposed scheme extracts...
This paper presents our preliminary works on exploring unsupervised training of subspace gaussian mixture models for under-resourced CTS recognition task. The subspace model yields better performance than conventional GMM model, particularly in small or middle-sized training set. As an effective way to save human efforts, unsupervised learning is often applied to automatically transcribe a large amount...
Automatic prosodic break detection is important for both speech understanding and natural speech synthesis. In this paper, we develop complementary model to detect Mandarin prosodic break by using acoustic, lexical and syntactic evidence. The model realizes the complementarities by taking the advantages of each model. When comparing with the baseline system, our proposed method has good performance.
In this paper, we present the work in progress on automatic detection of stress in continuous Mandarin (standard Chinese) spoken utterance, and we are interested in finding the characteristic and performance of the acoustic stress cues in Mandarin. Therefore, correlated stress features including pitch, duration, intensity and spectral intensity are exploited with the purpose of developing the baseline...
In this study, we combine the Mandarin characteristics with Mandarin acoustic attribute and text information and use hierarchical model based ensemble machine learning to predict Mandarin pitch accent. Our model could make the best of advantages of prosody hierarchical structure and ensemble machine learning. When comparing our model with classification and regression tree (CART), support vector machine...
Mispronunciation detection is one of the vital tasks of the CALL (Computer Assisted Language Learning) systems. Many methods have been introduced to accomplish this task. However, few of them have addressed the detection task on confusable phones. In this paper, phone-level classifiers are utilized to improve the detection performance on the confusable phones. Features of the classifiers are posterior...
Prosody is an important factor for a high quality text-to- speech (TTS) system. Prosody is often described with a hierarchical structure. So the generation of the hierarchical prosody structure is very important both in the corpus building and the real-time text analysis, but the prosody labeling procedure is laborious and time consuming. In this paper, an automatic prosody boundary label system is...
Since it is the most natural way for people to search a specific melody in large music database, query by humming/singing is attracting more and more researcherspsila attention in the field of content-based music information retrieval. In this task, note-based and frame-based similarity measures are two commonly used methods. However, in previous works, researchers always focus on one of the two methods...
Query by humming (QBH) is an interactive tool for retrieving favored songs from a large database of known media via acoustic input. In this task, common method for measuring similarity between query and candidate is either by symbolic notation distance or by framed based dynamic programming. However, the former has disadvantage of error-prone to the noted symbolic feature extraction stage, while the...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.