Recently, a hybrid deep neural network/i-vector framework has proved effective for speaker verification, where a DNN trained to predict tied-triphone states (senones) is used to produce frame alignments for sufficient statistics extraction. In this work, in order to better understand the impact of phonetic precision on speaker verification tasks, three levels of phonetic granularity...
Proxy-word based out-of-vocabulary (OOV) keyword search has proven quite effective. In this approach, each OOV keyword is assigned several proxies, and detections of the proxies are regarded as detections of the OOV keyword. However, the confidence scores of these detections are still those of the proxies from lattices. To obtain a better confidence...
End-to-end speech recognition systems have been successfully implemented and have become competitive replacements for hybrid systems. A common loss function for training end-to-end systems is connectionist temporal classification (CTC). This method maximizes the log likelihood of the associated transcription sequence given the feature sequence. However, there are some weaknesses with CTC training. The...
Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term memory RNNs (LSTM RNNs) have been proposed to solve this problem, and have achieved excellent results. However, because of their large size, LSTM RNNs suffer more easily from overfitting,...
Objective: To discuss the relationship between the ultrasonic characteristics of Hashimoto's thyroiditis benign nodules (HTBN) and serum TSH. Methods: We summarized 117 cases diagnosed as HTBN by thyroid fine-needle aspiration, according to the inclusion criteria, from January 2012 to December 2013 in our department. Thirty-two cases were misdiagnosed by ultrasound as malignant nodules. Using a random number...
The OpenKWS14 keyword search evaluation is one of the most challenging and influential evaluations in the field of speech recognition. Its goal is to build a high-performance keyword search system for a minority language with limited training data in a short period of time. We present the system of the Department of Electronic Engineering, Tsinghua University (THUEE team) for the OpenKWS14 keyword...
In this paper, we propose a method to improve detection of the mispronunciation types of non-native learners. In order to cope with the low-resource condition of non-native speech and the difference between native and non-native speech, the following efforts are made: 1) train the acoustic model with the low-resource non-native data; 2) introduce the articulatory-based tandem feature; 3) pool auxiliary native...
In the field of prosody event detection, many local acoustic features have been proposed for representing the prosodic characteristics of speech units. The context information that represents possible regularities underlying neighboring prosody events, however, has not been used effectively. The main difficulty in utilizing prosodic context is that it is hard to capture the long-distance sequential dependency...
This paper presents a method to improve mispronunciation detection performance for low-resource acoustic models. One hour of speech data is randomly selected from CU-CHLOE to imitate the low-resource non-native English situation. The tandem feature derived from an articulatory-based Multi-Layer Perceptron (MLP) is employed to replace the traditional spectral feature (e.g. PLP). Further, motivated by similar...
This paper reports a scale treatment of purified terephthalic acid (PTA) production wastewater in a 50 L column reactor by ultrasound-enhanced ozonation. The degradation effects on three kinds of production wastewater, including inlet and outlet water from treatment works as well as accident wastewater, are investigated. The results show that ultrasound-enhanced ozonation is an efficient way...
Automatic multilingual speech recognition remains a difficult task. This paper presents recent work on the development of a Mandarin-English bilingual speech recognition system. A single unified set of bilingual acoustic models based on a novel State-Time-Alignment (STA) method is proposed to balance the performance and the complexity of the bilingual speech recognition system, and a comparison...
Acoustic feedback is a common problem in most hearing aids. It reduces the maximum usable gain and even causes "howling" when a large forward gain is required, which is quite annoying for patients with severe hearing loss. This paper presents a new combination of a de-correlation LMS adaptive algorithm and a fixed forward delay to better de-correlate the input and output signals of hearing aids as...
This paper presents research on using Chinese phonetics knowledge in acoustic modeling based on extended initial/final (XIF) units. A context-dependent (CD) model is required to improve the performance of the acoustic model, and decision tree-based state tying is used to address the huge number of model parameters. Chinese phonetics knowledge plays an important...
In this paper, we describe two approaches to language identification (LID) using support vector machines (SVMs) and phonetic n-grams. One is to use the language model scores of phone sequences for SVM training. The other is to use the n-gram probabilities of those phones to train SVM models. For the second approach, we propose a new, effective normalization method. In the experiments of 30 s test...