In this paper, we propose using acoustic-feature-based submodular function optimization to select a subset of untranscribed data for manual transcription, and then retraining the initial acoustic model with the additional transcribed data. The acoustic features are obtained from an unsupervised Gaussian mixture model. We also integrate the acoustic features with the phonetic features, which are obtained...
This paper proposes a method for classifying sports through audio analysis. First, a two-pass audio segmentation module is developed as the front end to extract the announcer's speech from the audio streams. Then speech recognition is applied to the speech segments to extract keywords, which are used as features to distinguish different sports. Finally, based on the...
This paper proposes a two-pass audio segmentation method for sports games. The first pass performs segmentation with a metric-based algorithm, and the second pass performs model-based classification to extract speech segments. The resulting audio segmentation module efficiently extracts the announcer's speech from the complex sports audio stream.
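The first, metric-based pass described above can be illustrated with a minimal sketch. The abstract does not say which metric or features the authors use, so everything here is an assumption: the function names `window_mean` and `metric_based_boundaries`, the Euclidean distance between the mean feature vectors of two adjacent windows, and the `window` and `threshold` values are all hypothetical. The second, model-based pass is not shown.

```python
import math


def window_mean(frames):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def metric_based_boundaries(frames, window=3, threshold=1.0):
    """Return frame indices where two adjacent windows differ strongly.

    Assumption: the "metric" is the Euclidean distance between the mean
    feature vectors of the windows just before and just after index i.
    """
    distances = {}
    for i in range(window, len(frames) - window + 1):
        left = window_mean(frames[i - window:i])
        right = window_mean(frames[i:i + window])
        distances[i] = math.dist(left, right)

    boundaries = []
    for i, d in distances.items():
        # Keep only local maxima of the distance curve above the threshold.
        prev_d = distances.get(i - 1, -1.0)
        next_d = distances.get(i + 1, -1.0)
        if d > threshold and d >= prev_d and d > next_d:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    # Toy one-dimensional features: six "quiet" frames, then six "loud" ones.
    toy_frames = [[0.0]] * 6 + [[5.0]] * 6
    print(metric_based_boundaries(toy_frames, window=3, threshold=1.0))
```

On the toy input, the distance curve peaks at the point where the feature value changes, so a single boundary is reported there. A real system would likely use richer features and a statistical criterion rather than a fixed threshold.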
This paper proposes an SVM-based method for detecting audio events (cheering and applause) through audio analysis. In our framework, a sliding window is first moved from the start to the end of the audio stream to pre-segment it into short segments. Second, various kinds of audio features are extracted to represent the different sounds in each segment. Third, SVM (support vector machine)...
This paper proposes a unified method for detecting cheering events in the audio streams of live sports games. In our framework, a sliding window is first moved from the start to the end of the audio stream to pre-segment it into short segments. Second, various kinds of audio features are extracted to represent the different sounds in each segment. Third, GMM (Gaussian Mixture...
This paper proposes a method for extracting the commentator's speech from the audio streams of live sports games. First, a two-pass metric-based audio segmentation module is developed to split the audio stream into short segments with homogeneous acoustic features. Then a model-based classification module is adopted to extract the speech segments. For robust audio classification, various...
This paper presents our Mandarin pronunciation quality assessment system for the Putonghua Shuiping Kaoshi (PSK) examination and investigates some measures to improve the assessment accuracy. In this paper, a selective speaker adaptation method is studied. In the adaptation module, we select well-pronounced speech as the adaptation data, and adopt Maximum Likelihood Linear Regression (MLLR) to...
This paper proposes a novel system that automatically determines the sport being played by performing keyword spotting on short fragments (around 10 minutes) of a sports game. In this system, we first develop an audio segmentation module as a front end to separate the announcers' speech efficiently from the complex sports audio stream. Then we employ speech recognition technology on these speech...
In this paper we develop an approach to the automatic, data-driven generation of pronunciation dictionaries for keyword spotting (KWS) systems. In practical applications, KWS tasks often have to deal with keywords whose pronunciations cannot be found in the dictionary. To solve this problem, we study how to derive pronunciations automatically from speech samples of keywords. Recognized sequences from...