A noise-robust scheme for voice activity detection (VAD) that combines intra- and inter-frame acoustic features is presented in this paper. Full-band energy and mel-frequency cepstral coefficients (MFCCs) are calculated as intra-frame features, whereas the integrated bispectrum is estimated as inter-frame features. The parameters combined by intra- and inter-frame features are sorted...
We investigate an alternative formulation of phonetic feature representations for SVM-based speaker verification. The new features are based on conditional likelihood representations rather than the joint-likelihood or bag-of-ngram calculations traditionally used. Conditional likelihoods are shown to be a more natural method of modelling phonetic information, and improve upon conventional joint likelihoods...
This paper proposes a novel method for speech endpoint detection. The method uses gradient-based edge detection algorithms from image processing to detect the boundaries of continuous speech in noisy conditions. It is simple and has low computational complexity. The accuracy of the proposed method was evaluated and compared to the ITU-T G.729 Annex-B voice activity detection (VAD) algorithm...
An audio-visual automatic speech recognition (AVASR) system that can recognise what a speaker says regardless of head position (e.g. left profile, front, right profile) would be most useful, as it would enable this technology to be used in a host of realistic applications such as mobile phone and in-vehicle speech recognition. A major hurdle in achieving this goal is in developing a visual...
The use of speech recognition in noisy environments requires the use of speech enhancement algorithms in order to improve recognition performance. Deploying these enhancement techniques requires significant engineering to ensure algorithms are realisable in electronic hardware. This paper describes the design decisions and process to port the popular spectral subtraction algorithm to a Virtex-4 field-programmable...
The paper proposes a study of a background noise classifier based on a pattern recognition approach using a neural network. The signals submitted to the neural network are characterised by means of a set of 12 MFCC (Mel frequency cepstral coefficient) parameters typically present in the front end of a mobile terminal. The performance of the classifier, evaluated in terms of percent misclassification,...
In this article the relevant training aspects for building robust and accurate HMM models for a large-vocabulary recognition system in Slovak are discussed. The MASPER training procedure was taken as the basis for building the HMM models and applied to the Slovak MOBILDAT database.