This paper is concerned with combining models for decoding an optimum translation for a dictation-based machine-aided human translation (MAHT) task. Statistical language model (SLM) probabilities in automatic speech recognition (ASR) are updated using statistical machine translation (SMT) model probabilities. The effect of this procedure is evaluated for utterances from human translators dictating...
Acquisition of in-domain training data to build speech recognition systems for under-resourced languages can be a costly, time-consuming and tedious process. In this work, we propose the use of machine translation to translate English transcripts of telephone speech into Czech in order to improve a Czech CTS speech recognition system. The translated transcripts are used as additional language...
This paper investigates detection of English keywords in a conversational scenario using a combination of acoustic and LVCSR-based keyword spotting systems. Acoustic KWS systems search for predefined words in parameterized spoken data. Corresponding confidences are represented by likelihood ratios given the keyword models and a background model. First, due to the especially high number of false alarms,...
This paper proposes a novel approach for noise-robust speech recognition which combines a missing-data (MD) derived spectral reconstruction technique and uncertainty decoding based on the weighted Viterbi algorithm (WVA). First, the noisy feature vectors are compensated by using a novel MD imputation technique based on the integration of truncated Gaussian pdfs. Although the proposed MD estimator...
In this paper we investigate whether a layered architecture that has already proven its value for small tasks also works for a system with large lexica (400k words) and language models (5-grams). The architecture was designed to decouple phone and word recognition, which allows for the integration of more complex linguistic components, especially at the sub-word level. It was tested on the Dutch...
In this paper we propose a new approach of two-dimensional frame-and-feature weighted Viterbi decoding performed at the recognizer back-end for robust speech recognition. A new SVM-based frame weighting approach is proposed considering the energy distribution and harmonicity of the frame. The feature weighting is based on a previously proposed approach using an entropy measure considering confusion...
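As a rough illustration of the frame-weighting idea mentioned in this abstract, the sketch below implements a generic weighted Viterbi recursion in which each frame's log emission score is scaled by a reliability weight before it is added to the path score. This is only an assumed textbook form of weighted Viterbi decoding: the function name, the toy HMM, and the weights are hypothetical, and neither the paper's SVM-based frame weights nor its entropy-based feature weights are reproduced here.

```python
import math

def weighted_viterbi(log_init, log_trans, log_emit, weights):
    """Return the best state path when frame t's emission is scaled by weights[t].

    log_init[j]     : log prior of starting in state j
    log_trans[i][j] : log probability of moving from state i to state j
    log_emit[t][j]  : log likelihood of frame t under state j
    weights[t]      : reliability of frame t (0 ignores the frame, 1 trusts it fully)
    """
    n_states = len(log_init)
    delta = [log_init[j] + weights[0] * log_emit[0][j] for j in range(n_states)]
    backptr = []
    for t in range(1, len(log_emit)):
        new_delta, ptr = [], []
        for j in range(n_states):
            # Best predecessor of state j, judged on path score plus transition.
            best_i = max(range(n_states), key=lambda i: delta[i] + log_trans[i][j])
            score = delta[best_i] + log_trans[best_i][j] + weights[t] * log_emit[t][j]
            new_delta.append(score)
            ptr.append(best_i)
        delta = new_delta
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path

def _log_rows(rows):
    return [[math.log(p) for p in row] for row in rows]

# Hypothetical two-state HMM; the middle frame strongly favours state 1.
LOG_INIT = [math.log(0.6), math.log(0.4)]
LOG_TRANS = _log_rows([[0.9, 0.1], [0.1, 0.9]])
LOG_EMIT = _log_rows([[0.9, 0.1], [0.01, 0.99], [0.8, 0.2]])

print(weighted_viterbi(LOG_INIT, LOG_TRANS, LOG_EMIT, [1.0, 1.0, 1.0]))  # [0, 1, 1]
print(weighted_viterbi(LOG_INIT, LOG_TRANS, LOG_EMIT, [1.0, 0.0, 1.0]))  # [0, 0, 0]
```

Setting the middle frame's weight to zero removes its influence, so the decoder keeps the path through state 0 instead of switching states on the strength of a single (possibly noisy) frame.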
Models for silence are a fundamental part of continuous speech recognition systems. Depending on application requirements, audio data segmentation, and availability of detailed training data annotations, it may be necessary or beneficial to differentiate between other non-speech events, for example breath and background noise. The integration of multiple non-speech models in a WFST-based dynamic network...
We compare the most important pruning methods which are common in different LVCSR decoding architectures and trace them back to a theoretical motivation. Based on this motivation, we propose a new pruning method which fades the word end pruning over a large part of the search network. We analyze the methods regarding their relationship between search-space and word error rate, and regarding their mutual...
This paper presents a method to improve the out-of-vocabulary (OOV) word detection performance by combining multiple speech recognition systems' outputs. Three different fragment-word hybrid systems, the phone, subword, and graphone systems, were built for detecting OOV words. Then outputs from each individual system were combined using ROVER. Two combination metrics were explored in ROVER, voting...
It is generally believed that the transition probabilities in a hidden Markov model (HMM) have a limited role in the speech decoding process. In this paper, through a series of recognition experiments on Wall Street Journal (WSJ) read speech and SVitchboard (SVB) conversational telephone speech, we find that the HMM transition probabilities may be more important than we once thought. The experiments...
In VoIP applications, packet loss, delay and delay jitter are inevitable and have a large impact on the perceived speech quality. Jitter buffers are commonly deployed to compensate for jitter in order to play out the received packets continuously. For mobile devices, due to limited battery power, computational complexity has to be kept to a minimum. In this paper, we propose a jitter buffer management...
This paper proposes a new class loss function as an alternative to the standard sigmoid class loss function for optimizing the parameters of decoding graphs using discriminative training based on the minimum classification error (MCE) criterion. The standard sigmoid-based approach tends to ignore a significant number of training samples that have a large difference between the scores of the reference...
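To make the limitation described in this abstract concrete, the sketch below evaluates a sigmoid class loss and its gradient for a few values of a misclassification measure d (assumed here to be the competitor score minus the reference score). It assumes the conventional form loss(d) = 1 / (1 + exp(-gamma * d)); the function names and the value of gamma are illustrative and not taken from the paper, and the paper's proposed alternative loss is not shown.

```python
import math

def sigmoid_loss(d, gamma=1.0):
    """Smoothed 0-1 loss for a misclassification measure d (assumed form)."""
    return 1.0 / (1.0 + math.exp(-gamma * d))

def sigmoid_loss_grad(d, gamma=1.0):
    """Derivative of the sigmoid loss with respect to d."""
    loss = sigmoid_loss(d, gamma)
    return gamma * loss * (1.0 - loss)

# The gradient peaks at d = 0 and decays towards zero as |d| grows.
for d in (-20.0, -1.0, 0.0, 1.0, 20.0):
    print(f"d={d:+6.1f}  loss={sigmoid_loss(d):.4f}  grad={sigmoid_loss_grad(d):.2e}")
```

Because the gradient is close to zero when |d| is large, samples whose reference and competitor scores differ widely barely move the parameters under gradient-based training, which is consistent with the behaviour the abstract attributes to the sigmoid-based approach.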
This paper addresses a real-time implementation of a multi-channel, high-quality G.729A speech codec based on an embedded SIMD processor, which is used in a SIP Video Phone. A series of strategies is designed for the special characteristics of the processor and of G.729A, including memory management and SIMD decomposition. Profiling shows that a dramatic improvement is achieved. Less than 20%...
The present paper presents a new technique that aims at solving an ill-posed source separation problem encountered in stereo mixtures. The proposed method is realized in an encoder-decoder framework: On the encoder side, a set of spectral envelopes is extracted from the original tracks, which are known. These envelopes are passed on to the decoder alongside the stereo mixture, whereas the frequency...
An enhanced bandwidth extension scheme is introduced in this paper for wideband speech coding using ADPCM. The coded lower-band signal plus a small amount of side information (some parameters) is transmitted instead of the whole band. In the decoder both frequency parts are reconstructed from the coded signal and the received parameters. In the proposed method, the high frequency part is derived from the excitation...
To address the low speech signal quality on FH channels with wideband rejective interference, the performance optimization of CVSD coding in a digital FH system was studied. A new CVSD demodulation algorithm was designed based on the principle of the traditional CVSD system. A rule for measuring speech signal quality was then proposed, and the simulation method of...
This study examined the feasibility of decoding semantic information from human cortical activity. Four human subjects undergoing presurgical brain mapping and seizure foci localization participated in this study. Electrocorticographic (ECoG) signals were recorded while the subjects performed simple language tasks involving semantic information processing, such as a picture naming task where subjects...
This paper reports on studies involving brain-machine interfaces (BMIs) that provide near-instantaneous audio feedback from a speech synthesizer to the BMI user. In one study, neural signals recorded by an intracranial electrode implanted in a speech-related region of the left precentral gyrus of a human volunteer suffering from locked-in syndrome were transmitted wirelessly across the scalp and used...
In this paper, we utilize sender-based Forward Error Correction (FEC) techniques to enhance the robustness of packet loss recovery for the AVS Mobile speech and audio (AVS-M) codec. Two FEC schemes are proposed which take advantage of the codec's structural characteristics and do not introduce extra delay. The objective and subjective listening test results show that the two methods achieve higher...
One of the problems in speech recognition is out-of-vocabulary (OOV) words, which can cause word errors. OOV words are words that the speech recognizer cannot recognize because they are missing from the recognizer's database. A method combining alignment, a language model, and POS tagging is proposed to identify word errors caused by OOV words. Word and syllable level decoding from speech...