Brain-computer interfaces that directly decode speech could restore communication to locked-in individuals. However, decoding speech from brain signals still faces many challenges. We investigated decoding of phonemes (the smallest separable parts of speech) from ECoG signals during word production. We expanded on previous efforts to identify specific phonemes by identifying phonemes by where in...
We present the first demonstration of single-trial neural decoding of vowel acoustic features during speech production with high performance. The ability to predict trial-by-trial fluctuations in speech production was facilitated by using high-density, large-area electrocorticography (ECoG) combined with an adaptive principal components regression. In experiments from two human neurosurgical patients...
The detection of out-of-vocabulary (OOV) words is a crucial problem for spoken term detection (STD). In this paper, integration with local acoustic information is investigated as a way to retrieve more OOV words. Tokens with high local acoustic probabilities propagated in the search space at the decoding stage will be forced to propagate to the next frame. In this way, acoustically similar words can...
Phone Log-Likelihood Ratios (PLLR) have recently been introduced as features for spoken language and speaker recognition systems. This representation has proven to be an effective way of encoding acoustic-phonotactic information into frame-level vectors, which can be easily plugged into state-of-the-art systems. In a previous work, we began the search for reduced representations of PLLRs, as a mean...
In this paper, we propose to use a deep neural network (DNN), which has proven to be a state-of-the-art technique in speech recognition, to re-estimate the confidence of keyword hypotheses in the verification stage of spoken term detection. A speech recognition system based on a DNN outperforms one based on the conventional Gaussian mixture model (GMM) but suffers from increased decoding time...
Recently it has been shown to be possible to ascertain the target of a subject's attention in a cocktail party environment from single-trial (∼60 s) electroencephalography (EEG) data. Specifically, this was shown in the context of a dichotic listening paradigm where subjects were cued to attend to a story in one ear while ignoring a different story in the other and were required to answer questions...
The Viterbi algorithm is a dynamic programming algorithm used to find the most likely word uttered in an unknown speech signal. In the Viterbi algorithm, the observation probabilities are calculated using a Gaussian distribution function. To implement a Viterbi decoder, these probability values are initially stored in RAM. Thus a conventional Viterbi decoder requires a large amount of RAM for its execution. In...
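The decoding step this abstract describes can be illustrated with a minimal sketch. It is not the paper's decoder: it assumes a small hidden Markov model with one-dimensional Gaussian observation densities, works in the log domain, and computes each observation probability on demand rather than reading it from a stored table. The names `gaussian_log_pdf` and `viterbi` and all parameter values are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, var):
    """Log-density of a 1-D Gaussian; stands in for the observation probability."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(observations, log_init, log_trans, means, variances):
    """Return the most likely state path and its log-probability.

    log_init[s]     : log P(first state = s)
    log_trans[p][s] : log P(next state = s | current state = p)
    means, variances: per-state Gaussian parameters (assumed 1-D here)
    """
    n_states = len(log_init)
    # delta[s] = best log-probability of any path that ends in state s
    delta = [
        log_init[s] + gaussian_log_pdf(observations[0], means[s], variances[s])
        for s in range(n_states)
    ]
    backpointers = []
    for x in observations[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            # Dynamic-programming step: pick the best predecessor of state s
            best_prev = max(range(n_states), key=lambda p: delta[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_delta.append(
                delta[best_prev]
                + log_trans[best_prev][s]
                + gaussian_log_pdf(x, means[s], variances[s])
            )
        delta = new_delta
        backpointers.append(pointers)
    # Trace back from the best final state
    last = max(range(n_states), key=lambda s: delta[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, delta[last]


if __name__ == "__main__":
    # Toy two-state model: state 0 emits values near 0, state 1 near 5.
    log_init = [math.log(0.5), math.log(0.5)]
    log_trans = [
        [math.log(0.9), math.log(0.1)],
        [math.log(0.1), math.log(0.9)],
    ]
    path, score = viterbi([0.1, -0.2, 4.9, 5.2], log_init, log_trans, [0.0, 5.0], [1.0, 1.0])
    print(path, score)
```

Computing the Gaussian log-density on demand trades extra arithmetic for memory. A decoder that precomputes the values would instead hold one entry per state and frame, which is the RAM cost the abstract refers to.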
This paper introduces the basic principles of MELPe (Enhanced Mixed-Excitation Linear Predictive), which is an enhanced version of the MELP algorithm. Compiler optimization and code optimization methods are proposed for the ARM1176JZF-S core. The encoding time of the optimized algorithm drops from 110.75 ms per frame to 52.5 ms per frame, and the decoding time drops from 14.88 ms per frame to 10.73 ms per frame. Efficiency...
It is widely known that database quality has a huge impact on speech recognition system performance, especially when the expected domain is well represented. In this paper, we use this idea as leverage for a data-driven solution to the problem of code-switching in Filipino. Practical Filipino conversations often contain English and other loan words in varying frequencies, demanding better training...
This paper presents alternative approaches to selecting the mixed channels during teleconferencing involving CELP codecs. The proposals address the complexity and delay problems that arise when classical solutions based on PCM samples are used. The principle consists of avoiding full speech decoding and extrapolating the speech audio level from the CELP parameters before channel selection. Only the...
This paper presents a new secure variant of the Adaptive Differential Pulse Code Modulation (ADPCM) encoders adopted by the CCITT. This version encrypts and decrypts voice simultaneously with the ADPCM encoding and decoding operations. The evaluation of the scheme showed better performance in terms of speed and security.
The number of mobile subscribers is increasing all over the world, so communication systems have to be improved. The Mixed Excitation Linear Prediction (MELP) algorithm was developed to reduce the bandwidth of the signal and to transmit more data on a single channel. This results in an increase in channel capacity. MELP is basically a speech coding method, relying on a Speech Encoder...
This work describes the development of a scheme for remotely retrieving spoken documents stored on a voice server. The spoken documents are recorded, indexed by the frequency of occurrence of isolated keywords, and stored on the voice server. An isolated word recognizer (IWR) is developed for recognizing the identified keywords when spoken in isolation. The IWR employs foreground...
In this paper we study the robustness of a command decoding approach based on tiny decoding graphs for voice-based robotic interaction. This approach comprises the fusion of the grammar rules and the statistical n-gram language models to produce an elegant and quite efficient tiny decoding graph. The resulting tiny graph has several advantages such as high speed and improved robustness of command...
The objective of this research is to study the performance of high-quality speech compression in real time on a single-chip system. Based on voice over Internet protocol (VoIP) requirements, we have decided to implement high-quality speech coding (with a signal-to-noise ratio, SNR, of more than 10 dB) at a low bit rate of 8 kbit/s or less. The coder must have a delay of no more than 100 ms. The development...
In this paper, a novel parametric prosody coding approach for Mandarin speech is proposed. It employs a hierarchical prosodic model (HPM) as a prosody generating model in the encoder to analyze the speech prosody of the input utterance to obtain a parametric representation of four prosodic-acoustic features of syllable pitch contour, syllable duration, syllable energy level, and syllable-juncture...
The study proposes a joint source-channel decoding scheme that uses the residual redundancy of the speech source without changing the complexity of the decoding algorithm. As the speech parameter index can be used to calculate the transition probability of the speech coding parameter index, we can obtain transition matrices for different speech parameters from the statistics of the speech signal. Using...
There are several commercial text-to-speech (TTS) systems that generate speech signals that sound very natural. A distinct problem is utterance copy, which consists of taking speech as input (instead of text, as in TTS) and finding the input parameters that would drive a speech synthesizer to generate speech that mimics the target speech with respect to content and speaker identity. Utterance copy is...
In this paper we compare two sets of audio features in the task of audio pattern searching based on elementary sound models. The first set of features consists of the well-known mel-frequency cepstral coefficients together with their first- and second-order time derivatives. The second set was chosen from a bag of features by a particle swarm optimization algorithm and consists of the following audio features: line...
This paper describes the French broadcast speech transcription system developed by CRIM for the ETAPE 2011 evaluation. The key elements of this recognizer include a dictionary of over 140,000 words, 478 hours of audio for training the acoustic models, feature-space MMI and boosted MMI discriminative training of the acoustic models, variable-frame-rate decoding with a trigram language model, lattice rescoring with quadgram...