Recent studies have shown that inter-layered network coding is a promising approach to providing unequal error protection for scalable video multicast over heterogeneous channels. Selecting the optimal transmission distribution at the eNB improves system performance at the cost of time and computational complexity. In this paper, we propose an optimal transmission...
We describe a lexicon expansion method to handle variations in spontaneous speech. Such variations are common in programs such as conversations and talk shows, and they typically appear as unintelligible utterances spoken at a high speech rate. Unlike read speech in news programs, these variations often severely degrade automatic speech recognition (ASR) performance. Then, these variations...
We propose two simple methods to improve the performance of a keyword spotting system. In our application, users can change the keywords at any time. We therefore focus on phone-based GMM-HMM models, since they do not require keyword-specific training data. However, GMM-HMM-based models usually have a very high false alarm rate, i.e., a keyword is not present but the system gives...
This paper proposes a filtering approach based on global motion estimation (GME) and global motion compensation (GMC) for pre-processing and post-processing in a video codec. For pre-processing, the groups of pictures (GOPs), i.e., the basic units for GMC, and their reference frames are first defined for the input video sequence. Next, GME and GMC are performed sequentially for every frame in each GOP...
In this paper, we investigate a DNN tone-based extended recognition network (ERN) approach to Mandarin tone recognition and tone mispronunciation detection. Given a toneless syllable sequence, a tone-based ERN is constructed by assigning each of the five tones to every toneless syllable, which yields a fully expanded tonal syllable network. Next, Viterbi decoding is carried out on the tone-based ERN to...
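A minimal sketch of the network-expansion step and a Viterbi pass over the expanded network, for illustration only. Assumptions not stated in the abstract: tone 5 denotes the neutral tone, the per-position acoustic log-scores would come from the DNN, and `transition` is a hypothetical tone-to-tone log-score; the names `build_tone_ern`, `count_paths`, and `viterbi` are illustrative.

```python
TONES = (1, 2, 3, 4, 5)  # assumption: 5 stands for the neutral tone

def build_tone_ern(toneless_syllables):
    """Expand each toneless syllable into its five tonal alternatives."""
    return [[f"{syl}{tone}" for tone in TONES] for syl in toneless_syllables]

def count_paths(ern):
    """Number of tonal syllable sequences the fully expanded network encodes."""
    n = 1
    for alternatives in ern:
        n *= len(alternatives)
    return n

def viterbi(ern, acoustic, transition):
    """Best path through the ERN.

    acoustic[t][k]: hypothetical log-score of alternative k at position t.
    transition(j, k): hypothetical log-score for moving from alternative j to k.
    """
    scores = list(acoustic[0])
    backpointers = []
    for t in range(1, len(ern)):
        new_scores, ptrs = [], []
        for k in range(len(ern[t])):
            # Pick the predecessor that maximizes accumulated score + transition.
            best = max(range(len(scores)), key=lambda j: scores[j] + transition(j, k))
            new_scores.append(scores[best] + transition(best, k) + acoustic[t][k])
            ptrs.append(best)
        scores = new_scores
        backpointers.append(ptrs)
    # Trace back from the best final alternative.
    k = max(range(len(scores)), key=scores.__getitem__)
    path = [k]
    for ptrs in reversed(backpointers):
        k = ptrs[k]
        path.append(k)
    path.reverse()
    return [ern[t][k] for t, k in enumerate(path)]
```

For a sequence of n syllables the expanded network encodes 5^n tonal paths, which is why a dynamic-programming search such as Viterbi is used rather than enumerating them.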
A novel perceptual multiple description coding method with randomly offset quantizers (PMDROQ) is proposed. In PMDROQ, the input image is partitioned into M subsets, from which M descriptions are obtained. In each description, one subset is directly encoded and decoded with small, distinct perceptual quantization step sizes in the DCT domain, while the other subsets are predictively coded and decoded...
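A simplified sketch of how M descriptions could be formed from M subsets, under several assumptions: coefficients are assigned to subsets round-robin, each description draws one random reconstruction offset, and plain coarse quantization stands in for the predictive coding of the other subsets (the abstract is truncated before that step is described). Perceptual step-size selection is not modeled; `fine_step`, `coarse_step`, and all function names are hypothetical.

```python
import random

def offset_quantize(x, step, offset):
    """Uniform scalar quantizer whose reconstruction grid is shifted by `offset`."""
    return round((x - offset) / step) * step + offset

def partition(n_items, m):
    """Round-robin assignment of item indices to M subsets (assumed scheme)."""
    return [list(range(i, n_items, m)) for i in range(m)]

def make_descriptions(coeffs, m, fine_step, coarse_step, seed=0):
    """Build M descriptions: subset d is quantized finely in description d,
    everything else coarsely (a stand-in for predictive coding)."""
    rng = random.Random(seed)
    subsets = partition(len(coeffs), m)
    descriptions = []
    for d in range(m):
        offset = rng.uniform(0.0, fine_step)  # random offset for this description
        own = set(subsets[d])
        descriptions.append([
            offset_quantize(c, fine_step if i in own else coarse_step, offset)
            for i, c in enumerate(coeffs)
        ])
    return descriptions

def central_decode(descriptions):
    """Average all received descriptions coefficient by coefficient."""
    return [sum(vals) / len(vals) for vals in zip(*descriptions)]
```

Each reconstructed coefficient lies within half a step of the original, so a coefficient is reproduced accurately by the description that owns its subset and roughly by the others; the randomized offsets are what make the descriptions differ when several are combined at the decoder.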
We propose a flexible framework for spectral conversion (SC) that facilitates training with unaligned corpora. Many SC frameworks require parallel corpora, phonetic alignments, or explicit frame-wise correspondence, either to learn conversion functions or to synthesize a target spectrum with the aid of alignments. However, these requirements severely limit the practical applications of SC due...
This paper proposes a secure identification scheme for JPEG 2000 codestreams. The aim is to securely identify JPEG 2000 images generated from the same original image without decoding them. The features used for identification are extracted from the header parts of a JPEG 2000 codestream. The proposed scheme produces no false negative matches under various compression ratios, while most...
In applications such as visual surveillance, huge amounts of data are recorded continuously, so efficient encoding is an important task. Researchers have recently been actively exploring compressive sensing (CS) for encoding in such scenarios, assuming a stationary background. In this scenario, working out a new measurement and reconstruction pair for efficient encoding...
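Because CS measurements are linear, a stationary background can be removed directly in the measurement domain: y − y_bg = Φ(x − x_bg), and the difference is sparse when only a small foreground region changes. The sketch below illustrates that general idea with a Gaussian Φ and orthogonal matching pursuit (OMP) as the reconstruction step. It is a generic CS background-subtraction pattern, not the measurement and reconstruction pair proposed in the (truncated) abstract; the sensing-matrix design, the function names, and the sparsity level `k` are assumptions.

```python
import numpy as np

def gaussian_measurement_matrix(m, n, seed=0):
    """Random Gaussian sensing matrix with unit-norm columns (assumed design)."""
    rng = np.random.default_rng(seed)
    phi = rng.standard_normal((m, n))
    return phi / np.linalg.norm(phi, axis=0)

def omp(phi, y, k, tol=1e-10):
    """Orthogonal matching pursuit: greedily recover a k-sparse x with y ≈ phi @ x."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < tol:
            break  # measurements already fully explained
        j = int(np.argmax(np.abs(phi.T @ residual)))  # most correlated column
        if j not in support:
            support.append(j)
        sub = phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)  # refit on current support
        residual = y - sub @ coef
    x_hat = np.zeros(phi.shape[1])
    if support:
        x_hat[support] = coef
    return x_hat
```

In use, the encoder would transmit `phi @ frame`, the decoder would subtract the stored `phi @ background`, and `omp(phi, y_diff, k)` would recover only the foreground change. Real frames are usually sparse in a transform domain rather than in raw pixels, so a practical system would fold that transform into Φ.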
Research on multilingual speech recognition remains attractive yet challenging. Recent studies focus on learning shared structures under the multi-task paradigm, in particular a feature-sharing structure. This approach has been found effective in improving performance on each individual language. However, it is only useful when the deployed system supports just one language. In a true multilingual...
To improve automatic speech recognition (ASR) performance in noisy conditions, it is effective to prepare multiple ASR systems that can handle a wide variety of noise. However, the optimal ASR system differs for each environment, and mismatches between training and testing degrade ASR performance. In this situation, an overall combination of the multiple systems is effective; however, the computational...