A computationally efficient feature, called Minimum Energy Density (MED), was applied to discriminate between speech and music in audio signals from radio station programs. The presented binary classifier is based on testing two features: energy distribution and differences between energy in channels. We analyzed 240 hours of signals from 10 Polish radio stations. Our analysis enables us to provide...
The phenomena of filled pauses and breaths pose a challenge to Automatic Speech Recognition (ASR) systems dealing with spontaneous speech, including recognizer modules in Interactive Voice Response (IVR) systems. We suggest a method based on Hidden Markov Models (HMM), which is easily integrated into HMM-based ASR systems and allows detection of those disturbances without incorporating additional parameters...
We investigate whether language models used in automatic speech recognition (ASR) should be trained on speech transcripts rather than on written texts. By calculating the log-likelihood statistic for part-of-speech (POS) n-grams, we show that there are significant differences between written texts and speech transcripts. We also test the performance of language models trained on speech transcripts and...
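The abstract above does not say which formulation of the log-likelihood statistic was used. As a minimal sketch, assuming the common two-corpus form (observed versus expected counts of one n-gram in each corpus), the comparison could look like this; the function name and arguments are hypothetical and not taken from the paper.

```python
import math


def log_likelihood(count_a, total_a, count_b, total_b):
    """Two-corpus log-likelihood (G2) for a single n-gram.

    count_a, count_b: occurrences of the n-gram in corpus A and corpus B.
    total_a, total_b: total number of n-grams in corpus A and corpus B.
    This formulation is an assumption; the abstract does not specify one.
    """
    combined = count_a + count_b
    # Expected counts if the n-gram were equally frequent in both corpora.
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    ll = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2.0 * ll
```

A value near zero means the n-gram is about equally frequent in both corpora; larger values indicate a stronger difference between, for example, written texts and speech transcripts.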
This paper analyses and tests three possible methods of detecting recorded speech for their applicability to voicemail detection. The methods chosen for testing were: transmission channel characteristics extraction with PFCC, recorded speech detection with a trained pattern classifier, differences in transmission channels and speech recognition. Most of the tests gave results...
The Feedback Delay Network (FDN) is used as an artificial digital reverberation algorithm. Being one of the most natural-sounding approaches, it has been widely implemented in sound processing software products. Although the FDN is a very potent tool for artificial reverberation, achieving proper perceptual quality of acoustic simulation usually demands additional modifications to signal processing...
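For readers unfamiliar with the structure, the following is a minimal sketch of a basic FDN, not the modified version the abstract refers to. The delay lengths, the feedback gain, and the choice of a Householder feedback matrix are all illustrative assumptions.

```python
def fdn_reverb(signal, delays=(149, 211, 263, 293), gain=0.7, tail=2000):
    """Basic feedback delay network; returns the wet (reverberated) signal.

    delays: delay-line lengths in samples (illustrative values).
    gain:   feedback gain below 1 so that the response decays.
    tail:   extra samples rendered after the input ends.
    """
    n = len(delays)
    buffers = [[0.0] * d for d in delays]  # one circular buffer per line
    idx = [0] * n
    out = []
    for t in range(len(signal) + tail):
        x = signal[t] if t < len(signal) else 0.0
        # Read the oldest sample of each delay line.
        taps = [buffers[i][idx[i]] for i in range(n)]
        total = sum(taps)
        out.append(total)
        for i in range(n):
            # Householder feedback matrix: A = I - (2/n) * ones (orthogonal).
            feedback = gain * (taps[i] - 2.0 / n * total)
            buffers[i][idx[i]] = x + feedback
            idx[i] = (idx[i] + 1) % delays[i]
    return out
```

Feeding a unit impulse, for example `fdn_reverb([1.0])`, yields the impulse response of the network, whose first echo arrives after the shortest delay line.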
The paper presents statistical phonetic data of Polish collected from a corpus. Lengths of phonemes vary from 5 ms to 670 ms. Average durations of Polish phonemes are presented, as well as an important anomaly of longer phonemes at the end of sentences, which is the main topic of the paper. This observation can be used in speech recognition for automatic insertion of full stops and sentence modelling....
The paper presents one possible approach to building a triphone model for automatic speech recognition of Polish. Even though classifiers are well developed and described, such a task is not trivial because of insufficient training data and the importance of the computation time spent on training the model. To overcome this problem, some states are typically tied using data-driven criteria...
A new and efficient method of designing transmultiplexer filters is presented. The bilinear equations posed for the FIR filters are solved to achieve perfect reconstruction. For a given combining filter bank, a separation filter bank can be developed by solving a set of algebraic equations. Examples of two-channel and four-channel transmultiplexer systems are provided to illustrate the method...
Recognition of continuous speech is one of the major challenges in automatic speech recognition (ASR), especially in phonetically complex languages (e.g. Polish). To improve ASR of the Polish language, we obtained phoneme statistics to locate diphones and triphones within running speech sequences. We found that these clusters are more likely to occur at word boundaries than within the...
The paper presents an analysis of prosodic parameters of speech (energy, phoneme duration) as speaker-characteristic features. The most significant parameters of the features were investigated using the CORPORA speech database and described statistically. We observed that phoneme duration depends on the speaker, as does the preboundary lengthening of the phonemes in sentences. An average phoneme energy...
An algorithm for automatic detection of breath events in a speech signal is suggested in this paper. The occurrence of breath events in recordings is discussed, as well as their statistical parameters. The role of breath pauses in signalling punctuation and the emotional or physical state of the speaker, in both spontaneous and read speech, is also described. Wavelet parameters of energy in...
Two possible confidence measures for automatic speech recognition are presented, along with results of tests in which they were applied. One of them is widely known and is based on comparing the strongest hypothesis with an average of the next few hypotheses. We found that it was not efficient in all cases, which is why we developed our own method based on comparison of substrings. The new algorithm was found useful...
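The first, widely known measure mentioned above can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives neither the exact formula nor the number of competing hypotheses, so the difference form, the parameter `k`, and the function name are hypothetical.

```python
from statistics import mean


def nbest_confidence(scores, k=3):
    """Gap between the best hypothesis score and the mean of the next k.

    scores: recognizer scores of the N-best hypotheses (higher is better),
            for example log-likelihoods.
    k:      number of runner-up hypotheses to average (assumed value).
    """
    if len(scores) < 2:
        raise ValueError("need at least two hypotheses")
    ranked = sorted(scores, reverse=True)
    runners_up = ranked[1:1 + k]
    return ranked[0] - mean(runners_up)
```

A large gap suggests that the top hypothesis clearly dominates its competitors, while a gap close to zero suggests the recognizer could not separate them.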
The paper presents an evaluation of Polish phone segmentation for different types of phones. The categorisation was based on acoustic properties. The segmentation method is based on the discrete wavelet transform and has been published previously. The results show that several types of transitions, especially from and to vowels, cause more errors than others.
A transmultiplexer designed to combine images into one image to be sent through a single communication channel is presented. The considered system can be equipped with integer-to-integer filters to enable lossless compression. The efficiency of lossless JPEG compression applied to transmultiplexed signals has been verified.
The application of image compression methods in transmultiplexer systems is presented. The specific energy distribution in the combined image spectrum makes standard compression methods, especially those based on frequency decomposition, inefficient. Two cases are described and compared: the compression of the combined image and the preliminary compression of input images before transmultiplexing...
In this paper a new method of speech segmentation is suggested. It is based on power fluctuations of the wavelet spectrum for a speech signal. In most approaches to speech recognition, the speech signals are segmented using constant-time segmentation. Constant segmentation needs to use windows to decrease the boundary distortions. A more natural approach is to segment the speech signals on the basis...