Cochlear implant (CI) listeners were found to have great difficulty with vocal emotion recognition because of the limited spectral cues provided by CI devices. Previous studies have shown that the modulation spectral features of temporal envelopes may be important cues for vocal emotion recognition of noise-vocoded speech (NVS) as simulated CIs. In this paper, the feasibility of vocal emotion conversion...
This paper proposes a robust front-end for speech applications based on a scheme for restoring instantaneous amplitude and phase. Typical applications such as hearing aids and automatic speech recognition systems still face challenges with regard to robustness against noise and reverberation. The proposed front-end employed a combination of our previously proposed method for restoring instantaneous...
This paper proposes the unification of the code-excited linear prediction (CELP) codec process with watermarking based on formant tuning. The serial problem in watermarking and then encoding with the CELP codec was thereby reduced by using the proposed method, which also increased the bit detection rate. We took advantage of two key properties: I) humans do not perceive alterations applied to formants...
We have shown that restoring the instantaneous amplitude as well as the instantaneous phase on a gammatone filterbank plays a significant role in speech enhancement. However, dereverberation remains a challenging topic, since the previously proposed scheme can only work in noisy environments. In this paper, we extend our previously proposed scheme to be general speech enhancement of removing the effects...
Illegal use of digital technologies has brought a series of problems in speech protection and authorization. Digital watermarking can effectively solve these problems by embedding watermarks into the host signals. This paper proposes a hybrid watermarking method for speech signals based on the concepts of formant enhancement (FE) and cochlear delay (CD). This hybrid method utilizes the source-filter...
We propose a method of speech watermarking based on modifications to line spectral frequencies (LSFs) of original speech. LSFs were derived from each frame with linear prediction (LP) analysis and watermarks were embedded into them by using the quantization index modulation (QIM) of different quantization steps. We took into consideration inaudibility and robustness that were influenced by minor modifications...
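The quantization index modulation (QIM) step mentioned in the abstract above can be illustrated with a minimal sketch. This is generic scalar QIM, not the authors' implementation: the function names `qim_embed` and `qim_detect`, the sample LSF values, and the single fixed step size are assumptions made for illustration (the abstract describes using different quantization steps, which are not specified here).

```python
import numpy as np

def qim_embed(values, bits, step):
    """Embed one bit per value by quantizing onto one of two interleaved lattices.

    Bit 0 uses the lattice {k * step}; bit 1 uses {k * step + step / 2}.
    """
    values = np.asarray(values, dtype=float)
    bits = np.asarray(bits, dtype=int)
    offset = bits * step / 2.0
    return np.round((values - offset) / step) * step + offset

def qim_detect(values, step):
    """Recover bits by choosing, for each value, the nearer of the two lattices."""
    values = np.asarray(values, dtype=float)
    dist0 = np.abs(values - np.round(values / step) * step)
    nearest1 = np.round((values - step / 2.0) / step) * step + step / 2.0
    dist1 = np.abs(values - nearest1)
    return (dist1 < dist0).astype(int)

# Hypothetical LSFs (radians) for one frame and a hypothetical watermark payload.
lsfs = np.array([0.313, 0.627, 0.948, 1.402])
bits = np.array([1, 0, 1, 1])
step = 0.02

marked = qim_embed(lsfs, bits, step)
recovered = qim_detect(marked, step)
```

A larger step makes detection tolerate larger perturbations (anything below a quarter of the step is decoded correctly) at the cost of a larger change to each LSF (up to half the step), which is the inaudibility versus robustness tradeoff the abstract refers to.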
This paper proposes a method for robustly and accurately estimating the fundamental frequency (F0) of a steady complex tone on the basis of an amplitude modulation/demodulation technique. It is based on the well-known mechanism of pitch perception for AM tones. The comparative results revealed that the percentage correct rates of the estimated F0s using a few recent methods (TEMPO, PHIA, and CmpCep)...
In this paper, we propose a novel feature compensation approach based on the interacting multiple model (IMM) algorithm specially designed for joint processing of background noise and acoustic reverberation. Our approach to cope with the time-varying environmental parameters is to establish a switching linear dynamic model for the additive and convolutive distortions in the log-spectral domain. The...
The speech transmission index (STI) is an objective measurement that is used to assess the quality of speech transmission in room acoustics. This paper proposes a simplified method of blindly estimating the STI in room acoustics based on the concept of the modulation transfer function (MTF). STI can be estimated with this method in four steps: (1) MTF is estimated in the whole band from the reverberant...
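The relationship between the MTF and the STI can be sketched as follows. This is not the blind estimation method of the abstract above, whose steps are truncated here. It is a simplified, unweighted textbook calculation that assumes an exponentially decaying room impulse response with a known reverberation time and no background noise; the function names and the single-band averaging are assumptions made for illustration.

```python
import numpy as np

# The 14 modulation frequencies (Hz) conventionally used for STI.
MOD_FREQS = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                      3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])

def mtf_exponential(fm, t60):
    """MTF of an exponentially decaying impulse response with reverberation time t60 (s)."""
    return 1.0 / np.sqrt(1.0 + (2.0 * np.pi * fm * t60 / 13.8) ** 2)

def sti_from_mtf(m):
    """Convert modulation indices to a single unweighted STI value in [0, 1]."""
    m = np.clip(m, 1e-6, 1.0 - 1e-6)           # avoid log of zero or division by zero
    snr_app = 10.0 * np.log10(m / (1.0 - m))   # apparent signal-to-noise ratio (dB)
    snr_app = np.clip(snr_app, -15.0, 15.0)    # limit to the +/-15 dB range
    ti = (snr_app + 15.0) / 30.0               # transmission index per modulation frequency
    return float(np.mean(ti))

sti_dry = sti_from_mtf(mtf_exponential(MOD_FREQS, 0.3))    # short reverberation time
sti_wet = sti_from_mtf(mtf_exponential(MOD_FREQS, 2.0))    # long reverberation time
```

Longer reverberation lowers the modulation index at every modulation frequency, so `sti_wet` comes out lower than `sti_dry`. A full STI calculation additionally works per octave band and applies band weightings, which this sketch omits.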
Noise reduction algorithms are widely used to mitigate noise effects on speech to improve the robustness of speech technology applications. However, they inevitably cause speech distortion. The tradeoff between noise reduction and speech distortion is a key concern in designing noise reduction algorithms. This study proposes a novel framework for noise reduction by considering this tradeoff. We regard...
Voice activity detection (VAD) is used to detect speech/non-speech periods in observed signals. However, the current VAD technique has a serious problem in that the accuracy of detection of speech periods drastically reduces if it is used for noisy speech and/or for mixtures of speech/non-speech such as those in music and environmental sounds. Thus, VAD needs to be robust to enable speech periods...
The tradeoff between noise reduction and speech distortion is a key concern in designing noise reduction algorithms. We have proposed a regularization framework for noise reduction with the consideration of the tradeoff problem. We regard speech estimation as a functional approximation problem in a reproducing kernel Hilbert space (RKHS). In the estimation, the objective function is formulated to...
Recent methods of speech enhancement have been proposed to suppress the effects of background noise and reverberation. The effect of background noise in these methods is regarded as additive and that of reverberation is convolutive. Therefore, methods of reducing noise and dereverberation have been applied separately in tandem. We previously unified the effects of noise and reverberation in the modulation...
Singing and speaking are important and natural ways for humans to communicate nonlinguistic and linguistic information. It seems the majority of ordinary people correctly perform and imitate all factors, such as pitches and melodies, in the same way as professional singers, while they can correctly vocalize all factors involved in speaking. There is no absolute answer as to...
There have recently been serious social issues involved in multimedia signal processing such as malicious attacks and tampering with digital audio/speech signals. Fragile speech watermarking is a technique that enables the detection of tampering with the original signals. We previously proposed an inaudible digital-audio watermarking approach based on cochlear delay. We investigated how the proposed...
Noise reduction is used to reduce the noise effect on speech, and is important for many real speech applications. However, noise reduction inevitably causes speech distortion. The trade-off between noise reduction and speech distortion is always a key concern in designing noise reduction algorithms. In this study, we took a new look at this problem, and regarded the speech estimation as a functional...