A voice activity detector (VAD) is used to detect the presence or absence of human voice in a signal. A robust VAD algorithm is essential to distinguish human voice in a noisy acoustic signal. Many recent works on robust VAD focus on unsupervised feature extraction, such as temporal variation and signal-to-noise ratio [1]. However, these methods are typically...
A real-time speaker identification (SI) system is an application of biometric systems in which voice samples are collected in real time. Noise contamination of the speaker samples is therefore to be expected. In this work, we tried to increase the accuracy of a real-time SI system. We analysed the SI system using different feature extraction methods with a GMM-ML classifier. We found that MFCC...
Gender classification is a signal processing task that comprises feature extraction and behavioural gender modelling. Fundamental frequency and pitch are the features most often used for gender detection because of their distinctive characteristics in the voice source. In this study, a robust gender classification method based on Gammatone Frequency Cepstral Coefficients (GFCC) is presented. This study...
Speaker identification is a biometric technique for determining an unknown speaker's identity among a number of speakers using distinguishing latent information in the uttered speech. Crime investigation, security control, telephone banking and trading, and information reservation are some applications of this technique. Frequency Domain Linear Prediction (FDLP) is a time-frequency-based feature that has been...
Fundamental frequency (F0) estimation plays an important role in speech processing tasks such as speech coding, synthesis, and recognition. Although present F0 estimation methods perform well under clean conditions, their performance deteriorates significantly in noisy environments. For this reason, F0 estimation that is robust against additive noise is needed. We have previously proposed F0 estimation methods...
Interferences, especially competing speakers, significantly influence speech intelligibility. In this paper, we propose an interference suppression method using only two closely-spaced microphones. Firstly, a 1st-order hypercardioid differential microphone array (DMA) with white noise gain (WNG) constraint is designed in the STFT domain, which solves the amplification problem of DMA on incoherent...
Many power-based features have been proposed in previous studies as alternatives to the conventional feature, i.e. MFCC, for speech recognition. These features are of interest because they have been shown empirically to be more robust than MFCC in noisy environments. Some studies argue that the compression of power functions, which is less sensitive than the log for low-energy spectra, is one of the reasons...
A gesture is performed not as a single action but as a combination of continuous actions. Knowing where a gesture starts and ends is very important for accurate gesture recognition. In this paper, to extract the meaningful portion of a gesture in an online setting, we introduce a method that can distinguish the start and end of a gesture. We then describe the method for recognizing the extracted gesture.
Electronic health records contain important patient information that can serve as input for retrospective analysis in the diagnosis, monitoring, and treatment of a disease. This information is recorded in narrative form, which makes it difficult to identify medical events (such as medical appointments, medication prescriptions, treatments, procedures...
Sub-band speech processing is well known in robust speech recognition. In recent years, deep neural networks (DNNs) have also been widely used in speech recognition for acoustic modeling as well as for feature extraction and transformation. In this paper, we propose using a deep belief network (DBN) as a post-processing method for de-noising at the Mel sub-band level, where we enhance logarithm...
Robust phonetic segmentation is extremely important for several speech processing tasks such as phone level articulation analysis and error detection, speech synthesis, and annotation. In this paper, we present an unsupervised phonetic segmentation approach and its application to noisy and clipped speech such as mobile phone recordings. We propose a multi-taper-based Perceptual Linear Prediction (PLP)...
Speech therapy is essential to help children with speech sound disorders. While some computer tools for speech therapy have been proposed, most focus on articulation disorders. Voice quality is another important aspect of speech therapy, but little research has addressed it. As a contribution to fill this gap, we propose a robust scoring model for voice exercises often used in speech...
To further improve the robustness and discrimination of perceptual hashing, as well as retrieval speed on large-scale data, a novel retrieval algorithm over encrypted speech is proposed. Before encrypted speech is uploaded, perceptual hashing sequences must be embedded as a digital watermark. In the process of generating the perceptual hash, the multifractal characteristic of speech that has good distinctiveness...
Diagnosis and monitoring of Parkinson's disease present a number of challenges, as there is no definitive biomarker despite the broad range of symptoms. Research is ongoing to produce objective measures that can either diagnose Parkinson's or act as an objective decision support tool. Recent research on speech-based measures has demonstrated promising results. This study aims to investigate the characteristics...
High-density electrocorticography (ECoG) arrays are promising interfaces for high-resolution neural recording from the cortical surface. Commercial options for high-density arrays are limited, and historically tradeoffs must be made between spatial coverage and electrode density. However, thin-film technology is a promising alternative for generating electrode arrays capable of large area coverage...
Speaker recognition plays an important role in speech processing and classification. In this paper, we propose a feature extraction method using the Perception Auditory Factor to improve the performance of speaker recognition in noisy environments. After speech enhancement based on auditory perception characteristics and two-dimensional enhancement of the spectrogram, speech distribution is obtained from...
Speech enhancement has grown in importance with the development of communication systems; enhancement means improving the value or quality of something. This paper explains speech enhancement based on the harmonic noise model (HNM) and the MMSE algorithm. The approach has several advantages, such as high-quality speech synthesis and speech coding through flexible and effective decomposition of speech. So we use this method for speech...
This paper investigates the viability of a screening application that could be used to identify individuals with dysarthria from among a larger population using sentence-level speech data. This task presents a number of challenges, particularly if we aim to identify the disorder in the earlier stages, before the more significant symptoms have begun to manifest themselves. A principal challenge in this...
To cope with multi-source localization in a near-field reverberant environment, an approximated kernel density estimator (KDE) algorithm is introduced to provide robust anti-reverberation performance, and a multi-stage (MS) approach is used to resolve the spectrum aliasing at high frequencies caused by the wide spacing of the microphone array. A spatial likelihood function (SLF) is then built to mix the pairwise...
This paper presents an approach to audio parameterization using properties of the peaks detected in the amplitude envelope. The proposed solution is based on the observation that abrupt changes in the signal envelope are connected with the type of audio signal. For this purpose, we used the density properties of the peaks to calculate the feature vectors. The extraction process exploits an amplitude envelope estimation...