This paper analyzes the optimal noise-suppression parameters for detecting snoring activity and improves detection performance. For detection, we use a Support Vector Machine (SVM), a machine learning method. The SVM model is obtained by training on ground truth and features. When test data is applied to the SVM model, it is classified into snoring...
Speaker identity, the sound of a person's voice, is one of the most important characteristics in human communication. Voice conversion (VC) is an emergent problem in voice and speech processing that deals with the process of modifying a speaker's identity. More particularly, the speech signal spoken by the source speaker is modified to sound as if it had been pronounced by another speaker, referred...
Ultrasonic Non-Destructive Testing (NDT) and imaging systems have been widely used for industrial and medical applications. In an NDT system, detection and characterization of target signals can be extremely challenging because of the complex echo scattering environment and the system noise. In this paper, an algorithm based on Neural Network (NN) is presented to explore the possible solutions for ultrasonic...
This paper deals with the problem of classification of vehicles based on their acoustic signatures. Each type of vehicle transmits a particular type of engine sound, which can be used as a basis of classification. The samples are first collected using a reliable recording device. The signals so obtained are de-noised using wavelet analysis. The frames to be analyzed are selected using a unique energy...
For an asynchronous system based on brain-computer interface (BCI), detecting the occurrence of motor imagery from electroencephalogram (EEG) signals is the basis but also a challenge, due to the complex and non-stationary characteristics of EEG signals. This paper employs a filtering method which uses a target-guided sub-band filter combined with an energy detector for asynchronous motor imagery...
Novelty detection is the task of recognising events that differ from a model of normality. This paper proposes an acoustic novelty detector based on neural networks trained with an adversarial training strategy. The proposed approach is composed of a feature extraction stage that calculates Log-Mel spectral features from the input signal. Then, an autoencoder network, trained on a corpus of “normal”...
Recently, bottleneck features as effective representations have been successfully used in Speaker Recognition (SR) and Language Recognition (LR), but little work has focused on bottleneck features for Bird Species Verification (BSV). In SR, LR and BSV tasks, using short-time spectral features may be insufficient, so more abstract and discriminative representations are needed as a complement to...
Urban environments are characterised by the presence of distinctive audio signals which alert the drivers to events that require prompt action. The detection and interpretation of these signals would be highly beneficial for smart vehicle systems, as it would provide them with complementary information to navigate safely in the environment. In this paper, we present a framework that spots the presence...
In this paper, an efficiency comparison of Support Vector Machine (SVM) and Binary Support Vector Machine (BSVM) techniques in utterance-based emotion recognition is studied. Acoustic features including energy, Mel-frequency cepstral coefficients (MFCC), Perceptual linear predictive (PLP), Filter bank (FBANK), pitch, and their first and second derivatives are used as frame-based features. Four basic emotions...
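The first and second derivatives of frame-based features mentioned above are often computed as regression-based "delta" and "delta-delta" coefficients. The abstract does not state which formula the authors use, so the following is only a minimal sketch of the common regression form, with edge frames replicated. The function names `delta` and `add_deltas` and the window half-width `n` are illustrative assumptions, not taken from the paper.

```python
from typing import List


def delta(frames: List[List[float]], n: int = 2) -> List[List[float]]:
    """Regression-based delta of a frame sequence.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2)
    Frames beyond either end are replaced by the nearest edge frame.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    dim = len(frames[0])

    def frame_at(t: int) -> List[float]:
        # Clamp the index so that edge frames are replicated.
        return frames[min(max(t, 0), num_frames - 1)]

    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = sum(
                k * (frame_at(t + k)[d] - frame_at(t - k)[d])
                for k in range(1, n + 1)
            )
            row.append(acc / denom)
        out.append(row)
    return out


def add_deltas(frames: List[List[float]], n: int = 2) -> List[List[float]]:
    """Append first and second derivatives to each static frame."""
    d1 = delta(frames, n)
    d2 = delta(d1, n)  # second derivative = delta of the delta
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```

Applied to a sequence of, say, 13-dimensional MFCC frames, `add_deltas` would return 39-dimensional frames (static, first derivative, second derivative), which is a common layout for frame-based acoustic features.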
In a real-life scenario, the acoustic characteristics of speech often suffer from the variations induced by diverse environmental noises and different speakers. To overcome the speaker-related speech variation problem for Automatic Speech Recognition (ASR), many speaker adaptation techniques have been proposed and studied. Almost all of these studies, however, only considered the speakers' long-term...
This paper presents a novel application of convolutional neural networks (CNNs) for the task of acoustic scene classification (ASC). We here propose the use of a CNN trained to classify short sequences of audio, represented by their log-mel spectrogram. We also introduce a training method that can be used under particular circumstances in order to make full use of small datasets. The proposed system...
The influence of the phoneme set on Lithuanian speech command recognition accuracy is investigated. Four phoneme sets are discussed. The LIEPA speech corpus is used for training the acoustic model. The phonetic representation of corpus transcriptions is generated by grapheme-to-phoneme transformation rules. Rule-based transformations for the Lithuanian language are proposed. Recognition engine with CMU Pocketsphinx...
We present a novel approach for the quantization of large speech databases. It uses an unsupervised iterative process to regulate a similarity measure to set the number of clusters and their boundaries, thus overcoming the shortcomings of conventional clustering algorithms such as k-Means and Fuzzy C-Means, which require a priori knowledge of the number of clusters and a similarity measure that follows the...
This paper investigates the use of a Dirichlet process hidden Markov model (DPHMM) tokenizer for the template-matching-based query-by-example spoken term detection (QbE-STD) task. The DPHMM can be obtained following an unsupervised iterative procedure without any training transcriptions. The STD performance of the DPHMM tokenizer is evaluated on the TIMIT corpus. We construct three kinds of DPHMM based QbE-STD...
Automatic frog species recognition based on acoustic signal has received attention among biologists for environmental studies as it can detect, localize and document the declining population of frog species efficiently compared to the manual survey. In this study, we investigate the possibility of the use of Deep Neural Network (DNN) as a classifier for a frog species recognition system. The Mel-Frequency...
This paper presents our work on developing acoustic models using deep neural networks (DNN) for low-resource languages. This is considered one of the challenging problems in automatic speech recognition (ASR) as DNNs need large amounts of data for building efficient models. The techniques explored in this approach use a common idea of transferring knowledge from models of high resource language to...
Mandarin and the Tibetan Lhasa dialect are chosen as the research objects. Phone sets and the corresponding Latin transformation schemes for Mandarin and the Tibetan Lhasa dialect are established respectively. The KL distance between two GMMs is studied. GMM-HMM models for the phones of the two languages are trained on the basis of corpora and pronunciation dictionaries. Phones of Mandarin and Tibetan Lhasa dialect are...
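The KL divergence between two Gaussian mixture models has no closed form, so it is usually approximated. The abstract does not say which approximation the authors study; the sketch below uses a plain Monte Carlo estimate for one-dimensional GMMs, purely as an illustration. The `(weight, mean, std)` representation and the names `gmm_logpdf`, `gmm_sample`, and `kl_monte_carlo` are assumptions made for this example.

```python
import math
import random
from typing import List, Tuple

# One mixture component: (weight, mean, standard deviation).
# Weights are assumed positive and summing to 1; std is assumed positive.
Component = Tuple[float, float, float]


def gmm_logpdf(x: float, gmm: List[Component]) -> float:
    """Log density of a 1-D GMM at x, computed with log-sum-exp."""
    terms = [
        math.log(w) - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        - 0.5 * ((x - m) / s) ** 2
        for w, m, s in gmm
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def gmm_sample(gmm: List[Component], rng: random.Random) -> float:
    """Draw one sample: pick a component by weight, then sample from it."""
    weights = [w for w, _, _ in gmm]
    _, mean, std = rng.choices(gmm, weights=weights, k=1)[0]
    return rng.gauss(mean, std)


def kl_monte_carlo(p: List[Component], q: List[Component],
                   num_samples: int = 20000, seed: int = 0) -> float:
    """Estimate KL(p || q) as the sample mean of log p(x) - log q(x), x ~ p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = gmm_sample(p, rng)
        total += gmm_logpdf(x, p) - gmm_logpdf(x, q)
    return total / num_samples
```

Because KL divergence is not symmetric, a symmetric "distance" between two phone models is sometimes formed as `kl_monte_carlo(p, q) + kl_monte_carlo(q, p)`; whether the paper does this is not stated in the abstract.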
In this paper, we investigate various training methods for building deep neural network (DNN) based acoustic models for dysarthric speech data. Methods like multitask learning, knowledge distillation and model adaptation, which overcome data sparsity and model over-fitting problems, are employed to study the merits of each method. In the knowledge distillation framework, some privileged information in addition...
Automatic speech recognition can be used to evaluate the accuracy of read speech and thus serve a valuable role in literacy development by providing the needed feedback on reading skills in the absence of qualified teachers. Given the known limitations of ASR in the face of insufficient task-specific training data, the selection of acoustic and language modeling strategies can play a crucial role...
There are several challenges in building Automatic Speech Recognition (ASR) systems for low-resource languages such as Indic languages. One problem is the access to large amounts of training data required to build Acoustic Models (AM) from scratch. In the context of Indian English, another challenge encountered is code-mixing as many Indian speakers are multilingual and exhibit code-mixing in their...