In order to train neural networks (NN) for text-to-speech synthesis (TTS), phonetic segmentation must be performed. The most accurate segmentation is performed manually, but the process of creating manual alignments is costly and time-consuming, so automatic procedures are preferable. In this paper, a simple alignment method based on models trained during hidden Markov Model (HMM) based TTS system...
This work presents an embedded hardware architecture for real-time ultrasonic NDE applications that incorporate Hidden Markov Model (HMM) based statistical signal methods. HMMs have been used successfully in applications such as audio segment retrieval, speech/language recognition and image processing. Recently, we proposed a new Hidden Markov Model (HMM) based ultrasonic flaw detection algorithm...
This work presents an embedded hardware architecture for real-time ultrasonic NDE applications that incorporate Hidden Markov Model (HMM) based statistical signal methods. The proposed algorithm combines the Discrete Wavelet Transform (DWT) for pre-processing A-scan signals with an HMM for classifying the presence of flaws. For this study, a MicroZed FPGA with Xilinx Zynq-7020 System-on-Chip (SoC)...
This paper introduces the use of representations based on nonnegative matrix factorization (NMF) to train deep neural networks with applications to environmental sound classification. Deep learning systems for sound classification usually rely on the network to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce an NMF-based feature learning stage before...
This paper constructs speech features based on a generative model using a deep latent Gaussian model (DLGM), which is trained using the stochastic gradient variational Bayes (SGVB) algorithm and performs efficient approximate inference and learning with a directed probabilistic graphical model. The trained DLGM then generates latent variables based on a Gaussian distribution, which are used as new features...
Calibrating microphones that have different characteristics is very difficult. This paper presents a feature extraction method as an alternative. The method provides acoustic features that are strongly robust against various characteristic transfer functions. The proposed method applies Local Binary Patterns (LBP) and Compressive Sensing (CS) which compare spectral details with spectral...
Independent Vector Analysis is a powerful tool for estimating the broadband acoustic transfer function between multiple sources and the microphones in the frequency domain. In this work, we consider an extended IVA model which adopts the concept of pilot dependent signals. Without imposing any constraint on the de-mixing system, pilot signals depending on the target source are injected into the model...
Sirens and alarms play an important role in everyday life since they warn people of hazardous situations, even when these are out of sight. Automatic detection of this class of sounds can help hearing-impaired or distracted people, e.g., on the road, and contribute to their independence and safety. In this paper, we present a technique for the detection of alarm sounds in noisy environments. The technique...
This paper presents an automatic system for detection of bird species in field recordings. A sinusoidal detection algorithm is employed to segment the acoustic scene into isolated spectro-temporal segments. Each segment is represented as a temporal sequence of frequencies of the detected sinusoid, referred to as a frequency track. Each bird species is represented by a set of hidden Markov models (HMMs),...
In this work, we investigate the hardware implementation of Support Vector Machine (SVM) prediction on an FPGA platform for industrial ultrasound applications. Specifically, SVM is used as a classifier for identifying ultrasonic A-scan signals as signals with flaw or signals without flaw. Hardware acceleration using FPGA is the main theme of the presented work. The architecture used to implement the...
We propose a method for optimizing an acoustic feature extractor for anomalous sound detection (ASD). Most ASD systems adopt outlier-detection techniques because it is difficult to collect a massive amount of anomalous sound data. To improve the performance of such outlier-detection-based ASD, it is essential to extract a set of efficient acoustic features that is suitable for identifying anomalous...
This paper presents and compares two algorithms based on artificial neural networks (ANNs) for sound event detection in real-life audio. Both systems have been developed and evaluated with the material provided for the third task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. For the first algorithm, we make use of an ANN trained on different features extracted...
In this paper, the development of a Multilingual Phone Recognition System (MPRS) in the context of Indian languages is described. MPRS is a language-independent Phone Recognition System (PRS) that can recognise the phonetic units present in a speech utterance of any language. We have developed two bilingual PRSs and one quadrilingual PRS using four Indian languages: Kannada, Telugu, Bengali, and Odia. International...
Fully automated defect detection and classification of automobile components are crucial for solving quality and efficiency problems for automotive manufacturers, due to rising wages, production costs and warranty claims. However, metrological deviations in form still represent unsolved problems using state-of-the-art techniques, especially for forged or cast components with complex geometry...
In this paper we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training...
Long Short-Term Memory (LSTM) is the primary recurrent neural network architecture for acoustic modeling in automatic speech recognition systems. Residual learning is an efficient method to help neural networks converge more easily and quickly. In this paper, we propose several types of residual LSTM methods for our acoustic modeling. Our experiments indicate that, compared with classic LSTM, our architecture...
Multimodal sentiment analysis is drawing an increasing amount of attention these days. It enables mining of opinions in video reviews which are now available aplenty on online platforms. However, multimodal sentiment analysis has only a few high-quality data sets annotated for training machine learning algorithms. These limited resources restrict the generalizability of models, where, for example,...
This paper deals with acoustic event detection (AED), with the aim of improving the detection accuracy of acoustic events. The AED task is performed by a regression via classification (RvC) based approach along with the random forest technique. A discretization process is used to convert the continuous frame positions within acoustic events into event duration class labels. Outputs of the category-specific...
This paper deals with random forest regression based acoustic event detection (AED) by combining acoustic features with bottleneck features (BN). The bottleneck features have a reputation for being inherently discriminative in acoustic signal processing. To deal with the unstructured and complex real-world acoustic events, an acoustic event detection system is constructed using bottleneck features...
Reliable visual features that encode the articulator movements of speakers can dramatically improve the decoding accuracy of automatic speech recognition systems when combined with the corresponding acoustic signals. In this paper, a novel framework is proposed to utilize audio-visual speech not only during decoding but also for training better acoustic models. In this framework, a multi-stream hidden...