Calibrating microphones that have different characteristics is very difficult. This paper presents a feature extraction method as an alternative. The method provides acoustic features that are highly robust against various characteristic transfer functions. The proposed method applies Local Binary Patterns (LBP) and Compressive Sensing (CS), which compare spectral details with spectral...
Independent Vector Analysis is a powerful tool for estimating the broadband acoustic transfer function between multiple sources and the microphones in the frequency domain. In this work, we consider an extended IVA model which adopts the concept of pilot dependent signals. Without imposing any constraint on the de-mixing system, pilot signals depending on the target source are injected into the model...
Sirens and alarms play an important role in everyday life since they warn people of hazardous situations, even when these are out of sight. Automatic detection of this class of sounds can help hearing-impaired or distracted people, e.g., on the road, and contribute to their independence and safety. In this paper, we present a technique for the detection of alarm sounds in noisy environments. The technique...
This paper presents an automatic system for detecting bird species in field recordings. A sinusoidal detection algorithm is employed to segment the acoustic scene into isolated spectro-temporal segments. Each segment is represented as a temporal sequence of frequencies of the detected sinusoid, referred to as a frequency track. Each bird species is represented by a set of hidden Markov models (HMMs),...
In this work, we investigate the hardware implementation of Support Vector Machine (SVM) prediction on an FPGA platform for industrial ultrasound applications. Specifically, the SVM is used as a classifier to label ultrasonic A-scan signals as containing a flaw or not. Hardware acceleration using an FPGA is the main theme of the presented work. The architecture used to implement the...
We propose a method for optimizing an acoustic feature extractor for anomalous sound detection (ASD). Most ASD systems adopt outlier-detection techniques because it is difficult to collect a massive amount of anomalous sound data. To improve the performance of such outlier-detection-based ASD, it is essential to extract a set of efficient acoustic features that is suitable for identifying anomalous...
This paper presents and compares two algorithms based on artificial neural networks (ANNs) for sound event detection in real-life audio. Both systems have been developed and evaluated with the material provided for the third task of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 challenge. For the first algorithm, we make use of an ANN trained on different features extracted...
This paper describes the development of a Multilingual Phone Recognition System (MPRS) in the context of Indian languages. An MPRS is a language-independent Phone Recognition System (PRS) that can recognise the phonetic units present in a speech utterance of any language. We have developed two bilingual PRSs and one quadrilingual PRS using four Indian languages: Kannada, Telugu, Bengali, and Odia. International...
Fully automated defect detection and classification of automobile components is crucial for solving quality and efficiency problems for automotive manufacturers, given rising wages, production costs, and warranty claims. However, metrological deviations in form remain an unsolved problem for state-of-the-art techniques, especially for forged or cast components with complex geometry...
In this paper, we investigate the performance of parallel deep neural network training with parameter averaging for acoustic modeling in Kaldi, a popular automatic speech recognition toolkit. We describe experiments based on training a recurrent neural network with 4 layers of 800 LSTM hidden states on a 100-hour corpus of annotated Polish speech data. We propose an MPI-based modification of the training...
Long Short-Term Memory (LSTM) is the primary recurrent neural network architecture for acoustic modeling in automatic speech recognition systems. Residual learning is an efficient method that helps neural networks converge more easily and quickly. In this paper, we propose several types of residual LSTM methods for our acoustic modeling. Our experiments indicate that, compared with classic LSTM, our architecture...
Multimodal sentiment analysis is drawing an increasing amount of attention these days. It enables mining of opinions in video reviews which are now available aplenty on online platforms. However, multimodal sentiment analysis has only a few high-quality data sets annotated for training machine learning algorithms. These limited resources restrict the generalizability of models, where, for example,...
This paper addresses acoustic event detection (AED), with the aim of improving the detection accuracy of acoustic events. The AED task is performed by a regression via classification (RvC) based approach together with the random forest technique. A discretization process is used to convert the continuous frame positions within acoustic events into event duration class labels. Outputs of the category-specific...
This paper addresses acoustic event detection (AED) based on random forest regression, combining acoustic features with bottleneck (BN) features. Bottleneck features have a reputation for being inherently discriminative in acoustic signal processing. To deal with unstructured and complex real-world acoustic events, an acoustic event detection system is constructed using bottleneck features...
Reliable visual features that encode the articulator movements of speakers can dramatically improve the decoding accuracy of automatic speech recognition systems when combined with the corresponding acoustic signals. In this paper, a novel framework is proposed to utilize audio-visual speech not only during decoding but also for training better acoustic models. In this framework, a multi-stream hidden...
This paper presents the main recent improvements to the large-vocabulary continuous speech recognition (LVCSR) system for the Romanian language developed by the Speech and Dialogue (SpeeD) research laboratory. While the most important improvement is the use of DNN-based acoustic models instead of the classic HMM-GMM approach, several other aspects are discussed in the paper: a significant...
This paper presents the work done towards developing a Romanian speech corpus for automatic speech recognition in the banking domain. This work is done in the context of the Speech2Process project, which aims to create a system that makes interaction between customers and agents in the contact center much easier. The application that uses the banking corpus will provide automatic response to...
Vehicle counting, time-of-travel analysis, and other traffic studies frequently require the classification and identification of vehicles on a roadway. Unfortunately, many current technologies for identifying vehicles, such as image-based methods that use cameras and machine vision, are not appropriate for studies that require low power consumption and low cost. Additionally, privacy issues are becoming...
An often underestimated challenge for lecturers is the considerate use of their voice in teaching auditoriums. Even experienced lecturers are challenged when speaking in front of large classes or in new surroundings for the first time. Universities therefore often offer special voice training sessions in which a professional voice coach trains lecturers to use their voice correctly. Those training sessions,...
In this paper, the practical issues of developing an automotive surface identification system are considered. The novelty of this work lies in combining different training algorithms, neural network structures, and methods to increase classification accuracy and avoid overfitting on real-world data. The obtained results demonstrate that the use of the proposed system architecture and statistical...