In the presence of environmental noise, speaker verification systems inevitably see a decrease in performance. This paper proposes (1) the use of two parallel classifiers, (2) feature enhancement based on blind signal-to-noise ratio (SNR) estimation, and (3) fusion to improve the performance of speaker verification systems. The two classifiers are based on Gaussian mixture models and the partial least-squares...
The paper proposes using only mostly voiced speech (MVS) for speaker verification (SV). The speech is partitioned into an MVS part and a non-MVS part by a simple machine classification. SV experiments were conducted with a standard Gaussian mixture model (GMM) with universal background model (UBM) system and a GMM with computationally improved individual background model (IBM) system. They demonstrate...
In this paper, an improved nonnegative matrix factorization (NMF) algorithm is proposed for single-channel blind source separation and applied to speech enhancement. By adding a temporal correlation term to the objective function to constrain the time-varying gain coefficients of the noise, the algorithm achieves better speech enhancement. We propose an efficient algorithm to optimize the objective function with...
The kernel function plays an important role in classification with support vector machines (SVM). A single SVM kernel function cannot achieve optimal learning ability and optimal generalization ability at the same time in recognition and classification. To solve this problem, we present a new combined kernel function by analyzing and comparing the characteristics of various kernel functions...
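As a rough illustration of what a combined kernel can look like, the sketch below mixes a local kernel (RBF) with a global one (polynomial) as a convex combination. The abstract is truncated and does not say which kernels or weights the authors use, so the choice of kernels, the function names, and the parameters `gamma`, `degree`, `coef0` and `weight` are all assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: strong local fitting ability."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: smoother, more global behaviour."""
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + coef0) ** degree


def combined_kernel(x, y, weight=0.7):
    """Convex combination of the two kernels (hypothetical weighting).

    A non-negative weighted sum of valid kernels is itself a valid kernel,
    so the result can be plugged into any kernel-based SVM solver.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, y) + (1.0 - weight) * poly_kernel(x, y)
```

With `weight=1.0` the function reduces to the pure RBF kernel and with `weight=0.0` to the pure polynomial kernel, so the weight can be tuned by cross-validation like any other hyperparameter.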
For the i-vector model, probabilistic linear discriminant analysis (PLDA) is the normalization approach and performs strongly in speaker verification. However, it requires a large amount of development data, which is costly in many cases. Unsupervised adaptation is a possible approach, which uses unlabeled data to adapt the PLDA scatter matrices to the target domain. In this paper, a ‘local training’ approach...
As THz and millimetre wave technologies are developed further for a range of applications, photonics is one of the key technologies for their development. We will discuss recent advances in photonic technologies for THz and millimetre wave. In particular we will look at integration technologies and their potential for reduced footprint and lower power consumption. We will also look...
Recurrent neural networks and their variants have achieved great success in many difficult tasks, such as handwriting recognition and generation, natural language processing, and acoustic modeling of speech. As one type of recurrent neural network architecture, the long short-term memory (LSTM) has attracted great attention. Most research works focus on its structures, training algorithms and...
One service provided by our application ‘Speech Assistant System’, which assists in teaching the hearing impaired to speak, is the automatic assessment of words and sentences during practice, with feedback to the person. Individual speech sounds can only be correctly evaluated if they are compared with the appropriate reference speech sounds. This requires segmenting the speech to be examined...
This paper proposes a rehabilitation treatment coach robot that helps patients do their rehabilitation exercises at home without a professional trainer. The coach robot is designed to be cheap enough for patients to afford. The robot suggests the rehabilitation program and corrects the posture of the patients during the exercise. A deep neural network is used for posture correction...
In this paper, the efficiency of Support Vector Machine (SVM) and Binary Support Vector Machine (BSVM) techniques in utterance-based emotion recognition is compared. Acoustic features including energy, Mel-frequency cepstral coefficients (MFCC), perceptual linear predictive (PLP), filter bank (FBANK), pitch, and their first and second derivatives are used as frame-based features. Four basic emotions...
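The first and second derivatives of a frame-based feature are often computed with a regression over neighbouring frames. The abstract does not state which formula the authors use, so the sketch below assumes the common regression-based delta with edge frames repeated at the boundaries; the function name `delta` and the `width` parameter are illustrative.

```python
def delta(frames, width=2):
    """Regression-based time derivative of one feature track.

    frames: list of per-frame values of a single feature (e.g. energy).
    width:  number of frames on each side used in the regression (assumed).
    Frames beyond the ends are replaced by the first or last frame.
    """
    denom = 2 * sum(n * n for n in range(1, width + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for n in range(1, width + 1):
            ahead = frames[min(t + n, last)]
            behind = frames[max(t - n, 0)]
            acc += n * (ahead - behind)
        out.append(acc / denom)
    return out


# First derivative (delta) and second derivative (delta-delta) of a toy track.
energy = [0.0, 1.0, 2.0, 3.0, 4.0]
d1 = delta(energy)
d2 = delta(d1)
```

For a multi-dimensional feature such as MFCC, the same function would be applied to each coefficient track separately and the results appended to the static features.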
This paper presents the results of language clustering in the i-vector space, a method to determine in an unsupervised manner how many languages are in a data set and which recordings contain the same language. The densest i-vector clusters are found using the DBSCAN algorithm in a low-dimensional space obtained by the t-SNE method. Quality of clustering for spherical k-means and the proposed...
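To show how density-based clustering can reveal the number of groups without specifying it in advance, here is a minimal pure-Python sketch of DBSCAN. The t-SNE projection is not reproduced; the 2-D points are hypothetical stand-ins for projected i-vectors, and `eps` and `min_pts` are illustrative values, not the paper's settings.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep expanding
        cluster += 1
    return labels


# Hypothetical 2-D embeddings: two dense groups and one isolated recording.
embedded = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(embedded, eps=1.5, min_pts=3)
n_clusters = len({lab for lab in labels if lab != NOISE})
```

In this toy input the number of clusters found plays the role of the estimated number of languages, and points labelled `-1` are recordings that fall in no dense region.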
In a real-life scenario, the acoustic characteristics of speech often suffer from the variations induced by diverse environmental noises and different speakers. To overcome the speaker-related speech variation problem for Automatic Speech Recognition (ASR), many speaker adaptation techniques have been proposed and studied. Almost all of these studies, however, only considered the speakers' long-term...
Speech is dynamic in nature and organized in a complex time-and-frequency structure that makes it very hard to solve the issue of automatic speech recognition (ASR) for diverse speaker conditions. The hard-computing approach to solving this issue (i.e., conventional computing based on precisely-stated, analytical, mathematics-inspired models) pushed processing limits because it is highly computationally...
Biometric security systems based on predefined speech sentences are extremely common nowadays, particularly in low-cost applications where the simplicity of the hardware involved is a great advantage. Audio spoofing verification is the problem of detecting whether a speech segment acquired from such a system is genuine, or whether it was synthesized or modified by a computer in order to make it sound...
Good speaker recognition systems should identify the speaker irrespective of what is spoken, including non-speech sounds that are often produced during natural conversations. In this work, the inclusion of breath sounds in the training phase of speaker recognition is analyzed using the popular Gaussian mixture model-universal background model (GMM-UBM) and deep neural network (DNN) based systems...
Communication through voice is one of the main components of affective computing in human-computer interaction. In this type of interaction, properly comprehending the meanings of the words or the linguistic category and recognizing the emotion included in the speech is essential for enhancing the performance. In order to model the emotional state, the speech waves are utilized, which bear signals...
Neural networks can be used to identify and remove noise from a noisy speech spectrum (denoising autoencoders, DAEs). The DAEs are typically implemented using the fully-connected feed-forward topology. Usually one of the following possibilities is used as the DAE target: 1) Ideal frequency ratio mask, which is applied to the noisy spectrum to estimate the clean speech spectrum (masking) or 2) Clean speech spectrum...
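The masking target can be illustrated with a small numeric sketch. It assumes one common definition of a ratio mask, the square root of the clean-to-total energy ratio per frequency bin; the abstract does not give the exact formula, so this definition and the function names are assumptions.

```python
import math


def ideal_ratio_mask(clean_mag, noise_mag):
    """Per-bin mask in [0, 1] from clean and noise magnitude spectra.

    Assumed definition: sqrt(S^2 / (S^2 + N^2)). Bins with no energy at all
    get a mask of 0.
    """
    mask = []
    for s, n in zip(clean_mag, noise_mag):
        total = s * s + n * n
        mask.append(math.sqrt(s * s / total) if total > 0 else 0.0)
    return mask


def apply_mask(noisy_mag, mask):
    """Estimate the clean magnitude spectrum by scaling each noisy bin."""
    return [m * y for m, y in zip(mask, noisy_mag)]


# Toy two-bin frame: bin 0 holds speech plus noise, bin 1 holds noise only.
mask = ideal_ratio_mask([3.0, 0.0], [4.0, 1.0])
estimate = apply_mask([5.0, 1.0], mask)
```

During training the mask is computed from known clean and noise signals and used as the network target; at test time the network predicts the mask from the noisy spectrum alone, and only `apply_mask` is needed.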
Augmentative and Alternative Communication (AAC) apps are apps that enable non-speech communicative forms. One class of AAC apps are speech-generating devices (SGDs), where icons/pictures are tapped to produce spoken words. These apps are widely used to support communication and language learning for individuals with disabilities such as autism spectrum disorder (ASD). Given that these apps are used...
Monaural speech enhancement is a key yet challenging problem in the speech area, and is commonly used as a pre-processing step for robust speech processing. Deep learning has proved to be very successful for solving this issue. In this paper, a new approach for enhancing noisy speech in a single-channel recording is presented. We propose a modified ideal ratio mask (IRM) which is calculated by normalized...
Speech signals are usually degraded by room reverberation and additive noise in real environments. This paper focuses on separating the target speech signal in reverberant conditions from binaural inputs. Binaural separation is formulated as a supervised learning problem, and we employ deep learning to map from both spatial and spectral features to a training target. With binaural inputs, we first apply...