The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The dynamic signature is a biometric trait widely used and accepted for verifying a person’s identity. Current automatic signature-based biometric systems typically require five, ten, or even more specimens of a person’s signature to learn intrapersonal variability sufficient to provide an accurate verification of the individual’s identity. To mitigate this drawback, this paper proposes a procedure...
This paper proposes a hybrid speaker diarization system. The main body is a variational Bayes — hidden Markov model (VB-HMM) speaker diarization system. The VB-HMM speaker diarization system avoids making premature hard decision and takes advantages of soft speaker information in an iterative way. Thus, it outperforms most of mainstream speaker diarization systems. Unfortunately, this system is sensitive...
Face sketch synthesis plays an important role in both law enforcement and digital entertainment. The existing methods for sketch synthesis always suffer from noising and blurring effect. To resolve these problems, a nonsubsampled Shearlet transform (NSST) based detail enhancement strategy is proposed. The exemplar-based method is firstly adopted to synthesize the primary sketch, then the final sketch...
Building a human-computer interactive parachute simulator is an efficient way to avoid the high risk and high cost of field parachute training. In this paper, a novel dynamic recognition and simulation approach of parachute training is developed. Firstly we process the skeletal data acquired by Kinect and enforce the indication of the trainees' parachute posture, where principle component analysis...
In order to train neural networks (NN) for text-to-speech synthesis (TTS), phonetic segmentation must be performed. The most accurate segmentation is performed manually, but the process of creating manual alignments is costly and time-consuming, so automatic procedures are preferable. In this paper, a simple alignment method based on models trained during hidden Markov Model (HMM) based TTS system...
This paper provides a voice transformation model that uses pitch data and Feed-forward Neural Networks on Line Spectral Frequency. The aim of this work is to achieve the transformation of a speech signal produced by a source speaker by modifying voice individuality parameters such that it appears to be spoken by a chosen target speaker, without modifying the message contents. Most of the previous...
Rāga is a quintessential component of Indian classical music. Rāgas are primarily characterised by melodic time-frequency (T-F) motifs. There have been several efforts made to determine the identity of a rāga, yet the techniques work only on subset(s) of rāgas, or perform poorly in terms of scalability. In this paper, we propose a rāga identification method for Carnatic music using Locality Sensitive...
Speech uttered by the human beings contains the information about speakers, languages and contents. Language of uttered speech can easily be identified by extracting the language specific information from it. Identification of language of speech is known as Language Identification (LID). Identification of language from speech is helpful in its translation, speech recognition and speech activated automatic...
The trend for about twenty years, the research regarding the number of states in Hidden Markov Model (HMM) was mainly aimed at increasing it in order to ensure the robustness of the face recognition system. In this paper, a novel face recognition method is presented based on one state of discrete HMM, where it seemed impossible in the past. Contrary to other approaches that use the three parameters...
One of the difficulties in sung speech recognition is the small distance in an acoustic space between phonemes in sung speech. Therefore we considered clustering the speech based on a pitch (fundamental frequency F0) and creating a larger distance between the phonemes. In addition, we considered a two-stage training method of DNN-HMM: the first stage is trained by using conventional acoustic features...
Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector and relevance Maximum-a-Posteriori (MAP), have shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors from Deep Neural Network (DNN) instead of Gaussian Mixture Model (GMM)...
This paper proposes a long short-term memory recurrent neural network (LSTM-RNN) for extracting melody and simultaneously detecting regions of melody from polyphonic audio using the proposed harmonic sum loss. The previous state-of-the-art algorithms have not been based on machine learning techniques and certainly not on deep architectures. The harmonics structure in melody is incorporated in the...
Different modes of vibration of the vocal folds contribute significantly to the voice quality. The neutral mode phonation, often used in a modal voice, is one against which the other modes can be contrastively described, also called non-modal phonations. This paper investigates the impact of non-modal phonation on phonological posteriors, the probabilities of phonological features inferred from the...
Finding an effective way to represent human actions is yet an open problem because it usually requires taking evidences extracted from various temporal resolutions into account. A conventional way of representing an action employs temporally ordered fine-grained movements, e.g., key poses or subtle motions. Many existing approaches model actions by directly learning the transitional relationships...
Handwritten word recognition is a tough task, mixing image and natural language processing. Recently new recurrent neural networks with LSTM cells allowed significant improvements in this field. These networks are generally coupled with lexical and linguistic knowledge in order to correct character misrecognitions, namely using a lexicon driven decoding. Yet the high performances of LSTM networks...
In this paper, a study is conducted on combining analytical and holistic strategies for handwriting recognition. Even though the big majority of the recent high recognition rate systems adopts analytical strategies, physiological scientists suggest that the holistic strategy is the key for realizing near-human performance. In what we believe is a fresh perspective on handwriting recognition, combining...
State-of-the-art approaches on text-to-speech (TTS) synthesis like unit selection and HMM synthesis are data-driven. Therefore, they use a prerecorded speech corpus of natural speech to build a voice. This paper investigates the influence of the size of the speech corpus on five different perceptual quality dimensions. Six German unit selection voices were created based on subsets of different sizes...
For acoustic modeling, the use of DNN has become popular due to its superior performance improvements observed in many automatic speech recognition (ASR) tasks. Typically, DNNs with deep (many layers) and wide (many hidden units per layer) architectures are chosen in order to achieve good gains. An issue with such approaches is that there is an explosion in the number of learnable parameters. Thus,...
In this paper, we analyze the feasibility of using single well-resourced language - English - as a source language for multilingual techniques in context of Stacked Bottle-Neck tandem system. The effect of amount of data and number of tied-states in the source language on performance of ported system is evaluated together with different porting strategies. Generally, increasing data amount and level-of-detail...
Automatic speech recognition (ASR) of code-switching speech requires careful handling of unexpected language switches that may occur in a single utterance. In this paper, we investigate the feasibility of using multilingually trained deep neural networks (DNN) for the ASR of Frisian speech containing code-switches to Dutch with the aim of building a robust recognizer that can handle this phenomenon...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.