This paper presents a 3D emotional facial animation synthesis approach based on Factored Conditional Restricted Boltzmann Machines (FCRBM). Facial Action Parameters (FAPs) extracted from 2D face image sequences are used to train the FCRBM model parameters. Based on the trained model, given an emotion label sequence and several initial frames of FAPs, the corresponding FAP sequence is generated...
To develop speaker adaptation algorithms for deep neural networks (DNNs) that are suitable for large-scale online deployment, it is desirable that the adaptation model be represented in a compact form and learned in an unsupervised fashion. In this paper, we propose a novel low-footprint adaptation technique that adapts the DNN model through its node activation functions. The approach introduces...
Gaussian mixture (GMM)-HMMs, though the predominant modeling technique for speech recognition, are often criticized as inaccurate for modeling heterogeneous data sources. In this work, we propose the stranded Gaussian mixture (SGMM)-HMM, an extension of the GMM-HMM, to explicitly model the dependence among the mixture components, i.e., each mixture component is assumed to depend on the previous...
In this paper, we present a general algorithmic framework based on WFSTs for implementing a variety of discriminative training methods, such as MMI, MCE, and MPE/MWE. In contrast to ordinary word lattices, the transducer-based lattices are more amenable to representing and manipulating the underlying hypothesis space and have a finer granularity at the HMM-state level. The transducers are processed...
This paper proposes a dynamic Bayesian network (DBN) based, MPEG-4 compliant 3D facial animation synthesis method driven by the (Evaluation, Activation) values in the continuous emotion space. For each emotion, a state synchronous DBN model (SS_DBN) is first trained using the Cohn-Kanade (CK) database with two streams of inputs: (i) the annotated (Evaluation, Activation) values, and (ii) the extracted...
In this paper, we present the Gauss-Newton method as a unified approach to optimizing non-linear noise compensation models, such as vector Taylor series (VTS), data-driven parallel model combination (DPMC), and unscented transform (UT). We demonstrate that the commonly used approaches that iteratively approximate the noise parameters in an EM framework are variants of the Gauss-Newton method. Through...
In this paper, we propose a novel noise variance estimation method that uses the fixed point method for VTS-based robust speech recognition. Noise parameters are re-estimated over a given utterance using an EM algorithm. The derivative of the auxiliary function with respect to the noise variance is resolved, and the fixed point algorithm estimates the noise variance by recursively approximating the...
In a collaborative scenario, a multiplicity of portable devices may constitute a network of distributed microphones, without a clearly defined geometric configuration or synchronization that traditional microphone array processing could exploit to enhance the acquired signal. This application scenario represents a severe but interesting challenge for automatic speech recognition systems...
This paper proposes a new approach for measuring the target cost in unit selection, where the difference between the target and candidate units is estimated by the Kullback-Leibler divergence (KLD) between context-dependent hidden Markov models (HMMs). In order to model the left/right phonetic context, biphone models are generated by merging regular triphone HMMs sharing the same left/right phonetic...
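The abstract above does not say how the KLD between HMMs is computed. As a rough illustration only, the sketch below shows the closed-form KL divergence between two Gaussians with diagonal covariance, a common building block when comparing HMM state distributions. The function names and the symmetrized variant are assumptions made for this example, not the paper's method.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """Closed-form KL(p || q) for two Gaussians with diagonal covariance.

    Each argument is a sequence of per-dimension means or variances.
    Hypothetical helper for illustration; not taken from the paper.
    """
    kl = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        # Per-dimension term: 0.5 * (log(vq/vp) + (vp + (mp - mq)^2) / vq - 1)
        kl += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return kl


def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrized divergence KL(p || q) + KL(q || p) (an assumed choice)."""
    return (kl_diag_gaussians(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussians(mean_q, var_q, mean_p, var_p))
```

A divergence of zero means the two distributions are identical; larger values would correspond to a larger mismatch between a target and a candidate unit under this simplified view.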
Existing automatic face recognition methods cannot recognize ageing faces with large changes in facial appearance. In this paper, a novel algorithm based on the embedded hidden Markov model (EHMM) is presented to recognize faces with large ageing effects. First, the non-linear relations between age and the motions of key feature points in the face are obtained by analyzing a large number of samples,...