Convolutional Neural Networks (CNNs) have shown great success in solving key artificial vision challenges such as image segmentation. Training these networks, however, normally requires plenty of labeled data, while data labeling is an expensive and time-consuming task, due to the significant human effort involved. In this paper we propose two pixel-level domain adaptation methods, introducing a training...
For the i-vector model, probabilistic linear discriminant analysis (PLDA) is the normalization approach and performs well in speaker verification. However, it requires a large amount of development data, which is costly in many cases. Unsupervised adaptation is a possible approach, which uses unlabeled data to adapt the PLDA scatter matrices to the target domain. In this paper, a ‘local training’ approach...
The performance of speech emotion classifiers degrades greatly when the training conditions do not match the testing conditions. This problem is observed in cross-corpora evaluations, even when the corpora are similar. The lack of generalization is particularly problematic when the emotion classifiers are used in real applications. This study addresses this problem by combining active learning (AL)...
We analyze the theoretical vulnerability of maximum a posteriori (MAP) speaker adaptation, which is widely used in practical speaker recognition systems. First, we prove that there exists a set of feature vectors, called wolves, that can impersonate almost all the registered speakers with probability asymptotically close to 1 in at most two trials. Second, our experiment shows that the...
Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector and relevance Maximum-a-Posteriori (MAP), have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors from a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM)...
A biometric system is a pattern recognition system that automatically identifies people according to their physiological and behavioral properties. Among the physiological properties, the hand has a special place, since all of its features, such as palm lines, inner knuckles, external knuckles and geometry, can be used. More recently, the use of the blood vessel pattern in the palm, in addition to the high acceptability,...
Building synthetic child voices is considered a difficult task due to the challenges associated with data collection. As a result, speaker adaptation in conjunction with Hidden Markov Model (HMM)-based synthesis has become prevalent in this domain because the approach caters for limited amounts of data. An initial average voice model is trained using data from multiple speakers and adapted to resemble...
In this paper, a novel subject-adaptable heartbeat classification model is presented, in order to address the significant interperson variations in ECG signals. A multiview learning approach is proposed to automate subject adaptation using a small amount of unlabeled personal data, without requiring manual labeling. The designed subject-customized models consist of two models, namely, general classification...
Omnidirectional cameras are commonly used in computer vision and robotics. Their main advantage is their wide field of view, which allows them to acquire a 360-degree view of the scene with only one sensor and a single shot. However, few studies have investigated the human detection problem using this kind of camera. In this paper, we propose to extend the conventional approach for human detection...
One task of heterogeneous face recognition is to match a near infrared (NIR) face image to a visible light (VIS) image. In practice, there are often only a few paired NIR-VIS face images, but it is easy to collect many VIS face images. Therefore, how to use these unpaired VIS images to improve the NIR-VIS recognition accuracy is an ongoing issue. This paper presents a deep TransfeR NIR-VIS heterogeneous...
In this paper, we investigate methods to improve the recognition performance of low-resource languages with limited training data by borrowing subspace parameters from a high-resource language in the subspace Gaussian mixture model (SGMM) framework. As a first step, only the state-specific vectors are updated using the low-resource language, while retaining all the globally shared parameters from the high-resource...
We study the problem of speaker feature adaptation for cross-corpus speech emotion recognition. First, we discuss the existing approaches to adaptive emotion classification from speech signals. Second, the speaker feature adaptation approach is further studied in view of additive emotion feature distortion. Finally, we verify our approaches using different cross-language corpora, including German,...
Recently we proposed a new State-GMM-supervector extractor for solving the problem of text-dependent speaker recognition. We demonstrated that segmenting the passphrase into word states for supervector extraction makes it possible to create more accurate statistical models of speech signals and to achieve a reduction in EER compared to the best state-of-the-art systems of text-dependent verification...
This paper presents an HMM-based synthesis approach for speechlaughs. The cornerstone of this project was the idea of the co-occurrence of smile and laughter bursts in varying proportions within amused speech utterances. A corpus with three complementary speaking styles was used to train the underlying HMM models: neutral speech, speech-smile, and finally laughter in different articulatory configurations...
One of the main barriers in the deployment of speech emotion recognition systems in real applications is the lack of generalization of the emotion classifiers. The recognition performance achieved in controlled recordings drops when the models are tested with different speakers, channels, environments and domain conditions. This paper explores supervised model adaptation, which can improve the performance...
Unsupervised speaker adaptation of a Deep Neural Network (DNN) is investigated for lecture transcription tasks, in which a single speaker gives a long speech and thus speaker adaptation is important. The proposed method selects speakers similar to the test data (test speaker) from the training database, and these are used for retraining the baseline DNN. Several speaker characteristic features are defined...
This paper presents the most recent progress and state-of-the-art results obtained from BBN's Arabic offline handwriting recognition research. Our system is based on a left-to-right hidden Markov model and integrates discriminative learning methods including discriminative MPE and n-best rescoring using the scores of glyph classifiers (SVM, DNN) and the RNNLM. Arabic-related features for n-best rescoring...
This paper explores the use of mismatch conditions in speaker variability applied to Automatic Speaker Verification (ASV), defined as a classification task to decide whether a claimed identity is true or not. This paper proposes to model mismatch conditions in speaker variability from one session to another. It was shown that the speaker recognition accuracy deteriorates when there is an acoustic mismatch...
This paper presents a new similarity measure and matching scheme for content-based image retrieval (CBIR), based on modeling positive and negative hypotheses and testing a query image against these two hypotheses. The paper proposes to calculate first a universal image model (UIM), which is built based on a large set of images. The derived UIM is then used as a reference for the calculation of adapted...
The online handwritten signature is a behavioral biometric trait with several practical applications. Examples of these applications include access control to personal devices and validation of online transactions. Considerable research has been done to improve the performance of online signature verification systems. This paper presents an improvement of a recently proposed online signature verification...