Current state-of-the-art TTS synthesis can produce high-quality synthesized speech if rich segmental and suprasegmental information is given. However, some suprasegmental features, e.g., Tones and Break Indices (ToBI), are time-consuming to obtain because they are labeled manually, with high inconsistency among annotators. In this paper, we investigate the use of word embedding, which represents...
In DNN-based TTS synthesis, the DNN's hidden layers can be viewed as a deep transformation of linguistic features, and the output layers as a representation of the acoustic space that regresses the transformed linguistic features to acoustic parameters. The deep-layered architecture of a DNN can not only represent highly complex transformations compactly, but also take advantage of huge amounts of training data. In this...
In this paper, we propose a two-step approach for the super-resolution reconstruction of video sequences based on the degraded model. First, we use sparse principal component analysis and linear minimum mean-square-error estimation to remove noise from the degraded video sequences. Second, we adopt Newton-Thiele vector-valued rational interpolation, which is one of the nonlinear...
Hierarchical models have become one of the most widely adopted and effective solutions for organizing large volumes of documents. Although there are general taxonomies on the Web, we observe that in most cases there are many inconsistencies between a general taxonomy and specific resources, because the taxonomies are generated independently of the resources. Besides with the newly available resources into...
To spot the digits in a handwritten document, each component is sent to a classifier. This is a time-consuming process because a document usually contains several hundred components. A method is presented to reduce the number of candidate components from a handwritten document that are sent to the classifier. Furthermore, since the classifier does not contain a rejection class, this led to several...
In practical applications, errors should not be treated equally, but conditionally. In this paper, errors are categorized according to their misclassification costs. Accordingly, the characteristics of each error category and the corresponding correction strategies are proposed. Verification based on Arabic Handwritten Numeral Recognition is considered as one application to utilize...
In handwriting recognition, confusing/conflicting writing styles can result in irreducible errors, so the study of writing style consistencies is important for applications. In Arabic Handwritten Numeral Recognition, most errors occur between samples of classes two and three due to their very similar shapes in some writing styles. In this paper, an automated writing style detection process is effectively...
Since the Urdu language has more isolated letters than Arabic and Farsi, research on Urdu handwritten words is desirable. This paper presents a novel approach that uses compound features and a Support Vector Machine (SVM) for offline Urdu word recognition. Due to the cursive style of Urdu, classification using a holistic approach is adapted efficiently. Compound feature sets, which involve structural and...
This paper presents a linear discriminant analysis-based measurement (LDAM) on the output of classifiers as a criterion for rejecting patterns that cannot be classified with high reliability. This is important in applications (such as the processing of financial documents) where errors can be very costly and are therefore less tolerable than rejections. To implement the rejection, which can be considered...
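As a rough illustration of the reject option described in the abstract above, the following sketch rejects a pattern when the top classifier score falls below a threshold. This is a generic confidence-threshold rule, not the LDAM criterion from the paper, whose details are not given here. The function name `classify_with_reject` and the default threshold of 0.9 are assumptions chosen for illustration.

```python
from typing import Optional, Sequence


def classify_with_reject(scores: Sequence[float],
                         threshold: float = 0.9) -> Optional[int]:
    """Return the index of the top-scoring class, or None to reject.

    Hypothetical helper: a plain confidence threshold stands in for the
    paper's LDAM measurement, which is not reproduced here.
    """
    # Pick the class whose classifier output is largest.
    best = max(range(len(scores)), key=lambda i: scores[i])
    # Reject the pattern when the classifier is not confident enough.
    return best if scores[best] >= threshold else None


confident = classify_with_reject([0.05, 0.92, 0.03])  # accepted as class 1
ambiguous = classify_with_reject([0.50, 0.45, 0.05])  # rejected (None)
```

Raising the threshold trades more rejections for fewer errors, which suits settings such as financial document processing where an error costs more than a rejection.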
A low-cost, robust Mandarin speech recognition system is investigated for an embedded car navigation application. In the front end, a log-spectral minimum mean-square error (LogMMSE) estimation algorithm is applied to suppress background noise, and a piecewise linear function is used to approximate the traditional Taylor expansion in its gain function calculation to reduce the computational complexity...