This paper introduces SSML for use with the Arabic language. SSML is part of a larger set of markup specifications for voice browsers developed through the open processes of the W3C. The essential role of the markup language is to give authors of synthesizable content a standard way to control aspects of speech output such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable...
Several studies on cross-domain user behaviour have revealed generic personality traits and behavioural patterns. This paper proposes quantitative approaches that use knowledge of player behaviour in one game to seed the process of building player experience models in another. We investigate two settings: in the supervised feature mapping method, we use labeled datasets about players' behaviour in...
Emotion recognition in speech is a very challenging task in the speech processing domain. Because of the continuous nature of human emotion, most recent research focuses on recognising emotion in a continuous space. While previous attempts at speech emotion annotation adopted a Likert-like scaling system in a continuous space and relied on prediction models to predict emotion we,...
The demand for automatically gathered data is a societal trend quickly extending to all aspects of human life. Knowledge of the utilization of public facilities is of interest for optimising use and cutting expenses for the owners. Manual observations are both cumbersome and expensive, and they carry a risk of incorrect results due to subjective opinions or lack of interest in the given task. In this...
In this work we propose a method for classifying sports types from combined audio and visual features extracted from thermal video. From the audio, Mel Frequency Cepstral Coefficients (MFCCs) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modality, short trajectories are constructed to represent the motion of players. From these, four motion features...
Speaker de-identification is an interesting and newly investigated task in speech processing. In current implementations, this task is based on transforming one speaker's speech into that of another speaker in order to hide the speaker's identity. In this paper we present a discriminative approach to human speaker selection for speaker de-identification. We used two modules, a speaker identification system and...
This paper introduces a novel approach to pairwise preference learning that combines an evolutionary method with Multivariate Adaptive Regression Splines (MARS). Collecting users' feedback through pairwise preferences is recommended over other ranking approaches, as this method is more appealing for human decision making. Learning models from pairwise preference data is, however, an NP-hard problem...
The dissimilarity between the training and test data in speech recognition systems is known to have a considerable effect on recognition accuracy. To solve this problem, we use a density forest to cluster the data and the maximum a posteriori (MAP) method to build cluster-based adapted Gaussian mixture models (GMMs) for HMM speech recognition. Specifically, a set of bagged versions of the training...
Exemplar-based techniques for pitch generation in a text-to-speech system have shown a high degree of success, with results comparable to those of other techniques. The use of these techniques, however, requires that all units occur in the corpus. One of the limitations of this requirement is that the data in the corpus that is prosodically correlated with the input does not always contain...
The generation of a pitch contour from linguistic information has long been recognised as a requirement for natural-sounding speech synthesis. This paper investigates the use of an exemplar-based model for pitch contour generation. The main drawbacks of previous unit selection-based approaches to pitch contour generation are determining the size of the unit and guaranteeing that only prosodic and...
Exemplars are typically defined by a set of features that may have simple or complex structures. Comparing two exemplars requires a distance calculation between their features, a task which becomes more difficult when some of these features are missing. A possible solution is to predict the missing features from those that are known. Prediction of features is considered a hard task in machine...