The study of emotions in human-computer interaction is a growing research area. In automatic emotion recognition, current work aims at good results particularly in speech and facial gesture recognition. In this paper we present a study that analyzes the validity of different Machine Learning techniques in the area of automatic speech emotion recognition. Using a bilingual...
Visual language refers to the idea that communication occurs through visual symbols, as opposed to verbal symbols or words. In contrast to sentence construction in spoken language, which has a linear ordering of words, a visual language has a simultaneous structure with a parallel temporal and spatial configuration. Inspired by Deikto [5], we propose a two-dimensional string or sentence construction of visual...
Automatic dialogue systems are easily confused when they recognize speech that is not directed to the system. Besides noise or other people’s conversation, even the user’s own utterances can cause difficulties when the user is talking to someone else or to himself (“Off-Talk”). In this paper the automatic classification of the user’s focus of attention is investigated. In the German SmartWeb project, a mobile...
The Sammon Transform performs data projections in a topology-preserving manner on the basis of an arbitrary distance measure. We use the weights of the observation probabilities of semi-continuous HMMs that were adapted to the current speaker as input. Experiments on laryngectomized speakers with tracheoesophageal substitute voice, hoarse, and normal speakers show encouraging results. Different speaker...
The worker on the move has an ever-increasing need to access information, such as instructions on how to proceed with a task. Using audio to convey that information and to interact has many advantages over traditional hands-and-eyes devices, especially if the user needs his hands to perform a task. In this paper, we focus on a task model stored in a workflow engine. The execution of a task...
This paper describes the use of negative examples in training the HVS semantic model. We present a novel initialization of the lexical model using negative examples extracted automatically from a semantic corpus, as well as a description of an algorithm for extracting these examples. We evaluated the use of negative examples on a closed domain human-human train timetable dialogue corpus. We significantly...
This paper describes progress in the development of a human-human dialogue corpus for machine translation of spoken language. We have chosen a semantically annotated corpus of phone calls to a train timetable information center. The phone calls consist of inquiries regarding train travel plans. Corpus dialogue act tags incorporate abstract semantic meaning. We have enriched a part of the corpus...
The paper analyses how an information operator processes a customer’s requests. The study is based on the Estonian dialogue corpus. Our further aim is to develop a dialogue system (DS) which interacts with a user in Estonian and recognises, interprets and grants the user’s requests automatically. There are two main classes of computational models of the interpretation of dialogue acts – cue-based and...
We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two different approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We report classification...
This paper presents a simple method of determining the voice similarity by analyzing a set of very short sounds. A large number of pitch-length sounds were extracted from natural voice signals from different realizations of open vowels ’a’ and ’o’. The voice similarity was defined as the sum of single elementary similarities of short sound pairs. This method is oriented to the microphonemic speech...
In this work we show how our intonation corpus driven intonation modelling methodology MEMOInt can help in the graphical visualization of the complex relationships between the different prosodic features which configure the intonational aspects of natural speech. MEMOInt has already been used successfully for the prediction of synthetic F0 contours in the presence of the usual data scarcity problems...
The development of a dialogue system for any task implies the acquisition of a dialogue corpus in order to study the structure of the dialogues used in that task. This structure is reflected in the dialogue system behaviour, which can be rule-based or corpus-based. In the case of corpus-based dialogue systems, the behaviour is defined by statistical models which are inferred from an annotated corpus...
Nowadays, most documents are produced in digital format, in which they can be easily accessed and copied. Document copy detection is a very important tool for protecting the author’s copyright. We present PPChecker, a document copy detection system based on plagiarism pattern checking. PPChecker calculates the amount of data copied from the original document to the query document, based on linguistically-motivated...
Naturalness of synthetic speech highly depends on appropriate modelling of prosodic aspects. Mostly, three prosody components are modelled: segmental duration, pitch contour and intensity. In this study, we present our work on modelling segmental duration in Turkish using machine-learning algorithms, especially Classification and Regression Trees. The models predict phone durations based on attributes...
This paper describes a design methodology for multimodal interactive systems. The method suggested is meant to serve as a foundation for the application of robust software engineering techniques in the field of multimodal systems. Starting from a short review of current design approaches we present a high level view of the design process for multimodal systems, highlighting design issues related to...
This paper describes the Ephyra question answering engine, a modular and extensible framework that allows multiple approaches to question answering to be integrated in one system. Our framework can be adapted to languages other than English by replacing language-specific components. It supports the two major approaches to question answering, knowledge annotation and knowledge mining. Ephyra uses the web...
There are many situations in which listening to a text produced by a text-to-speech system is easier or safer than reading, for example when driving a car. Technical documents, such as conference articles and manuals, usually consist of relatively plain and unequivocal sentences. These documents usually contain words and terms unknown to the listener because they are full of domain specific...
It has been found that an infant’s cry carries much information in its sound wave. For small infants crying is a form of communication, a very limited one, but similar to the way adults communicate. In this work we present the design of a hybrid Automatic Infant Cry Recognizer system that classifies different kinds of cries, with the objective of identifying some pathologies in newborn babies...
Ageing affects the economic and social foundations of societies worldwide. Health care has to respond to the challenge that population ageing presents. In remote medical monitoring, the human operator needs to be assisted by smart information systems. Physiological and position sensors provide large amounts of data, but speech analysis and sound classification can give interesting additional information...