In data-driven spoken dialog system development, developers must prepare a dialog corpus with semantic annotation. However, the labeling process is a laborious and time-consuming task. To reduce human effort, we propose an unsupervised approach to modeling user actions based on a non-parametric Bayesian hidden Markov model. With the non-parametric model, system designers do not need...
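As a minimal numpy sketch of the idea behind such non-parametric priors (illustrative only; the paper's model and inference are not shown), the stick-breaking construction below places weight on arbitrarily many HMM states, so the effective number of user-action states is inferred from data rather than fixed by the designer:

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, seed=0):
    """Truncated stick-breaking construction of Dirichlet-process weights.

    Illustrates why a non-parametric HMM needs no preset state count:
    the prior spreads mass over many states and the data decide how
    many are actually used.
    """
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=truncation)
    stick_left = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * stick_left

# Small alpha concentrates mass on a few states; large alpha spreads it out.
print(stick_breaking_weights(alpha=1.0, truncation=8).round(3))
print(stick_breaking_weights(alpha=10.0, truncation=8).round(3))
```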
We propose a saliency-maximized audio spectrogram as a representation that lets human analysts quickly search for and detect events in audio recordings. By rendering target events as visually salient patterns, this representation minimizes the time and effort needed to examine a recording. In particular, we propose a transformation of a conventional spectrogram that maximizes the mutual information...
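The following is a crude stand-in for that intuition, not the paper's mutual-information-optimal mapping: subtracting each frequency band's median background from a log spectrogram renders sparse, event-like energy with high visual contrast.

```python
import numpy as np
from scipy.signal import spectrogram

def background_subtracted_spectrogram(x, fs):
    # Log spectrogram with each band's median background removed, so
    # rare, event-like energy stands out to the eye.
    f, t, S = spectrogram(x, fs=fs)
    log_s = np.log(S + 1e-12)
    return f, t, np.clip(log_s - np.median(log_s, axis=1, keepdims=True), 0.0, None)

fs = 16000
t = np.arange(fs) / fs
x = 0.1 * np.random.randn(fs)                       # background noise
x[4000:4800] += np.sin(2 * np.pi * 3000 * t[:800])  # short tone "event"
f, frames, enhanced = background_subtracted_spectrogram(x, fs)
```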
We introduce a novel metric for speech recognition success in voice search tasks, designed to reflect the impact of speech recognition errors on the user's overall experience with the system. The computation of the metric is seeded with intuitive labels from human subjects and subsequently automated by replacing human annotations with a machine learning algorithm. The results show that search-based recognition...
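A sketch of the seed-then-automate step (the features and labels below are hypothetical, not the paper's): a classifier trained on a small set of human-labeled examples can then score new recognition results automatically.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-query features, e.g. [word error rate,
# top result clicked?, query reformulated?], with human "success" seed labels.
X = np.array([[0.0, 1, 0], [0.3, 1, 0], [0.6, 0, 1],
              [0.1, 1, 0], [0.8, 0, 1], [0.5, 0, 1]])
y = np.array([1, 1, 0, 1, 0, 0])

clf = LogisticRegression().fit(X, y)
# Automated metric: predicted probability that the search succeeded.
print(clf.predict_proba([[0.2, 1, 0]])[:, 1])
```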
This paper is concerned with combining models to decode an optimum translation for a dictation-based machine-aided human translation (MAHT) task. Statistical language model (SLM) probabilities in automatic speech recognition (ASR) are updated using statistical machine translation (SMT) model probabilities. The effect of this procedure is evaluated for utterances from human translators dictating...
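One generic way to realize such an update is log-linear interpolation of the two models' scores (an illustrative combination rule; the paper's exact update may differ):

```python
import math

def combined_logprob(log_p_slm, log_p_smt, lam=0.7):
    # Log-linear interpolation: the SMT model's prediction from the source
    # text re-weights the ASR language model's estimate for each word.
    return lam * log_p_slm + (1.0 - lam) * log_p_smt

# A word the SMT model strongly predicts gets boosted relative to one it does not.
print(combined_logprob(math.log(0.01), math.log(0.2)))
print(combined_logprob(math.log(0.01), math.log(0.001)))
```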
Computer lip-reading is one of the great signal processing challenges. Not only is the signal noisy, it is also variable. However, comparisons of machine performance with that of human lip-readers are almost unknown, partly because of the paucity of human lip-readers and partly because most automatic systems handle only data that are trivial and therefore not representative of human speech. Here we generate a...
Speaker recognition by machine can be quite accurate for large groups, as seen in NIST speaker recognition evaluations. However, it can be fragile under changing environments. This research examines how robust humans are at recognizing familiar speakers in changing environments. Additionally, band-limited noise was used to try to learn which frequency regions are important for...
This paper describes some of the results from the project entitled “New Parameterization for Emotional Speech Synthesis” held at the Summer 2011 JHU CLSP workshop. We describe experiments on how to use articulatory features as a meaningful intermediate representation for speech synthesis. This parameterization not only allows us to reproduce natural sounding speech but also allows us to generate stylistically...
Without a doubt there is emotion in sound. So far, however, research efforts have focused on emotion in speech and music, despite many applications in emotion-sensitive sound retrieval. This paper is an attempt at automatic emotion recognition of general sounds. We selected sound clips from different areas of the daily human environment and modeled them using the increasingly popular dimensional approach...
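A minimal sketch of the dimensional approach (toy features and ratings, not the paper's corpus): predict continuous emotion coordinates such as arousal instead of discrete emotion classes.

```python
import numpy as np
from sklearn.svm import SVR

# Toy per-clip acoustic features (e.g. energy, spectral centroid) and
# hypothetical continuous arousal ratings in [0, 1].
X = np.array([[0.9, 0.8], [0.2, 0.3], [0.7, 0.6], [0.1, 0.2], [0.8, 0.9]])
arousal = np.array([0.9, 0.2, 0.7, 0.1, 0.85])

model = SVR().fit(X, arousal)
print(model.predict([[0.5, 0.5]]))  # predicted arousal for an unseen clip
```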
This paper shows that pattern classification based on machine learning is a powerful tool for analyzing human brain activity data obtained by magnetoencephalography (MEG). In our previous work, a weighting method using multiple kernel learning was proposed, but this method had a high computational cost. In this paper, we propose a novel and fast weighting method using an AdaBoost algorithm to find...
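A sketch of the weighting idea with boosted decision stumps (synthetic data, not the paper's MEG features): each boosting round selects one informative feature, so the accumulated feature importances act as cheap channel weights.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
# Synthetic stand-in for MEG trials: 200 trials x 20 "sensors";
# only sensor 3 carries class information.
X = rng.normal(size=(200, 20))
y = (X[:, 3] > 0).astype(int)

clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)
# feature_importances_ serve as learned sensor weights.
print(np.argsort(clf.feature_importances_)[::-1][:3])
```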
We review the literature on human evolution, organizational communication, and CMC, focusing on research addressing CMC support for the transmission of socio-emotional signals, Theory of Mind (ToM), and social capital. We develop a social capital theory of communication in organizations, linking the use of CMC for the transmission of socio-emotional signals with one's ability to develop social capital,...
In this paper, we introduce an effective automated essay scoring system. To implement the system, we extract several features, including surface features, such as the number of words in the essay and the number of words longer than five, and complex features, such as grammar checks, sentence-level analysis, whether the essay is off-topic, and similarity to full-score essays. We obtain a result of 86% precision given the...
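A sketch of two of those feature types (the example texts and helper names are illustrative, not the paper's):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def surface_features(essay):
    # Surface features named in the abstract: word count and long-word count.
    words = essay.split()
    return {"n_words": len(words),
            "n_long_words": sum(len(w) > 5 for w in words)}

# One "complex" feature: TF-IDF cosine similarity to known full-score essays.
full_score = ["a thoughtful, well-argued essay about renewable energy policy"]
draft = "a short draft about energy"
vec = TfidfVectorizer().fit(full_score + [draft])
sim = cosine_similarity(vec.transform([draft]), vec.transform(full_score))[0, 0]
print(surface_features(draft), round(sim, 2))
```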
Kinesthetic teaching is an approach to providing demonstrations to a robot in Learning from Demonstration whereby a human physically guides a robot to perform a skill. In the common usage of kinesthetic teaching, the robot's trajectory during a demonstration is recorded from start to end. In this paper we consider an alternative, keyframe demonstrations, in which the human provides a sparse set of...
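A toy illustration of the contrast, assuming made-up joint-angle keyframes and simple linear interpolation (real systems would use splines or learned models):

```python
import numpy as np

# Keyframe demonstration: the teacher supplies a sparse set of joint
# configurations; the robot fills in the motion between them.
times = np.array([0.0, 1.0, 2.0])
keyframes = np.array([[0.0, 0.5],   # joint angles at each keyframe
                      [0.8, 0.2],
                      [1.2, 1.0]])

dense_t = np.linspace(0.0, 2.0, 21)
trajectory = np.stack([np.interp(dense_t, times, keyframes[:, j])
                       for j in range(keyframes.shape[1])], axis=1)
print(trajectory[:3])
```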
This paper presents a generalized method for the design of a gesture vocabulary (GV) for intuitive and natural two-way human-robot dialog. Two GV design methodologies are proposed: one for a robot GV (RGV) and a second for a human GV (HGV). The design is based on motion gestures elicited from a cohort of subjects in response to a set of tasks needed to execute several robot waiter (RW)-customer dialogs...
This work aims to realize multimodal interaction with embodied contextual understanding based on a simple chatterbot system. A system framework is proposed that integrates the dialogue system into a 3D simulation platform, SIGVerse, to attain multimodal interaction. The chatterbot's AIML implementations are described for achieving conversations with embodied contextual understanding in HRI...
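A minimal AIML round trip, assuming the third-party python-aiml package and an illustrative category (not the paper's knowledge base):

```python
import os
import tempfile

import aiml  # assumes the python-aiml package is installed

AIML = """<aiml version="1.0">
<category>
  <pattern>HELLO ROBOT</pattern>
  <template>Hello! What should I bring you?</template>
</category>
</aiml>"""

with tempfile.NamedTemporaryFile("w", suffix=".aiml", delete=False) as f:
    f.write(AIML)
    path = f.name

kernel = aiml.Kernel()
kernel.learn(path)                    # load the category above
print(kernel.respond("hello robot"))  # -> Hello! What should I bring you?
os.remove(path)
```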
Head motion occurs naturally and in synchrony with speech during human dialogue communication, and may carry paralinguistic information, such as intentions, attitudes and emotions. Therefore, natural-looking head motion by a robot is important for smooth human-robot interaction. Based on rules inferred from analyses of the relationship between head motion and dialogue acts, this paper proposes a model...
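In the spirit of such rules, a head-motion controller can be a simple lookup from dialogue acts to motions (the acts and motions below are illustrative, not the paper's inferred rules):

```python
# Map dialogue acts to head motions; unknown acts default to holding still.
HEAD_MOTION_RULES = {
    "affirm": "nod",
    "negate": "shake",
    "question": "tilt",
    "backchannel": "small_nod",
}

def head_motion_for(dialog_act: str) -> str:
    return HEAD_MOTION_RULES.get(dialog_act, "hold")

print(head_motion_for("affirm"))  # -> nod
```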
Social robots have the potential to serve as personal, organizational, and public assistants, for instance as diet coaches, teacher's aides, and emergency respondents. The success of these robots, whether in motivating users to adhere to a diet regimen or in encouraging them to follow evacuation procedures in the case of a fire, will rely largely on their ability to persuade people. Research in...
Emotion is an essential element of human behaviour. In this research, we investigated human behaviours related to the touch interface on a smartphone as a way to understand users' emotional states. As modern smartphones have various embedded sensors, such as an accelerometer and gyroscope, we aim to utilize data from these embedded sensors for recognizing human emotion and further finding emotional...
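A sketch of the pipeline with synthetic sensor features and toy labels (the feature names and emotion classes are hypothetical):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Hypothetical per-interaction features from touch events and motion
# sensors: [tap-force proxy, tap speed, device-shake variance].
X = rng.normal(size=(120, 3))
y = (X[:, 2] + 0.5 * X[:, 0] > 0).astype(int)  # toy "stressed vs. calm" labels

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:3]))  # predicted emotional state per interaction
```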
Speech is one of the most natural media for human communication, which makes it vital to human-robot interaction. In the real environments where robots are deployed, distant-talking speech recognition is difficult to realize due to the effects of reverberation. This degrades speech recognition and understanding, and hinders seamless human-robot interaction. To minimize this problem,...
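Reverberation can be modeled as convolution of clean speech with a room impulse response (RIR); the synthetic exponentially decaying RIR below shows how distant-talking input gets smeared in time (a toy illustration of the problem, not the paper's method):

```python
import numpy as np

fs = 16000
clean = np.random.randn(fs)   # stand-in for one second of speech
decay = np.exp(-np.arange(int(0.3 * fs)) / (0.05 * fs))
rir = decay * np.random.randn(int(0.3 * fs))         # synthetic 300 ms RIR
reverberant = np.convolve(clean, rir)[: len(clean)]  # what the far mic hears
```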
Cooperative robotic systems, such as unmanned aircraft systems, are becoming technologically mature enough to be integrated into civil society. To gain practical use and acceptance, a verifiable, principled and well-defined foundation for interactions between human operators and autonomous systems is needed. In this paper, we propose and specify such a formally grounded collaboration framework. Collaboration...
This paper describes an approach that allows a humanoid robot to automatically acquire vocalization capability by learning from a human tutor. The proposed algorithm can simultaneously synthesize speech utterances from unrestricted text and generate facial movements of the humanoid head synchronized with the generated speech. The algorithm uses fuzzy articulatory rules, derived from the International...