Speech-based human-robot interaction is often plagued by issues such as reverberation and changes in speaker position that degrade overall performance. In this paper, we present a method for compensating for the joint effects of reverberation and changes in speaker position. The acoustic perturbation caused by these two factors takes its toll on Automatic Speech Recognition (ASR) and, in turn, the Spoken Language...
We focus on the problem of speech recognition in the presence of nonstationary sudden noise, which is very likely to happen in home environments. To handle this problem, a model compensation method based on a factorial hidden Markov model (FHMM) has been recently introduced. In this architecture, speech and noise processes are modeled in parallel by a phoneme FHMM that is built by combining a clean-speech...
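The FHMM-based model compensation described above pairs every clean-speech state with every noise state. A minimal sketch of that state-combination step, using the common max-approximation in the log-spectral domain (the function name, array shapes, and the use of the max-approximation are illustrative assumptions, not details taken from the paper):

```python
import numpy as np

def combine_states(speech_means, noise_means):
    """Build the Cartesian-product state space of a factorial HMM.

    Each combined state pairs one speech state with one noise state;
    under the max-approximation, the noisy log-spectrum of a combined
    state is modeled as the elementwise max of the two state means.
    """
    S, D = speech_means.shape   # S speech states, D log-spectral bins
    N, _ = noise_means.shape    # N noise states
    combined = np.empty((S * N, D))
    for i in range(S):
        for j in range(N):
            combined[i * N + j] = np.maximum(speech_means[i],
                                             noise_means[j])
    return combined
```

With S speech states and N noise states this yields S*N combined states, which is why such compensation schemes typically keep the noise model small.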
We propose a novel sparse representation for heavily underdetermined multichannel sound mixtures, i.e., with many more sources than microphones. The proposed approach operates in the complex Fourier domain, thus preserving the spatial characteristics carried by phase differences. We derive a generalization of K-SVD which jointly estimates a dictionary capturing both spectral and spatial features, a sparse...
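For context on the K-SVD generalization mentioned above: in classical (real-valued) K-SVD, each dictionary atom is updated via a rank-1 SVD of the residual restricted to the signals that use that atom. A minimal sketch of that single-atom update (the paper's complex-domain, spatially-aware generalization differs; function and variable names here are illustrative):

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """Classical K-SVD update of dictionary atom k.

    Y: data matrix (dim x n_signals), D: dictionary (dim x n_atoms),
    X: sparse codes (n_atoms x n_signals). Updates D[:, k] and the
    corresponding row of X in place and returns them.
    """
    omega = np.nonzero(X[k])[0]          # signals that use atom k
    if omega.size == 0:
        return D, X                      # atom unused: nothing to do
    # Residual with atom k's contribution removed
    E = Y - D @ X + np.outer(D[:, k], X[k])
    E_r = E[:, omega]                    # restrict to relevant signals
    U, s, Vt = np.linalg.svd(E_r, full_matrices=False)
    D[:, k] = U[:, 0]                    # best rank-1 atom (unit norm)
    X[k, omega] = s[0] * Vt[0]           # matching coefficients
    return D, X
```

Because the update is the best rank-1 approximation of the restricted residual, the overall reconstruction error is non-increasing at each atom update.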
The ability of robots to listen to several things at once with their own “ears”, that is, robot audition, is an important factor in improving interaction and symbiosis between humans and robots. The critical issue in robot audition is real-time processing and robustness against noisy environments with high flexibility to support various kinds of robots and hardware configurations. This paper first...
Speech and audio signal processing research is a tale of data collection efforts and evaluation campaigns. Large benchmark datasets for automatic speech recognition (ASR) have been instrumental in the advancement of speech recognition technologies. However, when it comes to robust ASR, source separation, and localization, especially using microphone arrays, the perfect dataset is out of reach, and...
The structure of a novel soft robot which can mimic several movements of the human tongue was designed with a series of embedded chambers using a pneumatic actuation pattern. Two silicone materials (Ecoflex 0030 and PDMS) were chosen to fabricate the body of the robot. FEM simulations were carried out using the Abaqus software. Four types of deformation were achieved in simulation, including roll, groove,...
The work in this paper concerns a small-footprint Acoustic Model (AM) and its use in the implementation of a Large Vocabulary Isolated Speech Recognition (LVISR) system for commanding a robot in the Korean language, which requires about 500 KB of memory. Tree-based state clustering was applied to reduce the total number of unique states while preserving the original performance. A decision tree induction...
Recent advances in technology have produced learning methods that are beginning to supersede traditional ones. Augmented Reality (AR) is one such technology that has seen many applications in education. This paper describes how an Immersive Augmented Reality (iAR) application, in conjunction with a book, can act as a new smart learning method by engaging as many of the user's senses and...
This article proposes an emotive lifelike robotic face, called ExpressionBot, that is designed to support verbal and non-verbal communication between the robot and humans, with the goal of closely modeling the dynamics of natural face-to-face communication. The proposed robotic head consists of two major components: 1) a hardware component that contains a small projector, a fish-eye lens, a custom-designed...
Recent developments in human-robot interaction show how the ability to communicate with people in a natural way is of great importance for artificial agents. The implementation of facial expressions has been found to significantly increase the interaction capabilities of humanoid robots. For speech, displaying a correct articulation with sound is mandatory to avoid audiovisual illusions like the McGurk...
This paper presents an interactive humanoid robot that can moderate a multi-player fastest-voice-first-type quiz game by leveraging state-of-the-art robot audition techniques such as sound source localization and separation and speech recognition. In this game, a player who says "Yes" first gets the right to answer a question, and players are allowed to barge in on a question utterance of...
The application of robotics to telepresence can enhance user interaction experience by providing embodiment, engaging behaviors, automatic control, and human perception. This paper presents a new telepresence robot with gesture-based attention direction to orient the robot towards attention targets according to human deictic gestures. Gesture-based attention direction is realized by combining Localist...
In this paper we address the problem of musical genre recognition for a dancing robot with embedded microphones capable of distinguishing the genre of a musical piece while moving in a real-world scenario. For this purpose, we assess and compare two state-of-the-art musical genre recognition systems, based on Support Vector Machines and Markov Models, in the context of different real-world acoustic...
This paper presents a modification of a speech emotion recognition system for a social robot. We propose using speaker-dependent classifiers with a prior speaker-identification step. Emotion recognition is done using global acoustic features of the speech. Six speech-signal parameters are computed with specialised software. The feature extraction is based on calculating global statistics of those...
In this paper, an unsupervised adaptation algorithm for the microphone array topology of a humanoid robot is proposed, so that the spatial filtering performance is improved. In the given exemplary case, the target suppression (‘blocking’) performance of a geometrically-constrained BSS (GC-BSS) algorithm is shown to improve by the adaptation of the array topology. As a decisive feature, an online performance...
Blind or visually impaired people want to know more about things they hear in the world. They want to know what other people can “see”. With its cameras, a robot can fill that role. But how can an individual make requests about arbitrary objects they can only hear? How can people make requests about objects they do not know either the exact location of, or any uniquely identifiable traits? This work...
In a previous study, we developed an embodied virtual communication system for human interaction analysis by synthesis in avatar-mediated communication and confirmed the close relationship between speech overlap and the period for activating embodied interaction and communication through avatars. In this paper, we propose an interaction-activated communication model based on the heat conduction equation...
In this paper we present results from a user evaluation of a robot bartender system which handles state uncertainty derived from speech input by using belief tracking and generating appropriate clarification questions. We present a combination of state estimation and action selection components in which state uncertainty is tracked and exploited, and compare it to a baseline version that uses standard...
This research explored whether robots can use modern speech synthesizers to convey emotion with their speech. We investigated the use of MARY, an open source speech synthesizer, to convey a robot's emotional intent to novice robot users. The first experiment indicated that participants were able to distinguish the intended emotions of anger, calm, fear, and sadness with success rates of 65.9%, 68...
Language makes it possible to transfer information between a speaker and a listener, both of whom possess the ability to use it. Using a "speaker-listener" situation, we have compared the verbal and emotional expressions of neurotypical and autistic children aged 6 to 7 years. The speaker was always a child (neurotypical or autistic); the listener was a human InterActor or an InterActor robot, i.e...