The article presents studies on the automatic recognition of whispery speech. The research used a new corpus of whispery speech. It was checked whether an extended set of articulatory units (allophones instead of phonemes) improves the quality of whispery speech recognition. Experimental results show that small changes in the allophone set may provide better...
In this paper we propose a method to evaluate the importance of lecture video segments in online courses. The video is first segmented at slide transitions. We then evaluate the importance of each segment based on our analysis of the teacher's focus. This focus is mainly identified by exploring features in the slides and the speech. Since the whole analysis process is based on multimedia...
The presented work explores the role of pitch-adaptive cepstral features in the context of automatic speech recognition (ASR) of children's speech using acoustic models trained on adults' speech. Because of the large acoustic mismatch between training and test data, highly degraded recognition rates are noted in such cases. Earlier studies have shown that the said acoustic mismatch is aided by the insufficient...
This paper is dedicated to the memory of Steven L. Grant for his exceptional contributions to the echo cancellation problem. Regularization is mandatory in all ill-posed problems, especially in the presence of additive noise. In this paper, we consider the regularized recursive least-squares (RLS) algorithm and present a method to find its regularization parameter, depending on the signal-to-noise...
New information, communication, and mobile technologies empower users to learn anywhere and anytime. Users also need conversational systems that are aware of their mobile context and can adjust to it dynamically. Current research focuses on adaptive conversational systems, especially in the case of mobile learning. This paper presents a comparative study of some related works...
This paper reports on ongoing work in ITU-T Study Group 12 on developing a universal scale quantifying quality across different types of speech communication services. In contrast to a quality rating scale, this scale should be freed from the judgment context as far as possible. As a consequence, it should be possible to compare different types of services on such a scale, in order to justify...
Technical Causes Analysis (P.TCA) is a method for identifying technical causes of sub-optimum speech transmission quality. Originally created as an expert procedure for the annotation of speech samples, its applicability to naïve listeners was also studied. Due to the low agreement of naïve listener annotations, it was suggested that detailed training methods are necessary to lift naïve annotations...
A set of Context Sensitive Grammar (CSG) rules to translate Bangla imperative, optative and exclamatory sentences into English is introduced in this paper. Sentences are classified according to their function and the purpose of the user rather than their structure. Three algorithms are implemented to complete the three major steps of a machine translation system (i.e., parsing, transfer...
A parser plays a very important role in computational linguistics. In this paper, we describe a parsing technique for Bangla grammar recognition. The parser is, by nature, a shift-reduce parser and constructs a parse table based on the LR strategy. It takes the Context Free Grammar (CFG) of the Bangla language as input and constructs the parse table from the grammar. The parse table is visited on bottom-up...
This paper addresses a problem that is of paramount importance in solving crimes wherein voice may be key evidence, or the only evidence: that of describing the perpetrator. The term Forensic anthropometry from voice refers to the deduction of the speaker's physical dimensions from voice. There are multiple studies in the literature that approach this problem in different ways, many of which depend...
This paper introduces a completely automated Arabic phone recognition system based on the Enhanced Wavelet Packets Best Tree Encoding (EWPBTE) 15-point speech feature. WPBTE is enhanced by adding an energy component; the enhancement is implemented in Matlab and improves recognizer accuracy by 65 %, which is the main contribution of this paper. EWPBTE...
Information presented in different languages and structures makes language a barrier in information retrieval. Informative documents on Queen Elizabeth are written in English, a foreign language, which makes it difficult for a Marathi reader to understand and explore the history of England. Along similar lines, literary work on Shivaji is mostly documented in Marathi, which makes it difficult for foreign historians...
This paper describes current work on using a humanoid robot to augment traditional storytelling for educational and entertainment purposes in casual contexts such as the home and the classroom. We explore a novel method of Human-Robot Collaboration (HRC) for storytelling and address the question of how robots may best augment storytelling. In the pilot study, a humanoid robot, Aldebaran's Nao, was programmed...
Previous HRI research has established that trust, disclosure, and a sense of companionship lead to positive outcomes. In this study, we extend existing work by exploring behavioral approaches to increasing these three aspects of HRI. We increased the expressivity and vulnerability of a robot and measured the effects on trust, disclosure, and companionship during human-robot interaction. We engaged...
Text input is an important part of the data annotation process, where text is used to capture ideas and comments. For text entry in immersive virtual environments, for which standard keyboards usually do not work, various approaches have been proposed. While these solutions have mostly proven effective, there still remain certain shortcomings making further investigations worthwhile. Motivated by...
An increasing amount of research is being conducted to determine how a robot tutor should behave socially in educational interactions with children. Both human-human and human-robot interaction literature predicts an increase in learning with increased social availability of a tutor, where social availability has verbal and nonverbal components. Prior work has shown that greater availability in the...
We present initial findings from an experiment where we used Semantic Free Utterances — vocalizations and sounds without semantic content — as an alternative to Natural Language in a child-robot collaborative game. We tested (i) if two types of Semantic Free Utterances could be accurately recognized by the children; (ii) what effect the type of Semantic Free Utterances had as part of help-giving behaviors...
This work analyzes the excitation source to characterize glottal stops using the integrated linear prediction (ILP) residual, derived by a pitch-synchronous (PS) approach. The glottal stop consonant is produced by a laryngeal gesture in the form of a constricted glottis. This pressed glottal configuration leads to period-to-period irregularities, aperiodicity, and asymmetry. Normalized crosscorrelation coefficient...
This work proposes a voice-activity home care system which can construct a life log associated with voices at home. Accordingly, the techniques of sound-pressure-level calculation, abnormal sound detection, noise reduction, text-independent speaker recognition and keyword spotting are developed. In abnormal sound detection and speaker recognition, we adopt the two-stage recognition processes of Gaussian...
This research contends that using the Interactive Media Apps with Talk approach (i.e. chat, WhatsApp, comments, Facebook, tweets, etc.) in or out of the classroom develops learners' psychological and cognitive advancement. Here, Talk means the social-undertaking learning style: a socio-cultural environment of talk and learning that can be developed into activity learning. The activities in learning are...