With information processing and retrieval of spoken documents becoming an important topic, there is a need for systems that perform automatic segmentation of audio streams. Among such algorithms, spoken term discovery allows the extraction of word-like units (terms) directly from the continuous speech signal, in an unsupervised manner and without any knowledge of the language at hand. Since the performance...
This paper proposes a framework for developing an automatic annotation tool for Romanian prosody in spontaneous and read speech, together with a set of acoustic cues at the prosodic word level that are necessary to accurately discriminate prosodic phrases. Even though many approaches have considered the silent pause an important acoustic cue in the automatic detection of prosodic phrase boundaries, our...
This paper presents a study exploring word-level break-type formation rules for Mandarin read speech. A 4-layer hierarchical structure with seven break types is adopted to represent the prosody of an utterance. The work is based on the break-type tags labeled on a large read-speech database by the previously proposed prosody labeling and modeling (PLM) algorithm. Occurrence frequencies of seven break...
The paper presents information packaging structures in Romanian utterances with the contrast relation, by decomposing them into hierarchies of embedded communicative units. At any level of the hierarchy, communicative units are structured by two or three functional constituents, each having text and a melodic contour. Communicative unit constituents are functional elements at the information...
Perceptually distinguishing between the Mandarin alveolar nasal coda [n] and the velar nasal coda [ŋ] is difficult for native Japanese speakers learning Chinese as a second language (CSL). Discovering relations between acoustic cues and perceptual responses is important for studying CSL acquisition and computer-aided pronunciation teaching. In order to investigate the influence of nasal coda length on nasal perception...
To better understand the difficulties that second language (L2) learners have in identifying Japanese geminate/singleton consonants, a perceptual factor is newly introduced to compensate for the insufficiencies of conventional explanations based solely on acoustic duration differences. To systematically explain serious speech-rate-related errors of geminate/singleton identification in fast/slow speech, loudness...
Focus is usually considered to serve a communicative function in discourse, and each language has its own ways of realizing it. This paper compares focus realization in the Jinan and Taiyuan dialects. It aims to investigate the similarities and differences in focus realization by examining the variations of mean F0, duration and intensity in both focused and unfocused conditions between these...
Context-dependent pronunciation, e.g. of homographs, is a difficult issue in grapheme-to-phoneme (G2P) conversion. It degrades accuracy in speech synthesis and speech recognition. However, context-dependent pronunciation is rarely considered when collecting pronunciation corpora for evaluating G2P accuracy. Thus, this paper proposes a context-dependent pronunciation corpus using grapheme-phoneme...
This plenary presents automatic speech recognition (ASR) as a task of artificial intelligence. It covers the basis and methodology of ASR, spectral processing, distance measures for speech, speech segmentation, spectral and temporal variability, the application of Markov models, noise robustness, and language models for ASR.
In order for robots to work alongside humans in a range of domains, they will need to operate with the variety of social dynamics that each context requires. This paper builds on previous work with a parameterized turn-taking model, CADENCE, in which different parameter settings resulted in different social dynamics. In contrast to the static parameter settings of previous work, we now investigate...
An objective of an autonomous sociable robot is to meet the needs and preferences of a human user. However, this can sometimes come at the expense of the robot's own ability to understand social signals produced by the user. In particular, human preferences for distance (proxemics) to the robot can have a significant impact on the performance rates of its automated speech and gesture recognition systems...
Statistical aspects of Marko Cheremshyna's idiolect are one of the main research focuses of the applied linguistics department. The research covers letter frequency, word length, the number and percentage of words of different parts of speech, the most frequent content words and bigrams, and the frequency of character combinations in text. In this article we outline the part-of-speech aspect of our research. Some statistic...
Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far this has been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for automatic and accurate measurement of vowel duration, where the input to the algorithm is a speech segment containing one vowel preceded and followed by consonants (CVC). Our algorithm...
Human emotional expression tends to evolve in a structured manner, in the sense that certain emotional evolution patterns, e.g., anger to anger, are more probable than others, e.g., anger to happiness. Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future observations could offer relevant temporal context...
Conversational skills training is becoming popular nowadays, but it is often very hard to obtain because of its expense and lack of accessibility. In this paper, we present the idea of an automated conversational skills training assistant, which provides both real-time and post-conversation summary feedback while the user converses with a virtual agent. Our exploratory effort shows the applicability of this system and significant...
Thanks to its remarkable ability to convey amusement and engagement, laughter is one of the most important social markers in human interactions. Laughing together can help to set up a positive atmosphere and favors the creation of new relationships. This paper presents a data collection of social interaction dialogs involving humor between a human participant and a robot. In this work,...
This paper focuses on using bigrams for topic determination in a speech synthesizer. It explains a modular architecture for the speech synthesizer and the importance of context analysis for customizing synthesized speech and enhancing its quality. Bigrams carry information about context, and this work shows how to use them to improve the identification of the theme. At...
In this paper, we address the problem of automatic detection of engagement in multi-party Human-Robot Interaction scenarios. The aim is to investigate to what extent we are able to infer the engagement of one of the entities of a group based solely on the cues of the other entities present in the interaction. In a scenario featuring 3 entities (2 participants and a robot), we extract behavioural...
Speech is widely used to express one's emotions, intentions, desires, etc. in social network communication, producing an abundance of internet speech data with different speaking styles. Such data provides a good resource for social multimedia research. However, because different styles are mixed together in internet speech data, how to classify such data remains a challenging problem. In previous...
Probabilistic models are often viewed as insufficiently expressive because of strong limitations and assumptions on the probability distribution and the fixed model complexity. Bayesian nonparametric learning pursues an expressive probabilistic representation based on nonparametric prior and posterior distributions, with a less assumption-laden approach to inference. This paper presents a hierarchical...