chapter

Exploring multi-language resources for unsupervised spoken term discovery

Bogdan Ludusan, Alexandru Caranica, Horia Cucu, Andi Buzo, et al.

2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), pp. 1–6

With the processing and retrieval of spoken documents becoming an important topic, there is a need for systems that automatically segment audio streams. Among such algorithms, spoken term discovery allows the extraction of word-like units (terms) directly from the continuous speech signal, in an unsupervised manner and without any knowledge of the language at hand. Since the performance...

chapter

Towards prosodic phrasing of spontaneous and reading speech for Romanian corpora

Vasile Apopei, Otilia Paduraru

2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), pp. 1–4

This paper proposes a framework for developing an automatic annotation tool for Romanian prosody in spontaneous and read speech, together with a set of acoustic cues at the prosodic-word level needed to accurately discriminate prosodic phrases. Even though many approaches have considered the silent pause an important acoustic cue in the automatic detection of prosodic phrase boundaries, our...

chapter

On finding word-level break-type formation rules for Mandarin read speech

Fu-Ja Kung, Pa-Hwa Lee, Yih-Ru Wang, Sin-Horng Chen, et al.

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), pp. 53–57

This paper presents a study exploring word-level break-type formation rules for Mandarin read speech. A four-layer hierarchical structure with seven break types is adopted to represent the prosody of an utterance. The work is based on break-type tags labeled on a large read-speech database by the previously proposed prosody labeling and modeling (PLM) algorithm. Occurrence frequencies of seven break...

chapter

Information structure in Romanian utterances with contrast relations

Doina Jitca

2015 International Conference on Speech Technology and Human-Computer Dialogue (SpeD), pp. 1–6

The paper presents information packaging structures in Romanian utterances with the contrast relation, by decomposing them into hierarchies of embedded communicative units. At any level of the hierarchy, communicative units are structured by two or three functional constituents, each having a text and a melodic contour. Communicative unit constituents are functional elements at the information...

chapter

Coda's duration on perception of Mandarin syllables with alveolar/velar nasal endings by Japanese CSL learners

Xijing Luo, Jinsong Zhang, Zuyan Wang, Hang Wang

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), pp. 150–154

Perceptually distinguishing the Mandarin alveolar nasal coda [n] from the velar nasal coda [ŋ] is difficult for native Japanese speakers learning Chinese as a second language (CSL). Discovering the relations between acoustic cues and perceptual responses is important for studying CSL acquisition and for computer-aided pronunciation teaching. In order to investigate the influence of nasal coda length on nasal perception...

chapter

Analysis on L2 learners' perception errors between geminate and singleton of Japanese consonants using loudness related parameters

Yanlong Zhang, Mee Sonu, Hiroaki Kato, Yoshinori Sagisaka

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), pp. 186–189

To better understand why second language (L2) learners have difficulty identifying Japanese geminate/singleton consonants, a perceptual factor is newly introduced to address the shortcomings of conventional explanations based solely on acoustic duration differences. To systematically explain serious speech-rate-related errors of geminate/singleton identification in fast/slow speech, loudness...

chapter

Contrastive study of focus phonetic realization between Jinan dialect and Taiyuan dialect

Duan Wenjun, Jia Yuan

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), pp. 47–52

Focus is usually considered to bear a communicative function in discourse, and each language has its own ways of realizing it. This paper compares the focus realization of the Jinan dialect and the Taiyuan dialect. It aims to investigate the similarities and differences in focus realization by examining the variations of mean F0, duration and intensity in both focused and unfocused conditions between these...

chapter

Context-dependent grapheme-to-phoneme evaluation corpus using flexible contexts and Categorial Matrix

Chatchawarn Hansakunbuntheung, Sumonmas Thatphithakkul

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE), pp. 160–165

Context-dependent pronunciation, e.g. of homographs, is a difficult issue in grapheme-to-phoneme (G2P) conversion. It degrades accuracy in speech synthesis and speech recognition. However, context-dependent pronunciation is rarely considered when collecting pronunciation corpora for evaluating G2P accuracy. Thus, this paper proposes a context-dependent pronunciation corpus using grapheme-phoneme...

chapter

Automatic speech recognition

Douglas O'Shaughnessy

2015 CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON), pp. 417–424

This plenary talk presents automatic speech recognition (ASR) as a task of artificial intelligence. It covers the basis and methodology of ASR, spectral processing, distance measures for speech, speech segmentation, spectral and temporal variability, the application of Markov models, noise robustness, and language models for ASR.

chapter

Real-time changes to social dynamics in human-robot turn-taking

Justin S. Smith, Crystal Chao, Andrea L. Thomaz

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 3024–3029

In order for robots to work alongside humans in a range of domains, they will need to operate with a variety of social dynamics that each context will require. This paper builds on previous work with a parameterized turn-taking model, CADENCE, in which different parameter settings resulted in different social dynamics. In contrast to the static parameter settings of previous work, we now investigate...

chapter

Proxemics and performance: Subjective human evaluations of autonomous sociable robot distance and social signal understanding

Ross Mead, Maja J Mataric

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5984–5991

An objective of an autonomous sociable robot is to meet the needs and preferences of a human user. However, this can sometimes come at the expense of the robot's own ability to understand social signals produced by the user. In particular, human distance preferences (proxemics) toward the robot can have a significant impact on the performance rates of its automated speech and gesture recognition systems...

chapter

Statistics of parts of speech frequencies in Marko Cheremshyna's works

Ihor Kulchytskyy

2015 Xth International Scientific and Technical Conference "Computer Sciences and Information Technologies" (CSIT), pp. 209–211

The statistical aspects of Marko Cheremshyna's idiolect are one of the main research focuses of the applied linguistics department. They include letter frequency, word length, the number and percentage of words of different parts of speech, the most frequent content words and bigrams, and the frequency of character combinations in the text. In this article we outline the part-of-speech aspect of our research. Some statistic...

chapter

Vowel duration measurement using deep neural networks

Yossi Adi, Joseph Keshet, Matthew Goldrick

2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6

Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far such studies have been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for automatic, accurate measurement of vowel duration, where the input to the algorithm is a speech segment containing one vowel preceded and followed by consonants (CVC). Our algorithm...

chapter

Context-sensitive learning for enhanced audiovisual emotion classification (Extended abstract)

Angeliki Metallinou, Athanasios Katsamanis, Martin Wöllmer, Florian Eyben, et al.

2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 463–469

Human emotional expression tends to evolve in a structured manner, in the sense that certain emotional evolution patterns, e.g., anger to anger, are more probable than others, e.g., anger to happiness. Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future observations could offer relevant temporal context...

chapter

Automated conversation skills assistant

Mohammad Rafayet Ali

2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 760–765

Conversational skills training is becoming popular nowadays but is often hard to obtain because of its expense and lack of accessibility. In this paper, we present the idea of an automated conversational skills training assistant, which provides both real-time feedback and post-conversation summary feedback while the user converses with a virtual agent. Our exploratory effort shows the applicability of this system and significant...

chapter

Multimodal data collection of human-robot humorous interactions in the Joker project

Laurence Devillers, Sophie Rosset, Guillaume Dubuisson Duplessis, Mohamed A. Sehili, et al.

2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 348–354

Thanks to its remarkable ability to convey amusement and engagement, laughter is one of the most important social markers in human interactions. Laughing together can help create a positive atmosphere and favor the creation of new relationships. This paper presents a data collection of social interaction dialogs involving humor between a human participant and a robot. In this work,...

chapter

Context analysis using bigrams

M. Spilka, G. Rozinaj, R. Rybarova

2015 IEEE 19th International Conference on Intelligent Engineering Systems (INES), pp. 401–404

This paper focuses on using bigrams for topic determination in a speech synthesizer. It explains a modular architecture for the speech synthesizer and the importance of context analysis for customizing synthesized speech and enhancing its quality. Bigrams carry information about context, and this work shows how to use them to improve topic identification. At...

chapter

Engagement detection based on multi-party cues for human robot interaction

Hanan Salam, Mohamed Chetouani

2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 341–347

In this paper, we address the problem of automatically detecting engagement in multi-party Human-Robot Interaction scenarios. The aim is to investigate to what extent we are able to infer the engagement of one of the entities of a group based solely on the cues of the other entities present in the interaction. In a scenario featuring three entities (two participants and a robot), we extract behavioural...

chapter

Understanding speaking styles of internet speech data with LSTM and low-resource training

Xixin Wu, Zhiyong Wu, Yishuang Ning, Jia Jia, et al.

2015 International Conference on Affective Computing and Intelligent Interaction (ACII), pp. 815–820

Speech is widely used to express emotions, intentions, desires, etc. in social network communication, producing an abundance of internet speech data with different speaking styles. Such data provides a good resource for social multimedia research. However, because different styles are mixed together in internet speech data, classifying such data remains a challenging problem. In previous...

article

Hierarchical Pitman–Yor–Dirichlet Language Model

Jen-Tzung Chien

IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 23, no. 8, pp. 1259–1272, 2015

Probabilistic models are often viewed as insufficiently expressive because of strong limitations and assumptions on the probability distribution and the fixed model complexity. Bayesian nonparametric learning pursues an expressive probabilistic representation based on nonparametric prior and posterior distributions, with a less assumption-laden approach to inference. This paper presents a hierarchical...

INFONA - science communication portal
