Search results

chapter

Coda's duration on perception of Mandarin syllables with alveolar/velar nasal endings by Japanese CSL learners

Xijing Luo, Jinsong Zhang, Zuyan Wang, Hang Wang

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) > 150 - 154

Perceptually distinguishing between the Mandarin alveolar nasal coda [n] and the velar nasal coda [ŋ] is difficult for native Japanese speakers learning Chinese as a second language (CSL). Discovering relations between acoustic cues and perceptual responses is important for studying CSL acquisition and computer-aided pronunciation teaching. In order to investigate the influences of nasal coda's lengths on nasal perception...

chapter

Analysis on L2 learners' perception errors between geminate and singleton of Japanese consonants using loudness related parameters

Yanlong Zhang, Mee Sonu, Hiroaki Kato, Yoshinori Sagisaka

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) > 186 - 189

To better understand why second language (L2) learners have difficulty identifying Japanese geminate/singleton consonants, a perceptual factor is newly introduced to supplement conventional explanations that rely solely on acoustic duration differences. To systematically explain serious speech-rate-related errors of geminate/singleton identification in fast/slow speech, loudness...

chapter

Contrastive study of focus phonetic realization between Jinan dialect and Taiyuan dialect

Duan Wenjun, Jia Yuan

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) > 47 - 52

Focus is usually considered to bear a communicative function in discourse, and each language has its own ways of realizing it. This paper compares focus realization in the Jinan and Taiyuan dialects. It aims to investigate the similarities and differences in focus realization by examining the variations of mean F0, duration and intensity in both focused and unfocused conditions between these...

chapter

Context-dependent grapheme-to-phoneme evaluation corpus using flexible contexts and Categorial Matrix

Chatchawarn Hansakunbuntheung, Sumonmas Thatphithakkul

2015 International Conference Oriental COCOSDA held jointly with 2015 Conference on Asian Spoken Language Research and Evaluation (O-COCOSDA/CASLRE) > 160 - 165

Context-dependent pronunciation, e.g. of homographs, is a difficult issue in grapheme-to-phoneme conversion (G2P). It degrades accuracy in speech synthesis and speech recognition. However, context-dependent pronunciation is rarely considered when collecting pronunciation corpora for evaluating G2P accuracy. Thus, this paper proposes a context-dependent pronunciation corpus using grapheme-phoneme...

chapter

Automatic speech recognition

Douglas O'Shaughnessy

2015 CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication Technologies (CHILECON) > 417 - 424

This plenary presents automatic speech recognition (ASR) as a task of artificial intelligence. It covers the basis, the methodology, spectral processing, distance measures for speech, speech segmentation, spectral and temporal variability, the application of Markov models, noise robustness, and language models for ASR.

chapter

Real-time changes to social dynamics in human-robot turn-taking

Justin S. Smith, Crystal Chao, Andrea L. Thomaz

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) > 3024 - 3029

In order for robots to work alongside humans in a range of domains, they will need to operate with a variety of social dynamics that each context will require. This paper builds on previous work with a parameterized turn-taking model, CADENCE, in which different parameter settings resulted in different social dynamics. In contrast to the static parameter settings of previous work, we now investigate...

chapter

Proxemics and performance: Subjective human evaluations of autonomous sociable robot distance and social signal understanding

Ross Mead, Maja J Mataric

2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) > 5984 - 5991

An objective of an autonomous sociable robot is to meet the needs and preferences of a human user. However, this can sometimes be at the expense of the robot's own ability to understand social signals produced by the user. In particular, human preferences of distance (proxemics) to the robot can have significant impact on the performance rates of its automated speech and gesture recognition systems...

chapter

Statistics of parts of speech frequencies in Marko Cheremshyna's works

Ihor Kulchytskyy

2015 Xth International Scientific and Technical Conference "Computer Sciences and Information Technologies" (CSIT) > 209 - 211

Statistical aspects of Marko Cheremshyna's idiolect are one of the main research focuses of the applied linguistics department. They include letter frequency, word length, the number and percentage of words of different parts of speech, the most frequent content words and bigrams, and the frequency of character combinations in text. In this article we outline the part-of-speech aspect of our research. Some statistic...

chapter

Vowel duration measurement using deep neural networks

Yossi Adi, Joseph Keshet, Matthew Goldrick

2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP) > 1 - 6

Vowel durations are most often utilized in studies addressing specific issues in phonetics. Thus far this has been hampered by a reliance on subjective, labor-intensive manual annotation. Our goal is to build an algorithm for automatic, accurate measurement of vowel duration, where the input to the algorithm is a speech segment that contains one vowel preceded and followed by consonants (CVC). Our algorithm...

chapter

Context-sensitive learning for enhanced audiovisual emotion classification (Extended abstract)

Angeliki Metallinou, Athanasios Katsamanis, Martin Wollmer, Florian Eyben, et al.

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 463 - 469

Human emotional expression tends to evolve in a structured manner in the sense that certain emotional evolution patterns, e.g., anger to anger, are more probable than others, e.g., anger to happiness. Furthermore, the perception of an emotional display can be affected by recent emotional displays. Therefore, the emotional content of past and future observations could offer relevant temporal context...

chapter

Automated conversation skills assistant

Mohammad Rafayet Ali

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 760 - 765

Conversational skills training is becoming popular nowadays but is often very hard to get due to expense and lack of accessibility. In this paper, we present the idea of an automated conversational skills training assistant, which provides both real-time and post-summary feedback while the user has a conversation with a virtual agent. Our exploratory effort shows the applicability of this system and significant...

chapter

Multimodal data collection of human-robot humorous interactions in the Joker project

Laurence Devillers, Sophie Rosset, Guillaume Dubuisson Duplessis, Mohamed A. Sehili, et al.

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 348 - 354

Thanks to a remarkably great ability to show amusement and engagement, laughter is one of the most important social markers in human interactions. Laughing together can actually help to set up a positive atmosphere and favors the creation of new relationships. This paper presents a data collection of social interaction dialogs involving humor between a human participant and a robot. In this work,...

chapter

Context analysis using bigrams

M. Spilka, G. Rozinaj, R. Rybarova

2015 IEEE 19th International Conference on Intelligent Engineering Systems (INES) > 401 - 404

This paper focuses on using bigrams for topic determination in a speech synthesizer. It explains a modular architecture for the speech synthesizer and the importance of context analysis for customizing synthesized speech and enhancing its quality. Bigrams carry information about context, and this work shows how to use them to improve the identification of the theme. At...

chapter

Engagement detection based on mutli-party cues for human robot interaction

Hanan Salam, Mohamed Chetouani

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 341 - 347

In this paper, we address the problem of automatically detecting engagement in multi-party Human-Robot Interaction scenarios. The aim is to investigate to what extent we are able to infer the engagement of one of the entities of a group based solely on the cues of the other entities present in the interaction. In a scenario featuring 3 entities (2 participants and a robot), we extract behavioural...

chapter

Understanding speaking styles of internet speech data with LSTM and low-resource training

Xixin Wu, Zhiyong Wu, Yishuang Ning, Jia Jia, et al.

2015 International Conference on Affective Computing and Intelligent Interaction (ACII) > 815 - 820

Speech is widely used to express one's emotion, intention, desire, etc. in social network communication, producing abundant internet speech data with different speaking styles. Such data provide a good resource for social multimedia research. However, because different styles are mixed together in the internet speech data, classifying such data remains a challenging problem. In previous...

chapter

Harmonic model for MDCT based audio coding with LPC envelope

Takehiro Moriya, Yutaka Kamamoto, Noboru Harada, Tom Backstrom, et al.

2015 23rd European Signal Processing Conference (EUSIPCO) > 789 - 793

Conventional music coders based on a modified discrete cosine transform (MDCT) suffer greatly when their bit-rate and delay are lowered. In particular, tonal music signals are penalized by short analysis windows, and the variable length coding of the quantized MDCT coefficients demands a significant number of bits for coding the harmonic structure. To solve this issue, the paper proposes a frequency-domain...

chapter

Local trajectory based speech enhancement for robust speech recognition with deep neural network

Yongbin You, Yanmin Qian, Kai Yu

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) > 5 - 9

Deep neural networks (DNNs) have achieved great success in automatic speech recognition (ASR), and a DNN can be regarded as a joint model combining a nonlinear feature transformation and a log-linear classifier. Recently, DNNs have been adopted as regression models to enhance distorted features in noisy conditions, and the enhanced features are used to improve the performance of DNN-based ASR. Previous work...

chapter

Detecting synthetic speech using long term magnitude and phase information

Xiaohai Tian, Steven Du, Xiong Xiao, Haihua Xu, et al.

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) > 611 - 615

Synthetic speech refers to speech signals generated by text-to-speech (TTS) and voice conversion (VC) techniques. Such signals pose a threat to speaker verification (SV) systems, as an attacker may use TTS or VC to synthesize a speaker's voice and cheat the SV system. To address this challenge, we study the detection of synthetic speech using long term magnitude and phase information of speech. As most of...

chapter

Data-driven pause prediction for synthesis of storytelling style speech based on discourse modes

Parakrant Sarkar, K. Sreenivasa Rao

2015 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) > 1 - 5

In storytelling style, a storyteller generally uses prosodic variations with subtle speech nuances to help listeners follow the story. This is achieved by emphasizing prominent words, using various emotions, mimicking voices and providing appropriate pauses. This work is part of building Story Text-to-Speech (TTS) [1] synthesis systems in Indian languages, which aims at synthesizing...

chapter

Sentiment analysis from product reviews using SentiWordNet as lexical resource

Alexandra Cernian, Valentin Sgarciu, Bogdan Martin

2015 7th International Conference on Electronics, Computers and Artificial Intelligence (ECAI) > WE-15 - WE-18

In the current social, technological and economic context, customers make their decisions based mostly on the opinions of other consumers. On the other hand, companies need quick feedback from their customers in order to adapt to their needs in real time. The effective connection between these two aspects relies on opinion mining tools, which automatically process consumers' reviews and opinions about...

INFONA - science communication portal
