Search results

chapter

F₀ estimation of speech based on IRAPT using WLP-based TV-CAR analysis

Wei Shan, Keiichi Funaki

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 4

Fundamental frequency (F₀) estimation plays an important role in speech processing tasks such as speech coding, synthesis, and recognition. Although current F₀ estimation methods perform well under clean conditions, their performance deteriorates significantly in noisy environments. For this reason, F₀ estimation that is robust against additive noise is in demand. We have previously proposed F₀ estimation methods...

chapter

DBLSTM-based multi-task learning for pitch transformation in voice conversion

Runnan Li, Zhiyong Wu, Helen Meng, Lianhong Cai

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

While both spectral and prosody transformation are important for voice conversion (VC), traditional methods have focused on the conversion of spectral features, with less emphasis on prosody transformation. This paper presents a novel pitch transformation method for VC. As the correlation between spectral features and fundamental frequency in pitch perception has been demonstrated, a well-converted spectrum should...

chapter

Discourse prosody and its application to speech synthesis

Na Hu, Pengfei Shao, Yiqing Zu, Zuyan Wang, more

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

This paper reveals the correlations between discourse structure and acoustic parameters and presents a method of manipulating discourse prosody in relation to discourse structure to improve the naturalness of synthesized speech. The text material included 1229 passages. The texts were annotated using Rhetorical Structure Theory. Prosody measurements were extracted from the corresponding speech annotation...

chapter

Acoustic correlates and gender effects in production and perception of Japanese polite speech

Shi Shuju, Tsurutani Chiharu, Feng Xiaoli, Zhang Jinsong, more

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

This study examines the potential contribution of prosodic features and voice quality to the perception and production of Japanese polite speech, as well as possible gender effects in politeness strategy. We first recorded speech from 10 native Japanese speakers (5 male, 5 female) under polite and non-polite settings with identical texts. A perceptual experiment was then conducted to rate the politeness...

chapter

Tongue shape variation model for simulating Mandarin Chinese articulation

Jinguang Zhang, Xiyu Wu, Jiangping Kong

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

We studied tongue shapes extracted from X-ray films taken during Mandarin Chinese articulation. Through factor analysis, we built an eight-parameter-driven tongue articulation model. This study reveals that the front of the tongue has large horizontal movement; the blade of the tongue has large vertical movement; whereas the back, as well as the root, of the tongue has small...

chapter

Vocal tract and voice source features for monitoring cognitive workload

Manuela Meier, Michal Borsky, Eydis H. Magnusdottir, Kamilla R. Johannsdottir, more

2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) > 97 - 102

Monitoring cognitive workload from speech signals has received a lot of attention from researchers in the past few years as it has the potential to improve performance and fidelity in human decision making. The bulk of the research has focused on classifying speech from talkers participating in cognitive workload experiments using simple reading tasks, memory span tests and the Stroop test, typically...

chapter

Atom decomposition based stress detection and automatic phrasing of speech

Mate Akos Tundik, Branislav Gerazov, Aleksandar Gjoreski, Gyorgy Szaszak

2016 7th IEEE International Conference on Cognitive Infocommunications (CogInfoCom) > 25 - 30

The Weighted Correlation based Atom Decomposition (WCAD) is a recently proposed physiological intonation model that decomposes the pitch contour into elementary components — atoms. Since these atoms are said to correspond to laryngeal muscle activation, in theory they could be used to infer higher linguistic meaning from the pitch contour. One such application relevant for cognitive infocommunication...

chapter

Analysis of the dependencies between parameters of the voice at the context of the succession of sung vowels

Edward Polrolniczak, Michal Kramarczyk

2016 Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA) > 72 - 77

The article presents the results of signal analysis of recorded singing voice samples. For this study, the recorded samples of the “a-e-i-o-u” exercise are analysed. Some significant parameters describing the voice have been estimated. Among the estimated parameters are: pitch, calculated with the use of the autocorrelation method, values of the first five harmonics, set of parameters containing first five...

chapter

Solving permutation problem with a cascade combination of phase difference entropy and power spectral correlation

Masahito Togami, Ryoichi Takashima, Yusuke Fujita

2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC) > 1 - 4

In this paper, we propose a novel method to solve the permutation problem in multi-channel frequency-domain blind source separation. To handle the low spectral correlation between lower and higher frequencies, the proposed method utilizes phase difference information between microphones so as to avoid incorrect permutation alignment, in addition to power spectral information...

chapter

Late reverberation PSD estimation for single-channel dereverberation using relative convolutive transfer functions

Sebastian Braun, Boaz Schwartz, Sharon Gannot, Emanuel A. P. Habets

2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC) > 1 - 5

The estimation accuracy of the late reverberation power spectral density (PSD) is of paramount importance in single-channel frequency-domain dereverberation algorithms. In this domain, the reverberant signal can be modeled by the convolution of an early speech component and a relative convolutive transfer function (RCTF). In this work, the RCTF coefficients are modeled by a first-order Markov chain,...

chapter

A multiframe parametric Wiener filter for acoustic echo suppression

Hai Huang, Christian Hofmann, Walter Kellermann, Jingdong Chen, more

2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC) > 1 - 5

Acoustic echo arises due to the acoustic coupling between the loudspeaker and the microphone in a full-duplex voice communication device. How to reduce or eliminate echo has been an important problem in voice communications. This paper deals with this problem in the short-time Fourier transform (STFT) domain. An approach to acoustic echo suppression (AES) is developed, which uses a linear filter in...

chapter

Multi-speaker DOA estimation in reverberation conditions using expectation-maximization

Ofer Schwartz, Yuval Dorfan, Emanuel A.P. Habets, Sharon Gannot

2016 IEEE International Workshop on Acoustic Signal Enhancement (IWAENC) > 1 - 5

A novel direction of arrival (DOA) estimator for concurrent speakers in reverberant environments is presented. Reverberation, if not properly addressed, is known to degrade the performance of DOA estimators. In our contribution, the DOA estimation task is formulated as a maximum likelihood (ML) problem, which is solved using the expectation-maximization (EM) procedure. The received microphone signals...

chapter

Toward a brain interface for tracking attended auditory sources

Marzieh Haghighi, Mohammad Moghadamfalahi, Hooman Nezamfar, Murat Akcakaya, more

2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) > 1 - 5

Auditory-evoked noninvasive electroencephalography (EEG) based brain-computer interfaces (BCIs) could be useful for improved hearing aids in the future. This manuscript investigates the role of frequency and spatial features of the audio signal in EEG activity in an auditory BCI system, with the purpose of detecting the attended auditory source in a cocktail party setting. A cross correlation based feature...

chapter

Two-Layer Decision Model Based on Noise Classification

Liu Tingting, Kang Kai, Chou Li

2016 International Conference on Robots & Intelligent System (ICRIS) > 469 - 472

Generally, the performance of endpoint detection is affected by noise. In this paper, we propose a novel two-layer decision model based on noise classification to detect voice activity robustly. The training process mainly contains two steps: firstly, we employ the NOISEX-92 database, which consists of different types of pure noise, to train a BP neural network in order to classify the...

chapter

Continuous fundamental frequency prediction with deep neural networks

Balint Pal Toth, Tamas Gabor Csapo

2016 24th European Signal Processing Conference (EUSIPCO) > 1348 - 1352

Deep learning is proven to outperform other machine learning methods in numerous research fields. However, previous approaches, like multispace probability distribution hidden Markov models still surpass deep learning methods in the prediction accuracy of speech fundamental frequency (F0), inter alia, due to its discontinuous behavior. The current research focuses on the application of feedforward...

chapter

Mimicry in online conversations: An exploratory study of linguistic analysis techniques

Tom Carrick, Awais Rashid, Paul J Taylor

2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) > 732 - 736

A number of computational techniques have been proposed that aim to detect mimicry in online conversations. In this paper, we investigate how well these reflect the prevailing cognitive science model, i.e. the Interactive Alignment Model. We evaluate Local Linguistic Alignment, word vectors, and Language Style Matching and show that these measures tend to show the features we expect to see in the...

chapter

An approach to evaluation index and model of undergraduates' spoken English pronunciation

Xin-guang Li, Jia-hua Chen, Zhen Chen, Yingni Chen

2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) > 2189 - 2193

2016 12th International Conference on Natural Computation and 13th Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)

Studying and implementing a computer evaluation system for spoken English pronunciation is important for helping learners improve their spoken English. This paper introduces an undergraduate-oriented evaluation model of spoken English pronunciation and its related system, with four evaluation parameters: accuracy, speed, rhythm and intonation. This paper illustrates the necessity of each evaluation index,...

chapter

Towards direct speech synthesis from ECoG: A pilot study

Christian Herff, Garett Johnson, Lorenz Diener, Jerry Shih, more

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 1540 - 1543

Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real-time...

chapter

Cross-frequency coupling during auditory perception in human cortex

Urszula Malinowska, Dana Boatman-Reich

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 5521 - 5524

Cross-frequency coupling plays an important role in coordinating neuronal computations underlying human perception, learning and memory. Here we compared four methods for measuring phase/amplitude coupling (PAC) of theta (4–7 Hz) and high-gamma (70–150 Hz) in intracranial electrocorticographic (ECoG) recordings. Time-frequency spectral and time-domain evoked responses were derived for comparison....

chapter

Adaptive attention-driven speech enhancement for EEG-informed hearing prostheses

Neetha Das, Simon Van Eyndhoven, Tom Francart, Alexander Bertrand

2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) > 77 - 80

State-of-the-art hearing prostheses are equipped with acoustic noise reduction algorithms to improve speech intelligibility. Currently, one of the major challenges is to perform acoustic noise reduction in so-called cocktail party scenarios with multiple speakers, in particular because it is difficult, if not impossible, for the algorithm to determine which are the target speaker(s) that should be enhanced,...

INFONA - science communication portal
