Fundamental frequency (F0) estimation plays an important role in speech processing tasks such as speech coding, synthesis and recognition. Although current F0 estimation methods perform well under clean conditions, their performance deteriorates significantly in noisy environments. For this reason, F0 estimation that is robust against additive noise is needed. We have previously proposed F0 estimation methods...
While both spectral and prosody transformation are important for voice conversion (VC), traditional methods have focused on the conversion of spectral features, with less emphasis on prosody transformation. This paper presents a novel pitch transformation method for VC. As the correlation between spectral features and fundamental frequency in pitch perception has been demonstrated, a well-converted spectrum should...
This paper reveals the correlations between discourse structure and acoustic parameters and presents a method of manipulating discourse prosody in relation to discourse structure to improve the naturalness of synthesized speech. The text material included 1229 passages. The texts were annotated using Rhetorical Structure Theory. Prosody measurements were extracted from the corresponding speech annotation...
This study examines the potential contribution of prosodic features and voice quality to the perception and production of Japanese polite speech, as well as possible gender effects in politeness strategy. We first recorded speech from 10 native Japanese speakers (5 male, 5 female) in polite and non-polite settings using identical texts. A perceptual experiment was then conducted to rate the politeness...
We studied tongue shapes extracted from X-ray films taken during Mandarin Chinese articulation. Through factor analysis, we built an eight-parameter-driven tongue articulation model. This study reveals that the front of the tongue has large horizontal movement and the blade of the tongue has large vertical movement, whereas the back, as well as the root, of the tongue has small...
Monitoring cognitive workload from speech signals has received a lot of attention from researchers in the past few years as it has the potential to improve performance and fidelity in human decision making. The bulk of the research has focused on classifying speech from talkers participating in cognitive workload experiments using simple reading tasks, memory span tests and the Stroop test, typically...
The Weighted Correlation based Atom Decomposition (WCAD) is a recently proposed physiological intonation model that decomposes the pitch contour into elementary components — atoms. Since these atoms are said to correspond to laryngeal muscle activation, in theory they could be used to infer higher linguistic meaning from the pitch contour. One such application relevant for cognitive infocommunication...
The article presents the results of signal analysis of recorded singing voice samples. For this study, recordings of the “a-e-i-o-u” exercise are analysed. Several significant parameters describing the voice have been estimated. Among the estimated parameters are: pitch, calculated using the autocorrelation method; the values of the first five harmonics; a set of parameters containing the first five...
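As an illustration of the autocorrelation approach to pitch named above, here is a minimal Python sketch, not the authors' implementation. It assumes a single frame of mono samples given as a list of floats; the function name `autocorr_pitch` and the 60 to 1000 Hz search range (`f_min`, `f_max`) are hypothetical choices for singing voice, not values taken from the article.

```python
import math


def autocorr_pitch(signal, sample_rate, f_min=60.0, f_max=1000.0):
    """Estimate F0 in Hz of one frame as the lag with maximal autocorrelation.

    Hypothetical helper: only lags corresponding to f_max..f_min are searched.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the DC offset
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # raw (biased) autocorrelation: longer lags sum fewer terms
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 100 * i / fs) for i in range(800)]
    print(autocorr_pitch(tone, fs))
```

Keeping the raw (biased) autocorrelation, instead of dividing each lag by its number of terms, slightly penalises long lags and so reduces the tendency to pick a multiple of the true period (an octave error). Practical estimators add refinements such as peak interpolation and voicing decisions.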
In this paper, we propose a novel method to solve the permutation problem in multi-channel frequency-domain blind source separation. To cope with the low spectral correlation between lower and higher frequencies, the proposed method uses phase difference information between microphones to avoid incorrect permutation alignment, in addition to power spectral information...
The estimation accuracy of the late reverberation power spectral density (PSD) is of paramount importance in single-channel frequency-domain dereverberation algorithms. In this domain, the reverberant signal can be modeled by the convolution of an early speech component and a relative convolutive transfer function (RCTF). In this work, the RCTF coefficients are modeled by a first-order Markov chain,...
Acoustic echo arises from the acoustic coupling between the loudspeaker and the microphone in a full-duplex voice communication device. Reducing or eliminating echo has been an important problem in voice communications. This paper deals with this problem in the short-time Fourier transform (STFT) domain. An approach to acoustic echo suppression (AES) is developed, which uses a linear filter in...
A novel direction of arrival (DOA) estimator for concurrent speakers in a reverberant environment is presented. Reverberation, if not properly addressed, is known to degrade the performance of DOA estimators. In our contribution, the DOA estimation task is formulated as a maximum likelihood (ML) problem, which is solved using the expectation-maximization (EM) procedure. The received microphone signals...
Auditory-evoked noninvasive electroencephalography (EEG) based brain-computer interfaces (BCIs) could be useful for improved hearing aids in the future. This manuscript investigates the role of the frequency and spatial features of the audio signal in EEG activity in an auditory BCI system, with the purpose of detecting the attended auditory source in a cocktail party setting. A cross-correlation-based feature...
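To make the idea of a cross-correlation feature concrete, the following Python sketch picks the attended source as the candidate audio envelope whose maximum lagged Pearson correlation with an EEG-derived signal is highest. This is an assumed, simplified scheme for illustration only: the single EEG trace, the envelopes, the lag window and the names `max_xcorr` and `attended_source` are hypothetical and do not describe the manuscript's actual feature.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def max_xcorr(eeg, envelope, max_lag):
    """Largest correlation over lags -max_lag..max_lag (positive lag: EEG follows audio)."""
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = eeg[lag:], envelope[:len(envelope) - lag]
        else:
            x, y = eeg[:len(eeg) + lag], envelope[-lag:]
        n = min(len(x), len(y))
        best = max(best, pearson(x[:n], y[:n]))
    return best


def attended_source(eeg, envelopes, max_lag):
    """Return (index of the best-matching envelope, all correlation scores)."""
    scores = [max_xcorr(eeg, env, max_lag) for env in envelopes]
    return scores.index(max(scores)), scores
```

A positive lag pairs EEG samples with earlier audio samples, matching the expectation that neural responses follow the stimulus; negative lags are kept only for symmetry.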
Generally, the performance of endpoint detection is affected by noise. In this paper, we propose a novel two-layer decision model based on noise classification to detect voice activity robustly. The training process mainly consists of two steps: first, we employ the NOISEX-92 database, which consists of different types of pure noise, to train a BP neural network in order to classify the...
Deep learning has been shown to outperform other machine learning methods in numerous research fields. However, earlier approaches, such as multi-space probability distribution hidden Markov models, still surpass deep learning methods in the prediction accuracy of speech fundamental frequency (F0), inter alia due to its discontinuous behavior. The current research focuses on the application of feedforward...
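One common way to sidestep the discontinuity of F0 (it is undefined in unvoiced frames) when training feedforward networks is to interpolate F0 across unvoiced regions and predict a separate voiced/unvoiced flag. The sketch below shows that preprocessing step in Python; it is a generic technique offered as an assumption, not necessarily the approach taken in this research, and both the helper name `continuous_f0` and the convention that unvoiced frames carry `f0 <= 0` are hypothetical.

```python
def continuous_f0(f0):
    """Return (interpolated F0 track, voiced flags); frames with f0 <= 0 are unvoiced."""
    voiced = [v > 0 for v in f0]
    voiced_idx = [i for i, v in enumerate(voiced) if v]
    if not voiced_idx:
        raise ValueError("no voiced frames to interpolate from")
    out = list(f0)
    for i in range(len(f0)):
        if voiced[i]:
            continue
        # nearest voiced frames on each side of the unvoiced frame
        left = max((j for j in voiced_idx if j < i), default=None)
        right = min((j for j in voiced_idx if j > i), default=None)
        if left is None:
            out[i] = f0[right]  # leading unvoiced run: hold the first voiced value
        elif right is None:
            out[i] = f0[left]  # trailing unvoiced run: hold the last voiced value
        else:
            t = (i - left) / (right - left)
            out[i] = f0[left] + t * (f0[right] - f0[left])  # linear interpolation
    return out, voiced
```

Many systems interpolate log F0 rather than F0 in Hz; the same function works if voiced values are log-transformed first, since log F0 stays positive for any F0 above 1 Hz.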
A number of computational techniques have been proposed that aim to detect mimicry in online conversations. In this paper, we investigate how well these reflect the prevailing cognitive science model, i.e. the Interactive Alignment Model. We evaluate Local Linguistic Alignment, word vectors, and Language Style Matching and show that these measures tend to exhibit the features we expect to see in the...
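Of the three measures, Language Style Matching has a simple closed form, commonly written per function-word category c as LSM_c = 1 - |p_a - p_b| / (p_a + p_b + 0.0001), where p_a and p_b are the percentages of each speaker's words falling in that category; the overall score is the mean over categories. The Python sketch below follows that formula. The two tiny category word lists used in the example are hypothetical stand-ins for a proper function-word dictionary, and the function names are illustrative only.

```python
def category_percentages(tokens, categories):
    """Percentage of tokens that fall into each named word category."""
    n = len(tokens)
    return {name: 100.0 * sum(t in words for t in tokens) / n
            for name, words in categories.items()}


def lsm(tokens_a, tokens_b, categories):
    """Mean per-category Language Style Matching score between two speakers."""
    pa = category_percentages(tokens_a, categories)
    pb = category_percentages(tokens_b, categories)
    # the 0.0001 term avoids division by zero when neither speaker uses a category
    scores = [1 - abs(pa[c] - pb[c]) / (pa[c] + pb[c] + 0.0001) for c in categories]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    cats = {"articles": {"a", "an", "the"}, "pronouns": {"i", "you", "we", "they"}}
    print(lsm("i saw the dog".split(), "you saw a cat".split(), cats))
```

Scores near 1 indicate similar function-word usage; a category unused by both speakers contributes a score of 1.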
Developing a computer-based evaluation system for spoken English pronunciation is important for helping learners improve their spoken English. This paper introduces an undergraduate-oriented evaluation model of spoken English pronunciation and its related system, with four evaluation parameters: accuracy, speed, rhythm and intonation. This paper illustrates the necessity of each evaluation index,...
Most current Brain-Computer Interfaces (BCIs) achieve high information transfer rates using spelling paradigms based on stimulus-evoked potentials. Despite the success of these interfaces, this mode of communication can be cumbersome and unnatural. Direct synthesis of speech from neural activity represents a more natural mode of communication that would enable users to convey verbal messages in real-time...
Cross-frequency coupling plays an important role in coordinating neuronal computations underlying human perception, learning and memory. Here we compared four methods for measuring phase/amplitude coupling (PAC) of theta (4–7 Hz) and high-gamma (70–150 Hz) in intracranial electrocorticographic (ECoG) recordings. Time-frequency spectral and time-domain evoked responses were derived for comparison....
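One widely used phase/amplitude coupling measure is the mean vector length: each high-gamma amplitude sample is treated as a vector pointing at the concurrent theta phase, and the length of the average vector grows when amplitude is consistently larger at particular phases. The abstract does not say which four methods were compared, so this Python sketch is illustrative only. It assumes the theta phase (radians) and high-gamma amplitude envelope have already been extracted, for example by band-pass filtering and a Hilbert transform; the function name and the optional normalization by mean amplitude are assumptions made here.

```python
import cmath


def mean_vector_length(phase, amplitude, normalize=True):
    """Mean vector length PAC estimate from paired phase and amplitude samples."""
    n = len(phase)
    # average of amplitude-weighted unit vectors at each phase angle
    z = sum(a * cmath.exp(1j * p) for p, a in zip(phase, amplitude)) / n
    if normalize:
        # divide by mean amplitude so the value does not scale with signal power
        z /= sum(amplitude) / n
    return abs(z)
```

With amplitude proportional to 1 + cos(phase), the normalized value is 0.5; with constant amplitude over whole theta cycles it is close to 0. In practice the raw value is usually compared against surrogate data to judge significance.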
State-of-the-art hearing prostheses are equipped with acoustic noise reduction algorithms to improve speech intelligibility. Currently, one of the major challenges is to perform acoustic noise reduction in so-called cocktail party scenarios with multiple speakers, in particular because it is difficult, if not impossible, for the algorithm to determine which are the target speaker(s) that should be enhanced,...