With the advent of hands-free devices, speech recognition has become very important, but it performs poorly in a cocktail-party environment without speech separation or speech denoising. Various techniques are available for speech separation; one commonly used today is non-negative matrix factorization (NMF). Non-negative matrix factorization decomposes the mixed signal into...
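The abstract is truncated before it describes how the decomposition is computed. The sketch below assumes the common Lee-Seung multiplicative updates for the squared Euclidean cost; the paper may use a different cost or update rule. NMF factors a non-negative matrix V (for example, a magnitude spectrogram) into a basis matrix W and an activation matrix H so that V ≈ WH. The helper names (`nmf`, `matmul`, `frobenius_error`) are hypothetical, and the code uses plain lists only so that it has no dependencies.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative V (rows x cols) as W (rows x rank) @ H (rank x cols)
    with Lee-Seung multiplicative updates (an assumed choice, see lead-in)."""
    rng = random.Random(seed)
    rows, cols = len(v), len(v[0])
    # Strictly positive random start; multiplicative updates keep entries non-negative.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(rows)]
    h = [[rng.random() + 0.1 for _ in range(cols)] for _ in range(rank)]
    for _ in range(iterations):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(cols)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(rows)]
    return w, h

# Usage: a toy 4 x 3 matrix that is exactly a product of two non-negative rank-2 factors.
v = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [1.0, 3.0, 3.0], [2.0, 5.0, 3.0]]
w, h = nmf(v, rank=2, iterations=300)
print(frobenius_error(v, w, h))
```

In a separation setting, separate groups of columns of W would typically be associated with each source (for example, speech and noise). Each source would then be reconstructed from its own columns of W and the matching rows of H.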
In stereo or multi-channel system identification, the most critical problems in online identification (e.g., for acoustic echo control) arise from the correlation properties of the excitation signals of the different audio channels. In this paper, the impact of both the auto- and cross-correlation properties is considered and investigated. A new system combining appropriate decorrelation techniques...
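The abstract does not say which correlation measure is used. One simple way to quantify the cross-correlation between two channels is the normalized cross-correlation coefficient r(k) = Σ x[n]·y[n+k] / √(E_x·E_y). The sketch below computes this coefficient and finds the lag where it peaks; the function names are hypothetical. Consider a stereo pair in which one channel is essentially a delayed, attenuated copy of the other. There the peak coefficient is close to 1, which shows the strong inter-channel correlation the abstract identifies as a critical problem.

```python
import math
import random

def normalized_xcorr(x, y, lag):
    """r(lag) = sum_n x[n] * y[n + lag] / sqrt(E_x * E_y); positive lag means y lags x."""
    energy = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if energy == 0:
        return 0.0
    total = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
    return total / energy

def peak_lag(x, y, max_lag):
    """Return (lag, r) where |r(lag)| is largest for lag in [-max_lag, max_lag]."""
    lag = max(range(-max_lag, max_lag + 1), key=lambda k: abs(normalized_xcorr(x, y, k)))
    return lag, normalized_xcorr(x, y, lag)

# Usage: a hypothetical stereo pair; the right channel is the left one delayed by
# 3 samples and scaled by 0.8, as if both microphones picked up a single talker.
rng = random.Random(1)
left = [rng.gauss(0.0, 1.0) for _ in range(500)]
right = [0.0] * 3 + [0.8 * v for v in left[:-3]]
print(peak_lag(left, right, 10))  # expected: lag 3 with r close to 1
```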
Pitch extraction from a multi-pitch music signal, for tasks such as enhanced music-voice separation, relies heavily on training data. This paper aims to identify characteristic temporal and spectral features, using speech processing techniques, that can help obtain crucial information and lead to a better understanding of the music structure. Towards this goal, the F0 contour has been studied...
In this work, sentiment analysis of the annual budget for financial year 2016–17 is performed. Text mining is used to extract text from the budget document, to compute associations between significant words, and to compute their correlation with the associated words. Word frequencies and the corresponding word cloud are plotted. The analysis is done in R. The corresponding sentiment score is...
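The paper performs this analysis in R. The sketch below illustrates the same two steps in Python and is not the authors' code. It counts word frequencies after removing stop words, then measures the association between two words as the Pearson correlation of their per-document counts. Three things here are assumptions: the tiny stop-word list, splitting the budget text into "documents" (for example, paragraphs), and the use of Pearson correlation. The documents in the usage example are invented.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real analyses use much larger lists).
STOPWORDS = {"the", "of", "and", "to", "for", "in", "a", "an", "is", "on", "will", "be"}

def tokenize(text):
    """Lower-case the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def word_frequencies(documents, top_n=10):
    """Most frequent words across all documents, as (word, count) pairs."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts.most_common(top_n)

def word_association(documents, a, b):
    """Pearson correlation between the per-document counts of words a and b."""
    token_lists = [tokenize(d) for d in documents]
    xs = [tokens.count(a) for tokens in token_lists]
    ys = [tokens.count(b) for tokens in token_lists]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a word with constant counts has no defined correlation
    return cov / math.sqrt(vx * vy)

# Usage with invented budget-like snippets, one "document" per snippet.
docs = ["Tax relief for farmers.", "Farmers get tax relief.",
        "Infrastructure spending rises.", "Rural infrastructure and roads."]
print(word_frequencies(docs, top_n=3))
print(word_association(docs, "tax", "relief"))
```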
The term gender identification refers to determining a person's gender from his or her voice. Gender identification has been implemented in several Automatic Speaker Recognition (ASR) systems and has proved to be of great significance. In today's technology, gender identification makes user authentication and identification easier in high-security systems. In this paper, we...
In this work, we present an approach based on the Rakaposhi system for securing digital information. The approach is developed to encrypt and decrypt digital information. A gray-level image, speech data saved as a .wav file, and text saved as a .txt file are used to validate the proposed approach. The approach is simple and highly efficient. Several tests are performed to validate the performance of the proposed...
Noise is omnipresent in almost all acoustic environments. The investigation presented here seeks to quantify the impact of noise on the mel-frequency cepstral coefficients (MFCCs) of speech signals. MFCCs are among the most commonly used features in speech recognition systems. However, it has been observed that the performance of MFCC-based systems degrades drastically with changing noise levels and noise types...
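The abstract does not say how the different noise levels were produced. One standard approach, shown here as an assumption, is to scale a noise recording so that the mixture reaches a target signal-to-noise ratio, SNR = 10·log10(P_speech / P_noise), and then to extract MFCCs from each noisy version. The function names are hypothetical, and a pure tone stands in for speech.

```python
import math
import random

def power(signal):
    """Mean power of a signal."""
    return sum(s * s for s in signal) / len(signal)

def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with g chosen so that
    10 * log10(P_speech / P_scaled_noise) equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

# Usage: a 300 Hz tone (standing in for speech) mixed with white noise at several SNRs.
fs = 8000
speech = [math.sin(2 * math.pi * 300 * n / fs) for n in range(4000)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]
noisy_versions = {snr: add_noise_at_snr(speech, noise, snr) for snr in (20, 5, -5)}
```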
In this paper, we present a comparative study of digital speech processing on Bangla speech signals. We represent the oral characteristics of the Bangla alphabet in terms of pitch and formants. We worked with both vowels and consonants to show their difference in practical use. We record spoken speech signals and extract phonemes to analyze them in both the time and frequency domains. Both male and female...
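The abstract does not name its pitch-extraction method. A classic time-domain choice is autocorrelation: for a voiced frame, the lag that maximizes the autocorrelation within a plausible pitch-period range approximates the pitch period, and F0 = fs / lag. The 60 to 400 Hz search range and the name `estimate_pitch` are assumptions. Formant analysis would need a separate frequency-domain step (for example, LPC), which is not shown here.

```python
import math

def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of a voiced frame as fs / lag, where lag maximizes the
    unnormalized autocorrelation within the pitch-period range [fs/fmax, fs/fmin]."""
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), len(frame) - 1)

    def autocorr(lag):
        return sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))

    best = max(range(min_lag, max_lag + 1), key=autocorr)
    return fs / best

# Usage: a 200 Hz tone sampled at 8 kHz has a period of exactly 40 samples.
fs = 8000
frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
print(estimate_pitch(frame, fs))  # 200.0
```

Because the unnormalized autocorrelation sums fewer terms at longer lags, it slightly favors shorter lags. This helps avoid picking a multiple of the true period. A real analysis would apply the function frame by frame to voiced segments only.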
In this paper, we focus on describing the method we designed for automatic perceived personality prediction. We present a simple model that uses three different sets of features: nonverbal audio cues, visual cues from video, and facial landmark points. The model uses a random decision forest to do regression from the extracted features. As we discuss in Section 4, this multimodal model performs relatively...
In this paper, we propose a novel noise masking method based on Computational Auditory Scene Analysis (CASA) that uses an adaptive factor. Although CASA has succeeded to some extent in speech separation and speech enhancement, its use of fixed thresholds for segregation and labeling strongly affects processing performance. Focusing on this issue, the proposed method utilizes the Normalized...
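Here is a concrete illustration of the fixed-threshold issue. Many CASA systems label each time-frequency unit as target-dominant when its local SNR exceeds a fixed local criterion (LC). The sketch below computes such a binary mask from known speech and noise energies. This is the "ideal" binary mask, which a real CASA system must estimate rather than compute. The function names are hypothetical. The sketch does not reproduce the paper's adaptive factor; it only shows how much the labels depend on the chosen LC.

```python
def binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label a time-frequency unit 1 if its local SNR exceeds lc_db, else 0.
    Inputs are 2-D lists indexed [channel][frame]; s > n * 10**(lc_db/10) is the
    same test as 10*log10(s/n) > lc_db but also handles n == 0."""
    threshold = 10 ** (lc_db / 10)
    return [[1 if s > n * threshold else 0 for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_energy, noise_energy)]

def apply_mask(mixture_energy, mask):
    """Keep only the units labeled as target-dominant."""
    return [[m * x for m, x in zip(m_row, x_row)]
            for m_row, x_row in zip(mask, mixture_energy)]

# Usage: toy 2 x 2 energies; the same units flip labels as the fixed LC changes.
speech = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
for lc in (-6, 0, 6, 10):
    print(lc, binary_mask(speech, noise, lc))
```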
This paper proposes a method for automatic pronunciation assessment of Korean spoken by L2 learners, selecting the best feature set from a collection of well-known features in the literature. The L2 Korean Speech Corpus is used for assessment modeling; the native languages of the L2 learners are English, Chinese, Japanese, Russian, and Mongolian. In our system, learners' speech is...
In this paper, the correlation between speech features of the vowel /a/ and depression severity was investigated in order to develop a depression-severity-meter mobile application that can detect depression accurately and quantitatively. Results showed a correlation between depression severity and speech features, and an application prototype was created and tested to assess the predictive accuracy of...
This paper presents an experimental study of multi-stage classification-based recognition of emotions in Lithuanian speech. Three criteria for feature selection were compared for this purpose: the Maximal Efficiency criterion, the Minimal Cross-Correlation criterion, and Sequential Feature Selection. A large database of emotional Lithuanian speech was used in this experiment; each of...
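Sequential forward selection, the third criterion, greedily adds at each step the feature that most improves a subset score. In the paper, that score would presumably come from the emotion classifier. Here `toy_score` is a hypothetical stand-in that rewards per-feature relevance and penalizes redundant pairs, loosely in the spirit of a minimal cross-correlation criterion. All feature names and values are invented for illustration.

```python
from itertools import combinations

def sequential_forward_selection(features, score, k):
    """Greedy SFS: repeatedly add the feature f that maximizes score(selected + [f])."""
    selected, remaining = [], list(features)
    while remaining and len(selected) < k:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical per-feature relevance and pairwise redundancy (e.g. |cross-correlation|).
RELEVANCE = {"pitch_mean": 0.6, "pitch_max": 0.55, "energy": 0.5, "mfcc1": 0.3}
REDUNDANCY = {frozenset({"pitch_mean", "pitch_max"}): 0.5}

def toy_score(subset):
    """Total relevance minus total pairwise redundancy of a feature subset."""
    relevance = sum(RELEVANCE[f] for f in subset)
    redundancy = sum(REDUNDANCY.get(frozenset(pair), 0.0)
                     for pair in combinations(subset, 2))
    return relevance - redundancy

print(sequential_forward_selection(RELEVANCE, toy_score, 3))
```

Although `pitch_max` has the second-highest individual relevance, it is chosen last. Its redundancy with `pitch_mean` makes every other addition more valuable.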
In this paper, an improved pitch contour formulation is introduced by modifying the existing sinusoidal pitch contour function. The aim is to convert neutral speech into storytelling speech in the Malay language. Our speech datasets (neutral and storytelling speech) were recorded by a male and a female professional speaker. They contain 116 sentences, 1,164 words, and 2,732 syllables. For storytelling...
With the growth of China's international influence, more Chinese words are being borrowed into English. To find out how popular Chinese borrowings are in English, a questionnaire is conducted among English speakers (native Chinese speakers are excluded). De Swaan's Q-value model is employed to analyze the data. The results show that Chinese borrowings in English vary in Q-value and the ones related to economy...
Brain-computer interfaces (BCIs) promise to provide a novel access channel supporting functional independence for individuals with severe speech and physical impairment (SSPI), which can result from numerous neurological diseases and injuries. Current BCI systems lack the robustness and accuracy to allow individuals with SSPI to complete tasks required for independent living (e.g. communication or...
In subjective evaluation of dysarthric speech, inter-rater agreement between clinicians can be low. Disagreement among clinicians results from differences in their perceptual assessment abilities, familiarity with a client, clinical experience, etc. Recently, there has been interest in developing signal processing and machine learning models for objective evaluation of subjective speech quality...
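The abstract does not name an agreement statistic. For two raters assigning categorical labels, one common choice is Cohen's kappa, κ = (p_o - p_e) / (1 - p_e). Here p_o is the observed agreement, and p_e is the agreement expected by chance from each rater's label frequencies. The sketch below implements the unweighted form; ordinal severity scales are often analyzed with a weighted kappa instead. The function name and the ratings are invented for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters pick the same category independently.
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # kappa is undefined here; this sketch returns 1.0 by convention
    return (p_o - p_e) / (1 - p_e)

# Usage: two hypothetical clinicians rating four speakers.
print(cohens_kappa(["mild", "severe", "moderate", "mild"],
                   ["mild", "moderate", "moderate", "mild"]))
```

In the usage example, the raters agree on 3 of 4 speakers (p_o = 0.75), but chance agreement is already 0.375, so kappa is 0.6.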
Psychiatry describes speech symptoms that are indicative of disorganized thought, but measuring them is not easy. With natural language processing tools, it is possible to quantify psychiatric symptoms. Graph representations of word trajectories and semantic incoherence have independently been shown to predict a diagnosis of schizophrenia. Both analyses assess thought organization through speech, but...
In natural environments, speech signals are affected by various kinds of acoustic interference. Many audio signal processing applications, such as automatic speech recognition, telecommunications, and hearing aids, require an effective way of segregating the target speech from the mixture. Pitch information plays an important role in audio signal processing, especially...
In spoken Mandarin, some consonant and vowel pairs are hard to distinguish and pronounce clearly, even for some native speakers. This study investigates the signal distance between pairs of consonants from a signal processing point of view to reveal the correlation between signal distance and consonant pronunciation. Some popular objective speech quality measures are innovatively...