Cry segmentation is an essential preprocessing step in any infant crying diagnosis system. In addition to crying sounds, which consist of expiration phases followed by short inspiration episodes, each recording of newborn cries also includes silence sections as well as other sounds such as caregiver speech, noise and the sound of medical equipment. This paper is devoted to a newly developed Empirical...
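The silence-removal step described above can be sketched with a simple short-time-energy detector. This is a generic illustration, not the Empirical-mode method the paper develops; the frame size, hop and threshold are arbitrary choices.

```python
import math

def short_time_energy(signal, frame_len=256, hop=128):
    """Frame-wise mean-square energy of a 1-D signal (plain Python lists)."""
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def segment_active(energies, threshold):
    """Return (start_frame, end_frame) pairs where energy exceeds threshold."""
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and start is None:
            start = i
        elif e < threshold and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(energies)))
    return segments

# Synthetic example: silence, then a "cry" burst, then silence.
sig = [0.0] * 1024 + [math.sin(0.3 * n) for n in range(2048)] + [0.0] * 1024
e = short_time_energy(sig)
print(segment_active(e, threshold=0.1))  # one active segment in the middle
```

A real cry segmenter would add hysteresis and a minimum-duration rule so brief pauses inside one expiration are not split into separate segments.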
Research indicates that electroencephalography (EEG) can be used to classify data of imagined speech. It can be further utilized to develop speech prostheses and synthetic telepathy systems. The objective of this paper is to improve the classification performance in imagined speech by selecting the features that extract maximum discriminatory information from the data. The features extracted are...
Speech recognition systems are based on either a parametric or a non-parametric approach. Parametric systems such as HMMs have been the dominant technology for speech recognition over the past decade. Despite many advancements and enhancements in the design of these systems, key problems such as long-term temporal dependence have not yet been solved. Recently, due to availability...
In this paper, a vowel recognition scheme using visual information is proposed based on the two-dimensional discrete wavelet transform (2D-DWT). First, a video frame corresponding to a steady vowel zone is selected utilizing the speech characteristics of the audio frames. Next, a pixel-based method is proposed to identify the lip region of a given video frame, where intensity variation of different color...
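As a rough illustration of the 2D-DWT step, here is a minimal one-level 2-D Haar transform in plain Python. A real system would use a wavelet library and possibly higher-order wavelets, so treat this as a sketch only.

```python
def haar_1d(row):
    """One level of the 1-D Haar transform: pairwise averages then differences."""
    avg = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
    diff = [(row[2 * i] - row[2 * i + 1]) / 2 for i in range(len(row) // 2)]
    return avg + diff

def haar_2d(img):
    """One level of the separable 2-D Haar DWT: rows first, then columns.
    The top-left quadrant of the result is the LL (approximation) band."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d([rows[i][j] for i in range(len(rows))])
            for j in range(len(rows[0]))]
    # Transpose back so out[i][j] indexes row i, column j.
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(cols[0]))]

img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]
out = haar_2d(img)
print(out)  # LL quadrant (top-left 2x2) holds the 2x2 block averages
```

The LL band is the low-resolution approximation usually kept as the visual feature; the other three quadrants carry horizontal, vertical and diagonal detail.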
The goal of this paper is to identify the gender of blog authors. Features such as POS tags, unigrams (words + punctuation), bigrams and word classes are considered. To select and rank features we use mutual information, chi-square and information gain methods. The dataset is a collection of 3227 blogs originally derived from a blog set, and among them 1679 were written by male authors and 1548 were written...
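The mutual-information ranking mentioned above can be illustrated with a small sketch; the toy features and labels below are invented for demonstration and do not come from the blog dataset.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    pf = Counter(feature)
    pl = Counter(labels)
    mi = 0.0
    for (f, l), c in joint.items():
        # p(f,l) * log2( p(f,l) / (p(f) * p(l)) )
        mi += (c / n) * math.log2(c * n / (pf[f] * pl[l]))
    return mi

# Toy gender-classification data: feature A tracks the label, feature B is noise.
labels    = ['m', 'm', 'f', 'f', 'm', 'f']
feature_a = [1, 1, 0, 0, 1, 0]   # perfectly informative
feature_b = [1, 0, 1, 0, 1, 0]   # mostly uninformative
print(mutual_information(feature_a, labels))  # 1.0 bit
print(mutual_information(feature_b, labels))
```

Ranking then amounts to computing this score for every candidate feature and keeping the top-k; chi-square and information gain are used the same way with different scoring functions.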
This paper investigates the relationship between rhythm metrics and the ability to classify speakers by gender and/or social environment, which may be affected by factors such as second-language effects and ways of living as expressed through speech. The BBN/AUB (BBN Technologies and American University of Beirut) corpus was used; it contains four subsets of native Levantine dialect...
The GRBAS scale is a widely used subjective measure of voice quality. The aim of this paper is to investigate the correlation between the 'grade', 'roughness', 'breathiness', 'asthenia' and 'strain' dimensions of this scale and the objective measurements provided by the 'Analysis of Dysphonia in Speech and Voice' (ADSV) software package. To do this, 107 voice samples were collected in...
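The dimension-by-dimension correlation analysis can be sketched with a plain Pearson correlation; the ratings and measurements below are hypothetical placeholders, not values from the 107-sample study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: GRBAS 'grade' ratings (0-3) vs. one objective measure.
grade   = [0, 1, 1, 2, 3, 3, 2]
measure = [1.1, 2.0, 2.2, 3.1, 4.2, 3.9, 2.8]
print(round(pearson_r(grade, measure), 3))
```

Since GRBAS ratings are ordinal, a rank correlation such as Spearman's rho is often reported alongside Pearson's r in this kind of study.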
Audio classification is one of the most important tasks in content-based analysis and can be applied in many audio applications, such as indexing and retrieval. This paper addresses the problem of broadcast news audio classification, using a support vector machine binary tree (SVM-BT) architecture, into five classes: pure speech, speech with music, speech with environment sound, pure music and...
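A minimal sketch of how an SVM-BT routes a sample through a tree of binary decisions, with trivial threshold functions standing in for trained binary SVMs. The scores, thresholds and tree layout are illustrative only (only four of the five classes are shown).

```python
class SVMBTNode:
    """Node of a binary classification tree. Internal nodes hold a binary
    classifier that routes a sample left or right; leaves hold a class label."""
    def __init__(self, label=None, classifier=None, left=None, right=None):
        self.label, self.classifier = label, classifier
        self.left, self.right = left, right

    def predict(self, x):
        if self.label is not None:          # leaf: final class
            return self.label
        branch = self.left if self.classifier(x) else self.right
        return branch.predict(x)

# Stand-ins for trained binary SVMs (True routes left).
# A sample x = (speech_score, music_score); real systems would learn these.
tree = SVMBTNode(
    classifier=lambda x: x[0] > 0.5,                    # speech present?
    left=SVMBTNode(
        classifier=lambda x: x[1] > 0.5,                # music too?
        left=SVMBTNode(label="speech with music"),
        right=SVMBTNode(label="pure speech")),
    right=SVMBTNode(
        classifier=lambda x: x[1] > 0.5,
        left=SVMBTNode(label="pure music"),
        right=SVMBTNode(label="environment sound")))

print(tree.predict((0.9, 0.1)))  # pure speech
print(tree.predict((0.2, 0.8)))  # pure music
```

The appeal of the tree layout is that an N-class problem needs only N-1 binary classifiers and each sample traverses at most the tree depth, rather than voting over all pairwise SVMs.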
Unvoiced-voiced portions of cochannel speech contain considerable amounts of both voiced and unvoiced speech and play a significant role in separation. Motivated by recent developments in separation of speech from nonspeech noise, we propose a classification-based approach for unvoiced-voiced speech separation. A new feature set consisting of pitch-based features and gammatone frequency cepstral coefficients...
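The pitch-based part of such a feature set can be illustrated with a basic autocorrelation pitch estimator; this is a generic sketch, not the paper's feature extractor, and the search range is a common assumption for human voicing.

```python
import math

def autocorr_pitch(frame, sample_rate, fmin=80.0, fmax=400.0):
    """Estimate pitch (Hz) of a frame by locating the autocorrelation peak
    within the plausible lag range for human voicing."""
    lag_min = int(sample_rate / fmax)
    lag_max = int(sample_rate / fmin)
    best_lag, best_val = lag_min, float('-inf')
    for lag in range(lag_min, min(lag_max, len(frame) - 1) + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if val > best_val:
            best_val, best_lag = val, lag
    return sample_rate / best_lag

sr = 8000
frame = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]
print(round(autocorr_pitch(frame, sr)))  # ~200
```

In a cochannel mixture the autocorrelation typically shows peaks from both talkers, which is why pitch features are combined with spectral features such as gammatone frequency cepstral coefficients rather than used alone.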
A classification system that accurately categorizes caller behavior within Interactive Voice Response (IVR) systems would assist in developing good automated self-service applications. This paper details the implementation of such a classification system for a pay-beneficiary application. Adaptive Neuro-Fuzzy Inference System (ANFIS), feedforward Artificial Neural Network (ANN) and Support Vector Machine...
Most of the current automatic speech-based cognitive load measurement systems utilize acoustic features estimated using a mel filterbank. However, a previous study showed that a non-uniform filterbank designed specifically to emphasize cognitive load information present in low frequencies was more effective than a mel filterbank under noise-free conditions. This paper investigates the effectiveness...
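For reference, a mel filterbank of the kind such systems start from can be sketched as follows (triangular filters spaced uniformly on the mel scale). The parameter values are arbitrary, and a non-uniform filterbank would simply place the band edges differently.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with centers uniformly spaced on the mel scale.
    Returns n_filters rows, each covering the n_fft // 2 + 1 FFT bins."""
    mel_max = hz_to_mel(sample_rate / 2)
    mel_points = [i * mel_max / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for f in range(1, n_filters + 1):
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(bins[f - 1], bins[f]):              # rising slope
            row[k] = (k - bins[f - 1]) / (bins[f] - bins[f - 1])
        for k in range(bins[f], bins[f + 1]):              # falling slope
            row[k] = (bins[f + 1] - k) / (bins[f + 1] - bins[f])
        bank.append(row)
    return bank

fb = mel_filterbank(n_filters=10, n_fft=512, sample_rate=16000)
print(len(fb), len(fb[0]))  # 10 filters over 257 FFT bins
```

Features are obtained by multiplying a power spectrum by each row and taking the log of the resulting band energies; the contrast drawn in the abstract is entirely about where those band edges sit.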
We present in this paper a framework for audio concept identification based on audio stream analysis and binary classifier encapsulation. The system consists of three stages. The first stage is the audio pre-processing level, where the audio stream is segmented and silence segments are detected. In the second stage, speech, music and environmental sounds are automatically separated and further classified...
This paper proposes a new approach to improve the amount of information extracted from the speech aiming to increase the accuracy of a system developed for the automatic detection of pathological voices. The paper addresses the discrimination capabilities of 11 features extracted using nonlinear analysis of time series. Two of these features are based on conventional nonlinear statistics (largest...
The field of Text Mining has evolved over the past years to analyze textual resources, but it can also be used in several other applications. In this research, we are particularly interested in applying text mining techniques to audio materials after transcribing them into text in order to detect speakers' emotions. We describe our overall methodology and present our experimental results. In...
Automatic recognition of emotional states via speech signal has attracted increasing attention in recent years. A number of techniques have been proposed which are capable of providing reasonably high accuracy for controlled studio settings. However, their performance is considerably degraded when the speech signal is contaminated by noise. In this paper, we present a framework with adaptive noise...
Speech production and phonetic features gradually improve in children through the audio feedback obtained after cochlear implantation or with a hearing aid. In this study, voice disorders in children with cochlear implants and hearing aids are classified. 30 Persian children participated in the study, including 6 children in each of levels 1 to 3 and 12 in level 4. Voice samples of 5 isolated Persian words...
This paper presents the building of a part-of-speech (POS) tagger for the Malayalam language using a Support Vector Machine (SVM). A POS tagger plays an important role in natural language applications such as speech recognition, natural language parsing, information retrieval and information extraction. This supervised machine learning POS tagging approach requires a large annotated training corpus to tag...
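A common way to featurize tokens for an SVM POS tagger is a context window around each word. The sketch below is a generic illustration; the tokens and feature names are invented, not taken from the paper's corpus.

```python
def window_features(words, i, size=2):
    """Feature dict for word i: the word itself plus its neighbors in a
    +/-size context window, padded with boundary markers."""
    padded = ["<s>"] * size + words + ["</s>"] * size
    feats = {"word": words[i]}
    for off in range(-size, size + 1):
        if off != 0:
            feats["w%+d" % off] = padded[i + size + off]
    feats["suffix3"] = words[i][-3:]   # simple morphological cue
    return feats

sent = ["avan", "veettil", "pokunnu"]   # hypothetical Malayalam tokens
print(window_features(sent, 1))
```

Each feature dict is then vectorized (e.g. one-hot) and fed to the SVM; suffix features matter particularly for a morphologically rich language like Malayalam, where unseen word forms are frequent.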
Research in time-frequency distributions (TFDs) is limited in terms of its use of the available signal domains and its target applications. Most work up to now has concentrated mainly on the t-f domain. This work presents a detailed study of the ambiguity domain (AD), its resemblance to the t-f space and the significance of using such a representation. Further, a novel...
The Partitioned Feature-based Classifier (PFC) is proposed in this paper. PFC does not use the entire feature vector extracted from the original data at once to classify each datum, but instead uses groups of related features to classify data separately. In the training stage, the contribution rate of each feature vector group is derived from the accuracy of each feature...
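Based on the description above, PFC-style prediction can be sketched as a contribution-rate-weighted vote over per-group classifiers. The groups, stand-in classifiers and rates below are hypothetical; real contribution rates would come from each group's training accuracy.

```python
def pfc_predict(feature_groups, group_classifiers, contribution_rates):
    """Classify each feature group separately, then take a weighted vote:
    each group's vote is scaled by its contribution rate."""
    votes = {}
    for group, clf, rate in zip(feature_groups, group_classifiers,
                                contribution_rates):
        label = clf(group)
        votes[label] = votes.get(label, 0.0) + rate
    return max(votes, key=votes.get)

# Hypothetical feature groups and trivial stand-in classifiers.
groups = [[0.9, 0.8], [0.2], [0.7, 0.6, 0.5]]
clfs = [
    lambda g: "A" if sum(g) > 1.0 else "B",
    lambda g: "B",
    lambda g: "A" if g[0] > 0.5 else "B",
]
rates = [0.9, 0.6, 0.7]   # contribution rates, e.g. per-group training accuracy
print(pfc_predict(groups, clfs, rates))  # "A" (0.9 + 0.7 outweighs 0.6)
```

The design choice is that an unreliable feature group can disagree without dominating the decision, since its vote is discounted by its low contribution rate.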
The human voice is primarily a carrier of speech, but it also contains non-linguistic features unique to a speaker and indicative of various speaker demographics, e.g. gender, nativity, ethnicity. Such characteristics are helpful cues for audio/video search and retrieval. In this paper, we evaluate the effects of various low-, mid-, and high-level features for effective classification of speaker characteristics...