Large-scale fleet tests of autonomous vehicles lead to the availability of massive recorded datasets, offering significant potential for the generation of realistic virtual test drives, for the development and training of machine-learning-based functions, and for facilitating performance analysis. Automated scenario classification and data labeling are necessary to maximize the utility of these massive...
Emotions play a key role in cognitive processes, particularly in learning. Educators should know the emotional state of each student during a teaching activity. They must help students to experiment, interact and explore new topics and constructs. Students must be in a state that maximizes their performance. To know the emotional state of a student, we need an emotion recognition system. It can be...
Early detection of ST segment depression or elevation is very important for the prevention of myocardial ischemia and of a myocardial infarction that may occur in the future. In this study, an algorithm based on the Choi-Williams time-frequency distribution was developed for the early detection of ST segment depressions or elevations. The performance evaluation of the...
Speaker recognition is a pattern recognition task which has long been studied, but the accuracies are still far from the desired levels. The majority of studies on speaker recognition report results obtained from databases in which English voices are used. Since there are very few studies on Turkish speech, the performance of the known successful methods on Turkish voices is uncertain...
Rāga is a quintessential component of Indian classical music. Rāgas are primarily characterised by melodic time-frequency (T-F) motifs. There have been several efforts made to determine the identity of a rāga, yet the techniques work only on subset(s) of rāgas, or perform poorly in terms of scalability. In this paper, we propose a rāga identification method for Carnatic music using Locality Sensitive...
This work proposes a technique for predicting the pitch from Mel-frequency cepstral coefficient (MFCC) vectors. Previous pitch prediction methods are based on statistical models such as Gaussian mixture models and hidden Markov models. In this paper, we propose a three-step method to estimate pitch from MFCC vectors. First, the Mel-filterbank energies (MFBEs) are estimated from MFCC vectors. Secondly,...
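The first step named in the abstract above (estimating MFBEs from MFCC vectors) can be illustrated with a minimal sketch. It assumes the common convention that MFCCs are the truncated orthonormal DCT-II of the log Mel-filterbank energies, so that zero-padding and inverting the DCT yields a smoothed estimate. The abstract is truncated and does not state the authors' actual procedure; the function names `dct_matrix`, `mfcc_from_mfbe` and `mfbe_from_mfcc` are hypothetical.

```python
import numpy as np


def dct_matrix(n_filters: int) -> np.ndarray:
    """Orthonormal DCT-II matrix of shape (n_filters, n_filters)."""
    k = np.arange(n_filters)[:, None]  # cepstral index
    m = np.arange(n_filters)[None, :]  # filterbank channel index
    mat = np.sqrt(2.0 / n_filters) * np.cos(
        np.pi * k * (2 * m + 1) / (2 * n_filters)
    )
    mat[0, :] /= np.sqrt(2.0)  # scale the first row so the matrix is orthonormal
    return mat


def mfcc_from_mfbe(mfbe: np.ndarray, n_ceps: int) -> np.ndarray:
    """Assumed forward model: truncated DCT of the log filterbank energies."""
    log_energies = np.log(mfbe)
    return (dct_matrix(len(mfbe)) @ log_energies)[:n_ceps]


def mfbe_from_mfcc(mfcc: np.ndarray, n_filters: int) -> np.ndarray:
    """Estimate filterbank energies by zero-padding and inverting the DCT."""
    padded = np.zeros(n_filters)
    padded[: len(mfcc)] = mfcc  # discarded high-order coefficients are set to zero
    log_energies = dct_matrix(n_filters).T @ padded  # transpose inverts an orthonormal DCT
    return np.exp(log_energies)
```

When all coefficients are kept, the round trip is exact. With truncated MFCCs the result is only a smoothed approximation of the original energies, which is why this step is an estimate rather than a true inversion.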
Incorporating prosodic information with spectral information at the feature level is challenging. In this paper, a method for feature level fusion of spectral and prosodic information is proposed. A pitch contour is first extracted from the frame blocked segments of the speech signal. These speech segments obtained herein are labeled as high pitch and low pitch segments. Both spectral and prosodic...
Speech uttered by human beings contains information about the speaker, the language and the content. The language of uttered speech can be identified by extracting language-specific information from it. Identification of the language of speech is known as Language Identification (LID). Identification of language from speech is helpful in its translation, speech recognition and speech activated automatic...
For about twenty years, research on the number of states in the Hidden Markov Model (HMM) has mainly aimed at increasing it in order to ensure the robustness of the face recognition system. In this paper, a novel face recognition method is presented based on a one-state discrete HMM, which seemed impossible in the past. Contrary to other approaches that use the three parameters...
One of the difficulties in sung speech recognition is the small distance in acoustic space between phonemes in sung speech. Therefore, we considered clustering the speech based on pitch (fundamental frequency F0) and creating a larger distance between the phonemes. In addition, we considered a two-stage training method of DNN-HMM: the first stage is trained by using conventional acoustic features...
This paper describes methods for evaluating automatic speech recognition (ASR) systems in comparison with human perception results, using measures derived from linguistic distinctive features. Error patterns in terms of manner, place and voicing are presented, along with an examination of confusion matrices via a distinctive-feature-distance metric. These evaluation methods contrast with conventional...
In current studies, an extended subjective self-report method is generally used for measuring emotions. Even though it is commonly accepted that speech emotion perceived by the listener is close to the intended emotion conveyed by the speaker, research has indicated that there still remains a mismatch between them. In addition, individuals with different personalities generally have different...
Continuous prediction of dimensional emotions (e.g. arousal and valence) has attracted increasing research interest recently. When processing emotional speech signals, phonetic features have been rarely used due to the assumption that phonetic variability is a confounding factor that degrades emotion recognition/prediction performance. In this paper, instead of eliminating phonetic variability, we...
Recent research on machine learning focuses on audio source identification in complex environments. These approaches rely on extracting features from audio signals and use machine learning techniques to model the sound classes. However, such techniques are often not optimized for real-time implementation or for multi-source conditions. We propose a new real-time audio single-source classification method based...
Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector and relevance Maximum-a-Posteriori (MAP), have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors from a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM)...
This paper proposes a long short-term memory recurrent neural network (LSTM-RNN) for extracting melody and simultaneously detecting regions of melody from polyphonic audio using the proposed harmonic sum loss. The previous state-of-the-art algorithms have not been based on machine learning techniques and certainly not on deep architectures. The harmonic structure in melody is incorporated in the...
Lips deliver visually active clues for speech articulation. Affective states define how humans articulate speech; hence, they also change the articulation of lip motion. In this paper, we investigate the effect of phonetic classes on affect recognition from lip articulations. The affect recognition problem is formalized in discrete activation, valence and dominance attributes. We use the symmetric Kullback-Leibler...
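For reference, the symmetric Kullback-Leibler divergence mentioned above can be sketched for two discrete distributions as follows. The abstract is truncated, so how the authors apply it (which distributions, and whether a factor of 1/2 is used) is not known; this sketch assumes the plain sum KL(p||q) + KL(q||p), and the function name `symmetric_kl` and the `eps` floor are illustrative assumptions.

```python
import numpy as np


def symmetric_kl(p, q, eps: float = 1e-12) -> float:
    """Symmetric KL divergence KL(p||q) + KL(q||p) between discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize so each input sums to one
    q = q / q.sum()
    p = np.clip(p, eps, None)  # assumed floor to avoid log(0)
    q = np.clip(q, eps, None)
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * (log p - log q))
    return float(np.sum((p - q) * (np.log(p) - np.log(q))))
```

Unlike the one-directional KL divergence, this quantity does not depend on the order of its arguments, which makes it usable as a dissimilarity measure between two class-conditional distributions.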
Different modes of vibration of the vocal folds contribute significantly to voice quality. The neutral mode of phonation, often used in a modal voice, is the one against which the other modes, also called non-modal phonations, can be contrastively described. This paper investigates the impact of non-modal phonation on phonological posteriors, the probabilities of phonological features inferred from the...
Finding an effective way to represent human actions is still an open problem because it usually requires taking evidence extracted from various temporal resolutions into account. A conventional way of representing an action employs temporally ordered fine-grained movements, e.g., key poses or subtle motions. Many existing approaches model actions by directly learning the transitional relationships...
Sports event recognition and classification is a challenging task due to the number of possible categories. On the one hand, how to define legitimate event category names and how to acquire training samples for these classes should be investigated; on the other hand, it is non-trivial to achieve acceptable classification performance. To address these issues, a content mining pipeline is initially...