Search results

chapter

Spatiotemporal representation of driving scenarios and classification using neural networks

Richard Gruner, Philip Henzler, Gereon Hinz, Corinna Eckstein, et al.

2017 IEEE Intelligent Vehicles Symposium (IV) > 1782 - 1788

Large scale fleet tests of autonomous vehicles lead to the availability of massive recorded datasets, offering significant potential for the generation of realistic virtual test drives, for the development and training of machine learning based functions, and facilitated performance analysis. Automated scenario classification and data labeling is necessary to maximize the utility of these massive...

chapter

A review on speech emotion recognition: Case of pedagogical interaction in classroom

Leila Kerkeni, Youssef Serrestou, Mohamed Mbarki, Kosai Raoof, et al.

2017 International Conference on Advanced Technologies for Signal and Image Processing (ATSIP) > 1 - 7

Emotions play a key role in cognitive processes, particularly in learning. Educators should know the emotional state of each student during a teaching activity. They must help students to experiment, interact and explore new topics and constructs. Students must be in a state that maximizes their performance. To know the emotional state of a student, we need an emotion recognition system. It can be...

chapter

Time-frequency analysis based detection of ECG ST segment change using large feature set

Ilknur Kayikcioglu, Guzin Ulutas, Temel Kayikcioglu

2017 25th Signal Processing and Communications Applications Conference (SIU) > 1 - 4

Early detection of ST segment depression or elevation is very important for the prevention of myocardial ischemia and of a myocardial infarction that may occur in the future. In this study, an algorithm based on the Choi-Williams time-frequency distribution was developed for the early detection of ST segment depressions or elevations. The performance evaluation of the...

chapter

A study on Turkish text-dependent speaker recognition

Havva Celiktas, Cemal Hanilci

2017 25th Signal Processing and Communications Applications Conference (SIU) > 1 - 4

Speaker recognition is a pattern recognition task which has long been studied, but the accuracies are still far from the desired levels. The majority of the studies on speaker recognition demonstrate results obtained from databases in which English voices are used. Since there are very few studies on Turkish speech, the performance of the known successful methods on Turkish voices is uncertain...

chapter

Raga identification using Locality Sensitive Hashing

G Padmasundari, Hema A Murthy

2017 Twenty-third National Conference on Communications (NCC) > 1 - 6

Rāga is a quintessential component of Indian classical music. Rāgas are primarily characterised by melodic time-frequency (T-F) motifs. There have been several efforts made to determine the identity of a rāga, yet the techniques work only on subset(s) of rāgas, or perform poorly in terms of scalability. In this paper, we propose a rāga identification method for Carnatic music using Locality Sensitive...

chapter

Pitch prediction from Mel-frequency cepstral coefficients using sparse spectrum recovery

M V Achuth Rao, Prasanta Kumar Ghosh

2017 Twenty-third National Conference on Communications (NCC) > 1 - 6

This work proposes a technique for predicting the pitch from Mel-frequency cepstral coefficient (MFCC) vectors. Previous pitch prediction methods are based on statistical models such as Gaussian mixture models and hidden Markov models. In this paper, we propose a three-step method to estimate pitch from MFCC vectors. First, the Mel-filterbank energies (MFBEs) are estimated from MFCC vectors. Secondly,...

chapter

Fusion of spectral and prosodic information using combined error optimization for keyword spotting

Laxmi Pandey, Kuldeep Chaudhary, Rajesh M Hegde

2017 Twenty-third National Conference on Communications (NCC) > 1 - 6

Incorporating prosodic information with spectral information at the feature level is challenging. In this paper, a method for feature level fusion of spectral and prosodic information is proposed. A pitch contour is first extracted from the frame blocked segments of the speech signal. These speech segments obtained herein are labeled as high pitch and low pitch segments. Both spectral and prosodic...

chapter

Implicit language identification system based on random forest and support vector machine for speech

Manish Gupta, Shambhu Shankar Bharti, Suneeta Agarwal

2017 4th International Conference on Power, Control & Embedded Systems (ICPCES) > 1 - 6

Speech uttered by human beings contains information about speakers, languages and content. The language of uttered speech can easily be identified by extracting the language-specific information from it. Identification of the language of speech is known as Language Identification (LID). Identification of language from speech is helpful in its translation, speech recognition and speech activated automatic...

chapter

A novel face recognition method based on one state of discrete Hidden Markov Model

Hameed R. Farhan, Mahmuod H. Al-Muifraje, Thamir R. Saeed

2017 Annual Conference on New Trends in Information & Communications Technology Applications (NTICT) > 252 - 257

For about twenty years, research on the number of states in the Hidden Markov Model (HMM) has mainly aimed at increasing it in order to ensure the robustness of the face recognition system. In this paper, a novel face recognition method is presented based on a one-state discrete HMM, which seemed impossible in the past. Contrary to other approaches that use the three parameters...

chapter

Lyric recognition in monophonic singing using pitch-dependent DNN

Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 326 - 330

One of the difficulties in sung speech recognition is the small distance in an acoustic space between phonemes in sung speech. Therefore we considered clustering the speech based on a pitch (fundamental frequency F0) and creating a larger distance between the phonemes. In addition, we considered a two-stage training method of DNN-HMM: the first stage is trained by using conventional acoustic features...

chapter

Evaluating automatic speech recognition systems in comparison with human perception results using distinctive feature measures

Xiang Kong, Jeung-Yoon Choi, Stefanie Shattuck-Hufnagel

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5810 - 5814

This paper describes methods for evaluating automatic speech recognition (ASR) systems in comparison with human perception results, using measures derived from linguistic distinctive features. Error patterns in terms of manner, place and voicing are presented, along with an examination of confusion matrices via a distinctive-feature-distance metric. These evaluation methods contrast with conventional...

chapter

Mood detection from daily conversational speech using denoising autoencoder and LSTM

Kun-Yi Huang, Chung-Hsien Wu, Ming-Hsiang Su, Hsiang-Chi Fu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5125 - 5129

In current studies, an extended subjective self-report method is generally used for measuring emotions. Even though it is commonly accepted that speech emotion perceived by the listener is close to the intended emotion conveyed by the speaker, research has indicated that there still remains a mismatch between them. In addition, the individuals with different personalities generally have different...

chapter

A PLLR and multi-stage Staircase Regression framework for speech-based emotion prediction

Zhaocheng Huang, Julien Epps

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5145 - 5149

Continuous prediction of dimensional emotions (e.g. arousal and valence) has attracted increasing research interest recently. When processing emotional speech signals, phonetic features have been rarely used due to the assumption that phonetic variability is a confounding factor that degrades emotion recognition/prediction performance. In this paper, instead of eliminating phonetic variability, we...

chapter

A mixture model-based real-time audio sources classification method

Maxime Baelde, Christophe Biernacki, Raphael Greff

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2427 - 2431

Recent research on machine learning focuses on audio source identification in complex environments. Such methods rely on extracting features from audio signals and use machine learning techniques to model the sound classes. However, such techniques are often not optimized for real-time implementation or for multi-source conditions. We propose a new real-time audio single-source classification method based...

chapter

Exploiting sequence information for text-dependent Speaker Verification

Subhadeep Dey, Petr Motlicek, Srikanth Madikeri, Marc Ferras

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5370 - 5374

Model-based approaches to Speaker Verification (SV), such as Joint Factor Analysis (JFA), i-vector and relevance Maximum-a-Posteriori (MAP), have been shown to provide state-of-the-art performance for text-dependent systems with fixed phrases. The performance of i-vector and JFA models has been further enhanced by estimating posteriors from a Deep Neural Network (DNN) instead of a Gaussian Mixture Model (GMM)...

chapter

Melody extraction and detection through LSTM-RNN with harmonic sum loss

Hyunsin Park, Chang D. Yoo

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2766 - 2770

This paper proposes a long short-term memory recurrent neural network (LSTM-RNN) for extracting melody and simultaneously detecting regions of melody from polyphonic audio using the proposed harmonic sum loss. The previous state-of-the-art algorithms have not been based on machine learning techniques and certainly not on deep architectures. The harmonics structure in melody is incorporated in the...

chapter

Affect recognition from lip articulations

Rizwan Sadiq, Engin Erzin

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2432 - 2436

Lips deliver visually active clues for speech articulation. Affective states define how humans articulate speech; hence, they also change the articulation of lip motion. In this paper, we investigate the effect of phonetic classes on affect recognition from lip articulations. The affect recognition problem is formalized in discrete activation, valence and dominance attributes. We use the symmetric Kullback-Leibler...

chapter

On the impact of non-modal phonation on phonological features

Milos Cernak, Elmar Noth, Frank Rudzicz, Heidi Christensen, et al.

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5090 - 5094

Different modes of vibration of the vocal folds contribute significantly to the voice quality. The neutral mode phonation, often used in a modal voice, is one against which the other modes can be contrastively described, also called non-modal phonations. This paper investigates the impact of non-modal phonation on phonological posteriors, the probabilities of phonological features inferred from the...

chapter

Learning and inferring human actions with temporal pyramid features based on conditional random fields

Shih-Yao Lin, Yen-Yu Lin, Chu-Song Chen, Yi-Ping Hung

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2617 - 2621

Finding an effective way to represent human actions is still an open problem because it usually requires taking into account evidence extracted from various temporal resolutions. A conventional way of representing an action employs temporally ordered fine-grained movements, e.g., key poses or subtle motions. Many existing approaches model actions by directly learning the transitional relationships...

chapter

Event Recognition and Classification in Sports Video

Vijayan Ellappan, Rajkumar Rajasekaran

2017 Second International Conference on Recent Trends and Challenges in Computational Models (ICRTCCM) > 182 - 187

Sports event recognition and classification is a challenging task due to the number of possible categories. On the one hand, how to define appropriate event class names and how to obtain training samples for these classes must be investigated; on the other hand, achieving acceptable classification performance is non-trivial. To address these issues, a content mining pipeline is initially...

INFONA - science communication portal
