Combining multiple low-level visual features is a proven and effective strategy for a range of computer vision tasks. However, limited attention has been paid to combining such features with information from other modalities, such as audio and videotext, for large-scale analysis of web videos. In our work, we rigorously analyze and combine a large set of low-level features that capture appearance,...
Four multiclass Support Vector Machine (SVM) methods were designed for the task of speaker-independent phoneme recognition: All-at-once, One-against-all, One-against-one, and the Directed Acyclic Graph SVM (DAGSVM). Power percentages from eight Discrete Wavelet Transform (DWT) frequency bands are used for feature extraction. All tests were carried out on the TIMIT database. Comparable recognition...
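The multiclass schemes named above differ mainly in how binary SVM decisions are combined into one class label. A minimal sketch of the two most common combination rules, assuming the binary decision values come from already-trained classifiers (the score list and pairwise-vote dictionary below are illustrative placeholders, not the paper's models):

```python
def one_against_all(scores):
    """One-against-all: scores[c] is the decision value of the binary
    'class c vs rest' SVM; pick the most confident classifier."""
    return max(range(len(scores)), key=lambda c: scores[c])

def one_against_one(pairwise):
    """One-against-one: pairwise[(i, j)] is +1 if the (i vs j) SVM votes
    for class i, else -1; the label with the most votes wins
    (ties broken toward the lower class index)."""
    classes = {c for pair in pairwise for c in pair}
    votes = {c: 0 for c in classes}
    for (i, j), sign in pairwise.items():
        votes[i if sign > 0 else j] += 1
    return max(votes, key=lambda c: (votes[c], -c))
```

The DAGSVM variant uses the same pairwise classifiers as one-against-one but evaluates them along a fixed decision graph, needing only K-1 evaluations for K classes instead of K(K-1)/2.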
This paper proposes a novel method for assessing Thai speech based on fractal analysis. A fractal algorithm, Higuchi's method, was selected to evaluate the fractal dimension (FD) of segmented speech signals. To capture how the FD of the waveform changes over time, the time-dependent FD (TDFD) was proposed. The probability distribution of TDFDs, obtained with kernel density estimation, was used as an additional parameter...
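Higuchi's method mentioned above has a compact recipe: subsample the signal at delays k = 1..kmax, average the normalized curve lengths over the k possible starting offsets, and take the FD as the slope of log L(k) against log(1/k). A minimal pure-Python sketch (the choice of kmax and the plain least-squares fit are generic defaults, not the paper's settings):

```python
import math

def higuchi_fd(x, kmax=8):
    """Estimate the fractal dimension of a 1-D signal with Higuchi's method."""
    n = len(x)
    log_k, log_l = [], []
    for k in range(1, kmax + 1):
        lengths = []
        for m in range(k):  # one curve per starting offset
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            dist = sum(abs(x[m + i * k] - x[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalization maps the subsampled curve length back to
            # the scale of the full series.
            lengths.append(dist * (n - 1) / (n_steps * k * k))
        log_k.append(math.log(1.0 / k))
        log_l.append(math.log(sum(lengths) / len(lengths)))
    # Least-squares slope of log L(k) vs log(1/k) is the FD estimate.
    mk = sum(log_k) / len(log_k)
    ml = sum(log_l) / len(log_l)
    num = sum((a - mk) * (b - ml) for a, b in zip(log_k, log_l))
    den = sum((a - mk) ** 2 for a in log_k)
    return num / den
```

A straight line yields FD = 1 exactly, and rougher signals push the estimate toward 2; a TDFD curve is obtained by applying this estimator to a sliding window over the signal.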
In this paper, we analyze the effect of a channel compensation technique on support vector machine (SVM)-based speaker verification performance and compare it with another well-known speaker modeling approach, Gaussian Mixture Models with a universal background model (GMM-UBM). Experiments conducted on the NIST 2002 SRE show that channel compensation considerably improves speaker verification accuracy.
Speech feature extraction has been a key focus of robust speech recognition research, as it significantly affects recognition performance. In this paper, we first study a set of feature extraction methods, such as linear predictive coding (LPC), mel-frequency cepstral coefficients (MFCC), and perceptual linear prediction (PLP), together with several feature normalization techniques including RASTA...
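Of the features listed, LPC has the most self-contained derivation: for each frame, the predictor coefficients solve the autocorrelation normal equations, which the Levinson-Durbin recursion handles in O(order²). A sketch of that core step under the usual autocorrelation-method assumptions (no windowing, pre-emphasis, or normalization from the paper is reproduced):

```python
def lpc_coefficients(frame, order):
    """LPC via the autocorrelation method and Levinson-Durbin recursion.

    Returns (a, e): prediction coefficients a[1..order] such that
    x[n] is predicted by sum_j a[j] * x[n - j], and the residual energy e.
    """
    n = len(frame)
    # Autocorrelation lags r[0..order].
    r = [sum(frame[i] * frame[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    e = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        e *= (1.0 - k * k)  # residual energy shrinks monotonically
    return a[1:], e
```

On a pure first-order autoregressive signal, an order-1 fit recovers the generating coefficient almost exactly, which is a convenient sanity check.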
User authentication is critical to ensuring that only authorized users are able to access restricted resources. A voiceprint can be used as a unique password to prove a user's identity. In this paper, we propose a text-dependent speaker verification system for the Arabic language. The paper advocates the use of a discrete representation of speech signals in terms of Mel-frequency cepstral coefficients...
This paper proposes a novel direction-of-arrival estimation method, for a general 3-dimensional array configuration, for multiple speech signals uttered simultaneously. The method is based on the sparseness of the time-frequency representation of speech signals and is applicable to the underdetermined case where the sources outnumber the sensors. First, we introduce a parameterized closed surface to which we...
In recent years, there have been significant advances in the field of speaker recognition that have resulted in very robust recognition systems. The primary focus of many recent developments has shifted to the problem of recognizing speakers in adverse conditions, e.g., in the presence of noise or reverberation. In this paper, we present the UMD-JHU speaker recognition system applied to the NIST 2010 SRE...
Variable bit-rate coding, introduced for effective utilization of limited communication bandwidth, requires accurate classification of input signals. This paper investigates the implementation of a support vector machine (SVM)-based speech/music classifier in the selectable mode vocoder (SMV) framework, a standard codec adopted by the Third Generation Partnership Project 2 (3GPP2). A support vector...
In this paper we present a fast unsupervised spoken term detection system based on lower-bound Dynamic Time Warping (DTW) search on Graphics Processing Units (GPUs). The lower-bound estimate and the K-nearest-neighbor DTW search are carefully designed to fit the GPU parallel computing architecture. In a spoken term detection task on the TIMIT corpus, a 55x speed-up is achieved compared to our previous...
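The lower-bound idea is admissible pruning: a cheap bound such as LB_Keogh is evaluated first, and the expensive DTW is run only when the bound cannot rule the candidate out. A CPU-side sketch with a Sakoe-Chiba band and equal-length sequences (the paper's GPU kernel design and its exact lower bound are not reproduced here):

```python
def dtw_distance(a, b, r=3):
    """DTW distance with a Sakoe-Chiba band of radius r (equal-length inputs)."""
    inf = float("inf")
    n = len(a)
    d = [[inf] * (n + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - r), min(n, i + r) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][n]

def lb_keogh(query, cand, r=3):
    """LB_Keogh: distance from cand to the band-r envelope of query.
    A valid lower bound on band-constrained DTW with the same radius."""
    lb = 0.0
    for j, c in enumerate(cand):
        window = query[max(0, j - r):j + r + 1]
        lo, hi = min(window), max(window)
        if c > hi:
            lb += c - hi
        elif c < lo:
            lb += lo - c
    return lb

def nearest(query, candidates):
    """1-NN search: skip full DTW whenever the lower bound already
    exceeds the best distance found so far."""
    best, best_i = float("inf"), -1
    for i, cand in enumerate(candidates):
        if lb_keogh(query, cand) >= best:
            continue  # pruned without running DTW
        d = dtw_distance(query, cand)
        if d < best:
            best, best_i = d, i
    return best_i, best
```

The pruning is exact, not approximate: because the bound never exceeds the true banded DTW distance, the nearest neighbor returned is identical to brute-force search.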
We propose a novel method for efficiently searching very large populations of speakers, tens of thousands or more, using an utterance comparison model proposed in a previous work. The model allows much more efficient comparison of utterances than the traditional Gaussian Mixture Model (GMM)-based approach because of its computational simplicity, while maintaining high accuracy. Furthermore, efficiency...
This paper presents a multiple kernel learning (MKL) approach to speech/music discrimination (SMD). The time-frequency representation (spectrogram) of an audio segment, computed by the short-time Fourier transform (STFT), is decomposed by the wavelet packet transform into different subband levels. The subbands, which contain rich texture information, are used as features for this discrimination problem. MKL...
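A wavelet packet transform differs from the ordinary wavelet transform in that detail bands are split further as well, giving 2^levels equal-width subbands whose energies can serve as texture-like features. A sketch using an unnormalized Haar split (the paper's actual wavelet, depth, and subband selection are not specified in this excerpt):

```python
def haar_step(x):
    """One unnormalized Haar analysis step: (approximation, detail),
    each at half length; a trailing odd sample is dropped."""
    a = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return a, d

def wavelet_packet(x, levels):
    """Full wavelet packet tree: every node is split, not just the
    approximation branch, yielding 2**levels leaf subbands."""
    nodes = [x]
    for _ in range(levels):
        nxt = []
        for node in nodes:
            a, d = haar_step(node)
            nxt.extend([a, d])
        nodes = nxt
    return nodes

def subband_energies(x, levels=2):
    """Energy of each leaf subband, a simple feature vector per segment."""
    return [sum(v * v for v in band) for band in wavelet_packet(x, levels)]
```

For a constant signal all energy lands in the lowest (approximation) subband, which is a quick way to verify the decomposition.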
This paper shows that pattern classification based on machine learning is a powerful tool for analyzing human brain activity data obtained by magnetoencephalography (MEG). We propose a new weighting method using a multiple kernel learning (MKL) algorithm to localize the brain area contributing to accurate vowel discrimination. Our MKL simultaneously estimates both the classification boundary and...
An important task in Music Information Retrieval is content-based similarity retrieval, in which, given a query music track, a set of tracks that are similar in musical content is retrieved. A variety of audio features that attempt to model different aspects of the music have been proposed. In most cases, the resulting audio feature vector used to represent each music track is high-dimensional...
In this paper, we present a new method to de-noise speech in the complex spectral domain. The method is derived from kernel principal component analysis (kPCA). Instead of applying PCA in a high-dimensional feature space and then going back to the original input space by using a solution to the pre-image problem, only the pre-image step is applied for de-noising. We show that the de-noised audio sample...
This paper describes a voice quality control method for statistical esophageal speech enhancement. Esophageal speech is produced by one of the alternative speaking methods available to laryngectomees. Its naturalness and intelligibility are much lower than those of natural voices, and its voice quality sounds similar even when uttered by different laryngectomees. These issues are alleviated by a statistical voice...
In this work, we explore the use of sparse representation of GMM mean-shifted supervectors over a learned dictionary for the speaker verification (SV) task. In this method, the dictionaries are learned using the KSVD algorithm, unlike recently proposed SV methods that employ sparse representation classification (SRC) over exemplar dictionaries. The proposed approach with a learned dictionary results...
Emotion recognition from speech has been a very active research topic in pattern recognition. In this paper, we investigate the use of the kernel reduced-rank regression (KRRR) model to address the problem of emotion recognition from speech. KRRR is a nonlinear extension of the linear reduced-rank regression (RRR) model via the kernel trick, in which a kernel mapping is used for the multivariable of RRR...
A classification system that accurately categorizes caller behavior within Interactive Voice Response systems would assist in developing good automated self-service applications. This paper details the implementation of such a classification system for a pay-beneficiary application. Adaptive Neuro-Fuzzy Inference System (ANFIS), feedforward Artificial Neural Network (ANN), and Support Vector Machine...
In this paper, we investigate the enhancement of speech by applying a kernel adaptive filter. Noise removal is very important in many applications, such as telephone conversation and speech recognition. Kernel methods have shown good results in other applications, such as handwriting recognition and inverse distance weighting. To improve speech quality and intelligibility, we can process the signals...
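The excerpt does not name a specific kernel adaptive filter, so as an illustration here is kernel least-mean-squares (KLMS), one of the simplest members of the family: each incoming sample becomes a kernel center, and the prediction is a growing kernel expansion whose coefficients are scaled instantaneous errors. The step size and kernel width below are arbitrary demo values, not settings from the paper:

```python
import math

def gaussian_kernel(u, v, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

class KLMS:
    """Kernel least-mean-squares adaptive filter (minimal sketch,
    without the sparsification used in practical implementations)."""

    def __init__(self, eta=0.5, gamma=1.0):
        self.eta, self.gamma = eta, gamma
        self.centers, self.alphas = [], []

    def predict(self, x):
        # Kernel expansion over all stored centers (0.0 when empty).
        return sum(a * gaussian_kernel(c, x, self.gamma)
                   for c, a in zip(self.centers, self.alphas))

    def update(self, x, d):
        """One online step: observe input x and desired output d."""
        e = d - self.predict(x)           # instantaneous error
        self.centers.append(x)            # new kernel center
        self.alphas.append(self.eta * e)  # its coefficient
        return e
```

Repeatedly presenting the same input-output pair makes the error decay geometrically by a factor of (1 - eta), which is an easy way to check the update rule.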