Items from 1 to 20 out of 36 results

chapter

On statistical machine translation method for lexicon refinement in speech recognition

Haihua Xu, Xiong Xiao, Eng-Siong Chng, Haizhou Li

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP) > 25 - 29

2015 IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP)

In low resource Automatic Speech Recognition (ASR), one usually resorts to the Statistical Machine Translation (SMT) technique to learn transform rules to refine grapheme lexicon. To do this, we face two challenges. One is to generate grapheme sequences from the training data as the targets, which is paired with the original transcripts to train SMT models; the other is to effectively prune the learned...

chapter

Longer-length acoustic units for continuous speech recognition

Annika Hamalainen, Johan de Veth, Lou Boves

2005 13th European Signal Processing Conference > 1 - 4

2005 13th European Signal Processing Conference

Recent research on the TIMIT database suggests that longer-length acoustic units are better suited for modelling pronunciation variation and long-term temporal dependencies in speech than traditional phoneme-length units, yielding substantial improvements in recognition accuracy [9]. In this paper, we investigate whether similar improvements can be gained on another database, viz. excerpts from novels...

chapter

Variational learning and inference algorithms for extended Gaussian mixture model

Xin Wei, Jianxin Chen, Lei Wang, Jingwu Cui, et al.

2014 IEEE/CIC International Conference on Communications in China (ICCC) > 236 - 240

2014 IEEE/CIC International Conference on Communications in China (ICCC)

In this paper, in order to properly evaluate the relative importance of priors and observed data in the Bayesian framework, we propose an extended Gaussian mixture model (EGMM) and design the corresponding learning inference algorithms. First, we define the likelihood function of the EGMM and then propose the variational learning algorithm for this EGMM. Moreover, the proposed model and approach are...

chapter

Performance analyze of QoE-based speech quality evaluation model

Weiwei Zhang, Yongyu Chang, Yitong Liu, Yuan Tian

2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) > 1 - 6

2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)

In this paper, we analyze three QoE-based speech quality evaluation models: PESQ, NPESQ and POLQA models. PESQ (Perceptual evaluation of speech quality) is a well known objective speech quality assessment method for speech QoE evaluation. It is formed as the ITU-T P.862 Recommendations. NPESQ (New Perceptual Evaluation of Speech Quality) model is a new objective QoE model on evaluating the speech...

chapter

Acoustic modeling for native and non-native Mandarin speech recognition

Xin Chen, Jian Cheng

2012 8th International Symposium on Chinese Spoken Language Processing > 325 - 329

2012 8th International Symposium on Chinese Spoken Language Processing (ISCSLP 2012)

In this paper, we first described the automatic Spoken Chinese Test (SCT). With a large amount of native and non-native data collected for SCT, different training strategies for acoustic modeling were investigated. Evaluations were performed on native as well as non-native datasets. We discovered that directly combining native and non-native data to train acoustic models did not work well, and the...

chapter

Speaker variability in emotion recognition - an adaptation based approach

Ni Ding, Vidhyasaharan Sethu, Julien Epps, Eliathamby Ambikairajah

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5101 - 5104

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

None of the features commonly utilised in automatic emotion classification systems completely disassociate emotion-specific information from speaker-specific information. Consequently, this speaker-specific variability adversely affects the performance of the emotion classification system and in existing systems is frequently mitigated by some form of speaker normalisation. Speaker adaptation offers...

chapter

Multi-domain data modeling for biometrics

Alex Chen, Jason Kinser

2011 IEEE Applied Imagery Pattern Recognition Workshop (AIPR) > 1 - 5

2011 IEEE Applied Imagery Pattern Recognition Workshop: Imaging for Decision Making (AIPR 2011)

Recently, much work has been performed on CBIR (content based image retrieval) that treats images as single data domain. However, in our highly digitized society, information is being supplied in multiple domains where the data is linked across domains. For example, a web site does contain images, but it may also contain text, hyperlinks, documents, sound files, movies, and other domains of data....

chapter

Part of Speech Tagging for Romanian Text-to-Speech System

Lucian Radu Teodorescu, Razvan Boldizsar, Mihai Ordean, Melania Duma, et al.

2011 13th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing > 153 - 159

2011 13th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC)

This paper describes a Part of Speech (POS) tagger that has been developed for Romanian Text-to-Speech purposes. In our Text-to-Speech (TTS) system, the Part of Speech tagger is used to disambiguate the pronunciation of some homograph words, determine the semantic links between words, phrase breaks and intonation phrase boundaries and eventually design the intonation curves. The paper focuses on the...

chapter

Semantic data selection for vertical business voice search

Giuseppe Di Fabbrizio, Diamantino Caseiro, Amanda J. Stent

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5616 - 5619

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Local business voice search is a popular application for mobile phones, where hands-free interaction and speed are critical to users. However, speech recognition accuracy is still not satisfactory when the number of businesses and locations is extended nationwide. For mobile users, searching a local business directory is often related to the fulfillment of specific tasks “on-the-move”, such as finding...

chapter

Role of nucleus based context in word-independent syllable stress classification

Harish Doddala, Om D Deshmukh, Ashish Verma

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5712 - 5715

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

An acoustic-phonetics based word-independent technique which uses syllable context for classifying the lexical syllable stress of spoken English words is presented. Nucleus based clustering is remarkably successful in moving from word-dependent syllable stress classification which is intrinsically not scalable to word-independent classification. This however is not possible without an inherent drop...

chapter

Significance of segmentation in phoneme based Tamil speech recognition system

S. Harish, P. Vijayalakshmi, T. Nagarajan

2011 3rd International Conference on Electronics Computer Technology > 3 > 212 - 215

2011 3rd International Conference on Electronics Computer Technology (ICECT)

Over the last few decades speech recognition has evolved and matured enough to be used in commercial applications. The applications include automatic dictation software, voice dialling, voice controlled navigation and simple data entry. Automatic Speech Recognition (ASR) deals with automatic conversion of acoustic signals of an utterance into text. In this work speech recognition system for Tamil...

chapter

On the Privacy of Encrypted Skype Communications

B Dupasquier, S Burschka, K McLaughlin, S Sezer

2010 IEEE Global Telecommunications Conference GLOBECOM 2010 > 1 - 5

2010 IEEE Global Communications Conference (GLOBECOM 2010)

The privacy of voice over IP (VoIP) systems is achieved by compressing and encrypting the sampled data. This paper investigates in detail the leakage of information from Skype, a widely used VoIP application. In this research, it has been demonstrated by using the dynamic time warping (DTW) algorithm, that sentences can be identified with an accuracy of 60%. The results can be further improved by...

chapter

Performance improvement in automatic gender identification using hierarchical clustering

M A Keyvanrad, M M Homayounpour

2010 5th International Symposium on Telecommunications > 900 - 903

2010 5th International Symposium on Telecommunications (IST)

In this paper a hierarchical structure is proposed for automatic gender identification (AGI). In this structure two clustering techniques are used. The first technique is divisive clustering for dividing speakers from each gender to some classes of speakers. The second clustering technique is agglomerative clustering for creating a hierarchical structure. Feature reduction is done by SOAP feature...

chapter

Semantics-based language modeling for Cantonese-English code-mixing speech recognition

Houwei Cao, P C Ching, Tan Lee, Yu Ting Yeung

2010 7th International Symposium on Chinese Spoken Language Processing > 246 - 250

7th International Symposium on Chinese Spoken Language Processing (ISCSLP 2010)

This paper addresses the problem of language modeling for LVCSR of Cantonese-English code-mixing utterances spoken in daily communications. In the absence of sufficient amount of code-mixing text data, translation-based and semantics-based mapping are applied on n-grams to better estimate the probability of low-frequency and unseen mixed-language n-grams events. In translation-based mapping scheme,...

chapter

The study of Tibetan prosodic structure prediction model

Yu Hongzhi, Chen Chen, Chen Qi, Shi Jing

2010 2nd International Conference on Signal Processing Systems > 1 > V1-645 - V1-648

2010 2nd International Conference on Signal Processing Systems (ICSPS 2010)

Prosodic structure prediction plays a crucial role on the prosodic annotation of speech synthesis corpus as well as on improving the naturalness of synthesized speech. The paper studies Tibetan prosodic structure with Tibetan speech characteristics. Having analyzed a variety of variables that have an impact on Tibetan prosodic boundary, we obtain syllable boundary grammatical information, prosodic...

chapter

A bayesian hierarchical mixture of experts approach to estimate speech quality

S Iman Mossavat, Oliver Amft, Bert de Vries, Petko N Petkov, et al.

2010 Second International Workshop on Quality of Multimedia Experience (QoMEX) > 200 - 205

2010 Second International Workshop on Quality of Multimedia Experience (QoMEX 2010)

This paper demonstrates the potential of theoretically motivated learning methods in solving the problem of non-intrusive quality estimation for which the state-of-the-art is represented by ITU-T P.563 standard. To construct our estimator, we adopt the speech features from P.563, while we use a different mapping of features to form quality estimates. In contrast to P.563 which assumes distortion-classes...

chapter

Recognition of phonemes and words in singing

Annamaria Mesaros, Tuomas Virtanen

2010 IEEE International Conference on Acoustics, Speech and Signal Processing > 2146 - 2149

2010 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2010

This paper studies the influence of n-gram language models in the recognition of sung phonemes and words. We train uni-, bi-, and trigram language models for phonemes and bi- and trigrams for words. The word-level language model is estimated from a textual lyrics database. In the recognition we use a hidden Markov model based phonetic recognizer adapted to singing voice. The models were tested on...

chapter

Enhancing in-vehicle safety via contact sensor for stress detection

S.A. Patil, J.H.L. Hansen

2009 IEEE International Conference on Vehicular Electronics and Safety (ICVES) > 86 - 90

2009 IEEE International Conference on Vehicular Electronics and Safety (ICVES 2009)

The number of vehicles on the road as well as the human drive time is increasing significantly. Many drivers are increasing their attempts to multi-task while driving including eating, drinking, entertainment control etc. A relatively new domain has emerged over the last 5 years focused on increased technology in the vehicle based on: GPS navigation systems, traffic, weather warning systems, advanced...

chapter

A maximum entropy approach to Chinese grapheme-to-phoneme conversion

R.T.-H. Tsai, Yu-Chun Wang

2009 IEEE International Conference on Information Reuse & Integration > 411 - 416

2009 IEEE International Conference on Information Reuse & Integration (IRI 2009)

Grapheme-to-phoneme (G2P) conversion plays an important role in speech synthesis. The main difficulty facing Chinese G2P conversion is that many Chinese characters are polyphonic, having more than one pronunciation. A Chinese G2P system must be able to pick the correct pronunciation from among several candidates. Contextual information on neighboring characters such as character n-grams, phonetic...

chapter

Unsupervised acoustic and language model training with small amounts of labelled data

S. Novotney, R. Schwartz, J. Ma

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 4297 - 4300

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

We measure the effects of a weak language model, estimated from as little as 100k words of text, on unsupervised acoustic model training and then explore the best method of using word confidences to estimate n-gram counts for unsupervised language model training. Even with 100k words of text and 10 hours of training data, unsupervised acoustic modeling is robust, with 50% of the gain recovered when...

Keywords:
DATA MODELS
ACCURACY
SPEECH

Publication type

book (33)
article (3)

Keywords

TRAINING (17)
SPEECH RECOGNITION (15)
HIDDEN MARKOV MODELS (14)
ACOUSTICS (12)
SPEECH PROCESSING (11)
ARTIFICIAL NEURAL NETWORKS (8)
SIGNAL PROCESSING (8)
NOISE (7)
COMPUTATIONAL MODELING (6)
COMPUTERS (6)
ROBUSTNESS (6)
SIGNAL PROCESSING ALGORITHMS (6)
ALGORITHM DESIGN AND ANALYSIS (5)
COMPLEXITY THEORY (5)
DATA MINING (5)
EQUATIONS (5)
FEATURE EXTRACTION (5)
NATURAL LANGUAGE PROCESSING (5)
SIGNAL TO NOISE RATIO (5)
SPEAKER RECOGNITION (5)
ADAPTATION MODEL (4)
CLASSIFICATION ALGORITHMS (4)
CONTEXT MODELING (4)
CONVERGENCE (4)
COST FUNCTION (4)
DATABASES (4)
EDUCATIONAL INSTITUTIONS (4)
ESTIMATION (4)
FREQUENCY MODULATION (4)
GAIN (4)
MATHEMATICAL MODEL (4)
MAXIMUM LIKELIHOOD ESTIMATION (4)
OPTIMIZATION (4)
POLYNOMIALS (4)
TESTING (4)
TRAINING DATA (4)
TRANSFORMS (4)
VECTORS (4)
WRITING (4)
ACOUSTIC SIGNAL PROCESSING (3)
ADDITIVE NOISE (3)
AUTOMATIC SPEECH RECOGNITION (3)
COMPUTER LANGUAGES (3)
CONFERENCES (3)
CONTEXT (3)
CORRELATION (3)
DELAY (3)
EIGENVALUES AND EIGENFUNCTIONS (3)
ELECTRONIC MAIL (3)
IEEE TRANSACTIONS ON SIGNAL PROCESSING (3)
INTERPOLATION (3)
LANGUAGE MODELING (3)
MEL FREQUENCY CEPSTRAL COEFFICIENT (3)
PARAMETER ESTIMATION (3)
PATTERN RECOGNITION (3)
PRINCIPAL COMPONENT ANALYSIS (3)
REAL TIME SYSTEMS (3)
SUPPORT VECTOR MACHINE CLASSIFICATION (3)
WHITE NOISE (3)
ACOUSTIC MEASUREMENTS (2)
ACOUSTIC MODELING (2)
ANALYTICAL MODELS (2)
BAYES METHODS (2)
BAYESIAN METHODS (2)
COMPUTER ARCHITECTURE (2)
COVARIANCE MATRIX (2)
DECISION TREES (2)
DECODING (2)
DISTANCE MEASUREMENT (2)
ESTIMATION THEORY (2)
GALLIUM NITRIDE (2)
GRAMMARS (2)
INDEXES (2)
INSTRUMENTS (2)
INTERNET (2)
INTERNET TELEPHONY (2)
LANGUAGE MODEL (2)
LEARNING (ARTIFICIAL INTELLIGENCE) (2)
LEAST SQUARES APPROXIMATION (2)
MICROPHONES (2)
MUTUAL INFORMATION (2)
NUMERICAL MODELS (2)
OBJECT RECOGNITION (2)
OSCILLATORS (2)
PREDICTIVE MODELS (2)
RADAR (2)
RADAR SIGNAL PROCESSING (2)
REGISTERS (2)
RELIABILITY (2)
REVIEWS (2)
SENSORS (2)
SIMULATION (2)
SONAR (2)
SOURCE SEPARATION (2)
SPEAKER ADAPTATION (2)
SPEECH SYNTHESIS (2)
SPLINE (2)

INFONA - science communication portal
