Search results for: Bo Xu

Items from 1 to 13 out of 13 results

chapter

Combining unidirectional long short-term memory with convolutional output layer for high-performance speech synthesis

Wenfu Wang, Bo Xu

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5500 - 5504

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In this paper, we target improving the accuracy of acoustic modelling for statistical parametric speech synthesis (SPSS) and introduce the convolutional neural network (CNN) due to its powerful capacity in locality modelling. A novel model architecture combining unidirectional long short-term memory (LSTM) and a time-domain convolutional output layer (COL) is proposed and employed to acoustic modelling...

chapter

Investigating gated recurrent neural networks for acoustic modeling

Yuanyuan Zhao, Jie Li, Shuang Xu, Bo Xu

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP) > 1 - 5

2016 10th International Symposium on Chinese Spoken Language Processing (ISCSLP)

Recurrent neural networks (RNNs) with a gating mechanism have been shown to give state-of-the-art performance in acoustic modeling, such as gated recurrent unit (GRU), long short-term memory (LSTM), long short-term memory projected (LSTMP), etc. But little is known about why these gated RNNs work and what the differences are among these networks. Based on a series of experimental comparison and analysis,...

chapter

An SAD algorithm based on SGMM and phoneme combination

Xiao Chen, Bo Xu

2015 4th International Conference on Computer Science and Network Technology (ICCSNT) > 1 > 1391 - 1394

2015 4th International Conference on Computer Science and Network Technology (ICCSNT)

Speech activity detection (SAD) is the key preprocess of speech application. This paper proposed a subspace Gaussian mixture model (SGMM) and phoneme combination based SAD algorithm. This algorithm is efficient, small and can utilize speech recognition corpus directly. Results indicate that, compared with the baseline, our proposed method results in an absolute improvement of 4.9% frame error rate...

chapter

Discriminative training of weighted polynomial vector for acoustic language recognition

Ce Zhang, Rong Zheng, Bo Xu

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4849 - 4852

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper, we propose a discriminative method for the acoustic feature based language recognizer, which is a modification of the polynomial expansion in generalized linear discriminant sequence (GLDS) kernel. It is inspired by the Gaussian mixture model-support vector machine (GMM-SVM) system which has been successfully used in both speaker and language recognition. Because of the restriction...

chapter

Multi-modal information fusion for news story segmentation in broadcast video

Bailan Feng, Peng Ding, Jiansong Chen, Jinfeng Bai, et al.

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 1417 - 1420

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

With the fast development of high-speed network and digital video recording technologies, broadcast video has been playing a more and more important role in our daily life. In this paper, we propose a novel news story segmentation scheme which can segment broadcast video into story units with multi-modal information fusion (MMIF) strategy. Compared with traditional methods, the proposed scheme extracts...

chapter

Unsupervised training of subspace gaussian mixture models for conversational telephone speech recognition

Zejun Ma, Xiaorui Wang, Bo Xu

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4829 - 4832

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

This paper presents our preliminary works on exploring unsupervised training of subspace gaussian mixture models for under-resourced CTS recognition task. The subspace model yields better performance than conventional GMM model, particularly in small or middle-sized training set. As an effective way to save human efforts, unsupervised learning is often applied to automatically transcribe a large amount...

chapter

Mandarin prosodic break detection based on complementary model

Chong-Jia Ni, Wen-Ju Liu, Bo Xu

2010 7th International Symposium on Chinese Spoken Language Processing > 353 - 357

7th International Symposium on Chinese Spoken Language Processing (ISCSLP 2010)

Automatic prosodic break detection is important for both speech understanding and natural speech synthesis. In this paper, we develop complementary model to detect Mandarin prosodic break by using acoustic, lexical and syntactic evidence. The model realizes the complementarities by taking the advantages of each model. When comparing with the baseline system, our proposed method has good performance.

chapter

Automatic Detection of Stress in Mandarin Utterance with Tone Dependent Model

Taotao Zhu, Dengfeng Ke, Zhenbiao Chen, Bo Xu

2009 Chinese Conference on Pattern Recognition > 1 - 5

2009 Chinese Conference on Pattern Recognition. (CCPR 2009) and the First CJK Joint Workshop on Pattern Recognition (CJKPR)

In this paper, we present the work in progress on automatic detection of stress in continuous Mandarin (standard Chinese) spoken utterance, and we are interested in finding the characteristic and performance of the acoustic stress cues in Mandarin. Therefore, correlated stress features including pitch, duration, intensity and spectral intensity are exploited with the purpose of developing the baseline...

chapter

Mandarin pitch accent prediction using hierarchical model based ensemble machine learning

Chongjia Ni, Wenju Liu, Bo Xu

2009 IEEE Youth Conference on Information, Computing and Telecommunication > 327 - 330

2009 IEEE Youth Conference on Information, Computing and Telecommunication (YC-ICT 2009)

In this study, we combine the Mandarin characteristics with Mandarin acoustic attribute and text information and use hierarchical model based ensemble machine learning to predict Mandarin pitch accent. Our model could make the best of advantages of prosody hierarchical structure and ensemble machine learning. When comparing our model with classification and regression tree (CART), support vector machine...

chapter

Exploring the automatic mispronunciation detection of confusable phones for mandarin

Jie Jiang, Bo Xu

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 4833 - 4836

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

Mispronunciation detection is one of the vital tasks of the CALL (Computer Assisted Language Learning) systems. Many methods have been introduced to accomplish this task. However, few of them have addressed the detection task on confusable phones. In this paper, phone-level classifiers are utilized to improve the detection performance on the confusable phones. Features of the classifiers are posterior...

chapter

Automatic Prosody Boundary Labeling of Mandarin Using Both Text and Acoustic Information

Chongjia Ni, Wenju Liu, Bo Xu

2008 6th International Symposium on Chinese Spoken Language Processing > 1 - 4

2008 6th International Symposium on Chinese Spoken Language Processing

Prosody is an important factor for a high quality text-to-speech (TTS) system. Prosody is often described with a hierarchical structure. So the generation of the hierarchical prosody structure is very important both in the corpus building and the real-time text analysis, but the prosody labeling procedure is laborious and time consuming. In this paper, an automatic prosody boundary label system is...

chapter

An effective and efficient method for query by humming system based on multi-similarity measurement fusion

Lei Wang, Shen Huang, Sheng Hu, Jiaen Liang, et al.

2008 International Conference on Audio, Language and Image Processing > 471 - 475

2008 International Conference on Audio, Language and Image Processing

Since it is the most natural way for people to search a specific melody in large music database, query by humming/singing is attracting more and more researchers' attention in the field of content-based music information retrieval. In this task, note-based and frame-based similarity measures are two commonly used methods. However, in previous works, researchers always focus on one of the two methods...

chapter

Query by humming via multiscale transportation distance in random query occurrence context

Shen Huang, Lei Wang, Sheng Hu, Hongchen Jiang, et al.

2008 IEEE International Conference on Multimedia and Expo > 1225 - 1228

2008 IEEE International Conference on Multimedia and Expo (ICME)

Query by humming (QBH) is an interactive tool for retrieving favored songs from a large database of known media via acoustic input. In this task, common method for measuring similarity between query and candidate is either by symbolic notation distance or by framed based dynamic programming. However, the former has disadvantage of error-prone to the noted symbolic feature extraction stage, while the...
