Wyniki wyszukiwania

Pozycje od 1 do 20 spośród 39 wyników

Poprzednia

Następna

rozdział

Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection

Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, więcej

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5645 - 5649

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages...

rozdział

Clinical informatics: mining of pathological data by acoustic analysis

Zulfiqar Ali, Mansour Alsulaiman, Ghulam Muhammad, Ahmed Al-nasheri, więcej

2017 International Conference on Informatics, Health & Technology (ICIHT) > 1 - 8

2017 International Conference on Informatics, Health & Technology (ICIHT)

Data mining has a great potential in different areas of health informatics. Data mining in health industry can minimize the health cost as well as reduces the risk of life by informing a person at initial stage. An automatic classification system capable of mining pathological data may contribute in health informatics significantly. In this paper, an automatic system to differentiate between pathological...

rozdział

I-vector based deep neural network acoustic model adaptation using multilingual language resource

Haihua Xu, Wei Rao, Xiong Xiao, Hao Huang, więcej

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) > 1 - 5

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)

I-vector adaptation of DNN-HMM acoustic models has shown clear performance improvement for speech recognition. In this paper, we study this technique on Babel task. we use Swahili as target language (training data of 50 hours) and another 6 languages as multilingual resources to train i-vector extractors respectively. Our study shows that i-vector extractors trained with more multilingual data only...

rozdział

Learning contextual relevance of audio segments using discriminative models over AUD sequences

Sourish Chaudhuri, Bhiksha Raj

2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA) > 197 - 200

2011 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

Effective retrieval of multimodal data involves performing accurate segmentation and analysis of such data. With easy access to a number of audio and video sharing platforms online, user-generated content with considerably less than ideal recording conditions has increased rapidly. One major issue with such content is the presence of semantically irrelevant segments in such recordings. This leads...

rozdział

Bilingual audio-subtitle extraction using automatic segmentation of movie audio

Andreas Tsiartas, Prasanta Ghosh, Panayiotis G. Georgiou, Shrikanth Narayanan

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5624 - 5627

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Extraction of bilingual audio and text data is crucial for designing Speech to Speech (S2S) systems. In this work, we propose an automatic method to segment multilingual audio streams from movies. In addition, the audio streams are aligned with the corresponding subtitles. We found that the proposed method gives 89% perfectly segmented bilingual audio and 6% partially segmented bilingual audio. In...

rozdział

Advances in Acoustic Modeling for Vietnamese LVCSR

Tuan Nguyen, Quan Vu

2009 International Conference on Asian Language Processing > 280 - 284

2009 International Conference on Asian Language Processing (IALP 2009)

In this paper, we present our experiments on the selection of basic phonetic units for the Vietnamese large vocabulary continuous speech recognition (LVCSR). Two acoustic models were compared. The first model has just used vowels or monophthongs as phonemes while the second one, which was proposed in this paper, has explored the use of diphthongs and triphthongs as phonemes as well. The two models...

rozdział

Hidden Conditional Random Fields for phone recognition

Yun-Hsuan Sung, D. Jurafsky

2009 IEEE Workshop on Automatic Speech Recognition&Understanding > 107 - 112

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009)

We apply Hidden Conditional Random Fields (HCRFs) to the task of TIMIT phone recognition. HCRFs are discriminatively trained sequence models that augment conditional random fields with hidden states that are capable of representing subphones and mixture components. We extend HCRFs, which had previously only been applied to phone classification with known boundaries, to recognize continuous phone sequences...

rozdział

Automatic detection of vowel pronunciation errors using multiple information sources

J. van Doremalen, C. Cucchiarini, H. Strik

2009 IEEE Workshop on Automatic Speech Recognition&Understanding > 580 - 585

2009 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU 2009)

Frequent pronunciation errors made by L2 learners of Dutch often concern vowel substitutions. To detect such pronunciation errors, ASR-based confidence measures (CMs) are generally used. In the current paper we compare and combine confidence measures with MFCCs and phonetic features. The results show that the best results are obtained by using MFCCs, then CMs, and finally phonetic features, and that...

rozdział

Single Sensor Acoustic Feature Extraction for Embedded Realtime Vehicle Classification

A. Starzacher, B. Rinner

2009 International Conference on Parallel and Distributed Computing, Applications and Technologies > 378 - 383

2009 International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2009)

Vehicle classification is an important task for various traffic monitoring applications. This paper investigates the capabilities of acoustic feature generation for vehicle classification. Six temporal and spectral features are extracted from the audio recordings. Six different classification algorithms are compared using the extracted features. We focus on a single sensor setting to keep the computational...

rozdział

Multi-voice polyphonic music transcription using eigeninstruments

G. Grindlay, D. Ellis

2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics > 53 - 56

2009 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)

We present a model-based approach to separating and transcribing single-channel, multi-instrument polyphonic music in a semi-blind fashion. Our system extends the non-negative matrix factorization (NMF) algorithm to incorporate constraints on the basis vectors of the solution. In the context of music transcription, this allows us to encode prior knowledge about the space of possible instrument models...

rozdział

Named Entity Recognition of Spoken Documents Using Subword Units

G. Paass, A. Pilz, J. Schwenninger

2009 IEEE International Conference on Semantic Computing > 529 - 534

2009 IEEE International Conference on Semantic Computing (ICSC)

The output of a speech recognition system is a stream of text features that is overlayed by noise resulting from errors in the system's statistical classification of the audio input. Conditional random fields (CRFs), which have already proven themselves to be efficient, high-performance named entity recognizers (NERs) for named entities from text, offer the promise to compensate part of these errors...

rozdział

A multiple perception model on emotional speech

Jianhua Tao, Ya Li, Shifeng Pan

2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops > 1 - 6

2009 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII 2009)

More and more efforts have been made for the research of emotional speech recently. Although we may, sometimes be able to make a definite perceptual decision on emotion state, emotion is actually a kind of cline in a large vector space. Different emotions can be thought of as zones along an emotional vector. To resolve the ambiguity of emotion perception, the authors make an array of perception experiments...

rozdział

Detection of Abnormal Sound Using Multi-stage GMM for Surveillance Microphone

A. Ito, A. Aiba, M. Ito, S. Makino

2009 Fifth International Conference on Information Assurance and Security > 1 > 733 - 736

2009 Fifth International Conference on Information Assurance and Security (IAS)

We developed a system that detects abnormal sound from sound signal observed by a surveillance microphone. Our system learns the ldquonormal soundrdquo from observation of the microphone, and then detects sounds never observed before as ldquoabnormal sounds.rdquo To this end, we developed a technique that uses multiple GMMs for modeling different levels of sound events efficiently. We also consider...

rozdział

LOTUS-BN: A Thai broadcast news corpus and its research applications

A. Chotimongkol, K. Saykhum, P. Chootrakool, N. Thatphithakkul, więcej

2009 Oriental COCOSDA International Conference on Speech Database and Assessments > 44 - 50

2009 Oriental COCOSDA International Conference on Speech Database and Assessments

This paper describes the design and construction of the LOTUS-BN corpus, a Thai television broadcast news corpus. In addition to audio recordings and their transcription, this corpus also includes a detailed annotation of many interesting characteristics of broadcast news data such as acoustic condition, overlapping speech, news topic and named entity. The LOTUS-BN is still an ongoing project with...

rozdział

Design and development of phonetically rich Urdu speech corpus

A.A. Raza, S. Hussain, H. Sarfraz, I. Ullah, więcej

2009 Oriental COCOSDA International Conference on Speech Database and Assessments > 38 - 43

2009 Oriental COCOSDA International Conference on Speech Database and Assessments

Phonetically rich speech corpora play a pivotal role in speech research. The significance of such resources becomes crucial in the development of Automatic Speech Recognition systems and Text to Speech systems. This paper presents details of designing and developing an optimal context based phonetically rich speech corpus for Urdu that will serve as a baseline model for training a Large Vocabulary...

rozdział

A new method for sample selection in active learning

Wei Chen, Gang Liu, Jun Guo, Yu-Jing Guo

2009 International Conference on Machine Learning and Cybernetics > 4 > 2270 - 2274

2009 Eighth International Conference on Machine Learning and Cybernetics (ICMLC)

Speech recognition systems are usually trained using tremendous transcribed samples, and training data preparation is intensively time-consuming and costly. Aiming at achieving better performance of acoustic model with less transcribed samples, active learning is adopted in acoustic model training to iteratively select the most informative samples corresponding to some sample selection method. And...

rozdział

Automatic topic detection strategy for information retrieval in spoken document

Shan Jin, H. Misra, T. Sikora, J. Jose

2009 10th Workshop on Image Analysis for Multimedia Interactive Services > 300 - 303

2009 10th Workshop on Image Analysis for Multimedia Interactive Services. WIAMIS 2009

This paper suggests an alternative solution for the task of spoken document retrieval (SDR). The proposed system runs retrieval on multi-level transcriptions (word and phone) produced by word and phone recognizers respectively, and their outputs are combined. We propose to use latent Dirichlet allocation (LDA) model for capturing the semantic information on word transcription. The LDA model is employed...

rozdział

Voice conversion using Artificial Neural Networks

S. Desai, E.V. Raghavendra, B. Yegnanarayana, A.W. Black, więcej

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 3893 - 3896

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

In this paper, we propose to use artificial neural networks (ANN) for voice conversion. We have exploited the mapping abilities of ANN to perform mapping of spectral features of a source speaker to that of a target speaker. A comparative study of voice conversion using ANN and the state-of-the-art Gaussian mixture model (GMM) is conducted. The results of voice conversion evaluated using subjective...

rozdział

Using collective information in semi-supervised learning for speech recognition

B. Varadarajan, Dong Yu, Li Deng, A. Acero

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 4633 - 4636

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

Training accurate acoustic models typically requires a large amount of transcribed data, which can be expensive to obtain. In this paper, we describe a novel semi-supervised learning algorithm for automatic speech recognition. The algorithm determines whether a hypothesized transcription should be used in the training by taking into consideration collective information from all utterances available...

rozdział

Acoustic-based pitch-accent detection in speech: Dependence on word identity and insensitivity to variations inword usage

A. Margolis, M. Ostendorf

2009 IEEE International Conference on Acoustics, Speech and Signal Processing > 4513 - 4516

ICASSP 2009 - 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

Past work has produced fairly accurate automatic pitch-accent detectors, but it has often been noted that the accent class of a word is highly dependent on word identity, with some words and word types usually being accented and others not. We argue that a good accent detector should not only have high overall accuracy, but also be able to distinguish between accented and unaccented variants of the...

Poprzednia

Następna

Opcje filtrowania

Słowa kluczowe:
ACOUSTICS
DATA MINING

Data publikacji

Ustaw własny zakres dat

Słowa kluczowe

FEATURE EXTRACTION (23)
SPEECH (20)
SIGNAL PROCESSING (17)
SPEECH RECOGNITION (16)
ACCURACY (14)
ARTIFICIAL NEURAL NETWORKS (13)
SPEECH PROCESSING (12)
HIDDEN MARKOV MODELS (11)
PATTERN RECOGNITION (10)
CLASSIFICATION ALGORITHMS (9)
COMPUTERS (9)
SUPPORT VECTOR MACHINE CLASSIFICATION (9)
TESTING (9)
DATABASES (7)
ROBUSTNESS (7)
TRANSFORMS (7)
WAVELET TRANSFORMS (7)
ALGORITHM DESIGN AND ANALYSIS (6)
CONFERENCES (6)
FREQUENCY DOMAIN ANALYSIS (6)
NOISE (6)
SIGNAL PROCESSING ALGORITHMS (6)
TRAINING DATA (6)
ACOUSTIC MEASUREMENTS (5)
COMPLEXITY THEORY (5)
ELECTRONIC MAIL (5)
EQUATIONS (5)
ESTIMATION (5)
FILTERING THEORY (5)
IMAGE PROCESSING (5)
IMAGE SEGMENTATION (5)
MATHEMATICAL MODEL (5)
PRESSES (5)
SUPPORT VECTOR MACHINES (5)
TIME FREQUENCY ANALYSIS (5)
BAND PASS FILTERS (4)
COMPUTATIONAL MODELING (4)
CONVERGENCE (4)
DATA MODELS (4)
IMAGE COLOR ANALYSIS (4)
LABORATORIES (4)
MATERIALS (4)
MEL FREQUENCY CEPSTRAL COEFFICIENT (4)
MONITORING (4)
PSYCHOLOGY (4)
SHAPE (4)
ADAPTATION MODEL (3)
ARTIFICIAL INTELLIGENCE (3)
COMPUTER ARCHITECTURE (3)
CONTEXT (3)
CORRELATION (3)
COVARIANCE MATRIX (3)
DECISION MAKING (3)
DETECTION ALGORITHMS (3)
DETECTORS (3)
EDUCATIONAL INSTITUTIONS (3)
EMOTION RECOGNITION (3)
ENTROPY (3)
FAULT DIAGNOSIS (3)
FILTER BANK (3)
FOURIER TRANSFORMS (3)
FREQUENCY MODULATION (3)
GABOR FILTERS (3)
GAUSSIAN MIXTURE MODEL (3)
GRAY-SCALE (3)
HEURISTIC ALGORITHMS (3)
IMAGE ANALYSIS (3)
IMAGE RECOGNITION (3)
IMAGE RESOLUTION (3)
IMAGING (3)
KNOWLEDGE ENGINEERING (3)
MACHINE LEARNING (3)
MUSIC (3)
NATURAL LANGUAGE PROCESSING (3)
NETWORK TOPOLOGY (3)
PATTERN CLASSIFICATION (3)
PERIODIC STRUCTURES (3)
PROBABILITY (3)
PRODUCTION (3)
PROTOTYPES (3)
RADIAL BASIS FUNCTION NETWORKS (3)
RELIABILITY (3)
REVIEWS (3)
SIGNAL RESOLUTION (3)
SPECTRAL ANALYSIS (3)
TIME DOMAIN ANALYSIS (3)
TOPOLOGY (3)
UNDERWATER VEHICLES (3)
VECTORS (3)
VEHICLES (3)
WAVELET ANALYSIS (3)
WAVELET PACKETS (3)
WRITING (3)
ACOUSTIC MODELS (2)
ACOUSTIC SIGNAL PROCESSING (2)
ANALYTICAL MODELS (2)
APPROXIMATION METHODS (2)
więcej

INFONA - portal komunikacji naukowej

Wyniki wyszukiwania

Dodaj adresata

Anulowanie wysłania wiadomości

Czy na pewno chcesz anulować wysłanie wiadomości?

Wyślij wiadomość

Opcje filtrowania

Data publikacji

Ustawianie zakresu dat

Podaj zakres dat dla filtrowania wyświetlonych wyników. Możesz podać datę początkową, końcową lub obie daty. Daty możesz wpisać ręcznie lub wybrać za pomocą kalendarza.

Słowa kluczowe

Zgłaszanie błędu / nadużycia

Nieudane wysłanie zgłoszenia

Ułatwienia dostępu