Advanced search

Advanced search in people

From:

To:

Items from 1 to 20 out of 36 results

chapter

Voice activity detection using discriminative restricted Boltzmann machines

Rogerio G. Borin, Magno T. M. Silva

2017 25th European Signal Processing Conference (EUSIPCO) > 523 - 527

2017 25th European Signal Processing Conference (EUSIPCO)

Voice Activity Detection (VAD) plays an important role in current technological applications, such as wireless communications and speech recognition. In this paper, we address the VAD task through machine learning by using a discriminative restricted Boltzmann machine (DRBM). We extend the conventional DRBM to deal with continuous-valued data and employ feature vectors based either on mel-frequency...

chapter

Development of a large-scale Mandarin Radio Speech Corpus

Yung-hsiang Shawn Chang, Yuan-fu Liao, Sheng-ming Wang, Jenq-haur Wang, more

2017 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW) > 359 - 360

2017 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-TW)

The Taiwan Mandarin Radio Speech Corpus consists of roughly 300 (and growing) hours of audio recordings, selected from Taiwan's National Education Radio (NER) archive. The corpus includes speech from hundreds of speakers and various speech styles (spontaneous conversational and read news). This corpus provides a rich resource for research in speech and automatic speech recognition (ASR). In this paper,...

chapter

Supervised audio tampering detection using an autoregressive model

Xiaodan Lin, Xiangui Kang

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 2142 - 2146

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Splicing, cutting and insertion are the most common operations imposed on audio files when the adversary intends to modify or fabricate the content. The detection of such kinds of tampering is still challenging in real-world applications. In this paper, a generic approach for the detection of audio tampering is proposed via the analysis of electric network frequency (ENF). Based on the fact that tampering...

chapter

Multi-speaker conversations, cross-talk, and diarization for speaker recognition

Gregory Sell, Alan McCree

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5425 - 5429

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

I-vector training and extraction assume that a speech file is spoken by a single speaker. This work considers the effects of violating that assumption with the presence of cross-talk or multi-speaker conversations. First, it is demonstrated that these problematic speech files can be detected using the i-vector representation itself. The impact of these violations of the single-speaker assumption are...

chapter

Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks

Zhong-Qiu Wang, Ivan Tashev

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5150 - 5154

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Accurately recognizing speaker emotion and age/gender from speech can provide better user experience for many spoken dialogue systems. In this study, we propose to use deep neural networks (DNNs) to encode each utterance into a fixed-length vector by pooling the activations of the last hidden layer over time. The feature encoding process is designed to be jointly trained with the utterance-level classifier...

chapter

Deep learning with maximal figure-of-merit cost to advance multi-label speech attribute detection

Ivan Kukanov, Ville Hautamaki, Sabato Marco Siniscalchi, Kehuang Li

2016 IEEE Spoken Language Technology Workshop (SLT) > 489 - 495

2016 IEEE Spoken Language Technology Workshop (SLT)

In this work, we are interested in boosting speech attribute detection by formulating it as a multi-label classification task, and deep neural networks (DNNs) are used to design speech attribute detectors. A straightforward way to tackle the speech attribute detection task is to estimate DNN parameters using the mean squared error (MSE) loss function and employ a sigmoid function in the DNN output...

chapter

A comparative study on phonological feature detection from continuous speech with respect to variable corpus size

Tanmay Bhowmik, Krishna Dulal Dalapati, Shyamal Kumar Das Mandal

2016 IEEE Students’ Technology Symposium (TechSym) > 311 - 316

2016 IEEE Students’ Technology Symposium (TechSym)

In this paper, place and manner of articulation based phonological features have been successfully identified with high accuracy using very minimal amount of training data. In detection-based, bottom-up speech recognition approach, the phonological feature based acoustic-phonetic speech attributes are considered as a key component. After identifying the features, they are merged together to get the...

chapter

Spoofing attacks to i-vector based voice verification systems using statistical speech synthesis with additive noise and countermeasure

Mustafa Caner Ozbay, Ali Khodabakhsh, Amir Mohammadi, Cenk Demiroglu

2016 24th European Signal Processing Conference (EUSIPCO) > 1207 - 1211

2016 24th European Signal Processing Conference (EUSIPCO)

Even though improvements in the speaker verification (SV) technology with i-vectors increased their real-life deployment, their vulnerability to spoofing attacks is a major concern. Here, we investigated the effectiveness of spoofing attacks with statistical speech synthesis systems using limited amount of adaptation data and additive noise. Experiment results show that effective spoofing is possible...

chapter

Speaker recognition with artificial neural networks and MEL-frequency cepstral coefficients correlations

Roberto A. B. Soria, Euvaldo F. Cabral

1996 8th European Signal Processing Conference (EUSIPCO 1996) > 1 - 4

1996 8th European Signal Processing Conference (EUSIPCO 1996)

The problem addressed in this paper is related to the fact that classical statistical approach for speaker recognition yields satisfactory results but at the expense of long length training and test utterances. An attempt to reduce the length of speaker samples is of great importance in the field of speaker recognition since the statistical approach, due to its limitations, is usually precluded from...

chapter

Speaker change point detection using deep neural nets

Vishwa Gupta

2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4420 - 4424

ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

We investigate the use of deep neural nets (DNN) to provide initial speaker change points in a speaker diarization system. The DNN trains states that correspond to the location of the speaker change point (SCP) in the speech segment input to the DNN. We model these different speaker change point locations in the DNN input by 10 to 20 states. The confidence in the SCP is measured by the number of frame...

chapter

Emotion conversion for expressive Arabic text to speech

Doaa Gamal, Mohsen Rashwan, Sherif Mahdy Abdou

2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA) > 341 - 348

2014 IEEE/ACS 11th International Conference on Computer Systems and Applications (AICCSA)

Emotion conversion using a small speech corpus is very important for expressive text to speech systems. Applying the unit selection paradigm for intonation conversion has been widely used for different languages using different intonation units. In this paper, an emotion conversion system is proposed for expressive Arabic speech. This system combines the transformation of both spectral and prosodic...

chapter

Preventing converted speech spoofing attacks in speaker verification

M. J. Correia, A. Abad, I. Trancoso

2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) > 1320 - 1325

2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)

Voice conversion (VC) techniques, which modify a speaker's voice to sound like another's, present a threat to automatic speaker verification (SV) systems. In this paper, we evaluate the vulnerability of a state-of-the-art SV system against a converted speech spoofing attack. To overcome the spoofing attack, we implement state-of-the-art converted speech detectors based on short- and long-term features...

chapter

Acoustic-support vector machines approach to detect spoken Arabic language

Mohammed Osman Eltayeb, Mohammed Elhafiz Mustafa

2013 INTERNATIONAL CONFERENCE ON COMPUTING, ELECTRICAL AND ELECTRONIC ENGINEERING (ICCEEE) > 525 - 529

2013 International Conference on Computing, Electrical and Electronics Engineering (ICCEEE)

Spoken Language detection is the process of either accepting or rejecting a language identity from its sample speech. The process is essential as it represents the first phase for a complete multilingual-enabled speech processing applications. However, most efforts are focused on European languages and the research is relatively few for other languages such as Arabic. This is mainly due to the lack...

chapter

Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition

Dong Yu, Sabato Marco Siniscalchi, Li Deng, Chin-Hui Lee

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 4169 - 4172

ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing

Generation of high-precision sub-phonetic attribute (also known as phonological features) and phone lattices is a key frontend component for detection-based bottom-up speech recognition. In this paper we employ deep neural networks (DNNs) to improve detection accuracy over conventional shallow MLPs (multi-layer perceptrons) with one hidden layer. A range of DNN architectures with five to seven hidden...

chapter

A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case

Zhizheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li, more

Proceedings of The 2012 Asia Pacific Signal and Information Processing Association Annual Summit and Conference > 1 - 5

2012 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

Voice conversion technique, which modifies one speaker's (source) voice to sound like another speaker (target), presents a threat to automatic speaker verification. In this paper, we first present new results of evaluating the vulnerability of current state-of-the-art speaker verification systems: Gaussian mixture model with joint factor analysis (GMM-JFA) and probabilistic linear discriminant analysis...

chapter

Integrating meta-information into exemplar-based speech recognition with segmental conditional random fields

Kris Demuynck, Dino Seppi, Dirk Van Compernolle, Patrick Nguyen, more

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5048 - 5051

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Exemplar based recognition systems are characterized by the fact that, instead of abstracting large amounts of data into compact models, they store the observed data enriched with some annotations and infer on-the-fly from the data by finding those exemplars that resemble the input speech best. One advantage of exemplar based systems is that next to deriving what the current phone or word is, one...

chapter

Online detection of vocal Listener Responses with maximum latency constraints

Daniel Neiberg, Khiet P. Truong

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5836 - 5839

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

When human listeners utter Listener Responses (e.g. back-channels or acknowledgments) such as ‘yeah’ and ‘mmhmm’, interlocutors commonly continue to speak or resume their speech even before the listener has finished his/her response. This type of speech interactivity results in frequent speech overlap which is common in human-human conversation. To allow for this type of speech interactivity to occur...

chapter

Using latent topic features to improve binary classification of spoken documents

Jonathan Wintrode

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5544 - 5547

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

In many topic identification applications, supervised training labels are indirectly related to the semantic content of the documents being classified. For example, many topically distinct emails will all be assigned a single broad category label of “spam” or “not-spam”, and a two-class classifier will lack direct knowledge of the underlying topic structure. This paper examines the degradation of...

chapter

Speech recognitionwith segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop

G. Zweig, P. Nguyen, D. Van Compernolle, K. Demuynck, more

2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 5044 - 5047

ICASSP 2011 - 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This paper summarizes the 2010 CLSP Summer Workshop on speech recognition at Johns Hopkins University. The key theme of the workshop was to improve on state-of-the-art speech recognition systems by using Segmental Conditional Random Fields (SCRFs) to integrate multiple types of information. This approach uses a state-of-the-art baseline as a springboard from which to add a suite of novel features...

article

Burst Onset Landmark Detection and Its Application to Speech Recognition

Chi-Yueh Lin, Hsiao-Chuan Wang

IEEE Transactions on Audio, Speech, and Language Processing > 2011 > 19 > 5 > 1253 - 1264

The reliable detection of salient acoustic-phonetic cues in speech signal plays an important role in speech recognition based on speech landmarks. Once speech landmarks are located, not only can phone recognition be performed, but other useful information can also be derived. This paper focuses on the detection of burst onset landmarks, which are crucial to the recognition of stop and affricate consonants...

Keywords:
TRAINING
DETECTORS
SPEECH

Publication date

Set your own date range

Publication type

book (34)
article (2)

Keywords

SPEECH RECOGNITION (22)
FEATURE EXTRACTION (20)
HIDDEN MARKOV MODELS (17)
SPEECH PROCESSING (12)
ACOUSTICS (10)
DATABASES (7)
SUPPORT VECTOR MACHINES (6)
TESTING (6)
ACCURACY (5)
ERROR ANALYSIS (5)
DEEP NEURAL NETWORKS (4)
NEURAL NETWORKS (4)
TRANSFORMS (4)
CLASSIFICATION ALGORITHMS (3)
DATA MINING (3)
DETECTION ALGORITHMS (3)
TRAINING DATA (3)
VECTORS (3)
WAVELET TRANSFORMS (3)
AFFRICATE CONSONANT (2)
ALGORITHM DESIGN AND ANALYSIS (2)
ANALYTICAL MODELS (2)
ARTIFICIAL NEURAL NETWORKS (2)
ATTRIBUTE DETECTION (2)
AUDIO EVENT DETECTION (2)
AUDIO SEGMENTATION (2)
AUDIO SIGNAL PROCESSING (2)
AUTOMATIC SPEECH ATTRIBUTE TRANSCRIPTION (2)
AUTOMATIC SPEECH RECOGNITION (2)
BRIGHTNESS (2)
BURST ONSET (2)
CEPSTRAL ANALYSIS (2)
COMPUTATIONAL MODELING (2)
COMPUTER VISION (2)
CONFERENCES (2)
CORRELATION (2)
EQUATIONS (2)
EVENT DETECTION (2)
FEATURE DETECTION (2)
FILTERING THEORY (2)
FOURIER TRANSFORMS (2)
GABOR FILTERS (2)
IMAGE PROCESSING (2)
IMAGE RECOGNITION (2)
IMPEDANCE MATCHING (2)
LABORATORIES (2)
MACHINE LEARNING (2)
MACHINE VISION (2)
MATHEMATICAL MODEL (2)
MATRIX DECOMPOSITION (2)
MAXIMUM LIKELIHOOD DETECTION (2)
MEASUREMENT (2)
NOISE (2)
PATTERN RECOGNITION (2)
PERIODIC STRUCTURES (2)
PHONE RECOGNITION (2)
PHONOLOGICAL FEATURES (2)
PRESSES (2)
PRINCIPAL COMPONENT ANALYSIS (2)
RANDOM FOREST (2)
SIGNAL CLASSIFICATION (2)
SIGNAL PROCESSING (2)
SIGNAL TO NOISE RATIO (2)
SPEAKER DIARIZATION (2)
SPEAKER RECOGNITION (2)
SPEECH ANALYSIS (2)
SPEECH EVENT DETECTION (2)
STOP CONSONANT (2)
SUPPORT VECTOR MACHINE CLASSIFICATION (2)
ACCURATE SPEECH MODE DETECTOR (1)
ACOUSTIC ECHO CANCELLATION (1)
ACOUSTIC ECHO CANCELLATION (AEC) (1)
ACOUSTIC MODEL (1)
ACOUSTIC SIGNAL DETECTION (1)
ACOUSTIC-PHONETIC DETECTORS (1)
ACTIVE CONTOURS (1)
ADAPTATION MODELS (1)
ADAPTIVE FILTERS (1)
AFFRICATE CONSONANT RECOGNITION (1)
AIRCRAFT (1)
AIRCRAFT PROPULSION (1)
ANIMAL SPECIES (1)
ARABIC LANGUAGE (1)
ASAT-SIRKUS PARADIGM (1)
ASYMMETRIC BOOTSTRAPPING METHOD (1)
ASYMPTOTIC MAXIMUM A POSTERIORI (1)
ATMOSPHERIC WAVES (1)
AUDIO CONTRIBUTIONS (1)
AUDIO RECORDING (1)
AUDIO SIGNAL (1)
AUDIO TAMPERING DETECTION (1)
AUDITORY STIMULUS (1)
AUGMENTED FEATURE VECTOR (1)
AUTOMATIC VOCAL EFFORT DETECTION (1)
AUTOREGRESSIVE MODEL (1)
BASELINE AVERAGE LOG PROBABILITY THRESHOLDING (1)
BAYESIAN METHODS (1)
more

INFONA - science communication portal

Advanced search

Advanced search in people

Voice activity detection using discriminative restricted Boltzmann machines

Development of a large-scale Mandarin Radio Speech Corpus

Supervised audio tampering detection using an autoregressive model

Multi-speaker conversations, cross-talk, and diarization for speaker recognition

Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks

Deep learning with maximal figure-of-merit cost to advance multi-label speech attribute detection

A comparative study on phonological feature detection from continuous speech with respect to variable corpus size

Spoofing attacks to i-vector based voice verification systems using statistical speech synthesis with additive noise and countermeasure

Speaker recognition with artificial neural networks and MEL-frequency cepstral coefficients correlations

Speaker change point detection using deep neural nets

Emotion conversion for expressive Arabic text to speech

Preventing converted speech spoofing attacks in speaker verification

Acoustic-support vector machines approach to detect spoken Arabic language

Boosting attribute and phone estimation accuracies with deep neural networks for detection-based speech recognition

A study on spoofing attack in state-of-the-art speaker verification: the telephone speech case

Integrating meta-information into exemplar-based speech recognition with segmental conditional random fields

Online detection of vocal Listener Responses with maximum latency constraints

Using latent topic features to improve binary classification of spoken documents

Speech recognitionwith segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop

Burst Onset Landmark Detection and Its Application to Speech Recognition

Filter options

Publication date

Publication type

Keywords

INFONA - science communication portal

Advanced search

Advanced search in people

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Publication type

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options