Search results

chapter

Lyric recognition in monophonic singing using pitch-dependent DNN

Dairoku Kawai, Kazumasa Yamamoto, Seiichi Nakagawa

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) > 326 - 330

2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

One of the difficulties in sung speech recognition is the small distance in acoustic space between phonemes in sung speech. Therefore, we considered clustering the speech based on pitch (fundamental frequency F0) and creating a larger distance between the phonemes. In addition, we considered a two-stage training method of DNN-HMM: the first stage is trained by using conventional acoustic features...

chapter

Methods for rapid development of automatic speech recognition system for Russian

Radek Safarik, Jan Nouza

2015 IEEE International Workshop of Electronics, Control, Measurement, Signals and their Application to Mechatronics (ECMSM) > 1 - 6

2015 IEEE International Workshop of Electronics, Control, Measurement, Signals and their application to Mechatronics (ECMSM)

In this paper we present our approach to the rapid and efficient development of an automatic speech recognition (ASR) system for Russian. We try to utilize our tools, procedures and data previously designed and collected for other Slavic languages, Czech and Slovak. We show how we build a large corpus of texts acquired from major publishers' web pages and convert it from Cyrillic to Latin to simplify...

chapter

Vocabulary independent acoustic-phonetic modeling for continuous speech recognition

L. Fissore, P. Laface, G. Micca, F. Ravera

1996 8th European Signal Processing Conference (EUSIPCO 1996) > 1 - 4

1996 8th European Signal Processing Conference (EUSIPCO 1996)

This paper investigates the problem of defining the acoustic-phonetic unit set for flexible vocabulary continuous speech recognition systems. As an alternative to the classical modeling approach with biphones and triphones, a set of stationary/transitory state units is defined that is limited enough in number to represent a closed set trainable once and for all. A major benefit of these units is...

chapter

Fundamental research on a singing training support system for Shigin: Japanese traditional singing

Masashi Nakayama

2013 Proceedings of IEEE Southeastcon > 1 - 6

IEEE SOUTHEASTCON 2013

Shigin is the singing of Japanese or Chinese poetry, following a melody called “seicho” in Japanese. However, it is difficult to master Shigin because a trainer teaches according to his/her own impressions, and its melody employs a relative music scale. Therefore, this paper proposes a singing training support system for Shigin that clarifies differences in signal characteristics between a trainee...

chapter

Fundamental research on a singing training support system for Shigin: Japanese traditional singing

Masashi Nakayama

2012 Proceedings of IEEE Southeastcon > 1 - 6

SOUTHEASTCON 2012

chapter

Is Phoneme Level Better than Word Level for HMM Models in Limited Vocabulary ASR Systems?

Yousef Ajami Alotaibi

2010 Seventh International Conference on Information Technology: New Generations > 332 - 337

Seventh International Conference on Information Technology: New Generations (ITNG 2010)

In this paper, Arabic alphadigits were investigated from the point of view of the speech recognition problem. Limited vocabulary Arabic Automatic Speech Recognition Systems (ASRs) were designed, implemented, and tested by using isolated word utterances which consist of Arabic alphabets and/or digits. These systems were implemented separately by using phoneme level and word level based HMM models in distinct...

chapter

Application of voiced-speech variability descriptors to emotion recognition

K. Slot, J. Cichosz, L. Bronakowski

2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications > 1 - 5

2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA)

This paper examines the possibility of applying phone-pronunciation variability descriptors in emotion classification. The proposed group of descriptors comprises a set of statistical parameters of Poincaré maps, which are derived for the evolution of formant frequencies and energy of voiced-speech segments. Poincaré maps are represented by means of four different parameters that summarize various...

chapter

Comparative Experiments to Evaluate the Use of Syllables for the Improvement of Automatic Recognition of Dysarthric Speech

H. Tolba

2009 16th International Conference on Systems, Signals and Image Processing > 1 - 4

2009 16th International Conference on Systems, Signals and Image Processing

In this paper, we propose to use syllables as the acoustic units representing speech signals in an automatic speech recognition (ASR) system in order to improve the performance of the automatic recognition of dysarthric speech. The motivation behind using syllables comes from studies of human perception which demonstrate the central role played by the syllable in human perception and generation of...

article

Idiolect Extraction and Generation for Personalized Speaking Style Modeling

Chung-Hsien Wu, Chung-Han Lee, Chung-Hau Liang

IEEE Transactions on Audio, Speech, and Language Processing > 2009 > 17 > 1 > 127 - 137

A person's speaking style, consisting of such attributes as voice, choice of vocabulary, and the physical motions employed, not only expresses the speaker's identity but also emphasizes the content of an utterance. Speech combining these aspects of speaking style becomes more vivid and expressive to listeners. Recent research on speaking style modeling has paid more attention to speech signal processing...

chapter

Large Vocabulary Continuous Speech Recognition in Uyghur: Data Preparation and Experimental Results

N. Tursun, W. Silamu

2008 6th International Symposium on Chinese Spoken Language Processing > 1 - 4

2008 6th International Symposium on Chinese Spoken Language Processing

Uyghur is an agglutinative language. It is one of the least studied languages in the speech recognition area. In this work, we present the research process of Uyghur large vocabulary continuous speech recognition based on HMM (hidden Markov model). This paper introduces the process of data collection (text corpus and speech corpus), the unit selection for speech recognition, the creation of acoustic...
