Phonemes are the smallest units of sound produced by a human being. Automatic classification of phonemes is a well-researched topic in linguistics due to its potential for robust speech recognition. With the recent advancement of phonetic segmentation algorithms, it is now possible to generate datasets of millions of phonemes automatically. Phoneme classification on such datasets is a challenging...
This paper presents an analysis of speaker-independent isolated spoken Pashto numbers for automatic speech recognition. First, a database was developed covering isolated Pashto numbers from sefer (0) to sul (100). Fifty speakers (25 male and 25 female, of different ages) who speak the Yousafzai dialect fluently were selected for recording. The recording has...
Rhythm and intonation are important factors in evaluating English sentence pronunciation. In this paper, the Mel Frequency Cepstrum Coefficient (MFCC) feature and the Hidden Markov Model (HMM) algorithm are used to establish a model for speech recognition. The model then evaluates English sentence pronunciation, focusing on rhythm and intonation, and gives feedback and recommendations about...
Statistics indicate that approximately one in 1000 people is born mute. In a worldwide population of 7.046 billion, that amounts to a staggering 7 million. Of all forms of disability, the mute have it especially tough. The inability of others to comprehend what they hope to express serves as a constant reminder of the misfortune that has befallen them. This catastrophe often bars them from...
In this paper we present a new Direct Access Framework (DAF) for speaker identification systems, which identifies a speaker based on original characteristics of the human voice. The direct access method identifies an object based on parts of the object itself; these parts are called original characteristics. The proposed framework consists of two parts, the enrolment process and the identification process...
This letter presents a two-stage multi-class classification of emotional speech with DAG-SVM, based on joint cepstral distance and voice quality features. The Harmonic to Noise Ratio (HNR) is used to detect throat diseases because it reflects characteristics of the throat. These characteristics also provide a strong basis for distinguishing emotion in speech. The cepstrum and cepstral...
Computer assisted language learning (CALL) and, more specifically, computer assisted pronunciation training (CAPT) have received considerable attention in recent years. CAPT allows continuous feedback to the learner without requiring the sole attention of the teacher; it facilitates self study and encourages interactive use of the language in preference to rote learning. One of the important processes...
This paper investigates the contribution of frequency bands to automatic voice pathology detection. First, the input voice signal is passed through a number of time-domain band-pass filters whose center frequencies are spaced on an octave scale. Each filter output is then divided into overlapping frames. The autocorrelation function is applied to each frame to find the first largest peak, in areas other...
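As a rough illustration of the framing and autocorrelation steps mentioned in this abstract, the following is a minimal sketch rather than the authors' method. It omits the band-pass filter bank entirely, and the function names, the frame length, the hop size, and the `min_lag` cutoff are all hypothetical choices; in particular, skipping small lags is an assumption, since the abstract is truncated before it says which region the peak search excludes.

```python
import math

def frames(signal, frame_len, hop):
    """Split a sequence into overlapping frames (hop < frame_len gives overlap)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def autocorrelation(frame):
    """Unnormalized autocorrelation of one frame for lags 0..len(frame)-1."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(n)]

def peak_lag(frame, min_lag):
    """Lag of the largest autocorrelation value at or beyond min_lag.

    min_lag is an assumed parameter that keeps the search away from
    the trivial maximum at lag 0.
    """
    r = autocorrelation(frame)
    return max(range(min_lag, len(r)), key=lambda lag: r[lag])

# Synthetic test signal (an assumption for illustration): 100 Hz sine at 8 kHz,
# so one period spans 80 samples.
fs = 8000
signal = [math.sin(2 * math.pi * 100 * t / fs) for t in range(1600)]

# 400-sample frames with 50% overlap.
lags = [peak_lag(f, min_lag=20) for f in frames(signal, frame_len=400, hop=200)]
estimated_hz = [fs / lag for lag in lags]
```

For a periodic input such as this sine, the peak lag corresponds to the period in samples, so `fs / lag` gives a per-frame frequency estimate.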
Recognizing textual entailment (RTE) is a task that predicts whether a text fragment can be inferred from another text fragment. In this paper, we tackle the RTE problem by using sentence extraction to cover semantic variation and then extracting the subject, predicate and object from each sentence, without using external resources like WordNet. Finally, a similarity function is used to predict the entailment relation...
Sentence similarity measures play an increasingly important role in text-related research and applications in areas such as text mining, Web page retrieval, and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. These methods process sentences in a very high-dimensional space and are consequently inefficient, require...
This paper presents a first approach to the unsupervised learning and prediction of primary lexical stress starting from continuous speech data and its orthographic transcript. The approach is intended to be used in the development of text-to-speech synthesis systems for under-resourced languages. Our method is based on syllable nuclei approximation and stress detection using simple acoustic features...
This paper describes an implementation of speech recognition that recognizes and suppresses ten (10) defined profane and vulgar Filipino words. The adapted speech recognition architecture was that of the Oregon Graduate Institute's (OGI) Center for Spoken Language Understanding (CSLU). It utilizes a hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) keyword spotting framework. The feature...
The use of Electroencephalography (EEG) in the domain of Brain-Computer Interfaces is now commonplace. EEG for imagined speech reproduction and observation of brain response to audio stimuli are active areas of research. In this paper, we consider the case of imagined and mouthed non-audible speech recorded with EEG electrodes. We analyze different feature extraction techniques such as Mel Frequency...
Wireless channels are highly prone to eavesdropping. To mitigate this problem, encryption systems are used. Analog scrambling systems achieve confidentiality through modification of analog signals. Unfortunately, the current literature lacks a thorough security analysis of these systems. In this paper, the security of the hopping-window time-domain scrambler is investigated. It is shown that cipher-text of these...
Speech signals contain important information that can be used for purposes such as surveillance, smart homes, and medicine. Classifying these signals is therefore essential for further applications. This article presents a simple method that requires fewer computations on the central processing unit while achieving faster reaction time and high accuracy. Many methods exist in speech signal...
Onset detection is one of the main issues on the way towards self-paced BCIs that can be used outside research settings. For this reason, this paper suggests a potential solution to the onset detection problem by discriminating between speech-related events. In this study, overt, inhibited overt and covert states were each classified against the idle state in an off-line setting. Autoregressive model coefficients were...
This paper presents investigations into the relative effectiveness of two alternative approaches to open-set text-independent speaker identification (OSTI-SI). The methods considered are the recently introduced i-vector and the more traditional GMM-UBM method supported by score normalisation. The study is motivated by the growing need for effective extraction of intelligence and evidence from audio...
In this paper we focus on the problem of automatic prediction of parts of speech in sentences. We present an experimental framework which includes the analysis and implementation of methods for part-of-speech (POS) labeling (tagging). We have tested three methods that predict the POS without the current word's context and three context-aware statistical methods. The main goal of our...
To address the poor robustness of pitch detection in noisy speech, this paper proposes an improved pitch detection method combined with speech enhancement. First, to reduce background noise and obtain relatively clean speech, we apply multi-band spectral subtraction and the masking properties of the human auditory system to the noisy speech, and next use the energy and zero-crossing...
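The abstract above is cut off right after naming energy and zero-crossing, so how the authors use them is not stated here. As a generic, hedged sketch of how these two per-frame quantities are commonly computed, with hypothetical function names and toy frames chosen only for illustration:

```python
def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ.

    Zero is treated as non-negative, which is an assumed convention.
    """
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

# Toy frames (assumptions for illustration, not data from the paper):
# a rapidly alternating frame and a constant frame.
alternating_frame = [1.0, -1.0, 1.0, -1.0]
constant_frame = [0.5, 0.5, 0.5, 0.5]

alternating_stats = (short_time_energy(alternating_frame),
                     zero_crossing_rate(alternating_frame))
constant_stats = (short_time_energy(constant_frame),
                  zero_crossing_rate(constant_frame))
```

The alternating frame crosses zero between every pair of samples, while the constant frame never does, which is the contrast these measures are meant to capture.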
In the past decade a lot of research has gone into Automatic Speech Emotion Recognition (SER). The primary objective of SER is to improve the man-machine interface. It can also be used to monitor the psychophysiological state of a person in lie detectors. Recently, speech emotion recognition has also found applications in medicine and forensics. In this paper 7 emotions are recognized using pitch...