Research on speech/music classification of digital audio has been popular in academia and is increasingly applied in industry. Most conventional methods pair carefully hand-crafted features with Gaussian Mixture Models. To achieve the best performance, some of these features require a long latency due to look-ahead, and/or incur a large onset error. This paper takes a different approach to the problem...
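The GMM-based baseline that this abstract contrasts itself with can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's method: it uses a single diagonal-covariance Gaussian per class (a 1-component GMM) over synthetic stand-in feature vectors, and classifies a clip by summed frame log-likelihood. All names and parameters are illustrative assumptions.

```python
import numpy as np

def fit_gaussian(X):
    """Fit a single diagonal-covariance Gaussian (a 1-component GMM)."""
    mu = X.mean(axis=0)
    var = X.var(axis=0) + 1e-6  # variance floor to avoid division by zero
    return mu, var

def log_likelihood(X, mu, var):
    """Per-frame log-likelihood under a diagonal Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)

def classify(frames, speech_model, music_model):
    """Label a clip 'speech' or 'music' by summed frame log-likelihood."""
    ll_speech = log_likelihood(frames, *speech_model).sum()
    ll_music = log_likelihood(frames, *music_model).sum()
    return "speech" if ll_speech > ll_music else "music"

rng = np.random.default_rng(0)
speech_train = rng.normal(0.0, 1.0, (200, 4))  # stand-in feature vectors
music_train = rng.normal(3.0, 1.0, (200, 4))
speech_model = fit_gaussian(speech_train)
music_model = fit_gaussian(music_train)
print(classify(rng.normal(0.0, 1.0, (50, 4)), speech_model, music_model))
```

A real system would use many mixture components per class and acoustic features such as MFCCs rather than synthetic vectors; the decision rule, however, is the same likelihood comparison.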
Part-of-Speech (POS) tagging is the process of marking each word in a text as corresponding to a particular part of speech, based on its definition and context. POS taggers play an important role in natural language applications such as speech recognition, natural language parsing, and information retrieval and extraction. This paper discusses an architecture for designing a POS tagger for Malayalam...
Anaphora resolution (AR) is the process of resolving references to an entity in the discourse. This paper presents an algorithm to identify pronominals and their antecedents in Malayalam text input. Anaphora resolution is achieved by employing a hybrid of statistical machine learning and rule-based approaches. The system is implemented by exploiting the morphological richness of the language...
The analysis of various components of the Electroglottograph (EGG) signal, obtained after Ensemble Empirical Mode Decomposition (EEMD), is the primary objective of this paper. The ability of EEMD to detect intermittent high-frequency data embedded in data of lower frequency is exploited to segregate the epoch locations and the periodic nature of the EGG signal. The dyadic filterbank property of EEMD...
This paper introduces a novel two-dimensional feature extraction method for environmental sound classification, based on two-dimensional semi-nonnegative matrix factorization (2D Semi-NMF) of scale-frequency maps. We first extract scale-frequency maps (SFMs) from the input signals; this feature is considered to preserve the scale and frequency characteristics of the signals. Second, a 2D Semi-NMF method...
In this paper, we address the problem of automatic speech summarization on open-domain TED talks. The large vocabulary and the speaker-to-speaker diversity of topics present significant difficulties. The challenges include not only how to handle disfluencies and fillers, but also how to extract topic-related, meaningful messages from the free-form talks. Here, we propose to incorporate semantic and...
In this paper, we propose a method for avoiding digressions in discussion by detecting unnecessary utterances and having a dialogue system intervene. The detector is based on features derived from word frequency and topic shifts. The performance (i.e. accuracy, recall, precision, and F-measure) of the unnecessary utterance detector is evaluated through leave-one-dialogue-out cross-validation. In the...
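The leave-one-dialogue-out evaluation protocol mentioned above can be sketched in plain Python. The snippet below is a minimal illustration, not the authors' code: each dialogue is held out in turn, a detector is trained on the rest, and the pooled predictions yield accuracy, precision, recall, and F-measure. The toy length-based detector and all field names are hypothetical.

```python
from collections import defaultdict

def loo_dialogue_cv(samples, train_fn, predict_fn):
    """Leave-one-dialogue-out CV: hold out each dialogue in turn,
    train on the rest, and pool the held-out predictions."""
    dialogues = defaultdict(list)
    for s in samples:
        dialogues[s["dialogue"]].append(s)
    tp = fp = fn = tn = 0
    for held_out in dialogues:
        train = [s for d, ss in dialogues.items() if d != held_out for s in ss]
        model = train_fn(train)
        for s in dialogues[held_out]:
            pred = predict_fn(model, s)
            if pred and s["unnecessary"]: tp += 1
            elif pred and not s["unnecessary"]: fp += 1
            elif not pred and s["unnecessary"]: fn += 1
            else: tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / (tp + fp + fn + tn),
            "precision": precision, "recall": recall, "f_measure": f_measure}

# Toy detector (hypothetical rule): flag an utterance as unnecessary
# when it is shorter than the mean training utterance length.
def train_fn(train):
    return sum(len(s["words"]) for s in train) / len(train)

def predict_fn(mean_len, s):
    return len(s["words"]) < mean_len

samples = [
    {"dialogue": "d1", "words": ["ok"], "unnecessary": True},
    {"dialogue": "d1", "words": list("abcdef"), "unnecessary": False},
    {"dialogue": "d2", "words": ["uh"], "unnecessary": True},
    {"dialogue": "d2", "words": list("abcdefg"), "unnecessary": False},
    {"dialogue": "d3", "words": ["hmm"], "unnecessary": True},
    {"dialogue": "d3", "words": list("abcde"), "unnecessary": False},
]
print(loo_dialogue_cv(samples, train_fn, predict_fn))
```

Splitting by dialogue rather than by utterance prevents utterances from the same conversation leaking between train and test folds.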
Sensing and analyzing human behavior play a great role in improving service quality and employee training. This paper presents novel frameworks for detecting customer communication and for lead time estimation (LTE) using multi-sensor data, sound data, and accounting data collected in a restaurant. These are useful for managing work environments and identifying problems faced by employees. Lead time from order to...
In this paper, a novel methodology is discussed for indexing domain-specific audio archives using linguistic information present in the speech signal. The audio indexing system is phone-based and can work under limited training data conditions. A training data set that captures the linguistic information within the Hindi language at the syllable level is first developed. A reduced phone set is then derived...
The human voice can serve as a password/key for access to various services. In a speaker verification system, the speaker is verified using features extracted from the voice signal. In automated speaker verification, the speaker's voice signal is processed to extract speaker-specific information, which is used to generate a voiceprint, also known as a template, that cannot be replicated...
This paper presents the process of automatic Quranic accent identification. Recent feature extraction techniques used for Quranic verse rule (Tajweed) identification include Mel Frequency Cepstral Coefficients (MFCC), which are prone to additive noise and may degrade classification results. Therefore, augmenting MFCC with Spectral Centroid features is proposed to improve performance...
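The spectral centroid feature that the abstract proposes appending to MFCC can be computed in a few lines. The snippet below is a minimal NumPy sketch, not the authors' code: it windows the signal, takes the magnitude spectrum of each frame, and returns the magnitude-weighted mean frequency. Frame length, hop size, and the test tone are illustrative assumptions.

```python
import numpy as np

def spectral_centroid(signal, sr, frame_len=512, hop=256):
    """Per-frame spectral centroid: magnitude-weighted mean frequency (Hz)."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    centroids = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        centroids.append((freqs * mag).sum() / (mag.sum() + 1e-10))
    return np.array(centroids)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1000 * t)  # pure 1 kHz tone
print(spectral_centroid(tone, sr).mean())  # close to 1000 Hz
```

In a combined feature set, the per-frame centroid would simply be concatenated with the per-frame MFCC vector before classification.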
In this paper, a query-by-sound-example video retrieval framework based on audio concepts is presented. First, the audio stream extracted from the movies in the database is grouped into orientation clusters using an unsupervised segmentation technique. The audio signals undergo a newly proposed pretreatment process to distinguish audio concepts, which is used for indexing the video data. Second, the query asked...
An effective speech brain-machine interface requires not only selecting the best cortical recording sites and signal features for decoding speech production, but also minimizing clinical risk for the patient. Motivated by this need to reduce patient risk, the purpose of this study is to detect voice activity (speech onset and offset) automatically from spatial-spectral features of electrocorticographic signals...
This paper presents acoustic analysis work related to Modern Standard Arabic (MSA). The problem of classifying consonant counterparts in MSA is tackled here. The study considers four phonemes: /dˤ, ðˤ/ and their non-emphatic counterparts /d, ð/, respectively. The goal is accurate automatic classification of these phonemes. Artificial neural networks (ANNs) are used for this purpose...
In this paper, the acoustic features of pitch, intensity, formants, and speech rate are extracted and used to classify the following Arabic speech emotions: neutral, sad, happy, surprised, and angry. Three sentences spoken by four male and four female native Arabic speakers were selected from a newly developed Arabic speech corpus (KSUEmotions). Perception tests using human listeners yielded scores...
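Of the acoustic features listed above, pitch is the one most often estimated directly from the waveform. The snippet below is a minimal sketch of autocorrelation-based pitch estimation, not the authors' method: it finds the autocorrelation peak within a plausible voice-pitch lag range on a synthetic 120 Hz "voiced" frame. The search bounds and frame size are illustrative assumptions.

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate F0 of a voiced frame from its autocorrelation peak."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)  # lag range for fmin..fmax
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

sr = 16000
t = np.arange(int(0.04 * sr)) / sr        # one 40 ms frame
frame = np.sin(2 * np.pi * 120 * t)       # synthetic 120 Hz "voice"
print(round(estimate_pitch(frame, sr), 1))  # close to 120 Hz
```

Intensity (frame energy) and speech rate are simpler statistics over the same framing, while formants typically require LPC analysis rather than autocorrelation alone.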
Human-machine interaction is one of the most burgeoning areas of research in the field of information technology. To date, a majority of the research in this field has been conducted using unimodal and multimodal systems with asynchronous data. Because of this, improper synchronization has become a common problem, increasing system complexity and system response time...
Human listeners are capable of recognizing speech in noisy environments, while most traditional speech recognition methods do not perform well in the presence of noise. Unlike traditional Mel-frequency cepstral coefficient (MFCC)-based methods, this study proposes a phoneme classification technique using the neural responses of a physiologically-based computational model of the auditory periphery...
This work explores Deep Belief Networks (DBNs) for the task of detecting vowel-like regions (VLRs). Vowels and semivowels are considered VLRs. Using vocal tract features at the input layer of the DBN, we extract evidence for VLRs by transforming the vocal tract features through multiple non-linear hidden layers. A linear classifier is used to predict the class of the evidence, i.e., whether it is...
Speech recognition systems are based on either a parametric or a non-parametric approach. Parametric systems such as HMMs have been the dominant technology for speech recognition in the past decade. Despite many advancements and enhancements in the design of these systems, key problems, such as long-term temporal dependence, have not yet been solved. Recently, due to the availability...
Speech/non-speech detection (SND) distinguishes between speech and non-speech segments in recorded audio and video documents. SND systems can help reduce the required storage space when only the speech segments of audio documents are needed, for example for content analysis, spoken language identification, etc. In this work, we experimented with the use of time-domain, frequency-domain, and cepstral...
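The simplest time-domain features mentioned above, short-time energy and zero-crossing rate, already give a workable SND baseline. The snippet below is a minimal sketch under assumed parameters (frame size, hop, threshold), not the paper's system: it labels each frame as speech when its energy exceeds a fixed threshold, tested on a silence-then-noise signal.

```python
import numpy as np

def frame_features(signal, frame_len=400, hop=200):
    """Two time-domain features per frame: short-time energy and
    zero-crossing rate (ZCR)."""
    energies, zcrs = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(np.mean(frame ** 2))
        zcrs.append(np.mean(np.abs(np.diff(np.sign(frame))) > 0))
    return np.array(energies), np.array(zcrs)

def detect_speech(signal, energy_thresh=0.01):
    """Label each frame: True = speech, False = non-speech."""
    energy, _ = frame_features(signal)
    return energy > energy_thresh

sr = 16000
silence = np.zeros(sr)                        # 1 s of silence
rng = np.random.default_rng(0)
speech_like = 0.5 * rng.standard_normal(sr)   # 1 s of speech-like noise
labels = detect_speech(np.concatenate([silence, speech_like]))
print(labels[:3], labels[-3:])
```

A fixed threshold fails under varying noise floors, which is why systems like the one abstracted above combine time-domain features with frequency-domain and cepstral ones and learn the decision boundary instead.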