Natural and affective handshakes of two participants define the course of dyadic interaction. Affective states of the participants are expected to be correlated with the nature of the dyadic interaction. In this paper, we extract two classes of the dyadic interaction based on temporal clustering of affective states. We use the k-means temporal clustering to define the interaction classes, and utilize...
This paper presents a study of line-wise text identification in comic books. Due to the unavailability of a single OCR system which can handle comic text of multiple scripts, the comic text identification based on script becomes an essential step for choosing the appropriate OCR. In this investigation, a new attempt has been made to explore a comic text identification technique of speech balloon to...
In this paper we present the method we adopted to search for the hearer, using different acoustic measurements that have been described as influenced by the speaker, on the TIMIT and KED TIMIT databases. This study allows us to better understand where the relevant indices for discriminating speakers lie, and the presentation criteria to distinguish a file giving...
This work explores the vocal tract constriction aspect of speech for speech / music classification. During speech production, the vocal tract is closed for voiced bars and open for low vowels. For high vowels, semivowels, laterals, voiced fricatives and other sounds the vocal tract is in the intermediate position of the closed and open cases. Music signal, in particular the instrumental and non-vocal...
Research on non-intrusive speech quality assessment (SQA) aims to develop a computational model simulating the human perception of speech signals accurately and automatically without any prior information about the reference clean speech signals. In this paper, we propose to learn a non-intrusive SQA metric based on bag-of-words (BoW) representation of speech signals. In particular, the proposed method...
In this paper, we investigate the influence of language on text-independent speaker recognition. For this purpose, we used several automatic text-independent speaker recognition methods (Multivariable Auto-Regression, Vector Quantization and Histogram Classifiers). To measure the effect of language, we applied these methods to the POLY-COST 250 multi-language database. Among...
A filter structure based on mathematical morphology, called a sieve, is used to process image sequences of a talker's mouth and form visual speech features. The effects of varying the filter type, the post-processing and the hidden Markov model (HMM) parameters on recognition accuracy are investigated using two audio-visual speech databases.
This research is conducted to accommodate the needs of visually impaired people through an intelligent system, which reads textual information on papers and produces corresponding voice. Indonesian Automated Document Reader (I-ADR) is operated via a voice-based user interface to scan a document page. Textual information from the scanned page is then extracted using Optical Character Recognition (OCR)...
The lip-region can be interpreted as either a genetic or behavioural biometric trait depending on whether static or dynamic information is used. In this paper, we use a texture descriptor called Local Ordinal Contrast Pattern (LOCP) in conjunction with a novel spatiotemporal sampling method called Windowed Three Orthogonal Planes (WTOP) to represent both appearance and dynamics features observed in...
The lip-region can be interpreted as either a genetic or behavioral biometric trait depending on whether static or dynamic information is used. Despite this breadth of possible application as a biometric, lip-based biometric systems are scarcely developed in scientific literature compared to other more popular traits such as face or voice. This is because of the generalized view of the research community...
This paper presents a method of voice activity detection (VAD) suitable for high noise scenarios, based on the fusion of two complementary systems. The first system uses a proposed non-Gaussianity score (NGS) feature based on normal probability testing. The second system employs a histogram distance score (HDS) feature that detects changes in the signal through conducting a template-based similarity...
The lack of publicly available annotated databases is one of the major barriers to research advances on emotional information processing. In this contribution we present a recently collected database of spontaneous emotional speech in German which is being made available to the research community. The database consists of 12 hours of audio-visual recordings of the German TV talk show “Vera am...
We consider non-stationary or colored noise estimation by the wavelet thresholding method. First, we propose node-dependent thresholding for adaptation to colored or non-stationary noise. Next, we suggest a noise estimation method based on spectral entropy using a histogram of intensity, instead of the estimation method based on median absolute deviation (MAD). We also use a modified hard thresholding to...