Separating an acoustic signal into desired and undesired components is an important and well-established problem. It is commonly addressed by decomposing spectral magnitudes raised to an exponent, and the choice of exponent has been studied from numerous perspectives. We frame this exponent selection problem as an approximation to the actual underlying geometric situation. This approach makes apparent...
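A common instance of the exponent-based decomposition described above is generalized spectral subtraction. Here the mixture and noise magnitudes are raised to an exponent p before subtracting: p = 1 gives magnitude subtraction and p = 2 gives power subtraction. The sketch below is a generic illustration, not the geometric method proposed in this paper. The function name, the over-subtraction factor `alpha`, and the spectral `floor` are assumptions, and the noise magnitude estimate is taken as given.

```python
import numpy as np


def generalized_spectral_subtraction(noisy_stft, noise_mag, p=2.0, alpha=1.0, floor=0.0):
    """Estimate the desired component by subtracting |N|^p from |X|^p.

    noisy_stft: complex STFT of the mixture (any shape).
    noise_mag:  estimated noise magnitude, broadcastable to noisy_stft.
    p:          exponent applied to the magnitudes before subtraction.
    alpha:      over-subtraction factor (hypothetical tuning parameter).
    floor:      fraction of |X|^p kept as a lower bound (spectral floor).
    """
    mix_mag = np.abs(noisy_stft)
    diff = mix_mag ** p - alpha * np.asarray(noise_mag) ** p
    # Clip negative differences to the spectral floor, then undo the exponent.
    est_mag = np.maximum(diff, floor * mix_mag ** p) ** (1.0 / p)
    # Reuse the mixture phase, as is usual for magnitude-domain methods.
    return est_mag * np.exp(1j * np.angle(noisy_stft))


X = np.array([3.0 + 0j, 1.0 + 0j, 3j])
N = np.array([1.0, 2.0, 1.0])
print(generalized_spectral_subtraction(X, N, p=1.0))  # magnitude subtraction
print(generalized_spectral_subtraction(X, N, p=2.0))  # power subtraction
```

Changing `p` changes only how the two magnitudes are combined before subtraction. This is exactly the exponent whose choice the abstract discusses.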
The separation of acoustic signals is often accomplished through subtractive decompositions of frequency-domain representations. This is typically enabled by the zero phase approximation or the uncorrelated signals approximation, but both of these are very coarse approximations in the mathematical sense. We investigate this disconnect between what works in practice and what is mathematically correct...
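Consider a single time-frequency bin with components s and n of magnitudes a and b and phase difference theta. The mixture satisfies |s + n|^2 = a^2 + b^2 + 2ab cos(theta). We read the zero phase approximation as "the components are in phase", so that magnitudes add. We read the uncorrelated-signals approximation as "the cross term vanishes", so that powers add. Under those readings, which are our own interpretation (as is the function name), the sketch below measures how far each approximation is from the exact magnitude.

```python
import numpy as np


def approximation_errors(a, b, theta):
    """Compare exact |s + n| with the two common additive approximations.

    a, b:  magnitudes of the two components in one time-frequency bin.
    theta: phase difference between them (radians).
    Returns (exact magnitude, zero-phase error, uncorrelated error).
    """
    exact = np.abs(a + b * np.exp(1j * theta))
    zero_phase = a + b                      # magnitudes add: exact only when theta = 0
    uncorrelated = np.sqrt(a**2 + b**2)     # powers add: drops the 2ab*cos(theta) term
    return exact, zero_phase - exact, uncorrelated - exact


for theta in (0.0, np.pi / 2, np.pi):
    exact, err_zp, err_unc = approximation_errors(1.0, 0.5, theta)
    print(f"theta={theta:.2f}  |s+n|={exact:.3f}  "
          f"zero-phase err={err_zp:+.3f}  uncorrelated err={err_unc:+.3f}")

# Power additivity holds only on average over uniformly distributed phase differences.
thetas = np.linspace(0.0, 2 * np.pi, 360, endpoint=False)
mean_power = np.mean(np.abs(1.0 + 0.5 * np.exp(1j * thetas)) ** 2)
print(mean_power)  # close to 1.0**2 + 0.5**2
```

Each approximation is exact only for one particular phase relationship, and both can be far off for others. This illustrates why they are coarse in the mathematical sense.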
Music technologies will open the future up to new ways of enjoying music both in terms of music appreciation and music creation. In this keynote speech, I introduce the frontiers of music technologies by showing some practical examples to demonstrate how end users can benefit from music signal processing, music understanding technologies, singing synthesis technologies, and music interfaces. For example,...
Audio signal processing has long been the obvious approach to problems such as microphone array processing, active noise control, or speech enhancement. Yet, it is increasingly being challenged by black-box machine learning approaches based on, e.g., deep neural networks (DNN), which have already achieved superior results on certain tasks. In this talk, I will try to convince the audience that machine learning...
A mapping system based on an artificial neural network was designed, trained, and tested to map Arabic acoustic parameters to their corresponding articulatory features. The main objective of the study was to find the correlation between these two different types of features. To train and test the system, an in-house database was created using all 29 letters of the Arabic alphabet as carrier words for our intended...
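The sketch below gives a rough illustration of the kind of mapping described above. It trains a one-hidden-layer network with NumPy to regress articulatory targets from acoustic feature vectors. The network size, tanh activation, mean-squared-error loss, learning rate, and synthetic data are all assumptions made for illustration. The paper's actual architecture, features, and in-house database are not reproduced here.

```python
import numpy as np


def train_mapping_network(X, Y, hidden=16, lr=0.1, epochs=3000, seed=0):
    """Fit Y ~ f(X) with a tanh MLP using full-batch gradient descent on MSE.

    X: (n_frames, n_acoustic) acoustic feature vectors.
    Y: (n_frames, n_articulatory) articulatory targets.
    Returns the parameters and the per-epoch training loss.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0 / np.sqrt(X.shape[1]), (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # hidden activations
        err = H @ W2 + b2 - Y             # prediction error
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean-squared-error gradient.
        d_out = 2.0 * err / err.size
        dW2, db2 = H.T @ d_out, d_out.sum(axis=0)
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)
        dW1, db1 = X.T @ d_hid, d_hid.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    W1, b1, W2, b2 = params
    return np.tanh(X @ W1 + b1) @ W2 + b2


# Synthetic stand-in data: 4 "acoustic" inputs, 2 "articulatory" outputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
Y = np.column_stack([np.sin(X[:, 0]) + 0.5 * X[:, 1], 0.5 * X[:, 2] * X[:, 3]])
params, losses = train_mapping_network(X, Y)
print(losses[0], losses[-1])
```

Once trained, the correlation between predicted and measured articulatory features can be computed per output dimension. That kind of measurement is what the study's stated objective concerns.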
In this paper, we propose models for emotion recognition from speech based on class-dependent features and Gaussian mixture models (GMM). Seven emotions are identified (Happiness, Fear, Neutral, Disgust, Anger, Boredom and Sadness) with a small set of features for each class. Results show that our system outperforms the single-stage classifier, with an 82.41% (74.86% in single-stage) overall recognition...
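The abstract above classifies emotions with Gaussian mixture models. One common way to do this is to fit one GMM per class with expectation-maximization, then label a new feature vector with the class whose model gives the highest log-likelihood. The sketch below follows that approach under our own assumptions. The diagonal covariances, component count, iteration count, and toy two-class data are hypothetical choices. The paper's class-dependent feature selection and its single-stage versus multi-stage comparison are not reproduced.

```python
import numpy as np


def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)


def _weighted_log_densities(X, weights, means, variances):
    """log(w_k) + log N(x_i | mu_k, diag(var_k)) for every sample i and component k."""
    diff = X[:, None, :] - means[None, :, :]
    return (np.log(weights)[None, :]
            - 0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
            - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2))


def fit_gmm(X, n_components=2, n_iter=100, reg=1e-6, seed=0):
    """Diagonal-covariance GMM fitted with EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + reg, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        log_p = _weighted_log_densities(X, weights, means, variances)
        resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X ** 2) / nk[:, None] - means ** 2, 0.0) + reg
    return weights, means, variances


def gmm_log_likelihood(X, gmm):
    return _logsumexp(_weighted_log_densities(X, *gmm), axis=1)


def train_classifier(X, y, **kwargs):
    """One GMM per class label."""
    return {label: fit_gmm(X[y == label], **kwargs) for label in np.unique(y)}


def classify(models, X):
    """Pick the class whose GMM assigns the highest log-likelihood."""
    labels = list(models)
    scores = np.column_stack([gmm_log_likelihood(X, models[c]) for c in labels])
    return np.array(labels)[np.argmax(scores, axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])
y = np.array(["neutral"] * 100 + ["anger"] * 100)
models = train_classifier(X, y)
print(np.mean(classify(models, X) == y))
```

A class-dependent variant would fit each class model on its own feature subset. Scores computed over different subsets are not directly comparable, so such a variant needs an extra design decision that this sketch does not make.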
Normalized Gram matrices formed from multiple vectors of sensor data, and functions of the eigenvalues of such matrices in particular, have a long history in connection with multiple-channel detection. The determinant and various other functions of the eigenvalues of these matrices arise as detection statistics in a variety of multichannel problems, and knowledge of their distributions under the H...
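For concreteness, the sketch below forms the normalized Gram matrix C = D^(-1/2) G D^(-1/2). Here G = X X^H is the Gram matrix of the M channel vectors (the rows of X) and D = diag(G). The sketch then evaluates one classical function of the eigenvalues of C, the generalized coherence 1 - det(C), where det(C) is the product of those eigenvalues. This particular statistic is an illustrative choice on our part, since the abstract refers to the determinant and other eigenvalue functions in general. The function names are hypothetical.

```python
import numpy as np


def normalized_gram(X):
    """Normalized Gram matrix of the rows of X (one row per sensor channel)."""
    G = X @ X.conj().T                 # Gram matrix of inner products between channels
    d = np.sqrt(np.real(np.diag(G)))   # channel norms
    return G / np.outer(d, d)          # unit diagonal, Hermitian, positive semidefinite


def generalized_coherence(X):
    """1 - det(C); det(C) is computed as the product of the eigenvalues of C."""
    eigvals = np.linalg.eigvalsh(normalized_gram(X))
    return 1.0 - float(np.prod(eigvals))


rng = np.random.default_rng(0)
noise = rng.standard_normal((3, 256))                                  # independent channels
common = rng.standard_normal(256)
correlated = np.vstack([common, 2 * common, -common]) + 0.01 * noise   # nearly collinear channels
print(generalized_coherence(noise), generalized_coherence(correlated))
```

By Hadamard's inequality, det(C) lies between 0 and 1, so the statistic lies in [0, 1]. It is near 0 for independent channels and near 1 when the channels share a common component. Its distribution under the noise-only hypothesis is what sets the detection threshold.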
Affect bursts, which are nonverbal expressions of emotions in conversations, play a critical role in analyzing affective states. Although a number of methods exist for affect burst detection and recognition using only audio information, little effort has been spent on combining cues in a multimodal setup. We suggest that facial gestures constitute a key component to characterize affect bursts,...
Gender is one of the most distinctive and important properties of speech, and determining it from a speech signal is a substantial research subject. Gender information is used for various purposes in many applications; for example, defining gender-dependent speech/speaker models lowers error rates. In this study, a system that determines the gender of a speaker independently of a...
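The abstract does not say how gender is determined, so the sketch below shows only a widely used baseline, swapped in for illustration. It estimates the fundamental frequency (F0) of a voiced frame from the peak of its autocorrelation and compares it with a fixed threshold. The 165 Hz threshold, the 60 to 400 Hz search range, and the function names are assumptions. Practical systems typically use richer features and statistical models.

```python
import numpy as np


def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Autocorrelation-based F0 estimate (Hz) for one voiced frame."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(sr / fmax)
    lag_max = min(int(sr / fmin), len(ac) - 1)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


def classify_gender(frame, sr, threshold_hz=165.0):
    """Hypothetical pitch-threshold rule: higher F0 -> 'female', lower -> 'male'."""
    return "female" if estimate_f0(frame, sr) > threshold_hz else "male"


def harmonic_frame(f0, sr=16000, duration=0.04, n_harmonics=5):
    """Synthetic voiced-like test frame with decaying harmonics."""
    t = np.arange(int(sr * duration)) / sr
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harmonics + 1))


sr = 16000
print(estimate_f0(harmonic_frame(120.0), sr), classify_gender(harmonic_frame(120.0), sr))
print(estimate_f0(harmonic_frame(220.0), sr), classify_gender(harmonic_frame(220.0), sr))
```

A single global threshold fails for speakers whose F0 ranges overlap. This is one reason gender-dependent models are usually trained rather than hand-set.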
In this paper, real-time recognition of five different spoken direction commands is realized. A speech detection step is performed on voice recordings that include the five direction commands. Mel frequency cepstral coefficients (MFCC) and Linear Predictive Coding (LPC) coefficients are used to extract the feature vectors and build the training data set. The k-nearest neighbor classification algorithm is used...
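Of the two feature types named above, LPC coefficients are the simpler to sketch. The frame's autocorrelation is computed, and the Levinson-Durbin recursion solves for the prediction-error filter. The sketch below pairs that with a plain k-nearest-neighbor majority vote over feature vectors. MFCC extraction and the speech-detection step are omitted. The function names, model order, and toy training data are assumptions rather than the paper's setup.

```python
import numpy as np
from collections import Counter


def autocorrelation(frame, order):
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve for the prediction-error filter a (a[0] = 1) from autocorrelation r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                           # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k                       # remaining prediction-error power
    return a, err


def lpc_features(frame, order=10):
    a, _ = levinson_durbin(autocorrelation(frame, order), order)
    return a[1:]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    dists = np.linalg.norm(np.asarray(train_X) - np.asarray(x), axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


r = np.array([1.0, 0.5, 0.25])    # autocorrelation of an AR(1) process with coefficient 0.5
print(levinson_durbin(r, 2))      # expected: filter close to [1, -0.5, 0], error power 0.75

train_X = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["left", "left", "right", "right"]
print(knn_predict(train_X, train_y, [0.2, 0.3]))
```

In a full system, each training vector would concatenate LPC and MFCC features from a detected speech segment, with one label per direction command.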
Automatic diagnosis of Alzheimer's disease, as well as monitoring of diagnosed patients, can have a significant economic impact on society. We investigated an automatic diagnosis approach based on speech features. As opposed to standard tests that are mostly focused on memory recall, spontaneous conversations are carried out with the subjects in informal settings. Prosodic speech...
Nowadays, interaction between humans and computers is increasing rapidly. The efficiency and comfort of these interactions depend on the availability of user information to computers. Gender, age, and emotional state are the most fundamental pieces of this information. Extracting such information from audio or video data is an important research area. There are several works on different languages...
In this study, prosodic information is extracted from Turkish Broadcast News data using open-source tools, and the sentence segmentation performance of groups of these prosodic features is compared on the raw output of an Automatic Speech Recognition (ASR) system. For the sentence segmentation task in particular, a very promising prosodic feature set is obtained.
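Pause duration between recognized words is one commonly used prosodic cue for sentence segmentation. Assuming word-level start and end times are available from the ASR output, the sketch below derives a pause feature for each word and flags candidate sentence boundaries with a fixed threshold. The `Word` type, the 0.5 s threshold, and the example timings are hypothetical. The study's open-source tools and its grouped feature sets are not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognized word with its start/end times (seconds) from the ASR output."""
    text: str
    start: float
    end: float


def pause_after(words):
    """Silence duration following each word; the final word gets 0.0."""
    pauses = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(words, words[1:])]
    return pauses + [0.0] if words else []


def boundary_candidates(words, min_pause=0.5):
    """Indices of words followed by a pause of at least min_pause seconds."""
    return [i for i, p in enumerate(pause_after(words)) if p >= min_pause]


words = [Word("bugün", 0.00, 0.40), Word("hava", 0.45, 0.80),
         Word("güzel", 0.85, 1.30), Word("yarın", 2.00, 2.40)]
print(pause_after(words))
print(boundary_candidates(words))
```

In practice, such features are usually fed to a classifier together with other prosodic and lexical cues rather than being thresholded alone.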
This study aims to present an emotional corpus collected at Boğaziçi University, Electrical and Electronics Department, on which no signal processing or machine learning study for classification purposes had previously been done. It also aims to provide a protocol for further experiments on this corpus. The emotional corpus consists of 484 speech utterances from 11 amateur actors acting 11 emotionally...
An audio recording made in a real environment carries an acoustical signature that changes according to the acoustical characteristics of the environment and the recording positions. This signature, which is similar to a 3D room impulse response, contains the directions, levels, and arrival times of the direct sound and the reflections. Although it is easy to obtain reverberant recordings by convolving...
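The convolution mentioned above can be sketched as follows. A mono room impulse response is built from the arrival times and levels of the direct sound and a few reflections, and a dry signal is then convolved with it. Directions are ignored in this single-channel sketch. The function names, sample rate, and path list are illustrative assumptions.

```python
import numpy as np


def synthetic_rir(sr, paths, length_s):
    """Sparse mono impulse response.

    paths: iterable of (arrival_time_s, gain) for the direct sound and each reflection.
    """
    rir = np.zeros(int(round(length_s * sr)))
    for arrival_s, gain in paths:
        idx = int(round(arrival_s * sr))
        if 0 <= idx < len(rir):
            rir[idx] += gain
    return rir


def reverberate(dry, rir):
    """Full linear convolution of the dry signal with the impulse response."""
    return np.convolve(dry, rir)


sr = 8000
paths = [(0.0, 1.0), (0.01, 0.5), (0.025, -0.3)]    # direct sound plus two reflections
rir = synthetic_rir(sr, paths, length_s=0.05)
wet = reverberate(np.array([1.0, 0.0, 0.0]), rir)   # an impulse reproduces the RIR
print(len(rir), len(wet), np.nonzero(wet)[0])
```

The output has length len(dry) + len(rir) - 1, so the reverberant tail extends past the end of the dry signal.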
We study recovering sparse and compressible signals using lp minimization with p < 1 when part of the signal's support is known a priori. A sparse reconstruction method based on lp minimization with a partially known support is proposed, and recovery conditions for lp minimization with partially known support are given. Theoretical results show that lp minimization with a partially known support is...
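This is a minimal sketch of lp recovery with a partially known support. It assumes a standard iteratively reweighted least squares (IRLS) scheme with a decreasing smoothing parameter eps. We swap this scheme in for illustration, and it is not necessarily the method proposed in this paper. Coefficients on the known support T are left unpenalized. The constraint is first projected onto the orthogonal complement of the range of A_T, and IRLS then minimizes the sum of |x_i|^p over the unknown coordinates only. Finally, x_T is recovered by least squares. The function names, p = 0.5, the eps schedule, and the test problem sizes are assumptions.

```python
import numpy as np


def irls_lp(B, c, p=0.5, max_inner=100):
    """Approximately minimize sum(|x_i|^p) subject to B @ x = c (B full row rank)."""
    x = np.linalg.lstsq(B, c, rcond=None)[0]          # minimum-norm starting point
    for eps in 10.0 ** -np.arange(0, 9):               # smoothing: 1, 0.1, ..., 1e-8
        for _ in range(max_inner):
            q = (x ** 2 + eps) ** (1.0 - p / 2.0)      # inverse weights of the weighted l2 step
            x_new = q * (B.T @ np.linalg.solve((B * q) @ B.T, c))
            converged = np.linalg.norm(x_new - x) < np.sqrt(eps) / 100.0
            x = x_new
            if converged:
                break
    return x


def lp_partial_support(A, b, known, p=0.5):
    """Recover x from b = A @ x, penalizing only coordinates outside `known`."""
    n = A.shape[1]
    known = np.asarray(sorted(known), dtype=int)
    unknown = np.setdiff1d(np.arange(n), known)
    A_T, A_U = A[:, known], A[:, unknown]
    # Trailing columns of the complete QR factor span the orthogonal complement of range(A_T).
    Q, _ = np.linalg.qr(A_T, mode="complete")
    U = Q[:, len(known):]
    x_U = irls_lp(U.T @ A_U, U.T @ b, p=p)
    x_T = np.linalg.lstsq(A_T, b - A_U @ x_U, rcond=None)[0]
    x = np.zeros(n)
    x[known], x[unknown] = x_T, x_U
    return x


rng = np.random.default_rng(1)
m, n, k = 32, 64, 6
A = rng.standard_normal((m, n))
support = rng.choice(n, k, replace=False)
x_true = np.zeros(n)
x_true[support] = rng.choice([-1.0, 1.0], k) * (1.0 + rng.random(k))
b = A @ x_true
x_hat = lp_partial_support(A, b, known=support[:3])
print(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Eliminating the known coordinates exactly avoids mixing very large and very small weights in one linear system, which would be numerically fragile.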
In this paper, the influence of vibrotactile stimuli on the evaluation of speech loudness is investigated. The speech stimuli consisted of context-free consonant-vowel-consonant (CVC) and vowel-consonant-vowel (VCV) utterances played over headphones on a mobile phone. The phone's standard vibration was used as the tactile stimulus. Using an AB/X paradigm, 32 untrained subjects evaluated loudness...
Steganography is the art of hiding a message in order to achieve secure data communication. This paper addresses a technique for wave steganography. We propose replacing bits according to the distortion that can be afforded, using lossy or lossless hiding and recovery methods. Bits of the carrier file are replaced by bits of the message file. The message embedded by this method is itself in wave form. The hidden message can be...
Steganography is the art of hiding a message in order to achieve secure communication. In this paper, we present a novel technique for wave steganography for covert communication. The basic idea proposed in this paper is to replace bits according to the distortion that can be afforded, with lossy or lossless hiding and recovery. A number of bits in the samples of the cover file are replaced in accordance with...
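Both abstracts above describe hiding a message by replacing bits of the cover audio samples. The sketch below shows the simplest version of that idea, which is our assumption rather than the papers' distortion-adaptive scheme. It overwrites the k least significant bits of each 16-bit PCM sample with message bits, so each sample changes by at most 2^k - 1. The function names and the fixed k are hypothetical, and reading or writing actual WAV files is omitted.

```python
import numpy as np


def embed_lsb(cover, message, k=1):
    """Hide `message` (bytes) in the k least significant bits of int16 samples."""
    bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))   # MSB first
    bits = np.concatenate([bits, np.zeros((-len(bits)) % k, dtype=np.uint8)])
    weights = 1 << np.arange(k - 1, -1, -1)                        # e.g. [2, 1] for k = 2
    values = (bits.reshape(-1, k) * weights).sum(axis=1).astype(np.uint16)
    if len(values) > len(cover):
        raise ValueError("cover signal too short for this message")
    samples = np.asarray(cover, dtype=np.int16).view(np.uint16).copy()
    keep_mask = np.uint16((0xFFFF << k) & 0xFFFF)                  # clears the k LSBs
    samples[:len(values)] = (samples[:len(values)] & keep_mask) | values
    return samples.view(np.int16)


def extract_lsb(stego, n_bytes, k=1):
    """Recover n_bytes of message from the k least significant bits."""
    n_bits = 8 * n_bytes
    n_samples = -(-n_bits // k)                                    # ceiling division
    values = np.asarray(stego, dtype=np.int16).view(np.uint16)[:n_samples] & ((1 << k) - 1)
    shifts = np.arange(k - 1, -1, -1)
    bits = ((values[:, None] >> shifts) & 1).astype(np.uint8).ravel()[:n_bits]
    return np.packbits(bits).tobytes()


cover = ((np.arange(1000) * 37) % 2000 - 1000).astype(np.int16)
secret = b"hidden"
stego = embed_lsb(cover, secret, k=2)
print(extract_lsb(stego, len(secret), k=2))
print(np.max(np.abs(stego.astype(int) - cover.astype(int))))      # at most 2**2 - 1 = 3
```

Larger k raises capacity at the cost of more audible distortion. Choosing k per sample according to the distortion that can be afforded is the trade-off the abstracts describe.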
In this work, a new empirical mode decomposition (EMD) is introduced. It uses neither extrema envelopes nor a sifting procedure; instead, its components are calculated directly from inflection points. Among the advantages of this technique, and in contrast to the classical EMD, it provides an analytical formula for the decomposition. A simulation study shows its efficiency.
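The decomposition above is built from inflection points. The paper's analytical formula is not given in the abstract, so the sketch below covers only the first step, under our own assumptions. It locates the inflection points of a sampled signal as sign changes of its discrete second difference, then refines each location by linear interpolation. The function name and the sampling grid in the example are hypothetical.

```python
import numpy as np


def inflection_points(t, x):
    """Times where the discrete second difference of x changes sign.

    d2[j] approximates x''(t[j + 1]) (up to a factor 1/dt**2 on a uniform grid).
    A crossing between d2[j] and d2[j + 1] is located by linear interpolation.
    """
    t = np.asarray(t, dtype=float)
    d2 = np.diff(np.asarray(x, dtype=float), n=2)
    positive = d2 > 0                      # zeros count as non-positive, avoiding double hits
    j = np.nonzero(positive[:-1] != positive[1:])[0]
    frac = d2[j] / (d2[j] - d2[j + 1])     # fraction of the way from t[j+1] to t[j+2]
    return t[j + 1] + frac * (t[j + 2] - t[j + 1])


t = np.linspace(0.0, 4.0 * np.pi, 4000)
print(inflection_points(t, np.sin(t)) / np.pi)   # approximately [1, 2, 3]
```

For sin(t), the second difference is proportional to -sin(t) itself, so the detected points fall at the interior multiples of pi.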