Acoustic monitoring of bird species is an increasingly important field in signal processing. Many available bird sound datasets do not contain the exact timestamps of bird calls but carry only coarse weak labels instead. Traditional Non-negative Matrix Factorization (NMF) models are not well designed to deal with weakly labeled data. In this paper we propose a novel Masked Non-negative Matrix Factorization...
Most activity-based person identity recognition methods operate on video data. Moreover, the vast majority of these methods focus on gait recognition. Recognizing a subject's identity from gait alone limits the applicability of the corresponding methods, whereas a method capable of recognizing the subject's identity from various activities would be much more widely applicable...
We consider the over-fitting problem for multinomial probabilistic Latent Semantic Analysis (pLSA) in collaborative filtering, using a regularization approach. For big data applications, computational complexity is at a premium, and we therefore consider a maximum a posteriori approach based on conjugate priors that ensure that the complexity of each step remains the same as in the un-regularized...
We propose a method for optimizing an acoustic feature extractor for anomalous sound detection (ASD). Most ASD systems adopt outlier-detection techniques because it is difficult to collect a massive amount of anomalous sound data. To improve the performance of such outlier-detection-based ASD, it is essential to extract a set of efficient acoustic features that is suitable for identifying anomalous...
The proliferation of cameras and personal devices results in a wide variability of imaging conditions, producing large intra-class variations and a significant performance drop when images from heterogeneous environments are compared. However, many applications must regularly deal with data from different sources and thus need to overcome these interoperability problems. Here, we employ fusion...
Component Analysis (CA) consists of a set of statistical techniques that decompose data into appropriate latent components that are relevant to the task at hand (e.g., clustering, segmentation, classification, alignment). During the past few years, an explosion of research in probabilistic CA has been witnessed, with the introduction of several novel methods (e.g., Probabilistic Principal Component...
Point Process Models (PPM) have been widely used for keyword spotting applications. Training these models typically requires a considerable number of keyword examples. In this work, we consider a scenario where very few keyword examples are available for training. The availability of a limited number of training examples results in a PPM with poorly learnt parameters. We propose an unsupervised online...
Smartphone applications designed to track human motion in combination with wearable sensors, e.g., during physical exercise, have recently attracted considerable attention. Commonly, they provide quantitative services, such as personalized training instructions or the counting of distances. But qualitative monitoring and assessment are still missing, e.g., to detect malpositions, to prevent injuries, or to optimize...
In this paper, we develop a novel second-order method for training feed-forward neural nets. At each iteration, we construct a quadratic approximation to the cost function in a low-dimensional subspace. We minimize this approximation inside a trust region through a two-stage procedure: first inside the embedded positive curvature subspace, followed by a gradient descent step. This approach leads to...
Classification of activities of daily living is of paramount importance in modern healthcare applications. However, hardware monitoring constraints frequently lead to missing raw values, dramatically affecting the performance of machine learning algorithms. In this work, we study the problem of efficient estimation of missing linear acceleration and angular velocity measurements, experimenting on...
This paper presents an unsupervised approach to vocal detection in music recordings based on dictionary learning. In the first stage, the recording to be segmented is treated as training data and the K-SVD algorithm is used to learn a dictionary which sparsely represents a short-term feature sequence that has been extracted from the recording. Subsequently, the vectors of the feature sequence are reconstructed...
In this paper we study the application of Matrix Completion to topic detection and classification in Twitter. The proposed method first employs Joint Complexity to perform topic detection based on score matrices. Based on the spatial correlation of tweets and the spatial characteristics of the score matrices, we apply a novel framework which extends Matrix Completion to build dynamically complete...
Music Structural Analysis (MSA) algorithms analyze songs with the purpose of automatically retrieving their large-scale structure. They do so from a feature-based representation of the audio signal (e.g., MFCCs, chromagram), which is usually hand-designed for that specific application. In order to design a proper audio representation for MSA, we need to assess which musical properties are relevant...
In recent years, voice conversion (VC) has become a popular technique since it can be applied to various speech tasks. Most existing approaches to VC must use aligned speech pairs (parallel data) of the source speaker and the target speaker in training, which makes them hard to apply. Furthermore, VC methods proposed so far require the source speaker to be specified in the conversion stage, even though we just...
A design method for a multiple description vector quantizer (VQ) is proposed. VQ is widely used for data compression, transmission and other processing. Here, we assume transmission channels with data erasure, such as packet-based networks. Multiple description coding is a coding method used to achieve “graceful degradation” when transmitting signals through lossy channels. The proposed method is inspired...
Passive Millimeter Wave Images (PMMWI) can be used to detect and localize objects concealed under clothing. Unfortunately, the quality of the acquired images and the unknown position, shape, and size of the hidden objects make this task difficult. In this paper we propose a method that combines image processing and statistical machine learning techniques to solve this localization/detection problem...
The bag-of-audio-words approach has been widely used for audio event recognition. In these models, a local feature of an audio signal is matched to a code word according to a learned codebook. The signal is then represented by the frequencies of the matched code words over the whole signal. In this paper, we present an improved model based on the idea of audio phrases, which are sequences of multiple audio...
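The baseline bag-of-audio-words representation described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not cover the proposed audio-phrase model. The function names, the squared Euclidean matching rule, and the toy codebook and features are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter


def nearest_code_word(feature, codebook):
    """Return the index of the codebook entry closest to `feature`.

    Assumption: matching uses squared Euclidean distance.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(feature, codebook[k]))


def bag_of_audio_words(features, codebook):
    """Match each local feature to a code word and return the
    normalized code-word frequencies over the whole signal."""
    counts = Counter(nearest_code_word(f, codebook) for f in features)
    total = len(features)
    return [counts[k] / total if total else 0.0 for k in range(len(codebook))]


# Hypothetical toy data: a 3-entry codebook and four 2-D local features.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.7, 5.2)]

histogram = bag_of_audio_words(features, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

In practice the codebook would be learned from training features (for example by clustering) rather than written by hand as in this toy example.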
In real-life audio scenes, many sound events from different sources are simultaneously active, which makes automatic sound event detection challenging. In this paper, we compare two different deep learning methods for the detection of environmental sound events: combined single-label classification and multi-label classification. We investigate the accuracy of both methods on audio with different...
In this paper, we propose a novel method for the automatic detection of the fetal head in 2D ultrasound images. Fetal head detection has been a challenging task, as the ultrasound images usually have poor quality, the structures contained in the images are complex, and the gray scale distribution is highly variable. Our approach is based on a deep belief network and a modified circle detection method...
We present a method for real-time detection and classification of impact sounds that relies solely on spatial features and exploits the difference in the location of each impacted structure. Using a compact sensor array, we formulate the classification problem in terms of an undetermined source separation process where we assume that the linear mixing model can be learned through a training phase...