We propose novel methods for automatically detecting non-stationary segments using non-negative matrix factorization (NMF), with the aim of effective sound annotation. For acoustic event detection or acoustic scene analysis, preparing a sufficient amount of training data is important. However, listening to all recorded signals and annotating them is very time-consuming. Assuming that the observed acoustic...
In this paper, a simple and efficient method, based on Quaternionic Distance Based Weber Descriptor (QDWD) and object cues, is proposed for saliency detection. Firstly, QDWD, which was initially designed for detecting outliers in color images, is used to represent the directional cues in an image. Meanwhile, two low-level priors, namely the color contrast and center cue of the image, are utilized...
Recently, deep neural networks (DNNs) have been successfully adopted for voice activity detection (VAD). However, the performance of DNN-based VAD is still unsatisfactory in noisy environments where the feature subspaces of the training database and the test environment do not match. In this paper, we propose a local feature shift technique which normalizes the feature...
Active noise control (ANC) is an efficient technique for dealing with low-frequency noise that is difficult to abate with noise barriers or sound-absorbing materials. Many successful ANC systems have adopted the feedforward filtered-x least mean squares (FxLMS) algorithm to reduce machinery noise. The noise-canceling headset is another well-known example, where the feedback control structure is favorable...
To date, many researchers have conducted studies on controlling electrical power in order to construct smart home systems that automatically manipulate individuals. One recent topic is the non-intrusive load monitoring (NILM) system, which infers the states of devices. In NILM, approaches have focused only on the features of the electrical power signals to identify the states of the...
We propose two simple methods to improve the performance of a keyword spotting system. In our application, users are allowed to change the keywords at any time. We therefore focus on phone-based GMM-HMM models, since they do not require keyword-specific training data. However, GMM-HMM based models usually have a very high false alarm rate, i.e., a keyword is not present but the system gives...
Dual-camera electronic devices have the potential to deliver additional functionalities, such as depth acquisition and high dynamic range (HDR) imaging, more easily than those with a single camera. In this paper, we focus on the generation of a high dynamic range image for electronic devices equipped with a pair of parallel cameras that have different exposure times. Specifically, we propose a method...
In this paper, a novel Dynamic Convolutional Neural Network (D-CNN) that uses sensor data for activity recognition is proposed. Sensor data collected for activity recognition is usually not well aligned. It may also contain noise and variations across different persons. To overcome these challenges, Gaussian Mixture Models (GMMs) are exploited to capture the distribution of each activity. Then, sensor data...
Using speech or text to predict articulatory movements has potential benefits for speech-related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement...
Given the increasing attention paid to speech emotion classification in recent years, this work presents a novel speech emotion classification approach based on the multiple kernel Gaussian process. Two major aspects of a classification problem that play an important role in classification accuracy are addressed, i.e. feature extraction and classification. Prosodic features and other features widely...
We present an unsupervised method for discovering objects from depth information. Our method can identify new common objects appearing in different depth images. We use 2D bounding box proposals to detect candidate locations of objects in each depth image, and then retrieve the corresponding 3D bounding boxes using the depth information. Invalid object proposals can be further removed by analyzing...
In this paper, a saliency-aware fast intra coding algorithm for HEVC is proposed, consisting of perceptual intra coding and a fast intra prediction mode decision algorithm. Firstly, based on visual saliency detection, an adaptive CU splitting method is proposed to reduce intra encoding complexity. Furthermore, the quantization parameter is adaptively adjusted at the CU level according to the relative importance...
This paper proposes a filtering approach based on global motion estimation (GME) and global motion compensation (GMC) as pre-processing and post-processing for a video CODEC. For the pre-processing, a group of pictures (GOP), i.e., the basic unit for GMC, and reference frames are first defined for an input video sequence. Next, GME and GMC are sequentially performed for every frame in each GOP...
In this paper, we investigate a DNN tone-based extended recognition network (ERN) approach to Mandarin tone recognition and tone mispronunciation detection. Given a toneless syllable sequence, a tone-based ERN is constructed by assigning five different tones to each toneless syllable, obtaining a fully expanded tonal syllable network. Next, Viterbi decoding is carried out on the tone-based ERN to...
Recently, deep and/or recurrent neural networks (DNNs/RNNs) have been employed for voice conversion, and have significantly improved the performance of converted speech. However, DNNs/RNNs generally require a large amount of parallel training data (e.g., hundreds of utterances) from source and target speakers. It is expensive to collect such a large amount of data, and impossible in some applications,...
In this paper, we propose a frequency-domain speech enhancement algorithm with phase estimation, in which speech is modeled by a Gaussian mixture model (GMM) in the log-spectral domain and two closed-form log-spectral amplitude estimators for speech and noise are derived directly by using a Mixture-Maximum (MIXMAX) model. Because accurate estimation of the speech phase could help to reduce...
A physical wireless conversion sensor network (PhyC-SN) is attracting much attention for achieving real-time collection of massive sensing data and reducing power consumption in wireless sensor networks. Since the collected sensing data interfere with each other, it is difficult to analyze the tendency of each set of sensing data. This paper proposes a novel data separation method based on data tracking with...
Internet of Things (IoT) devices are limited in resources such as CPU and memory. The Lightweight Encryption Algorithm (LEA) was standardized in Korea in 2013 as an encryption algorithm suitable for IoT devices. However, LEA is vulnerable to side-channel analysis attacks that exploit power consumption. To mitigate this vulnerability, masking techniques are mainly used. However, in the case of masking...
We propose to detect mispronunciations in a language learner's speech via a discriminatively trained DNN in the phonetic space. The posterior probabilities of “senones” populated in a decision tree are trained and predicted speaker-independently. Acoustic features of each input segment (with preceding and succeeding contexts of several frames) are mapped onto the whole set of senones in their corresponding...
This paper presents an overview of studies conducted to understand the use of brain signals as input to a speech recogniser. The studies are categorised by the type of technology used, with a summary of the methodologies used and the results achieved. In addition, the paper gives an insight into some studies that examined the effect of the chosen stimuli...