This paper proposes a video stabilization technique using undesired motion detection and an alpha-trimmed mean filter. The proposed method consists of two steps: detecting undesired motions and filtering them. The limitation on undesired motions is defined using local motion information. The alpha of the alpha-trimmed mean filter is controlled based on this limitation, so that regenerated...
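As a rough illustration of the filtering step, the following is a minimal sketch of a generic alpha-trimmed mean filter applied to a 1-D motion trajectory. It is not the authors' implementation: the function names, the window `radius`, and the convention that `alpha` is the fraction of samples trimmed from each end of the sorted window are assumptions made for this example, and the per-frame `alphas` are simply supplied by the caller because the abstract does not say how alpha is derived from the motion limitation.

```python
def alpha_trimmed_mean(window, alpha):
    """Average `window` after dropping a fraction `alpha` of samples from each end.

    Assumed convention: 0 <= alpha < 0.5. alpha = 0 gives the plain mean;
    values close to 0.5 approach the median.
    """
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    ordered = sorted(window)
    k = int(alpha * len(ordered))          # samples trimmed from each end
    kept = ordered[k:len(ordered) - k]     # never empty because alpha < 0.5
    return sum(kept) / len(kept)


def smooth_motion(motion, alphas, radius=2):
    """Filter a 1-D motion trajectory with a per-frame alpha (hypothetical helper).

    `motion` holds one motion value per frame; `alphas` holds the trimming
    fraction to use at each frame. Windows are clipped at the sequence ends.
    """
    smoothed = []
    for i in range(len(motion)):
        lo = max(0, i - radius)
        hi = min(len(motion), i + radius + 1)
        smoothed.append(alpha_trimmed_mean(motion[lo:hi], alphas[i]))
    return smoothed
```

In this sketch a larger alpha discards more of the extreme values in each window, so isolated motion spikes are suppressed more aggressively, while alpha = 0 reduces to ordinary moving-average smoothing.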
A brain-computer interface (BCI) based on steady-state visual evoked potentials (SSVEPs) is one of the most practical BCIs because of its high recognition accuracy and short training time. To increase the number of commands of SSVEP-based BCIs, a frequency and phase mixed-coded SSVEP BCI has recently been proposed. However, in order to detect the frequency and phase of SSVEPs accurately, it is required to...
A driver is regarded as a system that receives visual information and that controls the steering wheel. To identify the system, we conducted experiments to get input-output data using a driving simulator and confirmed that the focus of expansion of optical flow has sufficient information to predict steering behaviors.
In this paper, we propose that deep neural networks that are thinner than typical image recognition networks can perform sketch recognition effectively. As in other computer vision problems, convolutional neural networks (CNNs) outperform other feature extraction methods significantly in sketch recognition. To date, two CNN structures have been proposed for sketch recognition, as described in [4]...
We address a novel nonnegative matrix factorization (NMF) with a new basis deformation method to handle various music sounds. Conventional supervised NMF has a critical problem that a mismatch between bases trained in advance and an actual target sound reduces the accuracy of separation. To solve this problem, we proposed an advanced supervised NMF that applies a single time-invariant filter to the...
In this paper, we study the effectiveness of using convolutional neural networks (CNNs) to automatically detect abnormal heart and lung sounds and classify them into different classes. Heart and respiratory diseases have affected humankind for a long time. An effective and automatic diagnostic method is highly attractive since it can help discover potential threats at an early stage, even at...
In this paper, we present a novel multi-photo-based framework to solve a self-portrait enhancement problem we call the “1+2 problem”, in which a self-portrait photo is enhanced with the help of two additional photos that share the same scene and a similar shooting time. The key idea is to exploit the extra information in these two photos to overcome the limited field of view and poor illumination of the target...
Reversible data hiding (RDH) is a specific information hiding technique in which both the embedded data and the original cover medium can be exactly extracted from the marked data. In this paper, we present a general expansion-shifting model for RDH by introducing the so-called reversible embedding function (REF) which maps each point of Z^n to a nonempty subset of Z^n. Moreover, to guarantee the reversibility,...
This paper proposes an approach to accent adaptation that uses accent-dependent bottleneck (BN) features to improve the performance of a multi-accent Mandarin speech recognition system. The adaptation architecture uses two neural networks. First, a deep neural network (DNN) acoustic model acts as a feature extractor for accent-dependent BN (BN-DNN) features. The input...
In fingerprint recognition systems, minutiae-based matching algorithms are the most intensively researched. However, in most minutiae-based methods, the similarity score is based on the main score of the matched minutiae, and the boosted information is not effectively used in the final similarity score computation. Based on this observation, we extract several features as supplementary scores. And...
This paper proposes a cascading deep neural network (DNN) structure for a speech synthesis system that consists of text-to-bottleneck (TTB) and bottleneck-to-speech (BTS) models. Unlike the conventional single structure, which requires a large database to find complicated mapping rules between linguistic and acoustic features, the proposed structure is very effective even if the available training database...
Recently, a deep beamforming (BF) network was proposed to predict BF weights from phase-carrying features, such as generalized cross correlation (GCC). The BF network is trained jointly with the acoustic model to minimize the automatic speech recognition (ASR) cost function. In this paper, we propose to replace GCC with features derived from the input signals' spatial covariance matrices (SCMs), which contain...
The emerging High Efficiency Video Coding (HEVC) standard adopts a flexible and efficient quad-tree structure to partition the coding unit (CU). However, this causes enormous computational complexity. In this paper, a fast CU partitioning algorithm for HEVC inter prediction is proposed. First, based on visual saliency map detection, a fast CU partitioning depth prediction algorithm...
In this paper, we propose a framework that fuses textual and visual features of user generated social media data to mine the distribution of user interests. The proposed framework consists of three steps: feature extraction, model training, and user interest mining. We choose boards from popular users on Pinterest to collect training and test data. For each pin we extract the term-document matrices...
We propose a voice conversion framework to map the speech features of a source speaker to a target speaker based on deep neural networks (DNNs). Due to a limited availability of the parallel data needed for a pair of source and target speakers, speech synthesis and dynamic time warping are utilized to construct a large parallel corpus for DNN training. With a small corpus to train DNNs, a lower log...
Recent studies have shown that inter-layered network coding is a promising approach to providing unequal error protection for scalable video multicast under channel heterogeneity. Selecting the optimal transmission distribution at the eNB increases system performance at the cost of time and computational complexity. In this paper, we propose an optimal transmission...
We propose a sudden-noise suppression method for speech recognition using a phase linearity feature for noise detection. Our investigation of sound data recorded in actual retail stores shows that short, sudden noises are dominant in such environments. We also confirm the negative effect of such noises on speech recognition performance. Our method addresses this problem by focusing on sudden noises...
Just noticeable difference (JND), which reveals the visibility of our human visual system (HVS), is useful for image/video coding. Due to the content complexity, it is hard to accurately estimate the JND thresholds for different image blocks (e.g., edge and texture). Research on cognitive science indicates that the HVS is adaptive to extract the visual regularities for scene perception and understanding...
We propose a novel noisy low-light image enhancement algorithm via structure-texture-noise (STN) decomposition. We split an input image into structure, texture, and noise components, and enhance the structure and texture components separately. Specifically, we first enhance the contrast of the structure image, by extending a 2D histogram-based image enhancement scheme based on the characteristics...