We propose a novel technique that learns a low-dimensional feature representation from unlabeled data of a target language, and labeled data from a nontarget language. The technique is studied as a solution to query-by-example spoken term detection (QbE-STD) for a low-resource language. We extract low-dimensional features from a bottle-neck layer of a multitask deep neural network, which is jointly...
While mobile users enjoy anytime, anywhere Internet access by connecting their devices through Wi-Fi services, the increasing deployment of access points (APs) has raised a number of privacy concerns. This paper explores the potential for smartphone privacy leakage caused by surrounding APs. In particular, we study to what extent the users' personal information such as social relationships...
We propose to use a feature representation obtained by pairwise learning in a low-resource language for query-by-example spoken term detection (QbE-STD). We assume that word pairs identified by humans are available in the low-resource target language. The word pairs are parameterized by a multi-lingual bottleneck feature (BNF) extractor that is trained using transcribed data in high-resource languages...
Using speech or text to predict articulatory movements can have potential benefits for speech-related applications. Many approaches have been proposed to solve the acoustic-to-articulatory inversion problem, which has been explored far more than the prediction of articulatory movements from text. In this paper, we investigate the feasibility of using a deep neural network (DNN) for articulatory movement...
Recently, deep and/or recurrent neural networks (DNNs/RNNs) have been employed for voice conversion, and have significantly improved the performance of converted speech. However, DNNs/RNNs generally require a large amount of parallel training data (e.g., hundreds of utterances) from source and target speakers. It is expensive to collect such a large amount of data, and impossible in some applications,...
We use a query-by-example keyword spotting (QbyE-KWS) approach to solve the personalized wake-up word detection problem for small-footprint, low-computational-cost on-device applications. QbyE-KWS takes keywords as templates, and matches the templates across an audio stream via dynamic time warping (DTW) to see if the keyword is included. In this paper, we use neural networks as acoustic models to extract DNN/LSTM phoneme...
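The template matching described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: it assumes plain Euclidean frame distance, a fixed-length sliding window, and length-normalized classic DTW, and the names `dtw_distance`, `detect_keyword`, `threshold`, and `hop` are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, segment):
    """Length-normalized DTW cost between two sequences of feature vectors."""
    n, m = len(template), len(segment)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning template[:i] with segment[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(template[i - 1], segment[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in the template only
                cost[i][j - 1],      # advance in the segment only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m] / (n + m)


def detect_keyword(template, stream, threshold, hop=1):
    """Slide a template-sized window over the stream (assumed windowing scheme).

    Returns (found, best_start, best_score); best_start is -1 when the
    stream is shorter than the template.
    """
    win = len(template)
    best_start, best_score = -1, float("inf")
    for start in range(0, len(stream) - win + 1, hop):
        score = dtw_distance(template, stream[start:start + win])
        if score < best_score:
            best_start, best_score = start, score
    return best_score <= threshold, best_start, best_score


# Toy one-dimensional "features"; a real system would use frame-level vectors.
template = [[0.0], [1.0], [2.0]]
stream = [[5.0], [5.0], [0.0], [1.0], [2.0], [5.0]]
found, start, score = detect_keyword(template, stream, threshold=0.1)
```

In this toy run the window starting at frame 2 matches the template exactly, so its DTW cost is zero. A practical detector would likely use subsequence DTW or variable-length windows so that the keyword can be spoken faster or slower than the template.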
In this paper, we propose to use a deep neural network (DNN) as an effective tool for audio feature extraction. The DNN-derived features can be effectively used in a subsequent classifier (e.g., an SVM in this study) for audio classification. Specifically, we learn bottleneck features from a multi-layer perceptron (MLP), in which a Mel filter bank feature is used as network input and one of the hidden...
A large number of videos are generated and uploaded to video websites (like Youku and YouTube) every day, and video websites play an increasingly important role in human life. While bringing convenience, this big video data makes it harder to summarize videos so that users can browse them easily. However, although there are many existing video summarization approaches, the key frames selected...
In this paper, we leverage RFID technology to label different objects with RFID tags, so as to realize the vision of "show me what I see from the augmented reality system". We deploy additional RFID antennas to the COTS depth camera and propose a continuous scanning-based scheme to scan the objects, i.e., the system continuously rotates and samples the depth of field and RF-signals from these...
Previous change detection methods for polarimetric synthetic aperture radar (PolSAR) images generally operate at the pixel scale and therefore overlook semantic information. To solve this problem, this paper presents a superpixel-based PolSAR image change detection method. Unlike some previous methods, an improved SLIC superpixel segmentation method is introduced in polarimetric...
This paper presents a deep neural network-conditional random field (DNN-CRF) system with multi-view features for sentence unit detection on English broadcast news. We proposed a set of multi-view features extracted from the acoustic, articulatory, and linguistic domains, and used them together in the DNN-CRF model to predict the sentence boundaries. We tested the accuracy of the multi-view features...
In this paper, we explore the use of prosodic features in sentence boundary detection in Chinese broadcast news. The prosodic features include speaker turn, music, pause duration, pitch, energy and speaking rate. Specifically, considering the Chinese tonal effects in pitch trajectory, we propose to use tone-normalized pitch features. Experiments using decision trees demonstrate that the tone-normalized...
This paper studies how to integrate multi-modal features in automatic topic segmentation of Mandarin broadcast news. The multi-modal feature integration problem is formulated within the Maximum Entropy (MaxEnt) scheme for topic boundary classification by maximizing the entropy and respecting all known constraints (i.e., the contributions of multiple features). We particularly consider two types of features:...
This paper investigates how to integrate multi-modal features for story boundary detection in broadcast news. The detection problem is formulated as a classification task, i.e., classifying each candidate into boundary/non-boundary based on a set of features. We use a diverse collection of features from text, audio and video modalities: lexical features capturing the semantic shifts of news topics...
In network intrusion detection systems, feature extraction plays an important role in improving classification performance and reducing computational complexity. Principal Component Analysis and Independent Component Analysis are both commonly used feature extraction methods. This paper proposes a novel feature extraction method for network intrusion detection and the core of this...
While many efforts have been made in the audio signal classification field, the noise interruption problem has seldom been addressed so far, especially in many telecommunication applications, where a real-time and noise-robust approach is needed. This paper addresses this problem by proposing two novel robust features: average pitch density (APD) and relative tonal power density (RTPD). APD refers to the...
This paper presents a two-stage multi-feature integration approach for unsupervised speaker change detection in real-time news broadcasting. We integrate MFCC and LSP features (i.e., a perceptual feature plus an articulatory feature) in the metric-based potential speaker change detection stage to collect as many speaker boundary candidates as possible. We adopt a weighted Bayesian information criterion...
Tracking vehicles is an important and challenging issue in video-based intelligent transportation systems and has been broadly investigated in the past. This paper presents a robust and real-time method for tracking vehicles. The proposed algorithm includes two stages: object region extraction and vehicle tracking. Object region extraction is a key step and the concept of tracking vehicle is built...