Deep-learning-based speaker verification (SV) methods have achieved state-of-the-art performance. However, SV with short voice commands (SV-SVC) remains challenging, and its performance degrades significantly in the presence of noise. A careful examination of the SV-SVC task in real applications reveals two unavoidable limitations. One is the very short utterances used (less than 1 second)...
The environmental sound classification (ESC) task is still open and challenging. In contrast to speech, sounds of a specific acoustic event may be produced by a wide variety of sources. Thus, within one class, the feature spectra of acoustic events vary far more than those of human speech. To learn better high-level feature representations from these highly variable feature spectra, convolution...
Video summarization (VS) is one of the key video signal processing techniques for unmanned aerial vehicles (UAVs). Essentially, VS aims to eliminate highly similar, redundant frames in aerial videos (AVs), which supports quick browsing, retrieval, and efficient storage without losing important information. For VS, how to measure the similarity between video frames is not a trivial...
When noise is directional instead of diffuse, the majority of conventional direction of arrival (DOA) estimation techniques suffer from performance degradation because of mismatched noise models. In this paper, a novel robust DOA estimation algorithm is developed as an initial investigation into DOA estimation of speech under directional non-speech interference (DNSI) and non-directional background...
Wireless capsule endoscopy video summarization (WCE-VS) is in high demand for eliminating highly similar, redundant frames. Conventional WCE-VS methods extract various hand-crafted features as image representations. Research shows that such features reflect only the low-level characteristics of a single frame and are essentially ineffective at capturing the semantic similarity between WCE frames...
Density estimation based visual object counting (DE-VOC) methods estimate the object count of an image by integrating over its predicted density map. They are effective but inefficient. This paper proposes a fast DE-VOC method that maintains this effectiveness. Essentially, the feature space of image patches from VOC can be clustered into subspaces, and the examples of each subspace can be collected...
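The counting step described above (count = integral of the density map) can be illustrated with a minimal Python/NumPy sketch. It is not the paper's fast DE-VOC method: the function names, the Gaussian kernel and sigma are assumptions chosen for illustration. Each annotated object contributes a Gaussian normalised to unit mass, so the discrete integral (sum) of the map equals the number of objects.

```python
import numpy as np

def gaussian_density_map(points, shape, sigma=2.0):
    """Build a density map with one unit-mass Gaussian per annotated object.

    points: iterable of (row, col) object centres (hypothetical annotations).
    sigma:  assumed kernel width; each Gaussian is normalised over the grid.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    density = np.zeros(shape, dtype=float)
    for r, c in points:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += g / g.sum()  # normalise: each object contributes mass 1
    return density

def count_from_density(density):
    """Estimated count = discrete integral (sum) of the density map."""
    return float(density.sum())

if __name__ == "__main__":
    pts = [(10, 12), (30, 40), (50, 20)]
    d = gaussian_density_map(pts, (64, 64), sigma=3.0)
    print(round(count_from_density(d), 6))  # 3.0
```

In a DE-VOC system the density map would come from a learned regressor rather than from annotations; only the final summation step is the same.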
Video-based crowd counting (VCC) is a technique in high demand for many video applications. Existing supervised VCC methods essentially learn an intrinsic mapping function between image features and the corresponding crowd counts. However, an imbalanced training dataset degrades VCC performance significantly. Encouraged by the recent success of cost-sensitive learning for image classification with imbalance...
Vehicle logo recognition (VLR) is a key problem in vehicle identification systems. Logo recognition remains challenging because VLR methods suffer from large within-class variations caused by different illumination conditions, viewpoints, etc. In this paper, motivated by the excellent performance of collaborative representation based classification (CRC), we formulate VLR...
Playback attack detection (PAD) is essentially a binary classification task that distinguishes authentic recordings from playback recordings. For the PAD problem, the difference in acoustic features between authentic and playback recordings comes mainly from the recording channel and the ambient noise. Motivated by the excellent performance of the Gaussian Mixture Model-Universal...
We consider the image classification problem via kernel collaborative representation classification with a locality constrained dictionary (KCRC-LCD). Specifically, we propose a kernel collaborative representation classification (KCRC) approach in which a kernel method is used to improve the discriminative ability of collaborative representation classification (CRC). We then measure the similarities between...
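A kernelised collaborative representation step of the kind described above can be sketched as follows, assuming an RBF kernel and ridge (l2) coding in the kernel-induced feature space. This is a generic KCRC illustration, not the paper's KCRC-LCD: the locality constrained dictionary and the similarity measure in the truncated text are not reproduced, and rbf_kernel, kcrc_predict, lam and gamma are hypothetical names. The class-wise residual ||phi(y) - Phi_c alpha_c||^2 is expanded with the kernel trick so that phi is never computed explicitly.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel between the columns of A (d x m) and B (d x n)."""
    sq = (np.sum(A ** 2, axis=0)[:, None] + np.sum(B ** 2, axis=0)[None, :]
          - 2.0 * A.T @ B)
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clamp tiny negatives

def kcrc_predict(X, labels, y, lam=1e-3, gamma=1.0):
    """Generic kernel CRC sketch (assumed RBF kernel, ridge coding).

    X: d x n training samples as columns; labels: length n; y: length-d query.
    """
    labels = np.asarray(labels)
    K = rbf_kernel(X, X, gamma)                   # n x n Gram matrix
    k_y = rbf_kernel(X, y[:, None], gamma)[:, 0]  # k(x_j, y) for all j
    # Ridge coding in feature space: alpha = (K + lam I)^{-1} k_y
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), k_y)
    k_yy = 1.0  # k(y, y) = exp(0) for the RBF kernel
    best, best_res = None, np.inf
    for c in np.unique(labels):
        idx = labels == c
        a = alpha[idx]
        # ||phi(y) - Phi_c a||^2 = k(y,y) - 2 a^T k_c(y) + a^T K_cc a
        res = k_yy - 2.0 * a @ k_y[idx] + a @ K[np.ix_(idx, idx)] @ a
        if res < best_res:
            best, best_res = c, res
    return best
```

Some CRC variants divide the residual by ||alpha_c||; this sketch uses the plain feature-space residual.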
We consider the image classification problem via multiple kernel collaborative representation (MKCR). We generalize kernel collaborative representation based classification to a multi-kernel framework in which multiple kernels are jointly learned with the representation coefficients. The intrinsic idea of multiple kernel learning is adopted in our MKCR model. Experimental results show that MKCR converges...
This paper studies the classification of digestive organs in wireless capsule endoscopy (WCE) images within a deep convolutional neural network (DCNN) framework. Essentially, DCNNs have proven powerful at learning layer-wise hierarchical models from huge amounts of training data, working similarly to the human biological visual system. Classifying digestive organs in WCE images intuitively means...
We present a novel two-stage signal strength difference (TS-SSD) localization algorithm in this letter. A new model using the TS-SSD technique is derived to eliminate the effects of the path loss exponent and the unknown transmit power. A total least squares (TLS) solution is given to estimate the distances between anchor and target nodes. Then a low-rank matrix completion framework is established to estimate...
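Assuming the usual log-distance path-loss model P_i = P_0 - 10 n log10(d_i), the difference P_i - P_j = -10 n log10(d_i / d_j) no longer depends on the unknown transmit power P_0, which is the basic reason signal strength differences are used. The snippet does not give the linear system that the TS-SSD model produces, so the sketch below shows only a generic total least squares (TLS) solver for A x ≈ b, computed from the SVD of the augmented matrix [A | b]; total_least_squares is a hypothetical name.

```python
import numpy as np

def total_least_squares(A, b):
    """Classical TLS solution of A x ~ b (errors allowed in both A and b).

    Uses the right singular vector of [A | b] belonging to the smallest
    singular value: x = -v[:n] / v[n].
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]
    _, _, Vt = np.linalg.svd(np.hstack([A, b]))
    v = Vt[-1]  # singular values are sorted descending, so last row = smallest
    if abs(v[n]) < 1e-12:
        raise ValueError("TLS solution does not exist (last component ~ 0)")
    return -v[:n] / v[n]
```

Unlike ordinary least squares, TLS treats the coefficient matrix as noisy too, which suits systems built from measured signal strengths on both sides.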
We present a locality preserving K-SVD (LP-KSVD) algorithm for joint dictionary and classifier learning, and further incorporate kernels into our framework. In LP-KSVD, we construct a locality preserving term based on the relations between input samples and dictionary atoms, and introduce locality via nearest neighbors to enforce the locality of the representation. Motivated by the fact that locality-related...
Adaptive filters with suitable nonlinear devices are very effective in suppressing the adverse effects of impulse noise. In a previous work, the authors proposed a new class of nonlinear adaptive filters using the concept of robust statistics [1,2]. The robust M-estimator is used as the objective function, instead of the mean square error, to suppress the impulse noise. The optimal coefficient...
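The idea of replacing the squared error with an M-estimator can be sketched as an LMS-style update in which the error e(n) is passed through a score function psi before the weights are adapted. The snippet does not say which M-estimator [1,2] use, so Huber's function with threshold k is swapped in here as an assumption; robust_lms, huber_psi, mu and k are hypothetical names, and this is not the authors' optimal-coefficient derivation.

```python
import numpy as np

def huber_psi(e, k):
    """Huber score function: linear for |e| <= k, clipped to +/-k beyond."""
    return np.clip(e, -k, k)

def robust_lms(x, d, num_taps=4, mu=0.01, k=1.0):
    """FIR adaptive filter whose update uses psi(e) instead of e.

    x: input signal, d: desired signal (may contain impulses).
    Returns the final weights and the a-priori error history.
    """
    w = np.zeros(num_taps)
    e_hist = np.zeros(len(x))
    for n in range(num_taps - 1, len(x)):
        u = x[n - num_taps + 1:n + 1][::-1]  # [x(n), x(n-1), ...]
        e = d[n] - w @ u                     # a-priori error
        w += mu * huber_psi(e, k) * u        # bounded (robust) correction
        e_hist[n] = e
    return w, e_hist
```

Because psi(e) is clipped at +/-k, a single impulse can move the weights by at most mu*k*|u(n)|, whereas plain LMS would move them in proportion to the full impulse amplitude.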
Many intelligent systems must deal with human-computer interaction. As one of the most important front ends, gender classification plays an irreplaceable role. For practical use, a real-time, robust gender classification system is presented in this paper. The system consists of three principal modules: image preprocessing, a face detector, and a gender classifier. To enhance...
A time-interleaved analog-to-digital converter (TIADC) system is a good option for significantly increasing the sampling rate of an ADC. However, the performance of a TIADC suffers from mismatch errors among the sub-channels, especially the timing error. This paper presents a method to estimate the channel timing error using the output data of the TIADC and its corresponding reference channel. The proposed...
Sparse representation classification (SRC) plays an important role in pattern recognition. Recently, a more generic method named collaborative representation classification (CRC) has greatly improved on the efficiency of SRC. Building on recent developments in CRC, this paper explores how to smoothly apply the kernel technique to further improve its performance and proposes the kernel CRC (KCRC)...
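Plain CRC, as commonly formulated, codes a query over all training samples with an l2 (ridge) penalty, alpha = (X^T X + lambda I)^{-1} X^T y, and picks the class with the smallest regularized residual ||y - X_c alpha_c|| / ||alpha_c||. The sketch below follows that formulation under assumed names (crc_predict, lam); the closed-form ridge solution, in place of the l1 solver used by SRC, is what makes CRC faster. The kernel extension (KCRC) proposed in the paper is not reproduced.

```python
import numpy as np

def crc_predict(X, labels, y, lam=1e-3):
    """Collaborative representation classification (CRC) sketch.

    X: d x n matrix whose columns are training samples; labels: length n;
    y: length-d query. lam is an assumed regularisation weight.
    """
    labels = np.asarray(labels)
    X = X / np.linalg.norm(X, axis=0, keepdims=True)  # unit-norm columns
    y = y / np.linalg.norm(y)
    n = X.shape[1]
    # Ridge coding over ALL classes: alpha = (X^T X + lam I)^{-1} X^T y
    alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)
    best, best_score = None, np.inf
    for c in np.unique(labels):
        idx = labels == c
        a = alpha[idx]
        residual = np.linalg.norm(y - X[:, idx] @ a)
        # Regularised residual ||y - X_c a_c|| / ||a_c||
        score = residual / max(np.linalg.norm(a), 1e-12)
        if score < best_score:
            best, best_score = c, score
    return best
```

Since (X^T X + lambda I)^{-1} X^T does not depend on y, it can be precomputed once and reused for every query.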
Wireless capsule endoscopy (WCE) is a promising technology for gastrointestinal disease detection. Since one patient's WCE video contains more than 50,000 frames, it is necessary to classify the whole frame set of the digestive tract into subsets corresponding to the esophagus, stomach, small intestine, and colon, which helps physicians review and diagnose rapidly and accurately. The digestive...
This paper proposes a voice activity detection (VAD) algorithm based on a novel long-term metric. Assuming that the most significant difference between noisy speech and non-speech is the harmonicity of the noisy speech spectrum, which is inherent to human speech, the long-term auto-correlation statistics (LTACS) measure is designed as a powerful metric for VAD. The LTACS measure is calculated...