Current image captioning methods are usually trained via maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation metrics, such as BLEU, METEOR and ROUGE, are also not well correlated. The newer SPICE and CIDEr metrics are better correlated, but have traditionally been hard to optimize for...
Textual-visual matching aims at measuring similarities between sentence descriptions and images. Most existing methods tackle this problem without effectively utilizing identity-level annotations. In this paper, we propose an identity-aware two-stage framework for the textual-visual matching problem. Our stage-1 CNN-LSTM network learns to embed cross-modal features with a novel Cross-Modal Cross-Entropy...
To counter the performance loss in space-time adaptive processing caused by interfering targets contaminating the clutter training samples, a robust training-sample detection method based on prior knowledge and sparse recovery is proposed. First, the target region in the cell under test is locked. Then the sparse complete basis is obtained by discretizing the whole angle-Doppler plane. After that, hollow...
The Internet of Things (IoT) has gained substantial attention recently and plays a significant role in multiple real-world application deployments. A wide spectrum of such applications depend strongly on data fusion capabilities in the cloud from diverse information sources. In fact, various information sources often provide conflicting and contradictory information about the same object, and thus it is important to...
In scenarios that aim to protect sensitive data in compliance with privacy regulations, conventional score normalization utilizing large proportions of speaker cohort data is not feasible for existing technology, since the entire cohort data would need to be stored on each mobile device. Hence, in this work we motivate score normalization utilizing deep neural networks. Considering unconstrained...
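As an illustration of why conventional normalization depends on cohort data, here is a minimal sketch of zero normalization (Z-norm), one common cohort-based scheme. The abstract does not say which scheme it has in mind, so the choice of Z-norm, the function name `z_norm`, and the example scores are all assumptions.

```python
from statistics import mean, pstdev

def z_norm(raw_score, cohort_scores):
    """Shift and scale a raw score using the statistics of the cohort scores.

    Assumption: cohort_scores are the scores obtained against a cohort of
    other speakers; these are what the device would need to have available.
    """
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)  # population standard deviation
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma

# Hypothetical scores, for illustration only.
cohort = [1.0, 2.0, 3.0, 4.0, 5.0]
normalized = z_norm(6.0, cohort)
```

The normalized score depends on the cohort statistics, which is why this style of normalization needs cohort data (or at least scores derived from it) wherever the comparison is performed.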
Robust detection of the smallest circulating cerebral microemboli is an efficient way of preventing cerebrovascular accidents (CVA). Transcranial Doppler ultrasound is widely considered the most convenient system for the detection of microemboli. Standard detection in commercial devices operates on the whole Doppler energy spectrum, where constant empirical thresholds are implemented...
In this paper we present a novel approach for training a topological deep neural network with visual impression. We show that by combining a denoising auto-encoder model and a contractive auto-encoder with Hessian regularization model, we can achieve a deterministic auto-encoder aiming for robustness to small variations of the input. We exploit the tangent propagation algorithm to show how our algorithm...
This paper presents the time series cluster kernel (TCK) for multivariate time series with missing data. Our approach leverages the missing data handling properties of Gaussian mixture models (GMMs) augmented with empirical prior distributions. Further, we exploit an ensemble learning approach to ensure robustness to parameters by combining the clustering results of many GMMs to form the final kernel...
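The idea of combining many clustering results into one kernel can be sketched as follows. The abstract does not give the combination formula, so this is only one plausible reading: each fitted model contributes the inner product of the cluster-membership posteriors of two series, and the contributions are summed. The GMM fitting and the missing-data handling are omitted; `ensemble_kernel` and the posterior values are hypothetical.

```python
def ensemble_kernel(posteriors_per_model):
    """Sum, over all models, the inner products of per-series posteriors.

    posteriors_per_model[m][i] is the list of cluster-membership
    probabilities that model m assigns to series i (assumed precomputed).
    """
    n = len(posteriors_per_model[0])
    kernel = [[0.0] * n for _ in range(n)]
    for posteriors in posteriors_per_model:
        for i in range(n):
            for j in range(n):
                kernel[i][j] += sum(
                    p * q for p, q in zip(posteriors[i], posteriors[j])
                )
    return kernel

# Two hypothetical models, three series, two clusters each.
model_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
model_b = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
K = ensemble_kernel([model_a, model_b])
```

Series that are repeatedly assigned to the same cluster accumulate a large kernel value, while series that never share a cluster stay at zero. Each term is a Gram matrix, so the sum is symmetric and positive semidefinite.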
Unlike the Support Vector Machine (SVM), Kernel Minimum Classification Error (KMCE) training frees kernels from training samples and jointly optimizes weights and kernel locations. Focusing on this feature of KMCE training, we propose a new method for developing compact (small scale but highly accurate) kernel classifiers by applying KMCE training to support vectors (SVs) that are selected (based on the...
Cross-resolution face recognition tackles the problem of matching face images with different resolutions. Although state-of-the-art convolutional neural network (CNN) based methods have reported promising performances on standard face recognition problems, such models cannot sufficiently describe images with resolution different from those seen during training, and thus cannot solve the above task...
In this paper, we present a novel approach for real-time object identification on a mobile platform. First, our system detects keypoints with a scaled pyramid-based FAST detector, and then descriptors of the object of interest are computed using an Analytical Fourier-Mellin transform. The Fourier-Mellin transform is used in similarity studies due to its invariance properties and discriminative power. In this...
The Cerebellar Model Articulation Controller (CMAC) is a type of neural network particularly suited to real-time control applications due to fast adaptation and the ability to handle many inputs. However, the CMAC is well-known to exhibit weight (adaptive-parameter) drift when used in adaptive control, and overlearning when applied in static learning situations. A weight smoothing algorithm originally...
License Plate Detection (LPD) is the pivotal step for License Plate Recognition. In this work, we explore and customize state-of-the-art detection approaches exclusively for handling LPD in the wild. In-the-wild LPD considers license plates captured in challenging conditions caused by bad weather, lighting, traffic, and other factors. As conventional methods failed to handle these inevitable...
Previous models based on Deep Convolutional Neural Networks (DCNN) for face verification focused on learning face representations. The face features extracted from the models are applied to additional metric learning to improve verification accuracy. The models extract high-dimensional face features to solve a multi-class classification problem. This results in a dependency of the model on specific training...
Efficient crowd counting is an essential task in crowd monitoring, and significant advances have been made in this field recently by counting-by-regression techniques. We propose in this work a learning-to-count strategy with a generic detection algorithm which benefits from a counting regressor in order to identify crowded subregions with inadequate head detection performance, and to improve their...
Network traffic classification is currently a key part of network security systems. In recent years, some network traffic classification algorithms using machine learning based on packet- and flow-level features have been proposed, yet the results are frequently disappointing. On the one hand, obtaining a large, representative, training data set that is fully labeled to train a classifier...
Individualized blood transfusion management would benefit from the ability to prospectively identify patients at risk of complications of blood transfusion, and target them for closer monitoring or intervention. This study presents a simple and efficient multi-task learning method for predicting multiple surgical outcomes based on the weighted least squares support vector machine. To accelerate the...
In this paper, we propose a Bag of Features (BoF) approach for vehicle classification using Speeded Up Robust Features (SURF). First, monocular video taken using a stationary camera is given as the input to a Gaussian Mixture Model (GMM) based foreground detector. Then a grid is used to measure the number of foreground pixels. If the number of foreground pixels inside the grid is greater than a pre-assigned threshold,...
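The grid-based foreground test can be sketched as below. It covers only the counting and comparison step: the foreground mask is assumed to come from an upstream detector, and what happens once the threshold is exceeded is not stated in the truncated abstract, so it is left out. The function names, the mask, and the threshold are hypothetical.

```python
def count_foreground(mask, top, left, height, width):
    """Count foreground pixels (value 1) inside one rectangular grid cell."""
    return sum(
        mask[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )

def cell_exceeds_threshold(mask, top, left, height, width, threshold):
    """Return True when the cell holds more foreground pixels than threshold."""
    return count_foreground(mask, top, left, height, width) > threshold

# Hypothetical 4x4 binary foreground mask (1 = foreground, 0 = background).
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
]
center_count = count_foreground(mask, 1, 1, 2, 2)
center_active = cell_exceeds_threshold(mask, 1, 1, 2, 2, threshold=3)
corner_active = cell_exceeds_threshold(mask, 0, 0, 2, 2, threshold=3)
```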
In video surveillance, face recognition (FR) systems seek to detect individuals of interest appearing over a distributed network of cameras. Still-to-video FR systems match faces captured in videos under challenging conditions against facial models, often designed using one reference still per individual. Although CNNs can achieve among the highest levels of accuracy in many real-world FR applications,...
In this paper, we propose a pedestrian attribute recognition approach and a CNN-based person re-identification framework enhanced by pedestrian attributes. The knowledge of person attributes can help video surveillance tasks like person re-identification as well as person search, semantic video indexing and retrieval to overcome viewpoint changes with their robustness to the inherent visual appearance...