Most hashing methods for information retrieval follow a two-step procedure: first embedding the data into a low-dimensional intermediate space, then quantizing the embeddings into binary codes. In hyperplane-based hashing methods, the distance between points in the intermediate space can replace the Hamming distance to improve retrieval accuracy. In this paper, a novel asymmetric distance for...
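The two-step pipeline in this abstract (real-valued embedding, then binary quantization) and the idea of an asymmetric distance, where the query stays un-quantized while the database is binary, can be sketched as below. Random Gaussian hyperplanes and sign quantization are illustrative assumptions, not the paper's actual construction:

```python
import random

random.seed(0)

def random_hyperplanes(dim, bits):
    # One random hyperplane (normal vector) per output bit.
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]

def project(x, planes):
    # Step 1: real-valued intermediate embedding (signed distances to hyperplanes).
    return [sum(w * xi for w, xi in zip(p, x)) for p in planes]

def quantize(z):
    # Step 2: binarize the intermediate embedding into a hash code.
    return [1 if v >= 0 else 0 for v in z]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def asymmetric_distance(query_real, db_code):
    # Keep the query un-quantized: score its real projections against the
    # database's binary code (mapped to +1/-1), a finer signal than Hamming.
    return -sum(v * (1 if b else -1) for v, b in zip(query_real, db_code))
```

A query's asymmetric distance to its own code is minimal by construction, since every term agrees in sign.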
In this paper, we propose a novel framework for dynamical analysis of human actions from 3D motion capture data using topological data analysis. We model human actions using the topological features of the attractor of the dynamical system. We reconstruct the phase-space of time series corresponding to actions using time-delay embedding, and compute the persistent homology of the phase-space reconstruction...
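The phase-space reconstruction step this abstract mentions, time-delay embedding, is simple to sketch; the persistent-homology computation that follows it would need a TDA library and is omitted here:

```python
def delay_embed(series, dim, tau):
    # Takens-style time-delay embedding: each reconstructed phase-space
    # point is (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau}).
    n = len(series) - (dim - 1) * tau
    return [tuple(series[t + k * tau] for k in range(dim)) for t in range(n)]
```

For a joint-angle time series from motion capture, the resulting point cloud is what the persistent homology would be computed on.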
In this paper, we present a novel unsupervised method for detecting outliers in image databases, when the images are misaligned by action of transformations forming a group. The main idea is that when the aligned data lie in a low dimensional subspace, the misaligned data, assuming that the group size is small, will lie in a low dimensional group-invariant subspace. We then explicitly exploit this...
In this study, we use brain activation data to investigate the perceptual plausibility of visual and auditory saliency models for video processing. These models have already been successfully employed in a number of applications. In addition, we experiment with parameters, modifications and suitable fusion schemes. As part of this work, fMRI data from complex video...
This paper proposes a framework for tracking multiple fluorescent objects in 2D + time video-microscopy. We present a novel batch-processing track-before-detect multiple object tracking approach based on a spatio-temporal marked point process model of ellipses. Our approach takes into account events such as births, deaths, splits and merges of objects which are motivated by the biological and physical...
Deep Convolutional Neural Networks (CNNs) have recently been shown to outperform previous state-of-the-art approaches for image classification. Their success must in part be attributed to the availability of large labelled training sets, such as the one provided by the ImageNet benchmarking initiative. When training data is scarce, however, CNNs have been shown to fail to learn descriptive features. Recent research...
For image retrieval and caption generation, this paper considers a multimodal representation that associates an image with its text description (caption) by defining a neural language model as the conditional probability of the next word given both the n past words in a caption and the image that the caption describes. To address the data sparsity problem, the use of Kneser-Ney smoothing and skip-gram...
Convolutional neural networks have shown their advantage in human attribute analysis (e.g. age, gender and ethnicity). However, they suffer from issues (e.g. robustness and responsiveness) when deployed in an intelligent video system. Motivated by a full consideration of performance and usability, we propose a compact CNN model and apply it in our video system. With the proposed web image mining and labelling...
In this paper, we propose a supervised learning based model for ocular biometrics. Using Speeded-Up Robust Features (SURF) for detecting local features of the eye region, we create a local feature descriptor vector of each image. We cluster these feature vectors, representing an image as a normalized histogram of membership to various clusters, thereby creating a bag-of-visual-words model. We conduct...
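The bag-of-visual-words construction described above, assigning each local descriptor to its nearest cluster and normalizing the counts into a histogram, can be sketched as follows. The cluster centroids are assumed to come from a prior k-means step; SURF extraction itself needs an image library and is omitted:

```python
def nearest(vec, centroids):
    # Index of the closest visual word (squared Euclidean distance).
    return min(range(len(centroids)),
               key=lambda k: sum((v - c) ** 2 for v, c in zip(vec, centroids[k])))

def bovw_histogram(descriptors, centroids):
    # Assign each local descriptor to its closest visual word,
    # then L1-normalize the counts into a membership histogram.
    hist = [0.0] * len(centroids)
    for d in descriptors:
        hist[nearest(d, centroids)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]
```

Each image is then represented by one such fixed-length histogram, regardless of how many local features it produced.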
Models based on deep convolutional networks and recurrent neural networks have dominated recent image caption generation tasks. Performance and complexity remain perennial concerns. Inspired by recent work, we combine the advantages of the simple RNN and the LSTM and present a novel parallel-fusion RNN-LSTM architecture, which obtains better results than the dominant architectures and improves efficiency...
Parametric motion models are commonly used in image sequence analysis for different tasks. A robust estimation framework is usually required to reliably compute the motion model. The choice of the right model is also important. However, dealing simultaneously with both issues remains an open question. We propose a robust motion model selection method with two variants, which relies on the Fisher test...
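A minimal sketch of the Fisher (F-) test that such a model selection method can rely on: compare a simpler motion model against a richer one through their residual sums of squares, penalizing the extra parameters. The exact statistic and thresholds used in the paper are not given in this abstract:

```python
def f_statistic(rss_small, rss_full, p_small, p_full, n):
    # Nested-model F-test: does the richer motion model (p_full parameters)
    # reduce the residual sum of squares enough, over n observations, to
    # justify its extra parameters relative to the small model?
    num = (rss_small - rss_full) / (p_full - p_small)
    den = rss_full / (n - p_full)
    return num / den
```

A large F value favors the richer model; comparing it against an F-distribution quantile gives the selection rule.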
Zero-shot learning (ZSL) provides a solution for recognising unseen classes without class-labelled data for model learning. Most ZSL methods aim to learn a mapping from a visual feature space to a semantic embedding space, e.g. an attribute or word vector space. The word vector space is particularly attractive: compared to attribute spaces, it offers a vast number of auxiliary classes with freely available embeddings...
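The inference step common to mapping-based ZSL, projecting a visual feature into the semantic space and picking the closest unseen class, can be sketched as below. The linear map `W` is assumed to be pre-learned, and cosine similarity is an illustrative choice; this is a generic sketch, not the paper's specific method:

```python
import math

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)) or 1.0
    return num / den

def zsl_predict(visual_feat, W, class_word_vecs):
    # Map the visual feature into the word-vector space with the
    # (assumed pre-learned) linear map W, then return the unseen class
    # whose word vector is most similar to the projection.
    sem = [sum(w * v for w, v in zip(row, visual_feat)) for row in W]
    return max(class_word_vecs, key=lambda c: cosine(sem, class_word_vecs[c]))
```

No labelled images of the predicted class are needed, only its word vector.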
We present a refinement framework for background subtraction based on color and depth data. The foreground objects are segmented from color and depth data independently, so any existing background subtraction (BGS) method can be applied. The two detected foregrounds can be highly inaccurate in some situations, such as shadowing and color camouflage. We focus our work on refining the...
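One naive way to fuse independently detected color and depth foreground masks, exploiting the fact that depth is immune to shadows and color camouflage but has invalid (no-reading) regions, is sketched below. This is an illustrative baseline rule, not the refinement method the paper proposes:

```python
def fuse_masks(color_mask, depth_mask, depth_valid):
    # Per-pixel fusion of two binary foreground masks (flattened lists):
    # where the depth reading is valid it overrides color (shadows and
    # color camouflage do not affect depth); where depth is missing,
    # fall back to the color-based decision.
    return [d if ok else c
            for c, d, ok in zip(color_mask, depth_mask, depth_valid)]
```

A shadow pixel (color says foreground, valid depth says background) is correctly suppressed under this rule.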
We apply social ℓ-norms for the first time to the problem of hyperspectral unmixing while modeling spectral variability. These norms are built from inter-group penalties combined in a global intra-group penalization that can enforce the selection of entire endmember bundles; this results in the selection of a few representative materials even in the presence of large endmember bundles capturing...
Detecting people's heads in crowded scenes is challenging due to the large variability in clothing and appearance, the small scale of people, and strong partial occlusions. Traditional bottom-up proposal methods and existing region proposal network approaches suffer from either poor recall or low precision. In this paper, we propose to improve both the recall and precision of head detection with region proposal...
This paper investigates the effect that the QP offset value has on the coding performance of HEVC. We relate the QP offset to the type of texture content present in the sequence. These relations are then used to develop a low-complexity adaptive QP offset selection method. This enables in-loop configuration of the QP offset parameter in a way that is content dependent and utilizes available encoding statistics...
Most existing person re-identification (ReID) methods assume the availability of extensively labelled cross-view person pairs and a closed-set scenario (i.e. all the probe people exist in the gallery set). These two assumptions significantly limit their usefulness and scalability in real-world applications, particularly with large scale camera networks. To overcome the limitations, we introduce a...
Plankton image classification plays an important role in ocean ecosystem research. Recently, a large-scale database for plankton classification, with over 3 million images annotated across over 100 classes, was released. However, the database suffers from an imbalanced class distribution in which over 90% of the images belong to only 5 classes. Due to this class-imbalance problem, existing classification...
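A common first response to class imbalance of this kind is to reweight classes inversely to their frequency, so the 5 dominant classes do not swamp the rare ones during training. Whether the paper uses this particular remedy is not stated in the abstract; the sketch below is a standard baseline:

```python
from collections import Counter

def inverse_frequency_weights(labels):
    # Balanced class weights: weight[c] = N / (n_classes * count[c]),
    # so rare classes contribute more per example to the loss.
    counts = Counter(labels)
    total = len(labels)
    return {c: total / (len(counts) * n) for c, n in counts.items()}
```

These weights can be plugged into a weighted loss or used to drive oversampling of the rare classes.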
In complex visual recognition systems, feature fusion has become crucial to discriminate between a large number of classes. In particular, fusing high-level context information with image appearance models can be effective in object/scene recognition. To this end, we develop an auto-context modeling approach under the RKHS (Reproducing Kernel Hilbert Space) setting, wherein a series of supervised...
CNNs have shown excellent performance on object recognition when trained on huge amounts of real images. To reduce the workload of collecting real images, we propose training on synthetic data rendered from 3D models alone, using a concatenated self-restraint learning structure led by a joint triplet-and-softmax loss function for object recognition. A locally connected autoencoder trained from rendered images...