Convolutional neural networks (CNNs) have drawn increasing interest in visual tracking. Among CNN-based trackers, the fully-convolutional Siamese network (SiamFC) is popular owing to its competitive precision and efficiency. Generally, SiamFC captures robust semantics from high-level features in the last layer but ignores detailed spatial features in earlier layers, thus tending to...
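For context, SiamFC scores a search region by cross-correlating the embedding of the exemplar (the target template) with the embedding of a larger search image. The peak of the resulting response map marks the most likely target location. The following is a minimal single-channel sketch in pure Python. It assumes the feature maps have already been computed, omits SiamFC's bias term and multi-channel features, and uses hypothetical helper names (`xcorr2d`, `peak`) and toy maps that are not taken from the paper.

```python
def xcorr2d(template, search):
    """Valid 2D cross-correlation of a single-channel template over a search map."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Sum of element-wise products between the template and the window at (i, j)
            score = 0.0
            for a in range(th):
                for b in range(tw):
                    score += template[a][b] * search[i + a][j + b]
            row.append(score)
        response.append(row)
    return response


def peak(response):
    """Return the (row, col) position of the highest score in the response map."""
    _, pos = max((v, (i, j)) for i, r in enumerate(response) for j, v in enumerate(r))
    return pos


# Toy example: the diagonal pattern appears in the search map at offset (2, 1).
template = [[1, 0],
            [0, 1]]
search = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 0, 1, 0]]
response = xcorr2d(template, search)
print(peak(response))  # (2, 1)
```

In a real tracker, the same sliding dot product is applied to deep multi-channel features and computed as a single convolution, which is what makes the fully-convolutional design efficient.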
Images are often shared to express emotions or intentions, such as love or celebrating Christmas. A more expressive approach, which has recently drawn much attention on social networks, pairs an image with a relevant song to amplify its message. Automatic song selection is therefore desirable. In this paper, we propose to retrieve semantically relevant songs just by...
In this paper, we propose a cross-modal deep variational hashing (CMDVH) method for cross-modality multimedia retrieval. Unlike existing cross-modal hashing methods, which learn a single pair of projections to map each example to a binary vector, we design a pair of deep neural networks to learn non-linear transformations from image-text input pairs, so that unified binary codes can be obtained. We...
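After images and texts are mapped to unified binary codes, cross-modal retrieval typically reduces to ranking database items by Hamming distance to the query code. The minimal sketch below illustrates that retrieval step only. It assumes the networks' real-valued outputs are already available and are binarized with the sign function. The helper names and toy vectors are hypothetical and do not reproduce CMDVH's training procedure.

```python
def binarize(features):
    """Map a real-valued vector to a binary code via the sign function (>= 0 -> 1)."""
    return tuple(1 if x >= 0 else 0 for x in features)


def hamming(a, b):
    """Number of bit positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_by_hamming(query_code, database):
    """Return database keys sorted by Hamming distance to the query.

    sorted() is stable, so ties keep the database's insertion order.
    """
    return sorted(database, key=lambda key: hamming(query_code, database[key]))


# Hypothetical network outputs: a text query searched against an image database.
text_query = binarize([0.7, -0.2, 0.1, -0.9])
images = {
    "img_a": binarize([0.5, -0.1, 0.3, -0.4]),
    "img_b": binarize([-0.6, 0.2, -0.3, 0.8]),
    "img_c": binarize([0.4, 0.3, 0.2, -0.5]),
}
print(rank_by_hamming(text_query, images))  # ['img_a', 'img_c', 'img_b']
```

Hamming distance on short codes is cheap to compute, typically with XOR and popcount in optimized systems, which is the main efficiency argument for hashing-based retrieval.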
Zero-shot learning (ZSL) aims to transfer knowledge from seen classes to unseen classes, based on the assumption that both share a common semantic space; among such semantic spaces, attributes are especially popular. However, few works study whether human-designed semantic attributes are discriminative enough to recognize different classes. Moreover, attributes are often...
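As background on how a shared attribute space enables zero-shot recognition, a common baseline works as follows. This is not necessarily the approach of this paper. Each class, including unseen ones, is described by a human-designed attribute signature. A model predicts attribute scores for a test image, and the image is assigned to the class whose signature is most similar. The sketch uses cosine similarity. The attribute names, class signatures and predicted scores are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either vector is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def zero_shot_predict(predicted_attributes, class_signatures):
    """Assign the class whose attribute signature best matches the predicted attributes."""
    return max(class_signatures,
               key=lambda c: cosine(predicted_attributes, class_signatures[c]))


# Hypothetical attributes: [striped, has_hooves, aquatic]; both classes unseen in training.
unseen = {"zebra": [1, 1, 0], "dolphin": [0, 0, 1]}
print(zero_shot_predict([0.9, 0.7, 0.1], unseen))  # zebra
```

This baseline depends entirely on how well the attributes separate classes. That dependence is why the discriminativeness of human-designed attributes, raised in the abstract, matters.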
Given a textual description of an image, phrase grounding localizes the objects in the image referred to by query phrases in the description. State-of-the-art methods address the problem by ranking a set of proposals by their relevance to each query. Such methods are limited by the performance of independent proposal generation systems and ignore useful cues from the context in the description. In this paper, we...
Recognising semantic pedestrian attributes in surveillance images is a challenging task for computer vision, particularly when imaging quality is poor, with complex background clutter and uncontrolled viewing conditions, and when the amount of labelled training data is small. In this work, we formulate a Joint Recurrent Learning (JRL) model for exploring attribute context and correlation in order to...
Constructing a knowledge graph of dangerous goods (KGDG) is of great significance for inferring related information about dangerous goods (DG), developing corresponding policies for their storage and transport, preventing disasters caused by DG, and providing emergency plans when disasters happen. Since distributed representation of natural language is an effective method for knowledge...
As technology evolves, the Internet of Things (IoT) is gaining importance as a foundation for better connectivity between people and things. To achieve this, strategies and processes are needed that enhance and ensure interoperability between the heterogeneous devices of a typical IoT network. Two key aspects of these networks are autonomous error recovery...
Automatic essay evaluation (AEE) systems are designed to assist a teacher in the task of classroom assessment in order to alleviate the demands of manual subject evaluation. However, although numerous AEE systems are available, most of these systems do not use elaborate domain knowledge for evaluation, which limits their ability to give informative feedback to students and also their ability to constructively...
Fully convolutional networks (FCNs) have been successfully applied to the semantic segmentation of scenes represented as RGB images. Images augmented with a depth channel provide additional geometric information about the scene. The question is how best to exploit this additional information to improve segmentation performance. In this paper, we present a neural network with...
Automatically describing an image in natural language is an emerging challenge in both computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A), a novel architecture that integrates attributes into the successful Convolutional Neural Networks (CNNs) plus Recurrent Neural Networks (RNNs) image captioning framework,...
In this work, we address the multimodal learning problem with Gaussian process latent variable models (GPLVMs) and their application to cross-modal retrieval. Existing GPLVM-based studies generally impose individual priors over the model parameters and ignore the intrinsic relations among these parameters. Considering the strong complementarity between modalities, we propose a novel joint prior over the...
A domain-specific modeling environment is centered around a domain-specific meta-model, which defines the syntax (modeling elements, e.g., classes) for the domain models. However, for system designers to be able to construct meaningful models, the semantics of the domain-specific meta-model must be described as well. These semantics are often provided in the form of informal natural language...
In this paper we describe a cyberspace of scientific papers in which the most cited and significant documents are drawn at a larger size, and the distance between documents is proportional to their semantic similarity. A new measure of document semantic similarity is proposed, determined by the maximum correlation between the explicit and implicit connectivity of the documents. A new science...
News videos store a huge amount of information and are a source of historical archives. The amount of news data is growing rapidly and unpredictably, so indexing news videos is a tedious job. Manual indexing, though effective, is slow and very expensive for massive volumes of data. Content Based Indexing and Retrieval (CBIR) is one solution to this problem. Textual modality based...
Video summarization (VS) is one of the key video signal processing techniques for unmanned aerial vehicles (UAVs). Essentially, VS aims to eliminate redundant, highly similar frames in aerial videos (AVs), which helps with quick browsing, retrieval and efficient storage without losing important information. For VS, how to measure the similarity between video frames is not a trivial...
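To make "eliminating redundant frames with high similarity" concrete, here is a simple generic baseline. It is not the similarity measure the paper proposes, which the truncated abstract does not show. The baseline keeps a frame only if its similarity to the most recently kept frame falls below a threshold. The sketch assumes each frame is already summarized as a small feature vector, such as a colour histogram. The descriptors, the helper names and the threshold of 0.95 are hypothetical choices.

```python
import math


def cosine(u, v):
    """Cosine similarity between two frame descriptors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_keyframes(frames, threshold=0.95):
    """Return indices of frames kept after dropping those too similar to the last keyframe."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if cosine(frames[kept[-1]], frames[i]) < threshold:
            kept.append(i)
    return kept


# Hypothetical 3-bin descriptors: frames 1 and 3 nearly duplicate frames 0 and 2.
frames = [[1, 0, 0], [0.99, 0.01, 0], [0, 1, 0], [0, 0.98, 0.02], [0, 0, 1]]
print(select_keyframes(frames))  # [0, 2, 4]
```

Because it compares only against the last keyframe, this greedy rule can miss a scene that returns later in the video. It also depends heavily on the chosen descriptor, which illustrates why the abstract calls frame similarity measurement non-trivial.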
The complex event processing (CEP) paradigm has been introduced to detect and react to incoming events in many situations that require near real-time responses, although early detection of and reaction to emerging scenarios remain desirable and are an active research topic. One approach is to enrich CEP systems in a timely manner with information from diverse and heterogeneous knowledge sources. CEP enrichment is required to...
To capture trends in topics of interest within a specific field, people often use topic discovery methods. Traditional topic discovery algorithms generally fall into two types: text clustering algorithms and text topic models. The former pay little attention to semantic information, and the latter often ignore the relatedness of topics. These limitations affect topic discovery and topic...
This paper studies cross-lingual semantic similarity (CLSS) between five European languages (i.e. English, French, German, Spanish and Italian) via unsupervised word embeddings from a cross-lingual lexicon. The vocabulary in each language is projected onto a separate high-dimensional vector space, and these vector spaces are then compared using several different distance measures (i.e., correlation,...
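The abstract is truncated, so it does not show exactly how the distance measures are applied to whole vector spaces. The sketch below shows only the first measure it names, correlation distance, applied to a pair of vectors. The definition used is 1 minus the Pearson correlation, which matches SciPy's `scipy.spatial.distance.correlation`. The values range from 0 (perfectly correlated) to 2 (perfectly anti-correlated). The function name and example vectors are hypothetical.

```python
import math


def correlation_distance(u, v):
    """1 - Pearson correlation between u and v.

    Raises ValueError for constant vectors, where the correlation is undefined.
    """
    mu = sum(u) / len(u)
    mv = sum(v) / len(v)
    du = [a - mu for a in u]  # centre each vector on its mean
    dv = [b - mv for b in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du)) * math.sqrt(sum(b * b for b in dv))
    if den == 0:
        raise ValueError("correlation is undefined for constant vectors")
    return 1.0 - num / den


# Sanity checks: a scaled copy is perfectly correlated, a negated copy anti-correlated.
print(round(correlation_distance([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]), 6))     # 0.0
print(round(correlation_distance([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0]), 6))  # 2.0

# Hypothetical embeddings of a word and its translation in two separate spaces.
en_cat = [0.2, 0.8, 0.1, 0.5]
fr_chat = [0.25, 0.75, 0.05, 0.55]
print(correlation_distance(en_cat, fr_chat))
```

Correlation distance is invariant to shifting and positively scaling either vector, which is one reason it can be preferred over raw Euclidean distance when comparing vectors from independently trained spaces.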
Training current deep architectures end-to-end from scratch for new computer vision problems would require ImageNet-scale datasets, which are not always available. In this paper we present a method that is able to take advantage of freely available multi-modal content to train computer vision algorithms without human supervision. We put forward the idea of performing self-supervised learning of...