The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
We propose an image-text alignment framework to match images with text, and take blog article summarization as the main application. Objects in an image are first detected, from them deep features are extracted and transformed into a space commonly shared with the text. On the other hand, sentences of a blog article are represented as vectors, and are also embedded into the common space. With these...
Convolutional neural network (CNN) has drawn increasing interest in visual tracking, among which fully-convolutional Siamese network based method (SiamFC) is quite popular due to its competitive performance in both precision and efficiency. Generally, SiamFC captures robust semantics from high-level features in the last layer but ignores detailed spatial features in earlier layers, thus tending to...
Following the trend of big data, the business value of data is becoming a hot research field in recent years. The novel concept of Data Jacket introduced by Ohsawa et al. solved the difficult problem of data transactions due to the particular characteristic of data, i.e. the safeguarding privacy. In order to make sure the mechanism of the market of data, there are some researchers proposed a gamified...
[Background]: Developing conceptual models is an integral part of the requirements engineering (RE) process. Goal models are requirements engineering conceptual models that allow diagrammatic representation of stakeholder intentions and how they affect each other. A specific goal modeling language construct, the contribution of goal satisfaction of one goal to another, plays a central role in supporting...
[Background]: Conceptual modeling languages have been widely studied in requirements engineering as tools for capturing, representing and reasoning about domain problems. One of these languages, goal models, has been proposed for representing the structure of stakeholder intentions. Like most other conceptual modeling languages, goal models are visualized using box-and-line diagrammatic notations...
Image is usually taken for expressing some kinds of emotions or purposes, such as love, celebrating Christmas. There is another better way that combines the image and relevant song to amplify the expression, which has drawn much attention in the social network recently. Hence, the automatic selection of songs should be expected. In this paper, we propose to retrieve semantic relevant songs just by...
Many of the existing methods for learning joint embedding of images and text use only supervised information from paired images and its textual attributes. Taking advantage of the recent success of unsupervised learning in deep neural networks, we propose an end-to-end learning framework that is able to extract more robust multi-modal representations across domains. The proposed method combines representation...
Referring expression is a kind of language expression that used for referring to particular objects. To make the expression without ambiguation, people often use attributes to describe the particular object. In this paper, we explore the role of attributes by incorporating them into both referring expression generation and comprehension. We first train an attribute learning model from visual objects...
We propose a novel measure of visual similarity for image retrieval that incorporates both structural and aesthetic (style) constraints. Our algorithm accepts a query as sketched shape, and a set of one or more contextual images specifying the desired visual aesthetic. A triplet network is used to learn a feature embedding capable of measuring style similarity independent of structure, delivering...
We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNN). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from different depth layers in order to take advantage of both the spatial...
We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself – the correspondence between the visual and the audio streams, and we introduce a novel “Audio-Visual Correspondence” learning task that makes use of this. Training visual and audio networks from...
We present a framework to analyze various aspects of models for video question answering (VideoQA) using customizable synthetic datasets, which are constructed automatically from gameplay videos. Our work is motivated by the fact that existing models are often tested only on datasets that require excessively high-level reasoning or mostly contain instances accessible through single frame inferences...
We propose a novel approach for unsupervised zero-shot learning (ZSL) of classes based on their names. Most existing unsupervised ZSL methods aim to learn a model for directly comparing image features and class names. However, this proves to be a difficult task due to dominance of non-visual semantics in underlying vector-space embeddings of class names. To address this issue, we discriminatively...
Detecting zero-day sophisticated malware is like searching for a needle in the haystack, not knowing what the needle looks like. This paper describes Android Malicious Flow Visualization Toolbox that empowers a human analyst to detect such malware. Detecting sophisticated malware requires systematic exploration of the code to identify potentially malignant code, conceiving plausible malware hypotheses,...
Blind image quality assessment (BIQA) methods aim to estimate the quality of a given test image without referring to the corresponding reference (original) image. Most BIQA methods use visual sensitivity models, which take into consideration intrinsic image characteristics (e.g. contrast, luminance, and texture) to identify degradations and estimate quality. For example, texture-based BIQA methods...
Diagrams can be an effective means of communicating complex ideas and can aid ontology engineering. Indeed, domain experts often do not have the expertise required to understand or create the complex logical statements of an ontology in description logic (DL). This paper presents a visualisation method, concept diagrams, geared toward expressing assertions and class expression axioms alongside providing...
In Enterprise Modeling, we use several languages for designing, analyzing, and communicating the different domains of an enterprise. Two important criteria for choosing a domainspecific language are its appropriateness to the requirements of the enterprise, as well as the accuracy of the language in describing the domain at hand. However, in some business domains, core concepts --such as Capability,...
We report on the results of the first visual search and rating study (N60) evaluating human gaze when assessing the realism of image composites. The effects of object identity knowledge and mismatched feature type on observers' gaze and subjective realism scores are studied. Gaze metrics used include: fixation count, fixation duration, time and duration of first fixation on target object, as well...
The Mapillary Vistas Dataset is a novel, large-scale street-level image dataset containing 25000 high-resolution images annotated into 66 object categories with additional, instance-specific labels for 37 classes. Annotation is performed in a dense and fine-grained style by using polygons for delineating individual objects. Our dataset is 5× larger than the total amount of fine annotations for Cityscapes...
Deep convolutional neural networks (CNNs) have been successfully applied to a wide variety of problems in computer vision, including salient object detection. To detect and segment salient objects accurately, it is necessary to extract and combine high-level semantic features with low-levelfine details simultaneously. This happens to be a challenge for CNNs as repeated subsampling operations such...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.