The “semantic gap” problem has greatly limited the development of image classification. The key to this problem is to obtain semantic information from the images. This paper proposes a semantic image feature extraction method that integrates eye-movement information. First, the low-level visual features of the images are extracted. Second, weighted feature vectors of the images are constructed...
Hand-engineered local image features have proven to be effective representations for a variety of high-level visual recognition tasks. But as visual recognition tasks such as scene classification and object detection become more challenging, the semantic gap between low-level features and the concept descriptors of scene images increases. In this paper, we present novel semantic multinomial...
Machine-learning algorithms have shown outstanding image recognition performance for computer vision applications. While these algorithms are modeled to mimic brain-like cognitive abilities, they lack the remarkable energy-efficient processing capability of the brain. Recent studies in neuroscience reveal that the brain resolves the competition among multiple visual stimuli presented simultaneously...
Zero-shot Learning (ZSL) can leverage attributes to recognise unseen instances. However, the training data is limited and cannot adequately discriminate fine-grained classes with similar attributes. In this paper, we propose a complementary procedure that inversely makes use of attributes to infer discriminative visual features for unseen classes. In this way, ZSL is fully converted into conventional...
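For reference, the conventional attribute-based pipeline this abstract builds on can be sketched as a minimal nearest-signature ZSL baseline. This is not the paper's proposed method, and the attribute vectors below are made-up toy values: each sample's predicted attribute vector is matched to the closest unseen-class attribute signature.

```python
import numpy as np

def zsl_nearest_attribute(x_attr_pred, class_signatures):
    """Assign each sample to the unseen class whose attribute signature
    is closest (cosine similarity) to the predicted attribute vector --
    a standard attribute-based ZSL baseline."""
    # Normalise rows so dot products become cosine similarities.
    def norm(m):
        return m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = norm(x_attr_pred) @ norm(class_signatures).T
    return sims.argmax(axis=1)

# Toy example: 2 unseen classes described by 3 attributes.
signatures = np.array([[1.0, 0.0, 1.0],   # class 0
                       [0.0, 1.0, 1.0]])  # class 1
predicted = np.array([[0.9, 0.1, 0.8],    # resembles class 0
                      [0.2, 0.7, 0.9]])   # resembles class 1
print(zsl_nearest_attribute(predicted, signatures))  # → [0 1]
```

The abstract's contribution runs this idea in reverse (attributes inferring discriminative visual features), for which no details are reproduced here.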
Scattered multimedia data such as text, images, audio, and video posted regularly on social media may contain information useful to organizations. However, this information must be derived through some form of analysis, known as Multimodal Sentiment Analysis (MSA), and proper analytic tools for such analysis are lacking. This paper presents a thorough overview of more...
In this paper, a novel semantic segmentation model based on aggregated features and contextual information is proposed. Given an RGB-D image, we train a support vector machine (SVM) to predict initial labels using aggregated features, and then optimize the predicted results using contextual information. For aggregated features, the local features on regions are extracted to capture visual appearance...
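A minimal sketch of the first stage described above, under stated assumptions: the per-region features and labels below are synthetic placeholders, and a hand-rolled linear SVM (Pegasos-style sub-gradient descent on the hinge loss) stands in for whatever SVM implementation and kernel the paper actually uses.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Tiny linear SVM trained by sub-gradient descent on the hinge
    loss (Pegasos-style). Labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        # Same shuffled visiting order each epoch (fixed seed).
        for i in np.random.default_rng(0).permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (X[i] @ w)
            w *= 1.0 - eta * lam          # shrink (regularisation step)
            if margin < 1:                # hinge is active: take a step
                w += eta * y[i] * X[i]
    return w

# Toy "regions": 2-D aggregated features with linearly separable labels.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w = train_linear_svm(X, y)
initial_labels = np.sign(X @ w)           # initial per-region predictions
print(round(float((initial_labels == y).mean()), 2))
```

A later contextual step, as the abstract describes, would then re-optimize regions whose initial labels disagree with their neighbours.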
Deep learning-based models have recently been widely successful at outperforming traditional approaches in several computer vision applications such as image classification, object recognition and action recognition. However, those models are not naturally designed to learn structural information that can be important to tasks such as human pose estimation and structured semantic interpretation of...
Attributes are defined as mid-level image characteristics shared among different categories. These characteristics are well suited to handling classification problems, especially when training data are scarce. In this paper, we design discriminative real-valued attributes by learning nonlinear inductive maps. Our method is based on solving a constrained optimization problem that mixes three criteria:...
We present a novel algorithm for the semantic labeling of photographs shared via social media. Such imagery is diverse, exhibiting high intra-class variation that demands large training-data volumes to learn representative classifiers. Unfortunately, image annotation at scale is noisy, resulting in errors in the training corpus that confound classifier accuracy. We show how evolutionary algorithms may...
Image modality classification categorizes images according to their type. It is an important module in the Open-iSM multimodal (text+image) search engine that retrieves figures from biomedical articles. It is a hierarchical classification in which, at the top level, the input figures are classified into two general categories: regular images (X-ray, CT, MRI, photographs, etc.) vs. illustration images (cartoon...
Multimodal recognition has recently become a more attractive and common method in multimedia information retrieval. In many cases it yields better recognition results than unimodal methods alone. Most current multimodal recognition methods still depend on unimodal recognition results. Therefore, to obtain better recognition performance, it is important to choose suitable features and classification...
Attributes are semantic visual properties shared by objects. They have been shown to improve object recognition and to enhance content-based image search. While attributes are expected to cover multiple categories, e.g. a dalmatian and a whale can both have "smooth skin", we find that the appearance of a single attribute varies quite a bit across categories. Thus, an attribute model learned...
Image search techniques were generally based not on visual features but on the textual annotation of images. Images were first annotated with text and then searched using a text-based approach from traditional database management systems, which is time-consuming and difficult to manage. To overcome this problem, CBIR (Content-Based Image Retrieval) was introduced, which is becoming the hottest research...
Identifying different types of damage is essential in times of natural disasters, when first responders flood the internet with images and texts, often annotated, and rescue teams are overwhelmed trying to prioritize often-scarce resources. While most efforts in such humanitarian situations rely heavily on human labor and input, we propose in this paper a novel hybrid approach to help...
Movie summarization aims at condensing a full-length movie to a significantly shortened version that still preserves the movie's major semantic content. In this paper, we propose a learning-based movie summarization framework via role-community social network analysis and feature fusion. In our framework, scene-based movie summarization is formulated as a 0–1 knapsack problem, where the scene attention...
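The 0–1 knapsack formulation mentioned above can be sketched with the classic dynamic program: maximize the total attention score of the selected scenes subject to a summary-length budget. The durations and scores below are illustrative numbers, not the paper's data.

```python
def knapsack_scenes(durations, scores, budget):
    """Classic 0-1 knapsack DP; returns (best_score, chosen_indices)."""
    n = len(durations)
    # dp[i][j] = best score using the first i scenes within duration j.
    dp = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(budget + 1):
            dp[i][j] = dp[i - 1][j]                     # skip scene i-1
            if durations[i - 1] <= j:                   # or keep it
                cand = dp[i - 1][j - durations[i - 1]] + scores[i - 1]
                if cand > dp[i][j]:
                    dp[i][j] = cand
    # Backtrack to recover which scenes were kept.
    chosen, j = [], budget
    for i in range(n, 0, -1):
        if dp[i][j] != dp[i - 1][j]:
            chosen.append(i - 1)
            j -= durations[i - 1]
    return dp[n][budget], sorted(chosen)

durations = [3, 5, 2, 4]   # scene lengths (e.g. minutes)
scores    = [4, 6, 3, 5]   # scene attention scores
print(knapsack_scenes(durations, scores, 7))  # → (9, [1, 2])
```

The paper's framework would supply the attention scores from its role-community social network analysis and feature fusion; here they are fixed by hand.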
In this paper, we propose a component-based object detection method extended with the fuzzy inference technique. The proposed method detects constituent components of a complex object instead of a whole object in images. For component detection, multiple multi-class support vector machines (SVM) are used in parallel. Each SVM classifies the candidate component using a different low-level image feature...
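A hypothetical sketch of the fuzzy combination step, assuming a Mamdani-style min-AND over component confidences; the paper's actual rule base, membership functions, and component detectors are not given here, so the rule and scores below are invented for illustration.

```python
def fuzzy_object_confidence(component_scores, required):
    """min-AND over the required components' membership degrees:
    each component detector yields a confidence in [0, 1], and the
    object-level confidence is the weakest required component."""
    return min(component_scores[name] for name in required)

scores = {"head": 0.9, "torso": 0.7, "wheels": 0.2}
# Hypothetical rule: pedestrian IF head AND torso
print(fuzzy_object_confidence(scores, ["head", "torso"]))  # → 0.7
```

Detecting components rather than whole objects, as the abstract describes, lets partial occlusion lower one membership degree without zeroing the overall confidence.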
We propose the problem of automated photo album creation from an unordered image collection. The problem is difficult, as it involves a number of complex perceptual tasks that facilitate the selection and ordering of photos to create a compelling visual narrative. To help solve this problem, we collect (and will make available) a new benchmark dataset based on Flickr images, the Flickr Album Dataset, which provides...
Food-related photos have become increasingly popular, due to social networks, food recommendation, and dietary assessment systems. Reliable annotation is essential in those systems, but user-contributed tags are often uninformative and inconsistent, and unconstrained automatic food recognition still has relatively low accuracy. Most works focus on exploiting only the visual content while ignoring...
Multi-concept image query is a multi-label classification challenge. Traditional query methods focus on single-concept queries and use only visual image data without considering the associated textual tag data. In this work, we address the problem of bimodal multi-concept image query, namely retrieving bimodal images with multiple target concepts from the image set. We propose a novel Bimodal Learning...
A comprehensive survey of the scene classification literature based on the pLSA formulation is presented. With the growth of robotics, interest in adopting visual technology has been increasing over the past years. Vision creates the premises for brain processing: our brain receives, and unconsciously processes, a stupendous amount of visual...