The performance of pattern classifiers depends on the separability of the classes in the feature space (a property related to the quality of the descriptors) and on the choice of informative training samples for user labeling (a procedure that usually requires active learning). This work is devoted to improving the quality of the descriptors when samples are superpixels from remote sensing images. We...
It is well known that image representations learned through ad-hoc dictionaries improve the overall results in object categorization problems. Following the widely accepted coding-pooling visual recognition pipeline, these representations are often tightly coupled with a coding stage. In this paper we show how to exploit ad-hoc representations both within the coding and the pooling phases. We learn...
In this paper we propose an improvement of a human action recognition method that uses a string-based representation and a string edit distance to compare the observed action with reference actions in the training set. In particular, the proposed improvement is based on a specific formulation of the string edit distance that is better suited to account for the problems related to noise and to...
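The abstract does not spell out the proposed formulation. For reference, here is a minimal sketch of the kind of baseline it modifies, assuming the common unweighted Levenshtein distance over action strings and nearest-reference classification. The function names `edit_distance` and `nearest_reference` and the example labels are hypothetical, not the authors' code:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-symbol insertions,
    deletions and substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,         # delete ca
                          curr[j - 1] + 1,     # insert cb
                          prev[j - 1] + cost)  # substitute (or match for free)
        prev = curr
    return prev[len(b)]


def nearest_reference(observed: str, references: dict[str, str]) -> str:
    """Return the label of the reference action string closest to `observed`."""
    return min(references, key=lambda label: edit_distance(observed, references[label]))
```

For example, `edit_distance("kitten", "sitting")` is 3. All three operation costs are fixed at 1 in this baseline. A formulation tailored to noisy action strings, such as the one the abstract mentions, would differ from it.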
The bag of visual words is a well established representation in diverse computer vision problems. Taking inspiration from the fields of text mining and retrieval, this representation has proved to be very effective in a large number of domains. In most cases, a standard term-frequency weighting scheme is considered for representing images and videos in computer vision. This is somewhat surprising,...
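The abstract is truncated before naming the alternatives it studies. As an illustration only, the following minimal sketch contrasts plain term frequency with the classic tf-idf weighting from text retrieval, applied to visual-word counts. The function name `tf_idf`, the logarithmic idf form, and the final L2 normalization are assumptions, not the paper's scheme:

```python
import numpy as np


def tf_idf(counts: np.ndarray) -> np.ndarray:
    """counts: (n_images, vocab_size) visual-word occurrence counts.
    Returns L2-normalized tf-idf vectors, one row per image."""
    counts = np.asarray(counts, dtype=float)
    # term frequency: counts divided by the number of visual words in each image
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # document frequency: number of images in which each visual word occurs
    df = (counts > 0).sum(axis=0)
    # inverse document frequency (guarded against words that never occur)
    idf = np.log(counts.shape[0] / np.maximum(df, 1))
    weighted = tf * idf
    norms = np.linalg.norm(weighted, axis=1, keepdims=True)
    return weighted / np.maximum(norms, 1e-12)
```

A visual word that occurs in every image gets an idf of log(1) = 0, so it stops contributing to the representation. Plain term-frequency weighting cannot express this.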
Large amounts of available training data and increasing computing power have led to the recent success of deep convolutional neural networks (CNN) on a large number of applications. In this paper, we propose an effective method for semantic pixel labelling using CNN features, hand-crafted features and Conditional Random Fields (CRFs). Both CNN and hand-crafted features are applied to dense image patches to produce...
Learning to count is a learning strategy that has been recently proposed in the literature for dealing with problems where estimating the number of object instances in a scene is the final objective. In this framework, the task of learning to detect and localize individual object instances is seen as a harder task that can be avoided by casting the problem as that of computing a regression value from...
In this paper we introduce a new video description framework that replaces traditional Bag-of-Words with a combination of Fisher Kernels (FK) and Vector of Locally Aggregated Descriptors (VLAD). The main contributions are: (i) a fast algorithm to densely extract global frame features, easier and faster to compute than spatio-temporal local features; (ii) replacing the traditional k-means based vocabulary...
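For readers unfamiliar with VLAD, here is a minimal sketch of the standard encoding, assuming a precomputed k-means vocabulary. Each descriptor is assigned to its nearest centroid, residuals are summed per centroid, and the concatenated vector is power- and L2-normalized. The normalization choices and the name `vlad` are assumptions. This is neither the paper's frame-feature pipeline nor its Fisher Kernel component:

```python
import numpy as np


def vlad(descriptors: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Encode local descriptors (n, d) against k centroids (k, d)
    as a VLAD vector of length k * d."""
    descriptors = np.asarray(descriptors, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    # assign each descriptor to its nearest centroid (squared Euclidean distance)
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    k, d = centroids.shape
    v = np.zeros((k, d))
    for i in range(k):
        members = descriptors[nearest == i]
        if len(members):
            # accumulate residuals between the assigned descriptors and centroid i
            v[i] = (members - centroids[i]).sum(axis=0)
    v = v.ravel()
    # signed square-root (power) normalization, then global L2 normalization
    v = np.sign(v) * np.sqrt(np.abs(v))
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v
```

Unlike a bag-of-words histogram, which only counts assignments, VLAD keeps first-order information about where descriptors fall relative to each codeword.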
This paper deals with automatic systems for image recipe recognition. For this purpose, we compare and evaluate leading vision-based and text-based technologies on a new, very large multimodal dataset (UPMC Food-101) containing about 100,000 recipes for a total of 101 food categories. Each item in this dataset is represented by one image plus textual information. We present in-depth experiments of recipe...
We present PET, the Pascal animal classes Eye Tracking database. Our database comprises eye movement recordings compiled from forty users for the bird, cat, cow, dog, horse and sheep trainval sets from the VOC 2012 image set. Unlike recent eye-tracking databases such as [1, 2], a salient aspect of PET is that it contains eye movements recorded for both the free-viewing and visual search task...
The capability of applying a weak force with expected accuracy is an important motor skill in surgical operations. Acquiring such a skill is challenging for novices. In this paper, we studied how the accuracy of the force control could be enhanced through repetitive training. Twelve participants were divided into two groups. They were trained to apply a target force of 0.25 N with ±20% accuracy under...
In this paper, we tackle the task of recognizing types of identity documents, some of which are very similar to one another, using state-of-the-art visual recognition approaches. Given a scanned document, the goal is to identify the country of issue, the type of document, and its version. Whereas recognizing the individual parts of a document with known standardized layout can be done reliably, identifying the type of a document...
Fashion is a major segment in e-commerce with growing importance and a steadily increasing number of products. Since manual annotation of apparel items is very tedious, the product databases need to be organized automatically, e.g. by image classification. Common image classification approaches are based on features engineered for general purposes, which perform poorly on specific images of apparel...
In this paper, we are mostly interested in investigating how the study and discovery of the human visual cortex could be utilised to improve the computational models for visual recognition by computer vision. Many of the brain's perceptual abilities in vision have corresponding algorithms in computer vision, and in this paper we discuss three such models. First we present a model that has the...
Due to the ongoing biodiversity crisis, many species including great apes such as chimpanzees or gorillas are threatened and need to be protected. To overcome the catastrophic decline of biodiversity, biologists recently started to use remote cameras for wildlife monitoring. However, the manual analysis of the resulting image and video material is extremely tedious, time-consuming, and highly cost...
Attribute-based knowledge transfer has proven very successful in visual object analysis and learning previously unseen classes. However, the common approach learns and transfers attributes without taking into consideration the embedded structure between the categories in the source set. Such information provides important cues on the intra-attribute variations. We propose to capture these variations...
Recognition of social styles of people is an interesting but relatively unexplored task. Recognizing "style" appears to be quite a different problem from categorization; it is like recognizing a letter's font as opposed to recognizing the letter itself. Similar-looking things must be mapped to different categories. Hence a priori it would appear that features that are good for categorization...
Data sets ordinarily include a huge number of attributes, including irrelevant and redundant ones. Redundant and irrelevant attributes may reduce classification accuracy because of the huge search space. The main goal of attribute reduction is to choose a subset of relevant attributes from a huge number of available attributes to obtain comparable or even better classification accuracy than...
A novel approach to steady-state visual evoked potential (SSVEP) based brain-computer interface (BCI) is presented in this paper. To minimize possible side-effects of monochromatic-light SSVEP-based BCIs, we propose to utilize chromatic green-blue flicker stimuli at higher frequencies than those traditionally used. The developed safer SSVEP responses are processed and classified with features...
Visual codebook based quantization of robust appearance descriptors extracted from local image patches is an effective means of capturing image statistics for object classification. A codebook is usually constructed using a clustering method such as k-means at object level or image level. The codebook is global. For fine-grained categorization and recognition problems, however, the global object-level...
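Here is a minimal sketch of the global, image-level codebook pipeline that the abstract describes as the usual approach. Lloyd's k-means builds the codebook, and each descriptor is hard-assigned to its nearest codeword to form a normalized histogram. The function names and parameters (`kmeans`, `assign`, `bow_histogram`, `iters`, `seed`) are illustrative assumptions. The paper's fine-grained alternative is not shown:

```python
import numpy as np


def assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid for each row of x (squared Euclidean)."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns the (k, d) centroids used as the codebook."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # initialize with k distinct training descriptors
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, centroids)
        for j in range(k):
            if np.any(labels == j):  # keep the old centroid if a cluster empties
                centroids[j] = x[labels == j].mean(axis=0)
    return centroids


def bow_histogram(descriptors: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Hard-assignment bag-of-words histogram, normalized to sum to 1."""
    labels = assign(np.asarray(descriptors, dtype=float), codebook)
    hist = np.bincount(labels, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)
```

Because the same codebook is shared by every object or image, it is global in the sense the abstract uses.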
Embedded visual assist systems are emerging as increasingly viable tools for aiding visually impaired persons in their day-to-day life activities. Novel wearable devices with imaging capabilities will be uniquely positioned to assist visually impaired persons in activities such as grocery shopping. However, supporting such time-sensitive applications on embedded platforms requires an intelligent trade-off...