While fine-grained object recognition is an important problem in computer vision, current models are unlikely to accurately classify objects in the wild. These fully supervised models need additional annotated images to classify objects in every new scenario, a task that is infeasible. However, sources such as e-commerce websites and field guides provide annotated images for many classes. In this...
The role of semantics in zero-shot learning is considered. The effectiveness of previous approaches is analyzed according to the form of supervision provided. While some learn semantics independently, others only supervise the semantic subspace explained by training classes. Thus, the former is able to constrain the whole space but lacks the ability to model semantic correlations. The latter addresses...
Part-based image classification aims at representing categories by small sets of learned discriminative parts, upon which an image representation is built. Considered a promising avenue a decade ago, this direction has been neglected since the advent of deep neural networks. In this context, this paper brings two contributions: first, this work goes one step further than recent part-based...
The bilinear convolutional neural network (BCNN) model, the state of the art in fine-grained image recognition, fails to distinguish categories with subtle visual differences. We design a novel BCNN model guided by user click data (C-BCNN) to improve performance by capturing both the visual and semantic content of images. Specifically, to deal with the heavy noise in large-scale click data,...
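The core operation behind BCNN models is bilinear (outer-product) pooling of two CNN feature streams, which can be sketched as follows. The feature shapes and the signed-square-root/L2 normalisation are standard BCNN practice, not details taken from this abstract, and the click-data guidance that defines C-BCNN is not modelled here:

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """Sum-pooled outer product of two CNN feature maps sampled at the
    same spatial locations. feat_a: (H*W, Ca), feat_b: (H*W, Cb)."""
    B = feat_a.T @ feat_b                 # (Ca, Cb) bilinear matrix
    x = B.reshape(-1)                     # flatten to a descriptor vector
    x = np.sign(x) * np.sqrt(np.abs(x))   # signed square root
    n = np.linalg.norm(x)
    return x / n if n > 0 else x          # L2 normalisation

rng = np.random.default_rng(0)
fa = rng.standard_normal((49, 8))         # e.g. a 7x7 spatial grid, 8 channels
fb = rng.standard_normal((49, 8))
v = bilinear_pool(fa, fb)                 # 8*8 = 64-dimensional descriptor
```

The resulting descriptor is what a linear classifier is trained on; capturing second-order feature interactions is what makes the representation sensitive to subtle visual differences.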
In this paper, we propose a deep convolutional neural network model for in-bed behavior recognition and bed-exit prediction. This model extracts features for training from depth images taken by depth cameras in two categories: in-bed images taken several time intervals before a patient gets out of bed, and usual in-bed activity images. The depth camera-based model features grayscale and low-resolution...
This paper presents an image recognition technique based on discriminative models using features generated from separable lattice hidden Markov models (SL-HMMs). A major problem in image recognition is that recognition performance is degraded by geometric variations, such as those in the position and size of the object to be recognized. SL-HMMs have been proposed to solve this problem. SL-HMMs are an...
Reducing the computational budget of inference in deep neural networks while achieving high accuracy is important for time-sensitive applications. In this paper, unlike other approaches that try to compress a large neural network into one with fewer parameters, we try to complete the image classification as early as possible. By adding a middle output layer, we try to complete...
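The early-exit idea described above, a middle output layer that stops computation once it is confident, can be illustrated with a toy forward pass. The layer sizes, random weights, and the max-softmax confidence threshold below are placeholders for illustration, not this paper's architecture:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_classify(x, w1, w_exit, w2, w_out, threshold=0.9):
    """Forward pass with a middle output layer: if the intermediate
    prediction is confident enough, skip the remaining layers.
    Returns (predicted class, exited_early)."""
    h1 = np.maximum(0, x @ w1)        # first block (ReLU layer)
    p_mid = softmax(h1 @ w_exit)      # middle output layer
    if p_mid.max() >= threshold:      # confident: answer early
        return int(p_mid.argmax()), True
    h2 = np.maximum(0, h1 @ w2)       # otherwise run the remaining layers
    return int(softmax(h2 @ w_out).argmax()), False

# Toy weights (random placeholders, not trained parameters)
rng = np.random.default_rng(1)
w1, w_exit = rng.standard_normal((16, 8)), rng.standard_normal((8, 3))
w2, w_out = rng.standard_normal((8, 8)), rng.standard_normal((8, 3))
x = rng.standard_normal(16)
pred, early = early_exit_classify(x, w1, w_exit, w2, w_out, threshold=0.0)
```

With the threshold at 0 every input exits at the middle layer; raising it routes only the harder, low-confidence inputs through the full network, which is where the inference savings come from.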
Very deep convolutional networks have recently demonstrated impressive classification performance on competitive benchmarks such as the ImageNet or COCO tasks. However, such deep convolutional networks become increasingly difficult to train as depth grows. In this paper, we propose a novel deep network structure called cross-layer architecture to make the best use of information learned from all the lower-level layers...
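One plausible reading of "using information from all the lower-level layers" is a classifier that consumes the concatenated activations of every layer rather than only the last one; the sketch below shows that wiring, though the paper's exact architecture may differ:

```python
import numpy as np

def cross_layer_forward(x, layer_weights, w_out):
    """Forward pass in which the output layer sees the activations of
    every lower-level layer, concatenated, rather than only the last."""
    feats, h = [], x
    for w in layer_weights:
        h = np.maximum(0, h @ w)   # ReLU layer
        feats.append(h)            # keep every layer's features
    z = np.concatenate(feats)      # all lower-level representations
    return z @ w_out               # class scores

rng = np.random.default_rng(2)
ws = [rng.standard_normal((10, 6)), rng.standard_normal((6, 4))]
w_out = rng.standard_normal((6 + 4, 3))   # classifier over concatenated features
logits = cross_layer_forward(rng.standard_normal(10), ws, w_out)
```

A side effect of this wiring is that gradients reach lower layers directly through the output, which is one reason such shortcuts ease the training of very deep networks.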
Augmented reality combines real footage of a scene with virtual elements. However, most current methods rely on camera localisation and 3D reconstruction or point cloud generation in order to integrate augmented reality into the footage. In contrast, in this work we present a novel method to add virtual content to the scene based on the recognition of dominant planes in interior scenes. Our...
Deep learning neural networks have emerged as one of the most powerful classification tools for vision related applications. However, the computational and energy requirements associated with such deep nets can be quite high, and hence their energy-efficient implementation is of great interest. Although traditionally the entire network is utilized for the recognition of all inputs, we observe that...
Recent advances in deep learning have made it possible to build deep hierarchical models capable of delivering state-of-the-art performance in various vision tasks, such as object recognition, detection or tracking. For recognition tasks the most common approach when using deep models is to learn object representations (or features) directly from the raw image input and then feed the learned features to a...
Apparel classification encompasses the identification of an outfit in an image. The area has applications in social media advertising, e-commerce and criminal law. In our work, we introduce a new method for shopping for apparel online. This paper describes our approach to classifying images using Convolutional Neural Networks. We concentrate mainly on two aspects of apparel classification: (1) Multiclass...
Vision-based localization on robots and vehicles remains unsolved when extreme appearance change and viewpoint change are present simultaneously. Current state-of-the-art approaches to this challenge either deal with only one of these two problems, for example FAB-MAP (viewpoint invariance) or SeqSLAM (appearance invariance), or use extensive training within the test environment, an impractical...
To recognize facial expression from candid, non-posed images, we propose a deep-learning based approach using convolutional neural networks (CNNs). In order to evaluate the performance in real-time candid facial expression recognition, we have created a candid image facial expression (CIFE) dataset, with seven types of expression in more than 10,000 images gathered from the Web. As baselines, two...
In this paper, we present a novel framework for representing images as a combination of groups of visual words built from multiple mid-level feature descriptor representations. The mid-level feature representation is computed on discriminative patches of the image to build a lexicon, whose visual words are used to represent the shape within that image. The proposed image representation method...
Because of the increasing popularity of camera-equipped mobile devices, image matching techniques offer a potential solution for indoor localisation problems. However, image matching is challenging indoors because different indoor locations can look very similar. In this paper, we compare two image-based localisation approaches on realistic datasets that include images from cameras of varying quality...
This article focuses on the automatic recognition of jewellery stone quality. An image recognition method is described. Relevant image characteristics are computed and then used to classify the stone quality. Classification is performed by an algorithm based on binary decision trees, with decision thresholds adapted from a training dataset. Finally, the time complexity as well as accuracy...
It has been shown that incorporation of human-specified high-level description of the target objects, e.g. labeled prior-knowledge data, can increase the performance of one-shot recognition. In this paper, we introduce latent components as a high level representation of the original objects and propose a cascade model for one-shot image recognition based on latent components learned by Hierarchical...
Pattern recognition methods have become a powerful tool for segmentation in the sense that they are capable of automatically building a segmentation model from training images. However, they present several difficulties, such as requirement of a large set of training data, robustness to imaging conditions not present in the training set, and complexity of the search process. In this paper we tackle...
Given a sequence of observable features of a linear dynamical system (LDS), we propose the problem of finding a representation of the LDS which is sparse in terms of a given dictionary of LDSs. Since LDSs do not belong to Euclidean space, traditional sparse coding techniques do not apply. We propose a probabilistic framework and an efficient MAP algorithm to learn this sparse code. Since dynamic textures...
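For contrast with the abstract's point that traditional sparse coding assumes Euclidean data, here is the standard Euclidean formulation solved by iterative shrinkage-thresholding (ISTA). The dictionary and signal are synthetic, and nothing LDS-specific (the paper's actual contribution) appears here:

```python
import numpy as np

def ista(x, D, lam=0.1, iters=300):
    """Euclidean sparse coding: argmin_a 0.5*||x - D a||^2 + lam*||a||_1,
    solved by the iterative shrinkage-thresholding algorithm (ISTA)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1/L with L = ||D||_2^2
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        a = a - step * (D.T @ (D @ a - x))   # gradient step on the smooth term
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # soft-threshold
    return a

rng = np.random.default_rng(3)
D = rng.standard_normal((20, 10))            # synthetic dictionary of 10 atoms
a_true = np.zeros(10)
a_true[2], a_true[7] = 1.5, -2.0             # ground-truth sparse code
x = D @ a_true
a = ista(x, D)                               # recovered sparse code
```

The difficulty the abstract raises is that when the dictionary atoms are LDSs rather than vectors, the residual `x - D a` has no Euclidean meaning, which is why the authors replace this formulation with a probabilistic MAP framework.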