One-shot learning is a challenging problem where the aim is to recognize a class identified by a single training image. Given the practical importance of one-shot learning, it seems surprising that the rich information present in the class tag itself has largely been ignored. Most existing approaches restrict the use of the class tag to finding similar classes and transferring classifiers or metrics...
This paper presents a novel unsupervised domain adaptation method for cross-domain visual recognition. We propose a unified framework, referred to as Joint Geometrical and Statistical Alignment (JGSA), that reduces the shift between domains both statistically and geometrically. Specifically, we learn two coupled projections that project the source domain and target domain data into low-dimensional...
This paper targets the problem of set-to-set recognition, which learns the metric between two image sets. Images in each set belong to the same identity. Since images in a set can be complementary, they can potentially lead to higher accuracy in practical applications. However, the quality of each sample cannot be guaranteed, and samples with poor quality will hurt the metric. In this paper, the quality...
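The abstract is cut off before it describes how sample quality is handled, so the following is only a generic sketch of why per-sample quality matters when comparing two image sets. It assumes each image already has a feature vector and a quality score in [0, 1] from some upstream estimator (hypothetical); each set is collapsed into a quality-weighted mean, so low-quality samples contribute little, and two sets are compared by Euclidean distance. The names `aggregate_set` and `set_distance` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence


def aggregate_set(features: Sequence[Sequence[float]],
                  qualities: Sequence[float]) -> List[float]:
    """Quality-weighted mean of per-image feature vectors (hypothetical scheme)."""
    total = sum(qualities)
    if total <= 0:
        raise ValueError("at least one image needs a positive quality score")
    dim = len(features[0])
    # Each dimension is a weighted average; a sample with quality 0 is ignored.
    return [sum(q * f[d] for f, q in zip(features, qualities)) / total
            for d in range(dim)]


def set_distance(set_a: Sequence[Sequence[float]], qual_a: Sequence[float],
                 set_b: Sequence[Sequence[float]], qual_b: Sequence[float]) -> float:
    """Euclidean distance between the aggregated representations of two sets."""
    a = aggregate_set(set_a, qual_a)
    b = aggregate_set(set_b, qual_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For example, a set containing one clean sample `[1.0, 0.0]` (quality 1) and one corrupted sample `[0.0, 1.0]` (quality 0) aggregates to `[1.0, 0.0]`, so its distance to a set holding only the clean sample is zero. A plain unweighted mean would instead be pulled toward the corrupted sample.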
The role of semantics in zero-shot learning is considered. The effectiveness of previous approaches is analyzed according to the form of supervision provided. While some learn semantics independently, others only supervise the semantic subspace explained by training classes. Thus, the former is able to constrain the whole space but lacks the ability to model semantic correlations. The latter addresses...
In this paper, we study learning visual classifiers from unstructured text descriptions at part precision with no training images. We propose a learning framework that is able to connect text terms to their relevant parts and suppress connections to non-visual text terms without any part-text annotations. For instance, this learning process enables terms like "beak" to be sparsely linked to the visual...
Multi-instance multi-label (MIML) learning has many interesting applications in computer vision, including multi-object recognition and automatic image tagging. In these applications, additional information such as bounding boxes, image captions and descriptions is often available during the training phase, which is referred to as privileged information (PI). However, as existing works on learning using...
Recent captioning models are limited in their ability to scale and describe concepts unseen in paired image-text corpora. We propose the Novel Object Captioner (NOC), a deep visual semantic captioning model that can describe a large number of object categories not present in existing image-caption datasets. Our model takes advantage of external sources – labeled images from object recognition...
Though tremendous strides have been made in object recognition, one of the remaining open challenges is detecting small objects. We explore three aspects of the problem in the context of finding small faces: the role of scale invariance, image resolution, and contextual reasoning. While most recognition approaches aim to be scale-invariant, the cues for recognizing a 3px tall face are fundamentally...
We approach the problem of fast detection and recognition of a large number (thousands) of object categories while training on a very limited number of examples, usually one per category. Examples of this task include: (i) detection of retail products, where we have only one studio image of each product available for training, (ii) detection of brand logos, and (iii) detection of 3D objects and their...
Semantic sparsity is a common challenge in structured visual classification problems: when the output space is complex, the vast majority of the possible predictions are rarely, if ever, seen in the training set. This paper studies semantic sparsity in situation recognition, the task of producing structured summaries of what is happening in images, including activities, objects and the roles objects...
Part-based image classification aims at representing categories by small sets of learned discriminative parts, upon which an image representation is built. Considered a promising avenue a decade ago, this direction has been neglected since the advent of deep neural networks. In this context, this paper brings two contributions: first, this work goes one step further than recent part-based...
Target segmentation of synthetic aperture radar (SAR) images is one of the challenging problems in SAR image interpretation, and it often serves as a processing step for SAR target recognition. Target segmentation tries to separate the target from the background, thereby eliminating interference from background noise or clutter. However, the segmentation may also discard a part of the target characteristics...
This paper introduces an approach to recognizing faces from 3D space in 2D images using fuzzy vector manifolds and nearest distance. We employ fuzzy vectors to help the system minimize the negative effects of noise and image degradation. On the training set, the crisp vector representations of images are transformed into fuzzy vector representations using a specific triangle fuzzification method. Then,...
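The abstract does not specify which triangle fuzzification the authors use, so the following is a minimal sketch under stated assumptions, not their method. Each crisp training component x becomes the symmetric triangular fuzzy number (x - width, x, x + width), where `width` is a hypothetical parameter. The query stays crisp as a degenerate triangle (y, y, y). Triangles are compared with the vertex-method distance, sqrt(((a1 - b1)^2 + (a2 - b2)^2 + (a3 - b3)^2) / 3), summed over components. The query then takes the label of the nearest training sample.

```python
import math
from typing import List, Sequence, Tuple

# (left, peak, right) corners of a triangular fuzzy number.
Triangle = Tuple[float, float, float]


def fuzzify(vector: Sequence[float], width: float) -> List[Triangle]:
    """Map each crisp component x to the symmetric triangle (x - width, x, x + width).

    The fixed symmetric width is an illustrative assumption.
    """
    return [(x - width, x, x + width) for x in vector]


def vertex_distance(a: Triangle, b: Triangle) -> float:
    """Vertex-method distance between two triangular fuzzy numbers."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / 3.0)


def fuzzy_vector_distance(u: Sequence[Triangle], v: Sequence[Triangle]) -> float:
    """Sum of component-wise vertex distances between two fuzzy vectors."""
    return sum(vertex_distance(a, b) for a, b in zip(u, v))


def nearest_label(query: Sequence[float],
                  gallery: Sequence[Tuple[List[Triangle], str]]) -> str:
    """Return the label of the fuzzified training vector closest to a crisp query."""
    crisp = [(x, x, x) for x in query]  # a crisp value as a degenerate triangle
    return min(gallery, key=lambda item: fuzzy_vector_distance(crisp, item[0]))[1]


gallery = [
    (fuzzify([0.1, 0.9], 0.05), "alice"),
    (fuzzify([0.8, 0.2], 0.05), "bob"),
]
print(nearest_label([0.15, 0.85], gallery))  # prints "alice"
```

Here the feature vectors and labels are toy values chosen for illustration. In practice the vectors would come from whatever image representation the recognizer uses.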
In this work we consider the problem of developing algorithms that automatically identify small-scale solar photovoltaic arrays in high-resolution aerial imagery. Such algorithms potentially offer a faster and cheaper way to collect information about small-scale photovoltaic (PV) arrays, such as their location, capacity, and the energy they produce. Here we build on previous algorithmic work by employing...
Coin recognition is one of the most important activities in modern banking and currency processing systems, in which machine vision is widely used. The technique at the heart of such systems is object recognition in a digital image. Although it offers high recognition speed, the traditional method of coin recognition cannot distinguish coins of similar sizes. This paper presents a method based...
Surveillance systems play a critical role in security. A surveillance system with cameras that work in the visible spectrum is sufficient for most cases. However, problems may arise at night, or in areas with less-than-ideal illumination. Cameras with thermal infrared technology can be a better option in these situations since they do not rely on illumination to...
A robot needs to localize an unknown object before grasping it. When the robot only has a monocular sensor, how can it get the object pose? In this work, we present a method of localizing the 6-DOF pose of a target object using a robotic arm and a hand-mounted monocular camera. The method includes an object recognition process and a localization process. The recognition process uses point features on a surface...
Understanding the generalization properties of deep learning models is critical for their successful usage in many applications, especially in the regimes where the number of training samples is limited. We study the generalization properties of deep neural networks (DNNs) via the Jacobian matrix of the network. Our analysis is general to arbitrary network structures, types of non-linearities and...
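To make "the Jacobian matrix of the network" concrete: for a network f, the Jacobian J(x) collects the partial derivatives of every output with respect to every input, J[i][j] = ∂f_i/∂x_j. For a one-hidden-layer network f(x) = W2 tanh(W1 x), the chain rule gives J(x) = W2 diag(1 - tanh²(W1 x)) W1. The sketch below computes this with hypothetical weights and reports the Frobenius norm, one common scalar summary of J. It only illustrates the object being analyzed and does not reproduce the paper's generalization analysis.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Matrix-vector product for row-major lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def forward(w1: Matrix, w2: Matrix, x: Sequence[float]) -> List[float]:
    """f(x) = W2 tanh(W1 x): a one-hidden-layer network without biases."""
    return matvec(w2, [math.tanh(h) for h in matvec(w1, x)])


def jacobian(w1: Matrix, w2: Matrix, x: Sequence[float]) -> Matrix:
    """J[i][j] = sum_k W2[i][k] * (1 - tanh(h_k)^2) * W1[k][j], where h = W1 x."""
    slopes = [1.0 - math.tanh(h) ** 2 for h in matvec(w1, x)]
    hidden = range(len(slopes))
    return [[sum(w2[i][k] * slopes[k] * w1[k][j] for k in hidden)
             for j in range(len(x))]
            for i in range(len(w2))]


def frobenius_norm(m: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in m for v in row))


w1 = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]  # hypothetical 3x2 hidden weights
w2 = [[1.0, 0.4, -0.7]]                      # hypothetical 1x3 output weights
print(frobenius_norm(jacobian(w1, w2, [0.3, -0.2])))
```

The analytic Jacobian can be checked against central finite differences of `forward`, which is a cheap way to validate a hand-derived gradient.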
Deep learning has led to many breakthroughs in machine perception and data mining. Although there have been many substantial advances of deep learning in image recognition and natural language processing, very little work has been done on video analysis and semantic event detection. Very deep inception and residual networks have yielded promising results in the 2014 and 2015 ILSVRC challenges,...
The bilinear convolutional neural network (BCNN) model, the state of the art in fine-grained image recognition, fails to distinguish categories with subtle visual differences. We design a novel BCNN model guided by user click data (C-BCNN) to improve performance by capturing both the visual and semantic content of images. Specifically, to deal with the heavy noise in large-scale click data,...