Kotenseki is a collection of classical and ancient Japanese literature. It comprises picture books that tell Japanese stories through comic drawings of different subjects, such as humans, nature, and animals. To preserve them effectively for posterity, a search system is important. We propose an efficient content-based image retrieval (CBIR) system to help users access the information easily and have an enjoyable...
Affordance learning, in general, aims to identify the purpose, use, and ways of interacting with an object, based on information gained from observing it. Most existing affordance learning approaches assume the target object has been cropped individually from images. However, the object often cannot be easily separated from others due to occlusion or noise. Actually, two or more neighboring...
We aim to tackle a novel vision task called Weakly Supervised Visual Relation Detection (WSVRD), which detects “subject-predicate-object” relations in an image when object relation ground truths are available only at the image level. This is motivated by the fact that it is extremely expensive to label the combinatorial relations between objects at the instance level. Compared to the extensively studied problem,...
A major impediment in rapidly deploying object detection models for instance detection is the lack of large annotated datasets. For example, finding a large labeled dataset containing instances in a particular kitchen is unlikely. Each new environment with new instances requires expensive data collection and annotation. In this paper, we propose a simple approach to generate large annotated instance...
To realize autonomous landing of unmanned aerial vehicles (UAVs) in power patrolling, a vision-based method for UAVs built on Faster Regions with Convolutional Neural Network (Faster R-CNN) is studied. In this paper, we design a landing sign that combines concentric circles and a pentagon, and propose a Faster R-CNN recognition algorithm that can be used to identify the target...
In this paper, we propose a multi-modal search engine for interior design that combines visual and textual queries. The goal of our engine is to retrieve interior objects, e.g. furniture or wall clocks, that share visual and aesthetic similarities with the query. Our search engine allows the user to take a photo of a room and retrieve, with high recall, a list of items identical or visually similar...
Dense captioning is a newly emerging computer vision topic for understanding images with dense language descriptions. The goal is to densely detect visual concepts (e.g., objects, object parts, and interactions between them) from images, labeling each with a short descriptive phrase. We identify two key challenges of dense captioning that need to be properly addressed when tackling the problem. First,...
We propose a family of quasi-linear discriminants that outperform current large-margin methods in sliding window visual object detection and open set recognition tasks. In these tasks the classification problems are both numerically imbalanced – positive (object class) training and test windows are much rarer than negative (non-class) ones – and geometrically asymmetric –...
Given a convolutional neural network (CNN) that is pre-trained for object classification, this paper proposes to use active question-answering to semanticize neural patterns in conv-layers of the CNN and mine part concepts. For each part concept, we mine neural patterns in the pre-trained CNN, which are related to the target part, and use these patterns to construct an And-Or graph (AOG) to represent...
As an intermediate-level task connecting image captioning and object detection, visual relationship detection has started to attract researchers' attention because of its descriptive power and clear structure. It detects the objects and captures their pair-wise interactions with a subject-predicate-object triplet, e.g. person-ride-horse. In this paper, each visual relationship is considered as a phrase...
Hough voting-based methods for object detection work by letting local image patches vote for the center of the object according to the trained visual words. They are effective for objects with small local variations, but cannot solve the multi-view detection problem. The traditional approach is to train visual words for each subcategory that shares a similar view. However, limited training data...
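The voting step described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: the function name `hough_vote_center`, the patch format `(y, x, word_id)`, and the `codebook_offsets` mapping from a visual word to its learned offsets to the object center are all hypothetical, chosen only to show how patch votes accumulate into a center estimate.

```python
from collections import defaultdict


def hough_vote_center(patches, codebook_offsets, image_shape):
    """Accumulate center votes from local patches (illustrative sketch only).

    patches: iterable of (y, x, word_id), a patch position and the id of
        the visual word it was matched to (hypothetical format).
    codebook_offsets: dict mapping word_id -> list of (dy, dx) offsets from
        a patch to the object center, assumed to be learned in training.
    image_shape: (height, width); votes falling outside are discarded.
    """
    height, width = image_shape
    votes = defaultdict(float)
    for y, x, word_id in patches:
        offsets = codebook_offsets.get(word_id, [])
        for dy, dx in offsets:
            cy, cx = y + dy, x + dx
            if 0 <= cy < height and 0 <= cx < width:
                # Each patch spreads one unit of vote over its offsets.
                votes[(cy, cx)] += 1.0 / len(offsets)
    if not votes:
        return None, {}
    # The cell with the highest accumulated vote is the center estimate.
    center = max(votes, key=votes.get)
    return center, dict(votes)


# Toy data (made up for illustration): three patches, three visual words.
codebook = {0: [(2, 3)], 1: [(-1, 0)], 2: [(0, -4), (5, 5)]}
patches = [(3, 2, 0), (6, 5, 1), (5, 9, 2)]
center, votes = hough_vote_center(patches, codebook, (12, 16))
```

Because each visual word here stores offsets for a single viewpoint, a codebook of this kind would need separate words per view, which is the multi-view limitation the abstract points to.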
In this paper, we present a novel framework to incorporate high-level guidance and low-level features to automatically identify salient objects based on two ideas. The first one considers the specific location prior to encode visual saliency, while the second one estimates image saliency using contrast with respect to background regions. The proposed framework consists of the following three steps:...
This paper presents an ophthalmic anesthetic training system with two integrated cameras that provide real-time visual feedback to the trainee. The mannequin developed uses anatomically accurate ocular structures, and the trainee is able to see the needle and ocular structures in real time on a monitor during the training. In addition to the mannequin with integrated camera system, a virtual...
The manual process of setting privacy can be very time-consuming and challenging for ordinary users. By assuming that there are hidden correlations between the visual properties of images (i.e., visual features) or object classes and the privacy settings for image sharing, an effective algorithm is developed in this paper to achieve automatic prediction of image privacy, so that the best-matching...
Moving target detection, tracking, recognition, and behaviour analysis are the key issues in the intelligent visual surveillance system (IVSS). The challenge is how to process the real-time video stream effectively so that the objects of interest can be found for analysis. However, the traditional video surveillance technology often does not meet the needs of real-time key frame recognition...
Most approaches for scene parsing, recognition or retrieval use detectors that are either (i) independently trained or (ii) jointly trained for conjunctions of object-object or object-attribute phrases. We posit that neither of these two extremes is uniformly optimal, in terms of performance, across all categories and conjunctions. The choice of whether one should train an independent or composite...
There is a need for automatic processing and extraction of meaningful metadata from multimedia information, especially in the audiovisual industry. This higher-level information is used in a variety of practices, such as enriching multimedia content with external links, clickable objects, and useful related information in general. This paper presents a system for efficient multimedia content analysis...
In this paper, we propose a mutual framework that combines two state-of-the-art visual object tracking algorithms. Each tracker benefits from the other's advantages, leading to an efficient visual tracking approach. Many state-of-the-art trackers perform poorly under rain, fog, or occlusion in real-world scenarios. Often, after several frames, objects get lost, leading only to a short-term...
For object detection, large-scale databases obtained via an in-vehicle camera have been proposed. The databases are generally used to evaluate object detection methods and/or to train classifiers in these methods. When proposing a new database, we should evaluate the characteristics of a large number of the samples in the database to improve the usability of the proposed database. In the evaluation,...
Human detection is an essential task in many applications, especially surveillance systems. Recently, the ConvNet (Convolutional Neural Network)-based YOLO model has been applied successfully to object (including human) detection. It is one of the fastest ways to detect objects directly from the input image. However, compared to the ConvNets-based state-of-the-art object detection methods, YOLO...