The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
This paper presents a performance comparison of several state-of-the-art visual feature extraction algorithms when applied in a poorly-structured environment as found on the planet Mars. So far, no systematic evaluation of feature extraction algorithms in extraterrestrial environments is available. The algorithms in this paper are evaluated using the Devon Island dataset which is said to have one...
Motor imagery (MI) based on brain computer interfaces (BCIs) have been widely applied for upper limb motor rehabilitation. Due to the fact that a large number of disabled people need to restore or improve walking ability, it is also important to investigate the use of MI-based BCIs for lower limb motor rehabilitation. The brain activity of lower limb MI is more difficult to detect because of low reliability...
Kotenseki is a collection of classical and ancient Japanese literature. It is comprised of image books that express Japanese stories by using comic drawings of different characters, such as humans, nature, and animals. To effectively store them for posterity, a search system is important. We propose an efficient CBIR system to assist the users in easily accessing the information and have an enjoyable...
We propose an image-text alignment framework to match images with text, and take blog article summarization as the main application. Objects in an image are first detected, from them deep features are extracted and transformed into a space commonly shared with the text. On the other hand, sentences of a blog article are represented as vectors, and are also embedded into the common space. With these...
Convolutional neural network (CNN) has drawn increasing interest in visual tracking, among which fully-convolutional Siamese network based method (SiamFC) is quite popular due to its competitive performance in both precision and efficiency. Generally, SiamFC captures robust semantics from high-level features in the last layer but ignores detailed spatial features in earlier layers, thus tending to...
Object deformation and occlusion are ubiquitous problems for visual tracking. Though many efforts have been made to handle object deformation and occlusion, most existing tracking algorithms fail in case of large deformation and severe occlusion. In this paper, we propose a graph learning-based tracking framework to handle both challenges. For each consecutive frame pair, we construct a weighted graph,...
Welding is a process recognized by the laborious work and hazardous work environment it takes place, but it is an important process in different industrial scenarios, like the shipbuilding industry. The use of robots has been increasing in recent years, reducing the human interference necessary for the process. This paper proposes a system for automated seam tracking and a geometric welding bead analysis...
A big challenge in the precision agriculture is the detection of fruits in coffee crops on agricultural environments. This paper presents a comparison of four features set to detect the red fruits (mature) in Coffee plants. An Unmanned Aerial Vehicle (UAV) is used to obtain high-resolution RGB images of a coffee hall. The proposed methodology enables the extraction of visual features from image regions...
Cardiac auscultation allows diagnosing the heart by listening to its sounds. Current cardiac auscultation training is seeing a preference towards diagnostics equipment such as the echocardiograph that allows visualizing and listening to the heart to determine how the heart is working, rather than the use of the stethoscope which only provides auditory feedback, resulting in a loss of stethoscope-based...
Affordance learning in general, is to identify the purpose, use, and ways to interact with an object, based on information gained from observing the object. Most of the existing affordance learning approaches assume the object target has been cropped individually from images. However, the object could not be easily separated from others due to occlusion or noise. Actually, two or more neighboring...
Agent-based modeling is a paradigm of modeling dynamic systems of interacting agents that are individually governed by specified behavioral rules. Training a model of such agents to produce an emergent behavior by specification of the emergent (as opposed to agent) behavior is easier from a demonstration perspective. Without the involvement of manual behavior specification via code or reliance on...
Analyzing spatio-temporal data like video is a challenging task that requires processing visual and temporal information effectively. Convolutional Neural Networks have shown promise as baseline fixed feature extractors through transfer learning, a technique that helps minimize the training cost on visual information. Temporal information is often handled using hand-crafted features or Recurrent Neural...
Building effective recommender systems for domains like fashion is challenging due to the high level of subjectivity and the semantic complexity of the features involved (i.e., fashion styles). Recent work has shown that approaches to 'visual' recommendation (e.g. clothing, art, etc.) can be made more accurate by incorporating visual signals directly into the recommendation objective, using 'off-the-shelf'...
Convolutional neural network (CNN) based trackers have achieved significant performances in tracking recently. Most existing CNN-based trackers regard tracking as a classification or similarity searching problem. The two methods have their respective superiorities and limitations because of different supervised objectives. In this paper, we propose a multi-task CNN for visual tracking, not only fully...
Many of the existing methods for learning joint embedding of images and text use only supervised information from paired images and its textual attributes. Taking advantage of the recent success of unsupervised learning in deep neural networks, we propose an end-to-end learning framework that is able to extract more robust multi-modal representations across domains. The proposed method combines representation...
Recognizing fine-grained categories (e.g., bird species) highly relies on discriminative part localization and part-based fine-grained feature learning. Existing approaches predominantly solve these challenges independently, while neglecting the fact that part localization (e.g., head of a bird) and fine-grained feature learning (e.g., head shape) are mutually correlated. In this paper, we propose...
This paper introduces a novel approach for modeling visual relations between pairs of objects. We call relation a triplet of the form (subject; predicate; object) where the predicate is typically a preposition (eg. ’under’, ’in front of’) or a verb (’hold’, ’ride’) that links a pair of objects (subject; object). Learning such relations is challenging as the objects have different spatial configurations...
We propose a novel measure of visual similarity for image retrieval that incorporates both structural and aesthetic (style) constraints. Our algorithm accepts a query as sketched shape, and a set of one or more contextual images specifying the desired visual aesthetic. A triplet network is used to learn a feature embedding capable of measuring style similarity independent of structure, delivering...
The intensive annotation cost and the rich but unlabeled data contained in videos motivate us to propose an unsupervised video-based person re-identification (re-ID) method. We start from two assumptions: 1) different video tracklets typically contain different persons, given that the tracklets are taken at distinct places or with long intervals; 2) within each tracklet, the frames are mostly of the...
We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNN). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from different depth layers in order to take advantage of both the spatial...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.