This paper proposes a proof-of-concept for a novel automated indoor/outdoor navigation system. The proposed method enables an equipped object or user to navigate through closed environments using an automatically generated Spatial Map Graph (SMG) with the aid of pre-placed visual markers. The system is robust to dynamically changing complex environments, through adaptive reconfigurations...
Since the beginning of early civilizations, social relationships derived from each individual have fundamentally formed the basis of social structure in our daily life. In the computer vision literature, much progress has been made in scene understanding, such as object detection and scene parsing. Recent research focuses on the relationships between objects based on their functionality and geometrical relations...
Given a video and a description sentence with one missing word (the “source sentence”), the Video-Fill-In-the-Blank (VFIB) problem is to find the missing word automatically. The contextual information of the sentence, as well as visual cues from the video, are important to infer the missing word accurately. Since the source sentence is broken into two fragments: the sentence’s left fragment (before the blank)...
In this paper, we propose a novel domain-specific dataset named VegFru for fine-grained visual categorization (FGVC). While the existing datasets for FGVC are mainly focused on animal breeds or man-made objects with limited labelled data, VegFru is a larger dataset consisting of vegetables and fruits which are closely associated with the daily life of everyone. Aiming at domestic cooking and food...
In this paper we describe the 3D acquisition component integrated in the Sound of Vision (SoV) system. SoV is a computer vision based sensory substitution device (SSD) for the visually impaired. Its main objective is to provide the users with a 3D representation of the environment around them, conveyed by means of the hearing and tactile senses. One of the biggest challenges for the SoV system is...
In this paper, we discuss a semi-dense depth map interpolation method based on a convolutional neural network. We propose a compact neural network architecture with a loss function defined as the Euclidean distance in the feature space of the VGG-16 neural network used for deep visual recognition. The suggested solution shows state-of-the-art performance on synthetic and real datasets. Together with LSD-SLAM, the...
In this paper, we propose patch-based deep Boltzmann shape priors for visual tracking. The shape priors are generated from a deep Boltzmann machine network. The network consists of three layers of hidden and visible units. The generated shapes not only maintain general shapes from a variety of poses, but also entail local modifications with high probability.
The Mapillary Vistas Dataset is a novel, large-scale street-level image dataset containing 25000 high-resolution images annotated into 66 object categories with additional, instance-specific labels for 37 classes. Annotation is performed in a dense and fine-grained style by using polygons for delineating individual objects. Our dataset is 5× larger than the total amount of fine annotations for Cityscapes...
What defines a visual style? Fashion styles emerge organically from how people assemble outfits of clothing, making them difficult to pin down with a computational model. Low-level visual similarity can be too specific to detect stylistically similar images, while manually crafted style categories can be too abstract to capture subtle style differences. We propose an unsupervised approach to learn...
Domain adaption (DA) allows machine learning methods trained on data sampled from one distribution to be applied to data sampled from another. It is thus of great practical importance to the application of such methods. Despite the fact that tensor representations are widely used in Computer Vision to capture multi-linear relationships that affect the data, most existing DA methods are applicable...
An application of artificial vision and artificial neural network techniques to face recognition is presented. To that end, a set of images (frontal face photos) with different lighting conditions, gestures, accessories and distances is used. A stepwise algorithm achieves satisfactory results, correctly identifying images inside and outside the data set.
The contribution of this paper is to bridge the gap in understanding the mathematical structure and the computational implementation of a convolutional neural network using a minimal model. The proposed minimal convolutional neural network is presented using a layering approach. This approach provides a clear understanding of the main mathematical operations in a convolutional neural network. Hence,...
Loop closure detection is an important part of a visual simultaneous localization and mapping (SLAM) system. Most traditional loop closure detection approaches, which use hand-crafted features, often lack robustness with respect to object occlusions and illumination changes, especially in complicated indoor environments. Recently, convolutional neural networks (CNNs) have made a huge impact on many computer...
Articulated human pose estimation is a fundamental yet challenging task in computer vision. The difficulty is particularly pronounced in scale variations of human body parts when the camera view changes or severe foreshortening occurs. Although pyramid methods are widely used to handle scale changes at inference time, learning feature pyramids in deep convolutional neural networks (DCNNs) is still not...
We study large-scale multi-label classification (MLC) on two recently released datasets, YouTube-8M and Open Images, which contain millions of data instances and thousands of classes. The unprecedented problem scale poses great challenges for MLC. First, finding the correct label subset among exponentially many choices incurs substantial ambiguity and uncertainty. Second, the large data-size and...
The rapid and irregular motion of semen cells makes the counting process difficult in visual assessment. Therefore, computer-based techniques are necessary to evaluate the tests more accurately. In this paper, an alternative to the visual assessment technique in spermiogram tests is presented. Analyses are performed on the recorded microscope video images by computer, automatically...
We tackle the problem of learning robotic sensorimotor control policies that can generalize to visually diverse and unseen environments. Achieving broad generalization typically requires large datasets, which are difficult to obtain for task-specific interactive processes such as reinforcement learning or learning from demonstration. However, much of the visual diversity in the world can be captured...
Aesthetic quality assessment plays an important role in how people organize large image collections. Many studies on aesthetic quality assessment are based on the design of hand-crafted features, without considering whether attributes conveyed by images can actually affect image aesthetics. This paper presents an aesthetic quality assessment method which uses new visual features. The proposed method utilizes...
Video image datasets play an essential role in the design and evaluation of traffic vision methods. However, there is a longstanding difficulty: manually collecting and annotating large-scale, diversified datasets from real scenes is time-consuming and error-prone. In 2016, we proposed the parallel vision methodology to tackle the issues of conventional vision computing approaches in data collection,...
The contribution of this paper is to bridge the gap in understanding the mathematical structure and the computational implementation of a convolutional neural network (CNN) using a minimal model (Minimal CNN). The proposed minimal CNN is presented using a layering approach. This approach provides a concise and accessible understanding of the main mathematical operations of a CNN. Hence, it benefits...