This paper addresses the problem of weakly supervised semantic image segmentation. Our goal is to label every pixel in a new image, given only image-level object labels associated with training images. Our problem statement differs from common semantic segmentation, where pixel-wise annotations are typically assumed available in training. We specify a novel deep architecture which fuses three distinct...
This paper strives to track a target object in a video. Rather than specifying the target in the first frame of a video by a bounding box, we propose to track the object based on a natural language specification of the target, which provides a more natural human-machine interaction as well as a means to improve tracking results. We define three variants of tracking by language specification: one relying...
We introduce a Multiple Granularity Analysis framework for video segmentation in a coarse-to-fine manner. We cast video segmentation as a spatio-temporal superpixel labeling problem. Benefiting from the bounding volume provided by off-the-shelf object trackers, we estimate the foreground/background superpixel labeling using the spatio-temporal multiple instance learning algorithm to obtain coarse...
We introduce an unsupervised semantic scene labeling approach that continuously learns and adapts semantic models discovered within a data stream. While closely related to unsupervised video segmentation, our algorithm is not designed to be an early video processing strategy that produces coherent over-segmentations, but instead, to directly learn higher-level semantic concepts. This is achieved with...
We investigate and improve self-supervision as a drop-in replacement for ImageNet pretraining, focusing on automatic colorization as the proxy task. Self-supervised training has been shown to be more promising for utilizing unlabeled data than other, traditional unsupervised learning methods. We build on this success and evaluate the ability of our self-supervised network in several contexts. On VOC...
We test this premise and explore representation spaces from a single deep convolutional network and their visualization to argue for a novel unified feature extraction framework. The objective is to utilize and re-purpose trained feature extractors, without the need for network retraining, on three remote sensing tasks, i.e., superpixel mapping, pixel-level segmentation, and semantic-based image visualization...
In this paper, an intermediary visual content verification method based on multi-level co-occurrences is studied. Co-occurrence statistics are generally used to determine relational properties between objects based on information collected from data. As such, these measures depend heavily on the relative number of occurrences and offer only limited accuracy when predicting objects in...
Visual words in the Bag-of-Visual-Words (BoVW) framework are independent of each other, which not only discards the spatial order between visual words but also leaves them lacking semantic information. Inspired by word embeddings, this study applies a similar embedding procedure to a large number of visual words. In this way, the corresponding embedding vectors of the visual words can be formulated...
In this paper, a novel line segment detection method based on a probability map is proposed. Firstly, the local gradient information is used to estimate whether a pixel belongs to a line segment, and a probability map is produced. The probability map combines gradient orientation with gradient magnitude information and can provide candidate points for edge chain extraction. Secondly, these candidate points...
Phase Contrast (PC) and Differential Interference Contrast (DIC) microscopy are two popular non-invasive techniques for monitoring live cells. Each of these two imaging modalities has its own advantages and disadvantages for visualizing specimens, so biologists need the two complementary modalities together to analyze specimens. In this paper, we investigate a conditional Generative Adversarial Network...
In recent times, unmanned aerial vehicles (UAVs) have become popular for applications such as rescue, surveillance, and mapping. However, the slow flight motion of quadrotor UAVs remains a challenging issue to overcome. Although several algorithms exist for the motion estimation and path planning of UAVs, most of them cannot be applied to fast flight in cluttered urban and forest environments...
This paper presents a Semantic Attribute assisted video SUMmarization framework (SASUM). Compared with traditional methods, SASUM has several innovative features. Firstly, we use a natural language processing tool to discover a set of keywords from an image and text corpora to form the semantic attributes of visual contents. Secondly, we train a deep convolutional neural network to extract visual features...
In this paper, we present a novel framework to incorporate high-level guidance and low-level features to automatically identify salient objects based on two ideas. The first one considers the specific location prior to encode visual saliency, while the second one estimates image saliency using contrast with respect to background regions. The proposed framework consists of the following three steps:...
Recent work in computer graphics has explored the synthesis of indoor spaces with furniture, accessories, and other layout items. In this work, we bridge the gap between the physical and virtual worlds: Given an input image of an interior or exterior space, and a general user specification of the desired furnishings and layout constraints, our method automatically furnishes the scene with a realistic...
Following the recent progress in image classification and captioning using deep learning, we develop a novel natural language person retrieval system based on an attention mechanism. More specifically, given the description of a person, the goal is to localize the person in an image. To this end, we first construct a benchmark dataset for natural language person retrieval. To do so, we generate bounding...
Computed tomography (CT) stands out among the exams used by computer-aided diagnosis in medical imaging, as it provides visualization of internal organs such as the lungs and their structures. This paper focuses on the segmentation of lungs using a three-dimensional region growing (3D RG) method and the ITK registration toolkit library. To evaluate the proposed segmentation method, we used 30 exams...
Computed Tomography (CT) scans are often employed to diagnose lung diseases, as abnormal tissue regions may indicate whether proper treatment is required. However, detecting specific regions containing abnormalities in a CT scan demands time and effort from specialists. Moreover, different parts of a single lung image may present both normal and abnormal characteristics, which makes inaccurate the classification...
In this paper, we propose an algorithm to detect a quadrilateral in an image in order to use a rectangle as a high-level feature. The accuracy of the proposed algorithm is verified through comparison with an existing quadrilateral detection algorithm.
The paper deals with the development of a system for automatic weld recognition using new information technologies based on cloud computing and a single-board computer in the context of Industry 4.0. The proposed system is based on a visual system for weld recognition and a neural network based on cloud computing for real-time weld evaluation, both implemented on a low-cost single-board computer. The...
The accessibility of RGB-D sensors has facilitated research in gesture recognition. Across various approaches, skeleton information has been found to be significant, especially for one-shot learning, by virtue of its minimal data requirements. We review state-of-the-art approaches for gesture recognition in one-shot learning. Based on the bag-of-visual-words (BoVW) model, this paper presents a study...