The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Surveillance video parsing, which segments the video frames into several labels, e.g., face, pants, left-leg, has wide applications [41, 8]. However, pixel-wisely annotating all frames is tedious and inefficient. In this paper, we develop a Single frame Video Parsing (SVP) method which requires only one labeled frame per video in training stage. To parse one particular frame, the video segment preceding...
In this paper, a deep learning based method is proposed to perform the basal membrane segmentation, which is a crucial prerequisite step in the diagnosis of Microinvasive Carcinoma of the Cervix (MIC). We regard this as an object contour segmentaion problem and modify the recently proposed contour segmentation deep learning model with adversarial training strategy. Traditional adversarial networks...
This paper addresses the problem of weakly supervised semantic image segmentation. Our goal is to label every pixel in a new image, given only image-level object labels associated with training images. Our problem statement differs from common semantic segmentation, where pixel-wise annotations are typically assumed available in training. We specify a novel deep architecture which fuses three distinct...
We propose a novel superpixel-based multi-view convolutional neural network for semantic image segmentation. The proposed network produces a high quality segmentation of a single image by leveraging information from additional views of the same scene. Particularly in indoor videos such as captured by robotic platforms or handheld and bodyworn RGBD cameras, nearby video frames provide diverse viewpoints...
Human parsing has recently attracted a lot of research interests due to its huge application potentials. However existing datasets have limited number of images and annotations, and lack the variety of human appearances and the coverage of challenging cases in unconstrained environment. In this paper, we introduce a new benchmark Look into Person (LIP) that makes a significant advance in terms of...
This paper presents a novel yet intuitive approach to unsupervised feature learning. Inspired by the human visual system, we explore whether low-level motion-based grouping cues can be used to learn an effective visual representation. Specifically, we use unsupervised motion-based segmentation on videos to obtain segments, which we use as pseudo ground truth to train a convolutional network to segment...
We propose a novel data-driven approach for automatically detecting and completing gaps in line drawings with a Convolutional Neural Network. In the case of existing inpainting approaches for natural images, masks indicating the missing regions are generally required as input. Here, we show that line drawings have enough structures that can be learned by the CNN to allow automatic detection and completion...
We investigate and improve self-supervision as a drop-in replacement for ImageNet pretraining, focusing on automatic colorization as the proxy task. Self-supervised training has been shown to be more promising for utilizing unlabeled data than other, traditional unsupervised learning methods. We build on this success and evaluate the ability of our self-supervised network in several contexts. On VOC...
Scene text detection has attracted great attention these years. Text potentially exist in a wide variety of images or videos and play an important role in understanding the scene. In this paper, we present a novel text detection algorithm which is composed of two cascaded steps: (1) a multi-scale fully convolutional neural network (FCN) is proposed to extract text block regions, (2) a novel instance...
While convolutional neural networks have gained impressive success recently in solving structured prediction problems such as semantic segmentation, it remains a challenge to differentiate individual object instances in the scene. Instance segmentation is very important in a variety of applications, such as autonomous driving, image captioning, and visual question answering. Techniques that combine...
Target segmentation of synthetic aperture radar (SAR) images is one of the challenging problems in SAR image interpretation, which often serves as a processing step for SAR target recognition. Target segmentation tries to separate the target from the background thus eliminating the interference of background noises or clutters. However, the segmentation may also discard a part of the target characteristics...
Semantic segmentation for remote sensing images is a critical process in the workflow of object-based image analysis. Recently, convolutional neural networks(CNNs) are powerful visual models that yield hierarchies of features. In this paper, we propose a deep convolutional encoder-decoder model for remote sensing images segmentation. Specifically, we rely on the encoder network to extract the high-level...
Automated annotation of urban areas from overhead imagery plays an essential role in many remote sensing applications. Generative Adversarial Nets (GANs) is one of the most effective ways to handle this problem. In this manuscript, two tricks were added in conditional GANs(cGANs) which learn the mapping from input image to output remote sensing image. All the experimental results demonstrated that...
This paper presents a computer vision-based methodology for human action recognition. First, the shape based pose features are constructed based on area ratios to identify the human silhouette in images. The proposed features are invariance to translation and scaling. Once the human body features are extracted from videos, different human actions are learned individually on the training frames of...
We introduce a novel loss max-pooling concept for handling imbalanced training data distributions, applicable as alternative loss layer in the context of deep neural networks for semantic image segmentation. Most real-world semantic segmentation datasets exhibit long tail distributions with few object categories comprising the majority of data and consequently biasing the classifiers towards them...
Fusing different sensors with different data modalities is a common technique to improve land cover classification performance in remote sensing. However, all modalities are rarely available for all test data, and this missing data problem poses severe challenges for multi-modal learning. Inspired by recent successes in deep learning, we propose as a remedy a convolutional neural network architecture...
In this paper, a new classifier under Bayesian framework is proposed to explore homogeneous region based low rank representation in hidden field for classification of hyperspectral imagery (HSI). This classifier integrates low rank representation and superpixel segmentation simultaneously, in which the HSI data is assumed to be lying in a low rank subspace within each homogeneous region of an estimated...
Visual words of Bag-of-Visual-Words (BoVW) framework are independent each other, which results in not only discarding spatial orders between visual words but also lacking semantic information. This study is inspired by word embeddings that a similar embedding procedure is applied to a large number of visual words. By this way, the corresponding embedding vectors of the visual words can be formulated...
The aim of this work is to explore the usefulness of face semantic segmentation for head pose estimation. We implement a multi-class face segmentation algorithm and we train a model for each considered pose. Given a new test image, the probabilities associated to face parts by the different models are used as the only information for estimating the head orientation. A simple algorithm is proposed...
We propose a new semantic segmentation method and the necessity of certainty for practical use of semantic segmentation in scene understanding. We implement a deep fully convolutional encoder-decoder neural network for semantic segmentation. This network architecture makes the segmentation accuracy improve by retaining boundary details in the extracted image representation. This accuracy means how...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.