This paper introduces a deep-learning approach to photographic style transfer that handles a large variety of image content while faithfully transferring the reference style. Our approach builds upon the recent work on painterly transfer that separates style from the content of an image by considering different layers of a neural network. However, as is, this approach is not suitable for photorealistic...
We propose a technique that propagates information forward through video data. The method is conceptually simple and can be applied to tasks that require the propagation of structured information, such as semantic labels, based on video content. We propose a Video Propagation Network that processes video frames in an adaptive manner. The model is applied online: it propagates information forward without...
We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a data set of concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are...
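As a rough illustration of the alignment scoring described above, the match between a hidden unit and a semantic concept can be measured by comparing the unit's thresholded activation map against the concept's segmentation mask with an intersection-over-union score. The sketch below is illustrative only; the threshold, array shapes, and function name are assumptions, not the paper's exact procedure.

```python
import numpy as np

def unit_concept_iou(activation_map, concept_mask, threshold):
    """Score how well one hidden unit aligns with one semantic concept.

    activation_map : 2D float array of the unit's (upsampled) activations
    concept_mask   : 2D bool array marking pixels labeled with the concept
    threshold      : activation level above which the unit counts as 'on'
    """
    unit_mask = activation_map > threshold
    intersection = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return intersection / union if union > 0 else 0.0
```

Units whose best IoU against any concept exceeds a chosen cutoff would then be reported as interpretable detectors for that concept.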
Boundary and edge cues are highly beneficial in improving a wide variety of vision tasks such as semantic segmentation, object recognition, stereo, and object proposal generation. Recently, the problem of edge detection has been revisited and significant progress has been made with deep learning. While classical edge detection is a challenging binary problem in itself, the category-aware semantic...
When building artificial intelligence systems that can reason and answer questions about visual data, we need diagnostic tests to analyze our progress and discover shortcomings. Existing benchmarks for visual question answering can help, but have strong biases that models can exploit to correctly answer questions without reasoning. They also conflate multiple sources of error, making it hard to pinpoint...
Shadow removal is a challenging task as it requires the detection/annotation of shadows as well as semantic understanding of the scene. In this paper, we propose an automatic and end-to-end deep neural network (DeshadowNet) to tackle these problems in a unified manner. DeshadowNet is designed with a multi-context architecture, where the output shadow matte is predicted by embedding information from...
We investigate and improve self-supervision as a drop-in replacement for ImageNet pretraining, focusing on automatic colorization as the proxy task. Self-supervised training has been shown to be more promising for utilizing unlabeled data than other, traditional unsupervised learning methods. We build on this success and evaluate the ability of our self-supervised network in several contexts. On VOC...
In this paper, we present a novel approach to estimate the relative depth of regions in monocular images. There are several contributions. First, the task of monocular depth estimation is considered as a learning-to-rank problem, which offers several advantages compared to regression approaches. Second, monocular depth cues of human perception are modeled in a systematic manner. Third, we show that...
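To make the learning-to-rank framing concrete, one common formulation (not necessarily this paper's exact loss) supervises predicted depths on pairs of points with an ordinal label rather than with absolute depth values. In the sketch below, the label convention and function name are illustrative assumptions: +1 means point i should be farther than point j, -1 closer, and 0 roughly equal.

```python
import numpy as np

def pairwise_ranking_loss(depth_i, depth_j, label):
    """Loss for a pair of predicted depths with an ordinal label.

    label = +1 if point i should be farther than point j,
            -1 if point i should be closer,
             0 if the two points are at roughly equal depth.
    """
    diff = depth_i - depth_j
    if label == 0:
        return diff ** 2                         # equal pairs: penalize any gap
    return np.log(1.0 + np.exp(-label * diff))   # ordered pairs: logistic loss
```

The loss shrinks when the predicted ordering agrees with the label, so a network trained on many such pairs learns a globally consistent relative-depth ranking.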
This paper proposes a deep architecture for saliency detection by fusing pixel-level and superpixel-level predictions. Different from previous methods that either make dense pixel-level predictions with complex networks or region-level predictions for each region with fully-connected layers, this paper investigates an elegant route to make two-level predictions based on the same simple fully convolutional...
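The two-level fusion idea can be sketched as broadcasting each region's score back to its pixels and blending the result with the dense map. The linear blend and parameter names below are illustrative assumptions, not the paper's learned fusion scheme.

```python
import numpy as np

def fuse_saliency(pixel_pred, superpixel_labels, superpixel_pred, alpha=0.5):
    """Blend a dense pixel-level saliency map with a region-level one.

    pixel_pred        : HxW float saliency from the pixel branch
    superpixel_labels : HxW int map assigning each pixel a region id
    superpixel_pred   : 1D float saliency score per region
    alpha             : mixing weight for the pixel branch
    """
    region_map = superpixel_pred[superpixel_labels]  # broadcast region scores to pixels
    return alpha * pixel_pred + (1.0 - alpha) * region_map
```

The pixel branch preserves fine boundaries while the region branch suppresses isolated noisy responses, which is the usual motivation for combining the two.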
We describe an object replacement approach whereby privacy-sensitive objects in videos are replaced by abstract cartoons taken from clip art. Our approach uses a combination of computer vision, deep learning, and image processing techniques to detect objects, abstract details, and replace them with cartoon clip art. We conducted a user study (N=85) to discern the utility and effectiveness of our cartoon...
In this paper we introduce a novel multimodal boosting based solution for semantic segmentation of traffic scenarios. Local structure and context are captured from both monocular color and depth modalities in the form of image channels. We define multiple channel types at three different levels: low, intermediate and high order channels. The low order channels are computed using a multimodal multiresolution...
The perceptual image of a product plays a significant role in decision making when users choose among products whose basic functions are now homogeneous. Designers try to design products that meet the diverse demands of users. However, a large gap between designers and users exists owing to the subjectivity of designers' experience. An objective model to recognize the perceptual image of products is proposed...
We propose a stereo vision based obstacle detection and scene segmentation algorithm appropriate for autonomous vehicles. Our algorithm is based on an innovative extension of the Stixel world that avoids computing a disparity map. Ground plane and stixel distance estimation is improved by exploiting an online learned color model. Furthermore, stixel height estimation is improved by an innovative...
Nowadays, the “semantic gap” problem has greatly limited the development of image classification. The key to this problem is obtaining semantic information from the images. A semantic image feature extraction method that integrates eye movement information is proposed in this paper. Firstly, the low-level visual features of images are extracted. Secondly, weighted feature vectors of images are constructed...
Key frame extraction helps us produce a usable summary of a video. After studying a variety of diverse key frame extraction methods, we present a comparative analysis of the methods based on their important features and results. If we want to present an entire video within a short interval of time, a video summary is the best alternative. This has become a very essential work...
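One classic family of methods covered by such comparative studies selects a new key frame whenever the frame-to-frame histogram difference exceeds a threshold. The sketch below assumes grayscale frames; the bin count, threshold, and function name are illustrative choices, not taken from any specific surveyed method.

```python
import numpy as np

def select_key_frames(frames, threshold):
    """Pick key frames where the gray-level histogram changes sharply.

    frames    : list of 2D uint8 arrays (grayscale video frames)
    threshold : minimum histogram L1 distance that triggers a new key frame
    """
    keys = [0]  # the first frame is always a key frame
    prev_hist, _ = np.histogram(frames[0], bins=16, range=(0, 256))
    for i, frame in enumerate(frames[1:], start=1):
        hist, _ = np.histogram(frame, bins=16, range=(0, 256))
        if np.abs(hist - prev_hist).sum() > threshold:
            keys.append(i)
        prev_hist = hist
    return keys
```

Histogram-based selection is cheap and insensitive to small object motion, but it can miss gradual transitions, which is exactly the kind of trade-off a comparative analysis would examine.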
We investigate the task of recognizing objects of daily use in human environments purely based on object descriptions given in natural language. In particular, we present an approach to transform phrases stated in natural language that describe such objects by their visual appearance into formal, semantic representations of their perceptual characteristics, which in turn can be used in a robot perception...
We investigate the effect of word typicality — the degree of membership of a word to its superordinate category — on the N400 event-related potential (ERP) using a single-trial detection approach based on spatiotemporal beamforming. Unlike most previous studies, which use mostly concrete categories (imaginable objects), we considered a total of 6 basic categories: three abstract and unimaginable...
The movie domain is one of the most common scenarios to test and evaluate recommender systems. These systems are often implemented through a collaborative filtering model, which relies exclusively on the user's feedback on items, ignoring content features. Content-based filtering models are nevertheless a potentially good strategy for recommendation, even though identifying relevant semantic representation...
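A minimal content-based baseline of the kind the abstract contrasts with collaborative filtering ranks items by cosine similarity between each item's feature vector and a user profile built from the user's liked items. The feature representation and names below are illustrative assumptions, not the semantic representation the work itself investigates.

```python
import numpy as np

def recommend(user_profile, item_features, top_k=3):
    """Rank items by cosine similarity to a user's content profile.

    user_profile  : 1D float vector (e.g. averaged features of liked movies)
    item_features : 2D array, one feature vector per candidate item
    top_k         : number of item indices to return, best first
    """
    norms = np.linalg.norm(item_features, axis=1) * np.linalg.norm(user_profile)
    scores = item_features @ user_profile / np.maximum(norms, 1e-12)
    return np.argsort(-scores)[:top_k]
```

Because the ranking uses only item content, such a model can score items no user has rated yet, which is the usual argument for content-based filtering despite the difficulty of finding good semantic representations.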
Image caption generation has become a rising topic in computer vision and artificial intelligence. In order to avoid stiff, formulaic descriptions, we intend to extract richer features using a convolutional neural network (CNN). A neural, probabilistic framework is consequently proposed which combines a CNN with a special form of recurrent neural network (RNN) to produce an end-to-end image...
Video segmentation is the basis of video analysis and mining, and its results directly affect the accuracy of follow-up processing. Predecessors have done a lot of research on video segmentation: very good results have been achieved for abrupt shot transitions (cuts), but there is still no good method for gradual transitions. Because football is a popular sport, it has very practical significance for the process...