The FlowNet demonstrated that optical flow estimation can be cast as a learning problem. However, the state of the art with regard to the quality of the flow has still been defined by traditional methods. Particularly on small displacements and real-world data, FlowNet cannot compete with variational methods. In this paper, we advance the concept of end-to-end learning of optical flow and make it...
In this work we train in an end-to-end manner a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture. Such a network can act like a Swiss Army knife for vision tasks; we call it UberNet to indicate its overarching nature. The main contribution of this work consists in handling challenges that emerge when scaling up to many tasks. We...
We propose a novel crowd counting model that maps a given crowd scene to its density. Crowd analysis is complicated by a myriad of factors, such as inter-occlusion between people due to extreme crowding, high similarity of appearance between people and background elements, and large variability of camera viewpoints. Current state-of-the-art approaches tackle these factors by using multi-scale CNN architectures,...
Deep convolutional neural networks (CNNs) have proven highly effective for visual recognition, where learning a universal representation from the activations of a convolutional layer is a fundamental problem. In this paper, we present Fisher Vector encoding with Variational Auto-Encoder (FV-VAE), a novel deep architecture that quantizes the local activations of a convolutional layer in a deep generative...
Robust perception-action models should be learned from training data with diverse visual appearances and realistic behaviors, yet current approaches to deep visuomotor policy learning have been generally limited to in-situ models learned from a single vehicle or simulation environment. We advocate learning a generic vehicle motion model from large scale crowd-sourced video data, and develop an end-to-end...
We present an improved three-step pipeline for the stereo matching problem and introduce multiple novelties at each stage. We propose a new highway network architecture for computing the matching cost at each possible disparity, based on multilevel weighted residual shortcuts, trained with a hybrid loss that supports multilevel comparison of image patches. A novel post-processing step is then introduced,...
In this work, we study a poorly understood trade-off between accuracy and runtime costs for deep semantic video segmentation. While recent work has demonstrated advantages of learning to speed-up deep activity detection, it is not clear if similar advantages will hold for our very different segmentation loss function, which is defined over individual pixels across the frames. In deep video segmentation,...
In this work, we build a generic architecture of Convolutional Neural Networks to discover empirical properties of neural networks. Our first contribution is to introduce a state-of-the-art framework that depends upon few hyper parameters and to study the network when we vary them. It has no max pooling, no biases, only 13 layers, is purely convolutional and yields up to 95.4% and 79.6% accuracy respectively...
Linking two data sources is a basic building block in numerous computer vision problems. Canonical Correlation Analysis (CCA) achieves this by utilizing a linear optimizer in order to maximize the correlation between the two views. Recent work makes use of non-linear models, including deep learning techniques, that optimize the CCA loss in some feature space. In this paper, we introduce a novel, bi-directional...
We propose an end-to-end architecture for joint 2D and 3D human pose estimation in natural images. Key to our approach is the generation and scoring of a number of pose proposals per image, which allows us to predict 2D and 3D pose of multiple people simultaneously. Hence, our approach does not require an approximate localization of the humans for initialization. Our architecture, named LCR-Net, contains...
We present an approach to efficiently detect the 2D pose of multiple people in an image. The approach uses a nonparametric representation, which we refer to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image. The architecture encodes global context, allowing a greedy bottom-up parsing step that maintains high accuracy while achieving realtime performance,...
Expected product quality is affected by multiple parameters in complex manufacturing processes. Product quality prediction makes it possible to design better system parameters at an early production stage. Many existing approaches fail to provide favorable results due to the shallow architecture of the prediction model, which cannot sufficiently learn the features of multiple parameters. To address this...
Nowadays, applications based on digit and character recognition have become much more reliable thanks to the rapid development of deep neural network (DNN) architectures and the constantly increasing efficiency of computing resources. Many methods have been proposed to improve the performance of DNNs, such as the ReLU (Rectified Linear Unit), which is a widely used alternative...
Deep-learning-based methods often suffer from insufficient training samples when they are directly applied to Synthetic Aperture Radar (SAR) image classification, which in turn leads to poor performance. To alleviate this problem, this paper presents a feature-fused approach, in which several statistical features of SAR images are extracted and integrated into the first layer of a typical...
Speckle removal from single-channel and multi-dimensional SAR remains a difficult problem. In this paper, we investigate the use of a Convolutional Neural Network (CNN), previously applied to the Super-Resolution problem, for speckle removal. Because speckle noise statistics are signal dependent, we train the neural network on the residual image formed by the ratio of the observed intensity...
In this work we consider the problem of developing algorithms that automatically identify small-scale solar photovoltaic arrays in high resolution aerial imagery. Such algorithms potentially offer a faster and cheaper solution to collecting small-scale photovoltaic (PV) information, such as their location, capacity, and the energy they produce. Here we build on previous algorithmic work by employing...
Fusing different sensors with different data modalities is a common technique to improve land cover classification performance in remote sensing. However, all modalities are rarely available for all test data, and this missing data problem poses severe challenges for multi-modal learning. Inspired by recent successes in deep learning, we propose as a remedy a convolutional neural network architecture...
Accelerating the inference of a trained DNN is a well-studied subject. In this paper we switch the focus to the training of DNNs. The training phase is compute-intensive, demands complicated data communication, and contains multiple levels of data dependencies and parallelism. This paper presents an algorithm/architecture space exploration of efficient accelerators to achieve better network convergence...
Predicting the interestingness of media content remains an important but challenging research subject. The difficulty comes first from the fact that, besides being a high-level semantic concept, interestingness is highly subjective and no global definition has yet been agreed upon. This paper presents the use of up-to-date deep learning techniques for solving the task. We perform experiments with both...
Facial attractiveness computation is a challenging task because of the lack of labeled data and discriminative features. In this paper, an end-to-end label distribution learning (LDL) framework with a deep convolutional neural network (CNN) and geometric features is proposed to meet these two challenges. Unlike previous work, we recast this task as an LDL problem. Compared with the single...