Extreme weather recognition using GoogLeNet can achieve excellent performance, far superior to that of conventional methods. However, the complexity of GoogLeNet is relatively high. Furthermore, on small-scale data, GoogLeNet usually cannot match the performance it achieves on large-scale data. In this paper, a novel dual fine-tuning strategy is proposed to train the GoogLeNet model. Firstly,...
While recent advances in deep learning have pushed the state of the art in object detection and semantic segmentation, this progress often comes at the cost of considerable annotation effort. Thus, weakly supervised learning has attracted increasing interest. In this paper, a novel approach to the challenging task of weakly supervised segmentation and object localization is presented. The problem is tackled from...
Convolutional neural networks (CNNs) have achieved remarkable results in hyperspectral image (HSI) classification. However, how to effectively exploit spatial context, which has been shown to be crucial for HSI classification, is still an open issue. Current CNNs for hyperspectral classification are restricted to a small scale due to small-scale input and limited training samples. Therefore,...
The process of spatially aligning two or more images acquired from different devices or imaging protocols is known as multi-modal image registration. As the similarity measure used is one of the most significant aspects of this process, certain measures have been proposed to enhance multi-modal image registration. However, the currently available measures are either not sufficiently accurate or are...
Thanks to the advances in deep learning techniques and the increasing size of training data, groundbreaking progress on image classification has recently been achieved. However, focusing on distinguishing usually hundreds of sub-categories belonging to the same basic-level category, fine-grained recognition of unusual natural object categories (e.g., a special type of insect) still remains challenging...
Recently, the Two-Stream Convolutional Network has achieved remarkable performance. In particular, by capturing appearance and motion information, spatial-temporal two-stream networks bring noticeable improvement. On the other hand, the dynamic image, which is a powerful representation for videos, has also been confirmed to provide complementary information to spatial appearance. Inspired by these works, we...
Despite the appeal of deep neural networks that largely replace the traditional handmade filters, they still suffer from isolated cases that cannot be properly handled only by the training of convolutional filters. Abnormal factors, including real-world noise, blur, or other quality degradations, ruin the output of a neural network. These unexpected problems can produce critical complications, and...
Recently, many works on counting people have been published. However, when applied to real-world train station videos, they show many limitations due to problems such as low resolution, heavy occlusion, various density levels and perspective distortions. In this paper, following the recent trend of regression-based density estimation, we present a linear regression approach based on...
Stochastic Gradient Descent (SGD) is the method of choice for large scale problems, most notably in deep learning. Recent studies target improving convergence and speed of the SGD algorithm. In this paper, we equip the SGD algorithm and its advanced versions with an intriguing feature, namely handling constrained problems. Constraints such as orthogonality are pervasive in learning theory. Nevertheless...
A multi-view multi-target correspondence framework employing deep learning on overlapping cameras for identity-aware tracking in the presence of occlusion is proposed. Our complete pipeline of detection, multi-view correspondence, fusion and tracking, inspired by AI, greatly improves person correspondence across multiple wide-angled views over traditionally used feature sets and handcrafted descriptors...
Image classification is one of the critical tasks in hyperspectral remote sensing. In recent years, significant improvements have been achieved by various classification methods. However, mixed spectral responses from different ground materials still create confusion in complex scenes. In this regard, unmixing approaches are being successfully carried out to decompose mixed pixels into a collection...
In this paper, a unified deep convolutional architecture is proposed to address the problems in the person re-identification task. The proposed method adaptively learns the discriminative deep mid-level features of a person and constructs the correspondence features between an image pair in a data-driven manner. Previous Siamese-structure deep learning approaches focus only on pair-wise matching...
Object deformation and occlusion are ubiquitous problems for visual tracking. Though many efforts have been made to handle object deformation and occlusion, most existing tracking algorithms fail in case of large deformation and severe occlusion. In this paper, we propose a graph learning-based tracking framework to handle both challenges. For each consecutive frame pair, we construct a weighted graph,...
Image smoothing is a fundamental technology which aims to preserve image structure and remove insignificant texture. Balancing the trade-off between preserving structure and suppressing texture, however, is not a trivial task. This is because existing methods rely on only one guidance to infer structure or texture and assume the other is dependent. However, in many cases, textures are composed of...
Region-based image retrieval has been proven to be effective in finding relevant images. In this paper, we propose a cuboid image segmentation method which results in rectangle image partitions. Rectangle partitions are more suitable for image compression, retrieval and other image operations. We apply partitions in image retrieval in this paper. Our experimental results have shown that (1) the proposed...
Using color histograms in automatic emotion recognition systems raises several issues. One of the important challenges is to determine the appropriate number of bins in the color histogram to achieve the highest possible recognition performance with minimal computation. This research focuses on emotion recognition induced by visual contents of images, or REVC for short, using the ARTphoto dataset. Twenty-two...
Optical Character Recognition (OCR) in scanned documents has been a well-studied problem in the past. However, when the characters come from natural scenes, it becomes a much more challenging problem, as these images present many difficulties, e.g., illumination variance, cluttered backgrounds, and geometric distortion. In this paper, we propose to use a deep learning method that based...
Color mapping for 3D models with captured images is a classical problem in computer vision. Typically, registration between the 3D model and the images is assumed to be provided; otherwise, corresponding points need to be labeled. In many applications, the 3D model and the images are acquired from different devices; since registration cannot be obtained directly, manual labeling has to be adopted. In this paper,...
We propose a novel middle level estimation of traffic scenes: Collision Risk Rating (CRR). Given a video sequence from a dashboard camera as input, the objective is to estimate a rate that describes "how likely a collision could happen". CRR's problem setting is similar to that of video classification, but it is more complicated and requires rich feature representations to capture the different...
Using spatio-temporal features is popular for action recognition. However, existing methods embed these local features into a global representation. Orders and correlations among local motions of each action are missing. This can make it difficult to distinguish closely related actions. This paper proposes a solution to address this challenge by encoding correlations of movements. Space-time interest...