Convolutional neural networks (CNNs) provide the current state of the art in visual object classification, but they are far less accurate when classifying partially occluded objects. A straightforward way to improve classification under occlusion conditions is to train the classifier using partially occluded object examples. However, training the network on many combinations of object instances and...
The contribution of this paper is to bridge the gap in understanding the mathematical structure and the computational implementation of a convolutional neural network (CNN) using a minimal model (Minimal CNN). The proposed minimal CNN is presented using a layering approach. This approach provides a concise and accessible understanding of the main mathematical operations of a CNN. Hence, it benefits...
Most recent CNN architectures use average pooling as a final feature encoding step. In the field of fine-grained recognition, however, recent global representations like bilinear pooling offer improved performance. In this paper, we generalize average and bilinear pooling to “α-pooling”, allowing for learning the pooling strategy during training. In addition, we present a novel way to visualize decisions...
Detecting potential aerial threats such as drones with computer vision is of paramount interest for the protection of critical locations. Such a system should efficiently prevent false alarms caused by non-malign objects, such as birds, that intrude into the image plane. In this paper, we propose an improved version of a previously presented Speeded-up Robust Feature Transform (SURF) based...
Recently, kernelized correlation filter-based trackers have attracted the interest of many researchers and achieved good results in the field of tracking. However, current tracking models based on kernelized correlation filters cannot deal effectively with changes in target appearance and scale. Therefore, in this paper, we intend to solve these two problems and improve the robustness of...
Convolutional Neural Networks (CNNs) with Bilinear Pooling, initially in their full form and later using compact representations, have yielded impressive performance gains on a wide range of visual tasks, including fine-grained visual categorization, visual question answering, face recognition, and description of texture and style. The key to their success lies in the spatially invariant modeling...
Multi-label image classification is a fundamental but challenging task in computer vision. Great progress has been achieved by exploiting semantic relations between labels in recent years. However, conventional approaches are unable to model the underlying spatial relations between labels in multi-label images, because spatial annotations of the labels are generally not provided. In this paper, we...
Deep neural networks require a large amount of labeled training data during supervised learning. However, collecting and labeling so much data might be infeasible in many cases. In this paper, we introduce a deep transfer learning scheme, called selective joint fine-tuning, for improving the performance of deep learning tasks with insufficient training data. In this scheme, a target learning task...
In this paper we utilize the first large-scale "in-the-wild" (Aff-Wild) database, which is annotated in terms of the valence-arousal dimensions, to train and test an end-to-end deep neural architecture for the estimation of continuous emotion dimensions based on visual cues. The proposed architecture is based on jointly training convolutional (CNN) and recurrent neural network (RNN) layers,...
Visual tracking is intrinsically a temporal problem. Discriminative Correlation Filters (DCF) have demonstrated excellent performance for high-speed generic visual object tracking. Built upon their seminal work, there has been a plethora of recent improvements relying on convolutional neural networks (CNNs) pretrained on ImageNet as feature extractors for visual tracking. However, most of their works...
We describe an end-to-end system for explainable automatic job candidate screening from video CVs. In this application, audio, face and scene features are first computed from an input video CV, using rich feature sets. These multiple modalities are fed into modality-specific regressors to predict apparent personality traits and a variable that predicts whether the subject will be invited to the interview...
In today's world, the need for autonomous mobile robots and vehicles is increasing. Safe autonomous movement demands reliable and fast detection algorithms. Histogram of Oriented Gradients (HOG) descriptors significantly outperform existing feature sets for human detection. However, the method produces many type I errors. The number of these errors can be decreased...
The manual process for privacy setting could be very time-consuming and challenging for common users. By assuming that there are hidden correlations between the visual properties of images (i.e., visual features) or object classes and the privacy settings for image sharing, an effective algorithm is developed in this paper to achieve automatic prediction of image privacy, so that the best-matching...
This work introduces the one-class slab SVM (OCSSVM), a one-class classifier that aims at improving the performance of the one-class SVM. The proposed strategy reduces the false positive rate and increases the accuracy of detecting instances from novel classes. To this end, it uses two parallel hyperplanes to learn the normal region of the decision scores of the target class. OCSSVM extends one-class...
Affective computing, particularly emotion and personality trait recognition, is of increasing interest in many research disciplines. The interplay of emotion and personality shows itself in the first impression left on other people. Moreover, the ambient information, e.g. the environment and objects surrounding the subject, also affects these impressions. In this work, we employ pre-trained Deep Convolutional...
In order to cope with the complex variation of target appearance during visual tracking, a robust tracking algorithm based on multi-scale kernelized least squares (KLS) is proposed. First, by showing that the dense sampling set of translated patches is circulant, and using the well-established theory of circulant matrices, kernelized least squares is efficiently computed with the fast Fourier transform (FFT)...
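The circulant argument in the entry above can be illustrated with a minimal sketch. It assumes a 1-D signal, a Gaussian kernel, and hypothetical function names (`kls_train_fft`, `kls_train_direct`); it is not the multi-scale tracker the abstract describes. When the training samples are all cyclic shifts of one base signal, the kernel matrix is circulant, so the regularized least-squares system can be solved element-wise in the Fourier domain instead of by inverting an n-by-n matrix.

```python
import numpy as np

def gaussian_autocorrelation(x, sigma):
    """Gaussian kernel between x and every cyclic shift of x, via the FFT."""
    x_hat = np.fft.fft(x)
    # corr[i] = dot(x, roll(x, i)), the circular autocorrelation of x
    corr = np.real(np.fft.ifft(x_hat * np.conj(x_hat)))
    sq_dist = np.maximum(2.0 * np.dot(x, x) - 2.0 * corr, 0.0)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def kls_train_fft(x, y, sigma=1.0, lam=1e-2):
    """Solve (K + lam*I) alpha = y where K is circulant, in O(n log n)."""
    k = gaussian_autocorrelation(x, sigma)
    alpha_hat = np.fft.fft(y) / (np.fft.fft(k) + lam)
    return np.real(np.fft.ifft(alpha_hat))

def kls_train_direct(x, y, sigma=1.0, lam=1e-2):
    """Reference solution: build the full kernel matrix over all shifts."""
    n = len(x)
    shifts = np.stack([np.roll(x, i) for i in range(n)])
    sq = ((shifts[:, None, :] - shifts[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    return np.linalg.solve(K + lam * np.eye(n), y)

# Usage: both routes should give the same dual coefficients.
rng = np.random.default_rng(0)
x = rng.normal(size=32)                                  # base sample
y = np.exp(-0.5 * (np.arange(32) - 16) ** 2 / 4.0)       # regression targets
alpha_fast = kls_train_fft(x, y, sigma=4.0)
alpha_ref = kls_train_direct(x, y, sigma=4.0)
```

The direct route is included only as a check; the FFT route never forms the kernel matrix, which is where the efficiency comes from.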
Occlusion is a challenging problem in visual object tracking. Most state-of-the-art trackers may learn the appearance of the occluding object when the target becomes occluded by other objects in the scene. This paper proposes a novel approach to detecting occlusion by dividing the target into several patches and computing the peak-to-sidelobe ratio of every response map. Furthermore, our method can calculate...
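The peak-to-sidelobe ratio mentioned in the entry above is commonly defined as the peak of a correlation response minus the mean of the surrounding sidelobe, divided by the sidelobe's standard deviation. The sketch below follows that common definition; the function name, the size of the excluded window around the peak, and the synthetic response maps are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def peak_to_sidelobe_ratio(response, exclude=5):
    """PSR = (peak - mean(sidelobe)) / std(sidelobe).

    The sidelobe is every pixel outside a (2*exclude+1)-wide window
    centred on the peak (window size is an assumed parameter).
    """
    response = np.asarray(response, dtype=float)
    r0, c0 = np.unravel_index(np.argmax(response), response.shape)
    peak = response[r0, c0]
    mask = np.ones(response.shape, dtype=bool)
    mask[max(r0 - exclude, 0):r0 + exclude + 1,
         max(c0 - exclude, 0):c0 + exclude + 1] = False
    sidelobe = response[mask]
    return (peak - sidelobe.mean()) / (sidelobe.std() + 1e-12)

# Usage: a sharp peak gives a high PSR, a weak peak a low one.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=(64, 64))
clear = noise.copy()
clear[32, 32] = 1.0       # confident response, e.g. a visible patch
weak = noise.copy()
weak[32, 32] = 0.05       # degraded response, e.g. an occluded patch
psr_clear = peak_to_sidelobe_ratio(clear)
psr_weak = peak_to_sidelobe_ratio(weak)
```

A per-patch occlusion test could then compare each patch's PSR against a threshold, though the threshold and patch layout would have to come from the paper itself.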
State-of-the-art image classification methods require an intensive learning stage and a considerable number of training images. Recently, with the introduction of these models (and in particular convolutional neural networks (CNNs)), it is believed that the best solution to achieve a system with high performance on scene classification is to learn deep scene features using CNNs. While this can be...
The deep convolutional neural network (CNN) is one of the most popular methods for image processing and recognition. Many research works aim to improve the performance of CNNs. However, as an important part of CNNs, the convolution kernel has rarely been discussed. As one Original Convolution Kernel (OCK) can only detect one type of visual feature with a fixed deformation, the networks using OCKs may...
Visual localization is the process of finding the location of a camera from the appearance of the images it captures. In this work, we propose an observation model that allows the use of images for particle filter localization. To achieve this, we exploit the capabilities of Gaussian Processes to calculate the likelihood of the observation for any given pose, in contrast to methods which restrict...