The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The success of deep neural networks usually relies on a large number of labeled training samples, which unfortunately are not easy to obtain in practice. Unsupervised domain adaptation focuses on the problem where there is no labeled data in the target domain. In this paper, we propose a novel deep unsupervised domain adaptation method that learns transferable features. Different from most existing...
Embedded computer vision applications have been incorporated in industrial automation, improving quality and safety of processes. Such systems involve pattern classifiers for specific functions that, many times, demand high memory footprint and processing time. This work suggests a strategy to choose GLCM (Gray Level Co-occurrence Matrix) features for an SVM classifier that can reduce computer resources...
Our goal is to design architectures that retain the groundbreaking performance of CNNs for landmark localization and at the same time are lightweight, compact and suitable for applications with limited computational resources. To this end, we make the following contributions: (a) we are the first to study the effect of neural network binarization on localization tasks, namely human pose estimation...
Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors have demonstrated good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person pose estimator (SPPE), especially for methods that solely depend on human detection results. In this paper, we propose a novel regional multi-person pose...
We propose a novel framework for abnormal event detection in video that requires no training sequences. Our framework is based on unmasking, a technique previously used for authorship verification in text documents, which we adapt to our task. We iteratively train a binary classifier to distinguish between two consecutive video sequences while removing at each step the most discriminant features....
In this paper, we propose a novel domain-specific dataset named VegFru for fine-grained visual categorization (FGVC). While the existing datasets for FGVC are mainly focused on animal breeds or man-made objects with limited labelled data, VegFru is a larger dataset consisting of vegetables and fruits which are closely associated with the daily life of everyone. Aiming at domestic cooking and food...
The amount of data produced every day on the internet increases every day and with the increasing popularity of the social networks the number of published photos are huge, and those pictures contain several implicit or explicit brand logos. Detecting this logos in natural images can provide information about how widespread is a brand, discover unwanted copyright distribution, analyze marketing campaigns,...
In order to reduce the number of accidents caused by the call when the driver was driving, this paper uses the computer vision technology to dectet the behavior of the driver. Based on the constrained local models (CLM) to detect the characteristic changes of the mouth area, combine the HSV color space and the template matching to detect the hand characteristics to judge whether the driver has the...
Deep embeddings answer one simple question: How similar are two images? Learning these embeddings is the bedrock of verification, zero-shot learning, and visual search. The most prominent approaches optimize a deep convolutional network with a suitable loss function, such as contrastive loss or triplet loss. While a rich line of work focuses solely on the loss functions, we show in this paper that...
This paper investigates how far a very deep neural network is from attaining close to saturating performance on existing 2D and 3D face alignment datasets. To this end, we make the following 5 contributions: (a) we construct, for the first time, a very strong baseline by combining a state-of-the-art architecture for landmark localization with a state-of-the-art residual block, train it on a very large...
Subspace learning is one of the most foundational tasks in computer vision with applications ranging from dimensionality reduction to data denoising. As geometric objects, subspaces have also been successfully used for efficiently representing certain types of invariant data. However, methods for subspace learning from subspace-valued data have been notably absent due to incompatibilities with standard...
It is a simple task for humans to visually identify objects. However, computer-based image recognition remains challenging. In this paper we describe an approach for image recognition with specific focus on automated recognition of plants and flowers. The approach taken utilizes deep learning capabilities and unlike other approaches that focus on static images for feature classification, we utilize...
The contribution of this paper is to bridge the gap on understanding the mathematical structure and the computational implementation of a convolutional neural network using a minimal model. The proposed minimal convolutional neural network is presented using a layering approach. This approach provides a clear understanding of the main mathematical operations in a convolutional neural network. Hence,...
Person re-identification in public areas (such as airports, train stations and shopping malls) has recently received increased attention within computer vision research due, in part, to the demand for enhanced levels of security. Re-identifying subjects within non-overlapped camera networks can be considered as a challenging task. Illumination changes in different scenes, variations in camera resolutions,...
We introduce a new framework for learning dense correspondence between deformable 3D shapes. Existing learning based approaches model shape correspondence as a labelling problem, where each point of a query shape receives a label identifying a point on some reference domain; the correspondence is then constructed a posteriori by composing the label predictions of two input shapes. We propose a paradigm...
Deep Neural Networks trained on large datasets can be easily transferred to new domains with far fewer labeled examples by a process called fine-tuning. This has the advantage that representations learned in the large source domain can be exploited on smaller target domains. However, networks designed to be optimal for the source task are often prohibitively large for the target task. In this work...
Although Deep Convolutional Neural Networks (CNNs) have liberated their power in various computer vision tasks, the most important components of CNN, convolutional layers and fully connected layers, are still limited to linear transformations. In this paper, we propose a novel Factorized Bilinear (FB) layer to model the pairwise feature interactions by considering the quadratic terms in the transformations...
During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is a core task of various emerging industrial applications such as autonomous driving and medical imaging. However, to train CNNs requires a huge amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNN...
Articulated human pose estimation is a fundamental yet challenging task in computer vision. The difficulty is particularly pronounced in scale variations of human body parts when camera view changes or severe foreshortening happens. Although pyramid methods are widely used to handle scale changes at inference time, learning feature pyramids in deep convolutional neural networks (DCNNs) is still not...
Video image dataset is playing an essential role in design and evaluation of traffic vision methods. However, there is a longstanding difficulty that manually collecting and annotating large-scale diversified dataset from real scenes is time-consuming and prone to error. In 2016, we proposed the parallel vision methodology to tackle the issues of conventional vision computing approach in data collection,...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.