In this paper we present an autonomous detection approach for airborne surveillance in maritime scenarios. This approach is robust to sun glare, waves and scale variation. Additionally, we introduce a new metric to evaluate detection and tracking results that is more adequate for these scenarios. The proposed detection method is evaluated using videos from different monitoring missions and its results...
Multi-person pose estimation in the wild is challenging. Although state-of-the-art human detectors have demonstrated good performance, small errors in localization and recognition are inevitable. These errors can cause failures for a single-person pose estimator (SPPE), especially for methods that solely depend on human detection results. In this paper, we propose a novel regional multi-person pose...
In this paper, we address the problem of estimating the positions of human joints, i.e., articulated pose estimation. Recent state-of-the-art solutions model two key issues, joint detection and spatial configuration refinement, together using convolutional neural networks. Our work mainly focuses on spatial configuration refinement by reducing variations of human poses statistically, which is motivated...
Comparing pedestrian images observed by different cameras to determine whether they show the same person is an important technique in surveillance systems. This technique is called person re-identification. Most work on person re-identification assumes that occlusion does not occur. However, since occlusion occurs frequently in surveillance systems and affects accuracy, it is necessary...
We present a novel single-shot text detector that directly outputs word-level bounding boxes in a natural image. We propose an attention mechanism which roughly identifies text regions via an automatically learned attentional map. This substantially suppresses background interference in the convolutional features, which is the key to producing accurate inference of words, particularly at extremely...
Automatically describing an image in natural language has been an emerging challenge in both computer vision and natural language processing. In this paper, we present Long Short-Term Memory with Attributes (LSTM-A) - a novel architecture that integrates attributes into the successful Convolutional Neural Networks (CNNs) plus Recurrent Neural Networks (RNNs) image captioning framework,...
The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In...
Many human activities involve object manipulations aiming to modify the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identify...
Feature matching is fundamental to many vision tasks. Due to the low visibility of images in underwater environments, traditional pixel-based matching methods suffer from missed or erroneous matches. Recently, superpixel-based features have been applied to image feature analysis. However, most existing methods are dedicated to rectified stereo matching with images captured in the air. This paper...
Robust covariant local feature detectors are important for detecting local features that are (1) discriminative of the image content and (2) can be repeatably detected at consistent locations when the image undergoes diverse transformations. Such detectors are critical for applications such as image search and scene reconstruction. Many learning-based local feature detectors address one of these two...
Training object class detectors typically requires a large set of images with objects annotated by bounding boxes. However, manually drawing bounding boxes is very time consuming. In this paper we greatly reduce annotation time by proposing center-click annotations: we ask annotators to click on the center of an imaginary bounding box which tightly encloses the object instance. We then incorporate...
Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult...
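The equivariance claim in the abstract above can be checked numerically. The following is a minimal sketch, not the authors' method: it assumes a single cross-correlation filter with wrap-around padding (so that translations are exact), and the helper name `circular_correlate2d` is hypothetical. It compares "shift then filter" with "filter then shift", and does the same for a 90-degree rotation.

```python
import numpy as np

def circular_correlate2d(image, kernel):
    """Cross-correlate image with kernel using wrap-around (circular) padding."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # Rolling by (-i, -j) brings pixel (y + i, x + j) to position (y, x).
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))  # arbitrary filter with no rotational symmetry

# Translation: shifting the input shifts the feature map by the same amount.
shift = (2, 3)
shifted_then_filtered = circular_correlate2d(np.roll(image, shift, axis=(0, 1)), kernel)
filtered_then_shifted = np.roll(circular_correlate2d(image, kernel), shift, axis=(0, 1))
translation_equivariant = np.allclose(shifted_then_filtered, filtered_then_shifted)

# Rotation: rotating the input does not simply rotate the feature map.
rotated_then_filtered = circular_correlate2d(np.rot90(image), kernel)
filtered_then_rotated = np.rot90(circular_correlate2d(image, kernel))
rotation_equivariant = np.allclose(rotated_then_filtered, filtered_then_rotated)

print("translation equivariant:", translation_equivariant)
print("rotation equivariant:", rotation_equivariant)
```

With an arbitrary filter the translation check passes and the rotation check fails, which is the gap the abstract describes.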
Matching local image descriptors is a key step in many computer vision applications. For more than a decade, hand-crafted descriptors such as SIFT have been used for this task. Recently, multiple new descriptors learned from data have been proposed and shown to improve on SIFT in terms of discriminative power. This paper is dedicated to an extensive experimental evaluation of learned local features...
Relationships among objects play a crucial role in image understanding. Despite the great success of deep learning techniques in recognizing individual objects, reasoning about the relationships among objects remains a challenging task. Previous methods often treat this as a classification problem, considering each type of relationship (e.g. ride) or each distinct visual phrase (e.g. person-ride-horse)...
In our overly connected world, the automatic recognition of virality – the quality of an image or video to be rapidly and widely spread in social networks – is of crucial importance, and has recently awakened the interest of the computer vision community. Concurrently, recent progress in deep learning architectures showed that global pooling strategies allow the extraction of activation...
This paper presents an approach for pedestrian detection and tracking with infrared imagery. The detection phase is performed by the AdaBoost algorithm based on Haar-like features. The AdaBoost classifier is trained with datasets generated from infrared images. The number of negative images used for training with the AdaBoost algorithm is 3000. For positive training, 1000 samples are used. After detecting the pedestrian...
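For readers unfamiliar with Haar-like features, the sketch below shows how one two-rectangle feature is commonly computed from an integral image. It is an illustration under stated assumptions, not the paper's implementation: the function names and the toy 4x4 image are hypothetical, and the AdaBoost training stage is not shown.

```python
import numpy as np

def integral_image(img):
    """Zero-padded summed-area table: ii[y, x] is the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left half minus right half of the window; width is assumed even."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum

# Toy image: bright left half, dark right half (a vertical edge).
img = np.zeros((4, 4), dtype=np.int64)
img[:, :2] = 10
ii = integral_image(img)

total = rect_sum(ii, 0, 0, 4, 4)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
print(total, feature)
```

Because every rectangle sum costs four lookups regardless of its size, a boosted classifier can evaluate many such features per window cheaply.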
This paper proposes an efficient real-time method for sterile zone monitoring with human verification. The proposed method consists of two main parts: a motion detection module and a human verification module. The role of the motion detection module is to segment foreground objects from the background. A probabilistic foreground detector based on a Gaussian Mixture Model (GMM) is used. Region of interest (ROI) obtained...
Palm vein recognition is a new biometric identification technology. Horizontal rotation, translation, tilting and loss of local vein information in palm vein images greatly affect the recognition rate. To solve these problems, this paper extracts four kinds of local invariant features: Scale Invariant Feature Transform (SIFT), Affine-SIFT (ASIFT), Harris-Laplace and Maximally Stable Extremal...
Feature matching quality strongly influences the accuracy of most computer vision tasks. This led to impressive advances in keypoint detection, descriptor calculation, and feature matching itself. To compare different approaches and evaluate their quality, datasets from related tasks are used. Unfortunately, none of these datasets actually provide ground truth (GT) feature matches. Thus, matches can...
In today's world, the need for autonomous mobile robots and vehicles is increasing. Safe autonomous movement demands reliable and fast detection algorithms. Histogram of Oriented Gradients (HOG) descriptors significantly outperform existing feature sets for human detection. However, this method produces many type I errors. The number of these errors can be decreased...
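As a rough illustration of what a HOG descriptor measures, the sketch below builds the orientation histogram for a single cell. It is a simplified version under stated assumptions: gradients come from `np.gradient`, orientations are unsigned (0 to 180 degrees), each pixel votes into one bin without the interpolation or block normalization used in full HOG pipelines, and the function name is hypothetical.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    cell = cell.astype(float)
    gy, gx = np.gradient(cell)            # derivatives along rows, then columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0   # fold into [0, 180)
    bin_width = 180.0 / n_bins
    # Hard assignment to the nearest lower bin; clip guards the upper boundary.
    bins = np.minimum((angle // bin_width).astype(int), n_bins - 1)
    return np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)

# Toy cell: intensity increases left to right, so every gradient points along x.
cell = np.tile(np.arange(8), (8, 1))
hist = cell_orientation_histogram(cell)
print(hist)
```

For this ramp image all gradient energy falls into the first bin, since every pixel has a horizontal gradient of unit magnitude.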