In 3D object recognition, local feature-based recognition is known to be robust against occlusion and clutter. Local feature estimation requires feature correspondences, including feature extraction and matching. Feature extraction is normally a two-stage process that estimates keypoints and keypoint descriptors, and existing studies show repeatability to be a good indicator of keypoint feature detector...
Image annotation methods construct a Tag distance matrix whose entries show the relevancy of tags for each test image. Computing this matrix more accurately yields better annotation results. The aim of our two methods is to improve the accuracy of the Tag distance matrix using the class information already available in most datasets. If the class information is not available, extracting important...
Salient object detection has been greatly boosted by deep convolutional neural networks (CNNs), especially fully convolutional neural networks (FCNs). Nowadays, it is possible to train an end-to-end deep model for salient object detection. However, the diverse scales of salient objects still pose major challenges for these state-of-the-art methods. In this paper, we investigate how different...
We propose an action parsing algorithm to parse a video sequence containing an unknown number of actions into its action segments. We argue that context information, particularly the temporal information about other actions in the video sequence, is valuable for action segmentation. The proposed parsing algorithm temporally segments the video sequence into action segments. The optimal temporal segmentation...
We present OCSB, a novel online Bayesian framework for imbalanced multi-class data streams. To the best of our knowledge, OCSB is the first online method applying both cost-sensitive learning and sampling techniques in a single classifier to deal with class imbalance learning. Specifically, an artificial cost matrix is designed and adapted in a sequential manner to not only boost the accuracy of minority...
Spectral methods refer to the problem of finding eigenvectors of an affinity matrix. Despite promising performance in revealing manifold structure, their applicability to large-scale problems is limited by the high computational cost of eigendecomposition. The Nyström method, a classic approach, seeks an approximate solution by first solving a smaller eigenproblem defined on a subset of landmarks,...
In this paper, we present a method for stereo super-resolution which employs a deep network. The network is trained using the residual image so as to obtain a high-resolution image from two low-resolution views. Our network comprises two deep sub-nets which share, at their output, a single convolutional layer. This last layer in the network delivers an estimate of the residual image which is...
This paper presents a two-pass clustering technique for orientation-invariant text line clustering in a language-independent text localization problem based on the connected component analysis (CCA) approach. Instead of clustering in a single pass in the conventional way, the proposed technique first explores nearby objects around the candidate components. By setting up the global constraints with...
A complex activity is a temporal composition of sub-events, and a sub-event typically consists of several low-level micro-actions, such as body movements of different actors. Extracting these micro-actions explicitly is beneficial for complex activity recognition due to actor selectivity, higher discriminative power, and motion clutter suppression. Moreover, considering both static and motion features...
Saliency detection aims to find the useful and attractive regions in an image. In many situations, there may be multiple objects in the image, and these objects may be equally attractive. Moreover, the appearance of pixels within one object may vary greatly, which can compromise the object's integrity when detecting saliency. To this end, this paper proposes a multi-saliency...
This paper addresses a specific example of nonperiodic translation symmetry and presents an algorithm to automatically detect multiple poles, or their shadows, in aerial imagery by looking for consistent and overlapping regions of self-similarity across a non-urban scene. The algorithm does not rely on having a pole template or knowing its exact size. For each image patch, similar regions (or blobs)...
Morphological filters are used here to interpolate missing values from sets of frequency-domain measurements, as occurs in Magnetic Resonance Imaging (MRI). MRI data acquisition is done in the Fourier domain, which is often sub-sampled to reduce the required scan time. Partial recovery of the missing frequency samples permits direct Fourier inversion to provide a rapid and improved initial estimation of the...
Kernel descriptors have been shown to outperform existing histogram-based local descriptors, as such descriptors are extracted from the match kernels which measure similarities between image patches using different pixel attributes (gradient, colour or LBP pattern). The extraction of kernel descriptors does not require coarse quantization of pixel attributes. Instead, each pixel equally participates...
Image compression plays an increasingly important role in image processing. Image sparse coding with learned over-complete dictionaries shows promising results on image compression by representing images with dictionary atoms compactly. Within the sparse coding based compression framework, a sparse dictionary is first learned from training images in a predefined image library, and then an image is compressed...
Human pose forecasting is an important problem in computer vision with applications to human-robot interaction, visual surveillance, and autonomous driving. Usually, forecasting algorithms use 3D skeleton sequences and are trained to forecast for a few milliseconds into the future. Long-range forecasting is challenging due to the difficulty of estimating how long a person continues an activity. To...
Salient object detection using RGB-D data is an emerging field in computer vision. Salient regions are often characterized by an unusual surface orientation profile with respect to the surroundings. To capture such a profile, we introduce the histogram of surface orientation (HOSO) feature to measure surface orientation distribution contrast for RGB-D saliency. We propose a new unified model that integrates...
This paper deals with automatic estimation of the horizon in videos from fixed surveillance cameras. The proposed algorithm is fully automatic in the sense that no user input is needed per-camera and it works with various scenes (indoor, outdoor, traffic, pedestrian, livestock, etc.). The algorithm detects moving objects, tracks them in time, assesses some of their geometric properties related to...
Despite the appeal of deep neural networks that largely replace the traditional handmade filters, they still suffer from isolated cases that cannot be properly handled only by the training of convolutional filters. Abnormal factors, including real-world noise, blur, or other quality degradations, ruin the output of a neural network. These unexpected problems can produce critical complications, and...
Stochastic Gradient Descent (SGD) is the method of choice for large scale problems, most notably in deep learning. Recent studies target improving convergence and speed of the SGD algorithm. In this paper, we equip the SGD algorithm and its advanced versions with an intriguing feature, namely handling constrained problems. Constraints such as orthogonality are pervasive in learning theory. Nevertheless...
A multi-view multi-target correspondence framework employing deep learning on overlapping cameras for identity-aware tracking in the presence of occlusion is proposed. Our complete pipeline of detection, multi-view correspondence, fusion and tracking, inspired by AI, greatly improves person correspondence across multiple wide-angled views over traditionally used feature sets and handcrafted descriptors...