This work proposes a novel person re-identification method based on Hierarchical Bipartite Graph Matching. Because the human eye first observes a person's appearance coarsely and then gradually attends to finer details, our method abstracts the person image from coarse to fine granularity, finally forming a three-layer tree structure. Then, three bipartite graph matching methods are proposed for the matching...
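The core primitive named in the abstract above is bipartite graph matching between part descriptors of two person images. As a minimal sketch (not the authors' hierarchical three-layer scheme), the hypothetical function below solves minimum-cost bipartite matching by brute force on a toy cost matrix, where lower cost means two parts look more similar:

```python
from itertools import permutations

def min_cost_bipartite_matching(cost):
    """Brute-force minimum-cost perfect matching on a square cost matrix.
    cost[i][j] = dissimilarity between part i of image A and part j of image B.
    Exponential in n; real systems use the Hungarian algorithm instead."""
    n = len(cost)
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        c = sum(cost[i][perm[i]] for i in range(n))
        if c < best_cost:
            best_cost, best_perm = c, perm
    return best_cost, best_perm

# Toy example: 3 body-part descriptors per image.
cost = [
    [0.2, 0.9, 0.8],
    [0.7, 0.1, 0.6],
    [0.9, 0.8, 0.3],
]
total, assignment = min_cost_bipartite_matching(cost)
# The matching cost (total) can then serve as the dissimilarity
# between the two person images for re-identification ranking.
```

In practice the Hungarian algorithm (e.g. SciPy's `linear_sum_assignment`) gives the same optimum in polynomial time; the brute-force version here only illustrates the objective being minimized.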
In this paper, we present a novel perceptually-based optimization for the improvement of stereoscopic video coding efficiency. The main idea of this proposed scheme is to adaptively adjust the quantization parameter by taking into account the Human Visual System perceptual characteristics. For this, a saliency map is generated from both views and then segmented into salient and non-salient regions...
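The adjustment described in the abstract above, in its simplest form, lowers the quantization parameter (QP) in salient regions to spend bits where the Human Visual System attends, and raises it elsewhere. The function below is a hypothetical illustration of that idea (the threshold, offset, and binary salient/non-salient split are assumptions, not the paper's actual scheme), clamped to the usual 0–51 QP range:

```python
def adaptive_qp(base_qp, saliency, delta=4, threshold=0.5, qp_min=0, qp_max=51):
    """Per-block QP adjustment driven by a saliency score in [0, 1]:
    salient blocks get finer quantization (lower QP), non-salient
    blocks get coarser quantization (higher QP)."""
    if saliency >= threshold:
        qp = base_qp - delta
    else:
        qp = base_qp + delta
    return max(qp_min, min(qp_max, qp))

# A salient block keeps more detail, a non-salient one saves bits.
qps = [adaptive_qp(32, s) for s in (0.9, 0.1)]
```

The saved bits from non-salient regions offset the extra bits spent on salient ones, which is how such schemes improve perceived quality at roughly constant rate.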
This paper presents a method for dataset manipulation based on Mixed Integer Linear Programming (MILP). The proposed optimization can narrow down a dataset to a particular size, while enforcing specific distributions across different dimensions. It essentially leverages the redundancies of an initial dataset in order to generate more compact versions of it, with a specific target distribution across...
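The selection problem in the abstract above can be written as an integer program: pick exactly k items so that the label histogram of the picked items is as close as possible to a target distribution. As a hedged toy sketch (exhaustive search standing in for a real MILP solver such as PuLP or Gurobi; the function name and L1 objective are assumptions), one small instance looks like this:

```python
from itertools import combinations

def select_subset(labels, k, target_counts):
    """Tiny stand-in for the MILP: choose k items whose label histogram
    minimizes the L1 deviation from target_counts. Exhaustive search is
    only feasible here because the instance is tiny."""
    best_dev, best_idx = float("inf"), None
    for idx in combinations(range(len(labels)), k):
        counts = {}
        for i in idx:
            counts[labels[i]] = counts.get(labels[i], 0) + 1
        dev = sum(abs(counts.get(c, 0) - t) for c, t in target_counts.items())
        if dev < best_dev:
            best_dev, best_idx = dev, idx
    return best_idx, best_dev

# Redundant initial dataset; we want a compact, balanced version of size 3.
labels = ["cat", "cat", "dog", "dog", "dog", "bird"]
idx, dev = select_subset(labels, 3, {"cat": 1, "dog": 1, "bird": 1})
```

A real MILP formulation would use one binary variable per item, a cardinality constraint summing to k, and per-label constraints on the selected counts, then hand the model to a solver.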
In this paper, we propose to learn object representations with inference from temporal correlation in videos to achieve effective visual tracking. Unlike traditional methods which perform feature learning either at image level or based on intuitive temporal constraint, we employ the recurrent network with Long Short Term Memory (LSTM) units to directly learn temporally correlated representations of...
We consider the fully automated behavior understanding through visual cues in industrial environments. In contrast to most existing work, which relies on domain knowledge to construct complex handcrafted features from inputs, we exploit a Convolutional Neural Network (CNN), which is a type of deep model and can act directly on the raw inputs, to automate the process of feature construction. Although...
The main purpose of transfer learning is to address differences in data distribution, typically when the training samples of the source domain differ from those of the target domain. Prediction of salient areas in natural video suffers from the lack of large video benchmarks with human gaze fixations. Different databases provide only dozens up to one or two hundred of...
We consider the use of transfer learning, via deep Convolutional Neural Networks (CNN), for the image classification problem posed within the context of X-ray baggage security screening. A deep multi-layer CNN approach traditionally requires large amounts of training data in order to facilitate construction of a complex, complete end-to-end feature extraction, representation...
Distributed object recognition is a fast-growing research area, motivated mainly by the emergence of high-performance cameras and their integration with modern wireless sensor network technologies. In wireless distributed object recognition, bandwidth is limited, and it is desirable to avoid transmitting redundant visual features from multiple cameras to the base station. In this...
Density estimation based visual object counting (DE-VOC) methods estimate the object count in an image by integrating over its predicted density map. They are effective but inefficient. This paper proposes a fast DE-VOC method that maintains this effectiveness. Essentially, the feature space of image patches from VOC can be clustered into subspaces, and the examples of each subspace can be collected...
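The integration step named in the abstract above is simply the sum of all entries of the predicted density map, since each entry holds the expected fraction of an object falling in that pixel. A minimal sketch (the toy map below is invented for illustration):

```python
def count_from_density(density_map):
    """Estimated object count = integral (sum) over the predicted
    density map, whose entries are per-pixel expected object mass."""
    return sum(sum(row) for row in density_map)

# Toy 4x4 density map whose total mass corresponds to ~3 objects:
# two concentrated blobs (one object each) plus diffuse mass for a third.
density = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5, 0.0],
    [0.2, 0.2, 0.2, 0.2],
    [0.0, 0.1, 0.1, 0.0],
]
count = count_from_density(density)
```

Because the count is a global integral, the density map need not localize objects precisely for the total to be accurate, which is one reason DE-VOC works well in crowded scenes.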
Text detection is typically the first step for any text processing such as hand-written text recognition, layout analysis, line detection, or writer identification. This paper describes a new method to detect text in images, particularly in historical document images. For a robust detection, we propose the use of the vesselness filter as a new preprocessing step for text detection. We show that this...
Subjective test methodologies are morphing to enable researchers to answer questions relevant to rapidly evolving technologies in an efficient and reliable manner. This paper is an exploration of how subjective testing that employs crowdsourcing can be refined to drive stability and reliability in subjective results. We investigate how various design decisions can lead to disparate subjective responses;...
In the quest for perceptually optimized video coding, coding textures represents a challenging case. While a large body of research has been devoted to the perception of static textures, dynamic textures are still not sufficiently explored. In this paper, we focus on short-term consistent patches, known as dynamic textures, with a very limited spatial and temporal extent. We estimated the visual distortion...
In this paper, we present a novel self-learning single image super-resolution (SR) method, which restores a high-resolution (HR) image from self-examples extracted from the low-resolution (LR) input image itself without relying on extra external training images. In the proposed method, we directly use sampled image patches as the anchor points, and then learn multiple linear mapping functions based...
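The per-anchor linear mappings named in the abstract above are fitted from paired LR/HR self-examples. As a heavily simplified, hypothetical sketch (a scalar map y ≈ a·x + b fitted by closed-form least squares, standing in for the multi-dimensional patch mappings the paper actually learns):

```python
def fit_linear_map(lr_samples, hr_samples):
    """Closed-form least squares for y ≈ a*x + b on paired LR/HR
    patch features: a toy stand-in for one per-anchor linear mapping."""
    n = len(lr_samples)
    mx = sum(lr_samples) / n
    my = sum(hr_samples) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(lr_samples, hr_samples))
    var = sum((x - mx) ** 2 for x in lr_samples)
    a = cov / var
    b = my - a * mx
    return a, b

# Self-examples: LR patch intensities paired with their HR counterparts.
lr = [1.0, 2.0, 3.0, 4.0]
hr = [2.5, 4.5, 6.5, 8.5]  # generated as exactly 2*x + 0.5
a, b = fit_linear_map(lr, hr)
```

At test time, an input LR patch would be assigned to its nearest anchor and upscaled with that anchor's mapping, which is what makes the per-anchor scheme piecewise linear rather than globally linear.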
Finding highlights relevant to a text query in unedited videos has become increasingly important due to their unprecedented growth. We refer to this task as semantic highlight retrieval and propose a query-dependent video representation for retrieving a variety of highlights. Our method consists of two parts: (1) “viralets”, a mid-level representation bridging visual and semantic spaces; (2) a...
This paper proposes a blur detection algorithm that is capable of detecting and quantifying the level of spatially-varying blur by integrating directional edge spread calculation, Just Noticeable Blur (JNB) and local probability summation. The proposed method generates a blur map indicating the relative amount of perceived local blurriness. We compare the proposed method with six other state-of-the-art...
We developed an optical distortion correction technique for an eyeglasses-type wearable device using a multi-mirror array (MMA). This wearable device is small and lightweight, but optics using an MMA can cause optical distortions, such as geometric distortion and chromatic aberration of magnification, that depend on the user's pupil distance and degrade the visibility of displayed virtual images. We...
Recently, graph ranking-based methods have been introduced to visual tracking and achieved promising results due to their local structure preserving property. However, existing graph ranking-based trackers use holistic templates to construct the graphs, which makes the trackers sensitive to occlusions. In this paper, we propose a part-based multi-graph ranking algorithm for robust visual tracking. In...
A novel approach to spatio-temporal saliency detection in video is proposed. Saliency computation is considered as an optimization problem that maximizes the energy of a fully-connected graphical model based on spatio-temporal feature distinctiveness. Each pixel in a video is modeled by a node, and the spatio-temporal feature distinctiveness between pixels by edges connecting the nodes in the graph...
The emergence of UHD video format induces larger screens and involves a wider stimulated visual angle. Therefore, its effect on visual attention can be questioned, since it can impact quality assessment and metrics, but also the whole chain of video processing and creation. Moreover, changes in visual attention under different viewing conditions challenge visual attention models. In this paper, we present...
In this study, we make use of brain activation data to investigate the perceptual plausibility of a visual and an auditory model for visual and auditory saliency in video processing. These models have already been successfully employed in a number of applications. In addition, we experiment with parameters, modifications and suitable fusion schemes. As part of this work, fMRI data from complex video...