Translation symmetry is one of the most important pattern characteristics in natural and man-made environments. Detecting translation symmetry is a grand challenge in computer vision. It has a wide range of real-world applications, from industrial settings to design, arts, entertainment and education. This paper describes the algorithm we have submitted for the Symmetry Detection Competition 2013...
Large scale 3D image localization requires computationally expensive matching between 2D feature points in the query image and a 3D point cloud. In this paper, we present a method to accelerate the matching process and to reduce the memory footprint by analyzing the view-statistics of points in a training corpus. Given a training image set that is representative of common views of a scene, our approach...
We propose a system for user-aided visual localization of desert imagery without the use of any metadata such as GPS readings, camera focal length, or field-of-view. The system makes use only of publicly available digital elevation models (DEMs) to rapidly and accurately locate photographs in non-urban environments such as deserts. Our system generates synthetic skyline views from a DEM and extracts...
Understanding human actions in videos has been a central research theme in Computer Vision for decades, and much progress has been achieved over the years. Much of this progress was demonstrated on standard benchmarks used to evaluate novel techniques. These benchmarks and their evolution provide a unique perspective on the growing capabilities of computerized action recognition systems. They demonstrate...
Action recognition is one of the major challenges of computer vision. Several approaches have been proposed using different descriptors and multi-class models. In this paper, we focus on binary ranking models and treat action recognition as a ranking problem. A binary ranking model is trained for each action and used to recognize the test videos for that action...
Human actions are spatio-temporal patterns. A popular representation is to describe the action by features at interest points. Because the interest point detection and feature description are generic processes, they are not tuned to discriminate one particular action from the other. In this paper we propose a saliency measure for each individual feature to improve its distinctiveness for a particular...
Action Recognition in videos is an active research field that is fueled by an acute need, spanning several application domains. Still, existing systems fall short of the applications' needs in real-world scenarios, where the quality of the video is less than optimal and the viewpoint is uncontrolled and often not static. In this paper, we extend the Motion Interchange Patterns (MIP) framework for...
Referring expression generation is widely considered a basic building block of any natural language generation system. Generating these phrases, which can point out a single object from a group of objects, has been studied extensively in that community. However, to build systems which can discuss images in an intelligent way, it is necessary to consider additional factors unique to the visual domain...
We present the Cardiff Conversation Database (CCDb), a unique 2D audiovisual database containing natural conversations between pairs of people. The database currently contains 30 conversations. To date, eight conversations are fully annotated for speaker activity, facial expressions, head motion, and non-verbal utterances. In this paper we describe the data collection and annotation process. We also...
We present a vision-based method for signer diarization - the task of automatically determining "who signed when?" in a video. This task has motivations and applications similar to those of speaker diarization but has received little attention in the literature. In this paper, we motivate the problem and propose a method for solving it. The method is based on the hypothesis that signers make more...
Automatically generating meaningful descriptions for images has recently emerged as an important area of research. In this direction, a nearest-neighbour based generative phrase prediction model (PPM) proposed by Gupta et al. (2012) was shown to achieve state-of-the-art results on the PASCAL sentence dataset, thanks to the simultaneous use of three different sources of information (i.e. visual clues,...
Associating photographs with complete sentences that describe what is depicted in them is a challenging problem. This paper examines how an approach inspired by image tagging techniques, which can scale to very large data sets, performs on this much harder task, and examines some of the linguistic difficulties that this bag-of-words model faces.
Person re-identification is about recognizing people who have passed by a sensor earlier. Previous work is mainly based on RGB data, but in this work we present, for the first time, a system that combines RGB, depth, and thermal data for re-identification purposes. First, from each of the three modalities, we obtain some particular features: from RGB data, we model color information from different...
In this work, we propose a novel fast and accurate method based on keypoints and temporal information to solve the registration problem on planar scenes with moving objects for infrared-visible stereo pairs. A keypoint descriptor and a temporal buffer (reservoir) filled with matched keypoints are used in order to find the homography transformation for the registration. Inside a given frame, the problem...
The majority of existing image fusion techniques operate in the 2-d image domain. They perform well for imagery of planar regions but fail in the presence of any 3-d relief and provide inaccurate alignment of imagery from different sensors. A framework for multi-sensor image fusion in 3-d is proposed in this paper. The imagery from different sensors, specifically EO and IR, are fused in a common 3-d...
This paper presents a geo-localization framework of street-level outdoor images using multiple sources of overhead reference imagery including LIDAR, Digital Elevation Maps and Multi-Spectral Land Cover/Use imagery. We describe five different matchers and an adaptive linear fusion process which combines individual matchers' probability maps into a single map. These matchers exploit mountain elevation...
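The linear fusion of matcher probability maps mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_probability_maps`, the list-of-lists map layout, and the fixed weights are all assumptions made for illustration, and the abstract does not say how the adaptive weights are chosen.

```python
def fuse_probability_maps(maps, weights):
    """Combine equally sized 2D probability maps by a weighted sum.

    Hypothetical helper: `maps` is a list of 2D lists (one per matcher) and
    `weights` holds one non-negative weight per map. The weights are fixed
    inputs here; an adaptive scheme would compute them per query.
    """
    if not maps or len(maps) != len(weights):
        raise ValueError("need one weight per probability map")
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    # Accumulate the weighted contribution of each matcher's map.
    for prob_map, weight in zip(maps, weights):
        for r in range(rows):
            for c in range(cols):
                fused[r][c] += weight * prob_map[r][c]
    # Renormalize so the fused map sums to 1 again.
    total = sum(sum(row) for row in fused)
    if total <= 0:
        raise ValueError("fused map has no probability mass")
    return [[value / total for value in row] for row in fused]


# Two toy 1x2 maps from two hypothetical matchers, weighted equally.
fused_map = fuse_probability_maps([[[0.5, 0.5]], [[1.0, 0.0]]], [0.5, 0.5])
```

The location with the highest fused value would then be taken as the most likely camera position under these assumed weights.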
Due to the high dimensionality of spectral data, spectrum representation techniques have often concentrated on modelling the spectra as a linear combination of a small basis set. Here, we focus on the evaluation of a B-Spline representation, a Gaussian mixture model, PCA and wavelets when applied to represent real-world spectrometer and spectral image data. These representations are important since...
This work deals with non-invasive and non-intrusive measurements of the human facial vasculature from thermal IR to measure cardiovascular vital signs. A robust, fully automatic measurement system is developed to study infrared videos of 32 under three imaging scenarios. Vascular mapping, blood perfusion modeling, and wavelet analysis are used to calculate heart rate from 512 video frames in near...
Multiple-look fusion is quickly becoming more important in statistical pattern recognition. With increased computing power and memory one can make many measurements on an object of interest using, for example, video imagery or radar. By obtaining more views of an object, a system can make decisions with lower missed detection and false alarm errors. There are many approaches for combining information...
In recent years, heterogeneous face biometrics has attracted increasing attention in the face recognition community. Since its publication in 2009, the HFB database has been used by tens of research groups, mainly for near-infrared vs. visible light (NIR-VIS) face recognition. Despite its success, the HFB database has two disadvantages: a limited number of subjects, lacking specific evaluation protocols...