The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The bag of visual words (BoW) model is one of the most successful model in image classification task. However, the major problem of the BoW model lies in the determination of visual words, which consists of codebook training and feature encoding phases. The traditional K-means and hard-assignment method completely ignore the structure of the local feature space, leading to high loss of information...
In this paper, we propose a novel Weakly-Supervised Dual Clustering (WSDC) approach for image semantic segmentation with image-level labels, i.e., collaboratively performing image segmentation and tag alignment with those regions. The proposed approach is motivated from the observation that super pixels belonging to an object class usually exist across multiple images and hence can be gathered via...
The use of local features for image representation has been proven very effective for a variety of visual tasks such as object localization and scene classification. However, local image features carry little semantic information which is potentially not enough for high level visual tasks. To solve this problem, in this paper, we propose to use a supervised semantic image representation for scene...
In this paper, we propose a novel solution of anomaly detection in crowd scene by jointly modeling appearance and dynamics of motion. First, a novel high-frequency feature based on optical flow (HFOF) is introduced. It can well capture the dynamic information of optical flow. Besides, we adopt the other two types of features, namely multi-scale histogram of optical(MHOF), and dynamic textures (DT)...
Effective video presentation and summarization techniques are critical for fast browsing of video content. In this paper, we propose a novel presentation approach to vividly depict the moving process of a specific object in a surveillance video, which aims at effectively summarizing video content by a static image named narrative. Firstly, the object of interest is extracted and segmented from the...
With the permeation of Web 2.0, large-scale user contributed images with tags are easily available on social websites. How to align these social tags with image regions is a challenging task while no additional human intervention is considered, but a valuable one since the alignment can provide more detailed image semantic information and improve the accuracy of image retrieval. To this end, we propose...
Relevance feedback is a quite effective approach to improve performance for image retrieval. Recently, active learning method has attracted much attention due to its capability of alleviating the burden of labeling in relevance feedback. However, most of the traditional studies focus on single sample selection in each feedback which needs heavy computational cost in practice. In this paper, we presents...
With the proliferation of cameras in public areas, it becomes increasingly desirable to develop fully automated surveillance and monitoring systems. In this paper, we propose a novel unsupervised approach to automatically explore motion patterns occurring in dynamic scenes under an improved sparse topical coding (STC) framework. Given an input video with a fixed camera, we first segment the whole...
In this paper, we address a practical problem of cross-scenario clothing retrieval — given a daily human photo captured in general environment, e.g., on street, finding similar clothing in online shops, where the photos are captured more professionally and with clean background. There are large discrepancies between daily photo scenario and online shopping scenario. We first propose to alleviate the...
In this paper, we present a new method for facial age estimation based on ordinal discriminative feature learning. Considering the temporally ordinal and continuous characteristic of aging process, the proposed method not only aims at preserving the local manifold structure of facial images, but also it wants to keep the ordinal information among aging faces. Moreover, we try to remove redundant information...
Data has multi-view representations from various feature spaces in real world. Multi-view clustering algorithms allow leveraging information from multiple views of the data and this may substantially improve the clustering result obtained by using a single view. In this paper, we propose a novel algorithm called Collaborative PLSA (C-PLSA) for multi-view clustering, which works on the assumption that...
This paper proposes a novel regression method based on distance metric learning for human age estimation. We take age estimation as a problem of distance-based ordinal regression, in which the facial aging trend can be discovered by a learned distance metric. Through the learned distance metric, we hope that both the ordinal information of different age groups and the local geometry structure of the...
Millions of video surveillance cameras distribute around the world, and capture tremendous number of video data endlessly. Video browsing by frame is time consuming and inefficient, since needless information is abundant in the raw videos. Video synopsis is an effective way to solve this problem by producing a short video abstraction, while keeping the essential activities of the original video. However,...
Selective sampling has been widely used in relevance feedback of image retrieval to alleviate the burden of labeling by selecting the most informative instances for user to label. Traditional sample selection scheme often selects a batch of instances each time and label them simultaneously, which ignores the correlation among instances and results in redundant labeling. In this paper, we propose an...
The bag-of-visual-words (BoW) representation has received wide application and public acceptance for visual categorization. However, the histogram based image representation ignores the spatial information and correlations among visual words. To tackle these problems, in this paper, we propose to use some image regions called ‘components’, as the higher-level visual elements to represent an image...
Movie shot classification is vital but challenging task due to various movie genres, different movie shooting techniques and much more shot types than other video domain. Variety of shot types are used in movies in order to attract audiences attention and enhance their watching experience. In this paper, we introduce context saliency to measure visual attention distributed in keyframes for movie shot...
Behavior analysis across multi-cameras becomes more and more popular with the rapid development of camera network in video surveillance. In this paper, we propose a novel unsupervised graph matching framework to associate trajectories across partially overlapping cameras. Firstly, trajectory extraction is based on object extraction and tracking and is followed by a homographic projection to a mosaic-plane...
With the rapid increasing of video cameras, large amount of video data everyday brings the problem of video storage and browsing. In this paper, we propose a novel approach to video reshuffling with a group of static images to effectively summarize the video content. Each static image called narrative is generated to depict the behavior of a specific object or a special event. Firstly background subtraction...
Salient region of an image usually contains the crucial information for image analysis and understanding. Most conventional approaches learn the saliency by utilizing the low-level features, which ignore the participation of human. In this paper, we propose an effective and robust approach to detect the salient region of an image by combining the bottom-up and top-down cues. The proposed method not...
In this paper, we propose an automatic approach to simultaneously name faces and discover scenes in TV shows. We follow the multi-modal idea of utilizing script to assist video content understanding, but without using timestamp (provided by script-subtitles alignment) as the connection. Instead, the temporal relation between faces in the video and names in the script is investigated in our approach,...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.