Fixed-camera videos are produced and used in surveillance, teleconferencing, and remote lecturing. Because a fixed camera is one of the most fundamental camera techniques, it is also frequently used in studio shots and drama/movie scenes. In this paper, a simple and efficient coding method for such fixed-camera videos is proposed. The proposal significantly improves coding efficiency, and the generated bitstream is fully...
Sensitivity to spatial details drops across the visual periphery, and hence video streaming systems that gracefully degrade quality away from the viewpoint of the observer provide an optimum viewing experience with potentially large bitrate savings. As reaction latency is an important performance parameter of such systems, good prediction of future gaze locations at the transmission end is very...
Multi-object tracking is a difficult problem underlying many computer vision applications. In this work, we focus on sediment transport experiments in a flow where sediments are represented by spherical calibrated beads. The aim is to track all beads over long time sequences to obtain sediment velocities and concentration. Classical algorithms used in fluid mechanics fail to track the beads over long...
In intra video coding, intra frames are predicted with intra prediction and the prediction residual signal is encoded. In many transform-based video coding systems, intra prediction residuals are encoded with transforms. For example, the Discrete Cosine Transform (DCT) and the Asymmetric Discrete Sine Transform (ADST) are used for intra prediction residuals in many coding systems. In the recent work,...
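The two transforms named in the abstract above can be illustrated with a minimal sketch. It assumes that "ADST" refers to the DST-VII form of the sine transform and uses orthonormal scaling for both transforms; the `residual` row is a hypothetical example, not data from the paper.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        # The DC row uses a different scale so that the basis is orthonormal.
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

def adst(x):
    """Orthonormal DST-VII (assumed here to be the transform called ADST)."""
    n = len(x)
    scale = 2.0 / math.sqrt(2 * n + 1)
    return [scale * sum(
                x[i] * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
                for i in range(n))
            for k in range(n)]

def energy(v):
    """Sum of squared values."""
    return sum(c * c for c in v)

# Hypothetical residual row that grows away from the predicted boundary.
residual = [1.0, 2.0, 3.0, 4.0]
dct_coeffs = dct2(residual)
adst_coeffs = adst(residual)
```

Because both bases are orthonormal, `energy(dct_coeffs)` and `energy(adst_coeffs)` both equal `energy(residual)` up to rounding; the transforms differ only in how that energy is distributed across coefficients.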
This paper introduces a novel class of transforms, called graph-based separable transforms (GBSTs), based on two line graphs with optimized weights. For the optimal GBST construction, we formulate a graph learning problem to design two separate line graphs using row-wise and column-wise residual block statistics, respectively. We also analyze the optimality of the resulting separable transforms for both...
For image retrieval and caption generation, this paper considers a multimodal representation that associates an image with its text description (caption) by defining a neural language model as the conditional probability of the next word given both the n past words in a caption and the image that the caption describes. To address the data sparsity problem, the use of Kneser-Ney smoothing and skip-gram...
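The conditional probability described in the abstract above could be written as follows. The symbols are assumptions introduced for illustration, not the paper's notation: $w_t$ is the word at position $t$, $n$ is the number of past words, $I$ is the image, and $T$ is the caption length.

```latex
% Next-word probability conditioned on n past words and the image
P\left(w_t \mid w_{t-n}, \dots, w_{t-1}, I\right)

% Under this n-gram style assumption, a caption's probability factorizes as
P\left(w_1, \dots, w_T \mid I\right) \approx \prod_{t=1}^{T} P\left(w_t \mid w_{t-n}, \dots, w_{t-1}, I\right)
```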
In this paper we propose a new quality metric to estimate the impact of packet loss on the perceptual quality of encoded video sequences transmitted over error-prone networks. The proposed metric, henceforth referred to as Cumulative Distortion using Structural Similarity (CDSSIM), quantifies the overall structural distortion resulting from bidirectional error propagation in predictively coded, motion...
We propose a novel model to predict human visual attention during free viewing of webpages. Compared with natural images, webpages are usually full of salient regions such as logos, text, and faces, yet only a few of them attract human attention at a short glance. Moreover, webpages elicit distinct viewing patterns that differ markedly from those of natural images. In this paper, we introduce multi-features...
The curb appeal of a home, which refers to how attractive it is when viewed from the street, is an important decision-making factor for many home buyers. Existing models for automatically estimating the price of a home ignore this factor, instead focusing exclusively on objective attributes, such as the number of bedrooms, the square footage, and the age. We propose to use street-level imagery of a home,...
Convolutional Neural Networks (CNNs) have been widely adopted for many imaging applications. For image aesthetics prediction, state-of-the-art algorithms train CNNs on a recently published large-scale dataset, AVA. However, the distribution of the aesthetic scores on this dataset is extremely unbalanced, which limits the prediction capability of existing methods. We overcome this limitation by using...
User-generated videos (UGVs) have dominated contemporary social networking sites (SNSs). Forecasting their popularity is of great relevance to a broad range of online services. All existing studies forecast popularity of UGVs using their popularity statistics that are accumulated for a period of time after they are uploaded. Hence, there is always a substantial time lag (days to weeks) before popularity...
We consider the task of dimensional emotion recognition on video data using deep learning. While several previous methods have shown the benefits of training temporal neural network models such as recurrent neural networks (RNNs) on hand-crafted features, few works have considered combining convolutional neural networks (CNNs) with RNNs. In this work, we present a system that performs emotion recognition...
In this paper we propose a novel bottom-up visual saliency detection model by analysis of image complexity. Compared with existing works, we emphasize the important impact of image complexity on saliency detection. Inspired by the free energy theory, a hybrid parametric and non-parametric model is used to estimate the complexity of a visual signal. Taking the image complexity as a new feature, this...
In this paper, we propose a new edge model for edge adaptive graph-based transforms (EA-GBTs) in video compression. In particular, we consider step and ramp edge models to design graphs used for defining transforms, and compare their performance on coding intra and inter predicted residual blocks. In order to reduce the signaling overhead of block-adaptive coding, a new edge coding method is introduced...
We propose a novel signal model, based on sparse representations, that captures cross-scale features for visual signals. We show that the cross-scale predictive model enables faster solutions to sparse approximation problems. This is achieved by first solving the sparse approximation problem for the downsampled signal and using the support of the solution to constrain the support at the original resolution...
Active Shape Models are a powerful and well-known method for face alignment. In some applications it is common to have shape information available beforehand, such as previously detected landmarks. Introducing this prior knowledge into the statistical model may be of great advantage, but it is challenging to keep these priors unchanged once the statistical model constraints are applied...
Activity forecasting has recently become an active research area because of its importance in critical applications like automated navigation and human-computer interaction. However, for a video observed up to a certain time, all of the existing forecasting works focus on predicting the activity label, i.e., predicting what the next unobserved activity is. To the best of our knowledge, no work has answered...
Perceptual distortion prediction at near-threshold level has many applications in general image/video processing tasks. This paper presents a computational model to predict the near-threshold perceptual distortions based on optimal structure classification. This model accounts for contrast sensitivity, light adaptation, and various masking effects of the human visual system (HVS), and automatically...
Learning attribute models for applications like Zero-Shot Learning (ZSL) and image search is challenging because they require attribute classifiers to generalize to test data that may be very different from the training data. A typical scenario is when the notion of an attribute may differ from one user to another, e.g. one user may find a shoe formal whereas another user may not. In this case, the...