In this paper, we present a novel perceptually based optimization for improving stereoscopic video coding efficiency. The main idea of the proposed scheme is to adaptively adjust the quantization parameter by taking into account the perceptual characteristics of the Human Visual System. For this, a saliency map is generated from both views and then segmented into salient and non-salient regions...
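A minimal sketch of saliency-driven QP adaptation. The fusion rule (averaging the two views' saliency), the 0.5 threshold, and the QP offsets are hypothetical choices for illustration, not values from the paper; QPs are clamped to the 0 to 51 range used by H.264/AVC and HEVC.

```python
# Hypothetical sketch: the fusion rule, threshold, and QP offsets are
# illustrative assumptions, not values taken from the paper.

def fuse_views(left, right):
    """Average per-block saliency of the two views (assumed fusion rule)."""
    return [[(a + b) / 2 for a, b in zip(ra, rb)] for ra, rb in zip(left, right)]


def adapt_qp(base_qp, saliency, threshold=0.5, salient_offset=-2,
             non_salient_offset=3, qp_min=0, qp_max=51):
    """Lower the QP (finer quantization) in salient blocks, raise it elsewhere."""
    qps = []
    for row in saliency:
        out_row = []
        for s in row:
            # Binary segmentation into salient / non-salient regions.
            offset = salient_offset if s >= threshold else non_salient_offset
            # Clamp to the H.264/HEVC QP range.
            out_row.append(max(qp_min, min(qp_max, base_qp + offset)))
        qps.append(out_row)
    return qps


fused = fuse_views([[0.8, 0.2], [0.4, 0.6]], [[1.0, 0.2], [0.4, 0.8]])
print(adapt_qp(32, fused))
```

Spending bits unevenly this way only pays off if the saliency map actually tracks where viewers look, which is the part a perceptual model has to supply.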
Online photo-sharing services host a large number of images together with rich metadata such as tags, groups, and locations. To associate two domains of different modalities, e.g. images and tags, Canonical Correlation Analysis (CCA) and its extensions are widely used. We employ a more flexible graph embedding method called Cross-Domain Matching Correlation Analysis (CDMCA),...
This method introduces an efficient way of learning action categories without the need for feature estimation. The approach starts from low-level values, in the style of successful CNN methods. However, rather than extracting general image features, we learn to predict specific video representations from raw video data. The benefit of such an approach is that at the same computational...
Text detection is typically the first step of any text processing task, such as handwritten text recognition, layout analysis, line detection, or writer identification. This paper describes a new method to detect text in images, particularly in historical document images. For robust detection, we propose the use of the vesselness filter as a new preprocessing step for text detection. We show that this...
Depth maps are typically made of smooth regions separated by sharp edges. Following this rationale, this paper presents a novel coding scheme in which depth data is represented by a set of contours defining the various regions, together with a compact representation of the values inside each region. The proposed coding scheme is based on elastic curves, which make it possible to compactly represent the contours...
A discriminative dictionary learning algorithm is proposed to find sparse signal representations using relative attributes as the available semantic information. In contrast, existing (discriminative) dictionary learning (DDL) approaches mostly utilize binary label information to enhance the discriminative property of the signal reconstruction residual, the sparse coding vectors, or both. Compared...
Deep Convolutional Neural Networks (CNNs) have recently been shown to outperform previous state-of-the-art approaches for image classification. Their success must in part be attributed to the availability of large labeled training sets, such as those provided by the ImageNet benchmarking initiative. When training data is scarce, however, CNNs have been shown to fail to learn descriptive features. Recent research...
We propose an algorithm that accomplishes transform-coded, spatiotemporal, pel-recursive video compression. Traditional pel-recursive coders obtain sophisticated spatiotemporal predictions for the current pixel based on previously decoded data. The resulting per-pixel prediction errors are encoded independently so that the decoder can use previously encoded pixels in the prediction of the current...
In intra video coding, intra frames are predicted with intra prediction, and the prediction residual is then encoded. In many transform-based video coding systems, intra prediction residuals are encoded with transforms. For example, the Discrete Cosine Transform (DCT) and the Asymmetric Discrete Sine Transform (ADST) are used for intra prediction residuals in many coding systems. In recent work,...
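The two transforms named above can be compared with a short sketch. It builds the orthonormal DCT-II and the DST-VII (one common definition of the ADST, and the transform HEVC approximates for 4×4 intra luma residuals) and measures how much of a residual's energy lands in the first coefficient. The ramp input is an assumed example of a residual that grows with distance from the prediction boundary; it is not data from the paper.

```python
import math


def dct2_matrix(n):
    """Orthonormal DCT-II basis; row k is the k-th basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dst7_matrix(n):
    """Orthonormal DST-VII basis (assumed here as the ADST definition)."""
    scale = 2.0 / math.sqrt(2 * n + 1)
    return [[scale * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
             for i in range(n)] for k in range(n)]


def forward(basis, x):
    """1-D forward transform: project x onto each basis row."""
    return [sum(b * v for b, v in zip(row, x)) for row in basis]


def first_coeff_energy_fraction(basis, x):
    """Share of total coefficient energy captured by coefficient 0."""
    coeffs = forward(basis, x)
    return coeffs[0] ** 2 / sum(c * c for c in coeffs)


ramp = [1.0, 2.0, 3.0, 4.0]  # hypothetical residual growing away from the boundary
print(first_coeff_energy_fraction(dct2_matrix(4), ramp))
print(first_coeff_energy_fraction(dst7_matrix(4), ramp))
```

For this ramp the DST-VII packs roughly 99% of the energy into its first coefficient versus about 83% for the DCT-II, because its first basis vector also rises away from the boundary.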
Sequential dictionary learning via the K-SVD algorithm has emerged as a successful alternative to conventional data-driven methods such as independent component analysis (ICA) for functional magnetic resonance imaging (fMRI) data analysis. fMRI data sets, however, are structured data matrices that exhibit spatiotemporal correlation. This prior information has not been included in the K-SVD...
This paper introduces a novel class of transforms, called graph-based separable transforms (GBSTs), based on two line graphs with optimized weights. For the optimal GBST construction, we formulate a graph learning problem to design two separate line graphs using row-wise and column-wise residual block statistics, respectively. We also analyze the optimality of the resulting separable transforms for both...
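A line graph is fully described by its edge weights, and its graph Fourier transform is the eigenbasis of the Laplacian L = D - W. The sketch below assumes the combinatorial Laplacian and checks a known special case: with uniform weights, the DCT-II vectors are eigenvectors with eigenvalues 2 - 2cos(πk/N), so it is the non-uniform, optimized weights that let a GBST differ from the DCT. The paper's weight-learning step is not shown, and the weak-edge example is hypothetical.

```python
import math


def line_graph_laplacian(weights):
    """Combinatorial Laplacian L = D - W of a line graph with len(weights)+1 nodes."""
    n = len(weights) + 1
    lap = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(weights):  # edge between node i and node i+1
        lap[i][i] += w
        lap[i + 1][i + 1] += w
        lap[i][i + 1] -= w
        lap[i + 1][i] -= w
    return lap


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dct2_vector(n, k):
    """k-th orthonormal DCT-II basis vector of length n."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return [scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]


def is_eigenvector(m, v, lam, tol=1e-9):
    """True if m @ v equals lam * v within tol."""
    return all(abs(a - lam * b) < tol for a, b in zip(matvec(m, v), v))


uniform = line_graph_laplacian([1.0] * 7)
print(all(is_eigenvector(uniform, dct2_vector(8, k), 2 - 2 * math.cos(math.pi * k / 8))
          for k in range(8)))
# A weak middle edge (hypothetical weights) breaks the DCT eigenstructure.
weak = line_graph_laplacian([1.0, 1.0, 1.0, 0.1, 1.0, 1.0, 1.0])
print(is_eigenvector(weak, dct2_vector(8, 1), 2 - 2 * math.cos(math.pi / 8)))
```

A separable transform then applies one such eigenbasis along rows and another along columns of the residual block.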
Visual question answering (VQA) has emerged from major advances in computer vision and natural language processing; it requires a deep understanding of both images and questions, as well as their effective integration. Current VQA methods simply concatenate visual and textual features or compare them via a dot product, which cannot eliminate the semantic gap between them. We argue to...
Our challenge is the design of a “universal” bit-efficient image compression approach. The prime goal is to allow reconstruction of images with high quality. In addition, we attempt to make the coder and decoder “universal”, such that MPEG-7-like low- and mid-level descriptors are an integral part of the coded representation. To this end, we introduce a sparse Mixture-of-Experts regression approach...
The accuracy of end-to-end distortion (EED) estimation is crucial to achieving effective error-resilient video coding. An established solution, the recursive optimal per-pixel estimate (ROPE), does so by tracking the first and second moments of decoder-reconstructed pixels. An alternative estimation approach, the spectral coefficient-wise optimal recursive estimate (SCORE), instead tracks moments...
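Why two moments suffice can be made concrete: the expected squared error at a pixel with original value f is E[(f - f̃)²] = f² - 2f·E[f̃] + E[f̃²], where f̃ is the decoder reconstruction. The update below is a simplified, hypothetical single-pixel case, assuming an intra-coded pixel (encoder reconstruction f̂) whose packet is lost with probability p and is then concealed by copying the co-located pixel of the previous frame. It is not the full ROPE recursion, and it does not model SCORE.

```python
from dataclasses import dataclass


@dataclass
class Moments:
    first: float   # E[f~], mean of the decoder reconstruction
    second: float  # E[f~^2], second moment of the decoder reconstruction


def intra_pixel_moments(f_hat, prev, p):
    """Simplified update for an intra pixel (assumed concealment: copy the
    co-located pixel of the previous frame, whose moments are `prev`)."""
    return Moments(
        first=(1 - p) * f_hat + p * prev.first,
        second=(1 - p) * f_hat ** 2 + p * prev.second,
    )


def expected_distortion(f, m):
    """E[(f - f~)^2] expressed through the first two moments."""
    return f * f - 2 * f * m.first + m.second


prev = Moments(first=90.0, second=90.0 ** 2)  # deterministic previous value 90
m = intra_pixel_moments(f_hat=98.0, prev=prev, p=0.1)
print(expected_distortion(100.0, m))
```

Here the result matches direct enumeration: 0.9·(100 - 98)² + 0.1·(100 - 90)² = 13.6.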
In this paper we propose a new quality metric to estimate the impact of packet loss on the perceptual quality of encoded video sequences transmitted over error-prone networks. The proposed metric, henceforth referred to as Cumulative Distortion using Structural Similarity (CDSSIM), quantifies the overall structural distortion resulting from bidirectional error propagation in predictively coded, motion...
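CDSSIM builds on structural similarity. For reference, a minimal single-window SSIM might look like the sketch below: means, sample variances, and covariance are computed over the whole block, with the usual constants C1 = (0.01·255)² and C2 = (0.03·255)² for 8-bit data. The standard index instead averages this over local windows, and the cumulative, bidirectional error-propagation part of CDSSIM is not modeled here.

```python
def _mean(v):
    return sum(v) / len(v)


def _cov(a, b, ma, mb):
    """Sample covariance (divides by N - 1)."""
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)


def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized pixel sequences."""
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mx, my = _mean(x), _mean(y)
    vx, vy = _cov(x, x, mx, mx), _cov(y, y, my, my)
    cxy = _cov(x, y, mx, my)
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / (
        (mx * mx + my * my + c1) * (vx + vy + c2))


ref = [float(v) for v in range(0, 64, 4)]
noisy = [v + (8.0 if i % 2 else -8.0) for i, v in enumerate(ref)]
print(ssim_global(ref, ref), ssim_global(ref, noisy))
```

The index equals 1 only for identical inputs and drops as luminance, contrast, or structure diverge.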

The demand for high-quality multimedia content and the advent of the Ultra High Definition (UHD) resolution have motivated the development of the High Efficiency Video Coding (HEVC) standard, which outperforms prior standards by up to 50% in terms of coding efficiency. This improvement, however, involves higher computational complexity on the encoder side, making it essential for real-time encoders...
In this paper we review the current status and ongoing development of High Dynamic Range and Wide Color Gamut (HDR/WCG) video compression within MPEG. We review how existing MPEG, ITU-R and SMPTE standards may be used for coding HDR content. The history of an exploratory activity within MPEG investigating technologies for improved compression of HDR/WCG content is reviewed. An overview of the MPEG...
Fisheye cameras have become extremely popular in applications where the goal is to capture large fields of view with only one camera. However, wide-angle fisheye imagery has special characteristics that may not suit modern video codecs, which employ a block-based translational motion model. This model fails to describe the complex deformable motion that is often present in fisheye...
In the context of motion estimation (ME) for video coding, the rate-constrained successive elimination algorithm (RC-SEA) safely eliminates candidate motion vectors while preserving the optimal candidate chosen by the block matching algorithm (BMA). This paper describes a technique for reusing ME information from rectangular to square prediction units in order to reduce the search area without altering...
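The elimination rule rests on the inequality |Σa - Σb| ≤ SAD(a, b): a candidate whose sum difference plus rate cost already reaches the best cost so far cannot win, so skipping it never changes the result. The sketch below applies that test inside a full search. The motion-vector rate model (|dx| + |dy|) and λ are hypothetical, block sums are computed directly rather than from the precomputed sum tables an encoder would use, and the paper's rectangular-to-square reuse technique is not shown.

```python
def block(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]


def block_sum(b):
    return sum(sum(r) for r in b)


def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def rate_proxy(dy, dx):
    """Hypothetical motion-vector rate model, not a real codec's."""
    return abs(dy) + abs(dx)


def rc_sea_search(cur, ref, y, x, size, radius, lam):
    """Full search over +/-radius with sum-based elimination of candidates."""
    target = block(cur, y, x, size)
    target_sum = block_sum(target)
    best_cost, best_mv, pruned = float("inf"), None, 0
    h, w = len(ref), len(ref[0])
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + size > h or rx + size > w:
                continue  # candidate falls outside the reference frame
            cand = block(ref, ry, rx, size)
            rate_cost = lam * rate_proxy(dy, dx)
            # Lower bound on J = SAD + lam*R; skip if it cannot beat the best.
            if abs(target_sum - block_sum(cand)) + rate_cost >= best_cost:
                pruned += 1
                continue
            cost = sad(target, cand) + rate_cost
            if cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_cost, best_mv, pruned


ref = [[(3 * i * i + 5 * j + i * j) % 97 for j in range(16)] for i in range(16)]
cur = [[ref[(i + 1) % 16][(j + 2) % 16] for j in range(16)] for i in range(16)]
print(rc_sea_search(cur, ref, 4, 4, 4, 3, 1.0))
```

Because pruned candidates have cost at least equal to the current best, the returned cost and vector match an exhaustive search with the same scan order.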
Conventional omnidirectional video encoding techniques use map projections to flatten a scene from a spherical shape into one or several 2D shapes. Common projection methods, including equirectangular and cubic projection, involve varying levels of interpolation, creating a large number of non-information-carrying pixels that waste bitrate. In this paper, we propose a tile-based omnidirectional video...
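Equirectangular projection maps longitude and latitude linearly to pixel columns and rows, so every row gets the same number of samples even though a circle of latitude shrinks by cos(lat). The sketch below, assuming continuous pixel coordinates with the origin at the top-left, shows the mapping and one way to quantify that oversampling: an equal-area sampling would need only about 2/π (roughly 64%) of the pixels. This illustrates the general redundancy of the projection, not the paper's tiling scheme.

```python
import math


def sphere_to_equirect(lon, lat, width, height):
    """Longitude in [-pi, pi), latitude in [-pi/2, pi/2] -> continuous (u, v)."""
    u = (lon + math.pi) / (2 * math.pi) * width
    v = (math.pi / 2 - lat) / math.pi * height
    return u, v


def equirect_to_sphere(u, v, width, height):
    """Inverse mapping from continuous pixel coordinates to (lon, lat)."""
    lon = u / width * 2 * math.pi - math.pi
    lat = math.pi / 2 - v / height * math.pi
    return lon, lat


def row_oversampling(row, height):
    """Horizontal oversampling of a pixel row relative to the equator: 1/cos(lat)."""
    _, lat = equirect_to_sphere(0.0, row + 0.5, 1, height)
    return 1.0 / math.cos(lat)


def effective_sample_fraction(height):
    """Mean of cos(lat) over rows: share of samples an equal-area grid would need."""
    total = 0.0
    for row in range(height):
        _, lat = equirect_to_sphere(0.0, row + 0.5, 1, height)
        total += math.cos(lat)
    return total / height


print(effective_sample_fraction(1920), 2 / math.pi)
print(row_oversampling(0, 1920))  # the row nearest the pole is heavily oversampled
```

Rows near the poles repeat nearly the same content across their full width, which is where much of the wasted bitrate comes from.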