Sentiment analysis from large-scale networked data has attracted increasing attention in recent years. Most previous work on sentiment prediction focuses on text or image data. However, voice is the most natural and direct way for people to express their sentiments in real time. With the rapid development of smartphone voice dialogue applications (e.g., Siri and Sogou Voice Assistant), the large-scale networked...
Eye localization is a key step in many face analysis related applications. In this paper, we present a novel eye localization method based on a group of trained filters called correlation filter bank (CFB). We formulate the eye localization problem as an optimization problem with a well-defined cost function based on CFB. The CFB is trained with an EM-like adaptive clustering approach. The trained...
In partial-duplicate image retrieval systems, min-Hash algorithms are widely used because of their high efficiency and robustness. In most min-Hash algorithms, the min-Hash functions are considered independent and are grouped into tuples called sketches, and the discriminative power of these sketches is limited. By modeling correlations of min-Hash functions, we propose a novel sketch construction method called Nonpara-metric...
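As a rough illustration of the baseline this abstract refers to, the sketch below groups independent min-Hash functions into tuples (sketches) in Python. It is not the proposed method, which the truncated abstract does not describe. The function names, the linear hash family, and the use of integer visual-word IDs are all assumptions made for the example.

```python
import random

def make_hash_funcs(n, seed=0, prime=2_147_483_647):
    """Build n hypothetical hash functions of the form (a*x + b) mod prime."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, prime), rng.randrange(0, prime)) for _ in range(n)]
    # Bind a and b as defaults so each lambda keeps its own parameters.
    return [lambda x, a=a, b=b: (a * x + b) % prime for a, b in params]

def minhash_signature(word_ids, hash_funcs):
    """One min-Hash value per function: the smallest hash over the set."""
    return [min(h(w) for w in word_ids) for h in hash_funcs]

def sketches(signature, k):
    """Group consecutive min-Hash values into non-overlapping k-tuples."""
    return [tuple(signature[i:i + k]) for i in range(0, len(signature) - k + 1, k)]

def shared_sketches(sk_a, sk_b):
    """Count positions where two images have an identical sketch."""
    return sum(1 for x, y in zip(sk_a, sk_b) if x == y)

funcs = make_hash_funcs(6, seed=1)
image_a = {1, 2, 3, 4}      # assumed visual-word IDs of one image
image_b = {1, 2, 3, 4, 9}   # a near-duplicate with one extra word
sk_a = sketches(minhash_signature(image_a, funcs), k=2)
sk_b = sketches(minhash_signature(image_b, funcs), k=2)
print(shared_sketches(sk_a, sk_b), "of", len(sk_a), "sketches collide")
```

A sketch collides only when every min-Hash value in the tuple agrees, so a larger k makes collisions rarer and more specific, while a smaller k makes them more frequent and less discriminative.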
In multimedia and social media communities, web topic detection poses two main difficulties that conventional approaches can barely handle: 1) there are large inter-topic variations among web topics; 2) supervised information for identifying the real topics is rare. In this paper, we address these problems from the perspective of similarity diffusion among objects on the web, and present a clustering-like pattern...
In this paper, we address the robust face recognition problem. Recently, trace lasso was introduced as an adaptive norm based on the training data. It uses the correlation among the training samples to tackle the instability problem of sparse representation coding. Trace lasso naturally clusters the highly correlated data together. However, the face images with similar variations, such as illumination...
Analysis and recognition of auditory scenes play an important role in content-based multimedia processing and context-aware applications. In this paper, we propose an auditory scene recognition scheme that integrates the analysis of the audio data of a scene with an LDA topic model to discover latent structures (i.e., contextual correlations) of audio words, and generation of intermediate contextual descriptions...
Visual attention is an important function of the human visual system (HVS). Over long-term research on visual attention, various computational models have been proposed with encouraging results. However, most of that work was conducted on images with ideal visual quality. In practice, outputs of most visual communication systems contain different levels of artifacts, e.g., noise, blurring, blockiness...
The statistical model of the bits to be encoded is crucial for the coding performance of distributed video coding (DVC). In this paper, a bit-level context-adaptive correlation model is proposed to exploit high-order statistical correlation for better channel coding performance, which consequently improves the video coding efficiency. In the proposed scheme, the wavelet domain DVC is considered and...
In this paper, we propose a novel multiple-target tracking model composed of two detectors and a tracker. An on-line detector and a tracker are used to generate target candidates, whose confidence scores are then evaluated by the off-line trained detectors. In the data association stage, highly efficient inference in a structural model leads to the optimal tracking result. The experimental results...
This paper presents an inter-view motion prediction technique for efficient compression of motion vectors of the depth views in 3D-HEVC. 3D-HEVC is an extension of the HEVC standard for coding multi-view video plus depth content, known as MVD. In the MVD format, the motion characteristics of the adjacent views in the depth video are highly correlated. In this paper, we take advantage of this correlation...
Query difficulty estimation (QDE) attempts to automatically predict the performance of the search results returned for a given query. QDE has been widely investigated in text document retrieval for many years. However, little research has explored it in image retrieval. State-of-the-art QDE methods in image retrieval mainly investigate the statistical characteristics (coherence, robustness,...
Sensation of reality refers to the ability of users to feel present in a multimedia experience. As 3D technologies aim to provide more immersive and higher-quality multimedia experiences, it is important to understand Quality of Experience (QoE) and sensation of reality. Recently, there have been efforts to measure brain activity in order to implicitly understand QoE for various multimedia contents...
In this paper, we propose a cross-media regularization framework to enhance image understanding which can benefit image retrieval, classification and so on. The goal of cross-media regularization is to find regularization projections by exploiting the correlations between visual features and textual features. Thus, the original noisy distribution of visual features can be refined by leveraging the...
Cross-media retrieval is a challenging problem in the multimedia retrieval area. In the real world, many applications involve multi-modal data, e.g., web pages containing both images and text. How to utilize the intrinsic intra-modality and inter-modality similarity to learn the appropriate relationships of the data objects and provide efficient search across different modalities is the core of cross-media...
This paper details an approach for a novel video viewing experience on small-screen mobile devices. Users are allowed to choose a region-of-interest via a touch-based gesture, following which a virtual camera algorithm takes control in order to automatically zoom and pan within the video scene. The selected region-of-interest is kept within a retargeted viewport despite changes in the position of the...
This paper introduces a novel reconstruction model with compound regularization to recover compressed-sensed video sequences. For a target frame, the compound regularization consists of total variation (TV) norm of the frame, l1 norm of the frame in a certain transform domain, and TV norm of the residual between the frame and its prediction. The first two terms in the compound regularization are used...
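One way to write down the three-term compound regularization described in this abstract is sketched below. Only the three terms themselves come from the text. The symbols are assumptions made for illustration: x is the target frame, p its prediction, Ψ the sparsifying transform, Φ the sensing matrix, y the measurements, and λ1, λ2, λ3 nonnegative weights. The abstract does not state how the terms are weighted or how the measurement constraint is enforced.

```latex
\hat{x} \;=\; \arg\min_{x}\;
  \lambda_1\,\mathrm{TV}(x)
  \;+\; \lambda_2\,\lVert \Psi x \rVert_1
  \;+\; \lambda_3\,\mathrm{TV}(x - p)
\quad \text{subject to} \quad y = \Phi x
```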
Emotions play an important role in how we select and consume multimedia. Recent advances in affect detection have focused on detecting emotions continuously. In this paper, for the first time, we continuously detect valence from electroencephalogram (EEG) signals and facial expressions in response to videos. Multiple annotators provided valence levels continuously by watching the frontal facial videos...
Tone Mapping Operators (TMOs) transform High Dynamic Range (HDR) contents to address Low Dynamic Range (LDR) displays. However, before reaching the end-user, these contents are usually compressed using a codec (coder-decoder) for broadcasting or storage purposes. Achieving the best trade-off between rendering and compression efficiency is of prime importance. Any TMO includes a rounding quantization...
Recent years have witnessed a growing interest in modeling user behaviors in multimedia research, emphasizing the need to consider human factors such as preference, activity, and emotion in system development and evaluation. Following this research line, we present in this paper the LiveJournal two-million post (LJ2M) dataset to foster research on user-centered music information retrieval. The new...
Luminance discrepancies between image pairs occur owing to inconsistent parameters between stereoscopic camera devices and to imperfect capture conditions. Such discrepancies induce binocular mismatches and affect the visual comfort felt by viewers, as well as their ability to fuse stereoscopic images. To better understand and observe this effect, we built a stereoscopic image database of 240...