This paper proposes an efficient video coding method based on audio-visual attention, which is motivated by the fact that cross-modal interaction significantly affects humans' perception of multimedia content. First, we propose an audio-visual source localization method to locate the sound source in a video sequence. Then, its result is used for applying spatial blurring to video frames in order to...
In the H.264/AVC coding standard, motion estimation (ME) may use multiple reference frames to better exploit the temporal redundancy in a video sequence. Although this further reduces motion compensation errors, it also introduces tremendous computational complexity. In this paper, we propose a statistical learning approach to reduce the computation involved in the multireference...
The color variations among different viewpoints in multiview video sequences may deteriorate the visual quality and coding efficiency. Various color correction methods have been proposed; however, the color appearance and histogram of the corrected target frames are not similar enough in detail to those of the reference frames. Focusing on restoring more similar color, a block-based color correction algorithm...
In this paper, we propose a fast multi-layer motion estimation algorithm for the spatial scalability provided in the H.264/AVC scalable extension, based on the reuse of motion vectors from multiple spatial layers. The reused motion vector is used to set a search center and is then refined within a small search area. However, the reused motion vector often produces significant prediction error at object boundaries...
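The idea of reusing a motion vector as a search center and refining it within a small area can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details are not given here: the function names (`sad`, `upscale_mv`, `refine_motion_vector`), the sum-of-absolute-differences cost, the dyadic scaling factor of 2, and the exhaustive search of the small window are all assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def upscale_mv(mv, factor=2):
    """Scale a lower-layer motion vector to the resolution of the layer
    above it (assumes a dyadic spatial ratio by default)."""
    return (mv[0] * factor, mv[1] * factor)


def refine_motion_vector(cur, ref, bx, by, bs, center, radius):
    """Exhaustively test every displacement within `radius` of `center`
    and return (best_mv, best_cost). Candidates that would read outside
    the reference frame are skipped."""
    h, w = len(ref), len(ref[0])
    cx, cy = center
    best_mv, best_cost = None, None
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            inside_x = 0 <= bx + dx and bx + dx + bs <= w
            inside_y = 0 <= by + dy and by + dy + bs <= h
            if not (inside_x and inside_y):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Hypothetical test frames: the current frame is the reference frame
# displaced by (5, 2), clamped at the border.
SIZE = 16
ref_frame = [[x * x + 17 * y * y for x in range(SIZE)] for y in range(SIZE)]
cur_frame = [[ref_frame[min(y + 2, SIZE - 1)][min(x + 5, SIZE - 1)]
              for x in range(SIZE)] for y in range(SIZE)]

# A lower-layer vector of (2, 1) gives the search center (4, 2); a radius
# of 1 means only nine candidates are evaluated around it.
center = upscale_mv((2, 1))
best_mv, best_cost = refine_motion_vector(cur_frame, ref_frame, 4, 4, 4, center, 1)
```

With a radius of 1 the refinement evaluates at most nine candidates per block instead of a full search window, which is where the speed-up of such reuse schemes comes from. The sketch does nothing special at object boundaries, where the abstract notes that the reused vector tends to predict poorly.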
A kurtosis-based super-resolution image reconstruction algorithm is proposed in this paper. First, we define the kurtosis image and analyze two of its properties: (i) the kurtosis image is invariant to Gaussian noise, and (ii) the absolute value of a kurtosis image becomes smaller as the image gets smoother. Then we build a constrained absolute local kurtosis maximization function...
In this paper, a line-warping-based deinterlacing method is introduced. The missing pixels in interlaced videos can be derived by warping pixels in horizontal line pairs. To increase the accuracy of temporal prediction, multiple temporal line pairs, selected according to a constant velocity model, are used for warping. The stationary pixels can be well-preserved by accuracy stationary...
This paper presents a new directional image interpolator, aiming to increase image resolution with high perceptual quality and low computational complexity. In our method, missing pixels in a magnified image are generated through linear interpolation on certain fixed supports to facilitate fast implementation, while local directional features are imposed on the adaptive interpolation weights which...
Spatial resolutions of IKONOS high-resolution panchromatic (PAN) and low-resolution multispectral (MS) satellite images are 1 m and 4 m, respectively. To cope with color distortion and blocking artifacts in fused images, in this study, a new IKONOS imagery fusion approach using particle swarm optimization (PSO) is proposed. The pixels of fused images in the training set are classified into several...
Human eyes have different sensitivities to different frequency components of image signals; typically, low-frequency components are more crucial to the perceptual quality of images than high-frequency components. Based on this observation, we propose a novel sampling scheme for the compressive sensing framework by designing a weighting scheme for the sampling matrix. By adjusting the weighting...
In this paper, we propose an efficient local stereo algorithm for accurate disparity estimation. First, we attain initial disparity estimates by iterating a cross-based cost aggregation process. Then, we propose a robust voting scheme to refine the initial estimates based on a piecewise smoothness prior, improving the quality in occluded regions and low-textured regions effectively. The refinement...
Algorithmic and protocol constraints of most low bitrate compression schemes lead to audio signals of low bandwidth and, inevitably, of low perceptual audio quality. Audio bandwidth extension methods address this problem by reconstructing the high frequency spectrum of a degraded signal based on information from the low frequency part. In this work, a novel audio bandwidth extension method is presented...
Motion compensated frame interpolation (MCFI) is one of the most efficient solutions for generating side information (SI) in the context of distributed video coding. However, depending on the video content, it creates SI with rather significant motion compensation errors in some frame regions and rather small errors in others. In this paper, a low complexity intra mode selection algorithm is proposed...
In this paper, we introduce a novel way to represent an image sequence, which naturally exhibits the temporal persistence of the textures. Standardized representations have been thoroughly optimized, and getting significant improvements has become more and more difficult. As an alternative, Analysis-Synthesis (AS) coders have focused on the use of texture within a video coder. We introduce here a...
This paper proposes two entropy constrained color splitting algorithms through building a binary tree structure for a progressive transmission of palette images. At each step of color splitting, a representative color is split into two new representative colors to minimize the distortion incurred by the reconstructed image subject to an entropy constraint. Among the bit rates of interest, both of...
A fast macroblock mode selection algorithm based on dynamic multiple thresholds is proposed to improve the encoding speed of multiview video with insignificant degradation in rate-distortion (RD) performance. The macroblock modes are divided into four classes after statistically analyzing the macroblock mode selection results. Three thresholds are adopted based on the great RD cost gaps between the...
Acoustic echoes are a major source of discomfort in hands-free, full-duplex communication systems. The problem becomes particularly difficult when the loudspeakers are nonlinear, as considered in this paper. In contrast to the single-microphone linear and nonlinear acoustic echo cancellation techniques, we take advantage of the spatial diversity offered by microphone arrays. Indeed, having...
In this paper we propose a novel language identification system which utilizes fused phonotactic information. The phase spectrum of speech signals is used with the magnitude spectrum in order to obtain a more robust feature representation. Parallel Broad Phoneclass Recognition followed by Language Model (PBPRLM) is used in order to remove the bias of the likelihood scores introduced by the size inequality...
The bilateral filter has shown outstanding performance in image denoising and other multimedia applications. In this paper, a new color interpolation technique named joint bilateral demosaicking is proposed. Considering the image gradient, an edge-sensing initialization step is performed. In addition, the joint bilateral filter exploits the correlation between color channels with the information from...
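For readers unfamiliar with the filter underlying this abstract, the following is a minimal sketch of a generic (joint) bilateral filter on a single-channel image. It is not the demosaicking method from the paper: the function name `bilateral_filter`, the Gaussian kernels, and the default values of `radius`, `sigma_s`, and `sigma_r` are assumptions for illustration. When a separate `guide` image is passed, the range weights are taken from the guide rather than from the filtered image, which is the usual meaning of "joint".

```python
import math


def bilateral_filter(img, guide=None, radius=2, sigma_s=2.0, sigma_r=25.0):
    """Smooth `img` (a list of rows of numbers) while preserving edges.

    Each output pixel is a weighted mean of its neighbours. The weight is
    the product of a spatial Gaussian (distance in pixels) and a range
    Gaussian (intensity difference measured in `guide`). With guide=None
    the image guides itself, giving the ordinary bilateral filter."""
    if guide is None:
        guide = img
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num, den = 0.0, 0.0
            # The window is clipped at the image border.
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    d2 = (i - x) ** 2 + (j - y) ** 2
                    r2 = (guide[j][i] - guide[y][x]) ** 2
                    weight = (math.exp(-d2 / (2.0 * sigma_s ** 2))
                              * math.exp(-r2 / (2.0 * sigma_r ** 2)))
                    num += weight * img[j][i]
                    den += weight
            # den is at least 1.0 because the centre pixel has weight 1.
            out[y][x] = num / den
    return out


# Hypothetical 5x8 test image with a vertical step edge between
# columns 3 and 4.
step = [[0.0] * 4 + [200.0] * 4 for _ in range(5)]
smoothed = bilateral_filter(step, sigma_r=10.0)
```

Because the range weight collapses for pixels on the other side of a strong edge, the step in the test image survives filtering almost unchanged, whereas a plain Gaussian blur would smear it. In a joint setting, a densely sampled channel could serve as `guide` while filtering a sparser one, which is one way to exploit inter-channel correlation.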
Empirical mode decomposition (EMD), developed by Huang et al., is a nonlinear data analysis method for nonstationary real-valued time series. It has been applied extensively in many research areas. Recently, several generalized EMD methods for complex-valued data analysis have been proposed. Since a plane closed curve comprises many two-dimensional (2D) space data points, one can imagine that the boundary...
To address data imbalance in multilingual speech recognition and pronunciation variation across languages, we propose an approach to clustering context-dependent phones from an extended phone set in an acoustic model trained on an unbalanced bilingual corpus. First, we generate an extended phone set using pronunciation modeling by a confidence...