In this talk, I will introduce a series of studies on deep learning for visual understanding, focusing on three aspects: 1) how to keep the model small while maintaining high accuracy, 2) how to design proper objective functions to improve learnability, and 3) how to design network structures that accelerate inference.
Multimedia is defined by Webster's Dictionary as "using or involving several forms of communication or expression". It is commonly understood that a multimedia exhibit combines photographs, films, and music. In the 1993 first edition of McGraw-Hill's Multimedia: Making It Work, Tay Vaughan declared "Multimedia is any combination of text, graphic art, sound, animation, and video that...
Our planet is photographed on a daily basis by dozens of imaging satellites, hundreds of airplanes and drones, and thousands of cars collecting street-level imagery. This imagery is critical to consumer products such as Google Earth or Google Street View, which let users travel virtually and explore any destination around the world. In addition, it is used by governments and commercial entities to...
With the emerging market for 3D imaging products, 3D video has become an active area of research and development in recent years. 3D video is key to providing more realistic and immersive perceptual experiences than its 2D counterpart. There are many applications of 3D video, such as 3D movies and 3DTV, which are considered the main drivers of the next-generation technical revolution. Stereoscopic...
In this paper, a new mode decision scheme is proposed for depth map coding in 3D-AVS. The paper makes two main contributions. First, an improved model is proposed for estimating the distortion of synthesized views. Second, for mode decision in depth map coding, the distortion is expressed as the weighted sum of the depth distortion and the estimated distortion of the synthesized...
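The abstract is truncated, so the exact cost function is not stated. A common way to express a weighted-distortion mode decision is sketched below as an assumption only: the weights $\omega_d$ and $\omega_s$, the rate term $R$, the Lagrange multiplier $\lambda$, and the candidate mode set $\mathcal{M}$ are standard rate-distortion notation introduced here for illustration, not taken from the paper.

```latex
% Hypothetical weighted distortion for a candidate mode m (weights are assumptions)
D(m) = \omega_d \, D_{\text{depth}}(m) + \omega_s \, \hat{D}_{\text{syn}}(m)

% Assumed Lagrangian rate-distortion cost and mode selection
J(m) = D(m) + \lambda \, R(m), \qquad
m^{*} = \arg\min_{m \in \mathcal{M}} J(m)
```

Setting $\omega_s = 0$ reduces this to conventional depth-only mode decision, which is the case the synthesized-view term is meant to improve on.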
As projections of the real world, videos usually contain many repeated patterns with similar structures across regions, exhibiting strong non-local correlations. Moreover, different videos have different characteristics. Exploiting these non-local correlations through off-line training of transforms has attracted considerable attention in video compression in recent years. However, the samples used for...
Video shot boundary detection (SBD) is a prerequisite for further video analysis such as video retrieval and annotation. Great efforts have been made to develop SBD algorithms that are both fast and accurate. Most works use frame histograms as features to measure inter-frame similarity for detection. However, when the change across a shot boundary is small and the backgrounds of the adjacent shots are highly similar, most state-of-the-art...
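As a rough illustration of the histogram-based approach the abstract describes (not the paper's own method), the sketch below compares normalized gray-level histograms of consecutive frames and declares a cut when their distance exceeds a threshold. Frames are assumed to be flat lists of 8-bit grayscale values; the half-L1 distance, the 16-bin histogram, and the 0.5 threshold are illustrative choices.

```python
from typing import List, Sequence


def gray_histogram(frame: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit grayscale pixel values (0..255)."""
    counts = [0] * bins
    for p in frame:
        counts[p * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half L1 distance: 0 for identical histograms, 1 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_shot_boundaries(frames: Sequence[Sequence[int]],
                           bins: int = 16,
                           threshold: float = 0.5) -> List[int]:
    """Return indices i where a cut is declared between frame i-1 and frame i."""
    hists = [gray_histogram(f, bins) for f in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


if __name__ == "__main__":
    dark, dark2 = [10] * 100, [12] * 100
    bright, bright2 = [200] * 100, [205] * 100
    print(detect_shot_boundaries([dark, dark2, bright, bright2]))  # [2]
```

The weakness the abstract points to is visible here: two frames with the same gray-level distribution but different content (for example, `[0] * 50 + [255] * 50` and its reverse) have distance 0, so a cut between them is missed.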
Bit allocation plays an important role in rate control because it determines how the other parameters of the rate control model are computed. For surveillance videos, bit distribution analysis shows that foreground regions should receive more bits and be encoded at higher quality than background regions. By utilizing the background and foreground information (BFI) provided by surveillance videos,...
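The abstract does not give the allocation rule, so the following is only a generic sketch of weighted bit allocation that favors the foreground: a frame's bit budget is split in proportion to pixel count times a per-region weight. The function name `allocate_bits` and the default foreground weight of 2.0 are hypothetical.

```python
from typing import Tuple


def allocate_bits(total_bits: float,
                  fg_pixels: int,
                  bg_pixels: int,
                  fg_weight: float = 2.0) -> Tuple[float, float]:
    """Split a frame's bit budget between foreground and background.

    Each region's share is proportional to (pixel count x weight); the
    background weight is fixed at 1.0, so fg_weight > 1 gives foreground
    pixels more bits each. The weight value is an illustrative assumption.
    """
    bg_weight = 1.0
    denom = fg_weight * fg_pixels + bg_weight * bg_pixels
    if denom <= 0:
        raise ValueError("region sizes and weights must be positive")
    fg_bits = total_bits * fg_weight * fg_pixels / denom
    return fg_bits, total_bits - fg_bits


if __name__ == "__main__":
    fg, bg = allocate_bits(1000, fg_pixels=100, bg_pixels=300)
    print(fg, bg)  # 400.0 600.0 -> 4 bits per foreground pixel, 2 per background pixel
```

With `fg_weight=1.0` the split degenerates to plain area-proportional allocation, which is the baseline a foreground-aware scheme is meant to improve on.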
If scene flow expressed in three-dimensional (3D) vector fields is robustly estimated from multi-view or multi-focus images, we can develop advanced 3D motion tracking and motion compensation for 3D video compression. In this study, based on a synthesis of multi-focus images from multi-view images, we propose a novel method for analyzing 3D scene flow accurately at low computational cost as an extension...
This paper presents a novel pyramid stereo matching method to improve the matching accuracy of panoramas. Initial camera parameters and feature correspondences are obtained from Structure From Motion (SFM) with normal images extracted from two panoramas. Then a stereo matching pyramid is constructed to refine the feature correspondences layer by layer, and the correspondence is corrected in the original...
The convolutional neural network (CNN) is a state-of-the-art machine learning algorithm. For CNN accelerator implementations, fixed-point and floating-point are two typical numeric representations. Reducing the word length saves hardware and power overheads, but because of rounding effects it sacrifices computational accuracy. The inherent robustness of neural networks makes it possible...
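To make the word-length trade-off concrete, here is a minimal sketch of signed fixed-point quantization with rounding and saturation, plus a quantized dot product of the kind a CNN multiply-accumulate unit computes. The helper names and the choice of Python's built-in `round` (round-half-to-even) are assumptions for illustration, not the paper's design; real accelerators also bound the accumulator width, which this sketch does not.

```python
from typing import Sequence


def to_fixed_point(x: float, word_length: int, frac_bits: int) -> int:
    """Quantize x to a signed two's-complement code with frac_bits fractional bits.

    Uses round-half-to-even (Python's round) and saturates to the
    representable range [-2^(W-1), 2^(W-1) - 1].
    """
    q = round(x * (1 << frac_bits))
    lo = -(1 << (word_length - 1))
    hi = (1 << (word_length - 1)) - 1
    return max(lo, min(hi, q))


def from_fixed_point(q: int, frac_bits: int) -> float:
    """Convert an integer code back to its real value."""
    return q / (1 << frac_bits)


def quantization_error(x: float, word_length: int, frac_bits: int) -> float:
    """Absolute error introduced by one quantize/dequantize round trip."""
    q = to_fixed_point(x, word_length, frac_bits)
    return abs(x - from_fixed_point(q, frac_bits))


def fixed_point_dot(xs: Sequence[float], ws: Sequence[float],
                    word_length: int, frac_bits: int) -> float:
    """Dot product with quantized operands.

    Each product has 2 * frac_bits fractional bits; the sum is kept in an
    unbounded Python int here and rescaled once at the end.
    """
    acc = 0
    for x, w in zip(xs, ws):
        acc += (to_fixed_point(x, word_length, frac_bits)
                * to_fixed_point(w, word_length, frac_bits))
    return acc / (1 << (2 * frac_bits))


if __name__ == "__main__":
    w = 0.7071
    print(quantization_error(w, 16, 14))  # about 7.7e-06
    print(quantization_error(w, 8, 6))    # about 4.0e-03
    print(fixed_point_dot([0.5, 0.25], [0.5, 0.5], 8, 6))  # 0.375
```

Shortening the word from 16 to 8 bits raises the round-trip error on the same weight by roughly three orders of magnitude, which is the accuracy cost that the robustness of neural networks is expected to absorb.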
Image enhancement is widely popular because of its ability to produce "better" visual quality for specific applications. Although many enhancement algorithms have been developed in recent years, studies on the blind assessment of enhanced images are still scarce. In this paper, we propose a data-driven blind image quality assessment (BIQA) method based on the quality-aware deep...
In face recognition for criminal identification, the training data are always clean, while the probe data are occluded by sunglasses, scarves, or other facial accessories. Occlusions in the probe data severely degrade recognition performance. We find that introducing artificial occlusions into the training data helps in this situation. The incremental training data are decomposed into a class-specific...
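A minimal sketch of the augmentation step mentioned above, under stated assumptions: images are grayscale lists of rows, an occlusion is a constant-valued rectangle (a crude stand-in for sunglasses or a scarf), and each occluded copy keeps its original identity label. The helper names and box coordinates are hypothetical; the paper's subsequent decomposition is not reproduced here.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, height, width), hypothetical format


def add_occlusion(img: Image, box: Box, value: int = 0) -> Image:
    """Return a copy of img with a constant-valued rectangle pasted over it."""
    top, left, height, width = box
    out = [row[:] for row in img]  # copy so the clean image is left untouched
    for r in range(top, min(top + height, len(out))):
        for c in range(left, min(left + width, len(out[r]))):
            out[r][c] = value
    return out


def augment_with_occlusions(images: Sequence[Image], labels: Sequence[int],
                            boxes: Sequence[Box]) -> Tuple[List[Image], List[int]]:
    """Keep the clean samples and append one occluded copy per box per image."""
    aug_x, aug_y = list(images), list(labels)
    for img, label in zip(images, labels):
        for box in boxes:
            aug_x.append(add_occlusion(img, box))
            aug_y.append(label)  # occlusion does not change identity
    return aug_x, aug_y


if __name__ == "__main__":
    face = [[255] * 4 for _ in range(4)]
    eye_band = (1, 0, 1, 4)    # horizontal band, a rough "sunglasses" region
    mouth_block = (2, 0, 2, 4)  # lower block, a rough "scarf" region
    xs, ys = augment_with_occlusions([face], [7], [eye_band, mouth_block])
    print(len(xs), ys)  # 3 [7, 7, 7]
```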