Hidden Markov Models (HMMs) provide a powerful framework for bridging the semantic gap between low-level video features and high-level user needs by taking full advantage of our prior knowledge of video structure. A serious limitation of HMMs, however, is that they require all the modalities of a video document to be strictly synchronous before their fusion. Taking the analysis of tennis broadcasts as a case study, we introduce video indexing with Segment Models, a generalization of Hidden Markov Models in which the fusion of different modalities can be performed more flexibly. Operating essentially as a layered topology, Segment Models allow the fusion of asynchronous modalities without relying on synchronization points fixed a priori. They also make it possible to fuse, on top of the raw low-level audio frames, audio models of high-level semantics, such as the content of a complete scene. Experiments with Segment Models yield encouraging results.
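The key difference from frame-synchronous HMM decoding can be sketched as a semi-Markov Viterbi pass, where each state scores a variable-length segment of observations as a whole rather than one frame at a time. The following is a minimal hypothetical illustration, not the paper's implementation: the function name, the additive log-score decomposition, and the explicit duration model `dur_score` are all assumptions made for the sketch.

```python
import math

def segment_viterbi(frame_scores, trans, dur_score, max_dur):
    """Semi-Markov (Segment Model) Viterbi decoding sketch.

    frame_scores[t][s] -- log-score of frame t under state s
    trans[p][s]        -- log-score of a transition from state p to s
    dur_score(s, d)    -- log-score of a segment of duration d in state s
    max_dur            -- maximum allowed segment length

    Returns (best log-score, list of (state, start, end) segments).
    Unlike a plain HMM, a whole segment of frames is scored jointly,
    so duration and segment-level models can be plugged in directly.
    """
    T, S = len(frame_scores), len(frame_scores[0])
    best = [[-math.inf] * S for _ in range(T + 1)]  # best[t][s]: path ending at t in s
    back = [[None] * S for _ in range(T + 1)]       # backpointers: (segment start, prev state)
    for t in range(1, T + 1):
        for s in range(S):
            for d in range(1, min(max_dur, t) + 1):
                start = t - d
                # Score the whole segment [start, t) in state s at once.
                seg = sum(frame_scores[u][s] for u in range(start, t)) + dur_score(s, d)
                if start == 0:
                    cand, prev = seg, None  # first segment: no incoming transition
                else:
                    cand, prev = max(
                        (best[start][p] + trans[p][s] + seg, p) for p in range(S)
                    )
                if cand > best[t][s]:
                    best[t][s] = cand
                    back[t][s] = (start, prev)
    # Backtrace the optimal segmentation.
    s = max(range(S), key=lambda x: best[T][x])
    t, segs = T, []
    while t > 0:
        start, prev = back[t][s]
        segs.append((s, start, t))
        t, s = start, (prev if prev is not None else s)
    return max(best[T]), segs[::-1]
```

Because each candidate segment is scored as a unit, asynchronous cues can be attached at segment boundaries discovered during decoding instead of at synchronization points fixed in advance.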