The objective of the work reported here is to provide an automatic, context-of-capture categorization, structure detection and segmentation of news broadcasts employing a multimodal semantic based approach. We assume that news broadcasts can be described with context-free grammars that specify their structural characteristics. We propose a system consisting of two main types of interoperating units: The recognizer unit consisting of several modules and a parser unit. The recognizer modules (audio, video and semantic recognizer) analyze the telecast and each one identifies hypothesized instances of features in the audiovisual input. A probabilistic parser analyzes the identifications provided by the recognizers. The grammar represents the possible structures a news telecast may have, so the parser can identify the exact structure of the analyzed telecast.