Objective metrics are widely used to estimate the perceived quality of audio and video content in consumer electronics applications. Although a number of such metrics exist, they have all been developed for the unimodal case: cross-modal interaction, i.e. the influence of video on audio and vice versa, is not considered by these metrics. We give an overview of these metrics and summarize human audiovisual perception with a view to modeling cross-modal interaction. Factors that influence perceived quality, such as attention, are also discussed.