Advances in the fusion of multi-sensor inputs have necessitated more sophisticated techniques for assessing fused images. The current work extends previous studies of participant accuracy in tracking individuals in a video sequence. Participants were shown visible and infrared (IR) videos individually, the two video inputs side-by-side, and videos fused by averaging, discrete wavelet transform, and dual-tree complex wavelet transform. Two scenarios were shown: one featured a camouflaged man walking down a pathway through foliage and across a clearing; the other featured several individuals moving around the clearing. The side-by-side scanpath data were analysed in terms of how often participants looked at the visible and infrared sides and how accurately they tracked the given target, and the results were compared with previously analysed data. The results are discussed in the context of wider applications to image assessment and the potential for modelling human scanpath performance.