Abstract. This paper describes a multisensorial person-identification system in which visual and acoustic cues are used jointly for person identification. A simple approach is presented, based on the fusion of the lists of scores produced independently by a speaker-recognition system and a face-recognition system. Reported experiments show that integrating visual and acoustic information improves both the performance and the reliability of the separate systems. Finally, two network architectures, based on radial basis function (RBF) theory, are proposed to describe integration at various levels of abstraction.
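To make the score-fusion idea summarized above concrete, the following is a minimal sketch of how per-person score lists from two independent recognizers might be normalized and combined by a weighted sum; the function names, the min-max normalization, and the weighting scheme are illustrative assumptions for exposition, not the procedure used in the paper.

```python
from typing import Dict

def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one recognizer's score list to [0, 1] so the modalities are comparable.
    (Assumed normalization; the paper's actual scheme may differ.)"""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {person: (s - lo) / span for person, s in scores.items()}

def fuse_scores(speaker_scores: Dict[str, float],
                face_scores: Dict[str, float],
                w_speaker: float = 0.5) -> Dict[str, float]:
    """Combine per-person scores from the speaker and face recognizers
    by a weighted sum over the identities both systems have scored."""
    s_norm = min_max_normalize(speaker_scores)
    f_norm = min_max_normalize(face_scores)
    return {p: w_speaker * s_norm[p] + (1.0 - w_speaker) * f_norm[p]
            for p in s_norm.keys() & f_norm.keys()}

if __name__ == "__main__":
    # Hypothetical score lists for three enrolled persons.
    speaker = {"alice": 0.9, "bob": 0.4, "carol": 0.7}
    face = {"alice": 0.6, "bob": 0.8, "carol": 0.5}
    fused = fuse_scores(speaker, face)
    # Identification decision: the person with the highest fused score.
    print(max(fused, key=fused.get))
```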