We present an approach by which a service robot identifies a caller or callers, enabling natural interaction with people in a home or office environment. The specific problem addressed in this paper is how to identify a caller successfully in a cluttered environment where the sensed audio-visual cues carry large uncertainties. The approach rests on the proposition that dependable perceptual recognition is unlikely to come from "the effort to make individual sensing perfect", and more likely to come from "the effort to self-generate perceptual behaviors of integrating individual sensing that lead to mission accomplishment, no matter how imperfect and uncertain individual sensing may be". We implement this proposition as a novel robotic architecture, referred to here as the "cognitive robotic engine (CRE)." We describe an implementation of CRE for a robot identifying a caller in a crowded and noisy environment and report its experimental results.