In this paper, we discuss robot vision for perceiving people and the environment around a mobile robot. We developed a tele-operated mobile robot equipped with a pan-tilt mechanism that carries a camera and a laser range finder (LRF). We propose a sensor fusion method that extracts a human from the measured data by integrating the outputs of these two sensors based on the concept of synthesis. Next, we propose a hierarchical neural network based on Growing Neural Gas (GNG) to construct 3D environmental relations. Finally, we present experimental results for the proposed methods.
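The abstract names Growing Neural Gas but does not describe how it learns a topology from point data. As a rough illustration only, the following is a minimal sketch of the standard, non-hierarchical GNG algorithm as introduced by Fritzke (1995). It is not the authors' hierarchical method. The class name `GrowingNeuralGas`, its parameter names and default values, and the synthetic planar point cloud used as a stand-in for LRF measurements are all illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class GrowingNeuralGas:
    """Minimal standard GNG sketch (hypothetical names and defaults);
    not the hierarchical variant proposed in the paper."""

    def __init__(self, dim, max_nodes=50, eps_b=0.05, eps_n=0.006,
                 age_max=50, lam=100, alpha=0.5, decay=0.995, seed=0):
        self.rng = random.Random(seed)
        self.max_nodes, self.eps_b, self.eps_n = max_nodes, eps_b, eps_n
        self.age_max, self.lam, self.alpha, self.decay = age_max, lam, alpha, decay
        # Node id -> reference vector and accumulated error; start with two nodes.
        self.w = {i: [self.rng.random() for _ in range(dim)] for i in range(2)}
        self.err = {0: 0.0, 1: 0.0}
        self.edges = {}  # frozenset({i, j}) -> edge age
        self.next_id = 2
        self.steps = 0

    def neighbors(self, i):
        """Nodes connected to node i by an edge."""
        return [j for e in self.edges if i in e for j in e if j != i]

    def fit_sample(self, x):
        # 1. Find the nearest node s1 and the second-nearest node s2.
        s1, s2 = sorted(self.w, key=lambda i: sq_dist(x, self.w[i]))[:2]
        # 2. Age all edges of s1 and accumulate its squared error.
        for e in self.edges:
            if s1 in e:
                self.edges[e] += 1
        self.err[s1] += sq_dist(x, self.w[s1])
        # 3. Move s1 and its topological neighbours toward x.
        self.w[s1] = [wi + self.eps_b * (xi - wi) for wi, xi in zip(self.w[s1], x)]
        for n in self.neighbors(s1):
            self.w[n] = [wi + self.eps_n * (xi - wi) for wi, xi in zip(self.w[n], x)]
        # 4. Create or refresh the s1-s2 edge, drop old edges, drop isolated nodes.
        self.edges[frozenset((s1, s2))] = 0
        self.edges = {e: a for e, a in self.edges.items() if a <= self.age_max}
        connected = {i for e in self.edges for i in e}
        for i in [i for i in self.w if i not in connected]:
            del self.w[i], self.err[i]
        # 5. Every lam samples, insert a node between the highest-error node q
        #    and its highest-error neighbour f.
        self.steps += 1
        if self.steps % self.lam == 0 and len(self.w) < self.max_nodes:
            q = max(self.err, key=self.err.get)
            f = max(self.neighbors(q), key=self.err.get)
            r = self.next_id
            self.next_id += 1
            self.w[r] = [(a + b) / 2 for a, b in zip(self.w[q], self.w[f])]
            del self.edges[frozenset((q, f))]
            self.edges[frozenset((q, r))] = 0
            self.edges[frozenset((f, r))] = 0
            self.err[q] *= self.alpha
            self.err[f] *= self.alpha
            self.err[r] = self.err[q]
        # 6. Decay all accumulated errors.
        for i in self.err:
            self.err[i] *= self.decay


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical stand-in for LRF data: noisy 3D points near the plane z = 0.
    points = [[rng.random(), rng.random(), 0.01 * rng.gauss(0, 1)] for _ in range(5000)]
    gng = GrowingNeuralGas(dim=3, max_nodes=30)
    for p in points:
        gng.fit_sample(p)
    print(len(gng.w), "nodes,", len(gng.edges), "edges")
```

The resulting node set approximates the point distribution, and the edge set approximates its topology, which is the property that makes GNG attractive for building environmental relations from range data. A hierarchical scheme such as the one the abstract mentions would presumably build further structure on top of this basic network; that part is not sketched here.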