Autonomous robots and humans need to create a coherent 3D representation of their peripersonal space in order to interact with nearby objects. Recent studies in visual neuroscience suggest that the small coordinated head/eye movements that humans continually perform during fixation provide useful depth information. In this work, we mimic this behavior on a humanoid robot and propose a computational model that extracts depth information without requiring the kinematic model of the robot. First, we show that, during fixational head/eye movements, proprioceptive cues and optic flow lie on a low-dimensional subspace that is a function of the depth of the target. Then, we use the generative adaptive subspace self-organizing map (GASSOM) to learn these depth-dependent subspaces. Finally, the depth of the target is decoded using a winner-take-all strategy. The proposed model is validated on a simulated model of the iCub robot.
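
To make the idea of depth-dependent subspaces and winner-take-all decoding concrete, the following is a minimal sketch under strong simplifying assumptions, not the model proposed in this work. It assumes a toy motion-parallax relation in which the optic flow of the fixated target equals the head velocity divided by the target depth, so that each depth defines a one-dimensional subspace of the joint (proprioception, optic flow) space. It also replaces GASSOM with a plain SVD fit per candidate depth. The function names, depth values, units, and noise level are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(depth, n, noise=0.01):
    """Toy samples [head velocity, optic flow] for a target at `depth`.

    Assumption: flow = velocity / depth (a simplified motion-parallax
    relation), corrupted by small Gaussian noise.
    """
    v = rng.uniform(-1.0, 1.0, size=n)   # proprioceptive cue (hypothetical units)
    flow = v / depth                      # toy depth-dependent optic flow
    x = np.stack([v, flow], axis=1)
    return x + noise * rng.standard_normal(x.shape)


def fit_subspace(samples, dim=1):
    """Return an orthonormal basis (as columns) of the dominant subspace.

    Assumption: an SVD fit stands in for GASSOM learning here.
    """
    _, _, vt = np.linalg.svd(samples, full_matrices=False)
    return vt[:dim].T


def decode_depth(samples, depths, bases):
    """Winner-take-all: pick the depth whose subspace captures the most energy."""
    energy = [np.sum((samples @ b) ** 2) for b in bases]
    return depths[int(np.argmax(energy))]


# One subspace per candidate depth (hypothetical values, in metres).
depths = np.array([0.3, 0.5, 0.8, 1.2])
bases = [fit_subspace(simulate(d, 500)) for d in depths]

# Decode the depth of a new target from a short burst of fixational movements.
estimate = decode_depth(simulate(0.8, 50), depths, bases)
print("decoded depth:", estimate)
```

In this sketch the decoder needs no kinematic model: it only compares how well each learned subspace explains the observed pairs of proprioceptive and optic-flow signals, and selects the best-matching one.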