When perceiving a scene visually, we constantly move our eyes and focus on particular details, which we integrate into a coherent percept. Can blind individuals integrate visual information this way? Can they even conceptualize zooming in on sub-parts of visual images? We explore these questions virtually using the EyeMusic Sensory Substitution Device (SSD). SSDs convey information usually received by one sense via another, here ‘seeing’ with sound. These questions are especially important for SSD users, since SSDs typically down-sample visual stimuli into low-resolution images, in which zooming in on sub-parts could significantly improve users' perception. Five blind participants used the EyeMusic with a zoom mechanism in a virtual environment to identify cartoon figures. Using a touchscreen, they could zoom into different parts of the image, identify individual facial features, and integrate them into a full facial representation. These findings show that such integration of visual information is indeed possible even for users who are blind from birth, and demonstrate the approach's potential for practical visual rehabilitation.