To automate remote voice detection with a laser Doppler vibrometer (LDV), we integrate a pan–tilt–zoom (PTZ) camera, a mirror, and a pan–tilt unit (PTU) with the LDV to form a multimodal sensing system. With the aid of these vision and active-control components, the LDV can automatically select the best reflective surfaces, point the laser beam at the selected surfaces, and focus the beam quickly. To accomplish these functions, we propose distance-measurement and sensor-calibration methods based on triangulation between the PTZ camera and the LDV laser beam as steered by the mirror. Using both the measured distances and the LDV's return-signal levels, we design a fast, automatic LDV focusing algorithm. We also develop strategies and supporting image-processing techniques for surface selection and laser pointing. Experimental results validate the performance improvement that the multimodal system brings to the LDV in remote automatic voice detection.
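As an illustration of the triangulation idea only, the sketch below computes target range from a known baseline and two angles using the law of sines. It assumes a simplified planar geometry that is not taken from this work: the camera's optical center and the mirror's pivot lie at the two ends of a known baseline, the camera measures the angle between the baseline and its ray to the laser spot, and the PTU/mirror reports the angle between the baseline and the outgoing beam. The function name `triangulate_distance` and its parameters are hypothetical, and the actual calibration model in this work may differ.

```python
import math


def triangulate_distance(baseline_m, cam_angle_rad, beam_angle_rad):
    """Planar triangulation (hypothetical helper, assumed geometry).

    baseline_m:     distance between camera optical center and mirror pivot
    cam_angle_rad:  angle at the camera between the baseline and the ray to the laser spot
    beam_angle_rad: angle at the mirror between the baseline and the outgoing laser beam

    Returns (range_from_camera, range_from_mirror) in the baseline's units.
    """
    if baseline_m <= 0 or cam_angle_rad <= 0 or beam_angle_rad <= 0:
        raise ValueError("baseline and angles must be positive")
    # The third angle of the triangle sits at the laser spot on the target.
    apex = math.pi - cam_angle_rad - beam_angle_rad
    if apex <= 0:
        raise ValueError("camera ray and laser beam do not converge")
    s = math.sin(apex)
    # Law of sines: each side is proportional to the sine of its opposite angle.
    range_from_camera = baseline_m * math.sin(beam_angle_rad) / s
    range_from_mirror = baseline_m * math.sin(cam_angle_rad) / s
    return range_from_camera, range_from_mirror


# Usage: a 0.5 m baseline with nearly parallel rays gives a distant target.
d_cam, d_mirror = triangulate_distance(0.5, math.radians(88.0), math.radians(89.0))
```

Because the two rays become nearly parallel as the target recedes, small angular errors produce large range errors at long distances; in this sketch, a range estimate of this kind would serve only as a coarse starting point for focusing, with the LDV's return-signal level used for refinement.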