Our work addresses the classical problem of merging heterogeneous and asynchronous parameter streams. It is well known that lip reading improves speech recognition scores, especially in noisy conditions; we therefore study more precisely the modeling of acoustic and articulatory parameters in order to propose new Automatic Speech Recognition systems. We use segmental pre-processing and a robust unit, the "pseudo-diphone", and we compare a global HMM with a master-slave HMM. Our experiments confirm the importance of labial features in both clean and noisy environments.