Acoustic-articulatory relationships and inversion in sum-product and deep-belief networks

Frank Rudzicz; Arvid Frydenlund; Sean Robertson; Patricia Thaine

doi:10.1016/j.specom.2016.03.001

Source

Speech Communication, Volume 79 (2016), pp. 61-73

Abstract

We provide the first direct comparison of sum-product networks (SPNs) and deep-belief networks (DBNs) on speech, and the first application of SPNs to acoustic-articulatory inversion. Speech from individuals with cerebral palsy is reconstructed significantly more accurately with SPNs than with DBNs across all manners of articulation. To select appropriate input parameters, we first compare MFCCs, wavelets, scattering coefficients, and vocal ‘tract variables’ as predictors of phonological features. Of these, MFCCs yield more accurate classification than the other feature types over a broad array of phonological categories (in the high 90s in many cases). All experiments use the MOCHA-TIMIT and TORGO acoustic-articulatory databases.