In a text-independent pronunciation quality assessment system, the test speech must first be recognized before it can be assessed accurately. Field test data, however, contains noise, accented speech, and a spontaneous speaking style, all of which degrade recognition performance. In this paper, we address these factors by improving the acoustic model (AM) of the speech recognition system. Background noise is added to the training data to improve noise robustness. Speaker-based Cepstral Mean and Variance Normalization (SCMVN) is adopted to alleviate channel distortion and the impact of inter-speaker pronunciation variability. Maximum a Posteriori (MAP) adaptation is applied twice to tune the acoustic model to the pronunciation characteristics of the accent and of the spontaneous speaking style. Experimental results show that these measures yield a relative increase of 44.1% in the word correct rate and a relative increase of 6.3% in the correlation coefficient between machine scores and expert scores.
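
To make the SCMVN step concrete, the following is a minimal sketch, not the implementation used in this work. It assumes that features arrive as utterances labeled with a speaker ID, each utterance being a list of equal-length cepstral frames. The function name `scmvn`, the input layout, and the stabilizing constant `eps` are illustrative assumptions. Statistics are pooled over all frames of a speaker rather than per utterance, which is what distinguishes speaker-based normalization from utterance-level CMVN.

```python
import math
from collections import defaultdict


def scmvn(utterances, eps=1e-8):
    """Speaker-based cepstral mean and variance normalization (sketch).

    utterances: list of (speaker_id, frames), where frames is a list of
                equal-length feature vectors (lists of floats).
    Returns a list of (speaker_id, normalized_frames) in the same order.
    """
    count = defaultdict(int)   # number of frames seen per speaker
    total = {}                 # per-speaker sum of each feature dimension
    total_sq = {}              # per-speaker sum of squares of each dimension

    # Pass 1: accumulate statistics over ALL utterances of each speaker.
    for spk, frames in utterances:
        for frame in frames:
            if spk not in total:
                total[spk] = [0.0] * len(frame)
                total_sq[spk] = [0.0] * len(frame)
            count[spk] += 1
            for d, x in enumerate(frame):
                total[spk][d] += x
                total_sq[spk][d] += x * x

    # Convert the sums into a per-speaker mean and standard deviation.
    stats = {}
    for spk in total:
        n = count[spk]
        mean = [s / n for s in total[spk]]
        std = [math.sqrt(max(sq / n - m * m, 0.0))
               for sq, m in zip(total_sq[spk], mean)]
        stats[spk] = (mean, std)

    # Pass 2: shift and scale every frame with its speaker's statistics.
    normalized = []
    for spk, frames in utterances:
        mean, std = stats[spk]
        normalized.append((spk, [
            [(x - m) / (s + eps) for x, m, s in zip(frame, mean, std)]
            for frame in frames
        ]))
    return normalized
```

After normalization, each feature dimension has approximately zero mean and unit variance within each speaker, which removes speaker- and channel-dependent offsets and scaling before the features reach the acoustic model.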