Automated Essay Scoring (AES) has long been a challenge in the field of language testing. The first step towards AES is to build a scoring model from a dataset of essays that have already been scored by human raters; however, researchers are often confronted with a lack of such datasets. From a mathematical point of view, a small dataset is in fact enough to build a scoring model comparable to one generated from a large dataset, which improves both researchers' efficiency and data efficiency. This paper presents a small dataset extraction algorithm (SDEA) and applies it, alongside a traditional large-dataset scoring model, on an automated scoring software platform based on Latent Semantic Analysis (LSA). Experimental results show that although SDEA uses only 25% of the data, it achieves results close to those of the traditional large-dataset scoring model, which indicates that SDEA is practicable and effective.