Deep learning, or the deep neural network (DNN), has become an increasingly powerful and popular technique for significantly improving the quality of many important Internet applications and services, such as speech recognition, image search, web search, and advertising. Large-scale DNNs, with big training data and a diverse set of parameters, face a shortage of computation power on traditional computing platforms. To address these challenges, industry and academia are exploring approaches to large-scale DNN training and online services at the algorithm, system, and architecture levels. At Baidu, the largest Internet search company in China, we have designed a production-quality, PCIe-based FPGA accelerator for large-scale DNN systems. The FPGA DNN accelerator enables efficient large-scale DNN training and online services at low cost and low power.