Checkpoint/restart has been an effective mechanism for achieving fault tolerance in many scientific applications. However, as GPUs play an increasingly large role in high-performance computing, no effective checkpoint/restart scheme exists for them yet, owing to the GPU's batch-mode execution. This paper proposes an application-level checkpoint/restart scheme that saves and restores GPU computation states. A precompiler and a run-time support module are developed to construct and save states dynamically in CPU system memory. Secondary storage can be used for scalability and long-term fault tolerance. CUDA applications with complicated memory use are supported as well. Experimental results demonstrate the effectiveness of the proposed scheme.
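
As a rough illustration of the general idea of application-level checkpointing between batch steps, the following host-only C++ sketch copies a simulated device buffer into CPU memory at fixed intervals and resumes from the last copy after a simulated fault. It is not the scheme developed in the paper: the names `Checkpoint`, `run_step`, `save_checkpoint`, `restore_checkpoint`, and `run` are hypothetical, a `std::vector` stands in for GPU device memory, and a real CUDA program would perform the copies with calls such as `cudaMemcpy` between kernel launches.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical checkpoint record held in CPU system memory: a copy of the
// (simulated) device buffer plus the index of the next step to execute.
struct Checkpoint {
    std::vector<float> data;
    std::size_t next_step = 0;
};

// Stand-in for one kernel launch: every element is updated once per step.
void run_step(std::vector<float>& device, std::size_t step) {
    for (float& x : device) {
        x = x * 2.0f + static_cast<float>(step);
    }
}

// Copy the simulated device state to the host-side checkpoint.
// In a CUDA program this would be a device-to-host memory copy.
void save_checkpoint(Checkpoint& ckpt, const std::vector<float>& device,
                     std::size_t next_step) {
    ckpt.data = device;
    ckpt.next_step = next_step;
}

// Restore the simulated device state and return the step to resume from.
std::size_t restore_checkpoint(const Checkpoint& ckpt,
                               std::vector<float>& device) {
    device = ckpt.data;
    return ckpt.next_step;
}

// Run total_steps steps over n elements, checkpointing every `interval`
// steps. A single fault is simulated just before step `fail_at`; pass a
// value >= total_steps to run without any fault.
std::vector<float> run(std::size_t n, std::size_t total_steps,
                       std::size_t interval, std::size_t fail_at) {
    std::vector<float> device(n, 1.0f);
    Checkpoint ckpt;
    save_checkpoint(ckpt, device, 0);  // initial state is always recoverable

    bool failed = false;
    std::size_t step = 0;
    while (step < total_steps) {
        if (!failed && step == fail_at) {
            failed = true;
            device.assign(n, 0.0f);                   // simulated state loss
            step = restore_checkpoint(ckpt, device);  // roll back and resume
            continue;
        }
        run_step(device, step);
        ++step;
        if (step % interval == 0) {
            save_checkpoint(ckpt, device, step);
        }
    }
    return device;
}
```

Under these assumptions, calling `run(4, 6, 2, 5)` loses the state before step 5, rolls back to the checkpoint taken after step 3, and recomputes steps 4 and 5, so it ends with the same buffer as the fault-free call `run(4, 6, 2, 6)`. The checkpoint interval trades the cost of copying state against the amount of work repeated after a fault.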