Given that the reliability of a very large-scaled system is inversely related to the number of computing elements, fault tolerance has become a major concern in high performance computing including the most recent deployments with graphic processing units (GPUs). Many fault tolerance strategies, such as the checkpoint/restart mechanism, have been studied to mitigate failures within such systems. However, fault tolerance mechanisms generate additional costs and these may cause a significant performance drop if it is not used carefully. This paper presents a novel fault tolerance scheduling model that explores the interplay between the GPGPU application performance and the reliability of a large GPU system. This work focuses on the checkpoint scheduling model that aims to minimize fault tolerance costs. Additionally, a GPU performance analysis is conducted. Furthermore, the effect of a checkpoint/restart mechanism on the application performance is thoroughly studied and discussed.