The conservative global gyrokinetic toroidal full-f five-dimensional Vlasov simulation (GT5D) is a nuclear fusion simulation program designed to analyze turbulence phenomena in tokamak plasma. In this research, we optimize it for graphics processing unit (GPU) clusters with multiple GPUs on each node. Based on the profile results of a GT5D on a CPU node, it was decided to offload the entire time development part of the program to GPUs, except for MPI communication. Our evaluation results show we achieved a maximum 3.35 times faster performance with a GPU during a function level execution, and 1.91 times faster total performance, than could be achieved via CPU-only execution, both in measurements on high density GPU cluster HA-PACS, where each computation node consists of four NVIDIA M2090 GPUs and two Intel Xeon E5-2670 (SandyBridge) that provide 16 cores in total. Note that theses performance improvements for a single GPU were obtained in measurements against four CPU cores, not a single-core CPU, and include a 63% performance gain obtained by communications overlapping between MPI processes and GPU calculations.