Graph computation is becoming more and more popular in machine learning, big data analytics, etc. For such workloads, GPU is considered as an efficient execution platform since graph computation is characterized by massively parallel computation and high demand of memory bandwidth. In our investigation, existing GPU programming methods for graph computation do not fully exploit high memory bandwidth as well as high computing power in GPU. We propose a novel optimization called locality-aware vertex scheduling, which aims at minimizing memory requests by adjusting the order of vertex computations to improve temporal locality of vertex data stored in on-chip caches. Experiments with nine real-world graphs and three graph algorithms on the recent GPU platform show that the proposed method offers a significant speedup (average 46%) over the state-of-the-art graph algorithm implementation on GPUs.