In this paper, we propose a source-to-source code optimization framework for general purpose computing on graphics processing units (GPGPU). Our framework is based on a re-formulation of the polyhedral loop transformation theory under the context of GPGPU. We prove that the number of actual memory transactions can be used as a performance metric to guide the code optimization process. In addition, we show how to analytically derive such a metric from a GPU program's polyhedral model. We also develop formations of GPGPU-specific optimization problems and propose corresponding affine transformations, which can be applied to an initial parallelized solution derived from input C/C++ code. The experiment results demonstrate the effectiveness of our work. On average, the code generated by our work outperforms a leading GPGPU compiler and NVIDIA handcrafted CUBLAS 4.0 by 20% and 17%, respectively.