In this paper, we propose RadixBoost, a hardware acceleration structure for scalable 32-bit integer radix sort on GPUs. The structure is integrated into the GPU microarchitecture as a special functional unit and is invoked through new instructions. Our design enables significantly faster sorting for general-purpose GPU computing. We validated the RadixBoost architecture with an FPGA prototype integrated into Fastlanes, an FPGA-based GPU microarchitecture simulator, and also performed an ASIC evaluation. Our results show that RadixBoost outperforms its GPU software equivalent by a factor of more than 6, with increases of 1% in area and 3% in power, respectively, on a cutting-edge Fermi GPU.
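For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of a 32-bit integer radix sort. It is an illustration only: the abstract does not state which radix sort variant, digit width, or data layout RadixBoost uses, so the least-significant-digit formulation, the 8-bit digit width, and the names `radix_sort_u32` and `bits_per_pass` are all assumptions made here, not details of the proposed hardware or of the GPU software baseline.

```python
def radix_sort_u32(keys, bits_per_pass=8):
    """Sort unsigned 32-bit integers with an LSD radix sort.

    Illustrative sketch only; the digit width (bits_per_pass) is an
    assumption and not a parameter taken from the RadixBoost design.
    """
    radix = 1 << bits_per_pass      # number of buckets per pass
    mask = radix - 1                # extracts one digit from a key
    src = list(keys)

    # Process the key one digit at a time, least significant digit first.
    for shift in range(0, 32, bits_per_pass):
        # Step 1: histogram of digit values in this pass.
        counts = [0] * radix
        for k in src:
            counts[(k >> shift) & mask] += 1

        # Step 2: exclusive prefix sum gives each bucket's start offset.
        offsets = [0] * radix
        total = 0
        for d in range(radix):
            offsets[d] = total
            total += counts[d]

        # Step 3: stable scatter of keys into their buckets.
        dst = [0] * len(src)
        for k in src:
            d = (k >> shift) & mask
            dst[offsets[d]] = k
            offsets[d] += 1

        src = dst

    return src
```

Each pass consists of a histogram, a prefix sum, and a scatter. The scatter must be stable so that the ordering established by earlier, less significant digits survives the later passes.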