The main difficulty in implementing modern image coding systems on a GPU is that the algorithms at the core of the coding scheme are inherently sequential. We recently proposed bitplane image coding with parallel coefficient processing (BPC-PaCo), a coding scheme that, contrary to most systems, permits the processing of multiple coefficients of the image in parallel. This enables the use of SIMD computing, which makes the scheme well suited to implementation on a GPU. This paper introduces and evaluates the GPU implementation of BPC-PaCo, employing two different strategies that trade off computational throughput against compression efficiency. The proposed implementation is compared to the best CPU and GPU implementations of JPEG2000, the state-of-the-art image compression standard. Experimental results indicate that BPC-PaCo achieves a computational throughput an order of magnitude higher than that of such implementations, at the cost of a small reduction in coding efficiency.