This paper presents a design for parallel processing of synthetic aperture radar (SAR) data using multiple Graphics Processing Units (GPUs). Our approach supports real-time reconstruction of a two-dimensional image from a matrix of echo pulses and their response values. Key to runtime efficiency is a partitioning scheme that divides the output image into tiles and the input matrix into a collection of pulses associated with each tile. Each image tile and its associated pulse set are distributed to thread blocks across multiple GPUs, enabling parallel computation with near-optimal I/O cost. The partial results are subsequently combined by a host CPU. Further efficiency is realized by the GPU's low-latency thread scheduling, which masks memory access latencies. Performance analysis quantifies runtime as a function of the input/output parameters and the number of GPUs. Experimental results were generated with 10 nVidia Tesla C2050 GPUs with a maximum throughput of 972 Gflop/s. Our approach scales well for output (reconstructed) image sizes from 2,048 × 2,048 pixels to 8,192 × 8,192 pixels.
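
As a rough illustration of the tiling idea only, the following minimal Python sketch splits a square output image into tiles and deals them out to a set of GPUs. The function names, the 512-pixel tile size, and the round-robin assignment policy are assumptions made for this example, not details taken from the design itself; the association of pulses with tiles and the GPU kernels are omitted.

```python
from typing import List, Tuple

# A tile is (row_start, row_end, col_start, col_end); end indices are exclusive.
Tile = Tuple[int, int, int, int]


def partition_image(size: int, tile: int) -> List[Tile]:
    """Split a size x size image into tiles of at most tile x tile pixels."""
    tiles: List[Tile] = []
    for r in range(0, size, tile):
        for c in range(0, size, tile):
            tiles.append((r, min(r + tile, size), c, min(c + tile, size)))
    return tiles


def assign_round_robin(tiles: List[Tile], num_gpus: int) -> List[List[Tile]]:
    """Hypothetical policy: deal tiles to GPUs in round-robin order."""
    return [tiles[g::num_gpus] for g in range(num_gpus)]


# Example: the smallest image size mentioned above, with an assumed tile size.
tiles = partition_image(2048, 512)
per_gpu = assign_round_robin(tiles, 10)
for gpu_id, work in enumerate(per_gpu):
    print(f"GPU {gpu_id}: {len(work)} tile(s)")
```

In a sketch like this, each GPU would reconstruct its tiles independently from the pulses relevant to them, and the host would then assemble the tiles into the full image. A real scheduler would likely balance load by pulse count per tile rather than by tile count alone.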