Scalable architectures have been proposed for the Discrete Cosine Transform (DCT). The number of processing elements (PEs) can be reduced significantly by using a partial-column structure to compute the DCT. This feature is highly desirable for multimedia applications on handheld devices. Computing the transform requires data reordering between stages (columns), with intermediate values saved in memory-like temporary storage called FIFOs. This paper presents a scalable interconnect network for both global and local data reordering, together with its implementation. The network scales with the transform size and the desired number of PEs. The structure offers a flexible trade-off between throughput and complexity (cost and area) of the overall system.
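
The throughput versus complexity trade-off can be illustrated with a rough cost model. The sketch below is not the architecture's actual timing. It assumes a radix-2 style flow graph with log2(N) columns and N/2 two-input operations per column, and the function name `partial_column_cost` and its returned fields are hypothetical.

```python
def partial_column_cost(n, pes_per_column):
    """Rough cost model for a partial-column transform structure.

    Assumptions (not taken from the paper): the flow graph has
    log2(n) columns, each column holds n/2 two-input operations,
    and one PE completes one operation per pass.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")
    ops_per_column = n // 2
    if pes_per_column < 1 or ops_per_column % pes_per_column:
        raise ValueError("pes_per_column must divide n/2")

    columns = n.bit_length() - 1  # log2(n) for a power of two
    return {
        "columns": columns,
        # Fewer PEs per column means each column is reused for more passes.
        "passes_per_column": ops_per_column // pes_per_column,
        # Hardware cost grows with the PE count across all columns.
        "total_pes": columns * pes_per_column,
    }


# Full-column versus single-PE-per-column structure for an 8-point transform.
full = partial_column_cost(8, 4)
minimal = partial_column_cost(8, 1)
```

Under these assumptions, an 8-point transform needs 12 PEs with one pass per column in the full-column case, and 3 PEs with four passes per column when each column has a single PE. The intermediate values waiting between passes are what the FIFOs and the reordering network would have to hold and route.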