This brief presents a parallel architecture for a turbo decoder based on the quadratic permutation polynomial (QPP) interleaver. The supported block sizes range from 40 to 6144 in increments of 8 and thus include the 188 sizes defined in the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) standard. The proposed design allows one, two, four, or eight soft-in/soft-out (SISO) decoders to process each block with a configurable number of iterations. To support all data transfers in the parallel design, a low-complexity multistage network is also used. Moreover, a robust path metric initialization is given to reduce the performance loss incurred with small blocks and high parallelism. Fabricated in a 90-nm process, the 2.1-mm² chip achieves 130 Mb/s at 219 mW for the size-6144 block with eight iterations.
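
As a rough illustration of why a quadratic permutation polynomial (QPP) interleaver suits parallel decoding, the sketch below computes the permutation π(i) = (f1·i + f2·i²) mod K and checks that P decoders working on equal-length windows never address the same memory bank in the same step. It is a minimal software model, not the hardware architecture of this brief. The function names `qpp_interleave` and `is_contention_free` are hypothetical, the window-per-bank memory mapping is an assumption, and the coefficients (K = 40, f1 = 3, f2 = 10) are taken from the LTE turbo interleaver parameter table rather than from this text.

```python
def qpp_interleave(K, f1, f2):
    """Return the QPP permutation pi(i) = (f1*i + f2*i^2) mod K as a list."""
    return [(f1 * i + f2 * i * i) % K for i in range(K)]


def is_contention_free(pi, P):
    """Check that P parallel decoders never hit the same memory bank.

    Assumed mapping: the block is split into P windows of length W = K / P,
    decoder t processes index j + t*W at step j, and bank b stores the
    addresses b*W .. (b+1)*W - 1.
    """
    K = len(pi)
    if K % P != 0:
        raise ValueError("parallelism P must divide the block size K")
    W = K // P
    for j in range(W):
        # Banks addressed by the P decoders at step j after interleaving.
        banks = {pi[j + t * W] // W for t in range(P)}
        if len(banks) != P:
            return False
    return True


if __name__ == "__main__":
    # Smallest LTE block size and its coefficients (assumed, see lead-in).
    K, F1, F2 = 40, 3, 10
    pi = qpp_interleave(K, F1, F2)
    # A valid interleaver must be a permutation of 0 .. K-1.
    assert sorted(pi) == list(range(K))
    for P in (1, 2, 4, 8):
        print(f"P={P}: contention-free = {is_contention_free(pi, P)}")
```

With these parameters the check reports no bank conflict for one, two, four, or eight decoders. The interconnect between decoders and memory banks then only needs to realize the bank permutation arising at each step.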