A way to speed up the Montgomery Multiplication by distributing the multiplier operand bits into partitions is proposed. All of them process in parallel and use an identical algorithm. Each partition executes its task in steps. Even though the computationstep operates in radix , the complexity is reduced by the use of a limited digit set. Experiments with a 90-nm cell library show that the hardware cost and its complexity have a linear growth according to the number of partitions. Besides the gain in speed, the proposal reduces power consumption for multiplication operands with 256, 512, 1024, and 2048 bits. The uniform treatment of partition hardware design enables the realization of a fault-tolerant hardware.