Modular multiplication over large integers is difficult to implement efficiently because it demands heavy computation and a complex hardware architecture. In this paper, we propose a low-latency scalable modular multiplier for multi-precision modular multiplication. Unlike popular scalable architectures based on the Montgomery algorithm, the proposed design directly implements the classic modular multiplication A · B mod M. Low latency is obtained by deferring the use of the most significant bits during interleaved modular multiplication. In addition, carry-save addition greatly shortens the critical path of the processing elements. Although the proposed multiplier achieves performance comparable to that of the optimal scalable Montgomery modular multiplier, its area overhead is larger because more registers and selection logic are employed. The proposed modular multiplier is well suited to varying operands and non-successive modular multiplications, where it is more energy-efficient than popular scalable Montgomery modular multipliers.
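For readers unfamiliar with interleaved modular multiplication, the following is a minimal software sketch of the textbook bit-serial form of A · B mod M, in which each shift-and-add step is followed immediately by a reduction. It is an illustration only and not the proposed hardware architecture: it models neither the deferred use of the most significant bits nor carry-save addition, and the function name `interleaved_modmul` is a hypothetical choice made for this example.

```python
def interleaved_modmul(a: int, b: int, m: int) -> int:
    """Textbook interleaved (shift-add-reduce) computation of a * b mod m.

    Illustrative sketch only; the name and interface are assumptions,
    not taken from the paper.
    """
    if m <= 0:
        raise ValueError("modulus must be positive")
    a %= m
    b %= m
    r = 0
    # Scan the bits of a from most significant to least significant.
    for i in reversed(range(a.bit_length())):
        # Shift the running result and conditionally add b.
        r = 2 * r + ((a >> i) & 1) * b
        # Since r < m before this step and b < m, the new r is below 3m,
        # so at most two conditional subtractions restore r < m.
        if r >= m:
            r -= m
        if r >= m:
            r -= m
    return r


if __name__ == "__main__":
    a, b, m = 7, 9, 13
    print(interleaved_modmul(a, b, m) == (a * b) % m)
```

Because the reduction is interleaved with the accumulation, the intermediate result never grows beyond a few bits more than the modulus, which is the property that makes this form attractive for hardware without a conversion to and from the Montgomery domain.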