Asymmetric cryptography algorithms such as RSA are widely used in applications including blockchain technology and cloud computing to ensure the security and privacy of data. However, their encryption and decryption operations involve many computation-intensive modular multiplications, which require high memory bandwidth and incur large performance and resource overheads. Emerging non-volatile memory technologies such as racetrack memory are regarded as promising candidates for all levels of the memory hierarchy, where their high data density and near-zero leakage can reduce area and power overhead. In this paper, we propose an efficient racetrack-memory-based in-memory design to accelerate modular multiplication for asymmetric cryptography algorithms. A novel two-stage scalable modular multiplication algorithm is proposed to significantly reduce delay. An efficient architecture is further developed to halve the number of required adders. Experimental results show that, compared with the state-of-the-art CMOS-based implementation, our proposed scheme improves energy efficiency by 45.9% and area efficiency by 93.6%, and achieves 8× the throughput per area.
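The abstract does not spell out the two-stage algorithm, so the following is only a minimal sketch of a generic textbook technique, Blakley's interleaved (shift-and-add) modular multiplication, used here to illustrate why RSA's modular multiplications are dominated by additions and conditional subtractions on wide operands. It is not the authors' method; the function names `interleaved_modmul` and `modexp` are hypothetical, and it assumes operands already reduced below the modulus.

```python
def interleaved_modmul(a: int, b: int, m: int) -> int:
    """Compute (a * b) mod m one bit of a at a time (Blakley's method).

    Assumes 0 <= a, b < m. Each iteration needs only a shift, at most one
    addition, and up to two conditional subtractions of m.
    """
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must satisfy 0 <= a, b < m")
    r = 0
    # Scan a from its most significant bit down to bit 0.
    for i in range(a.bit_length() - 1, -1, -1):
        r <<= 1                # r = 2r; since r < m, now r < 2m
        if r >= m:
            r -= m             # reduce back into [0, m)
        if (a >> i) & 1:
            r += b             # add the multiplicand; r < 2m
            if r >= m:
                r -= m         # reduce back into [0, m)
    return r


def modexp(base: int, exp: int, m: int) -> int:
    """Left-to-right square-and-multiply built on interleaved_modmul."""
    result = 1 % m
    base %= m
    for bit in bin(exp)[2:]:
        result = interleaved_modmul(result, result, m)     # square
        if bit == "1":
            result = interleaved_modmul(result, base, m)   # multiply
    return result


if __name__ == "__main__":
    # Textbook toy RSA key (p = 61, q = 53); far too small for real use.
    n, e, d = 3233, 17, 2753
    c = modexp(65, e, n)
    print(c, modexp(c, d, n))
```

A single RSA exponentiation with a 2048-bit modulus performs thousands of such modular multiplications, each of which in this sketch costs thousands of wide additions and subtractions, which is the workload that motivates accelerating this operation in or near memory.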