An efficient design methodology and a systematic approach for the implementation of multiplication and squaring functions for unsigned large integers, using small-size embedded multipliers are presented. A general architecture of the multiplier and squarer is proposed and a set of equations is derived to aid in the realisation. The inputs of the multiplier and squarer are split into several segments leading to an efficient utilisation of the small-size embedded multipliers and a reduced number of required addition operations. Various benchmarks were tested for different segments ranging from 2 to 5 targeting Xilinx Spartan-3 FPGAs. The synthesis was performed with the aid of the Xilinx ISE 7.1 XST tool. The approach was compared with the traditional technique using the same tool. The results illustrate that the design approach is very efficient in terms of both timing and area savings. Combinational delay is reduced by an average of 7.71% for the multiplier and 21.73% for the squarer. In terms of 4-inputs look-up tables, area is lowered by an average of 11.63% for the multiplier and 52.22% for the squarer. In the case of the multiplier, both approaches use the same number of embedded multipliers. For the squarer, the proposed approach reduces the number of required embedded multipliers by an average of 32.77% compared with the traditional technique