This paper addresses the parallelization of a recursive algorithm for triangular matrix inversion (TMI) based on the ‘Divide and Conquer’ (D&C) paradigm. We first present several versions of an original sequential algorithm. A theoretical performance study allows an accurate comparison of the designed algorithms. We then develop an optimal, communication-free parallel algorithm targeting a heterogeneous environment whose processors have different speeds. To this end, we use a non-equitable and incomplete version of the D&C paradigm, which recursively decomposes the original TMI problem into two subproblems of unequal size, then further decomposes only one of these subproblems, and so on. The theoretical study is validated by a series of experiments carried out on two platforms, namely an 8-core shared-memory machine and a 16-node distributed-memory cluster. The results illustrate the benefit of the contribution.
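
As a rough illustration of the D&C recursion underlying TMI, the following Python sketch inverts a nonsingular lower triangular matrix by partitioning it as [[A, 0], [C, B]] and using the standard block identity inverse = [[A^-1, 0], [-B^-1 C A^-1, B^-1]]. The names `matmul`, `invert_lower` and `split` are hypothetical and not taken from the paper, and matrices are assumed to be plain lists of rows. The sketch recurses on both diagonal blocks, so it corresponds to a complete sequential scheme and not to the incomplete, heterogeneous parallel variant; the `split` parameter only hints at how unequal subproblem sizes could be chosen.

```python
def matmul(X, Y):
    """Dense product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def invert_lower(L, split=lambda n: n // 2):
    """Recursively invert a nonsingular lower triangular matrix L.

    split(n) gives the order of the leading block A (an assumption of
    this sketch); the default yields an equitable decomposition.
    """
    n = len(L)
    if n == 1:
        return [[1.0 / L[0][0]]]

    # Clamp the split point so that both diagonal blocks are nonempty.
    k = max(1, min(n - 1, split(n)))

    A = [row[:k] for row in L[:k]]   # leading k x k triangular block
    C = [row[:k] for row in L[k:]]   # (n-k) x k rectangular block
    B = [row[k:] for row in L[k:]]   # trailing (n-k) x (n-k) triangular block

    A_inv = invert_lower(A, split)
    B_inv = invert_lower(B, split)

    # Off-diagonal block of the inverse is -B^-1 * C * A^-1.
    BCA = matmul(matmul(B_inv, C), A_inv)

    top = [row + [0.0] * (n - k) for row in A_inv]
    bottom = [[-v for v in r] + b for r, b in zip(BCA, B_inv)]
    return top + bottom
```

For example, `invert_lower(L, split=lambda n: 1)` peels off one row and column per level, which gives a strongly unequal decomposition, while the default halves the problem at each step.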