Modern handheld embedded systems operate under stringent power and real-time constraints. These systems run highly data-dominated applications from the multimedia and wireless domains, most of which spend a significant amount of execution time in nested loops. To reduce loop control overhead, several loop controller architectures have been proposed in the past. In this paper, we present a generic architecture and a compiler technique that significantly reduce the energy overhead of executing loop control instructions. The compiler technique maps not only the innermost loops but also the outer loops onto the loop controller architecture. Furthermore, we reduce the number of division operations using induction variable analysis to further improve energy efficiency. We show that the proposed technique makes it possible to reduce the energy consumption of branch operations on these loop controller architectures by 25% on average, with no performance loss.
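As a generic illustration of the kind of division removal that induction variable analysis enables, the sketch below shows a textbook strength-reduction example. It is not claimed to be the specific transformation used in this paper. A loop over a flat index recovers a 2-D position with a division and a modulo on every iteration; treating the row and column as induction variables that are updated incrementally removes both operations from the loop body. The function names, the weighting computation, and the assumption that `width > 0` are all hypothetical and chosen only for illustration.

```c
#include <stddef.h>

/* Hypothetical baseline: recover (row, col) from the flat index i with a
   division and a modulo on every iteration. Assumes width > 0. */
long sum_weighted_div(const int *a, size_t n, size_t width) {
    long acc = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t row = i / width; /* one division per iteration */
        size_t col = i % width; /* one modulo per iteration */
        acc += (long)a[i] * (long)(row + 1) * (long)(col + 1);
    }
    return acc;
}

/* Same result: row and col are maintained as induction variables, so the
   loop body executes no division or modulo. Assumes width > 0. */
long sum_weighted_iv(const int *a, size_t n, size_t width) {
    long acc = 0;
    size_t row = 0, col = 0; /* invariant: row == i / width, col == i % width */
    for (size_t i = 0; i < n; ++i) {
        acc += (long)a[i] * (long)(row + 1) * (long)(col + 1);
        if (++col == width) { /* column counter wraps: advance to next row */
            col = 0;
            ++row;
        }
    }
    return acc;
}
```

For `a = {1, 2, 3, 4, 5, 6}` with `width = 3`, both functions return 78. The second form trades a per-iteration division for an increment and a compare, which is the kind of operation that is cheap on energy-constrained embedded processors.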