As devices scale into the nanometer regime, reliable clock distribution becomes increasingly difficult because of the growing impact of process variations. This paper addresses that difficulty, focusing on the implementation of useful clock skew. One robust option is to use programmable delay elements (PDEs), which can be adjusted after fabrication. This benefit, however, comes at a large power cost. Observing that the required clock skew differs considerably from register to register, this paper proposes a register binding approach in high-level synthesis that minimizes the number of PDEs in order to reduce power. The problem is formulated formally as a mixed integer linear program. In experiments, the approach reduces the number of PDEs by 49.4% compared with a conventional design.