In emerging CMOS process technologies, network-on-chip (NoC) fabrics are becoming increasingly susceptible to transient faults. The fault-tolerance mechanisms typically employed in NoCs entail significant energy overheads, which are expected to become prohibitive as fault rates rise in future CMOS technologies. We propose HEFT, a system-level framework for trading off energy consumption against fault tolerance in the NoC fabric. Our hybrid framework addresses the challenge of energy-efficient resilience in NoCs in two phases: at design time and at runtime. At design time, we implement an algorithm that guides the robust mapping of cores onto a die while satisfying application bandwidth and latency constraints. At runtime, we devise a prediction algorithm that monitors and detects changes in the fault susceptibility of NoC components, intelligently balancing energy consumption and reliability. Experimental results show that, compared to multiple prior works on reliable system-level NoC design, HEFT improves the energy/reliability ratio of synthesized solutions by 8–20% while meeting application performance goals.