As variability and timing closure become critical challenges in synchronous CAD flows, one attractive alternative is to use robust asynchronous circuits, which gracefully accommodate timing discrepancies. In this approach, each gate in an initial Boolean netlist is typically replaced by a robust dual-rail asynchronous template. However, the resulting circuits typically incur significant area and latency overhead. A gate-level relaxation approach has recently been proposed, which replaces selected simple gates by asynchronous templates that perform eager evaluation, without affecting the circuit's overall timing robustness. In this paper, that approach is significantly extended to block-level relaxation, which handles arbitrarily complex multi-output blocks. For these circuits, a much wider range of optimizations is applicable than in the gate-level approach. A block-level relaxation algorithm is implemented, and experiments are performed on several high-speed arithmetic circuits (Brent-Kung and Kogge-Stone adders, and combinational multipliers). On average, 38.4\% of the blocks could be relaxed (48.4\% best-case), with an area improvement of 27.2\% (49.7\% best-case) and a delay improvement of 13.1\% (25.5\% best-case) on the critical path, while still preserving the circuit's overall timing robustness.