High-speed, low-power routers form the basic building blocks of on-die interconnect fabrics that are critical to the overall throughput and energy efficiency of high-performance systems [1,2]. Conventional routers use distinct logic blocks for routing data and handling arbitration [3,4]. At higher radices, the connections between these blocks become a bottleneck, limiting router scalability and degrading performance. Recently, two switch topologies [5,6] merged the data-routing fabric with arbitration control, avoiding this bottleneck. However, [6] relies on centralized control for channel allocation, which limits performance, while [5] is restricted to a small set of fixed priorities, leaving input ports prone to starvation. In addition, ever-larger chip multiprocessors (CMPs) will require continued increases in bandwidth over previous designs. To address these issues, we present a 64×64 single-stage swizzle-switch network (SSN) with 128b data buses (8192 total input/output wires). The SSN can connect any input to any output, including multicast. It has a peak measured throughput of 4.5Tb/s at 1.1V in 45nm SOI CMOS at 25°C. The SSN's key features are: 1) a single-cycle least-recently-granted (LRG) priority arbitration technique that reuses the already-present input and output data buses and their drivers and sense amps; 2) an additional 4-level message-based priority arbitration for quality of service (QoS) with 2% logic and 3% wiring overhead; 3) a bidirectional bitline repeater that allows the router to scale to >8000 wires. These features result in a compact fabric (4.06mm²) with a throughput gain of 2.1× over [5] at 3.4Tb/s/W efficiency, which improves to 7.4Tb/s/W at 600mV.
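As a purely behavioral illustration, the LRG policy for a single output port can be sketched in Python. This is not the circuit described above: the SSN performs arbitration on the data bitlines themselves, which this model does not capture. The class name `LRGArbiter`, the request encoding, and the rule that the highest message priority wins with LRG breaking ties are assumptions made for this sketch.

```python
from typing import Dict, List, Optional

NUM_PORTS = 64        # radix of the switch, from the text
BUS_WIDTH_BITS = 128  # data bus width per port, from the text
TOTAL_WIRES = NUM_PORTS * BUS_WIDTH_BITS  # 8192 wires, matching the text


class LRGArbiter:
    """Hypothetical software model of LRG arbitration for one output port."""

    def __init__(self, num_inputs: int) -> None:
        # Front of the list = least recently granted = highest LRG priority.
        self.order: List[int] = list(range(num_inputs))

    def arbitrate(self, requests: Dict[int, int]) -> Optional[int]:
        """Pick one winner among the requesting inputs.

        `requests` maps an input index to its message priority level
        (assumed 0..3, higher wins). Returns the granted input or None.
        """
        if not requests:
            return None
        # Assumption: message priority is compared first, LRG breaks ties.
        top_level = max(requests.values())
        winner = next(i for i in self.order if requests.get(i) == top_level)
        # The winner becomes the most recently granted, i.e. lowest priority.
        self.order.remove(winner)
        self.order.append(winner)
        return winner


arb = LRGArbiter(4)
grants = [arb.arbitrate({0: 0, 2: 0}) for _ in range(3)]
qos_grant = arb.arbitrate({0: 0, 2: 0, 3: 1})
```

In this sketch, two inputs that request the same output at equal message priority are granted alternately, so neither can starve the other; a request at a higher message priority is granted ahead of both.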