A packet switch architecture and a method for load-balancing are described, which involve no centralised schedulers. The method is implemented by distributing extended cross-point queues (a three-dimensional structure) over all elements of the switch and deploying pollers to append packets and to select the queues to be served, together with simple local work-conserving schedulers. The queue structure is such that it renders the proof that no packet will be mis-sequenced trivial. The architecture is practical and shows enhanced performance compared with other state-of-the-art load-balancing architectures, not only for the average delay but also for the distribution of individual delays, the latter being measured by a custom tool that compares the performance of the architecture to the ideal operation of an output queued switch. The queue structure permits the fair penalisation of only the offending input-output flows within the switch in the case of buffer overflow. The basic scheme is enhanced to avoid improper operation in the presence of packet drops, a problem that reintroduces mis-sequencing and that has not been properly addressed in the class of architectures that use pollers to distribute packets.