This paper describes a unified popcount/bitscanforward/bitscanreverse datapath circuit designed for 2.1GHz operation with total power consumption of 6.5 mW, targeted for 65 nm 64-bit microprocessor execution cores. The unified datapath uses a hybrid 3:2 compressor-based Wallace tree to count the number of '1's in the 64-bit input, along with a novel encoding scheme that enables reuse of the same tree to identify the bit-location of the 1st set bit when scanning the input in the forward and reverse directions. This circuit thus combines the functions of 3 separate units, enabling 26% reduction in total energy and 20% lower area, while achieving single-cycle latency & throughput.