This paper describes a new approach to reducing ground bounce (GB) while keeping the wakeup time short in fine-grain power gating. We propose a novel algorithm that synthesizes an optimal unbalanced buffer tree (UBT), which turns on parallel power switches with slight time differences. We have applied our algorithm to the functional units of a 32-bit microprocessor. Experimental results show that our UBT gives better solutions than the conventional daisy-chain approach in the trade-off space of wakeup time and GB. For example, in the ALU, our UBT suppressed the maximum GB voltage to 16 mV, which is 24% smaller than that of the parallel daisy chain, while keeping the wakeup time at 0.6 ns. In the 32-bit × 32-bit multiplier, our UBT reduced GB by 32% relative to the daisy chain while still keeping the wakeup time at 0.7 ns. A microprocessor test chip using our UBT technique operates successfully.